Now to the details. In contrast to a Bayes Network or a Hidden Markov Model a Random field is a undirected model. So instead of talking about a parent child relationship we talk about interaction between potentials. A potential is a function defined over variables of a clique. Just as a reminder,
in Computer Science a clique is a fully connected subgraph. For example if we were to label each frame in a sequence we could use the model below. The Y are the labels over time and X is the whole time series.
The potential function is defined as a three- clique over the last label, the current label and the whole time series.
While the transition probability (label sequence) is the same as in Hidden Markov Models as it models a
first order Markov Chain, the observations are fully accessible. In a Hidden Markov Model the observations have to be independent by definition. So now can define more powerful observations by designing potential functions that take previous values into account. Furthermore, we can use the labels in order to behave differently from label to label. The joint probability is then defined over these potentials. However for labeling the sequence we are not interested in the joint probability but in the conditional as P(Y | X) instead of P(Y, X). The conditional probability is then defined as:
And this is a Conditional Random Field. In my opinion and from what I read, this model is more powerful then Hidden Markov Models but has some drawbacks. The first one is that in order to train it, we need fully labeled data. While in a Hidden Markov Model, we can automatically distribute the data over the states by inferring the label sequence, this is not possible here. The reason is that instead of a Expectation Maximisation training, most people seem to use a Gradient Based optimization on the log-likelihood. Acquiring labels on that level might not be possible in all cases. The other thing I read is that these models need a lot more training data for training and the training is much slower. In the end that means use cases where only a few training instances can be collected this model won't be able to converge.


