Daniel@World: Baum Welch Initialisation: Flat Start vs Randomisation vs Viterbi Training

Sunday, July 14, 2013

Baum Welch Initialisation: Flat Start vs Randomisation vs Viterbi Training

As most readers will know, Baum Welch is a parameter estimation algorithm for Hidden Markov Models. Given an initial configuration of the HMM, it converges to a locally optimal parameter configuration. The important bit is that the initial configuration determines how good the final parameter set will be. To my knowledge there are three initialisation methods, two of which are implemented in the HTK3 toolkit:

Viterbi Training

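Viterbi training is implemented in HTK3 in the tool HInit. First the data is uniformly segmented, with the number of segments equal to the number of states, and each state's distribution is estimated from its segment. Then we get the optimal path through the HMM using Viterbi with backtracking. Using the path we re-estimate the model, and the alignment and re-estimation steps can be repeated until the path stops changing.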
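To make the steps concrete, here is a minimal NumPy sketch of one round of Viterbi training for a single sequence. It is not HInit's code: the diagonal Gaussians, the variance floor of 1e-3, the left-to-right transition matrix and all function names are assumptions chosen for illustration.

```python
import numpy as np

def uniform_segment_init(obs, n_states):
    """Split a (T, D) sequence into n_states equal chunks, one Gaussian per chunk."""
    segments = np.array_split(obs, n_states)
    means = np.array([seg.mean(axis=0) for seg in segments])
    variances = np.array([seg.var(axis=0) + 1e-3 for seg in segments])  # assumed floor
    return means, variances

def log_gauss(obs, means, variances):
    """Log density of every frame under every state's diagonal Gaussian, shape (T, N)."""
    diff = obs[:, None, :] - means[None, :, :]
    terms = np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]
    return -0.5 * terms.sum(axis=2)

def viterbi_path(log_b, log_a, log_pi):
    """Most likely state sequence via Viterbi with backtracking."""
    T, N = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_a          # scores[i, j]: move from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def reestimate(obs, path, means, variances):
    """Re-estimate each state's Gaussian from the frames aligned to it."""
    new_means, new_vars = means.copy(), variances.copy()
    for s in range(len(means)):
        frames = obs[path == s]
        if len(frames) > 0:                      # keep old values for unvisited states
            new_means[s] = frames.mean(axis=0)
            new_vars[s] = frames.var(axis=0) + 1e-3
    return new_means, new_vars

# Toy data: three regimes of unequal length (6, 14 and 10 frames).
obs = np.concatenate([np.zeros((6, 1)), np.full((14, 1), 5.0), np.full((10, 1), 10.0)])
means, variances = uniform_segment_init(obs, 3)

# Assumed left-to-right topology; zeros become -inf in the log domain.
a = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])
with np.errstate(divide="ignore"):
    log_a, log_pi = np.log(a), np.log(pi)

path = viterbi_path(log_gauss(obs, means, variances), log_a, log_pi)
new_means, new_vars = reestimate(obs, path, means, variances)
```

On this toy sequence the uniform segmentation puts the first boundary in the wrong place (the first chunk mixes two regimes), and the Viterbi alignment moves it to where the data actually changes before the Gaussians are re-estimated.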

Flat Start

Flat start simply initialises every state with the same observation distribution. For example, we estimate a single Gaussian (global mean and variance) from all available data and use it as the initial distribution for every state. In both cases, Flat Start and Viterbi Training, we set the initial transition probabilities ourselves. Flat Start is implemented in HTK3 in the HCompV tool. The problem with Viterbi training is that it requires labeled data, while Flat Start does not.
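A flat start is only a few lines of NumPy. The sketch below is not HCompV itself: the function names, the diagonal covariance and the left-to-right transition matrix with a self-loop probability of 0.6 are assumptions for illustration, since the transitions are something we choose by hand.

```python
import numpy as np

def flat_start(sequences, n_states):
    """Give every state the same Gaussian, estimated from all frames pooled together."""
    data = np.vstack(sequences)                  # (total_frames, D)
    global_mean = data.mean(axis=0)
    global_var = data.var(axis=0)
    means = np.tile(global_mean, (n_states, 1))  # identical row per state
    variances = np.tile(global_var, (n_states, 1))
    return means, variances

def left_to_right_transitions(n_states, stay=0.6):
    """Hand-picked transitions: stay in a state or move to the next one."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = stay
        a[i, i + 1] = 1.0 - stay
    a[-1, -1] = 1.0                              # last state only loops
    return a

# Two short toy sequences with 2-dimensional features.
sequences = [np.array([[0.0, 2.0], [2.0, 4.0]]), np.array([[4.0, 6.0]])]
means, variances = flat_start(sequences, n_states=3)
trans = left_to_right_transitions(3)
```

Because all states start out identical, it is the transition structure and the Baum Welch updates that have to pull them apart, which is one way to see why a flat start can end up at a poor local optimum.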

Random Initialisation with Random Restarts

Viterbi Training needs labeled data, and Flat Start is simple but can give a bad initial estimate. Since Baum Welch depends on its starting condition, we can also use random initialisation with random restarts. We initialise the transition matrix by setting all allowed transitions to random values and normalising each row so that it sums to one. The Gaussians can be initialised as in Flat Start, with a small random number added to each mean. We construct multiple random initialisations and run Baum Welch from each of them. In the end we keep the model with the best likelihood. Alternatively, the initial Gaussians can be estimated from a random subset of the data. Unfortunately there is no implementation of this method in HTK. In my experiments this initialisation works best when no labels are available and Flat Start gives a bad estimate.
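The restart loop can be sketched as follows. This is an illustration under assumptions, not code from my experiments: `train_fn` is a hypothetical placeholder for whatever Baum Welch implementation you use (it is assumed to return the trained model and its log-likelihood), and `noise_scale`, `allowed` and the function names are made up for this example.

```python
import numpy as np

def random_init(data, n_states, rng, noise_scale=0.1, allowed=None):
    """Flat-start Gaussians with jittered means plus a random transition matrix."""
    global_mean = data.mean(axis=0)
    global_var = data.var(axis=0)
    # Jitter each state's mean by a small fraction of the global standard deviation.
    jitter = rng.standard_normal((n_states, data.shape[1]))
    means = global_mean + noise_scale * np.sqrt(global_var) * jitter
    variances = np.tile(global_var, (n_states, 1))
    # Random values on the allowed transitions only, then normalise each row.
    trans = rng.random((n_states, n_states))
    if allowed is not None:
        trans = trans * allowed                  # each row needs at least one allowed entry
    trans /= trans.sum(axis=1, keepdims=True)
    return means, variances, trans

def best_of_restarts(data, n_states, train_fn, n_restarts=10, seed=0, allowed=None):
    """Train from several random starts and keep the highest-likelihood model."""
    rng = np.random.default_rng(seed)
    best_model, best_ll = None, -np.inf
    for _ in range(n_restarts):
        means, variances, trans = random_init(data, n_states, rng, allowed=allowed)
        model, ll = train_fn(data, means, variances, trans)  # hypothetical trainer
        if ll > best_ll:
            best_model, best_ll = model, ll
    return best_model, best_ll
```

Passing an upper-triangular 0/1 mask as `allowed` restricts the random transitions to a left-to-right topology; leaving it as `None` gives a fully connected model.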
