Viterbi Training
Viterbi training is implemented in HTK3 in the tool HInit. First the data is uniformly segmentedwith the number of segments equal to the number of states. Then we get the optimal path through the HMM using Viterbi + Backtracking. Using the path we reestimate the model.
Flat Start
Flat start simply initialises the model with the observations uniform. For example we estimate a uniform Gaussian from all data available and use it for all initial distributions. In both cases, Flat Start and Viterbi Training we set the initial transition probabilities ourselves. Flat Start is implemented in HTK3 in the HCompV tool. The problem of Viterbi training is that it requires labeled data, while Flat Start does not.
Random Initialisation with Random Restarts
Viterbi Training needs labeled data and Flat Start is a simple initialization but can give a bad initial estimate. Since Baum Welch is dependent on the initial starting condition, we can also use random initializations with random restarts. In that way we can initialize the transition matrix by setting all possible transitions to random values. The Gaussians can be initialized as in Flat Start and then we add some small random number to the mean. We construct multiple random initializations and run Baum Welch. In the end we keep the model with the best likelihood. We can also use a random subset for the initial Gaussians. Unfortunately there is no implementation for this method in HTK. In my experiments this initialization works best in the scenario where no labels are available and Flat Start gives a bad estimates.