Hidden Markov Model with Gaussian Mixture Model emissions (GMMHMM)
The Hidden Markov Model (HMM) is a state-based statistical model that can be used to represent an individual class of observation sequences. The rough idea is that each state should correspond to one 'section' of the sequence.
For example, we can imagine creating a HMM to model head gestures of the 'nod' class. If a signal represents the \(y\)-position of the head during a nod, then each state of a five-state HMM would represent a different 'section' of the nod, and we would fit this HMM by training it on many \(y\)-position head gesture signals of the 'nod' class.
A single HMM is modeled by the GMMHMM class.
Parameters and Training
A HMM is completely determined by its parameters, which are explained below.
- Initial state distribution \(\boldsymbol{\pi}\): A probability distribution that dictates the probability of the HMM starting in each state.
- Transition probability matrix \(A\): A matrix whose rows each represent a probability distribution that dictates how likely the HMM is to transition to each state, given some current state.
- Emission probability distributions \(B\): A collection of \(M\) continuous multivariate probability distributions (one for each state) that each dictate the probability of the HMM generating an observation \(\mathbf{o}\), given some current state. Recall that we are generally considering multivariate observation sequences – that is, at time \(t\), we have an observation \(\mathbf{o}^{(t)}=\left(o_1^{(t)}, o_2^{(t)}, \ldots, o_D^{(t)}\right)\). The fact that the observations are multivariate necessitates a multivariate emission distribution. Sequentia uses a mixture of multivariate Gaussian distributions.
In order to learn these parameters, we must train the HMM on examples that are labeled with the class \(c\) that the HMM models. Denote the HMM that models class \(c\) as \(\lambda_c=(\boldsymbol{\pi}_c, A_c, B_c)\). We can use the Baum-Welch algorithm (an application of the Expectation-Maximization algorithm to HMMs) to fit \(\lambda_c\) and learn its parameters. This fitting is implemented by the fit() function.
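As a purely illustrative picture of the first two parameter groups, the initial state distribution and transition matrix of a small 3-state model might look like the following numpy arrays (hypothetical values, not taken from a fitted Sequentia model):

import numpy as np

# Initial state distribution: pi[i] = P(starting in state i)
pi = np.array([0.8, 0.15, 0.05])   # sums to 1

# Transition matrix: row i is P(next state = j | current state = i)
A = np.array([
    [0.7, 0.2, 0.1],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
])                                 # each row sums to 1

assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)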
Mixture Emissions
Assuming that a single multivariate Gaussian emission distribution is accurate and representative enough to model the observation vectors of every state of a HMM is often a very strong and naive assumption. A more powerful approach is to represent each emission distribution as a mixture of multiple multivariate Gaussian densities. The emission distribution for state \(m\), formed by a weighted mixture of \(K\) multivariate Gaussian densities, is defined as:
\[b_m(\mathbf{o}^{(t)}) = \sum_{k=1}^K c_k^{(m)} \mathcal{N}\big(\mathbf{o}^{(t)}\ ;\ \boldsymbol\mu_k^{(m)}, \Sigma_k^{(m)}\big)\]
where \(\mathbf{o}^{(t)}\) is an observation vector at time \(t\), \(c_k^{(m)}\) is a mixture weight such that \(\sum_{k=1}^K c_k^{(m)} = 1\) and \(\boldsymbol\mu_k^{(m)}\) and \(\Sigma_k^{(m)}\) are the mean vector and covariance matrix of the \(k^\text{th}\) mixture component of the \(m^\text{th}\) state, respectively.
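As a minimal sketch of this mixture density, the following snippet evaluates \(b_m(\mathbf{o}^{(t)})\) for a single state with \(K=2\) components in \(D=2\) dimensions (all parameter values are made up for illustration):

import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.6, 0.4])                 # mixture weights c_k, sum to 1
means = np.array([[0.0, 0.0], [2.0, 2.0]])     # mean vectors mu_k
covs = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariance matrices Sigma_k

o_t = np.array([1.0, 1.5])                     # observation vector at time t

# b_m(o) = sum_k c_k * N(o; mu_k, Sigma_k)
b_m = sum(w * multivariate_normal.pdf(o_t, mean=mu, cov=cov)
          for w, mu, cov in zip(weights, means, covs))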
Note that even when multiple Gaussian densities are not needed, the mixture weights can be adjusted so that irrelevant components are effectively omitted and only a single Gaussian remains. Nevertheless, the default setting of the GMMHMM class is a single Gaussian component.
A GMM-HMM is then completely determined by \(\lambda=(\boldsymbol{\pi}, A, B)\), where \(B\) is a collection of \(M\) emission distributions (one for each state \(m=1,\ldots,M\)), each parameterized by a collection of:
- mixture weights \(c_1^{(m)}, \ldots, c_K^{(m)}\),
- mean vectors \(\boldsymbol\mu_1^{(m)}, \ldots, \boldsymbol\mu_K^{(m)}\),
- covariance matrices \(\Sigma_1^{(m)}, \ldots, \Sigma_K^{(m)}\),
for each of the \(k=1,\ldots,K\) mixture components of each state.
If \(K\) is large enough, a mixture of \(K\) Gaussian densities can effectively model almost any probability density function. With a large enough \(K\), we can also restrict the form of the covariance matrices and still obtain good approximations, while decreasing the number of parameters that need to be updated during Baum-Welch.
The covariance matrix type can be specified by a string parameter covariance_type in the GMMHMM constructor, which takes the values 'spherical', 'diag', 'full' or 'tied'. The constraint each type places on the covariances is sketched below.
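Here is a rough numpy illustration of the constraint each type places on a covariance matrix, for a single Gaussian with \(D=3\) features (the exact sharing semantics of 'tied' follow hmmlearn's conventions and are noted here only as an assumption):

import numpy as np

D = 3  # number of features

# 'spherical': a single variance shared by all features, Sigma = s * I
spherical = 2.0 * np.eye(D)

# 'diag': one variance per feature, Sigma = diag(v)
diag = np.diag([1.0, 0.5, 2.0])

# 'full': any symmetric positive definite matrix
full = np.array([[1.0, 0.2, 0.1],
                 [0.2, 0.8, 0.3],
                 [0.1, 0.3, 1.5]])

# 'tied': a single 'full' matrix shared across mixture components
# (assumption: sharing follows hmmlearn's GMMHMM convention)
tied = full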
Model Topologies
As we usually wish to preserve the natural ordering of time, we normally want to prevent our HMM from transitioning to previous states. This restriction leads to what is known as a left-right HMM, the most commonly used type of HMM for sequential modeling. Mathematically, a left-right HMM is defined by an upper-triangular transition matrix.
A linear topology is one in which transitions are only permitted to the current state and the next state, i.e. no state-jumping is permitted.
If we allow transitions to any state at any time, this HMM topology is known as ergodic.
Note
Ergodicity is mathematically defined as having a transition matrix with no zero entries.
Using the ergodic topology in Sequentia will still permit zero entries in the transition matrix, but will issue a warning stating that those probabilities will not be learned.
Sequentia offers all three topologies, specified by a string parameter topology in the GMMHMM constructor that takes the values 'ergodic', 'left-right' or 'linear'.
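To make the permitted transitions concrete, here is a small numpy sketch of uniform transition matrices under each topology for \(M=4\) states (hypothetical values; Sequentia constructs these internally from the topology parameter):

import numpy as np

M = 4  # number of states

# 'ergodic': transitions between any pair of states are permitted
ergodic = np.full((M, M), 1.0 / M)

# 'left-right': no transitions to earlier states (upper-triangular rows)
left_right = np.triu(np.ones((M, M)))
left_right /= left_right.sum(axis=1, keepdims=True)

# 'linear': only self-transitions and transitions to the next state
linear = np.zeros((M, M))
for i in range(M - 1):
    linear[i, i] = linear[i, i + 1] = 0.5
linear[M - 1, M - 1] = 1.0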
Making Predictions
A score for how likely a HMM is to generate an observation sequence is given by the Forward algorithm. It calculates the likelihood \(\mathbb{P}(O|\lambda_c)\) of the HMM \(\lambda_c\) generating the observation sequence \(O\).
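To illustrate what this computes, here is a minimal log-space implementation of the forward recursion (a generic numpy sketch, not Sequentia's internal code; the inputs are assumed to be precomputed log-probabilities):

import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B):
    # log_pi: (M,) log initial state probabilities
    # log_A:  (M, M) log transition probabilities
    # log_B:  (T, M) log emission densities, log_B[t, m] = log b_m(o^(t))
    T, M = log_B.shape
    alpha = log_pi + log_B[0]  # initialization: alpha_1(i) = pi_i * b_i(o^(1))
    for t in range(1, T):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o^(t)), in log space
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(alpha)  # log P(O | lambda)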
Note
The likelihood does not account for the fact that a particular observation class may occur more or less frequently than other observation classes. Once a group of GMMHMM objects (represented by a HMMClassifier) is created and configured, this can be accounted for by calculating the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\) and using this score to classify instead (i.e. the Maximum A Posteriori classification rule). The addition of the prior term \(\mathbb{P}(\lambda_c)\) accounts for some classes occurring more frequently than others.
Sequentia provides support for uniform priors (equivalent to just using the likelihood to classify), class frequency priors, and also allows custom prior probabilities to be specified for each class.
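For example, the documented prior options of predict() (see the API reference below) can be passed as follows, assuming a fitted HMMClassifier clf and an observation sequence x:

# Uniform prior: classification by likelihood alone
y_pred = clf.predict(x, prior='uniform')

# Class frequency prior (the default)
y_pred = clf.predict(x, prior='frequency')

# Custom prior probabilities, one per class (should sum to 1)
y_pred = clf.predict(x, prior=[0.1, 0.3, 0.6])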
See also
See the HMMClassifier class to understand how the likelihood \(\mathbb{P}(O|\lambda_c)\) or joint probability \(\mathbb{P}(O, \lambda_c)\) is used to make a prediction for \(O\).
Example
import librosa
from sequentia.preprocessing import Compose, Custom, Standardize
from sequentia.classifiers import GMMHMM
from sequentia.datasets import load_digits

# Class to be represented by the HMM
digit = 4

# Load the FSDD dataset and split into training and testing
# (only fetch recordings of digit 4)
dataset = load_digits(numbers=(digit,))
train_set, test_set = dataset.split(split_size=0.2, stratify=True, shuffle=True)

# Set MFCC configuration
spec_kwargs = {'sr': 8000, 'n_mfcc': 5, 'n_fft': 1024, 'hop_length': 256, 'power': 2}

# Create preprocessing pipeline
transforms = Compose([
    Custom(lambda x: librosa.feature.mfcc(x.flatten(), **spec_kwargs).T, name='MFCCs', desc='Generate MFCCs'),
    Standardize()
])

# Apply transformations to the training and test set
train_set.X = transforms(train_set.X)
test_set.X = transforms(test_set.X)

# Create a linear HMM to represent the digit 4, with 3 states and
# 5 components in the GMM emission state distributions
hmm = GMMHMM(label=digit, n_states=3, n_components=5, topology='linear')
# Set a random initial state distribution and transition matrix according to the linear topology
hmm.set_random_initial()
hmm.set_random_transitions()
# Fit the HMM on recordings of the digit 4
hmm.fit(train_set.X)

# Calculate the forward probability for a new sequence (likelihood of being generated by this HMM)
y0_forward = hmm.forward(test_set.X[0])
For more elaborate examples, please have a look at the example notebooks.
API reference
- class sequentia.classifiers.hmm.GMMHMM(label, n_states, n_components=1, covariance_type='full', topology='left-right', random_state=None) [source]
  A hidden Markov model with multivariate Gaussian mixture emissions, representing a single sequence class.
  Parameters
  - label: str or numeric
    A label for the model, corresponding to the class being represented.
  - n_states: int > 0
    The number of states for the model.
  - n_components: int > 0
    The number of mixture components used in the emission distribution for each state.
  - covariance_type: {'spherical', 'diag', 'full', 'tied'}
    The covariance matrix type for emission distributions.
  - topology: {'ergodic', 'left-right', 'linear'}
    The topology for the model.
  - random_state: numpy.random.RandomState, int, optional
    A random state object or seed for reproducible randomness.
  Attributes
  - label (property): str or numeric
    The label for the model.
  - model (property): hmmlearn.hmm.GMMHMM
    The underlying GMMHMM model from hmmlearn.
  - n_states (property): int
    The number of states for the model.
  - n_components (property): int
    The number of mixture components used in the emission distribution for each state.
  - covariance_type (property): str
    The covariance matrix type for emission distributions.
  - frozen (property): set (str)
    The frozen parameters of the HMM or its GMM emission distributions (see freeze()).
  - n_seqs_ (property): int
    The number of observation sequences used to train the model.
  - initial_ (property/setter): numpy.ndarray (float)
    The initial state distribution of the model.
  - transitions_ (property/setter): numpy.ndarray (float)
    The transition matrix of the model.
  - weights_ (property): numpy.ndarray (float)
    The mixture weights of the GMM emission distributions.
  - means_ (property): numpy.ndarray (float)
    The mean vectors of the GMM emission distributions.
  - covars_ (property): numpy.ndarray (float)
    The covariance matrices of the GMM emission distributions.
  - monitor_ (property): hmmlearn.base.ConvergenceMonitor
    The convergence monitor for the Baum–Welch algorithm.
  - set_uniform_initial() [source]
    Sets a uniform initial state distribution \(\boldsymbol{\pi}=(\pi_1,\pi_2,\ldots,\pi_M)\) where \(\pi_i=1/M\quad\forall i\).
  - set_random_initial() [source]
    Sets a random initial state distribution by sampling \(\boldsymbol{\pi}\sim\mathrm{Dir}(\mathbf{1}_M)\), where
    - \(\boldsymbol{\pi}=(\pi_1,\pi_2,\ldots,\pi_M)\) are the initial state probabilities for each state,
    - \(\mathbf{1}_M\) is a vector of \(M\) ones used as the concentration parameters for the Dirichlet distribution.
  - set_uniform_transitions() [source]
    Sets a uniform transition matrix according to the topology, so that given the HMM is in state \(i\), all permissible transitions (i.e. such that \(p_{ij}\neq0\)) \(\forall j\) are equally probable.
  - set_random_transitions() [source]
    Sets a random transition matrix according to the topology, so that given the HMM is in state \(i\), the out-going transition probabilities \(\mathbf{p}_i=(p_{i1},p_{i2},\ldots,p_{iM})\) from state \(i\) are generated by sampling \(\mathbf{p}_i\sim\mathrm{Dir}(\mathbf{1})\), with a vector of ones of appropriate size used as concentration parameters, so that only transitions permitted by the topology are non-zero.
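    As a small numpy sketch of this sampling for a linear topology (illustrative only, not Sequentia's internal code):

    import numpy as np

    rng = np.random.default_rng(0)

    # Under a linear topology, state i may only transition to states i and i+1,
    # so each interior row has two free probabilities sampled from Dir(1, 1).
    M = 4
    A = np.zeros((M, M))
    for i in range(M - 1):
        A[i, i:i + 2] = rng.dirichlet(np.ones(2))
    A[M - 1, M - 1] = 1.0  # the final state can only transition to itself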
  - fit(X) [source]
    Fits the HMM to observation sequences assumed to be labeled as the class that the model represents.
    Parameters
    - X: list of numpy.ndarray (float)
      Collection of multivariate observation sequences, each of shape \((T \times D)\) where \(T\) may vary per observation sequence.
  - forward(x) [source]
    Runs the forward algorithm to calculate the (log) likelihood of the model generating an observation sequence.
    Parameters
    - x: numpy.ndarray (float)
      An individual sequence of observations of size \((T \times D)\), where \(T\) is the number of time frames (or observations) and \(D\) is the number of features.
    Returns
    - log-likelihood: float
      The log-likelihood of the model generating the observation sequence.
  - freeze(params=None) [source]
    Freezes the specified parameters of the HMM or its GMM emission distributions, preventing them from being updated during the Baum–Welch algorithm.
    Parameters
    - params: str, optional
      A string specifying which parameters to freeze. Can contain any combination of:
      - 's' for initial state probabilities (HMM parameters),
      - 't' for transition probabilities (HMM parameters),
      - 'm' for mean vectors (GMM emission distribution parameters),
      - 'c' for covariance matrices (GMM emission distribution parameters),
      - 'w' for mixing weights (GMM emission distribution parameters).
      Defaults to all parameters, i.e. 'stmcw'.
    See also
    - unfreeze: Unfreezes parameters of the HMM or its GMM emission distributions.
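    For instance, to re-estimate only the GMM emission parameters during fitting while keeping the HMM parameters fixed (a usage sketch based on the parameter codes above):

    hmm.freeze('st')    # freeze initial state and transition probabilities
    hmm.fit(train_set.X)
    hmm.unfreeze('st')  # allow them to be updated again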
  - unfreeze(params=None) [source]
    Unfreezes the specified parameters of the HMM or its GMM emission distributions which were frozen with freeze(), allowing them to be updated during the Baum–Welch algorithm.
    Parameters
    - params: str, optional
      A string specifying which parameters to unfreeze. Can contain any combination of:
      - 's' for initial state probabilities (HMM parameters),
      - 't' for transition probabilities (HMM parameters),
      - 'm' for mean vectors (GMM emission distribution parameters),
      - 'c' for covariance matrices (GMM emission distribution parameters),
      - 'w' for mixing weights (GMM emission distribution parameters).
      Defaults to all parameters, i.e. 'stmcw'.
    See also
    - freeze: Freezes parameters of the HMM or its GMM emission distributions.
Hidden Markov Model Classifier (HMMClassifier)
Multiple HMMs can be combined to form a multi-class classifier. To classify a new observation sequence \(O'\), this works by:
1. Creating and training the HMMs \(\lambda_1, \lambda_2, \ldots, \lambda_C\).
2. Calculating the likelihoods \(\mathbb{P}(O'|\lambda_1), \mathbb{P}(O'|\lambda_2), \ldots, \mathbb{P}(O'|\lambda_C)\) of each model generating \(O'\).
3. Scaling the likelihoods by the priors \(\mathbb{P}(\lambda_1), \mathbb{P}(\lambda_2), \ldots, \mathbb{P}(\lambda_C)\), producing un-normalized posteriors \(\mathbb{P}(O'|\lambda_c)\mathbb{P}(\lambda_c)\).
4. Performing MAP classification by choosing the class represented by the HMM with the highest posterior – that is, \(c'=\mathop{\arg\max}_{c\in\{1,\ldots,C\}}{\mathbb{P}(O'|\lambda_c)\mathbb{P}(\lambda_c)}\).
These steps are summarized in the sketch below.
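A minimal numpy sketch of steps 2–4 in log space, using hypothetical scores (illustrative values only):

import numpy as np

# Hypothetical log-likelihoods log P(O'|lambda_c) from C = 3 fitted HMMs
log_likelihoods = np.array([-310.2, -295.7, -402.9])

# Log priors log P(lambda_c), e.g. class frequencies from the training set
log_priors = np.log(np.array([0.5, 0.3, 0.2]))

# Un-normalized log posteriors and the MAP decision
scores = log_likelihoods + log_priors
c_pred = int(np.argmax(scores))  # index of the predicted class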
Example
import librosa
from sequentia.preprocessing import Compose, Custom, Standardize
from sequentia.classifiers import GMMHMM, HMMClassifier
from sequentia.datasets import load_digits

# Load the FSDD dataset and split into training and testing
dataset = load_digits()
train_set, test_set = dataset.split(split_size=0.2, stratify=True, shuffle=True)

# Set MFCC configuration
spec_kwargs = {'sr': 8000, 'n_mfcc': 5, 'n_fft': 1024, 'hop_length': 256, 'power': 2}

# Create preprocessing pipeline
transforms = Compose([
    Custom(lambda x: librosa.feature.mfcc(x.flatten(), **spec_kwargs).T, name='MFCCs', desc='Generate MFCCs'),
    Standardize()
])

# Apply transformations to the training and test set
train_set.X = transforms(train_set.X)
test_set.X = transforms(test_set.X)

# Create and fit a HMM for each class - only training on the sequences that belong to that class
hmms = []
for sequences, label in train_set.iter_by_class():
    # Create a linear HMM with 3 states and 5 components in the GMM emission state distributions
    hmm = GMMHMM(label=label, n_states=3, n_components=5, topology='linear')
    # Set a random initial state distribution and transition matrix according to the linear topology
    hmm.set_random_initial()
    hmm.set_random_transitions()
    # Fit each HMM only on the observation sequences which had that label
    hmm.fit(sequences)
    hmms.append(hmm)

# Fit the classifier on the trained HMMs
clf = HMMClassifier().fit(hmms)

# Make a single prediction
y0_pred = clf.predict(test_set.X[0])

# Make multiple predictions
y_pred = clf.predict(test_set.X)

# Make multiple predictions and return class scores (with parallelization)
y_pred, y_pred_scores = clf.predict(test_set.X, return_scores=True, n_jobs=-1)

# Calculate accuracy and generate confusion matrix (with parallelization)
accuracy, confusion = clf.evaluate(test_set.X, test_set.y, n_jobs=-1)
For more elaborate examples, please have a look at the example notebooks.
API reference
- class sequentia.classifiers.hmm.HMMClassifier [source]
  A classifier that combines individual GMMHMM objects, each of which models sequences from a different class.
  Attributes
  - models_ (property): list of GMMHMM
    A collection of the GMMHMM objects to use for classification.
  - encoder_ (property): sklearn.preprocessing.LabelEncoder
    The label encoder fitted on the set of classes provided during instantiation.
  - classes_ (property): numpy.ndarray (str/numeric)
    The complete set of possible classes/labels.
  - fit(models) [source]
    Fits the classifier with a collection of GMMHMM objects.
    Parameters
    - models: array-like of GMMHMM
      A collection of GMMHMM objects to use for classification.
  - predict(X, prior='frequency', return_scores=False, original_labels=True, verbose=True, n_jobs=1) [source]
    Predicts the label for an observation sequence (or multiple sequences) according to maximum likelihood or posterior scores.
    Parameters
    - X: numpy.ndarray (float) or list of numpy.ndarray (float)
      An individual observation sequence or a list of multiple observation sequences.
    - prior: {'frequency', 'uniform'} or array-like of float
      How the prior probability for each model is calculated to perform MAP estimation by scoring with the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\).
      - 'frequency': Calculate the prior probability \(\mathbb{P}(\lambda_c)\) as the proportion of training examples in class \(c\).
      - 'uniform': Set the priors uniformly such that \(\mathbb{P}(\lambda_c)=\frac{1}{C}\) for each class \(c\in\{1,\ldots,C\}\) (equivalent to ignoring the prior).
      Alternatively, class prior probabilities can be specified in an iterable of floats, e.g. [0.1, 0.3, 0.6].
    - return_scores: bool
      Whether to return the scores of each model on the observation sequence(s).
    - original_labels: bool
      Whether to inverse-transform the labels to their original encoding.
    - verbose: bool
      Whether to display a progress bar or not.
      Note: If both verbose=True and n_jobs > 1, then the progress bars for each process are always displayed in the console, regardless of where you are running this function from (e.g. a Jupyter notebook).
    - n_jobs: int > 0 or -1
      The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.
    Returns
    - prediction(s): str/numeric or numpy.ndarray (str/numeric)
      The predicted label(s) for the observation sequence(s). If original_labels is true, then the returned labels are inverse-transformed into their original encoding.
    - scores: numpy.ndarray (float)
      An \(N\times M\) matrix of scores (un-normalized log posteriors), for each of the \(N\) observation sequences, for each of the \(M\) HMMs. Only returned if return_scores is true.
  - evaluate(X, y, prior='frequency', verbose=True, n_jobs=1) [source]
    Evaluates the performance of the classifier on a batch of observation sequences and their labels.
    Parameters
    - X: list of numpy.ndarray (float)
      A list of multiple observation sequences.
    - y: array-like of str/numeric
      An iterable of labels for the observation sequences.
    - prior: {'frequency', 'uniform'} or array-like of float
      How the prior probability for each model is calculated to perform MAP estimation by scoring with the joint probability (or un-normalized posterior) \(\mathbb{P}(O, \lambda_c)=\mathbb{P}(O|\lambda_c)\mathbb{P}(\lambda_c)\).
      - 'frequency': Calculate the prior probability \(\mathbb{P}(\lambda_c)\) as the proportion of training examples in class \(c\).
      - 'uniform': Set the priors uniformly such that \(\mathbb{P}(\lambda_c)=\frac{1}{C}\) for each class \(c\in\{1,\ldots,C\}\) (equivalent to ignoring the prior).
      Alternatively, class prior probabilities can be specified in an iterable of floats, e.g. [0.1, 0.3, 0.6].
    - verbose: bool
      Whether to display a progress bar or not.
      Note: If both verbose=True and n_jobs > 1, then the progress bars for each process are always displayed in the console, regardless of where you are running this function from (e.g. a Jupyter notebook).
    - n_jobs: int > 0 or -1
      The number of jobs to run in parallel. Setting this to -1 will use all available CPU cores.
    Returns
    - accuracy: float
      The categorical accuracy of the classifier on the observation sequences.
    - confusion: numpy.ndarray (int)
      The confusion matrix representing the discrepancy between predicted and actual labels.
  - save(path) [source]
    Serializes the HMMClassifier object by pickling.
    Parameters
    - path: str
      File path (usually with a .pkl extension) to store the serialized HMMClassifier object.
  - classmethod load(path) [source]
    Deserializes a HMMClassifier object which was serialized with the save() method.
    Parameters
    - path: str
      File path of the serialized data generated by the save() method.
    Returns
    - deserialized: HMMClassifier
      The deserialized HMM classifier object.
Source: https://sequentia.readthedocs.io/en/latest/sections/classifiers/gmmhmm.html