EEL6586: Automatic Speech Processing - HW#5: HMM for Digit Recognition - Prof. John Gregor, Assignments of Electrical and Electronics Engineering

A homework assignment for a university course on automatic speech processing (EEL6586). The assignment involves using a hidden Markov model (HMM) to recognize the English digits 'zero' through 'nine'. Students are encouraged to use the H2M Matlab Toolbox for implementation. Tasks include drawing the state diagram, recognizing digits with the HMM, and analyzing misclassified utterances.

EEL6586: Automatic Speech Processing HW#5
EEL 6586: HW#5
Due Friday, April 9, 2004 in class. Late homework loses $e^{(\text{number of days late})} - 1$ percentage points. See the current late penalty at http://www.cnel.ufl.edu/hybrid/harris/latepoints.html
HMM-based Digit Recognition

In this part, you will investigate the implementation issues of a Hidden Markov Model (HMM) by using an HMM to recognize the English digits “zero” through “nine”. You may implement your own HMM, but we highly recommend that you download the H2M Matlab Toolbox by Olivier Cappé at http://www-sig.enst.fr/~cappe/h2m/h2m.html. Read the contents of this webpage to become familiar with H2M. Using it is as straightforward as using any other Matlab toolbox. The example files ex_basic.m and ex_sprec.m are handy.

[1] Draw the state diagram for a left-right HMM of the kind typically used for word recognition. What is a reasonable number of states to represent the digits considered in this problem? Explain. Also include the state transition matrix A and the initial state probability vector π, with reasonable values for both. How did you find these values? (Hint: consider the “hard” segmentation used in hmm_mint.m in H2M.)
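The following is a minimal Python sketch of the “hard” segmentation idea from the hint. It is not H2M code, and whether hmm_mint.m segments exactly this way is an assumption. The sketch cuts each training utterance into equal-length segments, one per state. It then counts frame-to-frame transitions in that segmentation and normalizes each row, which gives a left-right A with only self-loops and next-state transitions. π puts all of its mass on the first state. The function names and the per-utterance frame counts are hypothetical.

```python
def uniform_segmentation(num_frames, num_states):
    """Hard-assign each frame to a state by cutting the utterance into
    num_states equal-length segments (needs at least one frame per state)."""
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    return [t * num_states // num_frames for t in range(num_frames)]


def estimate_left_right(frame_counts, num_states):
    """Estimate a left-right transition matrix A and initial vector pi by
    counting transitions in the uniform segmentation of each utterance."""
    counts = [[0] * num_states for _ in range(num_states)]
    for num_frames in frame_counts:
        states = uniform_segmentation(num_frames, num_states)
        for prev, nxt in zip(states, states[1:]):
            counts[prev][nxt] += 1
    A = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No transitions observed out of this state: make it absorbing.
            A.append([1.0 if j == i else 0.0 for j in range(num_states)])
        else:
            A.append([c / total for c in row])
    # A left-right word model always starts in the first state.
    pi = [1.0] + [0.0] * (num_states - 1)
    return A, pi


if __name__ == "__main__":
    A, pi = estimate_left_right([60, 48, 72], num_states=6)
    for row in A:
        print(" ".join(f"{a:.2f}" for a in row))
    print("pi =", pi)
```

Consider a single 60-frame utterance and 6 states. Each state receives 10 frames, so states 1 to 5 get a self-loop probability of 0.9 and a forward probability of 0.1, and the last state is absorbing. Longer utterances push the self-loop probabilities closer to 1.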

[2] In this problem, you will perform “clean” speech recognition on a digits database using an HMM. The database of raw speech, available at http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw5.zip, is separated into TEST and TRAIN sets. See the README.TXT file in hw5.zip for details on the database format. For your HMM, use the following parameters: States=6, n_Iterations=10, and output observation pdfs that are single multivariate Gaussians with diagonal covariance matrices. You may use the speech features you used in HW4, though we recommend MFCC features (Malcolm Slaney’s mfcc.m code in the Auditory Toolbox, available at http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010). For parametric classifiers, it is typical to report two recognition scores: 1) recognition on the TRAIN data, and 2) recognition on the TEST data. Your HMM parameters were estimated from the TRAIN data, so recognition accuracy on the TRAIN data should be higher than on the unseen TEST data. The difference between these two scores indicates how well your classifier generalizes. Hand in both recognition results (note: these should be greater than 90% for this database), both confusion matrices (same format as in HW3), and a description of the features you used.
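The following minimal Python sketch shows how a trained word model scores an utterance. It runs the forward algorithm in the log domain with a single diagonal-covariance Gaussian per state, then picks the digit model with the highest log-likelihood. Training with EM (Baum-Welch) is not shown, and the sketch does not reproduce H2M's functions. Several choices here are assumptions of the sketch rather than details from the handout. Each model is a tuple (pi, A, means, variances) with one mean and one variance vector per state. Termination sums over all states, as in the textbook forward algorithm; some toolkits instead require ending in the last state.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping zero probabilities (forbidden transitions) to -inf."""
    return math.log(p) if p > 0 else -math.inf


def log_likelihood(frames, pi, A, means, variances):
    """Forward algorithm in the log domain: returns log P(frames | model)."""
    n = len(pi)
    log_A = [[safe_log(a) for a in row] for row in A]
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [safe_log(pi[j]) + log_gauss_diag(frames[0], means[j], variances[j])
             for j in range(n)]
    # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t)
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: sum over all states at the last frame
    return log_sum_exp(alpha)


def classify(frames, models):
    """Return the key (digit) whose model gives the highest log-likelihood."""
    return max(models, key=lambda digit: log_likelihood(frames, *models[digit]))
```

Working in the log domain matters because per-frame likelihoods multiplied over a whole word underflow quickly in floating point. Here `frames` is a list of feature vectors, for example MFCC frames, and `models` maps each digit 0-9 to its trained parameters.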

[3] From the results of part [2], you should find only a few (if any) incorrectly classified utterances. Find the utterances misclassified on the TEST data and report each utterance label (e.g., ti_00F8ST01) along with the class (digit) it was classified as. Comment on possible explanations for why these utterances were misclassified.
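The following minimal Python sketch tabulates the outputs of a recognizer into a confusion matrix and a percent-correct score for part [2], and lists the misclassified utterances for part [3]. Two points are assumptions rather than details from the handout. The first is the convention of rows for the true digit and columns for the recognized digit, since HW3's layout is not given here. The second is that digits are stored as integers 0-9. The utterance names in the test are made up.

```python
def confusion_matrix(true_labels, predicted, num_classes=10):
    """Rows index the true digit, columns the recognized digit."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1
    return cm


def percent_correct(cm):
    """Recognition score: diagonal total over grand total, in percent."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


def misclassified(names, true_labels, predicted):
    """List (utterance label, true digit, recognized digit) for every error."""
    return [(n, t, p) for n, t, p in zip(names, true_labels, predicted) if t != p]
```

Running the same functions on the TRAIN and TEST predictions gives both scores and both matrices. The gap between the two scores is the generalization indicator described in part [2].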

J.G. Harris, March 19, 2004

[4] Repeat [2], but instead of using “clean” TEST utterances, use TEST utterances with added white Gaussian noise (AWGN) at a global SNR of 20 dB. “Global” SNR is defined over an entire utterance. “Local” SNR instead sets each frame of speech to the same SNR, so high-energy frames receive more noise than low-energy frames. Local SNR is unrealistic: the noise level then depends on the speech, which violates the assumption that the noise and the utterance are independent. Report your recognition results on the noisy TEST data and the clean TRAIN data, along with your confusion matrices. Below is code you can use to add AWGN to each utterance:

```matlab
SNR = 20;  % target global SNR in dB (20 dB for part [4])

% Create a random sequence with a normal distribution (zero mean, unit variance).
% x is the "clean" time-domain utterance (whole word).
noise = randn(size(x));

% Find the energy of the utterance and of the noise:
energyX = sum(x.^2);
energyNoise = sum(noise.^2);

% Scale the noise so that 10*log10(energyX/(noiseAmp^2*energyNoise)) equals SNR:
noiseAmp = sqrt(energyX/energyNoise*10^(-SNR/10));

% Add the noise to the utterance:
x = x + noiseAmp*noise;
```

[5] Repeat [4], but now you are free to modify your algorithm to improve recognition performance. Consider varying the number of states in your HMM, the number of iterations used to train it, and the order in which you present the training data to the EM algorithm (see the extra credit below). Report recognition results on the TEST and TRAIN data along with the confusion matrices. As a ballpark figure, the TA reports results of around 40% correct.
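The following Python sketch is not part of the handout. It makes the global/local SNR distinction in part [4] concrete. `add_awgn_global` mirrors the Matlab snippet in part [4]: one noise amplitude for the whole utterance. `add_awgn_local` rescales the noise frame by frame, so quiet frames get proportionally less noise. The frame length and the function names are hypothetical choices for illustration, and the signals in the test are synthetic.

```python
import math
import random


def add_awgn_global(x, snr_db, rng):
    """Add white Gaussian noise with one amplitude for the whole utterance,
    chosen so that total signal energy / total noise energy hits snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    energy_x = sum(v * v for v in x)
    energy_noise = sum(n * n for n in noise)
    amp = math.sqrt(energy_x / energy_noise * 10 ** (-snr_db / 10))
    return [v + amp * n for v, n in zip(x, noise)]


def add_awgn_local(x, snr_db, frame_len, rng):
    """Set every frame to the same SNR: the noise level now tracks the
    speech energy, which is the unrealistic case described in part [4]."""
    y = []
    for start in range(0, len(x), frame_len):
        y.extend(add_awgn_global(x[start:start + frame_len], snr_db, rng))
    return y


def measured_snr_db(clean, noisy):
    """Global SNR of a noisy signal relative to its clean version, in dB."""
    signal = sum(v * v for v in clean)
    noise = sum((b - a) ** 2 for a, b in zip(clean, noisy))
    return 10 * math.log10(signal / noise)
```

Both functions can report 20 dB when measured globally. In a word with a loud vowel and a quiet consonant, however, only the global version leaves the quiet consonant buried in noise. That is the realistic condition the noisy TEST set is meant to simulate.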

Extra Credit

Using your HMM from part [5], classify the unlabeled data found at http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw5EC.zip. The only thing you know about the unlabeled test data is that it is “clean” speech corrupted with AWGN at 20 dB SNR. Since you don’t know what was said in each utterance, you cannot tabulate a confusion matrix. Instead, report your results as a column vector. The elements of your results vector correspond to the rows of the character matrix testNames, which contains the names of the unlabeled TEST utterances. Each element of your results vector is the class 0-9 output by your HMM in float format (not character format!). WHAT TO TURN IN: to get extra credit, store your results vector in a Matlab variable called resultsEC. Then save this variable to a .mat file named with your first initial followed by your last name (e.g., jharris.mat, lgu.mat). Mail your .mat file to the TA ([email protected]). Since grading will be done on a Windows PC, using the proper naming convention is CRITICAL! The students with the highest accuracy on the unlabeled set will receive bonus points.
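The following minimal Python sketch shows the bookkeeping for the submission. It orders the predictions to match the rows of testNames, stores each one as a float in a one-element row so the result is a column vector, and builds the file name. It assumes the predictions are held in a dictionary keyed by utterance name; both function names are hypothetical. Lowercasing the file name follows the examples (jharris.mat, lgu.mat) and is an assumption. In Matlab the file itself is written with `save('jharris.mat', 'resultsEC')`. From Python, `scipy.io.savemat('jharris.mat', {'resultsEC': results})` is one option.

```python
def build_results_ec(test_names, predictions):
    """Column vector (list of one-element rows) of float digit labels,
    in the same order as the rows of testNames."""
    # strip() drops blank padding, in case names came from a padded char matrix
    return [[float(predictions[name.strip()])] for name in test_names]


def ec_filename(first_name, last_name):
    """First initial followed by last name, lowercased as in the examples."""
    return (first_name[0] + last_name).lower() + ".mat"
```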