Results obtained in speaker recognition
using Gaussian Mixture Models
Marieta Gâta*, Gavril Toderean**
*North University of Baia Mare
**Technical University of Cluj Napoca
1. Introduction
• a speaker identification system based on Gaussian Mixture Models (GMM) - good performance for text-independent speech and short test utterances
• the speaker recognition technique used is based on GMM
• the approach consists of three phases:
  • parameterization
  • model training
  • classification
• compare a model of the speech of an unknown speaker with the models of the speakers in our database
• in the training process, use the EM (Expectation-Maximization) algorithm to fit the GMMs
• study the influence of different parameters on the GMM system's performance:
  • number of mixture components
  • amount of training data (length of the wav file in seconds)
  • number of iterations
• probability density function consisting of a maximum of 12 mixtures
• for M speakers: find the speaker model with the maximum posterior probability for the input vector sequence
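The three phases above (train one GMM per enrolled speaker with EM, then classify by the model with the highest likelihood, which equals the maximum posterior under equal speaker priors) can be sketched with scikit-learn's `GaussianMixture`. The toy random features and the speaker names below are illustrative stand-ins, not the paper's data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for MFCC feature vectors: one cluster per "speaker".
# (The real system would use 12th-order MFCCs from each utterance.)
train = {
    "spk1": rng.normal(loc=0.0, scale=1.0, size=(500, 12)),
    "spk2": rng.normal(loc=4.0, scale=1.0, size=(500, 12)),
}

# Training phase: fit one diagonal-covariance GMM per speaker via EM.
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          max_iter=50, random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify(test_feats):
    """Classification phase: return the speaker whose model gives the
    highest average log-likelihood over the test feature sequence."""
    scores = {name: gmm.score(test_feats) for name, gmm in models.items()}
    return max(scores, key=scores.get)

# A test utterance drawn from speaker 2's distribution.
print(identify(rng.normal(loc=4.0, scale=1.0, size=(200, 12))))  # -> spk2
```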
2. Speech Database
• systems evaluated with a Romanian speech database
• number of speakers = 200 (123 male and 77 female), different age classes (students aged 18-22)
• each speaker: 4 sentences (2 for training & 2 for testing) => the data used for training and testing are different
• speakers recorded in 2-3 sessions (time between sessions < 1 month)
• speech is clean (laboratory background), recorded with a microphone and sampled at 22 kHz, 16 bit, mono
• length of the training & testing sentences: from 4 to 10 seconds
• training sentences are:
  • “Un numar de telefon este format din cifrele zero unu doi trei patru cinci sase sapte opt noua zece” (“A telephone number is made up of the digits zero one two three four five six seven eight nine ten”)
  • “Principalele operatii matematice sunt adunarea scaderea inmultirea si impartirea” (“The main mathematical operations are addition, subtraction, multiplication and division”)
• testing sentences are:
  • “Numarul meu de telefon este patru zero doi sase doi unu doi trei patru cinci” (“My telephone number is four zero two six two one two three four five”)
  • “Automobilul meu atinge o viteza de o suta optzeci de km pe ora” (“My car reaches a speed of one hundred and eighty km per hour”)
• feature vectors: 12th-order MFCC (Mel Frequency Cepstral Coefficients), obtained from 20 mel-wrapping filter banks
• first experiment - relation between number of mixture components and recognition performance
• second experiment - relation between amount of training data (length of the wav files), number of mixture components and recognition performance
• third and fourth experiments - relation between number of iterations, number of mixture components and recognition performance
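The front end described above (12th-order MFCCs from 20 mel-wrapping filter banks at 22 kHz) can be sketched in NumPy for a single frame. The 512-point FFT, Hamming window, and frame handling are assumptions for illustration; the paper's exact analysis parameters are not stated:

```python
import numpy as np

def mel_filterbank(n_filters=20, n_fft=512, sr=22050):
    """Triangular mel filterbank: n_filters triangles spaced evenly
    on the mel scale between 0 Hz and sr/2."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(0.0, mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, fb, n_ceps=12):
    """12th-order MFCCs for one frame: windowed power spectrum ->
    mel filterbank energies -> log -> DCT-II, keeping 12 coefficients."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=512)) ** 2
    logmel = np.log(fb @ spec + 1e-10)
    n = len(logmel)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1),
                                  np.arange(n) + 0.5) / n)
    return dct @ logmel

fb = mel_filterbank()
frame = np.random.default_rng(0).normal(size=512)
print(mfcc_frame(frame, fb).shape)  # (12,)
```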
3. Relation between number of mixture components and recognition
performance
• GMM with a full covariance matrix - the most complex model
• use a simplified form in which each component consists of a diagonal covariance matrix, a mean vector and a weight
• models tested => results in Figure 1 (the training process used 10 seconds of speech)
• better results with larger models (with more components)
Figure 1. GMM with diagonal covariance matrix
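A quick parameter count shows why the diagonal simplification matters: with few seconds of training speech there are not enough feature vectors to estimate full covariance matrices reliably. The formula below is standard GMM bookkeeping, not taken from the paper:

```python
# Parameters in a D-dimensional, M-component GMM:
#   means (M*D) + covariances + free mixture weights (M-1).
# Full covariance stores D*(D+1)/2 values per component,
# the diagonal form only D.
def gmm_params(M, D, diagonal=True):
    cov = D if diagonal else D * (D + 1) // 2
    return M * (D + cov) + (M - 1)

print(gmm_params(12, 12, diagonal=True))   # 299
print(gmm_params(12, 12, diagonal=False))  # 1091
```

For the paper's setting (12 MFCCs, up to 12 components), the diagonal model needs fewer than a third of the parameters of the full-covariance model.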
4. Relation between different amounts of training data, number of mixture components and recognition performance
• a GMM is tested with models trained on wav files of 4, 6 and 10 seconds
• recognition results increase with the number of mixture components and with the amount of training data
• results in Figure 2, recognition scores in Table 1 (best recognition results: M=12)
• growing length of training data generates the best recognition results at the right of the figure (higher number of components in the model)
• the recognition error is lower in the right part of the figure than in the left part
• a small amount of training data (4 and 6 seconds) is not ideal for the GMM method
• best results were achieved with 12 GMM components and wav files of 10 seconds
Figure 2. Performance curves obtained from GMM models trained with different amounts of data

Amount of training speech | M=2 | M=4 | M=6 | M=8 | M=10 | M=12   ([%] correct)
4 seconds                 | 45  | 55  | 67  | 72  | 74   | 76
6 seconds                 | 46  | 57  | 69  | 73  | 75   | 78
10 seconds                | 48  | 58  | 71  | 74  | 77   | 79

Table 1. GMM identification performance for different amounts of training data and model orders
5. Relation between number of iterations, number of mixture components and recognition performance
• study the influence of EM iterations on the recognition score - scores improve up to 10 iterations; 50 iterations are recommended
• results in Figure 3 and Figure 4
Figure 3. Influence of EM iterations on recognition performance obtained from GMM with diagonal
covariance matrix for models with 4, 6, 8, 10, 12 number of mixture components
Figure 4. Influence of EM iterations on recognition performance obtained from GMM with diagonal
covariance matrix for models with 4, 8, 12 number of mixture components
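The effect of the EM iteration budget can be reproduced on toy data with scikit-learn's `GaussianMixture` (an assumed substitute for the paper's own EM implementation): since each EM step never decreases the likelihood, the average log-likelihood of the fitted model rises with the number of iterations and then flattens out:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy 12-dimensional multi-modal features, standing in for MFCC vectors.
feats = np.vstack([rng.normal(loc=c, scale=1.0, size=(300, 12))
                   for c in (0, 3, 6)])

# Fit the same diagonal-covariance GMM with growing EM iteration budgets;
# tol=0.0 disables early stopping so max_iter is the actual iteration count.
for n_iter in (1, 10, 50):
    gmm = GaussianMixture(n_components=6, covariance_type="diag",
                          max_iter=n_iter, tol=0.0,
                          random_state=0).fit(feats)
    print(n_iter, round(gmm.score(feats), 3))  # average log-likelihood
```

With identical initialization, the run with more iterations simply continues the same EM trajectory, so the scores are non-decreasing across the three fits.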
6. Conclusions
• the performance of the method is good
• maximizing the use of speaker data (maximizing the size of the model) => improved speaker recognition
• a model that is too large for a small amount of training data => reduced performance of the recognition system
• best performance obtained with:
  • 12 mixture components in the GMM
  • 50 iterations of the EM process
  • wav files of 10 seconds