Speaker Verification System using SVM
Jun-Won Suh
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering
Research Activities
• Software Release
 – Tcl/Tk Search Demo debugging
 – Language Model Tester
 – Diagnose method for the LanguageModel classes
• Speaker Verification System
 – Set up parameters for isip_verify
 – Run the isip_verify utility to build the SVM baseline
 – Propose techniques to improve the system (thesis topic)
• Overall
 – Speed up my research to graduate in December.
Research Progress: Jun-Won Suh
Page 1 of 12
SVM baseline DET curve
• Some minor changes in the front end yield slightly better
results.
Classifying sequences using score-space kernels
• The score-space kernel enables SVMs to classify whole
sequences.
• A variable length sequence of input vectors is mapped
explicitly onto a single point in a space of fixed dimension.
• The score-space is derived from the likelihood score.
$\psi_{\hat{F}}(X) = \hat{F}\, f(\{ p_k(X \mid M_k, \theta_k) \}), \quad X = \{x_1, \ldots, x_N\}$
• Score-argument, $f(\{ p_k(X \mid \theta_k) \})$, which is a function of the
scores of a set of generative models.
• Score-mapping operator, $\hat{F}$, which maps the scalar score-argument to
the score-space. Choosing the first-derivative operator, the gradient of the
log likelihood with respect to a parameter describes how that parameter
contributes to generating a particular speaker model.
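The first-derivative mapping above can be sketched in code. The snippet below computes the gradient of a GMM sequence log-likelihood with respect to the mixture means only; the GMM parameter values and the two-component setup are illustrative assumptions, not the original system's configuration.

```python
import numpy as np

# Hypothetical diagonal-covariance GMM; parameter values are
# illustrative only, not from the original system.
weights = np.array([0.6, 0.4])                   # priors a_j
means = np.array([[0.0, 0.0], [2.0, 2.0]])       # means mu_j
variances = np.array([[1.0, 1.0], [1.0, 1.0]])   # variances sigma_j^2

def gmm_posteriors(X):
    """Per-frame component posteriors gamma_j(x_l)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                        + (diff ** 2 / variances).sum(axis=2))
    log_joint = np.log(weights) + log_gauss
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def mean_score_vector(X):
    """Gradient of the sequence log-likelihood w.r.t. the GMM means:
    d log p(X) / d mu_kj = sum_l gamma_j(x_l) (x_lk - mu_kj) / sigma_kj^2.
    The result has fixed length N_g * N_d regardless of sequence length."""
    gamma = gmm_posteriors(X)                 # (N_l, N_g)
    diff = X[:, None, :] - means[None, :, :]  # (N_l, N_g, N_d)
    return (gamma[:, :, None] * diff / variances).sum(axis=0).ravel()

# A short "utterance" of 3 frames maps to a single fixed-length point.
X = np.array([[0.1, -0.2], [1.9, 2.1], [0.0, 0.3]])
psi = mean_score_vector(X)  # shape (4,): N_g * N_d = 2 * 2
```

This is exactly the "variable-length sequence to fixed-dimension point" property: any number of frames produces a score vector of the same dimension.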
Computing the score-space vectors
Define the global likelihood of a sequence X = {x1, …, xNl} under a GMM with
parameters
$\theta = \{ a_j, \mu_{kj}, \sigma_{kj} \}$
where Ng is the number of Gaussians that make up the mixture model and Nd is
the dimensionality of the input vectors.
Given this sequence, the derivatives are taken with respect to the priors,
means, and covariances of the GMM.
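The likelihood equation on this slide was an image and did not survive extraction; the standard GMM form it presumably stated is:

```latex
\ln p(X \mid \theta) = \sum_{l=1}^{N_l} \ln \sum_{j=1}^{N_g} a_j \, p_j(x_l \mid \mu_j, \sigma_j)
```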
Computing the score-space vectors
The derivative with respect to the jth prior is,
The derivative with respect to the kth components of the
jth mean is,
Lastly, the derivative with respect to the kth component of
the jth covariance is,
The fixed length score-space vector can be expressed as,
where j* runs over the Ng Gaussians of the GMM and k* runs over the Nd
dimensions of the input vectors.
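The derivative equations on this slide were also images and are missing. For a diagonal-covariance GMM they take the standard form found in the score-space literature (reconstructed here, so treat the exact normalization as an assumption), with $\gamma_j(x_l) = a_j p_j(x_l) / \sum_i a_i p_i(x_l)$ denoting the component posterior:

```latex
\frac{\partial \ln p(X \mid \theta)}{\partial a_j}
  = \sum_{l=1}^{N_l} \frac{\gamma_j(x_l)}{a_j},
\qquad
\frac{\partial \ln p(X \mid \theta)}{\partial \mu_{kj}}
  = \sum_{l=1}^{N_l} \gamma_j(x_l)\, \frac{x_{lk} - \mu_{kj}}{\sigma_{kj}^2},
\qquad
\frac{\partial \ln p(X \mid \theta)}{\partial \sigma_{kj}}
  = \sum_{l=1}^{N_l} \gamma_j(x_l)
    \left( \frac{(x_{lk} - \mu_{kj})^2}{\sigma_{kj}^3} - \frac{1}{\sigma_{kj}} \right)
```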
Computing the score-space vectors
Using the first-derivative score-operator with the same score-argument, the
mapping becomes
• This mapping has a minimum test performance equal to that of the original
generative model, M.
• The inclusion of the derivatives as “extra features” should give the
classifier additional information to use.
• An alternative score-argument is the ratio of two
generative models, M1 and M2,
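The ratio equation itself is missing from the extracted text; following the score-space literature, it is presumably the log-likelihood ratio:

```latex
f = \ln \frac{p(X \mid M_1)}{p(X \mid M_2)}
```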
Computing the score-space vectors
• The dimensionality of the score-space equals the total number of parameters
in the generative models; hence the SVM can classify complete utterance
sequences.
• The kernel is constructed using dot products in score-space,
$K(X, Y) = \psi(X)^T G\, \psi(Y)$
where G is the inverse Fisher information matrix in the log-likelihood
score-space mapping,
$G = \left( E\left[ \psi(X)\psi(X)^T \right] \right)^{-1}$
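The kernel construction can be sketched as follows. The score vectors here are random placeholders standing in for the GMM log-likelihood gradients, and the small ridge added before inversion is an assumption for numerical stability, not part of the original formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical score-space vectors for 50 training sequences
# (random placeholders for the GMM log-likelihood gradients).
train_scores = rng.normal(size=(50, 8))

# Empirical Fisher information E[psi psi^T], inverted to obtain G;
# a small ridge keeps the inversion well conditioned.
fisher = train_scores.T @ train_scores / len(train_scores)
G = np.linalg.inv(fisher + 1e-6 * np.eye(8))

def score_space_kernel(psi_x, psi_y):
    """K(X, Y) = psi(X)^T G psi(Y)."""
    return psi_x @ G @ psi_y
```

Because G is symmetric positive definite, the kernel is a valid dot product and can be supplied to an SVM as a precomputed kernel.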
One of MIT-LL approach: Phonetic SVM System
• Same high-level approach as the score-space method (prosody, word choice,
pronunciation, etc.)
• Using phone sequences extracted from the acoustic signal, the system
performs accurately on the speaker verification task.
• This technique uses a likelihood-ratio score-space kernel with no
derivative arguments.
Phone Sequence Extraction
• Phone sequence extraction for speaker recognition is performed using the
phone recognition system (PPRLM) designed by Zissman for language
identification.
• Each phone is modeled in a gender-dependent, context-independent manner
using a three-state HMM.
• Phone recognition is performed with a Viterbi search using a fully
connected null-grammar network on monophones (no explicit language model in
decoding).
• The phone sequences are vectorized by computing frequencies of N-grams.
Bag of N-grams
Produce N-grams by the standard transformation of the stream.
Example: for bigrams, the sequence of phones t1, t2, …, tn is transformed to
t1_t2, t2_t3, …, tn-1_tn. The unique unigrams and bigrams are designated
d1, …, dM and d1_d1, …, dM_dM. Then we calculate their probabilities and
joint probabilities.
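The transformation above can be sketched directly. The phone labels in the example are hypothetical; the function simply joins adjacent phones with an underscore and normalizes the counts into probabilities.

```python
from collections import Counter

def bag_of_ngrams(phones, n=2):
    """Map a phone stream t1..tn to relative N-gram frequencies.
    For bigrams the stream becomes t1_t2, t2_t3, ..., tn-1_tn, and
    each token count is normalized into a probability."""
    grams = ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# Hypothetical phone stream (illustrative only).
freqs = bag_of_ngrams(["t", "ah", "k", "ah", "t"])
# bigrams: t_ah, ah_k, k_ah, ah_t -> each with probability 0.25
```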
Kernel Construction
• A likelihood-ratio computation serves as the kernel.
• Suppose the sequences of N-grams in the two conversation sides are
t1, t2, …, tn and u1, u2, …, um. Also denote the unique set of N-grams as
d1, d2, …, dM.
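The kernel formula itself did not survive extraction. A common construction in phonetic SVM work scales each side's N-gram probability by the inverse square root of a background probability before taking the dot product; the exact weighting used on the original slide is an assumption here.

```python
import math

def likelihood_ratio_kernel(p_a, p_b, p_bg):
    """Sketch of a likelihood-ratio style kernel over bag-of-N-gram
    probabilities: each side's probability for N-gram d is divided by
    sqrt(background probability of d), then the vectors are dotted.
    The precise weighting in the original slides is an assumption."""
    return sum((p_a.get(d, 0.0) / math.sqrt(p))
               * (p_b.get(d, 0.0) / math.sqrt(p))
               for d, p in p_bg.items())

# Toy inputs: two conversation sides over a 2-gram vocabulary {a, b}.
k = likelihood_ratio_kernel({"a": 1.0}, {"a": 0.5, "b": 0.5},
                            {"a": 0.5, "b": 0.5})
```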
Conclusion
• Speaker verification can be improved by exploiting the same cues human
listeners use.
• Combining the phone N-gram and score-space techniques can improve the
speaker verification system.
References
• V. Wan, Speaker Verification Using Support Vector Machines, University of
Sheffield, June 2003.
• V. Wan, Building Sequence Kernels for Speaker Verification and Speech
Recognition, University of Sheffield.
• S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker
Verification, IDIAP, 2001.
• W.M. Campbell, J.P. Campbell, D.A. Reynolds, D.A. Jones, and T.R. Leek,
Phonetic Speaker Recognition with Support Vector Machines, Advances in
Neural Information Processing Systems, 2004.