Download CALO 2004 report - Carnegie Mellon School of Computer Science

CALO Decoder Progress
Report for June
Arthur (Decoder, Trainer, ICSI Training)
Yitao (Live-mode Decoder)
Ziad (ICSI Training)
Carnegie Mellon University
July 6, 2004
This Presentation
 Progress report for June (15 pages)
 Review and Highlight (2 pages)
 ICSI AM training (4 pages)
 Infrastructure (2 pages)
 Decoder (8 pages)
 Summary and Outlook (1 page)
 Review of Q2 2004
 Live-mode APIs not completed
 Sphinx not yet tested on tasks with vocabulary > 2k
 ICSI training just started
June Highlights
They are completed! (to some extent)
 Live-mode APIs prototype is completed
 A demo is built.
 Sphinx 3.4 went through the WSJ 5k
task successfully
 Without pruning
 First two phases of ICSI training are
completed
ICSI Training
-Grand Plan
 By Ziad and ArthurC
 Transcript conversion is completed
 4 Phases
 Phase I - Replication of Rita’s training
 Phase II – Fixing Resources
 Use corrected train/test/dev sets
 Fixed transcriptions and dictionary
 Phase III – Tuning
 Training: tune topology / #senones / #mixtures
 Recognition: parameter tuning
 Phase IV – Further Improvement
 Use SCHMM to generate trees?
 Automatic question generation?
 Others?
ICSI Training
-Current Status
 Phase I completed
 Within 0.5% difference from Rita’s results
 Tested on transcriber’s meeting
 47.3% WERR (45.2% WERR when equivalence pairs were considered)
 Phase II completed
 In the development set and testing set
 Results varied from 47% to 29%
 Clipped speech deletion found to be
ineffective.
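For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate of this kind can be computed, with and without equivalence pairs. The slides do not say how the equivalence pairs were defined or which scoring tool was used, so the `equivalents` mapping, the function names, and the example sentences below are all hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalize(words, equivalents):
    """Map each word to its canonical form (hypothetical equivalence pairs)."""
    return [equivalents.get(w, w) for w in words]


def wer(ref, hyp, equivalents=None):
    """Word error rate: edit distance divided by the reference length."""
    equivalents = equivalents or {}
    ref_words = normalize(ref.lower().split(), equivalents)
    hyp_words = normalize(hyp.lower().split(), equivalents)
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Made-up example, not taken from the ICSI data.
reference = "okay we are going to start"
hypothesis = "ok we're going to start"
plain = wer(reference, hypothesis)
relaxed = wer(reference, hypothesis, equivalents={"ok": "okay"})
print(f"plain: {plain:.3f}, with equivalence pairs: {relaxed:.3f}")
```

Treating "ok" and "okay" as the same word removes one substitution, which is the kind of effect that would lower a score from 47.3% to 45.2%. This sketch only handles single-word equivalences.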
ICSI Training
-Before we go to Phase III
 From the last two phases
 We have some results that look good.
 BUT, Results vary with meeting conditions
 # of speakers?
 Speaker speaking rate entropy?
 Cross talk?
 Understanding is more important than
typing!
 Plan for next month
 Understand why recognition results vary
 Complete Phase III and IV with current test sets.
 Obtain standard test set from NIST
Infrastructure (2 pages)
-Workshops and Presentations
 2 CVS Workshops
 Had great discussions in the workshops
 Slides can be found on ArthurC’s web page
 Will repeat them in the new semester.
 2 Speech Developer’s meetings
 Next meeting this Thursday:
 “From main() to GMM computation.”
Infrastructure
-CVS
 What is in CVS?
 MRCP source code (v1 and v2)
 Standard training scripts:
 ICSI Conversion Scripts
 Communicator Training Scripts
 Guaranteed to give you 100% satisfaction and 12% WERR.
 WSJ 5k Training Scripts
 Guaranteed to give you 100% satisfaction and 8% WERR.
 Outlook
 Need to migrate to other machines.
 Next: ICSI training scripts (P1 to P4)
 Communicator/WSJ testing scripts.
Decoder work (7 pages)
-Interface
 By Yitao (he didn’t even get hurt!)
 Prototype of the Sphinx 2-like APIs is completed; functions completed:
 Initialization
 A demo is also built.
 Will be officially included in Sphinx 3.5.
 Latest code already available in CVS
 Plan for July
 Let the APIs go through their ultimate challenge: being used in an application.
 Enable logging of the recognizer
Decoder work
-Speed
 With big help from Evandro
 WSJ 5k task evaluation completed
 NVP, perplexity ~= 90
 Tested under a 2G machine
 None of the results are tuned (very wide beam width, no fast GMM computation)
 S3 (s3flat): WERR 6.5%, Speed 2.7xRT
 S3.4 (s3fast): WERR 6.65%, Speed 0.94xRT
 Conclusion: WSJ 5k task is not our challenge.
 Plan for July -> It is time to try a 20k task (ICSI or WSJ 20k).
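To put the two result lines side by side, here is a small worked calculation. It assumes "xRT" means decoding time divided by audio duration, the usual real-time factor. The helper names are hypothetical, and the only numbers used are the ones reported on the slide.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """xRT: how many seconds of compute are spent per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


def relative_change(baseline, new):
    """Change of `new` relative to `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Figures copied from the slide (untuned runs on the WSJ 5k task).
s3flat = {"wer": 6.5, "xrt": 2.7}
s3fast = {"wer": 6.65, "xrt": 0.94}

speedup = s3flat["xrt"] / s3fast["xrt"]
wer_increase = relative_change(s3flat["wer"], s3fast["wer"])
print(f"speedup: {speedup:.2f}x, relative WER change: {wer_increase:+.1%}")
```

On these figures s3fast runs roughly 2.9 times faster than s3flat, at a relative error increase of about 2%. Below 1.0 xRT the decoder keeps up with incoming audio.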
SphinxTrain work
 In the current Baum-Welch trainer of
SphinxTrain (v0.92)
 Silence is not optionally deleted in
Baum-Welch
 Multiple pronunciations are not allowed
in Baum-Welch
 We rely on forced alignment to get the correct alignment
SphinxTrain 0.93 progress
 Silence Modeling
 Optional silence deletion is now allowed
 Progress : Completed
 Multiple Pronunciation
 To be Allowed in Baum-Welch
 Progress : nearly completed (need 2-3 days)
 Correct Triphone Expansion
 May not have time to finish it in Q3.
 Plan for July
 Enable multiple pronunciations in Baum-Welch
 Legacy is a problem! (We could fix Sphinx 4
Trainer instead.)
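To illustrate what optional silence and multiple pronunciations add, the sketch below enumerates every phone sequence a transcript could expand to. It is not how SphinxTrain implements it: a Baum-Welch trainer would normally encode these alternatives as a graph and sum over the paths rather than list them. The lexicon, the `SIL` symbol, and the function name are hypothetical.

```python
from itertools import product

SIL = "SIL"  # hypothetical name for the silence phone


def expand_transcript(words, lexicon, optional_silence=True):
    """Enumerate every phone sequence consistent with the word transcript."""
    # Between words (and at both ends) silence may or may not occur.
    gap = [[], [SIL]] if optional_silence else [[]]
    slots = [gap]
    for word in words:
        slots.append(lexicon[word])  # all pronunciations of this word
        slots.append(gap)
    sequences = []
    for choice in product(*slots):
        sequences.append([phone for part in choice for phone in part])
    return sequences


# Toy lexicon with two pronunciations for "the" (illustrative only).
lexicon = {
    "the": [["DH", "AH"], ["DH", "IY"]],
    "cat": [["K", "AE", "T"]],
}

with_options = expand_transcript(["the", "cat"], lexicon)
fixed = expand_transcript(["the", "cat"], lexicon, optional_silence=False)
print(len(with_options), "candidate sequences with optional silence")
print(len(fixed), "candidate sequences without it")
```

Without these options the trainer sees only a single fixed phone sequence per utterance, which is why forced alignment is needed first to choose the pronunciation and the silence positions.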
Decoder work
-Adaptation
 Mainly code-tracing in this part
 Situation:
 Two versions of MLLR adaptation (Sam
Joo’s and SphinxTrain’s)
 Some code needs to be refined before we expose it
 s3flat has MLLR but s3fast does not
 Plan for this month
 After finishing the trainer job, we will tackle it.
Decoder work
–Packaging and Distribution

 Official web page: cmusphinx.sourceforge.net/
 Release Process
 1, set n = 1
 2, Loop
 Distribute the Release Candidate n
 See if anyone yells within one week (calm-down period)
 If yes, n = n + 1, loop again.
 If no, break
 3, Copy the RC to SourceForge’s standard distribution web site.
 Current status:
 People yelled at RC II during the calm-down period (Yitao fixed the problems)
 Create RC III this week.
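The release loop on this slide can be sketched as a few lines of code. This is an illustration only. The callbacks, the `max_candidates` safety limit, and the simulated feedback are hypothetical, and nothing here talks to SourceForge.

```python
def run_release_process(anyone_yelled, publish, distribute=print, max_candidates=10):
    """Issue release candidates until one survives the calm-down period."""
    n = 1                                  # step 1: set n = 1
    while n <= max_candidates:             # step 2: loop
        distribute(f"RC{n}")               # distribute Release Candidate n
        if not anyone_yelled(n):           # nobody complained within the week
            publish(f"RC{n}")              # step 3: copy the RC to the release site
            return n
        n += 1                             # someone yelled: fix and try again
    raise RuntimeError("too many release candidates; stopping")


# Simulated feedback: complaints about the first two candidates only.
complaints = {1: True, 2: True}
published = []
final = run_release_process(
    anyone_yelled=lambda n: complaints.get(n, False),
    publish=published.append,
    distribute=lambda rc: None,  # stay quiet in this example
)
print("published:", published)
```

In practice `anyone_yelled` would stand in for a week of waiting for bug reports. The limit on candidates is an added safeguard that the slide does not mention.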
Decoder work
-Miscellaneous
 Continuous HMM for Communicator
model is also completed.
 Ready for combination (Do we want to?)
 Possibly we want to combine ICSI model
and CMU model.
 Training script is still a big headache to use
 Still have had no time to fix it.
Decoder work –Documentation
(aka sphinxDoc)
 Progress happens only when
 ArthurC procrastinates and doesn’t want to read or play video games
 Draft I of Chapters I and II is completed.
 Chapter I: License agreement and user responsibility
 Chapter II:
 What is speech recognition (for dummies)
 History of speech recognition
 History of Sphinx
 Versions of Sphinx (when to use what)
Summary and Outlook
 We have done something in June
 We had better do more in the next 3 months.
 Priorities – We have to deal with “CALO
Grand Challenge”
 Recorder/Classifier/Recognizer Integration
 Improvement of Acoustic/Language Modeling
 Speaker Adaptation
 Uncompleted tasks stay on the list and will pop up at the right time.