Download CALO 2004 report - Carnegie Mellon School of Computer Science

CALO Decoder Progress Report for June
Arthur (Decoder, Trainer, ICSI Training)
Yitao (Live-mode Decoder)
Ziad (ICSI Training)
Carnegie Mellon University
July 6, 2004

This Presentation
- Progress report for June (15 pages)
  - Review and Highlights (2 pages)
  - ICSI AM training (4 pages)
  - Infrastructure (2 pages)
  - Decoder (8 pages)
  - Summary and Outlook (1 page)

Review of Q2 2004
- Live-mode APIs not completed
- Sphinx not yet tested on tasks with a vocabulary > 2k
- ICSI training just started

June Highlights
- They are completed! (to some extent)
- Live-mode API prototype is completed
  - A demo is built.
- Sphinx 3.4 went through the WSJ 5k task successfully
  - Without pruning
- First two phases of ICSI training are completed

ICSI Training - Grand Plan
- By Ziad and ArthurC
- Transcript conversion is completed
- 4 phases
  - Phase I - Replication of Rita’s training
  - Phase II - Fixing resources
    - Use corrected train/test/dev sets
    - Fixed transcriptions and dictionary
  - Phase III - Tuning
    - Training: topology / #senones / #mixtures
    - Recognition: parameter tuning
  - Phase IV - Further improvement
    - Use SCHMM to generate trees?
    - Automatic question generation?
    - Others?

ICSI Training - Current Status
- Phase I completed
  - Within 0.5% of Rita’s results
  - Tested on the transcribers’ meeting
  - 47.3% WERR (45.2% WERR when equivalence pairs were considered)
- Phase II completed
  - On the development set and the testing set
  - Results varied from 47% to 29%
  - Clipped-speech deletion was found to be ineffective.

ICSI Training - Before We Go to Phase III
- From the last two phases
  - We have some results that look good.
  - BUT results vary with meeting conditions
    - # of speakers?
    - Speaker speaking-rate entropy?
    - Cross talk?
  - Understanding is more important than typing!
- Plan for next month
  - Understand why recognition results vary
  - Complete Phase III and IV with the current test sets.
  - Obtain a standard test set from NIST

Infrastructure (2 pages) - Workshops and Presentations
- 2 CVS workshops
  - Had great discussions in the workshops
  - Slides can be found on ArthurC’s web page
  - Will run them again in the new semester.
- 2 Speech Developers’ meetings
  - Next meeting this Thursday: “From main() to GMM computation.”

Infrastructure - CVS
- What is in CVS?
  - MRCP source code (v1 and v2)
  - Standard training scripts:
    - ICSI conversion scripts
    - Communicator training scripts
      - Guaranteed to give you 100% satisfaction and 12% WERR.
    - WSJ 5k training scripts
      - Guaranteed to give you 100% satisfaction and 8% WERR.
- Outlook
  - Need to migrate to other machines.
  - Next: ICSI training scripts (P1 to P4)
  - Communicator/WSJ testing scripts.

Decoder work (7 pages) - Interface
- By Yitao (he didn’t even get hurt!)
- Prototype of the Sphinx 2-like APIs is completed
  - Functions completed:
    - Initialization
- A demo is also built.
- Will be officially included in Sphinx 3.5.
  - Latest code is already available in CVS
- Plan for July
  - Let the APIs go through their ultimate challenge: being used in an application.
  - Enable logging in the recognizer

Decoder work - Speed
- With big help from Evandro
- WSJ 5k task evaluation completed
  - NVP, perplexity ~= 90
  - Tested on a 2G machine
- None of the results are tuned (very wide beam width, no fast GMM computation)
  - S3 (s3flat): WERR 6.5%, speed 2.7xRT
  - S3.4 (s3fast): WERR 6.65%, speed 0.94xRT
- Conclusion: the WSJ 5k task is not our challenge.
- Plan for July: it is time to try a 20k task
  (ICSI or WSJ 20k)

SphinxTrain work
- In the current Baum-Welch trainer of SphinxTrain (v0.92)
  - Silence is not optionally deleted in Baum-Welch
  - Multiple pronunciations are not allowed in Baum-Welch
  - We rely on forced alignment to get the correct alignment

SphinxTrain 0.93 progress
- Silence modeling
  - Optional silence deletion is now allowed
  - Progress: completed
- Multiple pronunciations
  - To be allowed in Baum-Welch
  - Progress: nearly completed (needs 2-3 days)
- Correct triphone expansion
  - May not have time to finish it in Q3.
- Plan for July
  - Enable multiple pronunciations in Baum-Welch
  - Legacy is a problem! (We could fix the Sphinx 4 trainer instead.)

Decoder work - Adaptation
- Mainly code tracing in this part
- Situation:
  - Two versions of MLLR adaptation (Sam Joo’s and SphinxTrain’s)
  - Some code needs to be refined before we expose it
  - S3flat has MLLR but S3fast does not
- Plan for this month
  - After finishing the trainer job, we will tackle it.

Decoder work - Packaging and Distribution
- Official web page: cmusphinx.sourceforge.net/
- Release process
  1. Set n = 1
  2. Loop:
     - Distribute Release Candidate n
     - See whether anyone yells within one week (calm-down period)
     - If yes, n = n + 1 and loop again.
     - If no, break
  3. Copy the RC to SourceForge’s standard distribution web site.
- Current status:
  - People yelled about RC II during the calm-down period (Yitao fixed the problems)
  - Create RC III this week.

Decoder work - Miscellaneous
- Continuous HMM for the Communicator model is also completed.
  - Ready for combination (do we want to?)
  - Possibly we want to combine the ICSI model and the CMU model.
- Training scripts are still a big headache to use
  - Still have no time to fix them.

Decoder work - Documentation (aka sphinxDoc)
- Progress happens only when
  - ArthurC procrastinates and does not want to read or play video games
- Draft I of Chapters I and II is completed.
  - Chapter I: License agreement and user responsibility
  - Chapter II:
    - What is speech recognition, for dummies
    - History of speech recognition
    - History of Sphinx
    - Versions of Sphinx (when to use what)

Summary and Outlook
- We have done something in June
- We had better do more in the next 3 months.
- Priorities: we have to deal with the “CALO Grand Challenge”
  - Recorder/Classifier/Recognizer integration
  - Improvement of acoustic/language modeling
  - Speaker adaptation
- Uncompleted tasks are always on the list and will pop up at the right time.
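
Note on the metrics quoted in the slides
The WERR and xRT figures above (for example WERR 6.65% at 0.94xRT on WSJ 5k) can be illustrated with a minimal sketch. It assumes that WERR here denotes the word error rate, computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and that xRT is decoding time divided by audio duration. The function names and the sample sentences are hypothetical and are not part of Sphinx or of the scoring tools used for this report.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between
    # reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # cost of deleting the first i reference words
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decoding time per second of audio; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Made-up example utterance, not taken from any test set in the report.
    ref = "show me flights to boston".split()
    hyp = "show me flight to austin please".split()
    print(f"Word error rate: {word_error_rate(ref, hyp):.1%}")
    print(f"Speed: {real_time_factor(9.4, 10.0):.2f}xRT")
```

Under these assumptions, a speed below 1.0xRT (as reported for s3fast) means the decoder finishes faster than the audio plays, while 2.7xRT (s3flat) means it needs 2.7 seconds per second of speech.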
