CALO Decoder Progress Report for June

Arthur (Decoder, Trainer, ICSI Training)
Yitao (Live-mode Decoder)
Ziad (ICSI Training)
Carnegie Mellon University
July 6, 2004

This Presentation
  - Progress report for June (15 pages)
    - Review and Highlight (2 pages)
    - ICSI AM training (4 pages)
    - Infrastructure (2 pages)
    - Decoder (8 pages)
    - Summary and Outlook (1 page)

Review of Q2 2004
  - Live-mode APIs not completed
  - Sphinx not yet tested on a task with vocabulary > 2k
  - ICSI training just started

June Highlights
  - They are completed! (to some extent)
  - Live-mode API prototype is completed
    - A demo is built.
  - Sphinx 3.4 went through the WSJ 5k task successfully
    - Without pruning
  - First two phases of ICSI training are completed

ICSI Training - Grand Plan
  By Ziad and ArthurC
  - Transcript conversion is completed
  - 4 phases
    - Phase I - Replication of Rita's training
    - Phase II - Fixing resources
      - Use corrected train/test/dev sets
      - Fixed transcriptions and dictionary
    - Phase III - Tuning
      - Training: topology / #senones / #mix
      - Recognition: parameter tuning
    - Phase IV - Further improvement
      - Use SCHMM to generate trees?
      - Automatic question generation?
      - Others?

ICSI Training - Current Status
  - Phase I completed
    - Within 0.5% of Rita's results
    - Tested on transcriber's meeting
    - 47.3% WER (45.2% WER when equivalence pairs are considered)
  - Phase II completed
    - On the development set and the testing set
    - Results varied from 47% to 29%
    - Clipped speech deletion found to be ineffective.

ICSI Training - Before We Go to Phase III
  - From the last two phases
    - We have some results that look good.
  - BUT, results vary with meeting conditions
    - # of speakers?
    - Speaker speaking rate entropy?
    - Cross talk?
  - Understanding is more important than typing!
  - Plan of next month
    - Understand why recognition results vary
    - Complete Phase III and IV with current test sets.
    - Obtain a standard test set from NIST

Infrastructure (2 pages) - Workshops and Presentations
  - 2 CVS workshops
    - Had great discussion in the workshops
    - Slides can be found on ArthurC's web page
    - Will redo them in the new semester.
  - 2 Speech Developers' meetings
    - Next meeting this Thursday: "From main() to GMM computation."

Infrastructure - CVS
  - What is in CVS?
    - MRCP source code (v1 and v2)
    - Standard training scripts:
      - ICSI conversion scripts
      - Communicator training scripts
        - Guaranteed to give you 100% satisfaction and 12% WER.
      - WSJ 5k training scripts
        - Guaranteed to give you 100% satisfaction and 8% WER.
  - Outlook
    - Need to migrate to other machines.
    - Next: ICSI training scripts (P1 to P4), Communicator/WSJ testing scripts.

Decoder work (7 pages) - Interface
  By Yitao (he didn't even get hurt!)
  - Prototype of the Sphinx 2-like APIs is completed
    - Functions completed: initialization
    - A demo is also built.
    - Will be officially included in Sphinx 3.5.
    - Latest code already available in CVS
  - Plan of July
    - Let the APIs go through their ultimate challenge: being used in an application.
    - Enable logging in the recognizer

Decoder work - Speed
  With big help from Evandro
  - WSJ 5k task evaluation completed
    - NVP, perplexity ~= 90
    - Tested under a 2G machine
    - None of the results are tuned (very wide beam width, no fast GMM computation).
    - S3 (s3flat): WER 6.5%, speed 2.7xRT
    - S3.4 (s3fast): WER 6.65%, speed 0.94xRT
  - Conclusion: the WSJ 5k task is not our challenge.
  - Plan of July: it is time to try a 20k task (ICSI or WSJ 20k).

SphinxTrain work
  - In the current Baum-Welch trainer of SphinxTrain (v0.92)
    - Silence is not optionally deleted in Baum-Welch
    - Multiple pronunciations are not allowed in Baum-Welch
    - We rely on forced alignment to get the correct alignment
  - SphinxTrain 0.93 progress
    - Silence modeling
      - Optional silence deletion is now allowed
      - Progress: completed
    - Multiple pronunciations
      - To be allowed in Baum-Welch
      - Progress: nearly completed (need 2-3 days)
    - Correct triphone expansion
      - May not have time to finish it in Q3.
  - Plan of July
    - Enable multiple pronunciations in Baum-Welch
    - Legacy is a problem! (We could fix the Sphinx 4 trainer instead.)

Decoder work - Adaptation
  - Mainly code tracing in this part
  - Situation:
    - Two versions of MLLR adaptation (Sam Joo's and SphinxTrain's)
    - Some code needs to be refined before we expose it
    - S3flat has MLLR but S3fast does not
  - Plan of this month
    - After finishing the trainer job, we will tackle it.

Decoder work - Packaging and Distribution
  - Official web page: cmusphinx.sourceforge.net/
  - Release process
    1. Set n = 1
    2. Loop:
      - Distribute Release Candidate n
      - See whether anyone yells within one week (calm-down period)
      - If yes, n = n + 1, loop again.
      - If no, break
    3. Copy the RC to SourceForge's standard distribution web site.
  - Current status:
    - People yelled about RC II during the calm-down period (Yitao fixed the issues)
    - Create RC III this week.

Decoder work - Miscellaneous
  - Continuous HMM for the Communicator model is also completed.
    - Ready for combination (Do we want to?)
    - Possibly we want to combine the ICSI model and the CMU model.
  - Training script is still a big headache to use
    - Still have no time to fix it.

Decoder work - Documentation (aka sphinxDoc)
  - Only makes progress when ArthurC procrastinates and doesn't want to read or play video games
  - Draft I of Chapters I and II is completed.
    - Chapter I: License agreement and user responsibility
    - Chapter II: What is speech recognition, for dummies
      - History of speech recognition
      - History of Sphinx
      - Versions of Sphinx (when to use what)

Summary and Outlook
  - We have done something in June
  - We had better do more in the next 3 months.
  - Priorities: we have to deal with the "CALO Grand Challenge"
    - Recorder/Classifier/Recognizer integration
    - Improvement of acoustic/language modeling
    - Speaker adaptation
  - Non-completed tasks are always on the list and will pop up at the right time.
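
Appendix note: the release process listed under Packaging and Distribution is essentially a retry loop, and a small sketch may make the control flow easier to follow. The Python below is only an illustration under stated assumptions: `run_release`, `has_complaints`, and `max_candidates` are hypothetical names that do not come from the Sphinx code base, and the one-week calm-down period is represented by a callback rather than a real wait.

```python
from typing import Callable


def run_release(has_complaints: Callable[[int], bool],
                max_candidates: int = 10) -> int:
    """Return the number of the release candidate that survives the calm-down period.

    has_complaints(n) is a hypothetical callback: it stands for "distribute
    RC n, wait one week, and report whether anyone yelled".
    max_candidates is an assumed safety limit; the slides do not mention one.
    """
    n = 1                                # step 1: set n = 1
    while n <= max_candidates:           # step 2: loop
        if not has_complaints(n):        # nobody yelled during the calm-down period
            return n                     # break; step 3 (copy RC n to the
                                         # distribution site) would follow
        n += 1                           # someone yelled: cut the next candidate
    raise RuntimeError("no release candidate passed the calm-down period")


if __name__ == "__main__":
    # Hypothetical scenario: RC 1 and RC 2 draw complaints, RC 3 does not.
    # Whether RC III actually passes is an assumption, not a reported result.
    complaints = {1: True, 2: True}
    print(run_release(lambda n: complaints.get(n, False)))  # prints 3
```

Passing the complaint check in as a callback keeps the loop itself free of any mailing-list or SourceForge details, which the slides do not specify.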