An Online Vowel Training Program
Partners: Ted Dallmann, Paul Blair
Advisor: Dr. Michael Schordilis

Project Goal
Automate a training process for the twelve English vowels by designing and implementing a computer system that, given a speech input, provides real-time aural, graphical, and numeric feedback to help the user improve his pronunciation.

Overview
* Speech: Brief Introduction to Speech and Phonemes; Characteristics of Vowels
* Our System: Theory; Algorithms; Implementation; Development
* Conclusion: Results; Possible Improvements

Speech
Speech is the system of acoustic cues that we use to communicate verbally. These cues, called phonemes, are produced by the human body using the glottis, lungs, vocal cavity, nasal cavity, tongue, and other body parts. Phonemes can be linked together to form syllables, which in turn can be linked together to form words.

Speech Production
The Human Speech Production System
* Power supply = lungs
* Modulator = larynx
* Resonator = vocal tract

Classifications of Phonemes
* Vowels: periodic; caused by glottal pulses resonating
* Consonants
  * Plosives: noisy; caused by sudden bursts of air
  * Fricatives: also noisy; caused by air constrictions
* Miscellaneous: a combination of resonance and noise; can vary in frequency with time

Vowels
Vowels are formed when air pulses from the glottis resonate through the vocal tract. Because of their resonant nature, vowels are quasi-periodic. The resonances of the vocal tract color the spectrum of the vowel; these resonant frequencies are called formants. The shape of the vocal tract and the position of the tongue control the formant frequencies. Formant values also differ between speakers based on the speaker's size, sex, and age.

How Vowels are Differentiated
If everyone produces different formants, how can we all say the same vowels?
The key to recognizing vowels is the ratio between formants, not their absolute values. This ratio can be visualized in an F2/F1 graph. Most vowels lie within a specific region of the F2/F1 plane; this region is commonly called the Vowel Triangle.

Our System
Our automated training process should take a digital speech input, isolate and analyze the vowel, and provide feedback to the user. The processing steps are:
* Record the given CVC (consonant-vowel-consonant) word
* Isolate the vowel segment
* Compute the FFT and the LPC (smooth) spectrum
* Graph the audio waveform
* Extract the formants
* Provide the user with text feedback
* Graph the vowel triangle

Vowel Isolation
Because of their resonant nature, vowels have more sustained energy than consonants or silence. By building a normalized energy array that holds the average energy of each 1 ms interval, the vowel can be distinguished from the surrounding content. Truncating the signal to the intervals whose normalized energy is 40% or more separates the vowel.

Formant Detection
The FFT gives a quick but "noisy" view of the spectral content. LPC (linear predictive coding) models the original signal with minimal prediction error, so its spectrum is smooth. The local maxima of the LPC spectrum correspond to the formants of the original vowel.

Feedback
* Aural: providing a correct pronunciation for the user to play, and letting the user replay his own pronunciation over the speaker to compare the two.
* Graphical: graphing the time and frequency responses of both the user's pronunciation and the demo; plotting the user's vowel inside the vowel triangle and highlighting the region for the target pronunciation.
* Numerical: providing a text box with the measured values, target values, and percent errors for the first three formants.

Implementation
All algorithms were first developed and tested in MATLAB. Once the algorithms were finalized, they were rewritten as Java classes and tested further as standalone Java applications. After the Java classes were finalized, a Java applet was built from those classes.

Development
GUI - Graphical User Interface
Java includes Swing, a comprehensive library for creating visual components. Swing lets developers use existing visual components or create new ones by extending existing ones.

“Webifying”
Java applets can be embedded in an HTML web page, allowing users to run Java code over the internet. Applets can reuse Java code from other applications.

Database
Formant values, sample words, and the location of the demonstration sound file for each phoneme are stored serially in a separate database. Given a phoneme, our program can access this database, load the corresponding formant values, and locate the appropriate sound file.

Problems Encountered
* Problem: Applets are not allowed to access files or system resources without first being granted security permissions.
  Solution: Package all our files in a JAR file and sign it with a private key, allowing the user to decide whether to grant the permissions.
* Problem: Runtime errors can occur when the user denies security permissions, has no microphone line, or does not speak clearly enough.
  Solution: Robust error handling: the audio code throws exceptions to the GUI, which in turn displays message boxes to the user.
* Problem: Browsers with older versions of Java will not load the applet's classes properly.
  Solution: Include a link to the Java download page and a link to a third-party applet that can determine the installed Java version.

Results

Possible Improvements
* Suggesting techniques for physically manipulating formants
* A bigger, more extensive database that includes other phonemes besides vowels
* An improved scoring system
* Recording the user's progress history
* Vocal tract shape estimation

Demonstration
http://umsis.miami.edu/~tdallman/sep/
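The first processing step, "Record the given CVC word," together with the microphone-related error handling listed under Problems Encountered, could look roughly like the sketch below. It uses the standard `javax.sound.sampled` API; the class name `MicRecorder`, the mono 16-bit little-endian format, and the fixed recording length are assumptions for illustration, not the project's actual code. Failures surface as exceptions so that a GUI can catch them and show a message box.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical recorder: captures mono 16-bit PCM and reports failures as exceptions for the GUI. */
final class MicRecorder {
    private MicRecorder() {}

    /** Captures `seconds` of audio; throws if no microphone line exists or it cannot be opened. */
    static double[] capture(float sampleRate, double seconds) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false); // signed, little-endian
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("No microphone line supports " + format);
        }
        // getLine may also throw SecurityException when permissions are denied
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        byte[] bytes = new byte[2 * (int) (sampleRate * seconds)]; // 2 bytes per mono sample
        line.open(format);
        try {
            line.start();
            int read = 0;
            while (read < bytes.length) {
                int n = line.read(bytes, read, bytes.length - read);
                if (n <= 0) break; // line was stopped or closed
                read += n;
            }
        } finally {
            line.stop();
            line.close();
        }
        return pcm16LeToDouble(bytes);
    }

    /** Converts signed 16-bit little-endian PCM to samples in [-1, 1). */
    static double[] pcm16LeToDouble(byte[] bytes) {
        double[] out = new double[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = bytes[2 * i] & 0xFF;   // low byte as unsigned
            int hi = bytes[2 * i + 1];      // high byte keeps its sign
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }
}
```

A GUI would call `MicRecorder.capture(8000f, 1.5)` inside a `try` block and turn a `LineUnavailableException` or `SecurityException` into a message box, matching the error-handling strategy described in the slides.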
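The Vowel Isolation slide describes an energy-based segmentation. Below is a minimal sketch under two assumptions the slides do not spell out: "normalized" means each 1 ms interval's mean energy is divided by the largest interval energy, and the vowel is taken as the span from the first to the last interval at or above the 40% threshold. `VowelIsolator` and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating energy-based vowel isolation (not the project's actual class). */
final class VowelIsolator {
    private VowelIsolator() {}

    /** Mean energy of each frameLen-sample interval, normalized so the loudest interval is 1.0. */
    static double[] normalizedEnergy(double[] signal, int frameLen) {
        int frames = signal.length / frameLen; // trailing partial frame is ignored
        double[] energy = new double[frames];
        double max = 0.0;
        for (int f = 0; f < frames; f++) {
            double sum = 0.0;
            for (int i = f * frameLen; i < (f + 1) * frameLen; i++) {
                sum += signal[i] * signal[i];
            }
            energy[f] = sum / frameLen;
            max = Math.max(max, energy[f]);
        }
        if (max > 0.0) {
            for (int f = 0; f < frames; f++) energy[f] /= max;
        }
        return energy;
    }

    /** Returns the samples from the first to the last 1 ms interval whose normalized energy >= threshold. */
    static double[] isolateVowel(double[] signal, int sampleRate, double threshold) {
        int frameLen = Math.max(1, sampleRate / 1000); // samples per 1 ms interval
        double[] energy = normalizedEnergy(signal, frameLen);
        int first = -1, last = -1;
        for (int f = 0; f < energy.length; f++) {
            if (energy[f] >= threshold) {
                if (first < 0) first = f;
                last = f;
            }
        }
        if (first < 0) return new double[0]; // silence: nothing above the threshold
        return Arrays.copyOfRange(signal, first * frameLen, (last + 1) * frameLen);
    }
}
```

With the slides' 40% figure, a call would be `VowelIsolator.isolateVowel(samples, 8000, 0.4)`. Taking the outermost qualifying intervals keeps a brief energy dip inside the vowel from splitting it in two, at the cost of ignoring how the original code handled such gaps.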
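The Formant Detection slide says formants are the local maxima of the LPC spectrum but does not say which LPC variant was used. The sketch below swaps in one common choice: a Hamming window, the autocorrelation method, and the Levinson-Durbin recursion, followed by a peak search over the envelope 1/|A(e^jw)|. The class name, the 512-point frequency grid, and the model order are assumptions; a frequently quoted rule of thumb is an order of about 2 + sampleRate/1000.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical LPC formant estimator: autocorrelation method + Levinson-Durbin recursion. */
final class LpcFormants {
    private LpcFormants() {}

    /** Solves for a[0..order] (a[0] = 1) so that x[n] is predicted as -sum_{k>=1} a[k] x[n-k]. */
    static double[] levinson(double[] r, int order) {
        double[] a = new double[order + 1];
        a[0] = 1.0;
        double err = r[0];
        for (int i = 1; i <= order; i++) {
            if (err <= 0.0) break; // degenerate input, e.g. an all-zero frame
            double acc = r[i];
            for (int j = 1; j < i; j++) acc += a[j] * r[i - j];
            double k = -acc / err; // reflection coefficient
            double[] prev = a.clone();
            for (int j = 1; j < i; j++) a[j] = prev[j] + k * prev[i - j];
            a[i] = k;
            err *= (1.0 - k * k);
        }
        return a;
    }

    /** Frequencies (Hz) of the local maxima of the envelope 1/|A(e^jw)|, lowest first. */
    static List<Double> envelopePeaks(double[] a, int sampleRate, int nPoints, int maxPeaks) {
        double[] env = new double[nPoints];
        for (int i = 0; i < nPoints; i++) {
            double w = Math.PI * i / (nPoints - 1); // 0 .. Nyquist
            double re = 0.0, im = 0.0;
            for (int k = 0; k < a.length; k++) {
                re += a[k] * Math.cos(w * k);
                im -= a[k] * Math.sin(w * k);
            }
            env[i] = 1.0 / Math.sqrt(re * re + im * im);
        }
        List<Double> peaks = new ArrayList<>();
        for (int i = 1; i < nPoints - 1 && peaks.size() < maxPeaks; i++) {
            if (env[i] > env[i - 1] && env[i] >= env[i + 1]) {
                peaks.add((double) i * sampleRate / (2.0 * (nPoints - 1)));
            }
        }
        return peaks;
    }

    /** Hamming-windows the vowel frame, fits an LPC model, and returns up to maxPeaks formant estimates. */
    static List<Double> formants(double[] frame, int sampleRate, int order, int maxPeaks) {
        int n = frame.length;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = frame[i] * (0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1)));
        }
        double[] r = new double[order + 1]; // autocorrelation at lags 0..order
        for (int k = 0; k <= order; k++) {
            for (int i = k; i < n; i++) r[k] += x[i] * x[i - k];
        }
        return envelopePeaks(levinson(r, order), sampleRate, 512, maxPeaks);
    }
}
```

For F1 through F3 of an isolated vowel sampled at 8 kHz, a call might be `LpcFormants.formants(vowel, 8000, 10, 3)`. Because the LPC envelope is smooth, its local maxima are far fewer and more stable than the peaks of a raw FFT, which is the point the slide makes.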
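The Numerical feedback item lists measured values, target values, and percent errors for the first three formants, and the vowel-triangle slides rely on the F2/F1 ratio. The sketch below assumes percent error is defined as |measured - target| / target × 100. `FormantFeedback` is a hypothetical name, and the numbers in the usage line are placeholders rather than real vowel targets.

```java
import java.util.Locale;

/** Hypothetical sketch of the numeric feedback: measured vs. target formants and percent error. */
final class FormantFeedback {
    private FormantFeedback() {}

    /** Percent error of a measured value relative to its target: |measured - target| / target * 100. */
    static double percentError(double measured, double target) {
        return Math.abs(measured - target) / target * 100.0;
    }

    /** F2/F1 ratio, the quantity plotted on the vowel-triangle (F2/F1) graph. */
    static double f2OverF1(double[] formants) {
        return formants[1] / formants[0];
    }

    /** One text line per formant (F1..F3): measured value, target, and percent error. */
    static String report(double[] measured, double[] target) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            sb.append(String.format(Locale.ROOT, "F%d: %.0f Hz (target %.0f Hz, error %.1f%%)%n",
                    i + 1, measured[i], target[i], percentError(measured[i], target[i])));
        }
        return sb.toString();
    }
}
```

For example, `FormantFeedback.report(new double[] {550, 1500, 2500}, new double[] {500, 1500, 2500})` produces a first line of `F1: 550 Hz (target 500 Hz, error 10.0%)`. Comparing `f2OverF1` values as well as raw formants reflects the slides' point that the ratio, not the absolute values, identifies the vowel.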
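The Database slide says each phoneme's formant values, sample word, and demonstration sound-file location are "stored serially." One plausible reading is Java object serialization. The sketch below follows that assumption with a hypothetical `PhonemeEntry` record and `PhonemeDatabase` helper. The field names, the sample word, and the numbers in the usage example are illustrative placeholders, not the project's data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical phoneme database entry; field names are illustrative, not the project's. */
record PhonemeEntry(String phoneme, String sampleWord, double[] formantsHz, String soundFile)
        implements Serializable {}

/** Sketch of a serially stored phoneme database using Java object serialization. */
final class PhonemeDatabase {
    private PhonemeDatabase() {}

    /** Writes the whole phoneme -> entry map as one serialized object. */
    static void save(Map<String, PhonemeEntry> db, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(new HashMap<>(db)); // HashMap is Serializable
        }
    }

    /** Reads back the map written by save(). */
    @SuppressWarnings("unchecked")
    static Map<String, PhonemeEntry> load(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (Map<String, PhonemeEntry>) ois.readObject();
        }
    }
}
```

Given a phoneme key, the program would then call `load(...)`, look up the entry, read its target formants, and open `soundFile()` for the aural demo. In an applet, reading that file is exactly the kind of resource access that required the signed JAR described under Problems Encountered.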