An Online Vowel Training Program
Partners:
Ted Dallmann
Paul Blair
Advisor: Dr. Michael Schordilis
Project Goal
Automate a training process for the twelve
English vowels by designing and
implementing a computer system that, given
a speech input, could provide real-time aural,
graphical, and numeric feedback to the user
to help him improve his pronunciation.
Overview
- Speech
  - Brief Introduction to Speech and Phonemes
  - Characteristics of Vowels
- Our System
  - Theory
  - Algorithms
  - Implementation
  - Development
- Conclusion
  - Results
  - Possible Improvements
Speech
- Speech is the system of acoustical cues which we use to communicate verbally
- These cues, called phonemes, are produced by the human body using the glottis, lungs, vocal cavity, nasal cavity, tongue, and other body parts
- Phonemes can be linked together to form syllables, which can be linked together to form words
Speech Production
- The Human Speech Production System
  - Power Supply = Lungs
  - Modulator = Larynx
  - Resonator = Vocal Tract
Classifications of Phonemes
- Vowels
  - Periodic; caused by glottal pulses resonating
- Consonants
  - Plosives
    - Noisy; caused by sudden bursts of air
  - Fricatives
    - Also noisy; caused by air constrictions
  - Miscellaneous
    - Combination of resonance and noise
    - Can vary frequency with time
Vowels
- Vowels are formed when air pulses from the glottis resonate through the vocal tract
- Because of their resonant nature, vowels are quasi-periodic
- The resonances of the vocal tract color the spectrum of the vowel; these resonant frequencies are called formants
- The shape of the vocal tract and the positioning of the tongue control the formants’ frequencies
- Formant values also differ between speakers based on the size, sex, and age of the speaker
How Vowels are Differentiated
If everyone produces different formants, how can we say the same vowels?
- The key to recognizing vowels is the ratios between formants, not the actual values themselves
- The ratio can be visualized in an F2/F1 graph
- Most vowels lie within a specific region of the F2/F1 graph; this region is commonly called the Vowel Triangle
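As a rough illustration of the vowel triangle, the sketch below tests whether a measured (F1, F2) pair falls inside a triangle spanned by three corner vowels. The class name `VowelTriangle` and the corner frequencies are placeholders assumed for this example only; they are not taken from the project's database.

```java
final class VowelTriangle {
    // Corner vowels as {F1, F2} in Hz. Placeholder values assumed for this
    // sketch; a real system would load them from its own reference data.
    static final double[] CORNER_I = {270, 2290};
    static final double[] CORNER_A = {730, 1090};
    static final double[] CORNER_U = {300, 870};

    /** Z component of the cross product (q - p) x (r - p). */
    static double cross(double[] p, double[] q, double[] r) {
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]);
    }

    /** True if (f1, f2) lies inside or on the edge of the triangle. */
    static boolean contains(double f1, double f2) {
        double[] pt = {f1, f2};
        double d1 = cross(CORNER_I, CORNER_A, pt);
        double d2 = cross(CORNER_A, CORNER_U, pt);
        double d3 = cross(CORNER_U, CORNER_I, pt);
        boolean hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
        boolean hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
        // The point is inside when it is on the same side of all three edges.
        return !(hasNegative && hasPositive);
    }

    public static void main(String[] args) {
        System.out.println(contains(450, 1400));   // a mid-triangle point
        System.out.println(contains(1000, 3000));  // far outside
    }
}
```

The same-side test is only one way to check membership; a plotting component could equally highlight a smaller target region around each vowel.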
Our System
Our automated training process should take a
digital speech input, isolate and analyze the
vowel, and provide feedback to the user
Processing stages (from the block diagram):
- Record the given CVC (consonant-vowel-consonant) word
- Isolate the vowel segment
- FFT and LPC / smooth spectrum
- Graph audio wave
- Extract formants
- Provide user with text feedback
- Graph vowel triangle
Vowel Isolation
- Because of their resonant nature, vowels have more sustained energy than consonants or silence
- By creating a normalized energy array of the average energy value over each 1 ms interval, the vowel can be discerned from surrounding content
- By truncating the signal to the corresponding intervals with energy of 40% or more, we can separate the vowel
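The isolation step above can be sketched as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, the energy is normalized to the loudest frame, and the vowel is assumed to be the span from the first to the last frame that reaches the threshold.

```java
import java.util.Arrays;

final class VowelIsolator {

    /** Average energy of each frame, scaled so the loudest frame is 1.0. */
    static double[] normalizedEnergy(double[] samples, int sampleRate, double frameMs) {
        int frameLen = Math.max(1, (int) Math.round(sampleRate * frameMs / 1000.0));
        int frames = samples.length / frameLen;  // a trailing partial frame is dropped
        double[] energy = new double[frames];
        double max = 0.0;
        for (int f = 0; f < frames; f++) {
            double sum = 0.0;
            for (int i = f * frameLen; i < (f + 1) * frameLen; i++) {
                sum += samples[i] * samples[i];
            }
            energy[f] = sum / frameLen;
            max = Math.max(max, energy[f]);
        }
        if (max > 0.0) {
            for (int f = 0; f < frames; f++) {
                energy[f] /= max;
            }
        }
        return energy;
    }

    /** Samples from the first through the last frame at or above the threshold. */
    static double[] isolate(double[] samples, int sampleRate, double frameMs, double threshold) {
        double[] energy = normalizedEnergy(samples, sampleRate, frameMs);
        int frameLen = Math.max(1, (int) Math.round(sampleRate * frameMs / 1000.0));
        int first = -1;
        int last = -1;
        for (int f = 0; f < energy.length; f++) {
            if (energy[f] >= threshold) {
                if (first < 0) {
                    first = f;
                }
                last = f;
            }
        }
        if (first < 0) {
            return new double[0];  // nothing reached the threshold (e.g. silence)
        }
        return Arrays.copyOfRange(samples, first * frameLen, (last + 1) * frameLen);
    }

    public static void main(String[] args) {
        // Synthetic input: quiet, loud, quiet (stand-in for consonant-vowel-consonant).
        double[] signal = new double[320];
        for (int i = 0; i < signal.length; i++) {
            signal[i] = (i >= 80 && i < 240) ? 1.0 : 0.1;
        }
        double[] vowel = isolate(signal, 8000, 1.0, 0.4);
        System.out.println("Isolated samples: " + vowel.length);
    }
}
```

Calling `isolate(signal, sampleRate, 1.0, 0.4)` mirrors the 1 ms interval and 40% threshold named on the slide; both are passed as parameters so they can be tuned.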
Formant Detection
- FFT can give a quick but “noisy” view of the spectral content
- LPC tries to mimic the original signal with minimal error, so its spectral content is smooth
- The local maxima of the LPC’s spectrum correspond to formants from the original vowel
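To make the last point concrete, the sketch below evaluates the magnitude of an all-pole LPC model on a frequency grid and reports its local maxima as formant candidates. It assumes the predictor coefficients have already been estimated (the estimation itself is not shown), and it assumes the convention that sample s[n] is predicted as the sum of a_k * s[n-k]. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FormantPicker {

    /**
     * Magnitude 1/|A(e^{jw})| of the LPC model at `bins` frequencies from 0 up
     * to (but excluding) the Nyquist frequency, where
     * A(z) = 1 - sum_k a_k z^-k and a[k-1] holds a_k.
     */
    static double[] lpcEnvelope(double[] a, int bins) {
        double[] env = new double[bins];
        for (int b = 0; b < bins; b++) {
            double w = Math.PI * b / bins;
            double re = 1.0;
            double im = 0.0;
            for (int k = 1; k <= a.length; k++) {
                re -= a[k - 1] * Math.cos(k * w);
                im += a[k - 1] * Math.sin(k * w);
            }
            env[b] = 1.0 / Math.hypot(re, im);
        }
        return env;
    }

    /** Frequencies (Hz) of the first maxCount local maxima; bin k sits at k * binHz. */
    static List<Double> pickFormants(double[] envelope, double binHz, int maxCount) {
        List<Double> peaks = new ArrayList<>();
        for (int k = 1; k < envelope.length - 1 && peaks.size() < maxCount; k++) {
            // Strict on the left, non-strict on the right: a flat top counts once.
            if (envelope[k] > envelope[k - 1] && envelope[k] >= envelope[k + 1]) {
                peaks.add(k * binHz);
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Hypothetical two-pole model with one resonance near 1000 Hz at 8 kHz sampling.
        double r = 0.95;
        double theta = 2 * Math.PI * 1000.0 / 8000.0;
        double[] a = {2 * r * Math.cos(theta), -r * r};
        int bins = 512;
        double[] env = lpcEnvelope(a, bins);
        System.out.println(pickFormants(env, 4000.0 / bins, 3));
    }
}
```

Simple peak picking can miss formants that merge into a single broad peak; that limitation is a property of this sketch, not a statement about the original system.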
Feedback
- AURAL
  - Providing a correct pronunciation for the user to play and allowing the user to replay his pronunciation over the speaker to compare
- GRAPHICAL
  - Graphing the time and frequency response of both the user’s pronunciation and the demo
  - Plotting the user’s vowel inside the vowel triangle and highlighting the region for the target pronunciation
- NUMERICAL
  - Providing a text box with the measured values, target values, and percent errors for the first three formants
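The numerical feedback could be produced along these lines. This is a sketch under assumptions: the percent error is taken as unsigned and relative to the target value, the report layout is invented for illustration, and the frequencies in `main` are made-up placeholders.

```java
import java.util.Locale;

final class FormantFeedback {

    /** Unsigned percent error of a measured formant relative to its target. */
    static double percentError(double measuredHz, double targetHz) {
        return 100.0 * Math.abs(measuredHz - targetHz) / targetHz;
    }

    /** One line per formant: measured value, target value, percent error. */
    static String report(double[] measuredHz, double[] targetHz) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(measuredHz.length, targetHz.length);
        for (int i = 0; i < n; i++) {
            sb.append(String.format(Locale.ROOT,
                    "F%d: %.0f Hz (target %.0f Hz, error %.1f%%)%n",
                    i + 1, measuredHz[i], targetHz[i],
                    percentError(measuredHz[i], targetHz[i])));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] measured = {300, 2200, 2900};  // placeholder measurements
        double[] target = {270, 2290, 3010};    // placeholder targets
        System.out.print(report(measured, target));
    }
}
```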
Implementation
- All algorithms were first developed and tested using the MATLAB environment
- Once all algorithms were finalized, they were rewritten into Java classes and further tested as applications using the Java programming language
- After the Java classes were finalized, a Java applet was created using those classes
Development
- GUI - Graphical User Interface
  - Java contains a comprehensive library, called Swing, for creating visual components
  - Swing allows developers to use existing visual components or to create new ones based on their prototypes
- “Webifying”
  - Java applets can be embedded into an HTML webpage, allowing users to run Java code over the internet
  - Applets can reuse Java code from other applications
- Database
  - Formant values, sample words, and the location of the demonstration sound file for each phoneme are stored serially in a separate database
  - Given a phoneme, our program can access this database, load the corresponding formant values, and locate the appropriate sound file
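One way such a lookup table could work is sketched below. Reading "stored serially" as Java object serialization is an assumption, as are the class name, the field names, the phoneme key `"iy"`, and the sample values; the slides do not describe the actual file layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class PhonemeDatabase {

    /** Reference data for one phoneme (hypothetical layout). */
    static final class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] formantsHz;
        final String sampleWord;
        final String soundFile;

        Entry(double[] formantsHz, String sampleWord, String soundFile) {
            this.formantsHz = formantsHz;
            this.sampleWord = sampleWord;
            this.soundFile = soundFile;
        }
    }

    /** Serializes the whole table to bytes (a file stream would work the same way). */
    static byte[] save(Map<String, Entry> db) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(db));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores a table previously written by save(). */
    @SuppressWarnings("unchecked")
    static Map<String, Entry> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<String, Entry>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Entry> db = new HashMap<>();
        // Placeholder entry; the values are illustrative only.
        db.put("iy", new Entry(new double[] {270, 2290, 3010}, "beet", "sounds/iy.wav"));
        Entry e = load(save(db)).get("iy");
        System.out.println(e.sampleWord + " -> " + e.soundFile);
    }
}
```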
Problems Encountered
PROBLEM: Applets are not allowed to access files or system resources without first being granted security permissions
SOLUTION: Compress all our files within a JAR file and sign it using a private key, allowing the user to decide whether permissions should be granted or not

PROBLEM: Runtime errors can occur when the user denies security permissions, does not have a microphone line, or does not speak clearly enough
SOLUTION: Robust error handling: the audio code throws exceptions to the GUI, which in turn displays message boxes to the user

PROBLEM: Browsers with older versions of Java will not properly load the applet’s classes
SOLUTION: Include a link to the Java download page and a link to a third-party applet capable of determining the Java version
Results
Possible Improvements
- Suggesting techniques for physically manipulating formants
- Bigger and more extensive database that includes other phonemes besides vowels
- Improved scoring system
- Recording the user’s progressive history
- Vocal tract shape estimation
Demonstration
http://umsis.miami.edu/~tdallman/sep/