Speech Perception
Richard Wright
Linguistics 453

Class Overview
– Physiology
– Auditory Shaping of the Signal
– Auditory Cues
– Normalization and Context
– Experiment Types

Physiology 1: The Ear
– Outer: Pinna, Ear Canal, Ear Drum
– Middle: Ossicles, Oval Window
– Inner: Cochlea (Basilar Membrane, Tectorial Membrane, Hair Cells)

Physiology 1: The Outer Ear
– Pinna: directional hearing
– Ear Canal: high frequency emphasis (very short resonator closed at one end)
– Ear Drum: membrane’s vibrations convert pressure fluctuations to mechanical movement

Physiology 1: The Middle Ear
– Ossicles (Malleus, Incus, Stapes): convert eardrum movement to movement of the oval window, overcoming the air-to-fluid impedance mismatch
– Lower frequency emphasis (500-4000 Hz)
– Lessen impact of very loud noises by stiffening (damping)

Physiology 1: The Inner Ear
– Cochlea: fluid-filled cavity; wave propagation in the fluid is caused by movement of the oval window
– Basilar Membrane: stiff and narrow at base, wide and flaccid at apex; base = high frequencies and apex = low frequencies (acts like a series of band-pass filters). Most of the membrane is devoted to sounds below 5000 Hz.
– Shearing between the basilar and tectorial membranes displaces hair cells, exciting cochlear nerve endings

Physiology 2: Neural Pathway
– Cochlear Nerve
– Cochlear Nucleus
– Lateral Lemniscus
– Auditory Cortex
[Figure: auditory pathway diagram. Labels: cochlear nerve, cochlear nucleus, superior olive, lateral lemniscus, inferior colliculus, medial geniculate, auditory radiations, cortex, mid-line, CIC, Probst, Held, Monakow]

Auditory Shaping of the Signal
– Frequency Selectivity: changes in frequency of stimulus do not result in equivalent changes in sensitivity
– Non-linear loudness sensitivity
– Phase Locking and noise reduction
– Lateral Inhibition and Tuning
– Onsets and neural spikes

Frequency Selectivity

Onset Advantage
Delgutte and Kiang (1984)

What are Cues?
Cues: information in the signal that listeners use in recovering the segmental content of the utterance
– Place cues
– Manner cues
– Voicing cues
– Vowel quality cues

Distribution of Cues
Place cues
[Figure labels: F1, F2, F3]
– stop release burst
– fricative noise
– F2 transitions
– nasal pole and zero

Distribution of Cues
Manner cues
[Figure labels: F1, F2, F3]
– stop release burst
– slope of formant transitions
– nasalization of vowel
– abruptness and degree of attenuation
– fricative noise
– nasal pole and zero

Distribution of Cues
Voicing cues
[Figure labels: F1, F2, F3]
– release burst amplitude
– vowel duration
– aspiration noise
– VOT
– stricture duration
– periodicity

Distribution of Cues
– Stop release bursts are very brief and difficult to recover: stops rely on formant transition cues
– Fricative noise, particularly sibilant, contains robust cues: fricatives may be recovered in the absence of formant transitions
– Nasals contain strong manner cues but weak place cues

Onset Advantage
– Redundancy advantage: onset stops automatically have both a release burst and a set of formant transitions
– Coda stops may be unreleased and therefore have less cue redundancy

Onset Advantage
[Figure: onset consonant with flanking vowels]

Experimental Tasks
– Identification
– Discrimination
– Rating
– Method of Adjustment (MOA)

Exp. Tasks 1: Identification
Listeners are asked to identify stimuli as speech sounds...
– Open set: options open
– Forced choice: listeners' choices constrained

Experiment 1: Onset vs Coda
Stimuli
– male speaker of American English
– /ba, da, ga, ab, ad, ag/, bursts excised
– 16 bit, 22 kHz
– mixed in three levels of white noise:
  • no noise
  • noise at 2 dB above RMS of signal
  • noise at 2 dB below RMS of signal

Experiment 1: Onset vs Coda
Task
– onsets & codas mixed and randomized
– presented binaurally over headphones
– 3 way forced choice task: “B D G”
– labeled button press
– self paced

Exp. Tasks 2: Discrimination
Listeners are asked to respond “same” or “different” to presented sets of stimuli
– AX discrimination: fixed initial stimulus, variable second stimulus (same/different)
– ABX discrimination: two fixed initial stimuli, variable third stimulus (same A, same B)

Experiment 2: vowel discrimination
Stimuli
– Synthetic vowel continuum
– Equal steps: 2.37 Bark along F1-F2 dimension
– 16 bit, 11 kHz
– variable AX design

Experiment 2: vowel discrimination
Task
– same/different response to vowel pairs
– presented binaurally over headphones
– labeled button press
– speeded (limited time to decide)

Exp. Tasks 3: Ratings
Listeners are asked to rate a stimulus in some way: goodness, similarity, accentedness
Example: Effect of intonational contour on naturalness: listeners hear sentences with and without f0 contour and rate naturalness on a 1-5 scale.

Exp. Tasks 4: MOA
Listeners are asked to adjust a stimulus along some dimensions until it fits some criterion: matches another stimulus, sounds most natural, matches a category, etc. (can be identification, discrimination, or rating exp.)
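
The noise conditions in Experiment 1 above (white noise at 2 dB above or below the RMS of the signal) come down to a short piece of arithmetic: scale the noise so that 20·log10(RMS noise / RMS signal) equals the target level. The sketch below is only one way this could be done, not the procedure actually used in the experiment. It assumes Gaussian white noise, an amplitude (20·log10) definition of dB, and a 22050 Hz reading of "22 kHz"; the function names `rms` and `mix_with_white_noise` and the test tone are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_white_noise(signal, noise_db_re_signal, seed=0):
    """Add white noise at a given level relative to the signal RMS.

    noise_db_re_signal: +2.0 means noise RMS is 2 dB above the signal RMS,
    -2.0 means 2 dB below. Returns (mixed, scaled_noise).
    Assumption: Gaussian noise and an amplitude (20*log10) dB scale.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_rms = rms(signal) * 10 ** (noise_db_re_signal / 20)
    scale = target_rms / rms(noise)
    scaled = [scale * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled


# Illustrative input: a 100 ms, 220 Hz tone standing in for a speech token.
SAMPLE_RATE = 22050  # assumed exact value for "22 kHz"
tone = [0.1 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE // 10)]

mixed_above, noise_above = mix_with_white_noise(tone, +2.0)
mixed_below, noise_below = mix_with_white_noise(tone, -2.0)

level_above = 20 * math.log10(rms(noise_above) / rms(tone))
level_below = 20 * math.log10(rms(noise_below) / rms(tone))
```

The "no noise" condition is simply the unmodified signal. A real stimulus pipeline would also need to guard against clipping when writing 16 bit samples, which this sketch leaves out.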
Advantages and shortcomings 1
Open identification
– Good: most natural, subjects understand
– Bad: time consuming, little control of variables, stats difficult (non-comparable responses across subjects)
Forced choice identification
– Good: less time consuming, control of response variables
– Bad: not as natural

Advantages and shortcomings 2
Discrimination
– Good: allows experimenter to map relationship between classification and discrimination
– Bad: very time consuming, not at all natural, unintuitive to subjects

Advantages and shortcomings 3
Rating
– Good: allows experimenter to map preferences in a multidimensional space, allows for correlation between one or more aspects of stimulus
– Bad: hard to control interactions between preferences and stimulus variables, not that natural

Advantages and shortcomings 4
Method of adjustment (MOA)
– Good: much quicker method of mapping a multidimensional perceptual space
– Bad: not natural, complex interaction of stimulus variables
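
One reason forced choice identification gives "control of response variables" is that every response falls into a fixed set of categories, so results from different subjects can be tallied in the same table. The sketch below shows one way such a tally might look for a 3 way "B D G" task. The function name `tally` and the trial list are hypothetical, and the responses are made up purely for illustration; they are not data from the experiment.

```python
from collections import Counter

CHOICES = ("B", "D", "G")  # response buttons in the 3 way forced choice task


def tally(trials):
    """Build a confusion matrix and proportion correct.

    trials: iterable of (intended, response) pairs, both drawn from CHOICES.
    Returns (matrix, proportion_correct), where matrix[intended][response]
    is the number of trials with that combination.
    """
    counts = Counter(trials)
    matrix = {s: {r: counts[(s, r)] for r in CHOICES} for s in CHOICES}
    total = sum(counts.values())
    correct = sum(counts[(c, c)] for c in CHOICES)
    return matrix, correct / total


# Made-up responses, for illustration only.
trials = [("B", "B"), ("B", "D"), ("D", "D"),
          ("G", "G"), ("G", "D"), ("D", "D")]
matrix, p_correct = tally(trials)
```

With open set identification the response categories are not fixed in advance, so a table like this cannot be built without first recoding each subject's answers.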