PS: Introduction to Psycholinguistics
Winter Term 2005/06
Instructor: Daniel Wiechmann
Office hours: Mon 2-3 pm
Email: [email protected]
Phone: 03641-944534
Web: www.daniel-wiechmann.net

Session 4: Understanding speech

Problems with recognition of speech
Segmentation problem (how to separate sounds in speech)
Possible remedies:
  Possible-word constraint
  Metrical segmentation strategy
  Stress-based segmentation
  Syllable-based segmentation

Categorical perception
Experiment: Liberman et al. (1957)
A speech synthesizer creates a continuum of artificial syllables that differ in the place of articulation of one phoneme
Subjects placed the syllables into three categories (/b/, /d/, /g/)

Categorical perception: voice onset time (VOT)
Voiced and unvoiced consonants (e.g. /b/, /d/ vs /p/, /t/) differ with respect to VOT (difference ~ 60 ms)
Experimenters varied VOT on a scale (e.g. 30 ms)
Subjects make 'either-or' distinctions

Categorical perception: selective adaptation
Repeated presentation of /ba/ makes people less sensitive to the voicing feature (fatigued feature detector)
The cut-off point for the /b/-/p/ distinction shifts toward the /p/ end of the continuum

Prelexical (phonetic) vs postlexical (phonemic) code
The prelexical code is computed directly from perceptual analysis (bottom-up)
The postlexical code is computed from higher-level units such as words (top-down)
Foss and Blank (1980): phoneme-monitoring task
But cf. Foss and Gernsbacher (1983) and Marslen-Wilson and Warren (1994)

In summary: There is a controversy about whether or not we identify phonemes before we recognize higher-level units (e.g. syllables or words)

The role of context in identifying sounds: the phonemic restoration effect (cf.
Warren and Warren 1970)

Sentences presented (* marks the missing sound):
  It was found that the *eel was on the orange
  It was found that the *eel was on the axle
  It was found that the *eel was on the shoe
  It was found that the *eel was on the table

What listeners reported hearing:
  It was found that the peel was on the orange
  It was found that the wheel was on the axle
  It was found that the heel was on the shoe
  It was found that the meal was on the table

Phonemic restoration effect: 2 explanations
1. Context interacts directly with bottom-up processes (sensitivity effect)
2. Context may simply provide an additional source of information (response bias effect)

Understanding speech: Samuel (1981, 1990)
Method:
Subjects listened to sentences, and meaningless noise was presented during each sentence
On some trials, noise was superimposed on one of the phonemes of a word
On other trials, the phoneme was deleted
Finally, sometimes the phoneme was predictable from context
Task: decide whether or not the crucial phoneme had been presented

Hypotheses
1. If context improves sensitivity, then the ability to discriminate between phoneme plus noise and noise alone should be improved by predictable context
2.
If context affects response bias, then participants should simply be more likely to decide that the phoneme was presented when the word appeared in a predictable context

Understanding speech: Samuel (1981, 1990)
Results:
Context affected response bias but not sensitivity
Contextual information does not have a direct effect on bottom-up processing

Understanding speech: Models of speech recognition
Most influential models:
  Motor theory (Liberman et al. 1967): listeners mimic the articulatory movements of the speaker
  Cohort theory (Marslen-Wilson and Tyler 1980)
  TRACE model (McClelland and Elman 1986)

Understanding speech: Models of speech recognition: neurons

Understanding speech: Models of speech recognition: neuron (schematic)
Synapse: the junction across which a nerve impulse passes from an axon terminal to a neuron

Understanding speech: Models of speech recognition: neuronal networks
The brain is composed of an estimated 10-100 billion nerve cells, or neurons, that communicate with one another through specialized contacts called synapses. Typically, a single neuron receives 2000-5000 synapses from other neurons; these synapses are located almost exclusively on the neuron's dendrites, long projections that radiate out from the neuron's cell body. In turn, the neuron's axon, a long thin process that grows out from the cell body, makes synaptic connections with about 1000 other neurons. In this way, neuronal signals pass from neuron to neuron to form extensive and elaborate neural circuits.
Understanding speech: Models of speech recognition: number of neurons human brain

Understanding speech: Models of speech recognition: introducing connectionist models
Two central assumptions of artificial neural nets (ANN):
1) processing occurs through the action of many simple, interconnected processing units (neurons)
2) activation spreads around the network in a way determined by the strength of the links, i.e. the connections between units

Some models learn (e.g. via back-propagation); some don't
Interactive activation model (IAC), McClelland and Rumelhart (1981): does not learn
TRACE model (McClelland and Elman 1986) is an IAC model

Understanding speech: Models of speech recognition: from neural networks to connectionist models
Connections can be inhibitory or excitatory (facilitatory)
Connections (or links) have different weights
Threshold: the total amount of activation needed to make the node fire

Example 1 (weights range from -1 to +1):
  + 0.6 (excitatory)
  - 0.5 (inhibitory)
  + 0.7 (excitatory)
  Threshold: 1.0
  Ergo: no firing (0.6 - 0.5 + 0.7 = 0.8 < 1.0)

Example 2 (network diagram; weights range from -1 to +1):
  Values shown: - 0.5, + 0.9 (excitatory), - 0.2 (inhibitory), - 0.9, + 0.4 (excitatory), + 0.5
  Threshold: 1.0
  Ergo: firing

Figure: Interactive activation network (McClelland and Rumelhart 1981)

Understanding speech: Models of speech recognition: TRACE
TRACE model (McClelland and Elman 1986)
There are individual processing units, or nodes, at three different levels:
  FEATURES (place & manner of production, voicing)
  PHONEMES
  WORDS
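The threshold rule from the connectionist slides above can be checked with a few lines of Python. This is only a minimal sketch: the function names `net_input` and `fires` are made up for illustration, it assumes every sending unit is fully active (activation 1.0) unless stated otherwise, and it assumes a node fires as soon as the summed input reaches the threshold.

```python
def net_input(weights, activations=None):
    """Sum of weight * activation over all incoming connections.

    Assumption: if no activations are given, every sending unit is
    fully active (activation 1.0).
    """
    if activations is None:
        activations = [1.0] * len(weights)
    return sum(w * a for w, a in zip(weights, activations))


def fires(weights, threshold=1.0, activations=None):
    """True if the summed input reaches the threshold (assumed rule: >=)."""
    return net_input(weights, activations) >= threshold


# First slide example: +0.6 (excitatory), -0.5 (inhibitory), +0.7 (excitatory)
slide_weights = [0.6, -0.5, 0.7]
slide_net = net_input(slide_weights)
slide_fires = fires(slide_weights)

print(round(slide_net, 2), slide_fires)  # 0.8 False
```

Without the inhibitory connection the same node would receive 0.6 + 0.7 = 1.3 and fire, which shows how a single inhibitory link can keep a node below threshold.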
Models of speech recognition: TRACE
TRACE model (McClelland and Elman 1986)
Feature nodes are connected to phoneme nodes
Phoneme nodes are connected to word nodes
Connections between levels operate in both directions and are only facilitatory (i.e. no inhibition)
There are connections among units or nodes at the same level
These connections are inhibitory
Nodes influence each other in proportion to their activation levels and the strength of their interconnections
As excitation and inhibition spread among nodes, a pattern of activation, or trace, develops
The word that is recognized is determined by the activation level of the possible candidate words.
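The spreading of excitation and inhibition described above can be illustrated with a toy network. The following is a rough sketch, not the McClelland and Elman implementation: it leaves out the feature level and TRACE's time slices, it uses an update rule in the style of interactive activation models, and all names (`WORDS`, `PHONEME_RIVALS`, `step`, `run`) and parameter values are assumptions chosen for illustration. External input to the phoneme nodes stands in for evidence arriving from the feature level; the first sound is ambiguous but leans toward /b/.

```python
# Illustrative parameters (assumptions, not values taken from the slides)
MAX_ACT, MIN_ACT, REST, DECAY = 1.0, -0.2, 0.0, 0.1
EXCITE, INHIBIT = 0.1, -0.2

# Word nodes and the position-coded phoneme nodes they connect to
WORDS = {"bat": ["b1", "a2", "t3"], "pat": ["p1", "a2", "t3"]}
PHONEME_RIVALS = [("b1", "p1")]  # phonemes competing for the same position


def build_weights():
    """Between-level links are excitatory and bidirectional;
    within-level links are inhibitory."""
    weights = {}
    for word, phonemes in WORDS.items():
        for ph in phonemes:
            weights[(ph, word)] = EXCITE   # bottom-up
            weights[(word, ph)] = EXCITE   # top-down
    for a, b in PHONEME_RIVALS + [("bat", "pat")]:
        weights[(a, b)] = INHIBIT
        weights[(b, a)] = INHIBIT
    return weights


def step(act, weights, external):
    """One synchronous update of all node activations."""
    new_act = {}
    for unit, a in act.items():
        net = external.get(unit, 0.0)
        for (src, dst), w in weights.items():
            if dst == unit:
                # only positively active nodes send signals
                net += w * max(act[src], 0.0)
        if net > 0:
            delta = net * (MAX_ACT - a)    # push toward the maximum
        else:
            delta = net * (a - MIN_ACT)    # push toward the minimum
        delta -= DECAY * (a - REST)        # decay back toward rest
        new_act[unit] = min(MAX_ACT, max(MIN_ACT, a + delta))
    return new_act


def run(external, cycles=30):
    weights = build_weights()
    units = {u for pair in weights for u in pair}
    act = {u: REST for u in units}
    for _ in range(cycles):
        act = step(act, weights, external)
    return act


# Ambiguous first sound: slightly more evidence for /b/ than for /p/
external = {"b1": 0.3, "p1": 0.2, "a2": 0.3, "t3": 0.3}
final = run(external)
winner = max(WORDS, key=final.get)
print(winner)  # bat
```

Because the word nodes inhibit each other, the small advantage of /b/ over /p/ is passed up to the word level, so "bat" ends with the higher activation and is the recognized candidate in this sketch.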