CHAPTER ONE

1.1 INTRODUCTION

Hearing a sound is different from listening to it, which in turn is different from comprehending it. Speech intelligibility refers to the ability to comprehend what is spoken. Humans perceive a great deal of sound without actively listening to it; such sound is referred to as background noise. Humans can also sometimes listen to sound without being able to understand it, for any of several reasons. Music is usually easier to understand than speech: in speech the information is ever-changing and the brain may not be able to fill in missing parts, whereas music contains repeated words and rhymes that help the listener follow what is being sung even after missing parts of it.

In terms of individual communication, speech is probably the most important and efficient means, even in today's multimedia society. Experimental tests by Chapanis (Andrew Marsh, UWA, 1999) show that, as would be expected, cooperative tasks performed in groups were completed up to ten times faster when speech was allowed than when it was not. In situations where egress is complex or difficult, such as in high-rise buildings or large factories, the human voice is often used to provide information.

Failure to understand message content can arise in several ways. A message that is not intelligible may not be understood. A message spoken in Spanish to an audience that only understands Cantonese will not be understood. A person talking rapidly or with a speech impediment can cause a message not to be understood. Even a well-spoken, intelligible message in the listener's native language can be misunderstood if it is not audible or if its delivery to the listener is distorted. These last failure mechanisms are the basis for the specification, modelling and measurement of speech intelligibility performance.
Thus, with many rooms being used solely for speech between individuals and groups, it is important that acoustic designs accommodate and enhance such use. In order to fully understand and make recommendations on speech intelligibility, there is first a need to understand how sound is perceived by humans.

1.2 ANATOMY OF THE HUMAN EAR

Plate 1: Diagram showing the anatomy of the ear. Source: virtualmedicalcentre.com

As the diagram shows, the ear is divided into three parts:

1.2.1 OUTER EAR
The outer part of the human ear is made up of the pinna, the ear canal (or external auditory meatus) and the eardrum (or tympanic membrane).

1.2.2 MIDDLE EAR
It is made up of the three ear ossicles, namely the hammer (malleus), attached to the eardrum, the anvil (incus) and the stirrup (stapes), together with the oval window and the Eustachian tube.

1.2.3 INNER EAR
It consists of the vestibule, the semicircular canals and the cochlea.

1.3 HEARING MECHANISM

Sound is a disturbance that originates from a source and travels through a medium, which could be air, liquid or solid, causing a sensation of hearing in animals. Sound is the means of auditory communication, including frog calls, bird songs and spoken language. The ear is the sense organ that detects sound, but interpreting sound is the function of the brain. Sometimes sound waves are transmitted to the inner ear by a method of hearing called bone conduction. For example, people hear their own voice partly by bone conduction: the voice causes the bones of the skull to vibrate, and these vibrations directly stimulate the sound-sensitive cells of the inner ear. Only a relatively small part of a normal person's hearing depends on bone conduction, but some totally deaf people can be helped if sound vibrations are transferred to the skull bones by a hearing aid. Humans hear primarily by detecting airborne sound waves, which are collected by the auricles.
The auricles also help locate the direction of sound. After being collected by the auricles, sound waves pass through the outer auditory canal to the eardrum, causing it to vibrate. The vibrations of the eardrum are then transmitted through the ossicles, the chain of bones in the middle ear. As the vibrations pass from the relatively large area of the eardrum through the chain of bones, which have a smaller area, their force is concentrated. This concentration amplifies, or increases, the sound. When the sound vibrations reach the stirrup, the stirrup pushes in and out of the oval window. This movement sets the fluids in the vestibular and tympanic canals in motion. To relieve the pressure of the moving fluid, the membrane of the round window bulges out and in. The alternating changes of pressure in the fluid of the canals cause the basilar membrane to move. The organ of Corti, which rests on the basilar membrane, also moves, bending its hair-like projections. The bent projections stimulate the sensory cells to transmit impulses along the auditory nerve to the brain.

The part of the ear dedicated to sensing balance and position sends impulses through the vestibular portion of the eighth cranial nerve. These impulses travel to the vestibular portion of the central nervous system. The messages from the right and left vestibular systems feed, by way of the right and left vestibular nerves, into the vestibular centres (nuclei) in the brainstem. These centres also receive input from the eyes, muscles, spinal cord and joints. Higher centres in the brain continue to process the information. The final result is an integrated system that allows us to maintain our balance in an ever-changing environment.

Plate 2: Diagram showing the hearing mechanism. Source: virtualmedicalcentre.com

CHAPTER TWO

2.1 SPEECH INTELLIGIBILITY

The intelligibility of speech refers to the accuracy with which a normal listener can understand a spoken word or phrase. Because some of the information communicated through speech is contained within contextual, visual and gestural cues, it is possible to understand meaning even if only a fraction of the discrete speech units are heard correctly. However, in large auditoria and places where reproduced speech is used, the listener has limited access to these cues and must rely more heavily upon the sound actually produced by the mouth.

2.1.1 MEASURING INTELLIGIBILITY

Research in this area began with the development of telephone and telecommunication systems in the early part of the twentieth century. A product of this research was a quantitative measure of intelligibility based on articulation testing. This procedure (as described by Lochner and Burger) normally consists of an announcer reading out lists of syllables, words or sentences to one or more listeners within the test enclosure. The percentage of these correctly recorded by the listeners is called the articulation score and is taken as an 'in-situ' measure of the speech intelligibility of that enclosure. The science of articulation testing was substantially refined at Bell Telephone Laboratories and later at the Psycho-Acoustic Laboratory at Harvard University. From this later work, a set of phonetically balanced, monosyllabic test lists was prepared, called the Harvard P.B.50 word lists. In order to negate any influence of non-phonetic cues on the measured intelligibility, these word lists comprise only meaningless or jumbled syllables. Thus, in order to be correctly recorded by the listener, each consonant and vowel sound must be clearly audible. As a further measure, many tests are conducted with the syllables embedded in a carrier phrase in an attempt to simulate fluent speech.
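For illustration, the articulation-score computation itself is straightforward. The sketch below is a hypothetical helper (not the Harvard P.B.50 procedure itself) that scores a listener's written responses against the announced test items:

```python
def articulation_score(announced, recorded):
    """Percentage of announced test items correctly recorded by a listener."""
    if len(announced) != len(recorded) or not announced:
        raise ValueError("need one recorded response per announced item")
    # Case and surrounding whitespace are ignored when comparing responses.
    correct = sum(a.strip().lower() == r.strip().lower()
                  for a, r in zip(announced, recorded))
    return 100.0 * correct / len(announced)
```

Averaging such scores over several listeners and word lists gives the measured intelligibility rating for the enclosure.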
There are now many derivatives of this methodology (such as the Fairbanks rhyme test used by Bradley and Latham); in each case, the resulting value is a percentage score of correctly recorded syllables. The degree of intelligibility is considered to correlate with the average of these scores, and this percentage becomes the measured speech intelligibility rating for that particular enclosure.

However, even under perfect conditions, the maximum word score normally attainable is about 95%, due to unavoidable errors. A word score of 80% enables the audience to understand every sentence without undue effort. In a room where the word score is closer to 70%, the listener has to concentrate to understand what is said, whilst below 60% the intelligibility is quite poor.

2.1.2 PREDICTING INTELLIGIBILITY

There are several available methods of predicting speech intelligibility within an enclosure. These include the articulation index (AI), the speech interference level (SIL), the A-weighted signal-to-noise ratio (SNA), useful/detrimental sound ratios (U80 and U95) and the speech transmission index (STI). Each of these methods is based on the same fundamental principle: determining a ratio between the received speech signal and the level of interfering noise. It is on this basic signal-to-noise relationship that speech intelligibility is deemed to depend: the higher the ratio, the greater the intelligibility. There are basically three measurable factors which influence these signal-to-noise ratios: the level and manner of the speech output; the level and spectrum of the background noise; and the nature and duration of the room response.

2.1.3 CALCULATING SPEECH INTELLIGIBILITY

Recently developed software packages can be used to calculate speech intelligibility. They are able to calculate reverberation time, critical distance, echo and sound levels over the seating areas of rooms.
The results show the inter-relationship between reverberation time, room finishes, loudspeaker selection and speech intelligibility. The calculation is carried out by inputting the physical parameters of the space into the program; an AutoCAD file can be used to transfer the data. The data entry includes surface materials, background noise and the seating layout. This calculation also supports auralization, in which an acoustic program simulates the sound as it would be heard after the project is built.

CHAPTER THREE

3.1 EFFECTS OF HUMAN MIDDLE-EAR MUSCLE CONTRACTIONS ON SPEECH INTELLIGIBILITY

An assessment of the relationship between the acoustic reflex and speech intelligibility was made. Monosyllabic words mixed with noise were presented at -6, -3, 0 and +3 dB signal-to-noise ratios. The word lists were presented with a 2000 Hz tone in the contralateral ear at a level 15 dB above or 20 dB below the acoustic reflex threshold, to evaluate intelligibility differences with the acoustic reflex contracted and relaxed. The results indicated that a significant decrement in speech intelligibility occurred with the reflex contracted at signal-to-noise ratios of -3 and 0 dB. Slight but not significant decrements were seen at -6 and +3 dB.

3.2 EFFECT OF AGE ON BINAURAL SPEECH INTELLIGIBILITY

Sentence perception performance, in quiet and in background noise, was measured in three groups of adult subjects categorized as young, middle-aged and elderly. Pure-tone audiometric thresholds, measures of inner-ear function, obtained in all subjects were within the clinically normal hearing range.
The primary purpose of this study was to determine the effect of age on speech perception; a secondary purpose was to determine whether the speech recognition problems commonly reported in elderly subjects might be due to alterations at sites central to the inner ear. Standardized sentence lists were presented under free-field conditions in order to invoke the binaural hearing that occurs at the brainstem level, and to simulate everyday speech-in-noise listening conditions. The results indicated: (1) an age effect on speech perception performance in quiet and in noise backgrounds; (2) that absolute pure-tone thresholds, conventionally obtained monaurally, do not accurately predict suprathreshold speech perception performance in elderly subjects; and (3) by implication, that the listening problems of the elderly may be influenced by auditory processing changes upstream of the inner ear.

CHAPTER FOUR

4.1 FACTORS AFFECTING SPEECH INTELLIGIBILITY

For speech to be intelligible, it must have adequate audibility (sound pressure level) and adequate clarity. For audibility, we are concerned with the signal-to-noise ratio. The voice is highly modulated, so while intelligibility measurements do incorporate audibility, it is not to the same standards used for the audibility of tone-generating systems.

4.1.1 BACKGROUND NOISE

Within every acoustic environment there is always a certain level of ambient background noise. Its level depends mostly on the activities taking place within the space and its immediate surrounds. The most obvious effect of background noise is that it masks the speech signal, reducing the signal-to-noise ratio so that the receiver must specifically concentrate on the speech. There is, however, another effect of background noise.
Short bursts of speech may be produced with considerable vocal effort, whereas longer conversations require a much lower level if they are to be comfortably sustained by the speaker. This, however, explains only one aspect of a speaker's choice of speech level. One of the most important determinants of speech level is what is termed the Lombard effect. This effect (originally noted by Lombard and studied further by Lane and Tranel) is most clearly illustrated when a person is required to speak whilst listening through headphones. In general, a speaker regulates his vocal effort using feedback from his own hearing and from the exertion of the muscles participating in the speech process. With headphones covering the ears and music masking much of this feedback, the voice is almost automatically raised to compensate. Lane and Tranel showed that background and other interfering noise have much the same effect, as do both temporary and permanent hearing loss. Quantifying this effect is difficult, as the individual response is often complex; for example, in normal conversation, rather than raising their voices, participants are more likely to move closer together. However, a rule-of-thumb relationship (as suggested by Lazarus) is that every 1 dB increase in interfering noise above 45 dB will result in an average rise in output speech level of 0.5 - 0.6 dB. This automatic rise does not normally occur at softer speech levels, as these are more likely to involve individual or face-to-face conversations; in those situations the clear preference is almost always to move closer together, so the effect applies only to speech levels of 55 dB or higher.

4.1.2 ROOM RESPONSE

The nature of the room response can significantly affect speech intelligibility. This influence, whether beneficial or detrimental, is a function of the impulse response.
In general, the enclosure will enhance the perception of speech when the amount of energy reaching the listener within the speech integration period (35-50 ms) is relatively high. Given that ambient background noise is constantly being reflected about the enclosure, any additional early speech reflections will effectively increase the apparent signal-to-noise ratio. However, late-arriving reflections and excessive reverberation actually add to the apparent background noise level by interfering with the direct speech signal. Thus, too much late sound energy will tend to reduce the apparent signal-to-noise ratio.

4.1.3 SPEECH LEVEL

In a disturbance-free environment, normal speech levels fall between 55 and 65 dB (measured at a distance of 1 m from the speaker). In specific situations, levels may reach as high as 96 dB when shouting a warning, or as low as 30 dB when whispering softly. It should be noted that speech levels vary significantly between individuals, even in similar acoustic conditions. Thus, when making predictions of speech intelligibility, it is often wise to base them on worst-case values rather than the average. This can be done by tempering the average value by an amount representing one standard deviation of inter-individual speech levels. Even this does not fully account for the variation at extremes of effort: as can easily be shown (Pearsons et al.), up to 15% of women cannot raise their voices above 75 dB, whilst 15% of shouting men can easily exceed 96 dB, sometimes reaching as high as 104 dB. The following table presents 'worst-case' values for vocal effort.

Vocal Effort        dB(A)
Whispering           32
Soft                 37
Relaxed              42
Normal (private)     47
Normal (public)      52
Raised               57
Loud                 62
Very Loud            67
Shouting             72
Max. Shout           77

Table 1: 'Worst-case' vocal effort and sound level.
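Taken together, Table 1 and the Lazarus rule of thumb suggest a simple estimate of speech level in noise. The sketch below is a rough illustration under stated assumptions (a fixed 0.55 dB rise per dB of noise above 45 dB, applied only at speech levels of 55 dB(A) or more); the names are hypothetical:

```python
# 'Worst-case' vocal effort levels from Table 1, in dB(A) at 1 m.
VOCAL_EFFORT_DBA = {
    "whispering": 32, "soft": 37, "relaxed": 42,
    "normal (private)": 47, "normal (public)": 52,
    "raised": 57, "loud": 62, "very loud": 67,
    "shouting": 72, "max. shout": 77,
}

def lombard_adjusted_level(speech_dba, noise_dba, rise_per_db=0.55):
    """Estimate the raised speech level caused by interfering noise.

    Lazarus' rule of thumb: each 1 dB of interfering noise above 45 dB
    raises the voice by roughly 0.5-0.6 dB, but only for speech levels of
    55 dB(A) or more (quieter talkers tend to move closer together instead).
    """
    if speech_dba < 55 or noise_dba <= 45:
        return speech_dba
    return speech_dba + rise_per_db * (noise_dba - 45)
```

For example, a 'raised' voice (57 dB(A) in Table 1) against 65 dB of interfering noise would be pushed to roughly 57 + 0.55 x 20 = 68 dB(A).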
When calculating intelligibility, it has been shown that loud or shouted speech is more difficult to understand, regardless of the level at the listener's ear. This is due mainly to changes in phonetics and intonation, which become noticeable above 75 dB. Additionally, if, for more normal speech, the level arriving at the ear is very high (greater than 80 dB), there is research to indicate an overloading of the ear. This may occur as a result of a very short speaker-listener distance and results in a further decrease in intelligibility.

CHAPTER FIVE

5.1 MEASURES OF SPEECH INTELLIGIBILITY

5.1.1 THE A-WEIGHTED SIGNAL-TO-NOISE RATIO (SNA)

This is probably the simplest and easiest to apply of all the methods proposed. It is simply the difference between the A-weighted long-term average speech level and the A-weighted long-term average level of background noise, measured over any particular time:

SNA = LSA - LNA

5.1.2 THE ARTICULATION INDEX (AI)

This value is basically a linear measure ranging from 0.0 to 1.0, based on calculations of the signal-to-noise ratios in five octave bands (with centre frequencies of 0.25, 0.5, 1, 2 and 4 kHz). It is possible to obtain a more accurate calculation based upon 1/3-octave-band sound pressure levels (based on work by Kryter); however, this requires more detailed knowledge of both the speech and noise spectra. Since the speech level usually refers to the long-term value for normal speakers, octave spectra are normally sufficient for simple calculations.

5.1.3 SPEECH TRANSMISSION INDEX (STI)

The STI is a measure of speech transmission quality. The absolute measurement of speech intelligibility is a complex science. The STI measures some physical characteristics of a transmission channel (a room, electro-acoustic equipment, a telephone line, etc.) and expresses the ability of the channel to carry across the characteristics of a speech signal.
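The SNA measure of 5.1.1 reduces to a one-line calculation once band levels have been combined into A-weighted totals. A minimal sketch follows; the A-weighting corrections are the standard octave-band values, while the helper names are hypothetical:

```python
import math

# Standard A-weighting corrections (dB) at octave-band centre frequencies (Hz).
A_WEIGHT_DB = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0, 2000: 1.2, 4000: 1.0}

def a_weighted_level(band_levels):
    """Combine octave-band SPLs (dB) into a single A-weighted level, dB(A)."""
    total = sum(10 ** ((spl + A_WEIGHT_DB[f]) / 10)
                for f, spl in band_levels.items())
    return 10 * math.log10(total)

def sna(speech_bands, noise_bands):
    """A-weighted signal-to-noise ratio: SNA = LSA - LNA."""
    return a_weighted_level(speech_bands) - a_weighted_level(noise_bands)
```

With identical spectral shapes 15 dB apart in every band, `sna` returns 15 dB, matching the intuition that the ratio is simply the level difference.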
STI is a well-established objective predictor of how the characteristics of the transmission channel affect speech intelligibility. STI predicts the likelihood of syllables, words and sentences being comprehended. As an example, for native speakers, this likelihood is given by:

STI Value    Quality (IEC 60268-16)   Syllables (%)   Words (%)   Sentences (%)
0 - 0.3      bad                      0 - 34          0 - 67      0 - 89
0.3 - 0.45   poor                     34 - 48         67 - 78     89 - 92
0.45 - 0.6   fair                     48 - 67         78 - 87     92 - 95
0.6 - 0.75   good                     67 - 90         87 - 94     95 - 96
0.75 - 1     excellent                90 - 96         94 - 96     96 - 100

Table 2: Speech intelligibility likelihood for native speakers.

Another method is defined for computing a physical measure that is highly correlated with the intelligibility of speech as evaluated by speech perception tests given a group of talkers and listeners. This measure is called the Speech Intelligibility Index, or SII.

5.2 RECOMMENDED ACOUSTICS FOR SPEECH

Unamplified speech sound levels at a distance of 3 metres:
30 dBA - whispering.
60 dBA - lecture voice.
70 dBA - loud actor (down to 50 dB after 30 rows of seats).

CHAPTER SIX

6.1 SPEECH REINFORCEMENT

The goal of a speech reinforcement system is to deliver the speaking voice to listeners with sufficient clarity to be understood. Given the complexity of the speech signal, the task of providing high-quality speech reinforcement in real-world, less-than-ideal conditions is doubly complicated. The following factors must be addressed.

6.1.1 MASKING

The most common obstacle that speech-system designers face is the intrusion of unwanted sounds that inevitably interfere with the speech signal. This effect is called "masking", a general term that covers a very wide variety of situations. Masking noise can come from acoustical sources such as ventilation equipment, traffic, crowds and, commonly, reverberation and echoes.
Masking can also arise electronically, from thermal noise, tape hiss or distortion products. If the sound system has unusually large peaks in its frequency response, the speech signal can even end up masking itself. One relationship between the strength of the speech signal and the masking sound is the signal-to-noise ratio, expressed in decibels. Ideally, the S/N ratio is greater than 0 dB, indicating that the speech is louder than the noise. Just how much louder the speech needs to be in order to be understood varies with, among other things, the type and spectral content of the masking noise. The most uniformly effective masker is broadband noise.

6.1.2 FREQUENCY RESPONSE

One of the most obvious aspects of sound-system performance that affects intelligibility is frequency response. Severely band-limited systems deliver speech poorly. For instance, telephones are generally limited to a 2 kHz bandwidth, and this makes it hard to distinguish between "f" and "s" or "d" and "t" sounds. High-quality speech systems need to cover the frequency range from about 80 Hz (for especially deep male voices) to about 10 kHz (for the best reproduction of consonants, which are crucial to intelligibility). Response below 80 Hz should be eliminated as far as possible: not only do these frequencies fall below the range of the speech signal, but they will also cause particularly destructive masking at high sound levels.

6.1.3 DISTORTION

Early studies of intelligibility in communication systems suggest that clipping the peaks of the speech signal, and then amplifying it to restore its peak-to-peak amplitude, improves intelligibility. The trick works in very noisy situations because clipping generates partials that are harmonically related to the fundamental, and thus less likely to mask the speech, and because it both accentuates consonants and increases the sound power of the signal.
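The peak-clipping trick described above can be sketched as follows. This is a simplified illustration (hard clipping at a fraction of the original peak, then rescaling), not a production limiter:

```python
def clip_and_restore(samples, clip_ratio=0.5):
    """Hard-clip speech peaks at clip_ratio * original peak, then rescale so
    the clipped signal regains the original peak amplitude. The result has
    the same peak level but more average power, accentuating consonants."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)
    threshold = clip_ratio * peak
    # Clamp every sample into [-threshold, +threshold].
    clipped = [max(-threshold, min(threshold, s)) for s in samples]
    # Amplify so the clipped peaks sit back at the original peak level.
    gain = peak / threshold
    return [s * gain for s in clipped]
```

Because peaks are flattened before amplification, low-level detail is boosted relative to the peaks, which is why the technique raises effective loudness in noise at the cost of added harmonic distortion.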
As such, it has been helpful in band-limited communication systems used in very noisy environments, such as the deck of an aircraft carrier.

6.1.4 TIME RESPONSE

Perhaps because it remains poorly understood and its effects are more subtle, phase response in communication systems has received scant attention. In fact, most published research about "phase" and intelligibility actually deals with the effects of relative polarity. It has been shown, for instance, that when speech is presented with noise over headphones, intelligibility increases by about 25% if the speech signal in one ear is inverted relative to the other ear. But this result has no application in sound reinforcement, other than for in-ear stage monitors.

REFERENCES

1. http://www.sdngnet.com/Files/Lectures/FUTA-ARC-507
2. Andrew Marsh, UWA, 1999. The School of Architecture and Fine Arts, The University of Western Australia.
3. Greinwald, John H. Jr., MD; Hartnick, Christopher J., MD. The Evaluation of Children with Hearing Loss. Archives of Otolaryngology - Head & Neck Surgery, 128(1):84-87, January 2002.
4. Stenström, J. Sten. Deformities of the ear. In: Grabb, W. C., Smith, J. S. (eds.): Plastic Surgery, Little, Brown and Company, Boston, 1979. ISBN 0-316-32269-5 (C), ISBN 0-316-32268-7 (P).
5. Lam, S. M. Edward Talbot Ely: father of aesthetic otoplasty. Archives of Facial Plastic Surgery, 6(1):64, Jan-Feb 2004.
6. Siegert, R. Combined reconstruction of congenital auricular atresia and severe microtia. Laryngoscope, 113(11):2021-7; discussion 2028-9, November 2003.
7. ENCARTA 2009.