Multimedia Systems
Chapter 3: Audio and Video Technology

Audio
• Audio is a wave resulting from air pressure disturbances that reach our eardrums, generating the sound we hear.
– Humans can hear frequencies in the range 20–20,000 Hz.
• 'Acoustics' is the branch of physics that studies sound.

Facsimile Technology
All modes of mass communication are based on the process of facsimile technology. That is, sounds from a speaker and pictures on a TV screen are merely representations, or facsimiles, of their original form. In general, the more faithful the reproduction or facsimile is to the original, the greater its fidelity. High-fidelity audio, or hi-fi, is a close approximation of the original speech or music it represents. And a videocassette recorder marketed as high fidelity boasts better picture quality than a VCR without it (video high fidelity is known as H-Q, to distinguish it from its audio counterpart).
The second point about facsimile technology is that in creating their facsimiles, radio and TV are not limited to plaster of Paris, crayon, oils, or even photographic chemicals and film. Instead, unseen elements such as radio waves, beams of light, and digital bits and bytes are used in the process.
Bear in mind that the engineer's goal in radio, TV, and cable is to:
• create the best possible facsimile of the original sound or image,
• transport that image without losing too much fidelity (such degradation is known as signal loss), and
• re-create that sound or image as closely as possible to its original form.
Today, engineers use both analog and digital systems to transport images and sounds, but more and more we are switching to digital transmission.

Transduction
• Another basic concept is transduction, the process of changing one form of energy into another, as when a telephone operator says "the number is 555-2796" and you write it down on a sheet of notepaper.
• Why does this matter? Getting a sound or picture from a TV studio or concert hall to your home usually involves at least three or four transductions. At each phase, loss of fidelity is possible and must be controlled. With our current system of broadcasting, at any phase the whole process may break down into noise (unwanted interference), rendering the communication impossible.

Sound and Audio
• Sound refers to the ability of vibrations to pass through a medium and reflect off surfaces; audio is the ability to create sound electronically, in digital form, using electronic equipment.
• Sound is a continuous wave that travels through air. The wave itself is composed of pressure differences. Sound is detected by measuring these pressure levels and their succession in time. The human ear does this detection naturally when the wave, with its pressure differences, impinges on the eardrum.
• The properties of sound include: frequency, wavelength, wave number, amplitude, sound pressure, sound intensity, speed of sound, and direction.
• The speed of sound determines how fast sound travels, and it differs depending on the medium through which the sound travels.
• The frequency is the rate at which the wave repeats, expressed in cycles per second, or hertz (Hz). The human ear can perceive wave frequencies between 20 Hz and 20 kHz; this is the audio range.
• The amplitude is a measure of the displacement of the wave from its mean. For human perception this is related to, but not the same as, loudness.
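As a quick illustration of frequency and wavelength, the sketch below (a minimal example in Python; it assumes the 344 m/s speed of sound quoted later in this chapter for humid air at 20 °C) computes the period and wavelength of tones at the two edges of the audible band:

```python
# Period and wavelength at the edges of the human audible band.
# Assumes a sound speed of 344 m/s (humid air at 20 C, the figure
# used later in this chapter).
SPEED_OF_SOUND = 344.0  # metres per second

for freq_hz in (20.0, 20_000.0):
    period_s = 1.0 / freq_hz                  # duration of one cycle
    wavelength_m = SPEED_OF_SOUND / freq_hz   # distance travelled in one cycle
    print(f"{freq_hz:8.0f} Hz: period = {period_s * 1000:7.3f} ms, "
          f"wavelength = {wavelength_m:7.3f} m")
# 20 Hz gives a 17.2 m wavelength; 20,000 Hz gives about 1.7 cm.
```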
[Figure: air pressure (amplitude) plotted against time, showing one period of one particular frequency component.]

• The wavelength of a sound is the distance the disturbance travels in one cycle; it is related to the sound's speed and frequency. However, in order to store this input in a computer, one has to convert it to a digital form, that is, into 0s and 1s. Further, a continuous wave has infinite resolution, which cannot be represented in a computer.

Waveform Representation
[Diagram: audio generation and playback chain: audio source → audio capture → sampling & digitization → storage or transmission → receiver → digital-to-analog conversion → playback (speaker) → human ear.]

Signal Generation
• This step involves the creation of the necessary oscillations, or detectable vibrations of electrical energy, which correspond to the frequencies of their original counterparts in nature. In plain language, signal generation involves getting the sound vibrations into a microphone, or the bits and bytes onto a CD, a DVD, or an MP3 player.

Audio Signal Generation
• Sound signals are generated by two main transduction processes: mechanical and electronic. Mechanical methods, like microphones, phonograph records, and tape recorders, have been in use for many years.

Mechanical Generation
• Mechanical means are used to translate sound waves into a physical form, one you can hold in your hand, like a phonograph record or an audiocassette.

Inside the Microphone
• One place where speech or music is mechanically re-created to produce electrical signals is inside a microphone. There are three basic types of microphones: dynamic, velocity, and condenser. Each produces the waveforms required for transmission in a different manner.

Dynamic Microphone
• In the center of the microphone is a coil of electrical wire, called a voice coil. Sound pressure vibrates the diaphragm, which moves the voice coil up and down between the magnetic poles.

Digitization
• Digitization is achieved by recording, or sampling, the continuous sound wave at discrete points. The more frequently one samples, the closer one gets to capturing the continuity of the wave.

Principles of Digitization
• Sampling: divide the horizontal axis (time) into discrete pieces.
• The other aspect of digitization is the measurement of the voltages at these discrete sampling points. These values may be of arbitrary precision; that is, we could have values containing small fractions or decimal numbers that take more bits to represent.
• Quantization: divide the vertical axis (signal strength, voltage) into pieces. For example, 8-bit quantization divides the vertical axis into 256 levels; 16-bit quantization gives 65,536 levels. The lower the quantization, the lower the quality of the sound.

Coding
• The process of representing quantized values digitally.

[Figure: sampling and quantization of a waveform over time.]

Sampling
• Sampling rate: the number of samples per second (measured in Hz).
• E.g., CD-standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).

Quantization
• 3-bit quantization gives 8 possible sample values.
• E.g., CD-standard audio uses 16-bit quantization, giving 65,536 values.

Why Quantize? To Digitize!
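Here is a minimal sketch of sampling, quantization, and coding together, assuming Python with NumPy and CD-style parameters (44,100 Hz, 16 bits); the helper names are my own, not part of any standard:

```python
import numpy as np

SAMPLE_RATE = 44_100   # CD-standard samples per second
BITS = 16              # CD-standard quantization depth (65,536 levels)

def sample_sine(freq_hz, duration_s):
    """Sampling: read the continuous wave at discrete points in time."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)

def quantize(samples, bits=BITS):
    """Quantization: map each sample in [-1, 1] to one of 2**bits levels."""
    levels = 2 ** (bits - 1)   # signed range: -levels .. levels - 1
    return np.clip(np.round(samples * levels), -levels, levels - 1).astype(np.int32)

# Coding: the quantized integers below are the digital representation
# that gets stored or transmitted.
codes = quantize(sample_sine(440.0, 0.001))
print(codes[:8])   # the first few 16-bit sample values of a 440 Hz tone
```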
Quantizing
• Instead of sending the actual sample, the sampled signal is first mapped to a known number of levels, and that set of levels is made known to the receiver.
• Suppose that instead of sending a whole range of voltages, the source informs the destination that it will send only 4 voltage levels, say 0–3 V. If a sample is 2.7 V, the source first converts it into a 3 V sample and then sends it through the transmission medium. If the destination receives a sample of 3.3 V, it immediately knows that this is not an agreed level, and hence that the sent value was changed in transit; the destination converts the 3.3 V sample back into 3 V.

Linear Quantization
• With linear quantization, every increment in the sampled value corresponds to a fixed-size analogue increment. E.g., an 8-bit A-D or D-A converter with a 0–1 V analogue range has 1 V / 256 = 3.9 mV per bit, regardless of the actual signal amplitude.

Non-linear Quantization
• With non-linear quantization you normally have some sort of logarithmic encoding, so that the increment for small sample values is much smaller than the increment for large sample values. Ideally the step size should be roughly proportional to the sample size.

Quality of Voice
• The quality of voice transmission is measured by the signal-to-noise (S/N) ratio: the original signal value divided by the change made when quantizing.

Why is linear quantizing not used?
• Comparing the S/N ratios calculated for a strong and a weak signal shows the problem: even though the quantization noise is the same for both signals, the S/N ratios are very different. Linear quantizing gives high S/N ratios for strong signals and low S/N ratios for weak signals (a numerical sketch below demonstrates this).

Nyquist Theorem
• Any analog signal consists of components at various frequencies. The simplest case is the sine wave, in which all the signal energy is concentrated at one frequency. In practice, analog signals usually have complex waveforms, with components at many frequencies. The highest frequency component in an analog signal determines the bandwidth of that signal: the higher the frequency, the greater the bandwidth, if all other factors are held constant.
• Suppose the highest frequency component, in hertz, of a given analog signal is fmax. According to the Nyquist theorem, the sampling rate must be at least 2fmax, or twice the highest analog frequency component. The sampling in an analog-to-digital converter is actuated by a pulse generator (clock). If the sampling rate is less than 2fmax, some of the highest frequency components in the analog input signal will not be correctly represented in the digitized output (a short sketch below demonstrates this aliasing).
• [Figure: a sine wave sampled once per cycle appears as a constant signal; sampled 1.5 times per cycle, it appears as a low-frequency sine.]
• For lossless digitization, the sampling rate should be at least twice the maximum frequency present.

Characteristics of Audio
• Audio has normal wave properties:
– Reflection
– Refraction
– Diffraction
• A sound wave has several different properties:
– Amplitude (loudness/intensity)
– Frequency (pitch)
– Envelope (waveform)
• Refraction occurs when a wave crosses a boundary from one medium to another. A wave entering a medium at an angle will change direction.
• Diffraction refers to the "bending of waves around an edge" of an object. Diffraction depends on the size of the object relative to the wavelength of the wave.
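To see numerically why linear quantization penalizes weak signals, here is a rough sketch (assumptions: Python with NumPy, 8-bit quantization over a ±1 V range, and µ-law companding with µ = 255 as the example of logarithmic encoding; the slides do not specify these choices):

```python
import numpy as np

MU = 255.0  # mu-law parameter; G.711 telephony uses mu = 255

def compress(x):
    # Logarithmic encoding: small samples get finer effective steps.
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def quantize(x, bits=8):
    # Uniform (linear) quantization of values in [-1, 1].
    step = 2.0 / 2 ** bits
    return np.round(x / step) * step

def snr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

t = np.linspace(0.0, 1.0, 8000, endpoint=False)
for amp in (1.0, 0.01):                   # a strong and a weak signal
    x = amp * np.sin(2 * np.pi * 440 * t)
    print(f"amplitude {amp}: linear {snr_db(x, quantize(x)):5.1f} dB, "
          f"mu-law {snr_db(x, expand(quantize(compress(x)))):5.1f} dB")
```

The linear S/N drops sharply for the quiet signal, while the logarithmic scheme keeps it roughly constant, which is why telephony uses companding rather than linear quantizing.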
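And a small numeric check of the Nyquist theorem (again a sketch assuming Python with NumPy; the specific frequencies are my own choice): a 1,000 Hz sine sampled at only 1,500 Hz, below the required 2,000 Hz, yields exactly the same samples as a phase-inverted 500 Hz sine, so the high frequency is misread as a low one:

```python
import numpy as np

fs = 1500.0              # sampling rate, below 2 * fmax = 2000 Hz
t = np.arange(100) / fs  # sampling instants

high = np.sin(2 * np.pi * 1000 * t)    # the signal we meant to capture
alias = np.sin(-2 * np.pi * 500 * t)   # a 500 Hz sine with inverted phase

# At this sampling rate the two waves are indistinguishable:
print(np.allclose(high, alias))  # True: the 1000 Hz tone aliases to 500 Hz
```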
Decibel (dB)
• The decibel (dB) is a logarithmic unit used to describe a ratio. The ratio may be of power, voltage, intensity, or several other things.
• Suppose we have two loudspeakers, the first playing a sound with power P1 and the second playing a louder version of the same sound with power P2, with everything else (distance, frequency) kept the same.
• The difference in decibels between the two is given by 10 log10(P2/P1) dB.
• If the second produces twice as much power as the first, the difference in dB is 10 log10 2 ≈ 3 dB.
• If the second had 10 times the power of the first, the difference in dB would be 10 log10 10 = 10 dB.
• If the second had a million times the power of the first, the difference in dB would be 10 log10 1,000,000 = 60 dB.

What happens when you halve the sound power?
• The log of 2 is 0.3, so the log of 1/2 is -0.3. So, if you halve the power, you reduce the sound level by 3 dB. Halve it again (down to 1/4 of the original power) and you reduce the level by another 3 dB.
• For example, take a sample of white noise (a mix of all audible frequencies, just as white light is a mix of all visible frequencies) and a second sample of the same noise with the voltage reduced by a factor of the square root of 2. Since 2^(-1/2) is approximately 0.7, -3 dB corresponds to reducing the voltage or the pressure to 70% of its original value.

How big is a decibel?
• One decibel is close to the just noticeable difference (JND) for sound level. In a sequence of samples, each 1 dB quieter than the last, the final sample is noticeably quieter than the first, but it is rather less clear to the ear that each sample is quieter than its predecessor.
• 10 log10(1.26) ≈ 1, so to increase the sound level by 1 dB, the power must be increased by 26%, or the voltage by 12%.

Standard reference levels ("absolute" sound level)
• When the decibel is used to give the sound level for a single sound rather than a ratio, a reference level must be chosen. For sound in air, the reference pressure is usually chosen as 20 micropascals (0.02 mPa).
• This is very low: about 2 ten-billionths of an atmosphere. Nevertheless, it is about the limit of sensitivity of the human ear in its most sensitive range of frequency. Usually this sensitivity is found only in rather young people or in people who have not been exposed to loud music or other loud noises.
• Personal music systems with in-ear speakers ('walkmans') are capable of very high sound levels in the ear, and are believed by some to be responsible for much of the hearing loss in young adults in developed countries.
• So if you read of a sound pressure level of 86 dB, it means that 20 log10(p2/p1) = 86 dB, where p1 is the sound pressure of the reference level and p2 that of the sound in question. Divide both sides by 20: log10(p2/p1) = 4.3, so p2/p1 = 10^4.3.
• Since 4 is the log of ten thousand and 0.3 is the log of 2, this sound has a sound pressure 20 thousand times greater than that of the reference level (p2/p1 = 20,000). 86 dB is a loud but not dangerous level of sound, if it is not maintained for very long.

What does 0 dB mean?
• This level occurs when the measured intensity is equal to the reference level, i.e. it is the sound level corresponding to 0.02 mPa. In this case we have sound level = 20 log10(p_measured/p_reference) = 20 log10 1 = 0 dB.
• So 0 dB does not mean no sound; it means a sound level where the sound pressure is equal to that of the reference level. This is a small pressure, but not zero. It is also possible to have negative sound levels: -20 dB would mean a sound with pressure 10 times smaller than the reference pressure, i.e. 2 micropascals.
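The arithmetic in this section is easy to replicate in code. A minimal sketch (assuming Python with NumPy; the function names are mine):

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure: 20 micropascals, in pascals

def spl_db(pressure):
    """Sound pressure level in dB relative to the 20 uPa reference."""
    return 20 * np.log10(pressure / P_REF)

def pressure_ratio(level_db):
    """Pressure relative to the reference for a given level in dB."""
    return 10 ** (level_db / 20)

print(round(pressure_ratio(86)))   # ~19953: roughly 20,000 x the reference
print(spl_db(P_REF))               # 0 dB: pressure equal to the reference
print(spl_db(0.1 * P_REF))         # -20 dB: a tenth of the reference pressure
```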
Audio Amplitude
• In microphones, audio is captured as an analog signal (continuous in amplitude and time) that responds proportionally to the sound pressure p.
• The power in a sound wave, all else being equal, goes as the square of the pressure.
– Sound pressure is expressed in dynes/cm².
• The difference in sound pressure level between two sounds with pressures p1 and p2 is therefore 20 log10(p2/p1) dB.
• The "acoustic amplitude" of sound is measured in reference to p1 = pref = 0.0002 dynes/cm².
– The human ear is insensitive to sound pressure levels below pref.

Audio Amplitude
Typical intensity levels:
• 0 dB: threshold of hearing
• 20 dB: rustling of paper
• 25 dB: recording studio (ambient level)
• 40 dB: residence (ambient level)
• 50 dB: office (ambient level)
• 60–70 dB: typical conversation
• 80 dB: heavy road traffic
• 90 dB: home audio listening level
• 120–130 dB: threshold of pain
• 140 dB: rock singer screaming into microphone

Audio Frequency
• Audio frequency is the number of high-to-low pressure cycles that occur per second.
– In music, frequency is referred to as pitch.
• Different living organisms have different abilities to hear high-frequency sounds:
– Dogs: up to 50 kHz
– Cats: up to 60 kHz
– Bats: up to 120 kHz
– Dolphins: up to 160 kHz
– Humans: 20 Hz to 20 kHz, called the audible band.
• The exact audible band differs from one person to another and deteriorates with age.

Audio Frequency
• The frequency range of sounds can be divided into:
– Infrasound: 0 Hz – 20 Hz
– Audible sound: 20 Hz – 20 kHz
– Ultrasound: 20 kHz – 1 GHz
– Hypersound: 1 GHz – 10 GHz
• Sound waves propagate at a speed of around 344 m/s in humid air at room temperature (20 °C).
– Hence, audio wavelengths typically vary from 17 m (corresponding to 20 Hz) to 1.7 cm (corresponding to 20 kHz).
• Sound can be divided into periodic sounds (e.g. whistling wind, bird songs, sound from music) and non-periodic sounds (e.g. speech, sneezes, and rushing water).

Audio Frequency
• Most sounds are combinations of different frequencies and wave shapes. Hence, the spectrum of a typical audio signal contains:
– one or more fundamental frequencies,
– their harmonics,
– and possibly a few cross-modulation products.
• The harmonics and their amplitudes determine the tone quality, or timbre.

Audio Envelope
• When sound is generated, it does not last forever. The rise and fall of the intensity of the sound is known as the envelope.
• A typical envelope consists of four sections: attack, decay, sustain, and release (see the sketch after this list).
– Attack: the intensity of a note increases from silence to a high level.
– Decay: the intensity decreases to a middle level.
– Sustain: the middle level is sustained for a short period of time.
– Release: the intensity drops from the sustain level to zero.
• Different instruments have different envelope shapes:
– Violin notes have slower attacks but a longer sustain period.
– Guitar notes have quick attacks and a slower release.
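To close, here is a rough sketch of the four envelope stages in code (assuming Python with NumPy; the durations and the 0.6 sustain level are arbitrary illustrative values, not figures from the slides):

```python
import numpy as np

FS = 44_100  # samples per second

def adsr(attack, decay, sustain_time, release, sustain_level=0.6):
    """Piecewise-linear ADSR envelope; all durations in seconds."""
    a = np.linspace(0.0, 1.0, int(attack * FS))             # silence -> peak
    d = np.linspace(1.0, sustain_level, int(decay * FS))    # peak -> middle level
    s = np.full(int(sustain_time * FS), sustain_level)      # hold the middle level
    r = np.linspace(sustain_level, 0.0, int(release * FS))  # middle level -> zero
    return np.concatenate([a, d, s, r])

# A guitar-like note: quick attack and a slow release.
env = adsr(attack=0.01, decay=0.05, sustain_time=0.2, release=0.5)
t = np.arange(env.size) / FS
note = env * np.sin(2 * np.pi * 440 * t)  # shape a 440 Hz tone with the envelope
print(f"{note.size / FS:.2f} seconds of audio")
```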