Multimedia Object Types: Sound
ISMT Multimedia
Dr Vojislav B Mišić

Why sound?
* One of the fundamental sensing mechanisms for humans (arguably not the most important …)
* Hearing supports vision in our perception of the outside world
* Hearing also provides the primary sensing mechanism in cases where vision cannot function well enough (or does not function at all)

ISMT Multimedia Lecture 04/1 © 2001 Dr. Vojislav B. Mišić

Tones and sounds
* Sounds are often classified in three categories:
  * Speech
  * Music
  * Noise

Speech and voice
* Of all sounds, the human voice is probably the most absorbing
* Different narrators, ages, genders, accents and intonations give rise to different voice effects
* Computer-generated speech may be used for simple messages; it is still not good enough for complex narrations

Role of music
* Music is often used as an emotional and alluring enhancement to a project
* Music can be used for creating background
* Music may be a prominent focal point in some projects

Using sound in presentations
* Musical tones can create a sense of harmony (or the contrary)
* Music can invoke different emotions (depending on context, age, gender, culture, …)
* Sound can create a mood, an atmosphere to enhance the effect of visible stimuli
* Sound and vision should act together towards the desired experience
* The sum is often more effective than the parts

Using sound in interfaces
* Sound can provide humorous or serious accompaniment to events and content
* Sound in an interface can
  * alert users of problems or opportunities
  * mask transitions
  * acknowledge user actions
  * convey information
  * divert attention from other processes

Designing sound
* Rely on your movie/TV consumer experience
* Sound can be background music, voice, or sound effects
* Effective sounds can attract users; ineffective ones can drive them away
* Avoid silent intervals, unless silence is specifically used as the accompaniment
* Use sound effects, but don’t overdo it
* Spatial sound effects can enhance listener experience

Designing dynamics
* The dynamics of sound don’t need to be natural
  * shooting can be less loud than it really is
  * whispering can be louder than it really is
* Proper sound dynamics should follow the presentation

Back to the Basics
* What is sound?
* Say, for example: is there a sound when a firecracker explodes at the North Pole?
  * Psychologists say: no (no humans to hear it)
  * Biologists say: no (no humans, no bears to hear it)
  * Physicists say: yes (there are waves in the air)

Physics says …
* Variations of the air pressure in the range 20-20,000 Hz (40-15,000 Hz is more likely, especially for older people)
* Sensitivity depends on frequency (Fletcher-Munson curves) and on sound level
* Very low energy: 90 dB ~ 10^-3 W/m^2
* Intensity expressed on a logarithmic scale (decibel, dB)
  * 6 dB higher = twice the pressure
  * 20 dB higher = 10 times the pressure
* Audible dynamic range about 120 dB

Fletcher-Munson Curves
[figure]

Human Ear
[figure]

Sound reception and perception
* Received through ears, processed in the inner ear, final processing / recognition performed by the brain
* Processing apparatus rather complex and sensitive

… hear no evil …
* Sound reception cannot be consciously switched off (you cannot close your ears, really) ...
* ... but it can be masked semi-consciously
* Which can be useful in noisy environments (and for other things as well, but more on that later)

Distortion
* Humans can tolerate large distortions, while still understanding spoken words, recognizing the speaker, or following a melody
* Humans can also detect very small distortions, especially for well known sounds
* Suitable for recognition purposes, with humans still outperforming computers

Spatial Perception and How to Create It
* Binaural listening enables spatial perception, based on intensity and phase differences between signals from left and right ear
* The shape of our ears determines directional sensitivity
  * left/right is the highest
  * then front/back
  * and (finally) up/down
* Two channels are standard, more complex schemes emerging recently (4.1, 5.1, etc.)
* Should be sufficient to create a spatial sound image

Hearing vs. Vision
* Usable frequency range is 8 to 10 octaves (only 1 for vision)
* Dynamic range is higher
* Detectable distortions are much smaller
* Sound recognition is generally better
* In other words, ear is a better receptor

Audio standards
* analog audio: cassette tape, audio (vinyl) records, sound from video tapes
* system beeps and sounds (Mac, Windows)
* digital audio
* MP3: perceptual coding
* MIDI (slowly fading into oblivion)
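
As a side calculation for the decibel figures on the "Physics says …" slide, the following is a minimal sketch of the logarithmic scale. The function names are hypothetical, and the reference intensity of 10^-12 W/m^2 (the conventional threshold of hearing) is an assumption that the slides do not state.

```python
import math

# Assumption: conventional reference intensity (threshold of hearing), in W/m^2.
REFERENCE_INTENSITY = 1e-12


def pressure_ratio_to_db(ratio: float) -> float:
    """Level difference in dB that corresponds to a sound-pressure ratio."""
    return 20.0 * math.log10(ratio)


def db_to_pressure_ratio(db: float) -> float:
    """Sound-pressure ratio that corresponds to a level difference in dB."""
    return 10.0 ** (db / 20.0)


def intensity_from_db(db: float, reference: float = REFERENCE_INTENSITY) -> float:
    """Intensity in W/m^2 for a level given in dB above the reference."""
    return reference * 10.0 ** (db / 10.0)


if __name__ == "__main__":
    print("twice the pressure   :", round(pressure_ratio_to_db(2), 2), "dB")
    print("10 times the pressure:", round(pressure_ratio_to_db(10), 2), "dB")
    print("90 dB intensity      :", intensity_from_db(90), "W/m^2")
    print("120 dB pressure ratio:", db_to_pressure_ratio(120))
```

Pressure ratios use a factor of 20 and intensity ratios a factor of 10, because intensity is proportional to the square of the pressure. Under the assumed reference, 90 dB comes out at roughly 10^-3 W/m^2, and the 120 dB audible range corresponds to a pressure ratio of about a million to one.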

Digital audio
* Digitization: conversion of a continuous analog signal to a sequence of digital values
* Sampling frequency must be at least twice the highest frequency in the signal spectrum (Nyquist); for audio it is about 40 kHz
* CD quality recording: 44,100 Hz (44,056 Hz is sometimes used for TV compatibility)
* Lower sampling frequencies may be used … with corresponding loss of quality

Dynamic range
* Coding of samples depends on the desired quality and on the available storage or bandwidth
* 16-bit coding gives 65,536 possible signal levels
* … or over 96 dB dynamic range (each extra bit adds about 6 dB of signal-to-noise ratio)
* … which is quite sufficient, even for classical music, because most of us don’t have the proper environment to enjoy it
* Fewer bits result in lower quality, e.g., 8 bits per sample: telephone quality

Storage requirements
* CD quality sound (stereo) requires about 172 KB/s
* Hence, 1 minute takes about 10.5 MB (oops!)
* Red Book audio CDs have 680 MB capacity, hold about 70 minutes of music
* Hence, lower sampling rates are used, with the associated loss in color/brightness

Why Not Compress It?
* Sound does not lend itself to compression easily (unlike video) because
  * Redundancy is low
  * Receptors are unusually good at spotting distortion
* Some attempts were made (RealAudio) but the compression factor is not high, or the quality is audibly deteriorated
* Perceptual Coding (as used in MP3 and other schemes) to the rescue …
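
The numbers quoted on the "Digital audio", "Dynamic range" and "Storage requirements" slides can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical function names; it assumes uncompressed 16-bit stereo PCM at 44,100 Hz and reports sizes in binary units (1 KB = 1024 bytes), which seems to be what the 172 KB/s figure uses.

```python
import math

SAMPLE_RATE_HZ = 44_100   # CD sampling frequency
BITS_PER_SAMPLE = 16      # CD sample resolution
CHANNELS = 2              # stereo


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest signal frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2


def quantization_levels(bits: int) -> int:
    """Number of distinct sample values available with the given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio of the largest to the smallest step, in dB (about 6 dB per bit)."""
    return 20 * math.log10(2 ** bits)


def bytes_per_second(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * (bits // 8) * channels


if __name__ == "__main__":
    rate = bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
    print("Nyquist limit :", nyquist_limit_hz(SAMPLE_RATE_HZ), "Hz")
    print("Levels        :", quantization_levels(BITS_PER_SAMPLE))
    print("Dynamic range :", round(dynamic_range_db(BITS_PER_SAMPLE), 1), "dB")
    print("Data rate     :", round(rate / 1024, 1), "KB/s")
    print("One minute    :", round(rate * 60 / 1024 ** 2, 1), "MB")
```

Calling `dynamic_range_db(8)` gives roughly half the 16-bit figure, which is one way to see why 8-bit samples are described as telephone quality.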

Audio editing
* Set proper recording levels (keep peak levels low enough, or you will get distortion on loud passages)
* Always record your sounds at the highest possible frequency and resolution, then down-sample to the desired frequency and/or resolution
* Postproduction:
  * trimming
  * splicing
  * assembly and mixing different sources
  * equalization
  * time stretching
  * digital effects
  * format conversion
  * resampling and downsampling

Audio file formats
* Several formats exist for raw digital audio files:
  * WAV (Microsoft)
  * AU (Sun/NeXT audio)
  * AIFF
* with different sampling frequencies, coding options, mono/stereo, …
* Fortunately, all players can play all formats …

Basics of Perceptual Coding
* For all of our senses (or for vision and hearing at least) the amount of information actually received is much higher than the amount of information processed
* Therefore, it would be advantageous to try to code and store only the information which is actually processed, and simply discard the remainder
* Now, the trick is: how to find this really important information …

What We Hear and Don’t Hear
* Loudness masking: a signal may be fairly audible by itself, but may be masked by another, possibly louder, signal at a different frequency

What We Hear and Don’t Hear
* Temporal masking: a signal may be fairly audible by itself, but another signal at a different frequency may render it temporarily imperceptible
* Joint stereo: in a stereo signal (i.e., left and right signals of the same recording) the difference between the two is not too big, and it’s often limited to higher frequency
* Remember subwoofers: there’s only one per stereo system
* So, when you combine all of these …
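
The joint stereo idea above can be illustrated with a mid/side transform, which is one common way of exploiting the similarity between the left and right channels. The slides do not say which variant is meant, so treat this as an illustrative assumption; the function names and sample values are made up for the example.

```python
from typing import List, Tuple


def to_mid_side(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid: List[float], side: List[float]) -> Tuple[List[float], List[float]]:
    """Recover the left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Made-up samples: the two channels are nearly identical, as in most recordings.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.625]

mid, side = to_mid_side(left, right)
restored_left, restored_right = from_mid_side(mid, side)

if __name__ == "__main__":
    print("mid :", mid)
    print("side:", side)  # small values: little information left to code
    print("round trip ok:", (restored_left, restored_right) == (left, right))
```

When the channels are similar, the side signal stays close to zero and can be coded with far fewer bits than a full second channel, which is the saving the slide alludes to.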

… Enter MP3
* Requirements for high compression rates for VCDs (MPEG) have led to research in audio compression …
* Which resulted in the MP3 perceptual coding scheme
* MP3: MPEG-1 Audio Layer 3 (there are also Layers 1 and 2), makes clever use of the characteristics of human perception
* How does MP3 actually work?

MP3
* Signal is split into a number of separate frequency bands (32 for Layer 3)
* Signals within each band are analyzed for different masking effects
* Joint stereo used with low bit rates
* Inaudible components are discarded
* The remainder is coded in the “standard” way
* The decoder simply creates appropriate analog waveforms (can be very simple and fast)

MP3 vs. Raw Digital Audio
* Reductions of 12:1 possible with little discernible loss of quality compared to standard CDs

All The Best
* MP3 files are small
* Decoding is simple and effective
* File format designed to support decoders with different bit rates
  * This means: you can play a file at lower quality if you want
* Support for streaming
* But MP3 is not without its downsides; first and foremost: who’s paying?

Other Formats May Offer More
* Once the ice has been broken, other formats have been designed, such as
  * Windows Media Audio (WMA): streaming, good quality
  * Real Audio: streaming, keeps improving the quality
  * A bunch of others with marginally better performance and no software support (but audio has always been the empire of hackers)
  * SDMI: copyrighted work with digital watermarks
* Recording industry guys seem to have the upper hand at the moment, but technology keeps improving … and we may yet see a different outcome
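
To put the 12:1 reduction from the "MP3 vs. Raw Digital Audio" slide into numbers, here is a rough sketch. It assumes the raw source is 16-bit stereo PCM at 44,100 Hz and uses decimal units (1 kbit = 1000 bits, 1 MB = 10^6 bytes); the function names are hypothetical.

```python
def cd_bit_rate_kbps(sample_rate_hz: int = 44_100, bits: int = 16, channels: int = 2) -> float:
    """Bit rate of uncompressed CD-style audio in kbit/s."""
    return sample_rate_hz * bits * channels / 1000


def compressed_bit_rate_kbps(ratio: float) -> float:
    """Bit rate after applying a given compression ratio to CD-style audio."""
    return cd_bit_rate_kbps() / ratio


def minute_size_mb(bit_rate_kbps: float) -> float:
    """Size of one minute of audio at the given bit rate, in decimal MB."""
    return bit_rate_kbps * 1000 / 8 * 60 / 1_000_000


if __name__ == "__main__":
    raw = cd_bit_rate_kbps()
    mp3 = compressed_bit_rate_kbps(12)
    print("raw :", raw, "kbit/s,", round(minute_size_mb(raw), 2), "MB per minute")
    print("12:1:", round(mp3, 1), "kbit/s,", round(minute_size_mb(mp3), 2), "MB per minute")
```

The 12:1 result lands a little under 128 kbit/s, so a minute of music shrinks from roughly ten megabytes to under one.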

MIDI
* Musical Instrument Digital Interface: a real-time music control and network protocol
* Enables interconnection of electronic instruments and computers for musical performance, recording and playback
* Message-based protocol, includes hardware specifications and protocols, as well as a special file format
* Note: actual sounds are not recorded

More on MIDI
* MIDI is a convenient numerical (i.e., computer-readable) notation for music
* Detailed descriptions of a musical score (notes, their sequence, beats, instruments, ...)
* A MIDI file is a list of time-stamped commands
* When played through a MIDI output device (e.g., synthesizer) it results in music
* A MIDI file is editable, just like a musical score on paper (and the result can be checked instantaneously)

MIDI playback
* Quality of the sound is (almost completely) dependent on the quality of the output device (sound synthesizer)
* Moreover: the same MIDI file can (and often will) sound different when played through different output devices
* MIDI files are rather small in size
* Most PC sound cards have built-in MIDI synthesizers

When to use what?
* Use MIDI when
  * machine performance is insufficient
  * a high-quality MIDI synthesizer exists
  * you don't need spoken dialog
  * you want to edit the music clip
* Use digital audio when
  * you don't have control over playback hardware (Internet)
  * you need spoken dialog and music
  * target hardware is of sufficient performance

Summary of Lecture 4
* sound is very important
* sound perception has subtle physiological and psychological aspects
* digital audio: many different flavors
* MP3: fairly recent, fairly convenient
* MIDI: shorthand for music
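
The "More on MIDI" slide describes a MIDI file as a list of time-stamped commands. The sketch below models that idea in memory. It is not a reader or writer for the actual Standard MIDI File format; the class and helper names are hypothetical, and the resolution of 480 ticks per quarter note is an assumed value. The status bytes (0x90 for Note On, 0x80 for Note Off) and note number 60 for middle C follow the MIDI specification.

```python
from dataclasses import dataclass
from typing import List

NOTE_ON = 0x90   # Note On status byte for channel 0
NOTE_OFF = 0x80  # Note Off status byte for channel 0
TICKS_PER_QUARTER = 480  # assumed timing resolution


@dataclass
class TimedMessage:
    tick: int      # time stamp, in ticks from the start of the piece
    status: int    # message type combined with the channel number
    note: int      # pitch, 0-127 (60 is middle C)
    velocity: int  # how hard the key is struck, 0-127

    def to_bytes(self) -> bytes:
        """The three bytes that would be sent to the output device."""
        return bytes([self.status, self.note, self.velocity])


def play_note(start: int, length: int, pitch: int,
              velocity: int = 64, channel: int = 0) -> List[TimedMessage]:
    """A note is a Note On message followed later by a matching Note Off."""
    return [
        TimedMessage(start, NOTE_ON | channel, pitch, velocity),
        TimedMessage(start + length, NOTE_OFF | channel, pitch, 0),
    ]


# A C major arpeggio: C4 and E4 as quarter notes, then G4 as a half note.
score = (
    play_note(0, TICKS_PER_QUARTER, 60)
    + play_note(TICKS_PER_QUARTER, TICKS_PER_QUARTER, 64)
    + play_note(2 * TICKS_PER_QUARTER, 2 * TICKS_PER_QUARTER, 67)
)
score.sort(key=lambda message: message.tick)
total_bytes = sum(len(message.to_bytes()) for message in score)

if __name__ == "__main__":
    for message in score:
        print(message.tick, message.to_bytes().hex())
    print("message bytes:", total_bytes)
```

Three notes need only a handful of bytes, compared with hundreds of kilobytes for the same few seconds of CD quality audio. No sound is stored at all, which is also why the result depends so heavily on the synthesizer that plays it.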