Multimedia Object Types:
Sound
ISMT Multimedia
Dr Vojislav B Mišić
Why sound?
 One of the fundamental sensing
mechanisms for humans (arguably
not the most important …)
 Hearing supports vision in our
perception of the outside world
 Hearing also provides the primary
sensing mechanism in cases
where vision cannot function
well enough (or does not
function at all)
ISMT Multimedia Lecture 04/1 © 2001 Dr. Vojislav B. Mišić
Tones and sounds
 Sounds are often classified in three categories:
  - Speech
  - Music
  - Noise
Speech and voice
 Of all sounds, the human voice is probably the most
absorbing
 Different narrators, ages, genders, accents and
intonations, give rise to different voice effects
 Computer-generated speech may be used for simple
messages – it is still not good enough for complex
narrations
Role of music
 Music is often used as an emotional and alluring
enhancement to a project
 Music can be used for creating background
 Music may be a prominent focal point in some projects
Using sound in presentations
 Musical tones can create a sense of harmony
(or the contrary)
 Music can invoke different emotions (depending on
context, age, gender, culture, …)
 Sound can create a mood, an atmosphere to enhance the
effect of visible stimuli
 Sound and vision should act together towards the desired
experience
 The whole is often more effective than the sum of its parts
Using sound in interfaces
 Sound can provide humorous or serious accompaniment
to events and content
 Sound in an interface can
  - alert users to problems or opportunities
  - mask transitions
  - acknowledge user actions
  - convey information
  - divert attention from other processes
Designing sound
 Rely on your movie/TV consumer experience
 Sound can be background music, voice, or sound effects
 Effective sounds can attract users; ineffective ones can drive them away
 Avoid silent intervals, unless silence is specifically used as
the accompaniment
 Use sound effects – but don’t overdo it
 Spatial sound effects can enhance listener experience
Designing dynamics
 The dynamics of sound don’t need to be
natural
  - shooting can be less loud than it really is
  - whispering can be louder than it really is
 Proper sound dynamics should follow
the presentation
Back to the Basics
 What is sound? Say, for example …
Is there a sound when a firecracker explodes at the North Pole?
  - Psychologists say: no (no humans to hear it)
  - Biologists say: no (no humans, no bears to hear it)
  - Physicists say: yes (there are waves in the air)
Physics says …
 Variations of the air pressure in the range 20-20,000 Hertz
  - (40-15,000 is more likely, especially for older people)
 Sensitivity depends on frequency (Fletcher-Munson
curves)
  - Depends on sound level
 Very low energy – 90 dB ~ 10⁻³ W/m²
 Intensity expressed on a logarithmic scale (decibel, dB)
  - 6 dB higher = twice the pressure
  - 20 dB higher = 10 times the pressure
 Audible dynamic range about 120dB
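The decibel figures on this slide can be checked numerically. The following is a minimal sketch, assuming the conventional reference intensity of 10⁻¹² W/m² for 0 dB (the slides do not state a reference, so this value is an assumption); the function names are illustrative only.

```python
import math

# Assumed reference intensity for 0 dB, in W/m^2 (not stated in the slides).
I0 = 1e-12

def pressure_ratio_to_db(ratio):
    """Level difference in dB for a given sound-pressure ratio."""
    return 20 * math.log10(ratio)

def db_to_intensity(level_db):
    """Sound intensity in W/m^2 for a level in dB relative to I0."""
    return I0 * 10 ** (level_db / 10)

print(round(pressure_ratio_to_db(2), 2))   # twice the pressure
print(round(pressure_ratio_to_db(10), 2))  # ten times the pressure
print(db_to_intensity(90))                 # intensity at 90 dB
```

Doubling the pressure gives roughly 6 dB, a tenfold increase gives 20 dB, and 90 dB comes out at about 10⁻³ W/m² under the assumed reference.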
Fletcher-Munson Curves (figure not reproduced)
Human Ear (figure not reproduced)
Sound reception
and perception
 Received through ears, processed in the inner ear, final
processing / recognition performed by the brain
 Processing apparatus is rather complex and sensitive
… hear no evil …
 Sound reception cannot be consciously switched off (you
cannot close your ears, really)
 ... but it can be masked
semi-consciously
 Which can be useful in
noisy environments
 (and for other things as well –
but more on that later)
Distortion
 Humans can tolerate large distortions, while still
understanding spoken words, recognizing the speaker, or
following a melody
 Humans can also detect very small distortions, especially
for well known sounds
 Suitable for recognition purposes – with humans still
outperforming computers
Spatial Perception and How to Create It
 Binaural listening enables spatial perception, based on
intensity and phase differences between signals from left
and right ear
 The shape of our ears determines directional sensitivity
  - left/right is the highest
  - then front/back, and
  - (finally) up/down
 Two channels are standard, more complex schemes
emerging recently (4.1, 5.1, etc.)
 Should be sufficient to create a spatial sound image
Hearing vs. Vision
 Usable frequency range is 8 to 10 octaves
(only 1 for vision)
 Dynamic range is higher
 Detectable distortions are much smaller
 Sound recognition is generally better
 In other words, the ear is a better receptor
Audio standards
 analog audio: cassette tape, audio (vinyl) records, sound
from video tapes
 system beeps and sounds (Mac, Windows)
 digital audio
 MP3 – perceptual coding
 MIDI (slowly fading into oblivion)
Digital audio
 Digitization: conversion of a continuous analog signal to
a sequence of digital values
 Sampling frequency must be at least twice the highest
frequency in the signal spectrum (Nyquist); for audio it is
about 40 kHz
 CD quality recording – 44,100 Hz (44,056 is sometimes
used for TV compatibility)
 lower sampling frequencies may be used
… with corresponding loss of quality
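The Nyquist rule and the quality loss at lower sampling rates can be illustrated with a few lines of arithmetic. This is a minimal sketch; the helper names are hypothetical, and the aliasing formula assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def min_sampling_rate(f_max_hz):
    """Lowest sampling rate (Hz) that satisfies the Nyquist criterion."""
    return 2 * f_max_hz

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (Hz) of a pure tone at f_hz after sampling at fs_hz."""
    f = f_hz % fs_hz
    # Anything above fs/2 folds back into the range 0..fs/2.
    return min(f, fs_hz - f)

print(min_sampling_rate(20_000))        # audio band up to 20 kHz
print(alias_frequency(15_000, 44_100))  # below fs/2: tone is preserved
print(alias_frequency(15_000, 22_050))  # above fs/2: tone folds to a lower frequency
```

At 22,050 Hz the 15 kHz tone lies above half the sampling rate, so it is not represented correctly; this is one form the "loss of quality" takes.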
Dynamic range
 coding of samples: depends on the desired quality and on
the available storage or bandwidth
 16 bit coding gives 65536 possible signal levels
 … or over 96 dB dynamic range (each extra bit adds about
6dB of signal-to-noise ratio)
 … which is quite sufficient, even for classical music,
because most of us don’t have the proper environment to
enjoy it
 fewer bits result in lower quality, e.g., 8 bits per sample:
telephone quality
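The relation between bits per sample, signal levels and dynamic range can be sketched as follows. This assumes the simple model in which dynamic range is the ratio of the largest to the smallest representable amplitude step; the function names are illustrative.

```python
import math

def quantization_levels(bits):
    """Number of distinct signal levels for a given sample size."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Approximate dynamic range in dB: 20 * log10(number of levels)."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (8, 16):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))

# Gain from one additional bit, in dB
print(round(dynamic_range_db(9) - dynamic_range_db(8), 2))
```

Under this model 16 bits give a little over 96 dB, and each extra bit adds about 6 dB, in line with the figures above.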
Storage requirements
 CD quality sound (stereo) requires about 172 KB/s
 Hence, 1 minute takes about 10.5 MB (oops!)
 Red Book audio CDs have 680 MB capacity,
hold about 70 minutes of music
 Hence, lower sampling rates are used, with the associated
loss in tonal color/brightness
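The storage figures follow directly from the CD parameters (44,100 samples per second, 16 bits, two channels). A minimal sketch of the arithmetic, showing both binary (1 KB = 1024 bytes) and decimal units, since the result depends on which convention is used; the function name is hypothetical.

```python
def bytes_per_second(sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels

rate = bytes_per_second()
per_minute = rate * 60

print(rate)                              # bytes per second
print(round(rate / 1024, 1))             # KB per second (1 KB = 1024 bytes)
print(round(per_minute / 1024 ** 2, 1))  # MB per minute (binary units)
print(round(per_minute / 1e6, 1))        # MB per minute (decimal units)

# Halving the sampling rate and dropping to mono cuts the rate by a factor of four.
print(bytes_per_second(22_050, 2, 1))
```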
Why Not Compress It?
 Sound does not lend itself to compression easily (unlike
video) because
  - Redundancy is low
  - Receptors are unusually good at spotting distortion
 Some attempts were made (RealAudio) but the
compression factor is not high, or the quality is audibly
deteriorated
 Perceptual Coding (as used in MP3 and other schemes) to
the rescue …
Audio editing
 set proper recording levels
(keep peak levels below the maximum, or you will
get distortion on loud passages)
 always record your sounds at the
highest possible frequency and
resolution, then down-sample them
to the desired frequency and/or
resolution
 Postproduction:
  - trimming
  - splicing
  - assembly and mixing of different sources
  - equalization
  - time stretching
  - digital effects
  - format conversion
  - resampling and downsampling
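Checking recording levels can be sketched in a few lines. The example below assumes signed 16-bit samples held in a plain Python list and reports the peak level in dB relative to full scale; the names and the clipping criterion (any sample at or beyond full scale) are assumptions made for illustration, not a description of any particular editing tool.

```python
import math

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dB = maximum)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / FULL_SCALE)

def is_clipped(samples):
    """True if any sample reaches full scale, which suggests distortion."""
    return any(abs(s) >= FULL_SCALE for s in samples)

quiet_take = [0, 16384, -8192, 4000]
hot_take = [0, 32767, -32768, 12000]

print(round(peak_dbfs(quiet_take), 1), is_clipped(quiet_take))
print(round(peak_dbfs(hot_take), 1), is_clipped(hot_take))
```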
Audio file formats
 Several formats exist for raw digital audio files:
  - WAV (Microsoft)
  - AU (Sun/NeXT audio)
  - AIFF
 with different sampling frequencies, coding options,
mono/stereo, …
 Fortunately, all players can play all formats …
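To make the "sampling frequency, coding options, mono/stereo" parameters concrete, here is a minimal sketch that writes a short WAV file in memory with Python's standard `wave` module and reads its header back. The tone frequency, duration and function names are arbitrary choices for illustration.

```python
import io
import math
import struct
import wave

def make_tone_wav(freq_hz=440.0, seconds=0.1, sample_rate=44_100):
    """Return the bytes of a mono, 16-bit WAV file containing a sine tone."""
    n_frames = round(seconds * sample_rate)
    frames = b"".join(
        # "<h" packs one little-endian signed 16-bit sample at half amplitude
        struct.pack("<h", int(16383 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()

def describe_wav(data):
    """Return (channels, bits per sample, sampling rate, frame count)."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnchannels(), r.getsampwidth() * 8, r.getframerate(), r.getnframes()

wav_bytes = make_tone_wav()
print(describe_wav(wav_bytes))
```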
Basics of Perceptual Coding
 For all of our senses—or for vision and hearing at least—
the amount of information actually received is much
higher than the amount of information processed
 Therefore, it would be advantageous to try to code and
store only the information which is actually processed,
and simply discard the remainder
 Now, the trick is: how to find this really important
information …
What We Hear and Don’t Hear
 Loudness masking: a signal may be fairly audible by
itself, but may be masked by another, possibly louder,
signal at a different frequency
 Temporal masking: a signal may be fairly audible by
itself, but a louder signal occurring just before or after it may
render it temporarily imperceptible
 Joint stereo: in a stereo signal (i.e., left and right signals
of the same recording) the difference between the two is
not too big, and it’s often limited to higher frequencies
  - Remember subwoofers: there’s only one per stereo system
 So, when you combine all of these …
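One common way to exploit the similarity of the two stereo channels is mid/side coding: store the average and the half-difference instead of left and right. The slides do not say which joint stereo variant is meant, so treat this as an assumed illustration on toy sample values rather than a description of any particular encoder.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Recover left/right samples from mid/side values."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [100, 200, 300, 250]
right = [98, 204, 296, 250]

mid, side = to_mid_side(left, right)
print(mid)
print(side)  # small values when the channels are similar
print(from_mid_side(mid, side))
```

When the channels are nearly identical the side signal stays close to zero, so it can be coded with far fewer bits than a full second channel.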
…Enter MP3
 Requirements for high compression rates for VCDs (MPEG)
have led to research in audio compression …
 Which resulted in the MP3 perceptual coding scheme
 MP3: MPEG-1 Audio Layer 3 (there are also Layers 1 and 2),
makes clever use of the characteristics of human
perception
 How does MP3 actually work?
MP3
 Signal is split into a number of separate frequency bands
(32 subbands, which Layer 3 subdivides further)
 Signals within each band are analyzed for different
masking effects
  - Joint stereo used with low bit rates
 Inaudible components are discarded
 The remainder is coded in the “standard” way
 The decoder simply reconstructs the appropriate waveforms
(can be very simple and fast)
MP3 vs. Raw Digital Audio
 Reductions of 12:1 possible with little discernible loss of
quality compared to standard CDs
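The 12:1 figure can be put in terms of bit rates. A minimal sketch of the arithmetic; the 128 kbit/s value is a commonly used MP3 rate that does not appear in the slides and is included here only as an assumed point of comparison.

```python
def pcm_bit_rate_kbps(sample_rate_hz=44_100, bits=16, channels=2):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits * channels / 1000

cd_rate = pcm_bit_rate_kbps()
print(cd_rate)                   # raw CD audio, kbit/s
print(round(cd_rate / 12, 1))    # bit rate implied by a 12:1 reduction
print(round(cd_rate / 128, 1))   # reduction factor at an assumed 128 kbit/s
```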
All The Best
 MP3 files are small
 Decoding is simple and effective
 File format designed to support decoders with different bit
rates
  - This means: you can play a file at lower quality if you want
 Support for streaming
 But MP3 is not without its downsides – first and foremost:
who’s paying?
Other Formats May Offer More
 Once the ice was broken, other formats were
designed, such as
  - Windows Media Audio (WMA): streaming, good quality
  - RealAudio: streaming, keeps improving the quality
  - A bunch of others with marginally better performance and no software
    support (but audio has always been the empire of hackers)
 SDMI (Secure Digital Music Initiative) – copyrighted work with digital watermarks
 Recording industry guys seem to have the upper hand at
the moment, but technology keeps improving … and we
may yet see a different outcome
MIDI
 Musical Instrument Digital Interface: a real-time music
control and network protocol
 Enables interconnection of electronic instruments and
computers for musical performance, recording and
playback
 Message-based protocol, includes hardware specifications
and protocols, as well as a special file format
 Note: actual sounds are not recorded
More on MIDI
 MIDI is a convenient numerical (i.e., computer-readable)
notation for music
 Detailed descriptions of a musical score (notes, their
sequence, beats, instruments, ...)
 A MIDI file is a list of time-stamped commands
 When played through a MIDI output device (e.g.,
synthesizer) it results in music
 MIDI file is editable, just like a musical score on paper
(and the result can be checked instantaneously)
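The idea of a MIDI file as a list of time-stamped commands can be sketched with ordinary Python tuples. This is a simplified, human-readable stand-in: real MIDI files are binary and store delta times in ticks, and the event layout and function names below are assumptions made for illustration. The note-to-frequency formula assumes standard tuning (note 69 = A at 440 Hz).

```python
A4_NOTE, A4_HZ = 69, 440.0  # standard tuning reference (assumed)

def note_to_hz(note):
    """Frequency in Hz of a MIDI note number in equal temperament."""
    return A4_HZ * 2 ** ((note - A4_NOTE) / 12)

# Each event: (time in seconds, command, note number, velocity)
events = [
    (0.0, "note_on", 60, 90),
    (0.5, "note_off", 60, 0),
    (0.5, "note_on", 64, 90),
    (1.0, "note_off", 64, 0),
]

def note_durations(events):
    """Pair note_on/note_off commands into (note, start, duration) tuples."""
    started = {}
    result = []
    for time_s, command, note, _velocity in events:
        if command == "note_on":
            started[note] = time_s
        elif command == "note_off" and note in started:
            start = started.pop(note)
            result.append((note, start, time_s - start))
    return result

for note, start, duration in note_durations(events):
    print(note, round(note_to_hz(note), 2), start, duration)
```

No sound is stored anywhere in this list; a synthesizer has to turn each command into an actual tone, which is why the result depends on the output device.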
MIDI playback
 Quality of the sound is (almost completely) dependent on
the quality of the output device (sound synthesizer)
 Moreover: the same MIDI file can (and often will) sound
different when played through different output devices
 MIDI files are rather small in size
 Most PC sound cards have built-in MIDI synthesizers
When to use what?
use MIDI when
 machine performance is insufficient
 a high-quality MIDI synthesizer exists
 you don't need spoken dialog
 you want to edit the music clip
use digital audio when
 you don't have control over playback hardware (Internet)
 you need spoken dialog and music
 target hardware is of sufficient performance
Summary of Lecture 4
 sound is very important
 sound perception has subtle physiological and
psychological aspects
 digital audio: many different flavors
 MP3 – fairly recent, fairly convenient
 MIDI: shorthand for music