MPEG-4
John Lazzaro
John Wawrzynek
June 18, 2001
Modified by Francois Thibault
January 20, 2003
Further modified by Ichiro Fujinaga
January 20, 2005
CS Division
University of California at Berkeley
www.cs.berkeley.edu/~johnw
MPEG 4 Standard
- Finalized its standardization process in 1999 (Vancouver)
- Designed to integrate visual and audio content
- Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video
MPEG 4 Scope
- Provides a set of technologies to satisfy the needs of
  - authors
  - network service providers
  - end users
- Enables the production of content that has far greater reusability in
  - digital television
  - animated graphics
  - web pages
MPEG 4 Features
MPEG-4 provides standardized ways to:
- represent units of aural, visual or audiovisual content, called “media objects”
  - Natural origin: recorded with a camera or microphone
  - Synthetic origin: generated with a computer
- describe the composition of these objects to create compound media objects that form audiovisual scenes
- multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service)
- interact with the audiovisual scene generated at the receiver’s end
MPEG 4 Standard (audio)
[Diagram: MPEG 4 branches into system, video, and audio. Audio branches into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS). A label reads ISO/IEC 14496-3 sec5.]
MPEG 4 Audio: Natural (recorded)
- AAC: Advanced Audio Coding
  - Originally created as an extension to MPEG-2
  - Provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel
- CELP: A codebook-excited linear prediction scheme optimized for telephone-quality transmission of speech in the range 8-32 kbps
- Parametric: A novel "harmonic vector + noise" method that allows lossy but extremely low-bitrate coding of wideband sounds down to 2 kbps/channel
MPEG 4 Audio: Synthetic (synthesized)
- Structured Audio:
  - A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream
  - The receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received
- Text-to-Speech:
  - An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations
  - No "method" of creating synthetic speech is standardized by MPEG
MPEG 4 Standard - Structured Audio
[Diagram: MPEG 4 branches into system, video, and audio. Audio branches into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).]
Structured Audio: one “component” in the MPEG audio standard (ISO/IEC 14496-3 sec5).
Audio Compression Basics
- Traditional Technique for Music
[Diagram: an encoder and a decoder. The encoder takes an amplitude-vs-time signal through blocks labeled Filter into Critical Bands, Compute Masking, Allocate Bits, and Format Bitstream.]
- The Kolmogorov alternative:
  - Write a computer program that generates the desired audio stream.
  - Transmit the computer program.
  - To decode, execute the program.
  - Similar to PostScript!
- MPEG-4 Structured Audio (MP4-SA) uses this approach.
  - Eric Scheirer, Editor (MIT Media Lab).
  - http://sound.media.mit.edu/~eds/mpeg4/
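As a rough illustration of the Kolmogorov alternative, the Python sketch below compares the size of a tiny sound description with the size of the PCM samples it generates. It is not MP4-SA: the dictionary used as a "program" and the render function are assumptions invented for this example.

```python
import math
import struct

def render(program):
    """Decoder side: 'execute' the description to get 16-bit PCM samples."""
    rate = program["rate"]
    count = int(program["dur"] * rate)
    step = 2.0 * math.pi * program["freq"] / rate
    return [int(32767 * program["amp"] * math.sin(step * i)) for i in range(count)]

# Hypothetical "program": one second of a 440 Hz sine at half scale.
program = {"rate": 8000, "dur": 1.0, "freq": 440.0, "amp": 0.5}

transmitted = repr(program).encode("ascii")  # what would be sent
samples = render(program)                    # what the receiver computes
pcm = struct.pack("<%dh" % len(samples), *samples)

print(len(transmitted), "bytes of program vs", len(pcm), "bytes of PCM")
```

The description stays the same size however long the note lasts, while the PCM grows with duration. That asymmetry is what the program-as-bitstream approach exploits.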
MP4-SA Encoding
- may be a creative act: writing a program,
  - directly (emacs), or
  - indirectly (GUI, webpage).
  - In this case, MP4-SA is a lossless compressor.
- may be automatic: given a sound, an encoder writes a program that generates the sound.
  - Automatic encoding is hard in the general case.
MP4-SA Decoders
- are interpreters or compilers.
Key Application: Music Production
- Modern music production is computer-based.
  - Musicians enter performances into computers as control information, not audio waveforms.
  - Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.
MP4-SA Maps to Modern Music Production
[Diagram: “The Program” (synthesis algorithms, effects “boxes”, mixers), the musical performance, and the mix-down control information travel over a network, where there is a premium on low bandwidth, to “The Decoder” for sound rendering.]
MP4-SA Maps to Modern Music Production
[Diagram: “The Program” (synthesis algorithms, effects “boxes”, mixers), the musical performance, and the mix-down control information reach “The Decoder” (sound rendering) through a file system, with a label reading Standard Framework.]
Ideal for collaborative productions, remixes, and ...
Key Application: Music Performance
- Music performance requires dynamic control.
  - True interactivity requires parameterized sounds.
  - Musicians control instruments and effects with interactive controllers.
  - Control could be indirect and remote (ex: games).
MP4-SA Enables Networked Music Performance
[Diagram: two copies of “The Decoder” (sound rendering), each marked with a "+", connected by a network where there is a premium on low bandwidth.]
MPEG 4 Structured Audio:
- A binary file format that encodes:
  - The programming language SAOL (pronounced: sail).
  - The musical score language SASL.
  - Legacy support for MIDI.
  - Audio sample data.
- Result is normative: an MP4-SA file will sound identical on all compliant decoders. Different from MIDI files.
Why SAOL and MP4-SA?
Why not Java?
- Musical performances have temporal structure that changes over several timescales:
  - Sample-by-sample: 10’s of usec
  - Amplitude & timbre envelopes: 10’s of msec
  - Note-by-note: 100’s of msec
- Writing sound generation code in a conventional language results in code dominated by time-scale management.
  - Hard to maintain, hard to optimize.
Time management is built into SAOL.
- A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
- Work is scheduled to happen:
  - at the a-rate (the audio sample rate)
  - at the k-rate (envelope control rate)
  - at the i-rate (rate for new notes)
- Language variables are typed as a/k/i-rate.
- A language statement is scheduled based on the rate of the variables it contains.
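The three rates can be pictured as nested loops around a simulated clock. The Python sketch below is only a counting model of that idea, not how any SAOL decoder is implemented. The function name and the rates chosen (8000 Hz audio, 100 Hz control) are assumptions for illustration.

```python
def run_instance(dur, s_rate=8000, k_rate=100):
    """Step a simulated clock through one note and count passes per rate."""
    samples_per_k = s_rate // k_rate      # a-passes inside each k-cycle
    passes = {"i": 0, "k": 0, "a": 0}

    passes["i"] += 1                      # i-rate: once, when the note starts
    for _ in range(round(dur * k_rate)):
        passes["k"] += 1                  # k-rate: once per control period
        for _ in range(samples_per_k):
            passes["a"] += 1              # a-rate: once per audio sample
    return passes

print(run_instance(0.5))
```

With these assumed rates, a 0.5-second note gets 1 i-pass, 50 k-passes, and 4000 a-passes. In a conventional language the programmer writes and maintains these loops by hand. In SAOL the variable types imply them.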
SAOL, SASL, and Scheduling:
- Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
- A SAOL subprogram (called an instr or instrument) serves as the instrument.
- SASL commands (called score lines) act to play notes on SAOL instruments.
- Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.
An example:
- SAOL instrument tone, that plays a gated sine wave. (SAOL code in next slide.)
- This SASL file plays melody on tone:

    0.5   tone  0.75  52  0.25
    1.5   tone  0.75  64  0.25
    2.5   tone  0.5   63  0.25
    3     tone  0.25  59  0.2
    3.25  tone  0.25  61  0.225
    3.5   tone  0.5   63  0.225
    4     tone  0.5   64  0.25
    5     end

Columns, left to right: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness).
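To make the column layout concrete, the Python sketch below parses the seven score lines into note records and converts each note number to a frequency. It is a minimal illustration, not a SASL parser. The score text is the reconstruction of the slide's example, parse_score is a hypothetical helper, and cpsmidi here is a Python stand-in that assumes the usual equal-tempered mapping with note 69 = 440 Hz.

```python
SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""

def cpsmidi(note):
    """MIDI note number to Hz, assuming note 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def parse_score(text):
    """Turn 'time instr dur note loudness' lines into dictionaries."""
    notes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:          # skips blank lines and "5 end"
            continue
        start, instr, dur, note, loudness = fields
        notes.append({
            "start": float(start), "instr": instr, "dur": float(dur),
            "freq": cpsmidi(int(note)), "loudness": float(loudness),
        })
    return notes

notes = parse_score(SCORE)
for n in notes:
    print(n["start"], n["instr"], round(n["freq"], 2), "Hz")
```

Each record corresponds to one instance of the instrument. The last instance starts at 4 and runs for 0.5, so all sound has stopped before the end line at 5.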
SAOL code for tone

    instr tone (note, loudness)
    {
      ivar a;          // sets osc f
      ksig env;        // env output
      asig x, y;       // osc state
      asig init;

      a = 2*sin(3.1415927*cpsmidi(note)/s_rate);
      env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

      if (init == 0)   // first a-pass only
      {
        x = loudness;
        init = 1;
      }

      x = x - a*y;     // the FLOPS happen in
      y = y + a*x;     // these 3 statements
      output(y*env);   // creates audio output
    }                  // end of instr tone
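The two update statements form a sine oscillator. Substituting one into the other gives y[n+1] = (2 - a*a)*y[n] - y[n-1], and with a = 2*sin(pi*f/s_rate) the factor 2 - a*a equals 2*cos(2*pi*f/s_rate), so y oscillates at f Hz. The Python sketch below re-creates only that oscillator core to check the claim numerically. The envelope and rate scheduling are left out, and the function names are made up for this example.

```python
import math

def tone_oscillator(freq, loudness, count, s_rate=44100):
    """Run the two-variable oscillator update for `count` samples."""
    a = 2.0 * math.sin(math.pi * freq / s_rate)  # computed once per note
    x, y = loudness, 0.0                          # first-pass initialization
    out = []
    for _ in range(count):
        x = x - a * y
        y = y + a * x
        out.append(y)
    return out

def upward_zero_crossings(samples):
    """Count transitions from negative to non-negative values."""
    return sum(1 for p, c in zip(samples, samples[1:]) if p < 0.0 <= c)

wave = tone_oscillator(440.0, 0.25, 44100)
print(upward_zero_crossings(wave), "cycles in one second; peak", max(wave))
```

The crossing count lands at 440 (or one fewer, depending on how the final sample rounds), and the peak stays very close to the loudness value. That is why the instrument can use loudness directly as the initial oscillator state.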
SAOL Features
- Rate semantics: i/k/a-rate execution
- Vector arithmetic: ex: A=B+C is equivalent to for i=1,n A[i]=B[i]+C[i]
- All floating-point arithmetic.
- Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...
Sfront - a SAOL-to-C translator
- Converts MP4-SA files to an ANSI C program that, when executed, produces audio.
[Diagram: foo.mp4 -> sfront -> sa.c]
- Handles SAOL, SASL, MIDI, uncompressed samples.
[Diagram: SAOL, SASL, MIDI, and uncompressed samples feed sfront, shown with foo.mp4 and sa.c.]
- Runs on UNIX, Windows, MacOS.
- Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol).
- www.cs.berkeley.edu/~lazzaro/sa
Generator Techniques
- Much of the SA standard describes a library:
  - 104 core opcodes (ex: pow(), allpass(), reverb())
  - 16 wave table generators (ex: harm, spline, random)
- Sfront optimizes the code produced for each library element instance based on the invocation attributes:
  - rate, width, size, constancy, integral nature of the parameters, number of parameters
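As a loose illustration of specializing on "constancy" and "integral nature", the Python sketch below emits a different C expression for pow() depending on what is known about the exponent at translation time. This is a hypothetical toy, not sfront's actual code generator or output. The function emit_pow and the cutoff of 4 are assumptions.

```python
def emit_pow(base, exponent):
    """Return a C expression string for pow(base, exponent).

    `base` is a C expression. `exponent` is either a Python number (a value
    known at translation time) or a string naming a run-time variable.
    """
    if isinstance(exponent, str):
        return "pow(%s, %s)" % (base, exponent)      # nothing known: general call
    if float(exponent).is_integer() and 0 <= exponent <= 4:
        n = int(exponent)
        if n == 0:
            return "1.0"                             # x^0 is constant
        # small constant integer: unroll into multiplications
        return "(" + " * ".join(["(%s)" % base] * n) + ")"
    return "pow(%s, %r)" % (base, float(exponent))   # constant, but not a small integer

print(emit_pow("x", 2))
print(emit_pow("x", "n"))
print(emit_pow("x", 0.5))
```

The same library element thus compiles to a multiplication in one instance and to a full library call in another, depending only on the attributes of the invocation.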
Conclusions
- MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  - Physical Modeling: good
  - Sampling Natural Instruments: bad
- If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
- MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
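To put the 100 to 10,000 compression ratios in perspective, the short Python calculation below converts them to average bit rates. The slides do not state a baseline, so CD-quality stereo PCM (44.1 kHz, 16 bits, 2 channels) is assumed here, and the function names are illustrative.

```python
def pcm_bit_rate(sample_rate=44100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels

def rate_at_ratio(ratio, pcm_rate):
    """Average bit rate implied by a given compression ratio."""
    return pcm_rate / ratio

cd_rate = pcm_bit_rate()
for ratio in (100, 10000):
    print(ratio, "->", rate_at_ratio(ratio, cd_rate), "bit/s")
```

Under that assumption the uncompressed stream is 1,411,200 bit/s, so a ratio of 100 corresponds to about 14 kbit/s and a ratio of 10,000 to about 141 bit/s.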