520.443 Digital Multimedia Coding & Processing: A Review
Trac D. Tran
ECE Department, The Johns Hopkins University, Baltimore, MD 21218

General Information I
  Instructor: Prof. Trac D. Tran
    Office: Barton 215. Phone: 410-516-7416. Email: [email protected]
    Office Hour: Wed 10-12 or by appointment
  TA: Yi Chen
    Office: Barton 322. Email: [email protected]
    Office Hour: Tues 2-4 or by appointment
  Lectures: Wednesday 2:30 – 5:00, Barton 225
  Course Web Page: http://thanglong.ece.jhu.edu/Course/443/

General Information II
  Homework Assignments
    Around 5-6, most with computer assignments
  Final project
    Team of 2 or 3 on a topic of choice
    Topic can be chosen from a list of suggestions
    A final project report and a 15-minute oral presentation
  Grading
    Homework/Class Participation: 50%. Project: 50%.

General Information III
  Prerequisites
    520.435 Digital Signal Processing
    Some prior experience with Matlab and/or C/C++
    Basic knowledge of linear algebra and probability
  Programming
    Emphasizes hands-on learning with a lot of computer assignments and projects. There will not be any exam!
    The use of Matlab and C/C++ is encouraged
    You need to bring a laptop with Matlab installed to lectures

Recommended Textbooks
  K. Sayood, Introduction to Data Compression, 3rd Edition, Morgan Kaufmann, 2005. ISBN 012620862X.
  J. W. Woods, Multidimensional Signal, Image, and Video Processing and Coding, Academic Press, 2006. ISBN 0120885166.
  K. R. Rao and J. J. Hwang, Techniques and Standards for Image Video and Audio Coding, Prentice Hall, Upper Saddle River, NJ. ISBN 0133099075.
  A. M. Tekalp, Digital Video Processing, Prentice Hall, Upper Saddle River, NJ. ISBN 0131900757.
  B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, New York, NY. ISBN 0412084112.
  V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Kluwer Academic Publishers, Boston, MA. ISBN 0792399528.
  J. L. Mitchell (Editor), W. B. Pennebaker (Editor), C. E. Fogg, and D. J. LeGall, MPEG Video: Compression Standard, Chapman & Hall, New York, NY. ISBN 0412087715.
  W. B. Pennebaker and Joan L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY. ISBN 0442012721.
  Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications, Prentice Hall, Englewood Cliffs, NJ, 2002. ISBN 0130175471.

Course Overview
  Audio/Image/Video Compression and Communications
    Fundamentals: motivation, signal properties & formats, information theory, variable-length coding, quantization
    Transform coding framework, JPEG, JPEG2000, MP3
    Video coding and international video standards
    Multimedia communications
  Goals
    Focus on big pictures, key concepts, elegant ideas, no rigorous treatment
    Provide hands-on experience with simple Matlab exercises
    Illustrate applications of digital signal processing
    Hopefully lead to future research and developments
    !!!Fun Fun Fun Fun Fun!!!

Tentative Syllabus I
  Jan 27: Introduction. Motivation. Main Principles. Review.
  Feb 3: Information Measures. Lossless Coding Techniques. Entropy Coding. Huffman and Arithmetic Coding.
  Feb 10: Quantization. Optimal Conditions. Quantizer Design.
  Feb 17: Multirate System Fundamentals. Polyphase. Filter Banks. Transforms. Basis Functions.
  Feb 24: KLT. DFT. FFT. DCT. MLT. Wavelet Transform.
  Mar 3: Audio Coding Standards. MP3. AAC. Image Compression Standards. JPEG.
  Mar 10: Zerotree Coding. Embedded Coding. JPEG2000. Project Proposal Due.
  Mar 17: Spring Vacation. No Lecture.

Tentative Syllabus II
  Mar 24: Video Coding Fundamentals. Motion Estimation and Compensation.
  Mar 31: Popular Video Coding Standards. MPEG Family. H.26 Family.
  Apr 7: Latest Video Compression Standard: H.264 or MPEG-4 Part 10 or MPEG-4 AVC.
  Apr 14: Multimedia Processing in the Compressed Domain. Communication and Networking Issues. Error Resilience.
  Apr 21: Multimedia Streaming. Packet Video.
  Apr 28: Final Project Oral Presentations.
  May 12: Final Project Report Due.

Outline
  Introduction to multimedia coding & processing
    Multimedia is everywhere!
    The need for compression & efficient representation
    Multimedia signals: properties & formats, color spaces
    General multimedia compression framework
  A review
    Probability
    Random variables
    Random processes
    Statistical modeling of audio/image/video signals
    Error & similarity measurements

Multimedia Everywhere!
  Fax machines: transmission of binary images
  Digital cameras: still images
  iPod / iPhone & MP3
  Digital camcorders: video sequences with audio
  Digital television broadcasting
  Compact disk (CD), Digital video disk (DVD)
  Personal video recorder (PVR, TiVo)
  Images on the World Wide Web
  Video streaming & conferencing
  Video on cell phones, PDAs
  High-definition televisions (HDTV)
  Medical imaging: X-ray, MRI, ultrasound, telemedicine
  Military imaging: multi-spectral, satellite, infrared, microwave

Digital Bit Rates
  A picture is worth a thousand words?
  Size of a typical color image
    For display: 640 x 480 x 24 bits = 7,372,800 bits = 921,600 bytes
    For current mainstream digital cameras (5 Mega-pixel): 2560 x 1920 x 24 bits = 117,964,800 bits = 14,745,600 bytes
  For an average word
    4-5 characters/word, 7 bits/character: 32 bits ~= 4 bytes
  Bit rate: bits per second for transmission
    Raw digital video (DVD format): 720 x 480 x 24 bits x 24 frames/second ~ 200 Mbps
    CD Music: 44100 samples/second x 16 bits/sample x 2 channels ~ 1.4 Mbps

Reasons for Compression
  Digital bit rates
    Terrestrial TV broadcasting channel: ~20 Mbps
    DVD: 10...20 Mbps
    Ethernet/Fast Ethernet: <10/100 Mbps
    Cable modem downlink: 1-3 Mbps
    DSL downlink: 384...2048 kbps
    Dial-up modem: 56 kbps max
    Wireless cellular data: 9.6...384 kbps
  Compression = Efficient data representation!
    Data need to be accessed at a different time or location
    Limited storage space and transmission bandwidth
    Improve communication capability

Personal Video Recorder (PVR)
  MPEG2 quality settings
    Best: 7.7 Mbps
    High: 5.4 Mbps
    Medium: 3.6 Mbps
    Basic: 2.2 Mbps

Continuous & Discrete Representations
  Continuous-time (space), continuous-amplitude, x(t): local telephone, cassette-tape recording & playback, phonograph, photograph
  Continuous-time (space), discrete-amplitude, x(t): telegraph
  Discrete-time (space), continuous-amplitude, x[n]: switched capacitor filter, speech storage chip, half-tone photography
  Discrete-time (space), discrete-amplitude, x[n]: CD, DVD, cellular phones, digital camera & camcorder, digital television, inkjet printer

Sound Fundamentals
  Sound waves: vibrations of air particles
  Fluctuations in air pressure are picked up by the eardrums
  Vibrations from the eardrums are then interpreted by the brain as sounds

Sound Waves: 1-D signals
  $x_i(t) = A_i \cos(\omega_i t + \phi_i)$, $x(t) = \sum_i x_i(t)$
  Frequency
    How fast the air pressure fluctuates
    High pitch, low pitch
  Volume
    Amplitude of the sound wave
    How loud the sound is
  Phase
    Determines temporal and spatial localization of the sound wave
  [Figure: sound waveform annotated with volume, frequency, phase, and envelope]

Frequency Spectrum for Audio
  Human Auditory System: 20 Hz - 20 kHz
  FM Radio Signals: 100 Hz - 12 kHz
  AM Radio Signals: 100 Hz - 5 kHz
  Telephone Speech: 300 Hz - 3.5 kHz ($f_{max} = 3.3$ kHz, $f_{sampling} = 6.6$ kHz)
  [Figure: spectrum of each signal class against f (Hz), axis from 0 to 20 kHz]

Speech Signals
  [Figure: speech waveform of the word "phonetician", segmented as ph-o-n-e-t-i-c-ia-n]
  Main useful frequency range of human voice: 300 Hz – 3.4 kHz

Music Signals
  Example: a note built from cosines at the odd harmonics $\omega, 3\omega, 5\omega, 7\omega, 9\omega, 11\omega, 13\omega$, with amplitudes 1, 0.75, 0.5, 0.14, 0.5, 0.12, 0.17
  $\omega = 2\pi f$, where f is the fundamental frequency

Harmonics in Music Signals
  The spectrum of a single note from a musical instrument usually has a set of peaks at harmonic ratios
  If the fundamental frequency is f, there are peaks at f, and also at (about) 2f, 3f, 4f…
  Best basis functions to capture speech & music: cosines & sines

Multi-Dimensional Digital Signals
  Images: 2-D digital signals
    pixel or pel
    black: p=0, gray: p=128, white: p=255
    colors: combination of RGB
  Video Sequences: 3-D digital signals, a collection of 2-D images called frames (axes x, y, t)

Color Spaces: RGB & YCrCb
  RGB
    Red Green Blue, typically 8-bit per sample for each color plane
  YCrCb
    Y: luminance, gray-scale component
    Cr & Cb: chrominance, color components, less energy than Y
    Chrominance components can be down-sampled without much aliasing
    YCrCb, also known as YPrPb, is used in component video
  $\begin{bmatrix} Y \\ C_R \\ C_B \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ 0.439 & -0.368 & -0.071 \\ -0.148 & -0.291 & 0.439 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$
  [Figure: positions of Y samples and of Cr, Cb samples]

Another Color Space: YUV
  YUV is another popular color space, similar to YCrCb
    Y: luminance component
    UV: color components
  YUV is used in PAL/NTSC broadcasting
  $\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$
  [Figure: planes Y: 176 x 144, U: 88 x 72, V: 88 x 72]

Popular Signal Formats
  CIF: Common Intermediate Format
    Y resolution: 352 x 288
    CrCb/UV resolution: 176 x 144
    Frame rate: 30 frames/second progressive
    8 bits/pixel (sample)
  QCIF: Quarter Common Intermediate Format
    Y resolution: 176 x 144
    CrCb/UV resolution: 88 x 72
    Frame rate: 30 frames/second progressive
    8 bits/pixel (sample)
  TV – NTSC
    Resolution: 704 x 480, 30 frames/second interlaced
  DVD – NTSC
    Resolution: 720 x 480, 24 – 30 frames/second progressive
  [Figure: Y, Cr, Cb planes of frame n and frame n+1]

High-Definition Television (HDTV)
  720i
    Resolution: 1280 x 720, interlaced
  720p
    Resolution: 1280 x 720, progressive
  1080i
    Resolution: 1920 x 1080, interlaced
  1080p
    Resolution: 1920 x 1080, progressive
  [Figure: an interlaced video frame made of an odd field and an even field]

Examples of Still Images
  [Figure: sample still images]

Examples of Video Sequences
  [Figure: frames 1, 51, 71, 91, 111 of a sample sequence]

Observations of Visual Data
  There is a lot of redundancy, correlation, strong structure within natural image/video
  Images
    Spatial correlation: a lot of smooth areas with occasional edges
  Video
    Temporal correlation: neighboring frames seem to be very similar

Image/Video Compression Framework
  [Figure: original signal -> T -> Q -> E -> compressed bit-stream -> Channel -> $E^{-1}$ -> $Q^{-1}$ -> $T^{-1}$ -> reconstructed signal]
  T: prediction, transform, de-correlation
  Q: quantization
  E: information theory, VLC, Huffman, arithmetic, run-length

Deterministic versus Random
  Deterministic
    Signals whose values can be specified explicitly
    Example: a sinusoid
  Random
    Digital signals in practice can be treated as a collection of random variables or a random process
    The symbols which occur randomly carry information
  Probability theory
    The study of random outcomes/events
    Use mathematics to capture behavior of random outcomes and events

Random Variable
  Random variable (RV)
    A random variable X is a mapping which assigns a real number x to each possible outcome of a random experiment
    A random variable X takes on a value x from a given set. Thus it is simply an event whose outcomes have numerical values
  Examples
    X in coin toss, X=1 for Head, X=0 for Tail
    The temperature outside our lecture hall at any moment t
    The pixel value at location x, y in frame n of a future Hollywood blockbuster

Probability Density Function
  Probability density function (PDF) of a RV X
    Function $f_X(x)$ defined such that: $P[x_1 \le X \le x_2] = \int_{x_1}^{x_2} f_X(x)\,dx$
    Histogram of X !!!
  Main properties:
    $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
    $f_X(x) \ge 0, \ \forall x$

PDF Examples
  Uniform PDF: $f_X(x) = 1/(b-a)$ for $a \le x \le b$, and 0 otherwise
  Gaussian PDF: $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}$
  Laplacian PDF: $f_X(x) = \frac{1}{\sqrt{2}\,\sigma} e^{-\sqrt{2}\,|x| / \sigma}$

Discrete Random Variable
  RV that takes on discrete values only
  PDF of discrete RV = discrete histogram
    $f_X(x) = \sum_k P_X(x_k)\,\delta(x - x_k)$ with $P_X(x_k) = P[X = x_k]$
  Example: how many Heads in 3 independent coin tosses?
    $P_X(0) = 1/8$, $P_X(1) = 3/8$, $P_X(2) = 3/8$, $P_X(3) = 1/8$

Expectation
  Expected value
    Let g(X) be a function of RV X. The expected value of g(X) is defined as $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$
    Expectation is linear!
    Expectation of a deterministic constant is itself: $E[C] = C$
  Mean: $\mu_X = E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$
  Mean-square value: $E[X^2]$
  Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2$

Cross Correlation & Covariance
  X, Y: 2 jointly distributed RVs
    Joint PDF: $P[x_1 \le X \le x_2,\ y_1 \le Y \le y_2] = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f_{XY}(x, y)\,dx\,dy$
    Expectation: $E[g(X, Y)] = \int\!\!\int g(x, y) f_{XY}(x, y)\,dx\,dy$
  Cross-correlation: $R_{XY} = E[XY] = \int\!\!\int x y f_{XY}(x, y)\,dx\,dy$
  Cross covariance: $C_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = R_{XY} - \mu_X \mu_Y$

Independence & Correlation
  Marginal PDF: $f_X(x) = \int f_{XY}(x, y)\,dy$, $f_Y(y) = \int f_{XY}(x, y)\,dx$
  Statistically independent: $f_{XY}(x, y) = f_X(x) f_Y(y)$
  Uncorrelated: $E[XY] = E[X]E[Y]$, i.e. $C_{XY} = 0$
  Orthogonal: $E[XY] = 0$; for 0-mean RVs this coincides with being uncorrelated

Random Process
  Random process (RP)
    A collection of RVs
    A time-dependent RV
    Denoted {X[n]}, {X(t)} or simply X[n], X(t)
    We need an N-dimensional joint PDF to characterize X[n]!
    Note: the RVs that make up an RP may be dependent or correlated
  Examples:
    Temperature X(t) outside campus
    A sequence of binary numbers transmitted over a communication channel
    Speech, music, image, video signals

Wide-Sense Stationary
  Wide-sense stationary (WSS) random process (RP)
    A WSS RP is one for which $E[X[n]]$ is independent of n and $R[m, n] = E[X[m]X[n]]$ only depends on the difference (m – n)
  Mean: $m_X = E[X[n]]$
  Auto-correlation sequence: $R_{XX}[k] = E[X[n]X[n+k]]$
  Energy: $E[X^2[n]] = R_{XX}[0]$
  Variance: $\sigma_X^2 = E[(X[n] - m_X)^2] = R_{XX}[0] - m_X^2$
  Co-variance: $C_{XX}[k] = E[(X[n] - m_X)(X[n+k] - m_X)]$
  What happens if the WSS RP has 0-mean?

White Random Process
  Power spectral density
    The power spectrum of a WSS RP is defined as the Fourier transform of its auto-correlation sequence: $S_{XX}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} R_{XX}[k]\,e^{-j\omega k}$
  White RP
    A RP is said to be white if any pair of samples are uncorrelated, i.e., $E[X[n]X[m]] = E[X[n]]E[X[m]]$, $m \ne n$
  White WSS RP
    $R_{XX}[k] = m_X^2$ for $k \ne 0$, and $R_{XX}[k] = \sigma_X^2 + m_X^2$ for $k = 0$
  White 0-mean WSS RP
    $R_{XX}[k] = \sigma_X^2\,\delta[k]$, so $S_{XX}(e^{j\omega}) = \sigma_X^2$ for all $\omega$

Stochastic Signal Model
  $w[n]$ (white 0-mean WSS Gaussian noise) -> $H(z) = \frac{1}{1 - \sum_{n=1}^{N} a_n z^{-n}}$ -> $x[n]$ (AR(N) signal)
  For speech: N = 10 to 20
  For images: N = 1! and $a_1 = 0.95$

AR(1) Signal
  $H(z) = \frac{X(z)}{W(z)} = \frac{1}{1 - a_1 z^{-1}}$
  $X(z) - a_1 z^{-1} X(z) = W(z)$, i.e. $X(z) = a_1 z^{-1} X(z) + W(z)$
  $x[n] = a_1 x[n-1] + w[n]$

Error or Similarity Measures
  Mean Square Error (MSE)
    $L_2$-norm error: $MSE = \frac{1}{N} \sum_{i=0}^{N-1} E[(X_i - \hat{X}_i)^2]$
  Mean Absolute Difference (MAD)
    $L_1$-norm error: $MAD = \frac{1}{N} \sum_{i=0}^{N-1} E[|X_i - \hat{X}_i|]$
  Max Error
    $L_\infty$-norm error: $MaxError = \max_i E[|X_i - \hat{X}_i|]$
  Peak Signal-to-Noise Ratio (PSNR)
    $PSNR = 10 \log_{10} \frac{M^2}{MSE}$; M = maximum peak-to-peak value

Summary
  Introduction to audio/image/video signals
    Audio-visual information is everywhere in our everyday life
    Efficient representation (compression) of audio/image/video facilitates information storage, archival, communications and even processing
    Compression is achievable since visual data contains a lot of redundancy, both spatially and temporally
  Review
    Random variables, PDF, mean, variance, correlation
    Random processes, wide-sense stationary RP, white
    Simple stochastic signal models via AR processes
    Error or similarity measures
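
Worked example: raw bit rates
  The raw sizes and bit rates quoted on the Digital Bit Rates slide are plain products of resolution, bits per pixel, and frame or sample rate. The sketch below recomputes them in Python; the function names are illustrative and not part of the course material.

```python
def raw_image_bits(width, height, bits_per_pixel=24):
    """Uncompressed size of one image, in bits."""
    return width * height * bits_per_pixel


def raw_video_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bit rate, in bits per second."""
    return raw_image_bits(width, height, bits_per_pixel) * fps


def raw_audio_bps(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM audio bit rate, in bits per second."""
    return sample_rate * bits_per_sample * channels


if __name__ == "__main__":
    display = raw_image_bits(640, 480)
    camera = raw_image_bits(2560, 1920)
    print("display image:", display, "bits =", display // 8, "bytes")
    print("5-megapixel image:", camera, "bits =", camera // 8, "bytes")
    print("raw DVD-format video: %.1f Mbps" % (raw_video_bps(720, 480, 24) / 1e6))
    print("CD music: %.2f Mbps" % (raw_audio_bps(44100, 16, 2) / 1e6))
```

  Comparing the raw video rate with the 2.2 to 7.7 Mbps PVR settings gives a feel for how much compression a video codec has to deliver.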
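
Worked example: error measures
  The error measures in this review are defined through expectations. In practice they are usually estimated from one original signal and one reconstruction, and that substitution is an assumption of this sketch. The function names and the default peak value of 255 (8-bit samples) are illustrative choices.

```python
import math


def mse(x, x_hat):
    """Sample mean square error (L2-norm error)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def mad(x, x_hat):
    """Sample mean absolute difference (L1-norm error)."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def max_error(x, x_hat):
    """Largest absolute sample error (L-infinity-norm error)."""
    return max(abs(a - b) for a, b in zip(x, x_hat))


def psnr(x, x_hat, peak=255):
    """PSNR in dB; peak is M, the maximum peak-to-peak value (assumed 255)."""
    err = mse(x, x_hat)
    if err == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    original = [0, 128, 255, 64]
    reconstructed = [0, 130, 250, 64]
    print("MSE:", mse(original, reconstructed))
    print("MAD:", mad(original, reconstructed))
    print("MaxError:", max_error(original, reconstructed))
    print("PSNR (dB):", psnr(original, reconstructed))
```

  For images, the same functions apply once the pixels are flattened into one list per image.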
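
Worked example: AR(1) signal
  The AR(1) model drives the recursion x[n] = a_1 x[n-1] + w[n] with white 0-mean Gaussian noise. For such a process the normalized auto-correlation R_XX[k] / R_XX[0] equals a_1^|k|, which is why a_1 = 0.95 describes strongly correlated neighboring samples. The sketch below generates a realization and estimates that ratio. The function names, the seed, the unit noise variance, and the zero initial state are assumptions made for illustration.

```python
import random


def ar1(n, a1=0.95, sigma_w=1.0, seed=0):
    """Generate n samples of x[n] = a1 * x[n-1] + w[n], starting from x[-1] = 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x = []
    prev = 0.0
    for _ in range(n):
        prev = a1 * prev + rng.gauss(0.0, sigma_w)
        x.append(prev)
    return x


def autocorr(x, k):
    """Biased sample estimate of R_XX[k] = E[X[n] X[n+k]] for a 0-mean signal."""
    n = len(x)
    return sum(x[i] * x[i + k] for i in range(n - k)) / n


if __name__ == "__main__":
    x = ar1(50_000)
    r0 = autocorr(x, 0)
    for k in range(4):
        # Estimated ratio next to the theoretical value a1**k
        print(k, round(autocorr(x, k) / r0, 3), round(0.95 ** k, 3))
```

  The estimates only approach a_1^k for long realizations; short signals give noticeably noisier ratios.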