EE665000 Video Processing
Chapter 8  Still Image Compression

8.1 Basics of Image Compression

Purposes:
1. To remove image redundancy.
2. To increase storage and/or transmission efficiency.

Why? A single still picture in the ISO JPEG standard test format occupies
720 (pels) x 576 (lines) x 1.5 bytes = 4.977 Mbits,
which takes about 77.8 seconds to transmit at 64 kbits/sec.

How?
1. Use statistical characteristics of images:
   (a) General: statistical models from Shannon's rate-distortion theory.
   (b) Particular: nonstationary properties of images (wavelets, fractals, ...).
2. Use characteristics of human perception (psychovisual):
   (a) Color representation.
   (b) Weber-Fechner law.
   (c) Spatial/temporal masking.

8.1.1 Elements of an Image Compression System

A source encoder consists of the following blocks:

Input image -> Transformer T -> Quantizer Q -> Entropy coder C -> Binary bit stream

Fig. 8.1: Block diagram of an image compression system.

8.1.2 Information Theory

- Entropy: the average uncertainty (randomness) of a stationary, ergodic signal source X, i.e., the number of bits needed to resolve its uncertainty.

Discrete-amplitude memoryless source (DMS) X: $x(n) \in A_x = \{a_1, a_2, \ldots, a_K\}$, a finite alphabet containing K symbols.

Assume $x(n)$ is an independent, identically distributed (i.i.d.) process; that is, for any M,
$$P\big(x(n_1), x(n_2), \ldots, x(n_M)\big) = \prod_{i=1}^{M} p\big(x(n_i)\big).$$
The entropy is
$$H(X) = E\{-\log_2 P(X)\} = -\sum_{i=1}^{K} p_i \log_2 p_i \;\;\text{bits/letter}, \qquad p_i = P(X = a_i).$$

- Sources with memory

Discrete-amplitude source with memory X: $x(n) \in A_x = \{a_1, \ldots, a_K\}$. Assume $x(n)$ is stationary in the strict sense, i.e.,
$$P_{n_1,\ldots,n_M}\big(x(n_1), \ldots, x(n_M)\big) = P_{n_1+T,\ldots,n_M+T}\big(x(n_1+T), \ldots, x(n_M+T)\big) \quad\text{for all } T.$$

The Nth-order entropy (N-tuple or N-block entropy) is
$$H_N(X) = \frac{1}{N} E\{-\log_2 P(\mathbf{X})\} = -\frac{1}{N} \sum_{\text{all } \mathbf{x}} P(\mathbf{x}) \log_2 P(\mathbf{x}) \;\;\text{bits/letter},$$
where $\mathbf{x} = \big(x(n), x(n+1), \ldots, x(n+N-1)\big)$ and $\mathbf{X} = \big(X(n), X(n+1), \ldots, X(n+N-1)\big)$.
The lower bound of source coding is $H_\infty(X) = \lim_{N \to \infty} H_N(X)$.

- Variable-length source coding theorem

Let X be a discrete-amplitude, stationary, ergodic source and $H_N(X)$ its Nth-order entropy. Then there exists a binary prefix code with an average bit rate b satisfying
$$H_N(X) \le b \le H_N(X) + \frac{1}{N},$$
where the prefix condition means that no codeword is a prefix (initial part) of another codeword.

Example: DMS X with i.i.d. $x(n) \in A_x = \{a = \text{sunny},\; b = \text{rainy}\}$, $P(a) = p_1 = 0.8$, $P(b) = p_2 = 0.2$. The entropy is
$$H(X) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 \approx 0.72 \;\;\text{bits/letter}.$$

How many bits are needed to remove the uncertainty?

- Single-letter block (N = 1):

Source word | Codeword | Bits | Probability
a | 1 | 1 | 0.8
b | 0 | 1 | 0.2

Average rate: 1 x 0.8 + 1 x 0.2 = 1 bit/letter.

- Two-letter block (N = 2):

Source word | Codeword | Bits | Probability
aa | 1   | 1 | 0.64
ab | 01  | 2 | 0.16
ba | 001 | 3 | 0.16
bb | 000 | 3 | 0.04

Average rate: (1 x 0.64 + 2 x 0.16 + 3 x 0.16 + 3 x 0.04) / 2 = 1.56 / 2 = 0.78 bits/letter.
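These numbers are easy to verify numerically. Below is a small Python sketch (the symbols, probabilities, and code tables are exactly those of the example above; everything else is just bookkeeping) that reproduces H(X) ~ 0.72 bits/letter and the N = 1 and N = 2 average rates.

```python
import math

def entropy(probs):
    """First-order entropy H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# DMS with A_x = {a = sunny, b = rainy}, p1 = 0.8, p2 = 0.2
p = {"a": 0.8, "b": 0.2}
print(f"H(X) = {entropy(p.values()):.3f} bits/letter")        # ~0.722

# Single-letter block code (N = 1): a -> 1, b -> 0
single = {"a": "1", "b": "0"}
rate1 = sum(p[s] * len(single[s]) for s in p)
print(f"N = 1 average rate: {rate1:.2f} bits/letter")         # 1.00

# Two-letter block code (N = 2): aa -> 1, ab -> 01, ba -> 001, bb -> 000
block2 = {"aa": "1", "ab": "01", "ba": "001", "bb": "000"}
rate2 = sum(p[s1] * p[s2] * len(block2[s1 + s2])
            for s1 in p for s2 in p) / 2    # divide by 2: bits per letter
print(f"N = 2 average rate: {rate2:.2f} bits/letter")         # 0.78
```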
8.2 Entropy Coding

How can a real code be constructed that achieves the theoretical limits?
(a) Huffman coding (1952)
(b) Arithmetic coding (1976)
(c) Ziv-Lempel coding (1977)

8.2.1 Huffman Coding

Variable-length coding (VLC) with the following characteristics:
- Lossless entropy coding for digital signals.
- Fewer bits for highly probable events.
- Prefix codes.

Procedure (given the probability distribution of X), in two stages:
- Stage 1: Construct a binary tree from the events. Repeatedly select the two least probable events a and b and replace them by a single node whose probability is the sum of the probabilities of a and b.
- Stage 2: Assign codes sequentially from the root.

Example: Let the alphabet A_x consist of four symbols as shown in the following table.

Symbol | Probability
a1 | 0.50
a2 | 0.25
a3 | 0.125
a4 | 0.125

The entropy of the source is
$$H = -0.5 \log_2 0.5 - 0.25 \log_2 0.25 - 0.125 \log_2 0.125 - 0.125 \log_2 0.125 = 1.75 \;\;\text{bits/symbol}.$$

The tree diagram for Huffman coding first merges a3 (p = 0.125) and a4 (p = 0.125) into a node with p = 0.25, then merges that node with a2 (p = 0.25) into a node with p = 0.5, and finally merges the result with a1 (p = 0.5) into the root; labeling the two branches at each node 0 and 1 and reading the labels from the root yields the Huffman code:

Symbol | Huffman code | Probability
a1 | 0   | 0.50
a2 | 10  | 0.25
a3 | 110 | 0.125
a4 | 111 | 0.125

The average bit rate is b = 1 x 0.5 + 2 x 0.25 + 3 x 0.125 + 3 x 0.125 = 1.75 = H.

Implementation:
- Step 1: Estimate the probability distribution from samples.
- Step 2: Design the Huffman code using the probabilities obtained in Step 1.

Advantages:
- Approaches H(X) (with or without memory) as the block size N goes to infinity.
- Relatively simple procedure, easy to follow.

Disadvantages:
- A large N, or preprocessing, is needed for sources with memory.
- Hard to adjust the code in real time.

Variations:
- Modified Huffman codes: codewords longer than L become fixed-length.
- Adaptive Huffman codes.
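As a concrete illustration of the two-stage procedure, here is a minimal Python sketch applied to the four-symbol example above. The exact 0/1 labels depend on how ties between equal-probability nodes are broken, so only the codeword lengths, and therefore the 1.75 bits/symbol average, are guaranteed to match the table.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman prefix code; returns {symbol: bitstring}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)                 # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}    # prepend branch labels
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))  # replace by a single node
        count += 1
    return heap[0][2]

probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)                                 # e.g. {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
print(f"average rate = {avg} bits/symbol")  # 1.75 = H
```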
Arithmetic Coding: an Introductory Example (adapted from the "Compression II" slides)

Unlike the variable-length codes described previously, arithmetic coding generates non-block codes: a one-to-one correspondence between source symbols and codewords does not exist. Instead, an entire sequence of source symbols (a message) is assigned a single arithmetic codeword. The codeword itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence, so the rate approaches the limit set by the entropy.

Example: let the message to be encoded be a1 a2 a3 a3 a4, with the model p(a1) = p(a2) = p(a4) = 0.2 and p(a3) = 0.4 (sub-intervals [0, 0.2), [0.2, 0.4), [0.4, 0.8), [0.8, 1.0)). Subdividing the unit interval symbol by symbol leaves the final interval [0.06752, 0.0688).

Any number in this interval, for example 0.068, can be used to represent the message. Here 3 decimal digits represent the 5-symbol source message, i.e., 3/5 = 0.6 decimal digits per source symbol, which compares favorably with the entropy of
$$-(3 \times 0.2 \log_{10} 0.2 + 0.4 \log_{10} 0.4) = 0.5786 \;\;\text{digits per symbol}.$$

As the length of the sequence increases, the resulting arithmetic code approaches the bound set by the entropy. In practice the rate fails to reach this lower bound because of
- the end-of-message indicator needed to separate one message from another, and
- the use of finite-precision arithmetic.

Decoding: decode 0.572. Since 0.4 <= 0.572 < 0.8 (the sub-interval of a3), the first symbol is a3. Repeating the subdivision gives the successive intervals
[0, 1) -> a3: [0.4, 0.8) -> a3: [0.56, 0.72) -> a1: [0.56, 0.592) -> a2: [0.5664, 0.5728) -> a4: [0.57152, 0.5728),
each of which still contains 0.572. Therefore the decoded message is a3 a3 a1 a2 a4.

8.2.3 Arithmetic Coding

- Variable-length to variable-length, lossless entropy coding for digital signals.
- One source symbol may produce several bits; several source symbols (letters) may produce a single bit.
- The source model (probability distribution) can be derived in real time.
- Similar to Huffman prefix codes in special cases.

Principle: a message (source string) is represented by an interval of real numbers between 0 and 1. More frequent messages have larger intervals, allowing fewer bits to specify those intervals.

Example:

Source symbol | Probability (binary) | Cumulative probability (binary)
a | 0.500 (0.100) | 0.000 (0.000)
b | 0.250 (0.010) | 0.500 (0.100)
c | 0.125 (0.001) | 0.750 (0.110)
d | 0.125 (0.001) | 0.875 (0.111)

The length of each symbol's interval is proportional to its probability:
a -> [0.000, 0.500), b -> [0.500, 0.750), c -> [0.750, 0.875), d -> [0.875, 1.000).
Any point in the interval [0.0, 0.5) represents "a"; say, 0.25 (binary 0.01) or 0.0 (binary 0.00).
Any point in the interval [0.75, 0.875) represents "c"; say, 0.8125 (binary 0.1101) or 0.75 (binary 0.110).

Transmitting 3 letters "a a b": after the 1st letter "a" the interval (in binary) is [0.0, 0.1); after the 2nd letter "a" it is [0.0, 0.01); after the 3rd letter "b" it is [0.001, 0.0011).
- Any point in the interval [0.001, 0.0011) identifies "aab"; say, 0.00101 or 0.0010.
- A model (probability distribution) is needed.

Procedure: recursive computation of two key values of the interval, C (code point, the leftmost point) and A (interval width). On receiving a symbol $a_i$:
$$\text{New } C = \text{Current } C + (\text{Current } A \times P_i), \qquad \text{New } A = \text{Current } A \times p_i,$$
where $P_i$ is the cumulative probability of $a_i$ and $p_i$ is the probability of $a_i$.

[Encoder]
Step 0: Initialize C = 0, A = 1.
Step 1: Receive a source symbol (if there are no more symbols, it is EOF); compute the new C and A.
Step 2: If EOF, send the code string that identifies the current interval and stop. Otherwise, send the code bits that have been uniquely determined so far and go to Step 1.

[Decoder]
Step 0: Initialize C = 0, A = 1.
Step 1: Examine the code string received so far and search for the interval in which it lies.
Step 2: If a symbol can be decided, decode it; otherwise go to Step 1.
Step 3: If the decoded symbol is EOF, stop. Otherwise adjust C and A and go to Step 2.

More details (I. H. Witten et al., "Arithmetic Coding for Data Compression," Communications of the ACM, pp. 520-540, June 1987):
- integer arithmetic (scaling intervals up),
- bits to follow (undecided symbols),
- updating the model.

Performance

Advantages:
(1) Approaches $H_1(X)$, given sufficient delay and data precision.
(2) Adapts to the local statistics.
(3) Inter-letter correlation can be reduced by using conditional probabilities (a model with context).
(4) Simple procedures without multiplication and division have been developed (IBM Q-coder, AT&T Minimax coder).

Disadvantage: sensitive to channel errors.
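Below is a floating-point toy version of the C/A recursion and the corresponding interval search for decoding, using the a/b/c/d table above. It is only a sketch: practical coders use the integer arithmetic, incremental bit output, and EOF handling of the Witten et al. paper cited above.

```python
def encode_interval(message, p, P):
    """Recursive interval computation: C is the code point (left end),
    A the interval width.  p: symbol -> probability,
    P: symbol -> cumulative probability of all preceding symbols."""
    C, A = 0.0, 1.0
    for s in message:
        C = C + A * P[s]          # new C = current C + current A * P_i
        A = A * p[s]              # new A = current A * p_i
    return C, C + A               # any number in [C, C + A) identifies the message

def decode(value, p, P, n):
    """Decode n symbols from a number in [0, 1) (no EOF handling)."""
    out = []
    for _ in range(n):
        for s in sorted(P, key=P.get, reverse=True):
            if value >= P[s]:                  # sub-interval containing value
                out.append(s)
                value = (value - P[s]) / p[s]  # rescale back to [0, 1)
                break
    return "".join(out)

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
P = {"a": 0.0, "b": 0.5, "c": 0.75, "d": 0.875}

low, high = encode_interval("aab", p, P)
print(low, high)              # 0.125, 0.1875 -> [0.001, 0.0011) in binary, as above
print(decode(0.15, p, P, 3))  # 'aab'
```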
8.3 Lossless Compression Methods

(1) Lossless predictive coding.
(2) Run-length coding of bit planes.

8.3.1 Lossless Predictive Coding

Encoder: the residual is the difference between the current sample and the predicted (previous) sample. Decoder: the reconstructed sample is the received residual plus the previous sample.

Figure: block diagram of (a) an encoder and (b) a decoder using a simple predictor.

Example of integer prediction, where a, b, c, d are previously coded neighbors of the current pixel x:
$$\hat{x} = \mathrm{int}\!\left(\frac{a+b}{2}\right) \quad \text{or} \quad \hat{x} = \mathrm{int}\!\left(\frac{a+b+c+d}{4}\right).$$

Figure: histograms of (a) the original image intensity (spread over 0 to 255) and (b) the integer prediction error (sharply peaked around 0), which is why the residual is much cheaper to entropy code.

8.3.2 Run-Length Coding

Source model: a first-order Markov sequence, in which the probability distribution of the current state depends only on the previous state:
$$P\big(X(n) \mid X(n-1), X(n-2), \ldots\big) = P\big(X(n) \mid X(n-1)\big).$$

Procedure: a run of length k consists of (k - 1) non-transitions followed by a transition. For example, the binary sequence B B B W B B W W W ... is represented by the run lengths 3, 1, 2, 3, ... The underlying model is a two-state (B/W) Markov chain with transition probabilities $t_B$ and $t_W$ and self-transition probabilities $1 - t_B$ and $1 - t_W$.

Remarks:
- All runs are independent (if runs are allowed to be arbitrarily long).
- The entropy of the run-length representation equals the entropy of the original source (the mapping is invertible); the advantage is that the runs are easier to code efficiently.
- Modified run-length codes: run lengths are limited to a maximum L.

8.3.2.1 Run-Length Coding of Bit Planes

An 8-bit gray-level image is decomposed into eight binary bit planes, from the most significant to the least significant bit (e.g., the gray level 130 = 10000010 in binary contributes a 1 to each bit plane where its bit is 1), and each bit plane is run-length coded.

Figure: bit-plane decomposition of an 8-bit image.

Refinements: Gray code (to reduce bit-plane transitions between adjacent gray levels), 1-D RLC, 2-D RLC.

8.4 Rate-Distortion Theory

A source sample x(n) is lossily coded into y(n), with a distortion measure $d\big(x(n), y(n)\big) \ge 0$.

Random variables X and Y are related by the average mutual information
$$I(X;Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x,y)}{P(x)\,P(y)} = H(X) - H(X \mid Y),$$
where H(X) represents the uncertainty about X before knowing Y, H(X|Y) the uncertainty about X after knowing Y, and I(X;Y) the average mutual information, i.e., the information provided about X by Y.

8.4.1 Rate-Distortion Function

For a discrete-amplitude memoryless source (DMS) X: $x \in A_x = \{a_i\}$ (the source alphabet), $y \in A_y = \{b_j\}$ (the destination alphabet), with P(X) given and a single-letter distortion measure $d(a_i, b_j)$. The average distortion is
$$E\{d(x,y)\} = \sum_{x,y} P(x,y)\, d(x,y).$$

Rate-distortion function R(D) (or distortion-rate function D(R)):
$$R(D) = \min_{P(y \mid x):\; E\{d(x,y)\} \le D} I(X;Y),$$
the average number of bits needed to represent a source symbol if an average distortion D is allowed.

8.4.2 Source Coding Theorem

A code (codebook) B of size M and block length N is a set of reproduction vectors (codewords) $B = \{y_1, y_2, \ldots, y_M\}$, where each codeword has N components, $y_i = \big(y_i(1), y_i(2), \ldots, y_i(N)\big)$.

Coding rule: a mapping from the N-tuple source words x to B. Each source word x is mapped to the codeword $y \in B$ that minimizes d(x, y); that is,
$$d(x \mid B) = \min_{y \in B} d(x, y).$$

Average distortion of code B:
$$d(B) = \frac{1}{N} E\{d(x \mid B)\} = \frac{1}{N} \sum_{\text{all } x} P(x)\, d(x \mid B).$$

Source coding theorem: for a DMS X with alphabet $A_x$, probability p(x), and a single-letter distortion measure $d(\cdot,\cdot)$, and for a given average distortion D, there exists a sufficiently large block length N and a code B of size M and block length N such that
$$\text{Rate} = \frac{\log_2 M}{N} \approx R(D)$$
while the average distortion is arbitrarily close to D.

In other words, there exists a mapping from source words to codewords such that, for a given distortion D, R(D) bits/symbol are sufficient to enable source reconstruction with an average distortion arbitrarily close to D. The function R(D) is called the rate-distortion function. Note that R(0) = H(X). The actual rate R must obey $R \ge R(D)$ for the fidelity level D.

Figure: a typical rate-distortion function R(D), decreasing monotonically from R(0) = H(X) to zero at D = D_max.
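As a quick numerical illustration of the bound, the sketch below evaluates the memoryless Gaussian rate-distortion formulas R(D) = (1/2) log2(sigma^2/D) and D(R) = 2^(-2R) sigma^2, which reappear in the quantization examples of the next section; the specific rates printed are just sample values.

```python
import math

def R(D, var):
    """Rate-distortion function of a memoryless Gaussian source under
    squared-error distortion: R(D) = 1/2 log2(var / D) for 0 < D <= var."""
    return 0.5 * math.log2(var / D) if D < var else 0.0

def D(rate, var):
    """Distortion-rate function: D(R) = 2**(-2R) * var."""
    return (2.0 ** (-2.0 * rate)) * var

var = 1.0
for rate in (1, 2, 4, 8):
    d = D(rate, var)
    snr_db = 10 * math.log10(var / d)
    print(f"R = {rate} bits -> D = {d:.6f}, SNR = {snr_db:.1f} dB")  # ~6.02 dB per bit

# 40 dB SNR corresponds to var/D = 10**4, i.e. about 6.6 (~7) bits/sample
print(f"R at 40 dB: {R(1e-4, var):.2f} bits/sample")
```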
8.5 Scalar Quantization

Block size = 1: approximate a continuous-amplitude source with a finite number of levels,
$$Q(s) = r_i, \quad \text{if } s \in [d_{i-1}, d_i), \quad i = 1, \ldots, L.$$
A scalar quantizer Q is a function defined in terms of a finite set of decision levels $d_i$ and reconstruction levels $r_i$, where L is the number of output states.

8.5.1 Lloyd-Max Quantizer (Nonuniform Quantizer)

To minimize
$$E\{(s - Q(s))^2\} = \sum_{i=1}^{L} \int_{d_{i-1}}^{d_i} (s - r_i)^2\, p(s)\, ds$$
with respect to $r_i$ and $d_i$, it can be shown that the necessary conditions are
$$r_i = \frac{\int_{d_{i-1}}^{d_i} s\, p(s)\, ds}{\int_{d_{i-1}}^{d_i} p(s)\, ds}, \quad 1 \le i \le L, \qquad
d_i = \frac{r_i + r_{i+1}}{2}, \quad 1 \le i \le L-1,$$
with $d_0 = -\infty$ and $d_L = \infty$.

Example 1: Gaussian DMS $X \sim N(0, \sigma_X^2)$ with squared-error distortion:
$$R(D) = \begin{cases} \frac{1}{2}\log_2\frac{\sigma_X^2}{D}, & 0 \le D \le \sigma_X^2 \\ 0, & D > \sigma_X^2 \end{cases}
\qquad\Longleftrightarrow\qquad D(R) = 2^{-2R}\,\sigma_X^2.$$
A uniform scalar quantizer at high rates achieves
$$R^*(D) = \frac{1}{2}\log_2\frac{\varepsilon^2 \sigma_X^2}{D} = R(D) + 0.255, \qquad D^*(R) = \varepsilon^2\, 2^{-2R}\,\sigma_X^2, \qquad \varepsilon^2 = \frac{\pi e}{6} \approx 1.42 \;\; (e \approx 2.71).$$

Example 2: Lloyd-Max quantization of a Laplacian-distributed signal with unit variance.

Table: decision and reconstruction levels for Lloyd-Max quantizers (positive levels only; the quantizers are symmetric about zero). Delta is the step size that defines the corresponding optimal uniform quantizer.

Levels | Delta (uniform) | d_i (nonuniform) | r_i (nonuniform)
2 | 1.414 | 0 | 0.707
4 | 1.087 | 0, 1.102 | 0.395, 1.810
8 | 0.731 | 0, 0.504, 1.181, 2.285 | 0.222, 0.785, 1.576, 2.994

For the nonuniform quantizer with L = 4 the full set of levels is
$$d_0 = -\infty,\; d_1 = -1.102,\; d_2 = 0,\; d_3 = 1.102,\; d_4 = +\infty; \qquad r_1 = -1.810,\; r_2 = -0.395,\; r_3 = 0.395,\; r_4 = 1.810.$$

Example 3: quantizer noise. For a memoryless Gaussian signal s with zero mean and variance $\sigma^2$, the mean square quantization noise is $D = E\{(s - \tilde{s})^2\}$ and the signal-to-noise ratio in dB is
$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma^2}{D}.$$
SNR = 40 dB implies $\sigma^2 / D = 10{,}000$. Substituting this into the rate-distortion function of a memoryless Gaussian source,
$$R(D) = \frac{1}{2}\log_2\frac{\sigma^2}{D},$$
gives R(D) of about 6.6, i.e., roughly 7 bits/sample. Likewise, quantization with 8 bits/sample yields approximately 48 dB SNR.

8.6 Differential Pulse Code Modulation (DPCM)

Encoder: the prediction error $e = s - \hat{s}$ is quantized to $\tilde{e}$ and entropy coded into the binary stream; the predictor P forms $\hat{s} = \mathrm{Pred}(\tilde{s})$ from past reconstructed samples $\tilde{s} = \hat{s} + \tilde{e}$, so the quantizer sits inside the prediction feedback loop. Decoder: the entropy decoder recovers $\tilde{e}$, and the decoded sample is $\tilde{s} = \hat{s} + \tilde{e}$, using the same predictor P.

Figure: block diagram of a DPCM (a) encoder and (b) decoder, with $\tilde{s} = \hat{s} + \tilde{e}$ used for reconstruction.

8.6.1 Optimal Prediction

For the 1-D case,
$$\hat{s}(n) = \sum_{k} h(k)\, s(n-k),$$
where the predictor coefficients h(k) correspond to the coefficients a(k) in the (autoregressive) source model.

8.6.1.1 Image Modeling

Modeling the source image as a stationary random field, a linear minimum mean square error (LMMSE) predictor of the form
$$\hat{s}(n_1, n_2) = c_1\, s(n_1-1, n_2-1) + c_2\, s(n_1-1, n_2) + c_3\, s(n_1-1, n_2+1) + c_4\, s(n_1, n_2-1)$$
can be designed to minimize the mean square prediction error $E\{(s - \hat{s})^2\}$.

The optimal coefficient vector $c = (c_1, c_2, c_3, c_4)^T$ is given by the normal equations $c = \Phi^{-1} r$, where, with $R_s(k,l)$ denoting the image autocorrelation,
$$\Phi = \begin{bmatrix}
R_s(0,0) & R_s(0,1) & R_s(0,2) & R_s(1,0) \\
R_s(0,1) & R_s(0,0) & R_s(0,1) & R_s(1,-1) \\
R_s(0,2) & R_s(0,1) & R_s(0,0) & R_s(1,-2) \\
R_s(1,0) & R_s(1,-1) & R_s(1,-2) & R_s(0,0)
\end{bmatrix},
\qquad
r = \big(R_s(1,1),\; R_s(1,0),\; R_s(1,-1),\; R_s(0,1)\big)^T.$$

Although this coefficient vector is optimal in the LMMSE sense, it is not necessarily optimal in the sense of minimizing the entropy of the prediction error. Furthermore, images rarely obey the stationarity assumption. As a result, most DPCM schemes employ a fixed predictor.

The analysis and design of the optimum predictor is difficult because the quantizer is inside the feedback loop. A heuristic remedy is to add a quantization-noise rejection filter before the predictor. To avoid channel error propagation, a leaky predictor may be useful. Variations of DPCM include adaptive prediction and adaptive quantization.
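A minimal 1-D DPCM sketch is given below; it is not the 2-D scheme of the slides but illustrates the key point of the block diagram: the predictor must operate on reconstructed samples so that encoder and decoder stay in sync. A previous-sample predictor and a uniform quantizer with a fixed step size are assumed; entropy coding of the quantizer indices is omitted.

```python
def dpcm_encode(samples, step):
    """1-D DPCM with a previous-sample predictor and a uniform quantizer.
    The predictor uses the *reconstructed* previous sample (quantizer
    inside the feedback loop), so errors do not accumulate."""
    indices = []
    s_rec = 0.0                        # reconstructed previous sample
    for s in samples:
        e = s - s_rec                  # prediction error e = s - s^
        q = int(round(e / step))       # quantizer index for e~ = q * step
        indices.append(q)
        s_rec = s_rec + q * step       # local decoder: s~ = s^ + e~
    return indices

def dpcm_decode(indices, step):
    out, s_rec = [], 0.0
    for q in indices:
        s_rec = s_rec + q * step       # same recursion as the encoder's local decoder
        out.append(s_rec)
    return out

x = [100, 102, 105, 107, 110, 108, 104, 100]
idx = dpcm_encode(x, step=2.0)
print(idx)                             # small residual indices, cheap to entropy code
print(dpcm_decode(idx, step=2.0))      # reconstruction within +/- step/2 of the input
```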
8.6.2 Adaptive Quantization

Adjust the decision and reconstruction levels according to the local statistics of the prediction error: each block selects one of several quantizer scalings (e.g., Q1 = 0.5, Q2 = 1.0, Q3 = 1.75, Q4 = 2.5), and the selection is transmitted with the coded block.

8.7 Delta Modulation

DPCM with a previous-sample predictor, $\hat{s}_n = \tilde{s}_{n-1}$, and a two-level (1-bit) quantizer for the prediction error e.

Figure: the quantizer for delta modulation.

Two characteristic distortions: granular noise (in flat regions the output oscillates about the signal) and slope overload (the output cannot follow a signal that changes faster than one step per sample).

Figure: illustration of granular noise and slope overload.

8.8 Transform Coding

Motivation:
- Rate-distortion theory: insert distortion in the frequency domain following the rate-distortion formula.
- Decorrelation: transform coefficients are (almost) independent.
- Energy concentration: transform coefficients are ordered according to the importance of their information content.

8.8.1 Linear Transforms

Discrete-space linear orthogonal transforms.

Separable transforms:
$$y(k,l) = \sum_{m}\sum_{n} t_1(m,k)\, t_2(n,l)\, x(m,n),$$
where $t_1$ and $t_2$ may be chosen as the DFT (discrete Fourier transform), DCT (discrete cosine transform), DST (discrete sine transform), WHT (Walsh-Hadamard transform), etc. -- all with fixed basis functions.

Non-separable transform: the KLT (Karhunen-Loeve transform), whose basis functions $b_k$ are derived from the autocorrelation matrix R of the source signal by
$$R\, b_k = \lambda_k\, b_k.$$

KL transform properties:
- The KLT coefficients are uncorrelated (if x is Gaussian, they are independent).
- The KLT offers the best energy compaction.
- If x is first-order Markov with correlation coefficient $\rho$, the DCT approaches the KLT of such a source as $\rho \to 1$, and the DST as $\rho \to 0$.
- Performance for typical still images: the DCT performs nearly as well as the KLT.

8.8.2 Optimum Transform Coder

For a stationary random vector source x with covariance matrix R:
x(n) -> linear transform A -> y(k) -> scalar quantizers Q -> y~(k) -> linear transform B -> x~(n).

Goal: minimize the mean square coding error $E\{(x - \tilde{x})^T (x - \tilde{x})\}$.

Question: what are the optimum A, B, and Q?
Answer: A is the KLT of x; Q is the optimum (entropy-constrained) scalar quantizer for each y(k); $B = A^{-1}$.

8.8.3 Bit Allocation

How many bits should be assigned to each transform coefficient? The total rate is
$$R = \frac{1}{N}\sum_{k=1}^{N} n_k,$$
where N is the block size and $n_k$ is the number of bits assigned to the k-th coefficient y(k).

Distortion (MSE in the transform domain):
$$D = \frac{1}{N}\sum_{k=1}^{N} E\{(y(k) - \tilde{y}(k))^2\}.$$

Recall that the distortion of the optimal scalar quantizer is
$$D_k = \varepsilon_k^2\, \sigma_k^2\, 2^{-2 n_k},$$
where $\sigma_k^2$ is the variance of the coefficient y(k) and $\varepsilon_k^2$ is a constant that depends on the source probability distribution (= 2.71 for a Gaussian distribution). Hence
$$D = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k^2\, \sigma_k^2\, 2^{-2 n_k}.$$

For a given total rate R, assuming all $\varepsilon_k^2$ have the same value $\varepsilon^2$, the optimal allocation is
$$n_k = \begin{cases} \dfrac{1}{2}\log_2\dfrac{\varepsilon^2\sigma_k^2}{\theta}, & \varepsilon^2\sigma_k^2 \ge \theta \\[4pt] 0, & \varepsilon^2\sigma_k^2 < \theta, \end{cases}
\qquad
D_{\min} = \frac{1}{N}\sum_{k=1}^{N} \min\big(\theta,\; \varepsilon^2\sigma_k^2\big),
\qquad
R = \frac{1}{2N}\sum_{k:\, \varepsilon^2\sigma_k^2 \ge \theta} \log_2\frac{\varepsilon^2\sigma_k^2}{\theta}.$$
If R is given, $\theta$ is obtained by solving the last equation.

Except for the constant $\varepsilon^2$ (due to the scalar quantizer), these results are identical to the rate-distortion function of a stationary Gaussian source. Transform coefficients whose variance falls below the threshold $\theta$ are not transmitted.

Figure: coefficient variances $\varepsilon^2\sigma_k^2$ versus index k, with the threshold $\theta$ separating the coded coefficients from those that are discarded.
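The allocation above is a reverse water-filling rule: the threshold $\theta$ is chosen so that the per-coefficient rates add up to the target R. The sketch below solves for the threshold by bisection, taking $\varepsilon^2 = 1$ (i.e., ignoring the quantizer constant) and using made-up coefficient variances chosen to mimic the energy compaction of a transform.

```python
import math

def allocate_bits(variances, target_rate, tol=1e-9):
    """Reverse water-filling bit allocation: find theta such that
    (1/2N) * sum_{var_k >= theta} log2(var_k / theta) equals the target
    rate; then n_k = max(0, 0.5 * log2(var_k / theta))."""
    N = len(variances)

    def rate(theta):
        return sum(0.5 * math.log2(v / theta) for v in variances if v > theta) / N

    lo, hi = tol, max(variances)            # rate(theta) decreases as theta grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate(mid) > target_rate:
            lo = mid                        # too many bits -> raise the threshold
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    bits = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    distortion = sum(min(theta, v) for v in variances) / N
    return bits, theta, distortion

# Hypothetical coefficient variances with strong energy compaction
variances = [100.0, 30.0, 10.0, 3.0, 1.0, 0.5, 0.2, 0.1]
bits, theta, distortion = allocate_bits(variances, target_rate=1.0)
print([round(b, 2) for b in bits])          # most bits go to high-variance coefficients
print(round(theta, 4), round(distortion, 4))
```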
8.8.4 Practical Transform Coding

[Encoder] Image block -> transform T -> coefficient selection and quantization -> entropy encoder -> codes.
[Decoder] Codes -> entropy decoder -> inverse transform T^-1 -> reproduced image block.

Block size: 8 x 8.

Transform: 2-D type-2 DCT,
$$S(k_1, k_2) = \frac{4\, C(k_1)\, C(k_2)}{N^2} \sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} s(n_1, n_2)\, \cos\frac{(2 n_1 + 1) k_1 \pi}{2N}\, \cos\frac{(2 n_2 + 1) k_2 \pi}{2N},$$
where $k_1, k_2, n_1, n_2 \in \{0, 1, \ldots, N-1\}$ and
$$C(k) = \begin{cases} \sqrt{1/2}, & k = 0 \\ 1, & \text{otherwise.} \end{cases}$$

Threshold coding: each coefficient is quantized by
$$\tilde{S}(k_1, k_2) = \mathrm{NINT}\!\left(\frac{S(k_1, k_2)}{T(k_1, k_2)}\right),$$
with the quantization matrix

T(k1,k2) =
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Zig-zag scanning: the quantized coefficients are reordered along anti-diagonals from the DC term (0,0) to the highest-frequency term (N-1, N-1), so that the many zero-valued high-frequency coefficients cluster at the end of the scan.

Entropy coding: Huffman or arithmetic coding, applied separately to the DC and AC coefficients.
Encoder: source block -> DCT -> divide by the quantization matrix (QM); the DC coefficient passes through a DPCM encoder and then the entropy coder, while the AC coefficients go directly to the entropy coder, producing the bit stream.
Decoder: bits -> entropy decoder (with a DPCM decoder for the DC coefficient) -> DC and AC coefficients -> multiply by QM -> IDCT -> reconstructed block.

8.8.5 Performance

For typical CCIR 601 pictures:
- Excellent quality at about 2 bits/pel; good quality at about 0.8 bits/pel.
- Blocking artifacts appear on reconstructed pictures at very low bit rates (< 0.5 bits/pel).
- Close to the best known algorithms at around 0.75 to 2.0 bits/pel.
- Complexity is acceptable.
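To tie Section 8.8.4 together, here is a self-contained sketch of the intra-block pipeline: the type-2 2-D DCT exactly as in the formula above, division by T(k1, k2) with rounding (NINT), and the zig-zag scan. Entropy coding of the DC/AC streams is omitted, and the input block is a made-up test pattern; a real encoder would also level-shift the pixels and use a fast DCT.

```python
import math

N = 8

# Quantization matrix T(k1, k2) from the threshold-coding slide
T = [[16, 11, 10, 16, 24, 40, 51, 61],
     [12, 12, 14, 19, 26, 58, 60, 55],
     [14, 13, 16, 24, 40, 57, 69, 56],
     [14, 17, 22, 29, 51, 87, 80, 62],
     [18, 22, 37, 56, 68, 109, 103, 77],
     [24, 35, 55, 64, 81, 104, 113, 92],
     [49, 64, 78, 87, 103, 121, 120, 101],
     [72, 92, 95, 98, 112, 100, 103, 99]]

def dct2d(block):
    """2-D type-2 DCT per the slide formula (direct O(N^4) evaluation)."""
    C = lambda k: math.sqrt(0.5) if k == 0 else 1.0
    S = [[0.0] * N for _ in range(N)]
    for k1 in range(N):
        for k2 in range(N):
            acc = 0.0
            for n1 in range(N):
                for n2 in range(N):
                    acc += (block[n1][n2]
                            * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * N))
                            * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * N)))
            S[k1][k2] = 4.0 * C(k1) * C(k2) * acc / (N * N)
    return S

def quantize(S):
    """Threshold coding: S~(k1, k2) = NINT(S(k1, k2) / T(k1, k2))."""
    return [[int(round(S[i][j] / T[i][j])) for j in range(N)] for i in range(N)]

def zigzag(block):
    """Zig-zag scan: order coefficients along anti-diagonals, DC first."""
    order = sorted(((i, j) for i in range(N) for j in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]

# Made-up test block: constant rows, one low horizontal frequency
block = [[128 + 10 * math.cos(math.pi * (2 * j + 1) / 16) for j in range(N)]
         for _ in range(N)]
coeffs = zigzag(quantize(dct2d(block)))
print(coeffs[:10])   # a large DC term, one low-frequency AC term, then zeros
```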