University of California, San Diego
P. Venkat Rangan
CSE 126: Multimedia Systems
Spring 03
Solution for HW1
_______________________________________________________________________

1. Multimedia: Color Concepts (6 points):

1.1 (2 points) Given three different kinds of color representations for a pixel in an image, which of them is better from the viewpoint of reducing the number of bits needed for their representation, and why?

Sol. Three color representations in wide use today are:

1. RGB (Red, Green, Blue): Every color is a combination of three primary colors: red, green and blue. RGB allocates 8 bits to each of these primaries (24 bits per pixel) to represent a true color.
2. HSB (Hue, Saturation, Brightness): H indicates the dominant color, S indicates how dominant that color is, and B represents the brightness of the color.
3. YUV: Y represents the intensity of the color, while U and V represent its chrominance.

YUV is the best of the three for reducing the number of bits. Since the human eye is more sensitive to intensity variation than to color variation, we can allocate fewer bits to U and V. In fact, a YUV encoding scheme provides similar quality using 16 bits per pixel, as compared to 24 bits in an RGB representation.

1.2 (2 points) Suppose you are required to represent a red rectangular object that is one eighth (1/8) the size of a TV screen (assume the TV has HDTV resolution, i.e., it has 1000 horizontal lines). One way is to represent the red rectangle as an image of pixels. Another is to use a geometric representation of the rectangle (assume the TV screen is a 2-dimensional coordinate system, and that up to 255 regular shapes such as polygons, etc., plus 1 category representing “all other shapes”, are represented by the needed number of bits). How many bits do you need in each of these two cases? Assume RGB representation for color, with each primary color represented by 1 byte.
(Hint: In the pixel representation, each pixel will need its coordinate and the three colors specified.)

Sol. We first consider the first approach, representing the red rectangle as an image of pixels. By the HDTV standard, the TV screen has 1000 scan lines and a 16:9 aspect ratio.

[Figure: TV screen with 16:9 aspect ratio; X width = 1778, Y height = 1000]

Let x and y be the two coordinates shown in the figure. The number of possible x values is 1000*(16/9), or about 1778, while that of y is 1000. Representing x takes 11 bits (2^11 = 2048) and representing y takes 10 bits (2^10 = 1024), so we need (10+11) = 21 bits for the coordinate (x,y) of each pixel. Because the rectangle is one eighth the size of the TV screen, the total number of pixels we must represent is (1/8)*1000*1000*(16/9), or about 222,222. Each pixel needs 24 bits for its color and 21 bits for its coordinate. Hence, the total number of bits we need is (1/8)*1000*1000*(16/9)*(24+21) = 10,000,000 bits.

We now consider the second approach, using a geometric representation of the rectangle. We can represent a rectangular shape by the following instruction:

(‘rectangle’, starting coordinate (x,y), width, height, color)

Because there are 256 possible types of shapes in total, each shape type can be distinguished using 8 bits. In addition, we need 11 bits for the starting x coordinate, 10 bits for the starting y coordinate, 11 bits for the width, 10 bits for the height and 24 bits for the color. Hence, this approach needs 8+11+10+11+10+24 = 74 bits in total.

1.3 (1 point) Given a color represented in the RGB format by:
Red = 0.80
Green = 0.50
Blue = 0.20
Find an equivalent color in the YUV format.

Sol.
Y = 0.3R + 0.59G + 0.11B = 0.3(0.8) + 0.59(0.5) + 0.11(0.2) = 0.24 + 0.295 + 0.022 = 0.557
U = 0.493(B-Y) = 0.493(0.2-0.557) = -0.176
V = 0.877(R-Y) = 0.877(0.8-0.557) = 0.213

1.4 (1 point) Given a color represented in the YUV format by:
Y = 0.50
U = 0.10
V = 0.20
Find an equivalent color in the RGB format.

Sol.
B = Y + U/0.493 = 0.703
R = Y + V/0.877 = 0.728
G = (Y - 0.3R - 0.11B)/0.59 = 0.346

2.
Television Video Synchronization: 5 points

2.1 (1 point) Differentiate between progressive and interlaced scanning in displays.

Sol. Interlacing is a scanning technique used to keep video sequences smooth in displays. Each frame is split into two halves: the odd horizontal scan lines and the even scan lines. The two halves are displayed alternately, yielding twice the refresh rate. Unlike interlaced scanning, progressive scanning displays the entire frame in a single pass. Even though its refresh rate is half as fast, it provides better-quality pictures.

2.2 (2 points) Assume that you are making a new TV standard called MYTVSTD. This is different from NTSC in the sense that, instead of fixing the frame rate to be 29.97 frames per second, you have fixed it to be 29.95 frames per second. Now you want to propose a “drop frame” technique in which you drop some frame numbers from each minute to achieve synchronization with displays that run at 30Hz. Describe what frame numbers you will drop from which minute, and clearly show the reason for your answer.

Sol. In the new MYTVSTD standard, the actual frame rate (29.95 frames per second) is slower than the advertised display rate (30 Hz), so we must drop some frame numbers to stay synchronized. The rate at which frame numbers must be dropped is (30-29.95) = 0.05 frame numbers per second. In one minute, 30 Hz numbering counts 1800 frame numbers while only 29.95*60 = 1797 frames are displayed, so 3 frame numbers must be dropped per minute. Therefore, our scheme works by dropping the first three frame numbers of every minute. (A scheme that drops 1 frame number every 20 seconds is also legitimate.)

2.3 (2 points) Suppose instead that you propose a revolutionary new TV standard called REVTVSTD. In this standard, the frame rate is fixed to be 30.05 frames per second. How will you devise a scheme for the synchronization with 30Hz TV displays?

Sol. Unlike the previous question, the frame rate here is faster than the advertised rate.
Instead of dropping frame numbers, we duplicate 3 frame numbers every minute, since 30.05*60 = 1803 frames are displayed per minute against 1800 frame numbers.

3. Audio Coding: 6 points

3.1 (1 point) Describe the various steps in digitization and coding of audio signals.

Sol. There are two main steps in digitization and coding of audio signals.

1. Sampling: sampling converts a continuous-time analog signal into a discrete-time sequence. It periodically samples the input signal and transforms it into a sequence of intensity values (real numbers).
2. Quantization: quantization rounds each intensity value to a quantum so that the values can be represented with finite precision (usually a fixed number of bits).

3.2 The following sequence of real numbers has been obtained on sampling an audio signal:

2.3, 2.1, 3.2, 1.2, 1.3, 2.3, 2.5, 3.2, 3.8, 3.8, 2.5, 2.0, 1.4, 1.2, 1.2, 1.0, 0.8, 0.6, 0.0, -0.3, -0.5, -0.8, -1.2, -1.5, -1.7, -1.9, -2.2, -2.5, -2.7, -2.9, -3.1, -3.9

Quantize this sequence by dividing the interval [-4, 4) into 32 uniformly distributed levels (place level 0 at -4.0, level 1 at -3.75, and so on; this should simplify your calculations). Assume that input values in the range [-4,-3.75) map to output -4 (which becomes level 0 for the quantizer), input values in the range [-3.75,-3.50) map to output -3.75 (which becomes level 1 for the quantizer), and so on. Here, the intervals are closed at the left and open at the right, which means -4 is included in the first interval but -3.75 is not, and so on.

3.2.1 (2 points) Write down the quantized sequence. How many bits do you need to transmit it?

Sol. The quantized sequence, written as level numbers, is shown below:

25, 24, 28, 20, 21, 25, 26, 28, 31, 31, 26, 24, 21, 20, 20, 20, 19, 18, 16, 14, 14, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 0

In the above sequence, the difference between the maximum and the minimum level is 31 (that is, 32 possible values). Thus, each level can be optimally represented by 5 bits. Since there are 32 samples in total, we need 32*5 = 160 bits.

3.2.2 (3 points) Encode the quantized sequence using DPCM.
What are the maximum and minimum differences between successive samples? Assuming these maximum and minimum values, find out how many bits are now needed to encode the sequence. (1 point) Write a program in your favorite language to do the above computation. Give pseudocode for purposes of this homework. What is the time complexity of the program? Express it in terms of the number of samples in the input.

Sol. The following shows the quantized sequence encoded using DPCM.

25, -1, +4, -8, +1, +4, +1, +2, +3, 0, -5, -2, -3, -1, 0, 0, -1, -1, -2, -2, 0, -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -3

The maximum and minimum differences between two successive samples are +4 and -8 respectively. Since the difference between the maximum and minimum values is 12, there are 13 possible difference values, and we need 4 bits to represent each one. However, the first number in the sequence is a level rather than a difference, so it is encoded separately. Therefore, we need 5 bits to represent the first value and 4 bits for each of the remaining 31 values. In conclusion, we need 5 + 4*31 = 129 bits in total.

4. Quantization: 3 points

4.1 (1 point) Differentiate between scalar and vector quantization.

Sol. Scalar quantization is a quantization technique that does not take relationships between the various dimensions into account. It encodes each dimension independently. In contrast, vector quantization takes advantage of the relationships among the dimensions. (See the example in the lecture notes.)

4.2 (2 points) Suppose that you rewrite the input sequence in the above question (on audio coding) as a sequence of pairs of numbers (two adjacent numbers in the sequence constituting a pair). Draw the set of resulting vectors. How many bits do you need to encode just these vectors? Does vector quantization help in this case?
Sol. The above input sequence can be rewritten as a sequence of vectors (x,y) as follows:

(25,24), (28,20), (21,25), (26,28), (31,31), (26,24), (21,20), (20,20), (19,18), (16,14), (14,12), (11,10), (9,8), (7,6), (5,4), (3,0)

The difference between the maximum and minimum of the first value (x) is 28, while that of the second value (y) is 31. Each value therefore needs 5 bits, so in this case we need (5+5) = 10 bits to represent each vector.

We now try to reduce the number of bits needed to represent the above sequence by using vector quantization. As can be seen, the first value (x) and the second value (y) of each vector tend to be very close. Therefore, we can exploit the relationship between the x and y coordinates of each vector as follows: the y coordinate of each vector is transformed into a new coordinate y’ = y - x, its difference from the x coordinate. The result of the transformation is shown below:

(25,-1), (28,-8), (21,+4), (26,+2), (31,0), (26,-2), (21,-1), (20,0), (19,-1), (16,-2), (14,-2), (11,-1), (9,-1), (7,-1), (5,-1), (3,-3)

Assume the maximum and minimum values of y’ are those observed in the sequence being considered in the problem. The difference between the maximum and minimum of the first value (x) is 28, while the difference between the maximum and minimum of the second value (y’) is only 12. Therefore, in this case we need only (5+4) = 9 bits to represent each vector, i.e., 16*9 = 144 bits instead of 16*10 = 160 bits, so exploiting the relationship between x and y does help.
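
The quantization, DPCM, and pairing computations from problems 3.2 and 4.2 can be sketched in Python as follows. This is a minimal illustration, not the course's reference program: the function names (quantize, dpcm_encode, bits_for_range) are hypothetical, and the 21st sample is taken to be -0.5, an assumption chosen because it is the value consistent with the quantized level 14 listed in the solution to 3.2.1.

```python
import math

# Sampled audio values from problem 3.2.
# Assumption: the 21st sample is -0.5, which matches quantized level 14.
SAMPLES = [2.3, 2.1, 3.2, 1.2, 1.3, 2.3, 2.5, 3.2, 3.8, 3.8, 2.5,
           2.0, 1.4, 1.2, 1.2, 1.0, 0.8, 0.6, 0.0, -0.3, -0.5, -0.8,
           -1.2, -1.5, -1.7, -1.9, -2.2, -2.5, -2.7, -2.9, -3.1, -3.9]


def quantize(samples, lo=-4.0, hi=4.0, levels=32):
    """Map each sample in [lo, hi) to a level index (intervals closed on the left)."""
    step = (hi - lo) / levels
    return [min(levels - 1, max(0, math.floor((x - lo) / step)))
            for x in samples]


def dpcm_encode(levels):
    """Keep the first level, then store each level's difference from its predecessor."""
    return [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]


def bits_for_range(lo, hi):
    """Bits needed to distinguish every integer in [lo, hi]."""
    return max(1, (hi - lo).bit_length())


levels = quantize(SAMPLES)
codes = dpcm_encode(levels)
diffs = codes[1:]

# Plain PCM: every sample costs 5 bits (levels 0..31).
pcm_bits = len(levels) * bits_for_range(0, 31)

# DPCM: 5 bits for the first level, then 4 bits per difference (-8..+4).
diff_bits = bits_for_range(min(diffs), max(diffs))
dpcm_bits = bits_for_range(0, 31) + len(diffs) * diff_bits

# Problem 4.2: pair adjacent levels, then replace y by y' = y - x.
pairs = list(zip(levels[0::2], levels[1::2]))
transformed = [(x, y - x) for x, y in pairs]

print(levels)
print(codes)
print(pcm_bits, dpcm_bits)   # 160 129
print(transformed)
```

Each function makes a single pass over its input, so for n samples the whole computation runs in O(n) time.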