:: Final Presentation: 2-D Discrete Cosine Transform
The Future: Team Paradigm (Group M2)
Tommy Taylor, Brandon Hsiung, Changshi Xiao, Bongkwan Kim
Project Manager: Yaping Zhan
M2: Team Paradigm

:: The Future of Technology...
Strategic applications:
- High-resolution Digital Television (HDTV)
- MPEG-1 and MPEG-2
- JPEG images

We notice an exponential growth of profit!

:: The Concept
[Chart: Advertising Profit per Medium. x-axis: Ad Source; y-axis: Dollar Figure (Profit), 0 to 6,000,000.]
Thinking Outside the Box

:: What Is the Product?
- T.A.D.A. system (Targeted Advertisement Digital Adboard)
- Taxicabs serve as mobile ad units
- Each cab is equipped with a digital ad board
- The ad board contains a GPS transmitter, an HDTV satellite receiver, and solar panel/battery power
Thinking out of the box

:: Extended Product Measures
- Target Grid System (TGS)
- Central HUB Center (CHUB Center)
- Joint venture with Lucent Technologies & Bell Laboratories
[Figure labels: Young Adults (Gen X), Educational Zone, Cautious Spenders]

:: Marketing

:: Distribution

:: Risks and Contingencies
- Lack of specialization in this area
  - Partnership with Lucent Technologies
  - Difficulty in entering a new market

:: What Are the Benefits?
- Expand the company's capabilities
- Gain profit in a new market
- Acquire new clients
- Advantage over competitors

:: How Does It Work?

:: Distributed Algorithm of 1D DCT
The 8-point DCT splits into an even half, computed from sums, and an odd half, computed from differences:

$$
\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix}
\begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix},
\qquad
\begin{bmatrix} X_1 \\ X_3 \\ X_5 \\ X_7 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} D & E & F & G \\ E & -G & -D & -F \\ F & -D & G & E \\ G & -F & E & -D \end{bmatrix}
\begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}
$$

where $A = \cos(\pi/4)$, $B = \cos(\pi/8)$, $C = \sin(\pi/8)$, $D = \cos(\pi/16)$, $E = \cos(3\pi/16)$, $F = \sin(3\pi/16)$, $G = \sin(\pi/16)$.
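The team's C/C++ high-level simulation compares a definition-based DCT against other implementations. In that spirit, the sketch below computes the 8-point DCT both ways, as a cross-check of the even/odd factorization. It is not the team's code. The scaling X_k = (c_k/2) Σ x_n cos((2n+1)kπ/16), with c_0 = 1/√2 and c_k = 1 otherwise, is an assumption chosen because it matches the 1/2 factor and A = cos(π/4) above. The names `dct8_direct`, `dct8_even_odd` and `max_abs_diff` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec8 = std::array<double, 8>;

// Direct DCT-II from the definition (assumed orthonormal scaling).
Vec8 dct8_direct(const Vec8& x) {
    const double pi = std::acos(-1.0);
    Vec8 X{};
    for (int k = 0; k < 8; ++k) {
        const double ck = (k == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
        double s = 0.0;
        for (int n = 0; n < 8; ++n)
            s += x[n] * std::cos((2 * n + 1) * k * pi / 16.0);
        X[k] = 0.5 * ck * s;
    }
    return X;
}

// Even/odd factorization: two 4x4 products on x_i + x_{7-i} and x_i - x_{7-i}.
Vec8 dct8_even_odd(const Vec8& x) {
    const double pi = std::acos(-1.0);
    const double A = std::cos(pi / 4), B = std::cos(pi / 8), C = std::sin(pi / 8);
    const double D = std::cos(pi / 16), E = std::cos(3 * pi / 16);
    const double F = std::sin(3 * pi / 16), G = std::sin(pi / 16);
    const double even[4][4] = {{A, A, A, A}, {B, C, -C, -B},
                               {A, -A, -A, A}, {C, -B, B, -C}};
    const double odd[4][4] = {{D, E, F, G}, {E, -G, -D, -F},
                              {F, -D, G, E}, {G, -F, E, -D}};
    double sum[4], diff[4];
    for (int i = 0; i < 4; ++i) {
        sum[i] = x[i] + x[7 - i];   // x0+x7, x1+x6, x2+x5, x3+x4
        diff[i] = x[i] - x[7 - i];  // x0-x7, x1-x6, x2-x5, x3-x4
    }
    Vec8 X{};
    for (int r = 0; r < 4; ++r) {
        double e = 0.0, o = 0.0;
        for (int i = 0; i < 4; ++i) {
            e += even[r][i] * sum[i];
            o += odd[r][i] * diff[i];
        }
        X[2 * r] = 0.5 * e;      // X0, X2, X4, X6
        X[2 * r + 1] = 0.5 * o;  // X1, X3, X5, X7
    }
    return X;
}

// Largest absolute difference between two 8-point results.
double max_abs_diff(const Vec8& a, const Vec8& b) {
    double m = 0.0;
    for (int k = 0; k < 8; ++k) m = std::fmax(m, std::fabs(a[k] - b[k]));
    return m;
}
```

Applied to the same 8-sample input, the two functions should agree to floating-point rounding. For an all-ones input, only X0 is nonzero, equal to 2√2 ≈ 2.828. The even/odd form needs 32 multiplications instead of 64. It also leaves two independent 4-input dot products, which keeps each distributed-arithmetic ROM address at 4 bits.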
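Each B-bit input $u_i$ is stored in two's complement. For the even outputs the inputs are the sums $x_i + x_{7-i}$; for the odd outputs they are the differences $x_i - x_{7-i}$. Each input can be written as

$$u_i = -b_i^{B-1} + \sum_{j=1}^{B-1} 2^{-j}\, b_i^{B-1-j}$$

where $b_i^{j}$ is the $j$th bit of $u_i$ and $b_i^{B-1}$ is the MSB, i.e. the sign bit. Let $C_{i,n}$ be the coefficient of $u_i$ in the row for $X_n$. Substituting into $X_n = \frac{1}{2}\sum_{i=0}^{3} C_{i,n}\, u_i$ gives

$$X_n = \frac{1}{2}\left[\sum_{j=1}^{B-1} 2^{-j}\, D_n\!\left(b^{B-1-j}\right) - D_n\!\left(b^{B-1}\right)\right], \qquad D_n(b^{j}) = \sum_{i=0}^{3} C_{i,n}\, b_i^{j}$$

Each $D_n(b^j)$ depends on only four bits, so its 16 possible values can be precomputed and stored in a ROM. For example, with B = 16, the even outputs $X_0, X_2, X_4, X_6$ use the coefficient rows $(A, A, A, A)$, $(B, C, -C, -B)$, $(A, -A, -A, A)$, $(C, -B, B, -C)$. These rows are applied one column at a time to the bit matrix

$$\begin{bmatrix} b_0^{15} & b_0^{14} & \cdots & b_0^{0} \\ b_1^{15} & b_1^{14} & \cdots & b_1^{0} \\ b_2^{15} & b_2^{14} & \cdots & b_2^{0} \\ b_3^{15} & b_3^{14} & \cdots & b_3^{0} \end{bmatrix}$$

For instance, the column for bit 14 gives

$$D_0(b^{14}) = A\,b_0^{14} + A\,b_1^{14} + A\,b_2^{14} + A\,b_3^{14}$$

:: Structure of 1D DCT
[Figure: 1D DCT datapath. Blocks: input registers R0 to R7, selector, bit address generator (one bit from each of four registers forms a 4-bit address such as 1011), ROM0 to ROM7, shift registers S0 and S1, parallel-to-serial output.]
The 2D DCT is built by applying this 1D DCT to the rows and then to the columns.

:: 2D DCT
Data in → 1D DCT (on rows) → Transpose RAM → 1D DCT (on columns) → Data out, plus control logic.
The two 1D DCT modules can operate as a pipeline to boost throughput. This requires a RAM that can be read and written at the same time. Each 1D DCT module alternately reads and writes the RAM in row order and in column order.

:: Transistor Count and Performance Estimation
1D DCT module:
- Adders: 4 × (15 × 34 + 12) = 2088
- Registers: 18 × 16 × 20 = 5760
- ROM: 8 × 16 × 2 = 256
- Control logic: 1000
- Total: ~9k
- Pins: 40
- Shift register: mux (44 × 20) + flip-flops (18 × 20) = 1240
- Muxes: 2000
- SRAM: 6000
- Throughput: 8 samples / 64 cycles
- Latency: 528 cycles
2D DCT = 2 × 1D DCT + SRAM ≈ 24k

:: Design Process
- Design proposal
- Architecture proposal
- Floorplan
- Gate-level design
- Component layout
- Component simulation
- Component layout
- Chip-level simulation
- Final design corrections

:: Da Breakdown
- The key to our success was breaking our components down into individual large blocks:
  - 1D DCT
  - SRAM
- We further broke down the 1D DCT, so the pieces were:
  - easily connected
  - easy to simulate, LVS, and DRC

:: Mid-Buffer
- Dimensions: 82.9 µm × 87.4 µm
- Metals: M1, M2, M3
- Directionality: left to right and down

:: Accumulator and Parallel-to-Serial (P to S)

:: Inbuffer

:: SRAM

:: SRAM Control

:: Control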
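The distributed-arithmetic identity above can be sketched in C++ for the even outputs. A 16-entry table per output row plays the role of the ROM. A loop consumes one bit position per step, MSB first; it doubles the accumulator at each step and subtracts the sign-bit term. This is an illustration under assumptions, not the chip's RTL:

- The inputs u_i are taken as 16-bit two's-complement integers, so each result equals 2^15 times the fractional form above.
- ROM entries are kept as doubles rather than fixed-point words.
- Address bit 3 comes from u_0, mirroring the Step 2 bit address generator.
- `build_rom`, `da_dot` and `dct8_even_da` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Rom = std::array<double, 16>;          // one 16-entry table per output row
using Coeffs = std::array<double, 4>;        // C_{0,n} .. C_{3,n}
using Inputs = std::array<std::int16_t, 4>;  // u_0 .. u_3, two's complement

// ROM entry at addr = sum of the coefficients whose input bit is 1.
// Address bit 3 <- u_0, bit 2 <- u_1, bit 1 <- u_2, bit 0 <- u_3.
Rom build_rom(const Coeffs& c) {
    Rom rom{};
    for (int addr = 0; addr < 16; ++addr)
        for (int i = 0; i < 4; ++i)
            if (addr & (1 << (3 - i))) rom[addr] += c[i];
    return rom;
}

// Bit-serial distributed arithmetic, sign bit (bit 15) first.
double da_dot(const Rom& rom, const Inputs& u) {
    double acc = 0.0;
    for (int bit = 15; bit >= 0; --bit) {
        unsigned addr = 0;
        for (int i = 0; i < 4; ++i) {
            const auto word = static_cast<std::uint16_t>(u[i]);  // raw bits
            addr |= ((word >> bit) & 1u) << (3 - i);
        }
        if (bit == 15)
            acc = -rom[addr];             // sign bit carries weight -2^15
        else
            acc = 2.0 * acc + rom[addr];  // shift left one place, add word
    }
    return acc;  // equals sum_i c[i] * u[i]
}

// Even outputs X0, X2, X4, X6 from u_i = x_i + x_{7-i}, with the 1/2 factor.
std::array<double, 4> dct8_even_da(const Inputs& u) {
    const double pi = std::acos(-1.0);
    const double A = std::cos(pi / 4), B = std::cos(pi / 8), C = std::sin(pi / 8);
    const std::array<Coeffs, 4> rows = {{{A, A, A, A}, {B, C, -C, -B},
                                         {A, -A, -A, A}, {C, -B, B, -C}}};
    std::array<double, 4> X{};
    for (int r = 0; r < 4; ++r)
        X[r] = 0.5 * da_dot(build_rom(rows[r]), u);
    return X;
}
```

For inputs such as u = (100, -50, 300, -7), each returned value should match the direct product 0.5 · Σ C_{i,n} u_i. The loop body is the software analogue of one accumulator step: shift by one place, then add the ROM word. The sign bit's word is subtracted rather than added.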
:: Floorplan
[Figure: old floorplan proposal, about 600 µm × 150 µm. Blocks: registers, shift register, ROMs, 16-bit adders and subtractor, 4-bit 16×1 muxes, 16-bit 2×1 and 4×1 muxes, 1×4 and 1×8 demuxes, control logic.]

:: Floorplan Proposal
[Figure: about 200 µm × 500 µm. Blocks: registers, adders, ROMs, shift registers, 4-bit 16×1 muxes, control.]

:: Layout Proposal: 1D DCT
[Figure: registers R0 to R7, adder, subtractor, 4×1 mux, 4×1 demuxes, ROM, shift register, 8 × 16-bit registers, control logic; bit-field taps labeled "Take bits 0-15" and "Take bits 16-32" on a 32-bit path.]
Approx. 220 µm × 100 µm = 22,000 µm²

:: 2D DCT Floorplan (new)
430 µm × 400 µm

:: Layout Size Proposal
- Reference: an inverter, 7 µm × 2.5 µm = 17.5 µm² total area, containing 2 transistors
- Our design has a total of approx. 24k transistors, plus space for wiring
- Total area estimate: around 400,000 µm² + 100,000 µm² = 500,000 µm²

:: Verification

:: High-Level Simulation (in C/C++)
Three implementations of the 1D DCT:
1. Based on the definition
2. Based on a fast algorithm
3. Based on the distributed algorithm
[Figure: the same input is fed to Function 1, Function 2, Function 3 and Matlab; the outputs are compared for pass/fail.]

:: Step 1
[Figure: registers R0 to R7 feeding a selector.]
- We begin by loading eight 16-bit values into individual registers.
- A selector picks the registers to be added and subtracted. R0 and R7 are added and subtracted in parallel, and so forth for R1 & R6, R2 & R5, and R3 & R4.
- It takes 8 clock cycles to load all the data.

:: Step 1 (Verilog)
    // Assumed declarations (widths are not given on the slide). `buf` is a
    // reserved Verilog keyword (gate primitive), so the array is data_buf.
    reg [15:0] data_buf [0:7];  // eight 16-bit samples
    reg [3:0]  count;           // write pointer

    // Write operation: store one sample per clock while in_valid is high
    // and the buffer is not full.
    always @(posedge clk or negedge rst) begin
      if (rst == 0)
        count <= 0;
      else if (in_clr == 1)
        count <= 0;
      else if (in_valid && ~out_full) begin
        data_buf[count] <= in_data;
        count <= count + 1;
      end
    end

    // Read operation: output the pair (x[k], x[7-k]) for the adder/subtractor.
    always @(posedge clk) begin
      if (in_read) begin
        out_data1 <= data_buf[in_addr];
        out_data2 <= data_buf[7 - in_addr];
      end
    end

:: Step 2
[Figure: one bit from each of four registers forms a 4-bit address (e.g. 1011) for ROM0 to ROM7 via the bit address generator.]
- Store the results of the additions and subtractions in eight 16-bit registers.
- The bit address generator starts with the first (most significant) bit. It takes the same bit position from each of the four registers (the four sums, or the four differences) and forms a 4-bit address. That address selects the matching precomputed partial sum in ROM.

:: Step 2 (Verilog)
    // Assumed declarations; `buf` is renamed data_buf as in Step 1.
    reg  [15:0] data_buf [0:7];  // eight 16-bit sum/difference results
    reg  [3:0]  count;
    wire [3:0]  out_addr;

    // Buffer the adder/subtractor results while in_read is high.
    always @(posedge clk or negedge rst) begin
      if (rst == 0)
        count <= 0;
      else if (in_clr == 1)
        count <= 0;
      else if (in_read && ~out_full) begin
        data_buf[count] <= in_data;
        count <= count + 1;
      end
    end

    // Bit address generator: bit in_bitpos of words 0..3 forms the 4-bit ROM
    // address (word 0 -> address MSB). A continuous assignment keeps this
    // purely combinational; a variable part-select [i:i] is not legal Verilog,
    // so a bit-select [i] is used.
    assign out_addr = {data_buf[0][in_bitpos], data_buf[1][in_bitpos],
                       data_buf[2][in_bitpos], data_buf[3][in_bitpos]};

:: Step 3
[Figure: ROM0 to ROM7, shift registers S0 and S1, registers R5 and R6, parallel-to-serial output.]
The partial sums read from the ROM are added into an accumulator register. Between bits, the running result is shifted left (multiplied by two). Because the inputs are in two's complement, the partial sum for the sign bit is subtracted rather than added.

:: Step 3 (Verilog)
    // Shift-and-accumulate, MSB first. Assumed: in_data is the signed 16-bit
    // ROM word for bit bit_pos; out_data is a signed 32-bit accumulator.
    reg signed [31:0] out_data;
    reg        [3:0]  bit_pos;

    always @(posedge clk or negedge rst) begin
      if (rst == 0) begin
        out_data <= 0;
        bit_pos  <= 15;
      end else if (in_clr == 1) begin
        out_data <= 0;
        bit_pos  <= 15;
      end else if (~out_done) begin
        if (bit_pos == 15)
          out_data <= -in_data;                    // sign bit: negative weight
        else
          out_data <= (out_data <<< 1) + in_data;  // multiply by two, then add
        bit_pos <= bit_pos - 1;
      end
    end

:: C Code Result

:: Verilog Verification: 189c, ef9c

:: Schematic Verification: 189c, ef9c

:: Layout
[Layout views: poly and active; 1D DCT (M1; M2, M3, M4); 2D DCT]

:: LVS

:: 2D DCT Dimensions
- Original: 458 µm × 450 µm
- New: 458 µm × 439 µm

:: Simulation Strategy
- Simulate the 1D DCT
- Simulate only the relevant SRAM cells
  - Simulating the whole chip is inefficient
  - Simulating the whole SRAM is unnecessary
  - This is the most thorough yet efficient method
- This plan is consistent with the recommendations made by the class faculty

:: 2D DCT Datasheet
Specifications
- Area: 443 µm × 437.3 µm ≈ 193,724 µm²
- Aspect ratio: 1 : 1.013
- Transistors: 34,660
- Density: 0.1789 transistors/µm² (5.589 µm² per transistor)
- Speed: 200 MHz
- Pins: 19 inputs, 18 outputs

Features
- Application:
  - lossy compression
  - any form of media streaming or media storage
- Chip:
  - pipelined, so that addition and subtraction occur at the same time
  - can process two images or audio streams at once (simultaneous SRAM read/write)

Description
The 2D DCT chip is a fast and relatively small compression chip. It is based on the type-II Discrete Cosine Transform (DCT-II). This transform is often used in signal and image processing, especially for lossy data compression. It suits that use because it has a strong energy-compaction property: most of the signal information tends to be concentrated in a few low-frequency components. Examples of its use include JPEG image compression, MJPEG video compression, and MPEG video compression. The transform coefficients are then quantized, so that small (difficult-to-see) components are discarded.
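The chip computes a 1D DCT on rows, passes the result through a transpose RAM, then computes a 1D DCT on columns. A small C++ model can illustrate both this row-column structure and the energy-compaction claim in the description. This is a floating-point sketch, not the chip's fixed-point datapath. The 8×8 array named `transpose` stands in for the transpose RAM. The 1D transform uses an assumed orthonormal DCT-II scaling (c_0 = 1/√2, overall factor 1/2). The names `dct8`, `dct2d` and `energy` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Row = std::array<double, 8>;
using Block = std::array<Row, 8>;

// 8-point DCT-II (assumed orthonormal scaling).
Row dct8(const Row& x) {
    const double pi = std::acos(-1.0);
    Row X{};
    for (int k = 0; k < 8; ++k) {
        const double ck = (k == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
        double s = 0.0;
        for (int n = 0; n < 8; ++n)
            s += x[n] * std::cos((2 * n + 1) * k * pi / 16.0);
        X[k] = 0.5 * ck * s;
    }
    return X;
}

// Row-column 2D DCT: first pass on rows, results written column-wise into a
// transpose buffer, second pass on the buffer's rows (the original columns).
Block dct2d(const Block& in) {
    Block transpose{};  // stands in for the transpose RAM
    for (int r = 0; r < 8; ++r) {
        const Row row = dct8(in[r]);
        for (int c = 0; c < 8; ++c) transpose[c][r] = row[c];
    }
    Block out{};  // out[vertical frequency][horizontal frequency]
    for (int c = 0; c < 8; ++c) {
        const Row col = dct8(transpose[c]);
        for (int r = 0; r < 8; ++r) out[r][c] = col[r];
    }
    return out;
}

// Sum of squares of all 64 entries.
double energy(const Block& b) {
    double e = 0.0;
    for (const Row& row : b)
        for (double v : row) e += v * v;
    return e;
}
```

A flat block of value 128 maps to a single DC coefficient of 1024. For a smooth ramp block (value r + c at row r, column c), the three lowest-frequency coefficients carry more than 99% of the block's energy. This is why most of the remaining coefficients can be quantized away at little visual cost.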
A variant of this transform, the modified discrete cosine transform (MDCT), is used in AAC, Vorbis and MP3 audio compression.

:: Conclusions
- Yaping: Integrated Circuit Rapper, IC Records
- Changshi: New Age Hippy Group
- Tommy: Basketball and Beyoncé
- Brandon: The Next Hugh Hefner
- Bong: Asian Boy Band, H.O.T.

:: Now That the Semester Is Over...
We only have one thing to say...