ECE 232 Hardware Organization and Design
Lecture 4: Performance, Design with VHDL
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
ECE 232 L4 perform.1 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers

Outline
° Performance, evaluation
  • Metrics: MIPS, CPI, execution time
  • Amdahl's law
° VHDL basics
  • Combinational logic
  • Examples
° Instruction formats, cont'd
  • Addressing classes, modes
  • Examples
  • MIPS assembly

Two notions of "performance"
Plane         NY to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747    6.5 hours     610 mph    470          286,700
Concorde      3 hours       1350 mph   132          178,200
(Throughput = passengers × speed, in passenger-miles per hour.)
Which has higher performance?
° Time to do the task (execution time): execution time, response time, latency
° Tasks per day, hour, week, second, ns, ... (performance): throughput, bandwidth
Response time and throughput are often in opposition.

Performance - Example
Performance is measured in units of things per time unit: bigger is better.
• Time of Concorde vs. Boeing 747?
  • Concorde is 1350 mph / 610 mph ≈ 2.2 times faster (≈ 6.5 hours / 3 hours)
• Throughput of Concorde vs. Boeing 747?
  • Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
  • Boeing is 286,700 pmph / 178,200 pmph = 1.6 "times faster"
• Boeing is 1.6 times ("60%") faster in terms of throughput.
• Concorde is 2.2 times ("120%") faster in terms of flying time.
We will focus primarily on execution time for a single job.

Metrics of performance
Levels of the system, from top to bottom: application; programming language; compiler; ISA; datapath and control; function units; transistors, wires, pins.
Metrics used across these levels, from top to bottom: answers per month; useful operations per second; millions of instructions per second (MIPS); millions of floating-point operations per second (MFLOP/s); megabytes per second; cycles per second (clock rate).
Each metric has a place and a purpose, and each can be misused.

Review: Aspects of CPU Performance
$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
Which factors affect each term:
               Instr. count   CPI   Clock rate
Program        X
Compiler       X              X
Instr. set     X              X
Organization                  X     X
Technology                          X

MIPS, CPI
• MIPS = millions of instructions executed per second:
  $\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$
• CPI = average number of clock cycles per instruction:
  $\text{CPI} = \frac{\text{Clock cycles}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}$
• With n instruction classes, where $\text{CPI}_i$ is the number of cycles per instruction of class i and $\text{Instr}_i$ is the number of class-i instructions executed:
  $\text{CPU time} = \text{Clock cycle time} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{Instr}_i$
  $\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$, where $F_i = \frac{\text{Instr}_i}{\text{Instruction count}}$ is the instruction frequency
° Invest resources where time is spent!

Evaluating Instruction Sets
Design-time metrics:
° Can it be implemented, in how long, and at what cost?
° Can it be programmed? How easy is it to compile for?
Static metrics:
° How many bytes does the program occupy in memory?
Dynamic metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" (short) a clock cycle is practical?
Best metric: time to execute the program!
NOTE: this depends on the instruction set, processor organization, and compilation techniques.
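A short worked example of the CPU-time, CPI, and MIPS relations from the slides above. The numbers are hypothetical, chosen only for illustration (they do not come from the lecture): a program that executes $10^9$ instructions with an average CPI of 2 on a 500 MHz clock.

```latex
\begin{align*}
\text{CPU time} &= \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
  = 10^9 \times 2 \times \frac{1}{500 \times 10^6\ \text{Hz}} = 4\ \text{s} \\
\text{MIPS} &= \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}
  = \frac{10^9}{4 \times 10^6} = 250 \\
\text{MIPS} &= \frac{\text{Clock rate}}{\text{CPI} \times 10^6}
  = \frac{500 \times 10^6}{2 \times 10^6} = 250
\end{align*}
```

The last line follows from substituting $\text{CPU time} = \text{Instruction count} \times \text{CPI} / \text{Clock rate}$ into the MIPS definition. Because the instruction count cancels, MIPS says nothing about how many instructions a program needs: two machines with different instruction sets can share a MIPS rating yet take different times on the same program, which is why execution time remains the best metric.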
Example (RISC processor)
Base machine (reg/reg), typical instruction mix:
Op       Freq   Cycles   CPI(i) = Freq × Cycles   % Time
ALU      50%    1        0.5                      23%
Load     20%    5        1.0                      45%
Store    10%    3        0.3                      14%
Branch   20%    2        0.4                      18%
Overall CPI = 2.2 (% Time = CPI(i) / overall CPI)
• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a cycle off the branch time?
• What if two ALU instructions could be executed at once?

Amdahl's Law
Speedup due to enhancement E:
$\text{Speedup}(E) = \frac{\text{Exec\_time w/o } E}{\text{Exec\_time with } E} = \frac{\text{Performance with } E}{\text{Performance w/o } E}$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
$\text{Exec\_time(with } E) = \left((1 - F) + \frac{F}{S}\right) \times \text{Exec\_time(w/o } E)$
$\text{Speedup(with } E) = \frac{1}{(1 - F) + F/S}$

Review: Summary of the Design Process
• Hierarchical design to manage complexity
• Top down vs. bottom up vs. successive refinement
• Importance of design representations:
  • Block diagrams
  • Decomposition into bit slices
  • Truth tables, K-maps
  • Circuit diagrams
  • Other descriptions: state diagrams, timing diagrams, register transfer, ...
• Optimization criteria: area, gate count, logic levels, delay, power, fan-in/fan-out, [package count], cost, design time, pin out

Hardware Representation Languages
• Block diagrams: FUs, registers, and dataflows
• Register transfer diagrams: choice of buses to connect FUs and registers
• Flowcharts and state diagrams: two different ways to describe sequencing and microoperations
• Hardware description languages (Verilog HDL, VHDL): HW modules described like programs, with I/O ports, internal state, and parallel execution of assignment statements
Descriptions in these languages can be used as input to:
• simulation systems ("software breadboard")
• synthesis systems, which generate HW from a high-level description
"To Design is to Represent"

VHDL (VHSIC Hardware Description Language)
° Goals:
  • Support design, documentation, and simulation of hardware
  • Digital system level to gate level
  • "Technology insertion"
° Concepts:
  • Design entity
  • Time-based execution model
Design entity = hardware component
  Interface (ENTITY) = external characteristics
  Architecture (body) = internal behavior or structure

VHDL Example: nand Gate
[Figure: two-input NAND gate with inputs a, b and output y]
    ENTITY nand2 IS
      PORT (a, b: IN BIT;
            y: OUT BIT);
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b;
    END behavioral;
° The entity describes the interface; it is named nand2 because nand is a reserved word (an operator) in VHDL.
° The architecture gives the behavior (function).
° y is a signal, not a variable:
  • it changes whenever the inputs change
  • the NAND assignment behaves like a process in an infinite loop
° BIT takes the values 0 and 1. You can also use STD_LOGIC, which adds values such as Z (high impedance) and X (unknown).

Modeling Delays
    ENTITY nand2 IS
      PORT (a, b: IN BIT;
            y: OUT BIT);
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b AFTER 1 ns;  -- output follows the inputs after 1 ns
    END behavioral;
° Model temporal as well as functional behavior, using delays in signal assignments. Time is one difference from programming languages.
° Output y changes 1 ns after a or b changes.
° Delays are not supported by synthesis tools (non-synthesizable); they affect simulation only.

Bit-vector Operators
[Figure: nand32 block with inputs a[31:0], b[31:0] and output y[31:0]]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY nand32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR(31 downto 0);
            y: OUT STD_LOGIC_VECTOR(31 downto 0));
    END nand32;

    ARCHITECTURE behavioral OF nand32 IS
    BEGIN
      y <= a NAND b;  -- bitwise NAND on all 32 bit positions
    END behavioral;
• A STD_LOGIC_VECTOR can be converted to a 32-bit integer.

Simple Operators
[Figure: 2-to-1 multiplexer mux2to1 with a on input 0, b on input 1, select sel, output y]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux2to1 IS
      PORT (a, b, sel: IN STD_LOGIC;
            y: OUT STD_LOGIC);
    END mux2to1;

    ARCHITECTURE logic OF mux2to1 IS
    BEGIN
      WITH sel SELECT
        y <= a WHEN '0',     -- choices are separated by commas
             b WHEN OTHERS;
    END logic;
You can also use other constructs: IF ... THEN, WHEN ... ELSE, etc.
° You must use OTHERS, since a STD_LOGIC sel can take values besides 0 and 1 (e.g., Z and X).

Arithmetic Operations
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY add32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR(31 downto 0);
            y: OUT STD_LOGIC_VECTOR(32 downto 0));  -- n+1 = 33 bits for addum's result
    END add32;

    ARCHITECTURE behavioral OF add32 IS
    BEGIN
      y <= addum(a, b);  -- addum must be declared in a user-supplied package
    END behavioral;
° "addum" adds two n-bit vectors to produce an n+1-bit vector, so y is 33 bits wide. addum is not part of the standard IEEE packages; it must come from a package made visible with a USE clause.
° Alternatively, you can declare a, b, y as INTEGERs and use y <= a + b.
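Since addum is not a standard IEEE function, a portable alternative is the standard ieee.numeric_std package, which defines + for the unsigned type. A minimal sketch, assuming a VHDL-93 or later tool; the entity name add32_ns and the cout port are hypothetical, chosen for illustration. Zero-extending both operands to 33 bits makes bit 32 of the sum the carry-out, mirroring the n+1-bit result described for addum:

```vhdl
-- Sketch: 32-bit adder with carry-out built on ieee.numeric_std
-- (add32_ns and cout are hypothetical names, not from the slides).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add32_ns is
  port (a, b : in  std_logic_vector(31 downto 0);
        y    : out std_logic_vector(31 downto 0);
        cout : out std_logic);
end add32_ns;

architecture behavioral of add32_ns is
  signal sum33 : unsigned(32 downto 0);  -- one extra bit for the carry
begin
  -- Zero-extend both operands to 33 bits so the carry lands in bit 32.
  sum33 <= resize(unsigned(a), 33) + resize(unsigned(b), 33);
  y     <= std_logic_vector(sum33(31 downto 0));  -- low 32 bits: the sum
  cout  <= sum33(32);                             -- bit 32: carry-out
end behavioral;
```

Keeping the ports as STD_LOGIC_VECTOR and converting to unsigned only inside the architecture leaves the interface unchanged while numeric_std supplies the arithmetic.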
Control Constructs
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux32 IS
      PORT (A, B: IN STD_LOGIC_VECTOR(31 downto 0);
            DOUT: OUT STD_LOGIC_VECTOR(31 downto 0);
            SEL: IN BIT);
    END mux32;

    ARCHITECTURE behavior OF mux32 IS
    BEGIN
      mux32_process: PROCESS (A, B, SEL)
      BEGIN
        IF (SEL = '0') THEN  -- BIT literals are written in quotes
          DOUT <= A;
        ELSE
          DOUT <= B;
        END IF;
      END PROCESS;
    END behavior;
° A process fires whenever a signal in its sensitivity list changes.
° It evaluates its body sequentially.
° VHDL provides case statements as well.
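VHDL also provides case statements. A sketch of the same 32-bit, 2-to-1 multiplexer written with case instead of if; the entity name mux32_case is hypothetical. Because SEL is of type BIT, the choices '0' and '1' cover every value, so no others branch is needed (unlike the STD_LOGIC select in the mux2to1 example):

```vhdl
-- Sketch: 32-bit 2-to-1 multiplexer using a case statement
-- (mux32_case is a hypothetical name, not from the slides).
library ieee;
use ieee.std_logic_1164.all;

entity mux32_case is
  port (A, B : in  std_logic_vector(31 downto 0);
        SEL  : in  bit;
        DOUT : out std_logic_vector(31 downto 0));
end mux32_case;

architecture behavior of mux32_case is
begin
  -- Re-evaluated whenever A, B, or SEL changes (sensitivity list).
  mux_process : process (A, B, SEL)
  begin
    case SEL is
      when '0' => DOUT <= A;  -- select input A
      when '1' => DOUT <= B;  -- select input B
    end case;                 -- BIT has only '0' and '1', so the choices are complete
  end process;
end behavior;
```

Both versions describe the same 2-to-1 multiplexer; a case statement becomes more convenient as the number of select values grows.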