MAMAS – Computer Structure 234267
Lecturers: Lihu Rappoport, Adi Yoaz
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh
Computer Structure 2012 – Introduction

General Course Information
• Grade
  - 20% Exercise (mandatory, binding)
  - 80% Final exam
  - No midterm exam
• Course web site
  - http://webcourse.cs.technion.ac.il/234267
  - Foils will be on the web several days before the class

Class Focus
• CPU
  - Introduction: performance, instruction set (RISC vs. CISC)
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
• Memory Hierarchy
  - Cache
  - Main memory
  - Virtual Memory
• Advanced Topics
• PC Architecture
  - Motherboard & chipset, DRAM, I/O, Disk, peripherals

Computer System – Sandy Bridge
[Figure: block diagram of a Sandy Bridge based system. Labels on the processor side: Core, Core, Cache, GFX, System Agent, Memory controller, Mem BUS with two DDRIII channels, Display link, PCI express ×16 to an External Graphics Card. Labels on the South Bridge (PCH) side: HDMI, PCI express ×1, USB controller, SATA controllers (Hard Disk, DVD Drive), PCI (Sound Card, speakers), LAN Adapter (LAN), IO Controller (Serial Port, Parallel Port, Floppy Drive, keyboard, mouse).]

Architecture & Microarchitecture
• Architecture: the processor features seen by the "user"
  - Instruction set, addressing modes, data width, …
• Micro-architecture: the way a processor is implemented
  - Cache size and structure, number of execution units, …
  - Timing is considered uArch (though it is user visible)
• Processors with different uArch can support the same Architecture

Compatibility
• Backward compatibility
  - New hardware can run existing software
    • Core2 Duo can run SW written for Pentium4, PentiumM, Pentium III, Pentium II, Pentium, 486, 386, 286
• Forward compatibility
  - New software can run on existing hardware
  - Example: new software written with SSE2™ runs on an older processor which does not support SSE2™
  - Commonly supports one or two generations behind
• Architecture independent SW
  - JIT – just in time compiler: Java and .NET
  - Binary translation

Moore's Law
• The number of transistors doubles every ~2 years

CPI – Cycles Per Instruction
• CPUs work according to a clock signal
  - Clock cycle is measured in nsec ($10^{-9}$ of a second)
  - Clock frequency (= 1/clock cycle) is measured in GHz ($10^{9}$ cyc/sec)
• Instruction Count (IC)
  - Total number of instructions executed in the program
• CPI – Cycles Per Instruction
  - Average #cycles per instruction (in a given program)
$$CPI = \frac{\#\text{cycles required to execute the program}}{IC}$$
  - IPC (= 1/CPI): instructions per cycle

Calculating the CPI of a Program
• $IC_i$: #times an instruction of type $i$ is executed in the program
• $IC$: #instructions executed in the program:
$$IC = \sum_{i=1}^{n} IC_i$$
• $F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
• $CPI_i$: #cycles to execute an instruction of type $i$
  - e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
• #cycles required to execute the entire program:
$$\#cyc = \sum_{i=1}^{n} CPI_i \times IC_i$$
• CPI:
$$CPI = \frac{\#cyc}{IC} = \frac{\sum_{i=1}^{n} CPI_i \times IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times F_i$$

CPU Time
• CPU Time: the time required to execute a program
$$\text{CPU Time} = IC \times CPI \times \text{clock cycle}$$
• Our goal: minimize CPU Time
  - Minimize clock cycle: more GHz (process, circuit, uArch)
  - Minimize CPI: uArch (e.g.: more execution units)
  - Minimize IC: architecture (e.g.: SSE™)

Amdahl's Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. With $F = Fraction_{enhanced}$ and $S = Speedup_{enhanced}$:
$$t'_{exe} = t_{exe} \times \left[ (1 - Fraction_{enhanced}) + \frac{Fraction_{enhanced}}{Speedup_{enhanced}} \right]$$
$$Speedup_{overall} = \frac{t_{exe}}{t'_{exe}} = \frac{1}{(1 - Fraction_{enhanced}) + \dfrac{Fraction_{enhanced}}{Speedup_{enhanced}}}$$

Amdahl's Law: Example
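The formulas on the preceding slides (CPI as a frequency-weighted average, CPU Time, and Amdahl's Law) can be checked numerically. The following is a minimal Python sketch and is not part of the original slides. The function names and the sample instruction mix (600 adds, 400 multiplies, a 0.5 ns clock cycle) are illustrative assumptions; only CPIadd = 1, CPImul = 3 and the 10% / 2× floating point case are taken from the slides.

```python
def average_cpi(mix):
    """CPI = sum_i CPI_i * F_i, with F_i = IC_i / IC.

    mix maps an instruction type to a pair (IC_i, CPI_i).
    """
    total_ic = sum(ic for ic, _ in mix.values())
    total_cycles = sum(ic * cpi for ic, cpi in mix.values())
    return total_cycles / total_ic


def cpu_time(ic, cpi, clock_cycle_sec):
    """CPU Time = IC * CPI * clock cycle."""
    return ic * cpi * clock_cycle_sec


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical instruction mix: 600 adds (1 cycle), 400 muls (3 cycles).
mix = {"add": (600, 1), "mul": (400, 3)}
cpi = average_cpi(mix)                   # (600*1 + 400*3) / 1000 = 1.8
time_sec = cpu_time(1000, cpi, 0.5e-9)   # 1000 * 1.8 * 0.5 ns = 900 ns

# The FP case from the slides: F = 0.1, S = 2.
speedup = amdahl_speedup(0.1, 2)         # 1 / 0.95

print(f"CPI = {cpi:.2f}")
print(f"CPU time = {time_sec * 1e9:.0f} ns")
print(f"Amdahl speedup = {speedup:.3f}")
```

Computing CPI from raw counts rather than from precomputed frequencies keeps the sketch close to the definition #cyc / IC; the two forms are equivalent, as the derivation on the slide shows.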
• Floating point instructions improved to run at 2×, but only 10% of executed instructions are FP
$$t'_{exe} = t_{exe} \times (0.9 + 0.1 / 2) = 0.95 \times t_{exe}$$
$$Speedup_{overall} = \frac{1}{0.95} = 1.053$$
• Corollary: Make The Common Case Fast

Comparing Performance
• Peak Performance
  - MIPS, MFLOPS
  - Often not useful: unachievable / unsustainable in practice
• Benchmarks
  - Real applications, or representative parts of real apps
  - Targeted at the specific system usages
  - SPEC INT – integer applications
    • Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
  - SPEC FP – floating point applications
    • Mostly important scientific applications
  - TPC Benchmarks
    • Measure transaction-processing throughput

Evaluating Performance of Future CPUs
• Use a performance simulator to evaluate the performance of a new feature / algorithm
  - Models the uArch in great detail
  - Run 100's of representative applications
• Produce the performance S-curve
  - Sort the applications according to the IPC increase
  - Baseline (0) is the processor without the new feature
[Figure: two S-curves of per-application IPC change relative to the baseline. Bad S-curve: positive outliers and negative outliers. Good S-curve: positive outliers and only small negative outliers.]

Instruction Set Design
[Figure: the instruction set drawn as the layer between software and hardware.]
• The ISA is what the user / compiler see
• The HW implements the ISA

ISA Considerations
• Reduce the IC to reduce execution time
  - E.g., a single vector instruction performs the work of multiple scalar instructions
• Simple instructions ⇒ simpler HW implementation
  - Higher frequency, lower power, lower cost
• Code size
  - Long instructions take more time to fetch
  - Longer instructions require a larger memory
    • Important in small devices, e.g., cell phones

Architectural Consideration Example
[Figure: Immediate data size. Distribution of the number of immediate data bits (0 to 15) for the Int. Avg. and FP Avg. series; vertical axis from 0% to 30%.]
• 1% of data values > 16 bits
• 12 to 16 bits are needed

CISC Processors
• CISC – Complex Instruction Set Computer
  - The idea: a high level machine language
  - Example: x86
• Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    • Execute complex tasks
    • Require many cycles
  - ALU operations directly on memory
    • Only a few registers, in many cases not orthogonal
  - Variable length instructions
    • Common instructions get short codes ⇒ save code length

Top 10 x86 Instructions
Rank  Instruction              % of total executed
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                      8%
6     and                      6%
7     sub                      5%
8     move register-register   4%
9     call                     1%
10    return                   1%
      Total                    96%
Simple instructions dominate instruction frequency

CISC Drawbacks
• Complex instructions and complex addressing modes
  ⇒ complicate the processor
  ⇒ slow down the simple, common instructions
  ⇒ contradict Make The Common Case Fast
• Not compiler friendly
  - Non orthogonal registers
  - Unused complex addressing modes
• Variable length instructions are a pain
  - Difficult to decode several instructions in parallel
    • As long as an instruction is not decoded, its length is unknown
      ⇒ It is unknown where the instruction ends and where the next instruction starts
  - An instruction may cross a cache line or a page

RISC Processors
• RISC – Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
• Characteristics
  - A small instruction set, with few instruction formats
  - Simple instructions that execute simple tasks
    • Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
    • Memory is accessed using Load and Store instructions only
  - Many orthogonal registers
  - Three address machine: Add dst, src1, src2
  - Fixed length instructions

RISC Processors (Cont.)
• Simple architecture ⇒ simple micro-architecture
  - Simple, small and fast control logic
  - Simpler to design and validate
  - Leave space for large on die caches
  - Shorten time-to-market
• Using a smart compiler
  - Better pipeline usage
  - Better register allocation
• Existing RISC processors are not "pure" RISC
  - e.g., they support division, which takes many cycles
• Examples: MIPS™, Sparc™, Alpha™, Power™

Compilers and ISA
• Ease of compilation
  - Orthogonality:
    • no special registers
    • few special cases
    • all operand modes available with any data type or instruction type
  - Regularity:
    • no overloading for the meanings of instruction fields
  - Streamlined
    • resource needs easily determined
• Register assignment is critical too
  - Easier if lots of registers

CISC Is Dominant
• The x86 architecture, which is a CISC architecture, dominates the processor market
  - A vast amount of existing software
  - Intel, AMD, Microsoft and others benefit from this
    • Intel and AMD invest a lot of money to make high performance x86 processors, despite the architectural disadvantage
    • Current x86 processors give the best cost/performance
• CISC processors use arch ideas from the RISC world
  - Starting at Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
    • the inside core looks much like that of a RISC processor

Software Specific Extensions
• Extend arch to accelerate exec of specific apps
• Example: SSE™ – Streaming SIMD Extensions
  - 128-bit packed (vector) / scalar single precision FP (4×32)
  - Introduced on Pentium® III in 1999
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
• Packed (128 bits): [x3 x2 x1 x0] + [y3 y2 y1 y0] = [x3+y3 x2+y2 x1+y1 x0+y0]
• Scalar (128 bits): [x3 x2 x1 x0] + [y3 y2 y1 y0] = [y3 y2 y1 x0+y0]
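
The packed and scalar forms above differ only in how many of the four 32-bit lanes are added. The following Python sketch models that lane behavior with plain lists. It illustrates the semantics only and is not real SSE code: the function names are hypothetical, and Python floats are 64-bit rather than single precision. Lanes are stored from index 0 (x0) to index 3 (x3), and the scalar form passes the upper three lanes of y through unchanged, as drawn on the slide.

```python
def packed_add(x, y):
    """Model of a packed add: all four lanes are added independently."""
    return [xi + yi for xi, yi in zip(x, y)]


def scalar_add(x, y):
    """Model of a scalar add: only lane 0 is added; lanes 1..3 come from y."""
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]        # x0, x1, x2, x3
y = [10.0, 20.0, 30.0, 40.0]    # y0, y1, y2, y3

print(packed_add(x, y))   # [11.0, 22.0, 33.0, 44.0]
print(scalar_add(x, y))   # [11.0, 20.0, 30.0, 40.0]
```

On real hardware the packed form performs the four additions with a single instruction, which is how a vector extension reduces the instruction count (IC).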
