Computer Architecture 101
SDBS

What does a computer look like?
[Figure: four candidate block diagrams, labeled A to D, built from CPU, RAM, secondary storage, network, network controller and driver components]

What does a CPU do?

What is a hardware interrupt?
A.  A signal from an external device to the CPU
B.  A signal from the CPU to an external device
C.  Signals exchanged between CPUs and external devices
D.  A program call between CPUs and external devices

What does an instruction look like?
•  Data handling and memory
   –  Set (register to constant), move (between register and RAM), read/write (to/from device)
•  Arithmetic and logic
   –  +, -, *, /
   –  Bitwise operations (and, or, not, xor)
   –  Compare (register values)
•  Control flow
   –  Branch, i.e., manipulate the instruction reference (conditional, indirect)

What does a CPU look like?
[Figure: CPU block diagram with Instruction Fetcher, Instruction Decoder, Registers, ALU and Memory Interface]

What is a 64-bit CPU?
A.  CPU registers are 64 bits
B.  The ALU operates on 64-bit operands
C.  A memory address is 64 bits long
D.  All of the above

What is Moore's Law?
A.  The number of components on an integrated circuit will double every two years
B.  The speed of CPUs will increase every two years
C.  CPU performance will double every 18 months
D.  CPU performance will increase quadratically

Moore's Law
http://download.intel.com/museum/Moores_Law/Articles-Press_releases/Gordon_Moore_1965_Article.pdf

The end of Moore's law

Performance Trends
Diagram courtesy of A. Ailamaki (EPFL)

CPU Parallelism
A.  Single instruction, single data (SISD)
B.  Single instruction, multiple data (SIMD)
C.  Multiple instruction, single data (MISD)
D.  Multiple instruction, multiple data (MIMD)

Cache Hierarchy
http://lwn.net/Articles/252125/
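
The instruction categories listed under "What does an instruction look like?" (data handling, arithmetic and logic, control flow) and the CPU components (fetcher, decoder, registers, ALU, memory interface) can be illustrated with a toy interpreter. This is only a sketch: the opcode names (`set`, `load`, `store`, `add`, `cmp`, `bne`), the tuple encoding and the register names are hypothetical choices made for illustration and do not correspond to any real instruction set.

```python
from typing import Dict, List, Tuple

Instr = Tuple  # hypothetical encoding: (opcode, operand, ...)


def run(program: List[Instr], ram: Dict[int, int]) -> Dict[str, int]:
    """Fetch, decode and execute instructions until the program ends."""
    regs = {"r0": 0, "r1": 0, "r2": 0, "r3": 0}
    equal = False  # result of the last compare
    pc = 0         # instruction reference (program counter)
    while pc < len(program):
        op, *args = program[pc]   # fetch and decode
        pc += 1
        if op == "set":           # data handling: register <- constant
            regs[args[0]] = args[1]
        elif op == "load":        # data handling: register <- RAM
            regs[args[0]] = ram[args[1]]
        elif op == "store":       # data handling: RAM <- register
            ram[args[1]] = regs[args[0]]
        elif op == "add":         # arithmetic: dst <- dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "cmp":         # compare two register values
            equal = regs[args[0]] == regs[args[1]]
        elif op == "bne":         # control flow: branch if last compare was unequal
            if not equal:
                pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Sum the integers 1..n, where n (assumed >= 1) is read from RAM address 0
# and the result is written to RAM address 1.
SUM_TO_N: List[Instr] = [
    ("load", "r0", 0),   # 0: r0 = n
    ("set", "r1", 0),    # 1: r1 = counter
    ("set", "r2", 0),    # 2: r2 = running sum
    ("set", "r3", 1),    # 3: r3 = constant 1
    ("add", "r1", "r3"), # 4: counter += 1
    ("add", "r2", "r1"), # 5: sum += counter
    ("cmp", "r1", "r0"), # 6: counter == n ?
    ("bne", 4),          # 7: if not, loop back to instruction 4
    ("store", "r2", 1),  # 8: RAM[1] = sum
]
```

Running `run(SUM_TO_N, {0: 5, 1: 0})` leaves 15 at RAM address 1. The branch simply overwrites the instruction reference, which is all that "control flow" means at this level.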
Intel Core 2
http://www.hotchips.org/wp-content/uploads/hc_archives/hc18/3_Tues/HC18.S9/HC18.S9T4.pdf
Figure courtesy of Appaloosa

Motherboard

What is an IO (in terms of hardware architecture)?
A.  An access to memory
B.  An access to secondary storage
C.  An access to a device connected on the I/O bus

What is the bandwidth of a modern hard disk (random IOs per second)?
A.  10 IOPS
B.  100 IOPS
C.  1000 IOPS
D.  10000 IOPS
E.  100000 IOPS

How much faster are sequential IOs compared to random IOs on disk?
A.  the same
B.  2x faster
C.  10x faster
D.  100x faster

[Figure: hard disk anatomy showing tracks, platter, spindle, read/write head, actuator, disk arm, controller and disk interface]

Some Trends

                      2000              Change    2010
HDD Capacity          200 GB            x10       2 TB
HDD GB/$              0.05              x600      30
HDD IOPS              200               x1        200
Flash SSD Capacity    14 GB (2001)      x20       256 GB
SSD GB/$              3x10E-4           x1000     0.5
SSD IOPS              10E3 (SCSI)       x1000     10E6+ (PCIe), 5x10E3+ (SATA)

PCM Capacity: 2x10E5 cells, 4 bits/cell
PCM IOPS: 10E6+ (1 chip)

The Good
The hardware!
•  A single flash chip offers great performance
   –  e.g., 40 MB/s Read, 10 MB/s Program
   –  Random access is as fast as sequential access
   –  Low energy consumption
•  A flash device contains many (e.g., 32, 64) flash chips and provides inter-chip parallelism
•  Flash devices may include some (power-failure resistant) SRAM

The Bad
The severe constraints of flash chips!
•  C1: Program granularity:
   –  Program must be performed at flash page granularity
•  C2: Must erase a block before updating a page
•  C3: Pages must be programmed sequentially within a block
•  C4: Limited lifetime (from 10^4 up to 10^6 erase operations)

[Figure: a flash block. Pages must be programmed sequentially within the block (256 pages). Program granularity: a page (32 KB). Erase granularity: a block (1 MB).]

And The FTL
The software! The Flash Translation Layer emulates a classical block device and handles flash constraints.

[Figure: the SSD exposes Read sector and Write sector with no constraint. Inside, the FTL performs mapping, garbage collection and wear leveling. The flash chips offer Read page, Program page and Erase block, subject to (C1) program granularity, (C2) erase before program, (C3) sequential program within a block and (C4) limited lifetime.]

Flash-Based SSD Architecture
[Figure: Read, Write and Trim commands arrive in the logical address space. Scheduling & mapping, garbage collection and wear leveling operate on shared internal data structures. Read, Program and Erase commands are issued in the physical address space to the flash memory array, drawn as a grid of chips.]
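
How an FTL hides constraints C2 to C4 behind a plain read/write interface can be sketched in a few lines. The following is a minimal illustration, not the design of any actual SSD: the class names `Block` and `TinyFTL`, the 4-page block size, the page-level mapping table and the greedy victim selection are all assumptions made for the example. Updates are written out of place, pages within a block are programmed in order, and a block is erased only during garbage collection.

```python
from typing import Dict, List, Optional, Tuple

PAGES_PER_BLOCK = 4  # hypothetical; real blocks hold far more pages


class Block:
    def __init__(self) -> None:
        self.pages: List[Optional[bytes]] = [None] * PAGES_PER_BLOCK
        self.next_page = 0    # C3: pages are programmed in order
        self.erase_count = 0  # C4: every erase consumes lifetime

    def program(self, data: bytes) -> int:
        if self.next_page >= PAGES_PER_BLOCK:
            raise RuntimeError("block full: erase before programming (C2)")
        self.pages[self.next_page] = data
        self.next_page += 1
        return self.next_page - 1

    def erase(self) -> None:
        self.pages = [None] * PAGES_PER_BLOCK
        self.next_page = 0
        self.erase_count += 1


class TinyFTL:
    """Page-level mapping, out-of-place updates, greedy garbage collection."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [Block() for _ in range(num_blocks)]
        self.mapping: Dict[int, Tuple[int, int]] = {}  # logical page -> (block, page)
        self.active = 0  # block currently receiving writes

    def read(self, lpn: int) -> Optional[bytes]:
        block, page = self.mapping[lpn]
        return self.blocks[block].pages[page]

    def write(self, lpn: int, data: bytes) -> None:
        if self.blocks[self.active].next_page == PAGES_PER_BLOCK:
            self.active = self._find_or_reclaim_block()
        page = self.blocks[self.active].program(data)
        self.mapping[lpn] = (self.active, page)  # the old copy becomes stale

    def _valid(self, block: int) -> List[int]:
        return [lpn for lpn, (b, _) in self.mapping.items() if b == block]

    def _find_or_reclaim_block(self) -> int:
        for i, blk in enumerate(self.blocks):
            if blk.next_page == 0:  # still erased: usable as-is
                return i
        # Garbage collection: pick the block with the fewest valid pages,
        # breaking ties by lowest erase count (a crude form of wear leveling).
        victim = min(range(len(self.blocks)),
                     key=lambda i: (len(self._valid(i)), self.blocks[i].erase_count))
        live = [(lpn, self.read(lpn)) for lpn in self._valid(victim)]
        if len(live) == PAGES_PER_BLOCK:
            raise RuntimeError("no stale pages left to reclaim")
        self.blocks[victim].erase()
        for lpn, data in live:  # rewrite surviving pages, in order
            self.mapping[lpn] = (victim, self.blocks[victim].program(data))
        return victim
```

For instance, repeatedly overwriting three logical pages on a two-block device (`ftl = TinyFTL(2)`, then `ftl.write(i % 3, bytes([i]))` in a loop) never fails from the caller's point of view, while the erase counters of the blocks grow in the background. Buffering the surviving pages in memory during garbage collection is a simplification; a real device would typically copy them to a spare block.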
Methodology: Device state
[Figure: Random Writes – Samsung SSD, out of the box]
[Figure: Random Writes – Samsung SSD, after filling the device]
→ Enforce a well-defined device state
   –  by performing random write IOs of random size on the whole device
   –  The alternative, sequential IOs, is less stable and thus more difficult to enforce

Methodology: Startup and running phases
•  When do we reach a steady state? How long should each test run?
[Figure: startup and running phases for the Mtron SSD (RW)]
[Figure: running phase for the Kingston DTI flash drive (SW)]
→ Startup and running phase: run experiments to define
   –  IOIgnore: number of IOs ignored when computing statistics
   –  IOCount: number of measures needed for those statistics to converge

Methodology: Interferences
[Figure: phases labeled Sequential Reads, Random Writes, Sequential Reads and Pause; x-axis from 0 to 1500, y-axis ticks at 0.1, 1 and 10]
→ Interferences: introduce a pause between experiments

Results: Samsung, Memoright, Mtron
Granularity for the Memoright SSD
•  For SR, SW and RR (sequential reads, sequential writes, random reads),
   –  linear behavior, almost no latency
   –  good throughput with large IO sizes
•  For RW (random writes), ≈5 ms for a 16 KB to 128 KB IO
Locality for the Samsung, Memoright and Mtron SSDs
•  When limited to a focused area, RW performs very well

Results: Intel X25-E
[Figure: response time (μs) versus IO size (KB)]
SR, SW and RW have similar performance. RR are more costly!
[Figure: response time (μs)]
RW (16 KB) performance varies from 100 μs to 100 ms!! (x 1000)

Results: Fusion IO
•  Capacity vs performance tradeoff (80 GB → 22 GB!)
•  Sensitivity to device state
[Figure: response time (μs, axis from 0 to 250) for SR, RR, SW and RW at IO size = 4 KB, comparing the MaxCap and MaxWrite configurations in the low-level formatted and fully written states]

Phase-Change Memory (PCM)
http://cseweb.ucsd.edu/users/swanson/papers/HotStorage2011-Onyx.pdf
•  Byte addressable
•  In-place update (no erase)
•  10^6 write cycles per cell
•  2012 PCM chip characteristics:
   –  128 MB
   –  50 MB/sec (random read, 16 B/IO)
   –  0.5 MB/sec (random write, 64 B/IO)
http://www.micron.com/products/phase-change-memory

Modern Computer Architecture
http://hpts.ws/session2/mohan.pdf
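
The PCM throughput figures quoted above can be converted into IOs per second, which makes them comparable with the IOPS numbers used for disks and SSDs elsewhere in the deck. This is a small worked calculation rather than a measurement; the function name `iops` is hypothetical, and it assumes 1 MB = 10^6 bytes, which the slides do not specify.

```python
def iops(throughput_mb_per_s: float, io_size_bytes: int) -> float:
    """Convert throughput into IOs per second (assumes 1 MB = 10**6 bytes)."""
    return throughput_mb_per_s * 1_000_000 / io_size_bytes


pcm_read_iops = iops(50, 16)    # random reads of 16 B at 50 MB/sec
pcm_write_iops = iops(0.5, 64)  # random writes of 64 B at 0.5 MB/sec
```

Under that assumption, the chip sustains roughly 3.1 million random reads per second but only about 7,800 random writes per second, a gap of about 400x between reads and writes.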