CS 7810 Lecture 25
DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design
T. Austin
Proceedings of MICRO-32, November 1999

Redundancy
• If a processor’s output is error-prone, reliability can be provided with redundancy
• One checker can detect errors. For recovery, we may need another checker or some other form of redundancy
[Diagram: Input Program, Primary Core, Checker Core(s), Verify & Commit]

Why Redundancy?
• Soft Errors: A high energy particle can strike a device and deposit enough charge to flip the stored value
  • Sources: cosmic rays, alpha particles
• Soft Errors: voltage spikes or noise
  • Sources: crosstalk, di/dt, lower voltages
• Allows unverified or aggressively clocked primary cores
  • Functionally incorrect core: some corner case slips through
  • Electrically incorrect core: high temperature causes a circuit to not meet the timing constraint

DIVA Microarchitecture
[Diagram: BPred, I-$, Dec/Ren, IQ, Rename Regs, ALU, D-$, Arch Regs]
Example: LR3 + LR7 → LR15, with values 4 + 8 = 12
• Storage Check: read LR3 and LR7 from Arch Regs and confirm they equal 4 and 8
• ALU Check: add 4+8 and confirm it equals 12
• If both checks succeed, write 12 into LR15

Microarchitecture Details
• Instructions are fed to the checker in order during commit
• The logic and storage checks detect errors in ALUs and datapath
• The checker core is a simple in-order pipeline: easy to design and verify
• An error in an earlier stage (LR3 instead of LR2) can be detected by also adding a rename/decode stage to the checker
• The in-order core has no stalls (needs bypass for the register file): no data dependences, cache misses, or branch mispredicts
• Contention for the register file and data cache can degrade the primary thread

Recovery
• The architected register file and data cache are ECC protected: when an error is detected, it is assumed that checker and architected state are correct
• Primary core is re-started from the faulting instruction
• A fault in the primary core may result in deadlock: e.g. the instruction that produces R5 is waiting for R5 to be produced (instead of R4)
• A timeout in the checker signals an error

Redundant Multi-Threading
• Execute two threads in parallel (CMP or SMT); each thread maintains its own register state
• Threads execute as in a conventional processor, except
  • trailing thread commits after verifying the result
  • leading thread commits stores to a buffer; these get written to cache/memory only after verification
  • load values of the leading thread are sent to the trailing thread, so the trailing thread never accesses the data cache
  • branch outcomes are also sent to the trailing thread
[Diagram: Leading Thread and Trailing Thread; labels: Reg results, load values, branch outcomes; Store values]

Fault Model
• A single error in either core can be detected
• Since loads are not replicated, the load/store datapath must be ECC protected
• For recovery, a second checker thread is required
• ECC in the checker register file will enable recovery in most cases without a second checker

RMT on SMT/CMP
+ SMT does not require inter-core traffic: values can be read from the shared register file/data cache
- Single thread performance may be degraded
- Each redundant instr executes on a high-power pipeline
+ Trailing CMP core can be a simple in-order processor with low power/area overheads
+ Trailing core’s frequency can be independently controlled
+ Heterogeneous CMP where cores can be dynamically employed for throughput/reliability
+ Lower probability for errors

Parallelization of Trailing Thread
[Diagram: Sequential Thread; Parallel Threads 1 to 4]
• Is it more power-efficient to execute the verification thread in parallel?
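The commit-time checks from the DIVA Microarchitecture slide (storage check, then ALU check, then write-back for LR3 + LR7 → LR15) can be sketched roughly as below. This is only an illustrative model, not the checker pipeline from the paper: the `CommitEntry` layout, the `OPS` table, and the function name `check_and_commit` are all hypothetical, and only two integer operations are modelled.

```python
from dataclasses import dataclass


@dataclass
class CommitEntry:
    """What the primary core hands to the checker at commit (hypothetical layout)."""
    op: str      # only "add" and "sub" are modelled in this sketch
    src1: int    # logical source register numbers
    src2: int
    dest: int    # logical destination register number
    val1: int    # operand values the primary core used
    val2: int
    result: int  # result the primary core computed


# Assumed operation table; a real checker would cover the full ISA.
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def check_and_commit(entry, arch_regs):
    """Return True and update arch_regs if both checks pass, else return False."""
    # Storage check: operands used by the primary core must match architected state.
    storage_ok = (arch_regs[entry.src1] == entry.val1
                  and arch_regs[entry.src2] == entry.val2)
    # ALU check: recompute the result from the operand values that were supplied.
    alu_ok = OPS[entry.op](entry.val1, entry.val2) == entry.result
    if storage_ok and alu_ok:
        arch_regs[entry.dest] = entry.result  # commit to architected state
        return True
    return False  # error detected; recovery would restart the primary core


# The slide's example: LR3 = 4, LR7 = 8, LR15 receives 12.
arch_regs = {3: 4, 7: 8, 15: 0}
good = CommitEntry("add", 3, 7, 15, 4, 8, 12)
bad = CommitEntry("add", 3, 7, 15, 4, 8, 13)  # primary core produced a wrong sum
print(check_and_commit(good, arch_regs), arch_regs[15])  # True 12
print(check_and_commit(bad, arch_regs))                  # False
```

Keeping the two checks independent mirrors the slide: neither check needs the other's output, which is why the in-order checker has no data dependences to stall on.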
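The leading/trailing arrangement from the Redundant Multi-Threading slide can be sketched as a toy model. The three-instruction mini-ISA, the function names, and the `fault_at` injection hook are assumptions made for illustration. The sketch verifies only store values and leaves out branch outcomes and register-result comparison, so it is a simplification of the scheme on the slide.

```python
from collections import deque


def run_leading(program, regs, memory, fault_at=None):
    """Leading thread: reads memory for loads and buffers stores instead of writing them."""
    load_queue, store_buffer = deque(), deque()
    for i, ins in enumerate(program):
        if ins[0] == "load":
            _, rd, addr = ins
            regs[rd] = memory[addr]
            load_queue.append(regs[rd])  # forward the load value to the trailing thread
        elif ins[0] == "add":
            _, rd, rs1, rs2 = ins
            regs[rd] = regs[rs1] + regs[rs2]
            if i == fault_at:
                regs[rd] ^= 1  # inject a single-bit error (hypothetical test hook)
        elif ins[0] == "store":
            _, rs, addr = ins
            store_buffer.append((addr, regs[rs]))  # held until verified
    return load_queue, store_buffer


def run_trailing(program, regs, load_queue, store_buffer, memory):
    """Trailing thread: re-executes with forwarded load values and verifies each store."""
    for ins in program:
        if ins[0] == "load":
            _, rd, _addr = ins
            regs[rd] = load_queue.popleft()  # never accesses the data cache
        elif ins[0] == "add":
            _, rd, rs1, rs2 = ins
            regs[rd] = regs[rs1] + regs[rs2]
        elif ins[0] == "store":
            _, rs, addr = ins
            if store_buffer.popleft() != (addr, regs[rs]):
                return False  # mismatch: error detected, store is not released
            memory[addr] = regs[rs]  # verified store drains to memory
    return True


def run_rmt(program, memory, fault_at=None):
    """Run both threads with separate register state; True means no error was detected."""
    load_queue, store_buffer = run_leading(program, {}, memory, fault_at)
    return run_trailing(program, {}, load_queue, store_buffer, memory)


program = [("load", "r1", 0), ("load", "r2", 1),
           ("add", "r3", "r1", "r2"), ("store", "r3", 2)]
```

For example, `run_rmt(program, {0: 4, 1: 8, 2: 0})` lets the store through, while passing `fault_at=2` corrupts the leading thread's add and the buffered store is rejected before it reaches memory. Because the trailing thread reuses the forwarded load values, an error in the load path itself would go unnoticed, which is why the Fault Model slide requires ECC on the load/store datapath.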
Parallelization of Trailing Thread
• If the trailing cores are frequency-scaled, dynamic power does not change, but leakage power increases
• If the trailing cores are frequency-and-voltage scaled, dynamic power decreases, and leakage power increases

Error Types
Acronyms!!
• MTTF & MTBF: Mean time to/between failures
• Errors are either SDC (silent data corruption) or DUE (detected unrecoverable errors)
Many errors get masked:
• ACE bits: these bits are required for architecturally correct execution
• un-ACE bits: these bits do not affect the final output
• AVF: architectural vulnerability factor (the percentage of time/space that a structure holds ACE state)

Partial Coverage
• RMT covers faults in the entire core (almost!)
• If that is too expensive, provide error coverage in specific structures to reduce error probabilities
• Are there ways to ensure that an instruction spends less time in architecturally vulnerable structures?
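The frequency-scaling claims on the Parallelization of Trailing Thread slide can be checked against the usual first-order dynamic power model. The reading assumed here is that the verification work is split across N trailing cores, each running at f/N. The symbols α (activity factor), C (switched capacitance), V, V', f, and N are introduced for this sketch and do not appear on the slides.

```latex
\begin{align*}
P_{\text{dyn}}^{\text{seq}} &= \alpha C V^{2} f
  && \text{one trailing core at frequency } f \\
P_{\text{dyn}}^{\text{par}} &= N \cdot \alpha C V^{2} \frac{f}{N} = \alpha C V^{2} f
  && \text{frequency scaling only: unchanged} \\
P_{\text{dyn}}^{\text{par,DVFS}} &= N \cdot \alpha C V'^{2} \frac{f}{N} = \alpha C V'^{2} f
  < \alpha C V^{2} f
  && \text{with } V' < V \text{: decreases} \\
P_{\text{leak}}^{\text{par}} &\approx N \cdot P_{\text{leak}}^{\text{core}}
  && \text{more powered cores leak more in total}
\end{align*}
```

Under these assumptions, parallel verification only saves dynamic power when the lower frequency is also used to lower the voltage, and that saving has to outweigh the extra leakage of the additional cores.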
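The AVF definition on the Error Types slide (the fraction of time/space in which a structure holds ACE state) can be turned into a small calculation. The sketch below assumes a per-cycle count of ACE bits is available, for instance from a simulator trace. The function name `avf` and the sample numbers are hypothetical.

```python
def avf(ace_bits_per_cycle, total_bits):
    """Fraction of bit-cycles in which the structure holds ACE state."""
    cycles = len(ace_bits_per_cycle)
    if cycles == 0 or total_bits <= 0:
        raise ValueError("need at least one cycle and a positive bit count")
    if any(b < 0 or b > total_bits for b in ace_bits_per_cycle):
        raise ValueError("ACE bit count must lie between 0 and total_bits")
    # ACE bit-cycles divided by all bit-cycles observed for the structure.
    return sum(ace_bits_per_cycle) / (total_bits * cycles)


# Hypothetical 64-bit structure observed for 4 cycles.
occupancy = [16, 32, 32, 48]
print(avf(occupancy, 64))  # 0.5
```

A structure whose entries sit idle or hold un-ACE bits most of the time gets a low AVF, which ties in with the Partial Coverage question about reducing the time an instruction spends in vulnerable structures.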