EPL 605: Advanced Computer Architecture
Piranha: A Scalable Architecture Based on
Single-Chip Multiprocessing
Luiz André Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese
In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000
•Problem: complex, aggressive processors are ill-suited for commercial applications
•Solution: chip multiprocessing (CMP)
•Piranha system:
-research prototype developed at Compaq
-exploits CMP: integrates 8 simple Alpha processor cores with a two-level cache hierarchy on a single chip
•Piranha's unique design choices:
-shared second-level cache with no inclusion
-Highly optimized cache coherence protocol
-Novel I/O architecture
•Commercial workloads behave differently from technical (scientific) workloads:
-large memory stall times
-data-dependent computation and a lack of instruction-level parallelism (ILP)
-little use of high-performance floating-point and multimedia functionality
•Techniques for exploiting thread-level parallelism:
-SMT (simultaneous multithreading)
-CMP (chip multiprocessing)
•Goal of Piranha: build a system that achieves superior performance on commercial workloads
Piranha: Architecture Overview
Alpha CPU Core and L1 Caches
CPU
•Single-issue, in-order design that executes the Alpha ISA
•500 MHz pipelined datapath
•Performance-enhancing features: branch target buffer, pre-compute logic for branch predictions, fully bypassed datapath
L1 caches
•64 KB, two-way set-associative, blocking caches
•2-bit state field per cache line, enough to encode the 4 states of a typical MESI protocol
•The I- and D-caches are kept coherent by hardware
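A 2-bit field is exactly enough for the four MESI states. The sketch below is a textbook MESI model, not Piranha's actual L1 logic: the bit encoding and the function names are hypothetical, and in Piranha it is the L2 controller, not a snooping bus, that knows what the other caches hold.

```python
from enum import IntEnum


class Mesi(IntEnum):
    """Hypothetical 2-bit encoding of the four MESI states."""
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def on_local_read(state: Mesi, others_have_copy: bool) -> Mesi:
    """State after this L1 reads the line."""
    if state is Mesi.INVALID:
        # Miss: the line arrives exclusive (clean) only if no other cache holds it.
        return Mesi.SHARED if others_have_copy else Mesi.EXCLUSIVE
    return state  # S, E and M all satisfy a read without a state change


def on_local_write(state: Mesi) -> Mesi:
    """State after this L1 writes the line; any other copies are invalidated first."""
    return Mesi.MODIFIED


def on_remote_read(state: Mesi) -> Mesi:
    """Another cache reads the line: an E or M copy is downgraded to S."""
    return Mesi.INVALID if state is Mesi.INVALID else Mesi.SHARED


def on_remote_write(state: Mesi) -> Mesi:
    """Another cache gains write permission: this copy is invalidated."""
    return Mesi.INVALID
```

Write-backs of Modified data on a downgrade are deliberately left out to keep the state logic visible.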
Intra-Chip Switch
•Uses a bidirectional, push-only interface
•The initiator sources the data; if the destination is ready, the ICS schedules the transfer. A grant is issued to the initiator to commence the transfer, and the destination receives a request signal carrying the initiator's ID and the type of transfer
•Each ICS port consists of 2 independent datapaths (one for sending, one for receiving)
•The ICS is implemented with a set of 8 internal datapaths
•Supports 2 logical lanes (low and high priority), used to avoid intra-chip coherence-protocol deadlocks
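The push-only handshake can be sketched as a per-cycle scheduler. This is a simplified model under stated assumptions: the class and field names are hypothetical, the high-priority lane is simply served first, and a transfer is granted whenever its destination is ready and one of the 8 internal datapaths is free; real arbitration and timing are not modeled.

```python
from dataclasses import dataclass, field

NUM_DATAPATHS = 8          # internal ICS datapaths (from the slide)
LANES = ("high", "low")    # two logical lanes; assumed: high priority is served first


@dataclass
class Transfer:
    initiator: str         # the module that sources the data (push-only)
    destination: str
    kind: str              # transfer type, reported to the destination
    lane: str = "low"


@dataclass
class ICS:
    ready: set = field(default_factory=set)   # destinations able to accept data now
    pending: dict = field(default_factory=lambda: {lane: [] for lane in LANES})

    def request(self, t: Transfer) -> None:
        # The initiator only asks; it never pulls data from the destination.
        self.pending[t.lane].append(t)

    def schedule_cycle(self) -> list:
        """Grant transfers whose destination is ready, up to the free datapaths."""
        granted, free = [], NUM_DATAPATHS
        for lane in LANES:
            still_waiting = []
            for t in self.pending[lane]:
                if free > 0 and t.destination in self.ready:
                    free -= 1
                    # The grant goes to t.initiator; the destination is told
                    # (t.initiator, t.kind) through its request signal.
                    granted.append(t)
                else:
                    still_waiting.append(t)
            self.pending[lane] = still_waiting
        return granted
```

A transfer to a destination that is not ready simply stays queued for a later cycle, which is what lets the initiator stream data without further flow control once it holds a grant.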
Second-Level Cache
•1 MB unified instruction/data cache, physically partitioned into 8 banks
•Each bank is 8-way set-associative and uses a round-robin replacement policy
•The L2 controllers are responsible for intra-chip coherence and cooperate with the protocol engines to enforce inter-chip coherence
•Non-inclusive on-chip caches: the L2 controllers keep a duplicate copy of the L1 tags and state
•L1 misses that also miss in the L2 are filled directly from memory without allocating a line in the L2, so the L2 behaves as a victim cache that is filled only by L1 replacements
•The duplicate L1 state is extended to include “ownership” (whether an L1 or the L2 owns the line)
•Intra-chip coherence protocol:
-the L2 controllers are responsible for it
-similar to a full-map, centralized directory-based protocol
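The bank and replacement bullets can be illustrated with a small model of one L2 bank. The 64-byte line size and the address interleaving (consecutive lines go to consecutive banks) are assumptions for this sketch, not figures from the slides; only the 1 MB capacity, 8 banks, 8 ways and round-robin replacement come from the text. Using an invalid way before evicting anything is also an assumed design choice.

```python
LINE_BYTES = 64                                   # assumed line size
NUM_BANKS = 8                                     # from the slide
WAYS = 8                                          # from the slide
BANK_BYTES = (1 << 20) // NUM_BANKS               # 1 MB / 8 banks = 128 KB per bank
SETS_PER_BANK = BANK_BYTES // (LINE_BYTES * WAYS) # 256 sets per bank


def bank_of(addr: int) -> int:
    """Assumed interleaving: consecutive lines map to consecutive banks."""
    return (addr // LINE_BYTES) % NUM_BANKS


class L2Bank:
    def __init__(self):
        self.tags = [[None] * WAYS for _ in range(SETS_PER_BANK)]
        self.next_victim = [0] * SETS_PER_BANK    # round-robin pointer per set

    def _index_tag(self, addr: int):
        line = addr // LINE_BYTES // NUM_BANKS    # drop the bank-select bits
        return line % SETS_PER_BANK, line // SETS_PER_BANK

    def lookup(self, addr: int) -> bool:
        s, tag = self._index_tag(addr)
        return tag in self.tags[s]

    def insert(self, addr: int):
        """Allocate a line; return the evicted tag, or None if nothing was evicted."""
        s, tag = self._index_tag(addr)
        ways = self.tags[s]
        if tag in ways:
            return None
        if None in ways:                          # use an invalid way first
            ways[ways.index(None)] = tag
            return None
        victim_way = self.next_victim[s]          # otherwise evict round-robin
        evicted = ways[victim_way]
        ways[victim_way] = tag
        self.next_victim[s] = (victim_way + 1) % WAYS
        return evicted
```

Round-robin needs only a 3-bit pointer per set, which is far cheaper than tracking true LRU order across 8 ways.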
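The non-inclusive fill policy, where the L2 is filled only by L1 replacements, can be traced with a toy model. Both caches here are tiny, fully associative LRU caches purely for illustration; the capacities, the LRU choice and the helper names are hypothetical, and write-backs to memory are not modeled.

```python
from collections import OrderedDict


class TinyCache:
    """Fully associative LRU cache of line addresses (illustration only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def hit(self, line) -> bool:
        if line in self.lines:
            self.lines.move_to_end(line)   # mark as most recently used
            return True
        return False

    def fill(self, line):
        """Insert a line and return the evicted line, or None."""
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            return self.lines.popitem(last=False)[0]
        return None


def l1_access(line, l1: TinyCache, l2: TinyCache, log: list) -> None:
    if l1.hit(line):
        log.append(("L1 hit", line))
        return
    if l2.hit(line):
        log.append(("L2 hit", line))
    else:
        # Missed in both: fill the L1 straight from memory, no L2 allocation.
        log.append(("memory", line))
    victim = l1.fill(line)
    if victim is not None:
        l2.fill(victim)   # the L2 only receives lines replaced from the L1
```

For example, with a 2-line L1, the access sequence A, B, C, A fetches A, B and C from memory, pushes A into the L2 when C evicts it, and then finds A in the L2 on the fourth access.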
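Because the L2 controller holds a duplicate of every L1's tags and state, it can act like a full-map directory for the chip. A minimal sketch, assuming 8 CPUs and hypothetical owner labels; it ignores the L2's own data array, write-backs and transient states, and on a read it simply leaves ownership where it was.

```python
from dataclasses import dataclass, field

NUM_CPUS = 8   # eight CPUs, each with its own L1 caches


@dataclass
class DupEntry:
    sharers: set = field(default_factory=set)  # full map: which L1s hold the line
    owner: str = "memory"                       # hypothetical labels: "L1:<i>", "L2" or "memory"


class L2Controller:
    def __init__(self):
        self.dup = {}  # line address -> DupEntry (duplicate L1 tags/state)

    def _entry(self, cpu: int, line: int) -> DupEntry:
        if not 0 <= cpu < NUM_CPUS:
            raise ValueError(f"no such CPU: {cpu}")
        return self.dup.setdefault(line, DupEntry())

    def read(self, cpu: int, line: int) -> str:
        """Record a new sharer and return who supplies the data (the current owner)."""
        e = self._entry(cpu, line)
        e.sharers.add(cpu)
        return e.owner

    def write(self, cpu: int, line: int) -> set:
        """Grant exclusive access; return the L1s that must be invalidated."""
        e = self._entry(cpu, line)
        to_invalidate = e.sharers - {cpu}
        e.sharers = {cpu}
        e.owner = f"L1:{cpu}"
        return to_invalidate
```

The point of the full map is that invalidations are sent only to the L1s that actually hold the line, never broadcast to all eight.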
Piranha architecture
Memory controller
•Has no direct access to the ICS; it is controlled by, and accessed through, the L2 controller
•Two parts: 1) the RAC (Rambus ASIC Cell) and 2) the memory controller engine
Protocol Engines
•Home Engine: responsible for exporting memory
whose home is at the local node
•Remote Engine: imports memory whose home is remote
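Which engine handles a line depends only on whether its home is the local node. The sketch assumes, purely for illustration, that the home node ID sits in a fixed field of the physical address; the bit positions and function names are hypothetical, since the slides do not say how homes are assigned.

```python
NODE_SHIFT = 34   # hypothetical: home node ID stored in physical-address bits [43:34]
NODE_BITS = 10    # hypothetical: up to 1024 nodes


def home_node(paddr: int) -> int:
    """Extract the (assumed) home-node field from a physical address."""
    return (paddr >> NODE_SHIFT) & ((1 << NODE_BITS) - 1)


def engine_for(paddr: int, local_node: int) -> str:
    """Local-home lines are exported by the Home Engine; remote-home lines
    are imported by the Remote Engine."""
    return "HomeEngine" if home_node(paddr) == local_node else "RemoteEngine"
```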
Inter-node Coherence Protocol
•Invalidation-based directory protocol
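•Supports 4 request types: read, read-exclusive, exclusive, exclusive-without-data
•Supported features: clean-exclusive optimization (an exclusive copy is returned on a read when there are no other sharers), reply forwarding from a remote owner, eager exclusive replies (ownership is granted before all invalidations are acknowledged)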
•Unique property: avoids NAKs (negative acknowledgments) and the retries they cause
•Unique techniques:
1) the network uses “hot potato” routing
2) buffer space is shared among all lanes
3) cruise-missile invalidates (CMI)
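The way a home node might treat the four request types, together with the clean-exclusive optimization, reply forwarding and eager exclusive replies, can be sketched as one function. This is a simplified model assuming a stable directory entry (no transient states, no races); the message names and the DirEntry layout are hypothetical, and NAK avoidance is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Req(Enum):
    READ = "read"
    READ_EXCLUSIVE = "read-exclusive"
    EXCLUSIVE = "exclusive"                            # upgrade: requester holds a shared copy
    EXCLUSIVE_WITHOUT_DATA = "exclusive-without-data"  # requester needs permission, not data


@dataclass
class DirEntry:
    sharers: set = field(default_factory=set)
    owner: Optional[int] = None   # node holding the line exclusively, if any


def handle_request(entry: DirEntry, req: Req, requester: int) -> list:
    """Return the (message, destination node) pairs sent by the home node."""
    if entry.owner is not None and entry.owner != requester:
        # Reply forwarding: the owner sends the data straight to the requester.
        msgs = [("forward", entry.owner)]
        if req is Req.READ:
            entry.sharers, entry.owner = {entry.owner, requester}, None
        else:
            entry.sharers, entry.owner = set(), requester
        return msgs
    if req is Req.READ:
        if not entry.sharers - {requester}:
            # Clean-exclusive optimization: no other sharers, so hand out an exclusive copy.
            entry.sharers, entry.owner = set(), requester
            return [("exclusive_data_reply", requester)]
        entry.sharers.add(requester)
        return [("shared_data_reply", requester)]
    # Write-type requests: invalidate every other sharer...
    msgs = [("invalidate", node) for node in sorted(entry.sharers - {requester})]
    # ...but reply eagerly, without waiting for the invalidation acknowledgments.
    kind = "exclusive_data_reply" if req is Req.READ_EXCLUSIVE else "exclusive_reply"
    msgs.append((kind, requester))
    entry.sharers, entry.owner = set(), requester
    return msgs
```

Only read-exclusive carries data in its reply here; exclusive and exclusive-without-data return permission alone, which is the reason the protocol distinguishes them.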
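The idea behind cruise-missile invalidates, as sketched here, is that one invalidation message visits a chain of sharers in turn and only the last node in the chain acknowledges, so the home injects few messages and few acknowledgments come back. The sketch assumes the sharers are dealt round-robin into a chosen number of chains and that acknowledgments go to the requester; both choices, and the function name, are hypothetical.

```python
def cruise_missile_invalidates(sharers, num_missiles: int):
    """Return the chains and the (src, dst, kind) hops for one invalidation round."""
    nodes = sorted(sharers)
    # Assumed chain assignment: deal the sharers round-robin into the chains.
    chains = [nodes[i::num_missiles] for i in range(num_missiles)]
    chains = [c for c in chains if c]           # drop empty chains
    hops = []
    for chain in chains:
        src = "home"
        for node in chain:
            hops.append((src, node, "inval"))   # the same message travels on
            src = node
        hops.append((src, "requester", "ack"))  # only the last node acknowledges
    return chains, hops
```

With 8 sharers and 2 chains, the home injects 2 messages instead of 8 and the requester collects 2 acknowledgments instead of 8, at the cost of serializing the hops within each chain.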
System Interconnect
•OQ (output queue): accepts packets, via the packet switch, from the protocol engines or from the system controller
•Router: transmits packets to, and receives packets from, other nodes
•IQ (input queue): receives packets addressed to the local node and forwards them to the target module via the packet switch
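The path from OQ through the router to IQ can be modeled with two queues per node. The sketch idealizes the router as delivering every packet in one step (no hot-potato routing, no lanes, no flow control), and the class names and module labels are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    dest_node: int
    target_module: str   # illustrative labels, e.g. "home_engine", "remote_engine"
    payload: str


class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.oq = deque()    # output queue: filled by protocol engines / system controller
        self.iq = deque()    # input queue: packets addressed to this node
        self.delivered = {}  # module -> payloads (stands in for the packet switch)

    def send(self, pkt: Packet) -> None:
        self.oq.append(pkt)

    def drain_iq(self) -> None:
        """Forward every received packet to its target module."""
        while self.iq:
            pkt = self.iq.popleft()
            self.delivered.setdefault(pkt.target_module, []).append(pkt.payload)


def router_step(nodes: dict) -> None:
    """Idealized network: move every OQ packet straight to the destination's IQ."""
    for node in nodes.values():
        while node.oq:
            pkt = node.oq.popleft()
            nodes[pkt.dest_node].iq.append(pkt)
```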
Reliability Features
RAS (reliability, availability, serviceability) features: redundancy on all memory components, CRC protection on most datapaths, redundant datapaths, protocol error recovery, error logging, hot-swappable links, and in-band system reconfiguration support
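CRC protection lets the receiving end of a datapath detect corrupted transfers. As an illustration only, the sketch uses CRC-32 from Python's zlib module; the slides do not give Piranha's CRC width or polynomial, and the framing (a 4-byte little-endian checksum appended to the payload) is an assumption.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (assumed 4-byte little-endian framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the received one."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == received
```

CRC-32 detects every single-bit error, so flipping any one bit of a frame makes crc_ok return False.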
Evaluation
•Workloads
DSS workload with the TPC-D benchmark
OLTP workload with the TPC-B benchmark
•Simulation environment
SimOS-Alpha environment: simulates the hardware components of Alpha-based multiprocessors
•Simulated architectures
Performance Evaluation of Piranha
Conclusions
•CMP is a promising approach for future multiprocessor designs
•Evaluation: Piranha outperforms the other simulated designs on commercial workloads