Network Victim Cache: Leveraging Network-on-Chip

Zhongkai Chen
3/25/2010

Jinglei Wang; Yibo Xue; Haixia Wang; Dongsheng Wang
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
This paper appears in: Embedded and Multimedia Computing, 2009. EM-Com 2009. 4th International Conference
Publication date: 10-12 Dec. 2009
 Introduction
   Problems
   Network on Chip (NoC)
   Victim Cache
 Network Victim Cache Design
   Baseline Architecture
   NVC Scheme
 Performance Evaluation

The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes the aggregate cache capacity and minimizes off-chip memory requests in Chip Multiprocessors (CMPs).

Two important hurdles restrict the scalability of these chip multiprocessors:
 the on-chip memory cost of the directory
 the long L1 miss latencies
Network on Chip (NoC)
In a NoC system, modules such as processor cores,
memories and specialized IP blocks exchange data using a
network as a "public transportation" sub-system for the
information traffic.
A NoC is constructed from multiple point-to-point data
links interconnected by routers, such that messages can
be relayed from any source module to any destination
module over several links, by making routing decisions at
the routers.
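To make the routing decisions concrete, here is a minimal sketch of dimension-ordered (XY) routing on a 2D mesh. The paper does not specify a routing algorithm, so XY routing is an illustrative assumption; the function and tile coordinates are hypothetical.

```python
def xy_route(src, dst):
    """Return the list of (x, y) tiles a message traverses from src to dst.

    Dimension-ordered routing: travel fully along X first, then along Y.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                      # move along the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then move along the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```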
Victim Cache
A victim cache is a cache used to hold blocks evicted from
a CPU cache upon replacement. The victim cache lies
between the main cache and its refill path.
The victim cache is usually fully associative and is
intended to reduce the number of conflict misses. Only a
small fraction of a program's memory accesses require
high associativity; the victim cache exploits this property
by providing high associativity to only those accesses.
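The behavior described above can be sketched as a small simulation. This is not the paper's design; it is a generic fully associative victim cache with an assumed LRU replacement policy and an illustrative entry count.

```python
from collections import OrderedDict

class VictimCache:
    """Minimal fully associative victim cache with LRU replacement (a sketch)."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.lines = OrderedDict()        # tag -> data, oldest entry first

    def insert(self, tag, data):
        """Hold a block just evicted from the main cache."""
        if tag in self.lines:
            self.lines.pop(tag)
        elif len(self.lines) >= self.num_entries:
            self.lines.popitem(last=False)  # evict the LRU victim
        self.lines[tag] = data

    def lookup(self, tag):
        """On a main-cache miss: a hit returns the block and removes it
        (it moves back into the main cache); a miss returns None."""
        return self.lines.pop(tag, None)

vc = VictimCache(4)
vc.insert(0x1A, "block A")
print(vc.lookup(0x1A))  # block A  (the conflict miss is satisfied by the VC)
print(vc.lookup(0x1A))  # None     (the block has moved back to the main cache)
```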

Baseline Architecture
L1 caches are kept coherent using a directory-based cache
coherence protocol.
The tiled CMP is organized as a 2D array of replicated tiles, each with a
core, a private L1 cache, an L2 cache slice, and a router that connects the
tile to the network on chip.
The L2 cache slices form a logically shared L2. L1 cache misses are sent
to the corresponding home tile, which looks up the directory information
and performs the actions needed to ensure coherence.
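The slides only say that misses go to "the corresponding home tile"; a common way to pick the home tile in a shared, distributed L2 is to interleave cache blocks across tiles. The sketch below assumes that convention, and the block size and tile count are illustrative.

```python
BLOCK_BYTES = 64       # assumed cache-line size
NUM_TILES = 16         # assumed 4x4 tiled CMP

def home_tile(addr):
    """Map a physical address to its home tile by block-address interleaving."""
    block = addr // BLOCK_BYTES
    return block % NUM_TILES

# Consecutive cache blocks map to consecutive tiles:
print([home_tile(a) for a in range(0, 4 * BLOCK_BYTES, BLOCK_BYTES)])  # [0, 1, 2, 3]
```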

Baseline Router Architecture
In a tiled CMP, the L1 and L2 caches are attached to the
router through a Network Interface Component (NIC).
Routers are connected to each other through four directional
interfaces to form a 2D network on chip.

The Network Victim Cache (NVC)
The difference from the baseline router architecture is the
modified network interface component: a Victim Cache (VC)
and a Directory Cache (DC) are added to it.
 Directory information is removed from the L2 caches and
stored in Directory Caches (DC) in the network interface
components to save memory space.
 The saved directory space is used as Victim Caches (VC)
to capture and store evictions from local L1 caches,
reducing subsequent L1 miss latencies.
[Figure: a block evicted by a conflict or capacity miss moves from the
L1 cache into the VC; an L1 miss request is looked up in the DC, and the
fetched data block comes from the L2 cache.]

At the home tile, the DC captures L1 miss requests in the network
interface component and looks up the directory information for the
requested block. It fetches the data block from the local L2 cache and
sends a reply back to the requestor.

If an L1 cache line is evicted because of a conflict or capacity miss,
the scheme attempts to keep a copy of the victim line in the VC to
reduce the latency of subsequent accesses to the same line.
[Figure: an L1 miss request first checks the VC; on a hit the block is
invalidated in the VC and moved back into the L1 cache; on a miss the
request continues through the DC toward the home tile.]
All L1 misses first check the VC as they flow through the network
interface component, in case it holds a valid copy of the block. On a
VC miss, the request continues to travel to the home tile. On a VC hit,
the block is invalidated in the VC and moved into the L1 cache.
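The miss-handling path above can be sketched in a few lines. The names and data structures here are illustrative, not from the paper: the VC is modeled as a plain tag-to-data dictionary, and forwarding to the home tile is a callback.

```python
def handle_l1_miss(tag, victim_cache, send_to_home_tile):
    """VC-first L1 miss handling: hit => invalidate VC entry and fill L1;
    miss => forward the request to the home tile."""
    data = victim_cache.pop(tag, None)   # a VC hit removes (invalidates) the entry
    if data is not None:
        return data                      # the block moves back into the L1 cache
    return send_to_home_tile(tag)        # VC miss: the request travels onward

vc = {0x2B: "victim block"}
fetch = lambda tag: "fetched from home tile"
print(handle_l1_miss(0x2B, vc, fetch))   # victim block  (served locally)
print(handle_l1_miss(0x2B, vc, fetch))   # fetched from home tile  (VC now misses)
```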
 Simulation Environment
The GEMS simulator is used to evaluate the performance of NVC against
the baseline CMP.
The number of VC entries equals that of the L1 cache, and the number of
DC entries is twice that of the L1 cache.
Detailed system parameters: 8 workloads from the SPLASH-2 and PARSEC
benchmark suites, run on the Solaris 10 operating system.
 Impact on L1 Cache Miss Latency
NVC decreases L1 cache miss latencies by 21-49%, and by 31% on
average. For the water benchmark, the small working set allows most
L1 misses to be satisfied in the local victim cache, reducing L1 miss
latencies by 49%.
[Figure: normalized L1 cache average miss latency]
 Impact on Execution Time
NVC reduces the execution time of each benchmark by 10-34%. The
execution times of lu and water are reduced by 34%. For the water
benchmark, the small working set allows most L1 misses to be satisfied
in the local victim cache, leading to better performance. NVC improves
CMP performance by 23% on average.
[Figure: execution time]
 On-Chip Network Traffic Reduction
An additional benefit of NVC is the reduction of on-chip coherence
traffic. NVC reduces the number of coherence messages of each
benchmark by 16-48%, and by 28% on average. NVC eliminates some
inter-tile messages when accesses can be resolved in local victim
caches.
 Scalability
Compared to the conventional shared L2 cache design, NVC increases
on-chip storage by only 0.18%. As the number of cores increases, the
directory storage saved from the L2 cache grows significantly, while
the storage overhead of the proposed scheme grows far more slowly.
NVC therefore scales much better than the conventional shared L2
cache design as the number of cores increases.