Caching in multiprocessor systems
Tiina Niklander
In AMICT 2009, Petrozavodsk
19.5.2009
Background
 More transistors on one chip
 Multiple cores
 Larger caches
 Multiple on-chip caches
 More functionality (more functional units, dedicated multimedia / deciphering cells, integrated GPU)
 Multiple cores introduce new design questions:
 Cache organization
 Private vs. shared caches
 Cache coherence
Cache organization
 Common organization:
 L1 is private
 Last-level cache is shared
 With three levels:
 L1 private
 L2: private or shared?
 L3 shared
Private vs Shared cache
 Fully private, fully shared, partially shared
Private L2 (a pair of processors shares each L2)
Shared L2 (all cores can access all of the L2)
F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412
Shared cache
 Simple coherence (just one copy exists)
 Different latencies (distance from CPU to cache location)
 Cache access competition (a core may have to wait for another)
M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In SC2008. IEEE, 2008, pp. 1-12
Private cache
 No access competition, smaller latencies
 But coherence becomes an issue!
 Same data in multiple caches -> invalidate on write
 Cache partitioning
 Design time: fixed partitioning
 Run time:
 Fixed partitioning (a configuration choice)
 Dynamic (based on current need; see the sketch below)
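
As an illustration of run-time dynamic partitioning, here is a minimal C sketch assuming a way-partitioned shared cache whose ways are periodically reassigned from the core with the fewest misses to the core with the most; the partition_state structure and repartition() policy are hypothetical, not taken from any of the cited papers.

#include <stdio.h>

#define NUM_CORES 4

/* Hypothetical bookkeeping for run-time (dynamic) partitioning of a
   way-partitioned shared cache. */
struct partition_state {
    int  ways[NUM_CORES];    /* cache ways currently assigned to each core */
    long misses[NUM_CORES];  /* misses observed in the last interval       */
};

/* Move one way from the core with the fewest misses to the core with the
   most misses: a crude "based on current need" policy. */
static void repartition(struct partition_state *p)
{
    int hungry = 0, generous = 0;
    for (int c = 1; c < NUM_CORES; c++) {
        if (p->misses[c] > p->misses[hungry])   hungry = c;
        if (p->misses[c] < p->misses[generous]) generous = c;
    }
    if (hungry != generous && p->ways[generous] > 1) {
        p->ways[generous]--;
        p->ways[hungry]++;
    }
    for (int c = 0; c < NUM_CORES; c++)
        p->misses[c] = 0;    /* start a new measurement interval */
}

int main(void)
{
    struct partition_state p = { { 4, 4, 4, 4 }, { 120, 30, 900, 45 } };
    repartition(&p);                      /* core 2 gains a way from core 1 */
    for (int c = 0; c < NUM_CORES; c++)
        printf("core %d: %d ways\n", c, p.ways[c]);
    return 0;
}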
Cache coherence
 Protocols: MESI, MSI, MOSI, MOESI (a state-transition sketch follows the table below)
 Invalidation message: RFO (Read For Ownership)
 Each cache snoops the bus to monitor memory operations
Allowed state combinations for a cache line held in two caches (source: Wikipedia):

      M  E  S  I
  M   N  N  N  Y
  E   N  N  N  Y
  S   N  N  Y  Y
  I   Y  Y  Y  Y
M – Modified
(O – Owned)
E – Exclusive
S – Shared
I – Invalid
N – combination not allowed
Y – combination allowed
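
A minimal C sketch of the MESI transitions for one line in one private cache; the event names, the next_state() function and the others_have_copy flag are illustrative simplifications, not a complete snooping implementation.

#include <stdio.h>

/* MESI states of a cache line in one private cache. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

/* Events seen by this cache: its own reads/writes, plus bus traffic
   snooped from other caches (e.g. an RFO, Read For Ownership). */
typedef enum { LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_RFO } event_t;

/* Simplified next-state function; write-backs and data supply are
   omitted, only the state changes are shown. */
static mesi_t next_state(mesi_t s, event_t e, int others_have_copy)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                     /* miss: load the line     */
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                             /* hit: state unchanged    */
    case LOCAL_WRITE:
        return MODIFIED;                      /* issues an RFO if needed */
    case BUS_READ:
        if (s == MODIFIED || s == EXCLUSIVE)  /* another cache reads     */
            return SHARED;                    /* (M also writes back)    */
        return s;
    case BUS_RFO:
        return INVALID;                       /* another cache writes    */
    }
    return s;
}

int main(void)
{
    mesi_t s = INVALID;
    s = next_state(s, LOCAL_READ, 0);   /* -> EXCLUSIVE */
    s = next_state(s, LOCAL_WRITE, 0);  /* -> MODIFIED  */
    s = next_state(s, BUS_RFO, 0);      /* -> INVALID   */
    printf("final state: %d\n", s);
    return 0;
}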
(Distributed) cooperative caches
 Add a directory structure (see the sketch below)
 Knows the data locations in the local caches
 Cache-to-cache copying
 When data is in another cache (the directory locates it)
 On eviction (store the line temporarily in another cache)
E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In PACT’08. ACM 2008, pp. 134-142
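
A minimal C sketch of the directory idea: it records which private caches hold a line so that a miss can be served by a cache-to-cache copy; the tag-less, direct-mapped directory and the function names are illustrative, not the structure used in the cited paper.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_CACHES  4      /* private caches tracked by the directory          */
#define DIR_ENTRIES 1024   /* directory size (direct-mapped, tag-less sketch)  */
#define LINE_BITS   6      /* assumed 64-byte cache lines                      */

/* Hypothetical directory entry: one bit per private cache holding the line. */
struct dir_entry {
    uint8_t sharers;       /* bit i set => cache i has a copy */
};

static struct dir_entry directory[DIR_ENTRIES];

static unsigned dir_index(uintptr_t addr)
{
    return (unsigned)((addr >> LINE_BITS) % DIR_ENTRIES);
}

/* Record that cache `c` now holds the line containing addr. */
void dir_add_sharer(uintptr_t addr, int c)
{
    directory[dir_index(addr)].sharers |= (uint8_t)(1u << c);
}

/* On a miss in cache `c`, ask the directory which other cache can supply
   the line, enabling a cache-to-cache copy instead of a memory access. */
int dir_find_supplier(uintptr_t addr, int c)
{
    uint8_t s = directory[dir_index(addr)].sharers & (uint8_t)~(1u << c);
    for (int i = 0; i < NUM_CACHES; i++)
        if (s & (1u << i))
            return i;
    return -1;             /* no other cache has it: go to memory */
}

int main(void)
{
    memset(directory, 0, sizeof directory);
    dir_add_sharer(0x1000, 2);                               /* cache 2 holds the line */
    printf("supplier: %d\n", dir_find_supplier(0x1000, 0));  /* prints 2               */
    return 0;
}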
New improvement ideas for cache performance (1/2)
 Split the cache for different tasks
 Dynamically allocate cache areas
 Software-controlled eviction
 GOAL: a thread moves data it no longer needs, but that is strongly shared, to the shared cache to improve the performance of the other threads
 A new instruction, evict, tells the processor to move some data from the private L1 or L2 to the shared L3 (see the sketch below)
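
A minimal C sketch of the idea, assuming a hypothetical evict_to_shared() intrinsic standing in for the proposed evict instruction (current ISAs offer only rough analogues such as x86 CLFLUSH, which flushes to memory rather than to a shared cache level).

#include <stddef.h>

/* Hypothetical intrinsic for the proposed evict instruction: push the cache
   line holding addr out of the private L1/L2 down into the shared L3. */
static inline void evict_to_shared(const void *addr)
{
    (void)addr;   /* placeholder only: no such instruction exists today */
}

#define LINE 64   /* assumed cache-line size in bytes */

/* A producer finishes writing a strongly shared buffer that it no longer
   needs itself, then hints that the lines should move to the shared cache
   so consumer threads hit in L3 instead of missing all the way to memory. */
void publish_buffer(const char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += LINE)
        evict_to_shared(buf + off);
}

int main(void)
{
    static char shared_buf[4096];
    publish_buffer(shared_buf, sizeof shared_buf);
    return 0;
}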
New improvement ideas for cache performance (2/2)
 Helper threads
 GOAL: an additional thread executes parts of the code ahead of the actual thread to ‘prefetch’ data into the cache (see the sketch below)
 Generate memory traces for the programmer
 For tuning the software performance
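
A minimal C sketch of a helper thread, assuming POSIX threads and GCC's __builtin_prefetch; the traversal and the prefetch stride are illustrative, not the scheme of the cited papers.

#include <pthread.h>
#include <stdio.h>

#define N (1 << 20)

static int data[N];

/* Helper thread: touch the data ahead of the actual thread so the lines
   are already in the cache when the real computation reaches them. */
static void *helper(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i += 16)          /* 16 ints = one 64-byte line */
        __builtin_prefetch(&data[i], 0, 1);  /* prefetch for reading       */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, helper, NULL);

    long sum = 0;                            /* the actual thread's work */
    for (int i = 0; i < N; i++)
        sum += data[i];

    pthread_join(t, NULL);
    printf("sum = %ld\n", sum);
    return 0;
}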
Conclusion
 Focus is on fine-tuning cache performance
 Cache coherence itself was solved earlier
 Not always used (when non-coherent usage is allowed)
 L2 and L3 caches
 Shared or private
 Cache partitioning
 Support for software-based improvements
 Eviction hints
 Traces
 Prefetching (e.g. by helper threads)
References
 S. Fide, S. Jenks: Proactive use of shared L3 caches to enhance cache communications in multi-core processors. IEEE Computer Architecture Letters, vol. 7 (2008), pp. 57-60
 E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In Conf. on Parallel Architectures and Compilation Techniques, PACT’08. ACM, 2008, pp. 134-142
 M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. IEEE, 2008, pp. 1-12
 L. Peng et al.: Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture 54 (2008), pp. 816-828
 F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412
 J. Zhang, X. Fan, S.H. Liu: A Pollution Alleviate L2 Cache Replacement Policy for Chip Multiprocessor Architecture. In Int. Conf. on Networking, Architecture and Storage. IEEE, 2008, pp. 310-316