OGO 2.1
SGI Origin 2000
Robert van Liere
CWI, Amsterdam
TU/e, Eindhoven
11 September 2001
unite.sara.nl
• SGI Origin 2000
• Located at SARA in Amsterdam
• Hardware configuration:
– 128 MIPS R10000 CPUs @ 250 MHz
– 64 Gbyte main memory
– 1 Tbyte disk storage
– 11 Ethernet interfaces @ 100 Mbit/s
– 1 Ethernet interface @ 1 Gbit/s
Contents
• Architecture
– Overview
– Module interconnect
– Memory hierarchies
• Programming
– Parallel models
– Data placement
• Pros and cons
Overview - Features
• 64 bit RISC microprocessors
• Large main memory
• “Scalable” in CPU, memory and I/O
• Shared memory programming model
Overview - Applications
• Worldwide: approximately 30,000 systems
– ~ 50 with >128 CPUs
– ~ 100 with 64-128 CPUs
– ~ 500 with 32-64 CPUs
• Compute serving : many CPUs and memory
• Database serving : many disks
• Web serving : many I/O
System architecture – 1 CPU
• CPU + cache
• One system bus
• Memory
• I/O (network + disk)
• Cached data
System architecture – N CPU
• Symmetric multiprocessing (SMP)
• Multi-CPU + caches
• One shared bus
• Memory
• I/O
N CPU – cache coherency
• Problem:
– Inconsistent cached data
• Solution:
– Snooping
– Broadcasting
• Not scalable
Architecture – Origin 2000
• Node board
• 2 CPU + cache
• Memory
• Directory
• HUB
• I/O
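The directory on each node board is what replaces bus snooping: it records which nodes hold a copy of a memory line, so a write only has to invalidate those nodes instead of broadcasting to all. A minimal sketch of that idea follows, assuming one bit per node in a 64-bit word. The names `dir_entry_t`, `dir_add_sharer` and `dir_write` are hypothetical, and this is not the Origin 2000's actual directory format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical directory entry: bit i is set when node i caches the line. */
typedef struct {
    uint64_t sharers;
} dir_entry_t;

/* Record that `node` has fetched a copy of the line. */
void dir_add_sharer(dir_entry_t *e, int node)
{
    assert(node >= 0 && node < 64);
    e->sharers |= (uint64_t)1 << node;
}

/* `writer` wants exclusive access: invalidate every other sharer.
 * Returns the number of invalidation messages that would be sent. */
int dir_write(dir_entry_t *e, int writer)
{
    assert(writer >= 0 && writer < 64);
    uint64_t others = e->sharers & ~((uint64_t)1 << writer);
    int messages = 0;

    while (others != 0) {
        others &= others - 1;   /* clear the lowest set bit */
        messages++;
    }
    e->sharers = (uint64_t)1 << writer;   /* writer is now the only holder */
    return messages;
}
```

With three sharers, a write by one of them costs two invalidations, independent of how many nodes the machine has; a broadcast scheme would contact every other cache.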
Origin 2000 Interconnect
• Node boards
• Routers
– Six ports
Interconnect Topology
Sample Topologies
128 Topology
Virtual Memory
• One CPU, multiple programs
• Page
• Paging disk
• Page replacement
O2000 Virtual Memory
• Multiple CPUs, multiple programs
• Non-Uniform Memory Access
• Efficient programs:
– Minimize data movement
– Data “close” to CPU
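One common way to keep data "close" to the CPU is to let each thread initialize the part of the array it will later work on. The sketch below assumes a first-touch page placement policy (a page is allocated on the node of the CPU that first writes it); whether that policy is in effect on a given system is an assumption. `NTHREADS`, `chunk_t`, `touch_and_sum` and `local_sum` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4              /* hypothetical thread count */

typedef struct {
    double *data;   /* shared array */
    size_t begin;   /* first index owned by this thread */
    size_t end;     /* one past the last owned index */
    double sum;     /* partial result */
} chunk_t;

/* Each thread writes its chunk first and reads it afterwards, so under a
 * first-touch policy the pages end up in memory near that thread's CPU. */
static void *touch_and_sum(void *arg)
{
    chunk_t *c = arg;

    for (size_t i = c->begin; i < c->end; i++)
        c->data[i] = 1.0;               /* first touch */
    c->sum = 0.0;
    for (size_t i = c->begin; i < c->end; i++)
        c->sum += c->data[i];           /* later, node-local access */
    return NULL;
}

double local_sum(size_t n)
{
    double *data = malloc(n * sizeof *data);   /* allocated, not yet touched */
    pthread_t tid[NTHREADS];
    chunk_t chunk[NTHREADS];
    double total = 0.0;

    assert(data != NULL);
    for (int t = 0; t < NTHREADS; t++) {
        chunk[t].data = data;
        chunk[t].begin = n * t / NTHREADS;
        chunk[t].end = n * (t + 1) / NTHREADS;
        int rc = pthread_create(&tid[t], NULL, touch_and_sum, &chunk[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += chunk[t].sum;
    }
    free(data);
    return total;
}
```

Initializing the whole array in the main thread instead would, under the same assumption, place every page on one node and turn most accesses into off-board traffic.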
Latencies and Bandwidth
Application performance
• Scientific computing
– LU, ocean, barnes, radiosity
• Linear speedup
– More CPUs -> proportionally higher performance
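Linear speedup can be checked from two wall-clock timings. A small sketch using the usual definitions (speedup is sequential time over parallel time, efficiency is speedup per CPU); the function names are hypothetical.

```c
#include <assert.h>

/* Speedup: sequential run time divided by parallel run time. */
double speedup(double t_seq, double t_par)
{
    assert(t_par > 0.0);
    return t_seq / t_par;
}

/* Efficiency: speedup per CPU; 1.0 corresponds to linear speedup. */
double efficiency(double t_seq, double t_par, int ncpus)
{
    assert(ncpus > 0);
    return speedup(t_seq, t_par) / ncpus;
}
```

For example, a run that takes 64 s on one CPU and 2 s on 32 CPUs has speedup 32 and efficiency 1.0; if the 32-CPU run takes 4 s, efficiency drops to 0.5.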
Programming support
• IRIX operating system
• Parallel programming
– C source level with compiler pragmas
– POSIX threads
– UNIX processes
• Data placement
– dplace, dlook, dprof
• Profiling
– timex, ssrun
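As an illustration of the "C source level with compiler pragmas" model, the sketch below marks a loop with independent iterations as parallel. It uses an OpenMP pragma as a stand-in; the slides do not say which pragma set is meant, so that choice is an assumption. A compiler without OpenMP enabled ignores the pragma and runs the loop sequentially. The function name `scale` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of x by a. Iterations do not depend on each
 * other, so the loop can be split across CPUs. */
void scale(double *x, size_t n, double a)
{
    assert(x != NULL || n == 0);

    /* A signed loop index keeps the loop in the form OpenMP expects. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] *= a;
}
```

The source stays ordinary C: the same file builds as a sequential or a parallel program depending on compiler flags.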
Parallel Programs
• Functional Decomposition
– Decompose the problem into different tasks
• Domain Decomposition
– Partition the problem’s data structure
• Consider
– Mapping tasks/parts onto CPUs
– Coordinate work and communication of CPUs
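Domain decomposition and the mapping of parts onto CPUs can be sketched as a block partition: n elements are split into contiguous ranges of near-equal size, one per CPU. The type `range_t` and the function `block_range` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end). */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/* Range of elements assigned to `part` when n elements are divided
 * over `nparts` parts. The first (n % nparts) parts get one extra. */
range_t block_range(size_t n, int nparts, int part)
{
    assert(nparts > 0 && part >= 0 && part < nparts);

    size_t base = n / (size_t)nparts;
    size_t extra = n % (size_t)nparts;
    size_t p = (size_t)part;
    range_t r;

    r.begin = p * base + (p < extra ? p : extra);
    r.end = r.begin + base + (p < extra ? 1 : 0);
    return r;
}
```

Dividing 10 elements over 4 parts gives [0,3), [3,6), [6,8), [8,10): every element belongs to exactly one part and sizes differ by at most one.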
Task Decomposition
• Decompose problem
• Determine dependencies
Task Decomposition
• Map tasks on threads
• Compare:
– Sequential case
– Parallel case
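A minimal sketch of mapping tasks on threads, with a sequential version for comparison. Two tasks (sum and maximum of an array) are independent of each other; a third step depends on both results and therefore runs only after both threads have been joined. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    const int *v;   /* input array */
    int n;          /* number of elements, n > 0 */
    long result;    /* output of the task */
} task_t;

/* Task A: sum of all elements. */
static void *task_sum(void *arg)
{
    task_t *t = arg;
    t->result = 0;
    for (int i = 0; i < t->n; i++)
        t->result += t->v[i];
    return NULL;
}

/* Task B: largest element. Independent of task A. */
static void *task_max(void *arg)
{
    task_t *t = arg;
    t->result = t->v[0];
    for (int i = 1; i < t->n; i++)
        if (t->v[i] > t->result)
            t->result = t->v[i];
    return NULL;
}

/* Sequential case: A, then B, then the dependent step. */
long combine_sequential(const int *v, int n)
{
    task_t a = { v, n, 0 }, b = { v, n, 0 };
    assert(n > 0);
    task_sum(&a);
    task_max(&b);
    return a.result + b.result;
}

/* Parallel case: A and B on separate threads; the joins enforce the
 * dependency of the final step on both results. */
long combine_parallel(const int *v, int n)
{
    task_t a = { v, n, 0 }, b = { v, n, 0 };
    pthread_t ta, tb;
    assert(n > 0);
    int ra = pthread_create(&ta, NULL, task_sum, &a);
    int rb = pthread_create(&tb, NULL, task_max, &b);
    assert(ra == 0 && rb == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.result + b.result;
}
```

Both versions return the same value; only the independent tasks overlap in time, so the gain is bounded by the longer of the two.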
Efficient programs
• Use many CPUs
– Measure speedups
• Avoid:
– Excessive data dependencies
– Excessive cache misses
– Excessive inter-node communication
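One frequent source of excessive cache misses in C is walking a two-dimensional array against its memory layout. The sketch below sums the same matrix in row-major and column-major order; C stores rows contiguously, so the first version uses every element of a fetched cache line while the second touches a different line on almost every access. `ROWS`, `COLS` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Cache-friendly: the inner loop walks consecutive addresses. */
double sum_row_major(double a[ROWS][COLS])
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Cache-unfriendly: the inner loop strides by COLS * sizeof(double). */
double sum_col_major(double a[ROWS][COLS])
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```

The two functions compute the same total; timing them, for instance with timex, is a simple way to see the cost of the access pattern.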
Pros vs Cons
• Pros:
– Multi-processor (128 CPUs)
– Large memory (64 Gbyte)
– Shared memory programming
• Cons:
– Slow integer CPU
– Performance penalty: data dependencies, off-board memory