OGO 2.1 – SGI Origin 2000
Robert van Liere
CWI, Amsterdam / TU/e, Eindhoven
11 September 2001

unite.sara.nl
• SGI Origin 2000
• Located at SARA in Amsterdam
• Hardware configuration:
  – 128 MIPS R10000 CPUs @ 250 MHz
  – 64 GByte main memory
  – 1 TByte disk storage
  – 11 Ethernet interfaces @ 100 Mbit/s
  – 1 Ethernet interface @ 1 Gbit/s

Contents
• Architecture
  – Overview
  – Module interconnect
  – Memory hierarchies
• Programming
  – Parallel models
  – Data placement
• Pros and cons

Overview – Features
• 64-bit RISC microprocessors
• Large main memory
• "Scalable" in CPUs, memory and I/O
• Shared-memory programming model

Overview – Applications
• Worldwide: ~30,000 systems
  – ~50 with more than 128 CPUs
  – ~100 with 64–128 CPUs
  – ~500 with 32–64 CPUs
• Compute serving: many CPUs and much memory
• Database serving: many disks
• Web serving: much I/O

System architecture – 1 CPU
• CPU + cache
• One system bus
• Memory
• I/O (network + disk)
• Cached data

System architecture – N CPUs
• Symmetric multiprocessing (SMP)
• Multiple CPUs + caches
• One shared bus
• Memory
• I/O

N CPUs – cache coherency
• Problem:
  – Inconsistent cached data
• Solution:
  – Snooping
  – Broadcasting
• Not scalable

Architecture – Origin 2000
• Node board:
  – 2 CPUs + caches
  – Memory
  – Directory
  – HUB
  – I/O

Origin 2000 interconnect
• Node boards
• Routers
  – Six ports each

Interconnect topology

Sample topologies

128-CPU topology

Virtual memory
• One CPU, multiple programs
• Pages
• Paging disk
• Page replacement

O2000 virtual memory
• Multiple CPUs, multiple programs
• Non-Uniform Memory Access (NUMA)
• Efficient programs:
  – Minimize data movement
  – Keep data "close" to its CPU

Latencies and bandwidth

Application performance
• Scientific computing
  – LU, ocean, barnes, radiosity
• Linear speedup
  – More CPUs -> more performance

Programming support
• IRIX operating system
• Parallel programming
  – C source level with compiler pragmas
  – POSIX threads
  – UNIX processes
• Data placement
  – dplace, dlock, dperf
• Profiling
  – timex, ssrun

Parallel programs
• Functional decomposition
  – Decompose the problem into different tasks
• Domain decomposition
  – Partition the problem's data structure
• Consider:
  – Mapping tasks/parts onto CPUs
  – Coordinating the work and communication of the CPUs

Task decomposition
• Decompose the problem
• Determine dependencies

Task decomposition (cont.)
• Map tasks onto threads
• Compare:
  – Sequential case
  – Parallel case

Efficient programs
• Use many CPUs
  – Measure speedups
• Avoid:
  – Excessive data dependencies
  – Excessive cache misses
  – Excessive inter-node communication

Pros vs. cons
• Multi-processor (128 CPUs)
• Large memory (64 GByte)
• Slow integer CPU
• Performance penalties of shared-memory programming:
  – Data dependencies
  – Off-board memory