BlueGene/L System Software
Derek Lieber
IBM T. J. Watson Research Center
February 2004

Slide 2: Topics
- Programming environment
  - Compilation
  - Execution
  - Debugging
- Programming model
  - Processors
  - Memory
  - Files
  - Communications
- What happens under the covers

Slide 3: Programming on BG/L
- A single application program image
- Running on tens of thousands of compute nodes
- Communicating via message passing
- Each image has its own copy of:
  - memory
  - file descriptors

Slide 4: Programming on BG/L
- A "job" is encapsulated in a single host-side process:
  - a merge point for compute-node stdout streams
  - a control point for:
    - signaling (ctl-c, kill, etc.)
    - debugging (attach, detach)
    - termination (exit status collection and summary)

Slide 5: Programming on BG/L
- Cross-compile the source code
- Place the executable on the BG/L machine's shared filesystem
- Run it: "blrun <job information> <program name> <args>"
- Stdout of all program instances appears as stdout of blrun
- Files go to a user-specified directory on the shared filesystem
- blrun terminates when all program instances terminate
- Killing blrun kills all program instances

Slide 6: Compiling and Running on BG/L
[Diagram: sources are built with cross-tools on a workstation; programs and datafiles are placed on a shared filesystem visible to the BG/L machine, which also has a local filesystem; stdout flows back to the workstation.]

Slide 7: Programming Models
- "Coprocessor model":
  - 64k instances of a single application program
  - each with a 255M address space
  - each with two threads (main, coprocessor)
  - non-coherent shared memory
- "Virtual node model":
  - 128k instances
  - 127M address space
  - one thread (main)

Slide 8: Programming Model
- Does a job behave like a group of processes, or a group of threads?
- A little bit of each

Slide 9: A process group?
- Yes:
  - each program instance has its own memory and file descriptors
- No:
  - can't communicate via mmap or shmat
  - can't communicate via pipes or sockets
  - can't communicate via signals (kill)
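The coprocessor and virtual node models above trade address space for instance count. A minimal sketch of that arithmetic, assuming 256 MB of physical memory per compute node and roughly 1 MB reserved by the compute node kernel; those round numbers are my assumptions, chosen because they reproduce the 255M and 127M figures on the slides, not stated in the deck:

```c
/* Per-instance memory under the two BG/L programming models.
 * ASSUMPTIONS: 256 MB per compute node, ~1 MB reserved by the kernel. */

enum bgl_model { COPROCESSOR_MODEL, VIRTUAL_NODE_MODEL };

#define NODE_MEMORY_MB  256
#define KERNEL_RESERVED_MB 1

/* Address space available to one application instance, in MB. */
int per_instance_mb(enum bgl_model m)
{
    if (m == COPROCESSOR_MODEL)
        /* one instance owns the whole node, minus the kernel: 255 */
        return NODE_MEMORY_MB - KERNEL_RESERVED_MB;
    else
        /* two instances split the node, each minus a kernel share: 127 */
        return NODE_MEMORY_MB / 2 - KERNEL_RESERVED_MB;
}

/* Total program instances on a machine with the given node count
 * (in thousands of nodes): 64k nodes -> 64k or 128k instances. */
int instances_k(enum bgl_model m, int nodes_k)
{
    return (m == COPROCESSOR_MODEL) ? nodes_k : 2 * nodes_k;
}
```

With 64k nodes this yields the slide's pairs: 64k instances at 255M each, or 128k instances at 127M each.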
Slide 10: A thread group?
- Yes:
  - the job terminates when all program instances terminate via exit(0), or when any program instance terminates:
    - voluntarily, via exit(!0)
    - involuntarily, via an uncaught signal (kill, abort, segv, etc.)
- No:
  - each program instance has its own set of file descriptors
  - each has its own private memory space

Slide 11: Compilers and libraries
- GNU C, Fortran, and C++ compilers can be used with BG/L, but they do not exploit the 2nd FPU
- IBM xlf/xlc compilers have been ported to BG/L, with code generation and optimization features for the dual FPU
- Standard glibc library
- MPI for communications

Slide 12: System calls
- Traditional ANSI plus "a little" POSIX:
  - I/O: open, close, read, write, etc.
  - Time: gettimeofday, etc.
  - Signal catchers:
    - synchronous (sigsegv, sigbus, etc.)
    - asynchronous (timers and hardware events)

Slide 13: System calls
- No "unix stuff":
  - no fork, exec, pipe
  - no mount, umount, setuid, setgid
- No system calls needed to access most hardware:
  - tree and torus fifos
  - global OR
  - mutexes and barriers
  - performance counters
- Mantra:
  - keep the compute nodes simple
  - the kernel stays out of the way and lets the application program run

Slide 14: Software Stack in a BG/L Compute Node
- CNK controls all access to hardware, and enables a bypass for application use
- User-space libraries and applications can directly access the torus and tree through the bypass
- As a policy, user-space code should not directly touch hardware, but there is no enforcement of that policy
[Diagram: application code and user-space libraries sit above CNK, with a bypass path to the BG/L ASIC.]
Slide 15: What happens under the covers?
- The machine
- The job allocation, launch, and control system
- The machine monitoring and control system

Slide 16: The machine
- Nodes:
  - IO nodes
  - Compute nodes
  - Link nodes
- Communications networks:
  - Ethernet
  - Tree
  - Torus
  - Global OR
  - JTAG

Slide 17: The IO nodes
- 1024 nodes
- talk to the outside world via ethernet
- talk to the inside world via the tree network
- not connected to the torus
- embedded linux kernel
- purpose is to run:
  - the network filesystem
  - job control daemons

Slide 18: The compute nodes
- 64k nodes, each with 2 cpus and 4 fpus
- application programs execute here
- custom, non-preemptive kernel:
  - kernel and application share the same address space
  - the application program has full control of all timing issues
  - the kernel is memory protected
- the kernel provides:
  - program load / start / debug / termination
  - file access
  - all via message passing to the IO nodes

Slide 19: The link nodes
- Signal routing, no computation
- Stitch together cards and racks of IO and compute nodes into "blocks" suitable for running independent jobs
- Isolate each block's tree, torus, and global OR networks

Slide 20: Machine configuration
[Diagram: a machine manager on the host reaches the core and link nodes over jtag/ethernet.]

Slide 21: Kernel booting and monitoring
[Diagram: the machine manager on the host reaches 1024 IO nodes (each running ciod) over jtag/ethernet; each ciod oversees 64 compute nodes running cnk.]

Slide 22: Job execution
[Diagram: blrun on the host connects over tcp/ethernet to the 1024 ciod daemons, which reach their 64 cnk compute nodes over the tree network.]

Slide 23: Blue Gene/L System Software Architecture
[Diagram: front-end nodes, a console, and file servers connect over ethernet to a service node running DB2 and MMCS, plus a scheduler; the machine is divided into psets 0..1023, each pairing one I/O node (Linux + ciod) with C-nodes 0..63 (CNK) over the tree and torus networks; the service node reaches the hardware through an IDo chip via JTAG and I2C.]

Slide 24: Conclusions
- The BG/L system software stack has:
  - a custom solution (CNK) on compute nodes, for high performance
  - a Linux solution on I/O nodes, for flexibility and functionality
  - MPI as the default programming model
- BG/L system software must scale to very large machines:
  - hierarchical organization for management
  - flat organization for programming
  - mixed conventional/special-purpose operating systems
- Many challenges ahead, particularly in performance, scalability, and reliability
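The hierarchical organization named in the conclusions rests on the pset structure of the architecture slide: 1024 I/O nodes, each serving 64 compute nodes. That mapping reduces to simple arithmetic, sketched below; the function names (pset_of, io_nodes_needed) are illustrative, not BG/L APIs:

```c
/* Pset arithmetic implied by the architecture slide: each I/O node
 * (running Linux + ciod) serves a fixed-size pset of compute nodes. */

#define COMPUTE_NODES_PER_PSET 64

/* Which pset (and therefore which I/O node / ciod daemon) serves a
 * given compute node, assuming nodes are numbered consecutively. */
int pset_of(int compute_node)
{
    return compute_node / COMPUTE_NODES_PER_PSET;
}

/* How many I/O nodes a machine of the given size needs (rounded up). */
int io_nodes_needed(int compute_nodes)
{
    return (compute_nodes + COMPUTE_NODES_PER_PSET - 1)
         / COMPUTE_NODES_PER_PSET;
}
```

At full scale the numbers close the loop with the earlier slides: 65536 compute nodes divided into psets of 64 require exactly the 1024 I/O nodes shown.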