HPCC - Chapter 1 - Auburn Engineering
High Performance Cluster Computing: Architectures and Systems
Book Editor: Rajkumar Buyya
Slides Prepared by: Hai Jin
Internet and Cluster Computing Center

Introduction
- Need more computing power
  - Improve the operating speed of processors & other components
    - constrained by the speed of light, thermodynamic laws, & the high financial costs for processor fabrication
  - Connect multiple processors together & coordinate their computational efforts
    - parallel computers allow the sharing of a computational task among multiple processors

Era of Computing
- Rapid technical advances
  - the recent advances in VLSI technology
  - software technology
    - OS, programming languages (PL), development methodologies, & tools
  - grand challenge applications have become the main driving force
- Parallel computing
  - one of the best ways to overcome the speed bottleneck of a single processor
  - good price/performance ratio of a small cluster-based parallel computer

Need of more Computing Power: Grand Challenge Applications
Solving technology problems using computer modeling, simulation and analysis
- Geographic Information Systems
- Life Sciences
- CAD/CAM
- Aerospace
- Digital Biology
- Military Applications

Parallel Computer Architectures
- Taxonomy
  - based on how processors, memory & interconnect are laid out
  - Massively Parallel Processors (MPP)
  - Symmetric Multiprocessors (SMP)
  - Cache-Coherent Nonuniform Memory Access (CC-NUMA)
  - Distributed Systems
  - Clusters
  - Grids

Parallel Computer Architectures
- MPP
  - A large parallel processing system with a shared-nothing architecture
  - Consists of several hundred nodes with a high-speed interconnection network/switch
  - Each node consists of a main memory & one or more processors
    - Runs a separate copy of the OS
- SMP
  - 2-64 processors today
  - Shared-everything architecture
  - All processors share all the global resources available
  - Single copy of the OS runs on these systems

Parallel Computer Architectures
- CC-NUMA
  - a scalable multiprocessor system having a cache-coherent nonuniform memory access architecture
  - every processor has a global view of all of the memory
- Distributed systems
  - can be considered conventional networks of independent computers
  - have multiple system images, as each node runs its own OS
  - the individual machines could be combinations of MPPs, SMPs, clusters, & individual computers
- Clusters
  - a collection of workstations or PCs that are interconnected by a high-speed network
  - work as an integrated collection of resources
  - have a single system image spanning all their nodes

Towards Low Cost Parallel Computing
- Parallel processing
  - linking together 2 or more computers to jointly solve some computational problem
  - since the early 1990s, an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations
  - driven by the rapid improvement in the availability of commodity high performance components for workstations and networks
- Low-cost commodity supercomputing
  - from specialized traditional supercomputing platforms to cheaper, general purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations
  - need for standardization of many of the tools and utilities used by parallel applications (e.g., MPI, HPF)

Windows of Opportunities
- Parallel Processing
  - Use multiple processors to build MPP/DSM-like systems for parallel computing
- Network RAM
  - Use memory associated with each workstation as aggregate DRAM cache
- Software RAID
  - Redundant array of inexpensive disks
  - Use arrays of workstation disks to provide cheap, highly available, & scalable file storage
  - Possible to provide parallel I/O support to applications
- Multipath Communication
  - Use multiple networks for parallel data transfer between nodes

Cluster Computer and its Architecture
- A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource
- A node
  - a single or multiprocessor system with memory, I/O facilities, & OS
- generally 2 or more computers (nodes) connected together
  - in a single cabinet, or physically separated & connected via a LAN
- appear as a single system to users and applications
- provide a cost-effective way to gain features and benefits

Cluster Computer Architecture
[Figure: cluster computer architecture]

Prominent Components of Cluster Computers (I)
- Multiple High Performance Computers
  - PCs
  - Workstations
  - SMPs (CLUMPs)
  - Distributed HPC Systems leading to Metacomputing

Prominent Components of Cluster Computers (III)
- High Performance Networks/Switches
  - Ethernet (10Mbps), Fast Ethernet (100Mbps), Gigabit Ethernet (1Gbps)
  - SCI (Dolphin - MPI - 12 µs latency)
  - ATM
  - Myrinet (1.2Gbps)
  - Digital Memory Channel
  - FDDI

Prominent Components of Cluster Computers (V)
- Fast Communication Protocols and Services
  - Active Messages (Berkeley)
  - Fast Messages (Illinois)
  - U-net (Cornell)
  - XTP (Virginia)

Prominent Components of Cluster Computers (VI)
- Cluster Middleware
  - Single System Image (SSI)
  - System Availability (SA) Infrastructure
- Hardware
  - DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques
- Operating System Kernel/Gluing Layers
  - Solaris MC, Unixware, GLUnix
- Applications and Subsystems
  - Applications (system management and electronic forms)
  - Runtime systems (software DSM, PFS etc.)
  - Resource management and scheduling software (RMS)
    - CODINE, LSF, PBS, NQS, etc.

Prominent Components of Cluster Computers (VII)
- Parallel Programming Environments and Tools
  - Threads (PCs, SMPs, NOW..)
    - POSIX Threads
    - Java Threads
  - MPI
    - Linux, NT, on many Supercomputers
  - PVM
  - Software DSMs (TreadMarks)
  - Compilers
    - C/C++/Java
    - Parallel programming with C++ (MIT Press book)
  - RAD (rapid application development tools)
    - GUI based tools for parallel program modeling
  - Debuggers
  - Performance Analysis Tools
  - Visualization Tools

Prominent Components of Cluster Computers (VIII)
- Applications
  - Sequential
  - Parallel / Distributed (Cluster-aware app.)
    - Grand Challenge applications
      - Weather Forecasting
      - Quantum Chemistry
      - Molecular Biology Modeling
      - Engineering Analysis (CAD/CAM)
      - ...
    - PDBs, web servers, data-mining

Key Operational Benefits of Clustering
- High Performance
- Expandability and Scalability
- High Throughput
- High Availability

Clusters Classification (III)
- Node Hardware
  - Clusters of PCs (CoPs)
    - Piles of PCs (PoPs)
  - Clusters of Workstations (COWs)
  - Clusters of SMPs (CLUMPs)

Clusters Classification (V)
- Node Configuration
  - Homogeneous Clusters
    - All nodes have similar architectures and run the same OS
  - Heterogeneous Clusters
    - Nodes may have different architectures and run different OSs

Clusters Classification (VI)
- Levels of Clustering
  - Group Clusters (#nodes: 2-99)
    - Nodes are connected by a SAN such as Myrinet
  - Departmental Clusters (#nodes: 10s to 100s)
  - Organizational Clusters (#nodes: many 100s)
  - National Metacomputers (WAN/Internet-based)
  - International Metacomputers (Internet-based, #nodes: 1000s to many millions)
    - Metacomputing
    - Web-based Computing
    - Agent Based Computing
      - Java plays a major role in web and agent based computing

Commodity Components for Clusters (III)
- Disk and I/O
  - Overall improvement in disk access time has been less than 10% per year
  - Amdahl's law
    - Speed-up obtained from faster processors is limited by the slowest system component
  - Parallel I/O
    - Carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID

Cluster Middleware & SSI
- SSI
  - Supported by a middleware layer that resides between the OS and user-level environment
  - Middleware consists of essentially 2 sublayers of SW infrastructure
    - SSI infrastructure
      - Glue together OSs on all nodes to offer unified access to system resources
    - System availability infrastructure
      - Enable cluster services such as checkpointing, automatic failover, recovery from failure, & fault-tolerant support among all nodes of the cluster

What is Single System Image (SSI)?
- A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
- SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
- A cluster without a SSI is not a cluster

SSI Boundaries
[Figure: an application's SSI boundary and a batch system's SSI boundary. (c) In search of clusters]

Single System Image Benefits
- Provide a simple, straightforward view of all system resources and activities, from any node of the cluster
- Free the end user from having to know where an application will run
- Free the operator from having to know where a resource is located
- Let the user work with familiar interfaces and commands, and allow the administrators to manage the entire cluster as a single entity
- Reduce the risk of operator errors, with the result that end users see improved reliability and higher availability of the system

Single System Image Benefits (Cont'd)
- Allow centralized or decentralized system management and control, reducing the need for skilled administrators in system administration
- Present multiple, cooperating components of an application to the administrator as a single application
- Greatly simplify system management
- Provide location-independent message communication
- Help track the locations of all resources so that system operators no longer need to be concerned with their physical location
- Provide transparent process migration and load balancing across nodes
- Improve system response time and performance

Resource Management and Scheduling (RMS)
- RMS is the act of distributing applications among computers to maximize their throughput
- Enable the effective and efficient utilization of the resources available
- Software components
  - Resource manager
    - Locating and allocating computational resources, authentication, process creation and migration
  - Resource scheduler
    - Queueing applications, resource location and assignment
- Reasons for using RMS
  - Provide an increased, and reliable, throughput of user applications on the systems
  - Load balancing
  - Utilizing spare CPU cycles
  - Providing fault tolerant systems
  - Manage access to powerful systems, etc.
- Basic architecture of RMS: client-server system

Services provided by RMS
- Process Migration
  - when a computational resource has become too heavily loaded
  - for fault tolerance
- Checkpointing
- Scavenging Idle Cycles
  - 70% to 90% of the time most workstations are idle
- Fault Tolerance
- Minimization of Impact on Users
- Load Balancing
- Multiple Application Queues

Computing Platforms Evolution: Breaking Administrative Barriers
[Figure: performance rising as computing platforms break administrative barriers, from Individual, Group, Department, Campus, State, National, and Globe to Inter Planet and Universe; platforms range from the single-processor desktop through SMPs or supercomputers, local clusters, enterprise clusters/grids, and global clusters/grids to inter-planet clusters/grids.]

Why Do We Need Metacomputing?
- Our computational needs are infinite, whereas our financial resources are finite
  - users will always want more & more powerful computers
- try & utilize the potentially hundreds of thousands of computers that are interconnected in some unified way
- need seamless access to remote resources

Towards Grid Computing...

What is Grid?
- An infrastructure that couples
  - Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on)
  - Software (e.g., renting expensive special purpose applications on demand)
  - Databases (e.g., transparent access to the human genome database)
  - Special Instruments (e.g., radio telescopes: SETI@Home searching for life in the galaxy, Astrophysics@Swinburne for pulsars)
  - People (maybe even animals, who knows?)
- across the Internet, and presents them as a unified, integrated (single) resource
http://www.csse.monash.edu.au/~rajkumar/ecogrid/

Conceptual view of the Grid
[Figure: conceptual view of the Grid, leading to portal (super)computing]

Grid Application-Drivers
- Old and new applications getting enabled due to coupling of computers, databases, instruments, people, etc.
  - (distributed) Supercomputing
  - Collaborative engineering
  - High-throughput computing
    - large scale simulation & parameter studies
  - Remote software access / Renting Software
  - Data-intensive computing
  - On-demand computing

The Grid Impact
"The global computational grid is expected to drive the economy of the 21st century similar to the electric power grid that drove the economy of the 20th century"

Metacomputer Design Objectives and Issues (II)
- Underlying Hardware and Software Infrastructure
  - A metacomputing environment must be able to operate on top of the whole spectrum of current and emerging HW & SW technology
  - An ideal environment will provide access to the available resources in a seamless manner such that physical discontinuities, such as differences between platforms, network protocols, and administrative boundaries, become completely transparent

Metacomputer Design Objectives and Issues (III)
- Middleware: The Metacomputing Environment
  - Communication services
    - need to support protocols that are used for bulk-data transport, streaming data, group communications, and those used by distributed objects
  - Directory/registration services
    - provide the mechanism for registering and obtaining information about the metacomputer structure, resources, services, and status
  - Processes, threads, and concurrency control
    - share data and maintain consistency when multiple processes or threads have concurrent access to it

Metacomputer Design Objectives and Issues (V)
- Middleware: The Metacomputing Environment
  - Security and authorization
    - confidentiality: prevent disclosure of data
    - integrity: prevent tampering with data
    - authentication: verify identity
    - accountability: knowing whom to blame
  - System status and fault tolerance
  - Resource management and scheduling
    - efficiently and effectively schedule the applications that need to utilize the available resources in the metacomputing environment

Metacomputer Design Objectives and Issues (VI)
- Middleware: The Metacomputing Environment
  - Programming tools and paradigms
    - include interfaces, APIs, and conversion tools so as to provide a rich development environment
    - support a range of programming paradigms
    - a suite of numerical and other commonly used libraries should be available
  - User and administrative GUI
    - intuitive and easy to use interface to the services and resources available
  - Availability
    - easily port onto a range of commonly used platforms, or use technologies that enable it to be platform neutral

Metacomputing Projects
- Globus (from Argonne National Laboratory)
  - provides a toolkit on a set of existing components to build metacomputing environments
- Legion (from the University of Virginia)
  - provides a high-level unified object model out of new and existing components to build a metasystem
- Webflow (from Syracuse University)
  - provides a Web-based metacomputing environment

Globus (I)
- A computational grid
  - A hardware and software infrastructure to provide dependable, consistent, and pervasive access to high-end computational capabilities, despite the geographical distribution of both resources and users
- A layered architecture
  - high-level global services are built upon essential low-level core local services
- Globus Toolkit (GT)
  - a central element of the Globus system
  - defines the basic services and capabilities required to construct a computational grid
  - consists of a set of components that implement basic services
  - provides a bag of services
  - only possible when the services are distinct and have well-defined interfaces (API)

Globus (II)
- Globus Alliance
  - http://www.globus.org
- GT 3.0
  - Resource management (GRAM)
  - Information Service (MDS)
  - Data Management (GridFTP)
  - Security (GSI)
- GT 4.0 (2005)
  - Execution management
  - Information Services
  - Data management
  - Security
  - Common runtime (WS)

The Impact of Metacomputing
- Metacomputing is an infrastructure that can bond and unify globally remote and diverse resources
- At some stage in the future, our computing needs will be satisfied in the same pervasive and ubiquitous manner that we use the electricity power grid
 
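The Windows of Opportunities slide proposes software RAID over workstation disks. The following is a minimal Python sketch of that idea, assuming a simplified RAID-4-style layout: one dedicated parity disk, tiny 4-byte stripe units, and in-memory lists standing in for disks. It shows how striping plus XOR parity lets any single lost disk be rebuilt. The names `stripe`, `rebuild`, `read_back`, and `BLOCK` are hypothetical; real RAID-5 rotates the parity blocks across all disks.

```python
from functools import reduce

# Hypothetical sketch: RAID-4-style striping with a dedicated parity disk.
# Each "disk" is just a Python list of byte blocks.
BLOCK = 4  # bytes per stripe unit (deliberately tiny for illustration)


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data: bytes, n_data_disks: int):
    """Deal data round-robin across the data disks, one parity block per stripe.

    Returns (disks, original_length); the last disk holds the parity blocks.
    """
    width = BLOCK * n_data_disks
    padded = data + b"\0" * (-len(data) % width)  # zero-pad to whole stripes
    disks = [[] for _ in range(n_data_disks + 1)]
    for off in range(0, len(padded), width):
        units = [padded[off + i * BLOCK: off + (i + 1) * BLOCK]
                 for i in range(n_data_disks)]
        for i, unit in enumerate(units):
            disks[i].append(unit)
        disks[-1].append(xor_blocks(units))
    return disks, len(data)


def rebuild(disks, failed: int):
    """Recreate every block of disk `failed` by XOR-ing the surviving disks."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks(stripe_units) for stripe_units in zip(*survivors)]


def read_back(disks, length: int) -> bytes:
    """Reassemble the original bytes from the data disks (parity is skipped)."""
    data = b"".join(unit for stripe_units in zip(*disks[:-1])
                    for unit in stripe_units)
    return data[:length]
```

For example, `stripe(b"cluster computing!", 3)` produces three data disks plus one parity disk, and `rebuild(disks, 1)` recovers disk 1 from the other three. XOR is used because it is its own inverse: XOR-ing the parity block with all surviving data blocks of a stripe leaves exactly the missing block.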
									 
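The same slide lists multipath communication, which uses multiple networks for parallel data transfer between nodes. Here is a small sketch of the core bookkeeping, under the assumption of round-robin striping and per-chunk sequence numbers. The functions `split_across_links` and `reassemble` are hypothetical, and each "link" is simply a list of packets. Real multipath transports must also deal with loss, retransmission, and links of unequal speed, none of which is modeled here.

```python
import random

# Hypothetical sketch: round-robin striping of one message over several links.
# Each "link" is a list of (sequence_number, payload) packets.


def split_across_links(data: bytes, n_links: int, chunk: int = 4):
    """Cut data into sequence-numbered chunks and deal them round-robin onto links."""
    links = [[] for _ in range(n_links)]
    for seq, off in enumerate(range(0, len(data), chunk)):
        links[seq % n_links].append((seq, data[off:off + chunk]))
    return links


def reassemble(packets):
    """Rebuild the message from (seq, payload) packets arriving in any order."""
    return b"".join(payload for _, payload in sorted(packets))


if __name__ == "__main__":
    links = split_across_links(b"parallel data transfer", 3)
    arrivals = [pkt for link in links for pkt in link]
    random.shuffle(arrivals)  # packets from different links interleave unpredictably
    print(reassemble(arrivals))
```

Sequence numbers are what make the transfer order-independent: the receiver never needs to know which link a chunk came over.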
									 
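The Disk and I/O slide invokes Amdahl's law: if a fraction f of the run time is made k times faster, the overall speedup is 1 / ((1 - f) + f / k). The untouched part, for example disk I/O, therefore caps the gain at 1 / (1 - f). The following is a short sketch of that formula. The 90% compute / 10% I/O split is an assumed workload for illustration only, and `amdahl_speedup` is a hypothetical helper name.

```python
# Amdahl's law: overall speedup when only part of the run time gets faster.
# Hypothetical workload split (90% compute, 10% disk I/O) chosen for illustration.


def amdahl_speedup(fraction_improved: float, factor: float) -> float:
    """Return 1 / ((1 - f) + f / k) for improved fraction f and local speedup k."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


if __name__ == "__main__":
    for k in (2, 10, 100, 1_000_000):
        print(f"CPU {k}x faster -> whole job {amdahl_speedup(0.9, k):.2f}x faster")
    # However large k gets, the unimproved 10% of I/O caps the speedup at 1 / 0.1 = 10x.
```

Running it shows the whole job speeding up by roughly 1.82x, 5.26x, 9.17x, and 10.00x. A million-fold faster CPU buys essentially nothing beyond a hundred-fold one, which is the slide's argument for parallel I/O.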
									 
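The RMS slides split resource management into a resource manager (locating and allocating resources) and a resource scheduler (queueing applications and assigning them). The following toy sketch folds both roles into one hypothetical `Scheduler` class. It makes several assumptions: nodes are described only by a core count, jobs by the cores they request, placement is "least-loaded node that fits", and the queue is strict FIFO. None of this reflects CODINE, LSF, PBS, or NQS specifically.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    busy: int = 0  # cores currently allocated to running jobs

    @property
    def load(self) -> float:
        return self.busy / self.cores


@dataclass
class Job:
    name: str
    cores: int


class Scheduler:
    """Hypothetical toy RMS: FIFO job queue plus least-loaded placement."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.queue = deque()
        self.placement = {}  # job name -> node name

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def schedule(self) -> None:
        """Place queued jobs in order; stop at the first job that does not fit,
        so that later, smaller jobs cannot starve it (strict FIFO)."""
        while self.queue:
            job = self.queue[0]
            fits = [n for n in self.nodes if n.cores - n.busy >= job.cores]
            if not fits:
                break
            node = min(fits, key=lambda n: n.load)  # load balancing
            node.busy += job.cores
            self.placement[job.name] = node.name
            self.queue.popleft()

    def finish(self, job: Job) -> None:
        """Release a finished job's cores and try to start waiting jobs."""
        node_name = self.placement.pop(job.name)
        node = next(n for n in self.nodes if n.name == node_name)
        node.busy -= job.cores
        self.schedule()
```

Strict FIFO was chosen for simplicity. It can leave cores idle while a large job waits at the head of the queue, which is why production schedulers add backfilling or multiple application queues.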
									 
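Checkpointing appears both in the system availability infrastructure and in the services provided by RMS. Below is a minimal application-level sketch, assuming the program's state fits in a small JSON document. The function `checkpointed_sum_of_squares` and its `fail_after` parameter are hypothetical; `fail_after` simulates a node failure. Progress is saved periodically, and a rerun resumes from the last checkpoint instead of starting over. Writing to a temporary file and then calling `os.replace` keeps a crash during the write from leaving a half-written checkpoint.

```python
import json
import os
import tempfile

# Hypothetical sketch of application-level checkpoint/restart.


def checkpointed_sum_of_squares(n, path, every=1000, fail_after=None):
    """Compute sum(i*i for i in range(n)), checkpointing to `path` every
    `every` iterations so that a restarted run resumes from the last checkpoint.

    `fail_after` simulates a node failure after that many iterations of this run.
    """
    state = {"i": 0, "total": 0}
    if os.path.exists(path):  # restart: resume from the last saved state
        with open(path) as f:
            state = json.load(f)
    done = 0
    while state["i"] < n:
        if fail_after is not None and done == fail_after:
            raise RuntimeError("simulated node failure")
        state["total"] += state["i"] ** 2
        state["i"] += 1
        done += 1
        if state["i"] % every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)  # atomic rename: no half-written checkpoint
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        checkpointed_sum_of_squares(10_000, ckpt, fail_after=4_500)
    except RuntimeError as err:
        print("first run:", err)
    # The rerun restarts from i = 4000, the last checkpoint, not from 0.
    print("result:", checkpointed_sum_of_squares(10_000, ckpt))
```

The work done between the last checkpoint and the failure (here iterations 4000 to 4499) is simply recomputed. The checkpoint interval trades that lost work against the cost of writing checkpoints.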
									 
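The metacomputing middleware slide calls for directory/registration services that record resources, services, and status. This is also how an SSI frees operators "from having to know where a resource is located". The following sketch shows one plausible shape for such a service, assuming entries expire after a time-to-live unless re-registered, so that re-registration doubles as a heartbeat. The `Directory` class, its methods, and the resource and node names in the test are all hypothetical. An injectable clock makes the expiry behavior easy to demonstrate.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    node: str
    attrs: dict
    expires_at: float


class Directory:
    """Hypothetical registration/lookup service. Entries expire unless refreshed,
    so the directory also reflects which resources are still alive."""

    def __init__(self, ttl: float = 30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}

    def register(self, name: str, node: str, **attrs) -> None:
        """Add or refresh a resource; calling it periodically acts as a heartbeat."""
        self._entries[name] = Entry(node, attrs, self.clock() + self.ttl)

    def _alive(self):
        now = self.clock()
        return {n: e for n, e in self._entries.items() if e.expires_at > now}

    def lookup(self, name: str):
        """Return the node currently hosting `name`, or None if unknown or expired."""
        entry = self._alive().get(name)
        return entry.node if entry else None

    def find(self, **constraints):
        """Names of live resources whose attributes match every constraint."""
        return sorted(n for n, e in self._alive().items()
                      if all(e.attrs.get(k) == v for k, v in constraints.items()))
```

Clients ask for a resource by logical name or by attributes and never hard-code a host. That indirection is what lets resources move or fail without clients changing.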
									 
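Among the security goals on the metacomputing middleware slide, integrity ("prevent tampering with data") is the easiest to illustrate with the Python standard library. This sketch assumes two grid components already share a secret key, and it uses HMAC-SHA256 via the real `hmac` and `hashlib` modules. The wrapper names `sign` and `verify`, the key, and the message are hypothetical. HMAC only detects tampering and does not hide the data, so confidentiality would additionally need encryption. Accountability would need per-user keys and audit logs.

```python
import hashlib
import hmac

# Hypothetical sketch: message integrity between grid components that share a
# secret key. HMAC detects tampering; it does not provide confidentiality.


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    request = b"submit job 42 to node07"
    tag = sign(key, request)
    print(verify(key, request, tag))                      # True
    print(verify(key, b"submit job 43 to node07", tag))   # False: message altered
```

`hmac.compare_digest` is used instead of `==` so that the time taken to reject a forged tag does not reveal how many leading bytes were correct.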