Next Generation Cyber-Infrastructure: Integrating Peer-based and Grid Systems
Xiaodong Zhang, College of William and Mary / National Science Foundation
(This talk does not necessarily reflect NSF's official opinions.)

Hardware Cost and Implications
- Processing cost has collapsed: roughly $400,000/MIPS in 1980 (Cray-1), $250/MIPS in 1990 (i860), and $1/MIPS or less by 2002.
- Storage is large and cheap; information and computing are available everywhere.
- Major challenges: distributed resource management, security and privacy, availability, and reliability.

Impact on US Computer Exports
- Speed limits on computer exports to Russia, China, India, and Middle Eastern countries, measured in Millions of Theoretical Operations Per Second (MTOPS).
- Before 2001: 28,000 MTOPS, less powerful than a cluster of ten 1.5 GHz/2-way PCs.
- 2001: 85,000 MTOPS, less powerful than a cluster of ten 2.2 GHz/4-way PCs.
- 2002: 195,000 MTOPS, less powerful than a cluster of ten 3 GHz/8-way PCs.

MTOPS Hardly Reflects Reality
- MTOPS views a computer as a high-performance calculator: it ignores the deep memory hierarchy, fast internal interconnects, the power of clusters, and resource sharing over the Internet.
- The Senate passed a bill to remove the MTOPS limit on 9/6/01.
- Computing power is mainly determined by effective utilization of aggregated networked resources.

Commodity-Processor-Based Clusters
- Cluster technology has matured, providing sufficient computing resources for 90% of applications; Dawning-4000A is ranked number 10 in the Top 500.
- Who takes care of the remaining 10%, the ultra-scale applications?
- High-end systems address scalability (scaling to tens of thousands of nodes), reliability (running for thousands of hours), and management of the deep memory hierarchy (fast data delivery).
- High-end computing != grid and cluster computing!

Client/Server-Based IT Infrastructure
- Services are provided by data/computing centers; grids and Web search engines are server-based.
- Each server can be built from a distributed cluster, with inter- and intra-server resource coordination.
- Services are guaranteed and trusted, and security is enforced within each server.

Client/Server-Based Grid System
- Original vision and state of the art: a grid is a global networking infrastructure connecting multiple high-performance computational resources.
- Targeted applications: supercomputing across the globe, collaborative computing, and global data repositories with data-intensive computing.
- Core technology: centralized administration (e.g., resource registration) and centralized management (e.g., job scheduling), as sketched below.
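The following is a minimal, purely illustrative sketch of that centralized model: every site registers with one center, and the center assigns every job. The class names, fields, and the least-loaded placement rule are assumptions made for this sketch, not part of any particular grid middleware.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Resource:
    name: str        # e.g., one registered grid site
    free_cpus: int   # CPUs currently available at that site

@dataclass
class Job:
    job_id: str
    cpus_needed: int

class GridCenter:
    """Toy central authority: keeps the registry and schedules every job."""

    def __init__(self) -> None:
        self.registry: List[Resource] = []

    def register(self, resource: Resource) -> None:
        # Centralized administration: each site must register with the center.
        self.registry.append(resource)

    def schedule(self, job: Job) -> Optional[str]:
        # Centralized management: the center picks a site for every job.
        candidates = [r for r in self.registry if r.free_cpus >= job.cpus_needed]
        if not candidates:
            return None                                    # no site can host the job now
        site = max(candidates, key=lambda r: r.free_cpus)  # least-loaded feasible site
        site.free_cpus -= job.cpus_needed
        return site.name

center = GridCenter()
center.register(Resource("NCSA-cluster", free_cpus=512))
center.register(Resource("NPACI-cluster", free_cpus=256))
print(center.schedule(Job("j1", cpus_needed=300)))   # -> NCSA-cluster
print(center.schedule(Job("j2", cpus_needed=300)))   # -> None (nothing large enough left)
```

Because every registration and every scheduling decision passes through the single center, services are easy to guarantee and secure, but the same center becomes the scalability limit and single point of failure discussed later in these notes.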
NSF-Sponsored Grid Efforts
- 1997 to 2002: two Partnerships for Advanced Computational Infrastructure (PACI), NCSA at Illinois and NPACI at San Diego, leading 60+ institutions from 27 states.
- Missions: providing grid computing and data resources, developing grid software tools, supporting applications on grids, and education, outreach, and training.

Building National Grid Infrastructure
- 2001 to 2004: Distributed Terascale Facility (DTF), with 4 sites (NCSA, NPACI, Argonne, and Caltech) providing an aggregate of 14+ teraflops and 450+ terabytes.
- NCSA: 6+ TFLOPS and 240+ TB on a Linux cluster of Itaniums.
- NPACI: 4+ TFLOPS and 225+ TB.
- Argonne: 1+ TFLOPS IBM cluster, plus grid and visualization software.
- Caltech: 86 TB of online storage.

Large NSF-Sponsored Grid Projects
- GIOD (Globally Interconnected Object Databases): global data storage and access for particle-collider experiments.
- GriPhyN (Grid Physics Network): building global grids for experimental physics studies.
- iVDGL (International Virtual Data Grid Laboratory): grids for physics/astronomy experiments and data-intensive science; a US-EU collaboration.
- NEES (Network for Earthquake Engineering Simulation): shifting from physical tests to simulation (20 grid sites).

Additional NSF Grid Efforts
- 2003 to 2005: Enhanced Distributed Terascale Facility, the 4 original DTF sites plus the Pittsburgh Supercomputing Center.
- Tasks: enhancing the existing DTF software and hardware, testing large-scale applications, and connecting widely to users.

Limits of Current Grid Systems
- Deploying a grid is still not easy; the application scope is narrow, and killer apps are limited.
- High cost, handled case by case (e.g., the NSF grid projects).
- Increasingly, local clusters will satisfy most applications; the special ones are served by custom-designed HEC systems (the Earth Simulator, Blue Gene).
- Global supercomputing is not cost- and performance-effective: storing data is much cheaper than transferring it.
- Centralized administration and management limit scalability and create single points of failure.

Beyond the Client/Server World: the Internet
- Rapidly growing Internet services are provided by an increasing number of peers.
- A variety of devices, from cell phones to supercomputer centers.
- Pervasive computing: access information and services anytime, anywhere.

The Client/Server Model Is Being Challenged
- No single server or search engine can sufficiently cover the growing Web content.
- About 2 × 10^18 bytes/year are generated on the Internet, but only about 3 × 10^12 bytes/year are available to the public (0.00015%); Google searches only about 1.3 × 10^8 Web pages. (Source: IEEE Internet Computing, 2001)

Client/Server (continued)
- The client/server model seriously limits utilization of available bandwidth and services.
- Popular servers and search engines become traffic bottlenecks, while the high-speed networks connecting many clients sit idle.
- Computing cycles and information at the clients are ignored.

Content Delivery Networks (CDN): A Transition Model
- Servers are decentralized (duplicated) throughout the Internet, but the distributed servers are controlled by a centralized authority (headquarters).
- Examples: Internet content distribution by Akamai, Overcast, and FFnet.
- Both the client/server and CDN models have single points of failure.
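As a small illustration of this transition model, the sketch below routes each client to the duplicated server with the lowest estimated latency, under the control of one central table. The replica names and latency figures are invented for illustration; real CDNs use far more elaborate request routing.

```python
# Hypothetical latency estimates (ms) from client regions to duplicated servers.
REPLICAS = {
    "replica-us-east": {"us": 20, "eu": 110, "asia": 190},
    "replica-eu":      {"us": 100, "eu": 15, "asia": 160},
    "replica-asia":    {"us": 180, "eu": 150, "asia": 25},
}

def redirect(client_region):
    """The central authority sends the client to the replica with the lowest latency."""
    return min(REPLICAS, key=lambda server: REPLICAS[server][client_region])

print(redirect("eu"))    # -> replica-eu
print(redirect("asia"))  # -> replica-asia
```

Replication removes the single traffic bottleneck, but the routing table is still owned by one authority, which is exactly the remaining single point of failure noted above.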
A New Paradigm: Peer-Oriented Systems
- Each peer is both a client (consumer) and a server (producer), with the freedom to join and leave at any time.
- Huge peer diversity: service ability, storage space, networking speed, and service demand.
- A widely decentralized system, opening up both opportunities and new concerns.

Peer-Oriented Systems
[Diagram: a spectrum of architectures, from client/server (e.g., a search engine or a grid), to content delivery networks with duplicated servers (e.g., Akamai), to hybrid P2P with a directory (e.g., Napster), to pure P2P (e.g., Freenet and Gnutella).]

Objectives and Benefits of P2P
- As long as there is no physical break in the network, the target file will always be found.
- Adding more content to the system will not affect its performance (information scalability).
- Adding and removing nodes will not affect its performance (system scalability).

Peer-Oriented Applications
- File sharing: document sharing among peers with no or limited central control.
- Instant messaging (IM): immediate voice and file exchanges among peers.
- Distributed processing: widely utilizing resources available on remote peers.

P2P Network Infrastructure
- Overlay networks: peers communicate with each other at the application layer.
- A peer can "make friends" with any IP address globally, without regard to distance, message types, or the low-level protocols used.
- Peers are not required to understand the physical network, which creates a new domain of development opportunities.

More on Overlay Networks
- Overlay graph: each edge is a TCP connection or a pointer to an IP address.
- Overlay maintenance: (1) periodically ping to verify the liveness of peers; (2) delete the edge to a dead peer; (3) a new peer needs to bootstrap.
- Overlay problems: (1) topology-unaware; (2) duplicated messages; (3) inefficient network usage.

P2P Types and Operations
- Directory-based P2P: a centralized index server directly maps a requesting peer to a serving peer, e.g., Napster.
- Unstructured P2P: peers are randomly connected in the overlay graph, and queries/retrievals are flooded, e.g., Gnutella and KaZaA.
- Structured P2P: peers are deliberately connected in the overlay graph by a distributed hash table used for registration and queries/retrievals, e.g., Chord and CAN.
- (Toy sketches of flooding and of a DHT lookup follow the DHT slides below.)

Directory-Based P2P for Sharing Music: Napster
[Diagram: peers join and send queries to the central index, receive answers, and then get files directly from other peers.]

Brief History and Implications of Napster
- 1999/1: Shawn Fanning (a freshman at Northeastern) dropped out and started it.
- 1999/6: Napster began operations for swapping music among peers.
- 1999/12: copyright-violation lawsuit (RIAA), asking for $100K each.
- 2000/3: universities banned it because of heavy traffic, e.g., 25% of the traffic at the University of Wisconsin.
- 2000/5: the VC firm Hummer Winblad invested $15 million in Napster.
- 2000/7/26: a US District judge ordered Napster to stop operating within 2 days.
- 2000/7/28: the 9th US Circuit Court of Appeals ruled that it could continue.
- 2001/2: the Federal Appeals Court ruled that it must stop trading copyrighted music.
- 2001/9: it reached a settlement with music writers/publishers: $26 million for past damages plus a percentage of revenue as it restarted as a paying service in 2002.

How Does Napster Work? (very simple!)
- Application level: (1) a client/server protocol over point-to-point TCP/IP; (2) a central directory server.
- User operation steps: connect to the Napster server (www.napster.com); upload a request list and the client's IP address to the server; the index server searches the list and returns results to that IP; the user pings the music hosts, looking for the best transfer rate; the user chooses one provider for the data transfer.
- The central index server is what prevents this P2P system from scaling.

Unstructured P2P: Gnutella
[Diagram: a query floods from peer to peer across the overlay.]

Super-Node-Based P2P: KaZaA (Morpheus)
[Diagrams: ordinary peers send queries to their super peer; super peers flood queries among themselves; answers are returned, and the file is fetched directly from the serving peer.]

Distributed Hash Table (DHT)
[Diagrams: (key, value) pairs spread across many peers; insert(K1, V1) routes the pair to the responsible peer, and retrieve(K1) routes the lookup to the same peer.]
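To make the unstructured case concrete, here is a minimal sketch of TTL-limited flooding in the style of Gnutella, over a toy overlay graph. The graph, the file placement, and the TTL value are assumptions invented for this sketch.

```python
from collections import deque

# Toy overlay graph: peer -> neighbors (application-level links).
OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
FILES = {"E": {"song.mp3"}}   # which peer holds which files

def flood_query(start, filename, ttl=3):
    """Breadth-first flood: forward the query to all neighbors until the TTL runs out."""
    hits, seen = set(), {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if filename in FILES.get(peer, set()):
            hits.add(peer)
        if remaining == 0:
            continue
        for neighbor in OVERLAY[peer]:
            if neighbor not in seen:      # suppress duplicate forwarding
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits

print(flood_query("A", "song.mp3", ttl=3))   # -> {'E'}
print(flood_query("A", "song.mp3", ttl=1))   # -> set(): TTL too small to reach E
```

This illustrates both claims made earlier: if the overlay is connected and the TTL is large enough, the target is found, but a single query may visit many irrelevant peers, which is the duplicated-message and inefficient-network-usage problem listed above.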
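By contrast, a structured system gives every key exactly one responsible peer. The sketch below uses simple consistent hashing on a ring, loosely in the spirit of Chord but greatly simplified (a single global view, no finger tables, no churn handling); the peer names and the 16-bit identifier space are assumptions for illustration only.

```python
import hashlib
from bisect import bisect_left
from typing import List, Optional

RING = 2 ** 16   # small identifier space, just for illustration

def ring_id(name: str) -> int:
    """Hash a peer name or a key onto the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

class SimpleDHT:
    def __init__(self, peers: List[str]) -> None:
        # Order peers by their position on the ring.
        self.ring = sorted((ring_id(p), p) for p in peers)
        self.store = {p: {} for p in peers}           # each peer's local storage

    def _owner(self, key: str) -> str:
        """The responsible peer is the first one at or after hash(key), wrapping around."""
        ids = [pid for pid, _ in self.ring]
        idx = bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def insert(self, key: str, value: str) -> None:
        self.store[self._owner(key)][key] = value     # insert(K, V) at the responsible peer

    def retrieve(self, key: str) -> Optional[str]:
        return self.store[self._owner(key)].get(key)  # retrieve(K) from the same peer

dht = SimpleDHT(["peer1", "peer2", "peer3", "peer4"])
dht.insert("song.mp3", "address of the peer holding the file")
print(dht.retrieve("song.mp3"))   # -> 'address of the peer holding the file'
```

Lookups need no flooding because hashing determines the responsible peer directly; the price is that the ring must be repaired as peers join and leave.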
Problem 1: Losing Security and Privacy
- Provides a conduit for malicious code and viruses.
- Provides loopholes for information leakage.
- Relaxes privacy protection by exposing peer identities.

Problem 2: Weak Resource Coordination
- With limited or no central control, peers rely mainly on self-organization.
- Lacking communication monitoring and scheduling causes unnecessary traffic jams.
- Lacking access and service coordination leaves loads unbalanced among peers.

Demanded Solution (1): Fast Peer Services
- Dynamically identify and recruit trusted, guaranteed peers as the backbone.
- Establish adaptive self-organization and monitoring for resource coordination.
- Provide fast data and service searching within a low-diameter region.

(2): Allowing Distrustful Peers to Exist
- Ensure that peer interactions do not become intrusive (monitoring/scheduling), do protect privacy (communication anonymity), and cannot be used for denial-of-service attacks (security).

(3): Measurable Security Metrics
- Benchmarks for security measurement.
- Stochastic models for security analysis.
- Validating systems and quantifying degrees of security.

(4): Understanding the Trade-offs
- Analyzing the impact of centralized controls on performance and security.
- Quantifying the security loss and the performance gain/loss from decentralization.
- Optimizing peer-oriented systems for individual and combined objectives: high performance, high security, a balance of both, or the best achievable security for a given performance objective.

(5): Utilizing the Existing Infrastructure
- New standards and protocols should be easy to implement on the existing Internet.
- Avoid modifying commonly used, general-purpose software.
- Peer-oriented processing should be automatic, with little user involvement.

Factors Determining P2P or Not P2P
- Budget: applications demanding cost-effectiveness.
- Resource relevance to peers: common interests.
- Security: mutual trust among peers.
- Rate of peer churn: relatively stable applications.
- Non-critical solutions: QoS is not guaranteed.

NSF's Efforts on Cyberinfrastructure
- Grids: provide a global problem-solving environment for large, critical scientific applications and professional collaborations, where each grid is a server. Funding sources: H&S infrastructure (continuous support) and large ITRs on applications (2000-2003).
- P2P: provides a globally decentralized system in which anyone can participate. Funding source: a large ITR for DHT (2002).

Application Differences: Grid & P2P
- Grid: provides (1) a global problem-solving environment for large scientific applications, (2) commercial/public services, and (3) professional collaborations, where each grid is a server.
- P2P: provides self-organized information sharing/searching services, where each peer can be both server and client.

Operation Differences: Grid & P2P
- Grid: targeted access to computing, software, and data resources at specific remote sites (server-based).
- P2P: opportunistic access to whatever computing, software, and data resources happen to be available, without a specific target (client-based).

Different Participants: Grid & P2P
- Grid: pre-determined and registered clients and servers.
- P2P: clients and servers are neither distinguished nor registered (for identity purposes); peers come and go as they choose.

Different QoS: Grid & P2P
- Grid: guaranteed, reliable services are required of each grid server.
- P2P: only partially reliable, because services from some peers are neither guaranteed nor trusted.

Security Differences: Grid & P2P
- Grid: authentication, authorization, and firewall protection for each grid.
- P2P: privacy, anonymity, authentication, authorization, and firewall protection are not guaranteed for each peer.

Different Controls: Grid & P2P
- Grid: centralized control plays an important role in resource monitoring/allocation and job scheduling.
- P2P: limited or no central control; peers rely mainly on self-organization.

Edge and Utility Computing
- Objective: adaptively, promptly, and (temporarily) move content and computing resources from centralized centers to sites at the edge, close to the end users.
- Benefits: QoS improvement (e.g., low response time), high resource utilization, easy manageability and high availability of services, and high cost-effectiveness.

Core Technology and Challenges
- Dynamic resource provisioning: deployment of Internet applications on demand.
- What we do not yet have but need: automation of resource provisioning, optimization of resource provisioning, and effective service distribution for resource provisioning.
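One way to picture the automation that is demanded here is a simple control loop: observe the demand at each edge site and add or remove application replicas to keep utilization inside a target band. The threshold policy, capacity figure, and site names below are assumptions for this sketch; real provisioning must also weigh cost, placement, and service-level objectives.

```python
def provision(replicas, requests_per_sec, capacity_per_replica=100.0,
              low=0.3, high=0.8):
    """Return the new replica count for one edge site (toy threshold policy)."""
    utilization = requests_per_sec / (replicas * capacity_per_replica)
    if utilization > high:
        return replicas + 1                # demand spike: deploy another replica
    if utilization < low and replicas > 1:
        return replicas - 1                # idle capacity: release a replica
    return replicas

# One pass of the control loop over hypothetical edge sites.
sites = {"edge-virginia": (2, 250.0), "edge-frankfurt": (4, 90.0)}
for name, (replicas, load) in sites.items():
    print(name, "->", provision(replicas, load))
# edge-virginia: 250 / (2 * 100) = 1.25 > 0.8, so grow to 3 replicas
# edge-frankfurt: 90 / (4 * 100) = 0.225 < 0.3, so shrink to 3 replicas
```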
Merging P2P and Grid
- Objective: building a scalable and reliable cyber resource-sharing system.
- Keys: resource administration and management; keeping the merits of grids (security and reliable services); keeping the merits of P2P (scalability and no single point of failure); balancing the trade-offs between grid and P2P. (A purely illustrative sketch of such a hybrid closes these notes.)

Heterogeneous Internet Members Co-exist
- Billions of clients in the form of cell phones, PDAs, laptops, and home PCs (Internet/wireless).
- Millions of clients acting as super-peer nodes.
- Millions of powerful clusters for local services.
- Millions of trusted, independent grid nodes providing services.
- Millions of trusted, collaborative grid nodes providing services.
- Dozens of supercomputers for advancing science.

Future of Distributed Computing
- Grid infrastructure will provide reliable service (and some computing) resources.
- Within a grid region, P2P techniques will be integrated for resource administration and management.
- The P2P paradigm will play a major role in information retrieval.
- The demand for data access and transfer will be higher than the demand for cycles.
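To close, here is a purely illustrative sketch of the hybrid idea above: trusted, registered grid nodes form an always-on directory backbone (like super peers, but guaranteed), while ordinary peers attach to the backbone, publish what they share, and are never flooded with queries. All names and the least-loaded attachment rule are assumptions made for this sketch, not a description of any existing system.

```python
from typing import Dict, List

class BackboneNode:
    """A trusted, always-on grid node acting as a directory super-peer."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, List[str]] = {}   # filename -> peers that hold it

class HybridSystem:
    def __init__(self, backbone_names: List[str]) -> None:
        self.backbone = [BackboneNode(n) for n in backbone_names]

    def _least_loaded(self) -> BackboneNode:
        # An ordinary peer attaches to the backbone node with the smallest index.
        return min(self.backbone, key=lambda b: len(b.index))

    def publish(self, peer: str, filename: str) -> None:
        # P2P side: any peer may join and publish what it shares.
        self._least_loaded().index.setdefault(filename, []).append(peer)

    def lookup(self, filename: str) -> List[str]:
        # Grid side: queries touch only the trusted backbone, never flood the peers.
        hits: List[str] = []
        for node in self.backbone:
            hits.extend(node.index.get(filename, []))
        return hits

system = HybridSystem(["grid-node-A", "grid-node-B"])
system.publish("laptop-17", "dataset.tar")
system.publish("pc-42", "dataset.tar")
print(system.lookup("dataset.tar"))   # -> ['laptop-17', 'pc-42']
```

The backbone keeps the grid's merits (trusted, reliable service and easy coordination), while the open publish/lookup interface keeps the P2P merits (anyone may join or leave, and the backbone can be replicated so there is no single point of failure).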