Next Generation Cyber-Infrastructure:
Integrating Peer-based and Grid Systems
Xiaodong Zhang
College of William and Mary
National Science Foundation
This talk does not necessarily reflect NSF's official opinions.
Hardware Cost and Implications

Cost per MIPS has fallen dramatically:
- 1980: $400,000/MIPS (Cray-1)
- 1990: $250/MIPS (i860)
- 2002: $1/MIPS or less

Storage is large and cheap.

Information and computing are available everywhere.

Major challenges:
- distributed resource management
- security and privacy
- availability
- reliability
Impact on US Computer Exports

Speed limits on computer exports
- apply to Russia, China, India, and Middle East countries
- measured in Millions of Theoretical Operations Per Second (MTOPS)

Before 2001, limit = 28,000 MTOPS
- less powerful than a cluster of ten 1.5 GHz/2-way PCs.

2001, limit = 85,000 MTOPS
- less powerful than a cluster of ten 2.2 GHz/4-way PCs.

2002, limit = 195,000 MTOPS
- less powerful than a cluster of ten 3 GHz/8-way PCs.
MTOPS Hardly Reflects Reality

MTOPS views a computer as a high-performance calculator. It
- ignores the deep memory hierarchy,
- ignores the fast internal interconnections,
- ignores the power of clusters, and
- ignores resource sharing over the Internet.

The Senate passed a bill to remove MTOPS on 9/6/01.

Computing power is mainly determined by effective utilization of aggregated networked resources.
Commodity-Processor-Based Clusters

Cluster technology has matured, providing sufficient computing resources for 90% of applications.

Dawning-4000A is ranked number 10 in the Top 500.

Who takes care of the remaining 10%, the ultra-scale applications?

High-end systems address the problems of
- Scalability: scaling the system to tens of thousands of nodes.
- Reliability: making the system run for thousands of hours.
- Managing the deep memory hierarchy: fast data delivery.

High-end computing != Grid and cluster computing!
Client/Server-based IT Infrastructure

Services are provided by data/computing centers.

Grids and Web search engines are server-based.

Each server can be built from a distributed cluster.

Inter- and intra-server resource coordination.

Services are guaranteed and trusted.

Security is enforced within each server.
Client/Server-based Grid System

Original vision and state-of-the-art Grid:
- a global networking infrastructure connecting multiple high-performance computational resources.

Targeted applications:
- Supercomputing across the globe.
- Collaborative computing.
- Global data repositories and data-intensive computing.

Core technology:
- centralized administration (e.g., resource registration)
- centralized management (e.g., job scheduling)
NSF-Sponsored Grid Efforts

1997 to 2002:
Two Partnerships for Advanced Computational Infrastructure (PACI)
- NCSA at Illinois and NPACI at San Diego
- leading 60+ institutions from 27 states.

Missions:
- providing grid computing and data resources
- developing grid software tools
- supporting applications on grids
- education, outreach, and training.
Building National Grid Infrastructure

2001 to 2004:
Distributed Terascale Facility (DTF)
- 4 DTF sites: NCSA, NPACI, Argonne, and Caltech
- providing an aggregate of 14+ teraflops and 450+ terabytes.

Tasks:
- NCSA: 6+ TF & 240+ TB Linux cluster of Itanium processors
- NPACI: 4+ TF & 225+ TB
- Argonne: 1+ TF IBM cluster, grid & visualization software
- Caltech: 86 TB of on-line storage.
Large NSF-Sponsored Grid Projects

GIOD (Globally Interconnected Object Databases)
- global data storage and access for particle-collider experiments

GriPhyN (Grid Physics Network)
- building global grids for experimental physics studies.

iVDgL (international Virtual-Data grid Laboratory)
- grids for physics/astronomy experiments
- data-intensive science, US & EU collaboration

NEES (Network for Earthquake Engineering Simulation)
- shifting from physical tests to simulation (20 grid sites)
Additional NSF Grid Efforts

2003 to 2005:
Enhanced Distributed Terascale Facility
- the 4 original DTF sites plus the Pittsburgh Supercomputing Center.

Tasks:
- Enhancing the existing DTF software and hardware.
- Testing large-scale applications.
- Connecting widely to users.
Limits of Current Grid Systems

Deployment of a grid is still not easy:
- high cost, handled case by case (e.g., the NSF grid projects).

Application scope is narrow, and killer apps are limited:
- Increasingly, local clusters will satisfy most applications.
- Special ones are served by custom-designed HEC (Earth Simulator, Blue Gene).
- Global supercomputing is not cost- and performance-effective: storing data is much cheaper than transferring it.

Centralized administration and management
- limit scalability.
- create single points of failure.
Beyond the Client/Server World: Internet

The rapidly growing Internet services are provided by an increasing number of peers.

Variety of devices: from cell phones to supercomputer centers.

Pervasive computing: access information and services anytime and anywhere.
Client/Server Model Is Being Challenged

No single server or search engine can sufficiently cover the growing Web content.
- 2 x 10^18 bytes/year are generated on the Internet.
- But only 3 x 10^12 bytes/year are available to the public (0.00015%).
- Google only searches 1.3 x 10^8 Web pages.
(Source: IEEE Internet Computing, 2001)
Client/Server (continued)

The client/server model seriously limits utilization of available bandwidth and services.

Popular servers and search engines become traffic bottlenecks.

Meanwhile, the high-speed networks connecting many clients sit idle.

Computing cycles and information in clients are ignored.
Content Delivery Networks (CDN): A Transition Model

Servers are decentralized (duplicated) throughout the Internet.

The distributed servers are controlled by a centralized authority (headquarters).

Examples: Internet content distribution by Akamai, Overcast, and FFnet.

Both the client/server and CDN models have single points of failure.
A New Paradigm: Peer-oriented Systems

Each peer is both client (consumer) and server (producer).

Peers have the freedom to join and leave at any time.

Huge peer diversity: service ability, storage space, networking speed, and service demand.

A widely decentralized system opens both opportunities and new concerns.
Peer-oriented Systems

[Diagram comparing architectures: client/server (a single server, e.g., a search engine or grid); Content Delivery Networks (duplicated servers, e.g., Akamai); hybrid P2P with a directory (e.g., Napster); pure P2P (e.g., Freenet & Gnutella).]
Objectives and Benefits of P2P
• As long as there is no physical break in the network, the target file will always be found.
• Adding more content to a P2P system will not affect its performance (information scalability).
• Adding and removing nodes will not affect its performance (system scalability).
Peer-oriented Applications

File Sharing: document sharing among peers with no or limited central control.

Instant Messaging (IM): immediate voice and file exchanges among peers.

Distributed Processing: widely utilizing resources available in remote peers.
P2P Network Infrastructure

Overlay networks: peers communicate with each other in the application layer.

Make friends with any IP address globally, without considering distance, message types, or the low-level protocols used.

Peers are not required to understand physical networks, creating a new domain of development opportunities.
More on Overlay Networks

Overlay graph: each edge is a TCP connection or a pointer to an IP address.

Overlay maintenance: (1) periodically ping to verify liveness of peers; (2) delete the edge to a dead peer; (3) a new peer needs to bootstrap. (See the sketch after this list.)

Overlay problems: (1) topology-unaware; (2) duplicated messages; (3) inefficient network usage.
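To make the maintenance steps concrete, here is a minimal sketch in Python; the Peer class, the TCP-connect liveness probe, and the interval/timeout values are illustrative assumptions, not part of the talk.

```python
# Minimal overlay-maintenance sketch (assumed names and values).
import socket
import time

class Peer:
    """Keeps a set of overlay neighbors alive."""

    PING_INTERVAL = 30  # seconds between liveness rounds (assumed value)

    def __init__(self, bootstrap_addrs):
        # (3) a new peer bootstraps from a list of known (host, port) pairs
        self.neighbors = set(bootstrap_addrs)

    def ping(self, addr, timeout=2.0):
        """(1) Liveness probe: True if the neighbor accepts a TCP connect."""
        try:
            with socket.create_connection(addr, timeout=timeout):
                return True
        except OSError:
            return False

    def maintain(self):
        """Run the periodic maintenance loop forever."""
        while True:
            # (2) delete edges to peers that failed the probe
            self.neighbors -= {a for a in self.neighbors if not self.ping(a)}
            time.sleep(self.PING_INTERVAL)
```

A real overlay would also replenish neighbors from the bootstrap list when a peer's degree drops too low.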
P2P Types and Operations

Directory-based P2P: a centralized index server makes a direct map between a pair of requesting and serving peers, e.g., Napster.

Unstructured P2P: peers are randomly connected in the overlay graph, with flooding for queries/retrievals, e.g., Gnutella and KaZaA.

Structured P2P: peers are deterministically connected in the overlay graph by a Distributed Hash Table for registrations and queries/retrievals, e.g., Chord and CAN.
Directory-based P2P for Sharing Music: Napster

[Diagram: peers join a central index; a query goes to the index, the answer names the serving peer, and the file is fetched (get) directly from that peer.]
Brief History and Implications of Napster

1999/1: Shawn Fanning (a freshman at Northeastern) dropped out and started it.

1999/6: Napster began operations for swapping music among peers.

1999/12: RIAA lawsuit over copyright violation, asking for $100K per infringement.

2000/3: universities ban it due to heavy traffic, e.g., 25% of traffic at U. Wisconsin.

2000/5: VC firm Hummer Winblad invested $15 million in Napster.

2000/7/26: a US District judge orders Napster to stop operations within 2 days.

2000/7/28: the 9th US Circuit Appeals Court rules it is allowed to continue.

2001/2: a Federal Appeals Court rules it must stop trading copyrighted music.

2001/9: it reaches a settlement with music writers/publishers: pay $26M for past damages, plus a percentage to them as it restarts as a paying service in 2002.
How Does Napster Work? (very simple!)

Application level: (1) client/server protocol over point-to-point TCP/IP; (2) central directory server.

User operation steps (sketched in code below):
1. Connect to the Napster server (www.napster.com).
2. Upload a request list and the user's IP address to the server.
3. The index server searches the list and returns results to that IP.
4. The user pings the music hosts, looking for the best transfer rate.
5. The user chooses a music provider for the direct data transfer.

The index server does not scale with its P2P system.
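As a sketch of these steps, the central directory can be modeled as a simple map from titles to peer addresses; NapsterIndex and its methods are hypothetical names for illustration, not the real protocol.

```python
# Toy central directory for a Napster-style system (hypothetical names).
class NapsterIndex:
    """Maps song titles to the IP addresses of peers sharing them."""

    def __init__(self):
        self.index = {}  # title -> set of peer IPs

    def register(self, peer_ip, titles):
        # Step 2: a joining peer uploads its shared list and its IP address.
        for title in titles:
            self.index.setdefault(title, set()).add(peer_ip)

    def query(self, title):
        # Step 3: the server searches the index and returns candidate peers.
        return self.index.get(title, set())

# Usage: the requester then pings the candidates (step 4) and downloads
# directly from the best one (step 5); the transfer bypasses the server.
index = NapsterIndex()
index.register("10.0.0.5", ["songA", "songB"])
print(index.query("songA"))  # {'10.0.0.5'}
```

The sketch also shows why the design does not scale: every registration and every query passes through this one index.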
Unstructured P2P: Gnutella

[Diagram: a query floods hop by hop across the randomly connected overlay until peers holding the file answer. A sketch of the flooding follows.]
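Here is a minimal sketch of TTL-limited query flooding over an unstructured overlay, modeled as an in-memory graph; the function and variable names and the default TTL are illustrative assumptions.

```python
# TTL-limited flooding over an unstructured overlay (illustrative names).
def flood_query(graph, files, start, wanted, ttl=4):
    """Breadth-first flood from `start`; returns peers holding `wanted`.

    graph: dict peer -> list of neighbor peers (the overlay edges)
    files: dict peer -> set of file names stored at that peer
    ttl:   hop limit; each forwarding round decrements it (Gnutella-style)
    """
    hits = set()
    seen = {start}            # remembered message IDs prevent re-flooding
    frontier = [start]
    while frontier and ttl > 0:
        ttl -= 1
        next_frontier = []
        for peer in frontier:
            for nbr in graph[peer]:
                if nbr in seen:
                    continue  # drop duplicate copies of the query
                seen.add(nbr)
                if wanted in files[nbr]:
                    hits.add(nbr)  # a query hit travels back to `start`
                next_frontier.append(nbr)
        frontier = next_frontier
    return hits

# Usage on a tiny overlay:
g = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
f = {"A": set(), "B": set(), "C": {"songX"}, "D": {"songX"}}
print(flood_query(g, f, "A", "songX", ttl=2))  # {'C', 'D'}
```

The `seen` set and the TTL correspond directly to the overlay problems noted earlier: they bound the duplicated messages and the inefficient network usage that raw flooding causes.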
Super-Node-Based P2P: KaZaA (Morpheus)

[Diagrams: ordinary peers attach to a super peer that indexes their files; a query goes to the local super peer, the answer names a serving peer, and the file is fetched (get) directly. Unresolved queries flood only among the super peers. A sketch follows.]
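A minimal sketch of super-peer routing, assuming each ordinary peer registers its files with one super peer and that unresolved queries are forwarded only along super-peer links; the class and method names are hypothetical.

```python
# Toy super-peer layer (hypothetical names; TTL bounds the super-peer flood).
class SuperPeer:
    def __init__(self):
        self.index = {}        # file name -> set of attached peers holding it
        self.super_links = []  # overlay edges to other super peers

    def register(self, peer, titles):
        # An attaching peer uploads its file list to its super peer.
        for t in titles:
            self.index.setdefault(t, set()).add(peer)

    def query(self, title, ttl=2):
        # Answer locally if possible; otherwise flood among super peers only.
        hits = set(self.index.get(title, set()))
        if ttl > 0:
            for sp in self.super_links:
                hits |= sp.query(title, ttl - 1)
        return hits

# Usage: two super peers, each serving ordinary peers.
sp1, sp2 = SuperPeer(), SuperPeer()
sp1.super_links.append(sp2)
sp1.register("peerA", ["songX"])
sp2.register("peerB", ["songY"])
print(sp1.query("songY"))  # {'peerB'}, found via the super-peer overlay
```

The design choice is a compromise: lookups stay close to directory-based speed, while no single index holds the whole system.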
Distributed Hash Table (DHT)

[Animation: each node stores a slice of a global key-value (K, V) table; insert(K1, V1) is routed by hashing K1 to the responsible node, and retrieve(K1) is routed the same way to find V1. A toy sketch follows.]
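The insert/retrieve animation can be summarized with a toy DHT based on consistent hashing. This is an illustrative sketch of key placement only; real DHTs such as Chord and CAN route in O(log N) hops instead of consulting a global node list as this toy does.

```python
# Toy DHT via consistent hashing (illustrative; not Chord or CAN routing).
import bisect
import hashlib

def h(key):
    """Hash a string to a point on the identifier ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ToyDHT:
    def __init__(self, node_names):
        # Each node owns the arc of the ring that precedes its hashed ID.
        self.ring = sorted((h(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}  # per-node local (K, V) tables

    def _owner(self, key):
        # The first node clockwise from hash(key) is responsible for the key.
        i = bisect.bisect(self.ring, (h(key), "")) % len(self.ring)
        return self.ring[i][1]

    def insert(self, key, value):
        self.store[self._owner(key)][key] = value

    def retrieve(self, key):
        return self.store[self._owner(key)].get(key)

# Usage: both operations hash K1 and land on the same node.
dht = ToyDHT(["node1", "node2", "node3"])
dht.insert("K1", "V1")
print(dht.retrieve("K1"))  # V1
```

Because placement is determined by the hash alone, any peer can route an insert or retrieve without a central index, which is what makes structured P2P registration and lookup scale.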
Problem 1: Losing Security and Privacy

Providing a conduit for malicious code and viruses.

Providing loopholes for information leakage.

Weakening privacy protection by exposing peer identities.
Problem 2: Weak Resource Coordination

With limited or no central control, peers rely mainly on self-organization.

Lacking communication monitoring and scheduling: causes unnecessary traffic jams.

Lacking access and service coordination: unbalanced loads among peers.
Demanded Solution (1): Fast Peer Services

Dynamically identifying and collecting trusted and guaranteed peers as the backbone.

Establishing adaptive self-organization and monitoring for resource coordination.

Fast data and service searching in low-diameter regions.
(2): Allowing Distrustful Peers to Exist

Ensure that peer interactions
- do not become intrusive (monitoring/scheduling),
- do protect privacy (communication anonymity), and
- are not used for denial-of-service attacks (security).
(3): Measurable Security Metrics

Benchmarks for security measurement.

Stochastic models for security analysis.

Validating systems and quantifying security
degrees.
(4): Understanding the Trade-offs

Analyzing the impact of centralized controls on performance and security.

Quantifying the security loss and performance gain/loss from decentralization.

Optimizing peer-oriented systems for individual and combined objectives:
- high performance, high security, a balance of both,
- for a given performance objective, finding...
(5): Utilizing Existing Infrastructure

New standards and protocols should be easily implemented on the existing Internet.

Avoid modifying commonly used, general-purpose software.

Peer-oriented processing should be automatic, with little user involvement.
Factors determining P2P or Not P2P

Budget: applications demanding cost-effectiveness.

Resource relevance to peers: common interests.

Security: mutual trusts among peers.

Rate of peer changes: relatively stable applications.

Non-Critical solutions: QoS is not guaranteed.
NSF’s Efforts on Cyberinfrastructure

Grids: provide a global problem-solving environment for large and critical scientific applications and professional collaborations, where each grid is a server.
- Funding sources: H&S infrastructure (continuous support) and large ITRs on applications (00, 01, 02, 03).

P2P: provides a globally decentralized system for anyone to participate in.
- Funding source: a large ITR for DHT (02).
Application Differences: Grid & P2P

Grid: provides (1) a global problem-solving environment for large scientific applications, (2) commercial/public services, and (3) professional collaborations, where each grid is a server.

P2P: provides self-organized information sharing/searching services, where each peer can be both server and client.
Operation Differences: Grid & P2P

Grid: targeted access to computing, software, and data resources at remote, specified sites. (Server-based)

P2P: random access to available computing, software, and data resources without a specific target. (Client-based)
Different Participants: Grid & P2P

Grid: pre-determined and registered clients and servers.

P2P: clients and servers are neither distinguished nor registered (for identity purposes), and can come and go as they choose.
Different QoS: Grid & P2P

Grid: guaranteed and reliable services are required of each grid server.

P2P: only partially reliable, because services from some peers are neither guaranteed nor trusted.
Security Differences: Grid & P2P

Grid: authentication, authorization, and firewall protection for each grid.

P2P: privacy, anonymity, authentication, authorization, and firewall protection are not guaranteed for each peer.
Different Controls: Grid & P2P

Grid: centralized control plays an important role in resource monitoring/allocation and job scheduling.

P2P: limited or no central control; relies mainly on self-organization.
Edge and Utility Computing

Objectives: adaptively, promptly, and (temporarily) move content and computing resources from centralized centers to sites (the edge) close to end-users.

Benefits:
- QoS improvement (e.g., low response time)
- High utilization of resources
- Easy manageability and high availability of services
- High cost-effectiveness.
Core Technology and Challenges

Dynamic resource provisioning: deployment of Internet applications upon demand.

What we do not have but need to have:
- Automation of resource provisioning.
- Optimization of resource provisioning.
- Effective service distribution for resource provisioning.
Merging P2P and Grid

Objectives: building a scalable and reliable cyber resource-sharing system.

Keys: resource administration & management.

Keeping the merits of grids: security and reliable services.

Keeping the merits of P2P: scalability and avoiding single points of failure.

Balancing the trade-offs between Grid and P2P.
Heterogeneous Internet Members Co-exist

Billions of clients in the form of cell phones, PDAs, laptops, and home PCs (Internet/wireless).

Millions of clients become what are termed super-peer nodes.

Millions of powerful clusters for local services.

Millions of trusted/independent grid nodes serve.

Millions of trusted/collaborative grid nodes serve.

Dozens of supercomputers for science advancement.
Future of Distributed Computing

Grid infrastructure will provide reliable service (and some computing) resources.

Within a grid region, P2P techniques will be integrated for resource administration and management.

The P2P paradigm will play a major role in information retrieval.

The demand for data access/transfer will be higher than the demand for compute cycles.