Distributed, Internet and Grid Computing

Distributed Computing
• Current supercomputers are too expensive
  • ASCI White (#1 in TOP500) cost more than $110 million and needed a new building
  • Few institutions or research groups can afford this level of investment
• There are more than 500 million PCs around the world
  • some are as powerful as early-90s supercomputers
  • they are idle most of the time (60% to 90%), even when being used (spreadsheet, typing, printing, ...)
  • corporations and institutions have hundreds or thousands of PCs on their networks
Idea: harness the idle PCs on a network and use them on computationally intensive problems

Entropia network
• Born in 1997 to apply idle computers worldwide to problems of scientific interest
• In 2 years grew to more than 30,000 computers with an aggregate speed of over 1 Tflop/s
• Several scientific achievements, e.g. identification of the largest known prime number
• Gone commercial (www.entropia.com) and used for applications from:
  • Life sciences
  • Financial services
  • Product design, etc.
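
The idea of harnessing idle PCs can be illustrated with a minimal sketch: a coordinator splits a computationally intensive problem into independent work units, hands them to whichever workers are free, and combines the partial results. This is only an illustration under stated assumptions, not how Entropia or any project in these slides is implemented. The thread pool stands in for PCs on a network, and the names `make_work_units`, `count_primes` and `run` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_work_units(limit, unit_size):
    """Coordinator side: split [0, limit) into independent half-open ranges."""
    return [(lo, min(lo + unit_size, limit)) for lo in range(0, limit, unit_size)]


def count_primes(work_unit):
    """Worker side: count the primes in one work unit by trial division."""
    lo, hi = work_unit
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def run(limit=10_000, unit_size=1_000, workers=4):
    """Farm the work units out and add up the partial results."""
    units = make_work_units(limit, unit_size)
    # Assumption: each pool thread stands in for one idle PC on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_primes, units))
    return sum(partial_counts)


if __name__ == "__main__":
    print(run())  # number of primes below 10,000
```

The work units share no state, so it does not matter which worker processes which unit or in what order. That independence is what makes this kind of problem a good fit for machines that are only intermittently idle.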

SETI @ home project
setiathome.ssl.berkeley.edu
• SETI = Search for ExtraTerrestrial Intelligence
• Started in 1996 to enlist PCs to work on analysing data from the Arecibo radio telescope
• Good mix of popular appeal and good technology
• Now running on more than ½ million PCs
  • delivering ~ 1,200 CPU years per day
  • ~ 35 Tflop/s
  • fastest (but special-purpose) computer in the world

Folding @ home project
www.stanford.edu/group/pandegroup/Cosm
• Enlists PCs to work on the protein folding problem
  • most important problem in modern molecular biology
• From genome to structure:
  • The genome (DNA sequence) specifies the amino acids that make up proteins, but says little about their functions: what is needed is how a protein folds (its 3D structure)
• Protein folding is very fast (microseconds) and complex
• Simulation timescales are of the order of nanoseconds: a 10^3 gap, hence distributed computing
• Currently around 20,000 users

Great Internet Mersenne Prime Search
mersenne.org
• Started in 1996 to find large Mersenne primes (i.e. primes of the form 2^p - 1)
• 3, 7, 31, 127, 8191, ... are Mersenne primes, corresponding to p = 2, 3, 5, 7, 13, ...
• Currently 39 Mersenne primes are known; GIMPS found the largest 5, including:
  • 2^6972593 - 1, found in June 1999
  • 2^13466917 - 1, found in November 2001 (current largest; more than 4 million digits)
• Are there infinitely many Mersenne primes? Not known
• Uses the Entropia network and runs at ~ 3.4 Tflop/s

More Internet computing projects:
• Genome @ home genomeathome.stanford.edu
• Compute-against-Cancer www.parabon.com/cac.jsp
• Fight AIDS @ home www.fightaidsathome.org
• Climate simulation www.climate-dynamics.rl.ac.uk
More Internet computing companies:
• Parabon www.parabon.com
• United Devices www.uniteddevices.com
See more at www.aspenleaf.com/distributed

The GRID
• Internet computing is just a special case of communities sharing resources to tackle common goals
• Grid technologies: link data, computers, devices and other resources of teams (from different institutions, states, countries, continents) into a single virtual laboratory
• Needed: protocols, services and software kits for flexible and controlled resource sharing on a large scale
  • Internet: protocol (TCP/IP)
  • Grid: protocol?

• Grid Forum is working to create a formal standard; its main tool is the Globus Toolkit
• Globus Toolkit: open-architecture and open-source infrastructure providing Grid services such as security, resource management, data access and sharing
• Mostly driven by physics and CS groups (in Europe, the Large Hadron Collider (LHC) at CERN, cost > 2 billion euro)
• Global Grid Forum www.gridforum.org
• Globus Project www.globus.org
• Grid Physics Network project www.griphyn.org
• European Data Grid eu-datagrid.web.cern.ch
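
Worked example for the Mersenne prime slide: the small cases listed there (p = 2, 3, 5, 7, 13) and the "more than 4 million digits" figure can be checked in a few lines. The sketch below uses the Lucas-Lehmer test, a standard primality test for numbers of the form 2^p - 1. The slides do not name the test, so treat its use here as an assumption made for illustration. The function names are hypothetical, and the digit count relies on the fact that 2^p - 1 has the same number of decimal digits as 2^p.

```python
import math


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. Assumes p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the recurrence below needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def decimal_digits(p):
    """Number of decimal digits of 2**p - 1, without building the number."""
    return math.floor(p * math.log10(2)) + 1


if __name__ == "__main__":
    small_prime_exponents = [2, 3, 5, 7, 11, 13]
    hits = [p for p in small_prime_exponents if lucas_lehmer(p)]
    print(hits)                          # exponents that give Mersenne primes
    print([2**p - 1 for p in hits])      # the primes themselves
    print(decimal_digits(13466917))      # size of the November 2001 record
```

The exponent 11 is prime but 2^11 - 1 = 2047 = 23 × 89 is not, which is why a prime p is necessary but not sufficient. Each candidate exponent can be tested independently of the others, so the search splits naturally across many PCs.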