The EGEE Project: Applications
Guy Wormser, LAL Orsay
15 March 2007

Web: information sharing
• Invented at CERN by Tim Berners-Lee
• Quickly crossed over into public use
• Agreed protocols: HTTP, HTML, URLs
• Anyone can access information and post their own
[Figure: number of Internet hosts (millions) by year]

Guy Wormser, meeting with Arnold Migus, 23 February 2006

Grid: resource sharing
• Share more than information: data, computing power, applications
• Middleware handles everything
[Figure: a single computer (programs such as Word/Excel, games, email/web, on top of an operating system managing disks, CPU, etc.) compared with the Grid (your program on top of middleware: user interface machine, resource broker, disk server, CPU clusters)]

Electricity Grid
Analogy with the electricity power grid: power stations, distribution infrastructure, 'standard interface'

Definitions: Grids
• Grids: a distributed set of computing resources linked by fast networks and accessible transparently to the user
• EGEE: a European project funded under the 6th Framework Programme (FP6), multidisciplinary in scope, building on high-energy physics but open to many other sciences: biology/medicine, Earth sciences, astrophysics, chemistry
• LCG: LHC Computing Grid, the international grid set up around CERN, the sole tool chosen to meet the LHC's computing needs
• EGEE and LCG share the same infrastructure, the same software team, etc.
• More than 200 nodes and 15,000 processors are operational 24 hours a day, 7 days a week. The Data Challenges of the LHC experiments are entirely based on this infrastructure
• Tier1/2: resource centres of the LCG/EGEE grid
  – Tier1: very strong service obligations, large storage volume, dedicated to centralized productions
  – Tier2: high responsiveness during working hours, dedicated to analysis and simulation

EGEE: Enabling Grids for E-sciencE
• Goal
  – Create a general European Grid production-quality infrastructure on top of present and future EU RN infrastructure
• Build on
  – EU and EU member states' major investment in Grid technology
  – Several pioneering prototype results
  – Largest Grid development team in the world
  – Goal can be achieved for about €100m/4 years on top of the national and regional initiatives
• Approach
  – Leverage current and planned national and regional Grid programmes (e.g. LCG)
  – Work closely with relevant industrial Grid developers, NRNs and US
[Figure: layers labelled applications, EGEE, Geant network]

The Large Hadron Collider Project
4 detectors (labels recovered from the figure: CMS, ATLAS, LHCb)

[Figure: Building 40]

New solutions are necessary!

LHC Computing Model
[Figure: the LHC computing model. Labels: CERN, Tier 1 centres, Tier2, France, Italy, UK, Germany, NL, USA FermiLab, USA Brookhaven, labs and universities, physics department, desktop]

HEP commitment to Grids
• 2000-2003: Exploratory phase. Several R&D projects around the world to develop the middleware, build mid-size test beds, port some applications
  – US: PPDG, GriPhyN
  – Europe: DATAGRID
  – Several initiatives in Asia
• 2002: Decision to build a grid infrastructure as THE TOOL for LHC computing. Creation of the LCG project (LHC Computing Grid). Point of no return!
• 2007: Large-scale deployment of the world's largest distributed computing infrastructure(s):
  – LCG, EGEE, OSG (US), NAREGI (Japan), etc.
  – >200 sites, 35,000 processors, 12 PB of storage

Grids are a reality

Deployment of applications
• Pilot applications
  – High Energy Physics (LHC + D0, BaBar, CDF)
  – Biomed applications (12)
• Generic applications (deployment under way)
  – Computational chemistry
  – Earth science research
  – EGEODE: first industrial application
  – Astrophysics
• With interest from
  – Hydrology
  – Seismology
  – Grid search engines
  – Stock market simulators
  – Digital video, etc.
  – Industry (provider, user, supplier)
[Figure legend: Pilot, New]

Status
[Figure: number of sites, April 2004 to December 2006]
[Figure: number of CPUs, April 2004 to December 2006]
• ~17.5 million jobs run (6450 CPU-years) in 2006
• Workloads of the "non-HEP VOs" are starting to be significant, approaching 8-10K jobs per day and 1000 CPU-months/month
  – one year ago this was the overall scale of work for all VOs

Grid Virtual Organizations
• Routine and large-scale use of the EGEE infrastructure
• Virtual Organizations:
  – 200+ visible on the grid
  – 100+ registered with EGEE
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php

CPU Usage
[Figure: CPU usage by virtual organization, Jan. ’06 and Sep. ’06]

Production service
[Figure: number of sites, April 2004 to December 2005]
Size of the infrastructure today:
• 192 sites in 40 countries
• ~25,000 CPUs
• ~3 PB disk, plus tape MSS
[Figure: number of CPUs by date, April 2004 to February 2006]

EGEE Resources
Region           #countries  #sites   #cpu  #cpu DoW  disk (TB)
CERN                      0       1   4400      1800       770*
UK/I                      2      23   4306      2010       310
Italy                     1      27   2800      2280       373
France                    1      10   2316      1252       300*
De/CH                     2      13   2895      1852       280*
Northern Europe           6      16   2379      1860        64
SW Europe                 2      13    956       898        16*
SE Europe                 8      26   1101      1189        30
Central Europe            7      21   1584      1163        70
Russia                    1      15    515       445        38
Asia-Pacific              8      19    840       751        72
North America             2       8   4069         -       229
Totals                   40     192  28161     20265      2552
* Estimates taken from reporting, as the IS publishes total MSS space

Usage of the infrastructure
[Figure: EGEE workload in jobs/month by VO, January 2005 to August 2006; >50k jobs/day. VOs shown: alice, atlas, biomed, cms, compchem, dteam, egeode, egrid, esr, fusion, geant4, lhcb, magic, ops, planck, other VOs]
[Figure: normalized CPU time (k.SI2k.hours) by VO, January 2005 to August 2006]
~7000 CPU-months/month

Non-LHC VOs
[Figure: EGEE workload in jobs/month for non-LHC VOs, January 2005 to August 2006. VOs shown: biomed, compchem, dteam, egeode, egrid, esr, fusion, geant4, magic, ops, planck, other VOs]
• Workloads of the "other VOs" are starting to be significant, approaching 8-10K jobs per day and 1000 CPU-months/month
  – one year ago this was the overall scale of work for all VOs
[Figure: normalized CPU time (k.SI2k.hours) for non-LHC VOs, January 2005 to August 2006]

Grids: a great opportunity for interdisciplinary contacts
• Grids have become a great vehicle for promoting interdisciplinary contacts between HEP, many other application fields and computing scientists
  – Bioinformatics
  – Medicine
  – Chemistry
  – Fusion science
  – Earth sciences
  – Astrophysics/astronomy
  – Neuroinformatics
  – Climate
  – Finance
  – …
• HEP can be proud to have been a key player in these endeavours

D0 MC efficiency on LCG2 since Xmas (but small statistics)
CE                                                        Success  Failed
bohr0001.tier2.hep.man.ac.uk                                  237       3
cclcgceli01.in2p3.fr and grid-ce.physik.uni-wuppertal.de        -      14   (combined; the per-site values are not separable in the source)
gridkap01.fzk.de                                             2564      19
golias25.farm.particle.cz                                     198      15
heplnx131.pp.rl.ac.uk                                         246       4
lcgce02.gridpp.rl.ac.uk                                       293      10
mu6.matrix.sara.nl                                            397       7
tbn18.nikhef.nl                                               154       2
Total                                                        4089      74
• Efficiency: 98% (4089 successful jobs out of 4163)
• System running monitored very closely by the run manager, in close contact with sites

APPLICATIONS ported on EGEE
Earth Observation by Satellite, Hydrology, Solid Earth Physics, Meteorology, Climate, Geosciences, Chemistry of the Mars Upper Atmosphere

SEISMOLOGY[1]
Fast determination of the mechanisms of important earthquakes (IPGP: E. Clévédé, G. Patau)
• Challenge: provide results 24-48 h after the earthquake occurs
• 5 earthquakes already ported: Peru, Guadeloupe, Indonesia (Dec.), Japan, Indonesia (Feb.)
• Application to run on alert
  – Collect data from 30 seismic stations of the GEOSCOPE worldwide network
  – Select stations and data
  – Define a spatial 3D grid plus time
  – Run, for example, 50-100 jobs
[Figure: Peru earthquake, 23/6/2001, Mw=8.3. Data used: 15 Geoscope stations]

Management of water resources in Mediterranean area (SWIMED)
G. Lecca (CRS4, Italy), P. Renard (Unine, CH), J. Kerrou (INAT, Tunisia), R. Ababou (IMFT, Fr)
[Figure: Korba coastal aquifer, Tunisia; 45 km; Cape Bon Peninsula, 70 km south-east of Tunis]

GEOSCIENCES
• Generic seismic platform software, based on the Geocluster commercial software developed by CGG
• Includes 400 geophysical modules, implemented on EGEE
• Used by both academics and private companies
• Free of charge for academics, with a charge for R&D

Status of Biomedical VO
• RLS, VO LDAP Server: CC-IN2P3
• 4 RBs: CNAF, IFAE, LAPP, UPV
• 15 resource centres
• 17 CEs (>750 CPUs)
• 16 SEs
• 4 RBs
• 1 RLS
• 1 LDAP Server
[Figure: map of resource centres; labels include PADOVA, BARI]

GATE
GEANT4 Application for Tomographic Emission
• Scientific objectives
  Radiotherapy planning to improve the treatment of cancer by ionizing irradiation of tumours. Therapy planning is computed from pre-treatment MR scans by accurately locating tumours in 3D and computing the radiation doses applied to the patients.
• Method
  GEANT4-based software to model the physics of nuclear medicine. Monte Carlo simulation is used to improve the accuracy of computations (compared with the classical deterministic approach).

Drug Discovery
• WISDOM focuses on in silico drug discovery for neglected and emerging diseases.
• Malaria, summer 2005
  – 46 million ligands docked
  – 1 million selected
  – 1 TB of data produced; 80 CPU-years used in 6 weeks
• Avian flu, spring 2006
  – H5N1 neuraminidase
  – Impact of selected point mutations on the efficacy of existing drugs
  – Identification of new potential drugs acting on mutated N1
• Fall 2006
  – Extension to other neglected diseases

WISDOM: Wide In Silico Docking On Malaria
• Goals of the first biomedical "data challenge" (July - August 2005)
  – Biological goal: propose new inhibitors for a family of proteins produced by Plasmodium falciparum
  – Biomedical informatics goal: deploy in silico virtual docking on the grid
  – Grid goal: deploy a CPU-consuming application generating large data flows to test the grid infrastructure and services
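As a rough sanity check on the scale of the malaria data challenge, the figures quoted above (80 CPU-years in 6 weeks, 46 million ligands docked) can be turned into an average load. This is only a back-of-envelope sketch: it assumes a uniform load over the six weeks and that all CPU time went into docking, neither of which is stated in the slides, and the function names are illustrative.

```python
# Back-of-envelope check of the WISDOM malaria data challenge figures.
# Inputs are the numbers quoted on the slide; the derived values are
# illustrative estimates, not measurements reported by the project.

DAYS_PER_YEAR = 365.25                      # assumption: Julian year
SECONDS_PER_YEAR = DAYS_PER_YEAR * 24 * 3600

cpu_years = 80            # CPU time consumed (from the slide)
duration_weeks = 6        # wall-clock duration (from the slide)
dockings = 46_000_000     # ligands docked (from the slide)


def average_concurrent_cpus(cpu_years: float, weeks: float) -> float:
    """Mean number of CPUs busy at once, assuming a uniform load."""
    wall_clock_years = weeks * 7 / DAYS_PER_YEAR
    return cpu_years / wall_clock_years


def seconds_per_docking(cpu_years: float, dockings: int) -> float:
    """Mean CPU seconds per docking, assuming all CPU time was docking."""
    return cpu_years * SECONDS_PER_YEAR / dockings


avg_cpus = average_concurrent_cpus(cpu_years, duration_weeks)
sec_each = seconds_per_docking(cpu_years, dockings)

print(f"average CPUs in use: {avg_cpus:.0f}")
print(f"CPU seconds per docking: {sec_each:.0f}")
```

Under these assumptions the load works out to roughly 700 CPUs busy on average and a little under one minute of CPU time per docking.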
• Partners
  – Fraunhofer SCAI (Project PI: Martin Hofmann)
  – LPC Clermont-Ferrand (CNRS/IN2P3)
  – CMBA (Center for Bio-Active Molecules screening)
• Representing different projects:
  – EGEE (EU FP6)
  – Simdat (EU FP6)
  – AuverGrid and Campus Grid (French and German regional grids)
  – Accamba project (French ACI project)

High Throughput Virtual Docking
• Millions of chemical compounds available in laboratories
• High-throughput screening: 1-10 $/compound, nearly impossible
• Molecular docking (FlexX, Autodock): ~80 CPU-years, 1 TB of data
• Computational data challenge: ~6 weeks on ~1000/1600 computers
• Chemical compounds: ZINC; Chembridge: 500,000; drug-like: 500,000
• Target structures: PDB
• Targets: Plasmepsin II (1lee, 1lf2, 1lf3), Plasmepsin IV (1ls5)
• Grid infrastructure: EGEE
[Figure: pipeline from hits (screening using assays performed on living cells) to leads, clinical testing and drug]

Modeling for the molecular docking
• Target scenarios
  – Number of water molecules in the active site
• Software scenarios
  – Docking methods (Autodock)
  – Water molecule placement and maximum overlapping volume (FlexX)
• Target preparation
  – X-ray crystal structures of 5 plasmepsins (PDB)
  – All the proteins superimposed on 1lee (PDB Kabsch and PDB transform)
  – Native ligand converted to mol2 and hydrogens added (Babel and Corina)
  – Active site created from the native crystal ligand
• Compounds preparation
  – Already drug-like
  – Conversion to pdbqs for Autodock
[Figure labels: active site, ligand, loop variation between structures]

Grid workflow
[Figure: grid workflow linking the user interface (compounds database, compounds list and sublists, parameter settings, target structures, statistics) to the storage element, computing element and software at each site (Site1, Site2), with results returned]
• FlexX license server:
  – 3000 floating licenses given by BioSolveIT
to SCAI
  – Maximum number of licenses in use was 1008

Score results in different scenarios with VS Explorer (SCAI)

gPTM3D
3D Medical Image Analysis Software
• Scientific objectives
  Interactive volume reconstruction on large radiological data. PTM3D is an interactive tool for performing computer-assisted 3D segmentation, volume reconstruction and measurement (RSNA 2004). Reconstruction of complex organs (e.g. lung) or of the entire body from modern CT scans is needed in augmented-reality use cases, e.g. therapy planning.
• Method
  Starting from a rough hand-made initialization, a snake-based algorithm segments each slice of a medical volume. 3D reconstruction is achieved in parallel by triangulating contours from consecutive slices.

GPS@
• Grid added value
  The NPS@ portal records 3000 hits a day and is limited by local resources in the size of the databanks and the kind of computations it can offer. The grid version, GPS@, can:
  – for biological data: give biologists a convenient way to distribute and access international databanks, and to store more and larger databanks
  – for bioinformatics algorithms: allow each portal user to process larger datasets with the available algorithms through larger bioinformatics computations
  – open up to a wider user community
• Results and perspectives
  9 widely used bioinformatics programs have been gridified so far, such as BLAST, CLUSTALW, PattInProt, …
  GPS@ stresses the grid infrastructure with a large number of rather short jobs (a few minutes each). Optimizations are being worked on to:
  – Speed up access to databases
  – Lower the latency of short jobs
  – Process data-dependent or software-dependent jobs (workflow)

Conclusion
• Grids: a very great success, going in a few years from a concept to an operational reality!
• "Huge infrastructure in place, other sciences embarked, very high political visibility, HEP role clearly recognised"
• Very important role of France and of CNRS in EGEE, notably at the heart of the crucial applications sector: our mission to broaden the spectrum of EGEE applications has succeeded beyond all expectations
• We must build on the success of EGEE to construct the lasting follow-up after EGEE-II: a European Grid Initiative based on "National Grid Initiatives"
• A French national grid initiative is being set up with the ministry