Grids for Chemical Informatics
Randall Bramley, Geoffrey Fox, Dennis Gannon, Beth Plale
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

What is a Grid?
Name borrowed from the power grid.
  • The concept: A ubiquitous information & computation resource
A definition
  • a network of compute and data resources that has been supplemented with a layer of services that provide uniform and secure access to a set of applications of interest to a distributed community of users.
Grids may be wide-area or enterprise

Scientific Challenges
The current and future generations of scientific problems
  • Data Oriented
      • Increasingly stream based.
      • Often need petabyte archives
  • In need of on-demand computing resources
  • Conducted by geographically distributed teams of specialists
      • Who don't want to become experts in grid computing.
[Figure labels: Streaming Observations; Data Mining; Storms Forming; Forecast Model; On-Demand Storm predictions]

Genetics and Disease Susceptibility
[Figure labels: Phenotype 1, Phenotype 2, Phenotype 3, Phenotype 4; Ethnicity, Environment, Age, Gender; Identify Genes; Pharmacokinetics, Metabolism, Endocrine, Physiology, Proteome, Transcriptome, Immune, Morphometrics; Biomarker Signatures; Predictive Disease Susceptibility. Source: Terry Magnuson, UNC]

Science Communities and Outreach
  • Communities are:
      • CERN's Large Hadron Collider experiments
      • Physicists working in HEP and similarly data intensive scientific disciplines
      • National collaborators and those across the digital divide in disadvantaged countries
  • Scope
      • Interoperation between LHC Data Grid Hierarchy and ETF
      • Create and Deploy Scientific Data and Services Grid Portals
      • Bring the Power of ETF to bear on LHC Physics Analysis: Help discover the Higgs Boson!
  • Partners
      • Caltech
      • University of Florida
      • Open Science Grid and Grid3
      • Fermilab
      • DOE PPDG
      • CERN
      • NSF GriPhyn and iVDGL
      • EU LCG and EGEE
      • Brazil (UERJ,…)
      • Pakistan (NUST, …)
      • Korea (KAIST,…)

LHC Data Distribution Model

Information/Knowledge Grids
Distributed (10's to 1000's) data sources (instruments, file systems, curated databases …)
Data Deluge: 1 (now) to 100's petabytes/year (2012)
  • Moore's law for Sensors
Possible filters assigned dynamically (on-demand)
  • Run image processing algorithm on telescope image
  • Run Gene sequencing algorithm on compiled data
Needs decision support front end with "what-if" simulations
Metadata (provenance) critical to annotate data
Integrate across experiments as in multi-wavelength astronomy
Data Deluge comes from pixels/year available

Internet Scale Distributed Services
Grids use Internet technology to manage sets of network connected resources
  • Classic Web: independent one-to-one access to individual resources
  • Grids integrate together and manage multiple Internet-connected resources: People, Sensors, computers, data systems
Grids are built on top of commodity web service technology with broad industry support
Organization can be explicit as in
  • TeraGrid which federates many supercomputers;
  • CrisisGrid which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data
Organization can be implicit such as curated databases and simulation resources that "harmonize a community"

The Architecture of Gateway Grids
The User's Desktop.
Grid Portal Server
Gateway Services: Proxy Certificate Server / vault; Application Workflow; Application Deployment; Application Events; Resource Broker; App. Resource catalogs; User Metadata Catalog; Replica Mgmt
Core Grid Services: Security Services; Information Services; Self Management; Resource Management; Execution Management; Data Services
OGSA-like Layer
Physical Resource Layer

Let's look at a few real examples (about a dozen … many more exist!)

BIRN – Biomedical Information

Mesoscale Meteorology
NSF LEAD project - making the tools that are needed to make accurate predictions of tornados and hurricanes.
  - Data exploration and Grid workflow

Workflow in the LEAD Grid
[Figure: Katrina output]

Renci Bio Gateway
Providing access to biotechnology tools running on a back-end Grid.
  - leverage state-wide investment in bioinformatics
  - undergraduate & graduate education, faculty research
  - another portal soon: national evolutionary synthesis center

X-Ray Crystallography

SERVOGrid

SERVOGrid Requirements
Seamless Access to Data repositories and large scale computers
Integration of multiple data sources including sensors, databases, file systems with analysis system
  • Including filtered OGSA-DAI (Grid database access)
Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid
Portals with component model for user interfaces and web control of all capabilities
Collaboration to support world-wide work
Basic Grid tools: workflow and notification
NOT metacomputing

Grid of Grids: Research Grid and Education Grid
[Figure labels: Repositories, Federated Databases; Sensors, Streaming Data, Field Trip Data; Database Grid; Sensor Grid; Compute Grid; GIS Grid; Discovery Services; Data Filter Services; Research Simulations; Customization Services (From Research to Education); Analysis and Visualization Portal; Education Grid Computer Farm; Research, SERVOGrid, Education]

Integrating Archived Web Feature Services and Google Maps
Google maps can be integrated with Web Feature Service Archives to filter and browse seismic records.

MyGrid - Bioinformatics
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence

[Layered diagram labels]
Domain Specific Grids/Services: BioInformatics Grid (… Sequencing Tools, Biocomplexity Simulations, BIS); Chemical Informatics Grid (… HTS Tools, Quantum Calculations, CIS); …
Other labels: Application Services; MIS; Information/Knowledge; Collaboration; Portals; Compute/Supercomputer; Instrument/Sensor
Core Low Level Grid Services: Policy; Data Access/Storage; Discovery; Security; Messaging; Workflow; Metadata; Management
Physical Network
M(B,C)IS is Molecular (Bio, Chem) Information System supporting specific metadata (CML, CellML, SBML) and physical representations

Comments on Grid Components
Support GT4 and WS-I+(+); Support Java and .NET
Portals: all services will have a portlet interface
Compute Grid: this is some sort of Condor Grid (as used by Cambridge)
Supercomputer Grid: (extended) TeraGrid
Workflow, Metadata, Information Management: learn from Taverna, link with BPEL style workflow, link with other Semantic Grid/metadata services
Instruments: learn from CIMA/Reciprocal Net, compare with Sensors in LEAD/SERVOGrid
MIS/CIS: see if idea sensible; in any case need CML, LSID, Molecular visualization
Application Services: need a wizard. Support "filters" (Wild) and loosely coupled simulations (Baik)
Data: link to PubChem and Bioinformatics; link to Baik database
Discovery: extended UDDI
Security: review any special requirements and status of PubChem, caBIG, myGrid etc.
Collaboration, Management, Messaging, Policy: nothing special needed
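
The workflow, filter, and metadata components above can be pictured as a chain of loosely coupled services, each taking a record and returning an annotated copy. The Python sketch below is only an illustration under stated assumptions: the three stage functions are toy stand-ins loosely named after steps A, B, and C of the Williams Workflows, and none of the names (`overlap_service`, `run_workflow`, and so on) come from Taverna, BPEL, or any Grid toolkit. It also records which stages ran, as a minimal form of provenance metadata.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch: a record is a plain dictionary and a
# "service" is any function that maps one record to a new, annotated record.
Record = Dict[str, object]
Service = Callable[[Record], Record]


def overlap_service(record: Record) -> Record:
    """Stage A stand-in: longest suffix of seq_a that is a prefix of seq_b."""
    a, b = str(record["seq_a"]), str(record["seq_b"])
    overlap = ""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            overlap = b[:size]
            break
    return {**record, "overlap": overlap}


def nucleotide_service(record: Record) -> Record:
    """Stage B stand-in: GC fraction of the overlapping region."""
    overlap = str(record["overlap"])
    gc_count = sum(base in "GC" for base in overlap)
    fraction = gc_count / len(overlap) if overlap else 0.0
    return {**record, "gc_fraction": fraction}


def protein_service(record: Record) -> Record:
    """Stage C stand-in: number of complete codons in the overlap."""
    return {**record, "codons": len(str(record["overlap"])) // 3}


def run_workflow(steps: List[Tuple[str, Service]], record: Record) -> Record:
    """Apply each named service in order and record the stage names as provenance."""
    provenance: List[str] = []
    for name, service in steps:
        record = service(record)
        provenance.append(name)
    return {**record, "provenance": provenance}


# Assumed example wiring: three stages chained in a fixed order.
WILLIAMS_LIKE = [
    ("A", overlap_service),
    ("B", nucleotide_service),
    ("C", protein_service),
]

result = run_workflow(WILLIAMS_LIKE, {"seq_a": "ATGGCGC", "seq_b": "GCGCTTA"})
print(result)
```

Because each stage only reads and extends a dictionary, a stage could in principle be swapped for a call to a remote service without changing `run_workflow`; that substitution is an assumption of the sketch, not something the slides specify.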