Session id: 40263
Oracle Life Sciences Platform and 10g Preview
Charlie Berger, Sr. Director of Product Management, Life Sciences and Data Mining, Oracle Corporation

Welcome to the Oracle Life Sciences User Group Meeting
Oracle HQ, Bldg 350 Conference Center, Redwood Shores, CA
September 10th, 2003, 8:30 am-7:30 pm

Oracle Life Sciences Day & User Group Meeting Agenda
8:00-8:30    Breakfast
8:30-8:45    Welcome
8:45-9:45    Oracle's Platform for Life Sciences: New 10g Features Preview & Solicitation Process for Features in Next Release (Charlie Berger, Oracle Corporation)
9:45-10:30   New In Silico Drug Discovery Integrated Demo (Joyce Peng, Oracle Corporation)
10:30-10:50  Break
10:50-11:30  European Bioinformatics Institute (EBI), Peter Stoehr: Managing Scientific Literature (Medline) and XML Data Within Oracle
11:30-12:10  The Wellcome Trust Sanger Institute, Martin Widlake: Implementing a Terascale Data Store (20 TB)
12:10-1:00   Lunch & Wish List Feature Post-it Notes
1:00-1:40    Wyeth Research, Peter Smith: 21 CFR Part 11 via Oracle Auditing at Wyeth
1:40-2:20    Myriad Proteomics: Sequence Search Capabilities in the Database
2:20-3:00    Johnson & Johnson, Richard Guida & Rajesh Shah: Building a Secure Infrastructure with Oracle in Life Sciences (J&J PKI and Secure Connectivity to Oracle)
3:00-3:20    Break & Afternoon Refreshments
3:20-4:00    Kyoto University, Japan, Susumu Goto: Integrating Biological Information and Pathways using Oracle (KEGG at Kyoto University)
4:00-4:40    BioMed Central Limited, Matthew Cockerill: Managing Scientific Images with Oracle (Multimedia Database Improves the Bottom Line)
4:40-5:20    Abbott Laboratories, Shon Naeymirad: Electronic Records, 21 CFR Part 11 and Oracle9i
5:20-5:30    Break
5:30-6:30    ISV Lightning Rounds, Life Sciences ISV Partners
6:30-7:30    ISV Reception and Demo Grounds

Oracle's Commitment
"My industry is going to become pretty boring soon – I don't believe you'll ever see this proliferation of informatics companies or computer companies like you saw in the decade of the Nineties. The life sciences industry is where the horizons are wide open. There'll be lots and lots of companies born, lots of new products, lots of new science at least for the next 50 years. Because of that...we've decided to focus heavily on the life sciences industry."
– Larry Ellison, CEO, Oracle Corporation, Bio-IT World magazine, premier issue, March 2002

Life Sciences Value Chain
[Diagram: the value chain runs from Discovery through Development to Manufacturing, Sales and Marketing. Participants shown: public/private data, sample data, biotech/pharmaceutical research labs, biomedical firms, pharmaceutical companies, contract research organizations, regulatory agencies, pharmaceutical manufacturing plants, distribution, pharmacies and hospitals.]

Oracle's Solutions for Life Sciences
[Diagram: business areas (Discovery; Development & Clinical; Manufacture/Supply Chain Management; Sales & Marketing; Finance; HR; Projects; Maintenance) built on a common platform.]
– Database: manage all your data
– Application Server: run all your applications

Drug Discovery Economics 101: Better Data Management Accelerates Discovery
Goal: accelerate the discovery process
[Chart: R&D costs and sales revenue over a 20-year timeline. Costs accrue while targets and leads are identified and validated and during pre-clinical and clinical trials. Sales revenue begins at product launch and falls after patent expiry, when generics compete.]
Source: Ernst & Young, Price Waterhouse

Life Sciences Discovery: Genes and Proteins Run the Cell
[Figure: organism, cell, nucleus, chromosome, gene (DNA), gene (mRNA), protein. Graphics courtesy of the National Human Genome Research Institute.]

Life Sciences Challenge: Correlate Biological and DNA Variation
– 3.2 billion letters of human DNA
– ~2 million variation points (SNPs)
– SNP = Single Nucleotide Polymorphism
[Figure: a long stretch of human DNA sequence with one variation point highlighted, ...agaatttcatat[T/C]gtggaagaggac... Graphics courtesy of the National Human Genome Research Institute.]

Life Sciences Challenge: Correlate Diseases, Genes and Environment
Diseases shown: stroke, breast cancer, diabetes, schizophrenia, manic-depression, myocardial infarction, hypertension, obesity, hyperlipidemia, inflammatory bowel disease. (Graphics courtesy of the National Human Genome Research Institute.)

Life Science Challenge: Exploding Volumes of Data
[Chart: "Data Storage Today", growth in stored data from 1994 to 2006 on a 0-500 TB scale.]
"To meet the scientific goals we believe we need to add around 80 - 100TB of storage each year for the next 5 years"
– P. Butcher, The Sanger Centre
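The Sanger Centre estimate implies some simple cumulative arithmetic. The minimal Python sketch below assumes the same amount is added every year. `start_tb` is a hypothetical parameter for existing capacity, because the slide does not give one.

```python
from __future__ import annotations


def added_storage_tb(per_year_tb: float, years: int) -> float:
    """Capacity added when the same amount is provisioned every year."""
    return per_year_tb * years


def added_storage_range(low_tb: float, high_tb: float, years: int) -> tuple[float, float]:
    """Low and high totals for a range of annual estimates."""
    return added_storage_tb(low_tb, years), added_storage_tb(high_tb, years)


def yearly_totals(start_tb: float, per_year_tb: float, years: int) -> list[float]:
    """Cumulative capacity at the end of each year, from a starting capacity."""
    return [start_tb + per_year_tb * year for year in range(1, years + 1)]


low, high = added_storage_range(80, 100, 5)
print(f"Added over 5 years: {low}-{high} TB")  # Added over 5 years: 400-500 TB
print(yearly_totals(0, 80, 5))                 # [80, 160, 240, 320, 400]
```

At 80-100 TB a year for five years, roughly 400-500 TB is added on top of whatever is already stored.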
Life Science Challenge: Many Different Kinds of Data
[Figure: genomics, proteomics, functional genomics, pharmacogenomics, cheminformatics, modeling, pathways, clinical. Graphic modified from original courtesy of Sun Microsystems.]

Life Science Challenge: Just a Few Biological Databases

Life Science Challenge: Typical Research Environment
[Diagram: an industrial research lab drawing on public access databases, local databases, local copies, private/service databases and partner or collaborator data.]
The lab needs to:
– Access heterogeneous data
– Integrate a variety of data types
– Manage vast quantities of data
– Find patterns and insights
– Collaborate securely

Oracle Vision: At the Core Is a Data Management Platform
[Diagram: clients (browser, mobile device) → Oracle10g App Server (run all your applications) → Oracle10g Database Server (manage all your data)]

Introducing Oracle 10g
– Runs all your applications
– Stores all your information
– Highly scalable, available, reliable
– Secure
– Easy to manage
  – Make individual systems self-managing
  – Manage thousands of servers at once

Oracle's Platform for Life Sciences
For genomics, proteomics, cheminformatics, pathways and clinical data:
1. Access heterogeneous data
2. Integrate a variety of data types
3. Manage vast quantities of data
4. Find patterns and insights
5. Collaborate securely

Oracle Life Sciences Platform
[Diagram: Oracle components arranged around the five capabilities above. Example data sources shown include PubMed, MySQL, GenBank and SwissProt (SP-ML).]
– Transparent Gateways: fast access using Oracle OCI
– Distributed Queries: perform searches across domains
– External Tables: ability to index and query external files
– Generic Gateways: access any data using ODBC
– Real Application Clusters: linear scalability
– Oracle Portal: build personalized portals
– Application Server: provide scalability for the middle tier
– SQL*Loader: high performance data loader
– Web Services: standard communication between applications
– Merge/Upsert: enabling update and insert in one step
– XML DB: flexibly manage data
– Security: enforce security
– Collaboration Suite: collaborate securely
– interMedia: store & manage images
– Auditing: create audit trail to facilitate FDA compliance
– Workflow: automate laboratory & business processes
– Extensibility Framework (Data Cartridges): manage complex scientific data
– MySQL Toolkit: easily move MySQL data into Oracle
– iFS/Files: share documents
– Data Mining: discover patterns & insights
– Statistics: perform basic statistics
– LOBs: manage unstructured data
– Table Functions: implement complex algorithms
– Text: index & query text, e.g. literature searches
– UltraSearch: search external sites & repositories
– Transportable Tablespaces: rapidly exchange tables
– Oracle Streams: rule-based subscription for information sharing
– OLAP & Discoverer: interactive query & drill-down

1. Access Heterogeneous Data
[Diagram: sources shown are external sites, flat files, Sybase, MySQL and DB2. They are reached through UltraSearch, External Tables, Generic Connectivity, the MySQL Migration Toolkit, DBlinks, distributed query, Transportable Tablespaces and Transparent Gateways.]
– Oracle Transparent Gateways: integrate data from disparate systems
– Generic Connectivity: ODBC/JDBC connectivity
– External Tables: access data from flat files
– Distributed Queries: query across multiple Oracle and heterogeneous data sources
– Transportable Tablespaces: rapidly move tablespaces between Oracle databases
– SQL*Loader: high performance data loader
– Oracle Streams: rule-based subscription for information sharing
– DBlinks: connectivity between databases
– UltraSearch: query a range of data repositories (web sites, files, email, databases, etc.)
– Migration Toolkits: tools to facilitate movement of data into Oracle
– Merge/Upsert: update and insert in one step
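Merge/Upsert covers both cases in one statement: rows whose key already exists are updated and new keys are inserted. Oracle expresses this with the SQL `MERGE` statement. The sketch below is a self-contained stand-in that runs without an Oracle instance. It uses SQLite's `INSERT ... ON CONFLICT DO UPDATE` upsert through Python's built-in `sqlite3` module, which requires SQLite 3.24 or later. The `gene` table and the accession values are hypothetical.

```python
import sqlite3

# Hypothetical annotation table; accession IDs below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (accession TEXT PRIMARY KEY, symbol TEXT, description TEXT)"
)

UPSERT_SQL = """
    INSERT INTO gene (accession, symbol, description)
    VALUES (?, ?, ?)
    ON CONFLICT(accession) DO UPDATE SET
        symbol = excluded.symbol,
        description = excluded.description
"""


def upsert_genes(conn, rows):
    """Insert rows whose key is new and update rows whose key already exists."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, rows)


# Initial load, then a refresh that both revises ACC001 and adds ACC002.
upsert_genes(conn, [("ACC001", "GENE1", "first load")])
upsert_genes(conn, [("ACC001", "GENE1", "revised"), ("ACC002", "GENE2", "new entry")])

rows = conn.execute("SELECT accession, description FROM gene ORDER BY accession").fetchall()
print(rows)  # [('ACC001', 'revised'), ('ACC002', 'new entry')]
```

Because the database handles the match-or-insert decision, a periodic reload of a public source does not need separate "does this key exist?" queries.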
2. Integrate a Variety of Data Types
– XML DB: unite XML content and relational data; SQL & XML become one
– LOBs: manage unstructured data
– Internet File System (Oracle Files): manage files and folders
– Text: index and query text content & documents (Word, PowerPoint, HTML, Adobe PDFs, etc.)
– interMedia: manage audio, video and image data

European Bioinformatics Institute (EBI)
– Hosts major public databases (e.g. SwissProt, EMBL Nucleotide Sequence Database, Medline) on Oracle (total: > 5 TB)
– Uses Oracle XML DB and Oracle Text for Medline (in development); size: 11 million records, 200 GB
– Uses Oracle9i Database and Application Server

Extensibility Framework (Data Cartridges)
– Manage complex scientific data
[Diagram: data cartridges plugged into the Oracle9i Server]

Chemical Searching
Chemistry searching requires special techniques:
– Chemical names are not unique: "Viagra®" and "sildenafil citrate" name the same compound
– Chemists think graphically [structure diagram of sildenafil]
The solution:
– A graphical user interface
– Specialized operators such as substructure search ("sss"), the chemical equivalent of "contains", which finds every structure containing a drawn query fragment
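To make the chemical "contains" concrete, here is a minimal Python sketch of substructure search as subgraph matching over heavy-atom graphs. It is a hypothetical illustration, not the algorithm used by MDL's, IDBS's or any other chemistry cartridge. Production systems also compare bond orders, aromaticity and charges, and they prefilter candidates with structural fingerprints. The helper names `make_molecule` and `contains_substructure` are made up for this example.

```python
def make_molecule(atoms, bonds):
    """Return (labels, adjacency) for a heavy-atom graph.

    atoms: element symbols, indexed from 0; bonds: pairs of atom indices.
    Bond orders, charges and hydrogens are ignored in this sketch.
    """
    labels = dict(enumerate(atoms))
    adjacency = {i: set() for i in labels}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return labels, adjacency


def contains_substructure(target, query):
    """True if every query atom maps to a distinct, same-element target atom
    such that each query bond is also a target bond (subgraph monomorphism)."""
    t_labels, t_adj = target
    q_labels, q_adj = query
    order = list(q_labels)
    mapping = {}  # query atom index -> target atom index

    def extend(k):
        if k == len(order):
            return True  # all query atoms placed
        q = order[k]
        used = set(mapping.values())
        for t, element in t_labels.items():
            if t in used or element != q_labels[q]:
                continue
            # Bonds to already-placed query neighbours must exist in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(k + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend(0)


ethanol = make_molecule(["C", "C", "O"], [(0, 1), (1, 2)])         # C-C-O
dimethyl_ether = make_molecule(["C", "O", "C"], [(0, 1), (1, 2)])  # C-O-C
carbon_carbon = make_molecule(["C", "C"], [(0, 1)])

print(contains_substructure(ethanol, carbon_carbon))         # True
print(contains_substructure(dimethyl_ether, carbon_carbon))  # False
```

Ethanol and dimethyl ether share the formula C2H6O, yet only ethanol contains a carbon-carbon bond. A name or formula lookup cannot make that distinction, but a structure-aware operator can.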
MDL Discovery Framework (MDL Information Systems, Inc.)
A multi-tier system for managing and integrating discovery data and workflows
– Domain-specific application and database services and API
– Chemistry rules, drawing, and rendering
– Single application access to multiple DBs and services
Key advantages
– Integrate data sources across R&D
– Easily create web or client solutions
– Quickly adopt new tools and methods for development
Oracle features
– Oracle8i/9i Database Extensibility Option (chemical data cartridge)
– Replication support
– Oracle9iAS J2EE services
www.mdl.com

IDBS: The ActivityBase Suite
– Capture, manage and use chemical and biological data in life sciences discovery
– Manage the full range of disparate data types
– The leading application for drug discovery research worldwide
Key advantages
– Integration framework for cheminformatics and bioinformatics data
– Rich data context enables data quality
– Supports manual and automated data capture & management
– Maximizes the value of discovery data
Oracle features
– Chemistry cartridge (ChemXtra)
– PL/SQL stored procedures
– Java stored procedures
– XML
– Materialized views
– Data warehousing
– 9i compatible
www.id-bs.com

3. Manage Vast Quantities of Data
– Grid support in Oracle 10g
– Oracle scales to petabytes
  – Largest life sciences databases run Oracle
  – Oracle 80% market share (IDC)
– Partitioning: divide and conquer
– Oracle 10g Application Server: provide scalability for the middle tier
– Oracle Data Guard: protect data from human or system failures
[Chart: "Data Storage Today", growth in stored data from 1994 to 2006 on a 0-500 TB scale.]
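Partitioning "divides and conquers" by splitting one large table into pieces by key. A query then touches only the pieces its predicate can match, which is known as partition pruning. Below is a minimal in-memory Python sketch of range partitioning. It illustrates the idea only and says nothing about how Oracle stores partitions. The class name, year bounds and run labels are hypothetical.

```python
from bisect import bisect_left, bisect_right


class RangePartitionedTable:
    """Toy range-partitioned table: rows are routed to partitions by key."""

    def __init__(self, bounds):
        # Partition i holds keys < bounds[i] (and >= bounds[i-1]); one extra
        # catch-all partition holds keys >= bounds[-1], similar to MAXVALUE.
        self.bounds = sorted(bounds)
        self.partitions = [[] for _ in range(len(self.bounds) + 1)]

    def insert(self, key, row):
        self.partitions[bisect_right(self.bounds, key)].append((key, row))

    def query_range(self, lo, hi):
        """Return (rows, scanned) for lo <= key < hi.

        Partition pruning: only partitions whose key range overlaps
        [lo, hi) are scanned; `scanned` lists their indexes.
        """
        if lo >= hi:
            return [], []
        first = bisect_right(self.bounds, lo)  # partition holding lo
        last = bisect_left(self.bounds, hi)    # partition holding keys just below hi
        scanned = list(range(first, last + 1))
        rows = [row for p in scanned for key, row in self.partitions[p] if lo <= key < hi]
        return rows, scanned


# Hypothetical example: sequencing runs partitioned by year.
table = RangePartitionedTable([2001, 2002, 2003])
for year, run in [(2000, "run-a"), (2001, "run-b"), (2002, "run-c"), (2003, "run-d")]:
    table.insert(year, run)

rows, scanned = table.query_range(2001, 2003)
print(rows, scanned)  # ['run-b', 'run-c'] [1, 2]
```

The query for 2001-2002 reads two of the four partitions and never opens the ones holding 2000 or 2003. With multi-terabyte tables, that skipped I/O is where the benefit comes from.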
Manage Vast Quantities of Data: Support for Grid
– Distributed queries, External Tables, Security, RAC
– Grid access to Oracle utilities (Export, Import, SQL*Plus) through the Globus Resource Allocation Manager (GRAM)
– Grid access to the Oracle 10g Database: invoke PL/SQL routines specified in the Globus Resource Specification Language
– Grid Resource Information Service (GRIS) for Oracle Database: discover & monitor Oracle databases

Manage Vast Quantities of Data: Real Application Clusters (RAC)
– Start with one server and one database, and grow as you grow
– Linear scalability out of the box
– Save on hardware and storage costs
– Works with ALL applications
– Fail-over transparent to users
– Easy to administer
[Diagram: workloads (data loads, proteomics portal, sample/lab) running against one Oracle database on clustered nodes joined by a high-speed interconnect.]

Oracle Real Application Clusters Works for All Applications
To grow the cluster: 1. Add a new node. 2. Start an instance on the new node. No code change.

Oracle Real Application Clusters: Greater Than 85% Scalability
[Chart: scalability (0-100%) measured at 1, 2, 4, 8 and 16 nodes.]

Genentech, Inc.
Leading biotech company
– Over 2 TB of data in Oracle
– Oracle serves as a centralized information resource for gene searching and database cross-referencing
– Oracle is used for the entire pipeline, from research to clinical data to manufacturing and sales applications
Oracle environment
– Oracle9i Database
– Real Application Clusters
Key advantages of Oracle
– Improved performance
– Greater reliability
– Genentech's corporate goal is 99.999% availability in a 24x7 environment
"Oracle9i Real Application Clusters provide the foundation for the scalable and highly available database infrastructure we require to meet our growing data demands in all areas of our business."
– Scooter Morris, Genentech, Inc.

The Dragon Genomics Center of Takara Bio Inc.
The Dragon Genomics Center of Takara Bio Inc., specializing in large-scale sequencing, is among the highest speed genome-analyzing centers in Asia.
High-level project goals
– Manage data throughout every step of a complicated process
– Create a laboratory information management system (LIMS) enabling large-scale sequencing
– Provide reliable backup and recovery of vast amounts of data
Oracle environment
– Oracle Database Enterprise Edition
– Oracle9iAS Enterprise Edition
Key benefits
– Provided easy access and management for vast amounts of data
– Ensured scalability needed to accommodate future growth
"We trust Oracle in its ability to run terabyte-class databases in clustered environments with high availability. And we're pleased to say that Oracle has not disappointed us."
– Toru Suzuki, Project Manager, Dragon Genomics Center, Takara Bio Inc.

Bioinformatics Center, Institute for Chemical Research, Kyoto University
The Bioinformatics Center, Institute for Chemical Research, Kyoto University is leading biotechnology research thanks to its comprehensive studies in various areas, including the life sciences, information sciences, chemistry and physics.
"In order to manage this massive amount of genetic information and to operate efficiently, it is essential to have a platform with paramount stability. Our web site receives accesses from all over the world continuously, 24 hours a day. In order to offer the latest information under such circumstances, performance is also an issue. In this sense, the Oracle Database was the most appropriate since it can handle this enormous amount of data in a fast and stable manner, 24 hours a day."
– Professor and Director Minoru Kanehisa, Bioinformatics Center, Institute for Chemical Research, Kyoto University
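Two figures in this section reduce to simple arithmetic. The first is Genentech's 99.999% ("five nines") availability goal. The second is the RAC chart's claim of greater than 85% scalability. The Python sketch below makes two assumptions. It uses a 365-day year. It also defines scalability as achieved throughput divided by ideal linear throughput, meaning N times the single-node figure. That definition and the throughput numbers are assumptions, since the slide does not say how scalability was measured.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability):
    """Yearly downtime budget for an availability target given as a fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


def scaling_efficiency(throughput_n, throughput_1, nodes):
    """Achieved throughput as a fraction of ideal linear scaling
    (nodes x single-node throughput). Assumed definition, see text."""
    return throughput_n / (nodes * throughput_1)


print(round(allowed_downtime_minutes(0.99999), 2))   # 5.26
print(round(allowed_downtime_minutes(0.999), 1))     # 525.6
print(scaling_efficiency(3600, 1000, 4))             # 0.9
```

Five nines leaves roughly 5.3 minutes of downtime per year, compared with almost 9 hours at 99.9%. Under the assumed definition, a hypothetical 4-node cluster delivering 3,600 transactions per second, against 1,000 on one node, would score 90%.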
4. Find Patterns and Insights
– Oracle Data Mining: find relationships and clusters associated with healthy and diseased states
  – Algorithms: Naïve Bayes, Adaptive Bayes Networks, Attribute Importance, Association Rules, K-Means, O-Cluster, SVM, NMF
  – Data Mining for Java (DM4J): GUI wizards and results browser
– Oracle Discoverer & Oracle OLAP: interactive query & drill-down
– Statistical functions: perform basic statistics in Oracle, e.g. summary statistics (mean, stdev, median, quantiles), hypothesis testing, distribution fitting, correlations, linear regression
– Oracle Text & Text Mining: classify & cluster documents relevant to an area of interest
– Table Functions: implement complex algorithms within the database

Analyzing life sciences data (functional genomic, clinical, proteomics and pharmacological databases):
– Deductive analysis: answer complex questions about the relationships in genomic, clinical and pharmacological data
– Inductive analysis: find relationships for classification, class discovery and prediction
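Of the algorithms listed, k-means is the simplest to show end to end as an example of inductive class discovery. The following is a minimal pure-Python sketch of the standard k-means (Lloyd's) algorithm, not Oracle Data Mining's implementation. The two-gene expression values for six samples are invented purely for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on equal-length numeric tuples.

    Initial centroids are simply the first k points, which keeps the sketch
    deterministic; real implementations use smarter seeding (e.g. k-means++).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (gene A, gene B) expression levels for six samples.
samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
labels, centroids = kmeans(samples, k=2)
print(labels)
```

On this toy data the three low-expression samples end up in one cluster and the three high-expression samples in the other. The clusters carry no "healthy" or "diseased" labels by themselves. Interpreting them still requires clinical annotation, which is where the deductive analysis above comes in.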