Session id: 40263
Oracle Life Sciences Platform
and 10g Preview
Charlie Berger
Sr. Director of Product Management, Life Sciences
and Data Mining
Oracle Corporation
Welcome to the
Oracle Life Sciences
User Group Meeting
Oracle HQ
Bldg 350 Conference Center
Redwood Shores, CA
September 10th, 2003
8:30 am-7:30 pm
Oracle Life Sciences Day &
User Group Meeting Agenda
8:00-8:30 Breakfast
8:30-8:45 Welcome
8:45-9:45 Oracle's Platform for Life Sciences - New 10G Features Preview
& Solicitation Process for Features in Next Release
Charlie Berger, Oracle Corporation
9:45-10:30 New In Silico Drug Discovery Integrated Demo
Joyce Peng, Oracle Corporation
10:30-10:50 Break
10:50-11:30 European Bioinformatics Institute (EBI), Peter Stoehr
Managing Scientific Literature (Medline) and XML Data Within
Oracle
11:30-12:10 The Wellcome Trust Sanger Institute, Martin Widlake
Implementing a Terascale Data Store (20 TB)
12:10-1:00 Lunch & Wish List Feature Post-it Notes
1:00-1:40 Wyeth Research, Peter Smith
21 CFR PART 11 via Oracle Auditing at Wyeth
Oracle Life Sciences Day &
User Group Meeting Agenda
1:40-2:20 Sequence Search Capabilities in the Database, Myriad
Proteomics
2:20-3:00 Johnson & Johnson, Richard Guida & Rajesh Shah
Building a Secure Infrastructure with Oracle in Life Sciences,
J & J PKI and Secure Connectivity to Oracle
3:00-3:20 Break & Afternoon Refreshments
3:20-4:00 Kyoto University, Japan, Susumu Goto
Integrating Biological Information and Pathways using Oracle,
KEGG at Kyoto University
4:00-4:40 BioMed Central Limited, Matthew Cockerill
Managing Scientific Images with Oracle - Multimedia Database
Improves the Bottom Line
4:40-5:20 Abbott Laboratories, Shon Naeymirad
Electronic Records, 21 CFR Part 11 and Oracle 9i
5:20-5:30 Break
5:30-6:30 ISV Lightning Rounds, Life Sciences ISV Partners
6:30-7:30 ISV Reception and Demo Grounds
Oracle’s Commitment
"My industry is going to become pretty boring soon –
I don't believe you'll ever see this proliferation of informatics
companies or computer companies like you saw
in the decade of the Nineties. The life sciences industry
is where the horizons are wide open. There'll be lots and lots
of companies born, lots of new products, lots of new science
at least for the next 50 years.
Because of that...we've decided to focus heavily
on the life sciences industry."
-Larry Ellison, CEO, Oracle Corporation,
Bio-IT World magazine, premier issue March 2002
Life Sciences Value Chain
[Diagram: the life sciences value chain. Stages: Discovery; Development; Manufacturing, Sales and Marketing; Distribution. Inputs: Public/Private Data, Sample Data. Participants: Biotech / Pharmaceutical Research Labs, Biomedical Firm, Pharmaceutical Company, Contract Research Organization, Regulatory Agency, Pharmaceutical Mfg. Plant, Pharmacy, Hospital.]
Oracle’s Solutions for Life Sciences
[Diagram: business areas covered. Discovery; Development & Clinical; Manufacture / Supply Chain Management; Sales & Marketing; Maintenance; Finance; HR; Projects.]
Database: Manage all your data
Application Server: Run all your applications
Drug Discovery Economics 101
Better Data Management Accelerates Discovery
Goal: Accelerate the Discovery Process
[Chart: costs and sales revenue plotted against years. R&D costs run through four stages (identify and validate targets, identify and validate leads, pre-clinical trials, clinical trials) up to product launch. Sales revenue follows launch, and patent expiry brings competition from generics. Axis marks visible: 15 and 20 years.]
Source: Ernst & Young, Price Waterhouse
Life Sciences Discovery
Genes and Proteins Run the Cell
[Diagram labels: Organism, Cell, Nucleus, Chromosome, Gene (DNA), Gene (mRNA), Protein]
Graphics courtesy of the National Human Genome Research Institute
Life Sciences Challenge
Correlate Biological and DNA Variation
3.2 billion letters of human DNA ~ 2 million variation points (SNPs)
SNP = Single Nucleotide Polymorphism
[Figure: a block of human DNA sequence letters with one variation point highlighted in a callout: agaatttcat at[T/C]gtg gaagaggac. The letters CTT are capitalized in a later row.]
Graphics courtesy of the National Human Genome Research Institute
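The bracketed callout on this slide, at[T/C]gtg, marks a variation point where either T or C can occur. As a minimal sketch of how such notation can be handled in code, the snippet below expands one bracketed SNP into its two allele sequences. The function name `expand_snp` is hypothetical and is not part of the presentation or of any Oracle product.

```python
import re


def expand_snp(sequence):
    """Expand one bracketed SNP such as 'at[T/C]gtg' into its allele sequences."""
    match = re.search(r"\[([ACGT])/([ACGT])\]", sequence, flags=re.IGNORECASE)
    if match is None:
        # No variation point: the sequence is its own single allele.
        return [sequence]
    before = sequence[:match.start()]
    after = sequence[match.end():]
    return [before + allele + after for allele in match.groups()]


alleles = expand_snp("at[T/C]gtg")
print(alleles)
```

Correlating variation with biology then amounts to comparing which allele each sample carries against an observed trait, which is the kind of query the rest of the talk is concerned with.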
Life Sciences Challenge
Correlate Diseases, Genes and Environment
Stroke
Breast cancer
Diabetes
Schizophrenia
Manic-depression
Myocardial Infarction
Hypertension
Obesity
Hyperlipidemia
Inflammatory Bowel Disease
Graphics courtesy of the National Human Genome Research Institute
Life Science Challenge
Exploding Volumes of Data
[Chart: data storage growth from 1994 to 2006 on a vertical axis running from 0 to 500TB, with a marker labeled "Data Storage Today"]
“To meet the scientific goals we believe we need to add around 80 - 100TB of storage each year for the next 5 years”
P. Butcher, The Sanger Centre
Life Science Challenge
Many Different Kinds of Data
Genomics
Proteomics
Modeling
Pathways
Clinical
Pharmacogenomics
Functional Genomics
Cheminformatics
Graphic modified from original courtesy of Sun Microsystems
Life Science Challenge
Just A Few Biological Databases
Life Science Challenge
Typical Research Environment
[Diagram: an Industrial Research Lab with Local Databases and Local Copies, linked to Public Databases, Private/Service Databases, and Partner or Collaborator Data]
Access heterogeneous data
Integrate a variety of data types
Manage vast quantities of data
Find patterns and insights
Collaborate securely
Oracle Vision :
At the core is a data management platform
[Diagram labels: Clients (Browser, Mobile Device); Oracle10g App Server; Oracle10g Database Server]
Run All Your Applications
Manage All Your Data
Introducing Oracle 10g
• Runs all your applications
• Stores all your information
• Highly scalable, available, reliable
• Secure
• Easy to manage
– Make individual systems self-managing
– Manage thousands of servers at once
Oracle’s Platform for Life Sciences
Genomics
Proteomics
Cheminformatics
Pathways
Clinical
1. Access heterogeneous data
2. Integrate a variety of data types
3. Manage vast quantities of data
4. Find patterns and insights
5. Collaborate securely
Oracle Life Sciences Platform
Access heterogeneous data
Integrate a variety of data types
Manage vast quantities of data
Find patterns and insights
Collaborate securely
Oracle Life Sciences Platform
Transparent Gateways: Fast access using Oracle OCI
Generic Gateways: Access any data using ODBC
Distributed Queries: Perform searches across domains
External Tables: Ability to index and query external files
SQL Loader: High performance data loader
MySQL Toolkit: Easily move MySQL data into Oracle
Merge/Upsert: Enabling update and insert in one step
Transportable Tablespaces: Rapidly exchange tables
Oracle Streams: Rule-based subscription for information sharing
UltraSearch: Search external sites & repositories
XML DB: Flexibly manage data
LOBs: Manage unstructured data
Text: Index & query text, e.g. literature searches
interMedia: Store & manage images
iFS/Files: Share documents
Extensibility Framework (Data cartridges): Manage complex scientific data
Real Application Clusters: Linear scalability
Application Server: Provide scalability for the middle tier
Data Mining: Discover patterns & insights
Statistics: Perform basic statistics
Table Functions: Implement complex algorithms
OLAP & Discoverer: Interactive query & drill-down
Security: Enforce security
Auditing: Create audit trail to facilitate FDA compliance
Collaboration Suite: Collaborate securely
Workflow: Automate laboratory & business processes
Oracle Portal: Build personalized portals
Web Services: Standard communication between applications
Example data sources shown: PubMed, MySQL, GenBank, SwissProt, SP-ML
1. Access Heterogeneous Data
[Diagram labels: UltraSearch, External Sites, Distributed query, Flat files, External Table, Sybase, MySQL, Generic Connectivity, MySQL Migration Toolkit, DBlinks, Transportable Tablespaces, DB2, Transparent Gateway (shown twice)]
1. Access Heterogeneous Data
• Oracle Transparent Gateways
– Integrate data from disparate systems
• Generic Connectivity
– ODBC/JDBC connectivity
• External Tables
– Access data from flat files
• Distributed Queries
– Query across multiple Oracle and heterogeneous data sources
• Transportable tablespaces
– Rapidly move tablespaces between Oracle databases
• SQL*Loader
– High performance data loader
• Oracle Streams
– Rule-based subscription for information sharing
• Dblinks
– Connectivity between databases
• UltraSearch
– Query range of data repositories (web sites, files, email, databases, etc.)
• Migration Toolkits
– Tools to facilitate movement of data into Oracle
• Merge / Upsert
– Update and insert in one step
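The last item, "update and insert in one step", can be illustrated with a small sketch. The example below uses Python's built-in `sqlite3` module and SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or later) as a stand-in. It is not Oracle's own syntax, and the `gene` table and its rows are hypothetical illustration data.

```python
import sqlite3

# In-memory database with one pre-existing row (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO gene VALUES ('BRCA1', 'old description')")

# Incoming batch: one row already exists (update), one is new (insert).
incoming = [
    ("BRCA1", "breast cancer 1"),
    ("TP53", "tumor protein p53"),
]

# A single statement handles both cases: insert when the key is new,
# otherwise update the existing row with the incoming value.
conn.executemany(
    """
    INSERT INTO gene (symbol, description) VALUES (?, ?)
    ON CONFLICT(symbol) DO UPDATE SET description = excluded.description
    """,
    incoming,
)

rows = dict(conn.execute("SELECT symbol, description FROM gene"))
print(rows)
```

Without an upsert, the loader would need a lookup per row followed by a separate INSERT or UPDATE, which is slower and harder to make safe when public databases are refreshed in bulk.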
2. Integrate a Variety of Data Types
Genomics
Proteomics
Modeling
Pathways
Clinical
Pharmacogenomics
Functional Genomics
Cheminformatics
Graphic modified from original courtesy of Sun Microsystems
2. Integrate a Variety of Data Types
• XML DB
– Unite XML content and relational data
– SQL & XML become one
• LOBs
– Manage unstructured data
• Internet File System (Oracle Files)
– Manage files and folders
• Text
– Index and query of text content & documents (Word, Powerpoint, HTML, Adobe PDFs, etc.)
• interMedia
– Manage audio, video and image data
European Bioinformatics Institute (EBI)
• Hosts major public databases (e.g. SwissProt, EMBL Nucleotide Sequence Database, Medline) on Oracle. (Total: > 5 TB)
• Uses Oracle XML DB and Oracle Text for Medline – in development.
– Size: 11 million records, 200 GB
• Uses Oracle9i Database and Application Server.
2. Integrate a Variety of Data Types
Extensibility Framework (Data Cartridges)
- Manage complex scientific data
Oracle9i
Server
Chemical Searching
• Chemistry searching requires special techniques
– Chemical name is not unique, e.g. “Viagra®” and “sildenafil citrate”
– Chemists think graphically
[Figure: chemical structure drawing of the named compound]
• The solution:
– A graphical user interface
– Specialized operators such as substructure search (“sss”) = a chemical “contains”
[Figure: a drawn query fragment "finds" the structures that contain it]
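To make the idea of a chemical "contains" concrete, the sketch below treats molecules as graphs of atoms and bonds and checks whether a query fragment can be mapped onto part of a larger molecule. It is a deliberately simplified illustration: it ignores bond orders, hydrogens, stereochemistry and aromaticity, and it is not how any vendor's data cartridge is implemented. The function name and the toy molecule (heavy atoms of 2-chloroethanol) are assumptions made for the example.

```python
def contains_substructure(mol_atoms, mol_bonds, query_atoms, query_bonds):
    """Return True if the query graph can be embedded in the molecule graph.

    Atoms are element symbols; bonds are pairs of atom indices.
    """

    def adjacency(atom_count, bonds):
        neighbours = {i: set() for i in range(atom_count)}
        for a, b in bonds:
            neighbours[a].add(b)
            neighbours[b].add(a)
        return neighbours

    mol_adj = adjacency(len(mol_atoms), mol_bonds)
    query_adj = adjacency(len(query_atoms), query_bonds)

    def extend(mapping):
        # mapping[q] is the molecule atom chosen for query atom q.
        q = len(mapping)
        if q == len(query_atoms):
            return True
        for m, element in enumerate(mol_atoms):
            if m in mapping or element != query_atoms[q]:
                continue
            # Every bond to an already-placed query atom must exist in the molecule.
            if all(mapping[n] in mol_adj[m] for n in query_adj[q] if n < q):
                if extend(mapping + [m]):
                    return True
        return False

    return extend([])


# Toy molecule: Cl-C-C-O (heavy atoms of 2-chloroethanol).
atoms = ["Cl", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]

has_c_cl = contains_substructure(atoms, bonds, ["C", "Cl"], [(0, 1)])
has_c_n = contains_substructure(atoms, bonds, ["C", "N"], [(0, 1)])
has_cl_c_o = contains_substructure(atoms, bonds, ["Cl", "C", "O"], [(0, 1), (1, 2)])
print(has_c_cl, has_c_n, has_cl_c_o)
```

The third query fails even though the molecule contains Cl, C and O, because no single carbon carries both the chlorine and the oxygen. That is the difference between matching by structure and matching by name or formula.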
MDL Information Systems, Inc.
• MDL Discovery Framework
A multi-tier system for managing and integrating discovery data and workflows
– Domain-specific application and database services and API
– Chemistry rules, drawing, and rendering
– Single application access to multiple DBs and services
• Key Advantages
– Integrate data sources across R&D
– Easily create web or client solutions
– Quickly adopt new tools and methods for development
• www.mdl.com
• Oracle Features
– Oracle 8i/9i Database Extensibility Option (chemical data cartridge)
– Replication support
– Oracle9iAS J2EE services
IDBS
• The ActivityBase Suite
– Capture, manage and use chemical and biological data in life sciences discovery
– Manage full range of disparate data types
– The leading application for drug discovery research worldwide
• Key Advantages
– Integration framework for cheminformatics and bioinformatics data
– Rich data context enables data quality
– Supports manual and automated data capture & management
– Maximizes the value of discovery data
• www.id-bs.com
• Oracle Features
– Chemistry cartridge (ChemXtra)
– PL/SQL stored procedures
– JAVA stored procedures
– XML
– Materialized views
– Data warehousing
– 9i compatible
3. Manage Vast Quantities of Data
• Grid support in Oracle 10g
• Oracle Scales to Petabytes
– Largest life sciences databases run Oracle
– Oracle 80% market share - IDC
• Partitioning
– Divide and conquer
• Oracle 10g Application Server
– Provide scalability for middle tier
• Oracle Data Guard
– Protect data from human or system failures
[Chart: data storage growth from 1994 to 2006 on a vertical axis running from 0 to 500TB, with a marker labeled "Data Storage Today"]
3. Manage Vast Quantities of Data
Support for Grid
• Distributed queries, External Tables, Security, RAC
• Grid Access to Oracle Utilities through Globus Resource Allocation Manager (GRAM)
– Export, Import, SQL*Plus
• Grid Access to Oracle 10g Database
– Invoke PL/SQL routines specified in Globus Resource Specification Language
• Grid Resource Information Service (GRIS) for Oracle Database
– Discover & monitor Oracle databases
3. Manage Vast Quantities of Data
• Real Application Clusters (RAC)
– Start with one server, one database and grow
as you grow
– Linear scalability out of the box
– Save on Hardware and Storage costs
– Works with ALL applications
– Fail-over transparent to users
– Easy to administer
[Diagram labels: Data Loads, Proteomics Portal, Sample/Lab, High-speed interconnect, A-Z]
Oracle Real Application Clusters
Works for All Applications
Oracle
1. Add new node
2. Start instance on new node
No Code Change
Oracle Real Application Clusters
Greater Than 85% Scalability
[Chart: scalability from 0% to 100% (vertical axis) for clusters of 1, 2, 4, 8 and 16 nodes (horizontal axis)]
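The slide does not say how the scalability percentage was measured. One common reading is scaling efficiency: the throughput of an n-node cluster divided by n times the single-node throughput. The sketch below works through that definition. The throughput figures are hypothetical and chosen only to show the arithmetic; they are not measurements from the presentation.

```python
def scaling_efficiency(throughput_one_node, throughput_n_nodes, nodes):
    """Fraction of ideal linear speed-up actually achieved by the cluster."""
    return throughput_n_nodes / (nodes * throughput_one_node)


# Hypothetical numbers: 1,000 transactions/s on one node,
# 13,600 transactions/s on sixteen nodes.
efficiency = scaling_efficiency(1000, 13600, 16)
print(f"{efficiency:.0%}")
```

Under this definition, perfect linear scaling gives 100%, and anything lost to interconnect traffic or contention between nodes lowers the figure.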
Genentech, Inc.
• Leading biotech company
– Over 2 TBs of data in Oracle
– Oracle serves as a centralized information resource for gene searching and database cross referencing.
– Oracle used for the entire pipeline from research to clinical data to manufacturing and sales applications.
• Key Advantages of Oracle
– Improved performance
– Greater reliability
– Genentech's corporate goal is 99.999% availability in a 24x7 environment
• Oracle Environment
– Oracle 9i database
– Real Application Clusters
• "Oracle9i Real Application Clusters provide the foundation for the scalable and highly available database infrastructure we require to meet our growing data demands in all areas of our business."
--Scooter Morris, Genentech, Inc.
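To put the 99.999% availability goal in perspective, the short calculation below converts an availability percentage into the downtime it permits over a 365-day year. The function name is hypothetical; the arithmetic is the standard conversion.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(99.999)
print(f"{five_nines:.2f} minutes per year")
```

At "five nines" the budget is a little over five minutes of downtime a year, which is why transparent fail-over matters in a 24x7 environment.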
The Dragon Genomics Center
of Takara Bio Inc.
The Dragon Genomics Center of Takara Bio Inc., specializing in large-scale
sequencing, is among the highest speed genome-analyzing centers in Asia.
• High-Level Project Goals
– Manage data throughout every step of a complicated process
– Create a laboratory information management system (LIMS) enabling large scale sequencing
– Provide reliable back up and recovery of vast amounts of data
• Key Benefits
– Provided easy access and management for vast amounts of data
– Ensured scalability needed to accommodate future growth
• Oracle Environment
– Oracle Database Enterprise Edition
– Oracle9iAS Enterprise Edition
• "We trust Oracle in its ability to run terabyte-class databases in clustered environments with high availability. And we're pleased to say that Oracle has not disappointed us."
-- Toru Suzuki, Project Manager, Dragon Genomics Center, Takara Bio Inc.
Bioinformatics Center Institute for
Chemical Research Kyoto University
The Bioinformatics Center Institute for Chemical Research Kyoto University is leading
biotechnology research thanks to its comprehensive studies in various areas, including the life
sciences, information sciences, chemistry and physics.
“In order to manage this massive amount of genetic
information and to operate efficiently, it is essential to have
a platform with paramount stability. Our web site receives
accesses from all over the world continuously, 24 hours a
day. In order to offer the latest information under such
circumstances, performance is also an issue. In this sense,
the Oracle Database was the most appropriate since it can
handle this enormous amount of data in a fast and stable
manner, 24 hours a day.”
– Professor and Director Minoru Kanehisa, Bioinformatics Center Institute for
Chemical Research Kyoto University
4. Find Patterns and Insights
• Oracle Data Mining
– Find relationships and clusters associated with healthy and diseased states
   • Naïve Bayes, Adaptive Bayes Networks, Attribute Importance, Association Rules, K-Means, O-Cluster, SVM, NMF algorithms
   • Data Mining for Java (DM4J) GUI wizards and results browser
• Oracle Discoverer & Oracle OLAP
– Interactive query & drill-down
• Statistical functions
– Perform basic statistics in Oracle
   • e.g. summary statistics (mean, stdev, median, quantiles), hypothesis testing, distribution fitting, correlations, linear regression
• Oracle Text & Text Mining
– Classify & cluster documents relevant to area of interest
• Table Functions
– Implement complex algorithms within the database
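As a rough illustration of the "basic statistics" named above, the sketch below computes summary statistics and a Pearson correlation with Python's standard library. It shows the calculations themselves, not Oracle's statistical functions, and the expression and dose values are hypothetical.

```python
import math
import statistics

# Hypothetical measurements: expression level observed at two dose levels.
expression = [2.1, 2.4, 1.9, 3.8, 4.1, 3.9]
dose = [1, 1, 1, 2, 2, 2]

summary = {
    "mean": statistics.mean(expression),
    "stdev": statistics.stdev(expression),      # sample standard deviation
    "median": statistics.median(expression),
    "quartiles": statistics.quantiles(expression, n=4),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


correlation = pearson(dose, expression)
print(summary, correlation)
```

The point of running such functions inside the database, as the slide proposes, is that the data does not have to be exported to a separate statistics package first.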
4. Find Patterns and Insights
Life Sciences data: Functional Genomic Databases, Clinical Databases, Proteomics Database, Pharmacological databases
Deductive Analysis: Answer complex questions about the relationships in genomic, clinical and pharmacological data
Inductive Analysis: Finding relationships for classification, class discovery and prediction
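Class discovery, one of the inductive tasks listed here, is often done by clustering. The sketch below is a minimal K-Means implementation, K-Means being one of the algorithms named on the previous slide. It is a plain-Python illustration rather than Oracle Data Mining's implementation, the sample points are hypothetical, and the starting centroids are passed in explicitly so that the result is reproducible.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Cluster points (tuples of floats) around the given starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-feature samples forming two visibly separate groups.
samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, clusters = kmeans(samples, initial_centroids=[(1.0, 1.0), (5.0, 5.2)])
print(centroids)
```

No labels are supplied: the two groups emerge from the data alone, which is what distinguishes class discovery from classification against known classes.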