Session : 40382
Life Sciences:
Data Revolution
Building Gene Expression Databases
Mahendra Navarange
Microarray Centre
MRC Clinical Sciences Centre and Imperial College, UK
Agenda
What is Life Sciences?
MiMiR : database for gene expression data
Data acquisition process and data characteristics
System requirements
Design issues
Code snippets
What is Life Sciences ?
Includes
Biology
BioTechnology
Chemistry
Pharmaceuticals
Agriculture / Plant Science
Environmental Sciences
????
Objective
Understand the molecular and evolutionary basis
of living organisms
Focus Areas
Genomics
Human Genome Project
Draft announced in 2000 (published February 2001)
Finished version on 14 April 2003
Sequencing data doubles every year
Transcriptomics
Study of transcription (gene expression)
Proteomics
Study of translation (protein synthesis)
Courtesy F. Hoffmann-La Roche Ltd.
Data…Data…Data
Sanger Centre 5TB
Celera ~ 100TB+ (2001)
[Chart: data volume in TB (axis 0 to 700) by year, 1999 to 2007]
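The "doubles every year" rule of thumb from the Focus Areas slide implies exponential growth. The sketch below shows the arithmetic only; the class name, method name and the 5 TB starting value used in `main` are illustrative assumptions, not a reconstruction of the chart.

```java
// Illustrative sketch of yearly doubling: volume(n) = start * 2^n.
// Names and the starting value are assumptions made for this example.
final class DataGrowth {

    /** Projected volume in TB after the given number of whole years of doubling. */
    static double projectTb(double startTb, int years) {
        if (years < 0) {
            throw new IllegalArgumentException("years must be non-negative");
        }
        return startTb * Math.pow(2.0, years);
    }

    public static void main(String[] args) {
        double startTb = 5.0; // hypothetical starting volume
        for (int year = 0; year <= 3; year++) {
            System.out.println("year +" + year + ": " + projectTb(startTb, year) + " TB");
        }
    }
}
```

Under this assumption a store grows eightfold in three years, which is why storage capacity has to be planned well ahead of current volume.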
Data Revolution in Life Sciences
Impact of technology
High throughput platforms (HTP)
– Robotics
– Miniaturisation
Data driven science
Datawarehousing technologies
Data mining and visualisation software
Life Sciences
Information
Technology
Databases
Genomics
Sanger
NCBI
TIGR
KEGG
Transcriptomics
ArrayExpress
Proteomics
Protein Databank (PDB)
SWISSPROT
Entrez
Using Life Sciences Data
Identify causes of genetic diseases
Discover new drug compounds
Personalised medicine
Develop new diagnostics
Drug Discovery Pipeline
Target Identification -> Target Validation -> HTP Screening -> Hits -> Leads -> Clinical Trials -> FDA
Life Sciences : The Future
“… biology is changing from a purely laboratory-based science to an information-based science.”
Eric Lander, Director, Whitehead Institute MIT
Agenda
✓ What is Life Sciences?
MiMiR: database for gene expression data
Data acquisition process and data characteristics
System requirements
Design issues
Code snippets
Transcriptomics
Comparing gene expression
across databases
Collaborate to share expertise
Benefits
Diagnostics
Screen target drug compounds
Identify toxic side effects
Screen patients for clinical trials
Workflow
[Diagram; labelled elements: Literature, Experiment design, HTP, Data, Local DB, Preliminary Analysis, Further Analysis, GO, NCBI, Collaboration]
HTP Microarray Platform : Hardware
Courtesy Affymetrix Inc., Dell Inc
Microarray Data Acquisition
Courtesy Fisher Scientific
Courtesy Affymetrix Inc.
Microarray Data
High density microarray
~500,000 spots of ~18 µm size
>20,000 genes
Typical file size 45MB
No. of files produced in a typical experiment: 10-20
Courtesy Affymetrix Inc.
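As a rough sizing check on the figures above (about 45 MB per file, 10 to 20 files per experiment), the sketch below computes raw file volume per experiment. The class and method names are hypothetical, and the calculation ignores database overhead such as indexes.

```java
// Rough sizing sketch: raw file volume for one microarray experiment.
// The 45 MB and 10-20 file figures come from the slide; names are illustrative.
final class ExperimentSizing {
    static final int TYPICAL_FILE_MB = 45;

    /** Raw storage in MB for an experiment that produces the given number of files. */
    static int rawStorageMb(int fileCount) {
        if (fileCount < 0) {
            throw new IllegalArgumentException("fileCount must be non-negative");
        }
        return fileCount * TYPICAL_FILE_MB;
    }

    public static void main(String[] args) {
        System.out.println("10 files: " + rawStorageMb(10) + " MB");
        System.out.println("20 files: " + rawStorageMb(20) + " MB");
    }
}
```

A typical experiment therefore yields roughly 450 to 900 MB of raw files.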
Life Sciences Data Explosion
Data Characteristics
Image data generated by HTP platforms, annotation by researchers
Large volume and size
Varied data types
Datawarehousing challenges
Non-summarisable
High dimensionality
Limited knowledge of underlying biological processes
No standard industry data models or best practices
Agenda
✓ What is Life Sciences?
MiMiR: database for gene expression data
✓ Data acquisition process and data characteristics
System requirements
Design issues
Code snippets
System Requirements
Seamless data integration
Handle wide range of datatypes
Processor intensive and I/O intensive
Exponential growth in data storage
Open architecture, collaboration
System Requirements
Rapid changes: new databases, technologies and instruments
Competitive pressures, quick response, low access times
Plug and play capability
Security
MIcroarray Data MIning Resource
MiMiR – Microarray Datawarehouse
~250GB, expected to double in the next few months
~2500 images, over 1500 BioAssays
52 tables, largest table 15GB
Infrastructure
Oracle 9i Release 1 on Windows 2000
Dell PowerEdge Quad Processor, 2 GB memory, 400 GB hard disk
1 TB NAS capacity
Requirements vs. Solutions
Integrate different types of data sources
Use of XML for data exchange
Use of Oracle UltraSearch
Efficient data retrieval
Stringent response time standards on procedures
Index-Organized Tables, Partitioning
Security
Firewall
Single Sign-On servers (in progress)
Rapid change management
BC4J framework, JDeveloper
Extreme programming, prototyping
MiMiR System Architecture
[Diagram; labelled elements: Ext Ref, Images, Annotation, Blast, Spot Info, MiMiR, MAGE-ML, ArrayExpress, XSQL, XSU, XDK, BC4J, JSP, JClient, JDeveloper, 9iAS Admin, Application Server, Private]
Oracle Products Used
Oracle 9i Database Server/Client (Release 1)
Partitioning
Join indexing
Oracle 9i JDeveloper (9.0.2)
Oracle 9i Application Server (BC4J)
Oracle XML features
Oracle PL/SQL packages for XML
Oracle XSQL publishing framework
XDK (DOMParser and SAXParser)
XSU
Oracle Data Mining (Future)
Oracle Collaboration Suite (Future)
Why Oracle ?
Readily scalable
Manage wide variety of data types
Integrated development tools
Support XML and Java
High performance middleware
Secure collaboration
Agenda
✓ What is Life Sciences?
MiMiR: database for gene expression data
✓ Data acquisition and profiling
✓ System requirements
Design issues
Code snippets
Oracle and XML :Design Issues
Storage
– Storing XML in tables
– Storing XML in CLOBs
– Hybrid
Generation
– XDK for Java, PL/SQL
– XSU
Transformation
– XSL Stylesheet
– Views
Processing
– XDK DOMParser
– XDK SAXParser
Searching
– XPATH
– Oracle Text
Publishing
– XSQL publishing framework
– XSL
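The Processing and Searching choices above can be illustrated with the standard JAXP APIs that ship with the JDK, used here in place of the Oracle XDK parsers named on the slide. This is a minimal sketch under that assumption: the class name, method names and the second sample row are hypothetical. SAX streams through the document without building a tree, while DOM loads it fully so that XPath can be evaluated against it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch using JDK JAXP classes, not the Oracle XDK DOMParser/SAXParser.
final class XmlProcessingSketch {

    /** SAX: streams through the document and counts ROW elements without building a tree. */
    static int countRows(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("ROW".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    /** DOM + XPath: loads the whole document, then evaluates an expression to a string. */
    static String xpathValue(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Second row is made-up sample data for illustration.
        String xml = "<ROWSET>"
                + "<ROW num=\"1\"><CHIP_TYPE>MG-U74Av2</CHIP_TYPE></ROW>"
                + "<ROW num=\"2\"><CHIP_TYPE>EXAMPLE-CHIP</CHIP_TYPE></ROW>"
                + "</ROWSET>";
        System.out.println(countRows(xml));
        System.out.println(xpathValue(xml, "/ROWSET/ROW[@num='2']/CHIP_TYPE"));
    }
}
```

For result sets of hundreds of thousands of rows, the SAX style keeps memory use flat, whereas the DOM style trades memory for random access and XPath queries.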
Oracle and XML : XSQL Example
<?xml version="1.0" encoding="windows-1252"?>
<!--
 | Uncomment the following processing instruction and replace
 | the stylesheet name to transform output of your XSQL Page using XSLT
 <?xml-stylesheet type="text/xsl" href="YourStylesheet.xsl" ?>
-->
<?xml-stylesheet type="text/xsl" href="mimirArray.xsl"?>
<xsql:query connection="micro" xmlns:xsql="urn:oracle-xsql">
  select * from array
</xsql:query>
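The XSQL page above runs the query, produces ROWSET/ROW XML and passes it through the referenced stylesheet. That transformation step can be sketched in isolation with the JDK's `javax.xml.transform` API. The stylesheet below is a hypothetical stand-in (mimirArray.xsl is not shown in the slides), and `ARRAY_NAME` is an assumed column name used only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch of applying an XSL stylesheet to ROWSET/ROW XML with standard JAXP.
final class StylesheetSketch {

    // Hypothetical stylesheet: emits each ROW's ARRAY_NAME followed by a semicolon.
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/ROWSET\">"
          + "<xsl:for-each select=\"ROW\">"
          + "<xsl:value-of select=\"ARRAY_NAME\"/><xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows in the ROWSET/ROW shape.
        String xml = "<ROWSET>"
                + "<ROW num=\"1\"><ARRAY_NAME>A1</ARRAY_NAME></ROW>"
                + "<ROW num=\"2\"><ARRAY_NAME>B2</ARRAY_NAME></ROW>"
                + "</ROWSET>";
        System.out.println(transform(xml, XSL));
    }
}
```

Keeping the presentation in a stylesheet means the query and the output format can change independently, which is the main attraction of the XSQL publishing approach.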
Oracle and XML: Design Issues
Agenda
✓ What is Life Sciences?
MiMiR: database for gene expression data
✓ Data profiling
✓ System requirements
✓ Design issues
Code snippets
An Example
Creating XML from 500,000 records in the database
Solution 1
Using XSU Java API to get XMLDOM.
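import java.sql.Connection;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.sql.query.OracleXMLQuery;

// createConnection is a helper class that is not shown on the slide
Connection conn = createConnection.createConnection();
String query = "SELECT * FROM IMAGE_QUANTITATION i "
             + "WHERE QUANT_FILENAME = 'PMB2002011001Aaa'";
OracleXMLQuery q1 = new OracleXMLQuery(conn, query);
q1.keepCursorState(true);
// Build the whole result set as an in-memory DOM tree, then print it
XMLDocument xmlDoc = (XMLDocument) q1.getXMLDOM();
xmlDoc.print(System.out);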
Solution 2
Using XSU Java API to get XMLString.
// createConnection is a helper class that is not shown on the slide
Connection conn = createConnection.createConnection();
String query = "SELECT * FROM IMAGE_QUANTITATION i "
             + "WHERE QUANT_FILENAME = 'PMB2002011001Aaa'";
OracleXMLQuery q1 = new OracleXMLQuery(conn, query);
q1.keepCursorState(true);
// DOM construction from Solution 1 is skipped:
// XMLDocument xmlDoc = (XMLDocument) q1.getXMLDOM();
// xmlDoc.print(System.out);
// Serialise the result set directly to an XML string instead
System.out.println(q1.getXMLString());
Solution 3
Using dbms_xmlquery package to get XML output from SQL
SELECT dbms_xmlquery.getXML(
  'select * from IMAGE_QUANTITATION where quant_filename = ''PMB2002011001Aaa'''
) FROM dual;
<?xml version = '1.0'?>
<ROWSET>
<ROW num="1">
<IMAGE_ID>PMB2002011003Aaa</IMAGE_ID>
<CHIP_TYPE>MG-U74Av2</CHIP_TYPE>
<ELE_SET_NAME>AFFX-MurIL2_at</ELE_SET_NAME>
<POSITIVE>2</POSITIVE>
<NEGATIVE>5</NEGATIVE>
<PAIRS>20</PAIRS>
<PAIRS_USED>20</PAIRS_USED>
<PAIRS_IN_AVG>19</PAIRS_IN_AVG>
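For comparison, the ROWSET/ROW shape shown above can also be produced row by row with the JDK's StAX writer. This is not one of the three solutions on the slides; it is a sketch of a streaming alternative that assumes rows are already available as column-name to value maps. The class and method names are hypothetical, and column names are assumed to be valid XML element names.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Sketch: emit ROWSET/ROW XML one row at a time with StAX (JDK only, no Oracle XSU).
final class RowsetWriter {

    /** Writes rows to the given writer; only the current row is held while writing. */
    static void writeRowset(Iterable<Map<String, String>> rows, Writer out) {
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("ROWSET");
            int num = 1;
            for (Map<String, String> row : rows) {
                w.writeStartElement("ROW");
                w.writeAttribute("num", Integer.toString(num++));
                for (Map.Entry<String, String> column : row.entrySet()) {
                    w.writeStartElement(column.getKey()); // column name becomes the tag
                    w.writeCharacters(column.getValue()); // value is escaped by the writer
                    w.writeEndElement();
                }
                w.writeEndElement(); // ROW
            }
            w.writeEndElement(); // ROWSET
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    /** Convenience wrapper that returns the XML as a string (suitable for small inputs). */
    static String toRowsetXml(List<Map<String, String>> rows) {
        StringWriter out = new StringWriter();
        writeRowset(rows, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("IMAGE_ID", "PMB2002011003Aaa");
        row.put("CHIP_TYPE", "MG-U74Av2");
        System.out.println(toRowsetXml(List.of(row)));
    }
}
```

In a real pipeline the `Writer` would wrap a file or network stream and the rows would come from a JDBC cursor, so that 500,000 records never need to sit in memory at once.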
Summary
Life sciences research is generating enormous amounts of data using HTP platforms
The data is non-summarisable, distributed and of varied types
Data integration and secure collaboration are key to success
MiMiR
Acknowledgements
Dr. Helen Causton
Prof. Tim Aitman
Dr. Laurence Game
Vihar Wadekar
Helen Figueira
Helen Banks
Nicola Cooley
MGED Data Society
(www.mged.org)
What Next :
Opportunities for collaboration on the development of Knowledge Management Systems for Drug Discovery
Contact: [email protected]
http://microarray.csc.mrc.ac.uk