Data Management and Analysis Issues in Microarray Data
Aditya Phatak
Persistent Systems Pvt. Ltd.
http://www.persistent.co.in
Roadmap
Microarray technology basics
Gene expression data analysis
Microarray data management
GeneChip Analysis Core at Washington University
Function Express (at WashU)
Microarray Technology Basics
Elementary Concepts
Cell -> Chromosome -> DNA -> mRNA -> Proteins -> Function
Nearly every cell of the body contains a full set of chromosomes and identical genes
Only a fraction of these genes are “switched on” or “expressed”
Gene expression is a highly complex and regulated process
Life Scientists Want to…
Identify genes that are involved in various diseases.
Reveal new patterns of coordinated gene expression
Find differentially expressed genes (“targets”)
Find co-regulated genes
Find genes responsible for “biological pathways”.
Uncover new categories of genes
DNA Microarrays
Microarrays allow biologists to analyze expression of hundreds of genes within a cell in a single experiment quickly and efficiently
Microarrays can be used to measure gene expression within a single sample or to compare gene expression between two different tissue samples, e.g. healthy and diseased tissue
DNA Microarrays: Technical Foundation
A set of unique probes (usually short, single-stranded DNA sequences) is immobilized as single spots on a solid surface (chemically modified glass chips)
mRNA is extracted from cell or tissue samples.
A cDNA target is generated from the mRNA sample and labeled with a fluorescent dye (e.g. Cy5 or Cy3) or a radioactive label.
The target is incubated with the array, and each probe will bind its complementary target molecule if present.
An example
A DNA Microarray Experiment
Prepare a DNA chip using chosen target DNAs
Generate a hybridization solution containing a mixture of fluorescently labeled cDNAs
Hybridize the mixture with the DNA chip
Detect cDNA intensity using laser technology and store the data in a computer
Analyze the data using computational methods
Types of Microarrays
Two kinds of samples are co-hybridized on the array (e.g. cDNA arrays)
Only one sample is hybridized and comparisons are made between arrays (e.g. Affymetrix oligonucleotide arrays)
Need to deal with different data formats.
Gene Expression Data Analysis
Issues With Output Data

Data Quality

Distinguish false positives from true positives




Replicate chips
Use independent methods to validate results
Dye effects
Position effects
Replication is essential
Preprocessing Tasks
Adjusting data
Filter out genes that are not expressed in any experiments
Log transform data: replace all data values X by log2(X)
Data Normalization
Intensities are scaled/normalized to a selected chip so that multiple chips can be compared
Uses data from a set of controls that have been “spiked” into the DNA and that have an average expression ratio of 1.
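The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chip names, gene names, intensity values and the `min_signal` threshold are hypothetical, and the normalization shown (shifting each chip in log2 space so that its spiked controls match those of the selected reference chip) is one simple way to make the control ratio average 1, not necessarily the method used by any particular tool.

```python
import math
from statistics import mean


def filter_expressed(chips, min_signal):
    """Drop genes whose raw intensity is below min_signal on every chip."""
    genes = set.intersection(*(set(chip) for chip in chips.values()))
    keep = {g for g in genes
            if any(chip[g] >= min_signal for chip in chips.values())}
    return {name: {g: chip[g] for g in keep} for name, chip in chips.items()}


def log2_transform(chip):
    """Replace every intensity X by log2(X); intensities must be positive."""
    return {gene: math.log2(x) for gene, x in chip.items()}


def normalize_to_reference(chips, reference, controls):
    """Log-transform all chips, then shift each one so that the mean log2
    intensity of its spiked controls equals that of the reference chip."""
    logged = {name: log2_transform(chip) for name, chip in chips.items()}
    ref_level = mean(logged[reference][g] for g in controls)
    normalized = {}
    for name, chip in logged.items():
        shift = ref_level - mean(chip[g] for g in controls)
        normalized[name] = {gene: value + shift for gene, value in chip.items()}
    return normalized


# Hypothetical data: chip_B was scanned twice as brightly as chip_A.
chips = {
    "chip_A": {"ctl1": 100.0, "ctl2": 400.0, "g1": 800.0, "g2": 5.0},
    "chip_B": {"ctl1": 200.0, "ctl2": 800.0, "g1": 1600.0, "g2": 8.0},
}
expressed = filter_expressed(chips, min_signal=10.0)
normalized = normalize_to_reference(expressed, "chip_A", ["ctl1", "ctl2"])
```

After normalization, gene g1 has the same log2 value on both chips, so the brightness difference between scans no longer looks like a difference in expression.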
Analysis Issues
Identify genes that are involved in various diseases.
Reveal new patterns of coordinated gene expression
Find differentially expressed genes (“targets”), e.g. find genes that are overexpressed by five-fold or more in 6 out of 7 tumor samples versus 8 out of 10 normal samples
Find co-regulated genes
Find genes responsible for “biological pathways”.
Uncover new categories of genes
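The "6 out of 7 versus 8 out of 10" example can be read in more than one way. The sketch below assumes one possible reading: a tumor sample counts as a hit if its intensity is at least five times that of at least 8 normal samples, and the gene is selected if at least 6 tumor samples are hits. The function name, parameters and intensity values are hypothetical.

```python
def is_overexpressed(tumor, normal, fold=5.0, min_tumor=6, min_normal=8):
    """Return True if enough tumor samples exceed enough normal samples
    by the given fold change (intensities on the raw, unlogged scale)."""
    hits = 0
    for t in tumor:
        # Count the normal samples that this tumor sample beats by `fold`.
        beaten = sum(1 for n in normal if t >= fold * n)
        if beaten >= min_normal:
            hits += 1
    return hits >= min_tumor


# Hypothetical intensities for one gene: 7 tumor and 10 normal samples.
tumor = [500.0, 520.0, 610.0, 480.0, 700.0, 550.0, 50.0]
normal = [80.0, 75.0, 90.0, 60.0, 85.0, 70.0, 95.0, 88.0, 400.0, 420.0]

selected = is_overexpressed(tumor, normal)
```

A rule like this tolerates a few outlying samples on each side, which is the point of asking for 6 of 7 and 8 of 10 rather than all samples.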
Data Mining: Extracting Meaningful Patterns
Supervised methods: You have a priori knowledge of the biological system and are looking for specific patterns, e.g. neighbourhood analysis, supervised tree harvesting
Unsupervised methods: Identify patterns that you could not necessarily have been aware of beforehand, e.g. hierarchical clustering, K-means clustering, self-organizing maps (SOM), principal component analysis (PCA)
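As an illustration of the unsupervised case, here is a minimal sketch of agglomerative hierarchical clustering in plain Python. It assumes Euclidean distance between expression profiles and average linkage between clusters; both are choices made for this example, and the gene names and profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, k):
    """Start with one cluster per gene and repeatedly merge the two
    closest clusters until only k clusters remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical log-expression profiles over three conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [3.0, 2.0, 1.0],
    "g4": [2.9, 2.2, 1.1],
}
clusters = hierarchical_clusters(profiles, k=2)
```

Recording the order of merges instead of stopping at k clusters would give the tree (dendrogram) usually drawn next to a clustered expression heat map.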
Example of Hierarchical Clustering
Statistical Analysis
Ad hoc approaches (e.g. ‘fold change’) do not consider variability of measurements
Statistical analysis gives a more “sensitive” and “selective” analysis
Provides an estimate of confidence that the observed gene expression pattern would occur
Rank the genes by statistical scores
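To make the contrast with fold change concrete, the sketch below ranks genes by a Welch t-statistic, which divides the difference in group means by a measure of variability. The choice of the Welch statistic is an assumption for this example, and the gene names and log2 values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups of replicate measurements."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0  # no variability at all; treated as uninformative here
    return (mean(a) - mean(b)) / se


def rank_genes(data):
    """Return gene names ordered by decreasing absolute t-statistic."""
    scores = {gene: welch_t(group1, group2)
              for gene, (group1, group2) in data.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Hypothetical log2 intensities: both genes differ by 2 between the
# group means, so a fold-change cutoff alone cannot tell them apart.
data = {
    "noisy": ([14.0, 6.0, 10.0], [9.0, 3.0, 12.0]),
    "stable": ([10.0, 10.1, 9.9], [8.0, 8.1, 7.9]),
}
ranking = rank_genes(data)
```

The gene with consistent replicates ranks first even though both genes show the same mean difference, which is the extra information a statistical score adds.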
Microarray Data Management
Sharing Gene Expression Data
Goals
Facilitates comparisons between experiments
Improves analysis
Confidence in results
Conduct multivariate analysis of data generated by multiple researchers
Don’t penalize those who share
Tracking All Aspects of Microarray Experiments
An array experiment has many steps
RNA preparation
Array fabrication, Array platform
Scanner setting
Image Analysis
Use of integrated laboratory information management system (LIMS)
Common protocols and language for data sharing
MIAME: Minimum Information About a Microarray Experiment (from MGED)
Sharing Paradigms
What to share
Raw images (TIFF)
Extracted raw spot intensity values with background measurements
Processed data such as average intensity values
List of genes that show clear differential expression
Protection of Intellectual Property
Most array experiments identify dozens of genes of interest, only a few of which can be studied by one lab
Some results might provide substantial intellectual property rights to pharma companies
Which data should be shared, and when?
GeneChip Analysis Core at Washington University
Architecture of GeneChip Core
[Figure: architecture diagram. Labels: Control and Experiment DNA Samples; Hybridize probe to Array; Wash; Scan; Image; Format and Upload Image and Data; Gene Expression Database; Web/Application Server; Web-based Data Analysis Tools; Gene Annotation Databases (UniGene, LocusLink, GO)]
Function Express
Why Function Express?
Existing analysis software provides clustering algorithms
This software lacks gene annotation
It is not possible to visualize genes based on functional classification, chromosomal localization or tissue expression, e.g. “Give me genes that are transcription factors, are expressed in pancreas and are located on chromosome 1p31”
Integration of gene annotation with clustering techniques is vital to understanding the underlying biological process
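The example query above (transcription factors, expressed in pancreas, located on chromosome 1p31) amounts to a conjunction of filters over annotation records. The sketch below shows that idea only; the `GeneAnnotation` record, its fields and the placeholder gene entries are hypothetical and do not reflect the Function Express schema.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """Hypothetical annotation record combining function, tissue and location."""
    symbol: str
    functions: set = field(default_factory=set)
    tissues: set = field(default_factory=set)
    cytoband: str = ""


def find_genes(genes, function, tissue, cytoband):
    """Return symbols of genes that satisfy all three annotation criteria."""
    return [
        g.symbol
        for g in genes
        if function in g.functions
        and tissue in g.tissues
        and g.cytoband == cytoband
    ]


# Placeholder records, not real annotation data.
annotations = [
    GeneAnnotation("GENE_A", {"transcription factor"}, {"pancreas", "liver"}, "1p31"),
    GeneAnnotation("GENE_B", {"kinase"}, {"pancreas"}, "1p31"),
    GeneAnnotation("GENE_C", {"transcription factor"}, {"brain"}, "1p31"),
    GeneAnnotation("GENE_D", {"transcription factor"}, {"pancreas"}, "7q11"),
]

matches = find_genes(annotations, "transcription factor", "pancreas", "1p31")
```

The resulting gene list can then be used to select rows of expression data before clustering, which is the integration the slide argues for.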
Features of Function Express
Annotates genes on chips/experiment automatically
Annotation is updated periodically
Allows examination of gene expression across different experiments conducted on different arrays and on different species
Gene Annotation in Function Express
Provides annotation from the UniGene, LocusLink and GO databases.
These databases are updated frequently
Uses the HomoloGene database to get cross-species annotation
Cross-Species Investigation
Seeing how genes that show differential expression in one experiment on an organism (say mice) correlate with genes from another experiment done in another organism (say human)
Find more about interesting genes
Validation
Microarray Data Schema
Gene Annotation Schema
[Figure: queries Q1, Q2 and Q3 over experiment data and annotation data. Labels: View maintenance using MQO; Append only; Updated frequently; Gene Annotation Databases (UniGene, LocusLink, GO); Update deltas may or may not be available]
Screenshots of Function Express

To create an experiment, the user enters an experiment name and the chips included in the analysis, along with an abscissa value and an x-axis label for each chip.

A comparison of raw (left panel) versus mean-standard deviation centered (right panel) data demonstrates that transformations reveal similar patterns of gene regulation.
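Mean-standard deviation centering can be sketched as follows, assuming it means subtracting each gene's mean across chips and dividing by its standard deviation. The use of the sample standard deviation and the example values are assumptions for this illustration.

```python
from statistics import mean, stdev


def center_profile(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return [0.0 for _ in values]  # flat profile: nothing to scale
    return [(v - m) / s for v in values]


# Two hypothetical genes with the same shape but very different magnitudes.
low = center_profile([2.0, 4.0, 6.0])
high = center_profile([20.0, 40.0, 60.0])
```

Both profiles map to the same centered values, so genes regulated in the same way line up regardless of their absolute intensity.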

The query generator allows the user to create virtually any combination of logical queries using a simple GUI.


The Gene Inspector (A), Gene Annotation (B), Comments (C), and Chip Data Inspector (D) windows are shown. Each window is updated when the probe selection changes in the Spreadsheet window.
Function Express Client
References
1. DJ Lockhart and EA Winzeler, Genomics, Gene Expression and DNA Arrays. Nature (2000) 405(6788):827-836.
2. The Chipping Forecast. http://www.nature.com/ng/chips_interstitial.html Nature Genetics published a special issue (January 1999 Supplement), The Chipping Forecast. It's a collection of more than 10 reviews (60 pages) on different aspects of microarray analysis.
References…
3. John Quackenbush, Computational Analysis of Microarray Data. Nature Reviews Genetics (June 2001) Volume 2
4. Kathleen Kerr and Gary Churchill, Statistical design and the analysis of gene expression microarray data. Genet. Res., Camb. (2001) 77: 123-128.
References…
5. A number of opinion/review articles from Nature (June 2001) Volume 2
6. Microarray Gene Expression Database Group (MGED) http://www.mged.org/ Home page for the organization that's trying to establish a data standard for microarray data.