Computing Co-Expression
Relationships
Wen-Dar Lin
Contents
• Motivation
• Basic Idea
• Case Studies
– An Example of Single Experiment
– An Example of Time-Course Experiment
• Potential Applications
• Availability
• Future Works
Motivation
• Suppose an array experiment reports a set of differentially displayed genes.
– We would like to know the relationships among these genes.
– These relationships may reveal important modules or motifs relevant to the experiment.
Motivation
• Co-expression relationships are among the most biologically meaningful and most easily computed relationships.
– Co-expression relationships form modules that may imply important biological information.
– They can be computed from the large amount of publicly available array data.
Basic Idea
• Array data can be retrieved from publicly available data repositories
– such as the NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress
• The data should be normalized before computing the co-expression relationships.
– e.g. by the RMA method
Basic Idea
• Defining co-expression relationships
– We say that a co-expression relationship exists between two genes if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a chosen threshold.
slide #   1   2   3    4   …
gene X    1   2   10   3   …
gene Y    5   2   12   4   …
[Figure: scatter plot with axes X and Y]
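The definition above can be sketched in a few lines of Python. This is only an illustration, not the tool kit's code: the function names are made up, the default threshold of 0.7 is borrowed from the case studies, and the two profiles contain only the four values visible in the table (the real profiles continue past the "…").

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def is_coexpressed(xs, ys, threshold=0.7):
    """True if the correlation is greater than or equal to the threshold."""
    return pearson(xs, ys) >= threshold


# Illustration only: the four values shown in the table above.
gene_x = [1, 2, 10, 3]
gene_y = [5, 2, 12, 4]
r = pearson(gene_x, gene_y)
print(f"r = {r:.3f}, co-expressed at 0.7: {is_coexpressed(gene_x, gene_y)}")
```

With these four values the coefficient is about 0.92, so the pair would count as co-expressed at either threshold used in the case studies (0.7 or 0.8).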
Basic Idea
• Properties of the Pearson correlation coefficient
– Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B.
– −1 ≤ Correl(A, B) ≤ 1
[Figure: scatter plot illustrating negative correlation, from http://www.gseis.ucla.edu/courses/ed230bc1/notes1/var1.html]
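For reference, the standard textbook definition of the coefficient can be written out as below. The symbols are assumptions introduced here, not notation from the slides: a_i and b_i stand for the normalized expression levels of genes A and B on slide i, n is the number of slides, and the barred symbols are the means over all slides.

```latex
\[
\mathrm{Correl}(A, B) =
  \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
       {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\;
        \sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}
\]
```

A value near 1 means the two genes rise and fall together across slides; a negative value corresponds to the negative correlation shown in the figure.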
Basic Idea
• Computational assistance
– Given a set of genes of interest
– Compute the co-expression relationships among them
– Identify co-expression clusters
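The three steps above can be sketched in Python as follows. This is a minimal illustration, not the tool kit's implementation: the slides do not say how clusters are identified, so connected components of the thresholded graph are assumed here, and the gene names and expression profiles are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant profile (a convention chosen here)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def coexpression_edges(profiles, threshold):
    """Gene pairs whose correlation is greater than or equal to the threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            edges.add((a, b))
    return edges


def clusters(genes, edges):
    """Connected components of the co-expression graph; isolated genes are dropped."""
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            component.add(gene)
            for other in neighbours[gene] - seen:
                seen.add(other)
                stack.append(other)
        if len(component) > 1:
            result.append(component)
    return result


# Hypothetical normalized expression levels over four slides.
profiles = {
    "g1": [1, 2, 10, 3],
    "g2": [5, 2, 12, 4],
    "g3": [9, 8, 1, 7],
    "g4": [8, 9, 2, 6],
    "g5": [3, 3, 4, 9],
}
edges = coexpression_edges(profiles, threshold=0.7)
found = clusters(profiles, edges)
print(sorted(edges), found)
```

In this toy data g1/g2 and g3/g4 each form a cluster, while g5 correlates with nothing and is left unclustered.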
Case Studies
• We have implemented the aforementioned ideas in a tool kit and applied it to two case studies.
– A single experiment
– A time-course experiment
A Single Experiment
• In this example, an array experiment was
performed
– 178 differentially displayed genes were
identified.
– Co-expression relationships were computed from RMA array data of 300 ATH1 slides downloaded from the NASCarrays
• the sample on each slide was derived, though not exclusively, from roots
• Threshold for Pearson correlation coefficient = 0.7
A Single Experiment
[Figure: co-expression graph with two larger clusters and one minor subcluster]
A Single Experiment
• We may also compute co-expression relationships from all kinds of array experiment data
– Co-expression relationships were identified from RMA array data of 1436 ATH1 slides downloaded from TAIR
• Threshold for Pearson correlation coefficient = 0.7
A Single Experiment
[Figure: co-expression graph with two larger clusters]
A Single Experiment
• Is there any difference between the graph based on root-array data and the graph based on all-array data?
– This can be checked by distinctly marking the clusters of one graph onto the other graph.
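One way to picture this marking step is the sketch below. It is an assumption about how such a comparison could be done, not a description of the tool kit: the function names, gene names, and cluster contents are all hypothetical.

```python
def mark_clusters(target_genes, source_clusters):
    """Label each target gene with the index of the source cluster it
    belongs to, or None if it appears in no source cluster."""
    label_of = {}
    for index, cluster in enumerate(source_clusters):
        for gene in cluster:
            label_of[gene] = index
    return {gene: label_of.get(gene) for gene in target_genes}


def unmarked_clusters(target_clusters, source_clusters):
    """Target clusters that share no gene with any source cluster."""
    source_genes = set().union(*source_clusters)
    return [c for c in target_clusters if not (c & source_genes)]


# Hypothetical clusters from the all-array graph and the root-array graph.
all_array_clusters = [{"g1", "g2", "g3"}, {"g4", "g5"}]
root_clusters = [{"g1", "g2"}, {"g4", "g5", "g6"}, {"g7", "g8"}]

root_genes = set().union(*root_clusters)
marks = mark_clusters(root_genes, all_array_clusters)
root_only = unmarked_clusters(root_clusters, all_array_clusters)
print(marks, root_only)
```

A cluster of the root-array graph that receives no marks at all, like the g7/g8 cluster here, is a candidate for being root-specific.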
A Single Experiment
[Figure: graph annotations: one cluster that should be root-specific; two clusters mapped by the other graph]
A Single Experiment
[Figure: graph annotations: cluster size 9; cluster sizes 47 and 14]
A Single Experiment
• Some remarks
– The number of differentially displayed genes reported
by the experiment is 178
– The number of clustered genes is 47+14+9 = 70
• Reduced by more than 50%
– The co-expression relationships are recovered
• Each cluster may be a module of genes that usually work together.
– Finding tissue-specific co-expression relationships
• Can be done by mapping the graph based on all-array data onto
the graph based on tissue-related-array data.
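As a quick check of the reduction quoted in the remarks, using only the counts given on this slide:

```latex
\[
\frac{178 - (47 + 14 + 9)}{178} = \frac{178 - 70}{178} = \frac{108}{178} \approx 0.61
\]
```

That is, roughly 61% of the reported genes fall outside the three clusters, which is consistent with "more than 50%".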
A Single Experiment
• In addition to clustering genes according to co-expression relationships, we also fished for genes that are potentially co-expressed with them.
– These genes may not have been identified as differentially displayed in the experiment.
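The slides do not describe how the fishing is done, so the following is only one plausible sketch: every gene outside the bait set is tested against each bait and kept if it correlates with enough of them. The `min_hits` parameter, the function names, and all gene names and profiles are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant profile (a convention chosen here)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def fish_genes(baits, candidates, threshold, min_hits=1):
    """Map each candidate gene to the baits it correlates with
    (correlation >= threshold), keeping those with at least min_hits baits."""
    caught = {}
    for name, profile in candidates.items():
        if name in baits:
            continue  # baits themselves are not fished
        hits = sorted(b for b, bait_profile in baits.items()
                      if pearson(profile, bait_profile) >= threshold)
        if len(hits) >= min_hits:
            caught[name] = hits
    return caught


# Hypothetical bait genes (e.g. one cluster) and other genes on the array.
baits = {"bait1": [1, 2, 10, 3], "bait2": [5, 2, 12, 4]}
candidates = {"c1": [2, 3, 11, 4], "c2": [9, 8, 1, 7], "c3": [3, 3, 4, 9]}
caught = fish_genes(baits, candidates, threshold=0.7)
print(caught)
```

Raising `min_hits` makes the fishing stricter, since a candidate must then correlate with several baits rather than just one.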
A Single Experiment
• A GO enrichment analysis was also carried out
– using the GOBU software (gobu.iis.sinica.edu.tw)
– which gives a conceptual view of the clustered genes.
A time-course experiment
• In this example, a time-course array
experiment was performed
– Three time points
– About 800 genes were differentially displayed at one or more time points.
– Based on array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
• Threshold for Pearson correlation coefficient = 0.8
A time-course experiment
[Figure: co-expression graph at time point 1; two groups, each labelled "About 100 genes"]
A time-course experiment
[Figure: co-expression graph at time point 2; two groups, each labelled "About 100 genes"]
A time-course experiment
[Figure: co-expression graph at time point 3; two groups, each labelled "About 100 genes"]
A time-course experiment
• Although the clustering and the time-course expression data show some biological meaning,
– the number of clustered genes (more than 200)
• makes the graph too complex and
• is too large to be interpreted in a short time.
A time-course experiment
• Reducing the number of clustered genes may help
– reduce the complexity of the graph and
– interpret the revealed co-expression modules
• We reduced the graph by removing co-expression relationships that generally exist in the entire plant
– based on RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
– Threshold for Pearson correlation coefficient = 0.7
A time-course experiment
• Edges (relationships) to be removed
[Figure: scatter plot of Y against X; legend: root-related, others]
A time-course experiment
• Edges (relationships) to be retained
[Figure: scatter plot of Y against X; legend: root-related, others]
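Treating each graph as a set of gene pairs, the removal step amounts to a set difference. The sketch below assumes that edges are undirected and have already been computed with the two thresholds mentioned earlier (0.8 on the root-related slides, 0.7 on all slides); the function names and gene names are hypothetical.

```python
def normalise(edges):
    """Store each undirected edge as a sorted tuple so (a, b) equals (b, a)."""
    return {tuple(sorted(edge)) for edge in edges}


def specific_edges(tissue_edges, global_edges):
    """Edges present in the tissue graph but absent from the global graph."""
    return normalise(tissue_edges) - normalise(global_edges)


def common_edges(tissue_edges, global_edges):
    """Edges present in both graphs."""
    return normalise(tissue_edges) & normalise(global_edges)


# Hypothetical edge lists from the root-related graph and the all-array graph.
root_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
all_edges = [("g2", "g1"), ("g5", "g6")]

retained = specific_edges(root_edges, all_edges)
removed = common_edges(root_edges, all_edges)
print(sorted(retained), sorted(removed))
```

Here g1/g2 is co-expressed in both data sets and is removed, while g2/g3 and g4/g5 are retained as root-specific candidates.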
A time-course experiment
[Figure: reduced co-expression graph at time point 1; groups of about 20, about 50, and about 60 genes]
A time-course experiment
[Figure: reduced co-expression graph at time point 2; groups of about 20, about 50, and about 60 genes]
A time-course experiment
[Figure: reduced co-expression graph at time point 3; groups of about 20, about 50, and about 60 genes]
A time-course experiment
• Some remarks
– The number of genes differentially displayed at one or more time points is about 800.
– The number of clustered genes is about 60 + 50 + 20 = 130
• Reduced by more than 80%
– The retained graph contains edges, i.e., gene pairs, that are co-expressed in roots but not in the entire plant
• The recovered clusters should be root-specific.
Potential Applications
• We have created a tool kit that
– computes co-expression relationships based on array
data
• probe names can be replaced by aliases, e.g. ones derived from an orthologous mapping
• so it can be used to study a non-model organism using array data of a model organism.
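The alias idea can be illustrated with a small renaming step applied before any correlations are computed. This is a sketch under stated assumptions, not the tool kit's behaviour: the probe names, alias names, and the policy of dropping unmapped probes and keeping the first probe per alias are all hypothetical.

```python
def rename_probes(profiles, aliases):
    """Return profiles keyed by alias instead of probe name.

    Probes without an alias are dropped; if several probes share an
    alias, the first one in sorted probe order is kept.
    """
    renamed = {}
    for probe in sorted(profiles):
        alias = aliases.get(probe)
        if alias is None or alias in renamed:
            continue  # unmapped probe, or alias already taken
        renamed[alias] = profiles[probe]
    return renamed


# Hypothetical model-organism probes and an orthologous mapping.
profiles = {
    "probe_a": [1, 2, 10, 3],
    "probe_b": [5, 2, 12, 4],
    "probe_c": [3, 3, 4, 9],
    "probe_d": [2, 2, 9, 3],
}
aliases = {"probe_a": "ortholog_1", "probe_b": "ortholog_2", "probe_d": "ortholog_1"}

renamed = rename_probes(profiles, aliases)
print(renamed)
```

After renaming, co-expression relationships are reported in terms of the non-model organism's gene names even though the expression data come from the model organism.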
Potential Applications
• We have created a tool kit that
– colors the nodes of a graph according to
• intensity fold-changes, or
• clusters in another graph
Potential Applications
• We have created a tool kit that
– removes or retains co-expression relationships that appear in another graph
– finds specific or common co-expression relationships
[Figure: two graphs, labelled 200 genes and 120 genes]
Potential Applications
• We have created a tool kit that
– fishes for genes that are potentially co-expressed with assigned bait genes
Future Works
• Incorporate a pathway database
– such as AraCyc
– to find relationships between co-expression clusters and known pathways
• Provide a user-friendly interface that would
– facilitate use of this tool kit and
– help manage output data
Availability
• The tool kit is now an open-source project
– http://maccu.sourceforge.net
– Project name: MACCU
• Multi-Array Correlation Computation Utility
– A detailed description of each program module
has been created.
– An example run script is provided.
Special Thanks
• I would like to thank
– Drs. Chang (Bill), Schmidt & Wu
• for raising this idea,
• the initial implementation, and
• valuable comments.
Thank you!