PLPTH 890 Introduction to Genomic Bioinformatics
Lecture 26
Microarray Data Analysis - III
Liangjiang (LJ) Wang
April 19, 2005
Outline
• Statistical tests and clustering (review).
• Use of Gene Ontology (GO) in microarray
data analysis.
(Workflow figure)
• Sample acquisition: RNA purification, labeling.
• Data acquisition: microarray hybridization, washing, image analysis.
• Data analysis: data preprocessing, statistical inference,
clustering analysis, . . . (hypothesis generation).
• Hypothesis testing.
• Biological insight.
Finding Significant Genes
• Fold change: use a single fold change threshold
to select genes; does not take into account the
variability inherent in the microarray data.
• Student’s t test: tests whether a difference is
significant by comparing gene expression
measurements between two conditions.
• ANOVA (ANalysis Of VAriance): used to find
significant genes in more than two conditions.
• Correction for multiple testing:
– Bonferroni correction: $p \le \alpha / N$
– False Discovery Rate (FDR): $p_i \le \frac{i}{N}\, q$
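The two corrections can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: it assumes that N is the number of tests, that p_i is the i-th smallest p value, and that q is the target FDR. The FDR criterion is applied as the Benjamini-Hochberg step-up rule, a name the slide does not use. The function names and sample p values are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test whose p value satisfies p <= alpha / N."""
    n_tests = len(p_values)
    return [p <= alpha / n_tests for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Flag tests rejected by the step-up rule p_(i) <= (i / N) * q."""
    n_tests = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(n_tests), key=lambda idx: p_values[idx])
    # Largest 1-based rank i whose p value meets the threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n_tests * q:
            max_rank = rank
    # Every test ranked at or below max_rank is called significant.
    significant = [False] * n_tests
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


example_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p values for four genes
print(bonferroni(example_p))           # [True, False, False, True]
print(benjamini_hochberg(example_p))   # [True, True, True, True]
```

On these four p values Bonferroni keeps two genes and the FDR rule keeps all four, which shows how much less conservative FDR control can be.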
Clustering Analysis
• Unsupervised methods for discovering gene
expression patterns and data structures.
• There is no single method that is best for
every dataset.
• Commonly used clustering methods:
– Hierarchical clustering: good for visualizing
patterns, but often misused to partition data.
– k-means: a simple method to partition data
into a fixed number (k) of clusters.
– Self-Organizing Map (SOM): a neural network-based clustering approach.
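As a rough illustration of partitioning expression profiles into a fixed number of clusters, here is a bare-bones k-means sketch in plain Python. It assumes squared Euclidean distance, random initial centroids drawn from the data, and a made-up six-gene dataset. None of these choices come from the lecture.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(profiles, k, max_iter=100, seed=0):
    """Partition profiles into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: three rising genes and three falling genes.
profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
    [3.0, 2.0, 1.0], [2.9, 2.1, 1.1], [3.1, 1.9, 0.9],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. Only the grouping matters.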
Self-Organizing Map (SOM)
• The user defines an initial geometry of nodes (reference
vectors) for the partitions such as a 3 x 2 rectangular grid.
• During the iterative “training” process, the nodes migrate to fit
the gene expression data.
• The genes are mapped to the most similar reference vector.
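The three steps above can be sketched as a toy SOM in plain Python. This is only an illustrative sketch: the linearly decaying learning rate, the Gaussian neighborhood and its radius, the random initialization, and the sample data are all assumptions rather than details from the lecture.

```python
import math
import random


def nearest_node(x, nodes):
    """Index of the reference vector most similar to profile x."""
    return min(range(len(nodes)), key=lambda j: math.dist(x, nodes[j]))


def train_som(profiles, rows=3, cols=2, n_iter=500, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols SOM; returns the list of reference vectors."""
    rng = random.Random(seed)
    # Grid coordinates define which nodes count as neighbors.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each reference vector from a randomly chosen profile.
    nodes = [list(rng.choice(profiles)) for _ in grid]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                       # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)     # neighborhood shrinks
        x = rng.choice(profiles)
        bmu = nearest_node(x, nodes)                  # best-matching node
        for j, pos in enumerate(grid):
            d = math.dist(pos, grid[bmu])
            if d <= radius:
                # Nodes near the best match on the grid migrate toward x.
                h = math.exp(-(d * d) / (2 * radius * radius))
                nodes[j] = [w + lr * h * (xi - w) for w, xi in zip(nodes[j], x)]
    return nodes


# Hypothetical expression profiles (genes x conditions).
profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
    [3.0, 2.0, 1.0], [2.9, 2.1, 1.1], [3.1, 1.9, 0.9],
]
nodes = train_som(profiles)
assignments = [nearest_node(p, nodes) for p in profiles]
print(assignments)
```

Each entry of `assignments` is the index of the grid node (0 to 5 for a 3 x 2 grid) to which the corresponding gene is mapped.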
Genes in a Cluster May Be Co-Regulated
Microarrays measure steady-state levels of mRNAs.
Multi-level regulation:
• RNA synthesis: transcriptional regulation.
• RNA processing, RNA turnover: posttranscriptional regulation.
• Protein synthesis: translational regulation.
• Protein modification and degradation: posttranslational regulation.
Beyond Clustering Analysis
• Using GO to understand significant
functional associations of a gene cluster.
• Mapping gene expression data onto
biochemical pathways.
• Mapping gene expression data onto protein-protein interaction networks.
• Discovering regulatory elements shared by
the promoters of co-expressed genes.
• Inferring gene regulatory networks.
What Is an Ontology?
• An ontology is a set of terms, relationships
and definitions that capture the knowledge
of a certain domain.
• Terms represent a controlled vocabulary,
and define the concepts of a domain.
• Terms are linked by relationships, which
constitute a semantic network.
• Ontologies augment natural language
annotations and can be more easily
processed computationally.
The Gene Ontology (GO)
• Providing structured vocabularies for
describing gene products in the domain of
molecular biology.
• Enabling a common understanding of model
organisms and between databases.
• Consisting of three structurally unlinked
hierarchies (molecular function, biological
process and cellular component).
• Two types of relationships between GO terms:
– is-a: subclass.
– part-of: physical part of, or subprocess of.
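One way to picture the two relationship types is as labeled edges in a directed graph from child term to parent term. The sketch below is a hypothetical fragment: the edges are illustrative assumptions written for this example, not copied from a GO release, and `PARENTS` and `ancestors` are made-up names.

```python
# Child term -> list of (relationship, parent term) pairs.
# Edges are illustrative only, not taken from an actual GO release.
PARENTS = {
    "induction of apoptosis": [("part_of", "apoptosis")],
    "apoptosis": [("is_a", "programmed cell death")],
    "programmed cell death": [("is_a", "cell death")],
    "cell death": [("is_a", "cellular process")],
    "cellular process": [("is_a", "biological process")],
    "biological process": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable from `term` by following is_a or part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("induction of apoptosis")))
```

Following both edge types upward is what lets a gene annotated to a specific term also count toward every more general term above it.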
Three GO Hierarchies
• Molecular function: elemental activity/task
(what)
(e.g., DNA-binding, polymerase, transcription factor)
• Biological process: goal or objective
(why)
(e.g., mitosis, DNA replication, cell cycle control)
• Cellular component: location or complex
(where)
(e.g., nucleus, ribosome, pre-replication complex)
(Gene Ontology information can be accessed at
http://www.geneontology.org/)
Example: Gene Ontology Hierarchy
(Figure: part of the biological process hierarchy. Edges are labeled
i (is a) or P (part of). Terms shown in the figure:)
Biological process (GO:0008150)
Development (GO:0007275)
Behavior (GO:0007610)
Physiological (GO:0007582)
Cellular process (GO:0009987)
Cell growth (GO:0008151)
Communication (GO:0007154)
Cell death (GO:0008219)
Cell aging (GO:0007569)
Programmed (GO:0012501)
Induction (GO:0012502)
Apoptosis (GO:0006915)
HS response (GO:0009626)
Autophagic cell death (GO:0048102)
Gene Annotation Using GO Terms
• Association of GO terms with gene products
based on evidence from literature reference
or computational analysis.
• The creation of GO and the association of GO
terms with gene products (gene annotation)
are two independent operations.
• A gene can be associated with one or more
GO terms (gene categories), and one category
normally has many genes (many-to-many
relationship between genes and GO terms).
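The many-to-many relationship can be sketched with two dictionaries, one keyed by gene and one keyed by GO term. The gene names and annotations below are invented for illustration. The helper `term_counts` is a hypothetical name. It returns four counts that are useful when asking whether a term is overrepresented: N (annotated genes in total), R (genes with the term), n (cluster size) and r (cluster genes with the term).

```python
from collections import defaultdict

# Hypothetical annotations: each gene maps to one or more GO terms.
gene_to_terms = {
    "geneA": {"cell cycle", "DNA replication"},
    "geneB": {"cell cycle"},
    "geneC": {"mitosis", "cell cycle"},
    "geneD": {"DNA replication"},
}


def invert(gene_to_terms):
    """Build the reverse mapping: GO term -> set of annotated genes."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return dict(term_to_genes)


def term_counts(term, cluster, gene_to_terms):
    """Return (N, R, n, r) for one GO term and one gene cluster."""
    with_term = invert(gene_to_terms).get(term, set())
    in_cluster = set(cluster) & set(gene_to_terms)
    return (len(gene_to_terms), len(with_term),
            len(in_cluster), len(in_cluster & with_term))


term_to_genes = invert(gene_to_terms)
print(sorted(term_to_genes["cell cycle"]))    # ['geneA', 'geneB', 'geneC']
print(term_counts("cell cycle", {"geneA", "geneD"}, gene_to_terms))  # (4, 3, 2, 1)
```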
Example of Molecular Function
(The Gene Ontology Consortium, 2000)
Example of Biological Process
(The Gene Ontology Consortium, 2000)
Example of Cellular Component
(The Gene Ontology Consortium, 2000)
Genes from the Same Biological Process
Tend to Be Co-Expressed
(The Gene Ontology Consortium, 2000)
(Figure labels: Gene Names, Bio Process)
How to Assess Overrepresentation
of a GO Term?
Genes on an array:
  Total number of genes (N): 2,285
  Number of genes – cell cycle (R): 161
Genes in a cluster:
  Number of genes in the cluster (n): 147
  Number of genes – cell cycle (r): 25
Is the GO term (i.e., cell cycle) significantly
overrepresented in the cluster?
Using the Z-Statistic
• Assume the hypergeometric distribution.
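• The z-score:
$$z = \frac{\text{observed} - \text{expected}}{\text{stdev}(\text{observed})} = \frac{r - n\,\frac{R}{N}}{\sqrt{n \left(\frac{R}{N}\right) \left(1 - \frac{R}{N}\right) \left(\frac{N - n}{N - 1}\right)}}$$
• For the example:
$$z = \frac{25 - 147 \left(\frac{161}{2285}\right)}{\sqrt{147 \left(\frac{161}{2285}\right) \left(1 - \frac{161}{2285}\right) \left(\frac{2285 - 147}{2285 - 1}\right)}} \approx 4.88$$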
Using the Fisher Exact Test
• Contingency table:

                     Cluster
                     in     out
  GO class    in     a      b
              out    c      d

  a = r,  b = R - r,  c = n - r,  d = N - R - n + r

• Probability of finding a genes of the GO class in
the cluster:
$$p_a = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!}$$
• The p value:
$$p = \sum_{i=a}^{a+b} p_i$$
MAPPFinder
• A tool for mapping
gene expression
data to the GO
hierarchies.
• Part of the free
software package
GenMAPP.
• Available at
http://www.genmapp.org/.
(Doniger et al., 2003)
MAPPFinder Sample Output
(Doniger et al., 2003)
GoMiner
• A client-server application using Java (data on the server side).
• Available at http://discover.nci.nih.gov/gominer/.
(Zeeberg et al., 2003)
Onto-Express
• A web application for GO-based microarray data
analysis (http://vortex.cs.wayne.edu/Projects.html).
• The input to Onto-Express is a list of Affymetrix
probe IDs, GenBank sequence accessions or
UniGene cluster IDs.
• Part of the integrated Onto-Tools, including:
– Onto-Compare: compare commercial arrays.
– Onto-Design: help array design (probe selection).
– Onto-Translate: provide mapping of different IDs.
(Figure: Onto-Express output for genes linked to poor breast cancer
outcome; labels shown: GO, # genes, p.)
Summary
• “Statistical significance is fine, but
biological significance is better”
(Baxevanis and Ouellette, 2005).
• Gene Ontology (GO) can be used to assess
significant functional associations of a
gene cluster or a list of significant genes.
• Several tools are available to assist the
GO-based analysis of microarray data.
• Next: pathways and regulatory networks.