Gene Ontology - Computational Cancer Biology

Gene set analyses of genomic datasets
Andreas Schlicker
Jelle ten Hoeve
Lodewyk Wessels
Scenario
You have a gene expression dataset containing data from normal colon
and adenoma samples.
- Which pathways are differentially regulated between normal and CRC samples?
- Do products of significantly differentially expressed genes have specific functions
(Gene Ontology)?
- Is there a significant overlap with published expression signatures (mutations,
response to treatment, ...)?
Overview
• Mapping probe sets to functional annotation
• Hypergeometric test (Fisher’s exact test)
• Gene Set Enrichment Analysis
• Global test
Mapping probe sets to functional annotation
Examples of functional annotation
• Pathway databases (e.g. KEGG, Pathway Interaction
Database, ConsensusPathDB, www.pathguide.org/)
• Functional categories (e.g. Gene Ontology, FunCat)
• Enzyme Commission numbers, disease associations,
protein domains, …
• Published gene signatures
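A minimal sketch of how probe sets might be mapped to gene sets, assuming two lookup tables are available (probe set to gene, gene set to genes). All identifiers below are made-up placeholders, not real array or pathway content.

```python
from collections import defaultdict

# Hypothetical lookup tables; real ones would come from the array
# annotation and from a pathway or GO database.
probe_to_gene = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",   # several probe sets can measure one gene
    "probe_3": "GENE_B",
    "probe_4": None,       # probe set without a gene annotation
}
gene_sets = {
    "pathway_X": {"GENE_A", "GENE_C"},
    "pathway_Y": {"GENE_B"},
}

def map_probes_to_gene_sets(probe_to_gene, gene_sets):
    """Return {gene set name: sorted list of probe sets measuring its genes}."""
    gene_to_probes = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        if gene is not None:
            gene_to_probes[gene].append(probe)
    mapping = {}
    for name, genes in gene_sets.items():
        probes = [p for g in genes for p in gene_to_probes.get(g, [])]
        mapping[name] = sorted(probes)
    return mapping

probe_sets_by_gene_set = map_probes_to_gene_sets(probe_to_gene, gene_sets)
```

Genes in a set that are not measured on the array (GENE_C here) simply contribute no probe sets, so the effective gene set size can be smaller than the annotated one.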
Example KEGG pathway
http://www.genome.jp/kegg/kegg2.html
Gene Ontology
• Collection of three separate ontologies:
biological process, molecular function,
cellular component
• Organized in a graph structure,
i.e. each term (concept, category) can have several parents
Gene Ontology (II)
Gene Ontology (III)
• Annotations with GO terms are assigned an evidence code:
G protein alpha subunit; GO:0060158 activation of phospholipase C …; ISS
• Different categories of evidence codes: experimental,
computational, Author/Curator statement, fully automatic
(IEA)
Details at http://www.geneontology.org/GO.evidence.shtml
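Evidence codes can be used to restrict an analysis to better-supported annotations. The sketch below assumes annotations are held as simple tuples; the first entry is the example from the slide, the other two are invented placeholders.

```python
# Each annotation: (gene product, GO term ID, evidence code).
annotations = [
    ("G protein alpha subunit", "GO:0060158", "ISS"),
    ("placeholder protein 1", "GO:placeholder1", "IEA"),  # invented entry
    ("placeholder protein 2", "GO:placeholder2", "IDA"),  # invented entry
]

def drop_evidence(annotations, excluded=frozenset({"IEA"})):
    """Keep only annotations whose evidence code is not in `excluded`."""
    return [a for a in annotations if a[2] not in excluded]

# By default only fully automatic (IEA) annotations are removed.
curated = drop_evidence(annotations)
```

Whether to exclude IEA annotations is a judgment call: it removes the least reviewed evidence but can also remove most annotations for poorly studied genes.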
The true path rule
If a gene product is annotated with term A, all annotations with
ancestors of A must also be valid.
• Gene product annotated with this term
→ It can also be annotated with the term's ancestors
• Different gene products are usually not annotated on the
same level of the hierarchy
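The true path rule can be applied programmatically by propagating each direct annotation to all ancestor terms. The sketch below uses a toy ontology with invented term names; a real analysis would load the parent relations from the Gene Ontology itself.

```python
# Toy ontology: child term -> set of parent terms
# (a term may have several parents, so this is a graph, not a tree).
parents = {
    "term_D": {"term_B", "term_C"},
    "term_B": {"term_A"},
    "term_C": {"term_A"},
    "term_A": set(),  # root
}

def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def propagate(direct_annotations, parents):
    """Add every ancestor of each directly annotated term."""
    full = {}
    for gene, terms in direct_annotations.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[gene] = closure
    return full

direct = {"gene_1": {"term_D"}, "gene_2": {"term_B"}}
full = propagate(direct, parents)
```

After propagation, gene_1 and gene_2 share term_A and term_B even though they were annotated on different levels of the hierarchy.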
Hands on Time
The hypergeometric test / Fisher’s exact test
Basics
• Enrichment test
• Analysis steps:
1. Single gene test (e.g. t-test for finding differentially expressed genes)
2. Do list (step 1) and gene sets overlap significantly?
2 × 2 contingency table: rows = in gene set / not in gene set; columns = diff. expressed / not diff. expressed
Example
• Microarray: 20000, MAPK: 100, diff. expressed: 200
             diff. expressed   not diff. expressed   total
MAPK                       2                    98     100
not MAPK                 198                 19702   19900
total                    200                 19800   20000
→ Fisher's exact test p = 0.26
Example
• Microarray: 20000, MAPK: 100, diff. expressed: 200
             diff. expressed   not diff. expressed   total
MAPK                       6                    94     100
not MAPK                 194                 19706   19900
total                    200                 19800   20000
→ Fisher's exact test p = 0.0005
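The p-values in the two MAPK examples can be checked with a short calculation. The sketch below assumes the reported values are one-sided (probability of an overlap at least as large as observed) and computes the hypergeometric tail with exact integer arithmetic; the function name is illustrative.

```python
from math import comb

def enrichment_p(total, in_set, selected, overlap):
    """One-sided Fisher's exact test: P(X >= overlap) when `selected`
    genes are drawn at random from `total`, of which `in_set` belong
    to the gene set (hypergeometric distribution)."""
    denom = comb(total, selected)
    upper = min(in_set, selected)
    tail = sum(
        comb(in_set, k) * comb(total - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom

# Microarray: 20000, MAPK: 100, diff. expressed: 200
p_two = enrichment_p(20000, 100, 200, 2)   # overlap of 2 genes
p_six = enrichment_p(20000, 100, 200, 6)   # overlap of 6 genes
print(f"{p_two:.2f}", f"{p_six:.4f}")
```

With an expected overlap of 100 × 200 / 20000 = 1 gene, an overlap of 2 is unremarkable, while an overlap of 6 is rare under random selection.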
Another Example
• Consider having data on treatment response and gene mutation for
samples in a dataset
             Resistant   Sensitive   total
Mutated
WT
total
! Choose threshold for resistance/sensitivity
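To illustrate why the threshold matters, the sketch below builds the mutation-by-response table from a continuous response score. The sample data, the score scale, and the thresholds are all invented for illustration.

```python
# Invented data: (sample, mutated?, response score); higher = more resistant.
samples = [
    ("s1", True, 8.2), ("s2", True, 7.5), ("s3", True, 2.1),
    ("s4", False, 1.9), ("s5", False, 3.0), ("s6", False, 6.8),
]

def contingency(samples, threshold):
    """Count samples per (mutation status, response class) cell."""
    table = {
        ("Mutated", "Resistant"): 0, ("Mutated", "Sensitive"): 0,
        ("WT", "Resistant"): 0, ("WT", "Sensitive"): 0,
    }
    for _, mutated, score in samples:
        row = "Mutated" if mutated else "WT"
        col = "Resistant" if score >= threshold else "Sensitive"
        table[(row, col)] += 1
    return table

table_low = contingency(samples, threshold=5.0)
table_high = contingency(samples, threshold=7.0)
```

Moving the threshold from 5.0 to 7.0 reclassifies sample s6 from resistant to sensitive, which changes the table and therefore the test result.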
Problem with this approach
• Null hypothesis: Genes in the gene set are randomly drawn
→ Significant result means that genes in the gene set are more alike than
random genes
• Problem: Gene set has been selected such that the genes have
something in common
→ False positives
Hands on Time
PAGE: Parametric Analysis of Gene Set Enrichment
Basics
• For each gene set and each sample:
– How different is the mean expression of all genes in a gene set from
the overall mean expression?
• Applied to full expression matrix
– No need for selecting interesting genes (based on e.g. t-test)
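A minimal sketch of a PAGE-style score for one sample and one gene set, assuming the usual z-score form: the difference between the gene-set mean and the overall mean, scaled by the overall standard deviation and the square root of the set size. The gene names and values are invented.

```python
from math import sqrt
from statistics import mean, pstdev

def page_z(values, gene_set):
    """z-score of the gene-set mean against the mean of all genes.

    values: {gene: expression value (or fold change) in one sample}
    """
    all_values = list(values.values())
    members = [values[g] for g in gene_set if g in values]
    mu = mean(all_values)        # overall mean expression
    sigma = pstdev(all_values)   # overall standard deviation
    return (mean(members) - mu) * sqrt(len(members)) / sigma

# Invented values for one sample.
sample = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0, "g5": -2.0, "g6": -2.0}
z_up = page_z(sample, {"g1", "g2"})
```

Under the assumption that the gene-set mean is approximately normal, the z-score can be converted to a p-value; applying the function to every sample and gene set gives a gene-set-by-sample score matrix.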
Problem with this approach
• What happens if one part of the pathway is up-regulated and
another part is down-regulated?
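A small numeric illustration of this cancellation, using invented log fold changes. The mean absolute shift is shown only to make the hidden signal visible; it is not part of PAGE.

```python
from statistics import mean

# Invented log fold changes for one pathway: half up, half down.
pathway = {"g1": 1.5, "g2": 1.5, "g3": -1.5, "g4": -1.5}
overall_mean = 0.0  # assumed mean over all genes on the array

# Mean-based score: opposite directions cancel, so the pathway looks unchanged.
mean_shift = mean(pathway.values()) - overall_mean

# Direction-agnostic summary (illustration only): the change is clearly there.
mean_abs_shift = mean(abs(v - overall_mean) for v in pathway.values())
```

Here mean_shift is 0.0 while mean_abs_shift is 1.5, so a test based on the mean alone would miss a pathway in which every gene changes.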
Hands on Time
The global test
Basics
• Group test
• Can the genes in the gene set predict the response?
• What is needed?
– Clinical variable (e.g. normal vs. CRC)
– Gene expression (e.g. GSE8671)
– Gene sets (e.g. KEGG pathways)
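The idea of testing a whole gene set against a clinical variable can be sketched as follows. This is a simplified illustration in the spirit of the global test, not the published method: it assumes a binary response, omits scaling constants, and obtains the p-value by enumerating all relabelings of the samples. Function names and data are invented.

```python
from itertools import combinations

def q_statistic(expr, y):
    """Sum over genes of the squared score between expression and the
    centred response (constant factors omitted)."""
    y_bar = sum(y) / len(y)
    resid = [yi - y_bar for yi in y]
    return sum(
        sum(x * r for x, r in zip(gene, resid)) ** 2
        for gene in expr
    )

def exact_permutation_p(expr, y):
    """Share of all relabelings (same group sizes) with Q >= observed Q."""
    n, n_pos = len(y), sum(y)
    q_obs = q_statistic(expr, y)
    hits = total = 0
    for pos in combinations(range(n), n_pos):
        pos_set = set(pos)
        y_perm = [1 if i in pos_set else 0 for i in range(n)]
        hits += q_statistic(expr, y_perm) >= q_obs
        total += 1
    return hits / total

# Invented data: 2 genes x 10 samples; response 0 = normal, 1 = tumour.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
expr = [
    [1, 2, 1, 2, 1, 5, 6, 5, 6, 5],  # up in group 1
    [7, 8, 7, 8, 7, 3, 2, 3, 2, 3],  # down in group 1
]
p = exact_permutation_p(expr, y)
```

Because each gene's contribution is squared, genes that go up and genes that go down both add to the statistic instead of cancelling. Full enumeration is only feasible for small sample sizes; larger datasets would need random permutations or an asymptotic null distribution.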
Interpretation
• Interpretation of significant test result (w.r.t. genes):
– Gene set is associated with clinical variable
– "On average" the genes in the set are associated with the clinical
variable
– Not every gene needs to be associated
Interpretation
• Interpretation of significant test result (w.r.t. samples):
– Expression profile in the gene set differs for different values of the
clinical variable
– Samples with similar value (clinical variable) have relatively similar
expression profiles