Advanced Data Analysis

Gene Ontology & hypergeometric test
Simon Rasmussen
CBS - DTU
The DNA Microarray Analysis Pipeline
• Question/hypothesis
• Experimental design
• Array design and probe design, or buy a standard chip/array
• Sample preparation
• Hybridization
• Image analysis
• Normalization
• Expression index calculation
• Comparable gene expression data
• Statistical analysis
• Fit to model (time series)
• Advanced data analysis
– Clustering, PCA
– Gene annotation analysis
– Promoter analysis
– Classification
– Meta-analysis
– Survival analysis
– Regulatory network
Gene Ontology
• Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism
• Very useful for interpreting the biological function of the genes found in microarray data
• Organized into 3 independent ontologies, each with a hierarchical structure
– Molecular function (MF), biological process (BP), cellular component (CC)
Tree structure
• Controlled, networked terms (~25,000 in total)
– Parent/child network organized like a tree; strictly it is a directed acyclic graph, since a term can have more than one parent
– Terms get more specific as you move down the network
Relationship
• A gene can be
– present in any of the ontologies (MF / BP / CC)
– a member of several GO terms
• True path rule
– If a gene is a member of a term, it is also a member of all of that term's parents (and, recursively, of their parents)
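The true path rule amounts to closing each gene's direct annotations upward through the ontology graph. The Python sketch below assumes a hypothetical mini-ontology stored as a term → parents mapping (the term names and links are illustrative, not read from the real GO files) and collects every ancestor with a breadth-first search. Because a term may have several parents, the graph is treated as a DAG rather than a strict tree.

```python
from collections import deque

# Hypothetical mini-ontology: term -> set of parent terms.
# Names and links are illustrative only; "DNA replication" has two parents
# to show that GO is a DAG, not a strict tree.
PARENTS: dict[str, set[str]] = {
    "DNA replication": {"DNA metabolic process", "cell cycle process"},
    "DNA metabolic process": {"metabolic process"},
    "cell cycle process": {"cellular process"},
    "metabolic process": {"biological_process"},
    "cellular process": {"biological_process"},
    "biological_process": set(),  # root of the BP ontology
}


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All ancestors of `term` (excluding itself), found by breadth-first search."""
    seen: set[str] = set()
    queue = deque(parents.get(term, ()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return seen


def propagate(direct: dict[str, set[str]], parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """True path rule: annotate each gene to its direct terms plus all their ancestors."""
    full: dict[str, set[str]] = {}
    for gene, terms in direct.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full


if __name__ == "__main__":
    direct = {"geneA": {"DNA replication"}}  # hypothetical gene annotation
    for term in sorted(propagate(direct, PARENTS)["geneA"]):
        print(term)
```

Propagating annotations this way before counting is what lets a gene annotated only to a specific term also count toward its broader parent terms in an enrichment test.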
GO Tree example
• Visit www.geneontology.org for more information
KEGG
• KEGG PATHWAYS:
– Manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks for a large selection of organisms
• Top-level pathway categories:
1. Metabolism
2. Genetic Information Processing
3. Environmental Information Processing
4. Cellular Processes
5. Human Diseases
6. Drug Development
• Another pathway database: Reactome
KEGG example
Using Gene Ontology
• Input: any list of genes from a microarray experiment, e.g.
– A cluster of genes with similar expression
– Up- or down-regulated genes
• Question we ask:
– Are any GO terms overrepresented in the gene list, compared to what would be expected by chance?
• Method:
– Hypergeometric testing
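As a minimal sketch of this workflow, assuming the annotations are already available as a gene → GO-terms mapping with the true path rule applied, one can count for every term how many listed genes carry it (k) and how many genes on the whole array carry it (K), then apply the hypergeometric upper tail. The function names `hypergeom_upper_tail` and `go_enrichment` and the toy gene and term names are hypothetical; only `math.comb` from the standard library is used.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn)."""
    # math.comb returns 0 when the lower index exceeds the upper, so impossible draws add nothing
    terms = (comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return sum(terms) / comb(N, n)


def go_enrichment(gene_list, annotations, universe):
    """Return {term: (k, K, p)} for each GO term carried by at least one listed gene.

    gene_list   -- genes of interest (e.g. a cluster, or up-regulated genes)
    annotations -- hypothetical dict: gene -> set of GO terms (true path rule applied)
    universe    -- all genes measured in the experiment
    """
    universe = set(universe)
    selected = set(gene_list) & universe
    N, n = len(universe), len(selected)

    # K for each term: number of genes in the universe annotated to it
    term_size = {}
    for gene in universe:
        for term in annotations.get(gene, ()):
            term_size[term] = term_size.get(term, 0) + 1

    results = {}
    for term, K in term_size.items():
        # k: number of listed genes annotated to the term
        k = sum(1 for gene in selected if term in annotations.get(gene, ()))
        if k > 0:
            results[term] = (k, K, hypergeom_upper_tail(k, N, K, n))
    return results


if __name__ == "__main__":
    universe = [f"g{i}" for i in range(10)]
    annotations = {g: {"termX"} for g in universe[:4]}
    annotations.update({g: {"termY"} for g in universe[4:]})
    print(go_enrichment(["g0", "g1", "g2"], annotations, universe))
```

In the toy call, termX is carried by all 3 listed genes but by only 4 of the 10 genes overall, so its p-value is C(4,3)·C(6,0)/C(10,3) = 4/120.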
Hypergeometric test
• The hypergeometric distribution arises from sampling without replacement from a fixed, finite population.
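For reference, these are the standard hypergeometric probabilities in the urn setting, writing N for the number of balls, K for the number of white balls, n for the number of balls drawn and X for the number of white balls among them (this notation is ours, not the slide's). The test uses the upper tail P(X ≥ k).

```latex
P(X = i) = \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
```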
• Urn: 100 balls, of which 20 are white; we draw 10 balls
• We want to calculate the probability of drawing 7 or more white balls among the 10, given the composition of the urn
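A direct way to evaluate this tail probability is to add up the probabilities of drawing exactly 7, 8, 9 and 10 white balls. The Python sketch below does that with `math.comb` (standard library, Python 3.8+); `hypergeom_pmf` is a helper name introduced here for illustration. By our arithmetic the result is roughly 3.9 × 10^-4, against an expected count of only 2 white balls.

```python
from math import comb


def hypergeom_pmf(i: int, N: int, K: int, n: int) -> float:
    """P(X = i): i white balls when drawing n of N balls (K white) without replacement."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


N, K, n = 100, 20, 10  # urn from the slide: 100 balls, 20 white, draw 10
k = 7                  # "7 or more white balls"

# Upper tail: sum P(X = i) for i = k .. min(K, n)
p = sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))
expected_white = n * K / N  # mean of the hypergeometric distribution

print(f"P(X >= {k}) = {p:.2g}, expected white balls = {expected_white}")
```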
Example
• List of 80 significant genes from a microarray experiment in yeast (~6000 genes)
• 10 of the 80 genes are in the BP GO term DNA replication
– Total number of yeast genes in this GO term: 100
• What is the probability of this occurring by chance?
• As an urn: 6000 balls (genes), 100 of them white (in the GO term); we draw 80 balls (the significant genes) and get 10 white and 70 non-white
• p = 6.2 × 10^-8
The GO term DNA replication is overrepresented in our list
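The same calculation applies to the yeast example, with genes as balls. The Python sketch below assumes the array holds exactly N = 6000 genes (the slide says ~6000), and `upper_tail` is a hypothetical helper that returns either P(X ≥ k) or the strict tail P(X > k). With these inputs our arithmetic gives roughly 6.6 × 10^-7 for P(X ≥ 10) and roughly 6.4 × 10^-8 for P(X > 10). The slide's 6.2 × 10^-8 lies closer to the strict tail, so the reported value may reflect that convention or a different gene count. Either way, observing 10 genes where about 1.3 are expected makes the term strongly overrepresented.

```python
from math import comb


def upper_tail(k: int, N: int, K: int, n: int, inclusive: bool = True) -> float:
    """Hypergeometric upper tail: P(X >= k) if inclusive, else P(X > k)."""
    start = k if inclusive else k + 1
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(start, min(K, n) + 1))
    return total / comb(N, n)  # exact integer arithmetic until this final division


N = 6000  # yeast genes on the array (slide says ~6000; exact value assumed here)
K = 100   # genes annotated to the BP term "DNA replication"
n = 80    # significant genes in our list
k = 10    # significant genes that carry the term

p_ge = upper_tail(k, N, K, n)                   # P(X >= 10): the question on the slide
p_gt = upper_tail(k, N, K, n, inclusive=False)  # P(X > 10): strict tail, for comparison

print(f"expected overlap = {n * K / N:.2f}")
print(f"P(X >= {k}) = {p_ge:.2g}, P(X > {k}) = {p_gt:.2g}")
```

The two tails are easy to mix up in library code: SciPy's `scipy.stats.hypergeom.sf(k, ...)` returns P(X > k), so P(X ≥ k) corresponds to `sf(k - 1, ...)`.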