Using Gene Ontology - Center for Genomic Sciences

Using Gene Ontology: Models and Tests
Mark Reimers, NCI

Outline
- What we might gain by using annotations
- Models for group effects
- Enrichment of selected genes
- Chi-square and Fisher test
- Group scores
- Overlap in hierarchical annotations

Why Use Annotations
- Goal: identify biological processes or biochemical pathways that are changed by treatment
- Common procedure: select ‘changed’ genes, then look among them for genes of known function
- Problem: moderate, simultaneous changes in many genes will escape detection, because no single gene passes the selection threshold
- New approach: start with a vocabulary of known GO categories or pathways, and look for coherent changes
- Variations: look for chromosome locations or protein domains that are common to many of the changed genes

Statistical Methods
- How likely is it that the set of ‘significant’ genes will include as many genes from the category as you see?
- Two-way table:

                      Category     Others
      On list                8        112
      Not on list           42     12,500

- Fisher exact test: handles small categories better than the chi-square test
- How to deal with multiple categories?

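The slide's two-way table can be checked numerically. The sketch below uses only the Python standard library, and the function names `fisher_exact_upper` and `chi_square_2x2` are hypothetical, not from any package. It computes a one-sided Fisher exact p-value as a hypergeometric upper tail and, for comparison, Pearson's chi-square statistic without continuity correction. With 50 category genes and a 120-gene list drawn from 12,662 genes, only about 0.47 category genes are expected on the list, so the chi-square approximation is unreliable here; this is the sense in which the Fisher test handles small categories better.

```python
import math


def fisher_exact_upper(a, b, c, d):
    """One-sided Fisher exact test for over-representation of a category.

    Table layout, as on the slide:
                     Category   Others
        On list          a         b
        Not on list      c         d

    Returns P(X >= a), where X is hypergeometric: the number of category
    genes in a list of a + b genes drawn from a + b + c + d genes.
    """
    n_cat, n_list, n_total = a + c, a + b, a + b + c + d
    upper = min(n_cat, n_list)
    tail = sum(
        math.comb(n_cat, k) * math.comb(n_total - n_cat, n_list - k)
        for k in range(a, upper + 1)
    )
    # Exact integer arithmetic; only the final ratio becomes a float.
    return tail / math.comb(n_total, n_list)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df p-value."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    a, b, c, d = 8, 112, 42, 12_500  # counts from the two-way table
    expected = (a + b) * (a + c) / (a + b + c + d)
    print(f"expected category genes on list: {expected:.2f}")
    stat, p_chi = chi_square_2x2(a, b, c, d)
    print(f"chi-square = {stat:.1f}, p = {p_chi:.2e}")
    print(f"Fisher one-sided p = {fisher_exact_upper(a, b, c, d):.2e}")
```

Running it as a script prints the expected count next to both p-values, which makes the gap between the observed 8 and the expected 0.47 easy to see.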
GoMiner: Leverages the Gene Ontology
(Zeeberg et al., Genome Biology 4: R28, 2003)

P-values for Tests
- About 3,000 GO biological process categories
- Most overlap with some others
- p-values for categories are not independent
- Permutation test of all categories simultaneously in parallel

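Running every category through the same permutations keeps the dependence between overlapping categories intact. The following is a minimal sketch under several assumptions: `expr` is a genes-by-samples list of lists, `labels` are 0/1 sample labels, each category is a set of gene indices, the gene score is a plain difference of group means, and the ‘selected’ list is the `top_k` genes by absolute score. All names are hypothetical. Because each permutation scores all categories at once, the joint null distribution is available; the sketch uses it for a single-step minP (Westfall-Young style) adjustment, a choice made here for illustration rather than one stated on the slide.

```python
import bisect
import random


def gene_scores(expr, labels):
    """Difference of group means per gene (a simple, hypothetical choice of score)."""
    scores = []
    for row in expr:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        scores.append(sum(g1) / len(g1) - sum(g0) / len(g0))
    return scores


def category_counts(scores, categories, top_k):
    """Count genes of each category (a set of gene indices) among the top_k by |score|."""
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    selected = set(ranked[:top_k])
    return {name: len(selected & genes) for name, genes in categories.items()}


def permutation_test(expr, labels, categories, top_k, n_perm=1000, seed=0):
    """Test all categories on the same label permutations.

    Returns (raw, adjusted) p-value dicts; 'adjusted' is a single-step minP
    adjustment built from the shared permutations.
    """
    rng = random.Random(seed)
    observed = category_counts(gene_scores(expr, labels), categories, top_k)
    perm = list(labels)
    null = []  # null[b][name] = count for category `name` in permutation b
    for _ in range(n_perm):
        rng.shuffle(perm)
        null.append(category_counts(gene_scores(expr, perm), categories, top_k))

    names = list(categories)
    columns = {n: sorted(row[n] for row in null) for n in names}

    def n_at_least(name, value):
        # Number of permutations whose count for `name` is >= value.
        return n_perm - bisect.bisect_left(columns[name], value)

    raw = {n: (1 + n_at_least(n, observed[n])) / (1 + n_perm) for n in names}
    # Smallest null p-value over all categories within each permutation;
    # overlapping categories stay correlated because they share permutations.
    min_p = [min(n_at_least(n, row[n]) / n_perm for n in names) for row in null]
    adjusted = {
        n: (1 + sum(p <= raw[n] for p in min_p)) / (1 + n_perm) for n in names
    }
    return raw, adjusted
```

For example, `permutation_test(expr, labels, {"A": {0, 1, 2}, "B": {2, 3}}, top_k=50)` scores two overlapping categories on one set of permutations. Swapping in a different gene score would change only `gene_scores`.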
Gene Set Enrichment Analysis
- Ignore for the moment the ‘meaning’ of the p-value: consider it just as a ranking of S/N (the between-group difference relative to the within-group variation)
- If we select a set of genes ‘at random’, then the ranks of their S/N ratios should be random, i.e. a sample from a uniform distribution
- Adapt the standard Kolmogorov-Smirnov (K-S) test of distribution

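The K-S idea can be sketched in Python: rank all genes by S/N, map the ranks of the set's genes into (0, 1), and measure the largest gap between their empirical distribution and the uniform one. Because ranks are discrete and a set is drawn without replacement, this sketch calibrates the statistic against random gene sets of the same size instead of the asymptotic K-S distribution; that null, and the names `ks_statistic` and `ks_set_test`, are assumptions. It is the plain one-sample K-S statistic, not any particular published GSEA variant.

```python
import random


def ks_statistic(set_ranks, n_genes):
    """One-sample K-S distance between a set's rank positions and Uniform(0, 1).

    set_ranks are 1-based ranks among n_genes (rank 1 = largest S/N).
    """
    u = sorted((r - 0.5) / n_genes for r in set_ranks)
    n = len(u)
    # For sorted u_1..u_n: D = max_i max(i/n - u_i, u_i - (i - 1)/n)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


def ks_set_test(scores, gene_set, n_rand=2000, seed=0):
    """K-S statistic for gene_set, calibrated against random sets of equal size."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    rank = {g: r + 1 for r, g in enumerate(order)}
    n_genes = len(scores)
    observed = ks_statistic([rank[g] for g in gene_set], n_genes)
    rng = random.Random(seed)
    all_ranks = range(1, n_genes + 1)
    hits = sum(
        ks_statistic(rng.sample(all_ranks, len(gene_set)), n_genes) >= observed
        for _ in range(n_rand)
    )
    return observed, (1 + hits) / (1 + n_rand)
```

Because D is two-sided, a set concentrated at either end of the ranking (strongly up or strongly down) gives a large value, while a set spread evenly through the ranking gives a value near the minimum of about 1/(2n).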
Continuous Tests
- Model: all genes in a group contribute roughly equally to the effect
- Test: compute $z_G = \sum_{g \in G} s_g$ for each group $G$, where $s_g$ is the score of gene $g$ (e.g. its S/N ratio)
- Compare $z_G$ to its permutation distribution
- More sensitive when the model assumptions hold
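The group score and its permutation null can be sketched as follows, assuming a Welch-style t statistic as the gene score $s_g$, 0/1 sample labels, and a two-sided comparison on $|z_G|$; `t_scores` and `group_score_test` are hypothetical names. Permuting sample labels rather than genes keeps the correlation between the genes in $G$, so the null reflects how much $z_G$ varies when those genes move together.

```python
import math
import random
import statistics


def t_scores(rows, labels):
    """Welch-style t score for each gene row (one possible choice of s_g)."""
    scores = []
    for row in rows:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        se = math.sqrt(statistics.variance(g1) / len(g1)
                       + statistics.variance(g0) / len(g0))
        scores.append((statistics.fmean(g1) - statistics.fmean(g0)) / se)
    return scores


def group_score_test(expr, labels, group, n_perm=1000, seed=0):
    """z_G = sum of s_g over g in G, compared with a label-permutation null.

    Returns (z_G, two-sided permutation p-value based on |z_G|).
    """
    rows = [expr[g] for g in group]  # only genes in G are needed
    z_obs = sum(t_scores(rows, labels))
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # the same shuffle applies to every gene in G
        if abs(sum(t_scores(rows, perm))) >= abs(z_obs):
            hits += 1
    return z_obs, (1 + hits) / (1 + n_perm)
```

With ten genes each shifted by one within-group standard deviation between two groups of ten samples, individual t scores sit near 2, which a per-gene cutoff corrected for thousands of tests would miss, yet their sum should stand well outside the permutation distribution.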