Using Gene Ontology
Models and Tests
Mark Reimers, NCI

Outline
- What we might gain by using annotations
- Models for group effects
- Enrichment of selected genes
- Chi-square and Fisher test
- Group scores
- Overlap in hierarchical annotations

Why Use Annotations
- Goal: how to identify biological processes or biochemical pathways that are changed by treatment
- Common procedure: select 'changed' genes, and look for members of known function
- Problem: moderate changes in many genes simultaneously will escape detection
- New approach: start with a vocabulary of known GO categories or pathways, and look for coherent changes
- Variations: look for chromosome locations, or protein domains, that are common among many genes that are changed

Statistical Methods
- How likely is it that the set of 'significant' genes will include as many from the category as you see?
- Two-way table:

                 Category    Others
  On list               8       112
  Not on list          42    12,500

- Fisher Exact test handles small categories better
- How to deal with multiple categories?

GoMiner: Leverages the Gene Ontology
(Zeeberg, et al., Genome Biology 4: R28, 2002)

P-values for Tests
- About 3,000 GO biological process categories
- Most overlap with some others
- p-values for categories are not independent
- Permutation test of all categories simultaneously, in parallel

Gene Set Expression Analysis
- Ignore for the moment the 'meaning' of the p-value: consider it just as a ranking of S/N (between-group difference relative to within-group)
- If we select a set of genes 'at random', then the ranking of S/N ratios should be random, i.e. a sample from a uniform distribution
- Adapt standard (K-S) test of distribution

Continuous Tests
- Model: all genes in group contribute roughly equally to effect
- Test: $z_G = \sum_{g \in G} s_g$ for each group $G$
- Compare $z$ to permutation distribution
- More sensitive under model assumptions
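
As a rough illustration of the two kinds of test mentioned in the slides, the Python sketch below computes (1) a one-sided Fisher exact p-value for the counts quoted under Statistical Methods and (2) a permutation p-value for a summed group score as in Continuous Tests. Several things here are assumptions rather than part of the talk: the assignment of the counts to cells (8 category genes on the list, 112 other genes on the list, 42 category genes off the list, 12,500 others), the reading of the group score as a plain sum of per-gene scores, the choice to build the null distribution from random gene sets of the same size (the slides do not say whether genes or sample labels are permuted), and the two-sided comparison. The function names and the toy scores are hypothetical.

```python
from math import comb
import random


def fisher_enrichment_p(cat_on, other_on, cat_off, other_off):
    """One-sided Fisher exact p-value: probability of seeing at least
    `cat_on` category genes on the list, given fixed table margins."""
    n_list = cat_on + other_on            # genes on the 'significant' list
    n_cat = cat_on + cat_off              # genes in the category
    n_total = n_list + cat_off + other_off
    upper = min(n_cat, n_list)            # largest possible overlap
    # Hypergeometric upper tail, computed with exact integers.
    tail = sum(
        comb(n_cat, i) * comb(n_total - n_cat, n_list - i)
        for i in range(cat_on, upper + 1)
    )
    return tail / comb(n_total, n_list)


def group_score_permutation_p(scores, group, n_perm=2000, seed=0):
    """Sum the per-gene scores over `group` and compare the sum with
    sums over random gene sets of the same size (assumed null model)."""
    rng = random.Random(seed)
    values = list(scores.values())
    size = len(group)
    observed = sum(scores[g] for g in group)
    hits = 0
    for _ in range(n_perm):
        z_null = sum(rng.sample(values, size))
        if abs(z_null) >= abs(observed):  # two-sided comparison (assumption)
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    p = fisher_enrichment_p(8, 112, 42, 12500)
    print(f"enrichment p-value: {p:.2e}")

    # Toy scores (hypothetical): five genes with a shared moderate shift.
    toy = {f"g{i}": (3.0 if i < 5 else (0.1 if i % 2 else -0.1)) for i in range(100)}
    z, p_perm = group_score_permutation_p(toy, [f"g{i}" for i in range(5)])
    print(f"group score: {z}, permutation p-value: {p_perm:.4f}")
```

The exact-integer hypergeometric tail avoids any dependency beyond the standard library; with many categories, the same permutation draws would need to be shared across all categories to respect the overlap between them, which this sketch does not attempt.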