Gene Ontology - Computational Cancer Biology
Gene set analyses of genomic datasets
Andreas Schlicker, Jelle ten Hoeve, Lodewyk Wessels

Scenario
You have a gene expression dataset containing data from normal colon and adenoma samples.
- Which pathways are differentially regulated between normal and CRC samples?
- Do the products of significantly differentially expressed genes have specific functions (Gene Ontology)?
- Is there a significant overlap with published expression signatures (mutations, response to treatment, ...)?

Overview
• Mapping probe sets to functional annotation
• Hypergeometric test (Fisher's exact test)
• Gene Set Enrichment Analysis
• Global test

Mapping probe sets to functional annotation

Examples of functional annotation
• Pathway databases (e.g. KEGG, Pathway Interaction Database, ConsensusPathDB, www.pathguide.org/)
• Functional categories (e.g. Gene Ontology, FunCat)
• Enzyme Commission numbers, disease associations, protein domains, …
• Published gene signatures

Example KEGG pathway
http://www.genome.jp/kegg/kegg2.html

Gene Ontology
• Collection of three separate ontologies: biological process, molecular function, cellular component
• Organized as a graph, i.e. each term (concept, category) can have several parents

Gene Ontology (II)

Gene Ontology (III)
• Each annotation with a GO term is assigned an evidence code, e.g.:
  G protein alpha subunit; GO:0060158 activation of phospholipase C …; ISS (inferred from sequence or structural similarity)
• Evidence codes fall into different categories: experimental, computational, author/curator statement, fully automatic (IEA)
Details at http://www.geneontology.org/GO.evidence.shtml

The true path rule
If a gene product is annotated with term A, all annotations with ancestors of A must also be valid.
• A gene product annotated with a term can also be annotated with all of that term's ancestors
• Different gene products are usually not annotated at the same level of the hierarchy

Hands on Time

The hypergeometric test / Fisher's exact test

Basics
• Enrichment test
• Analysis steps:
  1. Single-gene test (e.g. t-test for finding differentially expressed genes)
  2. Do the gene list from step 1 and the gene set overlap significantly?

                     Diff. expressed   Not diff. expressed
  In gene set
  Not in gene set

Example
• Microarray: 20000 genes, MAPK pathway: 100 genes, differentially expressed: 200 genes
• By chance, about 200 × 100 / 20000 = 1 MAPK gene is expected among the differentially expressed genes

              Diff. expressed   Not diff. expressed   Total
  MAPK                2                 98              100
  Not MAPK          198              19702            19900
  Total             200              19800            20000

Fisher's exact test: p = 0.26

Example
• Microarray: 20000 genes, MAPK pathway: 100 genes, differentially expressed: 200 genes

              Diff. expressed   Not diff. expressed   Total
  MAPK                6                 94              100
  Not MAPK          194              19706            19900
  Total             200              19800            20000

Fisher's exact test: p = 0.0005

Another Example
• Consider having data on treatment response and gene mutation status for the samples in a dataset

              Resistant   Sensitive   Total
  Mutated
  WT
  Total

! Choose a threshold for resistance/sensitivity

Problem with this approach
• Null hypothesis: the genes in the gene set are randomly drawn
  – A significant result means that the genes in the gene set are more alike than random genes
• Problem: the gene set has been selected such that its genes have something in common
  – This leads to false positives

Hands on Time

PAGE: Parametric Analysis of Gene Set Enrichment

Basics
• For each gene set and each sample:
  – How different is the mean expression of all genes in the gene set from the overall mean expression?
• Applied to the full expression matrix
  – No need to select interesting genes first (e.g. based on a t-test)

Problem with this approach
• What happens if one part of the pathway is up-regulated and another part is down-regulated? The two parts cancel out in the gene set mean, so the pathway can look unchanged.

Hands on Time

The global test

Basics
• Group test
• Can the genes in the gene set predict the response?
• What is needed?
  – Clinical variable, e.g. normal vs. CRC
  – Gene expression, e.g. GSE8671
  – Gene sets, e.g. KEGG pathways

Interpretation
• Interpretation of a significant test result (w.r.t. genes):
  – The gene set is associated with the clinical variable
  – "On average" the genes in the set are associated with the clinical variable
  – Not every gene needs to be associated

Interpretation
• Interpretation of a significant test result (w.r.t. samples):
  – The expression profile of the gene set differs between different values of the clinical variable
  – Samples with similar values of the clinical variable have relatively similar expression profiles
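The slides on mapping probe sets to functional annotation do not show the mapping step itself. A minimal sketch in Python, assuming a probe-to-gene table such as an array annotation file provides: probe sets are collapsed to one value per gene (averaging is only one possible choice; keeping the probe set with the highest mean or variance is another), and gene sets are restricted to genes actually measured on the array, so that all tests share one gene universe. All probe, gene and set names (`p1`, `GENE_A`, `SET_1`) and the helper names are hypothetical.

```python
"""Map probe-set measurements to gene sets (illustrative sketch)."""
from statistics import mean

# Hypothetical probe-to-gene mapping, as an array annotation file might provide.
# Several probe sets can target the same gene; some target none (None).
PROBE_TO_GENE = {
    "p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B",
    "p4": "GENE_C", "p5": None,
}


def collapse_to_genes(expr, probe_to_gene):
    """Average all probe sets of a gene into one value per gene."""
    by_gene = {}
    for probe, value in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # unmapped probe set: cannot be linked to any annotation
            continue
        by_gene.setdefault(gene, []).append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def restrict_gene_sets(gene_sets, measured_genes):
    """Keep only the genes of each set that are actually measured on the array."""
    return {name: set(genes) & set(measured_genes) for name, genes in gene_sets.items()}


if __name__ == "__main__":
    expr = {"p1": 7.0, "p2": 9.0, "p3": 5.0, "p4": 3.0, "p5": 11.0}  # made-up values
    genes = collapse_to_genes(expr, PROBE_TO_GENE)
    print(genes)  # {'GENE_A': 8.0, 'GENE_B': 5.0, 'GENE_C': 3.0}
    print(restrict_gene_sets({"SET_1": {"GENE_A", "GENE_D"}}, genes))  # {'SET_1': {'GENE_A'}}
```

Restricting the sets matters for the hypergeometric test: the 20000 genes on the array, not the whole genome, form the population from which the gene set and the differentially expressed genes are drawn.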
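The true path rule and the evidence codes from the Gene Ontology slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ontology is a made-up DAG (`T_leaf`, `T_mid1`, ... are not real GO terms), annotations are plain `(gene, term, evidence_code)` tuples, and dropping IEA annotations by default is an analysis choice, not a GO requirement. Each direct annotation is extended to all ancestors of its term, so a term with two parents passes the annotation up both paths.

```python
"""Propagate GO-style annotations to ancestor terms (the true path rule)."""

# Hypothetical toy ontology: term -> set of direct parents. It is a DAG, so a
# term may have several parents. Real GO term IDs and relations are NOT used.
PARENTS = {
    "T_leaf": {"T_mid1", "T_mid2"},
    "T_mid1": {"T_root"},
    "T_mid2": {"T_root"},
    "T_root": set(),
}


def ancestors(term, parents):
    """Return all ancestors of `term` (excluding the term itself)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents, exclude_evidence=("IEA",)):
    """Map gene -> full set of terms implied by its direct annotations.

    `annotations` holds (gene, term, evidence_code) triples; codes listed in
    `exclude_evidence` (by default only the fully automatic IEA) are dropped first.
    """
    result = {}
    for gene, term, evidence in annotations:
        if evidence in exclude_evidence:
            continue
        terms = result.setdefault(gene, set())
        terms.add(term)
        terms |= ancestors(term, parents)  # true path rule
    return result


if __name__ == "__main__":
    annots = [("GENE_A", "T_leaf", "ISS"), ("GENE_B", "T_mid1", "IEA")]
    print(propagate(annots, PARENTS))  # GENE_B is dropped: its only annotation is IEA
```

Because propagated annotations reach general terms near the root, these terms contain many genes; this is one reason different gene products end up annotated at different levels of the hierarchy.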
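The two MAPK examples of the hypergeometric test can be recomputed directly from the hypergeometric distribution. The sketch below uses only `math.comb` (Python 3.8+); the function names are hypothetical. The one-sided p-value is P(X ≥ observed overlap), i.e. over-representation of the gene set in the list; the two-sided variant sums all tables that are at most as probable as the observed one, which is the convention R's `fisher.test` documents for two-sided tests.

```python
"""Hypergeometric / Fisher's exact test for gene set over-representation."""
from math import comb


def hypergeom_pmf(k, total, in_set, selected):
    """P(X = k): k of the `selected` genes fall in a set of `in_set` genes out of `total`."""
    return comb(in_set, k) * comb(total - in_set, selected - k) / comb(total, selected)


def fisher_enrichment(overlap, total, in_set, selected):
    """One-sided p-value P(X >= overlap)."""
    upper = min(in_set, selected)
    return sum(hypergeom_pmf(k, total, in_set, selected) for k in range(overlap, upper + 1))


def fisher_two_sided(overlap, total, in_set, selected):
    """Two-sided p-value: sum over all tables at most as probable as the observed one."""
    lo = max(0, selected - (total - in_set))
    hi = min(in_set, selected)
    p_obs = hypergeom_pmf(overlap, total, in_set, selected)
    probs = (hypergeom_pmf(k, total, in_set, selected) for k in range(lo, hi + 1))
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))  # small tolerance for ties


if __name__ == "__main__":
    # Numbers from the MAPK examples: 20000 genes on the array, 100 MAPK genes,
    # 200 differentially expressed genes, and an overlap of 2 or 6 genes.
    for overlap in (2, 6):
        print(overlap,
              round(fisher_enrichment(overlap, 20000, 100, 200), 4),
              round(fisher_two_sided(overlap, 20000, 100, 200), 4))
```

For these inputs the values should be close to the slides' p = 0.26 and p = 0.0005. Here the one- and two-sided values agree, because every table with a smaller overlap is more probable than the observed one.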
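For the treatment response example, the 2 × 2 table only exists after the response values have been dichotomised. A minimal sketch, assuming each sample is an `(is_mutated, response_value)` pair and that values above the threshold (for instance a high IC50) mean resistant; the data and all names are made up. The p-value here is one-sided: it asks whether mutated samples are enriched among the resistant ones. Moving the threshold changes the counts and therefore the p-value, which is why the slide flags the choice.

```python
"""Build a mutation x response table after choosing a resistance threshold."""
from math import comb


def two_by_two(samples, threshold):
    """Count (mutated/WT) x (resistant/sensitive).

    `samples` holds (is_mutated, response_value) pairs. As an assumption, a
    response value above `threshold` counts as resistant.
    """
    table = {("mut", "res"): 0, ("mut", "sen"): 0, ("wt", "res"): 0, ("wt", "sen"): 0}
    for is_mutated, value in samples:
        row = "mut" if is_mutated else "wt"
        col = "res" if value > threshold else "sen"
        table[(row, col)] += 1
    return table


def fisher_greater(table):
    """One-sided Fisher p-value: P(at least this many mutated among resistant)."""
    a = table[("mut", "res")]
    n_mut = a + table[("mut", "sen")]
    n_res = a + table[("wt", "res")]
    n = sum(table.values())
    hits = sum(comb(n_mut, k) * comb(n - n_mut, n_res - k)
               for k in range(a, min(n_mut, n_res) + 1))
    return hits / comb(n, n_res)


if __name__ == "__main__":
    # Made-up toy data: (is_mutated, IC50-like response value) per sample.
    samples = [(True, 9.1), (True, 7.5), (True, 2.0),
               (False, 1.2), (False, 3.3), (False, 8.0), (False, 0.5)]
    for threshold in (5.0, 8.5):  # the table, and the p-value, depend on this choice
        table = two_by_two(samples, threshold)
        print(threshold, table, round(fisher_greater(table), 3))
```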
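A PAGE-style score for one sample and one gene set. This sketch assumes the Z-score form commonly used for PAGE, Z = (S_m - μ) √m / σ, where S_m is the mean value of the m measured genes of the set and μ, σ are the mean and standard deviation over all genes in that sample; the toy values and gene names are made up. The second call illustrates the problem from the PAGE slides: a set whose genes go half up and half down gets Z = 0.

```python
"""PAGE-style Z-score for one sample and one gene set (sketch)."""
from math import sqrt
from statistics import mean, pstdev


def page_z(values, gene_set):
    """Z = (mean over the set - mean over all genes) * sqrt(m) / SD over all genes.

    `values` maps gene -> value for ONE sample (for instance a log ratio against
    a reference); only genes present in `values` count towards the set size m.
    """
    all_vals = list(values.values())
    mu, sd = mean(all_vals), pstdev(all_vals)
    members = [values[g] for g in gene_set if g in values]
    m = len(members)
    if m == 0:
        raise ValueError("no gene of the set is measured")
    return (mean(members) - mu) * sqrt(m) / sd


if __name__ == "__main__":
    # Made-up toy sample: 12 unchanged genes, 4 up (+2) and 4 down (-2).
    values = {f"z{i}": 0.0 for i in range(1, 13)}
    values.update({f"u{i}": 2.0 for i in range(1, 5)})
    values.update({f"d{i}": -2.0 for i in range(1, 5)})
    print(page_z(values, {"u1", "u2", "u3", "u4"}))  # clearly positive
    print(page_z(values, {"u1", "u2", "d1", "d2"}))  # 0.0: up and down cancel
```

Applying `page_z` to every column of the expression matrix gives the per-sample, per-gene-set scores the slides describe, without any prior selection of genes.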
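The two interpretation slides of the global test correspond to two ways of writing one statistic. The sketch below is a simplified permutation version written from scratch, not the `globaltest` Bioconductor package: with residuals r = y - ȳ and gene-centred expression X (samples × genes), Q = rᵀXXᵀr. Summed over genes, Q adds up squared gene-response covariances (the "on average" reading); summed over sample pairs, it is large when samples with similar clinical values have similar profiles. Normalising constants of the published statistic do not change when the labels are permuted, so they are left out. Data and function names are made up.

```python
"""Permutation version of a global-test-style statistic (sketch)."""
import random
from statistics import mean


def center_columns(X):
    """Center every gene (column) of the samples x genes matrix X."""
    means = [mean(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def q_stat(y, Xc):
    """Gene view: Q = sum over genes of (sum_i r_i * x_ig)^2, with r = y - mean(y)."""
    ybar = mean(y)
    r = [yi - ybar for yi in y]
    return sum(sum(ri * xi for ri, xi in zip(r, col)) ** 2 for col in zip(*Xc))


def q_stat_samples(y, Xc):
    """Sample view of the same Q: sum over sample pairs of r_i * r_j * <x_i, x_j>."""
    ybar = mean(y)
    r = [yi - ybar for yi in y]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    n = len(y)
    return sum(r[i] * r[j] * dot(Xc[i], Xc[j]) for i in range(n) for j in range(n))


def global_test_perm(y, X, n_perm=999, seed=0):
    """Permutation p-value: how often do shuffled labels give Q >= observed Q?"""
    Xc = center_columns(X)
    q_obs = q_stat(y, Xc)
    rng = random.Random(seed)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if q_stat(y_perm, Xc) >= q_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up toy data: 10 samples (5 "normal" = 0, 5 "CRC" = 1) x 3 genes,
    # where every gene of the set is shifted by 2 in the CRC group.
    y = [0] * 5 + [1] * 5
    X = [[2 * y[i] + ((i * 7 + g * 3) % 5) / 10 for g in range(3)] for i in range(10)]
    Xc = center_columns(X)
    print(q_stat(y, Xc), q_stat_samples(y, Xc), global_test_perm(y, X))
```

Permuting whole label vectors keeps the correlation between the genes of a set intact, which avoids the gene-sampling assumption criticised in the hypergeometric section.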