Gene List Enrichment Analysis
George Bell, Ph.D.
BaRC Hot Topics
March 16, 2010

Outline
• Why do enrichment analysis?
• Main types
• Selecting or ranking genes
• Annotation sources
• Statistics
• Remaining issues
• Presenting findings
• Recommended tools

Why do enrichment analysis?
• Most array, sequencing, and screens produce
  – A measurement for most or all genes
  – List(s) of "interesting" genes
• Most cellular processes involve sets of genes.
• Can we compare the above two datasets?
• Is the overlap different than expected?
• Does this tell us something about cellular mechanisms?

Why not just link genes to physiology?
• Too many genes to examine in detail.
• Are we biased?
• How do we know that what we're seeing is surprising?
[Figure: Genome = 20,000 genes; Our list = 100 genes; schmooase activity = 1,000 genes; Intersection = 10 genes]

Main types of enrichment analysis
• List-based: inputs are
  – A subset of all genes chosen by some relevant method
  – A list of annotations, each linked to genes
• Rank-based: inputs are
  – A set of all genes ranked by some metric (ratio, fold change, etc.)
  – A list of annotations, each linked to genes
• List-based with relationships: inputs are
  – A subset of all genes
  – A list of annotations, each linked to genes, organized in some relationship (e.g., a hierarchy)

Annotation sources
• Gene Ontology (most popular)
  – biological process, molecular function, cellular component
  – Terms may have >1 "parent" (more general term)
  – GO Slim: includes only general categories
• KEGG; REACTOME pathways
• Genes sharing a motif or regulated by the same protein/miRNA
• Genes found on the same chromosome
• Also… see Broad's Molecular Signatures Database (MSigDB)
• [any grouping that is biologically sensible]

Getting your list
• Goal: Identify a list of genes (or probes) that appear to be working together in some way.
• What identifiers to use?
• Most common method: Get a list of differentially expressed genes
  – P-value or fold change?
  – Threshold?
• Alternatives:
  – Define a cluster
  – Sort data and/or apply a model to rank genes
• Recommendations:
  – Try lists of varying length
  – Try to maximize signal/noise (What produces the smallest p-values for enrichment?)
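The question "Is the overlap different than expected?" can be made concrete with the schmooase numbers above. The sketch below is one possible way to do it, not the method of any particular tool: it assumes the list is compared against a background of all 20,000 genes and uses a one-sided hypergeometric tail probability, which is only one of several tests that could be applied. The function names `expected_overlap` and `hypergeom_tail` are illustrative, not from the slides.

```python
from math import comb


def expected_overlap(genome, annotated, list_size):
    """Overlap expected by chance if the list were drawn at random."""
    return list_size * annotated / genome


def hypergeom_tail(genome, annotated, list_size, overlap):
    """P(overlap or more annotated genes) when list_size genes are
    drawn at random, without replacement, from the genome."""
    max_overlap = min(annotated, list_size)
    favorable = sum(
        comb(annotated, k) * comb(genome - annotated, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / comb(genome, list_size)


# Numbers from the schmooase example in the slides.
genome, annotated, list_size, overlap = 20_000, 1_000, 100, 10

expected = expected_overlap(genome, annotated, list_size)
fold = overlap / expected
p_value = hypergeom_tail(genome, annotated, list_size, overlap)

print(f"expected overlap: {expected:.1f} genes")
print(f"fold enrichment:  {fold:.1f}x")
print(f"P(overlap >= {overlap}): {p_value:.3f}")
```

By chance alone, about 5 of the 100 listed genes would carry the annotation, so the observed 10 is a twofold enrichment. Whether that is surprising depends on the tail probability, and on the choice of background, which here is assumed to be the whole genome.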
Statistics to test for enrichment
[Figure: two examples]
• Example 1: Genome = 20,000 genes; Our list = 100 genes; schmooase activity = 1,000 genes (5% of genome); Intersection = 10 genes (10% of our list)
• Example 2: Our list = 1,000 genes; stroumphase activity = 20 genes (0.1% of genome); Intersection = 2 genes (0.2% of our list)

Tests for enrichment
• Fisher's exact
• Hypergeometric
• Binomial
• Chi-squared
• Z
• Kolmogorov-Smirnov
• Permutation

Other statistical issues
• Goal: Identifying theme(s) of maximal biological significance
  – but this is not perfectly correlated with statistical significance
• What is your background gene set?
  – All genes that could appear in your list
• What about sparse annotation groups?
• Some annotation terms may be subsets of other terms.

Statistics to test for enrichment
• What is the chance of observing enrichment at least this extreme due to chance?
• Different tests produce very different ranges of p-values
• All look for over-enrichment; some look for under-enrichment
• Recommendation: Use p-values as a tool to rank genes but don't take them literally
• Most methods correct for multiple testing (e.g., with FDR), which is necessary

Practicalities
• Choose a tool that
  – Includes your species
  – Includes your gene/probe identifiers
  – Has up-to-date annotation
  – Lets you define your background (if possible)
• Get recommendations from the usual sources.
• Try at least a few tools.

Presenting results
• Generally ignore enriched categories which
  – Contain very few genes
  – Show high overlap with other categories
• When in doubt, select more general category.
• Simplify complex results.
• Graphical or text summary?
• Plan to share your gene lists when you publish.

Enrichment tools
• See http://www.geneontology.org/GO.tools.shtml

Some recommended tools
• DAVID
• GSEA
• BIOBASE (Whitehead has license)
• BiNGO (uses Cytoscape)
• GoMiner: http://discover.nci.nih.gov/gominer
• GOstat: http://gostat.wehi.edu.au

DAVID
• Database for Annotation, Visualization and Integrated Discovery (NIAID)
• List-based
• http://david.abcc.ncifcrf.gov/
• Lots of identifiers; lots of species
• Allows background definition
• Statistic is a modified Fisher exact test

GSEA
• Gene Set Enrichment Analysis
• Rank-based
• http://www.broadinstitute.org/gsea/
• As a Java Web Start or desktop application
• Linked to MSigDB (annotated gene lists)
• Also permits custom annotation
[Figure labels: Input: preranked gene list; Enrichment at bottom of list]

BiNGO
• BiNGO: A Biological Network Gene Ontology tool
• http://www.psb.ugent.be/cbd/papers/BiNGO/
• Works with Cytoscape network visualization tool
• Also permits custom annotation
[Figure labels: Enrichment at top of list; Shows relationship between annotation categories]

BIOBASE
• BIOBASE Knowledge Library
• Use Internet Explorer
• Go to "Gene Set Analysis"

References
• Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. (PMID: 19033363) Review
• Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. (PMID: 19131956) DAVID
• Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. (PMID: 16199517) GSEA