Gene List Enrichment Analysis
George Bell, Ph.D.
BaRC Hot Topics
March 16, 2010
Outline
• Why do enrichment analysis?
• Main types
• Selecting or ranking genes
• Annotation sources
• Statistics
• Remaining issues
• Presenting findings
• Recommended tools
Why do enrichment analysis?
• Most array, sequencing, and screening experiments produce
– A measurement for most or all genes
– List(s) of “interesting” genes
• Most cellular processes involve sets of genes.
• Can we compare the above two datasets?
• Is the overlap different than expected?
• Does this tell us something about cellular mechanisms?
Why not just link genes to physiology?
• Too many genes to examine in detail.
• Are we biased?
• How do we know that what we’re seeing is surprising?
[Figure: Venn diagram. Genome = 20,000 genes; our list = 100 genes; schmooase activity = 1,000 genes; intersection = 10 genes]
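One way to approach the "is this surprising?" question is to compare the observed overlap with the overlap expected by chance. The sketch below uses the numbers from the Venn diagram; the function and variable names are illustrative and do not come from the slides.

```python
def expected_overlap(genome_size, list_size, category_size):
    """Overlap expected by chance if the list were a random draw from the genome."""
    return list_size * category_size / genome_size


# Numbers from the Venn diagram on the slide.
genome_size = 20_000    # all genes
list_size = 100         # our list
category_size = 1_000   # genes annotated with "schmooase activity"
observed = 10           # intersection

expected = expected_overlap(genome_size, list_size, category_size)
fold_enrichment = observed / expected

print(f"expected by chance: {expected:.1f} genes")
print(f"observed: {observed} genes ({fold_enrichment:.1f}-fold)")
```

Fold enrichment alone does not say how likely such an overlap is under chance; that is what the statistical tests later in the talk are for.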
Main types of enrichment analysis
• List-based: inputs are
– A subset of all genes chosen by some relevant method
– A list of annotations, each linked to genes
• Rank-based: inputs are
– A set of all genes ranked by some metric (ratio, fold change, etc.)
– A list of annotations, each linked to genes
• List-based with relationships: inputs are
– A subset of all genes
– A list of annotations, each linked to genes, organized in some relationship (e.g., a hierarchy)
Annotation sources
• Gene Ontology (most popular)
– biological process, molecular function, cellular component
– Terms may have >1 “parent” (more general term)
– GO Slim: includes only general categories
• KEGG; REACTOME pathways
• Genes sharing a motif or regulated by the same protein/miRNA
• Genes found on the same chromosome
• Also… see Broad’s Molecular Signatures Database (MSigDB)
• [any grouping that is biologically sensible]
Getting your list
• Goal: Identify a list of genes (or probes) that appear to be working together in some way.
• What identifiers to use?
• Most common method: Get a list of differentially expressed genes
– P-value or fold change?
– Threshold?
• Alternatives:
– Define a cluster
– Sort data and/or apply a model to rank genes
• Recommendations:
– Try lists of varying length
– Try to maximize signal/noise (What produces the smallest p-values for enrichment?)
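The selection step above can be sketched in a few lines. This is a minimal illustration, assuming differential-expression results are available as a dictionary mapping a gene identifier to a (log2 fold change, p-value) pair. The gene names, thresholds, and function names are all hypothetical.

```python
def select_genes(results, max_p=0.05, min_abs_log2fc=1.0):
    """List-based selection: keep genes passing both thresholds."""
    return sorted(
        gene
        for gene, (log2fc, p) in results.items()
        if p <= max_p and abs(log2fc) >= min_abs_log2fc
    )


def top_n_lists(results, sizes):
    """Lists of varying length: the n genes with the smallest p-values."""
    ranked = sorted(results, key=lambda gene: (results[gene][1], gene))
    return {n: ranked[:n] for n in sizes}


# Toy data: gene -> (log2 fold change, p-value). Values are made up.
results = {
    "geneA": (2.0, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.2, 0.0005),
    "geneD": (1.2, 0.2),
    "geneE": (-3.0, 0.04),
}

thresholded = select_genes(results)
by_length = top_n_lists(results, sizes=(2, 4))
print(thresholded)
print(by_length)
```

Running the enrichment test on each list in `by_length` is one way to see which list length gives the clearest signal.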
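Statistics to test for enrichment
[Figure: two Venn diagrams against a genome of 20,000 genes.
Example 1: our list = 100 genes; schmooase activity = 1,000 genes (5% of genome); intersection = 10 genes (10% of our list).
Example 2: our list = 1,000 genes; stroumphase activity = 20 genes (0.1% of genome); intersection = 2 genes (0.2% of our list).]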
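Both examples in the figure show a twofold enrichment, yet they carry different amounts of evidence. A minimal sketch of a one-sided hypergeometric test, using only the Python standard library, makes the difference concrete. The function name and argument names are illustrative; the slides do not say which tool or implementation produced any particular p-value.

```python
from math import comb


def hypergeom_upper_tail(genome_size, category_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a genome in which category_size genes carry the annotation."""
    total = comb(genome_size, list_size)
    upper = min(list_size, category_size)
    favorable = sum(
        comb(category_size, i) * comb(genome_size - category_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Example 1: 100-gene list, 1,000-gene category, 10 genes in common.
p_schmooase = hypergeom_upper_tail(20_000, 1_000, 100, 10)

# Example 2: 1,000-gene list, 20-gene category, 2 genes in common.
p_stroumphase = hypergeom_upper_tail(20_000, 20, 1_000, 2)

print(f"schmooase activity:   p = {p_schmooase:.3g}")
print(f"stroumphase activity: p = {p_stroumphase:.3g}")
```

Despite the identical fold enrichment, the first overlap is considerably less likely under chance than the second, because an overlap of 2 genes with a 20-gene category is easy to obtain at random.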
Tests for enrichment
• Fisher’s exact
• Hypergeometric
• Binomial
• Chi-squared
• Z
• Kolmogorov-Smirnov
• Permutation
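Most of the tests above work on list-based input. For rank-based input, a Kolmogorov-Smirnov-style running sum combined with a permutation test is a common pattern. The sketch below is a simplified, unweighted version for illustration only: GSEA itself uses a weighted variant, and the function names, gene names, and permutation scheme here are assumptions.

```python
import random


def running_sum_score(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up at set members and down
    otherwise; return the largest deviation from zero (signed)."""
    members = set(gene_set) & set(ranked_genes)
    n_total, n_hit = len(ranked_genes), len(members)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_step = 1.0 / n_hit
    miss_step = 1.0 / (n_total - n_hit)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Fraction of random gene sets of the same size scoring at least as
    extreme as the observed set (with a +1 pseudocount)."""
    genes = list(ranked_genes)
    rng = random.Random(seed)
    observed = abs(running_sum_score(genes, gene_set))
    size = len(set(gene_set) & set(genes))
    hits = sum(
        abs(running_sum_score(genes, rng.sample(genes, size))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


ranked = [f"g{i}" for i in range(1, 11)]      # g1 = top of the ranking
top_score = running_sum_score(ranked, {"g1", "g2"})
bottom_score = running_sum_score(ranked, {"g9", "g10"})
p_top = permutation_pvalue(ranked, {"g1", "g2"}, n_perm=200, seed=1)
print(top_score, bottom_score, p_top)
```

A positive score indicates that the set is concentrated near the top of the ranking, and a negative score indicates concentration near the bottom.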
Other statistical issues
• Goal: Identifying theme(s) of maximal biological significance
– but this is not perfectly correlated with statistical significance
• What is your background gene set?
– All genes that could appear in your list
• What about sparse annotation groups?
• Some annotation terms may be subsets of other terms.
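The background question matters because every count that goes into the test depends on it. One way to handle this, sketched below under the assumption that genes are plain identifier strings, is to restrict both the gene list and the annotation category to the background before counting. The function name and the toy identifiers are hypothetical.

```python
def enrichment_counts(background, gene_list, category):
    """Return (background size, category size, list size, overlap),
    counting only genes that belong to the background."""
    background = set(background)
    gene_list = set(gene_list) & background
    category = set(category) & background
    return len(background), len(category), len(gene_list), len(gene_list & category)


# Toy example: only g1..g10 could have appeared in the list
# (for instance, the genes represented on the array).
background = {f"g{i}" for i in range(1, 11)}
gene_list = {"g1", "g2", "g3", "g99"}        # g99 is not in the background
category = {"g2", "g3", "g4", "g50"}         # g50 is not in the background

counts = enrichment_counts(background, gene_list, category)
print(counts)
```

Using the whole genome as background when only a subset of genes was measured inflates the apparent enrichment of any category that is well represented among the measured genes.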
Statistics to test for enrichment
• What is the probability of observing enrichment at least this extreme due to chance?
• Different tests produce very different ranges of p-values
• All look for over-enrichment; some also look for under-enrichment
• Recommendation: Use p-values as a tool to rank genes but don’t take them literally
• Most methods correct for multiple testing (e.g., with FDR), which is necessary
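The slide mentions FDR without naming a procedure. As an illustration, the sketch below implements the Benjamini-Hochberg adjustment, which is one widely used FDR method; whether a given enrichment tool uses this exact procedure is an assumption to check in that tool's documentation. The input p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# One raw p-value per annotation category tested (toy values).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

With thousands of annotation terms tested at once, some small raw p-values are expected by chance alone, which is why the adjusted values are the ones to rank and report.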
Practicalities
• Choose a tool that
– Includes your species
– Includes your gene/probe identifiers
– Has up-to-date annotation
– Lets you define your background (if possible)
• Get recommendations from the usual sources.
• Try at least a few tools.
Presenting results
• Generally ignore enriched categories which
– Contain very few genes
– Show high overlap with other categories
• When in doubt, select more general category.
• Simplify complex results.
• Graphical or text summary?
• Plan to share your gene lists when you publish.
Enrichment tools
• See http://www.geneontology.org/GO.tools.shtml
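Whichever tool produces the enriched categories, the advice under Presenting results (ignore categories with very few genes or high overlap, and prefer the more general category) can be applied as a post-processing step. The sketch below is one possible reading of that advice: it uses category size as a rough proxy for generality and Jaccard similarity as the overlap measure. Both choices, the thresholds, and the category names are assumptions, not something the slides specify.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def simplify_categories(categories, min_genes=3, max_overlap=0.5):
    """categories maps a category name to the list genes it contains.
    Drop small categories, then keep larger categories first and skip
    any category that overlaps heavily with one already kept."""
    usable = {
        name: set(genes)
        for name, genes in categories.items()
        if len(genes) >= min_genes
    }
    kept = []
    for name in sorted(usable, key=lambda n: (-len(usable[n]), n)):
        if all(jaccard(usable[name], usable[k]) <= max_overlap for k in kept):
            kept.append(name)
    return kept


enriched = {
    "cell cycle": {"a", "b", "c", "d", "e"},
    "mitotic cell cycle": {"a", "b", "c", "d"},   # mostly inside "cell cycle"
    "transport": {"x", "y", "z"},
    "tiny category": {"q"},                        # too few genes
}
summary = simplify_categories(enriched)
print(summary)
```

A real analysis could use the annotation hierarchy itself rather than set size to decide which of two overlapping categories is the more general one.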
Some recommended tools
• DAVID
• GSEA
• BIOBASE (Whitehead has license)
• BiNGO (uses Cytoscape)
• GoMiner: http://discover.nci.nih.gov/gominer
• GOstat: http://gostat.wehi.edu.au
DAVID
• Database for Annotation, Visualization and Integrated Discovery (NIAID)
• List-based
• http://david.abcc.ncifcrf.gov/
• Lots of identifiers; lots of species
• Allows background definition
• Statistic is a modified Fisher exact test
GSEA
• Gene Set Enrichment Analysis
• Rank-based
• http://www.broadinstitute.org/gsea/
• As a Java Web Start or desktop application
• Linked to MSigDB (annotated gene lists)
• Also permits custom annotation
[Figure: GSEA enrichment plots. Input: preranked gene list. Panels: enrichment at top of list; enrichment at bottom of list]
BiNGO
• BiNGO: A Biological Network Gene Ontology tool
• http://www.psb.ugent.be/cbd/papers/BiNGO/
• Works with Cytoscape network visualization tool
• Also permits custom annotation
[Figure: BiNGO output. Shows relationship between annotation categories]
BIOBASE
• BIOBASE Knowledge Library
• Use Internet Explorer
• Go to “Gene Set Analysis”
References
• Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. (PMID: 19033363) Review
• Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. (PMID: 19131956) DAVID
• Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. (PMID: 16199517) GSEA