11/2/14

Making biological sense of complex genome datasets
• Most methods will reveal complex lists of hundreds or thousands of genes: how to interpret them?
• First step is usually GO and pathway analysis
  – If your experiment worked well, genes should be of related functions
• DAVID (http://david.abcc.ncifcrf.gov/) offers collections of GO, pathway databases, protein domain databases, and others
  – Different datasets are tied together in “clusters” based on overlapping gene content
  – The DAVID clustering algorithm gives a correlated cluster a score based on the cumulative p-values of its entrants, which allows some lower-scoring categories to still enrich your biological understanding

But what next? Which genes and functions are most informative and representative?
• Especially important for complex datasets with multiple conditions, e.g. time course, tissues, individuals
  – Most such gene sets will be composed of several groups, with different patterns of expression and separate (although likely related) functions
  – Correlation in a cluster with known genes can suggest function for novel transcripts (“guilt by association”)
  – Choosing examples from the major clusters can be a good strategy for developing follow-on hypotheses

Cluster assignment is the basis of “heat maps” and expression correlation networks
• Different types of statistical methods can be used to measure pattern similarity, based on an “expression matrix”
• Simplest methods use a simple Pearson correlation statistic, but other variants are more accurate (e.g. weighted correlation, WGCNA)
• Compares the “vector” of values for genes A and B over N conditions
• Accurate correlation requires a large matrix with varied conditions!

Expression matrix: expression value (V) under each condition (rows are genes 1 to n, columns are conditions A, B, C, ...)

  Genes   A    B   C   D   E   F   G   H   I   J   ...
  1       V1   .   .   .   .   .   .   .   .   .
  2       V2   .   .   .   .   .   .   .   .   .
  3       V3   .   .   .   .   .   .   .   .   .
  ...
  n       Vn   .   .   .   .   .   .   .   .   .

Cluster analysis: heat maps
• Cluster software was developed by Michael Eisen in the early days of microarray analysis
• Latest version is Cluster3.0, developed by M. de Hoon (U. Tokyo, http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm)
  – Offers many different statistical options and data analysis paths; see the (very long, but excellent) manual on the website for more information
  – Default is Pearson correlation
• Cluster is available as a GUI (which we will use) and as a standalone package
  – Combined with Java TreeView software for display (http://jtreeview.sourceforge.net/)
• A valuable tool for display is “centering”
  – Plots the expression of each gene across the conditions as a + (red) or – (green) value relative to the average expression of the gene

Expression vector correlations also underlie expression correlation networks
• Expression correlations can also be displayed in network format
  – Distances between the vectors describing expression patterns for “nodes” (genes) are depicted as edges of proportional lengths (closer = more similar)
  – Positive and negative correlations can be displayed
  – Genes within a network “module” are related, and can be linked through interconnecting nodes
  – Node metrics can be measured, such as “betweenness centrality”, which identifies the most central nodes
• For both Cluster and networks, it helps to not try to display every gene
  – Select differentially expressed genes for display, or use some other metric

Network statistics and visualization
• Weighted correlations produce superior results
  – e.g. WGCNA, Horvath et al.
• Simpler statistical packages can still yield useful information for working hypotheses
  – The same Pearson statistics used by default in Cluster3.0
  – Input is an expression matrix: same basic format as for Cluster
• A very simple, user-friendly tool is ExpressionCorrelation by Joel Bader
  – A free “plug-in” that can be deployed and displayed using Cytoscape (v 2.8.1; not supported yet for Cytoscape v3)
• Like Cluster, network analysis is a valuable tool for linking genes into groups, and for hypothesis development
  – But as with all gene expression analysis tools you should be careful: changing parameters or statistical tools can yield different results!

Exercises available for Cluster/ExpressionCorrelation
• Create heatmaps from an expression dataset using Cluster3.0
• Use the same expression dataset, formatted for ExpressionCorrelation, to create, analyze and examine a network in Cytoscape
• Optional:
  – Analyze a gene expression dataset from GEO, and download the expression matrix file
  – Select a subset of genes from a genome-wide expression matrix using Galaxy
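
The slides describe two operations on an expression matrix: “centering” each gene around its own average, and comparing the expression vectors of two genes with a Pearson correlation. The following is a minimal sketch of both in plain Python, not the code used by Cluster3.0 or ExpressionCorrelation. The gene names and values are hypothetical toy data, and the helper names `center` and `pearson` are chosen here for illustration only.

```python
import math

# Hypothetical toy expression matrix: gene -> values over N = 4 conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}


def center(values):
    """Subtract the gene's mean, so each value is relative to its average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    cx, cy = center(x), center(y)
    numerator = sum(a * b for a, b in zip(cx, cy))
    denominator = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


# Centered values: positive = above the gene's average, negative = below.
centered_a = center(expression["geneA"])

# geneA and geneB rise together; geneA and geneC move in opposite directions.
r_ab = pearson(expression["geneA"], expression["geneB"])
r_ac = pearson(expression["geneA"], expression["geneC"])
```

With only four conditions these correlations are trivially perfect, which illustrates the point made in the slides: a meaningful correlation needs a large matrix with varied conditions.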
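
The network view described in the slides can be sketched in the same spirit: connect genes whose correlation is strong (positive or negative), then rank nodes by “betweenness centrality”. This sketch assumes the pairwise correlations have already been computed; the values, the 0.8 threshold, and the function names `build_network` and `betweenness` are illustrative assumptions, not part of ExpressionCorrelation or Cytoscape. Betweenness is computed with Brandes' algorithm for unweighted graphs, which is a simplification compared with weighted approaches such as WGCNA.

```python
from collections import deque

# Hypothetical pairwise Pearson correlations between genes (made-up values).
correlations = {
    ("g1", "g2"): 0.95,
    ("g2", "g3"): 0.90,
    ("g3", "g4"): -0.92,
    ("g1", "g3"): 0.30,
    ("g1", "g4"): -0.10,
    ("g2", "g4"): 0.20,
}


def build_network(corr, threshold=0.8):
    """Keep an edge wherever |r| >= threshold (positive or negative)."""
    graph = {}
    for (a, b), r in corr.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if abs(r) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted from both of its endpoints.
    return {v: c / 2 for v, c in score.items()}


network = build_network(correlations)
centrality = betweenness(network)
most_central = sorted(centrality, key=centrality.get, reverse=True)[:2]
```

In this toy network the strong correlations form a chain g1, g2, g3, g4, so the two interior genes carry all shortest paths between the others. As the slides caution, changing the threshold or the correlation statistic can change which genes appear central.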