Transcript
11/2/14

Making biological sense of complex genome datasets
•  Most methods will reveal complex lists of hundreds or thousands of genes
   –  How should they be interpreted?
•  The first step is usually GO and pathway analysis
   –  If your experiment worked well, the genes should have related functions
•  DAVID (http://david.abcc.ncifcrf.gov/) offers collections of GO, pathway databases, protein domain databases, and others
   –  Different datasets are tied together in “clusters” based on overlapping gene content
   –  The DAVID clustering algorithm gives a correlated cluster a score based on the cumulative p-values of its entrants, which allows some lower-scoring categories to still enrich your biological understanding

But what next? Which genes and functions are most informative and representative?
•  Especially important for complex datasets with multiple conditions, e.g. timecourse, tissues, individuals
   –  Most such gene sets will comprise several groups, with different patterns of expression and separate (although likely related) functions
   –  Correlation in a cluster with known genes can suggest function for novel transcripts (“guilt by association”)
   –  Choosing examples from the major clusters can be a good strategy for developing follow-on hypotheses

Cluster assignment is the basis of “heat maps” and expression correlation networks
•  Different types of statistical methods can be used to measure pattern similarity, based on an “expression matrix”
•  The simplest methods use a plain Pearson correlation statistic, but other variants are more accurate (e.g. weighted correlation, WGCNA)
•  Each comparison is between the “vectors” of values for genes A and B over N conditions
•  Accurate correlation requires a large matrix with varied conditions!

[Figure: expression matrix, with genes 1, 2, 3, ..., n along one axis and conditions A, B, C, ..., J, ... along the other; the cells hold the expression value (V1 ... Vn) under each condition.]

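The comparison of two expression “vectors” described above can be sketched in a few lines of Python. This is only an illustration of the Pearson statistic on a toy expression matrix, not the code used by any of the tools named in these slides; the gene names, the values, and the helper names `pearson` and `correlation_matrix` are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each value from the gene's own mean expression.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant vector has no defined correlation")
    return cov / norm


def correlation_matrix(matrix):
    """Correlate every gene with every other gene in the expression matrix."""
    genes = list(matrix)
    return {(g, h): pearson(matrix[g], matrix[h]) for g in genes for h in genes}


# Hypothetical expression matrix: one row per gene, one value per condition.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
}

corr = correlation_matrix(expression)
for (g, h), r in sorted(corr.items()):
    if g < h:
        print(f"{g} vs {h}: r = {r:+.3f}")
```

With only five conditions the estimates are fragile, which is the point made above about needing a large matrix with varied conditions.
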
Cluster analysis: heat maps
•  Cluster software was developed by Michael Eisen in the early days of microarray analysis
•  The latest version is Cluster3.0, developed by M. de Hoon (U. Tokyo, http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm)
   –  Offers many different statistical options and data analysis paths; see the (very long, but excellent) manual on the website for more information
   –  Default is Pearson correlation
•  Cluster is available as a GUI (which we will use) and as a standalone package
   –  Combined with Java TreeView software for display (http://jtreeview.sourceforge.net/)
•  A valuable tool for display is “centering”
   –  Plots the expression of each gene across the conditions as a + (red) or – (green) value relative to the average expression of that gene

Expression vector correlations also underlie expression correlation networks
•  Expression correlations can also be displayed in network format
   –  Distances between the vectors describing expression patterns for “nodes” (genes) are depicted as edges of proportional lengths (closer = more similar)
   –  Positive and negative correlations can be displayed
   –  Genes within a network “module” are related, and can be linked through interconnecting nodes
   –  Node metrics can be measured, such as “betweenness centrality”, which identifies the most central nodes
•  For both Cluster and networks, it helps not to try to display every gene
   –  Select differentially expressed genes for display, or use some other metric

Network statistics and visualization
•  Weighted correlations produce superior results
   –  e.g. WGCNA, Horvath et al.
•  Simpler statistical packages can still yield useful information for working hypotheses
   –  The same Pearson statistics used by default in Cluster3.0
   –  Input is an expression matrix: same basic format as for Cluster
•  A very simple, user-friendly tool is ExpressionCorrelation by Joel Bader
   –  A free “plug-in” that can be deployed and displayed using Cytoscape (v 2.8.1; not supported yet for Cytoscape v3)
•  Like Cluster, network analysis is a valuable tool for linking genes into groups, and for hypothesis development
   –  But as with all gene expression analysis tools, you should be careful: changing parameters or statistical tools can yield different results!

Exercises available for Cluster/ExpressionCorrelation
•  Create heatmaps from an expression dataset using Cluster3.0
•  Use the same expression dataset, formatted for ExpressionCorrelation, to create, analyze and examine a network in Cytoscape
•  Optional:
   –  Analyze a gene expression dataset from GEO, and download the expression matrix file
   –  Select a subset of genes from a genome-wide expression matrix using Galaxy
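
As a companion to the network slides, the sketch below shows one way a correlation network and the “betweenness centrality” metric could be computed by hand. It is an assumption-laden illustration, not the method used by ExpressionCorrelation or Cytoscape: the correlation values, the gene names, the 0.8 cutoff, and the helper names `build_network` and `betweenness` are hypothetical, and the centrality is the plain unweighted, unnormalized form (Brandes' algorithm).

```python
from collections import deque


def build_network(corr, genes, threshold=0.8):
    """Link two genes when |r| meets the threshold; the sign of r is kept."""
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = corr[(g, h)]
            if abs(r) >= threshold:
                edges[(g, h)] = r
    return edges


def betweenness(genes, edges):
    """Unnormalized betweenness centrality for an unweighted, undirected network."""
    neighbours = {g: [] for g in genes}
    for g, h in edges:
        neighbours[g].append(h)
        neighbours[h].append(g)
    score = {g: 0.0 for g in genes}
    for source in genes:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {g: [] for g in genes}
        paths = {g: 0 for g in genes}
        dist = {g: -1 for g in genes}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {g: 0.0 for g in genes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected path was counted once from each of its two ends.
    return {g: s / 2.0 for g, s in score.items()}


# Hypothetical pairwise correlations between four genes.
genes = ["g1", "g2", "g3", "g4"]
corr = {
    ("g1", "g2"): 0.92, ("g1", "g3"): 0.35, ("g1", "g4"): 0.10,
    ("g2", "g3"): -0.88, ("g2", "g4"): 0.20, ("g3", "g4"): 0.95,
}

edges = build_network(corr, genes, threshold=0.8)
centrality = betweenness(genes, edges)
for g in genes:
    print(g, centrality[g])
```

Changing the `threshold` argument changes which edges exist and therefore which genes look central, a small-scale version of the warning above that different parameters can yield different results.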