Genomic Data Manipulation
Thinking about data visually
Curtis Huttenhower
[email protected]
http://huttenhower.sph.harvard.edu/bio508
Harvard School of Public Health
Department of Biostatistics
01-28-13
The usual suspects
Small changes, big differences
Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7.
J C Venter et al. Science 2004;304:66-74
Published by AAAS
Only one of many ways to think about DNA sequence data...
Fig. 7. Phylogenetic tree of rhodopsin-like genes in the Sargasso Sea data along with all homologs of these genes in GenBank.
(Almost) everything can be clustered into a tree, even DNA sequences
J C Venter et al. Science 2004;304:66-74
Published by AAAS
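To make the "everything can be clustered into a tree" point concrete, here is a minimal sketch of agglomerative clustering on DNA strings. It is not the method behind the figure from Venter et al., which the slides do not describe. The toy sequences, the names `hamming`, `leaves`, and `average_linkage_tree`, and the choice of Hamming distance with average linkage are all assumptions made for brevity.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def leaves(tree):
    """Return the leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def average_linkage_tree(seqs):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clusters = list(seqs)  # every cluster starts as a single leaf name

    def dist(c1, c2):
        # Average Hamming distance over all cross-cluster pairs of leaves.
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        return sum(hamming(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append((c1, c2))
    return clusters[0]


# Hypothetical toy sequences, invented for illustration only.
toy = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "TTGTACCA",
    "seqD": "TTGTTCCA",
}
tree = average_linkage_tree(toy)
print(tree)
```

The nested tuples are the tree: each pair groups the two clusters that were merged at that step. Real phylogenetic trees are usually built from aligned sequences with evolutionary distance models, so treat this only as an illustration of the clustering idea.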
Aerobic, microaerobic and anaerobic communities
But not every tree is a clustering
Model of microbial biomarkers
Why are networks so popular in biology?
Don’t be afraid to get creative when representing data!
Hunger Games
Avengers
Dark Knight Rises
Twilight XXVII
http://xach.com/moviecharts/2012.html
Wordles
Anscombe's quartet
Four 11-pair datasets with the same X mean, X variance, Y mean, Y variance, correlation, and regression coefficients:
μ(x)=9
σ²(x)=11
μ(y)=7.5
σ²(y)=4.1
ρ=0.816
y=3+0.5x
Looking at data – it’s not just fun, it’s important, too!
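The shared statistics of Anscombe's quartet can be checked directly. The sketch below uses the data values from Anscombe's 1973 paper, which the slides do not list, and computes each summary by hand with sample (n-1) variances. The names `QUARTET` and `summarize` are illustrative assumptions.

```python
from math import sqrt

# Data values as published by Anscombe (1973); not listed in the slides.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def summarize(x, y):
    """Return means, sample variances, correlation, and least-squares fit."""
    mx, my = mean(x), mean(y)
    vx, vy = sample_variance(x), sample_variance(y)
    # Sample covariance between x and y.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = sxy / vx
    return {
        "mean_x": mx, "var_x": vx,
        "mean_y": my, "var_y": vy,
        "corr": sxy / sqrt(vx * vy),
        "slope": slope, "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four rows print nearly identical numbers, yet the four scatter plots look completely different, which is why plotting the data matters.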