Good Statistics Can Help You Discover Biological Process Using Modern Genomic Assays
Tom Downey
President, Partek Inc.
GoToWebinar Attendee Interface
1. Viewer Window
2. Control Panel
Copyright © 2009, All Rights Reserved. Partek Inc.
Who is Partek?
• Founded in 1993
• Based in St. Louis, MO, USA
• Focused on Genomics
• Thousands of customers worldwide
• Building tools for both biologists and bioinformaticians
Partek® GS™ - Software for Integrated Genomics
• Multi-Platform Support
• Multi-Assay Support
  • Gene Expression
  • Alternative Splicing
  • miRNA
  • Copy Number Variation
  • ChIP-Chip, ChIP-Seq
  • RNA-Seq
Partek® GS™ for Integrated Genomics
Microarray & Next Generation Sequencing
Genome
• Copy Number
• Total & Allele Specific
• Association
• Loss of Heterozygosity
Transcriptome
• Gene Expression
• Exon/Alternative Splicing
• DGE & mRNA-Seq
Regulation
• ChIP-Chip
• ChIP-Seq
• microRNA
What is Statistics?
• Statistics (Source: www.m-w.com): a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data
The Role of Experimental Design
• The goal of statistics is to find signals in a sea of noise
• The goal of experimental design is to reduce that noise so true biological signals can be found with as small a sample size as possible
Example 1: ER+ Breast Cancer Study
• Evaluate expression of ER+ breast cancer cell lines
  • 5 treatments
  • 3 time points
  • 2 replicates

Replicates per treatment and time point:

         Control   Estrogen (E2)   E2 + ICI   E2 + Raloxifene   E2 + Tamoxifen
0 hr        2            0             0             0                 0
8 hr        0            2             2             2                 2
48 hr       0            2             2             2                 2
As Seen Using PCA
• Samples clustered by batches “A” and “B”
As Seen Using Hierarchical Clustering
Good News!
• Since the treatments were perfectly balanced with the batches, the batch effect can be completely removed from the data.
• With a simple 2-way ANOVA, this gene was #228 on the gene list and would not pass multiple test correction for significance. With a 3-way ANOVA including batch, it was #2 on the gene list.

p-values for this gene:

Factor            2-way ANOVA    3-way ANOVA
Treatment         0.00391497     3.43275e-007
Time              0.396031       0.00964938
Treatment*Time    0.100862       3.56752e-005
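The jump in rank follows from how the F-test works: batch variation left out of the model ends up in the noise term and masks real effects. The following is a minimal sketch on simulated data, not Partek's implementation. It assumes a simplified design with 9 conditions (hypothetical stand-ins for the treatment/time cells) and one replicate of each condition in each of two batches; all effect sizes are made up. It compares the F statistic for the condition factor with and without batch in the model.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares and residual degrees of freedom of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), len(y) - np.linalg.matrix_rank(X)

def f_stat(X_full, X_reduced, y):
    """F statistic for the terms that are in X_full but not in X_reduced."""
    rss_f, df_f = rss(X_full, y)
    rss_r, df_r = rss(X_reduced, y)
    return ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)

# Simulated expression values for one gene (all numbers are hypothetical).
rng = np.random.default_rng(0)
n_cond = 9
cond = np.repeat(np.arange(n_cond), 2)   # each condition measured twice
batch = np.tile([0, 1], n_cond)          # one replicate per batch: balanced
cond_effect = rng.normal(0, 0.5, n_cond)
y = 8.5 + cond_effect[cond] + 1.2 * batch + rng.normal(0, 0.1, cond.size)

intercept = np.ones((cond.size, 1))
C = np.eye(n_cond)[cond][:, 1:]          # dummy coding, first level dropped
B = batch[:, None].astype(float)

# Condition tested while ignoring batch, then with batch in the model.
f_without = f_stat(np.hstack([intercept, C]), intercept, y)
f_with = f_stat(np.hstack([intercept, C, B]), np.hstack([intercept, B]), y)
print(f"F for condition, batch ignored:  {f_without:.2f}")
print(f"F for condition, batch included: {f_with:.2f}")
```

Because the simulated design is balanced, the sum of squares for condition is the same in both fits; only the noise estimate changes.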
Balanced with respect to Batch

Monday
         Control   Estrogen (E2)   E2 + ICI   E2 + Raloxifene   E2 + Tamoxifen
0 hr        1
8 hr                     1             1             1                 1
48 hr                    1             1             1                 1

Tuesday
         Control   Estrogen (E2)   E2 + ICI   E2 + Raloxifene   E2 + Tamoxifen
0 hr        1
8 hr                     1             1             1                 1
48 hr                    1             1             1                 1
#2 Most Significant Gene
• MedianA = 8.5, MedianB = 9.7
• Tue/Mon more than 2-fold
• Monday on top, Tuesday on bottom
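The fold-change remark can be checked with one line of arithmetic. This sketch assumes the two medians are log2 intensities for the two batches; the slide does not state the scale.

```python
# Assumption: medians are on a log2 scale, so a difference d means a 2**d fold change.
median_a = 8.5
median_b = 9.7
fold_change = 2 ** (median_b - median_a)
print(f"fold change: {fold_change:.2f}")
```

Under that assumption the difference of 1.2 corresponds to roughly a 2.3-fold change, consistent with "more than 2-fold".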
ER+ B.C. Unadjusted Signal
ER+ B.C. Batch-Adjusted Signal
Experiment Design: Consider Batch Effects
How NOT to Ruin Your Next Experiment!
• Samples are frequently “organized” by treatment groups.
• Samples are then processed in batches corresponding to treatment groups.
• E.g., Control samples processed on Monday, Treated samples processed on Tuesday.
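One way to avoid this confounding is to spread each treatment group evenly across the processing batches and randomize which samples go where. The sketch below is an illustration only; the function name `assign_batches`, the sample IDs, and the group sizes are hypothetical.

```python
import random
from collections import defaultdict

def assign_batches(samples, n_batches=2, seed=0):
    """Spread each treatment group evenly over batches.

    samples: list of (sample_id, group) pairs.
    Returns a dict mapping sample_id to a batch index.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    for ids in by_group.values():
        rng.shuffle(ids)                   # randomize order within the group
        offset = rng.randrange(n_batches)  # randomize which batch gets the first sample
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (i + offset) % n_batches
    return assignment

# Hypothetical experiment: 4 control and 4 treated samples, 2 processing days.
samples = [(f"{g}_{r}", g) for g in ["Control", "Treated"] for r in range(4)]
batches = assign_batches(samples)
for sample_id, group in samples:
    print(sample_id, "-> batch", batches[sample_id])
```

With this assignment every batch contains the same number of samples from each group, so a batch term can later be separated from the treatment effect.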
Statistical Analysis
A brief history of statistics
• Pearson, 1900-1901: Regression, Correlation (r), Chi-Square
• Student (William Gosset), 1908: t-test
• Fisher, 1923: ANOVA
• 1934: Statisticians began to realize that these were all re-inventions of the “linear model”…
Analysis of Variance
• Analysis (m-w.com): separation of a whole into its component parts
• “Analysis of Variance” (ANOVA) measures the effect of multiple experimental factors (or phenotypes) on expression levels.
[Pie chart: share of total variance by source. Labels: Treatment, Time, Treatment * Time, Batch, Noise. Values: 1.15%, 17.49%, 17.40%, 58.36%, 1.64%]
2-way ANOVA

Total variance is partitioned into variability due to
influencing factors and the rest is assumed to be due to
random error (noise).
[Pie chart: share of total variance by source. Labels: Treatment, Time, Treatment * Time, Noise. Values: 18.64%, 17.40%, 58.36%, 1.64%]
R² = 81% for 2-way ANOVA
ANOVA Partitions Variability

Total variance is partitioned into variability due to
influencing factors and the rest is assumed to be due to
random error (noise).
[Pie chart: share of total variance by source. Labels: Treatment, Time, Treatment * Time, Batch, Noise. Values: 1.15%, 17.49%, 17.40%, 58.36%, 1.64%]
R² = 99% when Batch included
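The partition can be sketched numerically. The code below is an illustration on simulated data, not the analysis behind the slides: it assumes a balanced design with 9 hypothetical conditions and two batches, with made-up effect sizes. It splits the total sum of squares into condition, batch, and noise shares, and reports R² with and without the batch factor.

```python
import numpy as np

# Simulated expression values for one gene (all numbers are hypothetical).
rng = np.random.default_rng(1)
n_cond = 9
cond = np.repeat(np.arange(n_cond), 2)   # each condition measured twice
batch = np.tile([0, 1], n_cond)          # one replicate per batch: balanced
y = 8.0 + rng.normal(0, 0.5, n_cond)[cond] + 1.0 * batch + rng.normal(0, 0.1, cond.size)

grand_mean = y.mean()
ss_total = np.sum((y - grand_mean) ** 2)

def ss_factor(labels):
    """Between-group sum of squares for one factor.

    In a balanced design the factors are orthogonal, so these
    sums of squares add up with the residual to ss_total.
    """
    return sum(
        np.sum(labels == g) * (y[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )

ss_cond = ss_factor(cond)
ss_batch = ss_factor(batch)
ss_noise_without = ss_total - ss_cond             # batch variation counted as noise
ss_noise_with = ss_total - ss_cond - ss_batch     # batch modeled explicitly

r2_without = ss_cond / ss_total
r2_with = (ss_cond + ss_batch) / ss_total

for name, ss in [("Condition", ss_cond), ("Batch", ss_batch), ("Noise", ss_noise_with)]:
    print(f"{name:<10}{100 * ss / ss_total:6.2f}%")
print(f"R2 without batch: {r2_without:.2f}")
print(f"R2 with batch:    {r2_with:.2f}")
```

Adding batch to the model does not change the condition share; it moves the batch share out of the noise slice, which is why R² rises.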
Monitor Biological Trends with GO Enrichment
Upregulation/Downregulation by Biological Process
Correlation is a form of ANOVA
• Correlating a continuous variable with a binary variable is mathematically equivalent to a t-test, or 1-way ANOVA

Gene A   Gene B   …   AML/ALL
8.3      5.6      …   0 (AML)
7.1      7.3      …   1 (ALL)
6.8      8.2      …   1 (ALL)
8.9      6.1      …   0 (AML)
7.3      6.9      …   1 (ALL)
…        …        …   …
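The equivalence can be verified directly. In this sketch the expression values and group labels are hypothetical. The Pearson correlation r between a 0/1 group variable and expression is converted to a t statistic with t = r·sqrt((n-2)/(1-r²)) and compared with the pooled-variance two-sample t statistic; for two groups, the 1-way ANOVA F statistic is the square of that t.

```python
import numpy as np

# Hypothetical data: 0/1 group labels and expression values for one gene.
group = np.array([0, 1, 1, 0, 1, 0, 1, 0])
expr = np.array([8.3, 7.1, 6.8, 8.9, 7.3, 8.6, 6.5, 8.1])
n = expr.size

# t statistic derived from the Pearson correlation.
r = np.corrcoef(group, expr)[0, 1]
t_from_r = r * np.sqrt((n - 2) / (1 - r ** 2))

# Classic two-sample t statistic with pooled variance (group 1 minus group 0).
a, b = expr[group == 1], expr[group == 0]
pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (n - 2)
t_pooled = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / a.size + 1 / b.size))

# With two groups, the 1-way ANOVA F statistic equals t squared.
f_anova = t_pooled ** 2

print(f"t from correlation: {t_from_r:.4f}")
print(f"pooled t-test:      {t_pooled:.4f}")
print(f"1-way ANOVA F:      {f_anova:.4f}")
```

The two t values agree to floating-point precision, so the correlation test, the t-test, and the 1-way ANOVA give the same p-value for a binary grouping.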
Example 2: Aging Study
• Aging Study:
  • Examine how gene expression profiles change with age (GEO GSE8479)
  • Total of 49 Adult Subjects
  • N=24 Young (Age 18-29)
  • N=25 Old (Age 65-84)
  • 26 Female, 23 Male
  • ~6 BeadChips (6*8=48)
Simple Correlation of Age w/ Genes
• R² = 81% for correlation (1-factor ANOVA)
• Noise = 19%
Correlation of Age w/ Genes, considering other effects
• R² = 90% using 3-factor ANOVA
• Noise = 10%
Partek ANOVA
• Balanced & Unbalanced & Incomplete
• Random & Fixed Effects (mixed model)
• Numeric & Categorical Variables
• Nested/Nesting (Hierarchical) designs
• Any number of factors
• Flexible Linear Contrasts
Sample preparation → High-Throughput Sequencing → Alignment → Downstream Analysis → Publication
Partek Genome Browser – RNA-Seq Data
Next Generation Sequencing Data Along with Exon Microarray Data
Partek® Genomics Suite™
• Powerful Statistics with Interactive Visualization
• Fast, Memory-efficient
• Easy to Use
• Support for a Wide Variety of Genomic Technologies
  • 3’ Gene Expression
  • Whole Transcript Exon & Gene
  • Copy Number, AsCN, LOH
  • microRNA Expression
  • ChIP-chip
  • ChIP-Seq and RNA-Seq
  • TaqMan Real-time PCR
• Genomic Integration
• Integrated with Public Genomic Resources: NCBI GEO, UCSC, Ensembl, Gene Ontology, KEGG etc.
miRNA negatively correlated with its mRNA target
[Plot: microRNA hsa_miR_25 against mRNA expression 208860_s_at]
Questions & Answers
Questions may be sent via the question box
FREE Data Analysis Webinars
www.partek.com/webinars
Get your FREE trial today!
Email [email protected]