Download anova

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistical inference wikipedia , lookup

History of statistics wikipedia , lookup

Analysis of variance wikipedia , lookup

Time series wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
SPH 247
Statistical Analysis of
Laboratory Data
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
1
ANOVA—Fixed and Random Effects
 We will review the analysis of variance (ANOVA) and
then move to random and fixed effects models
 Nested models are used to look at levels of variability
(days within subjects, replicate measurements within
days)
 Crossed models are often used when there are both
fixed and random effects.
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
2
The Basic Idea
 The analysis of variance is a way of testing whether
observed differences between groups are too large to
be explained by chance variation
 One-way ANOVA is used when there are
k ≥ 2 groups for one factor, and no other quantitative
variable or classification factor.
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
3
A
April 2, 2013
B
C
9
10
12
7
9
14
7
8
14
9
9
12
SPH 247 Statistical Analysis of Laboratory Data
4
Data = Grand Mean +
Column Deviations from grand mean +
Cell Deviations from column mean
Are the column deviations from the grand mean
too big to be accounted for by the cell
deviations from the column means?
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
5
Data
A
April 2, 2013
B
C
9
10
12
7
9
14
7
8
14
9
9
12
SPH 247 Statistical Analysis of Laboratory Data
6
Column Means
A
April 2, 2013
B
C
8
9
13
8
9
13
8
9
13
8
9
13
SPH 247 Statistical Analysis of Laboratory Data
7
Deviations from Column Means
A
April 2, 2013
B
C
1
1
-1
-1
0
1
-1
-1
1
1
0
-1
SPH 247 Statistical Analysis of Laboratory Data
8
red.cell.folate
package:ISwR
R Documentation
Red cell folate data
Description:
The 'folate' data frame has 22 rows and 2 columns. It contains
data on red cell folate levels in patients receiving three
different methods of ventilation during anesthesia.
Format:
This data frame contains the following columns:
folate: a numeric vector. Folate concentration (μg/l).
ventilation: a factor with levels:
'N2O+O2,24h': 50% nitrous oxide and 50% oxygen, continuously for 24~hours;
'N2O+O2,op': 50% nitrous oxide and 50% oxygen, only during operation;
'O2,24h':
no nitrous oxide, but 35-50% oxygen for 24~hours.
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
9
> data(red.cell.folate)
> help(red.cell.folate)
> summary(red.cell.folate)
folate
ventilation
Min.
:206.0
N2O+O2,24h:8
1st Qu.:249.5
N2O+O2,op :9
Median :274.0
O2,24h
:5
Mean
:283.2
3rd Qu.:305.5
Max.
:392.0
> attach(red.cell.folate)
> plot(folate ~ ventilation)
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
10
350
300
200
250
folate
N2O+O2,24h
N2O+O2,op
O2,24h
ventilation
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
11
> folate.lm <- lm(folate ~ ventilation)
> summary(folate.lm)
Call:
lm(formula = folate ~ ventilation)
Residuals:
Min
1Q
-73.625 -35.361
Median
-4.444
3Q
35.625
Max
75.375
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
316.62
16.16 19.588 4.65e-14 ***
ventilationN2O+O2,op
-60.18
22.22 -2.709
0.0139 *
ventilationO2,24h
-38.62
26.06 -1.482
0.1548
--Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 45.72 on 19 degrees of freedom
Multiple R-Squared: 0.2809,
Adjusted R-squared: 0.2052
F-statistic: 3.711 on 2 and 19 DF, p-value: 0.04359
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
12
> anova(folate.lm)
Analysis of Variance Table
Response: folate
Df Sum Sq Mean Sq F value Pr(>F)
ventilation 2 15516
7758 3.7113 0.04359 *
Residuals
19 39716
2090
--Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
13
Two- and Multi-way ANOVA
 If there is more than one factor, the sum of squares can
be decomposed according to each factor, and possibly
according to interactions
 One can also have factors and quantitative variables in
the same model (cf. analysis of covariance)
 All have similar interpretations
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
14
Heart rates after enalaprilat (ACE inhibitor)
Description:
36 rows and 3 columns.
data for nine patients with congestive heart failure before and
shortly after administration of enalaprilat, in a balanced two-way
layout.
Format:
hr a numeric vector. Heart rate in beats per minute.
subj a factor with levels '1' to '9'.
time a factor with levels '0' (before), '30',
(minutes after administration).
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
'60', and
'120'
15
> data(heart.rate)
> attach(heart.rate)
> heart.rate
hr subj time
1
96
1
0
2 110
2
0
3
89
3
0
4
95
4
0
5 128
5
0
6 100
6
0
7
72
7
0
8
79
8
0
9 100
9
0
10 92
1
30
......
18 106
9
30
19 86
1
60
......
27 104
9
60
28 92
1 120
......
36 102
9 120
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
16
> plot(hr~subj)
> plot(hr~time)
> hr.lm <- lm(hr~subj+time)
> anova(hr.lm)
Analysis of Variance Table
Note that when the design is orthogonal,
the ANOVA results don’t depend on the
order of terms.
Response: hr
Df Sum Sq Mean Sq F value
Pr(>F)
subj
8 8966.6 1120.8 90.6391 4.863e-16 ***
time
3 151.0
50.3 4.0696
0.01802 *
Residuals 24 296.8
12.4
--Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> sres <- resid(lm(hr~subj))
> plot(sres~time)
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
17
130
120
110
100
70
80
90
hr
1
2
3
4
5
6
7
8
9
subj
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
18
130
120
110
100
70
80
90
hr
0
30
60
120
time
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
19
10
5
-5
0
sres
0
30
60
120
time
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
20
Fixed and Random Effects
 A fixed effect is a factor that can be duplicated (dosage
of a drug)
 A random effect is one that cannot be duplicated
 Patient/subject
 Repeated measurement
 There can be important differences in the analysis of
data with random effects
 The error term is always a random effect
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
21
Fixed Effect One-way ANOVA
y  i  òi
y     i  òi
 i  i  

i
0
ò ~ N (0,  ò2 )
E ( MSE )   ò2
E ( MSA)  Q( i )   ò2
Q( i ) 
n  i2
a 1
H 0 : Q( i )  0
MSA / MSE ~ F under the null
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
22
Random Effect One-way ANOVA
y  i  òi
y     i  òi
ò ~ N (0,  ò2 )
 ~ N (0,  2 )
E ( MSE )   ò2
E ( MSA)  n 2   ò2
n is replicates per level of 
H 0 :  2  0
MSA / MSE ~ F under the null
ˆ2  ( MSA  MSE ) / n
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
23
Estradiol data from Rosner
 5 subjects from the Nurses’ Health Study
 One blood sample each
 Each sample assayed twice for estradiol (and three
other hormones)
 The within variability is strictly technical/assay
 Variability within a person over time will be much
greater
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
24
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
25
> anova(lm(Estradiol ~ Subject,data=endocrin))
Analysis of Variance Table
Response: Estradiol
Df Sum Sq Mean Sq F value
Pr(>F)
Subject
4 593.31 148.329 24.546 0.001747 **
Residuals 5 30.21
6.043
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Replication error variance is 6.043, so the standard deviation of replicates is 2.46 pg/mL
This compared to average levels across subjects from 8.05 to 18.80
Estimated variance across subjects is (148.329 − 6.043)/2 = 71.143
Standard deviation across subjects is 8.43 pg/mL
If we average the replicates, we get five values, the standard deviation of which is also 71.1
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
26
Fasting Blood Glucose
 Part of a larger study that also examined glucose
tolerance during pregnancy
 Here we have 53 subjects with 6 tests each at intervals
of at least a year
 The response is glucose as mg/100mL
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
27
> anova(lm(FG ~ Subject,data=fg2))
Analysis of Variance Table
Response: FG
Df Sum Sq Mean Sq F value
Pr(>F)
Subject
52 10936 210.310 2.9235 9.717e-09 ***
Residuals 265 19064 71.938
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >
Estimated within-Subject variance is 71.938, so the standard deviation is 8.48 mg/100mL
Estimated between-Subject variance is (210.310 − 71.938)/6 = 23.062, sd = 4.80 mg/100mL
The variance of the 53 means is 35.05, which is larger because it includes a component of the
within-subject variance
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
28
Nested Random Effects Models
 Cooperative trial with 6 laboratories, one analyte (7 in
the full data set), 3 batches per lab (a month apart),
and 2 replicates per batch
 Estimate the variance components due to labs,
batches, and replicates
 Test for significance if possible
 Effects are lab, batch-in-lab, and error
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
29
Analysis using lm or aov
> anova(lm(Conc ~ Lab + Lab:Bat,data=coop2))
Analysis of Variance Table
Response: Conc
Df Sum Sq
Lab
5 1.89021
Lab:Bat
12 0.20440
Residuals 18 0.11335
Mean Sq F value
Pr(>F)
0.37804 60.0333 1.354e-10 ***
0.01703 2.7049
0.02768 *
0.00630
The test for batch-in-lab is correct, but the test for lab is not—the denominator should be
The Lab:Bat MS, so F(5,12) = 0.37804/0.01703 = 22.198 so p = 3.47e-4, still significant
Residual
Batch
Lab
April 2, 2013
0.00630
0.00537
0.01683
0.0794
0.0733
0.2453
SPH 247 Statistical Analysis of Laboratory Data
30
Expected Mean Squares
l laboratories
b batches per laboratory
r replicates per batch
laboratories
br L2  r B2   e2
batches within laboratories
r B2   e2
replicates within batches
 e2
ˆ B2  ( SS B  SSE ) / r
ˆ L2  ( SS L  SS B ) / br
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
31
Analysis using lme
 R package nlme
 Two separate formulas, one for the fixed effects and
one for the random effects
 In this case, no fixed effects
 Nested random effects use the / notation
lme(Conc ~1, random = ~1 | Lab/Bat,data=coop2)
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
32
lme(Conc ~1, random = ~1 | Lab/Bat,data=coop2)
Linear mixed-effects model fit by REML
Data: coop2
Log-restricted-likelihood: 21.02158
Fixed: Conc ~ 1
(Intercept)
0.5080556
Average Concentration
Random effects:
Formula: ~1 | Lab
(Intercept)
StdDev:
0.2452922
SD of Labs
Formula: ~1 | Bat %in% Lab
(Intercept)
Residual
StdDev: 0.07326702 0.07935504
SD of Batches and Replicates
Number of Observations: 36
Number of Groups:
Lab Bat %in% Lab
6
18
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
33
Hypothesis Tests
 When data are balanced, one can compute expected




mean squares, and many times can compute a valid F
test.
In more complex cases, or when data are unbalanced,
this is more difficult
One requirement for certain hypothesis tests to be
valid is that the null hypothesis value is not on the
edge of the possible values
For H0: α = 0, we have that α could be either positive or
negative
For H0: σ2 = 0, negative variances are not possible
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
34
Effect
Variance SD
-----------------------------------Residual 0.00630 0.0794
Batch
0.00537 0.0733
Lab
0.01683 0.2453
 The variance among replicates a month apart (0.00630
+ 0.00537 = 0.01167) is about twice that of those on the
same day (0.00630), and the standard deviations are
0.1080 and 0.0794. These are CV’s on the average of
21% and 16% respectively
 The variance among values from different labs is about
0.0285, with a standard deviation of 0.1688 and a CV of
about 33%
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
35
More complex models
 When data are balanced and the expected mean
squares can be computed, this is a valid way for testing
and estimation
 Programs like lme and lmer in R and Proc Mixed in
SAS can handle complex models
 But most likely this is a time when you may need to
consult an expert
April 2, 2013
SPH 247 Statistical Analysis of Laboratory Data
36