Estimation and hypothesis testing
Estimation & hypothesis testing (F-test, Chi2-test, t-tests)
• Introduction
• t-tests
• Outlier tests (k • SD: Grubbs, Dixon's Q)
• F-test, 2 (=Chi2)-test (= 1-sample F-test)
• Tests and confidence limits
CI&NHST; CI&NHST-EXCEL; CI&NHST-Exercise;
Grubbs, free download from:
http://www.graphpad.com/articles/outlier.htm
Analysis of variance (ANOVA)
• Introduction
• Model I ANOVA
Performance strategy
– Testing of outliers
– Testing of variances (Cochran "C", Bartlett)
• Model II ANOVA
– Applications
Cochran&Bartlett; ANOVA
Power and sample size
Power
Statistics & graphics for the laboratory
57
Introduction
When we have a set (or sets) of data (a "sample"), we often want to know whether a
statistical estimate thereof (e.g., the difference between 2 means, or the difference of
an SD from a target) is pure coincidence or whether it is "statistically significant". We
can approach this problem in the following way:
The null hypothesis H0 (no difference) is tested against the alternative hypothesis H1
(there is a difference) on the basis of the collected data. The decision to accept or
reject the hypothesis is made with a certain probability, most often 95% (statistical
significance).
Because we usually have only a limited set of data (a "sample"), we extrapolate the
estimates from our sample to the underlying populations by use of statistical
distribution theory, and we assume random sampling.
Hypothesis testing: Example
Is the difference between the means of two data sets real or only accidental?
[Figure: two data sets plotted on a common scale (0–50)]
Statistical significance in more detail
In statistics the words ‘significant’ and ‘significance’ have specific meanings. A
significant difference means a difference that is unlikely to have occurred by chance.
A significance test shows up differences unlikely to occur because of purely random
variation. Whether one set of results is significantly different from another depends
not only on the magnitude of the difference in the means but also on the amount of
data available and its spread.
Note: Significance is a function of sample size. Comparing very large samples will
nearly always lead to a significant difference but a statistically significant result is not
necessarily an important result: does it really matter in practice?
Statistics & graphics for the laboratory
58
Introduction
Significance testing – Qualitative investigation
Adapted from: Shaun Burke, RHM Technology Ltd, High Wycombe,
Buckinghamshire, UK. Understanding the Structure of Scientific Data
LC • GC Europe Online Supplement
The panels of the figure illustrate typical cases:
• probably not different, and would 'pass' the t-test (tcrit > tcalc)
• probably different, and would 'fail' the t-test (tcrit < tcalc)
• could be different, but not enough data to say for sure (i.e., would 'pass' the
t-test [tcrit > tcalc])
• practically identical means, but with so many data points there is a small but
statistically significant ('real') difference, and so would 'fail' the t-test (tcrit < tcalc)
• spreads in the data, as measured by the variance, are similar, and would 'pass'
the F-test (Fcrit > Fcalc)
• spreads in the data, as measured by the variance, are different, and would 'fail'
the F-test (Fcrit < Fcalc)
• could be a different spread, but not enough data to say for sure, and would 'pass'
the F-test (Fcrit > Fcalc)
Statistics & graphics for the laboratory
59
Introduction
General remarks
General requirements for parametric tests
• Random sampling
• Normally distributed data
• Homogeneity of variances, when applicable
Note on the testing of means
When we test means, the central limit theorem is of great importance because it
favours the use of parametric statistics.
Central limit theorem (see also "sampling statistics")
The means of independent observations tend to be normally distributed
irrespective of the primary type of distribution.
Implications of the central limit theorem
• When dealing with mean values, the type of primary distribution is of limited
importance, e.g. the t-test for comparisons of means.
• When dealing with percentiles, e.g. reference intervals, the type of distribution is
indeed important.
Statistics & graphics for the laboratory
60
Introduction
Overview of test procedures (parametric)
Testing levels
• 1-sample t-test: comparison of a mean value with a target or limit
• t-test: comparison of mean values (unpaired): Perform F-test before:
– t-test equal variances
– t-test unequal variances
• paired t-test: comparison of paired measurements (x, y)
Testing outliers
• k • SD = Grubbs (http://www.graphpad.com/articles/outlier.htm)
• Dixon's Q (Annex: n = 3 to 25)
Testing dispersions
• F-test for comparison of variances: F = s2²/s1²
• χ² (Chi2)-test, or 1-sample F-test
Testing variances (several groups)
• Cochran "C"
• Bartlett
Concordance between some parametric and non-parametric tests

Parametric test                   Non-parametric analogue
One-sample t-test                 Wilcoxon signed ranks (sign test)
2-sample t-test                   Mann-Whitney U
Paired-sample t-test              Wilcoxon signed ranks
Pearson's correlation             Spearman's correlation
ANOVA                             Kruskal-Wallis one-way ANOVA
Chi-squared "goodness-of-fit"     Kolmogorov-Smirnov
Non-parametric tests are more robust towards outliers, otherwise the difference in
practice is limited (the central limit theorem makes the t-test asymptotically
nonparametric).
Statistics & graphics for the laboratory
61
t-tests
Difference between a mean and a target ("One-sample" t-test)
With 95%-CI: xm ± t0.05;n • s/SQRT(N)
→ t = (xm – µ0)/(s/SQRT(N))
For t: degrees of freedom n = N – 1; probability α = 0.05
(s/SQRT(N) = standard error)
Important → t-distribution (see before: sampling statistics)
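The one-sample t statistic above can be cross-checked outside EXCEL with a few lines of Python (a minimal sketch; the function name is mine, not part of the course files):

```python
import math

def one_sample_t(data, target):
    """t = (xm - target) / (s / sqrt(N)), with n = N - 1 degrees of freedom."""
    n_obs = len(data)
    xm = sum(data) / n_obs                                   # sample mean
    s = math.sqrt(sum((v - xm) ** 2 for v in data) / (n_obs - 1))  # sample SD
    return (xm - target) / (s / math.sqrt(n_obs)), n_obs - 1
```

The returned t is then compared with the tabulated t0.05;n for the given degrees of freedom.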
Difference between two means
Perform the F-test first and, depending on its outcome, use the t-test with equal or
unequal variances.
Given independence, the difference between two variables that are normally
distributed is also normally distributed.
The variance of the difference is the sum of the individual variances:
t = (xm2 – xm1)/[s²/N1 + s²/N2]^0.5
where s² is a common estimate of the variance (the "pooled variance"):
s² = [(N1 – 1)s1² + (N2 – 1)s2²]/(N1 + N2 – 2)
Standard error of mean difference
• SEdif = [s²/N1 + s²/N2]^0.5 ; if N1 = N2: SEdif = SQRT(2) • s/SQRT(N)
95%-confidence interval of the mean difference
• Mean difference ± t0.05 • SEdif
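The pooled-variance formulas above translate directly into code. A minimal Python sketch (illustrative only; names are mine):

```python
import math

def two_sample_t_equal_var(x, y):
    """Pooled-variance t-test: returns t, degrees of freedom, and SE of the difference."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    s1_sq = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    s2_sq = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    # pooled variance: s^2 = [(N1 - 1)s1^2 + (N2 - 1)s2^2] / (N1 + N2 - 2)
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se_dif = math.sqrt(s_sq / n1 + s_sq / n2)
    return (m2 - m1) / se_dif, n1 + n2 - 2, se_dif
```

The 95% CI of the mean difference is then (m2 – m1) ± t0.05 • se_dif.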
Example
t-Test: 2-Sample Equal Variances

                        A        B
Mean                    99.98    98.40
Variance                9.995    8.608
Observations            30       30
Pooled Variance         9.301
Hyp. Mean Diff.         0
df                      58
t Stat                  2.002
P(T<=t) one-tail        0.025
t Critical one-tail     1.672
P(T<=t) two-tail        0.05
t Critical two-tail     2.002

Not given by EXCEL:
Difference: -1.5767
95% CI: -3.1529 to -0.0004
Unpaired Wilcoxon test (Mann-Whitney): two-tailed probability, P = 0.0546
t-tests
t-test – different variances
The difference is still normally distributed given σ1 ≠ σ2, and the difference of
means has the variance σ1²/N1 + σ2²/N2, which is estimated as s1²/N1 + s2²/N2.
However, the t value t´ = (xm2 – xm1)/[s1²/N1 + s2²/N2]^0.5 does not strictly follow
the t-distribution. The problem is mainly of academic interest, and special tables for
t´ have been provided (Behrens, Fisher, Welch).
>Perform F-test before t-test!
paired t-test – comparison of mean values (paired data)
Example: Measurements before and after treatment in patients. When testing for
a difference with paired measurements, the paired t-test is preferable. This is
because such measurements are correlated and pairing of the data reduces the
random variation. Thereby, it increases the probability of detecting a difference.
Calculations
The individual paired differences are computed:
difi = x2i – x1i
The mean and standard deviation of the N (=N1 = N2) differences are computed:
difm = Σ difi /N ; sdif = [Σ (difi – difm)2/(N-1)]0.5
SEdif = sdif/N0.5
Testing whether the mean paired difference deviates from zero:
t = (difm – 0)/SEdif (N – 1 degrees of freedom)
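The paired-difference calculation above can be sketched in Python (a minimal illustration; the function name is mine):

```python
import math

def paired_t(x1, x2):
    """Paired t-test: t = mean(dif) / (SD(dif) / sqrt(N)), with N - 1 degrees of freedom."""
    n = len(x1)
    difs = [b - a for a, b in zip(x1, x2)]          # individual paired differences
    dif_m = sum(difs) / n                           # mean difference
    s_dif = math.sqrt(sum((d - dif_m) ** 2 for d in difs) / (n - 1))
    return dif_m / (s_dif / math.sqrt(n)), n - 1
```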
Example
t-Test: Paired

                        A        B
Mean                    99.98    98.78
Variance                9.995    11.61
Observations            30       30
Pearson Correl.         0.894
Hypoth. Mean Diff.      0
df                      29
t Stat                  4.299
P(T<=t) one-tail        9E-05
t Critical one-tail     1.699
P(T<=t) two-tail        2E-04
t Critical two-tail     2.045

Not given by EXCEL:
Mean difference: 1.2033
95% CI: 0.6309 to 1.7758
[Paired] Wilcoxon test: 2-tailed probability, P = 0.0005
z-tests
When SD is known, t can be substituted by z in the above t-tests. Nevertheless,
the same propagation rules apply when pooled SDs and SDs of differences are
calculated.
Outliers
Outliers have great influence on parametric statistical tests. Therefore, it is
desirable to investigate the data for outliers (see Figure, for example).
The upper point
is an outlier
according to the
"Grubb's test"
(P < 0.05)
Testing for outliers can be done with the following techniques
• k • SD = Grubbs (http://www.graphpad.com/articles/outlier.htm)
• Dixon's Q (Annex: n = 3 to 25)
All assume normal distributed data.
The k • SD method (outlier = point > k • SD away from the mean)
With this method, it is important to know that the statistical chance of finding an
outlier increases with the number of data investigated.
Outlier probability: approximately N • P, where P is the (two-sided) probability
corresponding to the SD distance for the normal distribution.
Example: N = 100; 3 SD
Probability ~ 100 × 0.0013 × 2 ~ 0.26
Exact formula: P = 1 – (1 – 0.0027)^N
Example: N = 100; 4 SD
Probability ~ 100 × 0.000032 × 2 ~ 0.006
Note:
When you use a k • SD criterion (e.g., 3 SD), expand that when N becomes
high (e.g., 4 SD)! >Preferably: perform Grubbs test
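The exact formula for the chance of at least one false outlier can be evaluated for any k and N (a small Python sketch; the function name is mine):

```python
import math

def expected_false_outlier_prob(n, k):
    """Probability that at least one of n normal points lies beyond k * SD."""
    # two-sided tail probability of the standard normal beyond k SD
    p_tail = math.erfc(k / math.sqrt(2))     # = 2 * P(Z > k); ~0.0027 for k = 3
    return 1.0 - (1.0 - p_tail) ** n
```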
Grubbs test (free download from: www.graphpad.com/articles/outlier.htm)
This test is recommended by ISO (software required). The Grubbs test can be used
iteratively (after removal of the 1st outlier, look for a second).

Dixon's Q-test
A common outlier test for moderate sample sizes. Requires software or a table with
critical values (see Annex).

Grubbs test, critical values (95%)
n       Value
3       1.153
5       1.672
10      2.176
20      2.557
50      2.956
100     3.207
140     3.318
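The Grubbs statistic itself is simple to compute by hand; only the critical value requires the table above. A minimal sketch, assuming the usual definition G = max|xi – mean|/SD:

```python
import math

def grubbs_statistic(data):
    """G = max|x_i - mean| / SD; a point is an outlier if G exceeds the critical value for n."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return max(abs(v - mean) for v in data) / sd
```

For example, for n = 5 the computed G is compared with the tabulated 1.672.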
F-test & 2 (=Chi2)-test
F-test: Comparing variances
If we have two data sets, we may want to compare the dispersions of the
distributions.
Given normal distributions, the ratio between the variances is considered.
The variance ratio test was developed by Fisher, therefore the ratio is usually
referred to as the F-ratio and related to tables of the F-distribution.
Calculation
F = s2²/s1² (numerator: s2², denominator: s1²)
Note: The greater value should be in the numerator → F ≥ 1!
Example
F = s2²/s1² = (0.228)²/(0.182)² = 1.6 (n.s.)
Degrees of freedom:
df2 (numerator) = 14 – 1 = 13
df1 (denominator) = 21 – 1 = 20
Critical(0.05) F = 2.25
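The "larger variance in the numerator" convention can be captured in a tiny helper (illustrative sketch; the function name is mine):

```python
def f_ratio(sd_a, sd_b):
    """Variance ratio with the larger variance in the numerator, so that F >= 1."""
    va, vb = sd_a ** 2, sd_b ** 2
    return max(va, vb) / min(va, vb)
```

With the SDs of the example, f_ratio(0.228, 0.182) reproduces F ≈ 1.6, to be compared with the tabulated critical F for the appropriate degrees of freedom.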
F-test: Some notes
• When testing whether two variances are equal or not, take the larger one and
divide by the smaller one: F ≥ 1.
• Testing is here two-sided, i.e. the one-sided P-value from an F-table should be
multiplied by two.
In other situations, testing may be one-sided, e.g. F-tests in ANOVA and
regression.
• Notice that the correct numbers of degrees of freedom are used for the
numerator and denominator variance!
F-test & 2 (=Chi2)-test
F-test (ctd.)
χ² (Chi2)-test (or 1-sample F-test)
Comparing a variance with a target or limit
Chi2exp = [sexp² • n]/sMan² (n = degrees of freedom)
Test whether Chi2exp > Chi2critical (1-sided, 0.05).
One-sided: because we test versus a target or a limit.
The Chi2-test is used in the CLSI EP5 protocol.
Relationships between F, t, and Chi2
Relationship between Chi2 and F
Chi2n/n = F(n, ∞), with n = degrees of freedom.
Relationship between F and t
The one-tailed F-test with 1 and n degrees of freedom is equivalent to the t-test
with n degrees of freedom. The relationship t² = F holds for both calculated and
tabulated values of these two distributions: t(12, 0.05) = 2.17881; F(1, 12, 0.05) = 4.7472
Peculiarities and problems with the EXCEL F-test
Dietmar Stöckl, Diego Rodríguez Cabaleiro, Linda M. Thienpont. Clin Chem Lab
Med 2004;42(12):1455.
EXCEL includes two different versions of the F-test. The first can be accessed
via Tools (main toolbar)/Data Analysis/F-Test Two-Sample for Variances. It yields
the F-value, the one-tailed P-value and the one-tailed F-critical (the Alpha-level is
specified during input; e.g. Alpha = 0.05). The second can be accessed with the
function FTEST: fx (icon in the Standard toolbar)/Statistical (function
category)/FTEST (function name). This function simply returns the P-value.
Remarkably, however, the function returns the two-tailed P-value, while the
explanation in the pop-up menu states that FTEST(Array1;Array2) "Returns the
result of an F-test, the one-tailed probability that the variances in Array1 and
Array2 are not significantly different”.
For completeness of information, EXCEL includes two other functions dealing
with F-statistics, namely, FINV(probability,degrees_freedom1,degrees_freedom2),
which returns F, and FDIST(x,degrees_freedom1,degrees_freedom2), which
returns P. Both functions return values with one-tailed probability and
correspond to the F-test function in the Data Analysis menu.
P-values
Interpretation of the P-value
A test for statistical significance (at a certain probability P) tests whether a
hypothesis has to be rejected or not, for example, the null hypothesis.
The null hypothesis of the F-test is that the 2 variances are not different, or that an
experimentally found difference occurred only by chance.
The null hypothesis of the F-test is not rejected when the calculated probability
Pexp is greater than or equal to the chosen probability P (P usually chosen as 0.05 =
5%), or when the experimental Fexp value is smaller than or equal to the critical Fcrit
value.
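The decision rule (do not reject H0 when Pexp ≥ α, equivalently Fexp ≤ Fcrit) can be written as a one-line helper (illustrative only; the name and the default α = 0.05 follow the text):

```python
def nhst_decision(p_exp, alpha=0.05):
    """Do not reject H0 when Pexp >= alpha; reject it otherwise."""
    return "reject H0" if p_exp < alpha else "do not reject H0"
```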
Example

Fexp (calculated)        1.554
Fcrit (critical value)   2.637
Pexp (from experiment)   0.182
Chosen probability P     0.05
Observation
The calculated P-value (0.182 = ~18%) is greater than the chosen P-value (0.05 =
5%). Accordingly, the experimental F-value is < the critical F-value.
Conclusion
The null hypothesis is not rejected; this means that the difference of the variances
is only by chance.
NOTE
The P-value is a fixed, calculated value for a test performed on a specific data set.
Don't confuse it with the α-level of a test, which can be chosen in EXCEL (see
screenshot). Different α-values do NOT change the P-value in the output, BUT they
change the critical value for the test (here, Fcrit).
Tests and confidence limits
We have seen for the 1-sample t-test the close relationship between confidence
intervals and significance testing. In many situations, one can use either of them
for the same purpose. Confidence intervals have the advantage that they can be
shown in graphs and they provide information about the spread of an estimate
(e.g., a mean).
The tables below give an overview of the concordance between CI's and
significance testing for means and variances (SD's).
-t: 2-sided, or 1-sided; 1-sided for comparison with claims
-When stable s is known, z may be chosen instead of t
Exercises
CI&NHST; CI&NHST-EXCEL
This tutorial/EXCEL template explains the connection between Significance Tests
and Confidence Intervals when the purpose is Null Hypothesis Significance
Testing (NHST). Indeed, for the specific purpose of NHST, P-values as well as CI's
can be used (look whether the null value or target value is inside or outside the
CI); they are just two sides of the same coin.
Examples are the comparison of
i) a standard deviation (SD) with a target value,
ii) two standard deviations,
iii) a mean with a target value,
iv) two means, and
v) a mean paired difference with a target value.
The statistical tests involved are the 1-sample F-test, F-test, 1-sample t-test, t-test,
and the paired t-test, respectively, the CI's of SD, F, mean, mean difference, and
mean paired difference.
Another exercise shows how NHST is influenced by
-The magnitude of the difference
-The number of data-points
-The magnitude of the SD
Please follow the guidance given in the "Exercise Icons" and read the comments.
CI&NHST-Exercise
This file contains exercises for the concepts dealt with in the CI&NHST file. It treats the
following cases:
-1-sample t-test (P = 0.0366; Different)
-t-test (1st example: Outlier Grubbs 96.06; P = 1.19E-5, Pipettes cannot be
exchanged); 2nd example: F-test P = 0.0028 > t-test with unequal variances: P =
0.2162; Precision problems)
-Paired t-test (P = 0.03680; Batches are different)
-1-sample F-test (CI-Calculator: CI = 1.10 – 2.48 : Different >Out of specification)
-F-test (P = 0.001398; Performance of Tech2 is worse)
Please follow the instructions given in the respective Worksheets.
Grubbs
Not included in the package. Please download it free from:
http://www.graphpad.com/articles/outlier.htm
ANOVA
Analysis of Variance: ANOVA
The Three Universal Assumptions of Analysis of Variance
1. Independence
2. Normality
3. Homogeneity of Variance
Overview of the concepts
• Model I (Assessing treatment effects)
Comparison of mean values of several groups.
• Model II (Random effects)
Study of variances: Analysis of components of variance
Model I and II: Identical computations - but different purposes and interpretations!
Why ANOVA?
Model I (Assessing treatment effects)
• ANOVA is an extension of the commonly used t-test for comparing the means of
two groups.
• The aim is a comparison of mean values of several groups.
• The tool is an assessment of variances.
Model I: t-test versus ANOVA
[Figure, two panels (variable value vs. group):
– t-test: comparison of two groups
– ANOVA: comparison of more than two groups]
Why not multiple t-tests?
• With several groups, many t-tests are necessary for pair-wise comparisons, e.g.
6 times for 4 groups.
• Multiple comparisons inflate the overall type I error, i.e. too often one will get a
“significant” result, i.e. a P-value below 5%.
• Thus, ANOVA is useful when dealing with several groups.
Note: ANOVA cannot tell us which individual mean or means are different from the
consensus value and in what direction they deviate. The most effective way to
show this is to plot the data.
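The pair-wise count and the one-way ANOVA F-ratio can both be sketched in a few lines of Python (an illustration from first principles; function names are mine):

```python
def n_pairwise(k):
    """Number of pair-wise t-tests for k groups: k(k-1)/2, e.g. 6 for 4 groups."""
    return k * (k - 1) // 2

def one_way_anova_f(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_tot = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_tot
    # between-group sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares: deviations from each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_tot - k))
```

The resulting F is compared with the critical F for (k – 1, N – k) degrees of freedom.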
ANOVA
Introduction – Types of ANOVA
One-way: Only one type of classification, e.g. into various treatment groups
Ex.: Study of serum cholesterol level in various treatment groups
Two-way: Subclassification within treatment groups, e.g. according to gender
Ex.: Do various treatments influence serum cholesterol in the same way in men
and women? (not considered further here)
Principle of One-way ANOVA
Distances within (- - -) and between (—) groups are squared and summed, and
finally compared.
[Figure: variable value vs. group no. (1–4)]

Case 1: Null-hypothesis valid
No significant difference between groups.
Red (—) distances are small = the main source of variation is within-groups.
[Figure: variable value vs. group no. (1–4)]

Case 2: Alternative hypothesis valid
Significant difference between groups.
Red (—) distances are large = the main source of variation is between-groups.
[Figure: variable value vs. group no. (1–4)]
ANOVA
Introduction – Mathematical model
One-way ANOVA
Mathematical model (example: treatment)
Yij = grand mean
+ treatment (between-group) effect αj
+ within-group error εij
• Null hypothesis: treatment group effects are zero
• Alternative hypothesis: treatment group effects present
Avoiding some of the pitfalls using ANOVA
In ANOVA it is assumed that the data are normally distributed. Usually in ANOVA
we don’t have a large amount of data so it is difficult to prove any departure from
normality. It has been shown, however, that even quite large deviations do not
affect the decisions made on the basis of the F-test.
A more important assumption in ANOVA is that the variances of the groups are
homogeneous (homoscedastic). The best way to avoid this pitfall is, as ever, to
plot the data. There also exist a number of tests for heteroscedasticity (e.g.,
Bartlett's test and Levene's test). It may be possible to overcome this type of
problem in the data structure by transforming it, such as by taking logs. If the
variability within a group is correlated with its mean value, then ANOVA may not be
appropriate, and/or this may indicate the presence of outliers in the data. Cochran's
test can be used to test for variance outliers.
ANOVA
Model I ANOVA – Violation of assumptions
[Figure panels (variable value vs. set of measurements), illustrating violations of
the assumptions:
– Outlier within subgroups
– Large variance in subgroup
– Variance heterogeneity
– Variance increasing with level
– Valid ANOVA: outlying subgroup, F significant]
PERFORMANCE STRATEGY
Inspection/testing for outliers within a group
• Grubbs test
Variance evaluation
• Cochran´s test for a deviating variance of a subgroup (see Annex for critical
values). The test should not be used iteratively. It assumes ~ the same number of
data in the groups.
• Bartlett´s test for variance homogeneity (Sokal & Rohlf. Biometry; p. 398)
Consider whether there is a relation between level and variance, e.g.
proportionality. In the latter case, consider a logarithmic transformation.
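Cochran's statistic itself is a one-liner, assuming the usual definition C = largest group variance / sum of all group variances; the critical value still comes from the Annex table (a minimal sketch, name mine):

```python
def cochran_c(variances):
    """Cochran's C: the largest group variance divided by the sum of all variances."""
    return max(variances) / sum(variances)
```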
ANOVA
Model I ANOVA – Short summary
• Plot your data
• Generally, the procedure is robust towards deviations from normality.
• However, it is indeed sensitive towards outliers, i.e. investigate for outliers within
groups.
• When the variance within groups is not constant, e.g. being proportional to the
level, logarithmic transformation may be appropriate.
• Testing for variance homogeneity may be carried out by Bartlett´s test.
• Cochran's test can be used to test for variance outliers.
When F is significant
→ Supplementary analyses (will not be addressed in more detail):
• Maximum against minimum (Student-Newman-Keuls’ procedure)
• Pairwise comparisons with control of type I error (Tukey)
• Post test for trend (regression analysis)
• Control versus others (Dunnett)
Control group (C) versus treatment groups
Often, focus is on effects in treatment groups versus the control group.
[Figure: variable value for the control group and the treatment groups]
Apply Dunnett´s test, based on the principle of “least significant difference” (LSD):
critical t-values for differences between the treatment groups and the control
group are adjusted to ensure the correct overall type I error
(J Am Stat Assoc 1955;50:1096).
Hitherto, we considered Model I ANOVA:
Treatment (fixed) effects: Effect of planned (controlled) interventions.
Another approach is to look at the variation within and between groups. This
leads us to Model II ANOVA: Variation among groups due to random effects,
e.g. nature´s (uncontrolled) intervention.
ANOVA
Model II (random effects) ANOVA
Example: ranges of serum cholesterol in different subjects.
[Figure: variable value vs. set of measurements]
Model II (random effects) ANOVA
(analysis of components of variation)
Mathematical model
Yij = grand mean
+ between-group variation Bj (SD: σB)
+ within-group variation Wij (SD: σW)
Reminder
In model II ANOVA, the main point is to estimate components of variation and
NOT hypothesis testing.
Example: components of variation
• Total dispersion of a single measurement: σT² = σB² + σW²
• Total dispersion of means of n measurements in each group (XGP): σGP² = σB² + σW²/n
The analysis of components of variation has shown us that, generally, standard
deviations are propagated by summing their squares (= variances). This means for
a "total" standard deviation itself, that it is the square root of the sum of the
variances. However, depending on the mathematical relationship between
components that make up a total SD, different propagation rules have to be used
(see next page).
ANOVA
Total variance (total standard deviation)
The standard deviation (s) of calculated results (propagation of s)
1. Sums and differences
y = a(±sa) + b(±sb) + c(±sc) → sy = SQRT[sa² + sb² + sc²] (SQRT = square root)
Do not propagate CV!
2. Products and quotients
y = a(±sa) • b(±sb) / c(±sc) → sy/y = SQRT[(sa/a)² + (sb/b)² + (sc/c)²]
3. Exponents (the x in the exponent is error-free)
y = a(±sa)^x → sy/y = x • sa/a
Addition of variances: stot = SQRT[s1² + s2²]
A large component will dominate:

Component SDs    Total SD
1 + 1            1.41
1 + 0.5          1.12

This forms the basis for the suggestion by Cotlove et al.: SDA < 0.5 × SDI
• A: analytical variation
• I: within-individual biological variation
→ In a monitoring situation, the total random variation of changes is increased by
only up to 12% as long as this relation holds true.
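The sum/difference propagation rule, including the table values above, can be verified with a tiny Python helper (illustrative; the name is mine):

```python
import math

def sd_of_sum(*sds):
    """Propagation for sums and differences: add variances, never SDs or CVs."""
    return math.sqrt(sum(s ** 2 for s in sds))
```

With components 1 and 0.5, the total SD is ~1.12, i.e. only ~12% above the dominating component, which is the basis of the Cotlove criterion quoted above.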
Applications of model II ANOVA
• Quality control/Assessment
• Method evaluation
• Biological variation
• Goal-setting
ANOVA
Software output
One-way ANOVA: Output of statistical programs
Variances within and between groups are evaluated
XGP: Group mean
XGM: Grand mean
*df: Degrees of freedom
**(Mean square = Variance = Squared SD)
Interpretation of model I ANOVA: The F-ratio
If the ratio of between- to within-mean square exceeds a critical F-value (refer to a
table or look at the P-value), a significant difference between group means has
been disclosed.
F: Fisher published the ANOVA approach in 1918.
Components of variation:
Relation to standard output of statistics programs
F = MSB/MSW = [n • σB² + σW²]/σW²
For unequal group sizes, a sort of average n is calculated
according to a special formula: n0 = [1/(K – 1)][N – Σni²/N]
Interpretation of model II ANOVA:
The components of variation σW² and σB²
Note for σB
Due to the formula, a negative square root may occur. EXCEL will give an error. In that
case, set SDbetween to zero!
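Extracting the Model II components from the mean squares, including the "set SDbetween to zero" rule, can be sketched in Python (illustrative; function and parameter names are mine, not EXCEL's):

```python
import math

def variance_components(ms_between, ms_within, n):
    """Model II ANOVA: sd_w^2 = MSW; sd_b^2 = (MSB - MSW) / n, clipped at zero."""
    var_w = ms_within
    # a negative estimate is set to zero instead of producing a square-root error
    var_b = max((ms_between - ms_within) / n, 0.0)
    return math.sqrt(var_b), math.sqrt(var_w)
```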
ANOVA
Conclusion
Model I ANOVA
A general tool for assessing differences between group means
Model II ANOVA
Useful for assessing components of variation
Nonparametric ANOVA
• Kruskal-Wallis test: a generalization of the Mann-Whitney test to deal with > 2
groups.
• Friedman´s test: a generalization of Wilcoxon’s paired rank test to more than two
repeats.
The study of components of variation: not suitable for nonparametric analysis.
Software
ANOVA is included in standard statistical packages (SPSS, BMDP, StatView,
STATA, StatGraphics etc.)
Variance components may be given or be derived from mean squares as outlined
in the tables.
Direct estimation of components of variation, e.g. within- and between-run SD in
quality control or inter/intra-individual biological variation.
CBstat: A Windows program distributed by K. Linnet
Information and download:
http://www.cbstat.com
References
Snedecor GW, Cochran WG. Statistical methods, 8.ed. Iowa State University
Press: Ames, Iowa, 1989, Chapters 12-13.
Fraser CG. Biological variation: From principles to practice. AACC Press,
Washington, 2001.
Exercises
Cochran&Bartlett
Many statistical programs do not include the Cochran or Bartlett test. Therefore,
they have been elaborated in an EXCEL-file.
The Cochran&Bartlett file contains:
-the formulas for the Cochran test for an outlying variance (including the critical values)
-the formulas for the Bartlett test for variance homogeneity
(both are important for ANOVA)
-a calculation example
More experienced EXCEL users may be able to adapt this template to their own
applications.
ANOVA
This tutorial contains interactive exercises for self-education in Analysis of
Variance (ANOVA).
ANOVA can be used for 2 purposes:
-Model I (Assessing treatment effects)
Comparison of MEAN values of several groups.
-Model II (Random effects)
Study of VARIANCES: Analysis of components of variance
Model I and II have IDENTICAL computations but different purposes and
interpretations!
Worksheets 1 - 6 describe a systematic approach to Model I ANOVA. They
address:
-Outlier detection
-Investigation of variance homogeneity or outlying variance
-Performing ANOVA with EXCEL
>Tools>Data Analysis>Anova: Single Factor.
A "Screen Shot" guides the application.
Worksheets 8 & 9 contain Model II ANOVA applications.
NOTE!
EXCEL cannot do Model II ANOVA by default. However, it is easy to calculate the
components of variation from the ANOVA output.
>See the formulas in the respective cells
>and the explanation in the PICTURE.
Note forB
Due to the formula, a negative SQRT may occur. EXCEL will give an error. In that
case, set SDbetween to zero!
Power and sample size
The statistical Power concept & sample size calculations
When testing statistical hypotheses, we can make 2 types of errors: the so-called
type I (or α) error and the type II (or β) error. The power of a statistical test is
defined as 1 – β. The power concept is demonstrated in the figure below,
denoting the probability of the α-error by p and that of the β-error by q. Like
significance testing, power calculations can be done 1- and 2-sided.
[Figure: relative frequency of the estimated difference under H0 and HA,
separated by the true difference D; areas p/2 (type I error), q (type II error),
and 1 – q (power).]
• Type I error (p): the probability of detecting a difference when it is not present
• Type II error (q): the probability of not detecting a difference when it is present
• Power = 1 – q
Purpose of power analysis and sample-size calculation
Some key decisions in planning any experiment are, "How precise will my
parameter estimates tend to be if I select a particular sample size?" and "How big
a sample do I need to attain a desirable level of precision?”
Power analysis and sample-size calculation allow you to decide (a) how large a
sample is needed to enable statistical judgments that are accurate and reliable,
and (b) how likely your statistical test will be to detect effects of a given size in a
particular situation.
Power and sample size
Calculations
Definitions
zp/2 = z-value corresponding to the significance level of the null hypothesis
(usually 95%, 1- or 2-sided; e.g.: zp/2 = 1.65 or 1.96)
z1-q = z-value corresponding to the desired power (alternative hypothesis)
(usually 90%, always 1-sided; e.g.: z1-q = 1.28)
N = number of measurements to be performed
Mean versus a target value
N = [SD/(mean – target)]² • (zp/2 + z1-q)²
Detecting a relevant difference (gives the number required in each group)
N = (SDDelta/Delta)² • (zp/2 + z1-q)²
Delta = difference to be detected
SDDelta = SQRT(SDx² + SDy²); usually SDx = SDy → SDDelta = SQRT(2) • SD
(requires previous knowledge of the SD)
Example: difference between 2 groups
Assumptions:
-Delta = 5; SD = 4.5 for both; from this: SDDelta = SQRT(2) • SD = 6.36
-Significance level 2-sided 95%, P = 0.05 (zp/2 = 1.96)
-Power = 90%, P = 0.1 (z1-q = 1.28)
N = (6.36/5)² • (1.96 + 1.28)² = 17
Conclusion:
To detect a difference of 5, we would need 17 measurements for each group.
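Both sample-size formulas can be coded directly (a minimal Python sketch; function names and the default z-values, taken from the text, are illustrative — round the result up to the next whole measurement in practice):

```python
import math

def n_vs_target(delta, sd, z_alpha=1.96, z_beta=1.28):
    """N for comparing a mean with a target: N = (SD/Delta)^2 * (z_alpha + z_beta)^2."""
    return (sd / delta) ** 2 * (z_alpha + z_beta) ** 2

def n_per_group(delta, sd, z_alpha=1.96, z_beta=1.28):
    """N per group for detecting a difference Delta; SD_delta = sqrt(2) * SD."""
    sd_delta = math.sqrt(2) * sd
    return (sd_delta / delta) ** 2 * (z_alpha + z_beta) ** 2
```

With Delta = 5 and SD = 4.5, n_per_group reproduces the ~17 measurements per group of the example.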
Exercises
Power
This file contains 2 worksheets that explain the power concept and allow simple
sample-size calculations.
Please use dedicated software for routine power calculations.
Concept
Use the respective "Spinners" to change the values (or enter the values directly in
the blue cells) for:
-Mean
-SD
For comparison of a sample mean versus a target, use sample SD
For comparison of 2 sample means with the same SD,
use SD = SQRT(2)*SD
-Sample size
-Significance level (only with the Spinner!)
Limited to the same value for the alpha- and beta-error!
NOTE: alpha = 2-sided, beta = 1-sided!
>Observe the effect on the power.
Calculations
This worksheet allows the calculation of sample sizes for
-Comparing a mean with a target
-Comparing 2 means.
The calculations are explained in this text and in the "Exercise Icon".