Applied Statistics in Biological Research
Basic concepts, related problems and common methods
Content:
I. Introduction:
The research process
The ‘who is who…?’
some useful definitions
II. Basic Concepts:
Experimental vs. correlational research
Methods of data collection
Analyzing data/descriptive statistics
Frequency table & distribution
Population and sample
Random sampling distribution (RSD)
Estimating parameters and the Standard Normal
distribution
Hypothesis testing
III. Related Problems: How many, how often, how sure . . .?
and the badly desired significance
black sheep, outliers and underdogs
multiple testing
IV. Common Methods: selecting the correct test
simple cases: comparison of 2 or more groups
on one metric character;
and more advanced cases: more than 2 groups
and/or on more than one character
correlation, regression and
analyses of frequencies
September 2014 – CSF – Vienna BioCenter ([email protected])
Dr. Karin Schmid

I. INTRODUCTION
1.1 The research process/strategy
• Study the background literature and identify feasible methods of testing the hypothesis.
• Formulate clear objectives.
• Extract hypotheses and formulate them in a formal, logical language that allows direct
examination of the hypothesized relation between the variables in question.
• Design the experimental protocol carefully: number of experimental and/or control groups;
description of the intervention(s) and the sequence or course of any intervention; who, where,
when and how the experiment will be performed.
• The statistical model should be developed and defined in parallel; it includes the statistical
method to be used, the necessary sample size, the desired power and the expected effect size.
• Data analysis: should be performed correctly, extracting the maximum information.
• Interpretation: only test your a priori formulated hypotheses; do the results reveal substantial
scientific progress or knowledge? Post-hoc interpretation of randomly discovered relations is
incorrect and unprofessional scientific work.
• Report: use guidelines to avoid losing important information and to support cumulative
progress in research.
1.2. The ‘who is who’ in statistics
Variables: measures that vary/change between individuals, entities, locations or in time
• independent variable (cause) ~ factor
• dependent variable (effect) ~ variate
nominal or categorical variables:
• binary: 2 categories (dead/alive)
• categorical: > 2 categories (vegetarian/vegan/omnivore)
ordinal or rank-scaled variables:
categories have a logical order (ranks)
interval-scaled variables:
equal intervals represent equal differences, e.g. test scores
ratio-scaled variables:
the ratio of scores makes sense, e.g. length, weight
1.3. Some useful definitions:
Treatments, manipulations or interventions, which are under control of the experimenter, are
usually defined as ‘fixed factors’.
A fixed (effect) factor:
discrete variable used to classify experimental units; for example ”sex” (factor with two
levels: “male” and “female”); “diet” (levels “low”, “medium” and “high”), dose, genotype
and any treatment which can be administered. The levels within each factor can be discrete,
such as “drug A” and “drug B”, or they may be quantitative such as 10, 20, 30 and 40 mg/kg.
Random (effect) factors:
usually not controlled by the investigator; unknown factors that may influence the variable of
interest, e.g. inter-individual differences, litter effects, time and environmental effects,
differences in diet. These effects are responsible for noise (variation), so the aim is to partition
these effects out.
Control of variation:
is of fundamental importance when designing an experiment. Sample size can be reduced,
power increased and smaller responses/effects detected.
situational variation: groups should be processed identically throughout the whole
experiment; e.g. housed in the same room.
inter-individual variation: age, weight, sex
Uncontrolled variability reduces the signal/noise ratio - so larger sample sizes are needed to
detect the effect of a treatment.
Avoiding bias: time, space or other unknown factors also influencing results can only be
controlled by randomisation.
Randomisation of experimental units to treatment groups depends on the experimental design
(randomised block design; completely randomised design; randomisation of the order in which
measurements are made etc.).
II. BASIC CONCEPTS
2.1. Experimental vs. correlational research
Correlational research: register what naturally happens, without any intervention; e.g. genome-wide association studies
Experimental research: manipulate at least one variable and register the effect on another
Ex.: ‘… it is a proven fact that there is a significant correlation between the number of
murders committed and the amount of ice-cream consumed.’
‘… children with longer arms reason better than those with shorter arms.’
‘… shoe size is strongly correlated with reading skills.’
… spurious, sometimes even ‘spooky’ correlations.
Correlation measures association, but association is not causation.
2.2. Methods of data collection
manipulate the independent variable
using different entities: between-groups, between-subjects, independent design
manipulate the independent variable
using the same entities: within-subjects, repeated-measures design
Sources of variation:
systematic variation: due to the manipulation &
unsystematic variation: random factors
2.3. Error propagation:
every measured quantity: true value + error/uncertainty
when we use measured quantities to calculate another quantity (sum, mean, ratio), the
error/uncertainty ‘propagates’ into the uncertainty of the derived quantity.
assumption: errors are uncorrelated and random →
the error propagates differently, depending on the mathematical operation
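As an illustration only (not part of the original notes), a minimal Python sketch of how uncorrelated random errors combine for a sum and for a mean; the measurement values and uncertainties are made up:

```python
import numpy as np

# Hypothetical measurements with their standard uncertainties (made-up numbers)
x = np.array([10.2, 9.8, 10.5])        # measured values
sigma = np.array([0.3, 0.2, 0.4])      # uncertainty of each measurement

# Sum: uncorrelated errors add in quadrature
total = x.sum()
sigma_total = np.sqrt(np.sum(sigma**2))

# Mean: the quadrature sum is divided by the number of measurements
mean = x.mean()
sigma_mean = sigma_total / len(x)

print(f"sum  = {total:.2f} ± {sigma_total:.2f}")
print(f"mean = {mean:.2f} ± {sigma_mean:.2f}")
```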
2.4. Analyzing data / descriptive statistics
Frequency table and frequency distribution
Measures of central tendency: mode, median, mean
Measures of dispersion: range, interquartile range (IQR), standard deviation, variance
Skew: negative vs. positive (frequent scores are clustered at the upper or lower end,
respectively)
Kurtosis: leptokurtic (thin and spiky) vs. platykurtic (broad and flat)

2.5. Population and sample
samples are drawn from the population; parameters are estimated from the sample
aim: we want to say something about the population based on analyzing the sample!
2.6. Random sampling distribution (RSD)
is the distribution of sample means
SEM (standard error of the mean): standard deviation of sample means
The central limit theorem (CLT) states that the distribution of means of a large number of
independent, identically distributed variables will be approximately normal, regardless of the
underlying distribution.
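A small simulation can make the CLT and the SEM concrete. This Python sketch (an addition with arbitrary, made-up settings, not part of the notes) draws many samples from a skewed population and compares the spread of the sample means with σ/√n:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed "population" (exponential); draw many samples of size n
n, n_samples = 30, 10_000
samples = rng.exponential(scale=2.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The distribution of sample means is approximately normal (CLT),
# and its standard deviation approximates the SEM = sigma / sqrt(n)
print("SD of sample means:", sample_means.std(ddof=1))
print("theoretical SEM   :", 2.0 / np.sqrt(n))
```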
2.7. Estimating parameters and the Standard Normal Distribution
95% CI: estimate ± (1.96 × SE)
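As a minimal illustration (not in the original notes), the CI formula applied to a made-up sample in Python; for small samples the 1.96 would be replaced by the appropriate t-quantile:

```python
import numpy as np

data = np.array([4.1, 3.8, 5.0, 4.6, 4.2, 3.9, 4.8, 4.4])  # made-up sample
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))                  # standard error of the mean
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se        # approximate 95% CI
print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```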
2.8. Hypothesis testing
Alternative hypothesis
H1: µi ≠ µj postulates an effect
Null hypothesis
H0: µi = µj postulates no effect
Hypotheses postulate effect for population – decision is drawn from sample!
What we test: . . . the probability of obtaining the data we have collected, assuming that the null
hypothesis is true! p-value < .05 (or lower) → significant
Hypothesis: ‘male and female mice differ with respect to their tail length’
two groups on one metric character -> t-Test for independent samples (details see below)
systematic variation: explained by the model
unsystematic variation: not explained by the model ~ error
!"#$%&
!"#$%
=
!"#$"%&' !"#$%&'!(
!"#$"%&' !"# !"#$%&'!(
=
!""!#$
!""#"
test statistic (t, F, chi2 …)
Decision (based on sample) vs. truth (for the population studied):

                         | Null hypothesis true    | Null hypothesis false
Reject null hypothesis   | Type I error (α error)  | Correct decision
Accept null hypothesis   | Correct decision        | Type II error (β error)

Type I / Type II error
In a controlled experiment usually two or more samples are compared with respect to their
mean, median, their distributions or variance. We normally set up a “null hypothesis” and aim
to reject it.
Due to inter-individual variability we run the risk of making a mistake. If we fail to find a true
difference, then we have a false negative result, also known as a Type II. or beta error. If we
conclude that there is a difference, when in fact it is just due to chance or sampling variation,
then we have a false positive, Type I. or alpha error. These are shown in the table above.
Type I error: is controlled by the significance level (alpha).
Type II error: control is more complicated; depends on several parameters, most importantly
on the effect size (difference between means of the groups), the inter-individual variability
and the sample size.
III. RELATED PROBLEMS
3.1. How many, how often, how sure . . .?
Power, effect & sample size
3.1.1. Power
Power is the probability that an experiment can detect a treatment effect (signal), if there is
one. Power is often set at 0.8 or 0.9 (80% or 90%), though the higher the power the larger the
sample size required. Power depends on effect size, sample size and significance level.
In fact the null hypothesis can hardly ever be exactly true, since the probability that some values
calculated from two random samples will be equal is vanishingly small. If the sample size is
large enough, any small difference will become significant, without any scientific meaning.
Hence we have to decide how big a difference between two groups has to be to be
meaningful.
3.1.2. Effect size
characterizes the magnitude of the difference between the means of the two groups (M1-M2),
which is likely to be of clinical or scientific importance. It has to be specified by the
investigator in advance. If an estimate of the standard deviation is available (previous work in
the same field), a power analysis can help to estimate the effect size you are likely to be able
to detect for the sample size you decide to use.
Interindividual variability (noise)
is the variation among the experimental subjects, expressed as standard deviation (in the
case of measurement characteristics) and needs to come from previous studies or a pilot study.
If no good estimate is available, a power analysis can be conducted with a low and a high
estimate to see what difference it makes to the estimated sample size or to use the standard
deviation of the control (non-treated) group.
“Standardised effect size” or “Cohen’s d”
is used as a general indication of the magnitude of an effect. Values of d of 0.2, 0.5 and 0.8 are
considered “small”, “medium” and “large” effect sizes, respectively, in psychological
research. In work with laboratory animals, especially inbred strains, much larger effects are
seen, since the noise is much better controlled. For animal studies use: small effect d = 0.5,
medium effect d = 1.0 and large effect d = 1.5.
some useful formulas:
similar SDs in both groups: d = (M1 − M2) / SDpooled
inhomogeneity of variances: use the SD of the control group, d = (M1 − M2) / SDcontrol
n1 ≠ n2: use the weighted (pooled) SD: SDpooled = √[((n1 − 1)·SD1² + (n2 − 1)·SD2²) / (n1 + n2 − 2)]
Calculation of the standardized mean difference from the t-statistic (n1 = n2): d = 2t / √df,
or from the correlation coefficient r: d = 2r / √(1 − r²)
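A small Python helper (an illustration with made-up data, not part of the course material) that computes Cohen’s d with the weighted pooled SD given above:

```python
import numpy as np

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled SD (weighted for n1 != n2)."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    sd_pooled = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                        / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / sd_pooled

# Made-up tail lengths (cm) for two groups of mice
control = [8.1, 7.9, 8.4, 8.0, 8.3, 7.8]
treated = [8.9, 9.1, 8.6, 9.3, 8.8, 9.0]
print(f"d = {cohens_d(treated, control):.2f}")
```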
3.1.3. Sample size is the number in each (treatment or control) group. If we know the size of the effect to be
expected, the power and significance level, the sample size can be estimated by various
programs or stat. packages (G*Power). If there is a fixed number of subjects/entities available
the achieved power and effect size can be estimated.
Sample size → affects SE → significance
• sample size affects whether an observed difference between samples is deemed
significant.
• in large samples even very small differences can be significant
• in small samples even large differences can be non-significant
• power
• sample size
• inter-individual variability
• effect size (magnitude of the response to a treatment)
• significance level
are interrelated!
If it is possible to specify four of the above-mentioned quantities you can estimate the fifth
one. Hence we are able to estimate sample size or effect size or power.
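The notes mention G*Power; the same kind of calculation can be sketched in Python with statsmodels (the library choice, effect size and numbers here are assumptions for illustration only):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Given effect size, alpha and power, estimate n per group ...
n_per_group = analysis.solve_power(effect_size=1.0, alpha=0.05, power=0.8)
print(f"required n per group: {n_per_group:.1f}")

# ... or, with a fixed n available, estimate the achieved power
power = analysis.solve_power(effect_size=1.0, alpha=0.05, nobs1=12)
print(f"achieved power with n = 12: {power:.2f}")
```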
Critical view
When a statistical result is significant, can we assume that it is also important?
No! p is affected by n → a small and unimportant effect can be significant
Does a non-significant result mean that the null hypothesis is true?
No! The null hypothesis is never true
Does a significant result mean that the null hypothesis is false?
No! It just means that the probability of making a Type I error is small if the null
hypothesis is rejected.
Two results, say one with p = .0499 and the other with p = .0512 – should we really draw
different conclusions? No.
Solution: meta-analysis
report:
several effect sizes – constant positive effect?
mean effect size or weighted mean effect size
confidence interval; exact p-value + effect size
[p = 0.06 or p = 0.93: both p > .05, non-significant]
3.2. Treatment of black sheep, underdogs and outliers …
3.2.1. General Assumptions
Violations of assumptions are one main source of bias:
the statistical model, test statistic or p-value become inaccurate and can lead to wrong conclusions.
Additivity and linearity: the outcome variable is linearly related to any predictors; with
several predictors - their combined effect is best described by adding their effects together.
Normality: for CIs the estimate has to come from a normal distribution (ND); for significance
tests, the RSD must be normally distributed; for estimates that define a model the residuals in
the pop. must be normally distributed; central limit theorem: .. if your sample is large enough
normality is less of a concern
Homoscedasticity ~ homogeneity of variances: violations influence/bias the standard error and
thereby CIs and significance tests; Levene test → corrected model
Independence: errors in the model are not related to each other; important for significance
testing and CIs
3.2.2. Outlier: a score very different from the rest of the data
• changes the parameter estimate (the mean)
• has an even greater influence on the error associated with that estimate
• inflates the sum of squared errors, which is used to compute the SD,
• which in turn is used to estimate the SE
• most test statistics are based on the SS and thus will be biased too
Identify:
graphically: boxplots and histograms
use z-scores (± 2 SD; see also ‘trimming data’)
best case: a data-entry mistake : )
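A minimal sketch of the z-score rule in Python (made-up data; the ±2 SD cut-off follows the rule mentioned above):

```python
import numpy as np

data = np.array([4.2, 4.5, 3.9, 4.1, 4.4, 4.0, 9.8])   # 9.8 is a suspicious score
z = (data - data.mean()) / data.std(ddof=1)

# Flag scores more than 2 SD away from the mean
outliers = data[np.abs(z) > 2]
print("z-scores:", np.round(z, 2))
print("flagged as outliers:", outliers)
```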
3.2.3. Normality:
Kolmogorov-Smirnov test: exact test against any hypothesized distribution; can be used in small samples,
though it tends to be conservative and fails to detect deviations (Lilliefors correction for critical
values), or
Shapiro-Wilk test: specifically for the ND; more power; also for small samples
disadvantage of both: based on null-hypothesis significance testing
• large samples – significant for marginal deviations
• small samples – lack power to detect violations
large sample – don’t worry too much (CLT)
small sample – don’t rely on a non-significant test (look at skewness, kurtosis and graphs)
3.2.4. Reducing bias – correction of outliers:
trimming: a certain amount of scores is deleted from the extremes
(a) trimming by a percentage-based rule: e.g. deleting the 10% highest and lowest
scores (resulting: trimmed mean or M-estimator)
(b) trimming by an SD-based rule: remove all values that lie more than a certain
number of SDs above or below the mean (usually 2.5 SD)
winsorizing: replace outliers with the next highest score that is not an outlier, or use a z-score
of 3 and replace the outlier by the mean ± 3 × SD
robust methods: non-parametric procedures; bootstrapping: properties of the (unknown)
sampling distribution are estimated from the sample data
transforming data: when normality or linearity is in question; changes the form of the relationship, but
not the relative distance between scores
3.2.5. Transformations:
the transformation is applied to every single value; if we compare differences between
variables or over time, all variables have to be transformed.
Log transformation (log(xi)): positive skew, positive kurtosis, unequal variances, lack of
linearity
Square root transformation (√xi): stronger effect on large scores; positive skew, positive
kurtosis, unequal variances, lack of linearity
Reciprocal transformation (1/xi): reduces large scores; the resulting scale is reversed – use 1/(xhighest – xi)
to preserve the original order; positive skew, positive kurtosis, unequal variances
Reverse score transformation: negatively skewed data need to be reversed before any
transformation by (xhighest – xi); this reverses the scores – caution with interpretation
prefer robust methods; increase the sample size (> 40 or more); the F-test and t-test are quite robust to
skewed distributions!
3.3. Non-parametric Tests
Basic idea: metric data are ranked (or the data are already at ordinal scale level);
the analysis is carried out on the ranks
• ‘assumption-free’ – fewer assumptions
• small samples (CLT does not really apply)
• ordinal or ranked data sets
Advantages: overcome the problem of distribution shape
overcome the problem of outliers
Costs:
loss of information about the magnitude of differences
less powerful
limited number of comparisons
the mean is not representative (report the median instead)
3.4. Multiple Testing
Conducting more tests inflates the error rate: the probabilities of making no Type I error are
multiplied.
Ex.: conduct 3 tests – each at α = .05
overall probability of no Type I error = (0.95)³ = 0.95 × 0.95 × 0.95 = 0.857
probability of at least one Type I error: 1 − 0.857 = 0.143 or 14.3%
an increase from 5% to 14.3%
conducting 10 tests: 1 − 0.95¹⁰ = 0.40
40% chance of having made at least one Type I error
familywise error rate: FWER = 1 − (0.95)ⁿ
3.4.1. Control of inflated error rates
Bonferroni correction (pcrit = α / k): k = number of comparisons; conservative; loss of power
The Šidák procedure (αSid = 1 − (1 − α)^(1/m)): m = number of comparisons; assumes independence of test
statistics; more power
Hochberg’s sequential method: independence of test statistics; more power
Tukey procedure: only for pairwise comparisons; independent observations and equal
variances; more power
Games-Howell: unequal variances, unequal group sizes, small sample sizes
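For illustration (not part of the notes), several such corrections are available in statsmodels; the sketch below applies Bonferroni and Šidák (plus Holm and Benjamini-Hochberg FDR as further options) to a set of made-up p-values:

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.012, 0.034, 0.048, 0.210]      # made-up raw p-values

for method in ("bonferroni", "sidak", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, p_adj.round(3), "reject:", reject)
```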
3.4.2. Different approaches to control Type I. errors:
• Per comparison error rate (PCER): the expected value of the number of Type I errors
divided by the number of hypotheses
• Per-family error rate (PFER): the expected number of Type I errors
• Family-wise error rate: the probability of at least one Type I error, FWER = P (V ≥ 1)
• False discovery rate (FDR) is the expected proportion of Type I errors among the rejected
hypotheses; FDR tests are designed to control the expected proportion of incorrectly rejected
null hypotheses ("false discoveries")
• Positive false discovery rate (pFDR): the rate that discoveries are false
IV. COMMON METHODS
4.1. Selection of the correct test
1. How many groups are we planning to compare?
2 groups or more than two groups
2. Will each individual be tested once, or are there measurements at several points in
time?
A repeated-measures design is a design where one and the same dependent variable is
measured at two or more time points. Note that it is the same character or behaviour
that is measured along the time dimension.
If we investigate the effect of a drug, say on several blood parameters, we have
different dependent variables, also called variates. These can either be compared
separately in different tests or by multivariate methods.
[! Note: whenever possible, a within-subjects comparison is favourable!]
The number of animals is lower, and the precision and power of the test are higher, due to
lower inter-individual variability.
3. What will be measured?
The ‘dependent variable’, the variable of interest, the character we plan to manipulate
with the treatment. The statistical method depends on the scale level of the dependent
variable. The higher the scale level, the better the statistical approach – as long as …
4. Are the respective assumptions met?
4 hierarchical scales: characteristics and options for further operations
• ratio scale: natural zero point; all mathematical operations allowed; e.g. length, weight
• interval scale: equal intervals; e.g. IQ scores, but no ratios
• ordinal scale: scale of ranks; >/<; e.g. ranks, level of education
• nominal scale: equal/unequal; frequencies in categories; e.g. sex, strain, group
Keep in mind that it is always possible to scale downwards, from a higher to a lower scale level,
but not the other way round.
According to the scales, the statistical methods available are also to be seen hierarchically
with respect to their power and value. This is, however, only true if the test-specific assumptions
are met or the violations are minor.
4.2. Selection of statistical hypothesis test – comparison of 2 or more groups

Scale level of DV | 2 groups, independent               | 2 groups, dependent                  | > 2 groups, independent                   | > 2 groups, dependent (repeated measures)
metric & ND       | t-Test for independent samples      | t-Test for paired samples            | univariate ANOVA for independent samples  | univariate ANOVA for repeated measures
ordinal/ranks     | Test of Median, Mann-Whitney U-Test | sign test, Wilcoxon signed-rank test | Kruskal-Wallis H-Test                     | Friedman’s rank ANOVA
nominal           | Chi-square test                     | McNemar’s χ² test of symmetry        | Chi-square test                           | Cochran Q test
Random sampling is required for all statistical inference because it is based on probability,
though true random samples are difficult to find.
4.3. Simple cases: 2 independent groups
on one metric character
Example: H1: male and female mice differ with respect to their tail length
H0: the population means from the two unrelated groups are equal
T-Test for independent samples
Assumptions: unrelated, independent groups;
normality of the dependent variable (Shapiro-Wilk)
homogeneity of variances (Levene’s Test)
Violations:
robust against violation of normality assumption
correction for heterogeneous variances
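A minimal sketch of this workflow in Python with scipy (the library choice and the toy data are assumptions, not part of the notes); setting equal_var=False applies the Welch correction for heterogeneous variances:

```python
import numpy as np
from scipy import stats

# Made-up tail lengths (cm) for male and female mice
males   = np.array([8.1, 7.9, 8.4, 8.0, 8.3, 7.8, 8.2])
females = np.array([8.9, 9.1, 8.6, 9.3, 8.8, 9.0, 8.7])

# Check the assumptions named above
w_m, p_norm_m = stats.shapiro(males)            # normality per group
w_f, p_norm_f = stats.shapiro(females)
lev, p_lev = stats.levene(males, females)       # homogeneity of variances

# t-test; equal_var=False would apply the Welch correction
# if Levene's test indicates heterogeneous variances
t, p = stats.ttest_ind(males, females, equal_var=True)
print(f"Shapiro p: {p_norm_m:.3f}, {p_norm_f:.3f}; Levene p: {p_lev:.3f}")
print(f"t = {t:.2f}, p = {p:.4f}")
```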
What if …?
deviation from normality & different group sizes
transform data (log of data) or apply …
Mann-Whitney U-Test
comparison of 2 independent groups for ranked data
H0: the distributions of both groups are equal;
the groups do not differ with respect to their medians
Assumptions:
independence of observations
responses/values are at least ordinal
Asymptotic method: approximation for large samples (n1 > 10 and n2 > 10: U approximately normally distributed)
Exact method: gives the exact significance (in small samples, n < 50)
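A corresponding sketch for the Mann-Whitney U-test with scipy (toy data assumed for illustration):

```python
from scipy import stats

group_a = [3, 5, 4, 6, 2, 7, 5]     # made-up ordinal scores
group_b = [8, 6, 9, 7, 8, 10, 7]

# With method='auto' (the default) an exact p-value is used for small
# samples without ties, otherwise the normal approximation is applied
u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```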
4.4. Simple cases: 2 related groups
on one metric character
Example: one group of n=12 mice was tested on the effect of ‘Red Bull’ on their explorative
behaviour on the T-maze compared to application of saline.
H1: the number of head-dips differs between the two conditions (same entities tested twice)
H0: the population mean of paired differences of the two samples is 0
Paired-Samples t Test
• one sample has been tested twice (repeated measures) or,
• two samples have been "matched" or "paired", in some way.
(matched subjects design)
Assumptions:
paired observations; independent observations;
approx. normal distribution of difference scores;
homogeneity of variances; no significant outliers
Violations:
robust against violation of normality assumption
What if …?
strong deviation from normality
correction of outliers or apply …
Wilcoxon signed rank Test
comparison of 2 related groups for ranked data
H0: the median difference between the pairs is zero
Assumptions: data are paired and come from the same population
each pair is chosen randomly and independently
data are measured at least on ordinal scale
the distribution of the differences is symmetric around the median
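A minimal sketch of the paired t-test and its non-parametric alternative in Python with scipy (the head-dip counts are invented for illustration):

```python
import numpy as np
from scipy import stats

# Made-up head-dip counts for the same 12 mice under saline and 'Red Bull'
saline  = np.array([12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 12, 14])
redbull = np.array([16, 18, 15, 17, 15, 19, 14, 18, 17, 16, 15, 17])

t, p = stats.ttest_rel(redbull, saline)          # paired-samples t-test
print(f"paired t = {t:.2f}, p = {p:.4f}")

# Non-parametric alternative if the difference scores are clearly non-normal
w, p_w = stats.wilcoxon(redbull, saline)
print(f"Wilcoxon W = {w}, p = {p_w:.4f}")
```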
4.5. One-sample t-Test
H0: the sample mean does not deviate from the population parameter µ
Assumptions: normal distribution of the dependent variable
random sample from the population
independence of observations
known population mean
Violations:
robust against violation of normality assumption for
sample sizes equal to or greater than 30
___________________________________________________________________________
4.6. Simple cases: more than 2 independent groups
on one metric character
Ex: H1: different drug treatment in parent animals reveals differences in tail length of their
offspring
H0: µ1 = µ2 = … = µk
one (between) factor (≥ 3 levels)
alcoholics, junkies, control
one dependent metric variable
tail length of their offspring
univariate one-way ANOVA
Assumptions: independence of observations & random samples
normal distribution of dependent variable (in each group) –> Shapiro-Wilk test
homogeneity of variances -> Levene’s test
Violations:
independence – taken very seriously
violation of ND: robust with respect to Type I error;
skewness has little effect, platykurtosis attenuates power
homogeneity: robust if group sizes are approximately equal
(largest/smallest < 1.5)
What if …? strong deviation from normality and/or unequal sample size
correction of outliers; transformation of raw data or
Kruskal-Wallis (H)-Test
more than 2 independent groups on ranked/ordinal data; normality not assumed; outliers present
H0: the samples originate from the same distribution
Assumptions: independence of observations
the responses are ordinal
identically shaped and scaled distribution for each group
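For illustration (not part of the notes), a one-way ANOVA and the Kruskal-Wallis fallback in Python with scipy, using invented offspring tail lengths:

```python
from scipy import stats

# Made-up offspring tail lengths (cm) for three parental treatment groups
alcohol = [7.9, 8.1, 7.8, 8.0, 7.7, 8.2]
drugs   = [7.5, 7.4, 7.8, 7.6, 7.3, 7.7]
control = [8.4, 8.6, 8.3, 8.5, 8.7, 8.4]

f, p = stats.f_oneway(alcohol, drugs, control)       # one-way ANOVA
print(f"F = {f:.2f}, p = {p:.4f}")

h, p_h = stats.kruskal(alcohol, drugs, control)      # rank-based alternative
print(f"H = {h:.2f}, p = {p_h:.4f}")
```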
___________________________________________________________________________
4.7. ANOVA/MANOVA Models
Whenever possible, try to use an ANOVA method: it has the highest power and
extracts the maximum information from the data!
The great benefit of all ANOVA designs comes with the addition of another 2nd or 3rd factor,
since we can test more hypotheses within a single test.
Typically, there are many factors such as sex, age, genotype, diet etc. which can influence the
outcome of an experiment. Factorial designs are efficient and provide extra information (the
interactions between the factors), which cannot be obtained when using single factor designs;
and factorial designs are powerful because differences among the levels of each factor are
determined by averaging across all other factors.
Overview of ANOVA/MANOVA designs (> 2 groups)

DV = 1, independent, univariate ANOVA:
one-way (1 factor):    3 mouse strains – 3
two-way (2 factors):   3 mouse strains × 2 sex – 3 × 2
three-way (3 factors): 3 mouse strains × 2 sex × 3 dose – 3 × 2 × 3
…

DV = 1, repeated measures, univariate (mixed-model repeated-measures design):
one-way:   3 mouse strains at t1, t2, t3 – 3 × 3
two-way:   3 mouse strains × 2 sex at t1, t2, t3 – 3 × 2 × 3
three-way: 3 mouse strains × 2 sex × 3 dose at t1, t2, t3
…

DV > 1, independent, multivariate MANOVA:
one-way:   3 mouse strains on BGL, temp., cholest. – 3 × 3
two-way:   3 mouse strains × 2 sex on BGL, temp., cholest. – 3 × 2
three-way: 3 mouse strains × 2 sex × 3 dose on BGL, temp., cholest. – 3 × 2 × 3

DV > 1, repeated measures, multivariate:
… the same designs, measured at t1, t2, t3

(M)ANCOVA, independent & repeated measures, univariate & multivariate:
1 fixed factor & covariate; 2 fixed factors & covariate; 3 fixed factors & covariate; …
4.8. Advanced cases: more than one independent group/factor
on one metric character
(univariate) two-way ANOVA
factor 1 (sex)
factor 2 (drug)
one dependent metric variable: tail length
3 hypotheses in one (omnibus) test
main effect 1: no difference between male and female
main effect 2: no difference between drug treatment
interaction: no difference between sex x drug
3 x 2 design: one omnibus test instead of 7 t-tests (α-inflation)!
Example:
two-way ANOVA: 2 x 3 design
on one dependent variable: tail length
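A minimal sketch of such a factorial design in Python with statsmodels (the library choice and the toy data are assumptions, not part of the notes); the omnibus table reports both main effects and the interaction:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: tail length by sex (2 levels) and drug (3 levels), 3 mice per cell
df = pd.DataFrame({
    "sex":  ["m", "m", "m", "f", "f", "f"] * 3,
    "drug": ["A"] * 6 + ["B"] * 6 + ["C"] * 6,
    "tail": [8.1, 8.0, 8.2, 8.6, 8.7, 8.5,
             7.8, 7.9, 7.7, 8.3, 8.4, 8.2,
             8.0, 8.1, 7.9, 8.5, 8.6, 8.4],
})

# Two main effects and their interaction tested in one omnibus model
model = ols("tail ~ C(sex) * C(drug)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```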
Basic concept – partitioning the variability (sums of squares):
SS_T (total variability) = SS_M (variance explained by the experimental manipulation) + SS_R (unexplained variance, error)
SS_M = SS_A (variance explained by factor A) + SS_B (variance explained by factor B) + SS_A×B (variance explained by the interaction)
Specific hypotheses:
predefined contrasts: comparisons constructed to answer specific research questions.
• Deviation
• Simple
• Repeated
• Helmert
• Difference
or post-hoc comparisons:
• Fisher’s least significant difference (LSD): liberal
• Studentized Newman-Keuls: lacks control over the FWER
• Bonferroni (small number of means): controls the Type I error, but conservative
• Tukey (large number of means): controls the Type I error, but conservative
• Dunn: conservative
• Scheffé: conservative
• REGWQ (Ryan-Einot-Gabriel-Welsch Q): good power; comparison of all pairs; not for different group sizes
Keep in mind: Type I error rate and statistical power are linked. If a test is conservative
(probability of a Type I error small), it is likely to lack statistical power (probability of a Type II error high).
4.9. Advanced cases: more than 2 groups, more than 2 time points
ANOVA for repeated measures:
> 2 groups and more than 2 times
“extension of paired t-test” for more than 2 groups
one dependent metric variable
measured over more than two points in time
or Mixed model repeat. measures: one (between) factor (group variable)
one dependent metric variable
measured more than 2 times (within factor)
variants:
one between – one within factor
one between – two within factors
two between – one within factor, and so on
benefit:
randomized block designs; variability among subjects due to individual
differences is completely removed from the error term -> more power and
fewer number of subjects
Assumptions: independence of observations
multivariate normality
sphericity: homogeneity of the variances of the differences between the repeated-measures
levels (k > 2; Mauchly’s sphericity test; ε = 1 indicates perfect sphericity;
maximal deviation: ε = 1/(k−1))
Violations:
independence – violation taken very seriously
violation of multivariate ND: robust with respect to Type I error
sphericity: Greenhouse-Geisser ε for correction of the df,
or
multivariate approach (if n > k + 10)
Example: two-way ANOVA for repeated measures
2 fixed (between) factors:
strain (3); sex (2)
1 rep. measures factor:
BGL at t0, t1 and t2 (within)
18 cell means
Hypotheses:
main effect 1: no difference betw. mouse strains (12 x 12 x 12 mice)
main effect 2: no difference betw. male and female mice (24 x 24 mice)
within effect: no difference in points of time (36 mice)
interactions: no difference between strains x sex
no difference between strains x time
no difference between sex x time
no difference between strains x sex x time
post-hoc pairwise comparisons; when sphericity assumption is violated:
when ε > .70 -> Tukey
when ε < .70 -> Bonferroni
What if …?
Friedman rank ANOVA
but only for one repeated measures factor!
H0: no difference in the central tendency of the samples
Assumptions: data are paired
each pair is chosen randomly and is independent
data are measured at least on ordinal scale
comparable distributions of the samples
___________________________________________________________________________
4.10. Advanced cases: several factors on more than one metric character
multivariate approach
one-way MANOVA
one fixed factor (between)
more than one dependent metric variable
Assumptions: independence of observations & random samples
the observations on the dependent variables follow a multivariate ND within each group
the population covariance matrices for the p dependent variables are equal
Violations:
independence – violation taken very seriously
violation of multivariate ND: robust with respect to Type I error; platykurtosis
attenuates power; in practice hardly ever tested
covariance matrices: robust if group sizes are approximately equal
(largest/smallest < 1.5)
Example: one-way MANOVA
1 fixed (between) factor: sex (2)
3 dependent variables: BGL, Cholesterol, MCV (measured after treatment)
2 x 3 layout: 6 cell means (female/male × BGL, Chol., MCV)
H0 :
no difference between f/m with resp. to BGL or Chol. or MCV
no difference between f/m with respect to pattern/relation between BGL, Chol. and
MCV
_____________________________________________________________________
4.11. Correlation
Example: … dramatic increase in people with diabetes mellitus. For monitoring
diabetes it would thus be of high interest whether there is a relation
between the glucose level in blood and the glucose level in saliva.
2 interval-scaled variables: is there a relation? Do the variables co-vary?
Standardized covariance:
Pearson’s product-moment correlation coefficient
(bivariate correlation)
Assumptions:
linearity
normality
interval scale level
in small samples: bias through outliers
Assumptions met:         Pearson r
Non-linearity:           transform the data
Non-normality/outliers:  Spearman’s rho (rs)
                         Kendall’s τ (for small samples and tied ranks)
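For illustration (not part of the notes), the three coefficients computed in Python with scipy on made-up blood/saliva glucose values:

```python
import numpy as np
from scipy import stats

# Made-up glucose levels in blood and saliva for 10 subjects
blood  = np.array([95, 110, 150, 200, 130, 90, 170, 120, 140, 160])
saliva = np.array([ 6,   8,  11,  15,   9,  5,  12,   8,  10,  11])

r, p = stats.pearsonr(blood, saliva)          # assumptions met
rho, p_rho = stats.spearmanr(blood, saliva)   # non-normality / outliers
tau, p_tau = stats.kendalltau(blood, saliva)  # small samples, tied ranks
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```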
If one of the two variables is dichotomous . . .
Point-biserial correlation (rpb): the dichotomous variable is a true, discrete dichotomy; ‘dead or alive’,
‘male/female’, ‘marker present/absent’
Biserial correlation (rb): the dichotomy is artificial, i.e. there is an underlying continuum; ‘passing or
failing an exam’; ‘p-values’ : )
Partial correlation:
quantifies the relationship between two variables while accounting for the effects of a third
variable on both variables.
___________________________________________________________________________
4.12. Linear Regression: simple
Example: … we want to know whether the cholesterol level as a baseline measure can be used
in terms of a predictor for the cholesterol level after one month? So again, we fit a model to
our data…..
Constant b0: the predicted value of y when X = 0
Model Summary
Model 1: R = .861; R² = .741; Adjusted R² = .740; Std. Error of the Estimate = 25.258
Predictors: (Constant), Cholesterol baseline

R²: the baseline cholesterol level accounts for 74.1% of the variation.
R: estimate of the overall fit of the regression model.
ANOVA
Regression: SS = 314337.948, df = 1,   MS = 314337.948, F = 492.722, Sig. = .000
Residual:   SS = 109729.408, df = 172, MS = 637.962
Total:      SS = 424067.356, df = 173
Predictors: (Constant), Cholesterol baseline; Dependent variable: Cholesterol after 1 month
F-ratio: regression model results in a significant prediction
Coefficients
(Constant):            B = 34.546, SE = 9.416,              t = 3.669,  Sig. = .000
Cholesterol, baseline: B = 0.863,  SE = 0.039, Beta = .861, t = 22.197, Sig. = .000
Dependent variable: Cholesterol after 1 month
gradient of the regression line (b1): the change in the outcome per unit change in the predictor
Assumptions:
• Additivity and Linearity
• Independent errors: residuals should be uncorrelated; Durbin-Watson Test around 2
• Homoscedasticity (variances are assumed not to be significantly different)
• Normally distributed errors
• Predictors are uncorrelated to ‘external variables’
• No perfect multicollinearity
Variable types:
predictor variables: quantitative or categorical (at least 2 categories)
outcome variable: quantitative, continuous
chol1 = 34.546 + (0.863 × chol0)
For a baseline cholesterol level of 280, the cholesterol level after one month is
expected to be about 276.
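A minimal sketch of fitting and using such a simple regression in Python with scipy (the baseline/one-month values are invented, so the estimated coefficients will differ from the output above):

```python
import numpy as np
from scipy import stats

# Made-up baseline and 1-month cholesterol values (mg/dl)
baseline = np.array([180, 200, 220, 250, 280, 300, 190, 240, 260, 210])
month1   = np.array([190, 205, 230, 255, 270, 295, 200, 245, 265, 215])

res = stats.linregress(baseline, month1)
print(f"chol1 = {res.intercept:.2f} + {res.slope:.3f} * chol0,  R^2 = {res.rvalue**2:.3f}")

# Prediction for a new baseline value
print("predicted chol1 at baseline 280:", res.intercept + res.slope * 280)
```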
4.13. Multiple Regression
Example: … we want to include other predictors in the model: age, body weight, blood
glucose level (BGL)..
_____________________________________________________________________
4.14. Analysis of frequencies: χ²-procedures
Basic idea of all χ²-procedures:
comparison of an observed frequency with an expected frequency
according to a theoretical distribution (normal, equal, other).
Is the difference between the expected and observed frequency due to
sampling error, or is it a real difference?
… compare whether the parents’ drug addiction led to a different risk of dying (for three
categories) by application of a one-dimensional χ²-test,
or we conduct a k × l χ²-test and compare the frequencies of two categorical variables in one
test (sex × parental drug addiction: risk of dying)
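For illustration (not part of the notes), a k × l χ²-test of independence in Python with scipy on an invented 3 × 2 contingency table:

```python
import numpy as np
from scipy import stats

# Made-up k x l contingency table: parental drug addiction (3 categories)
# x offspring outcome (died / survived)
observed = np.array([[12, 38],
                     [ 8, 42],
                     [ 4, 46]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("expected frequencies:\n", expected.round(1))
```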
k × l χ²-test: frequency table

Overview: Chi-square procedures

                          | 1 variable                                | 2 variables       | k variables
dichotomous (2-tiered)    | 1 time: one-dimensional χ²;               | 4-fields χ²-test  | configural frequency analysis
                          | 2 times: McNemar test of symmetry         |                   | for alternative variables
                          | (pre-/post); k times: Cochran-Q test      |                   |
multi-level (categorical) | one-dimensional χ²: comparison of an      | k × l χ²-test     | configural frequency analysis
                          | empirical and a theoretical distribution  |                   | for multi-level variables
References and recommendations:
Books
Brilliant introduction to the basics of statistics, with lots of laughter:
Field, A. (2012). Discovering Statistics using IBM SPSS Statistics. 4th ed.; Sage Publications.
or
Field, A. (2012). Discovering Statistics using R. Sage Publications.
Field, A. (2012). Discovering Statistics using SAS. Sage Publications.
More advanced/specialized literature:
Stevens, J.P. (2009). Applied Multivariate Statistics for the Social Sciences. 5th ed.; Routledge, NY.
Tinsley & Brown (2000). Handbook of Applied Multivariate Statistics and Mathematical Modelling.
Cohen et al. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences. 3rd ed.
Ewens, W.J. & Grant, G.R. (2002). Statistical Methods in Bioinformatics. Springer, New York.
Very good for small samples and non-parametric tests (though in German):
Bortz, J. & Lienert, G.A. (2008). Kurzgefasste Statistik für die Klinische Forschung. Leitfaden für die verteilungsfreie Analyse kleiner Stichproben. Springer, Berlin.
Basic books and bibles:
Stevens (1999). Intermediate Statistics. A Modern Approach.
Gravetter & Wallnau (2012). Statistics for the Behavioral Sciences. 9th ed.
Bortz, J. (1999). Statistik für Sozialwissenschaftler. 5. Auflage; Springer, Berlin.
Bortz, J. & Döring, N. (2006). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler (Springer-Lehrbuch). 4. Auflage; Springer, Berlin.
___________________________________________________________________________