Ph.D. COURSE IN BIOSTATISTICS
DAY 2
SOME RESULTS ABOUT MEANS AND VARIANCES
The sample mean and the sample variance were used to describe a
typical value and the variation in the sample.
We may similarly use the population mean, the expected value, and
the population variance to describe the typical value and the variation
in a population.
These values are often referred to as the theoretical values, and the
sample mean and the sample variance are considered as estimates
of the analogous population quantities.
If X represents a random variable, e.g. birth weight or blood pressure,
the mean and variance are often denoted
mean = $E(X) = \mu$
variance = $Var(X) = \sigma^2$
The notation is also used when the distribution is not normal.
1
The random variation in a series of observations is transferred to
uncertainty, i.e. sampling error or sampling variation, in estimates
computed from the observations. The average, or sample mean,
is an important example of such an estimate.
Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
population with mean $\mu$ and variance $\sigma^2$. The average $\bar{X}$
is then itself a random variable.
If several samples of size n are drawn from the population,
the average value will vary between samples.
Terminology:
A ”random sample” implies that the observations are mutually
independent replicates of the experiment ”take a unit at random from
the population and measure the value on this unit”.
For the average (sample mean) we have
$E(\bar{X}) = \mu$
$Var(\bar{X}) = \sigma^2 / n$
2
The sample mean is an unbiased estimate of the population mean.
The variance of the sample mean is proportional to the variance of
a single observation and inversely proportional to the sample size.
The standard deviation of the sample mean =
standard error of the mean = $\sigma/\sqrt{n}$ = s.e.m.
Interpretation: The expected value, the variance, and the standard error
of the mean are the values of these quantities that one would expect to
find if we generated a large sample of averages each obtained from
independent random samples of size n from the same population.
The result shows that the precision of the sample mean increases with
the sample size.
Moreover, if the variation in the population follows a normal distribution
the sampling variation of the average also follows a normal distribution
$\bar{X} \sim N(\mu, \sigma^2/n)$
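As a small numerical illustration (a sketch using the values s = 7.53 and n = 213 that appear in the fish oil example later in these notes), the standard error of the mean can be computed directly in Stata:
display 7.528853/sqrt(213)
which returns approximately 0.516, the standard error that reappears in the ttest output on page 28.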
3
Consider a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a population
with mean $\mu_X$ and variance $\sigma_X^2$, and an independent random sample
$Y_1, Y_2, \ldots, Y_m$ of size $m$ from a population with mean $\mu_Y$ and
variance $\sigma_Y^2$. For the difference between the sample means we have
$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y$
$Var(\bar{X} - \bar{Y}) = Var(\bar{X}) + Var(\bar{Y}) = \dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}$
These results are a consequence of the following general results
• Linear transformations of random variables (change of scale)
$E(a + bX) = a + bE(X)$
$Var(a + bX) = b^2 Var(X)$
• The expected value of a sum of random variables
$E(a_0 + a_1 X_1 + \cdots + a_n X_n) = a_0 + a_1 E(X_1) + \cdots + a_n E(X_n)$
• The variance of a sum of independent random variables
$Var(a_0 + a_1 X_1 + \cdots + a_n X_n) = a_1^2 Var(X_1) + \cdots + a_n^2 Var(X_n)$
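For example, writing $\bar{X} = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n$ and applying these rules gives
$E(\bar{X}) = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu$ and $Var(\bar{X}) = n \cdot \frac{1}{n^2}\sigma^2 = \sigma^2/n$,
the results for the sample mean stated on page 2.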
4
For a random sample of size n from a normal distribution the result
above can be reformulated as
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim$ standard normal distribution
The standard normal distribution is tabulated, so for given values of
$\mu$ and $\sigma$ this relation can be used to derive probability statements
about the sample mean.
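For instance (a sketch with hypothetical values $\mu = 0$, $\sigma = 7.5$, and n = 213, not taken from the notes), the probability that the sample mean exceeds 1 can be computed with Stata's norm function (see page 23):
display 1 - norm((1 - 0)/(7.5/sqrt(213)))
which returns roughly 0.026.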
The sampling distribution of the variance
The sample variance s2 is also a statistic derived from the observations
and therefore subject to sampling variation. For a random sample from
a normal distribution one may show that
$E(s^2) = \sigma^2$
so the sample variance is an unbiased estimate of the population
variance
5
For a random sample of size n from a normal distribution the
sampling error of the sample variance can also be described.
We have
$(n - 1)\dfrac{s^2}{\sigma^2} \sim \chi^2$-distribution with $f = n - 1$ degrees of freedom
The $\chi^2$-distributions (chi-square distributions) are tabulated, so for
a given value of $\sigma^2$ this relation can be used to derive probability
statements about the sample variance. A $\chi^2$-distribution is the
distribution of a sum of independent, squared standard normal variates.
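As a sketch of such a probability statement (with hypothetical values n = 10 and $\sigma^2 = 1$), the probability that the sample variance exceeds 2 is $P(\chi^2_9 \geq 9 \times 2)$, which Stata's chi2tail function (see page 30) computes as
display chi2tail(9, (10-1)*2/1)
returning approximately 0.035.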
[Figure: densities of the sampling distribution of the sample variance for n = 5, 10, 20, 50, and 100]
The distribution of the sample variance when $\sigma^2 = 1$ for various n.
6
INTRODUCTION TO STATISTICAL INFERENCE
Statistical inference: The use of a statistical analysis of data to draw
conclusions from observations subject to random variation.
Data are considered as a sample from a population (real or hypothetical)
The purpose of the statistical analysis is to make statements about
certain aspects of this population
The basic components of a statistical analysis
• Specification of a relevant statistical model (the scientific problem
is ”translated” to a statistical problem)
• Estimation of the population characteristics (the model parameters)
• Validation of the underlying assumptions
• Test of hypotheses about the model parameters.
A statistical analysis is always based on a statistical model, which
formalizes the assumptions made about the sampling procedure and
the random and systematic variation in the population from which
the sample is drawn.
7
The validity of the conclusions depends on the degree to which
the statistical model gives an adequate description of the sampling
procedure and the random and systematic variation.
Consequently, checking the appropriateness of the underlying
assumptions (i.e. the statistical model) is an important part of a
statistical analysis.
The statistical model should be seen as an approximation to the
real world.
The choice of a suitable model is always a balance between
complex models, which are close approximations,
but very difficult to use in practice,
and
simple models, which are crude approximations,
but easy to apply
8
Example: Comparing the efficacy of two treatments
Design: Experimental units (e.g. patients) are allocated to two
treatments. For each experimental unit in both treatment groups an
outcome is measured. The outcome reflects the efficacy of the treatment.
Purpose: To compare the efficacy of the two treatments
Analysis: To summarize the results the average outcome is computed
in each group and the two averages are compared.
Possible explanations for a discrepancy between the average outcome
in the two groups
• The treatments have different efficacy. One is better than the other
• Random variation
• Bias originating from other differences between the groups. Other
factors which influence the outcome may differ between the groups
and lead to apparent differences between the efficacy of the two
treatments (confounding).
9
A proper design of the study (randomization, blinding etc.) can
eliminate or reduce the bias and therefore make this explanation unlikely.
Bias correction (control of confounding) is also possible in the
statistical analysis.
The statistical analysis is performed to estimate the size of the treatment
difference and evaluate if random variation is a plausible explanation for
this difference.
If the study is well-designed and the statistical analysis indicates that
random variation is not a plausible explanation for the difference, we
may conclude that a real difference between the efficacy of the two
treatments is the most likely explanation of the findings.
The statistical analysis can also identify a range of plausible values, a
so-called confidence interval, for the difference in efficacy.
10
STATISTICAL ANALYSIS OF A SAMPLE FROM
A NORMAL DISTRIBUTION
Example. Fish oil supplement and blood pressure in pregnant women
Purpose: To evaluate the effect of fish oil supplement on diastolic
blood pressure in pregnant women.
Design: Randomised controlled clinical trial on 430 pregnant women,
enrolled at week 30 and randomised to either fish oil supplement or
control.
Data: Diastolic and systolic blood pressure at week 30 and 37
(source: Sjurdur Olsen)
The Stata file fishoil.dta contains the following variables
grp       treatment group, 1 for control, 2 for fish oil
group     a string variable with the name of the group allocation
difsys    increase in systolic blood pressure from week 30 to week 37
difdia    increase in diastolic blood pressure from week 30 to week 37
11
We shall here consider the change in diastolic blood pressure
from week 30 to week 37.
Stata
histogram difdia , by(group)
[Figure: histograms of difdia (range -40 to 40) with density on the vertical axis, in two panels titled Control and Fishoil. Graphs by group]
12
Stata
qnorm difdia if grp==1, title("Control") ///
    saving(q1,replace)
qnorm difdia if grp==2, title("Fish oil") ///
    saving(q2,replace)
graph combine q1.gph q2.gph
[Figure: normal Q-Q plots of difdia against the Inverse Normal, in two panels titled Control and Fish oil]
13
For both groups the histogram and the probability plot correspond
closely to the expected behavior of normally distributed data.
Hence our statistical model is:
The observations in each group can be considered as a random
sample from a normal distribution with unknown parameters as below:
Group      Mean       Variance
Control    $\mu_1$    $\sigma_1^2$
Fish oil   $\mu_2$    $\sigma_2^2$
The two sets of observations are independent.
The ultimate goal of the analysis is to compare the two treatments
with respect to the expected change in blood pressure.
We shall return to this analysis later.
First we want to examine the change in the diastolic blood pressure
in women in the control group.
14
We now consider the control group and focus on the increase in
diastolic blood pressure
Problem:
Do the data suggest that the diastolic blood pressure in pregnant
women increases from week 30 to week 37?
Data
The observed values of the change in diastolic blood pressure in the
213 women who participated in the study
Statistical model
The data are considered as a random sample of size 213 from a
normal distribution with mean $\mu$ and variance $\sigma^2$.
The parameter $\mu$ describes the expected change and the parameter
$\sigma^2$ describes the random variation caused by biological factors and
measurement errors.
15
Assumptions
The assumptions of the statistical model are
1. The observations are independent
2. The observations have the same mean and the same variance
3. A normal distribution describes the variation.
Checking the validity of the assumptions is usually done by various
plots and diagrams. Knowledge of the measurement process can often
help in identifying points which need special attention.
Re 1. Checking independence often involves going through the
sampling procedure. Here the assumption would e.g. be violated
if the same woman contributes more than one pregnancy.
Re 2. Do we have “independent replications of the same experiment”?
Factors that are known to be associated with changes in blood
pressure are not accounted for in the model. They contribute to
the random variation.
16
Re 3. The plots above indicate that a normal distribution gives an
adequate description of the data
Estimation
The estimation problem: Find the normal distribution that best fits
the data.
Solution: Use the normal distribution with mean equal to the sample
mean and variance equal to the sample variance.
sum difdia if grp==1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       213    1.901408    7.528853        -28         29
i.e.
$\hat{\mu} = \bar{x} = 1.90$
$\hat{\sigma}^2 = s^2 = 56.68 \quad (s = 7.53)$
Note: A normal distribution is completely determined by the values
of the mean and the variance.
Convenient notation: A “^” on top of a population parameter is
used to identify the estimate of the parameter
17
Question: Do the data suggest a systematic change in the diastolic
pressure? No systematic change means that the expected change is 0,
i.e.
Hypothesis: The data are consistent with the value of $\mu$ being 0.
This hypothesis is usually written as $H: \mu = 0$
We have observed an average value of 1.90. Is sampling variation a
possible explanation for the difference between the observed value of
1.90 and the expected value of 0?
Statistical test
To evaluate if the random variation can account for the difference we
assume that the hypothesis is true and compute the probability that
the average value in a random sample of size 213 differs by at least
as much as the observed value.
From the model assumptions we conclude that the average can be
considered as an observation from a normal distribution with
mean 0 and standard deviation equal to $\sigma/\sqrt{n} = \sigma/\sqrt{213}$
18
Consequently, the standardized value
$z = \dfrac{\bar{x} - 0}{\sigma/\sqrt{213}}$
is an observation from a standard normal distribution.
Problem: The population standard deviation is unknown, but in large
samples we may use the sample standard deviation and still rely on
the normal distribution. Small samples are considered later.
Replacing $\sigma$ with the estimate $s$ we therefore get
"z" $= \dfrac{\bar{x} - 0}{s/\sqrt{213}} = \dfrac{1.901 - 0}{7.529/\sqrt{213}} = \dfrac{1.901}{0.516} = 3.69$
For a normal distribution a value more than 3 standard deviations
from the mean is very unlikely to occur.
Using a table of the standard normal distribution function we find that
a value that deviates more than 3.69 in either direction occurs with a
probability of 0.00023.
19
p-value
The probability computed above is called the p-value.
p-value = the probability of obtaining a value of the test statistic at
least as extreme as the one actually observed if the hypothesis is true.
Usually extreme values in both tails of the distribution are included
(two-sided test), so in the present case
$p\text{-value} = P(z \leq -3.69) + P(z \geq 3.69) = 2 \times 0.000114 = 0.00023$
[Figure: standard normal density with the two-sided tail areas beyond -3.69 and 3.69 marked]
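In Stata this tail probability can be obtained directly (a sketch using the norm function described on page 23):
display 2*(1 - norm(3.6858))
which returns approximately 0.00023.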
The calculation indicates that sampling variation is a highly implausible
explanation for the observed change in blood pressure. The observed
deviation from the hypothesized value is statistically significant.
20
Usually a hypothesis is rejected if the p-value is less than 0.05.
SMALL SAMPLES – use of the t-distribution
To compute the p-value above we replaced the unknown population
standard deviation with the sample standard deviation and referred
the value of the test statistic to a normal distribution.
For large samples this approach is unproblematic, but for small samples
the p-value becomes too small, since the sampling error of the sample
standard deviation is ignored. Statistical theory shows that the correct
distribution of the test statistic is a so-called t-distribution with f = n – 1
degrees of freedom.
The t-distribution has been tabulated, so we are still able to compute
a p-value. Note that the t-distribution does not depend on the
parameters $\mu$ and $\sigma^2$, so the same table applies in all situations.
As the sample size increases the t-distribution will approach a standard
normal distribution. Usually the approximation is acceptable for samples
larger than 60, say.
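A small illustration of the difference (a sketch with a hypothetical sample of size n = 10, i.e. 9 degrees of freedom):
display 2*(1 - norm(3.69))
display 2*ttail(9,3.69)
The first command (the normal approximation) returns about 0.0002, the second (the t-distribution) about 0.005, so ignoring the sampling error of s understates the p-value by more than a factor of 20 in this small-sample case.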
21
If we again compute
$t = \dfrac{\bar{x} - 0}{s/\sqrt{213}} = 3.69$
but this time look up the value in a table of a t-distribution with f = 212
degrees of freedom, we get p = 0.00029. Since the sample is relatively
large the result is almost identical to the one above.
[Figure: densities of t-distributions for n = 5, 20, and 100 together with the standard normal density]
A comparison of t-distributions with 4, 19, and 99 degrees of freedom
and a standard normal distribution (the black curve)
22
STATA: PROBABILITY CALCULATIONS
Output from statistical programs like Stata usually also includes
p-values, so statistical tables are rarely needed. Moreover, Stata
has a lot of built-in functions that can compute almost any kind of
probability. Write help probfun to see the full list.
Some examples
display norm(3.6858) returns .99988601, the value of the
cumulative probability function of a standard normal distribution at
3.6858, i.e. $P(Z \leq 3.6858)$, the probability that a standard normal
variate is less than or equal to 3.6858
display ttail(212,3.6858) returns .00014478, the probability
that a t-statistic with 212 degrees of freedom is larger than 3.6858.
display Binomial(224,130,0.5134) returns .02608126, the
probability of getting at least 130 successes from a Binomial distribution
with n = 224 and p = 0.5134.
23
ONE SAMPLE t-TEST: THE GENERAL CASE
Above we derived the t-test of the hypothesis $H: \mu = 0$
The same approach can be used to test if any specified value is
consistent with the data. If we e.g. want to test the hypothesis
$H: \mu = 2$ we compute
$t = \dfrac{\bar{x} - 2}{s/\sqrt{213}} = \dfrac{1.901 - 2}{7.529/\sqrt{213}} = \dfrac{-0.099}{0.516} = -0.1911$
display 2*ttail(212,0.1911) returns the p-value .84863014,
so an expected change of 2 is compatible with the data and can not
be rejected.
Note: The function ttail gives a probability in the upper tail of the
distribution. A negative t-value should therefore be replaced by the
corresponding positive value when computing the p-value.
24
CONFIDENCE INTERVALS
In the example the observed average change in blood pressure is
1.901, and this value was used as an estimate of the expected change $\mu$.
Values close to 1.901 are also compatible with the data; we saw e.g.
that the value 2 could not be rejected.
Problem: Find the range of values for the expected change that is
supported by the data.
A confidence interval is the solution to this problem.
Formally:
A 95% confidence interval identifies the values of the unknown parameter
which would not be significantly contradicted by a (two-sided) test at the
5% level, because the p-value associated with the test statistic for each
of these values is larger than 5%
25
Frequency interpretation: If the experiment is repeated a large
number of times and a 95% confidence interval is computed for each
replication, then 95% of these confidence intervals will contain the
true value of the unknown parameter.
How to calculate the 95% confidence interval
The limits of the confidence interval are the values of $\mu$ for which the
test statistic $t$ equals the 2.5 or the 97.5 percentile of a t-distribution
with $n - 1$ degrees of freedom.
The t-distribution is symmetric around 0, so $t_{0.025} = -t_{0.975}$ and
the confidence limits are therefore given by the values of $\mu$ satisfying
$\left| \dfrac{\bar{x} - \mu}{s/\sqrt{n}} \right| \leq t_{0.975}$
i.e.
$\bar{x} - t_{0.975}\dfrac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{0.975}\dfrac{s}{\sqrt{n}}$
The formula shows that the confidence interval becomes narrower
as the sample size increases.
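As an illustration (a sketch with a hypothetical standard deviation of s = 7.5), the half-width $t_{0.975}\, s/\sqrt{n}$ of the interval can be computed for two sample sizes:
display invttail(49,0.025)*7.5/sqrt(50)
display invttail(199,0.025)*7.5/sqrt(200)
giving roughly 2.1 for n = 50 and 1.0 for n = 200.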
26
Example continued
In Stata the command invttail gives the upper percentiles and
display invttail(212,0.025) returns 1.971217.
The 95% confidence limits for the expected change in diastolic blood
pressure therefore become
$\bar{x} \pm t_{0.975}\dfrac{s}{\sqrt{n}} = 1.9014 \pm 1.9712 \times 0.5159$, i.e. 0.88 and 2.92,
and the 95% confidence interval becomes $0.88 \leq \mu \leq 2.92$
99% confidence intervals are derived from the upper 0.5 percentile in
a similar way.
Also, one-sided confidence intervals can be defined and computed
from one-sided statistical tests (statistical tests are called one-sided
if large deviations in only one direction are considered extreme).
27
STATA: ONE SAMPLE t-TEST
A single command in Stata will give all the results derived so far.
ttest difdia=0 if grp==1
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  difdia |     213    1.901408    .5158685    7.528853    .8845197    2.918297
------------------------------------------------------------------------------
Ho: mean(difdia) = 0   (the hypothesis tested)                  t =     3.6858
                                                degrees of freedom =       212

 Ha: mean < 0               Ha: mean != 0                  Ha: mean > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0003         Pr(T > t) = 0.0001
                            (two-sided) p-value
To test the hypothesis $H: \mu = 2$ use ttest difdia=2 if grp==1
instead
28
Statistical inference about the variance
So far we have looked at statistical inference about the mean of a
normal population based on a random sample.
In the same setting we can also derive a test statistic for hypotheses
about the variance (or the standard deviation) and obtain confidence
intervals for this parameter. The arguments are based on the result
about the sampling distribution of the sample variance (see p. 6)
$(n - 1)\dfrac{s^2}{\sigma^2} \sim \chi^2$-distribution with $f = n - 1$ degrees of freedom
Inference problems involving a hypothesis about the variance are
much less common, but may e.g. arise in studies of methods of
measurement
Example continued
Suppose we for some reason want to see if the change in diastolic blood
pressure has a standard deviation of 7, or equivalently a variance of 49.
29
To test the hypothesis $H: \sigma = 7$ we could compute
$(213 - 1)\dfrac{s^2}{49} = (213 - 1)\dfrac{56.68}{49} = 245.24$
and see if this value is extreme when referred to a $\chi^2$-distribution
on 212 degrees of freedom.
Using Stata’s probability calculator, display chi2(212,245.24),
we get .94165889. This is the probability of a value less than or
equal to 245.24. The probability of getting a value larger than 245.24
is 1-.94165889 = .05834111. Stata can also give this result
directly from the command display chi2tail(212,245.24).
The p-value is 2 times the smallest tail probability, i.e. 0.117.
A standard deviation of 7 can not be rejected.
Rule:
If the test statistic, x, is smaller than the degrees of freedom, f, use
display 2*chi2(f,x), else use display 2*chi2tail(f,x)
30
Confidence intervals for variances and standard deviations
A 95% confidence interval for the population variance $\sigma^2$ is given by
$\dfrac{f \cdot s^2}{\chi^2_{0.975}} \leq \sigma^2 \leq \dfrac{f \cdot s^2}{\chi^2_{0.025}}$
where $f$ is the degrees of freedom and $\chi^2_{0.025}$ and $\chi^2_{0.975}$ are the 2.5
and the 97.5 percentiles of a $\chi^2$-distribution with $f$ degrees of freedom.
A 95% confidence interval for the standard deviation therefore
becomes
$s\sqrt{\dfrac{f}{\chi^2_{0.975}}} \leq \sigma \leq s\sqrt{\dfrac{f}{\chi^2_{0.025}}}$
Example – diastolic blood pressure continued
Stata’s probability calculator has a function invchi2 that computes
percentiles of $\chi^2$-distributions. We find that
display invchi2(212,0.025) gives 173.5682
display invchi2(212,0.975) gives 254.2178
31
A 95% confidence interval for the standard deviation is therefore
$7.5289\sqrt{\dfrac{212}{254.2178}} \leq \sigma \leq 7.5289\sqrt{\dfrac{212}{173.5682}}$
i.e. $6.88 \leq \sigma \leq 8.32$
More Stata
A test of a hypothesis about the standard deviation is carried out
by the command
sdtest difdia=7 if grp==1
One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  difdia |     213    1.901408    .5158685    7.528853    .8845197    2.918297
------------------------------------------------------------------------------
    sd = sd(difdia)                                        c = chi2 = 245.2435
Ho: sd = 7   (the hypothesized value)            degrees of freedom =      212

 Ha: sd < 7                 Ha: sd != 7                    Ha: sd > 7
 Pr(C < c) = 0.9417         2*(C > c) = 0.1166             Pr(C > c) = 0.0583
                            (two-sided) p-value
Note that the 95% confidence interval is the confidence interval for
the population mean and not for the standard deviation.
32
STATISTICAL ANALYSIS OF TWO INDEPENDENT
SAMPLES FROM NORMAL DISTRIBUTIONS
Example. Fish oil supplement and blood pressure in pregnant women
The study was a randomized trial carried out to evaluate the effect
of fish oil supplement on diastolic blood pressure in pregnant women.
Pregnant women were assigned at random to one of two treatment
groups. One group received fish oil supplement, the other was a
control group.
Here we shall compare the two treatments using difdia, the change
in diastolic blood pressure, as outcome, or response.
We have already seen histograms and Q-Q plots of the distribution
of difdia in each of the two groups (see p. 12-13) and these plots
suggest that the random variation may be adequately described by
normal distributions.
33
The standard analysis of this problem is based on the following
statistical model
The observations in each group can be considered as a random
sample from a normal distribution with unknown parameters as below:
Group      Mean       Variance
Control    $\mu_1$    $\sigma^2$
Fish oil   $\mu_2$    $\sigma^2$
The two sets of observations are independent.
Note that the size of the random variation is assumed to be the same
in the two groups, so this assumption should also be checked.
The purpose of the analysis is to quantify the difference between the
expected change in the two groups and assess if this difference is
statistically different from 0
34
Model assumptions
1. Independence within and between samples
2. Random samples from populations with the same variance
3. The random variation in each population can be described by a
normal distribution
Note: The model assumptions imply that if this difference is not
statistically different from 0 we may conclude that the distributions are
not significantly different, since a normal distribution is completely
determined by the parameters $\mu$ and $\sigma^2$.
Re 1. Inspect the design and the data. Repeated observations on the
same individual usually imply violation of the independence assumption.
Re 2. A formal test of the hypothesis of identical variances of normal
distributions is described below.
Re 3. Histograms and Q-Q plots, see page 12-13
35
Estimation
Basic idea: population values are estimated by the corresponding
sample values. This gives two estimates of the variance, which should
be pooled to a single estimate.
Stata performs the basic calculations with
bysort grp: summarize difdia
__________________________________________________________________
-> grp = control

    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       213    1.901408    7.528853        -28         29

__________________________________________________________________
-> grp = fish oil

    Variable |       Obs        Mean    Std. Dev.       Min        Max
    ---------+--------------------------------------------------------
      difdia |       217    2.193548    8.364904        -28         31
i.e. control group: mean = 1.90
fish oil group: mean = 2.19
36
The standard deviations are rather similar, so let us assume for
a moment that it is reasonable to derive a pooled estimate. How
should this be done?
Statistical theory shows that the best approach is to compute a pooled
estimate of the variance as a weighted average of the sample
variances and use the corresponding standard deviation as the pooled
estimate. The weighted average uses weights proportional to the
degrees of freedom, i.e. f = n – 1. Hence
$s_{pooled}^2 = s_p^2 = \dfrac{f_1 s_1^2 + f_2 s_2^2}{f_1 + f_2}$
and
$s_{pooled} = s_p = \sqrt{s_p^2}$
Stata does not include this estimate in the output above, but the
result is produced by the commands
quietly regress difdia grp
display e(rmse)
giving the output 7.9617662, i.e. $s_p$ = 7.962.
(Writing quietly in front suppresses output from the command;
the string variable group can not be used here.)
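The pooled estimate can also be checked by hand (a sketch using the sample standard deviations reported on page 36 and the weights f = n - 1 from the formula above):
display ((213-1)*7.528853^2 + (217-1)*8.364904^2)/(213-1+217-1)
display sqrt(((213-1)*7.528853^2 + (217-1)*8.364904^2)/428)
giving a pooled variance of about 63.4 and a pooled standard deviation of about 7.962.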
37
Statistical test
comparing means of two independent samples
The expected change in diastolic blood pressure is slightly higher in
the fish oil group. Does this reflect a systematic effect?
To see if random variation can explain the difference we test the
hypothesis
$H: \mu_1 = \mu_2$
of identical population means in the two samples.
The line of argument is similar to the one that was used in the one-sample
case. Assume that the hypothesis is true. The observed
difference between the two means must then be caused by sampling
variation.
The plausibility of this explanation is assessed by computing a p-value,
the probability of obtaining a result at least as extreme as the observed.
38
From the model assumptions we conclude that if the hypothesis is true
then the difference between the sample means can be considered as
an observation from a normal distribution with mean 0 and variance
$Var(\bar{X}_1 - \bar{X}_2) = \dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2} = \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$
Consequently, the standardized value
$\dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
is an observation from a standard normal distribution. If the standard
deviation $\sigma$ is replaced by the pooled estimate $s_p$ we arrive at the
test statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
39
To derive the p-value this test statistic should be referred to a
t-distribution with $f_1 + f_2 = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$ degrees
of freedom, since one may show that the sampling distribution of the pooled
variance estimate is identical to the sampling distribution of a variance
estimate with $f_1 + f_2$ degrees of freedom (see page 6).
We get
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{1.9014 - 2.1935}{7.9618\sqrt{\dfrac{1}{213} + \dfrac{1}{217}}} = -0.38$
and the p-value becomes 0.70. The difference is not statistically
significantly different from 0.
40
Confidence intervals for the parameters of the model
The model has three unknown parameters $\mu_1$, $\mu_2$, and $\sigma$.
A 95% confidence interval for the expected value $\mu_1$ becomes
$\bar{x}_1 - t_{0.975}\dfrac{s_p}{\sqrt{n_1}} \leq \mu_1 \leq \bar{x}_1 + t_{0.975}\dfrac{s_p}{\sqrt{n_1}}$
and similarly for $\mu_2$. Note that the pooled standard deviation is used
and $t_{0.975}$ is therefore the 97.5 percentile of a t-distribution with $f_1 + f_2$
degrees of freedom. For the change in diastolic blood pressure we get
$1.901 \pm 1.966 \times \dfrac{7.962}{\sqrt{213}}$, i.e. $0.83 \leq \mu_1 \leq 2.97$
Note: some programs, e.g. Stata, use the separate sample standard
deviation when computing these confidence intervals.
A 95% confidence interval for the standard deviation is based on the
pooled estimate with 212 + 216 = 428 degrees of freedom (see page 31)
$7.962\sqrt{\dfrac{428}{487.21}} \leq \sigma \leq 7.962\sqrt{\dfrac{428}{372.57}}$
i.e. $7.46 \leq \sigma \leq 8.53$
41
Confidence intervals for the difference between means
In a two-sample problem the parameter of interest is usually the difference
$\mu_1 - \mu_2$ between the expected values. From the results above
(page 39) we get
$(\bar{x}_1 - \bar{x}_2) - t_{0.975}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \leq \mu_1 - \mu_2 \leq (\bar{x}_1 - \bar{x}_2) + t_{0.975}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where the t-percentile refers to a t-distribution with $f_1 + f_2$ degrees of
freedom.
The example
$(1.901 - 2.194) \pm 1.966 \times 7.962\sqrt{\dfrac{1}{213} + \dfrac{1}{217}}$
i.e. $-1.80 \leq \mu_1 - \mu_2 \leq 1.22$
42
STATA: TWO SAMPLE t-TEST (equal variances)
A single command in Stata gives all the results derived so far except
an estimate of the pooled variance (see page 37)
ttest difdia , by(grp)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+--------------------------------------------------------------------
    diff |             -.2921399    .7679341              -1.801531   1.217252
------------------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)                         t =  -0.3804
Ho: diff = 0   (the hypothesis tested)           degrees of freedom =      428

 Ha: diff < 0               Ha: diff != 0                  Ha: diff > 0
 Pr(T < t) = 0.3519         Pr(|T| > |t|) = 0.7038         Pr(T > t) = 0.6481
                            (two-sided) p-value
(Note: the Std. Dev. in the combined row is the s.d. in the combined
samples, not the pooled s.d.)
43
Comparing the variances: The F-distribution
In the statistical model we assumed the same variance in the two
populations. To assess this assumption we consider a statistical
test of the hypothesis $H: \sigma_1^2 = \sigma_2^2$
An obvious test statistic is the ratio of sample variances
$F = \dfrac{s_1^2}{s_2^2}$
A value close to 1 is expected if the hypothesis is true. Both small and
large values would suggest that the variances differ.
From statistical theory it follows that the distribution of the ratio of two
independent variance estimates is a so-called F-distribution if the
corresponding population variances are identical (i.e. if H is true).
The F-distribution is characterized by a pair of degrees of freedom
(the degrees of freedom for the two variance estimates). Like normal,
t-, and chi-square distributions the F-distributions are extensively
tabulated.
44
Comparing the variances
In practice the hypothesis of equal variances is tested by computing
$F_{obs} = \dfrac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)} = \dfrac{\text{largest variance estimate}}{\text{smallest variance estimate}}$
and the p-value is then obtained as $p = 2 \cdot P(F \geq F_{obs})$ where the pair
of degrees of freedom are those of the numerator and the denominator.
Example
For the change in diastolic blood pressure we have
$F_{obs} = \dfrac{69.97}{56.68} = 1.2344$
Stata’s command display 2*Ftail(216,212,1.2344) returns
0.125, so the p-value becomes 0.125.
The difference between the two standard deviations is not statistically
significant.
45
STATA: COMPARISON OF TWO VARIANCES
Stata’s command sdtest can also be used to compare two variances.
Write
sdtest difdia , by(grp)
Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
------------------------------------------------------------------------------
    ratio = sd(control) / sd(fish oil)                            f =   0.8101
Ho: ratio = 1   (the hypothesis tested)        degrees of freedom = 212, 216

 Ha: ratio < 1              Ha: ratio != 1                 Ha: ratio > 1
 Pr(F < f) = 0.0622         2*Pr(F < f) = 0.1245           Pr(F > f) = 0.9378
                            (two-sided) p-value
46
Comparing the means when variances are unequal
Problem: What if the assumption of equal variances is unreasonable?
Some solutions:
1. Try to obtain homogeneity of variances by transforming the
observations in a suitable way, e.g. by working with log-transformed
data.
2. Use an approximate t-test, that does not rely on equal variances.
The approximate t-test has the form
$t_{approx} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Under the hypothesis of equal means the distribution of this test
statistic is approximately equal to a t-distribution. To compute the
degrees of freedom for the approximate t-distribution first compute
47
$c = \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}$
the degrees of freedom are then obtained as
$f_{approx} = \left( \dfrac{c^2}{n_1 - 1} + \dfrac{(1 - c)^2}{n_2 - 1} \right)^{-1}$
(a numeric check for the fish oil example is sketched after the list below)
3. Use a non-parametric test, e.g. a Wilcoxon-Mann-Whitney test.
We shall consider solution 1 next time and solution 3 later in the
course. The Stata command ttest computes solution 2 if the option
unequal is added.
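As a numeric check of these formulas (a sketch using the sample standard deviations from page 36):
* compute c and the approximate degrees of freedom for the fish oil example
scalar cc = (7.528853^2/213)/(7.528853^2/213 + 8.364904^2/217)
display 1/(cc^2/(213-1) + (1-cc)^2/(217-1))
which gives approximately 424.8, matching Satterthwaite's degrees of freedom in the Stata output on the next page.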
Note: When the variances of the two normal distributions differ
the hypothesis of equal means is no longer equivalent to the
hypothesis of equal distributions.
48
STATA: TWO SAMPLE t-TEST (unequal variances)
To compute the approximate t-test (solution 2 above) with Stata write
ttest difdia , by(grp) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 control |     213    1.901408    .5158685    7.528853    .8845197    2.918297
fish oil |     217    2.193548    .5678467    8.364904    1.074318    3.312778
---------+--------------------------------------------------------------------
combined |     430    2.048837    .3835675    7.953826    1.294932    2.802743
---------+--------------------------------------------------------------------
    diff |             -.2921399    .7671833              -1.800088   1.215808
------------------------------------------------------------------------------
    diff = mean(control) - mean(fish oil)                         t =  -0.3808
Ho: diff = 0                    Satterthwaite's degrees of freedom =  424.831

 Ha: diff < 0               Ha: diff != 0                  Ha: diff > 0
 Pr(T < t) = 0.3518         Pr(|T| > |t|) = 0.7035         Pr(T > t) = 0.6482
                            (two-sided) p-value of the approximate t-test
(Note: the confidence limits in the diff row are the approximate confidence
limits, and 424.831 is the degrees of freedom of the approximate t-test.)
49
SOME GENERAL COMMENTS ON STATISTICAL TESTS
To test a hypothesis we compute a test statistic, which follows a
known distribution if the hypothesis is true. We can therefore compute
the probability of obtaining a value of the test statistic at least as
extreme as the one observed. This probability is called the p-value.
The p-value describes the degree of support for the hypothesis found
in the data. The result of the statistical test is often classified as
”statistically significant” or ”non-significant” depending on whether or
not the p-value is smaller than a level of significance, often called $\alpha$,
and usually equal to 0.05.
The hypothesis being tested is often called the null hypothesis. A
null hypothesis always represents a simplification of the statistical model.
Hypothesis testing is sometimes given a decision theoretic formulation:
The null hypothesis is either true or false and a decision is made based
on the data.
50
When hypothesis testing is viewed as decisions, two types of error are
possible
• Type 1 error: Rejecting a true null hypothesis
• Type 2 error: Accepting (i.e. not rejecting) a false null hypothesis.
The level of significance specifies the risk of a type 1 error. In the
usual setting the null hypothesis is tested against an alternative
hypothesis which includes different values of the parameter, e.g.
$H_0: \mu = 0$   against   $H_A: \mu \neq 0$
The risk of a type 2 error depends on which of the alternative values
is the true value.
The power of a statistical test is 1 minus the risk of type 2 error. When
planning an experiment power considerations are sometimes used
to determine the sample size. We return to this in the last lecture.
Once the data are collected confidence intervals are the appropriate
way to summarize the uncertainty in the conclusions.
51
Relation between p-values and confidence intervals
In a two sample problem it is tempting to compare the 95% confidence
intervals of the two means and conclude that the hypothesis $\mu_1 = \mu_2$ is
non-significant if the 95% confidence intervals overlap.
This is not correct.
Overlapping 95% confidence intervals do not imply that the
difference is not significant at the 5% level. On the other hand, if the
95% confidence intervals do not overlap, the difference is statistically
significant at the 5% level (actually, the p-value is 1% or smaller).
This may at first seem surprising, but it is a simple consequence of the
fact that for independent samples the result
$Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$
implies that
$se(\bar{x} - \bar{y}) \leq se(\bar{x}) + se(\bar{y})$
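A small numerical illustration (a sketch assuming, hypothetically, standard errors of 1 in both groups): non-overlapping 95% confidence intervals require a difference of at least $1.96 \times (1 + 1) = 3.92$, whereas the corresponding test statistic is only $3.92/\sqrt{1^2 + 1^2} = 2.77$, and
display 2*(1 - norm(3.92/sqrt(2)))
returns approximately 0.006, i.e. 1% or smaller, as stated above.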
52