Biostatistics course Part 9
Comparison between two means
Dr. Sc Nicolas Padilla Raygoza
Department Nursing and Obstetrics
Division Health Sciences and Engineering
Campus Celaya-Salvatierra
University of Guanajuato, Mexico
Biosketch
 Medical Doctor, Autonomous University of Guadalajara.
 Pediatrician, certified by the Mexican Council of Certification in Pediatrics.
 Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
 Master of Science in Epidemiology, Atlantic International University.
 Doctorate in Science in Epidemiology, Atlantic International University.
 Titular Professor A, full time, University of Guanajuato.
 Level 1, National System of Researchers.
Competencies
 The reader will apply a Z test to make inferences from the comparison of two paired means.
 He (she) will apply a Z test to make inferences about two independent means.
 He (she) will apply a t test to make inferences about a mean of differences in a small sample.
 He (she) will apply a t test to make inferences about two independent means in small samples.
 He (she) will obtain a confidence interval for the difference between two independent means and for a mean of differences.
Introduction
 Often we want to compare two groups.
 The statistical methods used to compare two means depend on how these means were obtained.
 The data can come from paired or non-paired samples.
Paired data
 How do we obtain paired data?
 Paired samples occur when a first measurement is matched with a second measurement in the same subject.
 For quantitative data, this usually happens when there are repeated measurements on the same person.
Example
 In a study to determine whether birth weight measurements are adequate, we compared the birth weights of newborns from a hospital in Celaya, Gto.
 The measurements were made by different people and, to control measurement bias, each observer was blinded to the other observer's measurement.
Non-paired data
 How do we obtain non-paired data?
 We get non-paired data when the observations in one sample are independent of the observations in the other sample.
Example
 To study the effect of a new drug on the parasitic burden of Ascaris lumbricoides, patients were randomized to receive nitazoxanide (group A) or albendazole (group B).
 The effect of the drug in each group was measured and compared.
 In the analysis of paired data, by contrast, we calculate the difference between the first and second measurement. This gives us a sample of differences, to which we apply the methods of analysis for a single mean of quantitative data.
Analysis of quantitative paired data
 When analyzing paired data, you must first calculate the difference between the two measurements in the same subject.
 We measured the birth weights of newborns in Celaya with two observers.

| Patient | Observer 1 (g) | Observer 2 (g) | Difference d (g) |
|---------|----------------|----------------|------------------|
| 1       | 2970           | 3010           | -40              |
| 2       | 3525           | 3650           | -125             |
| 3       | 3100           | 3125           | -25              |
| 4       | 2750           | 2550           | 200              |
| 5       | 4000           | 4050           | -50              |
| 6       | 3200           | 3300           | -100             |
| 7       | 3000           | 3000           | 0                |
| 8       | 2500           | 2700           | -200             |
| 9       | 3200           | 3400           | -200             |
| 10      | 3900           | 3700           | 200              |
Analysis of quantitative paired data
 To assess the difference between paired measurements we can calculate the mean of the differences and its confidence interval; we can also test whether the mean of the differences is significantly different from 0.
 The notation we use for the mean of the differences and the standard deviation, in the population and in the sample, is:

|                     | Population | Sample    |
|---------------------|------------|-----------|
| Mean of differences | $\delta$   | $\bar{d}$ |
| Standard deviation  | $\sigma$   | $s$       |
Confidence interval
 If there is no difference between the paired measurements, the population mean of the differences will be 0.
 To calculate the confidence interval for the mean of the differences and to test the hypothesis that it is equal to 0, we need to know:
   the mean of the differences,
   the standard deviation of the differences,
   the standard error of the mean of the differences.
Confidence interval
 We can estimate the confidence interval around the sample mean of the differences in the same way as we did for a single mean.
 The 95% confidence interval tells us that we can be 95% confident that the true mean of the differences in the population lies within the interval built around the sample mean of the differences.
Confidence interval
 The general formula for the 95% confidence interval is:
  sample estimate ± 1.96 × SE of the sample estimate
 The 95% confidence interval for the mean of the differences is therefore:
$$\bar{d} \pm 1.96 \times \frac{s}{\sqrt{n}}$$
 $\bar{d}$ is the mean of the differences in the sample, $s$ is the standard deviation of the differences and $n$ is the number of pairs.
 1.96 is the multiplier used for a 95% confidence interval.
 For a 90% confidence interval, the multiplier is 1.64.
Example
 95% confidence interval:
   Mean of the differences in birth weight: $\bar{d} = -34.0$ g
   $s = 140.94$ g
   $SE = 140.94 / \sqrt{10} \approx 44.6$ g
   $-34 \pm 1.96 \times 44.6$ = -121.42 to 53.42 g
Example
 90% confidence interval:
   Mean of the differences in birth weight: $\bar{d} = -34.0$ g
   $s = 140.94$ g
   $SE = 140.94 / \sqrt{10} \approx 44.6$ g
   $-34 \pm 1.64 \times 44.6$ = -107.14 to 39.14 g
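The confidence intervals for the birth weight example can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names and the helper `normal_ci` are illustrative and not part of the course material. The differences are recomputed from the two observers' columns, and because the standard error is not rounded before use, the limits differ slightly from the hand calculation.

```python
import math
import statistics

# Birth weights (g) recorded by the two observers for the 10 newborns.
observer_1 = [2970, 3525, 3100, 2750, 4000, 3200, 3000, 2500, 3200, 3900]
observer_2 = [3010, 3650, 3125, 2550, 4050, 3300, 3000, 2700, 3400, 3700]

# Paired analysis: one difference per newborn (observer 1 minus observer 2).
differences = [a - b for a, b in zip(observer_1, observer_2)]

n = len(differences)
mean_d = statistics.mean(differences)   # mean of the differences
sd_d = statistics.stdev(differences)    # sample standard deviation (n - 1 denominator)
se_d = sd_d / math.sqrt(n)              # standard error of the mean of the differences


def normal_ci(estimate, se, multiplier):
    """Return (lower, upper) for estimate ± multiplier × SE."""
    return estimate - multiplier * se, estimate + multiplier * se


ci_95 = normal_ci(mean_d, se_d, 1.96)
ci_90 = normal_ci(mean_d, se_d, 1.64)

print(f"mean = {mean_d:.1f} g, s = {sd_d:.2f} g, SE = {se_d:.2f} g")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f} g")
print(f"90% CI: {ci_90[0]:.2f} to {ci_90[1]:.2f} g")
```

Changing the multiplier passed to `normal_ci` is all that is needed to move between confidence levels.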
Hypothesis test for a mean of differences
 A 95% confidence interval gives a range around the sample mean of the differences; in 95% of samples, an interval built this way includes the mean of the differences in the population.
 With a hypothesis test we can also calculate how probable a mean of differences at least as extreme as the one observed would be if, on average, there were no difference between the paired observations in the population.
Hypothesis test for a mean of differences
 The null hypothesis is that the mean of the differences in the population is zero:
  H0: δ = 0
 This is equivalent to saying that the sampling distribution of the mean of the differences is Normal with mean 0 and a standard error that depends on the standard deviation of the differences in the population.
 The alternative hypothesis is that the mean of the differences in the population is not zero:
  Ha: δ ≠ 0
Hypothesis test for a mean of differences
 Hypothesis test:
 To test the null hypothesis, we calculate the Z statistic:
$$z = \frac{\text{sample mean of the differences} - \text{hypothesized mean of the differences}}{\text{standard error of the sample mean of the differences}} = \frac{\bar{d} - 0}{SE(\bar{d})}$$
 where the hypothesized mean of the differences is zero.
Hypothesis test for a mean of differences
 The value of z calculated in the hypothesis test tells us how many standard errors the observed mean lies from the center of the distribution defined by the null hypothesis.
$$z = \frac{\bar{d} - 0}{s / \sqrt{n}}$$
Example
 We have seen that the mean of the differences in weight in 10 babies was -34 g, with s = 140.94 g and a 95% confidence interval of -121.42 to 53.42 g.
 We want to find out whether the measurements taken by the two observers were really different.
Example
 We should state the null hypothesis:
   "On average, all possible measurements taken by the two observers are equal", or
   the mean of the differences in the population is zero.
 The alternative hypothesis is that the mean of the differences in the population is not zero.
Example
 To test the hypothesis, we calculate:
$$z = \frac{-34 - 0}{44.60} = -0.76$$
 Assuming that the mean of the differences is Normally distributed with mean zero, the test result says that the estimated mean of the differences lies 0.76 standard errors below the center of the distribution.
 Looking up a Z value of -0.76 in two-tailed tables of the Normal distribution gives a p-value of about 0.45.
 The conclusion is that we cannot reject the null hypothesis: sampling variation is a likely explanation for the observed mean of the differences.
How to obtain the p-value
 In the table of the Z (standard Normal) distribution, we look up the Z value obtained in our test and read the corresponding p-value in the column to its right.
 This table can be found in textbooks of biostatistics.
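When no printed table is at hand, the two-tailed p-value can also be computed directly. The sketch below is an illustration with the Python standard library, not the method used in the course; it takes the mean of the differences and the rounded standard error from the birth weight example and relies on the identity P(|Z| ≥ |z|) = erfc(|z| / √2) for the standard Normal distribution.

```python
import math

mean_d = -34.0        # mean of the differences (g), from the birth weight example
se_d = 44.60          # standard error (g), rounded as in the hand calculation
hypothesized = 0.0    # value of the mean of the differences under H0

# Z statistic: distance of the observed mean from H0, in standard errors.
z = (mean_d - hypothesized) / se_d

# Two-tailed p-value from the standard Normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-tailed p = {p_value:.2f}")
```

A p-value this large is consistent with the conclusion that the null hypothesis cannot be rejected.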
Small paired samples
 When the sample size is small, the sampling distribution of the mean is not exactly Normal but follows the t distribution, because the standard deviation is itself estimated from the sample.
 Therefore, if the sample size is small (less than 50), we use values from the t distribution to calculate the confidence interval and to carry out the hypothesis test.
Confidence interval for paired sample
 The formula for the 95% confidence interval is:
  estimate ± $t_{0.05}$ × SE
 where the estimate is the mean of the differences, and
 $t_{0.05}$ is the value of the t distribution for a two-tailed p of 0.05 with n - 1 degrees of freedom.
 The first column of the t table gives the degrees of freedom, here n - 1. We move to the right along that row until we reach the column for 0.05; that value is the multiplier used for the confidence interval.
Hypothesis test for small paired samples
 The formula for the hypothesis test is:
$$t = \frac{\bar{d} - 0}{SE(\bar{d})}$$
 The formula is the same as for the Z test, except that the result is looked up in the table of the t distribution to obtain the p-value.
 In the row for the degrees of freedom (n - 1, first column), we search to the right for the calculated t value and read the p-value at the top of that column.
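As an illustration of the small-sample procedure, the sketch below applies it to the ten birth weight differences from the paired example, which the slides analyze only with the Z test. It assumes the two-tailed 5% multiplier for 9 degrees of freedom, 2.262, read from a standard t table and hard-coded here; the variable names are illustrative.

```python
import math
import statistics

# Differences (g) between observer 1 and observer 2 for the 10 newborns
# (patient 6: 3200 - 3300 = -100).
differences = [-40, -125, -25, 200, -50, -100, 0, -200, -200, 200]

n = len(differences)
df = n - 1                                   # degrees of freedom for paired data
mean_d = statistics.mean(differences)
se_d = statistics.stdev(differences) / math.sqrt(n)

# Assumption: two-tailed 5% critical value of t for 9 degrees of freedom,
# taken from a standard t table.
T_CRIT_05_DF9 = 2.262

# 95% confidence interval using the t multiplier instead of 1.96.
ci_95 = (mean_d - T_CRIT_05_DF9 * se_d, mean_d + T_CRIT_05_DF9 * se_d)

# t statistic for H0: population mean of the differences = 0.
t_stat = (mean_d - 0) / se_d
significant = abs(t_stat) > T_CRIT_05_DF9    # True would mean p < 0.05

print(f"df = {df}, t = {t_stat:.2f}")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f} g")
print("p < 0.05" if significant else "p > 0.05")
```

The interval is wider than the one obtained with the multiplier 1.96, which is the price of estimating the standard deviation from only ten pairs.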
Analysis of independent samples
 This differs from the analysis of paired data because we look at the difference between two independent means rather than at the mean of the differences between paired observations.
 Example: do smokers have a different blood pressure than non-smokers?
 In a sample of smokers and non-smokers, systolic blood pressure averaged 148 in smokers and 138 in non-smokers.
 The difference between the averages is 148 - 138 = 10.
Analysis of independent samples
 Notation:
 Because we are observing two independent populations and need two samples, we need additional notation, shown in the table below.
 Remember that we use Greek letters for population parameters and Latin letters for sample estimates.
 The subscripts distinguish between sample 1 and sample 2, and between population 1 and population 2.

|                    | Population 1 | Population 2 | Sample 1    | Sample 2    |
|--------------------|--------------|--------------|-------------|-------------|
| Mean               | $\mu_1$      | $\mu_2$      | $\bar{X}_1$ | $\bar{X}_2$ |
| Standard deviation | $\sigma_1$   | $\sigma_2$   | $s_1$       | $s_2$       |
Sampling distribution for two independent samples
 The sampling distribution of the difference between two independent means is found using the same procedure as for a single sample.
 We repeatedly take random samples of size $n_1$ and $n_2$ from the two populations; each time we calculate the means ($\bar{X}_1$, $\bar{X}_2$) and standard deviations ($s_1$, $s_2$) of the two samples, and then the difference between the means of that pair of samples.
 The result is a sampling distribution of differences between two independent means.
Sampling distribution for two independent samples
 Generating this distribution, we see that:
1.- The mean of the sampling distribution is the population value, that is, the difference between the two means in the population.
2.- The standard deviation of the sampling distribution depends on the sample sizes $n_1$ and $n_2$.
3.- The shape of the distribution gets closer to Normal as $n_1$ and $n_2$ increase.
 We know that the sampling distribution of any sample estimate can be inferred from the data collected in a single sample.
 The same principle applies here: the sampling distribution of the difference of means can be inferred from a single pair of samples. To do this, we need:
   the difference between the two sample means,
   the standard error of the difference between the two sample means.
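The repeated-sampling idea described above can be imitated with a small simulation. This is only a sketch: the two Normal populations, their means and standard deviations, the sample sizes and the number of repetitions are all hypothetical values chosen for illustration, not data from the course.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical populations and sample sizes (illustrative values only).
mu1, sigma1, n1 = 50.0, 10.0, 30
mu2, sigma2, n2 = 45.0, 12.0, 40
repetitions = 5000


def sample_mean(mu, sigma, n):
    """Draw one random sample of size n from a Normal population and return its mean."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])


# One difference of means per pair of samples.
diffs = [
    sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
    for _ in range(repetitions)
]

sim_mean = statistics.fmean(diffs)   # should be close to mu1 - mu2
sim_sd = statistics.stdev(diffs)     # should be close to the theoretical SE
theory_se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"mean of simulated differences: {sim_mean:.2f} (population difference {mu1 - mu2:.2f})")
print(f"SD of simulated differences:   {sim_sd:.2f} (theoretical SE {theory_se:.2f})")
```

The standard deviation of the simulated differences plays the role of the standard error of the difference between the two means.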
Standard error for the distribution of differences of means
 The standard error of the difference between two independent means combines the standard errors of the two independent sampling distributions.
 We know that the standard error of the mean of one sample is:
$$SE = \frac{s}{\sqrt{n}}$$
 The variance of the mean is the square of the standard error:
$$\text{Variance} = \frac{\sigma^2}{n}$$
Standard error for the distribution of differences of means
 One can show that the variance of the difference between two independent means is equal to the sum of the variances of the two sample means. Since
$$SE(\bar{X}_1) = \frac{\sigma_1}{\sqrt{n_1}}, \qquad SE(\bar{X}_2) = \frac{\sigma_2}{\sqrt{n_2}}$$
we have
$$\text{Variance}(\bar{X}_1 - \bar{X}_2) = \text{Variance}(\bar{X}_1) + \text{Variance}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
 The variances are added because each sample contributes to the sampling error of the distribution of differences.
 The standard error of the difference between two independent means is then:
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Standard error for the distribution of differences of means
 In most situations we do not know the standard deviations of the populations ($\sigma_1$ and $\sigma_2$); in practice, we use the standard deviations of the samples ($s_1$ and $s_2$), so that:
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Confidence interval for the difference of two means
 Assuming that the sampling distribution of $(\bar{X}_1 - \bar{X}_2)$ is Normal, we can calculate a confidence interval for the difference of two means using the general formula:
  difference of means ± 1.96 × SE$(\bar{X}_1 - \bar{X}_2)$
 For a 95% confidence interval, assuming a Normal distribution:
$$(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Example
 In a study to evaluate the efficacy of oral rehydration solution (ORS) in children with acute diarrhea, 40 children were in the treatment group and 40 children in the control group. We measured the duration of diarrhea in hours and calculated its mean and standard deviation in each group.

| Group     | n  | Mean duration of diarrhea (hours) | s (hours) |
|-----------|----|-----------------------------------|-----------|
| Treatment | 40 | 72                                | 10        |
| Control   | 40 | 120                               | 12        |
Example
 To calculate the 95% confidence interval for the difference between the means of independent samples, we need the difference between the means and its standard error:
$$\bar{X}_1 - \bar{X}_2 = 72 - 120 = -48 \text{ hours}$$
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{10^2}{40} + \frac{12^2}{40}} = \sqrt{2.5 + 3.6} = 2.47$$
$$95\%\ CI = -48 \pm 1.96 \times 2.47 = -52.84 \text{ to } -43.16 \text{ hours}$$
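The same calculation can be written as a short Python sketch, using only the group sizes, means and standard deviations given in the ORS example; the variable names are illustrative.

```python
import math

# Summary data from the ORS example (duration of diarrhea in hours).
n1, mean1, s1 = 40, 72.0, 10.0    # treatment group
n2, mean2, s2 = 40, 120.0, 12.0   # control group

# Difference between the two independent means.
diff = mean1 - mean2

# Standard error of the difference: square root of the sum of the
# two squared standard errors.
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

# 95% confidence interval using the Normal multiplier 1.96.
lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff

print(f"difference = {diff:.0f} h, SE = {se_diff:.2f} h")
print(f"95% CI: {lower:.2f} to {upper:.2f} h")
```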
Example
 The difference between the means was -48 hours, with a standard error of 2.47.
 The 95% confidence interval tells us that we can be 95% confident that the difference between the mean durations of diarrhea in the population is between -52.84 hours and -43.16 hours.
 Because the interval does not include zero, we can say that the difference between the means is statistically significant.
Hypothesis test for two independent means
 To calculate the probability (p-value) of observing a difference at least as large as ours if the two population means were equal, we use the Z test.
 We use the Z test in the same way as for the mean of the differences in paired samples:
   The null hypothesis is that the two means are equal: H0: μ1 - μ2 = 0
   The alternative hypothesis is: H1: μ1 - μ2 ≠ 0
 The formula for the Z test is then:
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)}, \qquad SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Example
 We apply the hypothesis test to the oral rehydration solution study, with the null hypothesis that the mean duration of diarrhea is the same in the two groups.
 The difference between the means is -48 hours and the standard error is 2.47.
$$z = \frac{-48 - 0}{2.47} = -19.43$$
 This tells us that the observed difference lies 19.43 standard errors below the center of the distribution (0).
 The p-value for z = -19.43 is < 0.0001.
 If there were no difference in the duration of diarrhea, there would be only a very small chance (p < 0.0001) of observing a difference as extreme as the one observed.
 We can therefore say that the means are very likely different: the mean duration of diarrhea in the ORS group differs significantly from that in the control group.
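A minimal Python sketch of this Z test follows, using the summary data of the ORS example. The tail probability is computed with `math.erfc`, which stays accurate this far out in the tail; the variable names are illustrative, and the exact p-value printed is far smaller than any table can show.

```python
import math

# Summary data from the ORS example (duration of diarrhea in hours).
n1, mean1, s1 = 40, 72.0, 10.0    # treatment group
n2, mean2, s2 = 40, 120.0, 12.0   # control group

se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Z statistic for H0: mu1 - mu2 = 0.
z = ((mean1 - mean2) - 0) / se_diff

# Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}")
print(f"two-tailed p = {p_value:.3g}")
```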
Small samples with two independent samples
 When comparing two independent samples that are small, we use the t distribution instead of the Normal distribution to calculate confidence intervals and test hypotheses.
 The procedure is similar to the one we used for data from one sample, with one exception: the calculation of the standard error.
 The common variance: with small samples, we estimate a common variance using the data from the two independent samples. It is the average of the two sample variances, weighted by their degrees of freedom:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$
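Small samples with two independent samples
 The standard error of the difference between the sample means is:
$$SE(\bar{X}_1 - \bar{X}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $s$ is the square root of the common variance.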
Example
 In a study of the treatment of iron deficiency anemia with two different types of iron, the students of a village school were randomized to receive one of the two treatments.
 Initially, hemoglobin (Hb) levels, in g/dL, were similar in both groups.
 After 3 months of treatment, the Hb levels were measured again.
Example
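| Hemoglobin | n  | Mean (g/dL) | s (g/dL) |
|------------|----|-------------|----------|
| Iron A     | 15 | 14.8        | 0.5      |
| Iron B     | 13 | 12.1        | 1.1      |

95% confidence interval = difference of means ± multiplier $t_{0.05}$ × SE
Multiplier $t_{0.05}$ with $n_1 + n_2 - 2 = 26$ degrees of freedom = 2.056
$$s^2 = \frac{(15 - 1)(0.5^2) + (13 - 1)(1.1^2)}{(15 - 1) + (13 - 1)} = \frac{3.5 + 14.52}{26} = \frac{18.02}{26} = 0.69$$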
Example
$$SE = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{0.69} \times \sqrt{\frac{1}{15} + \frac{1}{13}} = 0.83 \times 0.379 \approx 0.32$$
95% confidence interval = (14.8 - 12.1) ± 2.056 × 0.32 = 2.7 ± 0.66 = 2.04 to 3.36 g/dL
Example
H0: μ1 = μ2, or μ1 - μ2 = 0
HA: μ1 ≠ μ2, or μ1 - μ2 ≠ 0
$$t = \frac{(14.8 - 12.1) - 0}{0.32} = 8.44$$
Degrees of freedom: $n_1 + n_2 - 2 = 26$; p < 0.05 (in fact p < 0.001)
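The whole small-sample calculation for the hemoglobin example can be sketched in Python as follows. It assumes the two-tailed 5% multiplier for 26 degrees of freedom, 2.056, as read from the t table; the variable names are illustrative. Because the standard error is not rounded to 0.32 before use, the interval and the t value differ slightly from the hand calculation.

```python
import math

# Summary data from the hemoglobin example (g/dL).
n1, mean1, s1 = 15, 14.8, 0.5   # Iron A
n2, mean2, s2 = 13, 12.1, 1.1   # Iron B

df = n1 + n2 - 2                # degrees of freedom for two independent samples

# Common (pooled) variance: average of the two sample variances,
# weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference between the two means.
se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)

# Assumption: two-tailed 5% critical value of t for 26 degrees of freedom,
# taken from the t table.
T_CRIT_05_DF26 = 2.056

diff = mean1 - mean2
ci_95 = (diff - T_CRIT_05_DF26 * se_diff, diff + T_CRIT_05_DF26 * se_diff)

# t statistic for H0: mu1 - mu2 = 0.
t_stat = (diff - 0) / se_diff

print(f"pooled variance = {pooled_var:.2f}, SE = {se_diff:.3f}, df = {df}")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f} g/dL")
print(f"t = {t_stat:.2f}")
```

Since the t value is far above the multiplier 2.056, the conclusion p < 0.05 does not depend on the rounding.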