CTSI BERD Research Methods Seminar Series
Statistical Analysis II
Mosuk Chow, Ph.D.
Senior Scientist and Professor
Statistics Department
University Park
November 8, 2016
Basic statistical concepts
(from Stat I)
Descriptive statistics (numeric/graphical)
Population distribution vs. sampling distribution
Standard Deviation vs. Standard Error
Estimation of population mean
Confidence interval
Hypothesis testing
P-value
Outline for Stat II
Estimate population proportion
Paired design
1-sample t
Non-paired design
2-sample t
Pooled variance versus non-pooled variance
Estimation of population proportion (p)
Examples:
Proportion of patients who became infected
Proportion of patients who are cured
Proportion of individuals positive on a blood test
Proportion of adverse drug reactions
Proportion of premature infants who survive
Sampling Distribution of Sample Proportion
The sampling distribution of the sample proportion can be
approximated by a normal distribution when the sample
size is sufficiently large (central limit theorem)
The standard error of a sample proportion $\hat{p}$ is
estimated by:
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
95% Confidence Interval for a Proportion
$$\hat{p} \pm 2 \times SE(\hat{p})$$
The rule of thumb for good normal approximation is
$n\hat{p} \ge 5$ and $n(1-\hat{p}) \ge 5$
Example
In a study of 200 patients, 90 patients experienced
adverse drug reactions
The estimated proportion who experience an
adverse drug reaction is
$$\hat{p} = \frac{90}{200} = 0.45$$
95% confidence interval for the population
proportion is
$$0.45 \pm 2\sqrt{\frac{0.45 \times 0.55}{200}} = (0.38, 0.52)$$
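The arithmetic in this example can be checked with a short Python sketch. The function name `proportion_ci` and its `multiplier` argument are illustrative choices, not part of the seminar; the multiplier of 2 and the rule-of-thumb check follow the slides.

```python
import math

def proportion_ci(successes, n, multiplier=2.0):
    """Approximate CI for a proportion: p_hat +/- multiplier * SE(p_hat)."""
    p_hat = successes / n
    # Rule of thumb from the slides: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - multiplier * se, p_hat + multiplier * se)

# 90 of 200 patients experienced an adverse drug reaction.
p_hat, se, (lower, upper) = proportion_ci(90, 200)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# prints: p_hat = 0.45, SE = 0.0352, 95% CI = (0.38, 0.52)
```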
Paired design
Self-pairing:
Measurements are taken at two distinct points in
time from a single subject (e.g. Before vs. After)
Matched pairs (e.g., twins, eyes, subjects matched
on important characteristics such as age and
gender)
Why pairing?
Control extraneous noise
Control confounding factors that affect the
comparison
Make comparison more precise
Example: Blood Pressure and Oral
Contraceptive Use (n=10 women)
Paired samples: 1st sample = BP Before OC, 2nd sample = BP After OC

Participant    BP Before OC   BP After OC   After-Before
1              126            132            6
2              105            109            4
3              104            102           -2
4              115            117            2
…
Sample Mean:   115.6          120.4
Sample Standard Deviation:
               Sb=11.3        Sa=13.1
Example (cont.)
Scientific questions:
What is the mean change in blood pressure after
oral contraceptives (OC) use in a population of
women who use OC?
Estimate the mean change by a confidence
interval approach
Is there any change in mean blood pressure after
oral contraceptives use in a population of women
who use OC?
Hypothesis testing
Inference on mean change
Due to the design of the study, we can
reduce the BP information on two samples
(women’s BP prior to OC use and the same
subject’s BP after OC use) into one piece of
information: information on the differences in
BP between the time points for the same
subject.
Perform the one sample inference on the
difference for the relevant research question.
Inference on mean change
Reduce the BP information on two samples
(women prior to OC use, women after OC use)
into one piece of information: information on the
differences in BP between the time points.
The sample average of the differences: $\bar{x}_{diff}$
Sample standard deviation of the differences: $S_d$
95% confidence interval for mean change in BP:
$$\bar{x}_{diff} \pm t_{n-1,\,0.975}\,\frac{S_d}{\sqrt{n}}$$
where n is the number of pairs and $t_{n-1,\,0.975}$ is the critical
value from the t distribution with df = n-1.
The sample average of the differences is 4.8, which can
also be obtained by
$$\bar{x}_{diff} = \bar{x}_{after} - \bar{x}_{before}$$
(4.8 = 120.4 – 115.6)
The sample standard deviation of the differences is
$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (x_{diff,i} - \bar{x}_{diff})^2}{n-1}} = 4.6$$
Example: Blood Pressure and Oral
Contraceptive Use (n=10 women)
Participant    BP Before OC              BP After OC              After-Before
1              126                       132                       6
2              105                       109                       4
3              104                       102                      -2
4              115                       117                       2
…
Sample Mean:   $\bar{x}_{before}$ = 115.6   $\bar{x}_{after}$ = 120.4   $\bar{x}_{diff}$ = 4.8
SD:            Sb=11.3                   Sa=13.1                  Sd=4.6
95% CI of mean change in BP
$$4.8 \pm t_{9,\,0.975}\,\frac{S_d}{\sqrt{n}} = 4.8 \pm 2.26 \times \frac{4.6}{\sqrt{10}} = 4.8 \pm 2.26 \times 1.45$$
1.52 to 8.08
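As a rough check of the paired interval, here is a minimal Python sketch working from the summary statistics on the slide (mean difference 4.8, Sd = 4.6, n = 10 pairs, critical value 2.26). The function name `paired_t_ci` is a hypothetical helper, and the critical value is passed in rather than computed.

```python
import math

def paired_t_ci(mean_diff, sd_diff, n_pairs, t_crit):
    """CI for the mean paired difference: mean_diff +/- t_crit * sd_diff / sqrt(n)."""
    se = sd_diff / math.sqrt(n_pairs)  # standard error of the mean difference
    half_width = t_crit * se
    return mean_diff - half_width, mean_diff + half_width

# Summary statistics and critical value t(9, 0.975) = 2.26 as given in the slides.
lower, upper = paired_t_ci(mean_diff=4.8, sd_diff=4.6, n_pairs=10, t_crit=2.26)
print(f"{lower:.2f} to {upper:.2f}")
```

Without intermediate rounding this gives roughly 1.51 to 8.09. The slide rounds the standard error to 1.45 before multiplying, which is why it reports 1.52 to 8.08.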
Notes
The number 0 is NOT in the confidence interval
(1.52, 8.08)
Because 0 is not in the interval, this suggests there
is a non-zero change in BP over time.
The BP change could be due to factors other than
oral contraceptives.
A control group of comparable women who were
not taking oral contraceptives but taking the
placebo would strengthen this study.
Comparison of Two Independent Samples
A Low Carbohydrate as Compared with a
Low Fat Diet in Severe Obesity [1]
132 severely obese participants
randomized to one of two diet groups
Participants followed for a six-month
period
At the end of the study period
Participants on the low carbohydrate diet
lost more weight than those on a low fat
diet.
[1] Samaha, F., et al. A Low-Carbohydrate as Compared with a Low-Fat
Diet in Severe Obesity, New England Journal of Medicine 348;21
Comparison of Two Independent Samples
Diet Group                                  Low Fat   Low Carb
Number of Subjects                          68        64
Mean Weight Change (kg),                    -1.8      -5.7
  post-diet less pre-diet
Standard Deviation of Weight Change (kg)    3.9       8.6

Is weight loss associated with diet type?
Comparison of Two Independent Samples
In statistical terms, is there a difference in the
average weight loss for the participants on the low
fat diet as compared to participants on the low
carbohydrate diet?
Although there are paired pre/post measurements
on each participant, the comparison of interest is
not paired.
For each participant we compute a change in
weight (after diet weight minus before diet weight)
However, we are comparing the changes in weight
between two independent diet groups.
Comparison of Two Independent Samples
We have two samples: $\{x_{11}, x_{12}, x_{13}, \ldots, x_{1n_1}\}$
and $\{x_{21}, x_{22}, x_{23}, \ldots, x_{2n_2}\}$ drawn from
populations with means $\mu_1$ and $\mu_2$ and
variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
The two samples are independent; there is no
pairing of observations.
We would like to estimate the difference of the
population means, $\mu_2 - \mu_1$.
Using the confidence interval, we can decide
whether the two means are different.
Comparison of Two Independent Samples
We know our best estimate for the
mean (of a single population) is the
sample mean, $\bar{x}$.
It would seem sensible to estimate $\mu_1$
with $\bar{x}_1$, and $\mu_2$ with $\bar{x}_2$,
and $\mu_2 - \mu_1$ with $\bar{x}_2 - \bar{x}_1$.
Sampling Distribution of the Difference
in Sample Means
Since we have largish samples (both
greater than 30), we know the sampling
distributions of the sample means in
both groups are approximately normal.
It turns out that the difference of two
independent quantities, each (approximately)
normally distributed, is also (approximately)
normally distributed.
Sampling Distribution of the Difference
in Sample Means
So, the good news is . . .
The sampling distribution of the difference of
two sample means, each based on large
samples, approximates a normal distribution.
This sampling distribution is centered at the
true mean difference, µ2 - µ1.
Confidence Interval for ($\mu_2 - \mu_1$)
We can construct a confidence interval for
$\mu_2 - \mu_1$ using the (pivotal) quantity
$$T = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\text{Standard Error}(\bar{X}_2 - \bar{X}_1)}$$
Two Independent (Unpaired) Samples
The standard error of the difference
for two independent samples is
calculated differently than we did for
paired designs.
The formula for the standard error of
the difference depends on the
sample sizes in both groups and
standard deviations in both groups.
Comparison of Two Independent
Samples
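The formula is
$$\sigma_{\bar{x}_2 - \bar{x}_1} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$
If we follow the same reasoning we did for the
one sample case, we could substitute $s_1$ and $s_2$
for $\sigma_1$ and $\sigma_2$, respectively, to give an estimate
of $\sigma_{\bar{x}_2 - \bar{x}_1}$:
$$s_{\bar{x}_2 - \bar{x}_1} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$$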
Comparison of Two Independent
Samples
The distribution of
$$T_s = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
can be approximated by the t distribution where
the degrees of freedom are calculated as
$$d = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$$
You may see this referred to as Welch’s or
Satterthwaite’s approximation.
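To make the degrees-of-freedom formula concrete, the sketch below evaluates the unpooled standard error and the Welch/Satterthwaite degrees of freedom for the diet study's summary statistics (low fat: s = 3.9, n = 68; low carb: s = 8.6, n = 64). The function name `welch_se_df` is an illustrative assumption.

```python
import math

def welch_se_df(s1, n1, s2, n2):
    """Unpooled standard error and Welch/Satterthwaite degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Group 1 = low fat, group 2 = low carb (summary statistics from the diet study).
se, df = welch_se_df(s1=3.9, n1=68, s2=8.6, n2=64)
print(f"SE = {se:.3f}, df = {df:.1f}")
```

For these inputs the standard error is about 1.17 and the degrees of freedom come out near 87, well below the n1 + n2 - 2 = 130 that an equal-variance analysis would use.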
Confidence Interval for ($\mu_2 - \mu_1$)
We can construct a confidence interval for $\mu_2 - \mu_1$
using the (pivotal) quantity
$$\frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
An approximate $(1-\alpha)100\%$ confidence interval
is given by
$$(\bar{X}_2 - \bar{X}_1) \pm t_{d,\,1-\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$$
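A minimal sketch of the unpooled interval applied to the diet study follows. The helper name `welch_ci` is hypothetical, and the critical value 1.99 is an assumed approximate t-table value for roughly 87 degrees of freedom; the slides do not state it.

```python
import math

def welch_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Unpooled CI for mu2 - mu1: (mean2 - mean1) +/- t_crit * sqrt(s1^2/n1 + s2^2/n2)."""
    diff = mean2 - mean1
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - t_crit * se, diff + t_crit * se

# Group 1 = low fat, group 2 = low carb.
# t_crit = 1.99 is an ASSUMED approximate table value, not taken from the slides.
lower, upper = welch_ci(-1.8, 3.9, 68, -5.7, 8.6, 64, t_crit=1.99)
print(f"({lower:.1f}, {upper:.1f}) kg")
```

Under that assumption the interval lies entirely below zero, consistent with greater weight loss in the low carbohydrate group.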
Comparison of Two Independent Samples with
equal variance ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
If $\sigma_1^2$ and $\sigma_2^2$ are unknown, but equal to a
common value $\sigma^2$, we could “pool” our samples
to obtain an estimate of $\sigma^2$ to estimate the
standard error of the difference in sample
means:
$$\sigma_{\bar{x}_2 - \bar{x}_1} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$
The previous estimate we were working with,
$$s_{\bar{x}_2 - \bar{x}_1} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$$
is an unpooled estimate because we obtained
estimates of $\sigma_1^2$ and $\sigma_2^2$ separately.
Comparison of Two Independent Samples
(cont.)
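A pooled estimate of $\sigma^2$ is
$$s_p^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we have
$$\sigma_{\bar{x}_2 - \bar{x}_1} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = \sqrt{\sigma^2/n_1 + \sigma^2/n_2}$$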
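The two forms of the pooled estimate (sums of squared deviations versus the weighted average of the sample variances) can be checked against each other numerically. The data below are hypothetical values invented for illustration, not data from the diet study, and both function names are illustrative.

```python
import statistics

def pooled_variance_from_data(x1, x2):
    """Pooled variance from raw data: total squared deviations over n1 + n2 - 2."""
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    return (ss1 + ss2) / (len(x1) + len(x2) - 2)

def pooled_variance_from_summary(s1_sq, n1, s2_sq, n2):
    """Pooled variance from sample variances: weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# HYPOTHETICAL weight changes (kg), made up for this illustration only.
group1 = [-1.0, -3.5, 0.5, -2.0, -4.0]
group2 = [-6.0, -2.5, -9.0, -4.5]

direct = pooled_variance_from_data(group1, group2)
summary = pooled_variance_from_summary(
    statistics.variance(group1), len(group1),
    statistics.variance(group2), len(group2),
)
print(direct, summary)  # the two values agree up to floating-point rounding
```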
Comparison of Two Independent
Samples (cont.)
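If we substitute the pooled estimator of $\sigma^2$ into
$$\frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{\sigma^2/n_1 + \sigma^2/n_2}},$$
we have
$$T_P = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{S_P^2/n_1 + S_P^2/n_2}} = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{S_P^2 (1/n_1 + 1/n_2)}}$$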
Comparison of Two Independent
Samples (cont.)
TP follows a t distribution with n1+n2-2 degrees
of freedom.
A $(1-\alpha)100\%$ confidence interval is given by
$$(\bar{X}_2 - \bar{X}_1) \pm t_{n_1+n_2-2,\,1-\alpha/2}\sqrt{S_P^2 (1/n_1 + 1/n_2)}$$
Choosing when to Pool
One rule of thumb is to use the pooled
variance as long as the ratio of the sample
standard deviations (larger s/smaller s) is less than 2,
but this cutoff is somewhat arbitrary.
Usually the results are not that different.
If you are unsure of which one to use, go with
the separate variance as that is more
conservative.
Diet and Weight Loss Example
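A 95% confidence interval is
$$(\bar{X}_2 - \bar{X}_1) \pm t_{n_1+n_2-2,\,1-\alpha/2}\sqrt{S_P^2 (1/n_1 + 1/n_2)}$$
$$= (-5.7 - (-1.8)) \pm t_{68+64-2,\,0.975} \times 1.15$$
$$= -3.9 \pm 1.98 \times 1.15$$
$$= -3.9 \pm 2.277$$
$$= (-6.2, -1.6)\ \text{kg}$$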
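The pooled standard error of 1.15 used in this calculation can be reproduced from the summary table. The sketch below assumes group 1 is the low fat group and group 2 the low carbohydrate group, uses the critical value 1.98 from the slide, and uses the hypothetical helper name `pooled_t_ci`.

```python
import math

def pooled_t_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Pooled-variance CI for mu2 - mu1 from summary statistics."""
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = mean2 - mean1
    return diff, se, (diff - t_crit * se, diff + t_crit * se)

# Group 1 = low fat (n = 68), group 2 = low carb (n = 64); t(130, 0.975) = 1.98 per the slide.
diff, se, (lower, upper) = pooled_t_ci(-1.8, 3.9, 68, -5.7, 8.6, 64, t_crit=1.98)
print(f"diff = {diff:.1f}, SE = {se:.2f}, CI = ({lower:.1f}, {upper:.1f}) kg")
# prints: diff = -3.9, SE = 1.15, CI = (-6.2, -1.6) kg
```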
Back to Blood Pressure and Oral
Contraceptive Use (n=10 women)
Participant    BP Before OC              BP After OC              After-Before
1              126                       132                       6
2              105                       109                       4
3              104                       102                      -2
4              115                       117                       2
…
Sample Mean:   $\bar{x}_{before}$ = 115.6   $\bar{x}_{after}$ = 120.4   $\bar{x}_{diff}$ = 4.8
SD:            Sb=11.3                   Sa=13.1                  Sd=4.6
If we do not realize that we should
use the paired t but use the two
sample t procedure to obtain the CI,
will the interval be wider or
narrower?
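Answer:
Paired t Confidence Interval:
$$(\bar{X}_2 - \bar{X}_1) \pm t_{n-1,\,1-\alpha/2}\sqrt{S_d^2/n}$$
2-sample t Confidence Interval:
$$(\bar{X}_2 - \bar{X}_1) \pm t_{n_1+n_2-2,\,1-\alpha/2}\sqrt{S_P^2 (1/n_1 + 1/n_2)}$$
The 2-sample interval will be wider: Sd = 4.6 is much smaller
than Sb = 11.3 and Sa = 13.1, so the paired standard error is
much smaller than the pooled two-sample standard error.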
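A quick numerical comparison of the two half-widths for the blood pressure data, sketched in Python from the summary statistics. The critical value 2.26 (df = 9) comes from the earlier slide; the value 2.10 (df = 18) is an assumed standard t-table value that the slides do not state.

```python
import math

n = 10                       # number of women (pairs)
s_before, s_after = 11.3, 13.1
s_diff = 4.6

# 2.26 is t(9, 0.975) from the slides; 2.10 is an ASSUMED table value for t(18, 0.975).
t_paired, t_two_sample = 2.26, 2.10

# Paired analysis: standard error based on the SD of the within-subject differences.
paired_half = t_paired * s_diff / math.sqrt(n)

# Two-sample analysis (incorrectly treating before and after as independent groups).
sp_sq = ((n - 1) * s_before**2 + (n - 1) * s_after**2) / (2 * n - 2)
two_sample_half = t_two_sample * math.sqrt(sp_sq * (1 / n + 1 / n))

print(f"paired half-width:     {paired_half:.2f}")
print(f"two-sample half-width: {two_sample_half:.2f}")
```

With these inputs the paired half-width is about 3.3 and the two-sample half-width about 11.5, so ignoring the pairing would give a much wider interval, one that includes 0.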
It is very important to know the
design and use the appropriate
statistical technique to analyze the
data.
If we have a control group for the OC
example, then we will use the two-sample
t procedure to compare the mean
change in blood pressure in the two
groups.
THE END
Want to learn more statistics
or have consultations, contact:
http://ctsi.psu.edu/ctsiprograms/biostatisticsepidemiologyresearch-design/