Statistical Inference II
Confidence Intervals give:
*A plausible range of values for a population
parameter.
*The precision of an estimate.(When sampling
variability is high, the confidence interval will be
wide to reflect the uncertainty of the
observation.)
*Statistical significance (if the 95% CI does not
cross the null value, it is significant at .05)
Confidence Intervals: Estimating
the Size of the Effect
(Sample statistic) ± (measure of how confident we want to be) × (standard error)
Common Levels of Confidence

Commonly used confidence levels are 90%,
95%, and 99%
Confidence Level    Z value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29
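The Z values in the table can be recomputed rather than looked up. The sketch below is one way to do it with Python's standard library; the helper name `z_for_confidence` is made up for illustration and is not part of the lecture.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical Z value for a confidence level such as 0.95."""
    alpha = 1.0 - level
    # Put alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: Z = {z_for_confidence(level):.3f}")
```

Higher confidence levels push the cut-off further into the tails, which is why the intervals get wider.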
The true meaning of a
confidence interval




A computer simulation:
Imagine that the true population value is
10.
Have the computer take 50 samples of the
same size from the same population and
calculate the 95% confidence interval for
each sample.
Here are the results…
95% Confidence Intervals
[Figure: 95% confidence intervals from the 50 simulated samples; 3 misses = 6% error rate.]
For a 95% confidence interval, you can be 95% confident that you captured the true population value.
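The simulation described above can be approximated in a few lines. This is only a sketch under assumptions the slides do not state: the population is taken to be normal with a known standard deviation of 2.0, each sample has 25 observations, and the function name `coverage` is hypothetical.

```python
import random

def coverage(true_mean=10.0, sigma=2.0, n=25, n_samples=50, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # standard error of the mean (sigma assumed known)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = sum(sample) / n
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / n_samples

print(coverage())                # 50 samples, as on the slide
print(coverage(n_samples=5000))  # many samples: should settle near 0.95
```

With only 50 samples the observed miss rate bounces around (the slide saw 6%); with thousands of samples it settles close to 5%.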
Confidence Intervals for our
weight example…
(Sample statistic) ± (measure of how confident we want to be) × (standard error)
95% CI: 160 ± 1.96 × 1.5 = 157-163 lbs
99% CI: 160 ± 2.58 × 1.5 = 156-164 lbs
Note how the confidence intervals
do not cross the null value of 150!

Confidence intervals give the same
information as hypothesis tests (and
more)…
Duality with hypothesis tests.
[Figure: number line from 150 to 163 lbs marking the null value and the 95% confidence interval.]
Null hypothesis: Average weight is 150 lbs.
Alternative hypothesis: Average weight is not 150 lbs.
P-value < .05
Duality with hypothesis tests.
[Figure: number line from 150 to 163 lbs marking the null value and the 99% confidence interval.]
Null hypothesis: Average weight is 150 lbs.
Alternative hypothesis: Average weight is not 150 lbs.
P-value < .01
Review Question 1
A 95% confidence interval for a mean:
a. Is wider than a 99% confidence interval.
b. Is wider when the sample size is larger.
c. In repeated samples will include the population mean 95% of the time.
d. Will include 95% of the observations of a sample.
Review Question 1: Answer
c. In repeated samples will include the population mean 95% of the time.
Review Question 2
Spine bone density is normally distributed in young women,
with a mean of 1.0 g/cm² and a standard deviation of 0.1 g/cm². In my
sample of 100 young women runners, the average spine
bone density is .93 g/cm². What is the 95% confidence
interval?
a. .93 ± 1.96(.1) = .73-1.13
b. .93 ± (.1) = .83-1.03
c. 1.0 ± (.1) = .90-1.10
d. .93 ± 1.96(.01) = .91-.95
Review Question 2: Answer
d. .93 ± 1.96(.01) = .91-.95
The standard error is 0.1/√100 = 0.01.
Note how the confidence interval does not cross the null value of 1.0!
Summary: Single population mean (known σ)

Hypothesis test:
$Z = \frac{\text{observed mean} - \text{null mean}}{\sigma / \sqrt{n}}$

Confidence Interval:
$\text{confidence interval} = \text{observed mean} \pm Z_{\alpha/2} \times \left(\frac{\sigma}{\sqrt{n}}\right)$
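As a rough illustration of the two formulas in this summary, the sketch below applies them to the bone density numbers from Review Question 2, taking 0.1 g/cm² as the known population standard deviation. The function name `z_test_and_ci` is an illustrative assumption, not something from the lecture.

```python
from statistics import NormalDist

def z_test_and_ci(observed_mean, null_mean, sigma, n, level=0.95):
    """Z statistic, two-sided p-value, and confidence interval (sigma known)."""
    se = sigma / n ** 0.5
    z = (observed_mean - null_mean) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (observed_mean - z_crit * se, observed_mean + z_crit * se)
    return z, p_value, ci

# Sample mean .93, null mean 1.0, sigma 0.1, n = 100
z, p, ci = z_test_and_ci(0.93, 1.0, 0.1, 100)
print(f"Z = {z:.1f}, p = {p:.2g}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes the null value of 1.0 and the p-value is far below .05, which is the duality between the two approaches.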
Examples of Sample Statistics:
Single population mean (known σ)
Single population mean (unknown σ)
Single population proportion
Difference in means (t-test)
Difference in proportions (Z-test)
Odds ratio/risk ratio
Correlation coefficient
Regression coefficient
…
Standard deviation is unknown
NOTE: if we were actually doing the above experiment, i.e. sampling 100 doctors, we may not know
the standard deviation of weights in the whole population (σ) ahead of time (unlike with dice, there is
no theoretical variance, only a population variance that we can never know exactly without measuring
the entire population).
To estimate σ:
$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Standard error of the mean when true sigma is unknown
Estimated standard error of the mean:
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}}{\sqrt{n}}$
(basically dividing by n twice…)
When σ is unknown, use t rather than Z!

A t-distribution is like a Z distribution,
except has slightly fatter tails to reflect the
uncertainty added by estimating σ.
The bigger the sample size (i.e., the bigger
the sample size used to estimate σ), then
the closer t becomes to Z.
If n>100, t approaches Z.
Student’s t Distribution
Note: t → Z as n increases
[Figure: density curves for the standard normal (t with df = ∞), t (df = 13), and t (df = 5), centered at 0.]
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal
from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
SEE APPENDIX A in your book!
Student’s t Table
Upper Tail Area
df     .25      .10      .05
1      1.000    3.078    6.314
2      0.817    1.886    2.920
3      0.765    1.638    2.353
Let: n = 3, df = n - 1 = 2, α = .10, α/2 = .05, so t = 2.920
The body of the table contains t values, not probabilities
t distribution values
With comparison to the Z value
Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
.80                 1.372          1.325          1.310          1.28
.90                 1.812          1.725          1.697          1.64
.95                 2.228          2.086          2.042          1.96
.99                 3.169          2.845          2.750          2.58
Note: t → Z as n increases
Practice problem

You want to estimate the average ages of kids
that ride a particular kid’s ride at Disneyland. You
take a random sample of 8 kids exiting the ride,
and find that their ages are: 2,3,4,5,6,6,7,7.
a. Calculate the sample mean.
b. Calculate the sample standard deviation.
c. Calculate the standard error of the mean.
d. Calculate the 99% confidence interval.
Answer (a,b)
a. Calculate the sample mean.
$\bar{X}_8 = \frac{\sum_{i=1}^{8} X_i}{8} = \frac{2+3+4+5+6+6+7+7}{8} = \frac{40}{8} = 5.0$
b. Calculate the sample standard deviation.
$s_X^2 = \frac{\sum_{i=1}^{8} (X_i - 5)^2}{8-1} = \frac{3^2 + 2^2 + 1^2 + 0 + 2(1^2) + 2(2^2)}{7} = \frac{24}{7} = 3.4$
$s_X = \sqrt{3.4} = 1.9$
Answer (c)
c. Calculate the standard error of the mean.
$s_{\bar{X}} = \frac{s_X}{\sqrt{n}} = \frac{1.9}{\sqrt{8}} = .67$
Answer (d)
d. Calculate the 99% confidence interval.
$\text{mean} \pm s_{\bar{X}} \, (t_{df,\alpha/2})$
$5.0 \pm .67(3.50) = (2.65, 7.35)$
$t_{7,.005} = 3.5$
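The four steps of the practice problem can be checked with a short script. This is a sketch using only the standard library; the critical value 3.499 is typed in from a t table (the slide rounds it to 3.5) because the standard library has no t-distribution, and the constant name is made up here.

```python
from statistics import mean, stdev

ages = [2, 3, 4, 5, 6, 6, 7, 7]
T_CRIT_7DF_99 = 3.499  # t with 7 df, upper tail area .005 (table lookup, assumed)

x_bar = mean(ages)           # (a) sample mean
s = stdev(ages)              # (b) sample SD, divides by n - 1
sem = s / len(ages) ** 0.5   # (c) standard error of the mean
ci = (x_bar - T_CRIT_7DF_99 * sem, x_bar + T_CRIT_7DF_99 * sem)  # (d) 99% CI

print(f"mean = {x_bar}, s = {s:.2f}, SEM = {sem:.2f}")
print(f"99% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Carrying full precision gives a slightly narrower interval than the slide's (2.65, 7.35), because the slide rounds s up to 1.9 before dividing.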
Review Question 3
A t-distribution:
a. Is approximately a normal distribution if n>100.
b. Can be used interchangeably with a normal distribution as long as the sample size is large enough.
c. Reflects the uncertainty introduced when using the sample, rather than population, standard deviation.
d. All of the above.
Review Question 3: Answer
d. All of the above.
Example problem, class data:
A two-tailed hypothesis test:
A researcher claims that Stanford affiliates eat
fewer than the recommended intake of 5 servings of fruits
and vegetables per day.
We have data to address this claim: 20 people in the
class provided data on their daily fruit and
vegetable intake.
Do we have evidence to dispute her claim?
Fruit and veggie consumption,
this class…
Mean=3.9 servings
Median=3.5 servings
Mode=3.0 servings
Std Dev=1.5 servings
Answer
1. Define your hypotheses (null, alternative)
H0: P(average servings)=5.0
Ha: P(average servings)≠5.0 servings (two-sided)
2. Specify your null distribution
We do not know the true standard deviation of fruit and veggie
consumption, so we must use a T-distribution to make inferences,
rather than a Z-distribution.
$\bar{X}_{20} \sim T_{19}\left(5.0, \frac{1.5}{\sqrt{20}} = 0.33\right)$
Answer, continued
3. Do an experiment
observed mean in our experiment = 3.9 servings
4. Calculate the p-value of what you observed
$T_{19} = \frac{3.9 - 5}{0.33} = -3.3$
p-value < .05 (T19 critical value for p<.05, two-tailed = 2.093)
5. Reject or fail to reject (~accept) the null hypothesis
Reject! Stanford affiliates eat significantly fewer than the
recommended servings of fruits and veggies.
95% Confidence Interval
$\bar{X}_{20} \pm T_{19,.025} \times (\text{standard error}) = 3.9 \pm 2.093 \times (0.33) = (3.2, 4.6)$
H0: P(average servings)=5.0
The 95% CI excludes 5, so p-value <.05
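The one-sample t-test and its confidence interval need only the summary statistics from the class data. The following is a sketch, not the lecturer's code: `one_sample_t` is a hypothetical helper, and the critical value 2.093 is copied from the slide rather than computed.

```python
def one_sample_t(observed_mean, null_mean, s, n, t_crit):
    """t statistic, standard error, and CI from summary statistics."""
    se = s / n ** 0.5
    t = (observed_mean - null_mean) / se
    ci = (observed_mean - t_crit * se, observed_mean + t_crit * se)
    return t, se, ci

# Class data: mean 3.9 servings, SD 1.5, n = 20; null mean 5.0
T_CRIT_19DF_95 = 2.093  # two-tailed .05 cut-off for 19 df, from the slide
t, se, ci = one_sample_t(3.9, 5.0, 1.5, 20, T_CRIT_19DF_95)
reject_null = abs(t) > T_CRIT_19DF_95

print(f"SE = {se:.2f}, t = {t:.2f}, reject H0: {reject_null}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

The test and the interval agree: |t| exceeds the cut-off exactly when the interval excludes the null value of 5.0.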
Paired data (repeated measures)
Patient    BP Before (diastolic)    BP After
1          100                      92
2          89                       84
3          83                       80
4          98                       93
5          108                      98
6          95                       90
What about these data? How do you analyze these?
Example problem: paired t-test
Patient    Diastolic BP Before    D. BP After    Change
1          100                    92             -8
2          89                     84             -5
3          83                     80             -3
4          98                     93             -5
5          108                    98             -10
6          95                     90             -5
Null Hypothesis: Average Change = 0
Example problem: paired t-test
Change: -8, -5, -3, -5, -10, -5
$\bar{X} = \frac{-8 - 5 - 3 - 5 - 10 - 5}{6} = \frac{-36}{6} = -6$
$s_x = \sqrt{\frac{(-8+6)^2 + (-5+6)^2 + (-3+6)^2 + \dots}{5}} = \sqrt{\frac{4 + 1 + 9 + 1 + 16 + 1}{5}} = \sqrt{\frac{32}{5}} = 2.5$
$s_{\bar{x}} = \frac{2.5}{\sqrt{6}} = 1.0$
$T_5 = \frac{-6 - 0}{1.0} = -6$
Null Hypothesis: Average Change = 0
With 5 df, |T| > 2.571 corresponds to p<.05 (two-sided test)
Example problem: paired t-test
95% CI: $-6 \pm 2.571 \times (1.0) = (-8.571, -3.429)$
Note: does not include 0.
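The paired analysis reduces to a one-sample t-test on the within-patient changes, which can be sketched as follows. The critical value 2.571 is the 5-df cut-off quoted on the slide; the variable names are illustrative only.

```python
from statistics import mean, stdev

before = [100, 89, 83, 98, 108, 95]
after = [92, 84, 80, 93, 98, 90]
T_CRIT_5DF_95 = 2.571  # two-sided .05 cut-off for 5 df, from the slide

# Pairing: work with one change score per patient, not two separate groups.
changes = [a - b for a, b in zip(after, before)]
d_bar = mean(changes)
s_d = stdev(changes)
se = s_d / len(changes) ** 0.5
t = (d_bar - 0) / se
ci = (d_bar - T_CRIT_5DF_95 * se, d_bar + T_CRIT_5DF_95 * se)

print(f"mean change = {d_bar}, s = {s_d:.2f}, SE = {se:.2f}, t = {t:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Without the slide's intermediate rounding (SE taken as 1.0), t comes out near -5.8 rather than -6 and the interval is slightly wider, but the conclusion is unchanged: the interval does not include 0.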
Summary: Single population mean (unknown σ)

Hypothesis test:
$t_{n-1} = \frac{\text{observed mean} - \text{null mean}}{s_x / \sqrt{n}}$

Confidence Interval:
$\text{confidence interval} = \text{observed mean} \pm t_{n-1,\alpha/2} \times \left(\frac{s_x}{\sqrt{n}}\right)$
Summary: paired t-test

Hypothesis test:
$t_{n-1} = \frac{\text{observed mean } d - 0}{s_d / \sqrt{n}}$
Where d = change over time or difference within a pair.

Confidence Interval:
$\text{confidence interval} = \text{observed mean } d \pm t_{n-1,\alpha/2} \times \left(\frac{s_d}{\sqrt{n}}\right)$
Review Question 4
If we have a p-value of 0.03 and so decide that our
effect is statistically significant, what is the probability
that we’re wrong (i.e., that the hypothesis test gave us a
false positive)?
a. .03
b. .06
c. Cannot tell
d. 1.96
e. 95%
Review Question 4: Answer
c. Cannot tell
Review Question 5
Suppose we take a random sample of 100 people, both
men and women. We form a 90% confidence interval of
the true mean population height. Would we expect
that confidence interval to be wider or narrower than if
we had done everything the same but sampled only
women?
a. Narrower
b. Wider
c. It is impossible to predict
Review Question 5: Answer
b. Wider
With only women, the standard deviation of height decreases, so
standard error decreases.
Review Question 6
Suppose we take a random sample of 100 people, both
men and women. We form a 90% confidence interval of
the true mean population height. Would we expect
that confidence interval to be wider or narrower than if
we had done everything the same except sampled 200
people?
a. Narrower
b. Wider
c. It is impossible to predict
Review Question 6: Answer
b. Wider
With 200 people, N increases so standard error decreases.
Review Question 7
I am calculating the mean, median, standard deviation, and
standard error for several variables (age, height, weight, income,
blood pressure, etc.) from a sample of 246 patients. If I receive
data for an additional 100 patients, which of the above statistics
(mean, median, standard deviation, or standard error) would be
expected to change substantially?
a. All of them
b. Mean, standard deviation, standard error
c. Standard deviation, standard error
d. Standard deviation only
e. Standard error only
Review Question 7: Answer
e. Standard error only
Examples of Sample Statistics:
Single population mean (known σ)
Single population mean (unknown σ)
Single population proportion
Difference in means (t-test)
Difference in proportions (Z-test)
Odds ratio/risk ratio
Correlation coefficient
Regression coefficient
…
Sampling distribution of a sample proportion
p = true population proportion.
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
BUT… if you knew p you wouldn’t be doing the experiment!
$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$\hat{p} \sim \text{Normal}\left(p, \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
Always a normal distribution!
Example

You poll 100 random people in Ohio and
find that 90% approve of Obama’s job as
President. Form a 99% confidence interval
for the true proportion of Obama supporters in Ohio.
Answer
$\hat{p} = .90$
$Z = 2.575$ (for 99% confidence)
$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{.90(.10)}{100}} = .03$
$.90 \pm Z(s_{\hat{p}}) = .90 \pm 2.575(.03) = (.82, .98)$
margin of error ≈ 8%
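The same interval can be reproduced programmatically. This is a minimal sketch using the normal approximation from the slide; `proportion_ci` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_ci(p_hat, n, level):
    """Normal-approximation confidence interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (p_hat * (1 - p_hat) / n) ** 0.5  # estimated standard error of p-hat
    return p_hat - z * se, p_hat + z * se

lo, hi = proportion_ci(0.90, 100, 0.99)
print(f"99% CI = ({lo:.2f}, {hi:.2f}), margin of error = {(hi - lo) / 2:.3f}")
```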
Key one-sample Hypothesis Tests…
Test for Ho: μ = μ0 (σ2 unknown):
$t_{n-1} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
Test for Ho: p = po:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{(p_0)(1 - p_0)}{n}}}$
Corresponding confidence intervals…
For a mean (σ2 unknown):
$\bar{x} \pm t_{n-1,\alpha/2} \times \frac{s_x}{\sqrt{n}}$
For a proportion:
$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{(\hat{p})(1 - \hat{p})}{n}}$
Symbol overload!
n: Sample size
Z: Z-statistic (standard normal)
t_df: T-statistic (t-distribution with df degrees of freedom)
p̂ (“p-hat”): sample proportion
X̄ (“X-bar”): sample mean
s: Sample standard deviation
p0: Null hypothesis proportion
μ0: Null hypothesis mean
Two-sample tests
Examples of Sample Statistics:
Single population mean (known σ)
Single population mean (unknown σ)
Single population proportion
Difference in means (t-test)
Difference in proportions (Z-test)
Odds ratio/risk ratio
Correlation coefficient
Regression coefficient
…
The two-sample t-test

Is the difference in means that we observe
between two groups more than we’d
expect to see based on chance alone?
The standard error of the difference of two means
$\sigma_{\bar{x} - \bar{y}} = \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$
**First add the variances and then take the square root of
the sum to get the standard error.
Distribution of differences
If X̄ and Ȳ are the averages of n and m subjects, respectively:
$\bar{X}_n - \bar{Y}_m \sim N\left(\mu_x - \mu_y, \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}\right)$
But…


As before, you usually have to use the
sample SD, since you won’t know the true
SD ahead of time…
So, again becomes a T-distribution...
Estimated standard error
(using pooled variance estimate)
$s_{\bar{x} - \bar{y}} = \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}$
where:
$s_p^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2 + \sum_{i=1}^{m} (y_i - \bar{y}_m)^2}{n + m - 2}$
The degrees of freedom are n+m-2
Example: two-sample t-test

In 1980, some researchers reported that “men
have more mathematical ability than women” as
evidenced by the 1979 SAT’s, where a sample of
30 random male adolescents had a mean score ±
1 standard deviation of 436±77 and 30 random
female adolescents scored lower: 416±81
(genders were similar in educational
backgrounds, socio-economic status, and age).
Do you agree with the authors’ conclusions?
Data Summary
Group             n     Sample Mean    Sample Standard Deviation
Group 1: women    30    416            81
Group 2: men      30    436            77
Two-sample t-test
1. Define your hypotheses (null, alternative)
H0: ♂-♀ math SAT = 0
Ha: ♂-♀ math SAT ≠ 0 [two-sided]
Two-sample t-test
2. Specify your null distribution:
F and M have approximately equal standard
deviations/variances, so make a “pooled”
estimate of variance.
$s_p^2 = \frac{(n-1)s_m^2 + (m-1)s_f^2}{n + m - 2} = \frac{(29)77^2 + (29)81^2}{58} = 6245$
$\bar{M}_{30} - \bar{F}_{30} \sim T_{58}\left(0, \sqrt{\frac{6245}{30} + \frac{6245}{30}}\right)$
$\sqrt{\frac{6245}{30} + \frac{6245}{30}} = 20.4$
Two-sample t-test
3. Observed difference in our experiment = 20 points
4. Calculate the p-value of what you observed
$T_{58} = \frac{20 - 0}{20.4} = .98$
p = .33
5. Do not reject null! No evidence that men are better
in math ;)
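The pooled-variance calculation for the SAT example can be written out as a small function. This is a sketch from summary statistics only; `pooled_t` is a name invented here, and the p-value is left to a t table since the standard library has no t-distribution.

```python
def pooled_t(mean_x, s_x, n, mean_y, s_y, m):
    """Pooled variance, standard error, t statistic, and df for two groups."""
    sp2 = ((n - 1) * s_x ** 2 + (m - 1) * s_y ** 2) / (n + m - 2)
    se = (sp2 / n + sp2 / m) ** 0.5  # add the variances, then take the root
    t = (mean_x - mean_y) / se
    return sp2, se, t, n + m - 2

# Men: 436 ± 77 (n = 30); women: 416 ± 81 (n = 30)
sp2, se, t, df = pooled_t(436, 77, 30, 416, 81, 30)
print(f"pooled variance = {sp2:.0f}, SE = {se:.1f}, t({df}) = {t:.2f}")
```

A t of about 0.98 on 58 df is well inside the usual cut-off of roughly 2, matching the slide's conclusion not to reject the null.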
Example 2

Example: Rosenthal, R. and Jacobson, L.
(1966) Teachers’ expectancies:
Determinants of pupils’ I.Q. gains.
Psychological Reports, 19, 115-118.
The Experiment
(note: exact numbers have been altered)




Grade 3 students at Oak School were given an IQ test at
the beginning of the academic year (n=90).
Classroom teachers were given a list of names of
students in their classes who had supposedly
scored in the top 20 percent; these students were
identified as “academic bloomers” (n=18).
BUT: the children on the teachers’ lists had
actually been randomly assigned to the list.
At the end of the year, the same I.Q. test was re-administered.
Example 2

Statistical question: Do students in the treatment
group have more improvement in IQ than
students in the control group?
What will we actually compare?
 One-year change in IQ score in the treatment
group vs. one-year change in IQ score in the
control group.
Results:
Change in IQ score, mean (standard deviation):
“Academic bloomers” (n=18): 12.2 (2.0)
Controls (n=72): 8.2 (2.0)
Difference=4 points
The standard deviation of change scores was 2.0 in both groups. This
affects statistical significance…
What does a 4-point difference
mean?


Before we perform any formal statistical
analysis on these data, we already have a
lot of information.
Look at the basic numbers first; THEN
consider statistical significance as a
secondary guide.
Is the association statistically
significant?


This 4-point difference could reflect a true
effect or it could be a fluke.
The question: is a 4-point difference bigger
or smaller than the expected sampling
variability?
Hypothesis testing
Step 1: Assume the null hypothesis.
Null hypothesis: There is no difference between
“academic bloomers” and normal students (=
the difference is 0 points)
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true

These predictions can be made by
mathematical theory or by computer
simulation.
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true (math theory):
$s_p^2 = 4.0$
$\bar{X}_{\text{"gifted"}} - \bar{X}_{\text{control}} \sim T_{88}\left(0, \sqrt{\frac{4}{18} + \frac{4}{72}} = 0.52\right)$
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true—computer simulation:


In computer simulation, you simulate
taking repeated samples of the same size
from the same population and observe the
sampling variability.
I used computer simulation to take 1000
samples of 18 treated and 72 controls
Computer Simulation Results
[Figure: distribution of simulated differences.] Standard error is about 0.52
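A simulation along these lines might look like the sketch below. The slides do not say how the simulation was run, so this assumes normally distributed change scores with a standard deviation of 2.0 and the same population mean in both groups (the null hypothesis); the function name is hypothetical.

```python
import random
from statistics import stdev

def simulate_null_differences(n_treated=18, n_control=72, sd=2.0,
                              n_sims=1000, seed=0):
    """Differences in group means when both groups share one population."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        treated = [rng.gauss(0.0, sd) for _ in range(n_treated)]
        control = [rng.gauss(0.0, sd) for _ in range(n_control)]
        diffs.append(sum(treated) / n_treated - sum(control) / n_control)
    return diffs

diffs = simulate_null_differences()
empirical_se = stdev(diffs)  # spread of the simulated differences
extreme = sum(abs(d) >= 4.0 for d in diffs)
print(f"simulated SE = {empirical_se:.2f}; results as extreme as 4.0: {extreme}")
```

The spread of the simulated differences should land near the theoretical standard error, and a difference as large as 4 points essentially never appears under the null.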
3. Empirical data
Observed difference in our experiment =
12.2-8.2 = 4.0
4. P-value
$t_{88} = \frac{12.2 - 8.2}{.52} = \frac{4}{.52} \approx 8$
p-value <.0001
A t-curve with 88 df has slightly wider cut-offs for 95% area (t=1.99) than a
normal curve (Z=1.96)
Visually…
If we ran this
study 1000 times
we wouldn’t
expect to get 1
result as big as a
difference of 4
(under the null
hypothesis).
5. Reject null!

Conclusion: I.Q. scores can bias
expectancies in the teachers’ minds and
cause them to unintentionally treat
“bright” students differently from those
seen as less bright.
Confidence interval (more information!!)
95% CI for the difference: 4.0±1.99(.52) = (3.0, 5.0)
A t-curve with 88 df has slightly wider cut-offs for 95% area
(t=1.99) than a normal curve (Z=1.96)
Summary: t-test, pooled variance
$T = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}} \sim t_{n+m-2}$
$s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}$
What if our standard deviation
had been higher?

The standard deviation for change scores in
treatment and control were each 2.0. What
if change scores had been much more
variable—say a standard deviation of 10.0
(for both)?
[Figure: simulated sampling distributions for the two scenarios.]
Std. dev in change scores = 2.0: standard error is 0.54
Std. dev in change scores = 10.0: standard error is 2.58
With a std. dev. of 10.0…
LESS STATISTICAL POWER!
Standard error is 2.58
If we ran this study 1000 times, we would expect to
get a difference at least as extreme as +4.0 or –4.0
12% of the time.
P-value=.12
Don’t forget: The paired T-test



Did the control group in the previous
experiment improve
at all during the year?
Do not apply a two-sample ttest to answer
this question!
After-Before yields a single sample of
differences…
Data Summary
Group              n     Sample Mean    Sample Standard Deviation
Group 1: Change    72    +8.2           2.0
Paired t-test
$t_{71} = \frac{8.2}{2.0 / \sqrt{72}} = \frac{8.2}{.24} \approx 34$
p-value <.0001
Paired Ttest

Correlated (paired) data: either the same
person on different occasions or pairs of
people who are more similar to each other
than to individuals from other pairs
(husband-wife pairs, twin pairs, matched
cases and controls, etc.)
Review Question 8
In a medical student class, the 6 people born on odd days had a mean height of
64.64 inches; the 10 people born on even days had a mean height of 71.15 inches.
Height is roughly normally distributed. Which of the following best represents
the correct statistical test for these data?
a. $Z = \frac{71.1 - 64.6}{4.5} = \frac{6.5}{4.5} = 1.44$; p = ns
b. $Z = \frac{71.1 - 64.6}{4.5 / \sqrt{16}} = \frac{6.5}{1.4} = 4.6$; p < .0001
c. $T_{14} = \frac{71.1 - 64.6}{\sqrt{\frac{4.7^2}{10} + \frac{4.7^2}{6}}} = \frac{6.5}{2.4} = 2.7$; p < .05
d. $T_{14} = \frac{71.1 - 64.6}{4.5} = \frac{6.5}{4.5} = 1.44$; p = ns
Review Question 8: Answer
c. $T_{14} = \frac{71.1 - 64.6}{\sqrt{\frac{4.7^2}{10} + \frac{4.7^2}{6}}} = \frac{6.5}{2.4} = 2.7$; p < .05
Review Question 9
Fifty percent of the people born on odd days
commute to school by car two or more times
per week, whereas only 40 percent of people
born on even days do.
To test whether this difference is more than
expected by chance, we would use:
a. A two-sample t-test
b. A paired t-test
c. A one-sample proportions test
d. A two-sample proportions test
Review Question 9: Answer
d. A two-sample proportions test
Review Question 10
Standard error is:
a. For a given variable, its standard deviation divided by the square root of n.
b. A measure of the variability of a sample statistic.
c. The inverse of sample size.
d. A measure of the variability of a characteristic.
e. All of the above.
Review Question 10: Answer
b. A measure of the variability of a sample statistic.
Two sample proportions
(Z test)

Compare the difference in proportions
between two independent
samples…(binary outcome rather than
continuous outcome)
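Z-test
$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$
$\bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$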
Example: Difference in
proportions

Research Question: Are antidepressants a
risk factor for suicide attempts in children
and adolescents?
Example modified from: “Antidepressant Drug Therapy and Suicide in Severely
Depressed Children and Adults”; Olfson et al. Arch Gen Psychiatry. 2006;63:865-872.

Example: Difference in
Proportions



Design: Case-control study
Methods: Researchers used Medicaid records to
compare prescription histories between 263
children and teenagers (6-18 years) who had
attempted suicide and 1241 controls who had
never attempted suicide (all subjects suffered
from depression).
Statistical question: Is a history of use of
antidepressants more common among cases than
controls?
Example

Statistical question: Is a history of use of
antidepressants more common among
cases than controls?
What will we actually compare?
 Proportion of cases who used antidepressants in
the past vs. proportion of controls who did
Results
                               No. (%) of cases (n=263)    No. (%) of controls (n=1241)
Any antidepressant drug ever   120 (46%)                   448 (36%)
Difference=10%
What does a 10% difference
mean?


Before we perform any formal statistical
analysis on these data, we already have a
lot of information.
Look at the basic numbers first; THEN
consider statistical significance as a
secondary guide.
Is the association statistically
significant?


This 10% difference could reflect a true
association or it could be a fluke in this
particular sample.
The question: is 10% bigger or smaller than
the expected sampling variability?
Hypothesis testing
Step 1: Assume the null hypothesis.
Null hypothesis: There is no association
between antidepressant use and suicide
attempts in the target population (= the
difference is 0%)
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true

These predictions can be made by
mathematical theory or by computer
simulation.
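Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true (mathematical theory):
$\hat{p}_{\text{cases}} - \hat{p}_{\text{controls}} \sim N\left(0, \sigma = \sqrt{\frac{\frac{568}{1504}\left(1 - \frac{568}{1504}\right)}{263} + \frac{\frac{568}{1504}\left(1 - \frac{568}{1504}\right)}{1241}} = .033\right)$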
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null
hypothesis is true—computer simulation:


In computer simulation, you simulate
taking repeated samples of the same size
from the same population and observe the
sampling variability.
I used computer simulation to take 1000
samples of 263 cases and 1241 controls.
Also: Computer Simulation Results
Standard error is
about 3.3%
Hypothesis Testing
Step 3: Do an experiment
We observed a difference of 10% between
cases and controls.
Hypothesis Testing
Step 4: Calculate a p-value—mathematical theory:
$Z = \frac{.10}{.033} = 3.0$; p = .003
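The two-proportion Z-test can also be run directly on the counts in the results table (120 of 263 cases, 448 of 1241 controls). The sketch below uses the pooled proportion for the standard error, as in the formula above; `two_proportion_z` is an illustrative name.

```python
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic, pooled standard error, and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = (p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, se, p_value

z, se, p_value = two_proportion_z(120, 263, 448, 1241)
print(f"SE = {se:.3f}, Z = {z:.2f}, p = {p_value:.4f}")
```

Using the exact counts gives a difference of about 9.5 points and a Z slightly below the slide's 3.0, which rounds the difference to 10%; either way the p-value is well under .05.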
P-value from our simulation…
When we ran this study 1000 times,
we got 1 result as big or bigger than 10%.
We also got 3 results as small
or smaller than –10%.
P-value
From our simulation, we
estimate the p-value to be:
4/1000 or .004
Hypothesis Testing
Step 5: Reject or do not reject the null hypothesis.
Here we reject the null.
Alternative hypothesis: There is an association
between antidepressant use and suicide in the
target population.
What would a lack of statistical
significance mean?

If this study had sampled only 50 cases
and 50 controls, the sampling variability
would have been much higher—as
shown in this computer simulation…
263 cases and 1241 controls: standard error is about 3.3%
50 cases and 50 controls: standard error is about 10%
With only 50 cases and 50 controls…
Standard
error is
about 10%
If we ran this
study 1000 times,
we would expect to
get values of 10%
or higher 170 times
(or 17% of the
time).
Two-tailed p-value
Two-tailed p-value = 17% × 2 = 34%
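Key two-sample Hypothesis Tests…
Test for Ho: μx- μy = 0 (σ2 unknown, but roughly equal):
$t_{n-2} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$; $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n - 2}$, where $n = n_x + n_y$
Test for Ho: p1- p2= 0:
$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{(\bar{p})(1 - \bar{p})}{n_1} + \frac{(\bar{p})(1 - \bar{p})}{n_2}}}$; $\bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$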
Corresponding confidence intervals…
For a difference in means, 2 independent samples
(σ2’s unknown but roughly equal):
$(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$
For a difference in proportions, 2 independent
samples:
$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{(\bar{p})(1 - \bar{p})}{n_1} + \frac{(\bar{p})(1 - \bar{p})}{n_2}}$