9 - Statistical Inference
9.1 – Introduction to Hypothesis or Significance Testing
In several examples we considered in the sections on the binomial distribution and the
sampling distribution, we made conclusions about what was true of the population based
upon a probability. In essence, in all of those situations we conducted a hypothesis test.
A hypothesis test is used to make decisions about some characteristic of the population(s)
being studied. A jury trial in the U.S. legal system is an example of a hypothesis test. As
jurors we are told to assume the defendant is innocent, and then evidence (data) is presented by
the prosecution in support of the defendant's guilt. As jurors we could ask ourselves "what
is the probability/chance that they would have this body of evidence against a person who
was truly innocent?" If this seems highly unlikely in our minds, then we reject the
defendant's innocence in favor of their guilt.
In statistics, we use the same logic in making decisions. For example, suppose we have a
new treatment that we think will be better than an existing one. Initially we must play the
skeptic or “devil’s advocate” and assume that the new treatment is no better than the
current. A clinical trial is then conducted and evidence (data) is gathered that researchers
hope will support their belief that the new treatment is superior. Suppose that the data
from the clinical trial supports their contention. In order to convince others in the
medical community that the new treatment is better, we consider the likelihood that we
would obtain the observed evidence from the clinical trial if in fact the new method was
no better than the existing one. If it is unlikely the observed results would be obtained by
chance variation alone, then one could conclude that the initial assumption about the new
treatment is not true and conclude that the new treatment is indeed better.
There are essentially five steps in conducting a hypothesis test which we will discuss
below. To aid in our understanding of these steps we will connect them to some
examples we have considered previously which are shown below.
Example 1: Success of New Medical Treatment
The current method used to treat a certain form of childhood cancer has a 60% success rate. A new method
has been proposed that will hopefully have an even higher success rate. To test the new method, a sample of n
= 20 patients with this form of cancer are treated and we find all but 4 patients experience remission. Can
we conclude on the basis of this result that the new method has a higher success rate/lower failure rate?
Example 2: From the cholesterol level example in the sampling distribution section
Suppose we took a sample of adult males between the ages of 50 – 60 who are also strict vegetarians and
obtained a sample mean of X̄ = 188 mg/dl. Does this provide evidence that the subpopulation of
vegetarians has a lower mean cholesterol level than the general population of men in this age group?
Steps in a Hypothesis Test
1. State the null hypothesis (H_o) and the alternative or research hypothesis (H_a).
2. Choose the significance level α.
3. Compute the test statistic.
4. Find the p-value.
5. Make a decision and interpret the result in context.
9.2 - Hypothesis Testing for a Single Population Mean (μ)
Null Hypothesis (H_o)      Alternative Hypothesis (H_a)      p-value area
μ ≤ μ_o                    μ > μ_o                           Upper-tail
μ ≥ μ_o                    μ < μ_o                           Lower-tail
μ = μ_o                    μ ≠ μ_o                           Two-tailed (perform test using CI for μ)
Test Statistic (in general)
In general the basic form of most test statistics is given by:

Test Statistic = (estimate − hypothesized value) / SE(estimate)     (think "z-score")
which measures the discrepancy between the estimate from our sample and the
hypothesized value under the null hypothesis.
Intuitively, if our sample-based estimate is “far away” from the hypothesized value
assuming the null hypothesis is true, we will reject the null hypothesis in favor of the
alternative or research hypothesis. Extreme test statistic values occur when our estimate
is a large number of standard errors away from the hypothesized value under the null.
The p-value is the probability that, by chance variation alone, we would get a test statistic
as extreme or more extreme than the one observed, assuming the null hypothesis is true.
If this probability is "small" then we have evidence against the null hypothesis; in other
words, we have evidence to support our research hypothesis.
Type I and Type II Errors (α & β) (D - 7.9, G – Ch. 9)

                           Truth
Decision                H_o true                        H_a true
Reject H_o              Type I error (probability α)    Correct decision (power = 1 − β)
Fail to Reject H_o      Correct decision                Type II error (probability β)
Example: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA
EPA guidelines suggest that drinking water should not have a perchlorate level exceeding
4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface,
and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road
flares in the Morgan Hill area from 1955 to 1996, is the source of the perchlorate
contamination in this area.
Suppose you are a resident of the Morgan Hill area. Which alternative do you want well
testers to use, and why?
H_o: μ ≤ 4 ppb          or          H_o: μ ≥ 4 ppb
H_a: μ > 4 ppb                      H_a: μ < 4 ppb
Test Statistic for Testing a Single Population Mean (μ) ~ (t-test)

t = (X̄ − μ_o) / SE(X̄)     or     t = (X̄ − μ_o) / (s/√n)     ~ t-distribution with df = n − 1.
Assumptions:
When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when
our sample size is sufficiently "large". How large the sample size needs to be
depends upon how "non-normal" the population distribution is.
Example 1: Length of Stay in a Nursing Home
In the past the average number of nursing home days required by elderly patients before
they could be released to home care was 17 days. It is hoped that a new program will
reduce this figure. Do these data support the research hypothesis?
Data (in days): 3, 5, 12, 7, 22, 6, 2, 18, 9, 8, 20, 15, 3, 36, 38, 43
(JMP output: histogram, normal quantile plot, and confidence interval for the length-of-stay data)
Normality does not appear to be satisfied here!
Notice the CI for the mean length of stay is (8.38 days, 22.49 days).
Hypothesis Test:
1) H_o:
   H_a:
2) Choose α
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
To perform a t-test in JMP, select Test Mean from the LOS pull-down menu and enter the
value for the mean under the null hypothesis, 17.0 in this example.
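For readers working outside JMP, here is a minimal sketch of the same lower-tailed one-sample t-test on the length-of-stay data using Python with scipy (an illustration, not part of the original handout):

```python
# Sketch: one-sample t-test for the length-of-stay data,
# H_o: mu = 17 days vs. H_a: mu < 17 days (lower-tailed).
import numpy as np
from scipy import stats

los = np.array([3, 5, 12, 7, 22, 6, 2, 18, 9, 8, 20, 15, 3, 36, 38, 43])

# t statistic and lower-tail p-value
t_stat, p_value = stats.ttest_1samp(los, popmean=17.0, alternative="less")
print(f"sample mean = {los.mean():.2f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
```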
Conclusion:
Example 2: Creatinine Levels in End-Stage Renal Disease Patients
A nephrology nurse believes that the population mean creatinine level for end-stage renal
disease patients is greater than 8.4 mg/dl. A sample of n = 12 end-stage renal disease
patients undergoing hemodialysis was taken and their creatinine levels were recorded,
resulting in the data below:
6.7 8.0 13.4 12.4 14.9 6.3 16.5 13.5 12.4 16.9 9.1 13.0
Do these data provide evidence in support of the nurse's research hypothesis?
Hypothesis Test:
1) H_o:
   H_a:
2) Choose α
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
In JMP
* click High Side for an upper-tail test; similarly for the other two types of alternatives.
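A minimal sketch (not part of the original handout, which uses JMP) of how the test statistic in step 3 could be computed directly from the formula t = (x̄ − μ_o)/(s/√n) for the creatinine data:

```python
# Sketch: one-sample t statistic for the creatinine data,
# upper-tailed alternative H_a: mu > 8.4 mg/dl.
import numpy as np
from scipy import stats

creatinine = np.array([6.7, 8.0, 13.4, 12.4, 14.9, 6.3, 16.5, 13.5, 12.4, 16.9, 9.1, 13.0])
mu_0 = 8.4
n = len(creatinine)
xbar = creatinine.mean()
s = creatinine.std(ddof=1)               # sample standard deviation
t_stat = (xbar - mu_0) / (s / np.sqrt(n))
p_value = stats.t.sf(t_stat, df=n - 1)   # upper-tail area, df = n - 1
print(f"xbar = {xbar:.2f}, s = {s:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```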
9.3 - Power and Sample Size
(Daniel, Section 7.10, Gerstman, Section 9.6 & 11.7)
In designing a study, we oftentimes have prior knowledge about how large a difference
or effect we want to be able to detect as significant. We can use this knowledge to help
us determine the sample size to use in conducting our study.
Without going through the derivation, the formula we use for determining n is
n = [ (z_α + z_β)·σ / (μ_1 − μ_o) ]²     round this up to the next integer value.

where,
z_α = standard normal value corresponding to α, the Type I error probability.
    For one-tailed hypotheses these values are: z_.01 = 2.33, z_.05 = 1.645, z_.10 = 1.28
    For two-sided alternatives these values are: z_.01 = 2.576, z_.05 = 1.96, z_.10 = 1.645
z_β = standard normal value corresponding to β, the Type II error probability.
    z_.01 = 2.33, z_.05 = 1.645, z_.10 = 1.28, etc., basically the one-tailed values above.
σ = conservative "guess" for the true population standard deviation (σ ≈ Range/4)
μ_1 = population mean assuming the alternative hypothesis is true
μ_o = population mean assuming the null hypothesis is true
(μ_1 − μ_o) = difference we wish to be able to detect as significant with a probability of
(1 − β) using a significance level α test.

Power = P(Reject H_o | H_o is False) = 1 − β
α = P(Reject H_o | H_o is True)
Example: Suppose the nurse in the previous example wanted to have a 95% chance of
detecting a mean of 10.4 as being significantly greater than 8.4 using a significance test
with α = .05. What sample size would be required if she believes the range of creatinine
levels will be between 5 mg/dl and 20 mg/dl?
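As an illustration (not from the handout), the formula above can be evaluated for this example in a few lines of Python:

```python
# Sketch: n = [ (z_alpha + z_beta) * sigma / (mu_1 - mu_o) ]^2 for the creatinine example.
# alpha = .05 one-tailed, power = .95 (so beta = .05), sigma ~ Range/4 = (20 - 5)/4.
import math
from scipy import stats

alpha, beta = 0.05, 0.05
z_alpha = stats.norm.ppf(1 - alpha)   # 1.645 (one-tailed)
z_beta = stats.norm.ppf(1 - beta)     # 1.645
sigma = (20 - 5) / 4                  # conservative guess, Range/4
mu_1, mu_0 = 10.4, 8.4

n = ((z_alpha + z_beta) * sigma / (mu_1 - mu_0)) ** 2
print(f"n = {n:.1f}, round up to {math.ceil(n)}")   # about 38.0 -> 39
```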
Use Power calculator in JMP:
DOE > Sample Size and Power
9.4 - Statistical Inference for a Population Proportion
(Daniel, Sections 6.5 & 7.5, Gerstman, Ch. 16)
We have already discussed the confidence interval as a means of making a decision about
the value of the population proportion, p. The CI results are summarized below.
General Form for a CI for a Population Proportion (p)

estimate ± (table value) × (estimated standard error of estimate)

p̂ ± (normal table value) × √( p̂(1 − p̂)/n )     or     p̂ ± z·√( p̂(1 − p̂)/n )

Margin of Error = z·√( p̂(1 − p̂)/n )
Normal Table Values:

Confidence Level        z
95% (α = .05)           1.96
90% (α = .10)           1.645
99% (α = .01)           2.576
Hypothesis Tests for p

H_o: p = p_o
H_a: p > p_o   or   p < p_o   or   p ≠ p_o   (use the CI for the two-sided case, which is rarely of interest for p anyway)

Test Statistic

z = (p̂ − p_o) / √( p_o(1 − p_o)/n )   ~ standard normal N(0,1), provided n·p_o ≥ 5 and n(1 − p_o) ≥ 5
When our sample size is small or we want an exact test we can use the binomial
distribution to calculate the p-value.
Example: Hypertension During Finals Week
In the college-age population in this country (18 – 24 yr. olds), about 9.2% have
hypertension (systolic BP > 140 mmHg and/or diastolic BP > 90 mmHg). Suppose a
sample of n = 196 WSU students is taken during finals week and 29 have hypertension.
Do these data provide evidence that the percentage of students who are hypertensive
during finals week is higher than 9.2%?
Hypothesis Test:
1) H_o:
   H_a:
2) Choose α
3) Compute test statistic
4) Find p-value (use Normal Probability Calculator.JMP)
5) Make decision and interpret
Binomial Exact Test
Use n = 196 and p = .092 (hypothesized value under Ho)
Exact p-value =
Confidence Interval:
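A minimal Python/scipy sketch of the calculations asked for here, the large-sample z test, the exact binomial p-value, and a 95% CI for p (an illustration, not part of the handout):

```python
# Sketch: hypertension example, 29 of n = 196 students, p_o = .092,
# H_a: p > .092.
import numpy as np
from scipy import stats

x, n, p0 = 29, 196, 0.092
p_hat = x / n

# Large-sample z test
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_value = stats.norm.sf(z)               # upper-tail area

# Exact binomial p-value: P(X >= 29) when X ~ Binomial(196, .092)
exact_p = stats.binom.sf(x - 1, n, p0)

# 95% CI for p using the normal approximation with SE based on p_hat
me = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p = {p_value:.4f}, exact p = {exact_p:.4f}")
print(f"95% CI: ({p_hat - me:.3f}, {p_hat + me:.3f})")
```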
9.4 - Determining the Sample Size Necessary for a Desired
Margin of Error (Daniel, Sections 6.7 & 6.8, Gerstman, Sections 11.7 & 16.6)
Population Mean (μ)
In the Sampling Distributions handout we found that the interval

X̄ − 1.96·σ/√n     up to     X̄ + 1.96·σ/√n

had a 95% chance of covering the population mean. The margin of error for this interval is

Margin of Error = 1.96·σ/√n

If we wanted this to be at most E units, what sample size should we use?
This says that to obtain a 95% CI for μ with a margin of error no larger than E we should
use a sample size of

n = ( 1.96·σ / E )²
However, we cannot calculate this in practice unless we know σ, which of course we
don't; furthermore, we don't even know s, the sample standard deviation, until we
have our data in hand. Thus in order to use this result we need to plug in a "best guess"
for σ. This guess might come from:
• A pilot study where s = the sample standard deviation is calculated
• Prior studies
• An approximation based on the Range, σ ≈ Range/4. Granted, we don't know
the range until the data are collected, but we might be able to guess the
largest and smallest values we might expect to see when we collect our data.
• In general, using a σ which is too large is better than using one that is too
small.
Example: What sample size would be necessary to estimate the mean cholesterol level
for the population of females between the ages of 30 – 40 with a 95% confidence interval
that has a margin of error no larger than 5 mg/dl?
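A hedged sketch of the calculation: the guessed cholesterol range of 150 to 310 mg/dl (so σ ≈ Range/4 = 40) is an assumption made only for illustration and is not given in the handout.

```python
# Sketch: n = (1.96 * sigma / E)^2 for the cholesterol margin-of-error example.
import math

sigma_guess = (310 - 150) / 4   # Range/4 rule; the range itself is a hypothetical guess
E = 5                           # desired margin of error, mg/dl
n = (1.96 * sigma_guess / E) ** 2
print(f"n = {n:.1f}, round up to {math.ceil(n)}")   # about 245.9 -> 246
```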
Population Proportion (p)
In the handout 8 – Sampling Distributions we found that the interval

p̂ − 1.96·√( p(1 − p)/n )     up to     p̂ + 1.96·√( p(1 − p)/n )

had a 95% chance of covering the population proportion. The margin of error for this
interval is

Margin of Error = 1.96·√( p(1 − p)/n )

If we wanted this to be at most E units, what sample size should we use?
This says that to obtain a 95% CI for p with a margin of error no larger than E we should
use a sample size of

n = 1.96²·p(1 − p) / E²
However, we cannot calculate this in practice unless we know p, which of course we
don't; furthermore, we don't even know p̂, the sample proportion, until we have our
data in hand. In order to use this result we need to plug in a "best guess" for p. This
guess might come from:
• A pilot study where p̂ = the sample proportion is calculated
• Prior studies
• The worst case scenario, noting that p(1 − p) ≤ .25 and is equal to .25 when
p = .50. Using p = .50 simplifies the formula to

n = 1.96² / (4E²)

If you have no "best guess" for p this conservative approach is the one
you should take.
Example: How many patients would need to be used to estimate the success rate of a
medical procedure, if researchers initially believe the success rate is no smaller than 85%
and wish to estimate the true success rate using a 95% confidence interval with a margin
of error no larger than E = .03?
What if they wish to assume nothing about the success rate initially?
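A short sketch (not from the handout) of the two calculations asked for in this example, using the prior guess p = .85 and the conservative worst case p = .50:

```python
# Sketch: sample size for a 95% CI for a proportion with margin of error E = .03.
import math

E = 0.03
n_with_guess = 1.96**2 * 0.85 * (1 - 0.85) / E**2   # about 544.2 -> 545
n_worst_case = 1.96**2 / (4 * E**2)                 # about 1067.1 -> 1068
print(math.ceil(n_with_guess), math.ceil(n_worst_case))
```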
9.5 - Comparing Two Population Means Using Dependent
Samples or Matched Pairs (μ_1 vs. μ_2)
(Daniel, Section 7.4, Gerstman Ch. 12)
When using dependent samples each observation from population 1 has a one-to-one
correspondence with an observation from population 2. One of the most common cases
where this arises is when we measure the response on the same subjects before and after
treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes
we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race,
gender, socio-economic status, height, weight, etc... to control for the influence these
characteristics might have on the response of interest. When this is done we say that we
are “controlling for the effects of race, gender, etc...”. By using matched-pairs of subjects
we are in effect removing the effect of potential confounding factors, thus giving us a
clearer picture of the difference between the two populations being studied.
DATA FORMAT

Matched Pair      X_1i      X_2i      d_i = X_1i − X_2i
1                 X_11      X_21      d_1
2                 X_12      X_22      d_2
3                 X_13      X_23      d_3
...               ...       ...       ...
n                 X_1n      X_2n      d_n
The hypotheses are
H_o: μ_d = μ_o
H_a: μ_d > μ_o   or   H_a: μ_d < μ_o   or   H_a: μ_d ≠ μ_o

For the sample paired differences (the d_i's) find the sample mean (d̄)
and standard deviation (s_d).

We can actually hypothesize any size difference for the mean of the paired
differences that we want. For example, if we wanted to show a certain diet
resulted in at least a 10 lb. decrease in weight, then we could test whether
the paired differences d = Initial weight − After diet weight had mean
greater than 10 (H_a: μ_d > 10 lbs.).
Test Statistic for a Paired t-test

t = (d̄ − μ_o) / (s_d/√n)   ~ t-distribution with df = n − 1

Note: μ_o = the hypothesized value for the mean paired difference
100(1 − α)% CI for μ_d

d̄ ± t·(s_d/√n)     where t comes from the appropriate quantile of the t-distribution with df = n − 1.

This interval has a 100(1 − α)% chance of covering the true mean paired difference.
Example: Effect of Captopril on Blood Pressure
In order to estimate the effect of the drug Captopril on blood pressure (both systolic and
diastolic) the drug is administered to a random sample of n = 15 subjects. Each subject's
blood pressure was recorded before taking the drug and then 30 minutes after taking the
drug. The data are shown below.
Syspre – initial systolic blood pressure
Syspost – systolic blood pressure 30 minutes after taking the drug
Diapre – initial diastolic blood pressure
Diapost – diastolic blood pressure 30 minutes after taking the drug
Research Questions:
 Is there evidence to suggest that Captopril results in a systolic blood pressure
decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
 Is there evidence to suggest that Captopril results in a diastolic blood pressure
decrease of at least 5 mmHg on average in patients 30 minutes after taking it?
For each blood pressure we need to consider paired differences of the form
d_i = BPpre_i − BPpost_i. For paired differences defined this way, positive values
correspond to a reduction in blood pressure ½ hour after taking Captopril. To
answer the research questions above we need to conduct the following hypothesis tests:

H_o: μ_syspre−syspost = 10 mmHg          and          H_o: μ_diapre−diapost = 5 mmHg
H_a: μ_syspre−syspost > 10 mmHg                       H_a: μ_diapre−diapost > 5 mmHg
Below are the relevant statistical summaries of the paired differences for both blood
pressure measurements.
The t-statistics for both tests are given below:
Systolic BP
Diastolic BP
We can use the t-Probability Calculator in JMP to find the associated p-values or better
yet use JMP to conduct the entire t-test.
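For readers without JMP, here is a hedged sketch of the paired t-test in Python. The blood pressure values below are made-up placeholders, not the actual Captopril data, which are only shown in the JMP file.

```python
# Sketch: paired t-test of H_o: mu_d = 10 vs. H_a: mu_d > 10
# for d = syspre - syspost (placeholder data for illustration only).
import numpy as np
from scipy import stats

syspre = np.array([210, 169, 187, 160, 167])    # hypothetical "before" values
syspost = np.array([201, 165, 166, 157, 147])   # hypothetical "after" values
d = syspre - syspost                            # paired differences

# One-sample t-test on the differences against the hypothesized value 10
t_stat, p_value = stats.ttest_1samp(d, popmean=10.0, alternative="greater")
print(f"dbar = {d.mean():.2f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
```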
Systolic Blood Pressure
Diastolic Blood Pressure
Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to
suggest that taking Captopril will result in a mean decrease in systolic blood pressure
exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure
exceeding 5 mmHg (p = _______). Furthermore, we estimate that the mean change in
systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg,
and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
9.6 – Comparing Two Population Means Using Independent
Samples (Daniel, Sections 6.4 & 7.3, Gerstman, Ch. 12)
Example 1: Effect of Cadmium Oxide on Hemoglobin Levels in Dogs
An experiment was conducted to examine the potential effect cadmium
oxide might have on the hemoglobin levels of dogs. It is thought that cadmium oxide
exposure would lead to decreased hemoglobin levels. 10 dogs were randomly assigned to
the control group and 15 were randomly assigned to the cadmium oxide exposure group.
Research Question: Is there evidence to suggest that cadmium oxide exposure lowers the
hemoglobin level found in dogs?
To answer the question of interest we need tools for comparing the population mean
hemoglobin level for dogs not exposed to cadmium oxide vs. that for dogs that have had
cadmium oxide exposure, i.e. how does μ_control compare to μ_exposed?
Basic Idea:
Case 1 ~ Equal Population Variances/Standard Deviations
(σ_1² = σ_2² = σ², a common variance to both populations)
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are “large”.
100(1 − α)% Confidence Interval for (μ_1 − μ_2)

(X̄_1 − X̄_2) ± t·SE(X̄_1 − X̄_2)

where

SE(X̄_1 − X̄_2) = s_p·√( 1/n_1 + 1/n_2 )

and

s_p² = [ (n_1 − 1)s_1² + (n_2 − 1)s_2² ] / (n_1 + n_2 − 2)     if n_1 ≠ n_2

s_p² = ( s_1² + s_2² ) / 2     if n_1 = n_2

s_p² is called the "pooled estimate of the common variance (σ²)". The degrees of
freedom for the t-distribution is df = n_1 + n_2 − 2.

Rule of Thumb for Checking Variance Equality: If the larger sample variance is
more than twice the smaller sample variance, do not assume the variances are equal.
CI Example: Cadmium Exposure and Hemoglobin Levels
Hypothesis Testing (μ_1 vs. μ_2)
The general null hypothesis says that the two population means are equal, or equivalently
that their difference is zero. The alternative or research hypothesis can be any one of the
three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can
perform the test by using a confidence interval for the difference in the population means
discussed above.

H_o: μ_1 = μ_2   or equivalently   (μ_1 − μ_2) = 0
H_a: μ_1 > μ_2   or equivalently   (μ_1 − μ_2) > 0   (upper-tail)
or
H_a: μ_1 ≠ μ_2   or equivalently   (μ_1 − μ_2) ≠ 0   (two-tailed, USE CI!)
etc.
Test Statistic

t = [ (X̄_1 − X̄_2) − 0 ] / SE(X̄_1 − X̄_2)   ~ t-distribution with df = n_1 + n_2 − 2

where SE(X̄_1 − X̄_2) is as defined in the confidence interval section above.
Testing Example: Cadmium Exposure and Hemoglobin Levels
In JMP
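As a hedged alternative to JMP, a pooled two-sample t-test could be run in Python as sketched below. The hemoglobin values are made-up placeholders, not the study data, which appear only in the JMP output.

```python
# Sketch: pooled two-sample t-test of H_o: mu_control = mu_exposed vs.
# H_a: mu_control > mu_exposed (exposure lowers hemoglobin).
# Placeholder data for illustration only.
import numpy as np
from scipy import stats

control = np.array([15.1, 14.2, 16.3, 15.8, 14.9, 15.5, 16.0, 14.7, 15.2, 15.9])
exposed = np.array([13.8, 14.1, 12.9, 13.5, 14.4, 13.2, 12.7, 14.0, 13.9, 13.1,
                    14.6, 12.8, 13.6, 13.3, 14.2])

# equal_var=True gives the pooled (equal variance) test with df = n1 + n2 - 2
t_stat, p_value = stats.ttest_ind(control, exposed, equal_var=True,
                                  alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.4f}")
```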
EXAMPLE 2: Normal Human Body Temperatures Females vs. Males
Do men and women have the same normal body temperature? Putting this into a
statement involving parameters that can be tested:
H_o: μ_F = μ_M   or   (μ_F − μ_M) = 0
H_a: μ_F ≠ μ_M   or   (μ_F − μ_M) ≠ 0

μ_F = mean body temperature for females.
μ_M = mean body temperature for males.
Intuitive Decision
In order to determine whether or not the null or alternative hypothesis is true, you could
review the summary statistics for the variable you are interested in testing across the two
groups. Remember, these summary statistics and/or graphs are for the observations you
sampled, and to make decisions about all observations of interest, we must apply some
inferential technique (i.e. hypothesis tests or confidence intervals).
One of the best graphical displays for this situation is side-by-side boxplots. To get
side-by-side boxplots, select Analyze > Fit Y by X. Place Gender in the X box and
Temperature in the Y box. Place the mean diamonds on the boxplots and jitter the
points. The more separation there is in the mean diamonds, the more likely we are to
reject the null hypothesis (i.e. the data tend to support the alternative hypothesis).
Summary Statistics
x̄_F = 98.39     x̄_M = 98.10
s_F = .743      s_M = .699
n_F = 65        n_M = 65
Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.
Assessing Normality of the Two Sampled Populations
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu as shown below.
Normality appears to
be satisfied here.
Checking the Equality of the Population Variances
To test the equality of the population variances select Unequal Variances from the
Oneway Analysis pull-down menu.
The test is:
H_o: σ_F = σ_M
H_a: σ_F ≠ σ_M
JMP gives four different tests for examining the equality of population variances. To use
the results of these tests simply examine the resulting p-values. If any/all are less than .10
or .05, then worry about the assumption of equal variances and use the unequal variance
t-Test instead of the pooled t-Test.
p-values for testing variances
Performing the Test
To perform the two-sample t-test for independent samples:
 assuming equal population variances select the Means/Anova/Pooled t option
from Oneway-Analysis pull-down menu.
 assuming unequal population variances select t-Test from the Oneway-Analysis
pull-down menu.
Because we have no evidence
against the equality of the
population variances
assumption we will use a
pooled t-Test to compare the
population means.
Several new boxes of output will appear below the graph once the appropriate option has
been selected, some of which we will not concern ourselves with. The relevant box for us
will be labeled t Test as shown below for the mean body temperature comparison.
Because we have concluded
that the equality of variance
assumption is reasonable for
these data we can refer to the
output for the t-Test assuming
equal variances.
Summary Statistics
x̄_F = 98.39     x̄_M = 98.10
s_F = .743      s_M = .699
n_F = 65        n_M = 65

• What is the test statistic for this test?

t = (x̄_A − x̄_B) / SE(x̄_A − x̄_B) = (x̄_A − x̄_B) / [ s_p·√( 1/n_A + 1/n_B ) ]   ~ t-distribution with df = n_A + n_B − 2

where,

s_p² = [ (n_A − 1)s_A² + (n_B − 1)s_B² ] / [ (n_A − 1) + (n_B − 1) ]     or     s_p² = ( s_A² + s_B² ) / 2   when n_A = n_B
• What is the p-value?
• What is your decision for the test?
• Write a conclusion for your findings.
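As a hedged illustration (not part of the handout), the pooled t statistic and two-tailed p-value can be computed from the summary statistics above with a short Python/scipy sketch:

```python
# Sketch: pooled two-sample t statistic for the body temperature example,
# computed from the summary statistics (equal sample sizes).
import numpy as np
from scipy import stats

xbar_F, xbar_M = 98.39, 98.10
s_F, s_M = 0.743, 0.699
n_F = n_M = 65

sp2 = (s_F**2 + s_M**2) / 2                   # pooled variance (n_F = n_M)
se = np.sqrt(sp2 * (1 / n_F + 1 / n_M))       # SE of the difference in means
t_stat = (xbar_F - xbar_M) / se
df = n_F + n_M - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)     # two-tailed p-value
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```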
Construct and Interpret a 95% CI for the Difference in the
Mean Body Temperatures (μ_F − μ_M)

Summary Statistics
For the body temperature and gender example we have:
x̄_F = 98.39     x̄_M = 98.10
s_F = .743      s_M = .699
n_F = 65        n_M = 65

CI for (μ_A − μ_B):

(x̄_A − x̄_B) ± t·SE(x̄_A − x̄_B)

where SE(x̄_A − x̄_B) = s_p·√( 1/n_A + 1/n_B ), t comes from the t-distribution with df = n_A + n_B − 2, and

s_p² = [ (n_A − 1)s_A² + (n_B − 1)s_B² ] / [ (n_A − 1) + (n_B − 1) ]     or     s_p² = ( s_A² + s_B² ) / 2   when n_A = n_B

Interpretation of the CI for (μ_F − μ_M):
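A minimal sketch of the interval itself, again built only from the summary statistics above (an illustration, not the handout's JMP output):

```python
# Sketch: 95% CI for (mu_F - mu_M) using the pooled SE and the t quantile
# with df = n_F + n_M - 2.
import numpy as np
from scipy import stats

xbar_F, xbar_M = 98.39, 98.10
s_F, s_M = 0.743, 0.699
n_F = n_M = 65

sp2 = (s_F**2 + s_M**2) / 2
se = np.sqrt(sp2 * (1 / n_F + 1 / n_M))
t_crit = stats.t.ppf(0.975, df=n_F + n_M - 2)   # about 1.979
diff = xbar_F - xbar_M
print(f"95% CI: ({diff - t_crit * se:.3f}, {diff + t_crit * se:.3f})")
```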
Case 2 ~ Unequal Population Variances/Standard Deviations (σ_1 ≠ σ_2)
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
(This can be formally tested, or use the rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are “large”.
100(1 − α)% Confidence Interval for (μ_1 − μ_2)

(X̄_1 − X̄_2) ± t·SE(X̄_1 − X̄_2)

where

SE(X̄_1 − X̄_2) = √( s_1²/n_1 + s_2²/n_2 )

and

df = ( s_1²/n_1 + s_2²/n_2 )² / [ (s_1²/n_1)²/(n_1 − 1) + (s_2²/n_2)²/(n_2 − 1) ]     rounded down to the nearest integer

The t-quantiles are the same as those we have seen previously.
Hypothesis Testing

Test Statistic

t = [ (X̄_1 − X̄_2) − 0 ] / SE(X̄_1 − X̄_2)   ~ t-distribution with df = (see formula above)

where SE(X̄_1 − X̄_2) is as defined in the confidence interval section above.
Example: Cell Radii of Malignant vs. Benign Breast Tumors
In your previous work with these data you noticed that the radii of malignant breast
tumor cells were generally larger than the radii of benign breast tumor cells. Assuming
the researchers initially hypothesized that cancerous breast tumor cells have larger radii
than non-cancerous cells, conduct a test to see if this is supported by these data.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of
the benign tumors. The summary statistics support this, with sample means/medians of
roughly 17 and 12 units, respectively. The 95% CIs for the mean cell radius for the two
tumor groups do not overlap, which further supports that a significant difference in the
cell radii exists.
Formally Testing the Equality of Population Variances (see Section 7.8)

H_o: σ_1² = σ_2²          or equivalently          H_o: σ_1 = σ_2
H_a: σ_1² ≠ σ_2²                                   H_a: σ_1 ≠ σ_2

In JMP
Test Statistic

F = max( s_1²/s_2² , s_2²/s_1² )   which has an F-distribution with
numerator df = n_1 − 1 and denominator df = n_2 − 1 if s_1² > s_2²;
the degrees of freedom are reversed if s_1² < s_2².

If F is large then one variance is several times larger than
the other and we should reject the null in favor of the
alternative. There is a separate F-table for each level of
significance. If our test statistic value exceeds the value in
the table for the appropriate level of significance and degrees of
freedom, we reject the null hypothesis. BETTER TO JUST
USE JMP!!!
Because we conclude that the population variances are unequal, we should use the non-pooled version of the two-sample t-test. No one does this by hand, so we will use JMP.
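For completeness, a hedged sketch of the non-pooled (Welch) test in Python is shown below. The radius values are made-up placeholders, not the breast tumor data set referenced in the example.

```python
# Sketch: Welch (unequal variance) two-sample t-test of
# H_a: mu_malignant > mu_benign. Placeholder data for illustration only.
import numpy as np
from scipy import stats

malignant = np.array([17.9, 20.6, 16.8, 15.4, 19.2, 18.1, 14.9, 16.6])
benign = np.array([12.3, 11.8, 13.4, 12.9, 10.7, 12.1, 13.0, 11.5])

# equal_var=False gives the unequal-variance test with the adjusted df
t_stat, p_value = stats.ttest_ind(malignant, benign, equal_var=False,
                                  alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.4f}")
```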
Conclusion:
9.7 - Comparing Two Population Proportions Using
Independent Samples (p_1 vs. p_2)
(Daniel, Section 7.6, Gerstman, Ch. 17)
100(1 − α)% Confidence Interval for (p_1 − p_2)

(p̂_1 − p̂_2) ± z·SE(p̂_1 − p̂_2)     (provided n_1 & n_2 are "large")

where

SE(p̂_1 − p̂_2) = √( p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2 )

"Large" sample sizes: Both samples should be larger than 25, and both samples should
have more than 5 "successes" and more than 5 "failures".

Confidence Level        z
95% (α = .05)           1.96
90% (α = .10)           1.645
99% (α = .01)           2.576
Hypothesis Testing

Test Statistic

z = [ (p̂_1 − p̂_2) − 0 ] / SE(p̂_1 − p̂_2)   ~ standard normal dist. provided n_1, n_2 are "large" (see above)

where, for the test, the standard error uses the pooled sample proportion:

SE(p̂_1 − p̂_2) = √( p·q·( 1/n_1 + 1/n_2 ) )

where

p = (# of successes in combined sample) / (n_1 + n_2) = ( n_1·p̂_1 + n_2·p̂_2 ) / (n_1 + n_2)
q = 1 − p
Example: In a study conducted to investigate the non-clinical factors associated with
the method of surgical treatment received for early-stage breast cancer, some patients
underwent a modified radical mastectomy while others had a partial mastectomy
accompanied by radiation therapy. We are interested in determining whether the age of
the patient affects the type of treatment she receives. In particular, we want to know
whether the proportions of women under 55 are identical in the two treatment groups.
A sample of n = 658 women who underwent a partial mastectomy and subsequent
radiation therapy contains 292 women under 55, which is a sample percentage of 44.4%.
In another, independently drawn sample of n = 1580 women who received a modified
radical mastectomy, 397 women were under 55, which is a sample percentage of 25.1%.
Conduct a test comparing the proportion of women in each group under the age of 55
and construct a 95% confidence interval for the difference in these two proportions.
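A minimal Python sketch of the requested test and interval, using the counts given in the example (an illustration, not part of the handout):

```python
# Sketch: two-proportion z test and 95% CI for the difference in the
# proportions of women under 55 (partial vs. modified radical mastectomy).
import numpy as np
from scipy import stats

x1, n1 = 292, 658     # partial mastectomy + radiation, under 55
x2, n2 = 397, 1580    # modified radical mastectomy, under 55
p1, p2 = x1 / n1, x2 / n2

# Test of H_o: p1 = p2 using the pooled proportion in the SE
p_pool = (x1 + x2) / (n1 + n2)
se_test = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_test
p_value = 2 * stats.norm.sf(abs(z))            # two-tailed p-value

# 95% CI for (p1 - p2) using the unpooled SE
se_ci = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = (p1 - p2) - 1.96 * se_ci, (p1 - p2) + 1.96 * se_ci
print(f"z = {z:.2f}, p = {p_value:.2e}, 95% CI: ({lo:.3f}, {hi:.3f})")
```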
Fisher’s Exact Test for Comparing Two Proportions (in JMP)
Enter these data as you would for setting up a 2 X 2 contingency table.
In JMP, select Analyze > Fit Y by X and place Surgery in the X box and Age in the Y.
The following output from JMP is obtained.
The results of Fisher's Exact Test are always included in the JMP output whenever we are working with a
2 × 2 contingency table.
The three p-values given are for testing the following:
(1) Left, p-value = 1.000 is for testing if the proportion of women under age 55 is larger for the modified
radical mastectomy group than the partial mastectomy group. This is clearly not supported, as the p-value
>> .05. Obviously we would not conclude this when only 25% of women in the mod. rad. group were
under age 55 compared to 44.4% in the partial mastectomy group.
(2) Right, p-value < .0001 is for testing if the proportion of women under age 55 is larger for the partial
mastectomy group than the modified radical mastectomy group. The fact that this p-value is highly significant
suggests that the proportion of women under age 55 in the partial mastectomy group is indeed greater than the
proportion under 55 in the modified radical mastectomy group. This was the research hypothesis.
(3) 2-Tail, p-value < .0001 is for testing if the proportion of women under age 55 differs between the two
surgery groups. The fact that this p-value is highly significant suggests that the proportion of women under
age 55 is not the same for both surgery groups.
Example 2: Low Birth Weight and Smoking
These data come from a study looking at the effects of smoking during pregnancy on
birth weight. Amongst the 381 non-smokers in the study, 13 had babies with low birth
weight, while amongst the 299 mothers who smoked during pregnancy, 28 had babies
with low birth weight. Is there evidence to suggest that the proportion of babies born
with low birth weight is greater for mothers who smoked during pregnancy?
                    Normal Birth    Low Birth
Smoking Status      Weight          Weight          Row Totals
Nonsmoker           368 (96.59%)    13 (3.41%)      381
Smoker              271 (90.64%)    28 (9.36%)      299
Column Totals       639             41              680
Hypothesis Test:
1) H_o:
   H_a:
2) Choose α
3) Compute test statistic
4) Find p-value
5) Make decision and interpret
Construct and interpret a 95% CI for (p_smoker − p_nonsmoker)
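A hedged Python sketch of the z test, the 95% CI, and Fisher's exact test for these counts (an illustration of the calculations, not the handout's JMP output):

```python
# Sketch: smoking / low birth weight example.
import numpy as np
from scipy import stats

x_s, n_s = 28, 299     # smokers with low birth weight babies
x_n, n_n = 13, 381     # non-smokers with low birth weight babies
p_s, p_n = x_s / n_s, x_n / n_n

# z test of H_o: p_smoker = p_nonsmoker vs. H_a: p_smoker > p_nonsmoker
p_pool = (x_s + x_n) / (n_s + n_n)
se_test = np.sqrt(p_pool * (1 - p_pool) * (1 / n_s + 1 / n_n))
z = (p_s - p_n) / se_test
p_value = stats.norm.sf(z)                     # upper-tail p-value

# 95% CI for (p_smoker - p_nonsmoker) using the unpooled SE
se_ci = np.sqrt(p_s * (1 - p_s) / n_s + p_n * (1 - p_n) / n_n)
lo, hi = (p_s - p_n) - 1.96 * se_ci, (p_s - p_n) + 1.96 * se_ci

# Fisher's exact test: rows = (smoker, nonsmoker), cols = (low, normal weight)
table = [[28, 271], [13, 368]]
odds_ratio, fisher_p = stats.fisher_exact(table, alternative="greater")
print(f"z = {z:.2f}, p = {p_value:.4f}, CI: ({lo:.3f}, {hi:.3f}), "
      f"Fisher p = {fisher_p:.4f}")
```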
Fisher’s Exact Test Results from JMP
Conclusion:
Find and interpret the RR and OR for low birth weight associated with smoking
during pregnancy (Note: this was on Assignment 3).
9.8 - Confidence Intervals for the RR and OR
(Daniel, Section 12.7, Gerstman Ch.18)
                        Disease Present     Disease Absent
Risk factor present          a                   b
Risk factor absent           c                   d

OR = (a·d) / (b·c)          RR = [ a/(a + b) ] / [ c/(c + d) ]
95% CI for OR:
1) Take the natural logarithm of OR to obtain ln(OR).
2) Compute SE(ln(OR)) = √( 1/a + 1/b + 1/c + 1/d )
3) Find ln(OR) ± 1.96·SE(ln(OR)) to obtain (LCL, UCL).
4) The 95% CI for OR is then given by ( e^LCL , e^UCL ).

Smoking and Birthweight Example:

                    Normal Birth    Low Birth
Smoking Status      Weight          Weight          Row Totals
Nonsmoker           368 (96.59%)    13 (3.41%)      381
Smoker              271 (90.64%)    28 (9.36%)      299
Column Totals       639             41              680
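A short sketch of steps 1–4 applied to the table above, with smoking as the risk factor and low birth weight as the outcome (an illustration, not the handout's worked answer):

```python
# Sketch: odds ratio and 95% CI for low birth weight, smokers vs. non-smokers.
import numpy as np

a, b = 28, 271    # smokers: low / normal birth weight
c, d = 13, 368    # non-smokers: low / normal birth weight

OR = (a * d) / (b * c)
se_ln_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
lcl, ucl = np.log(OR) - 1.96 * se_ln_or, np.log(OR) + 1.96 * se_ln_or
print(f"OR = {OR:.2f}, 95% CI: ({np.exp(lcl):.2f}, {np.exp(ucl):.2f})")
```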
CI for RR (not in text):
1) Take the natural logarithm of RR to obtain ln(RR).
2) Compute SE(ln(RR)) = √( b/[a(a + b)] + d/[c(c + d)] )
3) Find ln(RR) ± 1.96·SE(ln(RR)) to obtain (LCL, UCL).
4) The 95% CI for RR is then given by ( e^LCL , e^UCL ).

Smoking and Birthweight example:

                    Normal Birth    Low Birth
Smoking Status      Weight          Weight          Row Totals
Nonsmoker           368 (96.59%)    13 (3.41%)      381
Smoker              271 (90.64%)    28 (9.36%)      299
Column Totals       639             41              680
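The same kind of sketch works for the relative risk, again treating smoking as the risk factor and low birth weight as the outcome (an illustration only):

```python
# Sketch: relative risk and 95% CI for low birth weight, smokers vs. non-smokers.
import numpy as np

a, b = 28, 271    # smokers: low / normal birth weight
c, d = 13, 368    # non-smokers: low / normal birth weight

RR = (a / (a + b)) / (c / (c + d))
se_ln_rr = np.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
lcl, ucl = np.log(RR) - 1.96 * se_ln_rr, np.log(RR) + 1.96 * se_ln_rr
print(f"RR = {RR:.2f}, 95% CI: ({np.exp(lcl):.2f}, {np.exp(ucl):.2f})")
```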