5 – Review of Basic Statistical Inference
5.1 – Review of the Sampling Distribution Concept with
Applications and Examples
When we take a sample of size n from a population and calculate summary statistics like the
sample mean (X̄), the sample median (Med), the sample variance (s²), the sample
standard deviation (s), or the sample proportion (p̂), we must realize that these quantities
will _________________________________________________________________
and hence are themselves ________________________________________.
Any random variable in statistics has a probability distribution that determines the
likelihood of certain values of the random variable being obtained. The distribution of a
summary statistic, e.g. the sample mean (X̄), is called the
______________________________________.
In this handout we explore the sampling distributions of the sample mean (X̄) and the
sample proportion (p̂).
5.1.1 - Sampling Distribution of X̄ and Applications
The sample mean (X̄) is a random quantity that varies from sample to sample. The
probability distribution the sample mean follows is called the sampling distribution of X̄.
The sampling distribution demo I showed in class is found at the following web address:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
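The demo can also be imitated in a few lines of Python (a sketch for illustration; the exponential population, sample size, and seed here are arbitrary choices, not from the handout):

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential (deliberately non-normal), mean = 10, SD = 10
pop_mean, n, reps = 10.0, 30, 10_000

# Draw many samples of size n and record each sample mean
sample_means = rng.exponential(scale=pop_mean, size=(reps, n)).mean(axis=1)

# CLT: the sample means center at the population mean,
# with standard deviation close to sigma/sqrt(n)
print(sample_means.mean())       # near 10
print(sample_means.std(ddof=1))  # near 10/sqrt(30)
```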
The Central Limit Theorem (CLT) tells us about the sampling distribution of
the sample mean (X̄). There is also a version (which we will see later) that tells us about
the sampling distribution of the sample proportion (p̂).
The CLT for X says the following:
1.
2.
3. The sampling distribution will be ___________ if either of the conditions
below is met:
•
or if
•
We now consider applications of the central limit theorem (CLT).
Applications to Decision Making
Example 1: Cholesterol levels of adult males (50-60 yrs. old)
The mean blood cholesterol level of adult males (50-60 yrs. old) is 200 mg/dl with a
standard deviation of 30 mg/dl. Assume also that blood cholesterol levels are
approximately normally distributed in this population.
a) What can be said about the sampling distribution of the sample mean ( X ) when
drawing a sample of size n = 25 from this population?
b) Give a range of values that we would expect the sample mean ( X ) to fall
approximately 95% of the time.
c) Suppose we took a sample of adult males between the ages of 50 – 60 who are also
strict vegetarians and obtained a sample mean of X̄ = 182 mg/dl. Does this provide
evidence that the subpopulation of vegetarians has a lower mean cholesterol level than
the greater population of men in this age group? Explain.
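As an arithmetic check on this example, the quantities in (a)–(c) can be computed directly; this Python/scipy sketch assumes only the stated values μ = 200, σ = 30, n = 25:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 200, 30, 25
se = sigma / sqrt(n)                   # standard error = 30/5 = 6

# (b) central 95% range for the sample mean
lo, hi = norm.ppf([0.025, 0.975], loc=mu, scale=se)

# (c) how unusual is x-bar = 182 if vegetarians are like everyone else?
z = (182 - mu) / se                    # z = -3
p = norm.cdf(182, loc=mu, scale=se)    # P(X-bar <= 182)
print(lo, hi, z, p)
```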
Example 2: S/R Ratio
The objectives of a study by Skjelbo et al. (1996) were to examine (a) the relationship
between chloroguanide metabolism and efficacy in malaria prophylaxis and (b) the
mephenytoin metabolism and its relationship to chloroguanide metabolism among
Tanzanians. From information provided by urine specimens from the n = 216 subjects,
the investigators computed the ratio of unchanged S-mephenytoin to R-mephenytoin (S/R
ratio).
Is there evidence that the S/R ratio of vaccinated Tanzanians is greater than .275?
Confidence Intervals for the Population Mean (μ)
Example: Suppose we are trying to estimate the birth weight of infants born to women
who smoke during pregnancy. A sample of n = 73 women who smoked during
pregnancy was obtained, and the birth weight of each baby was recorded, yielding a
sample mean of X̄ = 6.08 lbs.
This is called a _____________________ for the population mean (μ) because it yields a
single value for this unknown quantity.
A better estimate might be 6.08 lbs. give or take _____ lbs., i.e. ______ up to _______.
This is called an __________________________ as it gives a range or interval of
plausible values for the population mean.
How do we know if this is a good interval estimate?
What properties should a good interval estimate have?
1)
2)
The central limit theorem states that if our sample size (n) is sufficiently large, then

X̄ ~ N(μ, σ/√n), which also says, by standardizing, that Z = (X̄ − μ)/(σ/√n) ~ N(0,1).
This means that when we collect our data the probability our observed sample mean will
fall within two standard errors of the mean is approximately .95 or a 95% chance, or
more precisely
P(−2 ≤ Z ≤ 2) = P(−2 ≤ (X̄ − μ)/(σ/√n) ≤ 2) = P(μ − 2·σ/√n ≤ X̄ ≤ μ + 2·σ/√n) ≈ .9544
To make this 95% exactly, we simply use 1.96 in place of 2.00 in the expression above,
because P(-1.96 < Z < 1.96) = .9500. For 99% confidence we use ________ and for 90%
we use ________ in place of 1.96.
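These multipliers come from the standard normal quantile function; a quick scipy check (illustration only):

```python
from scipy.stats import norm

# Two-sided multipliers: put alpha/2 in each tail of N(0,1)
zs = {conf: norm.ppf(1 - (1 - conf) / 2) for conf in (0.90, 0.95, 0.99)}
for conf, z in zs.items():
    print(f"{conf:.0%} confidence: z = {z:.3f}")
```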
Starting with the statement,

P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = .9500

we can perform similar algebraic manipulations to those above to isolate the population
mean in the middle of the inequality instead. By doing this we will obtain an interval
that has an approximate 95% chance of covering the true population mean (μ).
This says that the interval from X̄ − 1.96·σ/√n up to X̄ + 1.96·σ/√n has a 95% chance of
covering the true population mean μ. This interval is simply the sample mean plus or
minus roughly two standard errors. However, this interval cannot be calculated in
practice! WHY?
A “simple fix” to this would be to replace ____ by the estimated standard deviation from
our data _____.
The problem with our “simple fix” is that the distribution of (X̄ − μ)/(s/√n) is not a
standard normal, i.e. N(0,1)!!!
FACT: If the population we are sampling from is approximately normal then
(X̄ − μ)/(s/√n) has a t-distribution with degrees of freedom df = n – 1.
What does a t-distribution look like?
Facts about the t-distribution:
•
•
•
Examples: Using the t-table to find confidence intervals
a) n = 20 and 95% confidence t =
b) n = 20 and 99% confidence t =
c) n = 50 and 90% confidence t =
d) n = 10 and 95% confidence t =
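After looking these up in the t-table, they can be verified with the t quantile function; a scipy sketch:

```python
from scipy.stats import t

# Two-sided multiplier: upper-tail area (1 - conf)/2, df = n - 1
def t_mult(n, conf):
    return t.ppf(1 - (1 - conf) / 2, df=n - 1)

for n, conf in [(20, 0.95), (20, 0.99), (50, 0.90), (10, 0.95)]:
    print(f"n = {n}, {conf:.0%}: t = {t_mult(n, conf):.3f}")
```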
The basic form of most confidence intervals is:
(estimate) ± (table value) × (SE of estimate)
where (table value) × (SE of estimate) is the MARGIN OF ERROR.
General Form for a Confidence Interval for the Mean
For the population mean we have,
X̄ ± (t table value) × SE(X̄)   or   X̄ ± t · s/√n
The appropriate columns in t-distribution table for the different confidence intervals are
as follows:
90% Confidence look in the .05 column (if n is “large” we can use 1.645)
95% Confidence look in the .025 column (if n is “large” we can use 1.960)
99% Confidence look in the .005 column (if n is “large” we can use 2.576)
Example: Suppose we are trying to estimate the birth weight of infants born to women
who smoke during pregnancy. A sample of n = 73 women from Baltimore who smoked
during pregnancy was obtained, and the birth weight of each baby was recorded, yielding
a sample mean of X̄ = 6.08 lbs. with a sample standard deviation of s = 1.45 lbs.
a) Use this information to find a 95% CI for the mean birth weight of infants born to
mothers who smoked during pregnancy, assuming that birth weights for this
population are normally distributed.
b) Suppose a sample of n = 113 Baltimore mothers who did not smoke during pregnancy
was obtained and a sample mean birth weight of X̄ = 6.71 lbs. with a standard deviation
of s = 1.66 lbs. was obtained. Find a 95% confidence interval for the mean birth weight
of infants born to nonsmoking mothers.
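For checking the arithmetic in (a) and (b), here is a sketch of the CI computation in Python (the handout works these by table; scipy is an assumption):

```python
from math import sqrt
from scipy.stats import t

def t_ci(xbar, s, n, conf=0.95):
    """Two-sided CI: xbar +/- t * s/sqrt(n), with df = n - 1."""
    tval = t.ppf(1 - (1 - conf) / 2, df=n - 1)
    me = tval * s / sqrt(n)            # margin of error
    return xbar - me, xbar + me

print(t_ci(6.08, 1.45, 73))    # smokers
print(t_ci(6.71, 1.66, 113))   # nonsmokers
```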
c) Does this interval in conjunction with the interval obtained for mothers who smoked
during pregnancy provide evidence that infants born to smoking mothers have a lower
mean birth weight?
5.1.2 – Sampling Distribution of p̂ and Applications
Approximate Sampling Distribution of the Sample Proportion ( p̂ )
As with the sample mean ( X ) the sample proportion ( p̂ ) is also random, as it too varies
from sample to sample. The sampling distribution of p̂ has the following properties:
1. The mean of the sampling distribution is the population proportion (p)
2. The standard deviation of the sampling distribution, or the standard error of
p̂, is given by:

SE(p̂) = √( p(1 − p)/n )   where p = population proportion (unknown) and n = sample size
3. The sampling distribution is approx. normal provided n is “sufficiently large”:
np ≥ 5 and n(1 − p) ≥ 5
* Note: some recommend using 10 in place of 5.
Note: When estimating proportions, large sample sizes are generally used (e.g. n > 100).
Exact Sampling Distribution of the Sample Proportion ( p̂ )
The sample proportion comes from a binomial probability experiment. A binomial
probability experiment satisfies the following conditions
1. There are a fixed number of n trials carried out.
2. The outcome of a given trial is either a “success” or “failure”.
3. The probability of success (p) remains constant from trial to trial.
P(success) = p and P(failure) = 1 − p = q
4. The trials are independent, i.e. the outcome of a trial is not affected by the
outcome of any other trial.
A binomial random variable X is defined to be the number of “successes” in n independent
trials where P(success) = p is constant. The sample proportion is defined in terms of
the number of successes observed in our random sample of size n: p̂ = X/n.
Binomial Probability Function

P(X = x) = C(n, x) · p^x · q^(n−x) = [n! / (x!(n − x)!)] · p^x · q^(n−x),   x = 0, 1, ..., n

The coefficient C(n, x) in front denotes the number of ways to obtain x successes in n trials.
The binomial distribution function gives P(X = x). We can use the binomial distribution
in making inferences about the true population proportion (or binomial probability of
“success”) (p). For example, suppose we are trying to determine if a coin is fair, i.e. p =
P(Head) = .50. What will have to happen in order for you to believe that the coin is biased
in favor of landing heads, i.e. p = P(Head) > .50?
USING THE SAMPLING DISTRIBUTION TO MAKE INFERENCES ABOUT
THE POPULATION PROPORTION (both the approximate and exact approaches)
Example: New Method for Treating a Certain Illness/Disease
Suppose the current treatment method for a certain disease has a 70% success rate. A new
method has been proposed that will hopefully have a higher success rate. The new
method is administered to a sample of n = 50 patients and 40 have successful treatment.
Can we conclude on the basis of this result that the new method has a higher success
rate?
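Both the normal-approximation test and the exact binomial test for this example can be sketched in Python (illustration; the handout uses JMP):

```python
from math import sqrt
from scipy.stats import norm, binom

n, x, p0 = 50, 40, 0.70
phat = x / n                                  # 0.80

# Normal approximation: z-test for H0: p = .70 vs Ha: p > .70
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
p_approx = norm.sf(z)                         # upper-tail p-value

# Exact: P(X >= 40) under Binomial(50, .70)
p_exact = binom.sf(x - 1, n, p0)
print(z, p_approx, p_exact)
```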
Using the Normal Approximation to the Sampling Distribution
Using Normal Probability Calculator in JMP
Using the Exact Binomial Sampling Distribution
Here n = 50, the observed number of successes is X = 40, and if the new method is not
better, then the hypothesized proportion (i.e. the value assumed true initially) is p = .70.
CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION (p)
Motivating Example: In a study conducted to investigate the non-clinical factors
associated with the method of surgical treatment received for early-stage breast cancer,
some patients underwent a modified radical mastectomy while others had a partial
mastectomy accompanied by radiation therapy. We are interested in determining whether
the age of the patient affects the type of treatment she receives. In particular, we want to
know whether the proportions of women under 55 are identical in the two treatment
groups.
A sample of n = 658 women who underwent a partial mastectomy and subsequent
radiation therapy contains 292 women under 55, which is a sample percentage of 44%.
A better estimate might be 44% give or take 4%, i.e. estimating that the actual percentage
of women who receive this form of treatment under the age of 55 is between 39% and
48%. This is called an “interval estimate”, as it gives a range or interval of plausible
values for the population proportion/percentage. As with the population mean discussed
earlier, we wish this interval to be narrow enough to provide useful information about
this unknown percentage, yet have a high probability or chance of covering the actual
percentage of women under 55 amongst those opting for this course of treatment for
early-stage breast cancer.
The central limit theorem for proportions states that if our sample size (n) is sufficiently
large, then p̂ ~ N( p, √(p(1 − p)/n) ). This means that when we take our sample and find our
sample proportion, p̂, the probability our observed sample proportion will fall within
approximately two standard errors of the population proportion is roughly 95%, or more
precisely

P( p − 1.96·√(p(1 − p)/n) ≤ p̂ ≤ p + 1.96·√(p(1 − p)/n) ) = .9500   (Recall: P(−1.96 ≤ Z ≤ 1.96) = .9500)
Starting with this statement we can perform some algebraic manipulations to isolate the
population proportion, p, in the middle of the inequality above. By doing this we will see
that the resulting interval will have a 95% chance of covering the true population
proportion (p).
After a Wonderfully Simple Mathematical Derivation:

This says that the interval from p̂ − 1.96·√(p(1 − p)/n) up to p̂ + 1.96·√(p(1 − p)/n) has a 95%
chance of covering the true population proportion p. This interval is simply the sample
proportion plus or minus roughly two standard errors, i.e. p̂ ± 1.96 × SE(p̂). However,
this interval cannot be calculated in practice! WHY?
A simple fix is to replace ______ by our sample-based estimate ________. Provided the
sample size is sufficiently large, the resulting interval will still have an approximate 95%
chance of covering the true population proportion. This gives what we should technically
call the estimated standard error of the proportion, but when we say “standard error of the
proportion” it is assumed this estimated version is the one we are talking about, because in
reality the population proportion p is NOT known. If p were known we would not be
conducting a study in the first place!
General Form for a C for Population Proportion (p)
estimate  (table value)  (estimated standard error of estimate)
pˆ  (normal table value) 
Margin of Error  z
pˆ (1  pˆ )
n
or
pˆ  z
pˆ (1  pˆ )
n
pˆ (1  pˆ )
n
Normal Table Values:
95% Confidence we use z = 1.96
90% Confidence we use z = 1.645
99% Confidence we use z = 2.576
Again we see the confidence interval has the basic form:

ESTIMATE ± (TABLE VALUE) × (STANDARD ERROR OF THE ESTIMATE)

where (TABLE VALUE) × (STANDARD ERROR) is the MARGIN OF ERROR.
In other words, we take our estimate plus or minus a certain number of standard errors to
obtain the confidence interval, i.e. plus or minus the margin of error.
Example: Early-Stage Breast Cancer Treatment Method and Age (cont’d)
A sample of n = 658 women who underwent a partial mastectomy and subsequent
radiation therapy contains 292 women under 55, which is a sample percentage of 44.4%.
Find a 95% CI for the true proportion of women under 55 in this population.
In a sample of n = 1580 women who received a modified radical mastectomy, 397 women
were under 55, which is a sample percentage of 25.1%. Find a 95% CI for the true
proportion of women under 55 in this population.
Do these intervals suggest that the proportion of women under the age of 55 differs
significantly for these two courses of treatment of early-stage breast cancer?
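As a numeric check, a sketch of both intervals in Python (the Wald interval above; scipy is an assumption):

```python
from math import sqrt
from scipy.stats import norm

def prop_ci(x, n, conf=0.95):
    """Wald interval: phat +/- z * sqrt(phat(1-phat)/n)."""
    phat = x / n
    z = norm.ppf(1 - (1 - conf) / 2)
    me = z * sqrt(phat * (1 - phat) / n)
    return phat - me, phat + me

print(prop_ci(292, 658))    # partial mastectomy + radiation
print(prop_ci(397, 1580))   # modified radical mastectomy
```

Since the two intervals do not overlap, the proportions of women under 55 appear to differ between the two treatment groups.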
One-Sided Confidence Intervals
One-Sided CI’s for the Population Mean (μ)
Lower Bound for μ:  X̄ − t · s/√n
Upper Bound for μ:  X̄ + t · s/√n
where t comes from the t-distribution with df = n – 1. The appropriate columns in Table
A.4 for the different confidence intervals are as follows:
90% Confidence look in the .10 column
95% Confidence look in the .05 column
99% Confidence look in the .01 column
One-Sided CI’s for the Population Proportion (p)
Lower Bound for p:  p̂ − z·√( p̂(1 − p̂)/n )
Upper Bound for p:  p̂ + z·√( p̂(1 − p̂)/n )
where z comes from the standard normal distribution. The appropriate values for the
different confidence intervals are as follows:
90% Confidence use z = 1.280
95% Confidence use z = 1.645
99% Confidence use z = 2.330
5.2 – Review of the Basic Steps in a Hypothesis Test
Several of the examples in the different sections of 5.1 above are really examples of a
hypothesis test. In this section we review the formal process of carrying out a hypothesis
test. As the course progresses we relax the formality of the process, but keep in mind the
steps listed below are always there even if they are not written out or discussed explicitly.
Before we look at hypothesis testing for a single population mean (μ) we will examine
the five basic steps in a hypothesis test and introduce some important terminology and
concepts.
Steps in a Hypothesis Test
1.
2.
3.
4.
5.
* 6.
5.3 - Hypothesis Test for a Single Population Mean (μ)

Null Hypothesis (Ho)    Alternative Hypothesis (Ha)    p-value area
μ ≤ μo                  μ > μo                         Upper-tail
μ ≥ μo                  μ < μo                         Lower-tail
μ = μo                  μ ≠ μo                         Two-tailed (perform test using CI for μ)
Test Statistic (in general)
In general the basic form of most test statistics is given by:

Test Statistic = [(estimate) − (hypothesized value)] / SE(estimate)    (think “z-score”)

which measures the discrepancy between the estimate from our sample and the
hypothesized value under the null hypothesis.
Intuitively, if our sample-based estimate is “far away” from the hypothesized value
assuming the null hypothesis is true, we will reject the null hypothesis in favor of the
alternative or research hypothesis. Extreme test statistic values occur when our estimate
is a large number of standard errors away from the hypothesized value under the null.
The p-value is the probability, that by chance variation alone, we would get a test statistic
as extreme or more extreme than the one observed assuming the null hypothesis is true.
If this probability is “small” then we have evidence against the null hypothesis, in other
words we have evidence to support our research hypothesis.
Type I and Type II Errors (α & β)

                              Truth
Decision                 Ho true        Ha true
Reject Ho
Fail to Reject Ho
Example: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA
EPA guidelines suggest that drinking water should not have a perchlorate level exceeding
4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface,
and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road
flares in the Morgan Hill area from 1955 to 1996, is the source of the perchlorate
contamination in this area.
Suppose you are a resident of the Morgan Hill area; which alternative do you want well
testers to use and why?
Ho: μ ≤ 4 ppb        or        Ho: μ ≥ 4 ppb
Ha: μ > 4 ppb                  Ha: μ < 4 ppb
Test Statistic for Testing a Single Population Mean (μ) ~ (t-test)

t = (X̄ − μo) / SE(X̄)   or   t = (X̄ − μo) / (s/√n)   ~ t-distribution with df = n – 1.
Assumptions:
When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when
our sample size is sufficiently “large”. How large the sample size needs to be
depends upon how “non-normal” the population distribution is.
Example 1: Length of Stay in a Nursing Home
In the past the average number of nursing home days required by elderly patients before
they could be released to home care was 17 days. It is hoped that a new program will
reduce this figure. Do these data support the research hypothesis?
3  5  12  7  22  6  2  18  9  8  20  15  3  36  38  43
• Normality does not appear to be satisfied here!
• Notice the CI for the mean length of stay is (8.38 days, 22.49 days).
Hypothesis Test:
1) Ho:
   HA:
2) Choose α
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
To perform a t-test in JMP, select Test Mean from the LOS pull-down menu and enter the
value for the mean under the null hypothesis, 17.0 in this example.
Conclusion:
In JMP
The hypothesized mean is the value for the population mean under the
null hypothesis. If normality is questionable or the sample size is small,
a nonparametric test may be more appropriate. We will discuss
nonparametric tests later in the course.
The graphic on the left is obtained by
selecting P-value animation from the pulldown menu next to Test Mean=value.
* click Low Side for a lower-tail test, similarly for the other two types of alternatives.
5.4 - Determining the Sample Size Necessary for a Desired
Margin of Error When Estimating the Population Mean (μ)
In the discussion above we found that the interval
X̄ − 1.96·σ/√n  up to  X̄ + 1.96·σ/√n
had a 95% chance of covering the population mean. The margin of error for this interval
is
Margin of Error (E) = 1.96·σ/√n
If we wanted this to be at most E units, what sample size should we use?
This says that to obtain a 95% CI for μ with a margin of error no larger than E we should
use a sample size of

n = (1.96·σ / E)²
However, we cannot calculate this in practice unless we know σ, which of course we
don’t; furthermore, we don’t even know s, the sample standard deviation, until we
have our data in hand. Thus in order to use this result we need to plug in a “best guess”
for σ. This guess might come from:
• A pilot study where s = sample standard deviation is calculated
• Prior studies
• An approximation based on the range, σ ≈ Range/4. Granted, we don’t know
the range until the data are collected, but we might be able to guess the
largest and smallest values we might expect to see when we collect our data.
• In general, using a σ which is too large is better than using one that is too
small.
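A sketch of the sample-size formula in Python; the σ = 13 planning guess below (ages spanning roughly 18 to 70, so Range/4 ≈ 13) is a hypothetical illustration, not a value from the handout:

```python
from math import ceil

def n_for_mean(sigma, E, z=1.96):
    """n = (z * sigma / E)^2, rounded up to the next integer."""
    return ceil((z * sigma / E) ** 2)

# Hypothetical planning guess: sigma ~ Range/4 = (70 - 18)/4 = 13, E = 2 years
print(n_for_mean(sigma=13, E=2))
```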
Example: What sample size would be necessary to estimate the mean age of DUI
offenders in MN with a 95% confidence interval that has a margin of error no larger than
2 years?
5.5 - Power and Sample Size for Testing the Population Mean
In designing a study, we oftentimes have prior knowledge about how large a difference
or effect we want to be able to detect as significant. We can use this knowledge to help
us determine the sample size to use in conducting our study.
Without going through the derivation, the formula we use for determining n is

n = [ (z_α + z_β)·σ / (μ1 − μo) ]²    (round this up to the next integer value)
where,
z_α = standard normal value corresponding to α, the Type I error probability.
For one-tailed hypotheses these values are: z.01 = 2.33, z.05 = 1.645, z.10 = 1.28
For two-sided alternatives these values are: z.01 = 2.576, z.05 = 1.96, z.10 = 1.645
z_β = standard normal value corresponding to β, the Type II error probability.
z.01 = 2.33, z.05 = 1.645, z.10 = 1.28, etc., basically the one-tailed values above.
σ = conservative “guess” for the true population standard deviation (σ ≈ Range/4)
μ1 = population mean assuming the alternative hypothesis is true
μo = population mean assuming the null hypothesis is true
(μ1 − μo) = difference we wish to be able to detect as significant with probability
(1 − β) using a significance level α test.
Power = P(Reject Ho | Ho is False) = 1 − β
α = P(Reject Ho | Ho is True)
Example: Suppose in the nursing home stay study we wanted to have a 95% chance of
detecting a mean of 14 days as being significantly less than 17 days using a significance
test with α = .05. What sample size would be required if we believe the range of length
of stay will be between 2 days and 50 days?
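Plugging this example into the formula (σ ≈ (50 − 2)/4 = 12, one-tailed z.05 = 1.645 for α, z.05 = 1.645 for 95% power, difference 17 − 14 = 3), a Python sketch:

```python
from math import ceil

def n_for_power(z_alpha, z_beta, sigma, delta):
    """n = ((z_alpha + z_beta) * sigma / delta)^2, rounded up."""
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Nursing-home example: detect 14 vs 17 days (delta = 3), sigma ~ 12
print(n_for_power(1.645, 1.645, sigma=12, delta=3))
```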
Use Power calculator in JMP:
DOE > Sample Size and Power
If you leave two of the fields empty amongst Power, Sample Size and Difference to
Detect, you will get a plot of the two left empty versus one another for the case where the
specified field is as chosen.
For example, in the previous example we could just specify the power as
.95 and then obtain a plot of sample size (n) vs. difference to detect (δ). From this we
can see that the approximate sample size needed to detect a reduction of 5 days is around
n = 70 to 75.
Below we have specified a difference to detect δ = 3 days and left the sample size and
power fields empty, which gives us a plot of power (1 − β) vs. sample size (n). For a
sample size of n = 125 we find a power of around 80%.
5.6 - Statistical Inference for a Population Proportion (p)
We have already discussed the confidence interval as a means of making a decision about
the value of the population proportion, p. The CI results are summarized below.
General Form for a C for Population Proportion (p)
estimate  (table value)  (estimated standard error of estimate)
pˆ  (normal table value) 
Margin of Error  z
pˆ (1  pˆ )
n
or
pˆ  z
pˆ (1  pˆ )
n
pˆ (1  pˆ )
n
Normal Table Values:
Confidence Level     z
90% (α = .10)        1.645
95% (α = .05)        1.96
99% (α = .01)        2.576
Hypothesis Tests for p
Ho: p = po
Ha: p > po or p < po or p ≠ po (use the CI for the two-sided alternative, which is rarely of interest for p anyway)
Test Statistic

z = (p̂ − po) / √( po(1 − po)/n )  ~ standard normal N(0,1), provided npo ≥ 5 and n(1 − po) ≥ 5
When our sample size is small or we want an exact test, we can use the binomial
distribution to calculate the p-value as follows:
Reject Ho in favor of Ha: p > po if P(X ≥ x | n, po) < α
Reject Ho in favor of Ha: p < po if P(X ≤ x | n, po) < α
Reject Ho in favor of Ha: p ≠ po if either P(X ≥ x | n, po) < α/2 or P(X ≤ x | n, po) < α/2
(This is called the Binomial Exact Test for p)
Example: Hypertension During Finals Week
In the college-age population in this country (18 – 24 yr. olds), about 9.2% have
hypertension (systolic BP > 140 mmHg and/or diastolic BP > 90 mmHg). Suppose a
sample of n = 196 WSU students is taken during finals week and 29 have hypertension.
Do these data provide evidence that the percentage of students who are hypertensive
during finals week is higher than 9.2%?
Hypothesis Test:
1) Ho:
   Ha:
2) Choose α
3) Compute test statistic
4) Find p-value (use Normal Probability Calculator.JMP)
Binomial Exact Test
Use n = 196 and p = .092 (hypothesized value under Ho)
Exact p-value =
5) Make decision and interpret
6) Confidence Interval for p
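A sketch of both the normal-approximation and exact calculations for this example in Python (illustration; the handout uses the JMP calculators):

```python
from math import sqrt
from scipy.stats import norm, binom

n, x, p0 = 196, 29, 0.092
phat = x / n

# Normal-approximation z-test for Ha: p > .092
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
p_approx = norm.sf(z)

# Binomial exact test: P(X >= 29) under Binomial(196, .092)
p_exact = binom.sf(x - 1, n, p0)
print(z, p_approx, p_exact)
```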
5.7 – Sample Size and Power for Inference about the
Population Proportion (p)
CI for the Population Proportion (p)
In the discussion above we found that the interval
p̂ − 1.96·√( p(1 − p)/n )  up to  p̂ + 1.96·√( p(1 − p)/n )
had a 95% chance of covering the population proportion. The margin of error for this
interval is
Margin of Error = 1.96·√( p(1 − p)/n )
If we wanted this to be at most E units, what sample size should we use?
This says that to obtain a 95% CI for p with a margin of error no larger than E we should
use a sample size of

n = 1.96²·p(1 − p) / E²
However, we cannot calculate this in practice unless we know p, which of course we
don’t; furthermore, we don’t even know p̂, the sample proportion, until we have our
data in hand. In order to use this result we need to plug in a “best guess” for p. This
guess might come from:
• A pilot study where p̂ = sample proportion is calculated
• Prior studies
• The worst case scenario, noting that p(1 − p) ≤ .25 and is equal to
.25 when p = .50. Using p = .50 simplifies the formula to

n = 1.96² / (4E²)

If you have no “best guess” for p, this conservative approach is the one
you should take.
Example: How many patients would need to be used to estimate the success rate of a
medical procedure, if researchers initially believe the success rate is no smaller than 85%
and wish to estimate the true success rate using a 95% confidence interval with a margin
of error no larger than E = .03?
What if they wish to assume nothing about the success rate initially?
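A sketch of both calculations, with the p = .85 planning guess and the conservative p = .50 worst case:

```python
from math import ceil

def n_for_prop(p_guess, E, z=1.96):
    """n = z^2 * p(1-p) / E^2, rounded up to the next integer."""
    return ceil(z ** 2 * p_guess * (1 - p_guess) / E ** 2)

print(n_for_prop(0.85, 0.03))   # planning guess p = .85
print(n_for_prop(0.50, 0.03))   # worst case: no guess, use p = .50
```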
Power Considerations
In designing a study, we oftentimes have prior knowledge about how large a difference
or effect we want to be able to detect as significant. We can use this knowledge to help
us determine the sample size to use in conducting our study.
Suppose we wish to conduct a test at the α level, having a power = 1 − β of
detecting a difference between the true proportion (p1) and the hypothesized proportion
(po) of |po − p1|; then the sample size necessary to achieve that goal is given by

n = [ ( z_α·√(po(1 − po)) + z_β·√(p1(1 − p1)) ) / (po − p1) ]²
z_α = standard normal value corresponding to α, the Type I error probability.
For one-tailed hypotheses these values are: z.01 = 2.33, z.05 = 1.645, z.10 = 1.28
For two-sided alternatives these values are: z.01 = 2.576, z.05 = 1.96, z.10 = 1.645
z_β = standard normal value corresponding to β, the Type II error probability.
z.01 = 2.33, z.05 = 1.645, z.10 = 1.28, etc., basically the one-tailed values above.
Example: Suppose we view an increase from 9.2% to 12.0% to be a meaningful increase
in the percentage of college students exhibiting hypertension. Suppose we wish to have
an 80% chance of detecting such an increase as statistically significant at the α
level; what sample size do we need?
n = 659
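Plugging this example into the formula as printed (one-tailed z.05 = 1.645 for α, z.20 ≈ 0.84 for 80% power) gives roughly n = 715; the n = 659 above matches the variant of the formula that uses po(1 − po) under both z terms. A Python sketch of the printed formula:

```python
from math import ceil, sqrt

def n_for_prop_power(z_alpha, z_beta, p0, p1):
    """n = [(z_a*sqrt(p0 q0) + z_b*sqrt(p1 q1)) / (p0 - p1)]^2, rounded up."""
    num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((num / (p0 - p1)) ** 2)

# One-tailed alpha = .05 (z = 1.645), 80% power (z_beta ~ 0.84)
print(n_for_prop_power(1.645, 0.84, p0=0.092, p1=0.12))
```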
5.8 - Comparing Two Population Means Using Dependent or
Paired Samples
When using dependent samples each observation from population 1 has a one-to-one
correspondence with an observation from population 2. One of the most common cases
where this arises is when we measure the response on the same subjects before and after
treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes
we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race,
gender, socio-economic status, height, weight, etc... to control for the influence these
characteristics might have on the response of interest. When this is done we say that we
are “controlling for the effects of race, gender, etc...”. By using matched-pairs of subjects
we are in effect removing the effect of potential confounding factors, thus giving us a
clearer picture of the difference between the two populations being studied.
DATA FORMAT

Matched Pair    X1i     X2i     di = X1i − X2i
1               X11     X21     d1
2               X12     X22     d2
3               X13     X23     d3
...             ...     ...     ...
n               X1n     X2n     dn

For the sample paired differences (di's), find the sample mean (d̄)
and standard deviation (sd).
The general hypotheses are
Ho: μd = δo
Ha: μd > δo  or  Ha: μd < δo  or  Ha: μd ≠ δo
μd = mean for the population of paired differences
Note: While 0 is usually used as the hypothesized mean
difference under the null, we actually can hypothesize any
size difference for the mean of the paired differences that
we want. For example, if we wanted to show a certain diet
resulted in at least a 10 lb. decrease in weight, then we
could test whether the paired differences d = Initial weight −
After-diet weight had mean greater than 10
(Ha: μd > 10 lbs.)
Test Statistic for a Paired t-Test

t = [(estimate of mean paired difference) − (hypothesized mean difference)] / SE(estimate)
  = (d̄ − δo) / (sd/√n)  ~ t-distribution with df = n − 1

where δo = the hypothesized value for the mean paired difference under the null
hypothesis.
100(1 − α)% CI for μd

d̄ ± t · sd/√n,   where t comes from the appropriate quantile of the t-distribution with df = n – 1.

This interval has a 100(1 − α)% chance of covering the true mean paired difference.
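A sketch of the paired t-test in Python with made-up pre/post data (the numbers, and the hypothesized difference δo = 10, are purely illustrative):

```python
from math import sqrt
from scipy.stats import t

# Hypothetical paired measurements (before, after) for illustration only
pre  = [152, 148, 160, 155, 171, 164, 158, 149]
post = [140, 139, 147, 141, 153, 148, 150, 138]
d = [a - b for a, b in zip(pre, post)]      # paired differences

n = len(d)
dbar = sum(d) / n                           # mean paired difference
sd = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))

delta0 = 10                                 # hypothesized mean difference
tstat = (dbar - delta0) / (sd / sqrt(n))    # paired t statistic
pval = t.sf(tstat, df=n - 1)                # Ha: mu_d > 10
print(dbar, tstat, pval)
```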
Example: Effect of Captopril on Blood Pressure
In order to estimate the effect of the drug Captopril on blood pressure (both systolic and
diastolic), the drug is administered to a random sample of n = 15 subjects. Each subject’s
blood pressure was recorded before taking the drug and then 30 minutes after taking the
drug. The data are shown below.
Syspre – initial systolic blood pressure
Syspost – systolic blood pressure 30 minutes after taking the drug
Diapre – initial diastolic blood pressure
Diapost – diastolic blood pressure 30 minutes after taking the drug
Research Questions:
 Is there evidence to suggest that Captopril results in a systolic blood pressure
decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
 Is there evidence to suggest that Captopril results in a diastolic blood pressure
decrease of at least 5 mmHg on average in patients 30 minutes after taking it?
For each blood pressure we need to consider paired differences of the form d_i = BPpre_i − BPpost_i. For paired differences defined this way, positive values correspond to a reduction in blood pressure ½ hour after taking Captopril. To answer the research questions above we need to conduct the following hypothesis tests:
H_o: μ_(syspre − syspost) = 10 mmHg          H_o: μ_(diapre − diapost) = 5 mmHg
H_a: μ_(syspre − syspost) > 10 mmHg   and    H_a: μ_(diapre − diapost) > 5 mmHg
Below are the relevant statistical summaries of the paired differences for both blood pressure measurements. The t-statistics for both tests are given below:

Systolic BP:

Diastolic BP:
We can use the t-Probability Calculator in JMP to find the associated p-values or better
yet use JMP to conduct the entire t-test.
Systolic Blood Pressure
Diastolic Blood Pressure
Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to suggest that taking Captopril will result in a mean decrease in systolic blood pressure exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure exceeding 5 mmHg (p = _______). Furthermore, we estimate that the mean change in systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg, and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
5.9 – Comparing Two Pop. Means Using Independent Samples
Example 1: Prior Knowledge of Instructor and Lecture Rating
How powerful are rumors? Frequently, students ask friends and/or look at instructor
evaluations to decide if a class is worth taking. Kelley (1950) found that instructor
reputation has a profound impact on actual teaching ratings. Towler and Dipboye (1998)
replicated and extended this study by asking: “Does an instructor's prior reputation affect
student ratings?”
Towler, A., & Dipboye, R. L. (1998). “The effect of instructor reputation and need for cognition on student
behavior”
Experimental Design:
Subjects were randomly assigned to one of two conditions. Before viewing the lecture, students were given a summary of the instructor's prior teaching evaluations. There were two conditions: Charismatic instructor and Punitive instructor.
Summary given in the "Charismatic instructor" condition:
Frequently at or near the top of the academic department in all teaching categories. Professor S was always
lively and stimulating in class, and commanded respect from everyone. In class, she always encouraged
students to express their ideas and opinions, however foolish or half-baked. Professor S was always
innovative. She used differing teaching methods and frequently allowed students to experiment and be
creative. Outside the classroom, Professor S was always approachable and treated students as individuals.
Summary given in the "Punitive instructor" condition:
Frequently near the bottom of the academic department in all important teaching categories. Professor S did
not show an interest in students' progress or make any attempt to sustain student interest in the subject.
When students asked questions in class, they were frequently told to find the answers for themselves. When
students felt they had produced a good piece of work, very rarely were they given positive feedback. In
fact, Professor S consistently seemed to grade students harder than other lecturers in the department.
Then all subjects watched the same twenty-minute lecture given by the exact same lecturer. Following the
lecture, subjects rated the lecturer. Subjects answered three questions about the leadership qualities of the
lecturer. A summary rating score was computed and used as the variable "rating" here.
Research Question: Does an instructor's prior reputation affect student ratings of a lecture given by that instructor?
Summary Statistics

Charismatic:  x̄_C = 2.613,  s_C = 0.533,  n_C = 25
Punitive:     x̄_P = 2.236,  s_P = 0.543,  n_P = 24
Intuitive Decision
In order to determine whether or not the null or alternative hypothesis is true, you could
review the summary statistics for the variable you are interested in testing across the two
groups. Remember, these summary statistics and/or graphs are for the observations you
sampled, and to make decisions about all observations of interest, we must apply some
inferential technique (i.e., hypothesis tests or confidence intervals).
One of the best graphical displays for this situation is the side-by-side boxplots. To get
side-by-side boxplots, select Analyze > Fit Y by X. Place Prior Info in the X box and
Rating in the Y box. Place mean diamonds & histograms on the plot, and we may also
want to jitter the points. The more separation there is between the mean diamonds, the more likely we are to reject the null hypothesis (i.e., the data tend to support the alternative hypothesis).
To answer the question of interest formally, we need inferential tools for comparing the mean rating given to a lecture when students are told the professor is a charismatic individual vs. the mean rating given when students receive the punitive prior description, i.e. compare μ_charismatic to μ_punitive.
Hypothesis Testing (μ_1 vs. μ_2)

The general null hypothesis says that the two population means are equal, or equivalently their difference is zero. The alternative or research hypothesis can be any one of the three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can perform the test by using a confidence interval for the difference in the population means and determining whether 0 is contained in the confidence interval.

H_o: μ_1 = μ_2   or equivalently   (μ_1 − μ_2) = hypothesized difference (typically 0)

H_a: μ_1 > μ_2   or equivalently   (μ_1 − μ_2) > hypothesized difference (upper-tail)
H_a: μ_1 < μ_2   or equivalently   (μ_1 − μ_2) < hypothesized difference (lower-tail)
H_a: μ_1 ≠ μ_2   or equivalently   (μ_1 − μ_2) ≠ hypothesized difference (two-tailed, USE CI!)
Test Statistic

t = [(X̄_1 − X̄_2) − (hypothesized difference)] / SE(X̄_1 − X̄_2)   ~   t-distribution with appropriate degrees of freedom

where SE(X̄_1 − X̄_2) and the degrees of freedom for the t-distribution come from one of the two cases described below.
Confidence Interval for the Difference in the Population Means

100(1 − α)% Confidence Interval for (μ_1 − μ_2):

(X̄_1 − X̄_2) ± t·SE(X̄_1 − X̄_2)

where t comes from the t-table with appropriate degrees of freedom (see the two cases below).
There are two cases one needs to consider when comparing two population means using
independent samples.
Case 1 ~ Equal Population Variances/Standard Deviations
(σ_1² = σ_2² = σ² = common variance to both populations)

Rule of Thumb for Checking Variance Equality:
If the larger sample variance is more than twice the smaller sample variance, do not assume the variances are equal.

Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
Case 1 – Equal Variances (cont’d)

Assuming the assumptions listed above are all satisfied, we have the following for the standard error of the difference in the sample means:

SE(X̄_1 − X̄_2) = s_p · √(1/n_1 + 1/n_2)

where

s_p² = [(n_1 − 1)s_1² + (n_2 − 1)s_2²] / (n_1 + n_2 − 2)   if n_1 ≠ n_2

s_p² = (s_1² + s_2²) / 2   if n_1 = n_2

s_p² is called the “pooled estimate of the common variance (σ²)”. The degrees of freedom for the t-distribution in this case is df = n_1 + n_2 − 2.
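The pooled procedure can be sketched in Python directly from the summary statistics reported earlier for the instructor-rating example (x̄_C = 2.613, s_C = 0.533, n_C = 25; x̄_P = 2.236, s_P = 0.543, n_P = 24). The scipy call at the end is a standard library function used to check the hand computation; it is independent of JMP:

```python
import numpy as np
from scipy import stats

# Summary statistics from the instructor-rating example in the handout
x1_bar, s1, n1 = 2.613, 0.533, 25   # charismatic condition
x2_bar, s2, n2 = 2.236, 0.543, 24   # punitive condition

# Pooled estimate of the common variance (n1 != n2 case)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Standard error of the difference in sample means: s_p * sqrt(1/n1 + 1/n2)
se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)

# Two-tailed test of H_o: mu_1 = mu_2, df = n1 + n2 - 2
t_stat = (x1_bar - x2_bar) / se
df = n1 + n2 - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)

# scipy performs the same pooled test from summary statistics
res = stats.ttest_ind_from_stats(x1_bar, s1, n1, x2_bar, s2, n2,
                                 equal_var=True)

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The hand-computed statistic and scipy's agree, which is a useful sanity check on the pooled-variance formula.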
Example 1: Prior Knowledge of Instructor and Lecture Rating (cont’d)
Case 1 – Equal Variances
To perform the “pooled t-Test” select the
Means/Anova/Pooled t option from the
Oneway Analysis pull-down menu.
Case 2 – Unequal Variances
If you do not want to assume the population variances are equal, then select the t Test option.

To formally test whether we can assume the population variances are equal, select UnEqual Variances from the pull-down menu.
t-Test Results from JMP
Discussion:
In the previous example we chose to use a pooled t-test assuming the population
variances were equal based upon the visual evidence and applying the “rule of thumb”.
To formally test this assumption, choose the UnEqual Variances option from the
Oneway Analysis pull-down menu. The results are shown below.
Interpretation of Results
Example 2: Normal Human Body Temperatures Females vs. Males
Do men and women have the same normal body temperature? Putting this into a
statement involving parameters that can be tested:
H_o: μ_F = μ_M   or   (μ_F − μ_M) = 0
H_a: μ_F ≠ μ_M   or   (μ_F − μ_M) ≠ 0

μ_F = mean body temperature for females
μ_M = mean body temperature for males
Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.
Checking Assumptions
Assessing Normality of the Two Sampled Populations (Assumption 2)
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu as shown below.
Normality appears to
be satisfied here.
Checking the Equality of the Population Variances
To test the equality of the population variances select Unequal Variances from the
Oneway Analysis pull-down menu.
The test is:

H_o: σ_F = σ_M
H_a: σ_F ≠ σ_M
JMP gives four different tests for examining the equality of population variances. To use the results of these tests, simply examine the resulting p-values. If any/all are less than .10 or .05, then worry about the assumption of equal variances and use the unequal-variance t-Test instead of the pooled t-Test.
p-values for testing variances
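For readers working outside JMP, two of the four variance-equality tests are available in scipy; the sketch below applies them to hypothetical samples (randomly generated here, not the actual body-temperature data) and also computes the rule-of-thumb variance ratio from earlier in this section:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical samples (invented for illustration) from two groups
group_f = rng.normal(98.4, 0.7, size=65)
group_m = rng.normal(98.1, 0.7, size=65)

# Tests of H_o: sigma_F = sigma_M. scipy's levene() with the default
# center='median' is the Brown-Forsythe variant; bartlett() assumes
# normality. These correspond to two of the tests JMP reports.
lev_stat, lev_p = stats.levene(group_f, group_m)
bart_stat, bart_p = stats.bartlett(group_f, group_m)

# Rule of thumb from the handout: worry if the larger sample variance
# is more than twice the smaller one.
v1, v2 = group_f.var(ddof=1), group_m.var(ddof=1)
ratio = max(v1, v2) / min(v1, v2)

print(f"Levene p = {lev_p:.3f}, Bartlett p = {bart_p:.3f}, "
      f"variance ratio = {ratio:.2f}")
```

As the handout advises, small p-values (below .10 or .05) from these tests argue for the unequal-variance t-test.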
Example 2: Normal Human Body Temperatures Females vs. Males (cont’d)
To perform the two-sample t-Test for independent samples:
 assuming equal population variances select the Means/Anova/Pooled t option
from Oneway-Analysis pull-down menu.
 assuming unequal population variances select t-Test from the Oneway-Analysis
pull-down menu.
Because we have no evidence
against the equality of the
population variances
assumption we will use a
pooled t-Test to compare the
population means.
Several new boxes of output will appear below the graph once the appropriate option has been selected, some of which we will not concern ourselves with. The relevant box for us, labeled t-Test, is shown below for the mean body temperature comparison.
Because we have concluded
that the equality of variance
assumption is reasonable for
these data we can refer to the
output for the t-Test assuming
equal variances.
 What is the test statistic value for this test?

 What is the p-value?

 What is your decision for the test?

 Write a conclusion for your findings.
 Interpretation of the CI for (μ_F − μ_M)
Case 2 ~ Unequal Population Variances/Standard Deviations (σ_1 ≠ σ_2)

Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
   (This can be formally tested, or use the rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
Test Statistic

t = [(X̄_1 − X̄_2) − 0] / SE(X̄_1 − X̄_2)   ~   t-distribution with df given by the formula below

where SE(X̄_1 − X̄_2) is as defined below.
100(1 − α)% Confidence Interval for (μ_1 − μ_2)

(X̄_1 − X̄_2) ± t·SE(X̄_1 − X̄_2)

where

SE(X̄_1 − X̄_2) = √(s_1²/n_1 + s_2²/n_2)

and

df = (s_1²/n_1 + s_2²/n_2)² / [ (s_1²/n_1)²/(n_1 − 1) + (s_2²/n_2)²/(n_2 − 1) ]

rounded down to the nearest integer.

The t-quantiles are the same as those we have seen previously.
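A minimal sketch of the unequal-variance (Welch) procedure, computing the Satterthwaite degrees-of-freedom formula by hand. The summary statistics are hypothetical (invented for illustration, not from any course data set):

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics for two independent samples
x1_bar, s1, n1 = 17.0, 3.2, 40
x2_bar, s2, n2 = 12.0, 1.8, 55

# Standard error: sqrt(s1^2/n1 + s2^2/n2)
se = np.sqrt(s1**2 / n1 + s2**2 / n2)

# Satterthwaite degrees of freedom, rounded DOWN to the nearest integer
num = (s1**2 / n1 + s2**2 / n2) ** 2
den = ((s1**2 / n1) ** 2 / (n1 - 1)
       + (s2**2 / n2) ** 2 / (n2 - 1))
df = int(np.floor(num / den))

# Two-tailed test of H_o: mu_1 = mu_2 and a 95% CI for (mu_1 - mu_2)
t_stat = (x1_bar - x2_bar) / se
p_value = 2 * stats.t.sf(abs(t_stat), df)
t_crit = stats.t.ppf(0.975, df)
ci = (x1_bar - x2_bar - t_crit * se,
      x1_bar - x2_bar + t_crit * se)

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4g}, "
      f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With raw data instead of summary statistics, scipy's `stats.ttest_ind(x1, x2, equal_var=False)` performs the same Welch test (though it does not round the df down to an integer).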
Example: Cell Radii of Malignant vs. Benign Breast Tumors
These data come from a study of breast tumors conducted at the University of Wisconsin-Madison. The goal was to determine if the malignancy of a tumor could be established by using shape characteristics of cells obtained via fine needle aspiration (FNA) and digitized scanning of the cells. The sampled tumor cells were examined under an electron microscope and a variety of cell shape characteristics were measured.
One of the goals of the study was to determine which cell characteristics are most useful for discriminating between benign and malignant tumors.
The variables in the data file are:
 ID = patient identification number (not used)
 Diagnosis = diagnosis determined by biopsy: B = benign or M = malignant
 Radius = radius (mean of distances from center to points on the perimeter)
 Texture = texture (standard deviation of gray-scale values)
 Smoothness = smoothness (local variation in radius lengths)
 Compactness = compactness (perimeter^2 / area - 1.0)
 Concavity = concavity (severity of concave portions of the contour)
 Concavepts = concave points (number of concave portions of the contour)
 Symmetry = symmetry (measure of symmetry of the cell nucleus)
 FracDim = fractal dimension ("coastline approximation" - 1)
Medical literature citations:

W.H. Wolberg, W.N. Street, and O.L. Mangasarian. “Machine learning techniques to diagnose breast cancer from fine-needle aspirates.” Cancer Letters 77 (1994) 163–171.

W.H. Wolberg, W.N. Street, and O.L. Mangasarian. “Image analysis and machine learning applied to breast cancer diagnosis and prognosis.” Analytical and Quantitative Cytology and Histology, Vol. 17, No. 2, pages 77–87, April 1995.

W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. “Computerized breast cancer diagnosis and prognosis from fine needle aspirates.” Archives of Surgery 1995;130:511–516.

W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. “Computer-derived nuclear features distinguish malignant from benign breast cytology.” Human Pathology, 26:792–796, 1995.
See also:
http://www.cs.wisc.edu/~olvi/uwmp/mpml.html
http://www.cs.wisc.edu/~olvi/uwmp/cancer.html
In this example we focus on the potential differences in the cell radius between benign
and malignant tumor cells.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of the benign tumors. The summary statistics support this, with sample means/medians of roughly 17 and 12 units respectively. The 95% CI’s for the mean cell radius for the two tumor groups do not overlap, which further supports the conclusion that a significant difference in the cell radii exists.
Testing the Equality of Population Variances
Because we conclude that the population variances are unequal, we should use the non-pooled version of the two-sample t-test. No one does this by hand, so we will use JMP.
Conclusion:
5.10 – Effect Size (d), Variance Explained, and Polyserial Correlation