Confidence Intervals

Demonstrations IV
Dr. Scott Stevens
AA. Estimation of the population mean when σ is known (Confidence intervals)
AB. Estimation of the Population Mean when σ is unknown (Confidence intervals)
AC. Estimation of Population Proportion (Confidence Interval for One Proportion)
AD. Determination of Appropriate Sample Size (Proportion)
AA. Estimation of the population mean when σ is known (Confidence intervals)
Problem: Statisticians sometimes report the 50% confidence interval, with the margin for sampling
error known as the probable error. For example, an estimate x-bar of the average useful life of a TV
picture tube is said to have a probable error of e years if the interval from x-bar – e to x-bar + e
has a 50% chance of including the population mean. Calculate the probable
error if the standard deviation in TV tube lives is known to be 2.5 years and the average useful
lifetime in a sample of 25 TV tubes is found to be 8.15 years.
This is an estimation problem. In estimation problems, you are given information about a sample, and are
asked to provide an estimate of a population parameter. This estimate usually is expressed as a confidence
interval, or interval estimate. The process of doing this is always the same. We compute a statistic from
the sample, such as the sample mean, and use this value as a point estimate (single number estimate), of
the corresponding population parameter. We then compute a margin of error for this estimate. We’ll have
more to say about the margin of error below: how to compute it, and what it means.
When finding the confidence interval for the mean, we can use the procedure in the box on the next page.
The box describes how to build a confidence interval for the mean when the population σ is known (which
is the situation in this demonstration) and when it is unknown (as in demonstration AB). Read that
procedure on the top of the next page now, then come back to this page and continue.
*****
I’ve implemented this box in a spreadsheet template called Confidence Interval for the Mean. You can
find it on the website. You won’t have this template available for exams, but it will allow you to check
your work in homework, and will walk you through the required calculations step by step.
I haven’t gone into the why of this box, but I think your book does an okay job of explaining it in Chapter
8. The basic idea is that the critical score tells you how many "standard deviations" you have to extend out
from the population mean μ in order to pick up the fraction c of the sample means. The second term
(σ/√n or s/√n) tells you how big one standard deviation is for the sampling distribution of the mean.
(The actual value is σ/√n, but in real life, we rarely know σ, so we approximate with s/√n.) This
quantity, s/√n, comes up a lot, and it's usually called the standard error. It's not a great name, but get
used to it. Whenever you see a reference to the standard error of a statistic, it always means: "here's our
approximation to how big one standard deviation is in the sampling distribution of this statistic".
One more comment before proceeding. As we’ve mentioned before, notation in statistics is not always
universal. When referring to a critical z or t score, I’ll always indicate it with an asterisk (z* or t*). This is
just a way of letting you know that this particular score is a critical value. Although this usage is common,
your book doesn’t use it.
Finding a Confidence Interval for the Population Mean
What you need:
• The mean of your sample, x-bar
• The size of your sample, n
• Either the standard deviation of the population (σ), or the standard deviation of your sample (s).
• The confidence level that you desire, c. (95% is the most common.)
What you get:
• A confidence interval for the population mean at the specified confidence level.
1. The confidence interval for the population mean always looks like this:
x-bar ± margin of error.
We abbreviate margin of error as MOE. If you know σ, the population standard deviation, go to
step 2a. If you do not know the population standard deviation σ, then go to step 2b.
2a. You know σ. In order to compute the MOE, you'll need to find the critical z value, z*.
• Find z*. In Excel, z* = NORMSINV((1+c)/2). You can also find the value of z* for the most
commonly used confidence levels by looking in the table below.

For confidence level    .90      .95      .99
Use z* =                1.645    1.960    2.576

• MOE = z* × σ/√n, so
• confidence interval = x-bar ± z* × σ/√n
• go on to step 3
2b. You don't know σ. In order to compute MOE you'll need the critical t value, t*, and the sample
standard deviation s.
• Find t*. In Excel, t* = TINV(1-c,n-1). Alternatively, you can use the book's explanation of how to
obtain t* from Table E.3 at the back of your text.
• Find the sample standard deviation s, if it hasn't been given. You can do this with the Excel command
= STDEV(data range), where data range is the set of sample values.
• confidence interval = x-bar ± t* × s/√n
• go on to step 3
3. Verify that the assumptions of this technique are satisfied. You are okay if any of the following are true.
• The original population is normally distributed, or
• The data is roughly normally distributed and n > 10, or
• The data is roughly symmetric and n > 20, or
• The sample is large (n > 30 is usually enough)
Now let’s solve the problem. The problem's whole discussion of probable error just says: The probable error is the
margin of error for a 50% confidence interval. So we have:
x-bar = 8.15
σ = 2.5
n = 25
c = 0.50
I'll show the work in Excel, using my Confidence Interval for the Population Mean template. As
always, the template includes in the blue box the formulas used to perform the calculation. Note that, for
this problem, I typed in the values of σ, n, and x-bar. If you actually typed in the 25 TV tube lifetimes,
Excel would have computed n and x-bar for you.
Confidence Interval for the Population Mean

Quantity                                      Value          Formula / note
Population standard deviation, σ, if known    2.5            <== CELL MUST BE EMPTY IF SIGMA NOT KNOWN!
Sample mean, x-bar                            8.15
Sample size, n                                25
Sample standard deviation, s
confidence level, c                           0.5

sample mean, x-bar                            8.15           =AVERAGE(range)
sample size, n                                25             =COUNT(range)
sigma known, s not needed                     --
critical z value, z*                          0.674490366    =NORMSINV((1+c)/2)
standard error, SE                            0.5            =sigma/SQRT(n)
margin of error, MOE                          0.337245183    =z* x SE
lower confidence limit                        7.812754817    =x-bar - MOE
upper confidence limit                        8.487245183    =x-bar + MOE

Check: Population must be roughly symmetric--sample<30
So our answer is: The probable error in our 8.15 year estimate of average TV life is about 0.337 years,
which is about 4 months. Since the sample is of 25 TV sets, this conclusion should be valid as long as the
distribution of TV set lifetimes is roughly symmetric. Since the problem didn't give us the data, we can't
conclude that for sure.
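If you'd like to check the template's numbers outside of Excel, here is a minimal sketch in Python. It is not part of the template: the function name z_interval is made up for this illustration, and the standard library's NormalDist().inv_cdf stands in for NORMSINV.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, c):
    """Confidence interval for the mean when sigma is known.

    Hypothetical helper for illustration only.
    Returns (z_star, standard_error, moe, lower, upper).
    """
    z_star = NormalDist().inv_cdf((1 + c) / 2)  # plays the role of NORMSINV((1+c)/2)
    se = sigma / sqrt(n)                        # one standard deviation of the sample mean
    moe = z_star * se
    return z_star, se, moe, xbar - moe, xbar + moe


# Values from the TV tube problem
z_star, se, moe, lower, upper = z_interval(xbar=8.15, sigma=2.5, n=25, c=0.50)
print(f"z* = {z_star:.4f}, SE = {se:.3f}, MOE = {moe:.3f}")
print(f"50% confidence interval: {lower:.3f} to {upper:.3f}")
```

The margin of error it reports should agree with the 0.337 years found by the template.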
The only fishy thing in the work above is the calculation of the critical z, z*. Where does that (1+c)/2 come
from? It may be easiest to understand by dealing with a particular confidence level, like the 0.5 level in
this problem. We want a “central chunk” of the standard normal distribution; in this case, the middle 50%
(or 0.50) of it. How far from the middle of the distribution are the endpoints of this chunk? Well, if 50%
of the area is in this chunk, then 50% is outside of it, too. Since the chunk we’re looking at is central, this
means that 25% of the area is in each of the two tails. So I want to know what z values give me 25% of the
area in each tail. The cutoffs are =NORMSINV(0.25) and =NORMSINV(0.75). Make sure you see why.
These values are –0.67449 and +0.67449. Since the one value will always be the negative of the other
(why?), we just compute the upper one.
[Figure: standard normal density plotted against z, for z from -4 to 4. The central chunk has Area = c, each of the two tails has Area = (1 - c)/2, and z* marks the upper cutoff.]
The graph above shows the more general situation. If we use a confidence level c, then we want the area in
the central chunk to be c, so we want the area in the tails to total to 1 – c. Since there are two tails, each
gets an area of (1 – c)/2. Now look how much of the curve’s area lies to the left of z*: (1 – c)/2 for the
lower tail and c for the central chunk. The total is (1 + c)/2, so the value of z* is given by
=NORMSINV((1+c)/2). (Recall how NORMSINV works from Demonstration X.)
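The area bookkeeping above can be checked numerically. This is a small sketch, assuming Python's standard library NormalDist as a stand-in for NORMSINV; the variable names are just for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

for c in (0.50, 0.90, 0.95, 0.99):
    tail = (1 - c) / 2                     # area in each of the two tails
    z_star = std_normal.inv_cdf(tail + c)  # area to the left of z* is (1 + c)/2
    # Area of the central chunk between -z* and +z*; it should come back as c
    central = std_normal.cdf(z_star) - std_normal.cdf(-z_star)
    print(f"c = {c:.2f}: z* = {z_star:.3f}, central area = {central:.2f}")
```

For c = 0.90, 0.95 and 0.99 the z* values should match the table in step 2a of the box.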
What this means—the whole idea of confidence interval
It's important to understand what the confidence interval (or margin of error) tells you, and what it does not.
Let me try to get this across to you by eavesdropping on a conversation between Bob, the TV Guy, and his
friend Phil (also known as “Phil the Nitpicker”). Phil’s TV is broken.
Phil: So my TV tube’s shot, huh?
Bob: ‘Fraid so. I’m gonna have to put in a new one.
Phil: How long will the new one last?
Bob: Well, it varies, of course.
Phil: Okay—I understand you can’t tell me how long my new tube will last. But how long do TV tubes
last on average?
Bob: Well, I’d say about 8.15 years.
Phil: Really?
Bob: Really. A buddy of mine has an electronics store. He ran 25 TV sets until their picture tubes went
south, and he said that they lasted an average of 8.15 years. He likes figuring stuff like that.
Phil: You have weird friends, Bob.
Bob: (eyeing Phil) Tell me about it.
Phil: Still…that’s only 25 sets. I mean, you can’t say that the average for all TV tubes in the world is
8.15 years, just because it was 8.15 years for the sets your friend had.
Bob: Nope, Phil, you can’t. That’s why I said, “about 8.15 years”. Hand me that screwdriver, willya?
Phil: (hands it over) Yeah. Yeah. (pauses) So when you say that tubes last on average about 8.15
years, You mean—what? Like, a tube will last between, say, 8 and 8.3 years? I mean, I’ll be
good for 8 years anyway, huh?
Bob: (stopping work to look at Phil) Look, man—I have no idea what your tube will do. It could burn out
in a week, a year, 10 years, or 50 years, for all I know. That’s what warranties are for. Some
tubes last a long time, some don’t. I’m just saying that, on average, they last about 8.15 years.
Okay?
Phil: Yeah, sure, sure. I understand. Sure. That’s what I meant.
Bob, a bit fed up, returns to working on the set.
Phil: So, the average of your friend’s tubes was 8.15 years, so you figure that the average for all the
tubes is about 8.15 years. That makes sense.
Bob: Seems so to me.
Phil: So—what? You figure the average lifetime for all the picture tubes in the world is between 8
years and 8.3 years?
Bob: I reckon so. Probably.
Phil: Yeah, yeah. Me too. I reckon so. (pauses) What do you mean, “probably”?
Bob: (pulling his head out of the back of the TV) Look, Phil—I DON’T KNOW, okay? It’s possible, I
suppose, that my friend just happened to get a bunch of Super TV tubes in his set! Or maybe they
were particularly bad, just by chance! So maybe, just maybe, the actual average life of a picture
tube is way off from 8.15 years! Maybe they only last 3 months on average, and everyone that you and
I ever heard of has just been really lucky!
Phil: But you don’t think so…
Bob: No, I don’t think so. It could happen, but it’s bloody unlikely! All right? I think it’s pretty darned
likely that the average for all picture tubes is between 8 and 8.3 years. I can’t tell you exactly how
likely it is that I’m right about this, but I never took statistics in TV repair school!
Phil: I was just asking for your opinion…
Bob: (putting down his tools, keeping himself under control) Okay. Here’s my opinion. I know a guy
with 25 sets, and his tubes lasted an average of 8.15 years. That part, I’m sure of. From that, I’m
guessing that the average lifetime of all of the TV tubes in the world is about 8.15 years. And,
since you are so concerned about it, by “about 8.15 years”, I mean “between 8 and 8.3 years”. But
maybe I’m wrong, and the average isn’t between 8 and 8.3 years. I’m saying that I guess that
there’s about a 10% chance that I’m wrong, but that’s a guess, too. Now, DO YOU WANT ME
TO FIX YOUR SET, OR DON’T YOU?
Phil: Well, about how much is it going to cost?
As annoying as Phil is in the conversation above, he does have a few points. First, you can’t be sure that a
sample, even one taken with proper care to assure randomness, is representative of the population as a
whole. Second, if you’re going to tell me that the lifetime is “about 8.15 years”, I need to know what you
consider to be “about 8.15”. Is 8 years “about” 8.15? How about 7 years? 12 years? Finally, even if you
specify to me what you mean by “about”, there’s still the possibility that you’re wrong—that the range that
you gave me doesn’t include the actual mean lifetime of TV tubes. (And, of course, even if you could tell
me exactly how long tubes last on average, that still doesn’t make any guarantees about my particular tube.)
So the consequence of all of this is: when you’re going to estimate the value of the population mean, μ, by
saying that it’s “about as big as x-bar”, you have to tell me two things: first off, what do you mean by
“about”, and secondly, how likely is it that you’re right. And this is exactly what we do whenever we build
a confidence interval. We use the sample mean, x-bar, as our guess for the population mean, μ. Our
definition of “about” comes from our MOE (margin of error). Finally, our chance of being right in our
estimate is what we call our confidence level.
So in the language of statistics, Bob was saying that he believed that the 90% confidence interval for mean
tube life ran from 8 to 8.3, implying that the margin of error was 0.15 for this interval. He’s quite wrong,
as the problem solution on page 3 shows. There is, in fact, only a 50% chance that the average life is
between 7.81 and 8.49 years. If you computed the 90% confidence interval that Bob needs, you’d find it
has a margin of error of 1.645 × 0.5 = 0.8225 years, or about 10 months. That is, there is about a 90%
chance that the mean lifetime of all TV tubes is within 10 months of 8.15 years.
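To see just how far off Bob is, we can run the formula in both directions. The sketch below is an illustration, not part of the original solution: it computes the 90% margin of error, then asks what confidence level actually goes with a margin of 0.15 years. The second step just inverts MOE = z* × SE, and the variable names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 2.5, 25
se = sigma / sqrt(n)  # 0.5 years

# Margin of error Bob would need for 90% confidence
moe_90 = NormalDist().inv_cdf((1 + 0.90) / 2) * se

# Confidence level that goes with Bob's margin of 0.15 years:
# solve 0.15 = z * se for z, then find the central area between -z and +z
z = 0.15 / se
c_bob = 2 * NormalDist().cdf(z) - 1

print(f"90% MOE: {moe_90:.3f} years (about {moe_90 * 12:.0f} months)")
print(f"Confidence level for a 0.15-year MOE: {c_bob:.0%}")
```

Under these assumptions, the interval from 8 to 8.3 carries a confidence level of roughly 24%, nowhere near the 90% Bob guessed.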
Meaning of "90% Confidence"
A very good way to think of what confidence means—say, "90% confidence"—is this thought experiment.
I have a barrel, filled with black marbles and white marbles, mixed thoroughly together. You know that
90% of the marbles in the barrel are white and 10% are black. Reach into the barrel and take a marble.
Hold it tightly in your hand, and don't look at it.
We would then say that you are 90% confident that the marble you took is white. Without looking at it,
you don't know what it is, but picking a marble the way that you did will give you a white marble 90% of
the time.
It's terribly common for students to completely misinterpret confidence intervals. Check out the
interpretations below and make sure you understand their mistakes.
WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a
confidence level of 50%. This does not mean that 50% of all TVs have lifetimes within 0.337 years of
8.15. TV lifetimes are much more spread out than that. The confidence interval is a statement about the
population mean—the average lifetime of a TV set.
WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a
confidence level of 50%. This does not mean that 50% of all samples of 25 TV sets will have means
within 0.337 of 8.15 years. Again, we are using a sample to build a confidence interval for the population
mean.
WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a
confidence level of 50%. This does not mean that the population mean will be within 0.337 years of 8.15
"50% of the time". The mean lifetime of TV sets is a single number, and it either is in the confidence
interval, or it isn't. After all, the marble in your hand in the thought experiment above isn't white 90% of
the time and black 10% of the time, is it? It doesn't flash back and forth from color to color!
Sometimes people will say "there's a 50% chance that the population mean is within 0.337 years of 8.15".
This is skirting the edges of acceptable language, and it can give a false impression. It’s better to say that
we're 50% confident that the average lifetime of a TV set is within 0.337 years of 8.15 years.
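One way to convince yourself of what "50% confident" means is to simulate the marble experiment with samples instead of marbles. The sketch below is an illustration under loud assumptions: it pretends we know the true mean (TRUE_MEAN = 8.0 is a made-up value), assumes tube lifetimes are normally distributed with σ = 2.5, and draws many samples of 25. Each sample gets its own interval x-bar ± MOE, and we count how often the interval captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 8.0  # hypothetical population mean; unknown in real life
SIGMA, N, C = 2.5, 25, 0.50
TRIALS = 10_000

# Same margin of error for every sample, since sigma and n are fixed
moe = NormalDist().inv_cdf((1 + C) / 2) * SIGMA / sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Does this sample's interval contain the (fixed) population mean?
    if xbar - moe <= TRUE_MEAN <= xbar + moe:
        hits += 1

coverage = hits / TRIALS
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The population mean never moves; it is the interval that changes from sample to sample. About half of the intervals built this way should contain the true mean, which is the sense in which any one of them earns "50% confidence".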
AB. Estimation of the Population Mean when σ is unknown (Confidence intervals)
Problem: Henry Cavendish (who, by the way, was a real mad scientist) made 23 measurements of
the density of the earth relative to the density of water. In so doing, he is sometimes credited as being
“the man who weighed the earth”. Assuming that his observations (provided on the next page, in
yellow) are independent measurements made from a normal distribution, does the 99% confidence
interval include the value of 5.517 that is now accepted as the density of the earth?
We are given the observations in a sample and asked to compute from it alone a confidence interval for the
mean of the population. To solve this, we’re going to use the box on page 2 of these demonstrations.
We'll find s and use it to find the standard error (which is s/√n). We'll find the critical t value (which is
=TINV(1-c, n-1)), then multiply the standard error by this t value to get the margin of error (MOE). This
quantity is added to and subtracted from the sample mean (x-bar) to give the confidence interval. Here's
the work for this one, in Excel. Again, I’ve used my Excel template Confidence Interval for the
Population Mean.
Confidence Interval for the Population Mean

DATA (23 observations)
5.1   5.27  5.29  5.29  5.3   5.34  5.34  5.36
5.39  5.42  5.44  5.46  5.47  5.53  5.57  5.58
5.62  5.63  5.65  5.68  5.75  5.79  5.85

Population standard deviation, σ, if known: (left empty)
<== Since you've provided data, I'll compute the values of x-bar, s and n.
I'll ignore anything in these three cells ==>

Quantity                          Value          Formula
confidence level, c               0.99
sample mean, x-bar                5.483478261    =AVERAGE(range)
sample size, n                    23             =COUNT(range)
sample standard deviation, s      0.190420795    =STDEV(range)
critical t value, t*              2.818760549    =TINV(1-c,n-1)
standard error, SE                0.03970548     =s/SQRT(n)
margin of error, MOE              0.111920242    =t* x SE
lower confidence limit            5.371558019    =x-bar - MOE
upper confidence limit            5.595398503    =x-bar + MOE

Check: Population must be roughly symmetric--sample<30
So the 99% confidence interval runs from 5.37 to 5.60 times the density of water. This does indeed include
5.517. We were instructed to assume that the population is normal, so our assumptions in the box on page
2 are satisfied.
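Here is the same calculation as a short Python sketch, for anyone who wants to check the template by another route. One assumption to flag: Python's standard library has no counterpart to TINV, so the critical value t* is simply copied from the template output rather than computed. Everything else is computed from Cavendish's 23 observations.

```python
from math import sqrt
from statistics import mean, stdev

# Cavendish's 23 measurements of the earth's density (relative to water)
density = [
    5.10, 5.27, 5.29, 5.29, 5.30, 5.34, 5.34, 5.36,
    5.39, 5.42, 5.44, 5.46, 5.47, 5.53, 5.57, 5.58,
    5.62, 5.63, 5.65, 5.68, 5.75, 5.79, 5.85,
]

# Critical t for c = 0.99 with n - 1 = 22 degrees of freedom.
# Copied from the template (=TINV(1-c, n-1)), not computed here.
t_star = 2.818760549

n = len(density)
xbar = mean(density)
s = stdev(density)   # sample standard deviation (divides by n - 1, like STDEV)
se = s / sqrt(n)     # standard error
moe = t_star * se
lower, upper = xbar - moe, xbar + moe

print(f"x-bar = {xbar:.4f}, s = {s:.4f}, SE = {se:.5f}, MOE = {moe:.4f}")
print(f"99% confidence interval: {lower:.3f} to {upper:.3f}")
print("Contains 5.517?", lower <= 5.517 <= upper)
```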
Two final points before we leave this problem. First, let’s look at the formula for the critical t value, t*.
Where does this come from? Well, unlike the other INV functions in Excel, TINV wants you to tell it the
total area in the two tails. Don’t ask me why. It means, though, that a confidence level of c requires that
the “central chunk” of the t distribution have an area of c, which means that the total area in the two tails
must be 1 – c. (See page 4 of this demonstration if this doesn’t make sense.) The n – 1 is the “degrees of
freedom” for the t distribution, and we’ll talk about what that means in class. For confidence interval work
for one mean, it’s always n – 1.
Okay, the last point: you can get Excel to do this work for you directly. If you're interested, I've provided
instructions on the next page. I'd encourage you to give it a try.
So why don't we just let Excel do this work all the time? Four reasons.
1. Excel doesn't check assumptions; you must, or what Excel tells you could be garbage.
2. Excel requires that you provide all of the sample observations. You can't just tell it x-bar, s, and n,
and let it go.
3. Excel assumes that you don't know σ. If you do, then it's proper to use the z-distribution, not the
t-distribution. (Frankly, this isn’t much of an objection. In real life, we almost never know sigma.)
4. It's important that you understand what you're doing. It's almost impossible to expand upon your
knowledge of stats if you don't.
Having Excel Automatically Generate the Confidence Interval
for the Mean of a Population from a Sample Alone
What you need:
• a randomly selected sample from a population. The sample must satisfy the requirements in step 3
of the procedure on page 2.
What you get:
• a lot of information, including the confidence interval for the mean of the population.
Step 1: Enter your sample into Excel, either as a single row, or a single column.
Step 2: On Excel's Tools menu, choose Data Analysis. From the menu that appears, choose
Descriptive Statistics. Click OK.
Step 3: Click in the box labeled Input Range, then highlight the numbers in your sample. The range
identifying these cells should appear in the Input Range box.
Step 4: Check the boxes next to Summary Statistics and Confidence Level for Mean. In the Confidence
Level For Mean box, put the desired confidence level. If you want a 99% confidence interval,
type 99.
Step 5: Click OK. The top row of the output (labeled mean) tells you x-bar, and the bottom row
(labeled Confidence Level) tells you the MOE.
Step 6: To get the lower limit of the confidence interval, subtract the MOE from the mean. To get the
upper limit of the confidence interval, add the MOE to the mean.
Mean                       5.483478
Standard Error             0.039705
Median                     5.46
Mode                       5.29
Standard Deviation         0.190421
Sample Variance            0.03626
Kurtosis                   -0.5514
Skewness                   0.149761
Range                      0.75
Minimum                    5.1
Maximum                    5.85
Sum                        126.12
Count                      23
Confidence Level(99.0%)    0.11192
Here's the result of applying the box above to the
data in the problem. As you can see, we get a
mean of 5.483 and a MOE of 0.11192, just as we
did earlier. Note the entries for standard
deviation (s), standard error, and count (n).
AC. Estimation of Population Proportion (Confidence Interval for One Proportion)
Problem: After the confirmation hearing of Justice Clarence Thomas, a survey of 1300 members of
the National Association for Female Executives revealed that all but 299 of them considered
sexual harassment in the workplace to be a problem. Find the 95% confidence interval for
the fraction of all female executives who consider sexual harassment in the workplace to be a
problem.
The difference between this problem and the preceding ones is that the parameter of interest is a population
proportion (π) rather than a population mean (μ). The work will be very similar to the work in
demonstration Y, with this difference: the standard error of the proportion is not σ/√n, but is given by
standard error of the proportion = √(p(1 – p)/n)
(NOTE: In my Excel templates, I refer to the sample proportion p as “p-hat”. Different textbooks use
different symbols for this quantity, and I wrote the templates for another book. Just ignore the “-hat” for
this class.)
We'll do this problem with our Excel spreadsheet. The formulas used in the sheet are shown, also. Cell B3
contains the confidence level. B2 is p, D3 is the sample size, and B7 is the MOE. Note that proportion
problems always use a z score, not a t score. This is the case even though we don’t know the population
standard deviation.
I’ve created a template, Confidence Interval for the Population Proportion, for this kind of problem, but
let’s do it here from scratch, just for practice.
Given Data
p-hat = 0.77 (cell B2)     c = 0.95 (cell B3)     n = 1300 (cell D3)

Quantity                        Value      Formula
Calculations
z* (cell B5)                    1.95996    =NORMSINV((1+B3)/2)
std. err. of p-hat (cell B6)    0.01167    =SQRT(B2*(1-B2)/D3)
MOE (cell B7)                   0.02288    =B5*B6
Confidence Interval
lower limit                     0.74712    =B2-B7
upper limit                     0.79288    =B2+B7
Checks
# successes (cell B9)           1001       =B2*D3
# failures (cell D9)            299        =D3-B9

Check result: At least 5 of each, technique okay.
Check formula: =IF(AND(B9>=5,D9>=5),"At least 5 of each, technique okay.","Insufficient successes or failures.")
So, assuming that the sample is randomly selected from the target population, we are 95% confident that
the proportion of all female executives who find sexual harassment in the workplace a problem is between
74.7% and 79.3%.
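The spreadsheet work above translates almost line for line into Python. This is a sketch under the same assumptions as the sheet (random sample, at least 5 successes and 5 failures); the variable names are my own, and NormalDist().inv_cdf takes the place of NORMSINV.

```python
from math import sqrt
from statistics import NormalDist

n = 1300
failures = 299                # "all but 299" said it is a problem
successes = n - failures      # 1001
p_hat = successes / n         # sample proportion, 0.77
c = 0.95

z_star = NormalDist().inv_cdf((1 + c) / 2)  # proportions always use z, not t
se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of the proportion
moe = z_star * se
lower, upper = p_hat - moe, p_hat + moe

# Same check as the "Checks" box: at least 5 successes and 5 failures
technique_ok = successes >= 5 and failures >= 5

print(f"p-hat = {p_hat:.2f}, z* = {z_star:.5f}, SE = {se:.5f}, MOE = {moe:.5f}")
print(f"95% confidence interval: {lower:.5f} to {upper:.5f}")
print("Technique okay?", technique_ok)
```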
Is it likely that this sample is random? No. First, the poll was taken of the membership of the National
Association for Female Executives, based in New York. It is doubtful that this group is representative of
female executives as a whole (the target population), but even if it were, we must ask how the poll was
taken. Most likely it was either a convenience sample (for example, of women in the home chapter of
NYC), or a self-selected sample (of members who, say, visited the organization's website and chose to
vote). If the former is the case, then the sample will reflect the bias of NYC residents. If the latter is the
case, then the harassment figure is probably inflated, since the people most eager to express their opinion
are likely to be those with strong feelings on the matter.
The second bias is due to the timing of the poll. The issue of sexual harassment was saturating the media at
that time, and emotions were running high on the Thomas/Hill debate. This could lead people to identify
harassment as a problem who normally would "shrug it off".
Note the "Checks" box on my Excel sheet. In order for this technique to be valid, we should have at least 5
successes and 5 failures in our sample. In this case, 23% of the 1300 votes were "failures"—women who
said "no". 0.23 × 1300 = 299 is obviously a lot bigger than 5, and the remaining 1001 women were
"successes"—they said "yes".
Never use a statistical technique without checking its assumptions!
AD. Determination of Appropriate Sample Size (Proportion)
Problem: Ms. Goodman wishes to conduct a study to determine who controls the TV remote in
American couples, the man or the woman. She will construct a 95% confidence interval for
the proportion of men who control the remote, and wants her margin of error to be no more
than 4.5%. How large a sample should she take?
This is a proportion problem, and the margin of error in the proportion formula is the product of the critical
z score (z*) and the standard error of the proportion, √(p(1 – p)/n). The z score for 95% is just 1.960, as
we saw in the solution box on page 2 of these demonstrations. (It can also be computed by z* =
NORMSINV((1+c)/2).) The MOE for this problem is to be 0.045. So for Ms. Goodman to get her reported
answer, we'd need that
0.045 = 1.960 × √(p(1 – p)/n)
Now divide both sides by 1.96 and square both sides:
(0.045/1.96)² = p(1 – p)/n,
or
n = p(1 – p)/(0.045/1.96)²
The problem, of course, is that we don’t know what p is! We haven’t yet taken the sample! What do we
do?
Two approaches are often adopted, and we’ll discuss them both.
(1) Cover your butt. If you look at the formula for n above, you’ll find that it always takes on its
biggest value when p is 0.5. We can feel confident then, that our sample will be large enough if n
is at least
0.5(1 – 0.5)/(0.045/1.96)²
This is 474.3, so surveying 475 randomly selected couples should be sufficient.
(2) If you don’t want to take a worst case estimate of p, your only alternative is to use a reasonable
estimate for it. Often this value is obtained by doing a rather small study and using the observed
value of p in the equation on the last page. For example, Ms. Goodman might do a preliminary
survey of 50 couples and find that in 40 of those 50 couples, the man controlled the remote. Using
p = 0.8 in the formula for n on the last page, then, would give 303.5, so Ms. Goodman might
decide on a total survey of, say 320 people. (The slight inflation of the value of n would be a good
idea, since our value for p was estimated.)
Note that this second approach can also be used to estimate the required sample size when
building a confidence interval for the population mean, too. We proceed in the same way, taking
the expression for MOE and setting it equal to the MOE desired. For the mean, though, this
formula is going to include σ or s, and there is no “worst case” value for these. In brief, you have
to be able to have a reasonable guess of s or σ (or do a small preliminary study to find a
reasonable guess), or you can’t determine the required sample size for a certain MOE in a study of
a confidence interval for the population mean.
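Both approaches for the proportion come down to plugging a value of p into n = p(1 – p)/(MOE/z*)². Here is a minimal sketch; the function name sample_size_for_proportion is made up for this illustration, and it uses the unrounded z* rather than 1.96, so the decimals can differ slightly from the hand calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(moe, c, p=0.5):
    """Smallest n (before rounding up) giving the desired margin of error.

    Hypothetical helper: p = 0.5 is the worst case; pass a pilot-study
    estimate of p to use approach (2) instead.
    """
    z_star = NormalDist().inv_cdf((1 + c) / 2)
    return p * (1 - p) / (moe / z_star) ** 2


worst_case = sample_size_for_proportion(moe=0.045, c=0.95)         # approach (1)
with_pilot = sample_size_for_proportion(moe=0.045, c=0.95, p=0.8)  # approach (2)

# Always round up: rounding down would leave the MOE slightly too large
print(f"Worst case (p = 0.5): n = {worst_case:.1f}, so survey at least {ceil(worst_case)}")
print(f"Pilot estimate (p = 0.8): n = {with_pilot:.1f}, so survey at least {ceil(with_pilot)}")
```

As the text suggests, when p comes from a small pilot study it is sensible to pad the second figure a little.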