Excel Manual
Chapter 8
Estimation of the Mean and Proportion
While the spreadsheet setups described in this guide may seem to be getting more complicated,
once they are created (and tested!), they will serve as handy tools for making quick and dynamic
calculations. This section explains how to use Excel to create confidence intervals for population
means and proportions as well as hypothesis testing for both.
Since the calculations of both confidence intervals and hypothesis testing are similar, you can
set up the same spreadsheet to solve both types of problems. Excel displays its calculations as
the values are entered, so you can observe how changing the values affects the results. For
example, you can play with the n value to see what happens if it is increased or decreased, or
find out how changing the sample proportion would affect the margin of error.
This point cannot be emphasized enough: The spreadsheets created in this chapter need to be
tested using known values to make sure that you are obtaining the correct results. Use the
examples shown here to ensure an accurate setup.
Estimating a Population Mean: σ known
=CONFIDENCE(alpha, standard deviation, sample size): This formula calculates the margin of
error for a confidence interval. Note: This function applies only to estimating a
population mean when the population standard deviation is known.
The following illustration shows how to set up a spreadsheet to calculate the margin of error
and confidence interval given the α-level, mean, standard deviation, and sample size:
Note the negative sign on the NORMSINV() function. NORMSINV(α/2) returns a negative value, so
the negative sign makes the Z value positive. Input data in the white cells; the shaded areas
contain formulas or labels.
Chapter 8/ Introductory Statistics - Mann
The example below uses the information in Example 8-1 on page 366 of your text, creating a
confidence interval for estimating the mean cost of a college textbook:
To enter the Greek letters as shown, select Symbol from the Symbols group in the Insert tab,
choose the desired letter and click Insert.
The 90% confidence interval is found in cells B10 and C10. We are 90% confident that the true
mean cost of a college textbook is in the interval (133.49, 156.51). This differs slightly from
the text’s results because of rounding: the text rounds Z* to the nearest hundredth,
whereas Excel carries many more decimal places. (* indicates that the Z-score is a “critical value.”)
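For readers who want to check the spreadsheet outside Excel, the same calculation can be sketched in Python. This is only an illustration, not part of the manual's setup. The inputs used here (sample mean 145, σ = 35, n = 25) are not stated in this section; they are assumed values that reproduce the interval quoted above. The function name z_interval is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, alpha):
    """Confidence interval for a population mean when sigma is known."""
    # inv_cdf(alpha / 2) is negative, just like NORMSINV(alpha/2);
    # the leading minus sign makes the critical value positive.
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    # Margin of error: the quantity CONFIDENCE(alpha, sigma, n) is described as returning.
    margin = z_crit * sigma / sqrt(n)
    return mean - margin, mean + margin


# Assumed inputs (not given in this section): mean 145, sigma 35, n 25, 90% confidence.
lower, upper = z_interval(mean=145, sigma=35, n=25, alpha=0.10)
print(f"({lower:.2f}, {upper:.2f})")  # (133.49, 156.51)
```

Because the critical value is not rounded to two decimals here, the result matches the Excel output rather than the textbook's rounded answer.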
Finding the Z and Required Sample Size for a Confidence Interval
Calculating values of zα/2 for any α.
Create the following spreadsheet to automatically calculate the zα/2 given the desired α level.
You can also set up the bottom portion of this spreadsheet to automatically yield the required
sample size to ensure that the margin of error is below a specified level for a given α level and
standard deviation.
The following are the results for Example 8-3 on page 371. We have calculated Z and the sample
size for estimating the mean debt of college graduates with 99% confidence, given a standard
deviation of $11,800 and an error of no more than $800.
From the results, we see that a sample size of 1444 is needed for 99% confidence with an error
of $800. This value is slightly different from the text due to the rounding of Z*.
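The sample-size calculation can also be cross-checked with a short Python sketch. It assumes the usual formula n = (z·σ/E)², rounded up to the next whole number; the function name sample_size_for_mean is hypothetical, and the inputs are the ones given above for Example 8-3.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, max_error, alpha):
    """Return (z critical value, required n) for estimating a mean with sigma known."""
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    n_exact = (z_crit * sigma / max_error) ** 2
    # Always round up: rounding down would leave the margin of error too large.
    return z_crit, ceil(n_exact)


z_crit, n = sample_size_for_mean(sigma=11_800, max_error=800, alpha=0.01)
print(round(z_crit, 4), n)  # 2.5758 1444
```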
Estimating a Mean When σ is Not Known
=TINV(probability, degrees of freedom): This calculates the t value for Student’s t-distribution
based on the degrees of freedom (n-1) for a specific probability. This function assumes a
two-tailed or “not equal” alternative hypothesis or a two-sided confidence interval.
=TDIST(t-statistic, degrees of freedom, tails): This calculates the tail probability of a t test
statistic based on the degrees of freedom (n-1) and the number of tails (1 or 2). It is
important to note that the t statistic must be positive. If you use this function in a
situation where the value could be negative, you will need to use the ABS() function to
make it positive or else an error will result.
Note: TINV always treats the probability as two-tailed. Therefore, if a one-tailed test is desired,
you have to enter twice the one-tailed probability (2α) to obtain the correct critical value.
On the other hand, the TDIST function allows either a one-tailed or a two-tailed test, and
this must be indicated as the final argument of the function.
The following graphic demonstrates how to set up a spreadsheet to calculate the t necessary to
create a confidence interval based on sample size (n) and significance level (α):
The following spreadsheet will calculate the entire confidence interval automatically:
The next spreadsheet shows the results of creating a 95% confidence interval for the average
premium paid for family health insurance using the t-distribution from the data in Example 8-5
on page 378 in your text. The results show that we are 95% confident that the true mean
premium paid for health insurance coverage lies in the interval (6269.78, 6930.22). There is a
slight difference from the confidence interval in the textbook due to a more precise value of t.
Alternatively, if you would like Excel to perform all your calculations and you want to be able to
use the actual data in Excel, the following spreadsheet is set up to calculate everything—
including n, mean, and standard deviation. The formulas for n, mean and standard deviation in
this sheet can accommodate any number of entries in column D since the range is open-ended.
In other words, because the range (D:D) has neither a top nor a bottom bound, every number in
the column will be included in the calculation.
The following example uses the normally distributed data from Exercise 8-47 on page 381 in
your text.
Finally, you may also use the Data Analysis ToolPak to calculate a confidence interval from a
sample. This will only work for a t-distribution. It is not available for the previous confidence
intervals that involved the z-distribution.
To use the Data Analysis ToolPak, enter the data in a column as shown in the previous example.
Select Data Analysis from the Analysis group in the Data tab. (If it is not there, you will need to
add it as described on the first page of Chapter 2.) Select Descriptive Statistics from the list and
click OK. Click on the red arrow in the input range option and select the list of data in your
spreadsheet. If you have a label in the first cell of the input range, be sure to check the
appropriate box. Check Summary Statistics and Confidence Level for Mean. Enter the desired
confidence level (100(1-α)). Click OK.
The result is stored in a separate worksheet. The value reported as Confidence Level is the
margin of error, so it must be added to and subtracted from the sample mean to obtain the interval.
The output for the normally distributed data set in Exercise 8-47 on page 381 is shown below:
Hence, the 95% confidence interval for the true mean of this data set is
1.409 +/- 4.810 = (-3.401, 6.219).
Estimation of a Population Proportion: Large Samples
You can set up a spreadsheet similar to the one above to automatically calculate these values
for a proportion. B3, B4, and B9 in the spreadsheet below are the input areas.
One notable difference in this sheet from the previous example is the calculation of Margin of
Error. In this sheet, the formula Z-Critical * Standard Error (=B11*B6) has been used instead of
the CONFIDENCE() function, since the CONFIDENCE() is used only in population mean situations.
The CONFIDENCE() function may be used with some modifications: Use the Standard Error for
the standard deviation argument and a sample size of n = 1. This works because the Standard
Error already has the sample size calculated into it.
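The reason the n = 1 trick works can be seen in a small Python sketch. The confidence() function below is only a rough stand-in for Excel's CONFIDENCE(), assuming it computes z · (standard deviation) / √(sample size). The values of n and p-hat are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence(alpha, standard_dev, size):
    """Rough stand-in for Excel's CONFIDENCE(): z * standard_dev / sqrt(size)."""
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    return z_crit * standard_dev / sqrt(size)


# Hypothetical inputs, for illustration only.
n = 1000
p_hat = 0.55
alpha = 0.05

# The standard error of a proportion already contains the sample size.
standard_error = sqrt(p_hat * (1 - p_hat) / n)
z_crit = -NormalDist().inv_cdf(alpha / 2)

direct = z_crit * standard_error                       # Z-Critical * Standard Error
via_confidence = confidence(alpha, standard_error, 1)  # sample size fixed at 1

print(abs(direct - via_confidence) < 1e-12)  # True
```

With the sample size set to 1, the division by √1 changes nothing, so both routes give the same margin of error.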
The spreadsheet illustrated has the number of successes entered in B4 as the formula n·p̂,
the product of the sample size and the sample proportion. The number of successes (B4)
may also be entered as a numerical value such as 440:
The following sheet references Example 8-7 on page 384 of your text and calculates the
confidence interval about the population proportion. In this example, the sheet determines the
proportion of American adults that said owning a home is a very important part of the American
dream with 95% confidence.
According to the results, the 95% confidence interval for the proportion of American adults that
say that owning a home is a very important part of the American dream is (.509, .591).
Determining a Sample Size for Estimating a Population Proportion
Next we’ll review two different Excel methods for determining the sample size necessary to
obtain a maximum margin of error. The first uses the spreadsheet from the previous example
along with a handy feature in Excel called Goal Seek. Begin by entering an estimated sample size
in B3; an arbitrary value such as 200 will do. Be sure that x, the number of successes in B4, is
calculated by formula. The .5 in that formula is p-hat; using .5 yields the largest possible
margin of error.
B3: 200
B4: =B3*0.5
Next, go to the Data tab and select Goal Seek from the What-If Analysis menu, as
shown in the illustration below:
The Margin of Error cell (B12) must be entered into the Set Cell entry. Enter .02 for the To Value
entry to obtain the maximum margin of error requested in the text, and enter the sample size
cell (B3) for the By Changing entry. See illustration below.
When you click OK, Excel will perform a numerical analysis routine to determine the
approximate sample size to obtain the desired margin of error. The results for this scenario are
shown below:
In this case, the margin of error is slightly over the .02 allowed. Step up the sample size
incrementally from 2357, entering values by trial and error. You will find that 2401 is the
first sample size that yields a margin of error less than .02 at a confidence level of 95%,
just as shown in Example 8-9.
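The trial-and-error step can be imitated with a simple loop, sketched here in Python. This is not how Goal Seek works internally; it only automates stepping the sample size up by one until the margin of error drops below the target. The function names are hypothetical, and p-hat = .5 and 95% confidence are taken from the example.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p_hat, alpha):
    """Margin of error for a proportion: z * sqrt(p_hat * (1 - p_hat) / n)."""
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    return z_crit * sqrt(p_hat * (1 - p_hat) / n)


def first_n_below(target, start, p_hat=0.5, alpha=0.05):
    """Step n up from `start` until the margin of error is less than `target`."""
    n = start
    while margin_of_error(n, p_hat, alpha) >= target:
        n += 1
    return n


print(first_n_below(target=0.02, start=2357))  # 2401
```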
As an alternative method, you can set up a spreadsheet to find the answer. The setup for this
looks essentially the same as the sheet for means. The only difference is that instead of using
the standard deviation for the margin of error, the calculation uses the product of the
probability of success and the probability of failure, pq or p(1-p). The spreadsheet should
look like this:
To practice this, look at Example 8-9 on page 387. To estimate how many subjects must be
surveyed to ensure that the margin of error is no more than 2% (with an estimated
proportion of .5 and a 95% level of confidence), set up the spreadsheet as shown:
To obtain the sample size necessary to ensure the margin of error is at most 2%, round this
value up to 2401.
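As a cross-check on the spreadsheet value, the direct formula can be sketched in Python, assuming the standard form n = z²·p·q / E². The function name sample_size_for_proportion is made up for this sketch; the inputs are those of Example 8-9 as described above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, max_error, alpha):
    """Return (unrounded n, n rounded up) for estimating a proportion."""
    q = 1 - p
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    n_exact = z_crit ** 2 * p * q / max_error ** 2
    return n_exact, ceil(n_exact)


n_exact, n = sample_size_for_proportion(p=0.5, max_error=0.02, alpha=0.05)
print(round(n_exact, 2), n)  # 2400.91 2401
```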