Download MATH 115 ACTIVITY 1:

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Taylor's law wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

German tank problem wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
MATH 116 ACTIVITY 9:
WHY:
Estimating a mean using the t-distributions (using only sample data)
We want to be able to estimate a mean for a population based strictly on sample data - that is, without having to
know population standard deviation (). "Student's t" is the usual statistic for forming confidence intervals [and
testing hypotheses] for the mean of a population. The methods used (and the form of the tables, for hand checking)
are common to a wide variety of estimation and test situations, so it is essential to get a firm grasp on them.
LEARNING OBJECTIVES:
1.
Be able to find a confidence interval for a specified level of confidence in estimating a population mean (using Student's t).
2.
Understand the meaning of a confidence interval estimate for a population parameter.
3.
Work as a team, using the team roles.
CRITERIA:
1.
Success in completing the exercises.
2.
Success in working as a team and in filling the team roles.
RESOURCES:
1.
The Math 116 Statistics Handbook - chapter 6 on Estimation of a mean and the table of Critical Values for Student's t.
2.
Your calculators
3.
The team role desk markers (handed out in class for use during the semester)
4.
40 minutes
PLAN:
1.
Select roles, if you have not already done so, and decide how you will carry out steps 2 and 3
2.
Read the discussion and work through the model of calculation of a confidence interval based only on sample data [when we
don't know ] (below)
3.
Complete the exercises given here - be sure all members of the team understand and agree with all the results in the recorder's
report.
4.
Assess the team's work and roles performances and prepare the Reflector's and Recorder's reports including team grade .
DISCUSSION:
We have considered the general form of a confidence interval for estimating the mean of a population when the variable is


approximately normal or the sample is large: If  is known, our C confidence interval is given by x - E to x + E , with E =
1-C
z*X/ n . Here z* is the upper z-value to cut off the middle C of the area under the z-curve (cuts off the top 2 of the area).
We could find z* values for typical confidence levels in the bottom row of the "Student's t" table.
If  is not known, we estimate it with s (sample standard deviation) - but now when we standardize, we are working with t


x-
x-

=
instead of Z =
- there are two sources of variation (variation in x and variation in s) instead of one. Due to
s/ n
/ n
work of S. Gossett (about 1910 – he used the name “Student” when he published because the brewing industry was so very

x-
secretive) we know the distribution of
- it's a lot like Z, but spreads out more - and how much more depends on
s/ n

x-
"degrees of freedom" (how much s is likely to vary). For t =
, the "degrees of freedom" measurement is n-1 (sample
s/ n
size minus 1).
When working with s instead of (essentially – all the time for working with means) we use t* instead of z* for confidence
intervals. We have to use the appropriate row (determined by degrees of freedom) to find the critical values (the t* values). 

our formula for estimating  is still x - E to x + E , but with E = t*sX/ n based on n-1 degrees of freedom.
MODEL:
If we are trying to estimate the height of mature ponderosa pines based on a sample of 28 heights giving a mean 58.2 ft and
standard deviation 6.7 ft [notice this says the standard deviation came from the sample - that's the usual situation]. We would
first note that heights tend to be approximately normal, so our procedures seem reasonable [if we had the real data here, we'd
look at the histogram - should be unimodal, roughly symmetric, no outliers] .
To estimate the mean height with 90% confidence, E = t*(6.7)/ 28, df = 27 so t* = 1.703 [notice this is a bit bigger than
1.645 which is the z* value for 90% confidence] and our error allowance is 2.2 ft
With 90% confidence, we say the mean height of ponderosa pines is between 56.0 ft and 60.4 ft.
If we wanted 95% confidence, all that would change would be the t* value - for df=27 and confidence = 95%, t* = 2.052 - we
would have to allow for error (in estimating) E = 2.052(6.7)/ 28 = 2.6 ft
With 95% confidence, we say the mean height is between 55.6 ft and 60.8 ft.
EXERCISES:
1.
A random sample of 15 households in a small city gives the following number of hours of television watched in one week (for
each) . A marketing firm wishes to determine whether the amount of television watching per week .
a.) Make a histogram or stem plot to show that the data are not badly skewed and have no outliers (so using our estimation
methods is justified)
b.) give a 90% confidence interval estimate of the mean amount of television watched (per week) by households in this city.
c.) Give a 99% confidence interval estimate for the mean amount of television watched (per week) by households in this city.
# hours of television watched (n = 15)
7.5
6.3 7.2
5.0 7.9 10.7 8.9 8.8 10.3 8.8 9.5 9.5 6.1 9.4 8.4
2.
Three confidence-interval estimates of a mean weight are given here. Two are based on the same data, but are 90% and 95%
confidence intervals. The other is a 90% confidence interval based on a different set of data. Identify them - which is which?

[Hint - the two estimates based on the same data use the same value of x ]
a.) 30.0 g to 32.2g
b.) 29.6 g to 32.0g
c.) 29.7g to 32.5 g
3.
If 200 researchers collect sample of Lake Michigan water and each uses her sample to calculate a 90% confidence interval
for the mean concentration of mercury (per liter) about how many of the intervals will be wrong (will miss the real mean).
Assume these are very careful researchers, and they make no mistakes, so incorrect results are the result of random chance,
not of errors by the researchers.
CRITICAL THINKING QUESTIONS:(answer individually in your journal)
1.
Why are the t*-values for various confidence levels larger than the corresponding z*-values for these confidence levels? Why
do the t* values get close to the z* values when we look at large "degrees of freedom" values [which correspond to lare
sample sizes]?
2.
Can you see how we switched from calculating probability of getting a certain type of sample (knowing  and ) to

estimating the mean for the population, by using the idea of "error of the estimate" (amount by which x misses the mean )?
Does the connection make sense, or does it seem like magic?
SKILL EXER CISES
Exercises for Chapter 6, #4,5, 7-12 (skip 9c, 11b)