```
8.3: Confidence Intervals for Means
2.9.2017
Means vs. Proportions
• We started with calculating confidence
intervals for proportions
– Usually categorical/discrete variables
– E.g. whether a card is red
• Now we’re going to do confidence intervals for
means
– Usually used for continuous variables
– Can also be used for discrete quantitative variables
(but not categorical ones)
Two Very Different Situations
We know the population standard deviation σ
• This is basically what we did in 8.1
• We use the population standard deviation to calculate
the standard deviation of the sampling distribution
of the sample mean
• Critical values follow the standard Normal distribution
(just as they did for proportions in 8.2)
We DON’T know the population standard deviation σ
• We have to estimate it from the sample
• We use the sample standard deviation 𝑆𝑥 to estimate
the standard deviation of the sampling distribution
of the sample mean
• Critical values follow the t distribution
Known Population Standard Deviation
• This is the easier case
• CI=(point estimate) ± (critical value)(St. dev of
distribution)
• The standard deviation of the sampling distribution equals
the standard deviation of the population divided by
the square root of n: σ / √𝑛
Example
• Researchers would like to estimate the mean
cholesterol level of a particular variety of
monkey that is often used in laboratory
experiments. They take a sample of 200
monkeys, and find a mean of 34.2 mg/dl. A
previous study has found that the standard
deviation of cholesterol levels is about 5
mg/dl. Find the 95% confidence interval for the mean.
Example
• (34.2) ± (critical value)(5 / √200)
• (34.2) ± (2)(.354)
• (34.2) ± (.707)
• 33.493 to 34.907
Back to the Monkeys
• Researchers would like to estimate the mean
cholesterol level of a particular variety of
monkey that is often used in laboratory
experiments. They would like their estimate to
be within 1 mg/dl of the true value at a 95%
confidence level. A previous study has found
that the standard deviation of cholesterol
levels is about 5 mg/dl. What sample size do
they need to take?
Back to the Monkeys
• ME=(critical value)(st dev of distribution)
• 1=(critical value)(5 / √𝑛)
• 1=(2)(5 / √𝑛)
• .5=(5 / √𝑛)
• √𝑛=5/.5
• √𝑛=10
• 𝑛=100
Back to the Monkeys (again)
• What if we wanted a 90% confidence level
instead of a 95% confidence level?
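• 1=(critical value)(5 / √𝑛)
• Critical value = invNorm(.95,0,1) = 1.645
(a 90% interval leaves .05 in each tail, so .95 of the area is to the left)
• 1=(1.645)(5 / √𝑛)
• .608=(5 / √𝑛)
• √𝑛=8.224
• 𝑛=67.63
• Sample sizes must be whole numbers, and rounding down would make the
margin of error slightly bigger than 1, so round up: they should sample 68 monkeys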
When we DON’T know the population
standard deviation
• This is the harder situation
– And, unfortunately, more common
• Note: this only applies to means, NOT PROPORTIONS
• The difference occurs in the way that we find critical
values
– Before, when we knew the standard deviation of the
population, we could easily calculate how many standard
deviations we needed to go out from the mean to capture
the desired proportion of the data
• Capturing the middle .95 for a 95% confidence interval gave us a critical value of about 2 (more precisely, 1.96)
Critical Values
• So now we are replacing σ (population standard
deviation) with 𝑆𝑥 (sample standard deviation)
• z was our critical value
– For proportions
– For means when we know σ
• But now our critical value is t
Why do we need a new critical value?
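• If we take a sample from a population and use the sample
standard deviation 𝑆𝑥 in place of σ, that estimate itself changes
from sample to sample. This extra variability means the standardized
sample mean varies more than a standard Normal distribution would
predict, especially when the sample is small.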
The t distribution
• This new distribution is
called the t distribution
• It looks pretty similar to a
normal distribution
• The difference is that it
has more area in the tails
than a normal
distribution
– Particularly with smaller
sample sizes
So…how do we get critical values?
• Instead of using standard normal probabilities
– Table A or invnorm()
• Now we will use the t distribution
– The t distribution has only one parameter: ‘degrees of
freedom’ or ‘df’
– For a sample mean, df = sample size minus 1 (n − 1)
What are degrees of freedom?
• This is a common question that people want to
ask—what do the degrees of freedom mean?
• The teacher’s edition, unsatisfyingly, says:
“Unfortunately, there is no simple answer. For
now, simply explain that the shape and spread of
the t distributions depend on the degrees of
freedom, which depend on the sample size. The
larger the sample size—and the more degrees of
freedom—the closer the t-distributions come to
the standard Normal distribution”
What are degrees of freedom?
• That answer doesn’t help much
• Unfortunately, the answer has to do with ideas that are beyond the scope of this course
• One fairly accurate way of thinking about it is that it is the number
of pieces of information that we have that allow us to make an
estimate
• For more precise (and more complicated) explanations:
https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
• In practice, you don’t need to know what degrees of freedom really
mean—you just need to know how many there are
So…how do we get critical values?
• Instead of using standard normal probabilities
• Now we will use the t distribution
– The t distribution has only one parameter: df
• 2 options:
1. Use Table B
2. (if you have a TI-84 or above): invT()
Using Table B
• Table B is (I think) more intuitive to use than
Table A, so it is a viable option
– Find your degrees of freedom (df) on the rows,
and your confidence level C on the columns
– Where they intersect tells you the critical value
Using InvT
• Only if you have a TI-84 or above
– Sorry TI-83 owners
• Works the same way as invNorm
– You plug in the area to the left, which is (1 + C)/2 for confidence level C (e.g. .975 for 95%)
– And then the degrees of freedom
• For df = 11 at 95% confidence, Table B told us the critical value was 2.201
• invT(.975, 11) tells us that the critical value is 2.200985143
– Rounds to 2.201
• We can only use these procedures if the sample size is big
enough
• If n < 15, use them only if the data appear to be approximately Normal
(symmetric, single-peaked, no outliers)
• If 15 ≤ n < 30, use them unless there are significant outliers or
strong skewness
• If n ≥ 30, go for it!
• Side note: if you are choosing the sample size for a study, it
should probably be 30 or more so that this isn’t a
problem
An Example
• John Isner is a professional tennis player. I take
an SRS of size 51 of his first serves and
measure their speed in miles/hour.
• In the sample, the mean is 124 mph and the
standard deviation is 8 mph
• Find the 98% confidence interval for the mean
• (Point Estimate) ± ME
• (Point estimate) ± (critical value)(St. dev)
• 124 ± (critical value)(𝑆𝑥 / √𝑛)
• df = 51 − 1 = 50, so the 98% critical value is 2.403
• 124 ± (2.403)(8 / √51)
• 124 ± (2.403)(1.12)
• 124 ± 2.691
• 121.31 to 126.69
We are 98% confident that John Isner’s mean first
serve speed is between 121.31 and 126.69 mph
Different Example
• A random sample of 30 students received SAT
math scores with a mean of 580 and a
standard deviation of 80
• Find the 60% confidence interval
• (Point Estimate) ± ME
• (Point estimate) ± (critical value)(St. dev)
• 580 ± (critical value)(𝑆𝑥 / √𝑛)
• df = 30 − 1 = 29, so the 60% critical value is .854
• 580 ± (.854)(80 / √30)
• 580 ± (.854)(14.606)
• 580 ± 12.473
• 567.53 to 592.47
We are 60% confident that the mean SAT math score is
between 567.53 and 592.47
```
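This is not part of the original slides. If you want to check the known-σ interval from the monkey cholesterol example on a computer instead of a calculator, here is a minimal Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `z_interval` is hypothetical. Passing `z_star=2` reproduces the slides' rounded critical value. Leaving it out uses the exact Normal quantile (about 1.96), which is the same number invNorm(.975, 0, 1) gives.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence, z_star=None):
    """Confidence interval for a mean when the population sigma is known."""
    if z_star is None:
        # The upper critical value has area (1 + C) / 2 to its left.
        z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    sd_of_mean = sigma / sqrt(n)  # st. dev of the sampling distribution
    margin = z_star * sd_of_mean
    return mean - margin, mean + margin


# Monkey cholesterol example: n = 200, sample mean 34.2 mg/dl, sigma = 5 mg/dl.
low, high = z_interval(34.2, 5, 200, 0.95, z_star=2)  # slides' rounded z* = 2
print(f"{low:.3f} to {high:.3f}")  # 33.493 to 34.907
low_exact, high_exact = z_interval(34.2, 5, 200, 0.95)  # exact z*, about 1.96
print(f"{low_exact:.3f} to {high_exact:.3f}")
```

With the exact critical value the interval comes out slightly narrower, about 33.507 to 34.893.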
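The sample-size questions in the monkey examples solve ME = (critical value)(σ/√n) for n. Below is a minimal Python sketch of that algebra. It assumes Python 3.8+, and `sample_size_for_margin` is a hypothetical name. The key design choice is rounding up with `ceil`, because rounding down would give a margin of error slightly larger than the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(sigma, margin, confidence, z_star=None):
    """Smallest whole n with (z*)(sigma / sqrt(n)) <= margin, for a known sigma."""
    if z_star is None:
        z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    # margin = z* * sigma / sqrt(n)  =>  sqrt(n) = z* * sigma / margin
    n_exact = (z_star * sigma / margin) ** 2
    return ceil(n_exact)  # round UP so the margin never exceeds the target


print(sample_size_for_margin(5, 1, 0.95, z_star=2))  # 100
print(sample_size_for_margin(5, 1, 0.90))            # 68
```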
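Here is a small simulation sketch (our own illustration, not from the slides) showing why replacing σ with 𝑆𝑥 puts more area in the tails. It assumes a Normal population with mean 0 and standard deviation 1 and samples of size 5. These parameters, the random seed, and the function name `tail_rates` are all hypothetical choices. For each sample the code standardizes the sample mean twice, once with the true σ and once with the sample standard deviation. It then counts how often each result lands beyond ±1.96.

```python
import random
from math import sqrt


def tail_rates(n=5, reps=20_000, mu=0.0, sigma=1.0, cutoff=1.96, seed=1):
    """Share of simulated samples whose standardized mean falls beyond
    +/- cutoff, standardizing with the true sigma vs. the sample sd."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation S_x (divides by n - 1).
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = (xbar - mu) / (sigma / sqrt(n))  # uses the population sigma
        t = (xbar - mu) / (s / sqrt(n))      # replaces sigma with S_x
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps


z_rate, t_rate = tail_rates()
print(f"beyond 1.96 using sigma: {z_rate:.3f}")
print(f"beyond 1.96 using S_x:   {t_rate:.3f}")
```

With σ, the rate should be close to .05, the Normal tail area. With 𝑆𝑥 it comes out noticeably higher, roughly .12 for samples of 5. That extra tail area is exactly what the t distribution accounts for.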
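If you do not have a TI-84, you can still get t critical values in code. The sketch below uses only the Python standard library, so it does not call a statistics package. Instead, it finds invT-style critical values by integrating the t density numerically (Simpson's rule) and then searching for the cutoff by bisection. This numerical method is a stand-in and is not how Table B or the TI-84 compute their values. The names `t_cdf`, `inv_t`, and `t_interval` are hypothetical. For moderate degrees of freedom like the ones in the John Isner and SAT examples, it should be accurate to several decimal places.

```python
import math


def t_cdf(x, df, steps=4000):
    """P(T <= x) for a t distribution, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def inv_t(area_left, df):
    """Critical value with `area_left` to its left (like invT on a TI-84).
    This sketch only handles area_left > 0.5, which covers CI critical values."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < area_left:  # widen the bracket until it holds the root
        hi *= 2
    for _ in range(50):               # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_left:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(mean, s, n, confidence):
    """One-sample t interval: mean +/- t* * s / sqrt(n), with df = n - 1."""
    t_star = inv_t((1 + confidence) / 2, n - 1)
    margin = t_star * s / math.sqrt(n)
    return t_star, mean - margin, mean + margin


# John Isner example: n = 51, mean 124 mph, S_x = 8 mph, 98% confidence.
print(t_interval(124, 8, 51, 0.98))
# SAT example: n = 30, mean 580, S_x = 80, 60% confidence.
print(t_interval(580, 80, 30, 0.60))
```

The Isner line should show a critical value of about 2.403 and an interval of about 121.31 to 126.69 mph. The SAT line should show about .854 and an interval of roughly 567.5 to 592.5, which agrees with the slides up to rounding. `inv_t(.975, 11)` returns about 2.201, the Table B value.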
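The sample-size guidelines for t procedures can be written as a small decision function. This is only a sketch. The function name and its two yes/no inputs (`looks_normal`, `outliers_or_skew`) are hypothetical. Those inputs stand for judgments you make by looking at a graph of the data, which the code cannot make for you.

```python
def t_procedures_ok(n, looks_normal, outliers_or_skew):
    """Rule of thumb from the slides for when t procedures are reasonable.

    looks_normal: the data look roughly Normal (symmetric, single-peaked,
        no outliers). Only consulted when n < 15.
    outliers_or_skew: the data show significant outliers or strong skewness.
        Only consulted when 15 <= n < 30.
    """
    if n < 15:
        return looks_normal
    if n < 30:
        return not outliers_or_skew
    return True  # n >= 30: "go for it!"


print(t_procedures_ok(51, looks_normal=False, outliers_or_skew=False))  # True
print(t_procedures_ok(10, looks_normal=False, outliers_or_skew=False))  # False
```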