Confidence Intervals
Math 283
Confidence Intervals for the Population Mean
Recall from the empirical rule that the interval of the mean plus or minus 2 standard deviations contains about 95% of the observations. So if $\bar{X}$ is distributed approximately normally, then
$$P\left(\mu - 2\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95.$$
If we rearrange it so $\mu$ is in the middle, then
$$P\left(\bar{X} - 2\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95.$$
The interval
$$\left(\bar{X} - 2\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2\frac{\sigma}{\sqrt{n}}\right)$$
has a probability of 0.95 of capturing the mean.
Definition:
If $\bar{X}$ is the sample mean of a random sample of size n from a population with variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
where $Z_{\alpha/2}$ is the Z value from the normal table with area $\alpha/2$ to its right. If $\sigma$, the population standard deviation, is unknown, it can be replaced by s, the sample standard deviation, with no serious loss of accuracy for large samples.
If we use α = 0.05 , we report that we are 95% confident that the population mean will be
within our interval. Why are we able to say we are 95% confident?
We know (from the Empirical Rule) that about 95% of all possible sample means will lie
within two standard errors of the actual population mean. We hope that our sample mean
is one of these, because if it is, then our confidence interval will contain the population
mean, and our estimate will be correct. If not, then our interval will be incorrect. But this
only happens 5% of the time.
The term "95% confidence" means that if we took repeated samples, and found a
confidence interval for each sample, 95% of those confidence intervals would actually
contain the population mean; 5% of them would not. Whether our own confidence
interval contains the population mean, we will never know!
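The repeated-sampling meaning of "95% confidence" can be checked with a small simulation. The sketch below is illustrative only: the population (normal with mean 50 and standard deviation 10), the sample size, the number of trials, and the function name `coverage` are all assumptions chosen for the demonstration, not values from these notes.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=36, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    mu, sigma, n, trials and seed are hypothetical values for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96 when alpha = 0.05
    moe = z * sigma / n ** 0.5               # margin of error with sigma treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - moe < mu < xbar + moe:     # did this interval capture mu?
            hits += 1
    return hits / trials

print(coverage())  # a value close to 0.95 is expected
```

Each simulated interval either contains the mean or it does not; the 95% describes the long-run fraction of intervals that do.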
The Empirical Rule Theorem vs. A Confidence Interval
The Empirical Rule Theorem and A Confidence Interval for the Mean are used to answer
two different research questions.
The Empirical Rule Theorem is used to answer the question "Most of the values for the
variable fall between what two values?" This is a range of values used to discuss what
we know about the individuals in our sample or population.
A Confidence Interval is used to answer the question "What is the mean of the
population?" This is a range of values used to give reasonable values for the population
mean.
Example:
The average zinc concentration recovered from a sample of zinc measurements in 36
different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and
99% confidence intervals for the mean zinc concentration in the river. Assume the
population standard deviation is 0.3.
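One way to work this example is to apply the definition directly. The following is a minimal sketch using Python's standard library; the helper name `z_interval` is an assumption made for illustration, and the critical values come from `NormalDist().inv_cdf` rather than a printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Return (lower, upper) for a z confidence interval with known sigma."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    moe = z * sigma / sqrt(n)                # margin of error
    return xbar - moe, xbar + moe

# Zinc example: xbar = 2.6, sigma = 0.3, n = 36
for conf in (0.95, 0.99):
    lo, hi = z_interval(2.6, 0.3, 36, conf)
    print(f"{conf:.0%} CI: ({lo:.3f}, {hi:.3f})")
```

The 95% interval comes out to roughly (2.50, 2.70) and the 99% interval to roughly (2.47, 2.73); the higher confidence level gives the wider interval.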
Example:
An important property of plastic clays is the percent of shrinkage on drying. For a certain
type of plastic clay 45 test specimens showed an average shrinkage of 18.4% with a
standard deviation of 1.2. Estimate the mean percent shrinkage for this type of clay with
a 90% confidence interval.
Another way to think about the confidence interval is $\bar{x} \pm \text{MOE}$, where MOE is the margin of error, $\text{MOE} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Notice the width or precision of our confidence interval depends on the confidence level $1-\alpha$, the sample size n, and the standard deviation of the population. The accuracy of our sample mean depends on the sample size, n, the standard deviation of the population, $\sigma$, and bias.
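To see how the margin of error responds to the confidence level and the sample size, it can help to tabulate it. This is a sketch under assumed values: the standard deviation of 10 and the grid of sample sizes are hypothetical, and `margin_of_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """MOE = Z_{alpha/2} * sigma / sqrt(n) for confidence level conf."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for conf in (0.90, 0.95, 0.99):
    for n in (25, 100, 400):
        moe = margin_of_error(sigma, n, conf)
        print(f"confidence={conf:.2f}  n={n:3d}  MOE={moe:.3f}")
```

Raising the confidence level widens the interval, while quadrupling n cuts the margin of error in half because n enters through its square root.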
Example: Decision Making with a Confidence Interval
The owners of General Light are planning to advertise their light bulbs in the Sunday
edition of the newspaper. In the ad, they want to report "the mean lifetime of their light
bulbs." To determine the mean lifetime of their light bulbs, they took a random sample
of 40 light bulbs. For their sample, the bulbs lasted on average, 299.5 hours with a
standard deviation of 58 hours.
1. Construct a 95% confidence interval for the mean lifetime of light bulbs.
2. Should General Light advertise that the mean lifetime of their light bulbs is 350
hours? Why or why not?
3. Should General Light advertise that the mean lifetime of their light bulbs is 310
hours? Why or why not?
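A possible way to approach this example is to build the interval and then ask whether each advertised value falls inside it. The sketch below assumes the large-sample z interval with s standing in for σ (n = 40); the helper name `is_plausible` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 40, 299.5, 58.0            # sample size, sample mean, sample std. dev.
z = NormalDist().inv_cdf(0.975)         # Z_{alpha/2} for 95% confidence
moe = z * s / sqrt(n)                   # s used in place of sigma (large sample)
lower, upper = xbar - moe, xbar + moe

def is_plausible(claimed_mean):
    """True if the claimed mean lies inside the 95% confidence interval."""
    return lower < claimed_mean < upper

print(f"95% CI: ({lower:.1f}, {upper:.1f})")
for claim in (350, 310):
    print(claim, "is plausible" if is_plausible(claim) else "is not plausible")
```

The interval is roughly (281.5, 317.5) hours, so 350 hours is not a reasonable value for the mean while 310 hours is. Using the t critical value for 39 degrees of freedom instead would widen the interval slightly without changing either conclusion.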
Determining Sample Size
When our objective is to estimate the population mean, µ , we should do the following to
determine our sample size:
1. Determine the largest margin of error you are willing to accept and a confidence
level.
2. Obtain or estimate the population standard deviation.
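3. Find the sample size, n, that makes the following true: Your MOE = $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Solving for n gives $n = \left(Z_{\alpha/2}\,\sigma/\text{MOE}\right)^2$, rounded up to the next whole number.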
4. Check the sample size against your budget. If necessary, return to step 1.
Example:
You are planning a survey of starting salaries for liberal arts graduates from your
college. From a pilot study you estimate that the standard deviation is about $9000.
What sample size do you need to have a margin of error equal to $400 with 95%
confidence?
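The sample-size steps can be sketched in a few lines by solving the margin-of-error equation for n and rounding up. The function name `sample_size_for_mean` is an assumption for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, moe, conf=0.95):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= moe."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / moe) ** 2)  # always round up, never down

# Salary example: sigma is about $9000, desired MOE is $400, 95% confidence
print(sample_size_for_mean(9000, 400))  # 1945
```

Rounding up matters: a sample of 1944 would leave the margin of error slightly above $400.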
Cautions:
∼ Data must come from a simple random sample (SRS).
∼ There is no correct method for data collected haphazardly with bias of unknown size.
∼ The sample mean is not resistant to outliers. So look at your data carefully before
determining a CI.
∼ If n is small and the population is not normal, the true confidence level may be
different from the one you used.
o As long as n ≥ 30, the CLT applies.
o If n ≥ 15, it is ok unless there are extreme outliers or quite strong
skewness.
∼ Must know σ.
The Case of the Unknown σ
If $\bar{X}$ is the sample mean of a random sample of size n where $X_1, \ldots, X_n$ are from a normal
distribution, then the random variable
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a probability distribution called the t-distribution with n − 1 degrees of freedom.
Properties of the t-distribution (or Student’s t-distribution)
• Bell shaped with mean zero.
• The variance is $\frac{\nu}{\nu - 2}$ (for $\nu > 2$), where $\nu$ is the degrees of freedom.
• The limiting distribution of the t is the standard normal distribution as n goes to
infinity.
See Table attached.
A (1 − α )100% Confidence Interval for µ when σ is unknown
Let $\bar{X}$ and s be the sample mean and standard deviation of a random sample of size n
from a normally distributed population. Then the confidence interval is given by
$$\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$
where $t_{\alpha/2}$ is the value from the t-distribution with n − 1 degrees of freedom and upper tail
probability $\alpha/2$. Note that this interval is fairly robust to non-normal data. If the
data are not too skewed, the t procedure is useful when 15 ≤ n < 40. When n ≥ 40, the t
procedure can be used even for skewed data.
Example:
The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2,
and 9.6 liters. Find a 95% confidence interval for the mean of all such containers,
assuming the data are from a normal distribution.
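This example can be worked from the raw data as follows. The sketch is a minimal illustration: Python's standard library has no t-distribution, so the critical value 2.447 is read from the attached t table (d.f. = 6, upper tail probability 0.025) rather than computed; the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

volumes = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]  # liters

n = len(volumes)
xbar = mean(volumes)       # sample mean
s = stdev(volumes)         # sample standard deviation (divides by n - 1)
t_crit = 2.447             # from the t table: d.f. = n - 1 = 6, upper tail 0.025
moe = t_crit * s / sqrt(n)

print(f"mean = {xbar:.2f}, s = {s:.3f}")
print(f"95% CI: ({xbar - moe:.2f}, {xbar + moe:.2f})")
```

With a sample mean of 10.0 and s of about 0.283, the interval is roughly (9.74, 10.26) liters.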
Example:
A random sample of 12 graduates of a certain secretarial school typed an average of 79.3
words per minute with a sample standard deviation of 7.8 words per minute. Assuming a
normal distribution for the number of words typed per minute, find a 99% confidence
interval for the mean number of words per minute for all graduates.
Confidence Interval for the Population Proportion
If X 1 , X n are independent observations from a population with probability of success,
n
then the random variable X = ∑ X i is distributed binomial with E ( X ) = np and
i =1
V (=
X ) np (1 − p ) . We showed that Z n =
X − np
np (1 − p )
approaches the standard normal
distribution as n goes to infinity.
So the sampling distribution of pˆ = X / n is approximately normal with µ p̂ = p and
σ pˆ =
p (1 − p )
.
n
A $(1-\alpha)100\%$ confidence interval for p, the population proportion, is given by
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
where $Z_{\alpha/2}$ is the Z value from the normal table with area $\alpha/2$ to its right.
Example:
A survey of 1280 student loan borrowers found that 448 had loans totaling more than
$20,000 for their undergraduate education. Give a 95% confidence interval for the
proportion of all student loan borrowers who have loans of more than $20,000 for their
undergraduate degree.
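The proportion interval for this example can be sketched directly from the formula. The helper name `proportion_interval` is an assumption for illustration, and the critical value is computed with `NormalDist().inv_cdf` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf=0.95):
    """Large-sample z confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # Z_{alpha/2}
    moe = z * sqrt(p_hat * (1 - p_hat) / n)       # estimated standard error times z
    return p_hat - moe, p_hat + moe

# Student loan example: 448 of 1280 borrowers
lo, hi = proportion_interval(448, 1280)
print(f"p_hat = {448 / 1280:.2f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

Here the sample proportion is 0.35 and the interval is roughly (0.324, 0.376).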
Determining Sample Size
When our objective is to estimate the population proportion, p, we should do the
following to determine our sample size:
1. Determine the largest margin of error you are willing to accept and a confidence
level.
2. Determine p from previous study or use p = 0.5 .
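3. Find the sample size, n, that makes the following true: Your MOE = $Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$. Solving for n gives $n = Z_{\alpha/2}^2\,p(1-p)/\text{MOE}^2$, rounded up to the next whole number.
4. Check the sample size against your budget. If necessary, return to step 1.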
Example:
You are planning an evaluation of an alcohol awareness program at your college that will
take place six months after the program. How large a sample should you take if you want
the margin of error for 95% confidence to be about 0.1?
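Since no earlier estimate of p is given in this example, a reasonable sketch uses p = 0.5, as step 2 suggests. The function name `sample_size_for_proportion` is an assumption for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(moe, conf=0.95, p=0.5):
    """Smallest n with Z_{alpha/2} * sqrt(p(1-p)/n) <= moe.

    p = 0.5 gives the largest (most conservative) sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / moe ** 2)

print(sample_size_for_proportion(0.1))  # 97
```

Because p(1 − p) is largest at p = 0.5, this choice guarantees the margin of error is no larger than 0.1 whatever the true proportion turns out to be.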
Student’s t Distribution
                         Upper tail probability
d.f.    0.125    0.100    0.075    0.050    0.025    0.010    0.005
1       2.414    3.078    4.165    6.314   12.706   31.821   63.656
2       1.604    1.886    2.282    2.920    4.303    6.965    9.925
3       1.423    1.638    1.924    2.353    3.182    4.541    5.841
4       1.344    1.533    1.778    2.132    2.776    3.747    4.604
5       1.301    1.476    1.699    2.015    2.571    3.365    4.032
6       1.273    1.440    1.650    1.943    2.447    3.143    3.707
7       1.254    1.415    1.617    1.895    2.365    2.998    3.499
8       1.240    1.397    1.592    1.860    2.306    2.896    3.355
9       1.230    1.383    1.574    1.833    2.262    2.821    3.250
10      1.221    1.372    1.559    1.812    2.228    2.764    3.169
11      1.214    1.363    1.548    1.796    2.201    2.718    3.106
12      1.209    1.356    1.538    1.782    2.179    2.681    3.055
13      1.204    1.350    1.530    1.771    2.160    2.650    3.012
14      1.200    1.345    1.523    1.761    2.145    2.624    2.977
15      1.197    1.341    1.517    1.753    2.131    2.602    2.947
16      1.194    1.337    1.512    1.746    2.120    2.583    2.921
17      1.191    1.333    1.508    1.740    2.110    2.567    2.898
18      1.189    1.330    1.504    1.734    2.101    2.552    2.878
19      1.187    1.328    1.500    1.729    2.093    2.539    2.861
20      1.185    1.325    1.497    1.725    2.086    2.528    2.845
21      1.183    1.323    1.494    1.721    2.080    2.518    2.831
22      1.182    1.321    1.492    1.717    2.074    2.508    2.819
23      1.180    1.319    1.489    1.714    2.069    2.500    2.807
24      1.179    1.318    1.487    1.711    2.064    2.492    2.797
25      1.178    1.316    1.485    1.708    2.060    2.485    2.787
26      1.177    1.315    1.483    1.706    2.056    2.479    2.779
27      1.176    1.314    1.482    1.703    2.052    2.473    2.771
28      1.175    1.313    1.480    1.701    2.048    2.467    2.763
29      1.174    1.311    1.479    1.699    2.045    2.462    2.756
30      1.173    1.310    1.477    1.697    2.042    2.457    2.750
35      1.170    1.306    1.472    1.690    2.030    2.438    2.724
40      1.167    1.303    1.468    1.684    2.021    2.423    2.704
45      1.165    1.301    1.465    1.679    2.014    2.412    2.690
50      1.164    1.299    1.462    1.676    2.009    2.403    2.678
55      1.163    1.297    1.460    1.673    2.004    2.396    2.668
60      1.162    1.296    1.458    1.671    2.000    2.390    2.660
90      1.158    1.291    1.452    1.662    1.987    2.368    2.632
120     1.156    1.289    1.449    1.658    1.980    2.358    2.617
∞       1.150    1.282    1.440    1.645    1.960    2.326    2.580
1. The average weight of 40 randomly selected minivans was 4150 pounds.
   a. Find and interpret a 98% confidence interval for the mean weight of all
      minivans. The standard deviation is known to be 480 pounds.
   b. What could we do to reduce the width of this interval?
   c. What are the advantages/disadvantages of your answers in b?
2. The weight of grapefruit follows a normal distribution. A random sample of 12
   new hybrid grapefruit had a mean weight of 1.7 pounds with a standard deviation
   of 0.24 pounds. Find a 95% confidence interval for the mean weight of the
   population of new hybrid grapefruits.
3. A researcher wishes to estimate, within $25, the true average amount of postage
   that parents of college students spend each year. If she wishes to be 90% confident,
   how large a sample is necessary? The standard deviation is known to be $80.
4. A survey by Brides magazine found that 8 out of 10 brides are planning to take the
   surname of their new husband. How large a sample is needed to estimate the true
   proportion to within 3% with 98% confidence?
5. A researcher wishes to estimate the proportion of adult females under 5 feet tall.
   He wants to be 90% confident that his estimate is within 5% of the true proportion.
   What sample size should he use?
6. In a survey of 200 workers, 169 said they were interrupted three or more times an
   hour by phone messages, faxes, etc. Find and interpret a 90% confidence interval
   for the population proportion of workers who are interrupted three or more times
   an hour.
7. A sample of 17 states had these cigarette taxes (in cents): 112, 120, 98, 55, 71, 35,
   99, 124, 64, 150, 150, 55, 100, 132, 35, 70, 93. Find a 98% confidence interval for
   the mean cigarette tax in all 50 states. What assumption is necessary?