```
Chapter 9
Estimating the Value of a Parameter (Answer Key)
Review on symbols:
𝑥̅ the mean of a sample
𝑝̂ the proportion of a sample
µ the true population mean
p the true population proportion
𝜎 the true population standard deviation
s the sample standard deviation
n the size of the sample (number of data collected)
Which of the above is a parameter?
Which of the above is a statistic?
Chapter 9.1 Estimating a Population Proportion
Objective A: Point Estimate
A point estimate is the value of a statistic that estimates the value of a parameter.
The best point estimate of the population proportion is the sample proportion, $\hat{p} = \frac{x}{n}$.
The best point estimate of the population mean is the sample mean, $\bar{x} = \frac{\sum x}{n}$.
Since p̂ varies from sample to sample, we use an interval based on p̂ to capture the unknown
population proportion with a level of confidence.
Objective B: Confidence Interval
A confidence interval for an unknown parameter consists of an interval of numbers based on
a point estimate.
The level of confidence represents the expected proportion of intervals that will contain the
parameter if a large number of different samples is obtained.
The level of confidence is denoted as $(1-\alpha) \cdot 100\%$. The level of confidence controls the width
of the interval.
Confidence interval estimates for a parameter are of the form:
Point estimate ± margin of error.
Confidence interval for p:
$\hat{p} \pm z_{\alpha/2} \cdot \sigma_{\hat{p}}$, where $\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$,
provided that $n\hat{p}(1-\hat{p}) \ge 10$. Used when p (the true proportion) is not known. Can be written as $\hat{p} \pm E$.
The value of $z_{\alpha/2}$ is called the critical value of the distribution.
The margin of error, E, in a $(1-\alpha) \cdot 100\%$ confidence interval for a population proportion is
given by $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The width of the interval is determined by the margin of error.
Note: More confidence leads to a wider interval.
Ex. 100% confidence vs. 50% confidence:
Example 1: Use StatCrunch to determine the critical value $z_{\alpha/2}$ that corresponds to the given level
of confidence. Stat-calc-normal-between-μ=0, σ=1
(a) 90%: P(-1.645 < x < 1.645) = 0.90, so z1 = -1.645 and z2 = 1.645
(b) 95%: P(-1.96 < x < 1.96) = 0.95, so z1 = -1.96 and z2 = 1.96
(c) 98%: P(-2.33 < x < 2.33) = 0.98, so z1 = -2.33 and z2 = 2.33
(d) 92%: P(-1.75 < x < 1.75) = 0.92, so z1 = -1.75 and z2 = 1.75
NOTE: for 95%, z is close to 2 standard deviations.
Example 2: Determine the margin of error (E) for p with x = 540 and n = 900 at a 99% level
of confidence. Assume x is the number of people, in a sample of 900, who admitted to texting
while driving in the last month.
$\hat{p} = \frac{x}{n} = \frac{540}{900} = 0.60$
StatCrunch gives z = 2.58 for a 99% level of confidence.
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.58\sqrt{\frac{0.60(0.40)}{900}} \approx 0.0421$
Note: the 99% CI is 0.60 ± 0.0421.
Example 3: A Rasmussen Reports national survey of 1000 adult Americans found that 18% dreaded
Valentine's Day. Construct a 95% confidence interval for the population proportion of adult
Americans who dread Valentine's Day. Explain what the interval means.
n = 1000
𝑝̂ = 0.18
95% CI:
0.18 ± 2SE (You can use 1.96 for z to be more exact. For 68% you can use z = 1; for 99.7% use z = 3.)
Standard error of 𝑝̂ (the standard deviation of the sampling distribution of 𝑝̂):
$SE = \sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.18(0.82)}{1000}} \approx 0.01215$
So 0.18 ± 2SE = 0.18 ± 2(0.01215) = 0.18 ± .024
CI: (0.18 – 0.024, 0.18 + 0.024) = (0.156, 0.204) or between 15.6% and 20.4%
We are 95% confident that the true proportion of adults who dread Valentine’s Day is between 15.6% and
20.4%.
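Summary of critical values for CIs
68%: z = 1
90%: z = 1.65 (more precisely 1.645)
95%: z = 2 (or 1.96)
98%: z = 2.33
99%: z = 2.58
99.7%: z = 3
95% confidence interval: if we took 100 samples and built a 95% interval from each one, about 95 of the
intervals would capture the true proportion and about 5 would not.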
Example 4: Construct a confidence interval of the population proportion at the given level
of confidence where 80 students out of 200 came late to class for a lecture on a randomly
selected day.
x = 80, n = 200, 96% confidence
Stat-proportion stats-one sample-with summary-80 successes-200 observations-confidence interval 0.96-compute
Lower limit, upper limit = (0.329, 0.471)
*We are 96% confident that the true population proportion of students who come late is between 32.9% and
47.1%.
Example 5: In a study of 1228 randomly selected medical malpractice lawsuits, it is found that 856
of them were later dropped or dismissed.
(a) What is the best point estimate of the proportion of medical malpractice lawsuits
that are dropped or dismissed?
$\hat{p} = \frac{856}{1228} \approx 0.697$
(b) Construct a 99% confidence interval (by hand) for the population proportion
of medical malpractice lawsuits that are dropped or dismissed.
99%: use z = 2.58
CI = 𝑝̂ ± 2.58SE, where $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.697(0.303)}{1228}} \approx 0.0131$
CI = 0.697 ± 2.58(0.0131) = 0.697 ± 0.0338
CI: (0.697 - 0.034, 0.697 + 0.034) = (0.663, 0.731)
(c) Interpret the interval.
We are 99% confident that the true proportion of medical malpractice lawsuits that are dropped/dismissed is
between 66.3% and 73.1%.
Objective C: Sample Size Needed for Estimating the Population Proportion p
The sample size required to obtain a $(1-\alpha) \cdot 100\%$ confidence interval for p with a
margin of error E is given by
$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$
rounded up to the next integer, where p̂ is a prior estimate of p.
If a prior estimate of p is unavailable, the sample size required is
$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$
rounded up to the next integer.
So you can use 𝑝̂ = 0.50 if a proportion is not given, since 0.50(1 - 0.50) = 0.25.
Example 1 : An urban economist wishes to estimate the proportion of Americans who own
their homes. What size sample should be obtained if he wishes the estimate to be
within 0.02 with 90% confidence if:
(a) he uses a 2010 estimate of 0.669 obtained from the U.S. Census Bureau?
He wants: 𝑝̂ ± 0.02 = 0.669 ± 0.02 with 90% confidence so use z = 1.65
0.02 = 1.65 SE
$0.02 = 1.65\sqrt{\frac{0.669(0.331)}{n}}$
$\frac{0.02}{1.65} = \sqrt{\frac{0.669(0.331)}{n}}$
$0.01212 \approx \sqrt{\frac{0.669(0.331)}{n}}$
$\left(\frac{0.02}{1.65}\right)^2 = \frac{0.669(0.331)}{n}$
$n = \frac{0.669(0.331)}{(0.02/1.65)^2} \approx 1507.17$, round up to 1508. He should sample 1508 Americans so that the estimate is within
a 0.02 margin of error at a 90% confidence level.
Note: For MML you might have to be more precise and use z = 1.645 for a 90% CI and z = 1.96 for a 95% CI.
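(b) he does not use any prior estimates?
If no estimate is given for the proportion, then we use 𝑝̂ = .50
$0.02 = 1.65\sqrt{\frac{0.50(0.50)}{n}}$
$\frac{0.02}{1.65} = \sqrt{\frac{0.50(0.50)}{n}}$
$0.0121 \approx \sqrt{\frac{0.50(0.50)}{n}}$
$(0.0121)^2 = \frac{0.50(0.50)}{n}$
$n = \frac{0.50(0.50)}{(0.0121)^2} \approx 1707.5$, round up to 1708. He should sample 1708 Americans. Note: if you do not round 0.02/1.65 to 0.0121, you get $n = 0.50(0.50)\left(\frac{1.65}{0.02}\right)^2 \approx 1701.6$, which rounds up to 1702.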
Example 2: In a Gallup poll conducted in October 2010, 64% of the people polled answered
"more strict" to the following question: "Do you feel that the laws covering the sale
of firearms should be made more strict, less strict, or kept as they are now?" Suppose the margin of
error in the poll was 3.5% and the estimate was made with 95% confidence. At
least how many people were surveyed?
Use the poll's estimate, 𝑝̂ = 0.64.
𝑝̂ ± 𝐸 = 0.64 ± 0.035 with 95% confidence so use z = 1.96 or 2.
0.035 = 1.96 SE
$0.035 = 1.96\sqrt{\frac{0.64(0.36)}{n}}$
$\frac{0.035}{1.96} = \sqrt{\frac{0.64(0.36)}{n}}$
$0.01786 \approx \sqrt{\frac{0.64(0.36)}{n}}$
$\left(\frac{0.035}{1.96}\right)^2 = \frac{0.64(0.36)}{n}$
$n = \frac{0.64(0.36)}{(0.035/1.96)^2} \approx 722.5$, round up to 723. So at least 723 people were surveyed.
Note: If z = 2 is used instead, then n = 753 people.
Example 3: A Gallup poll conducted in November 2010 found that 493 of 1050 adult Americans
believe it is the responsibility of the federal government to make sure all Americans
have healthcare coverage.
(a) Obtain a point estimate for the proportion of adult Americans who believe it is the
responsibility of the federal government to make sure all Americans have healthcare
coverage.
$\hat{p} = \frac{493}{1050} \approx 0.470$
(b) Verify the requirements for constructing a confidence interval for p are satisfied.
$n\hat{p}(1-\hat{p}) \ge 10$
1050(0.47)(0.53) ≈ 261.6 ≥ 10, yes
The condition is met, so the sampling distribution of 𝑝̂ is approximately normal.
(c) Construct a 95% confidence interval for the proportion of adult
Americans who believe it is the responsibility of the federal government to make
sure all Americans have healthcare coverage. Interpret the interval.
By hand: 95%: use z = 2
CI = 𝑝̂ ± 2SE, where $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.47(0.53)}{1050}} \approx 0.0154$
CI = 0.47 ± 2(0.0154) = 0.47 ± 0.0308 ≈ 0.47 ± 0.031
CI: (0.47 - 0.031, 0.47 + 0.031) = (0.439, 0.501)
We are 95% confident that the true proportion of adult Americans who believe it is the federal government's
responsibility to make sure all Americans have healthcare coverage is between 43.9% and 50.1%.
Using StatCrunch (which will use z = 1.96):
Stat-proportion stats-one sample-with summary-493 successes-1050 observations-confidence interval 0.95-compute
CI = (0.439, 0.500) or between 43.9% and 50.0%
(d) You wish to conduct your own study for the proportion of adult Americans who
believe it is the responsibility of the federal government to make sure all Americans
have healthcare coverage. What sample size would be needed for the estimate to be
within 3 percentage points with 90% confidence if you use the estimate obtained in
part (a)? (Use StatCrunch.)
By hand, this would be 0.47 ± 0.03: set $0.03 = 1.65\sqrt{\frac{0.47(0.53)}{n}}$ (for 90% confidence, z = 1.65) and solve for n.
StatCrunch: Stat-proportion stats-one sample-power/sample size-confidence interval 0.90-target proportion 0.47-width 0.06-compute
n = 749 (rounded up). The total width is 0.06 because the margin of error is 0.03 on each side of 𝑝̂.
(StatCrunch uses z ≈ 1.645; with z = 1.65 the hand calculation gives n ≈ 753.5, so 754.)
(e) You wish to conduct your own study for the proportion of adult Americans who
believe it is the responsibility of the federal government to make sure all Americans
have healthcare coverage. What sample size would be needed for the estimate to be
within 3 percentage points with 90% confidence if you do not have a prior estimate?
By hand, this would be 0.50 ± 0.03: with no prior estimate we use 𝑝̂ = 0.50, so set $0.03 = 1.65\sqrt{\frac{0.50(0.50)}{n}}$ (for 90% confidence, z = 1.65) and solve for n.
StatCrunch: Stat-proportion stats-one sample-power/sample size-confidence interval 0.90-target proportion 0.50-width 0.06-compute
n = 752 (rounded up)
Note: this is still close to our previous answer but larger.
Chapter 9.2 Estimating a Population Mean
Objective A: Point Estimate
The best point estimate of the population mean, µ, is the sample mean, 𝑥̅.
Objective B: Student's t-distribution
Properties of the t-distribution
1. The t-distribution is different for different degrees of freedom (df = n - 1).
2. The t-distribution has the same general symmetric bell shape as the standard normal
distribution, but its area in the tails is a little greater than the area in the tails of the standard
normal distribution due to the greater variability that is expected with small samples.
3. The t-distribution has a mean of t = 0 at the center of the distribution.
4. As the sample size n gets larger, the t-distribution gets closer to the standard normal
distribution.
Example 1: Use StatCrunch to determine the t-value.
*Note: There is more variability for smaller sample sizes.
(a) Using StatCrunch, find the t-value such that the area in the right tail is 0.05 with 19 degrees of freedom.
Stat-Calc-T-degrees of freedom 19- P(x≥___) = 0.05 –compute
t-value = 1.73
Note n = 20 (sample size)
You could also use P(x≤___) = 0.95
(b) Find the t-value such that the area left of the t-value is 0.02 with 6 degrees of freedom.
Stat-Calc-T-degrees of freedom 6- P(x≤___) = 0.02 –compute
t-value = -2.61
(c) Find the critical t-value that corresponds to 95% confidence. Assume 12 degrees of freedom.
Stat-Calc-T-between-degrees of freedom 12- P(___≤x≤___) = 0.95 –compute
(note sample size n = 13)
t-value = ± 2.18
(d) Find the critical t-value that corresponds to 95% confidence. Assume 50 degrees of freedom.
Stat-Calc-T-between-degrees of freedom 50- P(___≤x≤___) = 0.95 –compute
(note sample size n = 51)
t-value = ± 2.0086
(e) What happened to the width of the interval as the sample size increased from 13 to 51?
As the sample size increased, the critical t-value decreased (from 2.18 to 2.0086), so the interval became
narrower. The t-distribution became closer to the standard normal distribution.
In general, the population standard deviation is unknown when estimating a population mean based
on a sample mean. The t-distribution is used to offset the additional variability introduced by using
s in place of σ.
Objective C: Confidence Interval for a Population Mean
Constructing a $(1-\alpha) \cdot 100\%$ Confidence Interval for µ
Point estimate ± margin of error:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, where $E = t_{\alpha/2}\frac{s}{\sqrt{n}}$ and $t_{\alpha/2}$ has n - 1 degrees of freedom,
provided the data come from a population that is normally distributed, or the sample size is large.
Example 1: A simple random sample of size n < 30 has been obtained. From the normal probability
plot and boxplot, judge whether a t-interval should be constructed.
(a)
Yes. There are no outliers in the normal probability plot and the boxplot is roughly symmetric, so the
population appears to be normal and we can use the t-distribution even though the sample size is small.
(b)
No. There is a slight outlier in the normal probability plot and the boxplot is skewed to the left. Since the
sample size is less than 30, we cannot use a t-distribution.
Example 2: A simple random sample of size n is drawn from a population that is normally distributed to
investigate the age when working people start thinking about retirement.
The sample mean, 𝑥̅, is found to be 50, and the sample standard deviation, s, is found to be 8.
(a) Construct a 98% confidence interval for µ if the sample size, n, is 20. (That is, 20 people are
surveyed.)
By hand: for 98% with df = 19, use t = 2.54 from StatCrunch (stat-calc-t)
CI: 𝑥̅ ± 𝐸 = 𝑥̅ ± 2.54 𝑆𝐸, where $SE = \frac{s}{\sqrt{n}} = \frac{8}{\sqrt{20}}$
$CI = 50 \pm 2.54\left(\frac{8}{\sqrt{20}}\right) = 50 \pm 4.54$
CI = (50 - 4.54, 50 + 4.54) ≈ (45.46, 54.54) ≈ (45.5, 54.5)
One can be 98% confident that the true mean age when people start thinking about retirement is between
45.5 and 54.5 years of age.
StatCrunch:
Stat-t stats-one sample-with summary-mean 50, SD 8, n=20, CI .98-compute CI= (45.46, 54.54) ≈ (45.5, 54.5)
(b) Use StatCrunch to construct a 98% confidence interval for µ if the sample size, n, is 15.
How does decreasing the sample size affect the margin of error, E?
Stat-t stats-one sample-with summary-mean 50, SD 8, n=15, CI .98-compute CI= (44.6, 55.4) This interval is
wider. So when the sample size is decreased, the CI becomes wider.
With df = 14, the critical value is t ≈ 2.624, so $E = 2.624\left(\frac{8}{\sqrt{15}}\right) \approx 5.42$. The margin of error increased when the sample size decreased (compared to 4.54 in part a).
You can also find E from the CI: $E = \frac{UL - LL}{2} = \frac{55.4 - 44.6}{2} = 5.4$ (difference due to rounding).
(c) Construct a 95% confidence interval for µ if the sample size, n, is 20.
Compare the results to those obtained in part (a). How does decreasing the level of
confidence affect the margin of error, E?
Stat-t stats-one sample-with summary-mean 50, SD 8, n=20, CI .95-compute CI = (46.3, 53.7) This CI is
narrower than the one in part a: (45.5, 54.5). So as the level of confidence increases, the interval becomes
wider.
E for the 95% CI is $E = \frac{UL - LL}{2} = \frac{53.7 - 46.3}{2} = 3.7$. This is smaller than E in part a: 4.54.
So the margin of error decreases as the level of confidence decreases.
In part a: about 98 out of 100 such intervals will capture the true mean.
In part c: about 95 out of 100 such intervals will capture the true mean.
(d) Could we have computed the confidence intervals in parts (a) to (c) if the population
had not been normally distributed? Why? No, because the conditions to apply this formula
would not have been met. The sample size, n, was less than 30 in all three parts.
Example 3: Determine the point estimate of the population mean and margin of error for the
following confidence interval.
Lower bound: 5
Upper bound: 23
$E = \frac{UL - LL}{2} = \frac{23 - 5}{2} = 9$ (so 5 + 9 = 14 and 23 - 9 = 14). The point estimate of the population mean is 𝑥̅ = 14.
You can also compute 𝑥̅ by finding the midpoint of the interval:
$\bar{x} = \frac{23 + 5}{2} = 14$
Margin of error, E = 9
Point estimate of the population mean, 𝑥̅ = 14
Note CI = 14 ± 9
Example 4 : How much time do Americans spend eating or drinking? Suppose for a random
sample of 1001 Americans age 15 or older, the mean amount of time spent eating
or drinking per day is 1.22 hours with a standard deviation of 0.65 hour.
(a) A histogram of time spent eating and drinking each day is skewed right. Use this
result to explain why a large sample size is needed to construct a confidence
interval for the mean time spent eating and drinking each day.
Since the population is not normally distributed, you need a large sample size to achieve a sampling
distribution of 𝑥̅ that will be normally distributed. Thus, use n >30.
(b) Determine a 95% confidence interval for the mean amount of
time Americans age 15 or older spend eating and drinking each day. Interpret the interval.
By hand: First find the t-value using StatCrunch for 95% confidence: stat-calc-t-between, df 1000,
P(___≤x≤___) = 0.95 –compute
t = 1.96 (this is close to the z-value for 95% because the sample size is large, so we could also use 2SE)
CI: 𝑥̅ ± 𝐸 = 𝑥̅ ± 1.96 𝑆𝐸, where $SE = \frac{s}{\sqrt{n}} = \frac{0.65}{\sqrt{1001}}$
$CI = 1.22 \pm 1.96\left(\frac{0.65}{\sqrt{1001}}\right) = 1.22 \pm 0.040$
CI= (1.22 – 0.040, 1.22 + 0.040) ≈ (1.18, 1.26)
Using Statcrunch: Stat-t stats-one sample-with summary-mean 1.22, SD 0.65, n=1001, CI 0.95-compute
CI = (1.18, 1.26)
*One can be 95% confident that the true mean time spent eating and drinking each day is between 1.18 and
1.26 hours.
(c) Could the interval be used to estimate the mean amount of time a 9-year-old American
spends eating and drinking each day? Explain. No. The study was conducted using people
who were 15 years old or older. Therefore, the point estimate of the mean applies only to the population
age 15 or older.
Objective D: Determining the Sample Size n
The sample size required to estimate the population mean, µ, with a level of confidence
$(1-\alpha) \cdot 100\%$ within a specified margin of error,
E, is given by
$n = \left(\frac{z_{\alpha/2} \cdot s}{E}\right)^2$
where n is rounded up to the next whole number.
Note: *The t-distribution approaches the standard normal z-distribution as the sample size increases. z is
used in this formula instead of t to approximate n.
Example 1: A researcher wanted to determine the mean number of hours per week (Sunday through
Saturday) the typical person watches television. Results from the Sullivan Statistics Survey
indicate that s = 7.5 hours.
(a) How many people are needed to estimate the number of hours people watch television
per week within 2 hours with 95% confidence?
The standard deviation is s = 7.5 hours. Note the mean was not provided.
Want CI: 𝑥̅ ± 𝐸 = 𝑥̅ ± 2
$2 = t\frac{s}{\sqrt{n}}$
We can use z instead of t: z = 1.96 (or 2)
$2 = 1.96\frac{7.5}{\sqrt{n}}$
$2\sqrt{n} = 1.96(7.5)$
$\sqrt{n} = \frac{1.96(7.5)}{2}$
$n = \left[\frac{1.96(7.5)}{2}\right]^2 \approx 54.02$, round up to 55 people
55 people need to be surveyed so that the margin of error is within 2 hours at a 95% confidence
level.
(b) How many people are needed to estimate the number of hours people watch television
per week within 1 hour with 95% confidence?
Let’s do this one using Statcrunch: Stat-z stat-one sample- power/sample size-select ‘confidence interval
width’-confidence level 0.95, SD 7.5, width 2-compute
n = 217
(c) What effect does doubling the required accuracy have on the sample size?
If you want to be more accurate (within 1 hour instead of within 2 hours), increase the sample size. In this case
it increased by a ratio of $\frac{217}{55} \approx 4$. If you double the accuracy, the sample has to be about 4 times as large,
because n is proportional to $\frac{1}{E^2}$: halving E multiplies n by 4.
Chapter 9 Estimating a Population Standard Deviation (Supplementary Materials)
Finding CI for standard deviations
Objective A: Point Estimate
The best point estimate of the population variance, σ², is the sample variance, s².
Objective B: Chi-Square Distribution
Example 1: Use StatCrunch to find the critical values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for the given level of confidence
and sample size.
(a) 90% confidence, n = 23
Stat → Calculators → Chi-Square → DF 22 (n - 1) →
Between - P(___≤x ≤ ____) = 0.90 → compute
The critical values are 12.338 and 33.924. (These play the role of the z critical values.)
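Objective C: Confidence Interval for a Population Variance or Standard Deviation
$(1-\alpha) \cdot 100\%$ of the values of $\chi^2$ will lie between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$.
(Recall: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$)
Solving for σ² gives a $(1-\alpha) \cdot 100\%$ confidence interval for σ²:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
To find a $(1-\alpha) \cdot 100\%$ confidence interval for σ, take the square root of the lower bound and upper
bound.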
Example 1: A simple random sample of size n is drawn from a population that is known to be
normally distributed. The sample variance, s², is determined to be 19.8. (Thus the standard
deviation is √19.8 ≈ 4.45.)
(a) Use StatCrunch to construct a 95% confidence interval for σ² if the sample size, n, is 10.
Stat → Variance Stats → One Sample → with summary → Sample variance: 19.8, sample size:
10 → Confidence interval for σ²: 0.95 → compute and record the results.
95% confidence interval results:
σ² : Variance of population
Variance  Sample Var.  DF  L. Limit  U. Limit
σ²        19.8         9   9.367722  65.99048
*One can be 95% confident that the true variance is between 9.37 and 65.99.
(b) If the sample size is increased to n = 25, how does increasing the sample size affect the width
of the interval? It will decrease the width (becomes narrower).
(c) If the confidence level is increased to 99%, how does increasing the level of
confidence affect the width of the confidence interval?
The interval becomes wider if you want higher confidence.
Example 2: Travelers pay taxes for flying, car rentals, and hotels. The following data represent the total
travel tax for a 3-day business trip in eight randomly selected cities. It was verified that the
data are normally distributed. Use StatCrunch to construct a 90% confidence interval for the
standard deviation of the travel tax for a 3-day business trip. Interpret the interval.
First we need to compute the variance since the raw data have been provided.
Stat → Input given data → Summary Statistics → Columns Var1 → Variance → compute
Summary statistics:
Column  Variance
var1    151.87187
Now we will find the interval:
Stat → Variance Stats → One Sample → with summary →
Sample variance: 151.87187, sample size: 8 → Confidence interval for σ²: 0.90 →
compute
Alternate way: Stat → Input given data → Variance Stats →
One Sample → with data → Columns Var1 → Confidence interval for σ²: 0.90 →
compute and record the results.
90% confidence interval results:
σ² : Variance of population
Variance  Sample Var.  DF  L. Limit   U. Limit
σ²        151.87187    7   75.573504  490.50829
Manually compute the square root of each limit to change from variance to
standard deviation.
√75.573504= 8.69330224943 Lower Limit
√490.50829= 22.1474217461 Upper Limit
We are 90% confident that the standard deviation of the travel tax for a 3-day
business trip is between $8.69 and $22.15.
Summary of CIs for means and proportions:
Confidence Intervals
As sample size n increases (you get better results) ----- CI narrows
As standard error (SE or SD) decreases (you get better results) ---- CI narrows
As % of confidence level increases ---- CI widens
Sample size
As n increases (better results) --- SE/SD decreases
As n increases (better results) --- shape of the sampling distribution becomes more normal (symmetric)
As n increases (better results) --- sample mean 𝑥̅ approaches true population mean, µ (or sample proportion
𝑝̂ approaches true population proportion, p)
```
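The critical values in Example 1 of Chapter 9.1 come from the standard normal distribution: $z_{\alpha/2}$ leaves area α/2 in each tail. As a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`), the same values can be reproduced without StatCrunch; the function name `z_critical` is our own, not from the notes.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return z_(alpha/2) for a two-sided confidence level such as 0.95.

    alpha = 1 - confidence; the interval leaves alpha/2 in each tail,
    so z_(alpha/2) is the (1 - alpha/2) quantile of the standard normal.
    """
    alpha = 1 - confidence
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)

# Levels from Example 1 (a)-(d) in Chapter 9.1, plus 99% from Example 2.
for level in (0.90, 0.95, 0.98, 0.92, 0.99):
    z = z_critical(level)
    print(f"{level:.0%}: z1 = {-z:.3f}, z2 = {z:.3f}")
```

Rounded to three decimals the positive values are 1.645, 1.960, 2.326, 1.751 and 2.576; rounded to two, they match the 1.96, 2.33, 1.75 and 2.58 used in the notes.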
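The hand calculations in Examples 2, 3 and 5 of Chapter 9.1 all follow $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The Python sketch below reproduces that arithmetic under a few assumptions: the function name `proportion_ci` is hypothetical, the critical value z is passed in (use the rounded values from the notes' table), and the $n\hat{p}(1-\hat{p}) \ge 10$ check mirrors the requirement stated in Objective B.

```python
import math

def proportion_ci(x: int, n: int, z: float) -> tuple[float, float, float, float]:
    """Return (p_hat, margin_of_error, lower, upper) for a one-proportion z-interval."""
    p_hat = x / n
    # Requirement from the notes: n * p_hat * (1 - p_hat) >= 10.
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("n*p_hat*(1 - p_hat) < 10; normal approximation not justified")
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    e = z * se                               # margin of error E
    return p_hat, e, p_hat - e, p_hat + e

# Example 2 (texting while driving): 99% confidence, z = 2.58.
p_hat, e, lower, upper = proportion_ci(540, 900, 2.58)
print(f"p_hat = {p_hat:.2f}, E = {e:.4f}")

# Example 5 (malpractice lawsuits): 99% confidence, z = 2.58.
p_hat, e, lower, upper = proportion_ci(856, 1228, 2.58)
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

Example 3 can be checked the same way with `proportion_ci(180, 1000, 2)`, taking x = 180 as 18% of the 1000 adults surveyed.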
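The interpretation in the 9.1 summary (about 95 out of 100 intervals capture the true proportion) is a statement about repeated sampling. The simulation sketch below illustrates it under an assumed setup: a hypothetical population whose true proportion is p = 0.18 (borrowed from Example 3 purely for illustration), repeated samples of n = 1000, and a 95% interval built from each sample. The function name `coverage` and the seed are our own choices.

```python
import math
import random
from statistics import NormalDist

def coverage(p_true: float, n: int, confidence: float,
             trials: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated z-intervals for p that contain p_true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # One simulated sample of size n: count successes with probability p_true.
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        e = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - e <= p_true <= p_hat + e:
            hits += 1
    return hits / trials

print(f"Fraction of 95% intervals that captured p: {coverage(0.18, 1000, 0.95):.3f}")
```

The printed fraction should land near 0.95 but not exactly on it, because each run uses only a finite number of simulated samples.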
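The sample-size formula in Objective C of Chapter 9.1 can be sketched in Python as below. This is a minimal illustration that assumes the reader supplies the z value; `sample_size_proportion` is a hypothetical name. Because it never rounds the intermediate ratio z/E, it reproduces the unrounded answers in the notes (1508 and 723) and gives 1702 rather than 1708 for part (b) of Example 1.

```python
import math
from typing import Optional

def sample_size_proportion(margin: float, z: float, p_hat: Optional[float] = None) -> int:
    """n = p_hat(1 - p_hat)(z / E)^2, rounded up; uses p_hat = 0.5 with no prior estimate."""
    # 0.5(1 - 0.5) = 0.25 is the largest possible value of p(1 - p).
    p = 0.5 if p_hat is None else p_hat
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

print(sample_size_proportion(0.02, 1.65, 0.669))   # Example 1(a): home ownership
print(sample_size_proportion(0.02, 1.65))          # Example 1(b): no prior estimate
print(sample_size_proportion(0.035, 1.96, 0.64))   # Example 2: Gallup firearms poll
print(sample_size_proportion(0.03, 1.645, 0.47))   # Example 3(d), z = 1.645 as in StatCrunch
```

Passing z = 1.645 instead of 1.65 is what brings Example 3(d) in line with the StatCrunch answer of 749.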
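The t-interval in Example 2 of Chapter 9.2 is the same arithmetic with $s/\sqrt{n}$ as the standard error. Python's standard library has no t-distribution, so this sketch assumes the critical value $t_{\alpha/2}$ is looked up in StatCrunch (or a t-table) and passed in. `mean_ci` is a hypothetical helper name, and 2.624 (98%, df = 14) and 2.093 (95%, df = 19) are standard t-table values rather than numbers printed in the notes.

```python
import math

def mean_ci(x_bar: float, s: float, n: int, t_crit: float) -> tuple[float, float, float]:
    """Return (E, lower, upper) for x_bar ± t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)  # standard error of the sample mean
    e = t_crit * se        # margin of error E
    return e, x_bar - e, x_bar + e

# Retirement-age example: x_bar = 50, s = 8.
cases = [
    ("(a) 98%, n = 20, df = 19", 50, 8, 20, 2.54),
    ("(b) 98%, n = 15, df = 14", 50, 8, 15, 2.624),
    ("(c) 95%, n = 20, df = 19", 50, 8, 20, 2.093),
]
for label, x_bar, s, n, t_crit in cases:
    e, lower, upper = mean_ci(x_bar, s, n, t_crit)
    print(f"{label}: E = {e:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

Example 4 (time spent eating and drinking) fits the same call: `mean_ci(1.22, 0.65, 1001, 1.96)` gives bounds that round to 1.18 and 1.26.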
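Example 3 of Chapter 9.2 recovers the point estimate and margin of error from the two bounds of a symmetric interval. A short Python sketch of that step (the function name is hypothetical):

```python
def estimate_from_interval(lower: float, upper: float) -> tuple[float, float]:
    """Return (point_estimate, margin_of_error) for a symmetric interval (lower, upper)."""
    point = (lower + upper) / 2   # midpoint of the interval
    margin = (upper - lower) / 2  # half the width
    return point, margin

print(estimate_from_interval(5, 23))  # Example 3: prints (14.0, 9.0)
```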
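Objective D of Chapter 9.2 sizes a study for a mean with $n = (z_{\alpha/2} s / E)^2$, rounded up. A minimal Python sketch, assuming z is supplied by the reader and using the hypothetical name `sample_size_mean`:

```python
import math

def sample_size_mean(s: float, margin: float, z: float) -> int:
    """n = (z * s / E)^2, rounded up to the next whole number."""
    return math.ceil((z * s / margin) ** 2)

print(sample_size_mean(7.5, 2, 1.96))  # Example 1(a): within 2 hours
print(sample_size_mean(7.5, 1, 1.96))  # Example 1(b): within 1 hour
```

Halving the margin of error from 2 hours to 1 hour takes n from 55 to 217, close to the factor of 4 discussed in part (c).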
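Objective C of the standard-deviation supplement turns chi-square critical values into bounds for σ² and σ. The Python standard library has no chi-square quantile function, so this sketch assumes both critical values are supplied (from StatCrunch or a chi-square table). The values 2.7004 and 19.0228 (df = 9, 95%) and 2.1673 and 14.0671 (df = 7, 90%) are standard table values, not numbers printed in the notes; `variance_ci` and `sd_ci` are hypothetical names.

```python
import math

def variance_ci(s2: float, n: int, chi2_left: float, chi2_right: float) -> tuple[float, float]:
    """CI for sigma^2: ((n - 1)s^2 / chi2_right, (n - 1)s^2 / chi2_left).

    chi2_left is chi^2_(1 - alpha/2), the smaller critical value;
    chi2_right is chi^2_(alpha/2), the larger one.
    """
    df = n - 1
    return df * s2 / chi2_right, df * s2 / chi2_left

def sd_ci(s2: float, n: int, chi2_left: float, chi2_right: float) -> tuple[float, float]:
    """CI for sigma: square roots of the variance bounds."""
    lower, upper = variance_ci(s2, n, chi2_left, chi2_right)
    return math.sqrt(lower), math.sqrt(upper)

# Example 1(a): s^2 = 19.8, n = 10, 95% confidence (df = 9).
print(variance_ci(19.8, 10, 2.7004, 19.0228))
# Example 2: travel tax, s^2 = 151.87187, n = 8, 90% confidence (df = 7).
print(sd_ci(151.87187, 8, 2.1673, 14.0671))
```

Note the swap: the larger critical value produces the lower bound, because χ² sits in the denominator.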