---Unclassified---
Quantitative Methods
2013
Confidence Interval Estimation
[Figure: several confidence intervals, each centered on a sample mean \(\bar{x}\), drawn against the population mean µ]
Point Estimate for Population μ
A point estimate is a single-value estimate of a population parameter. The point estimate of the population mean, µ, is the sample mean, \(\bar{x}\).
Example:
A random sample of 32 textbook prices is taken from a local bookstore. Find a point estimate for the population mean, µ.
34   56   79   94
34   65   86   95
38   65   87   96
45   66   87   98
45   67   87   98
45   67   88  101
45   68   90  110
54   74   90  121
\(\bar{x} \approx 74.22\)
The point estimate for the population mean price of textbooks in the bookstore is $74.22.
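The point estimate above can be reproduced with a short Python sketch. The variable and function names (`prices`, `point_estimate`) are illustrative choices, not part of the original slides; the data are the 32 sampled prices from the example.

```python
from statistics import mean

# The 32 sampled textbook prices from the example (in dollars).
prices = [
    34, 56, 79, 94,
    34, 65, 86, 95,
    38, 65, 87, 96,
    45, 66, 87, 98,
    45, 67, 87, 98,
    45, 67, 88, 101,
    45, 68, 90, 110,
    54, 74, 90, 121,
]

def point_estimate(sample):
    """Return the sample mean, used as the point estimate of the population mean."""
    return mean(sample)

x_bar = point_estimate(prices)
print(f"n = {len(prices)}, point estimate = {x_bar:.2f}")  # n = 32, point estimate = 74.22
```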
The problem with a single value, or point estimate, is that it is presented as exact, yet the probability that it is precisely the right value is low.
Interval Estimate
In practice it is more meaningful to give an interval estimate and to attach a probability level to the interval, which quantifies the error in the measurement.
[Figure: number line with the point estimate 74.22 inside the interval estimate, which runs from the lower confidence limit to the upper confidence limit]
A point estimate is a single number. The interval estimate is a range within which the population parameter is likely to fall.
[Figure: the point estimate for textbooks, 74.22, marked inside an interval estimate]
How confident do we want to be that the interval estimate contains the population mean, μ?
The estimate for the mean price of the textbooks is between $66.10 and $82.34, and I am 95% confident of these figures.
[Figure: a sample with mean 74.22 drawn from a population whose mean, μ, is unknown]
I am 95% confident that μ is between 66.10 and 82.34.
An interval estimate can be computed by adding and subtracting a margin of error to the point estimate:
Point Estimate ± Margin of Error
[Figure: the point estimate 74.22 with one margin of error on each side, reaching the lower and upper confidence limits]
The general form of an interval estimate of a population mean is:
\[\bar{x} \pm \text{margin of error}\]
[Figure: width of the confidence interval, from the lower confidence limit to the upper confidence limit, with one margin of error on each side of the point estimate]
An interval gives a range of values:
– Takes into consideration variation in sample statistics from sample to sample.
– Based on observations from 1 sample.
– Gives information about closeness to unknown population parameters.
– Stated in terms of level of confidence.
Level of Confidence
The level of confidence c is the probability that the interval estimate contains the population parameter.
Since the sampling distribution shows how values of \(\bar{x}\) are distributed around the population mean μ, the sampling distribution of \(\bar{x}\) provides information about the possible differences between \(\bar{x}\) and μ.
[Figure: sampling distribution of \(\bar{x}\) centered at µ, with central area c]
The remaining area in the tails is 1 – c.
Interpretation:
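In the long run, 100c% of all the confidence intervals that can be constructed will contain the unknown true parameter, where c = 1 − α.
A specific interval either will contain or will not contain the true parameter.
[Figure: Confidence Intervals. Sampling distribution of \(\bar{x}\) with \(\mu_{\bar{x}} = \mu\), above intervals built around individual sample means \(\bar{x}_1\), \(\bar{x}_2\), and so on; 100c% of intervals constructed contain μ and 100(1 − c)% do not.]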
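The long-run interpretation can be checked with a small simulation. This is a minimal sketch, assuming a normal population with a known σ; the population values (`mu=50`, `sigma=10`), the sample size, the number of trials, and the seed are all hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=30, c=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)    # critical value for confidence level c
    margin = z_c * sigma / n ** 0.5            # margin of error with sigma known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains μ or does not; the 95% describes the procedure over many repetitions.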
Suppose the confidence level is 95%, also written (1 − α) = 0.95, so α = 0.05.
α is called the level of significance.
[Figure: normal curve with central area 1 − α and area α/2 in each tail]
Confidence Intervals for the Mean
(Large Samples)
Confidence Interval for μ (n ≥ 30 or σ Known)
Assumptions:
– n ≥ 30, or
– σ known with a normally distributed population
Confidence interval estimate:
\[\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}\]
where \(\bar{x}\) is the point estimate and \(\sigma/\sqrt{n}\) is the standard error.
When n ≥ 30, the sample standard deviation, s, can be used for σ.
Commonly used confidence levels are 90%, 95%, and 99%.
Confidence Level   Confidence Coefficient (c = 1 − α)   Z value
80%                0.80                                  1.28
90%                0.90                                  1.645
95%                0.95                                  1.96
98%                0.98                                  2.33
99%                0.99                                  2.58
99.8%              0.998                                 3.09
99.9%              0.999                                 3.29
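The Z values for a given confidence coefficient can be computed rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative choice. It relies on the fact that a central area c leaves (1 − c)/2 in each tail, so the critical value sits at cumulative probability (1 + c)/2.

```python
from statistics import NormalDist

def z_critical(c):
    """Two-sided critical value z_c for confidence coefficient c (0 < c < 1)."""
    # The central area is c, so the area to the left of z_c is (1 + c) / 2.
    return NormalDist().inv_cdf((1 + c) / 2)

for c in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}  z_c = {z_critical(c):.3f}")
```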
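Sampling Distribution of the Mean
Intervals extend from \(\bar{x} - z_c \frac{\sigma}{\sqrt{n}}\) to \(\bar{x} + z_c \frac{\sigma}{\sqrt{n}}\).
[Figure: Confidence Intervals. Sampling distribution of \(\bar{x}\) with \(\mu_{\bar{x}} = \mu\), central area c = 1 − α and area α/2 in each tail, above intervals built around sample means \(\bar{x}_1\), \(\bar{x}_2\), and so on.]
100c% of intervals constructed contain μ; 100(1 − c)% do not.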
Finding a Confidence Interval for a Population Mean (n ≥ 30 or σ known with a normally distributed population)
In Words / In Symbols
1. Find the sample statistic: \(\bar{x} = \frac{\sum x}{n}\)
2. Specify σ, if known. Otherwise, if n ≥ 30, find the sample standard deviation s and use it as an estimate for σ: \(s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}\)
3. Find the critical value \(z_c\) that corresponds to the given level of confidence. Use Excel: =NORMSINV((1+c)/2)
4. Find the left and right endpoints and form the confidence interval: \(\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}\)
Example:
A random sample of 32 textbook prices is taken from a local college bookstore. The sample mean is \(\bar{x}\) = 74.22 and the sample standard deviation is s = 23.44.
Construct a 95% confidence interval for the mean price of all textbooks in the bookstore. Since n ≥ 30, s can be substituted for σ.
Use a 95% confidence level, so \(z_c\) = 1.96:
\[\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}, \qquad \bar{x} = 74.22, \quad s = 23.44, \quad z_c = 1.96\]
Margin of error:
\[z_c \frac{s}{\sqrt{n}} = 1.96 \cdot \frac{23.44}{\sqrt{32}} \approx 8.12\]
Left endpoint = 74.22 – 8.12 = 66.10
Right endpoint = 74.22 + 8.12 = 82.34
With 95% confidence we can say that the mean price of all textbooks in the bookstore is between $66.10 and $82.34.
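The textbook interval can be reproduced in a few lines of Python. This is a sketch under the slide's assumption that s may stand in for σ because n ≥ 30; the function name `z_confidence_interval` is illustrative, and the critical value comes from the standard library rather than from Excel.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, c):
    """Return (lower, upper) for x_bar +/- z_c * sd / sqrt(n)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # about 1.96 when c = 0.95
    margin = z_c * sd / sqrt(n)               # margin of error
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=74.22, sd=23.44, n=32, c=0.95)
print(f"{low:.2f} to {high:.2f}")  # 66.10 to 82.34
```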
Example:
A random sample of 25 students had a grade point average with a mean of 2.86. Past studies have shown that the standard deviation is 0.15 and the population is normally distributed. Construct a 90% confidence interval for the population mean grade point average.
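n = 25, \(\bar{x}\) = 2.86, σ = 0.15, \(z_c\) = 1.645
\[\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}} = 2.86 \pm 1.645 \cdot \frac{0.15}{\sqrt{25}} \approx 2.86 \pm 0.05\]
2.81 < μ < 2.91
With 90% confidence we can say that the mean grade point average for all students in the population is between 2.81 and 2.91.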
Confidence Intervals for the Mean (Small Samples)
The t-Distribution
When the sample size is less than 30, σ is unknown, and the random variable x is approximately normally distributed, use a t‐distribution:
\[\bar{x} \pm t_c \frac{s}{\sqrt{n}}\]
Properties of the t‐distribution
1. The t‐distribution is bell shaped and symmetric about the mean.
2. The t‐distribution is a family of curves, each determined by a parameter called the degrees of freedom. The degrees of freedom are the number of free choices left after a sample statistic such as \(\bar{x}\) is calculated. When you use a t‐distribution to estimate a population mean, the degrees of freedom are equal to one less than the sample size:
d.f. = n – 1
Degrees of Freedom (df)
Idea: the number of observations that are free to vary after the sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0, then X3 must be 9
(i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
2 values can be any numbers, but the third is not free to vary for a given mean.
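The same idea can be written as a tiny Python sketch: once the mean and n – 1 of the values are fixed, the last value is forced. The function name `last_value` is a hypothetical helper introduced only for this illustration.

```python
def last_value(known_values, n, sample_mean):
    """Given n - 1 values and the mean of all n values, return the forced final value."""
    if len(known_values) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    # The n values must sum to n * mean, so the last one is whatever is left over.
    return n * sample_mean - sum(known_values)

x3 = last_value([7, 8], n=3, sample_mean=8.0)
print(x3)  # 9.0
```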
t ‐ Distribution
As the degrees of freedom increase, the t-distribution approaches the normal distribution. After 30 d.f., the t-distribution is very close to the standard normal z-distribution.
t‐distributions are bell‐shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: density curves for the standard normal (t with df = ∞), t (df = 13), and t (df = 5), plotted against t]
=TINV(probability, df) returns the critical value of t for the given degrees of freedom, where probability is the total area in the two tails of the distribution, α.
Note the difference in the way you enter the variables for the t and the normal distribution. For the t‐distribution you enter the area in the tails, whereas for the normal distribution you enter the area under the curve from the extreme left to a value on the x‐axis.
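For readers without Excel, the two-tailed lookup can be approximated in plain Python. This is a rough numerical sketch, not how Excel computes TINV: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names (`t_pdf`, `t_cdf`, `tinv`) and the step size are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, step=0.005):
    """P(T <= x) for x >= 0: 0.5 for the left half plus Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    n = 2 * max(1, math.ceil(x / (2 * step)))   # even number of subintervals
    h = x / n
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def tinv(alpha, df):
    """Two-tailed critical value: t such that the total tail area is alpha."""
    target = 1 - alpha / 2          # cumulative probability at the critical value
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):             # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{tinv(0.05, 4):.3f}")
```

Note that `tinv` takes the total tail area α, matching the TINV convention described above, whereas the normal critical value is usually requested by cumulative area.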
Critical Values of t
Example:
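Find the critical value \(t_c\) for 95% confidence when the sample size is 5.
d.f. = n – 1 = 4 and c = 0.95, so α = 0.05:
=TINV(0.05,4)
\(t_c\) = 2.776
95% of the area under the t‐distribution curve with 4 degrees of freedom lies between t = ±2.776.
[Figure: t‐distribution curve with central area c = 0.95 between \(-t_c = -2.776\) and \(t_c = 2.776\)]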
Constructing a Confidence Interval for the Mean: t‐Distribution
In Words / In Symbols
1. Identify the sample statistics: \(\bar{x} = \frac{\sum x}{n}\), \(s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}\)
2. Identify the degrees of freedom, the level of confidence c, and the critical value \(t_c\): d.f. = n – 1, =TINV(alpha, df)
3. Find the left and right endpoints and form the confidence interval: \(\bar{x} \pm t_c \frac{s}{\sqrt{n}}\)
Example:
A random sample of n = 25 taken from a normal population has \(\bar{x}\) = 50 and s = 8. Form a 95% confidence interval for μ.
d.f. = n – 1 = 24, so \(t_{0.95,\,n-1} = t_{0.95,\,24} = 2.0639\)
=TINV(0.05,24)
The confidence interval is
\[\bar{x} \pm t_{c,\,n-1} \frac{s}{\sqrt{n}} = 50 \pm (2.0639) \frac{8}{\sqrt{25}}\]
46.698 ≤ µ ≤ 53.302
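The arithmetic of this example can be checked with a short Python sketch. The critical value 2.0639 is taken from the slide (the TINV result for 24 degrees of freedom) and passed in as a plain number; the function name `t_confidence_interval` is illustrative.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_c):
    """Return (lower, upper) for x_bar +/- t_c * s / sqrt(n)."""
    margin = t_c * s / sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# t_c = 2.0639 is the critical value quoted in the example for d.f. = 24.
low, high = t_confidence_interval(x_bar=50, s=8, n=25, t_c=2.0639)
print(f"{low:.3f} <= mu <= {high:.3f}")  # 46.698 <= mu <= 53.302
```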
Example:
In a random sample of 20 customers at a local fast food restaurant, the mean waiting time to order is 95 seconds, and the standard deviation is 21 seconds. Assume the wait times are normally distributed and construct a 90% confidence interval for the mean wait time of all customers.
n = 20, \(\bar{x}\) = 95, s = 21, d.f. = 19
\(t_c\) = 1.729, from =TINV(0.1,19)
\[t_c \frac{s}{\sqrt{n}} = 1.729 \cdot \frac{21}{\sqrt{20}} \approx 8.1\]
\[\bar{x} \pm t_{c,\,n-1} \frac{s}{\sqrt{n}} = 95 \pm 8.1\]
86.9 < μ < 103.1
We are 90% confident that the mean wait time for all customers is between 86.9 and 103.1 seconds.
Normal or t‐Distribution?
Is n ≥ 30?
– Yes: Use the normal distribution with \(\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}\). If σ is unknown, use s instead.
– No: Is the population normally, or approximately normally, distributed?
  – No: You cannot use the normal distribution or the t-distribution.
  – Yes: Is σ known?
    – Yes: Use the normal distribution with \(\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}\).
    – No: Use the t-distribution with \(\bar{x} \pm t_c \frac{s}{\sqrt{n}}\) and n – 1 degrees of freedom.
Normal or t‐Distribution?
Example:
Determine whether to use the normal distribution, the t‐distribution, or neither.
a.) n = 50, the distribution is skewed, s = 2.5
The normal distribution would be used because the sample size is 50.
b.) n = 25, the distribution is skewed, s = 52.9
Neither distribution would be used because n < 30 and the distribution is skewed.
c.) n = 25, the distribution is normal, σ = 4.12
The normal distribution would be used because although n < 30, the population standard deviation is known.
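The decision rule used in these three cases can be sketched as a small Python function. The function name, argument names, and return labels are hypothetical; the branching follows the slide's questions in order (sample size, then population shape, then whether σ is known).

```python
def choose_distribution(n, population_normal, sigma_known):
    """Return "normal", "t", or "neither" following the slide's decision questions."""
    if n >= 30:
        # Large sample: use the normal distribution (s may stand in for sigma).
        return "normal"
    if not population_normal:
        # Small sample from a non-normal population: neither method applies.
        return "neither"
    # Small sample from a (roughly) normal population.
    return "normal" if sigma_known else "t"

print(choose_distribution(50, population_normal=False, sigma_known=False))  # normal
print(choose_distribution(25, population_normal=False, sigma_known=False))  # neither
print(choose_distribution(25, population_normal=True, sigma_known=True))    # normal
```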
Question:
The 95% confidence interval of the sample mean of employee age for a major corporation is 19 years to 44 years based on a z‐statistic. The population of employees is more than 5,000 and the sample size of this test is 100. Assuming the population is normally distributed, the standard error of mean employee age is closest to:
A. 1.96,
B. 11.58,
C. 6.38,
D. 12.50.
C. At the 95% confidence level, with sample size n = 100 and mean 31.5 years (the midpoint of 19 and 44), the appropriate test statistic is z = 1.96.
Thus, the confidence interval is
\[31.5 \pm 1.96\, s_{\bar{x}}\]
where \(s_{\bar{x}}\) is the standard error of the sample mean.
If we take the upper bound, we know that
\[31.5 + 1.96\, s_{\bar{x}} = 44\]
\[s_{\bar{x}} = \frac{44 - 31.5}{1.96} \approx 6.38\]
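The same back-calculation works for any symmetric z interval: the half-width divided by the critical value gives the standard error. A minimal sketch, with an illustrative function name:

```python
def standard_error_from_interval(lower, upper, z_c):
    """Recover the standard error from the endpoints of a symmetric z interval."""
    half_width = (upper - lower) / 2   # the margin of error
    return half_width / z_c

se = standard_error_from_interval(19, 44, 1.96)
print(round(se, 2))  # 6.38
```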
Question:
An agricultural inspector wants to know the level of vitamin C in a load of kiwi fruits.
The inspector took a random sample of 25 kiwis from the ship’s hold and measured the vitamin C content (in milligrams).
Milligrams of vitamin C per kiwi sampled:
109 88 91 136 93
101 89 97 115 92
114 106 94 109 110
97 89 117 105 92
83 79 107 100 93
Estimate the average level of vitamin C in the kiwi fruits and give a 95% confidence interval for this estimate.
Lower confidence limit: 95.01
Upper confidence limit: 105.47
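These limits can be reproduced with a short Python sketch. It assumes a t interval with 24 degrees of freedom, which in turn assumes the vitamin C content is approximately normally distributed (n = 25 is below 30 and σ is unknown). The critical value 2.0639 is the TINV(0.05, 24) figure quoted earlier in the slides, entered here as a constant.

```python
from math import sqrt
from statistics import mean, stdev

# Vitamin C content (mg) of the 25 sampled kiwis.
vitamin_c = [
    109, 88, 91, 136, 93,
    101, 89, 97, 115, 92,
    114, 106, 94, 109, 110,
    97, 89, 117, 105, 92,
    83, 79, 107, 100, 93,
]

T_C = 2.0639  # assumed critical value for 95% confidence, d.f. = 24

n = len(vitamin_c)
x_bar = mean(vitamin_c)          # point estimate of the mean
s = stdev(vitamin_c)             # sample standard deviation (n - 1 denominator)
margin = T_C * s / sqrt(n)       # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"point estimate = {x_bar:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```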
APPENDIX
Student’s t Table
d.f.   1-tail:   0.25    0.1     0.05    0.025    0.01     0.005    0.001
       2-tails:  0.5     0.2     0.1     0.05     0.02     0.01     0.002
1                1.000   3.078   6.314   12.706   31.821   63.657   318.309
2                0.816   1.886   2.920   4.303    6.965    9.925    22.327
3                0.765   1.638   2.353   3.182    4.541    5.841    10.215
4                0.741   1.533   2.132   2.776    3.747    4.604    7.173
The body of the table contains t values, not probabilities.
Example: let n = 3, so df = n – 1 = 2, and let α = 0.10, so α/2 = 0.05 in each tail and c = 1 – α = 0.9. The table gives t = 2.920.
[Figure: t‐distribution curve centered at 0 with area α/2 = 0.05 in the tail beyond t = 2.920]