Chapter 7
Inferences Based on a Single Sample: Estimation with Confidence Intervals
7.1 Identifying and Estimating the Target Parameter
Definition 7.1 Target Parameter: The unknown population parameter (mean or proportion) that we are interested in estimating is called the target parameter.
Parameter   Phrases                              Data Type
μ           Mean (average)                       quantitative
p           Proportion (fraction, percentage)    qualitative
Definition 7.2 Point Estimator: A point estimator of a population parameter is a rule
that tells you how to use the sample data to calculate a single number that can be used as an
estimate of the population parameter. For example, the sample mean x̄ is a point estimator
for the population mean μ.
Definition 7.3 Interval Estimator: An interval estimator of a population parameter is a
rule that tells you how to calculate two numbers, an upper and a lower limit, based on the
sample data, forming an interval within which the parameter is expected to lie. This pair
of numbers is called an interval estimate or confidence interval. The larger number, located
at the upper end of the interval, is called the upper confidence limit (UCL), and the smaller
number, located at the lower end of the interval, is called the lower confidence limit (LCL).
Confidence width: The difference between the UCL and the LCL is called the confidence width.
That is,
Confidence width = UCL − LCL
7.2 Confidence Interval for a Population Mean: Normal (z) Statistic
Definition 7.4 Confidence Coefficient: The confidence coefficient is the probability that a
randomly selected confidence interval will enclose the parameter. It measures the proportion
of samples that produce a confidence interval containing the population parameter. A good
confidence interval is as narrow as possible and has a confidence coefficient near 1. The
narrower the interval, the more precisely we have located the parameter. The larger the
confidence coefficient, the more confidence we have that a particular interval encloses the
parameter.
Large Sample Confidence Interval for μ
The confidence interval for a population mean or proportion has the general form
Point Estimator ± Bound (Margin of Error)
= Point Estimator ± Table Value × SE(Estimator).
For the population mean μ this becomes
$$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \qquad (7.1)$$
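Using equation (7.1), the (1 − α)100% confidence interval (CI) for μ is obtained as
$$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\right] \qquad (7.2)$$
where $z_{\alpha/2}$ is the value from the normal table such that $P(z > z_{\alpha/2}) = \alpha/2$. For example, if α = 0.05,
then $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$ (from Normal Table IV, Appendix B).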
If σ is unknown, then replace it with s, the sample standard deviation.
Assumptions:
(1) The n observations in the sample were randomly selected from the population.
(2) The sample size is large (n ≥ 30).
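Equations (7.1) and (7.2) translate directly into a few lines of Python. This is a minimal sketch using only the standard library: `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$ in place of Table IV, and the function name `z_interval` is hypothetical. Following the rule stated above, pass s as `sigma` when σ is unknown and n ≥ 30.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Large-sample (1 - alpha)100% CI for mu, as in equation (7.2).

    Hypothetical helper: pass the sample standard deviation s as `sigma`
    when the population sigma is unknown and n >= 30.
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: P(Z > z) = alpha/2
    margin = z * sigma / sqrt(n)             # bound (margin of error)
    lcl, ucl = xbar - margin, xbar + margin
    return lcl, ucl, ucl - lcl               # confidence width = UCL - LCL


# Hypothetical numbers, for illustration only.
lcl, ucl, width = z_interval(xbar=50, sigma=10, n=100)
print(f"95% CI = [{lcl:.2f}, {ucl:.2f}], width = {width:.2f}")
```

With these illustrative inputs the interval is about [48.04, 51.96] with width 3.92, i.e. twice the margin of error.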
Some common confidence coefficients, together with the corresponding α, α/2, and $z_{\alpha/2}$ values,
are given in Table 7.1.
Table 7.1: Confidence coefficients and the corresponding z values
(1 − α)    α      α/2      z_{α/2}
0.80       0.20   0.10     1.28
0.90       0.10   0.05     1.645
0.95       0.05   0.025    1.96
0.98       0.02   0.01     2.33
0.99       0.01   0.005    2.58
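The z values in Table 7.1 are upper α/2 quantiles of the standard normal distribution, so they can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch; `z_alpha_over_2` is a hypothetical helper name. The printed values carry more decimals than the table, which rounds them.

```python
from statistics import NormalDist


def z_alpha_over_2(conf):
    """Return z_{alpha/2} such that P(Z > z) = alpha/2, where alpha = 1 - conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


# Confidence coefficients listed in Table 7.1
for conf in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.2f}  alpha = {1 - conf:.2f}  z = {z_alpha_over_2(conf):.4f}")
```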
What do we mean by a 95% CI?
A 95% confidence interval is constructed by a method such that 95% of all such intervals
contain the true value of the population parameter and 5% do not. For example, suppose you
wish to estimate the average height of the students in this class by interval estimation. If
you drew 100 random samples from this class and constructed a 95% confidence interval from
each, you would expect about 95 of these intervals to contain the mean height of all students
in the class (the true or population mean). Remember that in real life we observe only one
sample. That is why being "95% confident" does not guarantee that your particular interval
contains (captures) the population mean (true mean).
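The long-run interpretation above can be checked by simulation. This minimal sketch assumes a hypothetical population of heights that is normal with mean 170 cm and standard deviation 8 cm (values chosen only for illustration). It draws many samples of size n = 40, builds a large-sample 95% interval from each using s in place of σ, and reports the fraction of intervals that capture the true mean. The function name `coverage` is hypothetical, and 1000 repetitions are used instead of 100 so the fraction is more stable.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(mu=170.0, sigma=8.0, n=40, reps=1000, conf=0.95, seed=1):
    """Fraction of simulated large-sample CIs that contain the true mean mu.

    The population (normal, mean mu, sd sigma) is a hypothetical assumption.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = mean(sample), stdev(sample)  # sigma treated as unknown: use s
        margin = z * s / sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(f"fraction of intervals containing mu: {coverage():.3f}")
```

The fraction should come out close to 0.95, while any single interval either contains μ or does not.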
Example 7.3, page 281: The estimate of the mean number of unoccupied seats is x̄ = 11.6.
The margin of error is
$$1.645 \times \frac{\sigma}{\sqrt{n}} = 1.645 \times \frac{4.1}{\sqrt{225}} = 0.45$$
Conclusion: We are 90% confident that our estimate of 11.6 is within 0.45 of the true mean
number of unoccupied seats.
The 90% confidence limits for the true mean number of unoccupied seats are
$$\bar{x} \pm 1.645 \times \frac{\sigma}{\sqrt{n}} = 11.6 \pm 1.645 \times \frac{4.1}{\sqrt{225}} = 11.6 \pm 0.45 = [11.15,\ 12.05]$$
Conclusion: We are 90% confident that the true mean number of unoccupied seats (μ) lies
between 11.15 and 12.05.
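The arithmetic of Example 7.3 can be reproduced in a few lines; the inputs are exactly those used in the example (x̄ = 11.6, standard deviation 4.1, n = 225, and z = 1.645 for 90% confidence).

```python
from math import sqrt

xbar, sd, n, z = 11.6, 4.1, 225, 1.645   # inputs from Example 7.3 (90% level)
margin = z * sd / sqrt(n)                # 1.645 * 4.1 / 15
lcl, ucl = xbar - margin, xbar + margin
print(f"margin = {margin:.2f}, 90% CI = [{lcl:.2f}, {ucl:.2f}]")
# prints: margin = 0.45, 90% CI = [11.15, 12.05]
```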
Exercise 7.4, page 283.
Exercise 7.10, page 284.
7.3 Confidence Interval for a Population Mean: Student’s t-Statistic
Basic Idea: If x is normally distributed with mean μ and standard deviation σ, then
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution (z ∼ N(0, 1)), and the percentile points of this distribution
are presented in Table IV of Appendix B. However, if σ is unknown, we estimate it by the sample
standard deviation s, which gives the new statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$
Then t has a Student’s t distribution with (n − 1) degrees of freedom. See Figure 7.7, page
287, for both the normal (z) and the Student’s t (with 4 degrees of freedom) distributions.
The percentile points of the t distribution are presented in Table V (Appendix B, page 467) for
various degrees of freedom.
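Replacing σ by s is what gives the t distribution its heavier tails. A small simulation can compare the two statistics on the same samples. This minimal sketch assumes normal data with n = 5 (so 4 degrees of freedom, as in Figure 7.7) and estimates how often |z| and |t| exceed 1.96; `tail_fractions` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import mean, stdev


def tail_fractions(n=5, reps=20000, mu=0.0, sigma=1.0, seed=7):
    """Estimate P(|z| > 1.96) and P(|t| > 1.96) for normal samples of size n."""
    rng = random.Random(seed)
    z_out = t_out = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(x)
        z = (xbar - mu) / (sigma / sqrt(n))     # uses the true sigma
        t = (xbar - mu) / (stdev(x) / sqrt(n))  # sigma replaced by s
        if abs(z) > 1.96:
            z_out += 1
        if abs(t) > 1.96:
            t_out += 1
    return z_out / reps, t_out / reps


pz, pt = tail_fractions()
print(f"P(|z| > 1.96) ~ {pz:.3f},  P(|t| > 1.96) ~ {pt:.3f}")
```

For n = 5 the z fraction should be near 0.05, while the t fraction is noticeably larger (roughly 0.12 for 4 degrees of freedom). This is why $t_{\alpha/2,\,n-1}$ exceeds $z_{\alpha/2}$ for small samples.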
Assumptions:
1. The n observations were randomly selected from a population.
2. Data are from a normal population.
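The (1 − α)100% confidence interval (CI) for μ is $\bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$, that is,
$$\bar{x} - t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}} \qquad (7.3)$$
where $t_{\alpha/2,\,n-1}$ is the percentage point of the t distribution with (n − 1) degrees of freedom
such that $P(t_{n-1} \ge t_{\alpha/2,\,n-1}) = \alpha/2$.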
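The critical values in Table V can also be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule, inverts the resulting CDF by bisection to get $t_{\alpha/2,\,n-1}$, and then applies equation (7.3). The helper names (`t_pdf`, `t_cdf`, `t_crit`, `t_interval`) are hypothetical. Where SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same critical value directly.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    logc = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(logc - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the integral of the density over [0, x].

    The integral uses Simpson's rule with an even number of steps.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_crit(alpha, df):
    """t_{alpha/2, df}: the value with P(T >= t) = alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the root
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data, conf=0.95):
    """(1 - alpha)100% CI for mu from equation (7.3), assuming normal data."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    margin = t_crit(1 - conf, n - 1) * s / sqrt(n)
    return xbar - margin, xbar + margin


print(f"t_0.025,4 ~ {t_crit(0.05, 4):.3f}")
```

The bisection only needs the CDF for nonnegative arguments because the t distribution is symmetric about 0.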
Example 7.4, page 288.
Example 7.5, page 290.
Exercise 7.26, page 293.
Exercise 7.30, page 294.
7.4 Large Sample Confidence Interval for a Population Proportion
Assumptions:
(1) The n observations (measurements) in the sample were randomly selected from a binomial
population.
(2) The sample size is large, i.e., n ≥ 30.
Point Estimation of p:
Point Estimator:
$$\hat{p} = \frac{\text{total number of successes among } n \text{ trials}}{n} = \frac{x}{n}$$
Example 7.6, page 296.
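Sampling Distribution of p̂
1. The mean of the sampling distribution of p̂ is p. That is, $E(\hat{p}) = p$.
2. The standard deviation of p̂ is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
3. By the central limit theorem, p̂ is approximately normally distributed with mean p and
standard deviation $\sqrt{pq/n}$.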
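Properties 1 and 2 can be checked empirically. This minimal sketch assumes hypothetical values p = 0.3 and n = 100. It simulates many binomial samples, computes p̂ = x/n for each, and compares the mean and standard deviation of the simulated p̂ values with p and √(pq/n). `simulate_phat` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_phat(p, n, reps=5000, seed=3):
    """Return reps simulated values of p-hat = x / n, where x ~ Binomial(n, p)."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # successes in n trials
        phats.append(x / n)
    return phats


p, n = 0.3, 100  # hypothetical values for illustration
phats = simulate_phat(p, n)
print(f"mean of p-hat: {mean(phats):.4f}  (theory: p = {p})")
print(f"sd of p-hat:   {pstdev(phats):.4f}  (theory: sqrt(pq/n) = {sqrt(p * (1 - p) / n):.4f})")
```

A histogram of `phats` would also show the approximately normal shape described in property 3.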
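The 100(1 − α)% confidence interval for the population proportion p is $\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$, that is,
$$\hat{p} - z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}} \qquad (7.4)$$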
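Equation (7.4) can be written as a small standard-library function. This is a sketch; `proportion_interval` is a hypothetical name, x is the number of successes in n trials, and `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, conf=0.95):
    """Large-sample (1 - alpha)100% CI for p, as in equation (7.4)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


lo, hi = proportion_interval(60, 150)  # hypothetical counts
print(f"95% CI for p: [{lo:.3f}, {hi:.3f}]")
```

For these illustrative counts p̂ = 0.40 and the interval is about [0.322, 0.478].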
Example 7.7, page 298.
Extra Example 1: A statistician is interested in estimating the proportion of female students
at FIU. He randomly sampled 200 students from a total of 32,000 students and found that 88 are
female (n = 200 and N = 32,000).
(a) Define the population of interest in the survey.
(b) Construct a 95% confidence interval for p, the population proportion of female students
at FIU.
(c) An administrator states that 45% of the students at FIU are female. Does the confidence
interval you constructed in (b) support the administrator’s claim? Why or why not?
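As a numerical check on parts (b) and (c), the sketch below applies equation (7.4) at the 95% level to the counts given in the example (x = 88 female students out of n = 200).

```python
from math import sqrt
from statistics import NormalDist

x, n = 88, 200                          # counts from Extra Example 1
p_hat = x / n                           # 0.44
z = NormalDist().inv_cdf(0.975)         # z_{0.025}, about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lcl, ucl = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.2f}, 95% CI = [{lcl:.3f}, {ucl:.3f}]")
print("0.45 inside CI:", lcl <= 0.45 <= ucl)
```

The interval is roughly [0.371, 0.509]. Because 0.45 lies inside it, the sample data do not contradict the administrator's claim.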
Exercise 7.38, page 301.
Exercise 7.46, page 302.