Lecture 7. Point estimation and confidence intervals
Mathematical Statistics and Discrete Mathematics
November 23rd, 2015
1 / 18
Motivating example
• A producer of t-shirts for Chalmers wants us to provide them with the average
height of a male student.
• We do not have the resources to measure the height of every student.
• We choose 10 students at random, and get the following heights in cm
182, 171, 177, 174, 186, 183, 193, 172, 180, 181
• We compute the mean of these values and obtain
x̄ = (182 + 171 + . . . + 181)/10 = 179.9
• 179.9 is our estimate of the average height.
Note that our estimate need not be the true average height of all students.
Disclaimer: I did not gather this data and I cannot guarantee its validity.
2 / 18
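As a quick sanity check, the sample mean from the example can be computed directly. A minimal Python sketch (the variable names are my own, not from the lecture):

```python
# Heights (cm) of the 10 randomly chosen students from the example
heights = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]

# Point estimate of the average height: the sample mean
x_bar = sum(heights) / len(heights)
print(x_bar)  # 179.9
```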
General framework of parameter estimation
• We want to study a numerical property possessed by members of a certain large
population, and it is impossible/impractical to gather data about the whole
population. (Note that the population may be hypothetical, e.g. the population of
all cellphone batteries, both batteries already produced and batteries produced in the
future.)
• The distribution of the property in the whole population is described by a
random variable X, whose characteristics/parameters, like the mean or variance,
we want to estimate/approximate.
• We choose a (relatively) small random sample of n members of the population.
We do it in such a way that the selection of one member does not influence the
selection of any other member.
• Before the actual choice, the property of the ith member is described by a
random variable Xi which has the same distribution as X, and the variables
X1 , X2 , . . . , Xn
are independent. We call these variables a random sample too.
3 / 18
General framework of parameter estimation
• By measuring the properties of the chosen members, we obtain n observed
values (numbers)
x1 , x2 , . . . , xn
of the random variables X1 , X2 , . . . , Xn . We call the observed values a random
sample too.
• We use the observed values to estimate the parameters of interest.
Note that we used the term random sample to denote three different notions: the
randomly selected members of the population, the collection of random variables
associated with these members, and the collection of the observed values of the
variables. The interpretation of the term is usually clear from the context.
4 / 18
Random sample and statistics
A random sample of size n from the distribution of X is a collection of n independent
random variables
X1 , X2 , . . . , Xn ,
each with the same distribution as X.
A statistic is a random variable whose value can be computed from the values of
the random sample X1 , X2 , . . . , Xn .
Typical statistics of interest are:
• ∑_{i=1}^n Xi,
• X̄ = ∑_{i=1}^n Xi / n (sample mean),
• ∑_{i=1}^n Xi²,
• max_i Xi,
• min_i Xi.
Note that if the mean µ of X is unknown, then, e.g., X̄ − µ is not a statistic, since its value cannot be computed from the sample alone.
5 / 18
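The statistics listed above are easy to evaluate on observed values; a sketch in Python using the example heights (names are mine):

```python
xs = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]
n = len(xs)

total       = sum(xs)                 # sum of the Xi
sample_mean = total / n               # X-bar
sum_squares = sum(x * x for x in xs)  # sum of Xi squared
largest     = max(xs)                 # max_i Xi
smallest    = min(xs)                 # min_i Xi
print(sample_mean, largest, smallest)  # 179.9 193 171
```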
Estimators
Let θ be a parameter of the distribution of X. A statistic used to estimate θ is called
an estimator, and is denoted by θ̂. An estimate is the value of an estimator for a
particular collection of observed values of a random sample.
It is important to remember that an estimator is a random variable, and an estimate is
a number.
Examples of parameters are the mean value µ, the variance σ², the standard deviation σ, and
parameters such as λ for the Poisson and exponential distributions and p for the binomial distribution.
An estimator θ̂ is called unbiased if
E[θ̂] = θ.
The fact that an estimator is unbiased tells us that it fluctuates around the right
value.
6 / 18
Sample mean
Let X1 , X2 , . . . , Xn be a random sample of size n from a distribution with mean µ.
Then, the statistic
X̄ = ∑_{i=1}^n Xi / n
is called the sample mean. X̄ is an unbiased estimator for µ, and hence we can write
µ̂ = X̄.
Proof. We have to prove that E[X̄] = µ. Using linearity of expectation, we have
E[X̄] = E[∑_{i=1}^n Xi / n] = (1/n) E[∑_{i=1}^n Xi] = (1/n) ∑_{i=1}^n E[Xi] = (1/n) ∑_{i=1}^n µ = µ.
• The sample mean estimate (or the observed value of X̄) for the height of a
student from our first example is x̄ = 179.9 cm.
7 / 18
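Unbiasedness of X̄ can also be illustrated by simulation: averaging many independent sample means from a distribution with known µ lands close to µ. A seeded sketch with Python's random module (µ, σ, the sample size, and the trial count are illustrative choices of mine, not from the lecture):

```python
import random

random.seed(0)
mu, sigma, n, trials = 180.0, 7.0, 10, 20000

# Average of many independent sample means; should be close to mu
means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

print(sum(means) / trials)  # close to 180
```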
Sample mean
Even though we know that the sample mean is unbiased, this does not give us much
information about the accuracy of our estimates. It just tells us that we draw our
estimates from a distribution with the right mean value. A desirable property of an
estimator is that it has small variance for large sample sizes. Small variance implies
that our estimates will be precise with large probability.
Let X̄ be the sample mean based on a random sample of size n from a distribution
with mean µ and variance σ². Then,
Var[X̄] = σ²/n.
Proof. Since variance is additive for independent random variables, we can write
Var[X̄] = Var[∑_{i=1}^n Xi / n] = (1/n²) Var[∑_{i=1}^n Xi] = (1/n²) ∑_{i=1}^n Var[Xi] = (1/n²) ∑_{i=1}^n σ² = σ²/n.
The above implies that the larger the sample size, the larger the probability that our
estimates are close to the true mean µ.
8 / 18
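The σ²/n formula can be checked empirically: the spread of simulated sample means shrinks as the sample size grows. A sketch under illustrative parameters of my own choosing:

```python
import random

random.seed(1)
mu, sigma, trials = 0.0, 2.0, 20000

def empirical_var_of_mean(n):
    """Empirical variance of the sample mean over many simulated samples."""
    means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    m = sum(means) / trials
    return sum((x - m) ** 2 for x in means) / trials

for n in (4, 16, 64):
    print(n, empirical_var_of_mean(n))  # roughly sigma**2 / n
```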
Sample variance
Let X1 , X2 , . . . , Xn be a random sample of size n from a distribution of X. Then, the
statistic
S² = ∑_{i=1}^n (Xi − X̄)² / (n − 1)
is called the sample variance. S² is an unbiased estimator for σ². Furthermore, the
statistic S = √S² is called the sample standard deviation.
Even though S² is an unbiased estimator for σ², S is not an unbiased estimator for σ.
9 / 18
Sample variance
Proof. We have to prove that E[S²] = σ². Note that S² does not change if we add a
constant to the variable X. This implies that it is enough to consider the case
µ = E[X] = 0. Recall that E[XY] = E[X]E[Y] for independent random variables X
and Y, so E[Xj Xi] = E[Xj]E[Xi] = 0 for i ≠ j, and hence
E[Xj · X̄] = E[Xj · (1/n) ∑_{i=1}^n Xi] = (1/n) ∑_{i=1}^n E[Xj Xi] = (1/n) E[Xj²] = (1/n) E[X²] = σ²/n, and
E[X̄²] = E[(1/n) ∑_{i=1}^n Xi · X̄] = (1/n) ∑_{i=1}^n E[Xi X̄] = (1/n) · n · σ²/n = σ²/n.
For µ = 0, we have
E[S²] = E[(1/(n−1)) ∑_{i=1}^n (Xi − X̄)²] = (1/(n−1)) ∑_{i=1}^n E[Xi² − 2Xi X̄ + X̄²]
= (1/(n−1)) ∑_{i=1}^n (E[Xi²] − 2E[Xi X̄] + E[X̄²]) = (1/(n−1)) ∑_{i=1}^n (1 − 1/n) σ² = σ².
10 / 18
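The role of the n − 1 denominator shows up clearly in simulation: dividing the squared deviations by n systematically underestimates σ², while dividing by n − 1 does not. A seeded sketch (all parameters are illustrative choices of mine):

```python
import random

random.seed(2)
mu, sigma, n, trials = 0.0, 3.0, 5, 40000

s2_unbiased, s2_biased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased += ss / (n - 1)  # divide by n - 1 (sample variance)
    s2_biased   += ss / n        # divide by n (biased version)

print(s2_unbiased / trials)  # close to sigma**2 = 9
print(s2_biased / trials)    # close to (1 - 1/n) * sigma**2 = 7.2
```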
Sample variance
It is usually more convenient to use the computational formula for the sample
variance given by
S² = (∑_{i=1}^n Xi² − n X̄²) / (n − 1).
• The sample variance estimate (or the observed value of S²) for the height of a
student is
s² = (182² + 171² + . . . + 181² − 10 · (179.9)²) / 9 = 45.43.
• The sample standard deviation estimate (or the observed value of S) for the
height of a student is
s = √s² = 6.74 cm.
11 / 18
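Both forms of the sample variance give the same number on the height data; a quick Python check (variable names are mine):

```python
import math

heights = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]
n = len(heights)
x_bar = sum(heights) / n

# Definition: sum of squared deviations over n - 1
s2_def = sum((x - x_bar) ** 2 for x in heights) / (n - 1)

# Computational formula: (sum of squares - n * x_bar^2) / (n - 1)
s2_comp = (sum(x * x for x in heights) - n * x_bar ** 2) / (n - 1)

print(round(s2_def, 2), round(s2_comp, 2))  # 45.43 45.43
print(round(math.sqrt(s2_def), 2))          # 6.74
```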
Sample variance
Proof of the computational formula. We have
S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)²
= (1/(n−1)) ∑_{i=1}^n (Xi² − 2X̄Xi + X̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − 2X̄ ∑_{i=1}^n Xi + nX̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − 2nX̄² + nX̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − nX̄²).
12 / 18
Interval estimation
• Point estimates provide us with a single-number estimate of the parameter that we want
to know, but they do not contain any information about its accuracy.
• One way of trying to introduce accuracy quantification into our considerations is
to try to construct intervals that should contain the parameter of interest.
13 / 18
Confidence intervals
Let X1 , X2 , . . . , Xn be a random sample of size n from the distribution of X, and let L
and R be sample statistics such that L ≤ R. Let θ be a parameter of the distribution of
X. The random interval [L, R] is called a confidence interval for the parameter θ with
confidence level 1 − α (or 100(1 − α)%) if
P(L ≤ θ ≤ R) = 1 − α.
It is important to remember that θ is a parameter, and both L and R are random
variables.
14 / 18
Confidence intervals for normal variables with known σ 2
Let X be a normal random variable with known variance σ 2 , and let X1 , X2 , . . . , Xn be
a random sample of size n from the distribution of X. Let Z ∼ N (0, 1) be a standard
normal variable, and let zα/2 > 0 be such that
FZ (−zα/2 ) = α/2.
Then, the random interval [L, R], where
L = X̄ − zα/2 · σ/√n and R = X̄ + zα/2 · σ/√n,
is a confidence interval for the true mean µ with confidence level 1 − α, that is
P(L ≤ µ ≤ R) = 1 − α.
15 / 18
Confidence intervals for normal variables with known σ 2
Proof. If X ∼ N (µ, σ 2 ), then by the property of the normal distribution
X̄ ∼ N(µ, σ²/n),
and hence
(X̄ − µ)/(σ/√n) ∼ N(0, 1)
is a standard normal variable. By the definition of zα/2, and since the distribution of Z is
symmetric, we have
1 − α = P(−zα/2 ≤ Z ≤ zα/2) = P(−zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2)
= P(−zα/2 · σ/√n ≤ X̄ − µ ≤ zα/2 · σ/√n)
= P(X̄ − zα/2 · σ/√n ≤ µ ≤ X̄ + zα/2 · σ/√n).
Note that the assumption that we know the variance of the underlying distribution is
idealistic. However, the assumption that the distribution of the random sample is
(approximately) normal is very reasonable as we will see in the next lecture.
16 / 18
Confidence intervals for normal variables with known σ 2
Let us assume that the height of a student is distributed like a normal variable with
standard deviation 7 cm. Let us construct a 95% confidence interval for the mean
height using the data from the first example.
Using a table for the standard normal, we find that
z0.025 = 1.96
since
P(−1.96 ≤ Z ≤ 1.96) = 0.95.
Hence, using the formula from the slide before, the observed confidence interval with
confidence level 95% is [l, r], where
l = x̄ − z0.025 · σ/√10 = 179.9 − 1.96 · 7/3.162 = 175.56,
and
r = x̄ + z0.025 · σ/√10 = 179.9 + 1.96 · 7/3.162 = 184.24.
It is very important to understand that it is not correct to say that with 95% probability
the true average height of a student is in the interval [175.56, 184.24]. It does not
make sense to talk about probabilities here, since µ, 175.56, and 184.24 are fixed numbers and
not random variables.
17 / 18
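The interval above can be reproduced numerically; a short sketch (1.96 is the z-value read from the standard normal table on the slide):

```python
import math

x_bar, sigma, n = 179.9, 7.0, 10
z = 1.96  # z_{0.025} for a 95% confidence level

half_width = z * sigma / math.sqrt(n)
l, r = x_bar - half_width, x_bar + half_width
print(round(l, 2), round(r, 2))  # 175.56 184.24
```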
Interpretation of confidence intervals
• The confidence level tells us about the accuracy of the whole procedure of
computing confidence intervals and not a single observed interval.
• If we repeatedly construct 100(1 − α)% confidence intervals from
independently gathered data, the constructed intervals will contain the true
parameter in 100(1 − α)% of cases on average.
• This implies that in 100α% of cases on average, the constructed confidence interval
will not contain the true parameter.
18 / 18
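This frequency interpretation can be demonstrated directly: repeatedly drawing normal samples and building 95% intervals, roughly 95% of the intervals cover the true mean. A seeded sketch with illustrative parameters of my own:

```python
import math
import random

random.seed(3)
mu, sigma, n, trials = 180.0, 7.0, 10, 10000
z = 1.96  # z_{0.025} for a 95% confidence level

covered = 0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    half = z * sigma / math.sqrt(n)
    if x_bar - half <= mu <= x_bar + half:
        covered += 1  # this interval contains the true mean

print(covered / trials)  # close to 0.95
```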