Inferential Statistics

Statistical inference is the branch of statistics concerned with
drawing conclusions and/or making decisions concerning a
population based only on sample data.

[Diagram: POPULATION → SAMPLE]

Statistical inference: example 1

• A clothing store chain regularly buys from a supplier large
  quantities of a certain piece of clothing. Each item can be
  classified either as good quality or top quality. The agreements
  require that the delivered goods comply with predetermined
  quality standards. In particular, the proportion of good quality
  items must not exceed 25% of the total.

• From a consignment 40 items are extracted: 29 of these are of
  top quality whereas the remaining 11 are of good quality.

INFERENTIAL PROBLEMS:

1. provide an estimate of π and quantify the uncertainty
   associated with such an estimate;
2. provide an interval of "reasonable" values for π;
3. decide whether the delivered goods should be returned to the
   supplier.
Statistical inference: example 1

Formalization of the problem:

• POPULATION: all the pieces of clothing in the consignment;
• VARIABLE OF INTEREST: good/top quality of the item (binary variable);
• PARAMETER OF INTEREST: proportion of good quality items π;
• SAMPLE: 40 items extracted from the consignment.

The value of the parameter π is unknown, but it affects the
sampling values. Sampling evidence provides information on the
parameter value.

Statistical inference: example 2

• A machine in an industrial plant of a bottling company fills
  one-liter bottles. When the machine is operating normally,
  the quantity of liquid inserted in a bottle has mean µ = 1
  liter and standard deviation σ = 0.01 liters.

• Every working day 10 bottles are checked and, today, the
  average amount of liquid in the bottles is x̄ = 1.0065 with
  s = 0.0095.

INFERENTIAL PROBLEMS:

1. provide an estimate of µ and quantify the uncertainty
   associated with such an estimate;
2. provide an interval of "reasonable" values for µ;
3. decide whether the machine should be stopped and revised.
Statistical inference: example 2

Formalization of the problem:

• POPULATION: "all" the bottles filled by the machine;
• VARIABLE OF INTEREST: amount of liquid in the bottles (continuous variable);
• PARAMETERS OF INTEREST: mean µ and standard deviation σ of the
  amount of liquid in the bottles;
• SAMPLE: 10 bottles.

The values of the parameters µ and σ are unknown, but they
affect the sampling values. Sampling evidence provides
information on the parameter values.

The sample

• Census survey: attempt to gather information from each and
  every unit of the population of interest;
• sample survey: gathers information from only a subset of the
  units of the population of interest.

Why use a sample?

1. Less time consuming than a census;
2. less costly to administer than a census;
3. measuring the variable of interest may involve the destruction
   of the population unit;
4. a population may be infinite.
Probability sampling

A probability sampling scheme is one in which every unit in the
population has a chance (greater than zero) of being selected in
the sample, and this probability can be accurately determined.

SIMPLE RANDOM SAMPLING: every unit has an equal probability
of being selected, and the selection of a unit does not change the
probability of selecting any other unit. For instance:

• extraction with replacement;
• extraction without replacement.

For large populations (compared to the sample size) the difference
between these two sampling techniques is negligible. In the
following we will always assume that samples are extracted with
replacement from the population of interest.

Probabilistic description of a population

• Units of the population;
• variable X measured on the population units;
• sometimes the distribution of X is known, for instance
  (i) X ∼ N(µ, σ²);
  (ii) X ∼ Bernoulli(π).
Probabilistic description of a sample

• The observed sampling values are x1, x2, . . . , xn;
• BEFORE the sample is observed the sampling values are
  unknown and the sample can be written as a sequence of
  random variables X1, X2, . . . , Xn;
• for simple random samples (with replacement):
  1. X1, X2, . . . , Xn are i.i.d.;
  2. the distribution of Xi is the same as that of X
     for every i = 1, . . . , n.

Sampling distribution of a statistic (1)

• Suppose that the sample is used to compute a given statistic,
  for instance
  (i) the sample mean X̄;
  (ii) the sample variance S²;
  (iii) the proportion P of units with a given feature;
• generically, we consider an arbitrary statistic
  T = g(X1, . . . , Xn)
  where g(·) is a given function.
Sampling distribution of a statistic (2)

• Once the sample is observed, the observed value of the
  statistic is t = g(x1, . . . , xn);
• suppose that we draw all possible samples of size n from the
  given population and that we compute the statistic T for
  each sample;
• the sampling distribution of T is the distribution of the
  population of the values t of all possible samples.

Normal population

• Suppose that X ∼ N(µ, σ²);
• in this case the statistics of interest are:
  (i) the sample mean X̄ = (1/n) Σ_{i=1}^n Xi;
  (ii) the sample variance S² = 1/(n−1) Σ_{i=1}^n (Xi − X̄)²;
• the corresponding observed values are
  x̄ = (1/n) Σ_{i=1}^n xi and s² = 1/(n−1) Σ_{i=1}^n (xi − x̄)²,
  respectively.
The sample mean

The sample mean is a linear combination of the variables forming
the sample and this property can be exploited in the computation
of

• the expected value of X̄, that is E(X̄);
• the variance of X̄, that is Var(X̄);
• the probability distribution of X̄.

Expected value of the sample mean

For a simple random sample X1, . . . , Xn, the expected value of
X̄ is

E(X̄) = E[(X1 + X2 + · · · + Xn)/n]
      = (1/n) E(X1 + X2 + · · · + Xn)
      = (1/n) [E(X1) + E(X2) + · · · + E(Xn)]
      = (1/n) (n × µ)
      = µ
Variance of the sample mean

For a simple random sample X1, . . . , Xn, the variance of X̄ is

Var(X̄) = Var[(X1 + X2 + · · · + Xn)/n]
        = (1/n²) Var(X1 + X2 + · · · + Xn)
        = (1/n²) [Var(X1) + Var(X2) + · · · + Var(Xn)]
        = (1/n²) (n × σ²)
        = σ²/n

Sampling distribution of the mean

For a simple random sample X1, X2, . . . , Xn, the sample mean X̄
has

• expected value µ and variance σ²/n;
• if the distribution of X is normal, then X̄ ∼ N(µ, σ²/n);
• more generally, the central limit theorem can be applied to
  state that the distribution of X̄ is APPROXIMATELY normal.
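These two facts can be checked empirically by simulation. Below is a minimal Python sketch (NumPy assumed available; the values of µ, σ, n and the number of replications are illustrative choices loosely matching the bottling example, not from the slides):

```python
import numpy as np

# Monte Carlo check that E(X-bar) = mu and Var(X-bar) = sigma^2 / n.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 0.01, 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))  # `reps` samples of size n
xbar = samples.mean(axis=1)                      # one sample mean per sample

print(xbar.mean())       # close to mu = 1.0
print(xbar.var())        # close to sigma^2 / n
print(sigma**2 / n)      # theoretical value: 1e-05
```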
The sample variance

• The sample variance is defined as
  S² = 1/(n−1) Σ_{i=1}^n (Xi − X̄)²
• if Xi ∼ N(µ, σ²) then
  (n − 1)S²/σ² ∼ χ²_{n−1}
• X̄ and S² are independent.

The chi-squared distribution

• Let Z1, . . . , Zr be i.i.d. random variables with distribution N(0, 1);
• the random variable
  X = Z1² + · · · + Zr²
  is said to follow a CHI-SQUARED distribution with r degrees
  of freedom (d.f.);
• we write X ∼ χ²_r;
• E(X) = r and Var(X) = 2r.
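As a quick sanity check of the χ² relation above, one can simulate normal samples and verify that (n − 1)S²/σ² has mean close to r = n − 1 and variance close to 2r; a sketch with illustrative parameter values (not from the slides):

```python
import numpy as np

# Check that (n - 1) S^2 / sigma^2 behaves like chi-squared with n - 1 d.f.:
# its mean should be close to r = n - 1 and its variance close to 2r.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 2.0, 8, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)        # sample variance with n - 1 divisor
q = (n - 1) * s2 / sigma**2

print(q.mean())   # close to n - 1 = 7
print(q.var())    # close to 2 (n - 1) = 14
```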
Counting problems

• The variable X is binary, i.e. it takes only two possible
  values, for instance "success" and "failure";
• the random variable X takes values "1" (success) and "0"
  (failure);
• the parameter of interest is π, the proportion of units in the
  population with value "1";
• formally, X ∼ Bernoulli(π) so that
  E(X) = π and Var(X) = π(1 − π).

The sample proportion (1)

• Simple random sample X1, . . . , Xn;
• the variable X takes two values: 0 and 1, and the sample
  proportion is a special case of the sample mean
  P = Σ_{i=1}^n Xi / n
• the observed value of P is p.
The sample proportion (2)

• The sample proportion is such that
  E(P) = π and Var(P) = π(1 − π)/n
• by the central limit theorem, the distribution of P is
  approximately normal;
• sometimes the following empirical rules are used to decide
  whether the normal approximation is satisfactory:
  1. n × π > 5 and n × (1 − π) > 5;
  2. np(1 − p) > 9.

Estimation

• Parameters are specific numerical characteristics of a
  population, for instance:
  – a proportion π;
  – a mean µ;
  – a variance σ².
• When the value of a parameter is unknown it can be
  estimated on the basis of a random sample.
Point estimation

A point estimate is an estimate that consists of a single value
or point; for instance, one can estimate

• a mean µ with the sample mean x̄;
• a proportion π with a sample proportion p.

A point estimate is always provided with its standard error, which
is a measure of the uncertainty associated with the estimation
process.

Estimator vs estimate

• An estimator of a population parameter is
  – a random variable that depends on sample information,
  – whose value provides an approximation to this unknown
    parameter.
• A specific value of that random variable is called an estimate.
Estimation and uncertainty

• Parameter θ;
• the sampling statistic T = g(X1, . . . , Xn) on which estimation
  is based is called the estimator of θ and we write θ̂ = T;
• the observed value of the estimator, t, is called an estimate
  of θ and we write θ̂ = t;
• it is fundamental to assess the uncertainty of θ̂;
• a measure of uncertainty is the standard deviation of the
  estimator, that is SD(T) = SD(θ̂). This quantity is called the
  STANDARD ERROR of θ̂ and denoted by SE(θ̂).

Point estimation of a mean (σ² known)

• Consider the case where X1, . . . , Xn is a simple random
  sample from X ∼ N(µ, σ²);
• parameters:
  – µ, unknown;
  – assume that the value of σ² is known.
• the sample mean can be used as estimator of µ: µ̂ = X̄;
• the distribution of the estimator is normal with
  E(µ̂) = µ and Var(µ̂) = σ²/n
• the STANDARD ERROR of µ̂ is
  SE(µ̂) = σ/√n
Point estimation of a mean with σ² known: example

In the "bottling company" example, assume that the quantity
of liquid in the bottles is normally distributed. Then a point
estimate of µ is

• µ̂ = 1.0065

• and the standard error of this estimate is
  SE(µ̂) = σ/√n = 0.01/√10 = 0.0032

Point estimation of a mean (σ² unknown)

• Typically the value of σ² is not known;
• in this case we estimate it as σ̂² = s²;
• this can be used, for instance, to estimate the standard error
  of µ̂:
  ŜE(µ̂) = σ̂/√n.
• In the "bottling company" example, if σ is unknown it can be
  estimated as σ̂ = s = 0.0095, so that
  ŜE(µ̂) = 0.0095/√10 = 0.0030
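The two bottling-company computations above can be reproduced with a few lines of Python (a minimal sketch; the numerical values are taken from the slides):

```python
import math

xbar, s, n = 1.0065, 0.0095, 10   # observed sample values (slides)
sigma = 0.01                      # known standard deviation, sigma-known case

mu_hat = xbar                         # point estimate of mu
se_known = sigma / math.sqrt(n)       # SE(mu-hat) when sigma is known
se_est = s / math.sqrt(n)             # estimated SE when sigma is unknown

print(mu_hat)                # 1.0065
print(round(se_known, 4))    # 0.0032
print(round(se_est, 4))      # ~0.0030
```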
Point estimation of a proportion

• Parameter: π;
• the sample proportion P is used as an estimator of π:
  π̂ = P
• this estimator is approximately normally distributed with
  E(π̂) = π and Var(π̂) = π(1 − π)/n
• the STANDARD ERROR of the estimator is
  SE(π̂) = √(π(1 − π)/n)
  and in this case the value of the standard error is never known.

Point estimation for the mean of a non-normal population

• X1, . . . , Xn i.i.d. with E(Xi) = µ and Var(Xi) = σ²;
• the distribution of Xi is not normal;
• by the central limit theorem the distribution of X̄ is
  approximately normal.
Estimation of a proportion: example

For the "clothing store chain" example the estimate of the
proportion π of good quality items is

• π̂ = 11/40 = 0.275

• and an ESTIMATE of the standard error is
  ŜE(π̂) = √(0.275 × (1 − 0.275)/40) = 0.07

Properties of estimators: unbiasedness

A point estimator θ̂ is said to be an unbiased estimator of the
parameter θ if the expected value, or mean, of the sampling
distribution of θ̂ is θ; formally, if

E(θ̂) = θ

Interpretation of unbiasedness: if the sampling process were
repeated, independently, an infinite number of times, obtaining in
this way an infinite number of estimates of θ, the arithmetic mean
of such estimates would be equal to θ. However, unbiasedness
does not guarantee that the estimate based on one single sample
coincides with the value of θ.
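Returning to the "clothing store chain" computation above, a minimal Python sketch reproducing the estimate and its estimated standard error (values from the slides):

```python
import math

# Proportion estimate and estimated standard error (clothing store example).
successes, n = 11, 40
pi_hat = successes / n
se_hat = math.sqrt(pi_hat * (1 - pi_hat) / n)

print(pi_hat)              # 0.275
print(round(se_hat, 2))    # 0.07
```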
Bias of an estimator

Let θ̂ be an estimator of θ. The bias of θ̂, Bias(θ̂), is defined as
the difference between the expected value of θ̂ and θ:

Bias(θ̂) = E(θ̂) − θ

The bias of an unbiased estimator is 0.

Point estimator of the variance

The sample variance S² is an unbiased estimator of the variance
σ² of a normally distributed random variable:

E(S²) = σ².

On the other hand, S̃² (the sample variance computed with
divisor n instead of n − 1) is a biased estimator of σ²:

E(S̃²) = (n − 1)/n × σ².
Properties of estimators: Mean Squared Error (MSE)

For an estimator θ̂ of θ the (unknown) estimation "error" is
|θ − θ̂|

The Mean Squared Error (MSE) is the expected value of the
square of the "error":

MSE(θ̂) = E[(θ − θ̂)²]
        = Var(θ̂) + [θ − E(θ̂)]²
        = Var(θ̂) + Bias(θ̂)²

Hence, for an unbiased estimator, the MSE is equal to the
variance.

Most Efficient Estimator

• Let θ̂1 and θ̂2 be two estimators of θ; then the MSE can be
  used to compare the two estimators;
• if both θ̂1 and θ̂2 are unbiased then θ̂1 is said to be more
  efficient than θ̂2 if
  Var(θ̂1) < Var(θ̂2)
• note that if θ̂1 is more efficient than θ̂2 then also
  MSE(θ̂1) < MSE(θ̂2) and SE(θ̂1) < SE(θ̂2);
• the most efficient estimator, or minimum variance unbiased
  estimator, of θ is the unbiased estimator with the smallest
  variance.
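As an illustration of efficiency (our own example, not from the slides), the following sketch compares the Monte Carlo MSE of two unbiased estimators of µ for a normal population, the sample mean and the sample median; the mean has the smaller variance and hence the smaller MSE:

```python
import numpy as np

# Monte Carlo MSE of two unbiased estimators of mu for a normal population:
# the sample mean is more efficient than the sample median.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 1.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
mse_mean = np.mean((samples.mean(axis=1) - mu) ** 2)
mse_median = np.mean((np.median(samples, axis=1) - mu) ** 2)

print(mse_mean)     # close to sigma^2 / n = 0.04
print(mse_median)   # larger, roughly pi * sigma^2 / (2 n), about 0.063
```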
Interval estimation

• A point estimate consists of a single value, so that
  – if X̄ is a point estimator of µ then it holds that
    P(X̄ = µ) = 0
  – more generally, P(θ̂ = θ) = 0.
• Interval estimation is the use of sample data to calculate
  an interval of possible (or probable) values of an unknown
  population parameter.

Confidence interval for the mean of a normal population (σ known)

• X1, . . . , Xn simple random sample with Xi ∼ N(µ, σ²);
• assume σ known;
• a point estimator of µ is
  µ̂ = X̄ ∼ N(µ, σ²/n)
• the standard error of the estimator is SE(µ̂) = σ/√n.
Before the sample is extracted...

• The sampling distribution of the estimator is completely
  known except for the value of µ;
• the uncertainty associated with the estimate depends on the
  size of the standard error. For instance, the probability that
  µ̂ = X̄ takes a value in the interval µ ± 1.96 × SE is 0.95
  (that is, 95%):

  P(µ − 1.96 SE ≤ X̄ ≤ µ + 1.96 SE) = 0.95

[Figure: normal density of X̄ centered at µ, with 95% of the area
between µ − 1.96 SE and µ + 1.96 SE]

Confidence interval for µ

• The probability that X̄ belongs to the interval
  (µ − 1.96 SE, µ + 1.96 SE)
  is 95%;
• this can also be stated as: the probability that the interval
  (X̄ − 1.96 SE, X̄ + 1.96 SE)
  contains the parameter µ is 95%.
226
Formal derivation of the 95% confidence interval for µ (σ known)

It holds that
(X̄ − µ)/SE ∼ N(0, 1)
where
SE = σ/√n
so that

0.95 = P(−1.96 ≤ (X̄ − µ)/SE ≤ 1.96)
     = P(−1.96 SE ≤ X̄ − µ ≤ 1.96 SE)
     = P(−X̄ − 1.96 SE ≤ −µ ≤ −X̄ + 1.96 SE)
     = P(X̄ − 1.96 SE ≤ µ ≤ X̄ + 1.96 SE)

Confidence interval for µ with σ known: example

In the bottling company example, if one assumes σ = 0.01
known, a 95% confidence interval for µ is

(1.0065 − 1.96 × 0.01/√10; 1.0065 + 1.96 × 0.01/√10)

that is
(1.0065 − 0.0062; 1.0065 + 0.0062)

so that
(1.0003; 1.0127)
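The interval just computed can be reproduced in Python (a minimal sketch, values from the slides):

```python
import math

# 95% confidence interval for mu with sigma known (bottling example).
xbar, sigma, n = 1.0065, 0.01, 10
z = 1.96                           # z_{alpha/2} for alpha = 0.05

me = z * sigma / math.sqrt(n)      # margin of error, ~0.0062
print((round(xbar - me, 4), round(xbar + me, 4)))   # (1.0003, 1.0127)
```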
After the sample is extracted...

On the basis of the sample values the observed value µ̂ = x̄
is computed. x̄ may or may not belong to the interval
µ ± 1.96 × SE. For instance:

[Figure: normal density with 95% central area; the observed x̄
falls inside the interval µ ± 1.96 SE]

In this case x̄ belongs to the interval µ ± 1.96 × SE and, as
a consequence, the interval (x̄ − 1.96 × SE; x̄ + 1.96 × SE)
contains µ.

...a different sample...

A different sample may lead to a sample mean x̄ that, as in the
example below, does not belong to the interval µ ± 1.96 × SE;
as a consequence, the interval (x̄ − 1.96 × SE; x̄ + 1.96 × SE)
does not contain µ.

[Figure: normal density with 95% central area; the observed x̄
falls outside the interval µ ± 1.96 SE]

The interval (x̄ − 1.96 × SE; x̄ + 1.96 × SE) will contain µ for
95% of all possible samples.
Interpretation of confidence intervals

Probability is associated with the procedure that leads to the
derivation of a confidence interval, not with the interval itself.
A specific interval either will or will not contain the true
parameter; no probability is involved for a specific interval.

[Figure: confidence intervals for five different samples of size
n = 25, extracted from a normal population with µ = 368 and
σ = 15]

Confidence interval: definition

A confidence interval for a parameter is an interval constructed
using a procedure that will contain the parameter a specified
proportion of the time, typically 95%.

A confidence interval estimate is made up of two quantities:

interval: set of values that represents the estimate for the
parameter;

confidence level: percentage of the intervals that will include
the unknown population parameter.
A wider confidence interval for µ

• Since it also holds that
  P(µ − 2.58 SE ≤ X̄ ≤ µ + 2.58 SE) = 0.99
• the probability that (X̄ − 2.58 SE, X̄ + 2.58 SE) contains µ
  is 99%.

[Figure: normal densities with 99% central area between
µ ± 2.58 SE, with the observed x̄ marked]

Confidence level

The confidence level is the percentage associated with the
interval. A larger value of the confidence level will typically lead
to an increase of the interval width. The most commonly used
confidence levels are

• 68%, associated with the interval X̄ ± 1 SE;
• 95%, associated with the interval X̄ ± 1.96 SE;
• 99%, associated with the interval X̄ ± 2.58 SE;

where the values 1, 1.96 and 2.58 are derived from the standard
normal distribution tables.
Notation: standard normal distribution tables

• Z ∼ N(0, 1);
• α is a value between zero and one;
• z_α is the value such that the area under the pdf of Z between
  z_α and +∞ is equal to α;
• formally,
  P(Z > z_α) = α
  or, equivalently,
  P(Z < z_α) = 1 − α
• furthermore,
  P(−z_{α/2} < Z < z_{α/2}) = 1 − α

Confidence interval for µ with σ known: formal derivation (1)

It holds that
(X̄ − µ)/SE ∼ N(0, 1)
where
SE = σ/√n
so that
P(µ − z_{α/2} SE ≤ X̄ ≤ µ + z_{α/2} SE) = 1 − α
or, equivalently,
P(−z_{α/2} ≤ (X̄ − µ)/SE ≤ z_{α/2}) = 1 − α
Confidence interval for µ with σ known: formal derivation (2)

1 − α = P(−z_{α/2} ≤ (X̄ − µ)/SE ≤ z_{α/2})
      = P(−z_{α/2} SE ≤ X̄ − µ ≤ z_{α/2} SE)
      = P(−X̄ − z_{α/2} SE ≤ −µ ≤ −X̄ + z_{α/2} SE)
      = P(X̄ − z_{α/2} SE ≤ µ ≤ X̄ + z_{α/2} SE)

Confidence interval at the level 1 − α for µ with σ known

A confidence interval at the (confidence) level 1 − α, or
100(1 − α)%, for µ is given by

(X̄ − z_{α/2} SE; X̄ + z_{α/2} SE)

Since SE = σ/√n, this is

(X̄ − z_{α/2} σ/√n; X̄ + z_{α/2} σ/√n)
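This general interval is easy to wrap in a small function; a sketch where the helper name z_interval is ours, and SciPy's norm.ppf supplies z_{α/2}:

```python
import math
from scipy.stats import norm

def z_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05):
    """Level 1 - alpha confidence interval for mu, sigma known."""
    z = norm.ppf(1 - alpha / 2)        # z_{alpha/2} from the standard normal
    me = z * sigma / math.sqrt(n)      # margin of error
    return xbar - me, xbar + me

print(z_interval(1.0065, 0.01, 10))               # 95%: ~(1.0003, 1.0127)
print(z_interval(1.0065, 0.01, 10, alpha=0.01))   # 99%: a wider interval
```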
Margin of error

• The confidence interval
  x̄ ± z_{α/2} × σ/√n
  can also be written as x̄ ± ME, where
  ME = z_{α/2} × σ/√n
  is called the margin of error;
• the interval width is equal to twice the margin of error.

Reducing the margin of error

ME = z_{α/2} × σ/√n

The margin of error can be reduced, without lowering the
confidence level, by increasing the sample size (n ↑).
Confidence interval for µ with σ unknown

• X̄ ∼ N(µ, σ²/n);
• (X̄ − µ)/SE ∼ N(0, 1);
• in this case the standard error is unknown and needs to be
  estimated: SE(µ̂) = σ/√n is estimated by ŜE(µ̂) = σ̂/√n,
  where σ̂ = S;
• and it holds that
  (X̄ − µ)/ŜE ∼ t_{n−1}

The Student's t distribution (1)

• For Z ∼ N(0, 1) and X ∼ χ²_r, independent,
• the random variable
  T = Z/√(X/r)
  is said to follow a Student's t distribution with r degrees of
  freedom;
• the pdf of the t distribution differs from that of the standard
  normal distribution because it has "heavier tails".

[Figure: comparison of the t_1 and N(0, 1) densities]
The Student's t distribution (2)

• For r → +∞ the Student's t distribution converges to the
  standard normal distribution.

[Figure: comparison of the t_25 and N(0, 1) densities]

Confidence interval for µ with σ unknown

1 − α = P(−t_{n−1,α/2} ≤ (X̄ − µ)/ŜE ≤ t_{n−1,α/2})
      = P(−t_{n−1,α/2} ŜE ≤ X̄ − µ ≤ t_{n−1,α/2} ŜE)
      = P(−X̄ − t_{n−1,α/2} ŜE ≤ −µ ≤ −X̄ + t_{n−1,α/2} ŜE)
      = P(X̄ − t_{n−1,α/2} ŜE ≤ µ ≤ X̄ + t_{n−1,α/2} ŜE)

where t_{n−1,α/2} is the value such that the area under the t pdf
with n − 1 d.f. between t_{n−1,α/2} and +∞ is equal to α/2.

Hence, a confidence interval at the level 1 − α for µ is

(X̄ − t_{n−1,α/2} S/√n; X̄ + t_{n−1,α/2} S/√n)
Confidence interval for µ with σ unknown: example

For the bottling company example, if the value of σ is not known,
then
s = 0.0095 and t_{9; 0.025} = 2.2622
and a 95% confidence interval for µ is

(1.0065 − 2.2622 × 0.0095/√10; 1.0065 + 2.2622 × 0.0095/√10)

that is
(1.0065 − 0.0068; 1.0065 + 0.0068)

so that
(0.9997; 1.0133)

Confidence interval for the mean of a non-normal population

• X1, . . . , Xn i.i.d. with E(Xi) = µ and Var(Xi) = σ²;
• the distribution of Xi is not normal;
• by the central limit theorem the distribution of X̄ is
  approximately normal;
• if one uses the procedures described above to construct a
  confidence interval for µ, the nominal confidence level of the
  interval is only an approximation of the true confidence level.
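The t-based interval above can be reproduced with SciPy, which also supplies the critical value t_{9; 0.025} (a minimal sketch, values from the slides):

```python
import math
from scipy.stats import t

# 95% confidence interval for mu with sigma unknown (bottling example).
xbar, s, n, alpha = 1.0065, 0.0095, 10, 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # t_{9; 0.025} = 2.2622
me = t_crit * s / math.sqrt(n)            # ~0.0068

print(round(t_crit, 4))                             # 2.2622
print((round(xbar - me, 4), round(xbar + me, 4)))   # (0.9997, 1.0133)
```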
Confidence interval for π

By the central limit theorem
(P − π)/SE ≈ N(0, 1)
where
SE = √(π(1 − π)/n)
so that

1 − α ≈ P(−z_{α/2} ≤ (P − π)/SE ≤ z_{α/2})
      = P(−z_{α/2} SE ≤ P − π ≤ z_{α/2} SE)
      = P(−P − z_{α/2} SE ≤ −π ≤ −P + z_{α/2} SE)
      = P(P − z_{α/2} SE ≤ π ≤ P + z_{α/2} SE)

Since π is always unknown, it is always necessary to estimate the
standard error:
SE(π̂) = √(π(1 − π)/n) is estimated by ŜE(π̂) = √(π̂(1 − π̂)/n)

Confidence interval for π: example

For the clothing store chain example, with π̂ = x̄ = 11/40, a 95%
confidence interval for π is

(11/40 − 1.96 ŜE(π̂); 11/40 + 1.96 ŜE(π̂))

and one obtains
(11/40 − 1.96 × 0.07; 11/40 + 1.96 × 0.07)

so that
(0.137; 0.413)
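A sketch reproducing the interval for π (the slides round the SE to 0.07; using the unrounded SE, as below, gives essentially the same endpoints):

```python
import math
from scipy.stats import norm

# Approximate 95% confidence interval for pi (clothing store example).
successes, n, alpha = 11, 40, 0.05

pi_hat = successes / n                           # 0.275
se_hat = math.sqrt(pi_hat * (1 - pi_hat) / n)    # ~0.0706
z = norm.ppf(1 - alpha / 2)                      # ~1.96

lo, hi = pi_hat - z * se_hat, pi_hat + z * se_hat
print((round(lo, 3), round(hi, 3)))              # ~(0.137, 0.413)
```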
Example of decision problem

Problem: in the example of the bottling company, the quality
control department has to decide whether to stop the production
in order to revise the machine.

Hypothesis: the expected (mean) quantity of liquid in the
bottles is equal to one liter.

The standard deviation is assumed known and equal to σ = 0.01.
The decision is based on a simple random sample of n = 10
bottles.

Statistical hypotheses

A decision problem is expressed by means of two statistical
hypotheses:

• the null hypothesis H0;
• the alternative hypothesis H1.

The two hypotheses concern the value of an unknown population
parameter, for instance µ:

H0 : µ = µ0
H1 : µ ≠ µ0
Distribution of X̄ under H0

If H0 is true (that is, under H0) the distribution of the sample
mean X̄

• has expected value equal to µ0 = 1;
• has standard error equal to SE = σ/√10 = 0.00316;
• if X1, . . . , X10 is a normally distributed i.i.d. sample then X̄
  also follows a normal distribution; otherwise the distribution
  of X̄ is only approximately normal (by the central limit
  theorem).

[Figure: density of X̄ under H0, centered at 1.0000, with ticks at
0.9905, 0.9937, 0.9968, 1.0000, 1.0032, 1.0063, 1.0095]

Observed value of the sample mean

The observed value of the sample mean is x̄.

x̄ is almost surely different from µ0 = 1.

Under H0, the expected value of X̄ is equal to µ0 and the
difference between µ0 and x̄ is due solely to the sampling error.

HENCE THE SAMPLING ERROR IS
x̄ − µ0
that is, observed value "minus" expected value.
Decision rule

The space of all possible sample means is partitioned into a

• rejection region, also called critical region;
• nonrejection region.

Outcomes and probabilities

There are two possible "states of the world" and two possible
decisions. This leads to four possible outcomes.

                        H0 TRUE            H0 FALSE
H0 IS REJECTED          Type I error (α)   OK
H0 IS NOT REJECTED      OK                 Type II error (β)

The probability of the type I error is called the significance level
of the test and can be arbitrarily fixed (typically at 5%).
Test statistic

A test statistic is a function of the sample that can be used to
perform a hypothesis test.

• For the example considered, X̄ is a valid test statistic, which
  is equivalent to the more common "z" test statistic
  Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1)

Hypothesis testing: example

• 5% significance level (arbitrarily fixed);
• Z = (X̄ − 1)/0.00316 ∼ N(0, 1)
• the observed value of Z is z = (1.0065 − 1)/0.00316 = 2.055;
• since |z| = 2.055 > 1.96, the empirical evidence leads to the
  rejection of H0.
z test for µ with σ known

• X1, . . . , Xn i.i.d. with distribution N(µ, σ²);
• the value of σ is known.
• Hypotheses:
  H0 : µ = µ0
  H1 : ...
• test statistic:
  Z = (X̄ − µ0)/(σ/√n)
• under H0 the test statistic Z has distribution N(0, 1).

p-value approach to testing

• The p-value, also called the observed level of significance, is
  the probability of obtaining a value of the test statistic more
  extreme than the observed sample value, under H0;
• decision rule: compare the p-value with α:
  – p-value < α  =⇒  reject H0
  – p-value ≥ α  =⇒  do not reject H0
• for the example considered,
  p-value = P(Z ≤ −2.055) + P(Z ≥ 2.055) = 0.04;
• p-value < 5%: statistically significant result;
• p-value < 1%: highly significant result.
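The bottling-company test and its p-value in a minimal Python sketch (values from the slides; norm.sf is the standard normal upper-tail probability):

```python
import math
from scipy.stats import norm

# Two-sided z test for mu with sigma known (bottling example).
xbar, mu0, sigma, n, alpha = 1.0065, 1.0, 0.01, 10, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))   # observed statistic, ~2.055
p_value = 2 * norm.sf(abs(z))               # two-sided p-value, ~0.04

print(round(z, 3), round(p_value, 3))
print("reject H0" if p_value < alpha else "do not reject H0")
```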
z test: two-sided hypothesis

H1 : µ ≠ µ0

In this case
p-value = 2 P(Z > |z|)

[Figure: standard normal density with both tails beyond ±|z| shaded]

z test: one-sided hypothesis (right)

H1 : µ > µ0

In this case
p-value = P(Z > z)

[Figure: standard normal density with the right tail beyond z shaded]
z test: one-sided hypothesis (left)

H1 : µ < µ0

In this case
p-value = P(Z < z)

[Figure: standard normal density with the left tail below z shaded]

t test for µ with σ unknown

• Hypotheses:
  H0 : µ = µ0
  H1 : µ ≠ µ0
• test statistic:
  t = (X̄ − µ0)/(S/√n)
• p-value: 2 P(T_{n−1} > |t|), where T_{n−1} follows a Student's t
  distribution with n − 1 degrees of freedom.
Test for a proportion

• Null hypothesis: H0 : π = π0;
• under H0 the sampling distribution of P is approximately
  normal with expected value E(P) = π0 and standard error
  SE(P) = √(π0(1 − π0)/n)

Note that under H0 there are no unknown parameters.

z test for π

• Hypotheses:
  H0 : π = π0
  H1 : π ≠ π0
• test statistic:
  Z = (P − π0)/√(π0(1 − π0)/n)
• p-value: 2 P(Z > |z|).
z test for π: example

For the clothing store chain example, the hypotheses are

H0 : π = 0.25
H1 : π > 0.25

Hence, under H0 the standard error is
SE = √(0.25 × (1 − 0.25)/40) = 0.068

so that
z = (0.275 − 0.25)/0.068 = 0.37

The p-value is P(Z ≥ 0.37) = 0.36, so the null hypothesis
cannot be rejected.
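Finally, a sketch reproducing this one-sided test for π (values from the slides):

```python
import math
from scipy.stats import norm

# One-sided z test for a proportion (clothing store example).
p_obs, pi0, n = 0.275, 0.25, 40

se0 = math.sqrt(pi0 * (1 - pi0) / n)    # SE under H0, ~0.068
z = (p_obs - pi0) / se0                 # ~0.37
p_value = norm.sf(z)                    # P(Z >= z) for H1: pi > pi0, ~0.36

print(round(z, 2), round(p_value, 2))
```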