Student’s t Distribution (Also simply called t Distribution)

Student is the pen name of the discoverer: William Gosset.

Definition: If $Z \sim N(0,1)$, $W \sim \chi^2(\nu)$, and $Z, W$ are independent, then we say that
$$T = \frac{Z}{\sqrt{W/\nu}}$$
has a t distribution with $\nu$ degrees of freedom.

$t_\nu \to N(0,1)$ as $\nu \to \infty$. (They are sufficiently close when $\nu \geq 30$.)

See the slides on the web regarding the t-distribution.
Recall: If $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, then

by (2): $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$;

by (3): $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$, and $\bar{X}$ and $S^2$ are independent.

Let $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, $W = \dfrac{(n-1)S^2}{\sigma^2}$, $\nu = n - 1$.

Since $\bar{X}$ and $S^2$ are independent, $Z, W$ are independent. We may obtain a t-distributed statistic:
$$T = \frac{Z}{\sqrt{W/\nu}} = \frac{\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}}{S/\sigma} = \frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}.$$

(4) $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}$.

This statistic $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ is similar to $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$, but with $\sigma$ replaced by its estimator $S$. When $\sigma^2$ is unknown, $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ is very useful in making inference for $\mu$.
Application of Student’s t Distribution

Example 2: An automobile manufacturer wishes to estimate his new model’s mileage (miles per gallon). He decides to carry out a fuel efficiency test. Six non-professional drivers are randomly selected and each drives a new-model car from St. Catharines to Toronto. Assume the mileage is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. Find the probability that $\bar{X}$ will be within $\dfrac{2S}{\sqrt{n}}$ of the true mean $\mu$.

Given $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, 6$.

Ask: $P\left(|\bar{X} - \mu| \leq \dfrac{2S}{\sqrt{n}}\right) = ?$

Solution: Let $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$. Then
$$P\left(|\bar{X} - \mu| \leq \frac{2S}{\sqrt{n}}\right) = P\left(\frac{|\bar{X} - \mu|}{S/\sqrt{n}} \leq 2\right) = P(|T| \leq 2).$$

According to the definition of Student’s t distribution, this $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_5$.

Looking at Table 5, p. 849, we see that $P(T > 2.015) = 0.05$. This implies $P(|T| \leq 2.015) = 0.90$, so $P(|T| \leq 2)$ is slightly less than $0.90$.

Note: If $\sigma^2$ were known, then we would use the Normal distribution rather than the t distribution: by the empirical rule, $P(|Z| \leq 2) \approx 95\%$.
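The table lookup can be sanity-checked by simulation. The sketch below is an illustration, not part of the original example; the population values $\mu = 50$ and $\sigma = 8$ are arbitrary assumptions, since the distribution of $T$ does not depend on them. It builds $T = (\bar{X} - \mu)/(S/\sqrt{n})$ from simulated normal samples of size $n = 6$ and estimates $P(|T| \leq 2)$:

```python
import random
import statistics

random.seed(42)

# Hypothetical population values; T's distribution depends only on n.
mu, sigma, n = 50.0, 8.0, 6
trials = 200_000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample sd, divides by n - 1
    t = (xbar - mu) / (s / n ** 0.5)      # T = (X-bar - mu)/(S/sqrt(n)) ~ t_5
    if abs(t) <= 2:
        hits += 1

p_est = hits / trials
print(f"P(|T| <= 2) is approximately {p_est:.3f}")
```

As the table-based reasoning above suggests, the estimate lands just below 0.90.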
Another distribution related to the normal distribution: the F distribution.

Definition: If $W_1 \sim \chi^2(\nu_1)$, $W_2 \sim \chi^2(\nu_2)$, and $W_1, W_2$ are independent, then we say that
$$F = \frac{W_1/\nu_1}{W_2/\nu_2}$$
has an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$ denominator degrees of freedom. Denote: $F \sim F(\nu_1, \nu_2)$.

(An F random variable is defined as a ratio of two independent chi-square random variables, each divided by its degrees of freedom.)

See the slides on the web regarding the F-distribution.
A practical example: Suppose we have two independent random samples:

Sample 1: $X_i^{(1)} \overset{\text{i.i.d.}}{\sim} N(\mu_1, \sigma_1^2)$, $i = 1, 2, \ldots, n_1$;
Sample 2: $X_i^{(2)} \overset{\text{i.i.d.}}{\sim} N(\mu_2, \sigma_2^2)$, $i = 1, 2, \ldots, n_2$.

These two samples are independent, with sample variances $S_1^2, S_2^2$.

Then by (3) we have
$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1 - 1), \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2 - 1).$$

So we can construct an F-distributed statistic:
$$F = \frac{\dfrac{(n_1-1)S_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)S_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1).$$

Specially, when $\sigma_1^2 = \sigma_2^2$,
$$F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1).$$

(5) If there are two independent random samples:

Sample 1: $X_i^{(1)} \overset{\text{i.i.d.}}{\sim} N(\mu_1, \sigma_1^2)$, $i = 1, 2, \ldots, n_1$;
Sample 2: $X_i^{(2)} \overset{\text{i.i.d.}}{\sim} N(\mu_2, \sigma_2^2)$, $i = 1, 2, \ldots, n_2$;

then $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$.
An application of the F-distribution: Two independent samples, $n_1 = 6$ and $n_2 = 10$, are taken from two normal populations with equal population variances. Find $b$ such that
$$P\left(\frac{S_1^2}{S_2^2} \leq b\right) = 0.95.$$

Solution: $F = \dfrac{S_1^2}{S_2^2} \sim F(5, 9)$, and since $P\left(\dfrac{S_1^2}{S_2^2} \leq b\right) = 0.95$, we have $P\left(\dfrac{S_1^2}{S_2^2} > b\right) = 0.05$. Looking up Table 7, p. 852: $b = 3.48$.

Since the sample sizes are relatively small, even when the two populations have equal variances, the probability that the ratio of their sample variances exceeds 3.48 is still 5%.
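This tail probability can also be checked by simulation. The sketch below is my own illustration; the common population standard deviation $\sigma = 3$ is an arbitrary assumption, since the ratio $S_1^2/S_2^2$ does not depend on it when the variances are equal:

```python
import random
import statistics

random.seed(7)

n1, n2 = 6, 10
sigma = 3.0        # hypothetical common population sd; the ratio is free of it
trials = 200_000

exceed = 0
for _ in range(trials):
    s1_sq = statistics.variance(random.gauss(0.0, sigma) for _ in range(n1))
    s2_sq = statistics.variance(random.gauss(0.0, sigma) for _ in range(n2))
    if s1_sq / s2_sq > 3.48:    # F = S1^2/S2^2 ~ F(5, 9) under equal variances
        exceed += 1

p_est = exceed / trials
print(f"P(S1^2/S2^2 > 3.48) is approximately {p_est:.3f}")
```

The estimate comes out near the table value of 0.05.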
Review: Sampling Distributions Related to the Normal Distribution

If $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, then:

(1) $\dfrac{X_i - \mu}{\sigma} \overset{\text{i.i.d.}}{\sim} N(0,1)$, and $\displaystyle\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$;

(2) with $\bar{X} = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} X_i$: $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ and $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$;

(3) with $S^2 = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2$: $\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$, and $\bar{X}$ and $S^2$ are independent;

(4) $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$;

(5) for two independent samples with sample variances $S_1^2, S_2^2$: $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$; specially, when $\sigma_1^2 = \sigma_2^2$, $F = \dfrac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1)$.

The assumption of the Normal distribution is required for (1) to (5).
For example, Result (2) (it is actually Theorem 7.1) states: If $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, with $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$, then
$$\bar{X} \overset{\text{Exactly}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{Exactly}}{\sim} N(0,1).$$

What if the assumption of the Normal distribution is invalid? Let’s take a look at a non-normal distribution: the Exponential distribution $E(10)$, for example.
$$f(x) = \begin{cases} \dfrac{1}{10} e^{-x/10}, & x > 0; \\ 0, & \text{otherwise.} \end{cases}$$

[Figures, from top: density of $E(10)$; histograms of some sample means $\bar{X}$, with sample sizes of 5, 10, 120 respectively.]
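The pattern shown in those histograms can be reproduced numerically. As one sketch of the same phenomenon (the simulation settings here are my own, not from the notes), the code below measures the skewness of simulated sample means of $E(10)$ data. Theoretically the skewness of $\bar{X}$ is $2/\sqrt{n}$ (about 0.89 for $n = 5$ and 0.18 for $n = 120$), so the distribution of $\bar{X}$ becomes more symmetric, i.e. more normal-looking, as $n$ grows:

```python
import random
import statistics

random.seed(1)

theta = 10.0     # E(10): density (1/10)e^{-x/10} for x > 0, mean 10, sd 10
reps = 20_000    # number of simulated sample means per sample size

def skewness(xs):
    """Empirical skewness: third central moment over sd cubed."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

skews = {}
for n in (5, 120):
    means = [statistics.fmean(random.expovariate(1 / theta) for _ in range(n))
             for _ in range(reps)]
    skews[n] = skewness(means)
    print(f"n = {n:3d}: skewness of the sample mean is approximately {skews[n]:.2f}")
```

The skewness shrinks toward 0 (the normal value) as the sample size increases, which is the Central Limit Theorem at work.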
Central Limit Theorem (CLT) (THM 7.4): If $X_i$, $i = 1, 2, \ldots, n$, are i.i.d. from any distribution with mean $\mu$ and variance $\sigma^2$, with $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$, then
$$\bar{X} \overset{\text{Asymptotically}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{Asymptotically}}{\sim} N(0,1).$$

Namely, $\bar{X} \overset{\text{Approximately}}{\sim} N\left(\mu, \dfrac{\sigma^2}{n}\right)$ and $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{Approximately}}{\sim} N(0,1)$, when $n$ is sufficiently large.

Compare to Result (2) in (THM 7.1): If $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, with $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$, then
$$\bar{X} \overset{\text{Exactly}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{Exactly}}{\sim} N(0,1).$$
Reading pages:
353-364; 370-373.
Example 7.9: The service times for customers coming through a checkout counter in a retail store are independent random variables with mean 1.5 minutes and variance 1. Approximate the probability that 100 customers can be served in less than 2 hours of total service time.

Let $X_i$ be the service time for the $i$th customer. Note that $X_i$ has an unknown distribution with mean $\mu = 1.5$ and variance $\sigma^2 = 1$. However, we have a large sample, $n = 100$.

We want to find: $P\left(\displaystyle\sum_{i=1}^{100} X_i \leq 120 \text{ minutes}\right) = ?$

According to the CLT, $\bar{X}$ has an approximately normal distribution:
$$\bar{X} \overset{\text{Approx.}}{\sim} N\left(1.5, \frac{1}{100}\right), \qquad Z = \frac{\bar{X} - 1.5}{\sqrt{1/100}} \overset{\text{Approx.}}{\sim} N(0,1).$$

So we have
$$P\left(\sum_{i=1}^{100} X_i \leq 120\right) = P\left(\frac{1}{100}\sum_{i=1}^{100} X_i \leq \frac{120}{100}\right) = P(\bar{X} \leq 1.2) = P\left(Z \leq \frac{1.2 - 1.5}{\sqrt{1/100}}\right) = P(Z \leq -3) = 0.00135.$$

Therefore, the probability that 100 customers can be served in less than 2 hours is very, very small.
Perhaps another cashier line is needed.
Assume the manager decides to add one more line, with the same service speed, to make this situation better, and assume the two service lines share the service load equally. Then what is the probability that 100 customers can be served in less than 2 hours now?

Now the mean service time becomes $\mu = 1.5/2 = 0.75$ mins, assuming the variance hasn’t changed. We shall have:
$$\bar{X} \overset{\text{Approx.}}{\sim} N\left(0.75, \frac{1}{100}\right), \qquad Z = \frac{\bar{X} - 0.75}{\sqrt{1/100}} \overset{\text{Approx.}}{\sim} N(0,1).$$
$$P\left(\sum_{i=1}^{100} X_i \leq 120\right) = P(\bar{X} \leq 1.2) = P\left(Z \leq \frac{1.2 - 0.75}{\sqrt{1/100}}\right) = P(Z \leq 4.5) \approx 1.$$

Surely the 100 customers would be served within 2 hrs.
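Both tail probabilities can be computed directly with Python’s standard library (`statistics.NormalDist`); this is just a numerical restatement of the two calculations above, not new material:

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal N(0, 1)
sigma, n = 1.0, 100
se = sigma / n ** 0.5         # sigma/sqrt(n) = 0.1

# One line: P(sum Xi <= 120) = P(X-bar <= 1.2) = P(Z <= (1.2 - 1.5)/0.1)
p_one_line = Z.cdf((1.2 - 1.5) / se)     # = P(Z <= -3)

# Two lines: mean service time per customer drops to 0.75
p_two_lines = Z.cdf((1.2 - 0.75) / se)   # = P(Z <= 4.5)

print(f"one line:  {p_one_line:.5f}")
print(f"two lines: {p_two_lines:.6f}")
```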
Normal approximation to other distributions has been very useful in making inferences. One of the important ones is the Normal approximation to the Binomial distribution.

$Y \sim b(n, p)$. Finding
$$P(Y \leq b) = \sum_{y=0}^{b} \binom{n}{y} p^{y} (1-p)^{n-y} = ?$$

Tables are available only for some sample sizes: $n = 5, 10, 15, 20, 25$ in your text. Other cases are not available, and their calculation is tedious, especially for large $n$.

Consider $Y$ to be the number of successes in $n$ trials, so it can be expressed as
$$Y = \sum_{i=1}^{n} Y_i,$$
where
$$Y_i = \begin{cases} 1, & \text{if the } i\text{th trial results in success;} \\ 0, & \text{otherwise.} \end{cases}$$

The distribution of $Y_i$ is the Bernoulli distribution with $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$; $E(Y_i) = p$, $\mathrm{Var}(Y_i) = p(1-p)$.

When $n$ is large, according to the CLT,
$$\frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y} \overset{\text{Approximately}}{\sim} N\left(p, \frac{p(1-p)}{n}\right).$$

In addition, we can consider $Y$ as having an approximately normal distribution $N(np, np(1-p))$.

Note that this approximate distribution,
$$\frac{Y}{n} \overset{\text{Approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right),$$
is very important for making inference regarding population proportions.
Example 7.10: Candidate A believes that she can win a city election if she can earn at least 55% of the votes in Precinct 1. She also believes that 50% of the voters of the whole city favor her. If 100 voters show up to vote at Precinct 1, what is the probability that she will receive at least 55% of their votes?

Let $Y$ be the number of voters at Precinct 1 who vote for her. We need to find
$$P\left(\frac{Y}{n} \geq 0.55\right) = ?$$

We assume the 100 voters at Precinct 1 are a random sample from the city, so $Y \sim b(100, 0.5)$.

Then, its normal approximation will be
$$\frac{Y}{n} \overset{\text{Approximately}}{\sim} N\left(0.5, \frac{0.5(1-0.5)}{100}\right), \quad \text{i.e.} \quad \frac{Y}{n} \overset{\text{Approximately}}{\sim} N(0.5, 0.0025).$$
$$P\left(\frac{Y}{n} \geq 0.55\right) = P\left(\frac{Y/n - 0.5}{\sqrt{0.0025}} \geq \frac{0.55 - 0.5}{\sqrt{0.0025}}\right) = P\left(\frac{Y/n - 0.5}{\sqrt{0.0025}} \geq 1\right) \approx P(Z \geq 1) = 15.87\%.$$

(Or applying the Empirical Rule, $\approx \dfrac{1 - 68\%}{2} = 16\%$.)
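Since $Y \sim b(100, 0.5)$ is fully specified, the normal-approximation answer can be compared with the exact binomial tail. This comparison is my own sketch, not part of the text, using only the standard library:

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5

# Normal approximation to the proportion: Y/n ~ N(0.5, 0.0025) approximately
approx = 1 - NormalDist(0.5, 0.0025 ** 0.5).cdf(0.55)   # ~ P(Z >= 1)

# Exact binomial tail: P(Y >= 55) for Y ~ b(100, 0.5)
exact = sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(55, n + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The gap between the two answers (roughly 0.159 versus 0.184) is largely due to the missing continuity correction, a standard refinement of this approximation.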
0.0025
How good can the Normal approximation
make to the Binomial distribution? Check
Example 7.11 (See P381).
From Y  b25, 0. 4, find the exact
probability that Y ≤ 8 and Y  8 and
compare these to the corresponding
values found by using its normal
approximation.
From Y  b25, 0. 4, and Table 1,
PY  8  PY ≤ 8 − PY ≤ 7
Exactly
 0. 274 − 0. 154  0. 12.
Let W be a random variable having a
normal distribution: Nnp, np1 − p
Approx.
Then Y  W ′ s distribution:
Nnp  10, np1 − p  6
PY  8 ≈ P7. 5 ≤ W ≤ 8. 5
P
7. 5 − 10 ≤ W − 10 ≤ 8. 5 − 10
6
6
6
 P−1. 02 ≤ Z ≤ −0. 61
 PZ ≤ −0. 61 − PZ ≤ −1. 02
 PZ ≥ 0. 61 − PZ ≥ 1. 02
 0. 2709 − 0. 1539  0. 1170
It is close to 0. 12 in the Binomial Table.
From Y  b25, 0. 4, and Table 1,
PY ≤ 8  0. 274;
Approx.
From Y  W ′ s distribution: N10, 6
PY ≤ 8 ≈ PW  8. 5  P W−10  8.5−10 
6
6
 PZ  −0. 61  PZ  0. 61  0. 2709.
It is close to 0. 274 in the Binomial Table.
Commonly used continuity correction: If $Y \sim b(n, p)$, then $Y \overset{\text{Approx.}}{\sim} W$’s distribution $N(np, np(1-p))$, and
$$P(Y \leq k) \approx P(W < k + 0.5);$$
$$P(Y \geq k) \approx P(W > k - 0.5);$$
$$P(Y = k) \approx P(k - 0.5 < W < k + 0.5).$$

This approximation performs well even when the sample size is only moderately large. One guideline is that the approximation can be used whenever the sample size satisfies:
$$n > 9 \cdot \frac{\text{Larger of } p \text{ and } 1-p}{\text{Smaller of } p \text{ and } 1-p}.$$
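Example 7.11’s numbers can be reproduced with these continuity-correction rules. This is a sketch using only the standard library; the helper `binom_cdf` is mine, not from the text:

```python
from math import comb
from statistics import NormalDist

n, p = 25, 0.4
W = NormalDist(n * p, (n * p * (1 - p)) ** 0.5)   # N(np = 10, np(1-p) = 6)

def binom_cdf(k):
    """Exact P(Y <= k) for Y ~ b(n, p)."""
    return sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(k + 1))

exact_le8 = binom_cdf(8)                   # P(Y <= 8), tabulated as 0.274
approx_le8 = W.cdf(8 + 0.5)                # rule: P(Y <= k) ~ P(W < k + 0.5)

exact_eq8 = binom_cdf(8) - binom_cdf(7)    # P(Y = 8), tabulated as 0.12
approx_eq8 = W.cdf(8.5) - W.cdf(7.5)       # rule: P(k - 0.5 < W < k + 0.5)

print(f"P(Y <= 8): exact {exact_le8:.3f}, approx {approx_le8:.3f}")
print(f"P(Y  = 8): exact {exact_eq8:.3f}, approx {approx_eq8:.3f}")
```

Note that here $p = 0.4$, so the guideline requires $n > 9 \cdot (0.6/0.4) = 13.5$; the sample size $n = 25$ satisfies it.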
Reading Pages in Chapter 7:
P346-382.
Purpose of study: Making inference for a
population using sample data.
Making Inferences:
(1) Point Estimation;
(2) Interval Estimation;
(3) Hypothesis Testing.
Point Estimation

We can liken point estimation to a shooting competition. Three players A, B, and C: say the center of the target is your population parameter (the population mean, for instance) that you are estimating. Each player shoots 5 times, and their scores are recorded. Which player gives the best performance?

A parameter – the target.
An estimate – each shooting score.
An estimator – a shooter.

Which estimator is the best? How would we judge them?
(1) closeness to the center (the target); and
(2) small variability.

An estimator is a rule or formula that tells how to calculate the value of an estimate based on sample observations. A good estimator has to (1) be targeted at the center and (2) have a small variance.
Let $\theta$ be a population parameter, and $\hat{\theta}$ a point estimator for $\theta$. Note that an estimator is a statistic, so its value changes from one sample to another.

Property (1) means $E(\hat{\theta}) = \theta$ (unbiasedness).
Property (2) means $\mathrm{Var}(\hat{\theta})$ is small (small variance).
Unbiasedness: If $E(\hat{\theta}) = \theta$, then $\hat{\theta}$ is called an unbiased estimator for $\theta$. If $E(\hat{\theta}) \neq \theta$, then $\hat{\theta}$ is said to be biased.

Let $B(\hat{\theta})$ denote $E(\hat{\theta}) - \theta$; then $B(\hat{\theta})$ is called the bias of $\hat{\theta}$ for estimating $\theta$:
$$B(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

When $B(\hat{\theta}) = 0$, $\hat{\theta}$ is unbiased;
when $B(\hat{\theta}) > 0$, $\hat{\theta}$ tends to over-estimate $\theta$;
when $B(\hat{\theta}) < 0$, $\hat{\theta}$ tends to under-estimate $\theta$.

Another measure of the performance of an estimator is the Mean Square Error (MSE): the MSE of a point estimator $\hat{\theta}$ is defined as
$$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2.$$
Show that $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [B(\hat{\theta})]^2$.

The first approach uses $E(Y^2) = \mathrm{Var}(Y) + [E(Y)]^2$.

The second approach is as follows:
$$\begin{aligned}
\mathrm{MSE}(\hat{\theta}) &= E(\hat{\theta} - \theta)^2 \\
&= E\left[\left(\hat{\theta} - E(\hat{\theta})\right) + \left(E(\hat{\theta}) - \theta\right)\right]^2 \\
&= E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2 + 2\left(\hat{\theta} - E(\hat{\theta})\right)\left(E(\hat{\theta}) - \theta\right) + \left(E(\hat{\theta}) - \theta\right)^2\right] \\
&= E\left(\hat{\theta} - E(\hat{\theta})\right)^2 + 2E\left(\hat{\theta} - E(\hat{\theta})\right)\left(E(\hat{\theta}) - \theta\right) + \left(E(\hat{\theta}) - \theta\right)^2 \\
&= \mathrm{Var}(\hat{\theta}) + 2 \cdot 0 \cdot B(\hat{\theta}) + \left(E(\hat{\theta}) - \theta\right)^2 \\
&= \mathrm{Var}(\hat{\theta}) + [B(\hat{\theta})]^2.
\end{aligned}$$
As we know, for a normal population, the two most important estimators are:
(i) $\bar{X}$ for the population mean $\mu$;
(ii) $S^2$ for the population variance $\sigma^2$.
How good are they?

Example 1: Assume $X_1, X_2, X_3$ are a randomly selected sample from $N(\mu, \sigma^2)$.

(1) Are $\bar{X}$, $X_1$, $\dfrac{X_1 + X_2}{2}$, and $\dfrac{X_1 + 2X_2}{3}$ unbiased estimators for $\mu$?

(2) Which one is the best for estimating $\mu$?

(3) Let $S^{2*} = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2$. Are both $S^{2*}$ and $S^2$ (the sample variance $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$) unbiased for $\sigma^2$?

(4) In this example, which one is better for estimating $\sigma^2$?
$$\mathrm{MSE}(S^2) = \mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1};$$
$$\mathrm{MSE}(S^{2*}) = \mathrm{Var}(S^{2*}) + \left[B(S^{2*})\right]^2 = \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2} = \frac{(2n-1)\sigma^4}{n^2}.$$
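The two MSE formulas can be checked by simulation for $n = 3$. This is a sketch of my own; $\mu = 0$ and $\sigma^2 = 4$ are arbitrary assumptions. Theory predicts $\mathrm{MSE}(S^2) = 2\sigma^4/(n-1) = 16$ and $\mathrm{MSE}(S^{2*}) = (2n-1)\sigma^4/n^2 = 5 \cdot 16/9 \approx 8.9$, so the biased $S^{2*}$ actually has the smaller MSE here:

```python
import random
import statistics

random.seed(3)

mu, sigma2, n = 0.0, 4.0, 3    # hypothetical N(mu, sigma^2) population, n = 3
reps = 200_000

sq_err_s2 = sq_err_s2star = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    s2 = statistics.variance(xs)       # divides by n - 1: unbiased for sigma^2
    s2star = s2 * (n - 1) / n          # divides by n: biased for sigma^2
    sq_err_s2 += (s2 - sigma2) ** 2
    sq_err_s2star += (s2star - sigma2) ** 2

mse_s2 = sq_err_s2 / reps
mse_s2star = sq_err_s2star / reps
print(f"MSE(S^2)  is approximately {mse_s2:.2f}")
print(f"MSE(S^2*) is approximately {mse_s2star:.2f}")
```

This illustrates the bias-variance trade-off: a small bias can be worth accepting in exchange for a larger reduction in variance.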
The most commonly used unbiased point estimators are tabulated in the text. Reading pages: P390-393, P396-399.

Techniques for constructing an unbiased estimator:

(i) If $E(\hat{\theta}) = \theta + a$, namely $\hat{\theta}$ is a biased estimator for $\theta$ with $B(\hat{\theta}) = a$, then $E(\hat{\theta} - a) = \theta$. So $\hat{\theta} - a$ is unbiased for $\theta$.

(ii) If $E(\hat{\theta}) = k\theta$, $k \neq 0$ and $k \neq 1$, i.e. $\hat{\theta}$ is a biased estimator for $\theta$, then $E\left(\dfrac{\hat{\theta}}{k}\right) = \theta$. So $\dfrac{\hat{\theta}}{k}$ is unbiased for $\theta$.

Reading pages: Chapter 9: P444-452.
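Technique (ii) is exactly how $S^2$ repairs the bias of $S^{2*}$: since $E(S^{2*}) = \dfrac{n-1}{n}\sigma^2$, we have $k = \dfrac{n-1}{n}$, and $\dfrac{S^{2*}}{k} = S^{2*} \cdot \dfrac{n}{n-1} = S^2$ is unbiased. A numeric sketch (the population values below are arbitrary assumptions of mine):

```python
import random
import statistics

random.seed(9)

mu, sigma2, n = 5.0, 9.0, 4    # hypothetical normal population, sample size 4
reps = 200_000

total_biased = total_corrected = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s2star = sum((x - xbar) ** 2 for x in xs) / n  # E(S^2*) = ((n-1)/n)sigma^2
    total_biased += s2star
    total_corrected += s2star / ((n - 1) / n)      # divide by k = (n-1)/n -> S^2

mean_biased = total_biased / reps
mean_corrected = total_corrected / reps
print(f"average S^2*:         {mean_biased:.2f}  (target sigma^2 = {sigma2})")
print(f"average S^2*/k = S^2: {mean_corrected:.2f}")
```

The uncorrected average settles near $\dfrac{n-1}{n}\sigma^2 = 6.75$, while the corrected one settles near $\sigma^2 = 9$.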