Notes on the Chi-Squared Distribution
October 19, 2005
1 Introduction
Recall that the chi-squared random variable with k degrees of freedom is
defined as
\[ \chi^2 = X_1^2 + \cdots + X_k^2, \]
where the X_i's are all independent and have N(0, 1) distributions. Also
recall that I claimed that χ² has a gamma distribution with parameters
r = k/2 and α = 1/2: Let
\[ f(x) = \left(\frac{1}{2}\right)^{k/2} \frac{x^{k/2-1} e^{-\alpha x}}{\Gamma(k/2)}, \]
where Γ(t) is the gamma function, given by
\[ \Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx, \quad \text{for } t > 0. \]
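As a quick aside (an addition to the notes, not part of the original), the defining
integral can be checked numerically. The sketch below assumes Python with scipy
available and compares the integral against the standard library's gamma function.

```python
# Numerical check of the definition: Gamma(t) = integral of x^(t-1) e^(-x)
# over [0, infinity), compared with Python's built-in gamma function.
import math
from scipy.integrate import quad

for t in (1.0, 2.5, 5.0):
    value, _err = quad(lambda x, t=t: x ** (t - 1) * math.exp(-x), 0, math.inf)
    print(f"t = {t}: integral = {value:.6f}, math.gamma = {math.gamma(t):.6f}")
```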
Then, we are saying that
\[ P(\chi^2 \ge a) = \int_a^\infty \left(\frac{1}{2}\right)^{k/2} \frac{x^{k/2-1} e^{-\alpha x}}{\Gamma(k/2)}\, dx. \]
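This claim can also be spot-checked numerically. Here is a short sketch (again an
addition, assuming Python with scipy) that integrates the density above from a to ∞
and compares the result with scipy's chi-squared tail probability; the values of
k and a are arbitrary examples.

```python
# Compare the gamma(r = k/2, alpha = 1/2) tail integral with P(chi^2_k >= a).
import math
from scipy.integrate import quad
from scipy.stats import chi2

k, a = 6, 1.75          # example degrees of freedom and cutoff
alpha = 0.5

def gamma_density(x):
    # the density f(x) written above, with r = k/2 and alpha = 1/2
    return alpha ** (k / 2) * x ** (k / 2 - 1) * math.exp(-alpha * x) / math.gamma(k / 2)

tail, _err = quad(gamma_density, a, math.inf)
print(tail)              # ~0.9412
print(chi2.sf(a, df=k))  # the chi-squared tail probability, for comparison
```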
In this set of notes we aim to do the following two things:
1) Show that the chi-squared distribution with k degrees of freedom does
indeed have a gamma distribution;
2) Discuss the chi-squared test, which is similar to the one you have
already seen in class.
2 Proof that chi-squared has a gamma distribution
We first recall here a standard fact about moment generating functions:
Theorem 1. Suppose that X and Y are continuous random variables having
moment generating functions M_X(t) = E(e^{tX}) and M_Y(t) = E(e^{tY}),
respectively. Further, suppose that these functions exist for all t in a
neighborhood of 0, and that they are continuous at t = 0. Then,
\[ P(X \le a) = P(Y \le a) \text{ for all } a \iff M_X(t) = M_Y(t) \text{ for all such } t. \]
Note: Stronger versions of this theorem are possible, but this one is good
enough for our purposes.
We will not prove this theorem here, as it is long and technical. However,
let us now use it to determine the moment generating function for the
chi-squared distribution; first, we determine the m.g.f. for a gamma
distribution: Suppose Z is a random variable having a gamma distribution
with parameters r > 0 and α > 0. Then, it has pdf given by
\[ g(x) = \frac{\alpha(\alpha x)^{r-1} e^{-\alpha x}}{\Gamma(r)}, \quad \text{where } x \ge 0. \]
Since it is a pdf, we know that
\[ \int_0^\infty g(x)\, dx = 1, \]
regardless of the parameters α and r. Now, then, we have that M_Z(t) is
given by
\begin{align*}
M_Z(t) = E(e^{tZ}) &= \int_0^\infty e^{tz} g(z)\, dz \\
&= \int_0^\infty \frac{\alpha^r z^{r-1} e^{-z(\alpha - t)}}{\Gamma(r)}\, dz \\
&= \frac{\alpha^r}{(\alpha - t)^r} \int_0^\infty \frac{(\alpha - t)((\alpha - t)z)^{r-1} e^{-z(\alpha - t)}}{\Gamma(r)}\, dz \\
&= \frac{\alpha^r}{(\alpha - t)^r} \\
&= \frac{1}{(1 - \alpha^{-1} t)^r}.
\end{align*}
Notice here that we used the fact that the integral
\[ \int_0^\infty \frac{(\alpha - t)((\alpha - t)x)^{r-1} e^{-x(\alpha - t)}}{\Gamma(r)}\, dx = 1, \]
so long as t < α; this holds because the integrand is itself the pdf of a
gamma distribution with parameters r and α − t.
Also, notice that M_Z(t) exists in a neighborhood of t = 0, namely the
neighborhood |t| < α.
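To make the formula concrete, here is a small Monte Carlo check (an illustrative
addition, assuming numpy): sample from a gamma distribution and compare the
empirical value of E(e^{tZ}) with the closed form 1/(1 − α^{-1}t)^r. Note that
numpy parameterizes the gamma distribution by a scale, which is 1/α.

```python
# Monte Carlo check of the gamma m.g.f.: E[exp(t Z)] ~ (1 - t/alpha)^(-r) for t < alpha.
# numpy's gamma sampler takes the shape r and a scale parameter equal to 1/alpha.
import numpy as np

rng = np.random.default_rng(0)
r, alpha, t = 3.0, 0.5, 0.2                 # example values with t < alpha
z = rng.gamma(shape=r, scale=1.0 / alpha, size=1_000_000)
print(np.exp(t * z).mean())                 # Monte Carlo estimate of M_Z(t)
print((1.0 - t / alpha) ** (-r))            # closed form 1 / (1 - t/alpha)^r
```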
Next, let us compute the moment generating function for the χ² random
variable: We have that
\begin{align*}
M_{\chi^2}(t) = E(e^{\chi^2 t}) &= E(e^{(X_1^2 + \cdots + X_k^2)t}) \\
&= E(e^{X_1^2 t} \cdots e^{X_k^2 t}) \\
&= E(e^{X_1^2 t}) \cdots E(e^{X_k^2 t}) \\
&= E(e^{X_1^2 t})^k.
\end{align*}
The facts we have used here are that 1) the X_i's are independent, which
implies that the e^{X_i^2 t}'s are all independent, and so allows us to write
E(e^{X_1^2 t} \cdots e^{X_k^2 t}) as E(e^{X_1^2 t}) \cdots E(e^{X_k^2 t}); and 2) the X_i's all have
the same distribution, which gives E(e^{X_i^2 t}) = E(e^{X_1^2 t}).
We now compute E(e^{X_1^2 t}): We note that since X_1 is N(0, 1), this
expectation is
\begin{align*}
E(e^{X_1^2 t}) &= \int_{-\infty}^\infty e^{x^2 t}\, \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, dx \\
&= \int_{-\infty}^\infty \frac{\exp\left(-\frac{x^2}{2(1 - 2t)^{-1}}\right)}{\sqrt{2\pi}}\, dx \\
&= \frac{1}{\sqrt{1 - 2t}} \int_{-\infty}^\infty \frac{\exp\left(-\frac{x^2}{2(1 - 2t)^{-1}}\right)}{\sqrt{2\pi (1 - 2t)^{-1}}}\, dx \\
&= \sqrt{\frac{1}{1 - 2t}}.
\end{align*}
The last line follows by realizing that the last integral is the integral
over the whole real line of the pdf of N(0, (1 − 2t)^{-1}). Notice that this
moment generating function exists for |t| < 1/2.
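A quick simulation (an added sketch assuming numpy, not part of the original
notes) confirms this computation: for X ~ N(0, 1) and |t| < 1/2, the sample
mean of e^{tX²} should approach 1/√(1 − 2t).

```python
# Monte Carlo check that E[exp(t X^2)] = (1 - 2t)^(-1/2) for X ~ N(0, 1), |t| < 1/2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(2_000_000)
for t in (-0.3, 0.1, 0.2):
    estimate = np.exp(t * x ** 2).mean()
    exact = (1.0 - 2.0 * t) ** -0.5
    print(f"t = {t}: Monte Carlo = {estimate:.4f}, exact = {exact:.4f}")
```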
Now, then, we have that
\[ M_{\chi^2}(t) = E(e^{X_1^2 t})^k = \left(\frac{1}{1 - 2t}\right)^{k/2}. \]
This moment generating function is the same as the one for a gamma distribution with parameters r = k/2 and α = 1/2 provided |t| < 1/2. So, our
theorem above then gives us that:
Theorem 2. If χ² is a chi-squared random variable having k degrees of
freedom, and if Z is a random variable that obeys a gamma distribution
with parameters r = k/2 and α = 1/2, then we have that
\[ P(\chi^2 \ge a) = P(Z \ge a) \quad \text{for all } a. \]
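As an illustration of Theorem 2 (an addition, assuming numpy and scipy), one can
simulate sums of k squared standard normals and compare their empirical tail with
the corresponding gamma tail; the values k = 6 and a = 1.75 below are just examples.

```python
# Simulate chi^2 with k degrees of freedom as a sum of k squared N(0,1) draws,
# then compare P(chi^2 >= a) with the Gamma(r = k/2, alpha = 1/2) tail.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
k, a = 6, 1.75
chi_sq = (rng.standard_normal((500_000, k)) ** 2).sum(axis=1)
print((chi_sq >= a).mean())             # Monte Carlo estimate of P(chi^2 >= a)
print(gamma.sf(a, k / 2, scale=2.0))    # P(Z >= a); scipy uses shape k/2 and scale 1/alpha = 2
```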
Now recall that in the case where k is a positive even integer we get that
Γ(k/2) = (k/2 − 1)!, which gives us that
\[ P(\chi^2 \ge a) = P(Z \ge a) = \frac{1}{(k/2 - 1)!} \left(\frac{1}{2}\right)^{k/2} \int_a^\infty x^{k/2-1} e^{-x/2}\, dx. \]
Now, it was shown (or stated) in class that, through a tedious
integration-by-parts calculation, this right-most expression equals a sum of
probabilities of a certain Poisson random variable. Specifically, let Y be a
Poisson random variable with parameter a/2. Then, we have that
\begin{align*}
P(\chi^2 \ge a) = P(Z \ge a) &= \sum_{j=0}^{k/2-1} P(Y = j) \\
&= e^{-a/2} \sum_{j=0}^{k/2-1} \frac{(a/2)^j}{j!}.
\end{align*}
In the case where k is odd, there exists a similar formula, but it is much
more involved. In our applications, if k is odd, we will either use a table lookup
or else approximate the chi-squared random variable by a normal distribution
via the Central Limit Theorem.
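The even-k formula above is easy to turn into code. The sketch below (an addition,
assuming Python with scipy available for comparison) evaluates the Poisson sum and
checks it against scipy's chi-squared survival function; the helper name
chi2_tail_even_k is just illustrative.

```python
# P(chi^2_k >= a) for even k, via the Poisson sum with parameter a/2.
import math
from scipy.stats import chi2

def chi2_tail_even_k(a, k):
    assert k > 0 and k % 2 == 0, "this closed form applies only to even k"
    lam = a / 2.0
    return math.exp(-lam) * sum(lam ** j / math.factorial(j) for j in range(k // 2))

print(chi2_tail_even_k(1.75, 6))   # ~0.9412
print(chi2.sf(1.75, df=6))         # scipy's value, for comparison
```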
3 The chi-squared test statistic
We have seen one form of the chi-squared test already, which involved measurements of positions of objects that varied with time. In that case, we
supposed that an object had a given velocity v (in some fixed direction away
from the observer) and that at times t = 1, 2, ..., 6, a measurement of the
position was made. We are assuming that we can repeat the experiment of
measuring the particle many times, and that p(t) is a random variable
giving the observed position of the particle at time t. Suppose we make the
following prediction: The actual position of the particle at time t is t,
and the discrepancy between the actual position and the observed position is
due to faulty measuring equipment. Further suppose that the error p(t) − t
is N (0, 1) for each of the times t = 1, 2, ..., 6. It is not a bad assumption
that this error has a normal distribution, since often errors in measurement
are the result of many little errors acting cumulatively on the observed value
(such as millions of air particles deflecting a laser beam slightly, where that
laser beam was used to find the position of an object).
We now want to test the hypothesis that “the actual position at time t
is t”, so we perform an experiment and obtain the following observed position
values:
p∗(1) = 0.5, p∗(2) = 1.5, p∗(3) = 3.0, p∗(4) = 3.0, p∗(5) = 5.5, p∗(6) = 6.0.
Now we compute
\[ E = (p^*(1) - 1)^2 + (p^*(2) - 2)^2 + \cdots + (p^*(6) - 6)^2 = 1.75; \]
that is, E is the sum of the squared errors between the predicted location of
the object and the observed location of the object for times t = 1, 2, ..., 6.
Since we have assumed that p(t) − t is N(0, 1), it follows that
\[ (p(1) - 1)^2 + \cdots + (p(6) - 6)^2 \text{ is chi-squared with 6 degrees of freedom.} \]
So, to see whether the hypothesis that “the actual position at time t is t” is
a good one, we compute
\begin{align*}
P(\chi^2 \ge 1.75) &= e^{-1.75/2} \sum_{j=0}^{2} \frac{(1.75/2)^j}{j!} \\
&= 0.41686202\,(1 + 0.875 + 0.3828125) \\
&\approx 0.94.
\end{align*}
Thus, there is about a 94% chance that one will get a sum-of-squares error
that is at least as big as our observed error E = 1.75. In other words, the
error we observed was actually quite small.
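For completeness, the same computation can be reproduced in a few lines (an
illustrative addition, assuming scipy; here chi2.sf evaluates the tail probability
directly rather than through the Poisson sum):

```python
# The position-measurement example: the sum of squared errors and its tail probability.
from scipy.stats import chi2

predicted = [1, 2, 3, 4, 5, 6]
observed = [0.5, 1.5, 3.0, 3.0, 5.5, 6.0]
E = sum((obs - pred) ** 2 for obs, pred in zip(observed, predicted))
print(E)                 # 1.75
print(chi2.sf(E, df=6))  # ~0.94: an error at least this large is very likely
```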
Another type of problem where a chi-squared distribution enters into
hypothesis testing is population sampling; indeed, this problem is one where
the chi-squared test statistic is absolutely critical in checking claims about a
population makeup. Here is the setup: Suppose you have a population that is
divided into k different categories. Further, you hypothesize that the
proportion of individuals in the jth category is p_j. Note that
p_1 + · · · + p_k = 1. You now wish to test this hypothesis by picking a large
number N of individuals, and checking to see which category they fall into.
The expected number of individuals in class j is e_j = p_j N; and, suppose
that the actual number observed is X_j. Note that X_1 + · · · + X_k = N.
Define the statistic
\[ E = \sum_{j=1}^{k} \frac{(X_j - p_j N)^2}{p_j N}. \tag{1} \]
Then, E ≥ 0, and E will be “large” if too many of the classes contain a
number of individuals that is far away from the expected number.
In order to be able to check our hypothesis that “a proportion p_j of the
population belongs to class j, for all j = 1, 2, ..., k”, we need to know the
probability distribution of E, and the following result gives us this needed
information:
Theorem 3. For large values of N , the random variable E given in (1) has
approximately a chi-squared distribution with k − 1 degrees of freedom.
A natural question here is: Why only k − 1 degrees of freedom, why
not k? The reason is that the X_j's are not independent: any X_j is
completely determined by the other k − 1 values X_i through
X_1 + · · · + X_k = N.
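Theorem 3 can be seen at work in a quick simulation (an added illustration,
assuming numpy and scipy): draw multinomial counts under some hypothesized
proportions and compare the empirical tail of E with the chi-squared tail with
k − 1 degrees of freedom.

```python
# Simulate the statistic E from equation (1) under hypothesized proportions p_j
# and compare its tail with that of a chi-squared, k - 1 degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
p = np.array([0.4, 0.4, 0.1, 0.05, 0.05])    # example cell probabilities, k = 5
N, trials = 10_000, 20_000
counts = rng.multinomial(N, p, size=trials)  # one row of category counts per trial
E = ((counts - N * p) ** 2 / (N * p)).sum(axis=1)
for a in (1.0, 5.0, 9.49):                   # 9.49 is roughly the 95th percentile for 4 d.f.
    print(a, (E >= a).mean(), chi2.sf(a, df=len(p) - 1))
```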
We will not prove the above theorem, as its proof is long and technical,
but we will apply it here to a simple example:
Example: Suppose you read in a newspaper that likely voters in Florida
break down according to the following distribution: 40% will vote Republican, 40% will vote Democrat, 10% will vote Independent, 5% will vote Green,
and 5% will vote “other”. You decide to test this by doing a poll of your
own. Suppose that you ask 10,000 likely Florida voters which group they will
vote for, and suppose you receive the following data:
4,200 will vote Republican;
3,900 will vote Democrat;
1,000 will vote Independent;
700 will vote Green; and,
200 will vote “other”.
So, we have that
\begin{align*}
E &= \frac{(4200 - 4000)^2}{4000} + \frac{(3900 - 4000)^2}{4000} + 0 + \frac{(700 - 500)^2}{500} + \frac{(200 - 500)^2}{500} \\
&= 10 + 2.5 + 0 + 80 + 180 = 272.5 > 200.
\end{align*}
Now, to test whether our conjectured probabilities for the types of likely
voters are correct, we select a parameter α, which is often taken to be 0.05.
Then, we ask ourselves: What is the probability that a chi-squared random
variable having k − 1 = 4 degrees of freedom has value ≥ 200; that is,
\[ P(\chi^2_4 \ge 200) = \,? \]
If this value is less than α = 0.05, then we reject the hypothesis on the
makeup of likely Florida voters.
After doing a table lookup, it turns out that
\[ P(\chi^2_4 \le 200) = 1.000\ldots, \]
which is very close to 1;
and so, the probability we seek is much smaller than α. So, we reject our
hypothesis on the makeup of likely Florida voters.
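Finally, here is the whole test reproduced in code (an added sketch, assuming
scipy; scipy.stats.chisquare computes the same statistic and p-value in one call):

```python
# Chi-squared goodness-of-fit test for the Florida poll data.
from scipy.stats import chi2, chisquare

observed = [4200, 3900, 1000, 700, 200]
expected = [4000, 4000, 1000, 500, 500]   # 10,000 voters times the claimed proportions
E = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(E)                                  # 272.5
print(chi2.sf(E, df=4))                   # far below alpha = 0.05, so reject the hypothesis
print(chisquare(observed, expected))      # the same statistic and p-value from scipy
```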