6 Sampling Distributions
Statisticians use the word population to refer to the total number of (potential) observations
under consideration.
The population is just the set of all possible outcomes in our sample space (chapter 3)
Therefore, a population may be finite (e.g. number of households in the US)
or (effectively) infinite (e.g. number of stars in the universe)
e.g. question: average number of TV sets per household in US
population: number of TV sets in each household in US
question: average number of TV sets per household in North America
population: number of TV sets in each household in Canada, US and Mexico
question: probability that a star has planets
population: number of planets per star for all stars (past, present, future) in all
galaxies in the universe
In answering questions (e.g. what is the mean, what is the variance, what is the probability)
for a given population, one seldom answers the questions using the entire population.
In practice the questions are answered from a subset (a sample) of the population.
it is important to choose the sample in a way that does not bias the answers
This is the subject of an area of statistics referred to as experimental design. (how to design
the sample such that you adequately reflect the entire population)
e.g. in determining the probability of getting a pair in a poker hand, you would not sample
only poker hands that contained two pairs.
(Technically this would be an attempt to determine the probability P(pair) for the entire
population by approximating it by the conditional probability P(pair | two pair).)
e.g. To determine the average length of logs moving on a conveyor belt at constant speed,
one might decide to measure only the logs that pass a certain point on the conveyor belt
every 10 minutes. Upon reflection, you realize that longer logs have a greater probability of
being at the measuring point at the selected times, thus the sample would give a biased
average length measure that would be too large.
e.g. to determine the expected lifetime of a tire, you only test it on smooth, paved roads?
e.g. to determine fuel rating on cars, the EPA presumes that every car is driven 55 percent of
the time in the city and 45 percent of the time on the highway!?
One way to ensure unbiased sampling is to ensure your subset is a random sample
Suppose our sample is to consist of n observations, $x_1, x_2, \ldots, x_n$. We have to select the first
observation $x_1$, the second $x_2$, etc. We think of the procedure for picking $x_k$ as selecting a
value for a random variable $X_k$; that is, we think of picking values $x_1, x_2, \ldots, x_n$ for our
sample as the process of picking values for n random variables $X_1, X_2, \ldots, X_n$. Using this
thinking, we can define a random sample as follows:
finite population: A set of observations $X_1, X_2, \ldots, X_n$ constitutes a random sample of size n
from a finite population of size N if values for the set are chosen so that each subset of n of
the N elements of the population has the same probability of being selected.
infinite population: A set of observations $X_1, X_2, \ldots, X_n$ constitutes a random sample of size
n from the infinite population described by distribution (discrete) or density (continuous)
$f(x)$ if
1. each $X_i$ is a RV whose distribution/density is given by $f(x)$
2. the n RVs are independent
The phrase random sample is applied both to the RVs $X_1, X_2, \ldots, X_n$ and to their values
$x_1, x_2, \ldots, x_n$.
How to achieve a random sample?
e.g. the population is finite (and relatively small)
Label each element of the population 1, 2, …, N. Draw numbers sequentially, in groups of n,
from a random digits table
When the population size is large or infinite, this process can become practically impossible,
and careful thought must be given to an (at least approximately) random sampling design.
e.g. areal sampling using a regular grid
works if the underlying population (e.g. chemical contaminant concentration) is
relatively homogeneous. Doesn't work if the underlying population is spatially concentrated.
e.g. replicate sampling in "anomalous" areas
6.2 Sampling Distribution of the Mean
For each sample $x_1, x_2, \ldots, x_n$ of n observations, we can compute a mean $\bar{x}$. The mean value
will vary with each of our samples. Thus we can think of the sample mean (the mean value for
each sample) as a random variable $\bar{X}$ obeying some distribution function $f(\bar{x}; n)$. The
distribution $f(\bar{x}; n)$ is referred to as the theoretical sampling distribution. We put aside for
the moment the question of the form of $f(\bar{x}; n)$ and note that, in chapter 5.10, we have
already computed the mean and variance for $f(\bar{x}; n)$ in the case of continuous RVs.
Theorem 6.1:
If a random sample $X_1, X_2, \ldots, X_n$ of size n is taken from a population having mean $\mu$ and
variance $\sigma^2$, then $\bar{X}$ is a RV whose distribution $f(\bar{x}; n)$ has:
infinite population: mean value $E(\bar{X}) = \mu$ and variance $Var(\bar{X}) = \dfrac{\sigma^2}{n}$
finite population: mean value $E(\bar{X}) = \mu$ and variance $Var(\bar{X}) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}$
Note: The appearance of the factor $\frac{N-n}{N-1}$ in the variance of $\bar{X}$ in the finite population case is
unexpected based upon the calculation in 5.10. The calculations in 5.10, when applied to a
finite population, assume that $n \ll N$. This correction factor, called the finite population
correction (fpc) factor, is included to account for cases in which n is an appreciable fraction of N.
Note that the fpc factor is 0 for $n = N$ (i.e. $Var(\bar{X}) = 0$ when $n = N$). This implies that, when
one sample is taken using the entire population, $\bar{X}$ exactly measures the population mean
with no error (variance).
e.g. For N = 1,000 and n = 10, the fpc is
$$fpc = \frac{N-n}{N-1} = \frac{990}{999} = 0.991$$
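A small Python sketch of Theorem 6.1 (the values of sigma, n, and N below are made up for illustration): it evaluates Var(X̄) with and without the finite population correction.

def var_of_mean(sigma, n, N=None):
    """Var of the sample mean: sigma**2/n, times the fpc (N-n)/(N-1) if the population is finite."""
    v = sigma**2 / n
    if N is not None:                      # finite population of size N
        v *= (N - n) / (N - 1)
    return v

sigma, n, N = 2.0, 10, 1000                # hypothetical values
print(var_of_mean(sigma, n))               # infinite population: sigma**2/n = 0.4
print(var_of_mean(sigma, n, N))            # finite population: 0.4 * 990/999 ≈ 0.3964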
Note that the results in Theorem 6.1 are independent of what $f(\bar{x}; n)$ may actually be!!!
Apply Chebyshev's theorem to the RV $\bar{X}$:
$$P\left(|\bar{X} - \mu| > k\,\frac{\sigma}{\sqrt{n}}\right) < \frac{1}{k^2}$$
Let $\epsilon = k\,\dfrac{\sigma}{\sqrt{n}}$, i.e. $k = \dfrac{\sqrt{n}\,\epsilon}{\sigma}$, giving
$$P\left(|\bar{X} - \mu| > \epsilon\right) < \frac{\sigma^2}{n\epsilon^2}$$
Therefore, for any (arbitrarily small but) non-zero value of $\epsilon$, the probability that $\bar{X}$ differs
from $\mu$ by more than $\epsilon$ can be made arbitrarily small by making n large enough. (We need $n \gg \sigma^2/\epsilon^2$,
which means n must get very large as $\epsilon$ gets small.)
This observation is known as the law of large numbers
(if you make the sample size large enough, a single sample is sufficient to give a value for $\bar{x}$
arbitrarily close to the population mean.)
Theorem 6.2 Let $X_1, X_2, \ldots, X_n$ be a random sample, each having the same mean value $\mu$ and
variance $\sigma^2$. Then for any $\epsilon > 0$
$$P\left(|\bar{X} - \mu| > \epsilon\right) \to 0 \text{ as } n \to \infty$$
i.e. as the sample size gets large, the probability that the average from a single random sample
differs from the true mean by more than $\epsilon$ goes to zero.
Again, this result on $\bar{X}$ is independent of what $f(\bar{x}; n)$ may actually be.
e.g. In an experiment, event A occurs with probability p.
Repeat the experiment n times and compute
$$\text{relative frequency of occurrence of A} = \frac{\text{number of times A occurs in } n \text{ trials}}{n}$$
Show that the relative frequency of A $\to p$ as $n \to \infty$.
Consider each trial as an independent RV, $X_1, X_2, \ldots, X_n$.
Each $X_i$ takes on two values, $x_i = 0, 1$, depending on whether A does not or does occur in
experiment i.
$X_i$ has mean value $E(X_i) = 0 \cdot (1-p) + 1 \cdot p = p$
and variance $Var(X_i) = E(X_i^2) - [E(X_i)]^2 = 0^2 \cdot (1-p) + 1^2 \cdot p - p^2 = p(1-p)$.
Then $X_1 + X_2 + \cdots + X_n$ records the number of times A occurs in n trials, and
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
is in fact the relative frequency of occurrence of A. From Theorem 6.2 we have
$$P\left(|\bar{X} - p| > \epsilon\right) < \frac{p(1-p)}{n\epsilon^2} \to 0 \text{ for any } p \in [0,1] \text{ as } n \to \infty$$
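A quick simulation sketch of this example (the value of p, the sample sizes, and the seed are arbitrary choices): the relative frequency of A settles near p as n grows.

import random

random.seed(1)                      # arbitrary seed, for repeatability
p = 0.3                             # hypothetical probability of event A

for n in (100, 10_000, 1_000_000):
    # Each trial X_i is 1 with probability p, 0 otherwise; the mean is the relative frequency.
    count = sum(1 for _ in range(n) if random.random() < p)
    print(n, count / n)             # relative frequency approaches p = 0.3 as n grows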
π‘‰π‘Žπ‘Ÿ(𝑋) ≑ πœŽπ‘‹ =
𝜎
𝑛
is referred to at the standard error of the mean.
To reduce the standard error by a factor of two, it is necessary to increase 𝑛 β†’ 4𝑛.
Thus (unfortunately) increasing sample size decreases the standard error at a relatively slow
rate. (e.g. if n goes from 25 to 2,500 (a factor of 100), the standard error decreases only by
10.)
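A two-line numerical check of this scaling, with an arbitrary population standard deviation of 10:

sigma = 10.0                                    # arbitrary population sd
for n in (25, 100, 2500):
    print(n, sigma / n**0.5)                    # standard error: 2.0, 1.0, 0.2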
While the results in Theorems 6.1 and 6.2 are independent of the form of the theoretical
sampling distribution/density $f(\bar{x}; n)$, the actual form of $f(\bar{x}; n)$ depends on knowing the
probability distribution which governs the population. In general it can be very difficult to
compute the form of $f(\bar{x}; n)$.
Two results are known, both presented as theorems.
Theorem 6.3 (central limit theorem) Let $\bar{X}$ be the mean of a random sample of size n
taken from a population having mean $\mu$ and variance $\sigma^2$. Then the associated RV, the
standardized sample mean
$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is a RV whose distribution function approaches the standard normal distribution as $n \to \infty$.
The central limit theorem says that, as $n \to \infty$, the theoretical sampling distribution
$f(\bar{x}; n) \to$ a normal distribution (i.e. $\bar{X}$ is normally distributed) with mean $\mu$ and variance $\sigma^2/n$.
[Figures: the distribution $f(\bar{x}; n)$ of $\bar{X}$ for samples of size n from a population with an
exponential distribution, and from a population with a uniform distribution.]
In practice, the distribution for $\bar{X}$ is well approximated by a normal distribution for n as
small as 25 to 30.
Practical use of the central limit theorem:
You have a population whose mean $\mu$ and standard deviation $\sigma$ you assume that you know
(but whose density function $f(x)$ you do not know). You sample the population with a
sample of size n. From the sample you compute a mean value $\bar{x}$. If the sample size is
sufficiently large, the central limit theorem will tell you the probability of getting the value $\bar{x}$
given your assumptions on the values of $\mu$ and $\sigma$.
To test your assumptions, compute the standardized sample mean z using the measured $\bar{x}$
and the assumed values of $\mu$ and $\sigma$. The central limit theorem states that the probability of getting
the value $\bar{x}$ is the same as the probability of getting the z-score z in a standard normal
distribution.
Theorem (Normal populations) Let $\bar{X}$ be the mean of a random sample of size n taken from
a population that is normally distributed having mean $\mu$ and variance $\sigma^2$. Then the
standardized sample mean
$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution regardless of the size of n (i.e. $f(\bar{x}; n)$ for $\bar{X}$ is the
normal density with mean $\mu$ and variance $\sigma^2/n$).
Practical use of this theorem:
You have a population whose distribution is (assumed to be) normal and whose mean $\mu$ and
standard deviation $\sigma$ you assume that you know. You sample the population with a sample
of size n. From the sample you compute a mean value $\bar{x}$. This theorem will tell you the
probability of getting the value $\bar{x}$ given your assumptions on normality and the values of $\mu$
and $\sigma$.
To test your assumptions, compute the standardized sample mean z using the measured $\bar{x}$
and the assumed values of $\mu$ and $\sigma$. This theorem states that the probability of getting the value $\bar{x}$
is the same as the probability of getting the z-score z in a standard normal distribution.
e.g. 1-gallon paint cans (the population) from a particular manufacturer cover, on average,
513.3 sq. ft, with a standard deviation of 31.5 sq. ft. What is the probability that the mean
area covered by a sample of 40 1-gallon cans will lie within 510.0 to 520.0 sq. ft?
Find the standardized sample means for the two limits of the range:
$$z_1 = \frac{510.0 - 513.3}{31.5/\sqrt{40}} = -0.66, \qquad z_2 = \frac{520.0 - 513.3}{31.5/\sqrt{40}} = 1.34$$
Assuming the central limit theorem, we have from Table 3
$$P(510.0 < \bar{X} < 520.0) = P(-0.66 < Z < 1.34) = F(1.34) - F(-0.66) = 0.9099 - 0.2546 = 0.6553$$
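The same calculation can be checked in Python without a table, writing the standard normal CDF in terms of the error function; the numbers are those of the example.

from math import erf, sqrt

def phi(z):
    """Standard normal CDF, F(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 513.3, 31.5, 40
se = sigma / sqrt(n)                        # standard error of the mean
z1 = (510.0 - mu) / se                      # ≈ -0.66
z2 = (520.0 - mu) / se                      # ≈  1.34
print(phi(z2) - phi(z1))                    # ≈ 0.657, close to the table answer 0.6553
                                            # (the small difference comes from rounding z to two decimals)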
6.3 The Sampling Distribution of the Mean when σ is unknown (usual case)
In 6.2 we discussed aspects of the distribution of the sample mean $\bar{X}$ (it has a distribution
with mean $\mu$ and variance $\sigma^2/n$ (for continuous RVs)), and the related RV
$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
(the standardized sample mean), whose distribution approaches the standard normal distribution as $n \to \infty$.
In practice $\sigma$ is not known and we have to deal with the values
$$t \equiv \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where s is the sample standard deviation, $s = \sqrt{s^2}$, and $s^2$ is the sample variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
Similar to $\bar{X}$, we define the random variable $S^2$, called the sample variance,
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$
which has values $s^2$.
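For a concrete (made-up) sample, the sample mean, the sample variance with its n − 1 divisor, and the t value defined above can be computed directly; Python's statistics module uses the n − 1 divisor by default.

import statistics
from math import sqrt

x = [12.1, 9.8, 11.4, 10.7, 12.9, 10.2]      # hypothetical sample of n = 6 observations
mu0 = 11.0                                   # hypothetical assumed population mean

n = len(x)
xbar = statistics.mean(x)
s2 = statistics.variance(x)                  # sample variance, divisor n - 1
s = sqrt(s2)
t = (xbar - mu0) / (s / sqrt(n))             # the t value defined above
print(xbar, s2, t)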
In this section and the next, we are interested in the behavior of t and $S^2$ thought of as
random variables.
Little is known about the behavior of the distribution for t when n is small unless we are
sampling from a population governed by the normal distribution (a "normal population").
Theorem 6.4 If $\bar{X}$ is the sample mean for a random sample of size n taken from a normal
population having mean $\mu$, then
$$t \equiv \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
is a random variable having the t distribution with parameter $v = n - 1$.
Note: it is conventional to use lowercase 't' for the RV for the t distribution (breaking the
convention of using capital letters for the RV and small letters for its values). We will use lowercase
't' to stand for both the RV and its values.
The t distribution: a one-parameter family of RVs, with values defined on $(-\infty, \infty)$
density function
$$f(t; v) = \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\!\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$$
mean value: 0 (for $v > 1$), otherwise undefined
variance: $\dfrac{v}{v-2}$ (for $v > 2$), $\infty$ for $1 < v \le 2$, otherwise undefined
The t distribution is symmetric about 0, and very close to the standard normal distribution.
In fact the t distribution → the standard normal distribution as $v \to \infty$.
The t distribution has "heavier" tails than the standard normal distribution (i.e. there is
higher probability in the tails of the t distribution).
It is often referred to as "Student's t distribution".
The parameter v in the t distribution is referred to as the (number of) degrees of freedom
(df).
Recall that the sum of the sample deviations $x_i - \bar{x}$ is 0; hence only n − 1 of the deviations
are independent of each other. Thus the RVs $S^2$ and, by the same reasoning, t both have
n − 1 degrees of freedom.
Similar to the $z_\alpha$ for the standard normal distribution, we define the $t_\alpha$ for the t
distribution. Because of the symmetry of the standard normal and t distributions we have
$$z_{1-\alpha} = -z_\alpha, \qquad t_{1-\alpha} = -t_\alpha$$
Recall that Table 3 lists values of the cumulative standard normal distribution $F(z)$ for
various values of z.
In contrast, Table 4 lists values of $t_\alpha$ for various values of $\alpha$ and v.
(Recall, $\alpha$ is the probability in the right-hand tail above $t_\alpha$.)
By symmetry, the probability in the left-hand tail below $-t_\alpha$ is also $\alpha$.
Note that for $n \to \infty$, $t_\alpha = z_\alpha$.
The standard normal distribution provides a good approximation to the t distribution for
samples of size 30 or more.
Practical use of Theorem 6.4:
You have a population whose distribution is (assumed to be) normal and whose mean $\mu$
you assume that you know (but whose standard deviation you do not know). You sample
the population with a sample of size n. From the sample you compute a sample mean
$\bar{x}$ and the sample standard deviation s. Theorem 6.4 will tell you the probability of getting
the values $\bar{x}$ and s given your assumptions on normality and the value of $\mu$.
To test your assumptions, compute the t value using the measured $\bar{x}$ and s and the assumed
value of $\mu$. Theorem 6.4 states that the probability of getting the values $\bar{x}$, s is the same as
the probability of getting the value t in a t distribution with $v = n - 1$.
e.g. A manufacturer's fuses (the population) will blow in 12.40 minutes on average when
subjected to a 20% overload. A sample of 20 fuses is subjected to a 20% overload. The
sample average and standard deviation were observed to be, respectively, 10.63 and 2.48
minutes. What is the probability of this observation given the manufacturer's claim?
$$t = \frac{10.63 - 12.40}{2.48/\sqrt{20}} = -3.19, \qquad v = 20 - 1 = 19$$
From Table 4, for v = 19, we see that a t value of 2.861 already has only 0.5% probability
($\alpha = 0.005$) of being exceeded. Consequently there is less than a 0.5% probability that a t
value smaller than −2.861 will occur. Since the t value obtained in our sample of 20 is −3.19,
we conclude that there is less than a 0.5% probability of getting this result. We therefore
suspect that the manufacturer's claim is incorrect, and that the manufacturer's fuses will
blow in less than 12.40 minutes on average when subjected to a 20% overload.
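If SciPy is available, the exact tail probability behind this table lookup can be computed directly (a sketch using scipy.stats.t):

from math import sqrt
from scipy import stats

xbar, mu0, s, n = 10.63, 12.40, 2.48, 20
t = (xbar - mu0) / (s / sqrt(n))            # ≈ -3.19
p = stats.t.cdf(t, n - 1)                   # P(t with 19 df < -3.19)
print(t, p)                                 # p is well below 0.005, as argued from Table 4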
If the population is not normal, studies have shown that the distribution of
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
is fairly close to that of the t distribution as long as the population distribution is relatively
bell-shaped and not too skewed. This can be checked using a normal scores plot on the
population.
6.4 The Distribution of the Sample Variance S²
Theorem 6.5 Consider a random sample of size n taken from a normal population having
variance $\sigma^2$. Then the RV
$$\chi^2 \equiv \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$$
has the chi-square distribution with parameter $v = n - 1$.
The chi-square distribution: a one-parameter family of RVs, with values defined on $(0, \infty)$
density function
$$f(x; v) = \frac{1}{2^{v/2}\,\Gamma\!\left(\frac{v}{2}\right)}\, x^{\frac{v}{2}-1} e^{-\frac{x}{2}}$$
mean value: v
variance: 2v
The chi-square distribution is just the gamma distribution with $\alpha = \frac{v}{2}$, $\beta = 2$.
Again, the parameter v is referred to as the (number of) degrees of freedom (df).
We define the $\chi^2_\alpha$ notation similarly to that of $z_\alpha$ and $t_\alpha$. Just as for Table 4, Table 5 lists
values of $\chi^2_\alpha$ for various values of $\alpha$ and v.
e.g. (The population) glass "blanks" from an optical firm suitable for grinding into lenses.
The variance of the refractive index of the glass is $1.26 \cdot 10^{-4}$. A random sample of size 20 is selected from
each shipment, and if the variance of the refractive index of the sample exceeds $2 \cdot 10^{-4}$, the shipment is
rejected. What is the probability of rejection, assuming the underlying population is normal?
For the measured sample of 20
$$\chi^2 \equiv \frac{(20-1)(2 \cdot 10^{-4})}{1.26 \cdot 10^{-4}} = 30.2$$
From Table 5, for $v = 19$, 30.2 corresponds to a value $\alpha = 0.05$. There is therefore a 5%
probability of rejecting a shipment.
Practical use of Theorem 6.5:
You have a population whose distribution is (assumed to be) normal and whose variance $\sigma^2$
you assume that you know. You sample the population with a sample of size n. From the
sample you compute a sample variance $s^2$. Theorem 6.5 will tell you the probability of
getting the value $s^2$ given your assumptions on normality and the value of $\sigma^2$.
To test your assumptions, compute the chi-square value $\chi^2$ using the measured $s^2$ and the
assumed value of $\sigma^2$. Theorem 6.5 states that the probability of getting the value $s^2$ is the
same as the probability of getting the value $\chi^2$ in a chi-square distribution with $v = n - 1$.
Recap
[Diagram: a sample space (N outcomes if finite) from which we draw sample 1, sample 2, ..., sample j,
each consisting of n outcomes $y_1, \ldots, y_n$ that supply values $x_1, \ldots, x_n$ for the RVs;
e.g. n throws each of k dice, recording the sums.]
Think of each $x_i$ value as resulting from a RV $X_i$ such that
1. each $X_i$ has the same density $f(x)$, mean $\mu$, and variance $\sigma^2$
2. the $X_i$ are independent
(this is a random sample)
The population of outcomes in the sample space generates values for the RVs.
Each sample generates a sample mean $\bar{x}$ and a sample variance $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$.
Think of the sample means and variances as values for the RVs $\bar{X}$ and $S^2$.
What are $F(\bar{X})$, $E(\bar{X})$, $Var(\bar{X})$, $F(S^2)$, $E(S^2)$, $Var(S^2)$?
Chapter 5 states:
$E(\bar{X}) = \mu$, $Var(\bar{X}) = \sigma^2/n$ for an infinite population
$E(\bar{X}) = \mu$, $Var(\bar{X}) = \dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$ for a finite population
Chapter 6 addresses the questions on $F(\bar{X})$ and $F(S^2)$.
Law of large numbers, for a single sample (and single value of $\bar{X}$):
$$P\left(|\bar{X} - \mu| > \epsilon\right) < \frac{\sigma^2}{n\epsilon^2}$$
Central limit theorem:
$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is a RV whose distribution $F(Z) \to$ the standard normal $N(0,1)$ as $n \to \infty$
(i.e. $\bar{X}$ is a RV whose distribution $F(\bar{X}) \to N(\mu, \sigma^2/n)$ as $n \to \infty$).
π‘₯𝑖 βˆ’π‘₯ 2
If the π‘Ώπ’Š are normally distributed with mean πœ‡ and variance 𝜎 2
π‘‹βˆ’πœ‡
𝑍≑
𝜎 𝑛
is a RV whose distribution 𝐹 𝑍 = 𝑁(0,1) for all n
i.e. 𝑋 is a RV whose distribution 𝐹 𝑋 = 𝑁(πœ‡, 𝜎) for all n
If the π‘Ώπ’Š are normally distributed with mean πœ‡
π‘‹βˆ’πœ‡
𝑑≑
𝑆 𝑛
is a RV whose distribution 𝐹 𝑑 is the t-distribution with df 𝑣 = 𝑛 βˆ’ 1
If the π‘Ώπ’Š are normally distributed with variance 𝜎 2
𝑛
2
2
(𝑛
βˆ’
1)𝑆
𝑋
βˆ’
𝑋
𝑖
𝑖=1
2 ≑
=
𝜎2
𝜎2
is a RV whose distribution 𝐹 2 is the chiβˆ’square distribution with df 𝑣 = 𝑛 βˆ’ 1
Assume we have two populations. We may wish to inquire whether they have the same
variance. Assume $S_1^2$ and $S_2^2$ are the sample variances measured for each population.
Theorem 6.6 If $S_1^2$ and $S_2^2$ are the sample variances of independent random samples
of respective sizes $n_1$ and $n_2$ taken from two normal populations having the same variance,
then
$$F = \frac{S_1^2}{S_2^2}$$
is a RV having the F distribution with parameters $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$.
The F distribution: a two-parameter family of RVs, with values defined on $(0, \infty)$
density function
$$f(x; v_1, v_2) = \frac{1}{B\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)}\left(\frac{v_1}{v_2}\right)^{\frac{v_1}{2}} x^{\frac{v_1}{2}-1}\left(1 + \frac{v_1}{v_2}x\right)^{-\frac{v_1+v_2}{2}}$$
mean value: $\dfrac{v_2}{v_2 - 2}$ for $v_2 > 2$
variance: $\dfrac{2v_2^2(v_1 + v_2 - 2)}{v_1(v_2-2)^2(v_2-4)}$ for $v_2 > 4$
The F distribution is similar to the beta distribution; $B\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)$ is the beta function
$$B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$$
[Figure: F distribution densities for various values of $v_1$ and $v_2$.]
The parameter $v_1$ is referred to as the numerator degrees of freedom (df of the numerator).
The parameter $v_2$ is referred to as the denominator degrees of freedom (df of the
denominator).
As with $z_\alpha$, $t_\alpha$, etc., we define $F_\alpha$. Values of $F_\alpha$ are given in Table 6 for various values of $v_1$
and $v_2$, for $\alpha = 0.05$ (Table 6(a)) and $\alpha = 0.01$ (Table 6(b)).
Practical use of Theorem 6.6:
You have two populations whose distributions are (assumed to be) normal and whose
variances you assume to be equal. You sample population 1 with a sample of size $n_1$ and
population 2 with a sample of size $n_2$. From each sample you compute the sample variances $s_1^2$
and $s_2^2$. Theorem 6.6 will tell you the probability of getting the ratio $s_1^2/s_2^2$ given your
assumptions on normality and equality of variance.
To test your assumptions, compute the value F. Theorem 6.6 states that the probability of
getting the ratio $s_1^2/s_2^2$ is the same as the probability of getting the value F in an F
distribution with $v_1 = n_1 - 1$, $v_2 = n_2 - 1$.
e.g. Two random samples of size $n_1 = 7$ and $n_2 = 13$ are taken from the same normal
population. What is the probability that the variance of the first sample will be at least 3
times that of the second?
For $v_1 = 6$ and $v_2 = 12$, Table 6(a) shows an F value of 3.00 for $\alpha = 0.05$. Therefore there
is a 5% probability that the variance of the first sample will be at least 3 times that of the
second.
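Assuming SciPy is available, the same answer can be obtained without the table (a sketch using scipy.stats.f):

from scipy import stats

n1, n2 = 7, 13
p = stats.f.sf(3.00, n1 - 1, n2 - 1)            # P(F with (6, 12) df >= 3.00)
print(p)                                        # ≈ 0.05, matching Table 6(a)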
6.5 Representations of normal distributions
Defining new random variables in terms of others is referred to as a representation.
chi-square
Let $Z_1, Z_2, \ldots, Z_v$ be independent standard normal RVs. Define the RV
$$\chi^2_v = \sum_{i=1}^{v} Z_i^2$$
Then $\chi^2_v$ has a chi-square distribution with v df.
Thus we also see that the square of a standard normal RV is a chi-square RV (with 1 df).
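A small NumPy simulation sketch of this representation (the value of v, the replication count, and the seed are arbitrary): the sum of v squared standard normals has sample mean ≈ v and sample variance ≈ 2v, as required of a chi-square RV with v df.

import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed
v, reps = 5, 200_000                        # df and number of simulated chi-square values

z = rng.standard_normal((reps, v))          # reps x v independent standard normals
chi2 = (z**2).sum(axis=1)                   # each row: sum of v squared Z's

print(chi2.mean(), chi2.var())              # ≈ v = 5 and ≈ 2v = 10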
Let
$$\chi_1^2 = \sum_{i=1}^{v_1} Z_i^2 \quad \text{and} \quad \chi_2^2 = \sum_{i=v_1+1}^{v_1+v_2} Z_i^2$$
where the $Z_i$ are independent standard normal RVs (and thus $\chi_1^2$ and $\chi_2^2$ are independent of
each other). Then
$$\chi_1^2 + \chi_2^2$$
has a chi-square distribution with $v_1 + v_2$ df. Thus we see that the sum of two independent
chi-square RVs is also a chi-square RV with the sum of the individual df.
t distribution
Let $Z$ be a standard normal RV and $\chi^2$ be a chi-square RV with v df. Assume $Z$ and $\chi^2$ are
independent. Then
$$t \equiv \frac{Z}{\sqrt{\chi^2/v}}$$
has a t distribution with v df.
F distribution
Let $\chi_1^2$ and $\chi_2^2$ be chi-square RVs with df $v_1$ and $v_2$ respectively. Assume $\chi_1^2$ and $\chi_2^2$ are
independent. Then
$$F_{v_1, v_2} \equiv \frac{\chi_1^2/v_1}{\chi_2^2/v_2}$$
has an F distribution with $v_1$, $v_2$ df.
Thus we see that
$$t^2 \equiv \frac{Z^2/1}{\chi^2/v}$$
is a RV with an $F_{1,v}$ distribution.
e.g. Let $X_1, X_2, \ldots, X_n$ be n independent normal RVs all having mean $\mu$ and standard deviation
$\sigma$. Then
$$Z_i = \frac{X_i - \mu}{\sigma}$$
is a standard normal RV for each i. Then
$$\sqrt{n}\,\bar{Z} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is also a standard normal RV. Consider
$$\sum_{i=1}^{n}\left(Z_i - \bar{Z}\right)^2 = \sum_{i=1}^{n} Z_i^2 - 2\bar{Z}\sum_{i=1}^{n} Z_i + n\bar{Z}^2 = \sum_{i=1}^{n} Z_i^2 - n\bar{Z}^2$$
i.e.
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n}\left(Z_i - \bar{Z}\right)^2 + n\bar{Z}^2$$
Note that the LHS is chi-square with n df. The last term on the RHS, $n\bar{Z}^2 = (\sqrt{n}\,\bar{Z})^2$, is the
square of a standard normal RV and hence chi-square with 1 df. This implies that the first term
on the RHS is chi-square with $n - 1$ df. Thus we see that
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} = \sum_{i=1}^{n}\left(Z_i - \bar{Z}\right)^2$$
has a chi-square distribution with $n - 1$ df (as claimed in Theorem 6.5).
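A short NumPy sketch checking this result by simulation (μ, σ, n, the replication count, and the seed are arbitrary): the simulated values of (n−1)S²/σ² have mean ≈ n−1 and variance ≈ 2(n−1), the chi-square moments for n−1 df.

import numpy as np

rng = np.random.default_rng(1)                      # arbitrary seed
mu, sigma, n, reps = 5.0, 2.0, 8, 200_000           # hypothetical values

x = rng.normal(mu, sigma, size=(reps, n))           # reps samples, each of size n
s2 = x.var(axis=1, ddof=1)                          # sample variance S^2 with divisor n - 1
stat = (n - 1) * s2 / sigma**2                      # (n-1)S^2 / sigma^2 for each sample

print(stat.mean(), stat.var())                      # ≈ n-1 = 7 and ≈ 2(n-1) = 14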
Let $X_i$, $i = 1, \ldots, n$, be n independent normal RVs with $X_i \sim N(\mu_i, \sigma_i^2)$. Then
$$X = \sum_{i=1}^{n} X_i$$
is normal with
$$E(X) = \sum_{i=1}^{n} \mu_i, \qquad Var(X) = \sum_{i=1}^{n} \sigma_i^2$$
A sum of normal RVs is a normal RV.
Let $X_i$ be a chi-square RV with df $v_i$ for $i = 1, \ldots, n$; assume the $X_i$ are independent.
Then
$$X = \sum_{i=1}^{n} X_i$$
is a chi-square RV with df
$$v = \sum_{i=1}^{n} v_i$$
A sum of chi-square RVs is chi-square.
Let $X_i$ be a Poisson RV with parameter $\lambda_i$ for $i = 1, \ldots, n$; assume the $X_i$ are independent.
Then
$$X = \sum_{i=1}^{n} X_i$$
is a Poisson RV with parameter
$$\lambda = \sum_{i=1}^{n} \lambda_i$$
A sum of Poisson RVs is Poisson.
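A final quick sketch (the λ values and seed are arbitrary): the sum of two independent Poisson RVs has mean and variance both ≈ λ₁ + λ₂, consistent with it being Poisson with parameter λ₁ + λ₂.

import numpy as np

rng = np.random.default_rng(2)                  # arbitrary seed
lam1, lam2, reps = 1.5, 4.0, 200_000            # hypothetical Poisson parameters

x = rng.poisson(lam1, reps) + rng.poisson(lam2, reps)
print(x.mean(), x.var())                        # both ≈ lam1 + lam2 = 5.5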