Ch04
Sampling Distributions
CHAPTER CONTENTS
• 4.1 Introduction
• 4.2 Sampling Distributions Associated with Normal Populations
• 4.3 Order Statistics
• 4.4 Large Sample Approximations
• 4.5 Chapter Summary
• 4.6 Computer Examples
• Projects for Chapter 4
4.1 Introduction
Sampling distributions play a very important role in statistical analysis and decision making.
Because a random sample is a set of random variables X1, . . ., Xn, it follows that a sample
statistic that is a function of the sample is also random.
We call the probability distribution of a sample statistic its sampling distribution.
Sampling distributions provide the link between probability theory and statistical inference.
A population distribution vs. a sampling distribution
Population: Normal/Non-normal
σ is the standard deviation of the population.
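To make the distinction concrete, the following is a minimal simulation sketch (the exponential population with mean 2, the sample size n = 30, and the replication count are illustrative choices, not part of the text): it contrasts a skewed population with the much tighter sampling distribution of X̄.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population: exponential with mean 2 (its standard deviation is also 2).
population_mean, n, reps = 2.0, 30, 10_000

# Draw many random samples of size n and record each sample mean.
sample_means = np.array([rng.exponential(population_mean, n).mean() for _ in range(reps)])

# The population is skewed, but the sampling distribution of X̄ is centered at the
# population mean with spread sigma / sqrt(n).
print("population mean     :", population_mean)
print("mean of sample means:", round(sample_means.mean(), 3))
print("sd of sample means  :", round(sample_means.std(ddof=1), 3))
print("sigma / sqrt(n)     :", round(population_mean / np.sqrt(n), 3))
```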
4.1.1 FINITE POPULATION
4.2 Sampling Distributions Associated with Normal Populations
Population: Normal
⇒ X̄ (the sample mean) is normal
Now we introduce some distributions that can be derived from a
normal distribution.
4.2.1 CHI-SQUARE DISTRIBUTION
The chi-square distribution is a special case of the gamma distribution with α = n/2 and β = 2.
n (a positive integer): degrees of freedom
μ = n
σ² = 2n
Ref: 3.2.5 Gamma distribution
S²: the sample variance
(n − 1)S²/σ² ~ χ²(n − 1)
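As a quick check of this result, here is a simulation sketch (the choices n = 10, μ = 5, σ = 3, and the replication count are illustrative assumptions, not from the text) comparing the simulated values of (n − 1)S²/σ² with the χ²(n − 1) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma, reps = 10, 5.0, 3.0, 20_000   # illustrative values

# For each replicate, draw a normal sample and compute (n-1)S^2 / sigma^2.
samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)            # sample variance S^2
stat = (n - 1) * s2 / sigma**2

# Compare moments with chi-square(n-1): mean n-1, variance 2(n-1).
print("simulated mean:", round(stat.mean(), 2), " theoretical:", n - 1)
print("simulated var :", round(stat.var(), 2), " theoretical:", 2 * (n - 1))

# A Kolmogorov-Smirnov check against chi2 with n-1 degrees of freedom.
print(stats.kstest(stat, "chi2", args=(n - 1,)))
```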
4.2.2 STUDENT t-DISTRIBUTION
Let the random variables X1, ..., Xn follow a normal distribution with mean μ and variance σ².
If σ is known, then we know that
√n (X̄ − μ)/σ
is N(0, 1).
If σ is not known (as is usually the case), then it is routinely replaced by the sample standard
deviation S.
If the sample size is large, one could suppose that S ≈ σ, apply the
Central Limit Theorem, and obtain that
√n (X̄ − μ)/S is approximately
N(0, 1).
If the random sample is small, then the distribution of √n (X̄ − μ)/S
is given by the so-called Student t-distribution (or simply t-distribution).
This was originally developed by W.S. Gosset in 1908. Because his employers, the
Guinness brewery, would not permit him to publish this important work in his own name,
he used the pseudonym “Student.” Thus, the distribution is known as the Student t-distribution.
In fact, the standard normal distribution
provides a good approximation to the t-distribution for sample sizes of 30 or more.
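A simulation sketch of this point (the values of n, μ, σ, and the cutoff 2 are illustrative assumptions): for small n the t(n − 1) tail differs visibly from the N(0, 1) tail, while for n = 30 the two nearly agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, reps = 0.0, 1.0, 20_000          # illustrative values

for n in (5, 30):
    x = rng.normal(mu, sigma, size=(reps, n))
    # t statistic: sqrt(n)(X̄ - mu)/S, which follows t(n-1).
    t_stat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

    # Tail probability beyond 2: simulation vs. t(n-1) vs. N(0,1).
    print(f"n={n:2d}  simulated P(T>2): {(t_stat > 2).mean():.4f}"
          f"  t(n-1): {stats.t.sf(2, n - 1):.4f}"
          f"  N(0,1): {stats.norm.sf(2):.4f}")
```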
EXAMPLE 4.2.6
A manufacturer of fuses claims that with 20% overload, the fuses will blow in less than 10
minutes on the average.
To test this claim, a random sample of 20 of these fuses was subjected to a 20% overload,
and the times it took them to blow had the mean of 10.4 minutes and a sample standard
deviation of 1.6 minutes. It can be assumed that the data constitute a random sample
from a normal population.
Do they tend to support or refute the manufacturer’s claim?
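One way to assess this, sketched below as a one-sample t computation (the layout is ours, not necessarily the book's solution), is to compare the observed mean with the claimed 10 minutes on the t(19) scale:

```python
import math
from scipy import stats

n, xbar, s, mu0 = 20, 10.4, 1.6, 10.0       # values given in the example

t = (xbar - mu0) / (s / math.sqrt(n))       # one-sample t statistic, n - 1 = 19 df
p_upper = stats.t.sf(t, n - 1)              # P(T >= t) if the true mean were 10

print(f"t = {t:.3f}, P(T_19 >= t) = {p_upper:.3f}")
# The observed mean exceeds 10 minutes, which runs counter to the claim, but
# t is only about 1.12, so the sample evidence against the claim is not strong.
```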
EXAMPLE 4.2.7
The human gestation period—the period of time between conception and labor—is
approximately 40 weeks (280 days), measured from the first day of the mother’s last
menstrual period.
For a newborn full-term infant, the length appropriate for gestational age is assumed to be
normally distributed with μ = 50 centimeters and σ = 1.25 centimeters.
Compute the probability that a random sample of 20 infants born at full term results in a
sample mean greater than 52.5 centimeters.
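Since σ is known here, the z-transform of X̄ applies directly; a short computational sketch of the probability (the layout is ours):

```python
import math
from scipy import stats

mu, sigma, n, threshold = 50.0, 1.25, 20, 52.5   # values given in the example

se = sigma / math.sqrt(n)                        # standard error of X̄
z = (threshold - mu) / se
p = stats.norm.sf(z)                             # P(X̄ > 52.5) = P(Z > z)

print(f"z = {z:.2f}, P(X̄ > {threshold}) ≈ {p:.2e}")  # z ≈ 8.94, probability ≈ 0
```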
4.2.3 F-DISTRIBUTION
The F-distribution was developed by Fisher to study the behavior of two variances
from random samples taken from two independent normal populations.
Question of interest:
whether the population variances are equal or not,
based on the observed random samples.
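A simulation sketch of this idea (the sample sizes and standard deviations below are illustrative assumptions): when each sample variance is scaled by its own population variance, the ratio follows an F(n₁ − 1, n₂ − 1) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n1, n2, sigma1, sigma2, reps = 8, 12, 2.0, 3.0, 20_000   # illustrative values

x = rng.normal(0.0, sigma1, size=(reps, n1))
y = rng.normal(0.0, sigma2, size=(reps, n2))

# Ratio of scaled sample variances follows F(n1-1, n2-1).
f_stat = (x.var(axis=1, ddof=1) / sigma1**2) / (y.var(axis=1, ddof=1) / sigma2**2)

# Compare an upper-tail probability with the F distribution.
print("simulated P(F > 2):", round((f_stat > 2).mean(), 4))
print("F(n1-1, n2-1)     :", round(stats.f.sf(2, n1 - 1, n2 - 1), 4))
```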
4.3 Order Statistics
The extreme (i.e. largest) value distribution
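As a sketch of the key fact for the largest order statistic (the standard normal population, n = 5, and the cutoff x₀ = 1 are illustrative choices): for an i.i.d. sample with cdf F, P(X₍ₙ₎ ≤ x) = F(x)ⁿ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps, x0 = 5, 20_000, 1.0                 # illustrative values

# Largest order statistic X_(n) of an i.i.d. standard normal sample.
maxima = rng.standard_normal(size=(reps, n)).max(axis=1)

# P(X_(n) <= x0) should equal F(x0)^n, where F is the standard normal cdf.
print("simulated :", round((maxima <= x0).mean(), 4))
print("F(x0)^n   :", round(stats.norm.cdf(x0) ** n, 4))
```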
4.4 Large Sample Approximations
If the sample size is large, the normality
assumption on the underlying
population can be relaxed.
A useful generalization of Corollary 4.2.2:
By Corollary 4.2.2, if the random sample came from a normal population,
then sampling distribution of the mean is normally distributed regardless of
the size of the sample.
By Theorem 4.4.1, regardless of the form of the population distribution, the
distribution of the z-transform of the sample mean X̄ will be approximately a standard
normal random variable whenever n is large.
Even though the required sample size to apply Theorem 4.4.1 will depend on the
particular distribution of the population, for practical purposes we will consider the
sample size to be large enough if n ≥ 30.
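A simulation sketch of Theorem 4.4.1 (the exponential population and replication count are illustrative assumptions): even though the population is skewed, the z-transform of X̄ with n = 30 already has roughly standard normal tails.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 30, 20_000                          # n = 30 as the practical threshold

# A clearly non-normal population: exponential with mean 1 (so sigma = 1 as well).
mu, sigma = 1.0, 1.0
x = rng.exponential(mu, size=(reps, n))

# z-transform of the sample mean, as in Theorem 4.4.1.
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

# Compare tail probabilities with the standard normal.
for c in (1.0, 2.0):
    print(f"P(Z > {c}): simulated {(z > c).mean():.4f}  N(0,1) {stats.norm.sf(c):.4f}")
```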
EXAMPLE 4.4.1
The average SAT score for freshmen entering a particular university is 1100 with a standard
deviation of 95. What is the probability that the mean SAT score for a random sample of 50 of
these freshmen will be anywhere from 1075 to 1110?
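A computational sketch of this calculation (the layout is ours, not necessarily the book's solution):

```python
import math
from scipy import stats

mu, sigma, n = 1100.0, 95.0, 50              # values given in the example
se = sigma / math.sqrt(n)                    # standard error, about 13.44

z_lo = (1075 - mu) / se                      # about -1.86
z_hi = (1110 - mu) / se                      # about  0.74
p = stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo)

print(f"P(1075 <= X̄ <= 1110) ≈ {p:.4f}")     # roughly 0.74
```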
4.4.1 THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
Because Y = nX̄, by the Central Limit Theorem, Y has an approximate normal distribution with mean μ = np and variance
σ² = np(1 − p). Because the calculation of binomial probabilities is cumbersome for large sample sizes n, the normal
approximation to the binomial distribution is widely used. A useful rule of thumb for using the normal approximation to
the binomial distribution is that the sample size is large enough if np > 5 and n(1 − p) > 5. Otherwise, the binomial distribution
may be so asymmetric that the normal distribution does not provide a good approximation.
Other rules, such as np ≥ 10 and n(1 − p) ≥ 10, or np(1 − p) > 10, are also used in the literature. Because all of these rules are
only approximations, for consistency’s sake we will use np > 5 and n(1 − p) > 5 to test for largeness of the sample size in the
normal approximation to the binomial distribution. If the need arises, we could use the more stringent condition np(1 − p) > 10.
CORRECTION FOR CONTINUITY FOR THE NORMAL APPROXIMATION TO THE BINOMIAL
DISTRIBUTION
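A minimal sketch of the continuity correction (n = 20, p = 0.4, and k = 10 are illustrative values, not from the text): approximate the binomial P(Y ≤ k) by Φ((k + 0.5 − np)/σ) rather than Φ((k − np)/σ).

```python
import math
from scipy import stats

n, p, k = 20, 0.4, 10                         # illustrative; np = 8 > 5 and n(1-p) = 12 > 5
mu = n * p
sd = math.sqrt(n * p * (1 - p))

exact = stats.binom.cdf(k, n, p)                       # exact binomial P(Y <= k)
no_correction = stats.norm.cdf((k - mu) / sd)          # plain normal approximation
with_correction = stats.norm.cdf((k + 0.5 - mu) / sd)  # continuity-corrected

print(f"exact {exact:.4f}  no correction {no_correction:.4f}  with correction {with_correction:.4f}")
```

With these illustrative numbers the continuity-corrected value is noticeably closer to the exact binomial probability than the uncorrected one.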
EXAMPLE 4.4.2
A study of parallel interchange ramps revealed that many drivers do not use the entire length
of parallel lanes for acceleration, but seek, as soon as possible, a gap in the major stream of
traffic to merge. At one site on Interstate Highway 75, 46% of drivers used less than one third
of the lane length available before merging. Suppose we monitor the merging pattern of a
random sample of 250 drivers at this site.
(a) What is the probability that fewer than 120 of the drivers will use less than one third of
the acceleration lane length before merging?
(b) What is the probability that more than 225 of the drivers will use less than one third of
the acceleration lane length before merging?
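A computational sketch of both parts using the continuity-corrected normal approximation (the layout is ours, not necessarily the book's solution):

```python
import math
from scipy import stats

n, p = 250, 0.46                              # values given in the example
mu = n * p                                    # 115
sd = math.sqrt(n * p * (1 - p))               # about 7.88

# (a) P(Y < 120) = P(Y <= 119), with continuity correction.
p_a = stats.norm.cdf((119 + 0.5 - mu) / sd)

# (b) P(Y > 225) = P(Y >= 226), with continuity correction.
p_b = stats.norm.sf((226 - 0.5 - mu) / sd)

print(f"(a) ≈ {p_a:.4f}")                     # roughly 0.72
print(f"(b) ≈ {p_b:.2e}")                     # essentially 0
```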
4.5 Chapter Summary
4.6 Computer Examples (Optional)
Projects for Chapter 4