CENTRAL LIMIT THEOREM
Let x̄ be the mean of a sample of size n from a population with an unknown distribution. When
n is relatively large, the sampling distribution of x̄ is approximately normally distributed. The
approximation becomes better as the sample size increases.
Sampling distribution of x̄ when sampling from a normally distributed population.
Let x̄ be the mean of a sample of size n from a normally distributed population that has mean μ and standard
deviation σ. For all sample sizes n, the sampling distribution of x̄:
1. Is exactly normally distributed
2. Is centered at μ, the mean of the population
3. Has a standard deviation of σ/√n, where σ is the standard deviation of the population.
Sampling distribution of x̄ when sampling from a general population distribution.
Let x̄ be the mean of a sample of size n from a population that has mean μ and standard deviation σ. When the
sample size, n, is sufficiently large, the sampling distribution of x̄:
1. Is approximately normally distributed
2. Is centered at μ, the mean of the population
3. Has a standard deviation of σ/√n, where σ is the standard deviation of the population
In most cases, a sample size of 30 or more is sufficient.
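To see the theorem in action, here is a minimal Python sketch (not part of the original handout; it assumes NumPy is available and uses an arbitrary exponential population with mean 2 and standard deviation 2). It draws many samples of size n = 30 and checks the mean and standard deviation of x̄ against μ and σ/√n.

import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 2.0, 2.0   # an Exponential(scale=2) population has mean 2 and sd 2
n = 30                 # sample size ("30 or more is usually sufficient")
reps = 10_000          # number of repeated samples

samples = rng.exponential(scale=2.0, size=(reps, n))
xbars = samples.mean(axis=1)             # one sample mean per repetition

print("mean of x-bar:", xbars.mean(), "  (CLT: mu =", mu, ")")
print("sd of x-bar:  ", xbars.std(ddof=1), "  (CLT: sigma/sqrt(n) =", sigma / np.sqrt(n), ")")

Even though this exponential population is strongly skewed, a histogram of the simulated x̄ values would look approximately normal at n = 30.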
CENTRAL LIMIT THEOREM APPLIED FOR THE SAMPLE PROPORTION
If the sample size, n, is sufficiently large, then the sampling distribution of p̂:
1. Is approximately normally distributed
2. Is centered at p, the true proportion of successes in the population
3. Has a standard deviation of √(p(1 − p)/n)
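As an illustration, here is a minimal Python sketch (not from the handout; it assumes NumPy and uses the arbitrary choices p = 0.3 and n = 100) that simulates the sampling distribution of p̂ and compares its standard deviation with √(p(1 − p)/n).

import numpy as np

rng = np.random.default_rng(1)

p, n, reps = 0.3, 100, 10_000
successes = rng.binomial(n, p, size=reps)   # number of successes in each simulated sample
phats = successes / n                       # one p-hat per sample

print("mean of p-hat:", phats.mean(), "  (should be near p =", p, ")")
print("sd of p-hat:  ", phats.std(ddof=1),
      "  (should be near sqrt(p(1 - p)/n) =", np.sqrt(p * (1 - p) / n), ")")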
NORMAL APPROXIMATION OF THE BINOMIAL DISTRIBUTION
When n becomes large, the binomial distribution can be reasonably approximated with a
normal distribution that has a mean of np and a standard deviation of √(np(1 − p)). For a
good approximation, n should be large enough that both np ≥ 5 and n(1 − p) ≥ 5.
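A hedged example of the approximation (not from the handout; it assumes SciPy is available and uses the arbitrary values n = 50 and p = 0.4, so np = 20 ≥ 5 and n(1 − p) = 30 ≥ 5): it compares an exact binomial probability with its normal approximation, adding a continuity correction of 0.5.

import numpy as np
from scipy.stats import binom, norm

n, p = 50, 0.4
mean = n * p                        # np
sd = np.sqrt(n * p * (1 - p))       # sqrt(np(1 - p))

# P(X <= 25) exactly and via the normal approximation with a continuity correction
exact = binom.cdf(25, n, p)
approx = norm.cdf(25 + 0.5, loc=mean, scale=sd)

print("exact binomial P(X <= 25):        ", exact)
print("normal approximation (with c.c.): ", approx)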
Estimators
The distribution of the measurements in the original population is called the underlying or parent distribution.
To describe the parent distribution we are concerned with three characteristics:
1. The general shape of the distribution (e.g., bell-shaped, symmetric, long-tailed, uniform)
2. A measure of the center of the distribution (e.g., mean, median, trimmed mean, midrange)
3. A measure of the variability of the distribution (e.g., variance, standard deviation, range, IQR)
A. The choice of parameters to summarize the center and variability depends on the shape of the parent
distribution. There are several methods available to determine the shape of the distribution:
• The stem-and-leaf plot and the histogram give the general appearance of the collected data, which, assuming we
have a representative sample, resembles the shape of the parent distribution.
• The boxplot shows skewness and symmetry and identifies outliers.
• Midsummary analysis confirms skewness or symmetry. If the middle 50% of the distribution is skewed right,
the upper fourth Q₃ will be farther from the median than the lower fourth Q₁. As a result the average of
Q₁ and Q₃ (the midquartile) will be greater than the median. Similarly, if the middle 75% of the distribution
is skewed right, then the upper eighth will be farther from the median than the lower eighth, resulting in the
mideighth being larger than the midquartile, which is larger than the median. If successive midsummaries are
progressively larger (smaller), then the distribution is skewed right (left), respectively. (See the sketch
after this list.)
• The normal quantile plot checks for normality and identifies other unusual behavior.
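The following is a rough Python sketch of the midsummary check described above (not part of the handout; it assumes NumPy, an arbitrary right-skewed exponential sample, and a hypothetical helper named midsummary).

import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(scale=1.0, size=500)   # a right-skewed sample

def midsummary(x, tail_prob):
    # Average of the lower and upper quantiles at tail_prob and 1 - tail_prob.
    lo, hi = np.quantile(x, [tail_prob, 1 - tail_prob])
    return (lo + hi) / 2

print("median:     ", np.median(data))
print("midquartile:", midsummary(data, 0.25))    # average of the fourths
print("mideighth:  ", midsummary(data, 0.125))   # average of the eighths
# For a right-skewed sample the midsummaries should be progressively larger.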
B. The next step after using the sample data to predict the shape of the population distribution is to choose the
appropriate parameters.
For a symmetric distribution all natural measures of center (mean, median, trimmed mean) coincide. Any one of the
three is a good measure of the center. To avoid confusion we identify the mean, μ, as the measure of center
associated with a symmetric distribution.
The most common parameter used to measure variability is the population standard deviation, σ. Because it is a
measure of variation about the population mean, μ, it should be used when μ is the parameter used to measure the
center of a distribution.
For a skewed distribution, in general, the median is the preferred measure of center and therefore the IQR, Q₃ − Q₁,
is the preferred measure of dispersion.
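As a small illustration (not from the handout; it assumes NumPy and an arbitrary right-skewed lognormal sample), the median and IQR can be computed directly:

import numpy as np

rng = np.random.default_rng(3)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # strongly right-skewed sample

q1, q3 = np.quantile(data, [0.25, 0.75])
print("median:", np.median(data))
print("IQR:   ", q3 - q1)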
C. The next step after choosing a parameter to describe a particular characteristic of the population distribution is
to choose a statistic that will be reasonably close to the unknown value of the parameter. The statistic that we
decide to use to estimate the parameter is called a point estimator.
A point estimator of a parameter is a statistic whose values should be close to the true value of the parameter. The
actual numerical value that the point estimator assumes from the sample is called the point estimate.
If the sampling distribution of an estimator has a mean equal to the parameter being estimated, then it is called an
unbiased estimator of the parameter. In repeated sampling, an unbiased estimator will average out to equal the
parameter in question. Using it will not result in a systematic overestimate or underestimate of the unknown
parameter, as is the case with a biased estimator.
x̄ is an unbiased estimator of μ.
p̂ is an unbiased estimator of p.
s² is an unbiased estimator of σ². (But s is not an unbiased estimator of σ.)
(If the parent distribution is continuous and symmetric, then the sample median and the trimmed means are also unbiased
estimators of μ.)
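A quick simulation sketch of these claims (not from the handout; it assumes NumPy and uses an arbitrary normal population with σ = 3 and samples of size 10): in repeated sampling s² averages out to σ², while s systematically underestimates σ.

import numpy as np

rng = np.random.default_rng(4)
sigma, n, reps = 3.0, 10, 100_000

samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # sample variance with divisor n - 1
s = np.sqrt(s2)                    # sample standard deviation

print("average s^2:", s2.mean(), "  (sigma^2 =", sigma**2, ")")   # close to 9
print("average s:  ", s.mean(), "   (sigma =", sigma, ")")        # noticeably below 3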
The possible values of an unbiased estimator are centered at the parameter being estimated. But unless the standard
deviation of the estimator is small, there is no guarantee that its value from a particular sample will be close to the
parameter being estimated. The smaller the standard deviation, the more tightly clustered will be the potential
values of the estimator about the unknown parameter. What we strive for is an unbiased estimator that has a
standard deviation smaller than the standard deviation of any other unbiased estimator.
If the parent population is normal with mean μ and standard deviation σ, then the sample mean x̄ and the sample
median are both unbiased estimators of μ, but x̄ has a smaller standard deviation. Under normality conditions of the
population distribution it can be shown that x̄ has a smaller standard deviation than any other unbiased estimator of
μ. (This is what is meant by the best estimator.)
If the parent distribution is not normal, then the sample mean may not be the best estimator. For instance, in
symmetric distributions with long tails, the median is a better estimator of μ than x̄ is.
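A rough simulation of this comparison (not from the handout; it assumes NumPy, samples of size 25, and a t distribution with 3 degrees of freedom as the long-tailed case, with both populations centered at 0):

import numpy as np

rng = np.random.default_rng(5)
n, reps = 25, 20_000

populations = {
    "normal":   rng.normal(size=(reps, n)),
    "t (3 df)": rng.standard_t(df=3, size=(reps, n)),   # long-tailed, symmetric about 0
}
for name, x in populations.items():
    print(f"{name:9s} sd of sample mean = {x.mean(axis=1).std():.3f}, "
          f"sd of sample median = {np.median(x, axis=1).std():.3f}")
# Under normality the mean has the smaller sd; under the long-tailed t the median does.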
In some applications we may not know whether we are sampling from a normal, a long-tailed, or other type of
distribution. We need a robust estimator (one that is insensitive to departures from normality). A robust estimator
works well in a variety of population distributions. One robust estimator of the center of a distribution is the
trimmed mean, x̄_T.
The choice of an estimator to estimate a particular parameter depends upon the shape of the
underlying parent distribution and the properties of the estimator. We seek an estimator that is
unbiased, has a small standard deviation, and is reasonably robust.
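For example, a 10% trimmed mean can be computed with SciPy's trim_mean (this sketch is not from the handout and uses an arbitrary symmetric, long-tailed sample centered at 10):

import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(6)
data = 10 + rng.standard_t(df=3, size=200)   # symmetric, long-tailed sample centered at 10

print("sample mean:     ", data.mean())
print("10% trimmed mean:", trim_mean(data, proportiontocut=0.10))
print("sample median:   ", np.median(data))

With long tails the trimmed mean and the median stay closer to the center of the distribution than the ordinary mean when extreme observations occur.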
Estimating the center of a distribution
1. The sample mean, x̄, is the recommended estimator of the population mean when the parent distribution is
normal, near normal, or symmetric with tails that are not excessively long.
2. A trimmed sample mean, x̄_T, is the recommended estimator of the population mean when the parent
distribution is symmetric with long tails.
3. The sample median is the recommended estimator of the population center when the parent distribution is
skewed in either direction.