CENTRAL LIMIT THEOREM

Let x̄ be the mean of a sample of size n from a population with an unknown distribution. When n is relatively large, the sampling distribution of x̄ is approximately normally distributed. The approximation becomes better as the sample size increases.

Sampling distribution of x̄ when sampling from a normally distributed population.
Let x̄ be the mean of a sample of size n from a normally distributed population that has mean μ and standard deviation σ. For all sample sizes n, the sampling distribution of x̄:
1. Is exactly normally distributed
2. Is centered at μ, the mean of the population
3. Has a standard deviation of σ/√n, where σ is the standard deviation of the population

Sampling distribution of x̄ when sampling from a general population distribution.
Let x̄ be the mean of a sample of size n from a population that has mean μ and standard deviation σ. When the sample size, n, is sufficiently large, the sampling distribution of x̄:
1. Is approximately normally distributed
2. Is centered at μ, the mean of the population
3. Has a standard deviation of σ/√n, where σ is the standard deviation of the population
In most cases, a sample size of 30 or more is sufficient.

CENTRAL LIMIT THEOREM APPLIED FOR THE SAMPLE PROPORTION

If the sample size, n, is sufficiently large, then the sampling distribution of p̂:
1. Is approximately normally distributed
2. Is centered at p, the true proportion of successes in the population
3. Has a standard deviation of √(p(1 − p)/n)

NORMAL APPROXIMATION OF THE BINOMIAL DISTRIBUTION

When n becomes large, the binomial distribution can be reasonably approximated with a normal distribution that has a mean of np and a standard deviation of √(np(1 − p)). For a good approximation, n should be large enough that both np ≥ 5 and n(1 − p) ≥ 5.

HdCLT.docx

Estimators

The distribution of the measurements in the original population is called the underlying or parent distribution. To describe the parent distribution we are concerned with three characteristics:
1. The general shape of the distribution (e.g.,
bell shaped, symmetric, long tailed, uniform)
2. A measure of the center of the distribution (e.g., mean, median, trimmed mean, midrange)
3. A measure of the variability of the distribution (e.g., variance, standard deviation, range, IQR)

A. The choice of parameters to summarize the center and variability depends on the shape of the parent distribution. There are several methods available to determine the shape of the distribution:
• The stem and leaf plot and the histogram give the general appearance of the collected data, which, assuming we have a representative sample, resembles the shape of the parent distribution.
• The boxplot shows skewness and symmetry and identifies outliers.
• Midsummary analysis confirms skewness or symmetry. If the middle 50% of the distribution is skewed right, Q3 will be farther from the median than Q1. As a result the average of Q1 and Q3 (midQ) will be greater than the median. Similarly, if the middle 75% of the distribution is skewed right, then the upper eighth will be farther from the median than the lower eighth, resulting in the midE being larger than the midQ, which is larger than the median. If successive midsummaries are progressively larger (smaller), then the distribution is skewed right (left).
• The normal quantile plot checks for normality and identifies other unusual behavior.

B. The next step after using the sample data to predict the shape of the population distribution is to choose the appropriate parameters. For a symmetric distribution all natural measures of center (mean, median, trimmed mean) coincide. Any one of the three is a good measure of the center. To avoid confusion we identify the mean, μ, as the measure of center associated with a symmetric distribution. The most common parameter used to measure variability is the population standard deviation, σ. Because it is a measure of variation about the population mean, μ, it should be used when μ is the parameter used to measure the center of a distribution.
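
The midsummary check described under A can be sketched in a few lines of Python. This is only an illustration: the function name midsummaries and the data values are made up for the example, the average of the quartiles is called mid_q and the average of the eighths mid_e, and the quartiles and eighths come from the standard library's statistics.quantiles, whose interpolation rule may differ slightly from the rule used for hand calculation in class.

```python
import statistics

def midsummaries(data):
    """Return (median, mid_q, mid_e) for a list of numbers.

    mid_q is the average of the quartiles; mid_e is the average of the
    lower and upper eighths. Cut points use statistics.quantiles with its
    default interpolation rule (an assumption of this sketch).
    """
    cuts = statistics.quantiles(data, n=8)  # 7 cut points: 1/8, 2/8, ..., 7/8
    lower_eighth, q1, q3, upper_eighth = cuts[0], cuts[1], cuts[5], cuts[6]
    median = statistics.median(data)
    mid_q = (q1 + q3) / 2
    mid_e = (lower_eighth + upper_eighth) / 2
    return median, mid_q, mid_e

# Made-up data with a long right tail.
skewed = [1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25]
median, mid_q, mid_e = midsummaries(skewed)
print("median:", median, "mid_q:", mid_q, "mid_e:", mid_e)

if median < mid_q < mid_e:
    print("midsummaries increase: suggests right skew")
elif median > mid_q > mid_e:
    print("midsummaries decrease: suggests left skew")
else:
    print("no consistent trend in the midsummaries")
```

For roughly symmetric data the three values should be close to one another, so the last branch would be the expected outcome.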
For a skewed distribution, in general, the median is the preferred measure of center and therefore the IQR, Q3 − Q1, is the preferred measure of dispersion.

C. The next step after choosing a parameter to describe a particular characteristic of the population distribution is to choose a statistic that will be reasonably close to the unknown value of the parameter. The statistic that we decide to use to estimate the parameter is called a point estimator. A point estimator of a parameter is a statistic whose values should be close to the true value of the parameter. The actual numerical value that the point estimator assumes from the sample is called the point estimate.

If the sampling distribution of an estimator has a mean equal to the parameter being estimated, then it is called an unbiased estimator of the parameter. In repeated sampling, an unbiased estimator will average out to equal the parameter in question. Using it will not result in a systematic overestimate or underestimate of the unknown parameter, as is the case with a biased estimator.
x̄ is an unbiased estimator of μ.
p̂ is an unbiased estimator of p.
s² is an unbiased estimator of σ².
(But s is not an unbiased estimator of σ.)
(If the parent distribution is continuous and symmetric, then the sample median and the trimmed means are also unbiased estimators of μ.)

The possible values of an unbiased estimator are centered at the parameter being estimated. But unless the standard deviation of the estimator is small, there is no guarantee that its value from a particular sample will be close to the parameter being estimated. The smaller the standard deviation, the more tightly clustered the potential values of the estimator will be about the unknown parameter. What we strive for is an unbiased estimator that has a standard deviation smaller than that of any other unbiased estimator.
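
The idea of "averaging out in repeated sampling" can be checked with a small simulation. The sketch below is an illustration only: the function name simulate_estimators and the values mu = 50, sigma = 10, n = 5 are assumptions chosen for the example, and the parent distribution is taken to be normal. It draws many samples, then averages the sample mean, the sample variance (n − 1 divisor), and the sample standard deviation over all samples.

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=5, reps=10000, seed=1):
    """Average x-bar, s^2 and s over many samples from a normal parent."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, variances, sds = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        var = statistics.variance(sample)  # sample variance, divides by n - 1
        means.append(statistics.fmean(sample))
        variances.append(var)
        sds.append(var ** 0.5)
    return {
        "avg_xbar": statistics.fmean(means),    # should be near mu
        "avg_s2": statistics.fmean(variances),  # should be near sigma**2
        "avg_s": statistics.fmean(sds),         # tends to fall below sigma
        "sd_xbar": statistics.pstdev(means),    # compare with sigma / sqrt(n)
    }

results = simulate_estimators()
for name, value in results.items():
    print(name, round(value, 3))
```

With these settings the average of x̄ should land near 50 and the average of s² near 100, while the average of s should sit noticeably below 10, which is the sense in which s is biased for σ. The spread of the simulated x̄ values can also be compared with σ/√n.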
If the parent population is normal with mean μ and standard deviation σ, then the sample mean x̄ and the sample median are both unbiased estimators of μ, but x̄ has a smaller standard deviation. Under normality conditions of the population distribution it can be shown that x̄ has a smaller standard deviation than any other unbiased estimator of μ. (This is what is meant by the best estimator.)

If the parent distribution is not normal, then the sample mean may not be the best estimator. For instance, in symmetric distributions with long tails, the median is a better estimator of μ than x̄ is.

In some applications we may not know whether we are sampling from a normal, a long tailed, or other type of distribution. We need a robust estimator (one that is insensitive to departures from normality). A robust estimator works well in a variety of population distributions. One robust estimator of the center of a distribution is the trimmed mean.

The choice of an estimator to estimate a particular parameter depends upon the shape of the underlying parent distribution and the properties of the estimator. We seek an estimator that is unbiased, has a small standard deviation, and is reasonably robust.

Estimating the center of a distribution
1. The sample mean, x̄, is the recommended estimator of the population mean when the parent distribution is normal, near normal, or symmetric with tails that are not excessively long.
2. A trimmed sample mean is the recommended estimator of the population mean when the parent distribution is symmetric with long tails.
3. The sample median is the recommended estimator of the population center when the parent distribution is skewed in either direction.
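
The recommendations above can be explored with a simulation that compares the standard deviations of the three estimators. The sketch below is an illustration under stated assumptions, not part of the handout: the helper names are hypothetical, the trimmed mean drops 10% of the observations from each end, and the "long tailed" parent is an assumed mixture in which 90% of values come from N(0, 1) and 10% from N(0, 10).

```python
import random
import statistics

def trimmed_mean(sample, trim=0.1):
    """Mean after dropping the fraction `trim` of values from each end."""
    ordered = sorted(sample)
    k = int(trim * len(ordered))  # number of values dropped per end
    return statistics.fmean(ordered[k:len(ordered) - k])

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_long_tailed(rng):
    # Assumed long-tailed symmetric parent: 90% N(0, 1), 10% N(0, 10).
    scale = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, scale)

def estimator_spread(draw, n=20, reps=5000, seed=2):
    """Standard deviation of each estimator over repeated samples."""
    rng = random.Random(seed)
    values = {"mean": [], "median": [], "trimmed": []}
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        values["mean"].append(statistics.fmean(sample))
        values["median"].append(statistics.median(sample))
        values["trimmed"].append(trimmed_mean(sample))
    return {name: statistics.pstdev(v) for name, v in values.items()}

normal_sd = estimator_spread(draw_normal)
long_sd = estimator_spread(draw_long_tailed)
print("normal parent:     ", normal_sd)
print("long-tailed parent:", long_sd)
```

Under the normal parent the sample mean should show the smallest spread, while under the long-tailed mixture the median and the trimmed mean should both vary less than the mean. Both parents are centered at 0, so all three estimators target the same value and only their spread differs.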