IV. Inferential Statistics

A. Sampling Distributions

In this section
Unbiased Estimator with a Small Amount of Error
Sampling Distribution of $\bar{x}$
Sampling Distribution of $\hat{p}$

1. Unbiased Estimator with a Small Amount of Error

To understand the idea of a sampling distribution, you have to be comfortable with some terminology that was introduced toward the beginning of the semester. The probability of error in inferential statistics comes from what we call the sampling distribution.

Inferential Statistics – making inferences about a population based on a sample
Population – the group of interest or the set of all possible measurements
Sample – a subset of the population
Parameter – a numerical characteristic of the population
Statistic – a numerical characteristic of the sample

Since the population is often not available, we use statistics to estimate parameters. To discuss this idea, we will look at the sample mean, $\bar{x}$. When dealing with measures other than the sample mean, the equations change but the overall idea does not. What is important to understand is the process we use to estimate parameters based on statistics.

In statistical application, we take a random sample from the population and compute a statistic, like $\bar{x}$.
The value of the statistic $\bar{x}$ depends on which items are selected for the sample.

Putting these two statements together, you should understand that the statistic, $\bar{x}$, is a random variable. Taking a random sample and calculating $\bar{x}$ is equivalent to randomly selecting a value for $\bar{x}$ out of all its possible values.

Sampling Distribution – the probability distribution of a statistic over all possible samples

We want the sampling distribution to be centered at the value of the parameter and to have little variation. Consider the graph below. In the graph there are three statistics that could be used to estimate the parameter. Which one is best?

[Figure: sampling distributions of three candidate statistics (statistic 1, statistic 2, statistic 3) relative to the parameter]

Statistic 3 is the best choice in the graph above.
The worst choice is statistic 1, which tends to overestimate the parameter (the center of its distribution is to the right of the parameter). This is what a statistic looks like when it is biased. If you take a random sample, the statistics used in this class will not be biased. This means your sampling distribution will be centered over the parameter of interest. Looking at statistic 2, you see there is a lot of variability. This is not good because the spread means there is a reasonable chance of getting a value far from the parameter. Notice that with statistic 3 most of the outcomes are very close to the parameter we are estimating. This means that when using the statistic to estimate the parameter you can expect a small amount of error in the estimate. Hopefully, you remember that we can decrease this error by increasing the sample size. Mathematically, increasing your sample size decreases the variability in the sampling distribution of the statistic.

2. Sampling Distribution of $\bar{x}$

The statistic $\bar{x}$ (sample mean) estimates $\mu$ (population mean). In this class we will only look at good statistics, meaning that with proper techniques the sampling distribution will be centered on the parameter with a small amount of variability. This is the case for the sampling distribution of $\bar{x}$. Let us look at a couple of facts about the sampling distribution of $\bar{x}$.

$\mu_{\bar{x}} = \mu$
The average value of $\bar{x}$ across all possible samples is $\mu$, the population mean. This should make sense because we just said the sampling distribution of $\bar{x}$ is centered about the parameter it estimates, which is $\mu$.

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The standard deviation of the sampling distribution of $\bar{x}$ is the population standard deviation divided by $\sqrt{n}$. Notice that as $n$ increases, the sample-to-sample variability in $\bar{x}$ decreases. If you increase the denominator, the overall quantity decreases.
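
The two facts about the sampling distribution of $\bar{x}$ can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 100, standard deviation 15), the normal population, the number of repeated samples, and the function name `simulate_sample_means` are all assumptions chosen for the demonstration and do not come from the notes.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a normal population
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed population values, for illustration only.
MU, SIGMA = 100.0, 15.0

results = {}
for n in (4, 16, 64):
    means = simulate_sample_means(MU, SIGMA, n, num_samples=5000)
    observed_center = statistics.fmean(means)   # should be near mu
    observed_spread = statistics.stdev(means)   # should be near sigma / sqrt(n)
    theoretical_spread = SIGMA / math.sqrt(n)
    results[n] = (observed_center, observed_spread, theoretical_spread)
    print(f"n={n:3d}  center={observed_center:.2f}  "
          f"spread={observed_spread:.3f}  sigma/sqrt(n)={theoretical_spread:.3f}")
```

In each row the center of the simulated sample means should sit close to the assumed population mean, and the spread should shrink roughly in step with $\sigma/\sqrt{n}$ as $n$ grows.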
Therefore, as has been stated previously, increasing the sample size decreases the error when using a statistic to estimate a parameter.

Standard Error – the standard deviation of the sampling distribution ($\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is the standard error of $\bar{x}$)

Remember from section III. C. that if we have a normal distribution and we know the mean and standard deviation, then we can find the corresponding probability using the transformation $Z = \frac{x - \mu}{\sigma}$. The important part here is that we take an observation, subtract the mean, and divide by the standard deviation. We can use the same format for the sampling distribution.

If our sample comes from a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ has a standard normal distribution.

Notice that the transformation above assumes the population is normally distributed. In reality, when collecting data it is rare that this would be the case. So what can be done when the population is not normally distributed? This has been thoroughly researched by statisticians, and the research has led to one of the most important theorems in statistics.

Central Limit Theorem – if we sample from a population with mean $\mu$ and standard deviation $\sigma$, then $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is approximately standard normal for large $n$

Notice that the central limit theorem works when we do not have a normal distribution. Of course, the question becomes: how large does $n$ have to be for the central limit theorem to work? The answer is that if $n$ is 30 or larger, the central limit theorem will apply in almost all cases. For class purposes, this is the general rule we will use.

Example 1
A population of soft drink cans has amounts of liquid following a normal distribution with $\mu = 12$ oz. and $\sigma = 0.2$ oz.
1. What is the probability that a single can is between 11.9 and 12.1 oz.?
2. What is the probability that $\bar{x}$ is between 11.9 and 12.1 oz. for $n = 16$ cans?

Example 2
A population of trees has heights with a mean of 110 feet and a standard deviation of 20 feet.
Suppose a sample of 100 trees is selected. Find the following.
1. $\mu_{\bar{x}}$
2. $\sigma_{\bar{x}}$
3. $P(\bar{x}$ [?] $108$ feet$)$
4. What about $P(X$ [?] $108)$?

3. Sampling Distribution of $\hat{p}$

Population Proportion: $p = \frac{\text{\# in population with characteristic}}{\text{\# in population}}$
Sample Proportion: $\hat{p} = \frac{\text{\# in sample with characteristic}}{n}$

$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

If we sample from a population with proportion $p$, then $Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$ is approximately standard normal for large $n$.

Example 3
Suppose the president’s approval rating is 56% and we look at samples of size 100.
1. Find $\mu_{\hat{p}}$
2. Find $\sigma_{\hat{p}}$

Example 4
A survey of 1024 registered voters finds that 512 plan to vote for the Republican candidate.
$p$ = proportion of all voters who plan to vote for the Republican candidate
1. Calculate $\hat{p}$
2. Calculate the margin of error (Remember this is $\frac{1}{\sqrt{n}}$ from section I. B.)
3. Calculate an estimate for the standard deviation of the sampling distribution
4. The empirical rule says that 95% of data should be within 2 standard deviations. Do you see where the margin of error comes from?
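
As a rough check on the arithmetic in Example 4, the quantities can be computed directly. This is a minimal sketch: it assumes that the estimate of the standard deviation is obtained by substituting $\hat{p}$ for $p$ in $\sqrt{p(1-p)/n}$, and the helper name `proportion_standard_error` is made up for this illustration.

```python
import math


def proportion_standard_error(p, n):
    """Standard deviation of the sampling distribution of p-hat,
    sqrt(p * (1 - p) / n), for proportion p and sample size n."""
    return math.sqrt(p * (1 - p) / n)


# Numbers from Example 4.
n = 1024
in_favor = 512

p_hat = in_favor / n                     # sample proportion
quick_margin = 1 / math.sqrt(n)          # quick margin of error, 1 / sqrt(n)

# Assumption: p-hat stands in for the unknown p when estimating the spread.
se_estimate = proportion_standard_error(p_hat, n)
two_standard_errors = 2 * se_estimate    # "within 2 standard deviations"

print("p-hat:", p_hat)
print("1/sqrt(n):", quick_margin)
print("estimated standard error:", se_estimate)
print("2 standard errors:", two_standard_errors)
```

With these numbers, two estimated standard errors and $1/\sqrt{n}$ both come out to 0.03125. This is because $\sqrt{p(1-p)}$ is at most 0.5, so $2\sqrt{p(1-p)/n}$ is at most $1/\sqrt{n}$, with equality when the proportion is 0.5 as it is here.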