Download Chapter 18 Sampling Distribution Models

Chapter 4: Sampling Distribution Models

Statistics that we calculate from data are functions of random variables, and so they have different distributions than the population from which we drew them. We will rely on the properties of sample means and variances that we learned earlier to derive the sampling distributions of some important statistics that we use to estimate population parameters.

Sampling Distribution for Proportions

If we are interested in estimating a population proportion, what would be a good estimator to calculate from our data? That is, if we want to estimate the parameter p of a binomial, what would be a plausible sample statistic p̂?

Example: Suppose that 70% of all Florida adults approve of Bush’s handling of the situation in Iraq. Simulate a random sample of size n = 10 from this population using the table of random digits with a “random” starting place. Compute p̂, the proportion of your sample who approve of Bush’s handling of the situation. Collecting the results of many repetitions of this simulation approximates the sampling distribution of p̂ for n = 10.

With the computer, we can simulate thousands of random samples of size 10 or any other size. Compare the sampling distributions of p̂ for sample sizes 10, 25 and 100: center, spread, and shape.

The simulations agree with the following results, which can be proved theoretically, about the sampling distribution of p̂ for simple random samples of size n from a population with proportion p:
• The mean of the sampling distribution of p̂ is p.
• The standard deviation of the sampling distribution of p̂ is √(pq/n), where q = 1 - p.
• If n is large enough, then the sampling distribution of p̂ can be approximated by a normal model with mean p and standard deviation √(pq/n).

The conditions under which the normal model can be used are the same as for the normal approximation to the binomial: np ≥ 10 and nq ≥ 10.
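The computer simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, and the choice of 5,000 repetitions are not from the notes, and each sample is modeled as n independent Bernoulli trials with p = 0.7.

```python
import math
import random


def simulate_p_hat(n, p, reps, rng):
    """Return `reps` simulated sample proportions, each from a sample of size n."""
    p_hats = []
    for _ in range(reps):
        # Each draw is a "success" (approves) with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Standard deviation of the simulated values (divides by the count)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


rng = random.Random(18)  # arbitrary seed, chosen only for reproducibility
p = 0.7                  # assumed true proportion from the example
reps = 5000              # number of simulated samples (an assumption)

results = {}
for n in (10, 25, 100):
    p_hats = simulate_p_hat(n, p, reps, rng)
    theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)
    results[n] = (mean(p_hats), std_dev(p_hats), theory_sd)
    print(f"n={n:3d}  mean of p-hat={results[n][0]:.3f}  "
          f"simulated SD={results[n][1]:.3f}  sqrt(pq/n)={theory_sd:.3f}")
```

For each n, the simulated mean should sit close to p and the simulated standard deviation close to √(pq/n). Note that with p = 0.7 the condition nq ≥ 10 fails for n = 10 and n = 25 (nq = 3 and 7.5), so of these three sample sizes only n = 100 meets the condition for the normal model.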
The first two results follow from results in the previous two chapters and hold for any sample size n. They’re based on the fact that if the population size is large relative to the sample size (at least 10 times bigger or so), then selecting a random sample of size n can be modeled as n independent Bernoulli trials with probability of success p on each trial. Therefore, if we let X be the number of “successes” in the sample, then X has (approximately) a Binom(n, p) distribution.
• What are E(X) and Var(X)?
• Note that p̂ = X/n. Therefore, by the results in Chapter 4, what are E(p̂) and Var(p̂)?

By the previous result, how can the sampling distribution of p̂ be modeled in the example above (Bush’s handling of Iraq) if the true proportion who approve is 70% and the sample size is 100? (Be sure to check that the 10% condition and the success/failure condition are satisfied.) Use this model to approximate the probability that you will get p̂ greater than 0.5 in a sample of size 100. What does the 68-95-99.7 rule say for this sampling distribution?

Notes
• It’s only possible to simulate the sampling distribution of a sample proportion (or other statistic) for simple random samples or other probability samples. If the sample is not a probability sample (for example, it’s a sample of convenience), then it’s impossible to know how differently it would have come out if the sampling method were repeated.
• In practice, we don’t know p; that’s why we’re taking the sample. The results above depend on p. So how can these results help us in determining the accuracy of a sample proportion as an estimate of a population proportion?

Sampling Distribution of a Sample Mean

What if we’re dealing with a quantitative variable and we want to estimate the population mean, µ? We estimate the population mean by the sample mean ȳ. Again, an estimate without an indication of accuracy is not very useful.
So we examine the sampling distribution of ȳ to see how it varies from sample to sample around the true mean. When we say “sample”, we mean “simple random sample.”

Example: Rolling a die n times is like taking an SRS from a large population with equal numbers of 1’s, 2’s, 3’s, 4’s, 5’s and 6’s. The mean of this “population” is 3.5 and the standard deviation is about 1.71 (the variance is about 2.92). Compare the sampling distributions of ȳ for 1 to 5 dice, using 1 to 10,000 rolls: center, spread, and shape. What will happen with 25 dice? With 100 dice? See a nice applet at the following URL: http://www.stat.sc.edu/~west/javahtml/CLT.html

Central Limit Theorem (the Fundamental Theorem of Statistics): the sampling distribution of the mean from simple random samples becomes more and more normal as the sample size n grows. This is true for any population and any quantitative variable (strictly, any population with a finite standard deviation).

Which normal?
• The mean of the sampling distribution of ȳ is the population mean µ.
• The standard deviation of the sampling distribution of ȳ is σ/√n.
• Notation: µ(ȳ) = µ and σ(ȳ) = SD(ȳ) = σ/√n.

The results about the mean and standard deviation of the sampling distribution of ȳ can be proved by the properties of expected value and variance of random variables from Chapter 4. They are true for any sample size.

Putting the results above together:
• As the sample size n grows, the sampling distribution of the sample mean tends toward a N(µ, σ/√n) distribution.

Note: proportions are a special case of means, since the proportion of successes can be thought of as the mean when successes are given the value “1” and failures the value “0”.

How big does n need to be for the normal model to be a good approximation? It depends on the population distribution. If the population distribution is exactly normal, then the sampling distribution of ȳ is exactly normal for any sample size.
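The die example and the results about the mean and standard deviation of the sample mean can be checked with a small simulation. The sketch below is illustrative only: the function name, the seed, and the 2,000 repetitions per sample size are assumptions, not part of the notes. The code computes the population standard deviation directly from the six faces, which gives √(35/12) ≈ 1.71; the value 35/12 ≈ 2.92 is the variance.

```python
import math
import random

FACES = (1, 2, 3, 4, 5, 6)

# Population mean and standard deviation of one fair die.
pop_mean = sum(FACES) / len(FACES)
pop_sd = math.sqrt(sum((f - pop_mean) ** 2 for f in FACES) / len(FACES))


def simulate_dice_means(n_dice, reps, rng):
    """Return `reps` sample means, each the average of `n_dice` fair dice."""
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
            for _ in range(reps)]


rng = random.Random(2024)  # arbitrary seed for reproducibility
reps = 2000                # simulated samples per sample size (an assumption)

summary = {}
for n_dice in (1, 2, 5, 25, 100):
    means = simulate_dice_means(n_dice, reps, rng)
    center = sum(means) / reps
    spread = math.sqrt(sum((m - center) ** 2 for m in means) / reps)
    theory = pop_sd / math.sqrt(n_dice)  # sigma / sqrt(n)
    summary[n_dice] = (center, spread, theory)
    print(f"{n_dice:3d} dice: mean={center:.3f}  SD={spread:.3f}  "
          f"sigma/sqrt(n)={theory:.3f}")
```

The center should stay near 3.5 for every sample size, while the spread shrinks roughly like σ/√n. Plotting a histogram of `means` for each sample size (not done here) would show the shape moving from flat toward a bell curve.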
If the population distribution is extremely skewed, then the sampling distribution of ȳ might not be well-approximated by the normal model until the sample size reaches 20 or 30 or more. If there are outliers, an even bigger sample size may be needed.

Example: Assume that the distribution of durations of human pregnancies follows a normal model with mean 266 days and standard deviation 16 days.
a) What percentage of pregnancies last between 260 and 270 days?
b) If an obstetrician has 50 patients, what’s the probability that the mean duration of their pregnancies will be between 260 and 270 days? What assumption must you make to make this calculation valid?
c) Suppose the distribution of durations is not exactly normal but is skewed to the left. Is your calculation in a) still valid? Is your calculation in b) still valid?
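One way to set up the arithmetic for parts a) and b) is sketched below in Python. The helper names are illustrative, and the calculation for b) assumes the 50 patients can be treated as a simple random sample of pregnancies, so that the sample mean has standard deviation σ/√n. The normal probabilities are computed from the error function in the standard library rather than from a table.

```python
import math


def normal_cdf(x, mean, sd):
    """P(Y <= x) for a normal model with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_between(lo, hi, mean, sd):
    """P(lo < Y < hi) under a normal model."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


MU = 266.0       # mean duration in days, from the example
SIGMA = 16.0     # standard deviation in days, from the example
N_PATIENTS = 50  # sample size in part b)

# a) One pregnancy: use the population model N(266, 16).
p_single = prob_between(260, 270, MU, SIGMA)

# b) Mean of 50 pregnancies: same mean, SD shrinks to sigma / sqrt(n).
sd_mean = SIGMA / math.sqrt(N_PATIENTS)
p_mean = prob_between(260, 270, MU, sd_mean)

print(f"a) P(260 < Y < 270)     = {p_single:.4f}")
print(f"b) SD of the mean       = {sd_mean:.3f} days")
print(f"b) P(260 < y-bar < 270) = {p_mean:.4f}")
```

The two probabilities differ sharply, roughly 24% for a single pregnancy against roughly 96% for the mean of 50, because the sampling distribution of the mean is much narrower than the population distribution.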
