Chapter 9: Sampling Distributions

9.1: Sampling Distributions

IDEA: How often would a given method of sampling give a correct answer if it were repeated many times? That is, if you took MANY repeated samples, how often would the sample reflect the true distribution of the population you are sampling from? This is the basis of statistical inference, which we will study in future chapters. The purpose of this chapter is to prepare us to answer those questions.

Parameter: a ______________ that describes a ______________________. A parameter has _______ (and only one) value; we just don't know what it is. The most important parameters for us are the __________________________ (μ) and population ________________ (p or π).

P: Population, Parameter
S: Sample, Statistic

Statistic: a ______________ that can be ___________________ from a sample without making use of any unknown _____________________. In practice we will use _____________________ to estimate unknown parameters.

Sampling Distribution: the ______________________________________ of a __________________ is the distribution of the values taken by the _____________________ in all ______________________ samples of the same size from the same population.

We can view our ____________________________________ as a __________________ _______________________. That is, we have NO WAY of predicting ________________ what value we will get from a _________________________________.

EXAMPLE: (p.570/#5) Sampling Test Scores, I

Let us illustrate the idea of a sampling distribution of x̄ in the case of a very small sample from a very small population. The population is the scores of 10 students on an exam:

Student #   Score
0           82
1           62
2           80
3           58
4           72
5           73
6           65
7           66
8           74
9           62

The parameter of interest is the mean score in this population, which is 69.4. The sample is an SRS drawn from the population. Because the students are labeled 0 to 9, a single random digit from Table B chooses 1 student for the sample.
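The setup above can be sketched in a few lines of Python. This is only an illustration: it assumes `random.sample` as a stand-in for Table B, and the seed and variable names are arbitrary choices that are not part of the original exercise.

```python
import random
from statistics import mean

# Population from the example: student labels 0-9 mapped to exam scores.
scores = {0: 82, 1: 62, 2: 80, 3: 58, 4: 72, 5: 73, 6: 65, 7: 66, 8: 74, 9: 62}

# The parameter: the mean of all ten scores in the population.
population_mean = mean(scores.values())

# random.sample draws without replacement, so every set of 4 students is
# equally likely (an SRS). The seed is arbitrary, used only for repeatability.
rng = random.Random(1)
chosen_students = rng.sample(sorted(scores), 4)
sample_scores = [scores[s] for s in chosen_students]
sample_mean = mean(sample_scores)   # the statistic x-bar for this one sample

print("population mean:", population_mean)
print("students chosen:", chosen_students)
print("sample scores:  ", sample_scores)
print("sample mean:    ", sample_mean)
```

Changing the seed gives a different sample and, usually, a different sample mean, while the population mean stays fixed.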
(a) Use Table B to draw an SRS of size n = 4 from this population. Write the 4 scores in your sample and calculate the mean x̄ of the sample scores. This __________________ is an ________________ of the ___________________________________________________.

Sample: _____ _____ _____ _____     sample mean = _____
Scores: _____ _____ _____ _____

(b) Repeat this process 10 times. That is, you will take 10 more samples of size 4 and compute each sample's mean. Make a histogram of the 10 values of x̄. You are constructing the sampling distribution of x̄. Is the center close to 69.4?

x̄1 = _____     x̄6 = _____
x̄2 = _____     x̄7 = _____
x̄3 = _____     x̄8 = _____
x̄4 = _____     x̄9 = _____
x̄5 = _____     x̄10 = _____

(c) Ten repetitions give a very crude approximation of the sampling distribution. Now pool your data with that of other students. Use your calculator's list functions to organize and sort the data, and to construct a new histogram of the sample means. Copy your histogram below. Describe the shape of the distribution. Is the center close to 69.4? Is this histogram a better approximation of the sampling distribution?

Describe the histogram you just drew.
Shape:
Center:
Spread:
Unusual Values:

Note: We have no way of knowing whether or not OUR STATISTIC is _____________ to the parameter we are trying to ______________________. We must be aware of _________ and ______________________.

Unbiased Statistic/Unbiased Estimator: A statistic used to estimate a parameter is _______________ if the __________ of its sampling distribution is ______________ to the _______________________ of the parameter being estimated. The statistic is called an ______________________________________ of the parameter.

Variability of a Statistic: the variability of a statistic is described by the _____________ of the sampling distribution. The spread is determined by the sampling _________________ and the ____________ of the sample. Larger samples give __________________ spread.
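Because this population is so small, the full sampling distribution of x̄ can be listed exactly instead of approximated with 10 repetitions. The sketch below enumerates every possible sample of size 4 (and, for comparison, size 8) from the ten scores; the function and variable names are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev

# The ten exam scores from the example (students 0 through 9).
scores = [82, 62, 80, 58, 72, 73, 65, 66, 74, 62]
population_mean = mean(scores)

def sampling_distribution(values, n):
    """Return the sample mean of every possible size-n sample (no replacement)."""
    return [mean(sample) for sample in combinations(values, n)]

means_n4 = sampling_distribution(scores, 4)   # all 210 possible samples
means_n8 = sampling_distribution(scores, 8)   # all 45 possible samples

center_n4 = mean(means_n4)      # center of the sampling distribution
spread_n4 = pstdev(means_n4)    # spread for samples of size 4
spread_n8 = pstdev(means_n8)    # spread for samples of size 8

print("population mean:         ", population_mean)
print("mean of all sample means:", round(center_n4, 4))
print("spread with n = 4:       ", round(spread_n4, 3))
print("spread with n = 8:       ", round(spread_n8, 3))
```

The mean of all the sample means matches the population mean, which is what it means for x̄ to be an unbiased estimator, and the spread for n = 8 comes out smaller than for n = 4.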
If the population is much larger than the sample (at least 10 times as large), the spread of the _____________________________ is approximately the same for any _____________ size. So, the ______________ of a sampling distribution depends ONLY on _________________ and NOT on the size of the _____________________.

This means that a survey of, say, 1,200 people will have the same VARIABILITY (or margin of error) whether the population being sampled is the city of Fullerton or the entire United States. Although not always intuitive, this concept will be shown throughout this and future chapters.

Suppose you wanted to estimate the distribution of colors of regular M&M's. You decide to take a sample of M&M's. As long as the M&M's are well mixed, the sample doesn't know whether it is coming from a single-serving bag of M&M's, a Costco-size bag of M&M's, or a large bucket of M&M's! If any one sample taken is an SRS, the variability of the result depends only on the size of the sample.

OUR GOAL: we want to have ______________________ AND ________________________________.

______ Bias, ______ Variability
______ Bias, ______ Variability
______ Bias, ______ Variability
______ Bias, ______ Variability

9.2 Sample Proportions

The objective of some statistical applications is to reach a conclusion about a population proportion, p. For example, we may try to estimate an approval rating through a survey, or test a claim about the proportion of defective light bulbs in a shipment based on a random sample. Since p is unknown to us, we must base our conclusion on a sample proportion, p̂. However, we know that the value of p̂ will vary from sample to sample. The amount of variability will depend on the size of our sample.

Our estimator is the proportion of successes:

$$\hat{p} = \frac{\text{count of "successes" in sample}}{\text{size of sample}} = \frac{X}{n}$$

Note: because the values of X and p̂ will vary in repeated samples, both X and p̂ are ______________________.
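To see why the survey of 1,200 people behaves the same for a city as for a whole country, one can compare the standard deviation of p̂ with and without the finite population correction factor $\sqrt{(N-n)/(N-1)}$. That correction is a standard result for sampling without replacement but is not part of these notes; the population sizes and the value p = 0.5 below are illustrative assumptions, not figures from the source.

```python
from math import sqrt

def sd_p_hat(p, n, N=None):
    """Standard deviation of the sample proportion for an SRS of size n.

    With N given, the finite population correction sqrt((N - n) / (N - 1))
    is applied; with N omitted, the population is treated as effectively
    infinite.
    """
    sd = sqrt(p * (1 - p) / n)
    if N is not None:
        sd *= sqrt((N - n) / (N - 1))
    return sd

p, n = 0.5, 1200                 # illustrative proportion and survey size
city_size = 100_000              # hypothetical city population
nation_size = 300_000_000        # hypothetical national population

sd_city = sd_p_hat(p, n, city_size)
sd_nation = sd_p_hat(p, n, nation_size)
sd_no_correction = sd_p_hat(p, n)

print("SD of p-hat, city:         ", round(sd_city, 5))
print("SD of p-hat, nation:       ", round(sd_nation, 5))
print("SD of p-hat, no correction:", round(sd_no_correction, 5))
```

Under these assumptions the three values differ by less than one percent of each other, so the sample size, not the population size, drives the variability.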
Something to think about…

Proportions are just another way of looking at counts. For example, I can talk about how many male students I have in this class, or I can talk about the proportion of males in the class. These are two different ways of looking at the same information. So don't be too surprised if we find that much of what we learn about ______________________ is based on what we already know about _________________.

Sampling Distribution of a Sample Proportion: Choose an SRS of size n from a large population with population proportion p having some characteristic of interest. Let p̂ be the proportion of the sample having the characteristic. Then:

The __________ of the sampling distribution of p̂ is ______________ p.

The __________________________________ of the sampling distribution of p̂ is $\sqrt{\dfrac{p(1-p)}{n}}$.

RULE OF THUMB #1: Use the recipe for the standard deviation of p̂ ONLY when the population is at least _______________ as large as the sample; that is, when _________________, where ____ is the size of the ________________ and ____ is the size of the _________________.

Note: we will use this rule throughout the rest of the year whenever our interest is drawing a sample to make inferences about a population. We are interested in sampling only when the population is large enough to make taking a census impractical.

RULE OF THUMB #2: Use the Normal approximation to the sampling distribution of p̂ for values of n and p that satisfy _____________ and ____________________.

EXAMPLE: Based on Census data, we know 11% of US adults are Black. Therefore, p = 0.11. We would expect a sample to contain roughly 11% Black representation. Suppose a sample of 1500 adults contains 138 Black individuals. Should we suspect 'undercoverage' in the sampling method?

Note: $\hat{p} = \dfrac{138}{1500} = 0.092$. Is this lower than what would be expected by chance?
That is, we know it is possible that a sample could contain 9.2% Black representation…but is it likely that would happen due to natural variation in a random sampling method?

Check assumptions:
Rule of thumb #1:
Rule of thumb #2:
Find the mean and standard deviation:
Calculate the probability:
Interpret in context:

9.3 Sample Means

When the objective of a statistical application is to reach a conclusion about a population mean, μ, we must consider a sample mean, x̄. However, as we have noted, we know that the value of x̄ will vary from sample to sample. The amount of variability will depend on n, the size of our sample.

Mean and Standard Deviation of a Sample Mean: Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ. Then:

The __________ of the sampling distribution of x̄ is:

The ________________________ of the sampling distribution of x̄ is:

EXAMPLE: ACT Scores

The scores of individual students on the American College Testing (ACT) composite college entrance examination have a Normal distribution with mean 18.6 and standard deviation 5.9.

(a) What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher?

(b) Now take an SRS of 50 students who took the test. What are the mean and standard deviation of the average (sample mean) score of the 50 students?

(c) Do your results depend on the fact that individual scores have a Normal distribution?

(d) What is the probability that the mean score x̄ of these 50 students is 21 or higher?

CENTRAL LIMIT THEOREM: Draw an SRS of size n from ________ population whatsoever with mean μ and finite standard deviation σ. When n is large, the _____________________________________ of the sample mean x̄ is close to the _______________ distribution $N(\mu, \sigma/\sqrt{n})$ with mean μ and standard deviation $\sigma/\sqrt{n}$.

Note: the CLT discusses the ______________ (and only the shape) of the sampling distribution of x̄ when n is sufficiently large.
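Both worked examples above (the sample of 1500 adults and the ACT scores) reduce to Normal tail probabilities, so they can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative, and the code prints np and n(1 - p) without asserting a cutoff, because the notes leave the threshold in Rule of Thumb #2 blank.

```python
from math import sqrt
from statistics import NormalDist

# --- Sample proportion example: p = 0.11, n = 1500, 138 observed ---
p, n, count = 0.11, 1500, 138
p_hat = count / n                          # 138/1500 = 0.092
sd_p_hat = sqrt(p * (1 - p) / n)           # standard deviation of p-hat
# Expected successes and failures, the quantities used in Rule of Thumb #2.
print("np =", n * p, " n(1 - p) =", n * (1 - p))
# Normal approximation: chance of a sample proportion of 0.092 or lower.
prob_low = NormalDist(mu=p, sigma=sd_p_hat).cdf(p_hat)
print("P(p-hat <= 0.092) ~", round(prob_low, 4))

# --- ACT example: individual scores are N(18.6, 5.9) ---
mu, sigma, n_students = 18.6, 5.9, 50
# (a) One randomly chosen student scores 21 or higher.
prob_single = 1 - NormalDist(mu, sigma).cdf(21)
# (b) Mean and standard deviation of the sample mean for n = 50.
sd_mean = sigma / sqrt(n_students)
# (d) The mean of 50 students is 21 or higher.
prob_mean = 1 - NormalDist(mu, sd_mean).cdf(21)

print("(a) P(X >= 21)     ~", round(prob_single, 4))
print("(b) mean, SD       =", mu, round(sd_mean, 4))
print("(d) P(x-bar >= 21) ~", round(prob_mean, 4))
```

Under the Normal approximation, the proportion example gives a probability of roughly 0.013, so a sample this low would be unusual under chance variation alone. In the ACT example, a single student reaches 21 about a third of the time, while the mean of 50 students does so with probability of only about 0.002.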
If n is not large, the shape of the sampling distribution more closely resembles the shape of the original population. Thus, there are three situations to consider when discussing the shape of the sampling distribution:

Shape of Population          Shape of Sampling Distribution
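The effect of sample size on shape can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original notes: it uses a strongly right-skewed Exponential(1) population, draws independent observations (in effect an infinitely large population), and measures the skewness of the simulated sample means for several values of n. The seed, the number of repetitions, and the function names are arbitrary.

```python
import random

def sample_means(rng, n, reps):
    """Means of `reps` independent samples of size n from Exponential(1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)           # arbitrary seed, for repeatability only
reps = 20_000                    # number of simulated samples per n

skew_by_n = {}
for n in (1, 5, 30):
    skew_by_n[n] = skewness(sample_means(rng, n, reps))
    print(f"n = {n:2d}: skewness of sample means ~ {skew_by_n[n]:.2f}")
```

With n = 1 the sample means simply reproduce the skewed population. As n grows the skewness shrinks toward 0, consistent with the sampling distribution moving toward a Normal shape.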