Chapter 7 – Sampling Distributions & Central Limit Theorem

DAY  DATE  CLASSWORK  HOMEWORK
1  Section 7.1 - Population vs. Sample & Parameter vs. Statistic + pg. 428 (#1 - 8); Video #1: Sampling Distribution + SOCS; Video #2: Biased vs. Unbiased Estimator + High/Low Bias/Variability; Video #3: Sampling Distributions for Sample Proportions
2  Sampling Distribution Activity
3  Quiz 7.1 Worksheets (p. 8 – 9)
4  pg. 439-441 (#30, 34, 36, 38); Nothing!!!
5  Quiz 7.2 Worksheets (p. 11 – 14)
6  pg. 454-455 (#49, 51, 54, 56); Video #4: Sampling Distributions for Sample Means (non-CLT); Video #5: Sampling Distributions for Sample Means (CLT)
7  pg. 455-456 (#59, 61, 63); Finish problems from class if not completed
8  Quiz 7.3 Worksheets (p. 19 – 22); 5 problems on p. 23 – 24 in note packet
9  pg. 431 (#21 – 24) + pg. 441 (#43 – 46) + pg. 456 (#65 – 68); FRAPPY Day tomorrow!!!
10  FRAPPY DAY; Try out the Practice Worksheets (p. 25 – 27)
11  Chapter Review; Study for Test!!!
12  Chapter 7 Test; Watch & take notes over Video #1 of Chapter 8!!!

Chapter 7 kicks off second semester with a recurring theme that we will experience for the rest of the school year. The ideas contained in this chapter will be re-used directly and indirectly, so it is VERY IMPORTANT that you try your best to follow along and ask questions to ensure your complete understanding of sampling distributions.

Chapter 7 Topics  Page Number
7.1 – Sampling Distributions [VIDEO #1] ..... 4
7.1 – Estimators + Bias & Variability [VIDEO #2] ..... 6
7.2 – Sampling Distributions for Sample Proportions [VIDEO #3] ..... 10
7.3 – Sampling Distributions for Sample Means (w/o CLT) [VIDEO #4] ..... 15
7.3 – Sampling Distributions for Sample Means (w/ CLT) [VIDEO #5] ..... 16

7.1 – Population vs. Sample & Parameter vs. Statistic – [IN-CLASS]

Ex #1: You want to know the mean income of the subscribers to a particular magazine. You draw a random sample of 100 subscribers and determine that their mean income is $27,500.
What is the population? ______________________________
What is the population parameter of interest? ______________________________
What is the sample? ______________________________
What is the sample statistic? ______________________________

Ex #2: You want to know how many students at CHS consume alcohol. You survey a random sample of 200 CHS students and conclude that 65% do not consume alcohol.
What is the population? ______________________________
What is the population parameter of interest? ______________________________
What is the sample? ______________________________
What is the sample statistic? ______________________________

7.1 – Sampling Distribution Introduction [VIDEO #1]

Inference: ______________________________
When you can’t access the whole population, you should take a _______________ ( _____ ) from the population.
Since we don’t know much of anything about our population (its SOCS), then we need a distribution that we can rely on…introducing the…_____________________ _____________________________.
Sampling Distribution: ____________________________________________________________
We do not need “many, many” samples thankfully. The MAIN thing we need is just _______ , _______ sample.
This sampling distribution gives valuable information ( _________ ) that we can use to determine whether a claim about the population is plausible or not plausible…
There may be only _______ population distribution, but there are ___________ different sampling distributions!!!
Let’s explore sampling distributions and some awesome properties!!!

“S”hape: ________________________________
“C”enter: ________________________________
“S”pread: ________________________________
******************************
“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________
(two more sampling distributions on the next page!)

“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________
******************************
“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________

Summarize what you see with the population distribution vs. the various sampling distributions.
Shape: A sampling distribution will act more and more like an __________________ _______________ distribution as the sample size _____________, even when the population itself is not _________________ distributed.
Center: Sampling distributions have essentially the _____________ mean as the population from which they are drawn.
Spread: As you increase the sample size, the sampling distribution tends to be much less _____________________ or wild than the _______________________ from which it was drawn.
Outliers: Since large sample sizes tend to average out ____________ observations, sampling distributions typically do not have any outliers.
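The Shape / Center / Spread summary above can be checked with a small simulation. The sketch below is not part of the original packet. It assumes a made-up, right-skewed population (exponential with mean near 50) and a hypothetical helper named simulate_sample_means. It only approximates each sampling distribution with 2,000 samples, so treat the printed numbers as rough.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, rng):
    """Draw `reps` random samples of size n and return each sample's mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (NOT data from the packet).
population = [rng.expovariate(1 / 50) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

results = {}
for n in (2, 10, 40):
    means = simulate_sample_means(population, n, 2_000, rng)
    # Center = mean of the sample means, Spread = their standard deviation.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:>2}  center={results[n][0]:6.2f}  spread={results[n][1]:6.2f}")

print(f"population: mean={pop_mean:.2f}  sd={pop_sd:.2f}")
```

In a run like this the center should stay near the population mean for every n, while the spread should shrink as n grows. A histogram of the means for each n would show the shape losing its skew.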
7.1 – Bias and Variability [VIDEO #2]

Can ANY sample statistic be used to estimate its population parameter? ______________________________
Some statistics produce too much ___________ or error to closely estimate the parameter of interest. This type of statistic is called a __________________ _____________________________.
But if a statistic ______________ estimates a parameter with very little bias or error, then that statistic is called an _____________________ ________________________________.
A statistic is ___________________ if the center of its sampling distribution is approximately the same as the population parameter.

The center of this population distribution: THE mean: _____________________
The center of this sampling distribution: the mean of all the x̄’s: _____________________
Is the mean an unbiased estimator??? Conclusion…________

The center of this population distribution: THE range: _____________________
The center of this sampling distribution: the mean of all the sample ranges: ____________________________
Is the range an unbiased estimator??? Conclusion…________

There are four situations regarding BIAS and VARIABILITY! Match each histogram (1 – 4) with one of the descriptions below.
[Histograms 1 – 4 not reproduced]
High Bias, Low Variability
Low Bias, High Variability
High Bias, High Variability
Low Bias, Low Variability
Just because you are precise (low variability) does NOT mean you are accurate (low bias), too!
Answer the last problems of the video here…

Quiz 7.1 Worksheet #1

1. For each description below, identify each underlined number as a parameter or statistic. THEN, use the appropriate symbol to describe each number, like p̂ = 96% or x̄ = 2.4 oz.
(a) A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked voters which candidate for the state’s attorney general they would vote for. 37% of the respondents said they would vote for the Democratic candidate. On election day, 41% actually voted for the Democratic candidate.
37% is a _______________________.  ________ = 37%
41% is a _______________________.  ________ = 41%
(b) The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure for these executives is 126.07.
128 is a ______________________.  __________ = 128
15 is a _______________________.  __________ = 15
126.07 is a ____________________.  _________ = 126.07

2. Suppose that in a certain community, 40% of the residents would answer “Yes” to the question “Do you know the names of at least five other people who live on your block?” Suppose you plan to take a random sample of 100 people from this community and calculate the proportion of people in your sample whose response to this question is “Yes”.
(a) The proportion of residents in your sample of 100 people who would say “Yes” is the statistic. Describe the parameter of interest in this situation.
(b) The statistic in this case is an unbiased estimator of the parameter. What does that mean?
(c) Suppose that in a much larger community, 40% of the residents would also answer “Yes”. If you took a sample of 100 people from this much larger community, would the sampling distribution of the statistic be different? In what way?
(d) If you took a sample of 50 people instead of 100 from the original community, would the sampling distribution of the statistic change? In what way?

3. The Fathom screen shot below shows the results of taking 500 SRSs of 10 temperature readings from a population distribution that’s N(50, 3) and recording the sample minimum each time.
(a) Is the dotplot to the right the true sampling distribution of sample minimums? Explain.
(b) Describe the approximate sampling distribution.
(c) Suppose that the minimum of an actual sample is 40°F.
What would you conclude about the thermostat manufacturer’s claim? Explain.

4. During World War II, 12,000 able-bodied male undergraduates at the University of Illinois participated in required physical training. Each student ran a timed mile. Their times followed the Normal distribution with mean 7.11 minutes and standard deviation 0.74 minute. An SRS of 100 of these students has mean time x̄ = 7.15 minutes. A second SRS of size 100 has mean x̄ = 6.97 minutes. After many SRSs, the values of the sample mean x̄ follow the Normal distribution with mean 7.11 minutes and standard deviation 0.074 minute.
(a) Describe the population distribution X of all 12,000 able-bodied male undergraduates at U of I.
(b) Describe the sampling distribution of x̄. How is it different from the population distribution?

7.2 – Sampling Distributions of Sample Proportions [VIDEO #3]

Characteristics of the Sampling Distribution of Sample Proportions
1. “S”hape – The sampling distribution for sample proportions ( p̂ ) will be ______________ _____________ if the following condition is met: ________________________ and ____________________________. The larger the sample size, n, the closer the shape comes to being approx. normal.
2. “C”enter – The mean of all possible sample proportions ( ____ ) is equal to the population proportion, ___.  ______________ = ______
3. “S”pread – The standard deviation of all possible sample proportions ( _______ ) is ______________________________ IF the following condition is met!!! Is the population at least _______________ as large as the sample??? This is referred to as the “independent condition” or the “10% condition”.

Ex: One way of calculating the effect of undercoverage, nonresponse, and other sources of error in a sample survey is to compare the sample with known facts about the population. About 11% of Americans are teens. The proportion p̂ of teens in an SRS of 1500 Americans should therefore be close to 11%.
It is unlikely to be exactly 11% because of sampling variability. If a national sample contains only 9.2% teens, should we suspect that the sampling procedure is somehow underrepresenting this group? Find the probability that a sample contains no more than 9.2% teens when the population actually consists of 11% teenagers.
1. Calculate the mean and standard deviation of the proportion p̂ of the sample that are teens.
2. Calculate the probability that a sample contains no more than 9.2% teens when the population actually consists of 11% teenagers.
3. Interpret your results.

Quiz 7.2 Worksheet #1

Quiz 7.2 Worksheet #2

7.3 – Sampling Distributions of Sample Means [VIDEO #4]

Characteristics of the Sampling Distribution of Sample Means
1. “S”hape – The sampling distribution for sample means ( ____ ) will be ______________ _____________ for ANY sample size, n, IF the population distribution is also _________________ _______________. What if the population is NOT approximately normal?!?!? We will discuss that soon.
2. “C”enter – The mean of all possible sample means ( ____ ) is equal to the population mean, ____.  ______________ = ______
3. “S”pread – The standard deviation of all possible sample means ( _______ ) uses the formula ______________________________ IF the following condition is met!!! Is the population at least _______________ as large as the sample??? This is referred to as the “independent condition” or the “10% condition”.

Ex: A bottling company uses a filling machine to fill plastic bottles with soda. The bottles are supposed to contain 300 mL. In fact, the contents vary according to a normal distribution with mean, μ = 298 mL, and standard deviation, σ = 3 mL.
a) What is the probability that the mean contents of six randomly selected bottles is less than 295 mL?
b) Would the probability that the mean contents of ten randomly selected bottles is less than 295 mL be less than or greater than your answer to part a)?
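One possible way to check your hand calculations for the teen-proportion example and the bottling example is sketched below. It is not part of the original packet. The helper names proportion_sampling_dist and mean_sampling_dist are made up for illustration. The sketch assumes the conditions on the notes pages hold (np ≥ 10 and n(1 − p) ≥ 10 for proportions, a Normal population for means, and the 10% condition for both), and it uses the standard formulas σ_p̂ = √(p(1 − p)/n) and σ_x̄ = σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def proportion_sampling_dist(p, n):
    """Normal approximation for p-hat: mean p, sd sqrt(p(1-p)/n).
    Assumes np >= 10, n(1-p) >= 10, and the 10% condition."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

def mean_sampling_dist(mu, sigma, n):
    """Sampling distribution of x-bar: mean mu, sd sigma/sqrt(n).
    Assumes a Normal population (or a large n) and the 10% condition."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Teens example: p = 0.11, SRS of n = 1500, observed p-hat = 0.092.
teens = proportion_sampling_dist(p=0.11, n=1500)
p_teens = teens.cdf(0.092)  # P(p-hat <= 0.092)

# Bottling example: mu = 298 mL, sigma = 3 mL, sample mean below 295 mL.
six = mean_sampling_dist(298, 3, 6).cdf(295)   # part a), n = 6
ten = mean_sampling_dist(298, 3, 10).cdf(295)  # part b), n = 10

print(f"teens: mean={teens.mean}, sd={teens.stdev:.5f}, P={p_teens:.4f}")
print(f"bottles: n=6 -> {six:.4f}, n=10 -> {ten:.5f}")
```

NormalDist(...).cdf(x) plays the same role as normalcdf(−∞, x, mean, sd) on the calculator.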
7.3 – The Central Limit Theorem (CLT) [VIDEO #5]

From Video #4…
Situation #1: When the Population Distribution is (Approximately) Normally Distributed
…then we can assume the sampling distribution of the sample means is also approx. normal.

Now for Video #5…
Situation #2: When the Population is Not Normally Distributed or not known altogether
…then the sampling distribution CAN BE approximately normal IF we have a large ______________ _____________ ... all thanks to the __________________ _______________ _________________ !!!
How large a sample size n is needed for the sampling distribution of sample means to be close to approximately normal depends on the shape of the _____________________ _____________________. More observations are required if the shape of the population distribution is far from ________________, but we can safely call the sampling distribution approx. normal when we reach a sample size of _____’ish.
WARNING!!! The CLT is used with sampling distributions for sample ______________ ONLY!!!!

Ex: The number of lightning strikes on a square kilometer of open ground in a year has a mean of 6 and standard deviation of 2.4. The National Lightning Detection Network (NLDN) uses automatic sensors to watch for lightning in a random sample of 10 one-square-kilometer plots of land.
a) What are the mean and standard deviation of the sampling distribution of the sample mean number of strikes per square kilometer?
b) Explain why you cannot safely calculate the probability that the mean number of lightning strikes per square kilometer ( x̄ ) is less than 5 based on a sample size of 10.
c) Suppose the NLDN takes a random sample of 50 square kilometers instead. Calculate the probability that the mean number of lightning strikes per square kilometer ( x̄ ) is less than 5.

The Central Limit Theorem (CLT) Summarized!

If the distribution of the Population is Normal… (Example: Heights of Women)
Population Distribution (n = 1)
  Shape: Normal
  Center: μ = 64.5 in.
  Spread: σ = 2.5 in.
  Picture: [Height of Women (Selected one at a time), axis from 57 to 72]
Sampling Distribution (n ≥ 1)
  Shape: Also Normal!
  Center: Also 64.5! μ_x̄ = 64.5 in.
  Spread: Much Less! σ_x̄ = 2.5/√n. If n = 100, σ_x̄ = 0.25.
  Picture: [Average Height of 100 Women, axis from 57 to 72]
Conclusions
  Shape: Stays Normal!!!
  Center: Same!!!
  Spread: Smaller!!!

OR… If the distribution of the Population is NOT Normal… (Ex: Rolling a Single Die)
Population Distribution (n = 1)
  Shape: Uniform
  Center: μ = 3.5
  Spread: σ = 1.71
  Picture: [Outcome of a Single Die Roll, axis 1 to 6]
Sampling Distribution (n ≥ 1)
  Shape: Becomes Normal!
  Center: Also 3.5! μ_x̄ = 3.5
  Spread: Much Less! σ_x̄ = 1.71/√n
  Picture: [Average of 10 Die Rolls]
Conclusions
  Shape: Becoming Normal!!!
  Center: Same!!!
  Spread: Smaller!!!

CLT & SOCS

The central limit theorem tells us that a sampling distribution always has significantly less wildness or variability, as measured by standard deviation, than the population it’s drawn from. Additionally, the sampling distribution will look more and more like a normal distribution as the sample size is increased, even when the population itself is not normally distributed! Thanks to the central limit theorem, we can be sure that a mean or x-bar based on a reasonably large randomly chosen sample will be remarkably close to the true mean of the population. If we need more certainty we need only increase the sample size.

As the Sample Size “n” Increases:
Shape: becomes more and more approx. normal’ish
Center: stays the same as the population!
Spread: becomes less and less variable or spread out!

Quiz 7.3 Worksheet #1

Quiz 7.3 Worksheet #2

Chapter 7: Sampling Distribution Practice Problems

1. Suppose that 35% of all business executives are willing to switch companies if offered a higher salary. If a headhunter randomly contacts an SRS of 100 executives, what is the probability that over 40% will be willing to switch companies if offered a higher salary?
2.
The average outstanding bill for delinquent customer accounts for a national department store chain is $187.50 with a standard deviation of $54.50. If a delinquent account were randomly chosen, what is the probability that it has an outstanding bill of over $200?
3. The average outstanding bill for delinquent customer accounts for a national department store chain is $187.50 with a standard deviation of $54.50. In an SRS of 50 delinquent accounts, what is the probability that the mean outstanding bill is over $200?
4. The average number of daily emergency room admissions at a hospital is 85 with a standard deviation of 37. In an SRS of 30 days, what is the probability that the mean number of daily emergency admissions is between 75 and 95?
5. Given that 58% of all gold dealers believe next year will be a good one to speculate in South African gold coins, in an SRS of 150 dealers, what is the probability that between 55% and 60% believe that it will be a good year to speculate?

Practice Worksheet 7.1

Practice Worksheet 7.2

I flip a fair coin ten times and record the proportion of heads I obtain. I then repeat this process of flipping the coin ten times and recording the proportion of heads obtained many, many times. When done, I make a histogram of my results.
1. About where will the center of my histogram be? Use appropriate notation to describe this fact.
2. What is the standard deviation of the sampling distribution of the proportion p̂ of heads obtained?
3. Describe the shape of the sampling distribution of p̂. Justify your answer.

The Harvard College Alcohol Study finds that 67% of college students support efforts to “crack down on underage drinking.” The study took a sample of almost 15,000 students, so the population proportion who support a crackdown is very close to p = 0.67. The administration of a large college surveys an SRS of 100 students and finds that 62 support a crackdown on underage drinking.
4.
What is the sample proportion who support a crackdown on underage drinking?
5. If in fact the proportion of all students on your campus who support a crackdown is the same as the national 67%, what is the probability that the proportion in an SRS of 100 students is as small or smaller than the result of the administration’s sample? Be sure to check that any necessary rules of thumb are met.

Practice Worksheet 7.3

The weights of newborn children in the United States vary according to the normal distribution with mean 7.5 pounds and standard deviation 1.25 pounds. The government classifies a newborn as having low birth weight if the weight is less than 5.5 pounds.
1. What is the probability that a baby chosen at random weighs less than 5.5 pounds at birth?
You choose three babies at random and compute their mean weight, x̄.
2. What are the mean and standard deviation of the mean weight x̄ of the three babies?
3. What is the probability that their average birth weight is less than 5.5 pounds?
4. Would your answers to 1, 2, or 3 be affected if the distribution of birth weights in the population were distinctly nonnormal?

Practice Worksheet 7.1 Answers:
1.
2. (a) No, it’s just an approximation of a sampling distribution generated by simulating 200 sample means. The actual sampling distribution includes the means from ALL POSSIBLE SAMPLES of size 12 from the population – many more than the 200 values here.
(b) Only 8 out of 200 (or 4%) of the sample means in our simulation are as far or farther above 150 pounds as our sample was. If the population mean is really 150 pounds, then our sample is unusual, and we should be suspicious about the manufacturer’s claim.

Practice Worksheet 7.2 Answers:
1. The center of the sampling distribution will be μ_p̂ = 0.5
2. The standard deviation of the sampling distribution will be σ_p̂ = √(p(1 − p)/n) = √(.5(.5)/10) = 0.1581
3. The shape will be symmetric because p = 0.5, but because n is small it may not be normally distributed.
4. p̂ = 0.62
5.
The population of all students at that college is most likely greater than 10 times the sample size (10 x 100 = 1000), so we can calculate the standard deviation. np ≥ 10 because (100)(0.67) = 67 and n(1 − p) ≥ 10 because (100)(0.33) = 33, so we can use the normal approximation.
P( p̂ ≤ 0.62) = Normalcdf(−∞, 0.62, 0.67, 0.047) = 0.1438
Note: σ_p̂ = √((.67)(.33)/100) ≈ 0.047

Practice Worksheet 7.3 Answers:
1. P(x < 5.5) = Normalcdf(−∞, 5.5, 7.5, 1.25) = 0.0548
2. μ_x̄ = 7.5 and σ_x̄ = σ/√n = 1.25/√3 = 0.7217
3. P( x̄ < 5.5) = Normalcdf(−∞, 5.5, 7.5, 0.7217) = 0.0028
4. The answers to numbers 1 and 3 would be affected if the population were distinctly non-normal because the CLT only assures normality for large sample sizes. Since we are doing “normal” calculations, we need a larger sample size to assure the sampling distribution becomes normal.

Preparing for Your Chapter 7 Test

• How to identify a parameter and a statistic from the context of the situation.
• How to find the mean of a sampling distribution (as long as you have an SRS, the mean of the sampling distribution should equal that of the population) - Know the proper notation.
• How to calculate the standard deviation of a sample mean and sample proportion (know the formulas and the proper notation)
• The exact definition of the important terms in the chapter such as: Sampling Distribution of a Statistic, Unbiased Estimator, Variability of a Statistic, etc.
• That the size of the sample is what impacts the spread (sampling variability) of the distribution. The population size does NOT affect spread (as long as the population is at least 10x the sample size).
• How to use and apply the Rule of Thumb #1 to sampling distributions
• How to use and apply the Rule of Thumb #2 to sampling distributions
• How to calculate probabilities based on the normal approximations using either Table A or the calculator commands (normalcdf) – Look over HW problems. Use proper notation.
• How to describe a sampling distribution. Address the following: shape, center, and spread.
For example, “The distribution is normal with a mean of ____ and a standard deviation of ____”.
• The law of large numbers assures us that as the number of observations drawn increases, the mean ( x̄ ) of the observed values eventually approaches the mean of the population as closely as you specify and stays that close.
• The significance and use of the central limit theorem.
  o If the population is normally distributed, then the sampling distribution will also be normal regardless of the sample size.
  o If the population is NOT normally distributed, then the sampling distribution becomes more and more normal as the sample size increases. The larger the sample size, the more nearly normal we can assume the sampling distribution of x̄ to be.
• How to identify high/low bias and variability.
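As a review aid, the law of large numbers and the spread formula σ_x̄ = σ/√n from the list above can be illustrated with the packet’s single-die example. The sketch below is not from the original packet. The seed, the number of rolls, and the 5,000 repetitions are arbitrary choices, and the simulation only approximates the true sampling distribution.

```python
import random
import statistics
from math import sqrt

rng = random.Random(2024)  # fixed seed so the run is repeatable
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.fmean(faces)      # population mean of one die roll: 3.5
sigma = statistics.pstdev(faces)  # population sd of one die roll: about 1.71

# Law of large numbers: the running mean of one long run of rolls
# settles near mu as more rolls are included.
rolls = [rng.choice(faces) for _ in range(100_000)]
running = {k: statistics.fmean(rolls[:k]) for k in (10, 1_000, 100_000)}
for k, value in running.items():
    print(f"mean of first {k:>7,} rolls: {value:.3f}")

# Sampling distribution of the average of n = 10 rolls (simulated).
n = 10
xbars = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(5_000)]
center_xbar = statistics.fmean(xbars)
sd_xbar = statistics.stdev(xbars)
print(f"center of x-bars: {center_xbar:.3f}  (mu = {mu})")
print(f"sd of x-bars: {sd_xbar:.3f}  (sigma/sqrt(n) = {sigma / sqrt(n):.3f})")
```

The simulated center should land close to 3.5 and the simulated spread close to 1.71/√10, which matches the “Center: Same!!! Spread: Smaller!!!” conclusions.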