Sampling Distributions of Proportions

Remember the Skittles example? We calculated the proportion of orange Skittles in each sample and marked it on the dotplots on the board. What shape did the n = 40 dotplot have?
The dotplot was a partial graph of the sampling distribution of all sample proportions for samples of size 40. If we found all the possible sample proportions, the distribution would be approximately normal!

Sampling Distribution
• Choose an SRS of size n from a large population with population proportion p having some characteristic of interest. Let $\hat{p}$ be the proportion of the sample having that characteristic.
• We need formulas for the mean and standard deviation of the sampling distribution of $\hat{p}$.

Suppose we have a population of six people: Melissa, Jake, Charles, Kelly, Mike, & Brian.
What is the proportion of females? 1/3
What is the parameter of interest in this population? The proportion of females.
Draw samples of two from this population. How many different samples are possible? 6C2 = 15
Find the 15 different samples that are possible & find the sample proportion of females in each sample:

Sample               $\hat{p}$     Sample               $\hat{p}$
Melissa & Jake       .5            Jake & Brian         0
Melissa & Charles    .5            Charles & Kelly      .5
Melissa & Kelly      1             Charles & Mike       0
Melissa & Mike       .5            Charles & Brian      0
Melissa & Brian      .5            Kelly & Mike         .5
Jake & Charles       0             Kelly & Brian        .5
Jake & Kelly         .5            Mike & Brian         0
Jake & Mike          0

Find the mean & standard deviation of all the $\hat{p}$ values:
$\mu_{\hat{p}} = \frac{1}{3}$ and $\sigma_{\hat{p}} \approx 0.29814$
How does the mean of the sampling distribution ($\mu_{\hat{p}}$) compare to the population parameter (p)? $\mu_{\hat{p}} = p$

Formulas:
The mean of the sampling distribution: $\hat{p} = \frac{X}{n}$ and $\mu_{\hat{p}} = p$
The standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

Does the standard deviation of the sampling distribution equal the formula? NO:
$\sqrt{\frac{\frac{1}{3}\cdot\frac{2}{3}}{2}} = \frac{1}{3} \neq 0.29814$
WHY? We are sampling more than 10% of our population (2 of 6 people)! So, in order to use this formula for the standard deviation of the sampling distribution, we MUST be sure that our sample size is no more than 10% of the population!
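As a quick check of the six-person example, the sketch below enumerates all 15 samples with Python's standard library and computes the mean and standard deviation of $\hat{p}$ directly; the variable names are illustrative. The last value printed applies the finite population correction, $\sqrt{(N-n)/(N-1)}$, a standard adjustment that is not covered on these slides, to show that the gap between 0.29814 and 1/3 comes from sampling a large fraction of a small population.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, pstdev

# Six-person population from the example; Melissa and Kelly are the females.
population = ["Melissa", "Jake", "Charles", "Kelly", "Mike", "Brian"]
females = {"Melissa", "Kelly"}
N = len(population)  # population size
n = 2                # sample size

p = len(females) / N  # population proportion of females (1/3)

# p-hat for every unordered sample of size 2.
samples = list(combinations(population, n))
phats = [sum(name in females for name in s) / n for s in samples]

mu_phat = mean(phats)       # mean of the sampling distribution
sigma_phat = pstdev(phats)  # SD over all possible samples (divide by 15, not 14)
formula_sd = sqrt(p * (1 - p) / n)

# Finite population correction (not on the slides): shrinks the formula's SD
# when the sample is a large fraction of the population.
fpc = sqrt((N - n) / (N - 1))

print(len(samples), comb(N, n))        # 15 15
print(round(mu_phat, 5), round(p, 5))  # 0.33333 0.33333
print(round(sigma_phat, 5), round(formula_sd, 5), round(formula_sd * fpc, 5))
# 0.29814 0.33333 0.29814
```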
Assumptions (Rules of Thumb)
• Use the formula for standard deviation only when the population is sufficiently large: at least 10 times as large as the sample.
• The sample size must be large enough to ensure a normal approximation can be used. We can use the normal approximation when $np \ge 10$ and $n(1-p) \ge 10$.

Why does the second assumption ensure an approximately normal distribution? Remember back to binomial distributions.
Suppose n = 10 & p = 0.1 (probability of a success). A histogram of this distribution is strongly skewed right!
Now use n = 100 & p = 0.1 (now np = 10!). While the histogram is still slightly skewed right, look what happens to the tail!
$np \ge 10$ and $n(1-p) \ge 10$ ensure that the sample size is large enough to use a normal approximation!

Based on past experience, a bank believes that 7% of the people who receive loans will not make payments on time. The bank recently approved 200 loans.
What are the mean and standard deviation of the proportion of clients in this group who may not make payments on time?
$\mu_{\hat{p}} = .07$ and $\sigma_{\hat{p}} = \sqrt{\frac{(.07)(.93)}{200}} \approx .01804$
Are assumptions met? Yes: np = 200(.07) = 14 and n(1 - p) = 200(.93) = 186
What is the probability that over 10% of these clients will not make payments on time?
normalcdf(.10, 1E99, .07, .01804) = .0482

Example #1
A polling organization asks an SRS of 1500 first-year college students whether they applied for admission to any other college. In fact, 35% of all first-year students applied to colleges besides the one they are attending. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of the true value?
STATE, PLAN, DO, CONCLUDE

STATE: We want to know the probability that a random sample yields a result within 2 percentage points of the true proportion. That is, we want to determine $P(.33 \le \hat{p} \le .37)$.

PLAN: We have drawn an SRS of size 1500 from the population of interest.
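The claim about the two binomial histograms can be checked numerically. The sketch below builds the exact Binomial(n, p) probabilities with `math.comb` and computes skewness (0 for a symmetric distribution) directly from them; the helper names are made up for illustration. For p = 0.1, skewness drops from about 0.84 at n = 10 to about 0.27 at n = 100, which agrees with the standard closed form $(1-2p)/\sqrt{np(1-p)}$.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities P(X = k) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def skewness_from_pmf(pmf):
    """Skewness E[(X - mu)^3] / sigma^3 computed from a pmf over 0..len(pmf)-1."""
    mu = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mu) ** 3 * pk for k, pk in enumerate(pmf))
    return third / var ** 1.5

p = 0.1
skew = {}
for n in (10, 100):
    skew[n] = skewness_from_pmf(binomial_pmf(n, p))
    closed_form = (1 - 2 * p) / sqrt(n * p * (1 - p))
    print(f"n={n}: np={n * p:.0f}, n(1-p)={n * (1 - p):.0f}, "
          f"skewness={skew[n]:.3f} (formula {closed_form:.3f})")
```

A positive skewness means a longer right tail; the value shrinks toward 0 as np and n(1 - p) grow.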
The mean of the sampling distribution of $\hat{p}$ is 0.35: $\mu_{\hat{p}} = 0.35$

PLAN (continued): We can assume that the population of first-year college students is over 15,000 (at least 10 times the sample size), so we are safe to use the standard deviation formula:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.35)(0.65)}{1500}} \approx 0.0123$
In order to use a normal approximation for the sampling distribution, the expected numbers of successes and failures must be sufficiently large:
$np \ge 10$ and $n(1-p) \ge 10$: $1500(.35) = 525 \ge 10$ and $1500(.65) = 975 \ge 10$
Therefore, $\hat{p}$ is approximately $N(0.35, 0.0123)$.

DO: Perform a normal distribution calculation to find the desired probability:
$P(.33 \le \hat{p} \le .37) \approx .8961$

CONCLUDE: About 90% of all SRSs of size 1500 will give a result within 2 percentage points of the true proportion.

Example #2
Suppose one student tossed a coin 200 times and found only 42% heads. Do you believe that this is likely to happen?
np = 200(.5) = 100 & n(1 - p) = 200(.5) = 100. Since both are at least 10, I can use a normal curve!
Find the mean & standard deviation using the formulas: $\mu_{\hat{p}} = .5$ and $\sigma_{\hat{p}} = \sqrt{\frac{.5(.5)}{200}} \approx .0354$
normalcdf(-1E99, .42, .5, √(.5(.5)/200)) ≈ .0118
No: since there is only about a 1% chance of this happening, I do not believe the student did this.

Example #3
Assume that 30% of the students at HH wear contacts. In a sample of 100 students, what is the probability that more than 35% of them wear contacts?
$\mu_{\hat{p}} = .3$ and $\sigma_{\hat{p}} = \sqrt{\frac{(.3)(.7)}{100}} \approx .045826$
Check assumptions! np = 100(.3) = 30 & n(1 - p) = 100(.7) = 70
normalcdf(.35, 1E99, .3, .045826) = .1376

Example #4 (Your turn)
• About 11% of American adults are black. Therefore, the proportion of blacks in an SRS of 1500 adults should be close to .11. If a national sample contains only 9.2% black, should we suspect that the sampling procedure is somehow under-representing blacks?
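The calculator steps in the bank example and Examples #1 to #3 can be reproduced in Python without extra libraries. In this sketch, `normalcdf` and `phat_distribution` are hypothetical helpers (named after the TI-84 command used above), and the normal CDF is built from `math.erf`. The code checks the large-counts condition but not the 10% condition, since the population sizes are assumptions rather than data. Keeping σ unrounded gives about .8956 for Example #1; the .8961 above appears to come from using σ rounded to .0123, and both round to about 90%.

```python
from math import erf, inf, sqrt

def normalcdf(lower, upper, mu, sigma):
    """P(lower <= X <= upper) for X ~ N(mu, sigma), like the calculator's normalcdf."""
    def cdf(x):
        # Standard identity: Phi(z) = (1 + erf(z / sqrt(2))) / 2; erf(+-inf) = +-1.
        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return cdf(upper) - cdf(lower)

def phat_distribution(p, n):
    """Mean and SD of p-hat, plus the large-counts check np >= 10 and n(1 - p) >= 10."""
    sigma = sqrt(p * (1 - p) / n)
    large_counts = n * p >= 10 and n * (1 - p) >= 10
    return p, sigma, large_counts

# name: (p, n, lower bound, upper bound) for the event on p-hat
problems = {
    "bank": (0.07, 200, 0.10, inf),        # P(p-hat > .10)
    "example1": (0.35, 1500, 0.33, 0.37),  # P(.33 <= p-hat <= .37)
    "example2": (0.50, 200, -inf, 0.42),   # P(p-hat <= .42)
    "example3": (0.30, 100, 0.35, inf),    # P(p-hat > .35)
}

results = {}
for name, (p, n, lower, upper) in problems.items():
    mu, sigma, large_counts = phat_distribution(p, n)
    results[name] = normalcdf(lower, upper, mu, sigma)
    print(f"{name}: mean={mu}, sd={sigma:.5f}, large counts ok: {large_counts}, "
          f"probability={results[name]:.4f}")
```

Using `math.inf` plays the role of the calculator's 1E99 and -1E99 bounds.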