Sampling Distributions of Proportions
• Toss a penny 20 times and record the number of heads.
• Now think about it for a minute: we are tossing exactly 20 times, expecting that the probability of a head will be 0.5 on each toss and that the tosses are independent.
• Sound familiar? (This is the binomial setting.)
• Now imagine 1000 people lining up and each tossing the penny 20 times.
• If we recorded each sample proportion of heads and made a histogram of the results, what shape would we expect?

A Sampling Distribution Model for a Proportion
A proportion is no longer just a computation from a set of data. It is now a random quantity that has a probability distribution. This distribution is called the sampling distribution model for proportions.
Even though we depend on sampling distribution models, we never actually get to see them. We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.

The Sampling Distribution of Proportions
How good is the statistic p̂ as an estimate of the parameter p? The sampling distribution of p̂ answers this question.
Consider the approximate sampling distributions generated by a simulation in which SRSs of Reese's Pieces are drawn from a population whose proportion of orange candies is either 0.45 or 0.15. What do you notice about the shape, center, and spread of each?

The Sampling Distribution of Proportions
What did you notice about the shape, center, and spread of each sampling distribution?
Shape: In some cases, the sampling distribution of p̂ can be approximated by a Normal curve. This seems to depend on both the sample size n and the population proportion p.
Center: The mean of the distribution is $\mu_{\hat{p}} = p$. This makes sense because the sample proportion p̂ is an unbiased estimator of p.
Spread: For a specific value of p, the standard deviation $\sigma_{\hat{p}}$ gets smaller as n gets larger. The value of $\sigma_{\hat{p}}$ depends on both n and p.

There is an important connection between the sample proportion p̂ and the number of "successes" X in the sample:
$$\hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n}$$

Sample Proportions: The Sampling Distribution of Proportions
In Chapter 6, we learned that the mean and standard deviation of a binomial random variable X are
$$\mu_X = np, \qquad \sigma_X = \sqrt{np(1-p)}$$
Since $\hat{p} = X/n = (1/n)X$, we are just multiplying the random variable X by a constant (1/n) to get the random variable p̂. Therefore,
$$\mu_{\hat{p}} = \frac{1}{n}(np) = p$$
so p̂ is an unbiased estimator of p, and
$$\sigma_{\hat{p}} = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}}$$
As sample size increases, the spread decreases.

The Sampling Distribution Model for a Proportion (cont.)
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of p̂ is very much like the Binomial distribution, and for large sample sizes it is modeled by a Normal model with
Mean: $\mu(\hat{p}) = p$
Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where q = 1 − p

Modeling the Distribution of Sample Proportions (cont.)
A picture of what we just discussed is as follows:
[Figure not recovered]

How Good Is the Normal Model?
The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.
Just how big of a sample do we need? This will soon be revealed…

Assumptions and Conditions
Most models are useful only when specific assumptions are true. There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled values must be independent of each other.
2. The Sample Size Assumption: The sample size, n, must be large enough.

Assumptions and Conditions (cont.)
Assumptions are hard (often impossible) to check. That's why we assume them. Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions.
The corresponding conditions to check before using the Normal model for the distribution of sample proportions are the Randomization Condition, the 10% Condition, and the Success/Failure Condition.

Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should be a simple random sample of the population.
2. 10% Condition: The sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to be big enough so that both np (the expected number of successes) and nq (the expected number of failures) are at least 10.
So, we need a large enough sample that is not too large.

Why does the third condition ensure an approximately Normal distribution?
Remember back to binomial distributions.
Suppose n = 10 and p = 0.1 (probability of a success). A histogram of this distribution is strongly skewed right!
Now use n = 100 and p = 0.1 (now np = 10). The histogram is still skewed right, but look what happens to the tail!
Having np ≥ 10 and n(1 − p) ≥ 10 ensures that the sample size is large enough to use a Normal approximation.

Consider the following situation:
Suppose we have a population of six people: Alice, Ben, Charles, Denise, Edward, and Frank.
What is the proportion of females? 1/3
What is the parameter of interest in this population? The proportion of females, p
Draw samples of two from this population. How many different samples are possible? 6C2 = 15
Find the 15 different samples that are possible and find the sample proportion of females in each sample.
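The enumeration asked for here can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the "F"/"M" labels are an assumption inferred from the stated proportion of females (1/3), and the names `population`, `SAMPLE_SIZE`, and `sample_proportions` are made up for this sketch.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

# Assumption: gender labels inferred from the stated proportion of females (1/3).
population = {
    "Alice": "F", "Ben": "M", "Charles": "M",
    "Denise": "F", "Edward": "M", "Frank": "M",
}
SAMPLE_SIZE = 2


def sample_proportions(pop, n):
    """Map every possible sample of size n to its proportion of females."""
    return {
        sample: Fraction(sum(pop[name] == "F" for name in sample), n)
        for sample in combinations(pop, n)
    }


p_hats = sample_proportions(population, SAMPLE_SIZE)
values = list(p_hats.values())

# Mean and (population-style) standard deviation over all possible samples.
mean_p_hat = sum(values) / len(values)
variance = sum((v - mean_p_hat) ** 2 for v in values) / len(values)
sd_p_hat = sqrt(variance)

for sample, p_hat in p_hats.items():
    print(" & ".join(sample), float(p_hat))
print("number of samples:", len(values))
print("mean of p-hats:", mean_p_hat)
print("SD of p-hats:", round(sd_p_hat, 5))
```

Because every possible sample is listed, the mean and standard deviation computed here describe the exact sampling distribution of p-hat for this tiny population rather than a simulated approximation.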
Sample              p-hat
Alice & Ben         0.5
Alice & Charles     0.5
Alice & Denise      1
Alice & Edward      0.5
Alice & Frank       0.5
Ben & Charles       0
Ben & Denise        0.5
Ben & Edward        0
Ben & Frank         0
Charles & Denise    0.5
Charles & Edward    0
Charles & Frank     0
Denise & Edward     0.5
Denise & Frank      0.5
Edward & Frank      0

Find the mean and standard deviation of all p-hats:
$$\mu_{\hat{p}} = \frac{1}{3} \quad \text{and} \quad \sigma_{\hat{p}} = 0.29814$$
How does the mean of the sampling distribution ($\mu_{\hat{p}}$) compare to the population parameter (p)? $\mu_{\hat{p}} = p$

But WAIT! We said that the standard deviation should equal $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$:
$$\sigma_{\hat{p}} = \sqrt{\frac{\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}{2}} = \frac{1}{3} \approx 0.3333 \neq 0.29814$$
WHY did this happen? We are sampling more than 10% of our population!
So, in order to calculate the standard deviation of the sampling distribution with this formula, we MUST be sure that our sample size is less than 10% of the population!

Assumptions (Rules of Thumb)
• Must start with a Simple Random Sample
• Sample size must be less than 10% of the population (independence)
• Sample size must be large enough to ensure that a Normal approximation can be used: np ≥ 10 and n(1 − p) ≥ 10

Sample Proportions: Using the Normal Approximation for p̂
Inference about a population proportion p is based on the sampling distribution of p̂. When the sample size is large enough for np and n(1 − p) to both be at least 10 (the Normal condition), the sampling distribution of p̂ is approximately Normal.

A polling organization asks an SRS of 1500 first-year college students how far away their home is. Suppose that 35% of all first-year students actually attend college within 50 miles of home. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of this true value?
STATE: We want to find the probability that the sample proportion falls between 0.33 and 0.37 (within 2 percentage points, or 0.02, of 0.35).
PLAN: We have an SRS of size n = 1500 drawn from a population in which the proportion p = 0.35 attend college within 50 miles of home.
$$\mu_{\hat{p}} = 0.35 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(0.35)(0.65)}{1500}} = 0.0123$$
DO: Since np = 1500(0.35) = 525 and n(1 − p) = 1500(0.65) = 975 are both greater than 10, we'll standardize and then use Table A to find the desired probability.
$$z = \frac{0.33 - 0.35}{0.0123} = -1.63 \qquad z = \frac{0.37 - 0.35}{0.0123} = 1.63$$
$$P(0.33 \le \hat{p} \le 0.37) = P(-1.63 \le Z \le 1.63) = 0.9484 - 0.0516 = 0.8968$$
CONCLUDE: About 90% of all SRSs of size 1500 will give a result within 2 percentage points of the truth about the population.

Based on past experience, a bank believes that 7% of the people who receive loans will not make payments on time. The bank recently approved 200 loans.
What are the mean and standard deviation of the proportion of clients in this group who may not make payments on time?
$$\mu_{\hat{p}} = 0.07 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(0.07)(0.93)}{200}} = 0.01804$$
Are assumptions met? Yes: np = 200(.07) = 14 and n(1 − p) = 200(.93) = 186 are both at least 10.
What is the probability that over 10% of these clients will not make payments on time?
Ncdf(.10, 1E99, .07, .01804) = .0482

Suppose one student tossed a coin 200 times and found only 42% heads. Do you believe that this is likely to happen?
Ncdf(−∞, .42, .5, √(.5(.5)/200)) = .0118
No: since there is approximately a 1% chance of this happening, I do not believe the student did this.

Assume that 30% of the students at MSU wear contacts. In a sample of 100 students, what is the probability that more than 35% of them wear contacts?
$$\mu_{\hat{p}} = 0.3 \quad \text{and} \quad \sigma_{\hat{p}} = 0.045826$$
Check assumptions! np = 100(.3) = 30 and n(1 − p) = 100(.7) = 70
Ncdf(.35, 1E99, .3, .045826) = .1376

Section 7.2 Sample Proportions: Summary
In this section, we learned that…
• When we want information about the population proportion p of successes, we often take an SRS and use the sample proportion p̂ to estimate the unknown parameter p.
• The sampling distribution of p̂ describes how the statistic varies in all possible samples from the population.
• The mean of the sampling distribution of p̂ is equal to the population proportion p. That is, p̂ is an unbiased estimator of p.
• The standard deviation of the sampling distribution of p̂ is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ for an SRS of size n. This formula can be used if the population is at least 10 times as large as the sample (the 10% condition). The standard deviation of p̂ gets smaller as the sample size n gets larger.
• When the sample size n is large, the sampling distribution of p̂ is close to a Normal distribution with mean p and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$. In practice, use this Normal approximation when both np ≥ 10 and n(1 − p) ≥ 10 (the Normal condition).
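As a rough check on the summary above, the Python sketch below simulates the penny experiment (1000 people, 20 tosses each) and redoes the four worked probabilities with the Normal model. It is an illustration under stated assumptions, not the slides' own method: the helper names `normal_model` and `simulate_p_hats` and the seed are invented here, and the software values can differ slightly from the Table A answers because the slides round z to two decimals.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev


def normal_model(p, n):
    """Normal model for p-hat: mean p, SD sqrt(p(1-p)/n).

    Raises ValueError if the Success/Failure condition
    (np >= 10 and n(1-p) >= 10) is not met.
    """
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Success/Failure condition not met")
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


def simulate_p_hats(p, n, reps, seed=0):
    """Simulate `reps` samples of size n; return the sample proportions."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Penny experiment: 1000 people, 20 tosses each, p = 0.5.
p_hats = simulate_p_hats(p=0.5, n=20, reps=1000)
print("simulated mean of p-hat:", round(mean(p_hats), 4))
print("simulated SD of p-hat:  ", round(pstdev(p_hats), 4))
print("formula SD sqrt(pq/n):  ", round(sqrt(0.5 * 0.5 / 20), 4))

# Polling example: P(0.33 <= p-hat <= 0.37) with p = 0.35, n = 1500.
poll = normal_model(0.35, 1500)
prob_poll = poll.cdf(0.37) - poll.cdf(0.33)

# Bank example: P(p-hat > 0.10) with p = 0.07, n = 200.
bank = normal_model(0.07, 200)
prob_bank = 1 - bank.cdf(0.10)

# Coin example: P(p-hat <= 0.42) with p = 0.5, n = 200.
coin = normal_model(0.5, 200)
prob_coin = coin.cdf(0.42)

# Contacts example: P(p-hat > 0.35) with p = 0.30, n = 100.
contacts = normal_model(0.30, 100)
prob_contacts = 1 - contacts.cdf(0.35)

print("poll:    ", round(prob_poll, 4))
print("bank:    ", round(prob_bank, 4))
print("coin:    ", round(prob_coin, 4))
print("contacts:", round(prob_contacts, 4))
```

Raising an error when np or n(1 − p) falls below 10 is a design choice for this sketch; it simply makes the Normal condition impossible to skip. The 10% condition is not checked here because the population sizes are not given.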