Sampling Distribution for a Proportion

Start with a population, adult Americans, and a binary variable, whether they believe in God. The key parameter is the population proportion p. In this case let us suppose 82% of Americans believe in God. Take a sample of 400 Americans and ask if they believe in God. The key statistic is the sample proportion p̂ (“pee-hat”): the number of yes answers divided by the total (400), that is, the proportion of the sample who believe in God. Each sample has a different p̂. If we consider all possible samples, we can make a histogram of those values: the sampling distribution of the random variable P̂.

The sampling distribution of P̂ has
1. A mean of $\mu_{\hat{P}} = p$
2. A standard deviation (standard error) of $\sigma_{\hat{P}} = \sqrt{p(1-p)/n}$
3. A normal distribution if n is big enough.

• Wait, but parameters are supposed to be Greek letters and statistics are supposed to be Roman, right? p ought to be some cool Greek letter like π (in some advanced textbooks it's θ), but somehow people don't follow that convention here. Notice, though, that the statistic has a decoration (the hat), just like the sample mean (the bar over x).
• Of course the mean is p. That is just saying the average sample would have 82% answering “yes.”
• The standard deviation of a sampling distribution (i.e. the standard deviation of a statistic) is always called the standard error.

The Fine Print

There were 3 assumptions underlying the last slide. None is exactly true in real situations, but rules of thumb say when they are close enough.
(a) SRS: The sample is assumed to be a Simple Random Sample.
The formula for the standard deviation assumes sampling with replacement, so that successive individuals sampled are independent. This is close enough if the population is much larger than the sample:
(b) Independence/Large Population Assumption: The population size is at least 20 times the sample size.
As n gets larger the distribution gets more normal, but it happens faster if p is close to .5. We can use the normal distribution
to model the sample proportion if
(c) Normality Assumption/Rule of 15: the numbers np and n(1 − p) are both at least 15.

• We will learn a number of rules of thumb for when we can take assumptions as being met.
• Populations are generally big, so the large population assumption is almost always obviously met. I will expect you to be able to say, if the sample size is, say, 200, that the population needs to be at least 4,000.
• In the rare situations where the large population rule is not met, there is a slightly more complicated formula for the standard deviation that works fine, so failure of this assumption is a pretty mild problem.
• The numbers np and n(1 − p) are the average or expected number of yes and no answers in the sample.
• If p is close to 1/2 this means n just has to be a little more than 30. If p is close to 0 or close to 1, n needs to be quite large.

An Example

82% of adult Americans believe in God. Take a SRS of 400 adult Americans and ask if they believe. What are the mean and standard deviation of the proportion in your sample who do? What is the chance that less than 80% in your sample will believe in God? Between 80 and 90%?

It says simple random sample, so SRS assumption: Met.
The mean is $\mu_{\hat{P}} = p = .82$.
Check the large population assumption: there need to be at least 400 · 20 = 8000 adult Americans. Obviously true, so Met.
$\sigma_{\hat{P}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.82 \cdot .18}{400}} = 0.0192$.
Check normality assumption/rule of 15: np = 400 · .82 = 328 ≥ 15 and n(1 − p) = 400 · .18 = 72 ≥ 15, so P̂ is approximately normal. Met.

• Checking the large population assumption was typical. I want to see that you know how big the population needs to be. Usually you do not know exactly how big the population is, but it is generally obvious that it is big enough.
• Notice the numbers np and n(1 − p) were the average number of yes and no answers you would expect in your sample.

More Example

82% of adult Americans believe. Take a SRS of 400 adult Americans and ask if they believe.
What are the mean and standard deviation of the proportion in your sample who do? What is the chance that less than 80% in your sample will believe? Between 80 and 90%?

To find the probability it is less than 80%, since P̂ is normal:
P(P̂ < .8) = normdist(.8, .82, sqrt(.82*.18/400), 1) = 14.9%
Between 80% and 90%:
P(.8 < P̂ < .9) = normdist(.9, .82, sqrt(.82*.18/400), 1) − normdist(.8, .82, sqrt(.82*.18/400), 1) = 85.1%

• Notice I put the formula for the s.d. into normdist, not just the rounded result. Normdist calculations are extremely sensitive to the standard deviation, and you can be quite far off if you round it off too much. So enter the formula directly into Excel and do not round in the middle of this calculation.

The Example - The Big Picture

So we saw that if we take many samples of n = 400 from a population with proportion of successes p = .82 and compute the sample proportion p̂ for each one, these values of P̂ will have a normal distribution with mean µ = .82 and standard deviation σ = .019.

Another Example

You know the answer to 75% of the questions your philosophy professor might ask. View the 50 questions on the test as a simple random sample of all questions s/he might ask. Find the mean and s.d. of the proportion you will get right. What is your chance of getting over 90%? Between 80 and 90? Between 70 and 80?

SRS: Met.
Check large population assumption: need at least 20 · 50 = 1000 questions s/he could ask. Lots of questions out there, seems reasonable. Met.
Check rule of 15: np = 50 · .75 = 37.5 ≥ 15, but n(1 − p) = 50 · .25 = 12.5, which is < 15. Cannot assume P̂ is normal. Not Met. Continue with the calculation, but treat the results with skepticism.
Compute mean and s.d. The mean is $\mu_{\hat{P}} = p = .75$.
$\sigma_{\hat{P}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.75 \cdot .25}{50}} = 0.0612$.

More Other Example

You get 75% of questions right, test has 50 questions. Find mean and s.d. of P̂. Chance over 90%? Between 80 and 90? Between 70 and 80?

The distribution of P̂ is roughly normal with $\mu_{\hat{P}} = .75$ and $\sigma_{\hat{P}} = \sqrt{.75 \cdot .25/50} = .0612$.
P(P̂ > .9) = 1 − normdist(.9, .75, sqrt(.75*.25/50), 1) = 0.715%
P(.8 < P̂ < .9) = normdist(.9, .75, sqrt(.75*.25/50), 1) − normdist(.8, .75, sqrt(.75*.25/50), 1) = 20.0%
P(.7 < P̂ < .8) = normdist(.8, .75, sqrt(.75*.25/50), 1) − normdist(.7, .75, sqrt(.75*.25/50), 1) = 58.6%

Sampling Distribution

In general, consider a population and a variable. Take a simple random sample and compute some statistic like a mean or proportion. Each time you do this you get a different answer, so it is a random variable! If you consider the value of this statistic for every possible sample, you get a distribution, the sampling distribution. We want to know its mean, its standard deviation, and the shape of its histogram.

• The population distribution is the values of the variable in the population (the 82% of all adult Americans who believe).
• The data distribution is the values of the variable in one particular sample (maybe 320 yes answers in a sample of 400).
• The sampling distribution is the different values of the statistic (like P̂) in different samples.
• One of the trickiest points in the class is the fact that we are taking the statistic as a random variable. This means in a sense our population has become the set of all possible samples out of the original population. If you can keep these levels (the original population, one particular sample, and the population of all samples) straight, you will own this course. If you can't, be patient: your brain takes time to stretch, but it gets there.

Lecture 16 Key Points

After watching this lecture you should be able to
• Know what we mean by the sampling distribution of P̂, and what it represents
• Calculate the mean and standard deviation of P̂
• Check the Independence/Large Population assumption and know what it tells you (that the s.d. formula is correct)
• Check the Normality/Rule of 15 assumption and know what it tells you (that you can use normdist)
• Calculate probabilities of P̂ using normdist.
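
The notes do every calculation in Excel with normdist. As a rough cross-check of the two worked examples, here is a minimal Python sketch of the same steps. It assumes Python 3.8 or later and uses the standard library's statistics.NormalDist in place of normdist. The helper names (sampling_distribution, min_population, rule_of_15, prob_between) are made up for illustration and are not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def sampling_distribution(p, n):
    """Return (mean, standard error) of the sample proportion P-hat."""
    return p, sqrt(p * (1 - p) / n)


def min_population(n):
    """Large population rule: population should be at least 20 times n."""
    return 20 * n


def rule_of_15(p, n):
    """Rule of 15: expected yes and no counts are both at least 15."""
    return n * p >= 15 and n * (1 - p) >= 15


def prob_between(lo, hi, p, n):
    """Normal approximation to P(lo < P-hat < hi), like two normdist calls."""
    mean, se = sampling_distribution(p, n)
    dist = NormalDist(mean, se)  # unrounded s.d., as the notes advise
    return dist.cdf(hi) - dist.cdf(lo)


# Belief example: p = .82, n = 400
mean, se = sampling_distribution(0.82, 400)
print(round(se, 4))                                   # 0.0192
print(min_population(400), rule_of_15(0.82, 400))     # 8000 True
print(f"{NormalDist(mean, se).cdf(0.80):.1%}")        # 14.9%
print(f"{prob_between(0.80, 0.90, 0.82, 400):.1%}")   # 85.1%

# Philosophy test example: p = .75, n = 50
print(rule_of_15(0.75, 50))                           # False
print(f"{prob_between(0.80, 0.90, 0.75, 50):.1%}")    # 20.0%
```

Because prob_between builds the distribution from the unrounded standard error, it follows the same advice as entering the s.d. formula directly into normdist. When rule_of_15 returns False, as in the philosophy example, the normal-based probabilities should be treated with the same skepticism the notes recommend.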
