Survey
Statistical Inference: Statistic & Parameter

The government's Current Population Survey contacted a sample of 113,146 households in March 2005. Their mean income was $60,528. Describe the statistic and parameter of interest, µ and x̄.

Statistic & Parameter

The Gallup Poll asked a random sample of 515 US adults whether they believe in ghosts. Of the respondents, 160 said "Yes". Identify the statistic and parameter, p and p̂.

For each boldface number, state whether it is a statistic or a parameter.
1) A department store reports that 84% of all customers who use the store's credit plan pay their bills on time.
2) A sample of 100 students at a large university had a mean age of 24.1 years.
3) The Department of Motor Vehicles reports that 22% of all vehicles registered in a particular state are imports.
4) A hospital reports that, based on the ten most recent cases, the mean length of stay for surgical patients is 6.4 days.
5) A consumer group, after testing 100 batteries of a certain brand, reported an average life of 63 hours of use.

QUESTION TIME!
1. Following a dramatic drop of 500 points in the Dow Jones Industrial Average in September 1998, a poll conducted for the Associated Press found that 92% of those polled said that a year from now their family financial situation will be as good as it is today or better. The number 92% is a:
(a) Statistic
(b) Sample
(c) Parameter
(d) Population

Sampling Variability: Do you have a summer birthday (June – August)?
Sample #1: Kevin, Lauren, Ernesto, Freddy
Sample #2: Erik, Stephanie, Bradley, Emma
Sample #3: Matthew, Neggin, Jason, Clay

Sampling Variability: If we were to take multiple samples of size n from a population, our statistic (in this case ____) would vary from sample to sample.

Sampling Distributions

There are ________ possible samples of 4 students that I could choose. The difficult part… I don't know which of these samples I chose! Some are good representations of the population… some are not so great.
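The sample-to-sample variation described above can be illustrated with a short simulation. This is a minimal sketch in Python, not the simulation behind the slides. It assumes, for illustration only, a population in which 26% of people have a summer birthday. The function name `simulate_sample_proportions` and the fixed seed are hypothetical choices.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat.

    p is the ASSUMED population proportion (illustrative, not measured here).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    proportions = []
    for _ in range(num_samples):
        # Count how many of the n sampled individuals are "successes".
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# 100 samples of 4 students versus 100 samples of 50 students.
small = simulate_sample_proportions(p=0.26, n=4, num_samples=100)
large = simulate_sample_proportions(p=0.26, n=50, num_samples=100)

print("n = 4 : first five p-hats:", small[:5])
print("n = 50: first five p-hats:", large[:5])
print("spread (std dev) for n = 4 :", round(statistics.pstdev(small), 3))
print("spread (std dev) for n = 50:", round(statistics.pstdev(large), 3))
```

With samples of size 4, p̂ can only take the values 0, 0.25, 0.5, 0.75, or 1, so individual samples can land far from the population proportion. The larger samples cluster much more tightly around it.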
Sampling Distribution
- Samples of 4 are a little small, so I ran a simulation taking 100 samples of size 50 from a population. I did a little research and found that (based on 2010 data) about 26% of births occur in June, July, and August.
- If I took EVERY POSSIBLE sample of size 50 from the population and made a histogram of the sample proportions, we would get a sampling distribution.

Here is an approximate sampling distribution with 16,000 samples with n=25. It's still not a complete sampling distribution, but it's closer!

If we were to take all possible samples of the same size from the population, compute the sample proportion p̂ of each sample, and then create a distribution, it would be called a sampling distribution of p̂.

Question Time!
The sampling distribution of a statistic is
(a) the probability that we obtain the statistic in repeated random samples.
(b) the mechanism that determines whether randomization was effective.
(c) the distribution of values taken by a statistic in all possible samples of the same sample size from the same population.
(d) the extent to which the sample results differ systematically from the truth.

The following properties generally describe a sampling distribution of p̂ created from samples with a large size (usually n ≥ 30):
(1) If the sample size is large enough (or if we are told that the population is normally distributed), the overall shape of the distribution is symmetric and approximately normal. The larger the sample size, the closer the shape is to a normal distribution. A rule of thumb for deciding whether a normal curve can be used to approximate the sampling distribution of sample proportions is: a) np > 10 and b) n(1-p) > 10.
(2) The mean (center) of the distribution is equal to the true population parameter, p.
(3) The variability (spread) of the sampling distribution depends on the sample size. The larger the sample size, the smaller the variability of the sampling distribution.
(4) If the population is at least ten times as large as the sample (N ≥ 10n), the standard deviation of the sampling distribution is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$.

(1) The overall shape of the distribution is symmetric and approximately normal. The larger the sample size, the closer the shape is to a normal distribution. A rule of thumb for deciding whether a normal curve can be used to approximate the sampling distribution of sample proportions is: a) np > 10 and b) n(1-p) > 10.

(2) The mean (center) of the distribution is equal to the true population parameter, p.
[Figure: 20,000 samples of n=50; mean of p̂ = 0.259504]

(3) The variability (spread) of the sampling distribution depends on the sample size. The larger the sample size, the smaller the variability of the sampling distribution.
[Figure: sampling distributions for n=50 and n=25, each with standard deviation $\sqrt{\dfrac{p(1-p)}{n}}$]

Was our sample proportion a good estimate of the population parameter?

The spread of the sampling distribution depends on the sample size, not the size of the population! An SRS of size 1500 from the entire population of the United States (about 300 million) and an SRS of 1500 from San Francisco (~750,000) would be equally precise/trustworthy!

Properly chosen statistics computed from random samples of sufficient size will have low bias and low variability.
http://statweb.calpoly.edu/chance/applets/Reeses/ReesesPieces.html

Question Time!
A simple random sample of 1000 Americans found that 61% were satisfied with the service provided by the dealer from which they bought their car. A simple random sample of 1000 Canadians found that 58% were satisfied with the service provided by the dealer from which they bought their car. The sampling variability associated with these statistics is
a) exactly the same.
b) smaller for the sample of Canadians because the population of Canada is smaller than that of the United States, hence the sample is a larger proportion of the population.
c) smaller for the sample of Canadians because the percent satisfied was smaller than that for the Americans.
d) larger for the Canadians because Canadian citizens are more widely dispersed throughout the country than in the United States, hence they have more variable views.
e) about the same.

Question Time!
If a statistic used to estimate a parameter is such that the mean of its sampling distribution is equal to the true value of the parameter being estimated, the statistic is said to be
(a) random
(b) biased
(c) a proportion
(d) unbiased

So….
Sampling distributions + Normal calculations will allow us to quantify how confident we can be in our sample statistics! We will be able to say that there is a ___% chance that our sample proportion varies from the true population proportion by more than ___ percentage points.

Example: An SRS of 1500 high school seniors in CA was asked whether they applied to college early. Let's assume that there are 100,000 high school seniors in the state of California, and that in fact 35% of them apply to college early. What is the probability that your sample of 1500 seniors will give a result within 2 percentage points of the true value of 35%?
1) We have an SRS with n = 1500 drawn from a population in which the proportion p = .35 apply to college early. The sampling distribution of p̂ has a mean $\mu_{\hat{p}}$ = ____
2) Find the standard deviation (don't forget to check the "rule of thumb" for independence).
3) Normal? Check Rule of Thumb for Normality:
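Steps 2 and 3 of the college-application example can be sketched in a few lines of Python. This is an illustrative sketch under the example's stated assumptions (p = 0.35, n = 1500, N = 100,000). The function name `check_proportion_conditions` is hypothetical, and the thresholds follow the rules of thumb as stated in these notes.

```python
import math


def check_proportion_conditions(p, n, population_size):
    """Return the mean and standard deviation of p-hat plus the two rule-of-thumb checks."""
    mean = p  # the sampling distribution of p-hat is centered at p
    std_dev = math.sqrt(p * (1 - p) / n)
    independence_ok = population_size >= 10 * n      # N >= 10n
    normality_ok = n * p > 10 and n * (1 - p) > 10   # np > 10 and n(1 - p) > 10
    return mean, std_dev, independence_ok, normality_ok


mean, std_dev, independence_ok, normality_ok = check_proportion_conditions(
    p=0.35, n=1500, population_size=100_000
)
print(f"mean of p-hat = {mean}")
print(f"std dev of p-hat = {std_dev:.4f}")
print(f"independence rule met (N >= 10n): {independence_ok}")
print(f"normality rule met (np and n(1-p) > 10): {normality_ok}")
```

Here np = 525 and n(1 - p) = 975, both well above 10, and 100,000 is more than 10 × 1500, so both rules of thumb are satisfied.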
n = 1500, $\mu_{\hat{p}} = 0.35$, $\sigma_{\hat{p}} = 0.0123$
4) Perform a Normal Calculation:

You practice…
Survey undercoverage: One way of checking the undercoverage, nonresponse, and other sources of error in a sample survey is to compare the sample with known facts about the population. About 11% of American adults are black. The proportion p̂ of black adults in an SRS of 1500 adults should therefore be close to 0.11. It is unlikely to be exactly 0.11 because of sampling variability. If a national sample contains only 9.2% black adults, should we suspect that the sampling procedure is somehow underrepresenting black adults? We will find the probability that a sample contains no more than 9.2% black adults when the population is 11% black.

9.1/9.2 Exercises: 9.7 (a-d), 9.8, 9.11-9.13, 9.20, 9.21, 9.25 (hint: for e, use the formula for standard deviation and your algebra skills), 9.27
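For checking work, both Normal calculations above (the college-application example and the survey undercoverage question) follow the same pattern: standardize with the standard deviation of p̂, then read off a Normal probability. The sketch below is one way to do this in Python using only the standard library. The helper names `normal_cdf`, `prob_within`, and `prob_at_most` are hypothetical, and the results rely on the Normal approximation being appropriate.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def phat_std_dev(p, n):
    """Standard deviation of p-hat, assuming the population is at least 10n."""
    return math.sqrt(p * (1 - p) / n)


def prob_within(p, n, margin):
    """P(p - margin <= p-hat <= p + margin) under the Normal approximation."""
    z = margin / phat_std_dev(p, n)
    return normal_cdf(z) - normal_cdf(-z)


def prob_at_most(p, n, cutoff):
    """P(p-hat <= cutoff) under the Normal approximation."""
    z = (cutoff - p) / phat_std_dev(p, n)
    return normal_cdf(z)


# College example: within 2 percentage points of p = 0.35 with n = 1500.
college = prob_within(p=0.35, n=1500, margin=0.02)

# Undercoverage question: p-hat no more than 0.092 when p = 0.11 and n = 1500.
undercoverage = prob_at_most(p=0.11, n=1500, cutoff=0.092)

print(f"P(0.33 <= p-hat <= 0.37) = {college:.4f}")
print(f"P(p-hat <= 0.092)        = {undercoverage:.4f}")
```

Under these assumptions the first probability comes out to about 0.90 and the second to about 0.013. A sample with only 9.2% black adults would therefore be quite unusual if the sampling procedure were working as intended.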