PRE-CLASS Questions:

Recall from Week 3 our discussion on sampling and how we calculated a conservative margin of error for surveys by 1/sqrt(n). We discussed the Corbett survey, which reported that 52% of PA registered voters supported Governor Corbett's decision to sue the NCAA. The survey had a 3.8% margin of error based on a sample size of 675. Using this information, consider the following questions:

1. What is the population parameter of interest for this study? That is, what does this 52% estimate?
The population parameter of interest is “the proportion of all PA registered voters who support Corbett suing the NCAA” and is estimated by this 52%.

2. How "confident" do you think the group reporting these results is in their interval estimate being correct (i.e. 52% plus/minus 3.8%)? That is, does the interval capture the true value of interest in question 1?
Seek class responses, then comment that the common level of confidence used is 95%. That is, when you see such reported information, you can assume, unless stated otherwise, that the margin of error was calculated based on a 95% level of confidence. This means essentially that if the survey were repeated many times under the same conditions, about 95% of the confidence intervals constructed would capture the true proportion. (Refer back to last week’s discussion on sampling distributions, where we looked at how many of the calculated intervals captured the population parameter.)

3. Do you think a wider interval would offer more or less confidence to the polling agency in their interval being correct?
Explain in common sense terms why a wider interval would invoke a higher level of confidence in the interval capturing the unknown parameter.

4. If the polling agency wanted a margin of error of 3% instead of 3.8%, do you think they would have needed a larger or smaller sample size to achieve this new margin of error?
Again, applying common sense, do you believe asking more people would increase the error or decrease the error in your estimate? Decrease error! So a larger sample would be needed.

5. Referring back to question 4, what size sample do you think they would have needed to achieve a 3% margin of error?
$ME = 1/\sqrt{n}$, so $n = 1/ME^2 = 1/(0.03)^2 = 1/0.0009 \approx 1111.11$, which rounds up to 1112.

6. The Corbett survey was based on a random sample. Considering the survey you completed for our class two weeks ago, what population do you think this survey represents? Is our survey data based on any probability sampling methods? If not, what effect does that have on the population you feel is represented by our class survey?
Based on a volunteer sample. For some questions, such as the amount spent on a first date or whether kissing without asking is considered a form of sexual assault, one could find support for our sample being representative of the population of UP undergraduates. But what about the Chi Omega picture? Do you think our class would be representative of the UP undergraduate population? When you answer this question, consider whether you feel some student population subgroup might be more offended than others, then ask whether this subgroup is being represented in our class.

One-Proportion and One-Mean Confidence Intervals

For our class purposes, we are going to base our discussion on confidence intervals for two types of data: Categorical and Quantitative. If the data is Categorical, then our interest is in getting a confidence interval for the population proportion. If the data is Quantitative, then our interest is in getting a confidence interval for the population mean.

A confidence interval is defined pretty much as it sounds. It is an interval (i.e. a range of values) that one has some degree of confidence will capture the population value one is trying to estimate (i.e. the range of values includes the parameter of interest).
Thus the primary use of a confidence interval is to estimate some unknown population parameter, i.e. the proportion (p) or the mean (μ) for a population. Recalling notation from sampling distributions, we have the following:

Sample Proportion ($\hat{p}$) is a point estimate that estimates the Population Proportion ($p$)
Sample Mean ($\bar{x}$) is a point estimate that estimates the Population Mean ($\mu$)

The basic format of a confidence interval is:

Sample statistic ± Margin of Error

As we discussed above, however, this margin of error can be affected by such things as sample size (the larger the sample, the smaller the error) or by how confident one wants to be (the more confident, the wider the interval). Because of this, a margin of error must incorporate these two features in its calculation.

Recall from our sampling distribution discussion for the sample proportion and sample mean that, due to sampling, we will have some degree of variation in our sample results, i.e. the sample proportions and sample means would vary from sample to sample. So we formulated a standard error. We then looked at how the Empirical Rule would be applied to these standard errors to create intervals that would contain the proportion or mean. If you remember, for 95% we used a multiplier of “2 times the Standard Error”. We apply that practice here to offer a basic construct for the margin of error:

Margin of Error = Multiplier × Standard Error

As stated previously, the most commonly used and accepted level of confidence is 95%. This means that if a study were repeated, say, 100 times under the same conditions (e.g. sample size, sampling method, population sampled) and 100 confidence intervals were constructed, then roughly 95 of these 100 intervals would include the population parameter of interest. Other, less-used levels of confidence are 90%, 98%, and 99%.
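The "roughly 95 of 100 intervals" interpretation can be checked with a small simulation. The sketch below is only an illustration, not part of the original class material: it assumes a true proportion of 0.52 (the Corbett estimate treated as if it were the truth), surveys of 675 people, and the "2 times the Standard Error" multiplier. The function names proportion_interval and coverage are made up for this example.

```python
import math
import random

def proportion_interval(successes, n, multiplier=2.0):
    """Return (low, high) for sample proportion +/- multiplier * standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * standard_error
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, repeats, seed=0):
    """Fraction of simulated surveys whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # Each respondent says "yes" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repeats

# Assumed values: true proportion 0.52, 675 respondents, 1000 repeated surveys.
rate = coverage(true_p=0.52, n=675, repeats=1000)
print(f"Share of intervals capturing the true proportion: {rate:.3f}")
```

The printed share should come out near 0.95. It will not be exactly 0.95, because the simulated surveys are themselves random.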
The Multiplier:

Recall from our sampling distribution discussion that sample proportions and sample means will follow an approximately normal distribution under certain conditions. For sample proportions this means having at least 10 successes (i.e. $n\hat{p} \ge 10$) and at least 10 failures (i.e. $n(1-\hat{p}) \ge 10$). Using this “theory”, we can get our multipliers from the standard normal distribution (Z) table.

Confidence Level    90%     95%             98%     99%
Z-Multiplier        1.65    1.96 (~2.00)    2.33    2.58

Motivation behind these multipliers: looking at 95%, the multiplier is found by determining what z-score would leave a probability of 0.95 in the middle, with 0.025 in the lower (left) tail and 0.025 in the upper (right) tail. From the z-table we find that this takes place for z-values of -1.96 and +1.96.

One Proportion Confidence Interval

$\hat{p} \pm Z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

where $\hat{p}$ = sample proportion and $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the standard error. (NOTE: since we do NOT know the population proportion, $p$, we substitute in the sample proportion, $\hat{p}$.)

Examples: Assuming our class survey represents a random sample of UP undergraduates, estimate with 95% confidence the proportion of UP undergraduates who:

1. Knew that kissing someone without asking technically falls under the definition of sexual assault.
2. Felt the reaction to the Chi Omega photograph was proper.
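Before working the examples, the multipliers and the one-proportion formula above can be sketched in Python. This is a minimal illustration rather than the method used in class: the helper names z_multiplier and one_proportion_ci are hypothetical, and the standard library's NormalDist stands in for the z-table. The check at the end reuses the Corbett survey figures from the pre-class questions.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """z-value leaving (1 - confidence) / 2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

def one_proportion_ci(p_hat, n, confidence=0.95):
    """Return (low, high) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = z_multiplier(confidence)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Multipliers for the confidence levels listed in the table.
for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_multiplier(level):.3f}")

# Corbett survey figures from the pre-class questions: 52% support, n = 675.
low, high = one_proportion_ci(0.52, 675)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

For the Corbett figures the margin of error comes out close to the reported 3.8%. The conservative 1/sqrt(n) rule and this formula agree closely here because the sample proportion is near 0.5.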
For Question 1: the sample proportion is 73/177 = 0.41. Note there are 73 successes and 104 failures.

$0.41 \pm 1.96 \sqrt{\frac{0.41(1-0.41)}{177}} = 0.41 \pm 1.96 \times 0.037 = 0.41 \pm 0.073$, i.e. 0.337 to 0.483, or from 33.7% to 48.3%.

INTERPRETATION: We are 95% confident that the proportion of UP undergraduates who know kissing without asking falls under sexual assault is from 33.7% to 48.3%. (The software output below uses the unrounded sample proportion, so its endpoints differ slightly.)

Event = Yes
Variable    X     N      Sample p    95% CI
KnowKiss    73    177    0.412429    (0.339908, 0.484951)

For Question 2: the sample proportion is 48/156 = 0.31. Note there are 48 successes and 108 failures.

$0.31 \pm 1.96 \sqrt{\frac{0.31(1-0.31)}{156}} = 0.31 \pm 1.96 \times 0.037 = 0.31 \pm 0.073$, i.e. 0.237 to 0.383, or from 23.7% to 38.3%.

INTERPRETATION: We are 95% confident that the proportion of UP undergraduates who felt the reaction to the Chi Omega photo was proper is from 23.7% to 38.3%.

Event = Proper
Variable    X     N      Sample p    95% CI
ChiOmega    48    156    0.307692    (0.235266, 0.380118)
Using the normal approximation.

One-Mean Intervals

For one mean, the confidence interval will involve a new concept: Degrees of Freedom, or df. We will use this df in conjunction with Table A2 to find the multiplier. This occurs because we only have information on the sample and therefore do not know the population standard deviation (σ), so we have to estimate it with the sample standard deviation (s). The formula for a one-mean confidence interval is:

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$

We therefore apply similar techniques, but now we are interested in estimating the population mean, μ, by using the sample mean, and the multiplier is a t-value. Until now we assumed that our random variable came from a normal distribution with a known population standard deviation, σ. However, typically we do not know this parameter and therefore must estimate it. This is done by using the standard deviation of the sample, which is expressed as "s". Because we need to make this estimate, we can no longer take our multiplier from the normal distribution.
These t-values come from a t-distribution, which is similar to the standard normal distribution from which the z-values came. The similarities are that the distribution is symmetrical and centered on 0. The difference is that when using a t-table we need to consider a new feature: degrees of freedom (df). The degrees of freedom are based on the sample size, n.

Examples: Assuming our class survey represents a random sample of UP undergraduates, estimate with 95% confidence:

1. The mean amount of money they feel should be spent on the first date.
2. The mean loan amount they expect to face upon graduation.

For Question 1: I took a random sample of 30 just to better illustrate using the t-table. The sample mean is 55 dollars with a standard deviation of 85.9 dollars. With n of 30, the degrees of freedom (df) = 30 – 1 = 29.

$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 55 \pm 2.045 \times \frac{85.9}{\sqrt{30}} = 55 \pm 2.045 \times 15.7 = 55 \pm 32.1$, or from 22.9 to 87.1 dollars.

INTERPRETATION: We are 95% confident that the mean amount of money UP undergraduates feel should be spent on a first date is from $22.9 to $87.1.

One-Sample T: Sample_FirstDate
Variable            N     Mean    StDev    SE Mean    95% CI
Sample_FirstDate    30    55.0    85.9     15.7       (22.9, 87.1)

For Question 2: The sample mean is 36,352 dollars with a standard deviation of 54,056 dollars. With n of 172, the degrees of freedom (df) = 172 – 1 = 171. From the table, use the row for df = 100, which gives a multiplier of 1.984.

$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 36352 \pm 1.984 \times \frac{54056}{\sqrt{172}} = 36352 \pm 1.984 \times 4122 = 36352 \pm 8178$, or from 28174 to 44530 dollars.

INTERPRETATION: We are 95% confident that the mean loan amount UP undergraduates expect to face upon graduation is from $28,174 to $44,530. (The software output below uses df = 171 exactly, so its interval is slightly narrower.)

One-Sample T: Loan
Variable    N      Mean     StDev    SE Mean    95% CI
Loan        172    36352    54056    4122       (28216, 44488)
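The two one-mean examples can be rechecked with a few lines of Python. This is a sketch under stated assumptions: the function name one_mean_ci is hypothetical, and the t multipliers (2.045 for df = 29, 1.984 for the df = 100 table row) are typed in from the values used in the notes rather than computed.

```python
import math

def one_mean_ci(mean, sd, n, t_multiplier):
    """Return (low, high) for mean +/- t_multiplier * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_multiplier * standard_error
    return mean - margin, mean + margin

# First-date example: n = 30, so df = 29 and the table multiplier is 2.045.
date_low, date_high = one_mean_ci(55, 85.9, 30, 2.045)

# Loan example: n = 172, using the df = 100 table row (multiplier 1.984).
loan_low, loan_high = one_mean_ci(36352, 54056, 172, 1.984)

print(f"First date: {date_low:.1f} to {date_high:.1f} dollars")
print(f"Loan: {loan_low:.0f} to {loan_high:.0f} dollars")
```

The first-date interval should land close to the (22.9, 87.1) shown in the software output. The loan interval will be slightly wider than the software's, since the df = 100 multiplier is a little larger than the one for df = 171.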