Download File - collingwoodresearch

POSC 202A: Lecture 9
Lecture: Statistical Significance

Statistical Significance
The fundamental question of statistical significance: How likely is the result we observe to be the product of chance? This question drives all of the tests we perform by allowing us to differentiate the systematic from the stochastic.

Confidence Interval
A confidence interval is an interval calculated from sample data by a procedure that captures the true population parameter in a fixed percentage of repeated samples. No single interval is guaranteed to contain the parameter. The procedure tells us how large an interval we need to create in order to capture the true population value in some fixed percentage of the intervals we draw.

Think of it this way: we draw a sample in order to estimate some statistic. If we repeat this over and over, we build up a sampling distribution. To create a 95% CI, we ask how large an interval around the statistic we need so that, in 95% of the samples drawn, the interval contains the true population parameter (value). Page 386 has a nice graph of this.

An example: a 95% confidence interval is the range needed to capture the true population value in 95% of the intervals we draw from a population.

The interval thus captures the variability inherent in using samples to draw inferences about a population. To construct it, we need an estimate of the population mean and of the standard deviation of the sampling distribution. It is reported in the form: Estimate ± Margin of Error. Generally, the interval runs from $\bar{x} - z \cdot SD$ to $\bar{x} + z \cdot SD$.

How do we calculate the mean and standard deviation of a sampling distribution? For proportions:

$\text{Mean} = p, \qquad SD = \sqrt{\frac{p(1-p)}{n}}$

Refresher: Rule of Thumb
Recall that for a normal distribution, the area about the mean is apportioned as:
± 1 SD = 68%
± 2 SD = 95%
± 3 SD = 99.7%
So a 95% CI corresponds to $\bar{x} \pm 2\,SD$ (more precisely, $\pm 1.96\,SD$).

Statistical Significance
Let's create an example using a sample from Dear Abby, in which 400 women responded, of whom 60% would rather just cuddle than have sex with their husbands.

What do we want to know? Is 60% beyond what we would expect to see due to chance alone, or is it likely to be just random variation?

With a sample mean of p = .6 and n = 400:

$SD = \sqrt{\frac{.6(1-.6)}{400}} = \sqrt{\frac{.24}{400}} \approx \frac{.49}{20} \approx .0245$

Confidence Interval
Our confidence interval is thus $\bar{x} \pm 2\,SD$ = 95% CI, or $.6 \pm 2(.0245)$, which runs from $.6 - .049$ to $.6 + .049$. Rounding .049 off to .05 gives:

[Figure: 95% CI on a number line: .55, x = .60, .65]

If we repeated the sampling many times, about 95% of the intervals constructed this way would capture the true population parameter; the interval from this sample runs from .55 to .65. From this we conclude that we are 95% confident that between 55% and 65% of women prefer cuddling to sex.

Statistical Significance
How do our estimates change with the size of the sample? Recall we found $SD = \sqrt{.6(1-.6)/400} \approx .0245$. The mean (our estimate) does not depend on sample size. But what happens to the SD?
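The Dear Abby interval, and the claim that roughly 95% of intervals built this way capture the true value, can be checked with a short Python sketch. It assumes the normal approximation and uses z = 2 from the rule of thumb; the function names `proportion_ci` and `coverage` are illustrative, not from the lecture, and the simulation treats .60 as the true population proportion purely for demonstration.

```python
import math
import random


def proportion_ci(p_hat, n, z=2.0):
    """Normal-approximation CI for a proportion: p_hat +/- z * SD,
    with SD = sqrt(p_hat * (1 - p_hat) / n). z = 2 is the rule of thumb;
    1.96 is the more exact 95% multiplier."""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sd, p_hat + z * sd


def coverage(true_p, n, trials=2000, z=2.0, seed=1):
    """Simulate `trials` random samples of size n from a population whose
    true proportion is true_p, and return the share of the resulting
    intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(trials):
        # Each respondent answers "yes" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes / n, n, z)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


# Dear Abby numbers: 60% of 400 respondents.
low, high = proportion_ci(0.60, 400)
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.551 to 0.649

# Hypothetical: if the true population value were .60, roughly 95% of
# the intervals built this way should contain it.
print(f"coverage: {coverage(0.60, 400):.3f}")
```

The simulation is the point of the sketch: any one interval either contains the true value or it does not, and the "95%" describes how often the procedure succeeds across repeated samples.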
Sample Size and Confidence Intervals
Where the sample statistic is .5:

Sample   Mean   N       SD      95% CI
#1       0.5    100     0.050   0.40 to 0.60
#2       0.5    200     0.035   0.43 to 0.57
#3       0.5    300     0.029   0.44 to 0.56
#4       0.5    400     0.025   0.45 to 0.55
#5       0.5    500     0.022   0.46 to 0.54
#6       0.5    1000    0.016   0.47 to 0.53
#7       0.5    10000   0.005   0.49 to 0.51

Confidence Interval
A shortcut for approximating the margin of error of a 95% CI for a proportion: $1/\sqrt{N}$. It is more accurate as you get closer to an even split (i.e., 50%); for proportions farther from .5 it overstates the margin, as the comparison below for N = 900 shows:

Mean   Sample size   SD = sqrt(p(1-p)/N)   2 × SD   Approximation 1/sqrt(N)
0.10   900           0.010                 0.020    0.033
0.20   900           0.013                 0.027    0.033
0.30   900           0.015                 0.031    0.033
0.40   900           0.016                 0.033    0.033
0.50   900           0.017                 0.033    0.033

Statistical Significance
Recall the fundamental question: How likely is the result we observe to be the product of chance? This question drives all of the tests we perform by allowing us to differentiate the systematic from the stochastic.

Significance testing is all about comparisons. Is what we observe close to or far from what we expect? Is what we observe so far from what we expect that we cannot attribute it to chance alone?

An example: Imagine we took a valid random sample of women's preferences and got the same result as Dear Abby's survey of women's cuddling preferences (60% preferred cuddling to sex; 400 women responded). How would we conduct a significance test? We need to identify the appropriate comparison.

What if women randomly answered "snuggle" or "sex"? What would we expect if women just answered randomly? If so, we might expect 50% of respondents to prefer snuggling.
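The two sample-size tables above can be reproduced directly from the SD formula for a proportion. This is a minimal Python sketch; `sd_proportion` and `shortcut_margin` are hypothetical helper names chosen for illustration, and the CI uses the lecture's ± 2 SD rule.

```python
import math


def sd_proportion(p, n):
    """SD of the sampling distribution of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def shortcut_margin(n):
    """Rule-of-thumb 95% margin of error for a proportion: 1 / sqrt(N)."""
    return 1 / math.sqrt(n)


# First table: the sample statistic is fixed at .5 while N grows.
for n in (100, 200, 300, 400, 500, 1000, 10000):
    sd = sd_proportion(0.5, n)
    print(f"N={n:>5}  SD={sd:.3f}  95% CI={0.5 - 2 * sd:.2f} to {0.5 + 2 * sd:.2f}")

# Second table: N is fixed at 900 while the proportion varies.
for p in (0.10, 0.20, 0.30, 0.40, 0.50):
    sd = sd_proportion(p, 900)
    print(f"p={p:.2f}  SD={sd:.3f}  2xSD={2 * sd:.3f}  1/sqrt(N)={shortcut_margin(900):.3f}")
```

At p = .5, 2 × sqrt(.25/N) equals 1/sqrt(N) exactly, which is why the shortcut works best near an even split. For any other p the product p(1 - p) is smaller than .25, so the shortcut gives a somewhat wider (more conservative) margin than ± 2 SD.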
Then we need to know whether what we observed (60%) is too large a result to be attributed to chance alone. How can we determine this?

Statistical Significance
One way is to construct a 95% confidence interval around our sample mean (60%) and see whether it contains the result we would expect due to chance alone (50%).

Confidence Interval
Recall the interval we created earlier, .55 to .65. We can simply check whether .50 falls within it. Since it does not, our sample result is statistically significantly different from what chance alone would produce.

[Figure: number line showing .50 outside the 95% CI: .55, x = .60, .65]

Statistical Significance
A second way is to conduct a significance test by solving for areas under the normal curve. Recall that we know how to find the likelihood that some event occurs using the formula:

$Z = \frac{X_i - \bar{X}}{SD}$

This formula asks whether what we see is too far from chance to be attributed to chance alone. We simply calculate the number of standard units that what we observe lies from random chance:

$Z = \frac{60 - 50}{2.5} = 4$

Here 2.5 (percentage points) is the SD of the sampling distribution if the true value is 50%: $\sqrt{.5(1-.5)/400} = .025$. Then use the Z table to obtain the likelihood that we see a sample as large as 60% if the true value is 50%.

Our Z table only goes to 3.4, and Z = 4 lies beyond it! Fewer than 1 time in 10,000 would we see a sample mean of 60% or more if the process were driven by chance alone.

Confidence Interval
We could illustrate this process on the normal curve as well. The question: How likely is it that we would see a result above 60%?

[Figure: normal curve centered on the chance value .50, with .55 and the observed result Xi = .60 marked; area to the left of .60 > .9998, area to the right < .0001]
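Both ways of testing the Dear Abby result against chance can be sketched in a few lines of Python. The names `ci_contains` and `z_test_proportion` are hypothetical; the z-test takes its SD from the chance value .50 (the 2.5 percentage points used above), and the upper-tail probability comes from the standard identity P(Z ≥ z) = erfc(z/√2)/2 in place of a printed Z table.

```python
import math


def ci_contains(p_hat, n, value, z=2.0):
    """Method 1: does the p_hat +/- z*SD interval contain `value`?"""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sd <= value <= p_hat + z * sd


def z_test_proportion(p_hat, p0, n):
    """Method 2: one-sample z-test for a proportion.
    The SD comes from the chance value p0: sqrt(p0 * (1 - p0) / n).
    Returns Z and the upper-tail probability P(Z >= z)."""
    sd0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd0
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # standard normal upper tail
    return z, upper_tail


print(ci_contains(0.60, 400, 0.50))  # False: .50 lies outside .551 to .649
z, p_upper = z_test_proportion(0.60, 0.50, 400)
print(f"Z = {z:.2f}, upper tail = {p_upper:.2e}")  # Z = 4.00, upper tail = 3.17e-05
```

The upper tail answers the one-sided question on the slide (a result above 60%); doubling it gives the two-sided probability. Either way it is well under 1 in 10,000.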
