AP Statistics 2011 Unit 2: Worksheet 2 (Key)

NOTE: The graphs used in this key are screenshots from the calculator, so they are not formatted with a complete title and adequate labels, as would be expected on an assessment!

I. How Can We Assess Normality?

1) List three characteristics of all normally distributed data.
- The graph shows a symmetrical, bell-shaped, unimodal distribution.
- Mean = median.
- The Empirical Rule applies.

2) Is any data perfectly normal? Explain.
No. It is very rare for real-world data to be perfectly Normal because of variability and sample size. Most data would be considered "approximately Normal".

3) Given a set of data and your answers to questions 1 and 2 above, list at least two methods that you already know that you could use to determine whether the data is normal.
- Graph it (always!): Use a histogram, stem-and-leaf plot, or dot plot (small data sets) and analyze the distribution to see whether it is bell-shaped, unimodal, and symmetrical.
- Determine whether the Empirical Rule applies: Count the number of observations that fall within 1, 2, and 3 standard deviations of the mean and see whether they fit the 68%, 95%, 99.7% pattern.

4) Do you think that a large set of data is more likely to be normal than a small set of data? For example, if we examined the heights of students in our class (a small set of data) and compared them to the heights of high school students in the United States, would we get a different distribution?
Yes. A large set of data from a Normal population is more likely to look Normal than a small data set from the same population. Small data sets show more sample-to-sample variability: a histogram of only a few observations can look lumpy or skewed just by chance, and statistics such as the mean and standard deviation change more from one small sample to the next.

II. Assessing Normality

United States 2009 Unemployment Rates: The following chart shows the unemployment rates in all 50 states from November 2009. The data is arranged from lowest (North Dakota's 4.1%) to highest (Michigan's 14.7%).
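The summary statistics used in questions 5 through 7 can also be checked in software. The following is a minimal sketch, assuming Python 3 and its standard `statistics` module (the worksheet itself uses a graphing calculator). It stores the 50 rates from the chart below, then computes the mean, median, sample standard deviation, quartiles, 1.5 × IQR outlier fences, and the Empirical Rule counts for question 6. The `quartiles` helper is hypothetical: it takes the medians of the lower and upper halves of the sorted data, the convention TI-83/84 calculators use, and other software may report slightly different quartiles.

```python
import statistics

# November 2009 unemployment rates (%) for the 50 states, sorted low to high.
rates = [
    4.1, 4.5, 5.0, 6.3, 6.3, 6.4, 6.4, 6.6, 6.7, 6.7,
    6.7, 6.9, 7.0, 7.0, 7.2, 7.4, 7.4, 7.4, 7.8, 8.0,
    8.0, 8.2, 8.2, 8.4, 8.5, 8.5, 8.6, 8.7, 8.8, 8.9,
    9.1, 9.2, 9.5, 9.6, 9.6, 9.7, 10.2, 10.3, 10.5, 10.6,
    10.6, 10.8, 10.9, 11.1, 11.5, 12.3, 12.3, 12.3, 12.7, 14.7,
]


def quartiles(data):
    """Hypothetical helper: Q1 and Q3 as the medians of the lower and upper
    halves of the sorted data (the middle value is left out when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return statistics.median(lower), statistics.median(upper)


def empirical_rule(data):
    """For k = 1, 2, 3, count observations within k sample SDs of the mean.
    Returns tuples (k, low, high, count, percent)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, like Sx on the calculator
    rows = []
    for k in (1, 2, 3):
        low, high = mean - k * sd, mean + k * sd
        count = sum(low <= x <= high for x in data)
        rows.append((k, low, high, count, 100 * count / len(data)))
    return rows


if __name__ == "__main__":
    q1, q3 = quartiles(rates)
    iqr = q3 - q1
    print(f"mean = {statistics.mean(rates):.2f}, median = {statistics.median(rates)}, "
          f"s = {statistics.stdev(rates):.2f}")
    print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.1f}, "
          f"outlier fences: {q1 - 1.5 * iqr:.2f} and {q3 + 1.5 * iqr:.2f}")
    for k, low, high, count, pct in empirical_rule(rates):
        print(f"mean +/- {k}s: {low:.2f} to {high:.2f} -> {count} states ({pct:.0f}%)")
```

Run as a script, this reports s ≈ 2.22, Q1 = 7.0, Q3 = 10.3 (IQR = 3.3), fences of about 2.05 and 15.25 (so no state is an outlier), and Empirical Rule counts of 36, 48, and 50 states (72%, 96%, and 100%). Interval endpoints can differ from hand-computed values in the second decimal place because the code does not round the mean and standard deviation first.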
4.1    7.0    8.6    10.6
4.5    7.2    8.7    10.6
5.0    7.4    8.8    10.8
6.3    7.4    8.9    10.9
6.3    7.4    9.1    11.1
6.4    7.8    9.2    11.5
6.4    8.0    9.5    12.3
6.6    8.0    9.6    12.3
6.7    8.2    9.6    12.3
6.7    8.2    9.7    12.7
6.7    8.4    10.2   14.7
6.9    8.5    10.3
7.0    8.5    10.5

5) Plot the data. Use a dotplot, stemplot, or histogram. Describe the distribution.
The distribution of unemployment rates is unimodal and fairly symmetrical, with no strong skew to the left or to the right. The median is 8.5% and the mean is approximately 8.68%, so the two measures of center agree closely. There are no apparent outliers in the data. The fifty states' unemployment rates span a range of 10.6 percentage points (4.1% to 14.7%).

6) Does the data follow the Empirical Rule? Complete the table to find out, and compare your results with what you would expect from the Empirical Rule.
Mean = 8.68    Standard deviation = 2.22

Interval     Low value   High value   Frequency   Percent of data
μ ± 1σ       6.46        10.90        36          72%
μ ± 2σ       4.24        13.12        48          96%
μ ± 3σ       2.02        15.34        50          100%

(Simply count the number of observations that fall within each interval.) These percentages are reasonably close to the 68%, 95%, and 99.7% expected from the Empirical Rule.

7) Create a box plot of the data. Describe the distribution.
The box plot shows that the data is fairly symmetrical, with no apparent outliers. The IQR is about 3.3 percentage points, showing that 50% of all states have unemployment rates between 7.0% (Q1) and 10.3% (Q3).

7) Does the data appear to be approximately normal? Why or why not? Be sure to explain the SOCS (shape, outliers, center, spread) in your answer.
Yes, the unemployment data from all 50 states appears to be approximately Normal. The histogram and box plot both show a nearly symmetrical distribution that is not skewed, with a single peak at approximately 6.5-8.5%. The mean (8.68%) and the median (8.5%) are close, indicating that either could be used as a measure of the center of the data.
There are no apparent outliers in the unemployment data, as seen in the box plot. The IQR is about 3.3 percentage points, showing that 50% of all states have unemployment rates between 7.0% and 10.3%. All unemployment rates fall within a range of 10.6 percentage points, with the lowest rate at 4.1% and the highest at 14.7%.

A Normal Probability Plot (NPP) shows each observation (x) plotted against its expected z-score (y). If the data were perfectly Normal, the points would fall on a straight line. Remember, however, that virtually no data is perfectly Normal, so we should not overreact to slight variations when we assess Normality. Look for a linear pattern, but don't overreact to minor wiggles in the plot; look instead for shapes that show clear departures from Normality.

8) Using your calculator, construct a Normal Probability Plot for the unemployment rate data. Describe the data. Is it approximately normal? Why or why not?
The Normal Probability Plot shows that the data is approximately Normal because it follows a linear pattern. The NPP graphs the unemployment rate on the x-axis against the z-score on the y-axis, and there is no strong departure from the linear pattern in either direction.

9) Guinea pig survival times: Scientists conducted an experiment on guinea pigs and tracked their survival times (in days) after the animals were injected with infectious bacteria.

43 45 53 56 56 57 58 66 67 73 74 79
80 80 81 81 81 82 83 83 84 88 89 91
91 92 92 97 99 99 100 100 101 102 102 102
103 104 107 108 109 113 114 118 121 123 126 128
137 138 139 144 145 147 156 162 174 178 179 184
191 198 211 214 243 249 329 380 403 511 522 598

a) In your calculator, construct a histogram of the data. Describe the distribution.
The distribution is heavily skewed to the right. The center is approximately 102.5 days, based on the median (the mean, about 141.8 days, is pulled upward by the long right tail). The data peaks between 50 and 100 days, but the range is 555 days (43 to 598 days). There are outliers at the high end: by the 1.5 × IQR rule (Q1 = 82.5, Q3 = 151.5, IQR = 69, upper fence = 255), the survival times of 329, 380, 403, 511, 522, and 598 days are outliers.

b) Using your calculator, construct a Normal Probability Plot of the data. Describe the plot.
The Normal Probability Plot is strongly curved rather than linear, bending to the right as the long tail of large survival times stretches out along the x-axis. This further confirms that the data is right-skewed.

c) Is the data approximately normal? Why or why not?
No. The survival times of guinea pigs injected with infectious bacteria do not appear to be Normal: the Normal Probability Plot is strongly curved, the histogram also shows a strong right skew, and the two measures of center are not approximately equal (median 102.5 days versus a mean of about 141.8 days).

10) Below is a stem-and-leaf plot showing NBA free-throw percentages.
[Stem-and-leaf plot of NBA free-throw percentages: stems 3, 4, 5, 6, 7, 8, 9; Key: 4 | 0 = 0.40]

a) Describe the distribution.
The distribution of NBA free-throw percentages appears to be left-skewed and peaks at 80-89%. The center is approximately 68-70 percent, based on the median, which is more resistant to outliers and skewed data than the mean. The range is approximately 63 percentage points.

b) What would you expect the Normal Probability Plot to look like? Sketch it here.
The NPP would be somewhat curved because of the left skew, with most of the data clustered to the right. The data would look similar to the NPP shown below.

11) Sketch a Normal Probability Plot for each of the following:
Normal data | Data skewed left | Data skewed right
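The Normal Probability Plots in questions 8 through 11 can also be built outside the calculator. The following minimal sketch assumes Python 3.8 or later (for `statistics.NormalDist`). It pairs each sorted observation with an expected z-score, as described in the Normal Probability Plot paragraph above, and summarizes how straight the plot is with the correlation r between the two. Both the correlation summary and the plotting position (i - 0.5)/n are assumptions added here, not the worksheet's method; a calculator may use a slightly different formula for the expected z-scores.

```python
from statistics import NormalDist, mean, median

# Guinea pig survival times (days) from question 9, sorted.
survival = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79,
    80, 80, 81, 81, 81, 82, 83, 83, 84, 88, 89, 91,
    91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102, 102,
    103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128,
    137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
    191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598,
]

# November 2009 state unemployment rates (%) from Part II, for comparison.
unemployment = [
    4.1, 4.5, 5.0, 6.3, 6.3, 6.4, 6.4, 6.6, 6.7, 6.7,
    6.7, 6.9, 7.0, 7.0, 7.2, 7.4, 7.4, 7.4, 7.8, 8.0,
    8.0, 8.2, 8.2, 8.4, 8.5, 8.5, 8.6, 8.7, 8.8, 8.9,
    9.1, 9.2, 9.5, 9.6, 9.6, 9.7, 10.2, 10.3, 10.5, 10.6,
    10.6, 10.8, 10.9, 11.1, 11.5, 12.3, 12.3, 12.3, 12.7, 14.7,
]


def npp_points(data):
    """Return (observation, expected z-score) pairs for a Normal Probability Plot.

    The i-th smallest of n observations is paired with the standard Normal
    quantile at plotting position (i - 0.5) / n (an assumed convention).
    With the data on the x axis, Normal data give a straight line,
    right-skewed data rise steeply and then flatten (concave down), and
    left-skewed data start flat and then rise steeply (concave up).
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]


def npp_correlation(data):
    """Correlation r between observations and expected z-scores;
    values close to 1 mean the plot is close to a straight line."""
    points = npp_points(data)
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


if __name__ == "__main__":
    for name, data in (("unemployment", unemployment), ("survival", survival)):
        print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}, "
              f"NPP r = {npp_correlation(data):.3f}")
```

For the unemployment rates r comes out close to 1 (about 0.99), while for the guinea pig survival times it is noticeably lower (roughly 0.8), which matches the linear plot in question 8 and the strongly curved plot in question 9. The docstring of `npp_points` describes the three shapes to sketch in question 11.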