Histograms

Used with numerical data.
Bars touch on histograms.
For comparative histograms, use two separate graphs with the same scale on the horizontal axis.
A histogram is used when there are too many values of a quantitative variable for a stemplot or dotplot.

Constructing a histogram
1. Divide the range of the data into groups of equal width.
2. Count the number of individuals in each group.
3. Draw the histogram, give it a title, and label the axes.
There is no horizontal space between bars unless a group is empty.

Calculator
Enter the data: STAT, choose 1: Edit, type the values into L1.
Set up the histogram: 2nd Y= (STAT PLOT), ENTER on 1: Plot 1, turn it ON, Type: histogram, Xlist: L1, Freq: 1.
Quick graph: ZOOM, choose 9, then TRACE to look at the class intervals.
Set WINDOW to match the intervals, then GRAPH and TRACE.

Example: ages (in months) of 15 students
195 204 204 192 192 193 209 194 199 204 204 192 214 222 209

[Figure: histogram of Age (Months) for the 15 students; horizontal axis marked 192, 198, 204, 210, 216, 222; vertical axis Frequency, 0 to 5]

When describing a histogram, address: Center, Shape, Spread, Outliers.

Cautions: the pancake and skyscraper effects (too many classes flatten the graph into a pancake; too few pile the data into a few tall skyscrapers).

Exercise
States differ widely with respect to the percentage of college students who are enrolled in public institutions. The U.S. Department of Education provided the accompanying data on this percentage for the 50 U.S. states for fall 1999. Create a histogram to display these data and then give a brief description of the distribution. (Use a minimum of 40 and a maximum of 100, with class widths of 10.)

Percentage of College Students Enrolled in Public Institutions
95 73 63 92 96 75 77 87 76 52
81 74 91 90 65 69 75 88 80 62
85 95 86 93 85 73 70 82 56 80
80 91 89 84 76 82 55 81 60 82
72 89 79 89 92 81 56 84 43

Exercise
Complete the frequency table below and construct the corresponding histogram.

Class         Count
25 to < 34    ......
34 to < 43    ......
43 to < 52    ......
52 to < 61    ......
61 to < 70    ......
70 to < 79    ......

Describe the shape: roughly symmetric, roughly skewed left, roughly skewed right, or no discernible shape.
Describe the spread of the distribution. …………………………
What is the center of the distribution? (Hint: look at the original data set) ……………
Do there appear to be any obvious outliers? If so, name them.
…………………………………
What is the width of each class in the histogram? …………
Could this data set be represented by a pie graph? Why or why not?

Boxplots

Advantages
ease of construction
convenient handling of outliers
construction is not subjective (as it is for histograms)
used with medium or large data sets (n > 10)
useful for comparative displays

Constructing a boxplot
1. Find the five-number summary: Min, Q1, Med, Q3, Max.
2. Draw a box from Q1 to Q3.
3. Draw the median as a line inside the box.
4. Extend whiskers to the min and max.

Modified boxplots
display outliers
fences mark off mild and extreme outliers
whiskers extend to the largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!

Exercise
A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern and mid-western states in 1999.

5.9 4.5 1.3 6.9 3.5 5.0 5.9 4.5 5.6 4.1
6.3 4.8 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2

Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.

[Figures: symmetrical boxplots; approximately symmetrical boxplot; skewed boxplot]

Exercise
Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follow are the radon concentrations in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer.

Cancer
10 21 20 45 16 21 17 33 5 23
15 11 9 13 27 13 39 22 7 12
15 3 8 11 18 16 23 16 9 57
18 38 37 10 15 11 18 210 22 11
16 10

No Cancer
9 38 11 12 29 5 7 6 8 29
24 12 17 11 11 3 9 33 17 55
11 29 13 24 7 11 21 6 39 29
7 8 55 9 21 9 3 85 11 14

Create parallel boxplots. Compare the distributions.

[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer samples; horizontal axis Radon, 0 to 200]

The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right.
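The numbers behind this comparison can be checked with a short Python sketch. It rests on two assumptions that these notes do not spell out: quartiles are taken as the medians of the lower and upper halves of the sorted data (the rule TI calculators use), and the fences sit 1.5 × IQR beyond the quartiles. The function names are illustrative only.

```python
from statistics import median

cancer = [10, 21, 20, 45, 16, 21, 17, 33, 5, 23, 15, 11, 9, 13, 27, 13, 39,
          22, 7, 12, 15, 3, 8, 11, 18, 16, 23, 16, 9, 57, 18, 38, 37, 10,
          15, 11, 18, 210, 22, 11, 16, 10]
no_cancer = [9, 38, 11, 12, 29, 5, 7, 6, 8, 29, 24, 12, 17, 11, 11, 3, 9, 33,
             17, 55, 11, 29, 13, 24, 7, 11, 21, 6, 39, 29, 7, 8, 55, 9, 21,
             9, 3, 85, 11, 14]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outliers(values):
    """Return the sorted values lying outside the assumed 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(v for v in values if v < low_fence or v > high_fence)


for name, sample in (("Cancer", cancer), ("No Cancer", no_cancer)):
    print(name, five_number_summary(sample), outliers(sample))
```

In a modified boxplot, each whisker would then stop at the most extreme value that is not in the list returned by `outliers`.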
The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.

Which terms best represent the data?
The median (with the IQR) best describes skewed data, while the mean (with the variance and standard deviation) best describes roughly symmetrical data.

Spread: how far from the mean the data stretch.
To calculate the variance, square the difference between the mean and each data value, add up the squares, and divide by n − 1.
Variance (s²): a measure of how far a set of numbers is spread out.
A variance of zero indicates that all the values are identical.
A small variance indicates a small spread, while a large variance means the numbers are spread out.
Standard deviation (s): shows how much variation or dispersion from the average exists; it is the square root of the variance.
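As a minimal sketch of the variance and standard deviation calculation, the Python below works through the ages (in months) of the 15 students from the histogram example. It assumes the sample versions of both measures, with a divisor of n − 1; the function names are illustrative.

```python
import math
import statistics

# Ages in months of the 15 students from the histogram example.
ages = [195, 204, 204, 192, 192, 193, 209, 194, 199, 204, 204, 192, 214, 222, 209]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = statistics.mean(values)
    squared_diffs = [(x - m) ** 2 for x in values]
    return sum(squared_diffs) / (len(values) - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print("mean:", statistics.mean(ages))
print("variance s^2:", sample_variance(ages))
print("standard deviation s:", sample_std_dev(ages))
```

Python's standard library offers `statistics.variance` and `statistics.stdev` for the same sample measures, so the hand-written functions are only there to make each step visible.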
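Returning to the histogram exercise on public-institution enrollment, the grouping and counting steps can be sketched the same way. The list below is the data as it appears in these notes, which comes to 49 values; the classes run from 40 to 100 in widths of 10, as the exercise asks. The function name and the text-bar output are illustrative choices, not part of the exercise.

```python
# Percentage of college students enrolled in public institutions,
# as listed in the exercise (49 values appear in the notes).
percent_public = [
    95, 73, 63, 92, 96, 75, 77, 87, 76, 52,
    81, 74, 91, 90, 65, 69, 75, 88, 80, 62,
    85, 95, 86, 93, 85, 73, 70, 82, 56, 80,
    80, 91, 89, 84, 76, 82, 55, 81, 60, 82,
    72, 89, 79, 89, 92, 81, 56, 84, 43,
]


def frequency_table(values, low, high, width):
    """Count integer values in equal-width classes [low, low + width), ... up to high."""
    n_classes = (high - low) // width
    counts = [0] * n_classes
    for v in values:
        if not low <= v < high:
            raise ValueError(f"{v} falls outside the range {low} to < {high}")
        counts[(v - low) // width] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(n_classes)]


# Print each class with its count and a row of stars as a sideways histogram.
for lo, hi, count in frequency_table(percent_public, 40, 100, 10):
    print(f"{lo:>3} to < {hi:<3}  {count:>2}  {'*' * count}")
```

Each class includes its lower endpoint and excludes its upper one, matching the "to <" labels used in the frequency tables in these notes.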