Chapter Four: Quantitative Data

• Distribution: gives the possible values of a variable and the frequency of each value.
  – Organized into bins
  – 3 types of displays: histogram, stem-and-leaf, dotplot

Leaving on a jet plane???

KEY: 5|5 = 5.0

Stem-and-Leaf Example
• Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?

Back-to-Back Stem-and-Leaf Plots

Dotplots
• A dotplot is a simple display. It just places a dot along an axis for each case in the data.
• The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot.
• You might see a dotplot displayed horizontally or vertically.

Center, Shape & Spread
• Shape: how many modes (humps)?
  – Unimodal: 1 peak
  – Bimodal: 2 peaks
  – Multimodal: 3 or more peaks
  – Uniform: all bins approximately the same height
• Symmetry versus skew: symmetric, skewed right, skewed left
• Look for unusual features
  – Outliers: values that are extreme and don't appear to belong with the rest of the data. They could be unusual values that need attention, or a mistake. Look at why. Outliers influence statistical analysis.
  – Gaps: warn that the data may not be homogeneous.
  – If the shape is unusual, the data can be re-expressed by methods such as a logarithm (skewed right) or a square (skewed left).

Cookie Data
Keebler: 8 8 10 10 11 12 12 12 13 13 13 14 15 15 15 16 16 16 17 17 19
Nabisco: 15 16 16 16 16 17 17 18 18 18 18 18 18 19 20 21 22 23 24 27 33

President     Age   President      Age   President      Age
Washington    57    Buchanan       65    Harding        55
J. Adams      61    Lincoln        52    Coolidge       51
Jefferson     57    A. Johnson     56    Hoover         54
Madison       57    Grant          46    FD Roosevelt   51
Monroe        58    Hayes          54    Truman         60
JQ Adams      57    Garfield       49    Eisenhower     61
Jackson       61    Arthur         51    Kennedy        43
Van Buren     54    Cleveland      47    LB Johnson     55
WH Harrison   68    B. Harrison    55    Nixon          56
Tyler         51    Cleveland      55    Ford           61
Polk          49    McKinley       54    Carter         52
Taylor        64    T. Roosevelt   42    Reagan         69
Fillmore      50    Taft           51    G. Bush/GW     64/54
Pierce        48    Wilson         56    Clinton        46
                                         Obama          47

The following data show the percentage of water quality tests that failed to meet water quality standards at 82 swimming beaches in California. The data are divided into beaches inside and outside Los Angeles County.

Mean or Median?
• Because the median considers only the order of the values, it is resistant to values that are extraordinarily large or small; it simply notes that they are among the "big ones" or "small ones" and ignores their distance from the center.
• To choose between the mean and the median, start by looking at the data. If the histogram is symmetric and there are no outliers, use the mean.
• However, if the histogram is skewed or has outliers, you are better off with the median.

What About Spread? The Standard Deviation
• A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.
• A deviation is the distance of a data value from the mean.
  – Since the deviations always add up to zero, we square each deviation and find an average of sorts of the squared deviations.

What About Spread? The Standard Deviation (cont.)
• The variance, denoted $s^2$, is found by summing the squared deviations and (almost) averaging them:

  $$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

• The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units!

What About Spread? The Standard Deviation (cont.)
• The standard deviation, $s$, is just the square root of the variance and is measured in the same units as the original data.

  $$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$

Tell: Shape, Center, and Spread
• Next, always report the shape of the distribution, along with a center and a spread.
  – If the shape is skewed, report the median and IQR.
  – If the shape is symmetric, report the mean and standard deviation, and possibly the median and IQR as well.
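The variance and standard deviation definitions above, and the point that the median resists outliers while the mean does not, can be checked with a short Python sketch. The data values and the function names (`sample_variance`, `sample_std_dev`) are made up for illustration and are not taken from the slides.

```python
import math
import statistics


def sample_variance(values):
    """Sum the squared deviations from the mean and divide by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the variance, so it is in the original units."""
    return math.sqrt(sample_variance(values))


# Hypothetical data set (an assumption, not from the slides).
data = [2, 4, 4, 5, 7, 8]
mean = sum(data) / len(data)

# The raw deviations cancel out, which is why they are squared first.
deviation_total = sum(y - mean for y in data)

variance = sample_variance(data)
std_dev = sample_std_dev(data)

# Adding one extreme value moves the mean a lot and the median very little.
with_outlier = data + [60]
mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"mean shift = {mean_shift:.2f}, median shift = {median_shift:.2f}")
```

Python's standard library provides `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor, so the hand-written functions here exist only to make the formula visible.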
GRAPHING STATIONS
• Find the following for your data set
  – Mean & Standard Deviation
  – 5 Number Summary
  – IQR
  – Range
• Produce missing display (Histogram or stem & leaf
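As a rough aid for the station work, the 5-number summary, IQR, and range can be sketched in Python. This assumes the "median of each half" quartile convention, leaving the overall median out of both halves when the count is odd; calculators and software often use other conventions and may report slightly different quartiles. The data set and the function name `five_number_summary` are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. When n is odd, the middle value is
    excluded from both halves before the quartiles are computed.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Hypothetical data set (an assumption, not from the slides).
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]

minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1                    # spread of the middle half of the data
data_range = maximum - minimum   # full spread, sensitive to outliers

print("5-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr, "Range:", data_range)
```

The mean and standard deviation for the same station can be obtained with `statistics.mean(sample)` and `statistics.stdev(sample)`.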