Chapter 4: Quantitative Data
Part 1: Displaying Quant Data (Week 2, Wednesday)
Part 2: Summarizing Quant Data (Week 2, Friday)

Displaying Quantitative Data

Qualitative data
• Having only a few categories makes this data easy to display
• Example: Gender has 2 categories (M/F)
• Example: Grade has 5 categories (A/B/C/D/F)
• Qual Tools: Pie Graphs, Frequency Tables, Bar Charts

Quantitative data
• Typically has many distinct values
• Example: Weight, Age, Height, Salary
• Therefore the above qualitative tools won’t work
• Quant Tools: Histograms, Stem & Leaf, Dot Plots

Histogram (p. 48)
• Group data into “bins”
• Example: Test Grades
  55 61 64 65 76 77 90 97
  56 62 64 67 76 78 91 98
  59 62 64 69 77 79 93 98
  60 63 64 75 77 85 94 99
• Bins: [55, 60) [60, 65) [65, 70) [70, 75) [75, 80) [80, 85) [85, 90) [90, 95) [95, 100)
*** Notice the observation 60 is placed in the bin [60, 65), not [55, 60) ***
*** This is the standard way to place observations that fall on a boundary ***

Stem and Leaf (p. 50)
• “Histograms provide an easy-to-understand summary of the distribution, but they don’t show the data values themselves”
• Stem and Leaf displays are the solution.
• Example: Test Grades (the same 32 grades as in the histogram example)
• Bins: [50, 60) [60, 70) [70, 80) [80, 90) [90, 100)
  5 | 569
  6 | 012234444579
  7 | 56677789
  8 | 5
  9 | 01347889
  Test Grades (1|2 means 12%)
*** See page 51 to learn how to build stem and leaf displays for bins that differ from 10 units in length ***

Dotplots (p. 52)
• Example: Test Grades (the same 32 grades as in the histogram example)

Part 2: Summarizing Quant Data (Week 2, Friday)

Summarizing Quantitative Data

Shape
• Mode
• Symmetry (Symmetric, Skewed)
Center
• Median
• Mean
Spread
• Range
• Quartiles
• IQR
Advanced Topics (used throughout rest of semester)
• Variance
• Standard Deviation

Mode (p. 53)
• “Does the histogram have a single, central hump or several separated humps? These humps are called modes.”
• Unimodal: only one central hump
• Bimodal: two central humps
• Multimodal: more than one central hump
• Uniform: all the bars are approximately the same height and no mode is obvious

Symmetry (p. 54)
• “Can you fold it along a vertical line through the middle and have the edges match pretty closely, or are more of the values on one side?”
• Possible shapes: Symmetric, Skewed to the Left, Skewed to the Right

Mean VS Median
• Mean: what we typically think of when we hear the word “average”. Add up the values and divide by the total number of values.
• Median: the number such that exactly half of the values are above it and half are below it.
• Example 1: Consider the test grades 83, 94, 98, 99, 60
  The mean can be found through: Mean = (83+94+98+99+60)/5 = 86.8
  The median can be found by first ordering the values from smallest to largest: 60 83 94 98 99
  Then selecting the number that is in the middle: Median = 94
• Example 2: Consider the test grades 83, 94, 98, 99
  The mean can be found through: Mean = (83+94+98+99)/4 = 93.5
  The median can be found by first ordering the values from smallest to largest: 83 94 98 99
  Then “averaging” the two numbers in the middle: Median = (94+98)/2 = 96

Range
• Largest Number - Smallest Number
• Example: Consider the test grades 83, 94, 98, 99
  The range can be found through: Range = 99 - 83 = 16
*** THE RANGE IS A NUMBER. “83 to 99” IS WRONG ***

Quartiles (p. 58)
• A special way of splitting the data into fourths
• Order the data, split it in half, and find the median of each half.
• “Lower Quartile” (or “Q1”) is the lower median
• “Upper Quartile” (or “Q3”) is the upper median
• Example 1: (even number of values) Find Q1 and Q3 of the following ages: 23 34 33 22 50 21 18 22
  First, order the numbers from lowest to highest: 18 21 22 22 23 33 34 50
  Next, split the data in half (four numbers in each half)
  First half: 18 21 22 22, so Q1 = Median of First Half = (21+22)/2 = 21.5
  Last half: 23 33 34 50, so Q3 = Median of Last Half = (33+34)/2 = 33.5
• Example 2: (odd number of values) Find Q1 and Q3 of the following ages: 34 33 22 50 21 18 22
  First, order the numbers from lowest to highest: 18 21 22 22 33 34 50
  Next, split the data in half (the median, 22, is included in both halves)
  First half: 18 21 22 22, so Q1 = Median of First Half = (21+22)/2 = 21.5
  Last half: 22 33 34 50, so Q3 = Median of Last Half = (33+34)/2 = 33.5

IQR (“Interquartile Range”)
• IQR = Q3 - Q1
• Single number (just like range is a single number)

Summation Notation (p. 62)
• Consider the grades: 80, 85, 90, 95.
• $\sum y$ represents a “summation” of the grades. That is: $\sum y = 80 + 85 + 90 + 95 = 350$
• From now on, we will use a new notation for the mean: $\bar{y} = \frac{\sum y}{n}$, where $\bar{y}$ (“y-bar”) represents the mean and $n$ is the number of values of $y$

Variance
• The variance of a variable is a measure of how “spread out” the data is.
• It is given by the following “complicated” formula: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
• Note that $s^2$ (“s-squared”) represents the variance.
• The equation is best understood through an example.

Variance Example
Consider the grades: 80, 85, 90, 95. Find the variance through the equation $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$.
• Mean: $\bar{y} = \frac{\sum y}{n} = \frac{80 + 85 + 90 + 95}{4} = 87.5$
• Deviations and squared deviations:

  $y$     $y - \bar{y}$     $(y - \bar{y})^2$
  80      -7.5              56.25
  85      -2.5              6.25
  90      2.5               6.25
  95      7.5               56.25

• Sum of squared deviations: $\sum (y - \bar{y})^2 = 56.25 + 6.25 + 6.25 + 56.25 = 125$
• Variance: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{125}{4 - 1} \approx 41.67$

Let’s take a closer look at what’s happening:
[Figure: number line from 30 to 100 with the values of $y$ and the mean $\bar{y}$ marked]
• When data is more spread out, the result is a higher variance
• When all of the values are the same, the variance is 0

Standard Deviation
• Standard deviation is the square root of variance: $s = \sqrt{s^2}$
• Note: the symbol for standard deviation is $s$
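
The variance example and the standard deviation definition can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `sample_variance` is made up for illustration, and it follows the slide formula with n - 1 in the denominator.

```python
import math

def sample_variance(values):
    """Hypothetical helper: s^2 = sum((y - y_bar)^2) / (n - 1), as on the slides."""
    n = len(values)
    y_bar = sum(values) / n                          # the mean, "y-bar"
    squared_devs = [(y - y_bar) ** 2 for y in values]
    return sum(squared_devs) / (n - 1)

grades = [80, 85, 90, 95]

mean = sum(grades) / len(grades)                     # 350 / 4 = 87.5
deviations = [y - mean for y in grades]              # [-7.5, -2.5, 2.5, 7.5]
squared = [d ** 2 for d in deviations]               # [56.25, 6.25, 6.25, 56.25]

variance = sample_variance(grades)                   # 125 / 3
std_dev = math.sqrt(variance)                        # s is the square root of s^2

print(round(variance, 2))                            # 41.67
print(round(std_dev, 2))                             # 6.45

# When all of the values are the same, the variance is 0.
print(sample_variance([90, 90, 90, 90]))             # 0.0
```

Python's standard library also provides `statistics.variance` and `statistics.stdev`, which use the same n - 1 denominator.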
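
The median, quartile, and IQR procedures from the Part 2 slides can be sketched the same way. The helper names `median` and `quartiles` are illustrative assumptions, and the code follows the slides' rule that, for an odd number of values, the middle value is included in both halves.

```python
def median(values):
    """Middle value of the ordered data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Hypothetical helper: Q1 and Q3 as the medians of the lower and upper halves.

    For odd n the overall median belongs to both halves, as in Example 2 on the slides.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                  # size of each half (rounds up for odd n)
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)

print(median([83, 94, 98, 99, 60]))      # 94
print(median([83, 94, 98, 99]))          # 96.0

even_ages = [23, 34, 33, 22, 50, 21, 18, 22]
odd_ages = [34, 33, 22, 50, 21, 18, 22]

q1, q3 = quartiles(even_ages)
print(q1, q3, q3 - q1)                   # 21.5 33.5 12.0
print(quartiles(odd_ages))               # (21.5, 33.5)

print(max(even_ages) - min(even_ages))   # range is a single number: 32
```

Library quantile functions often use other interpolation rules, so they can return slightly different quartiles than this split-in-half method.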
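
For the displays in Part 1, the binning rule (a value on a boundary goes into the bin that starts there) and the stem and leaf display can be sketched as follows. The function names `bin_counts` and `stem_and_leaf` are assumptions made for this example, and the stem and leaf helper only handles bins of width 10.

```python
from collections import defaultdict

grades = [55, 61, 64, 65, 76, 77, 90, 97,
          56, 62, 64, 67, 76, 78, 91, 98,
          59, 62, 64, 69, 77, 79, 93, 98,
          60, 63, 64, 75, 77, 85, 94, 99]

def bin_counts(values, start, stop, width):
    """Hypothetical helper: count values in half-open bins [low, low + width).

    Assumes every value satisfies start <= value < stop.
    """
    counts = {(low, low + width): 0 for low in range(start, stop, width)}
    for v in values:
        low = start + ((v - start) // width) * width   # 60 maps to [60, 65)
        counts[(low, low + width)] += 1
    return counts

def stem_and_leaf(values):
    """Hypothetical helper: tens digit is the stem, ones digit is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in stems.items()}

histogram = bin_counts(grades, 55, 100, 5)
for (low, high), count in histogram.items():
    print(f"[{low}, {high}): {'*' * count}")

for stem, leaves in stem_and_leaf(grades).items():
    print(f"{stem} | {leaves}")
```

The text histogram uses one asterisk per observation, so empty bins such as [70, 75) print with no bar.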