Download Week 2 - gozips.uakron.edu

Chapter 4: Quantitative Data
Part 1: Displaying Quant Data
(Week 2, Wednesday)
Part 2: Summarizing Quant Data
(Week 2, Friday)
Displaying Quantitative Data
Qualitative data
• Few categories made it easy to display this data
• Example: Gender has 2 categories (M/F)
• Example: Grade has 5 categories (A/B/C/D/F)
• Qual Tools: Pie Graphs, Frequency Tables, Bar Charts
Quantitative data
• Typically has many distinct values
• Example: Weight, Age, Height, Salary
• Therefore the above qualitative tools won’t work
• Quant Tools: Histograms, Stem & Leaf, Dot Plots
Displaying Quantitative Data
Histogram (p. 48)
• Group data into “bins”
• Example: Test Grades
55  61  64  65  76  77  90  97
56  62  64  67  76  78  91  98
59  62  64  69  77  79  93  98
60  63  64  75  77  85  94  99
Bins:
[55, 60)   [60, 65)   [65, 70)
[70, 75)   [75, 80)   [80, 85)
[85, 90)   [90, 95)   [95, 100)
*** Notice the observation 60 is placed in the bin [60,65) not [55,60)
*** This is the standard way to place observations that fall on the boundary
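The binning rule above can be sketched in Python. This is only an illustration: the function name `bin_counts` is hypothetical (not from the slides), and it assumes bins of equal width that include their left edge and exclude their right edge.

```python
# Test grades from the example (32 values).
grades = [
    55, 61, 64, 65, 76, 77, 90, 97,
    56, 62, 64, 67, 76, 78, 91, 98,
    59, 62, 64, 69, 77, 79, 93, 98,
    60, 63, 64, 75, 77, 85, 94, 99,
]


def bin_counts(values, start, stop, width):
    """Count values in left-closed bins [lo, lo + width) from start up to stop."""
    counts = {(lo, lo + width): 0 for lo in range(start, stop, width)}
    for v in values:
        if start <= v < stop:
            # Integer division sends a boundary value such as 60 to [60, 65).
            lo = start + ((v - start) // width) * width
            counts[(lo, lo + width)] += 1
    return counts


counts = bin_counts(grades, 55, 100, 5)
for (lo, hi), c in counts.items():
    # One star per observation gives a rough sideways histogram.
    print(f"[{lo}, {hi}): {'*' * c} ({c})")
```

Each printed row plays the role of one histogram bar, and the observation 60 is counted in [60, 65).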
Displaying Quantitative Data
Stem and Leaf (p. 50)
• “Histograms provide an easy-to-understand summary
of the distribution, but they don’t show the data values
themselves”
• Stem and Leaf displays are the solution.
Displaying Quantitative Data
Stem and Leaf (p. 50)
• Example: Test Grades
55  61  64  65  76  77  90  97
56  62  64  67  76  78  91  98
59  62  64  69  77  79  93  98
60  63  64  75  77  85  94  99
Bins:
[50, 60)   [60, 70)   [70, 80)   [80, 90)   [90, 100)
5 | 569
6 | 012234444579
7 | 56677789
8 | 5
9 | 01347889
Test Grades (1|2 means 12%)
*** See page 51 to learn how to build stem and leaf displays for bins that differ from 10 units in length ***
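A stem and leaf display for bins of 10 units can be built in a few lines of Python. This is a minimal sketch, assuming two-digit values where the tens digit is the stem and the ones digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Test grades from the example (32 values).
grades = [
    55, 61, 64, 65, 76, 77, 90, 97,
    56, 62, 64, 67, 76, 78, 91, 98,
    59, 62, 64, 69, 77, 79, 93, 98,
    60, 63, 64, 75, 77, 85, 94, 99,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (ones digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    return {
        stem: "".join(str(d) for d in digits)
        for stem, digits in sorted(leaves.items())
    }


display = stem_and_leaf(grades)
for stem, leaf in display.items():
    print(f"{stem} | {leaf}")
```

Unlike a histogram, every original value can be read back from the display. A stem with no observations is simply omitted by this sketch.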
Displaying Quantitative Data
Dotplots (p. 52)
• Example: Test Grades
55  61  64  65  76  77  90  97
56  62  64  67  76  78  91  98
59  62  64  69  77  79  93  98
60  63  64  75  77  85  94  99
Chapter 4: Quantitative Data
Part 2: Summarizing Quant Data
(Week 2, Friday)
Summarizing Quantitative Data
Shape
• Mode
• Symmetry (Symmetric, Skewed)
Center
• Median
• Mean
Spread
• Range
• Quartiles
• IQR
Advanced Topics (used throughout rest of semester)
• Variance
• Standard Deviation
Summarizing Quantitative Data
Mode (p. 53)
• “Does the histogram have a single, central hump or
several separated humps? These humps are called
modes.”
Unimodal: only one central hump
Bimodal: two central humps
Multimodal: more than one central hump
Uniform: all the bars are approximately the same height and no mode is obvious
Summarizing Quantitative Data
Symmetry (p. 54)
• “Can you fold it along a vertical line through the
middle and have the edges match pretty closely, or are
more of the values on one side?”
Symmetric
Skewed to the Left
Skewed to the Right
Summarizing Quantitative Data
Mean VS Median
• Mean: what we typically think of when we hear the word “average”. Add up the values and divide by the number of values.
• Median: the middle value once the data are ordered, so that half of the values are above it and half are below it.
• Example 1: Consider the test grades
83, 94, 98, 99, 60
The mean can be found through:
Mean = (83+94+98+99+60)/5 = 86.8
The median can be found by first ordering the values from smallest to largest:
60 83 94 98 99
Then selecting the number that is in the middle.
Median = 94
• Example 2: Consider the test grades
83, 94, 98, 99
The mean can be found through:
Mean = (83+94+98+99)/4 = 93.5
The median can be found by first ordering the values from smallest to largest:
83 94 98 99
Then “averaging” the two numbers in the middle:
Median = (94+98)/2 = 96
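Both examples can be checked with a short Python sketch. The function names `mean` and `median` are chosen here for illustration; the median follows the rule from the examples (middle value for an odd count, average of the two middle values for an even count).

```python
def mean(values):
    """Add up the values and divide by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [83, 94, 98, 99, 60]
example_2 = [83, 94, 98, 99]

print(mean(example_1), median(example_1))
print(mean(example_2), median(example_2))
```

Dropping the low grade of 60 moves the mean from 86.8 to 93.5 but moves the median only from 94 to 96.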
Summarizing Quantitative Data
Range
• Largest Number – Smallest Number
• Example: Consider the test grades
83, 94, 98, 99
The range can be found through:
Range = 99 – 83 = 16
*** THE RANGE IS A NUMBER. “83 to 99” IS WRONG
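As a quick check in Python (the name `value_range` is hypothetical, chosen to avoid clashing with Python's built-in `range`), the range comes out as a single number:

```python
def value_range(values):
    """Range = largest value minus smallest value (a single number)."""
    return max(values) - min(values)


grades = [83, 94, 98, 99]
print(value_range(grades))  # 16
```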
Summarizing Quantitative Data
Quartiles (p. 58)
• A special way of splitting the data into fourths
• Order the data, split it in half, and find the median of each half.
• “Lower Quartile” (or “Q1”) is the lower median
• “Upper Quartile” (or “Q3”) is the upper median
Example 1: (even number of values)
Find Q1 and Q3 of the following ages:
23 34 33 22 50 21 18 22
First, order the numbers from lowest to highest:
18 21 22 22 23 33 34 50
Next, split the data in half (four numbers in each half)
First half: 18 21 22 22
Q1 = Median of First Half = (21+22)/2 = 21.5
Last half: 23 33 34 50
Q3 = Median of Last Half = (33+34)/2 = 33.5
Example 2: (odd number of values)
Find Q1 and Q3 of the following ages:
34 33 22 50 21 18 22
First, order the numbers from lowest to highest:
18 21 22 22 33 34 50
Next, split the data in half (the median, 22, is included in both halves)
First half: 18 21 22 22
Q1 = Median of First Half = (21+22)/2 = 21.5
Last half: 22 33 34 50
Q3 = Median of Last Half = (33+34)/2 = 33.5
IQR (“Interquartile Range”)
• IQR = Q3 – Q1
• Single number (just like range is a single number)
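The quartile rule from the two examples can be sketched in Python as follows. The function name `quartiles` is hypothetical, and the sketch assumes the convention used in the examples: with an odd number of values, the overall median is included in both halves. Statistical software may use other quartile conventions and give slightly different answers.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    # For odd n, (n + 1) // 2 makes both halves include the middle value.
    half = (n + 1) // 2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)


ages_even = [23, 34, 33, 22, 50, 21, 18, 22]
ages_odd = [34, 33, 22, 50, 21, 18, 22]

q1, q3 = quartiles(ages_even)
print(q1, q3, q3 - q1)  # Q1, Q3, and IQR = Q3 - Q1

q1_odd, q3_odd = quartiles(ages_odd)
print(q1_odd, q3_odd, q3_odd - q1_odd)
```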
Summarizing Quantitative Data
Summation Notation (p. 62)
• Consider the grades: 80, 85, 90, 95.
• ∑y (represents a “summation” of the grades)
That is: ∑y = 80 + 85 + 90 + 95 = 350
• From now on, we will use a new notation for the MEAN:
$\bar{y} = \frac{\sum y}{n}$
where “y-bar” ($\bar{y}$) represents the mean and n is the number of values of y
Summarizing Quantitative Data
Variance
• The variance of a variable is a measure of how
“spread out” the data is.
• It is given by the following “complicated” formula:
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
• Note that “s-squared” represents the variance.
• The equation is best understood through an example.
Summarizing Quantitative Data
Variance Example
Consider the grades: 80, 85, 90, 95.
Find the variance through the equation:
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
First find the mean:
$\bar{y} = \frac{\sum y}{n} = \frac{80 + 85 + 90 + 95}{4} = 87.5$
Then fill in the table:
y     y - ȳ    (y - ȳ)^2
80    -7.5     56.25
85    -2.5      6.25
90     2.5      6.25
95     7.5     56.25
Add up the squared deviations:
$\sum (y - \bar{y})^2 = 56.25 + 6.25 + 6.25 + 56.25 = 125$
Finally, divide by n - 1:
$s^2 = \frac{125}{4 - 1} \approx 41.67$
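The same steps can be followed in Python. This is a minimal sketch; the function name `sample_variance` is hypothetical and it mirrors the table: mean, deviations, squared deviations, then division by n - 1.

```python
def sample_variance(values):
    """Variance s^2 = sum((y - y_bar)^2) / (n - 1)."""
    n = len(values)
    y_bar = sum(values) / n                       # the mean
    deviations = [y - y_bar for y in values]      # y - y_bar column
    squared = [d ** 2 for d in deviations]        # (y - y_bar)^2 column
    return sum(squared) / (n - 1)


grades = [80, 85, 90, 95]
s_squared = sample_variance(grades)
print(round(s_squared, 2))  # 41.67
```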
Summarizing Quantitative Data
Variance Example
Let’s take a closer look at what’s happening:
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
[Figure: the grades and their mean $\bar{y}$ marked on a number line from 30 to 100, next to the table of y, y - ȳ, and (y - ȳ)^2]
• When data is more spread out, the result is a higher variance
• When all of the values are the same, the variance is 0
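These two observations can be checked with `statistics.variance` from the Python standard library, which divides by n - 1 like the formula above. The "more spread out" and "all the same" data sets below are made up for illustration and do not come from the slides.

```python
from statistics import variance

grades = [80, 85, 90, 95]        # data from the example
spread_out = [60, 75, 100, 115]  # hypothetical: same mean (87.5), more spread
all_same = [80, 80, 80, 80]      # hypothetical: no spread at all

var_grades = variance(grades)
var_spread = variance(spread_out)
var_same = variance(all_same)

print(var_grades, var_spread, var_same)
```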
Summarizing Quantitative Data
Standard Deviation
Standard deviation is the square-root of variance:
s s
2
Note: the symbol for standard deviation is s
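Continuing with the grades 80, 85, 90, 95 from the variance example, a short Python sketch takes the square root of the variance and compares it with the standard library's `statistics.stdev`:

```python
import math
import statistics

grades = [80, 85, 90, 95]

s_squared = statistics.variance(grades)  # variance with n - 1 in the denominator
s = math.sqrt(s_squared)                 # standard deviation

print(round(s, 2))  # 6.45
```

The standard deviation is in the same units as the data (grade points here), while the variance is in squared units.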