Chapter 3 Displaying and Summarizing
Quantitative Data
Dot Plots
Definition
A dot plot is the representation of a set of data over a number line.
The number of dots over a number represents the relative quantity of
the value.
Example
The following are the test scores for a particular high school student
in their math class over the course of an academic year.
64 71 83 92 73 56 87 95 85 83
92 92 74 76 84 91 83 85 95
Solution
[Dot plot: Grades of a High School Student. Horizontal axis: Grades (50 to 100); vertical axis: Frequency (1 to 3).]
Pros and Cons
What’s Good
Gives a good idea of distribution
Preserves all of the data points
What’s Not Good
Tedious to plot
Can be hard to read
Not practical for large data sets
Distributions
Definition
A distribution is a representation of data vs. frequency. It shows all
possible values and how often they occur.
Now we want to concern ourselves with the analysis of the graphs.
We can analyze these in a much more constructive way than we could
with the graphs of categorical variables. Here we are analyzing the
distribution represented by the graph.
Distribution Analysis
1. Center: Which class contains the central element(s)
2. Shape: Number of peaks, skewness
3. Spread: Range = max − min
In our example, we can see a couple of things:
Range: Highest value - lowest value
Here, the range would be 95 − 56 = 39.
Center: The central value(s) is the center. It could be a value or a
class, depending on the type of graph.
Here, the center is the 10th value, since there are 19 data points in
the set. The value we seek is 84.
Shape: How many peaks are there? Is it roughly in the middle or
to one side?
Here we have one peak, so we would say the distribution is
unimodal. That peak is to the right, so the tail stretches out to the
left. We would say this graph is left skewed.
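The range and center found above can be checked with a few lines of Python. This is only a sketch using the 19 test scores from the dot plot example; the variable names are illustrative and not part of the original slides.

```python
# Test scores from the dot plot example.
scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]

ordered = sorted(scores)
n = len(ordered)

# Spread: range = max - min.
data_range = ordered[-1] - ordered[0]

# Center: with an odd number of values, the middle one sits at index n // 2
# (the 10th value when n = 19).
center = ordered[n // 2]

print(n, data_range, center)
```

Shape is left out on purpose: counting peaks and judging skewness is easier to do by eye from the plot.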
Stem-and-Leaf Plots
Similarities to Dot Plots
Gives idea of distribution
Preserves data
Not practical for large data sets
Differences from Dot Plots
Used for quantitative variables
Easier to read actual data elements
Can be used for comparisons of two data sets
Stem-and-Leaf Plot Example
Example
Using the same data set as we did for the dot plot, construct a
stem-and-leaf plot.
First thing we need to do is order the data elements.
56, 64, 71, 73, 74, 76, 83, 83, 83, 84,
85, 85, 87, 91, 92, 92, 92, 95, 95
Stem-and-Leaf Plot Example
Grades for a High School Student
9
8
7
6
5
These would be the stems for our plot.
Note: Repetition is extremely important.
Stem-and-Leaf Plot Example
Grades for a High School Student
9 1 2 2 2 5 5
8 3 3 3 4 5 5 7
7 1 3 4 6
6 4
5 6
Here we get the exact same answer for the range and the center,
although we only give the class in which the center lies, so we would
say that the center is in the 80’s. We get that the shape is again
unimodal and skewed left. It may look different, but since it
represents the same distribution, we expect similar answers.
Notice that the values on the right are essentially in columns - this is
what allows us to quickly see which classes have more elements.
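As a rough illustration of the construction, the sketch below groups each score by its tens digit (the stem) and collects the ones digits (the leaves) in order. The function name `stem_and_leaf` is made up for this example, and it assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves in increasing order]}, using the last digit as the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each row of leaves ordered
        plot[v // 10].append(v % 10)  # stem = all digits but the last, leaf = last digit
    return dict(plot)

scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]

plot = stem_and_leaf(scores)
for stem in sorted(plot, reverse=True):   # largest stem on top, as on the slide
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the stem is everything except the last digit, a score of 100 would land on stem 10 with leaf 0.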
More Stem-and-Leaf Plots
What if we had a 3 digit number? Suppose the student got a 100 on
the next exam?
Grades for a High School Student
10 0
9 1 2 2 2 5 5
8 3 3 3 4 5 5 7
7 1 3 4 6
6 4
5 6
Stem-and-Leaf Plots for Comparisons
Example
Suppose we wanted to compare the careers of Babe Ruth and Mark
McGwire in terms of their yearly home run totals to determine which
player was the more consistent long ball hitter. Make a back-to-back
stem-and-leaf plot to make this determination.
Ruth: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
McGwire: 49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29
Stem-and-Leaf Plots for Comparisons
Ruth v. McGwire
7
6
5
4
3
2
1
0
We set up the graph with one set of data increasing out to the right
and the other increasing out to the left. This way we have a
side-by-side comparison of the data sets.
Stem-and-Leaf Plots for Comparisons
Ruth v. McGwire
         Ruth |    | McGwire
              |  7 | 0
            0 |  6 | 5
        9 4 4 |  5 | 2 8
9 7 6 6 6 1 1 |  4 | 2 9
          5 4 |  3 | 2 2 3 9 9
          5 2 |  2 | 2 9
              |  1 |
              |  0 | 9 9
Who is more consistent and why?
Histograms
Used for quantitative variables
Tracks frequency and shows distribution
Does not preserve individual values
Good for a large number of values
Bars must be vertical and must touch
Histograms
Example
For our test scores example, construct a histogram and analyze the
distribution.
It is easier if the values are in order as we will be grouping them into
classes.
56, 64, 71, 73, 74, 76, 83, 83, 83, 84,
85, 85, 87, 91, 92, 92, 92, 95, 95
Histograms
We first want to create a frequency table. This is a collection of
non-overlapping classes and the frequency of observation in each of
those classes. We need to determine the following in this order:
Number of classes
The rule of thumb with the number of classes is to use the square root
of the number of observations in the data set.
√19 ≈ 4.36
So, we can use 4 or 5 classes. I tend to go up to the next integer to be
sure I have enough classes. So we will use 5 for our graph.
Histograms
Size of each class
We want them to be the same width so that the taller classes will be
known to have the most elements. If not then we have to find the area
of each rectangle to determine relative size.
To find the size, we divide the ‘range’ by the number of classes.
size = (95 − 56 + 1) / 5 = 40 / 5 = 8
Here the result is already a whole number, so we will use 8 for the class
width. When the result is not a whole number, we go to the next largest
integer. We may have extra if we round up, but that is better than not
having enough of a range in the classes to cover all of the data.
Histograms
Endpoints of each class
We start the smallest class with a left endpoint of 56, since that was
our minimum. Then, to find the next left endpoint, add 8 to 56.
Continue in this manner until we have 5 classes.
Grade Range   Frequency
56-
64-
72-
80-
88-
Histograms
Then, we subtract 1 from each left endpoint to find the right endpoint
of the previous class.
Grade Range   Frequency
56-63
64-71
72-79
80-87
88-95
Finally, we count how many elements go in each class.
Grade Range   Frequency
56-63         1
64-71         2
72-79         3
80-87         7
88-95         6
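The steps for building the frequency table can be sketched in Python as follows. This assumes whole-number data and rounds both the number of classes and the class width up, as the slides do; the variable names are illustrative only.

```python
import math

scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]

# Number of classes: square root of the number of observations, rounded up.
num_classes = math.ceil(math.sqrt(len(scores)))

# Class width: (max - min + 1) divided by the number of classes, rounded up.
low, high = min(scores), max(scores)
width = math.ceil((high - low + 1) / num_classes)

# Endpoints and counts: each class starts one width after the previous one
# and ends one below the next left endpoint.
table = []
for i in range(num_classes):
    left = low + i * width
    right = left + width - 1
    count = sum(left <= s <= right for s in scores)
    table.append((left, right, count))

for left, right, count in table:
    print(f"{left}-{right}: {count}")
```

Each tuple in `table` is (left endpoint, right endpoint, frequency), so the list can be compared line by line with the table above.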
Histograms
[Histogram: Grades of a High School Student. Horizontal axis: Grades, with class boundaries at 56, 64, 72, 80, 88 and 96; vertical axis: Frequency (2 to 8).]
We see the same range and shape. Here, we’d have no choice but to
give the class only for the center as we would lose the ability to see
individual values.
Using The Calculator
We can make some graphs on the TI-series graphing calculator. One
of the options we have is to make a histogram.
The advantages to using technology are that we don’t have to make
frequency tables or figure out how many classes we need, etc.
We do have to keep in mind, however, that the number of classes may
be different than when we make the graph by hand. We are using
more approximations when we work by hand than when we use
technology. But, this is an acceptable difference as long as the method
we use is valid.
How To Make Histograms On The TI
1. In the STAT menu, select EDIT
2. Input all of the data in the same column
3. Press 2nd and then MODE to quit to a blank screen
4. Press 2nd and Y= to get into the STATPLOT menu
5. Make sure all of the plots are off (if need be, use option 4)
6. Press ENTER on one of the plots
7. Turn the plot ON with the ENTER key
8. Select the histogram, which is the third graph in the top row
9. Make sure the XList is correct and then press ZOOM and then 9, which is the option for ZOOMSTAT
If you want to get data from the graph, like endpoints for classes or
for the frequency for a class, you can press TRACE and then use the
arrows to scroll around.
Histograms
Example
The EPA lists most sports cars in its “two-seater” category. The table
below gives the city mileage in miles per gallon. Make and analyze a
histogram for the city mileage.
Model             Mileage
Acura NSX         17
Audi Quattro      20
Audi Roadster     22
BMW M Coupe       17
BMW Z3 Coupe      19
BMW Z3 Roadster   20
BMW Z8            13
Corvette          18
Prowler           18
Ferrari 360       11
Thunderbird       17
Insight           57
S2000             20
Lamborghini       9
Mazda             22
SL500             16
SL600             13
SLK230            23
SLK 320           20
911               15
Boxster           19
MR2               25
Histograms
There are 22 cars and 4 < √22 < 5, so we would use 4 or 5 classes; here I
will choose 5. The size of each class would be
(57 − 9 + 1) / 5 = 49 / 5 = 9.8
So we will use 10.
Mileage   Frequency
9-18      11
19-28     10
29-38     0
39-48     0
49-58     1
Histograms
[Histogram: MPG for Sports Cars. Horizontal axis: MPG, with class boundaries at 9, 19, 29, 39, 49 and 59; vertical axis: Frequency (3 to 12).]
Center: Boundary between the first two classes
Range: 58 − 9 = 49
Shape: Unimodal, skewed right
Central Tendency
We will use three methods of measuring central tendency:
1. mean
2. median
3. mode
Example
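Find the mean, median and mode for the following data set.
[Stem-and-leaf plot of 16 values with stems 11, 10, 9, ..., 1, 0 from top to bottom. The leaves survive only in extraction order: 0 4 0 0 0 5 4 0 0 5 0 0 0 1 2 5. Which leaf belongs to which stem cannot be recovered from this transcript.]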
Solution
Mean x̄
This is the arithmetic center.
\[ \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k \]
This is just a fancy way of saying to add the 16 values together
and divide by 16. When we do we get
x̄ = 43.5
Solution
Median M
This is the geometric center. To find it, we line up all of the values
in order and find the middle one. If there is an odd number of
observations, then the median is the one in the middle. If there is
an even number of observations, the median is the mean of the
two ‘middle’ values. Here, we have
M = (32 + 35) / 2 = 33.5
Mode
This is the value(s) that occur most often, unless all values occur
the same number of times, in which case there is no mode. Here,
mode = 30
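The three measures can be sketched in Python. Since the data set for this example does not survive cleanly in the transcript, the sketch uses the 19 test scores from the dot plot example instead; that substitution is an assumption made only for illustration. The helper `modes` is a hypothetical name that follows the rule stated above: if every value occurs equally often, there is no mode.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value occurs the same number of times: no mode
    return sorted(v for v, c in counts.items() if c == top)

# Test scores from the dot plot example (not the data set of this example).
scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]

score_mean = mean(scores)      # sum of the values divided by how many there are
score_median = median(scores)  # middle value of the sorted list
score_modes = modes(scores)    # may contain more than one value

print(score_mean, score_median, score_modes)
```

For an even number of observations, `statistics.median` averages the two middle values, which matches the definition above.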
The Relationship Between Mean and Median
This picture indicates a serious drawback to using means: outliers.
The median is what we call resistant; an extreme value does not affect
the median. The mean, however, is not resistant.
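A small numerical experiment makes the idea of resistance concrete. The three grades below are the ones used in a later example in this chapter; the extreme value 0 is a made-up addition, included only to show how each measure reacts.

```python
from statistics import mean, median

grades = [72, 78, 84]
with_outlier = grades + [0]   # hypothetical extreme value, e.g. a missed exam

mean_before, median_before = mean(grades), median(grades)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean drops sharply; the median moves only a little.
print(mean_before, mean_after)
print(median_before, median_after)
```

Strictly speaking, the median can still shift slightly when a value is added, as it does here, but it shifts far less than the mean.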
When We Use Mean v. Median
1. If the distribution is symmetric, then mean = median, and we use the mean
2. If there are outliers or strong skewness, we use the median
Using the Mean
Example
Suppose you got an 84, 72 and 78 on your first 3 exams and wanted to
know what grade you needed to get on the fourth exam to have at least
an 80 average?
We want an average of 80 for the 4 grades. So, we need to solve for x
in
(84 + 72 + 78 + x) / 4 = (234 + x) / 4 = 80
So, we get
(234 + x) / 4 = 80 ⇒ 234 + x = 320 ⇒ x = 86
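The same algebra can be written as a short function: the total needed for the target mean, minus the points already earned. The name `needed_score` is hypothetical and used only for this sketch.

```python
def needed_score(previous, target_mean):
    """Score required on the next exam so the overall mean equals target_mean."""
    n = len(previous) + 1                    # number of exams once the next one is taken
    return target_mean * n - sum(previous)   # required total minus points already earned

needed = needed_score([84, 72, 78], 80)
print(needed)
```

Any score at or above this value gives an average of at least the target.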
Another Mean Example
Example
Suppose you had a 75 average through 4 tests and got an 85 on the 5th
test. What is your average now?
If we have a 75 average through 4 exams, then we have accumulated
75 × 4 = 300 points. So, if we wanted to know the average with this
5th grade, we’d have
x̄ = (300 + 85) / 5 = 385 / 5 = 77
Yet Another Mean Example
Example
Suppose you had a group of 11 people and the average age was 27. If
one of those people left, the average age of the remaining 10 was 29.
What is the age of the person who left?
Total age of the 11 people: 11 × 27 = 297.
Total age of the 10 people : 10 × 29 = 290
Difference is 297 − 290 = 7, so the person who left was 7 years old.
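Both of the last two examples rely on the same idea: a mean times a count gives back the total. A minimal sketch, with hypothetical function names:

```python
def updated_mean(n, current_mean, new_value):
    """Mean after adding one new value to n values whose mean is current_mean."""
    total = n * current_mean                 # recover the running total
    return (total + new_value) / (n + 1)

def removed_value(n_before, mean_before, mean_after):
    """Value that was removed, given the mean before and after removal."""
    return n_before * mean_before - (n_before - 1) * mean_after

new_average = updated_mean(4, 75, 85)    # the test average example
age_left = removed_value(11, 27, 29)     # the group age example
print(new_average, age_left)
```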
Means From Frequency Tables
Example
Find the mean of the following values.
Age         21   22   23   24   25
Frequency    5    8    4    1    2
We first count the total number of observations, which is 20. Then ...
x̄ = (21 × 5 + 22 × 8 + 23 × 4 + 24 × 1 + 25 × 2) / 20 = 447 / 20 = 22.35
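The frequency-table calculation is a weighted mean, which can be sketched as follows. Storing the table as a dictionary is just one convenient choice for this illustration.

```python
# Age -> frequency, copied from the table in the example.
freq = {21: 5, 22: 8, 23: 4, 24: 1, 25: 2}

total_observations = sum(freq.values())                       # 5 + 8 + 4 + 1 + 2
weighted_sum = sum(age * count for age, count in freq.items())
mean_age = weighted_sum / total_observations

print(total_observations, weighted_sum, mean_age)
```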
Box Plots and the 5-Number Summary
When dealing with the median, we measure variation with the
5-number summary. These 5 numbers indicate the maximum and
minimum, the median and the quartiles.
In order to find the five number summary, we first put the data
elements in order. Then we find the minimum and maximum, and
then the median.
minimum: smallest value of the set
maximum: largest value of the set
median: central value(s) of the set
first quartile Q1: median of all values smaller than the median
third quartile Q3: median of all values larger than the median
5-Number Summary Example
Example
Find the 5-number summary for the data from the first example.
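[The same stem-and-leaf plot as in the mean, median and mode example: 16 values with stems 11 down to 0. The leaves survive only in extraction order: 0 4 0 0 0 5 4 0 0 5 0 0 0 1 2 5.]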
5-Number Summary Example
Since the values are already in order, we only need to calculate the
values.
minimum   Q1   Median   Q3     maximum
4         30   33.5     52.5   110
Teddy Ballgame
Example
Ted Williams yearly RBI totals:
145, 113, 120, 137, 123, 114, 127, 159, 97, 126, 3, 34, 89, 83, 82, 87,
85, 43, 72
Find the 5-number summary for this set of data.
What do we do first?
We put the values in order:
3, 34, 43, 72, 82, 83, 85, 87, 89, 97, 113, 114, 120, 123, 126, 127,
137, 145, 159.
Then ...
Solution
Minimum   Q1   Median   Q3    Maximum
3         82   97       126   159
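The 5-number summary can be sketched in Python using the definition given earlier: Q1 is the median of the values below the median position, and Q3 is the median of the values above it. The function name `five_number_summary` is hypothetical, and the sketch assumes at least two values. Other software may use a different quartile convention and give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                  # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]    # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

rbi = [145, 113, 120, 137, 123, 114, 127, 159, 97, 126,
       3, 34, 89, 83, 82, 87, 85, 43, 72]

summary = five_number_summary(rbi)
print(summary)
```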
Box-and-Whisker Plots
How can we visually represent this summary of the data? We use box
plots, or box-and-whisker plots.
[Box-and-whisker plot: Ted Williams’ RBI Totals. Vertical axis: RBIs (20 to 160); category label: Teddy Ballgame.]
Using Technology
The box plot is another graph that we can construct using the TI-series
graphing calculator. We do everything the same as when constructing
a histogram until we reach the point where we choose the type of
graph.
There are two options for box plots.
1. Second row, first graph shows outliers (we will get to those soon)
2. Second row, second graph does not show outliers
We again use ZOOM and 9 to produce the graph.
Getting Statistics
We can also find the statistics we need using the calculator relatively
easily.
1. Input the data in the usual way
2. Press 2nd and MODE to quit to a blank screen
3. Press STAT, scroll to CALC, and select 1-Var Stats
4. You will see 1-Var Stats on the screen; now select which list the data is in by pressing 2nd and then the appropriate number 1-6, followed by the ENTER key
On this screen are some statistics we need, x̄ and Sx, and if we scroll
down, we will see the 5-number summary.
The Geometric View
From the minimum to Q1 is the bottom 25% of the observations
From Q1 to Q3 is the middle 50% of the observations
From Q3 to the maximum is the top 25% of the observations
We can look at this in other ways too:
The top half lies above the median
The top 75% lies above Q1
The bottom 75% lies below Q3
Box-and-Whisker Plot Example
Example
Construct a box-and-whisker plot for the data from the first example.
minimum   Q1   Median   Q3     maximum
4         30   33.5     52.5   110
Solution
[Box-and-whisker plot: Some Data Set. Vertical axis: Values (25 to 125); category label: Set 1.]
Analysis of Box-and-Whisker Plots
We can also look at the distribution like we did with histograms, but
in a limited way as we cannot really tell how many peaks. But we can
look at the spread and center (directly from the table) and we can look
at the skewness.
What is the range here? 106
What do we know about the distribution? Skewed right distribution.
Further, this right endpoint seems to be pretty far away, so we may
think it is an outlier. But how do we determine if it is analytically?
IQR Criterion
Definition
The IQR Criterion is an analytic way for us to determine if data points
are outliers based on a 5-number summary. To determine outliers, we
use
Q1 − 1.5IQR
and
Q3 + 1.5IQR
to give us endpoints of the acceptable data range, where IQR is the
Interquartile Range and
IQR = Q3 − Q1
These new endpoints are sometimes referred to as fences.
Using the IQR Criterion
So, basically, we are saying that any value no more than 1.5 times the
IQR (the range of the middle 50%) below Q1 or above Q3 is acceptable.
Anything outside that range is an outlier.
Example
Are there any outliers in the previous data set?
First we find the IQR, which is
$$\mathrm{IQR} = Q_3 - Q_1 = 52.5 - 30 = 22.5$$
and then we consider the new endpoints (fences).
$$Q_1 - 1.5\,\mathrm{IQR} = 30 - 1.5(22.5) = 30 - 33.75 = -3.75$$
$$Q_3 + 1.5\,\mathrm{IQR} = 52.5 + 1.5(22.5) = 52.5 + 33.75 = 86.25$$
Since 110 is larger than the upper fence of 86.25, we would say it is an
outlier.
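The fence check above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the quartiles Q1 = 30 and Q3 = 52.5 and the value 110 from the example; the function name `iqr_fences` and the multiplier parameter `k` are hypothetical names chosen here, not part of the original slides.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences given by the IQR Criterion."""
    iqr = q3 - q1  # interquartile range: spread of the middle 50%
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the worked example
lower, upper = iqr_fences(30, 52.5)
print(lower, upper)

# The value in question from the example
candidate = 110
is_outlier = candidate < lower or candidate > upper
print(is_outlier)
```

Keeping the multiplier as a parameter makes it easy to see how the fences move if a different cutoff were used, though 1.5 is the value these slides use.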
Standard Deviation
The standard deviation measures the variation in data by measuring
the distance that the observations are from the mean. The standard
deviation tells us how far we can expect the average observation to be
from the mean.
Absolute Deviation
$$\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
Standard Deviation
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
Standard Deviation and Variance
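Standard Deviation
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
The variance is the square of the standard deviation, although it won't
have a lot of use for our purposes.
Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$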
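The formulas for absolute deviation, variance, and standard deviation translate fairly directly into code. The following is a minimal sketch; the function names and the small data list are made up for illustration and do not come from the slides.

```python
import math


def mean_absolute_deviation(data):
    """Average absolute distance of the observations from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


def sample_std(data):
    """Sample standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))


# Hypothetical data, chosen only to keep the arithmetic easy to follow
values = [2, 4, 4, 6]
print(mean_absolute_deviation(values))  # (2 + 0 + 0 + 2) / 4
print(sample_variance(values))          # (4 + 0 + 0 + 4) / 3
print(sample_std(values))
```

Note that the absolute deviation divides by n while the variance and standard deviation divide by n − 1, matching the formulas as written.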
Finding the Standard Deviation
Example
Find the standard deviation of the daily caloric intake for a person
over the course of a week.
{1792, 1666, 1362, 1614, 1460, 1867, 1439}
First we find the mean.
$$\bar{x} = \frac{11200}{7} = 1600$$
Finding the Standard Deviation
Then, we need to find the difference between each of these values and
the mean, square each difference, and then sum the squares.
$x_i$    $(x_i - \bar{x})^2$    square      contribution
1792     (1792 − 1600)^2        192^2       36864
1666     (1666 − 1600)^2        66^2        4356
1362     (1362 − 1600)^2        (−238)^2    56644
1614     (1614 − 1600)^2        14^2        196
1460     (1460 − 1600)^2        (−140)^2    19600
1867     (1867 − 1600)^2        267^2       71289
1439     (1439 − 1600)^2        (−161)^2    25921
sum                                         214870
Finding the Standard Deviation
Next, we divide by n − 1 = 6.
$$s^2 = \frac{214870}{6} \approx 35811.67$$
$s^2$ is the variance.
Now we take the square root.
$$s = \sqrt{35811.67} \approx 189.24$$
So, a typical day's caloric intake is approximately 189 calories away
from the mean. Notice that we only care about magnitude and not
whether we are above or below the mean.
Mean and Standard Deviation
So what can we do with mean and standard deviation?
We can use them to relate individuals within our data set to the
distribution of the sample.
This is related to probability.
The total area underneath a distribution curve is always 1, so the
area under the curve over a region is the same as the proportion of
observations falling in that region.
We will see this better when we get to Normal distributions.
Uniform Distributions
For these, all values have the same probability of occurring. So, the
shape is that of a rectangle.
[Figure: Random Number Between 0 and 2. The density f(x) is a rectangle of height 1/2 over x from 0 to 2.]
Uniform Distributions
Example
If we have a uniform distribution for a random number to be chosen
between 0 and 2, what is the probability that the number selected is
between .5 and 1.1?
Uniform Distributions
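[Figure: Random Number Between 0 and 2. The density f(x) is a rectangle of height 1/2 over x from 0 to 2.]
What is the area of a rectangle? length × width.
What are our dimensions? The height is .5 and the width is
1.1 − .5 = .6, so the area is .5 × .6 = .3.
So, there is a 30% chance that the number randomly selected falls in
this region.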
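The rectangle-area reasoning above can be written as a small function. This is a minimal sketch, assuming a uniform distribution on an interval from `a` to `b`; the function name `uniform_probability` and its parameters are hypothetical and not taken from the slides.

```python
def uniform_probability(lo, hi, a=0.0, b=2.0):
    """Probability that a uniform random number on [a, b] lands in [lo, hi]."""
    height = 1 / (b - a)  # constant density, so the total area is 1
    # Width of the part of [lo, hi] that lies inside [a, b]
    width = max(0.0, min(hi, b) - max(lo, a))
    return height * width  # area of the rectangle


p = uniform_probability(0.5, 1.1)
print(round(p, 2))
```

Clipping the requested interval to the range of the distribution means that asking about a region partly outside 0 to 2 still gives a sensible answer, since no probability lies out there.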