Section 3: Analyzing Data with Fathom
Summary: Teachers analyze automobile data using Fathom to describe center and
spread using dot plots, box plots, and histograms. They will examine distributions of
univariate data of a quantitative attribute as well as comparison of distributions when a
qualitative attribute is added to separate distributions by categories. They will consider
pedagogical issues related to the use of various graphical representations, measures of
center and spread, and dynamic statistical software.
Objectives:
Mathematical: Teachers will be able to
• generate questions to explore given a data set;
• examine the distribution of a univariate data set using dot plots, box plots, and
histograms, including comparing distributions;
• describe the center and spread of a data set using resistant (median and
interquartile range) and nonresistant (mean and standard deviation) measures;
• develop a conceptual understanding of the usefulness of the standard deviation.
Technological: Teachers will be able to use Fathom to
• create dot plots, box plots, and histograms of univariate data;
• add a qualitative attribute to an existing graphical distribution of a quantitative
attribute, both as a key legend and as a category on the y-axis;
• plot statistical measures on graphs;
• compute basic statistics in a summary table.
Pedagogical: Teachers will
• consider the advantages and disadvantages of dynamic linking capabilities and
different graphical representations in Fathom;
• consider how different graphical representations and measures of center and
spread can draw attention to similarities and differences when comparing data
sets;
• consider the benefits and drawbacks of tasks to assist students in reasoning about
data.
Prerequisites: Material discussed in Section 1 of this module
Vocabulary: univariate data, bivariate data, interquartile range, deviations, standard
deviation, resistant measures, and nonresistant measures.
Technology Files: 2006_Vehicles.ftm
Emergency Technology Files: 2006_Vehicles_Part_3.ftm
Required Materials: Fathom v.2
_____________________________________________________________________________________________
Learning to Teach Mathematics with Technology: An Integrated Approach
Page 1
DRAFT MATERIALS
DO NOT DISTRIBUTE
Modified 9/22/2006
Data about an observed phenomenon comes in many different forms—often
frequencies, scores, codes, categories, or measurements. In addition, these different
forms of data can be represented in multiple ways. While viewing data in a table may
assist in examining individual cases, graphs and descriptive statistical measures may
help in analyzing and characterizing trends in the whole data set, or the aggregate.
Software tools have made the re-presentation of data in graphs and the calculation of
statistical measures quick and easy. Thus, rather than requiring valuable time to
construct graphical displays or compute measures, software tools facilitate quick
displays and computations that allow for more time to be spent on analyzing the data.
In Sections 1 and 2, we used the software TinkerPlots to assist in analysis of data. In
Sections 3 and 4, we will be using Fathom 2.0 (Key Curriculum Press, 2005).
TinkerPlots and Fathom use a similar interface to allow users to conduct data
analysis. TinkerPlots was designed to encourage users to create graphical displays by
implementing a series of actions, while Fathom allows users to easily create a variety
of standard graphical displays with fewer actions. While TinkerPlots has the
capability to display measures of center on a graph, Fathom includes a whole suite of
tools that can allow users to compute descriptive and inferential statistics. Thus,
Fathom is a much more powerful statistical tool, while TinkerPlots is a powerful tool
for analyzing data in graphical form. Like TinkerPlots, Fathom was created to allow
users to have dynamic control over data—meaning that as you change things in a
document, everything linked to what you are changing will update while you drag.
This linking between tabular data, graphical representations, and statistical measures
can be a powerful tool for exploring data in meaningful ways.
We will start this section by exploring univariate data (a single attribute in a data
set) and will use what we learn with univariate data to explore bivariate data (two
attributes in a data set).
Part 1: Asking Questions from Data
Increases in gas prices over the past several years may be one contributing factor to
many automobile manufacturers’ focus on improving vehicle miles per gallon (mpg)
performance and development of alternative types of engines that use a combination
of electricity and gasoline. Many people in America have also revisited the type of
vehicle they own, especially families who have longer commutes to the workplace.
To help us become more informed about the variety of vehicles on the market today,
we have assembled a collection of 41 vehicles manufactured in 2006. Most of the
vehicles (30) were rated as the top fuel economy leaders in the most popular vehicle
classes. These data are shown in the 2006 Vehicle Data table that follows.
Although a typical cycle of data analysis starts with forming questions and then
collecting data to answer the question, textbooks and teachers often use pre-collected
data sets with their students to provide an immediate springboard for exploring a
phenomenon and to begin analyzing data. When students are presented with a given
data set, they need to learn how to examine the data and formulate specific questions
that can be answered knowing the various quantitative and qualitative variables
(called attributes in Fathom) available about each case.
FOCUS ON MATHEMATICS
M-Q1. Review the data in the table. Generate at least four different questions that you
could explore by analyzing this data set.
FOCUS ON PEDAGOGY
P-Q1. Describe two classroom situations, one for which it would be beneficial to use
a pre-collected set of data, and one for which students should be collecting data
themselves. Provide a rationale for the benefits in each situation.
2006 Vehicle Data
Mfr         Model               Class    Trans   City  Hwy  AnnFuel  Engine    Weight
Chevrolet   Cargo Van           Van      Auto    15    20   1940     Standard  4894
Chevrolet   Passenger Van       Van      Auto    15    19   1940     Standard  5295
Ford        Escape Fwd          Suv      Manual  24    29   1270     Standard  3180
Ford        Escape Hybrid Fwd   Suv      Auto    36    31   1000     Hybrid    3627
Ford        Focus Wagon         Wagon    Auto    26    32   1178     Standard  2775
Ford        Focus Wagon         Wagon    Manual  26    34   1138     Standard  2771
Ford        Ranger Pickup       Truck    Auto    21    26   1436     Standard  3028
Ford        Ranger Pickup       Truck    Manual  24    29   1270     Standard  3028
Gmc         Savana Cargo Van    Van      Auto    15    20   1940     Standard  4894
Gmc         Savana Passen Van   Van      Auto    15    19   1940     Standard  5295
Gmc         Sierra Hybrid 2wd   Truck    Auto    18    21   1736     Hybrid    5038
Gmc         Sierra Hybrid 4wd   Truck    Auto    17    19   1835     Hybrid    5357
Honda       Accord              Sedan    Auto    24    34   1178     Standard  3168
Honda       Accord Hybrid       Sedan    Auto    25    34   1176     Hybrid    3589
Honda       Civic Hybrid        Compact  Auto    49    51   660      Hybrid    2875
Honda       Insight             Compact  Auto    57    56   591      Hybrid    1881
Honda       Insight             Compact  Manual  60    66   525      Hybrid    1850
Honda       Odyssey             Minivan  Auto    20    28   1436     Standard  4475
Hyundai     Elantra             Sedan    Manual  27    34   1099     Standard  2784
Hyundai     Sonata              Sedan    Auto    24    33   1221     Standard  3266
Hyundai     Sonata              Sedan    Manual  24    34   1178     Standard  3253
Isuzu       Ascender 4wd        Suv      Auto    22    26   1338     Diesel    4954
Jeep        Liberty 4wd         Suv      Auto    22    26   1338     Diesel    4011
Lexus       Rx 330 4wd          Suv      Auto    18    24   1800     Standard  4065
Lexus       Rx 400h 4wd         Suv      Auto    31    27   1138     Hybrid    4365
Mazda       B2300 2wd           Truck    Auto    21    26   1436     Standard  2994
Mazda       B2300 2wd           Truck    Manual  24    29   1270     Standard  2994
Mazda       Tribute 2wd         Suv      Manual  24    29   1270     Standard  3192
Merc-Benz   E320 Cdi            Sedan    Auto    27    37   1024     Diesel    3835
Mini        Mini Cooper         Compact  Auto    26    34   1242     Standard  2557
Mini        Mini Cooper         Compact  Manual  28    36   1242     Standard  2425
Pontiac     Vibe                Wagon    Manual  30    36   1000     Standard  2700
Saturn      Ion                 Compact  Manual  37    44   769      Diesel    2752
Suzuki      Aerio Awd           Compact  Auto    35    42   809      Diesel    2859
Toyota      Corolla Matrix      Wagon    Manual  30    36   1000     Standard  2679
Toyota      Prius               Sedan    Auto    60    51   601      Hybrid    2890
Toyota      Scion Xb            Wagon    Auto    30    34   1066     Standard  2470
Toyota      Tacoma 2wd          Truck    Auto    21    26   1436     Standard  3180
Volkswagen  Golf                Compact  Manual  37    44   769      Diesel    2972
Volkswagen  New Beetle          Compact  Auto    35    42   809      Diesel    2965
Volkswagen  New Beetle          Compact  Manual  37    44   769      Diesel    2884
Mfr: Manufacturer
Model: Model name
Class: Vehicle classes used to classify by passenger and cargo volume (cars) and gross
vehicle weight rating (trucks)
Trans: Either Automatic or Manual transmission
City: Estimated mpg in city driving
Hwy: Estimated mpg in highway driving
AnnFuel: Estimated annual fuel cost assuming 15,000 miles per year (55% city and 45%
hwy) and average fuel price
Engine: Standard (accepts unleaded gas), Diesel (accepts diesel), or Hybrid (runs part on
electricity and part on unleaded fuel)
Weight: Weight of vehicle, including standard equipment and all fluids, but no
passengers, cargo, or optional equipment
Data retrieved from 2006 Fuel Economy Guide,
http://www.fueleconomy.gov/feg/download.shtml
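
The AnnFuel definition can be read as a short calculation. The Python sketch below (Python is used here only for illustration; it is not part of Fathom) shows one plausible reading: split the 15,000 miles into 55% city and 45% highway, convert each share to gallons, and multiply by a fuel price. The function name and the price of $2.00 per gallon are hypothetical. The guide's actual average fuel price and rounding rules are not stated in this section, so the result is not expected to reproduce the AnnFuel column exactly.

```python
def annual_fuel_cost(city_mpg, hwy_mpg, price_per_gallon,
                     miles_per_year=15_000, city_share=0.55):
    """Estimate annual fuel cost from city and highway mpg ratings.

    Assumption: gallons used = city miles / city mpg + highway miles / hwy mpg.
    """
    city_gallons = miles_per_year * city_share / city_mpg
    hwy_gallons = miles_per_year * (1 - city_share) / hwy_mpg
    return (city_gallons + hwy_gallons) * price_per_gallon


# Chevrolet Cargo Van ratings from the table (City 15, Hwy 20).
# The $2.00 price is a hypothetical value chosen only for illustration.
cost = annual_fuel_cost(city_mpg=15, hwy_mpg=20, price_per_gallon=2.00)
print(f"Estimated annual fuel cost: ${cost:,.0f}")
```

Keeping the price as a parameter makes it easy for students to see how sensitive the annual cost is to fuel price as well as to mpg.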
Part 2: Examining Univariate Distributions
To explore the vehicle data using Fathom, open the 2006_Vehicles.ftm file. When
you open the file, you should see one icon: the collection icon, labeled 2006 Vehicles.
Double clicking on the collection icon opens the inspect collection window, which
provides a view of the values for the attributes for each case (shown in Figure 3.1).
The name of each attribute in the data set will be listed in pink with one attribute per
row. The inspection window contains 41 data cards, one for each of the cases in the
data set. The data cards are useful for examining each individual case. However, to do
analysis on the whole data set, it is helpful to view the data set in a table.
Tech Tip: Different cases can be viewed in the inspection window by clicking the right
arrow in the bottom left corner of the window. The number 41 indicates that there are a
total of 41 cases in the collection.
To view a collection of data as a table:
1. Click on the Collection icon to select the 2006 Vehicles collection.
2. From the object shelf, drag and drop a New Case Table into the document.
3. Click and drag a corner of the case table to resize it.
Tech Tip: If the Case Table does not show the data, drag and drop the name of the
collection onto the body of the case table.
[Figure 3.1: inspect collection window for the 2006 Vehicles collection]
[Figure 3.2]
[Figure 3.3: case table for the 2006 Vehicles collection, showing the first 14 cases with
the attributes Mfr, Model, Class, Trans, City, Hwy, AnnFuel, Engine, and Weight]
The first question we are going to examine about the 2006 vehicle data set is,
“How do these automobiles typically perform in their gas mileage when driving in
the city?”
In order to answer this question, we need a measurable attribute of the automobiles
that can be used to characterize performance in gas mileage when driving in the city.
The attribute that provides a measure of this characteristic is City, which gives the
estimated mpg reported by the US Environmental Protection Agency based on their
lab testing. When asking questions about a phenomenon, students may have difficulty
determining how to collect a measurable attribute that can be used to answer the
question. This same difficulty can occur when students have access to a pre-collected
data set and want to ask questions about the phenomenon. They may ask questions for
which no quantitative or qualitative attribute in the data is helpful in answering.
To answer our question, it would be useful to view the distribution of the City mpg
graphically. To construct graphs in Fathom, a user must place an attribute on a given
axis. This action will populate the graph with the data associated with this attribute.
The purposeful placement of an attribute onto an axis can help students connect the
numerical data to the graphical representation. The default graph in Fathom is a dot
plot.
To view data graphically:
1. Click and drag the Graph object from the object shelf. The graph will be blank.
2. Click and drag the attribute label (City) in the Case Table and drop it onto the
x-axis in the graph where it reads “Drop an attribute here”.
Figure 3.4
Tech Tip: You can change the scale of the axis by clicking and dragging the axis. When
the hand is vertical, this will translate the axis. When the hand is horizontal, dragging
will dilate the scale.
We currently have three representations of our data set: 1) Collection (shown as cards
in the inspection window), 2) case table, and 3) a dot plot. These representations of
data are linked together. This allows a user to locate a case across multiple
representations. In addition, changes in data in one representation will be
automatically changed in all representations of the data.
Tech Tip: You can undo a few changes by selecting the Undo command (ctrl-z) from the
Edit menu.
To change a data value:
1. From the case table, click on the row number for a case (e.g., to choose the
Ford Ranger Pickup, click on the number 7 to highlight that case row).
2. To change the data value graphically, click on the red data icon and drag it to
the left or right. Notice the change in the corresponding numerical value in the
table.
Since the 2006 vehicle data should be a fixed data set, we need to revert the data to its
original values. In Fathom, a data icon in a graph can be dragged to change its value;
however, it is possible to prevent a user from changing the data value. In the case of
the 2006 vehicle data, this would be wise.
To revert a collection:
1. Select the 2006 Vehicles collection object.
2. From the File menu, choose Revert Collection.
To prevent changes in a collection by dragging data icons:
1. Select any of the open objects (e.g., Collection, Table, Graph) in the
workspace.
2. Under the Collection menu, choose Prevent Changing Values in Graphs.
Although we want to keep the data set fixed, we can still take advantage of the linked
capabilities between the case table and the graph to answer a few questions about the
vehicles’ performance for City mpg. The linking of these representations allows
students to explore individual cases while also considering each case in relation to the
entire aggregate. Since many students initially are interested in and focus on individual
cases, it can be helpful to ask questions about individual cases that also allow students
to consider the relative position of these cases to the aggregate.
FOCUS ON MATHEMATICS
M-Q2. By clicking on the data icons on the graph, find which vehicles are at the low
and high ends of the distribution.
M-Q3. The Volkswagen New Beetle with Automatic transmission is a trendy favorite
for many Americans. By clicking on the case row for this vehicle in the case table,
use the graph to describe the New Beetle’s standing in City mpg relative to the other
vehicles.
M-Q4. There appears to be a cluster of 4 vehicles with a City mpg above 45. Clicking
and dragging a selection box around those data icons will highlight the vehicles in the
case table. Examine these 4 cases carefully. List two or three attributes these vehicles
have in common.
FOCUS ON PEDAGOGY
P-Q2. What are the advantages and disadvantages of having the representations
dynamically linked when working with a data set?
P-Q3. The linking of multiple representations in software like Fathom allows one to
simultaneously view the distribution of an entire data set while focusing on individual
cases. How might this feature help or hinder students’ analysis of the data?
Two other graphical representations often used to display quantitative attributes of
univariate data are histograms and box plots (also called box-and-whisker plots).
Viewing the data in these different representations may illuminate or obscure
different aspects of the distribution.
Drag down two more empty Graph objects into the workspace and drag and
drop the City attribute onto the x-axis of each graph. To assist in comparing the
three different representations, we are going to change one graph to be a box plot and
one to be a histogram.
To create a box plot:
1. From the drop down menu in the top right corner of the graph window, select the
Box Plot option.
To create a histogram:
1. From the drop down menu in the top right corner of the graph window, select the
Histogram option.
Figure 3.5
To adjust the bin width in a histogram:
1. Point to a vertical boundary for one bar in the histogram. The cursor will change to
a double arrowed line.
2. Either click and drag to adjust the bin width dynamically, or double click and enter
a value for the binAlignment and binWidth (see Figure 3.6; in our example we can
start the first bin at 15 and have a width of 5).
Figure 3.6
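
To see what the binAlignment and binWidth settings do to the City data, the Python sketch below (an illustration outside Fathom, not Fathom code) counts the 41 City values from the 2006 Vehicle Data table in bins that start at 15 and have a width of 5. It assumes each bin includes its left boundary and excludes its right boundary; how Fathom treats a value that falls exactly on a boundary is not described in this section. The helper name bin_counts is hypothetical.

```python
from collections import Counter

# City mpg for the 41 vehicles, in table order.
CITY = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25, 49, 57, 60,
        20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24, 27, 26, 28, 30, 37, 35,
        30, 60, 30, 21, 37, 35, 37]


def bin_counts(values, bin_alignment, bin_width):
    """Count values per bin, keyed by each bin's left boundary.

    Assumption: a bin covers [start, start + bin_width).
    """
    counts = Counter()
    for value in values:
        index = (value - bin_alignment) // bin_width  # which bin the value falls in
        counts[bin_alignment + index * bin_width] += 1
    return dict(sorted(counts.items()))


counts = bin_counts(CITY, bin_alignment=15, bin_width=5)
for start, n in counts.items():
    # One asterisk per vehicle gives a rough text histogram.
    print(f"{start:>2} to {start + 5:<2} {'*' * n}")
```

Changing bin_width (for example, to 10) and rerunning shows how the apparent shape of the distribution depends on the bin choice, which is the same effect produced by dragging a bin boundary in Fathom.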
The distribution of City mpg is shown in Figure 3.7 as a dot plot, box plot, and
histogram. If you click on a case or select a range of cases in any one of the graphs, the
corresponding cases will also be highlighted in the other graphs.
Figure 3.7
FOCUS ON MATHEMATICS
M-Q5. Compare the representation of the City data in the three graphs in Figure 3.7.
What characteristics of the distribution are more noticeable or are hidden in each
representation?
M-Q6. By only examining the graphs, what would you characterize as a typical City
mpg for these automobiles?
FOCUS ON PEDAGOGY
P-Q4. How can examining a distribution using three different linked graphical
representations be a help or hindrance for students?
P-Q5. How could students use the box plot to describe the center and spread of the
City mpg?
P-Q6. Describe how you could help students understand why the median is not
located in the center of the middle 50% of the data.
Although the median is displayed in the box plot, it may be helpful to display the
location of the median and mean on the graphs. Overlaying a statistical measure on a
graphical representation can provide students with a visual way of conceptualizing
the location of the measure in relationship to the entire aggregate. This can help
students understand better how the value of the measure represents the entire data set
and how its location is related to the distribution of data values.
To add a vertical line representing a measure to a graph:
1. With the graph window selected, choose the Graph menu and select the Plot Value
option.
2. A formula editor window will appear. In the textbox to the right of “Value=” type in
the function to compute the statistical measure. For our example, we will want to use
mean(City) and median(City).
Figure 3.8
Tech Tip: When typing formulas in the Formula Editor, if Fathom recognizes the
function, the text turns blue. If the name of an attribute is recognized as one in the data
set, the text turns pink.
You can add the mean and median measure to each of the three graphs. Figure 3.9
displays both measures overlaid on the dot plot.
Figure 3.9
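
The two plotted values can also be checked outside Fathom. The sketch below uses Python's standard statistics module as a stand-in for Fathom's mean(City) and median(City); it is an illustration only, and the variable names are our own. The City values are copied from the 2006 Vehicle Data table.

```python
from statistics import mean, median

# City mpg for the 41 vehicles, in table order.
CITY = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25, 49, 57, 60,
        20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24, 27, 26, 28, 30, 37, 35,
        30, 60, 30, 21, 37, 35, 37]

city_mean = mean(CITY)      # arithmetic average of all 41 values
city_median = median(CITY)  # middle value (21st of 41) once the data are sorted

print(f"mean(City)   = {city_mean:.2f}")
print(f"median(City) = {city_median}")
print(f"mean - median = {city_mean - city_median:.2f}")
```

Printing the difference between the two measures gives students a number to set beside the visual gap between the two vertical lines on the dot plot.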
FOCUS ON MATHEMATICS
M-Q7. Does either of the measures of center, mean or median, best represent a typical
City mpg for these automobiles? Defend your choice or provide an alternative way of
representing the typical City mpg.
Part 3: Comparing Distributions Using Center and Spread¹
Thus far, we have explored the City mpg for the entire aggregate of vehicles. It is
obvious from our analysis that some types of vehicles may have better City mpg than
others. In particular, we previously noticed that the four cases considered as outliers
were all Hybrid engines. Our data set contains vehicles of three different Engine
types: Standard, Diesel, and Hybrid. When students make an observation like this
about a data set, it often prompts them to explore a new question. This is an important
feature of EDA—analysis of data leads to more questions, which leads to further
analysis. Consider the following question:
Which type of engine gives vehicles the best fuel economy in the city?
To examine this question, we need to use two attributes in the data set: City mpg and
Engine type. We now have a question that needs us to use bivariate data with one
quantitative attribute (City) and one qualitative attribute (Engine). Having students
examine one quantitative and one qualitative attribute together in a data set can
provide a transition into working with bivariate data (two attributes) to answer a
question.
One way to begin examining the data with attention to the two attributes is to overlay
the qualitative attribute on top of the dot plot of the distribution of the City mpg. This
action will recolor the icons according to the categories of the qualitative attribute and
display a legend explaining the coloring.
To overlay a legend attribute on a graph:
1. Click and drag the name of an attribute from the case table and point to the interior
of the plot window. Directions will appear as shown in Figure 3.10. You only need to
use the Shift or Ctrl keys if it is not clear which type of attribute you are dragging, or
if you want to purposely use an attribute in a specific way (e.g., if the categories of a
qualitative attribute have been entered using numeric codes such as 1, 2, 3, you may
have to use the Shift key to force Fathom to recognize the data as categorical).
2. Release the mouse and notice the appearance of the legend and that different shapes
and colors are represented (see Figure 3.11). If the legend attribute is qualitative,
shapes and colors will be used; if the attribute is quantitative, a color gradient will
appear (we will explore this in a later section).
Figure 3.10
¹ The technology file “2006_Vehicles_Part_3.ftm” is available for students to use for
Part 3 if they were unable to complete Part 2 with the technology.
Figure 3.11
FOCUS ON MATHEMATICS
M-Q8. Viewing Figure 3.11, what can you say about the City mpg for each of the
three Engine types?
FOCUS ON PEDAGOGY
P-Q7. How can overlaying a categorical (qualitative) attribute on a dot plot of a
numerical (quantitative) attribute influence students’ ability to examine data?
The graph in Figure 3.11 is a good way for students to begin to coordinate two
attributes in a data set, and thus is a first step in learning to conduct bivariate data
analysis where one variable is quantitative and the other is qualitative. In Fathom,
students can also place the qualitative attribute on the y-axis and separate the data into
distinct categories. In our example, we can drag and drop the attribute Engine
onto the y-axis. This will allow us to view the distribution of City mpg for each
engine type separately (Figure 3.12).
Tech Tip: To remove a legend attribute from a graph, click on the plot window and from
the Graph menu, select Remove Legend Attribute.
Figure 3.12
FOCUS ON MATHEMATICS
M-Q9. What similarities and differences do you notice about the distributions of City
mpg for each of the Engine types?
M-Q10. Examine the location of the mean and median in the three distributions.
Explain the relative location of the mean and median to each other in the three
distributions.
Although dot plots are useful, changing the graphical representation to another form
may highlight different aspects of the distribution. Change the graphical display
from a dot plot to a box plot (See Figure 3.13).
Figure 3.13
FOCUS ON MATHEMATICS
M-Q11. What characteristics of the distributions beyond the measures of center are
highlighted when viewed as box plots?
FOCUS ON PEDAGOGY
P-Q8. How can examining the statistical measures of mean and median along with
the dot plot or box plot display of the distribution for each engine type assist students
in reasoning about center and spread when comparing the three groups?
P-Q9. How could you use the data to help students understand why in each of the
three box plots in Figure 3.13 the whiskers are not the same length?
In addition to comparing distributions graphically and displaying measures on a
graph, it is also helpful to use technology to compute and display the exact values of
several statistical measures. A summary table is useful in computing these statistics.
To create a Summary Table with several statistical measures:
1. Drag down an empty summary object.
2. Click and drag a quantitative attribute (City mpg) to the summary table. Once the
cursor is over the summary table, a down arrow and a right arrow appear. Drop the
quantitative attribute below the down arrow.
3. By default, the measure computed and displayed is the mean. There are three ways to
add more measures. From the Summary menu, you could select Add Formula, Add
Basic Statistics, or Add Five-Number Summary. For our example, choose Add
Five-Number Summary. You will likely have to resize the Summary table window.
4. You can also add a qualitative attribute to the Summary table to recompute the
statistics for each separate category. In our example, we want to drag and drop the
attribute Engine next to the right arrow. Again, you will likely have to resize the
window to view the statistical measures for each category.
Figure 3.14
Figure 3.15
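
The grouping that the Summary table performs in step 4 can be sketched outside Fathom as follows. This Python illustration (not Fathom code) splits the City values by Engine type and computes a five-number summary for each group. The quartile rule used here is Python's "inclusive" method, chosen as an assumption; Fathom's quartile rule is not described in this section, so the first and third quartiles may differ slightly from the values Fathom displays. The names five_number_summary and by_engine are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

# City mpg and Engine type for the 41 vehicles, in table order.
CITY = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25, 49, 57, 60,
        20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24, 27, 26, 28, 30, 37, 35,
        30, 60, 30, 21, 37, 35, 37]
CODES = ("S S S H S S S S S S H H S H H H H S S S "
         "S D D S H S S S D S S S D D S H S S D D D").split()
NAMES = {"S": "Standard", "H": "Hybrid", "D": "Diesel"}
ENGINE = [NAMES[code] for code in CODES]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


# Separate the City values by Engine category, as the Summary table does.
by_engine = defaultdict(list)
for mpg, engine in zip(CITY, ENGINE):
    by_engine[engine].append(mpg)

summary = {engine: five_number_summary(v) for engine, v in by_engine.items()}
for engine in sorted(summary):
    print(f"{engine:<9} n={len(by_engine[engine]):<3} {summary[engine]}")
```

Printing the group sizes alongside the summaries is a reminder that the three engine types are represented by different numbers of vehicles, which matters when comparing their spreads.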
Figure 3.16
Now we have two powerful tools to help us analyze and compare the distributions of
City mpg for the different Engine types. We can change the graphical display to show
dot plots, box plots or histograms or use the Summary Table to compute additional
statistical measures.
FOCUS ON MATHEMATICS
M-Q12. Use the graphical displays and the statistical measures to compare the
distributions of the City mpg for the three Engine types. Which type of engine gives
vehicles the best fuel economy in the city? Justify your reasoning.
FOCUS ON PEDAGOGY
P-Q10. What are some of the key features of this vehicle data set that make it useful
in helping students attend to important ideas of center and spread when comparing
data sets?
Asking students to compare distributions has been shown to be a useful technique for
helping students transition from considering data as individual cases to paying
attention to data as an aggregate. In addition, tasks that ask students to compare
distributions can help them consider characteristics such as shape and spread as useful
complements to measures of center.
Part 4: Understanding Spread of a Distribution
Pedagogy Tip: A detailed discussion of the IQR can be found in Section 1, Part 4.
When representing data in a box plot, students can focus on the median as a measure
of center and the interquartile range (IQR) as a measure of the spread of the middle
50% of the data, represented as the “box”. Thus, the IQR can help describe the spread
of a data set and is useful to consider in concert with the median as a measure of center.
When we use means to compare centers, it does not make sense to use
interquartile ranges, which are computed using the medians, to analyze spread.
Rather, a different measure of spread, the standard deviation, is often used. This
measure of spread takes into consideration how each data point deviates from the
mean.
Consider the diagram in Figure 3.17. There are five data points shown with values
{3, 5, 11, 12, 14}. The vertical red line represents the location of the mean, which has
a value of 9. From each data point, there is a horizontal black line from that point to
the mean, representing how much the value of that point deviates from the mean.
There are five values for the deviations {-6, -4, +2, +3, +5}. Notice that the sum of the
deviations from the mean is zero.
Figure 3.17
The standard deviation is a way of describing how the data points typically deviate
from the mean. However, since some of the deviation values are positive while others
are negative, it is not helpful to simply find the sum or the mean of these deviations.
One method that can be used to eliminate the negative deviations is to square each
deviation. Once deviations from the mean are squared, their sum will no longer be
zero. The squared deviations are represented as the areas of the gray squares in the
diagram, with values {36, 16, 4, 9, 25}.
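The arithmetic behind Figure 3.17 can be checked outside of Fathom. The following is a minimal Python sketch, not part of the Fathom activity; the variable names are chosen here for illustration only.

```python
# Data values from Figure 3.17
data = [3, 5, 11, 12, 14]

# The mean is the sum of the values divided by the number of values.
mean = sum(data) / len(data)

# Each deviation is the signed distance from a data value to the mean.
deviations = [x - mean for x in data]

# Squaring removes the negative signs; each square corresponds to
# the area of one gray square in the diagram.
squared_deviations = [d ** 2 for d in deviations]

print(mean)                     # 9.0
print(deviations)               # [-6.0, -4.0, 2.0, 3.0, 5.0]
print(sum(deviations))          # 0.0
print(squared_deviations)       # [36.0, 16.0, 4.0, 9.0, 25.0]
print(sum(squared_deviations))  # 90.0
```

The sum of the squared deviations, 90, is the quantity that is divided by n or n-1 when the variance is computed.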
Two common measures that are used for describing the spread or dispersion of data
around the mean are variance and standard deviation, both of which are based on
the mean of the squared deviations. The variance is the mean of the squared
deviations and can be found by dividing the sum of the squared deviations by n (if
you are working with the entire population) or n-1 (if you are working with a
sample) (see footnote 2). In order to have a measure of spread that is on the same
scale as the original data, we can take the square root of the variance. The result,
which is expressed in the same units as the data, is the measure called the standard
deviation. By default, Fathom will compute standard deviations and variances based
on a sample. However, there are formulas in Fathom that can be used to compute
these measures based on a population if so desired.
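To see how the choice of divisor changes the result, the following sketch uses Python's standard `statistics` module (not Fathom) on the five values from Figure 3.17. The functions `variance` and `stdev` divide by n-1, while `pvariance` and `pstdev` divide by n.

```python
import statistics

# Data values from Figure 3.17; the sum of squared deviations is 90.
data = [3, 5, 11, 12, 14]

# Sample versions divide the sum of squared deviations by n - 1 = 4.
sample_variance = statistics.variance(data)   # 90 / 4 = 22.5
sample_sd = statistics.stdev(data)            # square root of 22.5, about 4.74

# Population versions divide by n = 5.
population_variance = statistics.pvariance(data)  # 90 / 5 = 18
population_sd = statistics.pstdev(data)           # square root of 18, about 4.24

print(sample_variance, round(sample_sd, 2))
print(population_variance, round(population_sd, 2))
```

The sample values are always at least as large as the population values for the same data, because the same sum is divided by a smaller number.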
The median and interquartile range are considered resistant measures because they
are based on ranks in data and not numerical values. Therefore, they are not strongly
influenced by outliers. The mean and the standard deviation are considered
nonresistant measures because they are based on numerical values of each data
point. Therefore, a numerical value well outside of the range of most of the data will
affect each of these measures.
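The difference between resistant and nonresistant measures can be illustrated with a small experiment. The sketch below uses Python's `statistics` module and two hypothetical data sets invented for this illustration (they are not from the 2006 Vehicle data); the second replaces the largest value with an outlier. Quartiles are computed with the module's default method, which may differ from the convention Fathom uses.

```python
import statistics

def summarize(values):
    """Return resistant and nonresistant summary measures for a data set."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "median": median,
        "iqr": q3 - q1,
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }

# Hypothetical values chosen only for illustration.
original = [2, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [2, 4, 5, 6, 7, 8, 9, 10, 110]  # largest value replaced

before = summarize(original)
after = summarize(with_outlier)

for name in ("median", "iqr", "mean", "sd"):
    print(name, round(before[name], 2), round(after[name], 2))
```

In this example the median and IQR are identical for both data sets, while the mean and standard deviation both increase substantially when the outlier is introduced.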
FOCUS ON MATHEMATICS
M-Q13. What does the magnitude of the standard deviation tell you about the
dispersion of the data points in relationship to the mean?
M-Q14. Consider the following formulas for computing the variance ($s^2$) and
standard deviation ($s$) for data in a sample of size $n$, where $\bar{x}$ represents the
mean and $x_i$ is the $i$th data value.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Explain what each part of the formula represents with respect to the diagram in Figure
3.17 and the explanation above.
M-Q15. Explain why the 2006 Vehicle data are considered a sample rather than a
population.
2. When finding the variance and standard deviation of a population, we divide by n. However, most
data sets are a sample of the population. If we compute the variance for a sample in the same way that
we compute the variance of a population, we will have a biased estimator of the population variance.
That is, if we took all possible samples of n members and calculated the variance by dividing by n and
took the mean of those variances, this value would not be equal to the true value of the population
variance. Fortunately the correction for this bias is remarkably simple. To correct for this bias, we
divide by n-1 rather than n when we have a sample.
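The bias described in this footnote can be verified exactly for a very small case. The sketch below is an illustration under stated assumptions: it treats the five values from Figure 3.17 as an entire population and enumerates every ordered sample of size 2 drawn with replacement. The function name and the choice of sample size are made here for illustration.

```python
from fractions import Fraction
from itertools import product

def variance(values, divisor_offset):
    """Sum of squared deviations divided by (n - divisor_offset).

    divisor_offset=0 divides by n; divisor_offset=1 divides by n - 1.
    Fractions keep the arithmetic exact.
    """
    n = len(values)
    mean = Fraction(sum(values), n)
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - divisor_offset)

# Assumption: these five values are the whole population.
population = [3, 5, 11, 12, 14]
population_variance = variance(population, 0)

# Every ordered sample of size 2, drawn with replacement (25 samples).
samples = list(product(population, repeat=2))

mean_dividing_by_n = sum(variance(s, 0) for s in samples) / len(samples)
mean_dividing_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(population_variance)         # 18
print(mean_dividing_by_n)          # 9
print(mean_dividing_by_n_minus_1)  # 18
```

Under these assumptions, dividing by n underestimates the population variance on average, while dividing by n-1 gives an average that matches it exactly.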
M-Q16. Consider the distributions and location of the mean City mpg for each of the
three Engine types. Which engine type do you predict will have the largest standard
deviation? The smallest? Explain your reasoning based on how the data values
deviate from the mean for each Engine type.
M-Q17. Use a summary table to find the value of the standard deviation of the City
mpg for each of the three Engine types. What do these values tell you about the
spread of the City mpg? Do the calculations match your predictions?
FOCUS ON PEDAGOGY
P-Q11. Students are often introduced to the standard deviation through instruction on
how to compute its value based on the formulas shown in M-Q14. What is the benefit
of using a diagram such as the one in Figure 3.17 to help students conceptualize
standard deviation as a measure that describes typical deviation from the mean?
P-Q12. What are the advantages or drawbacks of having students examine several
distributions with the means indicated, as in M-Q16, and asking them to predict the
magnitude of each standard deviation before using Fathom to compute the exact values?
SUGGESTED ASSIGNMENTS
H-Q1 (Mathematical)
Use Fathom to create graphical displays and compute statistical measures to compare
the distributions of the Highway mpg for the three Engine types. Which type of
engine gives vehicles the best fuel economy on the highway? Justify your reasoning.
H-Q2 (Mathematical and Pedagogical)
The mean absolute deviation is often introduced in middle school as an introductory
measure of spread. While the mean absolute deviation is easy to compute, the
behavior of the absolute value function makes it a more difficult measure to use when
conducting more complex statistical analyses, and it is therefore infrequently used in
high school and college. Instead of using squaring as a method to eliminate the
negative deviations, the mean absolute deviation is computed by finding the
absolute value of each deviation from the mean and then finding the mean of these
values. Consider the collection of 9 cases with a mean of 5 shown in the table and dot
plot below.
a) What is the value of the mean absolute deviation (MAD) for this data
set?
b) What does the value of the MAD indicate about the spread of the data?
c) How would you need to change the values in the data set so that the
mean remains 5 but the MAD increases to 24/9?
d) Describe the benefits and drawbacks of using the mean absolute
deviation and the benefits and drawbacks of using the standard
deviation with middle and/or high school mathematics students.
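The computation described in H-Q2 can be sketched in a few lines of Python. Because the table for this question is not reproduced here, the example applies the procedure to the five values from Figure 3.17 instead; the function name is chosen for illustration.

```python
def mean_absolute_deviation(values):
    """Mean of the absolute deviations of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Data values from Figure 3.17 (mean 9, deviations -6, -4, +2, +3, +5)
data = [3, 5, 11, 12, 14]

mad = mean_absolute_deviation(data)
print(mad)  # (6 + 4 + 2 + 3 + 5) / 5 = 4.0
```

For these values the MAD is 4, compared with a sample standard deviation of about 4.74, since squaring gives more weight to the larger deviations.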
H-Q3. (Pedagogical)
Compare the pedagogical benefits and drawbacks of using Fathom and TinkerPlots to
explore univariate data with respect to the following points:
• The organization of data in a collection
• The linking of representations
• The representations available and the construction of graphs
• Use of color
• The ability to display measures on a graph
• Calculation of measures
H-Q4. (Pedagogical)
When is it advantageous to use the median and interquartile range as summary
measures? Mean and standard deviation? When examining a distribution, how can
you assist students in deciding if resistant or nonresistant measures are appropriate?