Workshop Guide for
Fathom Dynamic Statistics™ Software
Version 1.1
Contents
Introduction ...................................................................................................................... 1
Getting Started with Fathom ............................................................................................. 7
Tutorial 1: Data, Formulas, and Prediction—Wrist Versus Height .................................. 9
Tutorial 2: An Introduction to Exploring Data—Residents of Beverly Hills, California 15
Tutorial 3: Formulas, Functions, and Data—The Planets .............................................. 21
Tutorial 4: Simulation—Polling Voters ........................................................................... 25
Tutorial 5: Testing a Hypothesis .................................................................................... 29
Fathom Demo Documents ............................................................................................. 35
Sample Activities ............................................................................................................. 37
Authors: William Finzer and Tim Erickson
Editor: Jill Binker
Lead Programmer for Fathom: Kirk Swenson
Development Team for Fathom: William Finzer, Kirk Swenson, Nick Jackiw,
Jill Binker, Matt Litwin, Tim Erickson, Eugene Chen, Tony Thrall, Zach Teitler,
Denise Howald, Caroline Wales, and Vadim Keylis
Production Director: Diana Jean Parks
Production Coordinator: Ann Rothenbuhler
Cover Designer: Caroline Ayres
Publisher: Steven Rasmussen
Portions of this material are based upon work supported by the National Science Foundation under award number
III–9400091. Any opinions, findings, and conclusions or recommendations expressed in this publication are those
of the authors and do not necessarily reflect the views of the National Science Foundation.
© 2001 by Key Curriculum Press. Fathom’s Formula Editor © 1999 by Pacific Tech (www.PacificT.com).
All rights reserved.
™Fathom Dynamic Statistics and the Fathom logo are trademarks of KCP Technologies. The gold balls in Fathom’s
collections are taken from an After Dark® screen saver and are used with permission. After Dark is a registered trademark of
Berkeley Systems, Inc. All rights reserved. All other registered trademarks and trademarks in this book are the property of their
respective holders.
10 9 8 7 6 5 4 3 2
04 03 02 01
ISBN 1–55953–518-0
The demonstration edition of Fathom and this Workshop Guide are for demonstration purposes only.
Key Curriculum Press grants users the right to use and copy these for the purposes of evaluating the
materials or for use in teacher workshops. Copied materials must include this notice and Key
Curriculum copyright. The demonstration software and guide cannot be used for classroom or other
extended use. The full Fathom package is available from Key Curriculum Press, 1150 65th Street,
Emeryville, CA 94608, (800) 995-MATH (6284).
Introduction
As its title suggests, this guide is meant to be used in a workshop setting, one in which
participants can work together and in which a workshop leader can give
demonstrations, lead discussions, and answer questions. For learning to use Fathom
outside a workshop setting, try the Learning Guide that comes with Fathom, or the
Walkthrough Guide that comes with the demo.
To the Workshop Participant
Welcome to Fathom and to the world of Dynamic Statistics. The tutorials in this
guide will introduce you to the most important features of Fathom; they begin on
page 9.
As you work through the tutorials, you will encounter watch icons such as the one at
left. These indicate a pausing place. Your workshop leader is likely to ask that when
you reach these watch icons, you stop working through the tutorial and explore what
you have learned thus far. When most people have reached the watch icon, your
workshop leader will probably have everyone pause so that you can discuss things as a
group or the workshop leader can do a brief demo.
At the end of the workshop, take this booklet and the Fathom Demo CD with you to
explore and evaluate Fathom in more detail. But remember that the Demo CD and
Workshop Guide are for demonstration purposes only, and are not for classroom or
other extended use. And, as you become comfortable using Fathom, we encourage
you to help others by leading workshops yourself.
If you are waiting for the workshop to begin, please read the section entitled “The
Structure of Fathom,” beginning on page 3.
To the Presenter
We have found that a successful and simple workshop format for Fathom involves
little more than turning participants loose to work in pairs, with the tutorials in this
guide. (If you prefer that participants work singly, have them identify a partner with
whom they can consult when they have problems.) It is extremely useful to have a
computer that can project to a screen or large monitor that everyone can see.
The watch icons interspersed throughout the tutorials indicate reasonable pausing
places where you can ask everyone to stop work for a moment to answer questions
about the preceding section or to do a brief demonstration. It’s especially useful to
have one or more participants come forward and show something they have found in
front of the entire group.
In the later tutorials, as participants have become more independent, you may want to
skip watch icons or eliminate their use altogether.
Key Curriculum Press gives copy permission to any presenter who wants to give this
Workshop Guide to workshop participants. Key Curriculum Press also gives
permission to install multiple copies of the Fathom Demo or a single network server
copy for the purposes of giving a workshop.
What Makes the Fathom Demo Different from the “Real Thing?”
The Fathom demo has all the features of the full version with a few important
exceptions: You can’t print, save, or export your work. You can do investigations,
open documents, import or enter data, and use all the features of the full version, but
you cannot save any documents you create. You also cannot export data you generate
or copy pictures of Fathom objects to paste into another application. Finally, you
cannot print your work.
Fathom Workshop Guide
©2001 Key Curriculum Press
1
The full version comes with important and useful documentation.
• The Learning Guide contains a much more complete introduction to Fathom’s
features.
• The Reference Manual provides how-to instructions for use of Fathom.
• Data in Depth: Exploring Mathematics with Fathom, by Tim Erickson, contains 300
pages of classroom-ready activities in blackline-master form.
The Structure of Fathom
If you’re waiting for the workshop to begin, this section will help prepare you for the
workshop itself. Don’t worry if you don’t have time to read it before the workshop
begins. It makes equally good, perhaps even better, reading after the workshop.
The Fathom Window
[Figure: the Fathom window, with labels for the tool shelf, a case table, a text object, a scatter plot graph, a histogram graph, a collection that is opened up, and a collection made small (iconified).]
A Fathom document lives in a window and contains objects. There are several
different kinds of objects, led by the collection (which holds the data). The collection
looks like a rectangle with gold balls in it, or, if you make it small, a box of gold balls.
Graphs, tables, and statistical tests are objects, too. But they do not contain data. They
just provide ways of looking at or altering data. (Deleting a graph or table leaves your
data intact; deleting the collection deletes your data.)
Sliders are also objects; they control variable parameters, such as the coefficient of a
function or a probability in a simulation.
Text resides in objects, too. Text objects allow you to add explanatory notes.
When you first use Fathom, collections are boring and may even appear unnecessary.
Why can’t the data just live in a table, like in a spreadsheet? The answer is deep; one
way to look at it is that Fathom lets you distinguish between the way things are and
the way they look. A table is just one representation of the data, and we didn’t want to
give it the privileged position of being the data’s “native” appearance. So we invented
another representation, the gold ball. Another reason for the existence of collections
is that they make some more advanced data manipulation, such as sampling, easier. For
example, you sample from among the gold balls in the collection, and the sampled balls go
into another collection.
Cases and Attributes
Each gold ball is a case. A case has attributes (also known as variables, but we had to
pick a word and stick with it) and attributes have values. If the values are all numeric,
then it’s a continuous (measurement) attribute; if the values are text, it’s a categorical
attribute. If your cases are people, height and age are continuous; sex and race are
categorical.
Whether an attribute is continuous or categorical determines many things—including
the kinds of graphs you can make to represent it and tests and estimates you can
perform on it.
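The rule for telling continuous attributes from categorical ones can be sketched in a few lines of Python. This is only an illustration of the idea, not how Fathom itself is written. The names attribute_kind, kinds, and people are made up for this sketch, and treating a mix of numbers and text as categorical is an assumption.

```python
def attribute_kind(values):
    """Return 'continuous' if every value is numeric, else 'categorical'.

    Assumption for this sketch: a mix of numbers and text counts as categorical.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "continuous" if numeric else "categorical"


def kinds(cases):
    """Classify every attribute of a list of cases (one dict per case)."""
    names = cases[0].keys()
    return {name: attribute_kind([case[name] for case in cases]) for name in names}


# Hypothetical cases: each dict stands for one gold ball.
people = [
    {"height": 64.5, "age": 31, "sex": "F"},
    {"height": 70.0, "age": 45, "sex": "M"},
]

print(kinds(people))
```

Here height and age come out continuous and sex comes out categorical, matching the example of people as cases.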
The Inspector
When you double-click a collection, its inspector
appears. It will show you the names and values of
the attributes, one case at a time. You may edit the
values here. You can also add measures (more on
this later) and comments to the collection, change
how the cases look, and control advanced functions,
such as sampling, using the inspector.
If you’re typing a lot of data into Fathom, the
inspector is not the most convenient place for it;
you’ll want to use a case table instead.
[Figure: Inspector (Windows platform)]
Case Tables
The case table is the familiar tabular view of data. Each
case appears as a row; each attribute as a column. You can
enter and edit data quickly in a case table and add new
attributes.
You can also rename attributes, move columns by
dragging them, resize columns by dragging their borders,
sort the data, and even hide attributes to make your table
less crowded. But though you can change data through a
case table, it is only one view of the data. If you delete the
case table, the data are still present—in the collection.
Graphs
Fathom also lets you view data graphically. The graph object supports various kinds
of charts and plots. You tell Fathom what to graph by dragging attribute names to an
axis of the graph. You specify the graph type by using a menu in the corner of the
graph object.
The configuration of attributes determines what
kinds of graphs are available. For example, if you put
age on an axis and leave the other axis blank, you can
get a box plot or a histogram—because age is a
continuous attribute. But if you put sex on that same
axis, you can’t get a histogram—because sex is
categorical. You get a bar chart instead.
You can also add things to graphs, such as functions.
Functions can be anything you can express with a
formula (see the next section) as well as the more
familiar lines such as the least-squares linear
regression line or the median-median line.
In most kinds of graphs, you can change the data by dragging unless the data are
locked. So, for example, you can drag points in a scatter plot and see how they affect
a least-squares line.
Ubiquitous Formulas
Formulas are everywhere in Fathom. You can use formulas as filters to tell which
cases you want to see and you can use them to compute values (including random
values) for attributes. You can use formulas to devise statistical measures and to plot
formulas as functions in graphs.
You write formulas in the formula editor. The
formula editor appears when you double-click the
place where a formula belongs, or when you select
certain menu items. For example, to make an
attribute that is the logarithm of income, make a
new attribute, then select its name. Choose Edit
Formula from the Edit menu. In the formula editor,
enter log(income) and click OK. The new attribute will
fill with the logs of the incomes. Later, any change
in the incomes will instantly be reflected in their
logs.
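One way to picture an attribute defined by a formula is as a value that is recomputed from the stored data every time you look at it. The Python sketch below illustrates that idea only. The Collection class, its methods, and the attribute name logIncome are hypothetical, and it assumes that log means the base-10 logarithm.

```python
import math


class Collection:
    """A toy collection: stored values plus attributes defined by formulas."""

    def __init__(self, cases):
        self.cases = cases      # list of dicts holding the stored values
        self.formulas = {}      # attribute name -> function of one case

    def define(self, name, formula):
        """Attach a formula to an attribute name."""
        self.formulas[name] = formula

    def value(self, index, name):
        """Read an attribute; formula attributes are recomputed on every read."""
        case = self.cases[index]
        if name in self.formulas:
            return self.formulas[name](case)
        return case[name]


# Hypothetical incomes; logIncome plays the role of the new attribute.
incomes = Collection([{"income": 1000.0}, {"income": 100000.0}])
incomes.define("logIncome", lambda case: math.log10(case["income"]))

print(incomes.value(0, "logIncome"))
incomes.cases[0]["income"] = 100.0    # change the data ...
print(incomes.value(0, "logIncome"))  # ... and the log follows along
```

Because the log is computed from the income on demand, there is never a stale copy to update.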
The formula editor supports a wide array of
functions, from trig functions to statistical
distributions to conditionals such as if() and switch().
They are all listed in the function and attribute list,
a pane at the right in the editor. Double-click an item in the list to enter it into the
formula pane above. You can also enter it by typing. The formula editor supports
syntax coloring, that is, it displays the names of functions and attributes in special
colors. If you type an attribute name or function and it remains black, Fathom does
not recognize it (you probably have a typo).
Using the Formula Editor
[Figure: The Effect of Selection, with Before and After columns.]
The peculiar nature of mathematical formulas makes editing them different from
editing ordinary text. First, certain characters (parentheses, quotes, and absolute-value signs) must appear in pairs. Fathom always makes the second of the pair when
you enter the first. Second, grouping and order of operations require special editing
capabilities. New users of Fathom can use their intuition most of the time, but there is
one new principle they must learn: The effect of some keys depends on what, if
anything, is selected when you press them. For example, if a + b—without
parentheses—is selected, and you press (, you get (a + b)—with parentheses. If you
press * (for multiplication), you get (a + b)*.
In each of the examples at left, the user types /4.
Finally, you need to click OK to close the formula editor. But sometimes you want to
see the result of your formula—to see if it’s right—before closing the editor. In that
case, click Apply on the screen or press Enter on your keyboard; this applies your
changes, leaving the editor open. However, you must close the formula editor before
you can go on to do other things.
The formula editor is not case sensitive; that is, it doesn’t care whether you type capital
or lowercase letters.
Measures
A regular attribute is a case attribute: It has a
separate value for each case in the collection.
But there are also measures—attributes with
one value for the entire collection. For
example, in a collection of five dice, each
might have an attribute called number. But
the collection itself could have a measure, called
total, that applies to the entire collection. It
would have a formula: sum(number). So, you
can think of measures as statistics that
summarize your collection, though they can serve other functions as well.
You create measures by using the inspector. Double-click a collection to make the
inspector appear, then go to the Measures pane.
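The difference between a case attribute and a measure can be sketched in Python. This is an illustration only; the dice values are made up, and the function name total simply mirrors the measure in the dice example.

```python
# Five hypothetical dice: number is a case attribute, one value per case.
dice = [{"number": n} for n in (3, 5, 1, 6, 2)]


def total(cases):
    """A measure: one value for the whole collection, like sum(number)."""
    return sum(case["number"] for case in cases)


print(total(dice))
```

Each die keeps its own number, while total gives a single value that summarizes all five.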
Derived Collections
Some of Fathom’s collections fill with data automatically, according to rules you
specify. These are called derived collections. There are several kinds. The two most
important are samples and measures collections. A sample is just what it sounds like: a
collection that is a sample of some other collection, called its source. Select the source
and choose Sample Cases from the Analyze menu. The sample collection appears.
Control how many cases it samples and whether it samples with or without
replacement in its inspector.
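To see what the two sampling choices mean, here is a minimal Python sketch using the standard random module. It is not Fathom's own code. The function name sample_cases, its parameters, and the names in source are hypothetical, and the default of sampling with replacement is just a choice made for the sketch.

```python
import random


def sample_cases(source, size, with_replacement=True, rng=None):
    """Draw a sample of cases from a source collection (a list)."""
    rng = rng or random.Random()
    if with_replacement:
        # Each draw is independent, so the same case can appear more than once.
        return [rng.choice(source) for _ in range(size)]
    # Without replacement, no case repeats, so size cannot exceed len(source).
    return rng.sample(source, size)


# Hypothetical source collection.
source = ["Ana", "Ben", "Cal", "Dee", "Eli"]
print(sample_cases(source, 3, with_replacement=False))
print(sample_cases(source, 8, with_replacement=True))
```

Sampling without replacement can never give more cases than the source holds; sampling with replacement can.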
The measures collection is the key to
simulation and analysis. It converts measures
into case attributes—so you can record
statistics (measures) about your collections.
(It’s a bit tricky and is best understood after
you’ve tried it.) Suppose you have a collection
with five (randomly generated) dice in it and a
measure, called total, that contains their sum. If
you connect a measures collection to that
source collection, the measures collection can
record the total as a case. If you tell Fathom
to collect 100 measures, it will instruct the
dice to re-roll 100 times, it will compute each total, and it will make a collection of
those 100 values. You have built up the distribution of that statistic.
To make a measures collection, select the source collection and choose Collect Measures
from the Analyze menu. Again, control how many measures it collects in its inspector.
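The dice example can be imitated in Python to show what collecting measures amounts to: re-roll, compute the measure, record it, and repeat. This is a sketch under assumptions, not Fathom's implementation. The function names roll_dice and collect_measures are hypothetical, and the fixed seed is only there to make the run repeatable.

```python
import random


def roll_dice(rng, count=5):
    """Re-roll the source collection: five randomly generated dice."""
    return [rng.randint(1, 6) for _ in range(count)]


def collect_measures(times, rng=None):
    """Re-roll the dice the given number of times and record each total."""
    rng = rng or random.Random()
    return [sum(roll_dice(rng)) for _ in range(times)]


# Each entry in totals plays the role of one case in the measures collection.
totals = collect_measures(100, random.Random(0))
print(min(totals), max(totals), sum(totals) / len(totals))
```

The list of 100 totals is the built-up distribution of the statistic; you could graph it like any other attribute.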
Getting Started with Fathom
The Fathom Demo CD contains an installer that will install Fathom on the hard drive
of your computer. Alternatively, you can run Fathom directly from the CD.
System Requirements
Windows
• Pentium-based system or equivalent
• Windows 95, Windows 98, Windows 2000, or Windows NT 4
• 16 megabytes of memory
• CD-ROM drive
Macintosh
• OS 8 or better
• 16 megabytes of RAM
• CD-ROM drive
Install Fathom
Windows
1. Insert the Fathom CD into your CD drive.
2. Double-click the My Computer icon on your desktop.
3. Double-click the CD icon. (It may be labeled either D: or Fathom.)
4. Double-click the Setup icon and follow the directions on the screen.
Macintosh
1. Insert the Fathom CD into your CD drive.
2. Double-click the CD icon.
3. Double-click the Installer icon and follow the directions on the screen.
Starting Fathom
When Installed on the Computer’s Hard Drive
The installer places a shortcut or alias to Fathom on your desktop. (It looks like a little
gold ball.) Double-click it. The Fathom application will launch.
From the CD
The top level of the CD contains a Fathom folder. Double-click this folder to open it.
Inside the folder is the Fathom application icon. Double-click it.
Note that depending on the speed of the computer and its CD drive, it can take
considerable time to start the Fathom demo.
Tutorial 1: Data, Formulas, and Prediction—Wrist
Versus Height
Here we look at entering data with an eye toward predicting values of one attribute
from another attribute. You’ll learn how to make a collection and enter data. From
the collection you’ll make several plots, including a dot plot and a scatter plot.
Presenter: Allow 10 minutes for this first section.
1. Chances are, there is a shortcut or alias to the Fathom application on your
desktop. If so, double-click it. This will launch Fathom, giving you a window
similar to the one below.
Making a Collection and Entering Data
We have the measurements for five people’s heights and wrist circumferences. Given
just these five measurements, how well can we predict height if all we know is wrist
circumference?
2. If you don’t already have an empty document, make one with the New command
in the File menu.
Across the top of the Fathom window is a shelf with tools for making objects.
[Figure: the tool shelf, showing the selection tool and the tools for making a collection, case table, graph, summary table, estimate, test, slider, and text object.]
Another way to get a case table is to choose Case Table from the Insert menu.
3. Drag a case table from the shelf into the document.
When you click, your mouse pointer becomes a
closed fist. As you move the mouse over the
document, you see a frame that shows you where
the case table will be placed when you release the
mouse.
You should get an empty table similar to the one
shown at right.
4. Click once on <new>.
5. Type Name for the first attribute and press Enter.
When you press Enter, a box representing an empty collection appears next to
the table.
6. Make the table wider by dragging on its right edge. This will give us room to see
all three of our attributes at once.
7. Make two more attributes: WristCircum and
Height.
You should now have a table and an
iconified collection as shown at right.
8. Make the WristCircum column wider by
dragging its right edge.
9. Double-click on the label Collection 1 in the
title of either the table or the collection.
This should bring up a dialog for
renaming the collection.
10. Type People and click OK.
Stop. Presenter: Demonstrate how to move and resize the table. Answer questions. Allow 5 minutes
for the following section.
Entering Data
Now we’re ready to enter the names and numbers for
our five people.
11. Click in the empty cell below Name.
12. Type Bill and press Tab.
13. Enter the rest of the data shown in the table at
right. All measurements are in inches. Use Tab to
move to the next cell. At the end of each case, Tab
will take you to the first cell in the next case.
14. If you are not using the demo version of Fathom, you should save and name
your file.
Undo and Redo
Fathom records all changes that you make to your document and lets you undo and
redo them.
15. Click on the Edit menu.
The first item in the Edit menu shows you what you can undo. If the last change
you made was to type in a value, the undo command is Undo Value Change as shown
at left.
16. Choose Undo from the Edit menu several times.
Each time you choose Undo, Fathom takes you one step further back. You can
continue this until you get to the state of the document when you opened (or
created) it.
17. Choose Redo from the Edit menu.
Redo is the inverse of Undo. You can tell from the label in the Edit menu what it is
you are about to redo.
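If you are curious how stepping backward and forward through changes can work, the Python sketch below shows one common approach: two stacks of earlier and later states. It is an assumption for illustration, not a description of how Fathom stores its history, and the class name History is made up.

```python
class History:
    """A toy undo/redo history built from two stacks of document states."""

    def __init__(self, state):
        self.state = state
        self._undo = []   # earlier states, most recent last
        self._redo = []   # states that were undone, most recent last

    def change(self, new_state):
        """Record a change; a fresh change discards anything waiting to be redone."""
        self._undo.append(self.state)
        self._redo.clear()
        self.state = new_state

    def undo(self):
        """Step one change back, stopping at the state the document started in."""
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()

    def redo(self):
        """Reapply the most recently undone change, if there is one."""
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()


doc = History("empty table")
doc.change("Name = Bill")
doc.undo()
print(doc.state)
doc.redo()
print(doc.state)
```

Undo stops when there is nothing earlier to go back to, which matches reaching the state of the document when you opened or created it.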
Stop. Presenter: Answer questions. Demonstrate use of arrow keys to move inside the case table.
Show how to rename an attribute. Demonstrate keyboard shortcuts for Undo and Redo commands.
Allow 10 minutes for the next section.
Graphing Height Versus Wrist Circumference
Now let’s look at the relationship between the two continuous attributes, Height and
WristCircum.
18. Choose Graph from the Insert menu. You can also make a graph by dragging the
graph icon from the shelf to an empty area in the document.
19. Drag the word WristCircum to the horizontal axis of the graph over the spot labeled
Drop an attribute here. (The illustration below shows the steps involved.)
Drag on the column header for
WristCircum. The mouse pointer
becomes a closed fist.
As you move the mouse over the x-axis, a black border appears, showing
that you can release there.
20. Drag the column header for Height and drop it on
the vertical axis of the graph.
Your graph should look similar to the one at right.
Rescaling Axes and the Help System
Let’s use the question, “How do I rescale the axes of a
graph?” as an opportunity to introduce Fathom’s help
system.
21. Choose Fathom Help from the Help menu.
Fathom launches your Web browser.
22. Double-click on How Do I …?.
23. Double-click on Work with Graphs.
24. Click on Adjust axis bounds.
25. Read and try at least the main strategy entry in
How to Adjust the Bounds of Axes. You can also see a
video clip that demonstrates dragging.
26. Close the help window.
27. Rescale the axes of your graph.
28. From the Graph menu, choose Rescale Graph Axes
to put the axes back to their original positions.
Dragging Data
You can drag data in a graph to change it, making it easy to observe the effect of
changes in the data on graphs and analyses.
29. Move the mouse so that the tip of the arrow is on top of one of the data points.
Notice that the arrow changes from a northwest pointer to a west pointer. This is
your clue that you’re positioned to drag.
Notice also that the status bar at the bottom left of the Fathom window shows
the coordinates of the case.
30. Drag the data point.
As you drag, you should see the values for the two attributes changing in the case
table.
31. Choose Undo Drag from the Edit menu.
The point returns to its original position.
You can prevent data from being accidentally changed by selecting the collection and
choosing Lock Collection from the Data menu (this prevents you from dragging data
points). Make sure Lock Collection is unchecked before continuing with this activity.
Fitting a Movable Line
Recalling that our goal is to be able to predict height from wrist circumference, and
noticing that the points in the graph appear reasonably linear, we place a line in the
graph.
You can also right-click (Win) or Control-click (Mac) on the graph to bring up a menu
with commands that apply to the graph.
32. Select the graph by clicking on it once.
33. Choose Movable Line from the Graph menu.
The line that appears in the graph is not a fitted line. You can change its slope and
intercept by dragging it. Dragging on one end of the line causes it to rotate around the
other end. Dragging on the middle of the line moves it parallel to itself. The cursor
changes shape to suggest what will happen when you drag. Notice that the equation of
the line shown below the graph updates as you drag.
You might suppose that a line through these data should have zero for a y-intercept.
You can try out this idea by choosing Lock Intercept at Zero from the Graph menu. You
can unlock the intercept by choosing the same command again to uncheck it.
Notice that the lower left corner of the graph is not necessarily the point (0, 0).
34. Experiment with dragging the line. Position it so that it appears to give a good fit
to the data.
While eyeballing a fit through data points is sometimes sufficient, we often need some
criterion for a best fit. A commonly used criterion is a least-squares fit. You can see
how least-squares works.
35. With the graph selected, choose Show Squares
from the Graph menu.
The graph now shows a square constructed from
each point to the line, a square whose side length is
equal to the difference between the actual and
predicted value for the point (the residual).
36. Experiment with dragging data points.
Notice that the squares change as you drag, but
the line does not move. (Be sure to use Undo to
put the points back where they started from
before going on.)
37. Experiment with dragging the line. Notice that the squares change and that the
sum of squares reported below the graph changes.
38. Adjust the line so that the sum of squares of the residuals is approximately at a
minimum.
The line that satisfies this criterion is called the least-squares regression line. Fathom can
compute this line.
39. With the graph selected, choose Least-Squares Line from the Graph menu.
How closely did you manage to adjust the movable line to match the least-squares
regression line?
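The quantity you were minimizing by hand can be written out in Python. The sketch below computes the sum of squares for any line you choose, along with the standard closed-form least-squares line. The wrist and height values are hypothetical stand-ins, not the five measurements from the tutorial's table, and the function names are made up for this sketch.

```python
def sum_of_squares(xs, ys, slope, intercept):
    """Total area of the squares: the sum of squared residuals for a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimizes the sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical measurements in inches.
wrist = [6.0, 6.5, 7.0, 7.5, 8.0]
height = [62.0, 66.0, 67.0, 71.0, 74.0]

slope, intercept = least_squares_line(wrist, height)
print(slope, intercept)
print(sum_of_squares(wrist, height, slope, intercept))   # the fitted line
print(sum_of_squares(wrist, height, 6.0, 26.0))          # a line placed by eye
```

Any line you place by eye gives a sum of squares at least as large as the one for the fitted line.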
Stop. Presenter: Answer questions. Demonstrate how to rescale the axes of a graph either by using
the menu command or by reselecting the chosen plot in the graph’s popup menu. Mention the Undo
command and the Revert Collection command as ways to restore data in the collection. Demonstrate
the Lock Collection command. Mention that a document can be locked to prevent inadvertent saves of
changed data. Emphasize that the Graph menu is not present unless a graph is selected. Using the
Show Squares command on a dot plot (unstacked), demonstrate the mean of univariate data as
equivalent to the least-squares line for bivariate data. Allow 15 minutes or more for the last section,
depending on how much time you wish participants to spend on Ideas for Further Exploration.
Making a Prediction—Looking at Residuals
The shortcut for Undo is Ctrl-Z (Win) or Command-Z (Mac); for Redo it is Ctrl-R (Win) or Command-R (Mac).
40. Measure your own wrist circumference in inches. (You can use the measure on
the edge of page 59.)
41. Choose Movable Line from the Graph menu (to uncheck it). The movable line
disappears.
42. Move the mouse pointer along the least-squares line, noting that the coordinates
of the tip of the arrow are reported in the status bar in the lower left corner of the
window. When the x-coordinate of the mouse pointer is at your wrist
circumference, you can read off a prediction for your height.
Probably this prediction was a bit off. Let’s see how much off we might expect it
to be.
43. With the graph selected, choose Make Residual
Plot from the Graph menu.
A plot of the residuals appears below the main
scatter plot. Corresponding to each point in the
original graph is another point below whose
y-value is the difference between the predicted
value and the actual value (its distance from
the line).
44. Drag one of the points in the top graph.
Notice how its residual changes in the residual
plot. Notice also that, since the least-squares line
is changing in response to dragging the point, the
other residual points are changing as well.
45. Use Undo to return the data to their original
values.
The residuals appear to lie roughly in a band
from –3 to +1. How far off was the prediction of your height? Did it lie within that
band?
46. Add yourself as a data point in the case table.
How much does this change the slope and intercept of the least-squares line?
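The prediction and the residual plot can both be sketched in Python. Everything specific here is hypothetical: the data, the slope of 5.8, and the intercept of 27.4 are stand-ins for whatever your own graph reports, and the sketch assumes the sign convention of actual value minus predicted value.

```python
def predict(x, slope, intercept):
    """Read a predicted height off the line at a given wrist circumference."""
    return slope * x + intercept


def residuals(xs, ys, slope, intercept):
    """Actual minus predicted value for each point (assumed sign convention)."""
    return [y - predict(x, slope, intercept) for x, y in zip(xs, ys)]


# Hypothetical measurements in inches and a hypothetical fitted line.
wrist = [6.0, 6.5, 7.0, 7.5, 8.0]
height = [62.0, 66.0, 67.0, 71.0, 74.0]
slope, intercept = 5.8, 27.4

print(predict(6.75, slope, intercept))
print(residuals(wrist, height, slope, intercept))
```

The spread of the residuals gives a rough idea of how far off a prediction like yours is likely to be.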
The Influence of an Outlier
Let’s play around a bit with the idea that one data point may or may not have a
significant influence on the regression line.
First, we need a little room to create an outlier.
To zoom in, hold down the Ctrl key (Win) or Option key (Mac) and click or drag; adding the Shift key zooms out.
47. Manipulate the axes by dragging the upper
and lower ends of each axis toward the
middle.
You should have something similar to the
graph shown at right. (The bounds of the
axes have changed.)
48. Drag the rightmost data point far from the
line.
Notice how wildly the slope and intercept of the
regression line can change in response to changes
in one of the points. A least-squares regression
line is quite sensitive to outliers, making it
especially important that you look at your data
graphically before reporting a least-squares slope
or intercept.
49. Return the data point to its original value.
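You can check the sensitivity to a single point with a little arithmetic. The Python sketch below recomputes the least-squares slope after moving only the rightmost point. The data are hypothetical, not the values from the tutorial's table, and the function name is made up.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical measurements in inches.
wrist = [6.0, 6.5, 7.0, 7.5, 8.0]
height = [62.0, 66.0, 67.0, 71.0, 74.0]

# Drag only the rightmost point far below the line.
dragged = height[:-1] + [50.0]

print(least_squares_slope(wrist, height))
print(least_squares_slope(wrist, dragged))
```

With these numbers, moving one point out of five is enough to turn a positive slope into a negative one.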
Ideas for Further Exploration
Hold down the Shift key while you drag to constrain the motion of a point.
• Find several qualitatively different arrangements of the five data points that have
a clear geometry but result in an r-squared value of zero.
• Quantify the influence of particular points on the slope of the least-squares line.
You should come up with a way to measure the rate of change of the slope per
inch of height (or wrist circumference). Which point has the most influence?
Which point has the most influence on the y-intercept?
• The Graph menu has a command to show a median-median line. Investigate this
line’s behavior when data is dragged and compare it to the behavior of the
least-squares regression line.
Stop. Presenter: Answer questions. Ask participants who have found interesting things (possibly by
working on the Ideas for Further Exploration) to show the rest of the group what they found. Ask
for participants’ reactions to the term “dynamic statistics” and why it applies to Fathom.
Tutorial 2: An Introduction to Exploring Data—
Residents of Beverly Hills, California
This tutorial shows how you can explore data using graphs and tables. We’ll look at a
sample of residents of Beverly Hills, California. The data were obtained from the 1990
U.S. Census “microdata.” Each case represents one person who filled out a long
census form in 1990.
Presenter: It may be helpful to demonstrate how to open the Beverly Hills document before turning
participants loose on this tutorial. Allow 10 minutes for the first section.
Opening the BeverlyHills Document
There is probably a Fathom shortcut or alias on the desktop that you can double-click to launch Fathom.
1. If Fathom is not already running, launch it.
2. Choose Open from the File menu.
3. In the resulting dialog box, open the Sample Documents folder, then the Learning
Guide Starters folder, and then the BeverlyHills file. (If you aren’t in the Fathom folder,
first navigate to the Programs folder (Win) or hard drive (Mac), then open the
Fathom folder.)
Your document window should look similar to the one shown below. There are
two objects: a collection of cases, each case a person represented as a gold ball,
and a text object, explaining about the document.
Inspecting the Data
You can also bring up the inspector by selecting the collection and then pressing Ctrl-I (Win) or Command-I (Mac).
4. Double-click the collection.
The inspector appears. If you double-click on a
case, the inspector shows that case; otherwise, it
defaults to the first case in the collection. The
inspector shows the first few attribute names in
the left column. To see the remaining attribute
names, use the scroll bar to scroll down the list,
or resize the inspector by dragging on the
window border.
5. Click on the inspector’s Comments tab.
You will see a brief description of this data set.
The Comments pane of the collection is a good
place to keep documentation and notes about
the data. It is also where notes and attribute
definitions may appear in collections that you
download from the Internet.
6. Close the inspector by clicking its close
box.
Making a Case Table
The keyboard shortcut for inserting a case table is Ctrl-T (Win) or ⌘-T (Mac).
Often you want to see the data in a table. Here’s how to
make a case table in Fathom.
7. Click once on the collection to select it. (When you
have it selected, it has a border around it.)
8. Choose Case Table from the Insert menu.
The table that appears should be named BeverlyHills
and should show the attribute names in the top row.
If the table is empty, you probably did not have the
collection selected before choosing the command.
To connect the collection to the table, drag the
name as it appears in the collection onto the table as
shown below.
Drag from the collection to the table and release.
Each row in a case table shows attribute values for one case. Each column
corresponds to one attribute. You can use a case table to change values, to enter new
data, to add attributes, and to define formulas for attributes.
9. Drag a bottom corner of the case table to enlarge the table to see more cases and
attributes.
Making Graphs
Let’s look at the range of ages among the people in the sample by creating a graph
of age.
10. Drag a graph from the shelf into your document.
Fathom needs to know which attribute to graph.
11. Drag the age attribute by its name from the case table to the horizontal axis of the
graph over the spot labeled Drop an attribute here.
The result should be a stacked dot plot of the ages
of the people in the sample, as shown at right.
We see that the ages of the people in our sample range
from zero to almost 100.
12. Move the mouse pointer over one of the points in
the plot.
In the lower left corner of the Fathom window you
should see information about the point under the
mouse.
Different graphs will show different kinds of information as you move the mouse
pointer over the plot.
A stacked dot plot chooses to stack dots whose values lie inside a bin the width of
one dot. To get more control over the binning, we need a histogram rather than a dot
plot.
13. Click on the tab in the upper right corner labeled Dot Plot.
A popup menu, as shown at left, appears.
14. Choose Histogram from the popup menu.
This histogram groups people according to their ages. Each bin contains the number
of people indicated along the vertical axis by its height. Notice the peak in the range
from thirty to forty. Perhaps these are the baby boomers.
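To make the idea of binning concrete, here is a minimal Python sketch of what a histogram counts. The ages are hypothetical, not the values in the BeverlyHills collection, and the function name is our own.

```python
from collections import Counter

# Hypothetical ages, not taken from the BeverlyHills document.
ages = [3, 17, 24, 31, 33, 36, 38, 45, 52, 67, 81]

def histogram_counts(values, bin_width):
    """Map each bin's left edge to the number of values in [edge, edge + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))

counts = histogram_counts(ages, 10)
for left_edge, count in counts.items():
    print(f"{left_edge}-{left_edge + 10}: {count}")
```

Changing bin_width plays the same role as changing the bin width in the histogram.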
Stop. Presenter: Answer questions. Demonstrate having more than one graph visible simultaneously.
Show linked selection in the two graphs. Show how to delete components with the command in the
Edit menu and with the Delete key (on a Mac, ⌘-D also deletes the selected object). Demonstrate the
View in Window command. Allow 5 minutes for the following section.
Plotting Values
Suppose we want to compare the mean and median of the age distribution. We can
do that right on the graph.
15. With the graph selected, choose Plot Value from
the Graph menu.
This brings up the formula editor, ready for
you to specify the expression whose value you
want plotted.
16. As shown at right, type mean( ).
Notice that when you type the left parenthesis,
you get the right parenthesis automatically.
17. Click OK.
A vertical line appears in the graph at the mean
and the value of the mean is shown below the
graph.
Now the median.
18. With the graph still selected, choose Plot Value from the Graph
menu again.
You could type median( ), but instead, here’s how you can choose it
from the list of functions.
19. Click on the control next to the word Functions in the list.
Apply updates the
graph but does not close
the formula editor.
Then you see the kinds of functions available.
20. Open the Statistical list by clicking its open control.
21. You’ll find median in the One Attribute group; double-click it to enter it into the
formula.
Notice that the bottom portion of the formula
editor displays some information about the median
function and gives an example of how to use it.
22. Click OK.
The graph should look similar to the one shown at
right.
Let’s experiment with how these plotted values behave
when data are changed.
23. Drag one of the bins of the histogram.
Notice that the mouse pointer becomes a hand when you are over a bin (as
opposed to a bar boundary), indicating that you can drag the whole bin.
As you drag, watch the two plotted values. When does the value for the mean
change? When does the value for the median change? Can you demonstrate that
the median is insensitive to outliers, while the mean is not?
24. Restore your data to their original values.
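The dragging experiment can also be sketched outside Fathom. The Python below uses a small set of hypothetical ages (not the BeverlyHills data) and moves the largest value far to the right, which is roughly what dragging the rightmost bin does.

```python
from statistics import mean, median

# Hypothetical ages, not taken from the BeverlyHills document.
ages = [23, 31, 35, 38, 42, 47, 55]

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

before = summarize(ages)

# Simulate dragging the largest value far to the right.
dragged = ages[:-1] + [95]
after = summarize(dragged)

print("before:", before)
print("after: ", after)
```

In this sketch the mean moves with the dragged value while the median stays where it was, because the middle case has not changed.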
Stop. Presenter: Answer questions. Demonstrate that any expression may be plotted; e.g.
mean( ) + 1.96 * s( ). Demonstrate that a normal distribution can be plotted on top of the histogram
and that this is especially easy with a density histogram. Demonstrate that you can supply an
argument for a function, but that if you don’t, Fathom assumes the attribute on the axis is the
argument. Allow 10 minutes for the following section.
Making a Summary Table
The keyboard shortcut for inserting a summary table is Ctrl-U (Win) or ⌘-U (Mac). You can also drag one from the shelf.
Formulas are displayed under the table. The values appear in the same order in the table cells.
A summary table is another important tool for exploring data.
25. Choose Summary Table from the Insert menu. Fathom puts an
empty table in your document, as shown at right.
The summary table is empty because Fathom doesn’t know
which attributes we might be interested in.
26. Drag the sex attribute to the summary table, dropping it on
the down-pointing arrow. (As with the graph, a black outline
appears when you are over a drop area.) The result should be
similar to the illustration.
From this table we see that there are 79 females and 71
males, 150 cases altogether.
By default, when showing a categorical attribute, the summary
table displays the number of cases in each group.
27. Drag the age attribute to the summary table, dropping it on the right-pointing
arrow.
When you add a continuous attribute such as age, Fathom
adds the mean for each group and the overall mean.
Suppose you want to know something other than the count and
the mean. You can add as many formulas as you want to a
summary table.
You will only see a Summary menu if the summary table is selected.
28. With the summary table selected, choose Add Formula from the Summary menu.
A formula editor appears, ready to accept any algebraic expression whose value
you wish to compute. Let’s compute the range of ages. Fathom doesn’t have a
built-in range function, so we’ll take the difference of the maximum and
minimum values.
29. Type the following expression into the formula editor: max( ) – min( ).
30. Click OK.
A third formula appears under the previous two, and a third value appears in
each cell.
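For readers who want to see what the summary table is computing, here is a minimal Python sketch that groups a numeric attribute by a categorical one and reports the count, the mean, and the range as max minus min. The cases are hypothetical and the function name is our own.

```python
from statistics import mean

# Hypothetical (sex, age) pairs; the real values live in the BeverlyHills collection.
cases = [("F", 34), ("M", 51), ("F", 8), ("M", 29), ("F", 72), ("M", 40)]

def summary_table(rows):
    """Group ages by category and report count, mean, and max - min for each group."""
    groups = {}
    for category, age in rows:
        groups.setdefault(category, []).append(age)
    return {
        category: {
            "count": len(ages),
            "mean": mean(ages),
            "range": max(ages) - min(ages),
        }
        for category, ages in groups.items()
    }

table = summary_table(cases)
for category, stats in table.items():
    print(category, stats)
```

Each dictionary entry corresponds to one cell of the summary table, with one value per formula.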
Here are a few useful things to know about using summary tables. Try out the ones
that interest you.
• You can change which attributes are displayed in the table by dragging attributes
to the top or side column or the corresponding arrows.
• You can replace one attribute with another by dragging an attribute on top of
the attribute you wish to replace.
• To remove an attribute from the table, click once on the attribute and choose
Remove Attribute from the Summary menu.
See if you can answer the following questions using the summary table.
• What are the median incomes for males and females?
• Are there more married or never married people in this collection?
• What is the value of the mean age plus one sample standard deviation for
married people in this sample from Beverly Hills?
Splitting a Graph
31. Suppose we want to know whether the
distribution of incomes differs between men
and women.
32. Drag the sex attribute to the vertical axis of the
histogram and drag income to its horizontal axis.
The result should be similar to that shown here. The
sex attribute has split the histogram into an upper
histogram of males and a lower histogram of females.
We say that we split the histogram of the numeric
attribute, income, by the categorical attribute, sex.
Notice that two medians and two means are plotted,
one for each group. How do you interpret the values listed below the graph?
Some graphs can be split by a legend attribute. Let’s look at age versus income, split
by marital status.
33. Drag age to the x-axis and income to the y-axis.
You get a scatter plot.
34. Drag marital to the middle of the graph.
You should get a graph similar to the one at
right.
Describe the four square points in the upper
middle of the plot using all three of the attributes.
Find out if they are men or women.
Ideas for Further Exploration
• Investigate the graphs you can make with only categorical attributes.
• Make up and answer questions like the following. Which group has the higher
percentage of males: people who are divorced or people who are married? Which
is the most commonly listed ancestry and what is the median income of that
group?
• Make a percentile plot of income and figure out how to read it. Use it to
determine the income corresponding to the 80th percentile.
• Use a text object to write a statement about the people in this sample. Make a
display to go along with the statement. Repeat two more times.
Stop. Presenter: Answer questions. Demonstrate the various kinds of summary tables you can
generate, depending on the attribute configuration. Demonstrate adding a filter to a graph or summary
table. Allow participants who have completed one of the explorations to show the group what they
found.
Tutorial 3: Formulas, Functions, and Data—The
Planets
This tutorial focuses on using formulas to define new attributes and to plot functions.
We’ll look at a collection with only nine cases—the planets in our solar system. From
the data we’re given, we’ll compute the planets’ densities and investigate the
relationship between a planet’s distance from the sun and the length of its year.
You’ll learn how to use formulas to define attributes, how to plot functions in a
scatter plot, and how to use sliders.
Presenter: Allow 10 minutes for this first section.
Computing the Densities of the Planets
1. If Fathom is not already running, start it by double-clicking its icon.
2. Choose Open from the File menu. Choose the file Planets from the Learning Guide
Starters folder in the Sample Documents folder. Your window should resemble the
one below.
You can also create a new attribute just to the left of one that already exists by selecting the existing attribute and choosing New Attribute from the Data menu.
First we’ll compute each planet’s density. For that we need to know its mass and its
volume, since density is mass divided by volume. The volume comes from the
radius, the attribute R_Mm, measured in millions of meters. The mass, measured in
Earth masses, is recorded in the attribute M_Earths. We’ll compute density in two steps
using two new attributes (though we could do it in one), beginning with volume. You
will want to hide the text object first, to give yourself more room; when it is selected,
press Ctrl-H (Win) or ⌘-H (Mac).
3. Resize the table, dragging to the right until you see the column headed <new>.
4. Click once on <new>.
5. Type Volume and press Enter.
Now we need to tell Fathom the formula for volume.
6. From the Display menu, choose Show Formulas.
7. Double-click in the formula cell (which is shaded) just beneath Volume.
The formula editor appears as shown on the next page. Enter the formula using your
computer keyboard or the formula editor’s keypad and list. Here are some tips.
The formula for the volume of a sphere is $V = \frac{4}{3}\pi r^3$.
• From the keyboard, type * for multiplication,
/ for division, and ^ (Shift-6) for
exponentiation.
• To escape the fraction, use either the right
arrow key or a mouse click.
• Multiplication is sometimes indicated as a
dot between terms and sometimes, as in
traditional algebra, as nothing between
terms, though you need to put it in.
• Double-click on an item from the list to
enter it into the formula.
• Enter π either by choosing it from the Special
heading or by typing pi (with an asterisk after
for multiplication).
• Use the attribute R_Mm in place of r. You can get it from the attributes list or by
typing it.
Look up Formulas in the help system for more information and hints.
You can resize the formula row in the case table by dragging its bottom edge down.
1086.78 is the volume of the Earth in cubic megameters.
• Click an item in the list to get an explanation of it in the help area at the bottom
of the formula editor.
• Vertically resize the formula editor’s areas by dragging their borders.
• Use the arrow keys on the editor keypad or on your computer keyboard to move
right or left. The up arrow selects the next largest portion of the expression; the
down arrow selects a smaller term.
8. Close the formula window by clicking OK or pressing Alt-F4 (Win) or ⌘-W (Mac).
Fathom computes the volumes and displays them in the table in gray to
distinguish them from noncomputed values.
The next attribute has an easier formula: $\text{Density} = \frac{\text{Mass}}{\text{Volume}}$
9. Make a new attribute, Density.
10. Give the new attribute the formula shown to the right. The
numerator is the mass in Earth masses and the denominator is the
volume in Earth volumes, so the density comes out in Earth
densities. This lets us understand the density of the planets relative to our own
planet.
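The two-step computation can be sketched in Python. This is an illustration under stated assumptions: we assume the Volume attribute divides the sphere volume by 1086.78 to express it in Earth volumes (the figure comes from the margin note), and the Earth radius of 6.378 megameters is our own input rather than a value read from the Planets document. The function names are hypothetical.

```python
import math

# Earth's volume in cubic megameters, as stated in the margin note.
EARTH_VOLUME_MM3 = 1086.78

def volume_in_earths(radius_mm):
    """Volume of a sphere with the given radius (in megameters), in Earth volumes."""
    return (4 / 3) * math.pi * radius_mm ** 3 / EARTH_VOLUME_MM3

def density_in_earths(mass_earths, radius_mm):
    """Density relative to Earth: mass in Earth masses over volume in Earth volumes."""
    return mass_earths / volume_in_earths(radius_mm)

# Assumed inputs for Earth: radius 6.378 Mm, mass 1 Earth mass.
earth_density = density_in_earths(1.0, 6.378)
print(round(earth_density, 3))
```

With these inputs the Earth comes out very close to 1 Earth density, which is a useful check that the units are consistent.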
11. Make a graph with Density on the
horizontal axis. It should look similar to
the one at right.
When you click on a data point, the
corresponding planet in the collection is
selected. You can select more than one case at
a time by clicking and dragging a rectangle
around some points as shown below left. When you double-click a planet, you get its
inspector.
Answer these questions.
• Which planet has the lowest density?
• What do the three planets on the right have in common?
• What do the four planets on the left have in common?
• What is special about Earth?
• If the Moon is a chunk of Earth, where should it lie in the graph? Look it up and
see where it does lie.
Stop. Presenter: Answer questions. Demonstrate that formulas can be copied and pasted, either
within the formula editor or from one attribute to another. Reinforce use of arrow keys to navigate
around a formula. Allow 10 minutes for the next section.
Playing Kepler
The graph has to be selected for the Graph menu to appear.
To zoom in, click on the axis while holding down the Ctrl key (Win) or Option key (Mac).
Now we’ll study one of the most famous relationships in physics—how the period of
an orbit depends on the orbit’s radius.
12. Make a scatter plot with year on the vertical axis and orbit_AU on the horizontal axis.
You should see points lying on a curve that bends gently upward. It looks as though we
could predict how long a planet takes to go around the sun if we knew the radius of
its orbit.
But what is the function? One way to find out is to try to plot it.
13. With the graph selected, choose Plot Function from
the Graph menu.
14. Type in a plausible function, say, orbit_AU^2, and
click OK.
It’s not quite right! Let’s change the exponent. We’ll
use a slider so we can drag continuously from one
value to another.
15. Choose Slider from the Insert menu or drag one
from the shelf.
A slider appears. The illustration below right
shows its parts. By default the slider has the
name V1.
We think that the curve should be between linear and quadratic, that is, between a power of 1 and 2. The
slider scale is like a graph axis, so we can change it.
[Figure: a slider with its parts labeled: slider name, slider value, animation button, draggable “thumb,” and scale.]
16. Double-click the name and type something more reasonable, like P for power.
17. Zoom in on the slider (or drag the axis) until the scale goes roughly between 1 and 2.
18. Double-click the formula below the graph so that you can edit it. Replace the 2 in
the exponent with P. Click OK.
19. Drag the slider’s thumb.
As you drag the slider’s thumb, the curve moves accordingly. Another way to change
the slider value is to edit it directly.
20. Double-click the slider’s value and type a new one, pressing Tab or Enter when
you’re done.
21. Find a value for P so the curve fits nicely through the points.
You should find that P = 1.5, that is, P = 3/2, fits very well. The curve’s equation is
$\text{year} = \text{radius}^{3/2}$, or, as Kepler would have written it, $T^2 = R^3$, where T is the length of
the year and R is the radius of the planet’s orbit.
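Dragging the slider amounts to trying many values of P and judging the fit by eye. A minimal Python sketch of the same search follows. The orbit radii and periods are approximate textbook values for four planets, supplied here as assumptions; the Planets document has its own values in orbit_AU and year.

```python
# Approximate orbit radii (AU) and periods (years); assumed values,
# not read from the Planets document.
planets = {
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
}

def sum_squared_error(power):
    """Total squared gap between year and orbit_AU ** power."""
    return sum((year - orbit ** power) ** 2 for orbit, year in planets.values())

# Step the "slider" P from 1.00 to 2.00 in hundredths and keep the best fit.
candidates = [1 + step / 100 for step in range(101)]
best_power = min(candidates, key=sum_squared_error)
print(best_power)
```

The squared-error criterion is our own choice for the sketch; in the tutorial the judgment is visual.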
Stop. Presenter: Answer questions. Demonstrate slider animation. Allow 10 minutes for the next
section.
Kepler with Logs
Another way to figure out Kepler’s Third Law involves taking the log of both
attributes.
22. Make two new attributes, named logorbit and logyear. (These are names, not
formulas.)
23. Make their formulas log(orbit_AU) and log(year), respectively.
24. Make a scatter plot of logyear versus logorbit.
It looks like a line!
25. Choose Least-Squares Line from the Graph
menu.
Check out the slope. Look familiar?
If $T^2 = R^3$, then $\log(T^2) = \log(R^3)$. So
$2\log(T) = 3\log(R)$, or $\log(T) = 1.5\log(R)$. This means that Kepler’s
law implies that the slope of log(year) versus
log(orbit) should be 1.5, and we see that it is.
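The log-log slope can be checked with a short Python sketch that fits a least-squares line by hand. The four (orbit, year) pairs are approximate textbook values supplied as assumptions, not the contents of the Planets document.

```python
import math

# Approximate (orbit radius in AU, period in years) pairs; assumed values.
planets = [(0.723, 0.615), (1.000, 1.000), (1.524, 1.881), (5.203, 11.86)]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

log_orbit = [math.log10(orbit) for orbit, _ in planets]
log_year = [math.log10(year) for _, year in planets]
slope = least_squares_slope(log_orbit, log_year)
print(round(slope, 2))
```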
As you can see in the graph at right, there is a
gap between Mars and Jupiter. What goes
there?
Ideas for Further Exploration
• Investigate the residual plot for both the plot of orbit_AU^P and the plot of logyear vs.
logorbit. One planet stands out in particular on these residual plots. Which one?
Postulate why.
• Make an empty graph. Choose Function Plot from its popup menu. Try plotting
different kinds of functions. Use sliders as parameters.
Stop. Presenter: Answer questions. Ask for participants to show some of the things they tried. A
possible demo is to show how to compute an attribute in the collection whose values are the residuals
from a least-squares regression line.
Tutorial 4: Simulation—Polling Voters
In this tutorial, we simulate a population of voters, a certain proportion of whom will
vote in favor of a particular proposition. We investigate the question of how
accurately a random sample of voters can predict the outcome of the election.
In the process you’ll learn how to use a slider to define a population parameter, to use
the random function, to define a measure for a collection, and to repeatedly collect
measures from a collection.
Presenter: Allow 5 minutes for the first section.
Defining the Problem
The city of Freeport has a rent control initiative, Prop A, on the ballot. The local
newspaper is going to conduct a poll three weeks before the election to gauge public
sentiment. Staff members need to know how big a sample to poll. Your job is to set
up a simulation they can use to determine, for any given sample size, the accuracy
they should expect.
Making a Simulated Sample of Voters
First we model the population, the people who will vote in the election either for or
against the proposition. This model will consist of a single number, the proportion of
voters who will vote yes.
1. Start with a new, empty Fathom document.
2. From the shelf, drag a new slider into the document.
By default a slider gets the name V1. We want the slider to
have a name that fits with our model.
3. Double-click the name portion of the slider, the V1,
and type probYes.
Since a proportion of voters can only go between zero and
1, we need to adjust the slider scale.
4. Drag the numbers on the slider scale until the scale goes approximately from
0 to 1.
5. Set the value of the slider (by dragging its thumb) to something near 0.5,
simulating a close election.
The slider models the population. A collection will model a sample of voters.
6. Drag a collection from the shelf into the document.
The collection will be empty and named Collection 1.
7. Double-click on the name below the box. Use the resulting dialog box to name the
collection Sample of Voters.
We want this collection to contain 100 cases for starters. Later we can investigate
other sample sizes.
8. With the collection selected, choose New Cases… from the Data menu.
9. Type 100 in the resulting dialog box and click on OK.
Stop. Presenter: Answer questions. You may wish to give a preview of the if( ) function. Allow 5
minutes for the next section.
Simulating the Votes
The shortcut for Rerandomize is Ctrl-Y (Win) or ⌘-Y (Mac).
So far, the cases in the collection have no attributes. We
want one attribute, vote, whose value is either yes or no.
10. Double-click the collection.
This brings up its inspector. In it you will define an
attribute for a vote.
11. Click <new>, type Vote, and press Enter.
12. Double-click in the formula cell in the right column.
You should see a formula editor. Now you need to enter
the formula that will choose “yes” the percentage of the
time specified in your slider, probYes.
13. Enter the conditional formula shown at right by
typing if(random()<probYes Tab “yes” Tab “no”.
The random function, if you don’t tell it a minimum and
maximum, generates random numbers from 0 to 1.
14. Close the formula editor by clicking the OK button.
You should see the value of the Vote attribute in the inspector become either yes
or no. Scroll through a few more cases by clicking on the right arrow in the lower
left corner of the inspector. Some of the values should be yes and some should be
no.
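The conditional formula has a direct analogue in most programming languages. Here is a minimal Python sketch of one simulated poll. The function name and the fixed seed are our own choices, and Python's random generator stands in for Fathom's random( ) function.

```python
import random

def simulate_votes(prob_yes, sample_size, rng):
    """One simulated poll: each voter says "yes" with probability prob_yes."""
    return ["yes" if rng.random() < prob_yes else "no" for _ in range(sample_size)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
votes = simulate_votes(prob_yes=0.5, sample_size=100, rng=rng)
print(votes.count("yes"), "yes,", votes.count("no"), "no")
```

Calling simulate_votes again with the same rng corresponds to rerandomizing the collection.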
Let’s make a graph of the Votes. A bar chart will show us at a glance whether the poll
predicts the proposition will pass or fail.
15. Drag a graph from the shelf into the document.
16. Drag the vote attribute name from the
inspector onto the x-axis of the graph.
You should now have the three objects
shown at right.
17. Choose Rerandomize from the Analyze menu.
Do this several times.
Each time you rerandomize, the bars in the bar
chart will change, reflecting the results of a new sample.
Another way to rerandomize is to drag the slider thumb. When the slider value is near
1, most of the votes will be “yes,” and when the slider value is near zero, most of the
votes will be “no.”
Stop. Presenter: Answer questions. Allow 15 minutes for the remaining section of this tutorial.
Defining and Collecting a Measure
18. Move the slider back to something close to 0.5.
We’ve been hired to predict the accuracy of a poll conducted with a given sample size.
To figure that out, we’re going to have to record the prediction of each poll we take.
We do that with a measure.
19. In the inspector window, click the Measures tab.
As yet no measures have been defined.
20. Click <new> and type proportionOfYes for the measure’s name.
Your inspector won’t look exactly like this because we’ve adjusted its size and column widths.
If collecting the measures takes too long, press the Esc key. (On a Mac, ⌘-. also cancels collecting measures.) You can speed things up by turning animation off.
21. Double-click in the formula
cell in the proportionOfYes row.
22. Enter the formula proportion(vote
= “yes”) as shown at right.
23. Rerandomize several times
and observe the values for
proportionOfYes.
Now we’re ready to perform the poll of 100 voters many times, each time recording
the proportion of voters who say they will vote in favor of the proposition.
24. Close the Sample of Voters inspector.
25. With the collection selected, choose Collect Measures
from the Analyze menu.
You should see a new collection, as shown above right.
This collection will contain the results of repeatedly
conducting the poll.
26. Double-click the measures collection to open
its inspector.
27. Bring the Collect Measures pane forward.
28. Change the number in the measures field from 5
to 100.
29. Click Collect More Measures.
Fathom rerandomizes the Sample of Voters collection
100 times, each time computing a value for
proportionOfYes and storing that value in a new case in
the measures collection.
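Collecting measures is a loop: rerandomize, compute the measure, store it, and repeat. The Python sketch below imitates that loop under the same model (independent votes with a fixed probability of “yes”). The function names and the seed are hypothetical choices for illustration.

```python
import random
from statistics import mean, stdev

def proportion_of_yes(prob_yes, sample_size, rng):
    """Proportion of "yes" votes in one freshly simulated sample."""
    yes_count = sum(rng.random() < prob_yes for _ in range(sample_size))
    return yes_count / sample_size

def collect_measures(prob_yes, sample_size, repetitions, rng):
    """Repeat the poll and record the proportion of "yes" each time."""
    return [proportion_of_yes(prob_yes, sample_size, rng) for _ in range(repetitions)]

rng = random.Random(2)  # fixed seed so the sketch is repeatable
measures = collect_measures(prob_yes=0.5, sample_size=100, repetitions=100, rng=rng)
print(round(mean(measures), 3), round(stdev(measures), 3))
```

The list measures plays the role of the proportionOfYes attribute in the measures collection.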
Let’s take a look at the results.
30. Make a new graph.
31. Bring the inspector’s Cases pane forward.
32. Drag the proportionOfYes attribute from the inspector to the x-axis of the graph.
33. From the graph’s popup menu, change the graph
from a dot plot to a histogram.
An example of what you might get is shown at right.
Here are some observations you might make about the
histogram of proportionOfYes.
• The distribution looks plausibly normal.
• The mean of the sample proportions should be
near the probYes value.
• It looks as if the mean of the sample proportions is near 0.5. We expect this
result since the probYes slider was set near 0.5, and the mean of the sample
proportions is the proportion for a single sample of 100 × 100 or 10,000 voters.
The spread of proportions that we get with a sample size of only 100 is quite large.
We certainly couldn’t accurately predict the results of a close election! Just how large
is this spread?
34. From the Insert menu, choose Summary Table or drag a summary table from the
shelf (the table with the big S).
35. Drag the proportionOfYes attribute from the inspector into the top row of the
summary table.
36. Double-click the formula, where it says
S1=mean( ), and change it to stdDev( ). This will
compute the standard deviation of the
proportions gathered in the measures collection.
Dependence of Spread on Sample Size
Let’s try a larger sample size.
37. With the Sample of Voters collection selected, choose New Cases… from the Data menu.
In the dialog box, ask for 300 new cases and click OK. The collection now has 400
cases altogether.
38. Bring back the Collect Measures pane in the measures collection inspector.
39. Uncheck Animation on.
40. Make sure Empty this collection first is checked.
41. Click Collect More Measures to collect 100 measures of 400 cases each.
We expect that the larger the sample, the smaller the variation in the observed
sample proportion in favor of Prop A. But how much smaller? What do you find
for the standard deviation of sample proportions with a sample size of 400? What do
you think would be the standard deviation if the sample size were 1000?
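For comparison with the simulation, probability theory gives the standard deviation of a sample proportion as the square root of p(1 − p)/n, assuming independent votes. A minimal Python sketch of that formula, with a function name of our own:

```python
import math

def sd_of_sample_proportion(p, n):
    """Theoretical standard deviation of a sample proportion for sample size n."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(n, round(sd_of_sample_proportion(0.5, n), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the spread. Simulated values will scatter around these figures rather than match them exactly.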
You now have a tool to use with the newspaper staff members. You could sit down
with them and try different sample sizes until you find a sample size for which the
variability is less than some desired number of percentage points.
Ideas for Further Exploration
• Determine a sample size such that twice the standard deviation of sample
proportions is about 0.03 or 3% when probYes is 0.5.
• When we actually conduct a poll, we do not know the true proportion of the
population. (After all, that’s why we’re doing the poll.) Probability theory predicts
that we can expect to capture the true proportion 95% of the time in the range
$p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$
where p is the proportion found in the sample and n is the number in the sample.
Try modeling this in Fathom and see how close your simulation comes to theory.
• We have set up this simulation to make it easy to change the population
proportion. With small sample sizes, population proportions near the extremes,
that is, near 0 or 1, are problematic.
Investigate why this might be so.
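One way to approach the second exploration outside Fathom is sketched below in Python. It simulates many polls, builds the interval of plus or minus 1.96 standard errors around each sample proportion, and counts how often the interval contains the true proportion. The function name, seed, and number of polls are assumptions made for the sketch.

```python
import math
import random

def interval_covers(true_p, n, rng):
    """Simulate one poll and report whether its interval contains true_p."""
    p_hat = sum(rng.random() < true_p for _ in range(n)) / n
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= true_p <= p_hat + margin

rng = random.Random(3)  # fixed seed so the sketch is repeatable
polls = 2000
coverage = sum(interval_covers(0.5, 100, rng) for _ in range(polls)) / polls
print(round(coverage, 3))
```

Trying a true proportion near 0 or 1 with a small n is one way to explore the third idea in the list.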
Stop. Presenter: Answer questions. Demonstrate how to
gather and display the data shown at right. Do this by
defining a new measure in the Sample of Voters collection.
Call it n and give it the formula count(). In the measures
collection, turn off the Empty this collection first option.
Collect measures with different sample sizes. Drag n to the y
axis of the graph while holding down the Shift key so that it
will be treated as categorical.
Have participants who have done further explorations show
what they have found.
Tutorial 5: Testing a Hypothesis
Charles Darwin believed that there were hereditary advantages in having two sexes in
both the plant and animal kingdoms. Some time after he wrote Origin of Species, he
performed an experiment in his garden at Down House in Kent. He raised two large
beds of snapdragons, one from cross-pollinated seeds, the other from self-pollinated
seeds. He observed, “To my surprise, the crossed plants when fully grown were
plainly taller and more vigorous than the self-fertilized ones.” This led him to another,
more time-consuming experiment in which he raised pairs of plants, one of each type
in the same pot, and measured the differences in their heights. He had a rather small
sample and was not sure that he could safely conclude that the mean of the
differences was greater than zero. His data for these plants were used by statistical
pioneer R. A. Fisher to illustrate the use of a t-test.
In this tutorial, you’ll learn how to perform a t-test for the mean, generate simulated
data from a normal distribution, and get a distribution for the t-statistic.
Presenter: Allow 10 minutes for the first section.
Looking at Darwin’s Data
1. Open the file Darwin in the Learning Guide
Starters folder in the Sample Documents folder.
This document contains the data for the experiment described above. The collection contains
only fifteen cases with one attribute.
2. Make a case table, a histogram, and a
summary table similar to those shown here.
We see that most of the measurements are
greater than zero, meaning that the cross-pollinated plants grew bigger. But two of the
measurements are less than zero. These two
values might be due to some sort of error, but
we keep them because we know that Darwin did.
Formulating a Hypothesis
Darwin’s theory, that cross-pollination produced bigger plants than self-pollination, predicts that, on the average, the difference between the two heights
should be greater than zero. On the other hand, it might be that his fifteen pairs of
plants have a mean difference as great as they do—21 eighths of an inch—merely by
chance. You can write out these two hypotheses in Fathom, to be stored with your
document.
3. From the shelf, drag a text object into
the document.
4. Write the null hypothesis and the
alternative hypothesis. At right you can
see one way to phrase them.
Deciding on a Test Statistic
Not a statistician, Darwin decided to ask the advice of his cousin, Francis Galton, an
eminent statistician, who told him that there was currently no good theory to deal
with a small sample from a population whose standard deviation is not known. In
fact, it was not until some years later that William Gosset, an employee of the
Guinness Brewing Company and a student of Karl Pearson, developed a statistic and
its distribution. Gosset published his result under the pseudonym Student and the
statistic became known as Student’s t. When the null hypothesis is that the mean is
zero, the t-statistic is just
$t = \frac{\bar{x}}{s/\sqrt{n}}$
where $\bar{x}$ is the observed mean, s is the standard
deviation, and n is the number of observations.
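As a quick illustration of the formula, here is a minimal Python sketch that computes the statistic from raw values using the sample standard deviation. The differences listed are hypothetical and are not Darwin’s measurements; the function name and the null_mean parameter are our own.

```python
import math
from statistics import mean, stdev

def t_statistic(values, null_mean=0.0):
    """One-sample t-statistic: (mean - null_mean) / (s / sqrt(n))."""
    n = len(values)
    return (mean(values) - null_mean) / (stdev(values) / math.sqrt(n))

# Hypothetical paired differences, for illustration only.
differences = [12, -5, 20, 8, 15]
print(round(t_statistic(differences), 3))
```

With the default null_mean of zero this reduces to the expression above.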
Let’s compute this statistic for
Darwin’s data.
5. From the Analyze menu, choose
Test Hypothesis or drag a test
from the shelf (the balance
icon).
An empty test appears.
6. From the popup menu in the
upper right, choose Test Mean.
As shown at right, the Test Mean
test assumes that you are going to
type in summary statistics. The blue
text is editable. This is very useful
when you don’t have raw data.
7. Try editing the blue text. You can, for
example, enter the summary statistics
for Darwin’s data.
Here are some things to notice.
[Figure: the edit text cursor, the popup menu cursor, and the edit formula cursor]
• As you hold the cursor over blue text,
the cursor changes to show you how to edit
that field.
• Numeric values are determined by a
formula. If you click on the value, you
get a popup menu as shown below.
Choose the one available option and
you get a formula editor. You can type
a number or a formula here.
• When you change something in one part of the test, it may affect other parts.
For example, editing the <AttributeName> field in the first line also changes it in the
hypothesis line and the last paragraph.
In statistics
terminology, we want
this to be a one-tailed
rather than a two-tailed
test.
• In the hypothesis line, clicking on the is not equal to phrase
brings up a popup menu from which you can choose
one of three options. For Darwin’s experiment, we want
the third option because his hypothesis is that the true
mean difference is greater than zero. Notice that making
this change alters the phrasing of the last line of the test as well.
Stop. Presenter: Answer questions. Make sure everyone understands how to edit the test hypothesis
object. Demonstrate using a slider for a parameter such as the alternative hypothesis mean. Allow 10
minutes for the next section.
Checking Assumptions
The shortcut for Edit Formula is Ctrl-E (Win) or ⌘-E (Mac).
The shortcut for rerandomizing is Ctrl-Y (Win) or ⌘-Y (Mac).
Gosset’s work with the t-statistic relied on an assumption about the population from
which measurements would be drawn, namely, that the values in the population are
normally distributed. Should we be comfortable with this assumption for Darwin’s
data?
In the assumption’s favor is experience with height measurements of other living
things, both plants and animals. These are usually normally distributed, and so are
differences between heights. But we might worry, because the two negative values
give a decidedly skewed appearance to the distribution.
Fathom can help by allowing us to determine qualitatively whether this amount of
skew is unusual or not. We’ll generate measurements randomly from a normal
distribution and compare the results with the original data.
8. Make a new attribute in the collection. Call it SimHeight for simulated height.
9. Click once on the name of the new attribute in the case table to select it.
10. Choose Edit Formula from the Edit menu to bring up a formula editor.
11. Enter the formula shown here: randomNormal(mean(HeightDifferences),
stdDev(HeightDifferences)). This will generate random values drawn from a normal
distribution whose mean and standard deviation are the same as for Darwin’s
data.
We want to compare these simulated heights with the actual ones.
12. Create a new graph and drag the SimHeight attribute to its x-axis. Use the plot
popup menu to make a histogram.
One set of simulated data doesn’t tell the story. We need to look at a bunch.
13. With the graph selected, choose Rerandomize from the Analyze menu.
Each time you rerandomize, you get a new set of 15 values from a population with
the same mean and standard deviation as the original 15 measurements.
The leftmost histogram below is of the original measurements. Next to it are three of
the histograms you might get from the simulated heights.
A bit of subjectivity is called for here. Does it appear that the original distribution is
very unusual or does it fit in with the simulated distributions?
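The rerandomizing experiment above can be imitated in a few lines of Python. This is only a sketch: the helper names are invented here, the mean and standard deviation are rounded stand-ins for what Fathom reports for HeightDifferences, and the skewness number is just one possible way to put a figure on "lopsidedness" that the guide itself judges by eye.

```python
import random
import statistics


def simulate_normal_sample(mean, sd, size, rng):
    """Draw `size` values from a normal distribution, like randomNormal(mean, sd)."""
    return [rng.gauss(mean, sd) for _ in range(size)]


def skewness(values):
    """Simple moment-based skewness: 0 for a perfectly symmetric sample."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Rounded, assumed stand-ins for mean(HeightDifferences) and
# stdDev(HeightDifferences); substitute the values Fathom reports.
mean_hd, sd_hd, n = 21.0, 38.0, 15

# "Rerandomize" five times and look at how lopsided each sample is.
for trial in range(5):
    sample = simulate_normal_sample(mean_hd, sd_hd, n, rng)
    print(f"trial {trial + 1}: skewness = {skewness(sample):+.2f}")
```

Samples of only 15 values can look quite skewed even when the population is exactly normal, which is the point of comparing several simulated histograms with the original.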
Testing the Hypothesis
Once we have decided that the assumption of normality is met, we can go on to
determine whether the t-statistic for Darwin’s data is large enough to allow us to reject
the null hypothesis.
In a previous section, we typed the summary values into the test as though we didn’t
have the raw data. But we are in the fortunate position of having the raw data, so we
can ask Fathom to figure out all the statistics using those data.
14. Drag the HeightDifferences attribute from
the column header of the case table to
the top panel of the test, where it says
Attribute (continuous) <unassigned>.
15. If the hypothesis line does not already
say is greater than, then choose it from the
popup menu.
The last paragraph of the test describes the
results. If the null hypothesis were true and
the experiment were performed repeatedly,
the probability of getting a value for
Student’s t this great or greater would be
0.025. This is a low P-value, so we can safely
reject the null hypothesis and, with Darwin,
pursue the theory that cross-pollination
increases a plant’s height compared to self-pollination.
Stop. Presenter: Answer questions. Demonstrate the non-verbose form of the hypothesis test.
Demonstrate how to hide and how to iconify objects to free up screen space. Allow 10 minutes for the
next section.
Looking at the t-Distribution
You may want to
iconify some of the
objects to free up some
space. You’ll need the
test and the graph of
the original data.
It is helpful to be able to visualize the P-value as
an area under a distribution.
16. With the test selected, choose Show Test
Statistic Distribution from the Test menu.
The curve shows the probability density for the
t-statistic with 14 degrees of freedom. The
shaded area shows the portion of the area under
the curve to the right of the test statistic for
Darwin’s data. We’ve set this up as a one-tailed
test; we’re only interested in the mean difference
being greater than zero. Other times we want to
know if a given mean is different from zero. That would be a two-tailed test. You can
change the test to a two-tailed test using the popup menu to change is greater than to is
not equal to. If you do this, you will see a shaded region on both the positive and negative
ends of the plot. The total area under the curve is one, so the area of the shaded
portion corresponds to the P-value for a two-tailed test.
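To see the "area under the curve" idea in numbers, the following Python sketch integrates the t density directly. It is an illustration under stated assumptions, not how Fathom computes its result: the function names are invented, the integration uses Simpson's rule, and the t value of 2.148 is an assumed value that you should replace with the one shown in your test.

```python
import math


def t_density(x, df):
    """Probability density of Student's t with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Area under the t density to the right of t (Simpson's rule on [0, |t|])."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps          # steps must be even for Simpson's rule
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(a + i * h, df)
    central = total * h / 3      # area between 0 and |t|
    tail = 0.5 - central         # by symmetry, half the area lies right of 0
    return tail if t >= 0 else 1 - tail


t_value, df = 2.148, 14          # assumed t value; read yours from the test
one_tailed = upper_tail_area(t_value, df)
two_tailed = 2 * min(one_tailed, 1 - one_tailed)
print(f"one-tailed P = {one_tailed:.4f}, two-tailed P = {two_tailed:.4f}")
```

The two-tailed figure is simply twice the one-tailed figure because the t distribution is symmetric about zero.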
Let’s investigate how the P-value depends on the test mean, currently set to zero.
17. Drag a slider from the shelf into the document.
18. Edit the name of the slider to TestMean.
19. Select the histogram of original data and choose Plot Value from the Graph menu.
20. In the formula editor, type the name of the slider (or choose it from the Global
Values section of the list to the right of the keypad).
You should see a vertical line in the histogram corresponding to the value of the
slider.
21. Click on the 0 in the statement of the hypothesis in
the test. Choose Change formula for value from the
popup menu.
22. In the formula editor enter TestMean just as you did for the plotted value.
Now the value of the null hypothesis mean in the test, and the shaded area under the
t-distribution, change to reflect the new hypothesis.
23. Drag the slider slowly and observe the changes that take place.
For what value of the slider is half the area under the curve shaded? Why does the
shaded area under the curve increase as the value of the slider increases?
Ideas for Further Exploration
• What effect do the negative values have on the results? Set TestMean to zero and
drag one of the negative points to the left to make it even more negative. How far
can you go before the P-value reaches 0.05?
• Darwin had 15 observations to work with. Could he have gotten away with
fewer? Set up a Test Mean in which you type values into all the parameters of the
test except the number in sample, to which you attach a slider, n, using the
formula round(n). What is the smallest sample size (with the same mean and
standard deviation) that results in a P-value of 0.05 or less?
Stop. Presenter: Answer questions. If there is time for a demonstration, show how to define the t-statistic as a measure for the SimHeight attribute with a population mean of zero. Collect many of
these measures and use the distribution of t-statistic values to show the relationship between the
simulation and the analytically computed value.
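The presenter demonstration described above can be approximated in Python. This is a hedged sketch rather than a transcript of the Fathom demonstration: the function names, the seed, the number of trials, and the observed t of 2.148 are all assumptions made for illustration.

```python
import math
import random


def t_statistic(values, null_mean=0.0):
    """Student's t for one sample against a hypothesized population mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - null_mean) / math.sqrt(variance / n)


def simulated_tail_fraction(observed_t, n, trials, rng):
    """Fraction of null-hypothesis samples whose t is at least `observed_t`."""
    hits = 0
    for _ in range(trials):
        # Population mean 0; the standard deviation cancels out of t, so 1 is fine.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if t_statistic(sample) >= observed_t:
            hits += 1
    return hits / trials


rng = random.Random(2001)  # fixed seed so the run is repeatable
fraction = simulated_tail_fraction(observed_t=2.148, n=15, trials=20000, rng=rng)
print(f"simulated one-tailed P is about {fraction:.3f}")
```

With enough trials the simulated fraction should settle near the P-value that the test reports analytically, which is the relationship the demonstration is meant to show.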
Fathom Demo Documents
The tutorials in this guide introduce you to a small portion of what you can do with
Fathom. Fathom comes with over 40 demo documents, and these provide a fairly
quick way to get an idea of other capabilities and possibilities for using Fathom in the
teaching of math and statistics. While the sample documents show off Fathom’s
capabilities and are often useful, especially for demonstrations, the real fun (and much
of the learning) comes from figuring out how to create such documents in Fathom.
Here we list a few of these documents with brief notes to help you choose which
ones to look at.
Fathom Demos\Algebra\CompoundInterest: This is a slider-parameterized
model of compound interest based on initial investment, interest rate, periods per
year, and payment per period. It comes with questions for students to investigate, but
it also can be used (without the text object) as a demonstration document.
Fathom Demos\Algebra\ScreenSaver: Here we have a kind of parametric plot
that makes interesting Lissajous figures when the sliders that define it are animated.
Consider it as an inspiration for student projects.
Fathom Demos\Data Display and Exploration\Computers in Schools: This
sample doesn’t really show much of anything that Fathom can do, but it does provide
some interesting data as a starting point for investigations about curve fitting.
Fathom Demos\Data Display and Exploration\TexasCounties: Here you can
see how it is possible to use an open collection to display meaningful information, in
this case median values of homes in the counties of Texas.
Fathom Demos\Distributions\Normal: The collection in this document has 100
numbers from a normal distribution whose mean and standard deviation are
determined by sliders. A histogram with a normal distribution curve overlay and a
normal quantile plot allow you to gain some insight into randomness and normal
distributions.
Fathom Demos\Simulations\BrownianMotion: This partially complete
simulation provides a starting point for investigating Brownian motion.
Sample Activities
With Fathom comes Data in Depth: Exploring Mathematics with Fathom, by Tim
Erickson, which contains well over 100 activities for using Fathom in the teaching of
mathematics and statistics. We’ve chosen a few of these activities to help show how
Fathom can illuminate concepts and pose worthwhile challenges. We’ve modified the
pages somewhat to fit within this Workshop Guide.
You can open the files required for the activities (those that require files) by
navigating to the DataInDepth folder within the SampleDocuments folder in the Fathom
folder.
Tall Buildings Sonata (p. 38) is an example of a type of activity in Data in Depth
called a Sonata for Data and Brain. A Sonata is longer than a problem or a worksheet,
but much shorter than a project—maybe at most an hour or two of work. And while
a Sonata lets students use a variety of methods to come to a conclusion, it provides
structure by describing the task, identifying relevant attributes, and giving instructions
on how to get the data. This simple Sonata deals with the problem of predicting the
height of buildings from the number of stories.
The Geography of U.S. Cars I (p. 40) is the first of three activities in which students
study vehicles per person and miles per vehicle.
Area and Perimeter (p. 43) asks the question, “What is the relationship between the
area and the perimeter of randomly constructed rectangles?” Fathom makes the
rectangle data for us in an elementary simulation; when we analyze it, we find that a
curve—whose equation we find—forms the boundary of a cloud of points.
Consumer Price Index (p. 46) looks at the CPI since its inception and computes
inflation rates. This highlights the difference between proportion and absolute
difference.
Heating Water (p. 48) asks you to fit a line to rising water temperature. It looks
linear, but making a residual plot shows previously unseen structure. This is the
introduction to residual plots. It works especially well with chemistry, physics, and
statistics.
Random Walk—Inventing Spread (p. 51) involves simulating a random walk using
Fathom and developing a measure of spread to go with its results.
A New Birthday Problem (p. 55) turns the old chestnut around a little, asking how
many people it takes until you get a match—you simulate this problem.
Aunt Belinda (p. 57) is a Sonata that explores an inference problem based on a
simple binomial situation.
Tall Buildings Sonata
You need Buildings
You’ll show that you can
• make a prediction about a numerical relationship
• make scatter plots
• use a movable line
• develop a rule of thumb
In this Sonata, you’ll study data about the tallest buildings in Minnesota and
Florida. You’ll find the data in the file Buildings in the Exploring Mathematics with
Fathom folder. These data come from the World Wide Web.
The Question
How is the height of a building related to the number of stories it has?
Prediction
The relevant attributes are height (which is the height of the building in feet) and
stories (the number of stories). How do you predict the two quantities will be
related? In particular, about how many feet do you think there are per story?
Measurement
Analyze the data using Fathom. Be sure to use a movable line. Describe briefly
what you did:
Comparison
This is a good place to
describe anything else you
noticed or can explain. For
example, is there any
difference between Minnesota
and Florida? Can you
explain both the slope and the
intercept of any line you used?
Use Fathom to compare your prediction to the data. Report what you found
out. Refine your description of the relationship between height and number of
stories. Develop a rule of thumb (a simple calculation) you can use to predict
the height of a building from the number of stories. Describe that rule:
Tall Buildings Sonata—Teacher Notes
Data in Depth has many exercises in what is called “sonata form.” The main
idea in Sonatas is for students to predict or make conjectures. Insist that students
commit to their predictions and write them down before they see the data. This is a great
way to use class or homework time before you go to the lab. You can even
assess the “prediction” work—as long as you assess it only in terms of its being
sensible, not necessarily correct. Help students make simple mathematical
models (i.e., all linear) but daring ones (i.e., imagining data they have not seen).
Here are several levels of model, all acceptable depending on the students’
experience:
• Qualitative: the more stories, the higher the building.
• Functional: the graph will be a straight line (it’s proportional). The slope
represents the number of feet per story.
• Functional with quantity: the above is true, and the slope is probably about
12 feet per story.
• With context: the above is generally true, but there is definitely a y-intercept,
which might represent towers that tall buildings generally have.
The Geography of U.S. Cars I
You need States – CarsNDrivers
You’ll learn how to
• make new attributes
• define attributes by formula
These data come from the
U.S. Department of
Transportation and are for
1992 except where noted.
You can also scroll to the
<new> column, but this
technique puts the column
closer to the other attributes
you need in this activity.
This activity is about driving in the various states of the U.S. The file, called
States – CarsNDrivers, is in the Exploring Mathematics with Fathom folder. For each
state, it records these attributes:
Attribute      Description
PopThou        Population of the state (in thousands)
DriversThou    Number of licensed drivers (in thousands)
VehThou        Number of registered vehicles (in thousands)
MilesMill      Number of miles driven (in millions)
RoadMiles      Number of miles of roadway in the state (as of December 31, 1994)
AreaSqMi       Area of the state in square miles
You will be computing and comparing various per-unit quantities. We’ll start in
a very step-by-step way and quickly move to asking you the big questions, just
letting you use Fathom to respond.
1. Open States – CarsNDrivers. First we’re going to ask what percentage of the
population in each state drives. For that, we’ll divide the number of drivers
by the population, then multiply by 100 to get the percent. But first. . .
2. Predict! Which states do you think will have the highest and lowest
percentage of licensed drivers? Guess the three highest and the three
lowest.
3. Onward! Click the heading for PopThou. The column is highlighted.
4. Choose New Attribute from the Data menu. When the attribute name box
appears, name it PctDrivers for “Percent Drivers” and close the box.
5. Select the name of the new attribute. Choose Edit Formula from the Edit
menu. The formula editor appears.
6. Enter 100 * DriversThou / PopThou. Close the editor by clicking OK. You should
see numbers like the ones in the picture.
7. Figure out which states have the highest and the lowest percentages of
licensed drivers. Report the three highest and the three lowest.
8. How did you figure out which were the highest and lowest?
9. Why do you suppose those states are the highest and lowest?
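For comparison with the Fathom steps above, here is a minimal Python sketch of defining an attribute by formula and then ranking the cases. The three rows are hypothetical and are not taken from the States – CarsNDrivers file; only the formula itself comes from step 6.

```python
# Hypothetical rows: the names and numbers are made up for illustration only.
states = [
    {"state": "A", "PopThou": 5000, "DriversThou": 3500},
    {"state": "B", "PopThou": 1200, "DriversThou": 900},
    {"state": "C", "PopThou": 800, "DriversThou": 480},
]


def pct_drivers(row):
    """Same arithmetic as the Fathom formula 100 * DriversThou / PopThou."""
    return 100 * row["DriversThou"] / row["PopThou"]


# Add the new attribute to every case, then sort to find the extremes.
for row in states:
    row["PctDrivers"] = pct_drivers(row)

ranked = sorted(states, key=lambda row: row["PctDrivers"], reverse=True)
for row in ranked:
    print(f'{row["state"]}: {row["PctDrivers"]:.1f}%')
```

Because both attributes are in thousands, the units cancel and no conversion is needed for this particular ratio.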
Per Capita Vehicles
The Latin phrase per capita
means “per person.”
10. Use the same procedure to figure out which states have the largest and
smallest numbers of vehicles per capita in the population. First predict
which states will have the most and the fewest vehicles per capita.
11. How many vehicles per person were there for the highest and lowest
states?
12. Write the name of your new attribute and the formula that you used to
calculate it.
13. Sketch the box plot of that attribute.
14. Again, propose an explanation for why those states are the ones that are
highest and lowest.
Miles per Vehicle
15. Do the same thing you did above, only this time compute the average
number of miles each vehicle travels per year in each state. In the formula
for your new attribute, use the attribute MilesMill, which is the total number
of miles driven that year in millions of miles. Be careful! The number of
vehicles, VehThou, is in thousands of vehicles. Include responses to the
preceding five items (but for the new attribute).
This is easiest if you make two
dot plots or box plots, one for
each of the last two new
attributes. When you select
states in one plot, they are
highlighted in the other.
16. Look at the states that had the highest values of vehicles per person.
Where do they fall in miles per vehicle? Explain that. See if the same
pattern holds true for the states with the smallest numbers of vehicles per
person.
The Geography of U.S. Cars—Teacher Notes
This activity is the first of three that appear in Data in Depth. All three are about
cars and the 50 states (plus the District of Columbia). Students use ratios to
compare the states, making many different per-unit quantities.
What’s Important Here
This is about proportion, but let’s highlight a few issues:
• Remind students to see whether their answers make sense. Assess them
through other assignments or by including sense-making in the rubric. This
could easily be a sentence that begins, “0.42 cars per person makes sense
because… ” followed by an explanation or calculation, or “5.7 cars per
person is surprising because in the city hardly anyone has a car… ”
• Any “per” quantity is an average. It represents the whole group (especially
in comparisons), but individuals may differ greatly. Per capita income, for
example, may not represent many people’s actual earnings.
• Help students use what they know about the states. Encourage
geographical speculation. For example, New England states are often
clustered together in the attributes we will study. Why? California, known
for its “car culture,” is unexceptional in these data except for being large.
Why is that?
The Geography of U.S. Cars I
Here are some suggestions for discussion topics:
1. Ask students how they found out which state had the maximum or
minimum of some quantity. Have them explain their methods (e.g.,
looking at a point plot or sorting the table) so the whole class can build its
repertoire of skills.
2. Ask students if they know how far their family cars (if any) are driven in a
year. Compare each distance to your state’s milesPerVeh. Explain any
discrepancies you come across. (This technique, for example, ignores
commercial travel.)
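When checking students' milesPerVeh formulas, the unit conversion is the usual stumbling block. The Python sketch below spells it out; the function name and the example numbers are hypothetical and do not come from the data file.

```python
def miles_per_vehicle(miles_mill, veh_thou):
    """Average miles per vehicle from miles in millions and vehicles in thousands."""
    total_miles = miles_mill * 1_000_000   # millions of miles -> miles
    total_vehicles = veh_thou * 1_000      # thousands of vehicles -> vehicles
    return total_miles / total_vehicles


# Hypothetical state: 45,000 million miles driven by 3,600 thousand vehicles.
print(miles_per_vehicle(45_000, 3_600))  # 12500.0
```

The two conversions reduce to a single factor of 1000, so a student formula of the form 1000 * MilesMill / VehThou gives the same result.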
Area and Perimeter: A Study in Limits
You need to know how to
• make new attributes
• write formulas for attributes
You’ll learn
• how to plot functions
• how to make sliders
• one way to use random numbers
How are areas and perimeters of rectangles related? We all know the
formulas, area = length*width and perimeter = 2(length + width), but if you look at them
from a data perspective, what do you get?
In this activity, we’ll use Fathom to generate the measurements for rectangles
randomly and explore the relationships. You can also generate rectangles
haphazardly using any other technique you like (asking people to make them,
using The Geometer’s Sketchpad®, etc.) and use those data.
1. Open a new Fathom document and create a new case table.
2. Make two attributes, length and width.
3. With the case table selected, choose New Cases from the Data menu. Tell
Fathom to make 200 cases. They appear, though their attributes have no
values.
Making Random Rectangles
We won’t actually make the rectangles here; we’ll just make the measurements.
4. Select length, and choose Edit Formula from the Edit menu to bring up the
formula editor.
5. In that box, enter the formula random(10). Close the box by clicking OK.
Now length is filled with random numbers ranging from 0 to 10.
6. Use that same formula to define values for width. Now you have random
numbers for both attributes.
7. Make a scatter plot of these two attributes; it ought to look random.
Calculating Area and Perimeter
8. Make two new attributes, area and perimeter. Make formulas to determine
these; the formulas should depend only on length and width. Copy the first
two or three cases from your table here:
9. Make a scatter plot with perimeter on the horizontal axis and area on the
vertical axis. Sketch the graph at right; label and scale the axes.
10. If everything went well, there should be no points in the upper-left part of
the graph—a region bounded by a curve. Explain why there are no points
in that region.
The Boundary
Now your goal is to figure out the equation for that curved upper boundary of
the cloud of points. There are two strategies: the theoretical one and the
empirical one. In the theoretical strategy, you figure it out from the equations
and just plot it. These directions will concentrate on the empirical.
The boundary curves upward. It might be exponential, or it might be some
polynomial (such as a quadratic). Since finding areas can involve squaring, we’ll
try a quadratic first, area = perimeter^2.
11. Select the graph and choose Plot Function from the Graph menu. The formula
editor appears.
12. Enter perimeter^2 into the box and close it. The curve will appear, as in the
illustration. (Remember that you don’t enter the left-hand side of a
function in Fathom.) Unfortunately, the curve is too steep.
You can alter the formula on your own, but the following is fun…
13. We’ll make a slider. Choose Slider from the Insert menu. A slider appears; it
looks like a number line with a pointer on it. There’s also some text that
gives the name of the variable and its value, for example V1 = 5. Change the
name of V1 to denom.
14. Edit the plotted function (double-click the equation at the bottom of the
graph). Change it to perimeter^2 / denom. Now you can control the value of denom
by dragging the slider’s thumb. As you do so, the function changes. You
will have to change the scale of the slider. Do it the way you change an
axis: by dragging the numbers with the hand, or by Ctrl-Shift-clicking
(Windows) or Option-Shift-clicking (Mac) to zoom out.
15. Change the value of denom until the curve accurately forms the boundary.
What value of denom did you get? (It should be greater than 10.)
16. Study the points near the boundary. What is the shape of the rectangles
near the boundary curve? That is, what do they have in common?
17. Explain why your answer to the preceding question makes sense. That is,
why do all those rectangles have that property?
Going Further
1. Show, algebraically, what the curve’s equation must be, and show that it’s
the same as the one you got by looking at the data.
2. Depending on how you made the rectangles, there may be another empty
zone in the lower-right corner of the scatter plot. Why are there no
rectangles there? What is the equation of that border? Why?
Area and Perimeter: A Study in Limits—Teacher Notes
Structured computer activity
Data generated in simulation
Intermediate, medium features
As described, students simulate creating random rectangles. You may also do
this very effectively (especially with less experienced students) if you have them
create genuine rectangles, cut them out, and measure them. Make sure, though,
that students have experience with parabolas if you do the second page. If not,
simply exploring the plot is worthwhile.
What’s Important Here
• Using random numbers for simulation.
• Using a scatter plot of data to illuminate geometrical constraints.
Topics
• area and perimeter
• parabolas
• visualization
• fitting nonlinear functions using variable parameters
Discussion Questions
• What do rectangles in the lower left look like? (Small.)
• How about rectangles in the lower right? (Long and skinny.)
• What are the shapes of the rectangles along the curved border? (They’re
squares.)
• Why do you suppose so many of the points seem to line up close to the
boundary? (Large-area rectangles must be close to being squares or they’d
have a dimension larger than 10. For smaller rectangles, you can have a
surprisingly large length / width before the rectangle moves a long way from
the border. But most important (and subtlest), the distribution of the
quotient of two uniform random variables, larger/smaller, is greatly skewed
toward 1. So you do get more squarish figures.)
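The last point, about the quotient larger/smaller, is easy to check by simulation. The Python sketch below is one possible way to do it; the function name, seed, and sample size are assumptions chosen for illustration.

```python
import random


def squareness_ratios(count, rng, upper=10.0):
    """larger/smaller side for `count` rectangles with uniform random sides."""
    ratios = []
    for _ in range(count):
        length = rng.uniform(0.0, upper)
        width = rng.uniform(0.0, upper)
        smaller, larger = sorted((length, width))
        if smaller > 0:                     # skip the (very unlikely) zero side
            ratios.append(larger / smaller)
    return ratios


rng = random.Random(7)  # fixed seed so the run is repeatable
ratios = squareness_ratios(10_000, rng)
for limit in (2, 4, 10):
    share = sum(r <= limit for r in ratios) / len(ratios)
    print(f"larger/smaller <= {limit}: {share:.0%} of rectangles")
```

Roughly half of the rectangles should have a long side less than twice the short side, which is why the cloud of points is densest near the boundary where the squares lie.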
What’s the Formula?
Here’s an elegant answer to Going Further 1, supplied by a teacher: The curved
boundary is made of the rectangles with the maximum area for a given
perimeter. Those are squares. So what’s the area of a square with perimeter P?
Each side is $P/4$, so the area is $P^2/16$. Therefore, the equation is $A = P^2/16$.
There are other perfectly good approaches.
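One such approach, offered here as an illustrative sketch rather than as the teacher's own argument, uses only the fact that a square is never negative. Writing $\ell$ for length, $w$ for width, $A = \ell w$, and $P = 2(\ell + w)$:

$$(\ell - w)^2 \ge 0 \;\Rightarrow\; (\ell + w)^2 \ge 4\ell w \;\Rightarrow\; \left(\frac{P}{2}\right)^2 \ge 4A \;\Rightarrow\; A \le \frac{P^2}{16}$$

Equality holds exactly when $\ell = w$, so the rectangles on the boundary are the squares.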
What about the other boundary? That’s created because we’re limited to 10 in
length and width. So, for some perimeters there is a minimum area. For example,
with a perimeter of 30, what is the skinniest (minimum area) rectangle we can
make? You can only spend a maximum of 20 on two lengths, and you’ll have 10
left over for the widths. You wind up with a rectangle of area 50.
The width of such a rectangle (taking length = 10 as the long side) is
$$\text{width} = \frac{\text{perimeter} - 20}{2}$$
so
$$\text{area}_{\min} = 10 \times \frac{\text{perimeter} - 20}{2} = 5\,(\text{perimeter}) - 100$$
which is the line we seek for perimeter > 20.
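Both boundaries can be checked against simulated rectangles. The Python sketch below assumes sides drawn uniformly between 0 and 10, as with random(10); the function names, the seed, and the small rounding tolerance are choices made for this illustration.

```python
import random


def random_rectangle(rng, upper=10.0):
    """Length and width drawn uniformly between 0 and `upper`."""
    return rng.uniform(0.0, upper), rng.uniform(0.0, upper)


def area_bounds(perimeter, upper=10.0):
    """(min_area, max_area) possible for a given perimeter with sides <= upper."""
    max_area = perimeter ** 2 / 16                       # the square
    # Below a perimeter of 2 * upper a side can shrink to zero, so the floor is 0.
    min_area = max(0.0, (upper / 2) * perimeter - upper ** 2)
    return min_area, max_area


rng = random.Random(16)  # fixed seed so the run is repeatable
tolerance = 1e-9         # allow for floating-point rounding at the boundaries
for _ in range(200):
    length, width = random_rectangle(rng)
    area = length * width
    perimeter = 2 * (length + width)
    low, high = area_bounds(perimeter)
    assert low - tolerance <= area <= high + tolerance
print("all 200 rectangles lie between the line and the parabola")
```

For a perimeter of 30 the function returns a minimum area of 50, matching the worked example above, and a maximum of 56.25 for the square with side 7.5.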
Consumer Price Index
You’ll learn how to
• make new attributes
• use the formula editor
• compute differences in Fathom
Scientists often use the Greek
letter delta to mean “change
of.”
Note: You get a bogus result
in the first year because there’s
no previous year. Remember
to ignore that first point (we’ll
deal with it more explicitly in
a future activity).
In general, things get more expensive over time. How much more expensive?
One way to tell is to look at a statistic called the Consumer Price Index (CPI).
The CPI in these data is set so that 1967 = 100. This means that when the CPI
is 200, things generally cost twice as much as they did in 1967.
1. The file CPI contains annual data from 1913
to 1997. Open it.
2. Make a scatter plot showing CPI as a
function of year. Your graph should look
like the one in the illustration. As you can
see, things just keep getting more expensive.
3. People worry a lot about inflation. One way
to define inflation is that it’s the amount by
which the CPI is increasing. Look at the
graph: In what year does it look as if the
CPI was increasing most steeply?
4. Let’s figure out just how steep, using Fathom. Make a new attribute by
clicking in the column header marked <new>. Call it deltaCPI.
5. Now we make a formula that computes the change of CPI from the
preceding year to this one. Open the formula editor by selecting your new
attribute name and choosing Edit Formula from the Edit menu.
6. Enter CPI – prev(CPI) (prev stands for “previous”). Click OK to close the
formula editor. The differences should appear in your deltaCPI column.
7. Make a new graph that shows deltaCPI as a function of year. It should look
like the graph shown here. Now, in what year did CPI increase most
steeply? Explain how you used the graph to tell.
Some people would say that a big increase doesn’t matter—only a big
percentage increase does. After all, a CPI increase of 20 when the CPI is 300 is
not as important as when the CPI is 50. We’ll investigate this.
8. Make a new attribute, percentChange, that calculates the percent change of
the CPI from the preceding year. Graph percentChange as a function of year.
Sketch and label that graph below, and write the formula you used for
percentChange.
9. What’s a typical percentChange in the last decade of data? In what year, since
the beginning, was the percent change the greatest?
Consumer Price Index—Teacher Notes
Structured computer activity
Data from file
Intermediate, basic features
This activity is about computing rates. Students compute a rate simply by
taking the difference between one case and its predecessor. In a later activity,
students must also divide by the difference in time, just as when they take
derivatives in calculus.
Topics
• rates
• proportional rates
Students need to be able to
• open a file
• make a scatter plot
Materials
• CPI
Consumer Price Index data are from http://stats.bls.gov/cpihome.htm.
We have intentionally chosen this set of data to minimize the issue of initial conditions.
What’s Important Here
• Students learn how to use prev( ) to make a formula look at the previous
row.
• Subtracting the previous value from the current one in order to calculate
change is a cornerstone of numerical analysis. Time spent checking for
understanding will be a good investment.
• Students begin to see that the first case in a recursive or inductive
definition may have to be treated specially.
• Students see the relationship between a graph of the data and a graph of
the change in the data.
• Students see and calculate the difference between change as an absolute
quantity and change as a relative (percentage) quantity.
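The ideas in this list can be sketched in Python for teachers who want to check student formulas. The function names and the index values are hypothetical, and marking the first case as None is a choice made here; it is not a description of what Fathom's prev( ) returns for the first row.

```python
def deltas(values):
    """Change from the previous case; the first case has no predecessor."""
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


def percent_changes(values):
    """Change from the previous case as a percentage of the previous value."""
    return [None] + [
        100 * (curr - prev) / prev for prev, curr in zip(values, values[1:])
    ]


# Hypothetical index values, not taken from the CPI file.
cpi = [50.0, 70.0, 300.0, 320.0]

print(deltas(cpi))
print(percent_changes(cpi))
```

In this made-up series the index rises by 20 twice, but the rise from 50 is a 40 percent change while the rise from 300 is under 7 percent, which is the distinction between absolute and relative change.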
Discussion Questions
1. In the graph of deltaCPI, why is the first point so strange?
2. Given the data we have through 1997, what do you think the CPI is now?
3. Ask if anyone can remember the price of something in 1997 or earlier.
Then ask students to use the CPI and extrapolate to the present to predict
the price now. Does the price match? If the prediction is high or low, what
does that mean?
4. Sometimes people use a CPI based on a different year. What formula
would you use to convert the 1967 = 100 CPI to one in which
1990 = 100?
5. Why would you want to look at the percentage increase in the CPI rather
than the absolute increase?
6. What historical features can you see on the graphs? Can you explain them?
(The Great Depression, World War II, the 1980s recession)
7. If there’s a peak in the “delta” graph, what does that mean on the
“straight” graph? (It’s at a high slope.) If there’s a peak in the “straight”
graph, what does that mean in the “delta” one? (It’s descending through
zero.)
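One possible answer to Question 4 is to divide every value by the value in the new base year and multiply by 100. Here is a minimal Python sketch of that conversion. The function name and the index values are hypothetical and chosen only to make the arithmetic easy to follow; they are not real CPI figures.

```python
def rebase(index_by_year, new_base_year):
    """Rescale an index so that new_base_year equals 100."""
    base_value = index_by_year[new_base_year]
    return {year: 100.0 * value / base_value
            for year, value in index_by_year.items()}


# Hypothetical values on a 1967 = 100 scale, for illustration only.
old_index = {1967: 100.0, 1990: 400.0, 1998: 500.0}
new_index = rebase(old_index, 1990)
print(new_index)  # {1967: 25.0, 1990: 100.0, 1998: 125.0}
```

Rebasing multiplies every value by the same constant, so the percent change from one year to the next stays the same.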
Tech Note
In this activity, we ask students to identify points on a scatter plot; for example,
“In which year did the CPI increase the most?” Students can read the numbers
off the axis or find them in the table, but they can also simply point at the
point. The cursor changes to a left arrow and the coordinates of the point
appear in the status area in the bottom left of the Fathom window.
Heating Water
Sketch your prediction here:
In this activity, we’ll explore residuals, which are the amounts “left over” when
you model data with a function.
The author heated a pan of water, recording how many minutes and seconds
had passed each time the temperature increased by 5 degrees Celsius. Heating
has the data for temperatures between 35 and 90 degrees Celsius, which span
the times between 40 seconds and 205 seconds after the heat was turned on.
1. Before you open the file, sketch what you think the graph will look like.
That is, how does temperature increase with time for a pan of water on the
stove? Is it linear? If not, how does it curve? Make a labeled and scaled
sketch at left.
2. Now open the file. Make a new graph. Put time on the horizontal axis and
temp on the vertical. Describe how the actual graph differs from your
prediction.
3. Choose Least-Squares Line from the Graph menu. A least-squares fit to the
data appears. Write the equation for the least-squares line below and tell
what the slope means (e.g., what are its units?).
The line should look as if it goes through all the points, and r² should be very
close to 1.00. We might make a mistake here and say, “Wow, temperature
increases linearly. The fit is nearly perfect.” That is, we might think we’re
finished. It’s true that the line explains a lot of the temperature change, but two
things should bother you:
• It’s hard to believe that it really is linear, that is, that the stove can heat
hot water as fast as it can heat cold water.
• It’s hard to believe that our measurements of times and temperatures are
so accurate that the points fall on the line exactly.
So we’ll look at the graph more closely.
4. For several of the points, zoom in to the point and record, in the table
we’ve provided, approximately how far it is from the line (in degrees
Celsius). Use positive numbers for above the line, negative for below.
These numbers are called the residuals.
Temp.   How Far Off
35
40
45
50
55
60
65
70
75
80
85
90
(You don’t have to do all the points, just enough to see a pattern.)
5. Describe any pattern you see in the residuals.
6. Let’s make this process easier. Make a new attribute called predictedTemp.
7. Give predictedTemp a formula—the formula of the least-squares line. Now
the values of predictedTemp are the temperatures on the line for each of the
times in the collection.
8. Make another new attribute, residuals. Give it the formula temp – predictedTemp.
Explain why these residuals are the same as the “distance in degrees
Celsius” you reported in Step 4 (especially, is the sign right?).
9. Make a new separate graph and plot residuals as a function of time. Sketch
that graph (labeled and scaled) at left. The pattern you see in the graph
should be the same as the one you saw in the numbers in Step 5. Now,
what does that pattern (the curve) mean in terms of the situation of
heating water? That is, it’s not linear. In what way is it not linear, and why
do you suppose that’s true?
A Great Shortcut
10. Click on the original graph (temp as a function of time) to select it. Choose
Make Residual Plot from the Graph menu. Describe what appears.
11. Drag a point on the graph. Describe what happens to the line and to the
residual plot (undo the drag afterwards so that your data are safe).
Going Further
1. You can also use the residual plot to assess the accuracy of the
measurements. The thermometer was scaled with lines every degree (about
1.5 mm apart). One possibility is that the line really is a good model and
the residuals are just due to errors in measuring temperature. Is that very
likely? Base your response on the size of the residuals and on the pattern of
the residuals.
2. The author claims that he watched the thermometer and wrote down the
time when the temperature passed 30, 35, 40, . . . degrees. Do you think he
had enough time to do this (or did he invent the data)? If he was honest,
how accurate do you think his time measurements were?
3. Suppose the temperature measurements were completely correct and
accurate. What measurement blunders (in size and pattern) would the
author have had to make in the time measurements to produce the pattern
in the residuals?
4. Suppose the author, seeing the thermometer pass a temperature where he
needed to take a reading, did the same thing for every measurement: He
turned from the stove, wrote down the temperature, looked at his watch,
and wrote down the time. There is a delay between the time when the
temperature was correct and the time when he looked at his watch.
Describe the effect this might have on the fitted line and on the pattern of
residuals.
Heating Water—Teacher Notes
Three structured computer
activities
Data from files
Intermediate, basic features
This activity is the second of three that introduce students to residuals and
residual plots. Our approach in Residuals Playground and Heating Water is to have
students calculate residuals directly—by having Fathom calculate from the
given function—before introducing the Make Residual Plot command.
The third activity, Radiosonde Residuals, is suitable either as practice to follow the
second or as a quick introduction for more experienced students. It’s also
suitable for homework or open lab time.
What’s Important Here
Topics
• residuals
Students need to be able to
• make scatter plots
• make new attributes
• use formulas to define attributes
We use residuals for many purposes in data analysis. These activities focus on
using residuals to see patterns that are invisible in a “regular” graph.
• You can see deviations (from the curve) or a pattern (e.g., nonlinearity)
better because the vertical scale of the residual plot is so different from the
one in the main plot. Making the residual plot magnifies the data with
respect to the model (the line).
• If residuals don’t look random, we should question the model. In Heating
Water, for example, the linear model is OK for some purposes
(interpolating temperatures to within a degree, say), but we can see that the
phenomenon is really not linear.
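The point of the two bullets above can be demonstrated outside Fathom with a short Python sketch. It fits a least-squares line and computes residuals as observed minus predicted, the same definition the student pages use. Everything specific here is an assumption for illustration: the function names are ours, and the data are generated from a made-up, gently curved formula rather than taken from the Heating file.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed value minus the value predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up data: times in seconds and temperatures from an invented curve.
times = list(range(40, 206, 15))
temps = [20.0 + 0.4 * t - 0.0003 * t ** 2 for t in times]

slope, intercept = least_squares_line(times, temps)
res = residuals(times, temps, slope, intercept)
for t, r in zip(times, res):
    print(t, round(r, 2))
```

In a plot of temps against times the points look almost straight, but the printed residuals are negative at both ends and positive in the middle. That curved pattern is what the residual plot magnifies.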
Discussion Questions
1. What are the units on the vertical scale of the
residual plot? (degrees Celsius)
2. Compare the main plot to the residual plot.
Why are the scales so different?
3. (Especially in Residuals Playground.) What would
happen if we defined a residual as (predicted –
observed) instead of the other way around? (For
one thing, when we moved the data one way,
residuals would go the other. It would be
confusing.)
One way to think of residual
plots is that we turn the
model—the line or curve—
into a new axis.
A graph from Heating Water
(page 48) with a residual plot.
Tech Notes
Note: The student pages do not tell students how to zoom in. We describe one
way here; you can also encourage students to look in Help.
• Students need to zoom in for Heating Water. To do so, hold down Ctrl
(Win) Option (Mac) so that the cursor changes to a magnifying glass and
drag a rectangle around the region you want to investigate. The rectangle
expands to fill the whole graph. Rechoosing Scatter Plot from the popup
menu in the graph rescales the plot.
• You can do Heating Water with your own data (especially if you’re teaching
chemistry, for example). If the burner is hot, you get a good, linear-looking
regime in the middle. At the beginning, it flattens out as the container
heats; at the end, it flattens as the water gets closer to boiling. If you
include these “flat” areas, (a) students should definitely use the movable
line, not least-squares, and fit it to the linear part of the data, and (b)
students will need to expand the vertical scale of the residual plot in order
to see the curve in the linear part of the temperature graph.
Random Walk—Inventing Spread
You’ll learn how to
• build a simulation
• use the inspector
• define measures
• collect measures
You can’t save your work if you are running the Fathom demo.
A random walk is a process in which, for every step, you flip a coin. Heads, you
step east. Tails, you step west. On the average, you’ll wind up where you
started—or will you? It depends on what you mean by average. This activity will
help you explore the random-walk situation.
Building the Simulation
We’ll begin by building a simulation of a random walk in which we’ll take 10
random steps from zero. Each step will be +1 or –1.
1. Open a new Fathom document.
2. Create a new case table, either by dragging one off the shelf or by choosing
Case Table from the Insert menu.
3. Make a new attribute, distance. To do this, click in the table header marked
<new>. Type distance in place of <new> and press Enter. This attribute, distance,
will record our distance from the start during our random walk.
4. Make 10 cases (these will be the steps in the walk). Choose New Cases from
the Data menu, enter 10, and click on OK to close the box.
We now have 10 cases. We need a formula to compute the values for distance.
5. Select distance and choose Edit Formula from the Edit menu. The formula
editor appears.
6. Enter this formula: prev(distance) + randomPick( –1, 1). That is, our distance from
the beginning at every step is the preceding distance plus either +1 or –1.
Click OK to close the formula editor. Numbers appear in the table,
representing our distance from the beginning after each step.
7. Press Ctrl-Y (Win) ⌘-Y (Mac) to rerandomize. Do this a few times. Describe
what happens.
8. Let’s make a graph. Make a new graph by dragging one off the shelf or by
choosing Graph from the Insert menu.
9. Drag the attribute distance from the top of the column in the case table to
the vertical axis of the graph. You’ll see a graph like the upper one shown
here.
10. The dots show us what values distance took during our walk. Choose Line
Plot from the popup menu in the graph. You’ll see a graph like the lower
one shown here.
11. Rerandomize again; you’ll see the graph wiggle. (Note: The graph may
wiggle out of its original bounds. To change the bounds, drag the numbers
on the vertical axis. The cursor-hand changes depending on where on the
axis you drag. That tells you what will happen to the axis.)
12. Let’s rename our collection. Double-click its name (Collection 1) and rename
it Steps.
13. Save your work. Call the file RandomWalk.
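If you would like to compare the Fathom simulation with one written in a programming language, the formula from Step 6 can be sketched in Python as follows. This is our own illustration, not part of Fathom. The function name is made up, and we assume the walk starts at zero, as the activity describes.

```python
import random


def random_walk(n_steps=10, rng=random):
    """Return the distance from the start after each step."""
    distance = 0  # assumption: the walk starts at zero
    path = []
    for _ in range(n_steps):
        # Same idea as prev(distance) + randomPick(-1, 1)
        distance += rng.choice([-1, 1])
        path.append(distance)
    return path


walk = random_walk(10, random.Random(1))
print(walk)
```

Running random_walk again with a different generator plays the role of rerandomizing: each call produces a new list of 10 distances.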
Finding the Averages
Each random walk is different. We want to know, on the average, how far you
are from the place where you started.
14. Rerandomize 20 times. Each time, record the position (positive or
negative) after the tenth step.
15. Calculate (by hand or with a calculator) the average of those numbers.
What do you get?
16. What do you expect to get in the long run? Why?
17. This time, calculate the average of the absolute values of the positions. What
do you get?
18. Explain the difference between the meanings of the two averages above.
Why would you ever use one? Why the other?
Collecting Measures
It should not be obvious what to expect in the long run from the absolute
values. But if we do more walks, we should get a better idea. Fathom will help
us record the final positions and calculate the means.
To do that, we need to find the last value for distance in the simulation. We’ll
call that final. Then we need to collect a set of finals so we can average them.
Here’s how:
19. Double-click on the collection icon (box of balls) to bring up the
inspector.
20. Click on the Measures tab. We’re heading
for something like the picture shown here.
21. Make a new measure by clicking on the
<new> label and typing final. Press Enter
when you’re finished typing the name.
22. Double-click to the right of final in the
Formula column to bring up the formula
editor.
23. Enter this formula: last(distance). The function last( ) returns the last value in
the list.
24. Leaving the inspector open, press Ctrl-Y (Win) ⌘-Y (Mac) to see the value of
final change. Be sure that it does before closing the inspector.
25. Now we’ll get Fathom to collect the finals. Select the collection (the box of
balls). Choose Collect Measures from the Analyze menu. A new collection
appears (called Measures from Steps; it may be under the inspector).
26. With the new (measures) collection selected (it should already be selected),
choose Case Table from the Insert menu. Now you can see the various final
positions. This list should resemble the one you made by hand.
27. Make a new graph and drag final to the horizontal axis. Change the graph to
a histogram by choosing Histogram from the popup menu in the graph.
28. We need to see the mean. With the graph selected, choose Plot Value from
the Graph menu. The formula editor appears.
29. Enter mean() and close the editor with OK. The value and a line appear on
the graph.
30. Sketch the graph at left. Explain below what the graph represents in terms
of the walk. That is, where did those numbers come from? Be precise and
concise.
31. We also want to see the mean of the absolute values. Plot another value
with the formula mean( | final | ). (You can use the on-screen keypad to get the
absolute value bars.) Sketch its position on the graph you drew for Step 30.
Explain why this number is greater than the previous one.
32. Now we want to do it again. Double-click on the measures collection to
bring up its inspector. It has a Collect Measures pane. Select it; you’ll see
something like the one shown here.
33. Click Collect More Measures to collect a new set of 5. If you position the
inspector correctly, you can see both graphs update.
34. Increase the number of measures to 40. Collect measures again. What do
you think are reasonable values for the “eventual” mean of final and the
mean of the absolute value of final?
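The idea of collecting measures in Steps 19 through 34 can also be sketched in Python. This is a rough parallel under our own assumptions, not a description of how Fathom works internally: each call to final_position stands for one rerandomized walk, and collect_finals stands for the measures collection. All names and the seed value are ours.

```python
import random


def final_position(n_steps, rng):
    """Position after n_steps random steps of +1 or -1, starting from zero."""
    return sum(rng.choice([-1, 1]) for _ in range(n_steps))


def collect_finals(n_walks, n_steps=10, seed=None):
    """Run n_walks separate walks and keep only the final position of each."""
    rng = random.Random(seed)
    return [final_position(n_steps, rng) for _ in range(n_walks)]


def mean(values):
    return sum(values) / len(values)


finals = collect_finals(n_walks=40, n_steps=10, seed=2001)
print("mean of final:", mean(finals))
print("mean of |final|:", mean([abs(f) for f in finals]))
```

Changing n_walks from 40 to 400, or n_steps from 10 to 40, is one way to try the experiments suggested in Going Further.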
Going Further
1. If you’ve studied the binomial theorem and expected values, compute the
theoretical mean of the absolute value and compare it to the result of your
experiment.
2. How will the mean of the absolute value change if we collect 400 walks,
say, instead of 40? Explain why.
3. How will the mean change if our walk is 40 steps instead of 10? Explain
why and do the experiment. Compare your results to your prediction.
Random Walk—Inventing Spread—Teacher Notes
Structured computer activity
Data generated in simulation
Starter, complex features
The random walk is a basic situation for the theoretical study of spread. Under
the hood, it’s a binomial problem: the populations at the various positions
after n steps follow the distribution in the nth row of Pascal’s triangle. With
enough steps, the distribution approaches normality, and (of vital importance)
the standard deviation grows in proportion to the square root of n.
Here we ask a less sophisticated question: What’s the average distance you’ve
gone after a random walk of 10 steps?
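For teachers who want the theoretical values at hand, the binomial description above can be turned into an exact calculation. The Python sketch below is ours (the function names are not from Fathom). It assumes a fair coin and steps of +1 or -1, so that a walk with k heads out of n steps ends at position 2k - n.

```python
from math import comb, sqrt


def final_distribution(n_steps):
    """Map each possible final position to its probability."""
    total = 2 ** n_steps
    return {2 * heads - n_steps: comb(n_steps, heads) / total
            for heads in range(n_steps + 1)}


def expected_abs_final(n_steps):
    """Theoretical mean of the absolute value of the final position."""
    return sum(abs(pos) * p for pos, p in final_distribution(n_steps).items())


def sd_final(n_steps):
    """Standard deviation of the final position."""
    dist = final_distribution(n_steps)
    # The mean is 0 by symmetry, so the variance is the mean of the squares.
    return sqrt(sum(pos ** 2 * p for pos, p in dist.items()))


print(expected_abs_final(10))  # 2.4609375
print(sd_final(10), sqrt(10))
```

For 10 steps the mean absolute distance is 2520/1024, or about 2.46, and the standard deviation equals the square root of 10. Students' experimental means from 40 walks will scatter around the first number.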
What’s Important Here
Topics
• random walks
• spread
Materials
• RandomWalk (optional)
• This activity gives students some experience with which to underpin the
random-walk situation.
• What we mean by average (e.g., whether we take the absolute value) varies
with context.
• The activity develops experiences that will help students understand the
theoretical results.
Discussion Questions
1. What’s the horizontal axis in a line plot?
2. What does the histogram of the finals really mean? Explain the graph to
someone who hasn’t done the activity.
3. In general, do you expect someone who has taken more random steps to
be farther from the start? Why?
4. In general, do you expect someone who has taken twice as many random
steps to be twice as far from the start? Why?
A Big Fathom Idea
This activity introduces derived collections. We create what we call a measures
collection—a collection whose job is to collect summary values from another
collection. Those values are statistics—measures of that “source” collection. In
the case of our random walk, they’re the final position of the walk. Here’s a
table that compares them.
Random Walk (Source) Collection | Measures Collection
Each case is a step. | Each case represents one random walk.
The whole collection is one random walk, a collection of steps. | The collection summarizes many random walks (5 by default).
The final position is a single number (a statistic, a measure) that summarizes the whole walk. | Each case contains the final position of one walk, so the collection has many “final positions.”
You can’t calculate the mean distance from the origin here because this collection is only one example. | You can calculate the mean distance from the origin in this collection by averaging the final positions.
A New Birthday Problem
Here’s a new version of an old problem.
Suppose people come into a room one at a time. Whenever a new person
comes into the room, he or she announces his or her birthday. If it matches
that of anyone else in the room, the door is locked. No one else is let in.
On the average, how many people will get into the room before this happens?
What’s the distribution of that number?
Conjecture and Design
First, what do you think? About how many people will come in before there’s a
birthday match?
Design a simulation that will answer the question. This answer should include
not only the average number of people but also the distribution of that number
if the task were repeated many times. Briefly describe how you’ll do it. What
collections will you need? Any special formulas? Any things you don’t know
that you need to find out?
Construction
Build your simulation in Fathom and run it. Report your results below: Sketch
the graph of the distribution and give the average number of people you got.
Comparison and Reflection
Do your results seem reasonable? How did they compare with your conjecture?
Did your simulation work as planned? Describe any modifications you had to
make.
A New Birthday Problem—Teacher Notes
Data from simulation
Independent, complex features
This problem is a variation on the old one that asks, “What’s the probability
that out of 30 people in a room, two will have the same birthday?” The
surprising answer is that the probability is better than .5. Here we turn it
around a little and ask how many people it takes until you get a match—and
simulate that.
If students have trouble, suggest that they start with a collection with one case
with one attribute, birthday, with a formula of randomInteger(1,365). They should
sample from that, using an until formula in the sample pane.
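To check students' results, you can run the same kind of simulation outside Fathom. The Python sketch below is one possible version under stated assumptions, not the Fathom construction itself: birthdays are whole numbers from 1 to 365 with equal chances, and the count includes the person whose birthday produces the match. The function names and the seed are ours.

```python
import random


def people_until_match(rng, days=365):
    """Number of people who have entered when the first match occurs.

    Assumption: the count includes the person whose birthday matches.
    """
    seen = set()
    while True:
        birthday = rng.randint(1, days)  # both endpoints are included
        if birthday in seen:
            return len(seen) + 1
        seen.add(birthday)


def simulate(n_trials, seed=None):
    rng = random.Random(seed)
    return [people_until_match(rng) for _ in range(n_trials)]


def exact_mean(days=365):
    """Exact expectation: add up P(no match among the first k people)."""
    total, p_no_match = 0.0, 1.0
    for k in range(days + 1):
        total += p_no_match              # probability the count exceeds k
        p_no_match *= (days - k) / days  # the next person must avoid k days
    return total


counts = simulate(2000, seed=365)
print(sum(counts) / len(counts), exact_mean())
```

The list counts holds the distribution the activity asks for, and exact_mean gives a theoretical average to compare with the simulated one. If students decide not to count the matching person, both averages drop by 1.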
Aunt Belinda
Aunt Belinda claims to have magnetized her kitchen table. She says that it
makes nickels come up heads when she flips them. “Magnets don’t attract
nickels,” you say.
“Ordinary magnets don’t,” she admits, “but mine is special.” You supply her
with 20 nickels. She sits at her table and throws the coins into the air. When
they come down, 16 are heads.
Of course, you ask her to do it again. Of course, she refuses. “The energy fields
in my table are exhausted,” she says. “Besides, one proof is enough.” What do
you think?
Conjecture and Design
Without any calculations, what do you think is the chance of throwing 16
heads in 20 throws? Do you think a result of 16 out of 20 is convincing
evidence? If so, evidence of what?
Now design an experiment you can do with Fathom to compute
experimentally the probability of getting 16 or more heads out of 20 throws.
Describe what you will do before you set hand to mouse.
Measurement
If you happen to know about
the randomBinomial() function,
please don’t use it except to
check your work.
Build your simulation and run it. Describe below any changes you made in
your experiment design above. Sketch the histogram (or whatever display you
use) that finally gives your results.
Comparison and Conclusion
How did the actual probability compare to what you thought it would be?
Write what you think you can conclude from your experiment with Aunt
Belinda. Be as clear as possible.
Aunt Belinda—Teacher Notes
The Aunt Belinda problem is one of a suite of four called Inference Sonatas and
Other Challenges. We present the teacher notes for all of them even though only
one problem’s student pages are included in this guide.
Many computer investigations
Data from everywhere
Independent, complex features
Aunt Belinda
This Sonata is an inference problem based on a simple binomial situation.
One solution is to make a collection with 20 cases, each a coin with an attribute
face controlled by a formula such as randomPick("heads", "tails"). Then make a
measure, nHeads, with a formula such as count(face = "heads"). Now collect
measures from that collection. The distribution of nHeads in the Measures
collection will show that a result of 16 heads out of 20 is unusual. Students will
vary in what they do with that result.
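The solution just described can be mirrored in Python for teachers who want a quick check. This is a sketch with names of our own choosing: n_heads plays the role of the nHeads measure, and each trial stands for one collected measure. We also include the exact binomial tail probability for comparison, assuming fair coins.

```python
import random
from math import comb


def n_heads(rng, n_coins=20):
    """Flip n_coins fair coins and count the heads."""
    faces = [rng.choice(["heads", "tails"]) for _ in range(n_coins)]
    return faces.count("heads")


def estimate_tail_probability(n_trials, threshold=16, seed=None):
    """Fraction of simulated throws with at least `threshold` heads."""
    rng = random.Random(seed)
    hits = sum(n_heads(rng) >= threshold for _ in range(n_trials))
    return hits / n_trials


def exact_tail_probability(n_coins=20, threshold=16):
    """Exact chance of at least `threshold` heads with fair coins."""
    favorable = sum(comb(n_coins, k) for k in range(threshold, n_coins + 1))
    return favorable / 2 ** n_coins


print(estimate_tail_probability(20000, seed=16))
print(exact_tail_probability())  # 6196 / 1048576, about 0.0059
```

With a probability this small, a measures collection of only a few dozen cases may contain no results of 16 or more at all, which is worth discussing with students.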
Hot Hoops
We invite students to explore the phenomenon of being “hot” or “cold” in
sports. Do people get “hot hands,” or are we just looking for patterns in
randomness?
The key is to design an experiment in which there is an identifiable null
hypothesis (there are no streaks) and you can construct a statistic to measure
streakiness.
The suggestion on the student sheet leads to a 2-by-2 array that you can test for
independence using chi-square or any equivalent statistic. That is, the null
hypothesis is that there is no relationship between the outcome of the
preceding shot and this one.
Another alternative is to look at streak length itself. The function runLength( ) will
help you investigate streaks in your data; for example, max(runLength(shot)) will
give the length of the longest string of cases in which shot has the same value.
If the data are streaky, this should be large. Or count(runLength(shot) = 1) will give
the number of streaks. In streaky data, this number is small. To simulate the
null hypothesis, scramble your data so that order is random and build up a
distribution of your statistic.
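The scrambling approach just described might look like this in Python. It is a sketch under our own assumptions, not Fathom's code: we assume runLength( ) gives each case its position within the current run of equal values, which is consistent with the two formulas above. The shot record is made up for illustration, and all function names are ours.

```python
import random


def run_lengths(values):
    """Position of each case within its current run of equal values."""
    lengths = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            lengths.append(lengths[-1] + 1)
        else:
            lengths.append(1)  # a new run starts here
    return lengths


def longest_run(values):
    return max(run_lengths(values))


def number_of_runs(values):
    return run_lengths(values).count(1)


def scramble_distribution(values, statistic, n_scrambles, seed=None):
    """Values of the statistic when the order of the data is shuffled."""
    rng = random.Random(seed)
    shuffled = list(values)
    results = []
    for _ in range(n_scrambles):
        rng.shuffle(shuffled)
        results.append(statistic(shuffled))
    return results


# Made-up shot record (H = hit, M = miss), for illustration only.
shots = list("HHHHMMHMMMHHHHHM")
observed = longest_run(shots)
print(observed, number_of_runs(shots))  # 5 6

null_values = scramble_distribution(shots, longest_run, 1000, seed=7)
as_extreme = sum(v >= observed for v in null_values) / len(null_values)
print(as_extreme)
```

The last number is the fraction of scrambled orders whose longest run is at least as long as the observed one. A small fraction suggests the data are streakier than chance ordering would usually produce.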
Beyond Constructivist Dice
These are two pages of problem prompts based on the Constructivist Dice
activity.
More Inference Challenges
Here are more problems that apply other inferential techniques from this
chapter. Some of them use the census microdata files you may have used for
activities in the first chapter of this book. If you have not, open the Census Files
folder in the Samples Documents folder to choose a file with data from your area
or one you’re interested in.