Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Workshop Guide for Fathom Dynamic Statistics™ Software Version 1.1 Contents Introduction ...................................................................................................................... 1 Getting Started with Fathom ............................................................................................. 7 Tutorial 1: Data, Formulas, and Prediction—Wrist Versus Height .................................. 9 Tutorial 2: An Introduction to Exploring Data—Residents of Beverly Hills, California 15 Tutorial 3: Formulas, Functions, and Data—The Planets .............................................. 21 Tutorial 4: Simulation—Polling Voters ........................................................................... 25 Tutorial 5: Testing a Hypothesis .................................................................................... 29 Fathom Demo Documents ............................................................................................. 35 Sample Activities ............................................................................................................. 37 Authors: William Finzer and Tim Erickson Editor: Jill Binker Lead Programmer for Fathom: Kirk Swenson Development Team for Fathom: William Finzer, Kirk Swenson, Nick Jackiw, Jill Binker, Matt Litwin, Tim Erickson, Eugene Chen, Tony Thrall, Zach Teitler, Denise Howald, Caroline Wales, and Vadim Keylis Production Director: Diana Jean Parks Production Coordinator: Ann Rothenbuhler Cover Designer: Caroline Ayres Publisher: Steven Rasmussen Portions of this material are based upon work supported by the National Science Foundation under award number III–9400091. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. © 2001 by Key Curriculum Press. Fathom’s Formula Editor © 1999 by Pacific Tech (www.PacificT.com). All rights reserved. ™Fathom Dynamic Statistics and the Fathom logo are trademarks of KCP Technologies. The gold balls in Fathom’s collections are taken from an After Dark® screen saver and are used with permission. After Dark is a registered trademark of Berkeley Systems, Inc. All rights reserved. All other registered trademarks and trademarks in this book are the property of their respective holders. 10 9 8 7 6 5 4 3 2 04 03 02 01 ISBN 1–55953–518-0 The demonstration edition of Fathom and this Workshop Guide are for demonstration purposes only. Key Curriculum Press grants users the right to use and copy these for the purposes of evaluating the materials or for use in teacher workshops. Copied materials must include this notice and Key Curriculum copyright. The demonstration software and guide cannot be used for classroom or other extended use. The full Fathom package is available from Key Curriculum Press, 1150 65th Street, Emeryville, CA 94608, (800) 995-MATH (6284). Introduction As its title suggests, this guide is meant to be used in a workshop setting, one in which participants can work together and in which a workshop leader can give demonstrations, lead discussions, and answer questions. For learning to use Fathom outside a workshop setting, try the Learning Guide that comes with Fathom, or the Walkthrough Guide that comes with the demo. To the Workshop Participant Welcome to Fathom and to the world of Dynamic Statistics. The tutorials in this guide will introduce you to the most important features of Fathom; they begin on page 9. As you work through the tutorials, you will encounter watch icons such as the one at left. These indicate a pausing place. Your workshop leader is likely to ask that when you reach these watch icons, you stop working through the tutorial and explore what you have learned thus far. When most people have reached the watch icon, your workshop leader will probably have everyone pause so that you can discuss things as a group or the workshop leader can do a brief demo. At the end of the workshop, take this booklet and the Fathom Demo CD with you to explore and evaluate Fathom in more detail. But remember that the Demo CD and Workshop Guide are for demonstration purposes only, and are not for classroom or other extended use. And, as you become comfortable using Fathom, we encourage you to help others by leading workshops yourself. If you are waiting for the workshop to begin, please read the section entitled “The Structure of Fathom,” beginning on page 3. To the Presenter We have found that a successful and simple workshop format for Fathom involves little more than turning participants loose to work in pairs, with the tutorials in this guide. (If you prefer that participants work singly, have them identify a partner with whom they can consult when they have problems.) It is extremely useful to have a computer that can project to a screen or large monitor that everyone can see. The watch icons interspersed throughout the tutorials indicate reasonable pausing places where you can ask everyone to stop work for a moment to answer questions about the preceding section or to do a brief demonstration. It’s especially useful to have one or more participants come forward and show something they have found in front of the entire group. In the later tutorials, as participants have become more independent, you may want to skip watch icons or eliminate their use altogether. Key Curriculum Press gives copy permission to any presenter who wants to give this Workshop Guide to workshop participants. Key Curriculum Press also gives permission to install multiple copies of the Fathom Demo or a single network server copy for the purposes of giving a workshop. What Makes the Fathom Demo Different from the “Real Thing?” The Fathom demo has all the features of the full version with a few important exceptions: You can’t print, save, or export your work. You can do investigations, open documents, import or enter data, and use all the features of the full version, but you cannot save any documents you create. You also cannot export data you generate or copy pictures of Fathom objects to paste into another application. Finally, you cannot print your work. Fathom Workshop Guide ©2001 Key Curriculum Press 1 The full version comes with important and useful documentation. • The Learning Guide contains a much more complete introduction to Fathom’s features. • The Reference Manual provides how-to instructions for use of Fathom. • Data in Depth: Exploring Mathematics with Fathom, by Tim Erickson, contains 300 pages of classroom-ready activities in blackline-master form. 2 ©2001 Key Curriculum Press Fathom Workshop Guide The Structure of Fathom If you’re waiting for the workshop to begin, this section will help prepare you for the workshop itself. Don’t worry if you don’t have time to read it before the workshop begins. It makes equally good, perhaps even better, reading after the workshop. The Fathom Window Tool shelf Case table Collection Text Graph (scatter plot) Graph (histogram) A collection, opened up A collection, made small (iconified) A Fathom document lives in a window and contains objects. There are several different kinds of objects, led by the collection (which holds the data). The collection looks like a rectangle with gold balls in it, or, if you make it small, a box of gold balls. Graphs, tables, and statistical tests are objects, too. But they do not contain data. They just provide ways of looking at or altering data. (Deleting a graph or table leaves your data intact; deleting the collection deletes your data.) Sliders are also objects; they control variable parameters, such as the coefficient of a function or a probability in a simulation. Text resides in objects, too. Text objects allow you to add explanatory notes. When you first use Fathom, collections are boring and may even appear unnecessary. Why can’t the data just live in a table, like in a spreadsheet? The answer is deep; one way to look at it is that Fathom lets you distinguish between the way things are and the way they look. A table is just one representation of the data, and we didn’t want to give it the privileged position of being the data’s “native” appearance. So we invented another representation, the gold ball. Another reason for the existence of collections is that it makes some more advanced data manipulation, such as sampling, easier. For example, you sample from among the gold balls in the collection—and the balls go into another collection. Cases and Attributes Each gold balls is a case. A case has attributes (also known as variables, but we had to pick a word and stick with it) and attributes have values. If the values are all numeric, then it’s a continuous (measurement) attribute; if the values are text, it’s a categorical attribute. If your cases are people, height and age are continuous; sex and race are categorical. Fathom Workshop Guide ©2001 Key Curriculum Press 3 Whether an attribute is continuous or categorical determines many things—including the kinds of graphs you can make to represent it and tests and estimates you can perform on it. The Inspector When you double-click a collection, its inspector appears. It will show you the names and values of the attributes, one case at a time. You may edit the values here. You can also add measures (more on this later) and comments to the collection, change how the cases look, and control advanced functions, such as sampling, using the inspector. If you’re typing a lot of data into Fathom, the inspector is not the most convenient place for it; you’ll want to use a case table instead. Inspector (Windows platform) Case Tables The case table is the familiar tabular view of data. Each case appears as a row; each attribute as a column. You can enter and edit data quickly in a case table and add new attributes. You can also rename attributes, move columns by dragging them, resize columns by dragging their borders, sort the data, and even hide attributes to make your table less crowded. But though you can change data through a case table, it is only one view of the data. If you delete the case table, the data are still present—in the collection. Graphs Fathom also lets you view data graphically. The graph object supports various kinds of charts and plots. You tell Fathom what to graph by dragging attribute names to an axis of the graph. You specify the graph type by using a menu in the corner of the graph object. The configuration of attributes determines what kinds of graphs are available. For example, if you put age on an axis and leave the other axis blank, you can get a box plot or a histogram—because age is a continuous attribute. But if you put sex on that same axis, you can’t get a histogram—because sex is categorical. You get a bar chart instead. You can also add things to graphs, such as functions. Functions can be anything you can express with a formula (see the next section) as well as the more familiar lines such as the least-squares linear regression line or the median-median line. In most kinds of graphs, you can change the data by dragging unless the data are locked. So, for example, you can drag points in a scatter plot and see how they affect a least-squares line. Ubiquitous Formulas Formulas are everywhere in Fathom. You can use formulas as filters to tell which cases you want to see and you can use them to compute values (including random 4 ©2001 Key Curriculum Press Fathom Workshop Guide values) for attributes. You can use formulas to devise statistical measures and to plot formulas as functions in graphs. You write formulas in the formula editor. The formula editor appears when you double-click the place where a formula belongs, or when you select certain menu items. For example, to make an attribute that is the logarithm of income, make a new attribute, then select its name. Choose Edit Formula from the Edit menu. In the formula editor, enter log(income) and click OK. The new attribute will fill with the logs of the incomes. Later, any change in the incomes will instantly be reflected in their logs. The formula editor supports a wide array of functions, from trig functions to statistical Windows platform distributions to conditionals such as if() and switch(). They are all listed in the function and attribute list, a pane at the right in the editor. Double-click an item in the list to enter it into the formula pane above. You can also enter it by typing. The formula editor supports syntax coloring, that is, it displays the names of functions and attributes in special colors. If you type an attribute name or function and it remains black, Fathom does not recognize it (you probably have a typo). Using the Formula Editor The Effect of Selection Before After The peculiar nature of mathematical formulas makes editing them different from editing ordinary text. First, certain characters— parentheses, quotes, and absolutevalue signs—must appear in pairs. Fathom always makes the second of the pair when you enter the first. Second, grouping and order of operations require special editing capabilities. New users of Fathom can use their intuition most of the time, but there is one new principle they must learn: The effect of some keys depends on what, if anything, is selected when you press them. For example, if a + b—without parentheses—is selected, and you press (, you get (a + b)—with parentheses. If you press * (for multiplication), you get (a + b)*. In each of the examples at left, the user types /4. Finally, you need to click OK to close the formula editor. But sometimes you want to see the result of your formula—to see if it’s right—before closing the editor. In that case, click Apply on the screen or press Enter on your keyboard; this applies your changes, leaving the editor open. However, you must close the formula editor before you can go on to do other things. The formula editor is not case sensitive; that is, it doesn’t care whether you type capital or lowercase letters. Measures A regular attribute is a case attribute: It has a separate value for each case in the collection. But there are also measures—attributes with one value for the entire collection. For example, in a collection of five dice, each might have an attribute called number. But the collection itself could have a measure, or total, that applies to the entire collection. It would have a formula: sum(number). So, you can think of measures as statistics that Fathom Workshop Guide ©2001 Key Curriculum Press 5 summarize your collection, though they can serve other functions as well. You create measures by using the inspector. Double-click a collection to make the inspector appear, then go to the Measures pane. Derived Collections Some of Fathom’s collections fill with data automatically, according to rules you specify. These are called derived collections. There are several kinds. The two most important are samples and measures collections. A sample is just what it sounds like: a collection that is a sample of some other collection, called its source. Select the source and choose Sample Cases from the Analyze menu. The sample collection appears. Control how many cases it samples and whether it samples with or without replacement in its inspector. The measures collection is the key to simulation and analysis. It converts measures into case attributes—so you can record statistics (measures) about your collections. (It’s a bit tricky and is best understood after you’ve tried it.) Suppose you have a collection with five (randomly generated) dice in it and a measure, or total, that contains their sum. If you connect a measures collection to that source collection, the measures collection can record the total as a case. If you tell Fathom to collect 100 measures, it will instruct the dice to re-roll 100 times, it will compute each total, and it will make a collection of those 100 values. You have built up the distribution of that statistic. To make a measures collection, select the source collection and choose Collect Measures from the Analyze menu. Again, control how many measures it collects in its inspector. 6 ©2001 Key Curriculum Press Fathom Workshop Guide Getting Started with Fathom The Fathom Demo CD contains an installer that will install Fathom on the hard drive of your computer. Alternatively, you can run Fathom directly from the CD. System Requirements Windows • Pentium-based system or equivalent • Windows 95, Windows 98, Windows 2000, or Windows NT 4 • 16 megabytes of memory • CD-ROM drive Macintosh • OS 8 or better • 16 megabytes of RAM • CD-ROM drive Install Fathom Windows 1. 2. 3. 4. Insert the Fathom CD into your CD drive. Double-click the My Computer icon on your desktop. Double-click the CD icon. (It may be labeled either D: or Fathom.) Double-click the Setup icon and follow the directions on the screen. Macintosh 1. Insert the Fathom CD into your CD drive. 2. Double-click the CD icon. 3. Double-click the Installer icon and follow the directions on the screen. Starting Fathom When Installed on the Computer’s Hard Drive The installer places a shortcut or alias to Fathom on your desktop. (It looks like a little gold ball.) Double-click it. The Fathom application will launch. From the CD The top level of the CD contains a Fathom folder. Double-click this folder to open it. Inside the folder is the Fathom application icon. Double-click it. Note that depending on the speed of the computer and its CD drive, it can take considerable time to start the Fathom demo. Fathom Workshop Guide ©2001 Key Curriculum Press 7 Tutorial 1: Data, Formulas, and Prediction—Wrist Versus Height Here we look at entering data with an eye toward predicting values of one attribute from another attribute. You’ll learn how to make a collection and enter data. From the collection you’ll make several plots, including a dot plot and a scatter plot. Presenter: Allow 10 minutes for this first section. 1. Chances are, there is a shortcut or alias to the Fathom application on your desktop. If so, double-click it. This will launch Fathom, giving you a window similar to the one below. Making a Collection and Entering Data We have the measurements for five people’s heights and wrist circumferences. Given just these five measurements, how well can we predict height if all we know is wrist circumference? 2. If you don’t already have an empty document, make one with the New command in the File menu. Across the top of the Fathom window is a shelf with tools for making objects. Selection tool Another way to get a case table is to choose Case Table from the Insert menu. Fathom Workshop Guide Collection Case table Graph Summary table Estimate Test Slider Text 3. Drag a case table from the shelf into the document. When you click, your mouse pointer becomes a closed fist. As you move the mouse over the document, you see a frame that shows you where the case table will be placed when you release the mouse. You should get an empty table similar to the one shown at right. 4. Click once on <new>. 5. Type Name for the first attribute and press Enter. ©2001 Key Curriculum Press 9 When you press Enter, a box representing an empty collection appears next to the table. 6. Make the table wider by dragging on its right edge. This will give us room to see all three of our attributes at once. 7. Make two more attributes: WristCircum and Height. You should now have a table and an iconified collection as shown at right. 8. Make the WristCircum column wider by dragging its right edge. 9. Double-click on the label Collection 1 in the title of either the table or the collection. This should bring up a dialog for renaming the collection. 10. Type People and click OK. Stop. Presenter: Demonstrate how to move and resize the table. Answer questions. Allow 5 minutes for the following section. Entering Data Now we’re ready to enter the names and numbers for our five people. 11. Click in the empty cell below Name. 12. Type Bill and press Tab. 13. Enter the rest of the data shown in the table at right. All measurements are in inches. Use Tab to move to the next cell. At the end of each case, Tab will take you to the first cell in the next case. 14. If you are not using the demo version of Fathom, you should save and name your file. Undo and Redo Fathom records all changes that you make to your document and lets you undo and redo them. 15. Click on the Edit menu. The first item in the Edit menu shows you what you can undo. If the last change you made was to type in a value, the undo command is Undo Value Change as shown at left. 16. Choose Undo from the Edit menu several times. Each time you choose Undo, Fathom takes you one step further back. You can continue this until you get to the state of the document when you opened (or created) it. 17. Choose Redo from the Edit menu. Redo is the inverse of Undo. You can tell from the label in the Edit menu what it is you are about to redo. Stop. Presenter: Answer questions. Demonstrate use of arrow keys to move inside the case table. Show how to rename an attribute. Demonstrate keyboard shortcuts for Undo and Redo commands. Allow 10 minutes for the next section. 10 ©2001 Key Curriculum Press Fathom Workshop Guide Graphing Height Versus Wrist Circumference Now let’s look at the relationship between the two continuous attributes, Height and WristCircum. 18. Choose Graph from the Insert menu. You can also make a graph by dragging the graph icon from the shelf to an empty area in the document. 19. Drag the word WristCircum to the horizontal axis of the graph over the spot labeled Drop an attribute here. (The illustration below shows the steps involved.) Drag on the column header for WristCircum. The mouse pointer becomes a closed fist. As you move the mouse over the xaxis, a black border appears, showing that you can release there. 20. Drag the column header for Height and drop it on the vertical axis of the graph. Your graph should look similar to the one at right. Rescaling Axes and the Help System Let’s use the question, “How do I rescale the axes of a graph?” as an opportunity to introduce Fathom’s help system. 21. Choose Fathom Help from the Help menu. Fathom launches your Web browser. 22. Double-click on How Do I …?. 23. Double-click on Work with Graphs. 24. Click on Adjust axis bounds. 25. Read and try at least the main strategy entry in How to Adjust the Bounds of Axes. You can also see a video clip that demonstrates dragging. 26. Close the help window. 27. Rescale the axes of your graph. 28. From the Graph menu, choose Rescale Graph Axes to put the axes back to their original positions. Dragging Data You can drag data in a graph to change it, making it easy to observe the effect of changes in the data on graphs and analyses. 29. Move the mouse so that the tip of the arrow is on top of one of the data points. Notice that the arrow changes from a northwest pointer to a west pointer. This is your clue that you’re positioned to drag. Notice also that the status bar at the bottom left of the Fathom window shows the coordinates of the case. 30. Drag the data point. As you drag, you should see the values for the two attributes changing in the case table. Fathom Workshop Guide ©2001 Key Curriculum Press 11 31. Choose Undo Drag from the Edit menu. The point returns to its original position. You can prevent data from being accidentally changed by selecting the collection and choosing Lock Collection from the Data menu (this prevents you from dragging data points). Make sure Lock Collection is unchecked before continuing with this activity. Fitting a Movable Line Recalling that our goal is to be able to predict height from wrist circumference, and noticing that the points in the graph appear reasonably linear, we place a line in the graph. You can also right-click 32. Select the graph by clicking on it once. (Win) Control-click 33. Choose Movable Line from the Graph menu. (Mac) on the graph to The line that appears in the graph is not a fitted line. bring up a menu with You can change its slope and intercept by dragging commands that apply to it. Dragging on one end of the line causes it to rotate the graph. around the other end. Dragging on the middle of the line moves it parallel to itself. The cursor changes shape to suggest what will happen when you drag. Notice that the equation of the line shown below the graph updates as you drag. You might suppose that a line through these data Notice that the lower should have zero for a y-intercept. You can try out this idea by choosing Lock Intercept left corner of the graph at Zero from the Graph menu. You can unlock the intercept by choosing the same is not necessarily the point (0, 0). command again to uncheck it. 34. Experiment with dragging the line. Position it so that it appears to give a good fit to the data. While eyeballing a fit through data points is sometimes sufficient, we often need some criterion for a best fit. A commonly used criterion is a least-squares fit. You can see how least-squares works. 35. With the graph selected, choose Show Squares from the Graph menu. The graph now shows a square constructed from each point to the line, a square whose length is equal to the difference between the actual and predicted value for the point—the residual. 36. Experiment with dragging data points. Notice that the squares change as you drag, but the line does not move. (Be sure to use Undo to put the points back where they started from before going on.) 37. Experiment with dragging the line. Notice that the squares change and that the sum of squares reported below the graph changes. 38. Adjust the line so that the sum of squares of the residuals is approximately at a minimum. The line that satisfies this criterion is called the least-squares regression line. Fathom can compute this line. 39. With the graph selected, choose Least-Squares Line from the Graph menu. How closely did you manage to adjust the movable line to match the least-squares regression line? 12 ©2001 Key Curriculum Press Fathom Workshop Guide Stop. Presenter: Answer questions. Demonstrate how to rescale the axes of a graph either by using the menu command or by reselecting the chosen plot in the graph’s popup menu. Mention the Undo command and the Revert Collection command as ways to restore data in the collection. Demonstrate the Lock Collection command. Mention that a document can be locked to prevent inadvertent saves of changed data. Emphasize that the Graph menu is not present unless a graph is selected. Using the Show Squares command on a dot plot (unstacked), demonstrate the mean of univariate data as equivalent to the least-squares line for bivariate data. Allow 15 minutes or more for the last section, depending on how much time you wish participants to spend on Ideas for Further Exploration. Making a Prediction—Looking at Residuals The shortcut for Undo is Ctrl-Z (Win) a-Z (Mac); for Redo it is Ctrl-R (Win) a-R (Mac). 40. Measure your own wrist circumference in inches. (You can use the measure on the edge of page 59.) 41. Choose Movable Line from the Graph menu (to uncheck it). The movable line disappears. 42. Move the mouse pointer along the least-squares line, noting that the coordinates of the tip of the arrow are reported in the status bar in the lower left corner of the window. When the x-coordinate of the mouse pointer is at your wrist circumference, you can read off a prediction for your height. Probably this prediction was a bit off. Let’s see how much off we might expect it to be. 43. With the graph selected, choose Make Residual Plot from the Graph menu. A plot of the residuals appears below the main scatter plot. Corresponding to each point in the original graph is another point below whose y-value is the difference between the predicted value and the actual value (its distance from the line). 44. Drag one of the points in the top graph. Notice how its residual changes in the residual plot. Notice also that, since the least-squares line is changing in response to dragging the point, the other residual points are changing as well. 45. Use Undo to return the data to their original values. The residuals appear to lie roughly in a band from –3 to +1. How far off was the prediction of your height? Did it lie within that band? 46. Add yourself as a data point in the case table. How much does this change the slope and intercept of the least-squares line? The Influence of an Outlier Let’s play around a bit with the idea that one data point may or may not have a significant influence on the regression line. First, we need a little room to create an outlier. Fathom Workshop Guide ©2001 Key Curriculum Press 13 To zoom in, hold down the Ctrl key (Win) Option key (Mac) and click or drag; adding the Shift key zooms out. 47. Manipulate the axes by dragging the upper and lower ends of each axis toward the middle. You should have something similar to the graph shown at right. (The bounds of the axes have changed.) 48. Drag the rightmost data point far from the line. Notice how wildly the slope and intercept of the regression line can change in response to changes in one of the points. A least-squares regression line is quite sensitive to outliers, making it especially important that you look at your data graphically before reporting a least-squares slope or intercept. 49. Return the data point to its original value. Ideas for Further Exploration Hold down the Shift key while you drag to constrain the motion of a point. • Find several qualitatively different arrangements of the five data points that have a clear geometry but result in an r-squared value of zero. • Quantify the influence of particular points on the slope of the least-squares line. You should come up with a way to measure the rate of change of the slope per inch of height (or wrist circumference). Which point has the most influence? Which point has the most influence on the y-intercept? • The Graph menu has a command to show a median-median line. Investigate this line’s behavior when data is dragged and compare it to the behavior of the leastsquares regression line. Stop. Presenter: Answer questions. Ask participants who have found interesting things (possibly by working on the Ideas for Further Exploration) to show the rest of the group what they found. Ask for participants’ reactions to the term “dynamic statistics” and why it applies to Fathom. 14 ©2001 Key Curriculum Press Fathom Workshop Guide Tutorial 2: An Introduction to Exploring Data— Residents of Beverly Hills, California This tutorial shows how you can explore data using graphs and tables. We’ll look at a sample of residents of Beverly Hills, California. The data were obtained from the 1990 U.S. Census “microdata.” Each case represents one person who filled out a long census form in 1990. Presenter: It may be helpful to demonstrate how to open the Beverly Hills document before turning participants loose on this tutorial. Allow 10 minutes for the first section. Opening the BeverlyHills Document There is probably a Fathom shortcut or alias on the desktop that you can doubleclick to launch Fathom. Fathom Workshop Guide 1. If Fathom is not already running, launch it. 2. Choose Open from the File menu. 3. In the resulting dialog box, open the Sample Documents folder, then the Learning Guide Starters folder, and then the BeverlyHills file. (If you aren’t in the Fathom folder, first navigate to the Programs folder (Win) or hard drive (Mac), then open the Fathom folder.) Your document window should look similar to the one shown below. There are two objects: a collection of cases, each case a person represented as a gold ball, and a text object, explaining about the document. ©2001 Key Curriculum Press 15 Inspecting the Data You can also bring up the inspector by selecting the collection and then pressing Ctrl-I (Win) a-I (Mac). 4. Double-click the collection. The inspector appears. If you double-click on a case, the inspector shows that case; otherwise, it defaults to the first case in the collection. The inspector shows the first few attribute names in the left column. To see the remaining attribute names, use the scroll bar to scroll down the list, or resize the inspector by dragging on the window border. 5. Click on the inspector’s Comments tab. You will see a brief description of this data set. The Comments pane of the collection is a good place to keep documentation and notes about the data. It is also where notes and attribute definitions may appear in collections that you download from the Internet. 6. Close the inspector by clicking its close box. Making a Case Table The keyboard shortcut for inserting a case table is Ctrl-T (Win) a-T (Mac) Often you want to see the data in a table. Here’s how to make a case table in Fathom. 7. Click once on the collection to select it. (When you have it selected, it has a border around it.) 8. Choose Case Table from the Insert menu. The table that appears should be named BeverlyHills and should show the attribute names in the top row. If the table is empty, you probably did not have the collection selected before choosing the command. To connect the collection to the table, drag the name as it appears in the collection onto the table as shown below. Drag from the collection to the table and release. Each row in a case table shows attribute values for one case. Each column corresponds to one attribute. You can use a case table to change values, to enter new data, to add attributes, and to define formulas for attributes. 9. Drag a bottom corner of the case table to enlarge the table to see more cases and attributes. Making Graphs Let’s look at the range of ages among the people in the sample by creating a graph of age. 10. Drag a graph from the shelf into your document. Fathom needs to know which attribute to graph. 16 ©2001 Key Curriculum Press Fathom Workshop Guide 11. Drag the age attribute by its name from the case table to the horizontal axis of the graph over the spot labeled Drop an attribute here. The result should be a stacked dot plot of the ages of the people in the sample, as shown at right. We see that the ages of the people in our sample range from zero to almost 100. 12. Move the mouse pointer over one of the points in the plot. In the lower left corner of the Fathom window you should see information about the point under the mouse. Different graphs will show different kinds of information as you move the mouse pointer over the plot. A stacked dot plot chooses to stack dots whose values lie inside a bin the width of one dot. To get more control over the binning, we need a histogram rather than a dot plot. 13. Click on the tab in the upper right corner labeled Dot Plot. A popup menu, as shown at left, appears. 14. Choose Histogram from the popup menu. This histogram groups people according to their ages. Each bin contains the number of people indicated along the vertical axis by its height. Notice the peak in the range from thirty to forty. Perhaps these are the baby boomers. Stop. Presenter: Answer questions. Demonstrate having more than one graph visible simultaneously. Show linked selection in the two graphs. Show how to delete components with the command in the Edit menu and with the Delete key (on a Mac, a-D also deletes the selected object). Demonstrate the View in Window command. Allow 5 minutes for the following section. Plotting Values Suppose we want to compare the mean and median of the age distribution. We can do that right on the graph. 15. With the graph selected, choose Plot Value from the Graph menu. This brings up the formula editor, ready for you to specify the expression whose value you want plotted. 16. As shown at right, type mean( ). Notice that when you type the left parenthesis, you get the right parenthesis automatically. 17. Click OK. A vertical line appears in the graph at the mean and the value of the mean is shown below the graph. Now the median. 18. With the graph still selected, choose Plot Value from the Graph menu again. You could type median( ), but instead, here’s how you can choose it from the list of functions. 19. Click on the control next to the word Functions in the list. Fathom Workshop Guide ©2001 Key Curriculum Press 17 Apply updates the graph but does not close the formula editor. Then you see the kinds of functions available. 20. Open the Statistical list by clicking its open control. 21. You’ll find median in the One Attribute group; double-click it to enter it into the formula. Notice that the bottom portion of the formula editor displays some information about the median function and gives an example of how to use it. 22. Click OK. The graph should look similar to the one shown at right. Let’s experiment with how these plotted values behave when data are changed. 23. Drag one of the bins of the histogram. Notice that the mouse pointer becomes a hand when you are over a bin (as opposed to a bar boundary), indicating that you can drag the whole bin. As you drag, watch the two plotted values. When does the value for the mean change? When does the value for the median change? Can you demonstrate that the median is insensitive to outliers, while the mean is not? 24. Restore your data to their original values. Stop. Presenter: Answer questions. Demonstrate that any expression may be plotted; e.g. mean( ) + 1.96 * s( ). Demonstrate that a normal distribution can be plotted on top of the histogram and that this is especially easy with a density histogram. Demonstrate that you can supply an argument for a function, but that if you don’t, Fathom assumes the attribute on the axis is the argument. Allow 10 minutes for the following section. Making a Summary Table The keyboard shortcut for inserting a summary table is Ctrl-U (Win) a-U (Mac). You can also drag one from the shelf. Formulas are displayed under the table. The values appear in the same order in the table cells. 18 A summary table is another important tool for exploring data. 25. Choose Summary Table from the Insert menu. Fathom puts an empty table in your document, as shown at right. The summary table is empty because Fathom doesn’t know which attributes we might be interested in. 26. Drag the sex attribute to the summary table, dropping it on the down-pointing arrow. (As with the graph, a black outline appears when you are over a drop area.) The result should be similar to the illustration. From this table we see that there are 79 females and 71 males, 150 cases altogether. By default, when showing a categorical attribute, the summary table displays the number of cases in each group. 27. Drag the age attribute to the summary table, dropping it on the right-pointing arrow. When you add a continuous attribute such as age, Fathom adds the mean for each group and the overall mean. Suppose you want to know something other than the count and the mean. You can add as many formulas as you want to a summary table. ©2001 Key Curriculum Press Fathom Workshop Guide You will only see a if the summary table is selected. Summary menu 28. With the summary table selected, choose Add Formula from the Summary menu. A formula editor appears, ready to accept any algebraic expression whose value you wish to compute. Let’s compute the range of ages. Fathom doesn’t have a built-in range function, so we’ll take the difference of the maximum and minimum values. 29. Type the following expression into the formula editor max( ) – min ( ). 30. Click OK. A third formula appears under the previous two, and a third value appears in each cell. Here are a few useful things to know about using summary tables. Try out the ones that interest you. • You can change which attributes are displayed in the table by dragging attributes to the top or side column or the corresponding arrows. • You can replace one attribute with another by dragging an attribute on top of the attribute you wish to replace. • To remove an attribute from the table, click once on the attribute and choose Remove Attribute from the Summary menu. See if you can answer the following questions using the summary table. • What are the median incomes for males and females? • Are there more married or never married people in this collection? • What is the value of the mean age plus one sample standard deviation for married people in this sample from Beverly Hills? Splitting a Graph 31. Suppose we want to know whether the distribution of incomes differs between men and women. 32. Drag the sex attribute to the vertical axis of the histogram and drag income to its horizontal axis. The result should be similar to that shown here. The sex attribute has split the histogram into an upper histogram of males and a lower histogram of females. We say that we split the histogram of the numeric attribute, income, by the categorical attribute, sex. Notice that two medians and two means are plotted, one for each group. How do you interpret the values listed below the graph? Some graphs can be split by a legend attribute. Let’s look at age versus income, split by marital status. 33. Drag age to the x-axis and income to the y-axis. You get a scatter plot. 34. Drag marital to the middle of the graph. You should get a graph similar to the one at right. Describe the four square points in the upper middle of the plot using all three of the attributes. Find out if they are men or women. Fathom Workshop Guide ©2001 Key Curriculum Press 19 Ideas for Further Exploration • Investigate the graphs you can make with only categorical attributes. • Make up and answer questions like the following. Which group has the higher percentage of males: people who are divorced or people who are married? Which is the most commonly listed ancestry and what is the median income of that group? • Make a percentile plot of income and figure out how to read it. Use it to determine the income corresponding to the 80th percentile. • Use a text object to write a statement about the people in this sample. Make a display to go along with the statement. Repeat two more times. Stop. Presenter: Answer questions. Demonstrate the various kinds of summary tables you can generate, depending on the attribute configuration. Demonstrate adding a filter to a graph or summary table. Allow participants who have completed one of the explorations to show the group what they found. 20 ©2001 Key Curriculum Press Fathom Workshop Guide Tutorial 3: Formulas, Functions, and Data—The Planets This tutorial focuses on using formulas to define new attributes and to plot functions. We’ll look at a collection with only nine cases—the planets in our solar system. From the data we’re given, we’ll compute the planets’ densities and investigate the relationship between a planet’s distance from the sun and the length of its year. You’ll learn how to use formulas to define attributes, how to plot functions in a scatter plot, and how to use sliders. Presenter: Allow 10 minutes for this first section. Computing the Densities of the Planets 1. If Fathom is not already running, start it by double-clicking its icon. 2. Choose Open from the File menu. Choose the file Planets from the Learning Guide Starters folder in the Sample Documents folder. Your window should resemble the one below. You can also create a new attribute just to the left of one that already exists by selecting the existing attribute and choosing New Attribute from the Data menu. Fathom Workshop Guide First we’ll compute each planet’s density. For that we need to know its volume and its mass, since density is the ratio of these measurements. The volume comes from the radius, the attribute R_Mm, measured in millions of meters. The mass, measured in Earth masses, is recorded in the attribute M_Earths. We’ll compute density in two steps using two new attributes (though we could do it in one), beginning with volume. You will want to hide the text object first, to give yourself more room; when it is selected, press Ctrl-H (Win) a-H (Mac). 3. Resize the table, dragging to the right until you see the column headed <new>. 4. Click once on <new>. 5. Type Volume and press Enter. Now we need to tell Fathom the formula for volume. 6. From the Display menu, choose Show Formulas. 7. Double-click in the formula cell (which is shaded) just beneath Volume. The formula editor appears as shown on the next page. Enter the formula using your computer keyboard or the formula editor’s keypad and list. Here are some tips. ©2001 Key Curriculum Press 21 The formula for the volume of a sphere is V = 43 πr 3 • From the keyboard, type * for multiplication, / for division, and ^ (Shift-6) for exponentiation. • To escape the fraction, use either the right arrow key or a mouse click. • Multiplication is sometimes indicated as a dot between terms and sometimes, as in traditional algebra, as nothing between terms, though you need to put it in. • Double-click on an item from the list to enter it into the formula. • Enter π either by choosing it from the Special heading or by typing pi (with an asterisk after for multiplication). • Use the attribute R_Mm in place of r. You can get it from the attributes list or by typing it. Look up Formulas in the help system for more information and hints. You can resize the formula row in the case table by dragging its bottom edge down. 1086.78 is the volume of the Earth in cubic megameters. • Click an item in the list to get an explanation of it in the help area at the bottom of the formula editor. • Vertically resize the formula editor’s areas by dragging their borders. • Use the arrow keys on the editor keypad or on your computer keyboard to move right or left. The up arrow selects the next largest portion of the expression; the down arrow selects a smaller term. 8. Close the formula window by clicking OK or pressing Alt-F4 (Win) a-W (Mac). Fathom computes the volumes and displays them in the table in gray to distinguish them from noncomputed values. The next attribute has an easier formula: Density = Mass Volume 9. Make a new attribute, Density. 10. Give the new attribute the formula shown to the right. The numerator is the mass in Earth masses and the denominator is the volume in Earth volumes, so the density comes out in Earth densities. This lets us understand the density of the planets relative to our own planet. 11. Make a graph with Density on the horizontal axis. It should look similar to the one at right. When you click on a data point, the corresponding planet in the collection is selected. You can select more than one case at a time by clicking and dragging a rectangle around some points as shown below left. When you double-click a planet, you get its inspector. Answer these questions. • Which planet has the lowest density? • What do the three planets on the right have in common? • What do the four planets on the left have in common? • What is special about Earth? 22 ©2001 Key Curriculum Press Fathom Workshop Guide • If the Moon is a chunk of Earth, where should it lie in the graph? Look it up and see where it does lie. Stop. Presenter: Answer questions. Demonstrate that formulas can be copied and pasted, either within the formula editor or from one attribute to another. Reinforce use of arrow keys to navigate around a formula. Allow 10 minutes for the next section. Playing Kepler The graph has to be selected for the Graph menu to appear. To zoom in, click on the axis while holding down the Ctrl key (Win) Option key (Mac). Now we’ll study one of the most famous relationships in physics—how the period of an orbit depends on the orbit’s radius. 12. Make a scatter plot with year on the vertical axis and orbit_AU on the horizontal axis. You should see points lying on a curve gently curving upward. It looks as though we could predict how long a planet takes to go around the sun if we knew the radius of its orbit. But what is the function? One way to find out is to try to plot it. 13. With the graph selected, choose Plot Function from the Graph menu. 14. Type in a plausible function, say, orbit_AU2, and click OK. It’s not quite right! Let’s change the exponent. We’ll use a slider so we can drag continuously from one value to another. 15. Choose Slider from the Insert menu or drag one from the shelf. A slider appears. The illustration below right shows its parts. By default the slider has the name V1. We think that the curve should be between linear and Animation Slider name Slider value button quadratic, that is, between a power of 1 and 2. The slider scale is like a graph axis, so we can change it. 16. Double-click the name and type something more reasonable, like P for power. Draggable “thumb” Scale 17. Zoom in on the slider (or drag the axis) until the scale goes roughly between 1 and 2. 18. Double-click the formula below the graph so that you can edit it. Replace the 2 in the exponent with P. Click OK. 19. Drag the slider’s thumb. As you drag the slider’s thumb, the curve moves accordingly. Another way to change the slider value is to edit it directly. 20. Double-click the slider’s value and type a new one, pressing Tab or Enter when you’re done. 21. Find a value for P so the curve fits nicely through the points. You should find that P = 1.5 or P = 3 2 fits very well. The curve’s equation is year = radius3/2, or, as Kepler would have written it, T 2=R 3, where T is the length of the year and R is the planet’s radius. Stop. Presenter: Answer questions. Demonstrate slider animation. Allow 10 minutes for the next section. Fathom Workshop Guide ©2001 Key Curriculum Press 23 Kepler with Logs Another way to figure out Kepler’s Third Law involves taking the log of both attributes. 22. Make two new attributes, named logorbit and logyear. (These are names, not formulas.) 23. Make their formulas log(orbit_AU) and log(year), respectively. 24. Make a scatter plot of logyear versus logorbit. It looks like a line! 25. Choose Least-Squares Line from the Graph menu. Check out the slope. Look familiar? If T2 = R3, then log(T2) = log(R3). So 2 log(T) = 3 log(R). This means that Kepler’s law implies that the slope of log(year) versus log(orbit) should be 1.5, and we see that it is. As you can see in the graph at right, there is a gap between Mars and Jupiter. What goes there? Ideas for Further Exploration • Investigate the residual plot for both the plot of orbit_AUP and the plot of logyear vs. logorbit. One planet stands out in particular on these residual plots. Which one? Postulate why. • Make an empty graph. Choose Function Plot from its popup menu. Try plotting different kinds of functions. Use sliders as parameters. Stop. Presenter: Answer questions. Ask for participants to show some of the things they tried. A possible demo is to show how to compute an attribute in the collection whose values are the residuals from a least-squares regression line. 24 ©2001 Key Curriculum Press Fathom Workshop Guide Tutorial 4: Simulation—Polling Voters In this tutorial, we simulate a population of voters, a certain proportion of whom will vote in favor of a particular proposition. We investigate the question of how accurately a random sample of voters can predict the outcome of the election. In the process you’ll learn how to use a slider to define a population parameter, to use the random function, to define a measure for a collection, and to repeatedly collect measures from a collection. Presenter: Allow 5 minutes for the first section. Defining the Problem The city of Freeport has a rent control initiative, Prop A, on the ballot. The local newspaper is going to conduct a poll three weeks before the election to gauge public sentiment. Staff members need to know how big a sample to poll. Your job is to set up a simulation they can use to determine, for any given sample size, the accuracy they should expect. Making a Simulated Sample of Voters First we model the population, the people who will vote in the election either for or against the proposition. This model will consist of a single number, the proportion of voters who will vote yes. 1. Start with a new, empty Fathom document. 2. From the shelf, drag a new slider into the document. By default a slider gets the name V1. We want the slider to have a name that fits with our model. 3. Double-click the name portion of the slider, the V1, and type probYes. Since a proportion of voters can only go between zero and 1, we need to adjust the slider scale. 4. Drag the numbers on the slider scale until the scale goes approximately from 0 to 1. 5. Set the value of the slider (by dragging its thumb) to something near 0.5, simulating a close election. The slider models the population. A collection will model a sample of voters. 6. Drag a collection from the shelf into the document. The collection will be empty and named Collection 1. 7. Double-click on name below the box. Use the resulting dialog box to name the collection Sample of Voters. We want this collection to contain 100 cases for starters. Later we can investigate other sample sizes. 8. With the collection selected, choose New Cases… from the Data menu. 9. Type 100 in the resulting dialog box and click on OK. Stop. Presenter: Answer questions. You may wish to give a preview of the if( ) function. Allow 5 minutes for the next section. Fathom Workshop Guide ©2001 Key Curriculum Press 25 Simulating the Votes The shortcut for Rerandomize is Ctrl-Y (Win) a-Y (Mac). So far, the cases in the collection have no attributes. We want one attribute, vote, whose value is either yes or no. 10. Double-click the collection. This brings up its inspector. In it you will define an attribute for a vote. 11. Click <new>, type Vote, and press Enter. 12. Double-click in the formula cell in the right column. You should see a formula editor. Now you need to enter the formula that will choose “yes” the percentage of the time specified in your slider, probYes. 13. Enter the conditional formula shown at right by typing if(random()<probYes Tab “yes” Tab “no”. The random function, if you don’t tell it a minimum and maximum, generates random numbers from 0 to 1. 14. Close the formula editor by clicking the OK button. You should see the value of the Vote attribute in the inspector become either yes or no. Scroll through a few more cases by clicking on the right arrow in the lower left corner of the inspector. Some of the values should be yes and some should be no. Let’s make a graph of the Votes. A bar chart will show us at a glance whether the poll predicts the proposition will pass or fail. 15. Drag a graph from the shelf into the document. 16. Drag the vote attribute name from the inspector onto the x-axis of the graph. You should now have the three objects shown at right. 17. Choose Rerandomize from the Analyze menu. Do this several times. Each time you rerandomize, the bars in the bar chart will change, reflecting the results of a new sample. Another way to rerandomize is to drag the slider thumb. When the slider value is near 1, most of the votes will be “yes,” and when the slider value is near zero, most of the votes will be “no.” Stop. Presenter: Answer questions. Allow 15 minutes for the remaining section of this tutorial. Defining and Collecting a Measure 18. Move the slider back to something close to 0.5. We’ve been hired to predict the accuracy of a poll conducted with a given sample size. To figure that out, we’re going to have to record the prediction of each poll we take. We do that with a measure. 19. In the inspector window, click the Measures tab. As yet no measures have been defined. 20. Click <new> and type proportionOfYes for the measure’s name. 26 ©2001 Key Curriculum Press Fathom Workshop Guide Your inspector won’t look exactly like this because we’ve adjusted its size and column widths. If collecting the measures takes too long, press the Esc key. (On a Mac, a-. also cancels collecting measures.) You can speed things up by turning animation off. 21. Double-click in the formula cell in the proportionOfYes row. 22. Enter the formula proportion(vote = “yes”) as shown at right. 23. Rerandomize several times and observe the values for proportionOfYes. Now we’re ready to perform the poll of 100 voters many times, each time recording the proportion of voters who say they will vote in favor of the proposition. 24. Close the Sample of Voters inspector. 25. With the collection selected, choose Collect Measures from the Analyze menu. You should see a new collection, as shown above right. This collection will contain the results of repeatedly conducting the poll. 26. Double-click the measures collection to open its inspector. 27. Bring the Collect Measures pane forward. 28. Change the number in the measures field from 5 to 100. 29. Click Collect More Measures. Fathom rerandomizes the Sample of Voters collection 100 times, each time computing a value for proportionOfYes and storing that value in a new case in the measures collection. Let’s take a look at the results. 30. Make a new graph. 31. Bring the inspector’s Cases pane forward. 32. Drag the proportionOfYes attribute from the inspector to the x-axis of the graph. 33. From the graph’s popup menu, change the graph from a dot plot to a histogram. An example of what you might get is shown at right. Here are some observations you might make about the histogram of proportionOfYes. • The distribution looks plausibly normal. • The mean of the sample proportions should be near the probYes value. • It looks as if the mean of the sample proportions is near 0.5. We expect this result since the probYes slider was set near 0.5, and the mean of the sample proportions is the proportion for a single sample of 100 x 100 or 10,000 voters. The spread of proportions that we get with a sample size of only 100 is quite large. We certainly couldn’t accurately predict the results of a close election! Just how large is this spread? 34. From the Insert menu, choose Summary Table or drag a summary table from the shelf (the table with the big S). 35. Drag the proportionOfYes attribute from the inspector into the top row of the summary table. Fathom Workshop Guide ©2001 Key Curriculum Press 27 36. Double-click the formula, where it says S1=mean( ), and change it to stdDev( ). This will compute the standard deviation of the proportions gathered in the measures collection. Dependence of Spread on Sample Size Let’s try a larger sample size. 37. With the Sample of Voters collection selected, choose New Cases… from the Data menu. In the dialog box, ask for 300 new cases and click OK. The collection now has 400 cases altogether. 38. Bring back the Collect Measures pane in the measures collection inspector. 39. Uncheck Animation on. 40. Make sure Empty this collection first is checked. 41. Click Collect More Measures to collect 100 measures of 400 cases each. We expect that the larger the sample, the smaller the variation in the observed population proportion in favor of Prop A. But how much smaller? What do you find for the standard deviation of sample proportions with a sample size of 400? What do you think would be the standard deviation if the sample size were 1000? You now have a tool to use with the newspaper staff members. You could sit down with them and try different sample sizes until you find a sample size for which the variability is less than some desired number of percentage points. Ideas for Further Exploration • Determine a sample size such that twice the standard deviation of sample proportions is about 0.03 or 3% when probYes is 0.5. • When we actually conduct a poll, we do not know the true proportion of the population. (After all, that’s why we’re doing the poll.) Probability theory predicts that we can expect to capture the true proportion 95% of the time in the range ± 1.96 p (1− p ) n where p is the proportion found in the sample and n is the number in the sample. Try modeling this in Fathom and see how close your simulation comes to theory. • We have set up this simulation to make it easy to change the population proportion. With small sample sizes, population proportions near the extremes, that is, near 0 or 1, are problematic. Investigate why this might be so. Stop. Presenter: Answer questions. Demonstrate how to gather and display the data shown at right. Do this by defining a new measure in the Sample of Voters collection. Call it n and give it the formula count(). In the measures collection, turn off the Empty this collection first option. Collect measures with different sample sizes. Drag n to the y axis of the graph while holding down the Shift key so that it will be treated as categorical. Have participants who have done further explorations show what they have found. 28 ©2001 Key Curriculum Press Fathom Workshop Guide Tutorial 5: Testing a Hypothesis Charles Darwin believed that there were hereditary advantages in having two sexes in both the plant and animal kingdoms. Some time after he wrote Origin of Species, he performed an experiment in his garden at Down House in Kent. He raised two large beds of snapdragons, one from cross-pollinated seeds, the other from self-pollinated seeds. He observed, “To my surprise, the crossed plants when fully grown were plainly taller and more vigorous than the self-fertilized ones.” This led him to another, more time-consuming experiment in which he raised pairs of plants, one of each type in the same pot, and measured the differences in their heights. He had a rather small sample and was not sure that he could safely conclude that the mean of the differences was greater than zero. His data for these plants were used by statistical pioneer R. A. Fisher to illustrate the use of a t-test. In this tutorial, you’ll learn how to perform a t-test for the mean, generate simulated data from a normal distribution, and get a distribution for the t-statistic. Presenter: Allow 10 minutes for the first section. Looking at Darwin’s Data 1. Open the file Darwin in the Learning Guide Starters folder in the Sample Documents folder. This document contains the data for the experiment described above. The collection contains only fifteen cases with one attribute. 2. Make a case table, a histogram, and a summary table similar to those shown here. We see that most of the measurements are greater than zero, meaning that the crosspollinated plants grew bigger. But two of the measurements are less than zero. These two values might be due to some sort of error, but we keep them because we know that Darwin did. Formulating a Hypothesis Darwin’s theory—that cross-pollination produced bigger plants than selfpollination—predicts that, on the average, the difference between the two heights should be greater than zero. On the other hand, it might be that his fifteen pairs of plants have a mean difference as great as they do—21 eighths of an inch—merely by chance. You can write out these two hypotheses in Fathom, to be stored with your document. 3. From the shelf, drag a text object into the document. 4. Write the null hypothesis and the alternative hypothesis. At right you can see one way to phrase them. Deciding on a Test Statistic Not a statistician, Darwin decided to ask the advice of his cousin, Francis Galton, an eminent statistician, who told him that there was currently no good theory to deal with a small sample from a population whose standard deviation is not known. In fact, it was not until some years later when William Gosset, an employee of the Guinness Brewing Company and a student of Karl Pearson, developed a statistic and Fathom Workshop Guide ©2001 Key Curriculum Press 29 its distribution. Gosset published his result under the pseudonym Student and the statistic became known as Student’s t. When the null hypothesis is that the mean is zero, the t-statistic is just x s n where x is the observed mean, s is the standard deviation, and n is the number of observations. Let’s compute this statistic for Darwin’s data. 5. From the Analyze menu, choose Test Hypothesis or drag a test from the shelf (the balance icon). An empty test appears. 6. From the popup menu in the upper right, choose Test Mean. As shown at right, the Test Mean test assumes that you are going to type in summary statistics. The blue text is editable. This is very useful when you don’t have raw data. 7. Try editing the blue text. You can, for example, enter the summary statistics for Darwin’s data. Here are some things to notice. edit text cursor popup menu cursor edit formula cursor • As you hold the cursor over blue text, the cursor changes to show you to edit that field. • Numeric values are determined by a formula. If you click on the value, you get a popup menu as shown below. Choose the one available option and you get a formula editor. You can type a number or a formula here. • When you change something in one part of the test, it may affect other parts. For example, editing the <AttributeName> field in the first line also changes it in the hypothesis line and the last paragraph. In statistics terminology, we want this to be a one-tailed rather than a two-tailed test. 30 • In the hypothesis line, clicking on the is not equal to phrase brings up a popup menu from which you can choose one of three options. For Darwin’s experiment, we want the third option because his hypothesis is that the true mean difference is greater than zero. Notice that making this change alters the phrasing of the last line of the test as well. Stop. Presenter: Answer questions. Make sure everyone understands how to edit the test hypothesis object. Demonstrate using a slider for a parameter such as the alternative hypothesis mean. Allow 10 minutes for the next section. ©2001 Key Curriculum Press Fathom Workshop Guide Checking Assumptions The shortcut for Edit Formula is Ctrl-E (Win) a-E (Mac). The shortcut for rerandomizing is Ctrl-Y (Win) a-Y (Mac). Gosset’s work with the t-statistic relied on an assumption about the population from which measurements would be drawn, namely, that the values in the population are normally distributed. Should we be comfortable with this assumption for Darwin’s data? In the assumption’s favor is experience with height measurements of other living things, both plants and animals. These are usually normally distributed, and so are differences between heights. But we might worry, because the two negative values give a decidedly skewed appearance to the distribution. Fathom can help by allowing us to determine qualitatively whether this amount of skew is unusual or not. We’ll generate measurements randomly from a normal distribution and compare the results with the original data. 8. Make a new attribute in the collection. Call it SimHeight for simulated height. 9. Click once on the name of the new attribute in the case table to select it. 10. Choose Edit Formula from the Edit menu to bring up a formula editor. 11. Enter the formula shown here: randomNormal (mean(HeightDifferences), stdDev (HeightDifferences)). This will generate random values drawn from a normal distribution whose mean and standard deviation are the same as for Darwin’s data. We want to compare these simulated heights with the actual ones. 12. Create a new graph and drag the SimHeight attribute to its x-axis. Use the plot popup menu to make a histogram. One set of simulated data doesn’t tell the story. We need to look at a bunch. 13. With the graph selected, choose Rerandomize from the Analyze menu. Each time you rerandomize, you get a new set of 15 values from a population with the same mean and standard deviation as the original 15 measurements. The leftmost histogram below is of the original measurements. Next to it are three of the histograms you might get from the simulated heights. A bit of subjectivity is called for here. Does it appear that the original distribution is very unusual or does it fit in with the simulated distributions? Testing the Hypothesis Once we have decided that the assumption of normality is met, we can go on to determine whether the t-statistic for Darwin’s data is large enough to allow us to reject the null hypothesis. In a previous section, we typed the summary values into the test as though we didn’t have the raw data. But we are in the fortunate position of having the raw data, so we can ask Fathom to figure out all the statistics using those data. Fathom Workshop Guide ©2001 Key Curriculum Press 31 14. Drag the HeightDifferences attribute from the column header of the case table to the top panel of the test, where it says Attribute (continuous) <unassigned>. 15. If the hypothesis line does not already say is greater than, then choose it from the popup menu. The last paragraph of the test describes the results. If the null hypothesis were true and the experiment were performed repeatedly, the probability of getting a value for Student’s t this great or greater would be 0.025. This is a low P-value, so we can safely reject the null hypothesis and, with Darwin, pursue the theory that cross-pollination increases a plant’s height compared to selfpollination. Stop. Presenter: Answer questions. Demonstrate the non-verbose form of the hypothesis test. Demonstrate how to hide and how to iconify objects to free up screen space. Allow 10 minutes for the next section. Looking at the t-Distribution You may want to iconify some of the objects to free up some space. You’ll need the test and the graph of the original data. 32 It is helpful to be able to visualize the P-value as an area under a distribution. 16. With the test selected, choose Show Test Statistic Distribution from the Test menu. The curve shows the probability density for the t-statistic with 14 degrees of freedom. The shaded area shows the portion of the area under the curve to the right of the test statistic for Darwin’s data. We’ve set this up as a one-tailed test; we’re only interested in the mean difference being greater than zero. Other times we want to know if a given mean is different from zero. That would be a two-tailed test. You can change the test to a two-tailed test using the popup menu to change is greater than to is not equal to. If you do this, you will see a shaded region on both the positive and negative ends of the plot. The total area under the curve is one, so the area of the shaded portion corresponds to the P-value for a two-tailed test. Let’s investigate how the P-value depends on the test mean, currently set to zero. 17. Drag a slider from the shelf into the document. 18. Edit the name of the slider to TestMean. 19. Select the histogram of original data and choose Plot Value from the Graph menu. 20. In the formula editor, type the name of the slider (or choose it from the Global Values section of the list to the right of the keypad). You should see a vertical line in the histogram corresponding to the value of the slider. 21. Click on the 0 in the statement of the hypothesis in the test. Choose Change formula for value from the popup menu. 22. In the formula editor enter TestMean just as you did for the plotted value. ©2001 Key Curriculum Press Fathom Workshop Guide Now the value of the null hypothesis mean in the test, and the shaded area under the t-distribution, change to reflect the new hypothesis. 23. Drag the slider slowly and observe the changes that take place. For what value of the slider is half the area under the curve shaded? Why does the shaded area under the curve increase as the value of the slider increases? Ideas for Further Exploration • What effect do the negative values have on the results? Set TestMean to zero and drag one of the negative points to the left to make it even more negative. How far can you go before the P-value reaches 0.05? • Darwin had 15 observations to work with. Could he have gotten away with fewer? Set up a Test Mean in which you type values into all the parameters of the test except the number in sample to which you attach a slider, n, using the formula round(n). What is the smallest sample size (with the same mean and standard deviation) that results in a P-value of 0.05 or less? Stop. Presenter: Answer questions. If there is time for a demonstration, show how to define the tstatistic as a measure for the SimHeight attribute with a population mean of zero. Collect many of these measures and use the distribution of t-statistic values to show the relationship between the simulation and the analytically computed value. Fathom Workshop Guide ©2001 Key Curriculum Press 33 Fathom Demo Documents The tutorials in this guide introduce you to a small portion of what you can do with Fathom. Fathom comes with over 40 demo documents, and these provide a fairly quick way to get an idea of other capabilities and possibilities for using Fathom in the teaching of math and statistics. While the sample documents show off Fathom’s capabilities and are often useful, especially for demonstrations, the real fun (and much of the learning) comes from figuring out how to create such documents in Fathom. Here we list a few of these documents with brief notes to help you choose which ones to look at. Fathom Demos\Algebra\CompoundInterest: This is a slider-parameterized model of compound interest based on initial investment, interest rate, periods per year, and payment per period. It comes with questions for students to investigate, but it also can be used (without the text object) as a demonstration document. Fathom Demos\Algebra\ScreenSaver: Here we have a kind of parametric plot that makes interesting Lissajous figures when the sliders that define it are animated. Consider it as an inspiration for student projects. Fathom Demos\Data Display and Exploration\Computers in Schools: This sample doesn’t really show much of anything that Fathom can do, but it does provide some interesting data as a starting point for investigations about curve fitting. Fathom Demos\Data Display and Exploration\TexasCounties: Here you can see how it is possible to use an open collection to display meaningful information, in this case median values of homes in the counties of Texas. Fathom Demos\Distributions\Normal: The collection in this document has 100 numbers from a normal distribution whose mean and standard deviation are determined by sliders. A histogram with a normal distribution curve overlay and a normal quantile plot allow you to gain some insight into randomness and normal distributions. Fathom Demos\Simulations\BrownianMotion: This partially complete simulation provides a starting point for investigating Brownian motion. Fathom Workshop Guide ©2001 Key Curriculum Press 35 Sample Activities With Fathom comes Data in Depth: Exploring Mathematics with Fathom, by Tim Erickson, which contains well over 100 activities for using Fathom in the teaching of mathematics and statistics. We’ve chosen a few of these activities to help show how Fathom can illuminate concepts and pose worthwhile challenges. We’ve modified the pages somewhat to fit within this Workshop Guide. You can open the files required for the activities (those that require files) by navigating to the DataInDepth folder within the SampleDocuments folder in the Fathom folder. Tall Buildings Sonata (p. 38) is an example of a type of activity in Data in Depth called a Sonata for Data and Brain. A Sonata is longer than a problem or a worksheet, but much shorter than a project—maybe at most an hour or two of work. And while a Sonata lets students use a variety of methods to come to a conclusion, it provides structure by describing the task, identifying relevant attributes, and giving instructions on how to get the data. This simple Sonata deals with the problem of predicting the height of buildings from the number of stories. The Geography of U.S. Cars I (p. 40) is the first of three activities in which students study vehicles per person and miles per vehicle. Area and Perimeter (p. 43) asks the question, “What is the relationship between the area and the perimeter of randomly constructed rectangles?” Fathom makes the rectangle data for us in an elementary simulation; when we analyze it, we find that a curve—whose equation we find—forms the boundary of a cloud of points. Consumer Price Index (p. 46) looks at the CPI since its inception and computes inflation rates. This highlights the difference between proportion and absolute difference. Heating Water (p. 48) asks you to fit a line to rising water temperature. It looks linear, but making a residual plot shows previously unseen structure. This is the introduction to residual plots. It works especially well with chemistry, physics, and statistics. Random Walk—Inventing Spread (p. 51) involves simulating a random walk using Fathom and developing a measure of spread to go with its results. A New Birthday Problem (p. 55) turns the old chestnut around a little, asking how many people it takes until you get a match—you simulate this problem. Aunt Belinda (p. 57) is a Sonata that explores an inference problem based on a simple binomial situation. Fathom Workshop Guide ©2001 Key Curriculum Press 37 Tall Buildings Sonata You need Buildings You’ll show that you can • make a prediction about a numerical relationship • make scatter plots • use a movable line • develop a rule of thumb In this Sonata, you’ll study data about the tallest buildings in Minnesota and Florida. You’ll find the data in the file Buildings in the Exploring Mathematics with Fathom folder. These data come from the World Wide Web. The Question How is the height of a building related to the number of stories it has? Prediction The relevant attributes are height (which is the height of the building in feet) and stories (the number of stories). How do you predict the two quantities will be related? In particular, about how many feet do you think there are per story? Measurement Analyze the data using Fathom. Be sure to use a movable line. Describe briefly what you did: Comparison This is a good place to describe anything else you noticed or can explain. For example, is there any difference between Minnesota and Florida? Can you explain both the slope and the intercept of any line you used? 38 Use Fathom to compare your prediction to the data. Report what you found out. Refine your description of the relationship between height and number of stories. Develop a rule of thumb (a simple calculation) you can use to predict the height of a building from the number of stories. Describe that rule: ©2001 Key Curriculum Press Fathom Workshop Guide Tall Buildings Sonata—Teacher Notes Data in Depth has many exercises in what is called “sonata form.” The main idea in Sonatas is for students to predict or make conjectures. Insist that students commit to their predictions and write them down before they see the data. This is a great way to use class or homework time before you go to the lab. You can even assess the “prediction” work—as long as you assess it only in terms of its being sensible, not necessarily correct. Help students make simple mathematical models (i.e., all linear) but daring ones (i.e., imagining data they have not seen). Here are several levels of model, all acceptable depending on the students’ experience: Fathom Workshop Guide • Qualitative: the more stories, the higher the building. • Functional: the graph will be a straight line (it’s proportional). The slope represents the number of feet per story. • Functional with quantity: the above is true, and the slope is probably about 12 feet per story. • With context: the above is generally true, but there is definitely a yintercept, which might represent towers that tall buildings generally have. ©2001 Key Curriculum Press 39 The Geography of U.S. Cars I You need States–CarsNDrivers You’ll learn how to • make new attributes • define attributes by formula These data come from the U.S. Department of Transportation and are for 1992 except where noted. You can also scroll to the <new> column, but this technique puts the column closer to the other attributes you need in this activity. This activity is about driving in the various states of the U.S. The file is called States – CarsNDrivers found in the Exploring Mathematics with Fathom folder. For each state, it records these attributes: PopThou Population of the state (in thousands) DriversThou Number of licensed drivers (in thousands) VehThou Number of registered vehicles (in thousands) MilesMill Number of miles driven (in millions) RoadMiles Number of miles of roadway in the state (as of December 31, 1994) AreaSqMi Area of the state in square miles You will be computing and comparing various per-unit quantities. We’ll start in a very step-by-step way and quickly move to asking you the big questions, just letting you use Fathom to respond. 1. Open States – CarsNDrivers. First we’re going to ask what percentage of the population in each state drives. For that, we’ll divide the number of drivers by the population, then multiply by 100 to get the percent. But first. . . 2. Predict! Which states do you think will have the highest and lowest percentage of licensed drivers? Guess the three highest and the three lowest. 3. Onward! Click the heading for PopThou. The column is highlighted. 4. Choose New Attribute from the Data menu. When the attribute name box appears, name it PctDrivers for “Percent Drivers” and close the box. 5. Select the name of the new attribute. Choose Edit Formula from the Edit menu. The formula editor appears. 6. Enter 100 * DriversThou / PopThou. Close the editor by clicking OK. You should see numbers like the ones in the picture. 7. Figure out which states have the highest and the lowest percentages of licensed drivers. Report the three highest and the three lowest. 8. How did you figure out which were the highest and lowest? 9. Why do you suppose those states are the highest and lowest? Per Capita Vehicles The Latin phrase per capita means “per person.” 40 10. Use the same procedure to figure out which states have the largest and smallest numbers of vehicles per capita in the population. First predict which states will have the most and the fewest vehicles per capita. ©2001 Key Curriculum Press Fathom Workshop Guide 11. How many vehicles per person were there for the highest and lowest states? 12. Write the name of your new attribute and the formula that you used to calculate it. 13. Sketch the box plot of that attribute. 14. Again, propose an explanation for why those states are the ones that are highest and lowest. Miles per Vehicle 15. Do the same thing you did above, only this time compute the average number of miles each vehicle travels per year in each state. In the formula for your new attribute, use the attribute MilesMill, which is the total number of miles driven that year in millions of miles. Be careful! The number of vehicles, VehThou, is in thousands of vehicles. Include responses to the preceding five items (but for the new attribute). This is easiest if you make two dot plots or box plots, one for each of the last two new attributes. When you select states in one plot, they are highlighted in the other. Fathom Workshop Guide 16. Look at the states that had the highest values of vehicles per person. Where do they fall in miles per vehicle? Explain that. See if the same pattern holds true for the states with the smallest numbers of vehicles per person. ©2001 Key Curriculum Press 41 The Geography of U.S. Cars—Teacher Notes This activity is the first of three that appear in Data in Depth. All three are about cars and the 50 states (plus the District of Columbia). Students use ratios to compare the states, making many different per-unit quantities. What’s Important Here This is about proportion, but let’s highlight a few issues: • Remind students to see whether their answers make sense. Assess them through other assignments or by including sense-making in the rubric. This could easily be a sentence that begins, “0.42 cars per person makes sense because… ” followed by an explanation or calculation, or “5.7 cars per person is surprising because in the city hardly anyone has a car… ” • Any “per” quantity is an average. It represents the whole group (especially in comparisons), but individuals may differ greatly. Per capita income, for example, may not represent many people’s actual earnings. • Help students use what they know about the states. Encourage geographical speculation. For example, New England states are often clustered together in the attributes we will study. Why? California, known for its “car culture,” is unexceptional in these data except for being large. Why is that? The Geography of U.S. Cars I Here are some suggestions for discussion topics: 1. Ask students how they found out which state had the maximum or minimum of some quantity. Have them explain their methods (e.g., looking at a point plot or sorting the table) so the whole class can build its repertoire of skills. 2. Ask students if they know how far their family cars (if any) are driven in a year. Compare each distance to your state’s milesPerVeh. Explain any discrepancies you come across. (This technique, for example, ignores commercial travel.) 42 ©2001 Key Curriculum Press Fathom Workshop Guide Area and Perimeter: A Study in Limits You need to know how to • make new attributes • write formulas for attributes You’ll learn How are areas and perimeters of rectangles related? We all know the formulas—area = length*width, perimeter = 2(length + width)—but if you look at them from a data perspective, what do you get? In this activity, we’ll use Fathom to generate the measurements for rectangles randomly and explore the relationships. You can also generate rectangles haphazardly using any other technique you like (asking people to make them, using The Geometer’s Sketchpad®, etc.) and use those data. 1. Open a new Fathom document and create a new case table. • how to plot functions • how to make sliders • 2. Make two attributes—length and width. one way to use random numbers 3. With the case table selected, choose New Cases from the Data menu. Tell Fathom to make 200 cases. They appear, though their attributes have no values. Making Random Rectangles We won’t actually make the rectangles here; we’ll just make the measurements. 4. Select length, and choose Edit Formula from the Edit menu to bring up the formula editor. 5. In that box, enter the formula random(10). Close the box by clicking OK. Now length is filled with random numbers ranging from 0 to 10. 6. Use that same formula to define values for width. Now you have random numbers for both attributes. 7. Make a scatter plot of these two attributes; it ought to look random. Calculating Area and Perimeter 8. Make two new attributes, area and perimeter. Make formulas to determine these; the formulas should depend only on length and width. Copy the first two or three cases from your table here: 9. Make a scatter plot with perimeter on the horizontal axis and area on the vertical axis. Sketch the graph at right; label and scale the axes. 10. If everything went well, there should be no points in the upper-left part of the graph—a region bounded by a curve. Explain why there are no points in that region. Fathom Workshop Guide ©2001 Key Curriculum Press 43 The Boundary Now your goal is to figure out the equation for that curved upper boundary of the cloud of points. There are two strategies: the theoretical one and the empirical one. In the theoretical strategy, you figure it out from the equations and just plot it. These directions will concentrate on the empirical. The boundary curves upward. It might be exponential, or it might be some polynomial (such as a quadratic). Since finding areas can involve squaring, we’ll try a quadratic first, area = perimeter2. 11. Select the graph and choose Plot Function from the Graph menu. The formula editor appears. 12. Enter perimeter^2 into the box and close it. The curve will appear, as in the illustration. (Remember that you don’t enter the left-hand side of a function in Fathom.) Unfortunately, the curve is too steep. You can alter the formula on your own, but the following is fun… 13. We’ll make a slider. Choose Slider from the Insert menu. A slider appears; it looks like a number line with a pointer on it. There’s also some text that gives the name of the variable and its value, for example V1 = 5. Change the name of V1 to denom. 14. Edit the plotted function (double-click the equation at the bottom of the graph). Change it to perimeter 2 denom . Now you can control the value of denom by dragging the slider’s thumb. As you do so, the function changes. You will have to change the scale of the slider. Do it the way you change an axis—by dragging the numbers with the hand, or Ctrl-Shift-clicking (Windows) Option-Shift-clicking (Mac) to zoom out. 15. Change the value of denom until the curve accurately forms the boundary. What value of denom did you get? (It should be greater than 10.) 16. Study the points near the boundary. What is the shape of the rectangles near the boundary curve? That is, what do they have in common? 17. Explain why your answer to the preceding question makes sense. That is, why do all those rectangles have that property? Going Further 1. Show, algebraically, what the curve’s equation must be, and show that it’s the same as the one you got by looking at the data. 2. Depending on how you made the rectangles, there may be another empty zone in the lower-right corner of the scatter plot. Why are there no rectangles there? What is the equation of that border? Why? 44 ©2001 Key Curriculum Press Fathom Workshop Guide Area and Perimeter: A Study in Limits—Teacher Notes Structured computer activity Data generated in simulation Intermediate, medium features As described, students simulate creating random rectangles. You may also do this very effectively (especially with less experienced students) if you have them create genuine rectangles, cut them out, and measure them. Make sure, though, that students have experience with parabolas if you do the second page. If not, simply exploring the plot is worthwhile. What’s Important Here • Using random numbers for simulation. • Using a scatter plot of data to illuminate geometrical constraints. Discussion Questions Topics • area and perimeter • parabolas • visualization • fitting nonlinear functions using variable parameters • What do rectangles in the lower left look like? (Small.) • How about rectangles in the lower right? (Long and skinny.) • What are the shapes of the rectangles along the curved border? (They’re squares.) • Why do you suppose so many of the points seem to line up close to the boundary? (Large-area rectangles must be close to being squares or they’d have a dimension larger than 10. For smaller rectangles, you can have a surprisingly large length / width before the rectangle moves a long way from the border. But most important (and subtlest), the distribution of the quotient of two uniform random variables, larger/smaller, is greatly skewed toward 1. So you do get more squarish figures.) What’s the Formula? Here’s an elegant answer to Going Further 1, supplied by a teacher: The curved boundary is made of the rectangles with the maximum area for a given perimeter. Those are squares. So what’s the area of a square with perimeter P? 2 2 Each side is P4 , so the area is P16 . Therefore, the equation is A = P16 . There are other perfectly good approaches. What about the other boundary? That’s created because we’re limited to 10 in length and width. So, for some perimeters there is a minimum area. For example, with a perimeter of 30, what is the skinniest (minimum area) rectangle we can make? You can only spend a maximum of 20 on two lengths, and you’ll have 10 left over for the widths. You wind up with a rectangle of area 50. The width of such a rectangle (taking length = 10 as the long side) is, width = (perimeter −20 ) 2 − 20) so area min = 10 × (perimeter = 5( perimeter) − 100 2 which is the line we seek for perimeter > 20. Fathom Workshop Guide ©2001 Key Curriculum Press 45 Consumer Price Index You’ll learn how to • make new attributes • use the formula editor • compute differences in Fathom Scientists often use the Greek letter delta to mean “change of.” Note: You get a bogus result in the first year because there’s no previous year. Remember to ignore that first point (we’ll deal with it more explicitly in a future activity). In general, things get more expensive over time. How much more expensive? One way to tell is to look at a statistic called the Consumer Price Index (CPI). The CPI in these data is set so that 1967 = 100. This means that when the CPI is 200, things generally cost twice as much as they did in 1967. 1. The file CPI contains annual data from 1913 to 1997. Open it. 2. Make a scatter plot showing CPI as a function of year. Your graph should look like the one in the illustration. As you can see, things just keep getting more expensive. 3. People worry a lot about inflation. One way to define inflation is that it’s the amount by which the CPI is increasing. Look at the graph: In what year does it look as if the CPI was increasing most steeply? 4. Let’s figure out just how steep, using Fathom. Make a new attribute by clicking in the column header marked <new>. Call it deltaCPI. 5. Now we make a formula that computes the change of CPI from the preceding year to this one. Open the formula editor by selecting your new attribute name and choosing Edit Formula from the Edit menu. 6. Enter CPI – prev(CPI) (prev stands for “previous”). Click OK to close the formula editor. The differences should appear in your deltaCPI column. 7. Make a new graph that shows deltaCPI as a function of year. It should look like the graph shown here. Now, in what year did CPI increase most steeply? Explain how you used the graph to tell. Some people would say that a big increase doesn’t matter—only a big percentage increase does. After all, a CPI increase of 20 when the CPI is 300 is not as important as when the CPI is 50. We’ll investigate this. 8. Make a new attribute, percentChange, that calculates the percent change of the CPI from the preceding year. Graph percentChange as a function of year. Sketch and label that graph below, and write the formula you used for percentChange. 9. What’s a typical percentChange in the last decade of data? In what year, since the beginning, was the percent change the greatest? 46 ©2001 Key Curriculum Press Fathom Workshop Guide Consumer Price Index—Teacher Notes Structured computer activity Data from file Intermediate, basic features This activity is about computing rates. Students compute a rate simply by taking the difference between one case and its predecessor. In a later activity, students must also divide by the difference in time, just as when they take derivatives in calculus. What’s Important Here Topics • rates • proportional rates Students need to be able to • open a file • make a scatter plot Materials • CPI Consumer Price Index data are from http://stats.bls.gov/cpihome.h tm. We have intentionally chosen this set of data to minimize the issue of initial conditions. • Students learn how to use prev( ) to make a formula look at the previous row. • Subtracting the previous value from the current one in order to calculate change is a cornerstone of numerical analysis. Time spent checking for understanding will be a good investment. • Students begin to see that the first case in a recursive or inductive definition may have to be treated specially. • Students see the relationship between a graph of the data and a graph of the change in the data. • Students see and calculate the difference between change as an absolute quantity and change as a relative (percentage) quantity. Discussion Questions 1. In the graph of deltaCPI, why is the first point so strange? 2. Given the data we have up to 1998, what do you think the CPI is now? 3. Ask if anyone can remember the price of something in 1997 or earlier. Then ask students to use the CPI and extrapolate to the present to predict the price now. Does the price match? If the prediction is high or low, what does that mean? 4. Sometimes people use a CPI based on a different year. What formula would you use to convert the 1967 = 100 CPI to one in which 1990 = 100? 5. Why would you want to look at the percentage increase in the CPI rather than the absolute increase? 6. What historical features can you see on the graphs? Can you explain them? (The Great Depression, World War II, the 1980s recession) 7. If there’s a peak in the “delta” graph, what does that mean on the “straight” graph? (It’s at a high slope.) If there’s a peak in the “straight” graph, what does that mean in the “delta” one? (It’s descending through zero.) Tech Note In this activity, we ask students to identify points on a scatter plot; for example, “In which year did the CPI increase the most?” Students can read the numbers off the axis or find them in the table, but they can also simply point at the point. The cursor changes to a left arrow and the coordinates of the point appear in the status area in the bottom left of the Fathom window. Fathom Workshop Guide ©2001 Key Curriculum Press 47 Heating Water Sketch your prediction here: In this activity, we’ll explore residuals, which are the amounts “left over” when you model data with a function. The author heated a pan of water, recording how many minutes and seconds had passed each time the temperature increased by 5 degrees Celsius. Heating has the data for temperatures between 35 and 90 degrees Celsius, which span the times between 40 seconds and 205 seconds after the heat was turned on. 1. Before you open the file, sketch what you think the graph will look like. That is, how does temperature increase with time for a pan of water on the stove? Is it linear? If not, how does it curve? Make a labeled and scaled sketch at left. 2. Now open the file. Make a new graph. Put time on the horizontal axis and temp on the vertical. Describe how the actual graph differs from your prediction. 3. Choose Least-Squares Line from the Graph menu. A least-squares fit to the data appears. Write the equation for the least-squares line below and tell what the slope means (e.g., what are its units?). The line should look as if it goes through all the points, and r2 should be very close to 1.00. We might make a mistake here and say, “Wow—temperature increases linearly. The fit is nearly perfect.” That is, we might think we’re finished. It’s true that the line explains a lot of the temperature change, but two things should bother you: Temp. How Far Off 35 40 45 50 55 60 65 70 75 80 85 90 (You don’t have to do all the points—just enough to see a pattern.) 48 • It’s hard to believe that it really is linear, that is, that the stove can heat hot water as fast as it can heat cold water. • It’s hard to believe that our measurements of times and temperatures are so accurate that the points fall on the line exactly. So we’ll look at the graph more closely. 4. For several of the points, zoom in to the point and record, in the table we’ve provided, approximately how far it is from the line (in degrees Celsius). Use positive numbers for above the line, negative for below. These numbers are called the residuals. 5. Describe any pattern you see in the residuals. 6. Let’s make this process easier. Make a new attribute called predictedTemp. 7. Give predictedTemp a formula—the formula of the least-squares line. Now the values of predictedTemp are the temperatures on the line for each of the times in the collection. 8. Make another new attribute, residuals. Give it the formula temp – predictedTemp. Explain why these residuals are the same as the “distance in degrees Celsius” you reported in Step 4 (especially, is the sign right?). ©2001 Key Curriculum Press Fathom Workshop Guide 9. Make a new separate graph and plot residuals as a function of time. Sketch that graph (labeled and scaled) at left. The pattern you see in the graph should be the same as the one you saw in the numbers in Step 5. Now, what does that pattern (the curve) mean in terms of the situation of heating water? That is, it’s not linear. In what way is it not linear, and why do you suppose that’s true? A Great Shortcut 10. Click on the original graph (temp as a function of time) to select it. Choose Make Residual Plot from the Graph menu. Describe what appears. 11. Drag a point on the graph. Describe what happens to the line and to the residual plot (undo the drag afterwards so that your data are safe). Going Further 1. You can also use the residual plot to assess the accuracy of the measurements. The thermometer was scaled with lines every degree (about 1.5 mm apart). One possibility is that the line really is a good model and the residuals are just due to errors in measuring temperature. Is that very likely? Base your response on the size of the residuals and on the pattern of the residuals. 2. The author claims that he watched the thermometer and wrote down the time when the temperature passed 30, 35, 40, . . . degrees. Do you think he had enough time to do this (or did he invent the data)? If he was honest, how accurate do you think his time measurements were? 3. Suppose the temperature measurements were completely correct and accurate. What measurement blunders (in size and pattern) would the author have had to make in the time measurements to produce the pattern in the residuals? 4. Suppose the author, seeing the thermometer pass a temperature where he needed to take a reading, did the same thing for every measurement: He turned from the stove, wrote down the temperature, looked at his watch, and wrote down the time. There is a delay between the time when the temperature was correct and the time when he looked at his watch. Describe the effect this might have on the fitted line and on the pattern of residuals. Fathom Workshop Guide ©2001 Key Curriculum Press 49 Heating Water—Teacher Notes Three structured computer activities Data from files Intermediate, basic features This activity is the second of three that introduce students to residuals and residual plots. Our approach in Residuals Playground and Heating Water is to have students calculate residuals directly—by having Fathom calculate from the given function—before introducing the Make Residual Plot command. The third activity, Radiosonde Residuals, is suitable either as practice to follow the second or as a quick introduction for more experienced students. It’s also suitable for homework or open lab time. What’s Important Here Topics • residuals Students need to be able to • make scatter plots • make new attributes • use formulas to define attributes We use residuals for many purposes in data analysis. These activities focus on using residuals to see patterns that are invisible in a “regular” graph. • You can see deviations (from the curve) or a pattern (e.g., nonlinearity) better because the vertical scale of the residual plot is so different from the one in the main plot. Making the residual plot magnifies the data with respect to the model (the line). • If residuals don’t look random, we should question the model. In Heating Water, for example, the linear model is OK for some purposes (interpolating temperatures to within a degree, say), but we can see that the phenomenon is really not linear. Discussion Questions 1. What are the units on the vertical scale of the residual plot? (degrees Celsius) 2. Compare the main plot to the residual plot. Why are the scales so different? 3. (Especially in Residuals Playground.) What would happen if we defined a residual as (predicted – observed) instead of the other way around? (For one thing, when we moved the data one way, residuals would go the other. It would be confusing.) One way to think of residual plots is that we turn the model—the line or curve— into a new axis. A graph from Heating Water (page 48) with a residual plot. Tech Notes Note: The student pages do not tell students how to zoom in. We describe one way here; you can also encourage students to look in Help. 50 • Students need to zoom in for Heating Water. To do so, hold down Ctrl (Win) Option (Mac) so that the cursor changes to a magnifying glass and drag a rectangle around the region you want to investigate. The rectangle expands to fill the whole graph. Rechoosing Scatter Plot from the popup menu in the graph rescales the plot. • You can do Heating Water with your own data (especially if you’re teaching chemistry, for example). If the burner is hot, you get a good, linear-looking regime in the middle. At the beginning, it flattens out as the container heats; at the end, it flattens as the water gets closer to boiling. If you include these “flat” areas, (a) students should definitely use the movable line—not least-squares—and fit it to the linear part of the data, and (b) students will need to expand the vertical scale of the residual plot in order to see the curve in the linear part of the temperature graph. ©2001 Key Curriculum Press Fathom Workshop Guide Random Walk—Inventing Spread You’ll learn how to • build a simulation • use the inspector • define measures • collect measures You can’t save your work if you are running the Fathom demo. Fathom Workshop Guide A random walk is a process in which, for every step, you flip a coin. Heads, you step east. Tails, you step west. On the average, you’ll wind up where you started—or will you? It depends on what you mean by average. This activity will help you explore the random-walk situation. Building the Simulation We’ll begin by building a simulation of a random walk in which we’ll take 10 random steps from zero. Each step will be +1 or –1. 1. Open a new Fathom document. 2. Create a new case table, either by dragging one off the shelf or by choosing Case Table from the Insert menu. 3. Make a new attribute, distance. To do this, click in the table header marked <new>. Type distance in place of <new> and press Enter. This attribute, distance, will record our distance from the start during our random walk. 4. Make 10 cases (these will be the steps in the walk). Choose New Cases from the Data menu, enter 10, and click on OK to close the box. We now have 10 cases. We need a formula to compute the values for distance. 5. Select distance and choose Edit Formula from the Edit menu. The formula editor appears. 6. Enter this formula: prev(distance) + randomPick( –1, 1). That is, our distance from the beginning at every step is the preceding distance plus either +1 or –1. Click OK to close the formula editor. Numbers appear in the table, representing our distance from the beginning after each step. 7. Press Ctrl-Y (Win) a-Y (Mac) to rerandomize. Do this a few times. Describe what happens. 8. Let’s make a graph. Make a new graph by dragging one off the shelf or by choosing Graph from the Insert menu. 9. Drag the attribute distance from the top of the column in the case table to the vertical axis of the graph. You’ll see a graph like the upper one shown here. 10. The dots show us what values distance took during our walk. Choose Line Plot from the popup menu in the graph. You’ll see a graph like the lower one shown here. 11. Rerandomize again; you’ll see the graph wiggle. (Note: The graph may wiggle out of its original bounds. To change the bounds, drag the numbers on the vertical axis. The cursor-hand changes depending on where on the axis you drag. That tells you what will happen to the axis.) 12. Let’s rename our collection. Double-click its name (Collection 1) and rename it Steps. 13. Save your work. Call the file RandomWalk. ©2001 Key Curriculum Press 51 Finding the Averages Each random walk is different. We want to know, on the average, how far you are from the place where you started. 14. Rerandomize 20 times. Each time, record the position (positive or negative) after the tenth step. 15. Calculate (by hand or with a calculator) the average of those numbers. What do you get? 16. What do you expect to get in the long run? Why? 17. This time, calculate the average of the absolute values of the positions. What do you get? 18. Explain the difference between the meanings of the two averages above. Why would you ever use one? Why the other? Collecting Measures It should not be obvious what to expect in the long run from the absolute values. But if we do more walks, we should get a better idea. Fathom will help us record the final positions and calculate the means. To do that, we need to find the last value for distance in the simulation. We’ll call that final. Then we need to collect a set of finals so we can average them. Here’s how: 19. Double-click on the collection icon (box of balls) to bring up the inspector. 20. Click on the Measures tab. We’re heading for something like the picture shown here. 21. Make a new measure by clicking on the <new> label and typing final. Press Enter when you’re finished typing the name. 22. Double-click to the right of final in the Formula column to bring up the formula editor. 23. Enter this formula: last(distance). The function last( ) returns the last value in the list. 24. Leaving the inspector open, press Ctrl-Y (Win) a-Y (Mac) to see the value of final change. Be sure that it does before closing the inspector. 25. Now we’ll get Fathom to collect the finals. Select the collection (the box of balls). Choose Collect Measures from the Analyze menu. A new collection appears (called Measures from Steps; it may be under the inspector). 52 ©2001 Key Curriculum Press Fathom Workshop Guide 26. With the new (measures) collection selected (it should already be selected), choose Case Table from the Insert menu. Now you can see the various final positions. This list should resemble the one you made by hand. 27. Make a new graph and drag final to the horizontal axis. Change the graph to a histogram by choosing Histogram from the popup menu in the graph. 28. We need to see the mean. With the graph selected, choose Plot Value from the Graph menu. The formula editor appears. 29. Enter mean() and close the editor with OK. The value and a line appear on the graph. 30. Sketch the graph at left. Explain below what the graph represents in terms of the walk. That is, where did those numbers come from? Be precise and concise. 31. We also want to see the mean of the absolute values. Plot another value with the formula mean( | final | ). (You can use the on-screen keypad to get the absolute value bars.) Sketch its position on the graph you drew for Step 30. Explain why this number is greater than the previous one. 32. Now we want to do it again. Double-click on the measures collection to bring up its inspector. It has a Collect Measures pane. Select it; you’ll see something like the one shown here. 33. Click Collect More Measures to collect a new set of 5. If you position the inspector correctly, you can see both graphs update. 34. Increase the number of measures to 40. Collect measures again. What do you think are reasonable values for the “eventual” mean of final and the mean of the absolute value of final? Going Further 1. If you’ve studied the binomial theorem and expected values, compute the theoretical mean of the absolute value and compare it to the result of your experiment. 2. How will the mean of the absolute value change if we collect 400 walks, say, instead of 40? Explain why. 3. How will the mean change if our walk is 40 steps instead of 10? Explain why and do the experiment. Compare your results to your prediction. Fathom Workshop Guide ©2001 Key Curriculum Press 53 Random Walk—Inventing Spread—Teacher Notes Structured computer activity Data generated in simulation Starter, complex features The random walk is a basic situation for the theoretical study of spread. Under the hood, it’s a binomial problem—the populations at the various positions after n steps follow the distribution in the nth row of Pascal’s triangle. With enough steps, the distribution approaches normality, and (of vital importance) the size of the standard deviation increases proportional to the square root of n. Here we ask a less sophisticated question: What’s the average distance you’ve gone after a random walk of 10 steps? What’s Important Here Topics • random walks • spread Materials • RandomWalk (optional) • This activity gives students some experience with which to underpin the random-walk situation. • What we mean by average (e.g., whether we take the absolute value) varies with context. • The activity develops experiences that will help students understand the theoretical results. Discussion Questions 1. What’s the horizontal axis in a line plot? 2. What does the histogram of the finals really mean? Explain the graph to someone who hasn’t done the activity. 3. In general, do you expect someone who has taken more random steps to be farther from the start? Why? 4. In general, do you expect someone who has taken twice as many random steps to be twice as far from the start? Why? A Big Fathom Idea This activity introduces derived collections. We create what we call a measures collection—a collection whose job is to collect summary values from another collection. Those values are statistics—measures of that “source” collection. In the case of our random walk, they’re the final position of the walk. Here’s a table that compares them. 54 Random Walk (Source) Collection Measures Collection Each case is a step. Each case represents one random walk. The whole collection is one random walk—a collection of steps. The collection summarizes many random walks (5 by default). The final position is a single number (a statistic, a measure) that summarizes the whole walk. Each case contains the final position of one walk, so the collection has many “final positions.” You can’t calculate the mean distance from the origin here because this collection is only one example. You can calculate the mean distance from the origin in this collection—by averaging the final positions. ©2001 Key Curriculum Press Fathom Workshop Guide A New Birthday Problem Here’s a new version of an old problem. Suppose people come into a room one at a time. Whenever a new person comes into the room, he or she announces his or her birthday. If it matches that of anyone else in the room, the door is locked. No one else is let in. On the average, how many people will get into the room before this happens? What’s the distribution of that number? Conjecture and Design First, what do you think? About how many people will come in before there’s a birthday match? Design a simulation that will answer the question. This answer should include not only the average number of people but also the distribution of that number if the task were repeated many times. Briefly describe how you’ll do it. What collections will you need? Any special formulas? Any things you don’t know that you need to find out? Construction Build your simulation in Fathom and run it. Report your results below: Sketch the graph of the distribution and give the average number of people you got. Comparison and Reflection Do your results seem reasonable? How did they compare with your conjecture? Did your simulation work as planned? Describe any modifications you had to make. Fathom Workshop Guide ©2001 Key Curriculum Press 55 A New Birthday Problem—Teacher Notes Data from simulation Independent, complex features 56 This problem is a variation on the old one that asks, “What’s the probability that out of 30 people in a room, two will have the same birthday?” The surprising answer is that the probability is better than .5. Here we turn it around a little and ask how many people it takes until you get a match—and simulate that. If students have trouble, suggest that they start with a collection with one case with one attribute, birthday, with a formula of randomInteger(1,365). They should sample from that, using an until formula in the sample pane. ©2001 Key Curriculum Press Fathom Workshop Guide Aunt Belinda Aunt Belinda claims to have magnetized her kitchen table. She says that it makes nickels come up heads when she flips them. “Magnets don’t attract nickels,” you say. “Ordinary magnets don’t,” she admits, “but mine is special.” You supply her with 20 nickels. She sits at her table and throws the coins into the air. When they come down, 16 are heads. Of course, you ask her to do it again. Of course, she refuses. “The energy fields in my table are exhausted,” she says. “Besides, one proof is enough.” What do you think? Conjecture and Design Without any calculations, what do you think is the chance of throwing 16 heads in 20 throws? Do you think a result of 16 out of 20 is convincing evidence? If so, evidence of what? Now design an experiment you can do with Fathom to compute experimentally the probability of getting 16 or more heads out of 20 throws. Describe what you will do before you set hand to mouse. Measurement If you happen to know about the randomBinomial() function, please don’t use it except to check your work. Build your simulation and run it. Describe below any changes you made in your experiment design above. Sketch the histogram (or whatever display you use) that finally gives your results. Comparison and Conclusion How did the actual probability compare to what you thought it would be? Write what you think you can conclude from your experiment with Aunt Belinda. Be as clear as possible. Fathom Workshop Guide ©2001 Key Curriculum Press 57 Aunt Belinda—Teacher Notes The Aunt Belinda problem is one of a suite of four called Inference Sonatas and Other Challenges. We present the teacher notes for all of them even though only one problem’s student pages are included in this guide. Many computer investigations Data from everywhere Independent, complex features Aunt Belinda This Sonata is an inference problem based on a simple binomial situation. One solution is to make a collection with 20 cases, each a coin with an attribute face controlled by a formula such as randomPick("heads", "tails"). Then make a measure, nHeads, with a formula such as count(face = "heads"). Now collect measures from that collection. The distribution of nHeads in the Measures collection will show that a result of 16 heads out of 20 is unusual. Students will vary in what they do with that result. Hot Hoops We invite students to explore the phenomenon of being “hot” or “cold” in sports. Do people get “hot hands,” or are we just looking for patterns in randomness? The key is to design an experiment in which there is an identifiable null hypothesis (there are no streaks) and you can construct a statistic to measure streakiness. The suggestion on the student sheet leads to a 2-by-2 array that you can test for independence using chi-square or any equivalent statistic. That is, the null hypothesis is that there is no relationship between the outcome of the preceding shot and this one. Another alternative is to look at streak length itself. The function runLength( ) will help you investigate streaks in your data; for example, max(runLength(shot)) will give the length of the longest string of cases in which shot has the same value. If the data are streaky, this should be large. Or count(runLength(shot) = 1) will give the number of streaks. In streaky data, this number is small. To simulate the null hypothesis, scramble your data so that order is random and build up a distribution of your statistic. Beyond Constructivist Dice These are two pages of problem prompts based on the activity on the Constructivist Dice activity. More Inference Challenges Here are more problems that apply other inferential techniques from this chapter. Some of them use the census microdata files you may have used for activities in the first chapter of this book. If you have not, open the Census Files folder in the Samples Documents folder to choose a file with data from your area or one you’re interested in. 58 ©2001 Key Curriculum Press Fathom Workshop Guide Fathom Workshop Guide ©2001 Key Curriculum Press 59