Data Analysis and Assessment
By Katie, Jean & Curtis

Data and Statistics Project Standards (9-12)

Minnesota State Standards:

Standard: Explain the uses of data and statistical thinking to draw inferences, make predictions and justify conclusions.

9.4.2.1 Evaluate reports based on data published in the media by identifying the source of the data, the design of the study, and the way the data are analyzed and displayed. Show how graphs and data can be distorted to support different points of view. Know how to use spreadsheet tables and graphs or graphing technology to recognize and analyze distortions in data displays.
For example: Shifting data on the vertical axis can make relative changes appear deceptively large.

9.4.2.2 Identify and explain misleading uses of data; recognize when arguments based on data confuse correlation and causation.

9.4.2.3 Explain the impact of sampling methods, bias and the phrasing of questions asked during data collection.

Standard: Display and analyze data; use various measures associated with data to draw conclusions, identify trends and describe relationships.

9.4.1.1 Describe a data set using data displays, such as box-and-whisker plots; describe and compare data sets using summary statistics, including measures of center, location and spread. Measures of center and location include mean, median, quartile and percentile. Measures of spread include standard deviation, range and inter-quartile range. Know how to use calculators, spreadsheets or other technology to display data and calculate summary statistics.

9.4.1.2 Analyze the effects on summary statistics of changes in data sets.
For example: Understand how inserting or deleting a data point may affect the mean and standard deviation.
Another example: Understand how the median and interquartile range are affected when the entire data set is transformed by adding a constant to each data value or multiplying each data value by a constant.
9.4.1.3 Use scatterplots to analyze patterns and describe relationships between two variables. Using technology, determine regression lines (line of best fit) and correlation coefficients; use regression lines to make predictions and correlation coefficients to assess the reliability of those predictions.

9.4.1.4 Use the mean and standard deviation of a data set to fit it to a normal distribution (bell-shaped curve) and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets and tables to estimate areas under the normal curve.
For example: After performing several measurements of some attribute of an irregular physical object, it is appropriate to fit the data to a normal distribution and draw conclusions about measurement error.
Another example: When data involving two very different populations is combined, the resulting histogram may show two distinct peaks, and fitting the data to a normal distribution is not appropriate.

Curtis Jendro
Katie Garrity
Jean Benner

Unit Plan (9-12 Mathematics)

Topic and Activity Description

Days 1-2
• Students will work in small groups to decide how to report the typical backpack weight using the data provided in the Discovering Advanced Algebra textbook (example A on page 77). Students will report their findings to the class. Together, the class will review measures of central tendency.
• Direct instruction: using TI-83, TI-83 Plus, or TI-84 calculators to enter data into lists and calculate one-variable statistics.
• Activity: Making the Data. In small groups, students will create data sets that have given statistics.
• Students will fill out surveys about themselves to be used in class later. (As students fill out surveys and collect data, facilitate a discussion on consistency and accuracy. What should we round to? What units should be used? Is it okay to give more than one answer?)

Days 3-5
• Topic: stem-and-leaf plots and histograms. Raisin Activity: without looking, students predict how many raisins will be in a box. The class will compile data about the guesses and actual results and construct a stem-and-leaf plot and a histogram.
• Topic: box-and-whisker plots, interquartile range, range, percentile rank. Activity: Students will record how long they can balance on each foot with eyes closed. Data will be compiled by gender and by foot (e.g. male left foot, female right foot) so that students can make box-and-whisker plots and make comparisons.
• Guided exploration: Measures of Spread worksheet. Using an example about test scores, students will discover a need for a new statistic that describes the variability in data. The lesson develops the formula for standard deviation.
• Direct instruction: The Normal Curve

Days 6-7
• Batteries activity: Students use data about batteries for graphing calculators to decide which product to buy.
• Creating and interpreting scatter plots (including Challenger data and calculator steps). Students will create scatter plots using data collected from class survey.

Days 8-19
• Direct instruction: randomness, correlation vs. causation, regression lines. Students will use data collected from class survey to explore relationships and determine regression lines.
• Marble Rolling Activity
• Data analysis: murder data. Students will draw conclusions from a controversial set of data and present their findings to the class. (Optional: collaborate with social studies or science department for a different topic.)
• Discussion of bias and distortion in data displays.
• Nickel-flicking activity
• In small groups, students will analyze data to decide which baseball player is the best home run hitter. Then they will rank the remaining players, giving reasons for their choices.
• Data Trial introduction and topic brainstorming.
• Data Trial data collection and analysis.
• Data Trial data collection and analysis.
• Develop rubric as a class for assessment.
• Trial #1 presents their cases to the jury.
• Trial #2 presents their cases to the jury.
• Class discussion of verdicts, findings, distortions, bias, data displays, accuracy, and conclusions.
• Posttest

Materials and Handouts (in the order listed in the plan)
• Example A, page 77 of Discovering Advanced Algebra: An Investigative Approach by Murdock, Kamischke, and Kamischke (Key Curriculum Press 2004)
• Data and Statistics Calculator Instructions handout
• Graphing calculators
• Making the Data Activity (Navigating through Data Analysis in Grades 6-8, NCTM 2003)
• Student Data Survey
• Raisin Activity, http://score.kings.k12.ca.us/lessons/RaisinCane.html, by Rob Roy
• "Balancing Act," Data: Kids, Cats, And Ads (Investigations in Number, Data, and Space) (Paperback) by Andee Rubin, Dale Seymour Publications, Teacher edition (December 31, 1998)
• Graph paper
• Graphing calculators
• Measures of Spread worksheet
• The Normal Curve worksheet
• Graphing calculators
• "Batteries," Navigating through Data Analysis in Grades 6-8 by George W. Bright, Wallece Brewer, Kay McClain, and Edward S. Mooney (published 3/25/2003)
• Data and Statistics Calculator Instructions handout
• Graphing calculators
• The Challenger Space Shuttle: "Risk analysis of the space shuttle: Pre-Challenger prediction of failure," Journal of the American Statistical Association, Vol. 84, pages 945-957, by S.R. Dalal, E.B. Fowlkes, and B. Hoadley
• Data and Statistics Calculator Instructions handout
• Graphing calculators
• "The Marbleous Rolls" by Arthur Wiebe, AIMS Magazine 1993 (No. 1, pp. 42-45), AIMS Education Foundation
• Marbelous Form handout
• Murder Data
• Murder Data explanations, http://faculty.bemidjistate.edu/dwebb/math5962/math5962.htm
• "Flick the Nick," Addison-Wesley Publishing Company, Inc./Published by Dale Seymour Publications
• Home Run Hitter Activity
• Data Trial (six entries)
• Pre/Post Test

Name
Sex
Age in months
Foot length cm
Height cm
Forearm length cm
Number of pets
Miles to school one way
# of hours spent on homework per week
Shoe size
Number of siblings
Left foot time
Right foot time
Number of books in locker at this time
Favorite color m&m
Number of hours spent watching TV per week
Cumulative grade point average
Cost of last haircut

Name: _____________________

How Many Raisins Are in Your Box?

Guess how many raisins are in the snack-sized box: _________ (guess)
Open the lid of the box and make another guess: ___________ (estimate)
Put the actual number of raisins in the box here: ___________ (actual)

Stem and Leaf:

Histogram:

Mean: __________
Median: __________
Mode: __________

Note some observations here:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

Name:
Hour:

Measures of Spread

1.
Two students' test scores for the semester are shown below.

Curtis: 93, 77, 66, 99, 83, 74, 89
Jean: 83, 84, 82, 85, 84, 81, 82

a. Use statistics to describe each student’s performance. How are they alike? Which measure(s) of central tendency show the similarities?

b. How are they different? Which measure(s) of central tendency show the differences?

In addition to measures of center, we should also consider variability in a data set. A data set that is very spread out has a lot of variability. Another term for variability is dispersion, which stems from the word “disperse.” Imagine you are dispersing birdseed on the ground. The seeds will be spread out, so we can say they have a large amount of dispersion (or a large amount of variability).

2. Consider the average daily temperature in Fargo, ND compared to the average daily temperature in Los Angeles, CA.

Month        Los Angeles, CA    Fargo, ND
January            58                6
February           60               12
March              61               26
April              64               44
May                66               56
June               70               66
July               74               71
August             75               68
September          74               58
October            70               46
November           64               28
December           58               12

a. Calculate the mean of each data set.

b. Which data set has more variability?

c. To quantify the variability, we first consider how different each entry is from the mean, or the deviations. Complete the table below.

             Los Angeles, CA         Fargo, ND
Month        Actual   Deviation      Actual   Deviation
January        58                       6
February       60                      12
March          61                      26
April          64                      44
May            66                      56
June           70                      66
July           74                      71
August         75                      68
September      74                      58
October        70                      46
November       64                      28
December       58                      12

d. What is the average deviation for each data set?

e. In this example, is the average deviation a good measure of variability? Will this be true for every data set? Explain.

f. How can you adjust your data to improve the usefulness of the average deviation?

g. Statisticians use standard deviation to describe variability as a single value. With your class, calculate the standard deviation of each data set by hand.

3. Recall Jean and Curtis’s test scores. Which data set will have a larger standard deviation? Verify your prediction by calculating the standard deviation for each student’s test scores.

Curtis: {93, 77, 66, 99, 83, 74, 89}
Jean: {83, 84, 82, 85, 84, 81, 82}

The Normal Curve

One of the most common applications of standard deviation is its use in the bell-shaped curve. This is an example of a special type of curve called a normal curve or normal distribution.

[Figure: normal curve with the horizontal axis labeled x̄ - 3s, x̄ - 2s, x̄ - s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s]

An interesting characteristic of the normal distribution is that the mean, or x̄, is equal to the median and the mode. Another useful feature of the normal distribution is that certain percentages of scores fall at predictable distances from the mean. These distances are measured in standard deviations (s) from the mean.

The normal curve is constructed such that approximately 68% of scores fall within one standard deviation of the mean (the area under the curve between x̄ - s and x̄ + s), approximately 95% of scores fall within two standard deviations of the mean (the area between x̄ - 2s and x̄ + 2s), and approximately 99.7% of scores fall within three standard deviations of the mean (the area between x̄ - 3s and x̄ + 3s).

Use this information to construct a normal curve with a mean of 200 and a standard deviation of 25. Label the areas with percents and the x-axis with the mean and the values that represent scores 1, 2, and 3 standard deviations from the mean.

• What percent of scores are below 150?
• What percent of scores fall between 175 and 225?
• If the curve represents a population with 360 members, how many have a score of less than 200?
• How many members of the population have a score greater than 225?

Go to http://www.ms.uky.edu/%7Emai/java/stat/GaltonMachine.html to see a demonstration of a naturally occurring normal curve.

Data and Statistics with a TI-83, TI-83 Plus, or TI-84 Calculator

Entering Data into a List

To access the statistics menu, press STAT.
When you begin, your data will not yet be in your calculator. Select 1:Edit to edit your lists. The list editor looks and behaves like a spreadsheet.

To clear an existing list, arrow up to the list name (L1, L2, etc.) and press CLEAR. (Note: If you press DEL instead, the list heading will disappear along with the values in the list. You can insert a list the same way you would insert any other symbol: move the cursor to the column where you would like to insert a list and press 2nd [INS]. The list names L1-L6 can be found above the numbers 1 through 6. Press 2nd to access these keys.)

Enter your data into one of the existing lists by pressing ENTER between each entry. The display in the bottom left corner of the screen will help you stay organized. The display “L1(31)=” in the figure to the right indicates that the blank space highlighted in list 1 is the 31st entry in the list (so there are 30 pieces of data in your list).

Sorting Data

It is usually helpful if data is listed in order. To sort your list(s):
• In the STAT menu, the “2:SortA(” command will place your data in ascending order. (Similarly, “3:SortD(” will sort in descending order.) Selecting a sort command will take you to the home screen.
• The open parenthesis indicates the calculator needs you to enter more instructions. Type in the name of the list to be sorted (usually found above the numbers 1 through 6), close your parentheses, and press ENTER. When you return to the list editor, your list will display the data in order.

Calculating Statistics

• Press STAT to access the statistics menu.
• We want the calculator to do the calculations for us. Arrow to the right to highlight “CALC.”
• To determine the measures of central tendency, select 1:1-Var Stats. The calculator will take you to the home screen.
• Unless you tell it otherwise, the calculator will default to calculating statistics on list 1. To run statistics on another list, enter the list name and press ENTER. (As you start to use more lists, it is easy to be confused about which set of data your statistics are from. It’s a good idea to get into the habit of always selecting a list for statistics.)

The statistics that are produced take up more than one calculator screen. The arrow at the bottom of the screen indicates more information is below. Use your arrow keys to scroll up and down through the statistics. (Once you enter other commands, you cannot scroll through the stats anymore. To save time, press 2nd ENTER until the 1-Var Stats command appears and press ENTER.)

x̄: mean
Σx: sum of the data
Σx²: sum of the squares of the data
Sx: sample standard deviation
σx: population standard deviation
n: sample size
minX: minimum
Q1: first quartile
Med: median (2nd quartile)
Q3: third quartile
maxX: maximum

Graphing Data

To access the stat plot menu, press 2nd Y=. You can plot up to three sets of data at a time. A summary of the current settings will display on your screen.
• Press ENTER to change the settings on the first graph.
• Select “On” by pressing ENTER. Arrow down to “Type:” and use the left and right arrows to select the graph type.

Box and Whisker Plots

• To create a box and whisker plot, select the icon in the middle of the second row (see figure below). If you choose the box plot with dots, your graph will show outliers as separate points instead of including them in the whiskers.
• After Xlist, enter the name of the list you want to graph. (For L1-L6, use 2nd and the numbers 1 through 6. For named lists, press 2nd STAT to see all of the possible lists. Press ENTER to select your list.)

Scatter Plots

• To create a scatter plot, select the first graph type.
• After Xlist, enter the name of the list you want to graph along the x-axis.
• After Ylist, enter the name of the list you want to graph along the y-axis.
• Select the mark that will represent each point. (When graphing multiple plots, be sure to select different marks for each graph.)
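For checking calculator work away from the handheld, the one-variable statistics and the outlier points on the dotted box plot can be reproduced with a short Python sketch. This is only an illustration, not part of the original handout: the function names `one_var_stats` and `outliers` are hypothetical, the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), and outliers are assumed to follow the conventional 1.5 × IQR rule. The data are Curtis's and Jean's test scores from the Measures of Spread worksheet.

```python
import statistics


def one_var_stats(data):
    """Summary values in the style of a one-variable statistics screen.

    Assumes at least two data values. Quartiles are taken as medians of the
    lower and upper halves of the sorted data (an assumption about the method).
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values before the median position
    upper = values[half + n % 2:]   # values after the median position
    return {
        "mean": statistics.mean(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
        "Sx": statistics.stdev(values),        # sample standard deviation
        "sigma_x": statistics.pstdev(values),  # population standard deviation
        "n": n,
        "minX": values[0],
        "Q1": statistics.median(lower),
        "Med": statistics.median(values),
        "Q3": statistics.median(upper),
        "maxX": values[-1],
    }


def outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3 (conventional rule)."""
    stats = one_var_stats(data)
    iqr = stats["Q3"] - stats["Q1"]
    low = stats["Q1"] - 1.5 * iqr
    high = stats["Q3"] + 1.5 * iqr
    return [v for v in data if v < low or v > high]


# Test scores from the Measures of Spread worksheet.
curtis = [93, 77, 66, 99, 83, 74, 89]
jean = [83, 84, 82, 85, 84, 81, 82]

for name, scores in (("Curtis", curtis), ("Jean", jean)):
    s = one_var_stats(scores)
    print(name, "mean:", s["mean"], "median:", s["Med"],
          "Sx:", round(s["Sx"], 2), "outliers:", outliers(scores))
```

Running the sketch on both score lists gives a quick way to compare center and spread side by side, which is the same comparison the worksheet asks students to make by hand.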
Viewing the Plot

• To ensure your window will fit your data, press WINDOW and adjust your settings, or press ZOOM and select 9:ZoomStat, and the calculator will adjust the window for you.
• To view the graph, press GRAPH. While viewing the graph, you can identify specific values by pressing TRACE and using your arrow keys to move around the screen. The up and down arrows will move the cursor to a different plot. The left and right arrows move the cursor left and right along the same plot.

Regression Equations

• To find a regression equation that fits a data set, first enter the data into lists and create a scatter plot.
• Turn the diagnostic on so your calculator will produce a correlation coefficient. Access the catalog by pressing 2nd zero. The “A” in the top right corner indicates the alpha lock is on. Press “D” to skip to the commands that begin with the letter D. Select DiagnosticOn and press ENTER.
• Access the statistics menu by pressing STAT.
• Arrow over to highlight the “CALC” menu.
• Select the type of function you believe will fit your data. (If your plot appears linear, select 4:LinReg(ax+b). For parabola-shaped graphs, 5:QuadReg would be a better choice.) This will take you to the home screen.
• After the regression command, enter the list you used as your Xlist in the scatter plot.
• Press the comma key to separate the commands.
• Enter the list you used as your Ylist in the scatter plot. Press the comma key. Press VARS, arrow to the right to highlight “Y-Vars,” and select 1:Function. Then select one of the functions. Your regression equation will be pasted here so that you can graph it along with your scatter plot. (If you have other equations listed in these functions that you want to keep, be sure to select a different function.)
• The coefficients of the equation will appear on the screen along with the correlation coefficient. To view the graph along with the scatter plot, press GRAPH.
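The linear regression steps above can also be sketched in Python as a cross-check on the calculator's output. This is a minimal illustration under stated assumptions, not the calculator's own routine: the function name `linreg` is hypothetical, it computes the ordinary least-squares line y = ax + b and the correlation coefficient r, and the paired data are the monthly temperatures from the Measures of Spread worksheet, chosen here only because they are already in the unit.

```python
import math


def linreg(xs, ys):
    """Least-squares line y = a*x + b and correlation coefficient r.

    Assumes xs and ys have the same length (at least two points) and that
    neither list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                    # slope
    b = mean_y - a * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, r


# Monthly average temperatures from the Measures of Spread worksheet,
# used here as illustrative paired data (Xlist = Los Angeles, Ylist = Fargo).
los_angeles = [58, 60, 61, 64, 66, 70, 74, 75, 74, 70, 64, 58]
fargo = [6, 12, 26, 44, 56, 66, 71, 68, 58, 46, 28, 12]

a, b, r = linreg(los_angeles, fargo)
print(f"y = {a:.3f}x + {b:.3f}, r = {r:.3f}")
```

Entering the same two lists as L1 and L2 and running the linear regression command on the calculator should give coefficients that agree with the printed values up to rounding.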
Marble Ace: ____________________

Measure distances rolled on the carpet to the nearest 0.5 centimeters.
Compute the ratio: (mean distance rolled on the carpet) / (distance rolled on plane).

Table columns:
• Distance Rolled on Inclined Plane
• Distances Rolled on Carpet (cm): Marble A, B, C, D, E, F, G
• Range of Distances (cm)
• Median Distance (cm)
• Mean Distance (cm)
• Carpet/Plane Ratio to the nearest tenth

Table rows (distance rolled on inclined plane): 15 cm, 30 cm, 45 cm, 60 cm, 75 cm

Prediction of median and mean distances rolled on the carpet for 90 centimeters on the inclined plane:
90 cm

Record any observations here:

Who Was the Greatest Home Run Hitter?

The following table lists five of the greatest home run hitters in the United States with the number of home runs each hit.

Babe Ruth     Hank Aaron    Barry Bonds   Lou Gehrig    Mickey Mantle
Year   HR     Year   HR     Year   HR     Year   HR     Year   HR
1915    4     1955   27     1988   24     1925   20     1952   23
1918   11     1957   44     1990   33     1926   16     1953   21
1919   29     1958   30     1992   34     1927   47     1954   27
1920   54     1959   39     1993   46     1928   27     1955   37
1921   59     1960   40     1994   37     1929   35     1956   52
1922   35     1961   34     1995   33     1930   41     1957   34
1923   41     1962   45     1996   42     1931   46     1958   42
1924   46     1963   44     1997   40     1932   34     1959   31
1925   25     1964   24     1998   37     1933   32     1960   40
1926   47     1965   32     2000   49     1934   49     1961   54
1927   60     1966   44     2001   73     1935   30     1962   30
1928   54     1967   39     2002   46     1936   49     1964   35
1929   46     1968   29     2003   45     1937   37     1967   22
1930   49     1969   44     2004   45     1938   29
1931   46     1970   38
1932   41     1971   47
1933   34     1972   34
1934   22     1973   40

1. Study these records. Which player is the greatest home run hitter? Why did your group choose this player?

2. Rank the five players. You may wish to compute means, medians, quartiles, create line plots, stem-and-leaf plots, box & whisker plots, or plots over time. Describe the reasons for your findings.

Beyond a Statistical Doubt

As you have seen, statistics can be interpreted in many different ways. When making decisions, it’s important to consider the source, sampling procedures, and the way the data were analyzed and displayed.
In this project, you will be the statisticians in a trial charged with the task of proving your case with data. Like a real trial, both the prosecution and the defense will have access to the same data. It is up to your team to present the data in a way that supports your case. In court, juries are made up of average citizens, so your presentation should include clear data displays and thorough explanations. You will work on one case as part of a prosecution or defense team. On the second case, you will be a jury for your peers.

The Trial

As a class, we will decide on two trial topics. A few examples are given below to get you started.
• Kanye East is suing ABC Gum company for racial discrimination. Kanye claims the company doesn’t care about black people.
• James Sawyer is suing Kate’s Locke Company for gender discrimination. Sawyer claims he is qualified to enter codes into any lock, but he was not hired because of his gender.
• Paris Hyatt is charged with using marijuana. Paris claims she should not be punished because marijuana is not harmful.

Alone or with a group, brainstorm three or more topics for a trial. It will be important to select topics that could be argued by both the defense and the prosecution.

Data Collection

With your team, compile a list of data you will need to make your case along with possible sources for finding the data. Consider all variables you will want to find out about so you can do a complete analysis of the situation. For example, in a case about discrimination in promotions, the statisticians would be interested in not only gender and race, but also factors like education, experience, past performance, age, etc.

When the list of data to obtain has been finalized, your group will be responsible for locating the information. Be sure to cite your sources! The jury will want to know where the information came from. If necessary data cannot be found, a class simulation may be done if time permits.

Data Analysis

Using any of the data collected for the trial (by your group or your opponents), prepare your case. You may use any data displays that support your claim. Prepare visual aids for your presentation, and write a summary of your findings. You may want to think about how you will counter the opposition’s arguments so your data will be ready. You will present your case in front of the class and a jury of your peers.

Rubric

You will be evaluated both on your presentation of the data for your case as a statistician and on your critical analysis as a juror. We will develop the scoring rubric based on criteria for high quality work.
• Data Displays
• Written Summary
• Accuracy of Calculations
• Reasonableness of Conclusions