Data Analysis and Assessment
By Katie, Jean & Curtis
Data and Statistics Project Standards (9-12)
Minnesota State Standards:

Standard: Explain the uses of data and statistical thinking to draw inferences, make predictions and justify conclusions.

9.4.2.1 Evaluate reports based on data published in the media by identifying the source of the data, the design of the study, and the way the data are analyzed and displayed. Show how graphs and data can be distorted to support different points of view. Know how to use spreadsheet tables and graphs or graphing technology to recognize and analyze distortions in data displays.
For example: Shifting data on the vertical axis can make relative changes appear deceptively large.

9.4.2.2 Identify and explain misleading uses of data; recognize when arguments based on data confuse correlation and causation.

9.4.2.3 Explain the impact of sampling methods, bias and the phrasing of questions asked during data collection.

Standard: Display and analyze data; use various measures associated with data to draw conclusions, identify trends and describe relationships.

9.4.1.1 Describe a data set using data displays, such as box-and-whisker plots; describe and compare data sets using summary statistics, including measures of center, location and spread. Measures of center and location include mean, median, quartile and percentile. Measures of spread include standard deviation, range and inter-quartile range. Know how to use calculators, spreadsheets or other technology to display data and calculate summary statistics.

9.4.1.2 Analyze the effects on summary statistics of changes in data sets.
For example: Understand how inserting or deleting a data point may affect the mean and standard deviation.
Another example: Understand how the median and interquartile range are affected when the entire data set is transformed by adding a constant to each data value or multiplying each data value by a constant.

9.4.1.3 Use scatterplots to analyze patterns and describe relationships between two variables. Using technology, determine regression lines (line of best fit) and correlation coefficients; use regression lines to make predictions and correlation coefficients to assess the reliability of those predictions.

9.4.1.4 Use the mean and standard deviation of a data set to fit it to a normal distribution (bell-shaped curve) and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets and tables to estimate areas under the normal curve.
For example: After performing several measurements of some attribute of an irregular physical object, it is appropriate to fit the data to a normal distribution and draw conclusions about measurement error.
Another example: When data involving two very different populations is combined, the resulting histogram may show two distinct peaks, and fitting the data to a normal distribution is not appropriate.
Curtis Jendro
Katie Garrity
Jean Benner
Unit Plan (9-12 Mathematics)

Day 1
Topic and Activity Description:
Students will work in small groups to decide how to report the typical backpack weight using the data provided in the Discovering Advanced Algebra textbook (example A on page 77). Students will report their findings to the class. Together, the class will review measures of central tendency.
Direct instruction: using TI-83, TI-83 Plus, or TI-84 calculators to enter data into lists and calculate one-variable statistics.
Materials and Handouts:
• Example A, page 77 of Discovering Advanced Algebra: an Investigative Approach by Murdock, Kamischke, and Kamischke (Key Curriculum Press 2004)
• Data and Statistics Calculator Instructions handout
• Graphing calculators

Day 2
Topic and Activity Description:
Activity: Making the Data. In small groups, students will create data sets that have given statistics.
Students will fill out surveys about themselves to be used in class later. (As students fill out surveys and collect data, facilitate a discussion on consistency and accuracy. What should we round to? What units should be used? Is it okay to give more than one answer?)
Materials and Handouts:
• Making the Data Activity (Navigating through Data Analysis in Grades 6-8, NCTM 2003)
• Student Data Survey

Day 3
Topic and Activity Description:
Topic: stem-and-leaf plots and histograms
Raisin Activity: without looking, students predict how many raisins will be in a box. The class will compile data about the guesses and actual results and construct a stem-and-leaf plot and a histogram.
Materials and Handouts:
• Raisin Activity, http://score.kings.k12.ca.us/lessons/RaisinCane.html, by Rob Roy

Day 4
Topic and Activity Description:
Topic: box-and-whisker plots, interquartile range, range, percentile rank
Activity: Students will record how long they can balance on each foot with eyes closed. Data will be compiled by gender and by foot (e.g. male left foot, female right foot) so that students can make box-and-whisker plots and make comparisons.
Materials and Handouts:
• “Balancing Act” - Data: Kids, Cats, And Ads (Investigations in Number, Data, and Space) (Paperback) by Andee Rubin (Author) - Dale Seymour Publications; Teacher edition (December 31, 1998)
• Graph paper
• Graphing calculators

Day 5
Topic and Activity Description:
Guided exploration: Measures of Spread worksheet. Using an example about test scores, students will discover a need for a new statistic that describes the variability in data. The lesson develops the formula for standard deviation.
Direct instruction: The Normal Curve
Materials and Handouts:
• Measures of Spread worksheet
• The Normal Curve worksheet
• Graphing calculators

Day 6
Topic and Activity Description:
Batteries activity. Students use data about batteries for graphing calculators to decide which product to buy.
Materials and Handouts:
• “Batteries” - Navigating through Data Analysis in Grades 6–8; by George W. Bright, Wallece Brewer, Kay McClain, and Edward S. Mooney. Published: 3/25/2003
• Data and Statistics Calculator Instructions handout
• Graphing calculators

Day 7
Topic and Activity Description:
Creating and interpreting scatter plots (including Challenger data and calculator steps). Students will create scatter plots using data collected from class survey.
Materials and Handouts:
• The Challenger Space Shuttle – “Risk analysis of the space shuttle: Pre-Challenger prediction of failure,” Journal of the American Statistical Association, Vol. 84, pages 945-957, by S.R. Dalal, E.B. Fowlkes, and B. Hoadley
• Data and Statistics Calculator Instructions handout
• Graphing calculators

Day 8
Topic and Activity Description:
Direct instruction: randomness, correlation vs. causation, regression lines. Students will use data collected from class survey to explore relationships and determine regression lines.

Day 9
Topic and Activity Description:
Marble Rolling Activity
Materials and Handouts:
• “The Marbleous Rolls” – by Arthur Wiebe – AIMS Magazine 1993 (No. 1, pp 42-45) – AIMS Education Foundation
• Marbelous Form handout

Day 10
Topic and Activity Description:
Data analysis: murder data. Students will draw conclusions from a controversial set of data and present their findings to the class. (Optional: collaborate with social studies or science department for a different topic.)
Materials and Handouts:
• Murder Data
• Murder Data explanations, http://faculty.bemidjistate.edu/dwebb/math5962/math5962.htm

Day 11
Topic and Activity Description:
Discussion of bias and distortion in data displays.
Nickel-flicking activity
Materials and Handouts:
• “Flick the Nick,” Addison-Wesley Publishing Company, Inc./Published by Dale Seymour Publications

Day 12
Topic and Activity Description:
In small groups, students will analyze data to decide which baseball player is the best home run hitter. Then they will rank the remaining players, giving reasons for their choices.
Materials and Handouts:
• Home Run Hitter Activity

Day 13
Topic and Activity Description:
Data Trial introduction and topic brainstorming.
Materials and Handouts:
• Data Trial

Day 14
Topic and Activity Description:
Data Trial data collection and analysis.
Materials and Handouts:
• Data Trial

Day 15
Topic and Activity Description:
Data Trial data collection and analysis. Develop rubric as a class for assessment.
Materials and Handouts:
• Data Trial

Day 16
Topic and Activity Description:
Trial #1 presents their cases to the jury.
Materials and Handouts:
• Data Trial

Day 17
Topic and Activity Description:
Trial #2 presents their cases to the jury.
Materials and Handouts:
• Data Trial

Day 18
Topic and Activity Description:
Class discussion of verdicts, findings, distortions, bias, data displays, accuracy, and conclusions.
Materials and Handouts:
• Data Trial

Day 19
Topic and Activity Description:
Posttest
Materials and Handouts:
• Pre/Post Test
Student Data Survey
Name: __________
Sex: __________
Age in months: __________
Foot length (cm): __________
Height (cm): __________
Forearm length (cm): __________
Number of pets: __________
Miles to school one way: __________
# of hours spent on homework per week: __________
Shoe size: __________
Number of siblings: __________
Left foot time: __________
Right foot time: __________
Number of books in locker at this time: __________
Favorite color m&m: __________
Number of hours spent watching TV per week: __________
Cumulative grade point average: __________
Cost of last haircut: __________
Name: _____________________
How Many Raisins Are in Your Box?
Guess how many raisins are in the snack-sized box: _________ (guess)
Open the lid of the box and make another guess: ___________ (estimate)
Put the actual number of raisins in the box here: ___________ (actual)
Stem and Leaf:
Histogram:
Mean: __________
Median: __________
Mode: __________
Note some observations here:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Name:
Hour:
Measures of Spread
1. Two students' test scores for the semester are shown below.
Curtis: 93, 77, 66, 99, 83, 74, 89
Jean: 83, 84, 82, 85, 84, 81, 82
a. Use statistics to describe each student’s performance. How are they alike?
Which measure(s) of central tendency show the similarities?
b. How are they different? Which measure(s) of central tendency show the
differences?
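For classes that also have a computer available, the measures of central tendency in question 1 can be checked with a short Python sketch. It uses only the scores printed above; the helper name `summarize` is an assumption made for illustration.

```python
import statistics

# Test scores copied from question 1.
curtis = [93, 77, 66, 99, 83, 74, 89]
jean = [83, 84, 82, 85, 84, 81, 82]


def summarize(scores):
    """Return the mean, median, and mode(s) of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # multimode returns every value tied for most frequent.
        "modes": sorted(statistics.multimode(scores)),
    }


curtis_summary = summarize(curtis)
jean_summary = summarize(jean)
print("Curtis:", curtis_summary)
print("Jean:  ", jean_summary)
```

The two students share the same mean and median, which is why a measure of center alone does not describe how their performances differ.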
In addition to measures of center, we should also consider variability in a data set. A data
set that is very spread out has a lot of variability. Another term for variability is dispersion,
which stems from the word “disperse.” Imagine you are dispersing birdseed on the ground.
The seeds will be spread out, so we can say they have a large amount of dispersion (or a large
amount of variability).
2. Consider the average daily temperature in Fargo, ND compared to the average daily
temperature in Los Angeles, CA.
Month        Los Angeles, CA    Fargo, ND
January      58                 6
February     60                 12
March        61                 26
April        64                 44
May          66                 56
June         70                 66
July         74                 71
August       75                 68
September    74                 58
October      70                 46
November     64                 28
December     58                 12
a. Calculate the mean of each data set.
b. Which data set has more variability?
c. To quantify the variability, we first consider how different each entry is from
the mean, or the deviations. Complete the table below.
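Month        Los Angeles, CA           Fargo, ND
             Actual    Deviation       Actual    Deviation
January      58        ______          6         ______
February     60        ______          12        ______
March        61        ______          26        ______
April        64        ______          44        ______
May          66        ______          56        ______
June         70        ______          66        ______
July         74        ______          71        ______
August       75        ______          68        ______
September    74        ______          58        ______
October      70        ______          46        ______
November     64        ______          28        ______
December     58        ______          12        ______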
d. What is the average deviation for each data set?
e. In this example, is the average deviation a good measure of variability? Will this
be true for every data set? Explain.
f. How can you adjust your data to improve the usefulness of the average
deviation?
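The deviation table and questions d through f can be explored with a minimal Python sketch. It assumes the temperatures printed above. Taking the absolute value of each deviation is only one possible adjustment for question f, and the function names are assumptions made for illustration.

```python
from statistics import mean

# Average daily temperatures by month, January through December.
los_angeles = [58, 60, 61, 64, 66, 70, 74, 75, 74, 70, 64, 58]
fargo = [6, 12, 26, 44, 56, 66, 71, 68, 58, 46, 28, 12]


def deviations(data):
    """Difference between each entry and the mean of the data set."""
    m = mean(data)
    return [x - m for x in data]


def average_deviation(data):
    """Mean of the signed deviations (positives and negatives cancel)."""
    return mean(deviations(data))


def average_absolute_deviation(data):
    """Mean of the deviations after dropping their signs."""
    return mean([abs(d) for d in deviations(data)])


for city, temps in (("Los Angeles", los_angeles), ("Fargo", fargo)):
    print(city)
    print("  mean:", round(mean(temps), 2))
    print("  average deviation:", round(average_deviation(temps), 6))
    print("  average absolute deviation:",
          round(average_absolute_deviation(temps), 2))
```

Because signed deviations always sum to zero, the plain average deviation cannot distinguish the two cities, while the adjusted version can.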
g. Statisticians use standard deviation to describe variability as a single value.
With your class, calculate the standard deviation of each data set by hand.
3. Recall Jean and Curtis’s test scores. Which data set will have a larger standard deviation?
Verify your prediction by calculating the standard deviation for each student’s test scores.
Curtis: {93, 77, 66, 99, 83, 74, 89}
Jean: {83, 84, 82, 85, 84, 81, 82}
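A by-hand calculation for question 3 can be verified with the Python sketch below. It assumes the sample standard deviation (dividing by n - 1, the value a TI calculator labels Sx); if your class divides by n instead, compare against `statistics.pstdev`. The function name `sample_std_dev` is an assumption made for illustration.

```python
import math
import statistics

curtis = [93, 77, 66, 99, 83, 74, 89]
jean = [83, 84, 82, 85, 84, 81, 82]


def sample_std_dev(data):
    """Standard deviation computed step by step, dividing by n - 1."""
    n = len(data)
    m = sum(data) / n                          # step 1: the mean
    squared = [(x - m) ** 2 for x in data]     # step 2: squared deviations
    variance = sum(squared) / (n - 1)          # step 3: average them (n - 1)
    return math.sqrt(variance)                 # step 4: take the square root


for name, scores in (("Curtis", curtis), ("Jean", jean)):
    by_hand = sample_std_dev(scores)
    library = statistics.stdev(scores)         # library check of the same value
    print(f"{name}: by hand {by_hand:.2f}, statistics.stdev {library:.2f}")
```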
The Normal Curve
One of the most common applications of standard deviation is its use in the bell-shaped curve.
This is an example of a special type of curve called a normal curve or normal distribution.
[Figure: bell-shaped normal curve; the horizontal axis is marked at x̄ - 3s, x̄ - 2s, x̄ - s, x̄, x̄ + s, x̄ + 2s, and x̄ + 3s.]
An interesting characteristic of the normal distribution is that the mean, or x̄, is equal to the
median and the mode. Another useful feature of the normal distribution is that certain
percentages of scores fall at predictable distances from the mean. These distances are
measured in standard deviations (s) from the mean. The normal curve is constructed such that
approximately 68% of scores fall within one standard deviation from the mean (the area under
the curve between x̄ - s and x̄ + s), approximately 95% of scores fall within two standard
deviations of the mean (the area between x̄ - 2s and x̄ + 2s), and approximately 99.7% of scores
fall within three standard deviations of the mean (the area between x̄ - 3s and x̄ + 3s).
Use this information to construct a normal curve with a mean of 200 and a standard deviation of
25. Label the areas with percents and the x axis with the mean and the values that represent
scores 1, 2, and 3 standard deviations from the mean.
What percent of scores are below 150?
What percent of scores fall between 175 and 225?
If the curve represents a population with 360 members, how many have a score of less than 200?
How many members of the population have a score greater than 225?
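Answers found with the 68%, 95%, and 99.7% percentages can be compared with exact areas under the curve. The Python sketch below assumes a normal distribution with mean 200 and standard deviation 25, as in the questions above; the variable names are assumptions made for illustration. The exact areas differ slightly from the rounded percentages.

```python
from statistics import NormalDist

# Normal curve with mean 200 and standard deviation 25.
scores = NormalDist(mu=200, sigma=25)

# cdf(v) is the area under the curve to the left of v.
below_150 = scores.cdf(150)
between_175_225 = scores.cdf(225) - scores.cdf(175)

population = 360
below_200_count = population * scores.cdf(200)
above_225_count = population * (1 - scores.cdf(225))

print(f"Percent below 150: {below_150:.2%}")
print(f"Percent between 175 and 225: {between_175_225:.2%}")
print(f"Members below 200: {below_200_count:.0f}")
print(f"Members above 225: {above_225_count:.0f}")
```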
Go to http://www.ms.uky.edu/%7Emai/java/stat/GaltonMachine.html to see a demonstration of
a naturally occurring normal curve.
Data and Statistics with a TI-83, TI-83 Plus, or TI-84 Calculator
Entering Data into a List
To access the statistics menu, press STAT.
When you are beginning, your data will not be in your calculator. Select
1:Edit to edit your lists. The list editor looks and behaves like a
spreadsheet.
To clear an existing list, arrow up to the list name (L1, L2, etc.) and
press CLEAR. (Note: If you press DEL instead, the list heading will
disappear along with the values in the list. You can insert a list the same
way you would insert any other symbol by moving the cursor to the column
where you would like to insert a list and pressing 2nd [INS]. The list names
L1-L6 can be found above the numbers 1 through 6. Press 2nd to access
these keys.)
Enter your data into one of the existing lists by pressing ENTER between
each entry. The display in the bottom left corner of the screen will help
you stay organized. The display “L1(31)=” in the figure to the right
indicates that the blank space highlighted in list 1 is the 31st entry in the list
(so there are 30 pieces of data in your list).
Sorting Data
It is usually helpful if data is listed in order. To sort your list(s):
• Press STAT. In the EDIT menu, the “2:SortA(” command will place your data in
ascending order. (Similarly, “3:SortD(” will sort in descending order.) Selecting
a sort command will take you to the home screen.
• The open parentheses indicate the calculator needs you to enter more
instructions. Type in the name of the list to be sorted (usually found above
the numbers 1 through 6), close your parentheses, and press ENTER.
When you return to the list editor, your list will display the data in order.
Calculating Statistics
• Press STAT to access the statistics menu.
• We want the calculator to do the calculations for us. Arrow to the right to
highlight “CALC.”
• To determine the measures of central tendency, select 1:1-Var Stats. The
calculator will take you to the home screen.
• Unless you tell it otherwise, the calculator will default to calculating
statistics on list 1. To run statistics on another list, enter the list name and
press ENTER. (As you start to use more lists, it is easy to be confused
about which set of data your statistics are from. It’s a good idea to get into
the habit of always selecting a list for statistics.)
The statistics that are produced take up more than one calculator screen. The arrow at the bottom of the
screen indicates more information is below. Use your arrow keys to scroll up and down through the
statistics. (Once you enter other commands, you cannot scroll through the stats anymore. To save time,
press 2nd ENTER until the 1-Var Stats command appears and press ENTER.)
x̄: mean
Σx: sum of the data
Σx²: sum of the squares of the data
Sx: sample standard deviation
σx: population standard deviation
n: sample size
minX: minimum
Q1: first quartile
Med: median (2nd quartile)
Q3: third quartile
maxX: maximum
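The same one-variable statistics can be reproduced off the calculator. The Python sketch below is an illustration, not the calculator's own routine. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), which is how these quartiles are commonly taught alongside TI calculators. The function name `one_var_stats` is an assumption, and the example data are Curtis's test scores from the Measures of Spread worksheet.

```python
import statistics


def one_var_stats(data):
    """Return the statistics listed above for a list of at least two numbers."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd (assumed quartile convention).
    upper = values[half + 1:] if n % 2 else values[half:]
    return {
        "mean": statistics.mean(values),
        "sum_x": sum(values),
        "sum_x_squared": sum(x * x for x in values),
        "Sx": statistics.stdev(values),        # sample standard deviation
        "sigma_x": statistics.pstdev(values),  # population standard deviation
        "n": n,
        "minX": values[0],
        "Q1": statistics.median(lower),
        "Med": statistics.median(values),
        "Q3": statistics.median(upper),
        "maxX": values[-1],
    }


stats = one_var_stats([93, 77, 66, 99, 83, 74, 89])
for label, value in stats.items():
    print(f"{label}: {value}")
```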
Graphing Data
To access the stat plot menu, press 2nd Y=. You can plot up to three sets of data
at a time. A summary of the current settings will display on your screen.
• Press ENTER to change the settings on the first graph.
• Select “On” by pressing ENTER. Arrow down to “Type:” and use the left and right
arrows to select the graph type.
Box and Whisker Plots
• To create a box and whisker plot, select the icon in the middle of the second row
(see figure below). If you choose the box plot with dots, your graph will indicate
outliers with points instead of including them on the plot.
• After Xlist, enter the name of the list you want to graph. (For L1-L6, use 2nd and
the numbers 1 through 6. For named lists, press 2nd STAT to see all of the
possible lists. Press ENTER to select your list.)
Scatter Plots
• To create a scatter plot, select the first graph type.
• After Xlist, enter the name of the list you want to graph along the x-axis.
• After Ylist, enter the name of the list you want to graph along the y-axis.
• Select the mark that will represent each point. (When graphing multiple plots, be
sure to select different marks for each graph.)
Viewing the Plot
• To ensure your window will fit your data, press WINDOW and adjust your
settings, or press ZOOM and select 9:ZoomStat, and the calculator will adjust
the window for you.
• To view the graph, press GRAPH.
While viewing the graph, you can identify specific values by pressing TRACE and using
your arrow keys to move around the screen. The up and down arrows will move the
cursor to a different plot. The left and right arrows move the cursor left and right along
the same plot.
Regression Equations
• To find a regression equation that fits a data set, first enter the data into lists and
create a scatter plot.
• Turn the diagnostic on so your calculator will produce a correlation coefficient.
Access the catalog by pressing 2nd zero. The “A” in the top right corner indicates
the alpha lock is on. Press “D” to skip to the commands that begin with the letter
D. Select DiagnosticOn and press ENTER.
• Access the statistics menu by pressing STAT.
• Arrow over to highlight the “CALC” menu.
• Select the type of function you believe will fit your data. (If your plot appears
linear, select 4:LinReg(ax+b). For parabola-shaped graphs, 5:QuadReg would be
a better choice.) This will take you to the home screen.
• After the regression command, enter the list you used as your Xlist in the scatter
plot.
• Press the comma key to separate the commands.
• Enter the list you used as your Ylist in the scatter plot. Press the comma key.
Press VARS, arrow to the right to highlight “Y-Vars,” and select 1:Function. Then
select one of the functions. Your regression equation will be pasted here so that
you can graph it along with your scatter plot. (If you have other equations listed
in these functions that you want to keep, be sure to select a different function.)
• The coefficients of the equation will appear on the screen along with the
correlation coefficient. To view the graph along with the scatter plot, press
GRAPH.
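To see what the linear regression command is computing, the slope, intercept, and correlation coefficient can be worked out from the least-squares formulas. The Python sketch below is an illustration under stated assumptions: the function name `lin_reg` is made up for this example, and the foot length and height values are hypothetical, not real class survey data.

```python
import math


def lin_reg(xs, ys):
    """Least-squares line y = a*x + b and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                   # slope
    b = mean_y - a * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


# Hypothetical survey values, for illustration only.
foot_length_cm = [22.0, 23.5, 24.0, 25.5, 27.0]   # plays the role of Xlist
height_cm = [155.0, 160.0, 166.0, 171.0, 180.0]   # plays the role of Ylist

a, b, r = lin_reg(foot_length_cm, height_cm)
print(f"y = {a:.2f}x + {b:.2f}, r = {r:.3f}")
```

A value of r close to 1 or -1 suggests the line fits the points closely; it does not show that one variable causes the other.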
Marble Ace: ____________________
Measure distances rolled on the carpet to the nearest 0.5 centimeters.
Compute the ratio: (mean distance rolled on the carpet) / (distance rolled on plane).
Table columns (one row for each distance rolled on the inclined plane):
• Distance Rolled on Inclined Plane
• Distances Rolled on Carpet (cm): Marble A, B, C, D, E, F, G
• Range of Distances (cm)
• Median Distance (cm)
• Mean Distance (cm)
• Ratio Carpet/Plane to the nearest tenth
Table rows: 15 cm, 30 cm, 45 cm, 60 cm, 75 cm
Prediction of median and mean distances rolled on the carpet for 90 centimeters on the inclined plane: 90 cm
Record any observations here:
Who Was the Greatest Home Run Hitter?
The following table lists five of the greatest home run hitters in the United States with the
number of home runs each hit.
Babe Ruth
Year   HR
1915   4
1918   11
1919   29
1920   54
1921   59
1922   35
1923   41
1924   46
1925   25
1926   47
1927   60
1928   54
1929   46
1930   49
1931   46
1932   41
1933   34
1934   22

Hank Aaron
Year   HR
1955   27
1957   44
1958   30
1959   39
1960   40
1961   34
1962   45
1963   44
1964   24
1965   32
1966   44
1967   39
1968   29
1969   44
1970   38
1971   47
1972   34
1973   40

Barry Bonds
Year   HR
1988   24
1990   33
1992   34
1993   46
1994   37
1995   33
1996   42
1997   40
1998   37
2000   49
2001   73
2002   46
2003   45
2004   45

Lou Gehrig
Year   HR
1925   20
1926   16
1927   47
1928   27
1929   35
1930   41
1931   46
1932   34
1933   32
1934   49
1935   30
1936   49
1937   37
1938   29

Mickey Mantle
Year   HR
1952   23
1953   21
1954   27
1955   37
1956   52
1957   34
1958   42
1959   31
1960   40
1961   54
1962   30
1964   35
1967   22
1. Study these records. Which player is the greatest home run hitter? Why did your
group choose this player?
2. Rank the five players. You may wish to compute means, medians, quartiles,
create line plots, stem-and-leaf plots, box & whisker plots, or plots over time.
Describe the reasons for your findings.
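Groups that want to check their hand calculations could use a short Python sketch such as the one below. It assumes only the seasons printed in the tables above, which do not cover every season of each career, and the function name `summarize` is made up for this example. Which statistic best captures "greatest" is left for each group to argue.

```python
import statistics

# Home runs per season, in the order listed in the tables above.
home_runs = {
    "Babe Ruth": [4, 11, 29, 54, 59, 35, 41, 46, 25,
                  47, 60, 54, 46, 49, 46, 41, 34, 22],
    "Hank Aaron": [27, 44, 30, 39, 40, 34, 45, 44, 24,
                   32, 44, 39, 29, 44, 38, 47, 34, 40],
    "Barry Bonds": [24, 33, 34, 46, 37, 33, 42, 40, 37, 49, 73, 46, 45, 45],
    "Lou Gehrig": [20, 16, 47, 27, 35, 41, 46, 34, 32, 49, 30, 49, 37, 29],
    "Mickey Mantle": [23, 21, 27, 37, 52, 34, 42, 31, 40, 54, 30, 35, 22],
}


def summarize(seasons):
    """Summary statistics for one player's listed seasons."""
    return {
        "seasons": len(seasons),
        "total": sum(seasons),
        "mean": statistics.mean(seasons),
        "median": statistics.median(seasons),
        "best": max(seasons),
    }


summaries = {player: summarize(hr) for player, hr in home_runs.items()}
for player, s in summaries.items():
    print(f"{player}: {s['seasons']} seasons, total {s['total']}, "
          f"mean {s['mean']:.1f}, median {s['median']}, best {s['best']}")
```

Sorting the players by different columns of this output (total, mean, median, best season) can produce different rankings, which is a useful starting point for the group discussion.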
Beyond a Statistical Doubt
As you have seen, statistics can be interpreted in many different ways. When making
decisions, it’s important to consider the source, sampling procedures, and the way the data
were analyzed and displayed. In this project, you will be the statisticians in a trial charged with
the task of proving your case with data. Like a real trial, both the prosecution and the
defense will have access to the same data. It is up to your team to present the data in a way
that supports your case. In court, juries are made up of average citizens, so your
presentation should include clear data displays and thorough explanations. You will work on
one case as part of a prosecution or defense team. On the second case, you will be a jury for
your peers.
The Trial
As a class, we will decide on two trial topics. A few examples are given below to get you
started.
• Kanye East is suing ABC Gum company for racial discrimination. Kanye claims the
company doesn’t care about black people.
• James Sawyer is suing Kate’s Locke Company for gender discrimination. Sawyer claims
he is qualified to enter codes into any lock, but he was not hired because of his
gender.
• Paris Hyatt is charged with using marijuana. Paris claims she should not be punished
because marijuana is not harmful.
Alone or with a group, brainstorm three or more topics for a trial. It will be important to select
topics that could be argued by both the defense and the prosecution.
• ____________________
• ____________________
• ____________________
Data Collection
With your team, compile a list of data you will need to make your case along with possible
sources for finding the data. Consider all variables you will want to find out about so you can
do a complete analysis of the situation. For example, in a case about discrimination in
promotions, the statisticians would be interested in not only gender and race, but also factors
like education, experience, past performance, age, etc.
When the list of data to obtain has been finalized, your group will be responsible for locating
the information. Be sure to cite your sources! The jury will want to know where the
information came from. If necessary data cannot be found, a class simulation may be done if
time permits.
Data Analysis
Using any of the data collected for the trial (by your group or your opponents), prepare
your case. You may use any data displays that support your claim. Prepare visual aids for
your presentation, and write a summary of your findings. You may want to think about how
you will counter the opposition’s arguments so your data will be ready. You will present your
case in front of the class and a jury of your peers.
Rubric
You will be evaluated both on your presentation of the data for your case as a statistician
and on your critical analysis as a juror. We will develop the scoring rubric based on criteria for
high quality work.
• Data Displays
• Written Summary
• Accuracy of Calculations
• Reasonableness of Conclusions