Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia, lookup

Ars Conjectandi wikipedia, lookup

Inductive probability wikipedia, lookup

Statistics wikipedia, lookup

Probability interpretations wikipedia, lookup

Probability wikipedia, lookup

Transcript
```Grade: 7
Unit #5: Statistics and Probability
Time: 25 Days
Unit Overview
In this unit, students begin their study of probability, learning how to interpret probabilities and how to compute probabilities in simple
settings. They also learn how to estimate probabilities empirically. Probability provides a foundation for the inferential reasoning developed
in the second half of this module. Additionally, students build on their knowledge of data distributions that they studied in Grade 6, compare
data distributions of two or more populations, and are introduced to the idea of drawing informal inferences based on data from random
samples.
Random sampling: In earlier grades students have been using data, both categorical and measurement, to answer simple statistical
questions, but have paid little attention to how the data were selected. A primary focus for Grade 7 is the process of selecting a random
sample, and the value of doing so. If three students are to be selected from the class for a special project, students recognize that a fair way
to make the selection is to put all the student names in a box, mix them up, and draw out three names “at random.” Individual students
realize that they may not get selected, but that each student has the same chance of being selected. In other words, random sampling is a
fair way to select a subset (a sample) of the set of interest (the population). A statistic computed from a random sample, such as the mean
of the sample, can be used as an estimate of that same characteristic of the population from which the sample was selected. This estimate
must be viewed with some degree of caution because of the variability in both the population and sample data. A basic tenet of statistical
reasoning, then, is that random sampling allows results from a sample to be generalized to a much larger body of data, namely, the
population from which the sample was selected. “What proportion of students in the seventh grade of your school chooses football as their
favorite sport?” Students realize that they do not have the time and energy to interview all seventh graders, so the next best way to get an
answer is to select a random sample of seventh graders and interview them on this issue.
The sample proportion is the best estimate of the population proportion, but students realize that the two are not the same and a different
sample will give a slightly different estimate. In short, students realize that conclusions drawn from random samples generalize beyond the
sample to the population from which the sample was selected, but a sample statistic is only an estimate of a corresponding population
parameter and there will be some discrepancy between the two. Understanding variability in sampling allows the investigator to gauge the
expected size of that discrepancy.
The variability in samples can be studied by means of simulation. Students are to take a random sample of 50 seventh graders from a large
population of seventh graders to estimate the proportion having football as their favorite sport. Suppose, for the moment, that the true
proportion is 60%, or 0.60. How much variation can be expected among the sample proportions? The scenario of selecting samples from
this population can be simulated by constructing a “population” that has 60% red chips and 40% blue chips, taking a sample of 50 chips from
that population, recording the number of red chips, replacing the sample in the population, and repeating the sampling process. (This can be
done by hand or with the aid of technology, or by a combination of the two.) Record the proportion of red chips in each sample and plot the
results.
The dot plots show results for 200 such random samples of size 50 each. Note that the sample proportions pile up around 0.60, but it is not
too rare to see a sample proportion down around 0.45 or up around .0.75. Thus, we might expect a variation of close to 15 percentage
points in either direction. Interestingly, about that same amount of variation persists for true proportions of 50% and 40%, as shown in the
dot plots. Students can now reason that random samples of size 50 are likely to produce sample proportions that are within about 15
percentage points of the true population value. They should now conjecture as to what will happen if the sample size is doubled or halved,
and then check out the conjectures with further simulations. Why are sample sizes in public opinion polls generally around 1000 or more,
rather than as small as 50?
Informal comparative inference: To estimate a population mean or median, the best practice is to select a random sample from that
population and use the sample mean or median as the estimate, just as with proportions. But, many of the practical problems dealing with
measures of center are comparative in nature, as in comparing average scores on the first and second exam or comparing average salaries
between female and male employees of a firm. Such comparisons may involve making conjectures about population parameters and
constructing arguments based on data to support the conjectures (MP3). If all measurements in a population are known, no sampling is
necessary and data comparisons involve the calculated measures of center. Even then, students should consider variability figures in the
margin show the female life expectancies for countries of Africa and Europe. It is clear that Europe tends to have the higher life
expectancies and a much higher median, but some African countries are comparable to some of those in Europe. The mean and MAD for
Africa are 53.6 and 9.5 years, respectively, whereas those for Europe are 79.3 and 2.8 years. In Africa, it would not be rare to see a
country in which female life expectancy is about ten years away from the mean for the continent, but in Europe the life expectancy in most
countries is within three years of the mean.
For random samples, students should understand that medians and means computed from samples will vary from sample to sample and
that making informed decisions based on such sample statistics requires some knowledge of the amount of variation to expect. Just as for
proportions, a good way to gain this knowledge is through simulation, beginning with a population of known structure.
The following examples are based on data compiled from nearly 200 middle school students in the Washington, DC area participating in the
Census at Schools Project. Responses to the question, “How many hours per week do you usually spend on homework?,” from a random
sample of 10 female students and another of 10 male students from this population gave the results plotted to the left. Females have a
slightly higher median, but students should realize that there is too much variation in the sample data to conclude that, in this population,
females have a higher median homework time.
An idea of how much variation to expect in samples of size 10 is needed. Simulation to the rescue! Students can take multiple samples of
size 10 from the Census of Schools data to see how much the sample medians themselves tend to vary. The sample medians for 100
random samples of size 10 each, with 100 samples of males and 100 samples of females, is shown to left. This plot shows that the sample
medians vary much less than the homework hours themselves and provides more convincing evidence that the female median homework
hours is larger than that for males. Half of the female sample medians are within one hour of 4 while half of the male sample medians are
within half hour of 3, although there is still overlap between the two groups.
Chance processes and probability models In Grade 7, students build their understanding of probability on a relative frequency view of the
subject, examining the proportion of “successes” in a chance process—one involving repeated observations of random outcomes of a given
event, such as a series of coin tosses.
“What is my chance of getting the correct answer to the next multiple choice question?” is not a probability question in the relative frequency
sense. “What is my chance of getting the correct answer to the next multiple choice question if I make a random guess among the four
choices?” is a probability question, because the student could set up an experiment of multiple trials to approximate the relative frequency of
the outcome. And two students doing the same experiment will get nearly the same approximation. These important points are often
overlooked in discussions of probability.
Students begin by relating probability to the long-run (more than five or ten trials) relative frequency of a chance event, using coins, number
cubes, cards, spinners, bead bags, and so on. Hands-on activities with students collecting the data on probability experiments are critically
important, but once the connection between observed relative frequency and theoretical probability is clear, they can move to simulating
probability experiments via technology (graphing calculators or computers). It must be understood that the connection between relative
frequency and probability goes two ways. If you know the structure of the generating mechanism (e.g., a bag with known numbers of red
and white chips), you can anticipate the relative frequencies of a series of random selections (with replacement) from the bag. If you do not
know the structure (e.g., the bag has unknown numbers of red and white chips), you can approximate it by making a series of random
selections and recording the relative frequencies. This simple idea, obvious to the experienced, is essential and not obvious at all to the
novice. The first type of situation, in which the structure is known, leads to “probability”; the second, in which the structure is unknown, leads
to “statistics.”
A probability model provides a probability for each possible non-overlapping outcome for a chance process so that the total probability over
all such outcomes is unity. The collection of all possible individual outcomes is known as the sample space for the model. For example, the
sample space for the toss of two coins (fair or not) is often written as {TT, HT, TH, HH}. The probabilities of the model can be either
theoretical (based on the structure of the process and its outcomes) or empirical (based on observed data generated by the process). In the
toss of two balanced coins, the four outcomes of the sample space are given equal theoretical probabilities of ¼ because of the symmetry of
the process—because the coins are balanced, an outcome of heads is just as likely as an outcome of tails. Randomly selecting a name from
a list of ten students also leads to equally likely outcomes with probability 0.10 that a given student’s name will be selected. If there are
exactly four seventh graders on the list, the chance of selecting a seventh grader’s name is 0.40. On the other hand, the probability of a
tossed thumbtack landing point up is not necessarily ½ just because there are two possible outcomes; these outcomes may not be equally
likely and an empirical answer must be found by tossing the tack and collecting data.
The product rule for counting outcomes for chance events should be used in finite situations like tossing two or three coins or rolling two
number cubes. There is no need to go to more formal rules for permutations and combinations at this level. Students should gain experience
in the use of diagrams, especially trees and tables, as the basis for organized counting of possible outcomes from chance processes. For
example, the 36 equally likely (theoretical probability) outcomes from the toss of a pair of number cubes are most easily listed on a two-way
table. An archived table of census data can be used to approximate the (empirical) probability that a randomly selected Florida resident will
be Hispanic.
After the basics of probability are understood, students should experience setting up a model and using simulation (by hand or with
technology) to collect data and estimate probabilities for a real situation that is sufficiently complex that the theoretical probabilities are not
obvious. For example, suppose, over many years of records, a river generates a spring flood about 40% of the time. Based on these
records, what is the chance that it will flood for at least three years in a row sometime during the next five years?
Supporting Cluster Standards
Use random sampling to draw inferences about a population.
7.SP.1 Understand that statistics can be used to gain information about a population by examining a sample of the population;
generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random
sampling tends to produce representative samples and support valid inferences.
7.SP.2 Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple
samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word
length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey
data. Gauge how far off the estimate or prediction might be.
Investigate chance processes and develop, use and evaluate probability models.
7.SP.5 Understand that the probability of a chance event is a number between 0 and 1 that expresses the likelihood of the event occurring.
Larger numbers indicate greater likelihood. A probability near 0 indicates an unlikely event, a probability around ½ indicates an event that in
neither unlikely nor likely, and a probability near 1 indicates a likely event.
7.SP.6 Approximate the probability of a chance event by collecting data on the chance process that produces it and observing its long-run
relative frequency, and predict the approximate relative frequency given the probability. For example, when rolling a number cube 600 times,
predict that a 3 or 6 would be rolled roughly 200 times, but probably not exactly 200 times.
7.SP.7 Develop a probability model and use it to find probabilities of events. Compare probabilities from a model to observed frequencies; if
the agreement is not good, explain possible sources of the discrepancy.
a) Develop a uniform probability model by assigning equal probability to all outcomes, and use the model to determine probabilities of
events. For example, if a student is selected at random from a class, find the probability that Jane will be selected and the probability that a
girl will be selected.
b) Develop a probability model (which may not be uniform) by observing frequencies in data generated from a chance process. For example,
find the approximate probability that a spinning penny will land heads up or that a tossed paper cup will land open-end down. Do the
outcomes for the spinning penny appear to be equally likely based on the observed
7.SP.8 Find probabilities of compound events using organized lists, tables, tree diagrams, and simulation.
a) Understand that, just as with simple events, the probability of a compound event is the fraction of outcomes in the sample space for which
the compound event occurs.
b) Represent sample spaces for compound events using methods such as organized lists, tables and tree diagrams. For an event described
in everyday language (e.g., “rolling double sixes”), identify the outcomes in the sample space which compose the event.
c) Design and use a simulation to generate frequencies for compound events. For example, use random digits as a simulation tool to
approximate the answer to the question: If 40% of donors have type A blood, what is the probability that it will take at least 4 donors to find
one with type A blood?
Supporting Cluster Standards Unpacked
7.SP.1 Students recognize that it is difficult to gather statistics on an entire population. Instead a random sample can be representative of
the total population and will generate valid predictions. Students use this information to draw inferences from data. A random sample must
be used in conjunction with the population to get accuracy. For example, a random sample of elementary students cannot be used to give a
Example 1:
The school food service wants to increase the number of students who eat hot lunch in the cafeteria. The student council has been asked to
conduct a survey of the student body to determine the students’ preferences for hot lunch. They have determined two ways to do the survey.
The two methods are listed below. Determine if each survey option would produce a random sample. Which survey option should the
student council use and why?
1. Write all of the students’ names on cards and pull them out in a draw to determine who will complete the survey.
2. Survey the first 20 students that enter the lunchroom.
3. Survey every 3rd student who gets off a bus.
7.SP.2 Students collect and use multiple samples of data to make generalizations about a population. Issues of variation in the samples
Example 1:
Below is the data collected from two random samples of 100 students regarding student’s school lunch preference. Make at least two
inferences based on the results.
Student Sample
Hamburgers Tacos Pizza Total
#1
12
14
74
100
#2
12
11
77
100
Solution:
•
Most students prefer pizza.
•
More people prefer pizza and hamburgers and tacos combined.
7.SP.5 This is the students’ first formal introduction to probability.
Students recognize that the probability of any single event can be can be expressed in terms such as impossible, unlikely, likely, or certain
or as a number between 0 and 1, inclusive, as illustrated on the number line below.
The closer the fraction is to 1, the greater the probability the event will occur.
Larger numbers indicate greater likelihood. For example, if someone has 10 oranges and 3 apples, you have a greater likelihood of
selecting an orange at random.
Students also recognize that the sum of all possible outcomes is 1.
Example 1:
There are three choices of jellybeans – grape, cherry and orange. If the probability of getting a grape is and the probability of getting
cherry is , what is the probability of getting orange?
Solution:
The combined probabilities must equal 1. The combined probability of grape and cherry is
5
5
. The probability of orange must equal
to
10
10
get a total of 1.
Example 2:


The container below contains 2 gray, 1 white, and 4 black marbles. Without looking, if Eric chooses a marble from the container, will the
probability be closer to 0 or to 1 that Eric will select a white marble? A gray marble? A black marble? Justify each of your predictions.
Solution:
White marble:
Gray marble:
Black marble:
Closer to 0
Closer to 0
Closer to 1
Students can use simulations such as Marble Mania on AAAS or the Random Drawing Tool on NCTM’s Illuminations to generate data and
examine patterns.
Random Drawing Tool - http://illuminations.nctm.org/activitydetail.aspx?id=67
7.SP.6 Students collect data from a probability experiment, recognizing that as the number of trials increase, the experimental probability
approaches the theoretical probability. The focus of this standard is relative frequency -- The relative frequency is the observed number of
successful events for a finite sample of trials. Relative frequency is the observed proportion of successful event, expressed as the value
calculated by dividing the number of times an event occurs by the total number of times an experiment is carried out.
Example 1:
Suppose we toss a coin 50 times and have 27 heads and 23 tails. We define a head as a success. The relative frequency of heads is:
27
= 54%
50
The probability of a head is 50%. The difference between the relative frequency of 54% and the probability of 50% is due to small sample
size.

The probability of an event can be thought of as its long-run relative frequency when the experiment is carried out many times.
Students can collect data using physical objects or graphing calculator or web-based simulations. Students can perform experiments
multiple times, pool data with other groups, or increase the number of trials in a simulation to look at the long-run relative frequencies.
Example 2:
Each group receives a bag that contains 4 green marbles, 6 red marbles, and 10 blue marbles. Each group performs 50 pulls, recording the
color of marble drawn and replacing the marble into the bag before the next draw. Students compile their data as a group and then as a
class. They summarize their data as experimental probabilities and make conjectures about theoretical probabilities (How many green draws
would are expected if 1000 pulls are conducted? 10,000 pulls?).
Students create another scenario with a different ratio of marbles in the bag and make a conjecture about the outcome of 50 marble pulls
with replacement. (An example would be 3 green marbles, 6 blue marbles, 3 blue marbles.)
Students try the experiment and compare their predictions to the experimental outcomes to continue to explore and refine conjectures about
theoretical probability.
Example 3:
A bag contains 100 marbles, some red and some purple. Suppose a student, without looking, chooses a marble out of the bag, records the
color, and then places that marble back in the bag. The student has recorded 9 red marbles and 11 purple marbles. Using these results,
predict the number of red marbles in the bag.
7.SP.7 Probabilities are useful for predicting what will happen over the long run. Using theoretical probability, students predict frequencies
of outcomes. Students recognize an appropriate design to conduct an experiment with simple probability events, understanding that the
experimental data give realistic estimates of the probability of an event but are affected by sample size.
Students need multiple opportunities to perform probability experiments and compare these results to theoretical probabilities. Critical
components of the experiment process are making predictions about the outcomes by applying the principles of theoretical probability,
comparing the predictions to the outcomes of the experiments, and replicating the experiment to compare results. Experiments can be
replicated by the same group or by compiling class data. Experiments can be conducted using various random generation devices including,
but not limited to, bag pulls, spinners, number cubes, coin toss, and colored chips. Students can collect data using physical objects or
graphing calculator or web-based simulations. Students can also develop models for geometric probability (i.e. a target).
Example 1:
If Mary chooses a point in the square, what is the probability that it is not in the circle?
Solution:
The area of the square would be 12 x 12 or 144 units squared.
The area of the circle would be 113.04 units squared. The probability that
a point is not in the circle would be
30.96
or 21.5%
144
Example 2:
Jason is tossing a fair coin. He tosses the coin ten times and it lands on heads eight times. If Jason tosses the coin an eleventh time, what
is the probability that it will land

Solution:
The probability would be
1
. The result of the eleventh toss does not depend on the previous results.
2
Example 3:
Devise an experiment using a coin to determine whether a baby is a boy or a girl. Conduct the experiment ten times to determine the
gender of ten births.
 How could a number cube be used to simulate whether a baby is a girl or a boy or girl?
Example 4:
Conduct an experiment using a Styrofoam cup by tossing the cup and recording how it lands.

How many trials were conducted?

How many times did it land right side up?

How many times did it land upside down/

How many times did it land on its side?

Determine the probability for each of the above results
7.SP.8 Students use tree diagrams, frequency tables, and organized lists, and simulations to determine the probability of compound events.
Example 1:
How many ways could the 3 students, Amy, Brenda, and Carla, come in 1st, 2nd and 3rd place?
Solution:
Making an organized list will identify that there are 6 ways for the students to win a race
A, B, C
A, C, B
B, C, A
B, A, C
C, A, B
C, B, A
Example 2:
Students conduct a bag pull experiment. A bag contains 5 marbles. There is one red marble, two blue marbles and two purple marbles.
Students will draw one marble without replacement and then draw another. What is the sample space for this situation? Explain how the
sample space was determined and how it is used to find the probability of drawing one blue marble followed by another blue marble.
Example 3:
A fair coin will be tossed three times. What is the probability that two heads and one tail in any order will result?
Solution:
3
8
HHT, HTH and THH so the probability would be .

Example 4:
Show all possible arrangements of the letters in the word FRED
using a tree diagram. If each of the letters is on a tile and drawn at
random, what is the probability of drawing the letters F-R-E-D in
In that order?
What is the probability that a “word” will have an F as the first letter?
Solution:
There are 24 possible arrangements (4 choices • 3 choices • 2 choices • 1 choice)
The probability of drawing F-R-E-D in that order is
1
.
24
The probability that a “word” will have an F as the first letter is
6
1
or .
24
4



Instructional Strategies
Grade 7 is the introduction to the formal study of probability. Through multiple experiences, students begin to understand the probability of
chance (simple and compound), develop and use sample spaces, compare experimental and theoretical probabilities, develop and use
graphical organizers, and use information from simulations for predictions. Help students understand the probability of chance is using the
benchmarks of probability: 0, 1 and ½. Provide students with situations that have clearly defined probability of never happening
as zero, always happening as 1 or equally likely to happen as to not happen as 1/2. Then advance to situations in which the probability is
somewhere between any two of these benchmark values. This builds to the concept of expressing the probability as a number between 0
and 1. Use this to build the understanding that the closer the probability is to 0, the more likely it will not happen, and the closer to 1, the
more likely it will happen. Students learn to make predictions about the relative frequency of an event by using simulations to collect, record,
organize and analyze data. Students also develop the understanding that the more the simulation for an event is repeated, the closer the
experimental probability approaches the theoretical probability. Have students develop probability models to be used to find the probability of
events. Provide students with models of equal outcomes and models of not equal outcomes are developed to be
used in determining the probabilities of events. Students should begin to expand the knowledge and understanding of the probability of
simple events, to find the probabilities of compound events by creating organized lists, tables and tree diagrams. This helps students create
a visual representation of the data; i.e., a sample space of the compound event. From each sample space, students determine the
probability or fraction of each possible outcome. Students continue to build on the use of simulations for simple probabilities and now
expand the simulation of compound probability. Providing opportunities for students to match situations and sample spaces assists students
in visualizing the sample spaces for situations.
Students often struggle making organized lists or trees for a situation in order to determine the theoretical probability. Having students start
with simpler situations that have fewer elements enables them to have successful experiences with organizing lists and trees diagrams. Ask
guiding questions to help students create methods for creating organized lists and trees for situations with more elements. Students often
see skills of creating organized lists, tree diagrams, etc. as the end product. Provide students with experiences that require the use of these
graphic organizers to determine the theoretical probabilities. Have them practice making the connections between the process of creating
lists, tree diagrams, etc. and the interpretation of those models. Additionally, students often struggle when converting forms of probability
from fractions to percents and vice versa. To help students with the discussion of probability, don’t allow the symbol manipulation
conversions to detract from the conversations. By having students use technology such as a graphing calculator or computer software to
simulate a situation and graph the results, the focus is on the interpretation of the data. Students then make predictions about the general
population based on these probabilities.
Draw informal comparative inferences about two populations
7.SP.3 Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference
between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball
team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either
team; on a dot plot, the separation between the two distributions of heights is noticeable.
7.SP.4 Use measures of center and measures of variability for numerical data from random samples to draw informal comparative
inferences about two populations. For example, decide
7.SP.3 This is the students’ first experience with comparing two data sets. Students build on their understanding of graphs, mean, median,
Mean Absolute Deviation (MAD) and interquartile range from 6th grade. Students understand that
1. a full understanding of the data requires consideration of the measures of variability as well as mean or median,
2. variability is responsible for the overlap of two data sets and that an increase in variability can increase the overlap, and
3. median is paired with the interquartile range and mean is paired with the mean absolute deviation .
Example:
Jason wanted to compare the mean height of the players on his favorite basketball and soccer teams. He thinks the mean height of the
players on the basketball team will be greater but doesn’t know how much greater. He also wonders if the variability of heights of the
athletes is related to the sport they play. He thinks that there will be a greater variability in the heights of soccer players as compared to
basketball players. He used the rosters and player statistics from the team websites to generate the following lists.
Basketball Team – Height of Players in inches for 2010 Season
75, 73, 76, 78, 79, 78, 79, 81, 80, 82, 81, 84, 82, 84, 80, 84
Soccer Team – Height of Players in inches for 2010
73, 73, 73, 72, 69, 76, 72, 73, 74, 70, 65, 71, 74, 76, 70, 72, 71, 74, 71, 74, 73, 67, 70, 72, 69, 78, 73, 76, 69
To compare the data sets, Jason creates a two dot plots on the same scale. The shortest player is 65 inches and the tallest players are 84
inches.
In looking at the distribution of the data, Jason observes that there is some overlap between the two data sets. Some players on both teams
have players between 73 and 78 inches tall. Jason decides to use the mean and mean absolute deviation to compare the data sets.
The mean height of the basketball players is 79.75 inches as compared to the mean height of the soccer players at 72.07 inches, a
difference of 7.68 inches.
The mean absolute deviation (MAD) is calculated by taking the mean of the absolute deviations for each data point. The difference between
each data point and the mean is recorded in the second column of the table The difference between each data point and the mean is
recorded in the second column of the table. Jason used rounded values (80 inches for the mean height of basketball players and 72 inches
for the mean height of soccer players) to find the differences. The absolute deviation, absolute value of the deviation, is recorded in the third
column. The absolute deviations are summed and divided by the number of data points in the set.
The mean absolute deviation is 2.14 inches for the basketball players and 2.53 for the soccer players. These values indicate moderate
variation in both data sets.
Solution:
There is slightly more variability in the height of the soccer players. The difference between the heights of the teams (7.68) is approximately
3 times the variability of the data sets (7.68 ÷ 2.53 = 3.04; 7.68 ÷ 2.14 = 3.59).
Soccer Players (n = 29)
Height (in)
65
67
69
69
69
70
70
71
71
71
72
72
72
72
73
73
73
73
73
73
74
74
74
74
76
76
76
78
∑ = 2090
Deviation from
Mean (in)
Absolute
Deviation (in)
-7
-5
-3
-3
-3
-2
-2
-1
-1
-1
0
0
0
0
+1
+1
+1
+1
+1
+1
+2
+2
+2
+2
+4
+4
+4
+6
7
5
3
3
3
2
2
1
1
1
0
0
0
0
1
1
1
1
1
1
2
2
2
2
4
4
4
6
∑ = 62
Mean = 2090 ÷ 29 =72 inches
MAD = 62 ÷ 29 = 2.14 inches
Deviation
Height (in)
from Mean
(in)
73
-7
75
-5
76
-4
78
-2
78
-2
79
-1
79
-1
80
0
80
0
81
+1
81
+1
82
+2
82
+2
84
+4
84
+4
84
+4
∑ = 1276
Mean = 1276 ÷ 16 =80 inches
MAD = 40 ÷ 16 = 2.53 inches
Absolute
Deviation (in)
7
5
4
2
2
1
1
0
0
1
1
2
2
4
4
4
∑ = 40
7.SP.4 Students compare two sets of data using measures of center (mean and median) and variability (MAD and IQR).
Showing the two graphs vertically rather than side by side helps students make comparisons. For example, students would be able to see
from the display of the two graphs that the ideas scores are generally higher than the organization scores. One observation students might
make is that the scores for organization are clustered around a score of 3 whereas the scores for ideas are clustered around a score of 5.
Example 1:
The two data sets below depict random samples of the management salaries in two companies. Based on the salaries below which measure
of center will provide the most accurate estimation of the salaries for each company?
•
Company A: 1.2 million, 242,000, 265,500, 140,000, 281,000, 265,000, 211,000
•
Company B: 5 million, 154,000, 250,000, 250,000, 200,000, 160,000, 190,000
Solution:
The median would be the most accurate measure since both companies have one value in the million that is far from the other values and
would affect the mean.
Focus Standards for Mathematical Practice
MP.2 Reason abstractly and quantitatively. Students reason quantitatively by posing statistical questions about variables and the
relationship between variables. Students reason abstractly about chance experiments in analyzing possible outcomes and designing
simulations to estimate probabilities.
MP.3 Construct viable arguments and critique the reasoning of others. Students construct viable arguments by using sample data to
explore conjectures about a population. Students critique the reasoning of other students as part of poster or similar presentations.
MP.4 Model with mathematics. Students use probability models to describe outcomes of chance experiments. They evaluate probability
models by calculating the theoretical probabilities of chance events, and by comparing these probabilities to observed relative frequencies.
MP.5 Use appropriate tools strategically. Students use simulation to approximate probabilities. Students use appropriate technology to
calculate measures of center and variability. Students use graphical displays to visually represent distributions.
MP.6 Attend to precision. Students interpret and communicate conclusions in context based on graphical and numerical data summaries.
Students make appropriate use of statistical terminology.
Skills and Concepts
Prerequisite Skills/Concepts: Students should already be able
to…
Summarize and describe distributions.







6.SP.5 Summarize numerical data sets in relation to their
context, such as by:
Reporting the number of observations.
Describing the nature of the attribute under investigation,
including how it was measured and its units of measurement.
Giving quantitative measures of center (median and/or mean)
and variability (interquartile range and/or mean absolute
deviation), as well as describing any overall pattern and any
striking deviations from the overall pattern with reference to
the context in which the data were gathered.
Relating the choice of measures of center and variability to the
shape of the data distribution and the context in which the

Investigate patterns of association in bivariate data (8.SP.1-4)
Understand independence and conditional probability and use them
to interpret data. (S-CP.1-5)
Use the rules of probability to compute probabilities of compound
events in a uniform probability model. (S-CP.6-7)
Skills: Students will be able to …
 Recognize and identify that different sampling techniques must be
used in real life situations, because it is very difficult to survey an
entire population. (7.SP.1)
 Select appropriate sample sizes based on a population in real-life
situations and explain why generalizations about a population from a


data were gathered. Understand ratio concepts and use ratio
reasoning to solve problems.
6.RP.3c Find a percent of a quantity as a rate per 100 (e.g.,
30% of a quantity means 30/100 times the quantity); solve
problems involving finding the whole, given a part and the
percent. Analyze proportional relationships and use them to
solve real-world and mathematical problems.
7.RP.2 Recognize and represent proportional relationships
between quantities.













sample are valid only if the sample is random and representative of
that population. (7.SP.1)
Collect data from a sample population in order to predict information
Interpret data from a random sample to draw inferences about a
population with an unknown characteristic of interest. (7.SP.2)
Generate multiple samples (or simulated samples) of the same size
to determine the variation in estimates or predictions by comparing
the samples. (7.SP.2)
Identify the degree of overlap between two numerical sets of data.
(7.SP.3)
Visually compare two numerical data distributions with like ranges.
(7.SP.3)
Measure the difference between the centers of two different data
distributions and express this difference as a multiple of a measure
of variability. (7.SP.3)
Use measures of center and measures of variability for numerical
data from random samples to draw informal comparative inferences
Represent the probability of a chance event as a number between 0
and 1. (7.SP.5)
Use the terms “likely”, “unlikely,” to describe the probability
represented by the fractions used. (7.SP.5)
Approximate the probability of a chance event by collecting data on
the chance process that produces it and observing its long-run
relative frequency. (7.SP.6)
Predict the approximate relative frequency of a chance event given
the probability. (7.SP.6)
Develop a uniform probability model by assigning equal probability to
all outcomes, and use the model to determine probabilities of events.
(7.SP.7)
Develop a probability model (which may not be uniform) by
observing frequencies in data generated from a chance process.







(7.SP.7)
Compare probabilities from a model to observed frequencies.
(7.SP.7)
If the agreement between a model and observed frequencies is not
good, explain possible sources of the discrepancy. (7.SP.7)
Find probabilities of compound events using organized lists, tables,
tree diagrams, and simulation. (7.SP.8)
Represent the probability of a compound event as the fraction of
outcomes in the sample space for which the compound event
occurs. (7.SP.8)
Represent sample spaces for compound events using methods such
as organized lists, tables and tree diagrams. (7.SP.8)
For an event described in everyday language (e.g., “rolling double
sixes”), identify the outcomes in the sample space which compose
the event. (7.SP.8)
Design and use a simulation to generate frequencies for compound
events. (7.SP.8)











Probability (A number between and that represents the likelihood that an outcome will occur.) Probability model (A probability model
for a chance experiment specifies the set of possible outcomes of the experiment—the sample space—and the probability associated
with each outcome.)
Compound event (An event consisting of more than one outcome from the sample space of a chance experiment.)
Tree diagram (A diagram consisting of a sequence of nodes and branches. Tree diagrams are sometimes used as a way of
representing the outcomes of a chance experiment that consists of a sequence of steps, such as rolling two number cubes, viewed as
first rolling one number cube and then rolling the second.)
Simulation (The process of generating “artificial” data that are consistent with a given probability model or with sampling from a known
population.)
Long-run relative frequency (The proportion of the time some outcome occurs in a very long sequence of observations.)
Random sample
Inference
Measures of center
Measures of variability