A Mathematical View
of Our World
1st ed.
Parks, Musser, Trimpe,
Maurer, and Maurer
Chapter 9
Collecting and
Interpreting Data
Section 9.1
Populations, Samples, and Data
• Goals
• Study populations and samples
• Study data
• Quantitative data
• Qualitative data
• Study bias
• Study simple random sampling
9.1 Initial Problem
• How can a professor choose 5 students
from among 25 volunteers in a fair way?
• The solution will be given at the end of the section.
Populations and Samples
• The entire set of objects being studied is
called the population.
• A population can consist of:
• People or animals
• Plants
• Inanimate objects
• Events
• The members of a population are called
elements.
Populations and Samples, cont’d
• Any characteristic of elements of the
population is called a variable.
• When we collect information from a
population element, we say that we
measure the variable being studied.
• A variable that is naturally numerical is
called quantitative.
• A variable that is not numerical is called
qualitative.
Populations and Samples, cont’d
• A census measures the variable for
every element of the population.
• A census is time-consuming and
expensive, unless the population is very
small.
• Instead of dealing with the entire
population, a subset, called a sample,
is usually selected for study.
Example 1
• Suppose you want to determine voter
opinion on a ballot measure. You
survey potential voters among
pedestrians on Main Street during
lunch.
a) What is the population?
b) What is the sample?
c) What is the variable being measured?
Example 1, cont’d
a) Solution: The population consists of all the
people who intend to vote on the ballot
measure.
Example 1, cont’d
b) Solution: The sample consists of all the people
you interviewed on Main Street who intend to
vote on the ballot measure.
Example 1, cont’d
c) Solution: The variable being
measured is the voter’s intent to
vote “yes” or “no” on the ballot
measure.
Data
• The measurement information recorded
from a sample is called data.
• Quantitative data is measurements for a
quantitative variable.
• Qualitative data is measurements for a
qualitative variable.
Data, cont’d
• Qualitative data with a natural ordering
is called ordinal.
• For example, a ranking of a pizza on a
scale of “Excellent” to “Poor” is ordinal.
• Qualitative data without a natural
ordering is called nominal.
• For example, eye color is nominal.
Data, cont’d
• The types of data are illustrated below.
Example 2
• Suppose you survey potential voters
among the people on Main Street
during lunch to determine their political
affiliation and age, as well as their
opinion on the ballot measure.
• Classify the variables as quantitative or
qualitative.
Example 2, cont’d
• Solution:
• Political affiliation is a qualitative variable.
• Age is a quantitative variable.
• Opinion on the ballot measure is a
qualitative variable.
Question:
Suppose you survey potential voters among the
people on Main Street during lunch to determine
their political affiliation and age, as well as their
opinion on the ballot measure. Classify the
qualitative variables political affiliation and
opinion on the ballot measure as ordinal or
nominal.
a. Both are ordinal.
b. Both are nominal.
c. Political affiliation is ordinal and opinion is nominal.
d. Political affiliation is nominal and opinion is ordinal.
Samples, cont’d
• Statistical inference is used to make an
estimation or prediction for the entire
population based on data collected from the
sample.
• If a sample has characteristics that are
typical of the population as a whole, we say
it is a representative sample.
• A bias is a flaw in the sampling that makes it
more likely the sample will not be representative.
Common Sources of Bias
• Faulty sampling: The sample is not
representative.
• Faulty questions: The questions are
worded to influence the answers.
• Faulty interviewing: Interviewers fail to
survey the entire sample, misread
questions, and/or misinterpret answers.
Common Sources of Bias, cont’d
• Lack of understanding or knowledge:
The person being interviewed does not
understand the question or needs more
information.
• False answers: The person being
interviewed intentionally gives incorrect
information.
Example 3
• Suppose you wish to determine voter
opinion regarding eliminating the capital
gains tax. You survey potential voters
on a street corner near Wall Street in
New York City.
• Identify a source of bias in this poll.
Example 3, cont’d
• Solution: One source of bias is the
location of the survey. People who work
on Wall Street would benefit from the
elimination of the tax, so they are more
likely to favor it than the average
voter is.
• This is faulty sampling.
Example 4
• Suppose a car manufacturer wants to
test the reliability of 1000 alternators.
They will test the first 30 from the lot for
defects.
• Identify any potential sources of bias.
Example 4, cont’d
• Solution: One source of bias could be that
the first 30 alternators are chosen for the
sample. It may be that defects are either
much more likely at the beginning of a
production run or much less likely at the
beginning. In either case, the sample would
not be representative.
• This is potentially faulty sampling.
Simple Random Samples
• Representative samples are usually
chosen randomly.
• Given a population and a desired
sample size, a simple random sample
is any sample chosen in such a way
that all samples of the same size are
equally likely to be chosen.
Simple Random Samples, cont’d
• One way to choose a simple random sample
is to use a random number generator or
table.
• A random number generator is a computer or
calculator program designed to produce
numbers with no apparent pattern.
• A random number table is a table produced
with a random number generator.
• An example of the first few rows of a random
number table is shown on the next slide.
Random Number Table
Example 5
• Choose a simple random sample of
size 5 from 12 semifinalists: Astoria,
Beatrix, Charles, Delila, Elsie, Frank,
Gaston, Heidi, Ian, Jose, Kirsten, and
Lex.
Example 5, cont’d
• Solution: Assign numerical labels to the
population elements, in any order, as shown
below:
Example 5, cont’d
• Solution, cont’d: Choose a random spot
in the table to begin.
• In this case, we could choose to start at
the top of the third column and to read
down, looking at the last 2 digits in each
number. This choice is arbitrary.
• Numbers that correspond to population
labels are recorded, ignoring duplicates,
until 5 such numbers have been found.
Example 5, cont’d
• Solution, cont’d: The numbers located
are 01, 06, 10, 11, and 07.
• The simple random sample consists of
Beatrix, Gaston, Heidi, Kirsten, and Lex.
Question:
Choose a different simple random
sample of size 5 from the 12
semifinalists: Astoria, Beatrix,
Charles, Delila, Elsie, Frank, Gaston,
Heidi, Ian, Jose, Kirsten, and Lex.
Question, cont’d
Use the first 2 digits of each number,
reading across the row starting in
row 128 of the random number
table.
a. Delila, Beatrix, Lex, Kirsten, Jose
b. Frank, Jose, Elsie, Delila, Ian
c. Charles, Ian, Frank, Beatrix, Gaston
d. Jose, Beatrix, Ian, Heidi, Lex
Example 6
• Choose a simple random sample of
size 8 from the states of the United
States of America.
Example 6, cont’d
• Solution: Numerical labels can be
assigned to the population elements in
any order.
• In this example we choose to order the
states by area.
• The labels are shown on the next slide.
Example 6, cont’d
• Solution, cont’d: We randomly choose
to start at the top row, left column of the
number table and read the last 2 digits
of each entry across the row.
• The entries are 03918 77195 47772
21870 87122 99445 10041 31795 63857
64569 34893 20429 43537 25368 95237
17707 34280 04755 64301 66836
12201…
Example 6, cont’d
• Solution, cont’d:
• The numbers obtained from the table are
18, 22, 45, 41, 29, 37, 07, 01.
• The states selected for the sample are
Washington, Florida, Vermont, West
Virginia, Arkansas, Kentucky, Nevada, and
Alaska.
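The table-reading procedure in this example can be sketched in Python. This is only a sketch: it assumes the states carry labels 01 through 50 (the label table is on a slide that is not reproduced here), and the function name `read_labels` is hypothetical.

```python
def read_labels(entries, max_label, count, min_label=1):
    """Collect distinct labels from the last 2 digits of each table entry."""
    chosen = []
    for entry in entries:
        label = int(entry[-2:])  # last 2 digits of the entry
        # Keep the number only if it is a valid label and not a duplicate.
        if min_label <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == count:
            break
    return chosen

# Entries quoted in the example, read across the row.
entries = ("03918 77195 47772 21870 87122 99445 10041 31795 63857 "
           "64569 34893 20429 43537 25368 95237 17707 34280 04755 "
           "64301 66836 12201").split()

# Assumption: states are labeled 01 through 50.
labels = read_labels(entries, max_label=50, count=8)
print(labels)  # [18, 22, 45, 41, 29, 37, 7, 1]
```

The second occurrence of 37 is skipped as a duplicate, which is why the reading continues past the 15th entry.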
9.1 Initial Problem Solution
• To fairly select 5 students from 25 volunteers,
a professor could choose a simple random
sample.
• Solution: Assign the students labels of 00
through 24 according to some ordering.
• Pick a starting place in a random number table
and read until 5 students have been selected.
Initial Problem Solution, cont’d
• Suppose the first 2 digits of each entry
in the last column are used.
• The first 5 numbers that are 24 or less are
20, 04, 16, 07, and 06.
• The students that were assigned these
labels are fairly chosen from the 25
volunteers.
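In practice, a random number generator can stand in for the printed table. The following is a minimal sketch, assuming Python's standard `random` module is acceptable as the generator; the student labels and the seed value are made up for illustration.

```python
import random

# A fixed seed is used only so the run can be repeated; it is an
# arbitrary choice, not part of the method.
rng = random.Random(2024)

# Hypothetical labels 00 through 24 for the 25 volunteers.
volunteers = [f"student_{i:02d}" for i in range(25)]

# random.sample draws without replacement, so every group of 5
# students is equally likely to be chosen.
chosen = rng.sample(volunteers, k=5)
print(chosen)
```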
Section 9.2
Survey Sampling Methods
• Goals
• Study sampling methods
• Independent sampling
• Systematic sampling
• Quota sampling
• Stratified sampling
• Cluster sampling
9.2 Initial Problem
• You need to interview at least 800 people
nationwide.
• You need a different interviewer for each county.
• Each interviewer costs $50 plus $10 per interview.
• Your budget is $15,000.
• Which is better, a simple random sample of all
adults in the U.S. or a simple random sample of
adults in randomly-selected counties?
• The solution will be given at the end of the section.
Sample Survey Design
• Simple random sampling can be
expensive and time-consuming in
practice.
• Statisticians have developed sample
survey design to provide less
expensive alternatives to simple
random sampling.
Independent Sampling
• In independent sampling, each member of
the population has the same fixed chance of
being selected for the sample.
• The size of the sample is not fixed ahead of
time.
• For example, in a 50% independent sample,
each element of the population has a 50%
chance of being selected.
Example 1
• Find a 50% independent sample of the
12 semifinalists: Astoria, Beatrix,
Charles, Delila, Elsie, Frank, Gaston,
Heidi, Ian, Jose, Kirsten, and Lex.
Example 1, cont’d
• Solution: Each position in a random number
table is equally likely to hold any of the 10
digits 0 through 9, so there is a 50% chance
that one of the five digits 0, 1, 2, 3, or 4 will
occur.
• Let the digits 0, 1, 2, 3, or 4 represent
“select this contestant” and let the remaining
digits represent “do not select this
contestant”.
Example 1, cont’d
• Solution, cont’d: We randomly choose
column 6 in the random number table and
look at the first 12 digits: 99445 20429 04.
• The first 9 indicates that Astoria is not selected.
• The second 9 indicates that Beatrix is not selected.
• The 4 represents that Charles is selected, and so
on…
• The 50% independent sample is Charles,
Delila, Frank, Gaston, Heidi, Ian, Kirsten,
and Lex.
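The digit-by-digit selection above can be checked with a short Python sketch. It assumes the contestants are taken in the order listed in the example; the variable names are illustrative only.

```python
names = ["Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
         "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex"]

# The 12 digits read from the table in the example.
digits = "99445 20429 04".replace(" ", "")

# Digits 0-4 mean "select this contestant" (a 50% chance per contestant).
select_digits = set("01234")

sample = [name for name, d in zip(names, digits) if d in select_digits]
print(sample)
# ['Charles', 'Delila', 'Frank', 'Gaston', 'Heidi', 'Ian', 'Kirsten', 'Lex']
```

Note that the sample has 8 members rather than 6: in independent sampling the sample size is not fixed ahead of time.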
Question:
Choose a 40% independent sample
from the 12 semifinalists: Astoria,
Beatrix, Charles, Delila, Elsie, Frank,
Gaston, Heidi, Ian, Jose, Kirsten, and
Lex.
Use the first 12 digits of row 145 of
the random number table and use
digits 0, 1, 2, 3 for selection.
Question, cont’d
Use the first 12 digits of row 145 of the
random number table and use digits 0, 1,
2, 3 for selection.
a. Astoria, Beatrix, Charles, Delila
b. Charles, Elsie, Frank, Gaston
c. Charles, Elsie, Frank, Gaston, Heidi,
Jose, Kirsten, Lex.
d. Beatrix, Charles, Delila, Frank, Heidi, Ian,
Lex.
Example 2
• Find a 10% independent sample of the
100 automobiles produced in one day
at a factory.
Example 2, cont’d
• Solution: Choose some ordering for the
100 automobiles.
• There is a 10% chance that the digit 0
will occur, so let the digit 0 represent
“select this automobile” and let the
other 9 digits represent “do not select
this automobile”.
Example 2, cont’d
• Solution, cont’d: We randomly start in the
first column, first row of the random
number table and read from left to right.
• In the first 100 digits we read in the table,
a 0 occurs in the positions 1, 7, 8, 19, 33,
39, 62, 70, 73, 81, 88, 93, 95, 98, and
100.
Example 2, cont’d
• Solution, cont’d: The automobiles that
are selected are highlighted.
Systematic Sampling
• In systematic sampling, we decide ahead of
time what proportion of the population we
wish to sample.
• For a 1-in-k systematic sample:
• List the population elements in some order.
• Randomly choose a number, r, from 1 to k.
• The elements selected are those labeled r, r +
k, r + 2k, r + 3k, …
Example 3
• Use systematic sampling to select a
1-in-10 systematic sample of the 100
automobiles produced in one day at a
factory.
Example 3, cont’d
• Solution: List the automobiles in some
order.
• Suppose we randomly choose r = 5.
• Since r = 5 and k = 10, the automobiles
selected for the sample are those labeled
5, 15, 25, 35, 45, 55, 65, 75, 85, and 95.
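The rule r, r + k, r + 2k, … translates directly into code. This is a small sketch with a hypothetical function name, assuming the elements are labeled 1 through n.

```python
def systematic_sample(n, k, r):
    """Labels chosen by a 1-in-k systematic sample of elements 1..n."""
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and k")
    # r, r + k, r + 2k, ... up to n
    return list(range(r, n + 1, k))

cars = systematic_sample(n=100, k=10, r=5)
print(cars)  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```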
Example 3, cont’d
• Solution, cont’d: The automobiles that
are selected are highlighted.
Example 3, cont’d
• A systematic sample is easier to
choose than an independent sample.
• However, the regularity in the selection
of a systematic sample can sometimes
be a source of bias.
Question:
Choose a 1-in-3 systematic sample from
the 12 semifinalists: Astoria, Beatrix,
Charles, Delila, Elsie, Frank, Gaston, Heidi,
Ian, Jose, Kirsten, and Lex. Use the
randomly chosen value of r = 2
a. Beatrix, Elsie, Heidi, Kirsten
b. Astoria, Delila, Gaston, Jose
c. Charles, Frank, Ian, Lex
d. Astoria, Charles, Elsie, Gaston, Ian,
Kirsten
Quota Sampling
• In quota sampling, the sample is chosen to
be representative for known important
variables.
• Quotas may be set for age groups, genders,
ethnicities, occupations, and so on.
• There is no way to know ahead of time which
variables are important enough to require
quotas.
• Quota sampling is not always reliable.
Stratified Sampling
• In stratified sampling, the population is
subdivided into 2 or more nonoverlapping
subsets, each of which is called a stratum.
Stratified Sampling, cont’d
• A stratified random sample is obtained
by selecting a simple random sample
from each stratum.
• A stratified sample can be less costly
because the strata allow a smaller sample
to be used.
Example 4
• Select a stratified random sample of 10
men and 10 women from a population
of 200.
• Suppose there are equal numbers of men
and women in the population.
• Use the first 2 digits of the 2nd and 3rd
columns of the random number table for
selecting men and women, respectively.
Example 4, cont’d
• Solution: The 2 strata are men and women.
• Choose a simple random sample from the
men.
• Number the 100 men with labels 00 through 99.
• The 10 men chosen from the random number
table are those with labels 77, 31, 25, 66, 49,
38, 00, 95, 24, and 57.
Example 4, cont’d
• Solution, cont’d: Choose a simple
random sample from the women.
• Number the 100 women with labels 00
through 99.
• The 10 women chosen from the random
number table are those with labels 47, 63,
95, 12, 49, 37, 48, 94, 35, and 78.
Example 4, cont’d
• Solution, cont’d: The stratified random
sample is represented below.
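A stratified random sample like the one in this example could also be drawn with a random number generator instead of a table. The sketch below assumes Python's `random` module; the labels such as `M00` and `W00`, the seed, and the function name are invented for illustration, so the people selected will differ from those in the example.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of the same size from each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

rng = random.Random(7)  # arbitrary seed, for repeatability only

# Hypothetical labels 00-99 within each stratum.
strata = {
    "men": [f"M{i:02d}" for i in range(100)],
    "women": [f"W{i:02d}" for i in range(100)],
}

sample = stratified_sample(strata, per_stratum=10, rng=rng)
print(sample["men"])
print(sample["women"])
```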
Question:
Suppose the 12 semifinalists can be divided into
2 strata as follows.
Junior division: Astoria, Charles, Delila, Gaston,
Heidi, Lex
Senior division: Beatrix, Elsie, Frank, Ian, Jose,
Kirsten
Choose a stratified random sample so the sample
contains 2 members of each stratum. Label the
members of each stratum 01 through 06. For the
junior division use the first 2 digits of column 3,
starting at the top and reading down. For the
senior division use the last 2 digits of column 3,
starting at the top and reading down.
Question, cont’d
Choose a stratified random sample
so the sample contains 2 members of
each stratum.
a. Frank, Beatrix, Astoria, Charles
b. Gaston, Lex, Ian, Elsie
c. Heidi, Astoria, Jose, Beatrix
d. Lex, Charles, Beatrix, Frank
Cluster Sampling
• In cluster sampling, the population is divided
into nonoverlapping subsets called sampling
units or clusters.
• Clusters may vary in size.
• A frame is a complete list of the sampling
units.
• A sample is a collection of sampling units
selected from the frame.
Cluster Sampling, cont’d
• In cluster sampling, a simple random
sample determines the clusters to be
included in the sample.
Example 5
• Select a cluster sample of 12
individuals from a population of 96
people who all live in four-person
suites.
• Use the first 2 digits of the 4th column of
the random number table.
Example 5, cont’d
• Solution: The clusters will be the 24
suites.
• Label the suites 01 through 24.
• We need a simple random sample of 3
of these suites to obtain a cluster
sample of 12 people.
Example 5, cont’d
• Solution, cont’d: The people in suites 21, 17,
and 10 are selected.
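The same cluster sample can be sketched with a random number generator. The code assumes Python's `random` module, and the resident names and seed are placeholders, so the suites chosen will generally differ from 21, 17, and 10.

```python
import random

rng = random.Random(11)  # arbitrary seed, for repeatability only

# The frame: 24 suites labeled 1-24, each with 4 (placeholder) residents.
suites = {s: [f"suite{s:02d}_person{p}" for p in range(1, 5)]
          for s in range(1, 25)}

# Simple random sample of 3 clusters (suites) from the frame.
chosen_suites = rng.sample(sorted(suites), k=3)

# Everyone in a chosen suite is in the sample.
people = [person for s in chosen_suites for person in suites[s]]
print(chosen_suites, len(people))
```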
Sampling Summary
9.2 Initial Problem Solution
• You need to interview at least 800 people
nationwide.
• You need a different interviewer for each
county.
• Each interviewer costs $50 plus $10 per
interview.
• Your budget is $15,000.
• Which is better, a simple random sample of all
adults in the U.S. or a simple random sample
of adults in randomly-selected counties?
Initial Problem Solution, cont’d
• A simple random sample is unbiased, so this
might seem to be the best choice.
• However, there are 3130 counties in the
U.S.
• If, for example, you get people in your sample
from only 400 of the counties, it would cost you
400($50) + 800($10) = $28,000.
• You cannot afford to choose a simple
random sample.
Initial Problem Solution, cont’d
• The second type of sample is a much less
expensive choice.
• You must pay 800($10) = $8000 for the
interviews, which leaves $7000 for hiring
interviewers.
• You can select a simple random sample of up to
140 counties.
• Then select a simple random sample of people
from each selected county, for a total of 800
people.
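The cost comparison can be reproduced with a few lines of arithmetic. This sketch uses only the figures given in the problem; the function and constant names are illustrative.

```python
INTERVIEWER_FEE = 50   # dollars per interviewer (one per county)
PER_INTERVIEW = 10     # dollars per interview
BUDGET = 15_000
INTERVIEWS = 800

def survey_cost(counties, interviews):
    """Total cost when each county needs its own interviewer."""
    return counties * INTERVIEWER_FEE + interviews * PER_INTERVIEW

# Simple random sample that happens to reach 400 counties.
cost_srs = survey_cost(counties=400, interviews=INTERVIEWS)

# Largest number of counties the budget allows for 800 interviews.
max_counties = (BUDGET - INTERVIEWS * PER_INTERVIEW) // INTERVIEWER_FEE

print(cost_srs)      # 28000
print(max_counties)  # 140
```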
Section 9.3
Central Tendency and Variability
• Goals
• Study measures of central tendency
• Mean
• Median
• Mode
• Study measures of dispersion
• Range
• Quartiles
• Standard deviation
9.3 Initial Problem
• Which stockbroker should you choose if you
want to minimize risk while maintaining a
steady rate of growth?
• One stockbroker’s recommendations had
percentage gains of 21%, -3%, 16%, 27%, 9%,
11%, 13%, 6%, and 17%.
• The other’s recommendations had percentage
gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%,
17%, and 18%.
• The solution will be given at the end of the section.
Measures of Central Tendency
• Statistics that tell us about the location of
values in a data set are called measures of
location.
• The most important measures of location,
called measures of central tendency, tell us
where the center of the data set lies.
• The most important measures of central
tendency are mean, median, and mode.
The Mean
• The mean is the most common type of
average.
• This is an arithmetic mean.
• If there are N numbers in a data set, the
mean is:
\(\frac{x_1 + x_2 + \cdots + x_N}{N}\)
The Mean, cont’d
• The mean of a sample is denoted by \(\bar{x}\),
which is read “x-bar”.
• The mean of a population is denoted by
μ, the Greek letter pronounced “mew”.
Example 1
• Find the mean of each data set.
a) 1, 1, 2, 2, 3
b) 1, 1, 2, 2, 11
c) 1, 1, 2, 2, 47
• Solution:
a) The mean is \(\frac{1 + 1 + 2 + 2 + 3}{5} = \frac{9}{5} = 1\tfrac{4}{5}\)
Example 1, cont’d
• Solution, cont’d:
b) The mean is \(\frac{1 + 1 + 2 + 2 + 11}{5} = \frac{17}{5} = 3\tfrac{2}{5}\)
c) The mean is \(\frac{1 + 1 + 2 + 2 + 47}{5} = \frac{53}{5} = 10\tfrac{3}{5}\)
Example 2
• A college graduate reads that a
company with 5 employees has a mean
salary of $48,000.
• How might this be misleading?
Example 2, cont’d
• Solution: One possibility is that every employee
earns a salary of $48,000.
• \(\frac{48000 + 48000 + 48000 + 48000 + 48000}{5} = \frac{240000}{5} = \$48{,}000\)
• Another possibility is that the owner makes
$120,000, while the other 4 employees each earn
$30,000.
• \(\frac{120000 + 30000 + 30000 + 30000 + 30000}{5} = \frac{240000}{5} = \$48{,}000\)
Example 2, cont’d
• There are also other possible situations,
but these two are enough to show that
the salary the graduate could expect to
earn can vary widely based only on
knowing the mean salary.
Question:
Find the mean of the data set: 19,
27, 83, 94. Round to 2 decimal
places.
a. 54.33
b. 55.75
c. 44.60
d. 56.50
The Median
• The median is the “middle number” of a data
set when the values are arranged from
smallest to largest.
• If there are an odd number of data points, the
data point exactly in the middle of the list is the
median.
• If there are an even number of data points, the
mean of the two data points in the middle of the
list is the median.
Example 3
• Find the mean and median of each
data set.
a) 0, 2, 4
b) 0, 2, 4, 10
c) 0, 2, 4, 10, 1000
Example 3, cont’d
a) Solution for 0, 2, 4
• The median is 2.
• The mean is: \(\frac{0 + 2 + 4}{3} = \frac{6}{3} = 2\)
Example 3, cont’d
b) Solution: for 0, 2, 4, 10
• The median is: \(\frac{2 + 4}{2} = \frac{6}{2} = 3\)
• The mean is: \(\frac{0 + 2 + 4 + 10}{4} = \frac{16}{4} = 4\)
Example 3, cont’d
c) Solution: for 0, 2, 4, 10, 1000
• The median is 4.
• The mean is: \(\frac{0 + 2 + 4 + 10 + 1000}{5} = \frac{1016}{5} = 203.2\)
Example 3, cont’d
• One very large or very small data
value can change the mean
dramatically.
• Large or small data values do not
have much of an effect on the
median.
Example 4
• Find the median salary for the 2
situations.
a) Five employees each earn $48,000.
b) Four employees earn $30,000 and one
earns $120,000.
Example 4, cont’d
• Solution:
a) The median salary is $48,000.
• The median is the same as the mean.
b) The median salary is $30,000.
• In this case the median more accurately
shows the typical salary than does the mean
of $48,000.
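The contrast between the mean and the median in the two salary situations can be checked with Python's standard `statistics` module. This is a brief sketch; the variable names are illustrative.

```python
from statistics import mean, median

equal_pay = [48000] * 5
unequal_pay = [120000, 30000, 30000, 30000, 30000]

# Both payrolls have the same mean...
print(mean(equal_pay), mean(unequal_pay))      # 48000 48000

# ...but the median exposes the difference in the typical salary.
print(median(equal_pay), median(unequal_pay))  # 48000 30000
```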
Question:
Find the median of the data set: 19,
27, 83, 94. Round to 2 decimal
places.
a. 27.00
b. 83.00
c. 56.50
d. 55.00
Symmetric Distributions
• If the mean and median of a data set are equal, the
data distribution is called symmetric.
• An example of a symmetric data set is shown
below.
Skewed Distributions
• A distribution is skewed left if the mean is less than
the median.
• A distribution is skewed right if the mean is greater
than the median.
The Mode
• The mode is the most commonly occurring value in a data set.
• A data set may have:
• No mode.
• One mode.
• Multiple modes.
Example 5
• Find the mode(s) of the following set of
test scores: 26, 32, 54, 62, 67, 70, 71,
71, 74, 76, 80, 81, 84, 87, 87, 87, 89,
93, 95, 96.
• Solution: The value 87 occurs more
times than any other score. The mode
is 87.
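One way to find modes in code is `statistics.multimode` from Python's standard library (available in Python 3.8 and later), which returns every value tied for the highest count. This is a sketch, not the slides' own method.

```python
from statistics import multimode

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

print(multimode(scores))        # [87]
print(multimode([1, 1, 2, 2]))  # [1, 2]  (two modes)
```

When every value occurs equally often, `multimode` returns all of the values; under the convention used here, such a data set would be described as having no mode.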
Example 5, cont’d
The Weighted Mean
• A weighted mean is calculated when
different data points have different
levels of importance, called weights.
• If the numbers in a data set,
\(x_1, x_2, \ldots, x_N\), have weights \(w_1, w_2, \ldots, w_N\),
then the weighted mean is:
\(\frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}\)
Example 6
• Suppose your grades one semester are:
• An A in a 5-credit course
• A B in a 4-credit course
• A C in two 3-credit courses
• What is your GPA that semester?
Example 6, cont’d
• Solution: A grade of A is worth 4 points,
a B 3 points, and a C 2 points.
• The weights are the number of credits.
• Your GPA is the weighted mean of your
grades:
\(\frac{4(5) + 3(4) + 2(3) + 2(3)}{5 + 4 + 3 + 3} \approx 2.93\)
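The GPA calculation is an instance of the weighted mean formula and can be sketched in Python. The helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2, 2]  # A, B, C, C
credits = [5, 4, 3, 3]       # weights: credits for each course

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 2))  # 2.93
```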
Example 7
• Determine the per capita income for the
group of nations listed in the table.
Example 7, cont’d
• Solution: The populations of the
countries are the weights.
• The per capita income of the entire
group is the weighted mean: 24.2
• The per capita income for the group of
countries in 2002 was about $24,200.
Measures of Variability
• The measures of central tendency describe
only part of the behavior of a data set.
• Statistics that tell us how the data varies from
its center are called measures of variability or
measures of spread.
• The measures of variability studied here are:
• Range
• Quartiles
• Standard deviation
The Range
• The range of a data set is the difference
between the largest data value and the
smallest data value.
Example 8
• Compute the mean and the range for
each data set.
a) 3, 4, 5, 6, 7, 8
b) 0, 2, 5, 7, 8, 11
Example 8, cont’d
• Solution:
a) 3, 4, 5, 6, 7, 8
• The mean is 5.5.
• The range is 8 – 3 = 5.
b) 0, 2, 5, 7, 8, 11
• The mean is 5.5.
• The range is 11 – 0 = 11.
• The two data sets have the same mean, but the
difference in ranges shows that the second data
set is more spread out.
Example 9
• Compute the range for each data set.
a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6,
6, 6, 7, 9
b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5,
6, 9, 5, 0
Example 9, cont’d
• Solution:
a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6,
6, 6, 7, 9
• The range is 9 – 0 = 9.
b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5,
6, 9, 5, 0
• The range is 9 – 0 = 9.
Example 9, cont’d
• Solution, cont’d: The two data sets have the
same range, but the graphs show that one data
set varies more than the other.
Quartiles
• Quartiles are measures of location that divide
a data set approximately into fourths.
• The quartiles are labeled as the
• first quartile, q1
• second quartile, q2
• The second quartile is the same as the
median.
• third quartile, q3
Quartiles, cont’d
• To find the quartiles, arrange the data
values in order from smallest to
largest.
1) Find the median. This is also the second
quartile.
2) If the number of data points is even, go to
Step 3. If the number of data points is odd,
remove the median from the list before
going to Step 3.
Quartiles, cont’d
3) Divide the remaining data points into
a lower half and an upper half.
4) The first quartile, q1, is the median of
the lower half of the data.
5) The third quartile, q3, is the median
of the upper half of the data.
Quartiles, cont’d
• The interquartile range, IQR, is the
difference between the first and third
quartiles.
• IQR = q3 - q1
• The IQR is a measure of variability.
• About half of the data points lie between
q1 and q3.
Example 10
• Find the median, the first and third
quartiles, and the interquartile range for
the test scores: 26, 32, 54, 62, 67, 70,
71, 71, 74, 76, 80, 81, 84, 87, 87, 87,
89, 93, 95, 96.
Example 10, cont’d
• Solution:
• The median is \(m = \frac{76 + 80}{2} = 78\)
• Since there is an even number of data points, we
do not remove the median from the list.
• The first quartile is the median of the lower
half of the list: 26, 32, 54, 62, 67, 70, 71, 71,
74, 76.
• The first quartile is \(q_1 = \frac{67 + 70}{2} = 68.5\)
Example 10, cont’d
• Solution, cont’d:
• The third quartile is the median of the
upper half of the list: 80, 81, 84, 87, 87,
87, 89, 93, 95, 96.
• The third quartile is \(q_3 = \frac{87 + 87}{2} = 87\)
• The IQR is 87 – 68.5 = 18.5
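The quartile steps used in this example (find the median, drop it if the number of data points is odd, then take the median of each half) can be sketched as a small Python function. The function name `quartiles` is hypothetical. Library routines such as `statistics.quantiles` use different conventions by default and may give slightly different values, so the steps are coded directly here.

```python
from statistics import median

def quartiles(data):
    """Return (q1, median, q3) using the median-of-halves steps.

    Needs at least 2 data points.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd count, skip the middle value (the median itself).
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return median(lower), median(values), median(upper)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

q1, m, q3 = quartiles(scores)
print(q1, m, q3)  # 68.5 78.0 87.0
print(q3 - q1)    # 18.5
```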
The Five-Number Summary
• The five-number summary of a data set is a
list of 5 informative numbers related to that
set:
• The smallest value, s
• The first quartile, q1
• The median, m
• The third quartile, q3
• The largest value, L
• The numbers are always written in this order.
Example 11
• Consider the set of test scores from the
previous example: 26, 32, 54, 62, 67,
70, 71, 71, 74, 76, 80, 81, 84, 87, 87,
87, 89, 93, 95, 96.
• The five-number summary for this data
set is 26, 68.5, 78, 87, 96.
Question:
Find the 5 number summary of the
data set: 19, 27, 83, 94.
a. 19, 27, 55, 83, 94
b. 19, 23, 55, 85.5, 94
c. 19, 23, 55, 88.5, 94
d. 19, 27, 55, 85.5, 94
Box-and-Whisker Plot
• The box-and-whisker plot, also called a box
plot, is a graphical representation of the
five-number summary of a data set.
• The box (rectangle) represents the IQR.
• The location of the median is marked within the box.
• The whiskers (lines) represent the lower and
upper 25% of the data.
Box-and-Whisker Plot, cont’d
Example 12
• The list of test scores from the previous
example had a five-number summary of 26,
68.5, 78, 87, 96.
• The box-and-whisker plot for this data set is
shown below.
Example 13
• The monthly rainfall for 2 cities is shown
below.
• Use box-and-whisker plots to compare the
rainfall amounts.
Example 13, cont’d
• Solution: In St. Louis, MO, the rainfalls were:
2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29,
3.74, 4.10, 4.12.
• The median is 3.08.
• The first quartile is 2.475.
• The third quartile is 3.515.
• The five-number summary for St. Louis is
2.21, 2.475, 3.08, 3.515, 4.12.
Example 13, cont’d
• Solution, cont’d: In Portland, OR, the rainfalls
were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05,
3.61, 3.93, 5.17, 6.14, 6.16.
• The median is 2.68.
• The first quartile is 1.54.
• The third quartile is 4.55.
• The five-number summary for Portland is
0.46, 1.54, 2.68, 4.55, 6.16.
Example 13, cont’d
• Solution, cont’d: The 2 box-and-whisker plots
are shown above.
• Note that the amount of rainfall in Portland,
OR, varies much more from month-to-month
than it does in St. Louis, MO.
Standard Deviation
• The standard deviation is a widely-used
measure of variability.
• Calculating the standard deviation requires
several intermediate steps, which will be
illustrated using the data set of incomes
shown below.
Deviation From The Mean
• The difference between a data point and the
mean of the data set is called the deviation
from the mean of that data point.
Deviation From The Mean, cont’d
• The mean income is $35,800.
Variance
• The variance is a type of average of all the
squared deviations from the mean.
• Variance is calculated differently for data from a
sample or from the entire population.
• Sample variance, \(s^2\): Divide the sum of all
the squared deviations from the mean by
n – 1.
• Population variance, \(\sigma^2\): Divide the sum of all
the squared deviations from the mean by n.
Sample Variance
• The variance of the incomes is calculated by first
squaring all the deviations.
Sample Variance, cont’d
• The squared deviations are added and
then divided by n – 1 = 9 – 1 = 8.
• \(\frac{2{,}465{,}560{,}000}{8} = 308{,}195{,}000\)
Standard Deviation
• Standard deviation is the square root
of the variance.
• Taking the square root allows the
standard deviation to have the same units
as the original data values.
• Because it is related to variance, the
standard deviation formula also
distinguishes between samples and the
population.
Standard Deviation, cont’d
• Sample standard deviation is \(s = \sqrt{s^2}\)
• Population standard deviation is \(\sigma = \sqrt{\sigma^2}\)
• The standard deviation of the incomes is:
\(s = \sqrt{s^2} = \sqrt{308{,}195{,}000} \approx \$17{,}555\)
Example 14
• Find the sample standard deviation of
the weights (in pounds) in the 2 data
sets.
• Turkeys: 17, 18, 19, 20, 21
• Dogs: 13, 16, 19, 22, 25
Example 14, cont’d
• Solution:
• The sample mean for the turkeys is 19
pounds.
• The sample mean for the dogs is also
19 pounds.
• We note that although the means are the
same, the standard deviations should
reflect the amount of variability in the data
values.
Example 14, cont’d
• Solution, cont’d: The deviations from the
mean for the turkey weights are found.
Example 14, cont’d
• Solution, cont’d:
• The sample variance of the turkey weights is
\(s^2 = \frac{10}{5 - 1} = \frac{10}{4} = 2.5\) square pounds.
• The sample standard deviation of the turkey
weights is \(s = \sqrt{s^2} = \sqrt{2.5} \approx 1.58\) pounds.
Example 14, cont’d
• Solution, cont’d: The deviations from the
mean for the dog weights are found.
Example 14, cont’d
• Solution, cont’d:
• The sample variance of the dog weights is
\(s^2 = \frac{90}{5 - 1} = \frac{90}{4} = 22.5\) square pounds.
• The sample standard deviation of the dog
weights is \(s = \sqrt{s^2} = \sqrt{22.5} \approx 4.74\) pounds.
Example 14, cont’d
• Solution, cont’d: The standard
deviation of the sample of dog weights
is larger than the standard deviation of
the sample of turkey weights because
there was a much wider spread among
the dog weights.
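The steps of this example (deviations, squared deviations, division by n – 1, square root) can be written out in Python. This is a minimal sketch with illustrative function names; the standard library's `statistics.variance` and `statistics.stdev` compute the same sample quantities.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Square root of the sample variance."""
    return sqrt(sample_variance(data))

turkeys = [17, 18, 19, 20, 21]
dogs = [13, 16, 19, 22, 25]

print(sample_variance(turkeys), round(sample_std(turkeys), 2))  # 2.5 1.58
print(sample_variance(dogs), round(sample_std(dogs), 2))        # 22.5 4.74
```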
Question:
Find the sample standard deviation
of the data set: 19, 27, 83, 94.
Round to the nearest hundredth.
a. 4382.75
b. 38.22
c. 66.20
d. 1460.92
9.3 Initial Problem Solution
• Which stockbroker should you choose if
you want to minimize risk while
maintaining a steady rate of growth?
• One stockbroker’s recommendations had
percentage gains of 21%, -3%, 16%, 27%,
9%, 11%, 13%, 6%, and 17%.
• The other’s recommendations had
percentage gains of 11%, 13%, 16%, 8%,
5%, 14%, 15%, 17%, and 18%.
Initial Problem Solution, cont’d
• First you could calculate the mean rate of
return for each stockbroker.
• Both stockbrokers have a mean rate of
return of 13%.
• Since the average growth rates are the
same, you can measure the variability to
determine which stockbroker’s
recommendations have the least
variability.
Initial Problem Solution, cont’d
• First stockbroker:
Initial Problem Solution, cont’d
• Second stockbroker:
Initial Problem Solution, cont’d
• The standard deviation of the second
portfolio is much smaller than the
standard deviation of the first stock
portfolio.
• Since the growth rates were the same,
the second stockbroker should be
chosen in order to minimize risk.
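The comparison can be reproduced with Python's `statistics` module. The standard deviations printed below are computed here from the listed gains, treating each list as a sample (division by n – 1); they are not copied from the slides.

```python
from statistics import mean, stdev

first = [21, -3, 16, 27, 9, 11, 13, 6, 17]    # percentage gains
second = [11, 13, 16, 8, 5, 14, 15, 17, 18]

print(mean(first), mean(second))  # 13 13

# Sample standard deviations (divide by n - 1).
print(round(stdev(first), 2))   # 8.73
print(round(stdev(second), 2))  # 4.3
```

The second list's standard deviation is roughly half that of the first, which matches the conclusion that the second stockbroker's recommendations vary less.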