1/20/2015
Section 13.1
Sampling Techniques
Copyright 2013, 2010, 2007, Pearson Education, Inc.
What You Will Learn
Sampling Techniques
Random Sampling
Systematic Sampling
Cluster Sampling
Stratified Sampling
Convenience Sampling
Statistics
Statistics is the art and science of
gathering, analyzing, and making
inferences (predictions) from numerical
information, data, obtained in an
experiment.
Statistics is divided into two main
branches.
Descriptive statistics is concerned
with the collection, organization, and
analysis of data.
Inferential statistics is concerned
with making generalizations or
predictions from the data collected.
Statisticians
A statistician’s interest lies in drawing
conclusions about possible outcomes
through observations of only a few
particular events.
The population consists of all items or
people of interest.
The sample includes some of the items
in the population.
When a statistician draws a conclusion
from a sample, there is always the
possibility that the conclusion is
incorrect.
Types of Sampling
A random sampling occurs if a
sample is drawn in such a way that
each time an item is selected, each
item has an equal chance of being
drawn.
When a sample is obtained by drawing
every nth item on a list or production
line, the sample is a systematic
sample.
A cluster sample is sometimes
referred to as an area sample because
it is frequently applied on a
geographical basis.
Stratified sampling involves dividing
the population by characteristics called
stratifying factors such as gender,
race, religion, or income.
Convenience sampling uses data
that are easily or readily obtained, and
can be extremely biased.
Example 1: Identifying Sampling Techniques
Identify the sampling technique used
to obtain a sample in the following.
Explain your answer.
a) Every 20th soup can coming off an
assembly line is checked for defects.
Solution
Systematic Sampling
Every 20th item is selected.
b) A $50 gift certificate is given away at
the Annual Bankers Convention. Tickets
are placed in a bin, and the tickets are
mixed up. Then the winning ticket is
selected by a blindfolded person.
Solution
Random Sampling
Each ticket has an equal chance.
c) Children in a large city are classified
based on the neighborhood school they
attend. A random sample of five schools is
selected. All the children from each
selected school are included in the
sample.
Solution
Cluster Sampling
Random sample of geographic areas is
selected.
d) The first 50 people entering a zoo are
asked if they support an increase in
taxes to support a zoo expansion.
Solution
Convenience Sampling
Sample is selected by picking data that
are easily obtained.
e) Viewers of the USA Network are
classified according to age. Random
samples from each age group are
selected.
Solution
Stratified Sampling
Viewers are divided into strata by age;
a random sample is drawn from each stratum.
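The five techniques identified in Example 1 can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `seed` parameter, and the sample data are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k items so that every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every step-th item, beginning at index start."""
    return population[start::step]

def cluster_sample(clusters, k, seed=None):
    """Pick k whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return [item for name in chosen for item in clusters[name]]

def stratified_sample(strata, k_per_stratum, seed=None):
    """Draw a random sample of size k_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def convenience_sample(population, k):
    """Take the first k items that happen to be available (may be biased)."""
    return population[:k]

# Hypothetical data: 100 soup cans numbered 1 to 100.
cans = list(range(1, 101))
every_20th = systematic_sample(cans, 20, start=19)  # cans 20, 40, 60, 80, 100
first_50 = convenience_sample(cans, 50)             # like the first 50 zoo visitors
```

Note that the cluster sketch keeps every member of each chosen cluster, as in part (c), while the stratified sketch samples within every stratum, as in part (e).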
Section 13.2
The Misuses of Statistics
What You Will Learn
Misuses of Statistics
What is Not Said
Vague or Ambiguous Words
Draw Irrelevant Conclusions
Charts and Graphs
Misuses of Statistics
Many individuals, businesses, and
advertising firms misuse statistics to
their own advantage.
When examining statistical
information, consider the following:
Was the sample used to gather the
statistical data unbiased and of
sufficient size?
Is the statistical statement
ambiguous; that is, could it be
interpreted in more than one way?
What is Not Said
“Four out of five dentists recommend
sugarless gum for their patients who
chew gum.”
In this advertisement, we do not know
the sample size or the number of
times the experiment was performed
to obtain the desired results.
The advertisement does not mention
that possibly only 1 out of 100 dentists
recommended gum at all.
In a golf ball commercial, a “type A”
ball is hit and a second ball is hit in the
same manner.
The type A ball travels farther.
We are supposed to conclude that the
type A is the better ball.
The advertisement does not mention
the number of times the experiment
was previously performed or the
results of the earlier experiments.
Possible sources of bias include
(1) wind speed and direction,
(2) that no two swings are identical,
and
(3) that the ball may land on a rough
or smooth surface.
Vague or Ambiguous Words
Vague or ambiguous words also lead to
statistical misuses or
misinterpretations.
The word average is one such culprit.
There are at least four different
“averages,” some of which are
discussed in Section 13.4.
Each is calculated differently, and each
may have a different value for the
same sample.
During contract negotiations, it is not
uncommon for an employer to state
publicly that the average salary of its
employees is $45,000, whereas the
employees’ union states that the
average is $40,000.
Who is lying?
Actually, both sides may be telling the
truth. Each side will use the average
that best suits its needs to present its
case.
Advertisers also use the average that
most enhances their products.
Consumers often misinterpret this
average as the one with which they are
most familiar.
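A short Python sketch shows how both sides in the salary dispute could be truthful. The payroll below is hypothetical, invented only so that two common averages (the mean and the median) differ; the slides do not say which averages the employer or the union used.

```python
from statistics import mean, median

# Hypothetical payroll, invented for illustration only.
salaries = [30_000, 35_000, 40_000, 40_000, 80_000]

employer_average = mean(salaries)   # pulled upward by the one large salary
union_average = median(salaries)    # the middle salary when ranked

print("mean:  ", employer_average)
print("median:", union_average)
```

For this made-up payroll the mean is $45,000 and the median is $40,000, so each side can quote "the average" and still be reporting a correct figure.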
Another vague word is largest.
For example, ABC claims that it is the
largest department store in the United
States.
Does that mean largest profit, largest
sales, largest building, largest staff,
largest acreage, or largest number of
outlets?
Draw Irrelevant Conclusions
Still another deceptive technique used
in advertising is to state a claim from
which the public may draw irrelevant
conclusions.
For example, a disinfectant
manufacturer claims that its product
killed 40,760 germs in a laboratory in 5
seconds.
“To prevent colds, use disinfectant A.”
It may well be that the germs killed in
the laboratory were not related to any
type of cold germ.
Company C claims that its paper towels
are heavier than its competition’s
towels.
Therefore, they will hold more water. Is
weight a measure of absorbency?
A rock is heavier than a sponge, yet a
sponge is more absorbent.
An insurance advertisement claims that
in Duluth, Minnesota, 212 people
switched to insurance company Z.
One may conclude that this company is
offering something special to attract
these people.
What may have been omitted from the
advertisement is that 415 people in
Duluth, Minnesota, dropped insurance
company Z during the same period.
A foreign car manufacturer claims that
9 of every 10 of a popular-model car it
sold in the United States during the
previous 10 years were still on the
road.
From this statement, the public is to
conclude that this foreign car is well
manufactured and would last for many
years.
The commercial neglects to state that
this model has been selling in the
United States for only a few years.
The manufacturer could just as well
have stated that 9 of every 10 of these
cars sold in the United States in the
previous 100 years were still on the
road.
Charts and Graphs
Charts and graphs can also be
misleading.
Even though the data is displayed
correctly, adjusting the vertical scale of
a graph can give a different
impression.
While each graph presents identical
information, the vertical scales have
been altered.
The graph in part (a) appears to show a
greater increase than the graph in part
(b), again because of a different scale.
Consider a claim that if you invest $1, by
next year you will have $2. This type of
claim is sometimes misrepresented.
Actually, your investment has only doubled,
but the area of the square on the right is
four times that of the square on the left.
By expressing the amounts as cubes,
you increase the volume eightfold.
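The arithmetic behind this distortion can be checked with a small Python sketch. The function name `apparent_growth` is hypothetical; it simply raises the scale factor of each side to the number of dimensions drawn.

```python
def apparent_growth(linear_factor, dimensions):
    """Factor by which a drawn figure grows when every side is scaled."""
    return linear_factor ** dimensions

# The investment doubles, so each side of the figure is drawn twice as long.
bar_growth = apparent_growth(2, 1)     # a bar of doubled height
square_growth = apparent_growth(2, 2)  # a square with doubled sides
cube_growth = apparent_growth(2, 3)    # a cube with doubled edges

print(bar_growth, square_growth, cube_growth)
```

Only the one-dimensional bar grows by the same factor as the money; the square and the cube exaggerate the change to four and eight times.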
A circle graph can be misleading if the
sum of its parts does not add up
to 100%.
The graph on the previous slide is
misleading since the sum of its parts is
183%.
A graph other than a circle graph
should have been used to display the
top six reasons Americans say they use
the Internet.
Despite the examples presented in this
section, you should not be left with the
impression that statistics is used solely
for the purpose of misleading or
cheating the consumer.
As stated earlier, there are many
important and necessary uses of
statistics.
Most statistical reports are accurate
and useful.
You should realize, however, the
importance of being an aware
consumer.
Section 13.3
Frequency Distribution and Statistical Graphs
What You Will Learn
Frequency Distributions
Histograms
Frequency Polygons
Stem-and-Leaf Displays
Circle Graphs
Frequency Distribution
A piece of data is a single response to
an experiment.
A frequency distribution is a listing
of observed values and the
corresponding frequency of occurrence
of each value.
Example 1: Frequency Distribution
The number of children per family is
recorded for 64 families surveyed.
Construct a frequency distribution of
the following data:
Eight families had no children,
11 families had one child,
18 families had two children,
and so on.
Note that the sum of the frequencies is
equal to the original number of pieces
of data, 64.
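Building a frequency distribution amounts to counting how often each value occurs. A minimal Python sketch follows; the ten responses are hypothetical, since the 64 values from Example 1 are not reproduced in this transcript, and the function name is an assumption.

```python
from collections import Counter

def frequency_distribution(data):
    """Return (observed value, frequency) pairs sorted by observed value."""
    counts = Counter(data)
    return sorted(counts.items())

# Hypothetical responses (number of children per family).
children = [0, 2, 1, 2, 3, 0, 1, 2, 2, 4]
table = frequency_distribution(children)

for value, freq in table:
    print(value, freq)

# The frequencies add up to the number of pieces of data.
total = sum(freq for _, freq in table)
```

As in the example, the sum of the frequencies equals the number of pieces of data, which is a quick check that nothing was missed.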
Rules for Data Grouped by
Classes
1. The classes should be of the same
“width.”
2. The classes should not overlap.
3. Each piece of data should belong to
only one class.
It is often suggested that there be
5 to 12 classes.
Definitions
Classes: 0–4, 5–9, 10–14, 15–19, 20–24, 25–29
Lower class limits: 0, 5, 10, 15, 20, 25
Upper class limits: 4, 9, 14, 19, 24, 29
Midpoint of a class is found by adding
the lower and upper class limits and
dividing the sum by 2.
Example 3: A Frequency
Distribution of Family Income
The following set of data represents
the family income (in thousands of
dollars, rounded to the nearest
hundred) of 15 randomly selected
families.
46.5 65.2 35.5 31.8 52.4
40.3 45.8 44.6 39.8 44.7
53.7 56.3 40.9 48.8 50.7
Construct a frequency distribution with
a first class of 31.5–37.6.
Solution
Rearrange data from lowest to highest.
31.8 35.5 39.8 40.3 40.9
44.6 44.7 45.8 46.5 48.8
50.7 52.4 53.7 56.3 65.2
The class width is the difference between
consecutive lower class limits:
37.7 – 31.5 = 6.2.
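The grouping for Example 3 can be sketched in Python. Two things here are assumptions rather than statements from the slides: the class limits after the first class are derived from the first class 31.5–37.6 and the width 6.2, and six classes are used because that is enough to cover the largest value, 65.2. The function name is hypothetical.

```python
def grouped_frequencies(data, first_lower, width, num_classes):
    """Count how many values fall in each class.

    Values are converted to whole tenths so that class limits such as
    37.6 and 37.7 compare exactly.
    """
    tenths = [round(x * 10) for x in data]
    lower0, w = round(first_lower * 10), round(width * 10)
    table = []
    for i in range(num_classes):
        lower = lower0 + i * w
        upper = lower + w - 1  # one tenth below the next lower class limit
        freq = sum(lower <= t <= upper for t in tenths)
        table.append((lower / 10, upper / 10, freq))
    return table

incomes = [46.5, 65.2, 35.5, 31.8, 52.4, 40.3, 45.8, 44.6,
           39.8, 44.7, 53.7, 56.3, 40.9, 48.8, 50.7]

table = grouped_frequencies(incomes, first_lower=31.5, width=6.2, num_classes=6)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")

# The modal class is the class with the greatest frequency.
modal_class = max(table, key=lambda row: row[2])
```

Working in whole tenths is a design choice that sidesteps floating-point comparisons at the class limits.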
Solution
The modal class is 43.9–50.0.
The class mark of the first class is
(31.5 + 37.6) ÷ 2 = 34.55.
Histograms
A histogram is a graph with observed
values on its horizontal scale and
frequencies on its vertical scale.
Because histograms and other bar
graphs are easy to interpret visually,
they are used a great deal in
newspapers and magazines.
Constructing a Histogram
A bar is constructed above each
observed value (or class when classes
are used), indicating the frequency of
that value (or class).
The horizontal scale need not start at
zero, and the calibrations on the
horizontal and vertical scales do not
have to be the same.
The vertical scale must start at zero.
To accommodate large frequencies on
the vertical scale, it may be
necessary to break the scale.
Example 4: Construct a Histogram
The frequency distribution developed
in Example 1 is shown on the next
slide. Construct a histogram of this
frequency distribution.
Solution
Vertical scale: 0 – 20.
Horizontal scale: 0 – 9.
Bar above 0 extends to 8.
Above 1, bar extends to 11.
Bar above 2 extends to 18.
Continue this procedure for each
observed value to get the histogram
on the next slide.
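The procedure above can be imitated with a rough text histogram in Python. This sketch draws the bars sideways, one row per observed value, and uses only the three frequencies stated in the solution (8, 11, and 18); the remaining frequencies are not given in this transcript, so they are left out. The function name is hypothetical.

```python
def text_histogram(frequencies):
    """Return one line per observed value, with a bar of '#' marks."""
    lines = []
    for value, freq in sorted(frequencies.items()):
        lines.append(f"{value:>2} | {'#' * freq} {freq}")
    return lines

# Only the first three frequencies are stated in the example text.
known = {0: 8, 1: 11, 2: 18}
lines = text_histogram(known)
for line in lines:
    print(line)
```

Each bar starts at zero and its length equals the frequency, which mirrors the rule that the frequency scale of a histogram starts at zero.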
Frequency Polygon
Frequency polygons are line graphs
with scales the same as those of the
histogram; that is, the horizontal
scale indicates observed values and
the vertical scale indicates frequency.
Constructing a Frequency Polygon
Place a dot at the corresponding
frequency above each of the observed
values.
Then connect the dots with straight-line segments.
When constructing frequency
polygons, always put in two additional
class marks, one at the lower end and
one at the upper end on the
horizontal scale.
Since the frequency at these added
class marks is 0, the end points of the
frequency polygon will always be on
the horizontal scale.
Example 5: Construct a Frequency Polygon
Construct a frequency polygon of the
frequency distribution in Example 1,
found on the next slide.
Solution
Vertical scale: 0 – 20.
Horizontal scale: 0 – 9, plus one at
each end.
Place a mark above 0 at 8.
Place a mark above 1 at 11.
And so on.
Connect the dots and bring the end
points down to the horizontal axis.
Stem-and-Leaf Display
A stem-and-leaf display is a tool
that organizes and groups the data
while allowing us to see the actual
values that make up the data.
Constructing a Stem-and-Leaf
Display
To construct a stem-and-leaf display
each value is represented with two
different groups of digits.
The left group of digits is called the
stem.
The remaining group of digits on the
right is called the leaf.
There is no rule for the number of
digits to be included in the stem.
Usually the units digit is the leaf and
the remaining digits are the stem.
Example 8: Constructing a Stem-and-Leaf Display
The table below indicates the ages of
a sample of 20 guests who stayed at
Captain Fairfield Inn Bed and
Breakfast. Construct a stem-and-leaf
display.
29 31 39 43 56
60 62 59 58 32
47 27 50 28 71
72 44 45 44 68
Solution
Stem | Leaves
2 | 9 7 8
3 | 1 9 2
4 | 3 7 4 5 4
5 | 6 9 8 0
6 | 0 2 8
7 | 1 2
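The display for Example 8 can be reproduced with a short Python sketch. It assumes two-digit values, takes the tens digit as the stem and the units digit as the leaf, and keeps the leaves in the order in which the ages are listed. The function name is hypothetical.

```python
def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digits are the leaves."""
    display = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        display.setdefault(stem, []).append(leaf)
    return dict(sorted(display.items()))

ages = [29, 31, 39, 43, 56, 60, 62, 59, 58, 32,
        47, 27, 50, 28, 71, 72, 44, 45, 44, 68]

display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting each list of leaves before printing would give an ordered display, which some texts prefer; the unsorted form matches the solution shown here.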
Circle Graphs
Circle graphs (also known as pie
charts) are often used to compare
parts of one or more components of
the whole to the whole.
Example 9: Circus Performances
Eight hundred people who attended a
Ringling Bros. and Barnum & Bailey
Circus were asked to indicate their
favorite performance. The circle
graph shows the percentage of
respondents that answered tigers,
elephants, acrobats, jugglers, and
other. Determine the number of
respondents for each category.
Solution
To determine the number of
respondents in a category, we
multiply the percentage for each
category, written as a decimal
number, by the total number of
people, 800.
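The calculation described above can be sketched in Python, using the percentages from the circle graph in this example. The function name is hypothetical, and whole-number percentages are assumed.

```python
def respondents_per_category(percentages, total):
    """Convert whole-number percentages into counts of respondents."""
    # Integer arithmetic avoids floating-point rounding; the division is
    # exact here because the total, 800, is a multiple of 100.
    return {name: pct * total // 100 for name, pct in percentages.items()}

circus = {"Tigers": 38, "Elephants": 26, "Acrobats": 17,
          "Jugglers": 14, "Other": 5}

counts = respondents_per_category(circus, 800)
for name, count in counts.items():
    print(name, count)
```

Because the percentages add up to 100%, the counts add up to the 800 people surveyed.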
Tigers 38%
Elephants 26%
Acrobats 17%
Jugglers 14%
Other 5%
Create the table on
the next slide.
Solution
304 people indicated tigers were their
favorite performance,
208 indicated elephants,
136 people indicated the acrobats,
112 people indicated the jugglers, and
40 people indicated some other
performance.
Section 13.4
Measures of Central Tendency
What You Will Learn
Averages
Mean
Median
Mode
Midrange
Quartiles
Measures of Central Tendency
An average is a number that is
representative of a group of data.
There are at least four different
averages: the mean, the median, the
mode, and the midrange.
Each is calculated differently and may
yield different results for the same set
of data.
Each will result in a number near the
center of the data; for this reason,
averages are commonly referred to as
measures of central tendency.
Mean (or Arithmetic Mean)
The mean, $\bar{x}$, is the sum of the data
divided by the number of pieces of
data. The formula for calculating the
mean is
$\bar{x} = \frac{\Sigma x}{n}$
where $\Sigma x$ represents the sum of all the
data and $n$ represents the number of
pieces of data.
Example 1: Determine the Mean
Determine the mean age of a group of
patients at a doctor’s office if the ages
of the individuals are 28, 19, 49, 35,
and 49.
Solution
$\bar{x} = \frac{\Sigma x}{n} = \frac{28 + 19 + 49 + 35 + 49}{5} = \frac{180}{5} = 36$
Median
The median is the value in the middle
of a set of ranked data.
Example 2: Determine the Median
Determine the median age of a group
of patients at a doctor’s office if the
ages of the individuals are 28, 19, 49,
35, and 49.
Solution
Rank the data from smallest to largest.
19 28 35 49 49
35 is in the middle, so 35 is the median.
Example 3: Determine the
Median of an Even Number of
Pieces of Data
Determine the median of the following
sets of data.
a) 9, 14, 16, 17, 11, 16, 11, 12
b) 7, 8, 8, 8, 9, 10
Solution
a) 9, 11, 11, 12, 14, 16, 16, 17
8 pieces of data
Median is halfway between the middle two
data points, 12 and 14
(12 + 14) ÷ 2 = 26 ÷ 2 = 13
b) 7, 8, 8, 8, 9, 10
6 pieces of data
Median is halfway between the middle two
data points, 8 and 8
(8 + 8) ÷ 2 = 16 ÷ 2 = 8
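The two cases, an odd and an even number of pieces of data, can be handled by one small Python function. This is a sketch written for illustration; the function name `median_of` is hypothetical.

```python
def median_of(data):
    """Median of a list: the middle value, or halfway between the middle two."""
    ranked = sorted(data)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd count: single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: average of middle two

median_a = median_of([9, 14, 16, 17, 11, 16, 11, 12])
median_b = median_of([7, 8, 8, 8, 9, 10])
median_ages = median_of([28, 19, 49, 35, 49])

print(median_a, median_b, median_ages)
```

The results agree with the worked solutions: 13 for set (a), 8 for set (b), and 35 for the patients' ages in Example 2.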
Mode
The mode is the piece of data that
occurs most frequently.
Example 4: Determine the Mode
Determine the mode of the ages of a
group of patients at a doctor’s office if
the ages of the individuals are 28, 19,
49, 35, and 49.
Solution
The age 49 is the mode because it
occurs twice and the other values
occur only once.
Midrange
The midrange is the value halfway
between the lowest (L) and highest (H)
values in a set of data.
$\text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2}$
Example 5: Determine the Midrange
Determine the midrange age of a
group of patients at a doctor’s office if
the ages of the individuals are 28, 19,
49, 35, and 49.
Solution
$\text{Midrange} = \frac{19 + 49}{2} = \frac{68}{2} = 34$
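The four averages for the patients' ages can be compared side by side in Python. The sketch uses the standard `statistics` module for the mean, median, and mode; `midrange` is a hypothetical helper, since the module has no such function.

```python
from statistics import mean, median, mode

ages = [28, 19, 49, 35, 49]

def midrange(data):
    """Value halfway between the lowest and highest values."""
    return (min(data) + max(data)) / 2

averages = {
    "mean": mean(ages),
    "median": median(ages),
    "mode": mode(ages),
    "midrange": midrange(ages),
}

for name, value in averages.items():
    print(name, value)
```

The same five ages give four different "averages" (36, 35, 49, and 34), which is why a claim about an average should say which one is meant.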
Measures of Position
Measures of position are often used to
make comparisons.
Two measures of position are
percentiles and quartiles.
Percentiles
There are 99 percentiles dividing a set
of data into 100 equal parts.
A score in the nth percentile means
that you outperformed about n% of
the population who took the test and
that about (100 – n)% of the people taking
the test performed better than you did.
Quartiles
Quartiles divide data into four equal
parts:
The first quartile is the value that is
higher than about 1/4, or 25%, of the
population. It is the same as the 25th
percentile.
The second quartile is the value that is
higher than about 1/2 the population
and is the same as the 50th percentile,
or the median.
The third quartile is the value that is
higher than about 3/4 of the population
and is the same as the 75th percentile.
To Determine the Quartiles of a
Set of Data
1. Order the data from smallest to
largest.
2. Find the median, or 2nd quartile, of
the set of data. If there are an odd
number of pieces of data, the
median is the middle value. If
there are an even number of
pieces of data, the median will be
halfway between the two middle
pieces of data.
3. The first quartile, Q1, is the
median of the lower half of the
data; that is, Q1 is the median of
the data less than Q2.
4. The third quartile, Q3, is the
median of the upper half of the
data; that is, Q3 is the median of
the data greater than Q2.
Example 8: Finding Quartiles
Electronics World is concerned about
the high turnover of its sales staff. A
survey was done to determine how
long (in months) the sales staff had
been in their current positions. The
responses of 27 sales staff follow.
Determine Q1, Q2, and Q3.
25  3  7 15 31 36 17 21  2
11 42 16 23 19 21  9 20  5
 8 12 27 14 39 24 18  6 10
Solution
List the data from smallest to largest.
 2  3  5  6  7  8  9 10 11
12 14 15 16 17 18 19 20 21
21 23 24 25 27 31 36 39 42
The median, or middle of the 27 data
points, is Q2 = 17.
The median, or middle of the lower 13
pieces of data, is Q1 = 9.
The median, or middle of the upper 13
pieces of data, is Q3 = 24.
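The quartile procedure can be sketched in Python as follows. The function name quartiles is illustrative. The sketch assumes that, for an odd number of data values, the median itself is left out of both halves (as in Example 8, which uses the lower 13 and upper 13 values), and that an even-length set is simply split in two. The data are the 27 values from the ordered list in Example 8.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ordered = sorted(data)                 # Step 1: order the data
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)                   # Step 2: the median is Q2
    lower = ordered[:half]                 # values below the median position
    # For odd n, skip the middle value; for even n, take the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), q2, median(upper)  # Steps 3 and 4

# Months in current position, from the ordered list in Example 8.
months = [2, 3, 5, 6, 7, 8, 9, 10, 11,
          12, 14, 15, 16, 17, 18, 19, 20, 21,
          21, 23, 24, 25, 27, 31, 36, 39, 42]

print(quartiles(months))  # (9, 17, 24)
```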
Section 13.5
Measures of
Dispersion
What You Will Learn
Range
Standard Deviation
Measures of Dispersion
Measures of dispersion are used to
indicate the spread of the data.
Range
The range is the difference between
the highest and lowest values; it
indicates the total spread of the data.
Range = highest value – lowest value
Example 1: Determine the Range
The amount of caffeine, in milligrams,
of 10 different soft drinks is given
below. Determine the range of these
data.
38, 43, 26, 80, 55, 34, 40, 30, 35, 43
Solution
38, 43, 26, 80, 55, 34, 40, 30, 35, 43
Range = highest value – lowest value
= 80 – 26 = 54
The range of the amounts of caffeine is
54 milligrams.
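The same calculation as a minimal Python sketch (the variable names are illustrative):

```python
# Caffeine content, in milligrams, of the 10 soft drinks in Example 1.
caffeine_mg = [38, 43, 26, 80, 55, 34, 40, 30, 35, 43]

# Range = highest value - lowest value
data_range = max(caffeine_mg) - min(caffeine_mg)
print(data_range)  # 54
```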
Standard Deviation
The standard deviation measures
how much the data differ from the
mean. It is symbolized with s when it
is calculated for a sample, and with σ
(Greek letter sigma) when it is
calculated for a population.
The standard deviation, s, of a set of
data can be calculated using the
following formula.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
To Find the Standard Deviation of
a Set of Data
1. Find the mean of the set of data.
2. Make a chart having three columns:
Data
Data – Mean
(Data – Mean)²
3. List the data vertically under the
column marked Data.
4. Subtract the mean from each piece
of data and place the difference in
the Data – Mean column.
5. Square the values obtained in the
Data – Mean column and record
these values in the (Data – Mean)²
column.
6. Determine the sum of the values in
the (Data – Mean)² column.
7. Divide the sum obtained in Step 6
by n – 1, where n is the number of
pieces of data.
8. Determine the square root of the
number obtained in Step 7. This
number is the standard deviation of
the set of data.
Example 3: Determine the
Standard Deviation of Stock Prices
The following are the prices of nine
stocks on the New York Stock
Exchange. Determine the standard
deviation of the prices.
$17, $28, $32, $36, $50, $52, $66,
$74, $104
Solution
The mean, $\bar{x}$, is
$$\bar{x} = \frac{\sum x}{n} = \frac{17 + 28 + 32 + 36 + 50 + 52 + 66 + 74 + 104}{9} = \frac{459}{9} = 51$$
Solution
Use the formula
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{5836}{9 - 1}} = \sqrt{729.5} \approx 27.01$$
The standard deviation, to the nearest
hundredth, is $27.01.
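The eight-step procedure can be followed line by line in a short Python sketch. The function name sample_std_dev is illustrative; the lists stand in for the three chart columns.

```python
from math import sqrt

def sample_std_dev(data):
    """Sample standard deviation, following the eight steps above."""
    n = len(data)
    mean = sum(data) / n                       # Step 1: find the mean
    deviations = [x - mean for x in data]      # Steps 2-4: Data - Mean column
    squared = [d ** 2 for d in deviations]     # Step 5: (Data - Mean)^2 column
    total = sum(squared)                       # Step 6: sum of the squares
    return sqrt(total / (n - 1))               # Steps 7-8: divide by n - 1, take the root

# Stock prices from Example 3, in dollars.
prices = [17, 28, 32, 36, 50, 52, 66, 74, 104]
print(round(sample_std_dev(prices), 2))  # 27.01
```

For this data set the sum in Step 6 is 5836, which matches the value used in the solution.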
Section 13.6
The Normal
Curve
What You Will Learn
Rectangular Distribution
J-shaped Distribution
Bimodal Distribution
Skewed Distribution
Normal Distribution
z-Scores
Rectangular Distribution
All the observed values occur with the
same frequency.
J-shaped Distribution
The frequency is either constantly
increasing or constantly decreasing.
Bimodal Distribution
Two nonadjacent values occur more
frequently than any other values in a
set of data.
Skewed Distribution
Has more of a “tail” on one side than
the other.
Smoothing the histograms of the
skewed distributions to form curves.
The relationship between the mean,
median, and mode for curves that are
skewed to the right and left.
Normal Distribution
The most important distribution is the
normal distribution.
Properties of a Normal Distribution
The graph of a normal distribution is
called the normal curve.
The normal curve is bell shaped and
symmetric about the mean.
In a normal distribution, the mean,
median, and mode all have the same
value and all occur at the center of the
distribution.
Empirical Rule
Approximately 68% of all the data lie
within one standard deviation of the
mean (in both directions).
Approximately 95% of all the data lie
within two standard deviations of the
mean (in both directions).
Approximately 99.7% of all the data lie
within three standard deviations of the
mean (in both directions).
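The three percentages can be checked numerically. This is a sketch using Python's standard statistics.NormalDist class; the helper name share_within is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of data within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The printed values are close to 68%, 95%, and 99.7%, which is why the rule is stated with "approximately".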
z-Scores
z-scores (or standard scores)
determine how far, in terms of
standard deviations, a given score is
from the mean of the distribution.
The formula for finding z-scores (or
standard scores) is
$$z = \frac{\text{value of piece of data} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
Example 2: Finding z-scores
A normal distribution has a mean of 80
and a standard deviation of 10.
Find z-scores for the following values.
a) 90 b) 95 c) 80 d) 64
Solution
a) 90
$$z_{90} = \frac{\text{value of piece of data} - \text{mean}}{\text{standard deviation}} = \frac{90 - 80}{10} = \frac{10}{10} = 1$$
A value of 90 is 1 standard deviation
above the mean.
b) 95
$$z_{95} = \frac{95 - 80}{10} = \frac{15}{10} = 1.5$$
A value of 95 is 1.5 standard
deviations above the mean.
c) 80
$$z_{80} = \frac{80 - 80}{10} = \frac{0}{10} = 0$$
The mean always has a z-score of 0.
d) 64
$$z_{64} = \frac{64 - 80}{10} = \frac{-16}{10} = -1.6$$
A value of 64 is 1.6 standard
deviations below the mean.
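The four z-scores in Example 2 can be reproduced with a one-line function. The name z_score and its parameter names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Example 2: mean 80, standard deviation 10.
for value in (90, 95, 80, 64):
    print(value, z_score(value, 80, 10))
```

A positive result means the value is above the mean; a negative result means it is below.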
To Determine the Percent of Data
Between any Two Values
1. Draw a diagram of the normal curve
indicating the area or percent to be
determined.
2. Use the formula to convert the given
values to z-scores. Indicate these
z-scores on the diagram.
3. Look up the percent that corresponds
to each z-score in Table 13.7.
a) When finding the percent of data to
the left of a negative z-score, use
Table 13.7(a).
b) When finding the percent of data to
the left of a positive z-score, use
Table 13.7(b).
c) When finding the percent of data to
the right of a z-score, subtract the
percent of data to the left of that
z-score from 100%, or use the
symmetry of a normal distribution.
d) When finding the percent of data
between two z-scores, subtract the
smaller percent from the larger
percent.
4. Change the areas you found in Step
3 to percents as explained earlier.
Example 5: Horseback Rides
Assume that the length of time for a
horseback ride on the trail at Triple R
Ranch is normally distributed with a
mean of 3.2 hours and a standard
deviation of 0.4 hour.
a) What percent of horseback rides
last at least 3.2 hours?
Solution
In a normal distribution, half the data
are above the mean. Since 3.2 hours
is the mean, 50% of the horseback
rides last at least 3.2 hours.
b) What percent of horseback rides last
less than 2.8 hours?
Solution
Convert 2.8 to a z-score.
$$z_{2.8} = \frac{2.8 - 3.2}{0.4} = -1.00$$
The area to the left of –1.00 is 0.1587.
The percent of horseback rides that
last less than 2.8 hours is 15.87%.
c) What percent of horseback rides are
at least 3.7 hours?
Solution
Convert 3.7 to a z-score.
$$z_{3.7} = \frac{3.7 - 3.2}{0.4} = 1.25$$
The area to the left of 1.25 is 0.8944 = 89.44%.
The percent above 1.25 is 100% – 89.44% = 10.56%.
Thus, 10.56% of horseback rides last
at least 3.7 hours.
d) What percent of horseback rides are
between 2.8 hours and 4.0 hours?
Solution
Convert 4.0 to a z-score.
$$z_{4.0} = \frac{4.0 - 3.2}{0.4} = 2.00$$
The area to the left of 2.00 is 0.9772 = 97.72%.
The percent below 2.8 is 15.87%.
The percent of data between –1.00 and
2.00 is 97.72% – 15.87% = 81.85%.
Thus, the percent of horseback rides
that last between 2.8 hours and 4.0
hours is 81.85%.
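Parts (b) through (d) can be checked without a printed table. This sketch uses Python's standard statistics.NormalDist in place of Table 13.7; the variable names are illustrative. Because the table rounds each area to four decimal places before subtracting, the computed result for part (d) may differ from the table-based answer in the last digit.

```python
from statistics import NormalDist

# Ride length in hours: mean 3.2, standard deviation 0.4 (from Example 5).
rides = NormalDist(mu=3.2, sigma=0.4)

below_2_8 = rides.cdf(2.8)                  # part (b): area to the left of z = -1.00
at_least_3_7 = 1 - rides.cdf(3.7)           # part (c): area to the right of z = 1.25
between = rides.cdf(4.0) - rides.cdf(2.8)   # part (d): larger area minus smaller area

print(f"less than 2.8 h:     {below_2_8:.2%}")
print(f"at least 3.7 h:      {at_least_3_7:.2%}")
print(f"between 2.8 and 4.0: {between:.2%}")
```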
e) In a random sample of 500
horseback rides at Triple R Ranch,
how many are at least 3.7 hours?
Solution
In part (c), we determined that
10.56% of all horseback rides last
at least 3.7 hours.
Thus, 0.1056 × 500 = 52.8, or
approximately 53, horseback rides
last at least 3.7 hours.
Section 13.7
Linear
Correlation
and
Regression
What You Will Learn
Linear Correlation
Scatter Diagram
Linear Regression
Least Squares Line
Linear Correlation
Linear correlation is used to
determine whether there is a linear
relationship between two quantities
and, if so, how strong the relationship
is.
Linear Correlation Coefficient
The linear correlation coefficient, r, is a
unitless measure that describes the
strength of the linear relationship
between two variables.
If the value is positive, as one variable
increases, the other increases.
If the value is negative, as one
variable increases, the other
decreases.
The variable, r, will always be a value
between –1 and 1 inclusive.
Scatter Diagrams
A visual aid used with correlation is
the scatter diagram, a plot of points
(bivariate data).
The independent variable, x,
generally is a quantity that can be
controlled.
The dependent variable, y, is the
other variable.
Correlation
The value of r is a measure of how far
a set of points varies from a straight
line.
The greater the spread, the weaker
the correlation and the closer the r
value is to 0.
The smaller the spread, the stronger
the correlation and the closer the r
value is to 1 or –1.
Linear Correlation Coefficient
The formula to calculate the correlation
coefficient (r) is as follows.
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$
Example 1: Number of Absences
Versus Number of Defective Parts
Egan Electronics provided the following
daily records about the number of
assembly line workers absent and the
number of defective parts produced for
6 days. Determine the correlation
coefficient between the number of
workers absent and the number of
defective parts produced.
Solution
Here’s the scatter diagram.
Find r.
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$
$$r = \frac{6(387) - (17)(106)}{\sqrt{6(75) - (17)^2}\,\sqrt{6(2202) - (106)^2}}$$
$$r = \frac{2322 - 1802}{\sqrt{6(75) - 289}\,\sqrt{6(2202) - 11{,}236}}$$
$$r = \frac{520}{\sqrt{450 - 289}\,\sqrt{13{,}212 - 11{,}236}}$$
$$r = \frac{520}{\sqrt{161}\,\sqrt{1976}} \approx 0.922$$
Since the maximum possible value for
r is 1.00, a correlation coefficient of
0.922 is a strong, positive correlation.
This result implies that, generally, the
more assembly line workers absent,
the more defective parts produced.
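The arithmetic for r can be checked from the sums used in the solution. In this sketch the function and parameter names are illustrative. The value 2202 for the sum of y² is the one consistent with the 13,212 that appears in the worked solution (6 × 2202 = 13,212).

```python
from math import sqrt

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Linear correlation coefficient r computed from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (sqrt(n * sum_x2 - sum_x ** 2)
                   * sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums from Example 1 (6 days of records).
r = correlation_from_sums(n=6, sum_x=17, sum_y=106,
                          sum_xy=387, sum_x2=75, sum_y2=2202)
print(round(r, 3))  # 0.922
```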
Linear Regression
Linear regression is the process of
determining the linear relationship
between two variables.
The Line of Best Fit
The line of best fit (regression line
or the least squares line) is the line
such that the sum of the squares of
the vertical distances from the line to
the data points (on a scatter diagram)
is a minimum.
The equation of the line of best fit is
y = mx + b,
where
$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
and
$$b = \frac{\sum y - m\left(\sum x\right)}{n}$$
Example 3: The Line of Best Fit
a) Use the data in Example 1 to find
the equation of the line of best fit that
relates the number of workers absent
on an assembly line and the number of
defective parts produced.
b) Graph the equation of the line of
best fit on a scatter diagram that
illustrates the set of bivariate points.
Solution
From Example 1, we know that
$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{520}{161} \approx 3.23$$
Now, find the y-intercept, b.
$$b = \frac{\sum y - m\left(\sum x\right)}{n} = \frac{106 - 3.23(17)}{6} \approx \frac{51.09}{6} \approx 8.52$$
The equation of the line of best fit is
y = mx + b
y = 3.23x + 8.52
To graph y = 3.23x + 8.52, plot at
least two points and draw the graph.
x    y
2    14.98
4    21.44
6    27.90
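The slope, the intercept, and the three plotted points can be reproduced from the sums in Example 1. This is a sketch with illustrative names. The slope and intercept are rounded to two decimal places before the points are computed, as in the worked solution.

```python
def best_fit_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope m and y-intercept b of the least squares line, from summary sums."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Sums from Example 1 (6 days of records).
m, b = best_fit_line(n=6, sum_x=17, sum_y=106, sum_xy=387, sum_x2=75)
m_rounded, b_rounded = round(m, 2), round(b, 2)
print(f"y = {m_rounded}x + {b_rounded}")

# Points used to graph the line.
points = [(x, round(m_rounded * x + b_rounded, 2)) for x in (2, 4, 6)]
print(points)
```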