Download chapter 13 - El Camino College

What You Will Learn
• Sampling Techniques
• Random Sampling
• Systematic Sampling
• Cluster Sampling
• Stratified Sampling
• Convenience Sampling
Copyright 2013, 2010, 2007, Pearson Education, Inc.
Statistics
Statistics is the art and science of
gathering, analyzing, and making inferences
(predictions) from numerical information,
or data, obtained in an experiment.
Statistics
Statistics is divided into two main branches.
• Descriptive statistics is concerned with the
collection, organization, and analysis of data.
• Inferential statistics is concerned with making
generalizations or predictions from the data
collected.
Statisticians
A statistician’s interest lies in drawing conclusions
about possible outcomes through observations of
only a few particular events.
The population consists of all items or people of
interest.
The sample includes some of the items in the
population.
Statisticians
When a statistician draws a conclusion from a
sample, there is always the possibility that the
conclusion is incorrect.
Types of Sampling
• Random sampling (each item has an equal probability of being selected)
• Systematic sampling (for example, selecting every 8th item)
• Cluster sampling (area sampling)
• Stratified sampling (divide the population into groups according to
some characteristic, then select a random sample from each group)
• Convenience sampling (uses data that are easily obtained
and can be extremely biased)
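The sampling techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 40 numbered items, the four groups of 10, and all function names are hypothetical and do not come from the slides.

```python
import random

def random_sample(population, k, rng):
    """Random sample: every item is equally likely to be chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Systematic sample: every `step`-th item, beginning at index `start`."""
    return population[start::step]

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sample: choose whole groups at random and keep every item in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

def stratified_sample(strata, k_per_stratum, rng):
    """Stratified sample: draw a random sample from each group (stratum)."""
    return [item for stratum in strata for item in rng.sample(stratum, k_per_stratum)]

def convenience_sample(population, k):
    """Convenience sample: whatever is easiest to reach (here, the first k items)."""
    return population[:k]

rng = random.Random(13)            # fixed seed so a run can be repeated
population = list(range(1, 41))    # hypothetical items numbered 1..40
groups = [population[i:i + 10] for i in range(0, 40, 10)]  # four groups of 10

print(random_sample(population, 5, rng))
print(systematic_sample(population, 8, start=7))   # items 8, 16, 24, 32, 40
print(cluster_sample(groups, 2, rng))
print(stratified_sample(groups, 2, rng))
print(convenience_sample(population, 5))
```

The same groups serve as clusters and as strata here purely to keep the sketch short; what differs is whether whole groups are kept or a few items are drawn from every group.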
Example 1: Identifying Sampling
Techniques
Identify the sampling technique used to obtain a
sample in the following. Explain your answer.
a) Every 20th soup can coming off an assembly
line is checked for defects.
Systematic Sampling
Example 1: Identifying Sampling
Techniques
b) A $50 gift certificate is given away at the Annual
Bankers Convention. Tickets are placed in a bin,
and the tickets are mixed up. Then the winning
ticket is selected by a blindfolded person.
Random Sampling
Example 1: Identifying Sampling
Techniques
c) Children in a large city are classified based on
the neighborhood school they attend. A random
sample of five schools is selected. All the children
from each selected school are included in the
sample.
Cluster Sampling
Example 1: Identifying Sampling
Techniques
d) The first 50 people entering a zoo are asked if
they support an increase in taxes to support a zoo
expansion.
Convenience Sampling
Example 1: Identifying Sampling
Techniques
e) Viewers of the USA Network are classified
according to age. Random samples from each age
group are selected.
Stratified Sampling
What You Will Learn
• Misuses of Statistics
• What is Not Said
• Vague or Ambiguous Words
• Draw Irrelevant Conclusions
• Charts and Graphs
Misuses of Statistics
Many individuals, businesses, and
advertising firms misuse statistics to their
own advantage.
Misuses of Statistics
When examining statistical information, consider
the following:
• Was the sample used to gather the statistical
data unbiased and of sufficient size?
• Is the statistical statement ambiguous; that is, could it
be interpreted in more than one way?
What is Not Said
“Four out of five dentists recommend sugarless gum for
their patients who chew gum.”
• The advertisement does not state the sample size or the
number of times the experiment was performed to obtain
the desired results.
• The advertisement does not mention that possibly only 1
out of 100 dentists recommended gum at all.
Vague or Ambiguous Words
Vague or ambiguous words also lead to statistical
misuses or misinterpretations.
The word average is one such culprit. There are at
least four different “averages,” some of which are
discussed in Section 13.4.
Vague or Ambiguous Words
During contract negotiations, it is not uncommon
for an employer to state publicly that the average
salary of its employees is $45,000, whereas the
employees’ union states that the average is
$40,000.
Who is lying?
Actually, both sides may be telling the truth. Each
side will use the average that best suits its needs
to present its case.
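A small sketch of how both sides can be truthful. The five salaries below are hypothetical, chosen only so that one kind of average (the mean) is $45,000 while another (the median) is $40,000; they are not data from the slides.

```python
import statistics

# Hypothetical salaries, made up for illustration only.
salaries = [30_000, 35_000, 40_000, 40_000, 80_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the one large salary
median_salary = statistics.median(salaries)  # middle value of the ranked data

print("mean:  ", mean_salary)
print("median:", median_salary)
```

Here the employer could quote the mean and the union the median, and each figure would be a correct "average" of the same data.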
Vague or Ambiguous Words
Another vague word is largest.
For example, ABC claims that it is the largest
department store in the United States.
Does that mean largest profit, largest sales,
largest building, largest staff, largest
acreage, or largest number of outlets?
Draw Irrelevant Conclusions
Still another deceptive technique used in
advertising is to state a claim from which the public
may draw irrelevant conclusions.
Draw Irrelevant Conclusions
For example, a disinfectant manufacturer claims that its
product killed 40,760 germs in a laboratory in 5 seconds.
“To prevent colds, use disinfectant A.”
It may well be that the germs killed in the laboratory were
not related to any type of cold germ.
Charts and Graphs
Charts and graphs can also be misleading.
Even though the data is displayed correctly,
adjusting the vertical scale of a graph can
give a different impression.
Charts and Graphs
While each graph presents identical
information, the vertical scales have
been altered.
Charts and Graphs
The graph in part (a) appears to show a
greater increase than the graph in part (b),
again because of a different scale.
Charts and Graphs
Consider a claim that if you invest $1, by next
year you will have $2. This type of claim is
sometimes misrepresented. Actually, your
investment has only doubled, but the area of the
square on the right is four times that of the
square on the left.
Charts and Graphs
By expressing the amounts as cubes, you
increase the volume eightfold.
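The distortion is a matter of simple arithmetic. A minimal sketch, assuming each side of the pictured figure is drawn twice as long to represent the doubled amount (the variable names are illustrative):

```python
scale = 2                      # the amount doubles, so each side is drawn twice as long

area_factor = scale ** 2       # squares: area grows by the square of the scale
volume_factor = scale ** 3     # cubes: volume grows by the cube of the scale

print(area_factor, volume_factor)   # 4 8
```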
Charts and Graphs
A circle graph can be misleading if the
sum of the parts of the graph does
not add up to 100%.
Charts and Graphs
Wrong pie chart.
Sum of percents = 183%
Charts and Graphs
Despite the examples presented in this section,
you should not be left with the impression that
statistics is used solely for the purpose of
misleading or cheating the consumer.
There are many important and necessary uses of
statistics.
Most statistical reports are accurate and useful.
You should realize, however, the importance of
being an aware consumer.
What You Will Learn
• Frequency Distributions
• Histograms
• Frequency Polygons
• Stem-and-Leaf Displays
• Circle Graphs
Frequency Distribution
A piece of data is a single response to an
experiment.
A frequency distribution is a listing of observed
values and the corresponding frequency of
occurrence of each value, usually presented as a table.
Example 1: Frequency Distribution
The number of children per family is recorded for 64
families surveyed. Construct a frequency distribution of
the following data:
Example 1: Frequency Distribution
Number of children      Number of families
(observed values)       (frequency)
0                        8
1                       11
2                       18
3                       11
4                        6
5                        4
6                        2
7                        1
8                        2
9                        1
Total                   64
Rules for Data Grouped by
Classes
A more general frequency distribution groups the data into classes:
1. The classes should be of the same “width.”
2. The classes should not overlap.
3. Each piece of data should belong to only one class.
It is often suggested that there be 5 to 12 classes.
Definitions
Classes
 0 – 4
 5 – 9
10 – 14
15 – 19
20 – 24
25 – 29
Lower class limits: 0, 5, 10, 15, 20, 25
Upper class limits: 4, 9, 14, 19, 24, 29
Midpoint of a class is found by adding the
lower and upper class limits and dividing
the sum by 2.
Example 3: A Frequency
Distribution of Family Income
The following set of data represents the family income (in
thousands of dollars, rounded to the nearest hundred) of
15 randomly selected families.
46.5   31.8   45.8   44.7   40.9
65.2   52.4   44.6   53.7   48.8
35.5   40.3   39.8   56.3   50.7
Example 3: A Frequency
Distribution of Family Income
Construct a frequency distribution with
a first class of 31.5–37.6.
Solution:
First sort the data
(from smallest to
largest)
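The slide stops at the sorting step, so the following is only a sketch of how the grouping could continue. It assumes a class width of 6.2 (the data are given to the nearest tenth, so a class ending at 37.6 is followed by one starting at 37.7) and six classes, enough to reach the largest value; the constant names are illustrative.

```python
# Family incomes from Example 3, in thousands of dollars.
incomes = [46.5, 31.8, 45.8, 44.7, 40.9,
           65.2, 52.4, 44.6, 53.7, 48.8,
           35.5, 40.3, 39.8, 56.3, 50.7]

# Work in tenths (integers) to avoid floating-point trouble at class boundaries.
FIRST_LOWER = 315   # lower limit of the first class, 31.5
WIDTH = 62          # assumed class width, 6.2
NUM_CLASSES = 6     # assumed number of classes, enough to include 65.2

counts = [0] * NUM_CLASSES
for value in sorted(incomes):                       # sort first, as the solution says
    index = (round(value * 10) - FIRST_LOWER) // WIDTH
    counts[index] += 1

for i, count in enumerate(counts):
    lower = (FIRST_LOWER + i * WIDTH) / 10
    upper = (FIRST_LOWER + (i + 1) * WIDTH - 1) / 10
    print(f"{lower:.1f}-{upper:.1f}: {count}")
```

Each income lands in exactly one class, the classes do not overlap, and all classes have the same width, which matches the rules for grouped data.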
Histograms
A histogram is a graph with observed
values on its horizontal scale and
frequencies on its vertical scale.
Example 4: Construct a Histogram
The frequency distribution developed
in Example 1 is shown on the next
slide. Construct a histogram of this
frequency distribution.
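As a rough stand-in for the drawn histogram, the Example 1 frequency distribution (number of children per family) can be printed as text bars. This is a sketch only: the bars run sideways, so the observed values appear down the left rather than along a horizontal scale, and the function name is hypothetical.

```python
# Frequency distribution from Example 1: number of children -> number of families.
frequency = {0: 8, 1: 11, 2: 18, 3: 11, 4: 6, 5: 4, 6: 2, 7: 1, 8: 2, 9: 1}

def text_histogram(freq):
    """Return one line per observed value, with a bar as long as its frequency."""
    return [f"{value} | {'#' * count} ({count})" for value, count in sorted(freq.items())]

for line in text_histogram(frequency):
    print(line)
```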
Frequency Polygon
Frequency polygons are line graphs
with scales the same as those of the
histogram; that is, the horizontal
scale indicates observed values and
the vertical scale indicates frequency.
Example 5: Construct a Frequency Polygon
Construct a frequency polygon of the
frequency distribution in Example 1,
found on the next slide.
Comment: two points need to be added, one on the left
and one on the right; both lie on the horizontal axis.
Stem-and-Leaf Display
A stem-and-leaf display is a tool
that organizes and groups the data
while allowing us to see the actual
values that make up the data.
Example 8: Constructing a Stem-and-Leaf Display
The table below indicates the ages of a sample
of 20 guests who stayed at Captain Fairfield Inn
Bed and Breakfast. Construct a stem-and-leaf
display.
29   31   39   43   56
60   62   59   58   32
47   27   50   28   71
72   44   45   44   68
Example 8: Constructing a Stem-and-Leaf Display
Solution
Stem | Leaves
2    | 9 7 8
3    | 1 9 2
4    | 3 7 4 5 4
5    | 6 9 8 0
6    | 0 2 8
7    | 1 2
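The display in this solution can be reproduced with a short Python sketch. It assumes the 20 ages are read row by row from a table of four rows and five columns (a reconstruction of the slide layout, chosen because it yields the leaf order shown); the function name is hypothetical.

```python
from collections import defaultdict

# Ages from Example 8, assumed to be read row by row.
ages = [29, 31, 39, 43, 56,
        60, 62, 59, 58, 32,
        47, 27, 50, 28, 71,
        72, 44, 45, 44, 68]

def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(sorted(display.items()))

for stem, leaves in stem_and_leaf(ages).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The leaves are kept in the order the data were read, not sorted, so every original value can still be recovered from the display.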
Example 9: Circus Performances
Eight hundred people who attended a Ringling
Bros. and Barnum & Bailey Circus were asked to
indicate their favorite performance. The circle
graph shows the percentage of respondents that
answered tigers, elephants, acrobats, jugglers,
and other. Determine the number of
respondents for each category.
Example 9: Circus Performances
Solution
Answers:
304: tigers
208: elephants
136: acrobats
112: jugglers
40: other performance
What You Will Learn
1. Averages: mean, median, mode, and midrange
2. Measures of position: percentiles and quartiles
Measures of Central Tendency
An average is a number that is representative of a
group of data. There are at least 4 different
averages:
• Mean
• Median
• Mode
• Midrange
Measures of Central Tendency
Each will result in a number near the center
of the data; therefore, averages are referred
to as measures of central tendency.
Mean (or Arithmetic Mean)
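The mean, \(\bar{x}\), is the sum of the data divided by the number of pieces of data:
\[ \bar{x} = \frac{\sum x}{n} \]
where \(\sum x\) is the sum of all the data and \(n\) is the number of pieces of data.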
Example 1: Determine the Mean
Determine the mean age of a group of
patients at a doctor’s office if the ages of
the individuals are 28, 19, 49, 35, and 49.
\(\bar{x} = \frac{28 + 19 + 49 + 35 + 49}{5} = \frac{180}{5} = 36\)
Median
The median is the value in the middle
of a set of ranked data.
Example 2: Determine the Median
Determine the median age of a group of
patients at a doctor’s office if the ages of
the individuals are 28, 19, 49, 35, and 49.
Median = 35
Comment: odd number of pieces of data.
Example 3: Determine the
Median of an Even Number of
Pieces of Data
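Determine the median of the following set
of data.
a) 9, 14, 16, 17, 11, 16, 11, 12
Ranked: 9, 11, 11, 12, 14, 16, 16, 17. The median is halfway
between the two middle values: (12 + 14)/2 = 13.
Median = 13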
Mode
The mode is the piece of data that
occurs most frequently.
Example 4: Determine the Mode
Determine the mode age of a group of
patients at a doctor’s office if the ages of
the individuals are 28, 19, 49, 35, and 49.
Mode = 49
Midrange
The midrange is the value halfway
between the lowest (L) and highest (H)
values in a set of data.
\[ \text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2} \]
Example 5: Determine the
Midrange
Determine the midrange age of a group of
patients at a doctor’s office if the ages of
the individuals are 28, 19, 49, 35, and 49.
Midrange = 34
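The four averages found in Examples 1, 2, 4, and 5 can be checked together. A minimal sketch using Python's standard `statistics` module; the midrange has no built-in function, so it is computed directly from its definition.

```python
import statistics

ages = [28, 19, 49, 35, 49]   # patient ages from the examples

mean = statistics.mean(ages)               # sum of the data / number of pieces of data
median = statistics.median(ages)           # middle value of the ranked data
mode = statistics.mode(ages)               # value that occurs most frequently
midrange = (min(ages) + max(ages)) / 2     # halfway between lowest and highest values

print(mean, median, mode, midrange)
```

The four results differ (36, 35, 49, and 34) even though each one is an "average" of the same five ages.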
Measures of Position
Measures of position are often used to
make comparisons.
Two measures of position are
percentiles and quartiles.
Percentiles
There are 99 percentiles dividing a set
of data into 100 equal parts.
Percentiles
A score in the nth percentile means that
you out-performed about n% of the
population who took the test and that (100
– n)% of the people taking the test
performed better than you did.
Quartiles
Quartiles divide data into four equal parts:
To Determine the Quartiles of a
Set of Data
1. Order the data from smallest to largest.
2. Q2 = the median. Q2 divides the ranked data
into a lower half and an upper half.
3. Q1 = the median of the lower half of the
data.
4. Q3 = the median of the upper half of the
data.
Example 8: Finding Quartiles
Electronics World is concerned about the high
turnover of its sales staff. A survey was done to
determine how long (in months) the sales staff
had been in their current positions. The responses
of 27 sales staff follow. Determine Q1, Q2, and Q3.
Example 8: Finding Quartiles
25  3  7 15 31 36 17 21  2
11 42 16 23 16 21  9 20  5
 8 12 27 14 39 24 18  6 10
Q2 = 16, Q1 = 9, Q3 = 24
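The quartile procedure can be sketched as follows. One assumption is made explicit: when the number of data values is odd, the median itself is left out of both halves, which is the reading that reproduces the answers given for this example. The function name is hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ranked = sorted(data)                  # step 1: order the data
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                  # values below the middle position
    upper = ranked[half + n % 2:]          # values above it (skips the median if n is odd)
    return statistics.median(lower), statistics.median(ranked), statistics.median(upper)

# Months in current position for the 27 sales staff in Example 8.
months = [25, 3, 7, 15, 31, 36, 17, 21, 2,
          11, 42, 16, 23, 16, 21, 9, 20, 5,
          8, 12, 27, 14, 39, 24, 18, 6, 10]

print(quartiles(months))
```

Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles, so the helper above is written out by hand to follow the steps as stated.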
What You Will Learn
• Range
• Standard Deviation
Measures of Dispersion
Range and standard deviation are
measures of dispersion.
Measures of dispersion are used to
indicate the spread of the data.
Range
The range is the difference between
the highest and lowest values; it
indicates the total spread of the data.
Range = highest value – lowest value
Example 1: Determine the Range
The amount of caffeine, in milligrams, of 10
different soft drinks is given below.
Determine the range of these data.
38, 43, 26, 80, 55, 34, 40, 30, 35, 43
Range = 54 milligrams
Standard Deviation
The standard deviation measures how
much the data differ from the mean. It is
symbolized with s when it is calculated for a
sample, and with σ (Greek letter sigma)
when it is calculated for a population.
Standard Deviation
The standard deviation, s, of a set of
data can be calculated using the
following formula.
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
To Find the Standard Deviation of
a Set of Data
1. Find the mean of the set of data.
2. Make a chart having three columns (for Steps 3–6, see the book):
   Data | Data – Mean | (Data – Mean)²
7. Divide the sum obtained in Step 6 by n – 1, where n is
the number of pieces of data.
8. Determine the square root of the number obtained in
Step 7. This number is the standard deviation of the set
of data.
Example 3: Determine the
Standard Deviation of Stock Prices
The following are the prices of nine stocks on the
New York Stock Exchange. Determine the
standard deviation of the prices.
$17, $28, $32, $36, $50, $52, $66, $74, $104
Example 3: Determine the
Standard Deviation of Stock Prices
Solution:
\(\bar{x} = \frac{459}{9} = 51\)
See Excel.
Example 3: Determine the
Standard Deviation of Stock Prices
Solution
Use the formula
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{5836}{9 - 1}} = \sqrt{729.5} \approx 27.01 \]
The standard deviation, to the nearest
hundredth, is $27.01.
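The same calculation can be followed step by step in Python. A minimal sketch: the list comprehension plays the role of the (Data – Mean)² column, and `statistics.stdev` is included only as a cross-check, since it also divides by n – 1.

```python
import math
import statistics

prices = [17, 28, 32, 36, 50, 52, 66, 74, 104]   # stock prices in dollars

mean = sum(prices) / len(prices)                         # 459 / 9
squared_deviations = [(x - mean) ** 2 for x in prices]   # (Data - Mean)^2 column
total = sum(squared_deviations)                          # sum of the squared deviations
s = math.sqrt(total / (len(prices) - 1))                 # divide by n - 1, then take the root

print(mean, total, round(s, 2))
print(round(statistics.stdev(prices), 2))                # library cross-check
```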
Example 3: Determine the
Standard Deviation of Stock Prices
Another method: using TI-84
s ≈ 27.01
What You Will Learn
• Rectangular Distribution
• J-shaped Distribution
• Bimodal Distribution
• Skewed Distribution
• Normal Distribution
• z-Scores
Rectangular Distribution
All the observed values occur with the
same frequency.
J-shaped Distribution
The frequency is either constantly
increasing or constantly decreasing.
Bimodal Distribution
Two nonadjacent values occur more
frequently than any other values in a
set of data.
Skewed Distribution
Has more of a “tail” on one side than
the other.
Skewed Distribution
Smoothing the histograms of the
skewed distributions to form curves.
Skewed Distribution
The relationship between the mean,
median, and mode for curves that are
skewed to the right and left.
Normal Distribution
The most important distribution is the
normal distribution.
Properties of a Normal Distribution
• The graph of a normal distribution is called the
normal curve.
• The normal curve is bell shaped and symmetric
about the mean.
• In a normal distribution, the mean, median,
and mode all have the same value and all occur
at the center of the distribution.
Empirical Rule
Approximately 68% of all the data lie within one standard
deviation of the mean (in both directions).
Approximately 95% of all the data lie within two standard
deviations of the mean (in both directions).
Approximately 99.7% of all the data lie within three
standard deviations of the mean (in both directions).
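The three percentages in the empirical rule can be checked against the normal curve itself. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```

The computed shares carry more digits than the rule of thumb, which is why the rule is stated with "approximately".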
z-Scores
z-scores (or standard scores)
determine how far, in terms of
standard deviations, a given score is
from the mean of the distribution.
z-Scores
The formula for finding z-scores (or
standard scores) is
\[ z = \frac{\text{value of piece of data} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} \]
Example 2: Finding z-scores
A normal distribution has a mean of 80
and a standard deviation of 10.
Find z-scores for the following values.
a) 90: \(z_{90} = 1\)
b) 95: \(z_{95} = 1.5\)
c) 80: \(z_{80} = 0\)
d) 64: \(z_{64} = -1.6\)
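The four z-scores follow directly from the formula. A minimal sketch; the function and parameter names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

for value in (90, 95, 80, 64):
    print(value, z_score(value, mean=80, std_dev=10))
```

A value equal to the mean has a z-score of 0, and a value below the mean, such as 64, has a negative z-score.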
To Determine the Percent of Data
Between any Two Values
Look up the percent that corresponds to each
z-score in Table 13.7.
To Determine the Percent of Data
Between any Two Values
a) For a negative z-score, use Table 13.7(a).
To Determine the Percent of Data
Between any Two Values
b) For a positive z-score, use Table 13.7(b).
To Determine the Percent of Data
Between any Two Values
c) When finding the percent of data to the right of
a z-score
(1) Use complement:
area to the right of z-score
= 1 – area to the left of z-score
(2) Use symmetry:
area to the right of z-score
= area to the left of the negative z-score
To Determine the Percent of Data
Between any Two Values
d) When finding the percent of data
between two z-scores, subtract the
smaller percent from the larger
percent.
To Determine the Percent of Data
Between any Two Values
4. Change the areas you found in Step
3 to percents.
Example 5: Horseback Rides
Assume that the length of time for a horseback ride on the
trail at Triple R Ranch is normally distributed with a mean
of 3.2 hours and a standard deviation of 0.4 hour.
a) What percent of horseback rides last at least 3.2 hours? 50%
b) What percent of horseback rides last less than 2.8 hours? 15.87%
c) What percent of horseback rides are at least 3.7 hours? 10.56%
d) What percent of horseback rides are between 2.8 hours
and 4.0 hours? 81.85% (97.72% − 15.87%)
Example 5: Horseback Rides
e) In a random sample of 500 horseback rides at
Triple R Ranch, how many are at least 3.7
hours?
0.1056 × 500 = 52.8, so approximately 53 horseback rides last
at least 3.7 hours.
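Example 5 can also be worked without a z-table. A minimal sketch using `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value; the variable names are illustrative. Because the table rounds each area to four decimal places, a directly computed percent can differ from a table-based one in the last digit.

```python
from statistics import NormalDist

rides = NormalDist(mu=3.2, sigma=0.4)         # ride length in hours

at_least_3_2 = 1 - rides.cdf(3.2)             # a) complement of the area to the left
less_than_2_8 = rides.cdf(2.8)                # b) area to the left of z = -1
at_least_3_7 = 1 - rides.cdf(3.7)             # c) area to the right of z = 1.25
between = rides.cdf(4.0) - rides.cdf(2.8)     # d) larger area minus smaller area
expected_of_500 = 500 * at_least_3_7          # e) expected count among 500 rides

for label, p in [("a", at_least_3_2), ("b", less_than_2_8),
                 ("c", at_least_3_7), ("d", between)]:
    print(label, f"{p:.2%}")
print("e", round(expected_of_500))
```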