Chapter Twelve: Discrete Mathematics: Statistics
Section One: Measures of Central Tendency
We can describe a set of numbers by using a center measure. There are three different types of these
measures of central tendency.
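Mean – The arithmetic average. Add all the numbers and divide by the number of numbers. We denote
the mean with $\bar{x}$.
Median – The middle number. Place the numbers in order and find the middle number. If there is an even number of values, average the two middle numbers.
Mode – The number that occurs the most. A set can have more than one mode, or no mode at all.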
EX1: Find the measures of central tendency of the following set of numbers: 25, 33, 26, 30, 28, 28, 29.
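As a quick check on hand calculations, the three measures can be computed with Python's standard `statistics` module. This is only a sketch using the EX1 numbers; the variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [25, 33, 26, 30, 28, 28, 29]  # the EX1 data set

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # middle value of the sorted list
data_modes = multimode(data)  # list of the most frequent value(s)

print(f"mean   = {data_mean:.2f}")
print(f"median = {data_median}")
print(f"mode   = {data_modes}")
```

`multimode` returns a list because a data set may have more than one mode.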
EX2: Last month, 6 houses were sold in Centerville. The sale prices were as follows: $140,000, $165,000,
$150,000, $300,000, $146,000, and $125,000. Find the mean, median, and mode of the sale prices.
Explain which measure best represents the typical sale price of a house in Centerville last month.
Sometimes we need a way to keep track of the occurrences of our data. A frequency table is a good way
to do this.
EX3: Complete the frequency table for the data below on the number of
books read by 30 students last month. Then use the frequency table to
find the mean.
4, 0, 1, 2, 6, 0, 1, 2, 0, 4, 0, 2, 2, 1, 1, 3, 1, 0, 3, 2, 0, 1, 2, 2, 4, 1, 1, 1, 0, 4
Books   Tally   Frequency
0
1
2
3
4
5
6
Total
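One possible way to check a completed frequency table is to let Python do the tallying. The sketch below uses `collections.Counter` on the EX3 data and then finds the mean from the table by multiplying each value by its frequency; the variable names are illustrative.

```python
from collections import Counter

books = [4, 0, 1, 2, 6, 0, 1, 2, 0, 4, 0, 2, 2, 1, 1,
         3, 1, 0, 3, 2, 0, 1, 2, 2, 4, 1, 1, 1, 0, 4]

freq = Counter(books)  # maps each value to how many times it occurs

print("Books  Frequency")
for value in range(0, 7):
    print(f"{value:<5}  {freq[value]}")  # a Counter gives 0 for values that never occur

total = sum(freq.values())                          # number of students
weighted_sum = sum(v * f for v, f in freq.items())  # total number of books read
freq_mean = weighted_sum / total

print(f"Total  {total}")
print(f"mean = {weighted_sum} / {total} = {freq_mean}")
```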
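When there are many different values we can use a grouped frequency
table. In a grouped frequency table, the values are grouped into classes that
contain an equal range of values. Because the individual values are not listed, we can only estimate the mean when
looking at a grouped frequency table. We do so by using the midpoint of each class to represent every value in that
class: multiply each midpoint by its class frequency, add the products, and divide by the total frequency.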
EX4: The group frequency table lists the numbers of CDs bought by 50
students last year. Estimate the mean number of CDs bought by these
students last year.
# of CDs   Frequency
0          4
1–5        14
6–10       9
11–15      10
16–20      8
21–25      1
26–30      2
31–35      1
36–40      1
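The estimate for a grouped table can be sketched in a few lines of Python. The code below assumes the usual midpoint method (each class is represented by the average of its lower and upper limits) and uses the EX4 classes; the tuple layout is just one convenient choice.

```python
# (low, high, frequency) for each class in the EX4 table
classes = [
    (0, 0, 4), (1, 5, 14), (6, 10, 9), (11, 15, 10), (16, 20, 8),
    (21, 25, 1), (26, 30, 2), (31, 35, 1), (36, 40, 1),
]

total = sum(f for _, _, f in classes)

# Use each class midpoint as a stand-in for every value in that class.
weighted_sum = sum((low + high) / 2 * f for low, high, f in classes)
estimated_mean = weighted_sum / total

print(f"estimated mean = {weighted_sum} / {total} = {estimated_mean:.2f}")
```

The result is only an estimate, since the actual values inside each class are unknown.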
Section Two: Stem-and-Leaf Plots, Histograms, and Circle Graphs
A stem-and-leaf plot is a way to arrange a set of data. We can see how spread out the data is by looking
at the plot. The stems and leaves can be arranged many different ways: tens and ones, whole and
decimal, etc. Each stem-and-leaf plot should have a key explaining the nature of the stems and leaves.
Below is an example of a set of data and its stem-and-leaf plot.
32 32 34 43 38
12 17 53 22 51
27 34 43 44 44
8 30 30 31 40
34 37 38 38 78
50 26 54 28 29

Key: 1 | 2 = 12
Stem | Leaf
0    | 8
1    | 2, 7
2    | 2, 6, 7, 8, 9
3    | 0, 0, 1, 2, 2, 4, 4, 4, 7, 8, 8, 8
4    | 0, 3, 3, 4, 4
5    | 0, 1, 3, 4
6    |
7    | 8
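The example plot can be reproduced with a short Python sketch. It assumes tens digits as stems and ones digits as leaves, matching the key 1 | 2 = 12; the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with tens as stems and ones as leaves."""
    # Create every stem from the smallest to the largest so empty stems still appear.
    stems = {s: [] for s in range(min(values) // 10, max(values) // 10 + 1)}
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems

data = [32, 32, 34, 43, 38, 12, 17, 53, 22, 51, 27, 34, 43, 44, 44,
        8, 30, 30, 31, 40, 34, 37, 38, 38, 78, 50, 26, 54, 28, 29]

plot = stem_and_leaf(data)
print("Key: 1 | 2 = 12")
for stem, leaves in plot.items():
    print(f"{stem} | {', '.join(map(str, leaves))}")
```

Keeping the empty stem 6 in the output makes the gap between the 50s and the single value 78 easy to see.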
EX1: A bakery collected the following data about the number of loaves
of fresh bread sold on each of 24 business days. Make a stem-and-leaf
plot of the data. Find the median and mode(s) of the data. How can the
owner use the stem-and-leaf plot to make plans for making bread?
53 49 27 48 60 52
44 38 47 52 82 46
55 31 39 54 51 47
50 45 50 61 43 64
A histogram is a bar graph that gives the frequency of data falling in different ranges. The horizontal axis is
divided into equal ranges and the height of each bar represents that range’s frequency.
EX2: Make a frequency table and a histogram of the following set of numbers:
0.2, 1.3, 5.4, 4.3, 2.2, 4.3, 4.6, 3.5, 5.1, 4.8, 1.5, 3.7, 5.4, 4.0, 4.2, 5.2
A relative frequency table shows how often a category occurs relative to the entire set of data. This
could be as a percentage or a decimal. We can take this data and create a relative frequency histogram.
EX3: Make a relative frequency table and a relative frequency histogram of the following data.
0.2, 1.3, 5.4, 4.3, 2.2, 4.3, 4.6, 3.5, 5.1, 4.8, 1.5, 3.7, 5.4, 4.0, 4.2, 5.2
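A frequency table, a relative frequency table, and a rough text histogram for this data can be sketched in Python. The class width of 1 (0 to 0.9, 1 to 1.9, and so on) is an assumption made for illustration; the exercise leaves the choice of classes open.

```python
from collections import Counter

data = [0.2, 1.3, 5.4, 4.3, 2.2, 4.3, 4.6, 3.5,
        5.1, 4.8, 1.5, 3.7, 5.4, 4.0, 4.2, 5.2]

# Assumed classes of width 1: 0-0.9, 1-1.9, ..., 5-5.9.
# int(x) drops the decimal part, which picks the class for a positive value.
counts = Counter(int(x) for x in data)
n = len(data)

# Relative frequency = class frequency divided by the number of data values.
rel_freq = {low: counts[low] / n for low in range(0, 6)}

print("Class     Freq  Rel. freq  Histogram")
for low in range(0, 6):
    f = counts[low]
    print(f"{low}.0-{low}.9   {f:<4}  {rel_freq[low]:<9.4f}  {'#' * f}")
```

The relative frequencies always add up to 1, which is a useful check on the table.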
A circle graph is another way of showing relative frequency. Each segment of the circle represents a
percentage of the whole set of data. Keep in mind that there is a total of 360 degrees in a circle and
each percentage represents a part of the 360.
EX4: The table shows the distribution by region of the resident
population of the United States in 1996. Make a circle graph of the data.
Find the probability that a randomly chosen resident of the United
States in 1996 was not a resident of the South.
Region      Population (millions)
Northeast   51.6
Midwest     62.1
South       93.1
West        58.5
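The sector angles for a circle graph come from multiplying each relative frequency by 360 degrees. A minimal Python sketch using the EX4 table follows; it also finds the "not South" probability as a complement. The variable names are illustrative.

```python
population = {"Northeast": 51.6, "Midwest": 62.1, "South": 93.1, "West": 58.5}
total = sum(population.values())

angles = {}
for region, count in population.items():
    share = count / total          # relative frequency of the region
    angles[region] = share * 360   # central angle of its sector, in degrees
    print(f"{region:<10} {share:6.1%}  {angles[region]:6.1f} degrees")

# "Not the South" is the complement of "South".
p_not_south = 1 - population["South"] / total
print(f"P(not South) = {p_not_south:.3f}")
```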
Section Three: Box-and-Whisker Plots
A box-and-whisker plot is a way to display data to see how it is dispersed. It divides the data into
four groups, separated by the quartiles, that each contain 25% of the data. We know that the median of a set of data
divides the data into two equal groups. If we then find the median of each of these two groups,
we have found the first quartile (lower quartile) and the third quartile (upper quartile). These three
values, along with the minimum and maximum values, make up the five-point summary of the data set
that we use to create a box-and-whisker plot. The distance from the minimum to the maximum
is the range of the data. The distance between the first and third quartiles is known as the
interquartile range (IQR).
Sometimes values are so extreme that they could skew our averages of a set of data. We call these
points outliers because they lie far from the other points of the data. We determine if
a point is an outlier by using the IQR. A value more than 1.5 times the IQR below the first quartile or
more than 1.5 times the IQR above the third quartile is an outlier.
EX1: The data shows the age of 43 Florida Marlins
baseball players at the 1997 World Series. Find the
quartiles, range, and interquartile range for the
data. Create a box-and-whisker plot. Identify any
outliers.
29 25 31 19
34 22 32 34 20
31 35 27 35 22 38 28 25 26
21 22 27 26 21 32 25 25 23
35 32 27 28 35 25 30 22 23
38 25 36 30 26 34 26
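The five-point summary and the outlier test can be sketched in Python as follows. The sketch assumes the median-of-halves method described above, leaving the median itself out of both halves when the number of values is odd; other textbooks and software use slightly different quartile conventions. The function name is illustrative.

```python
from statistics import median

ages = [29, 25, 31, 19, 34, 22, 32, 34, 20,
        31, 35, 27, 35, 22, 38, 28, 25, 26,
        21, 22, 27, 26, 21, 32, 25, 25, 23,
        35, 32, 27, 28, 35, 25, 30, 22, 23,
        38, 25, 36, 30, 26, 34, 26]

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]            # values below the middle position
    upper = s[len(s) - half:]   # values above the middle position
    return s[0], median(lower), median(s), median(upper), s[-1]

low, q1, med, q3, high = five_point_summary(ages)
data_range = high - low
iqr = q3 - q1

# Anything beyond these fences counts as an outlier.
fence_low, fence_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < fence_low or a > fence_high]

print(f"min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}")
print(f"range={data_range}, IQR={iqr}, fences=({fence_low}, {fence_high})")
print(f"outliers: {outliers}")
```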
EX2: The mean monthly temperatures for L.A. and
Chicago are listed in the table. Construct a box-and-whisker plot for each city and compare them.
Monthly Mean Temperatures (°F)
Month   Los Angeles   Chicago
Jan     55.9          21.0
Feb     57.0          25.5
Mar     58.3          37.0
Apr     60.8          48.6
May     63.3          58.8
Jun     66.7          68.5
Jul     70.9          73.0
Aug     71.8          71.6
Sep     70.5          64.4
Oct     66.6          52.7
Nov     62.1          39.9
Dec     57.6          26.6
Section Four: Measures of Dispersion
The range and interquartile range that we studied in previous lessons are two examples of
measures of dispersion. They measure how spread out the data in the set are. However, neither of these
is very reliable because each depends on only two values of the set. A better way of measuring dispersion is
with the mean deviation. It gives the average distance between each number in the set and
the mean of the set.
Mean Deviation = Average of the absolute differences between the mean and each data point
Or
$$\text{mean deviation} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$
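The mean deviation can be written directly as a small Python function. The data set below is hypothetical and chosen only so the arithmetic is easy to follow; the function name is illustrative.

```python
def mean_deviation(values):
    """Average absolute distance between each value and the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

sample = [2, 4, 6, 8]  # hypothetical data; the mean is 5

# The distances from the mean are 3, 1, 1, 3, so their average is 2.
print(mean_deviation(sample))
```

The absolute value matters: without it the positive and negative differences would cancel and the sum would always be zero.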
EX1: The table lists a student’s test scores in two subjects this year. Find the range and the mean
deviation for the test scores. Then describe what these measures indicate about each set of test scores.
History   Math
85        82
91        92
96        100
85        77
93        84
Two more ways of measuring dispersion are variance and standard deviation.
$$\text{variance} = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
$$\text{standard deviation} = \sigma = \sqrt{\sigma^2}$$
EX2: Find the variance and standard deviation for the test scores in EX1.
EX3: In a survey, 30 people were asked to rank a new soda on a scale of
1 to 10. The results are shown in the table. Use a graphing calculator to
find the variance and standard deviation for the set of data.
5, 7, 10, 8, 7, 7, 8, 8, 10, 9,
9, 8, 7, 9, 8, 6, 9, 9, 8, 7,
8, 7, 6, 10, 7, 10, 8, 8, 10, 9
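As an alternative to a graphing calculator, the variance and standard deviation of the survey data can be sketched in Python. The code applies the formula with division by n and then cross-checks against `statistics.pvariance` and `statistics.pstdev`, which also divide by n. (The `statistics.variance` and `statistics.stdev` functions divide by n - 1 instead and would give slightly larger values.)

```python
from math import sqrt
from statistics import pstdev, pvariance

ratings = [5, 7, 10, 8, 7, 7, 8, 8, 10, 9,
           9, 8, 7, 9, 8, 6, 9, 9, 8, 7,
           8, 7, 6, 10, 7, 10, 8, 8, 10, 9]

n = len(ratings)
x_bar = sum(ratings) / n

# Variance: average of the squared differences from the mean (divide by n).
variance = sum((x - x_bar) ** 2 for x in ratings) / n
std_dev = sqrt(variance)

print(f"mean               = {x_bar:.4f}")
print(f"variance           = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")

# Cross-check with the standard library's population versions.
print(f"library check      = {pvariance(ratings):.4f}, {pstdev(ratings):.4f}")
```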
Section Five: Binomial Distribution
A binomial experiment must meet the following criteria:
1. Each trial of the experiment has only two possible outcomes: success or failure. Their probabilities add up to one.
2. The trials are independent, and the probability of success is the same for every trial.
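To find the probability of exactly r successes in n trials of a binomial experiment, where p is the probability of success on a single trial, we can use the following formula:
$$P(r) = \binom{n}{r}\,p^{r}\,(1-p)^{n-r}$$
Here $\binom{n}{r}$ is the number of combinations of n trials taken r at a time, $p^{r}$ is the success probability raised to the number of successes, and $(1-p)^{n-r}$ is the failure probability raised to the number of failures.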
EX1: Suppose that the probability of a VCR that was manufactured in a certain factory being defective is
2%. What is the probability that 2 VCRs are defective in a shipment of 12 VCRs from this factory?
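The binomial calculation can be sketched in Python with `math.comb` for the combinations. The function name `binomial_probability` and its parameters (n trials, r successes, success probability p) are illustrative labels, not notation fixed by the text.

```python
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# EX1: a "success" here means a VCR is defective, so p = 0.02.
p_two_defective = binomial_probability(12, 2, 0.02)
print(f"P(exactly 2 defective out of 12) = {p_two_defective:.4f}")
```

Note that "success" only names the outcome being counted; it does not have to be a desirable outcome.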
We have discussed in previous lessons that when finding the probability of “or” statements about events that cannot
happen at the same time, we can simply add the probabilities of the events together.
EX2: A landscaping plan specifies that 10 trees of a certain type are to be planted in front of a building.
When this type of tree is planted in autumn, the probability that it will survive the winter is 85%. What is
the probability that no fewer than 8 of the 10 trees will survive the winter if planted in the autumn?
EX3: Create a relative frequency histogram of the scenario from EX2 displaying the probability that 0, 1,
2, 3, …, 9, 10 trees will live.
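The full distribution for the tree scenario can be sketched in Python, together with the "no fewer than 8" probability. The rough text histogram uses one `#` for about every 0.02 of probability, which is an arbitrary scale chosen for display.

```python
from math import comb

n, p = 10, 0.85  # 10 trees, each survives the winter with probability 0.85

# P(exactly r trees survive) for r = 0, 1, ..., 10
distribution = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

print("Trees  Probability")
for r, prob in enumerate(distribution):
    print(f"{r:>5}  {prob:.4f}  {'#' * round(prob * 50)}")

# "No fewer than 8" means exactly 8, 9, or 10 survive.
# These outcomes cannot happen together, so their probabilities add.
p_at_least_8 = sum(distribution[8:])
print(f"P(at least 8 survive) = {p_at_least_8:.4f}")
```

Since the eleven outcomes cover every possibility, the probabilities in the distribution add up to 1.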
Section Six: Normal Distributions
In a normal distribution data varies randomly from the
mean, creating a bell-shaped pattern that is symmetric
about the mean when graphed. If we connect the tops of
the bars in the histogram it creates a normal curve. We
define a normal curve with its mean and standard deviation.
Because of its shape, we sometimes call the distribution a
bell curve. Normal distributions represent many real-life
scenarios such as IQ scores, adult heights, and baby birth
weights.
A normal curve with a mean of 0 and a standard deviation of 1 is called a standard normal curve. Since a
normal curve is derived from a relative frequency histogram, we can find different probabilities by
finding the area under the standard normal curve. The total area under the curve is 1 (100% probability).
The table below gives the area under the curve between the mean and a given number of standard deviations (SD) above the mean.
SD     Area
0      0.0000
0.2    0.0793
0.4    0.1554
0.6    0.2257
0.8    0.2881
1.0    0.3413
1.2    0.3849
1.4    0.4192
1.6    0.4452
1.8    0.4641
2.0    0.4772
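The areas in the table can be reproduced with Python's `math.erf`. The sketch below relies on the standard identity that the area under the standard normal curve between 0 and z equals 0.5 · erf(z / √2); the function name `area_from_mean` is illustrative.

```python
from math import erf, sqrt

def area_from_mean(z):
    """Area under the standard normal curve between the mean (0) and z, for z >= 0."""
    return 0.5 * erf(z / sqrt(2))

print("SD    Area")
for i in range(11):
    z = i / 5  # 0.0, 0.2, ..., 2.0
    print(f"{z:.1f}   {area_from_mean(z):.4f}")

# By symmetry, the area within 1 SD on either side of the mean is twice the table value.
within_one_sd = 2 * area_from_mean(1.0)
print(f"P(-1 <= x <= 1) = {within_one_sd:.4f}")
```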
A summary of the properties of normal distributions is shown in the chart below.
EX1: Approximate each probability by using the area table for a standard normal curve.
a. P  x  1.2
b. P  0.4  x  1.8
EX2: On a cereal box, the weight is listed as 12 ounces. Among all boxes of cereal of this type, however,
the actual weights approximate a normal distribution with a mean of 12 ounces and a standard
deviation of 0.1 ounce. What is the probability that the contents of a randomly selected box of cereal of
this type is within 0.1 ounce of 12 ounces? How many boxes of cereal out of 5000 can the manufacturer
expect to be within this range?
A z-score tells how far a value is from the data’s mean in terms of standard deviations. For example,
assume a set of data has a mean of 50 and a standard deviation of 10. A data value of 70 would have a z-score of 2
since it is 2 SDs above the mean. A value of 20 would have a z-score of −3 since it is 3 SDs below the mean. We can find a z-score with the
formula $z = \dfrac{x - \bar{x}}{\sigma}$.
EX3: A manufacturer finds that the life expectancy of its best-selling model of clothes dryer is 11 years
and the standard deviation is 1.5 years. Find the z-scores for 10 years and 12 years. What is the
probability that this model dryer will last from 10 years to 12 years?
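The z-scores and the probability for the dryer example can be sketched in Python. Because these z-scores fall between the entries of the area table, the sketch computes the areas with `math.erf` instead of reading them from the table; the function names are illustrative.

```python
from math import erf, sqrt

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def area_from_mean(z):
    """Area under the standard normal curve between the mean and z (either side)."""
    return 0.5 * erf(abs(z) / sqrt(2))

mean_life, sd_life = 11, 1.5  # years, from EX3

z_low = z_score(10, mean_life, sd_life)
z_high = z_score(12, mean_life, sd_life)

# 10 years is below the mean and 12 years is above it,
# so the area between them is the sum of the two areas measured from the mean.
probability = area_from_mean(z_low) + area_from_mean(z_high)

print(f"z(10 years) = {z_low:.3f}")
print(f"z(12 years) = {z_high:.3f}")
print(f"P(10 <= life <= 12) = {probability:.3f}")
```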