Chapter 16
16.1 – Statistics Organizing Data
16.2 – Measures of Central Tendency
16.3 – Measures of Variation
Stem-and-Leaf Plots
ASTRONAUTS Display the data shown in a stem-and-leaf plot.
Step 1 Find the least and the greatest number.
The least is 54 and the greatest is 77.
Step 2 Draw a vertical line and write the
stems from 5 to 7 to the left of the line.
Step 3 Write the leaves to the right of
the line, with the corresponding stem.
[Figure: stem-and-leaf diagram]
Step 4 Rearrange the leaves so they
are ordered from least to greatest.
[Figure: ranked stem-and-leaf plot]
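The four steps above can be sketched in Python. The astronaut data set itself did not survive in this transcript, so the `ages` list below is hypothetical, chosen only so that its least value is 54 and its greatest is 77. The function name `stem_and_leaf` is likewise just an illustration, and the sketch assumes two-digit whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a ranked stem-and-leaf plot as a list of text rows."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting ranks the leaves (Step 4)
        stem, leaf = divmod(v, 10)        # tens digit is the stem, ones digit the leaf
        leaves[stem].append(leaf)
    lo, hi = min(values) // 10, max(values) // 10   # Step 1: least and greatest
    rows = []
    for stem in range(lo, hi + 1):        # Step 2: every stem, even an empty one
        row = " ".join(str(leaf) for leaf in leaves[stem])   # Step 3
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical data, NOT the slide's astronaut data set.
ages = [54, 77, 61, 58, 66, 70, 61, 59]
for row in stem_and_leaf(ages):
    print(row)
```

Sorting before splitting into stems and leaves combines Steps 3 and 4, so the plot comes out already ranked.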
Frequency Distributions
Relative Frequency: 0.67, 0.23, 0.06, 0, 0.03
Relative Frequency: 0.14, 0.33, 0.25, 0.17, 0.11
Relative Frequency
The ratio of the absolute frequency to the total number of data points
in a frequency distribution.
Also known as Experimental Probability.
As the number of data points in an experiment increases, the
Experimental Probability approaches the theoretical probability.
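A minimal Python sketch of this definition follows. The die-rolling experiment is a hypothetical example, not data from the slides, and `relative_frequencies` is an illustrative name. The loop shows the relative frequency of a six settling near the theoretical probability 1/6 as the number of rolls grows.

```python
import random
from collections import Counter

def relative_frequencies(data):
    """Map each distinct value to (absolute frequency) / (number of data points)."""
    counts = Counter(data)
    total = len(data)
    return {value: count / total for value, count in counts.items()}

# Hypothetical experiment: roll a fair die; theoretical P(6) = 1/6.
rng = random.Random(0)                     # fixed seed so the run is repeatable
for n in (60, 600, 60_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    print(n, round(relative_frequencies(rolls).get(6, 0.0), 4))
```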
Measures of Variation
Used to describe the distribution of the data, that is, how spread out
the data are.
Range: Difference between the high and the low data points
Mean Deviation, Variance, and Standard Deviation: Measure how much the
data values differ from the mean.
Mean, Median, Mode: Measures of central tendency, not of variation
Mean
The average of all data points. To find the mean of a set of data, add
all the data and divide by the number of data points.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sample Space
The set of all possible outcomes of an experiment.
The sum of all probabilities assigned to outcomes in a sample space
must be 1.
Complement
For an event A, the event Not A is the complement of A.
Median
The value of the data point that is exactly in the middle of the data.
To find the median, put the data in order from least to greatest and
find the term exactly in the middle. If there is no one term exactly in
the middle, average the two that are in the middle.
Mode
The value of the data point that occurs most often. A data set can
have more than one mode.
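The three measures of central tendency can be checked with Python's standard `statistics` module. The `scores` list is hypothetical, picked so that the count is even (the median averages two middle values) and two values tie for the mode.

```python
from statistics import mean, median, multimode

# Hypothetical quiz scores, used only to illustrate the three definitions.
scores = [7, 9, 4, 9, 6, 7]

print(mean(scores))       # sum of the data divided by the number of data points
print(median(scores))     # even count, so the two middle values are averaged
print(multimode(scores))  # every value tied for most frequent
```

`multimode` is used instead of `mode` because it returns all of the tied values rather than only one.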
Box-and-Whisker Plots
Step 1 Find the least and greatest number. Then draw a number
line that covers the range of the data.
Step 2 Find the median, the extremes, and the upper and lower
quartiles. Mark these points above the number line.
New Hampshire: 13
Delaware: 28
Maryland: 31
Rhode Island: 40
Georgia: 100
Virginia: 112
New York: 127
New Jersey: 130
South Carolina: 187
Massachusetts: 192
Maine: 228
North Carolina: 301
Florida: 580
Lower Extreme
Lower Quartile (lower hinge)
Median
Upper Quartile (upper hinge)
Upper Extreme
Step 3 Draw a box and the whiskers.
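The five points marked in Step 2 can be computed for the 13 values listed above with a short Python sketch. One assumption to note: the quartiles (hinges) are taken here as the medians of the lower and upper halves with the overall median left out of both halves. Some textbooks and software use a different quartile rule and would give slightly different hinges. The name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme).

    Assumption: the quartiles (hinges) are the medians of the lower and upper
    halves, leaving the overall median out of both halves when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# The 13 values from the table above.
values = [13, 28, 31, 40, 100, 112, 127, 130, 187, 192, 228, 301, 580]
print(five_number_summary(values))
```

The box in Step 3 spans the two quartiles, and the whiskers run out to the two extremes.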
Computing Measures of Variation
Range: Difference between the high and the low data points
Mean Deviation: $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Variance: $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
Find the mean, median, mode, variance, mean deviation, and
standard deviation of the following data
59, 59, 65, 68, 69, 72, 73, 76, 78, 81, 81, 88
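One way to check a hand calculation for this first data set is the Python sketch below. It assumes the population forms of the formulas, which divide by n as in the slides, rather than the sample forms that divide by n - 1. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, median, multimode

data = [59, 59, 65, 68, 69, 72, 73, 76, 78, 81, 81, 88]
n = len(data)
x_bar = mean(data)

data_range = max(data) - min(data)
mean_deviation = sum(abs(x - x_bar) for x in data) / n
variance = sum((x - x_bar) ** 2 for x in data) / n     # divides by n, as in the slides
std_dev = sqrt(variance)

print(x_bar, median(data), multimode(data))
print(data_range, mean_deviation, variance, std_dev)
```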
Find the mean, median, mode, variance, mean deviation, and
standard deviation of the following data
10, 12, 9, 10, 8, 9, 13, 8, 9, 9, 10, 10, 12, 13
HW #16.1-3
Pg 692-693 3, 9, 10-13
Pg 698-699 3, 6, 9-16
Pg 702-703 5, 7-10
Chapter 16
16.4 – The Normal Distribution
The heights of 16-year-old girls are not distributed uniformly, or
evenly. Many more girls are average height than are very short or
very tall. The values are frequent near the mean and become rarer the
farther they are from the mean. The most common distribution with this
characteristic is a normal distribution.
Normal curves are symmetric with respect to the vertical line at the mean.
The spread of each curve is defined by its standard deviation.
Areas under this curve represent probabilities from normal
distributions.
When data are distributed in a bell-shaped or normal curve, about
68% of the data lie within one standard deviation on either side of
the mean, about 95% lie within two standard deviations of the
mean, and about 99.7% lie within three standard deviations.
Consider the bell-shaped distribution of IQ scores for students in a
school. The mean is 100 and the standard deviation is 15. What
percent of students in the school would we expect to have IQs
between 85 and 115?
What percent of the students can
we expect to have IQs between
70 and 115?
What percent of the students in the school can we expect to have
IQs above 115?
What percent of the students in the
school can we expect to have IQs in
the range from 85 to 145?
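The IQ questions above can be checked against the exact normal curve with `statistics.NormalDist` from the Python standard library. This is a sketch for comparison only: the slides expect answers built from the rounded 68% and 95% figures, so the printed values will differ slightly from the rule-of-thumb answers. The helper name `percent_between` is illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # mean and standard deviation from the slide

def percent_between(low, high):
    """Percent of the distribution lying between low and high."""
    return 100 * (iq.cdf(high) - iq.cdf(low))

print(round(percent_between(85, 115), 1))   # within one standard deviation
print(round(percent_between(70, 115), 1))   # two below to one above
print(round(100 * (1 - iq.cdf(115)), 1))    # above 115
print(round(percent_between(85, 145), 1))   # one below to three above
```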
The given times of 33 minutes and
57 minutes represent one standard
deviation on either side of the
mean. So, about 68% of the shoppers will
spend between 33 and 57 minutes in
the supermarket.
According to a survey by the National Center for Health Statistics, the
heights of adult men in the United States are normally distributed with
a mean of 69 inches and a standard deviation of 2.75 inches. If you
randomly choose 1 adult man, what is the probability that he is
71.75 inches tall or taller?
Z-Score
A z-score for a value is the number of standard deviations the value is
from the mean. The sign of the z-score is its direction from the mean.
For example, if a value has a z-score of -2, it is two standard
deviations below the mean.
$z = \frac{x - \bar{x}}{\sigma}$
What is the z-score for 46 in a normal distribution whose mean is 44
and whose standard deviation is 2?
For this distribution, mean = 44 and Standard Deviation = 2.
$z = \frac{x - \bar{x}}{\sigma} = \frac{46 - 44}{2} = 1$
Thus, 46 is one standard deviation above the mean.
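The z-score formula is a one-line function in Python. The second call uses a hypothetical value of 40, chosen to match the earlier remark that a z-score of -2 means two standard deviations below the mean.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean (signed)."""
    return (x - mean) / std_dev

print(z_score(46, 44, 2))   # the worked example: one standard deviation above
print(z_score(40, 44, 2))   # a value below the mean gets a negative z-score
```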
A value is selected randomly from a normal distribution. What
is the probability that its z-score is less than -1.46?
The probability that a value has a z-score less than -1.46 is equal
to the area of the shaded region under the curve. This value is
given in Table 7.
Table 7 on page 850 gives the probability that a value in the
distribution has a z-score that is less than a given value
A student received a score of 56 on a normally distributed
standardized test. The test had a mean of 50 and a standard deviation
of 5. What is the probability that a randomly selected student achieved
a higher score?
We need to find the probability that a value greater than 56 is
selected from a normal distribution with a mean of 50 and a
standard deviation of 5.
The z-score for 56 is (56 - 50)/5 = 1.2. Look up 1.2 in Table 7.
P(Score < 56) = 0.8849
P(Score > 56) = 1-0.8849 = 0.1151
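The table lookup can be reproduced with `statistics.NormalDist`, which here stands in for Table 7 (an assumption, since the table itself is not part of this transcript). The sketch converts the score to a z-score, reads the area to the left, and subtracts from 1.

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=5)
z = (56 - scores.mean) / scores.stdev      # 1.2, the value looked up in the table
p_lower = NormalDist().cdf(z)              # standard normal: P(Z < 1.2)
p_higher = 1 - p_lower

print(z, round(p_lower, 4), round(p_higher, 4))
```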
How often would we expect to find an IQ greater than 142 in a
sample of students whose mean IQ was 110 where the standard
deviation was 16?
A company manufactures cover plates for boxes with lengths of 4
inches. Due to variation in the process, the lengths of the plates are
normally distributed about a mean of 4 inches with a standard
deviation of 0.01 inch. A plate is considered a "reject" if its length is
less than 3.98 inches or greater than 4.02 inches. What percent of the
production is considered "rejects"?
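Both reject limits sit two standard deviations from the mean, so the 95% rule of thumb suggests roughly 5% rejects. A sketch of the exact two-tail calculation, using `statistics.NormalDist` in place of a table lookup, gives a slightly smaller figure.

```python
from statistics import NormalDist

plates = NormalDist(mu=4.0, sigma=0.01)       # lengths in inches
p_short = plates.cdf(3.98)                    # shorter than 3.98 inches
p_long = 1 - plates.cdf(4.02)                 # longer than 4.02 inches
reject_percent = 100 * (p_short + p_long)

print(round(reject_percent, 2))
```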
HW #16.4
Pg 708 1-31 Odd, 33-43
Chapter 16
16.5 – Collecting Data Randomness and Bias
Objective: Evaluate and select sampling methods.
Objective: Describe how to take a stratified random sample.
A scientist is studying the weight gain or loss of mice that are given
a certain treatment. When choosing mice for the experiment, the
scientist reaches into a cage with 30 mice and selects the 5 largest
mice in the cage. Is the sample random?
Describe how a random sample of 10 individuals might be chosen
from a high school graduating class of 202 to receive a gift
certificate.
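One possible procedure, sketched in Python: number the graduates 1 through 202 and let a random number generator draw 10 distinct numbers. The numbering scheme is an assumption made for illustration. Drawing numbered slips from a hat or using a random number table would serve the same purpose.

```python
import random

class_size = 202
student_ids = list(range(1, class_size + 1))   # hypothetical numbering 1..202

rng = random.Random()                 # pass a seed here only to make a draw repeatable
winners = rng.sample(student_ids, k=10)        # 10 distinct students, all equally likely
print(sorted(winners))
```

`sample` draws without replacement, so no student can be chosen twice.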
Although the processes involved in the development of a random
sample guarantee that requirements of equal probability and
independence are satisfied, they do not guarantee that the sample
drawn will be REPRESENTATIVE of the population.
A town newsletter is doing an article on high school students. A
questionnaire is sent to a random sample of school-aged students. Are the
data representative?
No. A sample of students from kindergarten through grade
12 would not give results representative of high school
students.
Even when the sample is restricted to high school students,
the data may not be representative.
Data might have been collected largely from members of
the high school chorus.
In this case, the data would most likely be biased. That is,
it is likely that the data would be overly influenced by
factors that are related to musical interests.
To draw a representative sample, we may need to divide the
population into distinct subgroups, called strata.
Then we can use stratified random sampling to assure that the
sample has the same characteristics as the population.
1. Each member of the population must be placed in one and
only one stratum.
2. A random sample is drawn so that the sample has the same
distribution among the strata as the population.
A college has 1260 freshmen, 1176 sophomores, 840 juniors, and
924 seniors. Describe how to take a stratified random sample of 200
students.
We would randomly sample 60 freshmen, 56 sophomores, 40
juniors, and 44 seniors.
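The stratum sizes above come from giving each class the same share of the sample that it has of the population. A short Python sketch of that proportional allocation follows; the dictionary keys are illustrative labels. In this example every count divides evenly. With other numbers the shares would need rounding, which this sketch does not handle.

```python
from fractions import Fraction

enrollment = {"freshmen": 1260, "sophomores": 1176, "juniors": 840, "seniors": 924}
sample_size = 200

total = sum(enrollment.values())                      # 4200 students
share = Fraction(sample_size, total)                  # 1/21 of every stratum
allocation = {year: int(count * share) for year, count in enrollment.items()}
print(allocation)
```

Within each class, the allotted number of students would then be chosen by simple random sampling.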
HW #16.5
Pg 713-714 1-19
Chapter 16
16.6 – Testing Hypothesis
Statisticians are often asked to test whether a given set of
observational data represents what one would expect to observe by
chance or whether it differs greatly from what one might expect.
To determine what constitutes a significant difference,
statisticians establish a level of error they are willing to tolerate.
If we throw a coin 30 times and the coin shows tails 23 times, we
may decide either that
1. The results occurred by chance.
• The coin is fair and randomly showed 23 tails, although the
likelihood that a coin lands tails 23 times is remote.
2. The results did not occur by chance.
• The coin is unevenly weighted or was tossed so that tails had a
higher probability of being thrown.
Before stating that the results were biased or influenced by other
circumstances, we want to be sure there is a 5% or less probability that
the results occurred by chance. Then we can state that the coin is not
fair at the 5% level of significance.
Find the probability of throwing 23 tails in 30 tosses.
$\binom{30}{23}\left(\frac{1}{2}\right)^{23}\left(\frac{1}{2}\right)^{7} \approx 0.0019 < 1\%$
Since the probability is less than 5%, we may state that at the 5%
level of significance, the results did not occur by chance.
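The probability above is a binomial probability: the number of ways to place 23 tails among 30 tosses, times the chance of any one such sequence. A minimal Python sketch, with the illustrative name `binomial_probability`, reproduces the figure.

```python
from math import comb

def binomial_probability(n, k, p=0.5):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exact = binomial_probability(30, 23)
print(round(p_exact, 4))
print(p_exact < 0.05)        # below the 5% level of significance
```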
One of the most common ways to test whether a given set of
data differs from what one would expect by chance is to use
the chi-square ($\chi^2$) test. This test is used to compare
observed data with expected data.
If there is a large difference between observed and expected data, we
get a large value for $\chi^2$. If there is no difference, $\chi^2 = 0$.
If we toss a coin 30 times and get 23 heads, calculate $\chi^2$.
What does that tell us about whether or not the event occurred by
chance?
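The slide's formula for chi-square did not survive in this transcript. The sketch below assumes the standard definition, the sum over all outcomes of (observed - expected)^2 / expected, and applies it to the coin example. The function name `chi_square` is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [23, 7]        # heads, tails in 30 tosses
expected = [15, 15]       # a fair coin: half heads, half tails
print(round(chi_square(observed, expected), 2))
```

For comparison, the standard 5% critical value for one degree of freedom is 3.84. This is assumed here to be the table entry for two possible outcomes, and a calculated value above it points away from chance.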
The chi-square test is typically used to accept or reject a hypothesis
about a set of data and to generalize about a population.
Null hypothesis: There is no statistical difference between the
expected and the observed data.
Thus, the observed results occurred by chance.
The larger the value of $\chi^2$, the stronger the evidence that the null
hypothesis is false.
How large must $\chi^2$ be for us to reject the null hypothesis?
For a specific level of significance, we reject the null hypothesis
if the calculated value of chi-square exceeds the table value for
the number of possible outcomes.
Suppose we rolled a number cube 72 times and found that we
had 13 ones, 18 threes, and 12 sixes. Determine whether these
results occurred by chance. Use a 5% level of significance. The
null hypothesis is that the results occurred by chance.
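There are 4 possible outcomes: a one, a three, a six, or any other
number. For a fair cube, the expected counts in 72 rolls are 12, 12, 12,
and 36, which gives a calculated chi-square value of about 4.44. The
chi-square value for a 5% significance level for 4 possible outcomes is
7.81. Since 4.44 does not exceed this value, we can state that our
results occurred by chance. Thus, we can accept the null hypothesis.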
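The value 4.44 can be reproduced with a short Python sketch. It assumes the four outcomes are a one, a three, a six, and "any other number", with the remaining rolls counted as "other". That grouping is inferred from the result rather than stated in the transcript. The chi-square sum is the standard sum of (observed - expected)^2 / expected.

```python
rolls = 72
observed = {"one": 13, "three": 18, "six": 12}
observed["other"] = rolls - sum(observed.values())        # 29 rolls of 2, 4, or 5

# A fair cube: each face has probability 1/6; "other" covers three faces.
expected = {"one": rolls / 6, "three": rolls / 6, "six": rolls / 6, "other": rolls / 2}

chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
critical_value = 7.81                 # 5% table value quoted in the text
print(round(chi_sq, 2), chi_sq > critical_value)
```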
HW #16.6
Pg 718-720 1-13
Test Review
* Stem and Leaf Plots
* Frequency plots
Relative Frequency
* Box and Whisker plots
* Normal Distributions
Mean, Median, Mode, Standard Deviation, Mean
Deviation, Variance, Z-score
* Hypothesis testing
Chi-Square, Significance Level, Null hypothesis
* Random Sample
Representative, Biases, Stratified random sample
* Challenge Problems
Find the variance and standard deviation of each set of
data.
{5, 8, 2, 9, 4}
{16, 22, 18, 31, 25, 22}
The useful life of a radial tire is normally distributed with a
mean of 30,000 miles and a standard deviation of 5000 miles.
The company makes 10,000 tires a month.
1. About how many tires will last between 25,000 and 35,000
miles?
2. About how many tires will last more than 40,000 miles?
3. About how many tires will last less than 25,000 miles?
4. What is the probability that if you buy a radial tire at random,
it will last between 20,000 and 35,000 miles?
The vending machine in the school cafeteria usually
dispenses about 6 ounces of soft drink. Lately, it is not
working properly, and the variability of how much of the soft
drink it dispenses has been getting greater. The amounts
are normally distributed with a standard deviation of 0.2
ounce.
1. What percent of the time will you get more than 6 ounces of
soft drink?
2. What percent of the time will you get less than 6 ounces of
soft drink?
3. What percent of the time will you get between 5.6 and 6.4
ounces of soft drink?
4. If you purchase a soda, what is the probability that the machine
dispenses less than 6.5 oz of soda?
If the mean GPA at Troy is 3.4 with a standard deviation of
0.38, what is the minimum GPA for a student in the top 5%?
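This problem runs the table lookup in reverse: find the value with 95% of the distribution below it. A sketch using `NormalDist.inv_cdf` follows, under the assumption (not stated in the problem) that the GPAs are normally distributed.

```python
from statistics import NormalDist

gpa = NormalDist(mu=3.4, sigma=0.38)     # assumes GPAs follow a normal curve
cutoff = gpa.inv_cdf(0.95)               # 95% of students fall below this GPA
print(round(cutoff, 2))
```

A cutoff above 4.0 would suggest that the normal model is only a rough fit for GPAs near the top of the scale.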
Mr. Burnum gave an exam to his 30 Algebra 2 students at the
end of the first semester. The scores were normally distributed
with a mean score of 78 and a standard deviation of 6. What
percent of the students would you expect to receive a grade of
at least 70%?
Cucumbers grown on a certain farm have weights with
a standard deviation of 2 ounces. What is the mean
weight if 85% of the cucumbers weigh less than 16
ounces?
A coke machine is set up so that it will dispense soda into
a can. If the actual amount of soda dispensed is normally
distributed such that only 10% of all cans have less than
11.75 ounces and only 25% of all cans have more than
12.75 ounces, what are the mean and standard deviation
of the soda dispensing machine?
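One way to approach this problem: each percentage fixes a z-score, and each z-score gives an equation of the form value = mean + z × (standard deviation). Two equations determine the two unknowns. The Python sketch below uses `NormalDist.inv_cdf` in place of a reverse table lookup, and the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_low = std_normal.inv_cdf(0.10)     # z-score with 10% of the area below it
z_high = std_normal.inv_cdf(0.75)    # 25% above means 75% below

# 11.75 = mean + z_low * sd  and  12.75 = mean + z_high * sd
sd = (12.75 - 11.75) / (z_high - z_low)
mean = 11.75 - z_low * sd
print(round(mean, 2), round(sd, 2))
```

Subtracting the first equation from the second eliminates the mean, which is why the standard deviation is computed first.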
HW #R-16
Pg 722-724 1-17