Statistics Chapter 2
Name:
2.4 Measures of Variation
Learning objectives:
1. How to find the range of a data set
2. How to find the variance and standard deviation of a population and of a sample
3. How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
4. How to approximate the sample standard deviation for grouped data
5. How to use the coefficient of variation to compare variation in different data sets
Range
1. The difference between the maximum and minimum data entries in the set.
2. The data must be quantitative.
3. Range = (Max. data entry) − (Min. data entry)
A corporation hired 6 graduates. The starting salaries for each graduate are shown. Find
the range of the starting salaries.
Starting salaries (1000s of dollars): 41, 38, 39, 45, 47, 41
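As a quick check of the range formula on the salary data above, here is a minimal Python sketch. The variable names are illustrative and not part of the worksheet.

```python
# Starting salaries from the example, in 1000s of dollars
salaries = [41, 38, 39, 45, 47, 41]

# Range = (max. data entry) - (min. data entry)
data_range = max(salaries) - min(salaries)
print(data_range)  # 9, i.e. a range of $9,000
```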
Deviation
1. The difference between the data entry, x, and the mean of the data set.
2. Population data set:
• Deviation of x = x − µ
3. Sample data set:
• Deviation of x = x − x̄
Example: Find the deviation of each of the starting salaries.
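One way to work this example is sketched below in Python. It assumes the six salaries are treated as a population, so the mean plays the role of µ; for a sample, the same arithmetic applies with x̄ in place of µ. The variable names are illustrative.

```python
# Starting salaries from the example, in 1000s of dollars
salaries = [41, 38, 39, 45, 47, 41]

# Mean of the data set (assumed here to be a population mean, mu)
mu = sum(salaries) / len(salaries)

# Deviation of each entry: x - mu
deviations = [x - mu for x in salaries]

for x, d in zip(salaries, deviations):
    print(f"x = {x}, deviation = {d:+.2f}")
```

The deviations always sum to zero (up to floating-point rounding), which is why they are squared before being averaged in the variance.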
Formulas for Population Variance and Standard Deviation
Population Variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
This gives us a formula for the average squared deviation for population data. However, the
units of variance are the units of the data raised to the second power. For example, if x is
in dollars, then the units of the variance are dollars squared.
Population Standard Deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
If we take the square root of the variance we get σ, the population standard deviation
— a parameter that has the same units as numbers in the data set.
How to find the Population Variance & Standard Deviation
1. Find the mean of the population data set, \( \mu = \frac{\sum x}{N} \).
2. Find the deviation of each entry, \( x - \mu \).
3. Square each deviation, \( (x - \mu)^2 \).
4. Add to get the sum of squares, \( \sum (x - \mu)^2 \).
5. Divide by N to get the population variance, \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \).
6. Find the square root to get the population standard deviation, \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \).
A corporation hired 6 graduates. The starting salaries for each graduate are shown. Find
the population variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars): 41, 38, 39, 45, 47, 41
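The six steps above can be followed line by line in a short Python sketch. The variable names are illustrative; the cross-check uses `pvariance` and `pstdev` from Python's standard `statistics` module.

```python
import math
import statistics

# Starting salaries from the example, in 1000s of dollars
salaries = [41, 38, 39, 45, 47, 41]
N = len(salaries)

mu = sum(salaries) / N                    # step 1: population mean
deviations = [x - mu for x in salaries]   # step 2: deviation of each entry
squares = [d ** 2 for d in deviations]    # step 3: squared deviations
sum_of_squares = sum(squares)             # step 4: sum of squares
pop_variance = sum_of_squares / N         # step 5: divide by N
pop_std = math.sqrt(pop_variance)         # step 6: square root

print(round(pop_variance, 2), round(pop_std, 2))  # 10.14 3.18

# Cross-check against the standard library
assert math.isclose(pop_variance, statistics.pvariance(salaries))
assert math.isclose(pop_std, statistics.pstdev(salaries))
```

Because the data are in thousands of dollars, the standard deviation of about 3.18 corresponds to roughly $3,180, while the variance is in squared units.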
Formulas for Sample Variance and Standard Deviation
Sample Variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Sample Standard Deviation:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
How to find the Sample Variance & Standard Deviation
1. Find the mean of the sample data set, \( \bar{x} = \frac{\sum x}{n} \).
2. Find the deviation of each entry, \( x - \bar{x} \).
3. Square each deviation, \( (x - \bar{x})^2 \).
4. Add to get the sum of squares, \( \sum (x - \bar{x})^2 \).
5. Divide by (n − 1) to get the sample variance, \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \).
6. Find the square root to get the sample standard deviation, \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \).
A corporation hired 6 graduates. The starting salaries for each graduate are shown. Find
the sample variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars): 41, 38, 39, 45, 47, 41
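Treating the same six salaries as a sample, the only change from the population calculation is the divisor n − 1. A minimal Python sketch, with illustrative variable names and a cross-check against `variance` and `stdev` from the standard `statistics` module:

```python
import math
import statistics

# Starting salaries from the example, in 1000s of dollars
salaries = [41, 38, 39, 45, 47, 41]
n = len(salaries)

x_bar = sum(salaries) / n                                # sample mean
sum_of_squares = sum((x - x_bar) ** 2 for x in salaries) # sum of squared deviations
sample_variance = sum_of_squares / (n - 1)               # divide by n - 1, not n
sample_std = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_std, 2))  # 12.17 3.49

# Cross-check against the standard library
assert math.isclose(sample_variance, statistics.variance(salaries))
assert math.isclose(sample_std, statistics.stdev(salaries))
```

Dividing by n − 1 instead of n makes the sample values slightly larger than the population values computed from the same entries.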
Try This! You are asked to compare three data sets (dotplots below). Without calculating,
determine which data set has the greatest sample standard deviation and which has the least
sample standard deviation.
Entry x    Deviation x − µ    Squares (x − µ)²
1          -3                 9
3          -1                 1
5           1                 1
7           3                 9

Data entries that lie more than two standard deviations from the mean are considered
unusual, while those that lie more than three standard deviations from the mean are very
unusual. Unusual and very unusual entries have a greater influence on the standard deviation
than entries closer to the mean. This happens because the deviations are squared. Consider
the data in the above table. The squares of the deviations of the entries farther from the
mean (1 and 7) have a greater influence on the value of the standard deviation than those
closer to the mean (3 and 5).
Try This! A sample of 500 monthly utility bills for households in a city was collected. The
mean of the sample was $70 and the sample standard deviation was $8. Here is a short list
of a few of the 500 measurements from the sample: $74, $52, $62, $98. Are any of the data
entries unusual or very unusual? Explain your reasoning.
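One way to check entries against the two- and three-standard-deviation cutoffs is sketched below in Python. The function name `classify` and the labels are illustrative, not part of the worksheet.

```python
# Sample mean and sample standard deviation from the exercise, in dollars
mean, s = 70, 8
bills = [74, 52, 62, 98]

def classify(x, mean, s):
    """Label an entry by how many standard deviations it lies from the mean."""
    distance = abs(x - mean) / s
    if distance > 3:
        return "very unusual"
    if distance > 2:
        return "unusual"
    return "not unusual"

labels = {x: classify(x, mean, s) for x in bills}
for x, label in labels.items():
    print(f"${x}: {abs(x - mean) / s:.2f} standard deviations from the mean, {label}")
```

With these numbers, $52 lies 2.25 standard deviations below the mean (unusual) and $98 lies 3.5 standard deviations above it (very unusual); $74 and $62 are within two standard deviations.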
The Empirical Rule
For data with a (symmetric) bell-shaped distribution, the standard deviation has the
following characteristics:
• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.
4. Example: The mean IQ score of students in a particular calculus class is 110, with a
standard deviation of 5. (Assume the data set has a bell-shaped distribution.)
a) Use the Empirical Rule to find the percentage of students with an IQ above 120.
b) Use the Empirical Rule to find the percentage of students with an IQ between 100
and 110.
c) Use the Empirical Rule to find the percentage of students with an IQ between 105
and 120.
5. Try This! In a survey conducted by the National Center for Health Statistics, the
sample mean height of women in the United States (ages 20-29) was 64.3 inches, with a
sample standard deviation of 2.62 inches. Use the Empirical Rule to estimate the percent
of the women whose heights are between 59.06 inches and 64.3 inches.
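The IQ example above can be worked by converting each boundary to a whole number of standard deviations from the mean and reading off the Empirical Rule percentages. The Python sketch below is one way to do that; the table `CUMULATIVE` and the function `percent_between` are illustrative helpers derived from the 68%, 95%, and 99.7% figures, and they only handle boundaries that fall exactly 0, 1, 2, or 3 standard deviations from the mean.

```python
# Approximate percent of data below (mean + z * sd), for whole-number z,
# derived from the Empirical Rule: e.g. 50 + 68/2 = 84 for z = 1.
CUMULATIVE = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}

def percent_between(low, high, mean, sd):
    """Approximate percent of bell-shaped data between low and high.

    Pass None for low or high to leave that side open-ended.
    """
    def cumulative(x, open_value):
        if x is None:
            return open_value
        z = (x - mean) / sd
        if z != int(z) or int(z) not in CUMULATIVE:
            raise ValueError("boundary must be 0, 1, 2, or 3 standard deviations from the mean")
        return CUMULATIVE[int(z)]

    return cumulative(high, 100.0) - cumulative(low, 0.0)

mean_iq, sd_iq = 110, 5
print(percent_between(120, None, mean_iq, sd_iq))  # a) above 120: 2.5
print(percent_between(100, 110, mean_iq, sd_iq))   # b) between 100 and 110: 47.5
print(percent_between(105, 120, mean_iq, sd_iq))   # c) between 105 and 120: 81.5
```

These are approximations that hold only under the stated assumption of a bell-shaped distribution.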
Chebychev’s Theorem
1. The percentage of any data set lying within k standard deviations (where k is any
number greater than one) of the mean is at least:
\[ \left(1 - \frac{1}{k^2}\right) \cdot 100\% \]
2. For example, when k = 2: In any data set, at least \( 1 - \frac{1}{2^2} = \frac{3}{4} \), or 75%, of the data lie
within 2 standard deviations of the mean.
3. When k = 3: In any data set, at least \( 1 - \frac{1}{3^2} = \frac{8}{9} \), or 88.9%, of the data lie within 3
standard deviations of the mean.
4. k can be any number greater than one
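The bound in Chebychev's Theorem is simple enough to evaluate directly. A minimal Python sketch, where the function name `chebychev_min_percent` is illustrative:

```python
def chebychev_min_percent(k):
    """Minimum percent of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k**2) * 100

print(chebychev_min_percent(2))            # 75.0
print(round(chebychev_min_percent(3), 1))  # 88.9
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it is a guaranteed minimum rather than an approximation.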
Example The mean time in the finals for the women’s 800-meter freestyle at the 2012
Summer Olympics was 502.84 seconds, with a standard deviation of 4.68 seconds. Apply
Chebychev’s Theorem to the data using k = 2. Interpret the results.
Try This! The mean time in the finals for the women’s 800-meter freestyle at the 2012
Summer Olympics was 502.84 seconds, with a standard deviation of 4.68 seconds. Apply
Chebychev’s Theorem to the data using k = 1.5. Interpret the results.
Example From a sample with n = 48, the mean cost of purchasing a home in a major city was
$520,000 and the standard deviation was $40,000. Using Chebychev’s Theorem, determine
at least how many of the homes cost between $460,000 and $580,000.
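For a question like this, the first step is to express the interval as the mean plus or minus k standard deviations, then apply the bound to the sample size. A sketch in Python under that reading of the problem, with illustrative variable names:

```python
import math

n, mean, sd = 48, 520_000, 40_000
low, high = 460_000, 580_000

# The interval is symmetric about the mean, so k is the half-width in standard deviations
assert mean - low == high - mean
k = (high - mean) / sd

# Chebychev's Theorem: at least 1 - 1/k^2 of the data lie within k standard deviations
min_fraction = 1 - 1 / k**2

# A count of homes must be a whole number, so round the lower bound up
min_homes = math.ceil(n * min_fraction)

print(k, min_homes)  # 1.5 27
```

Here k = 1.5 gives a minimum of about 55.6% of the data, and 55.6% of 48 is about 26.7, so at least 27 of the homes cost between $460,000 and $580,000.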
Try This! Heights of adult women have a mean of 63.6 in. and a standard deviation
of 2.5 in. What does Chebychev’s Theorem say about the percentage of women with heights
between 58.6 in. and 68.6 in.? At least how many women in a sample of 50 would have
heights between 58.6 in. and 68.6 in.?