Download Standard Deviation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Time series wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVES
1.
2.
3.
4.
5.
Compute the range of a data set
Compute the variance of a population and a sample
Compute the standard deviation of a population and a sample
Approximate the standard deviation with grouped data
Use the Empirical Rule to summarize data that are unimodal and approximately
symmetric
6. Use Chebyshev’s Inequality to describe a data set
7. Compute the coefficient of variation
OBJECTIVE 1
COMPUTE THE RANGE OF A DATA SET
The ____________________ of a data set is the difference between the largest value and the
smallest value.
E XAMPLE :
The average monthly temperatures, in degrees Fahrenheit, for San Francisco are
San Francisco
Jan
51
Feb
54
Mar
55
Apr
56
May
58
Jun
60
Jul
60
Aug
61
Sep
63
Oct
62
Nov
58
Dec
52
The range of temperatures is: ________________________________________.
Although the range is easy to compute, it is not often used in practice. The reason is that the
range involves only two values from the data set: the largest and smallest.
1
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 2
COMPUTE THE VARIANCE OF A POPULATION AND A SAMPLE
VARIANCE
When a data set has a small amount of spread, like the San Francisco temperatures, most of the
values will be close to the mean. When a data set has a larger amount of spread, more of the
data values will be far from the mean. The variance is a measure of how far the values in a data
set are from the mean, on the average. The variance is computed slightly differently for
populations and samples. The population variance is presented first.
Let π‘₯1 , π‘₯2 , π‘₯3 , … , π‘₯𝑁 denote the values in a population of size 𝑁. Let πœ‡ denote the population
mean. The population variance, denoted by 𝜎 2 , is
P OPULATION V ARIANCE :
𝜎2 =
E XAMPLE :
Compute the population variance for the San Francisco temperatures:
San Francisco
Jan
51
Feb
54
Mar
55
Apr
56
May
58
Jun
60
Jul
60
Aug
61
Sep
63
Oct
62
Nov
58
Dec
52
S OLUTION :
2
SECTION 3.2: MEASURES OF SPREAD
SAMPLE VARIANCE
When the data values come from a sample rather than a population, the variance is called the
sample variance. The procedure for computing the sample variance is a bit different from the
one used to compute a population variance. In the formula, the mean πœ‡ is replaced by the
sample mean π‘₯Μ… and the denominator is 𝑛 βˆ’ 1 instead of 𝑁. The sample variance is denoted by
𝑠2.
S AMPLE V ARIANCE :
𝑠2 =
WHY DIVIDE BY 𝒏 βˆ’ 𝟏?
When computing the sample variance, we use the sample mean to compute the deviations. For
the population variance we use the population mean for the deviations. It turns out that the
deviations using the sample mean tend to be a bit smaller than the deviations using the
population mean. If we were to divide by 𝑛 when computing a sample variance, the value
would tend to be a bit smaller than the population variance. It can be shown mathematically
that the appropriate correction is to divide the sum of the squared deviations by 𝑛 βˆ’ 1 rather
than 𝑛.
E XAMPLE :
A company that manufactures batteries is testing a new type of battery designed
for laptop computers. They measure the lifetimes, in hours, of six batteries, and
the results are 3, 4, 6, 5, 4, 2. Find the sample variance of the lifetimes.
S OLUTION :
3
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 3
COMPUTE THE STANDARD D EVIATION OF A POPULATION AND A SAMPLE
STANDARD DEVIATION
Because the variance is computed using squared deviations, the units of the variance are the
squared units of the data. For example, in the Battery Lifetime example, the units of the data
are hours, and the units of variance are squared hours. In most situations, it is better to use a
measure of spread that has the same units as the data.
We do this simply by taking the square root of the variance. This quantity is called the standard
deviation. The standard deviation of a sample is denoted 𝑠, and the standard deviation of a
population is denoted by 𝜎.
S AMPLE S TANDARD D EVIATION :
𝑠=√
P OPULATION S TANDARD D EVIATION :
𝜎=√
E XAMPLE :
The population variance of temperatures in San Francisco is 𝜎 2 = 14.083. Find
the population standard deviation.
S OLUTION :
E XAMPLE :
The variance of the lifetimes for a sample of six batteries 𝑠 2 = 2. Find the
sample standard deviation.
S OLUTION :
4
SECTION 3.2: MEASURES OF SPREAD
STANDARD DEVIATION ON THE TI-84 PLUS
The following steps will compute the standard deviation for both
sample data and population data on the TI-84 PLUS Calculator:
Step 1: Enter the data in L1.
Step 2: Run the 1-Var Stats command (the same command
used for means and medians), selecting L1 as the
location of the data.
Note: If your calculator does not support Stat Wizards, enter L1 next to the 1-Var
Stats command on the home screen and press enter to run the command
STANDARD DEVIATION AND RESISTANCE
Recall that a statistic is resistant if its value is not affected much by extreme values (large or
small) in the data set. The standard deviation is _____________________________________.
5
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 4
APPROXIMATE THE STANDARD DEVIATION USING GROUPED DATA
STANDARD DEVIATION
Sometimes we don’t have access to the raw data in a data set, but we are given a frequency
distribution. In these cases we can approximate the standard deviation using the following
steps.
Step 1:
Compute the midpoint of each class and approximate the mean of the frequency
distribution.
Step 2:
For each class, subtract the mean from the class midpoint to obtain (Midpoint –
Mean).
Step 3:
For each class square the difference obtained in Step 2 to obtain (Midpoint –
Mean)2, and multiply by the frequency to obtain
(Midpoint – Mean)2 x (Frequency).
Step 4:
Add the products (Midpoint – Mean)2 x (Frequency) over all classes.
Step 5:
To compute the population variance, divide the sum obtained in Step 4 by 𝑛. To
compute the sample variance, divide the sum obtained in Step 4 by 𝑛 – 1.
Step 6:
Take the square root of the variance obtained in Step 5. The result is the
standard deviation.
6
SECTION 3.2: MEASURES OF SPREAD
E XAMPLE :
The following table presents the number of text messages sent via cell phone by
a sample of 50 high school students. Approximate the sample standard deviation
number of messages sent.
Number of Text Messages Sent
0 – 49
50 – 99
100 – 149
150 – 199
200 – 249
250 – 299
Frequency
10
5
13
11
7
4
S OLUTION :
7
SECTION 3.2: MEASURES OF SPREAD
GROUPED DATA ON THE TI-84 PLUS
The same procedure used to compute the mean for grouped data in a frequency distribution
may be used to compute the standard deviation. Enter the midpoint for each class into L1 and
the corresponding frequencies in L2. Next, select the 1-Var stats command and enter L1 in the
List field and L2 in the FreqList field, if using Stats Wizards. If you are not using Stats Wizards,
you may run the1-Var Stats command followed by L1, comma, L2.
8
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 5
USE THE EMPIRICAL RULE TO SUMMARIZE DATA THAT ARE UNIMODAL AND APPROXIMATELY SYMMETRIC
BELL-SHAPED HISTOGRAMS
Many histograms have a single mode near the center of the data, and are approximately
symmetric. Such histograms are often referred to as bell-shaped.
THE EMPIRICAL RULE
When a data set has a bell-shaped histogram, it is often possible to use the standard deviation
to provide an approximate description of the data using a rule known as The Empirical Rule.
T HE E MPIRICAL R ULE
When a population has a histogram that is approximately bell-shaped, then:
Approximately _______ of the data will be __________________________________________.
Approximately _______ of the data will be __________________________________________.
____________________ of the data will be __________________________________________.
9
SECTION 3.2: MEASURES OF SPREAD
E XAMPLE :
The following table presents the U.S. Census Bureau projection for the
percentage of the population aged 65 and over for each state and the District of
Columbia. Use the Empirical Rule to describe the data.
14.1
14.1
12.3
13.1
14.3
13.3
14.1
12.2
14.4
14.3
15.3
12.4
17.8
16.0
13.0
15.0
12.0
8.1
13.6
12.6
14.9
11.5
10.5
13.6
12.6
14.1
12.4
13.7
13.7
10.2
13.5
15.5
12.8
12.4
13.9
14.6
13.8
13.4
10.7
9.0
13.7
15.6
11.5
12.2
12.4
12.8
14.3
14.0
13.8
13.9
12.7
S OLUTION :
10
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 6
USE CHEBYSHEV’S INEQUALITY TO DESCRIBE A DATA SET
When a distribution is bell-shaped, we use ________________________________ to
approximate the proportion of data within one or two standard deviations. Another rule called
_________________________________ holds for any data set.
C HEBYSHEV ’ S I NEQUALITY :
In any data set, the proportion of the data that is within K standard deviations of the mean is at
least 1 – 1/K2. Specifically, by setting K = 2 or K = 3, we obtain the following results.
________________________________, of the data are within __________ standard deviations
of the mean.
________________________________, of the data are within __________ standard deviations
of the mean.
E XAMPLE :
As part of a public health study, systolic blood pressure was measured for a large
group of people. The mean was 120 and the standard deviation was 10. What
information does Chebyshev’s Inequality provide about these data?
S OLUTION :
11
SECTION 3.2: MEASURES OF SPREAD
OBJECTIVE 7
COMPUTE THE C OEFFICIENT OF VARIATION
The _______________________________ tells how large the standard deviation is relative to
the mean. It can be used to compare the spreads of data sets whose values have different units.
The coefficient of variation is found by dividing the standard deviation by the mean.
C OEFFICIENT OF V ARIATION :
CV =
E XAMPLE :
National Weather service records show that over a thirty-year period, the annual
precipitation in Atlanta, Georgia had a mean of 49.8 inches with a standard
deviation of 7.6 inches, and the annual temperature had a mean of 62.2 degrees
Fahrenheit with a standard deviation of 1.3 degrees. Compute the coefficient of
variation for precipitation and for temperature. Which has greater spread
relative to its mean?
S OLUTION :
12
SECTION 3.2: MEASURES OF SPREAD
YOU SHOULD KNOW …
ο‚·
How to compute the range of a data set
ο‚·
The notation for population variance, population standard deviation, sample variance,
and sample standard deviation
ο‚·
How to compute the variance and the standard deviation for populations and samples
ο‚·
How to use the TI-84 PLUS calculator to compute the variance and standard deviation
for populations and samples
ο‚·
How to approximate the standard deviation for grouped data
ο‚·
How to use The Empirical Rule to describe a bell-shaped data set
ο‚·
How to use Chebyshev’s Inequality to describe any data set
ο‚·
How to compute and interpret the coefficient of variation
13