MEASURES OF SPREAD
Section 3.2
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Objectives
1. Compute the range of a data set
2. Compute the variance of a population and a sample
3. Compute the standard deviation of a population and a sample
4. Approximate the standard deviation with grouped data
5. Use the Empirical Rule to summarize data that are unimodal and approximately symmetric
6. Use Chebyshev’s Inequality to describe a data set
7. Compute the coefficient of variation
OBJECTIVE 1
Compute the range of a data set
The Range
The range of a data set is the difference between the largest value and
the smallest value.
The average monthly temperatures, in degrees Fahrenheit, for San
Francisco are
San Francisco
Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
51   54   55   56   58   60   60   61   63   62   58   52
The range of temperatures is: 63 – 51 = 12.
Although the range is easy to compute, it is not often used in practice.
The reason is that the range involves only two values from the data set:
the largest and smallest.
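As a minimal sketch, the range calculation for the San Francisco temperatures can be reproduced in Python. The variable names `temps` and `data_range` are illustrative choices, not part of the original slides.

```python
# Average monthly temperatures (degrees Fahrenheit) for San Francisco, Jan through Dec.
temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]

# The range uses only the two extreme values of the data set.
data_range = max(temps) - min(temps)
print(data_range)  # 12
```

Because only `max` and `min` enter the calculation, changing any of the other ten values leaves the range unchanged.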
OBJECTIVE 2
Compute the variance of a population and a
sample
Variance
When a data set has a small amount of spread, like the San Francisco
temperatures, most of the values will be close to the mean. When a
data set has a larger amount of spread, more of the data values will be
far from the mean.
The variance is a measure of how far the values in a data set are from
the mean, on the average.
The variance is computed slightly differently for populations and
samples. The population variance is presented first.
Definition: Population Variance
Let $x_1, x_2, x_3, \ldots, x_N$ denote the values in a population of size $N$. Let $\mu$
denote the population mean. The population variance, denoted by $\sigma^2$,
is
Population Variance
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
Example – Population Variance
Compute the population variance for the San Francisco temperatures.
San Francisco
Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
51   54   55   56   58   60   60   61   63   62   58   52
Solution:
Step 1: Compute the population mean $\mu$.
$$\mu = \frac{\sum x_i}{N} = \frac{51+54+55+56+58+60+60+61+63+62+58+52}{12} = 57.5$$
Step 2: For each population value $x_i$ compute $x_i - \mu$. These values are
shown in the second row below.
$x_i$            51     54     55     56     58     60     60     61     63     62     58     52
$x_i - \mu$      -6.5   -3.5   -2.5   -1.5   0.5    2.5    2.5    3.5    5.5    4.5    0.5    -5.5
Example – Population Variance
Step 3: Square the deviations to obtain the quantity $(x_i - \mu)^2$. These values
are shown in the third row.
$x_i$            51     54     55     56     58     60     60     61     63     62     58     52
$x_i - \mu$      -6.5   -3.5   -2.5   -1.5   0.5    2.5    2.5    3.5    5.5    4.5    0.5    -5.5
$(x_i - \mu)^2$  42.25  12.25  6.25   2.25   0.25   6.25   6.25   12.25  30.25  20.25  0.25   30.25
Step 4: Sum the squared deviations to obtain the quantity $\sum (x_i - \mu)^2$.
$$\sum (x_i - \mu)^2 = 42.25 + 12.25 + 6.25 + 2.25 + 0.25 + 6.25 + 6.25 + 12.25 + 30.25 + 20.25 + 0.25 + 30.25 = 169$$
Step 5: Divide the sum obtained in Step 4 by the population size $N$ to obtain
the population variance $\sigma^2$.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{169}{12} = 14.083$$
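The steps of the population variance example can be sketched in Python as follows. This is only an illustration of the arithmetic above; the variable names are assumptions made for the sketch.

```python
# San Francisco average monthly temperatures, treated as a population.
temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]

N = len(temps)
mu = sum(temps) / N                      # population mean
deviations = [x - mu for x in temps]     # x_i - mu for each value
squared = [d ** 2 for d in deviations]   # (x_i - mu)^2 for each value
total = sum(squared)                     # sum of squared deviations
pop_var = total / N                      # divide by the population size N

print(mu)                 # 57.5
print(total)              # 169.0
print(round(pop_var, 3))  # 14.083
```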
Sample Variance
When the data values come from a sample rather than a population, the
variance is called the sample variance. The procedure for computing
the sample variance is a bit different from the one used to compute a
population variance. In the formula, the mean $\mu$ is replaced by the
sample mean $\bar{x}$ and the denominator is $n - 1$ instead of $N$. The sample
variance is denoted by $s^2$.
Sample Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Why Divide by $n - 1$?
When computing the sample variance, we use the sample mean to
compute the deviations. For the population variance we use the
population mean for the deviations.
It turns out that the deviations using the sample mean tend to be a bit
smaller than the deviations using the population mean. If we were to
divide by $n$ when computing a sample variance, the value would tend to
be a bit smaller than the population variance.
It can be shown mathematically that the appropriate correction is to
divide the sum of the squared deviations by $n - 1$ rather than $n$.
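One way to see the effect of the divisor is to list every possible sample and average the results. The sketch below is an illustration, not part of the original slides: it treats the twelve San Francisco temperatures as the population and assumes samples of size 2 drawn with replacement, so that all 144 ordered samples are equally likely.

```python
from itertools import product

# Population: San Francisco average monthly temperatures.
population = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]
N = len(population)
mu = sum(population) / N
pop_var = sum((x - mu) ** 2 for x in population) / N

# Assumption for this sketch: samples of size n = 2, drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

# Average the two candidate estimates over every possible sample.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(round(pop_var, 3))            # 14.083
print(round(avg_div_n, 3))          # 7.042
print(round(avg_div_n_minus_1, 3))  # 14.083
```

Under these assumptions, dividing by $n$ gives an average that falls below the population variance, while dividing by $n - 1$ gives an average that matches it.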
Example – Sample Variance
A company that manufactures batteries is testing a new type of battery
designed for laptop computers. They measure the lifetimes, in hours, of
six batteries, and the results are 3, 4, 6, 5, 4, 2. Find the sample
variance of the lifetimes.
Solution:
The sample mean is $\bar{x} = \frac{3+4+6+5+4+2}{6} = 4$.
The sample variance is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3-4)^2 + (4-4)^2 + (6-4)^2 + (5-4)^2 + (4-4)^2 + (2-4)^2}{6 - 1} = \frac{10}{5} = 2$$
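The battery example can be checked with a short Python sketch. It computes the sample variance by hand and then compares the result with `statistics.variance` from the Python standard library, which also uses the $n - 1$ divisor. The variable names are illustrative.

```python
import statistics

# Lifetimes, in hours, of the six sampled batteries.
lifetimes = [3, 4, 6, 5, 4, 2]

n = len(lifetimes)
xbar = sum(lifetimes) / n                                      # sample mean
s2_manual = sum((x - xbar) ** 2 for x in lifetimes) / (n - 1)  # divide by n - 1

# Standard-library sample variance, for comparison.
s2_library = statistics.variance(lifetimes)

print(xbar)        # 4.0
print(s2_manual)   # 2.0
print(s2_library)  # 2
```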
OBJECTIVE 3
Compute the standard deviation of a population
and a sample
Standard Deviation
Because the variance is computed using squared deviations, the units
of the variance are the squared units of the data. For example, in the
Battery Lifetime example, the units of the data are hours, and the units
of variance are squared hours. In most situations, it is better to use a
measure of spread that has the same units as the data.
We do this simply by taking the square root of the variance. This
quantity is called the standard deviation. The standard deviation of a
sample is denoted 𝑠, and the standard deviation of a population is
denoted by 𝜎.
Sample Standard Deviation
$$s = \sqrt{s^2}$$
Population Standard Deviation
$$\sigma = \sqrt{\sigma^2}$$
Example – Standard Deviation
Example:
The population variance of temperatures in San Francisco is $\sigma^2 = 14.083$. Find
the population standard deviation.
Solution:
The population standard deviation is $\sigma = \sqrt{\sigma^2} = \sqrt{14.083} = 3.753$.
Example:
The variance of the lifetimes for a sample of six batteries is $s^2 = 2$. Find the
sample standard deviation.
Solution:
The sample standard deviation is $s = \sqrt{s^2} = \sqrt{2} = 1.414$.
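As a rough check of both examples, the sketch below takes square roots of the variances in Python. It uses `statistics.pvariance` (divisor $N$) for the temperatures and `statistics.variance` (divisor $n - 1$) for the battery lifetimes; the variable names are illustrative.

```python
import math
import statistics

temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]  # population
lifetimes = [3, 4, 6, 5, 4, 2]                            # sample

# Standard deviation = square root of the corresponding variance.
sigma = math.sqrt(statistics.pvariance(temps))
s = math.sqrt(statistics.variance(lifetimes))

print(round(sigma, 3))  # 3.753
print(round(s, 3))      # 1.414
```

The standard library also provides `statistics.pstdev` and `statistics.stdev`, which return the population and sample standard deviations directly.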
Standard Deviation on the TI-84 PLUS
The following steps will compute the standard deviation for both sample
data and population data on the TI-84 PLUS Calculator:
Enter the data into L1 in the data editor.
Run the 1-Var Stats command (the same command
used for means and medians), selecting L1 as the
location of the data.
Standard Deviation and Resistance
Recall that a statistic is resistant if its value is not affected much by extreme
values (large or small) in the data set.
The standard deviation is not resistant.
That is, the standard deviation is affected by extreme values.
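To illustrate the lack of resistance, the sketch below compares the battery lifetimes from the earlier example with a modified copy in which one value is replaced by an extreme one. The modified data set (with the value 20) is hypothetical and was made up for this illustration only.

```python
import statistics

# Battery lifetimes from the sample variance example.
lifetimes = [3, 4, 6, 5, 4, 2]

# Hypothetical data: the last lifetime (2) is replaced by an extreme value (20).
with_extreme = [3, 4, 6, 5, 4, 20]

sd_original = statistics.stdev(lifetimes)
sd_extreme = statistics.stdev(with_extreme)

print(round(sd_original, 3))  # 1.414
print(round(sd_extreme, 3))   # 6.45
```

A single extreme value makes the standard deviation more than four times larger in this hypothetical case.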
OBJECTIVE 4
Approximate the standard deviation using
grouped data
Approximating the Standard Deviation
Sometimes we don’t have access to the raw data in a data set, but we are given a frequency
distribution. In these cases we can approximate the standard deviation using the following steps.
Step 1: Compute the midpoint of each class and approximate the mean of the frequency
distribution.
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint – Mean).
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint – Mean)²,
and multiply by the frequency to obtain (Midpoint – Mean)² × (Frequency).
Step 4: Add the products (Midpoint – Mean)² × (Frequency) over all classes.
Step 5: To compute the population variance, divide the sum obtained in Step 4 by $n$. To
compute the sample variance, divide the sum obtained in Step 4 by $n - 1$.
Step 6: Take the square root of the variance obtained in Step 5. The result is the standard
deviation.
Example
The following table presents the number of text messages sent via cell
phone by a sample of 50 high school students. Approximate the sample
standard deviation of the number of messages sent.
Number of Text Messages Sent    Frequency
0 – 49                          10
50 – 99                         5
100 – 149                       13
150 – 199                       11
200 – 249                       7
250 – 299                       4
Solution
Step 1: Compute the midpoint of each class. Recall from the last
section that the sample mean was computed as 137.
Number of Text Messages Sent    Class Midpoint
0 – 49                          25
50 – 99                         75
100 – 149                       125
150 – 199                       175
200 – 249                       225
250 – 299                       275
Solution
Step 2: For each class, subtract the mean from the class midpoint to
obtain (Midpoint – Mean).
Number of Text Messages Sent    Class Midpoint    (Midpoint – Mean)
0 – 49                          25                -112
50 – 99                         75                -62
100 – 149                       125               -12
150 – 199                       175               38
200 – 249                       225               88
250 – 299                       275               138
Solution
Step 3: For each class, square the differences obtained in Step 2 to
obtain (Midpoint – Mean)², and multiply by the frequency to
obtain (Midpoint – Mean)² × (Frequency).
Number of Text Messages Sent    Frequency    (Midpoint – Mean)    (Midpoint – Mean)² × (Frequency)
0 – 49                          10           -112                 125,440
50 – 99                         5            -62                  19,220
100 – 149                       13           -12                  1,872
150 – 199                       11           38                   15,884
200 – 249                       7            88                   54,208
250 – 299                       4            138                  76,176
Solution
Step 4: Add the products (Midpoint – Mean)² × (Frequency) over all
classes.
$$\sum (\text{Midpoint} - \text{Mean})^2 \times \text{Frequency} = 125{,}440 + 19{,}220 + 1{,}872 + 15{,}884 + 54{,}208 + 76{,}176 = 292{,}800$$
Solution
Step 5: Since we are computing the sample variance, we divide the
sum obtained in Step 4 by $n - 1$.
$$s^2 = \frac{\sum (\text{Midpoint} - \text{Mean})^2 \times \text{Frequency}}{n - 1} = \frac{292{,}800}{50 - 1} = 5975.51020$$
Step 6: Take the square root of the variance to obtain the standard
deviation.
$$s = \sqrt{s^2} = \sqrt{5975.51020} = 77.30142$$
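The six steps for grouped data can be sketched in Python using the class midpoints and frequencies from the text message example. This is a minimal illustration; the variable names are assumptions, and the mean is recomputed from the midpoints rather than taken from the previous section.

```python
import math

# Class midpoints and frequencies from the text message example.
midpoints = [25, 75, 125, 175, 225, 275]
frequencies = [10, 5, 13, 11, 7, 4]

n = sum(frequencies)  # sample size

# Step 1: approximate the mean from the midpoints and frequencies.
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Steps 2 and 3: (Midpoint - Mean)^2 x Frequency for each class.
products = [(m - mean) ** 2 * f for m, f in zip(midpoints, frequencies)]

# Step 4: add the products over all classes.
total = sum(products)

# Step 5: sample variance, so divide by n - 1.
sample_var = total / (n - 1)

# Step 6: the standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_var)

print(mean)                  # 137.0
print(total)                 # 292800.0
print(round(sample_var, 4))  # 5975.5102
print(round(sample_sd, 4))   # 77.3014
```

For a population, the only change would be to divide `total` by `n` instead of `n - 1` in Step 5.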
Grouped Data on the TI-84 PLUS
The same procedure used to compute the mean for grouped data in a
frequency distribution may be used to compute the standard deviation.
Enter the midpoint for each class into L1 and the corresponding frequencies into
L2. Next, select the 1-Var Stats command and enter L1 in the List field and
L2 in the FreqList field, if using Stats Wizards. If you are not using Stats
Wizards, you may run the 1-Var Stats command followed by L1, comma, L2.
Example
Class Midpoint    Frequency
25                10
75                5
125               13
175               11
225               7
275               4
The output for the last example on the TI-84
PLUS Calculator is presented below.
The value of s represents the approximate
sample standard deviation. In this example
s = 77.30142. Therefore the approximate
standard deviation is 77.30142.
OBJECTIVE 5
Use the Empirical Rule to summarize data that
are unimodal and approximately symmetric
Bell-Shaped Histogram
Many histograms have a single mode near the center of the data, and
are approximately symmetric. Such histograms are often referred to as
bell-shaped.
The Empirical Rule
When a data set has a bell-shaped histogram, it is often possible to use the standard
deviation to provide an approximate description of the data using a rule known as The
Empirical Rule.
When a population has a histogram that is approximately bell-shaped, then:
• Approximately 68% of the data will be within one standard deviation of the mean.
• Approximately 95% of the data will be within two standard deviations of the mean.
• All, or almost all, of the data will be within three standard deviations of the mean.
Example – The Empirical Rule
Example:
The following table presents the U.S. Census Bureau projection for the percentage of the
population aged 65 and over for each state and the District of Columbia. Use the
Empirical Rule to describe the data.
14.1  14.1  12.3  13.1  14.3  13.3  14.1  12.2  14.4  14.3
15.3  12.4  17.8  16.0  13.0  15.0  12.0  8.1   13.6  12.6
14.9  11.5  10.5  13.6  12.6  14.1  12.4  13.7  13.7  10.2
13.5  15.5  12.8  12.4  13.9  14.6  13.8  13.4  10.7  9.0
13.7  15.6  11.5  12.2  12.4  12.8  14.3  14.0  13.8  13.9
12.7
Solution:
We first note that the histogram is approximately bell-shaped, and we
may use the TI-84 PLUS calculator, or other technology, to
compute the population mean and standard deviation.
Mean: $\mu = 13.249$
Standard Deviation: $\sigma = 1.6827$
Example – The Empirical Rule
Solution (continued):
We compute the following:
$\mu - \sigma = 13.249 - 1.6827 = 11.57$
$\mu + \sigma = 13.249 + 1.6827 = 14.93$
Approximately 68% of the data values are between these.
$\mu - 2\sigma = 13.249 - 2(1.6827) = 9.88$
$\mu + 2\sigma = 13.249 + 2(1.6827) = 16.61$
Approximately 95% of the data values are between these.
$\mu - 3\sigma = 13.249 - 3(1.6827) = 8.20$
$\mu + 3\sigma = 13.249 + 3(1.6827) = 18.30$
Almost all of the data values are between these.
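The interval endpoints in this example can be reproduced with a short Python sketch. It takes the mean and standard deviation as given in the solution rather than recomputing them from the data, and the names `mu`, `sigma`, and `intervals` are illustrative.

```python
# Population mean and standard deviation as reported in the example.
mu = 13.249
sigma = 1.6827

# Endpoints mu - k*sigma and mu + k*sigma for k = 1, 2, 3, rounded to two decimals.
intervals = {
    k: (round(mu - k * sigma, 2), round(mu + k * sigma, 2))
    for k in (1, 2, 3)
}

labels = {1: "approximately 68%", 2: "approximately 95%", 3: "almost all"}
for k, (low, high) in intervals.items():
    print(f"Within {k} standard deviation(s): {low} to {high} ({labels[k]})")
```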
OBJECTIVE 6
Use Chebyshev’s Inequality to describe a data
set
Any Data Set
When a distribution is bell-shaped, we use The Empirical Rule to
approximate the proportion of data within one or two standard
deviations. Another rule called Chebyshev’s Inequality holds for any
data set.
Chebyshev’s Inequality
In any data set, the proportion of the data that is within $K$ standard deviations
of the mean is at least $1 - 1/K^2$ (for any $K > 1$). Specifically, by setting $K = 2$ or $K = 3$, we
obtain the following results.
• At least 3/4, or 75%, of the data are within two standard deviations of the mean.
• At least 8/9, or 89%, of the data are within three standard deviations of the mean.
Example – Chebyshev’s Inequality
Example:
As part of a public health study, systolic blood pressure was measured for a large group
of people. The mean was 120 and the standard deviation was 10. What information
does Chebyshev’s Inequality provide about these data?
Solution:
We compute the following:
$\bar{x} - 2s = 120 - 2(10) = 100$
$\bar{x} + 2s = 120 + 2(10) = 140$
$\bar{x} - 3s = 120 - 3(10) = 90$
$\bar{x} + 3s = 120 + 3(10) = 150$
We conclude:
• At least 3/4 (75%) of the people had systolic blood pressures between 100 and 140.
• At least 8/9 (89%) of the people had systolic blood pressures between 90 and 150.
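The blood pressure example can be sketched in Python as follows. The helper name `chebyshev_lower_bound` is a hypothetical name chosen for this illustration, and restricting it to values greater than 1 is a choice made here because the bound $1 - 1/K^2$ gives no useful information otherwise.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1 for an informative bound")
    return 1 - 1 / k ** 2

# Mean and standard deviation of systolic blood pressure from the example.
mean, sd = 120, 10

results = {}
for k in (2, 3):
    low, high = mean - k * sd, mean + k * sd
    results[k] = (low, high)
    bound = chebyshev_lower_bound(k)
    print(f"At least {bound:.0%} of values lie between {low} and {high}")
```

Unlike the Empirical Rule, this bound does not require the histogram to be bell-shaped, which is why the guaranteed percentages are smaller.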
OBJECTIVE 7
Compute the coefficient of variation
Coefficient of Variation
The coefficient of variation (CV for short) tells how large the standard
deviation is relative to the mean. It can be used to compare the spreads
of data sets whose values have different units.
The coefficient of variation is found by dividing the standard deviation
by the mean.
$$\text{CV} = \frac{\sigma}{\mu}$$
Example – Coefficient of Variation
Example:
National Weather Service records show that over a thirty-year period, the annual
precipitation in Atlanta, Georgia, had a mean of 49.8 inches with a standard deviation of
7.6 inches, and the annual temperature had a mean of 62.2 degrees Fahrenheit with a
standard deviation of 1.3 degrees. Compute the coefficient of variation for precipitation
and for temperature. Which has greater spread relative to its mean?
Solution:
We compute the following:
$$\text{CV for precipitation} = \frac{\text{standard deviation for precipitation}}{\text{mean precipitation}} = \frac{7.6}{49.8} = 0.15$$
$$\text{CV for temperature} = \frac{\text{standard deviation for temperature}}{\text{mean temperature}} = \frac{1.3}{62.2} = 0.02$$
The CV for precipitation is larger than the CV for temperature. Therefore, precipitation
has a greater spread relative to its mean.
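The comparison above can be sketched in Python. The function name `coefficient_of_variation` is an illustrative choice, not something defined in the slides.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean."""
    return sd / mean

# Atlanta precipitation (inches) and temperature (degrees Fahrenheit) from the example.
cv_precip = coefficient_of_variation(7.6, 49.8)
cv_temp = coefficient_of_variation(1.3, 62.2)

print(round(cv_precip, 2))  # 0.15
print(round(cv_temp, 2))    # 0.02
print(cv_precip > cv_temp)  # True
```

Because each CV is a ratio of two quantities in the same units, the units cancel, so the two values can be compared even though one data set is in inches and the other in degrees.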
You Should Know…
• How to compute the range of a data set
• The notation for population variance, population standard deviation,
  sample variance, and sample standard deviation
• How to compute the variance and the standard deviation for
  populations and samples
• How to use the TI-84 PLUS calculator to compute the variance and
  standard deviation for populations and samples
• How to approximate the standard deviation for grouped data
• How to use The Empirical Rule to describe a bell-shaped data set
• How to use Chebyshev’s Inequality to describe any data set
• How to compute and interpret the coefficient of variation