Section 3.2 - USC Upstate: Faculty

Section 3.2
Measures of Variation
Range
Standard Deviation
Variance
3.2 / 1
The Range
• The range is the difference between the
largest and smallest values of a distribution.
• Example: Find the range:
10, 13, 17, 17, 18
The range = largest value − smallest value
= 18 − 10 = 8
3.2 / 2
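The range can be checked with a short Python sketch. Python is our choice here (the slides themselves use a TI calculator), and the function name `data_range` is hypothetical, introduced only for illustration.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Data from the slide: 10, 13, 17, 17, 18
print(data_range([10, 13, 17, 17, 18]))
```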
The Standard Deviation
The standard deviation is a measure of the
average variation of the data entries from
the mean.
Standard deviation of a sample
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where n = sample size and x̄ = mean of the sample
3.2 / 3
To calculate the standard deviation of a sample
• Calculate the mean of the sample, x̄.
• Find the difference between each entry (x) and the
mean. These differences will add up to zero.
• Square the deviations from the mean.
• Sum the squares of the deviations from the mean.
• Divide the sum by (n − 1) to get the variance, s².
• Take the square root of the variance to get the
standard deviation, s.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
3.2 / 4
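The six steps above translate almost line for line into code. This is a minimal Python sketch, assuming the data fit in a list; the function name `sample_std_dev_steps` is ours, and the demo values 30, 26, 22 are the ones used in the next example.

```python
import math


def sample_std_dev_steps(values):
    """Follow the slide's steps to get the sample variance and standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    mean = sum(values) / n                   # Step 1: the sample mean
    deviations = [x - mean for x in values]  # Step 2: x - mean (sums to zero, up to rounding)
    squared = [d ** 2 for d in deviations]   # Step 3: square each deviation
    total = sum(squared)                     # Step 4: sum of squared deviations
    variance = total / (n - 1)               # Step 5: divide by n - 1
    std_dev = math.sqrt(variance)            # Step 6: square root of the variance
    return variance, std_dev


variance, s = sample_std_dev_steps([30, 26, 22])
print(variance, s)
```

Python's standard-library functions `statistics.variance` and `statistics.stdev` also divide by n − 1, so they should agree with this sketch.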
The Variance
The variance is the square of the standard
deviation.
Variance of a Sample
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
3.2 / 5
Example
Find the standard deviation and variance.
x      x − x̄      (x − x̄)²
30      4           16
26      0            0
22     −4           16
Σx = 78      Sum = 0      Sum = 32
Mean = 78 / 3 = 26
The variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{2} = 16$$
The standard deviation
$$s = \sqrt{16} = 4$$
3.2 / 6
Example
Find the mean, the standard deviation, and the variance.
x      x − x̄      (x − x̄)²
4      −1           1
5       0           0
5       0           0
7       2           4
4      −1           1
Σx = 25      Σ(x − x̄)² = 6
mean = 25 / 5 = 5
3.2 / 7
Example cont.
Mean = 5
$$\text{Standard deviation } s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{6}{4}} = \sqrt{1.5} \approx 1.22$$
$$\text{Variance } s^2 = \frac{6}{4} = 1.5$$
3.2 / 8
Computation Formulas for Sample Variance
and Standard Deviation:
$$\text{Sample variance } s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
$$\text{Sample standard deviation } s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
To find Σx²: square the x values, then add.
To find (Σx)²: sum the x values, then square.
3.2 / 9
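A sketch of the computation formula in Python, applied to the five values 4, 5, 5, 7, 4 used in the example; `sample_variance_computational` is a hypothetical name. It should reproduce s² = 1.5 and s ≈ 1.22.

```python
import math


def sample_variance_computational(values):
    """Sample variance via the computation formula:
    s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(values)                  # Σx: sum the x values
    sum_x2 = sum(x * x for x in values)  # Σx²: square the x values, then add
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [4, 5, 5, 7, 4]
s2 = sample_variance_computational(data)
print(s2, round(math.sqrt(s2), 2))
```

In floating-point arithmetic this shortcut can lose precision when the values are large compared with their spread, so in code the deviation form from the earlier slides is usually the safer choice.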
Use the computing formulas to
find s and s²
x      x²
4      16
5      25
5      25
7      49
4      16
Σx = 25      Σx² = 131
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{131 - \frac{625}{5}}{5 - 1} = \frac{131 - 125}{4} = \frac{6}{4} = 1.5$$
$$s = \sqrt{1.5} \approx 1.22$$
3.2 / 10
Population Mean
$$\text{population mean } \mu = \frac{\sum x}{N}$$
where N = number of data values in the population
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where N = number of data values in the population
3.2 / 11
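To see the difference between dividing by N and by n − 1, this Python sketch treats the five values 4, 5, 5, 7, 4 as a complete population (an assumption made only for illustration). It compares a hand-written σ, under the hypothetical name `population_std_dev`, with the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics


def population_std_dev(values):
    """sigma = sqrt(sum((x - mu)^2) / N), dividing by N rather than n - 1."""
    N = len(values)
    if N == 0:
        raise ValueError("the population must be non-empty")
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


data = [4, 5, 5, 7, 4]
print(population_std_dev(data))  # divide by N = 5: about 1.095
print(statistics.pstdev(data))   # standard-library population form, same value
print(statistics.stdev(data))    # sample form (divide by n - 1): about 1.225
```

The sample form is always a little larger, because dividing by n − 1 instead of N compensates for measuring spread around the sample mean rather than the true population mean.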
Coefficient Of Variation
• The disadvantage of the standard deviation as a
comparative measure of variation is that it
depends on the units of measurement. This
means that it is difficult to use the standard
deviation to compare measurements from
different populations.
• For this reason, statisticians have defined the
coefficient of variation, which expresses the
standard deviation as a percentage of the sample
or population mean.
3.2 / 12
Coefficient Of Variation:
• The coefficient of variation is a measurement of
the relative variability (or consistency) of data.
$$CV = \frac{s}{\bar{x}} \times 100 \quad \text{or} \quad CV = \frac{\sigma}{\mu} \times 100$$
• Notice that the numerator and denominator in the
definition of CV have the same units, so CV itself has no
units of measurement. This gives us the advantage of
being able to directly compare the variability of two
different populations using the coefficient of variation.
3.2 / 13
CV is used to compare
variability or consistency
A sample of newborn infants had a mean weight of
6.2 pounds with a standard deviation of 1 pound.
A sample of three-month-old children had a mean
weight of 10.5 pounds with a standard deviation of
1.5 pounds.
Which (newborns or 3-month-olds) are more variable
in weight?
3.2 / 14
To compare variability,
compare Coefficient of Variation
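• For newborns:
CV = (1 / 6.2) × 100 ≈ 16%
• For 3-month-olds:
CV = (1.5 / 10.5) × 100 ≈ 14%
Higher CV: more
variable
Lower CV: more
consistent
The newborns have the higher CV, so their weights are more variable.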
Use Coefficient of Variation
• You may wish to compare two groups of data, to
answer:
– Which is more consistent?
– Which is more variable?
3.2 / 15
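The newborn comparison can be reproduced with a small Python helper; `coefficient_of_variation` is a hypothetical name, and the inputs are the means and standard deviations given on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) x 100, a unit-free percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


newborns = coefficient_of_variation(1.0, 6.2)       # about 16.13%
three_months = coefficient_of_variation(1.5, 10.5)  # about 14.29%
more_variable = "newborns" if newborns > three_months else "3-month-olds"
print(f"{newborns:.1f}% vs {three_months:.1f}%: {more_variable} are more variable")
```

Because both CVs are percentages of their own means, the comparison is valid even though the two groups have very different average weights.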
Example
A local fishing store sells spinners (a type of fishing lure).
The store has only 8 different types of spinners for sale.
The prices (in dollars) are
2.10 1.95 2.60 2.00 1.85 2.25 2.15 2.25
Find the coefficient of variation
Solution
a. Compute the mean and standard deviation of the
population
μ = $2.14 and σ = $0.22
3.2 / 16
Example cont.
b. Compute the CV of prices and comment on the
meaning of the results.
$$CV = \frac{\sigma}{\mu} \times 100 = \frac{0.22}{2.14} \times 100 \approx 0.1028 \times 100 = 10.28\%$$
The CV can be thought of as a measure of the spread of
the data relative to the average of the data. Since
the fishing store is very small, it carries a small
selection of spinners that are all priced similarly.
The CV tells us that the standard deviation of the
spinner prices is only 10.28% of the mean.
3.2 / 17
Example
A large fishing store in Nebraska has a broad selection
of spinners. The prices of a random sample of 10
spinners are
1.69 1.49 3.09 1.79 1.39 2.89 1.49 1.39 1.49 1.99
a. Use the calculator to compute x̄ and s.
x̄ ≈ $1.87 and s ≈ $0.62
b. Compute the CV for the spinner prices.
$$CV = \frac{s}{\bar{x}} \times 100 = \frac{0.62}{1.87} \times 100 \approx 0.3316 \times 100 = 33.16\%$$
3.2 / 18
Example cont.
Compare the mean, standard deviation, and CV for the
spinner prices at the two fishing stores. Comment on
the differences.
The CV for the Nebraska store is about three times the
CV from the previous example (33.16% versus 10.28%), for two reasons.
First, the fishing store in the previous example is small
and tends to have higher prices (larger μ: $2.14 versus $1.87).
Second, it has a limited selection of spinners with a
smaller variation in price (σ = $0.22 versus s = $0.62).
3.2 / 19
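A Python sketch that recomputes both stores from the raw prices, treating the small store's 8 spinners as a population (σ, dividing by N) and the Nebraska prices as a sample (s, dividing by n − 1), as the slides do. Because the code divides unrounded values, the CVs come out near 10.13% and 33.23% rather than the 10.28% and 33.16% obtained from the rounded μ, σ, x̄, and s; the conclusion (roughly three times) is unchanged.

```python
import statistics

small_store = [2.10, 1.95, 2.60, 2.00, 1.85, 2.25, 2.15, 2.25]          # all 8 spinners: a population
nebraska = [1.69, 1.49, 3.09, 1.79, 1.39, 2.89, 1.49, 1.39, 1.49, 1.99]  # random sample of 10

mu = statistics.mean(small_store)
sigma = statistics.pstdev(small_store)  # population form, divide by N
x_bar = statistics.mean(nebraska)
s = statistics.stdev(nebraska)          # sample form, divide by n - 1

cv_small = sigma / mu * 100
cv_nebraska = s / x_bar * 100
print(f"small store: mu={mu:.2f}, sigma={sigma:.2f}, CV={cv_small:.2f}%")
print(f"Nebraska:    x_bar={x_bar:.2f}, s={s:.2f}, CV={cv_nebraska:.2f}%")
print(f"ratio of CVs: {cv_nebraska / cv_small:.1f}")
```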
Chebyshev’s Theorem
The spread or dispersion of a set of data about the mean
will be small if the standard deviation is small, and it will
be large if the standard deviation is large. If we are
dealing with a symmetrical bell-shaped distribution,
then we can make very definite statements about the
proportion of the data that must lie within a certain
number of standard deviations on either side of the
mean.
However, the concept of data spread about the mean can
be expressed quite generally for all data distributions
(skewed, symmetric, or other shape) by using the
remarkable theorem of Chebyshev.
3.2 / 20
CHEBYSHEV'S THEOREM
For any set of data and for any number k
greater than one, the proportion of the data
that lies within k standard deviations of the
mean is at least:
$$1 - \frac{1}{k^2}$$
3.2 / 21
Results of Chebyshev’s theorem
• For k = 2:
$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = 0.75$$
or at least 75% of the data fall in the interval
from μ − 2σ to μ + 2σ (within 2 standard deviations)
• For k = 3: at least 88.9% (within 3 standard deviations)
• For k = 4: at least 93.8% (within 4 standard deviations)
3.2 / 22
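A one-line Python helper for Chebyshev's bound; `chebyshev_min_proportion` is a hypothetical name. It reproduces the 75%, 88.9%, and 93.8% figures above.

```python
def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the share of data within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%}")
```

These are guaranteed minimums for any distribution; a particular data set usually has a larger share of its values inside each interval.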
Using Chebyshev’s Theorem
• A mathematics class completes an
examination and it is found that the class
mean is 77 and the standard deviation is 6.
• According to Chebyshev's Theorem, between
what two values would at least 75% of the
grades be?
3.2 / 23
Mean = 77
Standard deviation = 6
At least 75% of the grades would be in the
interval:
x̄ − 2s to x̄ + 2s
77 – 2(6) to 77 + 2(6)
77 – 12 to 77 + 12
65 to 89
Assignment 5
3.2 / 24
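The exam calculation in Python; `chebyshev_interval` is a hypothetical helper that returns the interval x̄ − ks to x̄ + ks together with Chebyshev's guaranteed proportion 1 − 1/k².

```python
def chebyshev_interval(mean, std_dev, k):
    """Return (low, high, share): at least `share` of any data lies in [low, high]."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return mean - k * std_dev, mean + k * std_dev, 1 - 1 / k ** 2


low, high, share = chebyshev_interval(77, 6, 2)
print(f"At least {share:.0%} of the grades lie between {low} and {high}")
```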
Entering Data (Calc.)
Data is stored in Lists on the calculator. Locate and press the
STAT button on the calculator. Choose EDIT. The calculator
will display the first three of six lists (columns) for entering
data. Simply type your data and press ENTER. Use your arrow
keys to move between lists.
Data can also be entered from the home screen using set
notation -- {15, 22, 32, 31, 52, 41, 11} → L1 (where → is the
STO key)
• Data can be entered in a second list based upon the
information in a previous list. For example, to double all of
the data values in L1 and store them in L2, arrow up ONTO the
name L2 and enter a formula for generating L2 (here, 2*L1).
The formula will appear at the bottom of the
screen. Press ENTER and the new list is created.
3.2 / 25
Clearing Data (Calc.)
• To clear all data from a list: Press STAT. From the EDIT
menu, move the cursor up ONTO the name of the list
(L1). Press CLEAR. Move the cursor down. NOTE: The list
entries will not disappear until the cursor is moved
down. (Avoid pressing DEL as it will delete the entire
column. If this happens, you can reinstate the column by
pressing STAT #5 SetUpEditor.)
• You may also clear a list by choosing option #4 under the EDIT
menu, ClrList. ClrList will appear on the home screen waiting
for you to enter which list to clear. Enter the name of a list by
pressing the 2nd button and the yellow L1 (above the 1).
• To clear an individual entry: select the value and press DEL.
3.2 / 26
Sorting Data (Calc.)
• Sorting Data: (helpful when finding the mode)
Locate and press the STAT button. Choose option #2, SortA(.
Specify the list you wish to sort by pressing the 2nd button
and the yellow L1 list name. Press ENTER and the list will be
put in ascending order (lowest to highest). SortD( (option #3)
will put the list in descending order.
• One Variable Statistical Calculations:
Press the STAT button. Choose CALC at the top. Select 1-Var
Stats. Notice that you are now on the home screen. Specify
the list you wish to use by choosing the 2nd button and the
list name (for example, L1).
Press ENTER and view the calculations. Use the down arrow
to view all of the information.
3.2 / 27
One Variable Statistical
Calculations (Calc.)
x̄ = the mean
Σx = the sum of the data
Σx² = the sum of the squares of the data
Sx = the sample standard deviation
σx = the population standard deviation
n = the sample size (number of data values)
minX = the smallest data entry
Q1 = data at the first quartile
Med = data at the median (second quartile)
Q3 = data at the third quartile
maxX = the largest data entry
3.2 / 28
Measures of Dispersion (Calc)
Range, Standard Deviation, Variance, Mean Absolute Deviation
• Problem: For the data set {10, 12, 40, 35, 14, 24, 13, 21, 42, 30},
find the range, the standard deviation, the variance, and the mean
absolute deviation to the nearest hundredth.
• A quick reminder before we begin the solution:
In statistics, the population form is used when the data being
analyzed includes the entire set of possible data.
The sample form is used when the data is a random sample taken
from the entire set of data. You should use population form
unless you know that you are working with a random sample of
the data.
3.2 / 29
Measures of Dispersion cont. (Calc)
• To find the range:
Enter the data, as is, into L1. You can enter the list on the
home screen and "store" to L1, or you can go directly to L1
(STAT, #1 Edit).
• Sort the list to quickly retrieve the highest and lowest values
for the range using SortA( (STAT, #2). You can choose ascending
or descending. Read the high and low values from L1 for
computing the range.
Range = 42 − 10 = 32.
• OR: To find the range without sorting, simply type
max(L1) − min(L1) on the home screen, using the min and max
functions found under MATH → NUM #6 min and #7 max.
3.2 / 30
Range = 32
Measures of Dispersion cont. (Calc)
• To find standard deviation: Since this question deals with the
complete set, we will be using "population" form, not sample
form.
• Go to one-variable stats for "population" standard
deviation. STAT → CALC #1 1-Var Stats
• NOTE! The standard deviations found in the CATALOG, stdDev(,
and also found by 2nd LIST → MATH #7 stdDev( are both
sample standard deviations.
• Population Standard Deviation = 11.43
3.2 / 31
Measures of Dispersion cont. (Calc)
To find variance: The "population" variance is the
square of the population standard deviation. The symbol σx is under
VARS - #5 Statistics.
NOTE! The variances found in the CATALOG, variance(, and also found by
2nd LIST → MATH #8 variance( are both sample variances.
Population Variance = 130.69
To find mean absolute deviation: To calculate the mean absolute
deviation you will have to enter the formula
$$\text{Population MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
Mean Absolute Deviation = 10.12
3.2 / 32
Measures of Dispersion cont. (Calc)
NOTE! Be sure that you have run 1-Var Stats (under STAT - CALC #1)
first, so that the calculator will have computed x̄. Otherwise, you
will get an error from this formula.
x̄ and n are found under VARS #5 Statistics. sum( and abs( are
quickly found in CATALOG. sum( is also under 2nd LIST - MATH #5
sum(. abs( is also under MATH - NUM #1 abs(. Together, the formula
is entered on the home screen as sum(abs(L1 − x̄))/n.
OR: To find mean absolute deviation:
A longer, but workable, solution can also be accomplished using
the lists. As stated above, run 1-Var Stats so the calculator will
compute x̄. Now, go to L2 (STAT #1 EDIT) and move UP onto
L2. Type, at the bottom of the window, the portion of the formula
that finds the difference between each data entry and the mean,
using absolute value to make these distances positive: abs(L1 − x̄).
Now, find the mean, x̄, of L2 by using 1-Var Stats on L2, and read
the answer of 10.12.
3.2 / 33
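The calculator results for this problem can be cross-checked in Python with the standard `statistics` module, whose `pstdev` and `pvariance` are the population forms (matching σx). The mean absolute deviation is computed directly from the formula (1/n)Σ|xᵢ − x̄|; this is a sketch for checking answers, not a replacement for the calculator steps.

```python
import statistics

data = [10, 12, 40, 35, 14, 24, 13, 21, 42, 30]

data_range = max(data) - min(data)                   # 42 - 10
sigma = statistics.pstdev(data)                      # population standard deviation
variance = statistics.pvariance(data)                # population variance (sigma squared)
mean = statistics.mean(data)
mad = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation

print(f"range={data_range}, sigma={sigma:.2f}, variance={variance:.2f}, MAD={mad:.2f}")
```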
Measures of Dispersion on Grouped Data
Problem:
Data Entry    Frequency
100           8
150           15
200           21
250           14
300           5
For the data set shown in this table, find the range, the standard
deviation, and the variance to the nearest hundredth.
Since this question deals with the complete set, we will be using
"population" form, not sample form.
For central tendency on grouped data, see Mean, Mode,
Median with Grouped Data.
3.2 / 34
Measures of Dispersion on Grouped Data
• Solution:
• To find the range: No need for calculator work for the range. It is
easily observed from the table.
Range = 300 - 100 = 200.
To find standard deviation: Remember, we are looking for
"population" form which will be found using 1-Var Stats.
• Enter the "Data Entry" into L1 and the "Frequency" into L2. Go to
one-variable stats to find "population" standard deviation.
STAT → CALC #1 1-Var Stats
Be sure to use the parameters L1, L2 (1-Var Stats L1, L2) to indicate
both the values AND their frequencies.
• NOTE! The standard deviation found in the CATALOG, stdDev, and
also found by 2nd LIST → MATH #7 stdDev are both Sample
standard deviations.
3.2 / 35
Population Standard Deviation = 56.42
Measures of Dispersion on Grouped Data
To find variance: The "population" variance is the square of the
population standard deviation. The symbol σx is under VARS - #5
Statistics.
NOTE! The variances found in the CATALOG and also found by
2nd LIST → MATH #8 variance( are both sample variances.
Population Variance = 3183.42
3.2 / 36
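For grouped data, each value is weighted by its frequency, which is what 1-Var Stats L1, L2 does. A minimal Python sketch, assuming the table above is the complete population: it should give N = 63, a range of 200, σ ≈ 56.42, and σ² ≈ 3183.42.

```python
import math

values = [100, 150, 200, 250, 300]
freqs = [8, 15, 21, 14, 5]

N = sum(freqs)                                                        # total number of data values
mu = sum(x * f for x, f in zip(values, freqs)) / N                    # frequency-weighted mean
variance = sum(f * (x - mu) ** 2 for x, f in zip(values, freqs)) / N  # population variance
sigma = math.sqrt(variance)                                           # population standard deviation
data_range = max(values) - min(values)

print(f"N={N}, mean={mu:.2f}, range={data_range}, sigma={sigma:.2f}, variance={variance:.2f}")
```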