Transcript
```
Welcome to
Week 04
College Statistics
http://media.dcnews.ro/image/201109/w670/statistics.jpg
Descriptive Statistics
Averages tell where the data
tends to pile up
Descriptive Statistics
Another good way to describe
data is how spread out it is
Descriptive Statistics
Suppose you are using the mean
“5” to describe each of the
VARIABILITY
IN-CLASS PROBLEMS
For which sample would “5” be
closer to the actual data
values?
VARIABILITY
IN-CLASS PROBLEMS
In other words, for which of
the two sets of data would the
mean be a better descriptor?
Variability
Descriptions of how spread out
our data values are
are called
“Measures of Variability”
Variability
The variability tells how close
to the “average” the sample
data tend to be
Variability
Just like measures of central
tendency, there are several
measures of variability
Variability
Range = max – min
Variability
Interquartile range (symbolized
IQR):
IQR = 3rd quartile – 1st quartile
Variability
“Range Rule of Thumb”
A quick-and-dirty variability
measure (a rough estimate of
the standard deviation):
(Max – Min)/4
Variability
Variance (symbolized s²)
s² = sum of (obs – x̄)² / (n – 1)
Variability
An observation “x” minus the
mean x̄ is called a “deviation”
The variance is sort of an
average (arithmetic mean) of
the squared deviations
Variability
Sums of squared deviations are
used in the formula for a
circle:
r² = (x-h)² + (y-k)²
where r is the radius of the
circle and (h,k) is its center
Variability
OK… so if it’s sort of an
arithmetic mean, how come it is
divided by “n-1” not “n”?
Variability
Every time we estimate
something in the population
using our sample we have used
up a bit of the “luck” that we
had in getting a
representative sample
Variability
To make up for that, we give a
little edge to the opposing side
of the story
Variability
Since a small variability means
our sample arithmetic mean is a
better estimate of the
population mean than a large
variability is, we bump up our
estimate of variability a tad to
make up for it
Variability
Dividing by “n” would give us a
smaller variance than dividing
by “n-1”, so we divide by “n-1”
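For example, a quick check
(assuming the sample 1 1 2 2 3 3,
whose mean is 2 and whose squared
deviations add up to 4, with n = 6):
dividing by n:   4/6 ≈ 0.67
dividing by n-1: 4/5 = 0.8
The “n-1” version is the larger one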
Variability
Why not “n-2”?
Because we have used only 1
estimate to calculate the
variance: x̄
Variability
So, the variance is sort of an
average (arithmetic mean) of
the squared deviations bumped
up a tad to make up for using
an estimate (x̄) of the
population mean (μ)
Variability
Trust me…
Variability
Standard deviation (symbolized
“s” or “std”)
s = √variance
Variability
The standard deviation is the
square root of a (sort of)
average of squared deviations
We’ve used this before in
distance calculations:
d = √((x1−x2)² + (y1−y2)²)
Variability
The range, interquartile range
and standard deviation are in
the same units as the original
data (a good thing)
The variance is in squared units
(which can be confusing…)
Variability
Naturally, the measure of
variability used most often is
the hard-to-calculate one…
… the standard deviation
Variability
Statisticians like it because it
is an average distance of all of
the data from the center – the
arithmetic mean
Variability
Range = max – min
IQR = 3rd quartile – 1st quartile
Range Rule of Thumb =
(max – min)/4
Variance = sum of (obs – x̄)² / (n – 1)
s = √variance
Questions?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is the range?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Range = 3 – 1 = 2
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is the IQR?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Q1 = 1, Median = 2, Q3 = 3
IQR = 3 – 1 = 2
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is the Thumb?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Thumb = (3-1)/4 = 0.5
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is the Variance?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
First find x̄!
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
x̄ = (3+3+2+2+1+1)/6 = 12/6 = 2
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Now calculate the deviations!
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Dev: 1-2=-1 1-2=-1 2-2=0 2-2=0 3-2=1 3-2=1
Variability
What do you get if you add up all
of the deviations?
Data: 1 1 2 2 3 3
Dev: 1-2=-1 1-2=-1 2-2=0 2-2=0 3-2=1 3-2=1
Variability
Zero!
That’s true for ALL deviations
everywhere in all times!
That’s why they are squared in
the sum of squares!
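A short sketch of why the sum is
always zero (writing x̄ for the
sample mean and n for the sample size):
sum of (obs – x̄)
= (sum of obs) – n × x̄
= n × x̄ – n × x̄   since x̄ = (sum of obs)/n
= 0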
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Dev²: (-1)²=1 (-1)²=1 0²=0 0²=0 1²=1 1²=1
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
sum of (obs – x̄)²: 1+1+0+0+1+1 = 4
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
Variance: 4/(6-1) = 4/5 = 0.8
YAY!
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is s?
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
s = √0.8 ≈ 0.89
VARIABILITY
IN-CLASS PROBLEMS
So, for Data: 1 1 2 2 3 3
Range = max – min = 2
IQR = 3rd quartile – 1st quartile = 2
Thumb = (max – min)/4 = 0.5
Variance = sum of (obs – x̄)² / (n – 1) = 0.8
s = √variance ≈ 0.89
Variability
does all this for you???
Questions?
Variability
Just like for n and N
and x̄ and μ there are
population variability symbols,
too!
Variability
Naturally, these are going to
have funny Greek-y symbols
just like the averages …
Variability
The population variance
is “σ²”
called “sigma-squared”
The population standard
deviation is “σ”
called “sigma”
Variability
Again, the sample statistics s²
and s estimate the population
parameters σ² and σ (which are
unknown)
Variability
Some calculators can find x̄, s
and σ for you
(Not recommended for large
data sets – use EXCEL)
Variability
s² vs σ²
Variability
s² is divided by “n-1”
σ² is divided by “n”
Questions?
Variability
Outliers!
They can really affect your
statistics!
OUTLIERS
IN-CLASS PROBLEMS
Suppose we have data:
1 1 1 2 3 5
Suppose we now have data:
1 1 1 2 3 741
Is the mode affected?
OUTLIERS
IN-CLASS PROBLEMS
Original mode: 1
New mode: 1
OUTLIERS
IN-CLASS PROBLEMS
Is the midrange affected?
OUTLIERS
IN-CLASS PROBLEMS
Original midrange: 3
New midrange: 371
OUTLIERS
IN-CLASS PROBLEMS
Is the median affected?
OUTLIERS
IN-CLASS PROBLEMS
Original median: 1.5
New median: 1.5
OUTLIERS
IN-CLASS PROBLEMS
Is the mean affected?
OUTLIERS
IN-CLASS PROBLEMS
Original mean: 2 1/6
New mean: 124 5/6
Outliers!
Do they affect measures of
variability?
OUTLIERS
IN-CLASS PROBLEMS
Is the range affected?
OUTLIERS
IN-CLASS PROBLEMS
Original range: 4
New range: 740
OUTLIERS
IN-CLASS PROBLEMS
Is the interquartile range
affected?
OUTLIERS
IN-CLASS PROBLEMS
Original IQR: 2.5 – 1 = 1.5
New IQR: 1.5
OUTLIERS
IN-CLASS PROBLEMS
Is the variance affected?
OUTLIERS
IN-CLASS PROBLEMS
Original s²: ≈2.57
New s²: ≈91,119.37
OUTLIERS
IN-CLASS PROBLEMS
Is the standard deviation
affected?
OUTLIERS
IN-CLASS PROBLEMS
Original s: ≈1.60
New s: ≈301.86
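Where these numbers come from:
a worked check, assuming the shortcut
sum of (obs – x̄)² = sum of obs² – (sum of obs)²/n
(algebraically the same as the
sum in the variance formula)
Original data: 1 1 1 2 3 5
sum of obs = 13, sum of obs² = 41
sum of (obs – x̄)² = 41 – 13²/6 ≈ 12.83
s² ≈ 12.83/5 ≈ 2.57, s ≈ √2.57 ≈ 1.60
New data: 1 1 1 2 3 741
sum of obs = 749, sum of obs² = 549,097
sum of (obs – x̄)² = 549,097 – 749²/6 ≈ 455,596.83
s² ≈ 455,596.83/5 ≈ 91,119.37, s ≈ √91,119.37 ≈ 301.86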
Questions?
Descriptive Statistics
Last week we got this summary
table from
Excel Descriptive
Statistics
                           Beans        Liquor        Butter           BEQ
Mean                    72,836.8       5,230.8      18,537.5     104,030.2
Standard Error           1,835.5         309.9         593.1       1,528.7
Median                  72,539.0       5,020.0      18,011.3     104,617.2
Mode                        #N/A          #N/A          #N/A          #N/A
Standard Deviation       9,359.4       1,580.2       3,024.1       7,794.8
Sample Variance     87,599,301.8   2,496,988.9   9,145,138.6  60,759,154.8
Kurtosis                    -1.2          -0.2          -1.3          -1.0
Skewness                     0.0           0.1           0.3          -0.1
Range                   32,359.4       6,477.2       9,384.7      27,075.8
Midrange                71,625.3       5,076.6      19,263.4     103,849.2
Minimum                 55,445.6       1,838.0      14,571.0      90,311.3
Maximum                 87,805.0       8,315.2      23,955.7     117,387.1
Sum                  1,893,757.1     136,000.0     481,975.2   2,704,784.1
Count                       26.0          26.0          26.0          26.0
Descriptive Statistics
Which are Measures of Central
Tendency?
Descriptive Statistics
Which are Measures of
Variability?
Questions?
Variability
Ok… swell… but
WHAT DO YOU USE THESE
MEASURES OF VARIABILITY
FOR???
Variability
From last week – THE BEANS!
Bean     Mean   Standard Deviation  Sample Variance  Range  Minimum  Maximum
Moong-L  4.77   0.44                0.19             1.00   4.00     5.00
Moong-W  3.38   0.65                0.42             2.00   2.00     4.00
Moong-D  3.00   0.71                0.50             2.00   2.00     4.00
Black-L  8.23   1.01                1.03             3.00   7.00     10.00
Black-W  5.54   0.78                0.60             3.00   4.00     7.00
Black-D  4.15   0.90                0.81             2.00   3.00     5.00
Cran-L   12.85  1.21                1.47             4.00   10.00    14.00
Cran-W   7.85   0.69                0.47             2.00   7.00     9.00
Cran-D   5.92   0.86                0.74             3.00   4.00     7.00
Lima-L   20.77  1.01                1.03             4.00   19.00    23.00
Lima-W   13.08  1.12                1.24             4.00   11.00    15.00
Lima-D   6.54   1.66                2.77             7.00   4.00     11.00
Fava-L   27.92  1.75                3.08             5.00   26.00    31.00
Fava-W   17.77  1.36                1.86             5.00   15.00    20.00
Fava-D   8.00   2.42                5.83             10.00  5.00     15.00
We wanted to know – could you
use sieves to separate the
beans?
Variability
You could have plotted the
mean measurement for each
bean type:
Variability
This might have helped you tell
whether sieves could separate
the types of beans
Variability
But… beans are not all
“average” – smaller beans might
slip through the holes of the
sieve!
How could you tell if the beans
were totally separable?
Variability
Make a graph that includes not
just the average, but also the
Variability
New Excel Graph:
hi-lo-close
Variability
the labels are followed by the
maximums, then the minimums,
then the means:
Bean     Maximum  Minimum  Mean
Moong-L  5.00     4.00     4.77
Moong-W  4.00     2.00     3.38
Moong-D  4.00     2.00     3.00
Black-L  10.00    7.00     8.23
Black-W  7.00     4.00     5.54
Black-D  5.00     3.00     4.15
Cran-L   14.00    10.00    12.85
Cran-W   9.00     7.00     7.85
Cran-D   7.00     4.00     5.92
Lima-L   23.00    19.00    20.77
Lima-W   15.00    11.00    13.08
Lima-D   11.00    4.00     6.54
Fava-L   31.00    26.00    27.92
Fava-W   20.00    15.00    17.77
Fava-D   15.00    5.00     8.00
Highlight this data
Click “Insert”
Click “Other Charts”
Click the first Stock chart: “Hi-Lo-Close”
Ugly… as usual
…but informative!
Left click the graph area
Click on “Layout”
Enter title and y-axis label:
Click one of the “mean”
markers on the graph
Click “Format Data Series”
the markers
Repeat for the max (top of
black vertical line) and min
(bottom of black vertical line)
TAH DAH!
Which beans can you sieve?
Questions?
How to Lie with Statistics #4
You can probably guess…
It involves using the type of
measure of variability that