Welcome to
Week 03 Tues
MAT135 Statistics
http://media.dcnews.ro/image/201109/w670/statistics.jpg
Review
Types of Statistics
Descriptive statistics –
describe our sample – we’ll use
this to make inferences about
the population
Inferential statistics –
make inferences about the
population with a level of
probability attached
Descriptive Statistics
graphs
n
max
min
each observation
frequencies
Descriptive Statistics
And…
averages!
Measures of Central Tendency
In statistics class, averages are
called “measures of central
tendency”
(where the data “tend to center”)
Measures of Central Tendency
arithmetic mean
(sample x̄, population μ)
x̄ is your best estimate of the
population mean μ
Measures of Central Tendency
The arithmetic mean is the
balance point for a data set
Measures of Central Tendency
3 3 3 3 3 4 5 8 8 8 8 8 9
The median (50th percentile)
is also a measure of central
tendency (aka: average)
Measures of Central Tendency
The mode (the most commonly
occurring observation) is
another!
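As a rough illustration (not part of the original slides), the three averages can be computed with Python's standard `statistics` module. The data are the thirteen values shown with the median slide; the variable names are made up for this sketch.

```python
import statistics

# The thirteen observations shown with the median slide.
data = [3, 3, 3, 3, 3, 4, 5, 8, 8, 8, 8, 8, 9]

mean = statistics.mean(data)        # balance point: sum of values / n
median = statistics.median(data)    # middle (7th) value of the sorted data
modes = statistics.multimode(data)  # every value tied for most frequent

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"modes  = {modes}")
```

Note that this data set has two modes: 3 and 8 each occur five times, which is why the sketch uses `multimode` rather than `mode`.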
Statistics vs Parameters
Statistic    Parameter
n            N
x̄            μ
Questions?
Descriptive Statistics
Averages tell where the data
tends to pile up
Descriptive Statistics
Another good way to describe
data is how spread out it is
Descriptive Statistics
Suppose you are using the mean
“5” to describe each of the
observations in your sample
VARIABILITY
IN-CLASS PROBLEM 5
For which sample would “5” be
closer to the actual data
values?
VARIABILITY
IN-CLASS PROBLEM 5
In other words, for which of
the two sets of data would the
mean be a better descriptor?
VARIABILITY
IN-CLASS PROBLEM 6
For which of the two sets of
data would the mean be a
better descriptor?
Variability
Numbers telling how spread out
our data values are
are called
“Measures of Variability”
Variability
The variability tells how close
to the “average” the sample
data tend to be
Variability
Just like measures of central
tendency, there are several
measures of variability
Variability
Range = R = max – min
Variability
Variance (symbolized s²)
s² = sum of (obs – x̄)² / (n – 1)
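The variance formula can be sketched directly in Python. The function name `sample_variance` is hypothetical (chosen for this example), and the data are the small set used in the in-class problems; the last line compares the result with the standard library's `statistics.variance`, which also divides by n – 1.

```python
import statistics

def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n
    squared_deviations = [(obs - x_bar) ** 2 for obs in observations]
    return sum(squared_deviations) / (n - 1)

data = [1, 1, 2, 2, 3, 3]  # data set from the in-class problems
print(sample_variance(data))
print(statistics.variance(data))  # library version, for comparison
```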
Variability
An observation “x” minus the
mean x̄ is called a “deviation”
The variance is sort of an
average (arithmetic mean) of
the squared deviations
Variability
In algebra, the absolute values
of “deviations” are a measure
of distance
Variability
We square them because it
gets rid of the “+” “-” problem
and has mathematical
advantages over taking absolute
values
Variability
Sums of squared deviations are
used in the formula for a
circle:
r² = (x – h)² + (y – k)²
where r is the radius of the
circle and (h,k) is its center
Variability
OK… so if it’s sort of an
arithmetic mean, how come it is
divided by “n-1” not “n”?
Variability
Every time we estimate
something in the population
using our sample we have used
up a bit of the “luck” that we
had in getting a (hopefully)
representative sample
Variability
To make up for that, we give a
little edge to the opposing side
of the story
Variability
Since a small variability means
our sample arithmetic mean is a
better estimate of the
population mean than a large
variability is, we bump up our
estimate of variability a tad to
make up for it
Variability
Dividing by “n” would give us a
smaller variance than dividing
by “n-1”, so we use “n-1”
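One way to see the “bump” in numbers is to list every possible sample from a tiny population and average the variance estimates. This is only a sketch under stated assumptions: the population `[1, 2, 3]` is made up, samples of size 2 are drawn with replacement, and the helper `variance` is a hypothetical name. Exact fractions are used so no rounding gets in the way.

```python
from fractions import Fraction
from itertools import product

def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / divisor

# Hypothetical tiny population, chosen only for illustration.
population = [1, 2, 3]
N = len(population)
sigma_sq = variance(population, N)  # population variance divides by N

# Every possible ordered sample of size n = 2, drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print("population variance:         ", sigma_sq)
print("average when dividing by n:  ", avg_div_n)
print("average when dividing by n-1:", avg_div_n_minus_1)
```

In this small case, dividing by n gives estimates that average out below the population variance, while dividing by n – 1 averages out to exactly the population variance.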
Variability
Why not “n-2”?
Because we have used only 1
estimate to calculate the
variance: x̄
Variability
So, the variance is sort of an
average (arithmetic mean) of
the squared deviations bumped
up a tad to make up for using
an estimate (x̄) of the
population mean (μ)
Variability
Trust me…
Variability
Standard deviation (symbolized
“s” or “std”)
s = √variance
Variability
The standard deviation is the
square root of an averaged sum
of squared deviations
We’ve used this in algebra class
for distance calculations:
d = √[(x₁ − x₂)² + (y₁ − y₂)²]
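The link to the distance formula can be made concrete with a short sketch. It treats the data as one point and “every value equals the mean” as another point, measures the straight-line distance between them with `math.dist` (Python 3.8 or later), and then scales by √(n – 1). The data set is the one from the in-class problems; the variable names are assumptions for this example.

```python
import math
import statistics

data = [1, 1, 2, 2, 3, 3]
n = len(data)
x_bar = sum(data) / n

# Straight-line distance between the data and a list where every value is the mean.
# This is the square root of the sum of squared deviations.
distance = math.dist(data, [x_bar] * n)

# Dividing by sqrt(n - 1) turns that total distance into the standard deviation.
s = distance / math.sqrt(n - 1)

print(round(distance, 2))
print(round(s, 2))
print(round(statistics.stdev(data), 2))  # library version, for comparison
```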
Variability
The range and standard
deviation are in the same units
as the original data (a good
thing)
The variance is in squared units
(which can be confusing…)
Variability
Naturally, the measure of
variability used most often is
the hard-to-calculate one…
… the standard deviation
Variability
Statisticians like it because it
is an average distance of all of
the data from the center – the
arithmetic mean
Variability
Range = max – min
Variance = sum of (obs – x̄)² / (n – 1)
s = √variance
Questions?
VARIABILITY
IN-CLASS PROBLEM 7
Data: 1 1 2 2 3 3
What is the range?
VARIABILITY
IN-CLASS PROBLEM 7
Data: 1 1 2 2 3 3
Min = 1, Max = 3
Range = 3 – 1 = 2
VARIABILITY
IN-CLASS PROBLEMS
Data: 1 1 2 2 3 3
What is the Variance?
VARIABILITY
IN-CLASS PROBLEM 8
Data: 1 1 2 2 3 3
First find x̄!
VARIABILITY
IN-CLASS PROBLEM 8
Data: 1 1 2 2 3 3
x̄ = (3+3+2+2+1+1) / 6 = 2
VARIABILITY
IN-CLASS PROBLEM 9
Data: 1 1 2 2 3 3
Now calculate the deviations!
VARIABILITY
IN-CLASS PROBLEM 9
Data: 1      1      2     2     3     3
Dev:  1-2=-1 1-2=-1 2-2=0 2-2=0 3-2=1 3-2=1
Variability
What do you get if you add up all
of the deviations?
Data: 1      1      2     2     3     3
Dev:  1-2=-1 1-2=-1 2-2=0 2-2=0 3-2=1 3-2=1
Variability
Zero!
That’s true for ALL deviations
everywhere in all times!
That’s why they are squared in
the sum of squares!
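The claim that deviations from the mean always add up to zero can be checked with a small sketch. The first two data sets come from the in-class problems; the third is made up for illustration, and the function name `sum_of_deviations` is hypothetical. Exact fractions are used so that floating-point rounding cannot hide the zero.

```python
from fractions import Fraction

def sum_of_deviations(observations):
    """Add up (obs - mean) for every observation, using exact fractions."""
    x_bar = Fraction(sum(observations), len(observations))
    return sum(obs - x_bar for obs in observations)

examples = [
    [1, 1, 2, 2, 3, 3],   # from the in-class problems
    [1, 1, 2, 3, 741],    # from the outlier problems
    [10, -4, 7, 0],       # made-up data, for illustration only
]

for observations in examples:
    print(observations, "->", sum_of_deviations(observations))
```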
VARIABILITY
IN-CLASS PROBLEM 10
Data: 1       1       2     2     3     3
Dev²: (-1)²=1 (-1)²=1 0²=0 0²=0 1²=1 1²=1
VARIABILITY
IN-CLASS PROBLEM 11
Data: 1 1 2 2 3 3
sum of (obs – x̄)²: 1+1+0+0+1+1 = 4
VARIABILITY
IN-CLASS PROBLEM 12a,b
Data: 1 1 2 2 3 3
Variance: 4/(6-1) = 4/5 = 0.8
YAY!
VARIABILITY
IN-CLASS PROBLEM 13
Data: 1 1 2 2 3 3
What is s?
VARIABILITY
IN-CLASS PROBLEM 13
Data: 1 1 2 2 3 3
s = √0.8 ≈ 0.89
VARIABILITY
IN-CLASS PROBLEMS
So, for: Data: 1 1 2 2 3 3
Range = max – min = 2
Variance = sum of (obs – x̄)² / (n – 1) = 0.8
s = √variance ≈ 0.89
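The whole worked example can be retraced step by step in a few lines of Python. This is a sketch that mirrors the order of the in-class problems; the variable names are assumptions for this example.

```python
import math

data = [1, 1, 2, 2, 3, 3]
n = len(data)

data_range = max(data) - min(data)            # 3 - 1
x_bar = sum(data) / n                         # 12 / 6
deviations = [obs - x_bar for obs in data]    # each observation minus the mean
squared = [dev ** 2 for dev in deviations]    # squaring removes the signs
sum_of_squares = sum(squared)
variance = sum_of_squares / (n - 1)           # divide by n - 1, not n
s = math.sqrt(variance)

print("range          =", data_range)
print("mean           =", x_bar)
print("deviations     =", deviations)
print("sum of squares =", sum_of_squares)
print("variance       =", variance)
print("s              =", round(s, 2))
```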
Variability
Aren’t you glad Excel
does all this for you???
Questions?
Variability
Just like for n and N
and x̄ and μ there are
population variability symbols,
too!
Variability
Naturally, these are going to
have funny Greek-y symbols
just like the averages …
Variability
The population variance
is “σ²”
called “sigma-squared”
The population standard
deviation is “σ”
called “sigma”
Variability
Again, the sample statistics s²
and s values estimate population
parameters σ² and σ (which are
unknown)
Variability
Some calculators can find
x̄, s and σ for you
(Not recommended for large
data sets – use EXCEL)
Variability
s sq “s²”
vs
sigma sq “σ²”
Variability
s² is divided by “n-1”
σ² is divided by “N”
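Python's standard `statistics` module keeps the same distinction: `variance` and `stdev` divide by n – 1, while `pvariance` and `pstdev` divide by N. The sketch below runs both on the in-class data set, treating the same six values first as a sample and then (purely for comparison) as if they were a whole population.

```python
import statistics

data = [1, 1, 2, 2, 3, 3]

# Treat the six values as a sample: divide by n - 1.
s_sq = statistics.variance(data)
s = statistics.stdev(data)

# Treat the same six values as an entire population: divide by N.
sigma_sq = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print(f"s^2     = {s_sq:.3f}   s     = {s:.3f}")
print(f"sigma^2 = {sigma_sq:.3f}   sigma = {sigma:.3f}")
```

The sample versions come out slightly larger, as expected from the smaller divisor.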
Questions?
Standard Deviation
What does standard deviation
mean?
STANDARD DEVIATION
IN-CLASS PROBLEM 14
Suppose we have two pizza
delivery drivers
We’re going to give one a raise
But who?
STANDARD DEVIATION
IN-CLASS PROBLEM 14
Both have the same mean
delivery time of 15 minutes
but Amanda’s standard
deviation of delivery times =
2.6 minutes
while Bethany’s standard
deviation of delivery times =
8.4 minutes.
STANDARD DEVIATION
IN-CLASS PROBLEM 14
Who should get the raise?
STANDARD DEVIATION
IN-CLASS PROBLEM 15
What are the advantages of
having a data set that has a
small standard deviation?
Questions?
Variability
Outliers!
They can really affect your
statistics!
OUTLIERS
IN-CLASS PROBLEM 16
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Is the mode affected?
OUTLIERS
IN-CLASS PROBLEM 16
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Original mode: 1
New mode: 1
OUTLIERS
IN-CLASS PROBLEM 17
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Is the median affected?
OUTLIERS
IN-CLASS PROBLEM 17
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Original median: 2
New median: 2
OUTLIERS
IN-CLASS PROBLEM 18
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Is the mean affected?
OUTLIERS
IN-CLASS PROBLEM 18
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Original mean: 2.4
New mean: 149.6
Outliers!
How about measures of
variability?
OUTLIERS
IN-CLASS PROBLEM 19
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Is the range affected?
OUTLIERS
IN-CLASS PROBLEM 19
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Original range: 4
New range: 740
OUTLIERS
IN-CLASS PROBLEM 20
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Is the standard deviation
affected?
OUTLIERS
IN-CLASS PROBLEM 20
Suppose we originally had data: 1 1 2 3 5
Suppose we now have data: 1 1 2 3 741
Original s: ≈1.7
New s: ≈330.6
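The outlier comparison can be reproduced in one pass with a short sketch. The helper `describe` is a hypothetical name for this example; it returns the five summaries from the outlier problems, using the standard `statistics` module and rounding s to one decimal place as on the slides.

```python
import statistics

original = [1, 1, 2, 3, 5]
with_outlier = [1, 1, 2, 3, 741]

def describe(observations):
    """Return the five summaries discussed in the outlier problems."""
    return {
        "mode": statistics.mode(observations),
        "median": statistics.median(observations),
        "mean": statistics.mean(observations),
        "range": max(observations) - min(observations),
        "s": round(statistics.stdev(observations), 1),
    }

before = describe(original)
after = describe(with_outlier)
for name in before:
    print(f"{name:>6}: {before[name]:>6} -> {after[name]}")
```

The mode and median do not move, while the mean, range, and standard deviation all change dramatically.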
STANDARD DEVIATION
IN-CLASS PROBLEM 21
What advantages does the
standard deviation have over
the range?
In-class Project
Turn in your classwork!
Don’t forget
your homework
due next class!
See you Thursday!