Lesson Objectives
• Learn when each measure
of a “typical value” is appropriate.
Also called “central tendency” or “location.”
• Learn when each measure
of “variation” is appropriate.
Also called “scatter” or “dispersion.”
• See how these measures relate to
statistical inference, which will be covered
later in the course.
© Department of ISM, University of Alabama, 1995-2003
M07-Numerical Summaries 1 1
Statistics is the science of
• collecting
• organizing
• summarizing
• interpreting
DATA
for making decisions.
Organize / Summarize
Data
Graphical
Numerical
Key Features of Data Distributions
Shape
Typical Value (covered in this section)
Spread (covered in this section)
Outliers
Measures of Location
Give “middle” or “typical” values
or “central tendency.”
Measures of Variation
Describe “spread” or “scatter”
or “dispersion” in the data.
Measures of Location
1. Mean
the “center of gravity”
of the data (histogram).
Formula for the mean
Sample mean = sum of observations divided by sample size:
$$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
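The formula for the sample mean can be sketched in a few lines of Python. The function name `sample_mean` is an illustrative choice, not something from the course materials; the data are the seven values used in Example 1 of this deck.

```python
def sample_mean(values):
    """Sum of the observations divided by the sample size."""
    if not values:
        raise ValueError("sample_mean needs at least one observation")
    return sum(values) / len(values)


# Seven observations (the Example 1 data from this deck)
data = [14, 18, 20, 12, 24, 15, 14]
x_bar = sample_mean(data)
print(f"sum = {sum(data)}, n = {len(data)}, mean = {x_bar:.2f}")
```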
The mean is ________________
to extreme values (outliers).
2. Median - midpoint of distribution
At least half of the observations
are at
or less than the median,
and at least half are
at or greater than the median.
Note: For n observations,
the median is located at the
$\frac{n+1}{2}$-th observation
in the ordered sample.
Example 1
Data: 14, 18, 20, 12, 24, 15, 14
(n = 7 ⇒ “odd”)

Median is the middle value of the “ordered” data.

At least half the values are at or greater;
at least half are at or lower.

$\frac{7+1}{2}$ = 4th ⇐ location of median
Example 2
Data: 14, 18, 20, 12, 24, 15, 14,
with one value replaced by 94 (outlier)
(n = 7 ⇒ “odd”)

Median is still the middle value.
Median is resistant to outliers.
Original, $\bar{X}$ =
With outlier, $\bar{X}$ =
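A short Python sketch can make the contrast in Example 2 concrete. The slide does not say which value the outlier 94 replaces; the code below assumes it replaces the largest value, 24, which is only a guess for illustration.

```python
from statistics import mean, median

original = [14, 18, 20, 12, 24, 15, 14]

# Assumption (not stated in the source): 94 replaces the largest value, 24.
with_outlier = [94 if x == 24 else x for x in original]

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {mean(data):.2f}, median = {median(data)}")
```

Under that assumption the mean rises by ten (from about 16.7 to about 26.7), while the median stays at 15.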
Example 3
Data: 14, 18, 20, 12, 24, 15, 14, 214
(n = 8 ⇒ “even,” outlier)

Median is the average of the two middle values.
Exactly half the values are greater, half lower.
$\frac{8+1}{2}$ = 4.5th ⇐ location of median
Summary for finding Median
1. Order the data.
2. For odd n, the median is
the center observation.
3. For even n, the median is
the average of the two center
observations.
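The three steps for finding the median can be sketched as a small Python function. The names `median_location` and `sample_median` are illustrative, not from the course; the data are taken from Examples 1 and 3.

```python
def median_location(n):
    """1-based position of the median in the ordered sample: (n + 1) / 2."""
    return (n + 1) / 2


def sample_median(values):
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median needs at least one observation")
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd n, take the center observation
        return ordered[mid]
    # Step 3: even n, average the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [14, 18, 20, 12, 24, 15, 14]        # n = 7, odd
example_3 = [14, 18, 20, 12, 24, 15, 14, 214]   # n = 8, even, with an outlier

for data in (example_1, example_3):
    print(sorted(data), median_location(len(data)), sample_median(data))
```

For Example 1 the location is the 4th ordered value, 15; for Example 3 the location 4.5 falls between the 4th and 5th ordered values, 15 and 18, so the median is 16.5.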
3. Mode - most frequently
occurring number
In a histogram, modal class
is the one having
largest frequency,
i.e., highest bar.
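One way to find the mode in Python is to count occurrences and keep every value that reaches the highest count. This is a minimal sketch; the function name `modes` and the color list are hypothetical, and returning a list is a design choice so that ties are not hidden.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("modes needs at least one observation")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([14, 18, 20, 12, 24, 15, 14]))       # quantitative data from Example 1
print(modes(["red", "blue", "red", "green"]))    # hypothetical categorical data
```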
When should each estimator be used?
What type of variable is it?
• If categorical, use the mode.
  “Average” is meaningless;
  look at “percentages” of occurrences.
• If variable is quantitative,
  first look at a graph:
  - Skewed or outliers? Use median.
  - More or less symmetric? Use mean.
Numerical Summary
Location: Mean, Median, Mode
Variation: Range, Std. Deviation, IQR
Why does variation matter?
Mountain Climbing Rope.
Two suppliers; sample and
test three ropes from each.
“Snap Breaking Strength”
Measures of Variation
1. Range
2. Variance &
Standard Deviation
3. Mean Absolute Deviation (MAD)
4. Interquartile Range (IQR)
1. Range
Highest minus lowest
value in the sample.
Example 4: 3, 4, 1, 7, 4, 5
Range = 7 − 1 = 6
Example 5: 1, 1, 1, 7, 7, 7
Range = 7 − 1 = 6
[Dot plots of Examples 4 and 5, each on an axis from 1 to 7]
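The range calculation for Examples 4 and 5 can be checked with a short Python sketch (the function name `sample_range` is illustrative). It also shows why the range alone can mislead: the two data sets are spread very differently but have the same range.

```python
def sample_range(values):
    """Highest minus lowest value in the sample."""
    return max(values) - min(values)


example_4 = [3, 4, 1, 7, 4, 5]
example_5 = [1, 1, 1, 7, 7, 7]

print("Example 4 range:", sample_range(example_4))
print("Example 5 range:", sample_range(example_5))
```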
Range
Advantage: _________
_________________.
Disadvantage:
_______ most of the data.
______________
to outliers.
2. Variance &
Standard Deviation
How far are the data
from the middle,
on average?
Notation:
Sample Variance = $s^2$
Sample Std. Dev. = $s$
Population Variance = $\sigma^2$
Population Std. Dev. = $\sigma$
Example 4: 3, 4, 1, 7, 4, 5
[Dot plot of the data on an axis from 1 to 7]
Note: The average of the deviations
from the mean will always be zero.
We need to keep the negatives
from canceling the positives.
We can do this by
1. _____________,
2. _____________,
______
_____
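A quick numerical check of the note, using the Example 4 data. The sketch also shows two common ways to stop negatives from canceling positives, squaring and taking absolute values; these match the variance and MAD formulas in this deck, but whether they are exactly what the slide's blanks intend is an assumption.

```python
data = [3, 4, 1, 7, 4, 5]                 # Example 4
x_bar = sum(data) / len(data)             # mean of the data

deviations = [x - x_bar for x in data]    # signed distances from the mean
total = sum(deviations)                   # positives and negatives cancel

squared = [d ** 2 for d in deviations]    # option: square each deviation
absolute = [abs(d) for d in deviations]   # option: take absolute values

print("deviations:", deviations)
print("sum of deviations:", total)
print("sum of squared deviations:", sum(squared))
print("sum of absolute deviations:", sum(absolute))
```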
Equation for Variance:
For a population:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
For a sample:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} \qquad \text{(see page 88)}$$
Equation for Variance:
Example 4 data:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} \qquad \text{(see page 88)}$$
$$= \frac{(3-4)^2 + (4-4)^2 + (1-4)^2 + (7-4)^2 + (4-4)^2 + (5-4)^2}{6-1}$$
$$= \frac{1 + 0 + 9 + 9 + 0 + 1}{5} = \frac{20}{5} = 4.0$$
units?
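The sample and population variance formulas can be sketched directly from their definitions. The function names are illustrative; the only difference between the two is the divisor (n − 1 for a sample, N for a population).

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    if big_n == 0:
        raise ValueError("population_variance needs at least one observation")
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


example_4 = [3, 4, 1, 7, 4, 5]
print("sample variance:", sample_variance(example_4))
print("population variance (if these six values were the whole population):",
      round(population_variance(example_4), 3))
```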
Equations for Variance:
1. $s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$ (see page 88)
2. $s^2 = \dfrac{\sum X_i^2 - n\bar{X}^2}{n-1}$
3. $s^2 = \dfrac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1}$ (see page 90)
Example 4: 3, 4, 1, 7, 4, 5

 X    X²
 3     9
 4    16
 1     1
 7    49
 4    16
 5    25
24   116   (column totals)

$\sum X$ = 24
$\sum X^2$ = 116
$$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1}$$
$$s^2 = \frac{116 - \frac{(24)^2}{6}}{6-1} = 4.0$$
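The computing formula needs only the two column totals, which a short Python sketch makes explicit. The function name `variance_from_sums` is illustrative; the data are from Example 4.

```python
def variance_from_sums(values):
    """Sample variance from the totals: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("variance_from_sums needs at least two observations")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


example_4 = [3, 4, 1, 7, 4, 5]
sum_x = sum(example_4)                     # total of the X column
sum_x2 = sum(x * x for x in example_4)     # total of the X-squared column
s2 = variance_from_sums(example_4)
print(f"sum X = {sum_x}, sum X^2 = {sum_x2}, s^2 = {s2}")
```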
Comments
• Both equations should
give the same answer.
• First is easier when data
and the mean are integers.
• Second is easier for larger
data sets, or when the data are not integers.
• More chance of round-off error
with first equation.
Variance
Advantage: ________________;
________________.
Disadvantages:
Units are _________.
____ resistant to
outliers.
Standard Deviation
$s = \sqrt{s^2} = \sqrt{4.0} = 2.0$
“The square root
of the variance.”
Advantage:
Easier to interpret
than variance,
Units same as data.
3. Mean Absolute Deviation, MAD
$\text{MAD} = \dfrac{\sum |x_i - \mu|}{N}$ for population data
$\text{MAD} = \dfrac{\sum |x_i - \bar{x}|}{n}$ for sample data
(see page 87)
This will be used extensively in OM 300
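A minimal Python sketch of the MAD formula, applied to the Example 4 data. Both versions on the slide divide by the number of observations, so one function covers either case; the function name is illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("mean_absolute_deviation needs at least one observation")
    center = sum(values) / n
    return sum(abs(x - center) for x in values) / n


example_4 = [3, 4, 1, 7, 4, 5]
mad = mean_absolute_deviation(example_4)
print(f"MAD = {mad:.3f}")
```

For these data the absolute deviations are 1, 0, 3, 3, 0, 1, which sum to 8, so MAD = 8/6.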
4. Interquartile Range (IQR)
IQR = $Q_3 - Q_1$

IQR is the range of the
middle 50% of the data.

Observations more than
1.5 IQR’s beyond quartiles
are considered outliers.
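The IQR and the 1.5 IQR rule can be sketched in Python with the standard library. Quartile conventions differ between textbooks and software; this sketch uses the default "exclusive" method of `statistics.quantiles`, which is an assumption rather than necessarily the convention used in the course. The data are from Example 3.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return Q1, Q3, IQR and the observations beyond 1.5 IQRs from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)     # default method="exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, flagged


example_3 = [14, 18, 20, 12, 24, 15, 14, 214]
q1, q3, iqr, flagged = iqr_outliers(example_3)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, outliers = {flagged}")
```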
Statistical Inference
Generalizing from a sample
to a population,
by using a statistic
to estimate
a parameter.
Statistic (from sample)              Parameter (from entire population)
Mean: $\bar{X}$            estimates   ____
Standard deviation: $s$    estimates   ____
Proportion: p              estimates   ____
Statistics
Descriptive
Graphical
Numerical
Example 5:
Estimate the true mean net weight of
16 oz. bags of Golden Flake Potato Chips
with a 95% confidence interval.
Measured weights in ounces:
16.05  16.01  15.92  15.68  16.10
16.01  15.72  15.80  16.21  15.70
15.95  16.24  16.02  15.90  16.07
16.05  16.18  15.45  16.04  16.05
Is the filling machine doing what it should be doing?
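The numerical summaries for the 20 measured weights can be sketched with Python's standard library. This is not the course's own procedure, only an independent check; the quartiles use the default "exclusive" method of `statistics.quantiles`, which is an assumption about convention.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev

weights = [
    16.05, 16.01, 15.92, 15.68, 16.10,
    16.01, 15.72, 15.80, 16.21, 15.70,
    15.95, 16.24, 16.02, 15.90, 16.07,
    16.05, 16.18, 15.45, 16.04, 16.05,
]

n = len(weights)
x_bar = mean(weights)
med = median(weights)
s = stdev(weights)                    # sample standard deviation (divides by n - 1)
se_mean = s / sqrt(n)                 # standard error of the mean
q1, _, q3 = quantiles(weights, n=4)   # quartiles, default "exclusive" method

summary = {
    "Mean": x_bar, "Median": med, "StDev": s, "SE Mean": se_mean,
    "Minimum": min(weights), "Q1": q1, "Q3": q3, "Maximum": max(weights),
}
print(f"{'N':>8}: {n}")
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```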
[Screenshot with callouts: most commonly used features; Session window; Data window; name of worksheet file]
Menu path: “Stat” > “Basic Statistics” > “Display descriptive statistics”
“Session Window” results
Results for: c07 Weight of chips.MTW
Descriptive Statistics: Weights

Variable    N     Mean   Median   TrMean   StDev   SE Mean
Weights    20   15.958   16.015   15.970   0.199     0.045

Variable   Minimum   Maximum       Q1       Q3
Weights     15.450    16.240   15.825   16.065

“Five number” summary
Executing from file: C:\Program Files\MTBWIN\MACROS\Describe.MAC
Descriptive Statistics Graph: Weights
[Graphical summary of Weights, with callouts: histogram with Normal distribution curve superimposed; box plot; “95% Confidence Interval” for the population mean]
A confidence interval gives the
limits of the plausible values
of the true population mean, $\mu$.
Our sample mean was 15.957 oz.
This is less than 16.000.
Should we be concerned?
____, because 16.000 is a
plausible value for the true
population mean.
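A 95% confidence interval for the chip weights can be sketched as follows. The slides do not state how the interval was computed; this sketch assumes the usual t-based interval, mean ± t × s / √n, and hard-codes the t-table value 2.093 for 19 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Assumption: t-based interval; 2.093 is the t-table value for 95% confidence, df = 19.
T_CRIT_95_DF19 = 2.093

weights = [
    16.05, 16.01, 15.92, 15.68, 16.10,
    16.01, 15.72, 15.80, 16.21, 15.70,
    15.95, 16.24, 16.02, 15.90, 16.07,
    16.05, 16.18, 15.45, 16.04, 16.05,
]

n = len(weights)
x_bar = mean(weights)
margin = T_CRIT_95_DF19 * stdev(weights) / sqrt(n)   # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
print("16.000 is inside the interval:", lower <= 16.0 <= upper)
```

Under these assumptions the interval runs from roughly 15.86 to 16.05, so it contains 16.000.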