Review: Central Measures
Mean, Median and Mode
When do we use mean or median?
• If there are outliers, use the median.
• If there are no outliers, use the mean.
Example:
• For the data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3,
which one is more appropriate?
• For the data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3, 10, 40,
which one is more appropriate?
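The two data sets in the example can be compared numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median

# Data sets copied from the example above.
no_outliers = [1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3]
with_outliers = no_outliers + [10, 40]  # same data plus the values 10 and 40

for label, data in [("without 10 and 40", no_outliers),
                    ("with 10 and 40", with_outliers)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

Adding 10 and 40 moves the mean from about 1.97 to about 5.81, while the median only moves from 1.85 to 2.1, which is why the median is preferred when outliers are present.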
Relationship between the central measures
[Figure: mean, median and mode to be matched to the positions labeled A, B and C]
Agresti/Franklin Statistics, 1 of 25
Section 2.4
Measuring Spread
How Can We Describe the Spread of
Quantitative Data?
Range: difference between the largest
and smallest observations.
IQR = Interquartile Range
= 3rd quartile − 1st quartile
• IQR is robust to outliers since it is the
difference of two quartiles, each of which is the
median of one half of the data.
Standard deviation
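To illustrate how the range and the IQR react to outliers, here is a minimal sketch in Python. It assumes the "median of each half" convention for quartiles (the middle observation is left out of both halves when n is odd); other textbooks and software use slightly different conventions. The function names are hypothetical, and the data are the two sets from the review example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for odd n the middle observation belongs to neither half.
    Needs at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

def data_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range: 3rd quartile minus 1st quartile."""
    q1, q3 = quartiles(data)
    return q3 - q1

base = [1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3]
extended = base + [10, 40]  # same data with two outliers added

for label, data in [("base", base), ("with outliers", extended)]:
    print(f"{label}: range = {data_range(data):.1f}, IQR = {iqr(data):.1f}")
```

Under this convention the range grows from 2 to 39 when the outliers are added, while the IQR only changes from 1.0 to about 1.3.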
Identify the minimum and maximum
sugar values:
a. 2 and 14
b. 1 and 3
c. 1 and 15
d. 0 and 16
[Figure: dot plot of the sugar values]
Mean = 8.8, median = 10, Q1 = 3, Q3 = 12
Min = 1
Max = 15
Range = 15 − 1 = 14
Standard Deviation
• Creates a measure of variation by
summarizing the deviations of each
observation from the mean and
calculating an adjusted average of these
deviations
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Sample Standard Deviation
(Shortcut Formula)
$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}}$$
Formula 2-5
Example: Publix check-out
waiting times in minutes
Data: 1, 4, 10. Find the sample mean
and sample standard deviation.
x          x − x̄          (x − x̄)²
1          1 − 5 = −4      16
4          −1              1
10         5               25
∑x = 15                    ∑(x − x̄)² = 42
n = 3
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{3} = 5.0 \text{ min}$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{42}{3 - 1}} = \sqrt{21} = 4.6 \text{ min}$$
Example: Publix check-out
waiting times in minutes
Data: 1, 4, 10. Find the sample mean
and sample standard deviation
Using the shortcut formula:
$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{3(117) - (15)^2}{3(3 - 1)}} = \sqrt{\frac{351 - 225}{6}} = \sqrt{\frac{126}{6}} = \sqrt{21} = 4.6 \text{ min}$$
Standard Deviation Key Points
• The standard deviation is a measure of variation of
all values from the mean
• The value of the standard deviation s is usually
positive and always non-negative.
• The value of the standard deviation s can increase
dramatically with the inclusion of one or more
outliers (data values far away from all others)
• The units of the standard deviation s are the same as
the units of the original data values
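The Publix calculation can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the extra waiting time of 40 minutes is an invented value used only to illustrate the key point about outliers.

```python
from math import sqrt
from statistics import stdev

def sd_definition(data):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(data)
    xbar = sum(data) / n
    return sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def sd_shortcut(data):
    """Sample standard deviation from the shortcut formula."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

waits = [1, 4, 10]  # waiting times in minutes, from the example
print(f"definition: {sd_definition(waits):.2f} min")
print(f"shortcut:   {sd_shortcut(waits):.2f} min")
print(f"statistics.stdev: {stdev(waits):.2f} min")

# Hypothetical outlier, not part of the original data.
waits_with_outlier = waits + [40]
print(f"with a 40-minute wait: {sd_definition(waits_with_outlier):.2f} min")
```

Both formulas and the library function give the square root of 21, about 4.58 minutes, which the slides round to 4.6. With the single invented outlier, s rises to roughly 17.9 minutes.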
Empirical Rule
For bell-shaped data sets:
• Approximately 68% of the observations fall
within 1 standard deviation of the mean
• Approximately 95% of the observations fall
within 2 standard deviations of the mean
• Approximately 99.7% (nearly all) of the observations fall
within 3 standard deviations of the mean
Parameter and Statistic
• A parameter is a numerical summary of
the population
• A statistic is a numerical summary of a
sample taken from a population
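The empirical rule and the distinction between a parameter and a statistic can both be illustrated by simulation. The sketch below assumes a hypothetical bell-shaped (normal) population whose parameters, mean 75 and standard deviation 10, are chosen only for illustration. It draws a large sample, computes the sample statistics, and counts how many observations fall within 1, 2 and 3 sample standard deviations of the sample mean.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Parameters: summaries of the (hypothetical) population.
MU, SIGMA = 75, 10

# A sample drawn from that population.
sample = [random.gauss(MU, SIGMA) for _ in range(100_000)]

# Statistics: summaries computed from the sample.
xbar = fmean(sample)
s = stdev(sample)
print(f"parameters: mu = {MU}, sigma = {SIGMA}")
print(f"statistics: x-bar = {xbar:.2f}, s = {s:.2f}")

def share_within(k):
    """Proportion of the sample within k sample standard deviations of x-bar."""
    return sum(abs(x - xbar) <= k * s for x in sample) / len(sample)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The statistics land close to, but not exactly on, the parameters, and the three proportions come out near 68%, 95% and nearly 100%.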
Five summary statistics
• Minimum = 1
• 1st quartile = 3
• Median = 10
• 3rd quartile = 12
• Maximum = 15
Boxplot
• A box is constructed from Q1 to Q3
• A line is drawn inside the box at the median
• A line extends outward from the lower end of
the box to the smallest observation that is not
a potential outlier
• A line extends outward from the upper end of
the box to the largest observation that is not a
potential outlier
[Figure: Boxplot of SUGARg; vertical axis SUGARg from 0 to 16, with min, Q1, Q2 = median, mean, Q3 and max marked]
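The slides do not say how a "potential outlier" is identified when drawing the whiskers. A common convention, assumed here, is the 1.5 × IQR criterion: an observation is a potential outlier if it lies more than 1.5 × IQR below Q1 or above Q3. A minimal sketch using the five summary statistics of the sugar data:

```python
# Five-number summary reported for the sugar data.
minimum, q1, med, q3, maximum = 1, 3, 10, 12, 15

iqr = q3 - q1  # interquartile range

# Assumption: the 1.5 x IQR criterion, which the slides do not state.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_potential_outlier(x):
    """True if x lies beyond either fence."""
    return x < lower_fence or x > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("minimum flagged:", is_potential_outlier(minimum))
print("maximum flagged:", is_potential_outlier(maximum))
```

Under this assumption the fences are −10.5 and 25.5, so neither the minimum nor the maximum is flagged and the whiskers extend all the way to 1 and 15.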
Comparison using boxplots
Example: Your company makes plastic pipes, and you are
concerned about the consistency of their diameters. You
measure ten pipes a week for three weeks. Create a
boxplot to examine the distributions.
1 Open the worksheet PIPE.MTW.
2 Choose Graph > Boxplot or Stat > EDA > Boxplot.
3 Under Multiple Y's, choose Simple. Click OK.
4 In Graph Variables, enter 'Week 1' 'Week 2' 'Week 3'.
Click OK.
Graph window output
[Figure: Boxplot of Week 1, Week 2, Week 3; vertical axis Data (4 to 9), one box each for Week 1, Week 2 and Week 3]
[Figure labels: Skewed to the right, Symmetric, Skewed to the left]
Interpreting the results
Tip: To see precise information for Q1, median, Q3, interquartile
range, whiskers, and N, hover your cursor over any part of the
boxplot.
The boxplot shows:
• Week 1 median is 4.985, and the interquartile range is 4.4525
to 5.5575.
• Week 2 median is 5.275, and the interquartile range is 5.08 to
5.6775. An outlier appears at 7.0.
• Week 3 median is 5.43, and the interquartile range is 4.99 to
6.975. The data are positively skewed.
Conclusion: The medians for the three weeks are similar.
However, during Week 2, an abnormally wide pipe was created,
and during Week 3, several abnormally wide pipes were
created.
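The reported quartiles are enough to check two of these statements. The sketch below assumes the 1.5 × IQR criterion for flagging outliers, which the slides do not state. It computes each week's IQR as Q3 − Q1, the upper fence, and the distance from the median to each quartile as a rough indication of skew.

```python
# (Q1, median, Q3) for each week, as reported above.
weeks = {
    "Week 1": (4.4525, 4.985, 5.5575),
    "Week 2": (5.08, 5.275, 5.6775),
    "Week 3": (4.99, 5.43, 6.975),
}

def summarize(q1, med, q3):
    """Return the IQR and the upper fence (assumed 1.5 x IQR criterion)."""
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    return iqr, upper_fence

for name, (q1, med, q3) in weeks.items():
    iqr, upper_fence = summarize(q1, med, q3)
    print(f"{name}: IQR = {iqr:.4f}, upper fence = {upper_fence:.4f}, "
          f"Q1 to median = {med - q1:.4f}, median to Q3 = {q3 - med:.4f}")
```

For Week 2 the upper fence is about 6.57, so the observation at 7.0 lies beyond it, consistent with the reported outlier. For Week 3 the median sits much closer to Q1 than to Q3, consistent with positive skew.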
Z-Score
• The z-score for an observation measures how far
an observation is from the mean in standard
deviation units
$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
• An observation in a bell-shaped distribution is a
potential outlier if its z-score < -3 or > +3
Example: Converting to z-score
Scores on a test have a mean of 75 and
a standard deviation of 10. Bob has a
score of 90. Convert Bob’s score to a z-score.
Round to the nearest hundredth.
Bob’s z-score = (90 − 75)/10 = 1.50, which
means that Bob’s score is 1.5
standard deviations higher than the
mean.
Inverse problem
If Bob’s score is 1.5 standard deviations
higher than the mean, what is Bob’s
score in the previous problem?
Denote Bob’s score by x;
then 1.5 = (x − 75)/10,
so x = 1.5(10) + 75 = 90.
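Both directions of the z-score conversion can be written as two small functions. This is a minimal sketch; the function names are hypothetical, and the outlier check applies the ±3 rule for bell-shaped distributions.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def from_z_score(z, mean, sd):
    """Inverse problem: recover the observation from its z-score."""
    return z * sd + mean

def is_potential_outlier(z):
    """Rule for bell-shaped distributions: z-score below -3 or above +3."""
    return z < -3 or z > 3

# Test scores with mean 75 and standard deviation 10; Bob scored 90.
bob_z = z_score(90, 75, 10)
bob_score = from_z_score(1.5, 75, 10)

print(f"Bob's z-score: {bob_z:.2f}")
print(f"Score that is 1.5 standard deviations above the mean: {bob_score:.0f}")
print("Potential outlier:", is_potential_outlier(bob_z))
```

The two functions are inverses of each other, so converting 90 to a z-score and back returns 90. Bob's z-score of 1.50 is well inside ±3, so his score is not a potential outlier.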
2.6 How are descriptive
summaries misused? (read)
Figure 2.18, page 75
HW4:
• read section 3.2
• problems 2.57, 2.62, 2.63, 2.65, 2.67, 2.68,
2.69, 2.71, 2.72