Chapter 3
Descriptive Statistics: Numerical Methods
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Descriptive Statistics
3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles and Box-and-Whiskers Displays
Describing Central Tendency
• In addition to describing the shape of a distribution, we want to describe the data set’s central tendency
– A measure of central tendency represents the center or middle of the data
– “Center” means typical or regular in this setting
Parameters and Statistics
• A population parameter is a number calculated from all the population measurements that describes some aspect of the population
• A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample
Measures of Central Tendency
Mean, μ: The average or expected value
Median, Md: The value of the middle point of the ordered measurements
Mode, Mo: The most frequent value
The Mean
Population: X1, X2, …, XN
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample: x1, x2, …, xn
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The Sample Mean
For a sample of size n, the sample mean is defined as
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
and is a point estimate (a one-number estimate) of the population mean μ
• It is the value to expect, on average and in the long run
Example 3.1: The Car Mileage Case
• Example 3.1: Sample mean for the first five car mileages from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26$
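The sample-mean formula can be checked against Example 3.1 with a few lines of Python. This is a minimal sketch. The function name `sample_mean` is illustrative and does not come from the text.

```python
def sample_mean(xs):
    """Sample mean: x-bar = (x1 + x2 + ... + xn) / n."""
    if not xs:
        raise ValueError("need at least one measurement")
    return sum(xs) / len(xs)


# First five car mileages from Table 3.1 (Example 3.1)
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
x_bar = sample_mean(mileages)
print(round(x_bar, 2))  # 31.26
```

The value is rounded only for display. Decimal values such as 30.8 are not stored exactly in binary floating point, so any comparisons should use a small tolerance.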
The Median
The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it
1. If the number of measurements n is odd, the median is the middlemost measurement in the ordering, that is, the ((n + 1)/2)th value in the ordered list.
2. If n is even, the median is the average of the two middlemost measurements in the ordering, that is, the average of the (n/2)th and (n/2 + 1)th values in the ordered list.
Example: Car Mileage Case
• Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
• In order: 30.1, 30.8, 31.6, 31.7, 32.1
• There is an odd number of measurements (n = 5), so the median is the middle one, the (5 + 1)/2 = 3rd value: Md = 31.6
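The odd/even median rules translate directly into code. This sketch uses a hypothetical helper named `median`. Python lists are 0-based, so the 1-based positions from the rules are shifted down by one.

```python
def median(xs):
    """Median Md using the odd/even rules for the ordered list."""
    ordered = sorted(xs)          # arrange in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one measurement")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th value sits at 0-based index n // 2
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([30.8, 31.7, 30.1, 31.6, 32.1]))  # 31.6
print(median([1, 2, 3, 4]))                    # 2.5
```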
The Mode
The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
– Modes are the values that are observed “most typically”
– Sometimes the highest frequency is shared by two or more values
• If there are two modes, the data are bimodal
• If there are more than two modes, the data are multimodal
– When data are in classes, the class with the highest frequency is the modal class
• The tallest bar in the histogram
Suggested Exercise
• Page 122: Exercises 3.3, 3.4
Mean Vs Median
• Data set: 1, 2, 3, 4
Mean = 2.5; median = 2.5
• Data set: 1, 2, 3, 4, 100
Mean = 22; median = 3
Mean Vs Median
• Compared with the mean, the median is resistant to extreme values: adding the single extreme value 100 to the data set 1, 2, 3, 4 pulls the mean from 2.5 up to 22, but moves the median only from 2.5 to 3
Measures of Variation
• Data set 1: 4, 5, 6, 7, 8
• Mean = (4 + 5 + 6 + 7 + 8)/5 = 6
• Data set 2: 1, 4, 6, 8, 11
• Mean = (1 + 4 + 6 + 8 + 11)/5 = 6
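Measures of Variation
• Knowing the measures of central tendency is not enough
• Two distributions can have identical measures of central tendency and still differ in spread: data sets 1 and 2 both have mean 6 and median 6, but the values in data set 2 are much more spread out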
Measures of Variation
Range: Largest minus the smallest measurement
Variance: The average of the squared deviations of all the population measurements from the population mean
Standard Deviation: The square root of the variance
The Range
• Largest minus smallest measurement
• Measures the interval spanned by all the data
• For Figure 3.13, the largest repair time is 5 days and the smallest is 3 days
• Range is 5 – 3 = 2 days
Variance
• For a population of size N, the population variance σ² is:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$
• For a sample of size n, the sample variance s² is:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
Standard Deviation
• Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation (s): $s = \sqrt{s^2}$
Example: Chris’s Class Sizes This Semester
• Data points for a population are: 60, 41, 15, 30, 34
• Mean is μ = 36
• Variance is:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(60 - 36)^2 + (41 - 36)^2 + (15 - 36)^2 + (30 - 36)^2 + (34 - 36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4$
• Standard deviation is: $\sigma = \sqrt{216.4} \approx 14.71$
Example: Sample Variance and Standard Deviation
• Example 3.7: sample data for the first five car mileages from Table 3.1 are 30.8, 31.7, 30.1, 31.6, 32.1
• The sample mean is 31.26
$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643$
$s = \sqrt{s^2} = \sqrt{0.643} \approx 0.8019$
An alternative formula for the sample variance
$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$
• The alternative formula gives the same value as the definitional formula $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, but it needs only two totals:
$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$
$\left(\sum_{i=1}^{n} x_i\right)^2 = (x_1 + x_2 + \cdots + x_n)^2$
• Data points are: 60, 41, 15, 30, 34
• Mean is 36
• Sample variance is:
$s^2 = \frac{1}{5 - 1}\left[(60^2 + 41^2 + 15^2 + 30^2 + 34^2) - \frac{(60 + 41 + 15 + 30 + 34)^2}{5}\right] = \frac{1}{4}\left[7562 - \frac{180^2}{5}\right] = \frac{7562 - 6480}{4} = 270.5$
Percentiles, Quartiles, and Box-and-Whiskers Displays
For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100 – p) percent of the measurements fall at or above the value
• The first quartile Q1 is the 25th percentile
• The second quartile (or median) is the 50th percentile
• The third quartile Q3 is the 75th percentile
• The interquartile range IQR is Q3 – Q1
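Example: Quartiles
20 customer satisfaction ratings (in increasing order):
1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10
• Md: i = (50/100)(20) = 10, so Md = (8 + 8)/2 = 8 (average of the 10th and 11th values)
• Q1: i = (25/100)(20) = 5, so Q1 = (7 + 8)/2 = 7.5 (average of the 5th and 6th values)
• Q3: i = (75/100)(20) = 15, so Q3 = (9 + 9)/2 = 9 (average of the 15th and 16th values)
• IQR = Q3 – Q1 = 9 – 7.5 = 1.5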
Calculating Percentiles
1. Arrange the measurements in increasing order
2. Calculate the index i = (p/100)n, where p is the percentile to find
3. (a) If i is not an integer, round up: the next integer greater than i denotes the position of the pth percentile in the ordered list
(b) If i is an integer, the pth percentile is the average of the measurements in positions i and i + 1
Percentile Example (p = 10th Percentile)
7,524  11,070  18,211  26,817  36,551  41,286  49,312  57,283  72,814  90,416  135,540  190,250
• i = (10/100)(12) = 1.2
• Not an integer, so round up to 2
• The 10th percentile is the value in the second position, so 11,070
• What about Q1? i = (25/100)(12) = 3, an integer
Percentile Example (p = 25th Percentile)
7,524  11,070  18,211  26,817  36,551  41,286  49,312  57,283  72,814  90,416  135,540  190,250
• i = (25/100)(12) = 3
• An integer, so average the values in positions 3 and 4
• 25th percentile = (18,211 + 26,817)/2 = 22,514
Five Number Summary
1. The smallest measurement
2. The first quartile, Q1
3. The median, Md
4. The third quartile, Q3
5. The largest measurement
• Displayed visually using a box-and-whiskers plot
Box-and-Whiskers Plots
• A box-and-whiskers plot is constructed from the:
– first quartile, Q1
– median, Md
– third quartile, Q3
– inner fences
– outer fences
Box-and-Whiskers Plots Continued
• Inner fences (with IQR = Q3 – Q1)
– Located 1.5 × IQR away from the quartiles:
• Q1 – (1.5 × IQR)
• Q3 + (1.5 × IQR)
• For the customer satisfaction ratings (Q1 = 7.5, Q3 = 9, IQR = 1.5): (7.5 – 1.5 × 1.5, 9 + 1.5 × 1.5) = (5.25, 11.25)
• Outer fences
– Located 3 × IQR away from the quartiles:
• Q1 – (3 × IQR)
• Q3 + (3 × IQR)
Box-and-Whiskers Plots Continued
• The “whiskers” are dashed lines that plot the range of the data
– A dashed line drawn from the box below Q1 down to the smallest measurement
– Another dashed line drawn from the box above Q3 up to the largest measurement
Outliers
• Outliers are measurements that are very different from other measurements
– They are either much larger or much smaller than most of the other measurements
• Outliers lie beyond the fences of the box-and-whiskers plot: less than Q1 – (1.5 × IQR) or greater than Q3 + (1.5 × IQR)
– Measurements between the inner and outer fences are mild outliers
– Measurements beyond the outer fences are severe outliers
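The fence rules can be applied mechanically. The following sketch takes Q1 and Q3 as inputs and labels each measurement as a mild or severe outlier. It assumes that a value lying exactly on a fence is not "beyond" it, which matches the strict "less than / greater than" wording. The function name `classify_outliers` is hypothetical.

```python
def classify_outliers(xs, q1, q3):
    """Return (mild, severe) outliers using inner (1.5 IQR) and outer (3 IQR) fences."""
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    mild, severe = [], []
    for x in sorted(xs):
        if x < outer_lo or x > outer_hi:
            severe.append(x)   # beyond an outer fence
        elif x < inner_lo or x > inner_hi:
            mild.append(x)     # between the inner and outer fences
    return mild, severe


# Customer satisfaction ratings: Q1 = 7.5, Q3 = 9 (from the quartile example)
ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]
mild, severe = classify_outliers(ratings, q1=7.5, q3=9)
print(mild, severe)  # [3, 5, 5] [1]
```

Here the inner fences are (5.25, 11.25) and the outer fences are (3, 13.5). The rating 1 falls beyond the lower outer fence. The ratings 3, 5 and 5 fall between the lower inner and outer fences.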