Download ArdiCh123F15

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
..
..
..
..
..
..
SLIDES BY
John Loucks
St. Edward’s University
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 1
Chapter 3, Part A
Descriptive Statistics: Numerical Measures


Measures of Location
Measures of Variability
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 2
Measures of Location

Mean

Median
Mode



Percentiles
Quartiles
If the measures are computed
for data from a sample,
they are called sample statistics.
If the measures are computed
for data from a population,
they are called population parameters.
A sample statistic is referred to
as the point estimator of the
corresponding population parameter.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 3
Mean




Perhaps the most important measure of location is
the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data
values.
The sample mean x is the point estimator of the
population mean m.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 4
Sample Mean x
x
x
Sum of the values
of the n observations
i
n
Number of
observations
in the sample
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 5
Population Mean m
m
x
Sum of the values
of the N observations
i
N
Number of
observations in
the population
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 6
Sample Mean
 Example: Apartment Rents
Fifty efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
438
476
119
237
49
253
282
66
221
49
103
330
190
345
66
211
505
73
34
43
235
68
378
206
270
119
250
143
391
141
282
28
273
463
192
106
220
216
511
131
763
478
15
558
130
128
525
340
334
80
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 7
Sample Mean
Xbar = SUM(X)/n
SUM
Count
Average
Average
12064
50
241.28
241.28
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 8
Median
 The median of a data set is the value in the middle
when the data items are arranged in ascending order.
 Whenever a data set has extreme values, the median
is the preferred measure of central location.
 The median is the measure of location most often
reported for annual income and property value data.
 A few extremely large incomes or property values
can inflate the mean.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 9
Median
 For an odd number of observations:
26 18 27 12 14 27 19
12
14
18
19
26
27 27
7 observations
in ascending order
the median is the middle value.
Median = 19
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 10
Median
 For an even number of observations:
26 18 27 12 14 27 30 19
12
14
18
19
26
27 27
30
8 observations
in ascending order
the median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 11
Median
 Example: Apartment Rents
Averaging the 25 and 26th data values:
Median = (216 + 220)/2 = 218
15
28
34
43
49
49
66
66
68
73
80
103
106
119
119
128
130
131
141
143
190
192
206
211
216
220
221
235
237
250
253
270
273
282
282
330
334
340
345
378
391
438
463
476
478
505
511
525
558
763
Note: Data is in ascending order.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 12
Mode
 The mode of a data set is the value that occurs with
greatest frequency.
 The greatest frequency can occur at two or more
different values.
 If the data have exactly two modes, the data are
bimodal.
 If the data have more than two modes, the data are
multimodal.
 Caution: If the data are bimodal or multimodal,
Excel’s MODE function will incorrectly identify a
single mode.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 13
Mode
 Example: Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
425
440
450
465
480
510
575
430
440
450
470
485
515
575
430
440
450
470
490
525
580
435
445
450
472
490
525
590
435
445
450
475
490
525
600
435
445
460
475
500
535
600
435
445
460
475
500
549
600
435
445
460
480
500
550
600
440
450
465
480
500
570
615
440
450
465
480
510
570
615
Note: Data is in ascending order.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 14
Measures of Variability
 It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
 For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 15
Measures of Variability
 Range
 Variance
 Standard Deviation
 Coefficient of Variation
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 16
Range
 The range of a data set is the difference between the
largest and smallest data values.
 It is the simplest measure of variability.
 It is very sensitive to the smallest and largest data
values.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 17
Range
 Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190
425
440
450
465
480
510
575
430
440
450
470
485
515
575
430
440
450
470
490
525
580
435
445
450
472
490
525
590
435
445
450
475
490
525
600
435
445
460
475
500
535
600
435
445
460
475
500
549
600
435
445
460
480
500
550
600
440
450
465
480
500
570
615
440
450
465
480
510
570
615
Note: Data is in ascending order.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 18
Variance
The variance is a measure of variability that utilizes
all the data.
It is based on the difference between the value of
each observation (xi) and the mean ( x for a sample,
m for a population).
The variance is useful in comparing the variability
of two or more variables.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 19
Variance
The variance is the average of the squared
differences between each data value and the mean.
The variance is computed as follows:
2  ( xi  x )
s 
n 1
for a
sample
2
 ( xi  m )
 
N
2
2
for a
population
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 20
Standard Deviation
The standard deviation of a data set is the positive
square root of the variance.
It is measured in the same units as the data, making
it more easily interpreted than the variance.
The standard deviation is computed as follows:
s  s2
  2
for a
sample
for a
population
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 21
Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
s

 100  %
x



 100  %
m

for a
sample
for a
population
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 22
Sample Mean
 Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
445
440
465
450
600
570
510
615
440
450
470
485
515
575
430
440
525
490
580
450
490
590
525
450
472
470
445
435
435
425
450
475
490
525
600
600
445
460
475
500
535
435
460
575
435
500
549
475
445
600
445
460
480
500
550
435
440
450
465
570
500
480
430
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
615
450
480
465
480
510
440
Slide 23
Using Excel to Compute the Sample Variance,
Standard Deviation, and Coefficient of Variation
 Formula Worksheet
1
2
3
4
5
6
7
A
B
C
D
E
Apart- Monthly
ment Rent ($)
1
525
Mean =AVERAGE(B2:B71)
2
440
Median =MEDIAN(B2:B71)
3
450
Mode =MODE.SNGL(B2:B71)
4
615
Variance =VAR.S(B2:B71)
5
480
Std. Dev. =STDEV.S(B2:B71)
6
510
C.V. =E6/E2*100
Note: Rows 8-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 24
Using Excel to Compute the Sample Variance,
Standard Deviation, and Coefficient of Variation
 Value Worksheet
1
2
3
4
5
6
7
A
B
C
D
Apart- Monthly
ment Rent ($)
1
525
Mean
2
440
Median
3
450
Mode
4
615
Variance
5
480
Std. Dev.
6
510
C.V.
E
490.80
475.00
450.00
2996.16
54.74
11.15
Note: Rows 8-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 25
Using Excel’s
Descriptive Statistics Tool
Step 1 Click the Data tab on the Ribbon
Step 2 In the Analysis group, click Data Analysis
Step 3 Choose Descriptive Statistics from the list of
Analysis Tools
Step 4 When the Descriptive Statistics dialog box appears:
(see details on next slide)
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 26
Using Excel’s
Descriptive Statistics Tool
 Excel’s Descriptive Statistics Dialog Box
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 27
Using Excel’s
Descriptive Statistics Tool
 Excel Value Worksheet (Partial)
1
2
3
4
5
6
7
8
E
D
C
B
A
Apart- Monthly
Monthly Rent ($)
ment Rent ($)
525
1
490.8
Mean
440
2
6.542348114
Standard Error
450
3
475
Median
615
4
450
Mode
480
5
Standard Deviation 54.73721146
510
6
2996.162319
Sample Variance
575
7
Note: Rows 9-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 28
Using Excel’s
Descriptive Statistics Tool
 Excel Value Worksheet (Partial)
9
10
11
12
13
14
15
16
A
8
9
10
11
12
13
14
15
B
430
440
450
470
485
515
575
430
C
D
Kurtosis
Skewness
Range
Minimum
Maximum
Sum
Count
E
-0.334093298
0.924330473
190
425
615
34356
70
Note: Rows 1-8 and 17-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 29
Percentiles
 A percentile provides information about how the
data are spread over the interval from the smallest
value to the largest value.
 Admission test scores for colleges and universities
are frequently reported in terms of percentiles.

The pth percentile of a data set is a value such that at least
p percent of the items take on this value or less and at least
(100 - p) percent of the items take on this value or more.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 30
Using Excel’s Rank and Percentile Tool
to Compute Percentiles and Quartiles
 Using Excel’s Percentile Function
The formula Excel uses to compute the location (Lp)
of the pth percentile is
Lp = pn + (1 – p)
Excel would compute the location of the 80th
percentile for the apartment rent data as follows:
L80 = (0.8)70 + (1 – 0.8) = 56 + .2 = 56.2
The 80th percentile would be
535 + .2(549 - 535) = 535 + 2.8 = 537.8
Excel interpolates over the interval from 0 to n.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 31
Using Excel’s Rank and Percentile Tool
to Compute Percentiles and Quartiles
 Excel Formula Worksheet
1
2
3
4
5
6
80th percentile
A
B
C
D
Apart- Monthly
ment Rent ($)
80th Percentile
1
525
=PERCENTILE.INC(B2:B71,.8)
2
440
3
450
It is not necessary
4
615
5
480
to put the data
Note: Rows 7-71 are not shown.
in ascending order.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 32
Quartiles
 Quartiles are specific percentiles.
 First Quartile = 25th Percentile
 Second Quartile = 50th Percentile = Median
 Third Quartile = 75th Percentile
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 33
Third Quartile
 Using Excel’s QUARTILE.INC Function
Excel computes the locations of the 1st, 2nd, and 3rd
quartiles by first converting the quartiles to
percentiles and then using the following formula to
compute the location (Lp) of the pth percentile:
Lp = (p/100)n + (1 – p/100)
Excel would compute the location of the 3rd quartile
(75th percentile) for the rent data as follows:
L75 = (75/100)70 + (1 – 75/100) = 52.5 + .25 = 52.75
The 3rd quartile would be
515 + .75(525 - 515) = 515 + 7.5 = 522.5
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 34
Third Quartile
 Excel Formula Worksheet
1
2
3
4
5
6
A
B
Apart- Monthly
ment Rent ($)
1
525
2
440
3
450
4
615
5
480
C
3rd quartile
D
Third Quartile
=QUARTILE.INC(B2:B71,3)
It is not necessary
to put the data
in ascending order.
Note: Rows 7-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 35
Using Excel’s QUARTILE.INC Function
 If the value of 1 in the QUARTILE.INC function is
changed to 0, Excel computes the minimum value in
the data set.
 If the value of 1 is changed to 4, Excel computes the
maximum value in the data set.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 36
Excel’s Rank and Percentile Tool
Step 1 Click the Data tab on the Ribbon
Step 2 In the Analysis group, click Data Analysis
Step 3 Choose Rank and Percentile from the list of
Analysis Tools
Step 4 When the Rank and Percentile dialog box appears
(see details on next slide)
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 37
Excel’s Rank and Percentile Tool
Step 4 Complete the Rank and Percentile dialog
box as follows:
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 38
Excel’s Rank and Percentile Tool
 Excel Value Worksheet
1
2
3
4
5
6
7
8
9
10
B
Rent
525
440
450
615
480
510
575
430
440
C
D
Point
4
63
35
42
49
56
28
21
7
E
F
Rent
Rank
615
1
615
1
600
3
600
3
600
3
600
3
590
7
580
8
575
9
G
Percent
98.50%
98.50%
92.70%
92.70%
92.70%
92.70%
91.30%
89.80%
86.90%
Note: Rows 11-71 are not shown.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 39
Interquartile Range
 The interquartile range of a data set is the difference
between the third quartile and the first quartile.
 It is the range for the middle 50% of the data.
 It overcomes the sensitivity to extreme data values.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 40
Interquartile Range
 Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425
440
450
465
480
510
575
430
440
450
470
485
515
575
430
440
450
470
490
525
580
435
445
450
472
490
525
590
435
445
450
475
490
525
600
435
445
460
475
500
535
600
435
445
460
475
500
549
600
435
445
460
480
500
550
600
440
450
465
480
500
570
615
440
450
465
480
510
570
615
Note: Data is in ascending order.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 41
Trimmed Mean
 Another measure, sometimes used when extreme
values are present, is the trimmed mean.
 It is obtained by deleting a percentage of the
smallest and largest values from a data set and then
computing the mean of the remaining values.
 For example, the 5% trimmed mean is obtained by
removing the smallest 5% and the largest 5% of the
data values and then computing the mean of the
remaining values.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 42
72
91
74
67
36
38
60
44
14
3
13
17
47
44
28
38
31
33
35
36
37
37
38
39
Count 0.07 Round Up
50
3.5
4
Drop Top and Bottom
59.45
59.45 Average Static
59.45 Average Dynamic OFFSET
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 43
End of Chapter 3, Part A
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Slide 44