Doing Statistics for Business
Data, Inference, and Decision Making
Marilyn K. Pelosi
Theresa M. Sandifer
Chapter 4
Numerical Descriptors of Data
Chapter 4 Objectives
• Numerical Measures of Center: The Mean, the Median, and the Mode
• Numerical Measures of Variability: The Range & the Standard Deviation
• Describing a Set of Data: The Empirical Rule & Boxplots
Chapter 4 Objectives (cont.)
• Measures of Relative Standing: Percentiles, Percentile Rank
• Identifying Outliers: z-scores, Boxplots
A Statistic is a numerical descriptor that
is calculated from sample data and is
used to describe the sample. Statistics
are usually represented by Roman
letters.
A Parameter is a numerical descriptor
that is used to describe a population.
Parameters are usually represented by
Greek letters.
The Sample Mean is the center of
balance of a set of data, and is found by
adding up all of the data values and
dividing by the number of observations.
The Population Mean is represented by
the Greek letter μ (mu).
TRY IT NOW!
Restaurant Table Times
Calculating the Sample Mean
A restaurant is trying to decide whether it has an adequate number of
tables available. The restaurant owner decides that she would like some
information on the amount of time a table is occupied by a customer. She
collects data on the length of time a customer occupies a table for a
random sample of 10 customers and obtains the following data.
Customer      1     2     3     4     5     6     7     8     9     10
Time (min.)   59.3  58.6  62.7  65.4  59.0  67.3  62.8  68.1  59.4  63.7
Calculate the sample mean for the length of time a table is occupied.
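A minimal Python sketch of this calculation, following the definition of the sample mean given earlier. The variable names are illustrative and are not part of the original slides.

```python
# Table occupancy times (minutes) for the 10 sampled customers.
times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample mean: add up all of the data values and divide by
# the number of observations.
sample_mean = sum(times) / len(times)

print(f"n = {len(times)}, sample mean = {sample_mean:.2f} minutes")
```

Rounded to one decimal place, the result agrees with the 62.6 minutes quoted in the later range exercise.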
The Sample Median is the value of the
middle observation in an ordered set
of data.
TRY IT NOW!
Town Hall Traffic
Calculating the Sample Median
In the past few years the town council of a small town has received
complaints that it has become increasingly difficult to cross the main
street in town near the library. The council decides to look at traffic flow
on the street. It selects a site directly in front of the library where most
people try to cross the road and records the number of cars that pass the
point in a two-minute period.
This is done for 10 two-minute periods at 3:00 p.m. over several weeks
and the following data are obtained.
Number of cars 20 27 29 28 37 23 21 28 29 28
Find the median number of cars that pass the site in two minutes.
Remember to SORT the data before
you locate the median!
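A short Python sketch of the sort-then-locate procedure. It assumes the usual convention that, for an even number of observations, the median is the average of the two middle values; the slides do not state this rule explicitly. The function name is illustrative.

```python
import statistics

# Cars passing the site in each of the 10 two-minute periods.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]


def sample_median(values):
    """Sort the data, then return the middle observation.

    Assumption: with an even count, average the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_cars = sample_median(cars)
print("sorted data:", sorted(cars))
print("median:", median_cars)

# Cross-check against the standard library, which uses the same convention.
print("statistics.median:", statistics.median(cars))
```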
Figure 4.2 Mean and Median for a Symmetric Distribution (labels: Mean, Median)
Figure 4.3 Mean and Median for Skewed Distributions (panels: Left skew, Right skew)
TRY IT NOW!
Airline Cancellations
Comparing the Mean and the Median
An airline company is wondering about the number of cancellations that it
receives for a particular business commuter flight. The airline takes a
random sample of 15 days from the first quarter of the year and obtains
the following data:
# of cancellations 4 9 9 12 12 13 14 14 15 15 16 16 17 17 24
Find the mean and median for the number of cancellations for the commuter
flight.
When the mean and median are compared, do the data appear symmetric or skewed?
Make a dotplot of the data.
From the dotplot, do the data appear symmetric or skewed?
Note: the data have been sorted for you.
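One way to work this exercise in Python is sketched below. The text-based dotplot is a hypothetical helper written for this illustration (one dot per observation, one row per integer value); it is not a tool from the textbook.

```python
from collections import Counter
from statistics import mean, median

# Daily cancellations for the 15 sampled days (already sorted).
cancellations = [4, 9, 9, 12, 12, 13, 14, 14, 15, 15, 16, 16, 17, 17, 24]


def text_dotplot(values):
    """Return a plain-text dotplot: one row per integer from min to max,
    with one dot for each observation equal to that value."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        rows.append((f"{value:>3} | " + "." * counts[value]).rstrip())
    return "\n".join(rows)


mean_c = mean(cancellations)
median_c = median(cancellations)

print(f"mean = {mean_c:.1f}, median = {median_c}")
print(text_dotplot(cancellations))
```

Comparing the two printed values, and then looking at where the dots pile up, gives the two views of symmetry that the questions ask about.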
Discovery Exercise 4.1
The Trimmed Mean
Part I. Investigating the Data
In a report to the administration of a large university,
the Psychology Department states that the average class size
is greater than the 35 students per class allowed by the university charter.
The report indicates that the mean class size is 39.4.
No data are appended to the report, but you can obtain the current
enrollments easily. The data you find are:
 3   14   22   26   42
 3   15   23   27   45
 5   15   24   28   45
 9   17   24   28  190
11   21   25   36  193
13   22   26   38  193
A. Do you think that the mean is a good measure of center
for these data? Why or why not?
B. By simply studying the data, what do you think a typical class size for
the Psychology Department is?
C. What is the median of the data? Is this closer to what you thought?
D. Compare the mean and median. What does the comparison lead you to
believe about the data?
E. Display the data graphically. Do you still think the same thing?
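The slides name the trimmed mean but do not define it in this part of the exercise. The sketch below assumes one common definition: sort the data, drop the same number of observations from each end, and average what remains. The function name and the choice of dropping three values per end (10% of the 30 classes) are assumptions made for illustration.

```python
from statistics import mean, median

# Current Psychology Department enrollments (30 classes).
class_sizes = [3, 3, 5, 9, 11, 13, 14, 15, 15, 17, 21, 22, 22, 23, 24,
               24, 25, 26, 26, 27, 28, 28, 36, 38, 42, 45, 45, 190, 193, 193]


def trimmed_mean(values, drop_per_end):
    """Mean of the data after removing `drop_per_end` observations
    from the low end and the same number from the high end."""
    ordered = sorted(values)
    kept = ordered[drop_per_end:len(ordered) - drop_per_end]
    return mean(kept)


print(f"mean          = {mean(class_sizes):.1f}")
print(f"median        = {median(class_sizes)}")
print(f"trimmed mean  = {trimmed_mean(class_sizes, 3):.1f}  (3 dropped per end)")
```

The ordinary mean reproduces the 39.4 quoted in the report, which makes it easy to see how much the three very large classes pull it away from the median.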
The Sample Mode is the data value that
has the highest frequency of occurrence
in the sample.
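A small Python sketch of finding the mode by counting frequencies, applied to the town traffic counts from the median exercise. Returning every value tied for the highest frequency is a design choice made here, not something the slides specify.

```python
from collections import Counter

# Cars passing the site in each of the 10 two-minute periods.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]


def sample_modes(values):
    """Return all data values that share the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print("frequencies:", dict(Counter(cars)))
print("mode(s):", sample_modes(cars))
```

A list with two or more entries would signal a tie for most frequent value.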
The Modal Class is the class interval in
a frequency distribution or histogram
that has the highest frequency.
Figure 4.4 Histogram of Bimodal Data (vertical axis: Frequency; horizontal axis: X)
Discovery Exercise 4.2
Investigating Variability
The table contains air-quality data collected by the
Environmental Protection Agency. The data show the number
of days in which the ozone level was dangerous for 14 major U.S. cities
in 2000.
City              Number of unhealthy days
Atlanta           18
Boston            0
Chicago           0
Dallas            5
Denver            0
Houston           94
Kansas City       0
Los Angeles       1
New York          13
Philadelphia      2
Pittsburgh        3
San Francisco     0
Seattle           0
Washington, DC    0
A. Display these data using a dotplot.
B. Find the typical number of unhealthy days by calculating the average
value.
C. Can you expect every observation to be typical? Why not?
A Sample Range, R, is the difference
between the maximum and minimum
observations in the sample.
TRY IT NOW!
Restaurant Table Time
Calculating the Sample Range
The restaurant, looking at the turnaround time for its tables, wonders how
variable the occupation time for a table really is. The data the restaurant
collected are:
Time (min) 59.3 58.6 62.7 65.4 59.0 67.3 62.8 68.1 59.4 63.7
What is the range of turnaround times?
Previously you calculated the mean turnaround time to be 62.6 minutes.
Using this information and the value for the range, what would the
restaurant expect as its lowest turnaround time? Its highest turnaround
time?
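A minimal Python sketch of the range calculation for the table times, using the definition of the sample range given earlier. Variable names are illustrative.

```python
# Table occupancy times (minutes) for the 10 sampled customers.
times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample range: maximum observation minus minimum observation.
lowest = min(times)
highest = max(times)
sample_range = highest - lowest

print(f"min = {lowest}, max = {highest}, range = {sample_range:.1f} minutes")
```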
The Sample Variance, s², is the average
of the squared deviations of the data
values from the sample mean.
The Sample Standard Deviation, s, is
the positive square root of the sample
variance.
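In symbols, the two definitions can be written as follows. This assumes the usual sample convention of dividing by n - 1 rather than n; the slides say only "average", so the divisor is an assumption here.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
s = \sqrt{s^2}
```

Here n is the number of observations, x_i is the i-th data value, and \bar{x} is the sample mean.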
The population variance and standard
deviation are represented by the Greek
letter σ (sigma), where σ² is the
population variance and σ is the
population standard deviation.
The Empirical Rule says that for a
mound-shaped, symmetric distribution:
• about 68% of all data values are within one standard deviation of the mean
• about 95% of all observations are within two standard deviations of the mean
• almost all (more than 99%) of the observations are within three standard
deviations of the mean.
TRY IT NOW!
Town Hall Traffic Flow
Calculating the Sample Variance
and Standard Deviation
The town council looking at the traffic flow problem has seen reports that
use the standard deviation, and wants to use it to describe the variability of
traffic flow. The data are:
Number of Cars 20 27 29 28 37 23 21 28 29 28
What is the sample standard deviation of the traffic flow?
Use whatever method you feel most
comfortable with. If you have a statistical
calculator, learn how to use it now.
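For readers who prefer code to a calculator, here is a step-by-step Python sketch. It assumes the n - 1 divisor for the sample variance and cross-checks the result against Python's standard `statistics` module, which uses the same divisor.

```python
import math
import statistics

# Cars passing the site in each of the 10 two-minute periods.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

n = len(cars)
x_bar = sum(cars) / n

# Squared deviation of each observation from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in cars]

# Sample variance (assumed divisor: n - 1) and standard deviation.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {x_bar:.1f}")
print(f"sample variance = {sample_variance:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"statistics.stdev check = {statistics.stdev(cars):.2f}")
```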
Figure 4.5 The Empirical Rule (regions covering 68%, 95%, and 99% of the data around the mean)
TRY IT NOW!
Loan Processing
The Empirical Rule
Errors in filling out loan applications can lead to delays in having the
loans approved. Bank employees must contact the applicants to correct
the errors. This sometimes requires multiple contacts. To understand the
extent to which the errors affect the application process a bank collected
data on the number of follow-up contacts required before a loan could be
processed.
The bank looked at 25 different applications and found:
0 0 1 1 1 1 2 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 5 5 7
Make a dotplot of the data.
From the dotplot, do you think that the assumption that the data
have a symmetric, bell-shaped distribution is a reasonable one?
Find the mean and standard deviation of the data.
According to the empirical rule, between what two values should 68% of
the observations fall?
Between what two values should 95% of the observations fall?
Between what two values should more than 99% of the observations fall?
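The interval questions above can be explored with a short Python sketch. It computes the mean plus or minus one, two, and three sample standard deviations, and also reports the share of the 25 observations that actually falls in each interval so it can be compared with the nominal percentages. The helper name is illustrative.

```python
from statistics import mean, stdev

# Follow-up contacts needed for each of the 25 loan applications.
contacts = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
            4, 4, 4, 4, 4, 4, 5, 5, 7]

x_bar = mean(contacts)
s = stdev(contacts)  # sample standard deviation (n - 1 divisor)


def interval_and_share(k):
    """Return (low, high, share) for the interval mean +/- k standard
    deviations, where share is the fraction of observations inside it."""
    low = x_bar - k * s
    high = x_bar + k * s
    inside = sum(low <= x <= high for x in contacts)
    return low, high, inside / len(contacts)


print(f"mean = {x_bar:.2f}, standard deviation = {s:.2f}")
for k, nominal in [(1, "about 68%"), (2, "about 95%"), (3, "more than 99%")]:
    low, high, share = interval_and_share(k)
    print(f"within {k} sd: {low:.2f} to {high:.2f} "
          f"(rule says {nominal}, sample has {share:.0%})")
```

Because the counts cannot be negative, a lower limit below zero is a hint to look again at how well the bell-shaped assumption fits these data.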
A z-score measures the number of
standard deviations that a data value is
from the mean.
TRY IT NOW!
Town Hall Traffic
Calculating z-Scores
The town that was looking at traffic flow in front of the town hall
wonders if the observation of 37 cars is unusual. Although the town
officials know that their sample of 10 observations is not large enough to
ensure accuracy, they want to use z-scores to look at the data:
Number of Cars 20 27 29 28 37 23 21 28 29 28
What is the z-score for the observation of 37 cars?
Comparing the z-score to the empirical rule, do you think
that the value is unusual?
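A minimal Python sketch of the z-score calculation. It assumes the sample mean and the sample standard deviation (n - 1 divisor) are used as the reference values; the function name is illustrative.

```python
from statistics import mean, stdev

# Cars passing the site in each of the 10 two-minute periods.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]


def z_score(value, data):
    """Number of sample standard deviations that `value` lies from
    the sample mean of `data`."""
    return (value - mean(data)) / stdev(data)


z_37 = z_score(37, cars)
print(f"z-score for 37 cars = {z_37:.2f}")
```

The printed value can be set beside the one, two, and three standard deviation bands of the empirical rule to judge how unusual the observation is.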
The pth Percentile of a data set is the
value that has p% of the data at or
below it.
The Percentile Rank of a value is the
percentage of the data in the sample that
are at or below the value of interest.
TRY IT NOW!
Aptitude Test Scores
Calculating the Percentile Rank
A group of employees at a manufacturing facility take a test
to determine their aptitude for training. The tests are scored on a
400-point scale and are shown here in increasing order:
185  227  241  257  281  299  314  329
195  228  243  261  283  304  318  333
196  234  248  269  283  307  319  335
199  238  250  271  291  309  322  349
223  241  253  272  297  310  328  353
(Scores increase down each column, then left to right.)
One of the employees who scored 283 wants to know how
he stands relative to the other employees who took the exam.
What is the percentile rank for the employee’s score?
What is the percentile rank of the employee that scored 319?
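A short Python sketch that applies the percentile-rank definition directly: count the observations at or below the score and express that count as a percentage of the sample. The function name is illustrative, and other textbooks sometimes use slightly different formulas.

```python
# The 40 aptitude test scores, in increasing order.
scores = [185, 195, 196, 199, 223, 227, 228, 234, 238, 241,
          241, 243, 248, 250, 253, 257, 261, 269, 271, 272,
          281, 283, 283, 291, 297, 299, 304, 307, 309, 310,
          314, 318, 319, 322, 328, 329, 333, 335, 349, 353]


def percentile_rank(value, data):
    """Percentage of the observations in `data` that are at or below `value`."""
    at_or_below = sum(x <= value for x in data)
    return 100 * at_or_below / len(data)


for score in (283, 319):
    print(f"score {score}: percentile rank = {percentile_rank(score, scores):.1f}")
```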
The first quartile, Q1, is the value in
the sample that has 25% of the data at
or below it.
The third quartile, Q3, is the value in
the sample that has 75% of the data at
or below it.
TRY IT NOW!
Training Aptitude
Finding the Quartiles
The company looking at training aptitude wants to give
employees who scored in the top 25% on the test the opportunity
to attend a seminar on training. The test scores are:
185  227  241  257  281  299  314  329
195  228  243  261  283  304  318  333
196  234  248  269  283  307  319  335
199  238  250  271  291  309  322  349
223  241  253  272  297  310  328  353
In the sample, what is the cutoff score for those people who
will be able to attend the seminar?
Hint: the value that defines the top 25% is the same as the value that defines the
bottom 75%.
Suppose that the company decides that the employees who scored in the
bottom 25% need some additional classes on team building. What is the
cutoff score for those employees who need the classes on team building?
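Several conventions exist for locating quartiles, and they can give slightly different cutoffs. The sketch below assumes the simplest reading of the definitions on the slides: the smallest data value that has at least p% of the observations at or below it. The textbook's own procedure may interpolate between neighboring values, so treat these results as one possible answer. The function name is illustrative.

```python
import math

# The 40 aptitude test scores, in increasing order.
scores = [185, 195, 196, 199, 223, 227, 228, 234, 238, 241,
          241, 243, 248, 250, 253, 257, 261, 269, 271, 272,
          281, 283, 283, 291, 297, 299, 304, 307, 309, 310,
          314, 318, 319, 322, 328, 329, 333, 335, 349, 353]


def percentile_value(p, data):
    """Smallest value in `data` with at least p% of the observations
    at or below it (an assumed convention, with no interpolation)."""
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position
    return ordered[rank - 1]


q1 = percentile_value(25, scores)
q3 = percentile_value(75, scores)
print(f"Q1 (bottom 25% cutoff) = {q1}")
print(f"Q3 (top 25% cutoff)    = {q3}")
```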
A Boxplot or Box and Whisker diagram
is a graphical display that uses summary
statistics to display the distribution of a
set of data.
The Interquartile Range (IQR) is the
difference between the third and first
quartiles, Q3 - Q1.
Figure 4.6 Box Portion of Boxplot
Figure 4.7 Boxplot with Whiskers
The Inner Fences of a boxplot are
located at Q1 - 1.5 (IQR) and Q3 + 1.5 (IQR).
The Outer Fences of a boxplot are
located at Q1 - 3 (IQR) and Q3 + 3 (IQR).
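The fence formulas translate directly into code. In the sketch below, the quartile values Q1 = 20 and Q3 = 30 are hypothetical numbers chosen only to make the arithmetic easy to follow, and the function names and the three descriptive labels are assumptions made for illustration.

```python
def boxplot_fences(q1, q3):
    """Return (IQR, inner fences, outer fences) for the given quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return iqr, inner, outer


def classify(value, inner, outer):
    """Describe where a value falls relative to the fences."""
    if value < outer[0] or value > outer[1]:
        return "beyond outer fences"
    if value < inner[0] or value > inner[1]:
        return "between inner and outer fences"
    return "inside inner fences"


# Hypothetical quartiles, for illustration only.
q1, q3 = 20, 30
iqr, inner, outer = boxplot_fences(q1, q3)

print(f"IQR = {iqr}")
print(f"inner fences: {inner[0]} and {inner[1]}")
print(f"outer fences: {outer[0]} and {outer[1]}")
for value in (25, 50, 70):
    print(value, "->", classify(value, inner, outer))
```

Values outside the inner fences are the candidates to examine as possible outliers.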
Figure 4.8 Boxplots for Skewed Data
TRY IT NOW!
Training Aptitude
Finding the Quartiles
The company that administered the training aptitude test to
its employees would like a better picture of how the employees
performed on the test. The data are:
185  227  241  257  281  299  314  329
195  228  243  261  283  304  318  333
196  234  248  269  283  307  319  335
199  238  250  271  291  309  322  349
223  241  253  272  297  310  328  353
In the previous exercise, you found the first and third quartiles
of the data set. Use these values to complete the calculations
needed for a boxplot.
Draw a complete boxplot of the data.
Were there any outliers? If so, which data values were they?
The basics of creating a chart in Excel,
using the Chart Wizard.
1. Highlight the data (Frequency table) that you want to
graph.
2. Invoke the Chart Wizard by clicking on the icon on the
toolbar.
3. Follow the directions and hints from the Chart Wizard.
4. Edit the graph to include any other features or changes
you want.
Calculating Summary Statistics in Excel
1. Position the cursor in the textbox labeled Input Range and
highlight the range of data for which you want to calculate
summary statistics.
2. Specify location for the output, either a section of the
current worksheet, or a new worksheet or workbook. Click
on the radio button for your choice. If you select Output
Range, you must specify a location on the worksheet.
3. Position the cursor in the textbox for Output Range and
click on the cell where you want the upper left corner of
the results to appear. If you want to put the results in a
new worksheet, you have the option of giving the sheet a
name in the textbox or just letting Excel create a new,
numbered sheet.
4. Click on the box labeled Summary statistics and finally
click on OK. The output does not include the quartiles.
When a statistic cannot be computed, the output will read
N/A.
Figure 4.10 Descriptive Statistics Dialog Box
Figure 4.11 Output from Tools>Data Analysis>Descriptive Statistics
Making a Boxplot in KaddStat (note Excel
does not include boxplots as part of the graphs it
can create)
Be sure you have enabled the KaddStat add-in!
1. From the KADD menu select Boxplots. The Boxplot
Dialog Box will open.
2. Position the cursor in the textbox labeled Input Range
and highlight the cells that contain the data.
3. Indicate where you want the boxplot to appear.
4. Click OK.
Figure 4.12 KADDSTAT Menu Selection
Figure 4.13 The Boxplot Dialog Box
Figure 4.14 Finished Boxplot for Golf Ball Data
Chapter 4 Summary
In this chapter you have learned:
• There are many ways to describe a set of data
using sample statistics. No single number will do
the job, nor is there any standard way to proceed.
• The measures that you choose must reflect the
characteristics of the data itself.
• Often the best descriptions come from the use of
multiple measures and the conclusions that can be
reached by comparing them.
• It is useful to create images of data using
combinations of different statistics.
• The Empirical Rule and Boxplots are examples
of using summary statistics to get a picture of the
distribution of data.