Displaying & Summarizing Data
(Lesson – 02A)
From Data to Information
Dr. C. Ertuna
Displaying & Summarizing Data
Consumer Price Index: U.S. city average; 1967=100
Month      CPI
Jan-1999   492.3
Jul-1999   499.2
Jan-2000   505.8
Jul-2000   517.5
Dec-2000   521.1
What do these data tell us about the CPI?
Displaying & Summarizing Data
• Raw data need to be converted into useful managerial information.
• Visual displays and statistical summaries are means of converting raw data into insightful information.
• That information can then be interpreted for decision-making purposes.
Visual Display of Data
[Line chart of the CPI data above: Consumer Price Index by month, vertical axis from 470 to 530]
Displaying & Summarizing Data
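[Line chart of the CPI data: Consumer Price Index by month, vertical axis from 470 to 530]
• The CPI increased steadily over the last two years.
• Over the last period (Jul-2000 to Dec-2000), however, that increase slowed down: the CPI rose 3.6 points, compared with 11.7 points from Jan-2000 to Jul-2000.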
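The slowdown can be checked directly from the table. The Python sketch below computes the point and percentage change between consecutive readings; the helper `period_changes` is a hypothetical name, not part of the lesson. Note that the last interval (Jul-2000 to Dec-2000) covers five months rather than six, so the slowdown also holds on a per-month basis.

```python
# CPI readings from the table (U.S. city average; 1967=100).
CPI = [
    ("Jan-1999", 492.3),
    ("Jul-1999", 499.2),
    ("Jan-2000", 505.8),
    ("Jul-2000", 517.5),
    ("Dec-2000", 521.1),
]


def period_changes(series):
    """Return (start, end, point change, percent change) for consecutive readings."""
    changes = []
    for (start, first), (end, last) in zip(series, series[1:]):
        points = round(last - first, 1)
        percent = round((last - first) / first * 100, 2)
        changes.append((start, end, points, percent))
    return changes


if __name__ == "__main__":
    for start, end, points, percent in period_changes(CPI):
        print(f"{start} -> {end}: {points:+.1f} points ({percent:+.2f}%)")
```

The last two intervals (+11.7 points, then +3.6 points) are the comparison behind the second bullet.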
Graphical Display Methods
• Charts (column, bar, line, pie, area) and scatter diagrams make it easier to gain insights about the data (visual interpretation of data).
• They also provide excellent communication vehicles.
• The drawback is that the picture the data give can be distorted by manipulating the chart's scale, for example by starting the vertical axis well above zero.
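A small calculation shows how much a scale choice can exaggerate a trend. The sketch compares the CPI's actual increase from Jan-1999 to Dec-2000 with the increase in plotted height when the vertical axis starts at 470 instead of 0. The axis floor of 470 is an assumption taken from the lowest tick of the CPI chart in this lesson, and `apparent_change` is a hypothetical helper.

```python
# CPI values from the table (U.S. city average; 1967=100).
JAN_1999 = 492.3
DEC_2000 = 521.1
# Assumption: the truncated chart's vertical axis starts at its lowest tick, 470.
AXIS_MIN = 470.0


def apparent_change(first, last, axis_min):
    """Relative change in plotted height when the vertical axis starts at axis_min."""
    return (last - axis_min) / (first - axis_min) - 1


actual = apparent_change(JAN_1999, DEC_2000, 0.0)       # axis starting at zero
visual = apparent_change(JAN_1999, DEC_2000, AXIS_MIN)  # truncated axis

if __name__ == "__main__":
    print(f"Actual increase:            {actual:.1%}")
    print(f"Increase as drawn (470 up): {visual:.1%}")
```

With the truncated axis, the Dec-2000 point sits about 2.3 times as high above the bottom of the chart as the Jan-1999 point, although the CPI itself rose by only about 6%.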
Creating a Column Chart
Profits (in 1000s of Dollars)
          1996    1997    1998    1999    2000
Tractor   1,808   2,674   3,974  11,802  18,089
Mower     1,246   1,204   1,226     981   2,549
1. Enter the data in the worksheet.
2. Select Chart Wizard.
3. Select Clustered Column Chart (1:1).
4. Select the data range (including labels).
5. Select Series in “Rows”.
6. Select Finish.
[Clustered column chart: Tractor and Mower profits by year, 1996 to 2000; vertical axis from 0 to 20,000]
Data: St-CE-Ch02-x1-Examples-Slide
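Outside Excel, the same clustered layout can be sketched in a few lines. The Python below is a rough text rendering, not the Chart Wizard: each year becomes a cluster, and each worksheet row (series in rows) becomes one bar in that cluster, drawn sideways and scaled to the largest profit. The function `clustered_columns` and its `width` parameter are hypothetical.

```python
# Profits in $1000s, laid out as on the slide: one row (series) per product.
YEARS = [1996, 1997, 1998, 1999, 2000]
PROFITS = {
    "Tractor": [1808, 2674, 3974, 11802, 18089],
    "Mower": [1246, 1204, 1226, 981, 2549],
}


def clustered_columns(years, series, width=40):
    """Return text lines: one cluster per year, one bar per series (row)."""
    peak = max(max(values) for values in series.values())
    lines = []
    for i, year in enumerate(years):
        for name, values in series.items():
            # Bar length is proportional to the value, with the peak at `width`.
            bar = "#" * round(values[i] / peak * width)
            lines.append(f"{year} {name:<8}{bar} {values[i]:,}")
        lines.append("")  # a blank line separates the clusters
    return lines


if __name__ == "__main__":
    print("\n".join(clustered_columns(YEARS, PROFITS)))
```

A charting library would draw proper vertical columns; the text version only illustrates how series in rows turns each worksheet row into one bar per year.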
Statistical Summaries
• Descriptive statistics, such as
– Measures of central tendency (mean, median, mode, midrange, etc.)
– Measures of dispersion (range, variance, standard deviation, etc.)
– Frequency distributions
– Histograms
• Statistical relationships (such as correlation)
Both provide an effective way of obtaining meaningful information from data.
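As an illustration of a frequency distribution, the sketch below counts the 14 blood-pressure readings visible in the example worksheets of this lesson (NA and Others together) in classes of width 20 starting at 60. The class width, the starting point, and the helper `frequency_distribution` are our own choices, and because the full worksheet has more rows, the counts are illustrative only. Printing one `*` per reading turns the table into a rough sideways histogram.

```python
from collections import Counter

# Only the 14 readings visible on the example slides (NA first, then Others).
READINGS = [175, 162, 159, 193, 148, 151, 78,
            128, 117, 152, 138, 97, 115, 105]


def frequency_distribution(values, start, width):
    """Count values in classes [start + k*width, start + (k+1)*width).

    Assumes every value is >= start.
    """
    counts = Counter((v - start) // width for v in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in range(max(counts) + 1)
    }


if __name__ == "__main__":
    for (low, high), n in frequency_distribution(READINGS, start=60, width=20).items():
        print(f"{low:>3}-{high - 1:<3} {'*' * n} ({n})")
```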
Descriptive Statistics
• Descriptive statistics can be computed for both a sample and a population.
• “Population parameters” are the descriptive statistics of a population; they are written with Greek letters (e.g., μ, σ).
• “Sample statistics” are the descriptive statistics of a sample; they are written with Roman letters (e.g., x̄, s).
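The distinction matters in the formulas, not only in the notation. The mean is computed the same way in both cases, but the standard deviation divides by N for a population and by n − 1 for a sample. Python's `statistics.pstdev` and `statistics.stdev` implement the two versions. The sketch uses the seven NA readings visible on the example slides purely to illustrate the notation; they are not the full data set.

```python
import statistics

# Assumption: the seven NA readings visible on the example slides.
readings = [175, 162, 159, 193, 148, 151, 78]

# Treated as a complete population: parameters mu and sigma (divide by N).
mu = statistics.mean(readings)
sigma = statistics.pstdev(readings)

# Treated as a sample from a larger population: statistics x-bar and s (divide by n - 1).
x_bar = statistics.mean(readings)
s = statistics.stdev(readings)

if __name__ == "__main__":
    print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
    print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
```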
Measure of Central Tendency
    B       C
2   Blood Pressure
3   NA      Others
4   175     128
5   162     117
6   159     152
7   193     138
8   148     97
9   151     115
Blood pressures for North American (NA) and other managers are given above.
Which group has higher blood pressure?
To answer this question, we first need to measure the central tendency of each group.
Data: St-CE-Ch02-x1-Examples-Slide 13
Descriptive Measures of C.T.
Mean
  Computation method: sum of the values divided by the number of values
  Data level: ratio, interval
  Pros/Cons:
  • Numerical center of the data
  • Sum of the deviations from the mean is zero
  • Sensitive to extreme values
Median
  Computation method: middle value of the sorted data
  Data level: ratio, interval, ordinal
  Pros/Cons:
  • Not sensitive to extreme values
  • Computed only from the central values
  • Does not use information from all the data
Mode
  Computation method: value(s) that occur most frequently in the data
  Data level: ratio, interval, ordinal, nominal
  Pros/Cons:
  • May not reflect the center
  • May not exist
  • Might have multiple modes
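The pros and cons above can be checked on real numbers. The sketch uses only the seven NA readings visible on the example slides (rows 4 to 10), not the full column, so its values differ from the slides' results. It shows that the deviations from the mean sum to zero, that dropping the single extreme reading (78) moves the mean far more than the median, and that a mode may not be unique (`statistics.multimode` needs Python 3.8 or later).

```python
import statistics

# Assumption: the seven NA readings visible on the example slides (rows 4-10).
na_visible = [175, 162, 159, 193, 148, 151, 78]
without_78 = [v for v in na_visible if v != 78]

mean_all = statistics.mean(na_visible)

# Mean: the deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(v - mean_all for v in na_visible)

# Mean vs. median: removing the extreme value 78 shifts the mean far more.
mean_shift = statistics.mean(without_78) - mean_all
median_shift = statistics.median(without_78) - statistics.median(na_visible)

# Mode: no value repeats here, so every reading ties as a "mode".
modes = statistics.multimode(na_visible)

if __name__ == "__main__":
    print(f"sum of deviations: {deviation_sum:.10f}")
    print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
    print(f"modes: {modes}")
```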
Descriptive Measures of C.T.
Mean
  Excel command: =AVERAGE(Range)
  Symbol: μ (population), x̄ (sample)
  Pros/Cons:
  • Numerical center of the data
  • Sum of the deviations from the mean is zero
  • Sensitive to extreme values
Median
  Excel command: =MEDIAN(Range)
  Pros/Cons:
  • Not sensitive to extreme values
  • Computed only from the central values
  • Does not use information from all the data
Mode
  Excel command: =MODE(Range)
  Pros/Cons:
  • May not reflect the center
  • May not exist
  • Might have multiple modes
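The two mean symbols differ only in whether the average runs over the whole population (N values) or over a sample (n values). Writing the formula out also shows why the deviations from the mean always sum to zero:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0
```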
Example: Measure of C. T. (cont.)
Blood pressures for NA and other managers are given in the worksheet above.
1. Compare the mean, median, mode, and midrange.
2. Explain the meaning of the results.
3. Evaluate the implications.
Example: Measure of C. T. (cont.)
Formulas for the NA column (B4:B23):
Mean =     =AVERAGE(B4:B23)
Median =   =MEDIAN(B4:B23)
Mode =     =MODE(B4:B23)
Range =    =MAX(B4:B23)-MIN(B4:B23)
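For readers working outside Excel, the sketch below gives Python counterparts of these worksheet formulas, plus the midrange, which the next slide reports. `excel_like_mode` is a hypothetical helper: Excel's =MODE returns #N/A when no value repeats, so the sketch returns `None` in that case, and its tie rule (first tied value in data order) is our own assumption. The sample data are only the NA readings visible on the slides (rows 4 to 10 of column B), not the full range B4:B23.

```python
import statistics
from collections import Counter


def excel_like_mode(values):
    """Most frequent value; None when no value repeats (Excel's =MODE shows #N/A)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return None
    # Assumption: on a tie, return the tied value that appears first in the data.
    return next(v for v in values if counts[v] == top)


def summarize(values):
    """Python counterparts of the worksheet formulas, plus the midrange."""
    return {
        "Mean": statistics.mean(values),              # =AVERAGE(range)
        "Median": statistics.median(values),          # =MEDIAN(range)
        "Mode": excel_like_mode(values),              # =MODE(range)
        "Midrange": (max(values) + min(values)) / 2,  # (largest + smallest) / 2
        "Range": max(values) - min(values),           # =MAX(range)-MIN(range)
    }


# Assumption: only the NA readings visible on the slides (rows 4-10 of column B).
na_visible = [175, 162, 159, 193, 148, 151, 78]

if __name__ == "__main__":
    for name, value in summarize(na_visible).items():
        print(f"{name:>8} = {value}")
```

Range and midrange agree with the slide's results (115.00 and 135.50) because they depend only on the largest and smallest readings, and the reported midrange and range imply those are 193 and 78, both visible. The mean, median, and mode depend on every row, so they differ.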
Example: Measure of C. T. (cont.)
          NA       Others
Mean      158.15   119.50
Median    160.50   116.00
Mode      148.00   138.00
Midrange  135.50   120.50
Range     115.00    63.00
Checking for Extreme Values
• Before we can rely on the mean as a “good” measure of central tendency, we need to test for extreme values.
• One such test is the 3-sigma rule.
• Compute the standard deviation of the series and multiply it by 3. Add the result to the mean to get the Upper Limit for extreme-value detection, and subtract it from the mean to get the Lower Limit. Values outside these limits are treated as extreme values.
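In symbols, writing x̄ for the mean and s for the standard deviation (the sample notation introduced earlier; the slides do not state whether the sample or population standard deviation was used), the rule reads as follows. The second line plugs in the NA figures from the example that follows, where 71.60 is the slide's “3 Std Dev” value; a reading x is flagged as extreme when x > UL or x < LL.

```latex
UL = \bar{x} + 3s, \qquad LL = \bar{x} - 3s

\text{NA: } UL = 158.15 + 71.60 = 229.75, \qquad LL = 158.15 - 71.60 = 86.55
```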
Example: Checking on Extreme Value
2   Blood Pressure
3   NA      Others
4   175     128
5   162     117
6   159     152
7   193     138
8   148     97
9   151     115
10  78      105

                 NA       Others
Mean             158.15   119.50
Std Dev           23.87    17.83
3 Std Dev         71.60    53.49
Upper Limit      229.75   172.99
Lower Limit       86.55    66.01
Extreme values        1        0
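The 3-sigma check is easy to automate. The sketch below takes the mean and standard deviation reported on the slide and applies the limits to the rows visible on the slide; the full columns are assumed to be longer, so this is a partial check only. The helper names are hypothetical. The computed limits differ from the slide in the second decimal place (for example 229.76 instead of 229.75) because the slide multiplies the unrounded standard deviation.

```python
def three_sigma_limits(mean, std_dev):
    """Return (lower, upper): the mean minus and plus three standard deviations."""
    return mean - 3 * std_dev, mean + 3 * std_dev


def extreme_values(values, mean, std_dev):
    """Values that fall outside the 3-sigma limits."""
    lower, upper = three_sigma_limits(mean, std_dev)
    return [v for v in values if v < lower or v > upper]


# Mean and standard deviation as reported on the slide; the readings are only
# the rows visible on the slide (assumption: the full columns are longer).
GROUPS = {
    "NA": ([175, 162, 159, 193, 148, 151, 78], 158.15, 23.87),
    "Others": ([128, 117, 152, 138, 97, 115, 105], 119.50, 17.83),
}

if __name__ == "__main__":
    for name, (values, mean, sd) in GROUPS.items():
        lower, upper = three_sigma_limits(mean, sd)
        flagged = extreme_values(values, mean, sd)
        print(f"{name}: limits {lower:.2f} to {upper:.2f}, extreme values {flagged}")
```

The NA reading of 78 falls below the lower limit, which matches the slide's count of one extreme value for NA and none for Others.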
Example: Measure of C. T. (cont.)
          NA       Others
Mean      158.15   119.50
Median    160.50   116.00
Mode      148.00   138.00
Range     115.00    63.00
Blood pressure statistics for NA and other managers are given above.
1. Compare the mean, median, and mode.
2. Explain the meaning of the results.
3. Evaluate the implications.
Note: Comparison of the Means
• Strictly speaking, to compare the means of two series we need to run a hypothesis test (a two-sample test of means, i.e., a t-test).
• We will learn how to run hypothesis tests later.
• For now, we will make a simple face-value comparison, knowing that a true comparison requires a formal test.
Comparison: Measure of C. T. (cont.)
• The mean, median, and mode of the NA managers are all greater than those of the “Other” managers.
• The median is slightly higher than the mean for the NA managers (160.50 vs. 158.15); the opposite is true for the others (116.00 vs. 119.50).
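The face-value comparisons above can be written out explicitly from the summary table. This is a sketch of the arithmetic only; as the previous note says, a formal comparison of means would need a t-test. The variable names are our own.

```python
# Summary statistics from the worksheet, as (NA, Others), as reported on the slides.
SUMMARY = {
    "Mean": (158.15, 119.50),
    "Median": (160.50, 116.00),
    "Mode": (148.00, 138.00),
}

# Face-value comparison only: is the NA figure higher for each measure?
na_higher = {name: na > other for name, (na, other) in SUMMARY.items()}

mean_na, mean_other = SUMMARY["Mean"]
median_na, median_other = SUMMARY["Median"]
# Positive: median above mean; negative: median below mean.
median_minus_mean = (median_na - mean_na, median_other - mean_other)

if __name__ == "__main__":
    print(na_higher)
    print(f"median - mean: NA {median_minus_mean[0]:+.2f}, "
          f"Others {median_minus_mean[1]:+.2f}")
```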
Meaning: Measure of C. T. (cont.)
1 The mean values suggest that the blood pressures of NA managers are higher than those of the “other” managers.
2 The midrange can be interpreted as a rough measure of how much extreme values distort the mean. It suggests that the real mean for NA is underestimated (midrange 135.50 vs. mean 158.15), while for “Others” it is fairly accurate (midrange 120.50 vs. mean 119.50).
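The midrange argument in point 2 reduces to one subtraction per group. The sketch computes the gap between midrange and mean from the summary table; reading a large gap as a sign of distortion is the lesson's rough heuristic, not a formal test, and the variable names are our own.

```python
# Mean and midrange from the summary table (full columns, as reported on the slides).
MEAN = {"NA": 158.15, "Others": 119.50}
MIDRANGE = {"NA": 135.50, "Others": 120.50}

# A large gap hints that an extreme value at one end is pulling the statistics.
gaps = {group: round(MIDRANGE[group] - MEAN[group], 2) for group in MEAN}

if __name__ == "__main__":
    for group, gap in gaps.items():
        print(f"{group}: midrange - mean = {gap:+.2f}")
```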
Meaning: Measure of C. T. (cont.)
3 Half of the NA managers have blood pressure above the median (160.50), which is itself above the mean (158.15). The median values support point 2 for NA and suggest that the mean for “Others” may overestimate the central tendency.
4 The modes (the most frequently observed values) lie below the mean for NA (148.00 vs. 158.15) and above it for “Others” (138.00 vs. 119.50).
Evaluation: Measure of C. T. (cont.)
• High blood pressure is an indicator of stress and strain.
• The results suggest that the company's North American managers are under much more stress than its managers in other parts of the world.
• If corrective measures are not taken, then errors, loss of managerial talent, etc. may occur in NA.
Next Lesson
(Lesson – 02B)
Measure of Dispersion