Summary Descriptive Measures
Location is an indicator of where the data is located.
Scale is a measure of how “spread out” the data is.
[Figure: histograms of Projects Completed Early (percent) for Plant A and Plant B]
Criteria for Measures of Location and Scale
Must be well defined for:
Raw Data
Grouped Data
Theoretical Curves
For Business Purposes:
Must be arithmetic
Measures of Location
Mode
Simply the most frequent value in a data set.
Problems:
Raw Data:
Many data sets have no repeated values, in which case the mode does not
exist.
Grouped Data:
The mode is taken as the midpoint of the bin with the greatest
frequency.
But consider the data discussed in the last lecture.
[Figure: Histogram of Labor Costs, Frequency versus Labor Cost, axis labeled 20 to 80]
[Figure: Histogram of Labor Costs, Frequency versus Labor Costs, axis labeled 25 to 75]
Theoretical Data: The mode may not exist; consider the theoretical distribution of
random numbers, which should look like:
[Figure: Uniform Density Function, f(x) versus x = random number, for x from 0 to 1]
Measures of Location
Median
The median is that data value which has approximately the same percentage
of observations below it as above it (for large data sets this proportion will approach
50%).
The word “median” comes from the Latin word “medius”, meaning
“middle”.
Raw Data:
Finding the median from raw data is a two step process. First you
must put the data in order, then you need to find the middle value.
Example:
Data = 3, -1, 6, 10, 11
Ordered Data = -1, 3, 6, 10, 11
Median = 6
If sample size is odd then median will be the value occupying
position (n+1)/2 in the ordered data.
Example:
Data = 3, -1, 6, 10, 11, 7
Ordered Data= -1, 3, 6, 7, 10, 11
Median = any value between 6 and 7. The usual convention is to average the
two middle points to get 6.5.
If sample size is even then median is the arithmetic average of
the values occupying positions (n/2) and (n/2) +1 in the ordered
data.
Notice: The median is not computed; it is found. For example, replace the value of 11 in
the above example with 12,000. The median remains 6.5.
The median cannot be manipulated algebraically.
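The two-step process for raw data (order the values, then pick the middle) can be sketched in Python. This is a minimal illustration, not part of the original lecture; the function name `median` and the variable names are chosen here for the example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: value at position (n+1)/2, which is index n//2 when counting from 0
        return ordered[mid]
    # even n: average of the values at positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([3, -1, 6, 10, 11])              # odd sample size
even_result = median([3, -1, 6, 10, 11, 7])          # even sample size
outlier_result = median([3, -1, 6, 10, 12000, 7])    # 11 replaced by 12,000
```

Replacing 11 with 12,000 leaves the result unchanged, since only the order of the values matters, not the size of the extreme ones.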
Finding the Median of Raw Data Using EXCEL
Open the file “thickdat.xls” in the MBA Mod 1 folder.
Find an empty cell and type in =median(
Then highlight the range of the data. You should see something that looks like the
following:
Finally, type in the right parenthesis.
The result is 355 which is the average of the 30th and 31st values, both of which
happen to be 355.
Finding the Median from Grouped Data
Suppose you did not have the raw data for steel thickness, but only had the data
grouped as shown below:
Interval            m(i) Midpoint    f(i) Freq    F
341.5 to 344.5      343              1            1
344.5 to 347.5      346              3            4
347.5 to 350.5      349              8            12
350.5 to 353.5      352              8            20
353.5 to 356.5      355              20           40
356.5 to 359.5      358              13           53
359.5 to 362.5      361              5            58
362.5 to 365.5      364              2            60
Using the column labeled “F”, it is clear that the 30th and 31st observations lie in the
interval [353.5 to 356.5].
Altogether there are 20 observations in the interval [353.5 to 356.5].
Since there are 20 observations below 353.5, we need 10 more to get to the 30th
value.
ASSUMPTION:
The data points in the interval are equally spaced throughout the
interval.
To get the 30th value, we need to go 10/20ths (or .5) into the interval. Since the bin is
3 units wide, we need to go a distance of (10/20)*3 = 1.5 into the interval. Therefore
we estimate the 30th value as 353.5 + 1.5 = 355
To get the 31st value, we need to go 11/20ths (or .55) into the interval. Since the bin
is 3 units wide, we need to go a distance of (11/20)*3 = 1.65 into the interval.
Therefore we estimate the 31st value as 353.5 + 1.65 = 355.15.
The median is estimated as median = (355 + 355.15)/2 = 355.075.
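The interpolation above can be checked with a short Python sketch. It assumes, as the lecture does, that observations are equally spaced within each bin; the helper name `kth_value` is hypothetical and introduced only for this illustration.

```python
# Grouped steel-thickness data from the table: lower bin edges and frequencies.
lower_bounds = [341.5, 344.5, 347.5, 350.5, 353.5, 356.5, 359.5, 362.5]
frequencies = [1, 3, 8, 8, 20, 13, 5, 2]
bin_width = 3.0


def kth_value(k):
    """Estimate the k-th ordered observation (1-based) by linear interpolation."""
    cumulative = 0  # number of observations below the current bin
    for lower, freq in zip(lower_bounds, frequencies):
        if cumulative + freq >= k:
            # go (k - cumulative)/freq of the way into this bin
            return lower + (k - cumulative) / freq * bin_width
        cumulative += freq
    raise ValueError("k exceeds the number of observations")


n = sum(frequencies)                 # 60 observations, an even sample size
value_30 = kth_value(n // 2)         # 353.5 + (10/20)*3
value_31 = kth_value(n // 2 + 1)     # 353.5 + (11/20)*3
grouped_median = (value_30 + value_31) / 2
```

Up to floating-point rounding, the three results are 355, 355.15, and 355.075, matching the hand calculation.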
Finding the Median From Theoretical Probability Distributions
If f(x) is the probability density function of x, the median is that value med
satisfying the integral equation:
\[ \int_{-\infty}^{\mathrm{med}} f(x)\,dx = 0.5 \]
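As a worked illustration (not in the original slides), the definition can be applied to the uniform density on the interval from 0 to 1 shown earlier, assuming f(x) = 1 on that interval and 0 elsewhere:

```latex
\[
\int_{-\infty}^{\mathrm{med}} f(x)\,dx
  = \int_{0}^{\mathrm{med}} 1\,dx
  = \mathrm{med}
  = 0.5
\]
```

So the uniform distribution has a well-defined median of 0.5 even though it has no mode.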
Problems with the Median
Suppose you had two groups of people. In Group 1 you had 50 people with a
median hourly wage of $15.00 per hour. In Group 2 you had 100 people with a
median hourly wage of $17.00 per hour. Given this information can you determine
the median hourly wage of all 150 people?
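The answer is no, and a small Python sketch shows why. The wage figures below are hypothetical and use much smaller groups than the 50 and 100 people in the question; they are chosen only so that both scenarios share the same group medians ($15 and $17) and the same group sizes.

```python
from statistics import median

# Scenario A (hypothetical wages): group medians are 15 and 17.
group1_a = [10, 15, 20]
group2_a = [16, 17, 17, 17, 18]

# Scenario B (hypothetical wages): same group sizes and same group medians.
group1_b = [15, 15, 15]
group2_b = [1, 2, 17, 30, 40]

# The group-level summaries are identical in both scenarios...
medians_a = (median(group1_a), median(group2_a))
medians_b = (median(group1_b), median(group2_b))

# ...but the medians of the pooled data differ.
combined_a = median(group1_a + group2_a)
combined_b = median(group1_b + group2_b)
```

Here the pooled median is 17 in scenario A and 15 in scenario B, so the group medians and group sizes alone do not determine the overall median.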
Consider the following data:
           Time 1    Time 2    change
           5         4         -1
           10        12        2
           15        18        3
           20        19        -1
           25        23        -2
median     15        18        -1
Change in median is 18 - 15 = 3.
Median change is -1.
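The Time 1 and Time 2 figures can be verified with a few lines of Python, using the standard library's `statistics.median`. The variable names are chosen here for illustration.

```python
from statistics import median

time1 = [5, 10, 15, 20, 25]
time2 = [4, 12, 18, 19, 23]

# Change for each individual observation: Time 2 minus Time 1.
changes = [after - before for before, after in zip(time1, time2)]

change_in_median = median(time2) - median(time1)   # difference of the medians
median_change = median(changes)                    # median of the differences
```

The difference of the medians is 3 while the median of the differences is -1, which illustrates that the median cannot be manipulated algebraically.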