Descriptive Statistics
Prepared By
Masood Amjad Khan
GCU, Lahore
Index
1. Statistics (Definitions)
2. Descriptive Statistics
3. Inferential Statistics
4. Examples of Descriptive and Inferential Statistics
5. Data, Levels of Measurement
6. Variable
7. Discrete Variable
8. Continuous Variable
9. Frequency Distribution
10. Constructing a Frequency Distribution
11. Example of Constructing a Frequency Distribution
12. Displaying the Data
13. Bar Chart, Pie Chart
14. Stem and Leaf Plot
15. Graphs
16. Histogram
17. Frequency Polygon
18. Cumulative Frequency Polygon
19. Summary Measures
20. Goals
21. Arithmetic Mean
22. Characteristics of the Mean
23. Examples of Arithmetic Mean
24. Weighted Mean
25. Example of Weighted Mean
26. Geometric Mean
27. Example of Geometric Mean
28. Median
29. Example of Median
30. Properties of the Median
31. Mode
32. Examples of Mode
33. Positions of the Mean, Median, and Mode
34. Dispersion
35. Range and Mean Deviation
36. Example of Mean Deviation
37. Variance
38. Examples of Variance
39. Moments
40. Examples of Moments
41. Skewness
42. Types of Skewness
43. Coefficient of Skewness
44. Example of Skewness
45. Empirical Rule
46. Exercise
Statistics

Numerical Facts (Common Usage)
1. Number of children born in a hospital in some specified time.
2. Number of students enrolled in GCU in 2007.
3. Number of road accidents on the motorway.
4. Amount spent on research and development in GCU during 2006-2007.
5. Number of shutdowns of the computer network on a particular day.

Field or Discipline of Study
Definition: Statistics is the science of collecting, presenting, analyzing, and interpreting data to make decisions and forecasts. It has two branches, Descriptive Statistics and Inferential Statistics; probability provides the transition between the two.
Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.

Data
A data set is a collection of observations on one or more variables.
Organizing the Data
Data are organized by constructing frequency distribution tables.

Frequency Table
A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.

Example: preference for four types of beverage among 100 customers.

| Beverage | Number |
|---|---|
| Cola-Plus | 40 |
| Coca-Cola | 25 |
| Pepsi | 20 |
| 7-UP | 15 |

Frequency Distribution
A grouping of quantitative data into mutually exclusive classes showing the number of observations in each class.

Example: selling prices of 80 vehicles.

| Vehicle Selling Price ($) | Number of Vehicles |
|---|---|
| 15000 up to 24000 | 48 |
| 24000 up to 33000 | 30 |
| 33000 up to 42000 | 2 |
Displaying the Data
- Diagrams/charts: bar chart, pie chart
- Stem and leaf plot
- Graphs: histogram, frequency polygon
Variable
A variable is a characteristic under study that assumes different values for different elements (e.g., height of persons, number of students in GCU).

Qualitative or categorical variable: a variable that cannot assume a numerical value but can be classified into two or more non-numeric categories. Examples:
- Educational achievement
- Marital status
- Brand of PC

Quantitative variable: a variable that can be measured numerically. A quantitative variable is either discrete or continuous.
Continuous Variable
A variable whose observations can assume any value within a specific range. Examples:
- Amount of income tax paid
- Weight of a student
- Yearly rainfall in Murree
- Time elapsed between successive network breakdowns
Discrete Variable
A variable that can assume only certain values, with gaps between the values. Examples (number of):
- Children in a family
- Strokes on a golf hole
- TV sets owned
- Cars arriving at GCU in an hour
- Students in each section of a statistics course
Inferential Statistics
Inferential statistics consists of methods that use sample results to help make decisions or predictions about a population.

Sample
1. A portion of the population selected for study.
2. A subset of data selected from a population.

From a selected sample, inference proceeds by estimation (point estimation and interval estimation) or by testing of hypotheses.
Population
1. Consists of all individuals, items, or objects whose characteristics are being studied.
2. A collection of data that describes some phenomenon of interest.

Examples
Finite population:
- Length of fish in a particular lake
- Number of students in the statistics course in BCS
- Number of traffic violations on some specific holiday

Infinite population:
- Depth of a lake from any conceivable position
- Length of life of a certain brand of light bulb
- Stars in the sky
Descriptive and Inferential Statistics: Examples

Descriptive
1. At least 5% of all fires reported last year in Lahore were deliberately set.
2. Next to colonial homes, more residents in a specified locality prefer a contemporary design.

Inferential
1. As a result of a recent poll, most Pakistanis are in favor of an independent and powerful parliament.
2. As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double next year.
Types of Data
- Data can be classified according to level of measurement.
- The level of measurement dictates the calculations that can be done to summarize and present the data.
- It also determines the statistical tests that should be performed.

Levels of measurement

| Level | Property | Examples |
|---|---|---|
| Nominal | Data may only be classified. | Jersey numbers of football players; make of car |
| Ordinal | Data are ranked; there is no meaningful difference between values. | Your rank in class; team standings |
| Interval | Meaningful difference between values. | Temperature; dress size |
| Ratio | Meaningful zero point and meaningful ratio between values. | Number of patients seen; number of sales calls made; distance students travel to class |
Diagrams/Charts

Bar Chart: A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

Pie Chart: A chart that shows the proportion or percent that each class represents of the total number of frequencies. The angle of each slice is Angle = (f/n) × 360°.

Example: covers for cell phones (variable of interest: cover color).

| Cover Color | No. of Covers (class frequency f) | Angle = (f/n) × 360° | Percent |
|---|---|---|---|
| Bright White | 130 | 36° | 10% |
| Black | 104 | 29° | 8% |
| Lime | 325 | 90° | 25% |
| Orange | 455 | 126° | 35% |
| Red | 286 | 79° | 22% |
| Total | n = 1300 | 360° | 100% |

[Figure: bar chart of the number of covers by color (vertical axis 0 to 600) and pie chart of the percent for each color.]
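The pie-chart angles and percents can be reproduced with a short Python sketch; the helper name `pie_slices` is illustrative, not part of the original notes. It applies Angle = (f/n) × 360 to each class.

```python
# Cell-phone cover counts from the table above (class frequencies f).
covers = {"Bright White": 130, "Black": 104, "Lime": 325, "Orange": 455, "Red": 286}


def pie_slices(freqs):
    """Return {class: (percent, angle in degrees)} with angle = (f / n) * 360."""
    n = sum(freqs.values())
    return {name: (100 * f / n, 360 * f / n) for name, f in freqs.items()}


for name, (pct, angle) in pie_slices(covers).items():
    print(f"{name:<12} {pct:5.1f}%  {angle:6.1f} degrees")
```

The unrounded angles are 28.8° for Black and 79.2° for Red; the table rounds them to whole degrees.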
Graphs
- Histogram
- Frequency Polygon
- Cumulative Frequency Polygon
Describing the Data: Summary Measures
- Measures of location: arithmetic mean, weighted arithmetic mean, geometric mean, median, mode
- Measures of dispersion: range, mean deviation, variance, standard deviation
- Moments: moments about an origin, moments about the mean
- Skewness
Summary Measures: Goals
- Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean.
- Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
- Identify the position of the mean, median, and mode for both symmetric and skewed distributions.
- Compute and interpret the range, mean deviation, variance, and standard deviation.
- Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
- Understand Chebyshev's theorem and the Empirical Rule as they relate to a set of observations.
Characteristics of the Mean
The arithmetic mean is the most widely used measure of location. It requires at least the interval scale of measurement, and it is calculated by summing the values and dividing by the number of values. Its major characteristics are:
- Every set of interval-level and ratio-level data has a mean.
- All the values are included in computing the mean.
- A set of data has a unique mean.
- The mean is affected by unusually large or small data values.
- The arithmetic mean is the only measure of central tendency for which the sum of the deviations of each value from the mean is zero.
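Two of these characteristics, deviations summing to zero and sensitivity to extreme values, can be checked with a minimal Python sketch. The data values are illustrative (not from the slides), and exact `Fraction` arithmetic is used so that no rounding noise hides the zero.

```python
from fractions import Fraction


def mean_and_deviations(values):
    """Return the exact mean and the list of deviations X - mean."""
    m = Fraction(sum(values), len(values))
    return m, [x - m for x in values]


data = [2, 4, 6, 8]                        # illustrative values
m, devs = mean_and_deviations(data)
print(m, sum(devs))                        # 5 0  -> deviations sum to zero

# Add one unusually large value: the mean is pulled up, deviations still sum to zero.
m_out, devs_out = mean_and_deviations(data + [100])
print(m_out, sum(devs_out))                # 24 0
```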
Selecting a Sample
Use of Tables of Random Numbers
- Random numbers are randomly produced digits from 0 to 9.
- A table of random numbers contains rows and columns of these randomly produced digits.
- In using the table:
  - choose the starting point at random;
  - read off the digits in groups of one, two, three, or more digits in any predetermined direction (along rows or columns).

Example
Choose a sample of size 7 from a group of 80 objects.
- Label the objects 01, 02, 03, …, 80 in any order.
- Enter the table arbitrarily on any line and read off pairs of digits from any two consecutive columns.
- Ignore 00, numbers greater than 80, and numbers that recur.
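The reading procedure above can be sketched in Python. A printed table is not available here, so the sketch assumes a seeded stream of pseudo-random digits as a hypothetical stand-in for one line of the table; the function name is also illustrative.

```python
import random


def sample_from_digit_table(digits, population_size, sample_size):
    """Read two-digit groups from a string of random digits, keeping labels
    01..population_size and skipping 00, out-of-range values, and repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i] + digits[i + 1])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


# Hypothetical stand-in for a line of a printed random-number table.
rng = random.Random(2007)
table_line = "".join(str(rng.randrange(10)) for _ in range(200))
print(sample_from_digit_table(table_line, population_size=80, sample_size=7))
```

For example, reading "0081451445" gives the pairs 00, 81, 45, 14, 45; the first two are ignored and the second 45 is a repeat, so only 45 and 14 are selected.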
Construction of Frequency Distribution

Step 1: Decide how many groups (classes) to use.
- Use just enough classes to reveal the shape of the distribution.
- Let k be the desired number of classes; k should be the smallest number such that $2^k > n$.
- If n = 80 and we choose k = 6, then $2^6 = 64$, which is < 80, so k = 6 is not desirable. If we take k = 7, then $2^7 = 128$, which is > 80, so the number of classes should be 7.

Step 2: Determine the class interval (width).
- The class interval should be the same for all classes.
- The formula to determine the class width is

$$i = \frac{H - L}{k}$$

where i is the class width, H is the highest observed value, L is the lowest observed value, and k is the number of classes. The result is usually rounded up to a convenient value.
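Steps 1 and 2 translate into two small Python functions. This is a sketch. The values H = 35925 and L = 15546 come from the vehicle-price example that follows, and rounding up to a multiple of 1000 is one convenient choice among several.

```python
import math


def number_of_classes(n):
    """Smallest k with 2**k > n (the 2^k rule of Step 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(high, low, k):
    """Raw class width (H - L) / k from Step 2, before rounding up."""
    return (high - low) / k


k = number_of_classes(80)                  # 2**6 = 64 <= 80, 2**7 = 128 > 80
width = class_width(35925, 15546, k)       # about 2911.3
rounded = math.ceil(width / 1000) * 1000   # round up to a convenient value
print(k, round(width, 1), rounded)         # 7 2911.3 3000
```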
Construction of Frequency Distribution (continued)

Step 3: Set the individual class limits.
- Class limits should be very clear.
- Class limits should not overlap.
- Sometimes the class width is rounded up, which makes the span of the classes, k × i, larger than the range H - L.
- Make the lower limit of the first class a multiple of the class width.

Step 4: Tally the observations falling in each class.

Step 5: Count the number of items in each class (the class frequency).
Construction of Frequency Distribution (Example)
Raw data (ungrouped data): selling prices ($) of 80 vehicles.

23197 23372 20454 23591 24220 30655 22442 17891 18021 28683
30872 19587 21558 21639 24296 15935 20047 24285 24324 24609
26651 29076 20642 19889 19873 25251 25277 28034 23169 28337
17399 20895 20004 17357 20155 19688 28670 20818 19766 21981
20203 23765 25783 26661 24533 27453 32492 17968 24052 25799
15794 18263 23657 35851 20642 20633 20356 21442 21722 19331
32277 15546 29237 18890 20962 22845 26285 27896 35925 27443
17266 23613 21740 22374 24571 25449 22817 26613 19251 20445
Construction of Frequency Distribution (Example Continued)
- Following Step 1, with n = 80, k should be 7.
- Following Step 2, the class width should be (35925 - 15546)/7 ≈ 2911.
- The class width is usually rounded up to a multiple of 10 or 100; here it is taken as i = 3000.
- Following Step 3, with i = 3000 and k = 7, the classes span 7 × 3000 = 21000, whereas the actual range is H - L = 35925 - 15546 = 20379.
- The lower limit of the first class should be a multiple of the class width, so the lower limit of the first class is taken as 15000.
- Following Steps 4 and 5 gives the frequency distribution below.

| Selling Price ($) | Frequency |
|---|---|
| 15000 up to 18000 | 8 |
| 18000 up to 21000 | 23 |
| 21000 up to 24000 | 17 |
| 24000 up to 27000 | 18 |
| 27000 up to 30000 | 8 |
| 30000 up to 33000 | 4 |
| 33000 up to 36000 | 2 |
| Total | 80 |
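Steps 4 and 5 (tally and count) can be sketched in Python using the 80 raw prices listed earlier. The classes are half-open, "lower up to upper", to match the table, and the function name is illustrative.

```python
# The 80 raw selling prices listed above, row by row.
prices = [
    23197, 23372, 20454, 23591, 24220, 30655, 22442, 17891, 18021, 28683,
    30872, 19587, 21558, 21639, 24296, 15935, 20047, 24285, 24324, 24609,
    26651, 29076, 20642, 19889, 19873, 25251, 25277, 28034, 23169, 28337,
    17399, 20895, 20004, 17357, 20155, 19688, 28670, 20818, 19766, 21981,
    20203, 23765, 25783, 26661, 24533, 27453, 32492, 17968, 24052, 25799,
    15794, 18263, 23657, 35851, 20642, 20633, 20356, 21442, 21722, 19331,
    32277, 15546, 29237, 18890, 20962, 22845, 26285, 27896, 35925, 27443,
    17266, 23613, 21740, 22374, 24571, 25449, 22817, 26613, 19251, 20445,
]


def frequency_distribution(data, lower, width, k):
    """Count observations in classes [lower + j*width, lower + (j+1)*width)."""
    counts = [0] * k
    for x in data:
        j = (x - lower) // width
        if 0 <= j < k:
            counts[j] += 1
    return counts


freqs = frequency_distribution(prices, lower=15000, width=3000, k=7)
for j, f in enumerate(freqs):
    lo = 15000 + j * 3000
    print(f"{lo} up to {lo + 3000}: {f}")
print("Total:", sum(freqs))
```

Applied to the values exactly as listed, this gives 8, 23, 17, 17, 9, 4, 2. One observation lands in "27000 up to 30000" instead of "24000 up to 27000", which suggests that one value in the listing differs from the data used to build the table.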
Histogram
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

Example 1 (k = 6)

| Group | H | f | cf |
|---|---|---|---|
| 1.6 - 2.2 | 2.1 | 2 | 2 |
| 2.2 - 2.8 | 2.7 | 4 | 6 |
| 2.8 - 3.4 | 3.3 | 13 | 19 |
| 3.4 - 4.0 | 3.9 | 13 | 32 |
| 4.0 - 4.6 | 4.5 | 6 | 38 |
| 4.6 - 5.2 | 5.1 | 2 | 40 |

[Figure: Histogram (Example 1), k = 6; horizontal axis: Groups, 1.60 to 5.20.]
Histogram
Example 1 (k = 7)

| Group | H | f | cf |
|---|---|---|---|
| 1.5 - 2.0 | 2.0 | 2 | 2 |
| 2.0 - 2.5 | 2.5 | 2 | 4 |
| 2.5 - 3.0 | 3.0 | 5 | 9 |
| 3.0 - 3.5 | 3.5 | 15 | 24 |
| 3.5 - 4.0 | 4.0 | 8 | 32 |
| 4.0 - 4.5 | 4.5 | 6 | 38 |
| 4.5 - 5.0 | 5.0 | 2 | 40 |

[Figure: Histogram (Example 1), k = 7; horizontal axis: Groups, 1.5 to 5.0; vertical axis: Percent.]
Frequency Polygon
A graph in which the points formed by the intersections of the class midpoints and the class frequencies are connected by line segments.
Midpoint = (Li + Hi)/2, where Li and Hi are the lower and upper limits of the ith class.

Example 1 (k = 6)

| Group | Midpoint | f | cf |
|---|---|---|---|
| 1.6 - 2.2 | 1.9 | 2 | 2 |
| 2.2 - 2.8 | 2.5 | 4 | 6 |
| 2.8 - 3.4 | 3.1 | 13 | 19 |
| 3.4 - 4.0 | 3.7 | 13 | 32 |
| 4.0 - 4.6 | 4.3 | 6 | 38 |
| 4.6 - 5.2 | 4.9 | 2 | 40 |

[Figure: Frequency Polygon (Example 1), k = 6; horizontal axis: midpoints 1.90 to 4.90; vertical axis: Percent.]
Frequency Polygon (continued)
Example 1 (k = 7)

| Group | Midpoint | f | cf |
|---|---|---|---|
| 1.5 - 2.0 | 1.75 | 2 | 2 |
| 2.0 - 2.5 | 2.25 | 1 | 3 |
| 2.5 - 3.0 | 2.75 | 4 | 7 |
| 3.0 - 3.5 | 3.25 | 15 | 22 |
| 3.5 - 4.0 | 3.75 | 10 | 32 |
| 4.0 - 4.5 | 4.25 | 5 | 37 |
| 4.5 - 5.0 | 4.75 | 3 | 40 |

[Figure: Frequency Polygon (Example 1), k = 7; horizontal axis: midpoints 1.75 to 4.75; vertical axis: Percent.]
Cumulative Frequency Polygon
A graph in which the points formed by the intersections of the upper class limits and the class cumulative frequencies are connected by line segments. A cumulative frequency polygon (ogive) portrays the number or percent of observations below a given value.

Example 1 (k = 6)

| Group | Midpoint | f | cf |
|---|---|---|---|
| 1.6 - 2.2 | 1.9 | 2 | 2 |
| 2.2 - 2.8 | 2.5 | 4 | 6 |
| 2.8 - 3.4 | 3.1 | 13 | 19 |
| 3.4 - 4.0 | 3.7 | 13 | 32 |
| 4.0 - 4.6 | 4.3 | 6 | 38 |
| 4.6 - 5.2 | 4.9 | 2 | 40 |

[Figure: Ogive (Example 1), k = 6; horizontal axis: upper class limits 2.20 to 5.20; vertical axis: Cumulative Percent, 0 to 100.]
Cumulative Frequency Polygon (continued)
Example 1 (k = 7)

| Group | Midpoint | f | cf |
|---|---|---|---|
| 1.5 - 2.0 | 1.75 | 2 | 2 |
| 2.0 - 2.5 | 2.25 | 1 | 3 |
| 2.5 - 3.0 | 2.75 | 4 | 7 |
| 3.0 - 3.5 | 3.25 | 15 | 22 |
| 3.5 - 4.0 | 3.75 | 10 | 32 |
| 4.0 - 4.5 | 4.25 | 5 | 37 |
| 4.5 - 5.0 | 4.75 | 3 | 40 |

[Figure: Ogive (Example 1), k = 7; horizontal axis: upper class limits 2.00 to 5.00; vertical axis: Cumulative Percent, 0 to 100.]
Stem and Leaf Plot

What is a stem and leaf plot?
- A stem and leaf plot is a type of graph that is similar to a histogram but shows more information.
- It summarizes the shape of a set of data.
- It provides extra detail regarding individual values.
- The data are arranged by place value.
- Stem and leaf plots are great organizers for large amounts of information.
- The digits in the largest place are referred to as the stem.
- The digits in the smallest place are referred to as the leaf.
- The leaves are always displayed to the right of the stem.

What are they used for?
Series of scores of sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when stem and leaf plots could be used.
Constructing a Stem and Leaf Plot
Make a stem and leaf plot of the following temperatures for June.

77 80 82 68 65 59 61
57 50 62 61 70 69 64
67 70 62 65 65 73 76
87 80 82 83 79 79 71
80 77

- Begin with the lowest temperature. The lowest temperature of the month was 50: enter 5 in the tens (stem) column and 0 in the ones (leaf) column.
- The next lowest is 57: enter 7 in the ones column. Next is 59: enter 9 in the ones column.
- Find all of the temperatures that were in the 60s, 70s, and 80s.
- Enter the rest of the temperatures sequentially until your stem and leaf plot contains all of the data.

Temperature

| Stem (Tens) | Leaf (Ones) |
|---|---|
| 5 | 079 |
| 6 | 11224555789 |
| 7 | 001367799 |
| 8 | 0002237 |
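The construction steps can be sketched in Python for two-digit whole numbers, assuming the stem is the tens digit and the leaf is the ones digit. The function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first puts stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


june = [77, 80, 82, 68, 65, 59, 61, 57, 50, 62, 61, 70, 69, 64, 67,
        70, 62, 65, 65, 73, 76, 87, 80, 82, 83, 79, 79, 71, 80, 77]

for stem, leaves in stem_and_leaf(june).items():
    print(stem, "|", "".join(str(d) for d in leaves))
```

This prints the same four rows as the table above. For one-decimal data such as the battery-life values, the same idea applies after multiplying by 10 and rounding, so 3.7 becomes stem 3, leaf 7.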
Stem and Leaf Example
Make a stem and leaf plot for the following data.

2.4 0.7 3.9 2.8 1.3 1.6 2.9 2.6 3.7 2.1
3.2 3.5 1.8 3.1 0.3 4.6 0.9 3.4 2.3 2.5
0.4 2.1 2.3 1.5 4.3 1.8 2.4 1.3 2.6 1.8
2.7 0.4 2.8 3.5 1.4 1.7 3.9 1.1 5.9 2.0
5.3 6.3 0.2 2.0 1.9 1.2 2.5 2.1 1.2 1.7

| Freq | Stem | Leaf |
|---|---|---|
| 6 | 0 | 234479 |
| 14 | 1 | 12233456778889 |
| 17 | 2 | 00111334455667889 |
| 8 | 3 | 12455799 |
| 2 | 4 | 36 |
| 2 | 5 | 39 |
| 1 | 6 | 3 |
| 50 | Total | |
Stem and Leaf Plot Example
Following are the car battery life data (40 values).

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6 3.1 1.6
3.1 3.3 3.8 3.1 4.7 3.7 2.5 4.3 3.4 3.6
2.9 3.3 3.9 3.1 3.3 3.1 3.7 4.4 3.2 4.1
1.9 3.4 4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

Make a stem and leaf plot.

| f | Stem | Leaf |
|---|---|---|
| 2 | 1 | 69 |
| 5 | 2 | 25669 |
| 25 | 3 | 0011111222333445567778899 |
| 8 | 4 | 11234577 |
| 40 | Total | |
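Stem and Leaf Plot Example (each stem split into leaves 0-4 and leaves 5-9)

| Stem | Leaf | Frequency |
|---|---|---|
| 1 | 69 | 2 |
| 2 | 2 | 1 |
| 2 | 5669 | 4 |
| 3 | 001111122233344 | 15 |
| 3 | 5567778899 | 10 |
| 4 | 11234 | 5 |
| 4 | 577 | 3 |
| Total | | 40 |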
Measures of Location

Arithmetic Mean

Ungrouped data

Population: let X1, X2, …, XN be the N observations in the population.
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sample: let X1, X2, …, Xn be the n observations in the sample.
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Grouped data

Population: let Xi and fi be the midpoint and frequency, respectively, of the ith of k groups in the population. The mean is defined as
$$\mu = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$

Sample: let Xi and fi be the midpoint and frequency, respectively, of the ith of k groups in the sample. The mean is defined as
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
Numerical Examples of Arithmetic Mean (Ungrouped Data)

Example of sample mean
The following is a random sample of 12 clients showing the number of minutes each client used on a particular cell phone last month.

Number of minutes used: 90, 91, 77, 110, 94, 92, 89, 100, 119, 113, 112, 83

What is the mean number of minutes used?
$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 91 + 77 + \cdots + 83}{12} = \frac{1170}{12} = 97.5$$

Example of population mean
There are 12 automobile manufacturing companies in the U.S.A. Listed below is the number of patents granted by the U.S. government to each company.

| Company | Number of Patents Granted |
|---|---|
| General Motors | 511 |
| Nissan | 385 |
| DaimlerChrysler | 275 |
| Toyota | 257 |
| Honda | 249 |
| Ford | 234 |
| Mazda | 210 |
| Chrysler | 97 |
| Porsche | 50 |
| Mitsubishi | 36 |
| Volvo | 23 |
| BMW | 13 |

Is this information a sample or a population? All 12 companies are included, so it is a population, and
$$\mu = \frac{\sum X}{N} = \frac{511 + 385 + \cdots + 13}{12} = \frac{2340}{12} = 195$$
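Both examples use the same formula. Only the notation differs (X̄ with n for a sample, μ with N for a population). A minimal Python sketch, with an illustrative function name:

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Sample: minutes used by 12 clients last month.
minutes = [90, 91, 77, 110, 94, 92, 89, 100, 119, 113, 112, 83]

# Population: patents granted to each of the 12 companies.
patents = {
    "General Motors": 511, "Nissan": 385, "DaimlerChrysler": 275, "Toyota": 257,
    "Honda": 249, "Ford": 234, "Mazda": 210, "Chrysler": 97,
    "Porsche": 50, "Mitsubishi": 36, "Volvo": 23, "BMW": 13,
}

print(mean(minutes))                  # 97.5  (sample mean)
print(mean(list(patents.values())))   # 195.0 (population mean)
```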
Numerical Examples of Arithmetic Mean (Grouped Data)
The following is the frequency distribution of selling prices of vehicles at Whitner Autoplex last month. Find the arithmetic mean.

| Selling Price ($ thousands) | Frequency f | Midpoint X | fX |
|---|---|---|---|
| 15 - 18 | 8 | 16.5 | 132.0 |
| 18 - 21 | 23 | 19.5 | 448.5 |
| 21 - 24 | 17 | 22.5 | 382.5 |
| 24 - 27 | 18 | 25.5 | 459.0 |
| 27 - 30 | 8 | 28.5 | 228.0 |
| 30 - 33 | 4 | 31.5 | 126.0 |
| 33 - 36 | 2 | 34.5 | 69.0 |
| Total | 80 | | 1845.0 |

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1845}{80} \approx 23.1$$

So the mean vehicle selling price is about $23,100.
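The grouped-data formula in a minimal Python sketch. It assumes class midpoints are computed as lower limit plus half the width 3, which reproduces the midpoint column above. The function name is illustrative.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f * X) / sum(f), with X the class midpoints."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


lower_limits = [15, 18, 21, 24, 27, 30, 33]      # $ thousands, class width 3
midpoints = [lo + 1.5 for lo in lower_limits]    # (lower + upper) / 2
freqs = [8, 23, 17, 18, 8, 4, 2]

print(grouped_mean(midpoints, freqs))   # 23.0625, about 23.1 thousand dollars
```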
Point of Equilibrium
Consider values X1, X2, …, X6 with frequencies (weights) f1, f2, …, f6 placed on a beam, with X1, X2, X3 on one side of X̄ and X4, X5, X6 on the other. The object is balanced at X̄ when

$$(X_1 - \bar{X})f_1 + (X_2 - \bar{X})f_2 + (X_3 - \bar{X})f_3 = (\bar{X} - X_4)f_4 + (\bar{X} - X_5)f_5 + (\bar{X} - X_6)f_6$$

$$f_1X_1 + f_2X_2 + f_3X_3 - (f_1 + f_2 + f_3)\bar{X} = (f_4 + f_5 + f_6)\bar{X} - (f_4X_4 + f_5X_5 + f_6X_6)$$

$$f_1X_1 + f_2X_2 + f_3X_3 + f_4X_4 + f_5X_5 + f_6X_6 = (f_1 + f_2 + f_3 + f_4 + f_5 + f_6)\bar{X}$$

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + f_4X_4 + f_5X_5 + f_6X_6}{f_1 + f_2 + f_3 + f_4 + f_5 + f_6} = \frac{\sum_{i=1}^{6} f_iX_i}{\sum_{i=1}^{6} f_i}$$
Summary Measures: Weighted Mean
- A special case of the arithmetic mean.
- Used when the values of the variable carry different weights (importance), e.g., the prices of medium, large, and big soft drinks:

| Soft Drink | Price | Weight |
|---|---|---|
| Medium | $0.90 | 3 |
| Large | $1.25 | 4 |
| Big | $1.50 | 3 |

The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

$$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_iX_i}{\sum_{i=1}^{n} w_i}$$
Example: Weighted Mean
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
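The slide poses the Carter question without a worked answer. The following Python sketch applies the weighted-mean formula, using the number of employees at each rate as the weight. It also covers the soft-drink prices. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Carter Construction: hourly rates weighted by the number of employees at each rate.
rate = weighted_mean([16.50, 19.00, 25.00], [14, 10, 2])
print(round(rate, 2))   # 18.12  (= 471 / 26)

# Soft-drink prices with weights 3, 4, 3.
print(round(weighted_mean([0.90, 1.25, 1.50], [3, 4, 3]), 2))   # 1.22
```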
Summary Measures: Geometric Mean
The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n values. The formula for the geometric mean is written:

$$GM = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 \cdots X_n)^{1/n}$$

The geometric mean used as the average percent increase over n periods of time is calculated as:

$$GM = \sqrt[n]{\frac{\text{Value at the end of the period}}{\text{Value at the start of the period}}} - 1$$

- It is useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
- It has wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as GDP, which compound or build on each other.
- The geometric mean will always be less than or equal to the arithmetic mean.
Example of Geometric Mean

1. The return on investment earned by a certain company for four successive years was 30%, 20%, -40%, and 200%. Find the geometric mean rate of return on investment.

Solution: The factor 1.3 represents the 30 percent return on investment, i.e., the original investment of 1.0 plus the return of 0.3. So

$$GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$$

which shows that the average return is 1.294 - 1 = 0.294, or 29.4 percent per year.

2. If you earned $30000 in 1997 and $50000 in 2007, what is your annual rate of increase over the period?

Here the period is n = 10 years:

$$GM = \sqrt[10]{\frac{50000}{30000}} - 1 = 0.0524$$

The annual rate of increase is 5.24 percent.
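Both geometric-mean calculations in a minimal Python sketch. The product is taken through logarithms, a common way to avoid overflow for long series, and returns are first turned into growth factors (1 + r) as in the solution. Function names are illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def average_growth_rate(start, end, periods):
    """Average rate per period: (end / start) ** (1 / periods) - 1."""
    return (end / start) ** (1 / periods) - 1


# Returns of 30%, 20%, -40% and 200% as growth factors 1.3, 1.2, 0.6, 3.0.
gm = geometric_mean([1.3, 1.2, 0.6, 3.0])
print(round(gm, 3), f"average return {gm - 1:.1%}")   # 1.294 average return 29.4%

# Salary from $30000 (1997) to $50000 (2007): 10 periods.
print(f"{average_growth_rate(30000, 50000, 10):.2%}")    # 5.24%
```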
Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest.
- If the number of observations n is odd, the median is the ((n + 1)/2)th observation.
- If n is even, the median is the average of the (n/2)th and (n/2 + 1)th observations.

Example: Determine the median for each set of data.
(1) 41 15 39 54 31 15 33
(2) 15 16 27 28 41 42

Arrange each set of data in order:
(1) 15 15 31 33 39 41 54: n = 7, so the median is the 4th observation, that is, 33.
(2) 15 16 27 28 41 42: n = 6, so the median is the average of the 3rd and 4th observations, that is, (27 + 28)/2 = 27.5.

Median for Grouped Data
The median is obtained by using the formula:

$$\tilde{X} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right)$$

where m is the group containing the (n/2)th observation; Lm, Im, and fm are the lower limit, class width, and frequency of the mth group; and cf_{m-1} is the cumulative frequency of the group preceding the mth group.
Example (Median)
Find the median for the following data (Example 1). Each group runs from L up to (but not including) H.

| L | H | f | cf |
|---|---|---|---|
| 1.60 | 2.20 | 2 | 2 |
| 2.20 | 2.80 | 4 | 6 |
| 2.80 | 3.40 | 13 | 19 |
| 3.40 | 4.00 | 13 | 32 |
| 4.00 | 4.60 | 6 | 38 |
| 4.60 | 5.20 | 2 | 40 |

n/2 = 20, so the median group is 3.40 - 4.00, with Lm = 3.40, Im = 0.6, fm = 13, and cf_{m-1} = 19.

$$\tilde{X} = 3.40 + \frac{0.6}{13}(20 - 19) \approx 3.45$$
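The ungrouped and grouped median rules as a minimal Python sketch. It assumes equal-width classes given by their lower limits, as in the table above. Function names are illustrative.

```python
def median(values):
    """Median of ungrouped data: the middle value, or the mean of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def grouped_median(lower_limits, width, freqs):
    """Median = L_m + (I_m / f_m) * (n/2 - cf_{m-1}) for equal-width classes."""
    n = sum(freqs)
    cf = 0                                  # cumulative frequency before the current group
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:                 # first group whose cf reaches n/2
            return lower + (width / f) * (n / 2 - cf)
        cf += f
    raise ValueError("empty distribution")


print(median([41, 15, 39, 54, 31, 15, 33]))   # 33
print(median([15, 16, 27, 28, 41, 42]))       # 27.5
print(round(grouped_median([1.6, 2.2, 2.8, 3.4, 4.0, 4.6], 0.6,
                           [2, 4, 13, 13, 6, 2]), 2))   # 3.45
```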
Properties of the Median
- There is a unique median for each data set.
- It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
- It can be computed for ratio-level, interval-level, and ordinal-level data.
- It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
Mode
The mode is the value of the observation that appears most frequently.

[Figure: bar chart of the number of seniors in each of nine U.S. regions (New England, Middle Atlantic, E.N. Central, W.N. Central, S. Atlantic, E.S. Central, W.S. Central, Mountain, Pacific), vertical axis 0 to 900; the mode is the region with the tallest bar.]
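Mode (Grouped Data)
Calculating the mode for grouped data:

$$\text{Mode} = L_m + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times I_m$$

where m is the modal group (the group with the highest frequency), Lm and Im are its lower limit and class width, and f_{m-1} and f_{m+1} are the frequencies of the groups before and after it.

Calculate the mode of the following distribution.

| Group | f |
|---|---|
| 1.6 - 2.2 | 2 |
| 2.2 - 2.8 | 4 |
| 2.8 - 3.4 | 14 |
| 3.4 - 4.0 | 12 |
| 4.0 - 4.6 | 6 |
| 4.6 - 5.2 | 2 |

Solution: The modal group is 2.8 - 3.4, with fm = 14, f_{m-1} = 4, f_{m+1} = 12, and Im = 0.6.

$$\text{Mode} = 2.8 + \frac{14 - 4}{(14 - 4) + (14 - 12)} \times 0.6 = 3.3$$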
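The grouped-mode formula as a Python sketch. Two assumptions are made here that the notes do not spell out: a missing neighbouring group counts as frequency 0, and ties pick the first modal group. The function name is illustrative.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = L_m + (f_m - f_{m-1}) / ((f_m - f_{m-1}) + (f_m - f_{m+1})) * I_m."""
    m = max(range(len(freqs)), key=lambda j: freqs[j])   # modal group (first if tied)
    f_m = freqs[m]
    f_before = freqs[m - 1] if m > 0 else 0              # missing neighbour treated as 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = f_m - f_before, f_m - f_after
    return lower_limits[m] + d1 / (d1 + d2) * width


print(round(grouped_mode([1.6, 2.2, 2.8, 3.4, 4.0, 4.6], 0.6,
                         [2, 4, 14, 12, 6, 2]), 2))   # 3.3
```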
The Relative Positions of the Mean, Median and the Mode
Dispersion
Why study dispersion?
- A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data.
- For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.
- A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.
Range and Mean Deviation

Range
Range = Largest value - Smallest value

Mean Deviation
$$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

Example
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year was 20, 40, 50, 60, and 80. Determine the range and the mean deviation for the number of cappuccinos sold.

Range = Largest value - Smallest value = 80 - 20 = 60
Mean Deviation Example: Solution
The mean is X̄ = (20 + 40 + 50 + 60 + 80)/5 = 250/5 = 50.

| Number of Cappuccinos Sold Daily (X) | X - X̄ | Absolute Deviation |
|---|---|---|
| 20 | 20 - 50 = -30 | 30 |
| 40 | 40 - 50 = -10 | 10 |
| 50 | 50 - 50 = 0 | 0 |
| 60 | 60 - 50 = 10 | 10 |
| 80 | 80 - 50 = 30 | 30 |
| Total | | 80 |

$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{80}{5} = 16$$
Mean Deviation (Grouped Data)

$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{\sum_{i=1}^{k} f_i}$$

For the selling prices of vehicles, the mean is
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1845}{80} \approx 23.1$$

| Selling Price ($ thousands) | f | Midpoint X | X - X̄ | f × absolute deviation |
|---|---|---|---|---|
| 15 - 18 | 8 | 16.5 | -6.6 | 52.8 |
| 18 - 21 | 23 | 19.5 | -3.6 | 82.8 |
| 21 - 24 | 17 | 22.5 | -0.6 | 10.2 |
| 24 - 27 | 18 | 25.5 | 2.4 | 43.2 |
| 27 - 30 | 8 | 28.5 | 5.4 | 43.2 |
| 30 - 33 | 4 | 31.5 | 8.4 | 33.6 |
| 33 - 36 | 2 | 34.5 | 11.4 | 22.8 |
| Total | 80 | | | 288.6 |

$$MD = \frac{288.6}{80} \approx 3.61$$
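Both mean-deviation formulas as a minimal Python sketch. For the grouped case the centre is passed in explicitly, so that the slide's rounded mean of 23.1 can be reproduced. Function names are illustrative.

```python
def mean_deviation(values):
    """Mean absolute deviation about the mean: sum(|X - mean|) / n."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def grouped_mean_deviation(midpoints, freqs, center):
    """sum(f * |X - center|) / sum(f); center is the (possibly rounded) mean."""
    return sum(f * abs(x - center) for x, f in zip(midpoints, freqs)) / sum(freqs)


print(mean_deviation([20, 40, 50, 60, 80]))   # 16.0

mids = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5]
freqs = [8, 23, 17, 18, 8, 4, 2]
print(round(grouped_mean_deviation(mids, freqs, 23.1), 2))   # 3.61
```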
Variance and Standard Deviation

Population variance and standard deviation
Let X1, X2, …, XN be N observations in the population. The variance is defined as:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
The standard deviation is defined as:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample variance and standard deviation
Let X1, X2, …, Xn be n observations in the sample. The variance is defined as:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
The standard deviation is defined as:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Example: Variance and Standard Deviation
1. The number of traffic citations issued during the last five months in Beaufort County, South Carolina, is 38, 26, 13, 41, and 22. What is the population variance?
2. The hourly wages for a sample of part-time employees at Home Depot are $12, $20, $16, $18, and $19. What is the sample variance?

Solution of 2:
$$\bar{X} = \frac{\sum X}{n} = \frac{85}{5} = 17.0$$

| Hourly Wage X ($) | X - X̄ | (X - X̄)² |
|---|---|---|
| 12 | -5 | 25 |
| 20 | 3 | 9 |
| 16 | -1 | 1 |
| 18 | 1 | 1 |
| 19 | 2 | 4 |
| Total: 85 | 0 | 40 |

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{40}{4} = 10.0$$
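The two definitions differ only in the divisor (N for a population, n - 1 for a sample). This minimal Python sketch also answers the first question, which the slide leaves unsolved. Function names are illustrative.

```python
def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((X - mean)^2) / (n - 1)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


citations = [38, 26, 13, 41, 22]   # all five months: a population
wages = [12, 20, 16, 18, 19]       # a sample of hourly wages ($)
print(population_variance(citations))   # 106.8
print(sample_variance(wages))           # 10.0
```

Python's standard `statistics.pvariance` and `statistics.variance` implement the same two formulas.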
Example (Grouped Data)
The sample standard deviation for grouped data is defined as:
$$s = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f - 1}}$$

Example: For the following frequency distribution of vehicle prices, compute the standard deviation of the prices.
Example (continued)
An alternate method of computing the variance is:
$$s^2 = \frac{1}{n - 1}\left(\sum fX^2 - \frac{(\sum fX)^2}{n}\right)$$

Example

| Group | Midpoint (X) | f | fX | fX² |
|---|---|---|---|---|
| 1.5 - 2.0 | 1.75 | 2 | 3.5 | 6.125 |
| 2.0 - 2.5 | 2.25 | 2 | 4.5 | 10.13 |
| 2.5 - 3.0 | 2.75 | 5 | 13.75 | 37.81 |
| 3.0 - 3.5 | 3.25 | 15 | 48.75 | 158.4 |
| 3.5 - 4.0 | 3.75 | 8 | 30 | 112.5 |
| 4.0 - 4.5 | 4.25 | 6 | 25.5 | 108.4 |
| 4.5 - 5.0 | 4.75 | 2 | 9.5 | 45.13 |
| Total | | 40 | 135.5 | 478.5 |

$$s^2 = \frac{1}{40 - 1}\left(478.5 - \frac{(135.5)^2}{40}\right) \approx 0.5$$

so s ≈ 0.71.
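The shortcut formula needs only the column totals Σf, ΣfX, and ΣfX². A Python sketch (illustrative function name):

```python
import math


def grouped_sample_variance(midpoints, freqs):
    """Shortcut formula: s^2 = (sum(f X^2) - (sum(f X))^2 / n) / (n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


mids = [1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75]
freqs = [2, 2, 5, 15, 8, 6, 2]
var = grouped_sample_variance(mids, freqs)
print(round(var, 4), round(math.sqrt(var), 3))   # 0.4998 0.707
```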
Moments

Moments about an origin 'a'
The rth moment about origin 'a' is defined as:
$$m'_r = \frac{\sum (X - a)^r}{n}$$
For grouped data, the rth moment about origin 'a' is defined as:
$$m'_r = \frac{\sum f(X - a)^r}{\sum f}$$

Moments about the mean
The rth moment about the mean is defined as:
$$m_r = \frac{\sum (X - \bar{X})^r}{n}$$
For grouped data, the rth moment about the mean is defined as:
$$m_r = \frac{\sum f(X - \bar{X})^r}{\sum f}$$

The first moment about the mean is zero.
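Example of Moments about the Mean

| Group | Midpoint (X) | f | fX | f(X - X̄)² | f(X - X̄)³ | f(X - X̄)⁴ |
|---|---|---|---|---|---|---|
| 1.5 - 2.0 | 1.75 | 2 | 3.5 | 5.445 | -8.98425 | 14.824013 |
| 2.0 - 2.5 | 2.25 | 2 | 4.5 | 2.645 | -3.04175 | 3.4980125 |
| 2.5 - 3.0 | 2.75 | 5 | 13.75 | 2.1125 | -1.373125 | 0.8925313 |
| 3.0 - 3.5 | 3.25 | 15 | 48.75 | 0.3375 | -0.050625 | 0.0075937 |
| 3.5 - 4.0 | 3.75 | 8 | 30 | 0.98 | 0.343 | 0.12005 |
| 4.0 - 4.5 | 4.25 | 6 | 25.5 | 4.335 | 3.68475 | 3.1320375 |
| 4.5 - 5.0 | 4.75 | 2 | 9.5 | 3.645 | 4.92075 | 6.6430125 |
| Total | | 40 | 135.5 | 19.5 | -4.50125 | 29.11725 |

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{135.5}{40} \approx 3.4$$

(The deviations in the table use X̄ = 3.4.)

$$m_r = \frac{\sum f(X - \bar{X})^r}{\sum f}$$

$$m_2 = \frac{19.5}{40} = 0.4875, \quad m_3 = \frac{-4.50125}{40} \approx -0.1125, \quad m_4 = \frac{29.11725}{40} \approx 0.7279$$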
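One function covers both kinds of moment: the moment about an arbitrary point a, with a = X̄ giving the moments about the mean. This is a Python sketch (illustrative name). The table rounds the mean to 3.4, so that value is passed explicitly to reproduce m2, m3, and m4.

```python
def grouped_moment(midpoints, freqs, r, a=0.0):
    """r-th moment about the point a for grouped data: sum(f (X - a)^r) / sum(f).
    With a equal to the mean this is the moment about the mean, m_r."""
    return sum(f * (x - a) ** r for x, f in zip(midpoints, freqs)) / sum(freqs)


mids = [1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75]
freqs = [2, 2, 5, 15, 8, 6, 2]

exact_mean = grouped_moment(mids, freqs, 1)    # first moment about 0: 3.3875
print(abs(grouped_moment(mids, freqs, 1, exact_mean)) < 1e-12)   # True: m_1 = 0

# Using the rounded mean 3.4, as in the table above:
for r in (2, 3, 4):
    print(r, round(grouped_moment(mids, freqs, r, a=3.4), 4))    # 0.4875, -0.1125, 0.7279
```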
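Example of Moments (Continued)

| Class | f | X | fX | X - X̄ | f(X - X̄) | f(X - X̄)² | f(X - X̄)³ | f(X - X̄)⁴ |
|---|---|---|---|---|---|---|---|---|
| 0.0-0.8 | 5 | 0.4 | 2 | -1.97 | -9.84 | 19.37 | -38.11 | 75.00 |
| 0.8-1.6 | 9 | 1.2 | 10.8 | -1.17 | -10.51 | 12.28 | -14.34 | 16.75 |
| 1.6-2.4 | 15 | 2.0 | 30 | -0.37 | -5.52 | 2.03 | -0.75 | 0.28 |
| 2.4-3.2 | 10 | 2.8 | 28 | 0.43 | 4.32 | 1.87 | 0.81 | 0.35 |
| 3.2-4.0 | 6 | 3.6 | 21.6 | 1.23 | 7.39 | 9.11 | 11.22 | 13.82 |
| 4.0-4.8 | 2 | 4.4 | 8.8 | 2.03 | 4.06 | 8.26 | 16.78 | 34.10 |
| 4.8-5.6 | 1 | 5.2 | 5.2 | 2.83 | 2.83 | 8.02 | 22.71 | 64.32 |
| 5.6-6.4 | 2 | 6.0 | 12 | 3.63 | 7.26 | 26.38 | 95.82 | 348.03 |
| Total | 50 | | 118.4 | | 0 | 87.31 | 94.14 | 552.65 |

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{118.4}{50} \approx 2.37$$

$$m_2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = \frac{87.31}{50} \approx 1.75, \quad m_3 = \frac{\sum f(X - \bar{X})^3}{\sum f} = \frac{94.14}{50} \approx 1.88, \quad m_4 = \frac{\sum f(X - \bar{X})^4}{\sum f} = \frac{552.65}{50} \approx 11.05$$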
Skewness
- The mean, median, and mode are measures of central location for a set of observations, and the range and the standard deviation are measures of data dispersion.
- Another characteristic of a set of data is its shape. There are four shapes commonly observed:
  - symmetric,
  - positively skewed,
  - negatively skewed,
  - bimodal.
- The coefficient of skewness can range from -3 up to 3.
  - A value near -3, such as -2.57, indicates considerable negative skewness.
  - A value such as 1.63 indicates moderate positive skewness.
  - A value of 0, which will occur when the mean and median are equal, indicates that the distribution is symmetrical and that there is no skewness present.
Types of Skewness
Coefficient of Skewness
The Pearson coefficient of skewness is defined as:
$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

Example
Following are the earnings per share for a sample of 15 software companies for the year 2005. The earnings per share are arranged from smallest to largest.
- Compute the mean, median, and standard deviation.
- Find the coefficient of skewness using Pearson's estimate.
- What is your conclusion regarding the shape of the distribution?

Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \cdots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22$$

With a median of $3.18,
$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017$$

The shape is moderately positively skewed.
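Example of Skewness (Continued)

| Class | f | cf | X | fX | f(X - X̄)² |
|---|---|---|---|---|---|
| 0.0-0.8 | 5 | 5 | 0.4 | 2 | 20.65 |
| 0.8-1.6 | 8 | 13 | 1.2 | 9.6 | 12.14 |
| 1.6-2.4 | 14 | 27 | 2.0 | 28 | 2.61 |
| 2.4-3.2 | 11 | 38 | 2.8 | 30.8 | 1.49 |
| 3.2-4.0 | 7 | 45 | 3.6 | 25.2 | 9.55 |
| 4.0-4.8 | 2 | 47 | 4.4 | 8.8 | 7.75 |
| 4.8-5.6 | 1 | 48 | 5.2 | 5.2 | 7.66 |
| 5.6-6.4 | 2 | 50 | 6.0 | 12 | 25.46 |
| Total | 50 | | | 121.6 | 87.31 |

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{121.6}{50} \approx 2.43$$

$$s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{87.31}{49}} \approx 1.3348$$

$$\tilde{X} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right) = 1.6 + \frac{0.8}{14}\left(\frac{50}{2} - 13\right) \approx 2.29$$

$$sk = \frac{3(\bar{X} - \tilde{X})}{s} = \frac{3(2.43 - 2.29)}{1.3348} \approx 0.3147$$

The skewness can also be measured with moments as:
$$b = \frac{m_3^2}{m_2^3}$$
With m2 = 1.75 and m3 = 1.62, b = 0.492.

The shape is slightly positively skewed.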
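Both skewness measures from the grouped table above, as a Python sketch. Function names are illustrative. The calculation uses midpoints at lower limit + 0.4 and keeps all digits. As a result the Pearson value comes out near 0.33 rather than the slide's 0.3147, which uses the rounded mean 2.43 and median 2.29. Both values indicate slight positive skewness.

```python
import math


def grouped_stats(lower_limits, width, freqs):
    """Mean, grouped median, sample s.d., and moments m2, m3 about the mean."""
    mids = [lo + width / 2 for lo in lower_limits]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))
    s = math.sqrt(ss / (n - 1))
    m2 = ss / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(mids, freqs)) / n
    cf = 0
    for lo, f in zip(lower_limits, freqs):    # median group: first with cf >= n/2
        if cf + f >= n / 2:
            median = lo + width / f * (n / 2 - cf)
            break
        cf += f
    return mean, median, s, m2, m3


def pearson_sk(mean, median, s):
    """Pearson coefficient of skewness: 3 (mean - median) / s."""
    return 3 * (mean - median) / s


lows = [0.0, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6]
freqs = [5, 8, 14, 11, 7, 2, 1, 2]
mean, median, s, m2, m3 = grouped_stats(lows, 0.8, freqs)
sk = pearson_sk(mean, median, s)
b = m3 ** 2 / m2 ** 3
print(round(mean, 3), round(median, 3), round(s, 4))   # 2.432 2.286 1.3348
print(round(sk, 3), round(b, 3))                        # 0.329 0.492
print(round(pearson_sk(4.95, 3.18, 5.22), 3))           # 1.017 (earnings-per-share example)
```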
Example (Skewness)
[Figure: histogram of the example data (classes from 0.00 to 6.40, vertical axis in percent) marking the positions of the mode, median, and mean.]
Empirical Rule
For a symmetrical, bell-shaped frequency distribution:
- Approximately 68% of the observations will lie within plus and minus one standard deviation of the mean (mean ± s.d.).
- About 95% of the observations will lie within plus and minus two standard deviations of the mean (mean ± 2 s.d.).
- Practically all (99.7%) will lie within plus and minus three standard deviations of the mean (mean ± 3 s.d.).
- Let the mean of a symmetric distribution be 100 and the standard deviation be 10. Then by the empirical rule about 68% of the observations lie between 90 and 110, about 95% between 80 and 120, and 99.7% between 70 and 130.
Example: Empirical Rule
Consider the following distribution.

Raw data (40 observations):
1.6 2.5 3.0 3.4 3.8
1.8 2.6 3.2 3.5 4.1
2.0 2.6 3.2 3.6 4.1
2.3 2.6 3.2 3.6 4.2
2.3 2.8 3.3 3.6 4.3
2.3 2.8 3.3 3.7 4.3
2.4 2.9 3.4 3.7 4.5
2.5 3.0 3.4 3.8 4.6

Grouped data:

| Group | f | X | fX | fX² |
|---|---|---|---|---|
| 1.5 - 2.0 | 2 | 1.75 | 3.5 | 6.13 |
| 2.0 - 2.5 | 5 | 2.25 | 11.3 | 25.3 |
| 2.5 - 3.0 | 8 | 2.75 | 22 | 60.5 |
| 3.0 - 3.5 | 10 | 3.25 | 32.5 | 106 |
| 3.5 - 4.0 | 8 | 3.75 | 30 | 113 |
| 4.0 - 4.5 | 5 | 4.25 | 21.3 | 90.3 |
| 4.5 - 5.0 | 2 | 4.75 | 9.5 | 45.1 |
| Total | 40 | | 130 | 446 |

Check the empirical rule (percent of the 40 observations in each interval).

From the raw data: Mean = 3.2, s.d. = 0.75
- Mean ± s.d. = (2.45 – 3.95): 65%
- Mean ± 2 s.d. = (1.7 – 4.7): 97.5%
- Mean ± 3 s.d. = (0.95 – 5.45): 100%

From the grouped data: Mean = 3.25, s.d. = 0.77
- Mean ± s.d. = (2.48 – 4.02): 65%
- Mean ± 2 s.d. = (1.71 – 4.79): 97.5%
- Mean ± 3 s.d. = (0.94 – 5.56): 100%
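The empirical-rule check can be automated in Python by counting the raw observations that fall within mean ± k·s.d. for k = 1, 2, 3. This is a sketch: the function name is illustrative, the interval endpoints are counted as inside, and `statistics.stdev` (the sample standard deviation) matches the n - 1 definition used above.

```python
import statistics

# The 40 raw observations above, in sorted order.
raw = [1.6, 1.8, 2.0, 2.3, 2.3, 2.3, 2.4, 2.5,
       2.5, 2.6, 2.6, 2.6, 2.8, 2.8, 2.9, 3.0,
       3.0, 3.2, 3.2, 3.2, 3.3, 3.3, 3.4, 3.4,
       3.4, 3.5, 3.6, 3.6, 3.6, 3.7, 3.7, 3.8,
       3.8, 4.1, 4.1, 4.2, 4.3, 4.3, 4.5, 4.6]


def empirical_rule_check(data, mean, sd):
    """Percent of observations within mean +/- k*sd for k = 1, 2, 3."""
    return {k: 100 * sum(mean - k * sd <= x <= mean + k * sd for x in data) / len(data)
            for k in (1, 2, 3)}


mean = statistics.mean(raw)     # about 3.195
sd = statistics.stdev(raw)      # about 0.754
print(empirical_rule_check(raw, mean, sd))     # {1: 65.0, 2: 97.5, 3: 100.0}
print(empirical_rule_check(raw, 3.25, 0.77))   # grouped-data mean and s.d.: same percentages
```

For this small data set about 65% of the observations lie within one standard deviation of the mean, somewhat below the 68% expected for a bell-shaped distribution.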
Exercise
For the following data of examination marks, find the mean, median, mode, mean deviation, and variance. Also find the skewness.

| Marks | No. of Students |
|---|---|
| 30 – 39 | 8 |
| 40 – 49 | 87 |
| 50 – 59 | 190 |
| 60 – 69 | 304 |
| 70 – 79 | 211 |
| 80 – 89 | 85 |
| 90 – 99 | 20 |
The following is the distribution of daily wages per thousand employees in a certain factory.

| Daily Wages | No. of Employees |
|---|---|
| 22 | 3 |
| 24 | 13 |
| 26 | 43 |
| 28 | 102 |
| 30 | 175 |
| 32 | 220 |
| 34 | 204 |
| 36 | 139 |
| 38 | 69 |
| 40 | 25 |
| 42 | 6 |
| 44 | 1 |

Calculate the modal and median wages. Why is there a difference between the two?