Describing Data: Numerical Measures
Chapter Three
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.
In this Chapter (3), we learn to describe data using
2 numerical techniques:
1. Measures of Location
2. Measures of Dispersion
Measure of location
Often, we want to know in a set of collected data:
What is a representative or typical value?
OR
What is the center/average of the distribution?
Examples: US family income, price of a house in LA, rainfall in Seattle, batting scores.
Read the inset ‘Statistics in action’ on page 58.
Three types of ‘averages’:
• Mean
• Median
• Mode
Population Mean is the sum of all the population values divided by the total number of population values:
$$\mu = \frac{\sum X}{N}$$
where
µ “mu” is the population mean
N is the total number of observations. (Note the ‘Capital’ N)
X is a particular raw data value.
Σ “sigma” indicates the operation of adding.
Mean=Average=Arithmetic Mean (synonyms)
Two terms you should know:
i) Parameter - is a measurable characteristic of a population.
Hence, Population Mean μ is a Parameter.
ii) Statistic - is a measurable characteristic of a sample.
Hence, Sample Mean X̄ is a Statistic.
A Parameter is a measurable characteristic of a
population.
Example 1
The Kiers family owns four cars. The following is the current mileage on each of the four cars:
56,000
42,000
23,000
73,000
Find the mean mileage for the cars.
$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$
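The mileage calculation can be checked with a minimal Python sketch. The variable names `mileages`, `N`, and `mu` are illustrative choices, not from the textbook, and the four cars are treated as the whole population, as in the example.

```python
# Current mileage of the four Kiers family cars (the whole population).
mileages = [56_000, 42_000, 23_000, 73_000]

N = len(mileages)        # population size (capital N)
mu = sum(mileages) / N   # population mean: sum of all values divided by N

print(mu)  # 48500.0
```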
A Statistic is a measurable characteristic of a sample.
Sample Mean is the sum of all the sample values divided by the number of sample values:
$$\bar{X} = \frac{\sum X}{n}$$
where X̄ is read “X bar” (not μ!) and n is the total number of values in the sample. (Note the ‘small’ n.)
A sample of five executives received the following bonus last year ($000):
14.0, 15.0, 17.0, 16.0, 15.0
$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
The sample mean here is a Statistic (i.e., not a Parameter).
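The same bonus figures can be run through a short Python sketch. It computes the sample mean by hand and compares it with `statistics.mean` from the standard library; the names `bonuses`, `n`, and `x_bar` are illustrative only.

```python
from statistics import mean

# Bonuses of the five sampled executives, in $000.
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]

n = len(bonuses)            # sample size (small n)
x_bar = sum(bonuses) / n    # sample mean: sum of sample values divided by n

print(round(x_bar, 1))          # 15.4
print(round(mean(bonuses), 1))  # 15.4, same result from the standard library
```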
Properties of the Mean
• Every set of interval-level and ratio-level data has a mean.
• A set of data has a unique mean.
• Sum of deviations of each value from the mean is zero.*
• All values are included in computing the mean (a good thing).**
• The mean is affected by unusually large or small outlier data values (a shortcoming).
* see next slide
** not true of Median or Mode
Example 3
Consider the set of values: 3, 8, and 4. The mean is 5. This illustrates the property that the sum of the deviations from the mean is zero:
$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$
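A quick Python check of the zero-sum property for the values 3, 8, and 4. This is only a sketch; with data whose mean is not exactly representable in floating point, the sum may come out as a tiny nonzero number rather than exactly 0.

```python
values = [3, 8, 4]

x_bar = sum(values) / len(values)          # mean of the three values: 5.0
deviations = [x - x_bar for x in values]   # [-2.0, 3.0, -1.0]

print(deviations)
print(sum(deviations))  # 0.0
```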
The Median is the value at the middle location after all the data have been ordered from the smallest to the largest.
For an odd number of values, the median is the middle number, found at position (n+1)/2 in the ordered data.
For an even number of values, (n+1)/2 falls between two positions, and the median is the arithmetic average of the two middle numbers on either side of it.
The ages for a sample of five college students are: 21, 25, 19, 20, 22.
• Arrange the data in ascending order: 19, 20, 21, 22, 25.
• The median is at location (5+1)/2 = 3, so the median is 21.
Question:
Calculate the median if the age of the 5th student is 60 years (and not 25).
The heights of four basketball players, in inches, are: 76, 73, 80, 75.
• Arrange the data in ascending order: 73, 75, 76, 80.
• The median is around the (4+1)/2 = 2.5th location.
• Take the mean of the 2nd and 3rd observations: (75 + 76)/2.
• Thus the median is 75.5.
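The odd and even cases can be sketched in a few lines of Python. The helper name `median` is an illustrative choice (the standard library's `statistics.median` behaves the same way for these inputs).

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)   # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2               # 0-based index of the (upper) middle value
    if n % 2 == 1:
        return ordered[mid]    # odd n: the value at 1-based position (n+1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]

print(median(ages))     # 21
print(median(heights))  # 75.5
```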
Properties of the Median
• Unique to each data set
• Not affected by extremely large or small values (avoids influence of outlier values)
• Can be computed for ordinal, interval and ratio level data
E.g., a good measure for housing prices
The Mode is another measure of location and
represents the value of the observation that appears
most frequently.
Examples: The exam scores for ten students are: 81, 93, 84, 75,
68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most
often, it is the mode.
Mode can help you with making decisions!
Prof. Beatle gives out more “B”s than any other grade.
If you have excess production capacity, you may make more of
the product that sells most.
Properties of the Mode
• Can be used for all levels of data (nominal, ordinal, interval and ratio).
• Not affected by extreme values
Problems:
• If every data value is unique, there is no mode!
• Several values can tie for the highest frequency, leading to multiple modes (e.g., bimodal, trimodal, etc.).
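A minimal Python sketch of finding the mode, covering the two problem cases as well. The helper `modes` is hypothetical, and returning an empty list when every value is unique is a design choice made here to match the slide's "no mode" rule.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list if every value is unique."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())   # highest frequency in the data
    if top == 1:
        return []                # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == top)

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

print(modes(scores))            # [81]
print(modes([1, 2, 2, 3, 3]))   # [2, 3], a bimodal set
print(modes([1, 2, 3]))         # [], no mode
```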
Practice
Problem #62, pages 87-88
State how you would do (a) and (b).
What is the answer to (c)?
Dispersion
- spread or variability in the data.
If you are told the river ahead has an average depth of
4 feet, would you begin crossing it?
The mean by itself is not reliable if dispersion is high.
Dispersion is useful in comparing two sets of data with the same mean value.
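To make the river example concrete, here is a minimal Python sketch with two made-up sets of depth readings. Both lists are hypothetical illustrations, not data from the textbook. They share a mean of 4 feet but differ greatly in spread.

```python
from statistics import mean

# Hypothetical depth readings in feet (invented for illustration only).
steady_river = [4, 4, 4, 4, 4]
uneven_river = [1, 1, 2, 4, 12]

for name, depths in (("steady", steady_river), ("uneven", uneven_river)):
    spread = max(depths) - min(depths)   # highest depth minus lowest depth
    print(name, "mean:", mean(depths), "spread:", spread)
# steady mean: 4 spread: 0
# uneven mean: 4 spread: 11
```

Both rivers average 4 feet, but only the first is safe to wade across everywhere.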
Measures of dispersion
• Range
• Variance
• Standard deviation
Range
The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio:
-8.1   -5.1   -3.1   -1.4    1.2
 3.2    4.1    4.6    4.8    5.7
 5.9    6.3    7.9    7.9    8.0
 8.1    9.2    9.5    9.7   10.3
12.3   13.3   14.0   15.0   22.1
Lowest value: -8.1
Highest value: 22.1
Range = Highest value - lowest value
= 22.1 - (-8.1)
= 30.2
The range uses just 2 values!
Not useful if the range is wide.
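The range calculation for the 25 Return on Equity figures, as a short Python sketch. The names `roe` and `data_range` are illustrative; the result is rounded because floating-point subtraction can leave a tiny trailing error.

```python
# Return on Equity for the 25 companies in the portfolio.
roe = [
    -8.1, -5.1, -3.1, -1.4, 1.2,
    3.2, 4.1, 4.6, 4.8, 5.7,
    5.9, 6.3, 7.9, 7.9, 8.0,
    8.1, 9.2, 9.5, 9.7, 10.3,
    12.3, 13.3, 14.0, 15.0, 22.1,
]

data_range = max(roe) - min(roe)   # highest value minus lowest value

print(min(roe), max(roe))    # -8.1 22.1
print(round(data_range, 1))  # 30.2
```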
Variance:
- average of the squared deviations
from the mean.
- larger deviations are given higher weight
when squared.
Standard deviation:
- square root of the variance
- brings the variance to the same unit as the data
Population Variance formula:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
X is the value of an observation in the population
μ is the arithmetic mean of the population
N is the number of observations in the population
Example – Page 78
Fill me in
σ is called the Population Standard Deviation
(has the same unit of measure as the original data)
Sample variance (s²)
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Divide by (n-1) & NOT n
Sample standard deviation (s)
$$s = \sqrt{s^2}$$
Guaranteed question in any stat test!
Watch out if data is a Population (σ) or Sample (s).
Practice Time!
The hourly wages earned by a sample of five students are:
$7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$
$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
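The wage calculation can be reproduced step by step in Python. This is a sketch with illustrative variable names; it also calls `statistics.variance` (which divides by n-1) and `statistics.pvariance` (which divides by N) to show how much the choice of denominator matters.

```python
from math import sqrt
from statistics import pvariance, variance

wages = [7, 5, 11, 8, 6]   # hourly wages of the five sampled students, in dollars

n = len(wages)
x_bar = sum(wages) / n                                # sample mean: 37 / 5
sum_sq_dev = sum((x - x_bar) ** 2 for x in wages)     # sum of squared deviations
s2 = sum_sq_dev / (n - 1)                             # sample variance: divide by n-1, not n
s = sqrt(s2)                                          # sample standard deviation

print(round(x_bar, 2), round(sum_sq_dev, 1), round(s2, 2), round(s, 2))
# 7.4 21.2 5.3 2.3

print(round(variance(wages), 2))    # 5.3, library sample variance (n-1)
print(round(pvariance(wages), 2))   # 4.24, what dividing by N would give instead
```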
Say you conduct a study of the heights of all students in class.
You compute the mean and standard deviation. Now you decide to count:
how many students are within mean ± 1 s.d.
how many students are within mean ± 2 s.d.
how many students are within mean ± 3 s.d., … etc.
Chebyshev’s theorem:
For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:
$$1 - \frac{1}{k^2}$$
where k is any constant greater than 1.
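A small Python sketch that evaluates the Chebyshev bound for a few values of k. The function name `chebyshev_min_proportion` is hypothetical; it simply evaluates 1 - 1/k² and rejects k ≤ 1, where the theorem says nothing useful.

```python
def chebyshev_min_proportion(k):
    """Lower bound on the share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%}")
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

The bound holds for any shape of distribution, which is why it is weaker than the figures for bell-shaped data.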
Practice!
Page 84
Problems: 49, 50
Empirical Rule: For any symmetrical, bell-shaped distribution:
• About 68% of the observations will lie within 1 s.d. of the mean
• About 95% of the observations will lie within 2 s.d. of the mean
• Virtually all the observations will be within 3 s.d. of the mean
Empirical Rule
See Page 496: Z Column – Values 1, 2, 3
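The three percentages can be reproduced with Python's standard library, under the assumption that the bell-shaped distribution is exactly normal (the rule itself only claims approximate figures). `NormalDist` is in the `statistics` module from Python 3.8; the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, s.d. 1 (assumed shape)

shares = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations.
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"within {k} s.d.: {shares[k]:.1%}")
# within 1 s.d.: 68.3%
# within 2 s.d.: 95.4%
# within 3 s.d.: 99.7%
```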