Download chap03 - Kent State University

Statistics
1
Alan D. Smith
Descriptive Statistics: Measures of Central Tendency
• Chapters 3 & 4
TODAY’S GOALS
2
• TO CALCULATE THE ARITHMETIC MEAN,
THE WEIGHTED MEAN, THE MEDIAN, THE
MODE, AND THE GEOMETRIC MEAN.
• TO EXPLAIN THE CHARACTERISTICS,
USE, ADVANTAGES, AND DISADVANTAGES
OF EACH MEASURE OF CENTRAL
TENDENCY.
• TO IDENTIFY THE POSITION OF THE
ARITHMETIC MEAN, MEDIAN, AND MODE
FOR BOTH A SYMMETRICAL AND A
SKEWED DISTRIBUTION.
POPULATION MEAN
3
• Definition: For ungrouped data, the population
mean is the sum of all the population values
divided by the total number of population
values. To compute the population mean, use the
following formula:
$$\mu = \frac{\sum X}{N}$$
where $\mu$ (mu) is the population mean, $\sum$ (sigma)
denotes the sum, $X$ is an individual value, and
$N$ is the population size.
EXAMPLE
4
• Parameter: A measurable characteristic of a
population. For example, the population mean.
• A racing team has a fleet of four cars. The
following are the miles covered by each car over
their lives: 23,000, 17,000, 9,000, and 13,000.
Find the mean number of miles covered per car.
• Since this fleet is the population, the mean is
(23,000 + 17,000 + 9,000 + 13,000)/4 = 15,500 miles.
THE SAMPLE MEAN
5
• Definition: For ungrouped data, the sample
mean is the sum of all the sample values divided
by the number of sample values. To compute the
sample mean, use the following formula.
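$$\bar{X} = \frac{\sum X}{n}$$
where $\bar{X}$ (X-bar) is the sample mean, $\sum$ (sigma)
denotes the sum, $X$ is an individual value, and
$n$ is the sample size.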
WHAT’S THE DIFFERENCE?
• Population Mean $\mu$
• Sample Mean $\bar{X}$
6
EXAMPLE
7
• Statistic: A measurable characteristic of a
sample. For example, the sample mean.
• A sample of five executives received the
following bonuses last year (in $1,000): 14, 15,
17, 16, and 15. Find the average bonus for
these five executives.
• Since these values represent a sample of size 5,
the sample mean is (14,000 + 15,000 + 17,000 +
16,000 + 15,000)/5 = $15,400.
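Both examples use the same arithmetic; only the interpretation differs (a parameter for the fleet, a statistic for the executives). A minimal Python sketch using the standard `statistics` module; the variable names are illustrative and not from the slides:

```python
from statistics import mean

# Population: lifetime miles for all four cars in the fleet.
fleet_miles = [23_000, 17_000, 9_000, 13_000]
# Sample: bonuses in dollars for the five executives.
bonuses = [14_000, 15_000, 17_000, 16_000, 15_000]

population_mean = mean(fleet_miles)  # mu = sum(X) / N
sample_mean = mean(bonuses)          # X-bar = sum(X) / n

print(population_mean)  # 15500
print(sample_mean)      # 15400
```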
PROPERTIES OF THE ARITHMETIC
MEAN
8
• Every set of interval-level and ratio-level data
has a mean.
• All the values are included in computing the
mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small
data values.
• The arithmetic mean is the only measure of
central tendency where the sum of the deviations
of each value from the mean is zero.
Sum of Deviations
9
• Consider the set of values: 3, 8, and 4. The mean
is 5. So (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0.
• Symbolically we write:
$$\sum (X - \bar{X}) = 0$$
• The mean is also known as the “expected value”
or “average.”
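The zero sum is not special to the values 3, 8, and 4. It follows in one line from the definition of the sample mean, $\bar{X} = \sum X / n$:

```latex
\sum (X - \bar{X}) = \sum X - n\bar{X} = \sum X - n \cdot \frac{\sum X}{n} = 0
```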
THE MEDIAN
10
• Definition: The midpoint of the values after they
have been ordered from the smallest to the
largest, or the largest to the smallest. There are
as many values above the median as below it in
the data array.
• Note: For an odd number of values, the median
is the middle value in the ordered array.
• Note: For an even number of values, the median
is the arithmetic average of the two middle
values.
EXAMPLE
11
• Compute the median:
• The road life for a sample of five tires in miles is:
42,000 51,000 40,000 39,000 48,000
• Arranging the data in ascending order gives:
39,000 40,000 42,000 48,000 51,000. Thus
the median is 42,000 miles.
EXAMPLE
12
• Compute the median:
• The following values are years of service for a
sample of six store managers: 16 12 8 15 7
23.
• Arranging in order gives 7 8 12 15 16 23.
Thus the median is (12 + 15)/2 = 13.5 years.
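The odd and even cases above can be checked with a short Python sketch. It assumes the standard `statistics` module; the variable names are illustrative:

```python
from statistics import median

tire_miles = [42_000, 51_000, 40_000, 39_000, 48_000]  # odd count (n = 5)
years_of_service = [16, 12, 8, 15, 7, 23]              # even count (n = 6)

# median() sorts internally, so the data need not be pre-ordered.
print(median(tire_miles))        # 42000
print(median(years_of_service))  # 13.5
```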
PROPERTIES OF THE MEDIAN
13
• There is a unique median for each data set.
• It is not affected by extremely large or small
values and is therefore a valuable measure of
central tendency when such values occur.
• It can be computed for ratio-level, interval-level,
and ordinal-level data.
• It can be computed for an open-ended frequency
distribution if the median does not lie in an
open-ended class.
THE MODE
14
• Definition: The mode is the value of the
observation that appears most frequently.
• The exam scores for ten students are: 81 93 75
68 87 81 75 81 87. What is the modal exam
score?
• Since the score of 81 occurs most often, the
modal score is 81.
• The next slide shows the histogram with six
classes for the water consumption from our
previous class. Observe that the modal class is
the blue box with a midpoint of 15.
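For the exam-score example, the mode can be found by counting how often each value occurs. A minimal Python sketch using only the scores actually listed on the slide (nine values); the variable name is illustrative:

```python
from collections import Counter
from statistics import mode

exam_scores = [81, 93, 75, 68, 87, 81, 75, 81, 87]  # the scores listed above

# most_common(1) returns the most frequent value together with its count.
print(Counter(exam_scores).most_common(1))  # [(81, 3)]
print(mode(exam_scores))                    # 81
```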
WATER CONSUMPTION IN 1,000 GALLONS
15
THE WEIGHTED MEAN
16
• Definition: The weighted mean of a set of
numbers X1, X2, ..., Xn, with corresponding
weights w1, w2, ..., wn, is computed from the
following formula.
$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$$
or
$$\bar{X}_w = \frac{\sum (w X)}{\sum w}$$
• Why would anyone want to weight observations?
EXAMPLE
17
• During a one-hour period on a busy Friday
night, fifty soft drinks were sold at the Kruzin
Cafe. Compute the weighted mean of the price
of the soft drinks. (Price ($), Number sold):
(0.50, 5), (0.75, 15), (0.90, 15), and (1.10, 15).
• The weighted mean is
[0.50(5) + 0.75(15) + 0.90(15) + 1.10(15)]/[5 + 15 + 15 + 15]
= $43.75/50 = $0.875
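The same computation as a short Python sketch, with each price weighted by the number of drinks sold at that price. The variable names are illustrative:

```python
# (price in dollars, number sold) pairs from the example
sales = [(0.50, 5), (0.75, 15), (0.90, 15), (1.10, 15)]

total_revenue = sum(price * count for price, count in sales)  # sum(w * X)
total_drinks = sum(count for _, count in sales)               # sum(w)
weighted_mean = total_revenue / total_drinks

print(round(total_revenue, 2), total_drinks, round(weighted_mean, 3))  # 43.75 50 0.875
```

An unweighted average of the four prices would ignore that only five drinks were sold at the lowest price.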
THE GEOMETRIC MEAN
18
• Definition: The geometric mean (GM) of a set of
n numbers is defined as the nth root of the
product of the n numbers. The formula for the
geometric mean is given by:
$$GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)}$$
• One main use of the geometric mean is to
average percents.
• How do you compute the nth root of a number?
EXAMPLE
19
• The profits earned by ABC Construction on
three projects were 6, 3, and 2 percent
respectively. Compute the geometric mean
profit and the arithmetic mean and compare.
• The geometric mean is $GM = \sqrt[3]{(6)(3)(2)} = 3.3019$.
• The arithmetic mean profit = (6 + 3 + 2)/3 =
3.6667.
• The geometric mean of 3.3019 gives a more
conservative profit figure than the arithmetic
mean of 3.6667. This is because the GM is not
heavily weighted by the profit of 6 percent.
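A minimal Python sketch of this comparison. It assumes Python 3.8 or later, where `statistics.geometric_mean` is available; the nth root can also be taken directly as a fractional power, as the last line shows:

```python
from statistics import geometric_mean, mean

profits = [6, 3, 2]  # percent profit on the three projects

gm = geometric_mean(profits)  # cube root of 6 * 3 * 2 = 36
am = mean(profits)            # (6 + 3 + 2) / 3

print(round(gm, 4))             # 3.3019
print(round(am, 4))             # 3.6667
print(round(36 ** (1 / 3), 4))  # 3.3019, the nth root as a power of 1/n
```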
THE GEOMETRIC MEAN (continued)
20
• The other main use of the geometric mean is to
determine the average percent increase in sales,
production or other business or economic series
from one time period to another.
• The formula for the geometric mean as applied
to this type of problem is:
$$GM = \sqrt[n-1]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$
• Where did this come from?
EXAMPLE
21
• The total enrollment at a large university
increased from 18,246 in 1985 to 22,840 in 1995.
Compute the geometric mean rate of increase
over the period.
• Here n = 10, so n - 1 = 9 (the number of periods).
• The geometric mean rate of increase is given by
$$GM = \sqrt[9]{\frac{22{,}840}{18{,}246}} - 1 = 0.0253$$
• That is, the geometric mean rate of increase is
2.53%.
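The enrollment calculation as a Python sketch. The number of periods is taken as 9, exactly as stated on the slide; the variable names are illustrative:

```python
start_enrollment = 18_246  # 1985
end_enrollment = 22_840    # 1995
periods = 9                # n - 1, as stated on the slide

# (n - 1)th root of the end/beginning ratio, minus 1
rate = (end_enrollment / start_enrollment) ** (1 / periods) - 1
print(f"{rate:.2%}")  # 2.53%
```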
EXAMPLE
22
• Do you prefer I use the arithmetic mean or the
geometric mean to compute class score
averages?
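THE MEAN OF GROUPED DATA
23
• The mean of a sample of data organized in a
frequency distribution is computed by the
following formula:
$$\bar{X} = \frac{\sum X f}{\sum f} = \frac{\sum X f}{n}$$
where $\bar{X}$ (X-bar) is the sample mean, $X$ is the
class midpoint, $f$ is the class frequency, and
$\sum f = n$ (the sum of the frequencies) is the
sample size.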
EXAMPLE
24
• A sample of twenty appliance stores in a large
metropolitan area revealed the following number
of VCRs sold last week. Compute the mean
number sold. The formula and computation are
shown below.
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{n} = \frac{325}{20} = 16.25 \text{ VCRs}$$
EXAMPLE (continued)
25
• The table also gives the necessary computations.
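SYMMETRIC DISTRIBUTION
26
[Figure: symmetric distribution with zero skewness.]
Mode = Median = Mean
RIGHT SKEWED DISTRIBUTION
27
[Figure: positively skewed distribution; from left to right: mode, median, mean.]
Positively skewed: the mean and median
are to the RIGHT of the mode.
LEFT SKEWED DISTRIBUTION
28
[Figure: negatively skewed distribution; from left to right: mean, median, mode.]
Negatively skewed: the mean and median
are to the LEFT of the mode.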
USEFUL RELATIONSHIPS
If two averages of a moderately skewed
frequency distribution are known, the third
can be approximated. The formulas are:
Mode = Mean - 3(Mean - Median)
Mean = [3(Median) - Mode]/2
Median = [2(Mean) + Mode]/3
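The three formulas are rearrangements of one another, which a short Python sketch can confirm. The function names and the input values (mean 50, median 48) are hypothetical and chosen only for illustration:

```python
def approx_mode(mean, median):
    """Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

def approx_mean(median, mode):
    """Mean = [3(Median) - Mode] / 2."""
    return (3 * median - mode) / 2

def approx_median(mean, mode):
    """Median = [2(Mean) + Mode] / 3."""
    return (2 * mean + mode) / 3

# Hypothetical values for a moderately right-skewed distribution.
mean_, median_ = 50.0, 48.0
mode_ = approx_mode(mean_, median_)

print(mode_)                        # 44.0
print(approx_mean(median_, mode_))  # 50.0
print(approx_median(mean_, mode_))  # 48.0
```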
29
READING ASSIGNMENT
• Read Chapters 4 and 5 of the text.
James S. Hawkes
30
TODAY’S GOALS
31
• TO COMPARE (COMPUTE) VARIOUS MEASURES
OF DISPERSION FOR GROUPED AND
UNGROUPED DATA.
• TO EXPLAIN THE CHARACTERISTICS, USES,
ADVANTAGES, AND DISADVANTAGES OF EACH
MEASURE OF DISPERSION.
• TO EXPLAIN CHEBYSHEV’S THEOREM AND THE
EMPIRICAL (NORMAL) RULE.
• TO COMPUTE THE COEFFICIENTS OF
VARIATION AND SKEWNESS.
• TO DEMONSTRATE COMPUTING STATISTICS WITH
EXCEL.
MEASURES OF DISPERSION: UNGROUPED DATA
32
• Range: For ungrouped data, the range is the
difference between the highest and lowest values
in a set of data. To compute the range, use the
following formula.
RANGE = HIGHEST VALUE - LOWEST VALUE
• EXAMPLE: A sample of five recent accounting
graduates revealed the following starting salaries
(in $1,000): 17 26 18 20 19. The range is thus
$26,000 - $17,000 = $9,000.
MEAN DEVIATION
33
• Mean Deviation: The arithmetic mean of the
absolute values of the deviations from the
arithmetic mean. It is computed by the formula
below:
$$MD = \frac{\sum |X - \bar{X}|}{n}$$
where $X$ is an individual value, $\bar{X}$ is the
arithmetic mean, and $n$ is the sample size.
EXAMPLE
34
The weights of a sample of crates ready for
shipment to France are (in kg) 103, 97, 101, 106,
and 103.
1. $\bar{X}$ = 510/5 = 102 kg.
|103 - 102| + |97 - 102| + |101 - 102| +
|106 - 102| + |103 - 102| = 1 + 5 + 1 + 4 + 1 = 12
2. MD = 12/5 = 2.4 kg.
3. Typically, the weights of the crates
are 2.4 kg from the mean weight of 102 kg.
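The three steps of the crate example can be sketched in Python as follows. The variable names are illustrative:

```python
from statistics import mean

weights_kg = [103, 97, 101, 106, 103]

x_bar = mean(weights_kg)                                # 510 / 5 = 102
abs_deviations = [abs(x - x_bar) for x in weights_kg]   # [1, 5, 1, 4, 1]
mean_deviation = sum(abs_deviations) / len(weights_kg)  # 12 / 5

print(x_bar, sum(abs_deviations), mean_deviation)  # 102 12 2.4
```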
POPULATION VARIANCE
35
• Population Variance: The population variance
for ungrouped data is the arithmetic mean of the
squared deviations from the population mean. It
is computed from the formula below:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where $\sigma^2$ (sigma squared) is the population
variance, $X$ is an individual value, $\mu$ is the
population mean, and $N$ is the population size.
EXAMPLE
36
• The ages of all the patients in the isolation ward
of Yellowstone Hospital are 38, 26, 13, 41, and 22
years. What is the population variance? The
computations are given below.
$\mu = (\sum X)/N = 140/5 = 28$.
$\sigma^2 = \sum (X - \mu)^2/N = 534/5 = 106.8$.
ALTERNATIVE FORMULA FOR THE
POPULATION VARIANCE
37
$$\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$
Verify, using above formula, that the population
variance is 106.8 for the previous example.
Why would you use this formula?
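A Python sketch of the verification, computing the variance both from the definition and from the alternative formula for the ages in the previous example. One plausible reason to prefer the alternative form is that it needs only the running totals of X and X squared, so the deviations never have to be listed; the variable names are illustrative:

```python
import math

ages = [38, 26, 13, 41, 22]  # the whole population, so divide by N
N = len(ages)

mu = sum(ages) / N                                      # 140 / 5 = 28
var_definition = sum((x - mu) ** 2 for x in ages) / N   # 534 / 5
var_shortcut = sum(x ** 2 for x in ages) / N - mu ** 2  # 4454 / 5 - 28**2

print(var_definition)                              # 106.8
print(math.isclose(var_definition, var_shortcut))  # True
```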
THE POPULATION STANDARD
DEVIATION
38
• Population Standard Deviation: The population
standard deviation ($\sigma$) is the square root of the
population variance.
• For the previous example, the population
standard deviation is $\sigma$ = 10.3344 (square root of
106.8).
• Note: If you are given the population standard
deviation, just square that number to get the
population variance.
SAMPLE VARIANCE
39
• Sample Variance: The formula for the sample
variance for ungrouped data is:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
OR
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$
where $s^2$ is the sample variance.
This sample variance is used to estimate the
population variance.
EXAMPLE
40
• A sample of five hourly wages for blue-collar
jobs is: 17 26 18 20 19. Find the variance.
$\bar{X}$ = 100/5 = 20
$s^2$ = 50/(5 - 1) = 12.5.
SAMPLE STANDARD DEVIATION
41
• Sample Standard Deviation: The sample
standard deviation (s) is the square root of the
sample variance.
• For the previous example, the sample standard
deviation is s = 3.5355 (square root of 12.5).
• Note: If you are given the sample standard
deviation, just square that number to get the
sample variance.
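The wage example can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1 (the population versions are `pvariance` and `pstdev`). A minimal sketch; the variable names are illustrative:

```python
from statistics import stdev, variance

wages = [17, 26, 18, 20, 19]

s2 = variance(wages)  # sum of squared deviations / (n - 1) = 50 / 4
s = stdev(wages)      # square root of the sample variance

print(s2)           # 12.5
print(round(s, 4))  # 3.5355
```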
INTERPRETATION AND USES OF THE
STANDARD DEVIATION
42
• Chebyshev’s theorem: For any set of
observations (sample or population), the
proportion of the values that lie within
k standard deviations of the mean is at least
$1 - 1/k^2$, where k is any constant greater than 1.
• Empirical Rule: For any symmetrical, bell-shaped
distribution, approximately 68% of the
observations will lie within $\pm 1\sigma$ of the mean ($\mu$);
approximately 95% within $\pm 2\sigma$ of the mean ($\mu$);
and approximately 99.7% within $\pm 3\sigma$ of the
mean ($\mu$).
[Figure: bell-shaped curve showing the relationship between $\sigma$ and $\mu$.]
Between:
1. $\mu \pm 1\sigma$: 68.26%
2. $\mu \pm 2\sigma$: 95.44%
3. $\mu \pm 3\sigma$: 99.73%
43
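To compare the two rules, the sketch below evaluates Chebyshev's lower bound next to the exact proportion for a normal distribution. It assumes Python 3.8 or later for `statistics.NormalDist`; the function names are illustrative:

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Minimum proportion within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 4), round(normal_within(k), 4))
# 2 0.75 0.9545
# 3 0.8889 0.9973
```

Chebyshev's bound is lower because it must hold for every distribution, not only bell-shaped ones.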
INTERQUARTILE RANGE
44
• Interquartile range: Distance between the third
quartile Q3 and the first quartile Q1.
• First Quartile: It is the value corresponding to
the point below which 25% of the observations
lie in an ordered data set.
• Third Quartile: It is the value corresponding to
the point below which 75% of the observations
lie in an ordered data set.
Interquartile range
= Third quartile - First quartile
= $Q_3 - Q_1$
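A minimal Python sketch of the interquartile range, reusing the years-of-service sample from the earlier median example. It assumes Python 3.8 or later for `statistics.quantiles`. Interpolation conventions for quartiles differ between textbooks and software, so Q1 and Q3 from this function may not match a hand calculation or Excel's QUARTILE exactly:

```python
from statistics import quantiles

years_of_service = [16, 12, 8, 15, 7, 23]  # sample from the median example

# quantiles() sorts the data; n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(years_of_service, n=4)
iqr = q3 - q1  # interquartile range = Q3 - Q1

print(q2)  # 13.5, the median found earlier
print(q1, q3, iqr)
```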
PERCENTILE RANGE
45
• Percentiles: Each data set has 99 percentiles,
thus dividing the set into 100 equal parts. Note:
in order to determine percentiles, you must first
order the set.
• Percentile Range: The 10-to-90 percentile range
is the distance between the 10th and 90th
percentiles.
[Figure: 10-to-90 percentile range. 10% of the values lie between Min and P10, 80% between P10 and P90, and 10% between P90 and Max.]
RELATIVE DISPERSION
46
• Coefficient of Variation: The ratio of the
standard deviation to the arithmetic mean,
expressed as a percentage.
$$CV = \frac{s}{\bar{X}} (100\%)$$
• For example, if the CVs for the yields of two
different stocks are 10% and 25%, the stock with
the larger CV has more variation relative to the
mean yield. That is, the yield for this stock is not
as stable as the other.
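As an illustration of the formula, the sketch below computes the CV for the hourly wage sample used in the sample variance example (mean 20, standard deviation 3.5355). The variable names are illustrative:

```python
from statistics import mean, stdev

wages = [17, 26, 18, 20, 19]  # sample from the variance example

cv = stdev(wages) / mean(wages) * 100  # s / X-bar, as a percentage
print(round(cv, 2))  # 17.68
```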
SKEWNESS
47
• Skewness: Measurement of the lack of symmetry
of the distribution.
• The coefficient of skewness is computed from the
following formula:
Sk = 3(Mean - Median)/(Standard deviation)
Note: There are other coefficients of skewness.
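A short Python sketch applying this formula to the hourly wage sample from the sample variance example. The single high wage of 26 pulls the mean above the median, so the coefficient comes out positive; the variable names are illustrative:

```python
from statistics import mean, median, stdev

wages = [17, 26, 18, 20, 19]

# Sk = 3(Mean - Median) / (Standard deviation)
sk = 3 * (mean(wages) - median(wages)) / stdev(wages)
print(round(sk, 4))  # 0.8485
```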
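SYMMETRIC DISTRIBUTION
48
[Figure: symmetric distribution with zero skewness.]
Mode = Median = Mean
RIGHT SKEWED DISTRIBUTION
49
[Figure: positively skewed distribution; from left to right: mode, median, mean.]
Positively skewed: the mean and median
are to the RIGHT of the mode.
LEFT SKEWED DISTRIBUTION
50
[Figure: negatively skewed distribution; from left to right: mean, median, mode.]
Negatively skewed: the mean and median
are to the LEFT of the mode.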
EXCEL FUNCTIONS
51
=AVERAGE(A1:A10)                  Arithmetic Mean
=MEDIAN(A1:A10)                   Median Value
=MODE(A1:A10)                     Modal Value
=GEOMEAN(A1:A10)                  Geometric Mean
=QUARTILE(A1:A10,Q)               Quartile Q Value
=MAX(A1:A10)-MIN(A1:A10)          Range
=PERCENTILE(A1:A10,P)             Percentile P Value
MORE EXCEL FUNCTIONS
52
=AVEDEV(A1:A10)                   Mean Absolute Deviation (MAD)
=VAR(A1:A10)                      Sample Variance
=STDEV(A1:A10)                    Sample Standard Deviation
=VARP(A1:A10)                     Population Variance
=STDEVP(A1:A10)                   Population Standard Deviation
=STDEV(A1:A10)/AVERAGE(A1:A10)    Coefficient of Variation
=SKEW(A1:A10)                     Coefficient of Skewness (not same as book)