STA 291 Fall 2009

Lecture 6
Dustin Lueker

Statistics that describe variability
◦ Two distributions may have the same mean and/or median but different variability
 Mean and median only describe a typical value, not the spread of the data
◦ Range
◦ Variance
◦ Standard Deviation
◦ Interquartile Range
 All of these can be computed for the sample or the population
STA 291 Fall 2009 Lecture 6

Range: the difference between the largest and smallest
observation
◦ Very much affected by outliers
 A misrecorded observation may lead to an outlier, and
affect the range

The range does not always reveal different
variation about the mean

Sample 1
◦ Smallest Observation: 112
◦ Largest Observation: 797
◦ Range =

Sample 2
◦ Smallest Observation: 15033
◦ Largest Observation: 16125
◦ Range =
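The two blanks above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `sample_range` is made up for illustration and is not part of the lecture.

```python
def sample_range(smallest, largest):
    """Range = largest observation minus smallest observation."""
    return largest - smallest

# Values taken from the two samples on this slide
range_1 = sample_range(112, 797)        # Sample 1
range_2 = sample_range(15033, 16125)    # Sample 2

print("Sample 1 range:", range_1)
print("Sample 2 range:", range_2)
```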

The deviation of the ith observation $x_i$ from
the sample mean $\bar{x}$ is the difference between
them, $(x_i - \bar{x})$
◦ Sum of all deviations is zero
◦ Therefore, we use either the sum of the absolute
deviations or the sum of the squared deviations as
a measure of variation
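A short sketch of why the plain sum of deviations is not useful as a measure of variation. The data set (1, 3, 4, 7, 10) is borrowed from the worked table in this lecture; the variable names are illustrative assumptions.

```python
data = [1, 3, 4, 7, 10]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Positive and negative deviations cancel, so their sum is zero
sum_dev = sum(deviations)

# Absolute values or squares remove the sign, so these sums do measure spread
sum_abs = sum(abs(d) for d in deviations)
sum_sq = sum(d ** 2 for d in deviations)

print("sum of deviations:", sum_dev)
print("sum of absolute deviations:", sum_abs)
print("sum of squared deviations:", sum_sq)
```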

Variance of n observations is the sum of the
squared deviations, divided by n-1
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Observation | Mean | Deviation | Squared Deviation
1           |      |           |
3           |      |           |
4           |      |           |
7           |      |           |
10          |      |           |
Sum of the Squared Deviations:
n-1:
Sum of the Squared Deviations / (n-1):
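One way to fill in the table is to compute each column in turn. The sketch below follows the definition of the sample variance given earlier and cross-checks the result against Python's standard `statistics.variance`, which also divides by n-1. Variable names are illustrative.

```python
import statistics

data = [1, 3, 4, 7, 10]
n = len(data)
mean = sum(data) / n

# One row per observation: (observation, mean, deviation, squared deviation)
rows = [(x, mean, x - mean, (x - mean) ** 2) for x in data]
for row in rows:
    print(row)

sum_sq = sum(r[3] for r in rows)   # sum of the squared deviations
variance = sum_sq / (n - 1)        # sample variance s^2

print("sum of squared deviations:", sum_sq)
print("n-1:", n - 1)
print("variance:", variance)

# Cross-check with the standard library's sample variance
check = statistics.variance(data)
```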

Variance is approximately the average of the squared deviations
◦ “average squared distance from the mean”

Unit
◦ Square of the unit for the original data
 Difficult to interpret
◦ Solution
 Take the square root of the variance, and the unit is
the same as for the original data
 Standard Deviation
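As a sketch of the last step, the standard deviation is the square root of the sample variance, which brings the unit back to that of the original data. The data set is the one from the worked table earlier in the lecture; `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics

data = [1, 3, 4, 7, 10]
mean = sum(data) / len(data)

# Sample variance: squared units of the original data
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation: same unit as the original data
s = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", round(s, 4))

# Cross-check with the standard library's sample standard deviation
check = statistics.stdev(data)
```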

s ≥ 0
◦ s = 0 only when all observations are the same

If data is collected for the whole population
instead of a sample, then n-1 is replaced by n

s is sensitive to outliers
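A quick illustration of two of these properties. The data sets are made up for the example: the "misrecorded" value 100 in place of 10 is a hypothetical outlier, not something from the lecture.

```python
import statistics

# s = 0 when all observations are the same
s_constant = statistics.stdev([5, 5, 5, 5])

# Sensitivity to outliers: change one observation from 10 to 100
s_clean = statistics.stdev([1, 3, 4, 7, 10])
s_outlier = statistics.stdev([1, 3, 4, 7, 100])   # hypothetical misrecorded value

print("all observations equal:", s_constant)
print("without outlier:", round(s_clean, 2))
print("with outlier:", round(s_outlier, 2))
```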

Sample
◦ Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
◦ Standard Deviation
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Population
◦ Variance
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
◦ Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
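The difference between the two versions is only the divisor. As a sketch, Python's standard `statistics` module provides both: `variance`/`stdev` divide by n-1, while `pvariance`/`pstdev` divide by N. Treating the five numbers below as an entire population is an assumption made purely for comparison.

```python
import statistics

data = [1, 3, 4, 7, 10]

# Sample versions: divide by n - 1
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Population versions: divide by N (assumes data is the whole population)
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print("sample variance:", s2, " sample sd:", round(s, 4))
print("population variance:", sigma2, " population sd:", round(sigma, 4))
```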

Population mean and population standard deviation
are denoted by the Greek letters μ (mu) and σ
(sigma)
◦ They are unknown constants that we would like to estimate

Sample mean and sample standard deviation are
denoted by $\bar{x}$ and $s$
◦ They are random variables, because their values vary
according to the random sample that has been selected

If the data is approximately symmetric and
bell-shaped then
◦ About 68% of the observations are within one
standard deviation from the mean
◦ About 95% of the observations are within two
standard deviations from the mean
◦ About 99.7% of the observations are within three
standard deviations from the mean
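The three percentages above can be compared with an exact normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); using a perfectly normal curve is an assumption, since real data is only approximately bell-shaped.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def coverage(k):
    """Probability of falling within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

coverage_1 = coverage(1)
coverage_2 = coverage(2)
coverage_3 = coverage(3)

for k, p in [(1, coverage_1), (2, coverage_2), (3, coverage_3)]:
    print(f"within {k} standard deviation(s): {p:.4f}")
```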

SAT scores are scaled so that they have an
approximate bell-shaped distribution with a
mean of 500 and standard deviation of 100
◦ About 68% of the scores are between ___ and ___
◦ About 95% of the scores are between ___ and ___
◦ If you have a score above 700, you are in the top ___%
 What percentile would this be?
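One way to work through this exercise is to apply the 68/95/99.7 percentages directly. This is a sketch under the assumption that the distribution is symmetric, so the 5% of scores outside two standard deviations splits evenly between the two tails; variable names are illustrative.

```python
mean, sd = 500, 100

# About 68% of scores: within one standard deviation of the mean
within_1 = (mean - sd, mean + sd)

# About 95% of scores: within two standard deviations of the mean
within_2 = (mean - 2 * sd, mean + 2 * sd)

# A score of 700 is two standard deviations above the mean.
# By symmetry, half of the remaining 5% lies above it.
top_percent = (100 - 95) / 2
percentile = 100 - top_percent

print("68% interval:", within_1)
print("95% interval:", within_2)
print("top percent above 700:", top_percent)
print("percentile of a 700 score:", percentile)
```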

According to the National Association of Home
Builders, the U.S. nationwide median selling price
of homes sold in 1995 was $118,000
◦ Would you expect the mean to be larger, smaller, or equal
to $118,000?
◦ Which of the following is the most plausible value for the
standard deviation?
(a) –15,000, (b) 1,000, (c) 45,000, (d) 1,000,000

Coefficient of variation (CV): a standardized measure of variation
◦ Idea
 A standard deviation of 10 may indicate great
variability or small variability, depending on the
magnitude of the observations in the data set

CV = ratio of the standard deviation to the
mean
◦ Population and sample version

Which sample has higher relative variability?
(a higher coefficient of variation)
◦ Sample A
 mean = 62
 standard deviation = 12
 CV =
◦ Sample B
 mean = 31
 standard deviation = 7
 CV =
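The comparison can be sketched in a few lines, using the definition of CV as standard deviation divided by mean. The helper name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(mean, sd):
    """CV = standard deviation divided by the mean."""
    return sd / mean

cv_a = coefficient_of_variation(mean=62, sd=12)   # Sample A
cv_b = coefficient_of_variation(mean=31, sd=7)    # Sample B

print(f"CV of Sample A: {cv_a:.3f}")
print(f"CV of Sample B: {cv_b:.3f}")

# The sample with the larger CV has the higher relative variability
higher = "A" if cv_a > cv_b else "B"
print("Higher relative variability: Sample", higher)
```

Sample A has the larger standard deviation in absolute terms, but relative to its mean Sample B varies more.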