Section 9.3
When Things Aren’t Normal
Center, Spread, and Shape
Center: goal is to estimate population mean
What is usually true about the population
mean, μ?
μ is not known, otherwise we would not
have to estimate μ
Center, Spread, and Shape
Spread: What is usually true about the
population standard deviation, σ?
σ is usually unknown. What do we do?
Substitute s for σ and use critical values
of t* instead of critical values of z*
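To see what switching from z* to t* changes, the following Python sketch compares the two critical values at 95% confidence for several sample sizes. It is an illustration only, not part of the original slides: the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and t* is found by numerically integrating the t density with the standard library (a statistics package or a t table would normally supply it).

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)  # about 1.96
for n in (5, 15, 40, 400):
    print(f"n = {n:3d}: t* = {t_critical(0.95, n - 1):.3f}, z* = {z_star:.3f}")
```

t* is always larger than z*, and the gap is widest for small n, so intervals built with t* are noticeably wider when the sample is small.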
Center, Spread, and Shape
Shape: Because t-procedures are robust
(not very sensitive) to departures from
normality, you can usually get away with
less than a normally distributed population.
You still have to check a condition about
shape.
Why Check Conditions?
If the sample size is small and
if the underlying population is highly skewed
rather than normal or has extreme outliers,
then:
1) Capture rate for interval of the form
$\bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}}$
might be substantially lower
than the advertised capture rate
and
Why Check Conditions?
2) A significance test based on the normal
distribution will falsely reject H0
substantially more often than the
advertised rate (for example, 5%)
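The drop in capture rate can be checked by simulation. The sketch below is an illustration under stated assumptions, not the textbook's own demonstration: it draws many samples of size 5 from a normal population and from a lognormal population (chosen here as one example of strong right skew), builds the 95% interval $\bar{x} \pm t^{*} \cdot s/\sqrt{n}$ each time, and counts how often the true mean is captured. The function name `capture_rate`, the seed, and the number of trials are arbitrary choices.

```python
import math
import random

T_STAR = 2.776   # 95% critical value for df = 4 (n = 5), from a t table
N = 5
TRIALS = 20000

def capture_rate(draw, true_mean, rng):
    """Fraction of intervals x_bar ± t* s / sqrt(n) that contain true_mean."""
    hits = 0
    for _ in range(TRIALS):
        sample = [draw(rng) for _ in range(N)]
        x_bar = sum(sample) / N
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))
        margin = T_STAR * s / math.sqrt(N)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / TRIALS

rng = random.Random(1)
normal_rate = capture_rate(lambda r: r.gauss(0.0, 1.0), 0.0, rng)
# Lognormal population: strongly right-skewed, with mean exp(0.5).
skewed_rate = capture_rate(lambda r: r.lognormvariate(0.0, 1.0),
                           math.exp(0.5), rng)
print(f"normal population: {normal_rate:.3f}")
print(f"skewed population: {skewed_rate:.3f}")
```

With the normal population the observed rate should sit close to the advertised 95%; with the skewed population it falls visibly short.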
Before you conduct any tests or construct
any confidence intervals, always plot
your data to see their shape.
If plot looks like data came from normal
population, then do not worry about shape.
If plot shows any major deviations from
normal shape, you may try another
approach.
Try a Transformation
Skewed distributions almost always can be
made more nearly symmetric by
transforming them to a new scale.
If a change of scale makes data look
roughly normal, again you do not need to
worry about shape.
Outliers may now look like “part of the herd”
Common Transformations
1) For distribution skewed right, try log
transformation
2) When data are ratios, try reciprocal
transformation
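A quick way to see what a log transformation does to right-skewed data is to compare a skewness measure before and after. The sketch below uses made-up numbers (not the textbook's brain weights) and a hypothetical helper `skewness` that averages cubed z-scores; values near 0 indicate a roughly symmetric distribution.

```python
import math

def skewness(values):
    """Sample skewness: average cubed z-score (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

# Hypothetical right-skewed measurements (illustrative only).
raw = [1, 2, 3, 5, 8, 13, 40, 120, 400, 1300]
logged = [math.log(v) for v in raw]  # natural logarithm of each value

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

A number like this does not replace the plot; it only summarizes what the plot should already show.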
Brain Weights for a Selection of 68 Species of Animals (Page 603)
Always Plot Data!!
Log Transformation (logarithms of brain weights)
Log Transformation (distribution of sample means)
Sample means for 100 samples of size 5 from the logarithms of brain weights
Reciprocal Transformation
Whenever data come in the form of a ratio,
think about what would happen if you
invert the ratio.
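As a small illustration of inverting a ratio, the sketch below takes hypothetical fuel-economy figures in miles per gallon and converts them to gallons per mile. The data are invented for the example, and comparing mean with median is just one rough symmetry check (the two nearly coincide when a distribution is symmetric).

```python
from statistics import mean, median

# Hypothetical fuel-economy ratios in miles per gallon (illustrative only).
miles_per_gallon = [10, 12, 15, 20, 30, 60]
# Inverting each ratio gives gallons per mile.
gallons_per_mile = [1 / x for x in miles_per_gallon]

for label, values in [("miles per gallon", miles_per_gallon),
                      ("gallons per mile", gallons_per_mile)]:
    # For a symmetric distribution the mean and median nearly coincide.
    print(f"{label}: mean = {mean(values):.4f}, median = {median(values):.4f}")
```

On the original scale the mean is pulled well above the median by the large values; after inverting, the two agree.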
Always Plot Your Data!!!
Page 606
Outliers
If changing the scale does not take care of
outliers, then do two analyses:
• one with all the data
• one without the outliers
Then what?
Outliers
If both analyses yield same conclusion, then
you are in good shape.
What if you get a “split decision”?
Get more data!
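The two-analysis check described above can be sketched in Python. Everything specific here is an assumption made for illustration: the data, the suspected outlier (14.9), and the hypothesized mean `mu_0` are invented, and the t* values are copied from a t table for the two sample sizes involved. The sketch uses the fact that a two-sided 5% test rejects a hypothesized mean exactly when it lies outside the 95% confidence interval.

```python
import math

# Two-sided 95% critical values from a t table, keyed by degrees of freedom.
T_STAR_95 = {10: 2.228, 11: 2.201}

def t_interval(values):
    """95% confidence interval x_bar ± t* s / sqrt(n) for the mean."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in values) / (n - 1))
    margin = T_STAR_95[n - 1] * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample with one suspected outlier (14.9).
data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.0, 6.2, 6.4, 14.9]
without_outlier = [v for v in data if v != 14.9]
mu_0 = 6.0  # hypothesized mean being tested (also hypothetical)

results = {}
for label, values in [("all data", data), ("without outlier", without_outlier)]:
    lo, hi = t_interval(values)
    results[label] = lo <= mu_0 <= hi
    print(f"{label}: ({lo:.2f}, {hi:.2f}), contains {mu_0}: {results[label]}")

# A split decision means the two analyses disagree about mu_0.
split_decision = results["all data"] != results["without outlier"]
print("split decision:", split_decision)
```

In this made-up sample the single outlier inflates both the mean and the standard deviation, so the two analyses disagree, which is the situation where more data are needed.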
15/40 Guideline
Worst cases are small samples with extreme
skewness or extreme outliers.
To be safe in using the t-procedure, you can
rely on the 15/40 guideline.
15/40 Guideline
First, plot your data.
Modified boxplot helps identify shape and
outliers.
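A modified boxplot marks as outliers any values more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to a hypothetical sample. The function name `flag_outliers` is invented, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile convention used by a calculator or textbook.

```python
from statistics import quantiles

def flag_outliers(values):
    """Return values beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile convention is an assumption
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical sample with one unusually large value.
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.0, 6.2, 6.4, 14.9]
print(flag_outliers(sample))
```

Here only 14.9 falls outside the fences.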
15/40 Guideline
If your random sample looks like it
reasonably could have come from a
normally distributed population, then you
can proceed with t-procedures for
confidence intervals and significance
tests.
15/40 Guideline
If you suspect data did not come from a
normally distributed population, follow
15/40 guideline.
If sample size is less than 15:
Be very careful. Your data or transformed
data must look as if they came from
normally distributed population . . . little
skewness, no outliers.
15/40 Guideline
If sample size between 15 and 40:
Proceed with caution.
If you do not transform the data or if outliers
remain even after a change of scale, do
two analyses of test or confidence interval,
one with and one without outliers.
Do not rely on any conclusions that
depend on whether or not the outliers
are included.
15/40 Guideline
If sample size is at least 40:
You are in good shape.
Skewness will not reduce capture rates nor
alter significance levels enough to matter.
If outliers are present, you should still do
two analyses.
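The guideline can be summarized as a small decision function. This is only one reading of the slides, written as a sketch: the function name `guideline_15_40`, its two yes/no inputs, and the wording of the returned advice are assumptions, and the judgment about whether a plot "looks normal" still has to be made by eye.

```python
def guideline_15_40(n, looks_normal, has_outliers):
    """Advice from the 15/40 guideline for using t-procedures.

    looks_normal: the plot of the data (or transformed data) shows
                  little skewness.
    has_outliers: outliers remain, even after any change of scale.
    """
    if looks_normal and not has_outliers:
        return "Proceed with t-procedures."
    if n < 15:
        return ("Be very careful: data (or transformed data) must look "
                "normal, with no outliers.")
    advice = "Proceed with caution." if n < 40 else "In good shape despite skewness."
    if has_outliers:
        advice += " Do two analyses, with and without the outliers."
    return advice

print(guideline_15_40(10, looks_normal=False, has_outliers=False))
print(guideline_15_40(20, looks_normal=False, has_outliers=True))
print(guideline_15_40(100, looks_normal=False, has_outliers=False))
```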
15/40 Guideline
Page 608
Page 608, D12
a) Sample of size 10 is a small sample from
a population that may be strongly skewed
toward the higher priced houses.
Check a plot of the data for skewness and
outliers.
A transformation may take care of
skewness.
Page 608, D12
b) This is a large sample (n = 100) from a
population that may be strongly skewed
toward the higher prices.
Now you need not be so concerned about
skewness, but you should still look for
outliers that might affect the results.
Page 608, D12
c) The population of SAT scores is
generally quite normal in shape, so there
is little cause for concern here.
The t-procedure should work fine, so no
transformation is needed.
Page 608, D12
d) Waiting times are notoriously skewed.
A typical distribution of data of this type
would show many small to moderate
times, but a few very long ones.
With a sample size of 20, a transformation
would be necessary to bring the data into
the normal fold.
Page 611, E44
a) Yes, it is appropriate to construct a CI
without transforming the data.
The lengths of stay from the sample are
slightly skewed toward the larger values,
but the sample size of 396 is so large a
confidence interval based on t should work
fine.
Page 611, E44
b)
8: TInterval
Inpt: Data Stats
x̄: 2.91
sx: 1.58
n: 396
C-Level: .90
Calculate
(2.7791, 3.0409)
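The calculator result can be checked by hand from the summary statistics. The sketch below is an illustration rather than the calculator's method: because the standard library has no t quantile function, it approximates t* from z* with a series expansion in 1/df, which is adequate for df = 395 but should not be trusted for small samples. The helper name `t_critical_approx` is hypothetical.

```python
import math
from statistics import NormalDist

def t_critical_approx(confidence, df):
    """Approximate two-sided t* from z* via a series expansion in 1/df.

    Accurate for large df; not intended for small samples.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))

x_bar, s, n, confidence = 2.91, 1.58, 396, 0.90
t_star = t_critical_approx(confidence, n - 1)
margin = t_star * s / math.sqrt(n)  # t* times the standard error
lower, upper = x_bar - margin, x_bar + margin
print(f"t* = {t_star:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```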
Page 611, E44
c) No, we should not be more concerned
about constructing a CI without a
transformation if the sample size were 40
instead of nearly 400, because a sample of
40 still meets the 15/40 guideline.
For a sample size of 4, a transformation
should be used to see if the transformed
data look as if they came from a normally
distributed population.
Page 612, E47
a) No, the situation looks even worse.
When the original three outliers are
removed, still more outliers are created.
This result is typical of strongly skewed-right
data.
Page 612, E47
b)
All 68 species: (102.34, 686.66)
3 original outliers removed: (69.315, 229.49)
Page 612, E47
c) The center of the interval after removing
the outliers is much lower than the center
with all 68 brain weights (149.4 vs. 394.5).
The width of the second interval is also
much smaller.
Confidence intervals are apparently highly
variable when the distribution is highly
skewed.
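The comparison in part c) is simple arithmetic on the two intervals from part b): the center is the midpoint of the endpoints and the width is their difference. A short check in Python:

```python
# Confidence intervals reported in part b).
intervals = {
    "all 68 species": (102.34, 686.66),
    "3 original outliers removed": (69.315, 229.49),
}
for label, (lower, upper) in intervals.items():
    center = (lower + upper) / 2  # midpoint of the interval
    width = upper - lower         # full width, twice the margin of error
    print(f"{label}: center = {center:.1f}, width = {width:.1f}")
```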
Page 612, E47
d) (2.327, 3.627)
I am 95% confident that the mean of the
natural logarithms of animals’ brain weights
(weights in grams) is between 2.327 and 3.627.
Questions?