See 8.5 Transformations
If we find that our data don't look very normal, we have a few options.
1) Use a different distribution. For example, Weibull distributions are often used
to model lifetimes of machines.
2) There are "nonparametric methods" that don't rely on any assumption about
the underlying distribution. These methods sacrifice some power, so when a
suitable distribution can be assumed, you are better off using it.
3) Re-scale (transform) the data so that the normality assumption appears to
hold.
In this class, we will cover this transformation option. In many situations, a log
transformation works well.
For log-transformed data, it is useful to back-transform means to the original scale
so readers can compare the values as originally measured. This back-transformed
mean is the geometric mean.
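Example:

Y      X = log10(Y)
1      0
10     1
100    2

$\bar{Y} = 37$, $\bar{X} = 1$

The arithmetic mean is $\bar{Y} = 37$.
The geometric mean is $10^{\bar{X}} = 10^{1} = 10$.
For positive data it's always true that:
$$\text{geometric mean} \le \text{arithmetic mean}$$
with equality only when all the values are equal.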
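The arithmetic in the example can be checked with a few lines of Python. This is a sketch using only the standard library (`statistics.geometric_mean` needs Python 3.8 or later); it averages the base-10 logs and back-transforms that mean, which gives the geometric mean.

```python
import math
import statistics

y = [1, 10, 100]

# Arithmetic mean on the original scale
arith = statistics.mean(y)

# Mean on the log10 scale, then back-transformed with 10**x: the geometric mean
x = [math.log10(v) for v in y]
x_bar = statistics.mean(x)
geo = 10 ** x_bar

print("arithmetic mean:", arith)
print("mean of log10(Y):", x_bar)
print("geometric mean (back-transformed):", geo)
# The library routine works on the natural-log scale and gives the same value
print("statistics.geometric_mean:", statistics.geometric_mean(y))
```

The geometric mean is pulled toward the smaller values, which is why it sits well below the arithmetic mean here.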
A random variable, Y, such that log(Y) or ln(Y) is normal is called a lognormal
random variable.
If $\ln(Y) = X \sim N(\mu, \sigma^2)$, then
$$E(Y) = e^{\mu + \sigma^2/2} \qquad \text{(mean of } Y\text{)}$$
and $e^{\mu}$ is the population geometric mean of $Y$.
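A quick simulation makes the distinction between the two quantities concrete. This sketch assumes illustrative parameters mu = 1 and sigma = 0.5 for X = ln(Y) and uses the standard-library `random.lognormvariate`; the simulated mean should land near e^(mu + sigma^2/2), while the back-transformed mean of the logs should land near e^mu.

```python
import math
import random
import statistics

mu, sigma = 1.0, 0.5   # illustrative parameters: ln(Y) ~ N(mu, sigma^2)
random.seed(1)         # fixed seed so the run is reproducible
ys = [random.lognormvariate(mu, sigma) for _ in range(200_000)]

# Sample arithmetic mean vs. back-transformed mean of ln(Y)
sample_mean = statistics.fmean(ys)
sample_geo = math.exp(statistics.fmean(math.log(v) for v in ys))

theory_mean = math.exp(mu + sigma ** 2 / 2)   # E(Y)
theory_geo = math.exp(mu)                     # population geometric mean

print(f"mean: simulated {sample_mean:.3f}, theory {theory_mean:.3f}")
print(f"geometric mean: simulated {sample_geo:.3f}, theory {theory_geo:.3f}")
```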
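How do we find a confidence interval for the geometric mean?
Using the $X = \ln(Y)$ values, we find a 95% confidence interval $(\mu_L, \mu_U)$ for $\mu$:
$$\mu_L = \bar{X} - t\,\frac{s}{\sqrt{n}}, \qquad \mu_U = \bar{X} + t\,\frac{s}{\sqrt{n}}$$
where $s$ is the standard deviation of the $X$ values and $t$ is the 0.975 quantile of the $t$ distribution with $n - 1$ degrees of freedom.
$$P(\mu_L < \mu < \mu_U) = 0.95$$
$$P(e^{\mu_L} < e^{\mu} < e^{\mu_U}) = 0.95$$
Confidence Interval for Geometric Mean: $(e^{\mu_L}, e^{\mu_U})$

Comparing 2 groups with log transformation:
$$P\left(\bar{X}_1 - \bar{X}_2 - t\,SE_{\bar{X}_1 - \bar{X}_2} < \mu_1 - \mu_2 < \bar{X}_1 - \bar{X}_2 + t\,SE_{\bar{X}_1 - \bar{X}_2}\right) = 0.95$$
$$P\left(e^{\bar{X}_1 - \bar{X}_2 - t\,SE_{\bar{X}_1 - \bar{X}_2}} < \frac{e^{\mu_1}}{e^{\mu_2}} < e^{\bar{X}_1 - \bar{X}_2 + t\,SE_{\bar{X}_1 - \bar{X}_2}}\right) = 0.95$$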
Back-transforming a confidence interval for the difference in means on the log scale
gives a confidence interval for the ratio of the geometric means. When $\sigma_1^2 = \sigma_2^2$ on the ln scale,
the ratio of the population geometric means is also the ratio of the means, because
$E(Y_1)/E(Y_2) = e^{\mu_1 + \sigma^2/2} / e^{\mu_2 + \sigma^2/2} = e^{\mu_1}/e^{\mu_2}$. So the
back-transformed log-scale confidence interval is an interval for the ratio of the means
in the original scale.
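A minimal sketch of the two-group calculation, assuming a pooled-variance t interval on the ln scale. The function name `ratio_ci`, the example data, and the `t_crit` parameter are hypothetical; to stay within the standard library the t critical value is passed in from a table rather than computed (2.306 is the 0.975 quantile with 8 degrees of freedom, matching n1 = n2 = 5).

```python
import math
import statistics

def ratio_ci(group1, group2, t_crit):
    """CI for the ratio of geometric means GM1/GM2 (hypothetical helper).

    Uses a pooled-variance t interval for mu1 - mu2 on the ln scale, then
    back-transforms. t_crit is the t critical value with n1 + n2 - 2 df.
    """
    x1 = [math.log(v) for v in group1]
    x2 = [math.log(v) for v in group2]
    n1, n2 = len(x1), len(x2)
    diff = statistics.fmean(x1) - statistics.fmean(x2)
    # Pooled variance: assumes equal variances on the ln scale
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    lo, hi = diff - t_crit * se, diff + t_crit * se
    # Back-transform the estimate and both endpoints
    return math.exp(diff), (math.exp(lo), math.exp(hi))

# Illustrative data only
a = [12.1, 15.3, 9.8, 20.4, 13.7]
b = [6.2, 8.9, 5.1, 10.3, 7.4]
ratio, (lo, hi) = ratio_ci(a, b, t_crit=2.306)   # t_{0.975, 8} from a t table
print(f"ratio of geometric means {ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The interval is symmetric on the ln scale, so on the original scale it is symmetric in the multiplicative sense: lo * hi equals ratio squared.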
We just back-transform the endpoints of the confidence interval on the ln scale, giving $(e^{\mu_L}, e^{\mu_U})$.
Or we could find a confidence interval $(\mu_L, \mu_U)$ for $\log_{10} Y$ and then find $(10^{\mu_L}, 10^{\mu_U})$.
Note: This is a confidence interval for the geometric mean of Y, not the mean of
Y.
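The claim that the ln route and the log10 route give the same interval can be checked numerically. In this sketch `geo_mean_ci` and the data are hypothetical, and the t critical value (2.571, the 0.975 quantile with 5 degrees of freedom for n = 6) is supplied from a table.

```python
import math
import statistics

def geo_mean_ci(y, t_crit, base=math.e):
    """CI for the geometric mean (hypothetical helper).

    Builds a t interval for the mean of log_base(Y), then back-transforms
    the endpoints with base**x. t_crit is the t critical value with n - 1 df.
    """
    x = [math.log(v, base) for v in y]
    n = len(x)
    x_bar = statistics.fmean(x)
    half = t_crit * statistics.stdev(x) / math.sqrt(n)
    return base ** (x_bar - half), base ** (x_bar + half)

y = [3.1, 4.7, 2.2, 6.8, 3.9, 5.0]   # illustrative data, n = 6
t_975_df5 = 2.571                    # t_{0.975, 5} from a t table

ci_e = geo_mean_ci(y, t_975_df5)             # ln scale, back-transformed with e**x
ci_10 = geo_mean_ci(y, t_975_df5, base=10)   # log10 scale, back-transformed with 10**x
print(ci_e)
print(ci_10)
```

The two intervals agree because log10(Y) = ln(Y) / ln(10): the mean and standard deviation are both rescaled by the same constant, and raising 10 to the rescaled endpoints undoes it exactly.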
If $\ln(Y) \sim N(\mu, \sigma^2)$ and $\sigma^2$ is small, then the coefficient of variation, CV, for Y in
the original scale is approximately $\sigma \times 100\%$.
The coefficient of variation is defined as
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
so the CV measures relative variation.
For example, if ln(Y) has standard deviation 0.01, then the coefficient of variation
of Y is about $0.01 \times 100\% = 1\%$.
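The approximation comes from the exact lognormal result CV = sqrt(e^(sigma^2) - 1), which does not depend on mu; for small sigma, e^(sigma^2) is close to 1 + sigma^2, so the CV is close to sigma. The sketch below (with a hypothetical helper name) compares the two and shows the approximation degrading as sigma grows.

```python
import math

def lognormal_cv(sigma):
    """Exact CV of Y when ln(Y) ~ N(mu, sigma^2); mu cancels out."""
    # expm1 computes e**z - 1 accurately when z is small
    return math.sqrt(math.expm1(sigma ** 2))

for sigma in (0.01, 0.1, 0.5):
    print(f"sd of ln(Y) = {sigma}: exact CV = {100 * lognormal_cv(sigma):.3f}%, "
          f"approx = {100 * sigma:.3f}%")
```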
Standard deviations on the ln scale represent relative variation on the original scale.
So if populations have similar coefficients of variation, then ln(Y) or log(Y) will
have similar variances.
It's not uncommon for groups with larger values to have more variation than groups with smaller values. But often the groups have similar relative variation, that is, similar CVs. In these situations, transforming to a logarithmic scale can put us into
the usual ANOVA situation with equal variances.
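A small hypothetical example of this effect: group B below is group A multiplied by 10, so both groups have the same CV but very different standard deviations. On the ln scale the multiplication becomes a shift by ln(10), which changes the mean but leaves the spread untouched.

```python
import math
import statistics

# Hypothetical data: group B is group A scaled by 10, so both have the same CV
a = [8.0, 9.5, 10.0, 11.0, 12.5]
b = [10 * v for v in a]

for name, g in (("A", a), ("B", b)):
    sd = statistics.stdev(g)
    cv = sd / statistics.fmean(g)
    sd_log = statistics.stdev([math.log(v) for v in g])
    print(f"group {name}: sd = {sd:.3f}, CV = {cv:.3f}, sd of ln = {sd_log:.4f}")
```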
In other situations, different transformations help.

Sometimes $X = 1/Y$ helps: if $Y$ = time, then $X = 1/Y$ = rate.

Poisson: $Y$ = number of random events in time or space. A Poisson count has variance equal to its mean, so groups with larger counts vary more. $\sqrt{Y}$ makes variances nearly equal as long as $\mu_Y$ isn't too small.

Binomial: $Y$ = number of successes in $n$ independent trials. $\arcsin\sqrt{Y/n}$ (the arcsine of the square root of the observed proportion) makes variances nearly equal as long as the probability of success isn't too close to 0 or 1.
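Both variance-stabilizing claims can be checked exactly, without simulation, by summing over the probability mass function. In this sketch the helper names `poisson_var` and `binom_var` are hypothetical. For large counts the variance of sqrt(Y) approaches 1/4 regardless of the Poisson mean, and the variance of arcsin(sqrt(Y/n)) approaches 1/(4n) regardless of p, while the untransformed variances change with the mean or with p.

```python
import math

def poisson_var(mu, g):
    """Exact Var(g(Y)) for Y ~ Poisson(mu), summing the pmf over k = 0..k_max."""
    k_max = int(mu + 20 * math.sqrt(mu) + 20)   # probability beyond this is negligible
    pmf = [math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)) for k in range(k_max + 1)]
    mean = sum(p * g(k) for k, p in enumerate(pmf))
    return sum(p * (g(k) - mean) ** 2 for k, p in enumerate(pmf))

def binom_var(n, p, g):
    """Exact Var(g(Y)) for Y ~ Binomial(n, p), with 0 < p < 1."""
    log_c = math.lgamma(n + 1)
    pmf = [math.exp(log_c - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))
           for k in range(n + 1)]
    mean = sum(q * g(k) for k, q in enumerate(pmf))
    return sum(q * (g(k) - mean) ** 2 for k, q in enumerate(pmf))

for mu in (5, 20, 50):
    print(f"Poisson mu={mu}: Var(Y) = {poisson_var(mu, lambda y: y):.3f}, "
          f"Var(sqrt Y) = {poisson_var(mu, math.sqrt):.4f}")

n = 400
for p in (0.2, 0.5):
    raw = binom_var(n, p, lambda y: y / n)
    arc = binom_var(n, p, lambda y: math.asin(math.sqrt(y / n)))
    print(f"Binomial p={p}: Var(Y/n) = {raw:.6f}, "
          f"Var(arcsin sqrt(Y/n)) = {arc:.6f}, 1/(4n) = {1 / (4 * n):.6f}")
```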
$X^{-1} = \frac{1}{X}$ and $\sqrt{X} = X^{1/2}$ are special cases of power functions.
The ln(X) function is related to a power function by
$$\lim_{\lambda \to 0} \frac{X^{\lambda} - 1}{\lambda} = \ln(X)$$
The power transformations of the form
$$\frac{X^{\lambda} - 1}{\lambda}$$
(with $\ln(X)$ used at $\lambda = 0$) are called Box-Cox transformations.
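A minimal sketch of the Box-Cox family for a single positive value, using the usual convention that lambda = 0 means ln(x). The function name `box_cox` is hypothetical. Evaluating it at shrinking lambda shows the limit above numerically: the values approach ln(x).

```python
import math

def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x (hypothetical helper).

    (x**lam - 1) / lam for lam != 0, and ln(x) at lam == 0 (the limiting case).
    """
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

x = 5.0
for lam in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(lam, box_cox(x, lam))
print("ln(x) =", math.log(x))
```

Subtracting 1 and dividing by lambda does not change the shape of a power transformation; it only shifts and rescales X^lambda so the family varies smoothly into ln(X) at lambda = 0.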
SAS procedure TRANSREG and other software packages let you see how well
different power transformations, including a logarithmic transformation, produce
normally distributed data with equal variances.
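The idea behind choosing a power can be sketched by hand: for each lambda on a grid, transform every observation, fit a separate mean for each group, pool the residual sum of squares (the equal-variance ANOVA model), and keep the lambda that maximizes the Box-Cox profile log-likelihood. This is not a reproduction of PROC TRANSREG or any other package; the helper names, the grid from -2 to 2, and the simulated lognormal groups are all assumptions for illustration.

```python
import math
import random
import statistics

def box_cox(x, lam):
    """Box-Cox transform of a positive x; ln(x) at lam == 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def profile_loglik(groups, lam):
    """Box-Cox profile log-likelihood (up to a constant) for a one-way layout.

    Each group gets its own mean; residual variance is pooled across groups.
    The (lam - 1) * sum(ln x) term is the Jacobian of the transformation.
    """
    n_total = sum(len(g) for g in groups)
    rss = 0.0
    log_sum = 0.0
    for g in groups:
        z = [box_cox(v, lam) for v in g]
        m = statistics.fmean(z)
        rss += sum((v - m) ** 2 for v in z)
        log_sum += sum(math.log(v) for v in g)
    return -n_total / 2 * math.log(rss / n_total) + (lam - 1) * log_sum

def best_lambda(groups, grid=None):
    """Grid search for the lambda with the largest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]   # lambda from -2 to 2 in steps of 0.1
    return max(grid, key=lambda lam: profile_loglik(groups, lam))

# Simulated lognormal groups with a common sigma on the ln scale (illustrative only)
random.seed(2)
groups = [[random.lognormvariate(mu, 0.6) for _ in range(500)] for mu in (0.0, 1.0, 2.0)]
print(best_lambda(groups))   # expected to land near 0, i.e. the log transformation
```

In practice a value close to a simple power (such as 0, 0.5, or -1) is usually reported instead of the exact grid maximum, since the simple power is easier to interpret.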