See 8.5

Transformations

If we find that our data don't look very normal, we have a few options.

1) Use a different distribution. For example, Weibull distributions are often used to model lifetimes of machines.
2) Use "nonparametric methods" that don't rely on any assumption about the underlying distribution. These methods sacrifice some power. When a distribution can be used, you are better off doing so.
3) Re-scale (transform) the data so that the normality assumption appears to hold.

In this class, we will cover this transformation option.

In many situations, a log transformation works well. For log-transformed data it is useful to back-transform means to the original scale so readers can compare the values as originally measured. This back-transformed mean is the geometric mean.

Example:

    Y      X = log Y
    1      0
    10     1
    100    2

    $\bar{Y} = 37$,  $\bar{X} = 1$

The arithmetic mean is $\bar{Y} = 37$.
The geometric mean is $10^{\bar{X}} = 10^{1} = 10$.

It's always true that: geometric mean $\le$ arithmetic mean, with equality only when all the values are the same.

A random variable, Y, such that log(Y) or ln(Y) is normal is called a lognormal random variable.

If $\ln(Y) = X \sim N(\mu, \sigma^2)$, then the mean of Y is

    $E(Y) = e^{\mu + \sigma^2/2}$

and $e^{\mu}$ is the population geometric mean of Y.

Confidence Interval for Geometric Mean

How do we find a confidence interval for the geometric mean? Using the X = ln(Y) values, we find a 95% confidence interval for $\mu$, $(\hat{\mu}_L, \hat{\mu}_U)$, where s is the sample standard deviation of the X values and t is the t critical value with n - 1 degrees of freedom:

    $\hat{\mu}_L = \bar{X} - t\, s/\sqrt{n}$
    $\hat{\mu}_U = \bar{X} + t\, s/\sqrt{n}$

    $P(\hat{\mu}_L < \mu < \hat{\mu}_U) = 0.95$
    $P(e^{\hat{\mu}_L} < e^{\mu} < e^{\hat{\mu}_U}) = 0.95$

Comparing 2 groups with log transformation

    $P(\bar{X}_1 - \bar{X}_2 - t\, SE_{\bar{X}_1 - \bar{X}_2} < \mu_1 - \mu_2 < \bar{X}_1 - \bar{X}_2 + t\, SE_{\bar{X}_1 - \bar{X}_2}) = 0.95$
    $P(e^{\bar{X}_1 - \bar{X}_2 - t\, SE_{\bar{X}_1 - \bar{X}_2}} < e^{\mu_1}/e^{\mu_2} < e^{\bar{X}_1 - \bar{X}_2 + t\, SE_{\bar{X}_1 - \bar{X}_2}}) = 0.95$

Back-transforming a confidence interval for the difference in means in the log scale gives a confidence interval for the ratio of the geometric means. For $\sigma_1^2 = \sigma_2^2$, the ratio of the population geometric means is also the ratio of the means, so the back-transformed log-scale confidence interval is an interval for the ratio of the means in the original scale.
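The back-transformation steps above can be sketched in Python. This is a minimal illustration, not the course's own code: the function names are made up for this sketch, the t critical values are assumed to be looked up by the caller (4.303 for 2 degrees of freedom, 2.447 for 6), the two-group standard error is assumed to use a pooled standard deviation, and the two-group data are hypothetical.

```python
import math
import statistics


def geometric_mean_ci(y, t_crit):
    """Geometric mean of y and its back-transformed confidence interval.

    t_crit is the t critical value for n - 1 degrees of freedom; the
    caller supplies it (for example from a t table).
    """
    x = [math.log(v) for v in y]               # X = ln(Y)
    n = len(x)
    xbar = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(n)    # s / sqrt(n)
    lower = xbar - t_crit * se
    upper = xbar + t_crit * se
    # Back-transform the point estimate and both endpoints.
    return math.exp(xbar), math.exp(lower), math.exp(upper)


def geometric_mean_ratio_ci(y1, y2, t_crit):
    """Ratio of geometric means (group 1 / group 2) with a back-transformed CI.

    Assumes equal variances in the ln scale and uses a pooled standard
    deviation; t_crit is for n1 + n2 - 2 degrees of freedom.
    """
    x1 = [math.log(v) for v in y1]
    x2 = [math.log(v) for v in y2]
    n1, n2 = len(x1), len(x2)
    diff = statistics.mean(x1) - statistics.mean(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (math.exp(diff),
            math.exp(diff - t_crit * se),
            math.exp(diff + t_crit * se))


# The example from the notes: Y = 1, 10, 100.
y = [1, 10, 100]
arith_mean = statistics.mean(y)
geo_mean, geo_lo, geo_hi = geometric_mean_ci(y, t_crit=4.303)   # 2 df

# Hypothetical two-group data, made up purely for illustration.
group1 = [12.0, 15.0, 20.0, 30.0]
group2 = [5.0, 8.0, 9.0, 14.0]
ratio, ratio_lo, ratio_hi = geometric_mean_ratio_ci(
    group1, group2, t_crit=2.447)                               # 6 df

print("arithmetic mean:", arith_mean)
print("geometric mean and 95% CI:", geo_mean, geo_lo, geo_hi)
print("ratio of geometric means and 95% CI:", ratio, ratio_lo, ratio_hi)
```

The sketch uses natural logs throughout. Base-10 logs give the same geometric mean and the same back-transformed interval, as long as the back-transformation uses the same base as the transformation.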
To get the interval for the geometric mean, we just back-transform the endpoints of the confidence interval in the ln scale: $(e^{\hat{\mu}_L}, e^{\hat{\mu}_U})$. Or we could find a confidence interval for the mean of log Y and then find $(10^{\hat{\mu}_L}, 10^{\hat{\mu}_U})$.

Note: This is a confidence interval for the geometric mean of Y, not the mean of Y.

If $\ln(Y) \sim N(\mu, \sigma^2)$ and $\sigma^2$ is small, then the coefficient of variation, CV, for Y in the original scale is approximately $\sigma \times 100\%$.

The coefficient of variation is defined as

    $CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$ = relative variation

For example, if ln(Y) has standard deviation 0.01, then the coefficient of variation of Y is about 0.01 * 100 = 1%.

Standard deviations in the ln scale represent relative variation in the original scale. So if populations have similar coefficients of variation, then ln(Y) or log(Y) will have similar variances.

It's not uncommon for groups with bigger values to have more variation than treatment groups with smaller values. But often the groups have similar relative variation (similar CVs). In these situations, transforming to a logarithmic scale can put us into the usual ANOVA situation with equal variances.

In other situations, different transformations help.

$X = 1/Y$
    Sometimes Y = time, so X = 1/Y = rate.

Poisson
    Y = number of random events in time or space.
    $\sqrt{Y}$ makes variances nearly equal as long as $\mu_Y$ isn't too small.

Binomial
    Y = number of successes in n independent trials.
    $\arcsin\sqrt{Y/n}$ (the arcsine of the square root of the sample proportion) makes variances nearly equal as long as the probability of success isn't too close to 0 or 1.

$1/X = X^{-1}$ and $\sqrt{X} = X^{1/2}$ are special cases of power functions. The ln(X) function is related to a power function by

    $\lim_{\lambda \to 0} \frac{X^{\lambda} - 1}{\lambda} = \ln(X)$

Power transformations of the form $\frac{X^{\lambda} - 1}{\lambda}$ are called Box-Cox transformations. SAS procedure TRANSREG and other software packages allow you to see how different power transformations, including a logarithmic transformation, perform in giving us normally distributed data with equal variances.
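Two of the numerical claims above can be checked with a short Python sketch: that the CV of Y is close to the standard deviation of ln(Y) when that standard deviation is small, and that the Box-Cox transformation approaches ln(X) as the power goes to 0. The function names are made up for this sketch. The exact lognormal CV formula used here, $\sqrt{e^{\sigma^2} - 1}$, is a standard result that the notes do not state.

```python
import math


def lognormal_cv(sigma):
    """Exact CV of Y when ln(Y) ~ N(mu, sigma^2): sqrt(exp(sigma^2) - 1)."""
    return math.sqrt(math.expm1(sigma ** 2))


def box_cox(x, lam):
    """Box-Cox transform (x**lam - 1) / lam, defined as ln(x) at lam = 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive x")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


# CV of Y versus sigma of ln(Y): close when sigma is small.
for sigma in (0.01, 0.1, 0.5):
    print(f"sigma = {sigma}: CV = {lognormal_cv(sigma):.4f}")

# Box-Cox values approach ln(x) as lambda shrinks toward 0.
x = 5.0
for lam in (1.0, 0.5, 0.1, 0.001, 0.0):
    print(f"lambda = {lam}: {box_cox(x, lam):.4f}")
```

Subtracting 1 and dividing by the power does not change which power is applied, but it is what makes the family vary smoothly through the log transformation at a power of 0.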