Data transformation

The Aim
By the end of this lecture, students will be aware of the data transformation methods that make appropriate statistical tests possible, and will be able to transform data using SPSS.

The Goals
• Explain why data transformation is sometimes needed.
• List the effects of data transformation on the data.
• Explain how to transform data.
• Explain the typical data transformations and how they affect the data:
– Logarithmic transformation
– Square root
– 1/y
– Square
– Logit
• Be able to carry out the typical data transformations with SPSS.

• When we move on to data analysis and want to apply significance tests, we face the assumptions of the tests we will apply.
• If our data do not meet those assumptions, we cannot apply the relevant statistical analysis. Applying a data transformation so that the assumptions are met can be a solution.

• The reasons for applying a data transformation, and the conditions we aim for after applying it, are:
– Our data are not normally distributed, but many statistical tests assume a normal distribution.
– The spread of the data can differ greatly between groups, but some tests, such as the t-test for independent groups, assume equal variances.
– Two variables are not linearly related, but linearity is a necessary assumption in analyses such as regression.

How is data transformation done?
• If we want to transform a variable in our data set, we must apply the same mathematical operation to every value of that variable.
e.g. We want to work with the variable "age", but when we examine the data we see that it does not fit a normal distribution. Suppose our data set contains the "age" values of 100 people. We must apply the same operation (e.g. squaring) to every age value. The result is a new transformed variable (e.g. “YaşDön”).
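The rule of applying one operation to every value of a variable can be sketched outside SPSS as well. The following is a minimal Python illustration, not the lecture's own procedure; the ages and the names `transform` and `age_squared` are made up for the example.

```python
# Hypothetical ages; the lecture's 100-person data set is not reproduced here.
ages = [23, 35, 41, 58, 62]


def transform(values, func):
    """Apply the same function to every value and return a new list."""
    return [func(v) for v in values]


# The original variable is left untouched; the result is a new variable.
age_squared = transform(ages, lambda v: v ** 2)
print(age_squared)
```

Keeping the original list and storing the result under a new name mirrors what SPSS does when it creates a new target variable.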
• Transformed clinical variables are difficult to interpret (the square of age means little to a clinician), so after the analysis we need to back-transform when reporting the results (if we took the square, we now take the square root).
• Note that back-transformation can be a problem for some data: if we apply the square transformation to a variable that has negative values, taking the square root of the results will be misleading.

Transformations
- Logarithmic transformation
- Square root transformation
- Reciprocal transformation
- Square transformation
- Logit (logistic) transformation

Typical transformations

1. Logarithmic transformation, z = log y
• The logarithmic transformation can use base 10 (log10, or log) or base e (loge, or ln).
• Note that we cannot take the logarithm of zero or of negative numbers.
• The back-transformation of the logarithm is called the antilog.
e.g. log10(100) = 2 and antilog(2) = 10² = 100
• If y is skewed to the right, z = log y usually has an approximately normal distribution. In that case y is said to have a lognormal distribution.
• If there is an exponential relationship between y and x, so that plotting y (vertical axis) against x (horizontal axis) gives an upward-curving line, then the relationship between z = log y and x is approximately linear.
• Suppose we measure a continuous variable y (e.g. height) in different groups, and the variance is larger in the groups where y is larger. In particular, if the coefficient of variation (standard deviation / mean) is the same in every group, the variances of z = log y will be equal across groups.

Figure: Effects of the logarithmic transformation: (a) normalizing, (b) linearizing, (c) variance stabilizing.
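The antilog example and the linearizing effect can be checked with a few lines of Python. This is only an illustrative sketch: the exponential series `ys` is invented for the example and is not taken from the lecture's data.

```python
import math

# Log transformation and its back-transformation (antilog), base 10.
y = 100
z = math.log10(y)   # logarithm of 100
back = 10 ** z      # antilog returns the original value

# Hypothetical exponential relationship: y doubles with each unit of x.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]
zs = [math.log10(v) for v in ys]

# On the log scale the step between neighbouring points is constant,
# which means z = log y is a straight line in x.
steps = [zs[i + 1] - zs[i] for i in range(len(zs) - 1)]
print(z, back, steps)
```

Calling `math.log10` on zero or a negative number raises `ValueError`, which matches the restriction stated above.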
• Because it is easy to interpret and medical data are generally skewed to the right, the log transformation is often used in medicine.
e.g.
• As we shall see later, agreement with the normal distribution is an important assumption of many hypothesis tests. When we examine the "Weight" variable in the diyabet.sav data set, the histogram shows that the tail of the bell curve extends to the right (right skewed).

[Histogram of Weight: x-axis Weight (50.0 to 150.0), y-axis Count (0 to 75)]

• We can also show with a skewness analysis or the Kolmogorov-Smirnov test that the "Weight" variable is not normally distributed:
Analyze > Descriptive Statistics > Frequencies [move the "Weight" variable to the Variables box, click the Statistics button and tick the Skewness box] > Continue > OK.
The following output is produced:

Statistics: Weight
N Valid                    424
N Missing                  6
Skewness                   1.329
Std. Error of Skewness     0.119

Since the skewness of Weight (1.329) is positive and more than twice its standard error (0.119), we can say that our data are skewed to the right.

• Analyze > Nonparametric Tests > 1-Sample K-S [move the "Weight" variable to the Variables box] > Continue > OK.
The following output is produced:

One-Sample Kolmogorov-Smirnov Test: Weight
N                                           424
Normal Parameters(a,b)     Mean             74.266
                           Std. Deviation   15.1381
Most Extreme Differences   Absolute         0.094
                           Positive         0.094
                           Negative         -0.045
Kolmogorov-Smirnov Z                        1.926
Asymp. Sig. (2-tailed)                      0.001

• Let us create a new variable by taking the logarithm of the "Weight" variable:
– Transform > Compute Variable > [type "LogWeight" in the "Target Variable" field and "LG10(weight)" in the "Numeric Expression" field] > OK
• A new variable named "LogWeight" will appear. Let us look at the histogram of this variable:
– Graphs > Interactive > Histogram [drag the "LogWeight" variable to the X axis, then click the "Histogram" tab at the top and tick the "Normal curve" box] > OK

Statistics: LogWeight
N Valid                    424
N Missing                  6
Skewness                   0.248
Std. Error of Skewness     0.119

• As can be seen, the bell curve has become symmetrical.
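The skewness check before and after the log transformation can be sketched in Python. The weights below are hypothetical, since diyabet.sav is not available here, and the formulas are the adjusted sample skewness and its usual standard error, which are assumed (not confirmed by the lecture) to be what SPSS reports. For n = 424 the standard-error formula does give 0.119, the value shown in the output above.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_std_error(n):
    """Standard error of the sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical right-skewed weights (kg), for illustration only.
weights = [52, 55, 58, 60, 62, 64, 66, 70, 75, 82, 95, 120]
log_weights = [math.log10(w) for w in weights]

skew_raw = skewness(weights)
skew_log = skewness(log_weights)
se = skewness_std_error(len(weights))

# Rule of thumb from the lecture: skewness above twice its standard error.
print(skew_raw, skew_log, se, skew_raw > 2 * se)
```

The comparison on the last line is the same rule of thumb applied to the SPSS output: a positive skewness larger than twice its standard error suggests a right-skewed variable.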
[Histogram of LogWeight: x-axis LogWeight (1.60 to 2.20), y-axis Count]

• When we look at the skewness of the new variable, it is close to twice its standard error.
• When analyzed with the Kolmogorov-Smirnov test:

One-Sample Kolmogorov-Smirnov Test: lgweight
N                                           424
Normal Parameters(a,b)     Mean             1.86
                           Std. Deviation   0.084
Most Extreme Differences   Absolute         0.053
                           Positive         0.053
                           Negative         -0.038
Kolmogorov-Smirnov Z                        1.091
Asymp. Sig. (2-tailed)                      0.185

• The p value (0.185) is larger than 0.05, so there is no evidence against normality and we can treat the transformed variable as normally distributed.
• We must check normality in our statistical analyses and report the result in our article, but we should note that most parametric tests (as will be seen later) tolerate slight deviations from normality.

2. Square root transformation, z = √y
• The properties of this transformation are similar to those of the log transformation, although the results may be harder to interpret after back-transformation. Besides normalizing and linearizing, it also stabilizes the variance when the variance increases as y increases (variance / mean is constant).
• The square root transformation is usually used for Poisson-type variables. Note that the square root of a negative number cannot be taken.

3. Reciprocal transformation, z = 1/y
• Unless special survival-analysis techniques are used, the reciprocal transformation is often applied to survival times.
• Its effects are similar to those of the log transformation. Besides normalizing and linearizing, it is more effective than the log transformation at stabilizing the variance when the variance increases very markedly with increasing values of y, i.e. when the variance divided by the fourth power of the mean is constant.
• Keep in mind that the reciprocal of zero cannot be taken.

4. Square transformation, z = y²
- The square transformation does the opposite of the log transformation.
- If y is skewed to the left, z = y² is often approximately normally distributed.
- If the relationship between two variables, x and y, is such that plotting y against x produces a line curving downward, then the relationship between z = y² and x is approximately linear.
- If the variance of a continuous variable y tends to decrease as the value of y increases, then the square transformation, z = y², stabilizes the variance.

Figure: Effects of the square transformation: (a) normalizing, (b) linearizing, (c) variance stabilizing.

5. Logit (logistic) transformation, z = ln(p / (1 - p))
• This is the transformation we apply most often to each proportion, p, in a set of proportions. We cannot take the logit transformation if p = 0 or p = 1.
• It linearizes a sigmoid curve.

Figure: Effect of the logit transformation on a sigmoid curve.

Summary
Transformations
- Logarithmic transformation
- Square root transformation
- Reciprocal transformation
- Square transformation
- Logit (logistic) transformation
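The five transformations in the summary, together with the values each one cannot accept, can be collected in a short Python sketch. The function names are illustrative only and do not correspond to SPSS functions; the logit is written as the natural log of p / (1 - p), with its standard inverse used for back-transformation.

```python
import math


def log_transform(y):
    if y <= 0:
        raise ValueError("the logarithm of zero or a negative number is undefined")
    return math.log10(y)


def sqrt_transform(y):
    if y < 0:
        raise ValueError("the square root of a negative number cannot be taken")
    return math.sqrt(y)


def reciprocal_transform(y):
    if y == 0:
        raise ValueError("the reciprocal of zero cannot be taken")
    return 1 / y


def square_transform(y):
    return y ** 2


def logit_transform(p):
    if not 0 < p < 1:
        raise ValueError("the logit is undefined for p = 0 or p = 1")
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Back-transformation of the logit."""
    return 1 / (1 + math.exp(-z))


p = 0.8
z = logit_transform(p)
p_back = inverse_logit(z)
print(z, p_back)
```

Raising an error for values outside the valid range is one possible design choice; it makes the restrictions listed for each transformation explicit instead of silently producing missing values.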