Working with Data Part 7 QMM 384
Transforming Variables
• Some of the variables in Equity are highly skewed.
• Transforming variables makes it easier to see patterns in the data, and some data mining modeling techniques perform better when data are not skewed.
• A common approach is to use the Log function to transform skewed variables into variables that are more normally distributed.
• The new transformed variable, and not the original variable, could be used in any modeling effort.
• To complete this process, perform the following steps.
• Recall that LOAN was highly skewed.
• To transform this variable, from the data table window, right click on the LOAN column heading and select New Formula Column > Transform > Log.
• A new column is created called Log[LOAN].
• Next choose Analyze > Distribution, add LOAN and Log[LOAN] to Y, Columns, and then click OK.
• The graphs show that LOAN is highly skewed and Log[LOAN] is more normally distributed.
• Log[LOAN] can now be used instead of LOAN.
Binning Continuous Data
• Binning is used when continuous variables have many missing and/or zero values.
• Select DEROG and DELINQ within Cols > Column Viewer > Show Summary, and then for the same variables choose Analyze > Distribution.
• See that DEROG has 708 missing cases and a significant number of 0 responses.
• See that DELINQ has 580 missing cases and a significant number of 0 responses.
• We can create a formula to bin the values into categories, with an extra category for missing values.
• We can create a new variable with three categories: None (0 responses), 1 or More, and Missing.
• To accomplish this for DEROG:
– Right click in a blank column heading in the data table and select New Column. Enter DEROG Binned as the Column Name and then click OK.
– Right click on the variable name DEROG Binned in the data table and select Formula.
– Select DEROG from the Table Columns list and from the Functions list select Conditional and If.
– Click on DEROG in the If statement formula and from the Functions list select Comparison and a <= b.
– Type 0 in the highlighted box and “None” in the then clause box and then press Enter. (Make sure you use the quotes.)
– With “None” highlighted, click the ^ symbol 4 times. You must use the ^ symbol in the formula window, not Shift+6 on the keyboard.
– Click on the first new expr argument and select DEROG and from the Functions list select Comparison and a > b.
– Type 0 in the highlighted box and “1 or More” in the then clause box and then press Enter.
– Click on the second new expr argument and select DEROG and from the Functions list select Comparison and Is Missing.
– Type “Missing” in the then clause box and then press Enter.
– Click on the else clause at the end of the formula and press Delete.
– Click Apply to view the results in the data table and then OK to accept.
– Notice that if DEROG was 0, then DEROG Binned is None; if DEROG was any number greater than 0, then DEROG Binned is 1 or More; and if DEROG was blank, then DEROG Binned is Missing.
– DEROG Binned can now be used to build a model.
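For readers working outside JMP, the log transform of LOAN described under Transforming Variables can be sketched in Python with pandas and NumPy. This is only an illustration, not the course's method: the LOAN values below are made up (the Equity table is not reproduced here), and the natural log is assumed to be what the Log transform computes. The column name Log[LOAN] is taken from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical LOAN amounts; the real Equity data is not reproduced here.
equity = pd.DataFrame(
    {"LOAN": [1100, 1500, 2000, 5000, 9000, 15000, 25000, 89900]}
)

# Natural log of each value (an assumption about the Log transform),
# stored in a new column named as in the slides.
equity["Log[LOAN]"] = np.log(equity["LOAN"])

# Sample skewness: values near 0 indicate a roughly symmetric distribution.
raw_skew = equity["LOAN"].skew()
log_skew = equity["Log[LOAN]"].skew()
print(f"skew(LOAN) = {raw_skew:.2f}, skew(Log[LOAN]) = {log_skew:.2f}")
```

Comparing the two skewness figures plays the same role as comparing the two histograms in Analyze > Distribution: the transformed column should show a value much closer to zero.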
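The DEROG binning formula built above can also be sketched in Python with pandas and NumPy, again as an illustration under stated assumptions rather than the JMP formula itself. The DEROG values are hypothetical, NaN stands in for a blank cell, and np.select is a swapped-in stand-in for the If formula with its three clauses.

```python
import numpy as np
import pandas as pd

# Hypothetical DEROG values; NaN stands in for a blank (missing) cell.
equity = pd.DataFrame({"DEROG": [0, 0, 3, np.nan, 1, np.nan, 0]})

derog = equity["DEROG"]

# Conditions are checked in order, like the clauses of the If formula.
# Comparisons against NaN are False, so missing rows reach the last clause.
conditions = [derog <= 0, derog > 0, derog.isna()]
labels = ["None", "1 or More", "Missing"]

# default="" is never used here because the three clauses cover every row.
equity["DEROG Binned"] = np.select(conditions, labels, default="")

print(equity["DEROG Binned"].value_counts())
```

The same three conditions applied to DELINQ would produce a DELINQ Binned column in the same way.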