Working with Data
Part 7
QMM 384
Transforming Variables
• Some of the variables in Equity are highly skewed.
• Transforming variables makes it easier to see
patterns in the data, and some data mining
modeling techniques perform better when data are
not skewed.
• A common approach is to use the Log function to
transform skewed variables into variables that are
more normally distributed.
• The new transformed variable, and not the original
variable, could be used in any modeling effort.
• To complete this process, perform the following
steps.
Transforming Variables
• Recall that LOAN was highly skewed.
• To transform this variable, from the data table
window, right click on the LOAN column
heading and select New Formula Column >
Transform > Log.
• A new column is created called Log[LOAN].
• Next, choose Analyze > Distribution, add
LOAN and Log[LOAN] to Y, Columns, and then
click OK.
• The graphs show that LOAN is highly skewed
and Log[LOAN] is more normally distributed.
• Log[LOAN] can now be used instead of LOAN.
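For readers who want to check the effect of the transform outside the menus, the following is a minimal sketch in Python, not part of the original slides. The loan amounts are hypothetical illustration values, not rows from the Equity table, and the natural log is assumed to correspond to the Log transform. The helper `skewness` is an illustrative name that computes the simple moment-based skewness.

```python
import math


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by std**3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed loan amounts (assumption, not from Equity).
loan = [5000, 7000, 9000, 12000, 15000, 20000, 30000, 60000, 90000]

# Natural log of each value, assumed to mirror the Log[LOAN] column.
log_loan = [math.log(v) for v in loan]

print(f"skewness of LOAN:      {skewness(loan):.2f}")
print(f"skewness of Log[LOAN]: {skewness(log_loan):.2f}")
```

A skewness closer to 0 for the logged values is the numeric counterpart of the more symmetric histogram seen in the Distribution output.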
Binning Continuous Data
• Binning is used when continuous variables have
many missing and/or zero values.
• Select DEROG and DELINQ within Cols >
Column Viewer > Show Summary and then for
the same variables choose Analyze >
Distribution.
• See that DEROG has 708 missing cases and a
significant number of 0 responses.
• See that DELINQ has 580 missing cases and a
significant number of 0 responses.
• We can create a formula to bin the values into
categories, with an extra category for missing
values.
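The check for missing and zero values can also be sketched in code. The example below is a rough Python illustration under stated assumptions: missing entries are represented as `None`, the sample list is made up, and `summarize` is a hypothetical helper name rather than anything provided by the software used in these slides.

```python
def summarize(values):
    """Count missing (None), zero, and positive entries in a column."""
    missing = sum(1 for v in values if v is None)
    zeros = sum(1 for v in values if v is not None and v == 0)
    positive = sum(1 for v in values if v is not None and v > 0)
    return {"missing": missing, "zero": zeros, "positive": positive}


# Hypothetical DEROG-like values (assumption, not the real column).
derog = [0, 0, None, 2, 0, None, 1, 0]

print(summarize(derog))
```

When the missing and zero counts dominate, as they do for DEROG and DELINQ, a small set of categories often carries most of the usable information.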
Binning Continuous Data
• We can create a new variable with three
categories: None (0 responses), 1 or More, and
Missing.
• To accomplish this for DEROG:
– Right click on a blank column heading in the data
table and select New Column. Enter DEROG
Binned as the Column Name and then click OK.
– Right click on the variable name DEROG Binned
in the data table and select Formula.
– Select DEROG from the Table Columns list and
from the Functions list select Conditional and If.
– Click on DEROG in the If statement formula and
from the Functions list select Comparison and a
<= b.
Binning Continuous Data
• DEROG Binning Example Continued
– Type 0 in the highlighted box and “None” in the then
clause box, and then press Enter. (Make sure you use
the quotes.)
– With “None” highlighted, click the ^ symbol 4 times.
You must use the symbol in the window, not Shift+6
on the keyboard.
– Click on the first new expr argument and select
DEROG and from the Functions list select
Comparison and a > b.
– Type 0 in the highlighted box and “1 or More” in the
then clause box, and then press Enter.
– Click on the second new expr argument and select
DEROG and from the Functions list select
Comparison and Is Missing.
Binning Continuous Data
• DEROG Binning Example Continued
– Type “Missing” in the then clause box and then press
Enter.
– Click on the else clause at the end of the formula and
press Delete.
– Click Apply to view the results in the data table and
then OK to accept.
– Notice that if DEROG was 0, then DEROG Binned is
None; if DEROG was any number greater than 0,
then DEROG Binned is 1 or More; and if DEROG was
blank, then DEROG Binned is Missing.
– DEROG Binned can now be used to build a model.
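The three-way rule built in the steps above can be expressed compactly in code. This is a minimal Python sketch under stated assumptions, not the formula itself: missing values are represented as `None` or NaN, the sample values are hypothetical, and `is_missing` and `bin_count` are illustrative names. Unlike the formula, the missing check comes first here, because comparing `None` with a number raises an error in Python.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def bin_count(value):
    """Map a count to one of three categories: None, 1 or More, Missing."""
    if is_missing(value):
        return "Missing"
    if value <= 0:  # same comparison as the a <= b step, with b = 0
        return "None"
    return "1 or More"  # any value greater than 0


# Hypothetical DEROG-like values (assumption, not the real column).
derog = [0, 3, None, 1, 0, float("nan")]
derog_binned = [bin_count(v) for v in derog]

print(derog_binned)
```

The same function could be applied to DELINQ, since the slides describe the same pattern of missing and zero values for that variable.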