Working with Data
Part 4
QMM 384
Exploring Data Two Variables at a Time
• In this section, we will explore two variables at a time
using Tabulate and Fit Y by X.
• From the data table window, select Analyze > Tabulate.
• For those familiar with Pivot Tables in Excel, the
Tabulate tool produces similar information.
• Try putting two continuous variables in the rows and
columns. Why does JMP not allow this?
• Note that we cannot compute statistics like mean,
standard deviation, etc. if the variables in the rows and
columns are both nominal.
• We can create a table with BAD in the columns and
REASON and JOB in the rows.
• Use Analyze > Tabulate to create this table.
– Drag and drop BAD to the Drop zone for columns.
– Drag and drop REASON to the Drop zone for rows
(directly to the left of 4771).
– Next, drag and drop JOB to the location directly below
HomeImp.
– When you do this correctly, a blue box will appear;
that is where to drop JOB.
– Next, drag and drop N and Row % to the results area.
• We see more loans for consolidating debt than for home
improvement.
• Sales and Self have a higher percentage of bad
loans than other categories.
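For readers who want to see what N and Row % mean outside of JMP, the same kind of table can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical toy values, not the course data set, and the function name `tabulate` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical toy records (REASON, BAD); not the course data set.
records = [
    ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 1), ("DebtCon", 0),
    ("HomeImp", 0), ("HomeImp", 1),
]

def tabulate(rows):
    """Return {row_label: {col_label: (n, row_pct)}} for two nominal columns."""
    cell_counts = Counter(rows)                   # N for each (row, column) cell
    row_totals = Counter(row for row, _ in rows)  # N for each row
    table = {}
    for (row, col), n in cell_counts.items():
        # Row % is the cell count as a share of its row total.
        table.setdefault(row, {})[col] = (n, 100.0 * n / row_totals[row])
    return table

table = tabulate(records)
for row in sorted(table):
    for col in sorted(table[row]):
        n, pct = table[row][col]
        print(f"{row:8s} BAD={col}  N={n}  Row %={pct:.1f}")
```

Row % divides each cell by its row total, so the percentages within one REASON row add up to 100.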
• Use Tabulate to create a table where BAD is in
the columns and all of the continuous variables
are in the rows.
• Compute the Mean and the number of missing
cases. (Dropping Mean on Sum replaces the
sum values with the mean values.)
• What is the mean for CLAGE for all of the cases
where BAD is 0 and how many missing cases
are there for YOJ when BAD is 1?
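The two statistics requested here (a group mean that skips missing values, and a count of missing cases) can be sketched in Python as follows. The rows are hypothetical toy values, so the numbers printed are not the answers to the exercise; `mean_and_missing` is a helper name invented for this illustration.

```python
from statistics import mean

# Hypothetical toy rows: (BAD, CLAGE, YOJ); None marks a missing value.
rows = [
    (0, 120.0, 5.0),
    (0, 180.0, None),
    (1, 90.0, None),
    (1, None, 2.0),
    (1, 60.0, None),
]

def mean_and_missing(rows, group, value_index):
    """Mean of the non-missing values and count of missing ones for one BAD group."""
    values = [r[value_index] for r in rows if r[0] == group]
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    return (mean(present) if present else None), n_missing

clage_mean_bad0, _ = mean_and_missing(rows, group=0, value_index=1)
_, yoj_missing_bad1 = mean_and_missing(rows, group=1, value_index=2)
print("Mean CLAGE when BAD = 0:", clage_mean_bad0)
print("Missing YOJ when BAD = 1:", yoj_missing_bad1)
```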
• Fit Y by X allows us to “statistically” analyze
the relationship between two variables.
• Choose Analyze > Fit Y by X.
• The icons on the bottom left indicate what type
of statistical analysis can be performed based
on the variable types for the X and Y variables.
• Make BAD the Y, Response variable and LOAN
and JOB the X, Factor variables.
• Since BAD is nominal and LOAN is continuous,
logistic regression can be used.
• The line represents the probability that BAD = 0.
Since the logistic curve for LOAN slopes upward,
the probability of low risk (BAD = 0) increases as
the loan amount increases.
• We will not discuss the statistical results since
they will be covered in great detail later in the
class.
• Since BAD is nominal and JOB is nominal,
contingency analysis can be used.
• A mosaic plot is displayed along with statistical
output.
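To make the logistic plot for LOAN more concrete, here is a minimal Python sketch that fits the probability that BAD = 0 as a function of loan amount. It is an illustration under stated assumptions: the data are hypothetical toy values, the fit uses plain gradient ascent rather than whatever estimation routine JMP uses, and the names `fit_logistic` and `prob_good` are invented for this example.

```python
import math

# Hypothetical toy data, not the course data set:
# loan amounts in $1000s and the matching BAD flags.
loans = [5, 8, 10, 12, 15, 18, 20, 25, 30, 35]
bad = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, steps=5000, rate=0.1):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    z = [(v - mu) / sd for v in x]  # standardize x so a fixed step size behaves well
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = sum(yi - sigmoid(b0 + b1 * zi) for zi, yi in zip(z, y))
        g1 = sum((yi - sigmoid(b0 + b1 * zi)) * zi for zi, yi in zip(z, y))
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    # Convert the coefficients back to the original LOAN scale.
    return b0 - b1 * mu / sd, b1 / sd

# Model the probability that BAD = 0, as in the plot described above.
good = [1 - b for b in bad]
intercept, slope = fit_logistic(loans, good)

def prob_good(loan):
    return sigmoid(intercept + slope * loan)

print(f"slope = {slope:.3f}")
print(f"P(BAD=0 | LOAN=5)  = {prob_good(5):.2f}")
print(f"P(BAD=0 | LOAN=30) = {prob_good(30):.2f}")
```

A positive slope means the fitted probability of BAD = 0 rises with the loan amount, which is the pattern the logistic plot shows.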
• The width of the bars on the X axis shows the
frequency of the job categories relative to one
another.
• BAD = 0 appears on the bottom and BAD = 1
appears on the top.
• There are very few customers in Sales or Self
Employed.
• The Other category has the most customers.
• Low-risk customers are more common among Office
Workers and Professionals or Executives.
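The statistical output that accompanies a mosaic plot usually includes a chi-square test of whether the two nominal variables are related. As a rough sketch, the Pearson chi-square statistic can be computed by hand in Python. The counts below are hypothetical, only two job categories are shown, and `chi_square` is a name made up for this illustration.

```python
# Hypothetical counts: rows are JOB categories, columns are BAD = 0 and BAD = 1.
counts = {
    "Office": [80, 20],
    "Sales": [10, 10],
}

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for r, row_total in zip(rows, row_totals):
        for observed, col_total in zip(r, col_totals):
            # Expected count if JOB and BAD were unrelated.
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square(counts)
print(f"chi-square = {stat:.2f} with {df} degree(s) of freedom")
```

A large statistic relative to its degrees of freedom suggests that the share of bad loans differs across job categories.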
• Now start a new analysis.
• Select Analyze > Fit Y by X.
• Let LOAN be your Y, Response variable and JOB
be the X, Factor.
• The appropriate model is a one-way analysis of
variance.
• ANOVA tests whether the mean loan amount is the
same across all job types, or whether at least one
job type has a different mean loan amount.
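The idea behind the one-way ANOVA test can be sketched in Python: compare the variation between the group means with the variation within the groups. The loan amounts below are hypothetical toy values, and `one_way_anova` is a helper name invented for this illustration. It returns only the F statistic and its degrees of freedom, not the p-value.

```python
from statistics import mean

# Hypothetical loan amounts (in $1000s) grouped by JOB; not the course data set.
groups = {
    "Office": [10, 12, 14],
    "Sales": [20, 22, 24],
    "Other": [15, 17, 19],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    samples = list(groups.values())
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova(groups)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would explain, which is what a "significant" ANOVA result reflects.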
• Select Means/ANOVA.
• If your p-value for the ANOVA test is “significant,” then at
least one job type has a different mean loan amount.
• So… which mean is different? Select Compare
Means > Tukey’s HSD (Honestly Significant Difference) test.
• Which means are significantly different from the others?
Which means are not significantly different?
• Try some different multiple comparison methods. What
do you find?
• Try the Analysis of Means test. What do you see? Try
changing the alpha value for the test. What changes?
How does the value of alpha affect your analysis?
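One way to think about the alpha question is to count how many pairwise comparisons there are and see how quickly the chance of at least one false alarm grows when each pair is tested separately at level alpha. The Python sketch below does this under two assumptions that are not from the slides: six job categories, and independent tests (a simplification). Multiple comparison methods such as Tukey's HSD adjust for this effect.

```python
from math import comb

def familywise_error(k_groups, alpha):
    """Number of pairwise comparisons and the chance of at least one false
    positive if each pair were tested independently at level alpha."""
    m = comb(k_groups, 2)  # number of distinct pairs of groups
    return m, 1 - (1 - alpha) ** m

# Assumption for illustration: six job categories.
for alpha in (0.01, 0.05, 0.10):
    m, fw = familywise_error(6, alpha)
    print(f"alpha = {alpha:.2f}: {m} comparisons, familywise error = {fw:.3f}")
```

A smaller alpha lowers the chance of a false alarm but also makes it harder to declare any pair of means different.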