Working with Data Part 4
QMM 384
Exploring Data Two Variables at a Time

• In this section, we will explore two variables at a time using Tabulate and Fit Y by X.
• From the data table window, select Analyze > Tabulate.
• For those familiar with Pivot Tables in Excel, the Tabulate tool produces similar information.

• Try putting two continuous variables in the row and column. Why does JMP not allow this?
• Note that we cannot compute statistics like mean, standard deviation, etc. if the variables in the rows and columns are both nominal.
• We can create a table with BAD in the columns and REASON and JOB in the rows.
• Use Analyze > Tabulate to create this table.
– Drag and drop BAD to the Drop zone for columns.
– Drag and drop REASON to the Drop zone for rows (directly to the left of 4771).
– Next, drag and drop JOB to the location directly below HomeImp.
– When you do this correctly, a blue box will appear, and that is where to drop JOB.
– Next, drag and drop N and Row % to the results area.
• We see more loans for consolidating debt than for home improvement.
• Sales and Self have a higher percentage of bad loans than the other job categories.

• Use Tabulate to create a table where BAD is in the columns and all of the continuous variables are in the rows.
• Compute the Mean and the number of missing cases. (Dropping Mean on Sum replaces the sum values with the mean values.)
• What is the mean of CLAGE for all of the cases where BAD is 0, and how many missing cases are there for YOJ when BAD is 1?

• Fit Y by X allows us to “statistically” analyze the relationship between two variables.
• Choose Analyze > Fit Y by X.
• The icons on the bottom left indicate what type of statistical analysis can be performed based on the variable types for the X and Y variables.
• Make BAD the Y, Response variable and LOAN and JOB the X, Factor variables.
• Since BAD is nominal and LOAN is continuous, logistic regression can be used.

• The line represents the probability that BAD = 0. Since the logistic plot line for LOAN slopes upward, the probability of low risk (BAD = 0) increases as the loan amount increases.
• We will not discuss the statistical results since they will be covered in great detail later in the class.
• Since BAD is nominal and JOB is nominal, contingency analysis can be used.
• A mosaic plot is displayed along with statistical output.

• The width of the bars on the X axis shows the frequency of the job categories relative to one another.
• BAD = 0 appears on the bottom and BAD = 1 appears on the top.
• There are very few customers in Sales or Self Employed.
• The Other category has the most customers.
• Low risk customers are Office Workers and Professionals or Executives.

• Now start a new analysis.
• Select Analyze > Fit Y by X.
• Let LOAN be your Y, Response variable and JOB be the X, Factor variable.
• The appropriate model is a one-way analysis of variance.
• ANOVA tests whether the mean loan amount is the same across all job types, or whether at least one job type has a different mean loan amount.

• Select Means/ANOVA.
• If your p-value for the ANOVA test is “significant”, then at least one mean loan amount is different for one job type. So… which mean is different?
• Select Compare Means/Tukey’s HSD (Honest Significant Difference) test. Which means are significantly different from the others? Which means are not significantly different?
• Try some different multiple comparison methods. What do you find?
• Try the Analysis of Means test. What do you see?
• Try changing the alpha-value for the test. What changes? How does the value of alpha affect your analysis?
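
For readers who want to check the Tabulate results outside JMP, the two tables built earlier in this section (N and Row % for REASON by BAD, and the mean and missing count of a continuous variable by BAD) can be sketched in plain Python. This is only an illustration under stated assumptions: the records below are made-up toy values, not the course data set, and the helper names `tabulate` and `mean_and_missing` are hypothetical.

```python
from collections import Counter

# Hypothetical toy records of (REASON, BAD); NOT the course data set.
reason_bad = [
    ("DebtCon", 0), ("DebtCon", 0), ("DebtCon", 1), ("DebtCon", 0),
    ("HomeImp", 0), ("HomeImp", 1),
]

# Hypothetical toy records of (BAD, YOJ); None marks a missing value.
bad_yoj = [(0, 10.0), (0, None), (0, 4.0), (1, 2.0), (1, None)]


def tabulate(rows):
    """Return {row_level: {col_level: (N, row_percent)}}."""
    counts = Counter(rows)                     # N for each (row, column) cell
    row_totals = Counter(r for r, _ in rows)   # N for each row level
    col_levels = sorted({c for _, c in rows})
    return {
        r: {c: (counts[(r, c)], 100.0 * counts[(r, c)] / row_totals[r])
            for c in col_levels}
        for r in sorted(row_totals)
    }


def mean_and_missing(pairs):
    """Return {group: (mean of non-missing values, number missing)}."""
    result = {}
    for level in sorted({g for g, _ in pairs}):
        present = [v for g, v in pairs if g == level and v is not None]
        n_missing = sum(1 for g, v in pairs if g == level and v is None)
        group_mean = sum(present) / len(present) if present else None
        result[level] = (group_mean, n_missing)
    return result


table = tabulate(reason_bad)
for reason, cells in table.items():
    for bad, (n, pct) in cells.items():
        print(f"REASON={reason:8s} BAD={bad}  N={n}  Row %={pct:.1f}")

summary = mean_and_missing(bad_yoj)
for bad, (group_mean, n_missing) in summary.items():
    print(f"BAD={bad}  Mean(YOJ)={group_mean}  N Missing={n_missing}")
```

Row % divides each cell count by its row total, which is why both cells in a REASON row sum to 100.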
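
The one-way ANOVA that Means/ANOVA reports compares the variation between job-type means with the variation within each job type. The sketch below computes that F ratio by hand in plain Python. It rests on assumptions: the loan amounts are made-up toy values, not the course data set, and `one_way_anova` is a hypothetical helper name. It stops at the F statistic and its degrees of freedom; turning F into a p-value needs an F-distribution routine, which the Python standard library does not provide.

```python
from statistics import mean

# Hypothetical toy loan amounts grouped by JOB; NOT the course data set.
loans_by_job = {
    "Office": [10.0, 12.0, 14.0],
    "Sales": [20.0, 22.0, 24.0],
    "Other": [15.0, 17.0, 19.0],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(loans_by_job)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would suggest. Lowering alpha makes the test stricter, so fewer comparisons are flagged as significant.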