Working with Data Part 3
QMM 384

Exploring Data One Variable at a Time

• We start by studying the variables in the Equity.jmp file one at a time to get familiar with our data and to uncover any data quality issues.
• Statistics of interest for Continuous variables:
  – Mean (average)
  – Standard deviation (dispersion around the mean)
  – Minimum and maximum values
  – Number of missing observations
  – Shape of distribution (symmetrical or skewed)
  – Outliers
  – Errors
• Statistics of interest for Nominal and Ordinal variables:
  – Number of categories
  – Number of observations in each category
  – Number of missing observations
• Use the Columns Viewer and Distribution to provide the statistics discussed above.
• Go to Cols > Columns Viewer and select all of the variables from the Select Column list. Click Show Summary.
• For the nominal variables, you can see the number of observations (N), the number of missing values (N Missing), and the number of categories.
• How many missing cases are there for the nominal variable JOB?
• How many categories are there for the nominal variable REASON?
• For the continuous variables, we see N and N Missing in addition to summary statistics.
• Which two continuous variables have the most missing cases? Is this a concern?
• Min and Max provide the range of the data for each variable.
• The mean is the average for each variable, and the standard deviation is a measure of dispersion around the mean.
• Next, we will explore the information provided by Distribution.
• You can get to Distribution through the Analyze menu in the Data Table.
  – Go to Analyze > Distribution, select all variables, click Y, Columns, and then click OK.
• You can also access this information by clicking the Distribution button above the statistics in the window created by the Columns Viewer.
  – Highlight all of the variables and click Distribution.
• Categorical variables show bar charts and frequency distributions.
• Continuous variables display histograms, box plots, and summary statistics.
• The default view is vertical to allow more information to be shown.
• To change to a horizontal view, click the red triangle next to Distribution and select Stack.
• To change this permanently, in the main JMP window select File > Preferences > Platforms > Distribution (from the list) > Stack > Apply > OK.
• Returning to our Equity file, what percentage of the customers have JOB as ProfExe?
• The graphs also show that LOAN, MORTDUE, and VALUE all have extreme values significantly greater than their means. This means that each distribution is skewed to the right.
• Another useful feature in JMP is to click on the bar for bad risk customers in the BAD graph (BAD=1). This highlights the bar and also shows how bad risk customers are distributed across all of the other variables.
• Do you see that more bad risk customers are consolidating debt rather than taking out a loan for home improvement?
• Also observe that 2 customers have loan ages near 1200 days (see the box plot to the right of the graph).
• In summary, here are the data quality issues:
  – Missing values for many variables (most notably DEBTINC and DEROG).
  – Skewed distributions and a long tail for some variables (LOAN, MORTDUE, VALUE, CLAGE).
  – Messy data: continuous variables with many zeros (DEROG and DELINQ).
  – Outliers in CLAGE.
• Later in this chapter, we will see how to address these issues.
• Saving your work is important.
• The easiest way to do this is to save the script (written in JSL, the JMP Scripting Language) from any JMP output window by clicking the top red triangle and selecting Script > Save Script to Data Table.
• The saved script will appear in the table panel of the data table (top left corner). The output can be recreated by clicking on the red triangle and selecting Run Script.
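
For readers who want to see what the summary statistics above amount to outside of JMP, here is a minimal sketch in Python using only the standard library. It is not how JMP computes its output. The column names LOAN and JOB come from the Equity file, but the values below are made up for illustration, and the helper names (continuous_summary, nominal_summary, boxplot_outliers) are hypothetical. The outlier check uses the common 1.5 × IQR box plot rule, and the quartile method is Python's default, which may differ slightly from the one JMP uses.

```python
import statistics
from collections import Counter

# Hypothetical sample values (NOT taken from Equity.jmp); None marks a missing observation.
loan = [1100, 1300, 1500, 1500, 1700, 1800, 2000, None, 2300, 89900]
job = ["Other", "Office", "ProfExe", None, "Other", "Mgr", "ProfExe", "Other", None, "Sales"]


def continuous_summary(values):
    """N, N Missing, mean, standard deviation, min, max, and median for a continuous column."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "N Missing": len(values) - len(present),
        "Mean": statistics.mean(present),
        "Std Dev": statistics.stdev(present),  # sample standard deviation
        "Min": min(present),
        "Max": max(present),
        "Median": statistics.median(present),
    }


def nominal_summary(values):
    """N, N Missing, number of categories, and count per category for a nominal column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "N": len(present),
        "N Missing": len(values) - len(present),
        "N Categories": len(counts),
        "Counts": dict(counts),
    }


def boxplot_outliers(values):
    """Non-missing values more than 1.5 * IQR below Q1 or above Q3."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    return [v for v in present if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


loan_stats = continuous_summary(loan)
print(loan_stats)
# A mean well above the median is one sign of a right-skewed distribution.
print("Right skew suggested:", loan_stats["Mean"] > loan_stats["Median"])
print("Outliers:", boxplot_outliers(loan))
print(nominal_summary(job))
```

In this made-up sample, one very large LOAN value pulls the mean far above the median, which is the same pattern the slides describe for LOAN, MORTDUE, and VALUE.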