Download Information

1. TITLE: Australian Credit Approval

2. USE IN STATLOG

   2.1- Testing Mode
        10-Fold Cross Validation

   2.2- Special Preprocessing
        Yes (See REMARKS)

   2.3- Test Results

        Algorithm    Error Rate
        ---------    ----------
        Cal5            0.131
        Itrule          0.137
        LogDisc         0.141
        Discrim         0.141
        Dipol92         0.141
        Radial          0.145
        Cart            0.145
        Castle          0.148
        Bayes           0.151
        IndCart         0.152
        BackProp        0.154
        C4.5            0.155
        Smart           0.158
        BayTree         0.171
        KNN             0.181
        Ac2             0.181
        NewId           0.181
        LVQ             0.197
        Alloc80         0.201
        Cn2             0.204
        QuaDisc         0.207
        Default         0.440
        Cascade       100.000
        Kohonen       100.000

3. SOURCES and PAST USAGE

   3.1 ORIGINAL SOURCE
       (confidential)
       Submitted by [email protected]

   3.2 PAST USAGE
       See Quinlan,
       * "Simplifying decision trees", Int J Man-Machine Studies 27,
         Dec 1987, pp. 221-234.
       * "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992

   3.3 RELEVANT INFORMATION
       This file concerns credit card applications. All attribute names
       and values have been changed to meaningless symbols to protect
       confidentiality of the data.

       This dataset is interesting because there is a good mix of
       attributes: continuous, nominal with small numbers of values, and
       nominal with larger numbers of values. There were originally a few
       missing values, but these have all been replaced by the overall
       median.

4. DATASET DESCRIPTION

   NUMBER OF EXAMPLES
       Total no. = 690

   NUMBER OF CLASSES: 2
       0,1 (-,+)

   Class Distribution:
       +: 307 (44.5%)  CLASS 1
       -: 383 (55.5%)  CLASS 0

   NUMBER OF ATTRIBUTES
       14 (6 Continuous, 8 Categorical)

       A1:  0,1                                   CATEGORICAL
            a,b
       A2:  continuous.
       A3:  continuous.
       A4:  1,2,3                                 CATEGORICAL
            p,g,gg
       A5:  1, 2,3,4,5, 6,7,8,9,10,11,12,13,14    CATEGORICAL
            ff,d,i,k,j,aa,m,c,w, e, q, r,cc, x
       A6:  1, 2,3, 4,5,6,7,8,9                   CATEGORICAL
            ff,dd,j,bb,v,n,o,h,z
       A7:  continuous.
       A8:  1, 0                                  CATEGORICAL
            t, f.
       A9:  1, 0                                  CATEGORICAL
            t, f.
       A10: continuous.
       A11: 1, 0                                  CATEGORICAL
            t, f.
       A12: 1, 2, 3                               CATEGORICAL
            s, g, p
       A13: continuous.
       A14: continuous.

5. REMARKS

   Missing Attribute Values:
       37 cases (5%) HAD one or more missing values.
       The missing values from particular attributes WERE:
           A1:  12
           A2:  12
           A4:   6
           A5:   6
           A6:   9
           A7:   9
           A14: 13
       THESE WERE REPLACED BY THE MODE OF THE ATTRIBUTE (CATEGORICAL)
       OR THE MEAN OF THE ATTRIBUTE (CONTINUOUS).

   There is no cost matrix.

_____________________________________________________________________________

Three remarks relating to the StatLog version:

THE LABELS HAVE BEEN CHANGED FOR THE CONVENIENCE OF THE STATISTICAL
ALGORITHMS. FOR EXAMPLE, ATTRIBUTE 4 ORIGINALLY HAD 3 LABELS p,g,gg
AND THESE HAVE BEEN CHANGED TO LABELS 1,2,3.

1. Attributes 4 and 5 of the original WERE APPARENTLY IDENTICAL, so
   ATTRIBUTE 4 OF THE ORIGINAL WAS REMOVED (for the convenience of the
   statistical algorithms).

2. Where attributes were categorical, the categories were given
   numerical labels in the order of the relative risk of being class "+".
   Treat as categorical if thought desirable. All StatLog trials treated
   these variables as numerical.

3. A stepwise regression procedure strongly suggests that only
   attributes A5, A8, A9, A13 and A14 are relevant. Improved results are
   often obtained if only these five attributes are used.

CONTACTS
    [email protected]
    [email protected]

See README file for general information
================================================================================
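
The preprocessing described in the REMARKS (mode for categorical attributes, mean for continuous ones) and the five-attribute subset suggested by the stepwise regression can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used in StatLog: the function names `impute` and `select_relevant`, the use of `None` to mark a missing value, and the three toy records are all hypothetical and are not taken from the data file.

```python
from collections import Counter
from statistics import mean

# Attribute types as listed in the dataset description (A1..A14).
CATEGORICAL = {"A1", "A4", "A5", "A6", "A8", "A9", "A11", "A12"}
CONTINUOUS = {"A2", "A3", "A7", "A10", "A13", "A14"}
# Attributes the stepwise-regression remark flags as relevant.
RELEVANT = ["A5", "A8", "A9", "A13", "A14"]


def impute(rows):
    """Fill None entries: mode for categorical columns, mean for continuous."""
    fill = {}
    for name in CATEGORICAL | CONTINUOUS:
        observed = [r[name] for r in rows if r.get(name) is not None]
        if not observed:
            continue  # no observed values to estimate a replacement from
        if name in CATEGORICAL:
            fill[name] = Counter(observed).most_common(1)[0][0]
        else:
            fill[name] = mean(observed)
    return [
        {k: (fill.get(k, v) if v is None else v) for k, v in r.items()}
        for r in rows
    ]


def select_relevant(rows):
    """Keep only the five attributes named in remark 3."""
    return [{k: r[k] for k in RELEVANT} for r in rows]


# Hypothetical toy records (assumption: None marks a missing value).
sample = [
    {"A2": 22.0, "A5": 4, "A8": 1, "A9": 0, "A13": 100.0, "A14": 1.0},
    {"A2": None, "A5": 4, "A8": 0, "A9": 0, "A13": 200.0, "A14": None},
    {"A2": 30.0, "A5": None, "A8": 1, "A9": 1, "A13": 120.0, "A14": 5.0},
]

cleaned = impute(sample)
subset = select_relevant(cleaned)
```

In the distributed file the missing values have already been replaced, so a step like `impute` would only matter when working from the original, unprocessed data. Whether to treat the numerically relabelled categorical attributes as numbers or as categories is left open by remark 2.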
