Download Information

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
1. TITLE:
Australian Credit Approval
2. USE IN STATLOG
2.1- Testing Mode
10-Fold Cross Validation
2.2- Special Preprocessing
Yes (See REMARKS)
2.3- Test Results
Algorithm
--------Cal5
Itrule
LogDisc
Discrim
Dipol92
Radial
Cart
Castle
Bayes
IndCart
BackProp
C4.5
Smart
BayTree
KNN
Ac2
NewId
LVQ
Alloc80
Cn2
QuaDisc
Default
Cascade
Kohonen
Error Rate
---------0.131
0.137
0.141
0.141
0.141
0.145
0.145
0.148
0.151
0.152
0.154
0.155
0.158
0.171
0.181
0.181
0.181
0.197
0.201
0.204
0.207
0.440
100.000
100.000
3. SOURCES and PAST USAGE
3.1 ORIGINAL SOURCE
(confidential)
Submitted by [email protected]
3.2 PAST USAGE
See Quinlan,
* "Simplifying decision trees", Int J Man-Machine Studies 27,
Dec 1987, pp. 221-234.
* "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct
1992
3.2.
RELEVANT INFORMATION
This file concerns credit card applications. All attribute names
and values have been changed to meaningless symbols to protect
confidentiality of the data.
This dataset is interesting because there is a good mix of
attributes -- continuous, nominal with small numbers of
values, and nominal with larger numbers of values. There
were originally a few missing values, but these have all
been replaced by the overall median.
4. DATASET DESCRIPTION
NUMBER OF EXAMPLES
Total no. = 690
NUMBER OF CLASSES: 2
0,1 (-,+)
Class Distribution:
+: 307 (44.5%)
CLASS 1
-: 383 (55.5%)
CLASS 0
NUMBER OF ATTRIBUTES
14 (6 Continuous 8 Categorical)
A1:
A2:
A3:
A4:
A5:
0,1
CATEGORICAL
a,b
continuous.
continuous.
1,2,3
CATEGORICAL
p,g,gg
1, 2,3,4,5, 6,7,8,9,10,11,12,13,14
ff,d,i,k,j,aa,m,c,w, e, q, r,cc, x
A6:
1, 2,3, 4,5,6,7,8,9
ff,dd,j,bb,v,n,o,h,z
A7:
A8:
continuous.
1, 0
CATEGORICAL
t, f.
1, 0
CATEGORICAL
t, f.
continuous.
1, 0
CATEGORICAL
t, f.
1, 2, 3
CATEGORICAL
s, g, p
continuous.
continuous.
A9:
A10:
A11:
A12:
A13:
A14:
CATEGORICAL
CATEGORICAL
5-REMARKS:
Missing Attribute Values:
37 cases (5%) HAD one or more missing values.
The missing
values from particular attributes WERE:
A1: 12
A2: 12
A4:
6
A5:
6
A6:
9
A7:
9
A14: 13
THESE WERE REPLACED BY THE MODE OF THE ATTRIBUTE
(CATEGORICAL)
MEAN OF THE ATTRIBUTE (CONTINUOUS)
There is no cost matrix.
_________________________________________________________________________
____
Three remarks relating to the StatLog version:
THE LABELS HAVE BEEN CHANGED FOR THE CONVENIENCE OF THE
STATISTICAL
ALGORITHMS.
FOR EXAMPLE, ATTRIBUTE 4 ORIGINALLY HAD 3 LABELS
p,g,gg
AND THESE HAVE BEEN CHANGED TO LABELS 1,2,3.
1.
Attributes 4 and 5 of the original WERE APPARENTLY IDENTICAL,
so ATTRIBUTE 4 OF THE ORIGINAL WAS REMOVED
(for the convenience of the statistical algorithms).
2. Where attributes were categorical, the categories were given
numerical
labels in the order of the relative risk of being class "+". Treat
as categorical if thought desirable.
All StatLog trials treated
these variables as numerical.
3.
A stepwise regression procedure strongly suggests that only
attributes A5, A8, A9, A13 and A14 are relevant.
Improved results
are often obtained if only these five attributes are used.
CONTACTS
[email protected]
[email protected]
See README file for general information
=========================================================================
=======
Related documents