柏際股份有限公司 BockyTech, Inc.
5F-5, 70, Yanping S. Rd.,
Taipei 100, Taiwan, R.O.C.
Tel:886-2-23618050 Fax:886-2-23619803
See5 / C5.0
This state-of-the-art system constructs classifiers in the form of decision trees and rulesets. See5/C5.0 has
been designed to operate on large databases and incorporates innovations such as boosting.
New in Release 2.03
Weighting individual cases
The training cases for some applications have different relative importance. In a customer retention
application, for example, the importance of a case describing a customer might depend on the size
of the customer's account. Release 2.03 introduces an optional case weight attribute with numeric
values; the effect is to bias the development of a classifier to increase accuracy on more important
cases.
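To make the idea of case weights concrete, the following Python sketch compares two classifiers by weighted error, that is, the share of total case weight that is misclassified. It is an illustration only, not See5/C5.0's internal procedure; the function name weighted_error, the class labels, and the weights are all hypothetical.

```python
def weighted_error(actual, predicted, weights):
    """Return the fraction of total case weight that is misclassified."""
    total = sum(weights)
    wrong = sum(w for a, p, w in zip(actual, predicted, weights) if a != p)
    return wrong / total


# Hypothetical customer-retention cases; the weight stands in for account size.
actual = ["stay", "leave", "stay", "leave"]
weights = [1.0, 1.0, 1.0, 10.0]  # the last customer holds a large account

# Classifier A errs once, on the large account.
pred_a = ["stay", "leave", "stay", "stay"]
# Classifier B errs twice, on two small accounts.
pred_b = ["leave", "stay", "stay", "leave"]

err_a = weighted_error(actual, pred_a, weights)
err_b = weighted_error(actual, pred_b, weights)
print(f"A: {err_a:.3f}  B: {err_b:.3f}")
```

With equal weights, A makes fewer errors than B; with these weights, B has the lower weighted error because A's single mistake falls on the most important case.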
Smaller trees for applications with multi-valued discrete attributes
The algorithms for discrete attributes have been further improved. One noticeable consequence is
that decision trees tend to be both smaller and more accurate when there are discrete attributes with
many values.
Better use of cost information
While the treatment of costs for two-class problems remains much the same, the handling of cost
information for applications with three or more classes has been extensively revised. Multi-class
applications that specify a costs file should now observe lower average misclassification costs for
unseen cases, especially when rulesets are generated.
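As a rough illustration of what "average misclassification cost" means for a three-class problem, the Python sketch below scores predictions against a cost table and picks the prediction with the lowest expected cost. This is a generic calculation under stated assumptions, not a description of how See5/C5.0 uses cost information internally; the cost values, class names, and function names are hypothetical.

```python
# Hypothetical cost table: COSTS[true_class][predicted_class].
# Correct predictions cost nothing.
COSTS = {
    "high":   {"high": 0, "medium": 1, "low": 4},
    "medium": {"high": 1, "medium": 0, "low": 1},
    "low":    {"high": 2, "medium": 1, "low": 0},
}


def average_cost(actual, predicted, costs=COSTS):
    """Mean misclassification cost over a set of cases."""
    return sum(costs[a][p] for a, p in zip(actual, predicted)) / len(actual)


def min_cost_class(class_probs, costs=COSTS):
    """Pick the prediction whose expected cost is lowest, given
    estimated probabilities for each true class."""
    def expected(pred):
        return sum(prob * costs[true][pred] for true, prob in class_probs.items())
    return min(costs, key=expected)


avg = average_cost(["high", "low", "medium"], ["low", "low", "high"])
probs = {"high": 0.3, "medium": 0.3, "low": 0.4}
choice = min_cost_class(probs)
print(avg, choice)
```

In this example the most probable class is "low", yet the lowest expected cost comes from predicting "medium", because wrongly predicting "low" for a "high" case is expensive in the assumed table.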
Other changes and bug fixes
There have been minor modifications to the way soft thresholds for decision trees are found.
Two small bugs in the Windows GUI have been rectified. These concern the display of
implicitly-defined discrete attributes when the value is unknown, and the possible change of
classifier settings when the "Cross-reference" or "Making predictions" windows are invoked
immediately after a cross-validation.
Data Mining Tools See5 and C5.0
Data mining is all about extracting patterns from an organization's stored or warehoused data. These
patterns can be used to gain insight into aspects of the organization's operations, and to predict outcomes for
future situations as an aid to decision-making.
Patterns often concern the categories to which situations belong. For example, is a loan applicant
creditworthy or not? Will a certain segment of the population ignore a mailout or respond to it? Will a
process give high, medium, or low yield on a batch of raw material?
See5 (Windows 98/Me/2000/XP) and its Unix counterpart C5.0 are sophisticated data mining tools for
discovering patterns that delineate categories, assembling them into classifiers, and using them to make
predictions.
Some important features:
See5/C5.0 has been designed to analyse substantial databases containing thousands to hundreds of
thousands of records and tens to hundreds of numeric, time, date, or nominal fields.
To maximize interpretability, See5/C5.0 classifiers are expressed as decision trees or sets of if-then
rules, forms that are generally easier to understand than neural networks.
See5/C5.0 is easy to use and does not presume any special knowledge of Statistics or Machine
Learning (although these don't hurt, either!).
RuleQuest provides C source code so that classifiers constructed by See5/C5.0 can be embedded in
your organization's own systems.
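To show why a set of if-then rules, as mentioned in the features above, is easy to read, here is a minimal Python sketch of a ruleset for the loan example. The attribute names, thresholds, and classes are invented for illustration and are not output from See5/C5.0. As a simplification, the first matching rule wins; an actual ruleset may combine rules differently.

```python
# Hypothetical ruleset: each entry pairs a condition with the class it predicts.
RULES = [
    # IF income > 50000 AND defaults = 0 THEN creditworthy
    (lambda c: c["income"] > 50000 and c["defaults"] == 0, "creditworthy"),
    # IF defaults >= 2 THEN not creditworthy
    (lambda c: c["defaults"] >= 2, "not creditworthy"),
]

# Class assigned when no rule applies (an assumption of this sketch).
DEFAULT_CLASS = "not creditworthy"


def classify(case):
    """Return the class of the first rule whose condition holds."""
    for condition, label in RULES:
        if condition(case):
            return label
    return DEFAULT_CLASS


applicant = {"income": 60000, "defaults": 0}
print(classify(applicant))
```

Each rule can be read and checked on its own, which is the interpretability advantage the feature list refers to.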