Enterprise Miner™ vs. Underlying SAS® Code:
A Performance Comparison
Dr. Usama Hasan
Analytical Consultant
eNiklas (UK) Ltd.
Poster presentation, SEUGI - Dublin, June 20-23, 2000
Author available for discussion: Wed 21 Jun, 9am-1pm
Objective
To investigate the relative merits of using:
• the Enterprise Miner™ GUI
• the underlying SAS® code
for data mining problems
Enterprise Miner™
• SAS Institute Inc.’s integrated software product for end-to-end data mining solutions
• GUI front-end to SEMMA process
(Sample, Explore, Modify, Model, Assess)
• Statistical tools: linear and logistic regression
• Clustering: k-means, Kohonen
• Decision trees, Neural networks: MLPs, RBFs
• Data preparation: outliers, transformations, sampling
• Visualisation of data and modelling results
Enterprise Miner™ Procedures
• DMDB - create a data mining database
• DMREG - regression (linear & logistic)
• NEURAL - create & train neural networks
• SPLIT - create decision trees
• DMINE - variable selection by least squares or regression
• DMSPLIT - variable selection using chi-squared methods
• ASSOC - association of items (market basket analysis)
• RULEGEN - generate rules from ASSOC output
• SEQUENCE - find time-dependent ASSOC output
• STDIZE - standardise numerical variables
PROC NEURAL
Arguments & Options
• Required arguments:
– DATA (training data)
– DMDBCAT (metadata catalogue)
• Options:
♦ Network specification
♦ Combination and activation functions for hidden and target units
♦ Objective function for optimisation during training
♦ Training, including preliminary optimisation and stopping
♦ Randomisation
♦ Graphing during training
♦ Test and validation datasets
• All of these arguments and options can be set via the GUI.
PROC NEURAL - Statements
Comparison with functionality in Enterprise Miner™ GUI
Statement(s)                                   Role
Code, Train, Nloptions                         Scoring code, nonlinear training opts.
Input, Hidden, Target                          Set up network layers
Netoptions, Ranoptions, Freq, Initial, Prelim  Set up network, random & input options
Connect, Cut, Delete                           Act on network layers
Use, Set, Perturb, Freeze, Thaw                Act on weights
Show, Save, Score, Quit                        Display and use of weights
All options in GUI? / Comments
YES
Convergence & Hessian options not in GUI
Advanced tab
YES
Advanced tab
YES
Advanced tab
YES
Interactive mode
YES
Advanced/interactive mode
NO
Timing Studies - Datasets Used
Dataset        Input columns  Input types                         Observations  Target             Source
DNA            60             All binary                          1659          Nominal (3 level)  Public domain
German credit  20             Nominal, binary, ordinal, interval  600           Binary             SAS Institute
Diabetes       8              All interval                        768           Binary             Public domain
Heart disease  13             Nominal, binary, ordinal, interval  270           Binary             Public domain
Timing Studies - Type
Enterprise Miner™ Process Flow Diagram
• Neural network trained until convergence, with and without GUI
• CPU times noted
• Accuracy irrelevant for timing studies
Timing Studies - Typical Results
Study          Hidden nodes  Iterations to convergence  Enterprise Miner™ CPU time  PROC NEURAL CPU time
DNA            5             411                        148.78 s                    148.73 s
German credit  3             324                        39.75 s                     39.82 s
Diabetes       10            265                        16.80 s                     16.96 s
Heart disease  3             350                        28.3 s                      29.0 s
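To put the CPU times above in proportion, the gap between the two routes can be expressed as a percentage of the GUI time. The short Python sketch below does that arithmetic. It is not part of the original study: the function names and the choice of the GUI time as the baseline are assumptions made for illustration, and only the numbers themselves come from the results.

```python
# CPU times in seconds, copied from the Typical Results figures above.
# Tuple layout: (hidden nodes, iterations to convergence,
#                Enterprise Miner GUI CPU time, PROC NEURAL CPU time)
results = {
    "DNA": (5, 411, 148.78, 148.73),
    "German credit": (3, 324, 39.75, 39.82),
    "Diabetes": (10, 265, 16.80, 16.96),
    "Heart disease": (3, 350, 28.3, 29.0),
}


def relative_difference(gui_seconds, proc_seconds):
    """Signed gap (PROC NEURAL minus GUI) as a percentage of the GUI time.

    Using the GUI time as the baseline is an illustrative assumption.
    """
    return 100.0 * (proc_seconds - gui_seconds) / gui_seconds


def summarise(table):
    """Return {study: (absolute gap in seconds, relative gap in percent)}."""
    summary = {}
    for study, (_, _, gui, proc) in table.items():
        summary[study] = (proc - gui, relative_difference(gui, proc))
    return summary


if __name__ == "__main__":
    for study, (gap, pct) in summarise(results).items():
        print(f"{study:<14} {gap:+.2f} s  ({pct:+.2f} %)")
```

On these four studies the largest gap is 0.7 s on the heart disease data, a little under 2.5 % of the GUI time, and the sign of the gap is not the same for every dataset.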
Conclusions
• Calling PROC NEURAL directly does not improve execution time (for machines with ample memory)
• PROC NEURAL gives the expert user full control over all algorithm options
• PROC NEURAL is the only option where the GUI is not available or the link is too slow
• For extensive data pre-processing, Enterprise Miner™’s GUI is invaluable
Topics for Further Study
• Other Enterprise Miner™ PROCedures, e.g. SPLIT, DMREG, DMINE
• Timing studies on machines with limited memory
Contact Details
Dr. Usama Hasan,
eNiklas (UK) Ltd., 2 Shaftesbury Court, Chalvey Park,
Slough SL1 2ER, U.K.
Tel: +44 1753 732100, Fax: +44 1753 732110
Email: [email protected]