Enterprise Miner™ vs. Underlying SAS® Code: A Performance Comparison

Dr. Usama Hasan
Analytical Consultant
eNiklas (UK) Ltd.

Poster presentation, SEUGI - Dublin, June 20-23, 2000
Author available for discussion: Wed 21 Jun, 9am-1pm

Objective

To investigate the relative merits of using:
• the Enterprise Miner™ GUI
• the underlying SAS® code
for data mining problems

Enterprise Miner™

• SAS Institute Inc.’s integrated software product for end-to-end data mining solutions
• GUI front-end to SEMMA process (Sample, Explore, Modify, Model, Assess)
• Statistical tools: linear and logistic regression
• Clustering: k-means, Kohonen
• Decision trees, Neural networks: MLPs, RBFs
• Data preparation: outliers, transformations, sampling
• Visualisation of data and modelling results

Enterprise Miner™ Procedures

• DMDB - create a data mining database
• DMREG - regression (linear & logistic)
• NEURAL - create & train neural networks
• SPLIT - create decision trees
• DMINE - variable selection by least squares or regression
• DMSPLIT - variable selection using chi-squared methods
• ASSOC - association of items (market basket analysis)
• RULEGEN - generate rules from ASSOC output
• SEQUENCE - find time-dependent ASSOC output
• STDIZE - standardise numerical variables

PROC NEURAL Arguments & Options

• Required arguments:
  – DATA (training data)
  – DMDBCAT (metadata catalogue)
• Options:
  ♦ Network specification
  ♦ Combination and activation functions for hidden and target units
  ♦ Objective function for optimisation during training
  ♦ Training, including preliminary optimisation and stopping
  ♦ Randomisation
  ♦ Graphing during training
  ♦ Test and validation datasets
• All of these arguments and options can be set via the GUI.

PROC NEURAL - Statements

Comparison with functionality in Enterprise Miner™ GUI

Statement(s)                                    Role
----------------------------------------------  ---------------------------------------
Code, Train, Nloptions                          scoring code, nonlinear training opts.
Input, Hidden, Target                           Set up network layers
Netoptions, Ranoptions, Freq, Initial, Prelim   Set up network, random & input options
Connect, Cut, Delete                            Act on network layers
Use, Set, Perturb, Freeze, Thaw                 Act on weights
Show, Save, Score, Quit                         Display and use of weights

All options in GUI? / Comments:
YES; Convergence & Hessian options not in GUI; Advanced tab; YES; Advanced tab; YES; Advanced tab; YES; Interactive mode; YES; Advanced/interactive mode; NO

Timing Studies - Datasets Used

Dataset         Input columns   Input types                          Observations   Target              Source
--------------  --------------  -----------------------------------  -------------  ------------------  -------------
DNA             60              All binary                           1659           Nominal (3 level)   Public domain
German credit   20              Nominal, binary, ordinal, interval   600            Binary              SAS Institute
Diabetes        8               All interval                         768            Binary              Public domain
Heart disease   13              Nominal, binary, ordinal, interval   270            Binary              Public domain

Timing Studies - Type

Enterprise Miner™ Process Flow Diagram

• Neural network trained until convergence, with and without GUI
• CPU times noted
• Accuracy irrelevant for timing studies

Timing Studies - Typical Results

Study           Hidden nodes   Iterations to convergence   Enterprise Miner™ CPU time   PROC NEURAL CPU time
--------------  -------------  --------------------------  ---------------------------  ---------------------
DNA             5              411                         148.78 s                     148.73 s
German credit   3              324                         39.75 s                      39.82 s
Diabetes        10             265                         16.80 s                      16.96 s
Heart disease   3              350                         28.3 s                       29.0 s

Conclusions

• Calling PROC NEURAL directly does not improve execution time (for machines with ample memory)
• PROC NEURAL gives the expert user full control over all algorithm options
• PROC NEURAL is the only option where the GUI is not available or link is too slow
• For extensive data pre-processing, Enterprise Miner™’s GUI is invaluable

Topics for Further Study

• Other Enterprise Miner™ PROCedures, e.g. SPLIT, DMREG, DMINE
• Timing studies on machines with limited memory

Contact Details

Dr. Usama Hasan, eNiklas (UK) Ltd., 2 Shaftesbury Court, Chalvey Park, Slough SL1 2ER, U.K.
Tel: +44 1753 732100, Fax: +44 1753 732110
Email: [email protected]
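
Supplementary sketch: relative CPU-time differences

As a rough check on the conclusion that calling PROC NEURAL directly does not improve execution time, the CPU times reported under Timing Studies - Typical Results can be expressed as relative differences. The short Python sketch below is not part of the original poster. The names cpu_times, relative_difference and summarise are illustrative, and the only data used are the eight CPU times quoted in the results.

```python
# CPU times in seconds, copied from the "Timing Studies - Typical Results" figures.
# Tuple order: (Enterprise Miner GUI run, direct PROC NEURAL call).
# All names in this sketch are illustrative; they do not come from the poster.
cpu_times = {
    "DNA": (148.78, 148.73),
    "German credit": (39.75, 39.82),
    "Diabetes": (16.80, 16.96),
    "Heart disease": (28.3, 29.0),
}


def relative_difference(gui_s, proc_s):
    """Signed difference (PROC NEURAL minus GUI) as a percentage of the GUI time."""
    return 100.0 * (proc_s - gui_s) / gui_s


def summarise(times):
    """Map each study name to its relative CPU-time difference in percent."""
    return {
        study: relative_difference(gui_s, proc_s)
        for study, (gui_s, proc_s) in times.items()
    }


if __name__ == "__main__":
    for study, pct in summarise(cpu_times).items():
        # A positive value means the direct PROC NEURAL call took longer.
        print(f"{study:15s} {pct:+.2f} %")
```

On the four studies reported, every difference is below 3% in magnitude, and only the DNA study shows the direct call as marginally faster. With one timing per study, such small gaps may well be within run-to-run variation.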