Neural networks – Hands on

• Delta rule and Backpropagation algorithm
• MetaNeural format for predictive data mining
• Iris Data
• Magnetocardiogram data

Neural net yields weights to map inputs to outputs

[Figure: a neural network maps molecular descriptors (molecular weight, H-bonding, hydrophobicity, electrostatic interactions) through weights (w11, w23, w34) and hidden units (h) to observables (boiling point, biological response). Panel labels: Molecular Descriptor, Neural Network, Observable, Projection.]

There are many algorithms that can determine the weights for ANNs.

RENSSELAER

McCulloch-Pitts neuron

[Figure: inputs x1 ... xN, weighted by w1 ... wN, feed a summation node followed by the transfer function f().]

$$ \mathrm{sum} = \sum_{i=1}^{N} w_i x_i, \qquad y = f(\mathrm{sum}), \qquad f(\mathrm{sum}) = \frac{1}{1 + e^{-\mathrm{sum}}} $$

Neural network as collection of M-P neurons

[Figure: inputs x1 and x2 feed a first hidden layer and a second hidden layer of f() neurons, followed by an output neuron; each connection carries an indexed weight w.]

$$ E = \sum_{j=1}^{n_{outputs}} (y_j - t_j)^2 $$

$$ w_{ji}^{n+1} = w_{ji}^{n} + \Delta w_{ji}, \qquad \Delta w_{ji} = -\eta \, \frac{dE}{dw_{ji}} $$

Here y_j is the network output, t_j the target, and \eta the learning parameter.

Standard Data Mining Terminology

• Basic Terminology
  - MetaNeural Format
  - Descriptors, features, response (or activity) and ID
  - Classification versus regression
  - Modeling/Feature detection
  - Training/Validation/Calibration
  - Vertical and horizontal view of data
• Outliers, rare events and minority classes
• Data Preparation
  - Data cleansing
  - Scaling
• Leave-one-out and leave-several-out validation
• Confusion matrix and ROC curves

Feature_1  Feature_2  Feature_3  Feature_4  CLASS  ID
7.3        2.9        6.3        1.8        3      108
5.1        3.8        1.9        0.4        1      45
5          3.2        1.2        0.2        1      36
6.8        3.2        5.9        2.3        3      144
4.6        3.4        1.4        0.3        1      7
5          3.4        1.6        0.4        1      27
4.7        3.2        1.6        0.2        1      30
6          2.2        5          1.5        3      120
5.2        3.4        1.4        0.2        1      29
5.1        3.3        1.7        0.5        1      24
7.2        3.6        6.1        2.5        3      110
7.1        3          5.9        2.1        3      103
7.2        3.2        6          1.8        3      126
6.1        2.8        4.7        1.2        2      74
6.4        2.8        5.6        2.1        3      129
6.1        3          4.9        1.8        3      128
4.8        3          1.4        0.1        1      13
6.7        3.1        5.6        2.4        3      141
5          3          1.6        0.2        1      26
6          2.9        4.5        1.5        2      79
6.2        2.2        4.5        1.5        2      69
6.6        2.9        4.6        1.3        2      59
6.3        2.5        5          1.9        3      147
4.4        3          1.3        0.2        1      39
6.5        3          5.2        2          3      148
5.5        2.5        4          1.3        2      90
6.7        3.1        4.4        1.4        2      66
7.7        3.8        6.7        2.2        3      118
6.5        3.2        5.1        2          3      111
5.4        3.7        1.5        0.2        1      11
7.7        2.6        6.9        2.3        3      119
6.3        3.4        5.6        2.4        3      137
5.6        2.7        4.2        1.3        2      95
4.9        2.4        3.3        1          2      58
5.8        4          1.2        0.2        1      15
4.9        2.5        4.5        1.7        3      107
7.9        3.8        6.4        2          3      132
4.4        2.9        1.4        0.2        1      9
5.8        2.8        5.1        2.4        3      115
5.4        3.4        1.5        0.4        1      32
5.9        3          5.1        1.8        3      150

TERMINOLOGY

• Standard Data Mining Problem
• Header and Data
• MetaNeural Format
  - descriptors and/or features
  - response (or activity to predict)
  - pattern ID
  - data matrix
• Validation/Calibration
• Training/Validation/Test Set

Demo: iris_view.bat

iris (plant), common name for a family of herbaceous flowering plants. The flowers are composed of a floral envelope (perianth) with six petal-like segments, three or six stamens, and an ovary enclosed by the base of the perianth. About 1800 species exist, placed in more than 90 genera. The family has many horticulturally important members; most are as well known by their scientific names as by common names, including crocuses, irises, and tiger-flowers. Members of the family generally have long and narrow basal leaves in two ranks and a showy perianth. In the iris genus itself the inner three segments, called standards, are erect and narrowed at the base. The outer three are also narrowed, but usually droop and are called falls. The beard in bearded irises consists of a group of colored hairs on the upper surface of each of the falls. Some 200 species of iris are divided into two groups. The first has creeping, underground stems, or rhizomes; it includes the bearded, or German, irises and the Japanese and Siberian, or beardless, irises.
The second group has bulbs, modified underground buds with fleshy leaf bases; it includes the Dutch, Spanish, and English varieties. Aside from its horticultural value, the iris family is of little economic importance. Rhizomes of several species, mainly the orris, are dried and powdered to obtain orris root, used in perfume and other cosmetics. Saffron, used as a dye and to color and flavor food, is obtained from the three-parted stigmas of the saffron crocus. This species has been cultivated for a very long time and is no longer found in the wild. Its commercial importance is declining, however, because hand labor is required for harvesting. Scientific classification: Iris is the common name for the family Iridaceae. The orris is classified as Iris germanica variety florentina and the saffron crocus as Crocus sativus.

Contributed by: Marshall R. Crosby [1]
[1] "Iris (plant)," Microsoft® Encarta® 97 Encyclopedia. © 1993-1996 Microsoft Corporation. All rights reserved.

UC IRVINE DATA REPOSITORY

Datafile Name: Fisher's Iris
Datafile Subjects: Agriculture, Famous datasets
Description: This is a dataset made famous by Fisher, who used it to illustrate principles of discriminant analysis. It contains 6 variables with 150 observations.
Reference: Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179-188.
Story Names: Fisher's Irises
Authorization: free use
Number of cases: 150
Variable Names:
1. Species_No: Flower species as a code
2. Species_Name: Species name
3. Petal_Width: Petal Width
4. Petal_Length: Petal Length
5. Sepal_Width: Sepal Width
6. Sepal_Length: Sepal Length

REM IRIS2.BAT: PREPARING AND EXPLORING IRIS DATA
REM PREPARE IRIS DATA (option 5)
analyze num_eg.txt 3301
REM MAKE FILE TAB SEPARATED
analyze iris.txt 100
copy iris.txt.txt iris.txt
erase *.txt.txt
REM MAKE GENERIC LABELS
analyze iris.txt 116
REM SCRAMBLE DATA (100 2)
analyze iris.txt 20
copy cmatrix.txt +dmatrix.txt iris.txt
REM MAKE CORRELATION MATRIX
analyze iris.txt 28
REM VIEW COVARIANCE PLOT
analyze cov.txt 3309
pause
REM MAKE PHARMAPLOT
REM MAHALANOBIS SCALE DATA FIRST
analyze iris.txt -3
copy iris.txt.txt iris.txt
analyze iris.txt 36
copy iris.txt.txt pharma.txt
REM JAVAPLOTS
REM VIEW MAHALANOBIS SCALED DATA
analyze iris.txt 3311
pause
REM VIEW PHARMAPLOT
analyze pharma.txt 3308
pause
exit

• ANALYZE code has neural network modules built in
• Either run:
    analyze root.pat 4331 (single training and testing)
    analyze root.pat 4332 (LOO)
    analyze root.txt 4333 (bootstrap mode)
• Results for analyze are in resultss.xxx and resultss.ttt
• Note that patterns have to be properly scaled first
• The file name meta overrides the default input file for analyze

Neural Network Module in Analyze Code

[Diagram: ROOT, ROOT.PAT, ROOT.TES, (ROOT.WGT), (ROOT.FWT), (ROOT.DBD) -> Analyze -> resultss.XXX, resultss.TTT, ROOT.TRN, (ROOT.DBD), ROOT.WGT, ROOT.FWT]

• Use Analyze root 4331 for the easy way (the file meta lets you override defaults)

MetaNeural Input File for the ROOT

4          => 4 layers
2          => 2 inputs
16         => # hidden neurons in layer #1
4          => # hidden neurons in layer #2
1          => # outputs
300        => epoch length (hint: always use 1, for the entire batch)
0.01       => learning parameters by weight layer (hint: 1/# patterns or 1/# epochs)
0.01
0.01
0.5        => momentum parameters by weight layer (hint: use 0.5)
0.5
0.5
10000000   => some very large number of training epochs
200        => error display refresh rate
1          => sigmoid transfer function
1          => temperature of sigmoid
check.pat  => name of file with training patterns (test patterns in root.tes)
0          => not used (legacy entry)
100        => not used (legacy entry)
0.02000    => exit training if error < 0.02
0          => initial weights from a flat random distribution
0.2        => initial random weights all fall between -0.2 and +0.2

Generating and Scaling Iris Data

REM GENERATE IRIS DATA (5)
analyze iris.txt 3301
REM DECAPITATE HEADER
analyze iris.txt 100
REM SCALE DATA
analyze iris.txt.txt 100
REM SCALE DATA
analyze iris.txt.txt.txt 8
REM SPLIT DATA IN TRAINING & TEST DATA (100 2)
analyze iris.txt.txt.txt.txt 20
copy cmatrix.txt a.pat
copy dmatrix.txt a.tes
REM VISUALIZE TRAINING DATA (3)
analyze a.pat 3350
pause
erase iris.txt.*

Run Neural Net for Iris Data

erase *.wgt
REM TRAIN/TEST ANN
pause
analyze a.pat 4336
pause
REM DESCALE DATA
analyze resultss.xxx -4
copy results.ttt results.xxx
analyze resultss.ttt -4
REM GENERATE CONFUSION MATRIX (3)
analyze results.ttt 4242
type confusion.txt
pause
REM VISUALIZE RESULTS
analyze resultss.ttt 3313
pause
analyze results.ttt 3305
pause
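
The MetaNeural input file above sets a learning parameter, a momentum parameter, a sigmoid transfer function, an error threshold for stopping, and a range for the initial random weights. As a rough illustration of how these settings interact, the Python sketch below trains a single sigmoid neuron with a batch delta rule plus momentum. It is not the Analyze/MetaNeural implementation: the function names, the min-max scaling, the bias term, the 0/1 target coding, and the default values are assumptions made for this example. The eight patterns are Feature_3 and Feature_4 of class 1 and class 3 rows taken from the iris table.

```python
import math
import random


def sigmoid(s):
    """Logistic transfer function f(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(weights, bias, x):
    """Single neuron: weighted sum of the inputs followed by f()."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def min_max_scale(rows):
    """Scale every column to [0, 1] (an assumed scaling, chosen for simplicity)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def train_delta_rule(patterns, targets, learning_rate=0.125, momentum=0.5,
                     max_epochs=5000, error_exit=0.02, weight_range=0.2, seed=0):
    """Batch gradient descent on E = sum (y - t)^2 with a momentum term."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0])
    # Initial weights drawn from a flat (uniform) random distribution.
    weights = [rng.uniform(-weight_range, weight_range) for _ in range(n_inputs)]
    bias = rng.uniform(-weight_range, weight_range)
    prev_dw = [0.0] * n_inputs
    prev_db = 0.0
    history = []  # summed squared error at the start of each epoch
    for _ in range(max_epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        error = 0.0
        for x, t in zip(patterns, targets):
            y = neuron_output(weights, bias, x)
            error += (y - t) ** 2
            # dE/d(sum) for the logistic function: 2 (y - t) y (1 - y)
            delta = 2.0 * (y - t) * y * (1.0 - y)
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi
            grad_b += delta
        history.append(error)
        if error < error_exit:
            break
        # Step against the gradient and add a fraction of the previous step.
        for i in range(n_inputs):
            prev_dw[i] = -learning_rate * grad_w[i] + momentum * prev_dw[i]
            weights[i] += prev_dw[i]
        prev_db = -learning_rate * grad_b + momentum * prev_db
        bias += prev_db
    return weights, bias, history


# Feature_3, Feature_4 for IDs 45, 36, 7, 27 (class 1) and 108, 144, 120, 110 (class 3).
raw = [[1.9, 0.4], [1.2, 0.2], [1.4, 0.3], [1.6, 0.4],
       [6.3, 1.8], [5.9, 2.3], [5.0, 1.5], [6.1, 2.5]]
T = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # assumed coding: class 1 -> 0, class 3 -> 1
X = min_max_scale(raw)

w, b, hist = train_delta_rule(X, T)
print("epochs run:", len(hist))
print("first and last error:", hist[0], hist[-1])
print("outputs:", [round(neuron_output(w, b, x), 3) for x in X])
```

The learning rate of 0.125 follows the hint of 1/# patterns for eight patterns, and the momentum of 0.5 follows the hint in the input file. A multi-layer network would propagate the same kind of delta term backwards through each weight layer.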
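
The last batch file descales the network results and then generates a confusion matrix for the three iris classes. The following Python sketch shows one way such a matrix can be tabulated from real-valued network outputs. The rounding rule, the row/column convention (rows are true classes, columns are predicted classes), and the sample outputs are all hypothetical; the source does not specify how option 4242 of analyze works internally.

```python
def nearest_class(output, n_classes):
    """Map a real-valued output to the nearest class label in 1..n_classes."""
    return min(max(int(round(output)), 1), n_classes)


def confusion_matrix(true_classes, predicted_classes, n_classes):
    """Count patterns per (true class, predicted class) pair, 1-based labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_classes, predicted_classes):
        matrix[t - 1][p - 1] += 1
    return matrix


# Made-up example: true classes and descaled network outputs for eight test patterns.
true_classes = [3, 1, 1, 3, 2, 2, 3, 2]
outputs = [2.8, 1.1, 0.9, 3.2, 2.1, 2.6, 2.4, 1.9]

predicted = [nearest_class(o, 3) for o in outputs]
matrix = confusion_matrix(true_classes, predicted, 3)
accuracy = sum(matrix[k][k] for k in range(3)) / len(true_classes)

for row in matrix:
    print(row)
print("fraction correct:", accuracy)
```

Off-diagonal counts show which classes are mistaken for each other; in this made-up example the only confusion is between classes 2 and 3.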