CI04-Week9
Neural networks – Hands on
• Delta rule and Backpropagation algorithm
• MetaNeural format for predictive data mining
• Iris Data
• Magnetocardiogram data
Neural net yields weights to map inputs to outputs
[Figure: molecular descriptors (molecular weight, H-bonding, hydrophobicity, electrostatic interactions) enter a neural network with hidden neurons h and weights w11 to w34, which projects them onto observables (boiling point, biological response)]
There are many algorithms that can determine the weights for ANNs.
RENSSELAER
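McCulloch-Pitts neuron
[Figure: inputs x1 ... xN, weighted by w1 ... wN, are summed and passed through the transfer function f() to give the output y]
$$\text{sum} = \sum_{i=1}^{N} w_i x_i, \qquad y = f(\text{sum}), \qquad f(\text{sum}) = \frac{1}{1 + e^{-\text{sum}}}$$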
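The neuron above can be written in a few lines of Python. The sketch below also includes the delta rule, which trains a single sigmoid neuron by gradient descent on its squared error. It is an illustration, not the course's own code. The function names, the learning rate eta = 0.5 and the OR-gate training data are assumptions. The constant input 1 at the start of each pattern acts as a bias.

```python
import math
import random


def sigmoid(s):
    """Logistic transfer function f(sum) = 1 / (1 + e^(-sum))."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(weights, x):
    """McCulloch-Pitts style neuron: weighted sum of inputs, then f()."""
    s = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return sigmoid(s)


def delta_rule_epoch(weights, patterns, targets, eta=0.5):
    """One pass of the delta rule, updating after every pattern.

    For E = (y - t)^2 the gradient is dE/dw_i = 2 (y - t) y (1 - y) x_i,
    because f'(sum) = y (1 - y) for the sigmoid; the factor 2 is folded
    into the learning rate eta.
    """
    for x, t in zip(patterns, targets):
        y = neuron_output(weights, x)
        delta = (t - y) * y * (1.0 - y)
        for i in range(len(weights)):
            weights[i] += eta * delta * x[i]
    return weights


def total_error(weights, patterns, targets):
    """Sum of squared errors over all patterns."""
    return sum((neuron_output(weights, x) - t) ** 2
               for x, t in zip(patterns, targets))


# Illustrative data: logical OR, with a leading constant 1 as bias input.
random.seed(0)
patterns = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
targets = [0, 1, 1, 1]
w = [random.uniform(-0.2, 0.2) for _ in range(3)]
error_before = total_error(w, patterns, targets)
for _ in range(2000):
    delta_rule_epoch(w, patterns, targets)
error_after = total_error(w, patterns, targets)
print(f"error before: {error_before:.3f}, after: {error_after:.3f}")
```

Here the weights are updated after each pattern (online mode). A batch variant would sum the updates over all patterns before applying them.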
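Neural network as collection of M-P neurons
[Figure: inputs x1 and x2 feed a first hidden layer and a second hidden layer of neurons f(), which feed an output neuron; layer-indexed weights such as w^1_11, w^2_11 and w^3_21 connect the layers]
$$E = \sum_{j=1}^{n_{\text{outputs}}} \left(y_j - t_j\right)^2$$
$$w_{ji}^{(n+1)} = w_{ji}^{(n)} + \Delta w_{ji}, \qquad \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}$$
where y_j is the output and t_j the target of output neuron j, and η is the learning rate.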
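Backpropagation computes this gradient-descent rule layer by layer. The Python/NumPy sketch below rests on several assumptions and is not the Analyze implementation. It uses batch updates, the sum-of-squared-errors E given above, and sigmoid neurons with a bias weight in every layer. The layer sizes are 2-16-4-1, taken from the example ROOT file in this section. The XOR data and all function names are hypothetical.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def add_bias(a):
    """Append a constant 1 input so each layer can learn a bias weight."""
    return np.hstack([a, np.ones((a.shape[0], 1))])


def init_weights(layer_sizes, rng, scale=0.2):
    """One matrix per weight layer, drawn uniformly from [-scale, +scale]."""
    return [rng.uniform(-scale, scale, size=(n_in + 1, n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(weights, X):
    """Return the outputs of every layer, starting with the inputs."""
    acts = [X]
    for W in weights:
        acts.append(sigmoid(add_bias(acts[-1]) @ W))
    return acts


def error(weights, X, T):
    """E = sum over patterns and outputs j of (y_j - t_j)^2."""
    return float(np.sum((forward(weights, X)[-1] - T) ** 2))


def gradients(weights, X, T):
    """Backpropagate dE/dw for every weight layer (batch mode)."""
    acts = forward(weights, X)
    grads = [None] * len(weights)
    y = acts[-1]
    delta = 2.0 * (y - T) * y * (1.0 - y)  # dE/dsum at the output layer
    for layer in range(len(weights) - 1, -1, -1):
        grads[layer] = add_bias(acts[layer]).T @ delta
        if layer > 0:
            a = acts[layer]
            # Skip the bias row: the constant input has nothing upstream.
            delta = (delta @ weights[layer][:-1].T) * a * (1.0 - a)
    return grads


def train(weights, X, T, eta=0.5, epochs=5000):
    for _ in range(epochs):
        for W, g in zip(weights, gradients(weights, X, T)):
            W -= eta * g  # w(n+1) = w(n) - eta * dE/dw, in place
    return weights


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# Finite-difference check of one gradient entry on a small random net.
check_w = init_weights([2, 3, 2, 1], np.random.default_rng(1), scale=1.0)
analytic = gradients(check_w, X, T)[1][0, 1]
eps = 1e-6
check_w[1][0, 1] += eps
e_plus = error(check_w, X, T)
check_w[1][0, 1] -= 2 * eps
e_minus = error(check_w, X, T)
numeric = (e_plus - e_minus) / (2 * eps)

w = init_weights([2, 16, 4, 1], np.random.default_rng(0))
error_before = error(w, X, T)
train(w, X, T)
error_after = error(w, X, T)
print(f"gradient check: {analytic:.6f} vs {numeric:.6f}")
print(f"error {error_before:.3f} -> {error_after:.3f}")
```

The finite-difference comparison is a cheap way to confirm that the backpropagated gradient matches dE/dw before relying on it for training.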
Standard Data Mining Terminology
• Basic Terminology
- MetaNeural Format
- Descriptors, features, response (or activity) and ID
- Classification versus regression
- Modeling/Feature detection
- Training/Validation/Calibration
- Vertical and horizontal view of data
• Outliers, rare events and minority classes
• Data Preparation
- Data cleansing
- Scaling
• Leave-one-out and leave-several-out validation
• Confusion matrix and ROC curves
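Leave-one-out and leave-several-out validation differ only in how many patterns each fold holds back. Below is a minimal Python sketch of the index bookkeeping. The function name and the shuffle-then-cut strategy are assumptions made for illustration. Analyze's own LOO mode (option 4332) may assign patterns to folds differently.

```python
import random


def leave_several_out(n_patterns, n_out, seed=None):
    """Yield (train_indices, test_indices) for each fold.

    Each fold holds out n_out patterns (the last fold may hold fewer);
    n_out = 1 gives leave-one-out. Every pattern is tested exactly once.
    """
    order = list(range(n_patterns))
    random.Random(seed).shuffle(order)
    for start in range(0, n_patterns, n_out):
        test = order[start:start + n_out]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test


loo_folds = list(leave_several_out(5, 1, seed=0))
lso_folds = list(leave_several_out(5, 2, seed=0))
print(len(loo_folds), "LOO folds;", len(lso_folds), "leave-two-out folds")
```

The model is retrained once per fold. The predictions collected for the held-out patterns together give a validation estimate for the whole data set.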
Feature_1  Feature_2  Feature_3  Feature_4  CLASS  ID
7.3        2.9        6.3        1.8        3      108
5.1        3.8        1.9        0.4        1      45
5          3.2        1.2        0.2        1      36
6.8        3.2        5.9        2.3        3      144
4.6        3.4        1.4        0.3        1      7
5          3.4        1.6        0.4        1      27
4.7        3.2        1.6        0.2        1      30
6          2.2        5          1.5        3      120
5.2        3.4        1.4        0.2        1      29
5.1        3.3        1.7        0.5        1      24
7.2        3.6        6.1        2.5        3      110
7.1        3          5.9        2.1        3      103
7.2        3.2        6          1.8        3      126
6.1        2.8        4.7        1.2        2      74
6.4        2.8        5.6        2.1        3      129
6.1        3          4.9        1.8        3      128
4.8        3          1.4        0.1        1      13
6.7        3.1        5.6        2.4        3      141
5          3          1.6        0.2        1      26
6          2.9        4.5        1.5        2      79
6.2        2.2        4.5        1.5        2      69
6.6        2.9        4.6        1.3        2      59
6.3        2.5        5          1.9        3      147
4.4        3          1.3        0.2        1      39
6.5        3          5.2        2          3      148
5.5        2.5        4          1.3        2      90
6.7        3.1        4.4        1.4        2      66
7.7        3.8        6.7        2.2        3      118
6.5        3.2        5.1        2          3      111
5.4        3.7        1.5        0.2        1      11
7.7        2.6        6.9        2.3        3      119
6.3        3.4        5.6        2.4        3      137
5.6        2.7        4.2        1.3        2      95
4.9        2.4        3.3        1          2      58
5.8        4          1.2        0.2        1      15
4.9        2.5        4.5        1.7        3      107
7.9        3.8        6.4        2          3      132
4.4        2.9        1.4        0.2        1      9
5.8        2.8        5.1        2.4        3      115
5.4        3.4        1.5        0.4        1      32
5.9        3          5.1        1.8        3      150
TERMINOLOGY
• Standard Data Mining Problem
• Header and Data
• MetaNeural Format
- descriptors and/or features
- response (or activity to predict)
- pattern ID
- data matrix
• Validation/Calibration
• Training/Validation/Test Set
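In the iris example the descriptors come first, followed by the response (CLASS), with the pattern ID in the last column. This section does not say outright that this layout holds in general; the Python sketch below assumes it does and splits a MetaNeural-style data matrix into its parts. `parse_metaneural` is a hypothetical name. Treating a non-numeric first line as the header is also an assumption.

```python
def is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_metaneural(lines):
    """Split MetaNeural-style rows into descriptors, responses and IDs.

    Assumed layout: whitespace-separated columns with the descriptors
    first, then the response, then the pattern ID. A first line whose
    first token is not numeric is treated as a header and skipped.
    """
    rows = [line.split() for line in lines if line.strip()]
    if rows and not is_number(rows[0][0]):
        rows = rows[1:]
    descriptors, responses, ids = [], [], []
    for row in rows:
        descriptors.append([float(v) for v in row[:-2]])
        responses.append(float(row[-2]))
        ids.append(row[-1])
    return descriptors, responses, ids


sample = [
    "Feature_1 Feature_2 Feature_3 Feature_4 CLASS ID",
    "7.3 2.9 6.3 1.8 3 108",
    "5.1 3.8 1.9 0.4 1 45",
]
X, y, ids = parse_metaneural(sample)
print(X, y, ids)
```

The IDs are kept as strings because they label patterns rather than serve as model inputs.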
Demo: iris_view.bat
iris (plant), common name for a family of herbaceous flowering plants. The flowers are
composed of a floral envelope (perianth) with six petal-like segments, three or six stamens, and
an ovary enclosed by the base of the perianth. About 1800 species exist, placed in more than
90 genera. The family has many horticulturally important members; most are as well known by
their scientific names as by common names, including crocuses, irises, and tiger-flowers.
Members of the family generally have long and narrow basal leaves in two ranks and a showy
perianth. In the iris genus itself the inner three segments, called standards, are erect and
narrowed at the base. The outer three are also narrowed, but usually droop and are called falls.
The beard in bearded irises consists of a group of colored hairs on the upper surface of each of
the falls.
Some 200 species of iris are divided into two groups. The first has creeping, underground
stems, or rhizomes; it includes the bearded, or German, irises and the Japanese and Siberian,
or beardless, irises. The second group has bulbs, modified underground buds with fleshy leaf
bases; it includes the Dutch, Spanish, and English varieties.
Aside from its horticultural value, the iris family is of little economic importance. Rhizomes of
several species, mainly the orris, are dried and powdered to obtain orris root, used in perfume
and other cosmetics. Saffron, used as a dye and to color and flavor food, is obtained from the
three-parted stigmas of the saffron crocus. This species has been cultivated for a very long time
and is no longer found in the wild. Its commercial importance is declining, however, because
hand labor is required for harvesting.
Scientific classification: Iris is the common name for the family Iridaceae. The orris is
classified as Iris germanica variety florentina and the saffron crocus as Crocus sativus.
Contributed by: Marshall R. Crosby [1]
[1] "Iris (plant)," Microsoft® Encarta® 97 Encyclopedia. © 1993-1996 Microsoft Corporation. All rights reserved.
UC IRVINE DATA REPOSITORY
Datafile Name: Fisher's Iris
Datafile Subjects: Agriculture, Famous datasets
Description: This is a dataset made famous by Fisher, who used it to illustrate principles of discriminant analysis. It contains 6 variables with 150 observations.
Reference: Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179-188.
Story Names: Fisher's Irises
Authorization: free use
Number of cases: 150
Variable Names:
1. Species_No: Flower species as a code
2. Species_Name: Species name
3. Petal_Width: Petal Width
4. Petal_Length: Petal Length
5. Sepal_Width: Sepal Width
6. Sepal_Length: Sepal Length

REM IRIS2.BAT: PREPARING AND EXPLORING IRIS DATA
REM PREPARE IRIS DATA (option 5)
analyze num_eg.txt 3301
REM MAKE FILE TAB SEPARATED
analyze iris.txt 100
copy iris.txt.txt iris.txt
erase *.txt.txt
REM MAKE GENERIC LABELS
analyze iris.txt 116
REM SCRAMBLE DATA (100 2)
analyze iris.txt 20
copy cmatrix.txt +dmatrix.txt iris.txt
REM MAKE CORRELATION MATRIX
analyze iris.txt 28
REM VIEW COVARIANCE PLOT
analyze cov.txt 3309
pause
REM MAKE PHARMAPLOT
REM MAHALANOBIS SCALE DATA FIRST
analyze iris.txt -3
copy iris.txt.txt iris.txt
analyze iris.txt 36
copy iris.txt.txt pharma.txt
REM JAVAPLOTS
REM VIEW MAHALANOBIS SCALED DATA
analyze iris.txt 3311
pause
REM VIEW PHARMAPLOT
analyze pharma.txt 3308
pause
exit
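The script above computes a correlation matrix, then Mahalanobis-scales the data before plotting it. One common reading of Mahalanobis scaling is whitening. Each descriptor is centred and the data are multiplied by the inverse square root of the covariance matrix. Plain Euclidean distances in the scaled data then equal Mahalanobis distances in the original data. The NumPy sketch below assumes that reading; Analyze's option -3 may implement it differently. The random test data are illustrative only.

```python
import numpy as np


def correlation_matrix(X):
    """Pearson correlation between every pair of descriptor columns."""
    return np.corrcoef(X, rowvar=False)


def mahalanobis_scale(X):
    """Centre X and whiten it with the inverse square root of its covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return Xc @ inv_sqrt


rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
X = rng.normal(size=(150, 4)) @ mixing  # correlated, 150 x 4 like iris
Z = mahalanobis_scale(X)
print(np.round(correlation_matrix(X), 2))
print(np.round(np.cov(Z, rowvar=False), 6))  # close to the identity
```

After whitening, the covariance of the scaled data is the identity matrix. This means all descriptors carry equal weight and are uncorrelated.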
• The ANALYZE code has neural network modules built in
• Run one of:
analyze root.pat 4331 (single training and testing)
analyze root.pat 4332 (leave-one-out, LOO)
analyze root.txt 4333 (bootstrap mode)
• Results from analyze are written to resultss.xxx and resultss.ttt
• Note that patterns have to be properly scaled first
• A file named meta overrides the default input file for analyze
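Bootstrap mode is not described further here. In general, bootstrap validation draws a training set the same size as the data, sampling with replacement. It then tests on the out-of-bag patterns that were never drawn. The Python function below is a hypothetical sketch of that bookkeeping, not the Analyze implementation.

```python
import random


def bootstrap_splits(n_patterns, n_rounds, seed=None):
    """Yield (train_indices, out_of_bag_indices) for each bootstrap round.

    Training indices are drawn with replacement, so some patterns repeat;
    patterns never drawn form that round's test (out-of-bag) set.
    """
    rng = random.Random(seed)
    for _ in range(n_rounds):
        train = [rng.randrange(n_patterns) for _ in range(n_patterns)]
        drawn = set(train)
        out_of_bag = [i for i in range(n_patterns) if i not in drawn]
        yield train, out_of_bag


rounds = list(bootstrap_splits(150, 10, seed=0))
for train, oob in rounds[:3]:
    print(len(set(train)), "distinct training patterns,", len(oob), "out-of-bag")
```

On average about 63% (1 - 1/e) of the patterns are drawn in each round. Roughly a third is therefore left for out-of-bag testing.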





Neural Network Module in Analyze Code
[Figure: the files ROOT, ROOT.PAT, ROOT.TES, (ROOT.WGT), (ROOT.FWT) and (ROOT.DBD) go into Analyze, which produces resultss.XXX, resultss.TTT, ROOT.TRN, (ROOT.DBD), ROOT.WGT and ROOT.FWT]
• Use "analyze root 4331" for the easy way (the file meta lets you override defaults)
MetaNeural Input File for the ROOT
Generating and Scaling Data
4 => 4 layers
2 => 2 inputs
16 => # hidden neurons in layer #1
4 => # hidden neurons in layer #2
1 => # outputs
300 => epoch length (hint: always use 1, for the entire batch)
0.01 => learning parameters by weight layer (hint: 1/# patterns or 1/# epochs)
0.01
0.01
0.5 => momentum parameters by weight layer (hint: use 0.5)
0.5
0.5
10000000 => some very large number of training epochs
200 => error display refresh rate
1 => sigmoid transfer function
1 => temperature of sigmoid
check.pat => name of file with training patterns (test patterns in root.tes)
0 => not used (legacy entry)
100 => not used (legacy entry)
0.02000 => exit training if error < 0.02
0 => initial weights from a flat random distribution
0.2 => initial random weights all fall between -0.2 and +0.2
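The learning and momentum parameters above are commonly combined in the momentum form of gradient descent, Δw(n) = -η ∂E/∂w + α Δw(n-1). A sigmoid temperature T is often applied as f(s) = 1/(1 + e^(-s/T)). Both forms are assumptions about what the Analyze code does. The small Python sketch below applies them to a one-weight toy error E(w) = (w - 3)^2, using the file's values η = 0.01 and α = 0.5.

```python
import math


def sigmoid_t(s, temperature=1.0):
    """Sigmoid with temperature T (assumed convention): 1 / (1 + exp(-s / T))."""
    return 1.0 / (1.0 + math.exp(-s / temperature))


def momentum_step(w, grad, prev_dw, eta=0.01, alpha=0.5):
    """dw(n) = -eta * dE/dw + alpha * dw(n-1); returns (new w, dw(n))."""
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw


def minimise(alpha, steps=300):
    """Minimise the toy error E(w) = (w - 3)^2, whose gradient is 2 (w - 3)."""
    w, dw = 0.0, 0.0
    for _ in range(steps):
        w, dw = momentum_step(w, 2.0 * (w - 3.0), dw, eta=0.01, alpha=alpha)
    return w


w_plain = minimise(alpha=0.0)     # plain gradient descent
w_momentum = minimise(alpha=0.5)  # momentum 0.5, as in the example file
print(f"plain: {w_plain:.4f}, with momentum: {w_momentum:.4f}")
```

With the same small learning rate, the momentum term lets past steps keep pushing in a consistent direction. Here that brings w much closer to the minimum at 3 in the same number of steps. A larger T flattens the sigmoid, so its output stays closer to 0.5 for a given input.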
Generating and Scaling Iris Data
REM GENERATE IRIS DATA (5)
analyze iris.txt 3301
REM DECAPITATE HEADER
analyze iris.txt 100
REM SCALE DATA
analyze iris.txt.txt 100
REM SCALE DATA
analyze iris.txt.txt.txt 8
REM SPLIT DATA IN TRAINING & TEST DATA (100 2)
analyze iris.txt.txt.txt.txt 20
copy cmatrix.txt a.pat
copy dmatrix.txt a.tes
REM VISUALIZE TRAINING DATA (3)
analyze a.pat 3350
pause
erase iris.txt.*
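The batch file does not spell out which scaling option 8 applies. The Python sketch below assumes z-score scaling: subtract each column's mean and divide by its standard deviation. It scales every column, the response included, because the results are descaled later, and it keeps the scaling parameters for that purpose. It then shuffles the scaled patterns and cuts them into training and test sets. This mirrors the batch file's order (scale first, then split). The function names, the z-score choice and the split sizes are hypothetical.

```python
import random
import statistics


def zscore_params(rows):
    """(mean, standard deviation) per column; a constant column gets std 1."""
    return [(statistics.fmean(col), statistics.pstdev(col) or 1.0)
            for col in zip(*rows)]


def scale(rows, params):
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


def descale(rows, params):
    """Undo scale(), e.g. to turn network outputs back into original units."""
    return [[v * s + m for v, (m, s) in zip(row, params)] for row in rows]


def shuffle_split(rows, n_train, seed=None):
    """Shuffle a copy of the rows and cut off the first n_train for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# First five patterns of the iris table (descriptors and class only).
data = [
    [7.3, 2.9, 6.3, 1.8, 3],
    [5.1, 3.8, 1.9, 0.4, 1],
    [5.0, 3.2, 1.2, 0.2, 1],
    [6.8, 3.2, 5.9, 2.3, 3],
    [4.6, 3.4, 1.4, 0.3, 1],
]
params = zscore_params(data)
scaled = scale(data, params)
train_rows, test_rows = shuffle_split(scaled, n_train=3, seed=2)
print(len(train_rows), "training and", len(test_rows), "test patterns")
```

The same `params` must be used to descale the network's predictions. Otherwise the outputs cannot be compared with the original class codes.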
Run Neural Net for Iris Data
erase *.wgt
REM TRAIN/TEST ANN
pause
analyze a.pat 4336
pause
REM DESCALE DATA
analyze resultss.xxx -4
copy results.ttt results.xxx
analyze resultss.ttt -4
REM GENERATE CONFUSION MATRIX (3)
analyze results.ttt 4242
type confusion.txt
pause
REM VISUALIZE RESULTS
analyze resultss.ttt 3313
pause
analyze results.ttt 3305
pause
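The iris network has a single output that predicts the class code. Its descaled outputs therefore have to be turned into class labels before a confusion matrix can be counted. The Python sketch below rounds each output to the nearest class code; this is an assumption about how option 4242 assigns classes. It counts actual classes in rows and predicted classes in columns. The sample outputs are made up for illustration.

```python
def to_class(output, classes):
    """Assign a continuous output to the nearest class code."""
    return min(classes, key=lambda c: abs(c - output))


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = [1, 2, 3]
actual = [1, 2, 3, 3, 2]
outputs = [1.1, 2.4, 2.6, 3.2, 2.7]  # hypothetical descaled network outputs
predicted = [to_class(o, classes) for o in outputs]
cm = confusion_matrix(actual, predicted, classes)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(actual)
for row in cm:
    print(row)
print(f"accuracy: {accuracy:.2f}")
```

Correct predictions lie on the diagonal. Off-diagonal counts show which classes the network confuses; in the iris data these are typically classes 2 and 3.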