Stat 6601 Project:
Neural Networks
(V&R 6.3)
Group Members:
Xu Yang
Haiou Wang
Jing Wu
5/24/2017
Definition

Neural Network: a broad class of models that mimic the functioning of the human brain.

There are various classes of NN models. They differ from one another depending on:
(1) Problem type: prediction, classification, clustering
(2) Structure of the model
(3) Model-building algorithm

We will focus on the feed-forward neural network.
A bit of biology . . .

The most important functional unit in the human brain is a class of cells called NEURONS.

[Figure: a biological neuron (dendrites, cell body, axon, synapse) alongside a neural network]

• Dendrites – receive information
• Cell body – processes information
• Axon – carries processed information to other neurons
• Synapse – junction between an axon end and the dendrites of other neurons
An Artificial Neuron

[Figure: inputs X1, X2, ..., Xp enter the cell body through connections with weights w1, w2, ..., wp; the axon carries the output, showing the direction of flow of information]

$$ I = w_1 X_1 + w_2 X_2 + \cdots + w_p X_p, \qquad V = f(I) $$

• Receives inputs X1, X2, ..., Xp from other neurons or the environment
• Inputs are fed in through connections with 'weights'
• Total input = weighted sum of inputs from all sources
• A transfer function (activation function) converts the input to the output
• The output goes to other neurons or the environment
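To make the weighted-sum-then-activation idea concrete, here is a minimal sketch of one artificial neuron in R; the function name and the logistic choice of f are illustrative, not from the slides.

Code:

# One artificial neuron: the total input I is the weighted sum of the inputs,
# and the output is V = f(I); f defaults to the logistic function here.
neuron <- function(x, w, f = function(I) 1 / (1 + exp(-I))) {
  I <- sum(w * x)   # I = w1*X1 + w2*X2 + ... + wp*Xp
  f(I)              # V = f(I)
}
neuron(x = c(0.5, -1, 2), w = c(0.1, 0.4, 0.3))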
Simplest but most common form (one hidden layer)

$$ y_k = \phi_o\Big( \alpha_k + \sum_h w_{hk}\, \phi_h\big( \alpha_h + \sum_i w_{ih} x_i \big) \Big) $$
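This formula can be evaluated directly. A minimal sketch in R, assuming a hidden-weight matrix W_in (one row per hidden unit), output weights W_out, and bias vectors a_h and a_o; all names here are illustrative.

Code:

feedforward <- function(x, W_in, a_h, W_out, a_o,
                        phi_h = function(z) 1 / (1 + exp(-z)),  # logistic hidden units
                        phi_o = identity) {                     # linear output units
  h <- phi_h(a_h + W_in %*% x)   # alpha_h + sum_i w_ih x_i, then phi_h
  phi_o(a_o + W_out %*% h)       # alpha_k + sum_h w_hk h_h, then phi_o
}
# Example: 2 inputs, 3 hidden units, 1 output
feedforward(x = c(1, 0.5), W_in = matrix(rnorm(6), 3, 2), a_h = rnorm(3),
            W_out = matrix(rnorm(3), 1, 3), a_o = 0)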
Choice of activation function

[Figure: logistic, tanh, and threshold activation curves]

Logistic: $$ f(x) = \frac{e^x}{1 + e^x} $$

Tanh (hyperbolic tangent): $$ f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

Threshold: $$ f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases} $$
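Written out as R functions for comparison (a sketch; the second is just base R's tanh):

Code:

logistic  <- function(x) exp(x) / (1 + exp(x))
hyptan    <- function(x) (exp(x) - exp(-x)) / (exp(x) + exp(-x))  # equals tanh(x)
threshold <- function(x) ifelse(x < 0, 0, 1)
curve(logistic, -4, 4, ylim = c(-1, 1))    # S-shaped, range (0, 1)
curve(hyptan, -4, 4, add = TRUE, lty = 2)  # S-shaped, range (-1, 1)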
A collection of neurons forms a layer

Input layer – each neuron gets ONLY one input, directly from outside
Hidden layer(s) – connect the input and output layers
Output layer – the output of each neuron goes directly to outside

[Figure: network diagram with inputs x1, ..., x4, connection weights wij, hidden layer(s), and outputs]
More general format

Skip-layer connections:

$$ y_k = \phi_o\Big( \alpha_k + \sum_{i \to k} w_{ik} x_i + \sum_{j \to k} w_{jk}\, \phi_h\big( \alpha_j + \sum_{i \to j} w_{ij} x_i \big) \Big) $$

[Figure: the same network diagram with direct input-to-output (skip-layer) connections wij added]
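Extending the earlier forward-pass sketch with a hypothetical skip-connection weight matrix W_skip, which links the inputs directly to the outputs:

Code:

feedforward_skip <- function(x, W_in, a_h, W_out, a_o, W_skip,
                             phi_h = function(z) 1 / (1 + exp(-z)),
                             phi_o = identity) {
  h <- phi_h(a_h + W_in %*% x)              # hidden layer, as before
  phi_o(a_o + W_skip %*% x + W_out %*% h)   # extra sum_{i->k} w_ik x_i term
}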
Fitting criteria

Least squares:

$$ E = \sum_p \| t^p - y^p \|^2 $$

Maximum likelihood (entropy):

$$ E = \sum_p \sum_k \left[ t_k^p \log \frac{t_k^p}{y_k^p} + (1 - t_k^p) \log \frac{1 - t_k^p}{1 - y_k^p} \right] $$

Log likelihood (softmax):

$$ E = -\sum_p \sum_k t_k^p \log p_k^p, \qquad p_k^p = \frac{e^{y_k^p}}{\sum_{c=1}^{K} e^{y_c^p}} $$

One way to ensure f is smooth: minimize E + λC(f).
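The criteria translate directly into R. A sketch, assuming t and y are numeric vectors of targets and outputs, and using the convention 0 log 0 = 0 in the entropy form:

Code:

E_ls <- function(t, y) sum((t - y)^2)                      # least squares
xlogx <- function(a, b) ifelse(a == 0, 0, a * log(a / b))  # convention: 0 log 0 = 0
E_entropy <- function(t, y) sum(xlogx(t, y) + xlogx(1 - t, 1 - y))
softmax <- function(y) exp(y) / sum(exp(y))                # p_k = e^{y_k} / sum_c e^{y_c}
E_loglik <- function(t, y) -sum(t * log(softmax(y)))       # log likelihood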
Usage of nnet in R

nnet.formula(formula, data=NULL, weights, ..., subset, na.action=na.fail, contrasts=NULL)

formula: a formula of the form 'class ~ x1 + x2 + ...'
weights: (case) weights for each example; if missing, defaults to 1.
size: number of units in the hidden layer. Can be zero if there are skip-layer units.
Wts: initial parameter vector. If missing, chosen at random.
rang: if Wts is missing, use random weights from runif(n, -rang, rang).
linout: switch for linear output units. Default: logistic output units.
entropy: switch for entropy (= maximum conditional likelihood) fitting. Default: least squares.
softmax: switch for softmax (log-linear model) and maximum conditional likelihood fitting.
skip: logical for links from inputs to outputs (skip-layer connections).
decay: the weight-decay parameter λ.
maxit: maximum number of iterations for the optimizer.
Hess: should the Hessian matrix at the solution be returned?
trace: logical for output from the optimizer.
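A minimal classification call exercising several of these arguments, on R's standard iris data (the settings are illustrative; for a factor response with more than two levels, the formula method fits a softmax network automatically):

Code:

library(nnet)
ir.nn <- nnet(Species ~ ., data = iris, size = 2, rang = 0.1,
              decay = 5e-4, maxit = 200, trace = FALSE)
table(iris$Species, predict(ir.nn, iris, type = "class"))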
An Example
Code:

library(MASS)
library(nnet)
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock1 <- data.frame(perm, area = area1, peri = peri1, shape)
rock.nn <- nnet(log(perm) ~ area + peri + shape, rock1, size = 3,
                decay = 1e-3, linout = T, skip = T, maxit = 1000, Hess = T)
summary(rock.nn)
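Since Hess = T is set, the fitted object carries the Hessian of the fit criterion at the solution; a brief follow-up check (all eigenvalues positive indicates a local minimum):

Code:

eigen(rock.nn$Hessian, only.values = TRUE)$values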
Output
# weights: 19
initial value 1712.850737
iter 10 value 34.726352
iter 20 value 32.725356
iter 30 value 30.677100
iter 40 value 29.430856
………………………………….
iter 140 value 13.658571
iter 150 value 13.248229
iter 160 value 12.941181
iter 170 value 12.913059
iter 180 value 12.904267
iter 190 value 12.901672
iter 200 value 12.900292
iter 210 value 12.899496
final value 12.899400
converged
> summary(rock.nn)
a 3-3-1 network with 19 weights
options were - skip-layer connections linear output units
decay=0.001
b->h1 i1->h1 i2->h1 i3->h1
9.48 -7.39 -14.60 6.94
b->h2 i1->h2 i2->h2 i3->h2
1.92 -11.87 -2.88 7.36
b->h3 i1->h3 i2->h3 i3->h3
-0.03 -11.12 15.61 4.62
b->o h1->o h2->o h3->o i1->o i2->o i3->o
2.64 3.89 11.90 -17.76 -0.06 4.73 -0.38
> sum((log(perm) - predict(rock.nn))^2)
[1] 11.39573
(The optimizer's final value of 12.8994 is the penalized criterion E + λΣw²; the unpenalized residual sum of squares is the smaller 11.39573.)
Use the same method from the previous section to view the fitted surface

Code:

library(lattice)   # provides trellis.device() and wireframe() in R
Xp <- expand.grid(area = seq(0.1, 1.2, 0.05),
                  peri = seq(0, 0.5, 0.02), shape = 0.2)
trellis.device()
rock.grid <- cbind(Xp, fit = predict(rock.nn, Xp))
## S: Trellis 3D Plot
wireframe(fit ~ area + peri, rock.grid, screen = list(z = 160, x = -60),
          aspect = c(1, 0.5), drape = T)
Output

[Figure: wireframe plot of the fitted surface, fit against area and peri]
Experiment to show the key factor that affects the degree of fit

attach(cpus)
cpus3 <- data.frame(syct = syct - 2, mmin = mmin - 3, mmax = mmax - 4,
                    cach = cach/256, chmin = chmin/100, chmax = chmax/100,
                    perf = perf)
detach()
test.cpus <- function(fit)
    sqrt(sum((log10(cpus3$perf) - predict(fit, cpus3))^2)/109)
cpus.nn1 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 0)
test.cpus(cpus.nn1)
[1] 0.271962
cpus.nn2 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 4,
                 decay = 0.01, maxit = 1000)
test.cpus(cpus.nn2)
[1] 0.2130121
cpus.nn3 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 10,
                 decay = 0.01, maxit = 1000)
test.cpus(cpus.nn3)
[1] 0.1960365
cpus.nn4 <- nnet(log10(perf) ~ ., cpus3, linout = T, skip = T, size = 25,
                 decay = 0.01, maxit = 1000)
test.cpus(cpus.nn4)
[1] 0.1675305

The fitted error decreases steadily as size grows from 0 to 25: the number of hidden units is the key factor affecting the degree of fit here.