Chapter 3: Neural Networks
Xiu-jun GONG (Ph.D.)
School of Computer Science and Technology, Tianjin University
[email protected]
http://cs.tju.edu.cn/faculties/gongxj/course/ai/

Outline
- Introduction
- Training a single TLU
- Networks of TLUs: artificial neural networks
- Pros and cons of ANNs
- Summary

Biological / artificial neural networks
[Figure: the structure of a typical biological neuron (SMI32-stained pyramidal neurons in cerebral cortex) alongside an artificial neuron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and activation function f(s); ANNs sit between artificial intelligence (recognition modeling) and neuroscience.]

Definition of ANN
An artificial neural network, also called a simulated neural network (SNN) or simply a neural network (NN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.

Applications of ANNs
- Function approximation (regression analysis), including time-series prediction and modeling.
- Classification, including pattern and sequence recognition, novelty detection, and sequential decision making.
- Data processing, including filtering, clustering, blind signal separation, and compression.
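Before extending the TLU in the next section, a minimal sketch of a threshold logic unit may help; the function name and the AND/OR weight settings are illustrative assumptions, not taken from the slides.

```python
def tlu(weights, threshold, x):
    """Threshold logic unit: output 1 iff the weighted sum of the
    inputs reaches the threshold, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= threshold else 0

# With weights (1, 1), a threshold of 1.5 realizes AND,
# and a threshold of 0.5 realizes OR.
print(tlu([1, 1], 1.5, [1, 1]))  # 1
print(tlu([1, 1], 1.5, [1, 0]))  # 0
print(tlu([1, 1], 0.5, [0, 1]))  # 1
```

The single unit can only carve the input space with one hyperplane, which is why the linearly separable functions discussed below matter.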
Extension of a TLU
A threshold logic unit (TLU) generalizes to a perceptron (neuron):
- Inputs are not limited to Boolean values.
- Outputs are not limited to binary functions.

Output functions of a perceptron
- Threshold: f(s) = 1 if s >= θ, else f(s) = 0.
- Sigmoid: f(s) = 1 / (1 + e^(-s)).

Characteristics of the sigmoid function
- Smooth, continuous, and monotonically increasing (its derivative is always positive).
- Bounded range, but it never reaches its maximum or minimum.
- The logistic function f(s) = 1 / (1 + e^(-s)) is often used; its derivative satisfies f' = f(1 - f).

Linearly separable functions and TLUs
A single TLU can realize linearly separable functions such as f(x1, x2, x3) = x1 x̄2 x3. XOR, f(x1, x2) = x1 x̄2 + x̄1 x2, is not linearly separable, so no single TLU can compute it.

A network of TLUs
[Figure: a two-layer TLU network computing XOR, f(x1, x2) = x1 x̄2 + x̄1 x2: hidden units y1 and y2 receive x1 and x2 through weights 1 and -1 (thresholds 0.5), and an output unit f combines y1 and y2 with weights 1 (threshold 0.5).] The same construction handles the even-parity function f(x1, x2) = x1 x2 + x̄1 x̄2.

Training a single neuron
- What is learning/training?
- Methods: the delta procedure, the generalized delta procedure, and the error-correction procedure.

Reformulating the perceptron
Augment the input with a constant x_{n+1} ≡ 1 so the threshold becomes the weight w_{n+1}. The summing junction computes
s = Σ_{i=1..n+1} w_i x_i = W·X,
and the activation function produces the output f = f(s), where W = (w_1, ..., w_n, w_{n+1}) and X = (x_1, ..., x_n, 1).

Gradient descent
Minimize the squared error between the desired response d and the neuron output f:
ε = (d - f)^2.
By the chain rule,
∂ε/∂W = (∂ε/∂f)(∂f/∂s)(∂s/∂W) = -2(d - f) f'(s) X,
and moving against this gradient yields the update rules below.

The delta procedure
With the linear activation f = s, the weight update is
W ← W + c (d - f) X,
the delta rule (Widrow-Hoff rule).

The generalized delta procedure
With the sigmoid f(s) = 1 / (1 + e^(-s)), the update is
W ← W + c (d - f) f (1 - f) X.
Since f(1 - f) → 0 as f → 0 or f → 1, appreciable weight change occurs only within a 'fuzzy' region surrounding the separating hyperplane, near f = 0.5.

The error-correction procedure
With a threshold activation (output 0 or 1), the rule W ← W + c (d - f) X reduces to W ← W ± c X. In the linearly separable case, W converges to a solution after finitely many iterations; in the non-linearly-separable case, W never converges.
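The generalized delta procedure above can be sketched as follows; the training set (logical AND), learning rate c, epoch count, and seed are illustrative assumptions, not values from the slides.

```python
import math
import random

def sigmoid(s):
    # Logistic function f(s) = 1 / (1 + e^-s); note f' = f (1 - f)
    return 1.0 / (1.0 + math.exp(-s))

def train(samples, c=0.5, epochs=5000, seed=0):
    """Generalized delta procedure: W <- W + c (d - f) f (1 - f) X,
    with a constant input x_{n+1} = 1 standing in for the threshold."""
    rng = random.Random(seed)
    n = len(samples[0][0]) + 1                      # weights incl. bias
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(epochs):
        for x, d in samples:
            xb = list(x) + [1.0]                    # augmented input X
            f = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            w = [wi + c * (d - f) * f * (1 - f) * xi
                 for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))

# Logical AND is linearly separable, so a single neuron suffices.
w = train([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)])
```

The f(1 - f) factor in the update means the weights move appreciably only when the output is near 0.5, matching the 'fuzzy region' remark above.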
An example
[Figure: an example network with inputs x1 = S2 + S3, x2 = S4 + S5, x3 = S6 + S7, x4 = S8 + S9 and a constant input 1, connected through weights W11, W21, W31, W41, W51 to a unit that signals "east".]

ANN topologies
- Feedforward: signals flow from inputs to outputs only.
- Recurrent: outputs are fed back into the network, e.g. through a context layer.

Training neural networks
- Supervised: the network is trained by matching input and output patterns; the input-output pairs can be provided by an external teacher or by the system itself.
- Unsupervised (self-organization): an (output) unit is trained to respond to clusters of patterns within the input; there is no a priori set of categories.
- Reinforcement learning: an intermediate form of the two above. The learning machine performs an action on the environment and receives a feedback response; it grades its action as good (rewarding) or bad (punishable) based on that response and adjusts its parameters accordingly.

Back-propagation: notation
- p: the pth of n training patterns.
- Layers are numbered j = 0, 1, ..., M; N_j is the number of neurons in layer j (N_0 inputs, N_M outputs).
- x_{p1}, ..., x_{pN_0}: the input vector; T_{p1}, ..., T_{pN_M}: the target vector; O_{p1}, ..., O_{pN_M}: the network outputs.
- Y_{ji}: the output of the ith neuron in layer j.
- δ_{ji}: the error value associated with the ith neuron in layer j.
- W_{jik}: the connection weight from the kth neuron in layer j-1 to the ith neuron in layer j.

Back-propagation: the method
1. Initialize the connection weights to small random values.
2. Present the pth sample input vector X_p = (x_{p1}, x_{p2}, ..., x_{pN_0}) and the corresponding target T_p = (T_{p1}, T_{p2}, ..., T_{pN_M}) to the network.
3. Pass the input values to the first layer: for every input node i in layer 0, Y_{0i} = x_{pi}.
4. For every neuron i in every layer j = 1, 2, ..., M, from the input layer to the output layer, compute the output
   Y_{ji} = f( Σ_{k=1..N_{j-1}} Y_{(j-1)k} W_{jik} ).
5. Obtain the output values: for every output node i in layer M, O_{pi} = Y_{Mi}.
6. Calculate the error value δ_{ji} for every neuron i in every layer, in backward order j = M, M-1, ..., 2, 1:
6.1 For the output layer, the error value is
    δ_{Mi} = Y_{Mi} (1 - Y_{Mi}) (T_{pi} - Y_{Mi}).
6.2 For a hidden layer, the error value is
    δ_{ji} = Y_{ji} (1 - Y_{ji}) Σ_{k=1..N_{j+1}} δ_{(j+1)k} W_{(j+1)ki}.
6.3 Adjust the weight of every connection from neuron k in layer j-1 to neuron i in layer j, with learning rate η:
    W_{jik} ← W_{jik} + η δ_{ji} Y_{(j-1)k}.

Steps 2 through 6 are repeated for every training sample pattern p, and the whole training set is repeated until the root mean square (RMS) of the output errors is minimized, where the error on pattern p is
E_p = Σ_{j=1..N_M} (T_{pj} - O_{pj})^2.

Generalization vs. specialization
- The number of hidden neurons must be chosen carefully: too many hidden neurons and the network overfits (the training set is memorized, making the network useless on new data sets); too few and the network is unable to learn the problem concept.
- Overtraining: trained too long on the same examples, the ANN memorizes the examples instead of the general idea.
- There is a generalization vs. specialization trade-off; k-fold cross-validation is often used to manage it.

Unsupervised learning
- No help from the outside: no training data and no information available about the desired output.
- Learning by doing; used to pick out structure in the input: clustering, and reduction of dimensionality (compression).
- Kohonen's learning law (the self-organizing map, SOM): winner takes all, i.e. only the weights of the winning neuron are updated. Example: the Kohonen network.

Reinforcement learning
- Instead of supplying training data, the teacher scores the performance on the training examples.
- The performance score is used to shuffle the weights 'randomly', so learning is relatively slow due to this randomness.

Anatomy of ANN learning algorithms
[Diagram: ANN learning splits into supervised (logic inputs: Hopfield; continuous inputs: back-propagation), unsupervised (logic inputs: ART; continuous inputs: SOM, Hebb), and reinforcement learning.]

Pros and cons of ANNs
Pros:
- A neural network can perform tasks that a linear program cannot.
- When an element of the network fails, the network can continue without any problem, thanks to its parallel nature.
- A neural network learns and does not need to be reprogrammed.
- It can be implemented in any application.
Cons:
- A neural network needs training to operate.
- The architecture of a neural network differs from that of a microprocessor, so it must be emulated.
- Large neural networks require high processing time.

Summary
- The representational capabilities of ANNs.
- Training a single perceptron.
- Training neural networks.
- The generalization vs. specialization trade-off should be kept in mind.
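As a closing illustration, the back-propagation method of steps 1 through 6 can be sketched end to end on XOR. The 2-2-1 topology, bias weights, learning rate, epoch count, and seed are illustrative assumptions (the slides do not fix them), and plain back-propagation on XOR can occasionally stall in a local minimum.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_xor(c=0.5, epochs=20000, seed=1):
    """Back-propagation for a 2-2-1 network learning XOR.
    w[j][i][k] is the weight from neuron k in layer j-1 (the last k
    indexes a constant bias input of 1) to neuron i in layer j."""
    rng = random.Random(seed)
    w = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
         [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(1)]]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # Steps 3-5: forward pass, layer by layer
            y0 = list(x) + [1.0]
            y1 = [sigmoid(sum(wi * yi for wi, yi in zip(row, y0)))
                  for row in w[0]] + [1.0]
            y2 = [sigmoid(sum(wi * yi for wi, yi in zip(row, y1)))
                  for row in w[1]]
            # Step 6.1: output-layer error values
            d2 = [o * (1 - o) * (t - o) for o in y2]
            # Step 6.2: hidden-layer error values
            d1 = [y1[i] * (1 - y1[i])
                  * sum(d2[k] * w[1][k][i] for k in range(len(d2)))
                  for i in range(2)]
            # Step 6.3: W_jik += c * delta_ji * Y_(j-1)k
            for i in range(len(w[1])):
                for k in range(3):
                    w[1][i][k] += c * d2[i] * y1[k]
            for i in range(len(w[0])):
                for k in range(3):
                    w[0][i][k] += c * d1[i] * y0[k]
    return w

def predict(w, x):
    y0 = list(x) + [1.0]
    y1 = [sigmoid(sum(wi * yi for wi, yi in zip(row, y0)))
          for row in w[0]] + [1.0]
    return sigmoid(sum(wi * yi for wi, yi in zip(w[1][0], y1)))

w = train_xor()
```

Because XOR is not linearly separable, the hidden layer is essential here: the single-neuron procedures from earlier sections cannot learn it.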