The Essence of Computational Intelligence
(Basic Concepts of Computational Intelligence)

Offered by 蘇順豐 Shun-Feng Su, E-mail: [email protected]
Department of Electrical Engineering, National Taiwan University of Science and Technology
March, 2009. ®Copyright of Shun-Feng Su

Preface

People have always dreamed of having machines that can act like humans. Artificial intelligence studies the components that can make such a dream possible. Because of the nature of knowledge, traditional artificial intelligence uses symbols to construct the conceptual world.

Symbolic artificial intelligence is very difficult to manipulate for real-world problems, especially when implementing common-sense knowledge. Recently, computational intelligence (CI) has become widely used and has demonstrated good performance in various applications. The name CI distinguishes it from traditional symbolic artificial intelligence: its numerical knowledge representation makes it easy to manipulate.

The following three methodologies are often considered CI: fuzzy systems, neural networks, and genetic algorithms (also referred to as evolutionary computation). This talk provides the fundamental concepts and ideas behind these often-mentioned techniques.

Basics for CI

CI is known to have the following characteristics [1]:
- Numerical knowledge representation
- Adaptability
- Fault tolerance
- Fast processing speed
- Error rate optimality

[1] J. C. Bezdek, "What is computational intelligence?" in Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds. New York: IEEE Press, 1994, pp. 1-12.

Possible advantages of using CI are:
- Efficiency
- Robustness
- Good generalization capability, that is, a fair chance of behaving as required for any input data
- Ease of use
- Ease of incorporating problem-domain heuristics
- Superior performance in various applications

Possible problems encountered while using CI are:
- The learned knowledge is incomprehensible
- Lack of theoretical analysis tools, such as those for stability and performance guarantees
- Various subjective parameters are required
- Lack of benchmarks for performance evaluation

These may be disadvantages, but sometimes they provide good means for applications.

Outline
- Introductions
- Fuzzy Systems
  - Uncertainty and Its Representation
  - Fuzzy Operations and Uncertainty Reasoning
  - Fuzzy Logic Control
- Neural Networks
- Genetic Algorithms
- Epilogue

Introduction of Fuzzy Systems

Fuzzy systems have been widely used in various applications. The fundamental idea behind fuzzy systems is to include uncertainty in the process. Such inclusion provides extra information so that the system can be more accurate. In other words, "fuzzy" means vague, but a fuzzy system can be accurate because of this extra information.

Uncertainties in Intelligent Systems

Uncertainties exist for the following reasons:
- Noise always exists in the environment.
- Whether facts are true or events occur may not be certain.
- Stored knowledge is incomplete or liable to change.
- Exceptions are inevitable for any realistic knowledge.
- Simplifications are necessary to reduce the complexity of the system.
- Partitioning continuous variables for rule-based knowledge leads to the fuzzy set concept.

Traditional systems always use nominal values to reason and to make decisions. However, using more information can lead to more accurate decisions. Thus, to act intelligently, a system cannot ignore these uncertainties in its computation. To incorporate uncertainties into the decision-making process, the system must be capable of representing uncertainty and must also be equipped with the capability of approximate reasoning.

Fuzzy Sets As A Representation for Uncertainty

Traditional sets are called classical sets or crisp sets. In a crisp set, membership is crisp and can be described by a simple yes/no answer: an element is either in the set or not in the set. The membership function of a crisp set A is defined as

\[ \mu_A(x) = \begin{cases} 1, & \text{when } x \in A, \\ 0, & \text{when } x \notin A. \end{cases} \]

The range of the membership function $\mu_A$ of a fuzzy set A is the interval [0, 1] instead of only the binary values {0, 1}.

Example: Let a fuzzy set A represent the concept "real numbers that are close to 5", with the membership function

\[ \mu_A(x) = \frac{1}{1 + 10(x - 5)^2}. \]

For example, for two fuzzy sets M and F defined over the speed x:
- When x = 62 mph, M(x) = 0.4667 and F(x) = 0.5333.
- When x = 63 mph, M(x) = 0.5333 and F(x) = 0.4667.
- When x = 69 mph, M(x) = 0.0667 and F(x) = 0.9333.
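
The difference between crisp and fuzzy membership can be sketched in a few lines of Python. The function names `crisp_membership` and `close_to_five` are hypothetical and chosen only for this illustration; the fuzzy membership function is the "close to 5" example given above.

```python
def crisp_membership(x, members):
    """Crisp set: an element is either in the set (1) or not (0)."""
    return 1 if x in members else 0


def close_to_five(x):
    """Fuzzy set "real numbers close to 5": degree in [0, 1]."""
    return 1.0 / (1.0 + 10.0 * (x - 5.0) ** 2)


if __name__ == "__main__":
    for x in (4.0, 4.5, 5.0, 5.5, 6.0):
        print(f"x = {x:3.1f}  degree = {close_to_five(x):.4f}")
```

The degree is 1 exactly at x = 5 and falls off symmetrically on both sides, so the set has no sharp boundary.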

Uncertainty Representations

Two often-used uncertainty representations are fuzzy sets and probability. From the viewpoint of the uncertainty concept itself, they capture two different types of uncertainty:
- A fuzzy set captures the idea of vagueness: it indicates the degree of uncertainty about what something is. What is rain? What is fast?
- Probability captures the idea of ambiguity: it indicates uncertainty about whether something is there. Will it rain? What will the outcome of a die be?

Fuzzy vs. Probability

From the mathematical representation viewpoint, the two are comparable but possess different reasoning behaviors.
- Reasoning with probabilities is mathematically sound but difficult to manipulate because it lacks modularity.
- Reasoning with fuzzy sets does not provide mathematically sound inference and is subjective, but it is easy to manipulate.

Other types of uncertainty can also be found in the literature.

Outline
- Introductions
- Fuzzy Systems
  - Uncertainty and Its Representation
  - Fuzzy Operations and Uncertainty Reasoning
  - Fuzzy Logic Control
- Neural Networks
- Genetic Algorithms
- Epilogue

Operations of Fuzzy Sets: Extension Principle

Given a function $f: U \to V$, suppose the input is a fuzzy set A in U. What will the output be? The extension principle states that the fuzzy degree of A becomes the fuzzy degree of $y = f(A)$. The concept is to pass the membership degree of x to f(x); that is, the function itself is crisp and does not introduce any uncertainty. Thus, the membership degree of x carries over unchanged to f(x).

Two problems arise:
- f(x) is a many-to-one function; i.e., $f(x_1) = f(x_2)$ but $x_1 \neq x_2$. Then the membership degree can be $\mu(x_1)$ or $\mu(x_2)$. In other words, the resultant membership degree is $\mu(x_1) \vee \mu(x_2)$.
- The input domain consists of multiple variables.
  Then $f(x_1, x_2, \ldots, x_n)$ is obtained only when all of $x_1, x_2, \ldots, x_n$ appear. In other words, its membership degree is $\mu(x_1) \wedge \mu(x_2) \wedge \cdots \wedge \mu(x_n)$.

The extension principle allows crisp mathematical concepts to be generalized to the fuzzy set framework, and extends point-to-point mappings to mappings between fuzzy sets. It provides a means for any function f that maps an n-tuple $(x_1, x_2, \ldots, x_n)$ in the crisp set U to a point in the crisp set V to be generalized to a mapping from n fuzzy subsets in U to a fuzzy subset in V. Any mathematical relationship between non-fuzzy elements can thus be extended to deal with fuzzy entities.

Classic Logic Reasoning

Logic reasoning finds other true propositions (facts) from given true propositions (knowledge and/or facts). The scenario can be interpreted as follows: there is a knowledge base containing facts or rules. A new piece of information, or a description of the current situation, is then specified. We want to find out what the system can conclude or which action should be taken under the current circumstances.

The traditional reasoning is called modus ponens: $(A \wedge (A \to B)) \to B$. That is, one piece of knowledge $A \to B$ and a fact A result in the fact B.

Approximate Reasoning for Fuzzy Sets

The most used inference rule is $(A_1 \wedge (A_2 \to B)) \to B$. In classic logic, either $A_1 = A_2$ or $A_1 \neq A_2$. Therefore, with the match-and-fire property, B is either concluded or not concluded. With fuzzy sets, however, $A_1$ or $A_2$, or both, may be fuzzy sets. What can the reasoning process then conclude?

Example: (speed = 95 km/hr) $\wedge$ (speed is too fast $\to$ pull back the throttle). Should the throttle be pulled back?

In the most common cases, $A_2$ is a fuzzy set and $A_1$ is a fuzzy singleton (a crisp value). Note that B can be a crisp value or a fuzzy set. However, the rule $A_2 \to B$ itself is hardly ever fuzzy.
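
The extension principle described earlier can be illustrated with discrete fuzzy sets. The sketch below is only an illustration under stated assumptions: fuzzy sets are represented as Python dictionaries mapping elements to membership degrees, and the helper name `extend` is hypothetical. It uses max where several inputs map to the same output (the many-to-one case) and min across the arguments of a multi-variable function.

```python
from itertools import product


def extend(f, *fuzzy_sets):
    """Apply a crisp function f to discrete fuzzy sets (dict: element -> degree)."""
    result = {}
    for combo in product(*(fs.items() for fs in fuzzy_sets)):
        xs = [x for x, _ in combo]
        # All arguments must appear together: AND, taken as min.
        degree = min(mu for _, mu in combo)
        y = f(*xs)
        # Several inputs may give the same output: OR, taken as max.
        result[y] = max(result.get(y, 0.0), degree)
    return result


if __name__ == "__main__":
    A = {-1: 0.5, 0: 1.0, 1: 0.8, 2: 0.3}
    print(extend(lambda x: x * x, A))          # many-to-one function

    B = {1: 0.6, 2: 1.0}
    C = {1: 0.7, 2: 0.4}
    print(extend(lambda x, y: x + y, B, C))    # two input variables
```

In the first call, both -1 and 1 map to 1, so the output degree of 1 is the larger of 0.5 and 0.8.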

Approximate Reasoning for Fuzzy Sets

The most used reasoning format is a form of categorical reasoning called the compositional rule of inference, or generalized modus ponens:

(X is A) and (IF (X is B) THEN (Y is C)) results in Y is $A \circ R$,

where X and Y are fuzzy variables and A, B, and C are fuzzy labels (sets). Note that the result $A \circ R$ for Y is a fuzzy set. Usually, its membership function is computed as

\[ \mu_{A \circ R}(v) = \max_u \min\bigl(\mu_A(u),\; t(\mu_B(u), \mu_C(v))\bigr). \]

This result can be viewed as an application of the extension principle. To find whether v is in Y, the formula selects among the various u (OR operation, max) by combining the existence of x = u in A with (AND operation, min) the relation between x = u and y = v.

Note that from the logic viewpoint, the implication $p \to q$ is equivalent to $\neg p \vee q$. However, this equivalence only states when the implication $p \to q$ itself is true. In reasoning, the implication is assumed to be true, and the question is whether the current situation (x = u and y = v) matches the rule IF (X is B) THEN (Y is C). Therefore, the most commonly used relation is the t-norm of $\mu_B(u)$ and $\mu_C(v)$.

Example:
R1: IF (X is A1) and (Y is B1) THEN (Z is C1).
R2: IF (X is A2) and (Y is B2) THEN (Z is C2).
For the input (x0, y0), the reasoning can be shown graphically.
[Figure: graphical reasoning with R1 and R2 for the input (x0, y0)]

Outline
- Introductions
- Fuzzy Systems
  - Uncertainty and Its Representation
  - Fuzzy Operations and Uncertainty Reasoning
  - Fuzzy Logic Control
- Neural Networks
- Genetic Algorithms
- Epilogue

Fuzzy Logic Control

A Fuzzy Logic Controller (FLC) is a controller described by a collection of fuzzy rules (e.g.
IF-THEN rules) involving linguistic variables. The original idea of fuzzy control is to incorporate human "expert experience" into the design of controllers. The use of linguistic variables, fuzzy control rules, and approximate reasoning provides a means to incorporate human expert experience when designing the controller.

Rationale behind Fuzzy Logic Control

In an FLC, the rule structure provides the adaptation among strategies, and the fuzzy mechanism provides the interpolating capability among rules. With this interpolating capability, the transition between rules is gradual rather than abrupt. This is the so-called softening process.

In recent developments, however, fuzzy control is used because it consists of multiple strategies (rules or controllers) for different situations. Naturally, it can achieve better control performance than a single complicated controller.

Basic Structure of Fuzzy Logic Control

A typical architecture of an FLC consists of four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier.

Fuzzy Logic Control

• Knowledge is usually in a rule structure, and rule structures need a partition.
• Fuzzy control uses a fuzzy partition.

To use fuzzy rules, the input values must be transformed into fuzzy labels, which is done with the fuzzy partition. The consequences of all matched rules must then be transformed into actions.

A fuzzy logic controller is also referred to as a fuzzy system.
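
Transforming a crisp input into fuzzy labels with a fuzzy partition can be sketched as follows. This is a minimal illustration, not the author's design: the triangular membership shape, the three labels, and their breakpoints in `PARTITION` are all assumptions made for the example, and the sketch only covers speeds between 20 and 100 km/hr.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with its maximum (1.0) at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy partition of speed (km/hr): label -> (left, peak, right).
PARTITION = {
    "slow": (20.0, 40.0, 60.0),
    "medium": (40.0, 60.0, 80.0),
    "fast": (60.0, 80.0, 100.0),
}


def fuzzify(x):
    """Return the matching degree of the crisp value x with every label."""
    return {label: triangle(x, *abc) for label, abc in PARTITION.items()}


if __name__ == "__main__":
    print(fuzzify(65.0))
```

A speed of 65 matches "medium" and "fast" at the same time, to different degrees. This overlap is what makes the transition between rules gradual rather than abrupt.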

Basic Structure of Fuzzy Logic Control

- The fuzzifier transforms crisp measured data (e.g., speed = 100 km/hr) into suitable linguistic labels (e.g., speed is too fast).
- The fuzzy rule base stores, in rule form, the knowledge about how to control the system (e.g., IF "speed is too low" THEN "increase the throttle setting").
- The inference engine infers the desired control strategy from the rules by performing approximate reasoning based on the current states.
- The defuzzifier yields a non-fuzzy action or decision from the control strategy (a fuzzy set) inferred by the inference engine.

Fuzzy Systems

Mamdani fuzzy rules: IF (X is A) and (Y is B) ... THEN (Z is C). Note that C is a fuzzy set.

TSK (in modeling) or TS (in control) fuzzy rules: IF (X is A) and (Y is B) ... THEN Z = f(X, Y). Here f() is a crisp function.

The approximate reasoning for the output of a fuzzy rule is obtained from the extension principle as

\[ \mu_{A \circ R}(v) = \max_u \min\bigl(\mu_A(u),\; t(\mu_B(u), \mu_C(v))\bigr). \]

Mamdani fuzzy rules use center-of-area (COA) defuzzification. Finding the center of the area requires numerical integration.

For TS fuzzy rules, the defuzzification is sometimes also called COA, but it needs no numerical integration. The output is obtained as

\[ z = \frac{\sum_{i=1}^{m} \alpha_i f_i}{\sum_{i=1}^{m} \alpha_i}, \]

where $\alpha_i$ and $f_i$ are the firing strength and the fired result of the i-th rule, and m is the number of rules. This form is simple and easy to calculate. Most importantly, it can be used in any mathematical operation, such as differentiation.
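
The weighted-average output of TS fuzzy rules can be sketched directly from the formula above. The two rules in `RULES`, their membership functions `small` and `large`, and the linear consequents are hypothetical values chosen only to make the example runnable.

```python
def small(x):
    """Hypothetical membership of "x is small" on [0, 10]."""
    return max(0.0, min(1.0, (10.0 - x) / 10.0))


def large(x):
    """Hypothetical membership of "x is large" on [0, 10]."""
    return max(0.0, min(1.0, x / 10.0))


# Each rule: (premise membership function, crisp consequent function f_i).
RULES = [
    (small, lambda x: 2.0 * x + 1.0),   # IF x is small THEN z = 2x + 1
    (large, lambda x: 0.5 * x),         # IF x is large THEN z = 0.5x
]


def ts_output(x, rules):
    """Weighted average of the fired results; no numerical integration needed."""
    numerator = 0.0
    denominator = 0.0
    for membership, consequent in rules:
        alpha = membership(x)             # firing strength of this rule
        numerator += alpha * consequent(x)
        denominator += alpha
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


if __name__ == "__main__":
    print(ts_output(4.0, RULES))
```

At x = 4 the firing strengths are 0.6 and 0.4 and the fired results are 9 and 2, so the output is about 6.2.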

Fuzzy Systems

Thus, in recent developments, most approaches consider TS (or TSK) fuzzy models. TS fuzzy models have another advantage in applications: the output of a TS fuzzy model can be more sensitive to changes in the inputs. This can eliminate the chattering effects in the final control stage that occur with traditional fuzzy models (Mamdani fuzzy rules).

Fuzzy System

A fuzzy approximator is constructed from a set of fuzzy rules of the form

\[ R^l: \text{IF } x_1 \text{ is } A_1^l \text{ and } \cdots \text{ and } x_n \text{ is } A_n^l \text{ THEN } y \text{ is } \theta_l, \quad l = 1, 2, \ldots, M. \]

Generally, $\theta_l$ is a fuzzy singleton. In the literature, this model, which is commonly used in control, is described both as a Mamdani fuzzy model (with singleton fuzzy sets) and as a TS fuzzy model (with a crisp function). To me, it is a TS fuzzy model, because no numerical integration is needed and no membership functions are used in the consequents.

The fuzzy system with center-of-area-like defuzzification and product inference can be obtained as

\[ y = f(x) = \frac{\sum_{l=1}^{M} \theta_l \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)}{\sum_{l=1}^{M} \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)}, \]

where the product is the t-norm operation over all premise parts. It is a universal function approximator and is written as

\[ y = f(x \mid \theta) = \theta^T \omega. \]

Note that $\omega$ is a function of the states. This form is simple and differentiable, and it is what adaptive fuzzy control uses.
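
The fuzzy approximator with product inference and singleton consequents can be sketched as follows. The Gaussian membership function and the particular centers, widths, and `theta` values are assumptions made for this illustration; the source does not fix the shape of the membership functions.

```python
import math


def gaussian(x, center, width):
    """Assumed Gaussian membership function."""
    return math.exp(-((x - center) / width) ** 2)


def basis(x, premises):
    """Normalised firing strengths (the vector omega) under product inference."""
    strengths = [
        math.prod(gaussian(xi, c, s) for xi, (c, s) in zip(x, premise))
        for premise in premises
    ]
    total = sum(strengths)  # positive unless every term underflows to zero
    return [s / total for s in strengths]


def fuzzy_output(x, premises, theta):
    """y = theta^T omega(x): linear in theta, nonlinear in x."""
    omega = basis(x, premises)
    return sum(t * w for t, w in zip(theta, omega))


# Two rules over two inputs; each premise lists (center, width) per input.
PREMISES = [
    [(0.0, 1.0), (0.0, 1.0)],
    [(1.0, 1.0), (1.0, 1.0)],
]
THETA = [0.0, 2.0]   # singleton consequents, one per rule

if __name__ == "__main__":
    print(fuzzy_output((0.5, 0.5), PREMISES, THETA))
```

Because the output is linear in `theta`, a learning rule only has to adjust a parameter vector, which is one reason this form is convenient for adaptation.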

Fuzzy System

Note that the above system, the form used in adaptive fuzzy control, is a nonlinear system. Its form, however, is virtually linear. Thus, various approaches have been proposed that handle nonlinear systems with linear system techniques by exploiting the linear property of each rule, such as common P stability analysis, the LMI design process, and adaptive fuzzy control.

Outline
- Introductions
- Fuzzy Systems
- Neural Networks
  - Machine Learning
  - Neural Network Models
  - Learning Analysis
- Genetic Algorithms
- Epilogue

Why Learning Is Needed

The problem-domain knowledge for a complicated system usually does not exist or is extremely difficult to obtain. The system may therefore be asked to learn knowledge from experience by itself. Learning is an important capability for an intelligent system, although it is not a necessary one. In recent research, most intelligent systems have been equipped with a learning capability.

What is Learning?

There are two important definitions of learning:
- H. Simon defined learning as "any change in a system that allows it to perform better the second time on the repetition of the same task or on another task drawn from the same population."
- B. Kosko defined learning as change in all cases: "A system learns if and only if the system parameter vector or matrix has a nonzero time derivative."

Concept of Machine Learning

The first definition requires that a learning system always behave better as learning continues. The second definition is mainly for numerical learning. The fundamental problem of learning is how to change the system so that it behaves as required.

The procedures that decide how to change the system are the so-called learning algorithms.

Symbolic Learning vs. Numerical Learning

In a symbolic learning scheme, the representation of knowledge is symbolic, such as predicate calculus and rules. The learning behavior is to build conceptual relationships between those symbols from learned examples.

In a numerical learning scheme, the knowledge is coded into numerical data. The learning behavior is concerned with changing the values of parameters numerically.

Symbolic Learning

Examples of symbolic learning schemes: inductive learning, case-based learning, explanation-based learning, etc.

Symbolic learning is well suited to interacting with human experts, but it is very sensitive to noise. Its major drawback is that knowledge manipulation is very complicated.

Traditional artificial intelligence has focused on symbolic learning. However, because of its difficulty of manipulation and sensitivity to noise, symbolic learning did not provide significant advances in real-world applications.

Numerical Learning

Examples of numerical learning schemes: neural networks, the Cerebellar Model Arithmetic Computer (CMAC), fuzzy modeling, etc.

Numerical learning is computationally efficient and insensitive to noise, but its results are incomprehensible. It is easy to use, but it is difficult to incorporate expert knowledge into it.

Recently, owing to the use of neural networks and fuzzy systems, numerical learning has drawn more attention.
Applications of numerical learning schemes can be found in various disciplines, such as artificial intelligence, computer science, control engineering, decision theory, expert systems, operations research, pattern recognition, and robotics.

Concept of Learning

Depending on the type of information used to determine how to change the system, learning schemes are usually categorized into three kinds: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is sometimes also regarded as supervised learning, but with less instructive supervision.

In fact, most successful learning approaches are supervised, because the required task is simple.

Unsupervised learning is used for finding common features or for clustering (self-organizing).

Reinforcement learning is appealing in concept, but because of its intricacy (such as delayed reward and the decoupling between two learning systems), more study must be conducted.

Outline
- Introductions
- Fuzzy Systems
- Neural Networks
  - Machine Learning
  - Neural Network Models
  - Learning Analysis
- Genetic Algorithms
- Epilogue

Introduction of Neural Networks

Artificial neural networks (ANN), or simply neural networks (NN), are systems inspired by modeling the networks of biological neurons in the brain. NN are a promising new generation of information processing systems that demonstrate the ability to learn, recall, and generalize from training patterns or data.

[Figure: Typical Biological Neuron and Its Model]

NN have a large number of highly interconnected processing elements (PE), or neurons, that usually operate in parallel.
NN are good at tasks such as pattern matching and classification, function approximation, optimization, vector quantization, and data clustering. However, traditional computers are faster at algorithmic computational tasks and precise arithmetic operations.

Since neural networks do not use a mathematical model of how a system's output depends on its input (they are so-called model-free estimators), neural network architectures can be applied to a wide variety of problems. Like brains, neural networks recognize patterns we cannot define. This is the property of recognition without definition.

An NN is a parallel distributed information-processing structure with the following characteristics:
- It is a neurally inspired mathematical model.
- It consists of a large number of highly interconnected processing elements (neurons).
- Its connections (weights) hold the knowledge.
- A neuron can dynamically respond to its stimulus, and the response depends completely on its local information.
- It has the ability to learn, recall, and generalize from training data by assigning or adjusting the connection weights.
- Its collective behavior demonstrates the computational power, and no single neuron carries specific information (the distributed representation property).

Basic Models of Neural Networks

Models of ANNs are specified by three basic entities:
1. Neuron models: these describe how the neurons process the input and how the output is generated.
2. Connectivity: this defines how the neurons are interconnected.
3. Learning algorithms: these define how the connection weights are updated to adjust the network so that it behaves as required.

The processing in a neuron is separated into two parts: input and output. Associated with the input of a neuron is an integration function f, which combines information, activation, or evidence from an external source or from other neurons into a net input to the neuron. The most commonly used integration function is linear and is written as

\[ f_i = \mathrm{net}_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i, \quad i = 1, 2, \ldots, n, \]

where $\theta_i$ is the threshold of the i-th neuron.

The output function of a neuron is usually called the activation function, in that the output of a neuron serves the role of activating the meaning stored in the neuron.

[Figure: Learning Rules for Neural Networks]

Learning in Neural Networks

As mentioned, the basic characteristic of ANNs is that they have the capability of learning. Iterative learning procedures are used for a variety of ANN architectures. Learning in ANNs can be accomplished in several ways:
- establishment of connections between neurons;
- adjustment of the weight values on the links;
- adjustment of the threshold values in neurons.

In fact, these processes can all be considered adjustments of the weight values on the links.

The backpropagation (BP) learning algorithm is usually applied for learning. Networks trained this way are also referred to as backpropagation networks. The fundamental idea is that when a cost function E(w) is defined, such as

\[ E(w) = \frac{1}{2} \sum_{k=1}^{p} \bigl(d^{(k)} - y^{(k)}\bigr)^2, \]

the updating algorithm is

\[ w \leftarrow w - \eta \frac{\partial E(w)}{\partial w}, \quad \text{or} \quad \Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{k=1}^{p} \bigl(d^{(k)} - y^{(k)}\bigr) \frac{\partial y^{(k)}}{\partial w_j}. \]

Since this process updates the weights after all training patterns are taken into account, this kind of learning is called batch learning.
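
The batch update can be sketched for the simplest possible case, a single neuron with the linear integration function and an identity activation. This is an illustration under stated assumptions rather than a full backpropagation network: the function name, the learning rate `eta`, the number of epochs, and the toy data are all hypothetical.

```python
def batch_gradient_descent(patterns, targets, eta=0.05, epochs=2000):
    """Train y = sum_j w_j * x_j - theta by batch gradient descent on E(w)."""
    n = len(patterns[0])
    w = [0.0] * n
    theta = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_theta = 0.0
        # Accumulate (d - y) * dy/dw over ALL patterns before updating.
        for x, d in zip(patterns, targets):
            y = sum(wj * xj for wj, xj in zip(w, x)) - theta
            err = d - y
            for j in range(n):
                grad_w[j] += err * x[j]      # dy/dw_j = x_j
            grad_theta += err * (-1.0)       # dy/dtheta = -1
        w = [wj + eta * g for wj, g in zip(w, grad_w)]
        theta += eta * grad_theta
    return w, theta


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]
    ds = [2.0 * x[0] + 1.0 for x in xs]      # toy target: d = 2x + 1
    print(batch_gradient_descent(xs, ds))
```

For this toy data the weight approaches 2 and the threshold approaches -1, since the neuron computes w x - theta.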

When batch learning is used, the errors of all training patterns are summed together, and the learning effect reflects the sum over all training patterns. Thus, the learning cannot make adjustments for individual patterns, and the resultant learning is usually unacceptable.

The other kind of learning is called on-line learning or per-example learning. In this type of learning, the changes are made individually for each pattern; i.e.,

\[ \Delta w_j = \eta \bigl(d^{(k)} - y^{(k)}\bigr) \frac{\partial y^{(k)}}{\partial w_j}. \]

When we want an NN to perform some task, the NN is realized by finding an appropriate set of weights. In other words, the obtained weights capture what we want the NN to be, that is, the knowledge. The activation values of the neurons represent the system at some snapshot in time. Thus, they capture the transient state for a specific input set at a certain time. From the information storage viewpoint, the weights of the links encode the so-called long-term memory, and the activation states of the neurons encode the short-term memory of the NN.

Outline
- Introductions
- Fuzzy Systems
- Neural Networks
  - Machine Learning
  - Neural Network Models
  - Learning Analysis
- Genetic Algorithms
- Epilogue

Universal Approximator Theorem

A neural network with as few as one hidden layer, using an arbitrary squashing activation function and a linear or polynomial integration function, can approximate virtually any function of interest to any desired degree of accuracy, provided sufficiently many hidden neurons are available.

Any lack of success in applications may arise from inadequate learning, an insufficient number of hidden neurons, or the lack of a deterministic relationship between the inputs and the desired outputs.
The theorem only states the existence of the ideal network; it does not provide any mechanism for finding it.

Learning Performance Analysis

Two learning phases must be distinguished when evaluating learning performance, especially for offline learning schemes: the training phase and the testing phase.

In the training phase, the system is trained with the given training patterns. The system is thus under construction, and the concern is the convergence behavior of the training. For training performance, it is simplest to consider the learning history (training error vs. training iterations).

The convergence behavior of learning is usually characterized by two properties: the convergence speed and the converged error (training error). For an offline learning scheme, the convergence speed may not be a significant factor. An issue for the converged error is that learning may get stuck in local minima if iterative (incremental) learning algorithms are used.

Even though the learning algorithm is the major factor in determining the convergence behavior, other factors, such as the system structure and the quality of the training data, may also affect the training performance.

The learning performance of the training phase states how accurately the learned system can approximate the desired outputs for the inputs in the training data set.

The purpose of learning is to obtain a system that, after learning, has a fair chance of behaving as required for any input data; in short, a system that generalizes.
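
A learning history (training error against training iterations) can be recorded with a loop like the one below. The sketch uses per-example (on-line) updates on a single linear neuron; the function name, the learning rate, the number of epochs, and the toy data are assumptions made for the illustration.

```python
def online_training_history(patterns, targets, eta=0.05, epochs=200):
    """Per-example learning on y = w.x - theta; returns weights and error history."""
    n = len(patterns[0])
    w = [0.0] * n
    theta = 0.0
    history = []

    def output(x):
        return sum(wj * xj for wj, xj in zip(w, x)) - theta

    for _ in range(epochs):
        # Update immediately after each individual pattern.
        for x, d in zip(patterns, targets):
            err = d - output(x)
            w = [wj + eta * err * xj for wj, xj in zip(w, x)]
            theta -= eta * err                 # dy/dtheta = -1
        # Training error E(w) over the whole set after this pass.
        error = 0.5 * sum((d - output(x)) ** 2 for x, d in zip(patterns, targets))
        history.append(error)
    return w, theta, history


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]
    ds = [2.0 * x[0] + 1.0 for x in xs]
    _, _, history = online_training_history(xs, ds)
    for epoch in (0, 9, 49, 199):
        print(f"epoch {epoch + 1:3d}  training error {history[epoch]:.3e}")
```

The slope of the history reflects the convergence speed and its final level is the converged (training) error. Neither says anything yet about behavior on patterns that were not used for training.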

Thus, the concern in the testing phase is the generalization capability; that is, whether the learned system can interpret unlearned patterns well. In the testing phase, the learned system is tested with another set of patterns, which are not used in the training phase in any way, to define the generalization error. The performance in this phase is usually referred to as the generalization capability.

Validation of Generalization

There are several methods for estimating generalization errors:
- Split-sample validation: randomly select part of the data as a test set, which must not be used in any way during training. (This is the most commonly used method.)
- Cross-validation: resample the training data set. In k-fold cross-validation, the data are divided into k subsets of equal size. The network is then trained k times, each time leaving out one of the subsets and using only the omitted subset to compute the error criterion. When k equals the number of samples, it is called "leave-one-out" cross-validation.
- Bootstrapping: instead of repeatedly using subsets of the data, sub-samples are randomly drawn from the data. It seems to work better than cross-validation in many cases.

Learning Performance Analysis

In general, there are two different types of generalization: interpolation and extrapolation. Interpolation can often be done reliably, but extrapolation is notoriously unreliable. Note that generalization is not always possible for various learning systems, despite assertions in the literature.

There are three conditions that are typically necessary (although not sufficient) for good generalization:
- Deterministic input-output relationships: the inputs to the network contain sufficient information pertaining to the desired outputs. It is impossible to learn a nonexistent function.
Smooth functions: A small change in the inputs should produce a small change in the outputs. Very non-smooth functions (e.g., random noise) cannot be generalized.
Sufficient training data: The training data should be a sufficiently large and representative subset of the population. Sufficient data can avoid extrapolation.

Overfitting and Underfitting

A system that is not sufficiently complex (i.e., it has fewer tunable parameters than required) may fail to fully detect the signal in a complicated data set, leading to underfitting. A network that is too complex may fit not only the signal but also the noise, leading to overfitting. Note that overfitting may occur even with noise-free data. Various approaches to avoiding overfitting have been proposed in the literature: jittering, weight decay, early stopping, Bayesian learning, robust learning algorithms, etc.

Local Learning Concept

The minimum disturbance principle suggests that a better way of learning should aim not only at reducing the output error for the current training pattern but also at minimizing the disturbance to the weights that have already been learned. A learning system following the minimum disturbance principle can learn more effectively. We refer to this as the local learning concept.

In neural networks, the effect of an update spreads to all the weights in the network because of the distributed knowledge representation. This violates the minimum disturbance principle and is called global learning. Local learning can be more effective, but it may not always learn better. Neural fuzzy systems use spatial relations to define a learning structure that facilitates local learning.
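The local learning concept can be made concrete with a small sketch. The setup is an assumption chosen for brevity and is not taken from the slides: eleven Gaussian units with fixed centers and a common width, a linear output, and a single LMS update. The names CENTERS, WIDTH, activations and lms_update are hypothetical. Because each weight moves in proportion to the activation of its unit, only the weights whose units lie near the current input change noticeably.

```python
import math

CENTERS = [i / 10 for i in range(11)]  # 11 Gaussian units spread over [0, 1]
WIDTH = 0.05                           # common width of the units (assumed)

def activations(x):
    """Activation of every Gaussian unit for the input x."""
    return [math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]

def predict(weights, x):
    """Output: weighted sum of the unit activations."""
    return sum(w * a for w, a in zip(weights, activations(x)))

def lms_update(weights, x, target, lr=0.5):
    """One LMS step; each weight moves in proportion to its unit's activation."""
    acts = activations(x)
    error = target - sum(w * a for w, a in zip(weights, acts))
    return [w + lr * error * a for w, a in zip(weights, acts)]

weights = [0.0] * len(CENTERS)
new_weights = lms_update(weights, x=0.2, target=1.0)
changes = [abs(n - o) for n, o in zip(new_weights, weights)]
for c, d in zip(CENTERS, changes):
    print(f"center {c:.1f}: weight change {d:.2e}")
```

The weight of the unit centered at 0.2 receives the largest change, while the weights of distant units are practically untouched, so what was learned elsewhere in the input space is hardly disturbed. In a network with a distributed representation, a single pattern generally changes every weight.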
Network Structure for Fuzzy Systems

In this kind of approach, fuzzy models are characterized by a set of parameters, such as the centers and widths of the membership functions, the rule relationships, etc. Since those parameters can be viewed as the weights in a network, the traditional learning schemes for neural networks can then be applied to the fuzzy modeling problem. Such approaches are often referred to as neural fuzzy systems or neural-network-based fuzzy systems.

Local Learning Concept

Neural fuzzy systems are often found to have better learning capability than neural networks. However, local learning restrains the learning to pre-defined relations in order to reduce the learning burden; if those relations are incorrect or cannot reflect certain information, the results of local learning may not be acceptable. Several systems can be classified as local learning systems, such as radial basis function networks, wavelet networks, CMAC, etc.

Outline

Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
Optimization in Computational Intelligence
Evolutionary Computation
Other Non-derivation Optimization
Epilogue

Optimization in Computational Intelligence

Optimization processes are required in an intelligent system because:
A better selection of applicable knowledge or strategies can result in better performance;
In the learning process, an optimal way of defining the updating rule is required.

In general, an optimization problem requires finding a setting of the variable vector of the system such that a certain quality criterion, also called a performance function, is optimized. Sometimes, the variable vector may also have to satisfy some constraints.
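In symbols, the optimization problem described above can be sketched as follows. The notation is introduced here for illustration only and does not come from the slides: x is the variable vector, J the performance function, and g_i and h_j are inequality and equality constraints.

```latex
\begin{aligned}
\min_{\mathbf{x} \in \mathbb{R}^n} \quad & J(\mathbf{x}) \\
\text{subject to} \quad & g_i(\mathbf{x}) \le 0, \quad i = 1, \dots, m, \\
& h_j(\mathbf{x}) = 0, \quad j = 1, \dots, p.
\end{aligned}
```

A maximization problem, such as maximizing a fitness value F, fits the same form by taking J = -F. An unconstrained problem is the special case m = p = 0.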
The traditional optimization approaches develop a formal model that resembles the original function and then solve it by means of traditional mathematical methods. Evolutionary algorithms have been widely used in various intelligent systems. In fact, many applications that combine them with fuzzy systems and neural networks can be found in the literature.

Evolutionary Computation

An important property of evolutionary algorithms is that the search process does not require auxiliary information about the fitness function, such as derivatives. In fact, evolutionary computation should be understood as a general adaptable concept for problem solving rather than a collection of related and ready-to-use algorithms.

The majority of current implementations of evolutionary algorithms descend from three strongly related but independently developed approaches:
Genetic algorithms: use binary strings as genes in the representation to search for an optimal chromosome.
Evolutionary programming: evolves finite state machines to predict events on the basis of former observations.
Evolution strategies: solve difficult discrete and continuous parameter optimization problems.

Evolutionary computation mimics the natural selection process so as to find the best-fitted candidate for the solution (optimization). Evolutionary algorithms can be viewed as optimization approaches that use random search algorithms with some guidance.
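The idea of a random search with some guidance, working without derivatives, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method of any particular algorithm in the slides: a single candidate is varied by Gaussian noise with a fixed step size, and the variation is kept only if the cost does not get worse (a simple (1+1)-style scheme). The cost function sphere and all parameter values are hypothetical.

```python
import random

def sphere(x):
    """Cost to minimize: squared distance from the origin (a toy test function)."""
    return sum(v * v for v in x)

def one_plus_one_search(cost, dim=5, steps=2000, sigma=0.1, seed=0):
    """Guided random search: only cost values are used, never derivatives."""
    rng = random.Random(seed)
    parent = [rng.uniform(-5, 5) for _ in range(dim)]
    parent_cost = cost(parent)
    start_cost = parent_cost
    for _ in range(steps):
        # Random variation of the current candidate.
        child = [v + rng.gauss(0, sigma) for v in parent]
        child_cost = cost(child)
        # Guidance: the cost decides whether the variation survives.
        if child_cost <= parent_cost:
            parent, parent_cost = child, child_cost
    return parent, parent_cost, start_cost

best, best_cost, start_cost = one_plus_one_search(sphere)
print(f"cost went from {start_cost:.3f} to {best_cost:.5f}")
```

Without the acceptance test the loop would be a pure random walk; the comparison of cost values is the only guidance the search receives.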
The guidance in the search is provided by a user-specified fitness function. As in any optimization problem, the goal is to find a setting of the variable vector of the system such that a certain quality criterion, also called a performance function, is optimized. Sometimes, the variable vector may also have to satisfy some constraints.

The overall procedure is as follows:
1. Initialize the population P(t).
2. Evaluate P(t).
3. Apply reproduction and crossover on P(t) to yield C(t).
4. Apply mutation on C(t) to yield D(t), and then evaluate D(t).
5. Select P(t+1) from P(t) and D(t) based on the fitness.
6. If the stop criterion is satisfied, stop; otherwise, repeat from step 3 with the new population.

Evolutionary computation uses three basic operators to manipulate the genetic composition (chromosomes) of a population:
Reproduction: a process of selecting parents for generating offspring. The most highly rated chromosomes in the current generation are the most likely to be copied into the new generation.
Crossover: provides a mechanism for chromosomes to mix and match attributes through random processes.
Mutation: changes attributes (genes) in the new generation to bring in new possibilities. Mutation is a very important mechanism for avoiding local minima in the optimization search.

These operations generate the new chromosomes for evolution; hopefully, the best-fitted solution will be among them. Randomness plays an essential role in all of these operations. One attractive property of evolutionary algorithms is that, as long as the best candidates are retained in the selection, the performance of the best solution never degrades from one generation to the next.

However, because the algorithms must be adapted to the problem at hand, the operations of evolutionary algorithms must be designed by the users.
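The loop and the three operators described above can be sketched as a small genetic algorithm. The design choices are assumptions made for this illustration and are not prescribed by the slides: a toy fitness function that counts 1-bits, binary tournaments for reproduction, one-point crossover, bit-flip mutation, and keeping the fittest individuals of P(t) and D(t) together. All names and parameter values are hypothetical.

```python
import random

def fitness(chrom):
    """Toy fitness: number of 1-bits in the chromosome."""
    return sum(chrom)

def genetic_algorithm(n_bits=20, pop_size=20, max_gen=200, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    # Initialize and evaluate the population P(t) of random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = [max(fitness(c) for c in pop)]
    for _ in range(max_gen):
        if best_history[-1] == n_bits:  # stop criterion: optimum reached
            break
        # Reproduction: binary tournaments favor fitter chromosomes as parents.
        parents = [max(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        # Crossover: one-point recombination of parent pairs yields C(t).
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randint(1, n_bits - 1)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutation: flip each gene with probability p_mut to yield D(t).
        mutated = [[1 - g if rng.random() < p_mut else g for g in c]
                   for c in children]
        # Select P(t+1) from P(t) and D(t): keep the pop_size fittest.
        pop = sorted(pop + mutated, key=fitness, reverse=True)[:pop_size]
        best_history.append(fitness(pop[0]))
    return max(pop, key=fitness), best_history

best, history = genetic_algorithm()
print(f"best fitness {fitness(best)} after {len(history) - 1} generations")
```

Because P(t+1) is chosen from both P(t) and D(t), the best fitness recorded in history never decreases. With a different selection scheme, for example replacing the whole population by D(t), this guarantee would no longer hold.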
Moreover, if the optimization is constrained, the initial population and the newly generated chromosomes must be carefully selected.

Other Non-derivation Optimization

Other often-mentioned approaches are ant-based algorithms (ACS, ACO, etc.) and Particle Swarm Optimization (PSO). The overall ideas are similar: they all use fitness values to guide the search, with some random mechanisms associated with the search process. These approaches can often have better search performance than genetic algorithms. This is because genetic algorithms perform a solution-wise search, whereas swarm search algorithms perform a component-wise search. Also, genetic algorithms are more easily trapped in a local minimum if the initial population has some local-optimum properties, whereas swarm algorithms can more easily escape from such an initial local optimum.

Epilogue

Computational intelligence is a new vehicle for the next generation of artificial intelligence. Nevertheless, computational intelligence alone will get you nowhere. Incorporating other techniques may create new frontiers for our dreams.

Thank you for your attention! Any questions?

Shun-Feng Su, Professor, Department of Electrical Engineering, National Taiwan University of Science and Technology
E-mail: [email protected]