Computational Intelligence Methods
Machine learning

Pavel Kordík, Martin Šlapák
Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague (FIT ČVUT)
MI-MVI, winter semester (ZS) 2011/12, Lecture 2
https://edux.fit.cvut.cz/courses/MI-MVI/
European Social Fund Prague & EU: We invest in your future

ML intro

Machine learning roots

Alan Turing: British mathematician and logician, AI pioneer
- Intelligent Machinery (1947): machines will learn from experience
Herbert Simon: ML pioneer, wrote the programs:
- Logic Theory Machine (1956)
- General Problem Solver (GPS) (1957): possibly the first method of separating problem-solving strategy from information about particular problems

Definition
"Learning is any process by which a system improves performance from experience."

ML history

1950s:
- Samuel's checker player
- Selfridge's Pandemonium, Rosenblatt's Perceptron
1960s:
- Pattern recognition
- Minsky and Papert prove limitations of the Perceptron
1970s:
- Symbolic concept induction
- Expert systems and the knowledge acquisition bottleneck
- Quinlan's ID3
1980s:
- Advanced decision tree and rule learning
- Learning and planning and problem solving
- Resurgence of neural networks (connectionism, backpropagation)
- Valiant's PAC Learning Theory

ML history, cont.

1990s:
- Data mining
- Adaptive software agents and web applications
- Text learning
- Reinforcement learning (RL)
- Inductive Logic Programming (ILP)
- Ensembles: Bagging, Boosting, and Stacking
- Bayes Net learning
2000+:
- Support vector machines
- Kernel methods
- Graphical models
- Statistical relational learning
- Transfer learning
- Sequence labeling
- Collective classification and structured outputs
- Meta-learning

ML task examples

Machine learning tasks

Regression, prediction
  Example: Predict temperature based on historical measurements.
Classification
  Example: Based on training examples, classify cars into 3 categories.
Clustering
  Example: Cluster cars into groups based on their similarity.
Problem solving / planning / control
  Example: Navigate a robot out of a maze and then win a robotic soccer game.

Cluster analysis versus classification

Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. It is a method of unsupervised learning. Classification, on the other hand, requires supervised learning: the categories are predefined.

Figure: Clustering example
Figure: Classification example

Supervised vs. unsupervised learning

Supervised learning

Definition
Supervised learning is the machine learning task of inferring a function from supervised training data.

The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value. The output is a classifier (discrete output) or a regression function (continuous output).
The inferred function should predict the correct output value for any valid input object.

Unsupervised learning

What is it?
Unsupervised learning refers to the problem of trying to find hidden structure in unlabeled data.

The training data consist of a set of training examples. There is no error or reward signal to evaluate a potential solution, and there are no input-output pairs $v_{in} \Rightarrow v_{out}$. This distinguishes unsupervised learning from supervised learning and reinforcement learning.

Many methods employed in unsupervised learning are based on data mining methods used to preprocess data:
- clustering (e.g., k-means, mixture models, hierarchical clustering)
- neural network models, e.g. the self-organizing map (SOM)

Classification examples

Assign an object/event to one of a given finite set of categories.
- Medical diagnosis
- Credit card applications or transactions
- Fraud detection in e-commerce
- Worm detection in network packets
- Spam filtering in email
- Recommended articles in a newspaper
- Recommended books, movies, music, or jokes
- Financial investments
- DNA sequences
- Spoken words
- Handwritten letters
- Astronomical images

Problem Solving / Planning / Control examples

Performing actions in an environment in order to achieve a goal.
- Solving calculus problems
- Playing checkers, chess, or backgammon
- Balancing a pole
- Driving a car or a jeep
- Flying a plane, helicopter, or rocket
- Controlling an elevator
- Controlling a character in a video game
- Controlling a mobile robot

Measuring the performance

- Regression/prediction error (RMSE, MAPE)
- Classification
  - Accuracy
  - Specificity, sensitivity, F-measure
  - Confusion matrix
  - ROC curve, AUC
- Clustering performance: average silhouette, CPCC
- Solution correctness
- Solution quality (length, efficiency)
- Speed of performance
- ...

Learning systems

Defining the Learning Task

Improve on task T, with respect to performance metric P, based on experience E.

Example
  T: Playing checkers
  P: Percentage of games won against an arbitrary opponent
  E: Database of games, playing practice games against itself

Example
  T: Recognizing handwritten words
  P: Percentage of words correctly classified
  E: Database of human-labeled images of handwritten words

Example
  T: Categorizing email messages as spam or legitimate
  P: Percentage of email messages correctly classified
  E: Database of emails, some with human-given labels

Designing a Learning System

Machine learning system definition
A system capable of acquiring and integrating knowledge automatically. The capability of such systems to learn from experience, training, analytical observation, and other means results in a system that can continuously self-improve and thereby exhibit efficiency and effectiveness.
A machine learning system usually starts with some knowledge and a corresponding knowledge organization so that it can interpret, analyze, and test the knowledge acquired.

- Build the training database, prepare training data.
- Choose exactly what is to be learned, i.e. the target function.
- Choose how to represent the target function.
- Choose a learning algorithm to infer the target function from the experience.
- Supply the algorithm with the performance metric.

Checker machine learning system example

Sample learning problem

Learn to play checkers from self-play.
We will develop an approach analogous to that used in one of the first machine learning systems, developed by Arthur Samuel at IBM in 1959.

Training Experience - building the database

Direct experience: Given sample input and output pairs for a useful target function.
- Checker boards labeled with the correct move, e.g. extracted from records of expert play.
Indirect experience: Given feedback which is not direct I/O pairs for a useful target function.
- Potentially arbitrary sequences of game moves and their final game results.
- Credit/blame assignment problem: How to assign credit or blame to individual moves given only indirect feedback?

Source of Training Data

- Random examples provided outside of the learner's control.
  - Are negative examples available, or only positive ones?
- Good training examples selected by a "benevolent teacher."
  - "Near miss" examples
- Learner can query an oracle about the class of an unlabeled example in the environment.
- Learner can construct an arbitrary example and query an oracle for its label.
- Learner can design and run experiments directly in the environment without any human guidance (see reinforcement learning).

Training vs. Test Distribution

- Generally assume that the training and test examples are independently drawn from the same overall distribution of data.
  - IID: independently and identically distributed
- If examples are not independent, collective classification is required.
- If the test distribution is different, transfer learning is required.

Choosing a Target Function

- What function is to be learned and how will it be used by the performance system?
- For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move.
  - Could learn a function: $\mathit{ChooseMove}(\mathit{board}, \mathit{legalmoves}) \Rightarrow \mathit{bestMove}$
  - Or could learn an evaluation function, $V(\mathit{board}) \Rightarrow \mathbb{R}$, that gives each board position a score for how favorable it is. V can be used to pick a move by applying each legal move, scoring the resulting board position, and choosing the move that results in the highest-scoring board position.

Ideal Definition of V(b)

- If b is a final winning board, then $V(b) = 100$.
- If b is a final losing board, then $V(b) = -100$.
- If b is a final draw board, then $V(b) = 0$.
- Otherwise, $V(b) = V(b')$, where $b'$ is the highest-scoring final board position that is achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).

How to compute it? Search the finite game tree!
- Brute force search?
- Dynamic programming?
- Heuristics?

Approximating V(b)

- Computing $V(b)$ is intractable since it involves searching the complete exponential game tree.
- Therefore, this definition is said to be non-operational.
- An operational definition can be computed in reasonable (polynomial) time.
- We need to learn an operational approximation to the ideal evaluation function.
- The target function can be represented in many ways: lookup table, symbolic rules, numerical function, neural network.
- There is a trade-off between the expressiveness of a representation and the ease of learning.
- The more expressive a representation, the better it will be at approximating an arbitrary function; however, more examples (or more time) will be needed to learn an accurate function.

Approximating V(b) by a linear function

Let's use a linear approximation of the evaluation function:

$$\hat{V}(b) = w_0 + w_1 \cdot bp(b) + w_2 \cdot rp(b) + w_3 \cdot bk(b) + w_4 \cdot rk(b) + w_5 \cdot bt(b) + w_6 \cdot rt(b)$$

- $bp(b)$: number of black pieces on board b.
- $rp(b)$: number of red pieces on board b.
- $bk(b)$: number of black kings on board b.
- $rk(b)$: number of red kings on board b.
- $bt(b)$: number of black pieces threatened (i.e. which can be immediately taken by red on its next turn).
- $rt(b)$: number of red pieces threatened.

How can the weights be computed?

Obtaining Training Values

Direct supervision may be available for the target function.
Example: [(bp=3, rp=0, bk=1, rk=0, bt=0, rt=0), 100] (win for black)

With indirect feedback, training values can be estimated using temporal difference learning (used in reinforcement learning, where supervision is a delayed reward). Estimate training values for intermediate (non-terminal) board positions by the estimated value of their successor in an actual game trace:

$$V_{train}(b) = \hat{V}(\mathrm{successor}(b)),$$

where $\mathrm{successor}(b)$ is the next board position where it is the program's move in actual play. Values towards the end of the game are initially more accurate, and continued training slowly "backs up" accurate values to earlier board positions.

Learning Algorithm

The learning algorithm uses training values for the target function to induce a hypothesized definition that fits these examples and hopefully generalizes to unseen examples. It attempts to minimize some measure of error (loss function) such as the mean squared error:

$$E = \frac{\sum_{b} \left(V_{train}(b) - \hat{V}(b)\right)^2}{\mathit{trainingExamplesNumber}}$$

A gradient descent algorithm updating the weights in order to minimize the error:

    while weights not converged do
        for each training example b do
            compute the (signed) error: $error(b) = V_{train}(b) - \hat{V}(b)$
            for each board feature $f_i$ do
                update its weight: $w_i = w_i + c \cdot f_i \cdot error(b)$    // c is the learning rate
            end for
        end for
    end while

Evaluation of Learning Systems

Experimental
- Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets.
- Gather data on their performance, e.g. test accuracy, training time, testing time.
- Analyze differences for statistical significance.
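The gradient-descent weight update from the Learning Algorithm slide can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not Samuel's program: the names `evaluate`, `lms_train`, and `mean_squared_error`, the learning rate, the fixed epoch count (used instead of a convergence test), and the three toy training pairs are all hypothetical. A constant feature $f_0 = 1$ is assumed so that $w_0$ is updated by the same rule as the other weights.

```python
from typing import List, Sequence, Tuple

# One training example: six board features (bp, rp, bk, rk, bt, rt) and V_train(b).
Example = Tuple[Sequence[float], float]


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def lms_train(examples: List[Example], c: float = 0.01, epochs: int = 2000) -> List[float]:
    """Repeat w_i <- w_i + c * f_i * error(b) over all examples.

    A fixed number of epochs stands in for "while weights not converged".
    """
    n_features = len(examples[0][0])
    weights = [0.0] * (n_features + 1)  # weights[0] is w0
    for _ in range(epochs):
        for features, v_train in examples:
            error = v_train - evaluate(weights, features)  # signed error
            weights[0] += c * error  # assumed constant feature f_0 = 1
            for i, f in enumerate(features, start=1):
                weights[i] += c * f * error
    return weights


def mean_squared_error(weights: Sequence[float], examples: List[Example]) -> float:
    """Sum of squared errors divided by the number of training examples."""
    return sum((v - evaluate(weights, f)) ** 2 for f, v in examples) / len(examples)


# Hypothetical toy data: only the first pair comes from the slides.
TOY_EXAMPLES: List[Example] = [
    ((3, 0, 1, 0, 0, 0), 100.0),   # win for black
    ((0, 3, 0, 1, 0, 0), -100.0),  # assumed mirror position, loss for black
    ((2, 2, 0, 0, 1, 1), 0.0),     # assumed balanced position
]

learned = lms_train(TOY_EXAMPLES)
print("weights:", [round(w, 2) for w in learned])
print("MSE:", mean_squared_error(learned, TOY_EXAMPLES))
```

With so few examples the weights are underdetermined, so the sketch only shows that the update drives the training error down; it says nothing about how well the learned function would generalize.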
Theoretical
- Analyze algorithms mathematically and prove theorems about their:
  - Computational complexity
  - Ability to fit training data
  - Sample complexity (number of training examples needed to learn an accurate function)

General remarks

Various Function Representations

Numerical functions
- Linear regression
- Neural networks
- Support vector machines
Symbolic functions
- Decision trees
- Rules in propositional logic
- Rules in first-order predicate logic
Instance-based functions
- Nearest-neighbor
- Case-based
Probabilistic graphical models
- Naïve Bayes
- Bayesian networks
- Hidden Markov Models (HMMs)
- Probabilistic Context-Free Grammars (PCFGs)
- Markov networks

Various Search Algorithms

Gradient descent
- Perceptron
- Backpropagation
Dynamic programming
- HMM learning
- PCFG learning
Divide and conquer
- Decision tree induction
- Rule learning
Evolutionary computation
- Genetic Algorithms (GAs)
- Genetic Programming (GP)
- Neuro-evolution
Swarms
...

PSO and ACO

Particle swarm optimization (PSO) and ant colony optimization (ACO) are methods of swarm intelligence.
Swarm intelligence (SI) is the collective behaviour of decentralized, self-organized systems, natural or artificial.
SI systems are typically made up of a population of simple agents or boids interacting locally with one another and with their environment.
Natural examples of SI include ant colonies, bird flocking, animal herding, bacterial growth, and fish schooling.
An individual agent (ant/fish) has limited abilities.
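To make the idea of a population of simple agents concrete, here is a minimal particle swarm optimization sketch. The slides do not specify a PSO variant, so this assumes the common textbook global-best formulation (inertia plus attraction to each particle's own best and the swarm's best position). The function name `pso_minimize`, the coefficient values, the swarm size, and the `sphere` test function are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_social: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO: returns the best position found and its objective value."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_val[i])
    global_best = personal_best[best_idx][:]
    global_val = personal_val[best_idx]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Each particle only knows its own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val


def sphere(x: List[float]) -> float:
    """Assumed test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=2)
print("best position:", best_x)
print("best value:", best_val)
```

No single particle does anything clever here: the search behaviour comes from many particles sharing one piece of information, the best position seen so far.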
Agents and multi-agent systems

An agent is something that acts in an environment.
An intelligent agent is an agent that acts intelligently:
- its actions are appropriate for its goals and circumstances
- it is flexible to changing environments and goals
- it learns from experience
- it makes appropriate choices given perceptual limitations and finite computation

Picture: http://www.codeproject.com/KB/architecture/Agents/Multi-Agent_systems.JPG

Evolutionary algorithms

Evolutionary algorithm (EA) techniques differ in the implementation details and the nature of the particular applied problem.

Genetic algorithm
  This is the most popular type of EA. One seeks the solution of a problem by applying operators such as recombination and mutation. This type of EA is often used in optimization problems.
Genetic programming
  Here the solutions are in the form of computer programs, and their fitness is determined by their ability to solve a computational problem.
Evolutionary programming
  Similar to genetic programming, but the structure of the program is fixed and its numerical parameters are allowed to evolve.
Neuroevolution
  Similar to genetic programming, but the genomes represent artificial neural networks by describing structure and connection weights. The genome encoding can be direct or indirect.

Neural networks

Neural networks

An artificial neural network (ANN) is a computational model that is inspired by the structure and/or functional aspects of biological neural networks.
It consists of an interconnected group of artificial neurons.
An ANN is an adaptive system that can change its structure during the learning phase.
They are usually used to model complex relationships between inputs and outputs or to find patterns in data.

Figure: Artificial neural network

Perceptron

The perceptron is one of the simplest binary classifiers. It maps an input vector x to an output $f(x)$.

$$\xi = \sum_{i=1}^{N} \omega_i \cdot x_i - \theta$$

$Z = S(\xi)$, where $S$ is the transfer function (commonly the Heaviside step function or the sigmoid function) and $\theta$ is the threshold.

$$f(x) = \begin{cases} 1 & \text{if } \omega \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\omega$ is the vector of weights and $b$ is the constant term.

The perceptron learning algorithm does not terminate if the learning set is not linearly separable.

Figure: Perceptron model

Multilayer perceptron – MLP

An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one.
Each node is a neuron with a nonlinear activation function.
The MLP is a modification of the standard linear perceptron; unlike the linear perceptron, it can distinguish data that is not linearly separable.
The MLP uses supervised learning.

Figure: Multilayer perceptron
Figure: Activation functions

Decision trees

Decision trees are a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
They represent a decision algorithm.
They are easy for humans to understand.
They can be easily combined with other techniques.

Figure: Handwritten decision tree

Self-organizing maps (networks) – SOM

Demo applet: http://www.eee.metu.edu.tr/~alatan/Courses/Demo/Kohonen.htm
- Also called Kohonen's maps.
- It is based on dimensionality reduction.
- Pattern recognition is one use case of SOM.

Figure: Kohonen applet
Figure: Input vector mapping

Artificial life – Alife

Artificial life is a field of study of systems related to life, its processes, and its evolution through simulations using computer models.
There are three main kinds of alife, named for their approaches:
- soft, from software
- hard, from hardware
- wet, from biochemistry
The term "artificial life" is often used to refer specifically to soft alife.
Basic techniques:
- Cellular automata were used in the early days of artificial life.
- Neural networks are sometimes used to model the brain of an agent.

Artificial life – simulators

- AnimatLab: http://www.animatlab.com/
- Noble Ape: http://www.nobleape.com/sim/
- Darwinbots: http://www.darwinbots.com/
- and many applets...

Figure: Noble Ape simulator

Artificial life – simulators – 2

Figure: AnimatLab development environment
Figure: AnimatLab examples

References

- Raymond J. Mooney, CS 391L: Machine Learning Introduction, University of Texas at Austin
- Demo applets, neural networks: http://neurony.wz.cz/
- Backpropagation example: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
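As a supplement to the Perceptron slide, the decision rule $f(x) = 1$ if $\omega \cdot x + b > 0$, else $0$, and the classic error-driven learning rule can be sketched as below. The function names, the learning rate, and the `max_epochs` cap are assumptions for illustration. The cap is there because, as the slide notes, the learning algorithm does not terminate on data that is not linearly separable (XOR is used here as such a case).

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (input vector, target class 0 or 1)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Heaviside-style decision rule: 1 if w . x + b > 0, otherwise 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(
    samples: List[Sample], learning_rate: float = 1.0, max_epochs: int = 100
) -> Tuple[List[float], float, bool]:
    """Return (weights, bias, converged).

    converged is False if max_epochs passed without an error-free epoch.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)  # -1, 0, or +1
            if delta != 0:
                mistakes += 1
                weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if mistakes == 0:
            return weights, bias, True
    return weights, bias, False


AND_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and, ok_and = train_perceptron(AND_DATA)
print("AND converged:", ok_and)

_, _, ok_xor = train_perceptron(XOR_DATA)
print("XOR converged:", ok_xor)
```

AND is linearly separable, so training stops after an error-free epoch. XOR is not, so only the epoch cap ends the loop, which is the limitation that motivates the multilayer perceptron.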