Machine Learning
Foundations of Artificial Intelligence

Learning

What is Learning?
Learning in AI is also called machine learning or pattern recognition. The basic objective is to allow an intelligent agent to discover knowledge autonomously from experience. Let's examine the definition more closely:
• “an intelligent agent”: The ability to learn requires a prior level of intelligence and knowledge. Learning has to start from an existing level of capability.
• “to discover autonomously”: Learning is fundamentally about an agent recognizing new facts for its own use and acquiring new abilities that reinforce its existing abilities. Literal programming, i.e. rote learning from instruction, is not useful.
• “knowledge”: Whatever is learned has to be represented in some way that the agent can use. “If you can't represent it, you can't learn it” is a corollary of the slogan “Knowledge is power”.
• “from experience”: Experience is typically a set of so-called training examples; examples may be categorized or not. They may be random or selected by a teacher. They may include explanations or not.

Learning Agent
[Figure: learning agent; labeled components: environment, sensors, percepts, critic, learning element, KB, problem solver, actions, actuators]

Learning element
Design of a learning element is affected by:
• which components of the performance element are to be learned
• what feedback is available to learn these components
• what representation is used for the components
Type of feedback:
• Supervised learning: correct answers for each training example
• Unsupervised learning: correct answers not given
• Reinforcement learning: occasional rewards/feedback

Inductive Learning
• Inductive learning involves learning generalized rules from specific examples (you can think of this as the “inverse” of deduction).
• Main task: given a set of examples, each classified as positive or negative, produce a concept description that matches exactly the positive examples.
Some notes:
• The examples are coded in some representation language, e.g. by a finite set of real-valued features.
• The concept description is in a certain language that is presumably a superset of the language of possible example encodings.
• A “correct” concept description is one that correctly classifies ALL possible examples, not just those given in the training set.
Fundamental difficulties with induction:
• We can't generalize with perfect certainty.
• Examples and concepts are NOT available “directly”; they are only available through representations, which may be more or less adequate to capture them.
• Some examples may be classified as both positive and negative.
• The features supplied may not be sufficient to discriminate between positive and negative examples.

Inductive Learning Frameworks
1. Function-learning formulation
2. Logic-inference formulation

Inductive learning
Simplest form: learn a function from examples
• f is the target function
• An example is a pair (x, f(x))
• Problem: find a hypothesis h such that h ≈ f, given a training set of examples
This is a highly simplified model of real learning:
• It ignores prior knowledge
• It assumes examples are given

Inductive learning
Construct/adjust h to agree with f on the training set. h is consistent if it agrees with f on all examples.
E.g., curve fitting, where several different curves may agree with the same set of data points.
Ockham's razor: prefer the simplest hypothesis consistent with the data.

Logic-Inference Formulation
• Background knowledge KB
• Training set D (observed knowledge) that is not logically implied by KB
• Inductive inference: find h (inductive hypothesis) such that KB and h imply D, i.e. $KB \land h \models D$
• h = D is a trivial but uninteresting solution (data caching)
• Usually not a sound inference

Rewarded Card Example
Deck of cards, with each card designated by [r,s],
its rank and suit, and some cards “rewarded”.
Background knowledge KB:
$((r=1) \lor \dots \lor (r=10)) \Rightarrow \mathrm{NUM}(r)$
$((r=J) \lor (r=Q) \lor (r=K)) \Rightarrow \mathrm{FACE}(r)$
$((s=S) \lor (s=C)) \Rightarrow \mathrm{BLACK}(s)$
$((s=D) \lor (s=H)) \Rightarrow \mathrm{RED}(s)$
Training set D:
$\mathrm{REWARD}([4,C]),\ \mathrm{REWARD}([7,C]),\ \mathrm{REWARD}([2,S]),\ \neg\mathrm{REWARD}([5,H]),\ \neg\mathrm{REWARD}([J,S])$
Possible inductive hypothesis:
$h \equiv (\mathrm{NUM}(r) \land \mathrm{BLACK}(s)) \Leftrightarrow \mathrm{REWARD}([r,s])$
Note: there are several possible inductive hypotheses.

Learning a Predicate
• Set E of objects (e.g., cards)
• Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
• Observable predicates A(x), B(x), … (e.g., NUM, RED)
• Training set: values of CONCEPT for some combinations of values of the observable predicates

A Possible Training Set
Ex. #  A      B      C      D      E      CONCEPT
1      True   True   False  True   False  False
2      True   False  False  False  False  True
3      False  False  True   True   True   False
4      True   True   True   False  True   True
5      False  True   True   False  False  False
6      True   True   False  True   True   False
7      False  False  True   False  True   False
8      True   False  True   False  True   True
9      False  False  False  True   True   False
10     True   True   True   True   False  True
Note that the training set does not say whether an observable predicate A, …, E is pertinent or not.

Learning a Predicate (continued)
Find a representation of CONCEPT in the form
$\mathrm{CONCEPT}(x) \Leftrightarrow S(A, B, \dots)$
where S(A,B,…) is a sentence built with the observable predicates, e.g.:
$\mathrm{CONCEPT}(x) \Leftrightarrow A(x) \land (\neg B(x) \lor C(x))$

Example set
• An example consists of the values of CONCEPT and the observable predicates for some object x
• An example is positive if CONCEPT is True, else it is negative
• The set X of all examples
  is the example set
• The training set is a subset of X

Hypothesis Space
• A hypothesis is any sentence h of the form $\mathrm{CONCEPT}(x) \Leftrightarrow S(A, B, \dots)$, where S(A,B,…) is a sentence built with the observable predicates
• The set of all hypotheses is called the hypothesis space H
• A hypothesis h agrees with an example if it gives the correct value of CONCEPT

Inductive Learning Scheme
[Figure: the training set D, a collection of positive (+) and negative (−) examples, inside the example set $X = \{[A, B, \dots, \mathrm{CONCEPT}]\}$; an inductive hypothesis h inside the hypothesis space $H = \{[\mathrm{CONCEPT}(x) \Leftrightarrow S(A, B, \dots)]\}$]

Size of Hypothesis Space
• n observable predicates
• $2^n$ entries in the truth table
• In the absence of any restriction (bias), there are $2^{2^n}$ hypotheses to choose from
• n = 6: $2^{64} \approx 2 \times 10^{19}$ hypotheses!

Rewarded Card Example (continued): Multiple Inductive Hypotheses
Background knowledge KB:
$((r=1) \lor \dots \lor (r=10)) \Rightarrow \mathrm{NUM}([r,s])$
$((r=J) \lor (r=Q) \lor (r=K)) \Rightarrow \mathrm{FACE}([r,s])$
$((s=S) \lor (s=C)) \Rightarrow \mathrm{BLACK}([r,s])$
$((s=D) \lor (s=H)) \Rightarrow \mathrm{RED}([r,s])$
Training set D:
$\mathrm{REWARD}([4,C]),\ \mathrm{REWARD}([7,C]),\ \mathrm{REWARD}([2,S]),\ \neg\mathrm{REWARD}([5,H]),\ \neg\mathrm{REWARD}([J,S])$
Possible inductive hypotheses:
$h_1 \equiv (\mathrm{NUM}(x) \land \mathrm{BLACK}(x)) \Leftrightarrow \mathrm{REWARD}(x)$
$h_2 \equiv (\mathrm{BLACK}([r,s]) \land \neg(r=J)) \Leftrightarrow \mathrm{REWARD}([r,s])$
$h_3 \equiv (([r,s]=[4,C]) \lor ([r,s]=[7,C]) \lor ([r,s]=[2,S])) \Leftrightarrow \mathrm{REWARD}([r,s])$
$h_4 \equiv (\neg([r,s]=[5,H]) \land \neg([r,s]=[J,S])) \Leftrightarrow \mathrm{REWARD}([r,s])$
All of them agree with all the examples in the training set.

Inductive Bias
• We need a system of preferences, called a bias, to compare possible hypotheses
Keep-It-Simple (KIS) Bias
• If a hypothesis is too complex, it may not be worth learning
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller
Examples:
• Use far fewer observable predicates than suggested by the training set
• Constrain the learned predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED
  and/or to have simple syntax (e.g., a conjunction of literals)
• If the bias allows only sentences S that are conjunctions of k ≪ n predicates picked from the n observable predicates, then the size of H is $O(n^k)$

Version Spaces
Idea: assume you are looking for a CONJUNCTIVE CONCEPT, e.g.:
• yes: spade A, club 7, club 9
• no: club 8, heart 5
• concept: odd and black
Now notice that the set of conjunctive concepts is partially ordered by specificity.
[Figure: part of the specificity ordering of conjunctive concepts, with nodes such as any card, black, odd black, spade, odd spade, 3 of spade]
At any point, keep the most specific and least specific conjuncts consistent with the data:
• Most specific: anything more specific misses some positive instances; it always exists (conjoin all OK conjunctions)
• Least specific: anything less specific admits some negative instances; it may not be unique (imagine all you know is that the club 4 is not OK: then odd black is OK and spade is OK, but black is not)
The idea is to gradually merge the least and most specific concepts as data comes in.

Version Spaces: Example
The training examples, obtained incrementally:
Card        In target set?
A-spade     yes
7-club      yes
8-heart     no
9-club      yes
5-heart     no
K-diamond   no
6-diamond   no
7-spade     yes
Step 0: the most specific concept (msc) is the empty set; the least specific concept (lsc) is the set of all cards.
Step 1: A-spade is found to be in the target set: msc = {A-spade}; lsc = set of all cards
Step 2: 7-club is found to be in the target set: msc = odd black cards; lsc = set of all cards
Step 3: 8-heart is not in the target set: msc = odd black cards; lsc = all odd cards OR all black cards
...

Predicate as a Decision Tree
The predicate $\mathrm{CONCEPT}(x) \Leftrightarrow A(x) \land (\neg B(x) \lor C(x))$ can be represented by the following decision tree:
A?
  False → False
  True → B?
    False → True
    True → C?
      True → True
      False → False
Example: a mushroom is poisonous iff it is yellow and small, or yellow, big and spotted
• x is a mushroom
• CONCEPT = POISONOUS
• A = YELLOW
• B = BIG
• C = SPOTTED

Decision Trees
What is a Decision Tree?
• It takes as input the description of a situation as a set of attributes (features) and outputs a yes/no decision (so it represents a Boolean function)
• Each leaf is labeled “positive” or “negative”, each node is labeled with an attribute (or feature), and each edge is labeled with a value for the feature of its parent node
Attribute-value language for examples:
• In many inductive tasks, especially learning decision trees, we need a representation language for examples
• Each example is a finite feature vector
• A concept is a decision tree where nodes are features

Decision Trees
Example: “is it a good day to play golf?”
A set of attributes and their possible values:
• outlook: sunny, overcast, rain
• temperature: cool, mild, hot
• humidity: high, normal
• windy: true, false
A particular instance in the training set might be:
<overcast, hot, normal, false>: play
In this case, the target class is a binary attribute, so each instance represents a positive or a negative example.

Using Decision Trees for Classification
Examples can be classified as follows, starting at the root:
1. Look at the example's value for the feature specified at the current node.
2. Move along the edge labeled with this value.
3. If you reach a leaf, return the label of the leaf.
4. Otherwise, repeat from step 1.
Example (a decision tree to decide whether to go play golf):
outlook?
  sunny → humidity?
    high → no
    normal → yes
  overcast → yes
  rain → windy?
    true → no
    false → yes

Classification: 3 Step Process
1. Model construction (Learning):
• Each record (instance) is assumed to belong to a predefined class, as determined by one of the attributes, called the class label
• The set of records used for construction of the model is called the training set
• The model is usually represented in the form of classification rules (IF-THEN statements) or decision trees
2. Model evaluation (Accuracy):
• Estimate the accuracy rate of the model based on a test set
• The known label of each test sample is compared to the result classified by the model
• Accuracy rate: percentage of test set samples correctly classified by the model
• The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is overly optimistic
3. Model use (Classification):
• The model is used to classify unseen instances (assigning class labels)
• Predict the value of an actual attribute

Memory-Based Reasoning
Basic idea: classify new instances based on their similarity to instances we have seen before
• also called “instance-based learning”
Simplest form of MBR: rote learning
• learning by memorization
• save all previously encountered instances; given a new instance, find the one from the memorized set that most closely “resembles” the new one; assign the new instance to the same class as this “nearest neighbor”
• more general methods try to find k nearest neighbors rather than just one
• but how do we define “resembles”?
MBR is “lazy”:
• it defers all of the real work until a new instance is obtained; no attempt is made to learn a generalized model from the training set
• less data preprocessing and model evaluation, but more work has to be done at classification time

MBR & Collaborative Filtering
Collaborative Filtering or “Social Learning”
• the idea is to give recommendations to a user based on the “ratings” of objects by other users
• usually assumes that the features in the data are objects of a similar kind (e.g., Web pages, music, movies, etc.)
• usually requires “explicit” ratings of objects by users based on a rating scale
• there have been some attempts to obtain ratings implicitly based on user behavior (mixed results; the problem is that implicit ratings are often binary)
Nearest Neighbors Strategy:
• Find similar users and predict a (weighted) average of their ratings
• We can use any distance or similarity measure to compute similarity among users (user ratings on items viewed as a vector)
• In the case of ratings, the Pearson correlation coefficient (r) is often used to compute correlations

MBR & Collaborative Filtering
Collaborative Filtering Example
• A movie rating system
• Ratings scale: 1 = “detest”; 7 = “love it”
• Historical DB of users includes ratings of movies by Sally, Bob, Chris, and Lynn
• Karen is a new user who has rated 3 movies, but has not yet seen “Independence Day”; should we recommend it to her?

        Star Wars  Jurassic Park  Terminator II  Independence Day
Sally   7          6              3              7
Bob     7          4              4              6
Chris   3          7              7              2
Lynn    4          4              6              2
Karen   7          4              3              ?
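The nearest-neighbors strategy can be tried on the table above. The sketch below is one possible reading, not the slides' own method: it assumes Pearson r is computed over the three movies Karen has rated, keeps the k = 2 most correlated users with r > 0, and predicts a correlation-weighted average of their “Independence Day” ratings. The names `pearson`, `predict`, `RATINGS` and the choice k = 2 are illustrative assumptions.

```python
from math import sqrt

# Ratings copied from the table, in the column order
# Star Wars, Jurassic Park, Terminator II, Independence Day.
RATINGS = {
    "Sally": [7, 6, 3, 7],
    "Bob":   [7, 4, 4, 6],
    "Chris": [3, 7, 7, 2],
    "Lynn":  [4, 4, 6, 2],
}
KAREN = [7, 4, 3]   # Karen's ratings for the first three movies only
TARGET = 3          # column index of "Independence Day"


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def predict(active, others, target, k=2):
    """Correlation-weighted average of the target rating over the k most
    similar users; users with r <= 0 are skipped (an assumption)."""
    n = len(active)  # assumes the active user rated the first n columns
    sims = {user: pearson(active, r[:n]) for user, r in others.items()}
    ranked = sorted(sims, key=sims.get, reverse=True)[:k]
    neighbours = [u for u in ranked if sims[u] > 0]
    if not neighbours:
        return None, [], sims
    num = sum(sims[u] * others[u][target] for u in neighbours)
    den = sum(sims[u] for u in neighbours)
    return num / den, neighbours, sims


prediction, neighbours, sims = predict(KAREN, RATINGS, TARGET)
print({u: round(r, 2) for u, r in sims.items()})
# {'Sally': 0.85, 'Bob': 0.97, 'Chris': -0.97, 'Lynn': -0.69}
print(neighbours, round(prediction, 2))
# ['Bob', 'Sally'] 6.47
```

Under these assumptions Bob and Sally are Karen's nearest neighbors and the predicted rating is about 6.5 on the 1 to 7 scale, which would favor recommending the film. Other common variants, such as averaging each neighbor's deviation from their own mean rating, give somewhat different numbers.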
Will Karen like “Independence Day”?

Clustering
• Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
• It helps users understand the natural grouping or structure in a data set
• Cluster: a collection of data objects that are “similar” to one another and thus can be treated collectively as one group, but that, as a collection, are sufficiently different from other groups
• Clustering = unsupervised classification: no predefined classes

Distance or Similarity Measures
Measuring distance:
• In order to group similar items, we need a way to measure the distance between objects (e.g., records)
• Note: distance = inverse of similarity
• Often based on the representation of objects as “feature vectors”

An Employee DB
ID  Gender  Age  Salary
1   F       27   19,000
2   M       51   64,000
3   M       52   100,000
4   F       33   55,000
5   M       45   45,000

Term Frequencies for Documents
     Doc1  Doc2  Doc3  Doc4  Doc5
T1   0     3     3     0     2
T2   4     1     0     1     2
T3   0     4     0     0     2
T4   0     3     0     3     3
T5   0     1     3     0     1
T6   2     2     0     0     4

Distance or Similarity Measures
Common distance measures:
• Manhattan distance: $\mathrm{dist}(X, Y) = \sum_i |x_i - y_i|$
• Euclidean distance: $\mathrm{dist}(X, Y) = \sqrt{\sum_i (x_i - y_i)^2}$
• Cosine similarity: $\mathrm{sim}(X, Y) = \dfrac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$, with $\mathrm{dist}(X, Y) = 1 - \mathrm{sim}(X, Y)$

What Is Good Clustering?
A good clustering will produce high-quality clusters in which:
• the intra-class (that is, intra-cluster) similarity is high
• the inter-class similarity is low
The quality of a clustering result also depends on both the similarity measure used by the method and its implementation.
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
The quality of a clustering result also depends on the definition and representation of cluster chosen.

Applications of Clustering
Clustering has wide applications in:
• Pattern recognition
• Spatial data analysis: create thematic maps in GIS by clustering feature spaces; detect spatial clusters and explain them in spatial data mining
• Image processing
• Market research
• Information retrieval: document or term categorization; information visualization and IR interfaces
• Web mining: cluster Web usage data to discover groups of similar access patterns; Web personalization

Learning by Discovery
One example: AM, by Doug Lenat at Stanford
• a mathematical discovery system
• inputs: set theory (union, intersection, etc.); “how to do mathematics” (based on a book by Polya), e.g., if f is an interesting function of two arguments, then f(x,x) is an interesting function of one, etc.
• speculated about what was interesting and made conjectures, etc.
What AM discovered:
• integers (as an equivalence relation on the cardinality of sets)
• addition (using the disjoint union of sets)
• multiplication
• primes: 1 was interesting, the function returning the cardinality of the set of divisors was interesting, etc.
• Goldbach's conjecture: “every even number greater than 2 is the sum of two prime numbers” (note that AM did not prove it, it just discovered that it was interesting)
Why was AM so successful?
• Connection between LISP and mathematics (mutations of small bits of LISP code are likely to be interesting)
• Doesn't extend to other domains
• Lessons from EURISKO (fleet game)

Explanation-Based Learning
Explanation-based learning (EBL) systems try to explain why each training instance belongs to the target concept. The resulting “proof” is then generalized and saved. If a new instance can be explained in the same manner as a previous instance, then it is also assumed to be a member of the target concept.
Like macro-operators, EBL systems never learn to solve a problem that they couldn't solve before (in principle). However, they can become much more efficient at problem-solving by reorganizing the search space.
One of the strengths of EBL is that the resulting “explanations” are typically easy to understand. One of its weaknesses is that it relies on a domain theory to generate the explanations.

Case-Based Learning
Case-based reasoning (CBR) systems keep track of previously seen instances and apply them directly to new ones. In general, a CBR system simply stores each “case” that it experiences in a “case base”, which represents its memory of previous episodes. To reason about a new instance, the system consults its case base and finds the most similar case that it has seen before. The old case is then adapted and applied to the new situation. CBR is similar to reasoning by analogy. Many people believe that much of human learning is case-based in nature.

Connectionist Algorithms
Connectionist models (also called neural networks) are inspired by the interconnectivity of the brain. Connectionist networks typically consist of many nodes that are highly interconnected. When a node is activated, it sends signals to other nodes so that they are activated in turn.
Using layers of nodes allows connectionist models to learn fairly complex functions.
Neural networks are loosely modeled after the biological processes involved in cognition:
1. Information processing involves many simple elements called neurons.
2. Signals are transmitted between neurons using connecting links.
3. Each link has a weight that controls the strength of its signal.
4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.
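The four points above, and the remark that layers allow more complex functions, can be illustrated with a small sketch. It assumes a threshold (step) activation and hand-chosen weights; no learning rule is shown, and the names `step`, `sigmoid`, `neuron` and `xor_net` are illustrative, not taken from the slides. The two-layer network computes XOR, which no single threshold unit can represent because XOR is not linearly separable.

```python
import math


def step(z):
    """Threshold activation: output 1 when the weighted input reaches 0."""
    return 1 if z >= 0 else 0


def sigmoid(z):
    """Smooth activation, often used when weights are learned by gradient descent."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=step):
    """One unit: each incoming signal is scaled by its link weight, the results
    are summed with a bias, and the activation function gives the output."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def xor_net(x1, x2):
    """Two-layer network with hand-picked (not learned) weights computing XOR."""
    h_or = neuron([x1, x2], [1, 1], -0.5)        # hidden unit: x1 OR x2
    h_and = neuron([x1, x2], [1, 1], -1.5)       # hidden unit: x1 AND x2
    return neuron([h_or, h_and], [1, -1], -0.5)  # output: OR but not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Swapping `step` for `sigmoid` makes each unit's output differentiable, which is what gradient-based training of such layered networks relies on; the fixed weights here only show what a layered network can represent.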