Last Lecture
• Data Mining Techniques
  – Genetic Algorithms
  – Artificial Neural Networks

Today
• Data Mining Techniques
  – Bayesian statistics and classifiers
  – Artificial Intelligence

Bayesian Statistics
• Contrary to the frequentist approach, Bayesian statistics measures degrees of belief
• Degrees of belief are calculated by starting with prior beliefs and updating the probabilities in the face of evidence, using Bayes' theorem
• Priors can be estimated from experience, from other methods, or even guessed
  – For this reason it is also called subjective probability

Joint/Conditional Probability
• P(A, B): the joint probability distribution, the probability of both A and B happening
• P(A|B): the conditional probability of A, given that B has already happened
• The two are related by P(A|B) = P(A, B) / P(B), hence P(A, B) ≤ P(A|B)

Bayes Classifier
• A probabilistic framework for solving classification problems
• From the definition of conditional probability, P(C|A) = P(A, C) / P(A) and P(A|C) = P(A, C) / P(C)
• Combining the two gives Bayes' theorem:
  P(C|A) = P(A|C) P(C) / P(A)
  where P(C|A) is the posterior, P(A|C) the likelihood, P(C) the prior, and P(A) the evidence

Example of Bayes' Theorem
• Given:
  – A doctor knows that meningitis (M) causes a stiff neck (S) 50% of the time
  – The prior probability of any patient having meningitis, P(M), is 1/50,000
  – The prior probability of any patient having a stiff neck, P(S), is 1/20
• If a patient has a stiff neck, what is the probability that he/she has meningitis?
  P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002

Bayesian Classifiers
• Consider each attribute and the class label as random variables
• Given a record with attributes (A1, A2, …, An)
  – The goal is to predict the class C
  – Specifically, we want to find the value of C that maximizes P(C|A1, A2, …, An)
• Can we estimate P(C|A1, A2, …, An) directly from data?
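The stiff-neck example above can be verified with a few lines of Python (a minimal sketch; the function name is our own, not from the slides):

```python
# Bayes' theorem, P(H|E) = P(E|H) * P(H) / P(E),
# applied to the meningitis example.
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

p_s_given_m = 0.5        # P(S|M): meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000         # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s)       # 0.0002
```

Note that the posterior is still tiny: the evidence (a common symptom) only raises the rare prior by a factor of ten.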
Bayesian Classifiers
• Approach:
  – Compute the posterior probability P(C|A1, A2, …, An) for all values of C using Bayes' theorem:
    P(C|A1, A2, …, An) = P(A1, A2, …, An|C) P(C) / P(A1, A2, …, An)
  – Choose the value of C that maximizes P(C|A1, A2, …, An)
  – Equivalent to choosing the value of C that maximizes P(A1, A2, …, An|C) P(C), since the denominator is the same for every class
• How do we estimate P(A1, A2, …, An|C)?

Naïve Bayes Classifier
• A naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naïve) independence assumptions; in other words, an independent feature model
• It assumes independence among the attributes Ai when the class C is given:
  – P(A1, A2, …, An|C) = P(A1|Cj) P(A2|Cj) … P(An|Cj)
  – We can estimate P(Ai|Cj) for all Ai and Cj
  – A new point is classified as Cj if the numerator of the Bayes equation, P(Cj) Π P(Ai|Cj), is maximal (Maximum A Posteriori, MAP)

Naïve Bayes Probability Model
• Graphical illustration
  – A class node C at the root; we want P(C|A1, …, An)
  – Evidence nodes A1, …, An, the observed attributes/features, as leaves
  – Conditional independence between all evidence nodes given C
  (Figure: node C with arrows to leaf nodes A1, A2, …, An)

How to Estimate Probabilities from Data?
Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Refund and Marital Status are categorical attributes; Taxable Income is continuous; Evade is the class.)

• Class prior: P(C) = Nc/N
  – e.g., P(No) = 7/10, P(Yes) = 3/10
• For discrete attributes: P(Ai|Ck) = |Aik|/Nc
  – where |Aik| is the number of instances that have attribute value Ai and belong to class Ck
  – Examples: P(Status=Married|No) = 4/7, P(Refund=Yes|Yes) = 0

Example of Naïve Bayes Classifier
Given a test record: X = (Refund=No, Married, Income=120K)

Naïve Bayes classifier, estimated from the table:
  P(Refund=Yes|No) = 3/7        P(Refund=No|No) = 4/7
  P(Refund=Yes|Yes) = 0         P(Refund=No|Yes) = 1
  P(Marital Status=Single|No) = 2/7
  P(Marital Status=Divorced|No) = 1/7
  P(Marital Status=Married|No) = 4/7
  P(Marital Status=Single|Yes) = 2/7
  P(Marital Status=Divorced|Yes) = 1/7
  P(Marital Status=Married|Yes) = 0
For taxable income (modeled as normally distributed):
  If class=No: sample mean = 110, sample variance = 2975
  If class=Yes: sample mean = 90, sample variance = 25

P(X|Class=No) = P(Refund=No|No) × P(Married|No) × P(Income=120K|No)
              = 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Yes) × P(Married|Yes) × P(Income=120K|Yes)
               = 1 × 0 × 1.2 × 10^-9 = 0

Since P(X|No) P(No) > P(X|Yes) P(Yes), we have P(No|X) > P(Yes|X) => Class = No

Naïve Bayes Classifier
• With little data, a problem is that if one of the conditional probabilities is zero, the entire expression becomes zero
• To avoid this problem, the conditional probability estimates can be computed in different ways:
  Original:   P(Ai|C) = Nic / Nc
  Laplace:    P(Ai|C) = (Nic + 1) / (Nc + c)
  m-estimate: P(Ai|C) = (Nic + m p) / (Nc + m)
  where c is the number of classes, p is a prior probability, and m is a parameter

Example of Naïve Bayes Classifier

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

Test record: (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no), Class = ?

A: attributes, M: mammals, N: non-mammals

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027

P(A|M) P(M) > P(A|N) P(N) => Mammals

Naïve Bayes (Summary)
• Robust to isolated noise points (they are averaged out)
• Handles missing values by ignoring the instance during probability estimation
• Robust to irrelevant attributes
• The independence assumption may not hold for some attributes, but in spite of this naïve Bayes has shown good performance
  – We can also use techniques that model dependencies, such as Bayesian Belief Networks (BBN), also called Bayesian Networks

Concepts and Definitions of Artificial Intelligence
• Artificial intelligence (AI): the subfield of computer science concerned with symbolic reasoning and problem solving
• Characteristics of AI
  – Symbolic processing
    • Numeric versus symbolic
    • Algorithmic versus heuristic
  – Heuristics: informal, judgmental knowledge of an application area that constitutes the "rules of good judgment" in the field. Heuristics also encompass the knowledge of how to solve problems efficiently and effectively.
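The tax-evasion computation above can be reproduced with a short script (a sketch, not part of the original slides); the Gaussian density models the continuous Taxable Income attribute using the class-conditional sample mean and variance:

```python
import math

# Naive Bayes worked example for the test record
# X = (Refund=No, Marital Status=Married, Taxable Income=120K).

def gaussian(x, mean, var):
    """Normal density, used for the continuous Taxable Income attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Conditional probabilities estimated from the 10-record table above.
p_x_no = (4/7) * (4/7) * gaussian(120, mean=110, var=2975)   # ~ 0.0024
p_x_yes = 1.0 * 0.0 * gaussian(120, mean=90, var=25)         # = 0 (no Married|Yes instances)

p_no, p_yes = 7/10, 3/10                                     # class priors
print(p_x_no * p_no > p_x_yes * p_yes)                       # True -> predict Class = No
```

The zero factor for P(Married|Yes) is exactly the problem that the Laplace and m-estimate corrections above are designed to avoid.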
Concepts and Definitions of Artificial Intelligence
• Characteristics of artificial intelligence
  – Inferencing
    • Reasoning capabilities that can build higher-level knowledge from existing heuristics
  – Machine learning
    • Learning capabilities that allow systems to adjust their behavior and react to changes in the outside environment
  – Knowledge-based systems (KBS)
    • Technologies that use qualitative knowledge rather than mathematical models to provide the needed support

AI History

Evolution of Artificial Intelligence
(Figure: a timeline covering reasoning strategies, knowledge representation, search heuristics, GA and evolutionary computing, expert systems/KBS, ANN and fuzzy logic, GA/ANN/Bayesian networks, and robotics, data mining, and business intelligence.)
• Stanley: autonomous vehicle, winner of the 2005 DARPA Grand Challenge; it drove 142 miles through a desert in less than 7 hours. It relied on machine learning and probabilistic reasoning, a laser vision system, and an onboard computer system.

The Artificial Intelligence Field
• Applications of artificial intelligence
  – Expert system (ES): a computer system that applies reasoning methodologies to knowledge in a specific domain to render advice or recommendations, much like a human expert. A computer system that achieves a high level of performance in task areas that, for human beings, require years of special education and training.

Break

Basic Concepts of Expert Systems (ES)
• The basic concepts of ES include:
  – How to determine who the experts are
  – How expertise can be transferred from a person to a computer (knowledge engineering). This is the biggest challenge.
  – How the system works

Basic Concepts of Expert Systems (ES)
• Expert: a human being who has developed a high level of proficiency in making judgments in a specific, usually narrow, domain
• Expertise: the set of capabilities that underlies the performance of human experts, including extensive domain knowledge, heuristic rules that simplify and improve approaches to problem solving, metaknowledge and metacognition, and compiled forms of behavior that afford great economy in skilled performance

Applications of ES
• Development environment: used by builders. It includes the knowledge base, the inference engine, knowledge acquisition, and improving reasoning capability. The knowledge engineer and the expert are considered part of this environment.
• Consultation environment: used by a nonexpert to obtain expert knowledge and advice. It includes the workplace, inference engine, explanation facility, recommended action, and user interface.

Structure of ES
• The three major components of an ES are:
  – Knowledge base
  – Inference engine
  – User interface
• An ES may also contain:
  – Knowledge acquisition subsystem
  – Blackboard (workplace)
  – Explanation subsystem (justifier)
  – Knowledge refining system

Structure of ES
• Knowledge base: a collection of facts, rules, and procedures organized into schemas.
The assembly of all the information and knowledge about a specific field of interest
• Inference engine: the part of an expert system that actually performs the reasoning function
• User interface: the parts of a computer system that interact with users, accepting commands from the keyboard and displaying the results generated by other parts of the system

Rule-Based System Architecture
• Condition–action rules:
  R1: IF hot AND smoky THEN ADD fire
  R2: IF alarm_beeps THEN ADD smoky
  R3: IF fire THEN ADD switch_on_sprinklers
• Database of facts: alarm_beeps, hot
• A control scheme (interpreter) performs the inference over the rules and the facts

How ES Work: Inference Mechanisms
• Knowledge representation and organization
  – Expert knowledge must be represented in a computer-understandable format and organized properly in the knowledge base
  – Different ways of representing human knowledge include:
    • Production rules (the most common, and the only one discussed here)
    • Semantic networks
    • Logic statements

How ES Work: Inference Mechanisms
• The inference process: inference is the process of chaining multiple rules together based on the available data.
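The rule base R1–R3 above can be sketched as a minimal forward-chaining interpreter (the data structures below are illustrative assumptions, not from the slides):

```python
# Each rule is (set of condition facts, fact to ADD) -- the sprinkler rule base.
rules = [
    ({"hot", "smoky"}, "fire"),            # R1
    ({"alarm_beeps"}, "smoky"),            # R2
    ({"fire"}, "switch_on_sprinklers"),    # R3
]

facts = {"alarm_beeps", "hot"}             # initial database of facts

# Data-driven (forward) chaining: keep firing rules whose conditions
# are all satisfied until no rule adds a new fact.
changed = True
while changed:
    changed = False
    for conditions, action in rules:
        if conditions <= facts and action not in facts:
            facts.add(action)
            changed = True

print(sorted(facts))
# ['alarm_beeps', 'fire', 'hot', 'smoky', 'switch_on_sprinklers']
```

Starting from alarm_beeps and hot, R2 adds smoky, which enables R1 to add fire, which in turn enables R3 to switch on the sprinklers.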
• Methods:
  – Forward chaining: a data-driven search in a rule-based system
  – Backward chaining: a search technique (employing IF-THEN rules) used in production systems that begins with the action clause of a rule and works backward through a chain of rules in an attempt to find a verifiable set of condition clauses

Development of ES
• Defining the nature and scope of the problem
  – Rule-based ES are appropriate when the nature of the problem is qualitative, knowledge is explicit, and experts are available to solve the problem effectively and provide their knowledge
• Identifying proper experts
  – A proper expert should have a thorough understanding of:
    • Problem-solving knowledge
    • The role of ES and decision support technology
    • Good communication skills

Development of ES
• Acquiring knowledge
  – Knowledge engineer: an AI specialist responsible for the technical side of developing an expert system. The knowledge engineer works closely with the domain expert to capture the expert's knowledge in a knowledge base
  – Knowledge engineering (KE): the engineering discipline in which knowledge is integrated into computer systems to solve complex problems normally requiring a high level of human expertise

Development of ES
• Selecting the building tools
  – General-purpose development environments (e.g., Prolog, C++)
  – Expert system shells (e.g., Prolog Expert System PESS, JavaDON, MYCIN, JESS): a computer program that facilitates relatively easy implementation of a specific expert system; analogous to a DSS generator

A Fuzzy Expert System
• A service centre keeps spare parts and repairs failed ones
• A customer brings a failed item and receives a spare of the same type
• Failed parts are repaired, placed on the shelf, and thus become spares
• The objective here is to advise the manager of the service centre on decision policies that keep the customers satisfied

Process of developing a fuzzy expert system:
1. Specify the problem and define linguistic variables.
2.
Determine fuzzy sets.
3. Elicit and construct fuzzy rules.
4. Encode the fuzzy sets, fuzzy rules, and the procedures that perform fuzzy inference into the expert system.
5. Evaluate and tune the system.

Bayesian Reasoning in ES
• Suppose all rules in the knowledge base are represented in the following form:
  IF E is true THEN H is true {with probability p}
• This rule implies that if event E occurs, then the probability that event H will occur is p. In expert systems, H usually represents a hypothesis and E denotes evidence to support this hypothesis.
• Bayes' rule gives:
  p(H|E) = p(E|H) p(H) / [p(E|H) p(H) + p(E|¬H) p(¬H)]
• A problem is that experts need to provide the priors and conditionals. Psychological research shows that humans cannot elicit probability values consistent with the Bayesian rules.

Certainty Factors Theory and Evidential Reasoning
• Certainty factors theory is a popular alternative to Bayesian reasoning
• A certainty factor (cf) is a number that measures the expert's belief. The maximum value of the certainty factor is, say, +1.0 (definitely true) and the minimum −1.0 (definitely false). For example, if the expert states that some evidence is almost certainly true, a cf value of 0.8 would be assigned to this evidence.

Uncertain Terms and Their Interpretation in MYCIN
• MYCIN is a popular medical expert system.

  Term                    Certainty Factor
  Definitely not          −1.0
  Almost certainly not    −0.8
  Probably not            −0.6
  Maybe not               −0.4
  Unknown                 −0.2 to +0.2
  Maybe                   +0.4
  Probably                +0.6
  Almost certainly        +0.8
  Definitely              +1.0

Certainty Factors Theory and Evidential Reasoning in ES
• In expert systems with certainty factors, the knowledge base consists of a set of rules with the following syntax:
  IF <evidence> THEN <hypothesis> {cf}
  where cf represents belief in hypothesis H given that evidence E has occurred.
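The Bayesian rule-update formula above can be checked numerically; the probability values in the example are assumed for illustration, not taken from the slides:

```python
# p(H|E) = p(E|H) p(H) / (p(E|H) p(H) + p(E|~H) p(~H)),
# the form used for IF E THEN H rules in Bayesian expert systems.
def rule_posterior(p_e_given_h, p_e_given_not_h, p_h):
    numerator = p_e_given_h * p_h
    return numerator / (numerator + p_e_given_not_h * (1 - p_h))

# Illustrative expert estimates: the evidence is four times more likely
# when the hypothesis holds than when it does not, with a 0.5 prior.
print(rule_posterior(p_e_given_h=0.8, p_e_given_not_h=0.2, p_h=0.5))  # ~ 0.8
```

The formula is just Bayes' theorem with the evidence term expanded over H and ¬H, which is why the expert must supply both conditionals and the prior.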
Applications of ES
• Credit analysis systems
• Pension fund advisors
• Automated help desks
• Homeland security systems
• Marketing surveillance systems
• Business process reengineering systems
• Finance
• Data processing
• Human resources
• Manufacturing
• Business process automation
• Health care management

Benefits, Limitations, and Success Factors of ES
• Benefits of ES
  – Enhancement of problem solving and decision making
  – Improved decision-making processes
  – Improved decision quality
  – Ability to solve complex problems
  – Knowledge transfer to remote locations
  – Enhancement of other information systems
  – Capture of scarce expertise
  – Flexibility
  – Operation in hazardous environments
  – Accessibility to knowledge and help desks
  – Ability to work with incomplete or uncertain information
  – Provision of training

Benefits, Limitations, and Success Factors of ES
• Problems with ES
  – Knowledge is not always readily available
  – It can be difficult to extract expertise from humans
  – The vocabulary that experts use to express facts and relations is often limited and not understood by others
  – The approach of each expert to a situation assessment may be different yet correct
  – It is difficult to abstract good situational assessments when under time pressure
  – Users of ES have natural cognitive limits
  – ES work well only within a narrow domain of knowledge
  – Most experts have no independent means of checking whether their conclusions are reasonable

Hybrid Systems
• Fuzzy logic
  – A methodology that allows us to design systems described with imprecise information
  – In some problems it is not easy to get the optimum set of rules
• Neural networks
  – Can learn models with arbitrary precision
  – Are fundamentally black boxes
• Genetic algorithms
  – A brute-force optimization technique
• Why not combine these techniques to take advantage of the features they provide?
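As an illustration of the fuzzy-logic component mentioned above, the sketch below assumes simple piecewise-linear membership functions for temperature and a weighted-average defuzzification over assumed crisp valve settings; none of the numeric shapes come from the slides:

```python
# Assumed membership functions for the linguistic variable "temperature".
def mu_cold(t):
    return max(0.0, min(1.0, (15 - t) / 10))   # fully cold at <=5, not cold at >=15

def mu_warm(t):
    return max(0.0, 1 - abs(t - 16) / 10)      # peaks at 16, zero at 6 and 26

def mu_hot(t):
    return max(0.0, min(1.0, (t - 20) / 10))   # not hot at <=20, fully hot at >=30

t = 8.0                                        # measured temperature (fuzzification input)
w = [mu_cold(t), mu_warm(t), mu_hot(t)]        # rule strengths: cold->open, warm->half, hot->close
valve = [1.0, 0.5, 0.0]                        # assumed crisp value of each output set

# Weighted-average defuzzification to a single crisp valve setting.
crisp = sum(wi * vi for wi, vi in zip(w, valve)) / sum(w)
print(w, crisp)                                # degrees ~ [0.7, 0.2, 0.0] -> valve mostly open
```

Fuzzification turns the crisp temperature into degrees of membership, the rule base maps them to output sets, and defuzzification collapses the result back into one crisp control value.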
Hybrid Systems
• Fuzzy inference pipeline: fuzzification → inference (rule base) → defuzzification
  – Rule base:
    IF temp is cold THEN valve is open
    IF temp is warm THEN valve is half
    IF temp is hot THEN valve is close
  – For the measured temperature in the illustration, fuzzification yields µcold = 0.7, µwarm = 0.2, µhot = 0.0; defuzzification of the output sets (open, half, close) produces a crisp valve setting
• The shape of the membership functions may be learned by an ANN

A Neuro-Fuzzy System
• A fuzzy system trained by heuristic learning techniques derived from neural networks
• Can be viewed as a 3-layer neural network with fuzzy weights and special activation functions
• Is always interpretable as a fuzzy system
• Uses constrained learning procedures
• Is a function approximator (classifier, controller)

Hybrid Systems
• Neuro-fuzzy system (illustration)

Hybrid Systems
• Genetic algorithms and neural networks
  – The genetic learning method can perform rule discovery in large databases, with the discovered rules fed into a conventional ES or some other intelligent system
  – To integrate genetic algorithms with neural network models, use a genetic algorithm to search for the potential weights associated with the network connections
  – A good genetic learning method can significantly reduce the time and effort needed to find the optimal neural network model

Summary
• What were the main points of the lecture?