* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download I, and X
Business intelligence wikipedia , lookup
Commitment ordering wikipedia , lookup
Relational model wikipedia , lookup
Serializability wikipedia , lookup
Versant Object Database wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Database model wikipedia , lookup
Clusterpoint wikipedia , lookup
Expert Systems, 4th NF Association Rules CS 157B Prof. Sin-Min Lee COMPONENTS OF AN EXPERT SYSTEM KNOWLEDGE BASE KNOWLEDGE REPRESENTATION . PREDICATE CALCULUS . LISTS . FRAMES . SCRIPTS . SEMANTIC NETWORKS . PRODUCTION RULES Knowledge base Inference engine Expert system architecture Dynamic Data base People and Computers WHAT COMPUTERS CAN DO BETTER THAN PEOPLE . . . . Numerical Computation Information storage Repetitive Operations Computers are “Just Machines” The major application areas of AI Artificial Intelligence -GENERAL PROBLEM SOLVING -EXPERT SYSTEMS -NATURAL LANGUAGE PROCESSING -COMPUTER VISION -ROBOTICS -COMPUTER AIDED INSTRUCTION -AUTOMATIC PROGRAMMING - PLANNING AND DECISION SUPPORT Heuristics - Subconscious Heuristic - Conscious Heuristic COMPONENTS OF AN EXPERT SYSTEM KNOWLEDGE BASE PRODUCTION RULES ADVANTAGES OF USING A PRODUCTION SYSTEM TO REPRESENT KNOWLEDGE . Explanation . Modification . Understanding COMPONENTS OF AN EXPERT SYSTEM INFERENCE ENGINE BLIND SEARCH TECHNIQUES . BREADTH FIRST OR DEPTH FIRST SEARCH . FORWARD OR BACKWARD CHAINING COMPONENTS OF AN EXPERT SYSTEM INFERENCE ENGINE SEARCH TECHNIQUES . IN PRACTICE A COMBINATION OF THE TWO CHAINNING TECHNIQUES ARE USED . BIDIRECTIONAL SEARCH . PRUNING THE SEARCH TREE . USE HEURISTIC SEARCH TECHNIQUES COMPONENTS OF AN EXPERT SYSTEM INFERENCE ENGINE HEURISTIC SEARCH TECHNIQUES LIMIT THE SEARCH PROCESS IN AN EFFORT TO REACH THE SOLUTION FASTER COMPONENTS OF AN EXPERT SYSTEM INFERENCE ENGINE HEURISTIC SEARCH TECHNIQUES . BACKTRACKING . MINIMAX . STATIC EVALUATION An Expert system is a program which has a wide base of knowledge in a restricted domain, and uses complex inferential reasoning to perform tasks which a human expert could do. Wehbank 1983 DATA KNOWLEDGE BASE FACTS INTERPRETER SOLUTION BASIC ARCHITECTURE OF EXPERT SYSTEM PHYSIOLOGY OF EXPERT SYSTEM Two methods for triggering ( I.e putting into operation ) rules . Forward Chaining ( fact-directed reasoning ) A process of examining the left part of each rule in turn and applying the rule whenever the conditions for this past are found to hold, the process ends when it ceases to give any new fact. . Backward Chaining ( goal-directed reasoning ) The goal to be attained is given and the right parts of the rule are examined to find which of these include this goal, this sets up new goals which are subgoals for the original goal, and so on, until a known fact is established. FEATURES OF AN EXPERT SYSTEM THE PROGRAM SHOULD BE . DEVELOPED TO MEET A SPECIFIC NEED . EASY TO USE . EDUCATIONAL, WHEN APPROPRIATE . ABLE TO EXPLAIN ITS ADVICE . ABLE TO RESPOND TO SIMPLE QUESTIONS . ABLE TO LEARN NEW KNOWLEDGE KNOWLEDGE IN THE PROGRAM SHOULD BE EASY MODIFIED . CORRECT ERROR . ADD NEW INFORMATION EXAMPLE R5 R1 R3 R2 R4 N R6 R7 IF IF IF IF IF Z AND L THEN S A AND N THEN E D OR M THEN Z A THEN M Q AND (NOT W) AND (NOT Z) THEN IF L AND M THEN E IF B AND C THEN Q KNOWN FACTS (FACTS BASE) - (A,L) GOAL TO BE ESTABLISHED = E FORWARD CHAINING Regard the process as a sequence of iterations through the rules First iteration R2 and R6 are triggered ( A, L, M) Second iteration R3 is triggered ( A, L, M, E, Z) Third iteration R5 is triggered ( A, L, M, E, Z, S) The goal E is reached at this stage; as it happens, not further iteration is possible and the process stops here. B A TRUE R7 Q R1 C N NOT W R4 E GOAL NOT Z R6 L TRUE A TRUE M BACKWARD CHAINING Knowledge Representation 1234- Semantics Networks Object- Attribute - Value triplets Rules Frames Object - Attribute - Value Triplets -- Static knowledge vs. Instances -- Objects are ordered and related -- Handles uncertainty with a certainty factor O-A-V Frames -- a description of an object that contains slot for all the information associated with the object --slots may contain default values, pointers of other frames, set of rules, or procedures by which values may be obtained. Figure 4.12 Frame for Wilson’s coat COAT Slots: Entries: Owner Condition Condition of cuffs Condition of elbow Wilson Rumpled Worm, shiny Worm, shiny Number of arms Fabrics Pockets? Default: 2 Default: wool Default: yes Size If needed, find owner’s height and weight, and compare to Table X. Style If needed, find out collar; pockets; pockets and length; the look in table Y Inference Engine Functions: 1- Inference: Examines existing facts and rules, and add new facts when possible. 2- Control: Decides the order in which inference are made. Limitations -- confined to well defined problem, unable to reason over a field of expertise. -- cannot reason from axioms or general theory. -- do not learn, limited to use specific facts and heuristics that were “taught” by a human expert. -- lack common sense, cannot reason by analog -- performance deteriorates rapidly when problems extend beyond the narrow task they were designed to perform. Control Depth-First v.s Breadth-First Search: -- In a depth-first search, the inference engine takes every opportunity to produce a subgoal, searching for detail first in a depthfirst manner. -- A breadth-first search sweeps across all premises in a rule before digging of greater detail. Advantages -- do not display biased judgment -- do not jump to conclusion and then seek to maintain those conclusions in the face if disconfirming evidence. -- do not have “bad day” -- always attend to details, always systematically consider all of the possible alternatives -- equipped with thousands of heuristic rules, able to perform their specialized task better than a human expert. Multivalued Dependencies • There are database schemas in BCNF that do not seem to be sufficiently normalized • Consider a database classes(course, teacher, book) such that (c,t,b) classes means that t is qualified to teach c, and b is a required textbook for c • The database is supposed to list for each course the set of teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it). Dependencies course database database database database database database operating systems operating systems operating systems operating systems teacher Avi Avi Hank Hank Sudarshan Sudarshan Avi Avi Jim Jim classes book DB Concepts Ullman DB Concepts Ullman DB Concepts Ullman OS Concepts Shaw OS Concepts Shaw • There are no non-trivial functional dependencies and therefore the relation is in BCNF • Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted (database, Sara, DB Concepts) (database, Sara, Ullman) Dependencies • Therefore, it is better to decompose classes into: course teacher database database database operating systems operating systems Avi Hank Sudarshan Avi Jim teaches course book database database operating systems operating systems DB Concepts Ullman OS Concepts Shaw text We shall see that these two relations are in Fourth Normal Form (4NF) What are Association Rules? - Techniques used to detect associations or relationships between elements in large data sets. - Show value conditions that occur frequently in a data set. - Association Rules are basically if-then rules supported by data - Application of this is called Market Basket Analysis What are Association Rules? • Finding frequent patterns, correlations, or associations among a set of items or objects in transaction databases, relational databases, or other information repositories Applications of Association Rules • • • • • Market Basket Analysis Classification Clustering Cross-marketing Loss-leader Analysis Characteristics of Association Rules • Consists of a set of items, the rule body, leading to another item, the rule head XY • Association Rules relate the rule body X to the rule head Y Itemsets • itemset = a set of items X = {apple, orange} is an itemset • k-itemset = a set of k items X = {apple, orange, banana} = 3itemset Rules • Let I = {i1, i2, …, im} be a set of items • Let transaction t be a set of items where t I • Let T be the Transaction Database or set of transactions where T = {t1, t2, …, tn}. Rules • Transaction t contains itemset X which belongs to I, a set of items • Formal association rule is defined as: X Y, where X, Y I, and X Y = Support • Support is the percentage of transactions that support an association rule. • It is the percentage of transactions that contain product X and product Y Support = Probability(X Y). Confidence • Confidence is the strength or reliability of an association rule • It is the probability that if there is X in a transaction, then Y will also be present. Confidence = Support (X,Y) / Support (X) or Confidence = Probability (Y | X) Thresholds • Minimum Support Threshold and Minimum Confidence Threshold • The association rule is more valuable if they satisfy these minimum values • The higher the percentage, the more useful the data is Uses of Association Rules? • • • • • • Retail Shopping Credit Card transactions Online purchases Medical patient histories Banking services Insurance claims Market Basket Analysis • Modeling technique based on theory that if you buy one group of items, you are also likely to buy another group of items • In consumer behavior, most purchases are bought on impulse • Market Basket Analysis seeks to find a relationship between purchases Uses of Market Basket Analysis • Identify unexpected shopping patterns • Targeted marketing towards specific types of people • Predicting customer response rates to marketing campaigns • Distinguish between profitable and unprofitable customers Maximizing Profitability - Arrangement of items in a store - Planning of specific sales during times of the year - Pricing Policy of certain goods Example -A market has 100 transactions -15 transactions contain product X -Of the 15 transactions, 5 transactions contain product Y Support = 5/100 or 5% Confidence = 5/15 or 33.3% Example • Milk + Bread => Butter X = Milk + Bread Y = Butter “Milk + Bread” is the Rule Body and “Butter” is the Rule Head Example - 5% of all transactions will contain combination of Milk, Bread, and Butter. -If customers buy Milk and Bread, there is a 33.3% possibility that they will buy Butter Support = 5% Confidence = 33.3% Mining Algorithms • Analyze a set of data, looking for patterns or trends • They differ in the strategy and data structure used • Their efficiency and memory requirements also differ Apriori Algorithm • Most classic and well-known algorithm for association rules • Useful in databases that contain transactions • Principle: Any subset of a frequent itemset must be frequent if {A,B} is a frequent itemset, {A} and {B} must also be frequent Apriori Algorithm Method • Count 1-itemsets, then the 2itemsets, then the 3-itemsets and so on • When counting k-itemsets, only consider those itemsets where all subsets of length k have been determined to be frequent in the previous step • Non frequent itemsets are pruned Apriori Algorithm Diagram null A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD Pruned supersets ABCE ABDE ABCDE ACDE BCDE