Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber The game • How did you reason to find the rule? Copyright R. Weber • According to Michalski (1983) A theory and methodology of inductive learning. In Machine Learning, chapter 4, “inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.” Learning • Rote Learning – Learn multiplication tables • Supervised Learning – Examples are used to help a program identify a concept – Examples are typically represented with attribute-value pairs – Notion of supervision originates from guidance from examples • Unsupervised Learning Copyright R. Weber – Human efforts at scientific discovery, theory formation Inductive Learning • Learning by generalization • Performance of classification tasks – Also categorization • Rules indicate categories • Goal: Copyright R. Weber – Characterize a concept Concept Learning is a Form of Inductive Learning •Learner uses: Copyright R. Weber –positive examples (instances ARE examples of a concept) and –negative examples (instances ARE NOT examples of a concept) Concept Learning Copyright R. Weber • Needs empirical validation • Dense or sparse data determine quality of different methods Validation of Concept Learning i • The learned concept should be able to correctly classify new instances of the concept Copyright R. Weber – When it succeeds in a real instance of the concept it finds true positives – When it fails in a real instance of the concept it finds false negatives Validation of Concept Learning ii • The learned concept should be able to correctly classify new instances of the concept Copyright R. Weber – When it succeeds in a counterexample it finds true negatives – When it fails in a counterexample it finds false positives Rule Learning Copyright R. Weber • Learning widely used in data mining • Version Space Learning is a search method to learn rules • Decision Trees Version Space ii A=1 B=1 B=.5 C=1 C=1 C=.5 C=.3 C=.3 C=.5 B=0 C=1 C=.3 C=.5 Outcome ?Outcome ?Outcome ?Outcome ?Outcome ?Outcome ?Outcome ? Copyright R. Weber • Creates tree that includes all possible combinations • Does not learn for rules with disjunctions • Incremental method, trains additional data without the need to retrain all data Decision trees Copyright R. Weber • Knowledge representation formalism • Represent mutually exclusive rules (disjunction) • A way of breaking up a data set into classes or categories • Classification rules that determine, for each instance with attribute values, whether it belongs to one or another class • Not incremental Decision trees consist of: -leaf nodes (classes) - decision nodes (tests on attribute values) -from decision nodes branches grow for each possible outcome of the test From Cawsey, 1997 Decision tree induction Copyright R. Weber • Goal is to correctly classify all example data • Several algorithms to induce decision trees: ID3 (Quinlan 1979) , CLS, ACLS, ASSISTANT, IND, C4.5 • Constructs decision tree from past data • Attempts to find the simplest tree (not guaranteed because it is based on heuristics) ID3 algorithm •From: – a set of target classes –Training data containing objects of more than one class Copyright R. Weber •ID3 uses test to refine the training data set into subsets that contain objects of only one class each •Choosing the right test is the key How does ID3 chooses tests Copyright R. Weber • Information gain or ‘minimum entropy’ • Maximizing information gain corresponds to minimizing entropy •Predictive features (good indicators of the outcome) Choosing tests Copyright R. Weber • Information gain or ‘minimum entropy’ • Maximizing information gain corresponds to minimizing entropy •Predictive features (good indicators of the outcome) Clustering • • • • Data analysis method applied to data Data should naturally possess groupings Goal: group data into clusters Resulting clusters are collections where objects within a cluster are similar to each other • Objects outside the cluster are dissimilar to objects inside • Objects from one cluster are dissimilar to objects in other clusters • Distance measures are used to compute similarity Copyright R. Weber • http://movielens.umn.edu/main Explanation-based learning Copyright R. Weber • Incorporates domain knowledge into the learning process • Feature values are assigned a relevance factor if their values are consistent with domain knowledge • Features that are assigned relevance factors are considered in the learning process Familiar Learning Task • • • • Copyright R. Weber Learn relative importance of features Goal: learn individual weights Commonly used in case-based reasoning Methods include a similarity measure to get feedback about verify their relative importance: feedback methods • Search methods: gradient descent • ID3 Copyright R. Weber Esteem CBR shell has option to use Gradient Descent to learn weights: Data mining tasks ii • Link analysis Rules: • Association generation • Relationships between entities • Deviation detection Copyright R. Weber • How things change over time, trends KDD applications • Fraud detection – Telecom (calling cards, cell phones) – Credit cards – Health insurance Loan approval Investment analysis Marketing and sales data analysis Copyright R. Weber Identify potential customers Effectiveness of sales campaign Store layout Text mining Copyright R. Weber The problem starts with a query and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query. Text mining applications • IBM Text Navigator – Cluster documents by content; – Each document is annotated by the 2 most frequently used words in the cluster; Copyright R. Weber • Concept Extraction (Los Alamos) – Text analysis of medical records; – Uses a clustering approach based on trigram representation; – Documents in vectors, cosine for comparison; Problem-solving method rule-based ES Reasoning type deductive reasoning case-based reasoning analogical reasoning inductive ML, NN inductive reasoning algorithms search