Data Mining Lecture 3

Course Syllabus
• Course topics:
• Introduction (Week 1-Week 2)
– What is Data Mining?
– Data Collection and Data Management Fundamentals
– The Essentials of Learning
– The Emerging Needs for Different Data Analysis Perspectives
• Data Management and Data Collection Techniques for Data Mining Applications (Week 3-Week 4)
– Data Warehouses: gathering raw data from relational databases and transforming it into information
– Information Extraction and Data Processing Techniques
– Data Marts: the need for building highly specialized data stores for data mining applications

Week 3 - Reminder - Data to Knowledge Pyramid
[Figure: pyramid; the potential to support business decisions increases toward the top. Layers, top to bottom:]
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts: OLAP, MDA
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles shown beside the pyramid, top to bottom: End User, Business Analyst, Data Analyst, DBA

Week 2 - Reminder - Data Mining Perspective to Knowledge Discovery
[Figure: Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge]
Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press

Week 3 - Reminder - Essentials of Learning
Learning?
• Can we formalize it?
• Is it just a chemical activation?
• Is it memorization?
• Is it the continuous connecting and disconnecting of nodes on a dynamically changing brain network topology?

Week 3 - Reminder - Essentials of Learning
The Artificial Intelligence View:
• Learning is central to human knowledge and intelligence, and essential for building intelligent machines.
• Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial.
For example, we humans are not born with the ability to understand language; we learn it. It therefore makes sense to have computers learn language instead of trying to program it all in.

Week 3 - Reminder - Essentials of Learning
The Software Engineering View:
• Machine Learning allows us to program computers by example, which can be easier than writing code the traditional way.
The Stats View:
• Machine Learning is the marriage of computer science and statistics.
• Computational techniques are applied to statistical problems.
• Machine Learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems.
• Machine Learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).

Week 3 - Essentials of Learning
Informal Learning Problem Definition:
• A computer program that improves its performance at some task through experience.
Formal Learning Problem Definition:
• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Week 3 - Essentials of Learning
A chess learning problem:
• Task T: playing chess
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself
A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications
A robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance traveled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering commands recorded while observing a human driver

Week 3 - Essentials of Learning
Attributes of Experience
• Learning from direct training examples consisting of states and the correct move for each (supervised learning)
– CHESS PROBLEM: providing individual chess board states and the correct move for each
• Learning from indirect information consisting of the moves and the final outcomes of these moves (unsupervised learning)
– CHESS PROBLEM: providing sequences of moves and the final outcomes of various games played
– Causality: the credit assignment problem

Week 3 - Essentials of Learning
Attributes of Experience
• The degree to which the learner controls the sequence of training examples
CHESS PROBLEM
– The learner may rely on the teacher to select informative board states and to provide the correct move for each.
– The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
– The learner may have complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.

Week 3 - Essentials of Learning
Attributes of Experience
• How well the training experience represents the distribution of examples over which the final system performance P must be measured
• Important: most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Despite our need to make this assumption in order to obtain theoretical results, it is important to keep in mind that this assumption must often be violated in practice.

Week 3 - Essentials of Learning
Central Limit Theorem
• The Central Limit Theorem states that the sum of a large number of independent, identically distributed random variables approximately follows a Normal distribution.
• Consider a set of independent, identically distributed random variables $Y_1, \ldots, Y_N$, governed by an arbitrary probability distribution with mean $\mu$ and finite variance $\sigma^2$.
• For large $N$, the sample mean $\bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N} Y_i$ approximately follows a Normal distribution with mean $\mu$ and variance $\sigma^2/N$. This holds even if we do not know the distribution of the individual $Y_i$, so we can still compute the distribution of $\bar{Y}_N$.
• A common rule of thumb is that we can use the Normal approximation when $N \geq 30$.

Week 3 - Essentials of Learning
Operational Definition of Learning Function: Target Function
• Given the training experience and the target definition, decide on the learning architecture by considering correctness, applicability, and performance.
CHESS PROBLEM
• Ideal Target Function: ChooseMove : B -> M, to indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
• What if only indirect training experience is available to our system?

Week 3 - Essentials of Learning
Operational Definition of Learning Function: Target Function
CHESS PROBLEM
• Operational Target Function: V : B -> R, to denote that V maps any legal board state from the set B to some real value.
• V assigns higher scores to better board states, and we then use it to select the best move from any current board position.
• This can be accomplished by generating the successor board state produced by every legal move, then using V to choose the best successor state and therefore the best legal move.
• Is it really operational? Not so! Computing the true value of a board state requires searching all the way down to the end of the game, so it is computationally not operational.
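The move-selection step described for the operational target function V can be sketched in a few lines. This is only an illustration under stated assumptions: the names `choose_move`, `legal_moves`, `apply_move` and `v` are hypothetical, and the toy "game" (an integer board where a move adds 1, 2 or 3) stands in for chess so the example stays runnable.

```python
from typing import Callable, Iterable, TypeVar

Board = TypeVar("Board")
Move = TypeVar("Move")


def choose_move(
    board: Board,
    legal_moves: Callable[[Board], Iterable[Move]],
    apply_move: Callable[[Board, Move], Board],
    v: Callable[[Board], float],
) -> Move:
    """Return the legal move whose successor board state scores highest under v."""
    moves = list(legal_moves(board))
    if not moves:
        raise ValueError("no legal moves from this board state")
    # Generate the successor state for every legal move and keep the best one.
    return max(moves, key=lambda m: v(apply_move(board, m)))


# Toy stand-in for a game (an assumption for illustration, not chess):
# the board is an integer, a move adds 1, 2 or 3, and v prefers boards near 10.
def toy_legal_moves(board):
    return [1, 2, 3]


def toy_apply_move(board, move):
    return board + move


def toy_v(board):
    return -abs(10 - board)


best = choose_move(8, toy_legal_moves, toy_apply_move, toy_v)
print(best)
```

Here `toy_v` is cheap to evaluate, which is exactly what an operational V needs to be; the difficulty raised in the slides is that the ideal V for chess cannot be computed this cheaply and has to be approximated.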
Week 3 - Essentials of Learning
Operational Definition of Learning Function: Target Function
CHESS PROBLEM
• Choosing a complex target function brings expressibility, but it also brings a performance bottleneck.
• It also brings an urgent need for many more training examples to learn from.
• The real issue is choosing the operational target function -> MODELING -> function approximation.

Week 3 - Essentials of Learning
Importance of Target Function
• The target function determines the size of our hypothesis space (solution space).
• What if the needed solution cannot be represented in our hypothesis space?
• Suppose we had a perfect hypothesis space H that can represent every teachable function, so that expressibility is not our problem. Are we OK with that H?
• NO: we are now completely unable to generalize beyond the observed examples.

Week 3 - Essentials of Learning
Importance of Target Function
• If we need generalization and applicability to unseen instances, we must choose a biased target function (a generalizable target function).
• A learner that makes no a priori assumptions regarding the identity of the target function has no rational basis for classifying any unseen instances.
• What we wish to capture here is the policy by which the learner generalizes beyond the observed training data, to infer the classification of new instances.

Week 3 - Essentials of Learning
Inductive Bias
• target concept = target function
• Formal Definition: [not preserved in the source]

Week 3 - Essentials of Learning
Inductive Bias Examples
• ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance.
– Inductive bias: none (bias-free)
• CANDIDATE-ELIMINATION algorithm: New instances are classified only in the case where all members of the current version space (the subset of hypotheses consistent with our training examples) agree on the classification. Otherwise, the system refuses to classify the new instance.
– Inductive bias: the target concept is contained in the hypothesis space
• FIND-S: This algorithm finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances.
– Inductive bias: the target concept is contained in the hypothesis space, and the most specific consistent hypothesis represents it

Week 3 - Essentials of Learning
Search Bias vs Restriction Bias
• ID3 searches a complete hypothesis space (i.e., one capable of expressing any finite discrete-valued function). It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met (e.g., until it finds a hypothesis consistent with the data). Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy. Its hypothesis space introduces no additional bias. (SEARCH BIAS, PREFERENCE BIAS)
• The CANDIDATE-ELIMINATION algorithm searches an incomplete hypothesis space (i.e., one that can express only a subset of the potentially teachable concepts). It searches this space completely, finding every hypothesis consistent with the training data (the version space). Its inductive bias is solely a consequence of the expressive power of its hypothesis representation. Its search strategy introduces no additional bias. (RESTRICTION BIAS, LANGUAGE BIAS)

Week 3 - End
• Read:
– Supplementary book: “Machine Learning”, Tom Mitchell, Chapter 1 and Chapter 2
– Course textbook: Chapter 2 (preparation for next week)
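To make the contrast between the ROTE-LEARNER and FIND-S entries in the Inductive Bias Examples slide concrete, here is a minimal sketch. It assumes hypotheses are conjunctions of attribute constraints in which "?" accepts any value; the function names, the three weather-style attributes and the training data are invented for illustration only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Instance = Tuple[str, ...]
Example = Tuple[Instance, bool]  # (attribute values, label)
ANY = "?"  # constraint that accepts every value


def find_s(examples: Sequence[Example]) -> Optional[List[str]]:
    """Most specific conjunctive hypothesis covering all positive examples.

    Returns None while no positive example has been seen, which stands
    for the hypothesis that rejects every instance.
    """
    hypothesis: Optional[List[str]] = None
    for attributes, label in examples:
        if not label:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            # Generalize only the constraints that this example violates.
            hypothesis = [h if h == a else ANY for h, a in zip(hypothesis, attributes)]
    return hypothesis


def classify(hypothesis: Optional[List[str]], attributes: Instance) -> bool:
    """An instance is positive only if it satisfies every constraint."""
    if hypothesis is None:
        return False
    return all(h == ANY or h == a for h, a in zip(hypothesis, attributes))


def rote_classify(memory: Dict[Instance, bool], attributes: Instance) -> Optional[bool]:
    """Rote learner: look the instance up; None means it refuses to classify."""
    return memory.get(attributes)


# Invented training data: (sky, temperature, wind) -> label
training = [
    (("sunny", "warm", "strong"), True),
    (("sunny", "warm", "weak"), True),
    (("rainy", "cold", "strong"), False),
]
unseen = ("sunny", "warm", "calm")

h = find_s(training)
memory = dict(training)
print(h, classify(h, unseen), rote_classify(memory, unseen))
```

On the unseen instance the rote learner returns `None` (it has no bias and so no basis for an answer), while FIND-S commits to a classification because it assumes the target concept is expressible as such a conjunction.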