Data Mining
Lecture 3
Course Syllabus
• Course topics:
• Introduction (Week1-Week2)
– What is Data Mining?
– Data Collection and Data Management Fundamentals
– The Essentials of Learning
– The Emerging Needs for Different Data Analysis Perspectives
• Data Management and Data Collection Techniques for
Data Mining Applications (Week3-Week4)
– Data Warehouses: Gathering raw data from relational databases and transforming it into information
– Information Extraction and Data Processing Techniques
– Data Marts: The need for building highly specialized data stores for data mining applications
Week 3 - Reminder - Data to Knowledge Pyramid
[Figure: pyramid whose layers have increasing potential to support business decisions toward the top]
Layers, from top to bottom:
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts: OLAP, MDA
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles alongside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Week 2 - Reminder - Data Mining Perspective to Knowledge Discovery
[Figure: the knowledge discovery process]
Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
adapted from:
U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
Week 3 - Reminder - Essentials of Learning
Learning?
• Can we formalize it?
• Is it just chemical activation?
• Is it memorization?
• Is it the continuous connecting and disconnecting of nodes in a dynamically changing brain network topology?
Week 3 - Reminder - Essentials of Learning
The Artificial Intelligence View:
• Learning is central to human knowledge and intelligence, and it is essential for building intelligent machines.
• Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial. For example, we humans are not born with the ability to understand language; we learn it. It makes sense to try to have computers learn language instead of trying to program it all in.
Week 3 - Reminder - Essentials of Learning
The Software Engineering View:
• Machine Learning allows us to program computers by example, which can be easier than writing code the traditional way.
The Stats View:
• Machine Learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems.
• Machine Learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems.
• Machine Learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).
Week 3- Essentials of Learning
Informal Learning Problem Definition:
A computer program that improves its performance at some task through experience.
Formal Learning Problem Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
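The formal definition can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than part of the lecture: the task T is classifying the integers 0..100 against a hidden threshold, the performance measure P is accuracy over those 101 points, and the experience E is a growing list of labelled examples. The names `learn_threshold` and `performance` are made up for this example.

```python
# Hypothetical illustration of "improves at T, as measured by P, with experience E".
# Task T: classify integers 0..100 as positive when x >= 30 (the hidden target).
# Performance P: fraction of the 101 test points classified correctly.
# Experience E: a growing list of labelled examples.

TRUE_THRESHOLD = 30  # hidden from the learner; used only to score it


def learn_threshold(examples):
    """Place the threshold midway between the largest negative
    and the smallest positive example seen so far."""
    largest_negative = max(x for x, label in examples if not label)
    smallest_positive = min(x for x, label in examples if label)
    return (largest_negative + smallest_positive) / 2


def performance(threshold):
    """P: accuracy of the rule 'x >= threshold' over the test points 0..100."""
    test_points = range(101)
    correct = sum((x >= threshold) == (x >= TRUE_THRESHOLD) for x in test_points)
    return correct / len(test_points)


experience = [(0, False), (100, True), (40, True), (25, False)]

# Measure P after the learner has seen 2, 3 and 4 training examples.
scores = [performance(learn_threshold(experience[:k])) for k in (2, 3, 4)]
for k, score in zip((2, 3, 4), scores):
    print(f"E = first {k} examples -> P = {score:.3f}")
```

With these particular examples the accuracy rises from 81/101 to 91/101 to 98/101, so by the definition above the program learns. Whether a different ordering of examples would give a steady rise is not guaranteed.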
Week 3- Essentials of Learning
A chess learning problem:
Task T: playing chess
Performance measure P: percent of games won against opponents
Training experience E: playing practice games against itself
A handwriting recognition learning problem:
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given classifications
A robot driving learning problem:
Task T: driving on public four-lane highways using vision sensors
Performance measure P: average distance traveled before an error (as judged by a human overseer)
Training experience E: a sequence of images and steering commands recorded while observing a human driver
Week 3- Essentials of Learning
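Attributes of Experience
Whether the training experience provides direct or indirect feedback.
• Direct: learn from direct training examples consisting of board states and the correct move for each (supervised learning).
– CHESS PROBLEM: providing individual chess board states and the correct move for each
• Indirect: learn from indirect information consisting of move sequences and their final outcomes (learning from delayed feedback, as in reinforcement learning).
– CHESS PROBLEM: providing sequences of moves and final outcomes of various games played
• Indirect feedback raises the credit assignment problem: deciding how much each move in the sequence deserves credit or blame for the final outcome.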
Week 3- Essentials of Learning
Attributes of Experience
The degree to which the learner controls the sequence of training examples.
CHESS PROBLEM
• The learner may rely on the teacher to select informative board states and to provide the correct move for each.
• The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
• The learner may have complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
Week 3- Essentials of Learning
Attributes of Experience
How well the training experience represents the distribution of examples over which the final system performance P must be measured.
Important: most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Although we need this assumption to obtain theoretical results, it is important to keep in mind that it is often violated in practice.
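A minimal sketch of why the matching-distribution assumption matters, using a deliberately simple majority-class learner. The data, the 90/10 and 30/70 class mixes, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical sketch: the same learned model scored on test data that
# matches the training distribution and on test data that does not.
from collections import Counter


def train_majority(labels):
    """Learn the simplest possible model: always predict the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(prediction, labels):
    """Fraction of labels equal to the single constant prediction."""
    return sum(label == prediction for label in labels) / len(labels)


train_labels = ["A"] * 90 + ["B"] * 10    # training distribution: 90% A
test_same = ["A"] * 90 + ["B"] * 10       # same distribution as training
test_shifted = ["A"] * 30 + ["B"] * 70    # shifted distribution: 30% A

model = train_majority(train_labels)
acc_same = accuracy(model, test_same)
acc_shifted = accuracy(model, test_shifted)
print(model, acc_same, acc_shifted)
```

The model is unchanged between the two evaluations; only the test distribution differs, and the measured performance P drops from 0.9 to 0.3.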
Week 3- Essentials of Learning
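Central Limit Theorem
The Central Limit Theorem states that the sum (and therefore the mean) of a large number of independent, identically distributed random variables with finite variance approximately follows a Normal distribution.
Consider a set of independent, identically distributed random variables Y_1, ..., Y_N governed by an arbitrary probability distribution with mean \mu and finite variance \sigma^2. Define the sample mean
\bar{Y}_N = \frac{1}{N} \sum_{i=1}^{N} Y_i
As N grows, the distribution of
\frac{\bar{Y}_N - \mu}{\sigma / \sqrt{N}}
approaches a Normal distribution with mean 0 and standard deviation 1.
So even if we do not know the distribution of the individual Y_i, we can still approximate the distribution of the sample mean \bar{Y}_N.
A common rule of thumb is that we can use the Normal approximation when N >= 30.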
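The theorem can be checked empirically with a short simulation. This is a sketch under assumptions not taken from the lecture: the individual variables are drawn from an Exponential(1) distribution (mean 1, standard deviation 1), the sample size is 30 to match the rule of thumb, and 5000 repetitions is an arbitrary choice.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 30          # sample size, matching the rule of thumb
TRIALS = 5000   # number of sample means to draw (arbitrary choice)
MU = 1.0        # mean of the Exponential(1) distribution
SIGMA = 1.0     # standard deviation of the Exponential(1) distribution


def standardized_mean():
    """Draw N exponential values and return (mean - mu) / (sigma / sqrt(N))."""
    sample_mean = sum(random.expovariate(1.0) for _ in range(N)) / N
    return (sample_mean - MU) / (SIGMA / math.sqrt(N))


z_values = [standardized_mean() for _ in range(TRIALS)]
z_mean = sum(z_values) / TRIALS
z_std = math.sqrt(sum((z - z_mean) ** 2 for z in z_values) / TRIALS)
within_1_96 = sum(abs(z) < 1.96 for z in z_values) / TRIALS

print(f"mean of z: {z_mean:.3f}, std of z: {z_std:.3f}")
print(f"fraction with |z| < 1.96: {within_1_96:.3f}")
```

If the Normal approximation holds, the standardized means should have mean near 0, standard deviation near 1, and roughly 95% of them should fall within 1.96 of zero, even though the exponential distribution itself is strongly skewed.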
Week 3- Essentials of Learning
Operational Definition of Learning Function / Target Function
Given the training experience and the target definition, decide on a learning architecture by considering:
• correctness
• applicability
• performance
CHESS PROBLEM
Ideal Target Function: ChooseMove : B -> M, indicating that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
What if only indirect training experience is available to our system?
Week 3- Essentials of Learning
Operational Definition of Learning Function / Target Function
Given the training experience and the target definition, decide on a learning architecture by considering:
• correctness
• applicability
• performance
CHESS PROBLEM
Operational Target Function: V : B -> R, denoting that V maps any legal board state from the set B to some real value.
• V assigns higher scores to better board states.
• V can then be used to select the best move from any current board position: generate the successor board state produced by every legal move, then use V to choose the best successor state and therefore the best legal move.
Is it really operational? Not so! Computing the exact value of a board state requires searching all the way down to the end of the game, so this definition is not computationally operational.
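The move-selection step described above can be sketched as follows. The `best_move` helper and the toy game are hypothetical: the "board" is just an integer, a move adds an offset to it, and the stand-in value function prefers states closer to 10. A real chess program would supply its own move generator and its own approximation of V.

```python
def best_move(state, legal_moves, apply_move, value):
    """Return the legal move whose successor state scores highest under value."""
    return max(legal_moves(state), key=lambda move: value(apply_move(state, move)))


# Toy stand-in for a board game (purely hypothetical).
def toy_legal_moves(state):
    """Every state allows the same three moves."""
    return [-1, 1, 2]


def toy_apply_move(state, move):
    """The successor state is the current state plus the move offset."""
    return state + move


def toy_value(state):
    """Stand-in for V: states closer to 10 score higher."""
    return -abs(10 - state)


chosen = best_move(7, toy_legal_moves, toy_apply_move, toy_value)
print(chosen)  # 2, because the successor state 9 is closest to 10
```

The selection itself is cheap, one evaluation of V per legal move. The cost that makes the ideal definition non-operational sits inside V, not in this loop.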
Week 3- Essentials of Learning
Operational Definition of Learning Function / Target Function
Given the training experience and the target definition, decide on a learning architecture by considering:
• correctness
• applicability
• performance
CHESS PROBLEM
• Choosing a complex target function brings expressiveness, but it also brings a performance bottleneck.
• It also brings an urgent need for many more training examples to learn from.
• The real issue is choosing the operational target function. This is a modeling decision, and it turns learning into function approximation.
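One common way to make V operational is to approximate it with a simple parametric form. The sketch below assumes a linear form, w0 + w1*x1 + ... + wn*xn over numeric board features, fitted with the LMS (least mean squares) update rule. The slides do not specify a representation, so the linear form, the features, the training values, and the names `predict` and `lms_fit` are all assumptions.

```python
def predict(weights, features):
    """Linear approximation of V: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_fit(examples, learning_rate=0.1, epochs=2000):
    """Fit the weights with the LMS rule: w_i += rate * error * x_i."""
    n_features = len(examples[0][0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for features, target in examples:
            error = target - predict(weights, features)
            weights[0] += learning_rate * error  # the bias input is always 1
            for i, x in enumerate(features, start=1):
                weights[i] += learning_rate * error * x
    return weights


# Hypothetical training data: (board features, training value).
# The values were generated from 2*x1 + 3*x2, which the learner is not told.
training = [((1, 0), 2.0), ((0, 1), 3.0), ((1, 1), 5.0)]
weights = lms_fit(training)
print([round(w, 3) for w in weights])
```

A linear form has few parameters, so it needs few training examples, but it can only represent value functions that are linear in the chosen features. That is the trade-off between expressiveness and training data noted above.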
Week 3- Essentials of Learning
Importance of Target Function
The choice of target function representation determines the size of our hypothesis space (solution space).
What if the needed solution cannot be represented in our hypothesis space?
Suppose we have a perfect hypothesis space H that can represent every teachable function, so expressiveness is no longer a problem. Are we OK with that H?
No: we are now completely unable to generalize beyond the observed examples.
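The inability to generalize can be seen by enumeration on a tiny case. This sketch assumes two-bit inputs, so the fully expressive hypothesis space contains all 16 possible labellings of the 4 instances. The training labels are made up for illustration.

```python
from itertools import product

instances = list(product([0, 1], repeat=2))  # the 4 possible inputs

# The unbiased hypothesis space: every possible labelling of the 4 instances.
hypotheses = [dict(zip(instances, labels)) for labels in product([0, 1], repeat=4)]

# Hypothetical training data: 3 of the 4 instances with their labels.
training = {(0, 0): 0, (0, 1): 1, (1, 0): 1}
unseen = (1, 1)

# Keep only the hypotheses that agree with every training example.
consistent = [h for h in hypotheses if all(h[x] == y for x, y in training.items())]

votes_for_1 = sum(h[unseen] for h in consistent)
votes_for_0 = len(consistent) - votes_for_1
print(len(hypotheses), len(consistent), votes_for_0, votes_for_1)  # 16 2 1 1
```

Exactly half of the consistent hypotheses label the unseen instance 0 and half label it 1, so the training data gives no basis for preferring either answer. The same even split occurs for any unseen instance in an unrestricted hypothesis space.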
Week 3- Essentials of Learning
Importance of Target Function
• If we need generalization and applicability to unseen instances, we must choose a biased (generalizable) target function.
• A learner that makes no a priori assumptions regarding the identity of the target function has no rational basis for classifying any unseen instances.
• What we wish to capture here is the policy by which the learner generalizes beyond the observed training data to infer the classification of new instances.
Week 3- Essentials of Learning
Inductive Bias
target concept = target function
Formal Definition:
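Assuming the slide follows Mitchell's "Machine Learning" (Chapter 2), which this lecture lists as supplementary reading, the definition can be stated as follows. Let L be a concept learning algorithm over a set of instances X, let c be an arbitrary target concept defined over X, and let D_c = {<x, c(x)>} be an arbitrary set of training examples of c. Write L(x_i, D_c) for the classification that L assigns to instance x_i after training on D_c. The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c:

```latex
(\forall x_i \in X)\;\bigl[\,(B \land D_c \land x_i) \vdash L(x_i, D_c)\,\bigr]
```

Here \vdash means "deductively entails": once the assumptions B are made explicit, every classification the learner produces follows by deduction from B, the training data, and the instance itself.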
Week 3- Essentials of Learning
Inductive Bias Examples
ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance.
Inductive Bias: none (bias-free)
CANDIDATE-ELIMINATION ALGORITHM: New instances are classified only in the case where all members of the current version space (the subset of hypotheses consistent with our training examples) agree on the classification. Otherwise, the system refuses to classify the new instance.
Inductive Bias: the target concept is contained in the hypothesis space
FIND-S: This algorithm finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances.
Inductive Bias: the target concept is contained in the hypothesis space, and the most specific consistent hypothesis represents it
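The contrast between a bias-free learner and a biased one can be sketched in a few lines. This is a minimal illustration: the hypotheses are conjunctions of attribute constraints in which "?" accepts any value, and the attribute names and training data are hypothetical.

```python
def rote_classify(memory, instance):
    """ROTE-LEARNER: return the stored label, or None to refuse to classify."""
    return memory.get(instance)


def find_s(examples):
    """FIND-S for conjunctive hypotheses over attribute tuples.

    Starts from the most specific hypothesis (None, which matches nothing)
    and generalizes just enough to cover each positive example.
    """
    hypothesis = None
    for instance, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(instance)
        else:
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, instance)]
    return hypothesis


def find_s_classify(hypothesis, instance):
    """Positive only if every attribute constraint is satisfied."""
    if hypothesis is None:
        return False
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))


# Hypothetical training data: (sky, temperature, wind) -> enjoyed the day?
examples = [
    (("sunny", "warm", "strong"), True),
    (("sunny", "cold", "strong"), True),
    (("rainy", "cold", "weak"), False),
]
memory = dict(examples)
hypothesis = find_s(examples)
new_instance = ("sunny", "mild", "strong")

print(hypothesis)                                 # ['sunny', '?', 'strong']
print(rote_classify(memory, new_instance))        # None: never seen, so refused
print(find_s_classify(hypothesis, new_instance))  # True: generalized from bias
```

The rote learner refuses the unseen instance because it assumes nothing. FIND-S classifies it, and that answer is only justified if its assumptions about the hypothesis space actually hold.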
Week 3- Essentials of Learning
Search Bias vs Restriction Bias
ID3 searches a complete hypothesis space (i.e., one capable of expressing any finite discrete-valued function). It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met (e.g., until it finds a hypothesis consistent with the data). Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy. Its hypothesis space introduces no additional bias. (SEARCH BIAS, also called PREFERENCE BIAS)
The CANDIDATE-ELIMINATION algorithm searches an incomplete hypothesis space (i.e., one that can express only a subset of the potentially teachable concepts). It searches this space completely, finding every hypothesis consistent with the training data (the version space). Its inductive bias is solely a consequence of the expressive power of its hypothesis representation. Its search strategy introduces no additional bias. (RESTRICTION BIAS, also called LANGUAGE BIAS)
Week 3-End
• read
– Supplementary Book: “Machine Learning”, Tom Mitchell, Chapter 1 – Chapter 2
– Course Text Book Chapter 2 (preparation for the next week)