Last Lecture
• Data Mining Techniques
– Genetic Algorithms
– Artificial Neural Networks
Today
• Data Mining Techniques
– Bayesian statistics and classifier
– Artificial Intelligence
Bayesian Statistics
• Contrary to the frequentist approach, Bayesian
statistics measures degrees of belief
• Degrees are calculated by starting with prior
beliefs and updating the probabilities in the face of
evidence, using Bayes' theorem
• Priors can be estimated from experience, from
other methods, or even guessed
– For this reason it is also called subjective probability
Joint/Conditional Probability
• P(A, B)
– Joint probability distribution: the probability of
both A and B happening.
• P(A|B)
– Conditional probability: the probability of A,
given that B has already happened.
• Note that P(A, B) ≤ P(A|B), since

  P(A|B) = P(A, B) / P(B)
Bayes Classifier
• A probabilistic framework for solving classification
problems
• From the definition of conditional probability,

  P(C|A) = P(A, C) / P(A)
  P(A|C) = P(A, C) / P(C)

• it follows the Bayes theorem:

  P(C|A) = P(A|C) P(C) / P(A)

  (posterior = likelihood × prior / evidence)
Example of Bayes Theorem
• Given:
– A doctor knows that meningitis (M) causes stiff neck (S) 50% of the
time
– Prior probability of any patient having meningitis P(M) is 1/50,000
– Prior probability of any patient having stiff neck P(S) is 1/20
• If a patient has stiff neck (S), what’s the
probability he/she has meningitis (M)?

  P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
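The arithmetic can be checked in a few lines of Python (a minimal sketch; all numbers come from the slide):

```python
# Bayes' theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5        # meningitis causes stiff neck 50% of the time
p_m = 1 / 50000          # prior probability of meningitis
p_s = 1 / 20             # prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```

Note how small the posterior stays: the evidence (stiff neck) is far more common than the hypothesis (meningitis), so the tiny prior dominates.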
Bayesian Classifiers
• Consider each attribute and class label as
random variables
• Given a record with attributes (A1, A2,…,An)
– Goal is to predict class C
– Specifically, we want to find the value of C that
maximizes P(C| A1, A2,…,An )
• Can we estimate P(C| A1, A2,…,An ) directly
from data?
Bayesian Classifiers
• Approach:
– Compute the posterior probability P(C | A1, A2, …, An) for all values of C
using the Bayes theorem:

  P(C | A1 A2 … An) = P(A1 A2 … An | C) P(C) / P(A1 A2 … An)

– Choose value of C that maximizes
P(C | A1, A2, …, An)
– Equivalent to choosing value of C that maximizes
P(A1, A2, …, An|C) P(C)
• How to estimate P(A1, A2, …, An | C)?
Naïve Bayes Classifier
• A naive Bayes classifier is a simple probabilistic
classifier based on applying Bayes' theorem with
strong (naive) independence assumptions; more
specifically, an independent feature model.
• Assumes independence among attributes Ai when
class C is given:
– P(A1, A2, …, An |C) = P(A1| Cj) P(A2| Cj)… P(An| Cj)
– Can estimate P(Ai| Cj) for all Ai and Cj.
– New point is classified as Cj if the numerator in the
Bayes equation (P(Cj) Π P(Ai| Cj)) is maximal
(Maximum A Posteriori (MAP))
Naive Bayes Probability Model
• Graphical illustration
– a class node C at the root; we want P(C|A1,…,An)
– evidence nodes Ai: the observed attributes/features, as leaves
– conditional independence between all evidence nodes, given the class

[Diagram: root node C with leaf nodes A1, A2, …, An]
How to Estimate Probabilities from Data?

Tid   Refund   Marital Status   Taxable Income   Evade
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

(Refund and Marital Status are categorical, Taxable Income is
continuous, and Evade is the class.)

• Class: P(C) = Nc/N
– e.g., P(No) = 7/10,
P(Yes) = 3/10
• For discrete attributes:
P(Ai | Ck) = |Aik| / Nc
– where |Aik| is the number of
instances having attribute value Ai
and belonging to class Ck
– Examples:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0
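These counts can be reproduced directly from the table with plain Python (a sketch; the records are transcribed from the slide, with Taxable Income omitted because only the categorical counts are computed here):

```python
# Training records from the table: (Refund, Marital Status, Evade)
records = [
    ("Yes", "Single",   "No"),  ("No",  "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married",  "No"),
    ("No",  "Divorced", "Yes"), ("No",  "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No",  "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No",  "Single",   "Yes"),
]

n = len(records)
n_no = sum(1 for _, _, c in records if c == "No")
n_yes = n - n_no
p_no, p_yes = n_no / n, n_yes / n   # class priors P(No) = 7/10, P(Yes) = 3/10

# Conditional estimate: P(Status=Married | Evade=No) = |Aik| / Nc = 4/7
married_no = sum(1 for _, s, c in records if s == "Married" and c == "No")
p_married_given_no = married_no / n_no
print(p_no, p_yes, p_married_given_no)
```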
Example of Naïve Bayes
Classifier
Given a test record:
X = (Refund = No, Marital Status = Married, Income = 120K)

Naïve Bayes probabilities estimated from the training data:
P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes) = 1/7
P(Marital Status=Married|Yes) = 0

For taxable income (modeled as a normal distribution):
If class=No: sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90, sample variance = 25

P(X|Class=No) = P(Refund=No|Class=No)
× P(Married|Class=No)
× P(Income=120K|Class=No)
= 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Class=Yes)
× P(Married|Class=Yes)
× P(Income=120K|Class=Yes)
= 1 × 0 × 1.2 × 10^-9 = 0

Since P(X|No)P(No) > P(X|Yes)P(Yes),
therefore P(No|X) > P(Yes|X)
=> Class = No
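The full comparison, including the normal-density estimate for Income = 120K, can be sketched as follows (the means and variances are the sample statistics from the slide):

```python
import math

def gaussian(x, mean, var):
    """Normal density, used for the continuous Taxable Income attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Likelihoods for X = (Refund=No, Married, Income=120K)
p_x_no = (4/7) * (4/7) * gaussian(120, 110, 2975)   # ~ 4/7 x 4/7 x 0.0072
p_x_yes = 1.0 * 0.0 * gaussian(120, 90, 25)         # the zero wipes out the product

# Compare unnormalized posteriors P(X|C) P(C); denominators P(X) cancel
print("No" if p_x_no * (7/10) > p_x_yes * (3/10) else "Yes")
```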
Naïve Bayes Classifier
• If there is little data, a problem arises: if one of the
conditional probabilities is zero, then the entire expression
becomes zero
• To avoid this problem, the probability estimates for the
conditionals can be expressed in different ways:

  Original:   P(Ai|C) = Nic / Nc
  Laplace:    P(Ai|C) = (Nic + 1) / (Nc + c)
  m-estimate: P(Ai|C) = (Nic + m p) / (Nc + m)

  where c is the number of classes, p is a prior probability,
  and m is a parameter
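The three estimators can be compared on the problematic zero count P(Refund=Yes|Yes) from the earlier example (a sketch; the choices m = 3 and p = 0.5 are illustrative, not taken from the slide):

```python
def original(n_ic, n_c):
    return n_ic / n_c

def laplace(n_ic, n_c, c):
    # c: number of classes, as defined on the slide
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # p: prior probability, m: parameter (acts as an equivalent sample size)
    return (n_ic + m * p) / (n_c + m)

# P(Refund=Yes | Evade=Yes): 0 of the 3 "Yes" records have Refund=Yes
print(original(0, 3))                  # 0.0 -- kills the whole product
print(laplace(0, 3, c=2))              # 0.2
print(m_estimate(0, 3, m=3, p=0.5))    # 0.25
```

Both smoothed estimates keep the probability strictly positive, so a single unseen attribute value no longer forces the posterior to zero.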
Example of Naïve Bayes
Classifier

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

Test record: Give Birth = yes, Can Fly = no,
Live in Water = yes, Have Legs = no, Class = ?

A: attributes
M: mammals
N: non-mammals

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.0042 × 13/20 = 0.0027
P(A|M) P(M) > P(A|N) P(N)
=> Mammals
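The products above can be verified by counting over the table (a sketch; the 20 records are transcribed from the table, in the same row order):

```python
# (give_birth, can_fly, live_in_water, have_legs, class): M = mammal, N = non-mammal
data = [
    ("yes","no","no","yes","M"),       ("no","no","no","no","N"),    # human, python
    ("no","no","yes","no","N"),        ("yes","no","yes","no","M"),  # salmon, whale
    ("no","no","sometimes","yes","N"), ("no","no","no","yes","N"),   # frog, komodo
    ("yes","yes","no","yes","M"),      ("no","yes","no","yes","N"),  # bat, pigeon
    ("yes","no","no","yes","M"),       ("yes","no","yes","no","N"),  # cat, leopard shark
    ("no","no","sometimes","yes","N"), ("no","no","sometimes","yes","N"),  # turtle, penguin
    ("yes","no","no","yes","M"),       ("no","no","yes","no","N"),   # porcupine, eel
    ("no","no","sometimes","yes","N"), ("no","no","no","yes","N"),   # salamander, gila monster
    ("no","no","no","yes","M"),        ("no","yes","no","yes","N"),  # platypus, owl
    ("yes","no","yes","no","M"),       ("no","yes","no","yes","N"),  # dolphin, eagle
]

test = ("yes", "no", "yes", "no")   # give birth, can fly, live in water, have legs

def likelihood(cls):
    """Product of per-attribute conditional probabilities P(Ai|cls)."""
    rows = [r for r in data if r[4] == cls]
    p = 1.0
    for i, value in enumerate(test):
        p *= sum(1 for r in rows if r[i] == value) / len(rows)
    return p

p_a_m, p_a_n = likelihood("M"), likelihood("N")
score_m = p_a_m * 7 / 20    # P(A|M) P(M)
score_n = p_a_n * 13 / 20   # P(A|N) P(N)
print("mammal" if score_m > score_n else "non-mammal")
```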
Naïve Bayes (Summary)
• Robust to isolated noise points (they are averaged)
• Handle missing values by ignoring the instance
during probability estimate calculations
• Robust to irrelevant attributes
• Independence assumption may not hold for some
attributes, but in spite of this Naïve Bayes has shown
good performance
– When it does not hold, we can use other techniques such as
Bayesian Belief Networks (BBN), also known as Bayesian Networks
Concepts and Definitions
of Artificial Intelligence
• Artificial intelligence (AI) definitions
– Artificial intelligence (AI)
The subfield of computer science concerned with symbolic
reasoning and problem solving
• Characteristics of AI
– Symbolic processing
• Numeric versus symbolic
• Algorithmic versus heuristic
– Heuristics
Informal, judgmental knowledge of an application area that
constitutes the “rules of good judgment” in the field. Heuristics
also encompass the knowledge of how to solve problems
efficiently and effectively.
Concepts and Definitions
of Artificial Intelligence
• Characteristics of artificial intelligence
– Inferencing
• Reasoning capabilities that can build higher-level
knowledge from existing heuristics
– Machine learning
• Learning capabilities that allow systems to adjust their
behavior and react to changes in the outside environment
– Knowledge-based systems (KBS)
• Technologies that use qualitative knowledge rather than
mathematical models to provide the needed supports
AI History
Evolution of Artificial Intelligence
[Timeline figure with labels: Knowledge representation;
Reasoning strategies; Search heuristics; GA, Evolutionary
computing; Expert systems or KBS; ANN, Fuzzy Logic; GA, ANN,
Bayesian Networks; Robotics, data mining, business intelligence]

Stanley
Autonomous vehicle, DARPA
2005 Grand Challenge winner:
drove 142 miles in a desert
in under 7 hours. Relied
on machine learning and
probabilistic reasoning.
[Photo: laser vision system and computer system]
The Artificial Intelligence Field
• Applications of artificial intelligence
– Expert system (ES)
A computer system that applies reasoning
methodologies to knowledge in a specific domain to
render advice or recommendations, much like a
human expert. A computer system that achieves a
high level of performance in task areas that, for
human beings, require years of special education
and training
Break
Basic Concepts
of Expert Systems (ES)
• The basic concepts of ES include:
– How to determine who experts are
– How expertise can be transferred from a person to a
computer (knowledge engineering). This is the
biggest challenge.
– How the system works
Basic Concepts
of Expert Systems (ES)
• Expert
A human being who has developed a high level of
proficiency in making judgments in a specific, usually
narrow, domain
• Expertise
The set of capabilities that underlies the performance of
human experts, including extensive domain knowledge,
heuristic rules that simplify and improve approaches to
problem solving, metaknowledge and metacognition, and
compiled forms of behavior that afford great economy in
skilled performance
Applications of ES
•Development environment:
used by builders. Include the
knowledge base, the
inference engine, knowledge
acquisition, and improving
reasoning capability. The
knowledge engineer and the
expert are considered part of
these environments
•Consultation environment:
used by a nonexpert to obtain
expert knowledge and advice.
It includes the workplace,
inference engine, explanation
facility, recommended action,
and user interface
Structure of ES
• Three major components in ES are:
– Knowledge base
– Inference engine
– User interface
• ES may also contain:
– Knowledge acquisition subsystem
– Blackboard (workplace)
– Explanation subsystem (justifier)
– Knowledge refining system
Structure of ES
• Knowledge base
A collection of facts, rules, and procedures organized
into schemas. The assembly of all the information and
knowledge about a specific field of interest
• Inference engine
The part of an expert system that actually performs the
reasoning function
• User interfaces
The parts of computer systems that interact with users,
accepting commands from the computer keyboard and
displaying the results generated by other parts of the
systems
Rule-based system architecture
Control Scheme (Interpreter): Inference

Condition-Action Rules:
R1: IF hot AND smoky THEN ADD fire
R2: IF alarm_beeps THEN ADD smoky
R3: IF fire THEN ADD switch_on_sprinklers

Database of Facts:
alarm_beeps
hot
How ES Work:
Inference Mechanisms
• Knowledge representation and organization
– Expert knowledge must be represented in a
computer-understandable format and organized properly in the
knowledge base
– Different ways of representing human knowledge
include:
• Production rules (most common, the only one discussed here)
• Semantic networks
• Logic statements
How ES Work:
Inference Mechanisms
• The inference process: Inference is the process
of chaining multiple rules together based on
available data. Methods:
– Forward chaining
A data-driven search in a rule-based system
– Backward chaining
A search technique (employing IF-THEN rules)
used in production systems that begins with the
action clause of a rule and works backward through
a chain of rules in an attempt to find a verifiable set
of condition clauses
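Forward chaining over rules R1-R3 from the architecture slide can be sketched as a simple fixed-point loop (an illustrative sketch, not production code):

```python
# Rules as (set of conditions, fact to ADD), from the architecture slide
rules = [
    ({"hot", "smoky"}, "fire"),          # R1
    ({"alarm_beeps"}, "smoky"),          # R2
    ({"fire"}, "switch_on_sprinklers"),  # R3
]

facts = {"alarm_beeps", "hot"}           # initial database of facts

# Data-driven search: keep firing rules until no new facts can be added
changed = True
while changed:
    changed = False
    for conditions, action in rules:
        if conditions <= facts and action not in facts:
            facts.add(action)
            changed = True

print(sorted(facts))
# alarm_beeps + hot -> smoky -> fire -> switch_on_sprinklers
```

Backward chaining would run in the other direction: start from the goal switch_on_sprinklers, find a rule that concludes it, and recursively try to verify that rule's conditions against the facts.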
Development of ES
• Defining the nature and scope of the problem
– Rule-based ES are appropriate when the nature of
the problem is qualitative, knowledge is explicit,
and experts are available to solve the problem
effectively and provide their knowledge
• Identifying proper experts
– A proper expert should have a thorough
understanding of:
• Problem-solving knowledge
• The role of ES and decision support technology
• Good communication skills
Development of ES
• Acquiring knowledge
– Knowledge engineer
An AI specialist responsible for the technical side
of developing an expert system. The knowledge
engineer works closely with the domain expert to
capture the expert’s knowledge in a knowledge base
– Knowledge engineering (KE)
The engineering discipline in which knowledge is
integrated into computer systems to solve complex
problems normally requiring a high level of human
expertise
Development of ES
• Selecting the building tools
– General-purpose development environment (e.g.,
Prolog, C++, etc.)
– Expert system shell (e.g., Prolog Expert System
PESS, JavaDON, MYCIN, JESS, etc.)
A computer program that facilitates relatively easy
implementation of a specific expert system.
Analogous to a DSS generator
A Fuzzy Expert System
• A service centre keeps spare parts and repairs failed
ones.
• A customer brings a failed item and receives a spare of
the same type.
• Failed parts are repaired, placed on the shelf, and thus
become spares.
• The objective here is to advise a manager of the service
centre on certain decision policies to keep the customers
satisfied.
Process of developing a fuzzy expert
system
1. Specify the problem and define linguistic variables.
2. Determine fuzzy sets.
3. Elicit and construct fuzzy rules.
4. Encode the fuzzy sets, fuzzy rules and procedures
to perform fuzzy inference into the expert system.
5. Evaluate and tune the system.
Bayesian reasoning in ES
Suppose all rules in the knowledge base are represented in the following
form:
IF
E is true
THEN
H is true {with probability p}
This rule implies that if event E occurs, then the probability that event H
will occur is p.
In expert systems, H usually represents a hypothesis and E denotes
evidence to support this hypothesis.
p(H|E) = p(E|H) p(H) / [ p(E|H) p(H) + p(E|¬H) p(¬H) ]
A problem is that experts need to provide priors/conditionals.
Psychological research shows that humans cannot elicit probability
values consistent with the Bayesian rules.
Certainty factors theory and evidential
reasoning
• Certainty factors theory is a popular alternative to
Bayesian reasoning.
• A certainty factor (cf) is a number that measures the
expert’s belief. The maximum value of the certainty
factor is, say, +1.0 (definitely true) and the minimum
−1.0 (definitely false). For example, if the expert states
that some evidence is almost certainly true, a cf value
of 0.8 would be assigned to this evidence.
Uncertain terms and their
interpretation in MYCIN
• MYCIN is a popular medical expert system.

Term                   Certainty Factor
Definitely not         −1.0
Almost certainly not   −0.8
Probably not           −0.6
Maybe not              −0.4
Unknown                −0.2 to +0.2
Maybe                  +0.4
Probably               +0.6
Almost certainly       +0.8
Definitely             +1.0
Certainty factors theory and evidential
reasoning in ES
In expert systems with certainty factors, the knowledge
base consists of a set of rules that have the following
syntax:
IF
<evidence>
THEN <hypothesis> {cf }
where cf represents belief in hypothesis H given that
evidence E has occurred.
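How certainty factors propagate and combine is not spelled out on these slides; a common MYCIN-style formulation can be sketched as follows (the rule strengths 0.8 and 0.6 are illustrative values, not from the slides):

```python
def cf_rule(cf_evidence, cf_rule_strength):
    """cf of the hypothesis from one rule: cf(E) scaled by the rule's cf."""
    return cf_evidence * cf_rule_strength

def cf_combine(cf1, cf2):
    """Combine two cfs supporting the same hypothesis (MYCIN-style)."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Two rules both support H: one almost certainly (0.8), one probably (0.6)
combined = cf_combine(cf_rule(1.0, 0.8), cf_rule(1.0, 0.6))
print(combined)   # 0.92: stronger than either rule alone
```

Unlike Bayesian updating, no priors are needed, which is precisely why experts find certainty factors easier to elicit.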
Applications of ES
• Applications of ES
– Credit analysis systems
– Pension fund advisors
– Automated help desks
– Homeland security systems
– Marketing surveillance systems
– Business process reengineering systems
– Finance
– Data processing
– Human resources
– Manufacturing
– Business process automation
– Health care management
Benefits, Limitations,
and Success Factors of ES
• Benefits of ES
– Enhancement of problem solving and decision making
– Improved decision-making processes
– Improved decision quality
– Ability to solve complex problems
– Knowledge transfer to remote locations
– Enhancement of other information systems
– Capture of scarce expertise
– Flexibility
– Operation in hazardous environments
– Accessibility to knowledge and help desks
– Ability to work with incomplete or uncertain information
– Provision of training
Benefits, Limitations,
and Success Factors of ES
• Problems with ES
– Knowledge is not always readily available
– It can be difficult to extract expertise from humans
– The vocabulary that experts use to express facts and relations
is often limited and not understood by others
– The approach of each expert to a situation assessment may be
different yet correct
– It is difficult to abstract good situational assessments when
under time pressure
– Users of ES have natural cognitive limits
– ES work well only within a narrow domain of knowledge
– Most experts have no independent means of checking whether
their conclusions are reasonable
Hybrid Systems
• Fuzzy Logic
– A methodology that allows us to design systems that are
described with imprecise information
– In some problems it is not easy to get the optimum set of
rules
• Neural Networks
– Can learn models with arbitrary precision
– Are fundamentally black boxes
• Genetic Algorithms
– A brute force optimization technique
• Why not combine these techniques to take advantage
of the features they provide?
Hybrid Systems
[Figure: fuzzy control pipeline: fuzzification, inference over a
rule base, defuzzification. A measured temperature t is fuzzified
against membership functions µcold, µwarm, µhot (here µcold = 0.7,
µwarm = 0.2, µhot = 0.0). Rules such as “if temp is cold then
valve is open”, “if temp is warm then valve is half”, and “if
temp is hot then valve is close” fire to degrees 0.7 and 0.2.
Defuzzification over µopen, µhalf, µclose yields a crisp output v
for the valve setting.]
• Shape of the membership functions may be learned by an ANN
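The fuzzification, inference, and defuzzification stages of such a temperature-to-valve controller can be sketched end to end. The triangular membership shapes, temperature ranges, and crisp valve positions below are illustrative assumptions (the slides do not specify them), and defuzzification here uses a simple weighted average rather than a centroid:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def valve_setting(temp):
    # Fuzzification: degrees of membership for the measured temperature
    mu_cold = tri(temp, -10, 5, 20)
    mu_warm = tri(temp, 15, 25, 35)
    mu_hot = tri(temp, 30, 45, 60)
    # Inference + defuzzification: each rule maps to a crisp valve position
    # (open = 1.0, half = 0.5, close = 0.0); output is the weighted average
    weights = [mu_cold, mu_warm, mu_hot]
    outputs = [1.0, 0.5, 0.0]
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total if total else 0.5

print(valve_setting(10))   # cold temperature -> valve fully open
```

In a neuro-fuzzy system, the breakpoints a, b, c of each membership function would be the parameters the ANN adjusts during training.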
A Neuro-Fuzzy System
• A fuzzy system trained by heuristic learning
techniques derived from neural networks
• Can be viewed as a 3-layer neural network
with fuzzy weights and special activation
functions
• Always interpretable as a fuzzy system
• Uses constraint learning procedures
• A function approximator (classifier,
controller)
Hybrid Systems
•Neuro-Fuzzy
system
Hybrid Systems
• Genetic algorithms and neural networks
– The genetic learning method can perform rule discovery in
large databases, with the rules fed into a conventional ES
or some other intelligent system
– To integrate genetic algorithms with neural network
models, use a genetic algorithm to search for potential
weights associated with network connections
– A good genetic learning method can significantly reduce
the time and effort needed to find the optimal neural
network model
Summary
• What were the main points of the lecture?