Download coppin chapter 12

Chapter 12
Probabilistic Reasoning and
Bayesian Belief Networks
Chapter 12 Contents

Probabilistic Reasoning
Joint Probability Distributions
Bayes’ Theorem
Simple Bayesian Concept Learning
Bayesian Belief Networks
The Noisy-V Function
Bayes’ Optimal Classifier
The Naïve Bayes Classifier
Collaborative Filtering
Probabilistic Reasoning

Probabilities are expressed in a notation similar to that of predicates in first-order predicate calculus (FOPC):
P(S) = 0.5
P(T) = 1
P(¬(A ∧ B) ∨ C) = 0.2

A probability of 1 means certain; a probability of 0 means certainly not.
Conditional Probability

Conditional probability refers to the probability of one thing given that we already know another to be true:
P(B|A) = P(B ∧ A) / P(A)
This states the probability of B, given A.
Joint Probability Distributions



A joint probability distribution represents the combined probabilities of two or more variables.

        A       ¬A
B       0.11    0.09
¬B      0.63    0.17

This table shows, for example, that
P(A ∧ B) = 0.11
P(¬A ∧ B) = 0.09
Using this, we can calculate P(A):
P(A) = P(A ∧ B) + P(A ∧ ¬B)
= 0.11 + 0.63
= 0.74
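The marginal calculation above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `marginal_a` and `conditional_b_given_a` are assumptions made for the example, and the entry for ¬A ∧ ¬B is derived from the requirement that the four entries sum to 1.

```python
# Joint distribution over two Boolean variables A and B.
# Keys are (a, b) truth values; the three figures below come from the text.
joint = {
    (True, True): 0.11,    # P(A and B)
    (False, True): 0.09,   # P(not A and B)
    (True, False): 0.63,   # P(A and not B)
}
# Assumption: the remaining entry is whatever makes the table sum to 1.
joint[(False, False)] = 1.0 - sum(joint.values())


def marginal_a(joint, a=True):
    """Sum the joint entries in which A takes the value `a`."""
    return sum(p for (a_val, _), p in joint.items() if a_val == a)


def conditional_b_given_a(joint, b=True, a=True):
    """P(B=b | A=a) = P(A=a and B=b) / P(A=a)."""
    return joint[(a, b)] / marginal_a(joint, a)


p_a = marginal_a(joint)                      # 0.11 + 0.63
p_b_given_a = conditional_b_given_a(joint)   # 0.11 / 0.74
```

Here `p_a` reproduces the 0.74 computed above, and `p_b_given_a` applies the definition of conditional probability to the same table.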
Bayes’ Theorem

Bayes’ theorem lets us calculate a conditional probability:
P(B|A) = P(A|B) · P(B) / P(A)
P(B) is the prior probability of B.
P(B|A) is the posterior probability of B.
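As a rough worked example, Bayes’ theorem can be applied to the figures from the joint distribution slide (P(A ∧ B) = 0.11, P(¬A ∧ B) = 0.09, P(A ∧ ¬B) = 0.63). The function name `bayes_posterior` and the variable names are assumptions chosen for this sketch.

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a


# Figures taken from the joint distribution example.
p_a_and_b = 0.11
p_not_a_and_b = 0.09
p_a_and_not_b = 0.63

p_b = p_a_and_b + p_not_a_and_b    # prior probability of B
p_a = p_a_and_b + p_a_and_not_b    # probability of the evidence A
p_a_given_b = p_a_and_b / p_b      # P(A|B)

posterior = bayes_posterior(p_a_given_b, p_b, p_a)  # posterior P(B|A)
```

The posterior comes out at about 0.149, which agrees with computing P(A ∧ B) / P(A) directly, as it should.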

Simple Bayesian Concept Learning (1)




P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
Let us suppose we have a set of hypotheses H1…Hn.
For each Hi:
P(Hi|E) = P(E|Hi) · P(Hi) / P(E)
Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.
Simple Bayesian Concept Learning (2)





In fact, this can be simplified.
Since P(E) is independent of Hi, it will have the same value for each hypothesis.
Hence, it can be ignored, and we can find the hypothesis with the highest value of:
P(E|Hi) · P(Hi)
We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi).
This is the likelihood of E given Hi.
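The selection rule described above can be sketched as follows. The priors and likelihoods are invented purely for illustration, and the helper names are hypothetical. The `posteriors` helper additionally assumes the hypotheses are mutually exclusive and exhaustive, so that P(E) can be obtained by summing P(E|Hi) · P(Hi).

```python
def map_hypothesis(priors, likelihoods):
    """Pick the hypothesis maximising P(E|Hi) * P(Hi); P(E) is ignored."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


def ml_hypothesis(likelihoods):
    """Pick the hypothesis maximising P(E|Hi), valid when priors are equal."""
    return max(likelihoods, key=likelihoods.get)


def posteriors(priors, likelihoods):
    """Full posteriors P(Hi|E), assuming the hypotheses partition the space."""
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    p_e = sum(unnormalised.values())  # P(E) by total probability
    return {h: v / p_e for h, v in unnormalised.items()}


# Hypothetical numbers: P(Hi) and P(E|Hi) for three hypotheses.
priors = {"H1": 0.6, "H2": 0.3, "H3": 0.1}
likelihoods = {"H1": 0.1, "H2": 0.5, "H3": 0.9}

best_map = map_hypothesis(priors, likelihoods)
best_ml = ml_hypothesis(likelihoods)
post = posteriors(priors, likelihoods)
```

With these made-up numbers the two rules disagree: H2 has the highest posterior, while H3 has the highest likelihood. That is why the likelihood shortcut is only safe when the priors really are equal.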
Bayesian Belief Networks (1)




A belief network shows the dependencies between a group of variables.
Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
In the example network, C and D are dependent on A; D and E are dependent on B.
The Bayesian belief network has probabilities associated with each link.
E.g., P(C|A) = 0.2, P(C|¬A) = 0.4
Bayesian Belief Networks (2)

A complete set of probabilities for this belief network might be:

P(A) = 0.1
P(B) = 0.7
P(C|A) = 0.2
P(C|¬A) = 0.4
P(D|A ∧ B) = 0.5
P(D|A ∧ ¬B) = 0.4
P(D|¬A ∧ B) = 0.2
P(D|¬A ∧ ¬B) = 0.0001
P(E|B) = 0.2
P(E|¬B) = 0.1
Bayesian Belief Networks (3)

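We can now calculate joint probabilities from these conditional probabilities, using the chain rule:
P(A,B,C,D,E) = P(E|A,B,C,D) · P(D|A,B,C) · P(C|A,B) · P(B|A) · P(A)
In fact, we can simplify this, since there are no dependencies between certain pairs of variables (between E and A, for example).
Hence:
P(A,B,C,D,E) = P(E|B) · P(D|A,B) · P(C|A) · P(B) · P(A)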
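A minimal sketch of this calculation, using the probabilities listed for the example network, might look like the following. It assumes that each variable depends only on its parents (A and B have none, C depends on A, D on A and B, E on B), so the joint probability of a full assignment is the product of one table entry per variable. The names `bernoulli` and `joint` are hypothetical helpers.

```python
from itertools import product

# Probabilities from the example network (probability that the variable is true).
P_A = 0.1
P_B = 0.7
P_C_GIVEN_A = {True: 0.2, False: 0.4}
P_D_GIVEN_AB = {
    (True, True): 0.5,
    (True, False): 0.4,
    (False, True): 0.2,
    (False, False): 0.0001,
}
P_E_GIVEN_B = {True: 0.2, False: 0.1}


def bernoulli(p_true, value):
    """Probability of `value` for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) under the assumed factorisation."""
    return (
        bernoulli(P_A, a)
        * bernoulli(P_B, b)
        * bernoulli(P_C_GIVEN_A[a], c)
        * bernoulli(P_D_GIVEN_AB[(a, b)], d)
        * bernoulli(P_E_GIVEN_B[b], e)
    )


# Joint probability that all five variables are true: 0.1 * 0.7 * 0.2 * 0.5 * 0.2.
p_all_true = joint(True, True, True, True, True)

# Any marginal can be obtained by summing the joint over the other variables.
p_e = sum(
    joint(a, b, c, d, True)
    for a, b, c, d in product([True, False], repeat=4)
)
total = sum(joint(*values) for values in product([True, False], repeat=5))
```

Summing over all 32 assignments gives 1, which is a useful sanity check that the tables form a valid distribution. Enumeration like this is only practical for very small networks.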
Bayes’ Optimal Classifier





A system that uses Bayes’ theorem to classify data.
We have a piece of data y, and are seeking the correct hypothesis from H1 … H5, each of which assigns a classification to y.
The probability that y should be classified as cj is:
P(cj | x1, …, xn) = Σ (i = 1 to m) P(cj | Hi) · P(Hi | x1, …, xn)
x1 to xn are the training data, and m is the number of hypotheses.
On average, this method provides the best possible classification for a piece of data.
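The weighted vote described above can be sketched as below. The posterior probabilities of the five hypotheses and the classifications they assign are invented for illustration, and each hypothesis is assumed to assign its classification with certainty, so P(cj | Hi) is either 1 or 0. The function name `bayes_optimal_scores` is hypothetical.

```python
def bayes_optimal_scores(hypothesis_posteriors, class_given_hypothesis):
    """Return {class: sum over i of P(class | Hi) * P(Hi | training data)}."""
    scores = {}
    for h, p_h in hypothesis_posteriors.items():
        for c, p_c in class_given_hypothesis[h].items():
            scores[c] = scores.get(c, 0.0) + p_c * p_h
    return scores


# Hypothetical posteriors P(Hi | x1, ..., xn) for five hypotheses.
hypothesis_posteriors = {"H1": 0.30, "H2": 0.25, "H3": 0.20, "H4": 0.15, "H5": 0.10}

# Hypothetical P(cj | Hi): each hypothesis votes for one class with certainty.
class_given_hypothesis = {
    "H1": {"yes": 1.0, "no": 0.0},
    "H2": {"yes": 0.0, "no": 1.0},
    "H3": {"yes": 0.0, "no": 1.0},
    "H4": {"yes": 1.0, "no": 0.0},
    "H5": {"yes": 0.0, "no": 1.0},
}

scores = bayes_optimal_scores(hypothesis_posteriors, class_given_hypothesis)
best_class = max(scores, key=scores.get)
single_best_hypothesis = max(hypothesis_posteriors, key=hypothesis_posteriors.get)
```

In this made-up case the single most probable hypothesis, H1, says "yes", but the combined weight of the hypotheses favours "no" (0.55 against 0.45). The classifier follows the combined weight, not the single best hypothesis.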
The Naïve Bayes Classifier (1)





A vector of data d1, …, dn is assigned a single classification ci.
The posterior probability of each classification is P(ci | d1, …, dn).
The classification with the highest posterior probability is chosen.
The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis.
In this case, we are looking for the MAP classification.
Bayes’ theorem is used to find the posterior probability:
P(ci | d1, …, dn) = P(d1, …, dn | ci) · P(ci) / P(d1, …, dn)
The Naïve Bayes Classifier (2)

Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it, and simply aim to find the classification ci for which the following is maximised:
P(d1, …, dn | ci) · P(ci)
We now assume that all the attributes d1, …, dn are independent of one another, given the classification.
So P(d1, …, dn | ci) can be rewritten as:
P(d1 | ci) · P(d2 | ci) · … · P(dn | ci)
The classification ci for which P(ci) multiplied by this product is highest is chosen to classify the data.
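A minimal sketch of such a classifier is given below. It estimates P(ci) and each P(dj | ci) as plain relative frequencies from a small, entirely hypothetical training set; the function names `train` and `classify` and the weather-style attribute values are assumptions for this example. No smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (attribute_tuple, label). Returns frequency counts."""
    class_counts = Counter(label for _, label in examples)
    attr_counts = defaultdict(Counter)  # (label, position) -> value counts
    for attrs, label in examples:
        for j, value in enumerate(attrs):
            attr_counts[(label, j)][value] += 1
    return class_counts, attr_counts


def classify(attrs, class_counts, attr_counts):
    """Return (best label, scores), where score = P(ci) * product of P(dj | ci)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # estimate of P(ci)
        for j, value in enumerate(attrs):
            score *= attr_counts[(label, j)][value] / n  # estimate of P(dj | ci)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (outlook, temperature) -> label.
examples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
]

class_counts, attr_counts = train(examples)
label, scores = classify(("sunny", "mild"), class_counts, attr_counts)
```

The scores are not normalised posteriors, because the constant P(d1, …, dn) has been dropped. Only their ranking matters for choosing the classification.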
Collaborative Filtering



A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
If we know that Anne and Bob both like A, B and C, and that Anne likes D, then we guess that Bob would also like D.
This can be calculated using decision trees.
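The Anne and Bob example can be illustrated with a very simple conditional-frequency estimate. This sketch is not the decision-tree approach mentioned above; it just estimates the probability that a user likes an item as the fraction of other users who share all of that user's known likes and also like the item. The users Carol, Dave and Eve, their preferences, and the function name `estimate_like_probability` are hypothetical.

```python
def estimate_like_probability(likes, user, item):
    """Estimate P(user likes item | user's known likes) from other users.

    Only users whose likes include everything `user` is known to like are
    counted. Returns None when there are no such users.
    """
    known = likes[user]
    similar = [u for u, items in likes.items() if u != user and known <= items]
    if not similar:
        return None
    return sum(item in likes[u] for u in similar) / len(similar)


# Hypothetical preference data; Anne and Bob follow the example in the text.
likes = {
    "Anne": {"A", "B", "C", "D"},
    "Bob": {"A", "B", "C"},
    "Carol": {"A", "B", "C", "D"},
    "Dave": {"A", "B", "C"},
    "Eve": {"A", "D"},
}

p_bob_likes_d = estimate_like_probability(likes, "Bob", "D")
```

Here Anne, Carol and Dave share Bob's known likes, and two of those three like D, so the estimate is 2/3. Eve is ignored because she does not like B or C.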