MATHEMATICS FOR COMPUTER
VISION WEEK 9
PROBABILITY AND INFERENCE
Dr Fabio Cuzzolin
MSc in Computer Vision
Oxford Brookes University
Year 2013-14
OUTLINE OF WEEK 9
introduction to probability theory and
Bayesian inference
Probability measures
Random variables
Marginal and joint distributions
Bayes rule
conditional probability
Random processes
Markov chains
PROBABILITY MEASURES AND
DISTRIBUTIONS
Week 9 – Probability and Inference
PROBABILITY MEASURES
probability measure → mathematical representation of
the notion of chance
assigns a probability value to every subset of a collection of possible outcomes (of a random experiment, a decision problem, etc.)
collection of outcomes → sample space, universe
subset of the universe → event
EXAMPLE
typical example: the spinning wheel
spinning wheel with 3 possible outcomes
universe Ω = {1,2,3}
eight possible events (right), including the empty set
probability of ∅ is 0, probability of Ω is 1
additivity: P({1,2}) = P({1}) + P({2})
FORMAL DEFINITION
probability measure µ: a real-valued function on a probability space that satisfies countable additivity
probability space: a triple (Ω, F, µ) formed by a universe Ω, a σ-algebra F of its subsets, and a probability measure µ on F
not all subsets of Ω necessarily belong to F
axioms:
µ(∅) = 0, µ(Ω) = 1
0 ≤ µ(A) ≤ 1 for all events A ∈ F
countable additivity: for every countable collection of pairwise disjoint events Ai,
µ(∪i Ai) = ∑i µ(Ai)
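The axioms can be checked on a toy universe. Below is a minimal Python sketch (not from the slides, the uniform probabilities are an assumption) that builds the power set of a three-outcome universe and verifies normalisation and additivity:

```python
from itertools import combinations

# Universe of a three-outcome experiment (like the spinning wheel).
omega = {1, 2, 3}

# Assumed uniform probability on the singletons.
p_single = {1: 1/3, 2: 1/3, 3: 1/3}

def mu(event):
    """Probability measure: sum the singleton probabilities in the event."""
    return sum(p_single[x] for x in event)

# All 2^3 = 8 events (the power set of omega), including the empty set.
events = [set(c) for r in range(4) for c in combinations(omega, r)]
```

Evaluating `mu` on these events reproduces the axioms: µ(∅) = 0, µ(Ω) = 1, and µ({1,2}) = µ({1}) + µ({2}).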
RANDOM VARIABLE
a variable whose value is subject to random variation, i.e. due to chance (what "chance" is remains a matter of philosophical debate!)
it can take one of a set of possible values, each with an associated probability
mathematically, it is a function X from a sample space Ω (which forms a probability space) to (usually) the reals
subject to a condition of "measurability": each range of values of the real line must have a pre-image in Ω to which a probability value is assigned
this way, we can forget about the initial probability space and simply record the probabilities of the various values of X
EXAMPLE
the sample space is the set of outcomes of rolling two dice
Ω = { (1,1), (1,2), (1,3), (1,4), ..., (6,4), (6,5), (6,6) }
a random variable can be the function that associates to each roll of the two dice the sum S of the faces
random variables can be discrete or continuous
S is a discrete random variable
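A minimal Python sketch (not part of the slides) of this example: enumerate the 36 outcomes, apply the random variable S, and tabulate its mass function:

```python
from itertools import product
from collections import Counter

# Sample space: all 36 ordered rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Random variable S: maps each outcome to the sum of the faces.
def S(roll):
    return roll[0] + roll[1]

# Mass function of S: count outcomes per value, divide by |Omega| = 36.
counts = Counter(S(r) for r in rolls)
pmf = {s: c / 36 for s, c in counts.items()}
```

For instance, S = 7 arises from six of the 36 outcomes, so its probability is 1/6.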
(CUMULATIVE) PROBABILITY DISTRIBUTION
OF A RANDOM VARIABLE
the probability distribution of a random variable X
records the probability values for all real values x in the
range of X
we can then answer all questions of the form: what is the probability P(a ≤ X ≤ b), P(X > a), etc.
• these ranges of values are called "Borel sets"
all the information is captured by the cumulative
distribution F(x) = P(X ≤ x)
DISCRETE PROBABILITY DISTRIBUTIONS
a random variable is called discrete when X can only assume a finite or a countably infinite (e.g. the integers 1, 2, 3, ...) number of values
it is described by a (probability) mass function p(x) = P(X = x)
common discrete distributions: Poisson, Bernoulli, binomial → mathematical description of the number of successes in a series of trials
BINOMIAL DISTRIBUTION
the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p:
P(X = k) = (n choose k) p^k (1 − p)^(n−k)
example of probability distribution (left)
example of cumulative distribution (right)
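The binomial mass and cumulative functions can be sketched directly from their definitions (this code is an illustration, not taken from the slides):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k): number of ways to place k successes, times their probability.
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    # F(x) = P(X <= x): sum of the mass function up to x.
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))
```

For example, with n = 2 fair coin flips, P(X = 1) = 2 · 0.5 · 0.5 = 0.5.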
CONTINUOUS DISTRIBUTIONS – PDFS
a random variable is called continuous when it can assume values in a non-countable set (e.g. the real line)
it is described by a probability density function (PDF), which describes the relative likelihood of the variable taking any given (real) value
the probability of any range of values (e.g. an interval) is the integral of the PDF over that range
there are mixed distributions as well, with cumulative function of the form F(x) = α Fc(x) + (1 − α) Fd(x), a weighted sum of a continuous and a discrete part
EXAMPLES OF CONTINUOUS PDFs
examples of continuous PDFs:
• Gaussian → fundamental, see the central limit theorem
• Beta, gamma, chi-square...
EXAMPLE OF CONTINUOUS PDF: THE GAUSSIAN PDF
most "famous" continuous random variable: the Gaussian r.v.
typical PDF of a Gaussian:
p(x) = 1/(σ √(2π)) exp(−(x − µ)² / (2σ²))
shape characterised by a mean µ and a standard deviation σ
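A minimal sketch (not from the slides) evaluating the Gaussian density from its formula; a numerical check confirms it integrates to (approximately) one:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # Density of N(mu, sigma^2) at x:
    # 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
```

The density peaks at x = µ and is symmetric around it.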
MOMENTS
a random variable can be (partially) described by its moments, which give some indication of its shape
n-th moment of a probability distribution:
µn = E[X^n] = ∫ x^n dF(x)
where X is a random variable with cumulative distribution F
E is called the expectation operator
a moment may or may not exist for a given r.v. X
two major moments: mean and variance
MEAN AND VARIANCE
mean or expected value (first order moment)
• continuous case: E[X] = ∫ x f(x) dx
• discrete case: E[X] = ∑i xi pi
variance (second order central moment)
• continuous case: Var(X) = ∫ (x − E[X])² f(x) dx
• discrete case: Var(X) = ∑i (xi − E[X])² pi
describes how spread out the values of X are with respect to the mean
standard deviation → square root of the variance
relation between mean and variance: Var(X) = E[X²] − (E[X])²
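A minimal Python sketch (not part of the slides) computing the mean and variance of a fair six-sided die from the discrete formulas, and checking the relation Var(X) = E[X²] − (E[X])²:

```python
# Discrete distribution: fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

# First moment: mean E[X] = sum_i x_i p_i.
mean = sum(x * p for x, p in zip(values, probs))

# Second central moment: variance E[(X - E[X])^2].
variance = sum((x - mean)**2 * p for x, p in zip(values, probs))

# Second raw moment E[X^2], to check Var(X) = E[X^2] - (E[X])^2.
second_moment = sum(x**2 * p for x, p in zip(values, probs))
```

For the fair die this gives mean 3.5 and variance 35/12.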
EXAMPLES OF MOMENTS
Normal (Gaussian) distribution: mean µ, variance σ²
Binomial: mean np, variance np(1 − p)
Exponential distribution:
• mean λ⁻¹
• variance λ⁻²
LAWS OF PROBABILITY
LAW OF LARGE NUMBERS
describes what happens when you repeat the same random experiment an increasing number of times n
the average of the results (sample mean) X̄n = (X1 + ... + Xn)/n should be close to the expected value (mean) µ
probabilities become predictable as we run the same trial more and more times!
strong law: X̄n → µ almost surely
weak law: X̄n → µ in probability
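The law can be illustrated with a short simulation (not from the slides; the die experiment and seed are assumptions). The sample mean of fair-die rolls should drift towards the expected value 3.5 as n grows:

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Repeat the same experiment (a fair-die roll, expected value 3.5)
# n times and record the sample mean.
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
```

With 100,000 rolls the sample mean typically lies within a few hundredths of 3.5.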
CENTRAL LIMIT THEOREM
the mean of a sufficiently large number of iterates of independent random variables is normally (Gaussian) distributed
let X1, ..., Xn be independent and identically distributed r.v.s with the same mean µ and variance σ²
we can build the usual sample average Sn = (X1 + ... + Xn)/n
the random variable √n (Sn − µ) tends (in distribution) to a Gaussian with mean 0 and variance σ²
CENTRAL LIMIT THEOREM – ILLUSTRATION
sum of N uniform random variables
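A minimal simulation sketch (not from the slides; N, the trial count, and the seed are assumptions). The sum of N independent uniform(0,1) variables has mean N/2 and variance N/12, and by the CLT is approximately Gaussian; with N = 12 the variance is exactly 1:

```python
import random
import statistics

random.seed(42)  # reproducible run

# Sum of N independent uniform(0,1) variables, repeated many times.
N, trials = 12, 20_000
sums = [sum(random.random() for _ in range(N)) for _ in range(trials)]

# Empirical moments should match N/2 = 6 and N/12 = 1.
emp_mean = statistics.fmean(sums)
emp_var = statistics.pvariance(sums)
```

A histogram of `sums` would closely follow the bell curve of N(6, 1).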
CONDITIONAL PROBABILITY
CONDITIONAL PROBABILITIES
probability that an event will occur, given that another event has occurred (or not)
written P(A|B), read "probability of A given B"
two definitions:
• as the quotient of the joint probability of A and B and the probability of B: P(A|B) = P(A∩B) / P(B), for P(B) > 0
• as a (multiplication) axiom of probability theory (De Finetti): P(A∩B) = P(A|B) P(B)
ILLUSTRATIONS
P(A) = 0.52
P(A|B1) = 0.1
P(A|B2) = 0.12
rolling of two dice
P({A=2}) = 6/36 = 1/6
P({A=2}|{A+B≤5}) = 3/10
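The dice numbers above can be verified by enumeration. A minimal Python sketch (not part of the slides) counting outcomes in the conditioning event:

```python
from itertools import product

# All 36 ordered outcomes (A, B) of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Conditioning event: the sum of the two dice is at most 5 (10 outcomes).
cond = [r for r in rolls if r[0] + r[1] <= 5]

# Unconditional P(A=2) = 6/36 = 1/6.
p_a2 = len([r for r in rolls if r[0] == 2]) / len(rolls)

# P(A=2 | A+B<=5): restrict the count to the conditioning event.
p_a2_given = len([r for r in cond if r[0] == 2]) / len(cond)
```

Only (2,1), (2,2) and (2,3) have first die 2 among the 10 outcomes with sum ≤ 5, giving 3/10.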
LAW OF TOTAL PROBABILITY
fundamental law relating marginal probabilities to conditional probabilities
idea: if the universe can be decomposed into a disjoint partition of events Bi, the marginal (total) probability of an event A is the sum of the joint probabilities with the Bi: P(A) = ∑i P(A∩Bi)
P(A∩Bi) can also be expressed via the conditionals: P(A) = ∑i P(A|Bi) P(Bi)
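A minimal check of the law (not from the slides), on the two-dice space with the partition Bi = {first die shows i}:

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 outcomes

def prob(event):
    # Probability of an event (a predicate on outcomes) by counting.
    return len([r for r in rolls if event(r)]) / len(rolls)

def A(r):
    return r[0] + r[1] == 7  # event of interest: the sum is 7

# Total probability: sum the joint probabilities P(A ∩ B_i) over the
# partition B_i = {first die = i}, i = 1..6.
total = sum(prob(lambda r, i=i: A(r) and r[0] == i) for i in range(1, 7))
```

The sum over the partition recovers the marginal P(A) = 6/36 = 1/6.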
BAYES’ RULE
relates conditional and prior probabilities: P(A|B) = P(B|A) P(A) / P(B)
has various interpretations, according to the different interpretations of probability measures
Bayesian interpretation: probability is a degree of belief in a proposition A, before (P(A)) or after (P(A|B)) new evidence is gathered
evidence is always in the form "proposition B is true"
nomenclature:
• P(A) is the prior (initial degree of belief in A)
• P(A|B) is the posterior (after evidence B is considered)
posteriors are in the form of conditional probabilities, and can be computed by Bayes' rule
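A worked numerical sketch of Bayes' rule (the scenario and all numbers below are hypothetical, not taken from the slides): a rare condition with prior 1%, and a test that detects it 99% of the time but has a 5% false-positive rate.

```python
# Hypothetical numbers for illustration only.
p_a = 0.01             # prior P(A): having the condition
p_b_given_a = 0.99     # likelihood P(B|A): test positive given condition
p_b_given_not_a = 0.05 # false-positive rate P(B|~A)

# Total probability: P(B) = P(B|A) P(A) + P(B|~A) P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: posterior P(A|B) = P(B|A) P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
```

Despite the accurate test, the posterior is only about 1/6: the low prior dominates, which is exactly the belief-updating reading of the rule.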
MANY VARIABLES
JOINT DISTRIBUTION OF SEVERAL RANDOM VARIABLES
what happens when we have more than one random
variable, X, Y, ... on the same probability space?
we can define a joint distribution which specifies the
probability of X, Y, etc falling in any given range of values
example: joint Gaussian
MARGINAL DISTRIBUTION
from the joint distribution P(X, Y, ...) of two or more random variables X, Y, ..., one can recover the distribution of each single random variable X
this is called the marginal distribution of X
discrete formula: P(X = x) = ∑y P(X = x, Y = y)
continuous formula: p(x) = ∫ p(x, y) dy
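A minimal discrete sketch (the joint table below is an assumed example, not from the slides): marginalising X means summing the joint mass over every value of Y.

```python
# Joint mass function of two binary variables as a table P(X=x, Y=y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginal of X: sum the joint over all values of Y.
marg_x = {}
for (x, y), p in joint.items():
    marg_x[x] = marg_x.get(x, 0.0) + p
```

Here P(X=0) = 0.10 + 0.20 = 0.30 and P(X=1) = 0.70, and the marginal sums to 1 as any distribution must.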
INDEPENDENCE
independence of events: P(A∩B) = P(A) P(B)
independence and conditional probability: A and B are independent iff P(A|B) = P(A)
generalises to n events: distinguish pairwise/mutual independence
independence of random variables: every pair of Borel intervals are independent (as events)
the joint PDF decomposes as p(x, y) = p(x) p(y)
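A minimal discrete sketch (example tables are assumptions, not from the slides) checking the factorisation criterion: a joint pmf describes independent variables iff it equals the product of its marginals.

```python
# Two explicit joint pmfs over binary (X, Y):
# two independent fair coins, and two perfectly correlated coins.
joint_indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
joint_dep   = {(0, 0): 0.50, (0, 1): 0.00, (1, 0): 0.00, (1, 1): 0.50}

def marginals(joint):
    # Marginalise each variable by summing over the other.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def independent(joint, tol=1e-12):
    # Independence: the joint factorises as the product of its marginals.
    px, py = marginals(joint)
    return all(abs(p - px[x] * py[y]) < tol for (x, y), p in joint.items())
```

The correlated-coins table has the same uniform marginals as the independent one, yet fails the factorisation test: marginals alone never determine the joint.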
RANDOM PROCESS
also called stochastic process: a collection of random variables
typically used to describe the evolution of some random value over time, X(t)
in this sense, the statistical counterpart to deterministic dynamical systems, whose evolution is fixed given X(0)
however, it can be defined over any domain, 2-D, etc.
discrete time: a sequence of random variables, or time series
continuous domain: a random field
RANDOM PROCESSES AS RANDOM-VALUED
FUNCTIONS
interpretation: a random process is a
function on its domain, whose values
are random variables
random values at different points of the
domain can be completely different
usually, though, they are required to be of the same type (identically distributed)
the component random variables can
be independent or have complicated
statistical relations
EEG signals, stock market fluctuations,
but also images and videos!
Markov Random Fields are used for
image segmentation
RANDOM PROCESSES AS ENSEMBLES OF REALIZATIONS
helpful interpretation: as an ensemble of functions
idea: you extract a sample value from each random variable
forming the process
you get a standard, "deterministic" function on the same domain as the random process
to each such function is attached a probability value → process =
probability distribution over functions
TYPES OF RANDOM PROCESSES
a stationary process is one for which the joint distribution of a collection of its random variables does not change when shifted around its domain
• for instance, P(X(t), X(t+1)) = P(X(t+2), X(t+3))
• weak sense → only the first two moments are shift-invariant: E[X(t)] is constant and Cov(X(t), X(t+τ)) depends only on the lag τ
a process is ergodic if its moments can be obtained as limits of sample means and covariances, as the sample size goes to ∞
processes can have discrete time, continuous time, etc.
EXAMPLES
simple Markov chain describing market conditions
example of a transition matrix of a Markov chain describing weather conditions
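A minimal sketch of such a transition matrix in Python (the 2-state weather chain and its entries are illustrative assumptions, not the matrix from the slides). Iterating the transition drives any initial distribution towards the chain's stationary distribution:

```python
# Hypothetical 2-state weather chain: states (sunny, rainy).
# Row i holds the transition probabilities out of state i; rows sum to 1.
P = [[0.9, 0.1],   # from sunny: stay sunny 0.9, turn rainy 0.1
     [0.5, 0.5]]   # from rainy: turn sunny 0.5, stay rainy 0.5

def step(dist, P):
    # One transition: new distribution = dist * P (row vector times matrix).
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Start from "certainly sunny" and iterate until convergence.
dist = [1.0, 0.0]
for _ in range(200):
    dist = step(dist, P)
```

For this matrix the limit solves π = πP, giving π = (5/6, 1/6) regardless of the starting distribution.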
SUMMARY
SUMMARY OF WEEK 9
Introduction to probability theory
probability measures
random variables
moments, mean and variance
laws of probability
Bayes' rule and conditional probabilities
independence
random processes
Markov chains