MATHEMATICS FOR COMPUTER VISION
WEEK 9
PROBABILITY AND INFERENCE
Dr Fabio Cuzzolin
MSc in Computer Vision
Oxford Brookes University
Year 2013-14
OUTLINE OF WEEK 9
• introduction to probability theory and Bayesian inference
• probability measures
• random variables
• marginal and joint distributions
• Bayes' rule
• conditional probability
• random processes
• Markov chains
PROBABILITY MEASURES AND DISTRIBUTIONS
PROBABILITY MEASURES
• probability measure → a mathematical representation of the notion of chance
• it assigns a probability value to every subset of a collection of possible outcomes (of a random experiment, of a decision problem, etc.)
• the collection of outcomes is called the sample space, or universe
• a subset of the universe is called an event
EXAMPLE
• typical example: the spinning wheel
• a spinning wheel with 3 possible outcomes
• universe Ω = {1,2,3}
• eight possible events (all the subsets of Ω), including the empty set
• probability of ∅ is 0, probability of Ω is 1
• additivity: P({1,2}) = P({1}) + P({2})
FORMAL DEFINITION
• probability measure µ: a real-valued function on a probability space that satisfies countable additivity
• probability space: a triple (Ω, F, µ) formed by a universe Ω, a σ-algebra F of its subsets, and a probability measure µ on F
• not all subsets of Ω necessarily belong to F
• axioms:
  • µ(∅) = 0, µ(Ω) = 1
  • 0 ≤ µ(A) ≤ 1 for all events A ∈ F
  • countable additivity: for every countable collection of pairwise disjoint events Ai,
    µ(∪i Ai) = ∑i µ(Ai)
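The axioms can be checked numerically on the spinning-wheel example above. A minimal Python sketch, assuming illustrative atom probabilities of 0.5, 0.25, 0.25 for the three outcomes:

```python
# Spinning wheel with Omega = {1, 2, 3}; the atom probabilities below are
# hypothetical, chosen only so that they sum to 1.
from itertools import combinations

atoms = {1: 0.5, 2: 0.25, 3: 0.25}
omega = set(atoms)

def P(event):
    # probability of an event (a subset of omega), by additivity over atoms
    return sum(atoms[w] for w in event)

# enumerate the full power set: 2^3 = 8 events
events = [set(c) for r in range(4) for c in combinations(sorted(omega), r)]
assert len(events) == 8
assert P(set()) == 0 and P(omega) == 1        # µ(∅) = 0, µ(Ω) = 1
assert all(0 <= P(e) <= 1 for e in events)    # 0 ≤ µ(A) ≤ 1
assert P({1, 2}) == P({1}) + P({2})           # additivity on disjoint events
```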
RANDOM VARIABLE
• a variable whose value is subject to random variation, i.e. due to chance (what "chance" actually is remains a matter of philosophical debate!)
• it can take one of a set of possible values, each with an associated probability
• mathematically, it is a function X from a sample space Ω (which forms a probability space) to, usually, the reals
• X is subject to a condition of "measurability": each range of values of the real line must have a preimage in Ω which has a probability value
• this way, we can forget about the initial probability space and simply record the probabilities of the various values of X
EXAMPLE
• the sample space is the set of outcomes of rolling two dice
  Ω = { (1,1), (1,2), (1,3), (1,4), ..., (6,4), (6,5), (6,6) }
• a random variable can be the function that associates each roll of the two dice with the sum S of the faces
• random variables can be discrete or continuous
• S is a discrete random variable
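A small sketch of this construction in Python: S is literally a function on Ω, and its distribution is obtained by pushing the uniform probability on Ω forward through S.

```python
# Build S = sum of two fair dice directly from the sample space Omega.
from collections import Counter
from fractions import Fraction

omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p_outcome = Fraction(1, 36)            # uniform probability on Omega

pmf = Counter()
for a, b in omega:                     # push forward through S(a, b) = a + b
    pmf[a + b] += p_outcome

print(pmf[7])                          # 1/6: six of the 36 outcomes sum to 7
```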
(CUMULATIVE) PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE
• the probability distribution of a random variable X records the probability values for all real values x in the range of X
• we can then answer all questions of the form: what is the probability P(a ≤ X ≤ b), P(X > a), etc.?
  • these ranges of values are called "Borel sets"
• all the information is captured by the cumulative distribution F(x) = P(X ≤ x)
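Such interval queries follow directly from F. A short sketch, assuming scipy.stats is available, with a standard Gaussian as the example variable:

```python
# P(a ≤ X ≤ b) = F(b) - F(a) for a continuous r.v.; here X ~ N(0, 1).
from scipy.stats import norm

F = norm(loc=0, scale=1).cdf
a, b = -1.0, 1.0
print(F(b) - F(a))     # ≈ 0.6827, the classic one-sigma probability
print(1 - F(a))        # P(X > a)
```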
DISCRETE PROBABILITY DISTRIBUTIONS
• a random variable is called discrete when X can only assume a finite or countably infinite (e.g. the set of integer numbers 1, 2, 3, ...) number of values
• it is described by a (probability) mass function
• common discrete distributions: Poisson, Bernoulli, binomial, ...
  • binomial → mathematical description of the number of successes in a series of trials
BINOMIAL DISTRIBUTION
• the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p:
  P(X = k) = (n choose k) p^k (1−p)^(n−k),  k = 0, 1, ..., n
• [figure: an example of the probability (mass) distribution (left) and of the cumulative distribution (right)]
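A quick sketch with scipy.stats (an assumed dependency); n = 10 and p = 0.3 are illustrative values:

```python
# Binomial distribution: pmf, cdf and moments for n = 10 trials, p = 0.3.
from scipy.stats import binom

n, p = 10, 0.3
X = binom(n, p)
print(X.pmf(3))              # P(exactly 3 successes)
print(X.cdf(3))              # P(at most 3 successes)
print(X.mean(), X.var())     # np = 3.0 and np(1-p) = 2.1 (see the moments slide)
```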
CONTINUOUS DISTRIBUTIONS – PDFS
• a random variable is called continuous when it can assume values in a non-countable set (e.g. the real line)
• it is described by a probability density function (PDF), which describes the likelihood of the variable taking any continuous (real) value
• the probability of any range of values (e.g., an interval) is the integral of the PDF over that range
• there are mixed distributions as well, whose cumulative function combines a continuous part and a discrete (jump) part
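The "probability of an interval = integral of the PDF" statement can be checked numerically. A sketch assuming scipy, with an illustrative Gaussian:

```python
# Probability of an interval = integral of the PDF over it; compare against
# the difference of CDF values.
from scipy.stats import norm
from scipy.integrate import quad

X = norm(loc=0, scale=2)
area, _ = quad(X.pdf, -1, 3)          # numerically integrate the PDF on [-1, 3]
print(area, X.cdf(3) - X.cdf(-1))     # the two values agree
```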
EXAMPLES OF CONTINUOUS PDFS
• examples of continuous PDFs:
  • Gaussian → fundamental, see the central limit theorem
  • Beta, gamma, chi-square, ...
EXAMPLE OF CONTINUOUS PDF: THE GAUSSIAN PDF
• the most "famous" continuous random variable: the Gaussian r.v.
• typical PDF of a Gaussian:
  f(x) = (1 / (σ √(2π))) exp( −(x − µ)² / (2σ²) )
• its shape is characterised by a mean µ and a standard deviation σ
MOMENTS
• a random variable can be (partially) described by its moments, which give some indication of its shape
• n-th moment of a probability distribution:
  E[Xⁿ] = ∫ xⁿ dF(x)
  where X is a random variable with cumulative distribution F
• E is called the expectation operator
• a given moment may or may not exist for a r.v. X
• two major moments: mean and variance
MEAN AND VARIANCE
• mean or expected value (first-order moment)
  • continuous case: E[X] = ∫ x f(x) dx
  • discrete case: E[X] = ∑i xi p(xi)
• variance (second-order central moment)
  • continuous case: Var(X) = ∫ (x − E[X])² f(x) dx
  • discrete case: Var(X) = ∑i (xi − E[X])² p(xi)
• it describes how spread out the values of X are with respect to the mean
• standard deviation → square root of the variance
• relation between mean and variance: Var(X) = E[X²] − (E[X])²
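The mean/variance relation is easy to verify on a sample. A sketch assuming numpy, with an illustrative exponential sample:

```python
# Check Var(X) = E[X²] − (E[X])² on a large sample.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)     # rate λ = 1/2: mean 2, variance 4

mean = x.mean()
var_direct = ((x - mean) ** 2).mean()            # E[(X − µ)²]
var_shortcut = (x ** 2).mean() - mean ** 2       # E[X²] − (E[X])²
print(mean, var_direct, var_shortcut)            # ≈ 2, ≈ 4, ≈ 4
```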
EXAMPLES OF MOMENTS
• Normal (Gaussian) distribution: mean µ, variance σ²
• Binomial: mean np, variance np(1−p)
• Exponential distribution (rate λ):
  • mean λ⁻¹
  • variance λ⁻²
LAWS OF PROBABILITY
LAW OF LARGE NUMBERS
• describes what happens when you repeat the same random experiment an increasing number of times n
• the average of the results (sample mean) X̄n = (X1 + ... + Xn)/n should be close to the expected value (mean) µ
• probabilities become predictable as we run the same trial more and more times!
• strong law: P( limn→∞ X̄n = µ ) = 1
• weak law: for every ε > 0, limn→∞ P( |X̄n − µ| > ε ) = 0
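A minimal simulation of the law, assuming numpy: the running sample mean of fair die rolls drifts toward the expected value 3.5.

```python
# Running sample mean of die rolls converging to E[X] = 3.5.
import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)                     # fair die: 1..6
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
print(running_mean[[9, 999, 99_999]])                        # closer and closer to 3.5
```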
CENTRAL LIMIT THEOREM
• the mean of a sufficiently large number of iterates of independent random variables is normally (Gaussian) distributed
• let X1, ..., Xn be independent and identically distributed r.v.s with mean µ and variance σ²
• we can build the usual sample average Sn = (X1 + ... + Xn)/n
• the random variable √n (Sn − µ) tends (in distribution) to a Gaussian with mean 0 and variance σ²
CENTRAL LIMIT THEOREM – ILLUSTRATION
• [figure: distribution of the sum of N uniform random variables, for increasing N]
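The illustration can be reproduced with a few lines of numpy (assumed): standardised sums of N uniforms have mean ≈ 0 and standard deviation ≈ 1, and their histogram approaches a bell curve as N grows.

```python
# Sums of N uniform r.v.s on [0, 1]: mean N/2, variance N/12; standardise and
# inspect how close the result is to a standard Gaussian.
import numpy as np

rng = np.random.default_rng(2)
for N in (1, 2, 12):
    s = rng.uniform(0, 1, size=(100_000, N)).sum(axis=1)
    z = (s - N / 2) / np.sqrt(N / 12)
    print(N, z.mean(), z.std())      # ≈ 0 and ≈ 1; a histogram of z looks bell-shaped
```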
CONDITIONAL PROBABILITY
CONDITIONAL PROBABILITIES
• the probability that an event will occur, given that another event has occurred (or not)
• read "probability of A given B", written P(A|B)
• two definitions:
  • as the quotient of the joint probability of A and B and the probability of B:
    P(A|B) = P(A ∩ B) / P(B)
  • as a (multiplication) axiom of probability theory (De Finetti):
    P(A ∩ B) = P(A|B) P(B)
ILLUSTRATIONS
• example: P(A) = 0.52, but P(A|B1) = 0.1 and P(A|B2) = 0.12
• rolling two dice, with A the value of the first die and B that of the second:
  • P({A=2}) = 6/36 = 1/6
  • P({A=2} | {A+B ≤ 5}) = 3/10
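The dice numbers can be verified by enumeration. A short sketch:

```python
# P(A = 2) and P(A = 2 | A + B ≤ 5) for two fair dice, by counting outcomes.
from fractions import Fraction

omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
cond = [(a, b) for a, b in omega if a + b <= 5]       # conditioning event

print(Fraction(sum(a == 2 for a, b in omega), len(omega)))   # 1/6
print(Fraction(sum(a == 2 for a, b in cond), len(cond)))     # 3/10
```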
LAW OF TOTAL PROBABILITY
• a fundamental law relating marginal probabilities to conditional probabilities
• idea: if the universe can be decomposed into a disjoint partition of events Bi, the marginal (total) probability of an event A is the sum of the joint probabilities with the Bi:
  P(A) = ∑i P(A ∩ Bi)
• each P(A ∩ Bi) can also be expressed via the conditionals:
  P(A) = ∑i P(A|Bi) P(Bi)
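Continuing the dice example, a sketch that partitions Ω by the value of the first die, Bi = {A = i}:

```python
# Law of total probability: P(A+B ≤ 5) recovered by summing joints over the
# partition B_i = {first die = i}.
from fractions import Fraction

omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
A = [(a, b) for a, b in omega if a + b <= 5]

total = sum(
    Fraction(sum(o[0] == i for o in A), len(omega))   # P(A ∩ B_i)
    for i in range(1, 7)
)
print(total, Fraction(len(A), len(omega)))            # both 5/18
```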
BAYES’ RULE
• relates conditional and prior probabilities:
  P(A|B) = P(B|A) P(A) / P(B)
• it has various interpretations, according to the different interpretations of probability measures
• Bayesian interpretation: probability is a degree of belief in a proposition A, before (P(A)) or after (P(A|B)) new evidence is gathered
• evidence is always in the form "proposition B is true"
• nomenclature:
  • P(A) is the prior (the initial degree of belief in A)
  • P(A|B) is the posterior (after evidence B is considered)
• posteriors are in the form of conditional probabilities, and can be computed by Bayes' rule
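A minimal numerical sketch with hypothetical numbers (a made-up diagnostic-test scenario, not from the slides): prior P(A) = 0.01, likelihoods P(B|A) = 0.9 and P(B|¬A) = 0.05.

```python
# Posterior by Bayes' rule; the denominator P(B) comes from the law of
# total probability over the partition {A, not A}.
p_a = 0.01                 # prior
p_b_given_a = 0.90         # likelihood of the evidence under A
p_b_given_not_a = 0.05     # likelihood of the evidence under not-A

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = p_b_given_a * p_a / p_b
print(posterior)           # ≈ 0.154: the evidence raises the belief in A
```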
MANY VARIABLES
JOINT DISTRIBUTION OF SEVERAL RANDOM VARIABLES
• what happens when we have more than one random variable, X, Y, ..., on the same probability space?
• we can define a joint distribution, which specifies the probability of X, Y, etc. falling in any given range of values
• example: the joint Gaussian
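A sketch of the joint Gaussian example, assuming numpy; the mean and covariance values are illustrative:

```python
# Sample a 2-D (joint) Gaussian and check its empirical mean and covariance.
import numpy as np

rng = np.random.default_rng(3)
mean = [0.0, 1.0]
cov = [[1.0, 0.8],
       [0.8, 2.0]]                    # X and Y are correlated
xy = rng.multivariate_normal(mean, cov, size=50_000)
print(xy.mean(axis=0))                # ≈ [0, 1]
print(np.cov(xy.T))                   # ≈ cov
```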
MARGINAL DISTRIBUTION
• from the joint distribution P(X, Y, ...) of two or more random variables X, Y, ..., one can recover the distribution of each single random variable X
• this is called the marginal distribution of X
• discrete formula: P(X = x) = ∑y P(X = x, Y = y)
• continuous formula: fX(x) = ∫ fX,Y(x, y) dy
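The discrete formula amounts to summing the joint table along one axis. A sketch with an illustrative table (numpy assumed):

```python
# Marginalising a discrete joint table P(X, Y) by summing out one variable.
import numpy as np

joint = np.array([[0.10, 0.20],    # rows: values of X
                  [0.30, 0.40]])   # columns: values of Y
p_x = joint.sum(axis=1)            # P(X = x) = ∑_y P(X = x, Y = y)
p_y = joint.sum(axis=0)            # P(Y = y) = ∑_x P(X = x, Y = y)
print(p_x, p_y)                    # [0.3 0.7] and [0.4 0.6]
```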
INDEPENDENCE
• independence of events: P(A ∩ B) = P(A) P(B)
• independence and conditional probability: when P(B) > 0, A and B are independent iff P(A|B) = P(A)
• generalises to n events: distinguish pairwise vs mutual independence
• independence of random variables: every pair of Borel intervals must be independent (as events)
• the joint PDF then decomposes as fX,Y(x, y) = fX(x) fY(y)
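For discrete variables, independence means the joint table equals the outer product of its marginals. A sketch (numpy assumed; the table is constructed to be independent):

```python
# Independence check: the joint table equals the outer product of marginals.
import numpy as np

joint = np.array([[0.12, 0.28],
                  [0.18, 0.42]])
p_x = joint.sum(axis=1)                          # [0.4, 0.6]
p_y = joint.sum(axis=0)                          # [0.3, 0.7]
print(np.allclose(joint, np.outer(p_x, p_y)))    # True: X and Y independent
```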
RANDOM PROCESS
• also called a stochastic process: a collection of random variables
• typically used to describe the evolution of some random value over time, X(t)
• in this sense, it is the statistical counterpart of deterministic dynamical systems, whose evolution is fixed given X(0)
• however, it can be defined over any domain: 2-D, etc.
• discrete time: a sequence of random variables, or time series
• continuous domain: a random field
RANDOM PROCESSES AS RANDOM-VALUED FUNCTIONS
• interpretation: a random process is a function on its domain whose values are random variables
• the random values at different points of the domain can be completely different
• usually, though, they are required to be of the same type (identically distributed)
• the component random variables can be independent or have complicated statistical relations
• examples: EEG signals, stock market fluctuations, but also images and videos!
• Markov Random Fields are used for image segmentation
RANDOM PROCESSES AS ENSEMBLES OF REALIZATIONS
• a helpful interpretation: as an ensemble of functions
• idea: you extract a sample value from each random variable forming the process
• you get a standard, "deterministic" function on the same domain as the random process
• to each such function is attached a probability value → a process is a probability distribution over functions
TYPES OF RANDOM PROCESSES
• a stationary process is one for which the joint distribution of a collection of its random variables does not change when shifted around its domain
  • for instance, P(X(t), X(t+1)) = P(X(t+2), X(t+3))
  • weak-sense stationarity → only the mean and the autocovariance are required to be shift-invariant
• a process is ergodic if its moments can be obtained as limits of sample means and covariances, as the size of the sample goes to ∞ (see the sketch below)
• processes can have discrete time, continuous time, etc.
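A minimal illustration of the ergodic idea, assuming numpy: for an i.i.d. (hence stationary and ergodic) process, the time average along one realization recovers the ensemble mean.

```python
# Time average of a single long realization vs the ensemble mean.
import numpy as np

rng = np.random.default_rng(4)
realization = rng.normal(loc=2.0, scale=1.0, size=100_000)   # one sample path
print(realization.mean())       # time average → ensemble mean 2.0
```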
EXAMPLES
• [figure: a simple Markov chain describing market conditions]
• [figure: an example of the transition matrix of a Markov chain describing weather conditions]
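Since the original transition matrix is in a figure, here is a sketch with hypothetical weather transition probabilities (numpy assumed), simulating the chain and estimating its long-run behaviour:

```python
# Two-state Markov chain (0 = sunny, 1 = rainy) with an illustrative
# transition matrix; long-run state frequencies estimate the stationary
# distribution.
import numpy as np

T = np.array([[0.9, 0.1],     # P(next state | current = sunny)
              [0.5, 0.5]])    # P(next state | current = rainy)

rng = np.random.default_rng(5)
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    state = rng.choice(2, p=T[state])
    counts[state] += 1
print(counts / counts.sum())  # ≈ [5/6, 1/6], the stationary distribution
```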
SUMMARY
SUMMARY OF WEEK 9
• introduction to probability theory
• probability measures
• random variables
• moments, mean and variance
• laws of probability
• Bayes' rule and conditional probabilities
• independence
• random processes
• Markov chains