MATHEMATICS FOR COMPUTER VISION
WEEK 9: PROBABILITY AND INFERENCE
Dr Fabio Cuzzolin
MSc in Computer Vision, Oxford Brookes University, Year 2013-14

OUTLINE OF WEEK 9
introduction to probability theory and Bayesian inference
probability measures
random variables
marginal and joint distributions
Bayes' rule and conditional probability
random processes
Markov chains

PROBABILITY MEASURES AND DISTRIBUTIONS

PROBABILITY MEASURES
probability measure → mathematical representation of the notion of chance
it assigns a probability value to every subset of a collection of possible outcomes (of a random experiment, of a decision problem, etc.)
collection of outcomes → sample space, or universe
subset of the universe → event

EXAMPLE
typical example: the spinning wheel
a spinning wheel with 3 possible outcomes has universe Ω = {1, 2, 3}
there are eight possible events, including the empty set
the probability of ∅ is 0, the probability of Ω is 1
additivity: P({1,2}) = P({1}) + P({2})

FORMAL DEFINITION
probability measure µ: a real-valued function on a probability space that satisfies countable additivity
probability space: a triple (Ω, F, P) formed by a universe Ω, a σ-algebra F of its subsets, and a probability measure P on F
not all subsets of Ω necessarily belong to F
axioms:
• µ(∅) = 0, µ(Ω) = 1
• 0 ≤ µ(A) ≤ 1 for all events A ∈ F
• additivity: for every countable collection of pairwise disjoint events Ai, µ(∪i Ai) = ∑i µ(Ai)

RANDOM VARIABLE
a variable whose value is subject to random variations, i.e. due to chance (what chance is, is subject to philosophical debate!)
it can take one of a set of possible values, each with an associated probability
mathematically, it is a function X from a sample space Ω (which forms a probability space) to (usually) the reals
it is subject to a condition of "measurability": each range of values of the real line must have an anti-image in Ω which has a probability value
this way, we can forget about the initial probability space and record the probabilities of the various values of X

EXAMPLE
the sample space is the set of outcomes of rolling two dice: Ω = { (1,1), (1,2), (1,3), (1,4), ..., (6,4), (6,5), (6,6) }
a random variable can be the function that associates each roll of the two dice with the sum S of the faces
random variables can be discrete or continuous; S is a discrete random variable

(CUMULATIVE) PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE
the probability distribution of a random variable X records the probability values for all real values x in the range of X
we can then answer all questions of the form: what is the probability P(a ≤ X ≤ b), P(X > a), etc.
• these ranges of values are called "Borel sets"
all the information is captured by the cumulative distribution F(x) = P(X ≤ x)

DISCRETE PROBABILITY DISTRIBUTIONS
a random variable is called discrete when X can only assume a finite or countably infinite (e.g. the set of integers 1, 2, 3, ...) number of values
it is described by a (probability) mass function
examples of common discrete distributions: Poisson, Bernoulli, binomial → mathematical descriptions of the number of successes in a series of trials

BINOMIAL DISTRIBUTION
the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p
P(X = k) = (n choose k) p^k (1 − p)^(n − k), for k = 0, 1, ..., n
[figure: example of a binomial mass function (left) and of its cumulative distribution (right)]
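As a concrete companion to the binomial slide, here is a minimal sketch (not part of the original lecture, and assuming SciPy is available) that evaluates the mass function, uses the cumulative distribution to answer range queries, and recovers the mean np and variance np(1 − p) quoted later under moments.

```python
# Minimal sketch (assumption: SciPy is installed): binomial distribution
# with n = 10 independent trials and success probability p = 0.3.
from scipy.stats import binom

n, p = 10, 0.3

# mass function P(X = k) for a single number of successes k
print("P(X = 3)       =", binom.pmf(3, n, p))

# cumulative distribution F(x) = P(X <= x)
print("P(X <= 3)      =", binom.cdf(3, n, p))

# probability of a range of values via the cumulative distribution
print("P(2 <= X <= 5) =", binom.cdf(5, n, p) - binom.cdf(1, n, p))

# first two moments, matching the formulas quoted in the moments slide
print("mean =", binom.mean(n, p), " variance =", binom.var(n, p))
```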
CONTINUOUS DISTRIBUTIONS – PDFS
a random variable is called continuous when it can assume values in a non-countable set (e.g. the real line)
it is described by a probability density function (PDF), which describes the likelihood of the variable taking any continuous (real) value
the probability of any range of values (e.g. an interval) is the integral of the PDF over the range: P(a ≤ X ≤ b) = ∫[a,b] f(x) dx
there are mixed distributions as well, whose cumulative function combines a discrete and a continuous part, F(x) = α Fd(x) + (1 − α) Fc(x)

EXAMPLES OF CONTINUOUS PDFs
• Gaussian → fundamental, see the central limit theorem
• Beta, gamma, chi-square, ...

EXAMPLE OF CONTINUOUS PDF: THE GAUSSIAN PDF
the most "famous" continuous random variable: the Gaussian r.v.
typical PDF of a Gaussian: a bell shape characterised by a mean µ and a standard deviation σ
f(x) = 1 / (σ √(2π)) · exp( −(x − µ)² / (2σ²) )

MOMENTS
a random variable can be (partially) described by its moments, which give some indication of its shape
n-th moment of a probability distribution: E[X^n] = ∫ x^n dF(x), where X is a random variable with cumulative distribution F
E is called the expectation operator
a given moment may or may not exist for a r.v. X
two major moments: mean and variance

MEAN AND VARIANCE
mean or expected value (first-order moment)
• continuous case: E[X] = ∫ x f(x) dx
• discrete case: E[X] = ∑i xi P(X = xi)
variance (second-order central moment)
• continuous case: Var(X) = ∫ (x − µ)² f(x) dx
• discrete case: Var(X) = ∑i (xi − µ)² P(X = xi)
it describes how spread out the values of X are with respect to the mean
standard deviation → square root of the variance
relation between mean and variance: Var(X) = E[X²] − (E[X])²

EXAMPLES OF MOMENTS
normal (Gaussian) distribution: mean µ, variance σ²
binomial: mean np, variance np(1 − p)
exponential distribution:
• mean 1/λ
• variance 1/λ²

LAWS OF PROBABILITY

LAW OF LARGE NUMBERS
describes what happens when you repeat the same random experiment an increasing number of times n
the average of the results (sample mean) should be close to the expected value (mean)
probabilities become predictable as we run the same trial more and more times!
strong law: the sample mean converges to the mean almost surely; weak law: it converges in probability
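The following small simulation (not from the original slides; it assumes NumPy is available) illustrates the law of large numbers: the sample mean of repeated rolls of a fair die drifts towards the expected value 3.5 as the number of rolls grows.

```python
# Minimal sketch (assumption: NumPy is available): law of large numbers
# illustrated with repeated rolls of a fair six-sided die.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)       # outcomes 1..6, equally likely

# sample mean over the first n rolls, for increasing n
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}   sample mean = {rolls[:n].mean():.4f}")

# expected value of a single roll: (1 + 2 + ... + 6) / 6 = 3.5
print("expected value =", np.arange(1, 7).mean())
```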
CENTRAL LIMIT THEOREM
the mean of a sufficiently large number of iterates of independent random variables is normally (Gaussian) distributed
let X1, ..., Xn be independent and identically distributed r.v.s with the same mean µ and variance σ²
we can build the usual sample average Sn = (X1 + ... + Xn) / n
the random variable √n (Sn − µ) tends to a Gaussian with mean 0 and variance σ²

CENTRAL LIMIT THEOREM – ILLUSTRATION
[figure: distribution of the sum of N uniform random variables, approaching a Gaussian shape as N grows]

CONDITIONAL PROBABILITY

CONDITIONAL PROBABILITIES
the probability that an event will occur, given that another event has occurred (or not)
read "probability of A given B", written P(A|B)
two definitions:
• as the quotient of the joint probability of A and B and the probability of B: P(A|B) = P(A∩B) / P(B)
• as a (multiplication) axiom of probability theory (de Finetti): P(A∩B) = P(A|B) P(B)

ILLUSTRATIONS
P(A) = 0.52, P(A|B1) = 0.1, P(A|B2) = 0.12
rolling two dice: P({A=2}) = 6/36 = 1/6, while P({A=2} | {A+B ≤ 5}) = 3/10

LAW OF TOTAL PROBABILITY
fundamental law relating marginal probabilities to conditional probabilities
idea: if the universe can be decomposed into a disjoint partition of events Bi, the marginal (total) probability of an event A is the sum of the joint probabilities P(A∩Bi)
it can also be expressed via the conditionals: P(A) = ∑i P(A|Bi) P(Bi)

BAYES' RULE
relates conditional and prior probabilities: P(A|B) = P(B|A) P(A) / P(B)
it has various interpretations, according to the different interpretations of probability measures
Bayesian interpretation: probability is a degree of belief in a proposition A, before (P(A)) or after (P(A|B)) new evidence is gathered
evidence is always in the form "proposition B is true"
nomenclature:
• P(A) is the prior (the initial degree of belief in A)
• P(A|B) is the posterior (the degree of belief after evidence B is considered)
posteriors are in the form of conditional probabilities, and can be computed by Bayes' rule
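To make the two-dice illustration and Bayes' rule tangible, here is a short enumeration sketch (not part of the original slides, plain Python only): it recomputes P(A=2), the conditional P(A=2 | A+B ≤ 5) = 3/10, and checks that Bayes' rule gives the same answer.

```python
# Minimal sketch (not from the slides): conditional probability and Bayes' rule
# checked by enumerating all 36 equally likely outcomes of rolling two dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (A, B) pairs, all equally likely

def prob(event):
    """Probability of an event, given as a predicate on an outcome (a, b)."""
    return sum(1 for o in outcomes if event(*o)) / len(outcomes)

A  = lambda a, b: a == 2            # first die shows 2
B  = lambda a, b: a + b <= 5        # sum of the faces is at most 5
AB = lambda a, b: A(a, b) and B(a, b)

print("P(A)      =", prob(A))                   # 6/36 = 1/6
print("P(A | B)  =", prob(AB) / prob(B))        # 3/10, as on the slide
# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B), with P(B | A) = P(A ∩ B) / P(A)
print("via Bayes =", (prob(AB) / prob(A)) * prob(A) / prob(B))
```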
MANY VARIABLES

JOINT DISTRIBUTION OF SEVERAL RANDOM VARIABLES
what happens when we have more than one random variable X, Y, ... on the same probability space?
we can define a joint distribution, which specifies the probability of X, Y, etc. falling in any given range of values
example: the joint Gaussian

MARGINAL DISTRIBUTION
from the joint distribution P(X, Y, ...) of two or more random variables X, Y, ..., one can recover the distribution of each single random variable X
this is called the marginal distribution of X
discrete formula: P(X = x) = ∑y P(X = x, Y = y)
continuous formula: fX(x) = ∫ f(x, y) dy

INDEPENDENCE
independence of events: A and B are independent when P(A∩B) = P(A) P(B)
independence and conditional probability: equivalently, P(A|B) = P(A)
this generalises to n events: distinguish pairwise from mutual independence
independence of random variables: for every pair of Borel intervals A and B, the events {X ∈ A} and {Y ∈ B} are independent
the joint PDF then decomposes as f(x, y) = fX(x) fY(y)

RANDOM PROCESS
also called a stochastic process: a collection of random variables
typically used to describe the evolution of some random value over time, X(t)
in this sense, it is the statistical counterpart of deterministic dynamical systems, whose evolution is fixed given X(0)
however, it can be defined over any domain, 2-D, etc.
discrete time: a sequence of random variables, or time series
continuous domain: a random field

RANDOM PROCESSES AS RANDOM-VALUED FUNCTIONS
interpretation: a random process is a function on its domain, whose values are random variables
the random variables at different points of the domain can be completely different
usually, though, they are required to be of the same type (identically distributed)
the component random variables can be independent or have complicated statistical relations
examples: EEG signals, stock market fluctuations, but also images and videos!
Markov Random Fields are used for image segmentation

RANDOM PROCESSES AS ENSEMBLES OF REALIZATIONS
helpful interpretation: a random process as an ensemble of functions
idea: you extract a sample value from each random variable forming the process
you get a standard, "deterministic" function on the same domain as the random process
to each such function is attached a probability value → the process is a probability distribution over functions

TYPES OF RANDOM PROCESSES
a stationary process is one for which the joint distribution of a collection of its random variables does not change when shifted around its domain
• for instance, P(X(t), X(t+1)) = P(X(t+2), X(t+3))
• weak-sense stationarity only requires the mean and covariance to be shift-invariant
a process is ergodic if its moments can be obtained as limits of sample means and covariances, as the size of the sample goes to ∞
other distinctions: discrete time, continuous time, etc.

EXAMPLES
[figures: a simple Markov chain describing market conditions, and an example transition matrix of a Markov chain describing weather conditions]

SUMMARY

SUMMARY OF WEEK 9
introduction to probability theory
probability measures
random variables
moments, mean and variance
laws of probability
Bayes' rule and conditional probabilities
independence
random processes
Markov chains
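As a closing sketch of the Markov chain idea pictured in the examples slide (not part of the original lecture; the two-state weather model and its transition matrix are made up for illustration, and NumPy is assumed available): the next state depends only on the current one, and the transition matrix fully specifies the chain.

```python
# Minimal sketch (assumptions: NumPy is available; the transition matrix is
# invented for illustration, not taken from the slides).
import numpy as np

states = ["sunny", "rainy"]
# P[i, j] = probability of moving from state i to state j in one step
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

rng = np.random.default_rng(1)

def simulate(n_steps, start=0):
    """Simulate one realization of the chain, one transition at a time."""
    x = start
    path = [states[x]]
    for _ in range(n_steps):
        x = rng.choice(len(states), p=P[x])   # sample the next state from row x
        path.append(states[x])
    return path

print(simulate(10))

# long-run (stationary) distribution: left eigenvector of P with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print("stationary distribution:", pi / pi.sum())
```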