Probability Distributions and Frequentist Statistics

“A single death is a tragedy, a million deaths is a statistic” – Joseph Stalin

The Red and the Blue
An urn contains N balls in total, M red and N-M blue. For the first draw, P(R_1|I) = M/N. Can we answer the same question for the second draw, i.e. what is P(R_2|I)? The outcome of the first draw (red or blue) is a "nuisance" parameter: to marginalize is to integrate (here, sum) over all of its options, R_2 = (R_1, R_2) + (B_1, R_2):

P(R_2|I) = P(R_1, R_2|I) + P(B_1, R_2|I)
         = P(R_1|I)\,P(R_2|R_1, I) + P(B_1|I)\,P(R_2|B_1, I)        [product rule]
         = \frac{M}{N}\cdot\frac{M-1}{N-1} + \frac{N-M}{N}\cdot\frac{M}{N-1}
         = \frac{M}{N} = P(R_1|I) = P(R_3|I) = \dots

Marginalization
Joint probabilities of cloud cover and rain, with the marginal ("chance of") values obtained by summing rows and columns:

              RAIN    NO RAIN    Total
CLOUDS         1/6      1/3       1/2
NO CLOUDS       0       1/2       1/2
Total          1/6      5/6

Chance of rain = 1/6; chance of cloud = 1/2.

Where A_i represents a set of mutually exclusive and exhaustive possibilities, marginalization (the integrating out of "nuisance parameters") takes the form

P(\theta|D,I) = \sum_i P(\theta, A_i|D,I)

or, in the limit of a continuously variable parameter A (rather than the discrete case above), P changes into a probability density function:

P(\theta|D,I) = \int dA\, P(\theta, A|D,I)

This technique is often required in inference; for example, we may be interested in the frequency of a sinusoidal signal in noisy data but not in its amplitude (a nuisance parameter).

Probability Distributions
We denote the probability distribution over all possible values of a variable x by p(x). A distribution may be discrete or continuous, and may also be described in cumulative form. For a continuous variable the probability density is

p(x) = \lim_{\delta x \to 0} \frac{P(x < X < x + \delta x)}{\delta x}

Properties of Probability Distributions
The expectation value of a function g(X) is the weighted average

\langle g(X)\rangle = \sum_{\mathrm{all}\ x} g(x)\, p(x)   (discrete)
\langle g(X)\rangle = \int g(x)\, f(x)\, dx                (continuous)

If it exists, \langle X\rangle is the first moment, or mean, of the distribution. The rth moment of a random variable X about the origin (x = 0) is

\mu'_r = \langle X^r\rangle = \sum_{\mathrm{all}\ x} x^r\, p(x)   (discrete)
\mu'_r = \langle X^r\rangle = \int x^r f(x)\, dx                  (continuous)

The mean \mu = \mu'_1 = \langle X\rangle is the 1st moment about the origin.

The rth central moment of a random variable X about the mean (origin = \mu) is

\mu_r = \langle (X-\mu)^r\rangle = \sum_{\mathrm{all}\ x} (x-\mu)^r\, p(x)   (discrete)
\mu_r = \langle (X-\mu)^r\rangle = \int (x-\mu)^r f(x)\, dx                  (continuous)

First central moment: \mu_1 = \langle X-\mu\rangle = 0.
Second central moment, the variance:

\sigma_x^2 = \mathrm{Var}(X) = \langle (X-\mu)^2\rangle
           = \langle X^2 - 2\mu X + \mu^2\rangle
           = \langle X^2\rangle - 2\mu\langle X\rangle + \mu^2
           = \langle X^2\rangle - 2\mu^2 + \mu^2
           = \langle X^2\rangle - \mu^2

Therefore the variance \sigma_x^2 = \langle X^2\rangle - \langle X\rangle^2.

Third central moment: \mu_3 = \langle (X-\mu)^3\rangle (skewness).
Fourth central moment: \mu_4 = \langle (X-\mu)^4\rangle (kurtosis).

The median and the mode both provide estimates of the central tendency of a distribution, and in many cases are more robust against outliers than the mean.
Example: mean versus median filtering. [Figures: an image degraded by salt noise; the result of a mean filter; the result of a median filter]

The Uniform Distribution
A flat distribution whose constant value is normalized so that the area under the curve equals 1.
• Commonly used as an ignorance prior to express impartiality (a lack of bias) about the value of a quantity over a given interval.
• Round-off error and quantization error are uniformly distributed.
[Figures: uniform PDF; cumulative uniform PDF]

The Binomial Distribution
Binomial statistics apply when there are exactly two mutually exclusive outcomes of a trial (labelled "success" and "failure"). The binomial distribution gives the probability of observing k successes in n trials, with the probability of success on a single trial denoted by p (p is assumed fixed for all trials):

P(k\,|\,n, p) = \binom{n}{k}\, p^k (1-p)^{n-k}

[Figures: binomial PMFs for fixed p with varying n, and for fixed n with varying p]
• Among the most useful discrete distribution functions in statistics.
• The multinomial distribution is a generalization to the case where there are more than two possible outcomes.
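As a concrete illustration of the binomial section above, here is a minimal Python sketch (my own addition, not part of the original slides) that evaluates the binomial PMF with scipy.stats and checks that its first two moments match the closed-form results np and np(1-p). The particular values n = 20 and p = 0.3 are arbitrary choices for the example.

```python
# Minimal sketch: evaluate the binomial PMF and verify its first moment (mean)
# and second central moment (variance) against the closed-form results
# mean = n*p and var = n*p*(1-p).
import numpy as np
from scipy.stats import binom

n, p = 20, 0.3          # number of trials and per-trial success probability
k = np.arange(n + 1)    # all possible numbers of successes

pmf = binom.pmf(k, n, p)             # P(k | n, p) = C(n, k) p^k (1-p)^(n-k)
mean = np.sum(k * pmf)               # first moment about the origin
var = np.sum((k - mean) ** 2 * pmf)  # second central moment

print(f"sum of PMF = {pmf.sum():.6f}")              # ~1: the PMF is normalized
print(f"mean       = {mean:.4f}  (n*p = {n * p})")
print(f"variance   = {var:.4f}  (n*p*(1-p) = {n * p * (1 - p)})")
```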
The Negative Binomial Distribution
Closely related to the binomial distribution, the negative binomial distribution applies under the same circumstances, but the variable of interest is the number of trials n needed to obtain k successes and n-k failures (rather than the number of successes in a fixed number of trials). For Bernoulli trials each with success fraction p, it gives the probability that the kth success occurs on trial n, i.e. k successes and n-k failures with a success on the last trial:

P(n\,|\,k, p) = \binom{n-1}{k-1}\, p^{k} (1-p)^{n-k}

The Poisson Distribution
Another crucial discrete distribution function, the Poisson expresses the probability of a number of events k (e.g. failures, arrivals, occurrences, ...) occurring in a fixed period of time (or fixed area of space), provided these events occur at a known mean rate λ (events per unit time or area) and independently of the previous event:

P(k\,|\,\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}

• The Poisson distribution is the limiting case of a binomial distribution in which the probability of success p goes to zero while the number of trials n grows such that λ = np remains finite.
• Examples: photons received from a star in an interval; meteorite impacts over an area; pedestrians crossing at an intersection, etc.

The Normal (Gaussian) Distribution
The Normal or Gaussian distribution is probably the best-known statistical distribution. A Gaussian with mean zero and standard deviation one is known as the standard normal distribution. Given mean μ and standard deviation σ it has the PDF:

f(x\,|\,\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}

• A continuous distribution which is the limiting case of a binomial when the number of trials (and successes) is very large.
• Its pivotal role in statistics is partly due to the Central Limit Theorem (see later).
[Figures: example Gaussian distributions; the human IQ distribution]

The Power Law Distribution
Power-law distributions are ubiquitous in science, occurring in diverse phenomena including city sizes, incomes, word frequencies, and earthquake magnitudes. A power law implies that small occurrences are extremely common, whereas large instances are extremely rare. This "law" takes a number of forms (it is sometimes referred to as Zipf's law and sometimes as the Pareto distribution). A simple illustrative power law is of the form p(x) \propto x^{-k}.
[Figures: power-law PDFs for k = 0.5, 1.0, 2.0 on linear and log-log scales; example power laws from nature; physics example: the cosmic-ray spectrum]

The Exponential Distribution
The exponential distribution is a continuous probability distribution with an exponential falloff controlled by the rate parameter λ; larger values of λ entail a more rapid falloff:

f(x\,|\,\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0

• The exponential distribution is used to model times between independent events which happen at a constant average rate (e.g. lifetimes, waiting times).

The Gamma Distribution
The gamma distribution is a continuous PDF characterized by two parameters, usually designated the shape parameter k and the scale parameter θ. When k = 1 it coincides with the exponential distribution, and it is also closely related to the Poisson and chi-squared distributions.

Gamma PDF:
f(x\,|\,k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0

where the Gamma function is defined:
\Gamma(k) = \int_0^\infty t^{k-1} e^{-t}\, dt

• The gamma distribution gives a flexible class of PDFs for nonnegative phenomena, often used in modeling waiting times.
• It is the conjugate prior for the Poisson distribution.
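The gamma section above states that the distribution coincides with the exponential when k = 1. The following minimal Python sketch (my own addition, not from the slides) checks this numerically, assuming SciPy's parametrization in which the shape parameter is called a and the scale parameter is θ (so the rate is λ = 1/θ).

```python
# Minimal sketch: with shape k = 1 the gamma PDF x^(k-1) e^(-x/theta) / (Gamma(k) theta^k)
# reduces to the exponential PDF lambda * e^(-lambda x) with rate lambda = 1/theta.
import numpy as np
from scipy.stats import gamma, expon

theta = 2.0                        # scale parameter (rate lambda = 1/theta)
x = np.linspace(0.0, 10.0, 200)

gamma_pdf = gamma.pdf(x, a=1.0, scale=theta)   # gamma with shape k = 1
expon_pdf = expon.pdf(x, scale=theta)          # exponential with mean 1/lambda = theta

print(np.allclose(gamma_pdf, expon_pdf))       # True: the two PDFs coincide
```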
The Beta Distribution
The family of beta probability distributions is defined on the fixed interval [0, 1] and parameterized by two positive shape parameters, α and β. In Bayesian statistics it is frequently encountered as a prior for the binomial distribution.

Beta PDF:
f(x\,|\,\alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1

where the Beta function is defined:
B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\, dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}

• The family of beta distributions allows for a wide variety of shapes over a fixed interval.
• If the likelihood function is a binomial, then a beta prior leads to another beta distribution for the posterior.
• The role of the Beta function can be thought of as a simple normalization that ensures the total PDF integrates to 1.

Central Limit Theorem: Experimental Demonstration
[Figures: experimental demonstration of the Central Limit Theorem]

Central Limit Theorem: A Bayesian Demonstration
Let X_1 lie in the range x_1 to x_1 + dx_1 with P(x_1|I) = f_1(x_1), let X_2 lie in the range x_2 to x_2 + dx_2 with P(x_2|I) = f_2(x_2), and let Y lie in the range y to y + dy, where the background information I states that Y is the sum of X_1 and X_2. Then

P(Y|I) = \int dX_1 \int dX_2\; P(Y, X_1, X_2\,|\,I)
       = \int dX_1 \int dX_2\; P(X_1|I)\, P(X_2|I)\, P(Y|X_1, X_2, I)   [product rule, and independence of X_1, X_2]

Because y = x_1 + x_2, we have P(Y|X_1, X_2, I) = \delta(y - x_1 - x_2), and therefore

P(Y|I) = \int dx_1\, f_1(x_1) \int dx_2\, f_2(x_2)\, \delta(y - x_1 - x_2) = \int dx_1\, f_1(x_1)\, f_2(y - x_1)

which is a convolution integral.

Central Limit Theorem: Convolution Demonstration
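As a stand-in for the convolution demonstration (the original slides presumably showed plots), here is a minimal numerical sketch of my own: it repeatedly convolves a uniform PDF with itself, exactly as in the convolution integral derived above, and compares the result with a Gaussian of the same mean and variance, as the Central Limit Theorem predicts.

```python
# Minimal sketch: the PDF of a sum of independent variables is the convolution of
# their PDFs. Summing m independent U(0,1) variables gives mean m/2 and variance
# m/12, and the convolved PDF rapidly approaches the corresponding Gaussian.
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)                    # uniform PDF on [0, 1)

pdf = f.copy()
m = 4                                  # number of variables being summed
for _ in range(m - 1):
    pdf = np.convolve(pdf, f) * dx     # numerical version of the convolution integral

y = np.arange(len(pdf)) * dx           # support of the sum is approximately [0, m)
mu, var = m / 2.0, m / 12.0
gauss = np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Small maximum deviation shows the convolved PDF is already close to Gaussian.
print("max |convolved PDF - Gaussian| =", np.abs(pdf - gauss).max())
```

Increasing m shrinks the printed deviation further, which is the Central Limit Theorem in action.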