Machine Learning
Saarland University, SS 2007
Lecture 10, Friday June 22nd, 2007
(Everything you always wanted to know about statistics … but were afraid to ask)
Holger Bast [with input from Ingmar Weber]
Max-Planck-Institut für Informatik, Saarbrücken, Germany

Overview of this lecture
Maximum likelihood vs. unbiased estimators
– Example: normal distribution
– Example: drawing numbers from a box
Things you keep on reading in the ML literature
– marginal distribution
– prior
– posterior
Statistical tests
– hypothesis testing
– discussion of its (non)sense [example]

Maximum likelihood vs. unbiased estimators
Example: maximum likelihood estimators from Lecture 8, Example 2
– μ(x1,…,xn) = 1/n ∙ Σi xi and σ²(x1,…,xn) = 1/n ∙ Σi (xi – μ)²
– X1,…,Xn independent identically distributed random variables with mean μ and variance σ²
– E μ(X1,…,Xn) = μ [blackboard]
– E σ²(X1,…,Xn) = (n–1)/n ∙ σ² ≠ σ² [blackboard]
– unbiased variance estimator: 1/(n–1) ∙ Σi (xi – μ)²
Example: number x drawn from a box with numbers 1..n, for unknown n
– maximum likelihood estimator: n = x [blackboard]
– unbiased estimator: n = 2x – 1 [blackboard]
[see the simulation sketch at the end of these notes]

Marginal distribution
Joint probability distribution, for example
– pick a random MPII staff member
– random variables X = department, Y = gender
– for example, Pr(X = D3, Y = female)

            D1     D2     D3     D4     D5
male       0.24   0.09   0.13   0.25   0.11    row sum 0.82
female     0.03   0.03   0.04   0.04   0.04    row sum 0.18 = Pr(female)
col sum    0.27   0.12   0.17   0.29   0.15    (e.g., Pr(D3) = 0.17)

Note:
– matrix entries sum to 1
– in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y)
  [equality holds if and only if X and Y are independent]
[see the marginalization sketch at the end of these notes]

Frequentism vs. Bayesianism
Frequentism
– probability = relative frequency in a large number of trials
– associated with a random (physical) system
– only applied to well-defined events in a well-defined space
  for example: probability of a die showing 6
Bayesianism
– probability = degree of belief
– no random process needs to be involved at all
– applied to arbitrary statements
  for example: probability that I will like a new movie

Prior / Posterior probability
Prior
– a guess about the data, with no random experiment behind it
– one goes on computing with the guess as if it were a probability
– for example: Z1,…,Zn from the E-step of the EM algorithm
Posterior
– a probability related to an event that has already happened
– for example: all our likelihoods from Lectures 8 and 9
Note: these are not well-defined technical terms
– but they are often used as if they were, which is confusing
– the Bayesian way …

Hypothesis testing
Example: do two samples have the same mean?
– e.g., two groups of patients in a medical experiment, one group with medication and one group without
– for example, 8.6 4.3 3.2 5.1 and 2.1 4.2 7.6 3.2 2.9
Test
– formulate a null hypothesis, e.g. equal means
– compute the probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]
Outcome
– p ≤ α = 0.05: the null hypothesis is rejected at significance level α = 5% (95% confidence)
  one says: the difference of the means is statistically significant
– p > α = 0.05: the null hypothesis cannot be rejected
  one says: the difference of the means is statistically insignificant
[see the t-test sketch at the end of these notes]

Hypothesis testing — BEWARE!
What one would ideally like:
– given this data, what is the probability that my hypothesis is true?
– formally: Pr(H | D)
What one gets from hypothesis testing:
– given that my hypothesis is true, what is the probability of this (or more extreme) data?
– formally: Pr(D | H)
– but Pr(D | H) could be low for reasons other than the hypothesis being false!! [blackboard example; see also the Bayes-rule sketch at the end of these notes]

Useful at all?
– OK: challenge a theory by attempting to reject it
– NO: confirm a theory by rejecting the corresponding null hypothesis
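The following sketches are not part of the original slides; they are minimal illustrations of some of the points above. First, a small simulation for the estimator slide: it checks numerically that the maximum likelihood variance estimator is biased by the factor (n–1)/n while the 1/(n–1) version is unbiased, and likewise compares n = x with n = 2x – 1 for the box example. All concrete values (μ, σ, n, the box size, the number of trials) are arbitrary choices for illustration; numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 3.0, 2.0, 5, 100_000   # arbitrary illustration values

# Normal example: ML variance estimator (1/n) vs. unbiased estimator (1/(n-1))
samples = rng.normal(mu, sigma, size=(trials, n))
sample_means = samples.mean(axis=1, keepdims=True)
squared_devs = ((samples - sample_means) ** 2).sum(axis=1)
ml_var = squared_devs / n
unbiased_var = squared_devs / (n - 1)
print("true sigma^2        :", sigma ** 2)           # 4.0
print("mean of ML estimator:", ml_var.mean())        # ~ (n-1)/n * 4.0 = 3.2
print("mean of 1/(n-1) est.:", unbiased_var.mean())  # ~ 4.0

# Box example: a single number x drawn uniformly from {1, ..., n_box}
n_box = 20                                            # arbitrary true value
x = rng.integers(1, n_box + 1, size=trials)
print("mean of ML estimator x :", x.mean())           # ~ (n_box + 1)/2, biased
print("mean of unbiased 2x - 1:", (2 * x - 1).mean()) # ~ n_box
```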
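A short sketch of marginalization for the joint table on the marginal-distribution slide: summing over rows and columns reproduces Pr(female) = 0.18 and Pr(D3) = 0.17, and the joint probability differs from the product of the marginals, so X and Y are not independent here. Again numpy is assumed.

```python
import numpy as np

# Joint distribution Pr(X = department, Y = gender) from the slide
#                  D1    D2    D3    D4    D5
joint = np.array([[0.24, 0.09, 0.13, 0.25, 0.11],   # male
                  [0.03, 0.03, 0.04, 0.04, 0.04]])  # female

print("entries sum to 1 :", joint.sum())        # 1.0
print("Pr(Y = gender)   :", joint.sum(axis=1))  # [0.82 0.18], so Pr(female) = 0.18
print("Pr(X = dept.)    :", joint.sum(axis=0))  # [0.27 0.12 0.17 0.29 0.15], so Pr(D3) = 0.17
print("Pr(D3, female)   :", joint[1, 2])        # 0.04
print("Pr(D3)*Pr(female):", joint.sum(axis=0)[2] * joint.sum(axis=1)[1])  # 0.0306, not 0.04
```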
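For the hypothesis-testing slide, one common concrete choice of test for "do two samples have the same mean?" is a two-sample t-test; the blackboard computation may well have used a different test, so this is just one possible instantiation, assuming scipy is available.

```python
from scipy import stats

# The two samples from the slide (e.g., with vs. without medication)
group_a = [8.6, 4.3, 3.2, 5.1]
group_b = [2.1, 4.2, 7.6, 3.2, 2.9]

# Null hypothesis: both samples come from distributions with equal means.
# Welch's two-sample t-test (equal_var=False) does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

alpha = 0.05
if p_value <= alpha:
    print("null hypothesis rejected: difference of means is statistically significant")
else:
    print("cannot reject: difference of means is statistically insignificant")
```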
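Finally, a toy Bayes-rule computation for the BEWARE slide, showing that a small Pr(D | H) does not imply a small Pr(H | D). The numbers are made up purely for illustration and are not the lecture's blackboard example.

```python
# Hypothetical numbers, chosen only to make the point Pr(D|H) != Pr(H|D):
# H = "the null hypothesis is true", D = "data this extreme (or more) is observed".
p_H = 0.99              # a priori, the null hypothesis is almost always true
p_D_given_H = 0.04      # such data is rare under H -> a test at alpha = 0.05 rejects H
p_D_given_not_H = 0.05  # ... but such data is almost as rare when H is false

# Bayes' rule: Pr(H | D) = Pr(D | H) * Pr(H) / Pr(D)
p_D = p_D_given_H * p_H + p_D_given_not_H * (1 - p_H)
p_H_given_D = p_D_given_H * p_H / p_D

print(f"Pr(D | H) = {p_D_given_H:.3f}   (small, so the test says 'significant')")
print(f"Pr(H | D) = {p_H_given_D:.3f}   (the hypothesis is still almost certainly true)")
```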
Literature
Read the wonderful articles by Jacob Cohen:
– Things I have learned (so far). American Psychologist, 45(12):1304–1312, 1990
– The earth is round (p < .05). American Psychologist, 49(12):997–1003, 1994