Machine Learning
Saarland University, SS 2007
Lecture 10, Friday June 22nd, 2007
(Everything you always wanted to know about statistics … but were afraid to ask)
Holger Bast [with input from Ingmar Weber]
Max-Planck-Institut für Informatik, Saarbrücken, Germany

Overview of this lecture
Maximum likelihood vs. unbiased estimators
– Example: normal distribution
– Example: drawing numbers from a box
Things you keep on reading in the ML literature
– marginal distribution
– prior
– posterior
Statistical tests
– hypothesis testing
– discussion of its (non)sense [example]

Maximum likelihood vs. unbiased estimators
Example: maximum likelihood estimators from Lecture 8, Example 2
– μ(x1,…,xn) = 1/n ∙ Σi xi and σ²(x1,…,xn) = 1/n ∙ Σi (xi – μ)²
– X1,…,Xn independent identically distributed random variables with mean μ and variance σ²
– E μ(X1,…,Xn) = μ [blackboard]
– E σ²(X1,…,Xn) = (n–1)/n ∙ σ² ≠ σ² [blackboard]
– unbiased variance estimator: 1/(n–1) ∙ Σi (xi – μ)²
Example: number x drawn from a box with numbers 1..n, for unknown n
– maximum likelihood estimator: n = x [blackboard]
– unbiased estimator: n = 2x – 1 [blackboard]
[see the simulation sketch at the end of these notes]

Marginal distribution
Joint probability distribution, for example
– pick a random MPII staff member
– random variables X = department, Y = gender
– for example, Pr(X = D3, Y = female)

            D1     D2     D3     D4     D5
male       0.24   0.09   0.13   0.25   0.11    row sum 0.82
female     0.03   0.03   0.04   0.04   0.04    row sum 0.18 = Pr(female)
col sum    0.27   0.12   0.17   0.29   0.15    (e.g., Pr(D3) = 0.17)

Note:
– matrix entries sum to 1
– in general, Pr(X = x, Y = y) ≠ Pr(X = x) ∙ Pr(Y = y)
  [equality holds if and only if X and Y are independent]
[see the marginalization sketch at the end of these notes]

Frequentism vs. Bayesianism
Frequentism
– probability = relative frequency in a large number of trials
– associated with a random (physical) system
– only applied to well-defined events in a well-defined space
  for example: probability of a die showing 6
Bayesianism
– probability = degree of belief
– no random process needs to be involved at all
– applied to arbitrary statements
  for example: probability that I will like a new movie

Prior / Posterior probability
Prior
– a guess about the data, with no random experiment behind it
– one goes on computing with the guess as if it were a probability
– for example: Z1,…,Zn from the E-step of the EM algorithm
Posterior
– a probability related to an event that has already happened
– for example: all our likelihoods from Lectures 8 and 9
Note: these are not well-defined technical terms
– but they are often used as if they were, which is confusing
– the Bayesian way …

Hypothesis testing
Example: do two samples have the same mean?
– e.g., two groups of patients in a medical experiment, one group with medication and one group without
– for example, 8.6 4.3 3.2 5.1 and 2.1 4.2 7.6 3.2 2.9
Test
– formulate a null hypothesis, e.g. equal means
– compute the probability p of the given (or more extreme) data, assuming that the null hypothesis is true [blackboard]
Outcome
– p ≤ α = 0.05: the null hypothesis is rejected at significance level α = 5% (95% confidence)
  one says: the difference of the means is statistically significant
– p > α = 0.05: the null hypothesis cannot be rejected
  one says: the difference of the means is statistically insignificant
[see the t-test sketch at the end of these notes]

Hypothesis testing — BEWARE!
What one would ideally like:
– given this data, what is the probability that my hypothesis is true?
– formally: Pr(H | D)
What one gets from hypothesis testing:
– given that my hypothesis is true, what is the probability of this (or more extreme) data?
– formally: Pr(D | H)
– but Pr(D | H) could be low for reasons other than the hypothesis being false!! [blackboard example; see also the Bayes-rule sketch at the end of these notes]

Useful at all?
– OK: challenge a theory by attempting to reject it
– NO: confirm a theory by rejecting the corresponding null hypothesis
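The following sketches are not part of the original slides; they are minimal illustrations of some of the points above. First, a small simulation for the estimator slide: it checks numerically that the maximum likelihood variance estimator is biased by the factor (n–1)/n while the 1/(n–1) version is unbiased, and likewise compares n = x with n = 2x – 1 for the box example. All concrete values (μ, σ, n, the box size, the number of trials) are arbitrary choices for illustration; numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 3.0, 2.0, 5, 100_000   # arbitrary illustration values

# Normal example: ML variance estimator (1/n) vs. unbiased estimator (1/(n-1))
samples = rng.normal(mu, sigma, size=(trials, n))
sample_means = samples.mean(axis=1, keepdims=True)
squared_devs = ((samples - sample_means) ** 2).sum(axis=1)
ml_var = squared_devs / n
unbiased_var = squared_devs / (n - 1)
print("true sigma^2        :", sigma ** 2)           # 4.0
print("mean of ML estimator:", ml_var.mean())        # ~ (n-1)/n * 4.0 = 3.2
print("mean of 1/(n-1) est.:", unbiased_var.mean())  # ~ 4.0

# Box example: a single number x drawn uniformly from {1, ..., n_box}
n_box = 20                                            # arbitrary true value
x = rng.integers(1, n_box + 1, size=trials)
print("mean of ML estimator x :", x.mean())           # ~ (n_box + 1)/2, biased
print("mean of unbiased 2x - 1:", (2 * x - 1).mean()) # ~ n_box
```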
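A short sketch of marginalization for the joint table on the marginal-distribution slide: summing over rows and columns reproduces Pr(female) = 0.18 and Pr(D3) = 0.17, and the joint probability differs from the product of the marginals, so X and Y are not independent here. Again numpy is assumed.

```python
import numpy as np

# Joint distribution Pr(X = department, Y = gender) from the slide
#                  D1    D2    D3    D4    D5
joint = np.array([[0.24, 0.09, 0.13, 0.25, 0.11],   # male
                  [0.03, 0.03, 0.04, 0.04, 0.04]])  # female

print("entries sum to 1 :", joint.sum())        # 1.0
print("Pr(Y = gender)   :", joint.sum(axis=1))  # [0.82 0.18], so Pr(female) = 0.18
print("Pr(X = dept.)    :", joint.sum(axis=0))  # [0.27 0.12 0.17 0.29 0.15], so Pr(D3) = 0.17
print("Pr(D3, female)   :", joint[1, 2])        # 0.04
print("Pr(D3)*Pr(female):", joint.sum(axis=0)[2] * joint.sum(axis=1)[1])  # 0.0306, not 0.04
```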
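For the hypothesis-testing slide, one common concrete choice of test for "do two samples have the same mean?" is a two-sample t-test; the blackboard computation may well have used a different test, so this is just one possible instantiation, assuming scipy is available.

```python
from scipy import stats

# The two samples from the slide (e.g., with vs. without medication)
group_a = [8.6, 4.3, 3.2, 5.1]
group_b = [2.1, 4.2, 7.6, 3.2, 2.9]

# Null hypothesis: both samples come from distributions with equal means.
# Welch's two-sample t-test (equal_var=False) does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

alpha = 0.05
if p_value <= alpha:
    print("null hypothesis rejected: difference of means is statistically significant")
else:
    print("cannot reject: difference of means is statistically insignificant")
```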
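Finally, a toy Bayes-rule computation for the BEWARE slide, showing that a small Pr(D | H) does not imply a small Pr(H | D). The numbers are made up purely for illustration and are not the lecture's blackboard example.

```python
# Hypothetical numbers, chosen only to make the point Pr(D|H) != Pr(H|D):
# H = "the null hypothesis is true", D = "data this extreme (or more) is observed".
p_H = 0.99              # a priori, the null hypothesis is almost always true
p_D_given_H = 0.04      # such data is rare under H -> a test at alpha = 0.05 rejects H
p_D_given_not_H = 0.05  # ... but such data is almost as rare when H is false

# Bayes' rule: Pr(H | D) = Pr(D | H) * Pr(H) / Pr(D)
p_D = p_D_given_H * p_H + p_D_given_not_H * (1 - p_H)
p_H_given_D = p_D_given_H * p_H / p_D

print(f"Pr(D | H) = {p_D_given_H:.3f}   (small, so the test says 'significant')")
print(f"Pr(H | D) = {p_H_given_D:.3f}   (the hypothesis is still almost certainly true)")
```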
Literature
Read the wonderful articles by Jacob Cohen:
– Things I have learned (so far). American Psychologist, 45(12):1304–1312, 1990
– The earth is round (p < .05). American Psychologist, 49(12):997–1003, 1994