Problem 1. The probability of a certain medical diagnostic test being positive is 90% if the patient is sick. A false positive happens 3% of the time. You get a positive result. Does that mean that you are sick?

Our approach so far: there is an unknown parameter θ taking values in the two-element set {S, H} (Sick and Healthy). A single observation Y, with values in {+, −}, is taken, and its distribution depends on θ in the following way:

P[+|S] = 0.9, P[−|S] = 0.1
P[+|H] = 0.03, P[−|H] = 0.97

Given Y = +, we could, for example, test the null hypothesis that θ = H. The p-value (for any reasonable test) is

p = P[+|H] = 0.03,

and we would reject the null θ = H.

Frequentist statistics

The standard (frequentist) reasoning behind the decision to reject in the test is the following: if we repeated the test on 1,000 healthy individuals, we would not observe significantly more than 30 positive results. Therefore, what we observed (a +) is very unlikely to occur by chance alone in a population of healthy individuals. The subtle point is that we need to be able to repeat the experiment many times (hence the term frequentist) from the same distribution, i.e., the same value of the parameter.

A different point of view - Bayesian statistics

The problem with the frequentist approach in this particular setting is that we cannot sample from the healthy individuals only, but we can sample from the overall population where, e.g., P[H] = 0.99 and P[S] = 0.01. This introduces a probability on the parameter space, and we can now ask a question that made no sense before:

P[S|+] = ?

Using the Bayes formula, we obtain

P[S|+] = P[+|S] P[S] / (P[+|S] P[S] + P[+|H] P[H]) = (0.9 × 0.01) / (0.9 × 0.01 + 0.03 × 0.99) ≈ 0.23.

The additional information on the relative probabilities of the different values of the parameter (the prior distribution) leads to a completely different conclusion - you are probably not sick.
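The computation above can be sketched in a few lines of Python; the variable names are my own, and the numbers are exactly those from Problem 1:

```python
# Bayes-formula computation for Problem 1 (numbers from the text).
p_pos_given_sick = 0.90     # P[+|S]
p_pos_given_healthy = 0.03  # P[+|H], the false-positive rate
p_sick = 0.01               # prior P[S]
p_healthy = 0.99            # prior P[H]

# P[S|+] = P[+|S] P[S] / (P[+|S] P[S] + P[+|H] P[H])
numerator = p_pos_given_sick * p_sick
evidence = numerator + p_pos_given_healthy * p_healthy
p_sick_given_pos = numerator / evidence
print(round(p_sick_given_pos, 2))  # ≈ 0.23
```

Note that the denominator (the "evidence" P[+]) is dominated by the false positives from the large healthy population, which is why the posterior probability of being sick stays low.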
[XKCD comic]

Priors and posteriors

In addition to the parameter space (containing all possible θ) and the likelihood function L(y1, ..., yn|θ), the Bayesian approach requires a prior distribution g on the θs. The inference is made using the posterior distribution

g*(θ|y1, ..., yn) = L(y1, ..., yn|θ) × g(θ) / f(y1, ..., yn),

where

f(y1, ..., yn) = ∫ L(y1, ..., yn|θ) g(θ) dθ (in the continuous case).

The posterior Bayes estimator θ̂B for θ is

θ̂B = ∫ θ × g*(θ|y1, ..., yn) dθ = E[θ|Y1 = y1, ..., Yn = yn].

An interval (a, b) is called a (Bayesian) (1 − α)-credible interval if

∫ₐᵇ g*(θ|y1, ..., yn) dθ = 1 − α.

Conjugate priors

Certain pairs of prior distribution and likelihood function - said to be conjugate - go especially well together, in the sense that the resulting posterior belongs to the same parametric family of distributions, but with different parameters. Here are some examples:

prior       likelihood   posterior
B(α, β)     Bernoulli    B(α + ∑i yi, β + n − ∑i yi)
Γ(α, β)     Poisson      Γ(α + ∑i yi, β + n)
N(η, δ²)    N(µ, σo²)    N(..., ...)
many more   ...          ...

Problem 2. Let Y1, ..., Yn be a random sample from the Bernoulli distribution with parameter p.

1. Find the posterior distribution for p if the prior is U(0, 1).
2. What is the posterior Bayes estimator p̂B for p?
3. Construct a Bayesian 90%-credible interval for p when n = 4 and (y1, ..., y4) = (0, 1, 1, 0).

Problem 3. Let Y1, ..., Yn be a random sample from the Normal distribution N(µ, σo²), where σo² is considered known and µ itself has a normal prior distribution N(η, δ²).

1. Find the posterior distribution for µ.
2. What is the posterior Bayes estimator µ̂B for µ?
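The Bernoulli setup of Problem 2 can be checked numerically. This is a sketch of mine, not part of the original notes: it uses the Beta–Bernoulli conjugacy from the table above (the uniform prior U(0, 1) is Beta(1, 1)), and the credible-interval quantiles are found by bisecting the Beta(3, 3) CDF, which for these small integer parameters is a polynomial:

```python
# Problem 2 sketch: Bernoulli data, uniform prior on p.
# U(0,1) = Beta(1,1), so the posterior is Beta(1 + sum(y), 1 + n - sum(y)).

def posterior_params(data, alpha=1.0, beta=1.0):
    """Beta posterior parameters after observing 0/1 Bernoulli data."""
    s = sum(data)
    return alpha + s, beta + len(data) - s

def beta33_cdf(p):
    # CDF of Beta(3, 3): integral of the density 30 t^2 (1-t)^2 over [0, p]
    return 10 * p**3 - 15 * p**4 + 6 * p**5

def beta33_quantile(q, lo=0.0, hi=1.0, tol=1e-10):
    # Invert the (monotone) CDF on [0, 1] by bisection.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta33_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [0, 1, 1, 0]
a, b = posterior_params(data)   # Beta(3, 3)
p_hat = a / (a + b)             # posterior mean = Bayes estimator = 0.5
lower = beta33_quantile(0.05)   # equal-tailed 90%-credible interval
upper = beta33_quantile(0.95)
print(a, b, p_hat, lower, upper)
```

Since the posterior Beta(3, 3) is symmetric about 1/2, the equal-tailed interval is symmetric as well: upper = 1 − lower.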