Frequentist statistics - A different point of view
Problem 1. The probability of a certain medical diagnostic test
being positive is 90% if the patient is sick. A false positive happens
3% of the time. You get a positive result. Does that mean that you
are sick?
Our approach so far: there is an unknown parameter θ taking values in the two-element set {S, H} (Sick and Healthy). A single observation Y, with values in {+, −}, is taken, and its distribution depends on θ in the following way:
P[+|S] = 0.9,  P[−|S] = 0.1,
P[+|H] = 0.03, P[−|H] = 0.97.
Given Y = +, we could, for example, test the null hypothesis that θ = H. The p-value (for any reasonable test) is
p = P[+|H] = 0.03,
and we would reject the null θ = H.
Frequentist statistics
The standard (frequentist) reasoning behind the decision to reject in the test is the following: if we repeated the test on 1,000 healthy individuals, we would not observe significantly more than 30 positive results. Therefore, what we observed (a +) is very unlikely to occur by chance alone in a population of healthy individuals.
The subtle point is that we need to be able to repeat the experiment many times (hence the term frequentist) from the same distribution, i.e., the same value of the parameter.
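To make the repeated-sampling picture concrete, one can simulate such a batch of healthy individuals; a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(0)

# One frequentist "repetition": test 1,000 healthy individuals, each of
# whom tests positive with probability P[+|H] = 0.03.
positives = rng.binomial(n=1000, p=0.03)
print("positives among 1,000 healthy individuals:", positives)
# The expected count is 1000 * 0.03 = 30, so a single + carries
# probability 0.03 under the null theta = H.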
A different point of view - Bayesian statistics
The problem with the frequentist approach in this particular setting
is that we cannot sample from the healthy individuals only, but one
could sample from the overall population where, e.g.,
P[H] = 0.99,
P[S] = 0.01.
This introduces a probability on the parameter space, and we can now ask a question that made no sense before:
P[S|+] = ?
Using the Bayes formula, we obtain
P[S|+] = P[+|S] P[S] / (P[+|S] P[S] + P[+|H] P[H]) = (0.9 × 0.01) / (0.9 × 0.01 + 0.03 × 0.99) ≈ 0.23.
The additional information about the relative probabilities of different values of the parameter (the prior distribution) leads to a completely different conclusion - you are probably not sick.
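The computation is easy to check numerically; a minimal Python sketch using the probabilities given above:

# Probabilities from the problem statement.
p_S, p_H = 0.01, 0.99              # prior: P[S], P[H]
p_pos_S, p_pos_H = 0.90, 0.03      # likelihood: P[+|S], P[+|H]

# Bayes formula for P[S|+].
p_S_pos = (p_pos_S * p_S) / (p_pos_S * p_S + p_pos_H * p_H)
print(f"P[S|+] = {p_S_pos:.4f}")   # 0.2326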
Priors and posteriors
In addition to the parameter space (containing all possible θ) and the likelihood function L(y1, . . . , yn |θ), the Bayesian approach requires a prior distribution g on the θs. The inference is made using the posterior distribution
g∗(θ|y1, . . . , yn) = L(y1, . . . , yn |θ) × g(θ) / f(y1, . . . , yn),
where f(y1, . . . , yn) = ∫ L(y1, . . . , yn |θ) g(θ) dθ (in the continuous case).
The posterior Bayes estimator θ̂B for θ is
θ̂B = ∫ θ × g∗(θ|y1, . . . , yn) dθ = E[θ|Y1 = y1, . . . , Yn = yn].
An interval (a, b) is called a (Bayesian) (1 − α)-credible interval if
∫_a^b g∗(θ|y1, . . . , yn) dθ = 1 − α.
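When the posterior has no convenient closed form, all three objects (the posterior, the Bayes estimator, and a credible interval) can be approximated on a grid. A minimal NumPy sketch, using made-up Bernoulli data purely for illustration:

import numpy as np

# Hypothetical Bernoulli data, for illustration only.
y = np.array([0, 1, 1, 0, 1])
n, s = len(y), int(y.sum())

theta = np.linspace(0.0, 1.0, 10_001)   # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)             # flat prior g(theta) on (0, 1)
likelihood = theta**s * (1.0 - theta)**(n - s)

# Posterior g*(theta|y) = L(y|theta) g(theta) / f(y), with the
# normalizing constant f(y) approximated by a Riemann sum.
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)

# Posterior Bayes estimator: E[theta | data].
theta_hat = (theta * posterior).sum() * dtheta

# Equal-tailed 90%-credible interval read off the posterior CDF.
cdf = np.cumsum(posterior) * dtheta
a = theta[np.searchsorted(cdf, 0.05)]
b = theta[np.searchsorted(cdf, 0.95)]
print(f"theta_hat = {theta_hat:.3f}, 90% credible: ({a:.3f}, {b:.3f})")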
Conjugate priors
Certain pairs of prior distribution and likelihood function - said to be conjugate - go especially well together, in the sense that the resulting posterior belongs to the same parametric family of distributions, but with different parameters. Here are some examples:

prior        likelihood    posterior
B(α, β)      Bernoulli     B(α + ∑i yi, β + (n − ∑i yi))
Γ(α, β)      Poisson       Γ(α + ∑i yi, β + n)
N(η, δ²)     N(µ, σo²)     N(. . . , . . . )
many more    . . .         . . .
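To see where the first row comes from, multiply the Bernoulli likelihood by the B(α, β) density and keep track of the exponents:

\[
g^*(p \mid y_1,\dots,y_n)
\;\propto\; p^{\sum_i y_i}\,(1-p)^{\,n-\sum_i y_i}
\cdot p^{\alpha-1}(1-p)^{\beta-1}
\;=\; p^{\alpha+\sum_i y_i-1}\,(1-p)^{\beta+(n-\sum_i y_i)-1},
\]

which is the kernel of a B(α + ∑i yi, β + (n − ∑i yi)) density; the other rows follow from the same kind of computation.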
Problem 2. Let Y1, . . . , Yn be a random sample from the Bernoulli distribution with parameter p.
1. Find the posterior distribution for p if the prior is U(0, 1).
2. What is the posterior Bayes estimator p̂B for p?
3. Construct a Bayesian 90%-credible interval for p when n = 4 and (y1, . . . , y4) = (0, 1, 1, 0).
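As a numerical sanity check (a sketch, not a substitute for the derivation): if part 1 yields a Beta posterior, as the conjugacy table suggests once U(0, 1) is recognized as B(1, 1), then scipy.stats gives the estimator and interval directly:

from scipy.stats import beta

# Data from the problem: n = 4, (y1, ..., y4) = (0, 1, 1, 0).
n, s = 4, 2

# U(0, 1) equals B(1, 1), so the conjugacy table suggests a
# B(1 + s, 1 + (n - s)) posterior.
posterior = beta(1 + s, 1 + (n - s))
print("posterior Bayes estimator:", posterior.mean())
print("90% credible interval:", posterior.ppf([0.05, 0.95]))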
Problem 3. Let Y1, . . . , Yn be a random sample from the Normal distribution N(µ, σo²), where σo² is considered known and µ itself has a normal prior distribution N(η, δ²).
1. Find the posterior distribution for µ.
2. What is the posterior Bayes estimator µ̂B for µ?
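For checking your answer: the standard normal-normal conjugate update, which part 1 asks you to derive, gives

\[
\mu \mid y_1,\dots,y_n \;\sim\;
N\!\left(
\frac{\eta/\delta^2 + \sum_i y_i/\sigma_o^2}{1/\delta^2 + n/\sigma_o^2},
\;\left(\frac{1}{\delta^2} + \frac{n}{\sigma_o^2}\right)^{-1}
\right),
\]

so the posterior mean (and hence µ̂B) is a precision-weighted average of the prior mean η and the sample mean.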