Recap: Finite-dimensional linear Gaussian statistical inverse problem
• The given data is y0 = M x0 + ε0, where M ∈ R^{m×n}.
• The statistical model of the noise ε is an m-dimensional Gaussian random vector
  distributed according to N(0, Cε), i.e.
      fε(y) = (2π)^{-m/2} det(Cε)^{-1/2} exp( -(1/2) y^T Cε^{-1} y )
  for all y ∈ R^m.
• The statistical model of the unknown is an n-dimensional Gaussian random vector X
  that is independent of ε and distributed according to N(0, CX), i.e.
      fpr(x) = (2π)^{-n/2} det(CX)^{-1/2} exp( -(1/2) x^T CX^{-1} x )
  for all x ∈ R^n.
• The statistical model of the data is Y = M X + ε.
• The solution is the posterior pdf
      fpost(x) = fY(y0|X = x) fpr(x) / ∫_{R^n} fY(y0|X = x) fpr(x) dx
               = c_{y0} exp( -(1/2) (y0 - M x)^T Cε^{-1} (y0 - M x) ) exp( -(1/2) x^T CX^{-1} x ),
  which simplifies to
      fpost(x) = (2π)^{-n/2} det(Cpost)^{-1/2} exp( -(1/2) (x - mpost)^T Cpost^{-1} (x - mpost) ),
  where
      mpost = (M^T Cε^{-1} M + CX^{-1})^{-1} M^T Cε^{-1} y0
  and
      Cpost = (M^T Cε^{-1} M + CX^{-1})^{-1}
  (see the numerical sketch below).
• In more general cases, the unknown and the noise can have non-zero expectations
and the unknown and the noise need not be independent.
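The following minimal sketch (not part of the original notes) shows how mpost and Cpost can be evaluated with standard linear algebra; the matrix M, the covariances and the data y0 are made-up toy values.

import numpy as np

def gaussian_posterior(M, C_eps, C_X, y0):
    """Posterior mean and covariance for Y = M X + eps with independent
    X ~ N(0, C_X) and eps ~ N(0, C_eps)."""
    A = M.T @ np.linalg.solve(C_eps, M) + np.linalg.inv(C_X)  # M^T Ceps^-1 M + C_X^-1
    C_post = np.linalg.inv(A)
    m_post = C_post @ (M.T @ np.linalg.solve(C_eps, y0))
    return m_post, C_post

# Hypothetical toy problem: m = 3 measurements of an n = 2 dimensional unknown.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 2))
C_eps, C_X = 0.1 * np.eye(3), np.eye(2)
y0 = M @ np.array([1.0, -0.5]) + rng.multivariate_normal(np.zeros(3), C_eps)
m_post, C_post = gaussian_posterior(M, C_eps, C_X, y0)
print(m_post, np.diag(C_post))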
5.2.1 Likelihood function fY(y0|X = x)
Consider a statistical inverse problem, where the data Y is an m-dimensional rv and the unknown X is an n-dimensional rv.
Definition 34. Let y0 ∈ Rm be a sample of Y . The function x 7→ fY (y0 |X = x) is called
the likelihood function.
The likelihood function can contain information about
• inaccuracies due to external disturbances (noise),
• inaccuracies of the direct theory.
The case of an independent noise term
Let X and ε be independent random vectors and denote Y = F(X) + ε, where the forward mapping F : R^n → R^m is continuous.
If the random vector ε has a pdf, then the conditional pdf of Y = F(X) + ε given X = x is, by Corollary 5,
      fY(y0|X = x) = f_{ε+F(x)}(y0) = fε(y0 − F(x)).                    (5.6)
Example 38 (CT scan). The unknown X-ray mass absorption coefficient f = f(x′, y′) is approximated by the equation
      f(x′, y′) = Σ_{j=1}^{n} xj φj(x′, y′),   (x′, y′) ∈ R^2,
where x = (x1, ..., xn) ∈ R^n contains the unknowns and the functions φj are fixed. The data can be (coarsely) modeled as a vector y = (y1, ..., ym) whose components are
      yi = ∫_{Ci} f ds + εi = Σ_{j=1}^{n} ( ∫_{Ci} φj ds ) xj + εi = (M x)i + εi,
where i = 1, ..., m and the random vector ε is distributed according to N(0, δI). Then we end up with the statistical inverse problem
      Y = M X + ε.
When X and ε are taken to be statistically independent, the likelihood function is
      fY(y0|X = x) = (2πδ)^{-m/2} exp( -(1/(2δ)) |y0 − M x|^2 ).
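In computations it is usually the logarithm of this likelihood that is evaluated, which avoids underflow for large m. A minimal sketch (the function name is mine, not the notes'):

import numpy as np

def log_likelihood(x, y0, M, delta):
    """log f_Y(y0 | X = x) for Y = M X + eps with eps ~ N(0, delta * I)."""
    m = y0.size
    r = y0 - M @ x
    return -0.5 * m * np.log(2 * np.pi * delta) - 0.5 * r @ r / delta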
Model errors
Next, we allow model errors for the direct theory and the unknown.
Theorem 19. Let Y be an m-dimensional rv, X an n-dimensional rv and U a k-dimensional rv such that the joint pdf f(X,U) is positive and the conditional pdfs fY(y|(X,U) = (x,u)) and fU(u|X = x) are given. Then the conditional pdf
      fY(y|X = x) = ∫_{R^k} fY(y|(X,U) = (x,u)) fU(u|X = x) du
whenever fX(x) > 0.
Proof. We need to determine
      fY(y|X = x) = f(X,Y)(x, y) / fX(x).
By definition, the marginal pdf
      f(X,Y)(x, y) = ∫_{R^k} f(X,Y,U)(x, y, u) du,
where the integrand is determined by Theorem 16. Then
      fY(y|X = x) = ∫_{R^k} [ f(X,Y,U)(x, y, u) / f(X,U)(x, u) ] [ f(X,U)(x, u) / fX(x) ] du,
which gives the claim by the definition of conditional pdfs.
Example 39 (Approximation error). Consider the statistical inverse problem Y = F(X) + ε, where the unknown X and the noise ε are statistically independent. For computational reasons, a high-dimensional X is often approximated by a lower-dimensional rv Xn. Let us take Xn = Pn X, where Pn : R^N → R^N is an orthogonal projection onto some n-dimensional subspace of R^N, where n < N (and also m < N). Then
      F(X) = F(Xn) + (F(X) − F(Xn)) =: F(Xn) + U,
which leads to
      Y = F(X) + ε = F(Xn) + U + ε.
According to Theorem 19, the likelihood function for Xn can be expressed as
      fY(y|Xn = x) = ∫_{R^m} fU(u|Xn = x) fε(y − F(x) − u) du,                    (5.7)
whenever the assumptions of the theorem are fulfilled. In particular, fU(u|Xn = x) needs to be available.
The integral (5.7) is often computationally costly. One approximation is to replace U by a rv Ũ that is similarly distributed but independent from X. When the prior distribution of X is given, then Ũ + ε has a known probability distribution. When this distribution has a pdf, then
      fY(y|Xn = x) = f_{ε+Ũ}(y − F(x)).
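In practice the distribution of Ũ + ε is often itself approximated by a Gaussian whose mean and covariance are estimated from prior samples. The sketch below follows that idea under stated assumptions (Gaussian noise covariance C_eps, a user-supplied prior sampler, forward map F and projection P_n); it is an illustration, not the notes' prescription.

import numpy as np

def gaussian_error_model(sample_prior, F, P_n, C_eps, n_samples=1000, rng=None):
    """Estimate mean and covariance of U~ + eps, where U~ = F(X) - F(P_n X),
    from prior samples, assuming a Gaussian approximation of U~ + eps."""
    rng = rng or np.random.default_rng()
    us = np.array([F(x) - F(P_n @ x)
                   for x in (sample_prior(rng) for _ in range(n_samples))])
    mean_u = us.mean(axis=0)
    C_u = np.cov(us, rowvar=False)
    # The likelihood is then approximately N(y; F(x) + mean_u, C_u + C_eps).
    return mean_u, C_u + C_eps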
Example 40 (Inaccuracies of the forward model). Let the forward model F : R^n → R^m be a linear mapping whose matrix M = M_σ depends continuously on σ ∈ R, where the value of σ is not precisely known. For example, in image enhancing (Chapter 1.2) the blurring map
      m̃_{kl} = C_{kl} Σ_{i,j=1}^{n} exp( -( |k−i|^2/n^2 + |l−j|^2/n^2 ) / (2σ^2) ) m_{ij}
contains such a parameter. Then we may model the inaccuracies of σ with a probability distribution. Say σ, X and ε are statistically independent and fσ(s) is the pdf of σ. Then
      Y = M_σ X + ε = G(σ, X, ε)
is a random vector, since
      G : R × R^n × R^m ∋ (s, x, z) ↦ M_s x + z
is continuous. By Theorem 17,
      fY(y|(X, σ) = (x, s)) = f_{G(s,x,ε)}(y) = fε(y − M_s x).
Under the assumptions of Theorem 19, we have
      fY(y|X = x) = ∫_R fε(y − M_s x) fσ(s) ds.
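Because σ is one-dimensional here, the marginalized likelihood can be approximated with ordinary numerical quadrature. A minimal sketch assuming Gaussian noise N(0, δI); the function names and integration limits are placeholders.

import numpy as np
from scipy.integrate import quad

def marginal_likelihood(y, x, M_of_s, f_sigma, delta, s_lo, s_hi):
    """Approximate f_Y(y|X = x) = ∫ f_eps(y - M_s x) f_sigma(s) ds by quadrature;
    M_of_s(s) returns the forward matrix for the parameter value s."""
    m = y.size
    def integrand(s):
        r = y - M_of_s(s) @ x
        f_eps = (2 * np.pi * delta) ** (-m / 2) * np.exp(-0.5 * r @ r / delta)
        return f_eps * f_sigma(s)
    value, _ = quad(integrand, s_lo, s_hi)
    return value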
5.2.2 The prior pdf fpr(x)
The prior pdf represents the information that we have about the unknown and also describes our perception of the lack of information.
Assume that x ∈ R^n corresponds to values of some unknown function g at fixed points of [0, 1] × [0, 1], say
      xi = g(ti),
where ti ∈ [0, 1] × [0, 1] for i = 1, ..., n.
Possible prior information (about the function g, and the corresponding information about the vector x of its values):
• Some values of g are known exactly or inexactly ↔ some components of x are known exactly or inexactly.
• Smoothness of g ↔ behavior of the neighboring components in x.
• The image of g is known (e.g. g ≥ 0, or monotonicity) ↔ the subset where x belongs is known (e.g. xi ≥ 0, xi ≥ xi+1).
• Symmetry of g ↔ symmetry of x.
• Other restrictions for g (e.g. if g : R^3 → R^3 is a magnetic field, then ∇ · g ≡ 0) ↔ restrictions for x, equations G(x) = 0.
Possible statistical models (information about the unknown vector x ∈ R^n and the corresponding statistical model X : Ω → R^n):
• Some components of x are known exactly or inexactly ↔ Xi = mi + Zi, where the rv Zi represents the inaccuracy of mi.
• The vectors that span x are known, x = Σ_{i=1}^{n′} ai ei with n′ ≤ n ↔ X = Σ_{i=1}^{n′} Zi ei, where Zi models the uncertainty of the coefficients.
• The behavior of neighboring components in x ↔ statistical dependencies between components of X; the joint distribution of X.
• The subset containing x, e.g. xi ≥ 0 ↔ e.g. P(∩i {Xi ≥ 0}) = 1.
5.3 Different prior pdfs
Let X : Ω → R^n be a random vector that models the unknown and let fpr : R^n → [0, ∞) denote its pdf. Next, we meet some pdfs that can often be used as fpr.
Uniform distribution
Let B ⊂ R^n be a closed and bounded hyper-rectangle
      B = {x ∈ R^n : ai ≤ xi ≤ bi, i = 1, ..., n},
where ai < bi for i = 1, ..., n.
The random vector X is uniformly distributed on B if
      fpr(x) = (1/|B|) 1_B(x),
where the number |C| := ∫_C dx.
• The unknown belongs to the set B, i.e. the i-th component belongs to the interval [ai, bi].
• Reflects almost perfect uncertainty about the values of the unknown: they only belong to B.
• The set B needs to be bounded in order for fpr to be a proper pdf.
• The posterior pdf
      fpost(x) = fY(y0|X = x) 1_B(x) / ( fY(y0) |B| )
  is the renormalized and restricted likelihood.
ℓ1-prior
Define the ℓ1-norm by
      ||x||_1 = Σ_{i=1}^{n} |xi|
for all x ∈ R^n.
A random vector X has an ℓ1-prior if
      fpr(x) = (α/2)^n exp( -α ||x||_1 ).
• The components Xi are statistically independent.
• The pdf fXi is symmetric w.r.t. the origin and the expectation is zero.
• The parameter α reflects our certainty about whether the unknown attains large values (the larger α, the less probable large values are).
5.3.1 ℓ2-prior
A random vector X has an ℓ2-prior if
      fpr(x) = (α/π)^{n/2} exp( -α |x|^2 ).
• The components of X are independent and normally distributed.
Figure 5.5: Pdf of the 1-dimensional ℓ1-prior for α = 0.5, 1, 2.
Figure 5.6: Pdf of the 1-dimensional ℓ2-prior for α = 0.5, 1, 2.
Cauchy prior
A random vector X has a Cauchy prior if
      fpr(x) = (α/π)^n Π_{i=1}^{n} 1/(1 + α^2 xi^2),
where x ∈ R^n.
• The components Xi are independent.
• The pdf fXi is symmetric w.r.t. the origin.
• No expectation exists (large tail probabilities).
• Reflects best a situation where some of the components of the unknown can attain large values.
Figure 5.7: Pdf of the 1-dimensional Cauchy prior for α = 0.5, 1, 2.
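The one-dimensional densities plotted in Figures 5.5-5.7 are straightforward to evaluate. A minimal sketch of the three pdfs with the normalizations given above; the Riemann-sum check is only an illustration.

import numpy as np

def l1_pdf(t, alpha):      # (alpha/2) * exp(-alpha * |t|)
    return 0.5 * alpha * np.exp(-alpha * np.abs(t))

def l2_pdf(t, alpha):      # sqrt(alpha/pi) * exp(-alpha * t^2)
    return np.sqrt(alpha / np.pi) * np.exp(-alpha * t**2)

def cauchy_pdf(t, alpha):  # (alpha/pi) / (1 + alpha^2 * t^2)
    return alpha / (np.pi * (1.0 + (alpha * t) ** 2))

t = np.linspace(-30, 30, 6001)
for pdf in (l1_pdf, l2_pdf, cauchy_pdf):
    mass = pdf(t, 1.0).sum() * (t[1] - t[0])   # each pdf integrates to about one
    print(pdf.__name__, round(mass, 3))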
Discrete Markov fields
Let the unknown represent the values of some n′-variable function f : R^{n′} → R at points ti ∈ R^{n′}, i = 1, ..., n.
The neighborhoods Ni ⊂ {1, ..., n} of the indices i ∈ {1, ..., n} are sets such that
1. i ∉ Ni,
2. i ∈ Nj if and only if j ∈ Ni.
Figure 5.8: Pdf of N(0, 1), pdf of the 1D Cauchy prior and pdf of the 1D ℓ1-prior.
Definition 35. A random vector X is a discrete Markov field with respect to the neighborhood system Ni, i = 1, ..., n, if
      fXi(x | (X1, ..., Xi−1, Xi+1, ..., Xn) = (x1, ..., xi−1, xi+1, ..., xn)) = fXi(x | Xk = xk ∀ k ∈ Ni).
The components Xi of a discrete Markov field depend only on the neighboring components Xk, k ∈ Ni.
Theorem 20 (Hammersley-Clifford). Let the rv X : Ω → R^n be a discrete Markov field with respect to the neighborhood system Ni, i = 1, ..., n. If X has a pdf fX > 0, then
      fX(x) = c exp( -Σ_{i=1}^{n} Vi(x) ),
where Vi : R^n → R depends only on xi and its neighboring components xk, k ∈ Ni.
Example 41 (Total variation prior). Let the rv X model an image consisting of N × N pixels so that the corresponding matrix is organised as an n = N^2-dimensional vector. The rv X : Ω → R^n is distributed according to the total variation prior if
      fpr(x) = c exp( -Σ_{j=1}^{n} Vj(x) ),
where
      Vj(x) = α Σ_{i∈Nj} lij |xi − xj|
and the neighborhood Nj of the index j consists only of the indices of those pixels i that share an edge with the pixel j. Moreover, the number lij is the length of the common edge between pixels i and j.
• The total variation Σ_{j=1}^{n} (1/2) Σ_{i∈Nj} lij |xi − xj| is small if the difference between the color value xi of a pixel and the corresponding values of its neighboring pixels is small, except possibly for those pixel sets whose borders have very short length.
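For an N × N image with square pixels of equal size, every shared edge has the same length, so (taking lij = 1) the double sum Σ_j Σ_{i∈Nj} lij |xi − xj| counts each horizontal and vertical neighbor difference twice. A minimal sketch under that assumption:

import numpy as np

def tv_energy(img, alpha=1.0):
    """Sum_j V_j(x) with l_ij = 1: each shared edge appears twice in the
    double sum, i.e. 2 * (horizontal + vertical absolute differences)."""
    dh = np.abs(np.diff(img, axis=1)).sum()
    dv = np.abs(np.diff(img, axis=0)).sum()
    return alpha * 2.0 * (dh + dv)

def tv_log_prior(img, alpha=1.0):
    """Unnormalized log of f_pr(x) = c * exp(-sum_j V_j(x))."""
    return -tv_energy(img, alpha)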
Example 42 (1D Gaussian smoothness priors). Let X be a rv that corresponds to the values of an unknown function g at points ti ∈ [0, 1], i = 1, ..., n, where 0 = t0 < t1 < · · · < tn < 1 are equidistant points and g(t) = 0 for t ≤ 0.
Fix the prior pdf of X as
      fpr(x) = c exp( -α ( x1^2 + Σ_{i=2}^{n} (xi − xi−1)^2 ) ).
• The boundary component is forced to zero, i.e. X0 = g(0) ≡ 0.
• If α is large, then the neighboring components of X are more likely to be close to each other.
• A random walk model.
Similarly, also higher differences can be used. For example,
      fpr(x) = c exp( -(1/(2a^4)) ( x1^2 + (x2 − 2x1)^2 + Σ_{i=3}^{n} (xi − 2xi−1 + xi−2)^2 ) )
corresponds to the second differences.
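The exponent α(x1^2 + Σ(xi − xi−1)^2) is the quadratic form α|Lx|^2, where L is the first-difference matrix with the boundary row included, so the prior is the Gaussian N(0, (2αL^T L)^{-1}). The sketch below builds L and draws prior samples; it is an illustration of the random walk model, with my own function names.

import numpy as np

def first_difference_matrix(n):
    """L with (L x)_1 = x_1 and (L x)_i = x_i - x_{i-1} for i = 2, ..., n."""
    return np.eye(n) - np.diag(np.ones(n - 1), k=-1)

def sample_smoothness_prior(n, alpha, size=1, rng=None):
    """Samples from f_pr(x) ∝ exp(-alpha (x_1^2 + sum (x_i - x_{i-1})^2)),
    i.e. N(0, C) with precision C^{-1} = 2 alpha L^T L."""
    rng = rng or np.random.default_rng()
    L = first_difference_matrix(n)
    # If L x = w with w ~ N(0, I / (2 alpha)), then x has the desired law.
    w = rng.standard_normal((size, n)) / np.sqrt(2 * alpha)
    return np.linalg.solve(L, w.T).T

paths = sample_smoothness_prior(n=100, alpha=50.0, size=3)  # three prior draws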
Example 43 (2D Gaussian smoothness priors). Let g : [0, 1]^2 → R be a continuous function such that g = 0 outside [0, 1]^2. Let X be the rv corresponding to the values of g(t, s) at the points
      { ti ∈ [0, 1] × [0, 1] : i = 1, ..., n^2 } = { (k/n, j/n) : k, j = 1, ..., n }.
Set
      fpr(x) = c exp( -α Σ_j Vj(x) ),
where
      Vj = | 4 xj − Σ_{i∈Nj} xi |^2
and Nj contains only the indices of the points ti that are next to the point tj (above it, below it, to its left or to its right).
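On the grid, Vj = |4xj − Σ_{i∈Nj} xi|^2 is the squared discrete Laplacian at the point tj; since g = 0 outside [0, 1]^2, neighbors falling outside the grid contribute zero. A minimal sketch of evaluating Σ_j Vj for an n × n array (my own indexing convention):

import numpy as np

def smoothness_energy_2d(x):
    """sum_j |4 x_j - sum of the four nearest neighbours|^2 for an n-by-n array;
    neighbours outside the grid contribute zero (g = 0 outside [0,1]^2)."""
    p = np.pad(x, 1)  # zero padding around the grid
    lap = 4 * p[1:-1, 1:-1] - (p[:-2, 1:-1] + p[2:, 1:-1]
                               + p[1:-1, :-2] + p[1:-1, 2:])
    return np.sum(lap ** 2)

def log_prior_2d(x, alpha=1.0):
    """Unnormalized log of f_pr(x) = c * exp(-alpha * sum_j V_j(x))."""
    return -alpha * smoothness_energy_2d(x)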
Positivity constraint
If we know that the unknown has non-negative components, then we may restrict and renormalize the pdf:
      fpr(x) = c f+(x) fX(x),
where
      f+(x) = 1 if xi ≥ 0 for all i = 1, ..., n, and f+(x) = 0 otherwise.
Hierarchical priors
When the unknown is modeled as a random vector whose pdf depends continuously on a parameter σ ∈ R^{n′}, it is possible to model the uncertainty of the parameter σ by attaching a pdf to it.
Let X : Ω → R^n be the rv that models the unknown and let the pdf of X be fX. Let σ : Ω → R^{n′} be a rv that models the unknown parameter and let its pdf be fσ. Assume that we have the conditional pdf of X given σ = s, that is,
      x ↦ fX(x|σ = s) = fXs(x)
is known for all s ∈ R^{n′}. When the product fXs(x) fσ(s) is integrable, we have the joint distribution
      f(X,σ)(x, s) = fXs(x) fσ(s).
Option 1) The unknown is modeled as a rv X with pdf
      fpr(x) = ∫ fXs(x) fσ(s) ds1 · · · ds_{n′}
(whenever the marginal exists). The corresponding posterior pdf is
      fpost(x) = c fY(y|X = x) fpr(x)
whenever fY(y) > 0.
Option 2) Also the hyperparameter σ is taken to be part of the unknown, and as a prior pdf we set the joint pdf
      fpr(x, s) = fXs(x) fσ(s),
which implies that the posterior pdf is
      fpost(x, s) = c fY(y|(X, σ) = (x, s)) fpr(x, s) = c fY(y|X = x) fpr(x, s)
whenever fY(y) > 0 (note that the likelihood function does not depend on s but only on x).
In options 1 and 2 the prior pdf is called a hierarchical prior, the parameter σ : Ω → R^{n′} is called a hyperparameter and its distribution a hyperprior.
Example 44. Let X : Ω → R^3 be a rv that models the unknown and has pdf
      fpr(x; s) = ( √s / (2π)^{3/2} ) exp( -(1/2) x1^2 − (s/2) (x2 − x1)^2 − (1/2) (x3 − x2)^2 ),
where s ∈ R is an unknown parameter. We model this parameter as a random variable σ : Ω → R and denote
      fX(x|σ = s) = fpr(x; s).
As the hyperprior, we set
      fσ(s) = λ f+(s) e^{−λs},
where λ > 0, and f+(s) = 1 for s > 0 and 0 otherwise. Then
      f(X,σ)(x, s) = ( √s λ / (2π)^{3/2} ) f+(s) exp( -(1/2) x1^2 − (s/2) (x2 − x1)^2 − (1/2) (x3 − x2)^2 ) e^{−λs}
and
      fX(x) = ( λ / (2π)^{3/2} ) exp( -(1/2) x1^2 − (1/2) (x3 − x2)^2 ) ∫_0^∞ √s exp( -(s/2) (x2 − x1)^2 − λs ) ds
            = ( λ / (2π)^{3/2} ) exp( -(1/2) x1^2 − (1/2) (x3 − x2)^2 ) ∫_0^∞ s^{1/2} exp( -s ( (1/2) (x2 − x1)^2 + λ ) ) ds
            = ( λ / (2π)^{3/2} ) exp( -(1/2) x1^2 − (1/2) (x3 − x2)^2 ) ( (1/2) (x2 − x1)^2 + λ )^{-3/2} ∫_0^∞ s^{1/2} e^{−s} ds
            = ( λ exp( -(1/2) x1^2 − (1/2) (x3 − x2)^2 ) / (2π)^{3/2} ) Γ(3/2) ( (1/2) (x2 − x1)^2 + λ )^{-3/2}
            = λ exp( -(1/2) x1^2 − (1/2) (x3 − x2)^2 ) / ( 2π ( (x2 − x1)^2 + 2λ )^{3/2} ).
The value of the Gamma function is Γ(3/2) = √π / 2.
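The closed form above can be sanity-checked numerically by integrating f(X,σ)(x, s) over s and comparing with the final expression. A minimal sketch using one-dimensional quadrature; the test point x and the value of λ are arbitrary.

import numpy as np
from scipy.integrate import quad

def f_joint(x, s, lam):
    """f_(X,sigma)(x, s) of Example 44 for s > 0."""
    x1, x2, x3 = x
    g = np.exp(-0.5 * x1**2 - 0.5 * s * (x2 - x1)**2 - 0.5 * (x3 - x2)**2)
    return np.sqrt(s) * lam * g * np.exp(-lam * s) / (2 * np.pi) ** 1.5

def f_X_closed_form(x, lam):
    """lam * exp(-x1^2/2 - (x3-x2)^2/2) / (2 pi ((x2-x1)^2 + 2 lam)^{3/2})."""
    x1, x2, x3 = x
    return lam * np.exp(-0.5 * x1**2 - 0.5 * (x3 - x2)**2) / (
        2 * np.pi * ((x2 - x1)**2 + 2 * lam) ** 1.5)

x, lam = (0.3, -0.2, 1.1), 1.0
numeric, _ = quad(lambda s: f_joint(x, s, lam), 0.0, np.inf)
print(numeric, f_X_closed_form(x, lam))   # the two values should agree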
Figure 5.9: Pdf f(x) = λ / (x^2 + 2λ)^{3/2} for λ = 0.3, 1, 2.
• The differences between the components of X are independent.
• The difference X2 − X1 has a Cauchy-type distribution (a transformed Beta distribution), which gives a slightly lower probability to the occurrence of very high values.
• Uncertainty about the variance of X2 − X1 produced a distribution that allows large values with higher probability than the Gaussian distribution.
Figure 5.10: Cauchy prior and the pdf f(x) = λ / (x^2 + 2λ)^{3/2}.
5.4 Studying the posterior distribution
5.4.1 Decision theory
Let the pdfs f(X,Y), fX > 0 and fY > 0 exist and be continuous. Denote
      fpost(x; y) = fX(x|Y = y)
when y ∈ R^m.
The multidimensional function fpost(x; y) can be very hard to visualize properly. Can we extract some information about the unknown on the basis of the posterior pdf? We turn our attention to the field of statistics that is called decision theory.
Decision theory answers the question: what function h : R^m → R^n is such that the vector h(y) most resembles (in some sense) the unknown x that has produced the observation y = F(x) + ε?
In statistics, the function h is called an estimator and the value h(y) an estimate.
Let us fix in what sense the estimator is best. We first fix a loss function
      L : R^n × R^n → [0, ∞)
that measures the accuracy of the estimate h(y) when the unknown is x as L(x, h(y)) (low values of L mean accurate estimates). For example, we can take L(x, h(y)) = |x − h(y)|^2.
Assume that L is fixed and x ↦ L(x, z) fpost(x) is integrable for all z ∈ R^n.
If y ∈ R^m, then the value h(y) ∈ R^n of the estimator h is chosen so that it minimizes the posterior expectation
      ∫_{R^n} L(x, h(y)) fpost(x; y) dx,
i.e.
      h(y) = argmin_{z∈R^n} ∫_{R^n} L(x, z) fpost(x; y) dx.
When the data is y, we look for the h(y) that gives the smallest possible posterior expectation.
The number
      r(h) = ∫_{R^m} ( ∫_{R^n} L(x, h(y)) fpost(x; y) dx ) fY(y) dy
is called the Bayes risk. An application of the Fubini theorem leads to
      r(h) = ∫_{R^n} ( ∫_{R^m} L(x, h(y)) fY(y|X = x) dy ) fpr(x) dx.
The interpretation of the Bayes risk is that when the unknown is X and the noisy data is Y, then the Bayes risk r(h) of the estimator h is the expected loss with respect to the joint distribution of X and Y, i.e. r(h) = E[L(X, h(Y))].
Example 45 (CM estimate). Take L(x, z) = |x − z|^2 as the loss function. Let mpost(y) denote the posterior expectation
      mpost(y) = ∫_{R^n} x fpost(x; y) dx
and Cpost(y) the posterior covariance matrix
      (Cpost(y))_{ij} = ∫_{R^n} (xi − (mpost(y))i)(xj − (mpost(y))j) fpost(x; y) dx.
Then
      ∫_{R^n} L(x, h(y)) fpost(x; y) dx
      = ∫_{R^n} |x − h(y)|^2 fpost(x; y) dx
      = ∫_{R^n} |x − mpost(y) + mpost(y) − h(y)|^2 fpost(x; y) dx
      = ∫_{R^n} ( |x − mpost(y)|^2 + 2 Σ_{i=1}^{n} (x − mpost(y))_i (mpost(y) − h(y))_i + |mpost(y) − h(y)|^2 ) fpost(x; y) dx
      = ∫_{R^n} |x − mpost(y)|^2 fpost(x; y) dx
        + 2 Σ_{i=1}^{n} (mpost(y) − h(y))_i ∫_{R^n} (x − mpost(y))_i fpost(x; y) dx
        + |mpost(y) − h(y)|^2 ∫_{R^n} fpost(x; y) dx
      = ∫_{R^n} |x − mpost(y)|^2 fpost(x; y) dx + |mpost(y) − h(y)|^2.
The minimum loss is attained when |mpost(y) − h(y)|^2 = 0, i.e. h(y) = mpost(y), so that
      ∫_{R^n} L(x, h(y)) fpost(x; y) dx = Σ_{i=1}^{n} (Cpost(y))_{ii}.
In other words, the expectation of the loss function is the sum of the diagonal elements of the posterior covariance matrix, i.e. its trace.
The posterior expectation is often denoted by x̂CM (CM = conditional mean).
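When the posterior is only available through samples (e.g. from MCMC), x̂CM and Cpost(y) are approximated by sample averages and the expected loss by the trace of the sample covariance. A minimal sketch; the Gaussian "posterior" used for the check is a made-up example.

import numpy as np

def cm_estimate(samples):
    """Monte Carlo approximations of the CM estimate, the posterior covariance
    and the expected squared-error loss (trace of the covariance)."""
    x_cm = samples.mean(axis=0)
    C_post = np.atleast_2d(np.cov(samples, rowvar=False))
    return x_cm, C_post, np.trace(C_post)

# Hypothetical check against a known Gaussian posterior:
rng = np.random.default_rng(1)
m_post = np.array([1.0, -0.5])
C_post = np.array([[0.2, 0.05], [0.05, 0.1]])
samples = rng.multivariate_normal(m_post, C_post, size=20000)
print(cm_estimate(samples)[0])   # should be close to m_post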
Example 46 (MAP estimate). We say that a pdf is unimodal if its global maximum is attained at only one point. Let δ > 0 and Lδ(x, z) = 1_{B̄(z,δ)^C}(x) when x, z ∈ R^n. Let x ↦ fpost(x; y) be unimodal for the given data y ∈ R^m. The limit of the estimate
      hδ(y) = argmin_{z∈R^n} ∫_{R^n} 1_{B̄(z,δ)^C}(x) fpost(x; y) dx
            = argmin_{z∈R^n} ∫_{R^n \ B̄(z,δ)} fpost(x; y) dx
is
      lim_{δ→0+} hδ(y) = x̂MAP(y),
where
      x̂MAP(y) = argmax_{x∈R^n} fpost(x; y).
The maximum a posteriori estimate x̂MAP(y) is useful when expectations are hard to obtain. It can also be written as
      x̂MAP(y) = argmax_{x∈R^n} fY(y|X = x) fpr(x).
The MAP estimate is often used also in situations where the posterior pdf is not unimodal, in which case the estimate is not unique.
In addition to the estimates x̂ we can also determine their componentwise Bayesian confidence intervals by choosing a in such a way that
      Ppost(|Xi − x̂i| ≤ a) = 1 − α,
where e.g. α = 0.05.
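In the linear Gaussian case of the recap, maximizing fY(y0|X = x) fpr(x) is the same as minimizing the Tikhonov-type functional (y0 − Mx)^T Cε^{-1} (y0 − Mx) + x^T CX^{-1} x, so x̂MAP coincides with mpost (and hence with x̂CM). The sketch below computes the MAP estimate both by numerical optimization and from the closed form; the matrices and data are toy values.

import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(x, M, C_eps_inv, C_X_inv, y0):
    """Tikhonov-type functional: -2 log fpost(x) up to an additive constant."""
    r = y0 - M @ x
    return r @ C_eps_inv @ r + x @ C_X_inv @ x

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 3))
C_eps_inv, C_X_inv = np.eye(4) / 0.1, np.eye(3)
y0 = M @ np.array([1.0, 0.0, -1.0]) + 0.3 * rng.standard_normal(4)

x_map = minimize(neg_log_posterior, np.zeros(3),
                 args=(M, C_eps_inv, C_X_inv, y0)).x      # MAP by optimization
m_post = np.linalg.solve(M.T @ C_eps_inv @ M + C_X_inv,
                         M.T @ C_eps_inv @ y0)            # closed-form m_post
print(x_map, m_post)   # the two estimates should agree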
5.5 Recap
• About probability theory
  – The conditional pdf of a random vector X given Y = y (with marginal pdf fY(y) > 0) is
        fX(x|Y = y) = f(X,Y)(x, y) / fY(y).
  – Bayes' formula
        fX(x|Y = y) fY(y) = f(X,Y)(x, y) = fY(y|X = x) fX(x)
    holds for continuous pdfs (in case of discontinuities, only up to versions).
• Statistical inverse problem
– The unknown and the data are modeled as random vectors X and Y .
– The probability distributions of X and Y represent quantitative and qualitative information about X and Y, as well as the lack of such information.
– The given data y0 is a sample of Y i.e. y0 = Y (ω0 ) for some elementary event
ω0 ∈ Ω.
– The solution of a statistical inverse problem is the conditional pdf of X given
Y = y0 (with fY (y0 ) > 0)
• Posterior pdf
– consists of the (normalized) product of the likelihood function x ↦ fY(y0|X = x) and the prior pdf x ↦ fpr(x).
– can be used in determining estimates and confidence intervals for the unknown.
• Typical priors include Gaussian priors (especially smoothness priors), `1 -prior, Cauchy
prior and total variation prior (e.g. for 2D images).
Please learn:
• definitions of prior and posterior pdf
• how to define the posterior pdf (up to the normalizing constant) when the unknown and the noise are statistically independent and the needed pdfs are continuous.
• how to write the expressions for the posterior pdf, its mean and covariance, in the
linear Gaussian case.
• how to explain the connection between Tikhonov regularisation and Gaussian linear
inverse problems
• how to form the hierarchical prior pdf when the conditional pdf and the hyperprior
are given
• definition of CM-estimate as the conditional mean
• definition of MAP-estimate as a maximizer of the posterior pdf