MP/BME 574 Lecture 19: Random number generators and Monte Carlo methods
Learning Objectives:
• Random number generators
• Introduction to the Monte Carlo methods
• Iterative methods and "blind" deconvolution
Assignment:
1. Read "An iterative technique for the rectification of observed distributions," by Lucy, available on the website as a PDF.
2. Read Park and Miller, "Random number generators: good ones are hard to find."
I. Random number generators
a. The most common algorithm is Lehmer's algorithm
i. Iterative method:
$z_{n+1} = a\, z_n \bmod m$
$u_{n+1} = z_{n+1}/m$
$z_0 = \text{"seed"}$
$m$ = large prime number; the period has length $m - 1$
(if $m$ is not prime, the algorithm collapses to zero).
b. "Pseudorandom" number generator
i. Using the same seed produces the same series.
ii. Repeats with period $m - 1$.
>> z(1) = 1;            % seed
m = 13;                 % modulus (prime)
a = 6;                  % multiplier
for i = 1:20
    z(i+1) = mod(a*z(i), m);
    % u(i+1) = z(i+1)/m;   % normalized values on (0,1)
end
>> z
z =
     1   6  10   8   9   2  12   7   3   5   4  11   1   6  10   8   9   2  12   7   3
iii. Matlab e.g.
1. “rand()” or just “rand” returns a single random number
2. rand(n) returns an n × n matrix of random numbers
3. The same sequence, of length m - 1, is produced until the "state" is reset, i.e. a new seed is
introduced.
“help rand”
RAND Uniformly distributed random numbers.
RAND(N) is an N-by-N matrix with random entries, chosen from
a uniform distribution on the interval (0.0,1.0).
RAND(M,N) and RAND([M,N]) are M-by-N matrices with random entries.
RAND(M,N,P,…) or RAND([M,N,P,…]) generate random arrays.
RAND with no arguments is a scalar whose value changes each time it
is referenced. RAND(SIZE(A)) is the same size as A.
RAND produces pseudo-random numbers. The sequence of numbers
generated is determined by the state of the generator. Since MATLAB
resets the state at start-up, the sequence of numbers generated will
be the same unless the state is changed.
S = RAND('state') is a 35-element vector containing the current state
of the uniform generator. RAND('state',S) resets the state to S.
RAND('state',0) resets the generator to its initial state.
RAND('state',J), for integer J, resets the generator to its J-th state.
RAND('state',sum(100*clock)) resets it to a different state each time.
This generator can generate all the floating point numbers in the
closed interval [2^(-53), 1-2^(-53)]. Theoretically, it can generate
over 2^1492 values before repeating itself.
c. hist(g, Nbins), where g is a vector containing pseudorandom numbers and Nbins represents the
number of bins to use in grouping counts.
freq = hist(g,Nbins)
>> g = rand([1024, 1]);
>> freq = hist(g,64)
II. Random samples from probability distribution functions
a. “Monte Carlo” methods
b. Gaussian noise example
i. Random (or pseudorandom) number generator should result in uniform distribution.
ii. Implies all outcomes are equally probable
iii. How can we generate random samples of any given pdf?
Figure 1:
$x_i \leftarrow \text{rand}$
$y(x_i) = f(x_i)$
$y_i = c \cdot \text{rand}$
Is $y(x_i) - y_i \in [0, c]$?
yes: return $x_i$; see case "o" in Figure 1
no: reject $x_i$; case "x" in Figure 1.
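The accept/reject test above can be written directly in MATLAB. The following is a minimal sketch, not from the lecture; the target pdf f, the interval [a, b], the bound c (with f(x) <= c on the interval), and the sample count are illustrative assumptions:
% Minimal sketch: accept/reject sampling of f(x) on [a, b], assuming f(x) <= c.
f = @(x) exp(-abs(x));        % example target pdf shape (illustrative)
a = -5; b = 5; c = 1; Nsamples = 1e4;
g = zeros(1, Nsamples); count = 0;
while count < Nsamples
    xi = a + (b - a)*rand;    % candidate x_i, uniform on [a, b]
    yi = c*rand;              % candidate height y_i, uniform on [0, c]
    if yi <= f(xi)            % case "o": under the curve, keep it
        count = count + 1;
        g(count) = xi;
    end                       % case "x": reject and try again
end
hist(g, 64)                   % histogram approximates the shape of f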
c. Samples in each bin of Figure 1 are binomially distributed, with each bin centered at $x_i$ and of width $\Delta x$:
$P_B(x_j, n; p) = \binom{n}{x}\, p^x (1-p)^{n-x}$
$n$ = number of points falling into the interval $\Delta x$
$x$ = number of successes
$p$ = Pr(success) = $\frac{f(x_i)}{c}$
d. The mean and the variance will depend on the number of trials; assuming each bin is equally populated,
$n \approx \frac{N}{N_{bins}}$
$\mu = np = n\,\frac{f(x_i)}{c}$
$\sigma^2 = np(1-p) = n\,\frac{f(x_i)}{c}\left(1 - \frac{f(x_i)}{c}\right)$
$\mathrm{SNR} = \frac{\mu}{\sigma} = \sqrt{\frac{n\, f(x_i)/c}{1 - f(x_i)/c}}$
Therefore, we approach or improve our estimate of the function $f(x_i)$ with root-$n$ dependence.
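This root-n behavior can be checked numerically. The following rough MATLAB sketch is not from the lecture; the success probability p (standing in for f(x_i)/c), the trial sizes, and the number of repeated experiments are illustrative assumptions:
% Rough check of the root-n SNR dependence for one bin.
p = 0.3;                                        % assumed Pr(success) = f(x_i)/c
for n = [100 10000]                             % trials per bin
    counts = sum(rand(n, 2000) < p);            % 2000 repeated binomial experiments
    fprintf('n = %6d: simulated SNR = %6.1f, sqrt(n*p/(1-p)) = %6.1f\n', ...
            n, mean(counts)/std(counts), sqrt(n*p/(1-p)));
end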
% Zero Mean Gaussian Noise generator
% Recall that rand() returns a random number between [0,1]
% Scaling is required
clear g;
mu = 0;
bins = 100;
sigma = 10;
c = 1/(sqrt(2*pi)*sigma).*exp(-(0-mu).^2/(2*sigma^2));   % peak of the Gaussian pdf
c = c + 0.01;                                            % keep c slightly above max f(x)
index = 256*256*100;                                     % number of trials
range = 60;
binwidth = range/bins;
for i = 1:index
    x = range*(rand - 0.5);                              % candidate x, uniform on [-30, 30]
    f_x = 1/(sqrt(2*pi)*sigma).*exp(-(x-mu).^2/(2*sigma^2));
    r_x = f_x/c;
    u = rand;
    if (r_x > u)                                         % accept with probability f(x)/c
        g(i) = x;
    end
end
g = g(find(g));                                          % keep only accepted (nonzero) samples
counts = length(g);
xx = -30:0.01:30;
G = counts.*binwidth.*1/(sqrt(2*pi)*sigma).*exp(-(xx-mu).^2/(2*sigma^2));  % expected bin counts
figure; hist(g,100); hold on
plot(xx,G,'-k','Linewidth',2)
III. Applications of Monte Carlo methods for solving the case where a point response function, p(x), is not known or not easily measured.
a. Generalized Monte Carlo algorithm (a concrete sketch follows this list):
1. Generate a random number
2. “Guess” within some constraints or boundary on the problem
i.e. mapping f(x) to coordinate space (Figure 1).
3. Cost function: “Is the point inside the relevant coordinate space?”
4. If yes, store value
5. Repeat
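To make steps 1-5 concrete, here is a minimal MATLAB sketch, not from the lecture, that estimates the area of the unit circle by guessing points in a bounding square and testing whether each guess falls inside the relevant coordinate space (the number of trials is an arbitrary choice):
% Minimal sketch of the generalized algorithm: estimate the area of the
% unit circle (pi) by random guessing inside the square [-1,1]^2.
Ntrials = 1e5;
hits = 0;
for i = 1:Ntrials
    xy = 2*rand(1,2) - 1;         % 1-2. generate a random guess in the bounding square
    if sum(xy.^2) <= 1            % 3. cost function: is the point inside the circle?
        hits = hits + 1;          % 4. if yes, store (count) it
    end
end                               % 5. repeat
area_estimate = 4*hits/Ntrials    % approaches pi as Ntrials grows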
IV. Blind Deconvolution
a. In the blind deconvolution problem, both f and h need to be estimated simultaneously. If nothing is known about either function, this is not possible.
b. Therefore a strategy is to combine the robust convergence properties of iterative techniques with
a priori assumptions about the form of the data including statistical models of uncertainty in the
measurements.
i. General assumptions about the physical boundary conditions and uncertainty in the data
1. e.g. Non-negativity and compact support.
ii. Statistical models of variation in the measured data:
1. e.g. Poisson or Gaussian distributed.
2. This leads to estimates of expected values for the measured data for Maximum
Likelihood (ML) optimization.
iii. The physical constraints restrict the solution, while the ML approach provides a criterion for evaluating convergence.
c. Maximum Likelihood:
i. Consider a data estimation problem in which the uncertainty in the measured data is assumed to be governed by a Gaussian probability density function (pdf):
$\Pr(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}}$
It is acceptable to evaluate the log-likelihood since log(Pr) is a monotonically increasing
function. Therefore, we maximize the total probability by maximizing:
$\ln \Pr = \sum_i \ln\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}}\right]$.
Therefore, the log likelihood of the measured data is maximized for a model in which $\sum_i \frac{(y_i - \hat{y}_i)^2}{2\sigma^2}$ is minimized.
d. Now consider the iterative ML approach to the blind deconvolution problem:
$\hat{f}_0(n_1, n_2) = g(n_1, n_2)$
$\hat{f}_{k+1}(n_1, n_2) = \hat{f}_k(n_1, n_2) + \lambda\,[\,g(n_1, n_2) - \hat{f}_k(n_1, n_2) * \hat{h}_k(n_1, n_2)\,]$
then at each k,
$\hat{g}_k(n_1, n_2) = \hat{f}_k(n_1, n_2) * \hat{h}_k(n_1, n_2)$, and
$\mathrm{LSE} = \sum_{n_1}\sum_{n_2} \left| g - \hat{g}_k \right|^2$
is minimized and used to optimize the convergence. The conditions for convergence are similar to the iterative procedure when $h(n_1, n_2)$ is known, except that convergence to the inverse filter is no longer guaranteed and is sensitive to noise, the choice of $\lambda$, and the initial guess, $\hat{f}_0(n_1, n_2)$.
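As an illustration, here is a minimal MATLAB sketch, not from the lecture, of this additive update for the simpler case in which the kernel h is taken as known; g, h, and the relaxation parameter lambda are assumed to be defined:
% Minimal sketch of the additive iterative update, with h assumed known.
% g (blurred/noisy image), h (kernel), and lambda are assumed inputs.
f_hat = g;                                    % f_hat_0 = g
for k = 1:50
    g_hat  = conv2(f_hat, h, 'same');         % current model of the measured data
    f_hat  = f_hat + lambda*(g - g_hat);      % f_{k+1} = f_k + lambda*(g - g_hat)
    f_hat  = max(f_hat, 0);                   % enforce non-negativity
    LSE(k) = sum((g(:) - g_hat(:)).^2);       % least-squares error, to monitor convergence
end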
e. Statistical model of the convolution process:
i. Derives from the ML concept applied to a statistical model of the convolution process. In this approach, x is a random variable, and $\hat{h}(x)$ represents an estimate of a probability density function, $h(x)$, that models the missing or unknown data.
ii. Intuitively, $h(x)$ is the superposition of multiple random processes used as probes (i.e. individual photons, or molecules of dye) used to measure the response of the system. The physical system must adhere to mass balance and, for finite counting statistics, non-negativity.
iii. For example, consider:
$\phi(x) = \int \psi(\xi)\, p(x \mid \xi)\, d\xi$,
where $\phi(x)$ is our measured image data, $\psi(\xi)$ is the desired corrected image, and $p(x \mid \xi)$ is a conditional probability density function kernel that relates the expected value of the data to the measured data, e.g.
$p(x \mid \xi) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \xi)^2}{2\sigma^2}}$,
assuming the photon counts in our x-ray image are approximately Gaussian distributed about their expected value $\xi$. For this example, then
$\phi(x) = \int \psi(\xi)\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \xi)^2}{2\sigma^2}}\, d\xi = \psi(x) * p(x)$
becomes our familiar convolution process.
f. Expectation maximization
i. Expectation in the sense that we use a statistical model of the data measurement process
to estimate the missing data
ii. Maximization in the maximum likelihood sense, where iteration is used within
appropriate physical constraints to maximize the likelihood of the probability model for
the image.
iii. Consider an "inverse" conditional probability function given by Bayes' Theorem:
$Q(\xi \mid x) = \frac{\psi(\xi)\, p(x \mid \xi)}{\phi(x)}$.
Then it is possible to estimate the value of the inverse probability function iteratively from the current guesses $\phi_k(x)$, $\psi_k(\xi)$ at the kth iteration of the measured image and deconvolved image, respectively.
Our iterative estimate of the inverse filter is then:
$Q_k(\xi \mid x) = \frac{\psi_k(\xi)\, p(x \mid \xi)}{\phi_k(x)}$,
where
$\phi_k(x) = \int \psi_k(\xi)\, p(x \mid \xi)\, d\xi$, and $\psi_k(\xi) = \int \phi(x)\, Q_{k-1}(\xi \mid x)\, dx$.
Putting this all together, starting with the last result and substituting, the iterative estimate of the image is:
$\psi_{k+1}(\xi) = \int \phi(x)\, \frac{\psi_k(\xi)\, p(x \mid \xi)}{\phi_k(x)}\, dx = \psi_k(\xi) \int \frac{\phi(x)}{\phi_k(x)}\, p(x \mid \xi)\, dx$.
This is guaranteed to converge if $\phi(x)$ and $\psi(\xi)$ are non-negative and the respective areas of $\phi_k(x)$ and $\psi_k(\xi)$ are conserved. This is because $\phi_k(x)$ will approach $\phi(x)$, and $\psi_{k+1}(\xi)$ will then approach $\psi_k(\xi)$, in these circumstances. Note that the model has remained general: as long as the model follows the requirements of a probability density function (pdf), its form can depend on the desired application. This is not to say that the algorithm is guaranteed to converge to the global maximum likelihood result, although in practice the algorithm is very robust in applications where there is sufficient SNR.
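To make the iteration concrete, here is a minimal 1-D MATLAB sketch, not from the lecture, of the multiplicative update above; phi (the measured data vector) and p (a discrete, non-negative kernel summing to 1) are assumed inputs, and p is assumed symmetric so that the correlation in the update can be written as a convolution:
% Minimal 1-D sketch of the multiplicative update:
% psi_{k+1}(xi) = psi_k(xi) * integral[ phi(x)/phi_k(x) * p(x|xi) dx ], in discrete form.
% phi and p are assumed to be defined already.
psi = phi;                                    % initial guess psi_0 = phi
for k = 1:100
    phi_k = conv(psi, p, 'same');             % phi_k(x) = integral of psi_k(xi) p(x|xi) dxi
    ratio = phi ./ max(phi_k, eps);           % phi(x)/phi_k(x), guarding against divide-by-zero
    psi   = psi .* conv(ratio, p, 'same');    % multiplicative update of the image estimate
end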