Data Analysis and Monte Carlo Methods
Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM
Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM
General Information:
-  Lectures will be held in English, Mondays 16-18:00
-  Recitation, Mondays 14-16:00 CIP room, starting May 9, 2011
-  Exercises will be given at the end of the lecture for you to try out. They will be
discussed, along with other material, in the recitation of the following week
-  Course material available under:
http://www.mpp.mpg.de/~caldwell/ss11.html
http://www.mpp.mpg.de/~volynets/ss11/
NB: you will get the most out of this course if you can practice solving
problems on your own computer !
Modeling and Data
We imagine flipping a coin, or rolling dice, or picking a lottery number.
The initial conditions are not known, so we assume symmetry and say
every outcome is equally likely:
-  coin: heads or tails each have an equal chance; the probability of each is 1/2
-  rolling dice: each number on each die is equally likely, so each pair (6x6) is
equally likely, e.g., (1,1), (3,4), …
-  i.e., we make a model for the physical process. The model contains
assumptions (e.g., each outcome equally likely).
-  given the model, we can make predictions and compare these to the
data.
-  from the comparison, we decide if the model is reasonable.
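As an illustration (my addition, not from the slides), here is a minimal Python sketch of this idea: simulate many rolls of a pair of dice under the equal-likelihood model and compare the observed frequencies to the model prediction of 1/36 per pair.

```python
# Illustrative sketch (not from the slides): simulate rolling a pair of dice
# under the "every outcome equally likely" model and compare the simulated
# frequency of a few pairs to the model prediction of 1/36.
import random
from collections import Counter

random.seed(1)
n_rolls = 100_000
counts = Counter((random.randint(1, 6), random.randint(1, 6)) for _ in range(n_rolls))

for pair in [(1, 1), (3, 4), (6, 6)]:
    observed = counts[pair] / n_rolls
    print(f"pair {pair}: observed {observed:.4f}, model prediction {1/36:.4f}")
```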
Theory
g(\vec{y}\,|\,\vec{\lambda}, M)
\vec{y} are the theoretical observables
\vec{\lambda} are the parameters of the theory
M is the model or theory

Modeling of experiment
f(\vec{x}\,|\,\vec{\lambda}, M)

compare with the data \vec{D}

Experiment
\vec{x} is a possible data outcome
Notation
g(\vec{y}\,|\,\vec{\lambda}, M) is understood as a probability density, i.e., the probability that
\vec{y} is in the interval \vec{y} \to \vec{y} + d\vec{y}
given the model M and the parameter values specified by \vec{\lambda}.
E.g., if we are considering the decay of an unstable particle, we would have
g(t\,|\,\tau) \propto e^{-t/\tau}
the probability density for a decay occurring at time t, for a single particle,
assuming constant probability per unit time.
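As a sketch of how such a density is used in Monte Carlo work (my addition, with an assumed lifetime tau = 2.0), decay times can be drawn by inverse-transform sampling and checked against the expected mean.

```python
# Illustrative sketch (my addition): draw decay times from g(t|tau) ∝ exp(-t/tau)
# by inverse-transform sampling and check that the sample mean is close to tau.
import math
import random

random.seed(2)
tau = 2.0                      # assumed lifetime for this example
times = [-tau * math.log(1.0 - random.random()) for _ in range(100_000)]
print(f"sample mean = {sum(times)/len(times):.3f}, expected ≈ {tau}")
```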
How we learn
[Figure 1 diagram; label in the figure: "Deduction"]
Figure 1: Paradigm for data analysis. Knowledge is gained from a comparison of model
predictions with data. Intermediate steps may be necessary, e.g., to model experimental
conditions.
How we Learn
We learn by comparing measured data with distributions for predicted
results assuming a theory, parameters, and a modeling of the
experimental process.
What we typically want to know:
•  Is the theory reasonable ? I.e., is the observed data a likely result from
this theory (+ experiment) ?
•  If we have more than one potential explanation, then we want to be
able to quantify which theory is more likely to be correct given the
observations.
•  Assuming we have a reasonable theory, we want to estimate the most
probable values of the parameters, and their uncertainties. This includes
setting limits (parameter < or > some value at XX% probability).
Logical Basis
Model building and making predictions from models follows deductive
reasoning:
Given A ⇒ B (major premise)
Given B ⇒ C (major premise)
Then, given A, you can conclude that C is true
etc.
Everything is clear: we can make frequency distributions of possible
outcomes within the model, etc. This is math, so it is correct …
Logical Basis
However, in physics what we want to know is the validity of the model
given the data, i.e., logic of the form:
Given A ⇒ C
Measure C; what can we say about A ?
Well, maybe A1 ⇒ C, A2 ⇒ C, …
We now need inductive logic. We can never say anything absolutely
conclusive about A unless we can guarantee a complete set of
alternatives Ai and only one of them can give outcome C. This does not
happen in science, so we can never say we found the true model.
Logical basis
Instead of truth, we consider knowledge
Knowledge = justified true belief
Justification comes from the data.
Start with some knowledge or maybe plain belief
Do the experiment
Data analysis gives updated knowledge
Formulation of Data Analysis
In the following, I will formulate data analysis as a knowledge-updating
scheme.
Knowledge + data → updated knowledge
This leads to the usual Bayes’ equation, but I prefer this derivation to the
usual one in the textbooks.
Formulation
We require that
P(\vec{x}\,|\,\vec{\lambda}, M) \ge 0
\int P(\vec{x}\,|\,\vec{\lambda}, M)\, d\vec{x} = 1
although, as we will see, the normalization condition is not really needed.
The modeling of the experiment will typically add other (nuisance)
parameters. E.g., there are often uncertainties, such as the energy scale
of the experiment. Different assumptions on these lead to different
predictions for the data. We can then have P(\vec{x}\,|\,\vec{\lambda}, \vec{\nu}, M), where \vec{\nu} represents
our nuisance parameters.
Formulation
The expected distribution (density) of the data, assuming a model M and
parameters \vec{\lambda}, is written as P(\vec{x}\,|\,\vec{\lambda}, M), where \vec{x} is a possible realization
of the data. There are different possible definitions of this function.
Imagine we flip a coin 10 times, and get the following result:
THTHHTHTTH
We now repeat the process with a different coin and get
TTTTTTTTTT
Which outcome has higher probability ?
Take a model where H, T are equally likely. Then,
outcome 1: prob = (1/2)^{10}
outcome 2: prob = (1/2)^{10}
Does something seem wrong with this result ? This is because we evaluate
many probabilities at once. The result above is the probability for any particular
sequence of ten flips of a fair coin. Given a fair coin, we could also
calculate the chance of getting n times H:
P(n) = \binom{10}{n} \left(\frac{1}{2}\right)^{10}
And we find the following result:
n  : p
0  : 1·2^{-10}
1  : 10·2^{-10}
2  : 45·2^{-10}
3  : 120·2^{-10}
4  : 210·2^{-10}
5  : 252·2^{-10}
6  : 210·2^{-10}
7  : 120·2^{-10}
8  : 45·2^{-10}
9  : 10·2^{-10}
10 : 1·2^{-10}
There are many more ways to get 5 H
than 0, so this is why the first result
somehow looks more probable, even
if each sequence has exactly the same
probability in the model.
Maybe the model is wrong and one
coin is not fair ? How would we test
this ?
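The table above is easy to reproduce; a short Python check (my addition) of P(n) = C(10, n)·(1/2)^10:

```python
# Illustrative sketch (my addition): reproduce the table of P(n heads in 10 flips)
# for a fair coin, P(n) = C(10, n) * (1/2)**10.
from math import comb

N = 10
for n in range(N + 1):
    p = comb(N, n) * 0.5**N
    print(f"n = {n:2d}: p = {comb(N, n)}/1024 = {p:.6f}")
```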
The message: there are usually many ways to define the probability for
your data. Which is better, or whether to use several, depends on what
you are trying to do.
E.g., we have measured times in an exponential decay. We can define the
probability density as
P(\vec{t}\,|\,\tau) = \prod_{i=1}^{N} \frac{1}{\tau}\, e^{-t_i/\tau}
Or you can count events in a time interval and compare to expectations:
P(\vec{t}\,|\,\tau) = \prod_{j=1}^{M} \frac{e^{-\nu_j}\, \nu_j^{n_j}}{n_j!}
where \nu_j = expected events in bin j and n_j = observed events in bin j.
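To make the two definitions concrete, here is a sketch (my addition; the toy data, bin edges, and tau value are assumptions) that evaluates both the unbinned exponential product and the binned Poisson comparison for the same sample.

```python
# Illustrative sketch (my addition): evaluate the two probability definitions above
# for a toy set of decay times, assuming tau = 2.0 and a simple binning.
import math
import random

random.seed(3)
tau_true, tau_test = 2.0, 2.0
times = [-tau_true * math.log(1.0 - random.random()) for _ in range(200)]

# Unbinned: product of exponential densities, computed as a log for stability.
log_p_unbinned = sum(-math.log(tau_test) - t / tau_test for t in times)

# Binned: nu_j = expected counts, n_j = observed counts in each time bin.
edges = [0.0, 1.0, 2.0, 4.0, 8.0]
n_obs = [sum(lo <= t < hi for t in times) for lo, hi in zip(edges[:-1], edges[1:])]
nu = [len(times) * (math.exp(-lo / tau_test) - math.exp(-hi / tau_test))
      for lo, hi in zip(edges[:-1], edges[1:])]
log_p_binned = sum(-v + n * math.log(v) - math.lgamma(n + 1) for n, v in zip(n_obs, nu))

print(f"log P (unbinned) = {log_p_unbinned:.1f}, log P (binned) = {log_p_binned:.1f}")
```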
Formulation
For the model, we have 0 \le P(M) \le 1. For a fully Bayesian analysis,
we require
\sum_i P(M_i) = 1
For the parameters, assuming a model, we have:
P(\vec{\lambda}\,|\,M_i) \ge 0 \quad \text{and} \quad \int P(\vec{\lambda}\,|\,M_i)\, d\vec{\lambda} = 1
The joint probability distribution is
P(\vec{\lambda}, M) = P(\vec{\lambda}\,|\,M)\, P(M)
and
\sum_i P(M_i) \int P(\vec{\lambda}\,|\,M_i)\, d\vec{\lambda} = 1
Learning Rule
P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) \propto P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)
where the index represents a 'state of knowledge'.
We have to satisfy our normalization condition, so
P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \frac{P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{\sum_M \int P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}}
We usually write P_0(\vec{\lambda}, M) for the starting distribution. This is our 'prior' information
before performing the measurement.
Learning Rule
P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \frac{P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{\sum_M \int P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}}
The denominator is the probability to get the data, summing over all
possible models and all possible values of the parameters:
P(\vec{D}) = \sum_M \int P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}
so
P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \frac{P(\vec{x} = \vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{P(\vec{D})} \qquad \text{(Bayes' Equation)}
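The learning rule can be applied directly on a parameter grid. Here is a minimal sketch (my addition; the grid, toy data, and flat prior are assumptions) for a single model, updating a prior for the lifetime tau with a few observed decay times.

```python
# Illustrative sketch (my addition): the learning rule on a parameter grid for a
# single model, updating a flat prior for tau with observed decay times.
import math

taus = [0.5 + 0.05 * k for k in range(91)]        # grid of candidate tau values
prior = [1.0 / len(taus)] * len(taus)             # flat prior P_0(tau)
data = [1.2, 0.4, 2.7, 0.9, 1.8]                  # toy measured decay times

likelihood = [math.prod((1.0 / t) * math.exp(-x / t) for x in data) for t in taus]
unnorm = [L * p for L, p in zip(likelihood, prior)]
posterior = [u / sum(unnorm) for u in unnorm]     # divide by P(D), the denominator

best = taus[posterior.index(max(posterior))]
print(f"most probable tau on the grid: {best:.2f}")
```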
Bayes-Laplace Equation
Here is the standard derivation:
P(A, B) = P(A\,|\,B)\, P(B)
P(A, B) = P(B\,|\,A)\, P(A)
So
P(B\,|\,A) = \frac{P(A\,|\,B)\, P(B)}{P(A)}
[Venn diagram: sample space S containing the sets A and B]
Clear for logic propositions and well-defined S, A, B.
In our case, B = model + parameters, A = data.
Notation-cont.
Cumulative distribution function:
F(a) = \sum_{i=-\infty}^{a} P(x_i\,|\,\theta)   (discrete)
F(a) = \int_{-\infty}^{a} P(x\,|\,\theta)\, dx   (continuous)
0 \le F(a) \le 1
P(a \le x \le b) = F(b) - F(a)
(Equality may not be possible for the discrete case.)
Expectation value:
E[x] = \sum_{i=-\infty}^{\infty} x_i\, P(x_i\,|\,\theta)   for probabilities
E[x] = \int_{-\infty}^{\infty} x\, P(x\,|\,\theta)\, dx   for probability densities
E[u(x)] = \int_{-\infty}^{\infty} u(x)\, P(x\,|\,\theta)\, dx
For u(x), v(x) any two functions of x, E[u+v] = E[u] + E[v]. For c, k
any constants, E[cu+k] = cE[u] + k.
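A small sketch of the discrete definitions (my addition, using a fair die as the assumed example):

```python
# Illustrative sketch (my addition): discrete CDF and expectation value for a fair
# six-sided die, using the definitions above.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def F(a):
    """Cumulative distribution function F(a) = sum of P(x_i) for x_i <= a."""
    return sum(p for x, p in zip(values, probs) if x <= a)

E_x = sum(x * p for x, p in zip(values, probs))
print(f"E[x] = {E_x:.3f}")                    # 3.5
print(f"P(2 <= x <= 4) = {F(4) - F(1):.3f}")  # note the discrete-case caveat above
```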
Notation-cont.
The nth moment of a random variable is given by:
\alpha_n \equiv E[x^n] = \int_{-\infty}^{\infty} x^n\, P(x\,|\,\theta)\, dx
For discrete probabilities, integrals ⇒ sums in the obvious way.
\mu \equiv \alpha_1 = E[x] is known as the mean.
The nth central moment of x:
m_n \equiv E[(x - \alpha_1)^n] = \int_{-\infty}^{\infty} (x - \alpha_1)^n\, P(x\,|\,\theta)\, dx
\sigma^2 \equiv V[x] \equiv m_2 = \alpha_2 - \mu^2 is known as the variance, and \sigma is known as the
standard deviation.
\mu and \sigma (or \sigma^2) are the most commonly used measures to characterize a
distribution.
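As a numerical check of these definitions (my addition; the exponential density with tau = 2.0 is an assumed example), the first two moments and the variance can be computed by simple integration.

```python
# Illustrative sketch (my addition): check alpha_1, alpha_2 and the variance
# sigma^2 = alpha_2 - mu^2 numerically for the exponential density P(t|tau).
import math

tau = 2.0
dt = 0.001
ts = [k * dt for k in range(1, 20000)]          # integrate out to t = 20 >> tau
pdf = [math.exp(-t / tau) / tau for t in ts]

alpha1 = sum(t * p for t, p in zip(ts, pdf)) * dt      # mean, should be tau
alpha2 = sum(t**2 * p for t, p in zip(ts, pdf)) * dt   # second moment, 2*tau^2
print(f"mu = {alpha1:.3f}, sigma^2 = {alpha2 - alpha1**2:.3f} (expect {tau**2})")
```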
Notation-cont.
Other useful characteristics:
•  most-probable value (mode) is the value of x which maximizes f(x;θ)
•  median is a value of x such that F(x_med) = 0.5
[Figure: a density f(x) with the mode, median, and mean marked along the x axis]
Examples of using Bayes’ Theorem
A particle detector has an efficiency of 95% for detecting particles of
type A and an efficiency of 2% for detecting particles of type B. Assume
the detector gives a positive signal. What can be concluded about the
probability that the particle was of type A ?
Answer: NOTHING. It is first necessary to know the relative flux of
particles of type A and B.
Now assume that we know that 90% of particles are of type B and 10 %
are of type A. Then we can calculate:
P(A\,|\,\text{signal}) = \frac{P(\text{signal}\,|\,A)\, P(A)}{P(\text{signal}\,|\,A)\, P(A) + P(\text{signal}\,|\,B)\, P(B)}
P(A\,|\,\text{signal}) = \frac{0.95 \cdot 0.1}{0.95 \cdot 0.1 + 0.02 \cdot 0.9} = 0.84
We are told in the problem that we know P(A), P(B), P(signal|A), and
P(signal|B). This information was somehow determined separately,
possibly as a frequency, and it is the job of the experimenter to determine it.
Suppose we want to get the Signal to Background ratio for a sample of
many measurements, where signal is the number of particles of type A
and background is the number of particles of type B:
P(A\,|\,\text{signal}) = \frac{P(\text{signal}\,|\,A)\, P(A)}{P(\text{signal}\,|\,A)\, P(A) + P(\text{signal}\,|\,B)\, P(B)}
P(B\,|\,\text{signal}) = \frac{P(\text{signal}\,|\,B)\, P(B)}{P(\text{signal}\,|\,A)\, P(A) + P(\text{signal}\,|\,B)\, P(B)}
\frac{P(A\,|\,\text{signal})}{P(B\,|\,\text{signal})} = \frac{P(\text{signal}\,|\,A)\, P(A)}{P(\text{signal}\,|\,B)\, P(B)} = \frac{0.95 \cdot 0.1}{0.02 \cdot 0.9} = 5.3
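The detector example as a few lines of Python (my addition, using the numbers given in the slides):

```python
# Illustrative sketch (my addition): the detector example as a calculation.
p_sig_A, p_sig_B = 0.95, 0.02   # detection efficiencies for particle types A and B
p_A, p_B = 0.10, 0.90           # relative fluxes of A and B

p_A_given_sig = p_sig_A * p_A / (p_sig_A * p_A + p_sig_B * p_B)
ratio = (p_sig_A * p_A) / (p_sig_B * p_B)
print(f"P(A|signal) = {p_A_given_sig:.2f}")      # ≈ 0.84
print(f"signal/background ratio = {ratio:.1f}")  # ≈ 5.3
```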
Notation-cont.
For two random variables x,y, define joint p.d.f., P(x,y), where we leave
off the parameters as shorthand. The probability that x is in the range
x→x+dx and simultaneously y is in the range y→y+dy is P(x,y)dxdy.
To evaluate expectation values, etc., usually need marginal p.d.f. The
marginal p.d.f. of x (y unobserved) is
P_x(x) = \int_{-\infty}^{\infty} P(x, y)\, dy
The mean of x is then
\mu_x = \int\!\!\int x\, P(x, y)\, dx\, dy = \int x\, P_x(x)\, dx
The covariance of x and y is defined as
\mathrm{cov}[x, y] = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y
And the correlation coefficient is
\rho_{xy} = \mathrm{cov}[x, y] / (\sigma_x \sigma_y)
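A minimal sample-based illustration of these definitions (my addition; the toy correlated data are an assumption):

```python
# Illustrative sketch (my addition): sample estimates of cov[x, y] and rho_xy
# for correlated toy data, using the definitions above.
import math
import random

random.seed(4)
xs, ys = [], []
for _ in range(50_000):
    x = random.gauss(0.0, 1.0)
    ys.append(0.8 * x + 0.6 * random.gauss(0.0, 1.0))   # built-in correlation
    xs.append(x)

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / len(xs))
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / len(ys))
print(f"cov = {cov:.3f}, rho = {cov / (sx * sy):.3f}")   # rho should be about 0.8
```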
Examples
[Figure: contours of equal probability density in the (x, y) plane for three cases:
ρ_xy = 0, ρ_xy = -0.8, and ρ_xy = 0.2. The shading represents an equal probability
density contour.]
The correlation coefficient is limited to -1 ≤ ρ_xy ≤ 1.
Independent Variables
Two variables are independent if and only if
P(x, y) = P_x(x)\, P_y(y)
Then
\mathrm{cov}[x, y] = E[xy] - \mu_x \mu_y
             = \int\!\!\int x y\, P(x, y)\, dx\, dy - \mu_x \mu_y
             = \int\!\!\int x y\, P_x(x)\, P_y(y)\, dx\, dy - \mu_x \mu_y
             = \int x\, P_x(x)\, dx \int y\, P_y(y)\, dy - \mu_x \mu_y
             = 0
Notation-cont.
If x,y are independent, then
E[u(x)v(y)]=E[u(x)]E[v(y)]
and
V[x+y]=V[x]+V[y]
If x, y are not independent:
V[x+y] = E[(x+y)^2] - (E[x+y])^2
       = E[x^2] + E[y^2] + 2E[xy] - (E[x] + E[y])^2
       = V[x] + V[y] + 2(E[xy] - E[x]E[y])
       = V[x] + V[y] + 2\,\mathrm{cov}[x, y]
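A quick numerical check of the last identity (my addition; the correlated toy samples are an assumption):

```python
# Illustrative sketch (my addition): numerical check that for correlated variables
# V[x+y] = V[x] + V[y] + 2 cov[x, y].
import random

random.seed(5)
xs = [random.gauss(0.0, 1.0) for _ in range(50_000)]
ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]       # correlated with x

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov
print(f"V[x+y] = {lhs:.3f}, V[x]+V[y]+2cov = {rhs:.3f}")   # should agree
```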
Binomial Distribution
Bernoulli Process: random process with exactly two possible outcomes
which occur with fixed probabilities (e.g., flip coin, heads or tails,
particle recorded/not recorded, …). Probabilities from symmetry
argument or other information.
Definitions:
p is the probability of a ‘success’ (heads, detection of particle, …) 0≤p≤1
N independent trials (flip of the coin, number of particles crossing detector, …)
r is the number of successes (heads, observed particles, …) 0 ≤ r≤N
Then the probability of r successes in N trials is
P(r\,|\,N, p) = \frac{N!}{r!\,(N-r)!}\, p^r q^{N-r}, \quad \text{where } q = 1 - p
The factor N!/(r!\,(N-r)!) is the number of combinations, the binomial coefficient.
Derivation: Binomial Coefficient
Ways to order N distinct objects is N!=N(N-1)(N-2)…1
N choices for first position, then (N-1) for second, then (N-2) …
Now suppose we don't have N distinct objects, but have subsets of
identical objects. E.g., in flipping a coin, two subsets (tails and heads).
Within a subset, the objects are indistinguishable. For the ith subset, the
n_i! combinations are all equivalent. The number of distinct combinations
is then
\frac{N!}{n_1!\, n_2! \cdots n_m!} \quad \text{where} \quad \sum_i n_i = N
For the binomial case, there are two subclasses (success and failure, heads
or tails, …). The combinatorial coefficient is therefore
\binom{N}{r} = \frac{N!}{r!\,(N-r)!}
Binomial Distribution
p is the probability of a ‘success’ (heads, detection of particle, …) 0≤p≤1
N independent trials (flip of the coin, number of particles crossing detector, …)
r is the number of successes (heads, observed particles, …) 0 ≤ r≤N
Then the probability of r successes in N trials is
P(r\,|\,N, p) = \frac{N!}{r!\,(N-r)!}\, p^r q^{N-r}, \quad \text{where } q = 1 - p
The factor N!/(r!\,(N-r)!) is the number of combinations, the binomial coefficient.
Binomial Distribution-cont.
[Figure: binomial distributions P(r|N, p) for (p, N) = (0.5, 4), (0.5, 15), (0.5, 5),
(0.5, 50), (0.1, 5), (0.1, 15), (0.8, 5), (0.8, 15)]
E[r] = Np
V[r] = Np(1-p)
Notes:
•  for large N, p near 0.5, the distribution is approx. symmetric
•  for p near 0 or 1, the variance is reduced
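The mean and variance formulas are easy to verify by simulation (my addition; the choice p = 0.8, N = 15 is an assumed example):

```python
# Illustrative sketch (my addition): check E[r] = Np and V[r] = Np(1-p) by
# simulating many binomial experiments.
import random

random.seed(6)
N, p, n_exp = 15, 0.8, 100_000
rs = [sum(random.random() < p for _ in range(N)) for _ in range(n_exp)]

mean_r = sum(rs) / n_exp
var_r = sum((r - mean_r) ** 2 for r in rs) / n_exp
print(f"simulated E[r] = {mean_r:.3f} (Np = {N * p})")
print(f"simulated V[r] = {var_r:.3f} (Np(1-p) = {N * p * (1 - p):.2f})")
```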
Example
Example:
You are designing a particle tracking system and require at least three
measurements of the position of the particle along its trajectory to
determine the parameters. You know that each detector element has an
efficiency of 95%. How many detector elements would have to ‘see’ the
track to have a 99% reconstruction efficiency ?
Solution:
We are happy with 3 or more hits, so our requirement is
P(r \ge 3\,|\,N, p) = \sum_{r=3}^{N} P(r\,|\,N, p) > 0.99
Example-cont.
N = 3:
f(3; 3, 0.95) = \frac{3!}{3!\,(3-3)!} (0.95)^3 (1-0.95)^{3-3} = 0.95^3 = 0.857

N = 4:
f(3; 4, 0.95) = \frac{4!}{3!\,(4-3)!} (0.95)^3 (1-0.95)^{4-3} = 4\,(0.95)^3 (0.05) = 0.171
f(4; 4, 0.95) = \frac{4!}{4!\,(4-4)!} (0.95)^4 (1-0.95)^{4-4} = (0.95)^4 = 0.815
P(r \ge 3\,|\,4, 0.95) = 0.171 + 0.815 = 0.986

N = 5:
f(3; 5, 0.95) = \frac{5!}{3!\,(5-3)!} (0.95)^3 (1-0.95)^{5-3} = 10\,(0.95)^3 (0.05)^2 = 0.021
f(4; 5, 0.95) = \frac{5!}{4!\,(5-4)!} (0.95)^4 (1-0.95)^{5-4} = 5\,(0.95)^4 (0.05) = 0.204
f(5; 5, 0.95) = \frac{5!}{5!\,(5-5)!} (0.95)^5 (1-0.95)^{5-5} = (0.95)^5 = 0.774
P(r \ge 3\,|\,5, 0.95) = 0.021 + 0.204 + 0.774 = 0.999

With 5 detector layers, we have a >99% chance of getting at least 3 hits.
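The same scan over the number of layers, as a short script (my addition, using the efficiencies from the example):

```python
# Illustrative sketch (my addition): find the smallest number of detector layers N
# giving P(r >= 3 | N, p=0.95) > 0.99, as in the example above.
from math import comb

def p_at_least(k, N, p):
    """P(r >= k) for a binomial with N trials and success probability p."""
    return sum(comb(N, r) * p**r * (1 - p) ** (N - r) for r in range(k, N + 1))

for N in range(3, 8):
    prob = p_at_least(3, N, 0.95)
    print(f"N = {N}: P(r >= 3) = {prob:.3f}")
# N = 5 is the first value exceeding 0.99.
```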