Data Analysis and Monte Carlo Methods
Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM
Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM

General Information:
- Lectures will be held in English, Mondays 16-18:00
- Recitation, Mondays 14-16:00, CIP room, starting May 9, 2011
- Exercises will be given at the end of the lecture, for you to try out. They will be discussed, along with other material, in the recitation of the following week
- Course material available under:
  http://www.mpp.mpg.de/~caldwell/ss11.html
  http://www.mpp.mpg.de/~volynets/ss11/
NB: you will get the most out of this course if you can practice solving problems on your own computer!

Modeling and Data
We imagine flipping a coin, rolling dice, or picking a lottery number. The initial conditions are not known, so we assume symmetry and say every outcome is equally likely:
- coin: heads or tails have an equal chance, the probability of each is 1/2
- rolling dice: each number on each die is equally likely, so each pair (6x6) is equally likely, e.g., (1,1), (3,4), ...
- i.e., we make a model for the physical process. The model contains assumptions (e.g., each outcome equally likely).
- given the model, we can make predictions and compare these to the data.
- from the comparison, we decide if the model is reasonable.

Theory
[Diagram: the theory provides g(\vec{y}|\vec{\lambda}, M); the modeling of the experiment turns this into f(\vec{x}|\vec{\lambda}, M), which is compared with the measured data \vec{D} from the experiment.]
- \vec{y} are the theoretical observables
- \vec{\lambda} are the parameters of the theory
- M is the model or theory
- \vec{x} is a possible data outcome

Notation
g(\vec{y}|\vec{\lambda}, M) is understood as a probability density, i.e., the probability that \vec{y} is in the interval \vec{y} \to \vec{y} + d\vec{y} given the model M and the parameter values specified by \vec{\lambda}.
E.g., if we are considering the decay of an unstable particle, we would have the probability density for the decay occurring at time t,
  g(t|\tau) \propto e^{-t/\tau}
for a single particle, assuming a constant decay probability per unit time.

How we learn
[Figure 1: Paradigm for data analysis. Knowledge is gained from a comparison of model predictions (deduction) with data. Intermediate steps may be necessary, e.g., to model experimental conditions.]

How we Learn
We learn by comparing measured data with distributions for predicted results assuming a theory, parameters, and a modeling of the experimental process. What we typically want to know:
• Is the theory reasonable? I.e., is the observed data a likely result from this theory (+ experiment)?
• If we have more than one potential explanation, then we want to be able to quantify which theory is more likely to be correct given the observations.
• Assuming we have a reasonable theory, we want to estimate the most probable values of the parameters, and their uncertainties. This includes setting limits (>< some value at XX% probability).

Logical Basis
Model building and making predictions from models follows deductive reasoning:
  Given A => B (major premise)
  Given B => C (major premise)
Then, given A, you can conclude that C is true, etc.
Everything is clear: we can make frequency distributions of possible outcomes within the model, etc. This is math, so it is correct ...
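As a concrete illustration of this kind of deduction (not part of the lecture material), a few lines of Python enumerate the 36 equally likely pairs of the dice model from the "Modeling and Data" slide and derive the frequency distribution of the sum of the two dice.

```python
from collections import Counter
from fractions import Fraction

# Deduction from the dice model: each of the 6x6 = 36 pairs is equally likely,
# so the frequency distribution of any derived quantity, e.g. the sum, follows.
pairs = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
sum_counts = Counter(d1 + d2 for d1, d2 in pairs)

for s in sorted(sum_counts):
    prob = Fraction(sum_counts[s], len(pairs))
    print(f"sum = {s:2d}:  probability = {prob}  ({float(prob):.4f})")
```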
Logical Basis
However, in physics what we want to know is the validity of the model given the data, i.e., logic of the form:
  Given A => C. Measure C; what can we say about A?
Well, maybe A1 => C, A2 => C, ... We now need inductive logic. We can never say anything absolutely conclusive about A unless we can guarantee a complete set of alternatives Ai and only one of them can give outcome C. This does not happen in science, so we can never say we found the true model.

Logical basis
Instead of truth, we consider knowledge:
  Knowledge = justified true belief
Justification comes from the data. Start with some knowledge, or maybe plain belief; do the experiment; data analysis gives updated knowledge.

Formulation of Data Analysis
In the following, I will formulate data analysis as a knowledge-updating scheme:
  knowledge + data -> updated knowledge
This leads to the usual Bayes' equation, but I prefer this derivation to the usual one in the textbooks.

Formulation
We require that
  P(\vec{x}|\vec{\lambda}, M) \ge 0  and  \int P(\vec{x}|\vec{\lambda}, M)\, d\vec{x} = 1,
although, as we will see, the normalization condition is not really needed.
The modeling of the experiment will typically add other (nuisance) parameters. E.g., there are often uncertainties, such as the energy scale of the experiment. Different assumptions on these lead to different predictions for the data. We can have P(\vec{x}|\vec{\lambda}, \vec{\nu}, M), where \vec{\nu} represents our nuisance parameters.

Formulation
The expected distribution (density) of the data assuming a model M and parameters \vec{\lambda} is written as P(\vec{x}|\vec{\lambda}, M), where \vec{x} is a possible realization of the data. There are different possible definitions of this function.
Imagine we flip a coin 10 times, and get the following result:
  THTHHTHTTH
We now repeat the process with a different coin and get
  TTTTTTTTTT
Which outcome has the higher probability?

Take a model where H, T are equally likely. Then
  outcome 1 probability = (1/2)^{10}
  outcome 2 probability = (1/2)^{10}
Does something seem wrong with this result? This is because we evaluate many probabilities at once. The result above is the probability for any particular sequence of ten flips of a fair coin. Given a fair coin, we could also calculate the chance of getting H n times:
  P(n) = \binom{10}{n} \left(\frac{1}{2}\right)^{10}

And we find the following result:
  n               0    1    2    3    4    5    6    7    8    9   10
  p (x 2^{-10})   1   10   45  120  210  252  210  120   45   10    1
There are many more ways to get 5 H than 0 H, so this is why the first result somehow looks more probable, even though each sequence has exactly the same probability in the model. Maybe the model is wrong and one coin is not fair? How would we test this?

The message: there are usually many ways to define the probability for your data. Which is better, or whether to use several, depends on what you are trying to do.
E.g., suppose we have measured times in an exponential decay. We can define the probability density as
  P(\vec{t}|\tau) = \prod_{i=1}^{N} \frac{1}{\tau} e^{-t_i/\tau}
Or we can count events in time intervals and compare to expectations:
  P(\vec{n}|\tau) = \prod_{j=1}^{M} \frac{e^{-\nu_j} \nu_j^{n_j}}{n_j!}
where \nu_j = expected events in bin j and n_j = observed events in bin j.
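To make the two definitions concrete, here is a small Python sketch (an illustration only, assuming NumPy and SciPy are available; the lifetime, sample size, and bin edges are arbitrary choices, not taken from the lecture). It evaluates both the unbinned product-of-densities likelihood and the binned Poisson likelihood on the same simulated decay times.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(seed=2)

tau_true = 2.0
times = rng.exponential(tau_true, size=500)   # simulated decay times

def log_like_unbinned(tau, times):
    """log of prod_i (1/tau) exp(-t_i/tau): the unbinned definition."""
    return np.sum(-np.log(tau) - times / tau)

def log_like_binned(tau, times, edges):
    """log of prod_j Poisson(n_j | nu_j): the binned definition."""
    n_j, _ = np.histogram(times, bins=edges)
    # expected counts nu_j: total events times the model probability per bin
    bin_prob = np.diff(1.0 - np.exp(-edges / tau))
    nu_j = len(times) * bin_prob
    return np.sum(poisson.logpmf(n_j, nu_j))

edges = np.linspace(0.0, 10.0, 21)   # 20 equal-width time bins
for tau in (1.5, 2.0, 2.5):
    print(f"tau={tau}: unbinned={log_like_unbinned(tau, times):.1f}, "
          f"binned={log_like_binned(tau, times, edges):.1f}")
```

Both definitions peak near the true lifetime; which one to use depends, as stated above, on what you are trying to do.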
Formulation
For the model, we have 0 \le P(M) \le 1. For a fully Bayesian analysis, we require
  \sum_i P(M_i) = 1
For the parameters, assuming a model, we have:
  P(\vec{\lambda}|M_i) \ge 0  and  \int P(\vec{\lambda}|M_i)\, d\vec{\lambda} = 1
The joint probability distribution is
  P(\vec{\lambda}, M) = P(\vec{\lambda}|M) P(M)
and
  \sum_i P(M_i) \int P(\vec{\lambda}|M_i)\, d\vec{\lambda} = 1

Learning Rule
  P_{i+1}(\vec{\lambda}, M|\vec{D}) \propto P(\vec{x} = \vec{D}|\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)
where the index represents a 'state of knowledge'. We have to satisfy our normalization condition, so
  P_{i+1}(\vec{\lambda}, M|\vec{D}) = \frac{P(\vec{x} = \vec{D}|\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{\sum_M \int P(\vec{x} = \vec{D}|\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}}
We usually write P_0(\vec{\lambda}, M) for the starting distribution. This is our 'prior' information before performing the measurement.

Learning Rule
The denominator in the learning rule is the probability to get the data, summing over all possible models and all possible values of the parameters:
  P(\vec{D}) = \sum_M \int P(\vec{x} = \vec{D}|\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}
so
  P_{i+1}(\vec{\lambda}, M|\vec{D}) = \frac{P(\vec{x} = \vec{D}|\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{P(\vec{D})}   (Bayes' equation)

Bayes-Laplace Equation
Here is the standard derivation:
  P(A, B) = P(A|B) P(B)
  P(A, B) = P(B|A) P(A)
So
  P(B|A) = \frac{P(A|B) P(B)}{P(A)}
[Venn diagram: sets A and B inside the sample space S.]
This is clear for logic propositions and well-defined S, A, B. In our case, B = model + parameters, A = data.

Notation-cont.
Cumulative distribution function:
  F(a) = \sum_{x_i \le a} P(x_i|\theta)   (discrete case)
  F(a) = \int_{-\infty}^{a} P(x|\theta)\, dx   (continuous case)
  0 \le F(a) \le 1
  P(a \le x \le b) = F(b) - F(a)   (equality may not be possible for the discrete case)
Expectation value:
  E[x] = \sum_i x_i P(x_i|\theta)   for probabilities
  E[x] = \int_{-\infty}^{\infty} x\, P(x|\theta)\, dx   for probability densities
  E[u(x)] = \int_{-\infty}^{\infty} u(x)\, P(x|\theta)\, dx
For u(x), v(x) any two functions of x, E[u+v] = E[u] + E[v]. For c, k any constants, E[cu+k] = cE[u] + k.

Notation-cont.
The nth moment of a random variable is given by:
  \alpha_n \equiv E[x^n] = \int_{-\infty}^{\infty} x^n P(x|\theta)\, dx
For discrete probabilities, integrals become sums in the obvious way.
\mu \equiv \alpha_1 = E[x] is known as the mean.
The nth central moment of x:
  m_n \equiv E[(x - \alpha_1)^n] = \int_{-\infty}^{\infty} (x - \alpha_1)^n P(x|\theta)\, dx
\sigma^2 \equiv V[x] \equiv m_2 = \alpha_2 - \mu^2 is known as the variance, and \sigma is known as the standard deviation.
\mu and \sigma (or \sigma^2) are the most commonly used measures to characterize a distribution.

Notation-cont.
Other useful characteristics:
• the most-probable value (mode) is the value of x which maximizes f(x;\theta)
• the median is a value of x such that F(x_med) = 0.5
[Figure: a skewed density f(x) with the mode, median, and mean marked on the x axis.]

Examples of using Bayes' Theorem
A particle detector has an efficiency of 95% for detecting particles of type A and an efficiency of 2% for detecting particles of type B. Assume the detector gives a positive signal. What can be concluded about the probability that the particle was of type A?
Answer: NOTHING. It is first necessary to know the relative flux of particles of type A and B.
Now assume that we know that 90% of particles are of type B and 10% are of type A. Then we can calculate:
  P(A|signal) = \frac{P(signal|A) P(A)}{P(signal|A) P(A) + P(signal|B) P(B)} = \frac{0.95 \cdot 0.1}{0.95 \cdot 0.1 + 0.02 \cdot 0.9} = 0.84
We are told in the problem that we know P(A), P(B), P(signal|A), and P(signal|B). This information was somehow determined separately, possibly as a frequency, and it is the job of the experimenter to determine it.
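As a sanity check of the detector example, a few lines of Python (an illustration, not lecture code) reproduce the posterior probability; all numbers are those quoted above.

```python
# Numbers from the detector example above.
p_sig_given_A = 0.95   # detector efficiency for type A, P(signal|A)
p_sig_given_B = 0.02   # detector efficiency for type B, P(signal|B)
p_A, p_B = 0.10, 0.90  # relative fluxes, P(A) and P(B)

# Bayes' theorem: P(A|signal) = P(signal|A) P(A) / P(signal)
p_signal = p_sig_given_A * p_A + p_sig_given_B * p_B
p_A_given_signal = p_sig_given_A * p_A / p_signal

print(f"P(A|signal) = {p_A_given_signal:.2f}")   # -> 0.84
```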
Suppose we want to get the signal-to-background ratio for a sample of many measurements, where the signal is the number of particles of type A and the background is the number of particles of type B:
  P(A|signal) = \frac{P(signal|A) P(A)}{P(signal|A) P(A) + P(signal|B) P(B)}
  P(B|signal) = \frac{P(signal|B) P(B)}{P(signal|A) P(A) + P(signal|B) P(B)}
  \frac{P(A|signal)}{P(B|signal)} = \frac{P(signal|A) P(A)}{P(signal|B) P(B)} = \frac{0.95 \cdot 0.1}{0.02 \cdot 0.9} = 5.3

Notation-cont.
For two random variables x, y, define the joint p.d.f. P(x, y), where we leave off the parameters as shorthand. The probability that x is in the range x \to x+dx and simultaneously y is in the range y \to y+dy is P(x, y)\, dx\, dy.
To evaluate expectation values, etc., we usually need the marginal p.d.f. The marginal p.d.f. of x (y unobserved) is
  P_x(x) = \int_{-\infty}^{\infty} P(x, y)\, dy
The mean of x is then
  \mu_x = \int\!\!\int x\, P(x, y)\, dx\, dy = \int x\, P_x(x)\, dx
The covariance of x and y is defined as
  cov[x, y] = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y
And the correlation coefficient is
  \rho_{xy} = cov[x, y] / (\sigma_x \sigma_y)

Examples
[Figure: scatter plots with equal-probability-density contours for \rho_{xy} = 0, \rho_{xy} = 0.2, and \rho_{xy} = -0.8.]
The correlation coefficient is limited to -1 \le \rho_{xy} \le 1.

Independent Variables
Two variables are independent if and only if P(x, y) = P_x(x) P_y(y). Then
  cov[x, y] = E[xy] - \mu_x \mu_y
            = \int\!\!\int xy\, P(x, y)\, dx\, dy - \mu_x \mu_y
            = \int\!\!\int xy\, P_x(x) P_y(y)\, dx\, dy - \mu_x \mu_y
            = \int x\, P_x(x)\, dx \int y\, P_y(y)\, dy - \mu_x \mu_y
            = 0

Notation-cont.
If x, y are independent, then E[u(x)v(y)] = E[u(x)] E[v(y)] and V[x+y] = V[x] + V[y].
If x, y are not independent:
  V[x+y] = E[(x+y)^2] - (E[x+y])^2
         = E[x^2] + E[y^2] + 2E[xy] - (E[x] + E[y])^2
         = V[x] + V[y] + 2(E[xy] - E[x]E[y])
         = V[x] + V[y] + 2\, cov[x, y]

Binomial Distribution
Bernoulli process: a random process with exactly two possible outcomes which occur with fixed probabilities (e.g., flip of a coin, heads or tails; particle recorded/not recorded; ...). The probabilities come from a symmetry argument or other information.
Definitions:
- p is the probability of a 'success' (heads, detection of a particle, ...), 0 \le p \le 1
- N is the number of independent trials (flips of the coin, particles crossing the detector, ...)
- r is the number of successes (heads, observed particles, ...), 0 \le r \le N
Then the probability of r successes in N trials is
  P(r|N, p) = \frac{N!}{r!(N-r)!} p^r q^{N-r},  where q = 1 - p
and N!/(r!(N-r)!) is the number of combinations, the binomial coefficient.

Derivation: Binomial Coefficient
The number of ways to order N distinct objects is N! = N(N-1)(N-2)\cdots 1: N choices for the first position, then (N-1) for the second, then (N-2), ...
Now suppose we don't have N distinct objects, but have subsets of identical objects. E.g., in flipping a coin, there are two subsets (tails and heads). Within a subset, the objects are indistinguishable. For the ith subset, the n_i! orderings are all equivalent. The number of distinct combinations is then
  \frac{N!}{n_1! n_2! \cdots n_k!},  where \sum_i n_i = N
For the binomial case, there are two subclasses (success and failure, heads or tails, ...). The combinatorial coefficient is therefore
  \binom{N}{r} = \frac{N!}{r!(N-r)!}
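The binomial formula can also be checked by brute-force Monte Carlo, in keeping with the course theme. The following sketch (illustrative only; N, p, and the number of pseudo-experiments are arbitrary choices) simulates repeated sets of N Bernoulli trials and compares the observed frequencies of r successes with P(r|N, p).

```python
import numpy as np
from math import comb

rng = np.random.default_rng(seed=3)

N, p = 10, 0.5
n_experiments = 200_000

# Monte Carlo: each row is one experiment of N Bernoulli trials
successes = rng.random((n_experiments, N)) < p
r = successes.sum(axis=1)
mc_freq = np.bincount(r, minlength=N + 1) / n_experiments

# Analytic binomial probabilities P(r|N, p)
exact = [comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)]

for k in range(N + 1):
    print(f"r={k:2d}   MC={mc_freq[k]:.4f}   exact={exact[k]:.4f}")
```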
Binomial Distribution-cont.
[Figure: binomial distributions for (p, N) = (0.5, 4), (0.5, 5), (0.5, 15), (0.5, 50), (0.1, 5), (0.1, 15), (0.8, 5), (0.8, 15).]
  E[r] = Np
  V[r] = Np(1-p)
Notes:
• for large N and p near 0.5, the distribution is approximately symmetric
• for p near 0 or 1, the variance is reduced

Example
You are designing a particle tracking system and require at least three measurements of the position of the particle along its trajectory to determine the parameters. You know that each detector element has an efficiency of 95%. How many detector elements would have to 'see' the track to have a 99% reconstruction efficiency?
Solution: We are happy with 3 or more hits, so we require
  P(r \ge 3|N, p) = \sum_{r=3}^{N} P(r|N, p) > 0.99

Example-cont.
N = 3:
  f(3; 3, 0.95) = \frac{3!}{3!(3-3)!} (0.95)^3 (0.05)^0 = (0.95)^3 = 0.857
N = 4:
  f(3; 4, 0.95) = \frac{4!}{3!(4-3)!} (0.95)^3 (0.05)^1 = 4 (0.95)^3 (0.05) = 0.171
  f(4; 4, 0.95) = \frac{4!}{4!(4-4)!} (0.95)^4 (0.05)^0 = (0.95)^4 = 0.815
  total: 0.986
N = 5:
  f(3; 5, 0.95) = \frac{5!}{3!(5-3)!} (0.95)^3 (0.05)^2 = 10 (0.95)^3 (0.05)^2 = 0.021
  f(4; 5, 0.95) = \frac{5!}{4!(5-4)!} (0.95)^4 (0.05)^1 = 5 (0.95)^4 (0.05) = 0.204
  f(5; 5, 0.95) = \frac{5!}{5!(5-5)!} (0.95)^5 (0.05)^0 = (0.95)^5 = 0.774
  total: 0.999
With 5 detector layers, we have a >99% chance of getting at least 3 hits.
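A short Python check (illustrative, not lecture code) reproduces these numbers directly from the binomial formula.

```python
from math import comb

def p_at_least(r_min, N, p):
    """P(r >= r_min | N, p): sum of binomial terms from r_min to N."""
    return sum(comb(N, r) * p**r * (1 - p)**(N - r) for r in range(r_min, N + 1))

p_hit = 0.95
for N in (3, 4, 5):
    print(f"N={N}:  P(r>=3|N, 0.95) = {p_at_least(3, N, p_hit):.3f}")
# -> 0.857, 0.986, 0.999, matching the worked example above
```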