Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Probability and Statistics What is probability? What is statistics? 1 Probability and Statistics Probability Formally defined using a set of axioms Seeks to determine the likelihood that a given event or observation or measurement will or has happened What is the probability of throwing a 7 using two dice? Statistics Used to analyze the frequency of past events Uses a given sample of data to assess a probabilistic model’s validity or determine values of its parameters After observing several throws of two dice, can I determine whether or not they are loaded Also depends on what we mean by probability 2 Probability and Statistics We perform an experiment to collect a number of top quarks How do we extract the best value for its mass? What is the uncertainty of our best value? Is our experiment internally consistent? Is this value consistent with a given theory, which itself may contain uncertainties? Is this value consistent with other measurements of the top quark mass? 3 Probability and Statistics CDF “discovery” announced 4/11/2011 4 Probability and Statistics 5 Probability and Statistics Pentaquark search - how can this occur? 2003 – 6.8s effect 2005 – no effect 6 Probability Let the sample space S be the space of all possible outcomes of an experiment Let x be a possible outcome Then P(x found in [x,x+dx]) = f(x)dx f(x) is called the probability density function (pdf) It may be called f(x;q) since the pdf could depend on one or more parameters q Often we will want to determine q from a set of measurements Of course x must be somewhere so f x dx 1 7 Probability Definitions of mean and variance are given in terms of expectation values Ex xf x dx V x E x Ex E x s 2 2 2 2 8 Probability Definitions of covariance and correlation coefficient Vxy covx, y E x x y y E xy x y xy covx, y s xs y if x, y independen t then f x,y f x x f y y and then E x,y xyf x,ydxdy x y and so covx, y 0 9 Probability Error propagation Consider y x with vari ables x x1 , x2 ,...xn and E x and Vij cov xi , x j n y then TS expanding y x y xi i i 1 xi x we find E y x y n y y 2 E y x y Vij i , j 1 xi x j x 2 10 Probability This gives the familiar error propagation formulas for sums (or differences) and products (or ratio) 2 2 2 Using V y s y E y E y we find for y x1 x2 s y2 s 12 s 22 2 covx1 , x2 and for y x1 x2 covx1 , x2 2 2 2 2 y x1 x2 x1 x2 s y2 s 12 s 22 11 Uniform Distribution Let 1 for α x β f ( x; , ) otherwise 0 x 1 E x xf x dx dx 2 V x E x E x 2 2 1 1 1 2 V x x dx 2 12 What is the position resolution of a silicon or multiwire proportional chamber with detection elements of space x? 12 Binomial Distribution Consider N independent experiments (Bernoulli trials) Let the outcome of each be pass or fail Let the probability of pass = p Probabilit y of n successes p n 1 p N! But there are permuation s for n! N n ! N distinguis hable objects, grouping them n at a time N! N n n f (n; N , p ) p 1 p n! N n ! N n 13 Permutations Quick review Number of permutatio ns for N elements N ! This considers each element distinguis hable But n elements of the first type are indistingu ishable so n! of the elements lead to the same situation Ditto for the remaining N-n elements of the second type Thus accounting for these irrelevant permutatio ns leads to N N! number of unique permuation s is N-n !n! n 14 Binomial Distribution For the mean and variance we obtain (using small tricks) N En nf n; N , p Np n 0 V n E n En Np1 p 2 2 And note with the binomial theorem that N n N n N f (n; N , p) p 1 p p 1 p 1 n 0 n 0 n N N 15 Binomial Distribution Binomial pdf 16 Binomial Distribution Examples Coin flip (p=1/2) Dice throw (p=1/6) Branching ratio of nuclear and particle decays (p=Br) Detector or trigger efficiencies (pass or not pass) Blood group B or not blood group B 17 Binomial Distribution It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game? N 4, p 0.3 4! 4 0 f 4;4,0.3 0.3 0.7 0.0081 4!0! E n 4 0.3 1.2 V n 4 0.3 0.7 0.84 Expect 1.2 0.84 hits for a 0.300 hitter 18 Poisson Distribution Consider when N p0 E n Np v The Poisson pdf is n v v f n; v e n! and one finds E n v V n s 2 v 19 Poisson Distribution N! N n f n; N , p p n 1 p n! N n ! N! N n for N large N n ! p 2 N n N n 1 1 p 1 pN n ... 2! N 2 p2 N n 1 p 1 Np ... e Np e v 2! n n v N n v v e f n; N , p pe for large N and small p n! n! N n 20 Poisson Distribution Poisson pdf 21 Poisson Distribution Examples Particles detected from radioactive decays Sum of two Poisson processes is a Poisson process Particles detected from scattering of a beam on target with cross section s Cosmic rays observed in a time interval t Number of entries in a histogram bin when data is accumulated over a fixed time interval Number of Prussian soldiers kicked to death by horses Infant mortality QC/failure rate predictions 22 Poisson Distribution Let Let N~1020 atoms 1 1 Let p 12 ~ 2 1020 /s τ 10 y Then v Np 2 / s and 20 e 2 P (0;2) 0.135 0! 21 e 2 P (1;2) 0.271 1! 22 e 2 P ( 2;2) 0.271 2! 23 Gaussian Distribution Gaussian distribution Important because of the central limit theorem For n independent variables x1,x2,…,xN that are distributed according to any pdf, then the sum y=∑xi will have a pdf that approaches a Gaussian for large N Examples are almost any measurement error (energy resolution, position resolution, …) E y i i V y s i i 24 Gaussian Distribution The familiar Gaussian pdf is 2 1 x f x; , s exp 2 2 s 2s 2 E x V x s 2 25 Gaussian Distribution Some useful properties of the Gaussian distribution are in range ±s) = 0.683 in range ±2s) = 0.9555 in range ±3s) = 0.9973 outside range ±3s) = 0.0027 outside range ±5s) = 5.7x10-7 P(x P(x P(x P(x P(x P(x in range ±0.6745s) = 0.5 26 c2 Distribution Chi-square distribution 1 f z; n n / 2 z n / 21e z / 2 z 0 2 n / 2 n 1,2,... is the number of degrees of freedom Ez n V z 2n The usefulness of this pdf is that for n independen t xi , i , s i2 n z i 1 xi i 2 s i2 follows the c 2 distributi on with n d.o.f. 27 c2 Distribution 28 Probability 29 Probability Probability can be defined in terms of Kolmogorov axioms The probability is a real-valued function defined on subsets A,B,… in sample space S For every subset A in S, P A 0 If A B 0, P A B P A PB PS 1 This means the probability is a measure in which the measure of the entire sample space is 1 30 Probability We further define the conditional probability P(A|B) read P(A) given B P A B P A | B P B Bayes’ theorem Using P A B PB A PB | AP A P A | B P B 31 Probability For disjoint Ai PB PB | Ai P Ai i then PB | AP A P A | B PB | Ai P Ai i Usually one treats the Ai as outcomes of a repeatable experiment 32 Probability Usually one treats the Ai as outcomes of a repeatable experiment Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A A P A lim n n Called frequentist statistics But Ai could also be interpreted as hypotheses, each of which is true or false Then P(A) represents the degree of belief that hypothesis A is true Called Bayesian statistics 33 Bayes’ Theorem Suppose in the general population P(disease) = 0.001 P(no disease) = 0.999 Suppose there is a test to check for the disease P(+, disease) = 0.98 P(-, disease) = 0.02 But also P(+, no disease) = 0.03 P(-, no disease) = 0.97 You are tested for the disease and it comes back +. Should you be worried? 34 Bayes’ Theorem Apply Bayes’ theorem P(, disease ) P(disease ) P(disease ,) P(, disease ) P(disease ) P(, no disease ) P(no disease ) 0.98 0.001 P(disease ,) 0.032 0.98 0.001 0.03 0.999 3.2% of people testing positive have the disease Your degree of belief about having the disease is 3.2% 35 Bayes’ Theorem Is athlete A guilty of drug doping? Assume a population of athletes in this sport P(drug) = 0.005 P(no drug) = 0.995 Suppose there is a test to check for the drug P(+, drug) = 0.99 P(-, drug) = 0.01 But also P(+, no drug) = 0.004 P(-, no drug) = 0.996 The athlete is tested positive. Is he/she involved in drug doping? 36 Bayes’ Theorem Apply Bayes’ theorem P(, drug ) P(drug ) P(drug ,) P(, drug ) P(drug ) P(, no drug ) P(no drug ) 0.99 0.005 P(drug ,) 0.55 0.99 0.005 0.004 0.995 and P(, no drug ) P(no drug ) P(no drug ,) P(, no drug ) P(no drug ) P(, drug ) P(drug ) 0.004 0.995 P(no drug ,) 0.45 0.004 0.995 0.99 0.005 37 ??? Binomial Distribution Calculating efficiencies Usually use e instead of p E n n E n Ne e e N N V n s 2 Ne 1 e We don' t know the true e but we can use our estimate e s e 1 e 1 e n 1 n / N N N N This is the error on the efficiency 38 Binomial Distribution But there is a problem If n=0, (e’) = 0 If n=N, (e’) = 0 Actually we went wrong in assuming the best estimate for e is n/N We should really have used the most probable value of e given n and N A proper treatment uses Bayes’ theorem but lucky for us (in HEP) the solution is implemented in ROOT h_num->Sumw2() h_den->Sumw2() h_eff->Divide(h_num,h_den,1.0,1.0,”B”) 39