Statistics
We collect a sample of data; what do we do with it?
- Estimate parameters (possibly of some model)
- Test whether a particular theory is consistent with our data (hypothesis testing)
Statistics is a set of tools that allows us to achieve these goals.
Statistics
- Preliminaries: an estimator $\hat{\theta}$ is a function of the data whose value, the estimate, is intended as a meaningful guess of the true value of the parameter $\theta$
  - There is no fundamental rule for doing this
- One thing we desire is consistency:

$$\lim_{n \to \infty} \hat{\theta} = \theta$$

- Another thing we desire is no bias:

$$b = E[\hat{\theta}] - \theta = 0$$
Statistics
- Some common estimators are for the mean and variance. Consider $N$ independent measurements $x_i$ of unknown $\mu$ and unknown $\sigma^2$:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \hat{\sigma}^2 = s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat{\mu})^2$$

- Furthermore,

$$V[\hat{\mu}] = \frac{\sigma^2}{N} \qquad V[\hat{\sigma}^2] = \frac{1}{N}\left(m_4 - \frac{N-3}{N-1}\,\sigma^4\right)$$
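As an illustration (not from the slides; Python with NumPy assumed, and the data are made up), a minimal sketch of these estimators:

  import numpy as np

  def estimators(x):
      """Sample mean and unbiased sample variance of the measurements x."""
      N = len(x)
      mu_hat = x.sum() / N                       # (1/N) * sum of x_i
      s2 = ((x - mu_hat) ** 2).sum() / (N - 1)   # unbiased estimate of sigma^2
      var_mu_hat = s2 / N                        # estimated V[mu_hat] = sigma^2 / N
      return mu_hat, s2, var_mu_hat

  x = np.random.default_rng(0).normal(5.0, 2.0, size=1000)  # hypothetical data
  print(estimators(x))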
χ² Distribution
- A common situation is that you have a set of measurements $x_i$ and you know the true value $x_i^t$ of each
  - How good are our measurements?
- Similarly, you may be comparing a histogram of data with another that contains expectation values under some hypothesis
  - How well do the data agree with this hypothesis?
- Or, if parameters of a function were estimated using the method of least squares, a minimum value of χ² was obtained
  - How good was the fit?
χ² Distribution
- Assuming
  - the measurements are independent of each other, and
  - the measurements come from a Gaussian distribution,
- one can use the "goodness-of-fit" statistic χ² to answer these questions:

$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - x_i^t)^2}{\sigma_i^2}$$

- In the case of Poisson distributed numbers, $\sigma_i^2 = x_i^t$, and this is called Pearson's χ² statistic
χ² Distribution
- The chi-square distribution is

$$f(z; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, z^{n/2-1} e^{-z/2} \qquad (z \ge 0)$$

  where $n = 1, 2, \ldots$ is the number of degrees of freedom, with $E[z] = n$ and $V[z] = 2n$
- The usefulness of this pdf is that for $n$ independent $x_i$ with means $\mu_i$ and variances $\sigma_i^2$,

$$z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}$$

  follows the χ² distribution with $n$ d.o.f.
χ² Distribution
[Figure not transcribed]
χ² Distribution
- The integrals (or cumulative distributions) between arbitrary points for both the Gaussian and χ² distributions cannot be evaluated analytically and must be looked up
  - What is the probability of getting a χ² > 10 with 4 degrees of freedom?
  - This number tells you the probability that random (chance) fluctuations in the data would give a value of χ² > 10
χ² Distribution
- Note the p-value is defined as

$$p = \int_{\chi^2}^{\infty} f(z; n)\, dz$$

- We'll come back to p-values in a moment
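For instance, the question above (χ² > 10 with 4 degrees of freedom) is exactly this integral. A quick numerical check (this sketch assumes SciPy, which the slides do not use):

  from scipy import stats

  p = stats.chi2.sf(10.0, df=4)   # upper-tail integral of the chi-square pdf
  print(p)                        # ~0.040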
χ² Distribution
[Figure: 1 − cumulative χ² distribution, not transcribed]
χ² Distribution
- Often one uses the reduced χ² = χ²/n; since E[χ²] = n, values near 1 indicate a reasonable fit
Hypothesis Testing
- Hypothesis tests provide a rule for accepting or rejecting hypotheses depending on the outcome of a measurement
  - Suppose H predicts $f(x|H)$ for some vector of data $x$
  - We do an experiment and measure $x_{obs}$
  - What can we say about H?
Hypothesis Testing
- Normally we define regions in x-space where the data are compatible with H and regions where they are not
Hypothesis Testing
- Let's say there is just one hypothesis H
- We can define some test statistic t whose value in some way reflects the level of agreement between the data and the hypothesis
- We can quantify the goodness-of-fit by specifying a p-value given an observed $t_{obs}$ in the experiment:

$$p = \int_{t_{obs}}^{\infty} g(t|H)\, dt$$

  - This assumes t is defined such that large values correspond to poor agreement with the hypothesis
  - g is the pdf for t
Hypothesis Testing
- Notes
  - p is not the significance level of the test
  - p is not the confidence level of a confidence interval
  - p is not the probability that H is true
    - That's Bayesian speak
  - p is the probability, under the assumption of H, of obtaining data (x or t(x)) having equal or lesser compatibility with H than $x_{obs}$
Hypothesis Testing
- Flip coins:

$$f(n_h; p_h, N) = \frac{N!}{n_h!\,(N - n_h)!}\, p_h^{n_h} (1 - p_h)^{N - n_h}$$

- Hypothesis H: the coin is fair (random), so $p_h = p_t = 0.5$
- We could take $t = |n_h - N/2|$
- Toss the coin N = 20 times and observe $n_h = 17$
  - The region of t-space with equal or lesser compatibility is $t \ge 7$
  - p-value = $P(n_h = 0, 1, 2, 3, 17, 18, 19, 20) = 0.0026$
- Is H false?
  - Don't know
  - We can say that the probability of observing a result at least this extreme (17 or more heads, or 3 or fewer) assuming H is 0.0026
  - p is the probability of observing this result "by chance"
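A numerical check of this p-value (SciPy assumed; not part of the slides):

  from scipy import stats

  N, ph = 20, 0.5
  # t = |n_h - N/2| >= 7 corresponds to n_h <= 3 or n_h >= 17
  p = stats.binom.cdf(3, N, ph) + stats.binom.sf(16, N, ph)
  print(p)   # ~0.0026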
Kolmogorov-Smirnov (K-S) Test
- The K-S test is an alternative to the χ² test when the data sample is small
- It is also more powerful than the χ² test since it does not rely on bins, though one commonly uses it that way
  - A common use is to quantify how well data and Monte Carlo distributions agree
- It also does not depend on the underlying cumulative distribution function being tested
K-S Test
Data – Monte Carlo comparison [figure not transcribed]
K-S Test
- The K-S test is based on the empirical distribution function (ECDF) $F_n(x)$:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \begin{cases} 1 & \text{if } y_i \le x \\ 0 & \text{otherwise} \end{cases}$$

  for n ordered data points $y_i$
- This is a step function that increases by 1/n at the value of each ordered data point
K-S Test
- The K-S statistic is given by

$$D_n^+ = \max_x\,[F_n(x) - F(x)] \qquad D_n^- = \max_x\,[F(x) - F_n(x)]$$

  where $F(x)$ is the hypothesized distribution
- If D > some critical value obtained from tables, the hypothesis (that the data and theory distributions agree) is rejected
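A minimal unbinned example (SciPy assumed, data made up): scipy.stats.kstest returns the maximum distance between the ECDF and the hypothesized CDF, plus the corresponding p-value.

  import numpy as np
  from scipy import stats

  x = np.random.default_rng(0).normal(size=200)   # hypothetical data sample
  D, p = stats.kstest(x, "norm")                  # D = max distance to the Gaussian CDF
  print(D, p)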
K-S Test
[Figure not transcribed]
Statistics
- Suppose N independent measurements $x_i$ are drawn from a pdf $f(x;\theta)$
- We want to estimate the parameters $\theta$
  - The most important method for doing this is the method of maximum likelihood
  - A related method is that of least squares
Hypothesis Testing
- Example
  - Properties of some selected events
  - Hypothesis H: these are top quark events
- Working in x-space is hard, so usually one constructs a test statistic t instead, whose value reflects the compatibility between the data vector x and H
  - Low t: data more compatible with H
  - High t: data less compatible with H
- Since $f(x|H)$ is known, $g(t|H)$ can be determined
Hypothesis Testing
- Notes
  - p is not the significance level of the test
  - p is not the confidence level of a confidence interval
  - p is not the probability that H is true
    - That's Bayesian speak
  - p is the probability, under the assumption of H, of obtaining data (x or t(x)) having equal or lesser compatibility with H than $x_{obs}$
- Since p is a function of the r.v. x, p itself is an r.v.
  - If H is true, p is uniform in [0,1]
  - If H is not true, p peaks closer to 0
Hypothesis Testing
- Suppose we observe $n_{obs} = n_s + n_b$ events
  - $n_s$, $n_b$ are Poisson r.v.'s with means $\nu_s$, $\nu_b$
  - $n_{obs}$ is then a Poisson r.v. with mean $\nu = \nu_s + \nu_b$:

$$f(n; \nu_s, \nu_b) = \frac{(\nu_s + \nu_b)^n}{n!}\, e^{-(\nu_s + \nu_b)}$$
Hypothesis Testing
- Suppose $\nu_b = 0.5$ and we observe $n_{obs} = 5$
  - Publish / NY Times headline or not?
- Often we take H to be the null hypothesis: assume the result is a random fluctuation of the background
  - Assume $\nu_s = 0$

$$p = P(n \ge n_{obs}) = \sum_{n=n_{obs}}^{\infty} f(n; 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b} = 1.7 \times 10^{-4}$$

- This is the probability of observing 5 or more events resulting from chance fluctuations of the background
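A one-line numerical check of this tail sum (SciPy assumed):

  from scipy import stats

  p = stats.poisson.sf(4, 0.5)   # P(n >= 5) for mean nu_b = 0.5
  print(p)                       # ~1.7e-4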
Hypothesis Testing
- Another problem: instead of counting events, say we measure some variable x
  - Publish / NY Times headline or not?
[Figure not transcribed]
Hypothesis Testing
- Again take H to be the null hypothesis: assume the result is a random fluctuation of the background
  - Assume $\nu_s = 0$

$$p = P(n \ge 11;\ \nu_b = 3.2,\ \nu_s = 0) = 5.4 \times 10^{-4}$$

- Again, p is the probability of observing 11 or more events resulting from chance fluctuations of the background
  - How did we know where to look / how to bin?
  - Is the observed width consistent with the resolution in x?
  - Would a slightly different analysis still show a peak?
  - What about the fact that the bins on either side of the peak are low?
Least Squares
- Another approach is to compare a histogram with a hypothesis that provides expectation values
  - In this case we'd compare a vector of Poisson distributed numbers $n_i$ (the histogram) with their expectation values $\nu_i = E[n_i]$:

$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\sigma_i^2} = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$$

  - This is called Pearson's statistic
  - If the $\nu_i$ are not too small (e.g. $\nu_i > 5$), the observed χ² will follow the chi-square pdf for N dof
    - Or, more generally, for N minus the number of fitted parameters
    - The same holds for N independent measurements $y_i$ that are Gaussian distributed
Least Squares
- We can calculate the p-value as

$$p = \int_{\chi^2}^{\infty} f(z; N)\, dz \qquad \text{where } f \text{ is the } \chi^2 \text{ pdf}$$

- Recall that for the χ² pdf, $E[z] = N$, so often χ²/N is taken as the measure of agreement
- In our example: χ² = 29.8 for 20 dof, giving p = 0.073
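The same check as before applies here (SciPy assumed):

  from scipy import stats

  p = stats.chi2.sf(29.8, df=20)
  print(p)   # ~0.073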
Least Squares
- In our example, though, we have many bins with a small number of counts or 0
- We can still use Pearson's test, but we need to determine the pdf f(χ²) by Monte Carlo (see the sketch below):
  - Generate $n_i$ from a Poisson with mean $\nu_i$ in each bin
  - Compute χ² and record it in a histogram
  - Repeat a large number of times (see next slide)
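A rough sketch of this Monte Carlo procedure (Python/NumPy assumed; the expectation values and observed χ² below are hypothetical stand-ins, not the histogram from the example):

  import numpy as np

  rng = np.random.default_rng(0)
  nu = np.array([0.4, 1.1, 2.7, 4.3, 2.0, 0.6])    # hypothetical expectation values

  def pearson_chi2(n, nu):
      return np.sum((n - nu) ** 2 / nu)

  chi2_obs = 14.2                                  # stand-in for the observed chi^2
  sims = np.array([pearson_chi2(rng.poisson(nu), nu) for _ in range(100_000)])
  p = np.mean(sims >= chi2_obs)                    # p-value from the simulated f(chi^2)
  print(p)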
Least Squares
- Using the modified pdf would give p = 0.11 rather than p = 0.073
  - In either case, we won't publish
K-S Test
- Usage in ROOT (the file names here are placeholders; the original snippet left the files unopened):

  TFile *data = TFile::Open("data.root");
  TFile *MC   = TFile::Open("mc.root");
  TH1F *jet_pt   = (TH1F*) data->Get("h_jet_pt");
  TH1F *MCjet_pt = (TH1F*) MC->Get("h_jet_pt");
  Double_t KS = MCjet_pt->KolmogorovTest(jet_pt);

- Notes
  - The returned value is the probability of the test
    - << 1 means the two histograms are not compatible
  - The returned value is not the maximum K-S distance, though you can obtain this with option "M"
- Also available in the statistical toolbox in MATLAB
Limiting Cases
- Binomial
- Poisson
- Gaussian
Nobel Prize or IgNobel Prize?
- CDF result [figure not transcribed]
Kaplan-Meier Curve
- A patient is treated for a disease. What is the probability of an individual surviving or remaining disease-free?
  - Usually patients will be followed for various lengths of time after treatment
  - Some will survive or remain disease-free while others will not. Some will leave the study.
  - A nonparametric estimate can be obtained using a
    - Kaplan-Meier curve
    - Life table
    - Survival curve
Kaplan-Meier Curve
- Calculate a conditional probability:
  - $S(t_N) = P(t_1) \times P(t_2) \times P(t_3) \times \cdots \times P(t_N)$
- The survival function S(t) is the complement of the empirical distribution function F(t)
- We can write this as

$$S(t) = \prod_{j:\, t_j \le t} p_j \qquad p_j = 1 - \frac{d_j}{n_j}$$

  where $d_j$ is the number dying during period j and $n_j$ is the number that have survived to the beginning of period j
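An illustration of the product-limit formula (Python/NumPy assumed; the six subjects below are made up):

  import numpy as np

  def kaplan_meier(times, event):
      """Product-limit estimate of S(t) at each distinct event time.

      times: follow-up time for each subject
      event: 1 if the event (e.g. death) was observed, 0 if censored
      """
      times = np.asarray(times, dtype=float)
      event = np.asarray(event, dtype=bool)
      S, curve = 1.0, []
      for t in np.unique(times[event]):
          n_j = np.sum(times >= t)             # at risk at start of period j
          d_j = np.sum((times == t) & event)   # deaths during period j
          S *= 1.0 - d_j / n_j                 # multiply in p_j = 1 - d_j/n_j
          curve.append((t, S))
      return curve

  # Six subjects, times in months; 0 marks a subject who left the study
  print(kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]))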
Kaplan-Meier Curve
[Figure not transcribed]
Kaplan-Meier Curve
- The square root of the variance of S(t) can be calculated as

$$\sigma(p_k) = \sqrt{p_k (1 - p_k) / n_k}$$

- Assuming the $p_k$ follow a Gaussian (normal) distribution, the 95% CL interval will be

$$p_k \pm 1.96\, \sigma(p_k)$$
Gaussian Confidence Interval
[Figures not transcribed]
Gaussian Distribution
- Some useful properties of the Gaussian distribution:
  - P(x in range μ ± σ) = 0.683
  - P(x in range μ ± 2σ) = 0.9545
  - P(x in range μ ± 3σ) = 0.9973
  - P(x outside range μ ± 3σ) = 0.0027
  - P(x outside range μ ± 5σ) = 5.7×10⁻⁷
  - P(x in range μ ± 0.6745σ) = 0.5
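These numbers are easy to verify numerically (SciPy assumed):

  from scipy import stats

  for k in (0.6745, 1, 2, 3):
      print(k, 1 - 2 * stats.norm.sf(k))   # P(x within mu +/- k*sigma)
  print(2 * stats.norm.sf(5))              # P(x outside mu +/- 5*sigma), ~5.7e-7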
Gaussian Distribution
[Figure not transcribed]
Confidence Intervals
- Suppose you have a bag of black and white marbles and wish to determine the fraction f that are white. How confident are you of the initial composition? How does your confidence change after extracting n black balls?
- Suppose you are tested for a disease. The test is 100% accurate if you have the disease. The test gives a 0.2% false positive rate if you do not. The test comes back positive. What is the probability that you have the disease? (See the sketch below.)
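The second question cannot be answered without a prior. As an illustration only (the slides give no prevalence; the 1-in-1000 figure below is an assumption), Bayes' theorem gives:

  # P(disease | +) = P(+|D) P(D) / [ P(+|D) P(D) + P(+|no D) P(no D) ]
  prior = 0.001              # assumed prevalence: 1 in 1000
  p_pos_given_d = 1.0        # test is 100% accurate if you have the disease
  p_pos_given_nod = 0.002    # 0.2% false positive rate
  p = p_pos_given_d * prior / (p_pos_given_d * prior + p_pos_given_nod * (1 - prior))
  print(p)   # ~0.33, far from the "99.8%" one might naively guess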
Confidence Intervals
- Suppose you are searching for the Higgs and have a well-known expected background of 3 events. What 90% confidence limit can you set on the Higgs cross section
  - if you observe 0 events?
  - if you observe 3 events?
  - if you observe 10 events?
- The ability to set confidence limits (or claim discovery) is an important part of frontier physics
- How to do this the "correct" way is somewhat/very controversial
Confidence Intervals
- Questions
  - What is the mass of the top quark?
  - What is the mass of the tau neutrino?
  - What is the mass of the Higgs?
- Answers
  - $M_t = 172.5 \pm 2.3$ GeV
  - $M_\nu < 18.2$ MeV
  - $M_H > 114.3$ GeV
- More correct answers
  - $M_t = 172.5 \pm 2.3$ GeV with CL = 0.683
  - $0 < M_\nu < 18.2$ MeV with CL = 0.95
  - $\infty > M_H > 114.3$ GeV with CL = 0.95
Confidence Interval
- A confidence interval reflects the statistical precision of the experiment and quantifies the reliability of a measurement
- For a sufficiently large data sample, the mean and standard deviation of the mean provide a good interval
  - What if the pdf isn't Gaussian?
  - What if there are physical boundaries?
  - What if the data sample is small?
- Here we run into problems
Confidence Interval
- A dog has a 50% probability of being within 100 m of its master
  - You observe the dog; what can you say about its master?
    - With 50% probability, the master is within 100 m of the dog
  - But this assumes
    - the master can be anywhere around the dog
    - the dog has no preferred direction of travel
Confidence Intervals
- Neyman's construction
  - Consider a pdf $f(x;\theta) = P(x|\theta)$
  - For each value of θ, we construct a horizontal line segment $[x_1, x_2]$ such that $P(x \in [x_1, x_2]\,|\,\theta) = 1 - \alpha$
  - The union of such intervals for all values of θ is called the confidence belt
Confidence Intervals
- Neyman's construction
  - After performing an experiment to measure x, a vertical line is drawn through the experimentally measured value $x_0$
  - The confidence interval for θ is the set of all values of θ for which the corresponding line segment $[x_1, x_2]$ is intercepted by the vertical line
Confidence Intervals
[Figure not transcribed]
Confidence Interval
- Notes
  - The coverage condition is not unique
    - $P(x < x_1|\theta) = P(x > x_2|\theta) = \alpha/2$: called central confidence intervals
    - $P(x < x_1|\theta) = \alpha$: called upper confidence limits
    - $P(x > x_2|\theta) = \alpha$: called lower confidence limits
Poisson Confidence Interval
- We previously mentioned that the number of events produced in a reaction with cross section σ and fixed luminosity L follows a Poisson distribution with mean $\nu = \sigma \int L\, dt$
  - $P(n; \nu) = e^{-\nu}\, \nu^n / n!$
  - If the variables are discrete, by convention one constructs the confidence belt by requiring $P(x_1 < x < x_2|\theta) \ge 1 - \alpha$
- Example: measuring the Higgs production cross section assuming no background
Poisson Confidence Interval
[Figure not transcribed]
Poisson Confidence Interval
[Figure: central intervals for the Poisson distribution, $P(n|\mu) = e^{-\mu}\mu^n/n!$; the parameter μ (vertical axis, 0 to 20) vs. the count n (horizontal axis, 0 to 20), with the upper-limit curve u(n) and lower-limit curve l(n) forming the belt]
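A sketch of how the belt curves l(n) and u(n) can be computed by inverting the Poisson tail probabilities (SciPy assumed; illustrative only):

  from scipy import stats, optimize

  def poisson_central_interval(n, alpha=0.10):
      """Central 1-alpha interval [l(n), u(n)] for a Poisson mean, given count n."""
      u = optimize.brentq(lambda mu: stats.poisson.cdf(n, mu) - alpha / 2, 1e-9, 1e3)
      l = 0.0
      if n > 0:   # for n = 0 the lower limit is 0
          l = optimize.brentq(lambda mu: stats.poisson.sf(n - 1, mu) - alpha / 2, 1e-9, 1e3)
      return l, u

  print(poisson_central_interval(0))   # upper limit ~3.0, as on a later slide
  print(poisson_central_interval(5))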
Poisson Confidence Interval
[Figure not transcribed]
Poisson Confidence Interval
- Assume signal s and background b:

$$\alpha = 0.05 = P(n \le n_{obs};\, s, b) = \sum_{n=0}^{n_{obs}} \frac{(s+b)^n}{n!}\, e^{-(s+b)}$$

- Solve numerically for $s = s_{up}$; this gives an upper limit on s at CL = 1 − α
- In the special case that b = 0 and $n_{obs} = 0$:

$$\alpha = 0.05 = e^{-s_{up}} \;\Rightarrow\; s_{up} = 3$$
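A numerical version of this sketch (SciPy assumed); for b = 0 and n_obs = 0 it reproduces s_up = 3:

  from scipy import stats, optimize

  def upper_limit(n_obs, b, alpha=0.05):
      """Solve P(n <= n_obs; s + b) = alpha numerically for s = s_up."""
      return optimize.brentq(
          lambda s: stats.poisson.cdf(n_obs, s + b) - alpha, 0.0, 100.0)

  print(upper_limit(0, 0.0))   # ~3.0, the b = 0, n_obs = 0 special case
  print(upper_limit(5, 3.0))   # with an expected background of 3 events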
Poisson Confidence Interval
[Figure not transcribed]
Confidence Intervals
- Sometimes, though, confidence intervals
  - are empty
  - reduce in size when the background estimate increases
  - are smaller for a poorer experiment
  - exclude parameters for which the experiment is insensitive
- Example
  - We know that $P(x = 0|\nu = 2.3) = 0.1$, so ν < 2.3 @ 90% CL
  - If the number of background events b is 3, then since ν = s + b, the number of signal events satisfies s < −0.7: an empty (unphysical) interval
Confidence Intervals
[Figures not transcribed]
Confidence Interval
- Experiment X uses a fit to extract the neutrino mass
  - $M_\nu = -4 \pm 2$ eV
  - ⇒ $P(M_\nu < 0\ \text{eV}) = 0.98$?
Confidence Interval
- What is probability?
  - Frequentist approach
    - Developed by Venn, Fisher, Neyman, von Mises
    - The relative frequency with which something happens: number of successes / number of trials
    - Venn limit (number of trials n → ∞)
    - Assumes success appeared in the past and will occur in the future with the same probability
  - "It will rain tomorrow in Tucson" with P(S) = 0.01
    - The relative frequency with which it rains on Mondays in April is 0.01
Confidence Interval
- What is probability?
  - Bayesian approach
    - Developed by Bayes, Laplace, Gauss, Jeffreys, de Finetti
    - The degree of belief or confidence in a statement or measurement
    - Closer to what is used in everyday life
      - Is the Standard Model correct?
    - Similar to betting odds
    - Not "scientific"?
  - "It will rain tomorrow in Tucson" with P(S) = 0.01
    - The plausibility of the above statement is 0.01 (i.e. the same as if I were to draw a white ball out of a container of 100 balls, 1 of which is white)
Confidence Interval
- Usually
  - Confidence interval == frequentist confidence interval
  - Credible interval == Bayesian posterior probability interval
    - But you'll also hear "Bayesian confidence interval"
- Probability
  - P = 1 − α
    - α = 0.05 ⇒ P = 95%
Confidence Interval
- Suppose you wish to determine a parameter θ whose true value $\theta_t$ is unknown
- Assume we make a single measurement of an observable x whose pdf $P(x|\theta)$ depends on θ
  - Recall this is the probability of obtaining x given θ
- Say we measure $x_0$; then we obtain $P(x_0|\theta)$
- Frequentist
  - Makes statements about $P(x|\theta)$
- Bayesian
  - Makes statements about $P(\theta_t|x_0)$
  - $P(\theta_t|x_0) = P(x_0|\theta_t)\, P(\theta_t)\, /\, P(x_0)$
- We'll stick with the frequentist approach for the moment
Confidence Interval
- (Frequentist) confidence intervals are constructed to include the true value of the parameter ($\theta_t$) with a probability of 1 − α
  - In fact this is true for any value of θ
- A confidence interval $[\theta_1, \theta_2]$ is a member of a set, such that the set has the property that $P(\theta \in [\theta_1, \theta_2]) = 1 - \alpha$
  - Perform an ensemble of experiments with fixed θ
  - The interval $[\theta_1, \theta_2]$ will vary and will cover the fixed value θ in a fraction 1 − α of the experiments
- Presumably when we make a measurement we are selecting it at random from the ensemble that contains the true value of θ, $\theta_t$
- Note we haven't said anything about the probability of $\theta_t$ being in the interval $[\theta_1, \theta_2]$, as a Bayesian would
Confidence Interval
- If $P(\theta \in [\theta_1, \theta_2]) = 1 - \alpha$ is true, we say the intervals "cover" θ at the stated confidence
- If there are values of θ for which $P(\theta \in [\theta_1, \theta_2]) < 1 - \alpha$, we say the intervals "undercover" for that θ
- If there are values of θ for which $P(\theta \in [\theta_1, \theta_2]) > 1 - \alpha$, we say the intervals "overcover" for that θ
- Undercoverage is bad
Confidence Intervals
- Neyman's construction
  - Consider a pdf $f(x;\theta) = P(x|\theta)$
  - For each value of θ, we construct a horizontal line segment $[x_1, x_2]$ such that $P(x \in [x_1, x_2]\,|\,\theta) = 1 - \alpha$
  - The union of such intervals for all values of θ is called the confidence belt
Confidence Intervals
- Neyman's construction
  - After performing an experiment to measure x, a vertical line is drawn through the experimentally measured value $x_0$
  - The confidence interval for θ is the set of all values of θ for which the corresponding line segment $[x_1, x_2]$ is intercepted by the vertical line
Confidence Intervals
[Figure not transcribed]
Confidence Interval
- Notes
  - The coverage condition is not unique
    - $P(x < x_1|\theta) = P(x > x_2|\theta) = \alpha/2$: called central confidence intervals
    - $P(x < x_1|\theta) = \alpha$: called upper confidence limits
    - $P(x > x_2|\theta) = \alpha$: called lower confidence limits
Confidence Intervals
- These confidence intervals have a confidence level of 1 − α
- By construction, $P(\theta \in [\theta_1, \theta_2]) \ge 1 - \alpha$ is satisfied for all θ, including $\theta_t$
- Another method is to consider a test of the hypothesis that the parameter's true value is θ
- If the variables are discrete, by convention one constructs the confidence belt by requiring $P(x_1 < x < x_2|\theta) \ge 1 - \alpha$
Examples
- Data consisting of a single random variable x that follows a Gaussian distribution
- Counting experiments
Poisson Confidence Interval
- We previously mentioned that the number of events produced in a reaction with cross section σ and fixed luminosity L follows a Poisson distribution with mean $\nu = \sigma \int L\, dt$
  - $P(n; \nu) = e^{-\nu}\, \nu^n / n!$
  - If the variables are discrete, by convention one constructs the confidence belt by requiring $P(x_1 < x < x_2|\theta) \ge 1 - \alpha$
- Example: measuring the Higgs production cross section assuming no background
Poisson Confidence Interval
[Figure not transcribed]
Poisson Confidence Interval
[Figure: central intervals for the Poisson distribution, $P(n|\mu) = e^{-\mu}\mu^n/n!$; the parameter μ (vertical axis, 0 to 20) vs. the count n (horizontal axis, 0 to 20), with the upper-limit curve u(n) and lower-limit curve l(n) forming the belt]
Poisson Confidence Interval
[Figures not transcribed]
Confidence Intervals
- Sometimes, though, confidence intervals
  - are empty
  - reduce in size when the background estimate increases
  - are smaller for a poorer experiment
  - exclude parameters for which the experiment is insensitive
- Example
  - We know that $P(x = 0|\nu = 2.3) = 0.1$, so ν < 2.3 @ 90% CL
  - If the number of background events b is 3, then since ν = s + b, the number of signal events satisfies s < −0.7: an empty (unphysical) interval
Confidence Intervals
[Figures not transcribed]