Statistics
- We collect a sample of data; what do we do with it?
- Estimate parameters (possibly of some model)
- Test whether a particular theory is consistent with our data (hypothesis testing)
- Statistics is the set of tools that allows us to achieve these goals

Statistics Preliminaries
- An estimator \hat{\theta} is a function of the data whose value, the estimate, is intended as a meaningful guess of the true value of the parameter
- There is no fundamental rule for constructing one
- One thing we desire is consistency: \lim_{N \to \infty} \hat{\theta} = \theta
- Another thing we desire is no bias: b = E[\hat{\theta}] - \theta = 0

Statistics
- Some common estimators are those for the mean and variance
- Consider N independent measurements x_i from a distribution with unknown mean \mu and unknown variance \sigma^2:
      \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i
      \hat{\sigma}^2 = s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2
- Furthermore
      V[\hat{\mu}] = \frac{\sigma^2}{N}, \qquad V[\hat{\sigma}^2] = \frac{1}{N}\left( m_4 - \frac{N-3}{N-1}\,\sigma^4 \right)
  where m_4 is the fourth central moment

χ² Distribution
- A common situation is that you have a set of measurements x_i and you know the true value x_i^t of each. How good are our measurements?
- Similarly, you may be comparing a histogram of data with another histogram that contains the expectation values under some hypothesis. How well do the data agree with this hypothesis?
- Or, if the parameters of a function were estimated using the method of least squares, a minimum value of χ² was obtained. How good was the fit?

χ² Distribution
- Assuming that
  - the measurements are independent of each other, and
  - the measurements come from a Gaussian distribution,
  one can use the "goodness-of-fit" statistic χ² to answer these questions:
      \chi^2 = \sum_{i=1}^{n} \frac{(x_i - x_i^t)^2}{\sigma_i^2}
- In the case of Poisson distributed numbers, \sigma_i^2 = x_i^t, and this is called Pearson's χ² statistic

χ² Distribution
- The chi-square distribution is
      f(z; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, z^{n/2-1} e^{-z/2}, \qquad z \ge 0
  where n = 1, 2, ... is the number of degrees of freedom, E[z] = n and V[z] = 2n
- The usefulness of this pdf is that for n independent Gaussian x_i with means \mu_i and variances \sigma_i^2,
      z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}
  follows the χ² distribution with n degrees of freedom

χ² Distribution
[Figure: the χ² distribution]

χ² Distribution
- The integrals (cumulative distributions) between arbitrary points of both the Gaussian and χ² distributions cannot be evaluated analytically and must be looked up
- What is the probability of getting χ² > 10 with 4 degrees of freedom? (A small numerical sketch is given at the end of this section)
- This number tells you the probability that random (chance) fluctuations in the data would give a value of χ² > 10

χ² Distribution
- Note that the p-value is defined as
      p = \int_{\chi^2}^{\infty} f(z; n)\, dz
- We'll come back to p-values in a moment

χ² Distribution
[Figure: 1 − cumulative χ² distribution]

χ² Distribution
- Often one uses the reduced χ² = χ²/n
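As a quick numerical aside (not part of the original slides), the sketch below evaluates the upper-tail χ² probability asked about above. It uses the closed form valid for an even number of degrees of freedom; the function name chi2_pvalue_even_dof is our own. In ROOT the same number is returned by TMath::Prob(chi2, ndf).

    // Sketch: upper-tail chi-square probability for an even number of
    // degrees of freedom, using the closed form
    //   P(chi2_n > z) = exp(-z/2) * sum_{k=0}^{n/2-1} (z/2)^k / k!   (n even).
    #include <cmath>
    #include <cstdio>

    double chi2_pvalue_even_dof(double z, int n) {
        double half = 0.5 * z, term = 1.0, sum = 1.0;
        for (int k = 1; k < n / 2; ++k) {
            term *= half / k;          // accumulate (z/2)^k / k!
            sum  += term;
        }
        return std::exp(-half) * sum;
    }

    int main() {
        // Probability of chi^2 > 10 with 4 degrees of freedom
        std::printf("p = %.3f\n", chi2_pvalue_even_dof(10.0, 4));
        return 0;
    }

For χ² = 10 with 4 degrees of freedom this gives p ≈ 0.040, i.e. about a 4% chance that fluctuations alone would produce such a large χ².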
Hypothesis Testing
- Hypothesis tests provide a rule for accepting or rejecting hypotheses depending on the outcome of a measurement
- Suppose H predicts f(x|H) for some vector of data x
- We do an experiment and measure x_obs. What can we say about H?

Hypothesis Testing
- Normally we define regions in x-space that specify where the data are compatible with H and where they are not

Hypothesis Testing
- Let's say there is just one hypothesis H
- We can define some test statistic t whose value reflects the level of agreement between the data and the hypothesis
- We can quantify the goodness-of-fit by specifying a p-value, given the t_obs observed in the experiment:
      p = \int_{t_{obs}}^{\infty} g(t|H)\, dt
- This assumes t is defined such that large values correspond to poor agreement with the hypothesis
- g(t|H) is the pdf of t

Hypothesis Testing
- Notes:
  - p is not the significance level of the test
  - p is not the confidence level of a confidence interval
  - p is not the probability that H is true (that's Bayesian speak)
  - p is the probability, under the assumption of H, of obtaining data (x, or t(x)) having equal or lesser compatibility with H than x_obs

Hypothesis Testing
- Flip coins:
      f(n_h; p_h, N) = \frac{N!}{n_h!\,(N-n_h)!}\, p_h^{n_h} (1-p_h)^{N-n_h}
- Hypothesis H: the coin is fair, so p_h = p_t = 0.5
- We could take t = |n_h − N/2|
- Toss the coin N = 20 times and observe n_h = 17
- The region of t-space at least as incompatible with H is t ≥ 7, so
      p-value = P(n_h = 0, 1, 2, 3, 17, 18, 19, 20) = 0.0026
- Is H false? We don't know. We can only say that the probability of a result at least as extreme as 17 heads, assuming H, is 0.0026
- p is the probability of observing this result "by chance" (see the sketch at the end of this section)

Kolmogorov-Smirnov (K-S) Test
- The K-S test is an alternative to the χ² test when the data sample is small
- It is also more powerful than the χ² test since it does not rely on bins – though one commonly uses it that way
- A common use is to quantify how well data and Monte Carlo distributions agree
- It also does not depend on the underlying cumulative distribution function being tested

K-S Test
[Figure: data – Monte Carlo comparison]

K-S Test
- The K-S test is based on the empirical (cumulative) distribution function F_n(x):
      F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \begin{cases} 1 & y_i \le x \\ 0 & \text{otherwise} \end{cases}
  for n ordered data points y_i
- This is a step function that increases by 1/n at the value of each ordered data point

K-S Test
- The K-S statistic is
      D_n^+ = \max_x \left[ F_n(x) - F(x) \right], \qquad D_n^- = \max_x \left[ F(x) - F_n(x) \right]
  where F(x) is the hypothesized distribution
- If D exceeds a critical value obtained from tables, the hypothesis (that the data and theory distributions agree) is rejected

K-S Test
[Figure: K-S test]

Statistics
- Suppose N independent measurements x_i are drawn from a pdf f(x; θ)
- We want to estimate the parameters θ
- The most important method for doing this is the method of maximum likelihood
- A related method is the method of least squares

Hypothesis Testing
- Example: properties of some selected events; hypothesis H is that these are top quark events
- Working in x-space is hard, so usually one constructs a test statistic t instead, whose value reflects the compatibility between the data vector x and H
  - Low t – data more compatible with H
  - High t – data less compatible with H
- Since f(x|H) is known, g(t|H) can be determined

Hypothesis Testing
- Notes (again): p is not the significance level of the test, not the confidence level of a confidence interval, and not the probability that H is true (that's Bayesian speak)
- p is the probability, under the assumption of H, of obtaining data (x, or t(x)) having equal or lesser compatibility with H than x_obs
- Since p is a function of the random variable x, p itself is a random variable
  - If H is true, p is uniform in [0,1]
  - If H is not true, p is peaked closer to 0

Hypothesis Testing
- Suppose we observe n_obs = n_s + n_b events
- n_s and n_b are Poisson random variables with means ν_s and ν_b
- n_obs = n_s + n_b is then a Poisson random variable with mean ν = ν_s + ν_b:
      f(n; \nu_s, \nu_b) = \frac{(\nu_s + \nu_b)^n}{n!}\, e^{-(\nu_s + \nu_b)}
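To make the coin-toss number above concrete, here is a minimal sketch (ours, not from the lecture) that sums the binomial probabilities of every outcome at least as extreme as the observed one; it reproduces the p-value of 0.0026 quoted above.

    // Sketch: two-sided p-value for the coin example
    // (N = 20 tosses, 17 heads observed, test statistic t = |n_h - N/2|).
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    double binom_pmf(int n, int k, double p) {
        // exact binomial probability via lgamma to avoid overflow
        double logc = std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
        return std::exp(logc + k * std::log(p) + (n - k) * std::log(1.0 - p));
    }

    int main() {
        const int N = 20, nh_obs = 17;
        const double ph = 0.5;                       // fair-coin hypothesis
        const int t_obs = std::abs(nh_obs - N / 2);  // observed |n_h - N/2| = 7
        double p = 0.0;
        for (int nh = 0; nh <= N; ++nh)
            if (std::abs(nh - N / 2) >= t_obs) p += binom_pmf(N, nh, ph);
        std::printf("p-value = %.4f\n", p);          // ~0.0026
        return 0;
    }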
Hypothesis Testing
- Suppose ν_b = 0.5 and we observe n_obs = 5. Publish / NY Times headline or not?
- Often we take H to be the null hypothesis – assume the excess is a random fluctuation of the background
- Assume ν_s = 0:
      p = P(n \ge n_{obs}) = \sum_{n=n_{obs}}^{\infty} f(n; 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b} = 1.7 \times 10^{-4}
- This is the probability of observing 5 or more events resulting from chance fluctuations of the background

Hypothesis Testing
- Another problem: instead of counting events, say we measure some variable x. Publish / NY Times headline or not?

Hypothesis Testing
- Again take H to be the null hypothesis – assume the peak is a random fluctuation of the background
- Assume ν_s = 0:
      p = P(n \ge 11;\ \nu_b = 3.2,\ \nu_s = 0) = 5.4 \times 10^{-4}
- Again, p is the probability of observing 11 or more events resulting from chance fluctuations of the background
- But: how did we know where to look and how to bin? Is the observed width consistent with the resolution in x? Would a slightly different analysis still show a peak? What about the fact that the bins on either side of the peak are low?

Least Squares
- Another approach is to compare a histogram with a hypothesis that provides expectation values
- In this case we compare a vector of Poisson distributed numbers (the histogram bin contents n_i) with their expectation values ν_i = E[n_i]:
      \chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}
- This is called Pearson's χ² statistic
- If the ν_i are not too small (e.g. ν_i > 5), the observed χ² will follow the chi-square pdf for N degrees of freedom – or, more generally, N minus the number of fitted parameters
- The same holds true for N independent measurements y_i that are Gaussian distributed

Least Squares
- We can calculate the p-value as
      p = \int_{\chi^2}^{\infty} f(z; N)\, dz
  where f is the χ² pdf
- Recall that for the χ² pdf E[z] = N, so χ²/N is often taken as the measure of agreement
- In our example χ² = 29.8 for 20 degrees of freedom, giving p = 0.073

Least Squares
- In our example, though, we have many bins with a small number of counts, or zero
- We can still use Pearson's statistic, but we need to determine the pdf f(χ²) by Monte Carlo (see the sketch at the end of this section):
  - Generate n_i from a Poisson distribution with mean ν_i in each bin
  - Compute χ² and record it in a histogram
  - Repeat a large number of times (see next slide)

Least Squares
- Using the Monte Carlo pdf would give p = 0.11 rather than p = 0.073
- In either case, we won't publish

K-S Test
- Usage in ROOT (file and histogram names here are placeholders):
      TFile *data = TFile::Open("data.root");
      TFile *MC   = TFile::Open("MC.root");
      TH1F *jet_pt   = (TH1F*)data->Get("h_jet_pt");
      TH1F *MCjet_pt = (TH1F*)MC->Get("h_jet_pt");
      Double_t KS = MCjet_pt->KolmogorovTest(jet_pt);
- Notes:
  - The returned value is the probability of the test; a value << 1 means the two histograms are not compatible
  - The returned value is not the maximum K-S distance, though you can obtain that with option "M"
  - A K-S test is also available in the MATLAB statistics toolbox

Limiting Cases
- Binomial → Poisson → Gaussian

Nobel Prize or IgNobel Prize?
[Figure: CDF result]
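The toy-Monte-Carlo procedure described in the Least Squares slides above can be sketched as follows. This is our own illustration: the per-bin expectation values nu and the observed χ² below are invented placeholders, not the histogram from the lecture.

    // Sketch of the toy-MC procedure: when many bins have few (or zero)
    // expected counts, build the distribution of Pearson's chi^2 yourself
    // instead of assuming the asymptotic chi-square pdf.
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::vector<double> nu = {0.5, 1.2, 3.0, 0.8, 2.5};   // hypothetical nu_i per bin
        const double chi2_obs = 7.0;                          // hypothetical observed chi^2
        const int ntoys = 100000;

        std::mt19937 rng(12345);
        int n_above = 0;
        for (int t = 0; t < ntoys; ++t) {
            double chi2 = 0.0;
            for (double nui : nu) {
                std::poisson_distribution<int> pois(nui);     // fluctuate each bin
                int n = pois(rng);
                chi2 += (n - nui) * (n - nui) / nui;          // Pearson's statistic
            }
            if (chi2 >= chi2_obs) ++n_above;                  // tail count -> p-value
        }
        std::printf("toy-MC p-value = %.4f\n", double(n_above) / ntoys);
        return 0;
    }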
Kaplan-Meier Curve
- A patient is treated for a disease. What is the probability of an individual surviving or remaining disease-free?
- Usually patients will be followed for various lengths of time after treatment
- Some will survive or remain disease-free while others will not; some will leave the study
- A nonparametric estimate can be obtained with the Kaplan-Meier curve
  - Life table
  - Survival curve

Kaplan-Meier Curve
- Calculate a conditional probability:
      S(t_N) = P(t_1) \times P(t_2) \times P(t_3) \times \dots \times P(t_N)
- The survival function S(t) plays the same role as the empirical distribution function F(t)
- We can write this as
      S(t) = \prod_{j:\ t_j \le t} p_j, \qquad p_j = 1 - \frac{d_j}{n_j}
  where d_j is the number dying during period j and n_j is the number that have survived to the beginning of period j
  (a short numerical sketch is given at the end of this section)

Kaplan-Meier Curve
[Figure: Kaplan-Meier curve]

Kaplan-Meier Curve
- The square root of the variance of the p_k can be estimated as
      \sigma(p_k) = \sqrt{\frac{p_k (1 - p_k)}{n_k}}
- Assuming the p_k follow a Gaussian (normal) distribution, the 95% confidence interval is p_k ± 1.96 σ(p_k)

Gaussian Confidence Interval
[Figure]

Gaussian Distribution
- Some useful properties of the Gaussian distribution:
  - P(x in range μ ± σ) = 0.683
  - P(x in range μ ± 2σ) = 0.9545
  - P(x in range μ ± 3σ) = 0.9973
  - P(x outside range μ ± 3σ) = 0.0027
  - P(x outside range μ ± 5σ) = 5.7 × 10⁻⁷
  - P(x in range μ ± 0.6745σ) = 0.5

Gaussian Distribution
[Figure]

Confidence Intervals
- Suppose you have a bag of black and white marbles and wish to determine the fraction f that are white
  - How confident are you of the initial composition?
  - How does your confidence change after extracting n black marbles?
- Suppose you are tested for a disease. The test is 100% accurate if you have the disease and gives 0.2% false positives if you do not. The test comes back positive. What is the probability that you have the disease?

Confidence Intervals
- Suppose you are searching for the Higgs and have a well-known expected background of 3 events. What 90% confidence limit can you set on the Higgs cross section
  - if you observe 0 events?
  - if you observe 3 events?
  - if you observe 10 events?
- The ability to set confidence limits (or claim discovery) is an important part of frontier physics
- How to do this the "correct" way is somewhat/very controversial

Confidence Intervals
- Questions:
  - What is the mass of the top quark?
  - What is the mass of the tau neutrino?
  - What is the mass of the Higgs?
- Answers:
  - M_t = 172.5 ± 2.3 GeV
  - M_ν < 18.2 MeV
  - M_H > 114.3 GeV
- More correct answers:
  - M_t = 172.5 ± 2.3 GeV with CL = 0.683
  - 0 < M_ν < 18.2 MeV with CL = 0.95
  - ∞ > M_H > 114.3 GeV with CL = 0.95

Confidence Interval
- A confidence interval reflects the statistical precision of the experiment and quantifies the reliability of a measurement
- For a sufficiently large data sample, the mean and the standard deviation of the mean provide a good interval
- But what if the pdf isn't Gaussian? What if there are physical boundaries? What if the data sample is small?
- Here we run into problems
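A minimal sketch (ours, not from the lecture) of the Kaplan-Meier product-limit calculation described above. The at-risk and death counts are invented for illustration; patients who leave the study (censored) simply drop out of the later at-risk counts without contributing deaths.

    // Sketch: Kaplan-Meier product-limit estimate
    //   S(t_j) = prod_{k <= j} (1 - d_k / n_k)
    // with d_k deaths in period k and n_k at risk at its start.
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> n_at_risk = {20, 17, 15, 12, 9};  // invented at-risk counts
        std::vector<int> deaths    = { 2,  1,  2,  1, 1};  // invented deaths per period

        double S = 1.0;
        for (std::size_t k = 0; k < deaths.size(); ++k) {
            double p_k = 1.0 - double(deaths[k]) / n_at_risk[k];  // survive period k
            S *= p_k;                                             // product-limit update
            std::printf("period %zu: p_k = %.3f, S = %.3f\n", k + 1, p_k, S);
        }
        return 0;
    }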
Confidence Interval
- A dog has a 50% probability of being within 100 m of its master
- You observe the dog; what can you say about its master?
- With 50% probability, the master is within 100 m of the dog
- But this assumes that
  - the master can be anywhere around the dog, and
  - the dog has no preferred direction of travel

Confidence Intervals
- Neyman's construction:
  - Consider a pdf f(x; θ) = P(x|θ)
  - For each value of θ, construct a horizontal line segment [x₁, x₂] such that P(x ∈ [x₁, x₂] | θ) = 1 − α
  - The union of such intervals for all values of θ is called the confidence belt

Confidence Intervals
- Neyman's construction:
  - After performing an experiment to measure x, a vertical line is drawn through the experimentally measured value x₀
  - The confidence interval for θ is the set of all values of θ for which the corresponding line segment [x₁, x₂] is intercepted by the vertical line

Confidence Intervals
[Figure: Neyman confidence belt]

Confidence Interval
- Note that the coverage condition is not unique:
  - P(x < x₁|θ) = P(x > x₂|θ) = α/2: central confidence intervals
  - P(x < x₁|θ) = α: upper confidence limits
  - P(x > x₂|θ) = α: lower confidence limits

Poisson Confidence Interval
- We previously mentioned that the number of events produced in a reaction with cross section σ and integrated luminosity ∫L dt follows a Poisson distribution with mean ν = σ∫L dt:
      P(n; \nu) = \frac{e^{-\nu}\, \nu^n}{n!}
- If the variable is discrete, by convention one constructs the confidence belt by requiring P(x₁ ≤ x ≤ x₂ | θ) ≥ 1 − α
- Example: measuring the Higgs production cross section assuming no background

Poisson Confidence Interval
[Figure: central intervals for a Poisson parameter – upper limits θ_u(n) and lower limits θ_l(n) versus the observed count n, for P(n|θ) = e^{-θ} θ^n / n!]

Poisson Confidence Interval
- Assume signal s and background b, with n_obs observed events and α = 0.05:
      \alpha = P(n \le n_{obs};\ s, b) = \sum_{n=0}^{n_{obs}} \frac{(s+b)^n}{n!}\, e^{-(s+b)}
- Solve numerically for s = s_up; this gives an upper limit on s at CL = 1 − α (see the sketch at the end of this section)
- In the special case that b = 0 and n_obs = 0:
      \alpha = 0.05 = e^{-s_{up}} \;\Rightarrow\; s_{up} = 3

Poisson Confidence Interval
[Figure]

Confidence Intervals
- Sometimes, though, confidence intervals
  - are empty
  - shrink when the background estimate increases
  - are smaller for a poorer experiment
  - exclude parameters for which the experiment is insensitive
- Example: we know that P(x = 0 | ν = 2.3) = 0.1, so observing 0 events gives ν < 2.3 at 90% CL
- If the number of background events is b = 3, then since ν = s + b, the number of signal events satisfies s < −0.7 at 90% CL, an unphysical (empty) interval

Confidence Intervals
[Figure]

Confidence Interval
- Experiment X uses a fit to extract the neutrino mass and finds M_ν = −4 ± 2 eV
- Does that mean P(M_ν < 0 eV) = 0.98?

Confidence Interval
- What is probability? The frequentist approach:
  - Developed by Venn, Fisher, Neyman, von Mises
  - The relative frequency with which something happens: number of successes / number of trials
  - Venn limit (number of trials → ∞)
  - Assumes success appeared in the past and will occur in the future with the same probability
  - Statement S: "It will rain tomorrow in Tucson", with P(S) = 0.01: the relative frequency with which it rains on Mondays in April is 0.01
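A minimal sketch (ours) of the Poisson upper-limit construction above: scan the signal strength until the probability of observing n_obs or fewer events drops to α. With b = 0 and n_obs = 0 it reproduces the s_up ≈ 3 quoted above; the scan step size is arbitrary.

    // Sketch: find s_up such that P(n <= n_obs ; s_up + b) = alpha.
    #include <cmath>
    #include <cstdio>

    double poisson_cdf(int n_obs, double mu) {
        // P(n <= n_obs) for a Poisson with mean mu
        double term = std::exp(-mu), sum = term;
        for (int n = 1; n <= n_obs; ++n) { term *= mu / n; sum += term; }
        return sum;
    }

    int main() {
        const double alpha = 0.05, b = 0.0;
        const int n_obs = 0;
        double s = 0.0;
        while (poisson_cdf(n_obs, s + b) > alpha) s += 0.001;  // scan upward
        std::printf("s_up = %.2f at 95%% CL\n", s);            // ~3.0 for b=0, n_obs=0
        return 0;
    }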
Confidence Interval
- What is probability? The Bayesian approach:
  - Developed by Bayes, Laplace, Gauss, Jeffreys, de Finetti
  - The degree of belief or confidence in a statement or measurement
  - Closer to what is used in everyday life ("Is the Standard Model correct?"); similar to betting odds
  - Not "scientific"?
  - Statement S: "It will rain tomorrow in Tucson", with P(S) = 0.01: the plausibility of the statement is 0.01 (i.e. the same as if I were to draw a white ball out of a container of 100 balls, 1 of which is white)

Confidence Interval
- Usually
  - confidence interval == frequentist confidence interval
  - credible interval == Bayesian posterior probability interval
- But you'll also hear "Bayesian confidence interval"
- Probability P = 1 − α; α = 0.05 => P = 95%

Confidence Interval
- Suppose you wish to determine a parameter θ whose true value θ_t is unknown
- Assume we make a single measurement of an observable x whose pdf P(x|θ) depends on θ
  - Recall this is the probability of obtaining x given θ
- Say we measure x₀; then we obtain P(x₀|θ)
- Frequentist: makes statements about P(x|θ)
- Bayesian: makes statements about P(θ_t|x₀) = P(x₀|θ_t) P(θ_t) / P(x₀)
- We'll stick with the frequentist approach for the moment

Confidence Interval
- (Frequentist) confidence intervals are constructed to include the true value of the parameter θ_t with probability 1 − α; in fact this is true for any value of θ
- A confidence interval [θ₁, θ₂] is a member of a set, such that the set has the property that P(θ ∈ [θ₁, θ₂]) = 1 − α
  - Perform an ensemble of experiments with fixed θ: the interval [θ₁, θ₂] will vary and will cover the fixed value θ in a fraction 1 − α of the experiments (see the coverage sketch at the end of this section)
  - Presumably when we make a measurement we are selecting it at random from the ensemble that contains the true value θ_t
- Note we haven't said anything about the probability of θ_t being in the interval [θ₁, θ₂], as a Bayesian would

Confidence Interval
- If P(θ ∈ [θ₁, θ₂]) = 1 − α holds, we say the intervals "cover" θ at the stated confidence level
- If there are values of θ for which P(θ ∈ [θ₁, θ₂]) < 1 − α, we say the intervals "undercover" for that θ
- If there are values of θ for which P(θ ∈ [θ₁, θ₂]) > 1 − α, we say the intervals "overcover" for that θ
- Undercoverage is bad

Confidence Intervals
- Confidence intervals built with Neyman's construction (described above) have confidence level 1 − α
- By construction, P(θ ∈ [θ₁, θ₂]) ≥ 1 − α is satisfied for all θ, including θ_t
- Another method is to consider a test of the hypothesis that the parameter's true value is θ
- If the variable is discrete, by convention one constructs the confidence belt by requiring P(x₁ ≤ x ≤ x₂ | θ) ≥ 1 − α
- Examples:
  - Data consisting of a single random variable x that follows a Gaussian distribution
  - Counting experiments (e.g. measuring the Higgs production cross section, as above)
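To illustrate the coverage statement above, here is a small toy study (ours, not from the slides) for the simplest case: a Gaussian measurement with unit σ, for which the central 95% interval is x ± 1.96. The true value and the random seed are arbitrary.

    // Sketch: check coverage of the central 95% Gaussian interval by
    // generating an ensemble of pseudo-experiments with fixed true theta
    // and counting how often the interval [x - 1.96, x + 1.96] covers it.
    #include <cstdio>
    #include <random>

    int main() {
        const double theta_true = 4.2;     // arbitrary fixed true value
        const int nexp = 200000;
        std::mt19937 rng(7);
        std::normal_distribution<double> gaus(theta_true, 1.0);

        int covered = 0;
        for (int i = 0; i < nexp; ++i) {
            double x = gaus(rng);
            if (theta_true >= x - 1.96 && theta_true <= x + 1.96) ++covered;
        }
        // Should come out close to 0.95 (exact coverage in the Gaussian case)
        std::printf("coverage = %.4f\n", double(covered) / nexp);
        return 0;
    }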