Download Statistics - UW Canvas

Transcript
Statistics
PHYS428/PHYS576
Advanced Techniques in Experimental Particle Physics
Fred James’s lectures
http://preprints.cern.ch/cgi-bin/setlink?base=AT&categ=Academic_Training&id=AT00000799
http://www.desy.de/~acatrain/
Glen Cowan’s lectures
http://www.pp.rhul.ac.uk/~cowan/stat_cern.html
Louis Lyons
http://indico.cern.ch/conferenceDisplay.py?confId=a063350
Bob Cousins gave a CMS lecture, may give it more publicly.
Gary Feldman, “Journeys of an Accidental Statistician”
http://www.hepl.harvard.edu/~feldman/Journeys.pdf
http://histfitter.web.cern.ch/histfitter/
Further Reading
By physicists, for physicists:
G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford, 1998.
R.J. Barlow, A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley, 1989.
F. James, Statistical Methods in Experimental Physics, 2nd ed., World Scientific, 2006.
W.T. Eadie et al., North-Holland, 1971 (1st ed., hard to find).
S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998.
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.
Kyle Cranmer’s Lecture Notes

Practical Statistics for the LHC
Kyle Cranmer
Center for Cosmology and Particle Physics, Physics Department, New York University, USA

Abstract
This document is a pedagogical introduction to statistics for particle physics. Emphasis is placed on the terminology, concepts, and methods being used at the Large Hadron Collider. The document addresses both the statistical tests applied to a model of the data and the modeling itself. I expect to release updated versions of this document in the future.

Contents
1 Introduction . . . 3
2 Conceptual building blocks for modeling . . . 3
2.1 Probability densities and the likelihood function . . . 3
2.2 Auxiliary measurements . . . 5
2.3 Frequentist and Bayesian reasoning . . . 6
2.4 Consistent Bayesian and Frequentist modeling of constraint terms . . . 7
3 Physics questions formulated in statistical language . . . 8
3.1 Measurement as parameter estimation . . . 8
3.2 Discovery as hypothesis tests . . . 9
3.3 Excluded and allowed regions as confidence intervals . . . 11
4 Modeling and the Scientific Narrative . . . 14
4.1 Simulation Narrative . . . 15
4.2 Data-Driven Narrative . . . 25
4.3 Effective Model Narrative . . . 27
4.4 The Matrix Element Method . . . 27
4.5 Event-by-event resolution, conditional modeling, and Punzi factors . . . 28
5 Frequentist Statistical Procedures . . . 28
5.1 The test statistics and estimators of µ and θ . . . 29
5.2 The distribution of the test statistic and p-values . . . 31
5.3 Expected sensitivity and bands . . . 32
5.4 Ensemble of pseudo-experiments generated with “Toy” Monte Carlo . . . 33
5.5 Asymptotic Formulas . . . 33
5.6 Importance Sampling . . . 36
5.7 Look-elsewhere effect, trials factor, Bonferroni . . . 37
5.8 One-sided intervals, CLs, power-constraints, and Negatively Biased Relevant Subsets . . . 37
6 Bayesian Procedures . . . 38
6.1 Hybrid Bayesian-Frequentist methods . . . 39
6.2 Markov Chain Monte Carlo and the Metropolis-Hastings Algorithm . . . 40
6.3 Jeffreys’s and Reference Priors . . . 40
6.4 Likelihood Principle . . . 41
7 Unfolding . . . 42
8 Conclusions . . . 42
Links:
On Authorea
arxiv:1503.07622
Why do we need Statistics?
Statistics plays a vital role in science; it is the way that we:
‣ quantify our knowledge and uncertainty
‣ communicate results of experiments
Big questions:
‣ how do we make discoveries, measure or exclude theoretical parameters, ...
‣ how do we get the most out of our data
‣ how do we incorporate uncertainties
‣ how do we make decisions
Practical Examples
Basic questions
•  Physics questions we want to answer...
•  Is the newly discovered particle a ‘vanilla’ Higgs boson? What is its production cross section and couplings?
•  Is there any SUSY in ATLAS data? If not, what models do not agree with data?
•  Enormous efforts in many channels, millions of plots with signal/background expectations, with systematics and observed data
•  How do you conclude on these questions?
•  Statistical tests construct probabilistic statements/models on P(theory|data) or P(data|theory)
•  Likelihood fits
•  Systematics/uncertainties
•  Hypothesis testing
•  Setting limits ...
•  Result: decisions based on these tests!
“As a layman I would now say, I think we have it.”
Introduction
Introductory Remark
What is Statistics?
Probability and Statistics
Why uncertainties?
Random and systematic uncertainties
Combining uncertainties
Combining experiments
Binomial, Poisson and Gaussian distributions
What do we do with Statistics?
Parameter Determination (best value and range)
e.g. Mass of Higgs = 80 ± 2
Goodness of Fit
Does data agree with our theory?
Hypothesis Testing
Does data prefer Theory 1 to Theory 2?
Decision Making
What experiment shall I do next?
Why bother?
HEP is expensive and time-consuming, so it is worth investing effort in statistical analysis → better information from data
Probability and Statistics
Example: Dice
THEORY → DATA (Probability):
Given P(5) = 1/6, what is P(20 5’s in 100 trials)?
DATA → THEORY (Statistics):
Given 20 5’s in 100 trials, what is P(5)? And its uncertainty? → Parameter Determination
If unbiased, what is P(n evens in 100 trials)?
Given 60 evens in 100 trials, is it unbiased? → Goodness of Fit
Or is P(evens) = 2/3? → Hypothesis Testing
N.B. Parameter values not sensible if goodness of fit is poor/bad
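The THEORY → DATA question above can be answered directly from the binomial formula; a minimal Python sketch (not part of the lectures, only the die probabilities from the slide are used):

```python
# Probability of exactly 20 fives in 100 throws of a fair die,
# from the binomial distribution P_s = C(N, s) p^s (1-p)^(N-s).
from math import comb

def binom_pmf(s, n, p):
    """Binomial probability of s successes in n trials."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

p20 = binom_pmf(20, 100, 1 / 6)
print(f"P(20 fives in 100 trials) = {p20:.4f}")
```

The DATA → THEORY direction (estimating P(5) and its uncertainty from 20/100) is the statistics question the next slides address.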
Why do we need uncertainties?
Affects conclusion about our result, e.g. Result / Theory = 0.970
If 0.970 ± 0.050, data compatible with theory
If 0.970 ± 0.005, data incompatible with theory
If 0.970 ± 0.7, need better experiment
Historical experiment at Harwell testing General Relativity
Random + Systematic Uncertainties
Random/Statistical: limited accuracy, Poisson counts
Spread of answers on repetition (method of estimating)
Systematics: may cause shift, but not spread
e.g. Pendulum: g = 4π²L/τ², τ = T/n
Statistical uncertainties: T, L
Systematics: T, L
Calibrate: Systematic → Statistical
More systematics:
Formula for undamped, small amplitude, rigid, simple pendulum
Might want to correct to g at sea level: different correction formulae
Ratio of g at different locations: possible systematics might cancel. Correlations relevant
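For the pendulum example, the uncertainty on g follows from the product/quotient rule introduced later (fractional errors combine in quadrature, with the power of each variable as a coefficient). A sketch with illustrative numbers of my own choosing, not values from the slides:

```python
# Error propagation for the pendulum formula g = 4*pi^2 * L / tau^2.
# For a product of powers, fractional uncertainties combine in
# quadrature: sigma_g/g = sqrt((sigma_L/L)^2 + (2*sigma_tau/tau)^2).
# The input numbers below are assumed, purely for illustration.
from math import pi, sqrt

L, sigma_L = 1.000, 0.002      # pendulum length [m]
tau, sigma_tau = 2.006, 0.002  # period [s]

g = 4 * pi**2 * L / tau**2
frac = sqrt((sigma_L / L) ** 2 + (2 * sigma_tau / tau) ** 2)
sigma_g = g * frac
print(f"g = {g:.3f} +- {sigma_g:.3f} m/s^2")
```

Note the factor 2 on the period term: τ enters squared, so its fractional error counts twice.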
Presenting Results
Quote result as g ± σstat ± σsyst
Or combine uncertainties in quadrature → g ± σ
Other extreme: show all systematic contributions separately. Useful for assessing correlations with other measurements.
Needed for:
using improved outside information,
combining results,
using measurements to calculate something else.
Combining Uncertainties
z = x - y
δz = δx - δy   [1]
Why σz² = σx² + σy² ?   [2]
Combining Errors
z = x - y
δz = δx - δy   [1]
Why σz² = σx² + σy² ?   [2]
1) [1] is for specific δx, δy. They could be of either sign, so on average the deviations combine in quadrature. N.B. Mnemonic, not proof.
2) σz² = ⟨δz²⟩ = ⟨δx²⟩ + ⟨δy²⟩ − 2⟨δx δy⟩ = σx² + σy², provided ⟨δx δy⟩ = 0, i.e. x and y are uncorrelated.
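The quadrature rule for independent variables can be checked numerically; a quick Monte Carlo sketch (means and σ's are arbitrary assumed values):

```python
# Monte Carlo check of the quadrature rule for z = x - y with
# independent Gaussian x and y: sigma_z^2 = sigma_x^2 + sigma_y^2.
import random
import statistics

random.seed(42)
sigma_x, sigma_y = 3.0, 4.0
z = [random.gauss(10, sigma_x) - random.gauss(5, sigma_y)
     for _ in range(200_000)]

sigma_z = statistics.stdev(z)
print(f"measured sigma_z = {sigma_z:.3f}, "
      f"expected sqrt(3^2 + 4^2) = {(sigma_x**2 + sigma_y**2) ** 0.5}")
```

If x and y were correlated, the cross term −2⟨δx δy⟩ would not vanish and the measured spread would differ from the quadrature sum.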
Averaging
3) Averaging is good for you:
N measurements xi ± σ: quote [1] xi ± σ or [2] x̄ ± σ/√N ?
4) Tossing a coin: score 0 for tails, 2 for heads
After 100 tosses: [1] 100 ± 100, i.e. 100 (1 ± 1), covering 0 to 200; or [2] 100 ± 10 ?
Prob(0 or 200) = (1/2)⁹⁹ ~ 10⁻³⁰
Compare age of Universe ~ 10¹⁸ seconds
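The coin example is easy to simulate; a sketch showing that the spread of the total score is ~10 (σ = 1 per toss, so √100 · 1 = 10 for the sum), not 100:

```python
# Simulate the coin-scoring game: 0 for tails, 2 for heads,
# 100 tosses per game. Repeating the game many times shows the
# spread of the total is ~10, not 100.
import random
import statistics

random.seed(1)
totals = [sum(random.choice((0, 2)) for _ in range(100))
          for _ in range(20_000)]

print(f"mean = {statistics.mean(totals):.1f}, "
      f"spread = {statistics.stdev(totals):.1f}")
```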
Rules for different functions
1) Linear: z = k1 x1 + k2 x2 + ...
σz = k1 σ1 & k2 σ2   (& means “combine in quadrature”)
N.B. Fractional errors NOT relevant
e.g. z = x - y:
z = your height
x = position of head wrt moon
y = position of feet wrt moon
x and y measured to 0.1% → z could be -30 miles
Rules for different functions
2) Products and quotients: z = x^α y^β ...
σz/z = α σx/x & β σy/y
Useful for x², xy, x/√y, ...
Rules for different functions
3) Anything else: z = z(x1, x2, ...)
σz = (∂z/∂x1) σ1 & (∂z/∂x2) σ2 & ...
OR numerically:
z0 = z(x1, x2, x3, ...)
z1 = z(x1 + σ1, x2, x3, ...)
z2 = z(x1, x2 + σ2, x3, ...)
σz = (z1 - z0) & (z2 - z0) & ...
N.B. All formulae are approximate (except 1)) – assumes small uncertainties
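The numerical recipe above (shift each input by its σ in turn, combine the changes in z in quadrature) translates directly to code; a sketch with an assumed example function and inputs:

```python
# Numerical error propagation by the shift method: shift each input
# by its uncertainty, one at a time, and combine the resulting
# changes in z in quadrature.
from math import sqrt

def propagate(z, x, sigmas):
    """z: function of a list of inputs; x: central values;
    sigmas: their uncertainties. Returns the approximate sigma_z."""
    z0 = z(x)
    total = 0.0
    for i, s in enumerate(sigmas):
        shifted = list(x)
        shifted[i] += s
        total += (z(shifted) - z0) ** 2
    return sqrt(total)

# e.g. z = x / sqrt(y), with illustrative (assumed) inputs
z = lambda v: v[0] / sqrt(v[1])
print(propagate(z, [10.0, 4.0], [0.1, 0.2]))
```

As the slide warns, this is only accurate for small uncertainties: it linearizes z around the central values.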
Combining Results
[Figure slides on combining results; only the captions survive: “BEWARE”, “100 ± 10”.]
Averaging vs Adding
Isolated island with conservative inhabitants: how many married people?
Number of married men = 100 ± 5 K
Number of married women = 80 ± 30 K
Total = 180 ± 30 K
CONTRAST: Weighted average = 99 ± 5 K → Total = 198 ± 10 K
GENERAL POINT: Adding (uncontroversial) theoretical input can improve precision of answer
Compare “kinematic fitting”
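The weighted average on the slide uses the standard 1/σ² weights; a sketch reproducing the numbers (the theoretical input here is that the two counts measure the same quantity, so the total is twice the average):

```python
# Inverse-variance weighted average of two measurements of the
# same quantity (number of married couples, in thousands).
from math import sqrt

def weighted_average(values, sigmas):
    w = [1 / s**2 for s in sigmas]
    mean = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    return mean, 1 / sqrt(sum(w))

m, s = weighted_average([100, 80], [5, 30])
print(f"weighted average = {m:.1f} +- {s:.1f} K")  # about 99 +- 5 K
print(f"total = {2 * m:.1f} +- {2 * s:.1f} K")     # compare the slide's 198 +- 10 K (2 x 99)
```

Simply adding the two counts instead gives 180 ± 30 K, dominated by the poorly measured number of women.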
Binomial Distribution
Fixed N independent trials, each with the same probability of success p. What is the probability of s successes?
e.g. Throw dice 100 times. Success = ‘6’. What is prob of 0, 1, ..., 99, 100 successes?
Efficiency of track reconstruction = 98%. For 500 tracks, prob that 490, 491, ..., 499, 500 reconstructed.
Angular distribution is 1 + 0.7 cosθ? Prob of 52/70 events with cosθ > 0?
(More interesting is the statistics question)
Binomial Distribution
P_s = N!/((N-s)! s!) p^s (1-p)^(N-s)
Expected number of successes = Σ s P_s = Np, as is obvious
Variance of number of successes = Np(1-p)
Variance ~ Np for p ~ 0
~ N(1-p) for p ~ 1
NOT Np in general. NOT s ± √s
e.g. 100 trials, 99 successes: NOT 99 ± 10
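The mean and variance formulas (and the 99-out-of-100 warning) can be verified by summing the pmf directly; a minimal sketch:

```python
# Check the binomial mean = N*p and variance = N*p*(1-p) by direct
# summation, for the slide's example of 99 successes in 100 trials
# (p ~ 0.99): the spread is ~1, nowhere near sqrt(99) ~ 10.
from math import comb, sqrt

N, p = 100, 0.99
pmf = [comb(N, s) * p**s * (1 - p) ** (N - s) for s in range(N + 1)]
mean = sum(s * q for s, q in enumerate(pmf))
var = sum((s - mean) ** 2 * q for s, q in enumerate(pmf))

print(mean, var)   # ~99, ~0.99
print(sqrt(var))   # ~0.99 -- NOT sqrt(99) ~ 9.9
```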
Limit Cases
Statistics: estimate p and σp from s (and N)
p = s/N
σp² = (1/N) (s/N) (1 - s/N)
If s = 0, p = 0 ± 0 ?
If s = N, p = 1.0 ± 0 ?
Limiting cases:
● p = const, N → ∞: Binomial → Gaussian, μ = Np, σ² = Np(1-p)
● N → ∞, p → 0, Np = const: Binomial → Poisson, μ = Np, σ² = Np
{N.B. Gaussian is continuous and extends to -∞}
Binomial Distributions
[Figure: example binomial distributions.]
Poisson Distribution
Probability of n independent events occurring in time t when the rate r is constant
e.g. events in a bin of a histogram
NOT radioactive decay for t ~ τ
Limit of Binomial (N → ∞, p → 0, Np → μ)
P_n = e^(-rt) (rt)^n / n! = e^(-μ) μ^n / n!   (μ = rt)
⟨n⟩ = rt = μ (No surprise!)
σ_n² = μ → “n ± √n”   BEWARE 0 ± 0 ?
μ → ∞: Poisson → Gaussian, with mean = μ, variance = μ
Important for χ²
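The claim that the Poisson is the N → ∞, p → 0 limit of the binomial at fixed μ = Np can be checked numerically; a short sketch (μ and n chosen arbitrarily):

```python
# Numerical check that the binomial tends to the Poisson as
# N -> infinity with mu = N*p held fixed.
from math import comb, exp, factorial

def binom_pmf(s, n, p):
    return comb(n, s) * p**s * (1 - p) ** (n - s)

def poisson_pmf(n, mu):
    return exp(-mu) * mu**n / factorial(n)

mu = 2.0
for N in (10, 100, 10_000):
    print(N, binom_pmf(3, N, mu / N), poisson_pmf(3, mu))
```

The binomial values approach the Poisson value as N grows.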
For your thought
Poisson P_n = e^(-μ) μ^n / n!
P0 = e^(-μ), P1 = μ e^(-μ), P2 = (μ²/2) e^(-μ)
For small μ, P1 ~ μ, P2 ~ μ²/2
If the probability of 1 rare event ~ μ, why isn’t the probability of 2 events ~ μ² ?
Poisson Distributions
[Figure: example Poisson distributions; for large μ, approximately Gaussian.]
32
Gaussian Distributions
Gaussian or
Gaussian or
Normal
Normal
Relevance of Central
Relevance
Limit
Theoremof Central
yLimit
= ∑xTheorem
i
y
=
∑xi any dist
x has (almost)
has (almost)
y xGaussian
for any
largedist
n
y
Gaussian for large n
Significance
of σ of σ
Significance
i) RMS
of Gaussian
=σ =σ
i) RMS
of Gaussian
(hence
of definition
2 in definition
of Gaussian)
(hence
factorfactor
of 2 in
of Gaussian)
x = μ±σ,
=/√e
ymax/√e
~0.606
ii) At xii)=Atμ±σ,
y = yymax
~0.606
ymaxymax
σ = half-width
at ‘half’-height)
(i.e. σ(i.e.
= half-width
at ‘half’-height)
iii) Fractional
within
= 68%
iii) Fractional
area area
within
μ±σμ±σ
= 68%
iv) Height
at max
= 1/(σ√2
iv) Height
at max
= 1/(σ√2
π) π)
26
26
33
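Properties ii) to iv) can be verified directly from the Gaussian density; a minimal sketch for the standard normal:

```python
# Numerical check of the Gaussian properties on the slide:
# height ratio at mu +- sigma, fractional area within mu +- sigma,
# and height at the maximum.
from math import erf, exp, pi, sqrt

mu, sigma = 0.0, 1.0
gauss = lambda x: exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

ratio = gauss(mu + sigma) / gauss(mu)   # 1/sqrt(e) ~ 0.606
area = erf(1 / sqrt(2))                 # ~ 0.683
height = gauss(mu)                      # 1/(sigma*sqrt(2*pi)) ~ 0.399
print(f"{ratio:.3f} {area:.3f} {height:.3f}")
```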
Gaussian Distributions
[Figure: area in tail(s) of Gaussian; caption value 0.002.]
Gaussian vs Poisson
Relevant for Goodness of Fit
Binomial vs Gaussian vs Poisson
Simple statistical example
•  Central concept in statistics is the ‘probability model’: assigns a probability to each possible experimental outcome
•  Example: a HEP counting experiment
•  Count the number of events in your signal region (SR) in your data (specific lumi): Poisson distribution
•  Given the expected (MC) event count, the probability model is fully specified:
Poisson(N | b)
Poisson(N | s + b)
•  Suppose we measure N = 7 events (Nobs); then we can calculate the probability
•  P(Nobs | hypothesis) is called the LIKELIHOOD: L(Nobs | b), L(Nobs | s+b), L(observed data | theory)
p(Nobs | b) = 2.2%
p(Nobs | s+b) = 14.9%
•  Data is more likely under the s+b hypothesis than bkg-only
W. Verkerke
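The two quoted probabilities follow from the Poisson model; a sketch assuming expected counts b = 3 and s + b = 7 (the slide does not state them; these values are my assumption, chosen because they reproduce the quoted 2.2% and 14.9%):

```python
# Poisson counting-experiment likelihoods for Nobs = 7.
# b = 3 and s + b = 7 are assumed expected counts, not taken from
# the slide; they reproduce the quoted percentages.
from math import exp, factorial

def poisson(n, mu):
    return exp(-mu) * mu**n / factorial(n)

n_obs, b, splusb = 7, 3.0, 7.0
print(f"p(Nobs | b)     = {poisson(n_obs, b):.1%}")      # 2.2%
print(f"p(Nobs | s + b) = {poisson(n_obs, splusb):.1%}")  # 14.9%
```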
HEP Workflow
[Figure slide. (W. Verkerke)]
HEP Data Analysis (analysis view) (W. Verkerke)
•  HEP Data Analysis is (should be) for a large part the reduction of a physics theory(s) to a statistical model
•  Statistical/probability model: Given a measurement x (e.g. N events), what is the probability to observe each