Statistics for Astronomy I:
Introduction and Probability Theory
B. Nikolic
Astrophysics Group, Cavendish Laboratory, University of Cambridge
20 October 2008
‘Astronomers cannot avoid statistics, and there are
several reasons for this unfortunate situation.’
[Wall(1979)]
◮ You should aim to make best use of the available data: a thorough understanding of statistics is the key to that
◮ Statistical inference is relevant both to interpreting observations and to interpreting simulations
◮ Simply applying formulae is often not sufficient:
  ◮ Need to understand the theory, limitations, etc
  ◮ Computer-based processing essential
Course goals
◮ Review of essential statistics
  ◮ Some topics that everybody should know
  ◮ I expect there is a range of backgrounds here, so for some this may be all very familiar
◮ Basic applications of statistics in astronomy
  ◮ Some classic applications
◮ Introduction to advanced statistics
  ◮ A survey rather than a thorough tutorial
Goals for this Lecture
Course introduction
Probability Theory
  Moments
  Characteristic functions
  The central limit theorem
Well-known probability distributions
  Normal distribution
  Binomial distribution
  Poisson distribution
  χ² distribution
Reference materials
◮ J. V. Wall and C. R. Jenkins, Practical Statistics for Astronomers (CUP)
◮ D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (CUP)
◮ Mike Irwin lectures: http://www.ast.cam.ac.uk/~mike/lect.html
◮ Penn State University Center for Astrostatistics: http://astrostatistics.psu.edu/
◮ My lectures and supporting materials: http://www.mrao.cam.ac.uk/~bn204/lecture/
◮ J. V. Wall’s papers: 1979QJRAS..20..138W and 1996QJRAS..37..519W, via http://adsabs.harvard.edu/abstract_service.html
Probability
Dual use of ‘probability’:
◮ Quantifies the frequency with which a ‘random variable’ is expected to take its possible values
◮ A measure of the degree of belief that a hypothesis is true
Random variables
◮ Random variable: the outcome of an experiment that we cannot determine in advance
◮ The cause of apparent randomness is often simply that we do not know the initial conditions of the experiment:
  ◮ e.g., the flip of a coin or the roll of a roulette ball are both easily predictable given fairly rudimentary measurements of the initial conditions when the coin/ball is launched [yes, this has been exploited in practice]
  ◮ the output of a computer random number generator is exactly predictable if you know the internal state of the generator (usually 8 to 128 bits long) and almost completely unpredictable if you do not (see the sketch below)
◮ ‘Randomness’ of an experiment is therefore subjective and a function of prior knowledge about the experiment
◮ Random variables can be continuous or discrete
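An illustrative sketch of the last point about random number generators (assuming NumPy is available; the seed value is arbitrary): once the internal state is fixed, the "random" sequence is fully determined.

```python
import numpy as np

# Two generators initialised with the same internal state (seed).
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# Their outputs are identical draw for draw: the sequence is completely
# determined by the state and only *appears* random to anyone who does
# not know that state.
print(np.array_equal(rng_a.random(10), rng_b.random(10)))  # True
```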
PDF
X is a continuous random variable.
Probability Density Function (PDF): P(x) dx is the probability of X being in the range x to x + dx.
◮ P(x) is non-negative:
      P(x) ≥ 0   ∀x                                  (1)
◮ Area under P(x) is unity:
      ∫ P(x) dx = 1                                  (2)
CDF
Cumulative Distribution Function (CDF): C(x) is the probability that X is less than x:
      C(x) = ∫_{−∞}^{x} P(x′) dx′                    (3)
◮ Cumulative functions are easier to estimate from observations (see the sketch below):
  ◮ They can be visualised more faithfully (i.e., with fewer assumptions)
  ◮ They form the basis of a number of important statistical tests
◮ Mathematically, CDFs are a slightly more general way of describing probabilities than PDFs
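A minimal sketch of how easily the CDF can be estimated from data (my own illustration, assuming NumPy; the sample is synthetic): the empirical CDF is just the sorted sample against its rank, with no binning or smoothing.

```python
import numpy as np

def empirical_cdf(samples):
    """Return (x, C(x)) for the empirical cumulative distribution.

    No binning or smoothing is needed: sort the data and assign each
    point its rank divided by the sample size.
    """
    x = np.sort(np.asarray(samples))
    c = np.arange(1, len(x) + 1) / len(x)
    return x, c

# Example: empirical CDF of 1000 draws from a standard normal.
rng = np.random.default_rng(1)
x, c = empirical_cdf(rng.normal(size=1000))
print(x[:3], c[:3])   # smallest observations and their cumulative fractions
```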
Discrete random variables
◮ Most results retain equivalent forms, with integrals turned into sums, etc
◮ The probability density function is usually renamed the ‘Probability Mass Function’ (PMF)
◮ The possible values of the random variable need not have a defined ordering:
  ◮ e.g., X = H or X = T for heads or tails outcomes
  ◮ no ordering =⇒ no cumulative distribution
Moments of probability distributions
µn(r), the n-th moment around value r:
      µn(r) = ∫ (x − r)^n P(x) dx                    (4)
Mean, µ, is the first moment around r = 0:
      µ = ∫ x P(x) dx                                (5)
The n-th central moment is the n-th moment around the mean:
      µn(r = µ) = ∫ (x − µ)^n P(x) dx                (6)
Note that moments do not necessarily exist even for some common theoretical distributions (e.g., the Cauchy distribution has no finite mean or variance).
Moments II
µ2: second central moment = variance = σ²
µ3/σ³: Skew, a measure of the asymmetry of the distribution
µ4/σ⁴: Kurtosis, a measure of peakiness / fatness
Conversion between central moments (µn) and moments about the origin (µ′n):
      µn = Σ_{j=0}^{n} (n choose j) (−1)^{n−j} µ′j µ^{n−j}      (7)
E.g.:
      µ2 = µ′2 − µ²                                             (8)
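A small numerical check of Eq. (8) (my own sketch, assuming NumPy; the sample and its distribution are arbitrary): the second central moment of a sample equals its second raw moment minus the squared mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any sample will do

mu = np.mean(x)                       # first moment about the origin
mu2_raw = np.mean(x**2)               # second moment about the origin, mu'_2
mu2_central = np.mean((x - mu)**2)    # second central moment = variance

# Eq. (8): mu_2 = mu'_2 - mu^2
print(np.isclose(mu2_central, mu2_raw - mu**2))   # True
```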
Moments III: why all the fuss?
◮ As we will see shortly, moments are a way of expanding a probability distribution into coefficients, quite similar to, say, the Taylor expansion of a function
◮ Can quantitatively compare to the normal distribution, e.g.:
  ◮ Expect Skew = 0 for a normal distribution
  ◮ Expect Kurtosis = 3 for a normal distribution
◮ Central-limit theorem
Characteristic function
Characteristic function of a probability distribution:
      φ(t) = ∫ exp(itx) P(x) dx                      (9)
◮ Note that the sign in the exponent means that φ(t) is the inverse Fourier transform of the probability density function
◮ φ(0) ≡ 1
◮ Moments of the PDF are closely related to the Taylor expansion of the characteristic function:
      µ′n = i^{−n} d^n φ(t)/dt^n |_{t=0}             (10)
  One reason for fussing about the moments!
Characteristic function II
Expand the characteristic function explicitly in terms of moments:
      φ(t) = 1 + itµ − (t²/2) µ′2 − i (t³/3!) µ′3 + (t⁴/4!) µ′4 + · · ·      (11)
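A Monte Carlo check of Eqs. (9) and (11) (my own sketch, assuming NumPy; the distribution, sample size and value of t are arbitrary): estimate φ(t) as the sample mean of exp(itx) and compare it with the truncated moment expansion at small t.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=500_000)   # any distribution with finite moments

t = 0.1                                        # expansion is only accurate for small |t|
phi_mc = np.mean(np.exp(1j * t * x))           # Eq. (9) as a sample average

mu   = np.mean(x)
mu2p = np.mean(x**2)
mu3p = np.mean(x**3)
# Eq. (11) truncated after the cubic term.
phi_series = 1 + 1j * t * mu - t**2 / 2 * mu2p - 1j * t**3 / 6 * mu3p

print(phi_mc, phi_series)   # close, up to Monte Carlo noise and the neglected higher-order terms
```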
The central limit
◮ φX(t) is the characteristic function of PX(x)
◮ If Y = X1 + X2, what is φY, the characteristic function of Y?
  ◮ φY(t) = φX1(t) × φX2(t)
  ◮ Think of this in terms of the convolution theorem in Fourier analysis
◮ If Y = Σ_k ak Xk?
  ◮ φY(t) = Π_k φXk(ak t)
Sums of independent random variables
The analysis above shows that for sums of many independent random variables, the characteristic function must satisfy
      φY(t) → 0   when |t| ≫ 1                       (12)
almost regardless of the distributions of the component variables.
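This can be checked directly by simulation (a sketch of mine, assuming NumPy and SciPy; the number of terms and trials is arbitrary): the sum of many independent, decidedly non-normal variables has skew ≈ 0 and kurtosis ≈ 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Sum of 50 independent uniform variables (each far from normal),
# repeated for 100000 realisations of the sum.
n_terms, n_trials = 50, 100_000
y = rng.uniform(size=(n_trials, n_terms)).sum(axis=1)

# For a normal distribution we expect skew = 0 and kurtosis = 3.
print(stats.skew(y))                     # close to 0
print(stats.kurtosis(y, fisher=False))   # close to 3
```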
The central limit II
Look for a function which satisfies:
      φ(0) = 1                                       (13)
      φ(t) → 0 quickly when |t| ≫ 1                  (14)
What about:
      φ(t) = exp(−t²σ²/2)                            (15)
The PDF that corresponds to this characteristic function:
      N(x; σ) = (1/√(2πσ²)) exp(−x²/(2σ²))           (16)
The Normal Distribution
◮ Results naturally where a large number of independent variables are additively combined
      N(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))  (17)
      φN(t; µ, σ) = exp(itµ − t²σ²/2)                (18)
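A quick numerical sanity check of Eq. (17) (my own sketch, assuming SciPy; µ and σ here are arbitrary), together with the familiar 1σ and 2σ probability fractions read off the normal CDF.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0

# Eq. (17) evaluated directly ...
x = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 11)
pdf_eq17 = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# ... agrees with the library implementation.
print(np.allclose(pdf_eq17, stats.norm.pdf(x, loc=mu, scale=sigma)))  # True

# Fraction of probability within +/- 1 and +/- 2 sigma of the mean.
print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # ~0.6827
print(stats.norm.cdf(2) - stats.norm.cdf(-2))   # ~0.9545
```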
The Normal Distribution Plot
[Figure: the standard normal density N(x; µ = 0, σ = 1), with peak value ≈ 0.399, and its cumulative distribution ∫_{−∞}^{x} N(x′; µ = 0, σ = 1) dx′, plotted for x from −3 to 3.]
Ubiquity of the Normal Distribution
◮ Voltage fluctuations across a resistor at finite temperature: σ² ∝ 4 kB T R
◮ Limiting form of a number of other distributions
◮ Analytically tractable
But it is clearly not always applicable:
◮ Experiments involving human intervention are almost never normally distributed: ‘possibility of outliers’
◮ Non-linear processing algorithms:
  ◮ Object detection / de-blending
  ◮ De-convolution
◮ Electronics: 1/f noise and drift
◮ Sometimes observations are ‘pre-processed’ to get the errors closer to normally distributed (inevitably this leads to loss of information)
Binomial distribution
If p is the probability of ‘success’ in one trial, the binomial distribution gives the probability of j successes in n independent trials:
      P(j) = (n choose j) p^j (1 − p)^{n−j}           (19)
(Easily derived through combinatorial arguments.)
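A minimal illustration of Eq. (19) (my own sketch, assuming SciPy; the values of n, p and j are arbitrary): the probability of exactly j successes in n fair-coin trials.

```python
from math import comb, isclose
from scipy import stats

n, p, j = 10, 0.5, 3

# Eq. (19) written out with the binomial coefficient ...
p_direct = comb(n, j) * p**j * (1 - p)**(n - j)

# ... matches the library PMF.
print(isclose(p_direct, stats.binom.pmf(j, n, p)))   # True
print(p_direct)                                      # ~0.117
```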
Poisson distribution
Probability that n events will occur in a time interval of length T, given that the underlying rate is λ per unit time:
      P(n; T, λ) = (λT)^n exp(−λT) / n!               (20)
◮ Derived by generalising the binomial and then multinomial distributions
◮ A discrete distribution: n is a discrete random variable
◮ Moments:
      µ = λT     (mean)                               (21)
      µ2 = λT    (variance)                           (22)
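A quick check of Eqs. (21)–(22) by simulation (my own sketch, assuming NumPy; the rate and interval are arbitrary): the sample mean and sample variance of Poisson counts both estimate λT.

```python
import numpy as np

rng = np.random.default_rng(3)
rate, T = 4.0, 2.5          # lambda per unit time, interval length
counts = rng.poisson(lam=rate * T, size=200_000)

print(counts.mean())   # ~ lambda * T = 10
print(counts.var())    # ~ lambda * T = 10  (variance equals the mean)
```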
Poisson Distribution Plots
[Figure: Poisson probability mass functions and their cumulative distributions for λT = 2, 5, 10 and 100 (peak probabilities ≈ 0.271, 0.175, 0.125 and 0.040 respectively), illustrating the approach to a normal shape as λT increases.]
Poisson → Normal distribution
◮ As seen above, the Poisson distribution quickly approaches the normal distribution as Tλ ≫ 1.
◮ The parameters of the limiting normal distribution:
      µ = Tλ                                          (23)
      σ = √(Tλ)                                       (24)
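To see Eqs. (23)–(24) numerically (my own sketch, assuming NumPy and SciPy; λT = 100 is an arbitrary illustrative choice), compare the Poisson PMF with the normal density of the same mean and standard deviation.

```python
import numpy as np
from scipy import stats

lam_T = 100.0
n = np.arange(60, 141)   # counts within a few sigma of the mean

poisson_pmf = stats.poisson.pmf(n, mu=lam_T)
normal_pdf = stats.norm.pdf(n, loc=lam_T, scale=np.sqrt(lam_T))

# The difference is small compared with the peak value (~0.04).
print(np.max(np.abs(poisson_pmf - normal_pdf)))
```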
The χ² distribution
      Y = Σ_{i=1}^{n} Xi²,   Xi ∼ N(µ = 0, σ = 1)     (25)
      =⇒ Y ∼ χ²_n                                     (26)
      Pχ²(y; n) = y^{(n/2)−1} exp(−y/2) / (2^{n/2} Γ(n/2))      (27)
◮ Key use of the χ² distribution is in model testing
◮ If fi is the model for the i-th random variable (= observable):
      Y = Σ_{i=1}^{n} ((Xi − fi)/σi)²                 (28)
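A minimal sketch of the model-testing use of Eq. (28) (my own illustration, assuming NumPy and SciPy; the data, model and errors below are invented): form Y and convert it to a tail probability with the χ²_n distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, a model prediction for each, and the
# (assumed known, Gaussian) measurement errors sigma_i.
x     = np.array([4.8, 10.3, 14.6, 20.5, 24.9])
model = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
sigma = np.array([0.3,  0.3,  0.4,  0.4,  0.5])

# Eq. (28): Y follows a chi^2 distribution if the model is correct
# (no parameters were fitted here, so the degrees of freedom equal
# the number of data points).
Y = np.sum(((x - model) / sigma)**2)
n = len(x)

p_value = stats.chi2.sf(Y, df=n)   # probability of a Y this large by chance
print(Y, p_value)
```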
χ² plots
[Figure: the χ²_n probability density and its cumulative distribution ∫_{−∞}^{x} χ²_n(x′) dx′ for n = 1, 2, 3 and 5 degrees of freedom.]
χ² → Normal distribution
[Figure: the χ²_n probability density and its cumulative distribution for n = 1, 5, 10 and 30 degrees of freedom, showing the approach to a normal shape as n increases.]
χ² → Normal distribution II
For χ²_n:
◮ µ1 = n (mean)
◮ µ2 = 2n (variance)
◮ Kurtosis = 3 + 12/n: converges to the Normal value of 3
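These values are easy to verify (my own sketch, assuming SciPy; note that `scipy.stats` reports the excess kurtosis, i.e. Kurtosis − 3):

```python
from scipy import stats

for n in (1, 5, 10, 30):
    mean, var, skew, excess_kurt = stats.chi2.stats(df=n, moments='mvsk')
    # Expect mean = n, variance = 2n and excess kurtosis = 12/n,
    # i.e. Kurtosis = 3 + 12/n, which tends to the normal value of 3.
    print(n, mean, var, excess_kurt)
```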
F-distribution
      X1 ∼ χ²_j,   X2 ∼ χ²_k                          (29)
      =⇒ (X1/j) / (X2/k) ∼ F_{j,k}                    (30)
◮ Key use of the F-distribution is in testing of variances
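A minimal sketch of the variance-testing use (my own illustration, assuming NumPy and SciPy; the samples are synthetic): the ratio of two unbiased sample variances of normal data follows an F distribution, so the tail probability follows from `scipy.stats.f`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Two independent normal samples; are their variances consistent?
a = rng.normal(scale=1.0, size=30)
b = rng.normal(scale=1.0, size=40)

F = a.var(ddof=1) / b.var(ddof=1)        # ratio of unbiased sample variances
j, k = len(a) - 1, len(b) - 1            # degrees of freedom

p_one_sided = stats.f.sf(F, dfn=j, dfd=k)   # P(F_{j,k} > observed ratio)
print(F, p_one_sided)
```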
Bibliography
Wall J. V., 1979, QJRAS, 20, 138