ST 521: Statistical Theory I
Armin Schwartzman
Class topic 8
Contents
Continuous Random Variables
  Uniform Distribution
  Notes
  Applications in geometric probability
Exponential Distribution
  Exponential Distribution
  Interpretation
  Alternative parametrization
  Memoryless property
  Shifted exponential
  Double Exponential
Normal Distribution
  Normal Distribution
  cont.
  Density integrates to 1
  Notes
  χ² distribution
  Student's t and F distributions
More Distributions
  Gamma distribution
  Notes
  Beta distribution
  cont.
  Cauchy distribution
Continuous Random Variables
Uniform Distribution
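Uniform: Y ∼ U[a, b]:
sample space: [a, b]
pdf:
\[ f(y) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le y \le b \\ 0 & \text{elsewhere} \end{cases} \]
cdf:
\[ F(y) = \begin{cases} 0 & \text{for } y < a \\ \displaystyle\int_a^y \frac{1}{b-a}\,dx = \frac{y-a}{b-a} & \text{for } a \le y \le b \\ 1 & \text{for } y > b \end{cases} \]
moments:
\[ E(Y) = \frac{a+b}{2}, \qquad \mathrm{Var}(Y) = \frac{(b-a)^2}{12} \]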
ST 521 – Class topic 8
Fall 2014 – 3 / 25
Notes
The uniform extends to the continuous case the idea of equally likely outcomes.
If Y ∼ U [0, 1], then a + (b − a)Y ∼ U [a, b].
Useful for settings where individuals arrive at a destination, enter a study, etc. randomly over time.
Example: Suppose that buses run every 30 minutes. Suppose you arrive at a station in the middle of the
route. What is the probability that you have to wait more than 15 minutes? 5 minutes?
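As a quick check of the bus example, one can assume (an added modelling assumption, not stated on the slide) that you arrive uniformly within the 30-minute cycle, so the wait W ∼ U[0, 30] and P(W > w) = (30 − w)/30. A minimal Python sketch with hypothetical helper names, which also uses the construction a + (b − a)Y from the notes:

```python
import random

def uniform_tail(w, a=0.0, b=30.0):
    """P(W > w) for W ~ U[a, b], using the cdf F(w) = (w - a) / (b - a)."""
    if w <= a:
        return 1.0
    if w >= b:
        return 0.0
    return (b - w) / (b - a)

def scaled_uniform(a, b, rng):
    """Draw from U[a, b] as a + (b - a) * U with U ~ U[0, 1]."""
    return a + (b - a) * rng.random()

p15 = uniform_tail(15)  # exact value 1/2
p5 = uniform_tail(5)    # exact value 5/6

# Monte Carlo check (assumed seed and sample size, chosen for illustration)
rng = random.Random(0)
n = 100_000
waits = [scaled_uniform(0.0, 30.0, rng) for _ in range(n)]
mc15 = sum(w > 15 for w in waits) / n
mc5 = sum(w > 5 for w in waits) / n
```

Under this assumption the two answers are 1/2 and 5/6; the simulated frequencies should land close to them.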
Applications in geometric probability
Example: Bertrand’s paradox
Consider a random chord of a circle with radius r. What is the probability that the length of
the chord will exceed the side of the equilateral triangle inscribed in the circle? (Note: this will
depend on how you define “random”)
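To see how the answer depends on the definition of "random", the following sketch simulates three common ways of drawing a chord. The three sampling schemes and the function name are illustrative choices, not taken from the slide; the side of the inscribed equilateral triangle is r√3.

```python
import math
import random

def bertrand_probabilities(n=200_000, r=1.0, seed=0):
    """Estimate P(chord length > r * sqrt(3)) under three definitions
    of a 'random chord' of a circle of radius r."""
    rng = random.Random(seed)
    side = r * math.sqrt(3.0)  # side of the inscribed equilateral triangle
    hits = {"endpoints": 0, "radial": 0, "midpoint": 0}
    for _ in range(n):
        # 1. Two endpoints chosen uniformly on the circumference.
        t1 = rng.uniform(0.0, 2.0 * math.pi)
        t2 = rng.uniform(0.0, 2.0 * math.pi)
        if 2.0 * r * abs(math.sin((t1 - t2) / 2.0)) > side:
            hits["endpoints"] += 1
        # 2. Distance from the centre to the chord uniform on [0, r].
        d = rng.uniform(0.0, r)
        if 2.0 * math.sqrt(max(0.0, r * r - d * d)) > side:
            hits["radial"] += 1
        # 3. Chord midpoint uniform over the disk, so its distance is r * sqrt(U).
        d = r * math.sqrt(rng.random())
        if 2.0 * math.sqrt(max(0.0, r * r - d * d)) > side:
            hits["midpoint"] += 1
    return {name: count / n for name, count in hits.items()}

estimates = bertrand_probabilities()
```

The three estimates should be near 1/3, 1/2 and 1/4 respectively, which is the paradox: each scheme is a reasonable reading of "random chord".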
Exponential Distribution
Exponential Distribution
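Denoted Y ∼ Exp(λ):
sample space: y ≥ 0
pdf:
\[ f(y) = \begin{cases} \lambda e^{-\lambda y} & \text{for } y \ge 0 \\ 0 & \text{elsewhere} \end{cases} \]
cdf:
\[ F(y) = \begin{cases} \displaystyle\int_0^y \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda y} & \text{for } y \ge 0 \\ 0 & \text{for } y < 0 \end{cases} \]
moments:
\[ E(Y) = 1/\lambda, \qquad \mathrm{Var}(Y) = 1/\lambda^2, \qquad M_Y(t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda \]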
Interpretation
The exponential can be derived as the waiting time between Poisson events. Suppose that the number of
events in a unit interval of time follows a Poisson(λ) distribution. Then, let Y be the time until the first
event.
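\[ P(Y > t) = P(\text{0 events in } [0, t]) \]
But the number of events in [0, t] follows a Poisson distribution with parameter λt, for which the probability of zero events is e^{−λt}(λt)^0/0!. Hence,
\[ P(Y > t) = e^{-\lambda t}. \]
The cdf corresponding to Y is
\[ F(t) = 1 - P(Y > t) = 1 - e^{-\lambda t}, \qquad t \ge 0, \]
and hence the density is
\[ f(t) = \lambda e^{-\lambda t}, \qquad t \ge 0. \]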
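The waiting-time interpretation can be illustrated numerically. The sketch below is an approximation under an added assumption: each unit of time is cut into many small slots, and each slot independently contains an event with probability λ/(number of slots), which mimics a Poisson process with rate λ. The function name and parameter values are hypothetical.

```python
import math
import random

def first_event_time(lam, slots_per_unit, rng):
    """Time of the first event when each slot of width 1/slots_per_unit
    independently contains an event with probability lam/slots_per_unit."""
    p = lam / slots_per_unit
    k = 1
    while rng.random() >= p:  # no event in this slot, move to the next
        k += 1
    return k / slots_per_unit

lam, t = 2.0, 0.5
rng = random.Random(1)
n_trials = 20_000
times = [first_event_time(lam, 200, rng) for _ in range(n_trials)]

empirical_tail = sum(y > t for y in times) / n_trials  # estimate of P(Y > t)
exact_tail = math.exp(-lam * t)                        # e^{-lambda t}
empirical_mean = sum(times) / n_trials                 # should be near 1/lambda
```

With finer slots the discretized waiting time gets closer to an exact Exp(λ) variable, so the empirical tail frequency should approach e^{−λt}.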
Alternative parametrization
Many books write the density as
\[ f(y) = \begin{cases} \dfrac{1}{\theta} e^{-y/\theta} & \text{for } y \ge 0 \\ 0 & \text{elsewhere} \end{cases} \]
so that E(Y) = θ and Var(Y) = θ².
In this case θ = 1/λ is called the mean parameter, while λ = 1/θ is called the rate parameter.
Memoryless property
The exponential is memoryless, just like the geometric.
P (Y > s + t|Y > t) = P (Y > s)
Same interpretation as the geometric for continuous time:
The probability of an event in a time interval depends only on the length of the interval, not the
absolute time of the interval.
The underlying Poisson process is stationary: the rate λ is constant. (In the geometric, the prob. p of
getting an event in every discrete time unit is constant.)
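The memoryless property follows directly from the survival function P(Y > y) = e^{−λy} of the exponential. A short derivation, written in LaTeX notation:

```latex
\begin{align*}
P(Y > s + t \mid Y > t)
  &= \frac{P(Y > s + t,\ Y > t)}{P(Y > t)}
   = \frac{P(Y > s + t)}{P(Y > t)} \\
  &= \frac{e^{-\lambda (s + t)}}{e^{-\lambda t}}
   = e^{-\lambda s}
   = P(Y > s), \qquad s, t \ge 0.
\end{align*}
```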
Shifted exponential
Let X ∼ Exp(λ) and Y = X + v, v ∈ R.
Y has the shifted exponential distribution with pdf:
\[ f(y) = \begin{cases} \lambda e^{-\lambda (y - v)} & \text{for } y \ge v \\ 0 & \text{elsewhere} \end{cases} \]
Interpretation:
v > 0: Event is delayed
v < 0: The news of the event is delayed
Does the shifted exponential maintain the memoryless property?
Double Exponential
The double exponential distribution is formed by reflecting an exponential distribution around zero. It
has pdf:
\[ f(x) = \tfrac{1}{2} \lambda e^{-\lambda |x|}, \qquad x \in \mathbb{R} \]
Suppose X has the above distribution with λ = 1.
Now let Y = σX + μ, μ ∈ R (shifting) and σ > 0 (scaling).
Then Y has the Laplace distribution with pdf:
\[ f_Y(y) = \frac{1}{2\sigma} \exp\left( -\frac{|y - \mu|}{\sigma} \right), \qquad y \in \mathbb{R} \]
and moments
\[ EY = \mu, \qquad \mathrm{Var}\,Y = 2\sigma^2 \]
The Laplace distribution provides an alternative to the normal for centered data with fatter tails but all
finite moments.
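The construction Y = σX + μ can be checked by simulation. In this sketch (function name, seed and parameter values are illustrative assumptions), X is drawn as an Exp(1) variable with a random sign, which is one way to realize the reflection around zero:

```python
import random

def sample_laplace(mu, sigma, rng):
    """Y = sigma * X + mu, where X is a standard double exponential:
    an Exp(1) draw with a fair random sign attached."""
    x = rng.expovariate(1.0)
    if rng.random() < 0.5:
        x = -x
    return sigma * x + mu

rng = random.Random(2)
mu, sigma = 1.0, 2.0
ys = [sample_laplace(mu, sigma, rng) for _ in range(200_000)]

sample_mean = sum(ys) / len(ys)                                # near mu
sample_var = sum((y - sample_mean) ** 2 for y in ys) / len(ys)  # near 2 * sigma**2
```

The sample mean and variance should be close to μ and 2σ², matching the moments of the Laplace distribution.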
Normal Distribution
Normal Distribution
Introduced by De Moivre (1667 - 1754) in 1733 as an approximation to the binomial. Later studied by
Laplace and others as part of the Central Limit Theorem. Gauss derived the normal as a suitable
distribution for outcomes that could be thought of as sums of many small deviations.
sample space: R = (−∞, ∞)
pdf: For Y ∼ N(μ, σ²),
\[ f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2 / 2\sigma^2}, \qquad -\infty < y < \infty \]
cdf: There is no closed form.
If μ = 0 and σ² = 1, the distribution is called standard normal, and its cdf is denoted Φ:
\[ \Phi(y) = P(Y \le y), \qquad \Phi(-y) = 1 - \Phi(y) \]
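Although Φ has no closed form in elementary functions, it can be written through the error function as Φ(y) = ½[1 + erf(y/√2)], which most languages provide. A minimal sketch in Python (the function names are hypothetical; the general-normal helper relies on standardizing by μ and σ):

```python
import math

def std_normal_cdf(y):
    """Phi(y) = P(Z <= y) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def normal_cdf(y, mu, sigma):
    """P(Y <= y) for Y ~ N(mu, sigma^2), by standardizing to N(0, 1)."""
    return std_normal_cdf((y - mu) / sigma)

# Symmetry check: Phi(-y) + Phi(y) should equal 1.
symmetry_gap = abs(std_normal_cdf(-1.3) + std_normal_cdf(1.3) - 1.0)
```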
cont.
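Mean and variance:
\[ EY = \mu, \qquad \mathrm{Var}(Y) = E(Y - \mu)^2 = \sigma^2 \]
Higher central moments:
\[ E(Y - \mu)^m = \begin{cases} (m-1)(m-3) \cdots 1 \cdot \sigma^m & m \text{ even} \\ 0 & m \text{ odd} \end{cases} \]
In particular:
\[ \mu_3 = E(Y - \mu)^3 = 0, \qquad \mu_4 = E(Y - \mu)^4 = 3\sigma^4 \]
Standardization:
\[ Y \sim N(\mu, \sigma^2) \iff Z = \frac{Y - \mu}{\sigma} \sim N(0, 1) \]
\[ Z \sim N(0, 1) \iff Y = \sigma Z + \mu \sim N(\mu, \sigma^2) \]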
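The central-moment formula can be compared with a direct numerical integration of x^m against the N(0, σ²) density. This is a sketch with hypothetical function names; the integration range and step count are arbitrary choices that are more than fine enough for small m:

```python
import math

def normal_central_moment(m, sigma):
    """E(Y - mu)^m for Y ~ N(mu, sigma^2):
    (m-1)(m-3)...1 * sigma^m for even m, and 0 for odd m."""
    if m % 2 == 1:
        return 0.0
    product = 1
    for k in range(m - 1, 0, -2):
        product *= k
    return product * sigma ** m

def numeric_central_moment(m, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of the integral of x^m times the
    N(0, sigma^2) density over [-half_width*sigma, half_width*sigma]."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x ** m * math.exp(-x * x / (2.0 * sigma ** 2))
    return total * h / (math.sqrt(2.0 * math.pi) * sigma)

fourth_formula = normal_central_moment(4, 2.0)   # 3 * sigma^4
fourth_numeric = numeric_central_moment(4, 2.0)
third_numeric = numeric_central_moment(3, 2.0)   # odd moment, near zero
```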
Density integrates to 1
Theorem:
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = 1 \]
Proof:
This is not as easy as one might think. Call the integral I. Then,
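The slide leaves the rest of the proof blank. One standard continuation, assumed here and not necessarily the argument given in class, squares I and changes to polar coordinates (x = r cos θ, y = r sin θ, dx dy = r dr dθ):

```latex
\begin{align*}
I^2 &= \left( \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \right)
       \left( \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \right)
     = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
       e^{-(x^2 + y^2)/2}\, dx\, dy \\
    &= \frac{1}{2\pi} \int_{0}^{2\pi} \int_{0}^{\infty} e^{-r^2/2}\, r\, dr\, d\theta
     = \frac{1}{2\pi} \cdot 2\pi \cdot \left[ -e^{-r^2/2} \right]_{0}^{\infty}
     = 1 .
\end{align*}
```

Since the integrand is positive, I > 0, and so I² = 1 gives I = 1.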
Notes
The normal distribution is useful in many practical settings, e.g. measurement error.
It plays an important role in sampling distributions in large samples, since the Central Limit Theorem
says that sums of independent, identically distributed random variables with finite variance are
approximately normal.
There are many important distributions that can be derived from functions of normal random
variables (e.g. χ², t, F). We will see much more on this later; for now, we will briefly present the
pdfs and sample spaces of these distributions.
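The Central Limit Theorem statement can be illustrated with a small simulation. The sketch below (an illustrative choice of U[0, 1] summands, n = 30 and a fixed seed) standardizes sums of iid uniforms and checks how often they fall within ±1.96, which would be about 95% for an exact N(0, 1) variable:

```python
import math
import random

def standardized_sum(n, rng):
    """Sum of n iid U[0, 1] draws, centred and scaled to mean 0, variance 1.
    Each U[0, 1] has mean 1/2 and variance 1/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / math.sqrt(n / 12.0)

rng = random.Random(3)
draws = [standardized_sum(30, rng) for _ in range(50_000)]

# Fraction of standardized sums inside [-1.96, 1.96]
coverage = sum(abs(z) <= 1.96 for z in draws) / len(draws)
```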
χ² distribution
If Z ∼ N(0, 1), then X = Z² has the χ² distribution with 1 degree of freedom.
More generally, we have the χ² distribution with ν degrees of freedom with pdf:
\[ f(x) = \frac{(x/2)^{\nu/2 - 1}\, e^{-x/2}}{2\,\Gamma(\nu/2)}, \qquad x > 0 \]
where Γ(a) is the complete gamma function,
\[ \Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\, dx \]
The χ²(ν) distribution is a special case of the gamma distribution, so it is easier to derive its properties
from the gamma.
Student’s t and F distributions
Y has a t_ν distribution (t with ν degrees of freedom) if its pdf can be written as:
\[ f(y) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \cdot \frac{1}{(1 + y^2/\nu)^{(\nu+1)/2}}, \qquad -\infty < y < \infty \]
Y has an F(ν1, ν2) distribution if its pdf can be written as:
\[ f(y) = \frac{(\nu_1/\nu_2)\,\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1 y/\nu_2)^{\nu_1/2 - 1}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)\,(1 + \nu_1 y/\nu_2)^{(\nu_1 + \nu_2)/2}}, \qquad 0 \le y < \infty \]
There are many important properties and relationships between these three distributions (e.g. χ²(k) is the
distribution of the sum of the squares of k independent standard normals). We'll come back to these in a
few weeks when we do sampling distributions and transformations of the normal distribution.
More Distributions
Gamma distribution
Notation: Y ∼ gamma(a, λ).
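pdf:
\[ f(y) = \frac{\lambda e^{-\lambda y} (\lambda y)^{a-1}}{\Gamma(a)}, \qquad y \ge 0 \]
where Γ(a) is the complete gamma function,
\[ \Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\, dx \]
cdf: In general, there is no closed form, unless a is an integer.
moments:
\[ E(Y) = a/\lambda, \qquad \mathrm{Var}(Y) = a/\lambda^2 \]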
Notes
The special case a = 1 corresponds to an exponential(λ).
Can be thought of as a flexible generalization of the exponential (a can be interpreted as a shape
parameter).
The special case gamma(n/2, 1/2), for integer n, corresponds to the χ² distribution with n degrees
of freedom.
We will see later in the class that, for integer a, the gamma(a, λ) distribution can be derived as the
distribution of the sum of a independent exponential(λ) random variables.
When a is an integer, the gamma(a, λ) distribution can be derived as the distribution of the time until
the occurrence of the ath event in a Poisson process with rate λ.
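For integer a, the sum-of-exponentials representation gives a simple way to simulate a gamma(a, λ) variable and to check the moments a/λ and a/λ². The following is a minimal sketch; the function name, seed and parameter values are illustrative assumptions:

```python
import random

def gamma_as_exponential_sum(a, lam, rng):
    """Draw from gamma(a, lam), for integer a, as the sum of a
    independent exponential(lam) waiting times (lam is the rate)."""
    return sum(rng.expovariate(lam) for _ in range(a))

rng = random.Random(4)
a, lam = 3, 2.0
ys = [gamma_as_exponential_sum(a, lam, rng) for _ in range(100_000)]

sample_mean = sum(ys) / len(ys)                                  # near a / lam
sample_var = sum((y - sample_mean) ** 2 for y in ys) / len(ys)   # near a / lam**2
```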
Beta distribution
Notation: Y ∼ beta(a, b).
sample space: [0,1]
pdf:
\[ f(y) = \frac{y^{a-1} (1 - y)^{b-1}}{B(a, b)}, \qquad 0 \le y \le 1 \]
where B(a, b) is the (complete) Beta function,
\[ B(a, b) = \int_0^1 x^{a-1} (1 - x)^{b-1}\, dx = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}, \]
and Γ(a) is the complete gamma function.
Note that if a and b are integers, then B(a, b) can be calculated in closed form.
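The gamma-function identity makes B(a, b) easy to evaluate. For integer arguments it reduces to factorials, B(a, b) = (a − 1)!(b − 1)!/(a + b − 1)!. A small sketch with hypothetical function names, using Python's `math.gamma`:

```python
import math

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(y, a, b):
    """Density of beta(a, b) at y; zero outside [0, 1].
    Assumes a, b >= 1 so the density is finite at the endpoints."""
    if not 0.0 <= y <= 1.0:
        return 0.0
    return y ** (a - 1) * (1.0 - y) ** (b - 1) / beta_function(a, b)

# Integer case: B(2, 3) = 1! * 2! / 4! = 1/12
b23 = beta_function(2, 3)
factorial_form = math.factorial(1) * math.factorial(2) / math.factorial(4)
```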
cont.
cdf: In general, there is no closed form, except if a and b are integers.
moments:
\[ E(Y) = \frac{a}{a + b}, \qquad \mathrm{Var}(Y) = \frac{ab}{(a + b)^2 (a + b + 1)} \]
The beta distribution is very flexible, and can take a wide variety of shapes by varying its parameters.
Special case: beta(1, 1) = U (0, 1).
Cauchy distribution
This is a famous distribution to mathematical statisticians, since it often serves as a useful
counterexample.
pdf:
\[ f(y) = \frac{1}{\pi\sigma} \cdot \frac{1}{1 + (y - \mu)^2/\sigma^2}, \qquad -\infty < y < \infty \]
The Cauchy with μ = 0, σ = 1, corresponds to the t-distribution with 1 degree of freedom.
While the moments of the Cauchy are not defined, its quantiles are (HW).
The Cauchy is not just a pathological case. We'll see later that the ratio of two independent standard
normals is Cauchy. So ratios of observations can be problematic (e.g. BMI).
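The ratio result can be previewed by simulation. Because the moments do not exist, the sketch below compares probabilities rather than means, using the standard Cauchy cdf F(y) = 1/2 + arctan(y)/π (a known fact, not stated on the slide). The seed and sample size are arbitrary:

```python
import math
import random

def cauchy_cdf(y):
    """cdf of the standard Cauchy (mu = 0, sigma = 1)."""
    return 0.5 + math.atan(y) / math.pi

rng = random.Random(5)
n = 200_000
ratios = []
while len(ratios) < n:
    z1 = rng.normalvariate(0.0, 1.0)
    z2 = rng.normalvariate(0.0, 1.0)
    if z2 != 0.0:  # guard against a (practically impossible) zero denominator
        ratios.append(z1 / z2)

# The quartiles of the standard Cauchy are -1 and 1, so P(|R| <= 1) = 1/2.
inside_unit = sum(abs(r) <= 1.0 for r in ratios) / n
below_two = sum(r <= 2.0 for r in ratios) / n  # compare with cauchy_cdf(2.0)
```

The quantile-based comparison works even though the sample mean of the ratios does not settle down as n grows.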
* Read C-B Section 3.3 (lognormal, etc.)