Common Distributions

Binomial B(n, p)
PMF: (n choose x) p^x (1 − p)^(n−x)
MGF: (1 − p + pe^t)^n
µ = np, σ² = np(1 − p)
X + Y ∼ B(n + m, p) if X ∼ B(n, p), Y ∼ B(m, p) independent
X ∼ B(n, p), Y|X ∼ B(X, q), then Y ∼ B(n, pq)

Normal N(µ, σ²)
PDF: (1/√(2πσ²)) e^(−(1/2)((x−µ)/σ)²)
CDF: Φ((x − µ)/σ)
MGF: e^(µt + σ²t²/2)
µ, σ²
Order 2 raw moment: µ² + σ²

Exponential Exp(λ)
PDF: λe^(−λx), 0 < x < ∞
CDF: 1 − e^(−λx)
MGF: (1 − t/λ)^(−1)
µ = 1/λ, σ² = 1/λ²
X1, ..., Xn independent with rates λi, then min{X1, ..., Xn} ∼ Exp(Σ λi)
Memoryless: P(T > s + t|T > s) = P(T > t)

Poisson Pois(λ)
PMF: λ^x e^(−λ)/x!
MGF: e^(λ(e^t − 1))
µ = λ, σ² = λ
Expected number of occurrences in a given interval: λ; the PMF gives the probability of exactly x occurrences
If Xi ∼ Pois(λi) independent, then Σ Xi ∼ Pois(Σ λi)

Uniform U(a, b)
PDF: 1/(b − a)
CDF: (x − a)/(b − a)
MGF: (e^(tb) − e^(ta))/(t(b − a))
µ = (a + b)/2, σ² = (b − a)²/12
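The two binomial properties above can be sanity-checked by simulation; a minimal sketch (the values n = 20, p = 0.6, q = 0.5 are illustrative, not from the source):

```python
import numpy as np

# Thinning property: X ~ B(n, p), Y|X ~ B(X, q)  =>  Y ~ B(n, pq).
rng = np.random.default_rng(0)
n, p, q, reps = 20, 0.6, 0.5, 200_000

X = rng.binomial(n, p, size=reps)   # X ~ B(n, p)
Y = rng.binomial(X, q)              # Y | X ~ B(X, q), one draw per X

# Compare simulated moments of Y with the B(n, pq) moments.
mean_theory = n * p * q                  # np*q = 6.0
var_theory = n * p * q * (1 - p * q)     # npq(1-pq) = 4.2
print(Y.mean(), mean_theory)
print(Y.var(), var_theory)
```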
Probability Laws and Properties
Total Probability: P(A) = Σ_(i=1)^n P(A|Ci)P(Ci)
Total Covariance: Cov(X, Y) = E[Cov(X, Y|Z)] + Cov(E[X|Z], E[Y|Z])
Total Variance: Var(Y) = E[Var(Y|X)] + Var(E[Y|X])
Total Expectation: E(X) = Σ_y E(X|Y = y)P(Y = y) = E(E(X|Y))
Variance: Var(Y) = E[Y²] − E[Y]²
Covariance: Cov(X, Y) = E(XY) − E(X)E(Y)
Exp. of Cond. Var.: E[Var(X2|X1)] = E(X2²) − E[E(X2|X1)²]
Cond. Variance: Var(Y|X) = E[Y²|X] − E[Y|X]²
Bayes' Theorem: P(Ck|A) = P(A|Ck)P(Ck) / Σ_(i=1)^n P(A|Ci)P(Ci)
Implied by Probability Space:
P(B) = P(A ∩ B) + P(A^c ∩ B)
A ⊂ B ⇒ P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
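The law of total variance can be verified exactly on a small discrete example; the distribution below is an arbitrary illustration:

```python
# Illustrative example: X uniform on {0, 1}; given X = x,
# Y is uniform on {x, x + 1, x + 4}.  Check that
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) holds exactly.
px = {0: 0.5, 1: 0.5}
support = {x: [x, x + 1, x + 4] for x in px}

# Unconditional moments by enumeration.
EY = sum(px[x] * sum(support[x]) / 3 for x in px)
EY2 = sum(px[x] * sum(v * v for v in support[x]) / 3 for x in px)
var_Y = EY2 - EY ** 2                        # Var(Y) = E[Y^2] - E[Y]^2

# Conditional pieces.
Eg = {x: sum(support[x]) / 3 for x in px}    # E[Y|X=x]
Vg = {x: sum(v * v for v in support[x]) / 3 - Eg[x] ** 2 for x in px}
E_var = sum(px[x] * Vg[x] for x in px)       # E[Var(Y|X)]
EEg = sum(px[x] * Eg[x] for x in px)
var_E = sum(px[x] * (Eg[x] - EEg) ** 2 for x in px)  # Var(E[Y|X])

print(var_Y, E_var + var_E)   # the two sides agree
```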
Transformation
Single Variable: Y = g(X), fY(y) = fX(g^(−1)(y)) |d g^(−1)(y)/dy|
Two Variable: g(y1, y2) = f(x1, x2) |J|, where J = det [∂x1/∂y1, ∂x1/∂y2; ∂x2/∂y1, ∂x2/∂y2] and each xi is written in terms of y.
Support needs to be transformed too!
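The single-variable formula can be checked numerically: the derived fY must integrate to 1 over the transformed support. A sketch with the illustrative choice X ∼ Exp(1) and the monotone map Y = g(X) = X²:

```python
import math

# With g(x) = x^2 on (0, inf): g^{-1}(y) = sqrt(y), |d g^{-1}/dy| = 1/(2 sqrt(y)),
# so f_Y(y) = exp(-sqrt(y)) / (2 sqrt(y)) on (0, inf).
def f_Y(y):
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y))

# Midpoint-rule integral on [~0, 400] (the tail beyond 400 is negligible).
a, b, m = 1e-8, 400.0, 200_000
h = (b - a) / m
total = sum(f_Y(a + (i + 0.5) * h) for i in range(m)) * h
print(total)   # close to 1
```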
Inequalities
Boole's: P(∪_(i=1)^n Ci) ≤ Σ_(i=1)^n P(Ci)
Bonferroni: min{P(C1), P(C2)} ≥ P(C1 ∩ C2) ≥ P(C1) + P(C2) − 1
Markov: P(|X| ≥ a) ≤ E(|X|)/a
Chebyshev's: P(|(X − µX)/σX| > b) ≤ 1/b²
Jensen's: φ is a convex function ⇒ φ(E[X]) ≤ E[φ(X)]
Cauchy-Schwarz: E(XY)² ≤ E(X²)E(Y²)
Var. of Exp.: Var[E(X2|X1)] ≤ Var(X2)
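Markov's and Chebyshev's inequalities are distribution-free, so they also hold for any empirical sample; a quick check on exponential draws (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)

# Markov: P(|X| >= a) <= E|X| / a, applied to the empirical distribution.
a = 5.0
markov_lhs = np.mean(np.abs(x) >= a)
markov_rhs = np.mean(np.abs(x)) / a
print(markov_lhs, "<=", markov_rhs)

# Chebyshev: P(|X - mu|/sigma > b) <= 1/b^2, with mu, sigma from the sample.
mu, sigma = x.mean(), x.std()
b = 3.0
cheb_lhs = np.mean(np.abs(x - mu) / sigma > b)
print(cheb_lhs, "<=", 1.0 / b ** 2)
```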
Properties of (Multivariate) Normal Distribution
Given X ∼ Nn(µ, Σ):
PDF: f(x) = (2π)^(−n/2) |Σ|^(−1/2) e^(−(1/2)(x−µ)′Σ^(−1)(x−µ))
MGF: e^(t′µ + (1/2)t′Σt)
Y = A_(m×n) X + b ⇒ Y ∼ Nm(Aµ + b, AΣA′)
Cov(Xi, Xj) = 0 ⇔ Xi ⊥⊥ Xj for i ≠ j
X1|X2 ∼ Nm(E(X1|X2), Var(X1|X2)), where E(X1|X2) = µ1 + Σ12 Σ22^(−1)(x2 − µ2) and Var(X1|X2) = Σ11 − Σ12 Σ22^(−1) Σ21
If X1, X2, ... ∼ N(µi, σi²) are independent and Y = Σ_(i=1)^n αi Xi, then Y ∼ N(Σ_(i=1)^n αi µi, Σ_(i=1)^n αi² σi²)
If X̄ = (1/n) Σ_(i=1)^n Xi, then X̄ ∼ N(µ, σ²/n)

Common Distributions Continued...

Gamma(α, β)
PDF: x^(α−1) e^(−x/β) / (Γ(α) β^α), 0 < x < ∞
MGF: (1 − βt)^(−α)
µ = αβ, σ² = αβ²

Multinomial(n; p1, ..., pk)
PMF: (n!/(x1!···xk!)) p1^(x1) ··· pk^(xk)
MGF: (Σ_(i=1)^k pi e^(ti))^n
µi = npi, σi² = npi(1 − pi)

Chi-square χ²(r)
PDF: x^(r/2 − 1) e^(−x/2) / (Γ(r/2) 2^(r/2))
MGF: (1 − 2t)^(−r/2)
µ = r, σ² = 2r

Pareto(α, β)
PDF: βα^β / x^(β+1), α < x < ∞
CDF: 1 − (α/x)^β
µ = βα/(β − 1), σ² = βα²/((β − 2)(β − 1)²)

Independence
X ⊥⊥ Y iff
∃ g(x), h(y) s.t. f(x, y) = g(x)h(y) ⇔
P(a < X < b, c < Y < d) = P(a < X < b)P(c < Y < d) ⇔
M(tx, ty) = M(tx, 0)M(0, ty) ⇔
FXY(x, y) = FX(x)FY(y) ⇔
fXY(x, y) = fX(x)fY(y) ⇔
E[u(X)v(Y)] = E[u(X)]E[v(Y)] for all functions u, v
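The multivariate-normal transformation property Y = Ax + b ⇒ Y ∼ Nm(Aµ + b, AΣA′) can be checked by simulation; µ, Σ, A, b below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])
b = np.array([0.5, -1.0])

# Sample X ~ N3(mu, Sigma), transform, and compare sample moments
# of Y with the claimed N2(A mu + b, A Sigma A') moments.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

print(Y.mean(axis=0), A @ mu + b)
print(np.cov(Y.T), A @ Sigma @ A.T)
```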
Conditional Independence
1) X ⊥⊥ A, B ⇒ X ⊥⊥ A, X ⊥⊥ B
2) X ⊥⊥ A|B, X ⊥⊥ B ⇒ X ⊥⊥ A, B
3) X ⊥⊥ A|B, X ⊥⊥ B|A ⇒ X ⊥⊥ A, B
4) X ⊥⊥ Y|Z, U is a function of X, then i) U ⊥⊥ Y|Z and ii) X ⊥⊥ Y|(Z, U)
Definitions and Theorems
σ-Algebra
1) Nonempty: S ∈ Γ ⇒ ∅ ∈ Γ
2) Closed under Complementation: A ∈ Γ ⇒ Ac ∈ Γ
3) Closed under Countable Unions: A1, A2, ... ∈ Γ ⇒ ∪_(i=1)^∞ Ai ∈ Γ
Kolmogorov Axioms of a Probability Measure
1) ∀A ∈ Γ, P (A) ≥ 0
2) P (S) = 1
3) ∀{Ai}_(i=1)^∞ in Γ s.t. Ai ∩ Aj = ∅ for i ≠ j, P(∪_(i=1)^∞ Ai) = Σ_(i=1)^∞ P(Ai)
(Delta Method proof, continued) Since Yn −→P θ, √n·Remainder −→P 0. By Slutsky's Theorem, √n[g(Yn) − g(θ)] has the same limiting distribution as g′(θ)√n(Yn − θ).
Slutsky’s Theorem
If Xn −→D X and Yn −→P a, then Yn Xn −→D aX and Xn + Yn −→D X + a.
Def. of Convergence in Probability
Xn −→P X if ∀ε > 0, lim_(n→∞) P(|Xn − X| ≥ ε) = 0, or equivalently lim_(n→∞) P(|Xn − X| < ε) = 1.
Weak Law of Large Numbers
Let {Xn} be iid with mean µ and σ² < ∞; then (1/n) Σ_(i=1)^n Xi −→P µ.
2) Xn −→P X, then aXn −→P aX
3) Xn −→P a and g(·) is continuous at a, then g(Xn) −→P g(a)
4) Xn −→P X, Yn −→P Y, then Xn Yn −→P XY
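The WLLN can be illustrated by watching the sample mean settle near µ as n grows; a sketch with Bernoulli(0.3) draws (an illustrative choice):

```python
import numpy as np

# Sample means of n iid Bernoulli(0.3) draws for increasing n;
# by the WLLN they should concentrate around mu = 0.3.
rng = np.random.default_rng(3)
mu = 0.3
means = {}
for n in (100, 10_000, 1_000_000):
    means[n] = rng.binomial(1, mu, size=n).mean()
    print(n, means[n])
```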
Def. of Convergence in Distribution and Additional Theorems
FXn and FX are the cdfs of Xn and X. Xn −→D X if lim_(n→∞) FXn(x) = FX(x) ∀x ∈ C(FX) (the set of all points where FX is continuous).
1) Xn −→P X, then Xn −→D X
2) Xn −→D b, then Xn −→P b
3) Xn −→D X, Yn −→D 0, then Xn + Yn −→D X
4) Xn −→D X, g(·) is continuous on the support of X, then g(Xn) −→D g(X)
Moment Generating Function Technique
If MXn(t) exists for {Xn} on −h < t < h for all n, and X has MGF M(t) which exists for |t| ≤ h1 ≤ h, then lim_(n→∞) MXn(t) = M(t) for |t| ≤ h1 implies Xn −→D X.
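The MGF technique in action: for Xn ∼ B(n, λ/n), the binomial MGF converges pointwise to the Poisson MGF, so Xn −→D Pois(λ). A numerical sketch (λ and t below are illustrative):

```python
import math

# Binomial MGF (1 - p + p e^t)^n with p = lam/n should approach the
# Poisson MGF e^{lam(e^t - 1)} as n grows.
lam, t = 4.0, 0.5
target = math.exp(lam * (math.exp(t) - 1.0))
for n in (10, 100, 10_000):
    p = lam / n
    mgf_n = (1.0 - p + p * math.exp(t)) ** n
    print(n, mgf_n, target)   # mgf_n approaches target
```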
Misc.
Inverse of 2×2: A = [a b; c d], A^(−1) = (1/(ad − bc)) [d −b; −c a]
Binomial Formula: (a + b)^n = Σ_(x=0)^n (n choose x) a^x b^(n−x)
Gamma Function: Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx
Sum of Cubes: a³ + b³ = (a + b)(a² − ab + b²)
Difference of Cubes: a³ − b³ = (a − b)(a² + ab + b²)
Linearity in E(Y|X): E(Y|X) = µY + ρ(σY/σX)(x − µX); E[Var(Y|X)] = σY²(1 − ρ²)
Convolution: Z = X + Y, fZ(z) = ∫_(−∞)^∞ fX(z − y)fY(y) dy
Calculus (Leibniz rule): d/dy ∫_(h(y))^(g(y)) f(x) dx = f(g(y))g′(y) − f(h(y))h′(y)
∫ x e^(−ax) dx = −e^(−ax)(ax + 1)/a² + C
∫ e^(−ax) dx = −e^(−ax)/a + C
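Two of the identities above can be verified numerically; the values used below are arbitrary illustrations:

```python
import math

# Inverse of a 2x2 matrix: A^{-1} = (1/(ad - bc)) [[d, -b], [-c, a]].
a, b, c, d = 3.0, 1.0, 2.0, 5.0
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]
# A @ A^{-1} should be the identity.
prod = [[a * inv[0][0] + b * inv[1][0], a * inv[0][1] + b * inv[1][1]],
        [c * inv[0][0] + d * inv[1][0], c * inv[0][1] + d * inv[1][1]]]
print(prod)

# Antiderivative check: F(x) = -e^{-kx}(kx + 1)/k^2 for x e^{-kx};
# compare F(1) - F(0) with a midpoint-rule integral on [0, 1].
k = 2.0
F = lambda x: -math.exp(-k * x) * (k * x + 1.0) / k ** 2
m = 100_000
h = 1.0 / m
riemann = sum((i + 0.5) * h * math.exp(-k * (i + 0.5) * h) for i in range(m)) * h
print(F(1.0) - F(0.0), riemann)
```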
Convergence Concepts
Consistency of Extremum Estimators Theorem
Assume the following:
1) Θ is a compact set in R^K
2) For some function Q : Θ → R, plim_(N→∞) sup_(θ∈Θ) |QN(θ) − Q(θ)| = 0
3) The function Q is continuous
4) Q is uniquely maximized over Θ at θ = θ*
Then plim_(N→∞) θ̂N = θ*
Useful Results/Strategies
Delta Method
Assume the sequence Yn satisfies √n(Yn − θ) −→D N(0, σ²). Given a function g(·) and a specific value θ, if g′(θ) exists and g′(θ) ≠ 0, then √n[g(Yn) − g(θ)] −→D N(0, σ²(g′(θ))²).
PF: The Taylor expansion of g(Yn) around Yn = θ is g(Yn) = g(θ) + g′(θ)(Yn − θ) + Remainder.
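A simulation sketch of the Delta Method with the illustrative choices Xi ∼ Exp(1) (so µ = σ² = 1) and g(x) = x², where the limiting variance should be σ²(g′(µ))² = 4:

```python
import numpy as np

# For each replication, compute sqrt(n)(g(Xbar) - g(mu)) with g(x) = x^2;
# by the Delta Method this should look roughly N(0, 4) for large n.
rng = np.random.default_rng(4)
n, reps = 1_000, 10_000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar ** 2 - 1.0)

print(z.mean(), z.var())   # mean near 0, variance near 4
```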
Results from WLLN
1) Xn −→P X, Yn −→P Y, then Xn + Yn −→P X + Y
Central Limit Theorem
√n(X̄n − µ) −→D N(0, σ²), where X̄n = (1/n) Σ_(i=1)^n Xi
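A CLT sketch with Uniform(0, 1) draws (an illustrative choice), where µ = 1/2 and σ² = 1/12:

```python
import numpy as np

# Standardized sums sqrt(n)(Xbar - mu) should have mean near 0 and
# variance near sigma^2 = 1/12 across many replications.
rng = np.random.default_rng(5)
n, reps = 1_000, 10_000
xbar = rng.random((reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 0.5)

print(z.mean(), z.var())   # mean near 0, variance near 1/12
```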
Distributions of Ordinal Statistics
Suppose X ⊥⊥ Y with cdfs FX, FY, and Z1 = min{X, Y}, Z2 = max{X, Y}:
1) FZ1(t) = 1 − (1 − FX(t))(1 − FY(t))
2) FZ2(w) = FX(w)FY(w)
3) FZ1Z2(t, w) = FX(w)FY(w) if w < t; FX(t)FY(w) + FX(w)FY(t) − FX(t)FY(t) if t ≤ w
4) fZ1Z2(t, w) = fX(t)fY(w) + fX(w)fY(t) if t ≤ w
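Formula 1) can be checked empirically; the choices X ∼ Exp(1) and Y ∼ U(0, 1) below are illustrative:

```python
import numpy as np

# Compare the empirical cdf of Z1 = min{X, Y} at t with
# 1 - (1 - FX(t))(1 - FY(t)) for independent X ~ Exp(1), Y ~ U(0,1).
rng = np.random.default_rng(6)
N, t = 200_000, 0.4
X = rng.exponential(1.0, size=N)
Y = rng.random(N)
Z1 = np.minimum(X, Y)

FX = 1.0 - np.exp(-t)   # Exp(1) cdf at t
FY = t                  # U(0,1) cdf at t
theory = 1.0 - (1.0 - FX) * (1.0 - FY)
empirical = np.mean(Z1 <= t)
print(empirical, theory)
```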
Identification
1) Multiply by a constant or add/subtract a constant
to show distributions remain the same
2) Check if mean/variance are different if you assume something
is different
3) Choose different obs. x and x0 ; take difference; assume
cdfs are the same; conclude parameter is the same
4) Take the difference of moments
5) Assume parameters are different; conclude cdf is different
Partial Identification
1) If linear, assume norm = 1 for vector of coefficients
2) Assume inverse cdf is linear (get slope and intercept terms)
3) Assume first vector element = 1 (identify a cdf)
4) Assume two unknown parameters are the same
5) Assume a function returns 1 or 0 when x = 0
6) Assume no constant term
7) Specify support (e.g. {1} × R²; use to get mean/variance)
Solutions to HW/Past Exams