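Common Distributions

Binomial
PMF: $\binom{n}{x} p^x (1-p)^{n-x}$
$\mu = np$, $\sigma^2 = np(1-p)$, $M_X(t) = (1 - p + pe^t)^n$

Poisson
PMF: $\dfrac{\lambda^x e^{-\lambda}}{x!}$
$\mu = \lambda$, $\sigma^2 = \lambda$, $M_X(t) = e^{\lambda(e^t - 1)}$

Exponential
PDF: $\lambda e^{-\lambda x}$, $0 < x < \infty$
CDF: $1 - e^{-\lambda x}$
$\mu = \dfrac{1}{\lambda}$, $\sigma^2 = \dfrac{1}{\lambda^2}$, $M_X(t) = \left(1 - \dfrac{t}{\lambda}\right)^{-1}$

Uniform
PDF: $\dfrac{1}{b-a}$
CDF: $\dfrac{x-a}{b-a}$
$\mu = \dfrac{a+b}{2}$, $\sigma^2 = \dfrac{(b-a)^2}{12}$, $M_X(t) = \dfrac{e^{tb} - e^{ta}}{t(b-a)}$

Common Distributions Continued...

Normal
PDF: $\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
CDF: $\Phi\!\left(\dfrac{x-\mu}{\sigma}\right)$
Mean $\mu$, variance $\sigma^2$, $M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$

Gamma
PDF: $\dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}$
$\mu = \alpha\beta$, $\sigma^2 = \alpha\beta^2$, $M_X(t) = (1 - \beta t)^{-\alpha}$

Multinomial
PMF: $\dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$
$E(X_i) = np_i$, $Var(X_i) = np_i(1-p_i)$, $M(t) = \left(\sum_{i=1}^{k} p_i e^{t_i}\right)^n$

Pareto
PDF: $\dfrac{\beta\alpha^{\beta}}{x^{\beta+1}}$, $\alpha < x < \infty$
CDF: $1 - \left(\dfrac{\alpha}{x}\right)^{\beta}$
$\mu = \dfrac{\beta\alpha}{\beta-1}$, $\sigma^2 = \dfrac{\beta\alpha^2}{(\beta-2)(\beta-1)^2}$

Chi-square
PDF: $\dfrac{1}{\Gamma(r/2)\, 2^{r/2}}\, x^{\frac{r}{2}-1} e^{-\frac{x}{2}}$
$\mu = r$, $\sigma^2 = 2r$, $M_X(t) = (1 - 2t)^{-\frac{r}{2}}$

Special Properties

Binomial
If $X \sim B(n, p)$ and $Y \sim B(m, p)$ are independent, then $X + Y \sim B(n + m, p)$.
If $X \sim B(n, p)$ and $Y|X \sim B(X, q)$, then $Y \sim B(n, pq)$.

Normal
Order 2 raw moment: $\mu^2 + \sigma^2$

Exponential
If $X_1, ..., X_n$ are independent with rates $\lambda_1, ..., \lambda_n$, then $\min\{X_1, ..., X_n\} \sim Exp\left(\sum \lambda_i\right)$.
Memoryless: $P(T > s + t \mid T > s) = P(T > t)$

Poisson
Expected number of occurrences in a given interval: $\lambda$
The PMF gives the probability that there are exactly $x$ occurrences.
If the $X_i \sim Pois(\lambda_i)$ are independent, then $\sum X_i \sim Pois\left(\sum \lambda_i\right)$.

Properties of (Multivariate) Normal Distribution

Given $X \sim N_n(\mu, \Sigma)$:
PDF: $f(x) = (2\pi)^{-\frac{n}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}$
MGF: $e^{t'\mu + \frac{1}{2} t'\Sigma t}$
$Y = A_{m \times n} X + b \Rightarrow Y \sim N_m(A\mu + b, A\Sigma A')$
$Cov(X_i, X_j) = 0 \Leftrightarrow X_i \perp\!\!\!\perp X_j$ for $i \neq j$
$X_1 | X_2 \sim N_m(E(X_1|X_2), Var(X_1|X_2))$, where
$E(X_1|X_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$
$Var(X_1|X_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
If $X_1, X_2, ...$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$ and $Y = \sum_{i=1}^{n} \alpha_i X_i$, then $Y \sim N\left(\sum_{i=1}^{n} \alpha_i\mu_i, \sum_{i=1}^{n} \alpha_i^2\sigma_i^2\right)$.
If $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ with $X_i$ iid $N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Probability Laws and Properties

Total Probability: $P(A) = \sum_{i=1}^{n} P(A|C_i)P(C_i)$
Total Covariance: $Cov(X, Y) = E[Cov(X, Y|Z)] + Cov(E[X|Z], E[Y|Z])$
Total Variance: $Var(Y) = E[Var(Y|X)] + Var(E[Y|X])$
Total Expectation: $E(X) = \sum_{y} E(X|Y = y)P(Y = y) = E(E(X|Y))$
Variance: $Var(Y) = E[Y^2] - E[Y]^2$
Covariance: $Cov(X, Y) = E(XY) - E(X)E(Y)$
Exp. of Cond. Var.: $E[Var(X_2|X_1)] = E(X_2^2) - E[E(X_2|X_1)^2]$
Cond. Variance: $Var(Y|X) = E[Y^2|X] - E[Y|X]^2$
Bayes' Theorem: $P(C_k|A) = \dfrac{P(A|C_k)P(C_k)}{\sum_{i=1}^{n} P(A|C_i)P(C_i)}$
Imp. by Prob. Space:
$P(B) = P(A \cap B) + P(A^c \cap B)$
$A \subset B \Rightarrow P(A) \leq P(B)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Transformation

Single Variable: $Y = g(X)$, $f_Y(y) = f_X(g^{-1}(y)) \left|\dfrac{d g^{-1}(y)}{dy}\right|$
Two Variable: $g(y_1, y_2) = f(x_1, x_2) \left|\det\begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix}\right|$, where $x_i$ is in terms of $y$.
Support needs to be transformed too!

Inequalities

Boole's: $P\left(\cup_{i=1}^{n} C_i\right) \leq \sum_{i=1}^{n} P(C_i)$
Bonferroni: $\min\{P(C_1), P(C_2)\} \geq P(C_1 \cap C_2) \geq P(C_1) + P(C_2) - 1$
Markov: $P(|x| \geq a) \leq \dfrac{E(|x|)}{a}$
Chebyshev's: $P\left(\left|\dfrac{x - \mu_x}{\sigma_x}\right| > b\right) \leq \dfrac{1}{b^2}$
Jensen's: if $\phi$ is a convex function, $\phi(E[X]) \leq E[\phi(X)]$
Cauchy-Schwarz: $E(XY)^2 \leq E(X^2)E(Y^2)$
Var. of Exp.: $Var[E(X_2|X_1)] \leq Var(X_2)$

Independence

$X \perp\!\!\!\perp Y$ iff $\exists\, g(x), h(y)$ s.t. $f(x, y) = g(x)h(y)$
$\Leftrightarrow P(a < X < b, c < Y < d) = P(a < X < b)P(c < Y < d)$
$\Leftrightarrow M(t_x, t_y) = M(t_x, 0)M(0, t_y)$
$\Leftrightarrow F_{XY}(x, y) = F_X(x)F_Y(y)$
$\Leftrightarrow f_{XY}(x, y) = f_X(x)f_Y(y)$
$\Leftrightarrow E[u(X)v(Y)] = E[u(X)]E[v(Y)]$ for all functions $u, v$ for which the expectations exist

Conditional Independence

1) $X \perp\!\!\!\perp (A, B) \Rightarrow X \perp\!\!\!\perp A,\ X \perp\!\!\!\perp B$
2) $X \perp\!\!\!\perp A|B,\ X \perp\!\!\!\perp B \Rightarrow X \perp\!\!\!\perp (A, B)$
3) $X \perp\!\!\!\perp A|B,\ X \perp\!\!\!\perp B|A \Rightarrow X \perp\!\!\!\perp (A, B)$ (holds when the joint density is strictly positive)
4) $X \perp\!\!\!\perp Y|Z$ and $U$ is a function of $X$, then i) $U \perp\!\!\!\perp Y|Z$ and ii) $X \perp\!\!\!\perp Y|(Z, U)$

Definitions and Theorems

σ-Algebra
1) Nonempty: $S \in \Gamma \Rightarrow \emptyset \in \Gamma$
2) Closed under Complementation: $A \in \Gamma \Rightarrow A^c \in \Gamma$
3) Closed under Countable Unions: $A_1, A_2, ... \in \Gamma \Rightarrow \cup_{i=1}^{\infty} A_i \in \Gamma$

Kolmogorov Axioms of a Probability Measure
1) $\forall A \in \Gamma$, $P(A) \geq 0$
2) $P(S) = 1$
3) $\forall \{A_i\}_{i=1}^{\infty}$ in $\Gamma$ s.t. $A_i \cap A_j = \emptyset$ for $i \neq j$, $P\left(\cup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Consistency of Extremum Estimators Theorem
Let $\hat{\theta}_N$ maximize $Q_N(\theta)$ over $\Theta$. Assume the following:
1) $\Theta$ is a compact set in $R^K$.
2) For some function $Q : \Theta \to R$, $plim_{N \to \infty} \sup_{\theta \in \Theta} |Q_N(\theta) - Q(\theta)| = 0$.
3) The function $Q$ is continuous.
4) $Q$ is uniquely maximized over $\Theta$ at $\theta = \theta^*$.
Then $plim_{N \to \infty} \hat{\theta}_N = \theta^*$.

Convergence Concepts

Def. of Convergence in Probability
$X_n \xrightarrow{P} X$ if $\forall \epsilon > 0$, $\lim_{n \to \infty} P[|X_n - X| \geq \epsilon] = 0$, or equivalently $\lim_{n \to \infty} P[|X_n - X| < \epsilon] = 1$.

Weak Law of Large Numbers
Let $\{X_n\}$ be iid with mean $\mu$ and $\sigma^2 < \infty$; then $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu$.

Results from WLLN
1) $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y$, then $X_n + Y_n \xrightarrow{P} X + Y$
2) $X_n \xrightarrow{P} X$, then $aX_n \xrightarrow{P} aX$
3) $X_n \xrightarrow{P} a$ and $g(.)$ is continuous at $a$, then $g(X_n) \xrightarrow{P} g(a)$
4) $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y$, then $X_n Y_n \xrightarrow{P} XY$

Def. of Convergence in Distribution and Additional Theorems
$F_{X_n}$ and $F_X$ are the cdfs of $X_n$ and $X$. $X_n \xrightarrow{D} X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)\ \forall x \in C(F_X)$ (the set of all points where $F_X$ is continuous).
1) $X_n \xrightarrow{P} X$, then $X_n \xrightarrow{D} X$
2) $X_n \xrightarrow{D} b$ for a constant $b$, then $X_n \xrightarrow{P} b$
3) $X_n \xrightarrow{D} X$, $Y_n \xrightarrow{D} 0$, then $X_n + Y_n \xrightarrow{D} X$
4) $X_n \xrightarrow{D} X$, $g(.)$ is continuous on the support of $X$, then $g(X_n) \xrightarrow{D} g(X)$

Moment Generating Function Technique
Suppose $M_{X_n}(t)$ exists for $-h < t < h$ for all $n$, and $X$ has MGF $M(t)$, which exists for $|t| \leq h_1 \leq h$. If $\lim_{n \to \infty} M_{X_n}(t) = M(t)$ for $|t| \leq h_1$, then $X_n \xrightarrow{D} X$.

Central Limit Theorem
$\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{D} N(0, \sigma^2)$

Slutsky's Theorem
If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} a$, then $Y_n X_n \xrightarrow{D} aX$ and $X_n + Y_n \xrightarrow{D} X + a$.

Useful Results/Strategies

Delta Method
Assume the sequence $Y_n$ satisfies $\sqrt{n}(Y_n - \theta) \xrightarrow{D} N(0, \sigma^2)$. Given a function $g(.)$ and a specific value $\theta$ such that $g'(\theta)$ exists and $\neq 0$, then $\sqrt{n}[g(Y_n) - g(\theta)] \xrightarrow{D} N(0, \sigma^2 (g'(\theta))^2)$.
PF: The Taylor expansion of $g(Y_n)$ around $Y_n = \theta$ is $g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + \text{Remainder}$. Since $Y_n \xrightarrow{P} \theta$, $\sqrt{n} \cdot \text{Remainder} \xrightarrow{P} 0$. By Slutsky's Theorem, $\sqrt{n}[g(Y_n) - g(\theta)]$ has the same limiting distribution as $g'(\theta)\sqrt{n}(Y_n - \theta)$.

Distributions of Order Statistics
Suppose $X \perp\!\!\!\perp Y$ with cdfs $F_X$, $F_Y$, and let $Z_1 = \min\{X, Y\}$, $Z_2 = \max\{X, Y\}$:
1) $F_{Z_1}(t) = 1 - (1 - F_X(t))(1 - F_Y(t))$
2) $F_{Z_2}(w) = F_X(w)F_Y(w)$
3) $F_{Z_1 Z_2}(t, w) = \begin{cases} F_X(w)F_Y(w) & \text{if } w < t \\ F_X(t)F_Y(w) + F_X(w)F_Y(t) - F_X(t)F_Y(t) & \text{if } t \leq w \end{cases}$
4) $f_{Z_1 Z_2}(t, w) = f_X(t)f_Y(w) + f_X(w)f_Y(t)$ if $t \leq w$

Identification
1) Multiply by a constant or add/subtract a constant to show distributions remain the same
2) Check if mean/variance are different if you assume something is different
3) Choose different obs. $x$ and $x'$; take difference; assume cdfs are the same; conclude parameter is the same
4) Take the difference of moments
5) Assume parameters are different; conclude cdf is different

Partial Identification
1) If linear, assume norm = 1 for vector of coefficients
2) Assume inverse cdf is linear (get slope and intercept terms)
3) Assume first vector element = 1 (identify a cdf)
4) Assume two unknown parameters are the same
5) Assume a function returns 1 or 0 when $x = 0$
6) Assume no constant term
7) Specify support (e.g. $\{1\} \times R^2$; use to get mean/variance)

Misc.

Inverse of 2x2: $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $A^{-1} = \dfrac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
Binomial Formula: $(a + b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x}$
Gamma Function: $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\, dy$
Sum of Cubes: $a^3 + b^3 = (a + b)(a^2 - ab + b^2)$
Difference of Cubes: $a^3 - b^3 = (a - b)(a^2 + ab + b^2)$
Linearity in $E(Y|X)$: $E(Y|X) = \mu_Y + \rho\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$, $E[Var(Y|X)] = \sigma_Y^2(1 - \rho^2)$
Convolution: for independent $X$, $Y$ and $Z = X + Y$, $f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)f_Y(y)\, dy$
Calculus:
$\dfrac{d}{dy}\int_{h(y)}^{g(y)} f(x)\, dx = f(g(y))g'(y) - f(h(y))h'(y)$
$\int x e^{-ax}\, dx = \dfrac{-e^{-ax}(ax + 1)}{a^2}$
$\int e^{-ax}\, dx = \dfrac{-e^{-ax}}{a}$

Solutions to HW/Past Exams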
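
As a numerical sanity check of the Delta Method stated under Useful Results/Strategies, the sketch below simulates sqrt(n)[g(Y_n) - g(θ)] and compares its sample variance with σ²(g'(θ))². It is an illustration only, not one of the course solutions: the Exponential sample, the choice g(y) = 1/y, the sample sizes, and every function and variable name are assumptions made for the example. With rate 2, θ = 1/2, σ² = 1/4 and g'(θ) = -4, so the delta method predicts a limiting variance of 4.

```python
import math
import random
import statistics
from typing import List


def delta_method_variance(sigma2: float, g_prime_at_theta: float) -> float:
    """Limiting variance of sqrt(n) * (g(Y_n) - g(theta)): sigma^2 * g'(theta)^2."""
    return sigma2 * g_prime_at_theta ** 2


def simulate_scaled_errors(rate: float, n: int, reps: int, seed: int = 0) -> List[float]:
    """Draw `reps` values of sqrt(n) * (g(Y_n) - g(theta)) with g(y) = 1 / y.

    Y_n is the mean of n iid Exponential(rate) draws, so theta = 1 / rate
    and g(theta) = rate. (Hypothetical setup chosen for illustration.)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y_n = sum(rng.expovariate(rate) for _ in range(n)) / n
        draws.append(math.sqrt(n) * (1.0 / y_n - rate))
    return draws


RATE = 2.0
# sigma^2 = 1 / rate^2 for the Exponential; g'(theta) = -1 / theta^2 = -rate^2.
THEORETICAL_VAR = delta_method_variance(sigma2=1.0 / RATE ** 2,
                                        g_prime_at_theta=-(RATE ** 2))
SIMULATED_VAR = statistics.variance(simulate_scaled_errors(RATE, n=400, reps=2000))

print("delta-method variance:", THEORETICAL_VAR)
print("simulated variance:   ", SIMULATED_VAR)
```

The simulated variance should land near the theoretical value but not on it, since n is finite and the number of replications is limited; increasing both tightens the agreement.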