x̄ = sample mean
Linear Transformation
Constants a and b: if yᵢ = a xᵢ + b for i = 1, …, n,
then ȳ = a x̄ + b
Ex: temperature in °F to °C; compute the average in °F,
then convert that average to °C
s² = sample variance = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²
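A quick numeric check of the two rules above, as a minimal Python sketch (the temperature readings are made up for illustration):

```python
import statistics

# Sample mean, sample variance (n - 1 denominator), and the
# linear-transformation rule y_i = a*x_i + b  =>  ybar = a*xbar + b.
# The temperature readings are hypothetical.
temps_f = [68.0, 71.6, 75.2, 78.8]
mean_f = statistics.mean(temps_f)
var_f = statistics.variance(temps_f)  # sample variance, divides by n - 1

def f_to_c(f):
    # linear transformation with a = 5/9, b = -160/9
    return (f - 32) * 5 / 9

mean_c = f_to_c(mean_f)                                      # convert the average
mean_c_direct = statistics.mean(f_to_c(t) for t in temps_f)  # average the conversions
print(mean_f, var_f, mean_c, mean_c_direct)
```

Converting the average gives the same result as averaging the conversions, because °F to °C is a linear transformation.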
Elements of Probability
Sample space S: set of all possible outcomes
Event E: any subset of the sample space
Union: E ∪ F, E || F
Intersect: E ∩ F, E && F, EF
Null event: ∅ has no outcomes
Complement: Eᶜ, all outcomes in S but not in E
Eᶜ and E are mutually exclusive: EEᶜ = ∅
Containment: E ⊂ F or F ⊃ E,
all outcomes in E are also in F
Example: E = {(H,H)} ⊂ F = {(H,H), (H,T), (T,H)}
Occurrence of E implies that of F
Example: getting two heads implies getting a head at all
Equality: E = F, if E ⊂ F and F ⊂ E
Commutative law: E ∪ F = F ∪ E, EF = FE
Associative law: (E ∪ F) ∪ G = E ∪ (F ∪ G), (EF)G = E(FG)
Distributive law: (E ∪ F)G = EG ∪ FG, EF ∪ G = (E ∪ G)(F ∪ G)
DeMorgan's law: (E ∪ F)ᶜ = EᶜFᶜ, (EF)ᶜ = Eᶜ ∪ Fᶜ
Axioms of Probability
Probability of event E = proportion of times E occurs
(outcome is in E) in repeated experiments
Axiomatic definition: for each event E of an experiment
with sample space S, assign a number P(E), where P(E)
satisfies the following three axioms:
0 ≤ P(E) ≤ 1
P(S) = 1
For any sequence of mutually exclusive events E₁, E₂, …,
P(∪ᵢ₌₁ⁿ Eᵢ) = Σᵢ₌₁ⁿ P(Eᵢ), n = 1, 2, …, ∞
We call P(E) the probability of event E
Example: P(Eᶜ) = 1 − P(E)
Proof: Eᶜ and E are mutually exclusive, so P(Eᶜ) + P(E) =
P(Eᶜ ∪ E) = P(S) = 1
Example: P(E ∪ F) = P(E) + P(F) − P(EF)
Proof: use a Venn diagram
Odds of event E: P(E)/P(Eᶜ) = P(E)/(1 − P(E))
p = 1/2: odds = 1, it is equally likely to occur as not
p = 3/4: odds = 3, it is 3 times as likely to occur as not
Conditional Probability
P(E|F) = P(EF)/P(F),  P(EF) = P(E)P(F|E)
Law of Total Probability
P(F) = Σᵢ₌₁ⁿ P(Eᵢ)P(F|Eᵢ)
P(F) = P(F|E)P(E) + P(F|Eᶜ)P(Eᶜ)
Proof: P(F) = P(FS) = P(F(E ∪ Eᶜ))
F(E ∪ Eᶜ) = FE ∪ FEᶜ, so P(F(E ∪ Eᶜ)) = P(FE ∪ FEᶜ)
FE and FEᶜ are mutually exclusive, so
P(FE ∪ FEᶜ) = P(FE) + P(FEᶜ)
Bayes' Formula
P(E|F) = P(E)P(F|E)/P(F)
= P(E)P(F|E) / (P(E)P(F|E) + P(Eᶜ)P(F|Eᶜ))
P(Eᵢ|F) = P(Eᵢ)P(F|Eᵢ) / Σⱼ₌₁ⁿ P(Eⱼ)P(F|Eⱼ)
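Bayes' formula can be sketched numerically; the probabilities below are made-up values chosen only to illustrate the computation:

```python
# Bayes' formula with hypothetical numbers: prior P(E) = 0.01,
# P(F|E) = 0.95, P(F|E^c) = 0.05.
p_e = 0.01
p_f_given_e = 0.95
p_f_given_ec = 0.05

# denominator via the law of total probability
p_f = p_e * p_f_given_e + (1 - p_e) * p_f_given_ec
p_e_given_f = p_e * p_f_given_e / p_f
print(p_e_given_f)
```

Even with a highly accurate "test" (0.95 vs 0.05), the posterior stays small because the prior P(E) is small.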
Independent Events
E ⊥ F, if P(EF) = P(E)P(F)
Then P(E|F) = P(EF)/P(F) = P(E)
Discrete r.v.
PMF: p(x) = P{X = x} = P{s ∈ S : X(s) = x}
CDF: F(x) = P{X ≤ x} = Σ_{y≤x} P{X = y} = Σ_{y≤x} p(y)
Continuous r.v.
PDF f: P{X ∈ B} = ∫_B f(x)dx
Independent r.v.: X ⊥ Y, if for any two sets of real
numbers A and B, P{X ∈ A, Y ∈ B} = P{X ∈ A}P{Y ∈ B}
If X ⊥ Y: F(a, b) = F_X(a)F_Y(b)
p(x, y) = p_X(x)p_Y(y), for discrete RVs
f(x, y) = f_X(x)f_Y(y), for continuous RVs
Conditional Distribution
Discrete: p_{X|Y}(x|y) = P{X = x | Y = y}
= P{X = x, Y = y}/P{Y = y} = p(x, y)/p_Y(y)
Continuous: f_{X|Y}(x|y) = f(x, y)/f_Y(y)
Expectation, discrete RV: E[X] = Σᵢ xᵢ p(xᵢ)
Continuous RV: E[X] = ∫_{−∞}^∞ x f(x)dx
E[X + Y] = E[X] + E[Y]
E[Σᵢ₌₁ⁿ Xᵢ] = Σᵢ₌₁ⁿ E[Xᵢ]
If independent, E[XY] = E[X]E[Y]
To solve optimization problems:
- choose a constant c to represent the r.v. X
MSE = Mean Squared Error, MSE(c) = E[(X − c)²]
c = E[X] = μ minimizes the MSE: MSE(μ) ≤ MSE(c) for all c
Law of the Lazy Statistician: E[g(X)] = ∫_{−∞}^∞ g(x)f(x)dx
(the distribution of g(X) is unknown, but that of X is known)
E[aX + b] = aE[X] + b
E[g(X)] ≠ g(E[X]) except in the linear case
Variance: Var(X) = E[(X − μ)²] = E[X²] − μ²
- always nonnegative, 0 iff X is constant
Var(aX + b) = a² Var(X)
Covariance
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y
Cov(X, Y) = Cov(Y, X)
Cov(X, X) = Var(X) ≥ 0
If X ⊥ Y, Cov(X, Y) = 0, but the converse may be false
Cov(aX + b, Y) = a Cov(X, Y)
Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)
Variance and Covariance
Variance of a sum of RVs:
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
If X ⊥ Y, Var(X + Y) = Var(X) + Var(Y)
For X₁, …, Xₙ, if Xᵢ ⊥ Xⱼ for all i ≠ j (mutually
independent), Var(Σᵢ₌₁ⁿ Xᵢ) = Σᵢ₌₁ⁿ Var(Xᵢ)
Correlation: Corr(X, Y) = Cov(X, Y)/√(Var(X)Var(Y))
Given an Operating Characteristic curve, we want to know the
probability of acceptance/rejection.
Split into 4 regions: accept acceptable (correct), reject
nonacceptable (hit), reject acceptable (supplier's risk,
false alarm/false positive, type I error), accept nonacceptable
(inspector's risk, miss, type II error)
As the number of products inspected goes up, the curve gets
steeper; ideally it is a step function
Poisson
Number of event occurrences during a time period
Example: arrival of customers
We observe that on average λ customers come to the
store in 1 hour
Divide 1 hour into n intervals (e.g. 3600 seconds)
On average, λ out of the n intervals have a customer arrive
When n is large, no interval has more than 1 arrival in it
Each interval has an arrival with probability λ/n
Number of arrivals in one hour: binomial distribution
p(k) = C(n, k) (λ/n)ᵏ (1 − λ/n)ⁿ⁻ᵏ
As n → ∞, p(k) → e^{−λ} λᵏ/k!
The Poisson replaces the binomial when n is large and p is small
μ = E[Bin(n, λ/n)] = λ,  σ² = λ(1 − λ/n) → λ
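The limit above can be checked numerically; this sketch compares the Binomial(n, λ/n) pmf with its Poisson(λ) limit for one arbitrary choice of λ, n and k:

```python
import math

# Binomial(n, lam/n) pmf compared with its Poisson(lam) limit;
# lam = 3 arrivals/hour, n = 3600 one-second intervals, k = 5 arrivals.
lam, n, k = 3.0, 3600, 5
p = lam / n
binom_pk = math.comb(n, k) * p**k * (1 - p)**(n - k)
poisson_pk = math.exp(-lam) * lam**k / math.factorial(k)
print(binom_pk, poisson_pk)  # nearly equal for large n
```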
Uniform: if we want the range to be [α, β],
consider Y = α + (β − α)X, with X uniform on [0, 1]
CDF: X ≤ x ⟺ Y ≤ α + (β − α)x;  Y ≤ y ⟺ X ≤ (y − α)/(β − α)
F_Y(y) = F_X((y − α)/(β − α)) = (y − α)/(β − α), if y ∈ [α, β]
PDF: f_Y(y) = (d/dy) F_Y(y) = 1/(β − α)
Mean: E[Y] = α + (β − α)E[X] = (α + β)/2
Variance: Var(Y) = (β − α)² Var(X) = (β − α)²/12
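A simulation sketch of the transformation (α = 2, β = 10 are arbitrary): the sample mean and variance of Y should approach (α + β)/2 and (β − α)²/12:

```python
import random

# Y = alpha + (beta - alpha) * X maps X ~ Uniform[0, 1] onto [alpha, beta].
random.seed(0)
alpha, beta = 2.0, 10.0
ys = [alpha + (beta - alpha) * random.random() for _ in range(100_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(mean, var)  # near (2+10)/2 = 6 and 8**2/12 ≈ 5.33
```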
Exponential has PDF f(x) = λe^{−λx}, for x ≥ 0
CDF: F(x) = ∫₀ˣ λe^{−λy} dy = −e^{−λy}|₀ˣ = 1 − e^{−λx}
For a nonnegative random variable X with tail distribution
F̄(x) = 1 − F(x):  E[X] = ∫₀^∞ F̄(x)dx
E[X] = ∫₀^∞ e^{−λx} dx = −(1/λ)e^{−λx}|₀^∞ = 1/λ
Var(X) = 1/λ²
−1 ≤ Corr(X, Y) ≤ 1; 0 correlation does not mean independence
Markov's Inequality: X is a nonnegative random variable;
then for any a > 0, P{X ≥ a} ≤ E[X]/a
Proof: E[X] = ∫₀^∞ xf(x)dx = ∫₀ᵃ xf(x)dx + ∫ₐ^∞ xf(x)dx
∫₀ᵃ xf(x)dx ≥ 0 and ∫ₐ^∞ xf(x)dx ≥ ∫ₐ^∞ af(x)dx = aP{X ≥ a},
so E[X] ≥ aP{X ≥ a}
Normal
PDF: f(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)}
If X ∼ N(μ, σ²) and Y = a + bX, then Y ∼ N(a + bμ, b²σ²)
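One way to check the exponential CDF derivation is inverse-transform sampling: solving u = F(x) = 1 − e^{−λx} gives x = −ln(1 − u)/λ, and the resulting samples should show E[X] = 1/λ and Var(X) = 1/λ². A minimal sketch (λ = 2 is arbitrary):

```python
import math
import random

# Inverse-transform sampling for Exponential(lam).
random.seed(1)
lam = 2.0
xs = [-math.log(1 - random.random()) / lam for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)  # near 1/lam = 0.5 and 1/lam**2 = 0.25
```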
Z = (X − μ)/σ, then Z ∼ N(0, 1), the standard normal dist
CDF: Φ(x)
F_X(x) = P{X ≤ x} = P{Z ≤ (x − μ)/σ} = Φ((x − μ)/σ)
Φ(−x) = P{Z ≤ −x} = P{Z ≥ x} = 1 − Φ(x)
Chebyshev's Inequality: X is a random variable with
mean μ and variance σ²; then P{|X − μ| ≥ a} ≤ σ²/a²
Proof: consider the random variable (X − μ)², which is
nonnegative. Apply Markov's inequality with a replaced by a²:
P{(X − μ)² ≥ a²} ≤ E[(X − μ)²]/a² = σ²/a²
z_α: 1 − Φ(z_α) = Φ̄(z_α) = α
z_α is the (1 − α)·100-th percentile of Z: z_α = Φ⁻¹(1 − α)
Sum of independent normal RVs is still normal:
X₁ ∼ N(μ₁, σ₁²), X₂ ∼ N(μ₂, σ₂²), X₁ ⊥ X₂
⟹ X₁ + X₂ ∼ N(μ₁ + μ₂, σ₁² + σ₂²)
What about X₁ − X₂?
−X₂ ∼ N(−μ₂, σ₂²), so X₁ − X₂ ∼ N(μ₁ − μ₂, σ₁² + σ₂²)
Weak Law of Large Numbers
Let X₁, X₂, … be a sequence of independent and
identically distributed (i.i.d.) random variables, each
having mean E[Xᵢ] = μ. Then for any ε > 0,
P{|(X₁ + X₂ + ⋯ + Xₙ)/n − μ| > ε} → 0, as n → ∞.
The average of i.i.d. RVs converges to their mean.
Proof (assuming Var(Xᵢ) = σ² exists):
E[(X₁ + X₂ + ⋯ + Xₙ)/n] = μ
Var((X₁ + X₂ + ⋯ + Xₙ)/n) = (1/n²)Var(X₁ + ⋯ + Xₙ) = σ²/n
Apply Chebyshev's inequality:
P{|(X₁ + X₂ + ⋯ + Xₙ)/n − μ| > ε} ≤ σ²/(nε²) → 0, as n → ∞
Samples: sample mean X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n
E[X̄] = E[(1/n)(X₁ + ⋯ + Xₙ)] = (1/n)(E[X₁] + ⋯ + E[Xₙ]) = μ
Var(X̄) = Var((1/n)(X₁ + ⋯ + Xₙ))
= (1/n²)(Var(X₁) + ⋯ + Var(Xₙ)) = σ²/n
Sample variance: S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
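The WLLN can be checked by simulation; this sketch uses Bernoulli(0.3) draws (an arbitrary choice) and prints how far the sample mean sits from μ = 0.3 as n grows:

```python
import random

# Weak law of large numbers: the sample mean of i.i.d. Bernoulli(0.3)
# draws concentrates around mu = 0.3 as n grows.
random.seed(2)
mu = 0.3
for n in (100, 10_000, 1_000_000):
    xbar = sum(1 if random.random() < mu else 0 for _ in range(n)) / n
    print(n, xbar, abs(xbar - mu))
```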
Continuous r.v. CDF: F(a) = ∫_{−∞}^a f(x)dx
Joint r.v.
Joint CDF: F(x, y) = P{X ≤ x, Y ≤ y}
Discrete RVs, joint PMF: p(xᵢ, yⱼ) = P{X = xᵢ, Y = yⱼ}
Marginal CDF: F_X(x) = F(x, ∞), i.e. Y can take any value
Marginal PMF: p_X(x) = Σⱼ₌₁^∞ p(x, yⱼ)
Continuous RVs, joint PDF: P{(X, Y) ∈ C} = ∬_{(x,y)∈C} f(x, y)dxdy
Joint CDF: F(a, b) = ∫_{−∞}^a ∫_{−∞}^b f(x, y)dxdy
Marginal PDF: f_X(x) = ∫_{−∞}^∞ f(x, y)dy
Bernoulli: μ = p, σ² = p(1 − p)
Binomial: sum of n i.i.d. Bernoulli r.v.s
P{X = k} = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ, for k = 0, 1, …, n
Σₖ₌₀ⁿ p(k) = Σₖ₌₀ⁿ C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ = (p + 1 − p)ⁿ = 1
μ = E[Y₁ + Y₂ + ⋯ + Yₙ] = np
σ² = Var(Y₁ + Y₂ + ⋯ + Yₙ) = np(1 − p)
Ex. Acceptance Sampling
For an arbitrary distribution with mean μ and var σ², we study:
the sample mean X̄ (E[X̄], Var(X̄));
its approximate distribution when n is large
(Central Limit Theorem);
the sample variance S² (E[S²]).
For a normal distribution N(μ, σ²): the exact distributions
of X̄ and S²
Central Limit Theorem: let X₁, X₂, …, Xₙ be a sequence of
i.i.d. random variables with mean μ and variance σ²;
then the distribution of √n(X̄ − μ)/σ converges to the
standard normal as n → ∞:
P{√n(X̄ − μ)/σ ≤ x} → Φ(x) ≈ P{Z < x}
Generally works as long as n ≥ 30
Question: approximate distribution of the sum?
X₁ + X₂ + ⋯ + Xₙ ≈ N(nμ, nσ²)
Question: approximate distribution of the sample mean?
X̄ ≈ N(μ, σ²/n)
t-distribution: if Z ⊥ χ²ₙ, then Tₙ = Z/√(χ²ₙ/n) is
t-distributed with n degrees of freedom
Recall: √n(X̄ − μ)/σ ∼ N(0, 1) and (n − 1)S²/σ² ∼ χ²ₙ₋₁, so
(X̄ − μ)/(S/√n) = [√n(X̄ − μ)/σ] / √[((n − 1)S²/σ²)/(n − 1)] ∼ tₙ₋₁
Ex (sample variance of a normal sample, n = 15, σ = 3):
P{S² > 12} = P{(14/9)S² > (14/9) · 12} = P{χ²₁₄ > 18.67} = 0.1779
Confidence intervals for a normal mean with known variance:
P{−z_{α/2} < √n(X̄ − μ)/σ < z_{α/2}} = 1 − α
P{X̄ − z_{α/2} σ/√n < μ < X̄ + z_{α/2} σ/√n} = 1 − α
Two-sided: (x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n)
One-sided upper: (x̄ − z_α σ/√n, ∞)
One-sided lower: (−∞, x̄ + z_α σ/√n)
Estimating Difference in Means
X̄ − Ȳ ∼ N(μ₁ − μ₂, σ₁²/n + σ₂²/m)
Ex: The College of Engineering decided to admit 450 students;
there is a 0.3 probability that an admitted student will
actually enroll. What is the probability that the incoming
class has more than 150 students?
Solution: Xᵢ ∼ Bernoulli(0.3), X = X₁ + X₂ + ⋯ + X₄₅₀ ∼
Binomial(450, 0.3)
P{X > 150} = P{X = 151} + P{X = 152} + ⋯ + P{X = 450}
X₁ + ⋯ + X₄₅₀ is approximately normal with mean
450 × 0.3 = 135 and standard deviation
√(450 × 0.3 × 0.7) = √94.5
Approximate the discrete r.v. (X = X₁ + ⋯ + X₄₅₀) with a
continuous r.v. (Y ∼ N(135, 94.5)); see continuity correction
Parameter Estimation: a population has a certain distribution
with unknown parameter θ
Estimate θ based on X₁, …, Xₙ
θ̂(X₁, …, Xₙ), a function of the sample, is a r.v.; it is also
a point estimator
Moment Generating Function
φ(t) = E[e^{tX}] = ∫ e^{tx} f(x)dx
The k-th moment is the k-th derivative of φ at t = 0
Estimating Difference in Means (known variances): normalize:
(X̄ − Ȳ − (μ₁ − μ₂))/√(σ₁²/n + σ₂²/m) ∼ N(0, 1)
P{−z_{α/2} ≤ (X̄ − Ȳ − (μ₁ − μ₂))/√(σ₁²/n + σ₂²/m) ≤ z_{α/2}} = 1 − α
P{X̄ − Ȳ − z_{α/2}√(σ₁²/n + σ₂²/m) ≤ μ₁ − μ₂
≤ X̄ − Ȳ + z_{α/2}√(σ₁²/n + σ₂²/m)} = 1 − α
(x̄ − ȳ − z_{α/2}√(σ₁²/n + σ₂²/m), x̄ − ȳ + z_{α/2}√(σ₁²/n + σ₂²/m))
is a 1 − α C.I. for the difference in the means of two normal
populations when the variances are known
Method of Moments
Moments can be expressed as functions of the parameters:
E[Xᵏ] = g_k(θ₁, θ₂, …, θ_m)
If there are m unknown parameters θ₁, …, θ_m, set the sample
moments equal to their theoretical values and solve:
g₁(θ₁, θ₂, …, θ_m) = μ̂₁
g₂(θ₁, θ₂, …, θ_m) = μ̂₂
⋮
g_m(θ₁, θ₂, …, θ_m) = μ̂_m
When the variances are unknown, assume them to be equal and
use the Pooled Sample Variance
S_p² = [(n − 1)S₁² + (m − 1)S₂²] / [(n − 1) + (m − 1)]
(X̄ − Ȳ − (μ₁ − μ₂)) / (S_p √(1/n + 1/m)) ∼ t_{n+m−2}
(x̄ − ȳ − t_{α/2,n+m−2} s_p √(1/n + 1/m),
x̄ − ȳ + t_{α/2,n+m−2} s_p √(1/n + 1/m)) is a 1 − α C.I. for the
difference in the means of two normal populations when the
variances are unknown but equal
Maximum Likelihood Estimators
Given parameter θ, for each observation x₁, x₂, …, xₙ we can
calculate the joint density function f(x₁, x₂, …, xₙ; θ)
x₁, x₂, …, xₙ are independent, so
f(x₁, x₂, …, xₙ; θ) = f_{X₁}(x₁; θ) f_{X₂}(x₂; θ) ⋯ f_{Xₙ}(xₙ; θ)
Given observations x₁, x₂, …, xₙ, for each θ define the
likelihood L(θ) = f(x₁, x₂, …, xₙ; θ)
The Maximum Likelihood Estimator (MLE) maximizes L(θ)
For the MLE, use the pdf of the distribution with θ replacing
the parameter as needed, and multiply together all n samples
in this new pdf; it's easier to maximize log L(θ) ≡ l(θ)
Approximate Confidence Interval: if the population is not
normal but the sample size is relatively large, use the results
for a normal population mean with known variance to obtain an
approximate 1 − α c.i.
Sample Variance S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Continuity correction: approximate the discrete r.v. with a
continuous r.v. (e.g. Y ∼ N(135, 94.5))
Question: P{X = i} ≈ P{Y ∈ ?}
P{X = i} ≈ P{i − 0.5 ≤ Y ≤ i + 0.5}
P{X₁ + ⋯ + X₄₅₀ > 150} = Σ_{i≥151} P{X₁ + ⋯ + X₄₅₀ = i}
≈ Σ_{i≥151} P{i − 0.5 ≤ Y ≤ i + 0.5} = P{Y ≥ 150.5}
P{Y ≥ 150.5} = P{Z ≥ (150.5 − 135)/√94.5}
= 1 − Φ((150.5 − 135)/√94.5) ≈ 1 − Φ(1.59) ≈ 0.06
Continuity Correction rules, when using a continuous function
to approximate a discrete one (e.g. normal approx. of binomial):
If P(X = n), use P(n − 0.5 < X < n + 0.5)
If P(X > n), use P(X > n + 0.5)
If P(X ≤ n), use P(X < n + 0.5)
If P(X < n), use P(X < n − 0.5)
If P(X ≥ n), use P(X > n − 0.5)
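A numeric sketch of the enrollment example, comparing the continuity-corrected normal approximation with the exact binomial tail:

```python
import math

# X ~ Binomial(450, 0.3): P{X > 150} = P{X >= 150.5} under the
# N(135, 94.5) approximation, vs the exact binomial tail.
n, p = 450, 0.3
mu = n * p                        # 135
sd = math.sqrt(n * p * (1 - p))   # sqrt(94.5)

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

approx = 1 - phi((150.5 - mu) / sd)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(151, n + 1))
print(approx, exact)
```

Both values come out near the 0.06 quoted in the notes.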
Sample variance S² is unbiased for σ²:
Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Σᵢ₌₁ⁿ Xᵢ² − nX̄²
E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²] = E[Σᵢ₌₁ⁿ Xᵢ²] − nE[X̄²] = nE[X₁²] − nE[X̄²]
Var(X) = E[X²] − E[X]², so E[X²] = Var(X) + E[X]²
E[X₁²] = Var(X₁) + E[X₁]² = σ² + μ²
E[X̄²] = Var(X̄) + E[X̄]² = σ²/n + μ²
E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²] = nσ² + nμ² − σ² − nμ² = (n − 1)σ²
E[S²] = σ², so S² is unbiased
The MLE of the mean of the exponential distribution is the
sample mean
Approximate C.I.: (x̄ − z_{α/2} s/√n, x̄ + z_{α/2} s/√n)
Ex. Monte Carlo Simulation
Evaluate a definite integral θ = ∫ₐᵇ g(x)dx
Consider a uniform random variable X on [a, b]
Law of the lazy statistician:
E[g(X)] = ∫ₐᵇ g(u)f_X(u)du = (1/(b − a)) ∫ₐᵇ g(u)du = θ/(b − a)
Estimate θ:
simulate uniform random variables U₁, …, U_k
θ̂ = (b − a) (Σᵢ₌₁ᵏ g(Uᵢ))/k
Let ḡ and s be the observed sample mean and standard
deviation of the g(uᵢ)'s
Approximate C.I. for E[g(X)]: (ḡ − z_{α/2} s/√k, ḡ + z_{α/2} s/√k)
MLE example: Bernoulli distribution
P{X = 1} = p = p¹(1 − p)⁰
P{X = 0} = 1 − p = p⁰(1 − p)¹
p_X(x; p) = pˣ(1 − p)¹⁻ˣ
log p_X(x; p) = x log p + (1 − x) log(1 − p)
l(p) = Σᵢ₌₁ⁿ xᵢ log p + (n − Σᵢ₌₁ⁿ xᵢ) log(1 − p)
l′(p) = (Σᵢ₌₁ⁿ xᵢ)/p − (n − Σᵢ₌₁ⁿ xᵢ)/(1 − p) = 0
(1 − p) Σᵢ₌₁ⁿ xᵢ = p(n − Σᵢ₌₁ⁿ xᵢ) = np − p Σᵢ₌₁ⁿ xᵢ
p̂(X₁, …, Xₙ) = X̄ = (Σᵢ₌₁ⁿ Xᵢ)/n
Other MLEs:
Normal: μ̂(X₁, …, Xₙ) = X̄,
σ̂(X₁, …, Xₙ) = √(Σᵢ₌₁ⁿ (Xᵢ − X̄)²/n), i.e. σ̂² = (n − 1)S²/n
Uniform on [0, θ]: θ̂(X₁, …, Xₙ) = max{X₁, …, Xₙ}
Poisson: λ̂(X₁, …, Xₙ) = X̄
Sampling from a Normal Population
X̄ ∼ N(μ, σ²/n)
Sample variance S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Chi-square distribution: χ²ₙ = Z₁² + Z₂² + ⋯ + Zₙ² with n
degrees of freedom; a sum of independent chi-squares is
chi-square
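The Monte Carlo estimator θ̂ = (b − a)·Σg(Uᵢ)/k can be sketched on a known integral; sin on [0, π] is an arbitrary test integrand with ∫₀^π sin(x)dx = 2:

```python
import math
import random

# Monte Carlo integration: theta = integral of g over [a, b],
# estimated by (b - a) times the average of g(U_i), U_i ~ Uniform[a, b].
random.seed(4)
a, b = 0.0, math.pi
k = 100_000
us = [random.uniform(a, b) for _ in range(k)]
theta_hat = (b - a) * sum(math.sin(u) for u in us) / k
print(theta_hat)  # near 2
```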
Decomposition: let Yᵢ = Xᵢ − μ, so Ȳ = X̄ − μ; then
Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Σᵢ₌₁ⁿ ((Xᵢ − μ) − (X̄ − μ))²
= Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² = Σᵢ₌₁ⁿ Yᵢ² − nȲ²
= Σᵢ₌₁ⁿ (Xᵢ − μ)² − n(X̄ − μ)²
Rearrange the terms:
Σᵢ₌₁ⁿ (Xᵢ − μ)² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² + n(X̄ − μ)²
Decomposition of the sum of squares: divide by σ²:
Σᵢ₌₁ⁿ (Xᵢ − μ)²/σ² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² + n(X̄ − μ)²/σ²
(χ² with n d.o.f.) = (n − 1)S²/σ² + (χ² with 1 d.o.f.)
Point Estimators
Biased/unbiased and precise/imprecise
Bias of estimator θ̂(X₁, …, Xₙ) for parameter θ:
b(θ̂) = E[θ̂(X₁, …, Xₙ)] − θ
Confidence Interval
2-sided, 1-sided upper/lower
Ex. normal 2-sided 95% confidence:
P{−1.96 < √n(X̄ − μ)/σ < 1.96} = 2Φ(1.96) − 1 = 0.95
⟹ P{−1.96 σ/√n < X̄ − μ < 1.96 σ/√n} = 0.95
⟹ P{X̄ − 1.96 σ/√n < μ < X̄ + 1.96 σ/√n} = 0.95
If Zᵢ = (Xᵢ − μ)/σ, then Z₁, Z₂, …, Zₙ are independent
standard normal, and Σᵢ₌₁ⁿ Zᵢ² = Σᵢ₌₁ⁿ (Xᵢ − μ)²/σ² is
chi-square with n degrees of freedom
If sampling from a normal population, then X̄ ⊥ S²:
the sample mean and variance are independent r.v.s
X̄ ∼ N(μ, σ²/n)
(n − 1)S²/σ² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² ∼ χ²ₙ₋₁
If σ is given and we have observed sample mean x̄,
(x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n) is a 95% confidence interval
estimate of μ.
Once observed, x̄ is a constant, and the interval is fixed;
95% is not a probability.
Ex: CPU time for a type of job is normally distributed with
mean 20 seconds and standard deviation 3 seconds.
15 such jobs are tested; what is the probability that the
sample variance exceeds 12?
Solution: (15 − 1)S²/3² ∼ χ²₁₄
Two-sided Confidence Interval 1 − α for normal mean
with known variance:
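A minimal sketch of the two-sided known-variance interval; the job times below are hypothetical, with σ = 3 taken as known per the CPU-time example:

```python
import math
import statistics

# Two-sided 1 - alpha CI for a normal mean with known sigma:
# (xbar - z_{alpha/2}*sigma/sqrt(n), xbar + z_{alpha/2}*sigma/sqrt(n)).
sigma = 3.0
data = [18.2, 21.5, 19.8, 22.1, 20.4, 17.9, 20.0, 21.2]  # hypothetical times
n = len(data)
xbar = statistics.mean(data)
z = statistics.NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{0.025} ≈ 1.96
half = z * sigma / math.sqrt(n)
print((xbar - half, xbar + half))
```

Once the data are observed, the printed interval is fixed; the 95% refers to the procedure, not to this particular interval.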