Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSE 312, 2011 Winter, W.L.Ruzzo 7. continuous random variables continuous random variables Discrete random variable: takes values in a finite or countable set, e.g. X ∈ {1,2, ..., 6} with equal probability X is positive integer i with probability 2-i Continuous random variable: takes values in an uncountable set, e.g. X is the weight of a random person (a real number) X is a randomly selected point inside a unit square X is the waiting time until the next packet arrives at the server 2 pdf and cdf f(x) : the probability density function (or simply “density”) f(x) a F(a) = ∫−∞ f(x) dx a b P(X < a) = F(x): the cumulative distribution function (or simply “distribution”) P(a < X < b) = F(b) - F(a) +∞ Need ∫-∞ f(x) dx (= F(+∞)) = 1 A key relationship: d F(x), since F(a) = ∫a f(x) dx, f(x) = dx −∞ 3 densities Densities are not probabilities; e.g. may be > 1 P(x = a) = P(a ≤ X ≤ a) = F(a)-F(a) = 0 P(a ≤ X ≤ a+ε) = F(a+ε) - F(a) ≈ ε • f(a) I.e., the probability that a continuous random variable falls at a specified point is zero The probability that it falls near that point is proportional to the density 4 sums and integrals; expectation 5 example 6 properties of expectation still true, just as for discrete just as for discrete, but w/integral (e.g., see slides 25-27) ^ 7 variance 8 example 9 continuous random variables: summary Continuous random variable X has density f(x), and uniform random variable X ~ Uni(α,β) is uniform in [α,β] 1.0 0.5 0.0 f(x) 1.5 2.0 The Uniform Density Function Uni(0.5,1.0) 0.0 0.5 α 1.0 x β 1.5 uniform random variable 2.0 The Uniform Density Function Uni(0.5,1.0) 1.0 0.0 0.5 f(x) 1.5 X ~ Uni(α,β) is uniform in [α,β] 0.0 0.5 1.0 x if α≤a≤b≤β: 1.5 uniform random variable: example X ~ Uni(α,β) is uniform in [α,β] You want to read a disk sector from a 7200rpm disk drive. Let T be the time you wait, in milliseconds, after the disk head is positioned over the correct track, until the desired sector rotates under the head. T ~ Uni(0, 8.33) 13 exponential random variable X ~ Exp(λ) 2.0 The Exponential Density Function 1.0 0.5 !=1 0.0 f(x) 1.5 !=2 -1 0 1 2 x 3 4 exponential random variable X ~ Exp(λ) =1-F(t) Memorylessness: Examples Radioactive decay: How long until the next alpha particle? Customers: how long until the next customer/packet arrives at the checkout stand/server? Buses: How long until the next northbound #71 bus arrives on the Ave? Yes, they have a schedule, but given the vagaries of traffic, riders with-bikes-and-baby-carriages, etc., can they stick to it? Gambler’s fallacy: “I’m due for a win” Relation to the Poisson: same process, different measures. Poisson says how many events in a fixed time Exponential says how long until the next event 16 normal random variable X is a normal (aka Gaussian) random variable X ~ N(μ, σ2) 0.5 The Standard Normal Density Function 0.3 0.1 0.2 !=1 0.0 f(x) 0.4 µ=0 -3 -2 -1 0 x 1 2 3 0.5 0.5 changing μ, σ 0.4 0.3 0.3 0.4 µ=0 µ=0 0.2 0 0.1 !=2 0.0 0.0 0.1 0.2 !=1 10 µ=4 -10 -5 0 5 10 0 0.3 µ=4 0.2 0.1 !=2 0.0 0.0 0.1 0.2 !=1 0 0.3 0.4 0 5 0.5 0 0.4 -5 0.5 -10 -10 -5 0 5 10 -10 density at μ is ≈ .399/σ -5 0 5 10 18 normal random variable X is a normal random variable X ~ N(μ,σ2) Y = aX + b E[Y] = E[aX+b] = aμ + b Var[Y] = Var[aX+b] = a2σ2 Y ~ N(aμ + b, a2σ2) Important special case: Z = (X-μ)/σ ~ N(0,1) Z ~ N(0,1) “standard (or unit) normal” Use Φ(z) to denote CDF, i.e. no closed form Φ(.46) 0.5 The Standard Normal Density Function 0.3 0.1 0.2 !=1 0.0 f(x) 0.4 µ=0 -3 -2 -1 0 1 2 3 x 20 0.5 The Standard Normal Density Function 0.3 0.1 0.2 !=1 0.0 f(x) 0.4 µ=0 -3 -2 -1 0 1 2 3 x If Z ~N(µ,σ) what is P( µ-σ < Z < µ+σ )? P( µ - σ < Z < µ + σ ) = Φ(1) - Φ(-1) ≈ 68% P( µ - 2σ < Z < µ + 2σ ) = Φ(2) - Φ(-2) ≈ 95% P( µ - 3σ < Z < µ + 3σ ) = Φ(3) - Φ(-3) ≈ 99% 21 normal approximation to binomial X ~ Bin(n,p) E[X] = np Var[X] = np(1-p) Poisson approx. good for n large, p small (np constant) For large n: X ≈ Y ~ N(E[X],Var[X]) = N(np,np(1-p)) (p stays fixed) Normal approximation good when np(1-p) ≥ 10 DeMoivre-Laplace Theorem: Sn = number of success in n trials (with prob. p), as n→∞ normal approximation to binomial Ross Ch5 ex 4f Fair coin flipped 40 times. Probability of 20 heads? Exact answer: 24 distribution of g(X) -0.2 0.6 1.0 Ross 5.7 ex 7a FX -0.2 0.2 CDF fY 0.6 1.0 x 0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 x 0.0 density I.e., X has half its probability density left of .5, but Y has that squished left of 1/4. Similarly, X has 0.1 prob left of 1/10, but Y pushes that left of 0.01. Generalizing, for any constant 0 < y < 1: 0.2 CDF fX 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 P(Y < 1/4) = P(X2 < 1/4) = P(X < sqrt(1/4)) “The square of a fraction is more = P(X < 1/2) likely to be near 0 than 1” = 1/2 density Suppose X ~Unif(0,1). What is the distribution of Y = X2? FY -0.2 0.2 0.6 P(Y < y) = P(X < sqrt(y)) = sqrt(y) -0.2 0.2 0.6 1.0 y y which is the CDF FY of Y, and the PDF of Y is the derivative of that: fY(y) = 1/(2 sqrt(y)) for 0 < y < 1. 25 1.0 distribution of g(X) Ross 5.7 ex 7a Suppose X ~Unif(0,1). What is the distribution of Y = Xn? Generalizing again, for 0 ≤ y ≤1, the CDF is found as follows: And taking the derivative, the density function is: 26 distribution of g(X) 27 the central limit theorem (CLT) Consider i.i.d. (independent, identically distributed) random vars X1, X2, X3, … Xi has μ = E[Xi] and σ2 = Var[Xi] As n → ∞, Restated: As n → ∞, Credit: Annie Leibovitz, © 1987 ? How tall are you? Why? Willie Shoemaker & Wilt Chamberlain 29 in the real world… Why might that be true? Frequency Human height is approximately normal. R.A. Fisher (1918) noted it would follow from CLT if height Male Height in Inches were the sum of many independent random effects, e.g. many genetic factors (plus some environmental ones like diet). I.e., suggested part of mechanism by looking at shape of the curve. 30 Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height, Lanktree, et al. The American Journal of Human Genetics 88, 6–18, January 7, 2011 Sixty-Four (and hundreds more probably exist) 31 in the real world… 32 in the real world… 33 in the real world… 34 in the real world… 35