5.0 Lesson Plan

• Answer Questions
• Expectation and Variance
• Densities and Cumulative Distribution Functions
• The Exponential Distribution
• The Normal Approximation to the Binomial

5.1 Expectation and Variance

The expected value of a random variable is the average value of an infinite number of draws. It is often denoted by µ. In math,

    IE[X] \equiv \mu \equiv \sum_{\text{all } x} x \, p(x).

This is a weighted average of the possible values of the random variable, where the weights are the probabilities that each value will occur. For example, the expected value of a fair die is 3.5.

The expected value of a function of a random variable, say h(X), is the long-run average value of repeated evaluations of that function:

    IE[h(X)] = \sum_{\text{all } x} h(x) \, p(x).

For example, the expected value of X² for a fair die is

    \sum_{i=1}^{6} i^2 \cdot \frac{1}{6} = 15.1667.

Some functions are especially interesting. For example, the linear function aX + b has the useful property that IE[aX + b] = a IE[X] + b.

The variance of a random variable is the expected value of h(X) = (X − µ)², where µ is the expected value of X. The variance is often written as σ², and its units are the squares of the original units. It provides a measure of how spread out the distribution is, since it is the average squared distance of an observation from µ. A little algebra shows:

    Var[X] = IE[(X - \mu)^2]
           = \sum_{\text{all } x} (x - \mu)^2 \, p(x)
           = \sum_{\text{all } x} (x^2 - 2x\mu + \mu^2) \, p(x)
           = IE[X^2] - 2\mu \, IE[X] + \mu^2
           = IE[X^2] - \mu^2.

To measure how spread out a distribution is, we mostly use the standard deviation (or σ), the square root of the variance. The variance of the result of a roll of a fair die is

    IE[X^2] - (IE[X])^2 = 15.1667 - (3.5)^2 = 2.9167.

Its standard deviation is √2.9167 = 1.7078.

One can show that the mean, variance, and standard deviation of the Bin(n, p) distribution are np, np(1 − p), and √(np(1 − p)), respectively. For the Pois(λ) distribution, the mean is λ, the variance is λ, and the standard deviation is √λ.
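The fair-die numbers above can be reproduced numerically. This is a quick sketch, not part of the lecture; the variable names are my own.

```python
# Sanity check of the fair-die calculations in Section 5.1.
faces = range(1, 7)
p = 1 / 6  # each face of a fair die is equally likely

mean = sum(x * p for x in faces)         # IE[X] = 3.5
mean_sq = sum(x**2 * p for x in faces)   # IE[X^2] = 91/6 ~ 15.1667
variance = mean_sq - mean**2             # IE[X^2] - IE[X]^2 ~ 2.9167
sd = variance ** 0.5                     # standard deviation ~ 1.7078

print(mean, mean_sq, variance, sd)
```

The same pattern (a probability-weighted sum) works for any discrete distribution, which is why the binomial and Poisson moments quoted above can be checked the same way.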
We won’t use this, but for the hypergeometric distribution the mean is nM/N and the variance is

    \sigma^2 = \frac{N - n}{N - 1} \cdot n \cdot \frac{M}{N} \cdot \left(1 - \frac{M}{N}\right).

5.2 Densities and CDFs

In the previous lecture, we described discrete probability distributions, such as the binomial, Poisson, and hypergeometric. Discrete distributions have positive probability on finite sets (e.g., the binomial) or countably infinite sets (e.g., the Poisson).

Note: A countably infinite set is one whose elements can be placed in one-to-one correspondence with the integers. The even numbers are countably infinite, but the set of numbers between 0 and 1 is uncountably infinite.

Note: Mixed distributions combine features of both discrete and continuous distributions. An example is the lifetime of a possibly defective light bulb: there is positive probability the bulb fails immediately, but otherwise the lifetime is continuous.

For continuous distributions, specific values have probability zero. One can only assign probabilities to intervals. For example, there is a positive probability of tossing exactly five heads in 10 flips. But there is zero probability of finding someone who is exactly five feet tall, when height is measured to an infinite number of decimal places.

To define probabilities for intervals, we use a density function. A density function is any function f(x) such that

• f(x) is non-negative;
• f(x) integrates to 1.

Then

    IP[a \le X \le b] = \int_a^b f(x) \, dx.

This definition of the density function ensures the Kolmogorov axioms: (1) all probabilities are between 0 and 1, inclusive; (2) IP[−∞ < X < ∞] = 1; and (3) if A and B are disjoint intervals, then IP[X ∈ A or X ∈ B] = IP[X ∈ A] + IP[X ∈ B].

It turns out to be useful to define the cumulative distribution function (or cdf) as

    F(x) = IP[X \le x] = \int_{-\infty}^{x} f(y) \, dy.

Then

    IP[a \le X \le b] = \int_a^b f(x) \, dx = F(b) - F(a).

The expected value of a continuous random variable is

    IE[X] \equiv \mu \equiv \int_{-\infty}^{\infty} x f(x) \, dx.

Clearly, this is similar to the definition of expected value for a discrete random variable:

    IE[X] \equiv \mu \equiv \sum_{\text{all } x} x \, p(x).
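The two defining properties of a density, and the identity IP[a ≤ X ≤ b] = F(b) − F(a), can be checked numerically. The sketch below uses the toy density f(x) = 2x on [0, 1] (my choice, not from the lecture), whose cdf is F(x) = x²; the integrator is a simple midpoint rule.

```python
# Checking that f(x) = 2x on [0, 1] is a valid density and that
# interval probabilities match F(b) - F(a) with F(x) = x^2.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 1)       # should be ~1: f integrates to 1
prob = integrate(f, 0.25, 0.5)   # IP[0.25 <= X <= 0.5]
exact = 0.5**2 - 0.25**2         # F(0.5) - F(0.25) = 0.1875

print(total, prob, exact)
```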
Analogously, the expected value of a function h(X) of a continuous random variable is

    IE[h(X)] \equiv \int_{-\infty}^{\infty} h(x) f(x) \, dx.

As before, IE[aX + b] = a IE[X] + b, and Var[X] = IE[X²] − (IE[X])².

5.3 The Exponential Distribution

The exponential distribution is used to model memoryless processes (among many other things). Arguably accurate approximations include:

• the wait time between successive phone calls;
• the lifespan of a vacuum tube;
• the distance on a chromosome between mutations;
• monthly maximum rainfall.

The exponential distribution is ubiquitous in queueing theory, where it is used to describe the time required to service a single customer.

The density function for an exponential random variable with parameter λ is

    f(x) = \begin{cases} 0 & x < 0 \\ \lambda \exp(-\lambda x) & x \ge 0 \end{cases}

The cdf is

    F(x) = \begin{cases} 0 & x < 0 \\ 1 - \exp(-\lambda x) & x \ge 0 \end{cases}

The parameter λ is called the rate. The mean of an exponential distribution is 1/λ, and the standard deviation is also 1/λ.

Integration by parts enables one to find the mean of the exponential distribution. Recall that

    \int_a^b u(x) v'(x) \, dx = u(x) v(x) \Big|_a^b - \int_a^b v(x) u'(x) \, dx.

For

    IE[X] = \int_0^{\infty} x f(x) \, dx = \int_0^{\infty} x \lambda \exp(-\lambda x) \, dx,

set u(x) = x and v′(x) = λ exp(−λx). Then

    IE[X] = -x \exp(-\lambda x) \Big|_0^{\infty} + \int_0^{\infty} \exp(-\lambda x) \, dx
          = -(1/\lambda) \exp(-\lambda x) \Big|_0^{\infty}
          = 1/\lambda,

where the first term vanishes at both limits.

Example: On average, a manufacturing plant experiences 5.4 shutdowns per year (due to power failure, mechanical breakdown, hurricane, or other incident). What is the probability that there are no shutdowns in the first quarter?

The yearly rate is λ = 5.4. We want the probability that X > 1/4, where years are the time unit:

    IP[first shutdown is after March] = IP[X > 1/4]
                                      = 1 - F(1/4)
                                      = 1 - (1 - \exp(-5.4 \cdot 0.25))
                                      = \exp(-1.35)
                                      = 0.2592.

We could have said that the average time between shutdowns is 1/5.4 = 0.185 years (or 67.59 days). This is the mean of the X values, which is 1/λ.
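The shutdown example works out the same way in code. This is a sketch of that calculation; the function and variable names are my own.

```python
import math

# Shutdown example: X ~ Exponential(rate = 5.4 shutdowns per year).
rate = 5.4
t = 0.25  # the first quarter, in years

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# IP[X > 1/4] = 1 - F(1/4) = exp(-1.35) ~ 0.2592
p_no_shutdown = 1 - exp_cdf(t, rate)

# Mean time between shutdowns is 1/lambda ~ 0.185 years (~67.6 days).
mean_gap_years = 1 / rate
mean_gap_days = mean_gap_years * 365

print(p_no_shutdown, mean_gap_years, mean_gap_days)
```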
5.4 Normal Approximation to the Binomial

A perfect normal distribution describes data that can take any possible value: negatives, fractions, irrationals, etc. But often data can take only non-negative integer values.

In a class of ten students, each tosses a fair coin to decide whether to attend class. So class attendance is a random variable with the Bin(10, 0.5) distribution. Its mean is np = 5 and its standard deviation is √(np(1 − p)) = 1.581.

We can use the normal distribution to estimate the approximate probability that, say, 3 or fewer students will attend tomorrow’s lecture. But because only integers are possible, we can improve the accuracy of the normal approximation by using the continuity correction. We approximate the binomial by a normal distribution with the same mean and standard deviation.

The bad approximation uses the z-transformation z = (3 − 5)/1.581 = −1.265 and finds the area under the N(0, 1) curve that lies below −1.265 as 0.1020.

The good way handles the area between 3 and 4 appropriately, taking account of the fact that the histogram bar is centered at 3, so we want to include the area up to 3.5. We use the z-transformation z = (3.5 − 5)/1.581 = −0.949 and find the probability as 0.1711.

The normal approximation to the binomial is helpful when n is very large. For example, suppose we wanted to find the probability that more than 20,000 of the 228,330 residents of Durham are unemployed, when the unemployment rate in NC is 10.1%. To use the binomial, we would have to calculate

    \sum_{x=0}^{20{,}000} \binom{228{,}330}{x} (0.101)^x (1 - 0.101)^{228{,}330 - x}

and subtract it from 1. This is intractable, but the normal approximation is not.

The normal approximation is accurate when np > 10 and n(1 − p) > 10. So our toy example with student attendance was bogus, but it gave a histogram small enough that we could visualize the role of the continuity correction.

By the way, how do we know that the mean of the binomial is np?
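The attendance example can be checked against the exact binomial answer. This sketch (helper names are mine) computes the exact IP[X ≤ 3] and both normal approximations; note how much closer the continuity-corrected value is.

```python
import math

def phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 10, 0.5
mu = n * p                           # 5
sigma = math.sqrt(n * p * (1 - p))   # ~1.581

# Exact binomial: IP[X <= 3] = (C(10,0)+C(10,1)+C(10,2)+C(10,3)) / 2^10
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))

bad = phi((3 - mu) / sigma)     # no continuity correction, ~0.10
good = phi((3.5 - mu) / sigma)  # with continuity correction, ~0.17

print(exact, bad, good)
```

The exact probability is 176/1024 = 0.171875, so the corrected approximation is nearly spot-on even though np = 5 violates the np > 10 rule of thumb.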
    IE[X] = \sum_{\text{all } x} x \, p(x)
          = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n-x}
          = \sum_{x=0}^{n} x \frac{n!}{x!(n - x)!} p^x (1 - p)^{n-x}
          = \sum_{x=1}^{n} \frac{n!}{(x - 1)!(n - x)!} p^x (1 - p)^{n-x},

where the x = 0 term vanishes and so drops from the sum. Make the change of variable y = x − 1 to get

    IE[X] = np \sum_{y=0}^{n-1} \frac{(n - 1)!}{y!(n - 1 - y)!} p^y (1 - p)^{n-1-y}
          = np \sum_{y=0}^{n-1} \binom{n - 1}{y} p^y (1 - p)^{n-1-y}.

We recognize the summation as the sum of binomial probabilities for a Bin(n − 1, p) distribution. The sum of all possible probabilities is always 1, so the mean is np.

A similar argument works for IE[X²], from which one can verify that the variance is np(1 − p). (Recall that the variance is IE[X²] − (IE[X])².)

Note: One can also use the normal to approximate the Poisson.
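The identities IE[X] = np and Var[X] = np(1 − p) can be verified by brute-force summation for any particular n and p. A sketch, with n = 12 and p = 0.3 chosen arbitrarily:

```python
import math

# Direct check of the binomial mean and variance for one (n, p) pair.
n, p = 12, 0.3

def binom_pmf(x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

mean = sum(x * binom_pmf(x) for x in range(n + 1))        # should be n*p
mean_sq = sum(x**2 * binom_pmf(x) for x in range(n + 1))  # IE[X^2]
variance = mean_sq - mean**2                              # should be n*p*(1-p)

print(mean, variance)
```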