Class structure

STAT/MATH 395. Peter Guttorp & June Morita, [email protected], [email protected].

Homework due in class on Fridays (15-20% of grade)
Midterm Fri May 8 (30% of grade)
Final TBA (35-40% of grade)
Class participation (15% of grade)

A 2.0 usually will require 60% of possible points.

Chapter 4: Expected Values

Expected value. Variance. Covariance. Law of large numbers.

A numbers game

A popular numbers game is DJ, where the winning ticket is determined from Dow Jones averages. Three sets of stocks are used: Industrials, Transportation, and Utilities, and two quotes, at 11 am and noon, Eastern time.

           11 am       Noon
  I     16410.75    1638.96
  T      7509.34    7478.11
  U       525.66     525.90

546 + 610 = 1156. In this example, the winning number is 156 (the last three digits). The payoff is 700 to 1. Suppose we bet $5. How much do we win or lose, on average?

Numbers game, cont.

Let p = probability my number wins = 1/1000. Let X = my earnings. In the long run, I will win $3500 the fraction 1/1000 of the time, and lose $5 the fraction 999/1000 of the time. The balance is
$$E(X) = 3500 \cdot \frac{1}{1000} - 5 \cdot \frac{999}{1000} = -\$1.495,$$
a loss of about $1.50 per $5 bet.

Expected value

Let X be a random variable with pmf $p_X(x)$ or pdf $f_X(x)$. The expected value of X, denoted E(X), is defined by
$$E(X) = \sum_{k=-\infty}^{\infty} k\,p_X(k)$$
in the discrete case, and by
$$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
in the continuous case, provided the sum or integral is absolutely convergent. In physics, the expected value is called the center of mass.

The Poisson distribution

Recall that the Poisson distribution has pmf
$$p(x) = \frac{\lambda^x}{x!}e^{-\lambda}.$$
Here the parameter λ commonly describes the average number of events per unit time. To see that this is indeed the expected value we compute
$$E(X) = \sum_{x=0}^{\infty} x\,\frac{\lambda^x}{x!}e^{-\lambda} = \lambda \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}e^{-\lambda} = \lambda.$$
Note that the expected value need not be a possible value!

An exam problem

A true-false question on a statistics exam has ten parts, each worth 2 points. Any incorrect answer is penalized by −3 points, although a negative total is recorded as 0. If a student guesses, what is the expected score for the part? It is $2 \cdot \tfrac12 - 3 \cdot \tfrac12 = -\tfrac12$. Clearly, guessing should be avoided unless one is fairly certain to know the answer. What is the smallest probability of knowing the answer that does not yield a negative expected score for a part? Solving $2p - 3(1-p) \ge 0$ gives $p \ge 3/5$.

Lotka's family size model

The number of children of white families in the 1920s was described by
$$p(k) = \begin{cases} \beta p(1-p)^k, & k = 1, 2, \ldots \\ 1 - \beta(1-p), & k = 0, \end{cases}$$
a geometric distribution with modified zero term. Here p and β are probabilities. Find the expected number of children: summing the geometric series gives $E = \beta(1-p)/p$. For β = 0.879, p = 0.264, we get an expected number of children of 2.45. What proportion of people are firstborn?

Some properties of expected values

E(X + a) = E(X) + a
E(bX) = b E(X)
A special case: E(aX + b) = aE(X) + b

Law of the unconscious probabilist

$$E(g(X)) = \sum_x g(x)\,p_X(x)$$
Proof: Suppose that g(x) takes on values $y_1, y_2, \ldots$ Then
$$p_{g(X)}(y_i) = P(g(X) = y_i) = \sum_{\{x : g(x) = y_i\}} p_X(x)$$
and
$$E(g(X)) = \sum_{i=1}^{\infty} y_i\,p_{g(X)}(y_i) = \sum_{i=1}^{\infty} y_i \sum_{\{x : g(x) = y_i\}} p_X(x) = \sum_{i=1}^{\infty} \sum_{\{x : g(x) = y_i\}} g(x)\,p_X(x) = \sum_x g(x)\,p_X(x).$$

The addition rule for expectations

E(X + Y) = E(X) + E(Y)
NOTE: No assumption of independence. This result holds whenever the expectations exist.

The binomial distribution

Write $X = \sum_{i=1}^n Y_i$, where the $Y_i$ are indicators of success, i.e.,
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th event occurs} \\ 0 & \text{otherwise.} \end{cases}$$
Using the addition rule for expectations,
$$E(X) = \sum_{i=1}^n E(Y_i) = \sum_{i=1}^n \{0 \times P(Y_i = 0) + 1 \times P(Y_i = 1)\} = np.$$
What is the expected value of the hypergeometric distribution?

Group testing

A large number n of blood samples are to be screened for HIV. Testing each sample separately requires n tests. Pooling half of each sample requires one test if all samples are free from HIV, while if at least one is defective, the other half of each sample could be tested individually. Most of the time we may get away with doing just one test.

Group testing, cont.

More generally, group the n samples into m groups of k samples each (so n = mk) and test each of the groups separately. If the test is negative, we are done, while if it is positive, each individual in the group is tested. Let p = P(individual test negative), and $X_i$ = # tests on group i. Then
$$E(X_i) = 1 \cdot p^k + (k+1)(1 - p^k) = k + 1 - kp^k.$$
Let N be the total number of tests needed, or $N = \sum_{i=1}^m X_i$. We compute
$$E(N) = m(k + 1 - kp^k) = n\Big(1 + \frac{1}{k} - p^k\Big).$$

Computation

[Figure: expected number of tests per sample, $1 + 1/k - p^k$, plotted against the group size k = 1, …, 20.]
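As a quick illustration, here is a minimal R sketch of the computation above (R to match the dnorm plots used elsewhere in these slides; the value p = 0.95 and the name tests_per_sample are illustrative choices, not part of the original example):

# Expected number of tests per sample when pooling in groups of size k,
# with p = P(an individual sample tests negative); this is E(N)/n above.
tests_per_sample <- function(k, p) 1 + 1/k - p^k

k <- 1:20
plot(k, tests_per_sample(k, 0.95), type = "b", ylim = c(0, 1),
     xlab = "k", ylab = "expected tests per sample")
k[which.min(tests_per_sample(k, 0.95))]   # best group size here: k = 5
min(tests_per_sample(k, 0.95))            # about 0.43 tests per sample

For p = 0.95, pooling in groups of five cuts the expected number of tests to roughly 43% of testing each sample separately.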
Measuring spread

The expected value is one measure of the location of the distribution of a random variable. In addition to the location it is useful to have a description of how spread out a distribution is.

[Figure: pairs of normal densities, drawn with dnorm(x) and dnorm(x, sd = 0.5) in R, showing different location with the same spread, and the same location with different spread.]

The variance is
$$\mathrm{Var}(X) = E\big((X - E(X))^2\big),$$
the moment of inertia in mechanics; its units are those of X squared. It is defined whenever $E(X^2) < \infty$. Expanding the square,
$$\mathrm{Var}(X) = E(X^2) - 2E(X\,E(X)) + [E(X)]^2 = E(X^2) - 2E(X)E(X) + [E(X)]^2 = E(X^2) - [E(X)]^2.$$
Also $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$, and $\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$ has the units of X.

Some examples

Exponential distribution: $f(x) = \alpha e^{-\alpha x}$, x > 0.
$$E(X) = \int_0^\infty x\,\alpha e^{-\alpha x}\,dx = \frac{1}{\alpha}, \qquad E(X^2) = \int_0^\infty x^2\,\alpha e^{-\alpha x}\,dx = \frac{2}{\alpha^2},$$
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\alpha^2} - \frac{1}{\alpha^2} = \frac{1}{\alpha^2}.$$
Poisson distribution: $p(x) = \lambda^x e^{-\lambda}/x!$, with $E(X) = \lambda$.
$$E(X(X-1)) = \sum_{x=0}^\infty x(x-1)\frac{\lambda^x}{x!}e^{-\lambda} = \lambda^2 \sum_{x=2}^\infty \frac{\lambda^{x-2}}{(x-2)!}e^{-\lambda} = \lambda^2,$$
$$E(X^2) = E(X(X-1)) + E(X) = \lambda^2 + \lambda, \qquad \mathrm{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

The Bienaymé-Chebyshev inequality

Consider a random variable X with expected value µ and variance σ². Then for any t > 0 we have
$$P(X - \mu > t) \le P(|X - \mu| > t) \le \frac{\sigma^2}{t^2}.$$
Proof:
$$P(|X - \mu| > t) = \int_{\{x : |x-\mu| > t\}} f(x)\,dx \le \int_{\{x : |x-\mu| > t\}} \frac{(x-\mu)^2}{t^2} f(x)\,dx \le \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{t^2} f(x)\,dx = \frac{\sigma^2}{t^2}.$$
Irénée-Jules Bienaymé 1796-1878. Pafnuty Chebyshev 1821-1894.

How good is the Bienaymé-Chebyshev inequality?

Let X ~ Exp(1). How well does the inequality estimate P(|X − 1| > 2)? We have E(X) = Var(X) = 1, so the inequality says that
$$P(|X - 1| > 2) \le \frac14,$$
while the exact probability is
$$P(|X - 1| > 2) = P(X > 3) = e^{-3} \approx 0.0498.$$

Higher moments

Let $\mu_k = E(X^k)$ and $c_k = E(X - \mu)^k$. In particular, $\mu_1$ is the mean and $c_2$ the variance; $c_3$ is called the skewness, and $c_4$ the kurtosis. Sometimes the coefficient of skewness is defined as $c_3/c_2^{3/2}$, and the coefficient of kurtosis as $c_4/c_2^2$.

Example

Let $f_X(x) = \lambda e^{-\lambda x}$, x > 0. Then, substituting $u = \lambda x$ and using $\Gamma(k+1) = k!$,
$$E(X^k) = \int_0^\infty x^k\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^k}\int_0^\infty u^k e^{-u}\,du = \frac{k!}{\lambda^k}.$$
Now let $f_X(x) = (2\pi)^{-1/2}e^{-x^2/2}$. Then $E(X^{2k+1}) = 0$ by symmetry, and
$$E(X^{2k}) = \int_{-\infty}^\infty \frac{x^{2k}}{\sqrt{2\pi}}e^{-x^2/2}\,dx = \frac{2^{k+1/2}}{\sqrt{2\pi}}\int_0^\infty u^{k-1/2}e^{-u}\,du = 1 \times 3 \times 5 \times \cdots \times (2k-1) = (2k-1)!!$$

Joint distribution

If we consider two random variables, X and Y, we need to consider their joint behavior.
Joint cdf: $F_{X,Y}(x,y) = P(X \le x, Y \le y)$
Joint pmf: $p_{X,Y}(x,y) = P(X = x, Y = y)$
Joint pdf: $f_{X,Y}(x,y) = \dfrac{\partial^2 F_{X,Y}(x,y)}{\partial x\,\partial y}$

Queue lengths

A supermarket has two express lines. At a given time, let X and Y be the number of customers in line 1 and line 2, respectively. The joint pmf is

  y\x     0      1      2      3
   0    0.10   0.20   0      0
   1    0.20   0.25   0.05   0
   2    0      0.05   0.05   0.02
   3    0      0.03   0.05   0

$$P(X \ne Y) = 1 - P(X = Y) = 1 - (0.10 + 0.25 + 0.05 + 0) = 0.60.$$

A random point in a circle

What is the density of (X,Y) if we choose the point at random in the unit circle, in the sense that equal areas have equal probabilities? Since the circle has area π,
$$f_{X,Y}(x,y) = \frac{1}{\pi}, \qquad x^2 + y^2 \le 1.$$

More about a random point in a circle

Given a point chosen at random in the unit circle, what is the distribution of the x-coordinate? For fixed x, the point (x, y) lies in the circle when $-\sqrt{1-x^2} \le y \le \sqrt{1-x^2}$, so
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1-x^2}, \qquad -1 \le x \le 1.$$
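As a check, a minimal R simulation of this marginal (the rejection-sampling scheme, seed, and sample size are my own illustrative choices):

# Sample points uniformly in the unit circle by rejection from the square
# [-1,1] x [-1,1], then compare the x-coordinates with (2/pi) sqrt(1 - x^2).
set.seed(1)
x <- runif(1e5, -1, 1)
y <- runif(1e5, -1, 1)
inside <- x^2 + y^2 <= 1
hist(x[inside], freq = FALSE, breaks = 50, main = "", xlab = "x")
curve(2/pi * sqrt(1 - x^2), from = -1, to = 1, add = TRUE)

The histogram hugs the semicircular density, highest near x = 0 where the vertical chord of the circle is longest.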
A random point in a circle

Suppose that we are interested in the expected distance from the origin of a point (X,Y) selected at random in the unit circle, or $Z = (X^2+Y^2)^{1/2}$. To compute its expected value we simply calculate, changing to polar coordinates,
$$E(Z) = \iint_{x^2+y^2 \le 1} (x^2+y^2)^{1/2}\,f_{X,Y}(x,y)\,dx\,dy = \int_{\theta=-\pi}^{\pi}\int_{r=0}^{1} r \cdot \frac{1}{\pi}\,r\,dr\,d\theta = \int_0^1 2r^2\,dr = \frac{2}{3}.$$

Marginal distributions

If (X,Y) is discrete, the marginal distribution of X is given by
$$p_X(x) = \sum_y p_{X,Y}(x,y),$$
while if they are continuous it is given by
$$f_X(x) = \int_{y=-\infty}^{\infty} f_{X,Y}(x,y)\,dy.$$
The marginal distribution contains no information about the joint behavior of X and Y.

Covariance

When two random variables are not independent, it is sometimes important to have a measure of their dependence. A complete description is the joint distribution, but a simple summary is the covariance, given by
$$\mathrm{Cov}(X,Y) = E\big((X - E(X))(Y - E(Y))\big) = E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y) = E(XY) - E(X)E(Y).$$
Covariance is bilinear:
Cov(aX + bY, cZ + dV) = ac Cov(X,Z) + bc Cov(Y,Z) + ad Cov(X,V) + bd Cov(Y,V)
Cov(X,X) = Var(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y)

An example

Let $p_{X,Y}(x,y) = (x+2y)/22$, where (x,y) takes values in {(1,1), (1,3), (2,1), (2,3)}.

Contributions to the covariance

[Figure: the plane divided into four quadrants around the point (EX, EY); points in the upper-right and lower-left quadrants have (X − EX)(Y − EY) > 0, points in the other two quadrants have (X − EX)(Y − EY) < 0.]

The variance of a sum

$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{i=1}^n \sum_{j \ne i} \mathrm{Cov}(X_i, X_j)$$
In particular, if the $X_i$ are independent we have
$$\mathrm{Cov}(X_i, X_j) = E(X_iX_j) - E(X_i)E(X_j) = E(X_i)E(X_j) - E(X_i)E(X_j) = 0,$$
so the variance of a sum of independent random variables is the sum of the variances.

Binomial distribution

As for the mean, write $X = \sum_{i=1}^n Y_i$, where the $Y_i$ are independent indicators of success with $\mathrm{Var}(Y_i) = pq$. Using the independence of the $Y_i$, all covariance terms vanish, so that $\mathrm{Var}(X) = npq$.

Hypergeometric variance

As for the binomial distribution, write $X = \sum_{i=1}^n Y_i$, where $Y_i$ indicates a success on the ith draw. Writing p = r/N we have again that Var(Y_i) = pq, but now the $Y_i$ are no longer independent. We need to compute their covariance. By symmetry, $\mathrm{Cov}(Y_i, Y_j)$ is the same for every pair i ≠ j, which leads to $\mathrm{Var}(X) = npq\,\frac{N-n}{N-1}$.

A linear relationship

Let Y = aX + b, a and b constants. Then
$$\mathrm{Cov}(X,Y) = \mathrm{Cov}(X, aX + b) = a\,\mathrm{Var}(X).$$
More generally, Cov(aX + b, cY + d) = ac Cov(X,Y).

Correlation coefficient

We call a random variable standardized if it has mean 0 and variance 1. This can be achieved simply by subtracting off the mean and dividing by the sd. The covariance between two standardized random variables is called the correlation coefficient. It is a number between −1 and 1.
$$\mathrm{Corr}(X,Y) = \mathrm{Cov}(X^*, Y^*) = \frac{\mathrm{Cov}(X,Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$$

Cauchy-Schwarz inequality

If S and T are random variables with finite variance, then
$$(E(ST))^2 \le E(S^2)\,E(T^2).$$
Proof: If Z is nonnegative, then E(Z) ≥ 0. Let a be arbitrary. Then
$$0 \le E(aS + T)^2 = a^2E(S^2) + 2a\,E(ST) + E(T^2).$$
A nonnegative quadratic (in a) must have nonpositive discriminant, i.e., $4(E(ST))^2 - 4E(S^2)E(T^2) \le 0$.
Corollary: |Corr(S,T)| ≤ 1.
Hermann Schwarz 1843-1921. Augustin-Louis Cauchy 1789-1857.

Some scatters of points

[Figure: scatterplots of points with various correlation coefficients.]

Monday's lecture

Covariance and linear functions. Challenger explosion. Correlation coefficient. Cauchy-Schwarz inequality.

The long run

Consider independent and identically distributed (iid) random variables $X_i$, taking on the values 0 or 1 with equal probabilities. What happens if we average them?

The law of large numbers

Let $X_1, \ldots, X_n$ be iid random variables with mean µ and variance σ² < ∞. Then
$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n (X_i - \mu)\Big| > \varepsilon\Big) \to 0 \quad \text{as } n \to \infty.$$
We say that $\bar X = (1/n)\sum X_i$ converges in probability to µ, written $\bar X \xrightarrow{P} \mu$.
Proof: By the Bienaymé-Chebyshev inequality,
$$P(|\bar X - \mu| > \varepsilon) \le \frac{\mathrm{Var}(\bar X)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0.$$

Monte Carlo integration

Suppose we want to integrate a function f(x) from 0 to 1, but are unable to do the integration analytically. One approach is to use random numbers. Write
$$\int_0^1 f(x)\,dx = \int_0^1 (f(x) \times 1)\,dx = E(f(X)),$$
where X ~ U(0,1). By the law of large numbers we have
$$\frac{1}{n}\sum_{i=1}^n f(X_i) \xrightarrow{P} E(f(X)) = \int_0^1 f(x)\,dx,$$
where $X_1, \ldots, X_n$ are iid as X. How can we use this?
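For instance, a minimal R sketch (the integrand exp(-x^2), which has no elementary antiderivative, and the sample size are illustrative choices):

# Monte Carlo integration of f over (0,1): average f at n uniform points.
set.seed(1)
f <- function(x) exp(-x^2)
x <- runif(1e5)                # X_1, ..., X_n iid U(0,1)
mean(f(x))                     # Monte Carlo estimate of the integral
integrate(f, 0, 1)$value       # numerical quadrature for comparison, ~0.7468

By the law of large numbers the average converges to the integral; at this sample size the two answers typically agree to two or three decimals.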
An optic experiment

Consider a concentrated light source, such as a laser, reflected in a mirror which can be moved around a vertical axis A. We are interested in the location of the reflection on a wall.

[Figure: the mirror is turned to a random angle φ, and the reflected beam hits the wall at position X.]

If the angle φ is uniformly distributed, the position X (proportional to tan φ) follows the Cauchy distribution, with density $f(x) = \dfrac{1}{\pi(1+x^2)}$.

Averaging Cauchy variables

[Figure: running averages of simulated Cauchy variables; unlike the 0-1 averages from "The long run", they do not settle down as n grows.]

The mean of the Cauchy distribution

Why does the law of large numbers not work for the Cauchy variables? Compute
$$E(X) = \int_{-\infty}^{\infty} \frac{x\,dx}{\pi(1+x^2)}.$$
The positive and negative parts both diverge ($\int_0^\infty \frac{x}{\pi(1+x^2)}\,dx = \infty$), so the integral is not absolutely convergent: the Cauchy distribution has no expected value, and the law of large numbers does not apply.
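A minimal R sketch contrasting the two cases from these last slides (seed and sample size are arbitrary):

# Running means of iid draws: the 0-1 averages settle at 1/2,
# while the Cauchy averages keep making large jumps however big n gets.
set.seed(1)
n <- 10000
coin_means <- cumsum(rbinom(n, 1, 0.5)) / (1:n)
cauchy_means <- cumsum(rcauchy(n)) / (1:n)
plot(cauchy_means, type = "l", xlab = "n", ylab = "running mean")
lines(coin_means, lty = 2)   # hugs 0.5, as the law of large numbers predicts

Each rerun with a different seed gives a different Cauchy path, but none of them settles down.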