PROBABILITY MODELS
Harry Crane

13. Continuous random variables

Imagine a stick of unit length that is oriented horizontally from left to right. Consider choosing a point $x$ uniformly at random between 0 and 1 and breaking the stick $x$ units from the left end. In this example, the sample space can be regarded as $\Omega = [0, 1)$. For finitely many non-overlapping subintervals $J_1, \ldots, J_n$, with $E := \bigcup_{i=1}^n J_i$, we define
$$P[E] := P\Big[\bigcup_{i=1}^n J_i\Big] = \sum_{i=1}^n \mathrm{length}(J_i).$$
The intuition here is that there are too many possible values of $x$, so each individual value has a vanishingly small (in fact, zero) probability of occurring. Think back to our interpretation of the event space $(\Omega, \mathcal{E})$ at the beginning of the semester. The elements $E \in \mathcal{E}$ are the events about which we can obtain information. In the above example, even if we could generate a uniform number $x$ to infinite precision, we could never know exactly what the number is; even an observation such as $x = 0.1542847$ only tells us that $x$ lies in the interval $[0.15428465, 0.15428475)$.

By this definition, we have $P[[0,1)] = P[\Omega] = 1 - 0 = 1$, $P[[0,1/2)] = 1/2$, and $P[[x,x]] = P[\{x\}] = x - x = 0$ for every $x \in [0,1)$. This example emphasizes the importance of countable additivity of $P$. In particular, we can write $\Omega = \bigcup_{\omega \in [0,1)} \{\omega\}$, which is an uncountable union, and yet
$$1 = P[\Omega] = P\Big[\bigcup_{\omega \in \Omega} \{\omega\}\Big], \quad \text{while} \quad \sum_{\omega \in \Omega} P[\{\omega\}] = 0.$$

Definition 13.1 (Continuous random variables). Let $(\Omega, \mathcal{E}, P)$ be a probability model. A function $X : \Omega \to \mathbb{R}$ has a continuous distribution if there exists a function $f : \mathbb{R} \to [0, \infty)$ such that
$$P\{X \in B\} = \int_B f(x)\,dx$$
for all events $B$. Specifically, $f$ satisfies
$$P\{X \in \mathbb{R}\} = P\{-\infty < X < \infty\} = \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
We call $f$ the density of $X$. If $f$ is continuous at $x$, then
$$\lim_{\Delta \downarrow 0} \frac{P\{X \in [x, x+\Delta]\}}{\mathrm{length}[x, x+\Delta]} = \lim_{\Delta \downarrow 0} \frac{1}{\Delta} \int_x^{x+\Delta} f(y)\,dy = f(x).$$
Therefore, we interpret $f(x)\,dx$ as the probability that $X$ lies in an infinitesimally small window $[x, x+dx]$ around $x$.
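As a sanity check on the definition $P[E] = \sum_i \mathrm{length}(J_i)$, here is a minimal Monte Carlo sketch of the stick-breaking example (the interval endpoints are arbitrary choices for illustration):

```python
import random

random.seed(1)

# Disjoint subintervals J_1, J_2, J_3 of [0, 1); endpoints chosen arbitrarily.
intervals = [(0.0, 0.2), (0.4, 0.5), (0.7, 0.95)]
total_length = sum(b - a for a, b in intervals)   # 0.55

n = 200_000
hits = 0
for _ in range(n):
    x = random.random()                  # uniform break point in [0, 1)
    if any(a <= x < b for a, b in intervals):
        hits += 1

print(hits / n, total_length)   # empirical frequency close to 0.55
```

The empirical frequency of landing in the union matches the total length, as the definition requires.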
The cumulative distribution function $F_X : \mathbb{R} \to [0,1]$ of $X$ is defined by
$$F_X(x) := P\{X \leq x\} = P\{X \in (-\infty, x]\} = \int_{-\infty}^x f(y)\,dy$$
and, for $x < z$,
$$P\{x < X \leq z\} = F_X(z) - F_X(x).$$
Conversely, if $F$ is continuously differentiable, or everywhere continuous and differentiable at all but finitely many points, then
$$F(z) - F(x) = \int_x^z F'(y)\,dy,$$
so $X$ has density $f = F'$ by the fundamental theorem of calculus.

13.1. Uniform random variables. Define
$$f(x) := \begin{cases} 1, & 0 \leq x \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$
Then $f$ is the density of a Uniform$(0,1)$ random variable. If $U \sim \mathrm{Unif}(0,1)$, then
$$F_U(u) := P\{U \leq u\} = \int_0^u dx = u, \quad 0 \leq u \leq 1,$$
$$EU = \int_0^1 u\,du = 1/2, \qquad EU^2 = \int_0^1 u^2\,du = 1/3,$$
and
$$\mathrm{Var}(U) = EU^2 - [EU]^2 = 1/3 - (1/2)^2 = 1/12.$$
For $a < b$, $W = (b-a)U + a$ is uniformly distributed on $(a,b)$. The density of $W$ is
$$f(w) = \begin{cases} \frac{1}{b-a}, & a \leq w \leq b \\ 0, & \text{otherwise.} \end{cases}$$
Also,
$$EW = (b-a)EU + a = \frac{1}{2}(a+b), \qquad \mathrm{Var}(W) = \frac{(b-a)^2}{12}.$$
Recall the definition of the discrete Uniform distribution: $P\{X = k\} = 1/n$ for $k = 1, \ldots, n$. In this case, $EX = (n+1)/2$ and $\mathrm{Var}(X) = (n^2-1)/12$. Let $N_n$ have the discrete uniform distribution on $[n] := \{1, \ldots, n\}$, for each $n = 1, 2, \ldots$. Then we say $(N_n/n,\ n \geq 1)$ converges in distribution to the Uniform distribution on $[0,1]$, in the following sense. If $k_n \to \infty$ in such a way that $k_n/n \to u \in [0,1]$, then
$$P\{N_n/n \leq k_n/n\} = P\{N_n \leq k_n\} = k_n/n \to u = P\{U \leq u\}, \quad \text{for all } u \in (0,1).$$
Note also that
$$EN_n/n = n^{-1} EN_n = \frac{n+1}{2n} \to \frac{1}{2} = EU \quad \text{and} \quad \mathrm{Var}(N_n/n) = n^{-2}\,\mathrm{Var}(N_n) = \frac{n^2-1}{12 n^2} \to \frac{1}{12} = \mathrm{Var}(U).$$
Therefore, one way to generate an approximately uniform random variable $U$ is to sample $x$ uniformly from $\{1, \ldots, n\}$, for some large value of $n$, and then put $U = x/n$. This procedure gives a uniform random variable up to precision $1/n$. See Section 13.3 for a much better way to do this.

13.2. Expectation of continuous random variables. Let $X$ be a random variable with density $f$.
Then, for $g : \mathbb{R} \to \mathbb{R}$,
$$Eg(X) := \int_{-\infty}^{\infty} g(x) f(x)\,dx = \int_{-\infty}^{\infty} g^+(x) f(x)\,dx - \int_{-\infty}^{\infty} g^-(x) f(x)\,dx,$$
where $g^{\pm}(x) := \max(\pm g(x), 0)$. The same rules apply for computing expectations of continuous random variables, with the modification that we now compute an integral instead of a sum. In particular, for random variables $X, Y \in \mathbb{R}$ and $a, b, c \in \mathbb{R}$, we have
• $E(aX + bY + c) = aEX + bEY + c$,
• $\mathrm{Var}(X) = EX^2 - [EX]^2$,
• $\mathrm{Var}(aX + bY + c) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$,
• $\mathrm{Cov}(X, Y) = EXY - EX \cdot EY$.

Now, suppose $X \geq 0$ is a continuous random variable with density $f$. Then
$$EX = \int_0^\infty x f(x)\,dx = \int_0^\infty \Big[\int_0^x dy\Big] f(x)\,dx = \int_0^\infty \int_0^\infty I\{0 \leq y \leq x\}\, f(x)\,dy\,dx = \int_0^\infty \Big[\int_y^\infty f(x)\,dx\Big]\,dy = \int_0^\infty P\{X \geq y\}\,dy.$$
Also,
$$EX^2 = \int_0^\infty P\{X^2 \geq y\}\,dy = \int_0^\infty P\{X \geq \sqrt{y}\}\,dy = \int_0^\infty 2x\,P\{X \geq x\}\,dx.$$

Example 13.2 (Order statistics). Let $U_1, U_2, U_3, U_4$ be i.i.d. Uniform$(0,1)$ random variables and write $U_{(1)} < U_{(2)} < U_{(3)} < U_{(4)}$ to denote the order statistics. Then
$$P\{U_{(1)} \leq u\} = P\{\min_i U_i \leq u\} = 1 - P\{U_i > u \text{ for all } i = 1, \ldots, 4\} = 1 - (1-u)^4 =: F_{(1)}(u).$$
The density of $U_{(1)}$ is
$$f_{(1)}(u) = F_{(1)}'(u) = \begin{cases} 4(1-u)^3, & 0 < u < 1 \\ 0, & \text{otherwise.} \end{cases}$$
We can compute the expectation of $U_{(1)}$ in a couple of different ways. By definition,
$$EU_{(1)} = \int_0^1 4u(1-u)^3\,du = \int_0^1 4(1-y)y^3\,dy = \int_0^1 (4y^3 - 4y^4)\,dy = 1 - 4/5 = 1/5.$$
Alternatively,
$$EU_{(1)} = \int_0^1 P\{U_{(1)} \geq u\}\,du = \int_0^1 (1-u)^4\,du = 1/5.$$

13.3. Constructing Uniform(0,1) random variables. Let $I_1, I_2, \ldots$ be independent Bernoulli$(1/2)$ random variables (based on repeated flips of a fair coin). Given $I_1, I_2, \ldots$, we define
$$Y := \sum_{j=1}^{\infty} I_j 2^{-j}.$$
Since the $\{I_j\}_{j \geq 1}$ are random, so is $Y$: it is the random variable with dyadic (binary) expansion $0.I_1 I_2 I_3 \ldots$. The set
$$D := \{0.\epsilon_1 \epsilon_2 \ldots \epsilon_j : \epsilon_j \in \{0,1\}\}$$
of dyadic rationals is dense in $[0,1]$, i.e., for every $u \in [0,1]$ and every $\delta > 0$, some $d \in (u - \delta, u + \delta)$ is in $D$. Suppose we terminate our expansion above at $j = 5$. Then we obtain an expansion $Y_5 := 0.i_1 i_2 i_3 i_4 i_5$, where $i_1,$
$\ldots, i_5$ are realizations of independent Bernoulli trials. What is the distribution of $Y_5$?
$$P\{Y_5 = u\} = \begin{cases} 2^{-5}, & u = 0.\epsilon_1 \ldots \epsilon_5,\ \epsilon_j \in \{0,1\} \\ 0, & \text{otherwise.} \end{cases}$$
In general, if we stop after $n \geq 1$ trials and define $Y_n := 0.i_1 \ldots i_n$, then
$$P\{Y_n = u\} = \begin{cases} 2^{-n}, & u = 0.\epsilon_1 \ldots \epsilon_n,\ \epsilon_j \in \{0,1\} \\ 0, & \text{otherwise.} \end{cases}$$
In other words, for $n \geq 1$, $Y_n$ is uniformly distributed on $D_n := \{0.\epsilon_1 \ldots \epsilon_n : \epsilon_j \in \{0,1\},\ j = 1, \ldots, n\}$.

Now, take any $u \in [0,1]$. Since $D$ is dense in $[0,1]$, there exists a sequence $d_1, d_2, \ldots$, with $d_n \in D_n$ for each $n \geq 1$, such that $d_n \to u$. In fact, we can choose $d_n$ so that $|u - d_n| \leq 2^{-n}$ for each $n \geq 1$. Now, generate a sequence $I_1, I_2, \ldots$ i.i.d. Bernoulli$(1/2)$ and, for each $n \geq 1$, define $Y_n := 0.I_1 \ldots I_n$ as above. For the $d_n$ we have chosen, we have
$$d_n - 2^{-n} \leq P\{Y_n \leq u\} \leq d_n + 2^{-n} \quad \text{for all } n \geq 1.$$
By the sandwich lemma, $\lim_{n\to\infty} P\{Y_n \leq u\} = \lim_{n\to\infty} d_n = u$ for all $u \in (0,1)$. This is the notion of convergence in distribution. This method tells us that we can approximate a uniform random variable to precision $2^{-n}$ by flipping a coin only $n$ times. This approach is more efficient than the one described above, where we must generate a number between 1 and $n$ to obtain precision only up to $1/n$.

13.4. The Normal (Gaussian) distribution. A random variable $Z$ has the standard normal distribution if it is continuous with density
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$
The Normal distribution is the most important distribution in statistics, as we will see later in the course. The expectation and variance can be computed as follows:
$$EZ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z e^{-z^2/2}\,dz = 0$$
and, integrating by parts,
$$\mathrm{Var}(Z) = EZ^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \times z e^{-z^2/2}\,dz = \Big[-\frac{z e^{-z^2/2}}{\sqrt{2\pi}}\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz = 1.$$
The density of $Z$ is commonly denoted $\phi$ and the distribution function is denoted $\Phi$. For $\alpha \in (0,1)$, $z_\alpha$ is the number such that $\Phi(z_\alpha) = \alpha$, e.g., $z_{0.5} = 0$, $z_{0.84} \approx 1$, $z_{0.025} \approx -2$, and so on.
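These quantile values can be checked with Python's standard library, which ships a `NormalDist` class (a quick sketch):

```python
# Verify the quoted standard normal quantiles z_alpha, where Phi(z_alpha) = alpha.
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)

z_050 = Z.inv_cdf(0.5)     # exactly 0
z_084 = Z.inv_cdf(0.84)    # about 0.99, i.e., roughly 1
z_0025 = Z.inv_cdf(0.025)  # about -1.96, i.e., roughly -2
print(z_050, z_084, z_0025)
```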
These are called the $\alpha$-quantiles of the standard normal distribution and are used repeatedly in statistical inference.

A random variable $X$ is said to follow the Normal distribution with parameters $(\mu, \sigma^2)$, for $-\infty < \mu < \infty$ and $\sigma^2 > 0$, written $X \sim N(\mu, \sigma^2)$, if
$$Z := \frac{X - \mu}{\sigma}$$
is a standard Normal random variable. We denote the distribution of $X$ by $\Phi_{\mu,\sigma}$. The above transformation of $X$ is a change of location by $\mu$ and a change of scale by $\sigma$. Consequently, the family $\{\Phi_{\mu,\sigma} : -\infty < \mu < \infty,\ \sigma^2 > 0\}$ of Normal distributions is called a location-scale family of distributions, which will be covered in another course on statistical inference. Since $X = \mu + \sigma Z$, where $Z \sim N(0,1)$, we have
$$EX = \mu + \sigma EZ = \mu \quad \text{and} \quad \mathrm{Var}(X) = \mathrm{Var}(\mu + \sigma Z) = \sigma^2\,\mathrm{Var}(Z) = \sigma^2.$$
Furthermore, $X$ has a density:
$$f_X(x)\,dx = P\{x \leq X \leq x + dx\} = P\Big\{\frac{x-\mu}{\sigma} \leq \frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma} + \frac{dx}{\sigma}\Big\} = P\Big\{\frac{x-\mu}{\sigma} \leq Z \leq \frac{x-\mu}{\sigma} + \frac{dx}{\sigma}\Big\} = f_Z\Big(\frac{x-\mu}{\sigma}\Big)\frac{dx}{\sigma}.$$
Therefore, we denote the density of $X$ by $\phi_{\mu,\sigma}$, which satisfies
$$(22) \qquad \phi_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}, \quad -\infty < x < \infty.$$

Theorem 13.3 (de Moivre–Laplace Theorem). Let $X$ be a Binomial random variable with parameters $(n, p)$. Then $X$ is approximately Normally distributed with parameters $(np, npq)$.

Remark 13.4. This approximation requires $npq \geq 10$ (or so) in order to be accurate.

Example 13.5 (Normal approximation to the Binomial). Spin a pointer 10 times with 4 possible outcomes A, B, C, D. Let $N_A$ denote the number of times the pointer points toward A. Then $N_A$ follows the Binomial distribution with parameters $(10, 1/4)$. We have
$$P\{N_A \geq 4\} = \sum_{k=4}^{10} \binom{10}{k} \Big(\frac{1}{4}\Big)^k \Big(\frac{3}{4}\Big)^{10-k} \approx 0.224.$$
By de Moivre–Laplace, $N_A$ is approximately Normally distributed with mean $\mu = np = 2.5$ and standard deviation $\sigma = \sqrt{npq} \approx 1.37$. The Normal approximation (with continuity correction) gives
$$P\{N_A \geq 4\} \approx P\Big\{\frac{N_A - 2.5}{1.37} \geq \frac{3.5 - 2.5}{1.37}\Big\} = 1 - \Phi(0.73) \approx 0.233.$$

Example 13.6 (Marbles). A large collection of marbles has diameters that are approximately Normally distributed with mean 1 cm.
One third of the marbles have diameters greater than 1.1 cm.

(a) What is the standard deviation? Since two thirds of the marbles have diameters at most 1.1 cm, we need $\Phi((1.1 - \mu)/\sigma) = 2/3$. From the normal table, $z_{0.667} \approx 0.43$; hence, with $(X - \mu)/\sigma \sim N(0,1)$,
$$\frac{1.1 - 1}{\sigma} = 0.43 \implies \sigma = 0.1\,\mathrm{cm}/0.43 \approx 0.23\,\mathrm{cm}.$$

(b) What fraction of marbles have diameters within 0.2 cm of the mean?
$$P\{0.8 \leq D \leq 1.2\} = P\Big\{\frac{-0.2}{0.23} \leq Z \leq \frac{1.2 - 1}{0.23}\Big\} = P\{-0.86 \leq Z \leq 0.86\} = \Phi(0.86) - [1 - \Phi(0.86)] = 2\Phi(0.86) - 1 \approx 0.61.$$

(c) What diameter is exceeded by 75% of the marbles? The 0.25 quantile of $Z$ is $z_{0.25} = -0.675$. Hence, the diameter exceeded by 75%, denoted $d_{75}$, is
$$d_{75} = \mu - 0.675\sigma = 1 - 0.675 \times 0.23 \approx 0.84\,\mathrm{cm}.$$

13.5. The Exponential distribution. Consider modeling the arrivals of a bus as follows. Fix some $\lambda > 0$ and assume that the number of bus arrivals within each time interval follows a Poisson distribution with parameter $\lambda \times (\text{length of the interval})$. For example, if $\lambda = 2/\text{hour}$, then the number of arrivals between 12:00PM and 2:00PM is Poisson with parameter $2 \times 2 = 4$. Further, we assume that the numbers of arrivals in non-overlapping intervals are independent Poisson random variables with the appropriate means. Therefore, the number of arrivals between 12:00 and 2:00 is Poisson(4) and is independent of the number of arrivals between 2:30 and 3:00, which is Poisson(1).

The above model is called a time-homogeneous Poisson process with intensity parameter $\lambda > 0$. Writing $N_{(s,t)}$ to denote the number of arrivals in the interval $(s,t)$, the Poisson process is determined by the marginal distributions $N_{(s,t)} \sim \mathrm{Poisson}(\lambda(t - s))$.

Let $T := \inf\{t > 0 : N_{[0,t]} = 1\}$ denote the time of the first arrival. What is the distribution of $T$?

• First method: Note that the event $\{T > t\}$ is the same as the event that there are no arrivals in the interval $[0,t]$ for our Poisson process.
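As an aside, this no-arrival event can be checked by simulation. The sketch below draws Poisson counts with Knuth's multiplication method (`poisson_sample` is a helper written for this illustration, not part of any library) and confirms that the chance of no arrival in an interval of length $t$ is about $e^{-\lambda t}$:

```python
import math
import random

random.seed(2)

def poisson_sample(mean):
    # Knuth's multiplication method; adequate for small means.
    L = math.exp(-mean)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

lam, t = 2.0, 0.5              # intensity 2 per hour; interval [0, 0.5] hours
n = 100_000
counts = [poisson_sample(lam * t) for _ in range(n)]

frac_no_arrival = sum(c == 0 for c in counts) / n
print(frac_no_arrival, math.exp(-lam * t))   # both close to e^{-1} = 0.368
```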
Therefore, $P\{T > t\} = P\{N_{[0,t]} = 0\} = e^{-\lambda t}$, by the assumption that counts follow the Poisson distribution; therefore,
$$F_T(t) = P\{T \leq t\} = 1 - P\{T > t\} = 1 - e^{-\lambda t} \quad \text{and} \quad f_T(t) = F_T'(t) = \lambda e^{-\lambda t}.$$

• Second method: For $t \geq 0$ and $dt > 0$,
$$f_T(t)\,dt = P\{t \leq T \leq t + dt\} = P\{\text{no arrival in } [0,t), \text{ one arrival in } [t, t+dt]\} = P\{N_{[0,t)} = 0\} \times P\{N_{[t,t+dt)} = 1\} = e^{-\lambda t} \times e^{-\lambda\,dt}(\lambda\,dt)^1/1! = \lambda e^{-\lambda t} e^{-\lambda\,dt}\,dt.$$
For the density of $T$, we have
$$f_T(t) = \lim_{dt \downarrow 0} \frac{P\{t \leq T \leq t + dt\}}{dt} = \lim_{dt \downarrow 0} \frac{\lambda e^{-\lambda t} e^{-\lambda\,dt}\,dt}{dt} = \lambda e^{-\lambda t}.$$
You can check that $f_T$ above integrates to one and so is a proper probability density. The random variable $T$ is said to have the Exponential distribution with intensity parameter $\lambda > 0$, denoted $T \sim \mathrm{Exp}(\lambda)$.

Now, put $X = \lambda T$, where $T \sim \mathrm{Exp}(\lambda)$. Then $X$ is called a standard Exponential random variable:
$$F_X(x) = P\{X \leq x\} = P\{T \leq x/\lambda\} = 1 - e^{-x}, \quad x > 0,$$
and $f_X(x) = e^{-x}$.

To compute the moments of the Exponential distribution, we introduce the Gamma function, which for real numbers $r > 0$ is defined by
$$\Gamma(r) := \int_0^\infty x^{r-1} e^{-x}\,dx.$$
The Gamma function takes some special values. In particular, $\Gamma(1) = 1$ and, for any $r > 0$,
$$\Gamma(r+1) = \int_0^\infty x^r e^{-x}\,dx = \big[x^r(-e^{-x})\big]_0^\infty - \int_0^\infty r x^{r-1}(-e^{-x})\,dx = r\Gamma(r).$$
Therefore, if $r$ is a positive integer, then $\Gamma(r+1) = r(r-1)\cdots 2 \cdot 1 \cdot \Gamma(1) = r!$.

Now, let $X$ be a standard Exponential random variable. Using the Gamma function, we have
$$EX = \int_0^\infty x e^{-x}\,dx = \Gamma(2) = 1, \qquad EX^2 = \int_0^\infty x^2 e^{-x}\,dx = \Gamma(3) = 2,$$
and, in general,
$$EX^n = \int_0^\infty x^n e^{-x}\,dx = \Gamma(n+1) = n!.$$
We also have $\mathrm{Var}(X) = EX^2 - [EX]^2 = 2 - 1 = 1$. Now, let $T \sim \mathrm{Exp}(\lambda)$. Then $T$ has the same distribution as $X/\lambda$ and so
$$ET = 1/\lambda; \qquad \mathrm{Var}(T) = 1/\lambda^2; \qquad \mathrm{SD}(T) = 1/\lambda.$$

Example 13.7 (Electrical systems). Consider two components $C_1, C_2$ of an electrical system. The system functions as long as both components are functioning. Suppose the lifetimes $T_1, T_2$ of $C_1$ and $C_2$ are modeled by independent Exponential random variables with parameters $\lambda_1, \lambda_2$, and let $T$ be the lifetime of the system.
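Before analyzing the system, the moment formulas $ET = 1/\lambda$ and $\mathrm{Var}(T) = 1/\lambda^2$ just derived can be checked numerically. A short sketch, sampling $T \sim \mathrm{Exp}(\lambda)$ by inversion of the CDF, $T = -\log(U)/\lambda$ for $U$ uniform:

```python
import math
import random

random.seed(3)

lam = 0.5
n = 200_000
# Inversion: if U ~ Uniform(0,1), then -log(U)/lam ~ Exp(lam).
# (1.0 - random.random() lies in (0, 1], avoiding log(0).)
samples = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean, var)   # close to 1/lam = 2 and 1/lam**2 = 4
```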
What is the distribution of $T := \min(T_1, T_2)$?

• Method 1 (CDF approach): Let $t \geq 0$. Then
$$P\{T > t\} = P\{T_1 > t,\ T_2 > t\} = P\{T_1 > t\}\,P\{T_2 > t\} = e^{-\lambda_1 t} e^{-\lambda_2 t} = e^{-(\lambda_1 + \lambda_2)t} = e^{-\lambda t},$$
where $\lambda = \lambda_1 + \lambda_2$.

• Method 2 (infinitesimals approach): Let $t \geq 0$. Then
$$f(t)\,dt = P\{t - dt < T \leq t\} = P\{t - dt < T_1 \leq t,\ T_2 > t\} + P\{t - dt < T_2 \leq t,\ T_1 > t\} = P\{t - dt < T_1 \leq t\}\,P\{T_2 > t\} + P\{t - dt < T_2 \leq t\}\,P\{T_1 > t\} = \lambda_1 e^{-\lambda_1 t}\,dt\,e^{-\lambda_2 t} + \lambda_2 e^{-\lambda_2 t}\,dt\,e^{-\lambda_1 t} = \lambda e^{-\lambda t}\,dt.$$
We conclude that $T = \min(T_1, T_2) \sim \mathrm{Exp}(\lambda_1 + \lambda_2)$ and $ET = 1/(\lambda_1 + \lambda_2)$.

What is the probability that component 1 fails before component 2?
$$P\{T_2 > T_1\} = \int_0^\infty P\{T_2 > T_1 \mid T_1 = t\}\,f_{T_1}(t)\,dt = \int_0^\infty P\{T_2 > t \mid T_1 = t\}\,\lambda_1 e^{-\lambda_1 t}\,dt = \int_0^\infty e^{-\lambda_2 t}\,\lambda_1 e^{-\lambda_1 t}\,dt = \lambda_1/(\lambda_1 + \lambda_2).$$
For $s \geq 0$, what is $P\{T > s \text{ and } T_2 > T_1\}$? (Exercise.)

The following is a (very good) exercise.

Proposition 13.8. Let $T_1, T_2, \ldots, T_n$ be independent Exponential random variables with intensities $\lambda_1, \ldots, \lambda_n$, respectively. Then the minimum of $T_1, \ldots, T_n$ follows the Exponential distribution with parameter $\lambda_\bullet := \lambda_1 + \cdots + \lambda_n$.

When studying discrete distributions, we mentioned that the Geometric distribution is the unique discrete distribution with the memoryless property. In fact, the Exponential distribution is the unique continuous distribution with the memoryless property: let $T \sim \mathrm{Exp}(\lambda)$ and $s, t \geq 0$. Then
$$P\{T \geq t + s \mid T \geq t\} = \frac{P\{T \geq t + s,\ T \geq t\}}{P\{T \geq t\}} = \frac{e^{-\lambda(t+s)}}{e^{-\lambda t}} = e^{-\lambda s}.$$
Given $T \geq t$, the residual lifetime $T - t$ has the Exponential distribution with parameter $\lambda$. The Exponential distribution is the only memoryless non-negative continuous distribution.

The memoryless property is useful for quick probability calculations. For example, suppose the lifetime of a light bulb is Exponentially distributed with parameter 1/200. Then, given that the bulb is still working after 800 hours, how much longer do you expect it to work?
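The memoryless property can be verified by simulation with the light-bulb parameters (a sketch): the conditional survival probability $P\{T \geq t + s \mid T \geq t\}$ matches the unconditional $P\{T \geq s\}$, and the mean residual lifetime matches $1/\lambda$.

```python
import math
import random

random.seed(4)

lam, t, s = 1 / 200, 800.0, 200.0   # light-bulb rate; condition on 800 hours
n = 400_000
samples = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

# Conditional survival P{T >= t + s | T >= t} vs. unconditional P{T >= s}.
alive_at_t = [x for x in samples if x >= t]
cond = sum(x >= t + s for x in alive_at_t) / len(alive_at_t)
uncond = sum(x >= s for x in samples) / n
print(cond, uncond, math.exp(-lam * s))   # all close to e^{-1} = 0.368

# Mean residual lifetime, given survival to time t: close to 1/lam = 200.
residual_mean = sum(x - t for x in alive_at_t) / len(alive_at_t)
print(residual_mean)
```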
By the memoryless property, the residual lifetime is still Exponential with parameter 1/200, and so we expect it to work for another 200 hours.

13.6. The Gamma distribution. Consider a Poisson process with intensity $\lambda$. Let $W_r$ be the waiting time of the $r$th arrival. Then
$$P\{W_r > t\} = P\{N_{[0,t]} \leq r - 1\} = \sum_{j=0}^{r-1} e^{-\lambda t}(\lambda t)^j/j!$$
and
$$f_r(t)\,dt = P\{t \leq W_r \leq t + dt\} = P\{N_{[0,t)} = r - 1,\ N_{[t,t+dt)} = 1\} = \frac{e^{-\lambda t}(\lambda t)^{r-1}}{(r-1)!} \times \lambda\,dt\,e^{-\lambda\,dt} = \frac{\lambda^r}{\Gamma(r)} t^{r-1} e^{-\lambda t}\,dt\,e^{-\lambda\,dt}.$$
A random variable $W$ with density
$$f(t) = \frac{\lambda^r}{\Gamma(r)} t^{r-1} e^{-\lambda t}\,I_{(0,\infty)}(t)$$
is said to have the Gamma distribution with shape parameter $r$ and intensity $\lambda$, written $W \sim \mathrm{Gamma}(r, \lambda)$.

If $W \sim \mathrm{Gamma}(r, \lambda)$, then $X = \lambda W \sim \mathrm{Gamma}(r, 1)$. Now, suppose $X \sim \mathrm{Gamma}(r, 1)$. Then
$$EX = \frac{1}{\Gamma(r)} \int_0^\infty x \times x^{r-1} e^{-x}\,dx = \frac{\Gamma(r+1)}{\Gamma(r)} = \frac{r\Gamma(r)}{\Gamma(r)} = r,$$
$$EX^2 = \Gamma(r+2)/\Gamma(r) = r(r+1), \quad \text{and} \quad \mathrm{Var}(X) = r^2 + r - r^2 = r.$$
Therefore, if $W \sim \mathrm{Gamma}(r, \lambda)$, then $EW = r/\lambda$ and $\mathrm{Var}(W) = r/\lambda^2$.

13.7. Hazard functions. Let $T$ be a nonnegative continuous random variable with density $f$ and distribution function $F$. We interpret $T$ as the lifetime of something. Define
$$\lambda(t) := \frac{P\{t < T \leq t + dt \mid T > t\}}{dt},$$
the ratio of (the probability of dying in the next $dt$ time units, given that it has lived at least $t \geq 0$) to $dt$. We call $\lambda(t)$ the death rate, failure rate, or hazard rate at time $t$. The survival probability is given by
$$S(t) := P\{T > t\} = 1 - F(t), \quad t > 0.$$
From the hazard rate, we obtain the distribution of $T$:
$$\lambda(t) = \frac{P\{t \leq T \leq t + dt \mid T > t\}}{dt} = \frac{P\{t \leq T \leq t + dt\}}{dt\,P\{T > t\}} = \frac{f(t)}{1 - F(t)} = -\frac{d}{dt}\log(1 - F(t)).$$
As an example, for $T$ following the Exponential distribution with rate $\lambda$, we have
$$\lambda(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda,$$
which is called the constant force of mortality in actuarial modeling.

In general, let $\lambda(t)$ be a hazard function for some random variable $T$. Then
$$\int_0^t \lambda(s)\,ds = \big[-\log(1 - F(s))\big]_0^t = -\log(1 - F(t)) + \log(1 - F(0)).$$
Therefore,
$$S(t) = 1 - F(t) = \exp\Big\{-\int_0^t \lambda(s)\,ds\Big\}$$
and
$$f(t) = -\frac{d}{dt} S(t) = -\exp\Big\{-\int_0^t \lambda(s)\,ds\Big\}(-\lambda(t)) = \lambda(t)\,S(t).$$
This shows that the Exponential distribution is the only memoryless distribution since, if $T$ is memoryless, then $\lambda(t) = f(0)$ for all $t \geq 0$. Thus, $T$ has a constant hazard rate function and must follow the Exponential distribution with parameter $\lambda = f(0)$.

Example 13.9 (The Rayleigh distribution). Suppose $T$ has hazard function $\lambda(t) = t$. Then
$$S(t) = \exp\Big\{-\int_0^t \lambda(s)\,ds\Big\} = \exp\Big\{-\int_0^t s\,ds\Big\} = e^{-t^2/2}$$
and $f(t) = \lambda(t)\,S(t) = t e^{-t^2/2}$. The mean of $T$ is
$$ET = \int_0^\infty t^2 e^{-t^2/2}\,dt = \frac{\sqrt{2\pi}}{2} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt = \sqrt{\pi/2}.$$
Also, substituting $u = t^2/2$,
$$ET = \int_0^\infty S(t)\,dt = \int_0^\infty e^{-t^2/2}\,dt = \frac{1}{\sqrt{2}} \int_0^\infty u^{-1/2} e^{-u}\,du = \frac{1}{\sqrt{2}}\,\Gamma(1/2).$$
Therefore,
$$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(3/2) = \frac{1}{2}\Gamma(1/2) = \sqrt{\pi}/2, \qquad \Gamma(5/2) = \frac{3}{2}\Gamma(3/2) = 3\sqrt{\pi}/4.$$
We also have
$$ET^2 = \int_0^\infty 2t\,S(t)\,dt = 2\int_0^\infty t e^{-t^2/2}\,dt = 2$$
and $\mathrm{Var}(T) = 2 - \pi/2$. We say that $T$ has the (standard) Rayleigh distribution.

13.8. Some extreme value theory. Suppose we want to analyze extreme (possibly catastrophic) events, such as the occurrence of a "storm of the century" or an economic collapse. Then we consider the distribution of the maximum of a sequence of random variables. For example, we might be interested in the largest daily drop of the Dow Jones Industrial Average, the maximum insurance claim submitted over a certain period, maximum weekly rainfall, etc.

Let $X_1, X_2, \ldots$ be i.i.d. standard Exponential random variables and, for $n \geq 1$, let $M_n := \max(X_1, \ldots, X_n)$. Then, for $x \geq 0$,
$$P\{M_n \leq x\} = P\{X_1 \leq x, \ldots, X_n \leq x\} = (1 - e^{-x})^n,$$
which has density
$$f_{M_n}(x) = n e^{-x}(1 - e^{-x})^{n-1} = \binom{n}{1} e^{-x} \times (1 - e^{-x})^{n-1}.$$
Now, put $Y_n := M_n - \log n$, let $y \in \mathbb{R}$, and set $x_n := y + \log n$. Then, for all large $n$, $x_n \geq 0$ and
$$P\{Y_n \leq y\} = P\{M_n \leq x_n\} = (1 - e^{-x_n})^n = (1 - e^{-y - \log n})^n = (1 - e^{-y}/n)^n = P\{N = 0\},$$
where $N$ has the Binomial distribution with parameters $(n, e^{-y}/n)$.
Consequently, as $n \to \infty$, $p_n = e^{-y}/n \to 0$, $np_n = n e^{-y}/n \to e^{-y}$, and $N \to N_\infty \sim \mathrm{Poisson}(e^{-y})$, by the law of small numbers. We therefore have
$$P\{Y_n \leq y\} = (1 - e^{-y}/n)^n \to e^{-e^{-y}} = P\{N_\infty = 0\}.$$
A random variable $Y$ with distribution function $F(y) = e^{-e^{-y}}$ has the (standard) Double Exponential Extreme Value distribution, denoted $Y \sim EE(1)$. Now, suppose $T \sim \mathrm{Exp}(1)$. Then, for all $y \in \mathbb{R}$,
$$P\{-\log T \leq y\} = P\{\log T \geq -y\} = P\{T \geq e^{-y}\} = e^{-e^{-y}},$$
and so $-\log T \sim EE(1)$.

Some exercises on the Extreme Value distribution:
• If $Y \sim EE(1)$, what is its density?
• Fix $k$ and let $M_{n:k}$ be the $k$th largest of $X_1, \ldots, X_n$. For $y \in \mathbb{R}$, what is $\lim_{n\to\infty} P\{M_{n:k} - \log n \leq y\}$?
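The limit $P\{M_n - \log n \leq y\} \to e^{-e^{-y}}$ can be illustrated numerically (a sketch; the simulation sizes are modest choices for speed):

```python
import math
import random

random.seed(5)

n, reps, y = 500, 5000, 0.5
below = 0
for _ in range(reps):
    # Max of n i.i.d. standard Exponentials, sampled by inversion.
    m = max(-math.log(1.0 - random.random()) for _ in range(n))
    if m - math.log(n) <= y:
        below += 1

empirical = below / reps
limit = math.exp(-math.exp(-y))   # exp(-exp(-0.5)) = 0.545...
print(empirical, limit)
```

Already at $n = 500$ the centered maximum is very close to its Double Exponential limit, consistent with the $(1 - e^{-y}/n)^n$ calculation above.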