2. Random Variables, Distribution Functions, Expectation, Moment Generating Functions

Aim of this section:
• Mathematical definition of the concepts random variable, (cumulative) distribution function, (probability) density function, expectation and moments, moment generating function

8 Preliminaries:
• Repetition of the notions random experiment, outcome (sample point) and sample space, event, probability (see Wilfling (2011), Chapter 2)

9 2.1 Basic Terminology

Definition 2.1: (Random experiment)
A random experiment is an experiment (a) for which we know in advance all conceivable outcomes that it can take on, but (b) for which we do not know in advance the actual outcome that it eventually takes on. Random experiments are performed in controllable trials.

10 Examples of random experiments:
• Drawing of lottery numbers
• Roulette, tossing a coin, rolling a die
• 'Technical experiments' (testing the hardness of lots from steel production etc.)

In economics:
• Random experiments (according to Def. 2.1) are rare (historical data, trials are not controllable)
• Modern discipline: Experimental Economics

11 Definition 2.2: (Sample point, sample space)
Each conceivable outcome ω of a random experiment is called a sample point. The totality of conceivable outcomes (or sample points) is defined as the sample space and is denoted by Ω.

Examples:
• Random experiment of rolling a single die: Ω = {1, 2, 3, 4, 5, 6}
• Random experiment of tossing a coin until HEAD shows up: Ω = {H, TH, TTH, TTTH, TTTTH, . . .}
• Random experiment of measuring tomorrow's exchange rate between the euro and the US-$: Ω = [0, ∞)

12 Obviously:
• The number of elements in Ω can be either (1) finite or (2) infinite but countable or (3) infinite and uncountable

Now:
• Definition of the notion event based on mathematical sets

Definition 2.3: (Event)
An event of a random experiment is a subset of the sample space Ω. We say 'the event A occurs' if the random experiment has an outcome ω ∈ A.
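Definition 2.3 translates directly into code. The following sketch (plain Python; the names are made up for illustration) models the sample space of a die roll as a set and events as subsets, so the usual set operations apply:

```python
# Illustrative sketch: sample space of rolling a die, events as subsets of Ω.
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}                # event 'even number'
B = {4, 5, 6}                # event 'at least four'

# 'The event A occurs' means the outcome ω lies in A:
outcome = 4
print(outcome in A)          # True: the outcome 4 makes A occur

# Set operations on events:
print(A & B)        # intersection: both events occur
print(A | B)        # union: at least one event occurs
print(A - B)        # difference: A occurs, B does not
print(omega - A)    # complement: A does not occur
```

The point of the sketch is only that events are ordinary mathematical sets, so intersection, union, difference and complement need no extra machinery.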
13 Remarks:
• Events are typically denoted by A, B, C, . . . or A_1, A_2, . . .
• A = Ω is called the sure event (since for every sample point ω we have ω ∈ A)
• A = ∅ (empty set) is called the impossible event (since for every ω we have ω ∉ A)
• If the event A is a subset of the event B (A ⊂ B) we say that 'the occurrence of A implies the occurrence of B' (since for every ω ∈ A we also have ω ∈ B)

Obviously:
• Events are represented by mathematical sets −→ application of set operations to events

14 Combining events (set operations):
• Intersection: A_1 ∩ . . . ∩ A_n occurs if all A_i occur
• Union: A_1 ∪ . . . ∪ A_n occurs if at least one A_i occurs
• Set difference: C = A\B occurs if A occurs and B does not occur
• Complement: C = Ω\A ≡ A^c occurs if A does not occur
• The events A and B are called disjoint if A ∩ B = ∅ (both events cannot occur simultaneously)

15 Now:
• For any arbitrary event A we are looking for a number P(A) which represents the probability that A occurs
• Formally: P : A −→ P(A) (P(·) is a set function)

Question:
• Which properties should the probability function (set function) P(·) have?

16 Definition 2.4: (Kolmogorov axioms)
The following axioms for P(·) are called the Kolmogorov axioms:
• Nonnegativity: P(A) ≥ 0 for every A
• Standardization: P(Ω) = 1
• Additivity: For two disjoint events A and B (i.e.
for A ∩ B = ∅) P(·) satisfies P(A ∪ B) = P(A) + P(B)

17 Easy to check:
• The three axioms imply several additional properties and rules when computing with probabilities

Theorem 2.5: (General properties)
The Kolmogorov axioms imply the following properties:
• Probability of the complementary event: P(A^c) = 1 − P(A)
• Probability of the impossible event: P(∅) = 0
• Range of probabilities: 0 ≤ P(A) ≤ 1

18 Next:
• General rules when computing with probabilities

Theorem 2.6: (Calculation rules)
The Kolmogorov axioms imply the following calculation rules (A, B, C are arbitrary events):
• Addition rule (I):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
(probability that A or B occurs)

19 • Addition rule (II):
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)
(probability that A or B or C occurs)
• Probability of the 'difference event':
P(A\B) = P(A ∩ B^c) = P(A) − P(A ∩ B)

20 Notice:
• If B implies A (i.e. if B ⊂ A) it follows that
P(A\B) = P(A) − P(B)

21 2.2 Random Variable, Cumulative Distribution Function, Density Function

Frequently:
• Instead of being interested in a concrete sample point ω ∈ Ω itself, we are rather interested in a number depending on ω

Examples:
• Profit in euro when playing roulette
• Profit earned when selling a stock
• Monthly salary of a randomly selected person

Intuitive meaning of a random variable:
• Rule translating the abstract ω into a number

22 Definition 2.7: (Random variable [rv])
A random variable, denoted by X or X(·), is a mathematical function of the form
X : Ω −→ R, ω −→ X(ω).
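The calculation rules of Theorem 2.6 can be checked by brute-force enumeration. The sketch below (illustrative code, not part of the slides) enumerates the 36 equally likely outcomes of rolling two dice and verifies addition rule (I):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform probability on omega."""
    return sum(1 for w in omega if w in event) / len(omega)

A = {w for w in omega if w[0] == 6}           # first die shows 6
B = {w for w in omega if w[0] + w[1] == 7}    # sum of the dice equals 7

# Addition rule (I): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)   # both 11/36 ≈ 0.3056
assert abs(lhs - rhs) < 1e-12
```

Here P(A) = P(B) = 6/36 and A ∩ B = {(6, 1)} with probability 1/36, so the union has probability 11/36.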
Remarks:
• A random variable relates each sample point ω ∈ Ω to a real number
• Intuitively: a random variable X characterizes a number that is a priori unknown

23 • When the random experiment is carried out, the random variable X takes on the value x
• x is called realization or value of the random variable X after the random experiment has been carried out
• Random variables are denoted by capital letters, realizations are denoted by small letters
• The rv X describes the situation ex ante, i.e. before carrying out the random experiment
• The realization x describes the situation ex post, i.e. after having carried out the random experiment

24 Example 1:
• Consider the experiment of tossing a single coin (H = Head, T = Tail). Let the rv X represent the 'Number of Heads'
• We have Ω = {H, T}. The random variable X can take on two values: X(T) = 0, X(H) = 1

25 Example 2:
• Consider the experiment of tossing a coin three times. Let X represent the 'Number of Heads'
• We have Ω = {ω_1, ω_2, . . . , ω_8} with ω_1 = (H, H, H), ω_2 = (H, H, T), . . . , ω_8 = (T, T, T). The rv X is defined by X(ω) = number of H in ω
• Obviously: X relates distinct ω's to the same number, e.g. X((H, H, T)) = X((H, T, H)) = X((T, H, H)) = 2

26 Example 3:
• Consider the experiment of randomly selecting 1 person from a group of people. Let X represent the person's status of employment
• We have Ω = {ω_1, ω_2} with ω_1 = 'employed', ω_2 = 'unemployed'
• X can be defined as X(ω_1) = 1, X(ω_2) = 0

27 Example 4:
• Consider the experiment of measuring tomorrow's price of a specific stock. Let X denote the stock price
• We have Ω = [0, ∞), i.e. X is defined by X(ω) = ω

Conclusion:
• The random variable X can take on distinct values with specific probabilities

28 Question:
• How can we determine these specific probabilities and how can we calculate with them?
Simplifying notation: (a, b, x ∈ R)
• P(X = a) ≡ P({ω | X(ω) = a})
• P(a < X < b) ≡ P({ω | a < X(ω) < b})
• P(X ≤ x) ≡ P({ω | X(ω) ≤ x})

Solution:
• We can compute these probabilities via the so-called cumulative distribution function of X

29 Intuitively:
• The cumulative distribution function of the random variable X characterizes the probabilities according to which the possible values x are distributed along the real line (the so-called distribution of X)

Definition 2.8: (Cumulative distribution function [cdf])
The cumulative distribution function of a random variable X, denoted by F_X, is defined to be the function
F_X : R −→ [0, 1], x −→ F_X(x) = P({ω | X(ω) ≤ x}) = P(X ≤ x).

30 Example:
• Consider the experiment of tossing a coin three times. Let X represent the 'Number of Heads'
• We have Ω = {ω_1, ω_2, . . . , ω_8} with ω_1 = (H, H, H), ω_2 = (H, H, T), . . . , ω_8 = (T, T, T)
• For the probabilities of X we find
P(X = 0) = P({(T, T, T)}) = 1/8
P(X = 1) = P({(T, T, H), (T, H, T), (H, T, T)}) = 3/8
P(X = 2) = P({(T, H, H), (H, T, H), (H, H, T)}) = 3/8
P(X = 3) = P({(H, H, H)}) = 1/8

31 • Thus, the cdf is given by
F_X(x) = 0 for x < 0, 0.125 for 0 ≤ x < 1, 0.5 for 1 ≤ x < 2, 0.875 for 2 ≤ x < 3, 1 for x ≥ 3

Remarks:
• In practice, it will be sufficient to only know the cdf F_X of X
• In many situations, it will appear impossible to exactly specify the sample space Ω or the explicit function X : Ω −→ R.
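The three-coin-toss example above can be reproduced by enumerating the sample space. The following sketch (illustrative code; names are not from the slides) computes F_X at a few points:

```python
from itertools import product

# Sample space of three coin tosses: 8 equally likely outcomes.
omega = list(product("HT", repeat=3))

def X(w):
    """The random variable X = 'Number of Heads' in the outcome w."""
    return w.count("H")

def F(x):
    """cdf: F(x) = P(X <= x), computed by counting outcomes."""
    return sum(1 for w in omega if X(w) <= x) / len(omega)

print([F(x) for x in (-1, 0, 1, 2, 3)])   # [0.0, 0.125, 0.5, 0.875, 1.0]
```

The printed values match the piecewise cdf on slide 31: the cdf jumps at the possible values 0, 1, 2, 3 and is flat in between.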
However, often we may derive the cdf F_X from other factual considerations

32 General properties of F_X:
• F_X(x) is a monotone, nondecreasing function
• We have lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1
• F_X is continuous from the right; that is, lim_{z→x, z>x} F_X(z) = F_X(x)

33 Summary:
• Via the cdf F_X(x) we can answer the following question: 'What is the probability that the random variable X takes on a value that does not exceed x?'

Now:
• Consider the question: 'What is the value which X does not exceed with a prespecified probability p ∈ (0, 1)?' −→ quantile function of X

34 Definition 2.9: (Quantile function)
Consider the rv X with cdf F_X. For every p ∈ (0, 1) the quantile function of X, denoted by Q_X(p), is defined as
Q_X : (0, 1) −→ R, p −→ Q_X(p) = min{x | F_X(x) ≥ p}.
The value of the quantile function x_p = Q_X(p) is called the pth quantile of X.

Remarks:
• The pth quantile x_p of X is defined as the smallest number x satisfying F_X(x) ≥ p
• In other words: the pth quantile x_p is the smallest value that X does not exceed with probability p

35 Special quantiles:
• Median: p = 0.5
• Quartiles: p = 0.25, 0.5, 0.75
• Quintiles: p = 0.2, 0.4, 0.6, 0.8
• Deciles: p = 0.1, 0.2, . . . , 0.9

Now:
• Consideration of two distinct classes of random variables (discrete vs. continuous rv's)

36 Reason:
• Each class requires a specific mathematical treatment

Mathematical tools for analyzing discrete rv's:
• Finite and infinite sums

Mathematical tools for analyzing continuous rv's:
• Differential and integral calculus

Remarks:
• Some rv's are partly discrete and partly continuous
• Such rv's are not treated in this course

37 Definition 2.10: (Discrete random variable)
A random variable X will be defined to be discrete if it can take on either (a) only a finite number of values x_1, x_2, . . . , x_J or (b) an infinite, but countable number of values x_1, x_2, . . . each with strictly positive probability; that is, if for all j = 1, . . . , J, . . .
we have P(X = x_j) > 0 and Σ_{j=1}^{J,...} P(X = x_j) = 1.

38 Examples of discrete variables:
• Countable variables ('X = Number of . . .')
• Encoded qualitative variables

Further definitions:

Definition 2.11: (Support of a discrete random variable)
The support of a discrete rv X, denoted by supp(X), is defined to be the totality of all values that X can take on with a strictly positive probability:
supp(X) = {x_1, . . . , x_J} or supp(X) = {x_1, x_2, . . .}.

39 Definition 2.12: (Discrete density function)
For a discrete random variable X the function
f_X(x) = P(X = x)
is defined to be the discrete density function of X.

Remarks:
• The discrete density function f_X(·) takes on strictly positive values only for elements of the support of X. For realizations of X that do not belong to the support of X, i.e. for x ∉ supp(X), we have f_X(x) = 0:
f_X(x) = P(X = x_j) > 0 for x = x_j ∈ supp(X), and f_X(x) = 0 for x ∉ supp(X)

40 • The discrete density function f_X(·) has the following properties:
f_X(x) ≥ 0 for all x
Σ_{x_j ∈ supp(X)} f_X(x_j) = 1
• For any arbitrary set A ⊂ R the probability of the event {ω | X(ω) ∈ A} = {X ∈ A} is given by
P(X ∈ A) = Σ_{x_j ∈ A} f_X(x_j)

41 Example:
• Consider the experiment of tossing a coin three times and let X = 'Number of Heads' (see slide 31)
• Obviously: X is discrete and has the support supp(X) = {0, 1, 2, 3}
• The discrete density function of X is given by
f_X(x) = 0.125 for x = 0, 0.375 for x = 1, 0.375 for x = 2, 0.125 for x = 3, and 0 for x ∉ supp(X)

42 • The cdf of X is given by (see slide 32)
F_X(x) = 0 for x < 0, 0.125 for 0 ≤ x < 1, 0.5 for 1 ≤ x < 2, 0.875 for 2 ≤ x < 3, 1 for x ≥ 3

Obviously:
• The cdf F_X(·) can be obtained from f_X(·):
F_X(x) = P(X ≤ x) = Σ_{x_j ∈ supp(X), x_j ≤ x} f_X(x_j)

43 Conclusion:
• The cdf of a discrete random variable X is a step function with steps at the points x_j ∈ supp(X).
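The relationship f_X → F_X, together with the quantile function of Definition 2.9, can be sketched for this discrete example (illustrative code; names are made up):

```python
# Discrete density of X = 'Number of Heads' in three tosses (slide 41).
support = [0, 1, 2, 3]
f = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def F(x):
    """cdf obtained from the density: sum of f(x_j) over all x_j <= x."""
    return sum(f[xj] for xj in support if xj <= x)

def Q(p):
    """Quantile function (Definition 2.9): smallest x with F(x) >= p."""
    return min(x for x in support if F(x) >= p)

print(Q(0.5))            # median: 1, since F(0) = 0.125 < 0.5 <= F(1)
print(Q(0.25), Q(0.75))  # lower quartile 1, upper quartile 2
```

Note that Q searches only over the support: for a discrete rv the cdf is a step function, so the minimum in Definition 2.9 is always attained at a support point.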
The height of the step at x_j is given by
F_X(x_j) − lim_{x→x_j, x<x_j} F_X(x) = P(X = x_j) = f_X(x_j),
i.e. the step height is equal to the value of the discrete density function at x_j (relationship between cdf and discrete density function)

44 Now:
• Definition of continuous random variables

Intuitively:
• In contrast to discrete random variables, continuous random variables can take on an uncountable number of values (e.g. every real number in a given interval)

In fact:
• The definition of a continuous random variable is quite technical

45 Definition 2.13: (Continuous rv, probability density function)
A random variable X is called continuous if there exists a function f_X : R −→ [0, ∞) such that the cdf of X can be written as
F_X(x) = ∫_{−∞}^{x} f_X(t) dt for all x ∈ R.
The function f_X(x) is called the probability density function (pdf) of X.

Remarks:
• The cdf F_X(·) of a continuous random variable X is a primitive function of the pdf f_X(·)
• F_X(x) = P(X ≤ x) is equal to the area under the pdf f_X(·) between the limits −∞ and x

46 [Figure: cdf F_X(·) and pdf f_X(·) — the area under f_X(t) up to the point x equals P(X ≤ x) = F_X(x)]

47 Properties of the pdf f_X(·):
1. A pdf f_X(·) cannot take on negative values, i.e. f_X(x) ≥ 0 for all x ∈ R
2. The area under a pdf is equal to one, i.e. ∫_{−∞}^{+∞} f_X(x) dx = 1
3. If the cdf F_X(x) is differentiable we have f_X(x) = F_X′(x) ≡ dF_X(x)/dx

48 Example: (Uniform distribution over [0, 10])
• Consider the random variable X with pdf
f_X(x) = 0.1 for x ∈ [0, 10], and f_X(x) = 0 for x ∉ [0, 10]
• Derivation of the cdf F_X: for x < 0 we have
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{x} 0 dt = 0

49 For x ∈ [0, 10] we have
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 0.1 dt = [0.1 · t]_{0}^{x} = 0.1 · x − 0.1 · 0 = 0.1 · x

50 For x > 10 we have
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{10} 0.1 dt + ∫_{10}^{x} 0 dt = 0 + 1 + 0 = 1

51 Now:
• Interval probabilities, i.e.
(for a, b ∈ R, a < b)
P(X ∈ (a, b]) = P(a < X ≤ b)
• We have
P(a < X ≤ b) = P({ω | a < X(ω) ≤ b})
= P({ω | X(ω) > a} ∩ {ω | X(ω) ≤ b})
= 1 − P(({ω | X(ω) > a} ∩ {ω | X(ω) ≤ b})^c)
= 1 − P({ω | X(ω) > a}^c ∪ {ω | X(ω) ≤ b}^c)
= 1 − P({ω | X(ω) ≤ a} ∪ {ω | X(ω) > b})
52 = 1 − [P(X ≤ a) + P(X > b)]
= 1 − [F_X(a) + (1 − P(X ≤ b))]
= 1 − [F_X(a) + 1 − F_X(b)]
= F_X(b) − F_X(a)
= ∫_{−∞}^{b} f_X(t) dt − ∫_{−∞}^{a} f_X(t) dt
= ∫_{a}^{b} f_X(t) dt

53 [Figure: interval probability between the limits a and b — the area under f_X(x) over (a, b] equals P(a < X ≤ b)]

54 Important result for a continuous rv X:
P(X = a) = 0 for all a ∈ R

Proof:
P(X = a) = lim_{b→a} P(a < X ≤ b) = lim_{b→a} ∫_{a}^{b} f_X(x) dx = ∫_{a}^{a} f_X(x) dx = 0

Conclusion:
• The probability that a continuous random variable X takes on a single explicit value is always zero

55 [Figure: probability of a single value — as b_1, b_2, b_3 approach a, the area under f_X(x) over (a, b_i] shrinks to zero]

56 Notice:
• This does not imply that the event {X = a} cannot occur

Consequence:
• Since for continuous random variables we always have P(X = a) = 0 for all a ∈ R, it follows that
P(a < X < b) = P(a ≤ X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = F_X(b) − F_X(a)
(when computing interval probabilities for continuous rv's, it does not matter if the interval is open or closed)

57 2.3 Expectation, Moments and Moment Generating Functions

Repetition:
• Expectation of an arbitrary random variable X

Definition 2.14: (Expectation)
The expectation of the random variable X, denoted by E(X), is defined by
E(X) = Σ_{x_j ∈ supp(X)} x_j · P(X = x_j), if X is discrete,
E(X) = ∫_{−∞}^{+∞} x · f_X(x) dx, if X is continuous.

58 Remarks:
• The expectation of the random variable X is the sum (or integral) of all realizations, each weighted by the probability of its occurrence
• Instead of E(X) we often write µ_X
• There exist random variables that do not have an expectation (see class)

59 Example 1: (Discrete random variable)
• Consider the experiment of tossing two dice. Let X represent the absolute difference of the two dice. What is the expectation of X?
• The support of X is given by supp(X) = {0, 1, 2, 3, 4, 5}

60 • The discrete density function of X is given by
f_X(x) = P(X = 0) = 6/36 for x = 0
f_X(x) = P(X = 1) = 10/36 for x = 1
f_X(x) = P(X = 2) = 8/36 for x = 2
f_X(x) = P(X = 3) = 6/36 for x = 3
f_X(x) = P(X = 4) = 4/36 for x = 4
f_X(x) = P(X = 5) = 2/36 for x = 5
f_X(x) = 0 for x ∉ supp(X)
• This gives
E(X) = 0 · 6/36 + 1 · 10/36 + 2 · 8/36 + 3 · 6/36 + 4 · 4/36 + 5 · 2/36 = 70/36 = 1.9444

61 Example 2: (Continuous random variable)
• Consider the continuous random variable X with pdf
f_X(x) = x/4 for 1 ≤ x ≤ 3, and f_X(x) = 0 elsewhere
• To calculate the expectation we split up the integral:
E(X) = ∫_{−∞}^{+∞} x · f_X(x) dx = ∫_{−∞}^{1} x · 0 dx + ∫_{1}^{3} x · (x/4) dx + ∫_{3}^{+∞} x · 0 dx

62 = ∫_{1}^{3} x²/4 dx = [(1/4) · (1/3) · x³]_{1}^{3} = (1/4) · (27/3 − 1/3) = 26/12 = 2.1667

Frequently:
• The random variable X plus its discrete density or pdf f_X is known
• We have to find the expectation of the transformed random variable Y = g(X)

63 Theorem 2.15: (Expectation of a transformed rv)
Let X be a random variable with discrete density or pdf f_X(·). For any Baire function g : R −→ R the expectation of the transformed random variable Y = g(X) is given by
E(Y) = E[g(X)] = Σ_{x_j ∈ supp(X)} g(x_j) · P(X = x_j), if X is discrete,
E(Y) = E[g(X)] = ∫_{−∞}^{+∞} g(x) · f_X(x) dx, if X is continuous.

64 Remarks:
• All functions considered in this course are Baire functions
• For the special case g(x) = x (the identity function) Theorem 2.15 coincides with Definition 2.14

Next:
• Some important rules for calculating expected values

65 Theorem 2.16: (Properties of expectations)
Let X be an arbitrary random variable (discrete or continuous), c, c_1, c_2 ∈ R constants and g, g_1, g_2 : R −→ R functions. Then:
1. E(c) = c.
2. E[c · g(X)] = c · E[g(X)].
3. E[c_1 · g_1(X) + c_2 · g_2(X)] = c_1 · E[g_1(X)] + c_2 · E[g_2(X)].
4. If g_1(x) ≤ g_2(x) for all x ∈ R then E[g_1(X)] ≤ E[g_2(X)].
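Both expectation examples can be recomputed in a few lines, the discrete one exactly with rational arithmetic and the continuous one by a midpoint Riemann sum (a sketch; the code and names are illustrative, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Example 1 (slide 60): X = absolute difference of two dice, computed exactly.
density = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x = abs(d1 - d2)
    density[x] = density.get(x, Fraction(0)) + Fraction(1, 36)

EX = sum(x * p for x, p in density.items())
print(EX)   # 35/18, i.e. 70/36 ≈ 1.9444

# Example 2 (slide 61): continuous X with pdf f(x) = x/4 on [1, 3];
# E(X) = integral of x * f(x), approximated by a midpoint Riemann sum.
n = 100_000
h = 2 / n
EX_cont = 0.0
for i in range(n):
    x = 1 + (i + 0.5) * h
    EX_cont += x * (x / 4) * h

print(round(EX_cont, 4))   # ≈ 26/12 ≈ 2.1667
```

Using `Fraction` for the discrete case avoids any rounding, so the result 70/36 appears exactly (in lowest terms, 35/18).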
Proof: Class

66 Now:
• Consider the random variable X (discrete or continuous) and the explicit function g(x) = [x − E(X)]² −→ variance and standard deviation of X

Definition 2.17: (Variance, standard deviation)
For any random variable X the variance, denoted by Var(X), is defined as the expected quadratic distance between X and its expectation E(X); that is
Var(X) = E[(X − E(X))²].
The standard deviation of X, denoted by SD(X), is defined to be the (positive) square root of the variance:
SD(X) = +√Var(X).

67 Remark:
• Setting g(X) = [X − E(X)]² in Theorem 2.15 (on slide 64) yields the following explicit formulas for discrete and continuous random variables:
Var(X) = E[g(X)] = Σ_{x_j ∈ supp(X)} [x_j − E(X)]² · P(X = x_j), if X is discrete,
Var(X) = E[g(X)] = ∫_{−∞}^{+∞} [x − E(X)]² · f_X(x) dx, if X is continuous

68 Example: (Discrete random variable)
• Consider again the experiment of tossing two dice with X representing the absolute difference of the two dice (see Example 1 on slide 60). The variance is given by
Var(X) = (0 − 70/36)² · 6/36 + (1 − 70/36)² · 10/36 + (2 − 70/36)² · 8/36 + (3 − 70/36)² · 6/36 + (4 − 70/36)² · 4/36 + (5 − 70/36)² · 2/36 = 2.05247

Notice:
• The variance is an expectation per definitionem −→ rules for expectations are applicable

69 Theorem 2.18: (Rules for variances)
Let X be an arbitrary random variable (discrete or continuous) and a, b ∈ R real constants; then
1. Var(X) = E(X²) − [E(X)]².
2. Var(a + b · X) = b² · Var(X).
Proof: Class

Next:
• Two important inequalities dealing with expectations and transformed random variables

70 Theorem 2.19: (Chebyshev inequality)
Let X be an arbitrary random variable and g : R −→ R_+ a nonnegative function. Then, for every k > 0 we have
P[g(X) ≥ k] ≤ E[g(X)] / k.

Special case:
• Consider g(x) = [x − E(X)]² and k = r² · Var(X) (r > 0)
• Theorem 2.19 implies
P{[X − E(X)]² ≥ r² · Var(X)} ≤ Var(X) / (r² · Var(X)) = 1/r²

71 • Now:
P{[X − E(X)]² ≥ r² · Var(X)} = P{|X − E(X)| ≥ r · SD(X)} = 1 − P{|X − E(X)| < r · SD(X)}
• It follows that
P{|X − E(X)| < r · SD(X)} ≥ 1 − 1/r²
(specific Chebyshev inequality)

72 Remarks:
• The specific Chebyshev inequality provides a minimal probability of the event that any arbitrary random variable X takes on a value from the following interval:
[E(X) − r · SD(X), E(X) + r · SD(X)]
• For example, for r = 3 we have
P{|X − E(X)| < 3 · SD(X)} ≥ 1 − 1/3² = 8/9
which is equivalent to
P{E(X) − 3 · SD(X) < X < E(X) + 3 · SD(X)} ≥ 0.8889
or
P{X ∈ (E(X) − 3 · SD(X), E(X) + 3 · SD(X))} ≥ 0.8889

73 Theorem 2.20: (Jensen inequality)
Let X be a random variable with mean E(X) and let g : R −→ R be a convex function, i.e. for all x we have g″(x) ≥ 0; then
E[g(X)] ≥ g(E[X]).

Remarks:
• If the function g is concave (i.e. if g″(x) ≤ 0 for all x) then Jensen's inequality states that E[g(X)] ≤ g(E[X])
• Notice that in general we have E[g(X)] ≠ g(E[X])

74 Example:
• Consider the random variable X and the function g(x) = x²
• We have g″(x) = 2 ≥ 0 for all x, i.e. g is convex
• It follows from Jensen's inequality that E[g(X)] = E(X²) ≥ g(E[X]) = [E(X)]², i.e.
E(X²) − [E(X)]² ≥ 0
• This implies Var(X) = E(X²) − [E(X)]² ≥ 0 (the variance of an arbitrary rv cannot be negative)

75 Now:
• Consider the random variable X with expectation E(X) = µ_X, the integer number n ∈ N and the functions
g_1(x) = x^n
g_2(x) = [x − µ_X]^n

Definition 2.21: (Moments, central moments)
(a) The n-th moment of X, denoted by µ′_n, is defined as
µ′_n ≡ E[g_1(X)] = E(X^n).
(b) The n-th central moment of X about µ_X, denoted by µ_n, is defined as
µ_n ≡ E[g_2(X)] = E[(X − µ_X)^n].
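The specific Chebyshev inequality can be checked exactly for the two-dice example (X = absolute difference of two dice). The sketch below (illustrative code) computes E(X), Var(X) and SD(X) and verifies the bound for several values of r:

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Exact distribution of X = absolute difference of two dice.
density = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x = abs(d1 - d2)
    density[x] = density.get(x, Fraction(0)) + Fraction(1, 36)

EX = sum(x * p for x, p in density.items())                # 70/36
var = sum((x - EX) ** 2 * p for x, p in density.items())   # ≈ 2.05247
sd = sqrt(var)

# Specific Chebyshev inequality: P{|X - E(X)| < r·SD(X)} >= 1 - 1/r².
for r in (1.5, 2, 3):
    lhs = sum(p for x, p in density.items() if abs(x - EX) < r * sd)
    assert lhs >= 1 - 1 / r**2
    print(r, float(lhs), 1 - 1 / r**2)
```

As the printout shows, the exact probabilities are well above the Chebyshev lower bounds; the inequality is deliberately conservative because it holds for every distribution with finite variance.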
76 Relations:
• µ′_1 = E(X) = µ_X (the 1st moment coincides with E(X))
• µ_1 = E[X − µ_X] = E(X) − µ_X = 0 (the 1st central moment is always equal to 0)
• µ_2 = E[(X − µ_X)²] = Var(X) (the 2nd central moment coincides with Var(X))

77 Remarks:
• The first four moments of a random variable X are important measures of the probability distribution (expectation, variance, skewness, kurtosis)
• The moments of a random variable X play an important role in theoretical and applied statistics
• In some cases, when all moments are known, the cdf of a random variable X can be determined

78 Question:
• Can we find a function that gives us a representation of all moments of a random variable X?

Definition 2.22: (Moment generating function)
Let X be a random variable with discrete density or pdf f_X(·). The expected value of e^{t·X} is defined to be the moment generating function of X if the expected value exists for every value of t in some interval −h < t < h, h > 0. That is, the moment generating function of X, denoted by m_X(t), is defined as
m_X(t) = E[e^{t·X}].

79 Remarks:
• The moment generating function m_X(t) is a function in t
• There are rv's X for which m_X(t) does not exist
• If m_X(t) exists it can be calculated as
m_X(t) = E[e^{t·X}] = Σ_{x_j ∈ supp(X)} e^{t·x_j} · P(X = x_j), if X is discrete,
m_X(t) = E[e^{t·X}] = ∫_{−∞}^{+∞} e^{t·x} · f_X(x) dx, if X is continuous

80 Question:
• Why is m_X(t) called the moment generating function?
Answer:
• Consider the nth derivative of m_X(t) with respect to t:
d^n m_X(t)/dt^n = Σ_{x_j ∈ supp(X)} (x_j)^n · e^{t·x_j} · P(X = x_j) for discrete X
d^n m_X(t)/dt^n = ∫_{−∞}^{+∞} x^n · e^{t·x} · f_X(x) dx for continuous X

81 • Now, evaluate the nth derivative at t = 0:
d^n m_X(0)/dt^n = Σ_{x_j ∈ supp(X)} (x_j)^n · P(X = x_j) for discrete X
d^n m_X(0)/dt^n = ∫_{−∞}^{+∞} x^n · f_X(x) dx for continuous X
= E(X^n) = µ′_n
(see Definition 2.21(a) on slide 76)

82 Example:
• Let X be a continuous random variable with pdf
f_X(x) = λ · e^{−λ·x} for x ≥ 0, and f_X(x) = 0 for x < 0
(exponential distribution with parameter λ > 0)
• We have
m_X(t) = E[e^{t·X}] = ∫_{−∞}^{+∞} e^{t·x} · f_X(x) dx = ∫_{0}^{+∞} λ · e^{(t−λ)·x} dx = λ/(λ − t) for t < λ

83 • It follows that
m′_X(t) = λ/(λ − t)² and m″_X(t) = 2λ/(λ − t)³
and thus
m′_X(0) = E(X) = 1/λ and m″_X(0) = E(X²) = 2/λ²

Now:
• Important result on moment generating functions

84 Theorem 2.23: (Identification property)
Let X and Y be two random variables with densities f_X(·) and f_Y(·), respectively. Suppose that m_X(t) and m_Y(t) both exist and that m_X(t) = m_Y(t) for all t in the interval −h < t < h for some h > 0. Then the two cdf's F_X(·) and F_Y(·) are equal; that is, F_X(x) = F_Y(x) for all x.
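The exponential-distribution example can be verified numerically: approximate the defining integral E[e^{tX}] by a midpoint Riemann sum and compare with the closed form λ/(λ − t); the first moment follows from a finite difference of the mgf at t = 0. This is an illustrative sketch (the integration routine, truncation point and step sizes are ad-hoc choices, not from the slides):

```python
from math import exp

lam = 2.0   # parameter λ of the exponential distribution (assumed value)

def mgf_numeric(t, upper=50.0, n=200_000):
    """Approximate m_X(t) = ∫_0^∞ e^{tx} · λ e^{-λx} dx (midpoint rule).

    The integrand decays like e^{(t-λ)x}, so truncating at `upper` is
    harmless for t well below λ.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += exp(t * x) * lam * exp(-lam * x) * h
    return total

print(round(mgf_numeric(0.5), 4))   # ≈ λ/(λ-t) = 2/1.5 ≈ 1.3333

# E(X) = m'_X(0) via a central difference of the exact mgf λ/(λ-t):
eps = 1e-6
m = lambda t: lam / (lam - t)
print(round((m(eps) - m(-eps)) / (2 * eps), 4))   # ≈ 1/λ = 0.5
```

Both printed values agree with the closed-form results m_X(t) = λ/(λ − t) and m′_X(0) = 1/λ up to the discretization error.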
Remarks:
• Theorem 2.23 states that there is a unique cdf F_X(x) for a given moment generating function m_X(t) −→ if we can find m_X(t) for X then, at least theoretically, we can find the distribution of X
• We will make use of this property in Section 4

85 Example:
• Suppose that a random variable X has the moment generating function
m_X(t) = 1/(1 − t) for −1 < t < 1
• Then the pdf of X is given by
f_X(x) = e^{−x} for x ≥ 0, and f_X(x) = 0 for x < 0
(exponential distribution with parameter λ = 1)

86 2.4 Special Parametric Families of Univariate Distributions

Up to now:
• General mathematical properties of arbitrary distributions
• Discrimination: discrete vs continuous distributions
• Consideration of the cdf F_X(x), the discrete density or the pdf f_X(x), expectations of the form E[g(X)], the moment generating function m_X(t)

87 Central result:
• The distribution of a random variable X is (essentially) determined by f_X(x) or F_X(x)
• F_X(x) can be determined from f_X(x) (cf. slide 46)
• f_X(x) can be determined from F_X(x) (cf. slide 48)

Question:
• How many different distributions are known to exist?

88 Answer:
• Infinitely many

But:
• In practice, there are some important parametric families of distributions that provide 'good' models for representing real-world random phenomena
• These families of distributions are described in detail in all textbooks on mathematical statistics (see e.g. Mosler & Schmid (2008), Mood et al. (1974))

89 • Important families of discrete distributions:
Bernoulli distribution
Binomial distribution
Geometric distribution
Poisson distribution
• Important families of continuous distributions:
Uniform or rectangular distribution
Exponential distribution
Normal distribution

90 Remark:
• The most important family of distributions of all is the normal distribution

Definition 2.24: (Normal distribution)
A continuous random variable X is defined to be normally distributed with parameters µ ∈ R and σ² > 0, denoted by X ∼ N(µ, σ²), if its pdf is given by
f_X(x) = 1/(√(2π) · σ) · e^{−(1/2)·((x − µ)/σ)²}, x ∈ R.

91 [Figure: pdf's of the normal distribution — the curves N(0,1), N(5,1), N(5,3) and N(5,5) over x]

92 Remarks:
• The special normal distribution N(0, 1) is called the standard normal distribution, the pdf of which is denoted by ϕ(x)
• The properties as well as calculation rules for normally distributed random variables are important preconditions for this course (see Wilfling (2011), Section 3.4)

93
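As a closing sketch (illustrative code; the integration routine and the truncation interval [−8, 8] are ad-hoc choices), the pdf of Definition 2.24 can be implemented directly and checked numerically against pdf property 2 from slide 47, namely that the total area under the density is one:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf of N(µ, σ²) as in Definition 2.24."""
    return 1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma) ** 2)

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# The tail mass of the standard normal beyond ±8 is negligible,
# so the area over [-8, 8] is 1 up to numerical error.
print(round(integrate(normal_pdf, -8, 8), 6))   # ≈ 1.0
print(round(normal_pdf(0.0), 4))                # ϕ(0) = 1/√(2π) ≈ 0.3989
```

The same check works for any µ and σ by widening the integration interval to cover µ ± 8σ.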