Random experiment

A random experiment is a process leading to an uncertain outcome, not known before the experiment is run. We usually assume that the experiment can be repeated indefinitely under essentially the same conditions. A basic outcome is a possible outcome of a random experiment.

The structure of a random experiment is characterized by three objects:
• the sample space S;
• the events set;
• the probability measure.

Examples of random experiment (a)

"BASIC" EXPERIMENT [Figure: a box containing five tickets, three labeled "0" and two labeled "1"]:
1. mix the tickets in the box;
2. randomly select ONE ticket;
3. read the value on the ticket.
OUTCOME: the number on the ticket.

Examples of random experiment (b)

EXPERIMENT 1
1. Run the basic experiment;
2. do NOT reinsert the ticket in the box;
3. run the basic experiment again.
OUTCOME: ordered pair of numbers.

Examples of random experiment (c)

EXPERIMENT 2
1. Run the basic experiment;
2. reinsert the ticket in the box;
3. run the basic experiment again.
OUTCOME: ordered pair of numbers.

The sample space

The sample space S is the collection of all possible outcomes of a random experiment.
• EXP. B: S = {0, 1};
• EXP. 1: S = {(0, 0), (0, 1), (1, 0), (1, 1)};
• EXP. 2: S = {(0, 0), (0, 1), (1, 0), (1, 1)}.

The sample space: further examples

1. The tickets are drawn, with replacement, until a ticket with the number "1" is extracted.
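The sample spaces of the two experiments can be enumerated directly. A minimal Python sketch, where the labels 0a, 0b, 0c, 1d, 1e are just an illustrative device to keep the five tickets distinguishable:

```python
from itertools import permutations, product

# Tickets in the box of the basic experiment: three "0"s and two "1"s.
tickets = ["0a", "0b", "0c", "1d", "1e"]

# Experiment 1: two draws WITHOUT replacement -> ordered pairs of distinct tickets.
exp1_outcomes = list(permutations(tickets, 2))

# Experiment 2: two draws WITH replacement -> ordered pairs, repeats allowed.
exp2_outcomes = list(product(tickets, repeat=2))

# Reading only the digit on each ticket gives the sample space S of each experiment.
S_exp1 = {(a[0], b[0]) for a, b in exp1_outcomes}  # 4 distinct pairs of digits
S_exp2 = {(a[0], b[0]) for a, b in exp2_outcomes}  # the same 4 pairs

n_exp1 = len(exp1_outcomes)  # 20 equally likely labeled outcomes
n_exp2 = len(exp2_outcomes)  # 25 equally likely labeled outcomes
```

Both experiments share the same sample space S = {(0,0), (0,1), (1,0), (1,1)}, even though the underlying equally likely labeled outcomes differ (20 versus 25).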
In this case the sample space S = {1, 01, 001, 0001, . . .} is made up of a countably infinite number of outcomes.

2. Give a push to the hand of the wheel and record the number it points to. [Figure: a wheel with a hand pointing to values in [0; 1), with marks at 0, 1/4, 1/2 and 3/4.] In this case the sample space is S = [0; 1), which is uncountably infinite.

Event of a random experiment

An event is a set of outcomes, that is, a subset of the sample space, to which a probability is assigned.
• Sometimes an event is described by means of a proposition; however, it is always possible to represent it formally by a set of outcomes;
• we will denote an event by means of a capital letter, for instance E, and it holds that E ⊆ S;
• the tool used to deal with and describe the relationships between events is set theory;
• an event occurs if the random experiment results in one of its constituent basic outcomes.

Examples of event (for the experiments 1 and 2)

A = "Two tickets with the same value are extracted" = {(1, 1), (0, 0)}.
B = "A ticket with the number 1 is obtained in the first extraction" = {(1, 1), (1, 0)}.
C = "The product of the numbers on the extracted tickets is 0" = {(0, 1), (1, 0), (0, 0)}.
D = "A ticket with the number 2 is obtained in the first extraction" = ∅.
E = "A ticket with a number smaller than 2 is obtained in the first extraction" = {(1, 1), (1, 0), (0, 1), (0, 0)} = S.

The events set

• The basic outcomes of an experiment are singleton sets and are also known as elementary events;
• the events set is the set of all possible events.

Union and intersection of events

Let A and B be two events in a sample space S.
• A ∪ B = "either A or B will occur", that is, A ∪ B is the set of all outcomes in S that belong to either A or B;
• A ∩ B = "both A and B will occur", that is, A ∩ B is the set of all outcomes in S that belong to both A and B.
For instance, if for the experiments 1 and 2 A = {(1, 1), (0, 0)} and B = {(1, 1), (1, 0)}, then A ∪ B = {(1, 1), (0, 0), (1, 0)} and A ∩ B = {(1, 1)}.

Three important events

Sure event: an event that always occurs, whatever the result of the experiment is. The sample space S is a sure event.
Impossible event: an event that never occurs, whatever the result of the experiment is. The empty set ∅ is an impossible event.
Complement: the complement of an event E, denoted by Ē, is the set of all basic outcomes in the sample space that do not belong to E. Ē occurs if and only if E does not occur. We can also write Ē = S \ E, where "\" is the set-difference operator.

Mutually exclusive and collectively exhaustive events

• A and B are mutually exclusive events if they have no basic outcomes in common, that is, A ∩ B = ∅;
• mutually exclusive events are also called disjoint events;
• let E1, E2, . . . , Ek be k events of the sample space S. If such events completely cover the sample space, formally E1 ∪ E2 ∪ · · · ∪ Ek = S, then they are called collectively exhaustive.

Assessing probability: basic experiment

Set of the events: {∅, {0}, {1}, {0, 1}}.
1. P(∅) = 0;
2. P({0}) = 3/5;
3. P({1}) = 2/5;
4. P({0, 1}) = P(S) = 1.
Note that P(S) = P({0, 1}) = P({0} ∪ {1}) = P({0}) + P({1}) = 3/5 + 2/5 = 1.

Assessing probability: experiment 1 (a)

• The sample space S = {(0, 0), (0, 1), (1, 0), (1, 1)} is not made up of equally likely outcomes;
• write the sample space in a different way so as to have equally likely outcomes: label the three "0" tickets 0a, 0b, 0c and the two "1" tickets 1d, 1e. The 20 possible ordered pairs are

(0a, 0b) (0a, 0c) (0a, 1d) (0a, 1e)
(0b, 0a) (0b, 0c) (0b, 1d) (0b, 1e)
(0c, 0a) (0c, 0b) (0c, 1d) (0c, 1e)
(1d, 0a) (1d, 0b) (1d, 0c) (1d, 1e)
(1e, 0a) (1e, 0b) (1e, 0c) (1e, 1d)

Assessing probability: experiment 1 (b)

Event of interest: {(0, 0), (1, 1)}. Of the 20 equally likely ordered pairs, 8 consist of two tickets with the same value (six pairs of "0" tickets and two pairs of "1" tickets), hence

P({(0, 0), (1, 1)}) = 8/20

Number of possible orderings

• The number of possible ways of arranging x objects in order is given by
x! = x × (x − 1) × (x − 2) × · · · × 2 × 1
• "x!" is read "x factorial";
• recall that 0! = 1.

Permutations

• The total number of permutations of x objects chosen from n, denoted by Pxn, is the number of possible arrangements when x objects are to be selected from a total of n and arranged in order:
Pxn = n × (n − 1) × (n − 2) × · · · × (n − x + 1)
• Note that Pxn = n! / (n − x)!.

Number of combinations

• The number of combinations, denoted by Cxn, of x objects chosen from n is the number of possible selections that can be made. This number is
Cxn = n! / (x! (n − x)!)
• Note that Cnn = C0n = 1;
• alternative notation: Cxn = (n x), the binomial coefficient, read "n choose x".

Assessing probability (1)

Probability is the chance that an uncertain event will occur.
Classical probability: provided that all outcomes in the sample space are equally likely to occur, the probability of an event is the ratio between the number of outcomes that satisfy the event and the total number of outcomes in the sample space.
Relative frequency probability: when an experiment is performed, for any event only one of two possibilities can happen: it occurs or it does not occur. The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.
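The counting rules above map directly onto Python's math module; a small sketch that also checks the 8/20 computation by brute-force enumeration (the ticket labels are the same illustrative device as before):

```python
import math
from itertools import permutations

# Factorials, permutations and combinations.
n_fact = math.factorial(4)   # 4! = 24
zero_fact = math.factorial(0)  # 0! = 1 by convention
perms = math.perm(5, 2)      # P(5,2) = 5 * 4 = 20 ordered pairs
combs = math.comb(5, 2)      # C(5,2) = 5!/(2! 3!) = 10 unordered pairs

# Classical probability for experiment 1: both tickets carry the same value.
tickets = ["0a", "0b", "0c", "1d", "1e"]  # three "0"s, two "1"s
same_value = [p for p in permutations(tickets, 2) if p[0][0] == p[1][0]]
prob = len(same_value) / perms  # 8 favourable pairs out of 20 -> 0.4
```

`math.perm` and `math.comb` require Python 3.8 or later.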
More formally, frequentists see probability as the long-run expected frequency of occurrence.

Assessing probability (2)

Subjective probability: a probability derived from an individual's personal judgment about whether a specific outcome is likely to occur. Subjective probabilities involve no formal calculations and only reflect the subject's opinions and past experience.
BETTING APPROACH: find a specific amount to win or lose such that the decision maker is indifferent about which side of the bet to take.

Probability postulates

Probability is a function defined on the set of the events that associates to every event A a real number P(A) satisfying the following conditions:
1. P(A) ≥ 0;
2. P(S) = 1;
3. if A and B are disjoint (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).

Probability rules

1. For every event A it holds that P(Ā) = 1 − P(A); this is called the complement rule.
2. P(∅) = 0.
3. For every event A it holds that 0 ≤ P(A) ≤ 1.
4. If A = A1 ∪ A2 ∪ · · · ∪ Ak with Ai ∩ Aj = ∅ for every i ≠ j, then P(A) = P(A1) + P(A2) + · · · + P(Ak).
5. For every pair of events A and B it holds that P(A ∪ B) = P(A) + P(B) − P(A ∩ B); this is called the addition rule.

Conditional probability

For the experiment 1 consider the following events:
A1 = "the result of the FIRST extraction is 0"
A2 = "the result of the SECOND extraction is 0"
A2|A1 = "the result of the SECOND extraction is 0 GIVEN that 0 is obtained in the FIRST extraction"
A1|A2 = "the result of the FIRST extraction is 0 GIVEN that 0 is obtained in the SECOND extraction"
P(A1) = 3/5, P(A2) = 3/5, P(A2|A1) = ?, P(A1|A2) = ?

Compute P(A2|A1)

Restricting the 20 equally likely ordered pairs to those in A1 (first ticket "0"),

P(A2|A1) = (# outcomes in A1 and A2) / (# outcomes in A1) = 6/12 = 1/2
         = [(# outcomes in A1 and A2) / (# outcomes in S)] / [(# outcomes in A1) / (# outcomes in S)]
         = P(A2 ∩ A1) / P(A1)

Computing P(A1|A2)

P(A1|A2) = (# outcomes in A1 and A2) / (# outcomes in A2) = 6/12 = 1/2
         = [(# outcomes in A1 and A2) / (# outcomes in S)] / [(# outcomes in A2) / (# outcomes in S)]
         = P(A1 ∩ A2) / P(A2)

Multiplication rule

For every pair of events A and B the probability of A given B can be computed as

P(A|B) = P(A ∩ B) / P(B)

so that

P(A ∩ B) = P(A|B) × P(B)

or, equivalently,

P(A ∩ B) = P(B|A) × P(A)

Independence

Two events A and B are said to be independent if P(A|B) = P(A) or, equivalently, P(B|A) = P(B).
If two events A and B are independent, then the multiplication rule simplifies to
P(A ∩ B) = P(A) × P(B)
The reverse implication also holds true, that is, the factorization P(A ∩ B) = P(A) × P(B) is a sufficient condition for A and B to be independent.

Example with the experiment 2

With replacement there are 25 equally likely ordered pairs, from (0a, 0a) to (1e, 1e). In this case it holds that
P(A1) = 3/5, P(A2) = 3/5, P(A2|A1) = 3/5, P(A1|A2) = 3/5
so A1 and A2 are independent.

Law of total probability

If the events A1, A2, . . . , Ak form a partition of the sample space, so that
1. S = A1 ∪ A2 ∪ · · · ∪ Ak (collectively exhaustive);
2. Ai ∩ Aj = ∅ for every i ≠ j (mutually exclusive);
then for every event B it holds that
P(B) = P(B ∩ A1) + P(B ∩ A2) + · · · + P(B ∩ Ak)
     = P(B|A1)P(A1) + P(B|A2)P(A2) + · · · + P(B|Ak)P(Ak)

Bayes' theorem

Bayes' formula provides an alternative way to compute conditional probabilities.
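The conditional probabilities of the two experiments can be checked by enumerating the equally likely labeled pairs; a sketch, reusing the illustrative ticket labels 0a–1e:

```python
from itertools import permutations, product

tickets = ["0a", "0b", "0c", "1d", "1e"]

def cond_prob(outcomes):
    # A1 = first draw is "0", A2 = second draw is "0"; outcomes equally likely.
    in_A1 = [o for o in outcomes if o[0][0] == "0"]
    in_A1_and_A2 = [o for o in in_A1 if o[1][0] == "0"]
    # P(A2|A1) = (# outcomes in A1 and A2) / (# outcomes in A1)
    return len(in_A1_and_A2) / len(in_A1)

p_exp1 = cond_prob(list(permutations(tickets, 2)))   # without replacement -> 1/2
p_exp2 = cond_prob(list(product(tickets, repeat=2)))  # with replacement -> 3/5
```

For experiment 2 the conditional probability equals the unconditional one, which is exactly the independence of the two draws.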
For every pair of events A and B it holds that

P(A|B) = P(B|A) P(A) / P(B)

Typically the denominator can be computed by applying the law of total probability:

P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)

Which experiment?

• One of my friends carries out either the experiment 1 or the experiment 2;
• it is unknown which experiment has been carried out: P(E1) = P(E2) = 1/2;
• the result of the experiment is {(0, 0)};
• QUESTION: which experiment is most likely to have been executed?
• SOLUTION: it is necessary to compute P(E1|{(0, 0)}) and P(E2|{(0, 0)}).

The envelopes riddle

• Suppose you're on a game show, and you're given the choice of three labeled envelopes: A, B and C;
• two envelopes are empty and one contains 1000 euro. The host knows where the money is.
• You choose one of the three envelopes, say A;
• the host opens one of the remaining envelopes, say C, and shows that it is empty;
• now you are allowed to switch your envelope with the host, that is, take B and hand A to the host;
• QUESTION: is it better for you to switch, or better not to switch?

The rare diseases problem (1)

The accuracy of a medical diagnostic test, in which a positive result indicates the presence of a disease, is often stated in terms of its sensitivity, the proportion of diseased people who test positive, and its specificity, the proportion of people without the disease who test negative.
• D = "a person has the disease";
• + = "a person's test result is POSITIVE";
• − = "a person's test result is NEGATIVE";
• SENSITIVITY: probability that the test result is positive for a person who has the disease, P(+|D);
• SPECIFICITY: probability that the test result is negative for a person who does not have the disease, P(−|D̄).

The rare diseases problem (2)

• For instance
– P(D) = 1/1000
– P(+|D) = 0.99
– P(−|D̄) = 0.98
• QUESTION: a person's test result is positive. What is the probability that the person actually has the disease, P(D|+) = ?

Random variables

Aim: define tools that make it possible
1. to deal more easily and effectively with random experiments;
2. to develop a general theory that can be applied to all the random experiments that share a common probabilistic structure (even though apparently distinct from each other).
Roughly speaking, a random variable is a numerical description of the outcome of an experiment.

Example of random variable: a gambling game

• 3 draws with replacement;
• receive one euro for every "1" extracted;
• pay one euro for every "0" extracted.

outcome                               gain
(0, 0, 0)                             −3
(1, 0, 0), (0, 1, 0), (0, 0, 1)       −1
(1, 1, 0), (0, 1, 1), (1, 0, 1)        1
(1, 1, 1)                              3

Definition of random variable

DEFINITION: a random variable is a function from the sample space to the real line, i.e. a function that maps every element of the sample space onto a single real number: X : S → IR.
• The value taken by a random variable depends on the outcome of the experiment, and it is not known before the experiment is performed;
• it is important to distinguish between a random variable and the possible values that it can take. Capital letters, such as X, are used to denote random variables; the corresponding lowercase letter, x, denotes a possible value;
• in the example of the gambling game, if X is the random variable corresponding to the gain, then X((0, 0, 0)) = −3, X((0, 1, 1)) = 1, etc.
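The question of the rare diseases problem can be answered by combining Bayes' theorem with the law of total probability; a small sketch of the computation:

```python
# Rare diseases problem via Bayes' theorem and the law of total probability.
p_D = 1 / 1000        # prevalence P(D)
sensitivity = 0.99    # P(+|D)
specificity = 0.98    # P(-|not D), so P(+|not D) = 1 - specificity

# Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = sensitivity * p_D + (1 - specificity) * (1 - p_D)

# Bayes' theorem: P(D|+) = P(+|D)P(D) / P(+)
p_D_given_pos = sensitivity * p_D / p_pos  # ≈ 0.0472
```

Despite the test's high sensitivity and specificity, a positive result still leaves less than a 5% probability of disease, because the disease is so rare.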
Discrete vs continuous random variables

• A random variable is said to be CONTINUOUS if it can take on any numerical value in an interval or collection of intervals;
• a random variable is said to be DISCRETE if it can take on either a finite number of values or a countable number of values;
• every value of a discrete random variable can be associated with a probability value.

outcome                               gain    probability
(0, 0, 0)                             −3      1/8
(1, 0, 0), (0, 1, 0), (0, 0, 1)       −1      3/8
(1, 1, 0), (0, 1, 1), (1, 0, 1)        1      3/8
(1, 1, 1)                              3      1/8

Probability distribution

• A probability distribution is a function that describes the probability of a random variable taking certain values. For the gambling game:

values    −3     −1     1      3
prob.     1/8    3/8    3/8    1/8

[Figure: bar plot of the distribution of the gain.]

• In general:

values of X    x1           x2           x3           . . .
P(X = x)       P(X = x1)    P(X = x2)    P(X = x3)    . . .

Characterization

A discrete random variable X is characterized by
• its support, denoted by SX, defined as the set of all possible values which the random variable can take on;
• its probability mass function or, shortly, its probability function.

The probability mass function (pmf)

DEFINITION: the probability mass function of a discrete random variable X is a function defined on SX that gives the probability that X is exactly equal to x:

p(x) = P(X = x)    for every x ∈ SX

Properties of the probability mass function:
1. p(x) ≥ 0;
2. Σ_{x ∈ SX} p(x) = 1.
Any function defined on SX that fulfills the two properties above is a probability mass function for X.

The (cumulative) distribution function (cdf)

DEFINITION: for every real value x ∈ IR the cumulative distribution function of X is defined as

F(x) = P(X ≤ x) = Σ_{y ∈ SX : y ≤ x} p(y)

Properties of the distribution function:
1. F(x) is (not necessarily strictly) non-decreasing;
2. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1;
3. F(x) is right-continuous, that is, lim_{x→x0+} F(x) = F(x0).
Every function that satisfies the three properties above is a distribution function.
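For the gambling game the pmf and cdf can be built by enumerating the eight outcomes; a sketch, assuming (as the 1/8–3/8 table does) that each draw yields 0 or 1 with probability 1/2:

```python
from itertools import product
from fractions import Fraction

# Gain of the gambling game: +1 euro per "1", -1 euro per "0", over 3 fair draws.
p_draw = Fraction(1, 2)
pmf = {}
for outcome in product([0, 1], repeat=3):
    gain = sum(1 if t == 1 else -1 for t in outcome)
    pmf[gain] = pmf.get(gain, Fraction(0)) + p_draw**3  # each outcome has prob. 1/8

def cdf(x):
    # F(x) = P(X <= x): sum the pmf over support values not exceeding x.
    return sum(p for v, p in pmf.items() if v <= x)
```

Exact fractions reproduce the table: p(−3) = 1/8, p(−1) = p(1) = 3/8, p(3) = 1/8, and e.g. F(0) = 1/8 + 3/8 = 1/2.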
Example of probability distribution function

[Figure: graph of the distribution function for the gambling game example.] The distribution function is discontinuous at the points −3, −1, 1, 3 and constant in between. At the discontinuity points it takes the values 1/8, 4/8, 7/8 and 1.

Expected value of a discrete random variable

The expected value (or mean) of a discrete random variable X is the number

E(X) = µX = Σ_{x ∈ SX} x p(x)

• The expected value is a measure of central tendency of the probability distribution;
• note the similarity with the mean of a population;
• the expected value can be thought of as the arithmetic mean of an infinite number of realizations of the random variable;
• for the gambling game example
E(X) = −3 × 1/8 − 1 × 3/8 + 1 × 3/8 + 3 × 1/8 = 0

The variance of a discrete random variable

The variance of a discrete random variable X is the number

Var(X) = σX² = E{[X − E(X)]²} = Σ_{x ∈ SX} (x − µ)² p(x)

• The variance is a measure of dispersion (around the expected value) of the probability distribution;
• similarly to the result shown for the variance of a population, it holds that
Var(X) = E(X²) − E(X)² = Σ_{x ∈ SX} x² p(x) − E(X)²
• in the gambling game example
E(X²) = 9 × 1/8 + 1 × 3/8 + 1 × 3/8 + 9 × 1/8 = 3
• hence the variance is Var(X) = 3 − 0² = 3.

The standard deviation of a random variable

The variance is not expressed in the same units as the random variable, but in squared units. Therefore, it is necessary to transform its value by computing the square root, obtaining the standard deviation of the variable:

SD(X) = √Var(X)

• In the gambling game example the standard deviation is SD(X) = √3 = 1.73.

The discrete uniform distribution

The discrete uniform random variable X takes on a finite number of values x1, . . . , xK with constant probabilities all equal to 1/K, that is

values    x1     x2     · · ·    xK
prob.     1/K    1/K    · · ·    1/K

hence its probability mass function can be written as

p(xi) = 1/K    for i = 1, . . . , K

whereas its cumulative distribution function is

F(x) = (number of xi ≤ x) / K    for x ∈ IR

[Figure: probability mass function and distribution function of a discrete uniform distribution.]

Example of discrete uniform distribution (1)

The random variable X relative to the roll of one die has discrete uniform distribution with values x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 5, x6 = 6 and p(xi) = 1/6.

Example of discrete uniform distribution (2)

The expected value of the random variable X relative to the roll of a die is
E(X) = (1/6) Σ_{i=1}^{6} i = 3.5
Furthermore,
E(X²) = (1/6) Σ_{i=1}^{6} i² = 15.17
Var(X) = 15.17 − 3.5² = 2.92
SD(X) = √2.92 = 1.71

Bernoulli distribution (1)

Experiment:
• box with r + s tickets: r with "1" and s with "0";
• the proportion of "1"s in the box is π = r / (r + s);
• extract ONE ticket.
Random variable: Y = number on the extracted ticket.

Bernoulli distribution (2)

• The support of Y is {0, 1} and its probability function is
p(y) = π^y (1 − π)^(1−y), that is, p(y) = π for y = 1 and p(y) = 1 − π for y = 0;
• the expected value of Y is E(Y) = π and the variance is Var(Y) = π(1 − π);
• [Figure: bar plots of the probability distribution for π = 0.2, 0.5 and 0.8.]

Binomial distribution (1)

Experiment:
• box with r + s tickets: r with "1" and s with "0";
• the proportion of tickets with "1" is π = r / (r + s);
• n tickets are extracted with replacement.
Random variable X = (two equivalent definitions)
1. sum of the values on the extracted tickets;
2. number of tickets with "1".

Binomial distribution (2)

• The support of X is {0, 1, . . . , n} and its probability function is

p(x) = Cxn π^x (1 − π)^(n−x)    for x = 0, 1, . . . , n

• the expected value of X is E(X) = nπ and the variance is Var(X) = nπ(1 − π);
• [Figure: bar plots of the probability distribution for π = 0.2, 0.5 and 0.8, with n = 10.]

Binomial distribution (3)

General formulation:
• random experiment with two possible outcomes coded as SUCCESS and FAILURE: P(SUCCESS) = π;
• repetition of the experiment n times
– under the same conditions;
– independently;
• X = exact number of successes in the n trials;
then X follows a binomial distribution with parameters n and π.

Discrete random variable: notation

• If the distribution of X is discrete uniform, then we write X ∼ Ud{x1, . . . , xK};
• if the distribution of Y is Bernoulli with parameter π, then we write Y ∼ Bernoulli(π) or, more compactly, Y ∼ Be(π);
• if the distribution of X is binomial with parameters n and π, then we write X ∼ Binomial(n, π) or, more compactly, X ∼ Bin(n, π).

Continuous random variables

• Experiment: give a push to the hand of the wheel. [Figure: a wheel with marks at 0, 1/4, 1/2 and 3/4.]
• X = value pointed by the hand when it stops.
Examples of events:
• X < 0.5 −→ X ∈ [0; 0.5)
• 0.4 < X ≤ 0.7 −→ X ∈ (0.4; 0.7]
• X = 0.5 −→ X ∈ {0.5}
• X ≠ 0.5 −→ X ∈ [0; 0.5) ∪ (0.5; 1)

Assessing the probability of events

If all the points are equally likely (same probability of being pointed):
• P(X < 0.5) = 0.5
• P(0.4 < X ≤ 0.7) = 0.7 − 0.4 = 0.3
• P(X = 0.5) = ? Consider the interval [0.5 − ε/2; 0.5 + ε/2], where ε > 0 is a small number; then
P(0.5 − ε/2 ≤ X ≤ 0.5 + ε/2) = ε
For ε → 0 one obtains P(X = 0.5) = 0, and furthermore P(X ≠ 0.5) = 1 − P(X = 0.5) = 1.

Some comments

• Even though we know for sure that X will take on some real number, the probability that it takes any fixed real number is equal to zero;
• as a consequence, it is not possible to describe the probabilistic structure of a continuous random variable by means of a probability mass function;
• OBJECTIVE: identify an effective way to describe the probabilistic structure of a continuous random variable.

The cumulative distribution function

• An event of a continuous random variable can always be represented by an interval or by the union of disjoint intervals;
• the probability of any event can be computed from the probability of the events corresponding to the following family of intervals:
(−∞; x]    for x ∈ IR
• or, equivalently, from the cumulative distribution function of X:
F(x) = P(X ≤ x)    for x ∈ IR

Example of cumulative distribution function

For the experiment of the wheel, for x ∈ [0; 1), it holds that
F(x) = P(X ≤ x) = P(X ∈ (0; x]) = x
so that
F(x) = 0 for x < 0;  F(x) = x for 0 ≤ x < 1;  F(x) = 1 for x ≥ 1.
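The wheel cdf just derived can be coded directly, and it reproduces the interval and single-point probabilities discussed above; a minimal sketch:

```python
def F(x):
    # cdf of the wheel experiment: X uniform on [0, 1)
    if x < 0:
        return 0.0
    if x < 1:
        return x
    return 1.0

# Interval probability: P(0.4 < X <= 0.7) = F(0.7) - F(0.4) = 0.3
p_interval = F(0.7) - F(0.4)

# Single point: P(0.5 - eps/2 <= X <= 0.5 + eps/2) shrinks to 0 with eps
eps = 1e-9
p_point = F(0.5 + eps / 2) - F(0.5 - eps / 2)
```

Shrinking `eps` toward zero makes `p_point` vanish, mirroring the argument that P(X = 0.5) = 0.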
[Figure: graph of the wheel cdf.] In the continuous case, the cumulative distribution function is characterized by the same three properties as in the discrete case.

The probability density function (pdf)

The probability density function of a continuous random variable X is defined as

f(x) = dF(x)/dx

and, consequently,
1. f(x) ≥ 0 for every x ∈ IR;
2. ∫_{−∞}^{+∞} f(x) dx = 1.
Note that it is NOT required that f(x) ≤ 1, because the values of a probability density function are not probabilities.

Probability of an interval

How can P(a < X ≤ b) be computed?
F(b) = P(X ≤ b)
     = P(X ∈ (−∞; a] ∪ (a; b])
     = P(X ∈ (−∞; a]) + P(X ∈ (a; b])
     = P(X ≤ a) + P(a < X ≤ b)
     = F(a) + P(a < X ≤ b)
so that P(a < X ≤ b) = F(b) − F(a). By the fundamental theorem of integral calculus it holds that

∫_a^b f(x) dx = F(b) − F(a) = P(a < X ≤ b)

Example of probability density function

In the experiment of the wheel
f(x) = 1 for x ∈ [0; 1)
and zero otherwise. [Figure: graph of the density.] Then
P(0.3 ≤ X ≤ 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 < X ≤ 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 ≤ X < 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 < X < 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4

Interpretation of the density function

• The values of the density function are not probabilities:
f(a) ≠ P(X = a), and note that P(X = a) = ∫_a^a f(x) dx = 0;
• however, for a small ε > 0 it holds that
P(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a)
and therefore the probability that the outcome of the experiment is a value "close to" a point with higher density is larger than the corresponding probability for a point with lower density.

From the density function to the cumulative distribution function

The cumulative distribution function can be computed from the density function as follows:

F(x) = ∫_{−∞}^{x} f(t) dt    for x ∈ IR

Expected value and variance of a continuous random variable

• The expected value (or mean) of a continuous random variable X is the number
E(X) = µX = ∫_{−∞}^{+∞} x f(x) dx
• the variance is
Var(X) = σX² = E{[X − E(X)]²} = ∫_{−∞}^{+∞} (x − µ)² f(x) dx
• the standard deviation is SD(X) = σX = √Var(X).

The continuous uniform distribution

For an interval [a; b], the continuous uniform distribution has density function
f(x) = 1 / (b − a) for x ∈ [a; b]
and zero outside the interval.
• E(X) = (b + a)/2 and Var(X) = (b − a)²/12;
• we write X ∼ U(a; b).

The exponential distribution

A random variable X has exponential distribution with parameter λ if its support is the interval [0; ∞) and it has
• probability density function f(x) = λ e^(−λx);
• cumulative distribution function F(x) = 1 − e^(−λx);
• we write X ∼ Exp(λ).
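The relation P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx can be checked numerically for the exponential distribution; a sketch using a simple midpoint rule (λ = 2 is an arbitrary illustrative choice):

```python
import math

lam = 2.0  # rate parameter, chosen only for illustration

def f(x):
    # exponential density f(x) = lambda * exp(-lambda * x) on [0, infinity)
    return lam * math.exp(-lam * x)

def integrate(g, a, b, n=100_000):
    # midpoint rule: approximates the area under g between a and b
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.5, 1.5
exact = (1 - math.exp(-lam * b)) - (1 - math.exp(-lam * a))  # F(b) - F(a)
approx = integrate(f, a, b)  # should agree with the cdf difference
```

Integrating the density over (approximately) its whole support also recovers total probability 1, property 2 of a pdf.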
Exp(1)

For λ = 1 the density function of the exponential distribution is f(x) = e^(−x). [Figure: graph of the density.]

E(X) = ∫_0^∞ x e^(−x) dx = 1
Var(X) = ∫_0^∞ x² e^(−x) dx − E(X)² = 1

The "memoryless" property

• Let X be the random variable associated with the arrival time of a given process. For instance,
1. the time it takes before your next telephone call;
2. the time until default (on payment to company debt holders) in reduced-form credit risk modeling;
3. the time until a radioactive particle decays, or the time between clicks of a Geiger counter.
• The memoryless property means that "the future is independent of the past", i.e. the fact that an event hasn't happened yet tells us nothing about how much longer it will take before it does happen. It says that the conditional probability that we need to wait, for example, more than another 10 seconds before the first arrival, given that the first arrival has not yet happened after 30 seconds, is equal to the initial probability that we need to wait more than 10 seconds for the first arrival.

Mathematical formulation of the memoryless property

PROBLEM: we want to characterize the probability distribution of a random variable X describing the arrival time of a memoryless process. In mathematical terms:

P(X > x + y | X > x) = P(X > y)    for every x, y > 0

That is:

P(X > x + y) = P(X > x + y | X > x) P(X > x) = P(X > y) P(X > x)

PROBLEM: P(X > x) must therefore be a function G(·) such that

G(x + y) = G(x) G(y)    for every x, y > 0

SOLUTION: G(x) = e^(Cx), which is a probability for C < 0. Hence, writing C = −λ with λ > 0, we have

P(X > x) = e^(−λx)

and

F(x) = 1 − P(X > x) = 1 − e^(−λx)

so X ∼ Exp(λ).

Transformations of random variables

• X random variable (discrete or continuous);
• Y = g(X) function of X, for instance
1. Y = 5X + 3 (linear transformation);
2. Y = X² + 1 (non-linear transformation);
• the expected value of Y is E(Y) = E[g(X)] and, in general, E[g(X)] ≠ g(E[X]), but it holds that
E(Y) = Σ_{x ∈ SX} g(x) p(x) and E(Y) = ∫_{−∞}^{+∞} g(x) f(x) dx
for the discrete and continuous case respectively.

Linear transformations

If Y is a linear transformation of X, that is, Y = aX + b, then
1. E(Y) = E(aX + b) = a E(X) + b;
2. Var(Y) = Var(aX + b) = a² Var(X).

Example of linear transformation

For X ∼ Exp(1) let Y = X/λ. Then
FY(y) = P(Y ≤ y) = P(X/λ ≤ y) = P(X ≤ λy) = FX(λy) = 1 − e^(−λy)
RESULT: Y ∼ Exp(λ).
APPLICATION: E(Y) = 1/λ and Var(Y) = 1/λ².

The standardization

A specially relevant linear transformation, called the STANDARDIZATION of a random variable X, is given by

Z = (X − E(X)) / SD(X)

It is easy to see that
1. E(Z) = 0;
2. Var(Z) = SD(Z) = 1.

Introduction to the normal (or Gaussian) distribution

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this:
1. the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice;
2. the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally;
3. the normal distribution is very tractable analytically, that is, a large number of results involving this distribution can be derived in explicit form.

The STANDARD normal distribution

A random variable Z has standard normal distribution if its density function is

f(z) = (1/√(2π)) exp(−z²/2)

[Figure: graph of the standard normal density.]
Features of the standard normal distribution:
1. E(Z) = 0 and Var(Z) = SD(Z) = 1, and we write Z ∼ N(0, 1);
2. it is symmetric around the mean and unimodal;
3. its pdf is strictly positive for every z ∈ IR, but the area under the curve outside the interval (−4; 4) is close to zero.

Probability of some relevant intervals

For the standard normal distribution, the area between −1 and 1 is 68.26%.

The normal distribution

Strictly speaking, it is not correct to talk about "the normal distribution", since there are many normal distributions: normal distributions can differ in their means and in their standard deviations.
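The interval areas quoted for the standard normal can be reproduced from its cdf, which is expressible through the error function; a short sketch:

```python
import math

def Phi(z):
    # standard normal cdf written in terms of the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

area_1 = Phi(1) - Phi(-1)          # ≈ 0.6827
area_196 = Phi(1.96) - Phi(-1.96)  # ≈ 0.9500
area_3 = Phi(3) - Phi(-3)          # ≈ 0.9974

# Standardization at work: for X ~ N(mu, sigma^2),
# P(mu - sigma < X < mu + sigma) = Phi(1) - Phi(-1), whatever mu and sigma are.
```

Because standardization is a linear transformation, the same areas hold for any normal distribution once the interval is measured in standard deviations from the mean.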
The probability density function of a random variable X with (arbitrary) normal distribution is

f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))

• E(X) = µ and Var(X) = σ²;
• we write X ∼ N(µ, σ²).

Normal distribution and linear transformations

• If Z ∼ N(0, 1) then σZ + µ = X ∼ N(µ, σ²);
• if X ∼ N(µ, σ²) then (X − µ)/σ = Z ∼ N(0, 1);
• hence, for the pdf of X it holds that
  1. the area between µ − σ and µ + σ is 68.26%;
  2. the area between µ − 2σ and µ + 2σ is 95.44%;
  3. the area between µ − 1.96σ and µ + 1.96σ is 95%;
  4. the area between µ − 3σ and µ + 3σ is 99.74%.
• the cumulative distribution function of Z is denoted by Φ(z).

Multiple random variables: the bivariate case

Let X1 and X2 be two random variables:
• X1 and X2 are INDEPENDENT if and only if

P({X1 ≤ x1} ∩ {X2 ≤ x2}) = P(X1 ≤ x1) × P(X2 ≤ x2)

for every pair x1 and x2.
• Two (or more) random variables are IDENTICALLY DISTRIBUTED if they have the same probability distribution;
• i.i.d. = independent and identically distributed.

Linear combination of two random variables

Let X1 and X2 be two random variables: a linear combination of X1 and X2 is a random variable defined as

Y = a1 X1 + a2 X2 + b

where a1, a2 and b are real constants.

EXAMPLE: let X1 and X2 be the results of two die-rolls (a black and a white die, say). X1 and X2 are i.i.d. and Y = X1 + X2 is the linear combination corresponding to the sum of the two resulting values.

Expected value of a linear combination of two random variables

The expected value of Y = a1 X1 + a2 X2 + b is

E(Y) = a1 E(X1) + a2 E(X2) + b.

EXAMPLE: if X1 and X2 are the results of the black and white die-rolls then E(X1 + X2) = 3.5 + 3.5 = 7 and E(X1 − X2) = 0.

Variance of a linear combination of two random variables

If X1 and X2 are INDEPENDENT, then the variance of Y = a1 X1 + a2 X2 + b is

Var(Y) = a1² Var(X1) + a2² Var(X2).
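The expectation and variance formulas can be verified exactly for the die-roll example by enumerating all 36 equally likely pairs:

```python
from itertools import product

# Enumeration check of the die-roll example: E(X1 + X2) = 7, E(X1 - X2) = 0,
# and, by independence, Var(X1 +- X2) = Var(X1) + Var(X2).
faces = range(1, 7)
pairs = list(product(faces, faces))   # all 36 equally likely outcomes

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

sums = [a + b for a, b in pairs]
diffs = [a - b for a, b in pairs]
assert mean(sums) == 7.0 and mean(diffs) == 0.0
# Var(X1) = 35/12, so both variances equal 35/6 (the slide rounds 2.92 + 2.92 = 5.84)
assert abs(var(sums) - 2 * var(list(faces))) < 1e-12
assert abs(var(diffs) - 2 * var(list(faces))) < 1e-12
```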
EXAMPLE: if X1 and X2 are the results of the black and white die-rolls then
Var(X1 + X2) = Var(X1) + Var(X2) = 2.92 + 2.92 = 5.84;
Var(X1 − X2) = Var(X1) + Var(X2) = 2.92 + 2.92 = 5.84.

Multiple random variables

Consider the sequence of random variables X1, X2, . . . , Xn.
• These n random variables are MUTUALLY INDEPENDENT if and only if, for every x1, . . . , xn, it holds that

P(X1 ≤ x1 ∩ X2 ≤ x2 ∩ · · · ∩ Xn ≤ xn) = P(X1 ≤ x1) × P(X2 ≤ x2) × · · · × P(Xn ≤ xn).

i.i.d. random variables: examples

E1: A box contains tickets labeled with either "0" or "1". Let π be the proportion of tickets with "1". n tickets are extracted with replacement from the box. For i = 1, . . . , n let Xi denote the result of the ith extraction. The n random variables are i.i.d. with Xi ∼ Be(π).

E2: The "experiment of the wheel" is repeated n times. For i = 1, . . . , n let Xi denote the result of the ith repetition of the experiment. The n random variables are i.i.d. with Xi ∼ U(0; 1).

E3: n married couples go to a dinner. Husbands sit on the same side of the table, and wives on the opposite side. Everybody chooses her/his seat randomly. For i = 1, . . . , n let Xi be equal to "1" if the ith couple is sitting opposite each other and "0" otherwise. These n random variables are identically distributed with Xi ∼ Be(1/n) but NOT INDEPENDENT.

Linear combination of random variables

A linear combination of the random variables X1, . . . , Xn is a random variable defined as

Y = a1 X1 + a2 X2 + · · · + an Xn + b

where a1, . . . , an, b are real constants.

EXAMPLES:
E1: the total number of tickets with "1" in the n extractions is Y = X1 + · · · + Xn and its distribution is Bin(n, π);
E2: the sum of the n results of the experiment is Y = X1 + · · · + Xn;
E3: the number of married couples sitting opposite each other is Y = X1 + · · · + Xn, but Y is NOT a binomial random variable.
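Example E3 can be explored by simulation: the number of matched couples is the number of fixed points of a random permutation, whose expected value is 1 for every n, even though the indicators Xi are dependent. A Monte Carlo sketch (n, the trial count and the seed are illustrative choices):

```python
import random

# Monte Carlo sketch of example E3: n couples seat at random; X_i = 1 if
# couple i ends up face to face. Y = X_1 + ... + X_n counts matched couples.
random.seed(0)
n, trials = 8, 20_000
total = 0
for _ in range(trials):
    seats = list(range(n))
    random.shuffle(seats)        # seats[i] = wife sitting opposite husband i
    total += sum(1 for i in range(n) if seats[i] == i)  # matched couples
estimate = total / trials        # Monte Carlo estimate of E(Y)
assert abs(estimate - 1.0) < 0.1  # E(Y) = 1, regardless of n
```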
Expected value of a linear combination of random variables

The expected value of Y = a1 X1 + a2 X2 + · · · + an Xn + b is

E(Y) = a1 E(X1) + a2 E(X2) + · · · + an E(Xn) + b

and, more specifically, if X1, . . . , Xn are identically distributed with E(Xi) = µ and Y = X1 + X2 + · · · + Xn, it holds that

E(Y) = n × µ.

EXAMPLES:
E1: E(Xi) = π and therefore E(Y) = n × π;
E2: E(Xi) = 1/2 and therefore E(Y) = n/2;
E3: E(Xi) = 1/n and therefore E(Y) = 1.

Variance of a linear combination of random variables

If the random variables X1, . . . , Xn are INDEPENDENT, the variance of Y = a1 X1 + a2 X2 + · · · + an Xn + b is equal to

Var(Y) = a1² Var(X1) + a2² Var(X2) + · · · + an² Var(Xn)

and, more specifically, if X1, . . . , Xn are i.i.d. with Var(Xi) = σ² and Y = X1 + X2 + · · · + Xn, it holds that

Var(Y) = n × σ².

EXAMPLES:
E1: Var(Xi) = π(1 − π) and therefore Var(Y) = nπ(1 − π);
E2: Var(Xi) = 1/12 and therefore Var(Y) = n/12;
E3: Var(Xi) = (1/n)(1 − 1/n). However, Var(Y) = ?: in this case independence does not hold.

Linear combination of normally distributed random variables

If X1, . . . , Xn are INDEPENDENT and NORMALLY DISTRIBUTED then the linear combination Y = a1 X1 + a2 X2 + · · · + an Xn + b has the following properties:
1. E(Y) = a1 E(X1) + a2 E(X2) + · · · + an E(Xn) + b;
2. Var(Y) = a1² Var(X1) + a2² Var(X2) + · · · + an² Var(Xn);
3. Y is normally distributed.

More specifically, if X1, . . . , Xn are i.i.d. with E(Xi) = µ and Var(Xi) = σ² and, furthermore, Y = X1 + X2 + · · · + Xn, then Y ∼ N(nµ; nσ²).

The central limit theorem

If X1, . . . , Xn are i.i.d. with E(Xi) = µ and Var(Xi) = σ², then the distribution of the random variable Sn = X1 + X2 + · · · + Xn is approximately normal:

Sn ≈ N(nµ; nσ²).

The symbol "≈" means "approximately distributed as". The quality of the normal approximation increases with n, but also depends on the probability distribution of Xi.
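For example E1 the formulas E(Y) = nπ and Var(Y) = nπ(1 − π) can be checked exactly from the Bin(n, π) probability mass function; a sketch with illustrative values of n and π:

```python
from math import comb

# Exact check of E(Y) = n*pi and Var(Y) = n*pi*(1 - pi) for Y ~ Bin(n, pi),
# as in example E1; n and pi below are illustrative choices.
n, pi = 10, 0.3
pmf = [comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
assert abs(mean - n * pi) < 1e-9             # E(Y) = 3.0
assert abs(var - n * pi * (1 - pi)) < 1e-9   # Var(Y) = 2.1
```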
[Figures: pdf of S4 compared with N(4; 4); pdf of S8 compared with N(8; 8)]

Central limit theorem: example

X1, . . . , Xn are i.i.d. with density function f(x) = e^(−x) for x > 0. Hence, E(Xi) = 1 and Var(Xi) = 1. From the central limit theorem it follows that
X1 + X2 + · · · + Xn = Sn ≈ N(n; n).

[Figures: pdf of S1 = X1 compared with N(1; 1); pdf of S2 compared with N(2; 2); pdf of S20 compared with N(20; 20)]
[Figure: pdf of S50 compared with N(50; 50)]
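The convergence illustrated in the figures can be reproduced by simulation; a Monte Carlo sketch for n = 50 (the number of replications and the seed are illustrative choices):

```python
import random

# Monte Carlo sketch of the central limit theorem for Exp(1) summands:
# S_n = X_1 + ... + X_n should have mean close to n and variance close to n,
# matching the N(n; n) approximation.
random.seed(1)
n, reps = 50, 10_000
sums = [sum(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
mean = sum(sums) / reps
var = sum((s - mean) ** 2 for s in sums) / reps
assert abs(mean - n) < 1.0   # E(S_50) = 50
assert abs(var - n) < 8.0    # Var(S_50) = 50
```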