Probability Theory and Random Variables

One of the most noticeable aspects of many computer-science-related phenomena is the lack of certainty. When a job is submitted to a batch-oriented computer system, the exact time at which the job will be completed is uncertain. The number of jobs submitted tomorrow is probably not known either. Similarly, the exact response time of an interactive inquiry system cannot be predicted. If several computers are attached to a local area network, some of them may try to communicate at almost the same time and thus cause a collision on the network. How often this happens during a given period of time is a random number. In order to work with such observed, uncertain processes, we need to put them into a mathematical framework. This is the purpose of this chapter.

To apply probability theory to the process under study, we view it as a random experiment, that is, an experiment whose outcome is not known in advance but for which the set of all possible individual outcomes is known. The sample space Ω (Stichprobenraum, espace échantillon) of a random experiment is the set of all possible simple outcomes of the experiment. These possible outcomes are called elementary events (Elementarereignisse, événements élémentaires). Elementary events are mutually exclusive.

Example 1 Tossing a fair die: Ω = {1, 2, 3, 4, 5, 6}

Example 2 Tossing two fair dice and considering the sum of spots: Ω = {2, 3, ..., 12}

Example 3 Tossing a fair coin again and again until the first head appears; the result is the number of tosses: Ω = {1, 2, 3, ...}

Example 4 Measuring the "response time" of a computer program: Ω = {t | t > 0}, an interval of the real line

Sample spaces can be finite or infinite. They are classified as discrete if the number of sample points is finite or countably infinite. A sample space is continuous if its elementary events consist of all the numbers of some finite or infinite interval of the real line.

An event (Ereignis, événement) is defined as a subset of a sample space.
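The notions of sample space and event can be tried out directly on a computer. The following minimal sketch (in Python, just one possible choice of mathematical software) represents the sample space of Example 1 as a set and computes the probability of an event as the fraction of equally likely elementary events it contains:

```python
from fractions import Fraction

# Sample space of Example 1: tossing a fair die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event, sample_space=omega):
    """Probability of an event (a subset of the sample space),
    assuming all elementary events are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

prime = {2, 3, 5}   # event: the die shows a prime number
odd = {1, 3, 5}     # event: the die shows an odd number

print(prob(prime))        # 1/2
print(prob(prime | odd))  # P[union of the two events] -> 2/3
print(prob(prime & odd))  # P[intersection]            -> 1/3
```

Using exact fractions instead of floating-point numbers keeps the results identical to hand calculation.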
The empty set ∅ represents the impossible event, whereas Ω stands for the certain event.

Example 5 In Example 1, if A = {2, 3, 5}, then A is the event of tossing a prime number, while B = {1, 3, 5} is the event of tossing an odd number.

Example 6 In Example 4, A = {t | 20 < t < 30} is the event that the response time is between 20 and 30 seconds.

Note that, in general, events are not mutually exclusive. E.g. in Example 5 the events A and B may occur simultaneously.

If A and B are two events of a given experiment, then A ∪ B and A ∩ B are also events. The event A* = Ω \ A is the complementary event to A. If A ∩ B = ∅, then A and B are mutually exclusive or incompatible.

The probability (Wahrscheinlichkeit, probabilité) of an event is defined as a real function P[·] which satisfies the following

Axioms 1
(1) 0 ≤ P[A] for every event A
(2) P[Ω] = 1
(3) P[A ∪ B] = P[A] + P[B] for all events A and B that are mutually exclusive

It is immediate from (3) that for any finite collection A1, A2, ..., An of mutually exclusive events

P[A1 ∪ A2 ∪ ... ∪ An] = P[A1] + P[A2] + ... + P[An]

We assume as Axiom (4) that this remains true if n tends to infinity:

(4) P[A1 ∪ A2 ∪ A3 ∪ ...] = P[A1] + P[A2] + P[A3] + ...

The above axioms lead to the following important consequences:

Theorem 1
(a) P[∅] = 0
(b) P[A*] = 1 − P[A] for every event A
(c) P[A ∪ B] = P[A] + P[B] − P[A ∩ B] for any events A, B
(d) A ⊆ B implies P[A] ≤ P[B] for any events A, B

Example 7 We toss two fair dice and consider the total number of spots as the result of the experiment. We define the following events:
A: the result is a prime number
B: the result is less than 5

The conditional probability (bedingte Wahrscheinlichkeit, probabilité conditionnelle) P[A|B], read as "conditional probability of A, given B", is defined as

P[A|B] = P[A ∩ B] / P[B] (for P[B] > 0)

Two events A and B are independent if and only if P[A|B] = P[A].
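Conditional probability and independence can be checked by brute-force enumeration. As a sketch (again in Python, counting the 36 equally likely outcomes of the two dice of Example 7):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two fair dice (Example 7).
outcomes = list(product(range(1, 7), repeat=2))

def prob(pred):
    """P[event] by counting the outcomes that satisfy the predicate."""
    favorable = sum(1 for o in outcomes if pred(o))
    return Fraction(favorable, len(outcomes))

def is_prime(n):
    return n in {2, 3, 5, 7, 11}

A = lambda o: is_prime(o[0] + o[1])  # the sum is a prime number
B = lambda o: o[0] + o[1] < 5        # the sum is less than 5

p_A = prob(A)
p_B = prob(B)
p_AB = prob(lambda o: A(o) and B(o))

print(p_A, p_B, p_AB)     # 5/12 1/6 1/12
print(p_AB / p_B)         # P[A|B] = P[A ∩ B]/P[B] = 1/2
print(p_AB == p_A * p_B)  # False: A and B are not independent
```

Since P[A|B] = 1/2 differs from P[A] = 5/12, the two events are dependent.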
Using the above definition of conditional probability immediately leads to the following equivalent definition: two events A and B are independent if and only if

P[A ∩ B] = P[A] · P[B]

Check whether the events A and B in Example 7 are independent or not!

The concept of two events A and B being independent should not be confused with the concept of their being mutually exclusive. In fact, if A and B are mutually exclusive then P[A ∩ B] = P[∅] = 0; however, if A and B are independent and P[A] ≠ 0 and P[B] ≠ 0, then P[A ∩ B] = P[A] · P[B] ≠ 0. Hence, mutually exclusive events (with nonzero probabilities) are not independent.

One of the main uses of conditional probability is to assist in the calculation of unconditional probabilities by means of the following theorem.

Theorem 2 (Law of total probability) Let A1, A2, A3, ..., An be events such that
(a) Ai ∩ Aj = ∅ if i ≠ j (mutually exclusive events)
(b) P[Ai] > 0 for i = 1, 2, 3, ..., n
(c) A1 ∪ A2 ∪ ... ∪ An = Ω

The family of events A1, A2, A3, ..., An is then called a partition of Ω. For any event A, the probability P[A] can then be calculated as follows:

P[A] = P[A1] · P[A|A1] + P[A2] · P[A|A2] + ... + P[An] · P[A|An]

Example 8 There are three coins in a box. The first one is fair. The second one is weighted so that heads is twice as likely to appear as tails, and the third one shows tails on both sides. A coin is drawn at random from the box and tossed. Find the probability of heads.

Another direct consequence of the definition of conditional probability is the following theorem:

Theorem 3 (Bayes' Theorem) Suppose the events A1, A2, A3, ..., An form a partition of Ω. Then for any event A with P[A] > 0

P[Ai|A] = P[Ai] · P[A|Ai] / (P[A1] · P[A|A1] + P[A2] · P[A|A2] + ... + P[An] · P[A|An])

Example 9 In a certain college, 25 % of the male students and 15 % of the female students are studying computer science. 70 % of all students are men.
A student is chosen at random.
(a) Find the probability that the student is studying computer science.
(b) If the student is studying computer science, find the probability that the student is female.

Example 10 A red die and a white one are tossed simultaneously. Let A, B, and C be events defined as follows:
A: the red die turns up odd,
B: the white die turns up odd,
C: the total number of spots of both dice is odd.

The system of events A1, A2, A3, ..., An is called independent if

P[Ai ∩ Aj] = P[Ai] · P[Aj]
P[Ai ∩ Aj ∩ Ak] = P[Ai] · P[Aj] · P[Ak]
...
P[A1 ∩ A2 ∩ ... ∩ An] = P[A1] · P[A2] · ... · P[An]

for all combinations of indices such that 1 ≤ i < j < ... < k ≤ n.

Reconsider the events in Example 10.

Exercise 1 Let A and B be events with P[A] = 0.3, P[A ∪ B] = 0.4, and P[B] = p. Calculate p if (a) A and B are mutually exclusive, (b) A and B are independent, and (c) A is a subset of B.

Exercise 2 A pair of fair dice is thrown. If the numbers appearing are different, find the probability that (a) the sum is 6, (b) the sum is 4 or less, (c) the sum is even.

Exercise 3 Let three fair coins be tossed. Let A = "all heads or all tails", B = "at least two heads", and C = "at most two heads". Of the pairs (A, B), (A, C), and (B, C), which are independent? Is the system of the events A, B, C independent?

Exercise 4 20 % of all HiQuHard computers are assembled on Mondays, 25 % of them on each of Tuesdays, Wednesdays and Thursdays, and 5 % on Fridays. The company statistician has determined that 4 % of the computers produced on Monday are "lemons", i.e. defective. 1 % of the computers made on Tuesday, Wednesday or Thursday are lemons, and 3 % of the computers manufactured on Friday are lemons. You find that your HiQuHard computer is a lemon. What is the probability that it was manufactured on Monday?

Exercise 5 Suppose there are n people in a room.
(a) What is the probability p(n) that at least two persons have the same birthday?
(b) What is the smallest n such that p(n) exceeds 0.5?
Consider day and month only, and forget about leap years.

A random variable X (Zufallsvariable, variable aléatoire) is a function from the sample space of a given random experiment into the real numbers, i.e. each outcome of the experiment is related to a real number.

Example 11 An urn contains 2 red balls and 3 white ones. We draw one ball after the other (without putting it back) until we get a red one. The number of draws is our random number X.

The probability distribution function (PDF, Wahrscheinlichkeitsverteilung, distribution de probabilité) of a random variable is defined as

F(x) = P[X ≤ x]

In the case of a discrete random variable we have

F(x) = Σ over all xi ≤ x of P[X = xi]

In Example 11, X takes the values 1, 2, 3, 4 with

P[X = 1] = 2/5
P[X = 2] = (3/5)·(2/4) = 3/10
P[X = 3] = (3/5)·(2/4)·(2/3) = 1/5
P[X = 4] = (3/5)·(2/4)·(1/3) = 1/10

The PDF has the following properties:
(1) lim F(x) = 0 as x → −∞
(2) lim F(x) = 1 as x → +∞
(3) if x1 < x2 then F(x1) ≤ F(x2), i.e. F(x) increases monotonically

There are two important functions that are defined for random variables: the mean or average or expectation value (Mittelwert oder Erwartungswert, moyenne ou espérance mathématique) and the variance (Varianz, variance). For a discrete random variable:

E[X] = Σ over all i of xi · P[X = xi] (expectation value)
Var[X] = Σ over all i of (xi − E[X])² · P[X = xi] (variance)

The square root of the variance is known as the standard deviation (Streuung, écart type).

Consider a basic experiment with two possible outcomes (success and failure) with constant probabilities p (for success) and q = 1 − p (for failure). The experiment is repeated n times, and the number of successful outcomes is taken as the value of a random variable X. Let p(k) be the probability for the event that X takes the value k: p(k) = P[X = k].
Then

pBinomial(k, n, p) = (n choose k) · p^k · q^(n−k) with q = 1 − p

This is the famous binomial distribution. Its expectation value and variance are:

E[X] = n·p
Var[X] = n·p·q

For large values of n (and if p and q are not too close to zero) the evaluation of pBinomial(k, n, p) may be numerically difficult or even impossible. The binomial distribution may then be approximated by the normal distribution with the same mean and variance:

pBinomial(k, n, p) ≈ f(k), where f(x) = 1/(σ·√(2π)) · e^(−(x−μ)²/(2σ²)) with μ = n·p and σ = √(n·p·q)

Example 12 A telephone operator services 10 incoming telephone calls per hour on the average. Let us consider the minutes of an hour as "basic experiments" and declare an arriving call as a "successful outcome" of the basic experiment. What is the probability of exactly k = 12 "successful minutes" within a given hour?

Binomial distribution (n = 60, p = 10/60 = 1/6):
pBinomial(12, 60, 1/6) = (60 choose 12) · (1/6)^12 · (5/6)^48

With the normal distribution (μ = 10, σ = √(60 · (1/6) · (5/6))) we get:
pNormal(12, 60, 1/6) ≈ f(12)

Evidently the number of "successful minutes" is not necessarily identical to the number of arriving calls, because more than one call may arrive within a single minute. We can improve the result of our probability calculation by considering "successful seconds". In order to do so, we replace:

n = 60 → n = 3600
p = 1/6 → p = 10/3600 = 1/360

Note that under this substitution the product n·p (equal to the mean value of the binomial distribution!) remains unchanged. We can even go further and take the limits n → ∞ and p → 0 in such a way that n·p = λ remains constant. We then obtain the famous Poisson distribution:

pPoisson(k, λ) = (λ^k / k!) · e^(−λ)

The mean value is E[X] = λ, and the variance is also Var[X] = σ² = λ.

In our Example 12 we obtain for k = 12 and λ = 10:

pPoisson(12, 10) = (10^12 / 12!) · e^(−10) ≈ 0.095

Example 13 In Example 12, what is the probability of the event that fewer than seven calls arrive in a given hour? The probability distribution function F(x) gives us the answer:

F(6) = P[X ≤ 6] = Σ from k = 0 to 6 of pPoisson(k, 10) ≈ 0.130

Example 14 Consider again the situation of our telephone operator.
We start our observation (and our clock) at an arbitrarily chosen moment in time and measure the time until the first telephone call arrives. We take this time as our random variable T. Note that the starting point on the time axis is not important; we can choose it wherever we like. In particular, we can choose it at the moment when a telephone call arrives. If we do so, T is the time between two arriving calls: the interarrival time, or pause, for short.

Clearly, T is a continuous random variable. However, we shall first work with a discrete time interval Δt (e.g. a minute) and measure T in units of this interval. We may then calculate the probability of a pause T of a certain maximum length, say t = 12 min:

F(t) = P[T ≤ t] = 1 − P[no call arrives in any of the n = t/Δt intervals]

In our example λ = 10 h⁻¹ = 1/6 min⁻¹ represents the average arrival rate. With p = λ·Δt as the probability of a call within a single interval, we then have

F(t) = 1 − (1 − λ·Δt)^(t/Δt)

In our numerical example:

F(12 min) = 1 − (5/6)^12 ≈ 0.888

However, the random number T should be measured in much shorter time units Δt in order to get a "more correct" result for F(t). The solution is to take the limit Δt → 0 and n → ∞, such that t = n·Δt remains constant. The result is the exponential distribution:

F(t) = 1 − e^(−λ·t)

We have proved the following important fact: if the basic events of an experiment are independent of each other, then their number in a given interval of time is Poisson distributed, while the interarrival time distribution is exponential.

If X is a continuous random variable (like the interarrival time in the preceding example) we cannot use the definitions of the mean value and the variance given earlier. In order to define these functions for continuous random variables too, we introduce the probability density function (pdf) f(x) as the derivative of the probability distribution function F(x) = P[X ≤ x]:
f(x)·dx = F(x + dx) − F(x) = P[x ≤ X ≤ x + dx]

f(x) = dF(x)/dx

The mean value (or average) and the variance of a continuous random variable are defined as:

E[X] = ∫ x·f(x) dx
Var[X] = ∫ (x − E[X])²·f(x) dx

(both integrals taken over the whole real line)

Exercise 7 We throw a 5-cent coin and a fair die and take the absolute value of the difference of the numbers that appear. The result is our random number X. Draw the probability distribution function F(x) and calculate mean and variance.

Exercise 8 Use a mathematical software package to draw a picture of the binomial distribution pBinomial(k, n, p) for n = 20, p = 0.3. Compare it with a normal distribution having the same mean and variance.

Exercise 9 Use a mathematical software package to draw a picture of the Poisson distribution pPoisson(k, λ) for λ = 25, k = 0...100.

Exercise 10 Calculate mean and variance of the exponential distribution F(t) = 1 − e^(−λ·t).

Exercise 11 A particle detector counts 15 events per minute on the average. (a) What is the probability of counting more than 20 events during a particular minute? (b) Find the probability of a pause shorter than 0.1 sec between two consecutive events.
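The birthday question of Exercise 5 is easy to attack numerically. A minimal sketch (in Python, assuming 365 equally likely birthdays, as the exercise suggests): p(n) is computed via the complementary event that all n birthdays are distinct.

```python
def p_same_birthday(n):
    """Probability that at least two of n people share a birthday
    (365 equally likely days, leap years ignored) -- Exercise 5."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

# Smallest n with p(n) > 0.5:
n = 1
while p_same_birthday(n) <= 0.5:
    n += 1
print(n, round(p_same_birthday(n), 4))  # 23 0.5073
```

The answer to (b) is the famous value n = 23: with only 23 people in the room, a shared birthday is already more likely than not.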
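The Poisson and exponential formulas of Examples 12-14 can be evaluated directly; the same two functions also cover Exercise 11 (with the rate converted to the appropriate time unit). A sketch in Python:

```python
from math import exp, factorial

def p_poisson(k, lam):
    """Poisson probability P[X = k] for mean lam."""
    return lam**k * exp(-lam) / factorial(k)

def F_exponential(t, lam):
    """Exponential distribution function P[T <= t] for rate lam."""
    return 1 - exp(-lam * t)

# Example 12/13: lam = 10 calls per hour.
print(round(p_poisson(12, 10), 4))                        # 0.0948
print(round(sum(p_poisson(k, 10) for k in range(7)), 4))  # F(6) = 0.1301

# Exercise 11 (a): 15 events per minute; P[X > 20] = 1 - P[X <= 20].
p_more_than_20 = 1 - sum(p_poisson(k, 15) for k in range(21))
print(round(p_more_than_20, 4))

# Exercise 11 (b): pause shorter than 0.1 s; rate 15 min^-1 = 0.25 s^-1.
print(round(F_exponential(0.1, 0.25), 4))  # 0.0247
```

Note how (a) uses the Poisson count distribution while (b) uses the exponential interarrival-time distribution: two views of the same arrival process, exactly as derived above.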