Random Variables: Density and Distribution Functions
Part 1: The Discrete Case

MSIS 385, Fall 1999
Farid Alizadeh
Faculty of Management, Rutgers University
September 14, 1999

In this note I will review the concepts of random variable, density function, and distribution function. This is a minimal set of concepts that you should be familiar with in order to grasp the other statistical concepts that you will encounter in the course.

1 Sample space and events

(Please review chapter 6 of your text.) In the situations that arise in statistical analysis we are generally dealing with some sort of experiment and expect a variety of outcomes. The term sample space refers to the collection of all possible outcomes that could arise from an experiment. For example, if we are tossing a coin then the possible set of outcomes is heads or tails, so the sample space is {head, tail}. If you are throwing a regular (that is, cube-shaped) die with each face bearing one, two, ..., or six dots, then the sample space consists of the 6 possible faces. When a polling company announces the results of, say, a nationwide poll on presidential candidates, it generally implies that its poll takers have chosen a random sample of people among citizens of the United States; thus here the sample space is all the citizens of the US. Finally, when we talk about the temperature in New Brunswick at noon on October 1st, 1999, the sample space is the set of all possible numbers that can turn up, in this case an infinite set of real numbers anywhere between, say, 0F and 120F. Thus sample spaces may be infinite or finite.

An event is a portion of the sample space that may be of interest to us. When we toss a coin and want to know whether the outcome is heads or not, the event under consideration is the singleton set {head}, a subset of the sample space.
When we throw a die and would like to know if the outcome is a face with three or fewer dots, then the event of interest consists of those faces with one, two, or three dots. Finally, if we are interested in whether the temperature in New Brunswick on October 1st, 1999 will be between 60 and 80 degrees, then the infinite set of real numbers between 60 and 80 makes up our event.

2 Random variables

Sample spaces and events do not necessarily have numerical content. A sample space may be made up of people, cities, playing cards, and so on. In many cases, however, statistics deals with numerical content. The concept of random variable was invented to give numerical content to sample spaces and events. A random variable is a function that associates to each element of the sample space a real number. At this point we impose no restriction on the numbers themselves, although particular situations may pose additional restrictions, such as requiring the number to be nonnegative or integral.

For example, consider the set of all Rutgers full-time students in the current semester. We may be interested in their age. Consider the experiment of choosing a student at random and inquiring about his or her age. In this case age is a random variable on the sample space of Rutgers students: it associates to each student their age, a number.

In another experiment we may toss a coin three times and be interested in the number of times a head turns up. Here the sample space consists of 8 elements:

    S = {hhh, hht, hth, htt, thh, tht, tth, ttt}

The random variable "number of heads that turn up" takes one of the values 0, 1, 2, or 3. If we denote this random variable by the symbol H, then H(hhh) = 3, H(hht) = H(hth) = H(thh) = 2, H(htt) = H(tht) = H(tth) = 1, and H(ttt) = 0. (Remember that a random variable is a function defined on the sample space; thus we are justified in writing H(hht) = 2.)
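The sample space S and the random variable H above can be written down concretely. The following sketch (not part of the original note; the names sample_space and H are my own) represents H as an ordinary Python function on outcome strings:

```python
from itertools import product

# Sample space of three coin tosses: all strings of 'h' and 't' of length 3.
sample_space = ["".join(toss) for toss in product("ht", repeat=3)]
# sample_space == ['hhh', 'hht', 'hth', 'htt', 'thh', 'tht', 'tth', 'ttt']

def H(outcome):
    """Random variable H: the number of heads in the outcome."""
    return outcome.count("h")

print(H("hhh"))  # 3
print(H("hht"))  # 2
print(H("ttt"))  # 0
```

This makes the point in the text literal: a random variable really is a function on the sample space, so H("hht") is a perfectly well-defined expression.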
Whenever a sample space has a finite number of elements, every random variable defined on it is a discrete random variable. It is possible, however, that the sample space, and even the set of possible values attained by the random variable, is infinite. In that case not every arbitrary function is acceptable as a random variable. We restrict ourselves to continuous random variables in the infinite case; we will define this term shortly.

3 Density and distribution functions for discrete random variables

Two fundamental functions are definable on most random variables: the probability density function (pdf for short) and the cumulative distribution function (cdf, or simply the distribution function, for short). We will first define these concepts for discrete random variables.

Figure 1: A random variable indicating the number of heads in a sequence of three coin tosses

Again consider the experiment of tossing a coin three times and the random variable which indicates the number of heads that turn up. Assume that the coin is fair, so that the probability of a head turning up at each toss is equal to the probability of a tail turning up. This implies that in tossing our coin three times, all 8 possible outcomes in the sample space S described above are equally likely. (Note that 8 = 2 × 2 × 2 = 2^3.) Now consider the following questions and their answers:

1. What is the probability that no heads turn up? In other words, we need to calculate Pr[H = 0]. Since the event that there are no heads is E0 = {ttt}, which has only 1 element, the answer is

    Pr[H = 0] = (number of elements in E0) / (number of elements in S) = 1/8

2. What is the probability that only one head turns up? In other words, we need to find Pr[H = 1]. The event of interest here consists of all outcomes with only one head, that is E1 = {htt, tht, tth}. Thus,

    Pr[H = 1] = (number of elements in E1) / (number of elements in S) = 3/8

3. What is the probability that two heads turn up?
In other words, what is the value of Pr[H = 2]? The event of interest is all outcomes with two heads, that is E2 = {hht, hth, thh}. Thus,

    Pr[H = 2] = (number of elements in E2) / (number of elements in S) = 3/8

4. What is the probability that three heads turn up? In other words, we want Pr[H = 3]. The event of interest again consists of only one element: E3 = {hhh}. Thus

    Pr[H = 3] = (number of elements in E3) / (number of elements in S) = 1/8

What we have calculated above is the probability density function of the random variable H, the number of heads in three tosses of a fair coin.

Definition 3.1 The pdf is a function that associates to each possible value of a discrete random variable a number equal to the probability of that value being attained by the random variable, assuming all elements of the sample space are equally likely. Thus if f is the probability density function of a random variable X, then

    f(x) = Pr[X = x]

For instance, in the above example, if the probability density function of the random variable H is f, then f(0) = 1/8, f(1) = 3/8, f(2) = 3/8, and f(3) = 1/8.

• The pdf of a random variable is a function that takes a numerical value as input and returns a real number between 0 and 1 inclusive; this number is a probability.

• If we add up f(x) over all possible values x of the random variable, the sum is one:

    Σ_x f(x) = 1

In the example above, f(0) + f(1) + f(2) + f(3) = 1/8 + 3/8 + 3/8 + 1/8 = 1.

• Depending on the situation, the function f may be given in a variety of formats. Sometimes we simply tabulate the values: for each possible value of x we provide f(x). This is fine for very small sample spaces, but becomes very cumbersome or practically impossible for larger sample spaces. Another alternative is to provide a formula or even a procedure for calculating f(x) given x. Sometimes the formula or procedure is called the probability law of the random variable.
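The counting argument in questions 1 through 4 can be carried out mechanically: enumerate the sample space, count the outcomes in the event, and divide. A minimal sketch (the function name f is my own, mirroring the pdf notation above):

```python
from itertools import product
from fractions import Fraction

# Sample space of three fair coin tosses; all 8 outcomes equally likely.
sample_space = ["".join(t) for t in product("ht", repeat=3)]

def f(x):
    """pdf of H: Pr[H = x], computed by counting equally likely outcomes."""
    event = [s for s in sample_space if s.count("h") == x]
    return Fraction(len(event), len(sample_space))

print([f(x) for x in range(4)])          # [1/8, 3/8, 3/8, 1/8]
print(sum(f(x) for x in range(4)))       # 1
```

Note that the values sum to 1, exactly as the second bullet point above requires of any pdf.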
For instance, you may hear "the random variable X follows the binomial law."

Let us now ask a set of new questions regarding tossing the fair coin three times:

1. What is the probability that two or fewer heads turn up? In other words, what is the value of Pr[H ≤ 2]? Since the set of outcomes that have two or fewer heads in them is E = {ttt, tth, tht, htt, hht, hth, thh}, the answer is 7/8. A better way is to observe that the event H ≤ 2 is the same as H = 0 or H = 1 or H = 2; since we have already calculated the probabilities of these three events, we can simply add them to get our answer:

    Pr[H ≤ 2] = Pr[H = 0] + Pr[H = 1] + Pr[H = 2] = 1/8 + 3/8 + 3/8 = 7/8

2. What is the probability that the number of heads is smaller than 1.3? This is a legitimate question even though the number of heads is always an integer, and never equal to 1.3. In this case Pr[H ≤ 1.3] = Pr[H ≤ 1] = 1/8 + 3/8 = 1/2.

What we have done above is calculate the cumulative distribution function of H. The cumulative distribution function (or simply the distribution function, or cdf) of a random variable X is a function that, just like the pdf, takes a real number and returns another real number between zero and one. If the pdf is f(x), then the cdf is F(x) and is defined as:

    F(x) = Pr[X ≤ x]

The concepts of probability density function and distribution function are entirely determined by the values of the random variable and the formulas or probability laws that describe the functions F(x) and f(x). As such they do not directly depend on the sample space itself. Mathematicians and statisticians spend time studying and analyzing the properties of various distributions. In practical and real-world situations our job is to identify an appropriate sample space and events, define a suitable random variable, and then identify which one of the known and well-studied distribution laws applies to the particular experiment.
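The cdf calculations above, including the case of a non-integer argument like 1.3, can be sketched directly from the pdf of H. This is an illustration of my own, not from the original note; the pdf values are the ones derived earlier:

```python
from fractions import Fraction

# pdf of H, the number of heads in three tosses of a fair coin.
pdf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """cdf of H: Pr[H <= x], the sum of pdf values at or below x."""
    return sum(p for value, p in pdf.items() if value <= x)

print(F(2))    # 7/8
print(F(1.3))  # 1/2, the same as F(1), since H only takes integer values
```

The second call shows why Pr[H ≤ 1.3] is a legitimate quantity: the cdf is defined for every real x, even values the random variable never attains.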
3.1 Binomial distribution

To illustrate this point, let us now study one of the most important discrete distribution laws, which we will encounter repeatedly in the future: the binomial distribution law. This is a generalization of our coin-tossing experiment. Suppose that an experiment has two outcomes, Success (S) and Failure (F). Furthermore, assume that in our experiment Success or Failure occurs randomly with Pr[Success] = p and Pr[Failure] = q = 1 − p. Suppose that the experiment is conducted n times and we are interested in the number of successes. Here is the setup in a nutshell:

• Parameters: The numbers p and n are parameters; they could be different in each experiment, and any formulas we derive may depend on them.

• Sample space: There are 2^n possible outcomes; these outcomes may be represented by sequences of S and F letters. For instance, if n = 2, we may have the 2^2 = 4 outcomes {SS, SF, FS, FF}. These 2^n outcomes constitute our sample space.

• Random variable: The random variable is the number of occurrences of Success in a sequence of n experiments. Let us call it K. For example, if n = 4, the value of the random variable at the sequence SFFS equals 2, since for this particular outcome there were two successes. Put another way, the random variable K counts the number of occurrences of S in a sequence of n letters made up of S and F symbols.

• The pdf of the random variable K is called the binomial density function with parameters n and p, denoted usually by b(k; n, p); it is the probability of exactly k successes in a sequence of n trials. Notice that here the variable of the function is k, while n and p are given fixed parameters.
We saw in the coin-tossing experiment how to calculate this function for n = 3 and p = 1/2 (except that there, Success was called head and Failure was called tail). But you should appreciate that for even moderately large n this essentially tabulating approach is not going to be practical. Fortunately there is a simple formula for the binomial density function (and chapter 7 of your text elaborates on it somewhat more):

    Pr[K = k | n, p] = b(k; n, p) = C(n, k) p^k q^(n−k),   where C(n, k) = n! / (k! (n − k)!)

and n! = 1 × 2 × · · · × n is the factorial function. (C(n, k) is the binomial coefficient, "n choose k.")

• For the cumulative distribution function or cdf of this random variable there is no simple formula; rather, we can only express it in the form of a summation:

    Pr[K ≤ k | n, p] = B(k; n, p) = b(0; n, p) + b(1; n, p) + · · · + b(k − 1; n, p) + b(k; n, p)

It is quite tedious to compute the cdf of the binomial distribution as given above, especially if n is large, on the order of thousands or more. In a few lectures we will see that there are quite efficient ways of approximating the binomial distribution by easier-to-compute functions.

• Now, what are the mean and variance of a binomial distribution? It turns out that

    µ_K = np,   σ²_K = npq
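The formulas above translate directly into code. The following sketch (function names b and B follow the note's own notation; the implementation details are my own) computes the binomial pdf from the formula and the cdf by summation, and checks that n = 3, p = 1/2 reproduces the coin-tossing pdf:

```python
from math import comb

def b(k, n, p):
    """Binomial density: C(n, k) * p^k * q^(n-k), with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def B(k, n, p):
    """Binomial cdf: b(0; n, p) + b(1; n, p) + ... + b(k; n, p)."""
    return sum(b(i, n, p) for i in range(k + 1))

# n = 3, p = 1/2 reproduces the coin-tossing pdf 1/8, 3/8, 3/8, 1/8:
print([b(k, 3, 0.5) for k in range(4)])  # [0.125, 0.375, 0.375, 0.125]
print(B(2, 3, 0.5))                      # 0.875, i.e. Pr[H <= 2] = 7/8

# The mean computed directly from the pdf agrees with the formula np:
mean = sum(k * b(k, 3, 0.5) for k in range(4))
print(mean)                              # 1.5 == 3 * 0.5
```

As the note warns, summing B(k; n, p) term by term becomes tedious for large n, which is the motivation for the approximations to come in later lectures.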