Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
A random variable is a variable “whose value is a numerical outcome of a random phenomenon”. A Discrete random variable is one that is countable. Examples include: The number of males born out of 5 births. Random Variable X is the number of males and can take the form 0, 1, 2, 3, 4, or 5 The number of correct answers on a 10 question test. Random Variable Y is the number of correct questions and can take the form 0,1,2,3,4,5,6,7,8,9 or 10 The sum of tolling two six-sided dice. The Random Variable X is the sum of the two dice and can take the form 2,3,4,5,6,7,8,9,10,11,or 12 Notice in all cases the number of ways that the variable can take form is countable. A Continuous random variable is one that is not countable. Examples include: The time it takes for someone to run 100 meters. The Random Variable X is the time to run 100 meters and can take the form 10.03 seconds, 9.8997 seconds, 11.234521 seconds, ….. Selecting any rational number between 0 and 1. The Random Variable Y is the rational number selected and can take the form 1, .09, .4528, .45289, .39876145, .0000023, …… Notice in both cases there are an infinite number of ways the variable can take form. A Probability Distribution lists all the values of the Random Variable X and their probabilities. The distribution is legitimate if Every probability is a number between 0 and 1 The sum of all the probabilities is 1 Finding Probabilities for Discrete Random Variables As an example, take a six-sided die and toss it. It will land face up and there will be either be a 1 facing up, or a 2, or a 3, 4, 5 or 6. You are interested in the number that lands on top. That means you can assign a variable X as the number that lands face up and X can take the value 1, 2, 3, 4, 5, or 6. The probability of landing any of these given numbers is 1/6. You can list this out in a table like this: X 1 2 3 4 5 6 P(X) 1/6 1/6 1/6 1/6 1/6 1/6 The table above is the Probability Distribution. Check to make sure if it is legitimate. You can also find different probabilities: The probability of landing a 3 or a 5 is P(X = 3 or 5) = P(X = 3) + P(X = 5) = (1/6) + (1/6) = (1/3). The probability of getting at least a 4 is P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6) = (1/6) + (1/6) + (1/6) = (1/2). The probability of getting less than 2 and more than 5 is P( 2 > X > 5) = P(X = 1) + P(X = 6) = (2/6) = (1/3). The probability of getting 3 or less (P(X ≤ 3)) can be obtained two ways. One way is to add the probabilities P(X = 1) + P(X = 2) + P(X = 3) = (1/2). Another way is to consider that the sum of all the probabilities in the distribution is always equal to 1. So, P(X ≤ 3) = 1- P(X ≥ 4) = 1 – (1/2) = (1/2). Finding Probabilities for Continuous Random Variables Unlike discrete random variables, you must have an interval to find probabilities of continuous random variables. This is because the distribution of any continuous random variable is described by a density curve and, from Module 2 activities, finding probabilities in a density curve setting requires you to find the area under the curve between the two numbers. For example: Consider the scenario where a density curve consists of a straight line segment from the point (0, 2/3) to the point (1, 4/3) in the x–y plane. The random variable X is any number between 0 and 1. **** In the grid above the x-axis is scaled by (1/4) and the y-axis is scaled by (1/3). The random variable displayed on the x-axis and represents any number between 0 and 1. Notice in this distribution that there is a better chance of drawing a 1 than a 0 because the height at X = 1 is greater than the height at X = 0. The distribution is continuous because there are an infinite amount of numbers between 0 and 1 that can be drawn. ***** It may be helpful to know that the word “curve” in mathematics doesn’t mean that the shape is “curvy” (live a curve in a road). The word in mathematics always refers to a mathematical description. So, the red line in the figure above is a curve even though it is straight in shape. 1. Verify that this is a legitimate density curve Two things need to be checked. Are all probabilities between 0 and 1? Yes Is the sum of all probabilities equal to 1? Yes because the area under the curve between the random variables 0 and 1 is equal to 1. How? I can use my geometry rules to find it. The area of any rectangle is (length x width). The area of any triangle is ((1/2) x base x height). In the figure above I have a rectangle that has a length that extends from 0 to 1 on the x-axis and a width that extends from 0 to (2/3) on the y-axis. So, the area of that rectangle is (1 x (2/3)) = (2/3). There is also a triangle in the figure that has a base that extends from the point (0,(2/3)) to the point (1,(2/3)). So, the base of that triangle is has a length of 1. The triangle has a height that extends from the point (1, (2/3)) to the point (1, (4/3)) so it has a height of (2/3). The area of the triangle, then, is ((1/2) x (1) x (2/3)) = (1/3). Add the areas of the rectangle and the triangle to get the total area under the curve and you get (2/3) + (1/3) = 1 2. What percent of the observations lie below 1/2? Look at the question this way. If you were able to list out all of the numbers on the x-axis in the figure above, what percent of those numbers will be between 0 and (1/2). It’s not 50% because there is not an even distribution of all the numbers. Instead there are more occurrences of each individual number as you go “up” the x-axis. In any continuous distribution the probability of finding something requires you to find the area under a curve. So, in this case what you need to do is find the area under the red line between 0 and (1/2) on the x-axis. So, the percent of observations that lie below (1/2) is: P(X< (1/2)) = the area of the rectangle and the area of the triangle combined. The rectangle has a length between x = 0 and x = (1/2) and a width between y = 0 and y = (2/3). That area is (1/2) x (2/3) = (1/3). The triangle has a base that extends from the point (0, (2/3)) to the point ((1/2), (2/3)) and a height that extends from the point ((1/2), (2/3)) to the point ((1/2), 1). That area is (1/2) x (1/2) x (1/3) = (1/12). Combining the two areas give me (1/3) + (1/12) = (5/12) = .417 So, P(X < (1/2)) = (5/12) or about 41.7% 3. What percent of the observations lie below 1? This one is easy. It’s 100% because all the numbers in the distribution are 1 or lower. But you may be thinking “the observations don’t include “1” so why are we looking at “1 or lower”. Remember that in a continuous distribution there is no difference between the inequality signs < and ≤. Similarly, there is also no difference between the inequalities > and ≥ in a continuous distribution. So, if I had asked you to find P(X ≤ (1/2)) instead of P(X < (1/2)) you would solve them the same way. This is not the case in a discrete distribution, though 4. What percent of the observations lie between 1/2 and 1? There are two ways to do this problem. One is to look at the area under the curve between (1/2) and 1. You can find it the same way I showed you in #2 above by looking at the area of a rectangle and a triangle and adding the two together. But there is another way to look at this. We already know that the area under the whole curve must be 1 and we found in #2 above that the area between 0 and (1/2) on the x-axis was (5/12). So, this means that the percent of observations between (1/2) and 1 must be 1- (5/12) = 7/12 or about 58.3% P((1/2) < X<1) = 1-.417 = .583 = 58.3% 5. What percent of the observations lie at X = (3/4)? This answer is “0”. This is because we need to find an area and areas must have two dimensions (i.e., length and width; base and height;…). Since we are not looking for an interval of numbers we cannot find that percentage. Thus, P(X = (3/4)) = 0. This means that in a continuous distribution we cannot find the probability of any one number – it is essentially 0. We must always have an interval of numbers. Again, this is not the case in a discrete distribution. In those you can find the probability at one number.