Download A random variable is a variable “whose value is a numerical

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Addition wikipedia , lookup

Elementary mathematics wikipedia , lookup

Infinite monkey theorem wikipedia , lookup

Inductive probability wikipedia , lookup

Expected value wikipedia , lookup

Law of large numbers wikipedia , lookup

Transcript
A random variable is a variable “whose value is a numerical outcome of a random
phenomenon”.
A Discrete random variable is one that is countable.
Examples include:



The number of males born out of 5 births. Random Variable X is the number of
males and can take the form 0, 1, 2, 3, 4, or 5
The number of correct answers on a 10 question test. Random Variable Y is the
number of correct questions and can take the form 0,1,2,3,4,5,6,7,8,9 or 10
The sum of tolling two six-sided dice. The Random Variable X is the sum of the
two dice and can take the form 2,3,4,5,6,7,8,9,10,11,or 12
Notice in all cases the number of ways that the variable can take form is countable.
A Continuous random variable is one that is not countable.
Examples include:


The time it takes for someone to run 100 meters. The Random Variable X is the
time to run 100 meters and can take the form 10.03 seconds, 9.8997 seconds,
11.234521 seconds, …..
Selecting any rational number between 0 and 1. The Random Variable Y is the
rational number selected and can take the form 1, .09, .4528, .45289, .39876145,
.0000023, ……
Notice in both cases there are an infinite number of ways the variable can take form.
A Probability Distribution lists all the values of the Random Variable X and their
probabilities. The distribution is legitimate if


Every probability is a number between 0 and 1
The sum of all the probabilities is 1
Finding Probabilities for Discrete Random Variables
As an example, take a six-sided die and toss it. It will land face up and there will be
either be a 1 facing up, or a 2, or a 3, 4, 5 or 6. You are interested in the number that
lands on top. That means you can assign a variable X as the number that lands face up
and X can take the value 1, 2, 3, 4, 5, or 6. The probability of landing any of these given
numbers is 1/6. You can list this out in a table like this:
X
1
2
3
4
5
6
P(X)
1/6
1/6
1/6
1/6
1/6
1/6
The table above is the Probability Distribution. Check to make sure if it is
legitimate. You can also find different probabilities:

The probability of landing a 3 or a 5 is P(X = 3 or 5) = P(X = 3) + P(X = 5) = (1/6)
+ (1/6) = (1/3).

The probability of getting at least a 4 is P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6)
= (1/6) + (1/6) + (1/6) = (1/2).

The probability of getting less than 2 and more than 5 is P( 2 > X > 5) = P(X = 1)
+ P(X = 6) = (2/6) = (1/3).

The probability of getting 3 or less (P(X ≤ 3)) can be obtained two ways. One way
is to add the probabilities P(X = 1) + P(X = 2) + P(X = 3) = (1/2). Another way is
to consider that the sum of all the probabilities in the distribution is always equal
to 1. So, P(X ≤ 3) = 1- P(X ≥ 4) = 1 – (1/2) = (1/2).
Finding Probabilities for Continuous Random Variables
Unlike discrete random variables, you must have an interval to find probabilities of
continuous random variables. This is because the distribution of any continuous random
variable is described by a density curve and, from Module 2 activities, finding
probabilities in a density curve setting requires you to find the area under the curve
between the two numbers.
For example:
Consider the scenario where a density curve consists of a straight line segment from the point (0, 2/3) to
the point (1, 4/3) in the x–y plane. The random variable X is any number between 0 and 1.
**** In the grid above the x-axis is scaled by (1/4) and the y-axis is scaled by (1/3). The random
variable displayed on the x-axis and represents any number between 0 and 1. Notice in this
distribution that there is a better chance of drawing a 1 than a 0 because the height at X = 1 is
greater than the height at X = 0. The distribution is continuous because there are an infinite
amount of numbers between 0 and 1 that can be drawn.
***** It may be helpful to know that the word “curve” in mathematics doesn’t mean that the
shape is “curvy” (live a curve in a road). The word in mathematics always refers to a mathematical
description. So, the red line in the figure above is a curve even though it is straight in shape.
1.
Verify that this is a legitimate density curve
Two things need to be checked.
 Are all probabilities between 0 and 1? Yes
 Is the sum of all probabilities equal to 1? Yes because the area under the curve
between the random variables 0 and 1 is equal to 1. How? I can use my geometry
rules to find it.
The area of any rectangle is (length x width). The area of any triangle is ((1/2) x base
x height). In the figure above I have a rectangle that has a length that extends from
0 to 1 on the x-axis and a width that extends from 0 to (2/3) on the y-axis. So, the
area of that rectangle is (1 x (2/3)) = (2/3). There is also a triangle in the figure that
has a base that extends from the point (0,(2/3)) to the point (1,(2/3)). So, the base of
that triangle is has a length of 1. The triangle has a height that extends from the
point (1, (2/3)) to the point (1, (4/3)) so it has a height of (2/3). The area of the
triangle, then, is ((1/2) x (1) x (2/3)) = (1/3). Add the areas of the rectangle and the
triangle to get the total area under the curve and you get (2/3) + (1/3) = 1
2. What percent of the observations lie below 1/2?
Look at the question this way. If you were able to list out all of the numbers on the x-axis in the
figure above, what percent of those numbers will be between 0 and (1/2). It’s not 50% because there
is not an even distribution of all the numbers. Instead there are more occurrences of each
individual number as you go “up” the x-axis. In any continuous distribution the probability of
finding something requires you to find the area under a curve. So, in this case what you need to do
is find the area under the red line between 0 and (1/2) on the x-axis. So, the percent of observations
that lie below (1/2) is:
P(X< (1/2)) = the area of the rectangle and the area of the triangle combined. The rectangle has
a length between x = 0 and x = (1/2) and a width between y = 0 and y = (2/3). That area is (1/2) x
(2/3) = (1/3). The triangle has a base that extends from the point (0, (2/3)) to the point ((1/2), (2/3))
and a height that extends from the point ((1/2), (2/3)) to the point ((1/2), 1). That area is (1/2) x (1/2)
x (1/3) = (1/12). Combining the two areas give me (1/3) + (1/12) = (5/12) = .417
So, P(X < (1/2)) = (5/12) or about 41.7%
3. What percent of the observations lie below 1?
This one is easy. It’s 100% because all the numbers in the distribution are 1 or lower. But you may
be thinking “the observations don’t include “1” so why are we looking at “1 or lower”. Remember
that in a continuous distribution there is no difference between the inequality signs < and ≤.
Similarly, there is also no difference between the inequalities > and ≥ in a continuous distribution.
So, if I had asked you to find P(X ≤ (1/2)) instead of P(X < (1/2)) you would solve them the same
way.
This is not the case in a discrete distribution, though
4. What percent of the observations lie between 1/2 and 1?
There are two ways to do this problem. One is to look at the area under the curve between (1/2) and
1. You can find it the same way I showed you in #2 above by looking at the area of a rectangle and a
triangle and adding the two together. But there is another way to look at this. We already know
that the area under the whole curve must be 1 and we found in #2 above that the area between 0
and (1/2) on the x-axis was (5/12). So, this means that the percent of observations between (1/2) and
1 must be 1- (5/12) = 7/12 or about 58.3%
P((1/2) < X<1) = 1-.417 = .583 = 58.3%
5. What percent of the observations lie at X = (3/4)?
This answer is “0”. This is because we need to find an area and areas must have two dimensions
(i.e., length and width; base and height;…). Since we are not looking for an interval of numbers we
cannot find that percentage. Thus, P(X = (3/4)) = 0. This means that in a continuous distribution we
cannot find the probability of any one number – it is essentially 0. We must always have an interval
of numbers.
Again, this is not the case in a discrete distribution. In those you can find the probability at one
number.