Discrete Random Variables
A random variable is a numerical characteristic of each event in a sample space, or
equivalently, each individual in a population. Examples include the number of
correct answers when guessing on a multiple choice exam or the amount of money
one spends on a weekend.
These random variables are classified into two types: discrete or continuous. A
discrete random variable has a countable set of distinct possible values, while a
continuous random variable is such that any value (to any number of decimal
places) within some interval is a possible value. A more readable distinction
is that discrete random variables are counted and continuous random variables
are measured. For instance, the number of beer bottles in a case of beer is
discrete, but the volume of beer in each bottle, measured in ounces, is continuous
(examine each bottle: does each hold exactly the same amount?).
Scenario One
In a prison consisting of 500 male prisoners, each inmate was asked how
many times he had been convicted prior to his current incarceration. The
breakdown is as follows:
Number of prior convictions:     0     1     2     3     4
Number of prisoners reporting:   80    265   100   40    15
Questions:
1. What is the chance (i.e. probability) that a randomly selected inmate had two
prior convictions? 100/500 = 0.20
2. What is the chance (i.e. probability) that a randomly selected inmate had two or
more prior convictions? (100 + 40 + 15)/500 = 155/500 = 0.31
3. What is the expected value (i.e. average or mean) of prior convictions for these
inmates?
[ (80)*(0) + (265)*(1) + (100)*(2) + (40)*(3) + (15)*(4)] / 500
= (0 + 265 + 200 + 120 + 60 ) / 500
= 1.29
P(0) = 80/500 = 0.16
P(1) = 265/500 = 0.53
P(2) = 100/500 = 0.20
P(3) = 40/500 = 0.08
P(4) = 15/500 = 0.03
X = Priors:   0      1      2      3      4
P(X = x):     0.16   0.53   0.20   0.08   0.03
This table is called a probability distribution table as it displays the probability for
each possible outcome.
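As a rough check, the table can be rebuilt from the raw counts in the scenario. The following is a minimal Python sketch; the variable names (`counts`, `distribution`) are illustrative choices, not part of the original handout.

```python
# Counts of inmates by number of prior convictions (from Scenario One).
counts = {0: 80, 1: 265, 2: 100, 3: 40, 4: 15}
total = sum(counts.values())  # 500 inmates in all

# The relative frequency of each outcome gives P(X = x).
distribution = {x: n / total for x, n in counts.items()}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p:.2f}")
```

Each printed probability should match the corresponding entry in the table.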
Observations about such a table:
1. The “X” represents the defined outcome of interest (i.e. prior convictions) and the
“x” represents a specific outcome of interest (i.e. 0, 1, 2, 3, or 4).
2. The sum of all probabilities is one. Since the possible outcomes are an exhaustive
list, at least one of these has to occur. Therefore, the probability that some outcome
occurs is one.
3. The individual outcomes are mutually exclusive. That is, for a single individual (i.e.
convict) exactly one of the outcomes can occur.
4. Be careful! Whether “equal to” is included makes a difference. For instance, “at least
one” (X >= 1) is not the same as “more than one” (X > 1).
Applying this table to answer the above questions:
1. What is the chance (i.e. probability) that a randomly selected inmate had two
prior convictions?
Notation for this is: find P(X = 2). From the table this is 0.20
2. What is the chance (i.e. probability) that a randomly selected inmate had two or
more prior convictions? 155/500 = 0.31
Notation for this is: find P(X >= 2). This can be found either by adding all the
probabilities that satisfy this (i.e. the outcomes of 2, 3, 4) or by using the complement,
which is 1 – P(X < 2).
P(X >= 2) = P(X = 2) + P(X = 3) + P(X = 4) = 0.20 + 0.08 + 0.03 = 0.31
1 – P(X < 2) = 1 – P(X = 0) – P(X = 1) = 1 – 0.16 – 0.53 = 0.31
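The two routes to P(X >= 2), direct addition and the complement, can be compared in a few lines of Python. This is only a sketch; the dictionary `pmf` simply restates the probability distribution table, and its name is an assumption made for the example.

```python
# Probability distribution table for X = number of prior convictions.
pmf = {0: 0.16, 1: 0.53, 2: 0.20, 3: 0.08, 4: 0.03}

# Route 1: add the probabilities of every outcome that is 2 or more.
direct = sum(p for x, p in pmf.items() if x >= 2)

# Route 2: use the complement, 1 - P(X < 2).
complement = 1 - sum(p for x, p in pmf.items() if x < 2)

print(round(direct, 2), round(complement, 2))  # 0.31 0.31
```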
3. What is the expected value (i.e. average or mean) of prior convictions for these
inmates?
What we did previously was:
[ (80)*(0) + (265)*(1) + (100)*(2) + (40)*(3) + (15)*(4)] / 500
However, this can be looked upon as:
[ (80/500)*(0) + (265/500)*(1) + (100/500)*(2) + (40/500)*(3) + (15/500)*(4)]
= (0.16)*(0) + (0.53)*(1) + (0.20)*(2) + (0.08)*(3) + (0.03)*(4)
= 0 + 0.53 + 0.40 + 0.24 + 0.12
= 1.29
What a probability distribution table gives us, then, is a formula for calculating
the mean, also called the expected value, of X:
E(X) = ∑ x_i P(x_i), which in words means to “take each outcome value times its
respective probability and then sum.”
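The "multiply each outcome by its probability, then sum" rule translates directly into code. A minimal sketch follows, assuming the distribution is stored as a dictionary; the function name `expected_value` is hypothetical.

```python
# Probability distribution table for X = number of prior convictions.
pmf = {0: 0.16, 1: 0.53, 2: 0.20, 3: 0.08, 4: 0.03}

def expected_value(pmf):
    """Sum of each outcome times its probability."""
    return sum(x * p for x, p in pmf.items())

mean = expected_value(pmf)
print(round(mean, 2))  # 1.29
```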
4. Another useful feature of this table is that we can also find the variance and
standard deviation. We would expect some variability in our data (i.e. not all
convicts would answer the same). To find the variance and SD we use:
Var(X) = ∑ (x_i)^2 P(x_i) – [E(X)]^2
In words, we take each outcome, square it, multiply it by its respective
probability, and then sum these. After this we subtract the square of the mean.
Var(X) = (0)^2*(0.16) + (1)^2*(0.53) + (2)^2*(0.20) + (3)^2*(0.08) + (4)^2*(0.03) – (1.29)^2
= 0 + 0.53 + 0.80 + 0.72 + 0.48 – 1.66
= 2.53 – 1.66 = 0.87
SD(X) = √0.87 = 0.93
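The variance and SD arithmetic can be double-checked with a short Python sketch. The variable names are illustrative assumptions. The code carries the unrounded mean through the calculation, so intermediate values may differ slightly from the hand-rounded ones before the final rounding.

```python
import math

# Probability distribution table for X = number of prior convictions.
pmf = {0: 0.16, 1: 0.53, 2: 0.20, 3: 0.08, 4: 0.03}

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # sum of x^2 * P(x)
variance = second_moment - mean**2                     # Var(X)
sd = math.sqrt(variance)                               # SD(X)

print(round(second_moment, 2), round(variance, 2), round(sd, 2))  # 2.53 0.87 0.93
```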
An alternative to the probability distribution table is a cumulative probability
distribution table. The design of this table works exactly as it sounds: the
probabilities accumulate as you go across the table.
X = Priors:   0      1      2      3      4
P(X <= x):    0.16   0.69   0.89   0.97   1.00
NOTE that instead of P(X = x) we now have P(X <= x). Here the 0.89 under ‘2’ means
“the probability of 2 or fewer is 0.89.”
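A cumulative table is just a running total of the ordinary probabilities, which can be sketched in Python with `itertools.accumulate`. The names `pmf` and `cumulative` are assumptions made for this example.

```python
from itertools import accumulate

# Probability distribution table for X = number of prior convictions.
pmf = {0: 0.16, 1: 0.53, 2: 0.20, 3: 0.08, 4: 0.03}

# Running total of the probabilities, paired with the outcomes in order.
cumulative = dict(zip(pmf, accumulate(pmf.values())))

for x, c in cumulative.items():
    print(f"P(X <= {x}) = {c:.2f}")
```

The last entry should always come out to 1.00, since the outcomes are exhaustive.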
A special discrete random variable is the Binomial. A binomial is what it sounds
like: “two names”. What that means for us is two possible outcomes, e.g. pass/fail,
male/female, or yes/no. When the following four conditions are satisfied we have
what is called a Binomial Experiment:
1. There are a fixed number of trials (a fixed sample size).
2. On each trial, the event of interest either occurs or does not, i.e. only two
possible outcomes.
3. The probability of occurrence (or not) is the same on each trial.
4. Trials are independent of one another.
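One way to picture these four conditions is a small simulation. The sketch below is illustrative only: the function name `run_binomial_experiment`, the choice of 10 trials, and the success probability of 0.5 are all hypothetical values, not taken from the handout.

```python
import random

def run_binomial_experiment(n_trials, p_success, rng):
    """Count successes over a fixed number of independent two-outcome trials."""
    successes = 0
    for _ in range(n_trials):         # condition 1: fixed number of trials
        if rng.random() < p_success:  # conditions 2 and 3: success or not, same p each trial
            successes += 1            # condition 4: each draw is independent of the others
    return successes

rng = random.Random(0)  # seeded so the run is repeatable
successes = run_binomial_experiment(10, 0.5, rng)
print(successes)
```

The count of successes returned here is the binomial random variable; it can only take the whole-number values 0 through the number of trials.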
Scenario Two
An FBI survey showed that about 80% of property crimes go unsolved. Suppose
that in State College the police are investigating three property crimes.
Questions:
1. What is the chance (i.e. probability) that at least one of these crimes will be
solved?
If we let U = unsolved then P(U) = 0.8 and if we let S = solved then P(S) = 0.2
To find P(S >= 1), we can find the probability for each possible sequence where one
or more of the 3 crimes is solved.
Exactly one solved:
SUU = (.2)(.8)(.8) = 0.128
USU = (.8)(.2)(.8) = 0.128
UUS = (.8)(.8)(.2) = 0.128
Subtotal: 0.384
Exactly two solved:
SSU = (.2)(.2)(.8) = 0.032
SUS = (.2)(.8)(.2) = 0.032
USS = (.8)(.2)(.2) = 0.032
Subtotal: 0.096
All three solved:
SSS = (.2)(.2)(.2) = 0.008
Summing all of these we get 0.384 + 0.096 + 0.008 = 0.488
An easier way would have been to use the complement! The complement to “at
least one” being solved would have been “less than one” being solved meaning “zero
or none are solved” --- all three crimes go unsolved.
P(S = 0) = UUU = 0.8*0.8*0.8 = 0.512
P(S >= 1) = 1 – P(S < 1) = 1 – P(S = 0) = 1 – 0.512 = 0.488
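Both approaches, listing every sequence and using the complement, can be reproduced with a short Python sketch. The names used here (`prob`, `at_least_one`) are assumptions for illustration.

```python
from itertools import product

# Probability of each single-crime outcome: S = solved, U = unsolved.
prob = {"S": 0.2, "U": 0.8}

# Long way: enumerate all 2^3 sequences and add up those with at least one S.
at_least_one = 0.0
for seq in product("SU", repeat=3):
    seq_prob = prob[seq[0]] * prob[seq[1]] * prob[seq[2]]
    if "S" in seq:
        at_least_one += seq_prob

# Short way: complement of "all three unsolved".
via_complement = 1 - prob["U"] ** 3

print(round(at_least_one, 3), round(via_complement, 3))  # 0.488 0.488
```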
To help we have a formula and Minitab:
For an exact probability of an outcome we can use:
P(X = x) = [n! / (x!(n – x)!)] * p^x * (1 – p)^(n – x)
where ‘n’ is the number of trials, ‘x’ is the number of successes, and ‘p’ is the
probability of success on each trial.
Looking at the sample space for solving exactly one crime, we have n = 3, x = 1, p = 0.2:
P(X = 1) = [3! / (1!(3 – 1)!)] * (0.2)^1 * (1 – 0.2)^(3 – 1) = (3)*(0.2)*(0.8)^2 = 0.384
Note that (0.2)*(0.8)^2 is the pattern in each of the sequences where one of the three
crimes is solved, and there are 3 such sequences.
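The binomial formula can also be written as a small Python function. This is a sketch rather than a replacement for Minitab; the function name `binomial_pmf` is hypothetical, and `math.comb` (available in Python 3.8 and later) supplies the n!/(x!(n – x)!) term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly one of three crimes solved, with p = 0.2.
print(round(binomial_pmf(1, 3, 0.2), 3))  # 0.384
```

Calling the function for x = 0, 1, 2, 3 gives the full distribution for the three crimes, and those four values should sum to one.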
For a cumulative probability we can either find the exact probabilities for each
outcome and then add them or, more easily, use Minitab.
In Minitab, go to Calc > Probability Distributions > Binomial. Then select:
Probability if the question calls for P(X = x), e.g. the probability of exactly one
crime being solved.
Cumulative Probability if the question calls for P(X <= x), e.g. for the probability of
more than one crime being solved, we’d find P(X <= 1) and then subtract it from 1.
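For readers without Minitab, the "add up the exact probabilities" route can be sketched in Python. The function names `binomial_pmf` and `binomial_cdf` are assumptions for this example and are not Minitab commands.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the exact probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Three property crimes, each solved with probability 0.2.
at_most_one = binomial_cdf(1, 3, 0.2)  # P(X <= 1)
more_than_one = 1 - at_most_one        # P(X > 1)

print(round(at_most_one, 3), round(more_than_one, 3))  # 0.896 0.104
```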
2. What is the expected value (i.e. average or mean) of the number of these three crimes
that will be solved?
For a binomial experiment, E(X) = np. For our crime example, the average number of
the 3 crimes that would be solved is 3*0.2 = 0.6, or less than one.
SD(X) = √(np(1 – p)) = √(3*0.2*0.8) = √0.48 = 0.69
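The binomial mean and SD shortcuts can be checked against the general expected-value formula. This is a sketch under the scenario's values (n = 3, p = 0.2); the variable names are illustrative.

```python
import math
from math import comb

n, p = 3, 0.2

# Shortcut formulas for a binomial experiment.
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Cross-check the mean with the general rule E(X) = sum of x * P(X = x).
mean_from_table = sum(
    x * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)
)

print(round(mean, 2), round(mean_from_table, 2), round(sd, 2))  # 0.6 0.6 0.69
```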