1. The first step in solving this problem is to express the numbers you are given in the notation
we use for these kinds of problems.
In this case, we are told the probability of the disease D is 1/500, so $P(D) = 1/500$. We are also
informed that the probability the test comes back positive (+) if the patient has the disease is 80%,
so $P(+ \mid D) = 0.80$. Lastly, we are told that the probability that the test comes back negative (−) if
the patient does not have the disease is 70%, so $P(- \mid D^c) = 0.70$.
(a) To calculate the probability the test comes back negative, we must take into account the
probability the test is negative if the patient does not have the disease, as well as the probability
that the test is negative even if the subject has the disease. We need to use the law of total
probability for this:
$$P(-) = P(- \mid D)\,P(D) + P(- \mid D^c)\,P(D^c).$$
We are given all components of this equation except for $P(- \mid D)$ and $P(D^c)$. However,
$P(- \mid D) = 1 - P(+ \mid D) = 1 - 0.80 = 0.20$, and $P(D^c) = 1 - P(D) = 499/500$. Using the
equation above, we get that
$$P(-) = 0.20 \cdot \frac{1}{500} + 0.70 \cdot \frac{499}{500} = 0.699.$$
**NOTE: While it is true that $P(A \mid B) = 1 - P(A^c \mid B)$, in general $P(A \mid B) \neq P(A \mid B^c)$.
Be very careful about this!
(b) This is a conditional probability, because we already know that the test is negative, but we want
to know how likely it is that the subject actually has the disease. By Bayes' rule,
$$P(D \mid -) = \frac{P(- \mid D)\,P(D)}{P(-)} = \frac{0.20 \cdot 1/500}{0.699} = 0.00057.$$
(c)
$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{1 - P(-)} = \frac{0.8 \cdot 1/500}{0.301} \approx 0.005.$$
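The arithmetic in parts (a) through (c) can be checked with a few lines of R. This is only a minimal sketch: the variable names (`p_D`, `p_pos_given_D`, and so on) are chosen here for illustration and are not part of the problem statement.

```r
# Quantities given in the problem statement
p_D            <- 1 / 500  # P(D): probability of the disease
p_pos_given_D  <- 0.80     # P(+ | D)
p_neg_given_Dc <- 0.70     # P(- | D^c)

# Complement rule, conditioning on the same event D
p_neg_given_D <- 1 - p_pos_given_D

# (a) Law of total probability
p_neg <- p_neg_given_D * p_D + p_neg_given_Dc * (1 - p_D)

# (b) and (c) Bayes' rule
p_D_given_neg <- p_neg_given_D * p_D / p_neg
p_D_given_pos <- p_pos_given_D * p_D / (1 - p_neg)

round(c(p_neg, p_D_given_neg, p_D_given_pos), 5)
```

Carrying one more digit in part (c) gives roughly 0.0053, which is about nine times the probability found in part (b).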
(d) There are many ways we could assess the accuracy of this test, but one thing to consider is what
the worst error might be. In the case of disease testing, usually the worst case scenario is that
someone who has a negative result actually has the disease. In this problem, the probability of
actually having the disease even though the test is negative is much smaller than the probability
of having the disease even though the test is positive.
2. (a) If events J and K are independent, then J' and K are also independent, and J' and K' are also
independent. For example, let J be the event of rolling a 6 on a die, and K the event of drawing a
Queen from a pack of cards. These two events are independent. In addition, J' = not rolling a 6 is
independent of drawing a Queen, and is also independent of not drawing a Queen (K').
(b) If events A, B and C are mutually disjoint, A', B', and C' are not necessarily mutually disjoint,
and the events are definitely not independent. If we are going to draw one card from a deck,
and A = drawing a Jack, B = drawing a Queen, and C = drawing a King, these events are
mutually disjoint, because only one of these events can happen. However, A' = not drawing a
Jack, B' = not drawing a Queen, and C' = not drawing a King. These complementary events
are not mutually disjoint (drawing an Ace, for example, belongs to all three), and are not
independent. In fact, mutually disjoint events with nonzero probability are never independent,
because knowing whether one event has occurred changes the probability that the other event
will occur.
(c) If E is independent of F, and F is independent of G, E is not necessarily independent of G. For
example, let's say that E is the event that a Queen is drawn from a deck of cards, G is the event
that a King is drawn from the same deck of cards, and F is the event that a 6 is rolled on a die.
E is independent of F, and G is independent of F, but E and G are not independent, and are in
fact mutually exclusive (and therefore dependent).
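The three claims in problem 2 can be checked by brute-force enumeration. The sketch below assumes a single card is drawn from a standard 52-card deck and a single fair die is rolled; the helper `is_independent()` and the variable names are hypothetical, introduced only for this illustration.

```r
# Sample space: one die roll paired with one card drawn from a 52-card deck
rank  <- rep(c("A", 2:10, "J", "Q", "K"), times = 4)
space <- expand.grid(die = 1:6, card = rank, stringsAsFactors = FALSE)

# Definition of independence: P(A and B) = P(A) P(B)
is_independent <- function(A, B) {
  isTRUE(all.equal(mean(A & B), mean(A) * mean(B)))
}

six   <- space$die == 6       # J in part (a), F in part (c)
jack  <- space$card == "J"    # A in part (b)
queen <- space$card == "Q"    # K in part (a), B in part (b), E in part (c)
king  <- space$card == "K"    # C in part (b), G in part (c)

# (a) Complements of independent events stay independent
part_a <- c(is_independent(six, queen),
            is_independent(!six, queen),
            is_independent(!six, !queen))

# (b) Jack, Queen, King are disjoint, but their complements overlap
disjoint      <- !any(jack & queen) && !any(jack & king) && !any(queen & king)
overlap_cards <- sum(!jack & !queen & !king) / 6  # cards in all three complements

# (c) Queen and King are each independent of the die, but not of each other
part_c <- c(is_independent(queen, six),
            is_independent(six, king),
            is_independent(queen, king))
```

Here `part_a` is all `TRUE`, `overlap_cards` counts the 40 cards that are neither Jack, Queen, nor King, and the last entry of `part_c` is `FALSE` because P(Queen and King) = 0 while P(Queen) P(King) = 1/169.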
3. Three fair dice are rolled together. Let X be the random variable denoting the sum of all three top
faces after they’re rolled.
(a) To figure out the pmf, multiple steps must be taken:
i. Determine the possible ways each sum can be made.
ii. Find the number of combinations for each of those possibilities.
iii. Find the probability based on the total number of combinations for each sum.
For step (i), a sum of '3' can only be rolled in 1 way: with three 1's. However, a sum of
'7' can be made in 4 ways: {1,1,5; 1,2,4; 1,3,3; 2,2,3}. In step (ii), the task is to find the
number of ordered rolls for each of these possibilities. For the sum equal to '3', there is only one
way this can happen. However, for the sum equal to '7', the possible methods for getting
the sum have different totals. For example, the number of ways 1,1,5 can be rolled
is equal to 3. This can be calculated by enumerating the possibilities ({1,1,5; 1,5,1; 5,1,1}), or
by using permutations, and calculating 3!/2!. (The 2! in the denominator accounts for the two 1's,
which are not distinguishable.) The same count applies to 1,3,3 and 2,2,3. The number of ways
that 1,2,4 can be rolled is equal to 3!. This means that the total number of ways that a sum of
'7' can be rolled is equal to 3*3!/2! + 3! = 15. The last step (iii), calculating the probability, is
done by taking the total number of possibilities for that specific outcome, and dividing it by the
total number of possibilities in the sample space. In this problem, the number of items in the
sample space is the number of ways three dice can be rolled (independently), which is equal to
6^3 = 216. So the probability of rolling a sum of '3' is equal to 1/6^3, and the probability of
rolling a sum equal to '7' is 15/6^3. The cumulative distribution function is calculated by adding
each probability to the running total of the probabilities before it. The pmf and cdf can be
observed in the table below.
 x     p(x)       F(x)
 3     1/6^3      1/6^3
 4     3/6^3      4/6^3
 5     6/6^3      10/6^3
 6     10/6^3     20/6^3
 7     15/6^3     35/6^3
 8     21/6^3     56/6^3
 9     25/6^3     81/6^3
 10    27/6^3     108/6^3
 11    27/6^3     135/6^3
 12    25/6^3     160/6^3
 13    21/6^3     181/6^3
 14    15/6^3     196/6^3
 15    10/6^3     206/6^3
 16    6/6^3      212/6^3
 17    3/6^3      215/6^3
 18    1/6^3      216/6^3 = 1
Table 1: Table of x, pmf, and cdf for problem 3.
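The counts in Table 1 can also be produced by listing all 6^3 outcomes rather than counting permutations by hand. The following is a sketch of that alternative approach, not the method used in the solution; the names `rolls`, `counts`, `pmf`, and `cdf` are illustrative.

```r
# All 6^3 = 216 equally likely outcomes of three fair dice
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)
sums  <- rowSums(rolls)

counts <- table(sums)                # number of outcomes giving each sum
pmf    <- as.numeric(counts) / 6^3   # p(x) for x = 3, ..., 18
cdf    <- cumsum(pmf)                # F(x) = P(X <= x)

data.frame(x    = as.integer(names(counts)),
           ways = as.integer(counts),
           p    = pmf,
           Fx   = cdf)
```

The `ways` column should match the numerators of p(x) in the table, with 15 ways for a sum of 7.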
(b) To calculate the expected value of X, we need to use the equation:
$$EX = \sum_{x=3}^{18} x\,p(x) = 10.5.$$
In this problem, the x in the equation represents the sum of the roll, and p(x) represents
the probability of rolling that sum. This can be done by hand, or if your table is entered in
Excel, you can multiply the probability column by the sum column and put the answer in a third
column, and then sum the third column. Similarly, you can do this in R by making two vectors,
multiplying them together, and adding them up:
> sumvector = c(3, 4, 5, 6, ....., 18)
> probvector = c(1/6^3, 3/6^3, 6/6^3, 10/6^3, ...., 1/6^3)
> EX = sum(sumvector*probvector)
> EX
[1] 10.5
The probvector can also be used to calculate the cdf using cumsum(), which provides a cumulative
sum of any vector of numbers:
> cdf = cumsum(probvector)
What does EX mean in this problem? It means that, on average, the sum of three independently
rolled dice will equal 10.5.
(c) To calculate the variance of X, we need to use the equation:
$$\mathrm{Var}(X) = \sum_{x=3}^{18} (x - EX)^2\,p(x) = \sum_{x=3}^{18} x^2\,p(x) - (EX)^2 = 8.75.$$
Either equation is fine, but since you have already calculated EX, it is probably easier to use
the right-most equation. This can be done in R:
> VarXshort = sum(sumvector^2*probvector)-EX^2
> VarXshort
[1] 8.75
> VarXlong = sum((sumvector-EX)^2*probvector)
> VarXlong
[1] 8.75
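The vectors in the R session for part (b) are abbreviated with dots. A version that can be pasted into R as is might look like the sketch below. It assumes the abbreviated entries are exactly the x and p(x) columns of Table 1; the helper vector `ways` is a hypothetical name for the numerators.

```r
# Assumption: the abbreviated vectors hold the x and p(x) columns of Table 1
sumvector  <- 3:18
ways       <- c(1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1)
probvector <- ways / 6^3

EX   <- sum(sumvector * probvector)            # expected value
VarX <- sum(sumvector^2 * probvector) - EX^2   # variance, shortcut formula

# Cross-check: X is a sum of three independent fair dice, each with
# mean 3.5 and variance 35/12, so EX = 3 * 3.5 and Var(X) = 3 * 35/12
c(EX = EX, VarX = VarX, EX_check = 3 * 3.5, VarX_check = 3 * 35 / 12)
```

The cross-check relies on the fact that means always add, and variances add for independent random variables.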
4. This problem needs to be done in two stages. In the first stage, we need to find the probability that
the machine shuts down (i.e. 6 or more rings fail). In the second part, we need to find the probability
that this shut down will happen within the first 10 uses.
Part 1: Let X be a random variable that denotes the number of ring failures in the machine. The
probability that each ring fails is 0.02. Because there are 12 rings in this machine, and the rings
operate independently of one another, $X \sim B(12, 0.02)$. Since we have determined this is binomial, we
then need to calculate:
$$p_{MF} = P(\text{machine fails}) = P(X \ge 6) = \sum_{x=6}^{12} \binom{12}{x}\, 0.02^x\, (0.98)^{12-x} = 5.33 \times 10^{-8}.$$
There are many quick ways to calculate this probability using software. This can be done in R with
the following code:
> pMF = sum(dbinom(6:12, size = 12, prob = .02))
> pMF
[1] 5.331333e-08
In the statement above, dbinom(6, size = 12, prob = .02) calculates P(X = 6) for the binomial
distribution, 6:12 are the numbers 6 through 12 (so we are calculating P(X = x) for all x = 6 to 12),
"size" denotes the n in our equation, and "prob" denotes the probability that a single ring fails.
So we are summing all of the probabilities for x = 6 to 12.
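As a cross-check, the same tail probability can be read off the binomial cdf instead of summing the pmf. This is a sketch using `pbinom()` from base R; the names `n`, `p`, `pMF_sum`, and `pMF_tail` are illustrative.

```r
n <- 12     # rings in the machine
p <- 0.02   # probability that a single ring fails

# Direct sum of the pmf over x = 6, ..., 12
pMF_sum <- sum(dbinom(6:n, size = n, prob = p))

# Upper tail from the cdf: P(X >= 6) = P(X > 5)
pMF_tail <- pbinom(5, size = n, prob = p, lower.tail = FALSE)

all.equal(pMF_sum, pMF_tail)
```

Note that the cutoff passed to `pbinom()` is 5, not 6, because `lower.tail = FALSE` returns P(X > q).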
Part 2: Now that we know the probability that the machine may fail, we need to know the probability
that this happens in the first 10 times we use the machine. This means it might happen on the first
use, or it might happen the second use, or the third use, etc. If we let Y denote a random variable for
the first use during which the machine fails, Y has a Geometric distribution with probability $p_{MF}$ and
we are interested in:
$$P(Y \le 10) = \sum_{y=1}^{10} p_{MF}\,(1 - p_{MF})^{y-1} = 5.33 \times 10^{-7}.$$
This can be done in R in a method similar to part 1:
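> # dgeom() counts the uses that pass without a shutdown before the first one,
> # so y - 1 in the formula runs over 0:9
> sum(dgeom(0:9, prob = pMF))   # approximately 5.33e-07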
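The geometric sum can be cross-checked in two other ways: through the complement (the machine survives all 10 uses) and through the geometric cdf. The sketch below recomputes `pMF` so that it runs on its own; the other names are illustrative. One caveat is that R's geometric functions count the number of trials before the first "success" (here, a shutdown), so the event Y ≤ 10 corresponds to at most 9 such trials.

```r
# Per-use shutdown probability from Part 1
pMF <- pbinom(5, size = 12, prob = 0.02, lower.tail = FALSE)

# Geometric pmf summed by hand: first shutdown on use y = 1, ..., 10
p_sum <- sum(pMF * (1 - pMF)^(0:9))

# Complement: no shutdown in any of the 10 uses has probability (1 - pMF)^10
p_complement <- 1 - (1 - pMF)^10

# pgeom(q) is P(at most q uses pass before the first shutdown), so Y <= 10 means q = 9
p_pgeom <- pgeom(9, prob = pMF)

c(p_sum, p_complement, p_pgeom)
```

All three values agree at about 5.33e-07, which is very close to 10 times `pMF` because `pMF` is so small.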
5. (a) This is the Poisson family of distributions, which is typically used for counting events over a
specified period of time.
(b) This is a Poisson distribution with an average (µ) equal to 30.
(c) If X is a random variable denoting the number of cars that cross the bridge in a day, then X
follows a Poisson distribution with µ = 30, such that
$$P(X = x) = \exp\{-30\}\,\frac{30^x}{x!}.$$
To find the probability that fewer than 10 cars cross the bridge in a day, we calculate:
$$P(X < 10) = P(X \le 9) = \sum_{x=0}^{9} \exp\{-30\}\,\frac{30^x}{x!} = 7.12 \times 10^{-6}.$$
> sum(dpois(0:9, lambda = 30))
[1] 7.121751e-06
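The Poisson tail can be verified against the cdf and against the formula written out term by term. This is a small sketch; the names `lambda`, `p_sum`, `p_cdf`, and `p_hand` are illustrative.

```r
lambda <- 30   # average number of cars per day

p_sum  <- sum(dpois(0:9, lambda = lambda))                   # sum of the pmf, x = 0, ..., 9
p_cdf  <- ppois(9, lambda = lambda)                          # cdf: P(X <= 9)
p_hand <- sum(exp(-lambda) * lambda^(0:9) / factorial(0:9))  # formula written out

c(p_sum, p_cdf, p_hand)
```

The cutoff is 9 rather than 10 because "fewer than 10" excludes the value 10 itself.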