MGMT 201: Statistics
Introduction to Probability (ASW Chapter 4)
 What is probability?
 Probability is a numerical representation of the likelihood that something will occur.
 We typically specify the probability space so that the probability lies between zero and one
(inclusive).
 A probability of zero indicates that the event is impossible.
 A probability of one indicates that the event is certain.
 A probability of, say, 0.5 indicates that if we could repeat the circumstance over and over
again, the event would occur half the time.
 Note that zero probability events can occur! In fact, they occur all the time. For example,
what is the probability that the temperature outside will be exactly 46.181284706520961520975
degrees tomorrow at 1:00 PM? When we consider that we could measure the temperature out to an
infinite number of decimals, it becomes clear that the probability that any given temperature
will occur is zero. Yet tomorrow at 1:00 PM some temperature will occur. This runs against
our intuition, but it is not a mathematical contradiction.
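The long-run reading of a probability of 0.5 (the event occurs about half the time over many repetitions) can be illustrated with a small simulation. This is only a sketch: the fixed seed, the trial counts, and the helper name `relative_frequency` are assumptions made for illustration and do not come from the notes.

```python
import random

def relative_frequency(trials, p=0.5, seed=0):
    """Fraction of `trials` in which an event with probability p occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials

# The observed fraction tends to settle near p as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the fraction can be far from 0.5; with many trials it is typically close to it.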
 Defining a Probability Space
 A probability space is simply a description of all possible events along with the probabilities of
their occurrence.
 example: Rolling a die
   Event    Probability
   1        1/6
   2        1/6
   3        1/6
   4        1/6
   5        1/6
   6        1/6
 When conducting an experiment, we refer to the set of all possible events as the sample space. A
particular event is called a sample point. Above, the set {1,2,3,4,5,6} is the sample space and a
roll of 4, for example, is a sample point.
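A probability space like the die table can be written down directly as a mapping from outcomes to probabilities. The sketch below assumes a fair die and uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

# Probability space for one roll of a fair die: outcome -> probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

sample_space = set(die)      # the set of all possible outcomes
total = sum(die.values())    # the probabilities must sum to 1

print(sample_space)
print(total)
```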
 A Simple Case: Counting the Outcomes
 When events are equally likely to occur, counting is very useful. In fact, we can simply count the
number of ways an event can occur and divide by the total number of ways that all events can
occur. This will be the probability.
 example: What is the probability of drawing two consecutive aces from a deck of cards?
 # of ways to draw two aces:
   1   Spades, Hearts
   2   Spades, Diamonds
   3   Spades, Clubs
   4   Hearts, Diamonds
   5   Hearts, Clubs
   6   Diamonds, Clubs
 # of ways to draw two cards:
 There are 52 ways to draw the first card and 51 ways to draw the second card given that
we have drawn the first card (picture a table with the first draw on one axis and the
second on the other). But half of these 52×51 = 2652 ways are duplicates. One way, for
example, would involve drawing the 3 of spades followed by the 4 of diamonds. A second
way is to draw them in the reverse order.
 Total ways = 52×51/2 = 1326.
 Probability of drawing two aces = 6/1326 = 0.004525. Said differently, roughly 4.5 out
of every thousand two-card draws will result in two aces.
In general, the # of ways to choose n objects from a set of N objects is
$$C_n^N = \frac{N!}{n!\,(N-n)!}$$
 The symbol ‘!’ represents the factorial: k! = k×(k−1)×(k−2)×…×3×2×1.
 By definition, 0! = 1.
 In the previous example, the number of ways to draw two cards = 52!/(2!50!) = 1326.
 We would typically use the phrase “52 choose 2 is 1326” to describe the calculation.
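The two-aces count and the “N choose n” formula can be cross-checked by brute force. The sketch below assumes a standard 52-card deck; the helper `choose` is a hypothetical name for the factorial formula, and Python's built-in `math.comb` (Python 3.8+) computes the same quantity.

```python
from itertools import combinations
from math import comb, factorial

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Enumerate every unordered two-card draw and keep the ones with two aces.
draws = list(combinations(deck, 2))
two_aces = [d for d in draws if d[0][0] == "A" and d[1][0] == "A"]

def choose(N, n):
    """N choose n via the factorial formula N! / (n! (N - n)!)."""
    return factorial(N) // (factorial(n) * factorial(N - n))

print(len(two_aces), len(draws))      # counts found by enumeration
print(choose(4, 2), choose(52, 2))    # the same counts from the formula
print(len(two_aces) / len(draws))     # probability of drawing two aces
```

Enumeration is only practical for small problems; the formula gives the same count without listing every draw.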
In other cases, tree diagrams are useful in counting.
 example: Suppose we are interested in developing a new product. We begin with a
development phase in which the cost of producing the product is determined. Based on
that cost, we must decide whether to market the product to everyone, to targeted markets, or
to license the technology. The consumer response to the product will then determine our
profits. Suppose we provide general classifications as follows:
 cost ∈ {high, low}
 response ∈ {strong, mediocre, weak}
 Event Tree (cost, then marketing decision, then consumer response):
   high
      everyone:   strong, mediocre, weak
      targeted:   strong, mediocre, weak
      license:    strong, mediocre, weak
   low
      everyone:   strong, mediocre, weak
      targeted:   strong, mediocre, weak
      license:    strong, mediocre, weak
We could then assign probabilities to each of the branches along with profits to the final
outcomes. This would allow us to make the appropriate decision.
How many outcomes are possible? Counting, we see that there are 18. The simple rule
is to multiply the number of branches at each stage. We begin with two branches (cost).
Each of them can branch three ways (marketing), and each of those can in turn branch
three ways (response). So, 2×3×3 = 18.
The important point, here, is for us to be logical in our approach to situations. Notice that
we have complete control over the marketing decision, but not the other two. Suppose…
 …there is a 40% probability of the product being a high cost item and a 60%
probability of low cost.
 …there is a 30% chance of a strong consumer response, a 40% chance of a mediocre
response, and a 30% chance of a weak response.
 …the profit projections are as follows:
   Profits
   Cost    Marketing    Strong    Mediocre    Weak
   high    everyone     $0.6M     -$0.1M      -$0.5M
   high    targeted     $1.1M     $0.2M       -$0.1M
   high    license      $0.3M     $0.2M       $0.1M
   low     everyone     $0.9M     $0.3M       -$0.1M
   low     targeted     $1.2M     $0.3M       $0.0M
   low     license      $0.5M     $0.4M       $0.2M
If we must commit to our marketing strategy today, what should we do?
One approach is to calculate the expected profits from a given action. We will
consider this problem in chapter 5.
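As a preview of the expected-profit approach, the sketch below weights each profit by the probability of its branch. It rests on assumptions that the notes do not state: that the consumer response is independent of both cost and the marketing choice, and that the profit figures attach to the tree branches in the order they appear (high cost first, then low cost; everyone, targeted, license within each). The function name `expected_profit` is hypothetical.

```python
# Probabilities given in the example.
p_cost = {"high": 0.4, "low": 0.6}
p_response = {"strong": 0.3, "mediocre": 0.4, "weak": 0.3}

# Profits in $M, keyed by (cost, marketing) and then by response.
# ASSUMPTION: figures are matched to branches in the order listed in the notes.
profits = {
    ("high", "everyone"): {"strong": 0.6, "mediocre": -0.1, "weak": -0.5},
    ("high", "targeted"): {"strong": 1.1, "mediocre": 0.2, "weak": -0.1},
    ("high", "license"):  {"strong": 0.3, "mediocre": 0.2, "weak": 0.1},
    ("low", "everyone"):  {"strong": 0.9, "mediocre": 0.3, "weak": -0.1},
    ("low", "targeted"):  {"strong": 1.2, "mediocre": 0.3, "weak": 0.0},
    ("low", "license"):   {"strong": 0.5, "mediocre": 0.4, "weak": 0.2},
}

def expected_profit(action):
    """Probability-weighted profit of one marketing action.

    ASSUMPTION: cost and response are independent, so each branch has
    probability P(cost) * P(response).
    """
    return sum(
        p_cost[c] * p_response[r] * profits[(c, action)][r]
        for c in p_cost
        for r in p_response
    )

for action in ("everyone", "targeted", "license"):
    print(action, round(expected_profit(action), 3))
```

Under these assumptions the targeted strategy has the largest expected profit, but a different branch mapping or dependence between cost and response would change the numbers.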
 Probabilities
 At this point, we want to be more precise in our definitions of a sample point and of an event.
 sample point ≡ a distinct individual outcome.
 An implication of the definition is that a sample point cannot be subdivided.
 For example, consider the case where we draw an ace from a deck of cards. “Ace” is not
typically a sample point because there are four aces in the deck. The “ace of spades” is a
sample point because it cannot be broken down any further.
 event ≡ a set of sample points.
 An event may consist of any number of sample points, including one. Drawing an ace is
an event, as is drawing the ace of spades.
 Another distinction, here, is that nature determines the sample points, so to speak, while we
choose events based on what we are interested in examining.
 Because sample points are distinct (i.e., non-overlapping), the probability of any event is
equal to the sum of the probabilities of the sample points in the event.
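The rule that an event's probability is the sum over its sample points can be sketched with the card example. The code assumes a single draw from a well-shuffled 52-card deck, so each card is a sample point with probability 1/52; `event_probability` is a hypothetical helper name.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]

# Each card is one sample point; all 52 are equally likely.
prob = {(rank, suit): Fraction(1, 52) for rank in ranks for suit in suits}

def event_probability(event):
    """Sum the probabilities of the sample points that make up the event."""
    return sum(prob[point] for point in event)

ace = {point for point in prob if point[0] == "A"}   # an event with four sample points
ace_of_spades = {("A", "spades")}                    # an event with one sample point

print(event_probability(ace), event_probability(ace_of_spades))
```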
 Determining the Probability Space
 Classical Method: When each outcome is equally likely, the probability of getting 1 of the n
possible outcomes is 1/n. (e.g., roll of die)
 Relative Frequency Method: When the outcome likelihoods are unknown, we can use
sample data (assuming we can get it) to estimate the probabilities. We simply create a
relative frequency distribution and use the values given there.
 Subjective Method: When all else fails, we can take an educated guess.
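The relative frequency method amounts to counting how often each outcome appears in the sample and dividing by the sample size. The data below are hypothetical (20 made-up sales calls, not from the notes) and serve only to show the mechanics.

```python
from collections import Counter

# HYPOTHETICAL sample data: the recorded result of 20 sales calls.
observations = ["sale"] * 6 + ["no sale"] * 14

counts = Counter(observations)
n = len(observations)

# Relative frequency distribution: outcome -> estimated probability.
relative_freq = {outcome: count / n for outcome, count in counts.items()}

print(relative_freq)
```

The estimates are only as good as the sample; a larger or more representative sample would generally give more reliable values.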
Rules
 P(A) + P(A^c) = 1, where A^c is the complement of A. That is, A^c includes all of the sample
points that are not in A.
 The addition law: P(A∪B) = P(A) + P(B) – P(A∩B), where ∪ denotes “union” and ∩
denotes “intersection”.
 A∪B is the set of all sample points that are in A or B (or both).
 A∩B is the set of all sample points that are in both A and B.
 e.g. Suppose we consider the roll of a die and define A = {1,2,3} and B={2,3,4}.
 P(A) = 1/6 + 1/6 + 1/6 = ½.
 P(B) = 1/6 + 1/6 + 1/6 = ½.
 We are tempted to say that the probability of A or B is ½ + ½ =1. If we do this, we
are double-counting the probabilities of “2” and “3”. So, we must subtract them to
obtain 1 – 1/6 – 1/6 = 4/6 = 2/3. That is the logic behind the addition law.
 Mathematicians often use Venn diagrams to depict situations. The diagrams are quite useful
for complex situations.
 Consider the last example: We draw a rectangle to depict the sample space and circles
(or other shapes if need be) to depict events. In this case, we draw the circles so that they
overlap. This indicates that A∩B ≠ ∅. Here, ∅ is the empty set. When we say that the
intersection of A and B is not equal to the empty set, we mean that there are sample
points in A that are also in B.
[Venn diagram: circles A and B overlapping inside a rectangle]
 The overlapping area depicts A∩B. The area of any region depicts the probability of
something in that region occurring.
 If we have mutually exclusive events, A∩B = ∅ and the Venn diagram is drawn as
follows:
[Venn diagram: circles A and B inside a rectangle, not overlapping]
 Since A and B do not overlap, we know that A∩B = ∅.
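The complement rule and the addition law can be verified on the die example with Python sets, where `|` is union and `&` is intersection. This sketch assumes a fair die; the event C = {5, 6} is a hypothetical addition used to show the mutually exclusive case.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event on a fair six-sided die."""
    return Fraction(len(event), len(sample_space))

A = {1, 2, 3}
B = {2, 3, 4}
C = {5, 6}   # hypothetical event, mutually exclusive with A

# Complement rule: P(A) + P(A^c) = 1.
print(P(A) + P(sample_space - A))

# Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
print(P(A | B), P(A) + P(B) - P(A & B))

# Mutually exclusive events: the intersection is empty, so nothing is subtracted.
print(A & C == set(), P(A | C), P(A) + P(C))
```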
Conditional Probabilities
 We are interested in estimating the probability that one event will occur given that
another has occurred. For example, on election night, we probably all wondered how
likely it was that Bush would win given that Pennsylvania had been won by Gore.
Mathematically, we would express this as P(A|B) and say “the probability of A given
B”. Here, A={Bush wins election} and B={Gore wins Pennsylvania}.
 Graphically, once we know that B has occurred, we can eliminate the portion of A that
does not intersect with B.
[Venn diagram: overlapping circles A and B, reduced to circle B with the region A∩B marked inside it]
 Now that we have eliminated the portion of A that is not in B, we see that the probability
of getting A given that B occurs is P(A∩B)/P(B).
 This gives us the rule for conditional probabilities: P(A|B) = P(A∩B)/P(B) or P(A∩B)
= P(B)P(A|B). This is the multiplication rule for conditional probabilities.
 This makes sense intuitively. Consider our die example above and suppose I told you
that B has occurred. That is, either a 2, 3, or 4 was rolled. What is the probability that A
occurred? For A to have occurred, either a 2 or a 3 must have been rolled, so the
probability that A occurred must be 2/3 (two out of three chances). Using our notation
above, P(A∩B) = 1/3 and P(B) = 1/2, so P(A∩B)/P(B) = 2/3.
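The die calculation for the conditional probability can be reproduced directly from the rule P(A|B) = P(A∩B)/P(B). This sketch assumes a fair die; `conditional` is a hypothetical helper name.

```python
from fractions import Fraction

def P(event):
    """Probability of an event on a fair six-sided die."""
    return Fraction(len(event), 6)

def conditional(A, B):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    return P(A & B) / P(B)

A = {1, 2, 3}
B = {2, 3, 4}

print(P(A & B), P(B), conditional(A, B))
# Multiplication rule: P(A ∩ B) = P(B) P(A | B).
print(P(B) * conditional(A, B) == P(A & B))
```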
Conditional Probabilities for Independent Events
 Now, suppose that A and B are independent. This specifically means that learning
that B has occurred tells us nothing about the likelihood that A will occur.
 Said differently, P(A|B) = P(A) for independent events.
 Note that this does not mean that A∩B = ∅. In fact, A and B must overlap if they are
independent and each has positive probability. The area of the overlap is precisely the
amount needed so that P(A∩B)/P(B) = P(A).
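Independence can be tested on die events by checking whether P(A∩B) = P(A)P(B), which is the same condition as P(A|B) = P(A) whenever P(B) > 0. The events `even` and `at_most_four` below are hypothetical examples chosen for illustration; the last line reuses A = {1,2,3} and B = {2,3,4} from the earlier die example.

```python
from fractions import Fraction

def P(event):
    """Probability of an event on a fair six-sided die."""
    return Fraction(len(event), 6)

def independent(A, B):
    """True when P(A ∩ B) equals P(A) * P(B)."""
    return P(A & B) == P(A) * P(B)

even = {2, 4, 6}               # hypothetical event: the roll is even
at_most_four = {1, 2, 3, 4}    # hypothetical event: the roll is 4 or less

# These overlap in {2, 4}, and the overlap is exactly the size independence needs.
print(independent(even, at_most_four))

# The earlier events overlap too, but knowing B changes the probability of A.
print(independent({1, 2, 3}, {2, 3, 4}))
```

Overlap is therefore necessary here but not sufficient: the second pair of events overlaps and is still dependent.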
Bayes’ Rule (Theorem)
 Above, we conditioned on another event occurring. In another context, we might want to
calculate a posterior probability based on a prior probability belief.
 The basic idea is the following. Suppose there are two ways for something to occur.
You believe there is a 10% chance that the first way will occur and a 30% chance that the
second will occur. If that something does happen, what is the probability that it occurred
the second way? It seems reasonable that the probability would be 75% (i.e.,
30/(10+30)). This is the basis for Bayes’ Rule.
 How would this be depicted with Venn diagrams?
 Said differently, if there are two possible “ways” that B can occur, P(A1|B) =
P(A1∩B)/(P(A1∩B) + P(A2∩B)).
 From our multiplication rule for conditional probabilities, we know that P(A1∩B) =
P(A1)P(B|A1) and P(A2∩B) = P(A2)P(B|A2). Substituting gives
$$P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)}$$
 This is Bayes’ Rule for two events.
Bayes’ Rule for more than two events? Suppose that there are three ways that something
can occur. The first way occurs with probability 10%, the second with probability 20%,
and the third with probability 30%. Given that the “something” has occurred, what is the
probability that it was the third way? Again, it seems reasonable that it is
30%/(10%+20%+30%) = 0.5. This is the logic behind the general Bayes’ Rule:
PAi B  



P A1 PB A1 
P Ai  B 
P A1  B   ...  P An  B 
P Ai PB Ai 
P A1 PB A1  ...  P An PB An 
Here, n is the number of possible ways that an event can occur.
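The general rule can be sketched as a pair of small functions. The names `posterior_from_joints` and `bayes_posterior` are hypothetical, and the sketch assumes the listed “ways” are the only ways the event can occur, so the joint probabilities sum to P(B).

```python
from fractions import Fraction

def posterior_from_joints(joints):
    """Given P(A_i ∩ B) for each i, return P(A_i | B) for each i."""
    total = sum(joints)   # P(B), assuming the A_i cover every way B can occur
    return [j / total for j in joints]

def bayes_posterior(priors, likelihoods):
    """Given P(A_i) and P(B | A_i) for each i, return P(A_i | B) for each i."""
    joints = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i) P(B | A_i)
    return posterior_from_joints(joints)

# Three ways with probabilities 10%, 20%, and 30%, as in the example above.
three_ways = [Fraction(10, 100), Fraction(20, 100), Fraction(30, 100)]
print(posterior_from_joints(three_ways))

# Two ways with probabilities 10% and 30%.
two_ways = [Fraction(10, 100), Fraction(30, 100)]
print(posterior_from_joints(two_ways))
```

Exact fractions are used so the results match the hand calculations without rounding.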
example: Suppose you work for a bank that issues home mortgages. We will call
borrowers who repay their loans “good” and those who default “bad”.
 Suppose you screen an applicant and he passes (i.e., the screening suggests that he is
good). What is the probability that he is a good borrower?
 Additional information
 Historical evidence suggests that if you do no screening, 76% of the population
will repay the loan.
 You have a screening process that accurately identifies good borrowers with
probability 98% and accurately identifies bad with probability 80%.
 Said differently, we know that…
 P(g) = 0.76 and P(b) = 0.24
 P(p|g) = 0.98 and P(f|g) = 0.02.
 I.e., the probability that the borrower will pass the screening given that
he is good is 0.98.
 P(p|b) = 0.20 and P(f|b) = 0.8.
 I.e., 20% of bad borrowers will pass the screening.
 The bottom line is that we want to update our probability assessment based on the
new information we have learned (he passed the screen).  We are interested in
P(g|p).
 Using Bayes’ Rule,
$$P(g \mid p) = \frac{P(g)\,P(p \mid g)}{P(g)\,P(p \mid g) + P(b)\,P(p \mid b)} = \frac{0.76 \times 0.98}{0.76 \times 0.98 + 0.24 \times 0.20} \approx 0.939$$
So, we are 93.9% confident that the borrower will repay the loan.
We say that our prior belief was that the client was good with probability 0.76. Our
posterior belief is that he is good with probability 0.939.
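The mortgage calculation can be reproduced in a few lines. The variable names are illustrative; the inputs are the figures given in the example. The last quantity, the probability that an applicant who fails the screen is bad, is not computed in the notes and is included only to show that the same pattern applies.

```python
# Inputs from the example.
p_good = 0.76
p_bad = 1 - p_good             # 0.24
p_pass_given_good = 0.98
p_pass_given_bad = 1 - 0.80    # 20% of bad borrowers pass the screen

# Total probability of passing the screen.
p_pass = p_good * p_pass_given_good + p_bad * p_pass_given_bad

# Bayes' Rule: posterior probability that a passing applicant is good.
p_good_given_pass = p_good * p_pass_given_good / p_pass

# Same pattern for an applicant who fails (an extension, not in the notes).
p_fail = 1 - p_pass
p_bad_given_fail = p_bad * (1 - p_pass_given_bad) / p_fail

print(round(p_pass, 4), round(p_good_given_pass, 3), round(p_bad_given_fail, 3))
```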