CH4. Introduction to Probability
Random experiments & Sample space
•
A random experiment is an observational process whose results cannot be known in advance.
•
The set of all outcomes (S) is the sample space for the experiment.
Discrete Sample Space
•
A sample space with a countable number of outcomes is discrete.
•
For a single roll of a die, the sample space is:
S = {1, 2, 3, 4, 5, 6}
•
When two dice are rolled, the sample space is the following pairs:
S=
{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
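The 36 pairs above can also be generated rather than listed by hand. The following is a minimal sketch, assuming each outcome is an ordered pair (first die, second die); the names `faces` and `sample_space` are chosen here for illustration.

```python
from itertools import product

# Faces of a single six-sided die.
faces = range(1, 7)

# Sample space for two dice: every ordered pair (first die, second die).
sample_space = list(product(faces, repeat=2))

print(len(sample_space))   # number of outcomes in S
print(sample_space[:6])    # the first row of the listing: (1, 1) through (1, 6)
```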
Continuous Sample Space
•
If the outcome is a continuous measurement, the sample space can be described by a rule.
•
For example, the sample space describing a randomly chosen student's GPA
would be S = {X | 0.00 ≤ X ≤ 4.00}
Events
•
An event is any subset of outcomes in the sample space.
•
A simple event, or elementary event, is a single outcome.
Ex 1: The event of getting a head when tossing a coin. A = {H}
Ex 2: The event of getting a 2 when rolling a die. A = {2}
•
A discrete sample space S consists of all the simple events (Ei):
S = {E1, E2,…, En}
•
A compound event consists of two or more simple events.
Ex 1: The event of getting an even number when rolling a die.
A = {2, 4, 6}: composed of three simple events
Ex 2: The compound event A = “rolling a seven” on a roll of two dice consists
of 6 simple events: A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
Equally Likely
•
Consider the random experiment of tossing a balanced coin.
What is the sample space? S = {H, T}
What are the chances of observing H or T?
•
These two elementary events are equally likely.
•
When you buy a lottery ticket, the sample space
S = {win, lose} has only two events.
Are these two events equally likely to occur?
Probability
•
The probability of an event is a number that measures the relative likelihood
that the event will occur.
•
The probability of event A, denoted P (A), must lie within the interval from 0 to 1:
0 ≤ P (A) ≤ 1
If P (A) = 0, then the event cannot occur.
If P (A) = 1, then the event is certain to occur.
•
In a discrete sample space, the probabilities of all simple events must sum to
unity: P (S) = P (E1) + P (E2) + … + P (En) = 1
What is Probability?
•
Three approaches to probability:
Approach      Example
Empirical     There is a 2 percent chance of twins in a randomly chosen birth.
Classical     There is a 50% probability of heads on a coin flip.
Subjective    There is a 75% chance that England will adopt the Euro currency by 2010.
Empirical Approach
•
Use the empirical or relative frequency approach to assign probabilities by
counting the frequency (f) of observed outcomes defined on the experimental
sample space.
•
For example, to estimate the default rate on student loans:
P (a student defaults) = f / n = (number of defaults) / (number of loans)
•
Necessary when there is no prior knowledge of events.
•
As the number of observations (n), that is, the number of times the
experiment is performed, increases, the estimate tends to become more accurate.
Classical Approach
•
Instead of performing the experiment, we can use deduction to determine
P (A).
•
a priori refers to the process of assigning probabilities before the event is
observed.
Ex) The probability of getting heads when tossing a fair coin
•
a priori probabilities are based on logic, not experience.
•
For example, the two dice experiment has 36 equally likely simple events.
The probability of rolling a seven is
$$P(A) = \frac{\text{number of outcomes with 7 dots}}{\text{number of outcomes in sample space}} = \frac{6}{36} = 0.1667$$
•
The probability is obtained a priori using the classical approach as shown in
this Venn diagram for 2 dice:
[Figure: Venn diagram for 2 dice]
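The same count can be reproduced by enumeration. This is a small sketch of the classical approach, assuming all 36 ordered pairs are equally likely; `event_seven` and `p_seven` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to seven.
event_seven = [pair for pair in sample_space if sum(pair) == 7]

# Classical probability: favorable outcomes / total outcomes.
p_seven = Fraction(len(event_seven), len(sample_space))

print(event_seven)
print(p_seven, round(float(p_seven), 4))
```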
Law of Large Numbers
•
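The law of large numbers is an important probability theorem stating that,
as the number of trials increases, the observed relative frequency of an event
tends toward its true probability. This is why a large sample is preferred to
a small one.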
•
Flip a coin 50 times. We would expect the proportion of heads to be near
1/2.
•
However, in a small finite sample, any ratio can be obtained (e.g., 1/3, 7/13,
10/22, 28/50, etc.).
•
A large n may be needed to get close to 1/2.
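A simulation can illustrate how the proportion of heads settles near 1/2 as n grows. This is only a sketch: the coin is simulated with Python's pseudo-random generator, and the seed and the sample sizes are arbitrary choices made here, not values from the lecture.

```python
import random

def proportion_of_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(42)  # fixed seed (arbitrary) so that runs are repeatable

# Small samples can land far from 1/2; larger ones tend to land closer.
for n in (50, 500, 5_000, 50_000):
    print(n, proportion_of_heads(n, rng))
```

The printed proportions depend on the seed, so no particular values are claimed here; the point is the shrinking spread around 0.5.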
Subjective Approach
•
A subjective probability reflects someone's personal belief about the
likelihood of an event.
•
Used when there is no repeatable random experiment.
•
Ex: What is the probability that the price of GM stock will rise within the
next 30 days?
Rules of Probability
Union of Two Events
•
The union of two events consists of all outcomes in the sample space S that
are contained either in event A or in event B or both
(denoted A ∪ B or “A or B”).
•
∪ may be read as “or” since one or the other or both events may occur.
Intersection of Two Events
•
The intersection of two events A and B
(denoted A ∩ B or “A and B”) is the event consisting of all outcomes in the
sample space S that are contained in both event A and event B.
•
∩ may be read as “and” since both events occur.
General Law of Addition
•
The general law of addition states that the probability of the union of two
events A and B is:
P (A ∪ B) = P (A) + P (B) – P (A ∩ B)
When you add P (A) and P (B) together, you count P (A and B) twice.
So, you have to subtract P (A ∩ B) to avoid overstating the probability.
•
For the card example:
P (Q) = 4/52 (4 queens in a deck)
P (R) = 26/52 (26 red cards in a deck)
P (Q ∩ R) = 2/52 (2 red queens in a deck)
P (Q ∪ R) = P (Q) + P (R) – P (Q ∩ R) = 4/52 + 26/52 – 2/52 = 28/52
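The card example can be checked by building the deck and counting. This is a sketch assuming a standard 52-card deck with hearts and diamonds as the red suits; the helper `prob` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

queens = {card for card in deck if card[0] == "Q"}
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}

def prob(event):
    """Classical probability of an event (a set of cards) in the deck."""
    return Fraction(len(event), len(deck))

# General law of addition: P(Q or R) = P(Q) + P(R) - P(Q and R).
lhs = prob(queens | reds)
rhs = prob(queens) + prob(reds) - prob(queens & reds)

print(lhs, rhs)  # both are 28/52, which Fraction reduces to 7/13
```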
Mutually Exclusive Events
•
Events A and B are mutually exclusive (or disjoint) if their intersection is the
null set (∅) that contains no elements.
•
In the case of mutually exclusive events, the addition law reduces to the special law of addition:
P (A ∪ B) = P (A) + P (B)
Complement of an Event
•
The complement of an event A is denoted by A′ (or Aᶜ) and consists of
everything in the sample space S except event A.
•
Since A and A′ together comprise the entire sample space,
P (A) + P (A′ ) = 1
•
The probability of A′ is found by P (A′ ) = 1 – P (A)
•
For example, The Wall Street Journal reports that about 33% of all new
small businesses fail within the first 2 years. The probability that a new
small business will survive is:
P (survival) = 1 – P (failure) = 1 – .33 = .67 or 67%
Collectively Exhaustive Events
•
Events are collectively exhaustive if their union is the entire sample space S.
•
Two mutually exclusive, collectively exhaustive events are dichotomous (or
binary) events.
•
More than two mutually exclusive, collectively exhaustive events are
polytomous events.
Conditional Probability
•
The conditional probability is the probability of event A given that event B has occurred.
•
Denoted P (A | B).
The vertical line “ | ” is read as “given.”
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
for P (B) > 0, and undefined otherwise.
•
Question: P (A | B) + P (Aᶜ | B) = ?
•
Consider the logic of this formula by looking at the Venn diagram.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
The sample space is restricted to B, an event that has occurred.
A ∩ B is the part of B that is also in A.
The ratio of the relative size of A ∩ B to B is P (A | B).
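The ratio can be computed directly on a small sample space. The sketch below uses two dice with two hypothetical events chosen only for illustration (A: the total is 7; B: the first die shows 1 or 2). It also evaluates P (A | B) plus the conditional probability of the complement of A for this one case.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

A = {o for o in sample_space if sum(o) == 7}   # hypothetical event: total is 7
B = {o for o in sample_space if o[0] <= 2}     # hypothetical event: first die is 1 or 2
A_complement = sample_space - A

def prob(event):
    return Fraction(len(event), len(sample_space))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given); undefined if P(given) = 0."""
    if prob(given) == 0:
        raise ValueError("P(given) is zero, so the conditional probability is undefined")
    return prob(event & given) / prob(given)

print(cond_prob(A, B))                               # share of B that is also in A
print(cond_prob(A, B) + cond_prob(A_complement, B))  # the two conditionals together
```

Within B, every outcome is either in A or in its complement, which is why the second printed value is 1 for this example.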
Independent Events
•
Event A is independent of event B if the conditional probability P (A | B) is
the same as the marginal probability P (A).
•
To check for independence, apply this test:
If P (A | B) = P (A) then event A is independent of B.
•
Another way to check for independence:
If P (A ∩ B) = P (A) P (B) then event A is independent of event B since
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A)$$
•
Ex) Out of a target audience of 2,000,000, ad A reaches 500,000 viewers, ad B
reaches 300,000 viewers, and both ads reach 100,000 viewers.
$$P(A) = \frac{500{,}000}{2{,}000{,}000} = .25 \qquad P(B) = \frac{300{,}000}{2{,}000{,}000} = .15 \qquad P(A \cap B) = \frac{100{,}000}{2{,}000{,}000} = .05$$
•
What is P (A | B)?
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{.05}{.15} = .3333$$
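The ad example can be worked with exact fractions, which also allows both independence tests to be applied. This is a sketch using only the audience figures given above; the variable names are illustrative.

```python
from fractions import Fraction

audience = 2_000_000
reach_a = 500_000
reach_b = 300_000
reach_both = 100_000

p_a = Fraction(reach_a, audience)            # P(A) = 1/4
p_b = Fraction(reach_b, audience)            # P(B) = 3/20
p_a_and_b = Fraction(reach_both, audience)   # P(A and B) = 1/20

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b, round(float(p_a_given_b), 4))  # 1/3 0.3333

# Independence tests: both comparisons are False, so A and B are dependent.
print(p_a_given_b == p_a)
print(p_a_and_b == p_a * p_b)
```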
Dependent Events
•
When P (A) ≠ P (A | B), then events A and B are dependent.
•
For dependent events, knowing that event B has occurred will affect the
probability that event A will occur.
Multiplication Law for Independent Events
•
The probability of n independent events occurring simultaneously is:
P (A1 ∩ A2 ∩ ... ∩ An) = P (A1) P (A2) ... P (An)
if the events are (mutually) independent.
Note: in general, P (A1 ∩ A2 ∩ ... ∩ An) = P (A1) P (A2 | A1) … P (An | A1 ∩ A2 ∩ … ∩ An-1)
•
To illustrate system reliability, suppose a Web site has 2 independent file
servers. Each server has 99% reliability. What is the total system reliability?
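One way to work the server question is sketched below. It rests on an assumption the text does not state: the servers are redundant, so the site is up as long as at least one server works. Under that assumption the system fails only if both servers fail, and the multiplication law for independent events gives the probability of that. The function name `parallel_reliability` is made up for this sketch.

```python
from fractions import Fraction

def parallel_reliability(reliabilities):
    """P(at least one component works), assuming independent failures
    and assuming the system needs only one working component."""
    p_all_fail = Fraction(1)
    for r in reliabilities:
        p_all_fail *= 1 - r   # multiply the individual failure probabilities
    return 1 - p_all_fail

server = Fraction(99, 100)    # each server has 99% reliability
system = parallel_reliability([server, server])

print(system, float(system))  # 9999/10000 0.9999
```

If instead both servers had to work at once, the same law would give .99 × .99 = .9801, so the answer depends on which design is assumed.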
Contingency Tables
•
A contingency table is a cross-tabulation of frequencies into rows and
columns.
•
A contingency table is like a frequency distribution for two variables.
•
Ex) Salary Gains and MBA Tuition. Consider the following cross-tabulation
table for n = 67 top-tier MBA programs:
Relative Frequencies
•
Calculate the relative frequencies below for each cell of the cross-tabulation
table to facilitate probability calculations.
Marginal Probabilities
•
The marginal probability of a single event is found by dividing a row or
column total by the total sample size.
•
For example, the marginal probability of a medium salary gain is P (S2) = 33/67.
Joint Probabilities
•
A joint probability represents the intersection of two events in a cross-tabulation table.
•
Consider the joint event that the school has low tuition and large salary
gains (denoted T1 ∩ S3): P (T1 ∩ S3) = 1/67.
•
Let X and Y be a pair of discrete random variables. Their joint probability
function expresses the probability that X takes the specific value x and
simultaneously Y takes the value y, as a function of x and y. The notation
used is P(x, y), so
$$P(x, y) = P(X = x \cap Y = y)$$
•
Let X and Y be a pair of jointly distributed random variables. In this context
the probability function of the random variable X is called its marginal
probability function and is obtained by summing the joint probabilities over
all possible values; that is,
$$P(x) = \sum_{y} P(x, y)$$
•
Similarly, the marginal probability function of the random variable Y is
$$P(y) = \sum_{x} P(x, y)$$
•
Let X and Y be discrete random variables with joint probability function
P(x, y). Then
1) 0 ≤ P(x, y) ≤ 1 for any pair of values x and y
2) The sum of the joint probabilities P(x, y) over all possible values must be 1.
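The marginal sums and the two properties can be checked mechanically. The joint probability function below is hypothetical (its values are made up for illustration and do not come from the MBA table); the sketch stores it as a dictionary keyed by (x, y).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint probability function P(x, y); values invented for this sketch.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10),
}

def marginals(joint):
    """Return (P(x), P(y)) by summing the joint probabilities over the other variable."""
    p_x = defaultdict(Fraction)   # Fraction() is 0, so sums start at zero
    p_y = defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

# Property 1: every joint probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in joint.values())
# Property 2: the joint probabilities sum to 1.
assert sum(joint.values()) == 1

p_x, p_y = marginals(joint)
print(p_x)
print(p_y)
```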
Conditional Probabilities
•
Find the probability that the salary gains are small (S1) given that the MBA
tuition is large (T3): P (S1 | T3) = 5/32
More about dependence/ independence
•
Two variables case
Ex 1)
X: Gender (events: M, F)
Y: Pregnant (events: Yes, No)
P(Yes | M) = 0, P(Yes | F) > 0
Knowing the gender affects the probability that the person is
pregnant: the two variables are dependent.
Ex 2) 52 Cards Example
X: value (events: 1,2,…,10,J,Q,K)
Y: color (events: Red, Black)
P(Q|Red)=P(Q), P(Q|Black)=P(Q)
Knowing the color of card does not affect the probability that the
Queen occurs.
* In order to check the independence of two variables, we need to check the
independence conditions of all the events between two variables.
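For the card example, checking every pair of events means testing P (value ∩ color) = P (value) P (color) for all 13 values and both colors. The following sketch does that by counting, assuming a standard deck in which hearts and diamonds are red; the `color` mapping and `prob` helper are illustrative.

```python
from fractions import Fraction
from itertools import product

values = [str(n) for n in range(1, 11)] + ["J", "Q", "K"]
color = {"hearts": "Red", "diamonds": "Red", "clubs": "Black", "spades": "Black"}
deck = list(product(values, color))   # 52 (value, suit) pairs

def prob(predicate):
    """Share of cards in the deck for which predicate(card) is true."""
    return Fraction(sum(1 for card in deck if predicate(card)), len(deck))

# X and Y are independent only if the product rule holds for every pair of events.
independent = all(
    prob(lambda c: c[0] == v and color[c[1]] == col)
    == prob(lambda c: c[0] == v) * prob(lambda c: color[c[1]] == col)
    for v in values
    for col in ("Red", "Black")
)

print(independent)  # True
```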
Ex 3) To illustrate system reliability, suppose a Web site has 2 independent
file servers. Each server has 99% reliability. What is the total
system reliability?
X: Server A (events: survive, fail)
Y: Server B (events: survive, fail)
Under the independence assumption of X and Y, the following
conditions should be satisfied.
P(A fail ∩ B fail) = P(A fail) P(B fail)
P(A fail ∩ B survive) = P(A fail) P(B survive)
P(A survive ∩ B fail) = P(A survive) P(B fail)
P(A survive ∩ B survive) = P(A survive) P(B survive)
Question: If two events A and B are mutually exclusive (P(A ∩ B) = 0), could A and
B be independent of each other?
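One way to explore the question is to test a concrete pair of disjoint events. The sketch below uses a single roll of a fair die with two hypothetical events, A = {1, 2} and B = {3, 4}, chosen only for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair die
A = {1, 2}                        # hypothetical event
B = {3, 4}                        # hypothetical event, disjoint from A

def prob(event):
    return Fraction(len(event), len(sample_space))

mutually_exclusive = prob(A & B) == 0
independent = prob(A & B) == prob(A) * prob(B)

print(mutually_exclusive, independent)  # True False
```

Here P (A) P (B) = 1/9 while P (A ∩ B) = 0. More generally, when both events have positive probability the product P (A) P (B) is positive, so it cannot equal zero; mutually exclusive events can satisfy the independence condition only if at least one of them has probability zero.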
Bayes’ Theorem
•
The prior (marginal) probability of an event B is revised after event A has
been considered to yield a posterior (conditional) probability.
•
Bayes’ formula is:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
•
In some situations P (A) is not given. Therefore, the most useful and
common form of Bayes’ Theorem is:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B')\,P(B')}$$
•
Ex) Pregnancy test. First define
A = positive test
B = pregnant
A' = negative test
B' = not pregnant
Some information is given: P (A | B) = .96, P (A | B') = .01, P (B) = .60
or P (A' | B) = .04, P (A' | B') = .99, P (B') = .40
•
In a hypothetical group of 1,000 women, 600 are pregnant, of whom 576 test
positive, and 400 are not pregnant, of whom 4 test positive. So, of the 580
women who test positive, 576 will actually be pregnant.
•
So, the desired probability is:
P (Pregnant | Positive Test) = 576/580 = .9931
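The same result follows from the common form of Bayes’ Theorem applied to the given probabilities. This is a minimal sketch; the function name `posterior` and its parameter names are chosen here for illustration.

```python
def posterior(p_a_given_b, p_a_given_not_b, p_b):
    """P(B | A) from P(A | B), P(A | B') and the prior P(B)."""
    numerator = p_a_given_b * p_b                            # P(A | B) P(B)
    denominator = numerator + p_a_given_not_b * (1 - p_b)    # ... + P(A | B') P(B')
    return numerator / denominator

# A = positive test, B = pregnant, using the values given above.
p = posterior(p_a_given_b=0.96, p_a_given_not_b=0.01, p_b=0.60)

print(round(p, 4))  # 0.9931
```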
•
A generalization of Bayes’s Theorem allows event B to be polytomous (B1,
B2, … Bn) rather than dichotomous (B and B').
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \dots + P(A \mid B_n)\,P(B_n)}$$
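The generalized formula can be sketched as a function over lists of priors P (Bi) and likelihoods P (A | Bi). The function name `posteriors` and the three-category numbers in the usage example are hypothetical; the code assumes the Bi are mutually exclusive and collectively exhaustive, so the priors sum to 1.

```python
def posteriors(priors, likelihoods):
    """Return [P(B1 | A), ..., P(Bn | A)] given P(Bi) and P(A | Bi)."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("priors must sum to 1")
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    total = sum(joint)   # denominator of the formula: P(A)
    if total == 0:
        raise ValueError("P(A) is zero, so the posteriors are undefined")
    return [j / total for j in joint]

# Hypothetical three-category example (numbers invented for illustration).
result = posteriors(priors=[0.5, 0.3, 0.2], likelihoods=[0.1, 0.2, 0.4])
print([round(r, 4) for r in result])

# With two categories this reduces to the dichotomous form used for the pregnancy test.
print(round(posteriors([0.60, 0.40], [0.96, 0.01])[0], 4))  # 0.9931
```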