Download Introduction to Probability Theory Probability Theory Probability

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Inductive probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Probability wikipedia , lookup

Transcript
Introduction to
Probability Theory
Probability Theory
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Probabilities
Conditional
probability
Conditional
probability
Bayes’ Theorem
Introduction to Probability Theory
Random variables
Expectation & variance
Distributions
To start out the course, we need to know something about
statistics and probability
I
L645
Dept. of Linguistics, Indiana University
Fall 2015
Probability spaces
Introduction to
Probability Theory
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
This is only an introduction; for a fuller understanding,
you would need to take a statistics course
Probability theory = theory to determine how likely it is that
some outcome will occur
1 / 34
2 / 34
Introduction to
Probability Theory
Introduction to
Probability Theory
Sample space
Die rolling
We state things in terms of an experiment (or trial)—e.g.,
flipping three coins
I
outcome: one particular possible result
I
I
e.g., coin 1 = heads, coin 2 = tails, coin 3 = tails (HTT)
event: one particular possible set of results, i.e., a more
abstract idea
I e.g., two tails and one head ({HTT, THT, TTH})
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Probabilities
Conditional
probability
Conditional
probability
Bayes’ Theorem
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
If we have a 6-sided die
I
I
The set of basic outcomes makes up the sample space (Ω)
I
I
Discrete sample space: countably infinite outcomes (1,
2, 3, ...), e.g., heads or tails
Continuous sample space: uncountably infinite
outcomes (1.1293..., 8.765..., ...), e.g., height
Random variables
Expectation & variance
Distributions
Sample space Ω = {One, Two, Three, Four, Five, Six}
Event space F = {{One}, {One, Two}, {One, Three, Five}
...}
I With 6 options, there are 26 = 64 distinct events
We will use F to refer to the set of events, or event space
Principles of counting
3 / 34
4 / 34
Introduction to
Probability Theory
Introduction to
Probability Theory
Principles of counting
Examples
Probability spaces
Probability spaces
Counting
Counting
Probabilities
I
I
Multiplication Principle: if there are two independent
events, P and Q, and P can happen in p different ways
and Q in q different ways, then P and Q can happen in
p · q ways
Probabilities
Conditional
probability
I
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
example 1: If there are 3 roads leading from
Bloomington to Indianapolis and 5 roads from
Indianapolis to Chicago, how many ways are there to
get from Bloomington to Chicago?
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
answer: 3 · 5 = 15
Addition Principle: if there are two independent
events, P and Q, and P can happen in p different ways
and Q in q different ways, then P or Q can happen in
p + q ways
I
example 2: If there are 2 roads going south from
Bloomington and 6 roads going north. How many roads
are there going south or north?
answer: 2 + 6 = 8
5 / 34
6 / 34
Exercises
Introduction to
Probability Theory
Probability spaces
Counting
Probabilities
I
I
How many different 7-place license plates are possible
if the first two places are for letters and the other 5 for
numbers?
John, Jim, Jack and Jay have formed a band consisting
of 4 instruments.
I
I
Probability functions
Probabilities: if A is an event, P (A ) is its probability
I
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
0 ≤ P (A ) ≤ 1
Probability spaces
Counting
Probabilities
Conditional
probability
A probability function (distribution) distributes a
probability mass of 1 over the sample space Ω
I
If each of the boys can play all 4 instruments, how many
different arrangements are possible?
What if John and Jim can play all 4 instruments, but Jay
and Jack can each play only piano and drums?
Introduction to
Probability Theory
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
Probability function: any function P : F → [0, 1], where:
I
P (Ω) = 1
I
Aj ∈ F: P (
∞
S
j =1
Aj ) =
∞
P
j =1
P (Aj ) (countable additivity)
... for disjoint sets: Aj ∩ Ak = ∅ for j , k
i.e., the probability of any event Aj happening = sum of the
probabilities of any individual event happening
I
e.g., P (roll = 1 ∪ roll = 2) = P (roll = 1) + P (roll = 2)
7 / 34
Example
Introduction to
Probability Theory
8 / 34
Additivity for non-disjoint sets
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Toss a fair coin three times. What is the chance of exactly 2
heads coming up?
I
I
Sample space Ω =
{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Event of interest A = {HHT,HTH,THH}
Probabilities
Conditional
probability
Conditional
probability
Bayes’ Theorem
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
P (A ∪ B ) = P (A ) + P (B ) − P (A ∩ B )
I
The probability of unioning A and B requires adding up
their individual probabilities
I
... then subtracting out their intersection, so as not to
double count that portion
Since the coin is fair, we have a uniform distribution, i.e.,
each outcome is equally likely (1/8)
I
P (A ) =
|A |
|Ω|
=
Introduction to
Probability Theory
Random variables
Expectation & variance
Distributions
3
8
Conditional probability
9 / 34
10 / 34
Introduction to
Probability Theory
Introduction to
Probability Theory
Conditional probability
Example
Probability spaces
Probability spaces
Counting
Counting
Probabilities
The conditional probability of an event A occurring given
that event B has already occurred is notated as P (A |B )
I
I
Prior probability of A : P (A )
Posterior probability of A (after additional knowledge
B): P (A |B )
(1) P (A |B ) =
I
I
P (A ∩B )
P (B )
=
Probabilities
Conditional
probability
I
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
I
P ( A ,B )
P (B )
I
P (A , B ) (or P (AB )) is the joint probability
In some sense, B has become the sample space
I
11 / 34
A coin is flipped twice. If we assume that all four points
in the sample space
Ω = {(H , H ), (H , T ), (T , H ), (T , T )}
are equally likely, what is the conditional probability that
both flips result in heads given that the first flip does?
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
A = {(H , H )}, B = {(H , H ), (H , T )}
P (A ) = 14
P (B ) = 24
P (A , B ) =
P (A |B ) =
1
4
P (A ,B )
P (B )
=
1
4
2
4
=
1
2
12 / 34
The chain rule
Introduction to
Probability Theory
The multiplication rule restates P (A |B ) =
P (A ∩B )
:
P (B )
(2) P (A ∩ B ) = P (A |B )P (B ) = P (B |A )P (A )
The chain rule (used in Markov models):
Counting
Probabilities
Probabilities
Conditional
probability
Bayes’ Theorem
Random variables
nT
−1
i =1
I
I
I
Two events are independent if knowing one does not affect
the probability of the other
Events A and B are independent if
I
Ai )
I
i.e., to obtain the probability of events occurring:
I
Probability spaces
Counting
Distributions
P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 )...P (An |
Introduction to
Probability Theory
Probability spaces
Expectation & variance
(3) P (A1 ∩ ... ∩ An ) =
Independence
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
P (A ) = P (A |B )
i.e., P (A ∩ B ) = P (A )P (B )
i.e., probability of seeing A and B together = product of
seeing each one individually, as one does not affect other
select the first event
select the second event, given the first
...
select the nth event, given all the previous ones
13 / 34
Introduction to
Probability Theory
Independence
14 / 34
Bayes’ Theorem
Introduction to
Probability Theory
Example
Probability spaces
Probability spaces
Counting
Counting
Probabilities
I
I
I
fair die: event A = divisible by two; event B = divisible by
three
P(AB) = P({six}) =
P(A) × P(B) =
1
2
×
1
6
1
3
=
event C = divisible by four
P(C) = P({four}) =
I
1
6
P(AC) = P({four}) =
P(A) × P(C) =
×
Bayes’ Theorem
Distributions
Bayes’ Theorem allows one to calculate P (B |A ) in terms of
P (A |B )
(4) P (B |A ) =
I
1
2
Bayes’ Theorem
Random variables
Expectation & variance
1
6
I
I
Probabilities
Conditional
probability
Conditional
probability
1
6
1
6
=
P (A ∩B )
P (A )
=
P (A |B )P (B )
P (A )
15 / 34
16 / 34
Introduction to
Probability Theory
Partitioning
Complement sets
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Bayes’ Theorem takes into account the normalizing constant
P (A )
Probabilities
Conditional
probability
I
Bayes’ Theorem
Let E and F be events.
Random variables
Expectation & variance
=
P (A |B )P (B )
P (A )
Distributions
c
P (A |B )P (B )
P (A )
E = EF ∪ EF c
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
where EF and EF are mutually exclusive.
If P (A ) is the same for every event of interest, and we want
to find the value of B which maximizes the function:
(6) arg maxB
Distributions
Introduction to
Probability Theory
Getting the most likely event
P (A ∩B )
P (A )
Expectation & variance
1
12
Bayes
(5) P (B |A ) =
Random variables
I
P (E ) = P (EF ) + P (EF c )
= arg maxB P (A |B )P (B )
= P (E |F )P (F ) + P (E |F c )P (F c )
= P (E |F )P (F ) + P (E |F c )P (1 − P (F ))
... so, in these cases, we can ignore the denominator
17 / 34
18 / 34
Example
Introduction to
Probability Theory
Example (2)
Introduction to
Probability Theory
Probability spaces
Probability spaces
Counting
Counting
Probabilities
I
I
Probabilities
An insurance company groups people into two classes:
Those who are accident-prone and those who are not.
Conditional
probability
Their statistics show that an accident-prone person will
have an accident within a fixed 1-year period with prob.
0.4. Whereas the prob. decreases to 0.2 for a
non-accident-prone person.
Random variables
I
Let us assume that 30 percent of the population are
accident prone.
I
What is the prob. that a new policyholder will have an
accident within a year of purchasing a policy?
I
Bayes’ Theorem
Expectation & variance
Distributions
I
Y = policyholder will have an accident within one year
A = policyholder is accident prone
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
look for P(Y)
Distributions
I
P (Y ) = P (Y |A )P (A ) + P (Y |A c )P (A c )
= 0.4 · 0.3 + 0.2 · 0.7 = 0.26
20 / 34
19 / 34
Partitioning
Introduction to
Probability Theory
Example of Bayes’ Theorem
Introduction to
Probability Theory
Probability spaces
Counting
Probabilities
Let’s say we have i different, disjoint sets Bi , and these sets
S
partition A (i.e., A ⊆ i Bi )
Then, the following is true:
(7) P (A ) =
P
i
P (A ∩ Bi ) =
P
i
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
P (A |Bi )P (Bi )
P (A |Bj )P (Bj )
P (A )
=
Given that we have pulled a red chip, what is the probability
that it came from bowl B1 ? In other words, what is P (B1 |R )?
I
This gives us a more general form of Bayes’ Theorem:
(8) P (Bj |A ) =
Probability spaces
Assume the following:
I Bowl B1 (P (B1 ) = 1 ) has 2 red and 4 white chips
3
I Bowl B2 (P (B2 ) = 1 ) has 1 red and 2 white chips
6
I Bowl B3 (P (B3 ) = 1 ) has 5 red and 4 white chips
2
I
I
P (A |Bj )P (Bj )
P
i P (A |Bi )P (Bi )
P (B1 ) = 13
P (R |B1 ) =
2
2+ 4
=
Expectation & variance
Distributions
P (R ) = P (B1 ∩ R ) + P (B2 ∩ R ) + P (B3 ∩ R )
= P (R |B1 )P (B1 ) + P (R |B2 )P (B2 ) + P (R |B3 )P (B3 ) =
P (R |B1 )P (B1 )
P (R )
=
(1/3)(1/3)
4/9
=
4
9
1
4
21 / 34
22 / 34
Introduction to
Probability Theory
Random Variables
Motivation
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Probabilities
Conditional
probability
If we roll 2 dice:
Bayes’ Theorem
...
{(4,6),(5,5),(6,4)}
{(5,6),(6,5)}
{(6,6)}
Bayes’ Theorem
Random variables
Introduction to
Probability Theory
Mapping Outcomes to Numerical Values
Outcomes
{(1,1)}
{(1,2),(2,1)}
{(1,3),(2,2),(3,1)}
{(1,4),(2,3),(3,2),(4,1)}
Probabilities
Conditional
probability
1
3
So, we have: P (B1 |R ) =
Random Variables
Counting
Numerical Value
2
3
4
5
Probability
10
11
12
1
12
1
18
1
36
1
36
1
18
1
12
1
9
Random variables
Expectation & variance
Distributions
A random variable maps the set of outcomes Ω into the set
of real numbers.
I
Formally, a random variable is a function X → Rn
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
Motivation for random variables:
23 / 34
I
They abstract away from outcomes by putting outcomes
into equivalence classes
I
Mapping to numerical values makes calculations easier.
I
They facilitate numerical manipulations, such as the
definition of mean and standard deviation.
24 / 34
Example
Introduction to
Probability Theory
Suppose that our experiment consists of tossing 3 fair coins.
If we let Y denote the number of heads appearing, then Y is
a random variable taking on one of the values 0, 1, 2, 3 with
respective probabilities.
Introduction to
Probability Theory
Probability spaces
Probability spaces
Counting
Counting
1
P { Y = 0} = P ({(T , T , T )}) =
8
Probabilities
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
P { Y = 0} = P ({(T , T , T )}) =
Example (2)
1
8
3
8
3
P { Y = 2} = P ({(H , H , T ), (H , T , H ), (T , H , H )}) =
8
1
P { Y = 3} = P ({(H , H , H )}) =
8
P { Y = 1} = P ({(T , T , H ), (T , H , T ), (H , T , T )}) =
Probabilities
Conditional
probability
3
8
3
P { Y = 2} = P ({(H , H , T ), (H , T , H ), (T , H , H )}) =
8
1
P { Y = 3} = P ({(H , H , H )}) =
8
P { Y = 1} = P ({(T , T , H ), (T , H , T ), (H , T , T )}) =
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
Since Y must take on one of the values 0 through 3,
3
[
1 = P(
i =0
{Y = i }) =
3
X
i =0
P {Y = i }
25 / 34
Expectation of a Random Variable
Introduction to
Probability Theory
26 / 34
Example
Introduction to
Probability Theory
Probability spaces
Probability spaces
Counting
Counting
Probabilities
If X is a discrete random variable with probability mass
function p (a ), the expectation or expected value of X ,
denoted by E [X ], is defined by
E [X ] =
X
x :p (x )>0
Probabilities
Conditional
probability
Find E [X ] where X is the outcome when we roll a fair die.
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
x · p (x )
Bayes’ Theorem
Random variables
Solution
Since p (1) = · · · = p (6) =
Conditional
probability
Expectation & variance
Distributions
1
6,
1
1
1
1
1
7
1
E [X ] = 1 · ( ) + 2 · ( ) + 3 · ( ) + 4 · ( ) + 5 · ( ) + 6 · ( ) =
6
6
6
6
6
6
2
In other words: E [X ] is the weighted average of the possible
values of X , each value being weighted by the probability
that X has that particular value.
28 / 34
27 / 34
Example (2)
Introduction to
Probability Theory
Example (3)
Probability spaces
Probability spaces
Counting
Counting
Probabilities
Probabilities
Conditional
probability
Bayes’ Theorem
Assume we have the following weighted die:
I
p (X = 1) = p (X = 2) = p (X = 5) =
I
p (X = 3) = p (X = 4) =
I
p (X = 6) =
1
3
1
12
1
6
Introduction to
Probability Theory
Random variables
Expectation & variance
Distributions
A school class of 120 students are driven in three buses to a
symphonic performance. There are 36 students in one of the
buses, 40 in another, and 44 in the third bus. When the
buses arrive, one of the 120 students is randomly chosen.
Let X be a random variable denoting the students on the bus
of that randomly chosen student.
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
What is the expectation here?
Task: Find E [X ]
29 / 34
30 / 34
Variance
Introduction to
Probability Theory
Expectation alone ignores the question:
I
Do the values of a random variable tend to be
consistent over many trials or do they tend to vary a lot?
If X is a discrete random variable with mean µ, then the
variance of X , denoted by Var (X ), is defined by
Example
Introduction to
Probability Theory
Probability spaces
Probability spaces
Calculate Var (X ) if X represents the outcome when a fair
die is rolled.
Counting
Probabilities
Conditional
probability
Bayes’ Theorem
Random variables
Expectation & variance
Distributions
6
1
1
+5 ( ) + 62 ( )
6
6
1
= ( )(91)
6
6
Probabilities
Conditional
probability
Bayes’ Theorem
Solution
1
1
1
1
E [X 2 ] = 12 ( ) + 22 ( ) + 32 ( ) + 42 ( )
6
Counting
Random variables
Expectation & variance
6
Distributions
2
Var (X ) = E [(X − µ)2 ]
= E (X 2 ) − (E (X ))2
Hence
The standard deviation of X , denoted by σ(X ), is defined as
σ(X ) =
q
Var (X ) =
Var (X )
91
7
35
− ( )2 =
6
2
12
31 / 34
Example for expectation and variance
Introduction to
Probability Theory
Probability spaces
Counting
When we roll two dice, what is the expectation and the
variance for the sum of the numbers on the two dice?
32 / 34
Joint distributions
(9)
(10)
=
=
x
P
x
Counting
Probabilities
I
Conditional
probability
Joint pmf: The probability of both x and y happening
Random variables
Expectation & variance
Distributions
Expectation & variance
I
Marginal pmfs: The probability of x happening is the
sum of the occurrences of x with all the different y’s
(12) pX (x ) =
(13) pY (y ) =
p (x )(x − E (X ))2
Conditional
probability
Bayes’ Theorem
(11) p (x , y ) = P (X = x , Y = y )
Random variables
Var (X ) = E ((X − E (X ))2 )
P
Probability spaces
With (discrete) random variables, we can define:
Probabilities
Bayes’ Theorem
E (X ) = E (Y + Y )
= E (Y ) + E (Y )
= 3.5 + 3.5 = 7
Introduction to
Probability Theory
P
y
p (x , y )
x
p (x , y )
P
Distributions
If X and Y are independent, then p (x , y ) = pX (x )pY (y ), so,
e.g., the probability of rolling two sixes is:
p (x )(x − 7)2 = 5 65
I
33 / 34
p (X = 6, Y = 6) = p (X = 6)p (Y = 6) = ( 16 )( 16 ) =
1
36
34 / 34