Cogsci 118A: Natural Computation I
Lecture 2 (01/07/10)
Probability Theory Review
Lecturer: Angela Yu
Scribe: Joseph Schilz
Lecture Summary
1. Set theory: terms and operators
In this section, we provide definitions for the set theory terms and operators used in our review of probability.
2. Probability spaces
We define and explain the components of a probability space.
3. Joint probability, conditional probability, and marginal probability
We define and explain concepts in probability involving multiple random variables.
1
Set theory: terms and operators
To avoid inconsistencies and ambiguities, we will pursue a formal understanding of probability. This formal
understanding will be based on the theory of sets, and this lecture requires familiarity with the following set
theory terms and operators:
1. Naming conventions:
We generally denote by lower case letters a, b, c, d, . . . , x, y, z singular, non-set items. We denote by
upper case letters A, B, C, D, . . . sets of items. Calligraphic letters A, B, C . . . are often used to denote
sets which contain other sets.
2. Set membership: ∈
We denote that an element x is a member of the set A by: x ∈ A. It is also possible for one set to be
a member of another set. If the set E is a member of F then we write: E ∈ F.
3. Subset/superset relation: ⊆, ⊂, ⊇, ⊃
If all elements of the set A can also be found in B, we say that A is a subset of B and write: A ⊆ B.
Then it’s also true that B is a superset of A and we may write: B ⊇ A. Formally we write A ⊆ B iff
∀x ∈ A(x ∈ B).
Note: Some authors use ⊂ and ⊃ interchangeably with ⊆ and ⊇. Other authors may write A ⊂ B only
in cases where A ⊆ B and A ≠ B. In this course ⊂ and ⊃ will be used in the former sense, though we
may use ⊆ and ⊇ to emphasize the possible equality of the sets in question.
4. The empty set: ∅
The empty set is the set which contains no elements. It is denoted by ∅ or by set brackets enclosing
no elements: {}. For any set A, it is true that ∅ ⊆ A.
5. Union: ∪
For sets A and B, the union of A and B is the set containing exactly those elements in either A or B.
This is denoted A ∪ B. Formally, A ∪ B = {x : x ∈ A ∨ x ∈ B}.
6. Intersection: ∩
For sets A and B, the intersection of A and B is the set containing exactly those elements in both A
and B. This is denoted A ∩ B. Formally, A ∩ B = {x : x ∈ A ∧ x ∈ B}.
7. Mutually exclusive:
Two sets A and B are said to be mutually exclusive if they contain no elements in common. More
than two sets, for example A1 , A2 , A3 , . . ., are said to be mutually exclusive if no two sets contain any
elements in common. Formally, A1 , A2 , A3 , . . . are mutually exclusive iff ∀i∀j(i ≠ j ⇒ Ai ∩ Aj = ∅).
8. Complementation: AC
In the context of a set U , where A ⊆ U , the complement of A is the set of all those elements in U that
are not in A. This is denoted AC . Throughout this lecture, the set which fulfills the role of U will
generally be Ω.
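The operators above map directly onto Python's built-in set type. A minimal sketch (the example sets are my own choice, picked to match the six-sided die used later in these notes):

```python
# Sets of die outcomes (illustrative; matches the six-sided die example below)
omega = {1, 2, 3, 4, 5, 6}   # plays the role of the universal set U (here Ω)
A = {1, 3, 5}                # odd outcomes
B = {2, 4, 6}                # even outcomes

print(A | B)        # union A ∪ B: {1, 2, 3, 4, 5, 6}
print(A & B)        # intersection A ∩ B: set(), so A and B are mutually exclusive
print(omega - A)    # complement of A relative to omega: {2, 4, 6}
print(A <= omega)   # subset test A ⊆ Ω: True
```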
2
Probability Spaces
This section discusses probability spaces. A probability space consists of three components: a sample space Ω, an
event space F, and a probability measure P, each obeying special properties. Section 2.1, Section 2.2, and
Section 2.3 discuss these elements respectively. Together, these elements define a mathematical experiment:
they list its possible outcomes, the events to which we can assign probability, and the actual probabilities
that we assign to those events.
Example: We take the single roll of a numbered, six-sided die as an example of a mathematical
experiment. We expand upon this example as this section progresses.
2.1
Sample Space
The sample space is the set of all possible outcomes of our mathematical experiment. It is denoted Ω. The
sample space may be finite or it may be infinite, depending upon the experiment. It cannot be empty. That
is: Ω ≠ ∅. The sample space may also be called the outcome space. The variable ω is often used to denote
the elements of a sample space.
Example: If Ω is the set of all possible outcomes of rolling a six-sided die, then Ω = {1, 2, 3, 4, 5, 6}.
2.2
Event Space
The event space is the set of all events to which we can assign probabilities. In other words, it is the set of
all events of which we are able to measure the probability. The event space is denoted by F.
An event is a subset of Ω. An event is often denoted by the letter E, with some identifying subscript. The
event which contains every element of Ω may be denoted EΩ , and by definition then EΩ = Ω. The event
which contains no elements may be denoted E0 and is equal to ∅. An event which contains a single outcome
from Ω is called an atomic event.
Example: In our example Ω = {1, 2, 3, 4, 5, 6}. Consider the atomic event in which a four is
rolled: we might call this E4 where E4 = {4}. Let Eo = {1, 3, 5} be the event in which some odd
number is rolled and Ee = {2, 4, 6} be the event in which some even number is rolled. Then we
might have E4 , Eo , Ee , EΩ ∈ F.
(HW) Including the null event {} and the all-inclusive event of the sample space itself, Ω, how many total
events could there be in our event space F?
An event space must also satisfy the three additional properties of a sigma algebra. These properties are:
1. ∅ ∈ F
The empty set must be a member of F. This ensures that F will not be empty.
2. E ∈ F ⇒ E C ∈ F
For any event we find in F, we must also find its complement E C in F. This requirement is called
closure under complements.
3. E1 , E2 , E3 , . . . ∈ F ⇒ ∪i Ei ∈ F
For any finite or countably infinite group of events in F, we must also find the set which is the union
of those events in F. This requirement is called closure under countable unions.
(HW) Demonstrate that closure under complementation and closure under countable union together imply
closure under countable intersection.
Example: Consider F = {E4 , Eo , Ee , EΩ }. F does not meet condition (1) of a sigma algebra as
∅ ∉ F. We can add ∅, which we denote E0 , to F to produce F ′ = {E0 , E4 , Eo , Ee , EΩ }.
Though F ′ satisfies condition (1), it does not satisfy closure under complements. For example,
E4C = {4}C = {1, 2, 3, 5, 6} is not in F ′ . Consider F ′′ = {E0 , Eo , Ee , EΩ }. F ′′ satisfies condition
(2), as E0C = {}C = {1, 2, 3, 4, 5, 6} = EΩ is in F ′′ , EoC = {1, 3, 5}C = {2, 4, 6} = Ee is in F ′′ ,
and so on.
F ′′ also satisfies closure under countable unions. For example, E0 ∪ Eo = {} ∪ {1, 3, 5} =
{1, 3, 5} = Eo is in F ′′ , as is any other arbitrary union.
Thus, F ′′ is a sigma algebra with respect to Ω.
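For a finite Ω, the closure checks worked through in this example can be automated. A sketch (the frozenset encoding and the function name are my own, not from the lecture):

```python
from itertools import combinations

omega = frozenset({1, 2, 3, 4, 5, 6})
E0, Eo, Ee = frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6})
F2 = {E0, Eo, Ee, omega}   # the candidate event space F'' from the example

def is_sigma_algebra(F, omega):
    """Check the three sigma-algebra conditions for a finite event space F."""
    if frozenset() not in F:                  # (1): ∅ ∈ F
        return False
    if any(omega - E not in F for E in F):    # (2): closed under complements
        return False
    # (3): closed under unions; for a finite F, pairwise closure suffices
    return all(A | B in F for A, B in combinations(F, 2))

print(is_sigma_algebra(F2, omega))               # True: F'' is a sigma algebra
print(is_sigma_algebra({Eo, Ee, omega}, omega))  # False: ∅ is missing
```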
Though in common speech we might use “event” and “outcome” interchangeably, they have distinct definitions in probability theory.
Example: Suppose our die is fair. Having no number come up is not a possible outcome of rolling
a die, thus it does not belong in our sample space Ω. However, the event in which no number
comes up when rolling a die is an event to which we can assign probability. Thus, E0 = {} = ∅
is a valid element of F. Later we will see that the probability we assign to E0 is zero.
2.3
Probability Measure
A probability measure P is a function from F into the real numbers. It is the function that assigns to each
event in F a numerical probability. Furthermore, P is subject to the following conditions:
1. ∀E ∈ F(0 ≤ P(E) ≤ 1)
For any event in our event space, the probability of that event occurring must be between zero and
one, inclusive. We can then define our probability measure as P : F → [0, 1].
2. P(Ω) = 1
The probability of the event Ω, also denoted EΩ , occurring must be equal to one. Recall, EΩ is the
event in which any one of the possible outcomes of the experiment occurs. Thus, (2) states that when
the experiment is performed we are certain to see an outcome from the sample space.
3. For mutually exclusive events E1 , E2 , . . . , En , P(∪i Ei ) = Σi P(Ei )
By (3), we can easily determine the probability of an event E if we know the probabilities of a mutually
exclusive group of simpler events which together compose E.
Example: Given P(Ee ) = 1/2 and P(E5 ) = 1/6 we may determine the probability that the die
lands on an even number or five by (3). Ee and E5 are mutually exclusive, so P(Ee ∪ E5 ) =
P(Ee ) + P(E5 ) = 1/2 + 1/6 = 4/6 = 2/3.
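On a finite sample space, a probability measure can be built up from the probabilities of the atomic events, and condition (3) then holds by construction. A sketch of the fair-die example (variable names are illustrative):

```python
from fractions import Fraction

# Fair die: each atomic event {w} has probability 1/6
p_atomic = {w: Fraction(1, 6) for w in range(1, 7)}

def P(event):
    """Sum atomic probabilities; valid because atomic events are mutually exclusive."""
    return sum(p_atomic[w] for w in event)

Ee, E5 = {2, 4, 6}, {5}
print(P(Ee))                # 1/2
print(P(E5))                # 1/6
print(P(Ee | E5))           # 2/3, which equals P(Ee) + P(E5) since Ee ∩ E5 = ∅
print(P(set(range(1, 7))))  # 1, i.e. P(Ω) = 1 as condition (2) requires
```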
Probability measures have additional properties, which can be derived from the three stated above.
1. P(∅) = 0
The probability of no outcome occurring as a result of the experiment is equal to zero.
Proof By (2) we have P(Ω) = 1. Since Ω and ∅ are mutually exclusive, (3) gives P(Ω) = P(Ω ∪ ∅) =
P(Ω) + P(∅) = 1 + P(∅). The left side of this equation equals one, and so P(∅) must equal zero.
2. Monotonicity: A ⊆ B ⇒ P(A) ≤ P(B)
Example: Suppose our die may be unfair. Even so, under no parameters would it be more
probable to roll a one than to roll either a one or a two. Formally, we can write this as:
{1} ⊆ {1, 2} ⇒ P({1}) ≤ P({1, 2}).
(HW) Prove that monotonicity follows from the definition of a probability measure.
3. Sub-additivity: A ⊆ ∪i Ai ⇒ P(A) ≤ Σi P(Ai )
Example: Even if our die is unfair, we may be confident that P({6}) ≤ P({4, 5, 6}) + P({5, 6})
because {6} ⊆ {4, 5, 6} ∪ {5, 6}.
(HW) Prove that sub-additivity follows from the definition of a probability measure.
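Both derived properties can be checked numerically even when the die is unfair. A sketch with a biased distribution (the weights below are my own arbitrary illustration, not from the lecture; they sum to one):

```python
from fractions import Fraction

# An unfair die: weights chosen arbitrarily for illustration (they sum to 1)
p_atomic = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 12),
            4: Fraction(1, 12), 5: Fraction(1, 24), 6: Fraction(1, 24)}

def P(event):
    return sum(p_atomic[w] for w in event)

# Monotonicity: {1} ⊆ {1, 2} implies P({1}) ≤ P({1, 2})
print(P({1}) <= P({1, 2}))                 # True

# Sub-additivity: {6} ⊆ {4, 5, 6} ∪ {5, 6} implies P({6}) ≤ P({4, 5, 6}) + P({5, 6})
print(P({6}) <= P({4, 5, 6}) + P({5, 6}))  # True
```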
Note: A probability measure may sometimes be denoted by a regular upper-case ‘P’. Though in one context
this upper-case ‘P’ may refer to a probability measure, in a different context it may refer to a cumulative
distribution function or CDF. The symbol P unambiguously refers to a probability measure.
3
Joint Probability, Conditional Probability, and Marginal Probability
3.1
Random Variables
For this course, we will consider a random variable (r.v.) to be a variable representing experimental outcomes
which are numbers. That is, we assume that Ω ⊆ R, or Rn . For an arbitrary sample space Ω, we construct
Ω′ ⊆ R by defining a map x : Ω → R and letting Ω′ be the image of Ω under x. We define a new probability
measure P′ by P′ (x ∈ A) = P({ω : x(ω) ∈ A}). In practice, we then refer to Ω′ and P′ as Ω and P respectively.
Example: Consider the toss of a coin. Let Ω = {“heads”,“tails”}. Then we may define x
by x(“heads”) = 1 and x(“tails”) = 0. Then Ω′ = {1, 0}. If the coin is fair then P′ (1) =
P(“heads”) = 1/2 and P′ (0) = P(“tails”) = 1/2.
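The construction of Ω′ and P′ from the map x can be sketched directly for this coin example (the dictionary encoding is my own choice):

```python
from fractions import Fraction

omega = {"heads", "tails"}
P = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}  # fair coin
x = {"heads": 1, "tails": 0}                            # the map x : Ω → R

# Ω' is the image of Ω under x
omega_prime = {x[w] for w in omega}

def P_prime(A):
    """Induced measure: P'(x ∈ A) = P({ω : x(ω) ∈ A})."""
    return sum(P[w] for w in omega if x[w] in A)

print(omega_prime)   # {0, 1}
print(P_prime({1}))  # 1/2, the probability of heads
```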
We generally consider P(x ∈ E), the probability that the outcome of our experiment x is a member of the
event E. As done above, we may sometimes write probabilities of the form P(E) or P(x).
Example: Consider the toss of a single fair coin. We might write that P(x ∈ {0}) = 1/2, however
it is often equally clear to write that P(0) = 1/2.
Now consider tossing two fair coins, the outcomes of which are given by random variables x1 and
x2 . If the coins are independent (see Section 3.2.1), then it is true that P(x1 ∈ A, x2 ∈ B) =
P(x1 ∈ A)P(x2 ∈ B) no matter our choices of A and B. If we simply wished to emphasize this
attribute, we might write: P(x1 , x2 ) = P(x1 )P(x2 ).
3.2
Joint Probability
Joint probability is the probability of two events both occurring as a result of performing an experiment.
Using random variables, we write the probability of our outcome x being both a member of event E1 and
event E2 as P(x ∈ E1 , x ∈ E2 ). We define this as P(x ∈ E1 , x ∈ E2 ) ≜ P(x ∈ E1 ∩ E2 ).
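This definition reduces a joint probability to the probability of an intersection. A sketch for the fair die, with two events of my own choosing ("even" and "greater than three") for illustration:

```python
from fractions import Fraction

p_atomic = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die

def P(event):
    return sum(p_atomic[w] for w in event)

E1 = {2, 4, 6}      # even roll
E2 = {4, 5, 6}      # roll greater than three
joint = P(E1 & E2)  # P(x ∈ E1, x ∈ E2) = P(x ∈ E1 ∩ E2) = P({4, 6})
print(joint)        # 1/3
```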
3.2.1
Independence
We say that events E1 , E2 are independent if P(x ∈ E1 , x ∈ E2 ) = P(x ∈ E1 )P(x ∈ E2 ).
Example: Consider drawing a single card from a fair deck of 52. We find the probability that the
card is both a heart and a two, written P(“hearts”,“two”). We find that P(“heart”)P(“two”) =
(1/4)(1/13) = 1/52. We reason that drawing a card which is both a heart and a two will happen
iff we have drawn the two of hearts, which will occur with probability 1/52. Thus we confirm the
independence of the two events.
Consider the probability of drawing a card which is both a heart and red, written P(“hearts”,“red”).
We find that P(“heart”)P(“red”) = (1/4)(1/2) = 1/8. However, we reason that drawing a card
which is both a heart and red will happen iff we have drawn a heart. So P(“hearts”,“red”) =
P(“hearts”) = 1/4. Thus we confirm that the two events are not independent.
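Both card computations can be verified by exact enumeration over the 52 equally likely outcomes. A sketch (the deck encoding and predicate names are my own):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely outcomes

def P(event):
    """Probability of a predicate-defined event under the uniform measure."""
    return Fraction(sum(1 for c in deck if event(c)), len(deck))

def heart(c): return c[1] == "hearts"
def two(c): return c[0] == "2"
def red(c): return c[1] in {"hearts", "diamonds"}

print(P(lambda c: heart(c) and two(c)) == P(heart) * P(two))  # True: independent
print(P(lambda c: heart(c) and red(c)) == P(heart) * P(red))  # False: not independent
```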
(HW) Show that if events A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B).
3.3
Conditional Probability
Conditional probability is the probability of an event occurring given the assurance that a separate event
occurs. We denote the probability of A given B as P(A|B). We define conditional probability as:
P(A|B) ≜ P(A, B) / P(B).
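This definition can be computed directly on the fair-die setting. A sketch (assuming P(B) > 0, which the definition requires implicitly):

```python
from fractions import Fraction

p_atomic = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die

def P(event):
    return sum(p_atomic[w] for w in event)

def P_cond(A, B):
    """P(A|B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    return P(A & B) / P(B)

print(P_cond({2}, {2, 4, 6}))  # 1/3: a two, given that the roll is even
```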
Example: Suppose we’ve tossed a fair die, and we know only that an even number has come up.
We find the probability that a two has come up using conditional probability:
P({2}|{2, 4, 6}) = P({2}, {2, 4, 6}) / P({2, 4, 6}) = P({2} ∩ {2, 4, 6}) / P({2, 4, 6}) = P({2}) / P({2, 4, 6}) = (1/6) / (1/2) = 1/3 .

3.4
Marginal Probability
Marginal probability is a property relating the unconditional probability of a single event to a sum of joint
probabilities. Specifically, for a group of mutually exclusive events E1 , E2 , E3 , . . . , such that ∪i Ei = Ω,
P(x) = Σi P(x, y ∈ Ei ).
Example: Consider tossing two fair coins, the outcomes of which are given by random variables x
and y respectively. Knowing that P(x = 1, y = 1) = P(x = 1, y = 0) = 1/4, we can find P(x = 1)
as P(x = 1, y = 1) + P(x = 1, y = 0) = 1/4 + 1/4 = 1/2.
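The marginalization sum can be sketched for this two-coin example (the dictionary encoding of the joint distribution is my own):

```python
from fractions import Fraction

# Joint distribution of two independent fair coins (x, y): each pair has probability 1/4
joint = {(xv, yv): Fraction(1, 4) for xv in (0, 1) for yv in (0, 1)}

def P_joint(x, Ei):
    """P(x, y ∈ Ei): sum the joint probabilities over y-values lying in Ei."""
    return sum(p for (xv, yv), p in joint.items() if xv == x and yv in Ei)

# The atomic events {0}, {1} partition y's sample space: mutually exclusive, union is Ω
partition = [{0}, {1}]
marginal = sum(P_joint(1, Ei) for Ei in partition)
print(marginal)   # 1/2, matching P(x = 1)
```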