Lecture Notes 1
Probability and Random Variables
• Probability Spaces
• Conditional Probability and Independence
• Random Variables
• Functions of a Random Variable
• Generation of a Random Variable
• Jointly Distributed Random Variables
• Scalar Detection
Probability Theory
• Probability theory provides the mathematical rules for assigning probabilities to
outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
• Basic elements of probability theory:
◦ Sample space Ω: set of all possible “elementary” or “finest grain” outcomes
of the random experiment
◦ Set of events F : set of (all?) subsets of Ω — an event A ⊂ Ω occurs if the
outcome ω ∈ A
◦ Probability measure P: function over F that assigns probabilities to events
according to the axioms of probability (see below)
• Formally, a probability space is the triple (Ω, F, P)
Axioms of Probability
• A probability measure P satisfies the following axioms:
1. P(A) ≥ 0 for every event A in F
2. P(Ω) = 1
3. If A1, A2, . . . are disjoint events, i.e., Ai ∩ Aj = ∅ for all i ≠ j, then
   P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai)
• Notes:
◦ P is a measure in the same sense as mass, length, area, and volume — all
satisfy axioms 1 and 3
◦ Unlike these other measures, P is bounded by 1 (axiom 2)
◦ This analogy provides some intuition but is not sufficient to fully understand
probability theory — other aspects such as conditioning and independence are
unique to probability
Discrete Probability Spaces
• A sample space Ω is said to be discrete if it is countable
• Examples:
◦ Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
◦ Flipping a coin n times: Ω = {H, T}^n , sequences of heads/tails of length n
◦ Flipping a coin until the first heads occurs: Ω = {H, T H, T T H, T T T H, . . .}
• For discrete sample spaces, the set of events F can be taken to be the set of all
subsets of Ω, sometimes called the power set of Ω
• Example: For a single coin flip,
F = {∅, {H}, {T }, Ω}
• F does not have to be the entire power set (more on this later)
• The probability measure P can be defined by assigning probabilities to individual
outcomes — single outcome events {ω} — so that:
P({ω}) ≥ 0 for every ω ∈ Ω
Σ_{ω∈Ω} P({ω}) = 1
• The probability of any other event A is simply
P(A) = Σ_{ω∈A} P({ω})
• Example: For the die rolling experiment, assign
  P({i}) = 1/6 for i = 1, 2, . . . , 6
  The probability of the event “the outcome is even,” A = {2, 4, 6}, is
  P(A) = P({2}) + P({4}) + P({6}) = 3/6 = 1/2
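• A quick computational sketch of the die-rolling example above (not part of the original notes; assumes Python with the standard library):

  # A discrete probability space as a dict mapping each outcome to P({omega}).
  from fractions import Fraction

  P = {i: Fraction(1, 6) for i in range(1, 7)}     # P({i}) = 1/6, i = 1, ..., 6

  def prob(event):
      """P(A) = sum of P({omega}) over omega in A."""
      return sum(P[w] for w in event)

  assert sum(P.values()) == 1                      # axiom 2: P(Omega) = 1
  print(prob({2, 4, 6}))                           # event "outcome is even" -> 1/2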
Continuous Probability Spaces
• A continuous sample space Ω has an uncountable number of elements
• Examples:
◦ Random number between 0 and 1: Ω = (0, 1 ]
◦ Point in the unit disk: Ω = {(x, y) : x² + y² ≤ 1}
◦ Arrival times of n packets: Ω = (0, ∞)^n
• For continuous Ω, we cannot in general define the probability measure P by first
assigning probabilities to outcomes
• To see why, consider assigning a uniform probability measure over (0, 1 ]
◦ In this case the probability of each single outcome event is zero
◦ How do we find the probability of an event such as A = [0.25, 0.75]?
• Another difference for continuous Ω: we cannot take the set of events F as the
power set of Ω. (To learn why, you need to study measure theory, which is
beyond the scope of this course)
• The set of events F cannot be an arbitrary collection of subsets of Ω. It must
make sense, e.g., if A is an event, then its complement Ac must also be an
event, the union of two events must be an event, and so on
• Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the
following axioms:
1. ∅ ∈ F
2. If A ∈ F then Ac ∈ F
3. If A1, A2, . . . ∈ F then ⋃_{i=1}^∞ Ai ∈ F
• Of course, the power set is a sigma algebra. But we can define smaller
σ-algebras. For example, for rolling a die, we could define the set of events as
F = {∅, odd, even, Ω}
• For Ω = R = (−∞, ∞) (or (0, ∞), (0, 1), etc.) F is typically defined as the
family of sets obtained by starting from the intervals and taking countable
unions, intersections, and complements
• The resulting F is called the Borel field
• Note: Amazingly, there are subsets of R that cannot be generated in this way!
(Not ones that you are likely to encounter in your life as an engineer or even as
a mathematician)
• To define a probability measure over a Borel field, we first assign probabilities to
the intervals in a consistent way, i.e., in a way that satisfies the axioms of
probability
For example, to define the uniform probability measure over (0, 1), we first assign
P((a, b)) = b − a to all intervals (a, b) ⊆ (0, 1)
• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of)
knowing what they are
Useful Probability Laws
• Union of Events Bound:
  P( ⋃_{i=1}^n Ai ) ≤ Σ_{i=1}^n P(Ai)
• Law of Total Probability: Let A1, A2, A3, . . . be events that partition Ω, i.e.,
  disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω. Then for any event B,
  P(B) = Σ_i P(Ai ∩ B)
The Law of Total Probability is very useful for finding probabilities of sets
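• A small numerical sketch of both laws on the fair-die space (illustrative, not from the notes; plain Python):

  from fractions import Fraction

  Omega = set(range(1, 7))
  P = {w: Fraction(1, 6) for w in Omega}
  prob = lambda A: sum(P[w] for w in A)

  # Union of events bound: P(A1 ∪ A2) <= P(A1) + P(A2)
  A1, A2 = {1, 2, 3}, {3, 4}
  assert prob(A1 | A2) <= prob(A1) + prob(A2)

  # Law of total probability with the partition {1,2}, {3,4}, {5,6}
  partition = [{1, 2}, {3, 4}, {5, 6}]
  B = {2, 4, 6}
  assert prob(B) == sum(prob(Ai & B) for Ai in partition)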
Conditional Probability
• Let B be an event such that P(B) ≠ 0. The conditional probability of event A
  given B is defined to be
  P(A | B) = P(A ∩ B)/P(B) = P(A, B)/P(B)
• The function P(· | B) is a probability measure over F , i.e., it satisfies the
axioms of probability
• Chain rule: P(A, B) = P(A)P(B | A) = P(B)P(A | B) (this can be generalized
to n events)
• The probability of event A given B, a nonzero probability event (the
  a posteriori probability of A), is related to the unconditional probability of
  A (the a priori probability) by
  P(A | B) = P(B | A) P(A) / P(B)
This follows directly from the definition of conditional probability
Bayes Rule
• Let A1, A2, . . . , An be nonzero probability events that partition Ω, and let B be
a nonzero probability event
• We know P(Ai ) and P(B | Ai), i = 1, 2, . . . , n, and want to find the a posteriori
probabilities P(Aj | B), j = 1, 2, . . . , n
• We know that
  P(Aj | B) = P(B | Aj) P(Aj) / P(B)
• By the law of total probability
  P(B) = Σ_{i=1}^n P(Ai, B) = Σ_{i=1}^n P(Ai) P(B | Ai)
• Substituting, we obtain Bayes rule
  P(Aj | B) = P(B | Aj) P(Aj) / Σ_{i=1}^n P(Ai) P(B | Ai) ,  j = 1, 2, . . . , n
• Bayes rule also applies to a (countably) infinite number of events
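• A hedged numerical sketch of Bayes rule (the example numbers below are illustrative, not from the notes): priors P(Ai) over a partition, likelihoods P(B | Ai), and the resulting posteriors P(Aj | B):

  priors = [0.5, 0.3, 0.2]            # P(A_1), P(A_2), P(A_3): a partition of Omega
  likelihoods = [0.9, 0.5, 0.1]       # P(B | A_i)

  P_B = sum(p * l for p, l in zip(priors, likelihoods))   # law of total probability
  posteriors = [p * l / P_B for p, l in zip(priors, likelihoods)]

  print(P_B)              # P(B) = 0.62
  print(posteriors)       # a posteriori probabilities; they sum to 1
  assert abs(sum(posteriors) - 1) < 1e-12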
Independence
• Two events are said to be statistically independent if
P(A, B) = P(A)P(B)
• When P(B) ≠ 0, this is equivalent to
P(A | B) = P(A)
In other words, knowing whether B occurs does not change the probability of A
• The events A1, A2, . . . , An are said to be independent if for every subset
Ai1 , Ai2 , . . . , Aik of the events,
P(Ai1, Ai2, . . . , Aik) = Π_{j=1}^k P(Aij)
• Note: P(A1, A2, . . . , An) = Π_{j=1}^n P(Aj) is not sufficient for independence
Random Variables
• A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω,
i.e., X : Ω → R
[Figure: a random variable X maps each outcome ω ∈ Ω to a real number X(ω)]
• Notations:
◦ We use upper case letters for random variables: X, Y, Z, Φ, Θ, . . .
◦ We use lower case letters for values of random variables: X = x means that
random variable X takes on the value x, i.e., X(ω) = x where ω is the
outcome
Specifying a Random Variable
• Specifying a random variable means being able to determine the probability that
X ∈ A for any Borel set A ⊂ R, in particular, for any interval (a, b ]
• To do so, consider the inverse image of A under X , i.e., {ω : X(ω) ∈ A}
[Figure: a set A ⊂ R and its inverse image under X(ω), i.e., {ω : X(ω) ∈ A}]
• Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},
P({X ∈ A}) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}
Shorthand: P({set description}) = P{set description}
Cumulative Distribution Function (CDF)
• We need to be able to determine P{X ∈ A} for any Borel set A ⊂ R, i.e., any
set generated by starting from intervals and taking countable unions,
intersections, and complements
• Hence, it suffices to specify P{X ∈ (a, b ]} for all intervals. The probability of
any other Borel set can be determined by the axioms of probability
• Equivalently, it suffices to specify its cumulative distribution function (cdf):
FX (x) = P{X ≤ x} = P{X ∈ (−∞, x ]} ,
x∈R
• Properties of cdf:
◦ FX (x) ≥ 0
◦ FX (x) is monotonically nondecreasing, i.e., if a > b then FX (a) ≥ FX (b)
[Figure: a typical cdf FX (x), nondecreasing from 0 to 1]
◦ Limits: lim_{x→+∞} FX (x) = 1 and lim_{x→−∞} FX (x) = 0
◦ FX (x) is right continuous, i.e., FX (a+) = lim_{x→a+} FX (x) = FX (a)
◦ P{X = a} = FX (a) − FX (a−), where FX (a−) = lim_{x→a−} FX (x)
◦ For any Borel set A, P{X ∈ A} can be determined from FX (x)
• Notation: X ∼ FX (x) means that X has cdf FX (x)
Probability Mass Function (PMF)
• A random variable is said to be discrete if FX (x) consists only of steps over a
  countable set X
  [Figure: a staircase cdf of a discrete random variable, rising from 0 to 1]
• Hence, a discrete random variable can be completely specified by the probability
mass function (pmf)
pX (x) = P{X = x} for every x ∈ X
Clearly pX (x) ≥ 0 and Σ_{x∈X} pX (x) = 1
• Notation: We use X ∼ pX (x) or simply X ∼ p(x) to mean that the discrete
random variable X has pmf pX (x) or p(x)
• Famous discrete random variables:
◦ Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has the pmf
pX (1) = p and pX (0) = 1 − p
◦ Geometric: X ∼ Geom(p) for 0 ≤ p ≤ 1 has the pmf
pX (k) = p(1 − p)k−1 ,
k = 1, 2, 3, . . .
◦ Binomial: X ∼ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf
  pX (k) = (n choose k) p^k (1 − p)^{n−k} ,  k = 0, 1, . . . , n
◦ Poisson: X ∼ Poisson(λ) for λ > 0 has the pmf
pX (k) = (λ^k / k!) e^{−λ} ,  k = 0, 1, 2, . . .
◦ Remark: Poisson is the limit of Binomial for np = λ as n → ∞, i.e., for every
  k = 0, 1, 2, . . ., the Binom(n, λ/n) pmf
  pX (k) → (λ^k / k!) e^{−λ}  as n → ∞
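• A quick numerical check of this remark (a sketch, assuming scipy is available; λ = 3 is an arbitrary choice):

  from scipy.stats import binom, poisson

  lam = 3.0
  for n in (10, 100, 10_000):
      max_gap = max(abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam))
                    for k in range(20))
      print(n, max_gap)   # the gap to the Poisson(lam) pmf shrinks as n grows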
Probability Density Function (PDF)
• A random variable is said to be continuous if its cdf is a continuous function
  [Figure: a continuous cdf rising smoothly from 0 to 1]
• If FX (x) is continuous and differentiable (except possibly over a countable set),
then X can be completely specified by a probability density function (pdf)
fX (x) such that
FX (x) = ∫_{−∞}^x fX (u) du
• If FX (x) is differentiable everywhere, then (by definition of derivative)
fX (x) = dFX (x)/dx = lim_{∆x→0} [F(x + ∆x) − F(x)]/∆x
       = lim_{∆x→0} P{x < X ≤ x + ∆x}/∆x
• Properties of pdf:
◦ fX (x) ≥ 0
◦ ∫_{−∞}^∞ fX (x) dx = 1
◦ For any event (Borel set) A ⊂ R,
  P{X ∈ A} = ∫_{x∈A} fX (x) dx
  In particular,
  P{x1 < X ≤ x2} = ∫_{x1}^{x2} fX (x) dx
• Important note: fX (x) should not be interpreted as the probability that X = x.
In fact, fX (x) is not a probability measure since it can be > 1
• Notation: X ∼ fX (x) means that X has pdf fX (x)
• Famous continuous random variables:
◦ Uniform: X ∼ U[a, b ] where a < b has pdf
  fX (x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise
◦ Exponential: X ∼ Exp(λ) where λ > 0 has pdf
  fX (x) = λ e^{−λx} if x ≥ 0, and 0 otherwise
◦ Laplace: X ∼ Laplace(λ) where λ > 0 has pdf
  fX (x) = (1/2) λ e^{−λ|x|}
◦ Gaussian: X ∼ N (µ, σ²) with parameters µ (the mean) and σ² (the
  variance; σ is the standard deviation) has pdf
  fX (x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
The cdf of the standard normal random variable N (0, 1) is
  Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du
Define the function Q(x) = 1 − Φ(x) = P{X > x}
[Figure: the N (0, 1) pdf with the tail area Q(x) shaded to the right of x]
The Q(·) function is used to compute P{X > a} for any Gaussian r.v. X :
Given Y ∼ N (µ, σ²), we represent it using the standard X ∼ N (0, 1) as
Y = σX + µ. Then
  P{Y > y} = P{X > (y − µ)/σ} = Q((y − µ)/σ)
◦ The complementary error function is erfc(x) = 2Q(√2 x)
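• A short sketch of these relations (assumes Python with scipy; the numbers µ = 1, σ = 2, y = 2.5 are arbitrary):

  from math import erfc, sqrt
  from scipy.stats import norm

  def Q(x):
      return 0.5 * erfc(x / sqrt(2))      # erfc(x) = 2 Q(sqrt(2) x)  =>  Q(x) = erfc(x/sqrt(2))/2

  mu, sigma, y = 1.0, 2.0, 2.5
  print(Q((y - mu) / sigma))              # P{Y > y} via the standardization Y = sigma*X + mu
  print(norm.sf(y, loc=mu, scale=sigma))  # same value computed directly from the N(mu, sigma^2) cdf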
Functions of a Random Variable
• Suppose we are given a r.v. X with known cdf FX (x) and a function y = g(x).
What is the cdf of the random variable Y = g(X)?
• We use
FY (y) = P{Y ≤ y} = P{x : g(x) ≤ y}
[Figure: a function y = g(x) and, on the x-axis, the set {x : g(x) ≤ y} whose probability gives FY (y)]
• Example: Quadratic function. Let X ∼ FX (x) and Y = X 2 . We wish to find
FY (y)
[Figure: the parabola y = x²; for a given y ≥ 0, the event {Y ≤ y} corresponds to X between −√y and √y]
If y < 0, then clearly FY (y) = 0. Consider y ≥ 0,
FY (y) = P{−√y < X ≤ √y} = FX (√y) − FX (−√y)
If X is continuous with density fX (x), then
fY (y) = (1/(2√y)) [ fX (√y) + fX (−√y) ]
• Remark: In general, let X ∼ fX (x) and Y = g(X), where g is differentiable. Then
  fY (y) = Σ_{i=1}^k fX (xi) / |g′(xi)| ,
  where x1, x2, . . . , xk are the solutions of the equation y = g(x) and g′(xi) is the
  derivative of g evaluated at xi
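• An illustrative check of this formula for Y = X² with X ∼ N (0, 1) (a sketch, assuming numpy/scipy; the evaluation point y = 1.5 and the histogram width are arbitrary):

  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(0)
  y_samples = rng.standard_normal(200_000) ** 2        # samples of Y = X^2

  y = 1.5                                               # evaluate the density at y = 1.5
  roots = [np.sqrt(y), -np.sqrt(y)]                     # solutions of y = g(x) = x^2
  f_formula = sum(norm.pdf(x) / abs(2 * x) for x in roots)   # g'(x) = 2x

  h = 0.05                                              # crude histogram estimate around y
  f_empirical = np.mean(np.abs(y_samples - y) < h) / (2 * h)
  print(f_formula, f_empirical)                         # the two numbers should be close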
• Example: Limiter. Let X ∼ Laplace(1), i.e., fX (x) = (1/2)e−|x| , and let Y be
defined by the function of X shown in the figure. Find the cdf of Y
[Figure: the limiter y = g(x): y = −a for x ≤ −1, y = ax for −1 < x < 1, and y = +a for x ≥ 1]
To find the cdf of Y , we consider the following cases
◦ y < −a: Here clearly FY (y) = 0
◦ y = −a: Here
  FY (−a) = FX (−1) = ∫_{−∞}^{−1} (1/2) e^x dx = (1/2) e^{−1}
◦ −a < y < a: Here
  FY (y) = P{Y ≤ y} = P{aX ≤ y} = P{X ≤ y/a} = FX (y/a)
         = (1/2) e^{−1} + ∫_{−1}^{y/a} (1/2) e^{−|x|} dx
◦ y ≥ a: Here FY (y) = 1
Combining the results, the following is a sketch of the cdf of Y
[Figure: sketch of FY (y): zero for y < −a, jumps of size (1/2)e^{−1} at y = −a and y = a, and continuous increase in between]
Generation of Random Variables
• Generating a r.v. with a prescribed distribution is often needed for performing
simulations involving random phenomena, e.g., noise or random arrivals
• First let X ∼ F (x) where the cdf F (x) is continuous and strictly increasing.
Define Y = F (X), a real-valued random variable that is a function of X
What is the cdf of Y ?
Clearly, FY (y) = 0 for y < 0, and FY (y) = 1 for y > 1
For 0 ≤ y ≤ 1, note that by assumption F has an inverse F −1 , so
FY (y) = P{Y ≤ y} = P{F (X) ≤ y} = P{X ≤ F −1(y)} = F (F −1(y)) = y
Thus Y ∼ U [ 0, 1 ], i.e., Y is a uniformly distributed random variable
• Note: F (x) does not need to be invertible. If F (x) = a is constant over some
interval, then the probability that X lies in this interval is zero. Without loss of
generality, we can take F −1(a) to be the leftmost point of the interval
• Conclusion: We can generate a U[ 0, 1 ] r.v. from any continuous r.v.
• Now, let’s consider the more useful scenario where we are given X ∼ U[ 0, 1 ] (a
random number generator) and wish to generate a random variable Y with
prescribed cdf F (y), e.g., Gaussian or exponential
[Figure: the cdf x = F (y) and its inverse y = F^{−1}(x), which maps (0, 1) back to values of y]
• If F is continuous and strictly increasing, set Y = F −1(X). To show Y ∼ F (y),
FY (y) = P{Y ≤ y}
= P{F −1(X) ≤ y}
= P{X ≤ F (y)}
= F (y) ,
since X ∼ U[ 0, 1 ] and 0 ≤ F (y) ≤ 1
• Example: To generate Y ∼ Exp(λ), set
Y = −(1/λ) ln(1 − X)
• Note: F does not need to be continuous for the above to work. For example, to
generate Y ∼ Bern(p), we set
Y = 0 if X ≤ 1 − p, and Y = 1 otherwise
[Figure: the Bern(p) cdf x = F (y), equal to 1 − p for 0 ≤ y < 1 and to 1 for y ≥ 1]
• Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
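• A sketch of inverse-cdf generation from U[0, 1] matching the two examples above (assumes numpy; λ = 2 and p = 0.3 are arbitrary):

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.uniform(size=100_000)                 # X ~ U[0, 1]

  lam, p = 2.0, 0.3
  Y_exp = -np.log(1 - X) / lam                  # Y ~ Exp(lam); sample mean should be ~ 1/lam
  Y_bern = (X > 1 - p).astype(int)              # Y = 0 if X <= 1-p, else 1; P{Y = 1} ~ p

  print(Y_exp.mean(), 1 / lam)
  print(Y_bern.mean(), p)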
Jointly Distributed Random Variables
• A pair of random variables defined over the same probability space are specified
by their joint cdf
FX,Y (x, y) = P{X ≤ x, Y ≤ y} ,
x, y ∈ R
FX,Y (x, y) is the probability of the shaded region of R2
[Figure: the region {(u, v) : u ≤ x, v ≤ y}, the lower-left quadrant with corner at (x, y), shaded in the plane]
• Properties of the cdf:
◦ FX,Y (x, y) ≥ 0
◦ If x1 ≤ x2 and y1 ≤ y2 then FX,Y (x1, y1) ≤ FX,Y (x2, y2)
◦ lim_{y→−∞} FX,Y (x, y) = 0 and lim_{x→−∞} FX,Y (x, y) = 0
◦ lim_{y→∞} FX,Y (x, y) = FX (x) and lim_{x→∞} FX,Y (x, y) = FY (y)
  FX (x) and FY (y) are the marginal cdfs of X and Y
◦ lim_{x,y→∞} FX,Y (x, y) = 1
• X and Y are independent if for every x and y
FX,Y (x, y) = FX (x)FY (y)
Joint, Marginal, and Conditional PMFs
• Let X and Y be discrete random variables on the same probability space
• They are completely specified by their joint pmf :
pX,Y (x, y) = P{X = x, Y = y} , x ∈ X , y ∈ Y
By the axioms of probability, Σ_{x∈X} Σ_{y∈Y} pX,Y (x, y) = 1
• To find pX (x), the marginal pmf of X , we use the law of total probability
pX (x) = Σ_{y∈Y} pX,Y (x, y) ,  x ∈ X
• The conditional pmf of X given Y = y is defined as
pX|Y (x|y) = pX,Y (x, y) / pY (y) ,  pY (y) ≠ 0, x ∈ X
• Chain rule: pX,Y (x, y) = pX (x)pY |X (y|x) = pY (y)pX|Y (x|y)
• Independence: X and Y are said to be independent if for every (x, y) ∈ X × Y ,
pX,Y (x, y) = pX (x)pY (y) ,
which is equivalent to pX|Y (x|y) = pX (x) for every x ∈ X and y ∈ Y such
that pY (y) ≠ 0
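• A small sketch of joint, marginal, and conditional pmfs (the joint pmf below is made up for illustration; assumes numpy):

  import numpy as np

  # rows index x in {0, 1}, columns index y in {0, 1, 2}
  p_xy = np.array([[0.10, 0.20, 0.10],
                   [0.20, 0.25, 0.15]])
  assert abs(p_xy.sum() - 1) < 1e-12

  p_x = p_xy.sum(axis=1)                 # marginal pmf of X (law of total probability)
  p_y = p_xy.sum(axis=0)                 # marginal pmf of Y
  p_x_given_y = p_xy / p_y               # p_{X|Y}(x|y); each column sums to 1

  print(p_x, p_y)
  print(p_x_given_y.sum(axis=0))
  print(np.allclose(p_xy, np.outer(p_x, p_y)))   # independence check (False for this example)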
Joint, Marginal, and Conditional PDF
• X and Y are jointly continuous random variables if their joint cdf is continuous
in both x and y
In this case, we can define their joint pdf, provided that it exists, as the function
fX,Y (x, y) such that
FX,Y (x, y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y (u, v) dv du ,  x, y ∈ R
• If FX,Y (x, y) is differentiable in x and y, then
fX,Y (x, y) = ∂²FX,Y (x, y)/∂x∂y
            = lim_{∆x,∆y→0} P{x < X ≤ x + ∆x, y < Y ≤ y + ∆y}/(∆x ∆y)
• Properties of fX,Y (x, y):
◦ fX,Y (x, y) ≥ 0
◦ ∫_{−∞}^∞ ∫_{−∞}^∞ fX,Y (x, y) dx dy = 1
• The marginal pdf of X can be obtained from the joint pdf via the law of total
probability:
fX (x) = ∫_{−∞}^∞ fX,Y (x, y) dy
• X and Y are independent iff fX,Y (x, y) = fX (x)fY (y) for every x, y
• Conditional cdf and pdf: Let X and Y be continuous random variables with
joint pdf fX,Y (x, y). We wish to define FY |X (y | X = x) = P{Y ≤ y | X = x}
We cannot define the above conditional probability as
P{Y ≤ y, X = x} / P{X = x}
because both numerator and denominator are equal to zero. Instead, we define
conditional probability for continuous random variables as a limit
FY |X (y|x) = lim_{∆x→0} P{Y ≤ y | x < X ≤ x + ∆x}
            = lim_{∆x→0} P{Y ≤ y, x < X ≤ x + ∆x} / P{x < X ≤ x + ∆x}
            = lim_{∆x→0} [ ∫_{−∞}^y fX,Y (x, u) du ∆x ] / [ fX (x) ∆x ]
            = ∫_{−∞}^y [ fX,Y (x, u) / fX (x) ] du
• We then define the conditional pdf in the usual way as
  fY |X (y|x) = fX,Y (x, y) / fX (x)  if fX (x) ≠ 0
• Thus
  FY |X (y|x) = ∫_{−∞}^y fY |X (u|x) du ,
  which shows that fY |X (y|x) is a pdf for Y given X = x, i.e.,
  Y | {X = x} ∼ fY |X (y|x)
• Independence: X and Y are independent if fX,Y (x, y) = fX (x)fY (y) for every
(x, y)
Mixed Random Variables
• Let Θ be a discrete random variable with pmf pΘ(θ)
• For each Θ = θ with pΘ(θ) ≠ 0, let Y be a continuous random variable, i.e.,
FY |Θ(y|θ) is continuous for all θ. We define fY |Θ(y|θ) in the usual way
• The conditional pmf of Θ given y can be defined as a limit
pΘ|Y (θ | y) = lim_{∆y→0} P{Θ = θ, y < Y ≤ y + ∆y} / P{y < Y ≤ y + ∆y}
             = lim_{∆y→0} pΘ(θ) fY |Θ(y|θ) ∆y / ( fY (y) ∆y )
             = pΘ(θ) fY |Θ(y|θ) / fY (y)
Bayes Rule for Random Variables
• Bayes Rule for pmfs: Given pX (x) and pY |X (y|x), then
pX|Y (x|y) = pY |X (y|x) pX (x) / Σ_{x′∈X} pY |X (y|x′) pX (x′)
• Bayes rule for densities: Given fX (x) and fY |X (y|x), then
fX|Y (x|y) = fY |X (y|x) fX (x) / ∫_{−∞}^∞ fX (u) fY |X (y|u) du
• Bayes rule for mixed r.v.s: Given pΘ(θ) and fY |Θ(y|θ), then
pΘ|Y (θ | y) = fY |Θ(y|θ) pΘ(θ) / Σ_{θ′} pΘ(θ′) fY |Θ(y|θ′)
Conversely, given fY (y) and pΘ|Y (θ|y), then
fY |Θ(y|θ) = pΘ|Y (θ|y) fY (y) / ∫ fY (y′) pΘ|Y (θ|y′) dy′
• Example: Additive Gaussian Noise Channel
Consider the following communication channel:
[Figure: additive noise channel: input Θ, noise Z ∼ N (0, N ), output Y = Θ + Z]
The signal transmitted is a binary random variable Θ:
Θ = +1 with probability p, and Θ = −1 with probability 1 − p
The received signal, also called the observation, is Y = Θ + Z , where Θ and Z
are independent
Given Y = y is received (observed), find pΘ|Y (θ|y), the a posteriori pmf of Θ
Solution: We use Bayes rule
pΘ|Y (θ|y) = fY |Θ(y|θ) pΘ(θ) / Σ_{θ′} pΘ(θ′) fY |Θ(y|θ′)
We are given pΘ(θ): pΘ(+1) = p and pΘ(−1) = 1 − p,
and fY |Θ(y|θ) = fZ (y − θ):
Y | {Θ = +1} ∼ N (+1, N ) and Y | {Θ = −1} ∼ N (−1, N )
Therefore
pΘ|Y (+1|y) = [ (p/√(2πN)) e^{−(y−1)²/(2N)} ] / [ (p/√(2πN)) e^{−(y−1)²/(2N)} + ((1 − p)/√(2πN)) e^{−(y+1)²/(2N)} ]
            = p e^{y/N} / ( p e^{y/N} + (1 − p) e^{−y/N} )   for − ∞ < y < ∞
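• A sketch checking the two forms of this a posteriori pmf numerically (assumes numpy/scipy; the values p = 0.6 and N = 0.5 are illustrative):

  import numpy as np
  from scipy.stats import norm

  p, N = 0.6, 0.5

  def posterior_plus1(y):
      # direct Bayes-rule form, using f_{Y|Theta}(y|theta) = f_Z(y - theta)
      num = p * norm.pdf(y, loc=+1, scale=np.sqrt(N))
      den = num + (1 - p) * norm.pdf(y, loc=-1, scale=np.sqrt(N))
      return num / den

  def posterior_simplified(y):
      return p * np.exp(y / N) / (p * np.exp(y / N) + (1 - p) * np.exp(-y / N))

  for y in (-2.0, 0.0, 1.5):
      print(y, posterior_plus1(y), posterior_simplified(y))   # the two forms agree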
Scalar Detection
• Consider the following general digital communication system
[Figure: Θ ∈ {θ0, θ1} → noisy channel fY |Θ(y|θ) → Y → decoder → Θ̂(Y ) ∈ {θ0, θ1}]
where the signal sent is
Θ = θ0 with probability p, and Θ = θ1 with probability 1 − p,
and the observation (received signal) is
Y | {Θ = θ} ∼ fY |Θ(y | θ) ,  θ ∈ {θ0, θ1}
• We wish to find the estimate Θ̂(Y ) (i.e., design the decoder) that minimizes the
  probability of error:
  Pe ≜ P{Θ̂ ≠ Θ} = P{Θ = θ0, Θ̂ = θ1} + P{Θ = θ1, Θ̂ = θ0}
     = P{Θ = θ0}P{Θ̂ = θ1 | Θ = θ0} + P{Θ = θ1}P{Θ̂ = θ0 | Θ = θ1}
• We define the maximum a posteriori probability (MAP) decoder as
Θ̂(y) = θ0 if pΘ|Y (θ0|y) > pΘ|Y (θ1|y), and θ1 otherwise
• The MAP decoding rule minimizes Pe , since
Pe = 1 − P{Θ̂(Y ) = Θ} = 1 − ∫_{−∞}^∞ fY (y) P{Θ̂(y) = Θ | Y = y} dy
and the integral is maximized when we pick the largest P{Θ̂(y) = Θ | Y = y}
for each y, which is precisely the MAP decoder
• If p = 1/2, i.e., equally likely signals, using Bayes rule the MAP decoder reduces
  to the maximum likelihood (ML) decoder
  Θ̂(y) = θ0 if fY |Θ(y|θ0) > fY |Θ(y|θ1), and θ1 otherwise
Additive Gaussian Noise Channel
• Consider the additive Gaussian noise channel with signal
Θ = +√P with probability 1/2, and Θ = −√P with probability 1/2,
noise Z ∼ N (0, N ) (Θ and Z are independent), and output Y = Θ + Z
• The MAP decoder is
Θ̂(y) = +√P if P{Θ = +√P | Y = y} / P{Θ = −√P | Y = y} > 1, and −√P otherwise
Since the two signals are equally likely, the MAP decoding rule reduces to the
ML decoding rule
  Θ̂(y) = +√P if fY |Θ(y | +√P) / fY |Θ(y | −√P) > 1, and −√P otherwise
• Using the Gaussian pdf, the ML decoder reduces to the minimum distance
decoder
Θ̂(y) = +√P if (y − √P)² < (y − (−√P))², and −√P otherwise
From the figure, this simplifies to
Θ̂(y) = +√P if y > 0, and −√P if y < 0
Note: The decision when y = 0 is arbitrary
[Figure: the conditional pdfs f(y | −√P) and f(y | +√P), centered at −√P and +√P; they cross at y = 0]
• Now to find the minimum probability of error, consider
Pe = P{Θ̂(Y ) ≠ Θ}
   = P{Θ = +√P} P{Θ̂(Y ) = −√P | Θ = +√P} + P{Θ = −√P} P{Θ̂(Y ) = +√P | Θ = −√P}
   = (1/2) P{Y ≤ 0 | Θ = +√P} + (1/2) P{Y > 0 | Θ = −√P}
   = (1/2) P{Z ≤ −√P} + (1/2) P{Z > √P}
   = Q(√(P/N)) = Q(√SNR)
The probability of error is a decreasing function of P/N , the signal-to-noise
ratio (SNR)
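• A Monte Carlo sketch of this detector (assumes numpy/scipy; P = 1 and N = 0.25, i.e., SNR = 4, are arbitrary choices): the simulated error rate should match Pe = Q(√(P/N)):

  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(2)
  P, N = 1.0, 0.25                                      # signal power and noise variance
  n = 1_000_000

  theta = np.sqrt(P) * rng.choice([-1, +1], size=n)     # equally likely +-sqrt(P)
  y = theta + rng.normal(scale=np.sqrt(N), size=n)      # Y = Theta + Z, Z ~ N(0, N)
  theta_hat = np.where(y > 0, np.sqrt(P), -np.sqrt(P))  # decide by the sign of y

  print((theta_hat != theta).mean())                    # simulated Pe
  print(norm.sf(np.sqrt(P / N)))                        # Q(sqrt(SNR)) = Q(2), about 0.0228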