Lecture Notes 1: Probability and Random Variables

• Probability Spaces
• Conditional Probability and Independence
• Random Variables
• Functions of a Random Variable
• Generation of a Random Variable
• Jointly Distributed Random Variables
• Scalar Detection

Probability Theory

• Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
• Basic elements of probability theory:
◦ Sample space $\Omega$: set of all possible "elementary" or "finest grain" outcomes of the random experiment
◦ Set of events $\mathcal{F}$: set of (all?) subsets of $\Omega$ — an event $A \subset \Omega$ occurs if the outcome $\omega \in A$
◦ Probability measure $\mathrm{P}$: function over $\mathcal{F}$ that assigns probabilities to events according to the axioms of probability (see below)
• Formally, a probability space is the triple $(\Omega, \mathcal{F}, \mathrm{P})$

Axioms of Probability

• A probability measure P satisfies the following axioms:
1. $\mathrm{P}(A) \ge 0$ for every event $A$ in $\mathcal{F}$
2. $\mathrm{P}(\Omega) = 1$
3. If $A_1, A_2, \ldots$ are disjoint events — i.e., $A_i \cap A_j = \emptyset$ for all $i \neq j$ — then
$$\mathrm{P}\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mathrm{P}(A_i)$$
• Notes:
◦ P is a measure in the same sense as mass, length, area, and volume — all satisfy axioms 1 and 3
◦ Unlike these other measures, P is bounded by 1 (axiom 2)
◦ This analogy provides some intuition but is not sufficient to fully understand probability theory — other aspects such as conditioning and independence are unique to probability

Discrete Probability Spaces

• A sample space $\Omega$ is said to be discrete if it is countable
• Examples:
◦ Rolling a die: $\Omega = \{1, 2, 3, 4, 5, 6\}$
◦ Flipping a coin $n$ times: $\Omega = \{H, T\}^n$, sequences of heads/tails of length $n$
◦ Flipping a coin until the first heads occurs: $\Omega = \{H, TH, TTH, TTTH, \ldots\}$
• For discrete sample spaces, the set of events $\mathcal{F}$ can be taken to be the set of all subsets of $\Omega$, sometimes called the power set of $\Omega$
• Example: For the coin flipping experiment, $\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \Omega\}$
• $\mathcal{F}$ does not have to be the entire power set (more on this later)
• The probability measure P can be defined by assigning probabilities to individual outcomes — single outcome events $\{\omega\}$ — so that:
$$\mathrm{P}(\{\omega\}) \ge 0 \ \text{for every } \omega \in \Omega, \qquad \sum_{\omega \in \Omega} \mathrm{P}(\{\omega\}) = 1$$
• The probability of any other event $A$ is simply
$$\mathrm{P}(A) = \sum_{\omega \in A} \mathrm{P}(\{\omega\})$$
• Example: For the die rolling experiment, assign $\mathrm{P}(\{i\}) = \frac{1}{6}$ for $i = 1, 2, \ldots, 6$. The probability of the event "the outcome is even," $A = \{2, 4, 6\}$, is
$$\mathrm{P}(A) = \mathrm{P}(\{2\}) + \mathrm{P}(\{4\}) + \mathrm{P}(\{6\}) = \tfrac{3}{6} = \tfrac{1}{2}$$
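As a quick illustration of defining P on a discrete space by assigning probabilities to single-outcome events, here is a minimal Python sketch of the fair die example above (the helper name `prob` is just illustrative):

```python
from fractions import Fraction

# Discrete probability space for a fair die: assign P({i}) = 1/6 to each outcome.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}

def prob(event, pmf):
    """P(A) = sum of P({w}) over outcomes w in the event A."""
    return sum(pmf[w] for w in event)

even = {2, 4, 6}
print(prob(even, pmf))  # 1/2, matching the computation above
```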
Continuous Probability Spaces

• A continuous sample space $\Omega$ has an uncountable number of elements
• Examples:
◦ Random number between 0 and 1: $\Omega = (0, 1]$
◦ Point in the unit disk: $\Omega = \{(x, y) : x^2 + y^2 \le 1\}$
◦ Arrival times of $n$ packets: $\Omega = (0, \infty)^n$
• For continuous $\Omega$, we cannot in general define the probability measure P by first assigning probabilities to outcomes
• To see why, consider assigning a uniform probability measure over $(0, 1]$
◦ In this case the probability of each single outcome event is zero
◦ How do we find the probability of an event such as $A = [0.25, 0.75]$?
• Another difference for continuous $\Omega$: we cannot take the set of events $\mathcal{F}$ to be the power set of $\Omega$ (to learn why, you need to study measure theory, which is beyond the scope of this course)
• The set of events $\mathcal{F}$ cannot be an arbitrary collection of subsets of $\Omega$. It must make sense, e.g., if $A$ is an event, then its complement $A^c$ must also be an event, the union of two events must be an event, and so on
• Formally, $\mathcal{F}$ must be a sigma algebra ($\sigma$-algebra, $\sigma$-field), which satisfies the following axioms:
1. $\emptyset \in \mathcal{F}$
2. If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$
3. If $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
• Of course, the power set is a sigma algebra. But we can define smaller $\sigma$-algebras. For example, for rolling a die, we could define the set of events as $\mathcal{F} = \{\emptyset, \text{odd}, \text{even}, \Omega\}$
• For $\Omega = \mathbb{R} = (-\infty, \infty)$ (or $(0, \infty)$, $(0, 1)$, etc.), $\mathcal{F}$ is typically defined as the family of sets obtained by starting from the intervals and taking countable unions, intersections, and complements
• The resulting $\mathcal{F}$ is called the Borel field
• Note: Amazingly, there are subsets of $\mathbb{R}$ that cannot be generated in this way! (Not ones that you are likely to encounter in your life as an engineer or even as a mathematician)
• To define a probability measure over a Borel field, we first assign probabilities to the intervals in a consistent way, i.e., in a way that satisfies the axioms of probability. For example, to define the uniform probability measure over $(0, 1)$, we first assign $\mathrm{P}((a, b)) = b - a$ to all intervals
• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of) knowing what they are

Useful Probability Laws

• Union of Events Bound:
$$\mathrm{P}\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} \mathrm{P}(A_i)$$
• Law of Total Probability: Let $A_1, A_2, A_3, \ldots$ be events that partition $\Omega$, i.e., disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$) with $\bigcup_i A_i = \Omega$. Then for any event $B$
$$\mathrm{P}(B) = \sum_i \mathrm{P}(A_i \cap B)$$
The Law of Total Probability is very useful for finding probabilities of events

Conditional Probability

• Let $B$ be an event such that $\mathrm{P}(B) \neq 0$. The conditional probability of event $A$ given $B$ is defined to be
$$\mathrm{P}(A \mid B) = \frac{\mathrm{P}(A \cap B)}{\mathrm{P}(B)} = \frac{\mathrm{P}(A, B)}{\mathrm{P}(B)}$$
• The function $\mathrm{P}(\cdot \mid B)$ is a probability measure over $\mathcal{F}$, i.e., it satisfies the axioms of probability
• Chain rule: $\mathrm{P}(A, B) = \mathrm{P}(A)\,\mathrm{P}(B \mid A) = \mathrm{P}(B)\,\mathrm{P}(A \mid B)$ (this can be generalized to $n$ events)
• The probability of event $A$ given $B$, a nonzero probability event — the a posteriori probability of $A$ — is related to the unconditional probability of $A$ — the a priori probability — by
$$\mathrm{P}(A \mid B) = \frac{\mathrm{P}(B \mid A)}{\mathrm{P}(B)}\, \mathrm{P}(A)$$
This follows directly from the definition of conditional probability

Bayes Rule

• Let $A_1, A_2, \ldots, A_n$ be nonzero probability events that partition $\Omega$, and let $B$ be a nonzero probability event
• We know $\mathrm{P}(A_i)$ and $\mathrm{P}(B \mid A_i)$, $i = 1, 2, \ldots, n$, and want to find the a posteriori probabilities $\mathrm{P}(A_j \mid B)$, $j = 1, 2, \ldots, n$
• We know that
$$\mathrm{P}(A_j \mid B) = \frac{\mathrm{P}(B \mid A_j)}{\mathrm{P}(B)}\, \mathrm{P}(A_j)$$
• By the law of total probability
$$\mathrm{P}(B) = \sum_{i=1}^{n} \mathrm{P}(A_i, B) = \sum_{i=1}^{n} \mathrm{P}(A_i)\,\mathrm{P}(B \mid A_i)$$
• Substituting, we obtain Bayes rule
$$\mathrm{P}(A_j \mid B) = \frac{\mathrm{P}(B \mid A_j)}{\sum_{i=1}^{n} \mathrm{P}(A_i)\,\mathrm{P}(B \mid A_i)}\, \mathrm{P}(A_j), \quad j = 1, 2, \ldots, n$$
• Bayes rule also applies to a (countably) infinite number of events
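The following minimal Python sketch puts the law of total probability and Bayes rule together numerically; the priors and likelihoods are made-up numbers for illustration only:

```python
# Hypothetical priors P(A_i) for a 3-event partition of Omega, and
# hypothetical likelihoods P(B | A_i); the numbers are illustrative only.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes rule: P(A_j | B) = P(B | A_j) P(A_j) / P(B)
posteriors = [l * p / p_b for p, l in zip(priors, likelihoods)]
print(posteriors, sum(posteriors))  # the posteriors sum to 1
```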
Independence

• Two events are said to be statistically independent if
$$\mathrm{P}(A, B) = \mathrm{P}(A)\,\mathrm{P}(B)$$
• When $\mathrm{P}(B) \neq 0$, this is equivalent to
$$\mathrm{P}(A \mid B) = \mathrm{P}(A)$$
In other words, knowing whether $B$ occurs does not change the probability of $A$
• The events $A_1, A_2, \ldots, A_n$ are said to be independent if for every subset $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$ of the events,
$$\mathrm{P}(A_{i_1}, A_{i_2}, \ldots, A_{i_k}) = \prod_{j=1}^{k} \mathrm{P}(A_{i_j})$$
• Note: $\mathrm{P}(A_1, A_2, \ldots, A_n) = \prod_{j=1}^{n} \mathrm{P}(A_j)$ alone is not sufficient for independence

Random Variables

• A random variable (r.v.) is a real-valued function $X(\omega)$ over a sample space $\Omega$, i.e., $X : \Omega \to \mathbb{R}$
[Figure: mapping from an outcome $\omega \in \Omega$ to its value $X(\omega) \in \mathbb{R}$]
• Notation:
◦ We use upper case letters for random variables: $X, Y, Z, \Phi, \Theta, \ldots$
◦ We use lower case letters for values of random variables: $X = x$ means that random variable $X$ takes on the value $x$, i.e., $X(\omega) = x$ where $\omega$ is the outcome

Specifying a Random Variable

• Specifying a random variable means being able to determine the probability that $X \in A$ for any Borel set $A \subset \mathbb{R}$, in particular, for any interval $(a, b]$
• To do so, consider the inverse image of $A$ under $X$, i.e., $\{\omega : X(\omega) \in A\}$
• Since $X \in A$ iff $\omega \in \{\omega : X(\omega) \in A\}$,
$$\mathrm{P}(\{X \in A\}) = \mathrm{P}(\{\omega : X(\omega) \in A\}) = \mathrm{P}\{\omega : X(\omega) \in A\}$$
Shorthand: $\mathrm{P}(\{\text{set description}\}) = \mathrm{P}\{\text{set description}\}$

Cumulative Distribution Function (CDF)

• We need to be able to determine $\mathrm{P}\{X \in A\}$ for any Borel set $A \subset \mathbb{R}$, i.e., any set generated by starting from intervals and taking countable unions, intersections, and complements
• Hence, it suffices to specify $\mathrm{P}\{X \in (a, b]\}$ for all intervals. The probability of any other Borel set can be determined by the axioms of probability
• Equivalently, it suffices to specify its cumulative distribution function (cdf):
$$F_X(x) = \mathrm{P}\{X \le x\} = \mathrm{P}\{X \in (-\infty, x]\}, \quad x \in \mathbb{R}$$
• Properties of the cdf:
◦ $F_X(x) \ge 0$
◦ $F_X(x)$ is monotonically nondecreasing, i.e., if $a > b$ then $F_X(a) \ge F_X(b)$
◦ Limits: $\lim_{x \to +\infty} F_X(x) = 1$ and $\lim_{x \to -\infty} F_X(x) = 0$
◦ $F_X(x)$ is right continuous, i.e., $F_X(a^+) = \lim_{x \to a^+} F_X(x) = F_X(a)$
◦ $\mathrm{P}\{X = a\} = F_X(a) - F_X(a^-)$, where $F_X(a^-) = \lim_{x \to a^-} F_X(x)$
◦ For any Borel set $A$, $\mathrm{P}\{X \in A\}$ can be determined from $F_X(x)$
• Notation: $X \sim F_X(x)$ means that $X$ has cdf $F_X(x)$

Probability Mass Function (PMF)

• A random variable is said to be discrete if $F_X(x)$ consists only of steps over a countable set $\mathcal{X}$
• Hence, a discrete random variable can be completely specified by the probability mass function (pmf)
$$p_X(x) = \mathrm{P}\{X = x\} \ \text{for every } x \in \mathcal{X}$$
Clearly $p_X(x) \ge 0$ and $\sum_{x \in \mathcal{X}} p_X(x) = 1$
• Notation: We use $X \sim p_X(x)$ or simply $X \sim p(x)$ to mean that the discrete random variable $X$ has pmf $p_X(x)$ or $p(x)$
• Famous discrete random variables:
◦ Bernoulli: $X \sim \mathrm{Bern}(p)$ for $0 \le p \le 1$ has the pmf
$$p_X(1) = p \quad \text{and} \quad p_X(0) = 1 - p$$
◦ Geometric: $X \sim \mathrm{Geom}(p)$ for $0 \le p \le 1$ has the pmf
$$p_X(k) = p(1-p)^{k-1}, \quad k = 1, 2, 3, \ldots$$
◦ Binomial: $X \sim \mathrm{Binom}(n, p)$ for integer $n > 0$ and $0 \le p \le 1$ has the pmf
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n$$
◦ Poisson: $X \sim \mathrm{Poisson}(\lambda)$ for $\lambda > 0$ has the pmf
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \ldots$$
◦ Remark: Poisson is the limit of Binomial for $np = \lambda$ as $n \to \infty$, i.e., for every $k = 0, 1, 2, \ldots$, the $\mathrm{Binom}(n, \lambda/n)$ pmf
$$p_X(k) \to \frac{\lambda^k}{k!} e^{-\lambda} \ \text{as } n \to \infty$$
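A small Python sketch, illustrative only, that checks the Poisson-limit remark numerically; the choice $\lambda = 2$ and the range of $k$ are arbitrary:

```python
from math import comb, exp, factorial

# Numerical check: Binom(n, lambda/n) pmf values approach Poisson(lambda)
# values as n grows. lam = 2 and k = 0..4 are arbitrary illustrative choices.
lam = 2.0
for n in (10, 100, 1000):
    binom = [comb(n, k) * (lam / n) ** k * (1 - lam / n) ** (n - k) for k in range(5)]
    print(n, [round(b, 4) for b in binom])

poisson = [lam ** k / factorial(k) * exp(-lam) for k in range(5)]
print("Poisson:", [round(p, 4) for p in poisson])
```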
Probability Density Function (PDF)

• A random variable is said to be continuous if its cdf is a continuous function
• If $F_X(x)$ is continuous and differentiable (except possibly over a countable set), then $X$ can be completely specified by a probability density function (pdf) $f_X(x)$ such that
$$F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$$
• If $F_X(x)$ is differentiable everywhere, then (by definition of derivative)
$$f_X(x) = \frac{dF_X(x)}{dx} = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x} = \lim_{\Delta x \to 0} \frac{\mathrm{P}\{x < X \le x + \Delta x\}}{\Delta x}$$
• Properties of the pdf:
◦ $f_X(x) \ge 0$
◦ $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$
◦ For any event (Borel set) $A \subset \mathbb{R}$,
$$\mathrm{P}\{X \in A\} = \int_{x \in A} f_X(x)\, dx$$
In particular,
$$\mathrm{P}\{x_1 < X \le x_2\} = \int_{x_1}^{x_2} f_X(x)\, dx$$
• Important note: $f_X(x)$ should not be interpreted as the probability that $X = x$. In fact, $f_X(x)$ is not a probability measure since it can be $> 1$
• Notation: $X \sim f_X(x)$ means that $X$ has pdf $f_X(x)$
• Famous continuous random variables:
◦ Uniform: $X \sim \mathrm{U}[a, b]$ where $a < b$ has pdf
$$f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
◦ Exponential: $X \sim \mathrm{Exp}(\lambda)$ where $\lambda > 0$ has pdf
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
◦ Laplace: $X \sim \mathrm{Laplace}(\lambda)$ where $\lambda > 0$ has pdf
$$f_X(x) = \tfrac{1}{2} \lambda e^{-\lambda |x|}$$
◦ Gaussian: $X \sim \mathcal{N}(\mu, \sigma^2)$ with parameters $\mu$ (the mean) and $\sigma^2$ (the variance; $\sigma$ is the standard deviation) has pdf
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The cdf of the standard normal random variable $\mathcal{N}(0, 1)$ is
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\, du$$
Define the function $Q(x) = 1 - \Phi(x) = \mathrm{P}\{X > x\}$. The $Q(\cdot)$ function is used to compute $\mathrm{P}\{X > a\}$ for any Gaussian r.v. $X$: given $Y \sim \mathcal{N}(\mu, \sigma^2)$, we represent it using the standard $X \sim \mathcal{N}(0, 1)$ as $Y = \sigma X + \mu$. Then
$$\mathrm{P}\{Y > y\} = \mathrm{P}\Big\{X > \frac{y - \mu}{\sigma}\Big\} = Q\Big(\frac{y - \mu}{\sigma}\Big)$$
◦ The complementary error function is $\mathrm{erfc}(x) = 2Q(\sqrt{2}\, x)$

Functions of a Random Variable

• Suppose we are given a r.v. $X$ with known cdf $F_X(x)$ and a function $y = g(x)$. What is the cdf of the random variable $Y = g(X)$?
• We use
$$F_Y(y) = \mathrm{P}\{Y \le y\} = \mathrm{P}\{x : g(x) \le y\}$$
• Example: Quadratic function. Let $X \sim F_X(x)$ and $Y = X^2$. We wish to find $F_Y(y)$.
If $y < 0$, then clearly $F_Y(y) = 0$. Consider $y \ge 0$:
$$F_Y(y) = \mathrm{P}\{-\sqrt{y} < X \le \sqrt{y}\} = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$
If $X$ is continuous with density $f_X(x)$, then
$$f_Y(y) = \frac{1}{2\sqrt{y}} \big( f_X(+\sqrt{y}) + f_X(-\sqrt{y}) \big)$$
• Remark: In general, let $X \sim f_X(x)$ and $Y = g(X)$ with $g$ differentiable. Then
$$f_Y(y) = \sum_{i=1}^{k} \frac{f_X(x_i)}{|g'(x_i)|},$$
where $x_1, x_2, \ldots, x_k$ are the solutions of the equation $y = g(x)$ and $g'(x_i)$ is the derivative of $g$ evaluated at $x_i$ (see the simulation sketch after this example)
• Example: Limiter. Let $X \sim \mathrm{Laplace}(1)$, i.e., $f_X(x) = \frac{1}{2} e^{-|x|}$, and let $Y$ be the output of the limiter shown in the figure ($Y = aX$ for $|X| < 1$, saturating at $\pm a$ for $|X| \ge 1$). Find the cdf of $Y$.
To find the cdf of $Y$, we consider the following cases:
◦ $y < -a$: here clearly $F_Y(y) = 0$
◦ $y = -a$: here
$$F_Y(-a) = F_X(-1) = \int_{-\infty}^{-1} \tfrac{1}{2} e^{x}\, dx = \tfrac{1}{2} e^{-1}$$
◦ $-a < y < a$: here
$$F_Y(y) = \mathrm{P}\{Y \le y\} = \mathrm{P}\{aX \le y\} = \mathrm{P}\Big\{X \le \frac{y}{a}\Big\} = F_X\Big(\frac{y}{a}\Big) = \tfrac{1}{2} e^{-1} + \int_{-1}^{y/a} \tfrac{1}{2} e^{-|x|}\, dx$$
◦ $y \ge a$: here $F_Y(y) = 1$
Combining the results gives the cdf of $Y$ [Figure: sketch of $F_Y(y)$, flat outside $[-a, a]$ with jumps at $y = \pm a$]
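To make the change-of-variables formula concrete, here is a Monte Carlo sketch (an illustration, not part of the notes) comparing the derived density of $Y = X^2$ for standard Gaussian $X$ against simulation; the sample size, seed, and bin are arbitrary choices:

```python
import math
import random

# For X ~ N(0, 1) and Y = X^2, the formula above gives
#   f_Y(y) = (f_X(sqrt(y)) + f_X(-sqrt(y))) / (2 sqrt(y)).
# Compare the empirical fraction of samples in a small bin [y0, y0 + dy)
# with the predicted probability f_Y(y0) * dy.
random.seed(0)
n, y0, dy = 200_000, 1.0, 0.05
samples = (random.gauss(0.0, 1.0) ** 2 for _ in range(n))
empirical = sum(y0 <= y < y0 + dy for y in samples) / n

f_X = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
f_Y = (f_X(math.sqrt(y0)) + f_X(-math.sqrt(y0))) / (2 * math.sqrt(y0))
print(empirical, f_Y * dy)  # the two numbers should be close
```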
Generation of Random Variables

• Generating a r.v. with a prescribed distribution is often needed for performing simulations involving random phenomena, e.g., noise or random arrivals
• First let $X \sim F(x)$ where the cdf $F(x)$ is continuous and strictly increasing. Define $Y = F(X)$, a real-valued random variable that is a function of $X$. What is the cdf of $Y$?
Clearly, $F_Y(y) = 0$ for $y < 0$, and $F_Y(y) = 1$ for $y > 1$. For $0 \le y \le 1$, note that by assumption $F$ has an inverse $F^{-1}$, so
$$F_Y(y) = \mathrm{P}\{Y \le y\} = \mathrm{P}\{F(X) \le y\} = \mathrm{P}\{X \le F^{-1}(y)\} = F(F^{-1}(y)) = y$$
Thus $Y \sim \mathrm{U}[0, 1]$, i.e., $Y$ is a uniformly distributed random variable
• Note: $F(x)$ does not need to be invertible. If $F(x) = a$ is constant over some interval, then the probability that $X$ lies in this interval is zero. Without loss of generality, we can take $F^{-1}(a)$ to be the leftmost point of the interval
• Conclusion: We can generate a $\mathrm{U}[0, 1]$ r.v. from any continuous r.v.
• Now, let's consider the more useful scenario where we are given $X \sim \mathrm{U}[0, 1]$ (a random number generator) and wish to generate a random variable $Y$ with prescribed cdf $F(y)$, e.g., Gaussian or exponential
• If $F$ is continuous and strictly increasing, set $Y = F^{-1}(X)$. To show $Y \sim F(y)$:
$$F_Y(y) = \mathrm{P}\{Y \le y\} = \mathrm{P}\{F^{-1}(X) \le y\} = \mathrm{P}\{X \le F(y)\} = F(y),$$
since $X \sim \mathrm{U}[0, 1]$ and $0 \le F(y) \le 1$
• Example: To generate $Y \sim \mathrm{Exp}(\lambda)$, set
$$Y = -\frac{1}{\lambda} \ln(1 - X)$$
• Note: $F$ does not need to be continuous for the above to work. For example, to generate $Y \sim \mathrm{Bern}(p)$, we set
$$Y = \begin{cases} 0 & \text{if } X \le 1 - p \\ 1 & \text{otherwise} \end{cases}$$
• Conclusion: We can generate a r.v. with any desired distribution from a $\mathrm{U}[0, 1]$ r.v. (a code sketch follows below)
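A minimal Python sketch of inverse transform sampling, using the two examples above (the exponential via $Y = -\frac{1}{\lambda}\ln(1 - X)$ and the Bernoulli via thresholding); the sample sizes, seed, and parameter values are arbitrary:

```python
import math
import random

random.seed(1)

def exp_sample(lam):
    """Y ~ Exp(lam) via Y = -(1/lam) ln(1 - X), where X ~ U[0, 1)."""
    return -math.log(1.0 - random.random()) / lam

def bern_sample(p):
    """Y ~ Bern(p): Y = 0 if X <= 1 - p, else Y = 1."""
    return 0 if random.random() <= 1.0 - p else 1

lam = 2.0
ys = [exp_sample(lam) for _ in range(100_000)]
print(sum(ys) / len(ys))  # should be close to the exponential mean 1/lam = 0.5
print(sum(bern_sample(0.3) for _ in range(100_000)) / 100_000)  # roughly 0.3
```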
Jointly Distributed Random Variables

• A pair of random variables defined over the same probability space are specified by their joint cdf
$$F_{X,Y}(x, y) = \mathrm{P}\{X \le x, Y \le y\}, \quad x, y \in \mathbb{R}$$
$F_{X,Y}(x, y)$ is the probability of the quadrant of $\mathbb{R}^2$ to the lower left of the point $(x, y)$
• Properties of the joint cdf:
◦ $F_{X,Y}(x, y) \ge 0$
◦ If $x_1 \le x_2$ and $y_1 \le y_2$ then $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$
◦ $\lim_{y \to -\infty} F_{X,Y}(x, y) = 0$ and $\lim_{x \to -\infty} F_{X,Y}(x, y) = 0$
◦ $\lim_{y \to \infty} F_{X,Y}(x, y) = F_X(x)$ and $\lim_{x \to \infty} F_{X,Y}(x, y) = F_Y(y)$;
$F_X(x)$ and $F_Y(y)$ are the marginal cdfs of $X$ and $Y$
◦ $\lim_{x, y \to \infty} F_{X,Y}(x, y) = 1$
• $X$ and $Y$ are independent if for every $x$ and $y$
$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$$

Joint, Marginal, and Conditional PMFs

• Let $X$ and $Y$ be discrete random variables on the same probability space
• They are completely specified by their joint pmf:
$$p_{X,Y}(x, y) = \mathrm{P}\{X = x, Y = y\}, \quad x \in \mathcal{X}, \ y \in \mathcal{Y}$$
By the axioms of probability, $\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y) = 1$
• To find $p_X(x)$, the marginal pmf of $X$, we use the law of total probability
$$p_X(x) = \sum_{y \in \mathcal{Y}} p(x, y), \quad x \in \mathcal{X}$$
• The conditional pmf of $X$ given $Y = y$ is defined as
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}, \quad p_Y(y) \neq 0, \ x \in \mathcal{X}$$
• Chain rule: $p_{X,Y}(x, y) = p_X(x)\, p_{Y|X}(y|x) = p_Y(y)\, p_{X|Y}(x|y)$
• Independence: $X$ and $Y$ are said to be independent if for every $(x, y) \in \mathcal{X} \times \mathcal{Y}$,
$$p_{X,Y}(x, y) = p_X(x)\, p_Y(y),$$
which is equivalent to $p_{X|Y}(x|y) = p_X(x)$ for every $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ such that $p_Y(y) \neq 0$

Joint, Marginal, and Conditional PDFs

• $X$ and $Y$ are jointly continuous random variables if their joint cdf is continuous in both $x$ and $y$. In this case, we can define their joint pdf, provided that it exists, as the function $f_{X,Y}(x, y)$ such that
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du, \quad x, y \in \mathbb{R}$$
• If $F_{X,Y}(x, y)$ is differentiable in $x$ and $y$, then
$$f_{X,Y}(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y} = \lim_{\Delta x, \Delta y \to 0} \frac{\mathrm{P}\{x < X \le x + \Delta x,\ y < Y \le y + \Delta y\}}{\Delta x\, \Delta y}$$
• Properties of $f_{X,Y}(x, y)$:
◦ $f_{X,Y}(x, y) \ge 0$
◦ $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$
• The marginal pdf of $X$ can be obtained from the joint pdf via the law of total probability:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$$
• $X$ and $Y$ are independent iff $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for every $x, y$
• Conditional cdf and pdf: Let $X$ and $Y$ be continuous random variables with joint pdf $f_{X,Y}(x, y)$. We wish to define
$$F_{Y|X}(y \mid X = x) = \mathrm{P}\{Y \le y \mid X = x\}$$
We cannot define the above conditional probability as
$$\frac{\mathrm{P}\{Y \le y, X = x\}}{\mathrm{P}\{X = x\}}$$
because both numerator and denominator are equal to zero. Instead, we define conditional probability for continuous random variables as a limit:
$$F_{Y|X}(y|x) = \lim_{\Delta x \to 0} \mathrm{P}\{Y \le y \mid x < X \le x + \Delta x\} = \lim_{\Delta x \to 0} \frac{\mathrm{P}\{Y \le y,\ x < X \le x + \Delta x\}}{\mathrm{P}\{x < X \le x + \Delta x\}} = \lim_{\Delta x \to 0} \frac{\int_{-\infty}^{y} f_{X,Y}(x, u)\, du\ \Delta x}{f_X(x)\, \Delta x} = \int_{-\infty}^{y} \frac{f_{X,Y}(x, u)}{f_X(x)}\, du$$
• We then define the conditional pdf in the usual way as
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} \quad \text{if } f_X(x) \neq 0$$
• Thus
$$F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(u|x)\, du,$$
which shows that $f_{Y|X}(y|x)$ is a pdf for $Y$ given $X = x$, i.e., $Y \mid \{X = x\} \sim f_{Y|X}(y|x)$
• Independence: $X$ and $Y$ are independent if $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for every $(x, y)$

Mixed Random Variables

• Let $\Theta$ be a discrete random variable with pmf $p_\Theta(\theta)$
• For each $\Theta = \theta$ with $p_\Theta(\theta) \neq 0$, let $Y$ be a continuous random variable, i.e., $F_{Y|\Theta}(y|\theta)$ is continuous for all $\theta$. We define $f_{Y|\Theta}(y|\theta)$ in the usual way
• The conditional pmf of $\Theta$ given $Y = y$ can be defined as a limit:
$$p_{\Theta|Y}(\theta \mid y) = \lim_{\Delta y \to 0} \frac{\mathrm{P}\{\Theta = \theta,\ y < Y \le y + \Delta y\}}{\mathrm{P}\{y < Y \le y + \Delta y\}} = \lim_{\Delta y \to 0} \frac{p_\Theta(\theta)\, f_{Y|\Theta}(y|\theta)\, \Delta y}{f_Y(y)\, \Delta y} = \frac{f_{Y|\Theta}(y|\theta)\, p_\Theta(\theta)}{f_Y(y)}$$

Bayes Rule for Random Variables

• Bayes rule for pmfs: given $p_X(x)$ and $p_{Y|X}(y|x)$, then
$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{\sum_{x' \in \mathcal{X}} p_{Y|X}(y|x')\, p_X(x')}$$
• Bayes rule for densities: given $f_X(x)$ and $f_{Y|X}(y|x)$, then
$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\, f_X(x)}{\int_{-\infty}^{\infty} f_X(u)\, f_{Y|X}(y|u)\, du}$$
• Bayes rule for mixed r.v.s: given $p_\Theta(\theta)$ and $f_{Y|\Theta}(y|\theta)$, then
$$p_{\Theta|Y}(\theta \mid y) = \frac{f_{Y|\Theta}(y|\theta)\, p_\Theta(\theta)}{\sum_{\theta'} p_\Theta(\theta')\, f_{Y|\Theta}(y|\theta')}$$
Conversely, given $f_Y(y)$ and $p_{\Theta|Y}(\theta|y)$, then
$$f_{Y|\Theta}(y|\theta) = \frac{p_{\Theta|Y}(\theta|y)\, f_Y(y)}{\int p_{\Theta|Y}(\theta|y')\, f_Y(y')\, dy'}$$
• Example: Additive Gaussian Noise Channel. Consider a communication channel where the signal transmitted is a binary random variable
$$\Theta = \begin{cases} +1 & \text{with probability } p \\ -1 & \text{with probability } 1 - p \end{cases}$$
and the received signal, also called the observation, is $Y = \Theta + Z$, where the noise $Z \sim \mathcal{N}(0, N)$ is independent of $\Theta$. Given $Y = y$ is received (observed), find $p_{\Theta|Y}(\theta|y)$, the a posteriori pmf of $\Theta$.
Solution: We use Bayes rule
$$p_{\Theta|Y}(\theta|y) = \frac{f_{Y|\Theta}(y|\theta)\, p_\Theta(\theta)}{\sum_{\theta'} p_\Theta(\theta')\, f_{Y|\Theta}(y|\theta')}$$
We are given $p_\Theta(\theta)$: $p_\Theta(+1) = p$ and $p_\Theta(-1) = 1 - p$, and $f_{Y|\Theta}(y|\theta) = f_Z(y - \theta)$:
$$Y \mid \{\Theta = +1\} \sim \mathcal{N}(+1, N) \quad \text{and} \quad Y \mid \{\Theta = -1\} \sim \mathcal{N}(-1, N)$$
Therefore
$$p_{\Theta|Y}(1|y) = \frac{\frac{p}{\sqrt{2\pi N}}\, e^{-\frac{(y-1)^2}{2N}}}{\frac{p}{\sqrt{2\pi N}}\, e^{-\frac{(y-1)^2}{2N}} + \frac{1-p}{\sqrt{2\pi N}}\, e^{-\frac{(y+1)^2}{2N}}} = \frac{p\, e^{y/N}}{p\, e^{y/N} + (1-p)\, e^{-y/N}} \quad \text{for } -\infty < y < \infty$$
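The closed-form posterior just derived is easy to evaluate numerically. In this small Python sketch the values $p = 0.5$ and $N = 0.25$ are arbitrary illustrative choices:

```python
import math

# Posterior for the additive Gaussian noise channel derived above:
#   p_{Theta|Y}(+1 | y) = p e^{y/N} / (p e^{y/N} + (1 - p) e^{-y/N})
def posterior_plus(y, p=0.5, N=0.25):
    a = p * math.exp(y / N)
    b = (1.0 - p) * math.exp(-y / N)
    return a / (a + b)

for y in (-1.0, 0.0, 0.5, 1.0):
    print(y, round(posterior_plus(y), 4))
# Larger observed y makes Theta = +1 more likely, as expected.
```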
Scalar Detection

• Consider a general digital communication system in which the signal sent is
$$\Theta = \begin{cases} \theta_0 & \text{with probability } p \\ \theta_1 & \text{with probability } 1 - p \end{cases}$$
the observation (received signal) after the noisy channel is
$$Y \mid \{\Theta = \theta\} \sim f_{Y|\Theta}(y \mid \theta), \quad \theta \in \{\theta_0, \theta_1\},$$
and the decoder outputs an estimate $\hat{\Theta}(Y) \in \{\theta_0, \theta_1\}$
• We wish to find the estimate $\hat{\Theta}(Y)$ (i.e., design the decoder) that minimizes the probability of error:
$$P_e \triangleq \mathrm{P}\{\hat{\Theta} \neq \Theta\} = \mathrm{P}\{\Theta = \theta_0, \hat{\Theta} = \theta_1\} + \mathrm{P}\{\Theta = \theta_1, \hat{\Theta} = \theta_0\} = \mathrm{P}\{\Theta = \theta_0\}\, \mathrm{P}\{\hat{\Theta} = \theta_1 \mid \Theta = \theta_0\} + \mathrm{P}\{\Theta = \theta_1\}\, \mathrm{P}\{\hat{\Theta} = \theta_0 \mid \Theta = \theta_1\}$$
• We define the maximum a posteriori probability (MAP) decoder as
$$\hat{\Theta}(y) = \begin{cases} \theta_0 & \text{if } p_{\Theta|Y}(\theta_0|y) > p_{\Theta|Y}(\theta_1|y) \\ \theta_1 & \text{otherwise} \end{cases}$$
• The MAP decoding rule minimizes $P_e$, since
$$P_e = 1 - \mathrm{P}\{\hat{\Theta}(Y) = \Theta\} = 1 - \int_{-\infty}^{\infty} f_Y(y)\, \mathrm{P}\{\hat{\Theta}(y) = \Theta \mid Y = y\}\, dy$$
and the integral is maximized when we pick the largest $\mathrm{P}\{\hat{\Theta}(y) = \Theta \mid Y = y\}$ for each $y$, which is precisely the MAP decoder
• If $p = \frac{1}{2}$, i.e., equally likely signals, using Bayes rule the MAP decoder reduces to the maximum likelihood (ML) decoder
$$\hat{\Theta}(y) = \begin{cases} \theta_0 & \text{if } f_{Y|\Theta}(y|\theta_0) > f_{Y|\Theta}(y|\theta_1) \\ \theta_1 & \text{otherwise} \end{cases}$$

Additive Gaussian Noise Channel

• Consider the additive Gaussian noise channel with signal
$$\Theta = \begin{cases} +\sqrt{P} & \text{with probability } \frac{1}{2} \\ -\sqrt{P} & \text{with probability } \frac{1}{2} \end{cases}$$
noise $Z \sim \mathcal{N}(0, N)$ ($\Theta$ and $Z$ are independent), and output $Y = \Theta + Z$
• The MAP decoder is
$$\hat{\Theta}(y) = \begin{cases} +\sqrt{P} & \text{if } \dfrac{\mathrm{P}\{\Theta = +\sqrt{P} \mid Y = y\}}{\mathrm{P}\{\Theta = -\sqrt{P} \mid Y = y\}} > 1 \\ -\sqrt{P} & \text{otherwise} \end{cases}$$
Since the two signals are equally likely, the MAP decoding rule reduces to the ML decoding rule
$$\hat{\Theta}(y) = \begin{cases} +\sqrt{P} & \text{if } \dfrac{f_{Y|\Theta}(y \mid +\sqrt{P})}{f_{Y|\Theta}(y \mid -\sqrt{P})} > 1 \\ -\sqrt{P} & \text{otherwise} \end{cases}$$
• Using the Gaussian pdf, the ML decoder reduces to the minimum distance decoder
$$\hat{\Theta}(y) = \begin{cases} +\sqrt{P} & \text{if } (y - \sqrt{P})^2 < (y - (-\sqrt{P}))^2 \\ -\sqrt{P} & \text{otherwise} \end{cases}$$
From the figure, this simplifies to
$$\hat{\Theta}(y) = \begin{cases} +\sqrt{P} & \text{if } y > 0 \\ -\sqrt{P} & \text{if } y < 0 \end{cases}$$
Note: the decision when $y = 0$ is arbitrary
[Figure: the conditional densities $f(y \mid -\sqrt{P})$ and $f(y \mid +\sqrt{P})$, centered at $-\sqrt{P}$ and $+\sqrt{P}$, with the decision threshold at $y = 0$]
• Now, to find the minimum probability of error, consider
$$P_e = \mathrm{P}\{\hat{\Theta}(Y) \neq \Theta\} = \mathrm{P}\{\Theta = \sqrt{P}\}\, \mathrm{P}\{\hat{\Theta}(Y) = -\sqrt{P} \mid \Theta = \sqrt{P}\} + \mathrm{P}\{\Theta = -\sqrt{P}\}\, \mathrm{P}\{\hat{\Theta}(Y) = \sqrt{P} \mid \Theta = -\sqrt{P}\}$$
$$= \tfrac{1}{2}\, \mathrm{P}\{Y \le 0 \mid \Theta = \sqrt{P}\} + \tfrac{1}{2}\, \mathrm{P}\{Y > 0 \mid \Theta = -\sqrt{P}\} = \tfrac{1}{2}\, \mathrm{P}\{Z \le -\sqrt{P}\} + \tfrac{1}{2}\, \mathrm{P}\{Z > \sqrt{P}\} = Q\Big(\sqrt{\tfrac{P}{N}}\Big) = Q\big(\sqrt{\mathrm{SNR}}\big)$$
The probability of error is a decreasing function of $P/N$, the signal-to-noise ratio (SNR). A simulation sketch follows below.
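A Monte Carlo sketch (illustrative, not part of the notes) that simulates the channel and the minimum distance decoder, then compares the empirical error rate with $Q(\sqrt{P/N})$; the values of $P$, $N$, the seed, and the trial count are arbitrary choices:

```python
import math
import random

random.seed(2)
P, N, trials = 1.0, 0.5, 200_000

def Q(x):
    # Q(x) = 1 - Phi(x), via the complementary error function:
    # erfc(x) = 2 Q(sqrt(2) x)  =>  Q(x) = erfc(x / sqrt(2)) / 2
    return 0.5 * math.erfc(x / math.sqrt(2.0))

errors = 0
for _ in range(trials):
    theta = math.sqrt(P) if random.random() < 0.5 else -math.sqrt(P)
    y = theta + random.gauss(0.0, math.sqrt(N))          # Y = Theta + Z
    theta_hat = math.sqrt(P) if y > 0 else -math.sqrt(P)  # minimum distance rule
    errors += (theta_hat != theta)

print(errors / trials, Q(math.sqrt(P / N)))  # the two should agree closely
```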