STAT 315: LECTURE 3
CHAPTER 3: DISCRETE RANDOM VARIABLES AND PROBABILITY
DISTRIBUTIONS
TROY BUTLER
1. The most basic concepts, definitions, and notation
Some basic definitions and notation:
• A random variable (rv) (but often represented as r.v.) is a rule (or map or function) that associates
a specific numerical value to each outcome in a sample space S. It is a function that maps the sample
space (i.e. its domain) to the real number line (i.e. its range).
– We often use capital letters (e.g. X, Y , U , or Z) to denote rv’s. These are functions acting on
a sample space S, i.e. X(s) explicitly denotes the functional dependence of rv X on a sample
s ∈ S.
– We will use lowercase letters (e.g. x, y, u, or z) to denote the real number mapped to by the
associated rv, i.e. X(s) = x denotes a particular outcome of an experiment (denoted by the s)
in terms of the real number x that this outcome is mapped to by the rv X.
• A discrete random variable is any rv that only maps to values in a finite or countably infinite
set.
• The probability distribution or probability mass function (pmf ) of a discrete rv X is the
function p(x) = P(X = x) (we sometimes use different lowercase letters to denote the pmf, but P
always denotes the probability measure), where
– p(x) = P(X = x) ≥ 0 for all s ∈ S s.t. X(s) = x.
– Σ_x P(X = x) = Σ_x p(x) = 1.
• The cumulative distribution function (cdf ) is defined as F(x) = P(X ≤ x), for any number
x ∈ R.
– The proper interpretation of the cdf: F(x) gives the probability that rv X is less than or equal
to the value x. For discrete rv’s, F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t), where p(t) is the pmf.
– If we use a different lowercase letter for the pmf, then we typically use the associated uppercase
letter for the cdf. Do not confuse the cdf with the probability measure P . They are related but
are distinct.
• Consider an experiment with the following two outcomes: success (S) and failure (F ). Thus, S =
{S, F }. Define the rv X : S → R by X(S) = 1 and X(F ) = 0. We define a Bernoulli random
variable as any rv whose only possible values are 0 and 1.
• A Bernoulli trial is an experiment that will result in one of two outcomes, a success or a failure.
– The canonical example of a Bernoulli trial is a coin toss where the coin landing “heads up” is
a success with success probability denoted by 0 ≤ ρ ≤ 1 and landing “tails up” is a failure
with failure probability given by 1 − ρ.
– The pmf for Bernoulli rv X : {S, F } → {0, 1} is given as above with p(1) = ρ and p(0) = 1 − ρ.
– We often denote X ∼ Bernoulli(ρ) to indicate that rv X has a Bernoulli distribution with
success probability ρ.
– Bernoulli rv’s and the concept of independent identically distributed (or i.i.d. or iid) Bernoulli
trials is critical in many areas of probability theory including the development of the Binomial
distribution.
• The expectation (or expected value) of rv X with pmf p(x) is E(X) = μ_X := Σ_x x p(x). We
sometimes denote E(X) as X̄ or μ_X (the last notation particularly emphasizes that this is a numerical
value based upon the entire population, not a subset). This is a completely different notion than the
sample mean x̄. Do not confuse the two!
• Let X be a rv with pmf p(x) and expectation μ_X . The variance is defined by Var(X) = σ² :=
Σ_x (x − μ_X)² p(x) (we often use V(X) instead of Var(X)). The standard deviation of a random
variable is the square root of the variance, denoted by σ, i.e. σ := √(V(X)).
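The definitions above can be made concrete with a short computational sketch. The following is an illustration, not part of the lecture; it assumes Python and uses a fair six-sided die as the discrete rv X, storing the pmf as a table.

```python
import math

# pmf as a table: p(x) = P(X = x) for a fair six-sided die
pmf = {x: 1/6 for x in range(1, 7)}

# Check the pmf axioms: p(x) >= 0 for all x, and the probabilities sum to 1.
assert all(p >= 0 for p in pmf.values())
assert math.isclose(sum(pmf.values()), 1.0)

def cdf(x):
    """F(x) = P(X <= x) = sum of p(t) over all t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

# Expectation: E(X) = sum_x x p(x)
mu = sum(x * p for x, p in pmf.items())

# Variance: V(X) = sum_x (x - mu)^2 p(x); the standard deviation is its square root.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = math.sqrt(var)

print(cdf(3), mu, var)  # 0.5, 3.5, ~2.9167
```

Note that everything reduces to finite sums over the support of the pmf, which is exactly why discrete rv's are computationally convenient.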
2. The most basic theorems and results
Theorem 1. Let X be a discrete rv with pmf p(x); then the expectation of any function h(X), denoted
E[h(X)], is given by E[h(X)] = Σ_x h(x) p(x).
Corollary 1. The expectation for an affine transformation of rv X (i.e. Y = aX + b for real numbers a
and b) is given by E(Y ) = E(aX + b) = aE(X) + b.
Theorem 2. For any discrete rv X, V(X) = σ² = E(X²) − μ_X².
Theorem 3. Let X be a discrete rv with variance Var(X). Consider any two constants a, b ∈ R. Then,
V(aX + b) = a² Var(X).
Theorem 4. Let X ∼ Bernoulli(ρ); then the mean is E(X) = ρ and V(X) = ρ(1 − ρ).
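Theorems 1–4 can be verified numerically by direct enumeration over a pmf. The following sketch (assuming Python; not part of the lecture) uses X ∼ Bernoulli(ρ) with ρ = 0.3 as a test case.

```python
import math

rho = 0.3
pmf = {0: 1 - rho, 1: rho}  # Bernoulli pmf: p(1) = rho, p(0) = 1 - rho

def E(h):
    """Theorem 1: E[h(X)] = sum_x h(x) p(x)."""
    return sum(h(x) * p for x, p in pmf.items())

mu = E(lambda x: x)
# Theorem 2 (shortcut formula): V(X) = E(X^2) - mu^2
var = E(lambda x: x**2) - mu**2

# Theorem 4: E(X) = rho and V(X) = rho(1 - rho)
assert math.isclose(mu, rho)
assert math.isclose(var, rho * (1 - rho))

# Corollary 1 and Theorem 3 with the affine transform Y = aX + b:
a, b = 2, 5
assert math.isclose(E(lambda x: a * x + b), a * mu + b)  # E(aX+b) = aE(X) + b
var_y = E(lambda x: (a * x + b - (a * mu + b)) ** 2)
assert math.isclose(var_y, a**2 * var)                   # V(aX+b) = a^2 V(X)
```

Note how the constant b shifts the mean but drops out of the variance, exactly as Theorem 3 states.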
3. The Binomial distribution
Let X be the sum of n i.i.d. Bernoulli trials with success probability ρ, then X ∼ Binomial(n, ρ) with
pmf:
b(x; n, ρ) := (n choose x) ρ^x (1 − ρ)^(n−x) for x ∈ {0, 1, 2, ..., n}, and b(x; n, ρ) := 0 otherwise,
where (n choose x) = n!/(x!(n − x)!) is the binomial coefficient.
We will use B(x; n, ρ) to denote the cdf of a binomial rv X. This does not give the probability of X = x
(that is given by the pmf b(x; n, ρ)); it gives the probability of the event X ≤ x.
Theorem 5. Let X ∼ Binomial(n, ρ), then E(X) = nρ and V (X) = nρ(1 − ρ).
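A minimal implementation of the binomial pmf, with a numerical check of Theorem 5, can be sketched as follows (assuming Python's standard library; not part of the lecture):

```python
import math

def b(x, n, rho):
    """Binomial pmf: (n choose x) rho^x (1 - rho)^(n - x) for x in {0, ..., n}, else 0."""
    if x not in range(n + 1):
        return 0.0
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

n, rho = 10, 0.4
probs = [b(x, n, rho) for x in range(n + 1)]

assert math.isclose(sum(probs), 1.0)  # the pmf sums to 1

mean = sum(x * b(x, n, rho) for x in range(n + 1))
var = sum((x - mean) ** 2 * b(x, n, rho) for x in range(n + 1))
assert math.isclose(mean, n * rho)             # Theorem 5: E(X) = n rho
assert math.isclose(var, n * rho * (1 - rho))  # Theorem 5: V(X) = n rho (1 - rho)

def B(x):
    """The cdf B(x; n, rho) = P(X <= x): a running sum of the pmf."""
    return sum(b(t, n, rho) for t in range(x + 1))
```

The cdf helper makes the distinction in the text explicit: `b` answers "exactly x successes," while `B` answers "at most x successes."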
Remark 1. Given a dichotomous population of size N , if we draw a sample of size n from this population without
replacement, then the rv X counting the number of successes in the n draws does not follow a binomial distribution.
Why? The trials within the experiment are not independent. However, if n/N < 0.05, then we can reasonably
approximate the distribution of X by a binomial distribution.
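Remark 1 can be checked numerically: sampling without replacement gives the hypergeometric distribution exactly, and when n/N < 0.05 the binomial pmf is a close approximation. The following is an illustration (in Python), not part of the lecture; the population sizes are arbitrary choices.

```python
import math

def hypergeom_pmf(x, N, M, n):
    """P(x successes in n draws without replacement; M successes among N items)."""
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

def binom_pmf(x, n, rho):
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

N, M, n = 1000, 300, 10  # n/N = 0.01 < 0.05, so the approximation should be good
rho = M / N              # success probability for the binomial approximation

max_err = max(abs(hypergeom_pmf(x, N, M, n) - binom_pmf(x, n, rho))
              for x in range(n + 1))
print(max_err)  # small, because each draw barely changes the composition of the population
```

Intuitively, when n is a small fraction of N, removing a few items barely changes the success proportion, so the dependent draws behave almost like independent Bernoulli trials.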
4. Hypergeometric and Negative Binomial Distributions
We will not cover section 3.5. You should still be aware of the types of experiments for which these
distributions apply.
The hypergeometric distribution is used for an experiment sampling (without replacement) a total of n
from a dichotomous population of size N where there are only M total successes. The key assumption is
that any subset of size n is equally likely to be chosen.
The negative binomial distribution is used for an experiment that terminates after a fixed number of
successes (call it r) is reached (so the total sample size x required to reach the r successes is the variable of
interest).
5. The Poisson Distribution
The Poisson distribution is used to describe the probability of x events occurring in a fixed
interval of time or space, where λ represents the mean frequency per unit time/space.
A random variable X follows the Poisson distribution with parameter λ (λ > 0) if the pmf of X is
given by
p(x; λ) := e^(−λ) λ^x / x! for x ∈ {0, 1, 2, 3, ...}, and p(x; λ) := 0 otherwise.
Theorem 6. If X ∼ Poisson(λ), then E(X) = λ and Var(X) = λ.
Remark 2. Given a binomial pmf b(x; n, ρ), if we let n → ∞ and ρ → 0 s.t. nρ → λ > 0, then b(x; n, ρ) →
p(x; λ).
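Remark 2 can be illustrated numerically: holding λ fixed and letting n grow (so ρ = λ/n → 0), the binomial pmf approaches the Poisson pmf. The following sketch (in Python; not part of the lecture) compares the two over a range of x values.

```python
import math

lam = 2.0

def binom_pmf(x, n, rho):
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# As n grows with n*rho = lam fixed, the worst-case pmf error shrinks.
for n in (10, 100, 1000):
    rho = lam / n
    err = max(abs(binom_pmf(x, n, rho) - poisson_pmf(x, lam)) for x in range(11))
    print(n, err)
```

This limit is why the Poisson distribution is often described as the law of "rare events": many trials, each with a tiny success probability.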
Theorem 7. If the numbers of events occurring in disjoint unit time intervals are independent, each with mean
rate λ, and we consider t such disjoint time intervals, then X = the total number of events occurring in the t
time intervals follows a Poisson distribution with mean λt.
6. Exercises to do in class
Chapter 3 exercises: 24, 44, 56, 62, 88, 122