STAT 315: LECTURE 3
CHAPTER 3: DISCRETE RANDOM VARIABLES AND PROBABILITY
DISTRIBUTIONS
TROY BUTLER
1. The most basic concepts, definitions, and notation
Some basic definitions and notation:
• A random variable (rv) (but often represented as r.v.) is a rule (or map or function) that associates
a specific numerical value to each outcome in a sample space S. It is a function that maps the sample
space (i.e. its domain) to the real number line (i.e. its range).
– We often use capital letters (e.g. X, Y , U , or Z) to denote rv’s. These are functions acting on
a sample space S, i.e. X(s) explicitly denotes the functional dependence of rv X on a sample
s ∈ S.
– We will use lowercase letters (e.g. x, y, u, or z) to denote the real number mapped to by the
associated rv, i.e. X(s) = x denotes a particular outcome of an experiment (denoted by the s)
in terms of the real number x that this outcome is mapped to by the rv X.
• A discrete random variable is any rv that only maps to values in a finite or countably infinite
set.
• The probability distribution or probability mass function (pmf ) of a discrete rv X is the
function p(x) = P(X = x) (we sometimes use different lowercase letters to denote the pmf, but P
always denotes the probability measure), where
– p(x) = P(X = x) ≥ 0 for all s ∈ S s.t. X(s) = x.
– Σ_x P(X = x) = Σ_x p(x) = 1.
• The cumulative distribution function (cdf ) is defined as F(x) = P(X ≤ x), for any number
x ∈ R.
– The proper interpretation of the cdf: F(x) gives the probability that rv X is less than or equal
to the value x. For discrete rv’s, F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t), where p(t) is the pmf.
– If we use a different lowercase letter for the pmf, then we typically use the associated uppercase
letter for the cdf. Do not confuse the cdf with the probability measure P . They are related but
are distinct.
• Consider an experiment with the following two outcomes: success (S) and failure (F ). Thus, S =
{S, F }. Define the rv X : S → R by X(S) = 1 and X(F ) = 0. We define a Bernoulli random
variable as any rv whose only possible values are 0 and 1.
• A Bernoulli trial is an experiment that will result in one of two outcomes, a success or a failure.
– The canonical example of a Bernoulli trial is a coin toss where the coin landing “heads up” is
a success with success probability denoted by 0 ≤ ρ ≤ 1 and landing “tails up” is a failure
with failure probability given by 1 − ρ.
– The pmf for Bernoulli rv X : {S, F } → {0, 1} is given as above with p(1) = ρ and p(0) = 1 − ρ.
– We often denote X ∼ Bernoulli(ρ) to indicate that rv X has a Bernoulli distribution with
success probability ρ.
– Bernoulli rv’s and the concept of independent identically distributed (or i.i.d. or iid) Bernoulli
trials is critical in many areas of probability theory including the development of the Binomial
distribution.
• The expectation (or expected value) of rv X with pmf p(x) is E(X) = μ_X := Σ_x x p(x). We
sometimes denote E(X) as X̄ or μ_X (the last notation particularly emphasizes that this is a numerical
value based upon the entire population, not a subset). This is a completely different notion than the
sample mean x̄. Do not confuse the two!
• Let X be a rv with pmf p(x) and expectation μ_X . The variance is defined by Var(X) = σ² :=
Σ_x (x − μ_X)² p(x) (we often use V(X) instead of Var(X)). The standard deviation of a random
variable is the square root of the variance, denoted by σ, i.e. σ := √(V(X)).
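The definitions above can be made concrete with a short computational sketch. The following is an illustration, not part of the lecture; it assumes Python and uses a fair six-sided die as the discrete rv X, storing the pmf as a table.

```python
import math

# pmf as a table: p(x) = P(X = x) for a fair six-sided die
pmf = {x: 1/6 for x in range(1, 7)}

# Check the pmf axioms: p(x) >= 0 for all x, and the probabilities sum to 1.
assert all(p >= 0 for p in pmf.values())
assert math.isclose(sum(pmf.values()), 1.0)

def cdf(x):
    """F(x) = P(X <= x) = sum of p(t) over all t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

# Expectation: E(X) = sum_x x p(x)
mu = sum(x * p for x, p in pmf.items())

# Variance: V(X) = sum_x (x - mu)^2 p(x); the standard deviation is its square root.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = math.sqrt(var)

print(cdf(3), mu, var)  # 0.5, 3.5, ~2.9167
```

Note that everything reduces to finite sums over the support of the pmf, which is exactly why discrete rv's are computationally convenient.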
2. The most basic theorems and results
Theorem 1. Let X be a discrete rv with pmf p(x); then the expectation of any function h(X), denoted
E[h(X)], is given by E[h(X)] = Σ_x h(x) p(x).
Corollary 1. The expectation for an affine transformation of rv X (i.e. Y = aX + b for real numbers a
and b) is given by E(Y ) = E(aX + b) = aE(X) + b.
Theorem 2. For any discrete rv X, V(X) = σ² = E(X²) − μ_X².
Theorem 3. Let X be a discrete rv with variance Var(X). Consider any two constants a, b ∈ R. Then,
V(aX + b) = a² Var(X).
Theorem 4. Let X ∼ Bernoulli(ρ); then the mean is E(X) = ρ and V(X) = ρ(1 − ρ).
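Theorems 1–4 can be verified numerically by direct enumeration over a pmf. The following sketch (assuming Python; not part of the lecture) uses X ∼ Bernoulli(ρ) with ρ = 0.3 as a test case.

```python
import math

rho = 0.3
pmf = {0: 1 - rho, 1: rho}  # Bernoulli pmf: p(1) = rho, p(0) = 1 - rho

def E(h):
    """Theorem 1: E[h(X)] = sum_x h(x) p(x)."""
    return sum(h(x) * p for x, p in pmf.items())

mu = E(lambda x: x)
# Theorem 2 (shortcut formula): V(X) = E(X^2) - mu^2
var = E(lambda x: x**2) - mu**2

# Theorem 4: E(X) = rho and V(X) = rho(1 - rho)
assert math.isclose(mu, rho)
assert math.isclose(var, rho * (1 - rho))

# Corollary 1 and Theorem 3 with the affine transform Y = aX + b:
a, b = 2, 5
assert math.isclose(E(lambda x: a * x + b), a * mu + b)  # E(aX+b) = aE(X) + b
var_y = E(lambda x: (a * x + b - (a * mu + b)) ** 2)
assert math.isclose(var_y, a**2 * var)                   # V(aX+b) = a^2 V(X)
```

Note how the constant b shifts the mean but drops out of the variance, exactly as Theorem 3 states.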
3. The Binomial distribution
Let X be the sum of n i.i.d. Bernoulli trials with success probability ρ, then X ∼ Binomial(n, ρ) with
pmf:
b(x; n, ρ) := (n choose x) ρ^x (1 − ρ)^(n−x) for x ∈ {0, 1, 2, ..., n}, and b(x; n, ρ) := 0 otherwise,
where (n choose x) = n!/(x!(n − x)!) is the binomial coefficient.
We will use B(x; n, ρ) to denote the cdf of a binomial rv X. This does not give the probability of X = x
(that is given by the pmf b(x; n, ρ)); it gives the probability of the event X ≤ x.
Theorem 5. Let X ∼ Binomial(n, ρ), then E(X) = nρ and V (X) = nρ(1 − ρ).
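A minimal implementation of the binomial pmf, with a numerical check of Theorem 5, can be sketched as follows (assuming Python's standard library; not part of the lecture):

```python
import math

def b(x, n, rho):
    """Binomial pmf: (n choose x) rho^x (1 - rho)^(n - x) for x in {0, ..., n}, else 0."""
    if x not in range(n + 1):
        return 0.0
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

n, rho = 10, 0.4
probs = [b(x, n, rho) for x in range(n + 1)]

assert math.isclose(sum(probs), 1.0)  # the pmf sums to 1

mean = sum(x * b(x, n, rho) for x in range(n + 1))
var = sum((x - mean) ** 2 * b(x, n, rho) for x in range(n + 1))
assert math.isclose(mean, n * rho)             # Theorem 5: E(X) = n rho
assert math.isclose(var, n * rho * (1 - rho))  # Theorem 5: V(X) = n rho (1 - rho)

def B(x):
    """The cdf B(x; n, rho) = P(X <= x): a running sum of the pmf."""
    return sum(b(t, n, rho) for t in range(x + 1))
```

The cdf helper makes the distinction in the text explicit: `b` answers "exactly x successes," while `B` answers "at most x successes."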
Remark 1. Given a dichotomous population of size N , if we draw a sample of size n from this population without
replacement, then the rv X counting the number of successes in the n draws does not follow a binomial distribution.
Why? The trials within the experiment are not independent. However, if n/N < 0.05, then we can reasonably
approximate the distribution of X by a binomial distribution.
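Remark 1 can be checked numerically: sampling without replacement gives the hypergeometric distribution exactly, and when n/N < 0.05 the binomial pmf is a close approximation. The following is an illustration (in Python), not part of the lecture; the population sizes are arbitrary choices.

```python
import math

def hypergeom_pmf(x, N, M, n):
    """P(x successes in n draws without replacement; M successes among N items)."""
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

def binom_pmf(x, n, rho):
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

N, M, n = 1000, 300, 10  # n/N = 0.01 < 0.05, so the approximation should be good
rho = M / N              # success probability for the binomial approximation

max_err = max(abs(hypergeom_pmf(x, N, M, n) - binom_pmf(x, n, rho))
              for x in range(n + 1))
print(max_err)  # small, because each draw barely changes the composition of the population
```

Intuitively, when n is a small fraction of N, removing a few items barely changes the success proportion, so the dependent draws behave almost like independent Bernoulli trials.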
4. Hypergeometric and Negative Binomial Distributions
We will not cover section 3.5. You should still be aware of the types of experiments for which these
distributions apply.
The hypergeometric distribution is used for an experiment sampling (without replacement) a total of n
from a dichotomous population of size N where there are only M total successes. The key assumption is
that any subset of size n is equally likely to be chosen.
The negative binomial distribution is used for an experiment that terminates after a fixed number of
successes (call it r) is reached (so the total sample size x required to reach the r successes is the variable of
interest).
5. The Poisson Distribution
The Poisson distribution is used to describe the probability of x events occurring in a fixed
interval of time or space, where λ represents the mean frequency per unit time/space.
A random variable X follows the Poisson distribution with parameter λ (λ > 0) if the pmf of X is
given by
p(x; λ) := e^(−λ) λ^x / x! for x ∈ {0, 1, 2, 3, ...}, and p(x; λ) := 0 otherwise.
Theorem 6. If X ∼ Poisson(λ), then E(X) = λ and Var(X) = λ.
Remark 2. Given a binomial pmf b(x; n, ρ), if we let n → ∞ and ρ → 0 s.t. nρ → λ > 0, then b(x; n, ρ) →
p(x; λ).
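Remark 2 can be illustrated numerically: holding λ fixed and letting n grow (so ρ = λ/n → 0), the binomial pmf approaches the Poisson pmf. The following sketch (in Python; not part of the lecture) compares the two over a range of x values.

```python
import math

lam = 2.0

def binom_pmf(x, n, rho):
    return math.comb(n, x) * rho**x * (1 - rho)**(n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# As n grows with n*rho = lam fixed, the worst-case pmf error shrinks.
for n in (10, 100, 1000):
    rho = lam / n
    err = max(abs(binom_pmf(x, n, rho) - poisson_pmf(x, lam)) for x in range(11))
    print(n, err)
```

This limit is why the Poisson distribution is often described as the law of "rare events": many trials, each with a tiny success probability.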
Theorem 7. If the numbers of events occurring in disjoint unit time intervals are independent, each with mean
rate λ, and we consider t such disjoint time intervals, then X = the total number of events occurring in the t
time intervals follows a Poisson distribution with mean λt.
6. Exercises to do in class
Chapter 3 exercises: 24, 44, 56, 62, 88, 122