RANDOM VARIABLES: Binomial and hypergeometric examples
BINOMIAL EXPERIMENT, probabilities, and random variable.
Binomial experiment: Consider a sequence of n independent and identical trials, each resulting in one of two possible outcomes, "success" (S) or "failure" (F). Let p = P(S), and assume p remains the same from trial to trial. We are interested in the number of successes.
Binomial probabilities: The probability of k successes in n Binomial trials is
P(k successes in n trials) = \binom{n}{k} p^k (1 - p)^{n-k}, for k = 0, 1, 2, ..., n.    (1)
Binomial distribution is the assignment of binomial probabilities (above).
Binomial random variable X= number of S in n binomial trials.
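As a quick illustration, here is a minimal Python sketch that evaluates formula (1) directly with math.comb; the values n = 10, p = 0.3, and k = 4 are made up for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n Bernoulli(p) trials), formula (1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# hypothetical example: n = 10 trials, success probability p = 0.3
n, p = 10, 0.3
print(binomial_pmf(4, n, p))                               # P(X = 4)
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))    # the pmf sums to 1
```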
∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗
HYPERGEOMETRIC EXPERIMENT, probabilities, and random variable.
Hypergeometric experiment: A box contains r red chips and w white chips, where r+w=N (total
number of chips). We draw n chips at random, without replacement, and we are interested in the
number of, say, red chips.
Hypergeometric probabilities: The probability of k red chips in a hypergeometric experiment is
P(k red chips among n sampled) = \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}}.    (2)
Hypergeometric distribution is the assignment of hypergeometric probabilities (above).
Hypergeometric random variable X= number of red chips among the n sampled chips.
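A similar sketch for formula (2); the box composition r = 6, w = 14 and the sample size n = 5 are made-up values for illustration.

```python
from math import comb

def hypergeometric_pmf(k, r, w, n):
    """P(k red chips among n drawn without replacement), formula (2)."""
    N = r + w
    return comb(r, k) * comb(w, n - k) / comb(N, n)

# hypothetical example: r = 6 red and w = 14 white chips, draw n = 5
r, w, n = 6, 14, 5
print(hypergeometric_pmf(2, r, w, n))                                # P(X = 2)
print(sum(hypergeometric_pmf(k, r, w, n) for k in range(n + 1)))     # sums to 1
```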
DISCRETE RANDOM VARIABLES (RVs)
Definition. Suppose a sample space S has a finite or countable number of simple outcomes. Let p be a real-valued function on S such that
1. 0 ≤ p(s) ≤ 1 for every element s of S;
2. \sum_{s \in S} p(s) = 1.
Then p is said to be a discrete probability function.
NOTE: For any event A defined on S: P(A) = \sum_{s \in A} p(s).
Definition. A real valued function X : S → R is called a random variable.
Definition. A random variable with finite or countably many values is called a discrete random
variable.
Definition. Any discrete random variable X is described by its probability density function (or
probability mass function), denoted pX (k), which provides probabilities of all values of X as follows:
p_X(k) = P(\{s \in S : X(s) = k\}).    (3)
NOTE: For any k not in the range (set of values) of X: pX (k) = 0.
NOTE: For any t ≤ s, P(t ≤ X ≤ s) = \sum_{k=t}^{s} P(X = k).
NOTATION: For simplicity, we denote pX (k) = P (X = k) thus suppressing the dependence on the
sample space.
Examples: 1. Binomial random variable X= number of S in the binomial experiment with n trials and
probability of S equal to p, X ∼ Bin(n, p).
p_X(k) = P(k successes in n trials) = \binom{n}{k} p^k (1 - p)^{n-k}, for k = 0, 1, 2, ..., n.    (4)
2. Hypergeometric random variable X= number of red chips selected among n sampled in a hypergeometric experiment.
Definition. Let X be a discrete random variable. For any real number t, the cumulative distribution function F_X of X at t is given by
F_X(t) = P(X ≤ t) = P(\{s \in S : X(s) ≤ t\}).    (5)
Linear transformation: Let X be a discrete random variable (rv). Let Y = aX + b, where a and b are real constants, a ≠ 0. Then p_Y(y) = p_X\left(\frac{y - b}{a}\right).
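A small sketch of this rule, using a made-up pmf for X and the constants a = 2, b = 1: each value of X is relabeled as aX + b and keeps its probability.

```python
# hypothetical pmf of X, which takes values 0, 1, 2
p_X = {0: 0.2, 1: 0.5, 2: 0.3}

a, b = 2, 1  # Y = aX + b

# p_Y(y) = p_X((y - b)/a): relabel each value of X, keep its probability
p_Y = {a * x + b: prob for x, prob in p_X.items()}
print(p_Y)  # {1: 0.2, 3: 0.5, 5: 0.3}
```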
CONTINUOUS RANDOM VARIABLES (RVs)
Suppose a sample space S is uncountable, like the real numbers. We think of such an S as "identical" to the real numbers. We can define a probability structure on such a space using a special function f, called the probability density function (pdf).
Definition. A probability function P on a set of real numbers S is called continuous if there exists a function f(t) such that for any closed interval [a, b] ⊂ S we have P([a, b]) = \int_a^b f(t)\,dt.
For a function f to be a pdf of some probability function P, it is necessary and sufficient that the
following properties are satisfied:
1. f(t) ≥ 0 for every t;
2. \int_{-\infty}^{\infty} f(t)\,dt = 1.
NOTE: If P is such that P(A) = \int_A f(t)\,dt for all A, then P satisfies all the Kolmogorov probability axioms.
Definition: Any function Y that maps S (a subset of the real numbers) into the real numbers is called a continuous random variable. The pdf of Y is a function f such that
P(a ≤ Y ≤ b) = \int_a^b f(t)\,dt.
For any event A defined on S: P(A) = \int_A f(t)\,dt.
Definition. The cdf of a continuous random variable Y with pdf f is the function F_Y such that
F_Y(y) = P(Y ≤ y) = P(\{s \in S : Y(s) ≤ y\}) = \int_{-\infty}^{y} f(t)\,dt for any real y.
Theorem. If F_Y(t) is the cdf and f_Y(t) is the pdf of a continuous random variable Y, then
\frac{d}{dt} F_Y(t) = f_Y(t).
Properties of cdf. Let Y be a continuous random variable with cdf F. Then the following are true:
1. P (Y > y) = 1 − FY (y);
2. P (a < Y < b) = FY (b) − FY (a);
3. limy→∞ FY (y) = 1;
4. limy→−∞ FY (y) = 0;
Theorem: The cdf of any random variable is a nondecreasing function.
Linear transformation: Let X be a continuous random variable with pdf f. Let Y = aX + b, where a and b are real constants, a ≠ 0. Then the pdf of Y is g_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right).
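A numerical sanity check of this change-of-variables formula, assuming (as a made-up example) X ~ Uniform(0, 1), a = 3 and b = -1; a crude Riemann sum confirms that g_Y integrates to 1.

```python
def f_X(x):
    # pdf of a hypothetical X ~ Uniform(0, 1)
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

a, b = 3.0, -1.0  # Y = aX + b, so Y ranges over (-1, 2)

def g_Y(y):
    # change-of-variables formula: g_Y(y) = (1/|a|) f_X((y - b)/a)
    return f_X((y - b) / a) / abs(a)

# crude Riemann sum of g_Y over a grid covering its support
dy = 0.001
total = sum(g_Y(-2 + i * dy) * dy for i in range(int(6 / dy)))
print(round(total, 3))  # ≈ 1.0
```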
MORE ON RANDOM VARIABLES
A General framework for random variables:
Definition A triple (S, E, P), where S is a sample space, E is a collection of subsets of S (the events), and P is a probability function on S (that is, P : E → [0, 1] satisfying Kolmogorov's axioms), is called a probability space.
Definition A real-valued function X : S → R is called a random variable.
Definition A probability function (measure) defined on the real numbers R by
P_X(A) = P(X ∈ A) = P(\{s ∈ S : X(s) ∈ A\})
is called the probability distribution of X.
Theorem The distribution of X is uniquely determined by the cumulative distribution function (cdf) F_X of X:
F_X(x) = P(X ≤ x) = P_X((−∞, x]).
Properties of cdf
1. F is nondecreasing: If x1 ≤ x2 , then F (x1) ≤ F (x2);
2. F is right-continuous: for any x, \lim_{y→x+} F(y) = F(x);
3. \lim_{x→∞} F(x) = 1;
4. \lim_{x→−∞} F(x) = 0.
NOTE: Here are two useful rules for computing probabilities:
1. For a sequence of increasing sets A_1 ⊂ A_2 ⊂ ..., the probability of their union is the limit of their probabilities, that is: P(\bigcup_{i=1}^{\infty} A_i) = \lim_{i→∞} P(A_i).
2. For a sequence of decreasing sets A_1 ⊃ A_2 ⊃ ..., the probability of their intersection is the limit of their probabilities, that is: P(\bigcap_{i=1}^{\infty} A_i) = \lim_{i→∞} P(A_i).
Types of distributions: There are three main types of distributions/random variables:
1. Discrete random variable: its CDF is a step function, and it has at most countably many values.
2. Continuous random variable: its CDF is a continuous function, and its set of values contains intervals.
3. Mixed random variable: its CDF is neither continuous nor a step function.
Theorem: For any continuous random variable X, P(X = a) = 0 for any real number a.
EXPECTED VALUES OF RANDOM VARIABLES
To get an idea about the central tendency for a random variable, we compute its expected value (mean).
Definition Let X be a random variable.
1. If X is a discrete random variable with pdf pX (k), then the expected value of X is given by
E(X) = μ = μ_X = \sum_{\text{all } k} k \cdot p_X(k) = \sum_{\text{all } k} k \cdot P(X = k)
2. If X is a continuous random variable with pdf f, then
E(X) = μ = μ_X = \int_{-\infty}^{\infty} x f(x)\,dx.
3. If X is a mixed random variable with cdf F, then the expected value of X is given by
E(X) = μ = μ_X = \int_{-\infty}^{\infty} x F'(x)\,dx + \sum_{\text{all } k} k \cdot P(X = k),
where F' is the derivative of F where the derivative exists, and the k's in the summation are the "discrete" values of X.
NOTE: For the expectation of a random variable to exist, we assume that all integrals and sums in
the definition of the expectation above converge absolutely.
Median of a random variable: a value "dividing the distribution of X in half." If X is a discrete random variable, then its median m is the point for which P(X < m) = P(X > m). If there are two values m and m′ such that P(X ≤ m) = 0.5 and P(X ≥ m′) = 0.5, the median is the average of m and m′, (m + m′)/2.
If X is a continuous random variable with pdf f, the median m is the solution of the equation
\int_{-\infty}^{m} f(x)\,dx = 0.5.
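As an illustration, a bisection sketch for a hypothetical pdf f(x) = 2x on [0, 1], whose cdf is F(m) = m² and whose median is √0.5 ≈ 0.707.

```python
def cdf(m):
    # F(m) = integral of 2x from 0 to m = m^2, for the hypothetical pdf f(x) = 2x on [0, 1]
    return m * m

# bisection: solve F(m) = 0.5 on [0, 1]
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)  # ≈ 0.7071
```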
EXPECTED VALUES OF A FUNCTION OF A RANDOM VARIABLE
Theorem. Let X be a random variable. Let g(·) be a function of X.
If X is discrete with pdf pX (k), then the expected value of g(X) is given by
E g(X) = \sum_{\text{all } k} g(k) \cdot p_X(k) = \sum_{\text{all } k} g(k) \cdot P(X = k),
provided that \sum_{\text{all } k} |g(k)| \, p_X(k) is finite.
If X is a continuous random variable with pdf fX (x), and if g is a continuous function, then the expected
value of g(X) is given by
E g(X) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx,
provided that \int_{-\infty}^{\infty} |g(x)| \, f_X(x)\,dx is finite.
NOTE: Expected value is a linear operator, that is E(aX + b) = aE(X) + b, for any rv X.
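A short sketch with a made-up pmf: it computes E(X), then E[g(X)] with g(k) = k², and checks linearity for the arbitrary constants a = 2, b = 3.

```python
# hypothetical pmf of a discrete X
p_X = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

E_X  = sum(k * p for k, p in p_X.items())            # E(X)
E_gX = sum(k**2 * p for k, p in p_X.items())         # E[g(X)] with g(k) = k^2

a, b = 2, 3
E_Y = sum((a * k + b) * p for k, p in p_X.items())   # E(aX + b) computed directly
print(E_X, E_gX)
print(E_Y, a * E_X + b)  # the two values agree: linearity of expectation
```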
VARIANCE and HIGHER MOMENTS of A RANDOM VARIABLE
To get an idea about variability of a random variable, we look at the measures of spread. These include
variance and standard deviation.
Definition. The variance of a random variable, denoted Var(X) or σ², is the average of its squared deviations from the mean μ. Let X be a random variable.
1. If X is a discrete random variable with pdf pX (k) and mean µX , then the variance of X is given
by
Var(X) = σ² = E[(X − μ_X)²] = \sum_{\text{all } k} (k − μ_X)² p_X(k) = \sum_{\text{all } k} (k − μ_X)² P(X = k)
2. If X is a continuous random variable with pdf f and mean µX , then
Var(X) = σ² = E[(X − μ_X)²] = \int_{-\infty}^{\infty} (x − μ_X)² f(x)\,dx.
3. If X is a mixed random variable with cdf F and mean µX , then the variance of X is given by
Var(X) = σ² = E[(X − μ_X)²] = \int_{-\infty}^{\infty} (x − μ_X)² F'(x)\,dx + \sum_{\text{all } k} (k − μ_X)² P(X = k),
where F' is the derivative of F where the derivative exists, and the k's in the summation are the "discrete" values of X.
NOTE: If E(X²) is not finite, then the variance does not exist.
Definition. The standard deviation σ of a r.v. X is the square root of its variance (if it exists): σ = \sqrt{Var(X)}.
NOTE: The units of variance are square units of the random variable. The units of standard deviation
are the same as the units of the random variable.
Theorem: Let X be a random variable with variance σ². Then we can compute σ² as follows:
Var(X) = σ² = E(X²) − μ_X² = E(X²) − [E(X)]²
Theorem: Let X be a r.v. with variance σ². Then the variance of aX + b, for any real a and b, is given by:
Var(aX + b) = a² Var(X).
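A sketch verifying both theorems on a made-up pmf (the same hypothetical values as in the earlier expectation example), with a = 2, b = 3.

```python
p_X = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}   # hypothetical pmf

mu   = sum(k * p for k, p in p_X.items())
var1 = sum((k - mu)**2 * p for k, p in p_X.items())       # definition of Var(X)
var2 = sum(k**2 * p for k, p in p_X.items()) - mu**2      # shortcut E(X^2) - [E(X)]^2
print(var1, var2)  # the two values agree

a, b = 2, 3
p_Y   = {a * k + b: p for k, p in p_X.items()}            # pmf of Y = aX + b
mu_Y  = sum(y * p for y, p in p_Y.items())
var_Y = sum((y - mu_Y)**2 * p for y, p in p_Y.items())
print(var_Y, a**2 * var1)  # Var(aX + b) = a^2 Var(X)
```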
HIGHER MOMENTS OF A RANDOM VARIABLE
Expected value is called the first moment of a random variable. Variance is called the second central
moment or second moment about the mean of a random variable. In general, we have the following
definition of the central and ordinary moments of random variables.
Definition: Let X be an r.v. Then:
1. The rth moment of X (about the origin) is μ_r = E(X^r), provided that the moment exists.
2. The rth moment of X about the mean is μ′_r = E[(X − μ_X)^r], provided that the moment exists.
JOINT DENSITIES- RANDOM VECTORS
Joint densities describe probability distributions of random vectors. A random vector X is an n-dimensional vector whose components are random variables: X = (X_1, X_2, ..., X_n). We will concentrate on bivariate random vectors, that is, n = 2.
Discrete random vectors are described by the joint probability density function of X and Y (or joint
pdf ) denoted by
pX,Y (x, y) = P (s ∈ S : X(s) = x, Y (s) = y) = P (X = x, Y = y).
Another name for the joint pdf of a discrete random vector is joint probability mass function (pmf ).
Computing probabilities for discrete random vectors. For any subset A of R2 , we have
P((X, Y) ∈ A) = \sum_{(x,y) \in A} P(X = x, Y = y) = \sum_{(x,y) \in A} p_{X,Y}(x, y).
Continuous random vectors are described by the joint probability density function of X and Y (or
joint pdf ) denoted by fX,Y (x, y). The pdf has the following properties:
1. fX,Y (x, y) ≥ 0 for every (x, y) ∈ R2 .
2. \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1.
3. For any region A in the xy-plane, P((X, Y) ∈ A) = \iint_A f_{X,Y}(x, y)\,dx\,dy.
Marginal distributions. Let (X, Y ) be a continuous/discrete random vector having a joint distribution with pdf/pmf f(x, y). Then, the one-dimensional distributions of X and Y are called marginal
distributions. We compute the marginal distributions as follows:
If (X, Y ) is a discrete vector, then the distributions of X and Y are given by:
f_X(x) = \sum_{\text{all } y} P(X = x, Y = y) and f_Y(y) = \sum_{\text{all } x} P(X = x, Y = y).
If (X, Y ) is a continuous vector, then the distributions of X and Y are given by:
f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy and f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.
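A sketch of the discrete case, computing both marginal pmf's from a small made-up joint pmf table.

```python
from collections import defaultdict

# hypothetical joint pmf p_{X,Y}(x, y)
p_XY = {(0, 0): 0.10, (0, 1): 0.20,
        (1, 0): 0.30, (1, 1): 0.15,
        (2, 0): 0.05, (2, 1): 0.20}

p_X, p_Y = defaultdict(float), defaultdict(float)
for (x, y), p in p_XY.items():
    p_X[x] += p   # sum over y to get the marginal of X
    p_Y[y] += p   # sum over x to get the marginal of Y

print(dict(p_X))  # marginal of X: {0: 0.30, 1: 0.45, 2: 0.25}
print(dict(p_Y))  # marginal of Y: {0: 0.45, 1: 0.55}
```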
Joint cdf of a vector (X, Y ). The joint cumulative distribution function of X and Y (or joint cdf) is
defined by
FX,Y (u, v) = P (X ≤ u, Y ≤ v).
Theorem. Let F_{X,Y}(u, v) be the joint cdf of the vector (X, Y). Then the joint pdf of (X, Y), f_{X,Y}, is given by the second mixed partial derivative of the cdf; that is, f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x, y), provided that F_{X,Y}(x, y) has continuous second partial derivatives.
Multivariate random vectors. Please read on page 215.
INDEPENDENT RANDOM VARIABLES
Definition. Two random variables are called independent iff for all intervals A and B on the real line, P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B).
Theorem. The random variables X and Y are independent iff
fX,Y (x, y) = fX (x)fY (y),
where f(x, y) is the joint pdf of (X, Y ), and fX (x) and fY (y) are the marginal densities of X and Y,
respectively.
NOTE: Random variables X and Y are independent iff F_{X,Y}(x, y) = F_X(x) F_Y(y), where F_{X,Y}(x, y) is the joint cdf of (X, Y), and F_X(x) and F_Y(y) are the marginal cdf's of X and Y, respectively.
Independence of more than 2 r.v.'s. A set of n random variables X_1, X_2, X_3, ..., X_n is independent iff their joint pdf is the product of the marginal pdfs, that is, f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ⋯ f_{X_n}(x_n), where f_{X_1,...,X_n}(x_1, ..., x_n) is the joint pdf of the vector (X_1, X_2, X_3, ..., X_n) and f_{X_1}(x_1), f_{X_2}(x_2), ..., f_{X_n}(x_n) are the marginal pdf's of the variables X_1, X_2, X_3, ..., X_n.
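A sketch of the bivariate factorization criterion on a small made-up joint pmf; here the joint pmf is built as a product of marginals, so the check returns True.

```python
from collections import defaultdict
from math import isclose

# hypothetical joint pmf: X and Y are independent by construction
p_XY = {(x, y): px * py
        for x, px in {0: 0.3, 1: 0.7}.items()
        for y, py in {0: 0.4, 1: 0.6}.items()}

p_X, p_Y = defaultdict(float), defaultdict(float)
for (x, y), p in p_XY.items():
    p_X[x] += p
    p_Y[y] += p

# independence iff the joint pmf factors into the marginals everywhere
independent = all(isclose(p, p_X[x] * p_Y[y]) for (x, y), p in p_XY.items())
print(independent)  # True
```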
Random Sample. A random sample of size n from a distribution f is a set X_1, X_2, X_3, ..., X_n of independent and identically distributed (iid) random variables, each with distribution f.
CONDITIONAL DISTRIBUTIONS
Let (X, Y ) be a random vector with some joint pdf or pmf. Consider the problem of finding the
probability that X = x AFTER a value of Y was observed. To do that, we develop the conditional distribution of X given Y = y.
Definition. If (X, Y ) is a discrete random vector with pmf pX,Y (x, y), and if P (Y = y) > 0, then the
conditional distribution of X given Y=y is given by the conditional pmf
p_{X|Y=y}(x) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.
Similarly, if P(X = x) > 0, then the conditional distribution of Y given X = x is given by the conditional pmf p_{Y|X=x}(y) = \frac{p_{X,Y}(x, y)}{p_X(x)}.
Definition. If (X, Y ) is a continuous random vector with pdf fX,Y (x, y), and if fY (y) > 0, then the
conditional distribution of X given Y=y is given by the conditional pdf
f_{X|Y=y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.
Similarly, if f_X(x) > 0, then the conditional distribution of Y given X = x is given by the conditional pdf f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)}.
Independence and conditional distributions. If random variables X and Y are independent, then
their marginal pdf/pmf’s are the same as their conditional pdf/pmf’s. That is fY |X=x(y) = fY (y) and
fX|Y =y (x) = fX (x), for all y and x where fY (y) > 0 and fX (x) > 0, respectively.
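A sketch of the discrete conditional pmf, computed from a made-up joint pmf in which X and Y are not independent.

```python
from collections import defaultdict

# hypothetical joint pmf (X and Y are NOT independent here)
p_XY = {(0, 0): 0.10, (0, 1): 0.20,
        (1, 0): 0.30, (1, 1): 0.15,
        (2, 0): 0.05, (2, 1): 0.20}

p_Y = defaultdict(float)
for (x, y), p in p_XY.items():
    p_Y[y] += p                      # marginal of Y

def p_X_given_Y(x, y):
    # conditional pmf p_{X|Y=y}(x) = p_{X,Y}(x, y) / p_Y(y); requires p_Y(y) > 0
    return p_XY.get((x, y), 0.0) / p_Y[y]

print([p_X_given_Y(x, 1) for x in (0, 1, 2)])     # conditional pmf of X given Y = 1
print(sum(p_X_given_Y(x, 1) for x in (0, 1, 2)))  # sums to 1
```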
FUNCTIONS OF RANDOM VARIABLES or COMBINING RANDOM VARIABLES:
PDF OF A SUM, PRODUCT AND QUOTIENT of TWO RVs.
Let X and Y be independent random variables with pdf or pmf’s fX and fY or pX and pY , respectively.
Then,
If X and Y are discrete random variables, then the pmf of their sum W = X + Y is
p_W(w) = \sum_{\text{all } x} p_X(x) p_Y(w − x).
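A sketch of this discrete convolution for two hypothetical independent fair six-sided dice.

```python
from collections import defaultdict

# hypothetical pmfs of two independent fair six-sided dice
p_X = {k: 1/6 for k in range(1, 7)}
p_Y = {k: 1/6 for k in range(1, 7)}

# p_W(w) = sum over x of p_X(x) p_Y(w - x)
p_W = defaultdict(float)
for x, px in p_X.items():
    for y, py in p_Y.items():
        p_W[x + y] += px * py

print(p_W[7])              # 6/36 ≈ 0.1667, the most likely sum
print(sum(p_W.values()))   # 1.0
```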
If X and Y are continuous random variables, then the pdf of their sum W = X + Y is the convolution of the individual densities:
f_W(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w − x)\,dx.
If X and Y are independent continuous random variables, then the pdf of their quotient W = Y/X is
given by:
f_W(w) = \int_{-\infty}^{\infty} |x| \, f_X(x) f_Y(wx)\,dx.
The above formula is valid if X equals zero on at most a set of isolated points (no intervals).
If X and Y are independent continuous random variables, then the pdf of their product W = XY is
given by:
f_W(w) = \int_{-\infty}^{\infty} \frac{1}{|x|} f_X(w/x) f_Y(x)\,dx.
MORE PROPERTIES OF MEAN AND VARIANCE
Mean of a function of two random variables. Let (X, Y) be a discrete random vector with pmf
p(x, y) or a continuous random vector with pdf f(x, y). Let g(x, y) be a real-valued function of X and Y. Then the expected value of the random variable g(X, Y) is given by:
E g(X, Y) = \sum_{\text{all } x} \sum_{\text{all } y} g(x, y)\, p(x, y) in the discrete case, and
E g(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy in the continuous case,
provided that the sums and the integrals converge absolutely.
Mean of a sum of random variables. Let X and Y be any random variables, and a and b real
numbers. Then
E(aX + bY ) = aE(X) + bE(Y ),
provided both expectations are finite.
NOTE: Let X1 , X2 , . . . , Xn be any random variables with finite means, and let a1, a2, . . . , an be a set
of real numbers. Then
E(a_1 X_1 + a_2 X_2 + ⋯ + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + ⋯ + a_n E(X_n).
Mean of a product of independent random variables. If X and Y are independent random variables with finite expectations, then E(XY ) = (EX)(EY ).
Variance of a sum of independent random variables.
Let X_1, X_2, ..., X_n be any independent random variables with finite second moments (i.e., E(X_i²) < ∞). Then
Var(X_1 + X_2 + ⋯ + X_n) = Var(X_1) + Var(X_2) + ⋯ + Var(X_n).
NOTE: Let X_1, X_2, ..., X_n be any independent random variables with finite second moments, and let a_1, a_2, ..., a_n be a set of real numbers. Then
Var(a_1 X_1 + a_2 X_2 + ⋯ + a_n X_n) = a_1² Var(X_1) + a_2² Var(X_2) + ⋯ + a_n² Var(X_n).
ORDER STATISTICS
Definition. Let X1 , X2 , . . . , Xn be a random sample from a continuous distribution with pdf f(x). Let
x1, x2 , . . . , xn be the values of the sample. Order the sample values from the smallest to the largest:
x_{(1)} < x_{(2)} < ... < x_{(n)}. Define the random variable X_{(i)} to have the value x_{(i)}, 1 ≤ i ≤ n. Then X_{(i)} is called the ith order statistic.
Special order statistics. Maximum X(n) and minimum X(1) of a random sample.
Theorem. Let X1 , X2 , . . . , Xn be a random sample from a continuous distribution with pdf f(x) and
cdf F(x). The pdf of the ith order statistic is given by
f_{X_{(i)}}(x) = \frac{n!}{(i-1)!(n-i)!} [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x),
for any 1 ≤ i ≤ n.
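A Monte Carlo sketch for the special case of the maximum X_{(n)}, assuming a hypothetical Uniform(0, 1) sample of size n = 5: the theorem gives f_{X_{(n)}}(x) = n x^{n-1} on (0, 1), so P(X_{(n)} ≤ 0.8) = 0.8^5.

```python
import random

random.seed(1)
n, trials = 5, 200_000

# empirical P(max of n Uniform(0,1) draws <= 0.8)
hits = sum(max(random.random() for _ in range(n)) <= 0.8 for _ in range(trials))
print(hits / trials)   # ≈ 0.328
print(0.8 ** n)        # exact value 0.32768 from [F(x)]^n with F(x) = x
```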
MOMENT GENERATING FUNCTIONS
Definition. Let X be a random variable. The moment generating function MX (t) of X is given by
M_X(t) = E(e^{tX}) = \sum_{\text{all } k} e^{tk} p_X(k) if X is discrete, and
M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx if X is continuous,
at all real values t for which the expectation exists.
Use for mgf’s. Moment generating functions are very useful in mathematical statistics. They are
primarily used for two purposes:
1. Finding moments of random variables, and
2. Identifying distributions of random variables.
Theorem. Let X be a continuous random variable with pdf f(x). Assume that the pdf f(x) is sufficiently
smooth for the order of differentiation and integration to be exchanged. Let MX (t) be the mgf of X.
Then
M_X^{(r)}(0) = E(X^r),
provided that the rth moment of X exists.
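A numerical sketch of this idea for a hypothetical discrete pmf: build M_X(t) as a finite sum, approximate M'_X(0) by a central difference, and compare with E(X) computed directly.

```python
from math import exp

# hypothetical pmf of a discrete X
p_X = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

def M(t):
    # M_X(t) = E(e^{tX}) = sum over k of e^{tk} p_X(k)
    return sum(exp(t * k) * p for k, p in p_X.items())

h = 1e-5
first_moment = (M(h) - M(-h)) / (2 * h)        # ≈ M'_X(0), which equals E(X)
print(first_moment)                            # ≈ 1.6
print(sum(k * p for k, p in p_X.items()))      # exact E(X) = 1.6
```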
Theorem: Identifying distributions. Suppose that X1 and X2 are random variables with pdf/pmf’s
fX1 (x) and fX2 (x), respectively. Suppose that MX1 (t) = MX2 (t) for all t in a neighborhood of 0. Then
X1 and X2 are equal in distribution, that is fX1 (x) = fX2 (x).
Theorem: Properties of mgf’s. This theorem describes the mgf of a linear function and of a sum
of independent random variables.
1. Mgf of a linear function. Suppose that X is a random variable with mgf MX (t), and let
Y = aX + b, where a and b are real numbers. Then, the mgf of Y is given by:
M_Y(t) = e^{bt} M_X(at).
2. Mgf of a sum of independent rv’s. Suppose X1 , X2 , . . . , Xn are independent random variables
with mgf’s MX1 (t), MX2 (t), . . . , MXn (t), respectively. Then, the mgf of their sum is the product
of the mgf’s, that is
M_{X_1+X_2+\cdots+X_n}(t) = M_{X_1}(t) \cdot M_{X_2}(t) \cdots M_{X_n}(t).