STAT/MATH 395
Peter Guttorp & June Morita
[email protected]
[email protected]

Class structure
Homework due in class on Fridays (15-20% of grade)
Midterm Fri May 8 (30% of grade)
Final TBA (35-40% of grade)
Class participation (15% of grade)
A 2.0 usually will require 60% of possible points
Chapter 4: Expected Values
Expected value
Variance
Covariance
Law of large numbers

A numbers game
A popular numbers game is DJ, where the winning ticket is determined from Dow Jones averages. Three sets of stocks are used: Industrials, Transportation, and Utilities, and two quotes, at 11 am and noon, Eastern time.
     11 am       Noon
I    16410.75    1638.96
T    7509.34     7478.11
U    525.66      525.90

546 + 610 = 1 156
In this example, the winning number is
156. The payoff is 700 to 1. Suppose we
bet $5. How much do we win or lose, on
average?
Numbers game, cont.
Let p = probability my number wins =
Let X = my earnings. In the long run, I will win $3500 the fraction 1/1000 of the time, and lose $5 the fraction 999/1000 of the time. The balance is
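Carrying this out with the numbers above: E(X) = 3500 × (1/1000) − 5 × (999/1000) = 3.50 − 4.995 = −1.495, so on average we lose about $1.50 for every $5 bet.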
Expected value
Let X be a random variable with pmf pX(x) or pdf fX(x). The expected value of X, denoted E(X), is defined by
E(X) = ∑_{k=−∞}^{∞} k pX(k)
in the discrete case, and by
E(X) = ∫_{−∞}^{∞} x fX(x) dx
in the continuous case, provided the sum or integral is absolutely convergent.
In physics, the expected value is called the center of mass.
The Poisson distribution
Recall that the Poisson distribution has pmf
p(x) = λ^x e^{−λ} / x!,  x = 0, 1, 2, …
Here the parameter λ commonly describes the average number of events per unit time. To see that this is indeed the expected value we compute
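A standard way to carry this out:
E(X) = ∑_{x=0}^{∞} x λ^x e^{−λ} / x! = λ e^{−λ} ∑_{x=1}^{∞} λ^{x−1} / (x−1)! = λ e^{−λ} e^{λ} = λ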
An exam problem
A true-false question on a statistics exam has ten parts, each worth 2 points. Any incorrect answer is penalized by -3 points, although a negative total is recorded as 0. If a student guesses, what is the expected score for a part? Note that the expected value need not be a possible value!
Clearly, guessing should be avoided unless one is fairly certain of the answer. What is the smallest probability of answering correctly that does not yield a negative expected score for a part?
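One way to work these out: guessing gives each part an expected score of 2 × (1/2) + (−3) × (1/2) = −0.5, and if p denotes the probability of answering a part correctly, the expected score 2p − 3(1 − p) = 5p − 3 is nonnegative exactly when p ≥ 3/5 = 0.6.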
Lotka’s family size model
What proportion of people are firstborn? The number of children of white families in the 1920’s was described by
p(k) = βp(1 − p)^k,  k = 1, 2, …
p(0) = 1 − β(1 − p)
a geometric distribution with modified zero term. Here p and β are probabilities. Find the expected number of children. For β = 0.879, p = 0.264, we get expected # children 2.45.
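The computation behind the 2.45: E(# children) = ∑_{k≥1} k βp(1 − p)^k = βp (1 − p)/p² = β(1 − p)/p, and with β = 0.879, p = 0.264 this is 0.879 × 0.736 / 0.264 ≈ 2.45.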
Some properties of expected values
E(X + a) = E(X) + a
E(bX) = b E(X)

Law of the unconscious probabilist
E(g(X)) = ∑_x g(x) pX(x)
Proof: Suppose that g(x) takes on values y1, y2, … Then
pg(X)(yi) = P(g(X) = yi) = ∑_{x: g(x)=yi} pX(x)
and
E(g(X)) = ∑_{i=1}^{∞} yi pg(X)(yi) = ∑_{i=1}^{∞} yi ∑_{x: g(x)=yi} pX(x) = ∑_{i=1}^{∞} ∑_{x: g(x)=yi} g(x) pX(x) = ∑_x g(x) pX(x)
The addition rule for expectations
E(X + Y) = E(X) + E(Y)
A special case: E(aX + b) = aE(X) + b
NOTE: No assumption of independence. This result holds whenever the expectations exist.

The binomial distribution
Using the addition rule for expectations, write X = ∑_{i=1}^{n} Yi, where the Yi are indicators of success, i.e.,
Yi = 1 if the i'th event occurs, 0 otherwise
Then
E(X) = ∑_{i=1}^{n} E(Yi) = ∑_{i=1}^{n} {0 × P(Yi = 0) + 1 × P(Yi = 1)} = np
What is the expected value of the hypergeometric distribution?

Group testing
A large number n of blood samples are to be screened for HIV. Testing each sample separately requires n tests. Pooling half of each sample requires one test if all samples are free from HIV, while if at least one is defective, the other half of each sample could be tested individually. Most of the time we may get away with doing just one test.
More generally, group the n samples into m groups of k samples each (so n = mk), and test each of the groups separately. If the test is negative, we are done, while if it is positive, each individual in the group is tested.

Group testing, cont.
Let p = P(individual test negative), and Xi = # tests on group i. Then
E(Xi) =
Let N be the total number of tests needed, or N = ∑_{i=1}^{m} Xi. We compute
E(N) =
[Figure: 1 + 1/k − p^k, the expected number of tests per sample, plotted against the group size k for k = 5 to 20.]
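Under the setup on the slide, E(Xi) = 1 · p^k + (1 + k)(1 − p^k) = 1 + k(1 − p^k), so E(N) = m E(Xi) = n(1 + 1/k − p^k), the per-sample quantity in the figure. A minimal Python sketch (not part of the slides; the values p = 0.99 and k = 10 are only illustrative) comparing the formula with a simulation:

# Expected number of tests per sample in group testing, checked by simulation.
import random

def expected_tests_per_sample(p, k):
    # One pooled test per group of k, plus k individual tests when the pool is positive.
    return 1 + 1 / k - p ** k

def simulated_tests_per_sample(p, k, n_groups=100_000, seed=0):
    rng = random.Random(seed)
    tests = 0
    for _ in range(n_groups):
        all_negative = all(rng.random() < p for _ in range(k))
        tests += 1 if all_negative else 1 + k
    return tests / (n_groups * k)

p, k = 0.99, 10
print(expected_tests_per_sample(p, k))   # about 0.196
print(simulated_tests_per_sample(p, k))  # close to the formula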
Measuring spread
The expected value is one measure of the location of the distribution of a random variable. In addition to the location it is useful to have a description of how spread out a distribution is.
[Figure: pairs of normal densities (dnorm) illustrating different location / same spread and same location / different spread.]
Var(X) = E(X − E(X))²
the moment of inertia in mechanics
units: those of X squared
sd(X) = √Var(X)
units: those of X
The variance is defined whenever E(X²) < ∞.

Computation
Var(X) = E((X − E(X))²) = E(X²) − 2E(X E(X)) + [E(X)]² = E(X²) − 2E(X)E(X) + [E(X)]² = E(X²) − [E(X)]²
Var(aX + b) =
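Since E(aX + b) = aE(X) + b, we get Var(aX + b) = E[(aX + b − aE(X) − b)²] = a² E[(X − E(X))²] = a² Var(X).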
Some examples
Exponential distribution: f(x) = α exp(−αx), x > 0
E(X) = ∫_0^∞ x α exp(−αx) dx = 1/α
E(X²) = ∫_0^∞ x² α exp(−αx) dx = 2/α²
Var(X) = E(X²) − [E(X)]² = 2/α² − 1/α² = 1/α²
Poisson distribution: p(x) = λ^x e^{−λ} / x!
E(X) = λ
E(X(X − 1)) = ∑_{x=0}^{∞} x(x − 1) (λ^x / x!) e^{−λ} = λ² ∑_{x=2}^{∞} (λ^{x−2} / (x − 2)!) e^{−λ} = λ²
E(X²) = E(X(X − 1)) + E(X) = λ² + λ
Var(X) = λ² + λ − λ² = λ

The Bienaymé-Chebyshev inequality
Consider a random variable X with expected value µ and variance σ². Then for any t > 0 we have that
P(|X − µ| > t) ≤ σ² / t²
Proof:
P(|X − µ| > t) = ∫_{x: |x − µ| > t} f(x) dx ≤ ∫_{x: |x − µ| > t} ((x − µ)² / t²) f(x) dx ≤ ∫_{−∞}^{∞} ((x − µ)² / t²) f(x) dx = σ² / t²
Irénée-Jules Bienaymé, 1796-1878
Pafnuty Chebyshev, 1821-1894
How good is the Bienaymé-Chebyshev inequality?
Let X ~ Exp(1). How well does the inequality estimate P(|X − 1| > 2)? We have E(X) = Var(X) = 1, so the inequality says that
P(|X − 1| > 2) ≤
while the exact probability is
P(|X − 1| > 2) =
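Filling in the numbers: the Chebyshev bound is σ²/t² = 1/2² = 0.25, while, since X > 0, the event |X − 1| > 2 is just X > 3, so the exact probability is e^{−3} ≈ 0.0498. The bound holds but is quite conservative here.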
Higher moments
Let µk = E(X^k) and ck = E(X − µ)^k.
In particular:
µ1 =
c2 =
c3 is called the skewness, and c4 the kurtosis. Sometimes the coefficient of skewness is defined as c3/c2^{3/2}, and the coefficient of kurtosis as c4/c2².
Example
Let fX(x) = λ exp(−λx), x > 0. Then
E(X^k) = ∫_0^∞ x^k λ e^{−λx} dx = (1/λ^k) ∫_0^∞ u^k e^{−u} du = k!/λ^k
(using Γ(k + 1) = k!)
Now let fX(x) = (2π)^{−1/2} exp(−x²/2). Then
E(X^{2k+1}) =
E(X^{2k}) = ∫_{−∞}^{∞} x^{2k} (2π)^{−1/2} e^{−x²/2} dx = (2^{k+1/2}/√(2π)) ∫_0^∞ u^{k−1/2} e^{−u} du = 1 × 3 × 5 × ⋯ × (2k − 1) = (2k − 1)!!

Joint distribution
If we consider two random variables, X and Y, we need to consider their joint behavior.
Joint cdf: FX,Y(x,y) = P(X ≤ x, Y ≤ y)
Joint pmf: pX,Y(x,y) = P(X = x, Y = y)
Joint pdf: fX,Y(x,y) = ∂²FX,Y(x,y) / ∂x∂y
Queue lengths
A supermarket has two express lines. At a given time, let X and Y be the number of customers in line 1 and 2, respectively. The joint pmf is

         X = 0   X = 1   X = 2   X = 3
y = 0    0.1     0.2     0       0
y = 1    0.2     0.25    0.05    0
y = 2    0       0.05    0.05    0.02
y = 3    0       0.03    0.05

P(X ≠ Y) =

A random point in a circle
What is the density of (X,Y) if we choose the point at random in the unit circle, in the sense that equal areas have equal probabilities?
A random point in a circle
Suppose that we are interested in the expected distance from the origin of a point (X,Y) selected at random in the unit circle, or Z = (X² + Y²)^{1/2}. To compute its expected value we simply calculate
E(Z) = ∫∫_{x²+y²≤1} (x² + y²)^{1/2} fX,Y(x,y) dx dy = ∫_{θ=−π}^{π} ∫_{r=0}^{1} r (1/π) r dr dθ = ∫_0^1 r² (2π/π) dr = 2/3
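A minimal simulation check of the 2/3 above (not part of the slides), drawing uniform points in the unit circle by rejection sampling:

import math
import random

def sample_unit_disk(rng):
    # Rejection sampling: draw from the square [-1, 1]^2 until the point lands in the disk.
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

rng = random.Random(0)
n = 200_000
mean_distance = sum(math.hypot(*sample_unit_disk(rng)) for _ in range(n)) / n
print(mean_distance)  # should be close to 2/3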
Marginal distributions
If (X,Y) is discrete, the marginal distribution of X is given by
pX(x) = ∑_y pX,Y(x,y)
while if they are continuous it is given by
fX(x) = ∫_{y=−∞}^{∞} fX,Y(x,y) dy
The marginal distribution contains no information about the joint behavior of X and Y.

More about a random point in a circle
Given a point chosen at random in the unit circle, what is the distribution of the x-coordinate?
fX(x) = ∫_{−∞}^{∞} fX,Y(x,y) dy = ∫_{−√(1−x²)}^{√(1−x²)} (1/π) dy = (2/π)√(1 − x²),  −1 ≤ x ≤ 1

Covariance
When two random variables are not independent, it is sometimes important to have a measure of their dependence. A complete description is the joint distribution, but a simple summary is the covariance, given by
Cov(X,Y) = E((X − E(X))(Y − E(Y)))
= E(XY) − E(X)E(Y) − E(Y)E(X) + E(X)E(Y)
= E(XY) − E(X)E(Y)
Cov(aX + bY, cZ + dV) = ac Cov(X,Z) + bc Cov(Y,Z) + ad Cov(X,V) + bd Cov(Y,V)
Cov(X,X) =
An example
Let pX,Y(x,y) = (x + 2y)/22, where (x,y) takes values in {(1,1), (1,3), (2,1), (2,3)}.

Contributions to the covariance
[Figure: the plane divided into quadrants around (EX, EY); (X − EX)(Y − EY) > 0 in the upper right and lower left quadrants, and (X − EX)(Y − EY) < 0 in the other two.]
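Carrying out the computation for this example: E(X) = 1 × (10/22) + 2 × (12/22) = 17/11, E(Y) = 1 × (7/22) + 3 × (15/22) = 26/11, E(XY) = (3 + 21 + 8 + 48)/22 = 40/11, so Cov(X,Y) = 40/11 − (17/11)(26/11) = −2/121 ≈ −0.017.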
The variance of a sum
Var(X + Y) =
Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi) + ∑∑_{i≠j} Cov(Xi, Xj)
In particular, if the Xi are independent we have
Cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj) = E(Xi)E(Xj) − E(Xi)E(Xj) = 0
so that the variance of a sum of independent random variables is the sum of the variances.

Binomial distribution
As for the mean, write X = ∑_{i=1}^{n} Yi, where the Yi are independent indicators of success with Var(Yi) = pq, so, using the independence of the Yi,
Var(X) = ∑_{i=1}^{n} Var(Yi) = npq

Hypergeometric variance
As for the binomial distribution, write X = ∑_{i=1}^{n} Yi, where Yi indicates a success on the i'th draw. Writing p = r/N we have again that Var(Yi) = pq, but now the Yi are no longer independent. We need to compute their covariance. By symmetry,

A linear relationship
Let Y = aX + b, a and b constants. Then
Cov(X,Y) =
Cov(aX + b, cY + d) =
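These blanks follow from the bilinearity formula on the Covariance slide: for Y = aX + b, Cov(X,Y) = Cov(X, aX + b) = a Cov(X,X) = a Var(X), and in general Cov(aX + b, cY + d) = ac Cov(X,Y), since the additive constants do not contribute.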
Correlation coefficient
We call a random variable standardized if it has mean 0 and variance 1. This can be achieved simply by subtracting off the mean and dividing by the sd.
The covariance between two standardized random variables is called the correlation coefficient. It is a number between -1 and 1.
Corr(X,Y) = Cov(X*,Y*) = Cov(X,Y) / (sd(X) sd(Y))

Cauchy-Schwarz inequality
If S and T are random variables with finite variance, then
(E(ST))² ≤ E(S²) E(T²)
Proof: If Z is nonnegative, then E(Z) ≥ 0. Let a be arbitrary. Then
0 ≤ E(aS + T)² = a² E(S²) + 2a E(ST) + E(T²)
A nonnegative quadratic (in a) must have nonpositive discriminant, i.e. 4(E(ST))² − 4E(S²)E(T²) ≤ 0.
Corollary: |Corr(S,T)| ≤ 1.
Hermann Schwarz, 1843-1921
Augustin-Louis Cauchy, 1789-1857
Some scatters of points
Monday’s lecture
Covariance and linear functions
Challenger explosion
Correlation coefficient
Cauchy-Schwarz inequality
The long run
Consider independent and identically distributed (iid) random variables Xi, taking on the values 0 or 1 with equal probabilities. What happens if we average them?

The law of large numbers
Let X1, …, Xn be iid random variables with mean µ and variance σ² < ∞. Then
P(|(1/n) ∑_{i=1}^{n} (Xi − µ)| > ε) → 0 as n → ∞
We say that X̄ = (1/n) ∑ Xi converges in probability to µ, written X̄ →P µ.
Proof:
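The standard argument combines the variance of a sum with the Bienaymé-Chebyshev inequality: for independent Xi, Var(X̄) = σ²/n, so P(|X̄ − µ| > ε) ≤ Var(X̄)/ε² = σ²/(nε²) → 0 as n → ∞.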
Monte Carlo integration
Suppose we want to integrate a function f(x) from 0 to 1, but are unable to do the integration analytically. One approach is to use random numbers. Write
∫_0^1 f(x) dx = ∫_0^1 (f(x) × 1) dx = E(f(X))
where X ~ U(0,1). By the law of large numbers we have
(1/n) ∑_{i=1}^{n} f(Xi) →P E(f(X))
where X1, …, Xn are iid as X. How can we use this?
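A minimal Python sketch of this idea (not from the slides; the integrand exp(−x²) is just an illustrative choice):

import math
import random

def monte_carlo_integral(f, n=100_000, seed=0):
    # Estimate the integral of f over [0, 1] by averaging f at n uniform random points.
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

print(monte_carlo_integral(lambda x: math.exp(-x * x)))  # roughly 0.7468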
An optic experiment
Consider a concentrated light source, such as a laser, reflected in a mirror which can be moved around a vertical axis A. We are interested in the location of the reflection on a wall.
[Figure: the mirror turned through an angle φ, with the reflected beam hitting the wall at location X.]
Averaging Cauchy variables
Why does the law of large numbers not work for the Cauchy variables?

The mean of the Cauchy distribution
Compute
E(X) = ∫_{−∞}^{∞} x / (π(1 + x²)) dx
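For large x the integrand behaves like 1/(πx), so ∫_0^∞ x/(π(1 + x²)) dx diverges: the integral is not absolutely convergent, the expected value of the Cauchy distribution is undefined, and the hypothesis of the law of large numbers fails.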