1 Introduction to Probability

1.1 Basic Rules of Probability

Set Theory Digression
A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of points under consideration is called the space, universe, or universal set. For Probability Theory the space is called the sample space.
A set A is called a subset of B (we write A ⊆ B or B ⊇ A) if every element of A is also an element of B. A is called a proper subset of B (we write A ⊂ B or B ⊃ A) if every element of A is also an element of B and there is at least one element of B which does not belong to A.
Two sets A and B are called equivalent sets or equal sets (we write A = B) if A ⊆ B and B ⊆ A.
If a set has no points, it will be called the empty or null set and denoted by ∅.
The complement of a set A with respect to the space Ω, denoted by Ā, A^c, or Ω − A, is the set of all points that are in Ω but not in A.
The intersection of two sets A and B is a set that consists of the common elements of the two sets and it is denoted by A ∩ B or AB.
The union of two sets A and B is a set that consists of all points that are in A or B or both (but counted only once) and it is denoted by A ∪ B.
The set difference of two sets A and B is a set that consists of all points in A that are not in B and it is denoted by A − B.
Properties of Set Operations

Commutative: A ∪ B = B ∪ A and A ∩ B = B ∩ A.
Associative: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.
Distributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(A^c)^c = A, i.e. the complement of the complement of A is A itself.
If A is a subset of Ω (the space) then: A ∩ Ω = A, A ∪ Ω = Ω, A ∩ ∅ = ∅, A ∪ ∅ = A, A ∩ A^c = ∅, A ∪ A^c = Ω, A ∩ A = A, and A ∪ A = A.
De Morgan Laws: (A ∪ B)^c = A^c ∩ B^c, and (A ∩ B)^c = A^c ∪ B^c.
Disjoint or mutually exclusive sets are sets whose intersection is the empty set, i.e. A and B are mutually exclusive if A ∩ B = ∅. Subsets A1, A2, ... are mutually exclusive if Ai ∩ Aj = ∅ for any i ≠ j.
The sample space is the collection or totality of all possible outcomes
of a conceptual experiment.
An event is a subset of the sample space. The class of all events associated with a given experiment is defined to be the event space.
Classical or a priori Probability: If a random experiment can result in N mutually exclusive and equally likely outcomes and if N(A) of these outcomes have an attribute A, then the probability of A is the fraction N(A)/N, i.e. P(A) = N(A)/N, where N = N(A) + N(A^c).

Example: Consider drawing an ace (event A) from a deck of 52 cards. What is P(A)?
We have that N(A) = 4 and N(A^c) = 48. Then N = N(A) + N(A^c) = 4 + 48 = 52 and P(A) = N(A)/N = 4/52.
Frequency or a posteriori Probability: Is the ratio n(A)/n of the number of times n(A) that an event A has occurred to the number n of trials, i.e. P(A) = n(A)/n.

Example: Assume that we flip a coin 1000 times and we observe 450 heads. Then the a posteriori probability is P(A) = n(A)/n = 450/1000 = 0.45 (this is also the relative frequency). Notice that the a priori probability is in this case 0.5.
Subjective Probability: This is based on intuition or judgment.
We shall be concerned with a priori probabilities. These probabilities
involve, many times, the counting of possible outcomes.
1.1.1 Methods of Counting

We have the following cases:

1. Duplication is permissible and Order is important (Multiple Choice Arrangement), i.e. the element AA is permitted and AB is a different element from BA. In this case, where we want to arrange n objects in x places, the number of possible outcomes is given by: M_x^n = n^x.

Example: Find all possible combinations of the letters A, B, C, and D (two at a time) when duplication is allowed and order is important.
The result according to the formula is: n = 4 and x = 2, consequently the possible number of combinations is M_2^4 = 4^2 = 16. To find the result we can also use a tree diagram.
2. Duplication is not permissible and Order is important (Permutation Arrangement), i.e. the element AA is not permitted and AB is a different element from BA. In this case, where we want to permute n objects in x places, the number of possible outcomes is given by: P_x^n = n!/(n − x)!.

Example: Find all possible permutations of the letters A, B, C, and D (two at a time) when duplication is not allowed and order is important.
The result according to the formula is: n = 4 and x = 2, consequently the possible number of permutations is P_2^4 = 4!/(4 − 2)! = (2·3·4)/2 = 12.
3. Duplication is not permissible and Order is not important (Combination Arrangement), i.e. the element AA is not permitted and AB is not a different element from BA. In this case, where we want the combinations of n objects in x places, the number of possible outcomes is given by: C_x^n = n!/(x!(n − x)!).

Example: Find all possible combinations of the letters A, B, C, and D (two at a time) when duplication is not allowed and order is not important.
The result according to the formula is: n = 4 and x = 2, consequently the possible number of combinations is C_2^4 = 4!/(2!(4 − 2)!) = (2·3·4)/(2·2) = 6.
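These three counting rules are easy to check by brute force. The following short Python sketch (added here for illustration, not part of the original notes) enumerates the two-letter selections of A, B, C, D under each rule and reproduces the counts 16, 12 and 6.

from itertools import product, permutations, combinations

letters = ["A", "B", "C", "D"]

# 1. Duplication allowed, order important: n^x = 4^2 = 16
multiple_choice = list(product(letters, repeat=2))

# 2. No duplication, order important: n!/(n-x)! = 4!/2! = 12
perms = list(permutations(letters, 2))

# 3. No duplication, order not important: n!/(x!(n-x)!) = 6
combs = list(combinations(letters, 2))

print(len(multiple_choice), len(perms), len(combs))  # 16 12 6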
1.1.2 Probability Definition and Properties

To define probability rigorously we need the following definition of an event space, say A, which is a collection of subsets of the sample space Ω. A is an event space if:
i) Ω ∈ A, i.e. the sample space belongs to the event space;
ii) if A ∈ A, then A^c ∈ A; and
iii) if A1 ∈ A and A2 ∈ A, then A1 ∪ A2 ∈ A.
Under these 3 conditions A is called an algebra of events or simply an event space.

P[·], a set function with domain A and counterdomain the closed interval [0,1], is a Probability Function or simply Probability if it satisfies the following conditions:
i) P[A] ≥ 0 for every A ∈ A;
ii) P[Ω] = 1; and
iii) P[∪_{i=1}^∞ Ai] = Σ_{i=1}^∞ P[Ai]
for any sequence of mutually exclusive events A1, A2, ... (i.e. Ai ∩ Aj = ∅ for any i ≠ j) with A1 ∪ A2 ∪ ... = ∪_{i=1}^∞ Ai ∈ A.
Properties of Probability

1. P[∅] = 0.
2. If A1, A2, ..., An are mutually exclusive events then P[∪_{i=1}^n Ai] = Σ_{i=1}^n P[Ai].
3. If A is an event in A, then P[A^c] = 1 − P[A].
4. For every two events A ∈ A and B ∈ A, P[A ∪ B] = P[A] + P[B] − P[AB]. More generally, for events A1, A2, ..., An ∈ A we have:
P[∪_{i=1}^n Ai] = Σ_i P[Ai] − ΣΣ_{i<j} P[Ai Aj] + ΣΣΣ_{i<j<k} P[Ai Aj Ak] − ... + (−1)^{n+1} P[A1 A2 ... An].
For n = 3 the above formula is:
P[A1 ∪ A2 ∪ A3] = P[A1] + P[A2] + P[A3] − P[A1 A2] − P[A1 A3] − P[A2 A3] + P[A1 A2 A3].
Notice that if A and B are mutually exclusive, then P[A ∪ B] = P[A] + P[B].
5. If A ∈ A, B ∈ A, and A ⊆ B, then P[A] ≤ P[B].
With the use of Venn Diagrams we can have an intuitive explanation to
the above properties. The triplet (Ω, A, P[·]) is called a probability space.
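As a quick numerical illustration of property 4 (an added sketch, not part of the original notes), the following Python lines check the inclusion-exclusion formula on the experiment of rolling one die, with A the event of an even outcome and B the event of an outcome of at least 4.

from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

def P(event):
    # classical probability: favourable outcomes over total outcomes
    return Fraction(len(event), len(omega))

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs, lhs == rhs)  # 2/3 2/3 True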
1.1.3 Conditional Probability and Independence

Let A and B be two events in A and P[·] a probability function. The conditional probability of A given event B, denoted by P[A|B], is defined by:
P[A|B] = P[AB]/P[B]   if P[B] > 0
and is left undefined if P[B] = 0.
From the above formula it is evident that P[AB] = P[A|B]P[B] = P[B|A]P[A]
if both P [A] and P [B] are nonzero. Notice that when speaking of conditional
probabilities we are conditioning on some given event B; that is, we are
assuming that the experiment has resulted in some outcome in B. B, in
effect, then becomes our "new" sample space. All probability properties of the
previous section apply to conditional probabilities as well. However, there is
an additional property (Law) called the Law of Total Probabilities which
states that:
For a given probability space (Ω, A, P[·]), if B1, B2, ..., Bn is a collection of mutually exclusive events in A satisfying ∪_{i=1}^n Bi = Ω and P[Bi] > 0 for i = 1, 2, ..., n, then for every A ∈ A,
P[A] = Σ_{i=1}^n P[A|Bi] P[Bi].
Another important theorem in probability is the so-called Bayes' Theorem, which states:
Given a probability space (Ω, A, P[·]), if B1, B2, ..., Bn is a collection of mutually exclusive events in A satisfying ∪_{i=1}^n Bi = Ω and P[Bi] > 0 for i = 1, 2, ..., n, then for every A ∈ A for which P[A] > 0 we have:
P[Bj|A] = P[A|Bj] P[Bj] / Σ_{i=1}^n P[A|Bi] P[Bi].
Notice that for events A and B ∈ A which satisfy P[A] > 0 and P[B] > 0 we have:
P[B|A] = P[A|B] P[B] / (P[A|B] P[B] + P[A|B^c] P[B^c]).
Finally the Multiplication Rule states:
Given a probability space (Ω, A, P[·]), if A1, A2, ..., An are events in A for which P[A1 A2 ··· A_{n−1}] > 0, then:
P[A1 A2 ··· An] = P[A1] P[A2|A1] P[A3|A1 A2] ··· P[An|A1 A2 ··· A_{n−1}].
Example: There are five boxes and they are numbered 1 to 5. Each box contains 10 balls. Box i has i defective balls and 10 − i non-defective balls, i = 1, 2, ..., 5. Consider the following random experiment: First a box is selected at random, and then a ball is selected at random from the selected box. 1) What is the probability that a defective ball will be selected? 2) If we have already selected the ball and noted that it is defective, what is the probability that it came from box 5?
Let A denote the event that a defective ball is selected and Bi the event that box i is selected, i = 1, 2, ..., 5. Note that P[Bi] = 1/5, for i = 1, 2, ..., 5, and P[A|Bi] = i/10. Question 1) asks what is P[A]. Using the theorem of total probabilities we have:
P[A] = Σ_{i=1}^5 P[A|Bi] P[Bi] = Σ_{i=1}^5 (i/10)(1/5) = 3/10.
Notice that the total number of defective balls is 15 out of 50. Hence in this case we can say that P[A] = 15/50 = 3/10. This is true because the probability of choosing each of the 5 boxes is the same. Question 2) asks what is P[B5|A]. Since box 5 contains more defective balls than box 4, which contains more defective balls than box 3 and so on, we expect to find that P[B5|A] > P[B4|A] > P[B3|A] > P[B2|A] > P[B1|A]. We apply Bayes' theorem:
P[B5|A] = P[A|B5] P[B5] / Σ_{i=1}^5 P[A|Bi] P[Bi] = ((5/10)(1/5)) / (3/10) = 1/3.
Similarly P[Bj|A] = P[A|Bj] P[Bj] / Σ_{i=1}^5 P[A|Bi] P[Bi] = ((j/10)(1/5)) / (3/10) = j/15 for j = 1, 2, ..., 5. Notice that unconditionally all Bi's were equally likely.
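The two calculations in this example are easy to verify numerically. Below is a small Python sketch (added for illustration, not part of the original notes) that applies the law of total probabilities and Bayes' theorem to the five boxes.

from fractions import Fraction

boxes = range(1, 6)
prior = {i: Fraction(1, 5) for i in boxes}          # P[Bi] = 1/5
likelihood = {i: Fraction(i, 10) for i in boxes}    # P[A|Bi] = i/10

# Law of total probabilities: P[A] = sum_i P[A|Bi] P[Bi]
p_defective = sum(likelihood[i] * prior[i] for i in boxes)
print(p_defective)                                  # 3/10

# Bayes' theorem: P[Bj|A] = P[A|Bj] P[Bj] / P[A]
posterior = {i: likelihood[i] * prior[i] / p_defective for i in boxes}
print(posterior[5])                                 # 1/3
print([posterior[j] == Fraction(j, 15) for j in boxes])  # all True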
Let A and B be two events in A and P[·] a probability function. Events A and B are defined independent if and only if one of the following conditions is satisfied:
(i) P[AB] = P[A]P[B];
(ii) P[A|B] = P[A] if P[B] > 0; and
(iii) P[B|A] = P[B] if P[A] > 0.
Example: Consider tossing two dice. Let A denote the event of an odd total, B the event of an ace on the first die, and C the event of a total of seven. We ask the following:
(i) Are A and B independent?
(ii) Are A and C independent?
(iii) Are B and C independent?
(i) P[A|B] = 1/2 and P[A] = 1/2, hence P[A|B] = P[A] and consequently A and B are independent.
(ii) P[A|C] = 1 ≠ P[A] = 1/2, hence A and C are not independent.
(iii) P[C|B] = 1/6 = P[C], hence B and C are independent.
Notice that although A and B are independent and C and B are independent, A and C are not independent.
Let us extend the independence of two events to several ones:
For a given probability space (Ω, A, P[·]), let A1, A2, ..., An be n events in A. Events A1, A2, ..., An are defined to be independent if and only if:
P[Ai Aj] = P[Ai] P[Aj] for i ≠ j,
P[Ai Aj Ak] = P[Ai] P[Aj] P[Ak] for i ≠ j, i ≠ k, k ≠ j,
and so on,
P[∩_{i=1}^n Ai] = ∏_{i=1}^n P[Ai].
Notice that pairwise independence does not imply independence, as the following example shows.

Example: Consider tossing two dice. Let A1 denote the event of an odd face on the first die, A2 the event of an odd face on the second die, and A3 the event of an odd total. Then we have: P[A1]P[A2] = (1/2)(1/2) = 1/4 = P[A1 A2]; P[A1]P[A3] = (1/2)(1/2) = P[A3|A1]P[A1] = P[A1 A3]; and P[A2 A3] = 1/4 = P[A2]P[A3]. Hence A1, A2, A3 are pairwise independent. However, notice that P[A1 A2 A3] = 0 ≠ 1/8 = P[A1]P[A2]P[A3]. Hence A1, A2, A3 are not independent.
Notice that the property that two events A and B are independent and the property that A and B are mutually exclusive are distinct, though related, properties. We know that if A and B are mutually exclusive then P[AB] = 0. Now if these events are also independent then P[AB] = P[A]P[B], and consequently P[A]P[B] = 0, which means that either P[A] = 0 or P[B] = 0. Hence two mutually exclusive events are independent only if P[A] = 0 or P[B] = 0. On the other hand, if P[A] ≠ 0 and P[B] ≠ 0, then if A and B are independent they cannot be mutually exclusive, and conversely, if they are mutually exclusive they cannot be independent.
Example: A plant has two machines. Machine A produces 60% of the total output with a fraction defective of 0.02. Machine B produces the rest of the output with a fraction defective of 0.04. If a single unit of output is observed to be defective, what is the probability that this unit was produced by machine A?
Let A be the event that the unit was produced by machine A, B the event that the unit was produced by machine B, and D the event that the unit is defective. Then we ask what is P[A|D]. But P[A|D] = P[AD]/P[D]. Now P[AD] = P[D|A]P[A] = 0.02 × 0.6 = 0.012. Also P[D] = P[D|A]P[A] + P[D|B]P[B] = 0.012 + 0.04 × 0.4 = 0.028. Consequently, P[A|D] = 0.012/0.028 ≈ 0.429. Notice that P[B|D] = 1 − P[A|D] ≈ 0.571. We can also use a tree diagram to evaluate P[AD] and P[BD].
Example: A marketing manager believes the market demand potential of a new product to be high with probability 0.30, average with probability 0.50, or low with probability 0.20. From a sample of 20 employees, 14 indicated a very favorable reception to the new product. In the past such an employee response (14 out of 20 favorable) has occurred with the following probabilities: if the actual demand is high, the probability of favorable reception is 0.80; if the actual demand is average, the probability of favorable reception is 0.55; and if the actual demand is low, the probability of favorable reception is 0.30. Thus, given a favorable reception, what is the probability of actual high demand?
Again what we ask is P[H|F] = P[HF]/P[F]. Now P[F] = P[H]P[F|H] + P[A]P[F|A] + P[L]P[F|L] = 0.24 + 0.275 + 0.06 = 0.575. Also P[HF] = P[F|H]P[H] = 0.24. Hence P[H|F] = 0.24/0.575 = 0.4174.
1.2 Discrete and Continuous Random Variables

1.2.1 Random Variable and Cumulative Distribution Function

For a given probability space (Ω, A, P[·]) a random variable, denoted by X or X(·), is a function with domain Ω and counterdomain the real line. The function X(·) must be such that the set Ar, defined by Ar = {ω : X(ω) ≤ r}, belongs to A for every real number r.
The important part of the definition is that, in terms of a random experiment, Ω is the totality of outcomes of that random experiment, and the function, or random variable, X(·) with domain Ω makes some real number correspond to each outcome of the experiment. The fact that we also require the collection of ω's for which X(ω) ≤ r to be an event (i.e. an element of A) for each real number r is not much of a restriction since the use of random variables is, in our case, to describe only events.
Example: Consider the experiment of tossing a single coin. Let the random variable X denote the number of heads. In this case Ω = {head, tail}, and X(ω) = 1 if ω = head, and X(ω) = 0 if ω = tail. So the random variable X associates a real number with each outcome of the experiment. To show that X satisfies the definition we should show that {ω : X(ω) ≤ r} belongs to A for every real number r. Here A = {∅, {head}, {tail}, Ω}. Now if r < 0, then {ω : X(ω) ≤ r} = ∅; if 0 ≤ r < 1, then {ω : X(ω) ≤ r} = {tail}; and if r ≥ 1, then {ω : X(ω) ≤ r} = {head, tail} = Ω. Hence, for each r the set {ω : X(ω) ≤ r} belongs to A and consequently X(·) is a random variable.
In the above example the random variable is described in terms of the random experiment as opposed to its functional form, which is the usual case.
The cumulative distribution function of a random variable X, denoted by FX(·), is defined to be the function with domain the real line and counterdomain the interval [0, 1] which satisfies FX(x) = P[X ≤ x] = P[{ω : X(ω) ≤ x}] for every real number x.
A cumulative distribution function is uniquely defined for each random variable. If it is known, it can be used to find probabilities of events defined in terms of its corresponding random variable. Notice that it is the requirement in the definition of a random variable that {ω : X(ω) ≤ r} belongs to A for every real number r which makes FX(·) well defined. Notice that each of the three words in the expression "cumulative distribution function" is justifiable.
Example: Consider again the experiment of tossing a single coin. Assuming that the coin is fair, let X denote the number of heads. Then:
FX(x) = 0 for x < 0
FX(x) = 1/2 for 0 ≤ x < 1
FX(x) = 1 for 1 ≤ x
A cumulative distribution function has the following properties:
i) FX(−∞) = lim_{x→−∞} FX(x) = 0 and FX(+∞) = lim_{x→+∞} FX(x) = 1;
ii) FX(·) is a monotone, nondecreasing function, i.e. if α < β then FX(α) ≤ FX(β);
iii) FX(·) is continuous from the right, i.e. lim_{h→0+} FX(x + h) = FX(x).
Now, we can say that any function with domain the real line and counterdomain the interval [0, 1] satisfying the above 3 properties can be called a cumulative distribution function. Now we can define discrete and continuous random variables.
1.2.2 Discrete Random Variable

A random variable X will be defined to be discrete if the range of X is countable. If a random variable X is discrete, then its corresponding cumulative distribution function FX(·) will be defined to be discrete.
By the range of X being countable we mean that there exists a finite or denumerable set of real numbers, say x1, x2, ..., xn, ..., such that X takes on values only in that set. If X is discrete with distinct values x1, x2, ..., xn, ..., then Ω = ∪_n {ω : X(ω) = xn}, and {X = xi} ∩ {X = xj} = ∅ for i ≠ j. Hence 1 = P[Ω] = Σ_n P[X = xn] by the third axiom of probability.
If X is a discrete random variable with distinct values x1, x2, ..., xn, ..., then the function denoted by fX(·) and defined by
fX(x) = P[X = x] if x = xj, j = 1, 2, ..., n, ...
fX(x) = 0 if x ≠ xj
is defined to be the discrete density function of X.
Notice that the discrete density function tells us how likely or probable each of the values of a discrete random variable is. It also enables one to calculate the probability of events described in terms of the discrete random variable. Also notice that for any discrete random variable X, FX(·) can be obtained from fX(·), and vice versa.
Example: Consider the experiment of tossing a single die. Let X denote the number of spots on the upper face. Then for this case we have: X takes any value from the set {1, 2, 3, 4, 5, 6}. So X is a discrete random variable. The density function of X is: fX(x) = P[X = x] = 1/6 for any x ∈ {1, 2, 3, 4, 5, 6} and 0 otherwise. The cumulative distribution function of X is: FX(x) = P[X ≤ x] = Σ_{n=1}^{[x]} P[X = n], where [x] denotes the integer part of x. Notice that x can be any real number. However, the points of interest are the elements of {1, 2, 3, 4, 5, 6}. Notice also that in this case Ω = {1, 2, 3, 4, 5, 6} as well, and we do not need any reference to A.
Example: Consider the experiment of tossing two dice. Let X denote the total of the upturned faces. Then for this case we have:
Ω = {(1,1), (1,2), ..., (1,6), (2,1), (2,2), ..., (2,6), (3,1), ..., (6,6)}, a total of (using the multiplication rule) 36 = 6² elements. X takes values in the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. The density function is:
fX(x) = P[X = x] = 1/36 for x = 2 or x = 12
                 = 2/36 for x = 3 or x = 11
                 = 3/36 for x = 4 or x = 10
                 = 4/36 for x = 5 or x = 9
                 = 5/36 for x = 6 or x = 8
                 = 6/36 for x = 7
                 = 0 for any other x.
The cumulative distribution function is:
FX(x) = P[X ≤ x] = Σ_{n=1}^{[x]} P[X = n] = 0 for x < 2
                 = 1/36 for 2 ≤ x < 3
                 = 3/36 for 3 ≤ x < 4
                 = 6/36 for 4 ≤ x < 5
                 = 10/36 for 5 ≤ x < 6
                 ..........
                 = 35/36 for 11 ≤ x < 12
                 = 1 for 12 ≤ x.
Notice that, again, we do not need any reference to A.
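The density above, together with the mean and variance of this example used later in these notes (E[X] = 7 and var[X] = 210/36), can be reproduced by enumerating the 36 outcomes. The following Python sketch is an added illustration, not part of the original notes.

from fractions import Fraction
from itertools import product

totals = [i + j for i, j in product(range(1, 7), repeat=2)]

# discrete density f_X(x) = P[X = x]
f = {x: Fraction(totals.count(x), 36) for x in range(2, 13)}
print(f[7])                                    # 1/6, i.e. 6/36

mean = sum(x * f[x] for x in f)
var = sum((x - mean) ** 2 * f[x] for x in f)
print(mean, var)                               # 7 and 35/6 (= 210/36)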
In fact we can speak of discrete density functions without reference to some random variable at all. Any function f(·) with domain the real line and counterdomain [0, 1] is defined to be a discrete density function if for some countable set x1, x2, ..., xn, ... it has the following properties:
i) f(xj) > 0 for j = 1, 2, ...;
ii) f(x) = 0 for x ≠ xj, j = 1, 2, ...; and
iii) Σ f(xj) = 1, where the summation is over the points x1, x2, ..., xn, ....
1.2.3 Continuous Random Variable

A random variable X is called continuous if there exists a function fX(·) such that FX(x) = ∫_{−∞}^{x} fX(u) du for every real number x. In such a case FX(x) is the cumulative distribution and the function fX(·) is the density function.
Notice that according to the above definition the density function is not uniquely determined. The idea is that if the function changes value at a few points, its integral is unchanged. Furthermore, notice that fX(x) = dFX(x)/dx.
The notations for discrete and continuous density functions are the same, yet they have different interpretations. We know that for discrete random variables fX(x) = P[X = x], which is not true for continuous random variables. Furthermore, for discrete random variables fX(·) is a function with domain the real line and counterdomain the interval [0, 1], whereas, for continuous random variables, fX(·) is a function with domain the real line and counterdomain the interval [0, ∞).
Example: Let X be the random variable representing the length of a telephone conversation. One could model this experiment by assuming that the distribution of X is given by FX(x) = 1 − e^{−λx}, where λ is some positive number and the random variable can take values only in the interval [0, ∞). The density function is dFX(x)/dx = fX(x) = λ e^{−λx}. If we assume that telephone conversations are measured in minutes,
P[5 < X ≤ 10] = ∫_5^{10} fX(x) dx = ∫_5^{10} λ e^{−λx} dx = e^{−5λ} − e^{−10λ},
and for λ = 1/5 we have that P[5 < X ≤ 10] = e^{−1} − e^{−2} = 0.23.
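The probability just computed can be checked either from the closed form or with scipy's exponential distribution, whose scale parameter is 1/λ. This is an added sketch, not part of the original notes.

import numpy as np
from scipy.stats import expon

lam = 1 / 5                        # rate parameter lambda
dist = expon(scale=1 / lam)        # scipy parameterizes by scale = 1/lambda

p = dist.cdf(10) - dist.cdf(5)     # P[5 < X <= 10]
print(p, np.exp(-1) - np.exp(-2))  # both about 0.2325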
The example above indicates that the density functions of continuous random variables are used to calculate probabilities of events defined in terms of the corresponding continuous random variable X, i.e. P[a < X ≤ b] = ∫_a^b fX(x) dx. Again we can give the definition of the density function without any reference to the random variable, i.e. any function f(·) with domain the real line and counterdomain [0, ∞) is defined to be a probability density function if
(i) f(x) ≥ 0 for all x, and
(ii) ∫_{−∞}^{∞} f(x) dx = 1.
In practice when we refer to a certain distribution of a random variable, we state its density or cumulative distribution function. However, notice that not all random variables are either discrete or continuous.
1.3 Expectations and Moments of Random Variables

An extremely useful concept in problems involving random variables or distributions is that of expectation.

1.3.1 Mean

Let X be a random variable. The mean or the expected value of X, denoted by E[X] or μX, is defined by:
(i) E[X] = Σ_j xj P[X = xj] = Σ_j xj fX(xj)
if X is a discrete random variable with counterdomain the countable set {x1, ..., xj, ...}, and
(ii) E[X] = ∫_{−∞}^{∞} x fX(x) dx
if X is a continuous random variable with density function fX(x). Finally,
(iii) E[X] = ∫_0^{∞} [1 − FX(x)] dx − ∫_{−∞}^0 FX(x) dx
for an arbitrary random variable X.
The first two definitions are used in practice to find the mean for discrete and continuous random variables, respectively. The third one is used for the mean of a random variable that is neither discrete nor continuous.
Notice that in the above definition we assume that the sum and the integrals exist; also that the summation in (i) runs over the possible values of j and the j-th term is the value of the random variable multiplied by the probability that the random variable takes this value. Hence E[X] is an average of the values that the random variable takes on, where each value is weighted by the probability that the random variable takes this value. Values that are more probable receive more weight. The same is true in the integral
form in (ii). There the value x is multiplied by the approximate probability
that X equals the value x, i.e. fX (x)dx, and then integrated over all values.
Notice that in the definition of the mean of a random variable, only density functions or cumulative distributions were used. Hence we have really defined the mean for these functions without reference to random variables. We then call the defined mean the mean of the cumulative distribution or the appropriate density function. Hence, we can speak of the mean of a distribution or density function as well as the mean of a random variable.
Notice that E[X] is the center of gravity (or centroid) of the unit mass
that is determined by the density function of X. So the mean of X is a
measure of where the values of the random variable are centered or located
i.e. it is a measure of central location or central tendency.
Example: Consider the experiment of tossing two dice. Let X denote the total of the upturned faces. Then for this case we have:
E[X] = Σ_{i=2}^{12} i fX(i) = 7.

Example: Consider a random variable X that can take only two possible values, 1 and −1, each with probability 0.5. Then the mean of X is:
E[X] = 1 × 0.5 + (−1) × 0.5 = 0.
Notice that the mean in this case is not one of the possible values of X.

Example: Consider a continuous random variable X with density function fX(x) = λ e^{−λx} for x ∈ [0, ∞). Then
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^{∞} x λ e^{−λx} dx = 1/λ.

Example: Consider a continuous random variable X with density function fX(x) = x^{−2} for x ∈ [1, ∞). Then
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_1^{∞} x · x^{−2} dx = lim_{b→∞} log b = ∞,
so we say that the mean does not exist, or that it is infinite.
1.3.2 Variance

Let X be a random variable and let μX be E[X]. The variance of X, denoted by σX² or var[X], is defined by:
(i) var[X] = Σ_j (xj − μX)² P[X = xj] = Σ_j (xj − μX)² fX(xj)
if X is a discrete random variable with counterdomain the countable set {x1, ..., xj, ...},
(ii) var[X] = ∫_{−∞}^{∞} (x − μX)² fX(x) dx
if X is a continuous random variable with density function fX(x), and
(iii) var[X] = ∫_0^{∞} 2x [1 − FX(x) + FX(−x)] dx − μX²
for an arbitrary random variable X.
The variances are defined only if the series in (i) is convergent or if the integrals in (ii) or (iii) exist. Again, the variance of a random variable is defined in terms of the density function or cumulative distribution function of the random variable and consequently, variance can be defined in terms of these functions without reference to a random variable.
Notice that variance is a measure of spread since if the values of the random variable X tend to be far from their mean, the variance of X will be larger than the variance of a comparable random variable whose values tend to be near their mean. It is clear from (i), (ii) and (iii) that the variance is a nonnegative number.
If X is a random variable with variance σX², then the standard deviation of X, denoted by σX, is defined as √var(X).
The standard deviation of a random variable, like the variance, is a measure of spread or dispersion of the values of a random variable. In many applications it is preferable to the variance since it has the same measurement units as the random variable itself. In finance the standard deviation is a measure of risk, although there are other measures as well, e.g. the semi-standard deviation.
Example: Consider the experiment of tossing two dice. Let X denote the total of the upturned faces. Then for this case we have (μX = 7):
var[X] = Σ_{i=2}^{12} (i − μX)² fX(i) = 210/36.
Example: Consider a random variable X that can take only two possible values, 1 and −1, each with probability 0.5. Then the variance of X is (μX = 0):
var[X] = 0.5 × 1² + 0.5 × (−1)² = 1.

Example: Consider a random variable X that can take only two possible values, 10 and −10, each with probability 0.5. Then we have:
μX = E[X] = 10 × 0.5 + (−10) × 0.5 = 0,
and
var[X] = 0.5 × 10² + 0.5 × (−10)² = 100.

Notice that in the second and third examples, the two random variables have the same mean but different variance, the variance being larger for the random variable with values further away from the mean.
Example: Consider a continuous random variable X with density function fX(x) = λ e^{−λx} for x ∈ [0, ∞). Then (μX = 1/λ):
var[X] = ∫_{−∞}^{∞} (x − μX)² fX(x) dx = ∫_0^{∞} (x − 1/λ)² λ e^{−λx} dx = 1/λ².

Example: Consider a continuous random variable X with density function fX(x) = x^{−2} for x ∈ [1, ∞). Then we know that the mean of X does not exist. Consequently, we cannot define the variance.
1.3.3 Expected Value of a Function of a Random Variable

Let X be a random variable and g(·) be a function with domain and counterdomain the real line. The expectation or expected value of the function g(·) of the random variable X, denoted by E[g(X)], is defined by:
(i) E[g(X)] = Σ_j g(xj) P[X = xj] = Σ_j g(xj) fX(xj)
if X is a discrete random variable with counterdomain the countable set {x1, ..., xj, ...}, and
(ii) E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
if X is a continuous random variable with density function fX(x).

Properties of expected value:
(i) E[c] = c for all constants c.
(ii) E[c g(X)] = c E[g(X)] for a constant c.
(iii) E[c1 g1(X) + c2 g2(X)] = c1 E[g1(X)] + c2 E[g2(X)].
(iv) E[g1(X)] ≤ E[g2(X)] if g1(x) ≤ g2(x) for all x.

From the above properties we can prove two important theorems.
Theorem 1. For any random variable X,
var[X] = E[X²] − (E[X])².
For the second theorem we shall need the following definition of a convex function. A continuous function g(·) with domain and counterdomain the real line is called convex if for any x0 on the real line there exists a line which goes through the point (x0, g(x0)) and lies on or under the graph of the function g(·). Also, if g''(x) ≥ 0 for all x then g(·) is convex.
Theorem 2 (Jensen Inequality). Let X be a random variable with mean E[X], and let g(·) be a convex function. Then E[g(X)] ≥ g(E[X]).
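A quick numerical illustration of Theorem 1 and of the Jensen inequality with the convex function g(x) = x² (an added sketch, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # a positive, skewed random variable

mean = x.mean()
var_direct = ((x - mean) ** 2).mean()
var_theorem1 = (x ** 2).mean() - mean ** 2       # Theorem 1: var = E[X^2] - (E[X])^2
print(np.isclose(var_direct, var_theorem1))      # True

# Jensen with g(x) = x^2: E[g(X)] >= g(E[X])
print((x ** 2).mean() >= mean ** 2)              # True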
We can also use these properties to find the expected return and variance (standard deviation) of a portfolio of assets. We shall need the following definitions:
Let X and Y be any two random variables defined on the same probability space. The covariance of X and Y, denoted by cov[X, Y] or σ_{X,Y}, is defined as:
cov[X, Y] = E[(X − μX)(Y − μY)]
provided that the indicated expectation exists.
The correlation coefficient or simply the correlation, denoted by ρ[X, Y] or ρ_{X,Y}, of random variables X and Y is defined to be:
ρ_{X,Y} = cov[X, Y] / (σX σY)
provided that cov[X, Y], σX, and σY exist, and σX > 0 and σY > 0.
Both the covariance and the correlation of random variables X and Y are measures of a linear relationship of X and Y in the following sense: cov[X, Y] will be positive when (X − μX) and (Y − μY) tend to have the same sign with high probability, and cov[X, Y] will be negative when (X − μX) and (Y − μY) tend to have opposite signs with high probability. The actual magnitude of cov[X, Y] does not convey much about how strong the linear relationship between X and Y is. This is because the variability of X and Y is also important. The correlation coefficient does not have this problem, as we divide the covariance by the product of the standard deviations. Furthermore, the correlation is unitless and −1 ≤ ρ ≤ 1. We can prove that
cov[X, Y] = E[(X − μX)(Y − μY)] = E[XY] − μX μY.
The properties are very useful for evaluating the expected return and standard deviation of a portfolio. Assume ra and rb are the returns on assets A and B, and their variances are σa² and σb², respectively. Assume that we form a portfolio of the two assets with weights wa and wb, respectively. If the correlation of the returns of these assets is ρ, find the expected return and standard deviation of the portfolio.
If Rp is the return of the portfolio then
Rp = wa ra + wb rb.
The expected portfolio return is
E[Rp] = wa E[ra] + wb E[rb].
The variance of the portfolio is
var[Rp] = var[wa ra + wb rb] = E[(wa ra + wb rb)²] − (E[wa ra + wb rb])²
= wa² E[ra²] + wb² E[rb²] + 2 wa wb E[ra rb] − wa² (E[ra])² − wb² (E[rb])² − 2 wa wb E[ra] E[rb]
= wa² {E[ra²] − (E[ra])²} + wb² {E[rb²] − (E[rb])²} + 2 wa wb {E[ra rb] − E[ra] E[rb]}
= wa² var[ra] + wb² var[rb] + 2 wa wb cov[ra, rb]
= wa² σa² + wb² σb² + 2 wa wb ρ σa σb.
In a vector format we have:
E[Rp] = (wa  wb) (E[ra]  E[rb])′
and
var[Rp] = (wa  wb) V (wa  wb)′,
where V is the 2×2 covariance matrix with diagonal elements σa², σb² and off-diagonal elements ρ σa σb.
From the above example we can see that
var[aX + bY] = a² var[X] + b² var[Y] + 2ab cov[X, Y]
for random variables X and Y and constants a and b. In fact we can generalize the formula above to several random variables X1, X2, ..., Xn and constants a1, a2, ..., an, i.e.
var[a1 X1 + a2 X2 + ... + an Xn] = Σ_{i=1}^n ai² var[Xi] + 2 Σ_{i<j} ai aj cov[Xi, Xj].
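The portfolio algebra above translates directly into matrix operations. The following numpy sketch (an added illustration with made-up weights, returns and volatilities) computes the expected return and the variance both from the scalar formula and from the quadratic form w′Vw; it is not part of the original notes.

import numpy as np

w = np.array([0.6, 0.4])            # portfolio weights wa, wb (hypothetical)
mu = np.array([0.08, 0.05])         # expected returns E[ra], E[rb] (hypothetical)
sigma = np.array([0.20, 0.10])      # standard deviations sigma_a, sigma_b
rho = 0.3                           # correlation of the two returns

expected_return = w @ mu

# scalar formula: wa^2 sa^2 + wb^2 sb^2 + 2 wa wb rho sa sb
var_scalar = (w[0] * sigma[0]) ** 2 + (w[1] * sigma[1]) ** 2 \
             + 2 * w[0] * w[1] * rho * sigma[0] * sigma[1]

# quadratic form with the covariance matrix
cov = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1] ** 2]])
var_matrix = w @ cov @ w

print(expected_return, var_scalar, var_matrix)  # the two variances agree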
1.3.4 Moments of a Random Variable

If X is a random variable, the r-th raw moment of X, denoted by μ'_r, is defined as:
μ'_r = E[X^r]
if this expectation exists. Notice that μ'_1 = E[X] = μX, the mean of X.
If X is a random variable, the r-th central moment of X about a is defined as E[(X − a)^r]. If a = μX, we have the r-th central moment of X about μX, denoted by μ_r, which is:
μ_r = E[(X − μX)^r].
We define measures in terms of quantiles to describe some of the characteristics of random variables or density functions.
The q-th quantile of a random variable X or of its corresponding distribution, denoted by ξ_q, is defined as the smallest number ξ satisfying FX(ξ) ≥ q. If X is a continuous random variable, then the q-th quantile of X is given as the smallest number ξ satisfying FX(ξ) = q.
The median of a random variable X, denoted by med(X) or ξ_{0.5}, is the 0.5-th quantile.
Notice that if X is a continuous random variable the median of X satisfies:
∫_{−∞}^{med(X)} fX(x) dx = 1/2 = ∫_{med(X)}^{∞} fX(x) dx,
so the median of X is any number that has half the mass of X to its right and the other half to its left. The median and the mean are measures of central location.
The third moment about the mean, μ3, is called a measure of asymmetry, or skewness. Symmetrical distributions can be shown to have μ3 = 0. Distributions can be skewed to the left or to the right. However, knowledge of the third moment gives no clue as to the shape of the distribution, i.e. it could be the case that μ3 = 0 but the distribution is far from symmetrical. The ratio μ3/σ³ is unitless and is called the coefficient of skewness. An alternative measure of skewness is provided by the ratio:
(mean − median)/(standard deviation).
The fourth moment about the mean is used as a measure of kurtosis, which is a degree of flatness of a density near its center. The coefficient of kurtosis is defined as μ4/σ⁴ − 3, and positive values are sometimes used to indicate that a density function is more peaked around its center than the normal (leptokurtic distributions), while a negative value of the coefficient of kurtosis indicates a distribution which is flatter around its center than the standard normal (platykurtic distributions). This measure suffers from the same failing as the measure of skewness, i.e. it does not always measure what it is supposed to.
While a particular moment or a few of the moments may give little information about a distribution, the entire set of moments will determine the distribution exactly. In applied statistics the first two moments are of great importance, but the third and fourth are also useful.
2 Parametric Families of Univariate Distributions

A parametric family of density functions is a collection of density functions that are indexed by a quantity called a parameter, e.g. let f(x; λ) = λ e^{−λx} for x > 0 and some λ > 0. Here λ is the parameter, and as λ ranges over the positive numbers, the collection {f(·; λ) : λ > 0} is a parametric family of density functions.
2.1 Discrete Univariate Distributions

Let us start with discrete univariate distributions.

2.1.1 Bernoulli Distribution

A random variable whose outcomes have been classified into two categories, called "success" and "failure", represented by the letters s and f, respectively, is called a Bernoulli trial. If a random variable X is defined as 1 if a Bernoulli trial results in success and 0 if the same Bernoulli trial results in failure, then X has a Bernoulli distribution with parameter p = P[success]. The definition of this distribution is:
A random variable X has a Bernoulli distribution if the discrete density of X is given by:
fX(x) = fX(x; p) = p^x (1 − p)^{1−x} for x = 0, 1
                 = 0 otherwise
where p = P[X = 1]. For the above defined random variable X we have that:
E[X] = p and var[X] = p(1 − p).
2.1.2 Binomial Distribution

Consider a random experiment consisting of n repeated independent Bernoulli trials with p the probability of success at each individual trial. Let the random variable X represent the number of successes in the n repeated trials. Then X follows a Binomial distribution. The definition of this distribution is:
A random variable X has a binomial distribution, X ~ Binomial(n, p), if the discrete density of X is given by:
fX(x) = fX(x; n, p) = C_x^n p^x (1 − p)^{n−x} for x = 0, 1, ..., n
                    = 0 otherwise
where p = P[X = 1], i.e. the probability of success in each independent Bernoulli trial, and n is the total number of trials. For the above defined random variable X we have that:
E[X] = np and var[X] = np(1 − p).

Example: Consider a stock with value S = 50. Each period the stock moves up or down, independently, in discrete steps of 5. The probability of going up is p = 0.7 and of going down 1 − p = 0.3. What are the expected value and the variance of the value of the stock after 3 periods?
Call X the random variable that counts a success each time the stock moves up and a failure each time the stock moves down. Then P[X = success] = P[X = 1] = 0.7, and X ~ Binomial(3, p). Now X can take the values 0, 1, 2, 3, i.e. no successes, 1 success and 2 failures, etc. The value of the stock in each case and the corresponding probabilities are:
S = 35, and fX(0) = C_0^3 p^0 (1 − p)^3 = 1 × 0.3³ = 0.027;
S = 45, and fX(1) = C_1^3 p^1 (1 − p)^2 = 3 × 0.7 × 0.3² = 0.189;
S = 55, and fX(2) = C_2^3 p^2 (1 − p)^1 = 3 × 0.7² × 0.3 = 0.441;
S = 65, and fX(3) = C_3^3 p^3 (1 − p)^0 = 1 × 0.7³ = 0.343.
Hence the expected stock value is:
E[S] = 35 × 0.027 + 45 × 0.189 + 55 × 0.441 + 65 × 0.343 = 56, and
var[S] = (35 − 56)² × 0.027 + (45 − 56)² × 0.189 + (55 − 56)² × 0.441 + (65 − 56)² × 0.343 = 63.
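A short check of this example with scipy's binomial distribution (an added sketch, not in the original notes); the stock value after 3 periods is S = 35 + 10·X with X ~ Binomial(3, 0.7).

import numpy as np
from scipy.stats import binom

n, p = 3, 0.7
x = np.arange(n + 1)                       # number of up-moves
s = 35 + 10 * x                            # resulting stock values 35, 45, 55, 65
probs = binom.pmf(x, n, p)                 # 0.027, 0.189, 0.441, 0.343

mean_s = np.sum(s * probs)                 # 56.0
var_s = np.sum((s - mean_s) ** 2 * probs)  # 63.0
print(probs, mean_s, var_s)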
2.1.3 Hypergeometric Distribution

Let X denote the number of defective balls in a sample of size n when sampling is done without replacement from a box containing M balls out of which K are defective. Then X has a hypergeometric distribution. The definition of this distribution is:
A random variable X has a hypergeometric distribution if the discrete density of X is given by:
fX(x) = fX(x; M, K, n) = C_x^K C_{n−x}^{M−K} / C_n^M for x = 0, 1, ..., n
                       = 0 otherwise
where M is a positive integer, K is a nonnegative integer that is at most M, and n is a positive integer that is at most M. For this distribution we have that:
E[X] = n K/M and var[X] = n (K/M)(1 − K/M)(M − n)/(M − 1).
Notice the difference between the binomial and the hypergeometric: for the binomial distribution we have Bernoulli trials, i.e. independent trials with fixed probability of success or failure, whereas in the hypergeometric in each trial the probability of success or failure changes depending on the result.
2.1.4 Poisson Distribution

A random variable X has a Poisson distribution, X ~ Poisson(λ), if the discrete density of X is given by:
fX(x) = fX(x; λ) = λ^x e^{−λ} / x! for x = 0, 1, ...
                 = 0 otherwise
where λ is a parameter satisfying λ > 0. For the Poisson distribution we have that:
E[X] = λ and var[X] = λ.
The Poisson distribution provides a realistic model for many random phenomena. Since the values of a Poisson random variable are nonnegative integers, any random phenomenon for which a count of some sort is of interest is a candidate for modeling by assuming a Poisson distribution. Such a count might be the number of fatal traffic accidents per week in a given place, the number of telephone calls per hour arriving at the switchboard of a company, the number of pieces of information arriving per hour, etc.

Example: It is known that the average number of daily changes in excess of 1%, for a specific stock index, occurring in each six-month period is 5. What is the probability of having one such change within the next 6 months? What is the probability of at least 3 changes within the same period?
We model the number of in-excess-of-1% changes, X, within the next 6 months as a Poisson random variable. We know that E[X] = λ = 5. Hence fX(x) = λ^x e^{−λ}/x! = 5^x e^{−5}/x! for x = 0, 1, 2, .... Then P[X = 1] = fX(1) = 5¹ e^{−5}/1! = 0.0337. Also
P[X ≥ 3] = 1 − P[X < 3] = 1 − P[X = 0] − P[X = 1] − P[X = 2]
         = 1 − 5⁰ e^{−5}/0! − 5¹ e^{−5}/1! − 5² e^{−5}/2! = 0.875.
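These two probabilities can be checked with scipy's Poisson distribution (an added sketch, not part of the original notes):

from scipy.stats import poisson

lam = 5                                     # mean number of large daily changes per 6 months

p_one = poisson.pmf(1, lam)                 # P[X = 1], about 0.0337
p_at_least_three = 1 - poisson.cdf(2, lam)  # P[X >= 3], about 0.875
print(p_one, p_at_least_three)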
It is worth noticing that the Binomial(n, p) distribution can be approximated by a Poisson(np). The approximation improves as n → ∞ and p → 0 in such a way that np remains constant.
2.2 Geometric Distribution

Consider a sequence of independent Bernoulli trials with p equal to the probability of success on an individual trial. Let the random variable X represent the number of trials required before the first success. Then X has a geometric distribution. The definition of this distribution is:
A random variable X has a geometric distribution, X ~ geometric(p), if the discrete density of X is given by:
fX(x) = fX(x; p) = p (1 − p)^x for x = 0, 1, 2, ...
                 = 0 otherwise
where p is the probability of success in each Bernoulli trial. For this distribution we have that:
E[X] = (1 − p)/p and var[X] = (1 − p)/p².
2.3 Continuous Univariate Distributions

2.3.1 Uniform Distribution

A very simple distribution for a continuous random variable is the uniform distribution. Its density function is:
fX(x) = fX(x; a, b) = 1/(b − a) for a ≤ x ≤ b
where −∞ < a < b < ∞. Then the random variable X is defined to be uniformly distributed over the interval [a, b]. Now if X is uniformly distributed over [a, b] then
E[X] = (a + b)/2 and var[X] = (b − a)²/12.
Notice that if a random variable is uniformly distributed over one of the following intervals [a, b), (a, b], (a, b), the density function, expected value and variance do not change.
2.3.2 Exponential Distribution

If a random variable X has a density function given by:
fX(x) = fX(x; λ) = λ e^{−λx} for 0 ≤ x < ∞
where λ > 0, then X is defined to have a (negative) exponential distribution. For this random variable X we have
E[X] = 1/λ and var[X] = 1/λ².
Pareto-Levy or Stable Distributions
The stable distributions are a natural generalization of the normal in that,
as their name suggests, they are stable under addition, i.e. a sum of stable
32
random variables is also a random variable of the same type. However,
nonnormal stable distributions have more probability mass in the tail areas
than the normal. In fact, the nonnormal stable distributions are so fat-tailed
that their variance and all higher moments are in…nite.
Closed form expressions for the density functions of stable random variables are available for only the cases of normal and Cauchy.
If a random variable X has a density function given by:
fX (x) = fX (x; ; ) =
where
1<
1
2
< 1 and 0 <
)2
+ (x
1<x<1
f or
< 1; then X is de…ned to have a Cauchy
distribution. Notice that for this random variable even the mean is in…nite.
2.3.4 Normal Distribution

A random variable X is defined to be normally distributed with parameters μ and σ², denoted by X ~ N(μ, σ²), if its density function is given by:
fX(x) = fX(x; μ, σ²) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)} for −∞ < x < ∞
where μ and σ² are parameters such that −∞ < μ < ∞ and σ² > 0. Any distribution defined as above is called a normal distribution. Now if X ~ N(μ, σ²) then
E[X] = μ and var[X] = σ².
If the normal random variable has mean 0 and variance 1, then this random variable is called a standard normal random variable. In such a case the standard normal density function is denoted by φ(x), i.e. we have
φ(x) = (1/√(2π)) e^{−x²/2}.
The cumulative distribution of X when it is normally distributed is given by
FX(x) = FX(x; μ, σ²) = ∫_{−∞}^{x} (1/√(2πσ²)) e^{−(u−μ)²/(2σ²)} du.
Notice that the cumulative distribution of the standard normal, denoted by Φ(x), is given by
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du.
Notice that if X ~ N(μ, σ²) then the random variable Y = (X − μ)/σ is distributed as standard normal, i.e. Y ~ N(0, 1). This is called the standardization of the random variable X. This is very useful as there are statistical tables where areas under the standard normal distribution are presented.
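A small scipy illustration of standardization (an added sketch with hypothetical parameters, not part of the original notes): probabilities for X ~ N(μ, σ²) can be read off the standard normal CDF Φ.

from scipy.stats import norm

mu, sigma = 10.0, 2.0                 # hypothetical parameters
x = 13.0

# P[X <= x] computed directly and via the standardized variable Y = (X - mu)/sigma
p_direct = norm.cdf(x, loc=mu, scale=sigma)
p_standardized = norm.cdf((x - mu) / sigma)   # Phi((x - mu)/sigma)
print(p_direct, p_standardized)               # both about 0.9332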
2.3.5 Lognormal Distribution

Let X be a positive random variable, and let a new random variable Y be defined as Y = log X. If Y has a normal distribution, then X is said to have a lognormal distribution. The density function of a lognormal distribution is given by
fX(x; μ, σ²) = (1/(x√(2πσ²))) e^{−(log x − μ)²/(2σ²)} for 0 < x < ∞
where μ and σ² are parameters such that −∞ < μ < ∞ and σ² > 0. We have
E[X] = e^{μ + σ²/2} and var[X] = e^{2μ + 2σ²} − e^{2μ + σ²}.
Notice that if X is lognormally distributed then
E[log X] = μ and var[log X] = σ².
Notice that we can approximate the Poisson and Binomial distributions by the normal, in the sense that if a random variable X is distributed as Poisson with parameter λ, then (X − λ)/√λ is distributed approximately as standard normal. On the other hand, if Y ~ Binomial(n, p) then (Y − np)/√(np(1 − p)) ~ N(0, 1) approximately.
The standard normal is an important distribution for another reason as well. Assume that we have a sample of n independent random variables, x1, x2, ..., xn, which are coming from the same distribution with mean m and variance s². Then we have the following:
(1/√n) Σ_{i=1}^n (xi − m)/s ~ N(0, 1) approximately.
This is the well known Central Limit Theorem for independent observations.
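A quick simulation of this statement (an added sketch, not in the original notes): standardized averages of n i.i.d. exponential draws behave approximately like standard normal draws.

import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 100_000
m, s = 2.0, 2.0                       # mean and std dev of an Exponential(scale=2) draw

samples = rng.exponential(scale=2.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - m) / s   # (1/sqrt(n)) * sum((x_i - m)/s)

print(z.mean(), z.std())              # close to 0 and 1, as the CLT predicts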
3 Statistical Inference

3.1 Sampling Theory
To proceed we shall need the following definitions.
Let X1, X2, ..., Xk be k random variables all defined on the same probability space (Ω, A, P[·]). The joint cumulative distribution function of X1, X2, ..., Xk, denoted by F_{X1,X2,...,Xk}(·, ·, ..., ·), is defined as
F_{X1,X2,...,Xk}(x1, x2, ..., xk) = P[X1 ≤ x1, X2 ≤ x2, ..., Xk ≤ xk]
for all (x1, x2, ..., xk).
Let X1, X2, ..., Xk be k discrete random variables; then the joint discrete density function of these, denoted by f_{X1,X2,...,Xk}(·, ·, ..., ·), is defined to be
f_{X1,X2,...,Xk}(x1, x2, ..., xk) = P[X1 = x1, X2 = x2, ..., Xk = xk]
for (x1, x2, ..., xk) a value of (X1, X2, ..., Xk), and 0 otherwise.
Let X1, X2, ..., Xk be k continuous random variables; then the joint continuous density function of these, denoted by f_{X1,X2,...,Xk}(·, ·, ..., ·), is defined to be a function such that
F_{X1,X2,...,Xk}(x1, x2, ..., xk) = ∫_{−∞}^{xk} ··· ∫_{−∞}^{x1} f_{X1,X2,...,Xk}(u1, u2, ..., uk) du1 ··· duk
for all (x1, x2, ..., xk).
The totality of elements which are under discussion and about which information is desired will be called the target population. The statistical problem is to find out something about a certain target population. It is generally impossible or impractical to examine the entire population, but one may examine a part of it (a sample from it) and, on the basis of this limited investigation, make inferences regarding the entire target population.
The problem immediately arises as to how the sample of the population should be selected. Of practical importance is the case of a simple random sample, usually called a random sample, which can be defined as follows:
Let the random variables X1, X2, ..., Xn have a joint density f_{X1,X2,...,Xn}(x1, x2, ..., xn) that factors as follows:
f_{X1,X2,...,Xn}(x1, x2, ..., xn) = f(x1) f(x2) ··· f(xn)
where f(·) is the common density of each Xi. Then X1, X2, ..., Xn is defined to be a random sample of size n from a population with density f(·).
A statistic is a function of observable random variables, which is itself an observable random variable, and which does not contain any unknown parameters.
Let X1, X2, ..., Xn be a random sample from the density f(·). Then the r-th sample moment, denoted by M'_r, is defined as:
M'_r = (1/n) Σ_{i=1}^n Xi^r.
In particular, if r = 1, we get the sample mean, which is usually denoted by X̄ or X̄_n; that is:
X̄_n = (1/n) Σ_{i=1}^n Xi.
Also the r-th sample central moment (about X̄_n), denoted by M_r, is defined as:
M_r = (1/n) Σ_{i=1}^n (Xi − X̄_n)^r.
We can prove the following theorem:
Theorem. Let X1, X2, ..., Xn be a random sample from the density f(·). The expected value of the r-th sample moment is equal to the r-th population moment, i.e. the r-th sample moment is an unbiased estimator of the r-th population moment. Proof omitted.
Theorem. Let X1, X2, ..., Xn be a random sample from a density f(·), and let X̄_n = (1/n) Σ_{i=1}^n Xi be the sample mean. Then
E[X̄_n] = μ and var[X̄_n] = (1/n) σ²
where μ and σ² are the mean and variance of f(·), respectively. Notice that this is true for any distribution f(·), provided that σ² is not infinite.
Proof:
E[X̄_n] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) n μ = μ.
Also
var[X̄_n] = var[(1/n) Σ_{i=1}^n Xi] = (1/n²) Σ_{i=1}^n var[Xi] = (1/n²) Σ_{i=1}^n σ² = (1/n²) n σ² = (1/n) σ².
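A simulation check of this theorem (an added sketch, not part of the original notes): the variance of the sample mean of n i.i.d. draws shrinks like σ²/n.

import numpy as np

rng = np.random.default_rng(2)
n, reps = 25, 200_000
sigma2 = 4.0                                  # population variance of a N(0, 4) draw

means = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
print(means.var(), sigma2 / n)                # both close to 0.16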
Let X1, X2, ..., Xn be a random sample from a density f(·). Then
S_n² = S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄_n)²   for n > 1
is defined to be the sample variance.
Theorem. Let X1, X2, ..., Xn be a random sample from a density f(·), and let S_n² be as defined above. Then
E[S_n²] = σ² and var[S_n²] = (1/n) (μ4 − ((n − 3)/(n − 1)) σ⁴)
where σ² and μ4 are the variance and the 4th central moment of f(·), respectively. Notice that this is true for any distribution f(·), provided that μ4 is not infinite.
Proof:
We shall first prove the following identity, which will be used later:
Σ_{i=1}^n (Xi − μ)² = Σ_{i=1}^n (Xi − X̄_n)² + n(X̄_n − μ)².
Indeed,
Σ (Xi − μ)² = Σ (Xi − X̄_n + X̄_n − μ)² = Σ [(Xi − X̄_n)² + 2(Xi − X̄_n)(X̄_n − μ) + (X̄_n − μ)²]
= Σ (Xi − X̄_n)² + 2(X̄_n − μ) Σ (Xi − X̄_n) + n(X̄_n − μ)² = Σ (Xi − X̄_n)² + n(X̄_n − μ)²,
since Σ (Xi − X̄_n) = 0.
Using the above identity we obtain:
E[S_n²] = E[(1/(n − 1)) Σ_{i=1}^n (Xi − X̄_n)²] = (1/(n − 1)) E[Σ_{i=1}^n (Xi − μ)² − n(X̄_n − μ)²]
= (1/(n − 1)) [Σ_{i=1}^n E[(Xi − μ)²] − n E[(X̄_n − μ)²]] = (1/(n − 1)) [n σ² − n var[X̄_n]]
= (1/(n − 1)) [n σ² − σ²] = σ².
The derivation of the variance of S_n² is omitted.
3.1.1 Sampling from the Normal Distribution

Theorem. Let X̄_n denote the sample mean of a random sample of size n from a normal distribution with mean μ and variance σ². Then X̄_n has a normal distribution with mean μ and variance σ²/n. Proof omitted.
The gamma function is defined as:
Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx   for t > 0.
Notice that Γ(t + 1) = t Γ(t), and if t is an integer then Γ(t + 1) = t!. Also, if t is an integer, then Γ(t + 1/2) = (1·3·5···(2t − 1)) √π / 2^t. Finally, Γ(1/2) = √π.
If X is a random variable with density
fX(x) = (1/Γ(k/2)) (1/2)^{k/2} x^{k/2 − 1} e^{−x/2}   for 0 < x < ∞
where Γ(·) is the gamma function, then X is defined to have a chi-square distribution with k degrees of freedom.
Notice that if X is distributed as above then:
E[X] = k and var[X] = 2k.
We can prove the following theorem.
Theorem. If the random variables Xi, i = 1, 2, ..., k are normally and independently distributed with means μi and variances σi², then
U = Σ_{i=1}^k ((Xi − μi)/σi)²
has a chi-square distribution with k degrees of freedom. Proof omitted.
Theorem. If the random variables Xi, i = 1, 2, ..., n are normally and independently distributed with mean μ and variance σ², and S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄_n)², then
U = (n − 1) S² / σ² ~ χ²_{n−1}
where χ²_{n−1} is the chi-square distribution with n − 1 degrees of freedom. Proof omitted.
If X is a random variable with density
fX(x) = [Γ((m + n)/2) / (Γ(m/2) Γ(n/2))] (m/n)^{m/2} x^{m/2 − 1} / [1 + (m/n)x]^{(m+n)/2}   for 0 < x < ∞
where Γ(·) is the gamma function, then X is defined to have an F distribution with m and n degrees of freedom.
Notice that if X is distributed as above then:
E[X] = n/(n − 2) and var[X] = 2n²(m + n − 2) / (m(n − 2)²(n − 4)).
Theorem. If the random variables U and V are independently distributed as chi-square with m and n degrees of freedom, respectively, i.e. U ~ χ²_m and V ~ χ²_n independently, then
(U/m) / (V/n) = X ~ F_{m,n}
where F_{m,n} is the F distribution with m, n degrees of freedom. Proof omitted.
If X is a random variable with density
fX(x) = [Γ((k + 1)/2) / Γ(k/2)] (1/√(πk)) (1 / [1 + x²/k]^{(k+1)/2})   for −∞ < x < ∞
where Γ(·) is the gamma function, then X is defined to have a t distribution with k degrees of freedom.
Notice that if X is distributed as above then:
E[X] = 0 and var[X] = k/(k − 2).
Theorem. If the random variables Z and V are independently distributed as standard normal and chi-square with k degrees of freedom, respectively, i.e. Z ~ N(0, 1) and V ~ χ²_k independently, then
Z / √(V/k) = X ~ t_k
where t_k is the t distribution with k degrees of freedom. Proof omitted.
3.2 Point and Interval Estimation

The problem of estimation is defined as follows. Assume that some characteristic of the elements in a population can be represented by a random variable X whose density is fX(·; θ) = f(·; θ), where the form of the density is assumed known except that it contains an unknown parameter θ (if θ were known, the density function would be completely specified, and there would be no need to make inferences about it). Further assume that the values x1, x2, ..., xn of a random sample X1, X2, ..., Xn from f(·; θ) can be observed. On the basis of the observed sample values x1, x2, ..., xn it is desired to estimate the value of the unknown parameter θ or the value of some function, say τ(θ), of the unknown parameter. The estimation can be made in two ways. The first, called point estimation, is to let the value of some statistic, say t(X1, X2, ..., Xn), represent, or estimate, the unknown τ(θ). Such a statistic is called a point estimator. The second, called interval estimation, is to define two statistics, say t1(X1, X2, ..., Xn) and t2(X1, X2, ..., Xn), where t1(X1, X2, ..., Xn) < t2(X1, X2, ..., Xn), so that (t1(X1, X2, ..., Xn), t2(X1, X2, ..., Xn)) constitutes an interval for which the probability can be determined that it contains the unknown τ(θ).
3.2.1 Parametric Point Estimation

The point estimation problem admits two sub-problems. The first is to devise some means of obtaining a statistic to use as an estimator. The second is to select criteria and techniques to define and find a "best" estimator among many possible estimators.

Methods of Finding Estimators

Any statistic (a known function of observable random variables that is itself a random variable) whose values are used to estimate τ(θ), where τ(·) is some function of the parameter θ, is defined to be an estimator of τ(θ).
Notice that for specific values of the realized random sample the estimator takes a specific value, called an estimate.
Let f (:;
1 ; 2 ; :::; k )
be a density of a random variable X which has k
parameters
1 ; 2 ; :::; k .
As before let
In general
=
r
=
r
denote the rth moment i.e. = E[X r ].
will be a known function of the k parameters
Denote this by writing
=
r
=
r ( 1 ; 2 ; :::; k ).
=
1 ; 2 ; :::; k .
Let X1 ; X2 ; :::; Xn be a random
=
th
1 ; 2 ; :::; k ), and, as before, let Mj be the j
n
P
1
Xij . Then equating sample moments to
n
i=1
sample from the density f (:;
=
sample moment, i.e. Mj =
population ones we get k equations with k unknowns, i.e.
=
=
j ( 1 ; 2 ; :::; k )
Mj =
f or
j = 1; 2; :::; k
Let the solution to these equations be b1 ; b2 ; :::; bk . We say that these k
estimators are the estimators of
1 ; 2 ; :::; k
obtained by the method of
moments.
Example: Let X1, X2, ..., Xn be a random sample from a normal distribution with mean μ and variance σ². Let (θ1, θ2) = (μ, σ²). Estimate the parameters μ and σ by the method of moments. Recall that σ² = μ'_2 − (μ'_1)² and μ = μ'_1. The method of moments equations become:
(1/n) Σ_{i=1}^n Xi = X̄ = M'_1 = μ'_1(μ, σ²) = μ
(1/n) Σ_{i=1}^n Xi² = M'_2 = μ'_2(μ, σ²) = σ² + μ².
Solving the two equations for μ and σ we get:
μ̂ = X̄ and σ̂ = √((1/n) Σ_{i=1}^n (Xi − X̄)²),
which are the method-of-moments estimators of μ and σ.
Example: Let X1, X2, ..., Xn be a random sample from a Poisson distribution with parameter λ. There is only one parameter, hence only one equation, which is:
(1/n) Σ_{i=1}^n Xi = X̄ = M'_1 = μ'_1(λ) = λ.
Hence the method-of-moments estimator of λ is λ̂ = X̄.
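The two method-of-moments examples are easy to mimic numerically. The following numpy sketch (added for illustration, not part of the original notes) recovers the normal parameters from simulated data by matching the first two sample moments.

import numpy as np

rng = np.random.default_rng(3)
mu_true, sigma_true = 1.5, 2.0
x = rng.normal(mu_true, sigma_true, size=100_000)

m1 = x.mean()                       # first sample moment
m2 = (x ** 2).mean()                # second sample moment

mu_hat = m1                         # method of moments: mu-hat equals the sample mean
sigma_hat = np.sqrt(m2 - m1 ** 2)   # sigma-hat = sqrt(M'_2 - (M'_1)^2)
print(mu_hat, sigma_hat)            # close to 1.5 and 2.0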
Maximum Likelihood
Consider the following estimation problem. Suppose that a box contains a number of black and a number of white balls, and suppose that it is known that the ratio of the numbers is 3/1 but it is not known whether the black or the white balls are more numerous, i.e. the probability of drawing a black ball is either 1/4 or 3/4. If n balls are drawn with replacement from the box, the distribution of X, the number of black balls, is given by the binomial distribution
f(x; p) = C_x^n p^x (1 − p)^{n−x}   for x = 0, 1, 2, ..., n
where p is the probability of drawing a black ball. Here p = 1/4 or p = 3/4. We shall draw a sample of three balls, i.e. n = 3, with replacement and attempt to estimate the unknown parameter p of the distribution. The estimation is simple in this case as we have to choose only between the two numbers 1/4 = 0.25 and 3/4 = 0.75. The possible outcomes and their probabilities are given below:
outcome: x       0        1        2        3
f(x; 0.75)      1/64     9/64    27/64    27/64
f(x; 0.25)     27/64    27/64     9/64     1/64

In the present example, if we found x = 0 in a sample of 3, the estimate 0.25 for p would be preferred over 0.75 because the probability 27/64 is greater than 1/64.
In general we should estimate p by 0.25 when x = 0 or 1 and by 0.75 when x = 2 or 3. The estimator may be defined as

\[
\hat{p} = \hat{p}(x) =
\begin{cases}
0.25 & \text{for } x = 0, 1 \\
0.75 & \text{for } x = 2, 3
\end{cases}
\]

The estimator thus selects for every possible x the value of p, say p̂, such that

\[
f(x; \hat{p}) > f(x; p')
\]

where p' is the other value of p.
More generally, if several values of p were possible, we might reasonably proceed in the same manner. Thus if we found x = 2 in a sample of 3 from a binomial population, we should substitute all possible values of p in the expression

\[
f(2; p) = \binom{3}{2} p^2 (1-p) \quad \text{for } 0 \le p \le 1
\]

and choose as our estimate that value of p which maximizes f(2; p). The position of the maximum of the function above is found by setting the first derivative with respect to p equal to zero, i.e.

\[
\frac{d}{dp} f(2; p) = 6p - 9p^2 = 3p(2 - 3p) = 0 \;\Rightarrow\; p = 0 \text{ or } p = 2/3.
\]

The second derivative is

\[
\frac{d^2}{dp^2} f(2; p) = 6 - 18p.
\]

Hence \(\frac{d^2}{dp^2} f(2; 0) = 6\), so the value p = 0 represents a minimum, whereas \(\frac{d^2}{dp^2} f(2; \tfrac{2}{3}) = -6\) and consequently p = 2/3 represents the maximum. Hence p̂ = 2/3 is our estimate, which has the property

\[
f(x; \hat{p}) > f(x; p')
\]

where p' is any other value in the interval 0 ≤ p ≤ 1.
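The maximization of f(2; p) can also be checked numerically. The short sketch below (a grid search of my own, not part of the text) assumes NumPy.

```python
import numpy as np

p = np.linspace(0.0, 1.0, 10_001)   # candidate values of p on a fine grid
lik = 3 * p ** 2 * (1 - p)          # f(2; p) = C(3,2) * p^2 * (1 - p)

p_hat = p[np.argmax(lik)]
print(p_hat)                        # approximately 2/3, matching the calculus above
```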
The likelihood function of n random variables X1, X2, ..., Xn is defined to be the joint density of the n random variables, say f_{X1,X2,...,Xn}(x1, x2, ..., xn; θ), which is considered to be a function of θ. In particular, if X1, X2, ..., Xn is a random sample from the density f(x; θ), then the likelihood function is f(x1; θ)f(x2; θ)···f(xn; θ). To think of the likelihood function as a function of θ, we shall use the notation L(θ; x1, x2, ..., xn) for the likelihood function in general.

The likelihood is a value of a density function. Consequently, for discrete random variables it is a probability. Suppose for the moment that θ is known, denoted by θ0. The particular value of the random variables which is "most likely to occur" is that value x'1, x'2, ..., x'n such that f_{X1,X2,...,Xn}(x'1, x'2, ..., x'n; θ0) is a maximum. For example, for simplicity let us assume that n = 1 and X1 has the normal density with mean 0 and variance 1. Then the value of the random variable which is most likely to occur is X1 = 0. By "most likely to occur" we mean the value x'1 of X1 such that φ_{0,1}(x'1) > φ_{0,1}(x1) for any other value x1. Now let us suppose that the joint density of n random variables is f_{X1,X2,...,Xn}(x1, x2, ..., xn; θ), where θ is unknown. Let the particular values which are observed be represented by x'1, x'2, ..., x'n. We want to know from which density this particular set of values is most likely to have come; that is, for which value of θ the likelihood that the set x'1, x'2, ..., x'n was obtained is largest. In other words, we want to find the value of θ in the admissible set, denoted by θ̂, which maximizes the likelihood function L(θ; x'1, x'2, ..., x'n). The value θ̂ which maximizes the likelihood function is, in general, a function of x1, x2, ..., xn, say θ̂ = θ̂(x1, x2, ..., xn). Hence we have the following definition:

Let L(θ) = L(θ; x1, x2, ..., xn) be the likelihood function for the random variables X1, X2, ..., Xn. If θ̂ [where θ̂ = θ̂(x1, x2, ..., xn) is a function of the observations x1, x2, ..., xn] is the value of θ in the admissible range which maximizes L(θ), then θ̂ = θ̂(X1, X2, ..., Xn) is the maximum likelihood estimator of θ, and θ̂ = θ̂(x1, x2, ..., xn) is the maximum likelihood estimate of θ for the sample x1, x2, ..., xn.
The most important cases which we shall consider are those in which X1, X2, ..., Xn is a random sample from some density function f(x; θ), so that the likelihood function is

\[
L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).
\]

Many likelihood functions satisfy regularity conditions, so the maximum likelihood estimator is the solution of the equation

\[
\frac{dL(\theta)}{d\theta} = 0.
\]

Also, L(θ) and log L(θ) have their maxima at the same value of θ, and it is sometimes easier to find the maximum of the logarithm of the likelihood. Notice also that if the likelihood function contains k parameters, then we find the estimator from the solution of the k first-order conditions.
Example: Let a random sample of size n be drawn from the Bernoulli distribution

\[
f(x; p) = p^x (1-p)^{1-x}
\]

where 0 ≤ p ≤ 1. The sample values x1, x2, ..., xn will be a sequence of 0s and 1s, and the likelihood function is

\[
L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{\,n - \sum x_i}.
\]

Letting y = Σ xi we obtain

\[
\log L(p) = y \log p + (n - y)\log(1-p)
\]

and

\[
\frac{d \log L(p)}{dp} = \frac{y}{p} - \frac{n - y}{1 - p}.
\]

Setting this expression equal to zero we get

\[
\hat{p} = \frac{y}{n} = \frac{1}{n}\sum x_i = \bar{x},
\]

which is intuitively what the estimate for this parameter should be.
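The closed-form result p̂ = x̄ can also be verified by maximizing the log-likelihood numerically. A minimal sketch follows, assuming SciPy is available; the 0/1 sample is purely hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])   # hypothetical Bernoulli sample
y, n = x.sum(), len(x)

def neg_log_lik(p):
    # negative of log L(p) = y*log(p) + (n - y)*log(1 - p)
    return -(y * np.log(p) + (n - y) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x.mean())   # the numerical maximizer agrees with p_hat = x_bar = 0.7
```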
Example: Let a random sample of size n be drawn from the normal distribution with density

\[
f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
\]

The likelihood function is

\[
L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}
= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right],
\]

and the logarithm of the likelihood function is

\[
\log L(\mu, \sigma^2) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]

To find the maximum with respect to μ and σ² we compute

\[
\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)
\]

and

\[
\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2,
\]

and putting these derivatives equal to 0 and solving the resulting equations we find the estimates

\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
\qquad \text{and} \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\]

which turn out to be the sample moments corresponding to μ and σ².
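In code, the two maximum likelihood estimates are simply the sample mean and the variance computed with divisor n. A minimal sketch (assuming NumPy; the observations are hypothetical):

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])      # hypothetical observations

mu_hat = x.mean()                                  # ML estimate of mu
sigma2_hat = ((x - mu_hat) ** 2).mean()            # ML estimate of sigma^2 (divisor n)

print(mu_hat, sigma2_hat)                          # same as np.var(x, ddof=0)
```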
Properties of Point Estimators
One needs to define criteria so that various estimators can be compared. One of these is unbiasedness. An estimator T = t(X1, X2, ..., Xn) is defined to be an unbiased estimator of τ(θ) if and only if

\[
E_\theta[T] = E_\theta[t(X_1, X_2, \ldots, X_n)] = \tau(\theta)
\]

for all θ in the admissible space. Other criteria are consistency, mean square error, etc.
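For instance, the ML estimator of σ² above divides by n and is biased, while S², which divides by n − 1, is unbiased. The following simulation sketch (assuming NumPy; the sample size, number of replications and σ² are arbitrary choices of mine) illustrates the difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 5, 100_000, 4.0

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ml_var = samples.var(axis=1, ddof=0)        # divides by n     -> biased downwards
s2 = samples.var(axis=1, ddof=1)            # divides by n - 1 -> unbiased

# E[ML estimator] = (n - 1)/n * sigma^2 = 3.2, while E[S^2] = 4.0
print(ml_var.mean(), s2.mean())
```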
3.2.2 Interval Estimation
In practice, estimates are often given in the form of the estimate plus or minus a certain amount; e.g., the cost per volume of a book could be 83 ± 4.5 per cent, which means that the actual cost will lie somewhere between 78.5% and 87.5% with high probability. Let us consider a particular example. Suppose that a random sample (1.2, 3.4, 0.6, 5.6) of four observations is drawn from a normal population with unknown mean μ and a known variance 9. The maximum likelihood estimate of μ is the sample mean of the observations:

\[
\bar{x} = 2.7
\]

We wish to determine upper and lower limits which are rather certain to contain the true unknown parameter value between them. We know that the sample mean X̄ is distributed as normal with mean μ and variance 9/n, i.e. X̄ ~ N(μ, σ²/n). Hence we have

\[
Z = \frac{\bar{X} - \mu}{3/2} \sim N(0, 1).
\]

Hence Z is standard normal. Consequently we can find the probability that Z will be between two arbitrary values. For example we have that

\[
P[-1.96 < Z < 1.96] = \int_{-1.96}^{1.96} \phi(z)\, dz = 0.95.
\]

Hence we get that μ must be in the interval

\[
\bar{X} + 1.96\cdot\tfrac{3}{2} > \mu > \bar{X} - 1.96\cdot\tfrac{3}{2},
\]

and for the specific value of the sample mean we have that 5.64 > μ > −0.24, i.e. P[5.64 > μ > −0.24] = 0.95.
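The numbers above can be reproduced with a few lines of Python (a sketch assuming SciPy; only the four observations and the known variance 9 come from the text).

```python
from scipy.stats import norm

x = [1.2, 3.4, 0.6, 5.6]          # the sample from the text
n, sigma = len(x), 3.0            # known standard deviation (variance 9)
xbar = sum(x) / n                 # 2.7

z = norm.ppf(0.975)               # approximately 1.96
half_width = z * sigma / n ** 0.5
print(xbar - half_width, xbar + half_width)   # approximately (-0.24, 5.64)
```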
This leads us to the following definition of the confidence interval.
Let X1, X2, ..., Xn be a random sample from the density f(·; θ). Let T1 = t1(X1, X2, ..., Xn) and T2 = t2(X1, X2, ..., Xn) be two statistics satisfying T1 ≤ T2 for which P[T1 < τ(θ) < T2] = γ, where γ does not depend on θ. Then the random interval (T1, T2) is called a 100γ percent confidence interval for τ(θ); γ is called the confidence coefficient. T1 and T2 are called the lower and upper confidence limits, respectively. A value (t1, t2) of the random interval (T1, T2) is also called a 100γ percent confidence interval for τ(θ).

Let X1, X2, ..., Xn be a random sample from the density f(·; θ). Let T1 = t1(X1, X2, ..., Xn) be a statistic for which P[T1 < τ(θ)] = γ. Then T1 is called a one-sided lower confidence interval for τ(θ). Similarly, let T2 = t2(X1, X2, ..., Xn) be a statistic for which P[τ(θ) < T2] = γ. Then T2 is called a one-sided upper confidence interval for τ(θ).
Example: Let X1, X2, ..., Xn be a random sample from the density f(x; θ) = φ_{θ,9}(x). Set T1 = t1(X1, X2, ..., Xn) = X̄ − 6/√n and T2 = t2(X1, X2, ..., Xn) = X̄ + 6/√n. Then (T1, T2) constitutes a random interval and is a confidence interval for τ(θ) = θ, with confidence coefficient

\[
\gamma = P\!\left[\bar{X} - 6/\sqrt{n} < \theta < \bar{X} + 6/\sqrt{n}\right]
= P\!\left[-2 < \frac{\bar{X} - \theta}{3/\sqrt{n}} < 2\right]
= \Phi(2) - \Phi(-2) = 0.9772 - 0.0228 = 0.9544.
\]

Hence if a random sample of 25 observations has a sample mean of, say, 17.5, then the interval (17.5 − 6/√25, 17.5 + 6/√25) is also called a confidence interval of θ.
Sampling from the Normal Distribution
Let X1, X2, ..., Xn be a random sample from the normal distribution with mean μ and variance σ². If σ² is unknown, then θ = (μ, σ²) are the unknown parameters and τ(θ) = μ is the parameter we want to estimate by interval estimation. We know that

\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).
\]

However, the problem with this statistic is that it involves both unknown parameters, so we cannot construct an interval from it. Hence we look for a statistic that involves only the parameter we want to estimate, i.e. μ. Notice that

\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},
\qquad \text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.
\]

This statistic involves only the parameter we want to estimate. Hence we have

\[
q_1 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < q_2
\;\Longleftrightarrow\;
\bar{X} - q_2\,(S/\sqrt{n}) < \mu < \bar{X} - q_1\,(S/\sqrt{n}),
\]

where q1, q2 are such that

\[
P\!\left[q_1 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < q_2\right] = \gamma.
\]

Hence the interval \((\bar{X} - q_2\,(S/\sqrt{n}),\; \bar{X} - q_1\,(S/\sqrt{n}))\) is the 100γ percent confidence interval for μ. It can be proved that if q1, q2 are symmetrical around 0, then the length of the interval is minimized.
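A minimal sketch of this equal-tailed t interval (assuming SciPy; the four observations reuse the earlier sample purely for illustration, with the variance now treated as unknown):

```python
import numpy as np
from scipy.stats import t

x = np.array([1.2, 3.4, 0.6, 5.6])     # illustrative sample, variance treated as unknown
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)      # S uses the n - 1 divisor

gamma = 0.95
q = t.ppf(0.5 + gamma / 2, df=n - 1)   # symmetric quantiles: q2 = -q1 = q
half_width = q * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)
```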
Alternatively, if we want to find a confidence interval for σ² when μ is unknown, then we use the statistic

\[
\frac{\sum (X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.
\]

Hence we have

\[
q_1 < \frac{(n-1)S^2}{\sigma^2} < q_2
\;\Longleftrightarrow\;
\frac{(n-1)S^2}{q_2} < \sigma^2 < \frac{(n-1)S^2}{q_1},
\]

where q1, q2 are such that

\[
P\!\left[q_1 < \frac{(n-1)S^2}{\sigma^2} < q_2\right] = \gamma.
\]

So the interval \(\left(\frac{(n-1)S^2}{q_2}, \frac{(n-1)S^2}{q_1}\right)\) is a 100γ percent confidence interval for σ². The q1, q2 are often selected so that

\[
P\!\left[q_2 < \frac{(n-1)S^2}{\sigma^2}\right] = P\!\left[\frac{(n-1)S^2}{\sigma^2} < q_1\right] = \frac{1-\gamma}{2}.
\]

Such a confidence interval is referred to as an equal-tailed confidence interval for σ².
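The corresponding equal-tailed interval for σ² can be sketched as follows (assuming SciPy; the sample is again illustrative).

```python
import numpy as np
from scipy.stats import chi2

x = np.array([1.2, 3.4, 0.6, 5.6])          # illustrative sample
n = len(x)
s2 = x.var(ddof=1)                          # S^2

gamma = 0.95
q1 = chi2.ppf((1 - gamma) / 2, df=n - 1)    # lower chi-square quantile
q2 = chi2.ppf((1 + gamma) / 2, df=n - 1)    # upper chi-square quantile

print((n - 1) * s2 / q2, (n - 1) * s2 / q1)  # ((n-1)S^2/q2, (n-1)S^2/q1)
```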
3.3 Hypothesis testing
A statistical hypothesis is an assertion or conjecture, denoted by H, about the distribution of one or more random variables. If the statistical hypothesis completely specifies the distribution it is called simple; otherwise it is called composite.
Example: Let X1, X2, ..., Xn be a random sample from f(x; θ) = φ_{θ,25}(x). The statistical hypothesis that the mean of the normal population is less than or equal to 17 is denoted by H: θ ≤ 17. Such a hypothesis is composite, as it does not completely specify the distribution. On the other hand, the hypothesis H: θ = 17 is simple, since it completely specifies the distribution.
A test of a statistical hypothesis H is a rule or procedure for deciding whether to reject H.
Example: Let X1, X2, ..., Xn be a random sample from f(x; θ) = φ_{θ,25}(x). Consider H: θ ≤ 17. One possible test Υ is as follows: Reject H if and only if X̄ > 17 + 5/√n.
In many hypothesis-testing problems two hypotheses are discussed. The first, the hypothesis being tested, is called the null hypothesis, denoted by H0, and the second is called the alternative hypothesis, denoted by H1. We say that H0 is tested against, or versus, H1. The thinking is that if the null hypothesis is wrong the alternative hypothesis is true, and vice versa. We can make two types of errors: rejection of H0 when H0 is true is called a Type I error, and acceptance of H0 when H0 is false is called a Type II error. The size of a Type I error is defined to be the probability that a Type I error is made, and similarly the size of a Type II error is defined to be the probability that a Type II error is made.
The significance level or size of a test, denoted by α, is the supremum of the probability of rejecting H0 when H0 is correct, i.e. it is the supremum of the size of the Type I error. In general, to perform a test we fix the size to a prespecified value, typically 10%, 5% or 1%.
Example: Let X1, X2, ..., Xn be a random sample from f(x; θ) = φ_{θ,25}(x). Consider H0: θ ≤ 17 and the test Υ: Reject H0 if and only if X̄ > 17 + 5/√n. Then the size α of the test is

\[
\alpha = \sup_{\theta \le 17} P\!\left[\bar{X} > 17 + 5/\sqrt{n}\right]
= \sup_{\theta \le 17} P\!\left[\frac{\bar{X} - \theta}{5/\sqrt{n}} > \frac{17 + 5/\sqrt{n} - \theta}{5/\sqrt{n}}\right]
\]
\[
= \sup_{\theta \le 17} \left\{1 - P\!\left[Z \le \frac{17 + 5/\sqrt{n} - \theta}{5/\sqrt{n}}\right]\right\}
= \sup_{\theta \le 17} \left\{1 - \Phi\!\left(\frac{17 + 5/\sqrt{n} - \theta}{5/\sqrt{n}}\right)\right\}
= 1 - \Phi(1) = 0.159.
\]
3.3.1 Testing Procedure
Let us establish a test procedure via an example. Assume that the xi's are iid Normal, n = 64, X̄ = 9.8 and σ² = 0.04. We would like to test the hypothesis that μ = 10.

1. Formulate the null hypothesis: H0: μ = 10.

2. Formulate the alternative: H1: μ ≠ 10.

3. Select the level of significance: α = 0.01. From tables find the critical value for Z, denoted by c_Z = 2.58.

4. Establish the rejection limits: Reject H0 if Z < −2.58 or Z > 2.58.

5. Calculate Z:

\[
Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{9.8 - 10}{0.2/\sqrt{64}} = -8.
\]

6. Make the decision: Since Z is less than −2.58, reject H0.
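The six steps above are easy to mirror in code. A sketch (assuming SciPy; the summary figures are those given in the example):

```python
from scipy.stats import norm

n, xbar, sigma2 = 64, 9.8, 0.04    # summary statistics from the example
mu0, alpha = 10.0, 0.01

z = (xbar - mu0) / (sigma2 ** 0.5 / n ** 0.5)   # (9.8 - 10) / (0.2 / 8) = -8
c_z = norm.ppf(1 - alpha / 2)                   # approximately 2.58

print(z, c_z, abs(z) > c_z)                     # True -> reject H0
```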
To find the appropriate test for the mean we have to consider the following cases:

1. Normal population and known population variance (or standard deviation). In this case the statistic we use is

\[
Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).
\]

2. Large samples, so that the central limit theorem applies. In this case the statistic we use is

\[
Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0, 1).
\]

3. Small samples from a normal population where the population variance (or standard deviation) is unknown. In this case the statistic we use is

\[
t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}.
\]

3.3.2 Testing Proportions
The null hypothesis will be of the form

\[
H_0: \pi = \pi_0
\]

and the three possible alternatives are: (1) H1: π ≠ π0 (two-sided test), (2) H1: π < π0 (one-sided), (3) H1: π > π0 (one-sided). The appropriate statistic is based on the central limit theorem and is

\[
Z = \frac{p - \pi_0}{S/\sqrt{n}} \sim N(0, 1), \qquad \text{where } S^2 = \pi_0(1 - \pi_0).
\]
Example: Mr. X believes that he will get more than 60% of the votes. However, in a sample of 400 voters, 252 indicate that they will vote for X. At a significance level of 5%, test Mr. X's belief.

We have p = 252/400 = 0.63 and S² = 0.6(1 − 0.6) = 0.24. The hypotheses are H0: π = π0 = 0.6 and H1: π > π0. The critical value is 1.64. Now

\[
Z = \frac{p - \pi_0}{S/\sqrt{n}} = \frac{0.63 - 0.6}{0.489/\sqrt{400}} = 1.22.
\]

Consequently, the null is not rejected, as Z < 1.64. Thus there is not enough evidence to support Mr. X's belief.
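The same calculation in Python (a sketch assuming SciPy; the counts are those of the example):

```python
from scipy.stats import norm

n, successes = 400, 252
pi0, alpha = 0.60, 0.05

p = successes / n                      # 0.63
s = (pi0 * (1 - pi0)) ** 0.5           # sqrt(0.24), approximately 0.49
z = (p - pi0) / (s / n ** 0.5)         # approximately 1.22

c = norm.ppf(1 - alpha)                # one-sided critical value, approximately 1.64
print(z, c, z > c)                     # False -> do not reject H0
```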
In fact, we have the following possible outcomes when testing hypotheses:

                     H0 is accepted               H1 is accepted
H0 is correct        Correct decision (1 − α)     Type I error (α)
H1 is correct        Type II error (β)            Correct decision (1 − β)
An operating characteristic curve presents the probability of accepting a null hypothesis for various values of the population parameter at a given significance level α, using a particular sample size. The power of the test is the complement of the operating characteristic curve, i.e. it is the probability of rejecting the null hypothesis for various possible values of the population parameter.
Exercises:
1) The growth of an economy can be high with probability 0.15, normal
with probability 0.5 and negative (recession) with probability 0.35. In the
high-growth state the return of a stock will be 0.4, in the normal state 0.1, and in the recession state it will be -0.1. Evaluate the expected return of this stock.
2) To check whether best credit practices are complied with, the audit department of a bank chooses at random 2 out of its 5 branches. It is known that 3 of the 5 branches do not follow the best credit practices ("bad" branches), although which 3 is not known.
a. What is the probability that 2 "bad" branches are chosen?
b. What is the probability that 2 "good" branches are chosen?
c. What is the probability that 1 "good" and 1 "bad" are chosen?
3) In a market research study, 55% of the participating consumers are women. 60% of those women prefer a specific product, whereas only 38% of the men prefer the same product. We choose at random one participant of the study. Find:

a) The probability that this person prefers the specific product.

b) The probability that the chosen person is a woman, given that the person prefers the specific product.
4) If X ~ N(10, 9) find: a) P(X ≤ 15.88), b) P(X ≥ 4.12), c) P(X ≤ 4.12), d) P(4.12 ≤ X ≤ 15.88), e) P(X ≥ 15.88). For the same random variable X, find x0 such that P(X ≥ x0) = 5%, P(X ≤ x0) = 5%, and P(−x0 ≤ X ≤ x0) = 95%.
5) The following numbers are the number of computers sold per month over the last 19 months by a specific computer company:
25, 26, 32, 21, 29, 31, 27, 23, 34, 29, 32, 34, 35, 31, 36, 37,
41, 44, 46.
Compute the sample mean, the second, third and fourth moments, the sample variance, the ML variance, the coefficient of skewness and the kurtosis.
6) For a large class of students a random sample of 4 grades was drawn: 64, 66, 89, and 77. Calculate a 95% confidence interval for the whole-class mean μ. How does your result change if you knew that the variance σ² were 100? (Notice that the value of a t3 distribution that leaves 2.5% at the right tail is 3.18.)