Random experiment

A random experiment is a process leading to an outcome that is uncertain before the experiment is run. We usually assume that the experiment can be repeated indefinitely under essentially the same conditions. A basic outcome is a possible outcome of a random experiment.

The structure of a random experiment is characterized by three objects:

• the sample space S;
• the events set;
• the probability measure.

Examples of random experiment (a)

"BASIC" EXPERIMENT

1. mix the tickets in the box;
2. randomly select ONE ticket;
3. read the value on the ticket.

[Figure: a box containing five tickets labeled 1, 0, 0, 1, 0.]

OUTCOME: number on the ticket.
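For readers who want to experiment, here is a minimal Python sketch (not part of the original slides) that simulates one run of the basic experiment; the box contents [0, 0, 0, 1, 1] match the five tickets shown in the figure.

    import random

    # Box with five tickets: three "0"s and two "1"s, as in the figure.
    box = [0, 0, 0, 1, 1]

    random.shuffle(box)          # 1. mix the tickets in the box
    ticket = random.choice(box)  # 2. randomly select ONE ticket
    print("OUTCOME:", ticket)    # 3. read the value on the ticket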
Examples of random experiment (b)

EXPERIMENT 1

1. Run the basic experiment;
2. do NOT reinsert the ticket in the box;
3. run the basic experiment again.

[Figure: the box of tickets, with one ticket removed after the first draw.]

OUTCOME: ordered pair of numbers.

Examples of random experiment (c)

EXPERIMENT 2

1. Run the basic experiment;
2. reinsert the ticket in the box;
3. run the basic experiment again.

[Figure: the box of tickets, restored to five tickets before the second draw.]

OUTCOME: ordered pair of numbers.
The sample space

The sample space S is the collection of all possible outcomes of a random experiment.

EXP. B: S = {0, 1}.
EXP. 1: S = {(0, 0), (0, 1), (1, 0), (1, 1)}.
EXP. 2: S = {(0, 0), (0, 1), (1, 0), (1, 1)}.

The sample space: further examples

1. The tickets are drawn, with replacement, until a ticket with the number "1" is extracted. In this case the sample space S = {1, 01, 001, 0001, . . .} is made up of a countably infinite number of outcomes.

2. Give a push to the hand of a wheel marked from 0 to 1 and record the number it points to. In this case the sample space is S = [0; 1), which is uncountably infinite.

[Figure: a spinner wheel with a hand, marked 0, 1/4, 1/2 and 3/4.]
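The finite sample spaces above can also be enumerated mechanically; a small Python sketch (in terms of the values 0 and 1 written on the tickets):

    from itertools import product

    # EXP. B: one draw -> the set of distinct ticket values.
    S_B = {0, 1}

    # EXP. 1 and EXP. 2: two draws -> ordered pairs of values.
    # Written in terms of ticket VALUES, both experiments yield
    # the same set of ordered pairs.
    S_12 = set(product([0, 1], repeat=2))

    print(S_B)   # {0, 1}
    print(S_12)  # {(0, 0), (0, 1), (1, 0), (1, 1)}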
Event of a random experiment

An event is a set of outcomes, that is, a subset of the sample space, to which a probability is assigned.

• Sometimes an event is described by means of a proposition; however, it is always possible to represent it formally as a set of outcomes;
• we will denote an event by means of a capital letter, for instance E, and it holds that E ⊆ S;
• the tool used to deal with and describe the relationships between events is set theory;
• an event occurs if the random experiment results in one of its constituent basic outcomes.

Examples of event (for the experiments 1 and 2)

A = "Two tickets with the same value are extracted"
  = {(1, 1), (0, 0)}.
B = "A ticket with the number 1 is obtained in the first extraction"
  = {(1, 1), (1, 0)}.
C = "The product of the numbers on the extracted tickets is 0"
  = {(0, 1), (1, 0), (0, 0)}.
D = "A ticket with the number 2 is obtained in the first extraction"
  = ∅.
E = "A ticket with a number smaller than 2 is obtained in the first extraction"
  = {(1, 1), (1, 0), (0, 1), (0, 0)} = S.
The events set

• The basic outcomes of an experiment are singleton sets and are also known as elementary events;
• the events set is the set of all possible events.

Union and intersection of events

Let A and B be two events in a sample space S. For instance, for the experiments 1 and 2 take A = {(1, 1), (0, 0)} and B = {(1, 1), (1, 0)}.

• A ∪ B = "either A or B will occur", that is, A ∪ B is the set of all outcomes in S that belong to either A or B: A ∪ B = {(1, 1), (0, 0), (1, 0)};
• A ∩ B = "both A and B will occur", that is, A ∩ B is the set of all outcomes in S that belong to both A and B: A ∩ B = {(1, 1)}.
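Since events are just sets of outcomes, these operations translate directly into code; a minimal sketch using Python sets with the events A and B above:

    S = {(0, 0), (0, 1), (1, 0), (1, 1)}
    A = {(1, 1), (0, 0)}   # two tickets with the same value
    B = {(1, 1), (1, 0)}   # first extraction is 1

    print(A | B)   # union:        {(1, 1), (0, 0), (1, 0)}
    print(A & B)   # intersection: {(1, 1)}
    print(A <= S)  # every event is a subset of the sample space: True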
Mutually exclusive and collectively exhaustive events

• A and B are mutually exclusive events if they have no basic outcomes in common, that is, A ∩ B = ∅;
• mutually exclusive events are also called disjoint events;
• let E1, E2, . . . , Ek be k events of the sample space S. If such events completely cover the sample space, formally

  E1 ∪ E2 ∪ · · · ∪ Ek = S,

  then they are called collectively exhaustive.

Three important events

Sure event: an event that always occurs, whatever the result of the experiment is. The sample space S is a sure event.

Impossible event: an event that never occurs, whatever the result of the experiment is. The empty set ∅ is an impossible event.

Complement: the complement of an event E, denoted by Ē, is the set of all basic outcomes in the sample space that do not belong to E. Ē occurs if and only if E does not occur. We can also write Ē = S\E, where "\" is the set-difference operator.
Assessing probability: basic experiment

Set of the events: {∅, {0}, {1}, {0, 1}}

1. P(∅) = 0;
2. P({0}) = 3/5;
3. P({1}) = 2/5;
4. P({0, 1}) = P(S) = 1.

Note that

P(S) = P({0, 1}) = P({0} ∪ {1}) = P({0}) + P({1}) = 3/5 + 2/5 = 1.

Assessing probability: experiment 1 (a)

• The sample space S = {(0, 0), (0, 1), (1, 0), (1, 1)} is not made up of equally likely events;
• write the sample space in a different way so as to have equally likely events. Labeling the five tickets 0a, 0b, 0c, 1d, 1e, the 20 equally likely ordered pairs are:

  (0a, 0b)  (0a, 0c)  (0a, 1d)  (0a, 1e)
  (0b, 0a)  (0b, 0c)  (0b, 1d)  (0b, 1e)
  (0c, 0a)  (0c, 0b)  (0c, 1d)  (0c, 1e)
  (1d, 0a)  (1d, 0b)  (1d, 0c)  (1d, 1e)
  (1e, 0a)  (1e, 0b)  (1e, 0c)  (1e, 1d)
Assessing probability: experiment 1 (b)

Event of interest: {(0, 0), (1, 1)}. Among the 20 equally likely ordered pairs listed above, the 6 pairs with two "0" tickets and the 2 pairs with two "1" tickets satisfy the event, so

P({(0, 0), (1, 1)}) = 8/20.

Number of possible orderings

• The number of possible ways of arranging x objects in order is given by

  x! = x × (x − 1) × (x − 2) × · · · × 2 × 1

• "x!" is read "x factorial".
• Recall that 0! = 1.
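A sketch that reproduces the count 8/20 by enumerating the ordered pairs of labeled tickets (the labels 0a, . . . , 1e are those used on the slide):

    from itertools import permutations
    from fractions import Fraction

    tickets = ["0a", "0b", "0c", "1d", "1e"]
    # Experiment 1: two draws without replacement -> 20 ordered pairs.
    pairs = list(permutations(tickets, 2))

    # Event: both tickets carry the same value ("0"/"0" or "1"/"1").
    favorable = [p for p in pairs if p[0][0] == p[1][0]]

    print(len(favorable), "/", len(pairs))       # 8 / 20
    print(Fraction(len(favorable), len(pairs)))  # 2/5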
Permutations

• The total number of permutations of x objects chosen from n, denoted by P^n_x, is the number of possible arrangements when x objects are to be selected from a total of n and arranged in order:

  P^n_x = n × (n − 1) × (n − 2) × · · · × (n − x + 1).

• Note that

  P^n_x = n! / (n − x)!

Number of combinations

• The number of combinations, denoted by C^n_x, of x objects chosen from n is the number of possible selections that can be made. This number is

  C^n_x = n! / (x!(n − x)!)

• note that C^n_n = C^n_0 = 1;
• alternative notation: C^n_x = (n choose x), the binomial coefficient.
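Python's standard library computes these counting quantities directly (math.perm and math.comb, available from Python 3.8); a quick sketch:

    import math

    n, x = 5, 2
    print(math.perm(n, x))    # P^n_x = 5 * 4 = 20 ordered arrangements
    print(math.comb(n, x))    # C^n_x = 5! / (2! 3!) = 10 selections
    print(math.factorial(0))  # 0! = 1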
Assessing probability (1)

Probability is the chance that an uncertain event will occur.

Classical probability: provided that all outcomes in the sample space are equally likely to occur, the probability of an event is the ratio between the number of outcomes that satisfy the event and the total number of outcomes in the sample space.

Relative frequency probability: when an experiment is performed, for any event only one of two possibilities can happen: it occurs or it does not occur. The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event. More formally, frequentists see probability as the long-run expected frequency of occurrence.
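A sketch illustrating the relative-frequency idea for the basic experiment: the relative frequency of "0" in repeated draws should approach 3/5 as the number of repetitions grows.

    import random

    box = [0, 0, 0, 1, 1]  # three "0"s, two "1"s
    random.seed(1)          # fixed seed, for reproducibility of the sketch

    for n in (100, 10_000, 1_000_000):
        zeros = sum(random.choice(box) == 0 for _ in range(n))
        print(n, zeros / n)  # should drift toward 0.6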
Assessing probability (2)

Subjective probability: a probability derived from an individual's personal judgment about whether a specific outcome is likely to occur. Subjective probabilities involve no formal calculations and reflect only the subject's opinions and past experience.

BETTING APPROACH: find a specific amount to win or lose such that the decision maker is indifferent about which side of the bet to take.
Probability postulates

Probability is a function defined on the set of the events that associates to every event A a real number P(A) satisfying the following conditions:

1. P(A) ≥ 0;
2. P(S) = 1;
3. if A and B are disjoint (A ∩ B = ∅) then
   P(A ∪ B) = P(A) + P(B).

Probability rules

1. For every event A it holds that P(Ā) = 1 − P(A); this is called the complement rule.
2. P(∅) = 0.
3. For every event A it holds that 0 ≤ P(A) ≤ 1.
4. If A = A1 ∪ A2 ∪ · · · ∪ Ak with Ai ∩ Aj = ∅ for every i ≠ j, then
   P(A) = P(A1) + P(A2) + · · · + P(Ak).
5. For every pair of events A and B it holds that
   P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
   this is called the addition rule.
Conditional probability

For the experiment 1 consider the following events:

A1 = "the result of the FIRST extraction is 0"
A2 = "the result of the SECOND extraction is 0"

Of the 20 equally likely ordered pairs listed above, 12 belong to A1 and 12 belong to A2, so

P(A1) = 3/5    P(A2) = 3/5

A2|A1 = "the result of the SECOND extraction is 0 GIVEN that 0 is obtained in the FIRST extraction"
A1|A2 = "the result of the FIRST extraction is 0 GIVEN that 0 is obtained in the SECOND extraction"

P(A2|A1) = ?    P(A1|A2) = ?

Compute P(A2|A1)

P(A2|A1) = (# outcomes in A1 and A2) / (# outcomes in A1)
         = [(# outcomes in A1 and A2) / (# outcomes in S)] / [(# outcomes in A1) / (# outcomes in S)]
         = P(A2 ∩ A1) / P(A1)
         = 1/2

(A1 ∩ A2 contains the 6 pairs in which both tickets are "0", and A1 contains 12 pairs, so P(A2|A1) = 6/12 = 1/2.)
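The same computation by enumeration; a sketch that counts outcomes among the 20 equally likely ordered pairs:

    from itertools import permutations
    from fractions import Fraction

    pairs = list(permutations(["0a", "0b", "0c", "1d", "1e"], 2))

    A1 = [p for p in pairs if p[0][0] == "0"]      # first draw is 0
    A1_and_A2 = [p for p in A1 if p[1][0] == "0"]  # both draws are 0

    # P(A2 | A1) = #(A1 and A2) / #A1
    print(Fraction(len(A1_and_A2), len(A1)))       # 1/2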
Computing P(A1|A2)

By the same argument,

P(A1|A2) = (# outcomes in A1 and A2) / (# outcomes in A2)
         = [(# outcomes in A1 and A2) / (# outcomes in S)] / [(# outcomes in A2) / (# outcomes in S)]
         = P(A2 ∩ A1) / P(A2)
         = 1/2

Multiplication rule

For every pair of events A and B the probability of A given B can be computed as

P(A|B) = P(A ∩ B) / P(B)

so that

P(A ∩ B) = P(A|B) × P(B)

or, equivalently,

P(A ∩ B) = P(B|A) × P(A)
Independence

Two events A and B are said to be independent if

P(A|B) = P(A)

or, equivalently, P(B|A) = P(B).

If two events A and B are independent, then the multiplication rule simplifies to

P(A ∩ B) = P(A) × P(B)

The reverse implication also holds true, that is, the factorization P(A ∩ B) = P(A) × P(B) is a sufficient condition to prove that A and B are independent.

Example with the experiment 2

With replacement, the equally likely outcomes are the 25 ordered pairs built from the tickets 0a, 0b, 0c, 1d, 1e:

(0a, 0a)  (0a, 0b)  (0a, 0c)  (0a, 1d)  (0a, 1e)
(0b, 0a)  (0b, 0b)  (0b, 0c)  (0b, 1d)  (0b, 1e)
(0c, 0a)  (0c, 0b)  (0c, 0c)  (0c, 1d)  (0c, 1e)
(1d, 0a)  (1d, 0b)  (1d, 0c)  (1d, 1d)  (1d, 1e)
(1e, 0a)  (1e, 0b)  (1e, 0c)  (1e, 1d)  (1e, 1e)

In this case it holds that

P(A1) = 3/5    P(A2) = 3/5
P(A2|A1) = 3/5    P(A1|A2) = 3/5
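A sketch checking this numerically for experiment 2 (with replacement), using the 25 equally likely ordered pairs:

    from itertools import product
    from fractions import Fraction

    pairs = list(product(["0a", "0b", "0c", "1d", "1e"], repeat=2))  # 25

    A1 = [p for p in pairs if p[0][0] == "0"]
    A2 = [p for p in pairs if p[1][0] == "0"]
    both = [p for p in pairs if p[0][0] == "0" and p[1][0] == "0"]

    print(Fraction(len(A2), len(pairs)))  # P(A2)    = 3/5
    print(Fraction(len(both), len(A1)))   # P(A2|A1) = 3/5 -> independent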
Law of total probability

If the events A1, A2, . . . , Ak form a partition of the sample space, so that

1. S = A1 ∪ A2 ∪ · · · ∪ Ak   (collectively exhaustive);
2. Ai ∩ Aj = ∅ for every i ≠ j   (mutually exclusive);

then for every event B it holds that

P(B) = P(B ∩ A1) + P(B ∩ A2) + · · · + P(B ∩ Ak)
     = P(B|A1)P(A1) + P(B|A2)P(A2) + · · · + P(B|Ak)P(Ak)

Bayes' theorem

Bayes' formula provides an alternative way to compute conditional probabilities. For every pair of events A and B it holds that

P(A|B) = P(B|A)P(A) / P(B)

Typically the denominator can be computed by applying the law of total probability:

P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
Which experiment?

• One of my friends carries out either the experiment 1 or the experiment 2;
• it is unknown which experiment has been carried out:
  P(E1) = P(E2) = 1/2;
• the result of the experiment is {(0, 0)};
• QUESTION: which experiment is most likely to have been executed?
• SOLUTION: it is necessary to compute
  P(E1|{(0, 0)})   and   P(E2|{(0, 0)}).

The envelopes riddle

• Suppose you're on a game show, and you're given the choice of three labeled envelopes: A, B, C;
• two envelopes are empty and one contains 1000 euro. The host knows where the money is;
• you choose one of the three envelopes, say A;
• the host opens one of the remaining envelopes, say C, and shows that it is empty;
• now you are allowed to switch your envelope with the host, that is, take B and hand A to the host;
• QUESTION: is it better for you to switch, or better not to switch?
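The slides leave the question open; a simulation sketch lets the reader estimate both strategies empirically. It assumes (beyond what the slide states) that the prize is placed uniformly at random and that the host always opens an empty envelope different from yours.

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.choice("ABC")
            pick = "A"  # say you always choose envelope A
            # host opens an empty envelope: not yours, not the prize
            opened = random.choice([e for e in "BC" if e != prize])
            if switch:  # take the remaining closed envelope
                pick = next(e for e in "ABC" if e not in (pick, opened))
            wins += (pick == prize)
        return wins / trials

    random.seed(0)
    print("stay:  ", play(switch=False))
    print("switch:", play(switch=True))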
The rare diseases problem (1)

The accuracy of a medical diagnostic test, in which a positive result indicates the presence of a disease, is often stated in terms of its sensitivity, the proportion of diseased people who test positive, and its specificity, the proportion of people without the disease who test negative.

• D = "a person has the disease";
• + = "a person's test result is POSITIVE";
• − = "a person's test result is NEGATIVE";
• SENSITIVITY: probability that the test result is positive for a person who has the disease, P(+|D);
• SPECIFICITY: probability that the test result is negative for a person who does not have the disease, P(−|D̄).

The rare diseases problem (2)

• For instance:
  – P(D) = 1/1000
  – P(+|D) = 0.99
  – P(−|D̄) = 0.98
• QUESTION: a person's test result is positive. What is the probability that the person actually has the disease, P(D|+) = ?
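Bayes' theorem together with the law of total probability answers the question directly; a sketch with the numbers stated above:

    p_D = 1 / 1000     # prevalence P(D)
    p_pos_D = 0.99     # sensitivity P(+|D)
    p_neg_notD = 0.98  # specificity P(-|not D)

    p_pos_notD = 1 - p_neg_notD                     # false-positive rate
    p_pos = p_pos_D * p_D + p_pos_notD * (1 - p_D)  # total probability
    p_D_pos = p_pos_D * p_D / p_pos                 # Bayes' theorem

    print(round(p_D_pos, 4))  # about 0.0472: a positive test still leaves
                              # the disease unlikely, because it is rare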
Random variables

Roughly speaking, a random variable is a numerical description of the outcome of an experiment.

Aim: define tools that make it possible

1. to deal more easily and effectively with random experiments;
2. to develop a general theory that can be applied to all the random experiments that share a common probabilistic structure (even though apparently distinct from each other).

Example of random variable: a gambling game

• 3 draws with replacement;
• receive one euro for every "1" extracted;
• pay one euro for every "0" extracted.

outcome                              gain
(0, 0, 0)                       −→    −3
(1, 0, 0), (0, 1, 0), (0, 0, 1) −→    −1
(1, 1, 0), (0, 1, 1), (1, 0, 1) −→     1
(1, 1, 1)                       −→     3
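A sketch that derives the gain for each outcome and tabulates the resulting distribution, assuming the eight triples are equally likely (which is what the probabilities 1/8, 3/8, 3/8, 1/8 reported two slides below imply):

    from itertools import product
    from collections import Counter
    from fractions import Fraction

    outcomes = list(product([0, 1], repeat=3))  # 8 equally likely triples

    def gain(outcome):
        # +1 euro per "1" drawn, -1 euro per "0" drawn
        return sum(+1 if t == 1 else -1 for t in outcome)

    counts = Counter(gain(o) for o in outcomes)
    for g in sorted(counts):
        print(g, Fraction(counts[g], len(outcomes)))
        # -3: 1/8, -1: 3/8, 1: 3/8, 3: 1/8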
Definition of random variable

DEFINITION: a random variable is a function from the sample space to the real line, i.e. a function X : S → ℝ that maps every element of the sample space onto a single real number.

• The value taken by a random variable depends on the outcome of the experiment, and it is not known before the experiment is performed;
• it is important to distinguish between a random variable and the possible values that it can take. Capital letters, such as X, are used to denote random variables. The corresponding lowercase letter, x, denotes a possible value;
• in the example of the gambling game, if X is the random variable corresponding to the gain, then X((0, 0, 0)) = −3, X((0, 1, 1)) = 1, etc.

Discrete vs continuous random variables

• A random variable is said to be CONTINUOUS if it can take on any numerical value in an interval or collection of intervals;
• a random variable is said to be DISCRETE if it can take on either a finite number of values or a countably infinite number of values;
• every value of a discrete random variable can be associated with a probability value.

outcome                              gain    probability
(0, 0, 0)                       −→    −3        1/8
(1, 0, 0), (0, 1, 0), (0, 0, 1) −→    −1        3/8
(1, 1, 0), (0, 1, 1), (1, 0, 1) −→     1        3/8
(1, 1, 1)                       −→     3        1/8
Probability distribution

• A probability distribution is a function that describes the probability of a random variable taking certain values.
• For the gambling game:

  values of X    −3     −1      1      3
  probability    1/8    3/8    3/8    1/8

[Figure: bar plot of the probabilities 1/8, 3/8, 3/8, 1/8 against the gains −3, −1, 1, 3.]

• In general:

  values of X    x1          x2          x3          . . .
  P(X = x)       P(X = x1)   P(X = x2)   P(X = x3)   . . .

Characterization

A discrete random variable X is characterized by

• its support, denoted by S_X, defined as the set of all possible values which the random variable can take on;
• its probability mass function or, shortly, its probability function.
The probability mass function (pmf)

DEFINITION: the probability mass function of a discrete random variable X is a function defined on S_X that gives the probability that X is exactly equal to x ∈ S_X, formally

p(x) = P(X = x)   for every x ∈ S_X

Properties of the probability mass function:

1. p(x) ≥ 0;
2. Σ_{x ∈ S_X} p(x) = 1.

Any function that takes on values in S_X and that fulfills the two properties above is a probability mass function for X.

The (cumulative) distribution function (cdf)

DEFINITION: for every real value x ∈ ℝ the cumulative distribution function of X is defined as

F(x) = P(X ≤ x) = Σ_{y ∈ S_X, y ≤ x} p(y)

Properties of the distribution function:

1. F(x) is (not necessarily strictly) non-decreasing;
2. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1;
3. F(x) is right-continuous, that is, lim_{x→x0+} F(x) = F(x0).

Every function that satisfies the three properties above is a distribution function.
Example of probability distribution function

Graph of the distribution function for the gambling game example:

[Figure: step plot of F(x) for x between −6 and 6.]

The distribution function is discontinuous at the points −3, −1, 1, 3 and constant in between. At the discontinuity points it takes the values 1/8, 4/8, 7/8 and 1.

Expected value of a discrete random variable

The expected value (or mean) of a discrete random variable X is the number

E(X) = µ_X = Σ_{x ∈ S_X} x p(x).

• The expected value is a measure of central tendency of the probability distribution;
• note the similarity with the mean of a population;
• the expected value can be thought of as the arithmetic mean of an infinite number of realizations of the random variable;
• for the gambling game example

  E(X) = −3 × 1/8 − 1 × 3/8 + 1 × 3/8 + 3 × 1/8 = 0.
The variance of a discrete random variable

The variance of a discrete random variable X is the number

Var(X) = σ_X² = E{[X − E(X)]²} = Σ_{x ∈ S_X} (x − µ)² p(x)

• The variance is a measure of dispersion (around the expected value) of the probability distribution;
• similarly to the result shown for the variance of a population, it holds that

  Var(X) = E(X²) − E(X)² = Σ_{x ∈ S_X} x² p(x) − E(X)²

The standard deviation of a random variable

The variance is not expressed in the same units as the random variable, but in squared units. Therefore, it is necessary to transform its value by computing the square root, obtaining the standard deviation of the variable:

SD(X) = √Var(X)

In the gambling game example

E(X²) = 9 × 1/8 + 1 × 3/8 + 1 × 3/8 + 9 × 1/8 = 3

hence the variance is Var(X) = 3 − 0² = 3, and the standard deviation is SD(X) = √3 = 1.73.
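The three summaries above are one-liners once the probability function is stored as a table; a sketch for the gambling game:

    from fractions import Fraction
    import math

    pmf = {-3: Fraction(1, 8), -1: Fraction(3, 8),
            1: Fraction(3, 8),  3: Fraction(1, 8)}

    mean = sum(x * p for x, p in pmf.items())               # E(X)
    var = sum(x**2 * p for x, p in pmf.items()) - mean**2   # E(X^2) - E(X)^2
    print(mean, var, math.sqrt(var))                        # 0 3 1.732...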
The discrete uniform distribution

The discrete uniform random variable X takes on a finite number of values x1, . . . , xK with constant probabilities all equal to 1/K, that is

  values    x1     x2     · · ·    xK
  prob.     1/K    1/K    · · ·    1/K

hence its probability function can be written as

p(x_i) = 1/K   for i = 1, . . . , K

whereas its cumulative distribution function is

F(x) = (number of x_i ≤ x) / K   for x ∈ ℝ

Example of discrete uniform distribution (1)

The random variable X relative to the roll of one die has a discrete uniform distribution with values x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 5, x6 = 6 and p(x_i) = 1/6.

[Figure: probability mass function (bars of height 1/6 at the values 1, . . . , 6) and distribution function (staircase rising from 0 to 1) of X.]
Example of discrete uniform distribution (2)

The expected value of the random variable X relative to the roll of a die is

E(X) = (1/6) Σ_{i=1}^{6} i = 3.5

Furthermore,

E(X²) = (1/6) Σ_{i=1}^{6} i² = 15.17

and

Var(X) = 15.17 − 3.5² = 2.92
SD(X) = √2.92 = 1.71

Bernoulli distribution (1)

Experiment:

• Box with r + s tickets: r with "1" and s with "0";
• the proportion of "1"s in the box is π = r/(r + s);
• extract ONE ticket.

Random variable Y = number on the extracted ticket.
Bernoulli distribution (2)

• The support of Y is {0, 1} and its probability function is

  p(y) = π^y (1 − π)^(1−y)

  that is,

  p(y) = π       for y = 1
  p(y) = 1 − π   for y = 0

• the expected value of Y is E(Y) = π and the variance is Var(Y) = π(1 − π);
• graphical representation of the probability distribution for some values of π:

[Figure: bar plots of the Bernoulli probability function for π = 0.2, π = 0.5 and π = 0.8.]

Binomial distribution (1)

Experiment:

• Box with r + s tickets: r with "1" and s with "0";
• the proportion of tickets with "1" is π = r/(r + s);
• n tickets are extracted with replacement.

Random variable X = (two equivalent definitions)

1. sum of the values on the extracted tickets;
2. number of tickets with "1".
Binomial distribution (2)

• The support of X is {0, 1, . . . , n} and its probability function is

  p(x) = C^n_x π^x (1 − π)^(n−x)   for x = 0, 1, . . . , n

• the expected value of X is E(X) = nπ and the variance is Var(X) = nπ(1 − π);
• graphical representation of the probability distribution for some values of π with n = 10:

[Figure: bar plots of the binomial probability function for n = 10 and π = 0.2, π = 0.5, π = 0.8.]

Binomial distribution (3)

General formulation:

• Random experiment with two possible outcomes coded as SUCCESS and FAILURE:
  P(SUCCESS) = π
• repetition of the experiment n times
  – under the same conditions;
  – independently.
• X = exact number of successes in the n trials.

Then X follows a binomial distribution with parameters n and π.
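A sketch of the binomial probability function using math.comb, checked against E(X) = nπ and Var(X) = nπ(1 − π):

    import math

    def binom_pmf(x, n, pi):
        return math.comb(n, x) * pi**x * (1 - pi)**(n - x)

    n, pi = 10, 0.2
    probs = [binom_pmf(x, n, pi) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x**2 * p for x, p in enumerate(probs)) - mean**2

    print(round(sum(probs), 10))  # 1.0 (the probabilities sum to one)
    print(round(mean, 10))        # 2.0 = n * pi
    print(round(var, 10))         # 1.6 = n * pi * (1 - pi)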
Discrete random variable: notation

• If the distribution of X is discrete uniform, then we write
  X ∼ Ud{x1, . . . , xK}
• if the distribution of Y is Bernoulli with parameter π, then we write
  Y ∼ Bernoulli(π)   or, more compactly,   Y ∼ Be(π)
• if the distribution of X is binomial with parameters n and π, then we write
  X ∼ Binomial(n, π)   or, more compactly,   X ∼ Bin(n, π)

Continuous random variables

• Experiment: give a push to the hand of the wheel;
• X = value pointed to by the hand when it stops.

[Figure: the wheel marked 0, 1/4, 1/2 and 3/4.]

Examples of events:

• X < 0.5        −→   X ∈ [0; 0.5)
• 0.4 < X ≤ 0.7  −→   X ∈ (0.4; 0.7]
• X = 0.5        −→   X ∈ {0.5}
• X ≠ 0.5        −→   X ∈ [0; 0.5) ∪ (0.5; 1)
Assessing the probability of events

If all the points are equally likely (same probability of being pointed to):

• P(X < 0.5) = 0.5
• P(0.4 < X ≤ 0.7) = 0.7 − 0.4 = 0.3
• P(X = 0.5) = ?

Consider the interval [0.5 − ε/2; 0.5 + ε/2], where ε > 0 is a small number; then

P(0.5 − ε/2 ≤ X ≤ 0.5 + ε/2) = ε

For ε → 0 one obtains

P(X = 0.5) = 0

and furthermore P(X ≠ 0.5) = 1 − P(X = 0.5) = 1.

Some comments

• Even though we know for sure that X will take on some real number, the probability that it takes on any fixed real number is equal to zero;
• as a consequence, it is not possible to describe the probabilistic structure of a continuous random variable by means of a probability mass function;
• OBJECTIVE: identify an effective way to describe the probabilistic structure of a continuous random variable.
The cumulative distribution function

• An event of a continuous random variable can always be represented by an interval or by the union of disjoint intervals;
• the probability of any event can be computed from the probability of the events corresponding to the following family of intervals:
  (−∞; x]   for x ∈ ℝ
• or, equivalently, from the cumulative distribution function of X:
  F(x) = P(X ≤ x)   for x ∈ ℝ
• in the continuous case, the cumulative distribution function is characterized by the same three properties as in the discrete case.

Example of cumulative distribution function

For the experiment of the wheel, for x ∈ [0; 1), it holds that

F(x) = P(X ≤ x) = P(X ∈ (0; x]) = x

so that

F(x) = 0   for x < 0
F(x) = x   for 0 ≤ x < 1
F(x) = 1   for x ≥ 1

[Figure: graph of F(x), rising linearly from 0 at x = 0 to 1 at x = 1 and flat elsewhere.]
The probability density function (pdf)

The probability density function of a continuous random variable X is defined as

f(x) = dF(x)/dx

By the fundamental theorem of integral calculus it holds that

∫_a^b f(x) dx = F(b) − F(a) = P(a < X ≤ b)

and, consequently,

1. f(x) ≥ 0 for every x ∈ ℝ;
2. ∫_{−∞}^{+∞} f(x) dx = 1.

Note that it is NOT required that f(x) ≤ 1, because the values of a probability density function are not probabilities.

Probability of an interval

How can P(a < X ≤ b) be computed?

F(b) = P(X ≤ b)
     = P(X ∈ (−∞; a] ∪ (a; b])
     = P(X ∈ (−∞; a]) + P(X ∈ (a; b])
     = P(X ≤ a) + P(a < X ≤ b)
     = F(a) + P(a < X ≤ b)

so that

P(a < X ≤ b) = F(b) − F(a)
Example of probability density function

In the experiment of the wheel

f(x) = 1   for x ∈ [0; 1)

and zero otherwise.

[Figure: graph of the uniform density f(x) = 1 on [0; 1), with the area between 0.3 and 0.7 shaded.]

P(0.3 < X ≤ 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 ≤ X < 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 < X < 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4
P(0.3 ≤ X ≤ 0.7) = ∫_{0.3}^{0.7} 1 dx = 0.4

Interpretation of the density function

• The values of the density function are not probabilities,

  f(a) ≠ P(X = a)

  note that

  P(X = a) = ∫_a^a f(x) dx = 0;

• however, for a small ε > 0 it holds that

  P(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a)

  and therefore the probability that the outcome of the experiment is a value "close to" a point with higher density is larger than the corresponding probability for a point with lower density.
From the density function to the cumulative distribution function

The cumulative distribution function can be computed from the density function as follows:

F(x) = ∫_{−∞}^{x} f(t) dt   for x ∈ ℝ

Expected value and variance of a continuous random variable

• The expected value (or mean) of a continuous random variable X is the number

  E(X) = µ_X = ∫_{−∞}^{+∞} x f(x) dx

• the variance is

  Var(X) = σ_X² = E{[X − E(X)]²} = ∫_{−∞}^{+∞} (x − µ)² f(x) dx

• the standard deviation is

  SD(X) = σ_X = √Var(X)
The continuous uniform distribution

For an interval [a; b], the continuous uniform distribution has density function

f(x) = 1/(b − a)   for x ∈ [a; b]

and zero outside the interval.

• E(X) = (a + b)/2 and Var(X) = (b − a)²/12;
• we write X ∼ U(a; b).

The exponential distribution

A random variable X has exponential distribution with parameter λ if its support is the interval [0; ∞) and it has

• probability density function:
  f(x) = λ e^(−λx)
• cumulative distribution function:
  F(x) = 1 − e^(−λx)
• we write X ∼ Exp(λ).
Exp(1)

For λ = 1 the density function of the exponential distribution is

f(x) = e^(−x)

[Figure: graph of f(x) = e^(−x) for x between 0 and 5.]

E(X) = ∫_0^∞ x e^(−x) dx = 1
Var(X) = ∫_0^∞ x² e^(−x) dx − E(X)² = 1

The "memoryless" property

• Let X be the random variable associated with the arrival time of a given process. For instance,
  1. the time it takes before your next telephone call;
  2. the time until default (on payment to company debt holders) in reduced-form credit risk modeling;
  3. the time until a radioactive particle decays, or the time between clicks of a Geiger counter.
• The memoryless property means that "the future is independent of the past", i.e. the fact that an event hasn't happened yet tells us nothing about how much longer it will take before it does happen. This says that the conditional probability that we need to wait, for example, more than another 10 seconds before the first arrival, given that the first arrival has not yet happened after 30 seconds, is equal to the initial probability that we need to wait more than 10 seconds for the first arrival.
Mathematical formulation of the memoryless property

PROBLEM: we want to characterize the probability distribution of a random variable X describing the arrival time of a memoryless process.

In mathematical terms:

P(X > x + y | X > x) = P(X > y)   for every x, y > 0

That is:

P(X > x + y) = P(X > x + y | X > x) P(X > x) = P(X > y) P(X > x).

Hence P(X > x) must be a function G(·) such that

G(x + y) = G(x) G(y)   for every x, y > 0

SOLUTION:

G(x) = e^(Cx)

G(x) = e^(Cx) is a probability for C < 0. Hence, if λ > 0, we can write

P(X > x) = e^(−λx)

and

F(x) = 1 − P(X > x) = 1 − e^(−λx)

so that X ∼ Exp(λ).
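A numerical sketch checking the memoryless property on the exponential survival function P(X > x) = e^(−λx), using the 30-and-10-seconds example from the previous slide:

    import math

    lam = 0.5

    def S(x):  # survival function P(X > x) of Exp(lam)
        return math.exp(-lam * x)

    x, y = 30.0, 10.0
    # P(X > x + y | X > x) = P(X > x + y) / P(X > x)
    print(S(x + y) / S(x))  # equals ...
    print(S(y))             # ... P(X > y): memoryless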
Transformations of random variables

• X random variable (discrete or continuous);
• Y = g(X) function of X, for instance
  1. Y = 5X + 3 (linear transformation);
  2. Y = X² + 1 (non-linear transformation);
• the expected value of Y is E(Y) = E[g(X)] and, in general,

  E[g(X)] ≠ g(E[X])

  but it holds that

  E(Y) = Σ_{x ∈ S_X} g(x) p(x)   and   E(Y) = ∫_{−∞}^{+∞} g(x) f(x) dx

  for the discrete and continuous case respectively.

Linear transformations

If Y is a linear transformation of X, that is,

Y = aX + b

then

1. E(Y) = E(aX + b) = a E(X) + b
2. Var(Y) = Var(aX + b) = a² Var(X).
Example of linear transformation

For X ∼ Exp(1) let Y = X/λ. Then

F_Y(y) = P(Y ≤ y) = P(X/λ ≤ y) = P(X ≤ λy) = F_X(λy) = 1 − e^(−λy)

RESULT: Y ∼ Exp(λ).

APPLICATION:

E(Y) = 1/λ    Var(Y) = 1/λ²

The standardization

An especially relevant linear transformation, called the STANDARDIZATION of a random variable X, is given by

Z = (X − E(X)) / SD(X)

It is easy to see that

1. E(Z) = 0;
2. Var(Z) = SD(Z) = 1.
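A simulation sketch of the standardization, using an Exp(0.5) variable (so E(X) = SD(X) = 1/λ = 2 by the result above): the standardized sample should have mean near 0 and standard deviation near 1.

    import random, statistics

    random.seed(0)
    lam = 0.5
    xs = [random.expovariate(lam) for _ in range(100_000)]  # X ~ Exp(0.5)

    mu, sd = 1 / lam, 1 / lam         # E(X) = 1/lambda, SD(X) = 1/lambda
    zs = [(x - mu) / sd for x in xs]  # standardized values

    print(round(statistics.mean(zs), 3))   # close to 0
    print(round(statistics.stdev(zs), 3))  # close to 1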
Introduction to the normal (or Gaussian) distribution

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this:

1. the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice;
2. the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally;
3. the normal distribution is very tractable analytically, that is, a large number of results involving this distribution can be derived in explicit form.

The STANDARD normal distribution

A random variable Z has standard normal distribution if its density function is

f(z) = (1/√(2π)) exp(−z²/2)

[Figure: the standard normal bell curve for z between −4 and 4.]

Features of the standard normal distribution:

1. E(Z) = 0 and Var(Z) = SD(Z) = 1, and we write Z ∼ N(0, 1);
2. it is symmetric around the mean and unimodal;
3. its pdf is strictly positive for every z ∈ ℝ, but the area under the curve outside the interval (−4; 4) is close to zero.
Probability of some relevant intervals

For the standard normal distribution:

• area between −1 and 1: 68.26%
• area between −2 and 2: 95.44%
• area between −1.96 and 1.96: 95%
• area between −3 and 3: 99.74%

The normal distribution

Strictly speaking, it is not correct to talk about "the normal distribution", since there are many normal distributions. Normal distributions can differ in their means and in their standard deviations. The probability density function of a random variable X with (arbitrary) normal distribution is

f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))

• E(X) = µ and Var(X) = σ²;
• we write X ∼ N(µ, σ²).
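The standard normal cdf can be written in terms of the error function, Φ(z) = (1 + erf(z/√2))/2; a sketch reproducing the tabulated areas:

    import math

    def Phi(z):  # standard normal cdf via the error function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    for z in (1, 1.96, 2, 3):
        area = Phi(z) - Phi(-z)  # area between -z and z
        print(z, round(100 * area, 2), "%")
        # 68.27, 95.0, 95.45, 99.73: the slide values, up to rounding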
Normal distribution and linear transformations

• If Z ∼ N(0, 1) then σZ + µ = X ∼ N(µ, σ²);
• if X ∼ N(µ, σ²) then (X − µ)/σ = Z ∼ N(0, 1);
• hence for the pdf of X it holds that
  1. the area between µ − σ and µ + σ is 68.26%;
  2. the area between µ − 2σ and µ + 2σ is 95.44%;
  3. the area between µ − 1.96σ and µ + 1.96σ is 95%;
  4. the area between µ − 3σ and µ + 3σ is 99.74%;
• the cumulative distribution function of Z is denoted by Φ(z).

Multiple random variables: the bivariate case

Let X1 and X2 be two random variables:

• X1 and X2 are INDEPENDENT if and only if
  P({X1 ≤ x1} ∩ {X2 ≤ x2}) = P(X1 ≤ x1) × P(X2 ≤ x2)
  for every pair x1 and x2;
• two (or more) random variables are IDENTICALLY DISTRIBUTED if they have the same probability distribution;
• i.i.d. = independent and identically distributed.
Linear combination of two random variables

Let X1 and X2 be two random variables: a linear combination of X1 and X2 is a random variable defined as

Y = a1 X1 + a2 X2 + b

where a1, a2 and b are real constants.

EXAMPLE: let X1 and X2 be the results of two die rolls (a black and a white die, say). X1 and X2 are i.i.d. and Y = X1 + X2 is the linear combination corresponding to the sum of the two resulting values.

Expected value of a linear combination of two random variables

The expected value of Y = a1 X1 + a2 X2 + b is

E(Y) = a1 E(X1) + a2 E(X2) + b

EXAMPLE: if X1 and X2 are the results of the black and white die rolls then

E(X1 + X2) = 3.5 + 3.5 = 7   and   E(X1 − X2) = 0.
Variance of a linear combination of two random variables

If X1 and X2 are INDEPENDENT, then the variance of Y = a1 X1 + a2 X2 + b is

Var(Y) = a1² Var(X1) + a2² Var(X2)

EXAMPLE: if X1 and X2 are the results of the black and white die rolls then

Var(X1 + X2) = Var(X1) + Var(X2) = 2.92 + 2.92 = 5.84
Var(X1 − X2) = Var(X1) + Var(X2) = 2.92 + 2.92 = 5.84

Multiple random variables

Consider the sequence of random variables X1, X2, . . . , Xn.

• These n random variables are MUTUALLY INDEPENDENT if and only if for every x1, . . . , xn it holds that

  P(X1 ≤ x1 ∩ X2 ≤ x2 ∩ · · · ∩ Xn ≤ xn) = P(X1 ≤ x1) × P(X2 ≤ x2) × · · · × P(Xn ≤ xn)
i.i.d. random variables: examples

E1: A box contains tickets labeled with either "0" or "1". Let π be the proportion of tickets with "1". n tickets are extracted with replacement from the box. For i = 1, . . . , n let Xi denote the result of the ith extraction. The n random variables are i.i.d. with Xi ∼ Be(π).

E2: The "experiment of the wheel" is repeated n times. For i = 1, . . . , n let Xi denote the result of the ith repetition of the experiment. The n random variables are i.i.d. with Xi ∼ U(0; 1).

E3: n married couples go to a dinner. Husbands sit on one side of the table, and wives on the opposite side. Everybody chooses her/his seat randomly. For i = 1, . . . , n let Xi be equal to "1" if the ith couple is sitting opposite each other and "0" otherwise. These n random variables are identically distributed with Xi ∼ Be(1/n) but NOT INDEPENDENT.

Linear combination of random variables

A linear combination of the random variables X1, . . . , Xn is a random variable defined as

Y = a1 X1 + a2 X2 + · · · + an Xn + b

where a1, . . . , an, b are real constants.

Examples:

E1: the total number of tickets with "1" in the n extractions is Y = X1 + · · · + Xn and its distribution is Bin(n, π);
E2: the sum of the n results of the experiment is Y = X1 + · · · + Xn;
E3: the number of married couples sitting opposite each other is Y = X1 + · · · + Xn, but Y is NOT a binomial random variable.
Expected value of a linear combination of random variables

The expected value of Y = a1 X1 + a2 X2 + · · · + an Xn + b is

E(Y) = a1 E(X1) + a2 E(X2) + · · · + an E(Xn) + b

and, more specifically, if X1, . . . , Xn are identically distributed with E(Xi) = µ and Y = X1 + X2 + · · · + Xn, it holds that

E(Y) = n × µ

EXAMPLES:

E1: E(Xi) = π and therefore E(Y) = n × π;
E2: E(Xi) = 1/2 and therefore E(Y) = n/2;
E3: E(Xi) = 1/n and therefore E(Y) = 1.

Variance of a linear combination of random variables

If the random variables X1, . . . , Xn are INDEPENDENT, the variance of Y = a1 X1 + a2 X2 + · · · + an Xn + b is equal to

Var(Y) = a1² Var(X1) + a2² Var(X2) + · · · + an² Var(Xn)

and, more specifically, if X1, . . . , Xn are i.i.d. with Var(Xi) = σ² and Y = X1 + X2 + · · · + Xn, it holds that

Var(Y) = n × σ²

EXAMPLES:

E1: Var(Xi) = π(1 − π) and therefore Var(Y) = nπ(1 − π);
E2: Var(Xi) = 1/12 and therefore Var(Y) = n/12;
E3: Var(Xi) = (1/n)(1 − 1/n). However, Var(Y) = ?: in this case independence does not hold.
Linear combination of normally distributed random variables

If X1, . . . , Xn are INDEPENDENT and NORMALLY DISTRIBUTED then the linear combination Y = a1 X1 + a2 X2 + · · · + an Xn + b has the following properties:

1. E(Y) = a1 E(X1) + a2 E(X2) + · · · + an E(Xn) + b;
2. Var(Y) = a1² Var(X1) + a2² Var(X2) + · · · + an² Var(Xn);
3. Y is normally distributed.

More specifically, if X1, . . . , Xn are i.i.d. with E(Xi) = µ and Var(Xi) = σ² and, furthermore, Y = X1 + X2 + · · · + Xn, then

Y ∼ N(nµ; nσ²).

The central limit theorem

If X1, . . . , Xn are i.i.d. with E(Xi) = µ and Var(Xi) = σ², then the distribution of the random variable

Sn = X1 + X2 + · · · + Xn

is approximately normal:

Sn ≈ N(nµ; nσ²)

The symbol "≈" means "approximately distributed as". The quality of the normal approximation increases with n, but also depends on the probability distribution of the Xi.
Central limit theorem: example

X1, . . . , Xn are i.i.d. with density function f(x) = e^(−x) for x > 0. Hence, E(Xi) = 1 and Var(Xi) = 1. From the central limit theorem it follows that

X1 + X2 + · · · + Xn = Sn ≈ N(n; n)

[Figure: six panels comparing the exact pdf of Sn with the approximating normal density: S1 = X1 vs N(1; 1), S2 vs N(2; 2), S4 vs N(4; 4), S8 vs N(8; 8), S20 vs N(20; 20) and S50 vs N(50; 50). The agreement improves as n grows.]
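A simulation sketch in the spirit of the figure: simulate Sn for Exp(1) summands and compare an estimated tail probability with the value predicted by the N(n; n) approximation. Under that approximation, P(Sn ≤ n + √n) should approach Φ(1) ≈ 0.8413 as n grows.

    import math, random

    def Phi(z):  # standard normal cdf
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    random.seed(0)
    for n in (1, 2, 4, 8, 20, 50):
        sums = [sum(random.expovariate(1.0) for _ in range(n))
                for _ in range(20_000)]
        # Estimated P(Sn <= n + sqrt(n)); the normal approximation
        # predicts Phi(1) ~ 0.8413.
        est = sum(s <= n + math.sqrt(n) for s in sums) / len(sums)
        print(n, round(est, 3))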