A Quick Exploration of Probability for Statistics (Part 1)

Consider a statistical experiment. That's an observation (a measurement) whose outcome may not be certain beforehand. Examples are

• Ask a person about their preferred ice cream flavor
• Observe the color of the eyes of a person
• Measure the voltage produced by a battery
• Measure the body temperature of an individual
• Measure the time between bus arrivals at a bus stop

Let's call the outcome of this experiment X, which we take to be a number (if the observation is qualitative, like eye color or preferred flavor, we assume we have set up a coding convention).
In general, we don't know before actually observing or measuring what value will be taken by X, but we assume we know what values are possible. Suppose, for simplicity, that X can only take a finite number of values (e.g., body temperatures assumed to be between 95.0 and 109.0, in increments of 0.1), say, x1, x2, ..., xn. Then the result of the experiment will be X = xj if the measurement turns out to be xj (e.g., if the temperature is measured to be 98.6, the result of the experiment is X = 98.6).
Of course, we may be taking more than one observation, so each one will have its outcome, and we could consider the combined results as, for example, X = xj, Y = yk, Z = zl, and so on.

A probabilistic model is a procedure that assigns a number between 0 and 1 to the outcome of our observations, 0 representing an impossible outcome, and 1 a sure outcome. To make sure we do not run into paradoxes and inconsistencies, we do this in a specific way.
Sample Space and Events
For some fairly deep reasons, it is best to set up our model by defining a set S, which is usually called the sample space, and think of it as the collection of all possible outcomes (we don't need to be too specific about what we mean by that), and think of our outcome result X as a function mapping S to a set of numbers:

X : S → R

so that the result X = xj is the set of elements K in S such that, if s ∈ K, X(s) = xj. These functions are called random variables. You may want to quickly review your knowledge of set theory at this point.¹ We are thus interested in subsets of S, defined by things like {s | X(s) = xj}, or, more generally, {s | X(s) = xj, Y(s) = yk, Z(s) = zl}. We will call these sets events, and there are some technical assumptions that we need to make about these sets, which you can check in a more mathematically oriented report.
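To make this concrete, here is a minimal Python sketch of these ideas, using a hypothetical two-coin-toss experiment (the names S, X, and event are ours, not part of the notes): the sample space is a set of outcomes, a random variable is a function on it, and an event is the resulting subset.

    # A tiny sample space: the four outcomes of tossing two coins.
    S = {"HH", "HT", "TH", "TT"}

    # A random variable is a function from S to the numbers; here X counts heads.
    def X(s):
        return s.count("H")

    # The event "X = 1" is the subset of S whose elements are mapped to 1 by X.
    event = {s for s in S if X(s) == 1}
    print(event)  # {'HT', 'TH'} (set order may vary)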
We will associate a number between 0 and 1 to each of these sets, as suggested above. This number is called a probability, and we denote the probability of the set E as P[E]. The extreme values correspond to impossibility (probability 0) and certainty (probability 1).
Right now, temporarily, let's not worry about how we are doing the association; we just assume we can. For consistency reasons, the following properties must be satisfied:²

• If A and B are events, and if they have no common points (we write this as A ∩ B = ∅), then

P[A ∪ B] = P[A] + P[B]

• P[S] = 1, that is, something surely has to happen.
If you are familiar with Venn diagrams, and think of the probability of an event (a set) as some kind of "mass" associated with the set, you will find the following more general result obvious (it is easy to prove, from the previous statements and a bit of set theory):

P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
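As a quick illustration, the following sketch checks this union formula (and the complement rule derived below) numerically, under the assumption, made purely for illustration, that the four outcomes of the two-coin space are equally likely, so that P[E] = |E|/|S|.

    # Four equally likely outcomes (an assumption made purely for illustration),
    # so that P[E] = |E| / |S|.
    S = {"HH", "HT", "TH", "TT"}

    def P(E):
        return len(E) / len(S)

    A = {"HH", "HT"}  # first coin shows heads
    B = {"HH", "TH"}  # second coin shows heads

    # P[A ∪ B] = P[A] + P[B] - P[A ∩ B]
    assert P(A | B) == P(A) + P(B) - P(A & B)

    # Complement rule (derived below): P[A^c] = 1 - P[A]
    assert P(S - A) == 1 - P(A)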
¹ We use the following symbols here. "x is an element of the set A" is written x ∈ A; "A is a subset of B" is written A ⊂ B; the set of elements that are both in A and in B (called the intersection of A and B) is written A ∩ B; the set of elements that are either in A, or in B, or in both (called the union of A and B) is written A ∪ B; the set of elements in S, our container set, that are not in A, called the complement of A, is variously denoted by S \ A, Ā, or Ac (we will use the last here); the complement of S is the empty set (the set with no elements), denoted by ∅.
² To develop the theory in a complete and effective way, there are some additional considerations to be made, especially about countable collections of events. This is not anything we have to deal with in our context, and is delegated to your mathematical probability class, if you'll take one.
Note that, when considering several observations, we are considering the intersection of the corresponding events. That is,

{s | X(s) = xj, Y(s) = yk, Z(s) = zl} = {s | X(s) = xj} ∩ {s | Y(s) = yk} ∩ {s | Z(s) = zl}    (1)
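With events represented as sets, this identity can be checked directly; here is a small sketch, again on the hypothetical two-coin space.

    # Two coins: X is the first coin, Y the second (1 for heads, 0 for tails).
    S = {"HH", "HT", "TH", "TT"}

    def X(s):
        return 1 if s[0] == "H" else 0

    def Y(s):
        return 1 if s[1] == "H" else 0

    # {s | X(s) = 1, Y(s) = 1} equals the intersection of the two single events.
    joint = {s for s in S if X(s) == 1 and Y(s) == 1}
    assert joint == {s for s in S if X(s) == 1} & {s for s in S if Y(s) == 1}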
A consequence of these properties is that, since for any event A ∪ Ac = S, P[Ac] = 1 − P[A]. In particular, the complement of S is the empty set, and P[∅] = 0, denoting the impossible event.
Conditional Probabilities and Independence

Since we usually deal with more than one observation, the joint probabilities as in expression (1) above are really important to us. In general, they may be difficult to assess, and knowing each of the individual probabilities is normally not enough.
To address these questions it turns out to be very useful to define how one observation would affect the others. In other words, if, for example, X turns out to be equal to x, how does this new knowledge affect the likelihood that Y will turn out to be equal to y?

Definition. We define the conditional probability of X = x, given that Y = y, as

P[X = x | Y = y] = P[X = x, Y = y] / P[Y = y]    (2)

Note how we need P[Y = y] > 0 for conditional probabilities to be defined (you cannot examine how the occurrence of an event which cannot occur would affect the probabilities of other events).
Typical simple examples are like the following: suppose we throw a die, and assume that each of the six possible outcomes has the same probability (hence, because all six together exhaust the sample space, which has probability 1, each has probability 1/6). Let X be the outcome of the toss. Suppose now that you don't know the value of X, but have been told that the outcome is an even number; let's denote this by Y = 0 (Y = 1 would denote that the outcome is odd). Now the probabilities for X change: for example, the outcome cannot be 1, 3, or 5, so

P[X = 1 | Y = 0] = 0

because {X = 1} ∩ {Y = 0} = ∅. On the other hand, P[Y = 1] = 1/2 (since {Y = 1} is equal to {X = 1} ∪ {X = 3} ∪ {X = 5}, each of which has probability 1/6, and which do not have common points), while {X = 1} ⊂ {Y = 1}, so that P[X = 1, Y = 1] = P[X = 1]. Combining this with (2), we find that

P[X = 1 | Y = 1] = (1/6) / (1/2) = 2/6 = 1/3
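Computations like this one are easy to double-check by enumerating the sample space. The sketch below reproduces P[X = 1 | Y = 1] = 1/3 via definition (2); the equal weights on the faces are, as above, a modeling assumption.

    from fractions import Fraction

    die = [1, 2, 3, 4, 5, 6]  # each face assumed to have probability 1/6

    def P(event):
        # classical probability: favorable faces over total faces
        return Fraction(sum(1 for x in die if event(x)), len(die))

    p_joint = P(lambda x: x == 1 and x % 2 == 1)  # P[X = 1, Y = 1]
    p_odd = P(lambda x: x % 2 == 1)               # P[Y = 1]
    print(p_joint / p_odd)                        # 1/3, as computed above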
There are a couple of consequences of (2) that are useful. The first is the equivalent formula

P[X = x, Y = y] = P[X = x | Y = y] P[Y = y]

(sometimes called the multiplication formula), which shows how we need to know conditional probabilities in order to be able to determine joint probabilities. The second consequence is called Bayes' Formula, and follows from the multiplication formula:

P[Y = y | X = x] = P[X = x | Y = y] P[Y = y] / P[X = x]    (3)
While this formula is the starting point of a different approach to statistical analysis than the one with which we will be mostly concerned, it is also an interesting formalization of so-called inductive arguments. In a separate file (look for the file Conditional.pdf), we'll show how it can be used to reach solid conclusions in some simple, but fun, problems.
We can make one additional observation about the denominator in (3). Suppose Y can take only the values y1, y2, ..., ym, with known probabilities P[Y = yk]. Then, since one (and only one) of these has to occur, we can say that the event {X = x} can be decomposed into its intersections with these m events, which have no point in common. Consequently, we will have that

P[X = x] = P[X = x | Y = y1] P[Y = y1] + P[X = x | Y = y2] P[Y = y2] + ... + P[X = x | Y = ym] P[Y = ym]

a formula that is sometimes called the Total Probability Formula, and often comes in handy when working with Bayes' Formula.
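Here is a short numerical sketch of Bayes' Formula with the Total Probability Formula supplying the denominator; the values of P[Y = yk] and P[X = x | Y = yk] are hypothetical numbers made up for illustration.

    # Hypothetical numbers, purely for illustration.
    P_Y = {"y1": 0.3, "y2": 0.7}            # known probabilities P[Y = yk]
    P_X_given_Y = {"y1": 0.9, "y2": 0.2}    # conditionals P[X = x | Y = yk]

    # Total Probability Formula for the denominator P[X = x].
    P_X = sum(P_X_given_Y[y] * P_Y[y] for y in P_Y)

    # Bayes' Formula (3): P[Y = y1 | X = x].
    posterior = P_X_given_Y["y1"] * P_Y["y1"] / P_X
    print(round(posterior, 4))  # 0.6585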
When performing repeated statistical observations we often try to avoid a situation where one outcome affects the probability of another one. For this to hold, we need

P[X = x | Y = y] = P[X = x]

and looking at (2), we see that this is equivalent to

P[X = x, Y = y] = P[X = x] P[Y = y]

When this is the case (for all x and y), we say that the random variables X and Y are independent. More generally, we say that n random variables X1, X2, ..., Xn are independent if, for all values that they can take, we have

P[X1 = x1, X2 = x2, ..., Xn = xn] = P[X1 = x1] P[X2 = x2] ... P[Xn = xn]

Referring to a previous comment, this is the case when joint probabilities can be computed from the individual probabilities alone. This simplification makes it very tempting to assume independence, even when there is no evidence for it, or, worse, when it positively is not so. Many mistakes in statistics have been caused by faulty independence assumptions.
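The sketch below illustrates both sides of the definition on the 36-point two-dice space (equal weights are again a modeling assumption): the two dice are independent of each other, while a die and its own parity are not.

    from fractions import Fraction
    from itertools import product

    # All 36 equally likely outcomes of two fair dice (a modeling assumption).
    S = list(product(range(1, 7), repeat=2))

    def P(event):
        return Fraction(sum(1 for s in S if event(s)), len(S))

    # The two dice are independent: P[X = 1, Y = 2] = P[X = 1] P[Y = 2].
    assert P(lambda s: s[0] == 1 and s[1] == 2) == \
           P(lambda s: s[0] == 1) * P(lambda s: s[1] == 2)

    # A die and its own parity are not: P[X = 1, X odd] != P[X = 1] P[X odd].
    assert P(lambda s: s[0] == 1 and s[0] % 2 == 1) != \
           P(lambda s: s[0] == 1) * P(lambda s: s[0] % 2 == 1)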
Classical Probability and Other Options

Curiously enough, the first studies in quantitative probability concerned games of chance (coin flips, dice games, card games, roulette, and so on). In most cases, the events in such games can be constructed starting from a finite set of events that we can assume have the same likelihood of happening. Typical examples are flips of a fair coin (no reason to assume that we will end up with more heads than tails or vice versa), and throws of a fair die. The assumption of not having any reason to believe that one outcome is more likely than another is called the principle of sufficient reason, and it provides a tool, when applicable, to calculate probabilities from first principles. It also leads to interesting, and sometimes surprising, results that are a lot of fun.
Note: The applicability of this approach, sometimes called Classical Probability, is limited to the case when we have a finite set of possible outcomes, and it is reasonable to assume that a basic set of these can be assigned equal probabilities. Standard examples are provided by games of chance such as those mentioned. For example, when tossing two dice, whose resulting values we write as X = i, Y = j, with i and j taking integer values from 1 to 6, it is easy to conclude that we can assign equal probabilities to the 36 events of the form {X = i, Y = j}, each having probability 1/36. With a little effort, we can then calculate the probabilities of X + Y taking specific values (for example, P[X + Y = 7] = 1/6, since this result comes from the union of the six events {X = 1, Y = 6}, {X = 2, Y = 5}, {X = 3, Y = 4}, {X = 4, Y = 3}, {X = 5, Y = 2}, {X = 6, Y = 1}).
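When Classical Probability applies, the counting can be delegated to a computer; this sketch enumerates the 36 equally likely pairs and recovers P[X + Y = 7] = 1/6.

    from fractions import Fraction
    from itertools import product

    # The 36 equally likely outcomes {X = i, Y = j}, with i, j = 1, ..., 6.
    outcomes = list(product(range(1, 7), repeat=2))

    favorable = sum(1 for i, j in outcomes if i + j == 7)
    print(Fraction(favorable, len(outcomes)))  # 1/6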
As you can see, if Classical Probability is viable, the problem reduces to a counting problem. Don't let that fact fool you into thinking that it will always be an easy problem: counting things when there are a lot of them can be extremely difficult, if we don't have several hundred or thousand years available.
In general, we will need other tools. Sometimes, using limit theorems similar to those that we will look at shortly can provide us with a priori models for complex systems (we will give a couple of examples). At other times, the same limit theorems justify the use of observed frequencies, when an experiment is repeated many times, resulting in something like an empirical probability.³ This is what we will be concerned with: using statistics to specify or to validate a specific probabilistic model.
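As a taste of this empirical point of view, the simulation below (a sketch; the sample sizes and the seed are arbitrary choices) tosses a simulated fair coin repeatedly and watches the observed frequency of Heads settle near 1/2.

    import random

    random.seed(0)  # fixed seed so the run is reproducible

    for n in (100, 10_000, 1_000_000):
        heads = sum(random.random() < 0.5 for _ in range(n))
        print(n, heads / n)  # observed frequency of Heads, settling near 0.5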
As a side note, we should remark that even this approach is not always available: especially in the social sciences we may face situations where the observation will be a one-off event (no possibility of repeating it many times), and we still would like to assign probabilities. In this case, it has been proposed to consider subjective probabilities: probabilities considered as a measure of personal confidence in the likelihood of specific results. This is an approach strongly connected with Bayesian statistics that has had a revival of interest in recent years, especially since elaborating on it often requires working with complicated probability expressions, and the advent of powerful, inexpensive computing resources has made this approach much more practical. We will not be concerned with this line of research in this class, but be aware that it is out there.
³ As we will see, if we are careful, we can interpret probabilities as is usually done, that is, as a limit value of the frequency of occurrence when we repeat our observations a sufficient number of times: we expect roughly half of a sequence of fair coin tosses to result in Heads, and half in Tails.