Some Topics for 146
1 Descriptive Statistics

1.1 Recapping formulas
For an experiment resulting in $n$ data points, $x_1, x_2, \ldots, x_n$, we may define:

• The mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• The population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

• The population standard deviation: $\sigma = \sqrt{\sigma^2}$

• The sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}\sigma^2$

• The sample standard deviation: $s = \sqrt{s^2}$

• The range: $R = \max\{x_i\} - \min\{x_i\}$

• The midrange: $\frac{\min\{x_i\} + \max\{x_i\}}{2} = \min\{x_i\} + \frac{R}{2} = \max\{x_i\} - \frac{R}{2}$

• The quantiles: $Q_\alpha$ is a value such that a fraction $\alpha$ ($100\alpha\%$) of the data is less than $Q_\alpha$; as special cases we have
  – $\alpha = 0.5$: the median
  – $\alpha = 0.25, 0.75$: the 1st and 3rd quartiles
  – $\alpha = 0.01, 0.02, \ldots, 0.99$: the percentiles
• The average deviation is best defined in terms of the median $m = Q_{0.5}$:
$$d = \frac{1}{n}\sum_{i=1}^{n} |x_i - m|$$
although you will also find it calculated placing $\bar{x}$ instead of $m$ in this formula.
1.2 Formulas for the variance
Let's go over this algebra point again. The numerator in the variance formula is

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (1)$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Expanding the square (recall that $(a-b)^2 = a^2 - 2ab + b^2$), this is equal to

$$\sum_{i=1}^{n} \left( x_i^2 - 2x_i\bar{x} + \bar{x}^2 \right) = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2$$

(since $\bar{x}$ is a fixed number now). We also have that

$$\sum_{i=1}^{n} x_i = n\bar{x}$$

(from the formula for $\bar{x}$), so the expression is equal to

$$\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$

If we use the formula for the population variance (divide the sum (1) by $n$), we find

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \qquad (2)$$

(as already noted previously, around the first test). If we use the formula for the sample variance (divide by $n-1$), we have

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} x_i^2 - \frac{n}{n-1}\bar{x}^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} \qquad (3)$$

As you may notice, using either (2) or (3), depending on our choices, we may compute the variance (and, correspondingly, the standard deviation) by using, as summary data, the sum of the squares of the data and the sum of the data - no need to keep track of each individual datum.
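To make the summary-data point concrete, here is a minimal Python sketch (my addition, not part of the original notes; the data values are chosen arbitrarily) that computes both variance flavors from just the running sums $\sum x_i$ and $\sum x_i^2$:

```python
import math

def variances_from_sums(n, sum_x, sum_x2):
    """Population and sample variance from summary data only,
    using formulas (2) and (3): no individual data points needed."""
    xbar = sum_x / n
    pop_var = sum_x2 / n - xbar ** 2               # formula (2)
    samp_var = (sum_x2 - n * xbar ** 2) / (n - 1)  # formula (3)
    return pop_var, samp_var

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]    # example data, arbitrary
n = len(data)
pop_var, samp_var = variances_from_sums(n, sum(data), sum(x * x for x in data))

print(pop_var, math.sqrt(pop_var))    # population variance 4.0, sigma 2.0
print(samp_var, math.sqrt(samp_var))  # sample variance ~4.571, s ~2.138
```

One caveat on this design: the one-pass formula can lose floating-point precision when the mean is large relative to the spread of the data; it is perfectly fine at the scale of a classroom example.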
1.3 Which measures should we worry about?
We encountered several measures. For example:

• measures of central tendency: mean, median, midrange

• measures of dispersion: variance (in two flavors)/standard deviation, average deviation, interquartile difference, range, 1st and 3rd quartiles, minimum and maximum

While all are used in traditional descriptive statistics (advanced descriptive statistics goes into more sophisticated territory, using elaborate computer-based graphing representation techniques), most are not suited for inferential use. This term refers to the use of mathematical models for the observation process that allow quantitative evaluations of the estimates we make from the data. The main measures used in this latter setting are

• the mean

• the variance/standard deviation

These measures are sufficient when the most common mathematical model for our observations is appropriate. Sometimes this is not the case (even though you may find it applied nonetheless - it is so easy to use that it gets abused more than you would expect). In such cases we may want to refer to percentiles (as many as practical) as a more flexible tool. In the most general case, the full set of observations may be needed to reach a reliable result.
2 Probability Formulas

2.1 The basics
We assume that we are given

• a sample space $S$

• all (or many) subsets of $S$, called events: $A \subseteq S$. We may operate on events using unions ($A \cup B$), intersections ($A \cap B$), and complements ($A^c = S - A$: the set of all elements in $S$ that are not in $A$)

• a probability: to each event $A$ we associate a number $P[A]$, with the following properties:
$$0 \le P[A] \le 1$$
$$P[S] = 1$$
$$P[A^c] = 1 - P[A]$$
$$P[A \cup B] = P[A] + P[B] - P[A \cap B]$$

From these assumptions we can derive many interesting consequences. To do this, we have to add a few additional notions that build on this setup.
2.2 Conditional probabilities
We define the following quantity (the probability of an event $A$, conditioned on an event $B$):

$$P[A\,|\,B] = \frac{P[A \cap B]}{P[B]} \qquad (4)$$

This is interpreted as the adjusted probability of $A$, if we know (or assume) that $B$ has happened (is true). As an example, in the simplest model of the toss of a die, if we take

• $S = \{1, 2, 3, 4, 5, 6\}$

• $P[\{1\}] = P[\{2\}] = P[\{3\}] = \ldots = P[\{6\}] = \frac{1}{6}$

• $A = \{5\}$, $B = \{1, 3, 5\}$ ($A$ is the event "the die landed with face 5 up"; $B$ is the event "the die landed with a face with an odd point up"),

we have

$$P[A] = \frac{1}{6}$$
$$P[B] = P[\{1\}] + P[\{3\}] + P[\{5\}] = \frac{1}{2}$$
$$A \cap B = A$$
$$P[A\,|\,B] = \frac{P[A]}{P[B]} = \frac{1/6}{1/2} = \frac{2}{6} = \frac{1}{3}$$

In this example, if we know that an odd point has appeared, the probability that it may be 5 increases from $\frac{1}{6}$ to $\frac{1}{3}$.

Formula (4) can be rewritten as

$$P[A \cap B] = P[A\,|\,B]\, P[B]$$

Since it also follows that

$$P[B\,|\,A] = \frac{P[A \cap B]}{P[A]}$$

so that

$$P[A \cap B] = P[B\,|\,A]\, P[A]$$

we conclude that

$$P[A\,|\,B]\, P[B] = P[A \cap B] = P[B\,|\,A]\, P[A]$$

One way of using this formula is to observe that it implies

$$P[A\,|\,B] = \frac{P[B\,|\,A]\, P[A]}{P[B]}$$

(Bayes' Formula). This is a very useful tool (as an example, see the relevant problem in the assignment due July 27).
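As a quick sanity check, here is a small Python sketch (my addition, not part of the notes) that computes $P[A|B]$ for the die example directly from definition (4), by enumeration over the uniform sample space:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}           # sample space: faces of a fair die
A = {5}                          # "face 5 up"
B = {1, 3, 5}                    # "odd face up"

def prob(event):
    """Probability of an event under the uniform distribution on S."""
    return Fraction(len(event & S), len(S))

p_A_given_B = prob(A & B) / prob(B)            # definition (4)
p_B_given_A = prob(A & B) / prob(A)
bayes = p_B_given_A * prob(A) / prob(B)        # Bayes' Formula

print(p_A_given_B)   # 1/3
print(bayes)         # 1/3, agreeing with the direct computation
```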
2.3 Independence
If it happens that $P[A\,|\,B] = P[A]$, then $P[A \cap B] = P[A\,|\,B]\,P[B] = P[A]\,P[B]$, which, by the symmetry of this formula, implies that also $P[B\,|\,A] = P[B]$. In this case we say that the two events, $A$ and $B$, are independent. Since, as already noted, in this case

$$P[A \cap B] = P[A]\, P[B]$$

many calculations are greatly simplified when dealing with independent events. However, this is an assumption that has to be examined critically when it is applied to a real situation.
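To illustrate, here is a brief Python check (an added sketch with events of my own choosing, not from the notes) on the toss of two fair dice: the events "the first die is even" and "the sum is 7" satisfy the product rule, so they are independent even though they involve overlapping information:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice, all equally likely.
S = list(product(range(1, 7), repeat=2))

def prob(pred):
    """Probability of the event {outcomes satisfying pred} under uniformity."""
    return Fraction(sum(1 for w in S if pred(w)), len(S))

A = lambda w: w[0] % 2 == 0          # first die shows an even face
B = lambda w: w[0] + w[1] == 7       # the sum of the two faces is 7

p_AB = prob(lambda w: A(w) and B(w))
print(p_AB, prob(A) * prob(B))       # 1/12 and 1/12: A and B are independent
```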
2.4 Random variables
In practice, we deal very rarely with single events. What we mostly deal with are functions defined on the sample space. For example, the die model discussed in section 2.2 can be formulated as follows.

Let $S$ be a space where each point represents the possible outcome of the toss of a die. Here, you can consider any number of features as part of the outcome. We now define a function $X : S \to \{1, 2, 3, 4, 5, 6\}$, where $X = k$ for all outcomes where the die ends up with the face with $k$ points up. This is a more flexible framework, as it allows us, at least in principle, to analyze the experiment "toss of a die" in much greater detail, if we felt it necessary. Now, our event $A$ is defined as $\{X = 5\}$. Similarly, $B = \{X = 1, 3, 5\}$.

The collection of probabilities $P[X = x]$, where $x$ runs over all possible values of $X$, is called the (probability) distribution of $X$. If the possible values of $X$ are too many to be accounted for individually, we resort to keeping track of probabilities of events like $\{a \le X \le b\}$, $\{X > c\}$, $\{X < d\}$, and so on.

The full distribution of a random variable can be a pretty complicated object, as soon as the possible values of the variable are a large number. We then introduce several "summaries" of distributions, in analogy with the "measures" we introduced in descriptive statistics. We will make a strong connection between these two sets of summaries/measures momentarily.
We define

• The mean, or the expected value, of $X$ as $EX = \mu_X = \sum_x x\, P[X = x]$ (this is the weighted average of all values $x$, with weight equal to the probability of each value)

• The $k$th moment of $X$ as $EX^k = \sum_x x^k\, P[X = x]$

• The $k$th centered moment of $X$ as $m_k = E(X - EX)^k = \sum_x (x - EX)^k\, P[X = x]$

• In particular, for $k = 2$, we define the variance of $X$ as $m_2 = \sigma_X^2 = \sum_x (x - EX)^2\, P[X = x]$

We can also define quantiles $Q_\alpha$: they are numbers such that $P[X \le Q_\alpha] = \alpha$.
Moments are far from exhaustive information on $X$ (at least when we know only a few of them), but they are usually much easier to handle than the full distribution.
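For instance, here is a short Python sketch (added for illustration, not part of the notes) that computes the mean, the variance, and a higher centered moment for the fair-die distribution directly from the defining sums:

```python
from fractions import Fraction

# Distribution of X for a fair die: P[X = x] = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

def moment(dist, k):
    """kth moment: E[X^k] = sum over x of x^k * P[X = x]."""
    return sum(x ** k * p for x, p in dist.items())

def centered_moment(dist, k):
    """kth centered moment: E[(X - EX)^k]."""
    mu = moment(dist, 1)
    return sum((x - mu) ** k * p for x, p in dist.items())

print(moment(dist, 1))           # mean: 7/2
print(centered_moment(dist, 2))  # variance: 35/12
print(centered_moment(dist, 3))  # third centered moment: 0, by symmetry
```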
2.5 Conditional expectation
If we have two random variables, $X$ and $Y$, with their respective distributions, it may be useful to consider their connections, and this is done through the introduction of conditional distributions, and, consequently, of conditional moments (especially, conditional means and conditional variances). In fact, we can easily define

$$P[X = x\,|\,Y = y] = \frac{P[\{X = x\} \cap \{Y = y\}]}{P[\{Y = y\}]}$$

Once we have this, we can also define, for example,

$$E[X\,|\,Y = y] = \sum_x x\, P[X = x\,|\,Y = y]$$

(note how this turns out to be a function of $y$). These tools turn out to be extremely versatile in exploring probability models, but we will avoid delving too deeply in this direction.
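As one concrete illustration (an added sketch; the choice of $X$ as the first face and $Y$ as the sum of two fair dice is mine, not the notes'), the conditional mean $E[X\,|\,Y = y]$ can be computed by enumeration:

```python
from fractions import Fraction
from itertools import product

# Outcomes of two fair dice, all 36 equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def cond_mean_X_given_Y(y):
    """E[X | Y = y] where X = first face and Y = sum of the two faces."""
    matching = [w for w in outcomes if w[0] + w[1] == y]
    # The conditional distribution is uniform over the matching outcomes here.
    return Fraction(sum(w[0] for w in matching), len(matching))

for y in (2, 7, 12):
    print(y, cond_mean_X_given_Y(y))   # 2 -> 1, 7 -> 7/2, 12 -> 6
```

Note how the output is indeed a function of $y$; by symmetry of the two dice, $E[X\,|\,Y = y] = y/2$ in this example.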
2.6 Independence of random variables
One item that we want to explore in more detail is independence, when it comes to random variables. Since a random variable is characterized by the totality of its events $\{X = x\}$, rather than by only one, it is natural to define:

Two random variables $X$ and $Y$ are said to be independent if

$$P[X = x\,|\,Y = y] = P[X = x]$$

for all $x$ and all $y$.

It turns out that this is the useful notion of independence, rather than the notion of independence of single events. The assignment due July 27 has a problem where this is illustrated clearly: studying the toss of two dice, by a freaky coincidence, we find that two events that are clearly connected happen to be independent. However, if instead of looking at these two events in isolation, we consider the two random variables that define them, these two (again, clearly connected) random variables are very far from independent.
3 Connecting with statistics
A first connection between our basic problem (dealing with observations) and probability is the following observation.

Suppose we observe some quantity repeatedly (e.g., we make several measurements of a physical quantity). Say we observe the values $x_1, x_2, \ldots, x_n$. We can construct a probability model for this experiment that says nothing more than the data itself, but may suggest a way to connect to a more general model. We assume there is some sample space $S$, and a random variable $X$ on $S$, which takes values $x_1, x_2, \ldots, x_n$. Since we have no reason to consider any of these values more significant than any other, we assume that the distribution of $X$ is given by

$$P[X = x_k] = \frac{1}{n}$$

for all $k = 1, 2, \ldots, n$. This distribution is called an empirical distribution. We now notice that, for example,

• the mean of the data $\bar{x} = EX$

• the population variance of the data is equal to $\sigma_X^2$

• the quantiles $Q_\alpha$ of the data are the quantiles of the distribution of $X$

In other words, each experiment can be seen as a probability model. The delicate issue is that, in case we should repeat the experiment a second time, the model will most likely be different.
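The following Python sketch (an added illustration with arbitrary example data) makes this concrete: the expected value of a random variable with the empirical distribution is exactly the sample mean, and its variance is exactly the population variance of the data:

```python
from fractions import Fraction

data = [3, 7, 7, 9, 14]   # example observations, chosen arbitrarily
n = len(data)

# Empirical distribution: each observed value gets probability 1/n
# (repeated values accumulate probability mass).
dist = {}
for x in data:
    dist[x] = dist.get(x, Fraction(0)) + Fraction(1, n)

EX = sum(x * p for x, p in dist.items())
varX = sum((x - EX) ** 2 * p for x, p in dist.items())

xbar = Fraction(sum(data), n)
pop_var = sum((x - xbar) ** 2 for x in data) / n   # divide by n, not n - 1

print(EX == xbar, varX == pop_var)   # True True
```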
The deeper way to connect these two topics is now at hand.
3.1 Inferential statistics
We look at a statistical experiment as follows. We assume there is some sample space $S$, and some random variable $X$ modeling the quantity we are observing. $X$ has a distribution, and we want to get information on this distribution. To this end, we make a number of repeated observations of $X$. Hopefully, these repeated experiments are all identical and independent (if not, there are ways to handle the situation, but the calculations become much more complex). This is the same as saying that we have $n$ independent random variables $X_1, X_2, X_3, \ldots, X_n$, with their distributions, which are all the same (such a collection is called an i.i.d. collection, for "independent, identically distributed"). We actually observe $n$ values, $x_1, \ldots, x_n$. If we knew the distribution of $X$, and hence of the $n$ i.i.d. variables $X_i$, we could calculate the probability $P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n]$, but the problem is that this distribution is precisely what we are trying to discover!

To solve this paradox, we take a "reasonable" attitude: we decide to rely on the fact that events that have small probability are not likely to occur - if something happened, it should have a reasonably large probability. Of course, such a statement is not terribly strong: except for impossible events, anything can happen. However, most of the time, we can assume that what happens has a reasonably large probability of happening, and that small probability events are very rare. If so, we will be wrong only rarely...

All the above may be reasonable, but it is only a declaration of principle. We need to translate this into a practical strategy. This is what we will start doing in the next chapter. In the meantime, we quote two impressively general results, which will be of great help in building our practical strategy. Their proofs vary from pretty easy to somewhat technical, but we will skip both. If you are curious, a separate file will provide some info.
3.2 Limit theorems
The basic strategy in statistics is that more observations mean better results. To this end, it is useful to know two mathematical theorems that tackle the issue of what happens when you have very many observations.
3.2.1 The Law of Large Numbers (LLN)
This is not to be confused with the empirical 'law' of large numbers that we discussed in class. The latter is an empirical observation that seems to apply to some well organized circumstances. The law we are quoting here is an abstract mathematical theorem. In a slightly less general form than technically possible, it states:

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of i.i.d. random variables, with mean $\mu$. Then, for any $\delta > 0$ and $\varepsilon > 0$,

$$P\left[ \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| > \delta \right] < \varepsilon$$

provided we choose $n$ large enough.

What this means, practically, is that if we make a very large number of independent, identically distributed observations, the arithmetic mean of our results is very close to the theoretical expected value of the random variable we are studying. In other words, our $\bar{x}$ is going to be close to the theoretical value $EX = \mu$, if we make enough observations.

This is a big deal, with the only downside that there is no practical indication of how large $n$ must be to be sure (or, at least, confident) that $\bar{x}$ is close enough to $\mu$.
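A quick simulation (an added sketch, not part of the notes) shows the LLN at work for tosses of a fair die, where $\mu = 3.5$: the sample mean settles near $\mu$ as $n$ grows:

```python
import random

random.seed(0)                      # make the run reproducible
mu = 3.5                            # theoretical mean of a fair die

for n in (10, 100, 10_000, 1_000_000):
    tosses = [random.randint(1, 6) for _ in range(n)]
    xbar = sum(tosses) / n
    print(f"n = {n:>9}: mean = {xbar:.4f}, |mean - mu| = {abs(xbar - mu):.4f}")
```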
To help somewhat, here is the next major theorem.
3.2.2 The Central Limit Theorem

("Central" here refers to the fact that this theorem is central in statistical applications, as we will see - there is no other "center" that it is referring to.)
Suppose we have, again, a sequence of i.i.d. random variables, with mean $\mu$ and variance $\sigma^2$. Then

$$P\left[ a \le \frac{1}{\sqrt{n\sigma^2}} \sum_{i=1}^{n} (X_i - \mu) \le b \right] \approx \Phi(b) - \Phi(a) \qquad (5)$$

where $\Phi(x)$ is a function that returns the area under the curve $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ (this is the famous "bell curve", or Gaussian curve), up to horizontal coordinate $x$. This is not a function that is easily computed, hence it is pre-programmed in any statistical software package (as well as in any spreadsheet), and is also tabulated in any statistics or probability book.

The "$\approx$" is meant in the sense that the difference between the two sides of (5) becomes smaller and smaller as $n$ grows.
With a little reflection, we may notice that this result does help in evaluating how far we have to go so that $\bar{x}$ is close to $\mu$. What is frustrating is that we are just shifting the problem by one notch, as there is no help in learning how far we have to go so that the two sides of (5) are as close as we would like. There are further results to help even in this, but we'll leave the topic for our future probability class (if any).
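To see (5) numerically, here is an added Python sketch (not from the notes; the choice of die tosses, $n$, and the interval $[a, b]$ is mine) that computes $\Phi$ via the error function and compares the two sides of (5) for sums of fair dice, where $\mu = 3.5$ and $\sigma^2 = 35/12$:

```python
import math
import random

def Phi(x):
    """Standard normal CDF: area under the bell curve up to x."""
    return (1 + math.erf(x / math.sqrt(2))) / 2

random.seed(1)
mu, var = 3.5, 35 / 12          # mean and variance of one fair-die toss
n, trials = 100, 20_000
a, b = -1.0, 1.0

hits = 0
for _ in range(trials):
    s = sum(random.randint(1, 6) for _ in range(n))
    z = (s - n * mu) / math.sqrt(n * var)   # the standardized sum in (5)
    if a <= z <= b:
        hits += 1

print(hits / trials)        # empirical left side of (5), roughly 0.68
print(Phi(b) - Phi(a))      # right side: about 0.6827
```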
3.2.3 Wrapping Up
It would be interesting to dig a little around these two theorems and learn more about what they really mean, but, for the time being, let us limit ourselves to these two vague statements, that we will be relying on a lot.

1. The LLN states, intuitively, that $\bar{x} \approx \mu$, at least if we have enough data.

2. The CLT can be rewritten (with just a little algebra) as
$$P\left[ a \le \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}} \le b \right] \approx \Phi(b) - \Phi(a)$$
A random variable such that the relation above holds exactly is called Gaussian, or Normal. Hence, the CLT is saying that (assuming we knew $\mu$ and $\sigma$), (essentially) regardless of the distribution of $X$, the modified mean, calculated as
$$\frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}}$$
will be (at least approximately) a Gaussian random variable.
Remark/Exercise: See if you can figure this out: given $n$ i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, the mean

$$\frac{1}{n}\sum_{i=1}^{n} X_i$$

has mean $\mu$, and variance $\frac{\sigma^2}{n}$. Now, look at the previous version of the CLT, and notice the connection.
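As a numerical nudge toward the exercise (an added sketch, not part of the notes; it checks the claim by simulation rather than proving it), the following code estimates the mean and variance of the sample mean of $n$ die tosses, where $\mu = 3.5$ and $\sigma^2/n = (35/12)/n$:

```python
import random

random.seed(2)
n, trials = 50, 50_000
mu, var = 3.5, 35 / 12           # mean and variance of one fair-die toss

# Collect many realizations of the sample mean of n die tosses.
means = [sum(random.randint(1, 6) for _ in range(n)) / n
         for _ in range(trials)]

m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials

print(m)           # close to mu = 3.5
print(v, var / n)  # both close to sigma^2 / n, about 0.0583
```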