Introduction to Probability (II)
This lecture extends the ideas of mean and standard deviation to sums and averages of randomly produced numbers. Random variables, introduced below, provide a convenient notation for this purpose.
Random variables
The number of heads in four tosses of a coin could be any one of the possible values 0, 1, 2, 3, 4. The
term “random variable” is introduced for something like the number of heads, which might be one of several
possible values, with a distribution of probabilities over this set of values. Typically, capital letters X, Y, Z, etc.,
are used to denote random variables. For example, X might stand for “the number obtained by rolling a die”, Y
for “the number of heads in four coin tosses”, and Z for “the suit of a card dealt from a well-shuffled deck”.
This is not really a new idea, but rather a compact notation for the familiar idea of something or other picked at random according to a probability distribution.
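For instance, a short Python sketch (the simulation choices below are purely illustrative) can realize the three examples above by producing one value from each variable’s range:

# Illustrative sketch of the three example random variables:
# X = a die roll, Y = heads in four coin tosses, Z = the suit of a dealt card.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

X = random.randint(1, 6)                                      # die roll: 1..6
Y = sum(random.randint(0, 1) for _ in range(4))               # heads in 4 tosses
Z = random.choice(["clubs", "diamonds", "hearts", "spades"])  # suit of a card

print(X, Y, Z)  # one realization of each random variable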
The “range” of a random variable X is the set of all possible values that X might take.
If X has either finitely many possible values or countably infinitely many possible values (i.e., its possible values can be labeled by the integers), the distribution of X is determined by the probabilities of the individual values, P(X = x) for x in the range of X, and
\[ \sum_{x} P(X = x) = 1 \]
as x runs through the set of all possible values of the random variable X. Here it is assumed that the random variable X has a uniquely specified value, no matter what happens. So the events (X = x), as x varies over the range of X, are mutually exclusive, and their probabilities must add up to 1. This kind of X is called a “discrete random variable”, and has a discrete probability distribution. Here are some examples of such X’s: X can be the number of calls you expect to receive from 2:00pm to 4:00pm today, or the number of heads in four tosses of a coin.
The “expectation” of a discrete random variable X is defined by
\[ E(X) = \sum_{x} x\, P(X = x). \]
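As an illustration, here is a minimal Python sketch (the fair-coin model and variable names are illustrative choices) that tabulates the distribution of X = the number of heads in four tosses of a fair coin, checks that the probabilities add up to 1, and computes E(X) from the definition above.

# Illustrative sketch: X = number of heads in four tosses of a fair coin.
from math import comb

# P(X = x) = C(4, x) * (1/2)^4 for x = 0, 1, 2, 3, 4
pmf = {x: comb(4, x) * 0.5**4 for x in range(5)}

# The probabilities over the range of X add up to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# E(X) = sum over x of x * P(X = x)
expectation = sum(x * p for x, p in pmf.items())
print(pmf)          # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print(expectation)  # 2.0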
The range of a continuous random variable X is an interval of real numbers. We once used the
normal curve to approximate the histogram of 19 students’ midterm scores. The idea there was to approximate
a discrete distribution of many small individual probabilities by scaling the histogram to make it follow a
continuous curve. The function defining such a curve is called a “probability density”, denoted f(x). This
function determines probabilities over an infinite continuous range of possible values. A random variable X
has density f(x) if, for all a ≤ b,
\[ P(a \le X \le b) = \int_a^b f(x)\, dx. \]
The “expectation” of a continuous random variable X with density f(x) is defined by
\[ E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx. \]
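As a numerical sketch of these two formulas, take the density to be the standard normal curve (an illustrative choice); the integrals can then be approximated with scipy.

# Illustrative sketch: probabilities and expectation for a continuous random
# variable with the standard normal density (chosen only as an example).
import math
from scipy.integrate import quad

def f(x):
    """Standard normal probability density."""
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# P(a <= X <= b) is the area under the density between a and b
a, b = -1.0, 1.0
prob, _ = quad(f, a, b)
print(prob)   # about 0.683: the area within one SD of the mean

# E(X) is the integral of x * f(x) over the whole real line
mean, _ = quad(lambda x: x * f(x), -math.inf, math.inf)
print(mean)   # about 0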
If you try to predict the value of a random variable X by its mean E(X), you will be off by the random amount X - E(X). It is often important to have an idea of how large this deviation is likely to be. Because E(X - E(X)) = E(X) - E(X) = 0, it is necessary to consider either the absolute value or the square of X - E(X) to get an idea of the size of the deviation without regard to sign. Because the algebra is easier with squares than with absolute values, it is natural to first consider E[(X - E(X))^2], then take a square root to get back to the same scale of units as X.
The “variance” of X, denoted Var(X), is the mean squared deviation of X from its expected value:
\[ \mathrm{Var}(X) = E[(X - E(X))^2]. \]
The “standard deviation” of X, denoted SD(X), is the square root of the variance of X:
\[ \mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}. \]
If SD(X) > 0, the random variable X* = (X - E(X))/SD(X), called X in “standard units”, has E(X*) = 0 and SD(X*) = 1.
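Continuing the four-coin-toss example (recomputed here so the sketch stands alone), the variance, standard deviation, and standard units can be computed directly from these definitions.

# Illustrative sketch: Var(X), SD(X), and standard units for X = number of
# heads in four tosses of a fair coin.
from math import comb, sqrt

pmf = {x: comb(4, x) * 0.5**4 for x in range(5)}
EX = sum(x * p for x, p in pmf.items())

# Var(X) = E[(X - E(X))^2], the mean squared deviation from the expected value
var = sum((x - EX)**2 * p for x, p in pmf.items())
sd = sqrt(var)
print(EX, var, sd)   # 2.0 1.0 1.0

# X in standard units: X* = (X - E(X)) / SD(X) has mean 0 and SD 1
xstar = {x: (x - EX) / sd for x in pmf}
EXstar = sum(xstar[x] * p for x, p in pmf.items())
VarXstar = sum((xstar[x] - EXstar)**2 * p for x, p in pmf.items())
print(EXstar, VarXstar)   # 0.0 1.0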
Addition rule for expectation and variance
Let X1, X2, ..., Xn be random variables.
\[ E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n). \]
If X1, X2, ..., Xn are independent,
\[ \mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \]
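A quick simulation can check both rules; here the Xi are taken to be independent rolls of a fair die (an illustrative choice), so each has E(Xi) = 3.5 and Var(Xi) = 35/12.

# Illustrative sketch: addition rules for the sum of 5 independent die rolls.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000

rolls = rng.integers(1, 7, size=(trials, n))  # trials x n independent rolls
sums = rolls.sum(axis=1)

print(sums.mean())  # close to 5 * 3.5 = 17.5
print(sums.var())   # close to 5 * 35/12, about 14.58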
Central limit theorem
Let Sn = X1 + ... + Xn be the sum of n independent random variables, each with the same distribution (i.e., i.i.d.). For large n, the distribution of Sn is approximately normal, with mean E(Sn) = nμ and standard deviation SD(Sn) = σ√n, where μ = E(Xi) and σ = SD(Xi).
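A simulation sketch, taking each Xi to be a draw from a 0-1 box (so μ = 0.5 and σ = 0.5, an illustrative choice), shows how closely the sum tracks these formulas.

# Illustrative sketch: sums of 400 draws from a 0-1 box, repeated many times.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 400, 100_000
mu, sigma = 0.5, 0.5

draws = rng.integers(0, 2, size=(trials, n))  # each draw is 0 or 1
S = draws.sum(axis=1)                         # one value of Sn per trial

print(S.mean(), n * mu)                 # empirical mean vs. n * mu = 200
print(S.std(), sigma * np.sqrt(n))      # empirical SD vs. sigma * sqrt(n) = 10

# Normal approximation: the fraction of sums within one SD of n * mu
within = np.mean(np.abs(S - n * mu) <= sigma * np.sqrt(n))
print(within)   # close to the normal curve's 68% (slightly higher, since Sn is discrete)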
The law of averages
John Kerrich, a South African mathematician, found that when you toss a coin many times, the number of heads will be around half the number of tosses, but it will be off by some amount, the chance error. As the number of tosses goes up, the chance error gets bigger in absolute terms; compared to the number of tosses, it gets smaller.
The above process can also be described by “the sum of draws”. The sum of draws from a box is the key ingredient for many statistical procedures. For example, take a box containing two tickets, one marked 0 and the other marked 1. Imagine drawing n times at random with replacement from this box (the phrase “with replacement” reminds you to put the ticket back in the box before drawing again), and make a note of the number on each draw. If you add up all the numbers, the sum should be around n/2, but it will be off by some chance error, say e. When n gets bigger, e tends to get bigger, but e/n gets smaller. In fact, as n goes to infinity, the size of e typically grows without bound, but e/n decreases to 0.
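A simulation sketch of this box model (the particular values of n are illustrative) shows the chance error e growing in absolute size while e/n shrinks.

# Illustrative sketch: chance error e = sum - n/2 for draws from a 0-1 box.
import numpy as np

rng = np.random.default_rng(2)
draws = rng.integers(0, 2, size=1_000_000)   # one long sequence of 0-1 draws
running_sum = np.cumsum(draws)

for n in (100, 10_000, 1_000_000):
    e = running_sum[n - 1] - n / 2           # chance error after n draws
    print(n, e, e / n)                       # |e| typically grows; e/n shrinks toward 0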