STATC141 Spring 2006
Lecture 3, 01/26/2006
Introduction to Probability and Statistics (III-a)
This lecture extends the ideas of mean, standard deviation, and normal approximation to distributions
more general than the binomial. This involves sums and averages of randomly produced numbers. Random
variables, introduced below, provide a good notation for this purpose.
Random variables
The number of heads in four tosses of a coin could be any one of the possible values 0, 1, 2, 3, 4. The
term “random variable” is introduced for something like the number of heads, which might be one of several
possible values, with a distribution of probabilities over this set of values. Typically, capital letters X, Y, Z, etc.,
are used to denote random variables. For example, X might stand for “the number obtained by rolling a die”, Y
for “the number of heads in four coin tosses”, and Z for “the suit of a card dealt from a well-shuffled deck”.
This is not really a new idea, but rather a compact notation for the familiar idea of a value picked at
random according to a probability distribution.
The “range” of a random variable X is the set of all possible values that X might produce.
If X has finitely many possible values, or countably infinitely many (i.e., its possible values can be
labeled by the integers), the distribution of X is determined by the probabilities of the individual values,
$P(X = x)$ for x in the range of X, and $\sum_x P(X = x) = 1$, where x runs through the set of all
possible values of the random variable X. Here it is assumed that the random variable X has a uniquely
specified value, no matter what happens. So the events $(X = x)$, as x varies over the range of X, are mutually
exclusive, and their probabilities must add up to 1. Such an X is called a “discrete random variable”,
and has a discrete probability distribution. For example, X could be the number of calls you expect to receive
between 2:00pm and 4:00pm today, or the number of heads in four tosses of a coin.
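To make this concrete, here is a minimal Python sketch (not part of the original lecture): the number of heads in four fair coin tosses has the binomial probabilities $\binom{4}{x}(1/2)^4$, and they sum to 1 over the range of X.

```python
from math import comb

# pmf of X = number of heads in four tosses of a fair coin:
# P(X = x) = C(4, x) * (1/2)^4 for x in {0, 1, 2, 3, 4}
pmf = {x: comb(4, x) * 0.5**4 for x in range(5)}

print(pmf)                # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print(sum(pmf.values()))  # 1.0 -- the probabilities add up to 1
```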
The “expectation” of a discrete random variable X is defined by
$E(X) = \sum_x x\,P(X = x).$
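Continuing the coin-toss sketch (again illustrative, not from the lecture), the expectation is the probability-weighted sum of the possible values:

```python
from math import comb

# pmf of X = number of heads in four fair coin tosses
pmf = {x: comb(4, x) * 0.5**4 for x in range(5)}

# E(X) = sum over x of x * P(X = x)
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 2.0 -- two heads expected in four tosses
```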
The range of a continuous random variable X is an interval of real numbers. We have already seen
how to use the normal curve to approximate the histogram of 19 students’ midterm scores. The idea there was
to approximate a discrete distribution of many small individual probabilities by scaling the histogram to make
it follow a continuous curve. The function defining such a curve is called a “probability density”, denoted f(x).
This function determines probabilities over an infinite continuous range of possible values. A random variable
X has density f(x) if, for all $a \le b$,
$P(a \le X \le b) = \int_a^b f(x)\,dx.$
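As a quick numerical check (a sketch, taking the standard normal density as the example f), the probability of an interval is the integral of the density over it; scipy's quad does the integration:

```python
from scipy.integrate import quad
from scipy.stats import norm

# P(a <= X <= b) = integral from a to b of f(x) dx,
# with f taken here to be the standard normal density
a, b = -1.0, 1.0
prob, _ = quad(norm.pdf, a, b)
print(prob)                       # ~0.6827
print(norm.cdf(b) - norm.cdf(a))  # the same probability via the cdf
```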
The “expectation” of a continuous random variable X is defined by
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$
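The same quadrature idea gives the continuous expectation numerically; in this sketch the density is taken to be the rate-1 exponential, whose mean is known to be 1:

```python
from scipy.integrate import quad
from scipy.stats import expon

# E(X) = integral of x * f(x) dx over the range of X;
# the rate-1 exponential has range [0, infinity) and mean 1
mean, _ = quad(lambda x: x * expon.pdf(x), 0, float("inf"))
print(mean)  # ~1.0
```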
If you try to predict the value of a random variable X by its mean E(X), you will be off by the random
amount X-E(X). It is often important to have an idea of how large this deviation is likely to be. Because
$E(X - E(X)) = E(X) - E(X) = 0$, it is necessary to consider either the absolute value or the square of
$X - E(X)$ to get an idea of the size of the deviation without regard to sign. Because the algebra is easier with
squares than with absolute values, it is natural to first consider $E[(X - E(X))^2]$, then take a square root
to get back to the same scale of units as X.
The “variance” of X, denoted Var(X), is the mean squared deviation of X from its expected value:
$\mathrm{Var}(X) = E[(X - E(X))^2].$
The “standard deviation” of X, denoted SD(X), is the square root of the variance of X:
$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}.$
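For the coin-toss X used earlier, both quantities can be computed directly from the pmf (a minimal sketch; the exact answers are Var(X) = 1 and SD(X) = 1):

```python
from math import comb, sqrt

pmf = {x: comb(4, x) * 0.5**4 for x in range(5)}
mean = sum(x * p for x, p in pmf.items())               # E(X) = 2
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E(X))^2]
print(var, sqrt(var))  # 1.0 1.0 -- Var(X) and SD(X)
```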
If SD(X) > 0, the random variable $X^* = (X - E(X))/\mathrm{SD}(X)$, called X in “standard units”, has
$E(X^*) = 0$ and $\mathrm{SD}(X^*) = 1$.
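A simulation sketch (sample size chosen arbitrarily) confirms that standardizing recenters the draws at 0 with spread 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(n=4, p=0.5, size=100_000)  # simulated draws of X

# X* = (X - E(X)) / SD(X), using the exact values E(X) = 2, SD(X) = 1
x_star = (x - 2.0) / 1.0
print(x_star.mean(), x_star.std())  # approximately 0 and 1
```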
Addition rule for expectation and variance
Let $X_1, X_2, \ldots, X_n$ be random variables. Then
$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n).$
If $X_1, X_2, \ldots, X_n$ are independent, then
$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n).$
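Both rules are easy to check by simulation; this sketch uses two independent die rolls (the distribution and sample size are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.integers(1, 7, size=200_000)  # first die
x2 = rng.integers(1, 7, size=200_000)  # second die, independent of the first
s = x1 + x2

print(s.mean(), x1.mean() + x2.mean())  # both near E(X1) + E(X2) = 7
print(s.var(), x1.var() + x2.var())     # both near Var(X1) + Var(X2) = 35/6
```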
Central limit theorem
Let $S_n = X_1 + \cdots + X_n$ be the sum of n independent random variables, each with the same distribution
(i.e., i.i.d.). For large n, the distribution of $S_n$ is approximately normal, with mean $E(S_n) = n\mu$ and
standard deviation $\mathrm{SD}(S_n) = \sigma\sqrt{n}$, where $\mu = E(X_i)$ and $\sigma = \mathrm{SD}(X_i)$.
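To see the theorem in action, here is a final simulation sketch (n, the number of replications, and the uniform(0, 1) choice of distribution are all illustrative): the simulated mean and SD of $S_n$ match $n\mu$ and $\sigma\sqrt{n}$, and a histogram of the sums would look close to the normal curve with those parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 50_000

# S_n = X_1 + ... + X_n for i.i.d. uniform(0, 1) draws,
# so mu = 1/2 and sigma = sqrt(1/12)
s = rng.random((reps, n)).sum(axis=1)

print(s.mean(), n * 0.5)         # ~50 = n * mu
print(s.std(), np.sqrt(n / 12))  # ~2.89 = sigma * sqrt(n)
```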