Lecture 3, STATC141
Jan. 26, Spring 2006
Introduction to Probability and Statistics (III-b)
I. Uniform Distribution. A uniform distribution is one for which the probability of occurrence is the
same for all values of X. It is sometimes called a rectangular distribution. For example, if a fair die is
thrown, the probability of obtaining any one of the six possible outcomes is 1/6. Since all outcomes are
equally probable, the distribution is uniform. If a uniform distribution is divided into equally spaced
intervals, there will be an equal number of members of the population in each interval.
The probability density function P(x) and cumulative distribution function D(x) for a continuous uniform
distribution on the interval [a, b] are

P(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}

D(x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}
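As an illustration, here is a minimal Python sketch of these two functions (the helper names uniform_pdf and uniform_cdf are ours, chosen for this example):

```python
def uniform_pdf(x, a, b):
    """Density P(x) of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """D(x) = P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# On [0, 6] the density is constant (1/6) and the CDF rises linearly.
print(uniform_pdf(2.5, 0, 6))   # 0.1666...
print(uniform_cdf(3.0, 0, 6))   # 0.5
```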
II. Normal Distribution. A normal distribution in a variate X with mean μ and variance σ² has
probability density function P(x) on the domain x ∈ (−∞, ∞). While statisticians and mathematicians
uniformly use the term “normal distribution” for this distribution, physicists sometimes call it a Gaussian
distribution and, because of its curved flaring shape, social scientists refer to it as the “bell curve.” The
probability density function (pdf) P(x) is

P(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}
The so-called “standard normal distribution” is given by taking μ = 0 and σ = 1 in a general normal
distribution. An arbitrary normal distribution can be converted to a standard normal distribution by
changing variables to Y = (X − μ)/σ. The cumulative distribution function D(x), which gives the
probability that a variate will assume a value ≤ x, is then the integral of the normal distribution,

D(x) = \int_{-\infty}^{x} P(t)\,dt = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-\mu)^2 / (2\sigma^2)} \, dt
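This integral has no closed form; in practice it is evaluated through the error function, using the identity D(x) = ½[1 + erf((x − μ)/(σ√2))]. A minimal Python sketch (the helper name normal_cdf is ours):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """D(x) = P(X <= x) for a normal(mu, sigma^2) variate,
    computed via the error function."""
    z = (x - mu) / sigma                      # standardize: Y = (X - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(normal_cdf(0.0))                        # 0.5 (half the mass lies below the mean)
print(normal_cdf(1.96))                       # ~0.975
print(normal_cdf(110, mu=100, sigma=15))      # ~0.7475
```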
III. Poisson Distribution. The Poisson distribution is most commonly used to model the number of
random occurrences of some phenomenon in a specified unit of space or time. For example,
• The number of phone calls received by a telephone operator in a 10-minute period.
• The number of flaws in a bolt of fabric.
• The number of typos per page made by a secretary.
For a Poisson random variable, the probability that X is some value x is given by the formula
P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, \ldots

where λ is the average number of occurrences in the specified interval. For the Poisson distribution,

E(X) = \lambda, \qquad \mathrm{Var}(X) = \lambda
EXAMPLE: The number of false fire alarms in a suburb of Houston averages 2.1 per day. Assuming that
a Poisson distribution is appropriate (λ = 2.1), the probability that 4 false alarms will occur on a given
day is given by

P(X = 4) = \frac{2.1^4 e^{-2.1}}{4!} \approx 0.0992
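A quick check of this arithmetic in Python (the helper name poisson_pmf is ours):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

# False-alarm example: lambda = 2.1 per day, probability of exactly 4 alarms.
print(poisson_pmf(4, 2.1))   # ~0.0992
```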
The Poisson distribution is a limiting case of the binomial distribution.
Assume that there are n independent trials and that the success probability of each trial is p. The number of
successful trials X out of the n trials then follows a binomial distribution, and

P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!} \, p^x (1-p)^{n-x}.

If n \to \infty, p \to 0, and np \to \lambda with \lambda > 0, then

\lim_{n \to \infty} P(X = x) = \lim_{n \to \infty} \frac{n!}{x!\,(n-x)!} \, p^x (1-p)^{n-x}
= \lim_{n \to \infty} \frac{1}{x!} \, (np)\,((n-1)p) \cdots ((n-x+1)p) \left(1 - \frac{\lambda}{n}\right)^{n-x}
= \frac{\lambda^x e^{-\lambda}}{x!}.    (1)
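This limit can be checked numerically: hold np = λ fixed while n grows, and the binomial probabilities approach the Poisson value. A minimal sketch (the helper names binom_pmf and poisson_pmf are ours):

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam, x = 2.1, 4
for n in (10, 100, 1000, 10000):
    p = lam / n                       # keep np = lambda fixed as n grows
    print(n, binom_pmf(x, n, p))      # approaches the Poisson value below
print("Poisson limit:", poisson_pmf(x, lam))
```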
Test of Significance
Was an observed difference due to chance, or to something else? Statisticians have invented “tests of
significance” to deal with this sort of question. Nowadays, it is almost impossible to read a research article
without running across tests and significance levels.
For example, assume that we observed 9 heads in 10 tosses of a coin. Since 9 is very different
from what we would expect (5 heads), we would like to ask “is the coin a fair coin, or
not?” There are then two statements about the coin, the null hypothesis and the alternative hypothesis.
Each hypothesis represents one side of the argument:
• Null hypothesis: the coin tossed is a fair coin.
• Alternative hypothesis: the coin tossed is NOT a fair coin.
The null hypothesis expresses the idea that “the outcome of 9 heads in 10 tosses” is due to
chance; in other words, the difference is due to chance. The alternative hypothesis says that “the outcome
of 9 heads in 10 tosses” is due to the unfairness of the coin; in other words, the difference is real.
A test statistic is used to measure the difference between the data and what is expected under
the null hypothesis. The P-value of a test is the chance of getting a test statistic at least as large as
the one observed, assuming the null hypothesis is right. The P-value is not the chance of the null
hypothesis being right. If the P-value is small, say less than 0.05, we reject the null hypothesis.
For the above example, we can define the test statistic as |X − 5|, where X is the number of
heads we observe in 10 tosses of a coin. Then the observed test statistic is 4, and the P-value is
the probability of the event |X − 5| ≥ 4 when the coin is fair:

P(|X - 5| \ge 4) = P(X = 0) + P(X = 1) + P(X = 9) + P(X = 10)
= (1/2)^{10} + 10\,(1/2)^{10} + 10\,(1/2)^{10} + (1/2)^{10}
= 22/1024 \approx 0.02

This means that if the coin is fair, there is only about a 2% probability of observing |X − 5| ≥ 4. So
we would prefer to believe that the coin is NOT fair.
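A short Python check of this P-value, summing the binomial probabilities of all outcomes at least as extreme as the one observed (the helper name binom_pmf is ours):

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# P-value: chance of a test statistic |X - 5| >= 4 under the fair-coin null.
p_value = sum(binom_pmf(x, 10, 0.5) for x in range(11) if abs(x - 5) >= 4)
print(p_value)   # ~0.0215, about 2%
```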