Chapter 3
DISCRETE PROBABILITY DISTRIBUTIONS
3.1 The Random Variable
This is a concept which attaches probability properties to the quantitative results of an
experiment.
When an experiment is performed it is possible that many different variables
can be measured as a result of the experiment. For example, when a person is sampled at random from a group of people we may measure many different variables associated with the selected person, such as height, weight, age, sex (coded 1 or 0), etc. Similarly, if a chemical sample is the result of an experiment then, for this single sample, we may measure the percentage of various constituents, temperature, pH value, weight, etc. The value of any one of these
variables of interest will vary from sample to sample and so we call this variable the
RANDOM VARIABLE. Hence a random variable is the concept of a measurement which
takes a particular numerical value for each sample. For example if we sample 5 persons and
measure their heights then height is the random variable of interest and the 5 values that we
have are the realizations of this variable for these 5 samples. At this point it may seem that we
are making a fuss over a very simple concept but it is important to have a clear grasp of this in
order to appreciate the sampling concepts in later chapters.
As a further example, a fair die is thrown 4 times and we observe two 6's, a 3 and a 1.
The random variable is the number of spots on the upturned face of the die, and for these 4 trials it takes the values 6, 6, 3, 1.
3.2 The Probability Distribution
The treatment of the probability distribution varies according to whether the random variable of interest is discrete or continuous valued, although the treatments are
similar in many ways. In this chapter we limit our investigation to integer valued random
variables - namely counts. Continuous valued variables are the subject of chapter 4.
We have seen that the random variable takes numerical values as the result of a trial.
The set of all the possible values that it can take is called its SAMPLE SPACE. Here are
some examples of random variables and their sample spaces.
Experiment                   Random Variable         Sample Space
Die is thrown                Value on a die          1, 2, 3, 4, 5, 6
Coin is tossed 5 times       Number of heads         0, 1, 2, 3, 4, 5
20 people sampled            Number with blue eyes   0 to 20
Machine operates for a day   Number of breakdowns    0 upwards
One person sampled           Height                  4' to 8' (roughly)
But we can go one step further in describing the properties of a random variable since
it may have a much higher chance of taking some of the sample space values than others. We can
express these chances via a probability distribution. To each point in the sample space we can
associate a probability which represents the chance of the random variable being equal to that
particular value. The complete set of sample space values with their associated probabilities
(which must sum to 1) is known as the PROBABILITY DISTRIBUTION of the random
variable; it is often represented diagrammatically by plotting the probabilities by sample
space values.
Example 1
Throw of a fair die
Values, r     1     2     3     4     5     6
Probs, p_r   1/6   1/6   1/6   1/6   1/6   1/6

This is known as the uniform distribution (discrete case) and can be represented as
p_r = 1/6,   r = 1, 2, ..., 6
It can be shown diagrammatically thus:
[Figure: bar chart of the probability distribution for the upturned face of a fair die; Probability (0 to 1) plotted against Value of Face (1 to 6), each bar of height 1/6.]
Example 2
Number of heads for 5 fair coins
Values, r     0     1     2     3     4     5
Probs, p_r   .03   .16   .31   .31   .16   .03

This is an example of the binomial distribution (see later) and can be represented as
p_r = C(5, r) (0.5)^5,   r = 0, 1, ..., 5
It can be shown diagrammatically thus:
[Figure: bar chart of the probability distribution for the number of heads in 5 tosses; Probability (0 to 1) plotted against Number of Heads (0 to 5), bars at heights .03, .16, .31, .31, .16, .03.]
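The tabulated probabilities can be reproduced directly from this formula; the following is a minimal Python sketch (the helper name binomial_pmf is our own, not from the text):

```python
from math import comb

def binomial_pmf(n, p, r):
    # P(R = r): probability of r successes in n Bernoulli trials, success prob p
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Number of heads in 5 tosses of a fair coin (Example 2)
for r in range(6):
    print(r, round(binomial_pmf(5, 0.5, r), 2))
# prints .03, .16, .31, .31, .16, .03 as in the table above
```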
A probability distribution has a natural frequency interpretation; if the experiment is
repeated a very large number of times then the probability of any particular value of the
random variable is equal to the limit of its relative frequency as the number of experiments
becomes infinitely large.
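A small simulation illustrates this interpretation for the fair die of Example 1; a Python sketch (the number of throws is chosen arbitrarily):

```python
import random

n_trials = 100_000            # arbitrary large number of throws
counts = [0] * 6
for _ in range(n_trials):
    counts[random.randint(1, 6) - 1] += 1

# each relative frequency should settle close to 1/6 = 0.167
for face, count in enumerate(counts, start=1):
    print(face, count / n_trials)
```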
There are many important probability distributions which describe the chances of real
life events, and these form the basis of statistical inference and data analysis. The Binomial
and Poisson distributions are discussed in this Chapter, and the Normal and other important
sampling distributions in following chapters.
3.3 The Binomial Distribution
The Binomial distribution applies to a series of trials known as Bernoulli Trials. These have the following properties:
1. Each trial results in one of two outcomes, sometimes distinguished by calling them "success" (S) or "failure" (F).
2. The trials are independent of each other.
3. The probability of a "success" for each trial is a constant, p.
Note that, using the relative frequency interpretation of probability, p can be regarded
as the limit of the relative frequency of successes as the number of trials becomes very large.
Let q = 1 - p = probability of a failure.
It is easy to think of examples of Bernoulli Trials:
Tosses of a coin
Sex of new-born babies
Classification of items as effective or defective
Voters in favour of a candidate or not
In fact many sampling situations become Bernoulli Trials if we are only interested in classifying the result in one of two ways; e.g. heights of people if we are only interested in whether each person is taller than 6 ft or not.
The general PROBABILITY FUNCTION for the Binomial Distribution is,
p_r = C(n, r) p^r q^(n-r),   r = 0, 1, 2, ..., n
where n is the number of Bernoulli Trials, p is the success probability for each trial, and C(n, r) = n!/(r!(n-r)!) is the number of ways of choosing r trials out of n. p_r is the probability that the number of successes in the n trials is equal to r.
This formula can be used to calculate probabilities for any Binomial Distribution. Alternatively, Binomial probabilities can be found in statistical tables and in software packages such as Minitab or SPSS, which also give the cumulative distribution. This is the probability that the random variable is less than or equal to r. It follows that we can easily find the probability function from the distribution function, or vice versa, using the relationships,
F_r = p_0 + p_1 + ... + p_r
and
p_r = F_r - F_(r-1)
F_r is known as the Distribution Function.
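These relationships are easy to check numerically; a minimal Python sketch for the Binomial case of Example 2 (all names our own):

```python
from itertools import accumulate
from math import comb

n, p = 5, 0.5
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

# F_r = p_0 + p_1 + ... + p_r: running sums of the probability function
F = list(accumulate(pmf))

# p_r = F_r - F_(r-1) recovers the probability function (F_(-1) taken as 0)
recovered = [F[0]] + [F[r] - F[r - 1] for r in range(1, n + 1)]
assert all(abs(a - b) < 1e-12 for a, b in zip(pmf, recovered))
```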
As an example, consider 10 test tubes of bacterial solution and let us suppose that the probability of any single tube showing bacterial growth is .2.
Then p(exactly 4 show growth) = p_4 = C(10, 4) (.2)^4 (.8)^6
and p(more than 1 shows growth) = 1 - F_1 = 1 - (.8)^10 - C(10, 1) (.2) (.8)^9
The second calculation demonstrates the advantage of computing the probability of an event not happening and subtracting it from 1.
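A short Python sketch of these two calculations (the rounded numerical values are our own arithmetic, not given in the text):

```python
from math import comb

# P(exactly 4 of 10 tubes show growth), with p = .2 per tube
p4 = comb(10, 4) * 0.2**4 * 0.8**6
print(round(p4, 4))      # about 0.0881

# P(more than 1 shows growth) = 1 - p_0 - p_1
p_gt1 = 1 - 0.8**10 - comb(10, 1) * 0.2 * 0.8**9
print(round(p_gt1, 4))   # about 0.6242
```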
3.4 The Poisson Process
The Poisson Process applies to random points occurring in a continuous medium such as time, length, area or volume.
Random points have the following properties:
1. Each point is equally likely to occur at any position in the medium.
2. The position taken by each point is completely independent of the occurrence or non-occurrence of other points.
Diagrammatically, a Poisson Process in time may be represented thus,
0 -------X------X-----------------X-----------------X--X-------X-------------X------------------->  Time
where each X represents an occurrence. The occurrences are completely random and independent of each other, and the rate of occurrence is measured by the average number of occurrences per unit of time. This is often denoted by λ and is known as the Poisson rate parameter.
Another interpretation of this rate parameter is as follows. If we consider any small interval of time of length dt, then the probability of an occurrence in the interval is proportional to the length of the interval and is equal to λ dt, so that the chance of an occurrence in an interval of length 2dt is 2λ dt, etc. This chance of occurrence is completely independent of whatever happens outside the interval, and the proportionality only holds provided the length of the interval is small.
It is easy to think of examples of a Poisson process;
Machine breakdowns in time
Flaws along the length of a rope
Plants scattered over a field
Particles in a mixture
Customers arriving at a bank, post office, etc.
Service times at a bank, post office etc.
There are many random variables associated with this process. We shall study the two most important of these, namely the Poisson random variable, R, and the Exponential random variable, X.
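One way to make the process concrete is to simulate it. The gaps between successive points in a Poisson Process are independent Exponential variables with mean 1/λ (a fact that follows from section 3.5 below); the following Python sketch uses this, with the rate and time horizon chosen arbitrarily:

```python
import random

rate = 2.0       # lambda: average occurrences per unit time (arbitrary)
horizon = 10.0   # length of time simulated (arbitrary)

t, points = 0.0, []
while True:
    t += random.expovariate(rate)   # Exponential gap with mean 1/rate
    if t > horizon:
        break
    points.append(t)

print(len(points), "occurrences; expected about", rate * horizon)
```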
3.4.1 The Poisson Distribution
The random variable, R, of interest in this situation is the number of points in a particular unit
of the medium.
The sample space for this is a count and is defined over the integers,
0, 1, 2, 3, 4, 5, 6, …
The general PROBABILITY FUNCTION for the Poisson Distribution gives the probability of there being exactly r points in a particular unit of the medium and is,
p_r = λ^r exp(-λ) / r!,   r = 0, 1, 2, ...
where λ is the average number of points per unit of the medium.
λ is the Poisson parameter. Note that, in theory, there is no upper bound on the value of r.
It is easily shown that both the mean and the variance of this distribution are equal to λ.
This formula can be used to calculate probabilities for any Poisson Distribution. Alternatively, Poisson probabilities can be found in statistical tables and software packages such as Minitab or SPSS, which also give the (cumulative) Distribution Function, F_r, for the Poisson. Spreadsheets such as Excel and Lotus also contain Poisson probabilities.
Example
Let us consider a machine that breaks down, on average, 3.2 times a week.
Suppose we wish to determine the probability that it will break down exactly once next week.
As we are interested in a time period of a week, we must use, as λ, the average number of breakdowns in this period of time. Hence we use λ = 3.2.
Then the probability that it will break down exactly once next week is,
p_1 = 3.2 exp(-3.2) = .13
Suppose we wish to determine the probability that it will break down exactly 4 times in the
next 2 weeks.
As we are interested in a time period of 2 weeks, we must use, as λ, the average number of breakdowns in this period of time. Hence we use λ = 6.4.
Then the probability that it will break down exactly 4 times in the next 2 weeks is,
p_4 = (6.4)^4 exp(-6.4)/4! = .116
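Both calculations are easy to reproduce; a minimal Python sketch (the helper name poisson_pmf is our own):

```python
from math import exp, factorial

def poisson_pmf(lam, r):
    # P(R = r) for a Poisson random variable with mean lam
    return lam**r * exp(-lam) / factorial(r)

print(round(poisson_pmf(3.2, 1), 3))   # 0.13: exactly one breakdown next week
print(round(poisson_pmf(6.4, 4), 3))   # 0.116: exactly four in the next two weeks
```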
3.5 The Exponential Distribution
This is another distribution associated with the Poisson Process. This time the random
variable is continuous valued and is the time between occurrences. Because of the independence property of the process this is also the distribution of the time to the first occurrence.
Let X represent this random variable and let f(x) and F(x) be its density and distribution
functions respectively. The sample space for X is ( x>0 ).
Then we have,
1 - F(x) = p(X > x)
and it can be seen that the time to the first occurrence is greater than x if, and only if, there are
no occurrences in the interval from 0 to x.
Using the Poisson distribution with mean λx (the average number of points in the interval (0, x)), the probability of no occurrences in this interval is,
1 - F(x) = exp(-λx)
so that F(x) = 1 - exp(-λx) and f(x) = dF/dx = λ exp(-λx),   x > 0
This is the density function for the exponential distribution.
It can be shown that its mean = 1/λ and its variance = 1/λ²,
so that its standard deviation is equal to its mean.
Example
This distribution arises in reliability theory and queuing theory. Let us suppose we have a
Poisson Process with rate parameter λ = 3. In this case X is a positive valued random variable with density function,
f(x) = 3 exp(-3x),   x > 0
Then p(3 < X < 5) = ∫[3,5] 3 exp(-3x) dx = exp(-9) - exp(-15) = .00012
and p(X < 6) = ∫[0,6] 3 exp(-3x) dx = 1 - exp(-18) = almost 1
For the Distribution Function:
F(x) = ∫[0,x] 3 exp(-3u) du = 1 - exp(-3x)
For the mean:
μ = E(X) = ∫[0,∞) x (3 exp(-3x)) dx = 1/3
For the variance:
σ² = E(X²) - μ² = ∫[0,∞) x² (3 exp(-3x)) dx - 1/9 = 2/9 - 1/9 = 1/9
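These results can be checked numerically from the closed-form Distribution Function; a minimal Python sketch (names our own):

```python
from math import exp

rate = 3.0   # the rate parameter lambda of the example

def F(x):
    # Distribution Function F(x) = 1 - exp(-rate * x)
    return 1.0 - exp(-rate * x)

print(F(5) - F(3))            # P(3 < X < 5) = exp(-9) - exp(-15), about .00012
print(F(6))                   # P(X < 6) = 1 - exp(-18), almost 1
print(1 / rate, 1 / rate**2)  # mean 1/3 and variance 1/9
```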