Probability distributions
You can find more examples in the Excel file http://myy.haaga-helia.fi/~taaak/q/distr.xlsx
Important note! Some functions have new names in Excel 2010. Use the old names if you need
compatibility with older versions.
Probability Distributions
To understand probability distributions, it is useful to compare them with frequency distributions. Suppose
a die is rolled a number of times. The frequency distribution might be as follows.
Outcome    Frequency %
1          23,1 %
2          15,8 %
3          16,0 %
4          22,4 %
5          13,3 %
6           9,4 %
The frequency distribution can also be called the empirical distribution because it is obtained empirically by
rolling a die. On the other hand, you also have theoretical knowledge about rolling a die. If the die is fair,
each of the six outcomes is equally likely, which gives the following theoretical distribution.
Outcome    Probability
1          16,7 %
2          16,7 %
3          16,7 %
4          16,7 %
5          16,7 %
6          16,7 %
This distribution is called the probability distribution. In practice, you can rarely deduce the probability
distribution as in the previous example. But quite often you can use an empirical frequency distribution to
approximate the probability distribution. For example, an insurance company may use accident statistics
from past years to approximate the probability distribution for the number of accidents. This kind of
probability distribution is useful when pricing insurance policies.
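The relation between the empirical and the theoretical distribution can be illustrated with a small simulation. The following Python sketch is not part of the original material; it assumes a fair die, and the function name and the fixed seed are chosen only for illustration.

```python
import random
from collections import Counter

def empirical_distribution(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times and return relative frequencies."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# Theoretical distribution of a fair die: each outcome has probability 1/6.
theoretical = {face: 1 / 6 for face in range(1, 7)}

for n in (100, 100_000):
    empirical = empirical_distribution(n)
    print(f"{n} rolls")
    for face in range(1, 7):
        print(f"  {face}: empirical {empirical[face]:.3f}, "
              f"theoretical {theoretical[face]:.3f}")
```

With more rolls the empirical frequencies typically move closer to the theoretical probabilities, which is why an empirical distribution can serve as an approximation.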
Random variable
A random variable is a variable that takes numerical values determined by the outcome of a random
experiment. Random variables are classified into discrete random variables and continuous random
variables.
• A random variable is discrete if it can take no more than a countable number of values.
• A random variable is continuous if it can take any value in an interval.
Some examples of discrete random variables:
• The number of throws of a coin needed before a head first appears
• The number of dots when rolling a die
• The number of defective items in a sample of 20 items from a large shipment
• The number of customers arriving at a check-out counter in an hour
• The number of errors detected in the corporation's accounts
• The number of people in favor of nuclear power in a survey
• The number of claims on a medical insurance policy in a particular year
Aki Taanila 13.11.2010
Some examples of continuous random variables:
• The yearly income for a family
• The amount of oil imported into the United States in a particular month
• The time that elapses between the installation of a new component and its failure
• The percentage of impurity in a batch of chemicals
The distinction between discrete and continuous random variables may appear rather artificial. For example, a
family's income in a year will be some integer number of cents or may even be measured in whole euros.
Still, in practical applications the family's income is assumed to be a continuous random variable. As
another example, the daily sales of a particular newspaper at a newsagent is a discrete random variable. Still,
in some practical applications it may be treated as a continuous random variable because there are so
many possible values.
Probability distribution
A probability distribution for a discrete random variable is a listing of all possible numerical outcomes and
their probabilities.
The following table is an example of a discrete probability distribution. In the table the random variable is
the number of heads when a coin is tossed three times.
number of heads    probability    cumulative probability
0                  0,125          0,125
1                  0,375          0,500
2                  0,375          0,875
3                  0,125          1,000
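The probabilities in the table can be derived by listing all equally likely outcomes of three tosses. The following Python sketch is an illustration only (not from the original material) and assumes a fair coin; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 equally likely sequences of three tosses (H = head, T = tail).
outcomes = list(product("HT", repeat=3))

# Count how many sequences contain 0, 1, 2 or 3 heads.
counts = Counter(seq.count("H") for seq in outcomes)
distribution = {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

# Print the probability and the running (cumulative) probability.
cumulative = Fraction(0)
for heads, p in distribution.items():
    cumulative += p
    print(heads, float(p), float(cumulative))
```

Note that Python prints decimal points where this document uses decimal commas.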
A probability distribution for a continuous random variable is defined by a so-called density function. The
density function can be used to calculate the cumulative probability up to a particular value of the random
variable. For a continuous random variable it is not meaningful to talk about the probability of one
particular value: in theory, the probability of any single value is zero.
Expected value
The expected value for a discrete random variable is obtained by multiplying each possible outcome by its
probability and then summing these products.
In the previous example (tossing a coin three times) the expected value of the number of heads is
0 * 0,125 + 1 * 0,375 + 2 * 0,375 + 3 * 0,125 = 1,500
In this case the random variable cannot take the expected value 1,5, but you may think of it this way: when
the random experiment (tossing a coin three times) is repeated, in the long run the average number of heads is 1,5.
The expected value for a continuous random variable is usually one of the parameters characterizing the
distribution.
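The calculation of the expected value for the coin example can be sketched in Python as follows. This is an illustration only; the function name expected_value is chosen here and is not part of the original material.

```python
def expected_value(distribution):
    """Sum of outcome * probability over a discrete distribution."""
    return sum(x * p for x, p in distribution.items())

# Number of heads in three tosses of a coin, taken from the table above.
heads = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

print(expected_value(heads))
```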
Binomial distribution
Assume an experiment meeting the following conditions.
• The experiment consists of a sequence of n identical trials.
• All possible outcomes can be classified into two categories, usually called success and failure
  (success or failure, yes or no, on or off,...).
• The probability of an outcome being classified as success, p, is constant from trial to trial. Thus the
  probability of an outcome being classified as failure, 1-p, is also constant over all trials.
• The outcome (i.e., success or failure) of any trial is independent of the outcome of any other trial.
Experiments satisfying the conditions above are called binomial experiments. The number of successes in
the n trials is a discrete random variable. This random variable can have a value of 0, 1, 2, 3, ... , n. The
probability distribution associated with this random variable is called the binomial distribution.
Some examples of binomially distributed random variables:
• The number of heads when tossing a coin 50 times
• The number of reds when spinning the roulette wheel 15 times
• The number of defective items in a sample of 20 items from a large shipment
• The number of people in favor of nuclear power in a survey
The expected value of a binomially distributed random variable is np (multiply the number of trials by the
probability of success).
To find binomial distribution probabilities in Excel, use the function
=BINOMDIST(x;n;p;0) (Excel 2010: BINOM.DIST)
x=the number of successes in n trials
n=the number of trials
p=the probability of success on one trial
To find binomial distribution cumulative probabilities in Excel, use the function
=BINOMDIST(x;n;p;1) (Excel 2010: BINOM.DIST)
Note that the only difference between the two functions is the last argument (0 for probability, 1 for
cumulative probability).
Example: If the proportion of defectives in a large shipment is 5% then the number of defective items in a
sample of 20 items is binomially distributed (n=20, p=5%). The probability distribution is as follows.
number of defectives in a sample    probability    cumulative probability
0                                   0,358          0,358
1                                   0,377          0,736
2                                   0,189          0,925
3                                   0,060          0,984
4                                   0,013          0,997
5                                   0,002          1,000
:                                   :              :
20                                  0,000          1,000
For example, the probability of exactly 3 defectives can be calculated with =BINOMDIST(3;20;5%;0).
Important note: in the previous example it is possible for the random variable to take the value 20, but it
is very unlikely (to three decimal places the probability is zero).
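For readers working outside Excel, the same probabilities can be computed from the standard binomial formula C(n, x) * p^x * (1-p)^(n-x). The Python sketch below is an illustration, not part of the original material; the function names binom_pmf and binom_cdf are chosen here and simply mirror BINOMDIST with the last argument 0 and 1.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials (like BINOMDIST(x;n;p;0))."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Probability of at most x successes in n trials (like BINOMDIST(x;n;p;1))."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Sample of 20 items from a shipment with 5 % defectives.
n, p = 20, 0.05
for x in range(6):
    print(x, round(binom_pmf(x, n, p), 3), round(binom_cdf(x, n, p), 3))

# The expected value np can be checked against the definition.
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(mean, n * p)
```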
Poisson distribution
When dealing with the number of occurrences of an event over a specified interval of time or space, the
Poisson distribution is often useful. The Poisson distribution is applicable if the following conditions are met.
• The probability of the occurrence of an event is the same for any two intervals of equal length.
• The occurrence or nonoccurrence of the event in any interval is independent of the occurrence or
  nonoccurrence in any other interval.
• The probability that two or more events will occur in an interval approaches zero as the interval
  becomes smaller (in other words, the event is rare).
The number of times a particular event occurs in a given interval is a discrete random variable. This random
variable follows the Poisson distribution and can have a value of 0, 1, 2, 3, 4,....
For example, the number of customers arriving in a particular time interval (e.g. 1 minute) is in most cases
Poisson distributed. Let's see how the random variable meets the conditions above:
• If a 1-minute time interval is considered, it may be assumed that the probability of a customer
  arriving is approximately the same for any 1-minute interval.
• The arrival of one customer in any 1-minute interval has no effect on the arrival of any other
  customer in any other 1-minute interval.
• The probability that two or more customers will arrive in a given time period approaches zero as
  the time interval becomes small. Consider a time interval of 1/100th of a second. It is quite unlikely
  that two customers will arrive in such a small time interval.
Some other examples of Poisson distributed random variables:
• The number of failures in a large computer system during a given day
• The number of ships arriving at a loading facility during a six-hour loading period
• The number of delivery trucks to arrive at a central warehouse in an hour
• The number of dents, scratches, or other defects in a large roll of sheet metal
• The number of accidents at a crossroads during one year
To use the Poisson distribution you must know the average number of occurrences λ over a specified
interval.
To find Poisson distribution probabilities in Excel, use the function
=POISSON(x;λ;0) (Excel 2010: POISSON.DIST)
x=number of occurrences over a specified interval
λ=average number of occurrences over a specified interval
To find Poisson distribution cumulative probabilities in Excel, use the function
=POISSON(x;λ;1) (Excel 2010: POISSON.DIST)
Example: From past experience it is known that on average one customer arrives at a car wash per
five-minute interval. Part of the probability distribution for the number of customers arriving in 30 minutes
is shown in the following table.
For example, the probability of exactly 3 customers can be calculated with =POISSON(3;6;0). The second
argument (6) is the average number of customers in 30 minutes (1 customer per five-minute interval -> 6
customers per 30-minute interval).
customers    probability    cumulative probability
0            0,002          0,002
1            0,015          0,017
2            0,045          0,062
3            0,089          0,151
4            0,134          0,285
5            0,161          0,446
6            0,161          0,606
7            0,138          0,744
:            :              :
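The Poisson probabilities can also be computed directly from the standard formula e^(-λ) * λ^x / x!. The Python sketch below is an illustration and not part of the original material; the function names poisson_pmf and poisson_cdf are chosen here and mirror POISSON with the last argument 0 and 1.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences (like POISSON(x;lam;0))."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """Probability of at most x occurrences (like POISSON(x;lam;1))."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# One customer per five minutes on average -> 6 customers per 30 minutes.
lam = 30 / 5
for x in range(8):
    print(x, round(poisson_pmf(x, lam), 3), round(poisson_cdf(x, lam), 3))
```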
Normal distribution
The normal distribution closely approximates the probability distributions of a wide range of random
variables.
• The normal distribution was originally developed as a model for measurement errors.
• The dimensions of manufactured parts or the weights of food packages often follow a normal
  distribution. This leads to quality control applications.
• Total sales or production often follow a normal distribution, which leads to a large family of
  applications in marketing and production management.
• The patterns of stock and bond prices are often modeled using the normal distribution in large
  computer-based financial trading models.
• Economic models use the normal distribution for a number of economic measures.
The normal distribution is described by the bell-shaped curve. The normal distribution is symmetric and has
the property that the median equals the mean.
The normal distribution is characterized by two parameters: the mean (expected value) and the standard
deviation. It is quite common to estimate the mean and the standard deviation from past experience.
To find normal distribution cumulative probabilities in Excel, use the function
=NORMDIST(x;µ;σ;1) (Excel 2010: NORM.DIST)
x=random variable value
µ=mean
σ=standard deviation
There is also a function to find the random variable value corresponding to a known cumulative probability
=NORMINV(cumulative probability;µ;σ) (Excel 2010: NORM.INV)
Example: The intelligence quotient (IQ) in the USA is normally distributed with mean 100 and standard
deviation 16. Thus 10,6 % of people in the USA have an IQ below 80. In Excel you get the result with the
function =NORMDIST(80;100;16;1).
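The IQ example can be reproduced outside Excel as well. The sketch below is an illustration, not part of the original material; it uses NormalDist from Python's standard statistics module, whose cdf and inv_cdf methods play the roles of NORMDIST (with last argument 1) and NORMINV.

```python
from statistics import NormalDist

# IQ modeled as normal with mean 100 and standard deviation 16 (from the example).
iq = NormalDist(mu=100, sigma=16)

# Cumulative probability, like =NORMDIST(80;100;16;1).
p_below_80 = iq.cdf(80)
print(round(p_below_80, 3))

# Inverse lookup, like =NORMINV(p;100;16): recovers the value 80 from its probability.
x_at_p = iq.inv_cdf(p_below_80)
print(round(x_at_p, 3))
```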
Standard normal distribution
The normal distribution with the mean=0 and the standard deviation=1 is called the standard normal
distribution. Many statistics textbooks include a table containing cumulative probabilities for the standard
normal distribution. You can use standard normal distribution probability tables with any normal
distribution. If x is a random variable value from the normal distribution with the mean=µ and the standard
deviation=σ then the transformed random variable z=(x-µ)/σ follows the standard normal distribution. The
cumulative probability related to the random variable value x in the original distribution is exactly the same
as the cumulative probability related to the transformed value z in the standard normal distribution.
The transformed value z is also called the standard value or the z-score. If two variables are measured on
different scales then the z-scores are a useful way to compare values of the two variables. In effect, the
z-score gives the distance from the mean using the standard deviation as the unit.
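The equivalence between a value x and its z-score can be checked numerically. The Python sketch below is an illustration only; it reuses the IQ figures from the earlier example (mean 100, standard deviation 16), and the function name z_score is chosen here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

mu, sigma = 100, 16
x = 80
z = z_score(x, mu, sigma)

# Cumulative probability of x in the original distribution ...
p_original = NormalDist(mu, sigma).cdf(x)
# ... and of z in the standard normal distribution (mean 0, standard deviation 1).
p_standard = NormalDist().cdf(z)

print(z, p_original, p_standard)
```

The two printed probabilities agree up to floating-point rounding, which is why one table for the standard normal distribution is enough.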
Exponential distribution
A continuous probability distribution that is often useful in describing the time needed to complete a task is
the exponential distribution. The exponential random variable can be used to describe
• The time between arrivals at a car wash
• The time required to load a truck
• The distance between major defects in a highway
• The service time required for a customer
If the number of occurrences in a time interval is Poisson distributed then the time interval between
successive occurrences is exponentially distributed.
To find exponential distribution cumulative probabilities in Excel, use the function
=EXPONDIST(x;1/µ;1) (Excel 2010: EXPON.DIST)
µ =average time interval between successive occurrences / average completion time
Example: Assume that the average service time per customer is 128 seconds. Some of the cumulative
probabilities:
service time (seconds)    cumulative probability
60                        0,374
120                       0,608
180                       0,755
240                       0,847
300                       0,904
For example, the probability that the service time for a particular customer is less than four minutes (240
seconds) is 84,7%. In Excel, the function is =EXPONDIST(240;1/128;1).
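The cumulative probabilities of the exponential distribution follow the standard formula 1 - e^(-x/µ). The Python sketch below is an illustration, not part of the original material; the function name expon_cdf is chosen here and mirrors EXPONDIST(x;1/µ;1).

```python
from math import exp

def expon_cdf(x, mean):
    """Probability that the time is at most x when the average time is mean."""
    return 1 - exp(-x / mean)

# Average service time of 128 seconds, as in the example.
mean_service_time = 128
for seconds in (60, 120, 180, 240, 300):
    print(seconds, round(expon_cdf(seconds, mean_service_time), 3))
```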