Probability distributions

You will find more examples in the Excel file http://myy.haaga-helia.fi/~taaak/q/distr.xlsx

Important note! Some functions have new names in Excel 2010. Use the old names if you need compatibility with older versions.

Probability Distributions

To understand probability distributions, it is useful to compare them with frequency distributions. Suppose we roll a die several times. The frequency distribution may be as follows.

Outcome   Frequency %
1         23,1 %
2         15,8 %
3         16,0 %
4         22,4 %
5         13,3 %
6          9,4 %

The frequency distribution can also be called the empirical distribution because it is obtained empirically by rolling a die. On the other hand, you also have theoretical knowledge of rolling a die. You don't have to be a genius to calculate the following theoretical distribution.

Outcome   Probability
1         16,7 %
2         16,7 %
3         16,7 %
4         16,7 %
5         16,7 %
6         16,7 %

This distribution is called the probability distribution. In practice, you can rarely deduce the probability distribution as in the previous example. Quite often, however, you can use an empirical frequency distribution to approximate the probability distribution. For example, an insurance company may use accident statistics from past years to approximate the probability distribution for the number of accidents. Such a probability distribution is clearly useful when pricing insurance policies.

Random variable

A random variable is a variable that takes numerical values determined by the outcome of a random experiment. Random variables are classified into discrete random variables and continuous random variables. A random variable is discrete if it can take no more than a countable number of values. A random variable is continuous if it can take any value in an interval.
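The contrast between the empirical frequency distribution and the theoretical probability distribution of a die can be illustrated with a short simulation. This is a minimal Python sketch, not part of the original Excel material; the function name empirical_distribution and the fixed seed are illustrative assumptions. With few rolls the relative frequencies scatter around 16,7 %, and with many rolls they come close to it.

```python
import random
from collections import Counter


def empirical_distribution(rolls: int, seed: int = 1) -> dict[int, float]:
    """Roll a fair six-sided die `rolls` times and return relative frequencies."""
    rng = random.Random(seed)  # fixed seed (illustrative) so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(rolls))
    # Counter returns 0 for faces that never came up.
    return {face: counts[face] / rolls for face in range(1, 7)}


# Theoretical (probability) distribution of a fair die: 1/6 for every face.
theoretical = {face: 1 / 6 for face in range(1, 7)}

for n in (60, 60_000):
    empirical = empirical_distribution(n)
    print(f"{n} rolls")
    for face in range(1, 7):
        print(f"  {face}: {empirical[face]:6.1%}  (theory {theoretical[face]:.1%})")
```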
Some examples of discrete random variables:
- The number of throws of a coin needed before a head first appears
- The number of dots when rolling a die
- The number of defective items in a sample of 20 items from a large shipment
- The number of customers arriving at a check-out counter in an hour
- The number of errors detected in the corporation's accounts
- The number of people in favor of nuclear power in a survey
- The number of claims on a medical insurance policy in a particular year

Aki Taanila 13.11.2010

Some examples of continuous random variables:
- The yearly income of a family
- The amount of oil imported into the United States in a particular month
- The time that elapses between the installation of a new component and its failure
- The percentage of impurity in a batch of chemicals

The distinction between discrete and continuous random variables may appear rather artificial. For example, a family's income in a year will be some integer number of cents, or it may even be measured in whole euros. Still, in practical applications the family's income is treated as a continuous random variable. As another example, the daily sales of a particular newspaper at a newsagent are a discrete random variable. Still, in some practical applications they may be treated as a continuous random variable because there are so many possible values.

Probability distribution

A probability distribution for a discrete random variable is a listing of all possible numerical outcomes and their probabilities. The following table is an example of a discrete probability distribution. In the table, the random variable is the number of heads when tossing a coin three times.

Number of heads   Probability   Cumulative probability
0                 0,125         0,125
1                 0,375         0,500
2                 0,375         0,875
3                 0,125         1,000

A probability distribution for a continuous random variable is defined by a so-called density function.
The density function can be used to calculate the cumulative probability related to a particular value of the random variable. For a continuous random variable it is meaningless to talk about the probability of a particular value: at least in theory, the probability of any single value is zero.

Expected value

The expected value of a discrete random variable is obtained by multiplying each possible outcome by its probability and then summing these products. In the previous example (tossing a coin three times) the expected number of heads is

0 * 0,125 + 1 * 0,375 + 2 * 0,375 + 3 * 0,125 = 1,500

In this case the random variable cannot take the expected value 1,5, but you can think of it this way: when the random experiment (tossing a coin three times) is repeated many times, in the long run the average number of heads is 1,5. The expected value of a continuous random variable is usually one of the parameters characterizing the distribution.

Binomial distribution

Assume an experiment meeting the following conditions.
- The experiment consists of a sequence of n identical trials.
- All possible outcomes can be classified into two categories, usually called success and failure (success or failure, yes or no, on or off, ...).
- The probability of an outcome being classified as a success, p, is constant from trial to trial. Thus the probability of an outcome being classified as a failure, 1-p, is also constant over all trials.
- The outcome (i.e., success or failure) of any trial is independent of the outcome of any other trial.

Experiments satisfying the conditions above are called binomial experiments. The number of successes in the n trials is a discrete random variable that can take the values 0, 1, 2, 3, ..., n. The probability distribution associated with this random variable is called the binomial distribution.
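The binomial probability of exactly x successes in n trials is C(n, x) p^x (1-p)^(n-x), where C(n, x) is the number of ways to choose the x successful trials among the n trials. The following minimal Python sketch (function names are illustrative, not from the original material) computes probabilities, cumulative probabilities and the expected value as a probability-weighted sum. Tossing a fair coin three times is a binomial experiment with n=3 and p=0,5, so the sketch should reproduce the number-of-heads table and the expected value 1,5 given earlier.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p); same quantity as Excel's BINOMDIST(x;n;p;0)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x); same quantity as Excel's BINOMDIST(x;n;p;1)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def expected_value(dist: dict[int, float]) -> float:
    """Multiply each outcome by its probability and sum the products."""
    return sum(x * prob for x, prob in dist.items())


# Number of heads when tossing a fair coin three times (n=3, p=0.5).
coin = {x: binom_pmf(x, 3, 0.5) for x in range(4)}
for x, prob in coin.items():
    print(x, f"{prob:.3f}", f"{binom_cdf(x, 3, 0.5):.3f}")
print("expected value:", expected_value(coin))  # should equal n*p = 3 * 0.5

# Defectives in a sample of 20 when 5 % of the shipment is defective.
print(f"{binom_pmf(3, 20, 0.05):.3f}")  # compare with =BINOMDIST(3;20;5%;0)
```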
Some examples of binomially distributed random variables:
- The number of heads when tossing a coin 50 times
- The number of reds when spinning a roulette wheel 15 times
- The number of defective items in a sample of 20 items from a large shipment
- The number of people in favor of nuclear power in a survey

The expected value of a binomially distributed random variable is np (the number of trials multiplied by the probability of success).

To find binomial distribution probabilities in Excel, use the function
=BINOMDIST(x;n;p;0) (Excel 2010: BINOM.DIST)
where
x = the number of successes in n trials
n = the number of trials
p = the probability of success on one trial

To find binomial distribution cumulative probabilities in Excel, use the function
=BINOMDIST(x;n;p;1) (Excel 2010: BINOM.DIST)
Note that the only difference between the functions is the last argument (0 for probability, 1 for cumulative probability).

Example: If the proportion of defectives in a large shipment is 5 %, then the number of defective items in a sample of 20 items is binomially distributed (n=20, p=5 %). The probability distribution is as follows.

Defectives in sample   Probability   Cumulative probability
0                      0,358         0,358
1                      0,377         0,736
2                      0,189         0,925
3                      0,060         0,984
4                      0,013         0,997
5                      0,002         1,000
:                      :             :
20                     0,000         1,000

For example, the probability of exactly 3 defectives can be calculated with =BINOMDIST(3;20;5%;0). Important note: in the previous example it is possible for the random variable to take the value 20, but it is very unlikely (to three decimal places the probability is zero).

Poisson distribution

When dealing with the number of occurrences of an event over a specified interval of time or space, the Poisson distribution is often useful. The Poisson distribution is applicable if the following conditions are met.
- The probability of the occurrence of an event is the same for any two intervals of equal length.
- The occurrence or nonoccurrence of the event in any interval is independent of the occurrence or nonoccurrence in any other interval.
- The probability that two or more events will occur in an interval approaches zero as the interval becomes smaller (in other words, the event is rare).

The number of times a particular event occurs in a given interval is a discrete random variable. This random variable follows the Poisson distribution and can take the values 0, 1, 2, 3, 4, .... For example, the number of customers arriving in a particular time interval (e.g. 1 minute) is in most cases Poisson distributed. Let's see how this random variable meets the conditions above:
- If a 1-minute time interval is considered, it may be assumed that the probability of a customer arriving is approximately the same for any 1-minute interval.
- The arrival of one customer in any 1-minute interval has no effect on the arrival of any other customer in any other 1-minute interval.
- The probability that two or more customers will arrive in a given time period approaches zero as the time interval becomes small. Consider a time interval of 1/100th of a second: it is quite unlikely that two customers will arrive in such a small time interval.

Some other examples of Poisson distributed random variables:
- The number of failures in a large computer system during a given day
- The number of ships arriving at a loading facility during a six-hour loading period
- The number of delivery trucks arriving at a central warehouse in an hour
- The number of dents, scratches, or other defects in a large roll of sheet metal
- The number of accidents at a crossroads during one year

To use the Poisson distribution you must know the average number of occurrences λ over the specified interval.
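Once λ is known, the Poisson probability of exactly x occurrences is e^(-λ) λ^x / x!. The following minimal Python sketch (function names are illustrative assumptions) computes the probabilities and cumulative probabilities, which correspond to what Excel's POISSON function returns with the last argument 0 or 1. The value λ = 6 is used here only as an example.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum of the probabilities of 0, 1, ..., x occurrences."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 6  # average number of occurrences over the interval (example value)
for x in range(8):
    print(x, f"{poisson_pmf(x, lam):.3f}", f"{poisson_cdf(x, lam):.3f}")
```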
To find Poisson distribution probabilities in Excel, use the function
=POISSON(x;λ;0) (Excel 2010: POISSON.DIST)
where
x = the number of occurrences over a specified interval
λ = the average number of occurrences over the specified interval

To find Poisson distribution cumulative probabilities in Excel, use the function
=POISSON(x;λ;1) (Excel 2010: POISSON.DIST)

Example: From past experience it is known that on average one customer per five-minute interval arrives at a car wash. Part of the probability distribution for the number of customers arriving in 30 minutes is shown in the following table.

Customers   Probability   Cumulative probability
0           0,002         0,002
1           0,015         0,017
2           0,045         0,062
3           0,089         0,151
4           0,134         0,285
5           0,161         0,446
6           0,161         0,606
7           0,138         0,744
:           :             :

For example, the probability of exactly 3 customers can be calculated with =POISSON(3;6;0). The second argument (6) is the average number of customers in 30 minutes (1 customer per five-minute interval -> 6 customers per 30-minute interval).

Normal distribution

The normal distribution closely approximates the probability distributions of a wide range of random variables. It was originally developed as a model for measurement errors.
- The dimensions of manufactured parts or the weights of food packages often follow a normal distribution. This leads to quality control applications.
- Total sales or production often follow a normal distribution, which leads to a large family of applications in marketing and production management.
- The patterns of stock and bond prices are often modeled using the normal distribution in large computer-based financial trading models.
- Economic models use the normal distribution for a number of economic measures.

The normal distribution is described by a bell-shaped curve. It is symmetric and has the property that the median equals the mean.
The normal distribution is characterized by two parameters: the mean (expected value) and the standard deviation. It is quite common to estimate the mean and the standard deviation from past experience.

To find normal distribution cumulative probabilities in Excel, use the function
=NORMDIST(x;µ;σ;1) (Excel 2010: NORM.DIST)
where
x = random variable value
µ = mean
σ = standard deviation

There is also a function to find the random variable value corresponding to a known cumulative probability:
=NORMINV(cumulative probability;µ;σ) (Excel 2010: NORM.INV)

Example: The intelligence quotient (IQ) in the USA is normally distributed with mean 100 and standard deviation 16. Thus 10,6 % of people in the USA have an IQ below 80. In Excel you get the result with the function =NORMDIST(80;100;16;1).

Standard normal distribution

The normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution. Many statistics textbooks include a table of cumulative probabilities for the standard normal distribution. You can use standard normal probability tables with any normal distribution: if x is a value of a normally distributed random variable with mean µ and standard deviation σ, then the transformed random variable z = (x-µ)/σ follows the standard normal distribution. The cumulative probability related to the value x in the original distribution is exactly the same as the cumulative probability related to the transformed value z in the standard normal distribution. For example, in the IQ example above the value 80 corresponds to z = (80-100)/16 = -1,25. The transformed value z is also called the standard value or the z-score. If two variables are measured on different scales, z-scores are a useful way to compare values of the two variables. In effect, the z-score gives the distance from the mean using the standard deviation as the unit.
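Outside Excel, the same calculations can be sketched with the statistics.NormalDist class from Python's standard library (Python 3.8 or later), whose cdf and inv_cdf methods play the roles of NORMDIST(...;1) and NORMINV. This minimal sketch reuses the IQ example and checks that the z-score transformation gives the same cumulative probability; the 10 % quantile is an illustrative extra, not from the original text.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# Cumulative probability, like =NORMDIST(80;100;16;1)
p_below_80 = iq.cdf(80)
print(f"P(IQ < 80) = {p_below_80:.3f}")

# Inverse, like =NORMINV(0,10;100;16): the IQ value below which 10 % of people fall
print(f"10 % quantile = {iq.inv_cdf(0.10):.1f}")

# Standardizing: z = (x - mu) / sigma gives the same cumulative probability
# in the standard normal distribution (mean 0, standard deviation 1).
z = (80 - iq.mean) / iq.stdev
print(f"z = {z}, standard normal P(Z < z) = {NormalDist().cdf(z):.3f}")
```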
Exponential distribution

The exponential distribution is a continuous probability distribution that is often useful in describing the time needed to complete a task. The exponential random variable can be used to describe
- the time between arrivals at a car wash
- the time required to load a truck
- the distance between major defects in a highway
- the service time required for a customer

If the number of occurrences in a time interval is Poisson distributed, then the time interval between successive occurrences is exponentially distributed.

To find exponential distribution cumulative probabilities in Excel, use the function
=EXPONDIST(x;1/µ;1) (Excel 2010: EXPON.DIST)
where
x = random variable value (time or distance)
µ = the average time interval between successive occurrences, or the average completion time

Example: Assume that the average service time per customer is 128 seconds. Some of the cumulative probabilities:

Service time (seconds)   Cumulative probability
60                       0,374
120                      0,608
180                      0,755
240                      0,847
300                      0,904

For example, the probability that the service time for a particular customer is less than four minutes (240 seconds) is 84,7 %. In Excel, the function is =EXPONDIST(240;1/128;1).
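The exponential cumulative probability has the closed form P(X <= x) = 1 - e^(-x/µ), so it can be sketched without any statistics library. In this minimal Python sketch the function name expon_cdf is an illustrative assumption; note that it takes the mean µ, whereas Excel's EXPONDIST takes the rate 1/µ. It computes the cumulative probabilities for the service-time example with µ = 128 seconds.

```python
from math import exp


def expon_cdf(x: float, mean: float) -> float:
    """P(X <= x) for an exponential random variable with the given mean.

    Excel's EXPONDIST(x;1/mean;1) takes the rate 1/mean instead of the mean.
    """
    rate = 1 / mean
    return 1 - exp(-rate * x)


mean_service_time = 128  # seconds, as in the service-time example
for seconds in (60, 120, 180, 240, 300):
    print(seconds, f"{expon_cdf(seconds, mean_service_time):.3f}")
```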