Download Special probability distributions

STATISTICS: MODULE 12122

Chapter 2 - Special probability distributions

The aim of this chapter is to introduce you to three special probability distributions: the Binomial distribution, which is a discrete probability distribution, and the Uniform distribution and the Normal distribution, which are both continuous probability distributions.

2.1 A Binomial probability distribution (model) is an appropriate model where:

1. There are only two outcomes, e.g. success or failure, defective or non-defective, sale or no sale, paid (account) or unpaid (account).
2. Successive outcomes are independent, e.g. the fact that one outcome is a success does not influence the next outcome in any way.
3. The probability of success, p, is constant.
4. The random variable X can take only the values 0, 1, 2, 3, ..., n, where n is known, i.e. the random variable X is a discrete random variable.

Examples 2.1

1. Airline seating. Suppose an airline has 14 seats available on a particular flight and sells 18 tickets for those seats. From their experience with thousands of flights, the airlines know the probability (p) that passengers will not arrive for a flight, and so they sell more tickets than there are seats on the plane. The number to sell depends on the probability of various numbers of passengers arriving at the departure gate. If more passengers than the number of available seats arrive, the airline must pay a financial penalty or provide free tickets. If fewer passengers arrive, they lose revenue. Here define X = the number of ticket holders who actually arrive for a flight. We assume that n = 18. The two outcomes are 'Passenger arrives for flight' and 'Passenger does not arrive for flight'. There are of course incidents that could invalidate the assumption of independent arrivals, such as a major traffic jam on roads to the airport. If more than 14 persons arrive for the flight, one or more people will be bumped from the flight and the airline will have to pay a penalty.

2. Sales. A large electrical appliance store in a large shopping complex knows from past records that 45% of the people who ask to see a television set will actually go on to purchase a television set that same day. Suppose that on any one Saturday, they can expect 40 people to ask to see a television set but they only have 20 sets available in the store. They might be interested in the probability that potential sales will be lost, i.e. if Y = the number of people who go on to purchase a television set, they are interested in P(Y > 20) given p = P(Person purchases a T.V.) = 0.45 and n = 40, since if more than 20 people want to buy one, some may leave the store and purchase it that same day at a competitor's store in the same shopping complex.

3. Acceptance sampling. A cannery accepts a very large shipment of tomatoes if, after examining a random sample of 100 boxes of tomatoes, at least 90% of the boxes contain ripe tomatoes. So if the percentage of satisfactory boxes is less than 90% the shipment is rejected. Suppose they are interested in (i) the probability that they accept a poor shipment in which the proportion of satisfactory boxes is only 80%, say (this is known as the consumer's risk) and (ii) the probability that a good shipment, in which the proportion of satisfactory boxes is 95%, say, is rejected (this is known as the producer's risk). Here we define our random variable to be Z = the number of satisfactory boxes, p = P(A box is satisfactory) and n = 100.

The consumer's risk = P(A poor shipment is accepted) = P(Z ≥ 90 | p = 0.80)
The producer's risk = P(A good shipment is rejected) = P(Z < 90 | p = 0.95)

Example 2.2

A multiple-choice test consists of 8 questions with 3 answers per question. Only one of the answers is correct. Assume a student guesses. Suppose C is the event 'Correct answer' and I the event 'Incorrect answer'. Let X = number of correct answers, then X = 0, 1, 2, 3, 4, 5, 6, 7 or 8. Here n = total number of questions = 8.
$$P(\text{Correct answer}) = P(C) = \tfrac{1}{3} = p \quad\text{and}\quad P(\text{Incorrect answer}) = P(I) = \tfrac{2}{3} = (1 - p)$$

Suppose we want to calculate the probability of one correct answer, i.e. P(X = 1).

Possibilities (Question 1 2 3 4 5 6 7 8)    Probability
[table to be completed]

So P(X = 1) = ______

Suppose the test consists of 4 questions and we require P(X = 2).

Possibilities (Question 1 2 3 4)    Probability
[table to be completed]

So P(X = 2) = ______

Suppose the number of questions (n) is 15 and we require P(X = 2). It is not easy to write down all the possibilities. We require a formula to obtain the probabilities. This is given by the probability function.

The probability function p(x) is given by

$$p(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{n}C_{x}\, p^{x} (1 - p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

where q = (1 - p) and X = number of correct answers (successes) in a sample of size n.

$\mu = E(X)$ = mean of the Binomial = $np$, and $\sigma^2 = \mathrm{Var}(X)$ = variance of the Binomial = $npq = np(1 - p)$.

The quantities p and n are called the parameters of the probability model or distribution. As you can see from the probability function, the Binomial model is specified by these two parameters, so that if these two parameters change, the Binomial distribution will change. For small n and p close to 0, the Binomial distribution is skewed to the right; for small n and p close to 0.5, the Binomial distribution is almost symmetrical; and for small n and p close to 1, the Binomial distribution is skewed to the left.

Example 2.3

In the airline seating problem, suppose that past records indicate that 80% of passengers who purchase tickets with this airline actually arrive for their flights. Calculate the probability that
(a) 14 passengers arrive for the flight
(b) the airline will have to pay a penalty.

Example 2.4

Over a long period, a salesman has a history of making sales on about 55% of the calls he makes.
(a) Describe the conditions which must be met before the Binomial distribution may be applied to this situation and indicate which of these conditions are most likely to be violated.
On a particular day, the salesman makes 10 calls. Assuming the conditions for the Binomial distribution are met:
(b) What is the probability that he sells to six of the customers he visits?
(c) What is the probability that he sells to at least two of the customers he visits?
If the salesman receives £30 commission for each successful call he makes:
(d) What is his expected commission for a month in which he makes 150 calls on customers? What is the standard deviation of his commission?

2.2 A Uniform random variable, X, is one whose measurements are not more highly concentrated around one value than they are around any of the other values. The measurements of X are evenly spread over the entire range of possible values of X. The probability distribution of a uniform random variable is called a Uniform distribution. The probability density function (p.d.f.) of X is constant over a range of values of the random variable, i.e.

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \ \ (\text{or } a < x < b) \\[1ex] 0 & \text{otherwise} \end{cases}$$

We say X ~ U[a, b]. The random variable X is just as likely to assume a value within one interval as it is to assume a value in any other interval of equal width.

Consider, for example, the arrival time of a plane that is just as likely to arrive at one time as another over a 30 minute interval. As a second example, the manufacturer of some product has determined that the ages of consumers of the product fall between 16 years and 30 years with equal frequency. The uniform distribution would be an appropriate model for this age variable and, given certain necessary information, the manufacturer could use the model to answer probability questions about the ages of the product's consumers.

The Uniform distribution is the probability model that is used when there is limited information about the pattern of outcomes of a random experiment or, equivalently, limited information about the values of the random variable. It is very important in a branch of Statistics known as Bayesian statistics.
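As a rough illustration of the "equal width, equal probability" property, the short Python sketch below uses the consumer-age model X ~ U[16, 30] from the text. The function names and the particular age intervals (20 to 25 and 18 to 23) are hypothetical choices made for this illustration only; they do not come from the notes.

```python
def uniform_pdf(x, a, b):
    """p.d.f. of X ~ U[a, b]: 1/(b - a) inside [a, b], 0 otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ U[a, b].

    The probability is the length of the part of [lo, hi] that lies
    inside [a, b], divided by the total width (b - a).
    """
    overlap = min(hi, b) - max(lo, a)
    return max(overlap, 0.0) / (b - a)


# Consumer ages modelled as U[16, 30] (model taken from the text).
a, b = 16, 30

# Two hypothetical intervals of equal width (5 years each).
p_20_25 = uniform_prob(20, 25, a, b)
p_18_23 = uniform_prob(18, 23, a, b)

print("f(20)             =", uniform_pdf(20, a, b))
print("P(20 <= X <= 25)  =", p_20_25)
print("P(18 <= X <= 23)  =", p_18_23)
```

Both intervals have width 5 and lie inside [16, 30], so under these assumptions each probability equals 5/14.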
Also, uniform random variables form the basis for simulation of other continuous random variables. For example, suppose X ~ U[0, 1]; then it can be shown that $Y = -\ln(X)$, or equivalently $Y = \ln(1/X)$, has an exponential distribution with parameter 1, i.e. its p.d.f. is given by

$$f(y) = e^{-y}, \qquad y > 0$$

(see Example 1.14, Chapter 1, with $\lambda = 1$).

The p.d.f. of a Uniform random variable is symmetrical about $\dfrac{b + a}{2}$, so

$$\mu = E(X) = \text{mean of the Uniform} = \frac{b + a}{2}.$$

Also

$$\sigma^2 = \mathrm{Var}(X) = \text{variance of the Uniform} = \frac{(b - a)^2}{12}.$$

I ask you to prove these in Exercise 2, question 1(ii). For an example of the uniform distribution, see example 6, Exercise 1. Note that the Uniform distribution is also called the Rectangular distribution because of the shape of the p.d.f.

2.3 The Normal distribution

If a continuous random variable X has a Normal distribution, we write X ~ N($\mu$, $\sigma^2$). The Normal distribution has the following properties:

(1) It has a perfectly symmetrical p.d.f. curve, symmetrical about $\mu$, where $\mu$ is the mean, so there is no skewness. Also P(X > $\mu$) = P(X < $\mu$) = 0.50, since the total area under the p.d.f. curve is 1. The p.d.f. of X is as follows:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad -\infty < x < +\infty$$

(2) The median = the mode = the mean $\mu$.

(3) The percentage of X values within one standard deviation of the mean is approximately 68%, the percentage of X values within two standard deviations of the mean is approximately 95% and the percentage of X values within three standard deviations of the mean is approximately 99.7%.

Standardised normal distribution

Probabilities can be calculated using special tables based on a special distribution called the standardised Normal distribution. This distribution has $\mu = 0$ and $\sigma = 1$. A standardised Normal random variable is usually denoted by Z and we write Z ~ N(0, 1). To standardise any Normal random variable to give a Z random variable, take

$$Z = \frac{\text{Random variable} - \text{its mean}}{\text{its standard deviation}}$$

The p.d.f.
of Z is as follows:

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) \quad\text{or}\quad \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}, \qquad -\infty < z < +\infty$$

Suppose $\Phi(z)$ is the (cumulative) distribution function of Z, then

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^2}\, dt \quad\text{and}\quad \Phi(-z) = 1 - \Phi(z)$$

because of the symmetry. The tables on page 7 give the values of $\Phi(z)$ for z ≥ 0.

Example 2.5

If Z ~ N(0, 1), then calculate the following probabilities:
(i) P(Z > 1.510)
(ii) P(0.121 < Z < 1.478)
(iii) P(Z ≤ -0.562)
(iv) P(-0.128 ≤ Z ≤ 2.180).

Example 2.6

Batteries for a transistor radio have a mean life (under normal usage) of 160 hours, with a standard deviation of 30 hours. Assume that battery life follows a Normal distribution:
(i) Calculate the percentage of batteries which have a life between 150 hours and 180 hours.
(ii) Calculate the range, symmetrical about the mean, within which 75% of battery lives lie.
(iii) If a radio takes 4 batteries and requires all of them to be working, calculate the probability that the radio will run for at least 135 hours.

Example 2.7

Machine components are mass produced at a factory. A customer requires that the components should be 5.2 cm long but they will be acceptable if they are within limits 5.195 to 5.205 cm. The customer tests the components and finds that 10.75% of those supplied are over-sized and 4.95% are under-sized.
(i) Find the mean and standard deviation of the lengths of the components supplied, assuming that they are normally distributed.
(ii) If three of the components are selected at random, what is the probability that one is under-sized, one over-sized and one satisfactory?

Associated reading for Chapter 2

Relevant parts of Chapter 5 and Chapter 7; A Concise Course in A-level Statistics by Crawshaw and Chambers
Sections 3.3, 4.1 - 4.5, 4.8 only; Mathematical Statistics with Applications by Mendenhall et al.
(see module handout for publisher details etc)

C. Osborne February 2000

Worked example 2.4

(a) The 4 conditions which must be met before the Binomial distribution can be applied to this salesman problem are:
(i) For any call he makes, there must be only two outcomes, a sale or no sale, i.e. the call is successful or not successful.
(ii) Successive calls are independent, i.e. the probability that the salesman makes a sale on a particular call is not affected by the outcomes of previous calls.
(iii) If P(A sale on a particular call) = p, then p must be constant from call to call.
(iv) If X = the number of sales per day, then X must be a discrete random variable with values 0, 1, 2, ..., n where n, the number of calls, is known.

The most likely conditions to be violated are (ii) and (iii). Through the day people are often at work, so the probability of making a sale during an afternoon may be very different from the probability of making a sale in the evening when people are at home, in which case (iii) does not hold. If all the calls are in a certain neighbourhood where potential customers know each other, then condition (ii) may be violated, as one person may want to buy the product in order to keep up with the neighbours. Try to think of some other reasons why these two conditions may be violated.

(b) Let X = the number of sales the salesman makes on a particular day, then X = 0, 1, 2, ..., 10, so n = 10. Let p = P(A sale on a particular call) = 0.55, i.e. X ~ Bin(10, 0.55). Hence

$$P(X = x) = {}^{10}C_{x}\, (0.55)^{x} (1 - 0.55)^{10 - x}, \qquad x = 0, 1, 2, \ldots, 10.$$

We require

$$P(X = 6) = {}^{10}C_{6}\, (0.55)^{6} (0.45)^{4} = \frac{10!}{4!\,6!} (0.55)^{6} (0.45)^{4} = \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1} (0.55)^{6} (0.45)^{4} = 210\, (0.55)^{6} (0.45)^{4} = 0.2384$$

So the probability that he sells to six of the customers is 0.2384.

(c) We require P(X ≥ 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)].

$$P(X = 0) = {}^{10}C_{0}\, (0.55)^{0} (0.45)^{10} = \frac{10!}{10!\,0!} (0.55)^{0} (0.45)^{10} = (0.45)^{10} = 0.00034$$

$$P(X = 1) = {}^{10}C_{1}\, (0.55)^{1} (0.45)^{9} = \frac{10!}{9!\,1!} (0.55)^{1} (0.45)^{9} = 10\, (0.55)^{1} (0.45)^{9} = 0.00416$$

So P(X ≥ 2) = 1 - [0.00034 + 0.00416] = 1 - 0.00450 = 0.9955 to 4 dec. places.

(d) Here X = the number of sales in a month of n = 150 calls. Expected commission = £30(expected number of sales) = £30 E(X), where E(X) = np = 150(0.55) = 82.5, so expected commission = £2475.
Now Var(X) = np(1 - p) = 82.5(0.45) = 37.125, so stdev(X) = $\sqrt{37.125}$ = 6.0930,
so the standard deviation of commission = £30(6.0930) = £182.79.

Worked example 2.7

Machine components are mass produced at a factory. A customer requires that the components should be 5.2 cm long but they will be acceptable if they are within limits 5.195 to 5.205 cm. The customer tests the components and finds that 10.75% of those supplied are over-sized and 4.95% are under-sized.
(i) Find the mean and standard deviation of the lengths of the components supplied, assuming that they are normally distributed.
(ii) If three of the components are selected at random, what is the probability that one is under-sized, one over-sized and one satisfactory?

Solution

Let X = the length of the component, then X ~ N($\mu$, $\sigma^2$) where as yet $\mu$ and $\sigma^2$ are unknown. We are asked in part (i) to find $\mu$ and $\sigma$.

Let O be the event that a component is over-sized, then P(O) = P(X ≥ 5.205) = 0.1075.
Let U be the event that a component is under-sized, then P(U) = P(X ≤ 5.195) = 0.0495.

Below is a diagram of the distribution of X and, as you can see, $\mu$ is between 5.195 cm and 5.205 cm.

[Diagram: p.d.f. of X with the limits 5.195 cm and 5.205 cm marked]

Standardising, $Z = \dfrac{X - \mu}{\sigma}$, so

when X = 5.195, $Z = \dfrac{5.195 - \mu}{\sigma} = z_1$ where $z_1 < 0$,

and when X = 5.205, $Z = \dfrac{5.205 - \mu}{\sigma} = z_2$ where $z_2 > 0$.

So $\Phi(z_2)$ = 1 - 0.1075 = 0.8925. Using the tables, we can see that $z_2$ = 1.240.

Finding $z_1$ is not quite so easy as $z_1 < 0$. $\Phi(z_1)$ = 0.0495 and $\Phi(-z_1) = 1 - \Phi(z_1)$ = 1 - 0.0495 = 0.9505, so using the tables $-z_1$ = 1.650, i.e. $z_1$ = -1.650.

Hence we have two simultaneous equations to solve, namely

$$\frac{5.195 - \mu}{\sigma} = z_1 = -1.650 \quad\text{so}\quad 5.195 - \mu = -1.650\,\sigma \qquad (1)$$

$$\frac{5.205 - \mu}{\sigma} = z_2 = 1.240 \quad\text{so}\quad 5.205 - \mu = 1.240\,\sigma \qquad (2)$$

Subtracting equation (1) from equation (2) we get 5.205 - 5.195 = (1.240 + 1.650)$\sigma$. Hence $\sigma$ = 0.00346.
Substituting for $\sigma$ in equation (2) gives $\mu$ = 5.205 - 1.24(0.00346) = 5.201. So the mean length of the components supplied is 5.201 cm and the standard deviation of lengths is 0.0035 cm.

(ii) From part (i), P(O) = P(X ≥ 5.205) = 0.1075 and P(U) = P(X ≤ 5.195) = 0.0495.
Let A be the event that a component is satisfactory or acceptable, then P(A) = 1 - P(O) - P(U) = 1 - 0.1075 - 0.0495 = 0.8430.

To get one under-sized, one over-sized and one satisfactory component from 3 components there are 6 possibilities, listed below, each with the same associated probability:

Possibilities    Probability
OUA              (0.1075)(0.0495)(0.8430) = 0.00449
UOA              (0.0495)(0.1075)(0.8430) = 0.00449
OAU              (0.1075)(0.8430)(0.0495) = 0.00449
AOU              (0.8430)(0.1075)(0.0495) = 0.00449
AUO              (0.8430)(0.0495)(0.1075) = 0.00449
UAO              (0.0495)(0.8430)(0.1075) = 0.00449

Above we have used the fact that the outcomes for the three selected components are independent, so that, for example, P(OUA) = P(O and U and A) = P(O) x P(U) x P(A), which is given in Chapter 1, section 1.4, multiplication law.

Hence the probability that, if 3 components are selected at random, one is under-sized, one over-sized and one satisfactory = (6)(0.00449) = 0.0269 to 4 dec. places.
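The arithmetic in Worked examples 2.4 and 2.7 can also be checked numerically. The Python sketch below is one possible way to do this, using only the standard library (`math.comb` and `statistics.NormalDist`, available from Python 3.8). The helper name `binomial_pmf` and the variable names are assumptions made for this illustration. The inverse Normal c.d.f. is used in place of the printed tables, so the z values agree with the table values only to about three decimal places.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# --- Worked example 2.4: X ~ Bin(10, 0.55) ---
p_six = binomial_pmf(6, 10, 0.55)                        # part (b)
p_at_least_two = 1 - (binomial_pmf(0, 10, 0.55)
                      + binomial_pmf(1, 10, 0.55))       # part (c)

# Part (d): 150 calls in the month, 30 pounds commission per sale.
mean_sales = 150 * 0.55
sd_sales = sqrt(150 * 0.55 * 0.45)
expected_commission = 30 * mean_sales
sd_commission = 30 * sd_sales

# --- Worked example 2.7: X ~ N(mu, sigma^2), both unknown ---
p_over, p_under = 0.1075, 0.0495
std_normal = NormalDist()              # Z ~ N(0, 1)
z2 = std_normal.inv_cdf(1 - p_over)    # Phi(z2) = 0.8925, z2 > 0
z1 = std_normal.inv_cdf(p_under)       # Phi(z1) = 0.0495, z1 < 0

# Equation (2) minus equation (1): 5.205 - 5.195 = (z2 - z1) * sigma
sigma = (5.205 - 5.195) / (z2 - z1)
mu = 5.205 - z2 * sigma                # from equation (2)

# Part (ii): 6 orderings of one O, one U and one A.
p_ok = 1 - p_over - p_under
p_one_of_each = 6 * p_over * p_under * p_ok

print(f"P(X = 6)            = {p_six:.4f}")
print(f"P(X >= 2)           = {p_at_least_two:.4f}")
print(f"expected commission = {expected_commission:.2f}")
print(f"sd of commission    = {sd_commission:.2f}")
print(f"z1, z2              = {z1:.3f}, {z2:.3f}")
print(f"mu, sigma           = {mu:.4f}, {sigma:.5f}")
print(f"P(one of each)      = {p_one_of_each:.4f}")
```

Rounded to the same number of decimal places, these values should agree with the hand calculations in the worked examples.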
