3.1 Random Variables

Two Types of Random Variables

In Section 1.2, we distinguished between data resulting from observations on a counting variable and data obtained by observing values of a measurement variable. A slightly more formal distinction characterizes two different types of random variables.

DEFINITION
A discrete random variable is an rv whose possible values either constitute a finite set or else can be listed in an infinite sequence in which there is a first element, a second element, and so on (“countably” infinite).

A random variable is continuous if both of the following apply:
1. Its set of possible values consists either of all numbers in a single interval on the number line (possibly infinite in extent, e.g., from −∞ to ∞) or all numbers in a disjoint union of such intervals (e.g., [0, 10] ∪ [20, 30]).
2. No possible value of the variable has positive probability, that is, P(X = c) = 0 for any possible value c.

Although any interval on the number line contains an infinite number of numbers, it can be shown that there is no way to create an infinite listing of all these values; there are just too many of them. The second condition describing a continuous random variable is perhaps counterintuitive, since it would seem to imply a total probability of zero for all possible values. But we shall see in Chapter 4 that intervals of values have positive probability; the probability of an interval will decrease to zero as the width of the interval shrinks to zero.

Example 3.6
All random variables in Examples 3.1–3.4 are discrete. As another example, suppose we select married couples at random and do a blood test on each person until we find a husband and wife who both have the same Rh factor. With X = the number of blood tests to be performed, possible values of X are D = {2, 4, 6, 8, …}. Since the possible values have been listed in sequence, X is a discrete rv.
■

To study basic properties of discrete rv’s, only the tools of discrete mathematics (summation and differences) are required. The study of continuous variables requires the continuous mathematics of the calculus: integrals and derivatives.

EXERCISES Section 3.1 (1–10)

1. A concrete beam may fail either by shear (S) or flexure (F). Suppose that three failed beams are randomly selected and the type of failure is determined for each one. Let X = the number of beams among the three selected that failed by shear. List each outcome in the sample space along with the associated value of X.

2. Give three examples of Bernoulli rv’s (other than those in the text).

3. Using the experiment in Example 3.3, define two more random variables and list the possible values of each.

4. Let X = the number of nonzero digits in a randomly selected zip code. What are the possible values of X? Give three possible outcomes and their associated X values.

5. If the sample space S is an infinite set, does this necessarily imply that any rv X defined from S will have an infinite set of possible values? If yes, say why. If no, give an example.

6. Starting at a fixed time, each car entering an intersection is observed to see whether it turns left (L), right (R), or goes straight ahead (A). The experiment terminates as soon as a car is observed to turn left. Let X = the number of cars observed. What are possible X values? List five outcomes and their associated X values.

7. For each random variable defined here, describe the set of possible values for the variable, and state whether the variable is discrete.
   a. X = the number of unbroken eggs in a randomly chosen standard egg carton
   b. Y = the number of students on a class list for a particular course who are absent on the first day of classes
   c. U = the number of times a duffer has to swing at a golf ball before hitting it
   d. X = the length of a randomly selected rattlesnake
   e. Z = the amount of royalties earned from the sale of a first edition of 10,000 textbooks
   f. Y = the pH of a randomly chosen soil sample
   g. X = the tension (psi) at which a randomly selected tennis racket has been strung
   h. X = the total number of coin tosses required for three individuals to obtain a match (HHH or TTT)

8. Each time a component is tested, the trial is a success (S) or failure (F). Suppose the component is tested repeatedly until a success occurs on three consecutive trials. Let Y denote the number of trials necessary to achieve this. List all outcomes corresponding to the five smallest possible values of Y, and state which Y value is associated with each one.

9. An individual named Claudius is located at the point 0 in the accompanying diagram.
   [Diagram: point 0 together with the locations A1, A2, A3, A4 and B1, B2, B3, B4]
   Using an appropriate randomization device (such as a tetrahedral die, one having four sides), Claudius first moves to one of the four locations B1, B2, B3, B4. Once at one of these locations, another randomization device is used to decide whether Claudius next returns to 0 or next visits one of the other two adjacent points. This process then continues; after each move, another move to one of the (new) adjacent points is determined by tossing an appropriate die or coin.
   a. Let X = the number of moves that Claudius makes before first returning to 0. What are possible values of X? Is X discrete or continuous?
   b. If moves are allowed also along the diagonal paths connecting 0 to A1, A2, A3, and A4, respectively, answer the questions in part (a).

10. The number of pumps in use at both a six-pump station and a four-pump station will be determined. Give the possible values for each of the following random variables:
   a. T = the total number of pumps in use
   b. X = the difference between the numbers in use at stations 1 and 2
   c. U = the maximum number of pumps in use at either station
   d. Z = the number of stations having exactly two pumps in use

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

3.2 Probability Distributions for Discrete Random Variables

Probabilities assigned to various outcomes in S in turn determine probabilities associated with the values of any particular rv X. The probability distribution of X says how the total probability of 1 is distributed among (allocated to) the various possible X values. Suppose, for example, that a business has just purchased four laser printers, and let X be the number among these that require service during the warranty period. Possible X values are then 0, 1, 2, 3, and 4. The probability distribution will tell us how the probability of 1 is subdivided among these five possible values: how much probability is associated with the X value 0, how much is apportioned to the X value 1, and so on. We will use the following notation for the probabilities in the distribution:

    p(0) = the probability of the X value 0 = P(X = 0)
    p(1) = the probability of the X value 1 = P(X = 1)

and so on. In general, p(x) will denote the probability assigned to the value x.

Example 3.7
The Cal Poly Department of Statistics has a lab with six computers reserved for statistics majors. Let X denote the number of these computers that are in use at a particular time of day. Suppose that the probability distribution of X is as given in the
PROPOSITION
For any two numbers a and b with a ≤ b,

    P(a ≤ X ≤ b) = F(b) − F(a−)

where “a−” represents the largest possible X value that is strictly less than a. In particular, if the only possible values are integers and if a and b are integers, then

    P(a ≤ X ≤ b) = P(X = a or a + 1 or … or b) = F(b) − F(a − 1)

Taking a = b yields P(X = a) = F(a) − F(a − 1) in this case.

The reason for subtracting F(a−) rather than F(a) is that we want to include P(X = a); F(b) − F(a) gives P(a < X ≤ b). This proposition will be used extensively when computing binomial and Poisson probabilities in Sections 3.4 and 3.6.

Example 3.15
Let X = the number of days of sick leave taken by a randomly selected employee of a large company during a particular year. If the maximum number of allowable sick days per year is 14, possible values of X are 0, 1, . . . , 14. With F(0) = .58, F(1) = .72, F(2) = .76, F(3) = .81, F(4) = .88, and F(5) = .94,

    P(2 ≤ X ≤ 5) = P(X = 2, 3, 4, or 5) = F(5) − F(1) = .22

and

    P(X = 3) = F(3) − F(2) = .05
■

EXERCISES Section 3.2 (11–28)

11. An automobile service facility specializing in engine tune-ups knows that 45% of all tune-ups are done on four-cylinder automobiles, 40% on six-cylinder automobiles, and 15% on eight-cylinder automobiles. Let X = the number of cylinders on the next car to be tuned.
   a. What is the pmf of X?
   b. Draw both a line graph and a probability histogram for the pmf of part (a).
   c. What is the probability that the next car tuned has at least six cylinders? More than six cylinders?

12. Airlines sometimes overbook flights.
Suppose that for a plane with 50 seats, 55 passengers have tickets. Define the random variable Y as the number of ticketed passengers who actually show up for the flight. The probability mass function of Y appears in the accompanying table.

    y      45   46   47   48   49   50   51   52   53   54   55
    p(y)  .05  .10  .12  .14  .25  .17  .06  .05  .03  .02  .01

   a. What is the probability that the flight will accommodate all ticketed passengers who show up?
   b. What is the probability that not all ticketed passengers who show up can be accommodated?
   c. If you are the first person on the standby list (which means you will be the first one to get on the plane if there are any seats available after all ticketed passengers have been accommodated), what is the probability that you will be able to take the flight? What is this probability if you are the third person on the standby list?

13. A mail-order computer business has six telephone lines. Let X denote the number of lines in use at a specified time. Suppose the pmf of X is as given in the accompanying table.

    x       0    1    2    3    4    5    6
    p(x)  .10  .15  .20  .25  .20  .06  .04

   Calculate the probability of each of the following events.
   a. {at most three lines are in use}
   b. {fewer than three lines are in use}
   c. {at least three lines are in use}
   d. {between two and five lines, inclusive, are in use}
   e. {between two and four lines, inclusive, are not in use}
   f. {at least four lines are not in use}

14.
A contractor is required by a county planning department to submit one, two, three, four, or five forms (depending on the nature of the project) in applying for a building permit. Let Y = the number of forms required of the next applicant. The probability that y forms are required is known to be proportional to y; that is, p(y) = ky for y = 1, . . . , 5.
   a. What is the value of k? [Hint: $\sum_{y=1}^{5} p(y) = 1$.]
   b. What is the probability that at most three forms are required?
   c. What is the probability that between two and four forms (inclusive) are required?
   d. Could p(y) = y²/50 for y = 1, …, 5 be the pmf of Y?

15. Many manufacturers have quality control programs that include inspection of incoming materials for defects. Suppose a computer manufacturer receives computer boards in lots of five. Two boards are selected from each lot for inspection. We can represent possible outcomes of the selection process by pairs. For example, the pair (1, 2) represents the selection of boards 1 and 2 for inspection.
   a. List the ten different possible outcomes.
   b. Suppose that boards 1 and 2 are the only defective boards in a lot of five. Two boards are to be chosen at random. Define X to be the number of defective boards observed among those inspected. Find the probability distribution of X.
   c. Let F(x) denote the cdf of X. First determine F(0) = P(X ≤ 0), F(1), and F(2); then obtain F(x) for all other x.

16. Some parts of California are particularly earthquake-prone. Suppose that in one metropolitan area, 25% of all homeowners are insured against earthquake damage. Four homeowners are to be selected at random; let X denote the number among the four who have earthquake insurance.
   a. Find the probability distribution of X. [Hint: Let S denote a homeowner who has insurance and F one who does not. Then one possible outcome is SFSS, with probability (.25)(.75)(.25)(.25) and associated X value 3. There are 15 other outcomes.]
   b. Draw the corresponding probability histogram.
   c.
What is the most likely value for X?
   d. What is the probability that at least two of the four selected have earthquake insurance?

17. A new battery’s voltage may be acceptable (A) or unacceptable (U). A certain flashlight requires two batteries, so batteries will be independently selected and tested until two acceptable ones have been found. Suppose that 90% of all batteries have acceptable voltages. Let Y denote the number of batteries that must be tested.
   a. What is p(2), that is, P(Y = 2)?
   b. What is p(3)? [Hint: There are two different outcomes that result in Y = 3.]
   c. To have Y = 5, what must be true of the fifth battery selected? List the four outcomes for which Y = 5 and then determine p(5).
   d. Use the pattern in your answers for parts (a)–(c) to obtain a general formula for p(y).

18. Two fair six-sided dice are tossed independently. Let M = the maximum of the two tosses (so M(1,5) = 5, M(3,3) = 3, etc.).
   a. What is the pmf of M? [Hint: First determine p(1), then p(2), and so on.]
   b. Determine the cdf of M and graph it.

19. A library subscribes to two different weekly news magazines, each of which is supposed to arrive in Wednesday’s mail. In actuality, each one may arrive on Wednesday, Thursday, Friday, or Saturday. Suppose the two arrive independently of one another, and for each one P(Wed.) = .3, P(Thurs.) = .4, P(Fri.) = .2, and P(Sat.) = .1. Let Y = the number of days beyond Wednesday that it takes for both magazines to arrive (so possible Y values are 0, 1, 2, or 3). Compute the pmf of Y. [Hint: There are 16 possible outcomes; Y(W,W) = 0, Y(F,Th) = 2, and so on.]

20. Three couples and two single individuals have been invited to an investment seminar and have agreed to attend. Suppose the probability that any particular couple or individual arrives late is .4 (a couple will travel together in the same vehicle, so either both people will be on time or else both will arrive late).
Assume that different couples and individuals are on time or late independently of one another. Let X = the number of people who arrive late for the seminar.
   a. Determine the probability mass function of X. [Hint: label the three couples #1, #2, and #3 and the two individuals #4 and #5.]
   b. Obtain the cumulative distribution function of X, and use it to calculate P(2 ≤ X ≤ 6).

21. Suppose that you read through this year’s issues of the New York Times and record each number that appears in a news article: the income of a CEO, the number of cases of wine produced by a winery, the total charitable contribution of a politician during the previous tax year, the age of a celebrity, and so on. Now focus on the leading digit of each number, which could be 1, 2, . . . , 8, or 9. Your first thought might be that the leading digit X of a randomly selected number would be equally likely to be one of the nine possibilities (a discrete uniform distribution). However, much empirical evidence as well as some theoretical arguments suggest an alternative probability distribution called Benford’s law:

\[
p(x) = P(\text{1st digit is } x) = \log_{10}\left(\frac{x + 1}{x}\right), \qquad x = 1, 2, \ldots, 9
\]

   a. Without computing individual probabilities from this formula, show that it specifies a legitimate pmf.
   b. Now compute the individual probabilities and compare to the corresponding discrete uniform distribution.
   c. Obtain the cdf of X.
   d. Using the cdf, what is the probability that the leading digit is at most 3? At least 5?
   [Note: Benford’s law is the basis for some auditing procedures used to detect fraud in financial reporting, for example, by the Internal Revenue Service.]
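Several exercises in this set ask for interval probabilities computed from a cdf. As a rough illustration of the integer-valued case of the proposition, P(a ≤ X ≤ b) = F(b) − F(a − 1), the following Python sketch reuses the cdf values quoted in Example 3.15. The function names `cdf` and `interval_prob` are hypothetical helpers introduced only for this sketch, and only the six tabulated values F(0), …, F(5) are available, so other arguments are not supported.

```python
# cdf values F(x) for x = 0..5, as quoted in Example 3.15 (sick-leave days).
F = {0: 0.58, 1: 0.72, 2: 0.76, 3: 0.81, 4: 0.88, 5: 0.94}


def cdf(x):
    """Return F(x) for integer x; F(x) = 0 below the smallest possible value 0."""
    if x < 0:
        return 0.0
    return F[x]  # only x = 0..5 are tabulated in the example


def interval_prob(a, b):
    """P(a <= X <= b) = F(b) - F(a - 1) for integer-valued X and integers a <= b."""
    return cdf(b) - cdf(a - 1)


p_2_to_5 = interval_prob(2, 5)     # F(5) - F(1)
p_exactly_3 = interval_prob(3, 3)  # F(3) - F(2)
print(round(p_2_to_5, 2), round(p_exactly_3, 2))
```

Subtracting F(a − 1) rather than F(a) is what keeps P(X = a) inside the interval; with a = b the same helper returns a single pmf value.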
22. Refer to Exercise 13, and calculate and graph the cdf F(x). Then use it to calculate the probabilities of the events given in parts (a)–(d) of that problem.

23. A consumer organization that evaluates new automobiles customarily reports the number of major defects in each car examined. Let X denote the number of major defects in a randomly selected car of a certain type. The cdf of X is as follows:

\[
F(x) = \begin{cases}
0 & x < 0 \\
.06 & 0 \le x < 1 \\
.19 & 1 \le x < 2 \\
.39 & 2 \le x < 3 \\
.67 & 3 \le x < 4 \\
.92 & 4 \le x < 5 \\
.97 & 5 \le x < 6 \\
1 & 6 \le x
\end{cases}
\]

   Calculate the following probabilities directly from the cdf:
   a. p(2), that is, P(X = 2)
   b. P(X > 3)
   c. P(2 ≤ X ≤ 5)
   d. P(2 < X < 5)

24. An insurance company offers its policyholders a number of different premium payment options. For a randomly selected policyholder, let X = the number of months between successive payments. The cdf of X is as follows:

\[
F(x) = \begin{cases}
0 & x < 1 \\
.30 & 1 \le x < 3 \\
.40 & 3 \le x < 4 \\
.45 & 4 \le x < 6 \\
.60 & 6 \le x < 12 \\
1 & 12 \le x
\end{cases}
\]

   a. What is the pmf of X?
   b. Using just the cdf, compute P(3 ≤ X ≤ 6) and P(4 ≤ X).

25. In Example 3.12, let Y = the number of girls born before the experiment terminates. With p = P(B) and 1 − p = P(G), what is the pmf of Y? [Hint: First list the possible values of Y, starting with the smallest, and proceed until you see a general formula.]

26. Alvie Singer lives at 0 in the accompanying diagram and has four friends who live at A, B, C, and D. One day Alvie decides to go visiting, so he tosses a fair coin twice to decide which of the four to visit. Once at a friend’s house, he will either return home or else proceed to one of the two adjacent houses (such as 0, A, or C when at B), with each of the three possibilities having probability 1/3. In this way, Alvie continues to visit friends until he returns home.
   [Diagram: Alvie’s home at 0 and the four friends’ houses A, B, C, and D]
   a. Let X = the number of times that Alvie visits a friend. Derive the pmf of X.
   b. Let Y = the number of straight-line segments that Alvie traverses (including those leading to and from 0). What is the pmf of Y?
   c. Suppose that female friends live at A and C and male friends at B and D. If Z = the number of visits to female friends, what is the pmf of Z?

27. After all students have left the classroom, a statistics professor notices that four copies of the text were left under desks. At the beginning of the next lecture, the professor distributes the four books in a completely random fashion to each of the four students (1, 2, 3, and 4) who claim to have left books. One possible outcome is that 1 receives 2’s book, 2 receives 4’s book, 3 receives his or her own book, and 4 receives 1’s book. This outcome can be abbreviated as (2, 4, 3, 1).
   a. List the other 23 possible outcomes.
   b. Let X denote the number of students who receive their own book. Determine the pmf of X.

28. Show that the cdf F(x) is a nondecreasing function; that is, x₁ < x₂ implies that F(x₁) ≤ F(x₂). Under what condition will F(x₁) = F(x₂)?

3.3 Expected Values

Consider a university having 15,000 students and let X = the number of courses for which a randomly selected student is registered. The pmf of X follows. Since p(1) = .01, we know that (.01) · (15,000) = 150 of the students are registered for one course, and similarly for the other x values.

    x                    1     2     3     4     5     6     7
    p(x)               .01   .03   .13   .25   .39   .17   .02
    Number registered  150   450  1950  3750  5850  2550   300
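The text that follows this table is not part of this excerpt, so the sketch below is only one natural way to read it, not necessarily the book's own development. It assumes the standard definition of the mean of a discrete rv, the sum of x · p(x) over all possible x, and checks that this agrees with the plain population average (total courses taken divided by the number of students). All variable names are illustrative.

```python
# pmf of X = number of courses, copied from the table in Section 3.3.
pmf = {1: 0.01, 2: 0.03, 3: 0.13, 4: 0.25, 5: 0.39, 6: 0.17, 7: 0.02}
n_students = 15_000

# Number of students registered for x courses: p(x) * 15,000.
counts = {x: round(p * n_students) for x, p in pmf.items()}

# Population average = total courses taken / number of students.
total_courses = sum(x * c for x, c in counts.items())
average_from_counts = total_courses / n_students

# The same quantity from the pmf alone: sum of x * p(x).
mean_from_pmf = sum(x * p for x, p in pmf.items())

print(counts, total_courses, average_from_counts, round(mean_from_pmf, 2))
```

The two averages coincide because dividing each count by 15,000 simply recovers p(x), so the population size plays no role in the final value.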
The absolute value is necessary because a might be negative, yet a standard deviation cannot be. Usually multiplication by a corresponds to a change in the unit of measurement (e.g., kg to lb or dollars to euros). According to the first relation in (3.14), the sd in the new unit is the original sd multiplied by the conversion factor. The second relation says that adding or subtracting a constant does not impact variability; it just rigidly shifts the distribution to the right or left.

Example 3.26
In the computer sales scenario of Example 3.23, E(X) = 2 and

    E(X²) = (0)²(.1) + (1)²(.2) + (2)²(.3) + (3)²(.4) = 5

so V(X) = 5 − (2)² = 1. The profit function h(X) = 800X − 900 then has variance (800)² · V(X) = (640,000)(1) = 640,000 and standard deviation 800.
■

EXERCISES Section 3.3 (29–45)

29. The pmf of the amount of memory X (GB) in a purchased flash drive was given in Example 3.13 as

    x       1    2    4    8   16
    p(x)  .05  .10  .35  .40  .10

   Compute the following:
   a. E(X)
   b. V(X) directly from the definition
   c. The standard deviation of X
   d. V(X) using the shortcut formula

30. An individual who has automobile insurance from a certain company is randomly selected. Let Y be the number of moving violations for which the individual was cited during the last 3 years. The pmf of Y is

    y       0    1    2    3
    p(y)  .60  .25  .10  .05

   a. Compute E(Y).
   b. Suppose an individual with Y violations incurs a surcharge of $100Y². Calculate the expected amount of the surcharge.

31. Refer to Exercise 12 and calculate V(Y) and σ_Y. Then determine the probability that Y is within 1 standard deviation of its mean value.

32. An appliance dealer sells three different models of upright freezers having 13.5, 15.9, and 19.1 cubic feet of storage space, respectively. Let X = the amount of storage space purchased by the next customer to buy a freezer. Suppose that X has pmf

    x     13.5  15.9  19.1
    p(x)    .2    .5    .3

   a. Compute E(X), E(X²), and V(X).
   b.
If the price of a freezer having capacity X cubic feet is 25X − 8.5, what is the expected price paid by the next customer to buy a freezer?
   c. What is the variance of the price 25X − 8.5 paid by the next customer?
   d. Suppose that although the rated capacity of a freezer is X, the actual capacity is h(X) = X − .01X². What is the expected actual capacity of the freezer purchased by the next customer?

33. Let X be a Bernoulli rv with pmf as in Example 3.18.
   a. Compute E(X²).
   b. Show that V(X) = p(1 − p).
   c. Compute E(X⁷⁹).

34. Suppose that the number of plants of a particular type found in a rectangular sampling region (called a quadrat by ecologists) in a certain geographic area is an rv X with pmf

\[
p(x) = \begin{cases}
c/x^{3} & x = 1, 2, 3, \ldots \\
0 & \text{otherwise}
\end{cases}
\]

   Is E(X) finite? Justify your answer (this is another distribution that statisticians would call heavy-tailed).

35. A small market orders copies of a certain magazine for its magazine rack each week. Let X = demand for the magazine, with pmf

    x        1     2     3     4     5     6
    p(x)  1/15  2/15  3/15  4/15  3/15  2/15

   Suppose the store owner actually pays $2.00 for each copy of the magazine and the price to customers is $4.00. If magazines left at the end of the week have no salvage value, is it better to order three or four copies of the magazine? [Hint: For both three and four copies ordered, express net revenue as a function of demand X, and then compute the expected revenue.]

36.
Let X be the damage incurred (in $) in a certain type of accident during a given year. Possible X values are 0, 1000, 5000, and 10000, with probabilities .8, .1, .08, and .02, respectively. A particular company offers a $500 deductible policy. If the company wishes its expected profit to be $100, what premium amount should it charge?

37. The n candidates for a job have been ranked 1, 2, 3, . . . , n. Let X = the rank of a randomly selected candidate, so that X has pmf

\[
p(x) = \begin{cases}
1/n & x = 1, 2, 3, \ldots, n \\
0 & \text{otherwise}
\end{cases}
\]

   (this is called the discrete uniform distribution). Compute E(X) and V(X) using the shortcut formula. [Hint: The sum of the first n positive integers is n(n + 1)/2, whereas the sum of their squares is n(n + 1)(2n + 1)/6.]

38. Let X = the outcome when a fair die is rolled once. If before the die is rolled you are offered either (1/3.5) dollars or h(X) = 1/X dollars, would you accept the guaranteed amount or would you gamble? [Note: It is not generally true that 1/E(X) = E(1/X).]

39. A chemical supply company currently has in stock 100 lb of a certain chemical, which it sells to customers in 5-lb batches. Let X = the number of batches ordered by a randomly chosen customer, and suppose that X has pmf

    x      1   2   3   4
    p(x)  .2  .4  .3  .1

   Compute E(X) and V(X). Then compute the expected number of pounds left after the next customer’s order is shipped and the variance of the number of pounds left. [Hint: The number of pounds left is a linear function of X.]

40. a. Draw a line graph of the pmf of X in Exercise 35. Then determine the pmf of −X and draw its line graph. From these two pictures, what can you say about V(X) and V(−X)?
   b. Use the proposition involving V(aX + b) to establish a general relationship between V(X) and V(−X).

41. Use the definition in Expression (3.13) to prove that V(aX + b) = a² · σ_X². [Hint: With h(X) = aX + b, E[h(X)] = aμ + b where μ = E(X).]

42. Suppose E(X) = 5 and E[X(X − 1)] = 27.5. What is
   a. E(X²)?
[Hint: E[X(X − 1)] = E[X² − X] = E(X²) − E(X)]
   b. V(X)?
   c. The general relationship among the quantities E(X), E[X(X − 1)], and V(X)?

43. Write a general rule for E(X − c) where c is a constant. What happens when you let c = μ, the expected value of X?

44. A result called Chebyshev’s inequality states that for any probability distribution of an rv X and any number k that is at least 1, P(|X − μ| ≥ kσ) ≤ 1/k². In words, the probability that the value of X lies at least k standard deviations from its mean is at most 1/k².
   a. What is the value of the upper bound for k = 2? k = 3? k = 4? k = 5? k = 10?
   b. Compute μ and σ for the distribution of Exercise 13. Then evaluate P(|X − μ| ≥ kσ) for the values of k given in part (a). What does this suggest about the upper bound relative to the corresponding probability?
   c. Let X have possible values −1, 0, and 1, with probabilities 1/18, 8/9, and 1/18, respectively. What is P(|X − μ| ≥ 3σ), and how does it compare to the corresponding bound?
   d. Give a distribution for which P(|X − μ| ≥ 5σ) = .04.

45. If a ≤ X ≤ b, show that a ≤ E(X) ≤ b.

3.4 The Binomial Probability Distribution

There are many experiments that conform either exactly or approximately to the following list of requirements:
1. The experiment consists of a sequence of n smaller experiments called trials, where n is fixed in advance of the experiment.
2. Each trial can result in one of the same two possible outcomes (dichotomous trials), which we generically denote by success (S) and failure (F).
3. The trials are independent, so that the outcome on any particular trial does not influence the outcome on any other trial.
4. The probability of success P(S) is constant from trial to trial; we denote this probability by p.

DEFINITION
An experiment for which Conditions 1–4 are satisfied is called a binomial experiment.
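To make Conditions 1–4 concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not a construction from the text: the helper `run_binomial_experiment` is a hypothetical name, the values n = 10 and p = .75 are chosen only for the demonstration, and pseudo-random draws stand in for independent trials.

```python
import random


def run_binomial_experiment(n, p, rng):
    """Run n independent S/F trials with constant P(S) = p; return the outcomes."""
    # Condition 1: n is fixed in advance.  Condition 2: each trial is S or F.
    # Conditions 3 and 4: every trial uses a fresh draw and the same p.
    return ["S" if rng.random() < p else "F" for _ in range(n)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n, p = 10, 0.75          # illustrative values only
outcomes = run_binomial_experiment(n, p, rng)
x = outcomes.count("S")  # number of successes among the n trials

# Repeating the whole experiment many times gives a long-run average count.
reps = 20_000
avg = sum(run_binomial_experiment(n, p, rng).count("S") for _ in range(reps)) / reps
print(outcomes, x, round(avg, 2))
```

The count of S's in one run is a single observed value of the random variable of interest; the long-run average over many runs should settle near n · p.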
PROPOSITION
If X ~ Bin(n, p), then E(X) = np, V(X) = np(1 − p) = npq, and σ_X = √(npq) (where q = 1 − p).

Thus, calculating the mean and variance of a binomial rv does not necessitate evaluating summations. The proof of the result for E(X) is sketched in Exercise 64.

Example 3.34
If 75% of all purchases at a certain store are made with a credit card and X is the number among ten randomly selected purchases made with a credit card, then X ~ Bin(10, .75). Thus E(X) = np = (10)(.75) = 7.5, V(X) = npq = 10(.75)(.25) = 1.875, and σ = √1.875 = 1.37. Again, even though X can take on only integer values, E(X) need not be an integer. If we perform a large number of independent binomial experiments, each with n = 10 trials and p = .75, then the average number of S’s per experiment will be close to 7.5.

The probability that X is within 1 standard deviation of its mean value is P(7.5 − 1.37 ≤ X ≤ 7.5 + 1.37) = P(6.13 ≤ X ≤ 8.87) = P(X = 7 or 8) = .532.
■

EXERCISES Section 3.4 (46–67)

46. Compute the following binomial probabilities directly from the formula for b(x; n, p):
   a. b(3; 8, .35)
   b. b(5; 8, .6)
   c. P(3 ≤ X ≤ 5) when n = 7 and p = .6
   d. P(1 ≤ X) when n = 9 and p = .1

47. Use Appendix Table A.1 to obtain the following probabilities:
   a. B(4; 15, .3)
   b. b(4; 15, .3)
   c. b(6; 15, .7)
   d. P(2 ≤ X ≤ 4) when X ~ Bin(15, .3)
   e. P(2 ≤ X) when X ~ Bin(15, .3)
   f. P(X ≤ 1) when X ~ Bin(15, .7)
   g. P(2 < X < 6) when X ~ Bin(15, .3)

48. When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of defectives is 5%.
Let X = the number of defective boards in a random sample of size n = 25, so X ~ Bin(25, .05).
   a. Determine P(X ≤ 2).
   b. Determine P(X ≥ 5).
   c. Determine P(1 ≤ X ≤ 4).
   d. What is the probability that none of the 25 boards is defective?
   e. Calculate the expected value and standard deviation of X.

49. A company that produces fine crystal knows from experience that 10% of its goblets have cosmetic flaws and must be classified as “seconds.”
   a. Among six randomly selected goblets, how likely is it that only one is a second?
   b. Among six randomly selected goblets, what is the probability that at least two are seconds?
   c. If goblets are examined one by one, what is the probability that at most five must be selected to find four that are not seconds?

50. A particular telephone number is used to receive both voice calls and fax messages. Suppose that 25% of the incoming calls involve fax messages, and consider a sample of 25 incoming calls. What is the probability that
   a. At most 6 of the calls involve a fax message?
   b. Exactly 6 of the calls involve a fax message?
   c. At least 6 of the calls involve a fax message?
   d. More than 6 of the calls involve a fax message?

51. Refer to the previous exercise.
   a. What is the expected number of calls among the 25 that involve a fax message?
   b. What is the standard deviation of the number among the 25 calls that involve a fax message?
   c. What is the probability that the number of calls among the 25 that involve a fax transmission exceeds the expected number by more than 2 standard deviations?

52. Suppose that 30% of all students who have to buy a text for a particular course want a new copy (the successes!), whereas the other 70% want a used copy. Consider randomly selecting 25 purchasers.
   a. What are the mean value and standard deviation of the number who want a new copy of the book?
   b. What is the probability that the number who want new copies is more than two standard deviations away from the mean value?
c. The bookstore has 15 new copies and 15 used copies in stock. If 25 people come in one by one to purchase this text, what is the probability that all 25 will get the type of book they want from current stock? [Hint: Let X = the number who want a new copy. For what values of X will all 25 get what they want?] d. Suppose that new copies cost $100 and used copies cost $70. Assume the bookstore currently has 50 new copies and 50 used copies. What is the expected value of total revenue from the sale of the next 25 copies purchased? Be sure to indicate what rule of expected value you are using. [Hint: Let h(X) = the revenue when X of the 25 purchasers want new copies. Express this as a linear function.]
53. Exercise 30 (Section 3.3) gave the pmf of Y, the number of traffic citations for a randomly selected individual insured by a particular company. What is the probability that among 15 randomly chosen such individuals a. At least 10 have no citations? b. Fewer than half have at least one citation? c. The number that have at least one citation is between 5 and 10, inclusive?*
54. A particular type of tennis racket comes in a midsize version and an oversize version. Sixty percent of all customers at a certain store want the oversize version. a. Among ten randomly selected customers who want this type of racket, what is the probability that at least six want the oversize version? b.
Among ten randomly selected customers, what is the probability that the number who want the oversize version is within 1 standard deviation of the mean value? c. The store currently has seven rackets of each version. What is the probability that all of the next ten customers who want this racket can get the version they want from current stock?
55. Twenty percent of all telephones of a certain type are submitted for service while under warranty. Of these, 60% can be repaired, whereas the other 40% must be replaced with new units. If a company purchases ten of these telephones, what is the probability that exactly two will end up being replaced under warranty?
56. The College Board reports that 2% of the 2 million high school students who take the SAT each year receive special accommodations because of documented disabilities (Los Angeles Times, July 16, 2002). Consider a random sample of 25 students who have recently taken the test. a. What is the probability that exactly 1 received a special accommodation? b. What is the probability that at least 1 received a special accommodation? c. What is the probability that at least 2 received a special accommodation? d. What is the probability that the number among the 25 who received a special accommodation is within 2 standard deviations of the number you would expect to be accommodated? e. Suppose that a student who does not receive a special accommodation is allowed 3 hours for the exam, whereas an accommodated student is allowed 4.5 hours. What would you expect the average time allowed the 25 selected students to be?
* “Between a and b, inclusive” is equivalent to (a ≤ X ≤ b).
57. Suppose that 90% of all batteries from a certain supplier have acceptable voltages. A certain type of flashlight requires two type-D batteries, and the flashlight will work only if both its batteries have acceptable voltages. Among ten randomly selected flashlights, what is the probability that at least nine will work?
What assumptions did you make in the course of answering the question posed?
58. A very large batch of components has arrived at a distributor. The batch can be characterized as acceptable only if the proportion of defective components is at most .10. The distributor decides to randomly select 10 components and to accept the batch only if the number of defective components in the sample is at most 2. a. What is the probability that the batch will be accepted when the actual proportion of defectives is .01? .05? .10? .20? .25? b. Let p denote the actual proportion of defectives in the batch. A graph of P(batch is accepted) as a function of p, with p on the horizontal axis and P(batch is accepted) on the vertical axis, is called the operating characteristic curve for the acceptance sampling plan. Use the results of part (a) to sketch this curve for 0 ≤ p ≤ 1. c. Repeat parts (a) and (b) with “1” replacing “2” in the acceptance sampling plan. d. Repeat parts (a) and (b) with “15” replacing “10” in the acceptance sampling plan. e. Which of the three sampling plans, that of part (a), (c), or (d), appears most satisfactory, and why?
59. An ordinance requiring that a smoke detector be installed in all previously constructed houses has been in effect in a particular city for 1 year. The fire department is concerned that many houses remain without detectors. Let p = the true proportion of such houses having detectors, and suppose that a random sample of 25 homes is inspected. If the sample strongly indicates that fewer than 80% of all houses have a detector, the fire department will campaign for a mandatory inspection program. Because of the costliness of the program, the department prefers not to call for such inspections unless sample evidence strongly argues for their necessity. Let X denote the number of homes with detectors among the 25 sampled. Consider rejecting the claim that p ≥ .8 if x ≤ 15. a.
What is the probability that the claim is rejected when the actual value of p is .8? b. What is the probability of not rejecting the claim when p = .7? When p = .6? c. How do the “error probabilities” of parts (a) and (b) change if the value 15 in the decision rule is replaced by 14?
60. A toll bridge charges $1.00 for passenger cars and $2.50 for other vehicles. Suppose that during daytime hours, 60% of all vehicles are passenger cars. If 25 vehicles cross the bridge during a particular daytime period, what is the resulting expected toll revenue? [Hint: Let X = the number of passenger cars; then the toll revenue h(X) is a linear function of X.]
61. A student who is trying to write a paper for a course has a choice of two topics, A and B. If topic A is chosen, the student will order two books through interlibrary loan, whereas if topic B is chosen, the student will order four books. The student believes that a good paper necessitates receiving and using at least half the books ordered for either topic chosen. If the probability that a book ordered through interlibrary loan actually arrives in time is .9 and books arrive independently of one another, which topic should the student choose to maximize the probability of writing a good paper? What if the arrival probability is only .5 instead of .9?
62. a. For fixed n, are there values of p (0 ≤ p ≤ 1) for which V(X) = 0? Explain why this is so. b. For what value of p is V(X) maximized?
[Hint: Either graph V(X) as a function of p or else take a derivative.]
63. a. Show that b(x; n, 1 − p) = b(n − x; n, p). b. Show that B(x; n, 1 − p) = 1 − B(n − x − 1; n, p). [Hint: At most x S’s is equivalent to at least (n − x) F’s.] c. What do parts (a) and (b) imply about the necessity of including values of p greater than .5 in Appendix Table A.1?
64. Show that E(X) = np when X is a binomial random variable. [Hint: First express E(X) as a sum with lower limit x = 1. Then factor out np, let y = x − 1 so that the sum is from y = 0 to y = n − 1, and show that the sum equals 1.]
65. Customers at a gas station pay with a credit card (A), debit card (B), or cash (C). Assume that successive customers make independent choices, with P(A) = .5, P(B) = .2, and P(C) = .3. a. Among the next 100 customers, what are the mean and variance of the number who pay with a debit card? Explain your reasoning. b. Answer part (a) for the number among the 100 who don’t pay with cash.
66. An airport limousine can accommodate up to four passengers on any one trip. The company will accept a maximum of six reservations for a trip, and a passenger must have a reservation. From previous records, 20% of all those making reservations do not appear for the trip. Answer the following questions, assuming independence wherever appropriate. a. If six reservations are made, what is the probability that at least one individual with a reservation cannot be accommodated on the trip? b. If six reservations are made, what is the expected number of available places when the limousine departs? c. Suppose the probability distribution of the number of reservations made is given in the accompanying table.

Number of reservations    3    4    5    6
Probability              .1   .2   .3   .4

Let X denote the number of passengers on a randomly selected trip. Obtain the probability mass function of X.
67. Refer to Chebyshev’s inequality given in Exercise 44.
Calculate P(|X − μ| ≥ kσ) for k = 2 and k = 3 when X ~ Bin(20, .5), and compare to the corresponding upper bound. Repeat for X ~ Bin(20, .75).

3.5 Hypergeometric and Negative Binomial Distributions

The hypergeometric and negative binomial distributions are both related to the binomial distribution. The binomial distribution is the approximate probability model for sampling without replacement from a finite dichotomous (S–F) population provided the sample size n is small relative to the population size N; the hypergeometric distribution is the exact probability model for the number of S’s in the sample. The binomial rv X is the number of S’s when the number n of trials is fixed, whereas the negative binomial distribution arises from fixing the number of S’s desired and letting the number of trials be random.

The Hypergeometric Distribution

The assumptions leading to the hypergeometric distribution are as follows:
1. The population or set to be sampled consists of N individuals, objects, or elements (a finite population).

EXERCISES Section 3.5 (68–78)

68. An electronics store has received a shipment of 20 table radios that have connections for an iPod or iPhone. Twelve of these have two slots (so they can accommodate both devices), and the other eight have a single slot. Suppose that six of the 20 radios are randomly selected to be stored under a shelf where the radios are displayed, and the remaining ones are placed in a storeroom.
Let X = the number among the radios stored under the display shelf that have two slots. a. What kind of a distribution does X have (name and values of all parameters)? b. Compute P(X = 2), P(X ≤ 2), and P(X ≥ 2). c. Calculate the mean value and standard deviation of X.
69. Each of 12 refrigerators of a certain type has been returned to a distributor because of an audible, high-pitched, oscillating noise when the refrigerators are running. Suppose that 7 of these refrigerators have a defective compressor and the other 5 have less serious problems. If the refrigerators are examined in random order, let X be the number among the first 6 examined that have a defective compressor. Compute the following: a. P(X = 5) b. P(X ≤ 4) c. The probability that X exceeds its mean value by more than 1 standard deviation. d. Consider a large shipment of 400 refrigerators, of which 40 have defective compressors. If X is the number among 15 randomly selected refrigerators that have defective compressors, describe a less tedious way to calculate (at least approximately) P(X ≤ 5) than to use the hypergeometric pmf.
70. An instructor who taught two sections of engineering statistics last term, the first with 20 students and the second with 30, decided to assign a term project. After all projects had been turned in, the instructor randomly ordered them before grading. Consider the first 15 graded projects. a. What is the probability that exactly 10 of these are from the second section? b. What is the probability that at least 10 of these are from the second section? c. What is the probability that at least 10 of these are from the same section? d. What are the mean value and standard deviation of the number among these 15 that are from the second section? e. What are the mean value and standard deviation of the number of projects not among these first 15 that are from the second section?
71. A geologist has collected 10 specimens of basaltic rock and 10 specimens of granite.
The geologist instructs a laboratory assistant to randomly select 15 of the specimens for analysis. a. What is the pmf of the number of granite specimens selected for analysis? b. What is the probability that all specimens of one of the two types of rock are selected for analysis? c. What is the probability that the number of granite specimens selected for analysis is within 1 standard deviation of its mean value?
72. A personnel director interviewing 11 senior engineers for four job openings has scheduled six interviews for the first day and five for the second day of interviewing. Assume that the candidates are interviewed in random order. a. What is the probability that x of the top four candidates are interviewed on the first day? b. How many of the top four candidates can be expected to be interviewed on the first day?
73. Twenty pairs of individuals playing in a bridge tournament have been seeded 1, . . . , 20. In the first part of the tournament, the 20 are randomly divided into 10 east–west pairs and 10 north–south pairs. a. What is the probability that x of the top 10 pairs end up playing east–west? b. What is the probability that all of the top five pairs end up playing the same direction? c. If there are 2n pairs, what is the pmf of X = the number among the top n pairs who end up playing east–west? What are E(X) and V(X)?
74. A second-stage smog alert has been called in a certain area of Los Angeles County in which there are 50 industrial firms. An inspector will visit 10 randomly selected firms to check for violations of regulations. a. If 15 of the firms are actually violating at least one regulation, what is the pmf of the number of firms visited by the inspector that are in violation of at least one regulation? b. If there are 500 firms in the area, of which 150 are in violation, approximate the pmf of part (a) by a simpler pmf. c.
For X = the number among the 10 visited that are in violation, compute E(X) and V(X) both for the exact pmf and the approximating pmf in part (b).
75. Suppose that p = P(male birth) = .5. A couple wishes to have exactly two female children in their family. They will have children until this condition is fulfilled. a. What is the probability that the family has x male children? b. What is the probability that the family has four children? c. What is the probability that the family has at most four children? d. How many male children would you expect this family to have? How many children would you expect this family to have?
76. A family decides to have children until it has three children of the same gender. Assuming P(B) = P(G) = .5, what is the pmf of X = the number of children in the family?
77. Three brothers and their wives decide to have children until each family has two female children. What is the pmf of X = the total number of male children born to the brothers? What is E(X), and how does it compare to the expected number of male children born to each brother?
78. According to the article “Characterizing the Severity and Risk of Drought in the Poudre River, Colorado” (J. of Water Res.
Planning and Mgmnt., 2005: 383–393), the drought length Y is the number of consecutive time intervals in which the water supply remains below a critical value y_0 (a deficit), preceded by and followed by periods in which the supply exceeds this critical value (a surplus). The cited paper proposes a geometric distribution with p = .409 for this random variable. a. What is the probability that a drought lasts exactly 3 intervals? At most 3 intervals? b. What is the probability that the length of a drought exceeds its mean value by at least one standard deviation?

3.6 The Poisson Probability Distribution

The binomial, hypergeometric, and negative binomial distributions were all derived by starting with an experiment consisting of trials or draws and applying the laws of probability to various outcomes of the experiment. There is no simple experiment on which the Poisson distribution is based, though we will shortly describe how it can be obtained by certain limiting operations.

DEFINITION
A discrete random variable X is said to have a Poisson distribution with parameter μ (μ > 0) if the pmf of X is

\[ p(x; \mu) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, 3, \ldots \]

It is no accident that we are using the symbol μ for the Poisson parameter; we shall see shortly that μ is in fact the expected value of X. The letter e in the pmf represents the base of the natural logarithm system; its numerical value is approximately 2.71828. In contrast to the binomial and hypergeometric distributions, the Poisson distribution spreads probability over all non-negative integers, an infinite number of possibilities.

It is not obvious by inspection that p(x; μ) specifies a legitimate pmf, let alone that this distribution is useful. First of all, p(x; μ) > 0 for every possible x value because of the requirement that μ > 0. The fact that Σ p(x; μ) = 1 is a consequence of the Maclaurin series expansion of e^μ (check your calculus book for this result):

\[ e^{\mu} = 1 + \mu + \frac{\mu^{2}}{2!} + \frac{\mu^{3}}{3!} + \cdots = \sum_{x=0}^{\infty} \frac{\mu^{x}}{x!} \tag{3.18} \]

If the two extreme terms in (3.18) are multiplied by e^{−μ} and then this quantity is moved inside the summation on the far right, the result is

\[ 1 = \sum_{x=0}^{\infty} \frac{e^{-\mu}\mu^{x}}{x!} \]

Example 3.39
Let X denote the number of creatures of a particular type captured in a trap during a given time period. Suppose that X has a Poisson distribution with μ = 4.5, so on average traps will contain 4.5 creatures. [The article “Dispersal Dynamics of the

EXERCISES Section 3.6 (79–93)

79. Let X, the number of flaws on the surface of a randomly selected boiler of a certain type, have a Poisson distribution with parameter μ = 5. Use Appendix Table A.2 to compute the following probabilities: a. P(X ≤ 8) b. P(X = 8) c. P(9 ≤ X) d. P(5 ≤ X ≤ 8) e. P(5 < X < 8)
80. Let X be the number of material anomalies occurring in a particular region of an aircraft gas-turbine disk. The article “Methodology for Probabilistic Life Prediction of Multiple-Anomaly Materials” (Amer. Inst. of Aeronautics and Astronautics J., 2006: 787–793) proposes a Poisson distribution for X. Suppose that μ = 4. a. Compute both P(X ≤ 4) and P(X < 4). b. Compute P(4 ≤ X ≤ 8). c. Compute P(8 ≤ X). d. What is the probability that the number of anomalies exceeds its mean value by no more than one standard deviation?
81.
Suppose that the number of drivers who travel between a particular origin and destination during a designated time period has a Poisson distribution with parameter μ = 20 (suggested in the article “Dynamic Ride Sharing: Theory and Practice,” J. of Transp. Engr., 1997: 308–312). What is the probability that the number of drivers will a. Be at most 10? b. Exceed 20? c. Be between 10 and 20, inclusive? Be strictly between 10 and 20? d. Be within 2 standard deviations of the mean value?
82. Consider writing onto a computer disk and then sending it through a certifier that counts the number of missing pulses. Suppose this number X has a Poisson distribution with parameter μ = .2. (Suggested in “Average Sample Number for Semi-Curtailed Sampling Using the Poisson Distribution,” J. Quality Technology, 1983: 126–129.) a. What is the probability that a disk has exactly one missing pulse? b. What is the probability that a disk has at least two missing pulses? c. If two disks are independently selected, what is the probability that neither contains a missing pulse?
83. An article in the Los Angeles Times (Dec. 3, 1993) reports that 1 in 200 people carry the defective gene that causes inherited colon cancer. In a sample of 1000 individuals, what is the approximate distribution of the number who carry this gene? Use this distribution to calculate the approximate probability that a. Between 5 and 8 (inclusive) carry the gene. b. At least 8 carry the gene.
84. Suppose that only .10% of all computers of a certain type experience CPU failure during the warranty period. Consider a sample of 10,000 computers. a. What are the expected value and standard deviation of the number of computers in the sample that have the defect? b. What is the (approximate) probability that more than 10 sampled computers have the defect? c. What is the (approximate) probability that no sampled computers have the defect?
85.
Suppose small aircraft arrive at a certain airport according to a Poisson process with rate α = 8 per hour, so that the number of arrivals during a time period of t hours is a Poisson rv with parameter μ = 8t. a. What is the probability that exactly 6 small aircraft arrive during a 1-hour period? At least 6? At least 10? b. What are the expected value and standard deviation of the number of small aircraft that arrive during a 90-min period? c. What is the probability that at least 20 small aircraft arrive during a 2.5-hour period? That at most 10 arrive during this period?
86. The number of people arriving for treatment at an emergency room can be modeled by a Poisson process with a rate parameter of five per hour. a. What is the probability that exactly four arrivals occur during a particular hour? b. What is the probability that at least four people arrive during a particular hour? c. How many people do you expect to arrive during a 45-min period?
87. The number of requests for assistance received by a towing service is a Poisson process with rate α = 4 per hour. a. Compute the probability that exactly ten requests are received during a particular 2-hour period. b. If the operators of the towing service take a 30-min break for lunch, what is the probability that they do not miss any calls for assistance? c. How many calls would you expect during their break?
88. In proof testing of circuit boards, the probability that any particular diode will fail is .01. Suppose a circuit board contains 200 diodes. a. How many diodes would you expect to fail, and what is the standard deviation of the number that are expected to fail? b. What is the (approximate) probability that at least four diodes will fail on a randomly selected board? c. If five boards are shipped to a particular customer, how likely is it that at least four of them will work properly? (A board works properly only if all its diodes work.)
89.
The article “Reliability-Based Service-Life Assessment of Aging Concrete Structures” (J. Structural Engr., 1993: 1600–1621) suggests that a Poisson process can be used to represent the occurrence of structural loads over time. Suppose the mean time between occurrences of loads is .5 year. a. How many loads can be expected to occur during a 2-year period? b. What is the probability that more than five loads occur during a 2-year period? c. How long must a time period be so that the probability of no loads occurring during that period is at most .1?
90. Let X have a Poisson distribution with parameter μ. Show that E(X) = μ directly from the definition of expected value. [Hint: The first term in the sum equals 0, and then x can be canceled. Now factor out μ and show that what is left sums to 1.]
91. Suppose that trees are distributed in a forest according to a two-dimensional Poisson process with parameter α, the expected number of trees per acre, equal to 80. a. What is the probability that in a certain quarter-acre plot, there will be at most 16 trees? b. If the forest covers 85,000 acres, what is the expected number of trees in the forest? c. Suppose you select a point in the forest and construct a circle of radius .1 mile. Let X = the number of trees within that circular region. What is the pmf of X? [Hint: 1 sq mile = 640 acres.]
92. Automobiles arrive at a vehicle equipment inspection station according to a Poisson process with rate α = 10 per hour.
Suppose that with probability .5 an arriving vehicle will have no equipment violations. a. What is the probability that exactly ten arrive during the hour and all ten have no violations? b. For any fixed y ≥ 10, what is the probability that y arrive during the hour, of which ten have no violations? c. What is the probability that ten “no-violation” cars arrive during the next hour? [Hint: Sum the probabilities in part (b) from y = 10 to ∞.]
93. a. In a Poisson process, what has to happen in both the time interval (0, t) and the interval (t, t + Δt) so that no events occur in the entire interval (0, t + Δt)? Use this and Assumptions 1–3 to write a relationship between P_0(t + Δt) and P_0(t). b. Use the result of part (a) to write an expression for the difference P_0(t + Δt) − P_0(t). Then divide by Δt and let Δt → 0 to obtain an equation involving (d/dt)P_0(t), the derivative of P_0(t) with respect to t. c. Verify that P_0(t) = e^{−αt} satisfies the equation of part (b). d. It can be shown in a manner similar to parts (a) and (b) that the P_k(t)’s must satisfy the system of differential equations

\[ \frac{d}{dt}P_k(t) = \alpha P_{k-1}(t) - \alpha P_k(t), \qquad k = 1, 2, 3, \ldots \]

Verify that P_k(t) = e^{−αt}(αt)^k/k! satisfies the system. (This is actually the only solution.)

SUPPLEMENTARY EXERCISES (94–122)

94. Consider a deck consisting of seven cards, marked 1, 2, . . . , 7. Three of these cards are selected at random. Define an rv W by W = the sum of the resulting numbers, and compute the pmf of W. Then compute μ and σ². [Hint: Consider outcomes as unordered, so that (1, 3, 7) and (3, 1, 7) are not different outcomes. Then there are 35 outcomes, and they can be listed. (This type of rv actually arises in connection with a statistical procedure called Wilcoxon’s rank-sum test, in which there is an x sample and a y sample and W is the sum of the ranks of the x’s in the combined sample; see Section 15.2.)]
95. After shuffling a deck of 52 cards, a dealer deals out 5. Let X = the number of suits represented in the five-card hand. a. Show that the pmf of X is

x         1      2      3      4
p(x)    .002   .146   .588   .264

[Hint: p(1) = 4P(all are spades), p(2) = 6P(only spades and hearts with at least one of each suit), and p(4) = 4P(2 spades ∩ one of each other suit).] b. Compute μ, σ², and σ.
96. The negative binomial rv X was defined as the number of F’s preceding the rth S. Let Y = the number of trials necessary to obtain the rth S. In the same manner in which the pmf of X was derived, derive the pmf of Y.
97. Of all customers purchasing automatic garage-door openers, 75% purchase a chain-driven model. Let X = the number among the next 15 purchasers who select the chain-driven model. a. What is the pmf of X? b. Compute P(X > 10). c. Compute P(6 ≤ X ≤ 10). d. Compute μ and σ². e. If the store currently has in stock 10 chain-driven models and 8 shaft-driven models, what is the probability that the requests of these 15 customers can all be met from existing stock?
98. A friend recently planned a camping trip. He had two flashlights, one that required a single 6-V battery and another that used two size-D batteries. He had previously packed two 6-V and four size-D batteries in his camper. Suppose the probability that any particular battery works is p and that batteries work or fail independently of one another. Our friend wants to take just one flashlight. For what values of p should he take the 6-V flashlight?
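Many of the exercises in this chapter (Exercise 97, for instance) reduce to evaluating the binomial pmf b(x; n, p), the cdf B(x; n, p), or the Poisson pmf p(x; μ). For readers who want to check hand or table calculations numerically, the following is a minimal Python sketch. It is not part of the text's method, and the function names binom_pmf, binom_cdf, and poisson_pmf are illustrative choices rather than names used by the book. The check at the end reproduces the figures of Example 3.34.

```python
from math import comb, exp, factorial, sqrt


def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x S's in n independent trials."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for X ~ Bin(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(0, min(x, n) + 1))


def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) * mu^x / x! for x = 0, 1, 2, ..."""
    return exp(-mu) * mu**x / factorial(x)


# Check against Example 3.34: X ~ Bin(10, .75)
n, p = 10, 0.75
mean = n * p                      # E(X) = np
var = n * p * (1 - p)             # V(X) = npq
sd = sqrt(var)                    # sigma = sqrt(npq)
within_one_sd = binom_pmf(7, n, p) + binom_pmf(8, n, p)  # P(X = 7 or 8)

print(mean, var, round(sd, 2), round(within_one_sd, 3))
# prints: 7.5 1.875 1.37 0.532
```

An interval probability such as P(6 ≤ X ≤ 10) can then be obtained as binom_cdf(10, n, p) - binom_cdf(5, n, p), which mirrors how Appendix Table A.1 is used.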
99. A k-out-of-n system is one that will function if and only if at least k of the n individual components in the system function. If individual components function independently of one another, each with probability .9, what is the probability that a 3-out-of-5 system functions?
100. A manufacturer of integrated circuit chips wishes to control the quality of its product by rejecting any batch in which the proportion of defective chips is too high. To this end, out of each batch (10,000 chips), 25 will be selected and tested. If at least 5 of these 25 are defective, the entire batch will be rejected. a. What is the probability that a batch will be rejected if 5% of the chips in the batch are in fact defective? b. Answer the question posed in (a) if the percentage of defective chips in the batch is 10%. c. Answer the question posed in (a) if the percentage of defective chips in the batch is 20%. d. What happens to the probabilities in (a)–(c) if the critical rejection number is increased from 5 to 6?
101. Of the people passing through an airport metal detector, .5% activate it; let X = the number among a randomly selected group of 500 who activate the detector. a. What is the (approximate) pmf of X? b. Compute P(X = 5). c. Compute P(5 ≤ X).
102. An educational consulting firm is trying to decide whether high school students who have never before used a handheld calculator can solve a certain type of problem more easily with a calculator that uses reverse Polish logic or one that does not use this logic. A sample of 25 students is selected and allowed to practice on both calculators. Then each student is asked to work one problem on the reverse Polish calculator and a similar problem on the other.
Let $p = P(S)$, where S indicates that a student worked the problem more quickly using reverse Polish logic than without, and let X = number of S’s. a. If $p = .5$, what is $P(7 \le X \le 18)$? b. If $p = .8$, what is $P(7 \le X \le 18)$? c. If the claim that $p = .5$ is to be rejected when either $x \le 7$ or $x \ge 18$, what is the probability of rejecting the claim when it is actually correct? d. If the decision to reject the claim $p = .5$ is made as in part (c), what is the probability that the claim is not rejected when $p = .6$? When $p = .8$? e. What decision rule would you choose for rejecting the claim $p = .5$ if you wanted the probability in part (c) to be at most .01?

103. Consider a disease whose presence can be identified by carrying out a blood test. Let p denote the probability that a randomly selected individual has the disease. Suppose n individuals are independently selected for testing. One way to proceed is to carry out a separate test on each of the n blood samples. A potentially more economical approach, group testing, was introduced during World War II to identify syphilitic men among army inductees. First, take a part of each blood sample, combine these specimens, and carry out a single test. If no one has the disease, the result will be negative, and only the one test is required. If at least one individual is diseased, the test on the combined sample will yield a positive result, in which case the n individual tests are then carried out. If $p = .1$ and $n = 3$, what is the expected number of tests using this procedure? What is the expected number when $n = 5$? [The article “Random Multiple-Access Communication and Group Testing” (IEEE Trans. on Commun., 1984: 769–774) applied these ideas to a communication system in which the dichotomy was active/idle user rather than diseased/nondiseased.]

104. Let $p_1$ denote the probability that any particular code symbol is erroneously transmitted through a communication system.
Assume that on different symbols, errors occur independently of one another. Suppose also that with probability $p_2$ an erroneous symbol is corrected upon receipt. Let X denote the number of correct symbols in a message block consisting of n symbols (after the correction process has ended). What is the probability distribution of X?

105. The purchaser of a power-generating unit requires c consecutive successful start-ups before the unit will be accepted. Assume that the outcomes of individual start-ups are independent of one another. Let p denote the probability that any particular start-up is successful. The random variable of interest is X = the number of start-ups that must be made prior to acceptance. Give the pmf of X for the case $c = 2$. If $p = .9$, what is $P(X \le 8)$? [Hint: For $x \ge 5$, express $p(x)$ “recursively” in terms of the pmf evaluated at the smaller values $x - 3, x - 4, \ldots, 2$.] (This problem was suggested by the article “Evaluation of a Start-Up Demonstration Test,” J. Quality Technology, 1983: 103–106.)

106. A plan for an executive travelers’ club has been developed by an airline on the premise that 10% of its current customers would qualify for membership. a. Assuming the validity of this premise, among 25 randomly selected current customers, what is the probability that between 2 and 6 (inclusive) qualify for membership? b. Again assuming the validity of the premise, what are the expected number of customers who qualify and the standard deviation of the number who qualify in a random sample of 100 current customers? c. Let X denote the number in a random sample of 25 current customers who qualify for membership. Consider rejecting the company’s premise in favor of the claim that $p > .10$ if $x \ge 7$. What is the probability that the company’s premise is rejected when it is actually valid? d. Refer to the decision rule introduced in part (c). What is the probability that the company’s premise is not rejected even though $p = .20$ (i.e., 20% qualify)?

107.
Forty percent of seeds from maize (modern-day corn) ears carry single spikelets, and the other 60% carry paired spikelets. A seed with single spikelets will produce an ear with single spikelets 29% of the time, whereas a seed with paired spikelets will produce an ear with single spikelets 26% of the time. Consider randomly selecting ten seeds. a. What is the probability that exactly five of these seeds carry a single spikelet and produce an ear with a single spikelet? b. What is the probability that exactly five of the ears produced by these seeds have single spikelets? What is the probability that at most five ears have single spikelets?

108. A trial has just resulted in a hung jury because eight members of the jury were in favor of a guilty verdict and the other four were for acquittal. If the jurors leave the jury room in random order and each of the first four leaving the room is accosted by a reporter in quest of an interview, what is the pmf of X = the number of jurors favoring acquittal among those interviewed? How many of those favoring acquittal do you expect to be interviewed?

109. A reservation service employs five information operators who receive requests for information independently of one another, each according to a Poisson process with rate $\alpha = 2$ per minute. a. What is the probability that during a given 1-min period, the first operator receives no requests? b.
What is the probability that during a given 1-min period, exactly four of the five operators receive no requests? c. Write an expression for the probability that during a given 1-min period, all of the operators receive exactly the same number of requests.

110. Grasshoppers are distributed at random in a large field according to a Poisson process with parameter $\alpha = 2$ per square yard. How large should the radius R of a circular sampling region be taken so that the probability of finding at least one in the region equals .99?

111. A newsstand has ordered five copies of a certain issue of a photography magazine. Let X = the number of individuals who come in to purchase this magazine. If X has a Poisson distribution with parameter $\mu = 4$, what is the expected number of copies that are sold?

112. Individuals A and B begin to play a sequence of chess games. Let S = {A wins a game}, and suppose that outcomes of successive games are independent with $P(S) = p$ and $P(F) = 1 - p$ (they never draw). They will play until one of them wins ten games. Let X = the number of games played (with possible values 10, 11, . . . , 19). a. For $x = 10, 11, \ldots, 19$, obtain an expression for $p(x) = P(X = x)$. b. If a draw is possible, with $p = P(S)$, $q = P(F)$, $1 - p - q = P(\text{draw})$, what are the possible values of X? What is $P(20 \le X)$? [Hint: $P(20 \le X) = 1 - P(X < 20)$.]

113. A test for the presence of a certain disease has probability .20 of giving a false-positive reading (indicating that an individual has the disease when this is not the case) and probability .10 of giving a false-negative result. Suppose that ten individuals are tested, five of whom have the disease and five of whom do not. Let X = the number of positive readings that result. a. Does X have a binomial distribution? Explain your reasoning. b. What is the probability that exactly three of the ten test results are positive?

114. The generalized negative binomial pmf is given by

$$nb(x; r, p) = k(r, x) \cdot p^r (1 - p)^x \qquad x = 0, 1, 2, \ldots$$

Let X, the number of plants of a certain species found in a particular region, have this distribution with $p = .3$ and $r = 2.5$. What is $P(X = 4)$? What is the probability that at least one plant is found?

115. There are two Certified Public Accountants in a particular office who prepare tax returns for clients. Suppose that for a particular type of complex form, the number of errors made by the first preparer has a Poisson distribution with mean value $\mu_1$, the number of errors made by the second preparer has a Poisson distribution with mean value $\mu_2$, and that each CPA prepares the same number of forms of this type. Then if a form of this type is randomly selected, the function

$$p(x; \mu_1, \mu_2) = .5\,\frac{e^{-\mu_1}\mu_1^x}{x!} + .5\,\frac{e^{-\mu_2}\mu_2^x}{x!} \qquad x = 0, 1, 2, \ldots$$

gives the pmf of X = the number of errors on the selected form. a. Verify that $p(x; \mu_1, \mu_2)$ is in fact a legitimate pmf ($\ge 0$ and sums to 1). b. What is the expected number of errors on the selected form? c. What is the variance of the number of errors on the selected form? d. How does the pmf change if the first CPA prepares 60% of all such forms and the second prepares 40%?

116. The mode of a discrete random variable X with pmf $p(x)$ is that value $x^*$ for which $p(x)$ is largest (the most probable x value). a. Let $X \sim \text{Bin}(n, p)$. By considering the ratio $b(x + 1; n, p)/b(x; n, p)$, show that $b(x; n, p)$ increases with x as long as $x < np - (1 - p)$. Conclude that the mode $x^*$ is the integer satisfying $(n + 1)p - 1 \le x^* \le (n + 1)p$. b. Show that if X has a Poisson distribution with parameter $\mu$, the mode is the largest integer less than $\mu$. If $\mu$ is an integer, show that both $\mu - 1$ and $\mu$ are modes.

117. A computer disk storage device has ten concentric tracks, numbered 1, 2, . . . , 10 from outermost to innermost, and a single access arm. Let $p_i$ = the probability that any particular request for data will take the arm to track i ($i = 1, \ldots, 10$). Assume that the tracks accessed in successive seeks are independent.
Let X = the number of tracks over which the access arm passes during two successive requests (excluding the track that the arm has just left, so possible X values are $x = 0, 1, \ldots, 9$). Compute the pmf of X. [Hint: $P(\text{the arm is now on track } i \text{ and } X = j) = P(X = j \mid \text{arm now on } i) \cdot p_i$. After the conditional probability is written in terms of $p_1, \ldots, p_{10}$, by the law of total probability, the desired probability is obtained by summing over i.]

118. If X is a hypergeometric rv, show directly from the definition that $E(X) = nM/N$ (consider only the case $n < M$). [Hint: Factor $nM/N$ out of the sum for $E(X)$, and show that the terms inside the sum are of the form $h(y; n - 1, M - 1, N - 1)$, where $y = x - 1$.]

119. Use the fact that

$$\sum_{\text{all } x} (x - \mu)^2 p(x) \ge \sum_{x:\,|x - \mu| \ge k\sigma} (x - \mu)^2 p(x)$$

to prove Chebyshev’s inequality given in Exercise 44.

120. The simple Poisson process of Section 3.6 is characterized by a constant rate $\alpha$ at which events occur per unit time. A generalization of this is to suppose that the probability of exactly one event occurring in the interval $[t, t + \Delta t]$ is $\alpha(t) \cdot \Delta t + o(\Delta t)$. It can then be shown that the number of events occurring during an interval $[t_1, t_2]$ has a Poisson distribution with parameter

$$\mu = \int_{t_1}^{t_2} \alpha(t)\,dt$$

The occurrence of events over time in this situation is called a nonhomogeneous Poisson process. The article “Inference Based on Retrospective Ascertainment,” J. Amer. Stat.
Assoc., 1989: 360–372, considers the intensity function $\alpha(t) = e^{a + bt}$ as appropriate for events involving transmission of HIV (the AIDS virus) via blood transfusions. Suppose that $a = 2$ and $b = .6$ (close to values suggested in the paper), with time in years. a. What is the expected number of events in the interval [0, 4]? In [2, 6]? b. What is the probability that at most 15 events occur in the interval [0, .9907]?

121. Consider a collection $A_1, \ldots, A_k$ of mutually exclusive and exhaustive events, and a random variable X whose distribution depends on which of the $A_i$’s occurs (e.g., a commuter might select one of three possible routes from home to work, with X representing the commute time). Let $E(X \mid A_i)$ denote the expected value of X given that the event $A_i$ occurs. Then it can be shown that $E(X) = \sum E(X \mid A_i) \cdot P(A_i)$, the weighted average of the individual “conditional expectations” where the weights are the probabilities of the partitioning events. a. The expected duration of a voice call to a particular telephone number is 3 minutes, whereas the expected duration of a data call to that same number is 1 minute. If 75% of all calls are voice calls, what is the expected duration of the next call? b. A deli sells three different types of chocolate chip cookies. The number of chocolate chips in a type i cookie has a Poisson distribution with parameter $\mu_i = i + 1$ ($i = 1, 2, 3$). If 20% of all customers purchasing a chocolate chip cookie select the first type, 50% choose the second type, and the remaining 30% opt for the third type, what is the expected number of chips in a cookie purchased by the next customer?

122. Consider a communication source that transmits packets containing digitized speech. After each transmission, the receiver sends a message indicating whether the transmission was successful or unsuccessful. If a transmission is unsuccessful, the packet is re-sent. Suppose a voice packet can be transmitted a maximum of 10 times.
Assuming that the results of successive transmissions are independent of one another and that the probability of any particular transmission being successful is p, determine the probability mass function of the rv X = the number of times a packet is transmitted. Then obtain an expression for the expected number of times a packet is transmitted.

Bibliography

Johnson, Norman, Samuel Kotz, and Adrienne Kemp, Discrete Univariate Distributions, Wiley, New York, 1992. An encyclopedia of information on discrete distributions.

Olkin, Ingram, Cyrus Derman, and Leon Gleser, Probability Models and Applications (2nd ed.), Macmillan, New York, 1994. Contains an in-depth discussion of both general properties of discrete and continuous distributions and results for specific distributions.

Ross, Sheldon, Introduction to Probability Models (9th ed.), Academic Press, New York, 2007. A good source of material on the Poisson process and generalizations and a nice introduction to other topics in applied probability.
142 CHAPTER 4 Continuous Random Variables and Probability Distributions

The probability that headway time is at most 5 sec is

$$P(X \le 5) = \int_{-\infty}^{5} f(x)\,dx = \int_{.5}^{5} .15e^{-.15(x - .5)}\,dx = .15e^{.075}\int_{.5}^{5} e^{-.15x}\,dx = .15e^{.075} \cdot \left(-\frac{1}{.15}e^{-.15x}\right)\Bigg|_{x=.5}^{x=5}$$

$$= e^{.075}(-e^{-.75} + e^{-.075}) = 1.078(-.472 + .928) = .491 = P(\text{less than 5 sec}) = P(X < 5) \qquad ■$$

Unlike discrete distributions such as the binomial, hypergeometric, and negative binomial, the distribution of any given continuous rv cannot usually be derived using simple probabilistic arguments. Instead, one must make a judicious choice of pdf based on prior knowledge and available data. Fortunately, there are some general families of pdf’s that have been found to be sensible candidates in a wide variety of experimental situations; several of these are discussed later in the chapter.

Just as in the discrete case, it is often helpful to think of the population of interest as consisting of X values rather than individuals or objects. The pdf is then a model for the distribution of values in this numerical population, and from this model various population characteristics (such as the mean) can be calculated.

EXERCISES Section 4.1 (1–10)

1. The current in a certain circuit as measured by an ammeter is a continuous random variable X with the following density function:

$$f(x) = \begin{cases} .075x + .2 & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$

a. Graph the pdf and verify that the total area under the density curve is indeed 1. b. Calculate $P(X \le 4)$. How does this probability compare to $P(X < 4)$? c. Calculate $P(3.5 \le X \le 4.5)$ and also $P(4.5 < X)$.

2. Suppose the reaction temperature X (in °C) in a certain chemical process has a uniform distribution with $A = -5$ and $B = 5$. a. Compute $P(X < 0)$. b. Compute $P(-2.5 < X < 2.5)$. c. Compute $P(-2 \le X \le 3)$. d. For k satisfying $-5 < k < k + 4 < 5$, compute $P(k < X < k + 4)$.

3. The error involved in making a certain measurement is a continuous rv X with pdf

$$f(x) = \begin{cases} .09375(4 - x^2) & -2 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

a. Sketch the graph of $f(x)$. b. Compute $P(X > 0)$. c. Compute $P(-1 < X < 1)$. d. Compute $P(X < -.5 \text{ or } X > .5)$.

4. Let X denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. The article “Blade Fatigue Life Assessment with Application to VAWTS” (J. of Solar Energy Engr., 1982: 107–111) proposes the Rayleigh distribution, with pdf

$$f(x; \theta) = \begin{cases} \dfrac{x}{\theta^2} \cdot e^{-x^2/(2\theta^2)} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

as a model for the X distribution. a. Verify that $f(x; \theta)$ is a legitimate pdf. b. Suppose $\theta = 100$ (a value suggested by a graph in the article). What is the probability that X is at most 200? Less than 200? At least 200? c. What is the probability that X is between 100 and 200 (again assuming $\theta = 100$)? d. Give an expression for $P(X \le x)$.

5. A college professor never finishes his lecture before the end of the hour and always finishes his lectures within 2 min after the hour. Let X = the time that elapses between the end of the hour and the end of the lecture and suppose the pdf of X is

$$f(x) = \begin{cases} kx^2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

a. Find the value of k and draw the corresponding density curve. [Hint: Total area under the graph of $f(x)$ is 1.] b. What is the probability that the lecture ends within 1 min of the end of the hour? c. What is the probability that the lecture continues beyond the hour for between 60 and 90 sec? d.
What is the probability that the lecture continues for at least 90 sec beyond the end of the hour?

6. The actual tracking weight of a stereo cartridge that is set to track at 3 g on a particular changer can be regarded as a continuous rv X with pdf

$$f(x) = \begin{cases} k[1 - (x - 3)^2] & 2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$

a. Sketch the graph of $f(x)$. b. Find the value of k. c. What is the probability that the actual tracking weight is greater than the prescribed weight? d. What is the probability that the actual weight is within .25 g of the prescribed weight? e. What is the probability that the actual weight differs from the prescribed weight by more than .5 g?

7. The time X (min) for a lab assistant to prepare the equipment for a certain experiment is believed to have a uniform distribution with $A = 25$ and $B = 35$. a. Determine the pdf of X and sketch the corresponding density curve. b. What is the probability that preparation time exceeds 33 min? c. What is the probability that preparation time is within 2 min of the mean time? [Hint: Identify $\mu$ from the graph of $f(x)$.] d. For any a such that $25 < a < a + 2 < 35$, what is the probability that preparation time is between a and a + 2 min?

8. In commuting to work, a professor must first get on a bus near her house and then transfer to a second bus. If the waiting time (in minutes) at each stop has a uniform distribution with $A = 0$ and $B = 5$, then it can be shown that the total waiting time Y has the pdf

$$f(y) = \begin{cases} \dfrac{1}{25}y & 0 \le y < 5 \\ \dfrac{2}{5} - \dfrac{1}{25}y & 5 \le y \le 10 \\ 0 & y < 0 \text{ or } y > 10 \end{cases}$$

a. Sketch a graph of the pdf of Y. b. Verify that $\int_{-\infty}^{\infty} f(y)\,dy = 1$. c. What is the probability that total waiting time is at most 3 min? d. What is the probability that total waiting time is at most 8 min? e. What is the probability that total waiting time is between 3 and 8 min? f. What is the probability that total waiting time is either less than 2 min or more than 6 min?

9. Consider again the pdf of X = time headway given in Example 4.5.
What is the probability that time headway is a. At most 6 sec? b. More than 6 sec? At least 6 sec? c. Between 5 and 6 sec?

10. A family of pdf’s that has been used to approximate the distribution of income, city population size, and size of firms is the Pareto family. The family has two parameters, k and $\theta$, both > 0, and the pdf is

$$f(x; k, \theta) = \begin{cases} \dfrac{k \cdot \theta^k}{x^{k+1}} & x \ge \theta \\ 0 & x < \theta \end{cases}$$

a. Sketch the graph of $f(x; k, \theta)$. b. Verify that the total area under the graph equals 1. c. If the rv X has pdf $f(x; k, \theta)$, for any fixed $b > \theta$, obtain an expression for $P(X \le b)$. d. For $\theta < a < b$, obtain an expression for the probability $P(a \le X \le b)$.

4.2 Cumulative Distribution Functions and Expected Values

Several of the most important concepts introduced in the study of discrete distributions also play an important role for continuous distributions. Definitions analogous to those in Chapter 3 involve replacing summation by integration.

The Cumulative Distribution Function

The cumulative distribution function (cdf) $F(x)$ for a discrete rv X gives, for any specified number x, the probability $P(X \le x)$. It is obtained by summing the pmf $p(y)$ over all possible values y satisfying $y \le x$. The cdf of a continuous rv gives the same probabilities $P(X \le x)$ and is obtained by integrating the pdf $f(y)$ between the limits $-\infty$ and x.
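As a rough numerical illustration of obtaining a cdf by integrating a pdf from $-\infty$ up to x, the sketch below uses the time-headway pdf of Example 4.5, $f(x) = .15e^{-.15(x - .5)}$ for $x \ge .5$ (and 0 otherwise). The function names `headway_pdf`, `cdf_numeric`, and `headway_cdf`, as well as the choice of Simpson's rule, are assumptions made for this illustration and do not come from the text; the closed form $F(x) = 1 - e^{-.15(x - .5)}$ is what direct integration of that pdf gives.

```python
import math


def headway_pdf(x):
    """pdf of Example 4.5: .15*exp(-.15*(x - .5)) for x >= .5, and 0 otherwise."""
    return 0.15 * math.exp(-0.15 * (x - 0.5)) if x >= 0.5 else 0.0


def cdf_numeric(pdf, x, lower, steps=10_000):
    """Approximate F(x) by integrating pdf from `lower` to x.

    `lower` is the smallest possible value of the rv, so the integral from
    minus infinity reduces to an integral starting at `lower`.
    Uses composite Simpson's rule (illustrative choice only).
    """
    if x <= lower:
        return 0.0
    if steps % 2:          # Simpson's rule needs an even number of subintervals
        steps += 1
    h = (x - lower) / steps
    total = pdf(lower) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lower + i * h)
    return total * h / 3


def headway_cdf(x):
    """Closed-form cdf obtained by integrating headway_pdf analytically."""
    return 1 - math.exp(-0.15 * (x - 0.5)) if x >= 0.5 else 0.0


p_numeric = cdf_numeric(headway_pdf, 5.0, lower=0.5)
p_exact = headway_cdf(5.0)
# Both values round to 0.491, the probability P(X <= 5) found in Example 4.5.
print(round(p_numeric, 3), round(p_exact, 3))
```

The same `cdf_numeric` helper could be pointed at any of the pdf's in Exercises 1–10 by supplying the pdf and the left endpoint of its support, which gives a quick check on answers obtained by hand.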
DEFINITION The variance of a continuous random variable X with pdf $f(x)$ and mean value $\mu$ is

$$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = E[(X - \mu)^2]$$

The standard deviation (SD) of X is $\sigma_X = \sqrt{V(X)}$.

The variance and standard deviation give quantitative measures of how much spread there is in the distribution or population of x values. Again $\sigma$ is roughly the size of a typical deviation from $\mu$. Computation of $\sigma^2$ is facilitated by using the same shortcut formula employed in the discrete case.

PROPOSITION $V(X) = E(X^2) - [E(X)]^2$

Example 4.12 (Example 4.10 continued) For X = weekly gravel sales, we computed $E(X) = \frac{3}{8}$. Since

$$E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx = \int_0^1 x^2 \cdot \frac{3}{2}(1 - x^2)\,dx = \int_0^1 \frac{3}{2}(x^2 - x^4)\,dx = \frac{1}{5}$$

$$V(X) = \frac{1}{5} - \left(\frac{3}{8}\right)^2 = \frac{19}{320} = .059 \quad \text{and} \quad \sigma_X = .244 \qquad ■$$

When $h(X) = aX + b$, the expected value and variance of $h(X)$ satisfy the same properties as in the discrete case: $E[h(X)] = a\mu + b$ and $V[h(X)] = a^2 \cdot \sigma^2$.

EXERCISES Section 4.2 (11–27)

11. Let X denote the amount of time a book on two-hour reserve is actually checked out, and suppose the cdf is

$$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^2}{4} & 0 \le x < 2 \\ 1 & 2 \le x \end{cases}$$

Use the cdf to obtain the following: a. $P(X \le 1)$ b. $P(.5 \le X \le 1)$ c. $P(X > 1.5)$ d. The median checkout duration $\tilde{\mu}$ [solve $.5 = F(\tilde{\mu})$] e. $F'(x)$ to obtain the density function $f(x)$ f. $E(X)$ g. $V(X)$ and $\sigma_X$ h. If the borrower is charged an amount $h(X) = X^2$ when checkout duration is X, compute the expected charge $E[h(X)]$.

12. The cdf for X (= measurement error) of Exercise 3 is

$$F(x) = \begin{cases} 0 & x < -2 \\ \dfrac{1}{2} + \dfrac{3}{32}\left(4x - \dfrac{x^3}{3}\right) & -2 \le x < 2 \\ 1 & 2 \le x \end{cases}$$

a. Compute $P(X < 0)$. b. Compute $P(-1 < X < 1)$. c. Compute $P(.5 < X)$. d. Verify that $f(x)$ is as given in Exercise 3 by obtaining $F'(x)$. e. Verify that $\tilde{\mu} = 0$.

13. Example 4.5 introduced the concept of time headway in traffic flow and proposed a particular distribution for X = the headway between two randomly selected consecutive cars (sec).
Suppose that in a different traffic environment, the distribution of time headway has the form

$$f(x) = \begin{cases} \dfrac{k}{x^4} & x > 1 \\ 0 & x \le 1 \end{cases}$$

a. Determine the value of k for which $f(x)$ is a legitimate pdf. b. Obtain the cumulative distribution function. c. Use the cdf from (b) to determine the probability that headway exceeds 2 sec and also the probability that headway is between 2 and 3 sec. d. Obtain the mean value of headway and the standard deviation of headway. e. What is the probability that headway is within 1 standard deviation of the mean value?

14. The article “Modeling Sediment and Water Column Interactions for Hydrophobic Pollutants” (Water Research, 1984: 1169–1174) suggests the uniform distribution on the interval (7.5, 20) as a model for depth (cm) of the bioturbation layer in sediment in a certain region. a. What are the mean and variance of depth? b. What is the cdf of depth? c. What is the probability that observed depth is at most 10? Between 10 and 15? d. What is the probability that the observed depth is within 1 standard deviation of the mean value? Within 2 standard deviations?

15. Let X denote the amount of space occupied by an article placed in a 1-ft³ packing container. The pdf of X is

$$f(x) = \begin{cases} 90x^8(1 - x) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

a. Graph the pdf. Then obtain the cdf of X and graph it. b. What is $P(X \le .5)$ [i.e., $F(.5)$]? c. Using the cdf from (a), what is $P(.25 < X \le .5)$? What is $P(.25 \le X \le .5)$? d. What is the 75th percentile of the distribution? e.
Compute $E(X)$ and $\sigma_X$. f. What is the probability that X is more than 1 standard deviation from its mean value?

16. Answer parts (a)–(f) of Exercise 15 with X = lecture time past the hour given in Exercise 5.

17. Let X have a uniform distribution on the interval [A, B]. a. Obtain an expression for the (100p)th percentile. b. Compute $E(X)$, $V(X)$, and $\sigma_X$. c. For n, a positive integer, compute $E(X^n)$.

18. Let X denote the voltage at the output of a microphone, and suppose that X has a uniform distribution on the interval from −1 to 1. The voltage is processed by a “hard limiter” with cutoff values −.5 and .5, so the limiter output is a random variable Y related to X by $Y = X$ if $|X| \le .5$, $Y = .5$ if $X > .5$, and $Y = -.5$ if $X < -.5$. a. What is $P(Y = .5)$? b. Obtain the cumulative distribution function of Y and graph it.

19. Let X be a continuous rv with cdf

$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x}{4}\left[1 + \ln\left(\dfrac{4}{x}\right)\right] & 0 < x \le 4 \\ 1 & x > 4 \end{cases}$$

[This type of cdf is suggested in the article “Variability in Measured Bedload-Transport Rates” (Water Resources Bull., 1985: 39–48) as a model for a certain hydrologic variable.] What is a. $P(X \le 1)$? b. $P(1 \le X \le 3)$? c. The pdf of X?

20. Consider the pdf for total waiting time Y for two buses

$$f(y) = \begin{cases} \dfrac{1}{25}y & 0 \le y < 5 \\ \dfrac{2}{5} - \dfrac{1}{25}y & 5 \le y \le 10 \\ 0 & \text{otherwise} \end{cases}$$

introduced in Exercise 8. a. Compute and sketch the cdf of Y. [Hint: Consider separately $0 \le y < 5$ and $5 \le y \le 10$ in computing $F(y)$. A graph of the pdf should be helpful.] b. Obtain an expression for the (100p)th percentile. [Hint: Consider separately $0 < p < .5$ and $.5 < p < 1$.] c. Compute $E(Y)$ and $V(Y)$. How do these compare with the expected waiting time and variance for a single bus when the time is uniformly distributed on [0, 5]?

21. An ecologist wishes to mark off a circular sampling region having radius 10 m. However, the radius of the resulting region is actually a random variable R with pdf

$$f(r) = \begin{cases} \dfrac{3}{4}[1 - (10 - r)^2] & 9 \le r \le 11 \\ 0 & \text{otherwise} \end{cases}$$

What is the expected area of the resulting circular region?

22.
The weekly demand for propane gas (in 1000s of gallons) from a particular facility is an rv X with pdf

$$f(x) = \begin{cases} 2\left(1 - \dfrac{1}{x^2}\right) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

a. Compute the cdf of X. b. Obtain an expression for the (100p)th percentile. What is the value of $\tilde{\mu}$? c. Compute $E(X)$ and $V(X)$. d. If 1.5 thousand gallons are in stock at the beginning of the week and no new supply is due in during the week, how much of the 1.5 thousand gallons is expected to be left at the end of the week? [Hint: Let $h(x)$ = amount left when demand = x.]

23. If the temperature at which a certain compound melts is a random variable with mean value 120°C and standard deviation 2°C, what are the mean temperature and standard deviation measured in °F? [Hint: °F = 1.8°C + 32.]

24. Let X have the Pareto pdf

$$f(x; k, \theta) = \begin{cases} \dfrac{k \cdot \theta^k}{x^{k+1}} & x \ge \theta \\ 0 & x < \theta \end{cases}$$

introduced in Exercise 10. a. If $k > 1$, compute $E(X)$. b. What can you say about $E(X)$ if $k = 1$? c. If $k > 2$, show that $V(X) = k\theta^2 (k - 1)^{-2} (k - 2)^{-1}$. d. If $k = 2$, what can you say about $V(X)$? e. What conditions on k are necessary to ensure that $E(X^n)$ is finite?

25. Let X be the temperature in °C at which a certain chemical reaction takes place, and let Y be the temperature in °F (so $Y = 1.8X + 32$). a. If the median of the X distribution is $\tilde{\mu}$, show that $1.8\tilde{\mu} + 32$ is the median of the Y distribution. b. How is the 90th percentile of the Y distribution related to the 90th percentile of the X distribution? Verify your conjecture. c. More generally, if $Y = aX + b$, how is any particular percentile of the Y distribution related to the corresponding percentile of the X distribution?

26. Let X be the total medical expenses (in 1000s of dollars) incurred by a particular individual during a given year. Although X is a discrete random variable, suppose its distribution is quite well approximated by a continuous distribution with pdf $f(x) = k(1 + x/2.5)^{-7}$ for $x \ge 0$. a. What is the value of k? b. Graph the pdf of X. c. What are the expected value and standard deviation of total medical expenses? d. This individual is covered by an insurance plan that entails a \$500 deductible provision (so the first \$500 worth of expenses are paid by the individual). Then the plan will pay 80% of any additional expenses exceeding \$500, and the maximum payment by the individual (including the deductible amount) is \$2500. Let Y denote the amount of this individual’s medical expenses paid by the insurance company. What is the expected value of Y? [Hint: First figure out what value of X corresponds to the maximum out-of-pocket expense of \$2500. Then write an expression for Y as a function of X (which involves several different pieces) and calculate the expected value of this function.]

27. When a dart is thrown at a circular target, consider the location of the landing point relative to the bull’s eye. Let X be the angle in degrees measured from the horizontal, and assume that X is uniformly distributed on [0, 360]. Define Y to be the transformed variable $Y = h(X) = (2\pi/360)X - \pi$, so Y is the angle measured in radians and Y is between $-\pi$ and $\pi$. Obtain $E(Y)$ and $\sigma_Y$ by first obtaining $E(X)$ and $\sigma_X$, and then using the fact that $h(X)$ is a linear function of X.

4.3 The Normal Distribution

The normal distribution is the most important one in all of probability and statistics. Many numerical populations have distributions that can be fit very closely by an appropriate normal curve.
Examples include heights, weights, and other physical characteristics (the famous 1903 Biometrika article “On the Laws of Inheritance in Man” discussed many examples of this sort), measurement errors in scientific experiments, anthropometric measurements on fossils, reaction times in psychological experiments, measurements of intelligence and aptitude, scores on various tests, and numerous economic measures and indicators. In addition, even when individual variables themselves are not normally distributed, sums and averages of the variables will under suitable conditions have approximately a normal distribution; this is the content of the Central Limit Theorem discussed in the next chapter.

DEFINITION A continuous rv X is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of X is

$f(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty \qquad (4.3)$

The exact probabilities are .2622 and .8348, respectively, so the approximations are quite good. In the last calculation, the probability $P(5 \le X \le 15)$ is being approximated by the area under the normal curve between 4.5 and 15.5; the continuity correction is used for both the upper and lower limits. ■

When the objective of our investigation is to make an inference about a population proportion p, interest will focus on the sample proportion of successes X/n rather than on X itself.
Because this proportion is just X multiplied by the constant 1/n, it will also have approximately a normal distribution (with mean $\mu = p$ and standard deviation $\sigma = \sqrt{pq/n}$) provided that both $np \ge 10$ and $nq \ge 10$. This normal approximation is the basis for several inferential procedures to be discussed in later chapters.

EXERCISES Section 4.3 (28–58)

28. Let Z be a standard normal random variable and calculate the following probabilities, drawing pictures wherever appropriate. a. $P(0 \le Z \le 2.17)$ b. $P(0 \le Z \le 1)$ c. $P(-2.50 \le Z \le 0)$ d. $P(-2.50 \le Z \le 2.50)$ e. $P(Z \le 1.37)$ f. $P(-1.75 \le Z)$ g. $P(-1.50 \le Z \le 2.00)$ h. $P(1.37 \le Z \le 2.50)$ i. $P(1.50 \le Z)$ j. $P(|Z| \le 2.50)$

29. In each case, determine the value of the constant c that makes the probability statement correct. a. $\Phi(c) = .9838$ b. $P(0 \le Z \le c) = .291$ c. $P(c \le Z) = .121$ d. $P(-c \le Z \le c) = .668$ e. $P(c \le |Z|) = .016$

30. Find the following percentiles for the standard normal distribution. Interpolate where appropriate. a. 91st b. 9th c. 75th d. 25th e. 6th

31. Determine $z_\alpha$ for the following: a. $\alpha = .0055$ b. $\alpha = .09$ c. $\alpha = .663$

32. Suppose the force acting on a column that helps to support a building is a normally distributed random variable X with mean value 15.0 kips and standard deviation 1.25 kips. Compute the following probabilities by standardizing and then using Table A.3. a. $P(X \le 15)$ b. $P(X \le 17.5)$ c. $P(X \ge 10)$ d. $P(14 \le X \le 18)$ e. $P(|X - 15| \le 3)$

33. Mopeds (small motorcycles with an engine capacity below 50 cm³) are very popular in Europe because of their mobility, ease of operation, and low cost. The article “Procedure to Verify the Maximum Speed of Automatic Transmission Mopeds in Periodic Motor Vehicle Inspections” (J. of Automobile Engr., 2008: 1615–1623) described a rolling bench test for determining maximum vehicle speed. A normal distribution with mean value 46.8 km/h and standard deviation 1.75 km/h is postulated. Consider randomly selecting a single such moped. a. What is the probability that maximum speed is at most 50 km/h? b. What is the probability that maximum speed is at least 48 km/h? c. What is the probability that maximum speed differs from the mean value by at most 1.5 standard deviations?

34. The article “Reliability of Domestic-Waste Biofilm Reactors” (J. of Envir. Engr., 1995: 785–790) suggests that substrate concentration (mg/cm³) of influent to a reactor is normally distributed with $\mu = .30$ and $\sigma = .06$. a. What is the probability that the concentration exceeds .25? b. What is the probability that the concentration is at most .10? c. How would you characterize the largest 5% of all concentration values?

35. Suppose the diameter at breast height (in.) of trees of a certain type is normally distributed with $\mu = 8.8$ and $\sigma = 2.8$, as suggested in the article “Simulating a Harvester-Forwarder Softwood Thinning” (Forest Products J., May 1997: 36–41). a. What is the probability that the diameter of a randomly selected tree will be at least 10 in.? Will exceed 10 in.? b. What is the probability that the diameter of a randomly selected tree will exceed 20 in.? c. What is the probability that the diameter of a randomly selected tree will be between 5 and 10 in.? d. What value c is such that the interval $(8.8 - c, 8.8 + c)$ includes 98% of all diameter values? e. If four trees are independently selected, what is the probability that at least one has a diameter exceeding 10 in.?

36. Spray drift is a constant concern for pesticide applicators and agricultural producers. The inverse relationship between droplet size and drift potential is well known. The paper “Effects of 2,4-D Formulation and Quinclorac on Spray Droplet Size and Deposition” (Weed Technology, 2005: 1030–1036) investigated the effects of herbicide formulation on spray atomization. A figure in the paper suggested the normal distribution with mean 1050 μm and standard deviation 150 μm was a reasonable model for droplet size for water (the “control treatment”) sprayed through a 760 ml/min nozzle. a. What is the probability that the size of a single droplet is less than 1500 μm? At least 1000 μm? b. What is the probability that the size of a single droplet is between 1000 and 1500 μm? c. How would you characterize the smallest 2% of all droplets? d. If the sizes of five independently selected droplets are measured, what is the probability that at least one exceeds 1500 μm?

37. Suppose that blood chloride concentration (mmol/L) has a normal distribution with mean 104 and standard deviation 5 (information in the article “Mathematical Model of Chloride Concentration in Human Blood,” J. of Med. Engr. and Tech., 2006: 25–30, including a normal probability plot as described in Section 4.6, supports this assumption). a. What is the probability that chloride concentration equals 105? Is less than 105? Is at most 105? b. What is the probability that chloride concentration differs from the mean by more than 1 standard deviation? Does this probability depend on the values of $\mu$ and $\sigma$? c. How would you characterize the most extreme .1% of chloride concentration values?

38. There are two machines available for cutting corks intended for use in wine bottles. The first produces corks with diameters that are normally distributed with mean 3 cm and standard deviation .1 cm.
The second machine produces corks with diameters that have a normal distribution with mean 3.04 cm and standard deviation .02 cm. Acceptable corks have diameters between 2.9 cm and 3.1 cm. Which machine is more likely to produce an acceptable cork?

39. a. If a normal distribution has $\mu = 30$ and $\sigma = 5$, what is the 91st percentile of the distribution? b. What is the 6th percentile of the distribution? c. The width of a line etched on an integrated circuit chip is normally distributed with mean 3.000 μm and standard deviation .140. What width value separates the widest 10% of all such lines from the other 90%?

40. The article “Monte Carlo Simulation—Tool for Better Understanding of LRFD” (J. of Structural Engr., 1993: 1586–1599) suggests that yield strength (ksi) for A36 grade steel is normally distributed with $\mu = 43$ and $\sigma = 4.5$. a. What is the probability that yield strength is at most 40? Greater than 60? b. What yield strength value separates the strongest 75% from the others?

41. The automatic opening device of a military cargo parachute has been designed to open when the parachute is 200 m above the ground. Suppose opening altitude actually has a normal distribution with mean value 200 m and standard deviation 30 m. Equipment damage will occur if the parachute opens at an altitude of less than 100 m. What is the probability that there is equipment damage to the payload of at least one of five independently dropped parachutes?

42. The temperature reading from a thermocouple placed in a constant-temperature medium is normally distributed with mean $\mu$, the actual temperature of the medium, and standard deviation $\sigma$. What would the value of $\sigma$ have to be to ensure that 95% of all readings are within .1° of $\mu$?

43. The distribution of resistance for resistors of a certain type is known to be normal, with 10% of all resistors having a resistance exceeding 10.256 ohms and 5% having a resistance smaller than 9.671 ohms.
What are the mean value and standard deviation of the resistance distribution?

44. If bolt thread length is normally distributed, what is the probability that the thread length of a randomly selected bolt is a. Within 1.5 SDs of its mean value? b. Farther than 2.5 SDs from its mean value? c. Between 1 and 2 SDs from its mean value?

45. A machine that produces ball bearings has initially been set so that the true average diameter of the bearings it produces is .500 in. A bearing is acceptable if its diameter is within .004 in. of this target value. Suppose, however, that the setting has changed during the course of production, so that the bearings have normally distributed diameters with mean value .499 in. and standard deviation .002 in. What percentage of the bearings produced will not be acceptable?

46. The Rockwell hardness of a metal is determined by impressing a hardened point into the surface of the metal and then measuring the depth of penetration of the point. Suppose the Rockwell hardness of a particular alloy is normally distributed with mean 70 and standard deviation 3. (Rockwell hardness is measured on a continuous scale.) a. If a specimen is acceptable only if its hardness is between 67 and 75, what is the probability that a randomly chosen specimen has an acceptable hardness? b. If the acceptable range of hardness is $(70 - c, 70 + c)$, for what value of c would 95% of all specimens have acceptable hardness? c. If the acceptable range is as in part (a) and the hardness of each of ten randomly selected specimens is independently determined, what is the expected number of acceptable specimens among the ten? d. What is the probability that at most eight of ten independently selected specimens have a hardness of less than 73.84? [Hint: Y = the number among the ten specimens with hardness less than 73.84 is a binomial variable; what is p?]

47. The weight distribution of parcels sent in a certain manner is normal with mean value 12 lb and standard deviation 3.5 lb. The parcel service wishes to establish a weight value c beyond which there will be a surcharge. What value of c is such that 99% of all parcels are at least 1 lb under the surcharge weight?

48. Suppose Appendix Table A.3 contained $\Phi(z)$ only for $z \ge 0$. Explain how you could still compute a. $P(-1.72 \le Z \le -.55)$ b. $P(-1.72 \le Z \le .55)$ Is it necessary to tabulate $\Phi(z)$ for z negative? What property of the standard normal curve justifies your answer?

49. Consider babies born in the “normal” range of 37–43 weeks gestational age. Extensive data supports the assumption that for such babies born in the United States, birth weight is normally distributed with mean 3432 g and standard deviation 482 g. [The article “Are Babies Normal?” (The American Statistician, 1999: 298–302) analyzed data from a particular year; for a sensible choice of class intervals, a histogram did not look at all normal, but after further investigations it was determined that this was due to some hospitals measuring weight in grams and others measuring to the nearest ounce and then converting to grams. A modified choice of class intervals that allowed for this gave a histogram that was well described by a normal distribution.] a. What is the probability that the birth weight of a randomly selected baby of this type exceeds 4000 g? Is between 3000 and 4000 g? b.
What is the probability that the birth weight of a randomly selected baby of this type is either less than 2000 g or greater than 5000 g? c. What is the probability that the birth weight of a randomly selected baby of this type exceeds 7 lb? d. How would you characterize the most extreme .1% of all birth weights? e. If X is a random variable with a normal distribution and a is a numerical constant ($a \ne 0$), then $Y = aX$ also has a normal distribution. Use this to determine the distribution of birth weight expressed in pounds (shape, mean, and standard deviation), and then recalculate the probability from part (c). How does this compare to your previous answer?

50. In response to concerns about nutritional contents of fast foods, McDonald’s has announced that it will use a new cooking oil for its french fries that will decrease substantially trans fatty acid levels and increase the amount of more beneficial polyunsaturated fat. The company claims that 97 out of 100 people cannot detect a difference in taste between the new and old oils. Assuming that this figure is correct (as a long-run proportion), what is the approximate probability that in a random sample of 1000 individuals who have purchased fries at McDonald’s, a. At least 40 can taste the difference between the two oils? b. At most 5% can taste the difference between the two oils?

51. Chebyshev’s inequality is valid for continuous as well as discrete distributions. It states that for any number k satisfying $k \ge 1$, $P(|X - \mu| \ge k\sigma) \le 1/k^2$ (see Exercise 44 in Chapter 3 for an interpretation). Obtain this probability in the case of a normal distribution for k = 1, 2, and 3, and compare to the upper bound.

52. Let X denote the number of flaws along a 100-m reel of magnetic tape (an integer-valued variable). Suppose X has approximately a normal distribution with $\mu = 25$ and $\sigma = 5$. Use the continuity correction to calculate the probability that the number of flaws is a.
Between 20 and 30, inclusive. b. At most 30. Less than 30.

53. Let X have a binomial distribution with parameters n = 25 and p. Calculate each of the following probabilities using the normal approximation (with the continuity correction) for the cases p = .5, .6, and .8 and compare to the exact probabilities calculated from Appendix Table A.1. a. $P(15 \le X \le 20)$ b. $P(X \le 15)$ c. $P(20 \le X)$

54. Suppose that 10% of all steel shafts produced by a certain process are nonconforming but can be reworked (rather than having to be scrapped). Consider a random sample of 200 shafts, and let X denote the number among these that are nonconforming and can be reworked. What is the (approximate) probability that X is a. At most 30? b. Less than 30? c. Between 15 and 25 (inclusive)?

55. Suppose only 75% of all drivers in a certain state regularly wear a seat belt. A random sample of 500 drivers is selected. What is the probability that a. Between 360 and 400 (inclusive) of the drivers in the sample regularly wear a seat belt? b. Fewer than 400 of those in the sample regularly wear a seat belt?

56. Show that the relationship between a general normal percentile and the corresponding z percentile is as stated in this section.

57. a. Show that if X has a normal distribution with parameters $\mu$ and $\sigma$, then $Y = aX + b$ (a linear function of X) also has a normal distribution. What are the parameters of the distribution of Y [i.e., E(Y) and V(Y)]? [Hint: Write the cdf of Y, $P(Y \le y)$, as an integral involving the pdf of X, and then differentiate with respect to y to get the pdf of Y.] b. If, when measured in °C, temperature is normally distributed with mean 115 and standard deviation 2, what can be said about the distribution of temperature measured in °F?

58. There is no nice formula for the standard normal cdf $\Phi(z)$, but several good approximations have been published in articles. The following is from “Approximations for Hand Calculators Using Small Integer Coefficients” (Mathematics of Computation, 1977: 214–222). For $0 < z \le 5.5$,

$P(Z \ge z) = 1 - \Phi(z) \approx .5 \exp\left\{ -\left[ \dfrac{(83z + 351)z + 562}{703/z + 165} \right] \right\}$

The relative error of this approximation is less than .042%. Use this to calculate approximations to the following probabilities, and compare whenever possible to the probabilities obtained from Appendix Table A.3. a. $P(Z \ge 1)$ b. $P(Z < -3)$ c. $P(-4 < Z < 4)$ d. $P(Z > 5)$

4.4 The Exponential and Gamma Distributions

The density curve corresponding to any normal distribution is bell-shaped and therefore symmetric. There are many practical situations in which the variable of interest to an investigator might have a skewed distribution. One family of distributions that has this property is the gamma family. We first consider a special case, the exponential distribution, and then generalize later in the section.

The Exponential Distribution

The family of exponential distributions provides probability models that are very widely used in engineering and science disciplines.

DEFINITION X is said to have an exponential distribution with parameter $\lambda$ ($\lambda > 0$) if the pdf of X is

$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4.5)$

Some sources write the exponential pdf in the form $(1/\beta)e^{-x/\beta}$, so that $\beta = 1/\lambda$. The expected value of an exponentially distributed random variable X is

$E(X) = \int_0^\infty x \lambda e^{-\lambda x}\, dx$

Obtaining this expected value necessitates doing an integration by parts.
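The integration by parts for E(X) can be sketched as follows. The choice $u = x$, $dv = \lambda e^{-\lambda x}\,dx$ is one natural option and is an assumption of this sketch rather than notation taken from the text:

```latex
\begin{aligned}
E(X) &= \int_0^\infty x\,\lambda e^{-\lambda x}\,dx
      = \Bigl[-x e^{-\lambda x}\Bigr]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx \\
     &= 0 + \Bigl[-\tfrac{1}{\lambda}\, e^{-\lambda x}\Bigr]_0^\infty
      = \frac{1}{\lambda}
\end{aligned}
```

The boundary term vanishes because $x e^{-\lambda x} \to 0$ as $x \to \infty$ whenever $\lambda > 0$.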
The variance of X can be computed using the fact that $V(X) = E(X^2) - [E(X)]^2$. The determination of $E(X^2)$ requires integrating by parts twice in succession. The results of these integrations are as follows:

$\mu = \dfrac{1}{\lambda} \qquad \sigma^2 = \dfrac{1}{\lambda^2}$

Both the mean and standard deviation of the exponential distribution equal $1/\lambda$. Graphs of several exponential pdf’s are illustrated in Figure 4.26.

DEFINITION Let $\nu$ be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter $\nu$ if the pdf of X is the gamma density with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus

$f(x; \nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad (4.10)$

The parameter $\nu$ is called the number of degrees of freedom (df) of X. The symbol $\chi^2$ is often used in place of “chi-squared.”

EXERCISES Section 4.4 (59–71)

59. Let X = the time between two successive arrivals at the drive-up window of a local bank. If X has an exponential distribution with $\lambda = 1$ (which is identical to a standard gamma distribution with $\alpha = 1$), compute the following: a. The expected time between two successive arrivals b. The standard deviation of the time between successive arrivals c. $P(X \le 4)$ d. $P(2 \le X \le 5)$

60. Let X denote the distance (m) that an animal moves from its birth site to the first territorial vacancy it encounters.
Suppose that for banner-tailed kangaroo rats, X has an exponential distribution with parameter $\lambda = .01386$ (as suggested in the article “Competition and Dispersal from Multiple Nests,” Ecology, 1997: 873–883). a. What is the probability that the distance is at most 100 m? At most 200 m? Between 100 and 200 m? b. What is the probability that distance exceeds the mean distance by more than 2 standard deviations? c. What is the value of the median distance?

61. Data collected at Toronto Pearson International Airport suggests that an exponential distribution with mean value 2.725 hours is a good model for rainfall duration (Urban Stormwater Management Planning with Analytical Probabilistic Models, 2000, p. 69). a. What is the probability that the duration of a particular rainfall event at this location is at least 2 hours? At most 3 hours? Between 2 and 3 hours? b. What is the probability that rainfall duration exceeds the mean value by more than 2 standard deviations? What is the probability that it is less than the mean value by more than one standard deviation?

62. The paper “Microwave Observations of Daily Antarctic Sea-Ice Edge Expansion and Contribution Rates” (IEEE Geosci. and Remote Sensing Letters, 2006: 54–58) states that “The distribution of the daily sea-ice advance/retreat from each sensor is similar and is approximately double exponential.” The proposed double exponential distribution has density function $f(x) = .5\lambda e^{-\lambda |x|}$ for $-\infty < x < \infty$. The standard deviation is given as 40.9 km. a. What is the value of the parameter $\lambda$? b. What is the probability that the extent of daily sea-ice change is within 1 standard deviation of the mean value?

63. A consumer is trying to decide between two long-distance calling plans.
The first one charges a flat rate of 10¢ per minute, whereas the second charges a flat rate of 99¢ for calls up to 20 minutes in duration and then 10¢ for each additional minute exceeding 20 (assume that calls lasting a noninteger number of minutes are charged proportionately to a whole-minute’s charge). Suppose the consumer’s distribution of call duration is exponential with parameter $\lambda$. a. Explain intuitively how the choice of calling plan should depend on what the expected call duration is. b. Which plan is better if expected call duration is 10 minutes? 15 minutes? [Hint: Let $h_1(x)$ denote the cost for the first plan when call duration is x minutes and let $h_2(x)$ be the cost function for the second plan. Give expressions for these two cost functions, and then determine the expected cost for each plan.]

64. Evaluate the following: a. $\Gamma(6)$ b. $\Gamma(5/2)$ c. $F(4; 5)$ (the incomplete gamma function) d. $F(5; 4)$ e. $F(0; 4)$

65. Let X have a standard gamma distribution with $\alpha = 7$. Evaluate the following: a. $P(X \le 5)$ b. $P(X < 5)$ c. $P(X > 8)$ d. $P(3 \le X \le 8)$ e. $P(3 < X < 8)$ f. $P(X < 4 \text{ or } X > 6)$

66. Suppose the time spent by a randomly selected student who uses a terminal connected to a local time-sharing computer facility has a gamma distribution with mean 20 min and variance 80 min². a. What are the values of $\alpha$ and $\beta$? b. What is the probability that a student uses the terminal for at most 24 min? c. What is the probability that a student spends between 20 and 40 min using the terminal?

67.
Suppose that when a transistor of a certain type is subjected to an accelerated life test, the lifetime X (in weeks) has a gamma distribution with mean 24 weeks and standard deviation 12 weeks. a. What is the probability that a transistor will last between 12 and 24 weeks? b. What is the probability that a transistor will last at most 24 weeks? Is the median of the lifetime distribution less than 24? Why or why not? c. What is the 99th percentile of the lifetime distribution? d. Suppose the test will actually be terminated after t weeks. What value of t is such that only .5% of all transistors would still be operating at termination?

68. The special case of the gamma distribution in which $\alpha$ is a positive integer n is called an Erlang distribution. If we replace $\beta$ by $1/\lambda$ in Expression (4.8), the Erlang pdf is

$f(x; \lambda, n) = \begin{cases} \dfrac{\lambda(\lambda x)^{n-1} e^{-\lambda x}}{(n-1)!} & x \ge 0 \\ 0 & x < 0 \end{cases}$

It can be shown that if the times between successive events are independent, each with an exponential distribution with parameter $\lambda$, then the total time X that elapses before all of the next n events occur has pdf $f(x; \lambda, n)$. a. What is the expected value of X? If the time (in minutes) between arrivals of successive customers is exponentially distributed with $\lambda = .5$, how much time can be expected to elapse before the tenth customer arrives? b. If customer interarrival time is exponentially distributed with $\lambda = .5$, what is the probability that the tenth customer (after the one who has just arrived) will arrive within the next 30 min? c. The event $\{X \le t\}$ occurs iff at least n events occur in the next t units of time. Use the fact that the number of events occurring in an interval of length t has a Poisson distribution with parameter $\lambda t$ to write an expression (involving Poisson probabilities) for the Erlang cdf $F(t; \lambda, n) = P(X \le t)$.

69. A system consists of five identical components connected in series as shown:

[Figure: components 1, 2, 3, 4, 5 connected in series]

As soon as one component fails, the entire system will fail.
Suppose each component has a lifetime that is exponentially distributed with $\lambda = .01$ and that components fail independently of one another. Define events $A_i$ = {ith component lasts at least t hours}, $i = 1, \ldots, 5$, so that the $A_i$’s are independent events. Let X = the time at which the system fails, that is, the shortest (minimum) lifetime among the five components. a. The event $\{X \ge t\}$ is equivalent to what event involving $A_1, \ldots, A_5$? b. Using the independence of the $A_i$’s, compute $P(X \ge t)$. Then obtain $F(t) = P(X \le t)$ and the pdf of X. What type of distribution does X have? c. Suppose there are n components, each having exponential lifetime with parameter $\lambda$. What type of distribution does X have?

70. If X has an exponential distribution with parameter $\lambda$, derive a general expression for the (100p)th percentile of the distribution. Then specialize to obtain the median.

71. a. The event $\{X^2 \le y\}$ is equivalent to what event involving X itself? b. If X has a standard normal distribution, use part (a) to write the integral that equals $P(X^2 \le y)$. Then differentiate this with respect to y to obtain the pdf of $X^2$ [the square of a N(0, 1) variable]. Finally, show that $X^2$ has a chi-squared distribution with $\nu = 1$ df [see (4.10)]. [Hint: Use the following identity.]

$\dfrac{d}{dy}\left\{ \int_{a(y)}^{b(y)} f(x)\, dx \right\} = f[b(y)] \cdot b'(y) - f[a(y)] \cdot a'(y)$

4.5 Other Continuous Distributions

The normal, gamma (including exponential), and uniform families of distributions provide a wide variety of probability models for continuous variables, but there are many practical situations in which no member of these families fits a set of observed data very well. Statisticians and other investigators have developed other families of distributions that are often appropriate in practice.

The Weibull Distribution

The family of Weibull distributions was introduced by the Swedish physicist Waloddi Weibull in 1939; his 1951 article “A Statistical Distribution Function of Wide Applicability” (J. of Applied Mechanics, vol. 18: 293–297) discusses a number of applications.

Example 4.28 Project managers often use a method labeled PERT (for program evaluation and review technique) to coordinate the various activities making up a large project. (One successful application was in the construction of the Apollo spacecraft.) A standard assumption in PERT analysis is that the time necessary to complete any particular activity once it has been started has a beta distribution with A = the optimistic time (if everything goes well) and B = the pessimistic time (if everything goes badly). Suppose that in constructing a single-family house, the time X (in days) necessary for laying the foundation has a beta distribution with A = 2, B = 5, $\alpha = 2$, and $\beta = 3$. Then $\alpha/(\alpha + \beta) = .4$, so $E(X) = 2 + (3)(.4) = 3.2$. For these values of $\alpha$ and $\beta$, the pdf of X is a simple polynomial function. The probability that it takes at most 3 days to lay the foundation is

$P(X \le 3) = \int_2^3 \dfrac{1}{3} \cdot \dfrac{4!}{1!\,2!} \left(\dfrac{x-2}{3}\right) \left(\dfrac{5-x}{3}\right)^2 dx = \dfrac{4}{27} \int_2^3 (x-2)(5-x)^2\, dx = \dfrac{4}{27} \cdot \dfrac{11}{4} = \dfrac{11}{27} = .407$ ■

The standard beta distribution is commonly used to model variation in the proportion or percentage of a quantity occurring in different samples, such as the proportion of a 24-hour day that an individual is asleep or the proportion of a certain element in a chemical compound.

EXERCISES Section 4.5 (72–86)

72.
The lifetime X (in hundreds of hours) of a certain type of vacuum tube has a Weibull distribution with parameters $\alpha = 2$ and $\beta = 3$. Compute the following: a. E(X) and V(X) b. $P(X \le 6)$ c. $P(1.5 \le X \le 6)$ (This Weibull distribution is suggested as a model for time in service in “On the Assessment of Equipment Reliability: Trading Data Collection Costs for Precision,” J. of Engr. Manuf., 1991: 105–109.)

73. The authors of the article “A Probabilistic Insulation Life Model for Combined Thermal-Electrical Stresses” (IEEE Trans. on Elect. Insulation, 1985: 519–522) state that “the Weibull distribution is widely used in statistical problems relating to aging of solid insulating materials subjected to aging and stress.” They propose the use of the distribution as a model for time (in hours) to failure of solid insulating specimens subjected to AC voltage. The values of the parameters depend on the voltage and temperature; suppose $\alpha = 2.5$ and $\beta = 200$ (values suggested by data in the article). a. What is the probability that a specimen’s lifetime is at most 250? Less than 250? More than 300? b. What is the probability that a specimen’s lifetime is between 100 and 250? c. What value is such that exactly 50% of all specimens have lifetimes exceeding that value?

74. Let X = the time (in $10^{-1}$ weeks) from shipment of a defective product until the customer returns the product. Suppose that the minimum return time is $\gamma = 3.5$ and that the excess $X - 3.5$ over the minimum has a Weibull distribution with parameters $\alpha = 2$ and $\beta = 1.5$ (see “Practical Applications of the Weibull Distribution,” Industrial Quality Control, Aug. 1964: 71–78). a. What is the cdf of X? b. What are the expected return time and variance of return time? [Hint: First obtain $E(X - 3.5)$ and $V(X - 3.5)$.] c. Compute $P(X > 5)$. d. Compute $P(5 \le X \le 8)$.

75. Let X have a Weibull distribution with the pdf from Expression (4.11). Verify that $\mu = \beta\,\Gamma(1 + 1/\alpha)$.
[Hint: In the integral for E(X), make the change of variable y = (x/β)^α, so that x = βy^(1/α).]

76.
a. In Exercise 72, what is the median lifetime of such tubes? [Hint: Use Expression (4.12).]
b. In Exercise 74, what is the median return time?
c. If X has a Weibull distribution with the cdf from Expression (4.12), obtain a general expression for the (100p)th percentile of the distribution.
d. In Exercise 74, the company wants to refuse to accept returns after t weeks. For what value of t will only 10% of all returns be refused?

77. The authors of the paper from which the data in Exercise 1.27 was extracted suggested that a reasonable probability model for drill lifetime was a lognormal distribution with μ = 4.5 and σ = .8.
a. What are the mean value and standard deviation of lifetime?
b. What is the probability that lifetime is at most 100?
c. What is the probability that lifetime is at least 200? Greater than 200?

78. The article “On Assessing the Accuracy of Offshore Wind Turbine Reliability-Based Design Loads from the Environmental Contour Method” (Intl. J. of Offshore and Polar Engr., 2005: 132–140) proposes the Weibull distribution with α = 1.817 and β = .863 as a model for 1-hour significant wave height (m) at a certain site.
a. What is the probability that wave height is at most .5 m?
b. What is the probability that wave height exceeds its mean value by more than one standard deviation?
c.
What is the median of the wave-height distribution?
d. For 0 < p < 1, give a general expression for the 100pth percentile of the wave-height distribution.

79. Nonpoint source loads are chemical masses that travel to the main stem of a river and its tributaries in flows that are distributed over relatively long stream reaches, in contrast to those that enter at well-defined and regulated points. The article “Assessing Uncertainty in Mass Balance Calculation of River Nonpoint Source Loads” (J. of Envir. Engr., 2008: 247–258) suggested that for a certain time period and location, X = nonpoint source load of total dissolved solids could be modeled with a lognormal distribution having mean value 10,281 kg/day/km and a coefficient of variation CV = .40 (CV = σ_X/μ_X).
a. What are the mean value and standard deviation of ln(X)?
b. What is the probability that X is at most 15,000 kg/day/km?
c. What is the probability that X exceeds its mean value, and why is this probability not .5?
d. Is 17,000 the 95th percentile of the distribution?

80.
a. Use Equation (4.13) to write a formula for the median μ̃ of the lognormal distribution. What is the median for the load distribution of Exercise 79?
b. Recalling that z_α is our notation for the 100(1 − α) percentile of the standard normal distribution, write an expression for the 100(1 − α) percentile of the lognormal distribution. In Exercise 79, what value will load exceed only 1% of the time?

81. A theoretical justification based on a certain material failure mechanism underlies the assumption that ductile strength X of a material has a lognormal distribution. Suppose the parameters are μ = 5 and σ = .1.
a. Compute E(X) and V(X).
b. Compute P(X > 125).
c. Compute P(110 ≤ X ≤ 125).
d. What is the value of median ductile strength?
e. If ten different samples of an alloy steel of this type were subjected to a strength test, how many would you expect to have strength of at least 125?
f.
If the smallest 5% of strength values were unacceptable, what would the minimum acceptable strength be?

82. The article “The Statistics of Phytotoxic Air Pollutants” (J. of Royal Stat. Soc., 1989: 183–198) suggests the lognormal distribution as a model for SO₂ concentration above a certain forest. Suppose the parameter values are μ = 1.9 and σ = .9.
a. What are the mean value and standard deviation of concentration?
b. What is the probability that concentration is at most 10? Between 5 and 10?

83. What condition on α and β is necessary for the standard beta pdf to be symmetric?

84. Suppose the proportion X of surface area in a randomly selected quadrat that is covered by a certain plant has a standard beta distribution with α = 5 and β = 2.
a. Compute E(X) and V(X).
b. Compute P(X ≤ .2).
c. Compute P(.2 ≤ X ≤ .4).
d. What is the expected proportion of the sampling region not covered by the plant?

85. Let X have a standard beta density with parameters α and β.
a. Verify the formula for E(X) given in the section.
b. Compute E[(1 − X)^m]. If X represents the proportion of a substance consisting of a particular ingredient, what is the expected proportion that does not consist of this ingredient?

86. Stress is applied to a 20-in. steel bar that is clamped in a fixed position at each end. Let Y = the distance from the left end at which the bar snaps. Suppose Y/20 has a standard beta distribution with E(Y) = 10 and V(Y) = 100/7.
a. What are the parameters of the relevant standard beta distribution?
b. Compute P(8 ≤ Y ≤ 12).
c. Compute the probability that the bar snaps more than 2 in. from where you expect it to.

4.6 Probability Plots

An investigator will often have obtained a numerical sample x₁, x₂, …, xₙ and wish to know whether it is plausible that it came from a population distribution of some particular type (e.g., from a normal distribution).
For one thing, many formal procedures from statistical inference are based on the assumption that the population distribution is of a specified type. The use of such a procedure is inappropriate if the actual underlying probability distribution differs greatly from the assumed type.

b. Construct a Weibull probability plot. Is the Weibull distribution family plausible?

93. Construct a probability plot that will allow you to assess the plausibility of the lognormal distribution as a model for the rainfall data of Exercise 83 in Chapter 1.

94. The accompanying observations are precipitation values during March over a 30-year period in Minneapolis-St. Paul.
.77 1.74 .81 1.20 1.95 1.20 .47 1.43 3.37 2.20
3.00 3.09 1.51 2.10 .52 1.62 1.31 .32 .59 .81
2.81 1.87 1.18 1.35 4.75 2.48 .96 1.89 .90 2.05
a. Construct and interpret a normal probability plot for this data set.
b. Calculate the square root of each value and then construct a normal probability plot based on this transformed data. Does it seem plausible that the square root of precipitation is normally distributed?
c. Repeat part (b) after transforming by cube roots.

95. Use a statistical software package to construct a normal probability plot of the tensile ultimate-strength data given in Exercise 13 of Chapter 1, and comment.

96. Let the ordered sample observations be denoted by y₁, y₂, …, yₙ (y₁ being the smallest and yₙ the largest). Our suggested check for normality is to plot the (Φ⁻¹((i − .5)/n), yᵢ) pairs. Suppose we believe that the observations come from a distribution with mean 0, and let w₁, …, wₙ be the ordered absolute values of the xᵢ’s. A half-normal plot is a probability plot of the wᵢ’s. More specifically, since P(|Z| ≤ w) = P(−w ≤ Z ≤ w) = 2Φ(w) − 1, a half-normal plot is a plot of the (Φ⁻¹{[(i − .5)/n + 1]/2}, wᵢ) pairs. The virtue of this plot is that small or large outliers in the original sample will now appear only at the upper end of the plot rather than at both ends. Construct a half-normal plot for the following sample of measurement errors, and comment: −3.78, −1.27, 1.44, −.39, 12.38, −43.40, 1.15, −3.96, −2.34, 30.84.

97. The following failure time observations (1000s of hours) resulted from accelerated life testing of 16 integrated circuit chips of a certain type:
82.8 242.0 229.9 11.6 26.5 558.9 359.5 244.8
366.7 502.5 304.3 204.6 307.8 379.1 179.7 212.6
Use the corresponding percentiles of the exponential distribution with λ = 1 to construct a probability plot. Then explain why the plot assesses the plausibility of the sample having been generated from any exponential distribution.

SUPPLEMENTARY EXERCISES (98–128)

98. Let X = the time it takes a read/write head to locate a desired record on a computer disk memory device once the head has been positioned over the correct track. If the disks rotate once every 25 millisec, a reasonable assumption is that X is uniformly distributed on the interval [0, 25].
a. Compute P(10 ≤ X ≤ 20).
b. Compute P(X ≥ 10).
c. Obtain the cdf F(X).
d. Compute E(X) and σ_X.

99. A 12-in. bar that is clamped at both ends is to be subjected to an increasing amount of stress until it snaps. Let Y = the distance from the left end at which the break occurs. Suppose Y has pdf

$$f(y) = \begin{cases} \dfrac{1}{24}\,y\left(1 - \dfrac{y}{12}\right) & 0 \le y \le 12 \\ 0 & \text{otherwise} \end{cases}$$

Compute the following:
a. The cdf of Y, and graph it.
b. P(Y ≤ 4), P(Y > 6), and P(4 ≤ Y ≤ 6)
c. E(Y), E(Y²), and V(Y)
d. The probability that the break point occurs more than 2 in. from the expected break point.
e. The expected length of the shorter segment when the break occurs.

100. Let X denote the time to failure (in years) of a certain hydraulic component. Suppose the pdf of X is f(x) = 32/(x + 4)³ for x > 0.
a. Verify that f(x) is a legitimate pdf.
b. Determine the cdf.
c. Use the result of part (b) to calculate the probability that time to failure is between 2 and 5 years.
d. What is the expected time to failure?
e. If the component has a salvage value equal to 100/(4 + x) when its time to failure is x, what is the expected salvage value?

101. The completion time X for a certain task has cdf F(x) given by

$$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^3}{3} & 0 \le x < 1 \\ 1 - \dfrac{1}{2}\left(\dfrac{7}{3} - x\right)\left(\dfrac{7}{4} - \dfrac{3}{4}x\right) & 1 \le x \le \dfrac{7}{3} \\ 1 & x > \dfrac{7}{3} \end{cases}$$

a. Obtain the pdf f(x) and sketch its graph.
b. Compute P(.5 ≤ X ≤ 2).
c. Compute E(X).

102. The breakdown voltage of a randomly chosen diode of a certain type is known to be normally distributed with mean value 40 V and standard deviation 1.5 V.
a. What is the probability that the voltage of a single diode is between 39 and 42?
b. What value is such that only 15% of all diodes have voltages exceeding that value?
c. If four diodes are independently selected, what is the probability that at least one has a voltage exceeding 42?

103.
The article “Computer Assisted Net Weight Control” (Quality Progress, 1983: 22–25) suggests a normal distribution with mean 137.2 oz and standard deviation 1.6 oz for the actual contents of jars of a certain type. The stated contents was 135 oz.
a. What is the probability that a single jar contains more than the stated contents?
b. Among ten randomly selected jars, what is the probability that at least eight contain more than the stated contents?
c. Assuming that the mean remains at 137.2, to what value would the standard deviation have to be changed so that 95% of all jars contain more than the stated contents?

104. When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of defectives is 5%. Suppose that a batch of 250 boards has been received and that the condition of any particular board is independent of that of any other board.
a. What is the approximate probability that at least 10% of the boards in the batch are defective?
b. What is the approximate probability that there are exactly 10 defectives in the batch?

105. The article “Characterization of Room Temperature Damping in Aluminum-Indium Alloys” (Metallurgical Trans., 1993: 1611–1619) suggests that Al matrix grain size (μm) for an alloy consisting of 2% indium could be modeled with a normal distribution with a mean value 96 and standard deviation 14.
a. What is the probability that grain size exceeds 100?
b. What is the probability that grain size is between 50 and 80?
c. What interval (a, b) includes the central 90% of all grain sizes (so that 5% are below a and 5% are above b)?

106. The reaction time (in seconds) to a certain stimulus is a continuous random variable with pdf

$$f(x) = \begin{cases} \dfrac{3}{2}\cdot\dfrac{1}{x^2} & 1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$

a. Obtain the cdf.
b. What is the probability that reaction time is at most 2.5 sec? Between 1.5 and 2.5 sec?
c. Compute the expected reaction time.
d. Compute the standard deviation of reaction time.
e.
If an individual takes more than 1.5 sec to react, a light comes on and stays on either until one further second has elapsed or until the person reacts (whichever happens first). Determine the expected amount of time that the light remains lit. [Hint: Let h(X) = the time that the light is on as a function of reaction time X.]

107. Let X denote the temperature at which a certain chemical reaction takes place. Suppose that X has pdf

$$f(x) = \begin{cases} \dfrac{1}{9}\,(4 - x^2) & -1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

a. Sketch the graph of f(x).
b. Determine the cdf and sketch it.
c. Is 0 the median temperature at which the reaction takes place? If not, is the median temperature smaller or larger than 0?
d. Suppose this reaction is independently carried out once in each of ten different labs and that the pdf of reaction time in each lab is as given. Let Y = the number among the ten labs at which the temperature exceeds 1. What kind of distribution does Y have? (Give the names and values of any parameters.)

108. The article “Determination of the MTF of Positive Photoresists Using the Monte Carlo Method” (Photographic Sci. and Engr., 1983: 254–260) proposes the exponential distribution with parameter λ = .93 as a model for the distribution of a photon’s free path length (μm) under certain circumstances. Suppose this is the correct model.
a. What is the expected path length, and what is the standard deviation of path length?
b. What is the probability that path length exceeds 3.0? What is the probability that path length is between 1.0 and 3.0?
c. What value is exceeded by only 10% of all path lengths?

109. The article “The Prediction of Corrosion by Statistical Analysis of Corrosion Profiles” (Corrosion Science, 1985: 305–315) suggests the following cdf for the depth X of the deepest pit in an experiment involving the exposure of carbon manganese steel to acidified seawater.

$$F(x; \alpha, \beta) = e^{-e^{-(x-\alpha)/\beta}} \qquad -\infty < x < \infty$$

The authors propose the values α = 150 and β = 90. Assume this to be the correct model.
a.
What is the probability that the depth of the deepest pit is at most 150? At most 300? Between 150 and 300?
b. Below what value will the depth of the maximum pit be observed in 90% of all such experiments?
c. What is the density function of X?
d. The density function can be shown to be unimodal (a single peak). Above what value on the measurement axis does this peak occur? (This value is the mode.)
e. It can be shown that E(X) ≈ .5772β + α. What is the mean for the given values of α and β, and how does it compare to the median and mode? Sketch the graph of the density function. [Note: This is called the largest extreme value distribution.]

110. Let t = the amount of sales tax a retailer owes the government for a certain period. The article “Statistical Sampling in Tax Audits” (Statistics and the Law, 2008: 320–343) proposes modeling the uncertainty in t by regarding it as a normally distributed random variable with mean value μ and standard deviation σ (in the article, these two parameters are estimated from the results of a tax audit involving n sampled transactions). If a represents the amount the retailer is assessed, then an under-assessment results if t > a and an over-assessment results if a > t. The proposed penalty (i.e., loss) function for over- or under-assessment is L(a, t) = t − a if t > a and = k(a − t) if t ≤ a (k > 1 is suggested to incorporate the idea that over-assessment is more serious than under-assessment).
a.
Show that a* = μ + σΦ⁻¹(1/(k + 1)) is the value of a that minimizes the expected loss, where Φ⁻¹ is the inverse function of the standard normal cdf.
b. If k = 2 (suggested in the article), μ = $100,000, and σ = $10,000, what is the optimal value of a, and what is the resulting probability of over-assessment?

111. The mode of a continuous distribution is the value x* that maximizes f(x).
a. What is the mode of a normal distribution with parameters μ and σ?
b. Does the uniform distribution with parameters A and B have a single mode? Why or why not?
c. What is the mode of an exponential distribution with parameter λ? (Draw a picture.)
d. If X has a gamma distribution with parameters α and β, and α > 1, find the mode. [Hint: ln[f(x)] will be maximized iff f(x) is, and it may be simpler to take the derivative of ln[f(x)].]
e. What is the mode of a chi-squared distribution having ν degrees of freedom?

112. The article “Error Distribution in Navigation” (J. of the Institute of Navigation, 1971: 429–442) suggests that the frequency distribution of positive errors (magnitudes of errors) is well approximated by an exponential distribution. Let X = the lateral position error (nautical miles), which can be either negative or positive. Suppose the pdf of X is

$$f(x) = (.1)\,e^{-.2|x|} \qquad -\infty < x < \infty$$

a. Sketch a graph of f(x) and verify that f(x) is a legitimate pdf (show that it integrates to 1).
b. Obtain the cdf of X and sketch it.
c. Compute P(X ≤ 0), P(X ≤ 2), P(−1 ≤ X ≤ 2), and the probability that an error of more than 2 miles is made.

113. In some systems, a customer is allocated to one of two service facilities.
If the service time for a customer served by facility i has an exponential distribution with parameter λᵢ (i = 1, 2) and p is the proportion of all customers served by facility 1, then the pdf of X = the service time of a randomly selected customer is

$$f(x; \lambda_1, \lambda_2, p) = \begin{cases} p\lambda_1 e^{-\lambda_1 x} + (1 - p)\lambda_2 e^{-\lambda_2 x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

This is often called the hyperexponential or mixed exponential distribution. This distribution is also proposed as a model for rainfall amount in “Modeling Monsoon Affected Rainfall of Pakistan by Point Processes” (J. of Water Resources Planning and Mgmnt., 1992: 671–688).
a. Verify that f(x; λ₁, λ₂, p) is indeed a pdf.
b. What is the cdf F(x; λ₁, λ₂, p)?
c. If X has f(x; λ₁, λ₂, p) as its pdf, what is E(X)?
d. Using the fact that E(X²) = 2/λ² when X has an exponential distribution with parameter λ, compute E(X²) when X has pdf f(x; λ₁, λ₂, p). Then compute V(X).
e. The coefficient of variation of a random variable (or distribution) is CV = σ/μ. What is CV for an exponential rv? What can you say about the value of CV when X has a hyperexponential distribution?
f. What is CV for an Erlang distribution with parameters λ and n as defined in Exercise 68? [Note: In applied work, the sample CV is used to decide which of the three distributions might be appropriate.]

114. Suppose a particular state allows individuals filing tax returns to itemize deductions only if the total of all itemized deductions is at least $5000. Let X (in 1000s of dollars) be the total of itemized deductions on a randomly chosen form. Assume that X has the pdf

$$f(x; \alpha) = \begin{cases} k/x^{\alpha} & x \ge 5 \\ 0 & \text{otherwise} \end{cases}$$

a. Find the value of k. What restriction on α is necessary?
b. What is the cdf of X?
c. What is the expected total deduction on a randomly chosen form? What restriction on α is necessary for E(X) to be finite?
d. Show that ln(X/5) has an exponential distribution with parameter α − 1.

115. Let Iᵢ be the input current to a transistor and I₀ be the output current.
Then the current gain is proportional to ln(I₀/Iᵢ). Suppose the constant of proportionality is 1 (which amounts to choosing a particular unit of measurement), so that current gain = X = ln(I₀/Iᵢ). Assume X is normally distributed with μ = 1 and σ = .05.
a. What type of distribution does the ratio I₀/Iᵢ have?
b. What is the probability that the output current is more than twice the input current?
c. What are the expected value and variance of the ratio of output to input current?

116. The article “Response of SiCf/Si3N4 Composites Under Static and Cyclic Loading—An Experimental and Statistical Analysis” (J. of Engr. Materials and Technology, 1997: 186–193) suggests that tensile strength (MPa) of composites under specified conditions can be modeled by a Weibull distribution with α = 9 and β = 180.
a. Sketch a graph of the density function.
b. What is the probability that the strength of a randomly selected specimen will exceed 175? Will be between 150 and 175?
c. If two randomly selected specimens are chosen and their strengths are independent of one another, what is the probability that at least one has a strength between 150 and 175?
d. What strength value separates the weakest 10% of all specimens from the remaining 90%?

117. Let Z have a standard normal distribution and define a new rv Y by Y = σZ + μ. Show that Y has a normal distribution with parameters μ and σ. [Hint: Y ≤ y iff Z ≤ ? Use this to find the cdf of Y and then differentiate it with respect to y.]

118.
a.
Suppose the lifetime X of a component, when measured in hours, has a gamma distribution with parameters α and β. Let Y = the lifetime measured in minutes. Derive the pdf of Y. [Hint: Y ≤ y iff X ≤ y/60. Use this to obtain the cdf of Y and then differentiate to obtain the pdf.]
b. If X has a gamma distribution with parameters α and β, what is the probability distribution of Y = cX?

119. In Exercises 117 and 118, as well as many other situations, one has the pdf f(x) of X and wishes to know the pdf of Y = h(X). Assume that h(·) is an invertible function, so that y = h(x) can be solved for x to yield x = k(y). Then it can be shown that the pdf of Y is

$$g(y) = f[k(y)] \cdot |k'(y)|$$

a. If X has a uniform distribution with A = 0 and B = 1, derive the pdf of Y = −ln(X).
b. Work Exercise 117, using this result.
c. Work Exercise 118(b), using this result.

120. Based on data from a dart-throwing experiment, the article “Shooting Darts” (Chance, Summer 1997, 16–19) proposed that the horizontal and vertical errors from aiming at a point target should be independent of one another, each with a normal distribution having mean 0 and variance σ². It can then be shown that the pdf of the distance V from the target to the landing point is

$$f(v) = \frac{v}{\sigma^2}\cdot e^{-v^2/(2\sigma^2)} \qquad v > 0$$

a. This pdf is a member of what family introduced in this chapter?
b. If σ = 20 mm (close to the value suggested in the paper), what is the probability that a dart will land within 25 mm (roughly 1 in.) of the target?

121. The article “Three Sisters Give Birth on the Same Day” (Chance, Spring 2001, 23–25) used the fact that three Utah sisters had all given birth on March 11, 1998 as a basis for posing some interesting questions regarding birth coincidences.
a. Disregarding leap year and assuming that the other 365 days are equally likely, what is the probability that three randomly selected births all occur on March 11? Be sure to indicate what, if any, extra assumptions you are making.
b. With the assumptions used in part (a), what is the probability that three randomly selected births all occur on the same day?
c. The author suggested that, based on extensive data, the length of gestation (time between conception and birth) could be modeled as having a normal distribution with mean value 280 days and standard deviation 19.88 days. The due dates for the three Utah sisters were March 15, April 1, and April 4, respectively. Assuming that all three due dates are at the mean of the distribution, what is the probability that all births occurred on March 11? [Hint: The deviation of birth date from due date is normally distributed with mean 0.]
d. Explain how you would use the information in part (c) to calculate the probability of a common birth date.

122. Let X denote the lifetime of a component, with f(x) and F(x) the pdf and cdf of X. The probability that the component fails in the interval (x, x + Δx) is approximately f(x) · Δx. The conditional probability that it fails in (x, x + Δx) given that it has lasted at least x is f(x) · Δx/[1 − F(x)]. Dividing this by Δx produces the failure rate function:

$$r(x) = \frac{f(x)}{1 - F(x)}$$

An increasing failure rate function indicates that older components are increasingly likely to wear out, whereas a decreasing failure rate is evidence of increasing reliability with age. In practice, a “bathtub-shaped” failure rate is often assumed.
a. If X is exponentially distributed, what is r(x)?
b. If X has a Weibull distribution with parameters α and β, what is r(x)? For what parameter values will r(x) be increasing? For what parameter values will r(x) decrease with x?
c. Since r(x) = −(d/dx) ln[1 − F(x)], ln[1 − F(x)] = −∫r(x)dx. Suppose

$$r(x) = \begin{cases} \alpha\left(1 - \dfrac{x}{\beta}\right) & 0 \le x \le \beta \\ 0 & \text{otherwise} \end{cases}$$

so that if a component lasts β hours, it will last forever (while seemingly unreasonable, this model can be used to study just “initial wearout”). What are the cdf and pdf of X?

123.
Let U have a uniform distribution on the interval [0, 1]. Then observed values having this distribution can be obtained from a computer’s random number generator. Let X = −(1/λ)ln(1 − U).
a. Show that X has an exponential distribution with parameter λ. [Hint: The cdf of X is F(x) = P(X ≤ x); X ≤ x is equivalent to U ≤ ?]
b. How would you use part (a) and a random number generator to obtain observed values from an exponential distribution with parameter λ = 10?

124. Consider an rv X with mean μ and standard deviation σ, and let g(X) be a specified function of X. The first-order Taylor series approximation to g(X) in the neighborhood of μ is

$$g(X) \approx g(\mu) + g'(\mu)\cdot(X - \mu)$$

The right-hand side of this equation is a linear function of X. If the distribution of X is concentrated in an interval over which g(·) is approximately linear [e.g., √x is approximately linear in (1, 2)], then the equation yields approximations to E(g(X)) and V(g(X)).
a. Give expressions for these approximations. [Hint: Use rules of expected value and variance for a linear function aX + b.]
b. If the voltage v across a medium is fixed but current I is random, then resistance will also be a random variable related to I by R = v/I. If μ_I = 20 and σ_I = .5, calculate approximations to μ_R and σ_R.

125. A function g(x) is convex if the chord connecting any two points on the function’s graph lies above the graph.
When g(x) is differentiable, an equivalent condition is that for every x, the tangent line at x lies entirely on or below the graph. (See the figure below.) How does g(μ) = g(E(X)) compare to E(g(X))? [Hint: The equation of the tangent line at x = μ is y = g(μ) + g′(μ) · (x − μ). Use the condition of convexity, substitute X for x, and take expected values.] [Note: Unless g(x) is linear, the resulting inequality (usually called Jensen’s inequality) is strict (< rather than ≤); it is valid for both continuous and discrete rv’s.]

[Figure: graph of a convex function with its tangent line; horizontal axis x.]

126. Let X have a Weibull distribution with parameters α = 2 and β. Show that Y = 2X²/β² has a chi-squared distribution with ν = 2. [Hint: The cdf of Y is P(Y ≤ y); express this probability in the form P(X ≤ g(y)), use the fact that X has a cdf of the form in Expression (4.12), and differentiate with respect to y to obtain the pdf of Y.]

127. An individual’s credit score is a number calculated based on that person’s credit history that helps a lender determine how much he/she should be loaned or what credit limit should be established for a credit card. An article in the Los Angeles Times gave data which suggested that a beta distribution with parameters A = 150, B = 850, α = 8, β = 2 would provide a reasonable approximation to the distribution of American credit scores. [Note: credit scores are integer-valued.]
a. Let X represent a randomly selected American credit score. What are the mean value and standard deviation of this random variable? What is the probability that X is within 1 standard deviation of its mean value?
b. What is the approximate probability that a randomly selected score will exceed 750 (which lenders consider a very good score)?

128. Let V denote rainfall volume and W denote runoff volume (both in mm). According to the article “Runoff Quality Analysis of Urban Catchments with Analytical Probability Models” (J.
of Water Resource Planning and Management, 2006: 4–14), the runoff volume will be 0 if V ≤ ν_d and will be k(V − ν_d) if V > ν_d. Here ν_d is the volume of depression storage (a constant), and k (also a constant) is the runoff coefficient. The cited article proposes an exponential distribution with parameter λ for V.
a. Obtain an expression for the cdf of W. [Note: W is neither purely continuous nor purely discrete; instead it has a “mixed” distribution with a discrete component at 0 and is continuous for values w > 0.]
b. What is the pdf of W for w > 0? Use this to obtain an expression for the expected value of runoff volume.
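
The arithmetic in Example 4.28 (the PERT beta model with A = 2, B = 5, α = 2, β = 3) can be cross-checked numerically. The following is a minimal Python sketch, not part of the text's own method: the helper names `beta_pdf` and `simpson` are illustrative assumptions, and Simpson's rule is used only because this particular pdf is a cubic polynomial, for which the rule is exact up to rounding.

```python
from math import gamma


def beta_pdf(x, A, B, alpha, beta):
    """General beta density on [A, B] with shape parameters alpha and beta."""
    if x < A or x > B:
        return 0.0
    u = (x - A) / (B - A)  # rescale x to the standard interval [0, 1]
    const = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return const * u ** (alpha - 1) * (1 - u) ** (beta - 1) / (B - A)


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule using an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


# Parameter values taken from Example 4.28
A, B, alpha, beta = 2, 5, 2, 3

# E(X) = A + (B - A) * alpha / (alpha + beta)
mean_days = A + (B - A) * alpha / (alpha + beta)

# P(X <= 3) is the integral of the pdf from A to 3
p_at_most_3 = simpson(lambda x: beta_pdf(x, A, B, alpha, beta), A, 3)

print(f"E(X) = {mean_days:.1f} days")
print(f"P(X <= 3) = {p_at_most_3:.3f}")
```

Under these assumptions the script reproduces the values in the example, E(X) = 3.2 and P(X ≤ 3) = 11/27 ≈ .407. For beta pdfs that are not polynomials, a finer grid or a library integrator would be a safer choice.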