251y0244 12/10/02
ECO251 QBA1 FINAL EXAM, DECEMBER 10, 2002
Name: KEY   Class: ________________

Part I. Do all the Following (16 Points) Make Diagrams! Show your work!

$x \sim N(3, 6)$

1. $P(-2.50 \le x \le 2.50) = P\left(\frac{-2.50-3}{6} \le z \le \frac{2.50-3}{6}\right) = P(-0.91 \le z \le -0.08) = P(-0.91 \le z \le 0) - P(-0.08 \le z \le 0) = .3186 - .0319 = .2867$

2. (Change 19 to 25.) $P(5 \le x \le 25) = P\left(\frac{5-3}{6} \le z \le \frac{25-3}{6}\right) = P(0.33 \le z \le 3.67) = P(0 \le z \le 3.67) - P(0 \le z \le 0.33) = .4999 - .1293 = .3706$

3. $P(2 \le x \le 5) = P\left(\frac{2-3}{6} \le z \le \frac{5-3}{6}\right) = P(-0.17 \le z \le 0.33) = P(-0.17 \le z \le 0) + P(0 \le z \le 0.33) = .0675 + .1293 = .1968$

4. $F(1.84)$ (Cumulative Probability) $= P(x \le 1.84) = P\left(z \le \frac{1.84-3}{6}\right) = P(z \le -0.19) = P(z \le 0) - P(-0.19 \le z \le 0) = .5 - .0753 = .4247$

5. $P(x \ge 6.11) = P\left(z \ge \frac{6.11-3}{6}\right) = P(z \ge 0.52) = P(z \ge 0) - P(0 \le z \le 0.52) = .5 - .1985 = .3015$

6. $x_{.31}$ (Find $z_{.31}$ first). (3 points)
We want a point $x_{.31}$, so that $P(x \ge x_{.31}) = .31$. Make a diagram for $z$, showing zero in the middle, 50% below zero, and the area above zero divided into 19% between zero and $z_{.31}$ and 31% above $z_{.31}$. From the diagram, $P(0 \le z \le z_{.31}) = .1900$. The closest we can come is $P(0 \le z \le 0.50) = .1915$. So $z_{.31} = 0.50$, and $x_{.31} = \mu + z_{.31}\sigma = 3 + 0.50(6) = 3 + 3.00 = 6.00$.
To check this, note that $P(x \ge 6.00) = P\left(z \ge \frac{6.00-3}{6}\right) = P(z \ge 0.50) = P(z \ge 0) - P(0 \le z \le 0.50) = .5 - .1915 = .3085 \approx .31$.

7. A symmetrical region around the mean with a probability of 31%. (3 points)
We want two points $x_{.655}$ and $x_{.345}$, so that $P(x_{.655} \le x \le x_{.345}) = .3100$. Make a diagram for $z$, showing zero in the middle and an area in the middle of 31%, split in two by zero so that 15.5% is below zero and 15.5% is above zero. Since $.5 - .155 = .345$, the points we want are $z_{.655} = -z_{.345}$ and $z_{.345}$. From the diagram, $P(0 \le z \le z_{.345}) = .1550$. The closest we can come is $P(0 \le z \le 0.40) = .1554$. So use $z_{.345} = 0.40$, and $x = \mu \pm z_{.345}\sigma = 3 \pm 0.40(6) = 3 \pm 2.40$, or 0.60 to 5.40.
To check this, note that $P(0.60 \le x \le 5.40) = P\left(\frac{0.60-3}{6} \le z \le \frac{5.40-3}{6}\right) = P(-0.40 \le z \le 0.40) = 2P(0 \le z \le 0.40) = 2(.1554) = .3108 \approx .31$.

II. (10 points - 2 point penalty for not trying part a.) Show your work!
We are investigating the reliability of a machine that fills 16-ounce bottles. A sample of six is taken with the following results (Assume that it is correct to do a confidence interval from such a small sample): Be careful!
There is a lot of chance for rounding error in this problem!

15.23   15.26   15.19   15.30   16.04   16.00

a. Compute the sample standard deviation, $s$, for the number of ounces. Show your work. (4)
b. Compute a 90% Confidence interval for the mean. (6)
c. (Extra credit) Can you say that the mean that you got is significantly different from 16? You must give a reason for your answer to be read. (2)

Solution: Since the confidence level is $1 - \alpha = .90$, the significance level is $\alpha = .10$.

a.
  Row      $x$       $x^2$
  1      15.23    231.953
  2      15.26    232.868
  3      15.19    230.736
  4      15.30    234.090
  5      16.04    257.282
  6      16.00    256.000
  Sum    93.02   1442.929

$\sum x = 93.02$, $\sum x^2 = 1442.929$ and $n = 6$. So $\bar{x} = \frac{\sum x}{n} = \frac{93.02}{6} = 15.5033$,
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1442.929 - 6(15.5033)^2}{5} = \frac{0.8151}{5} = 0.1630$ and $s = \sqrt{0.1630} = 0.4038$.

b. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{0.1630}{6}} = 0.1648$. The degrees of freedom are $n - 1 = 6 - 1 = 5$. $\alpha = .10$, so $\frac{\alpha}{2} = .05$. From the t-table, $t^{(n-1)}_{\alpha/2} = t^{(5)}_{.05} = 2.015$. But what compelled some of you to decide that 0.4038 was $s$ in a) but $s^2$ in b)?
Putting this all together, $\bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 15.5033 \pm 2.015(0.1648) = 15.503 \pm 0.332$, or 15.17 to 15.83. More formally, $P(15.17 \le \mu \le 15.83) = .90$.

c. Since 16 is not included in this interval, we can say that the mean is significantly different from 16.

Note that Minitab gets a similar solution.

  MTB > TInterval 90.0 c1.
  Confidence Intervals
  Variable   N     Mean   StDev  SE Mean        90.0 % C.I.
  C1         6   15.503   0.402    0.164  ( 15.173, 15.834)

III. Do at least 4 of the following 6 Problems (at least 12 each) (or do sections adding to at least 48 points. Anything extra you do helps, and grades wrap around). Show your work! Please indicate clearly what sections of the problem you are answering! If you are following a rule like $E(ax) = aE(x)$ please state it! If you are using a formula, state it! If you answer a 'yes' or 'no' question, explain why! If you are using the Poisson or Binomial table, state things like $n$, $p$ or the mean. Avoid crossing out answers that you think are inappropriate - you might get partial credit.
Choose the problems that you do carefully: most of us are unlikely to be able to do more than half of the entire possible credit in this section!

1. 36 employees are randomly drawn from a large corporation and their sick days checked. Note that the sample size is 36 throughout this problem!
a. The sample mean is 5.3. Assume that the population standard deviation is known to be 2.2 and create a 90% confidence interval for the average number of sick days. (4)
b. Assume that the facts in part a are correct but that the corporation only has 360 employees. Create a 90% confidence interval for the average number of sick days again, highlighting what has changed. (4)
c. Assume that the sample mean is 5.3 and the corporation has 360 employees, but that 2.2 is a sample standard deviation. Create a 90% confidence interval for the average number of sick days again, highlighting what has changed. (4)
d. The sample mean is 5.3. Assume that the population standard deviation is known to be 2.2 and create a 93% confidence interval for the average number of sick days. (3)
e. Explain the effect of the following on the size of a confidence interval (keep the reasons brief) (3):
(i) A smaller sample size.
(ii) A smaller population size.
(iii) A lower confidence level.

Solution: Note that the sample size is 36 throughout this problem. Since the confidence level is $1 - \alpha = .90$, the significance level is $\alpha = .10$. Repeat after me! "$z$ goes with $\sigma$ (sigma - population variance); $t$ goes with $s$ (sample variance)!" Most of you seem to have been so hypnotized by the old exams that you had that you never noticed that most of this question concerned population variances.

a) $\sigma = 2.2$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.2}{\sqrt{36}} = 0.3667$. From the t-table, $z_{\alpha/2} = z_{.05} = 1.645$. $\bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.645(0.3667) = 5.3 \pm 0.60$, or 4.70 to 5.90. More formally, $P(4.70 \le \mu \le 5.90) = 90\%$.

b) The population size, $N = 360$, is less than 20 times the sample size $n = 36$, so $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{2.2}{\sqrt{36}}\sqrt{\frac{360-36}{360-1}} = 0.3667(0.9500) = 0.3484$. From the t-table, $z_{\alpha/2} = z_{.05} = 1.645$.
$\bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.645(0.3484) = 5.3 \pm 0.57$, or 4.73 to 5.87. More formally, $P(4.73 \le \mu \le 5.87) = 90\%$.

c) $s = 2.2$. The degrees of freedom are $n - 1 = 36 - 1 = 35$. Since we are given $s$ rather than $\sigma$, we must use $t$. From the t-table, $t^{(n-1)}_{\alpha/2} = t^{(35)}_{.05} = 1.690$. The population size, $N = 360$, is less than 20 times the sample size $n = 36$, so $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{2.2}{\sqrt{36}}\sqrt{\frac{360-36}{360-1}} = 0.3667(0.9500) = 0.3484$. Putting this all together, $\bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 5.3 \pm 1.690(0.3484) = 5.3 \pm 0.59$, or 4.71 to 5.89. More formally, $P(4.71 \le \mu \le 5.89) = .90$.

d) $\sigma = 2.2$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.2}{\sqrt{36}} = 0.3667$. Since the confidence level is $1 - \alpha = .93$, the significance level is $\alpha = .07$. We need $z_{\alpha/2} = z_{.035}$. Make a diagram for $z$. It should show 50% above zero, divided into 3.5% above $z_{.035}$ and 50% - 3.5% = 46.5% below $z_{.035}$. So $P(0 \le z \le z_{.035}) = .4650$. From the Normal table, the closest we can come is $P(0 \le z \le 1.81) = .4649$. $\bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 5.3 \pm 1.81(0.3667) = 5.3 \pm 0.66$, or 4.64 to 5.96. More formally, $P(4.64 \le \mu \le 5.96) = 93\%$. It's remarkable how many of you could find something like $z_{.035}$ on page one but not here.

e) The basic formula for a confidence interval is $\bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$ or $\bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$.
(i) A smaller sample size. If we use the formula $\bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}}$ and $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, a smaller sample size will make the standard error larger because the sample size is in the denominator and will also make $t$ larger, as we can see from the t table.
(ii) A smaller population size. The formula for the standard error is $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$. As the population size grows, $\sqrt{\frac{N-n}{N-1}}$ approaches one from below, making the confidence interval larger. So for a smaller population size, the standard error and the interval will be smaller.
(iii) A lower confidence level. If we look at the t table, we see the value of $t$ gets smaller as the significance level gets larger. As the confidence level falls, the significance level rises. So a lower confidence level makes the interval smaller.

2. A sample of 6 is taken from a large Normal population with a mean of 3 and a standard deviation of 13.
a.
What is the probability that an individual item $x$ in the sample lies between 1 and 5? (2)
b. What is the probability that the sample mean $\bar{x}$ lies between 1 and 5? (2)
c. What is the probability that at least one of the 6 measurements lies between 1 and 5? (2)
d. What is the probability that all the 6 measurements lie between 1 and 5? (2)
e. Do b) again assuming that the sample of 6 is taken from a population of 200 (1)
f. Find the 99th percentile of the distribution of $x$ (2)
g. Find the 99th percentile of the distribution of $\bar{x}$ (1)

Solution: $x \sim N(\mu, \sigma) = N(3, 13)$. Since $n = 6$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{13}{\sqrt{6}} = 5.3072$ and $\bar{x} \sim N(\mu, \sigma_{\bar{x}}) = N(3, 5.3072)$.

a) $P(1 \le x \le 5) = P\left(\frac{1-3}{13} \le z \le \frac{5-3}{13}\right) = P(-0.15 \le z \le 0.15) = 2P(0 \le z \le 0.15) = 2(.0596) = .1192$.

b) $P(1 \le \bar{x} \le 5) = P\left(\frac{1-3}{5.3072} \le z \le \frac{5-3}{5.3072}\right) = P(-0.37 \le z \le 0.37) = 2P(0 \le z \le 0.37) = 2(.1443) = .2886$.

c) This problem and the next problem are Binomial problems with $n = 6$, $p = .1192$ and $q = 1 - .1192 = .8808$. Remember $P(x) = C^n_x p^x q^{n-x}$.
$P(x \ge 1) = 1 - P(0) = 1 - C^6_0 (.1192)^0 (.8808)^6 = 1 - .4669 = .5331$.

d) $P(x = 6) = C^6_6 (.1192)^6 (.8808)^0 = .0000029$.

e) Because the population size is more than 20 times the sample size, the answer is essentially the same as b).

f) Since the 99th percentile of $z$ is $z_{.01} = 2.327$, the 99th percentile of $x$ is $x = \mu + z_{.01}\sigma = 3 + 2.327(13) = 33.25$.

g) Since the 99th percentile of $z$ is $z_{.01} = 2.327$, the 99th percentile of $\bar{x}$ is $\bar{x} = \mu + z_{.01}\sigma_{\bar{x}} = 3 + 2.327(5.3072) = 3 + 12.350 = 15.35$.

3. An eight-sided die is rolled six times. It has sides numbered one through 8, all of which are equally likely. A one, two or three wins \$5.00, so that the probability of winning on each roll is 3/8. Let $x$ represent the number of wins and $y$ represent the amount won. Find the following:
a. The complete distribution of $x$ and $y$ (7). Note that you do not have to do this part to answer the remainder of the question.
b. The mean and variance of $x$. (2)
c. The mean and variance of $y$. (2)
d. The chance of at least one win (2 points if you did not do a) (1)
e. Assume that the die is rolled 40 times, find the probability of 30 or more wins.
(Note that your binomial tables won't help and that if you try to do this without using a table you will be here until Christmas)
(i) Can this problem be done using the Poisson distribution? Why? (1)
(ii) Can this problem be done using the Normal distribution? Why? (1)
(iii) Decide which of the two distributions is correct and answer the question. (2 or 2.5)

Solution: This problem is a Binomial problem with $n = 6$, $p = \frac{3}{8}$ and $q = 1 - \frac{3}{8} = \frac{5}{8}$. Remember $P(x) = C^n_x p^x q^{n-x} = C^6_x \left(\frac{3}{8}\right)^x \left(\frac{5}{8}\right)^{6-x}$.

a. The distribution is as follows:

  $x$   $y$   $P(x)$
  0     0    $C^6_0 \left(\frac{3}{8}\right)^0 \left(\frac{5}{8}\right)^6 = .05960$
  1     5    $C^6_1 \left(\frac{3}{8}\right)^1 \left(\frac{5}{8}\right)^5 = 6(.035763) = .21458$
  2    10    $C^6_2 \left(\frac{3}{8}\right)^2 \left(\frac{5}{8}\right)^4 = 15(.021457) = .32187$
  3    15    $C^6_3 \left(\frac{3}{8}\right)^3 \left(\frac{5}{8}\right)^3 = 20(.012875) = .25749$
  4    20    $C^6_4 \left(\frac{3}{8}\right)^4 \left(\frac{5}{8}\right)^2 = 15(.007725) = .11587$
  5    25    $C^6_5 \left(\frac{3}{8}\right)^5 \left(\frac{5}{8}\right)^1 = 6(.004635) = .02781$
  6    30    $C^6_6 \left(\frac{3}{8}\right)^6 \left(\frac{5}{8}\right)^0 = .00278$

There is minimal rounding error, since these add to 1.

b. The mean is $\mu = np = 6\left(\frac{3}{8}\right) = 2.25$ and the variance is $\sigma^2 = npq = 2.25\left(\frac{5}{8}\right) = 1.40625$.

c. The prize is $y = 5x$ so, using the formulas $E(ax) = aE(x)$ and $Var(ax) = a^2 Var(x)$, we get $E(5x) = 5E(x) = 5(2.25) = 11.25$ and $Var(5x) = 5^2 Var(x) = 25(1.40625) = 35.15625$.

d. $P(x \ge 1) = 1 - P(0) = 1 - .05960 = .94040$.

e. (i) $n = 50$, $p = \frac{5}{8}$. If we test to see if the Poisson Distribution can be used, we find $\frac{n}{p} = \frac{50}{5/8} = 80.0$. Since this is not above 500, we cannot use the Poisson Distribution.
(ii) $np = 50\left(\frac{5}{8}\right) = 31.25 > 5$. $nq = 50\left(\frac{3}{8}\right) = 50 - 31.25 = 18.75 > 5$. Since these are both above 5, we can use the Normal distribution with $\mu = np = 50\left(\frac{5}{8}\right) = 31.25$ and $\sigma^2 = npq = 31.25\left(\frac{3}{8}\right) = 11.71875$.
(iii) With the continuity correction, $P(x \ge 29.5) = P\left(z \ge \frac{29.5 - 31.25}{\sqrt{11.71875}}\right) = P(z \ge -0.51) = .5 + .1950 = .6950$.

4. a. I buy two mainframes from a computer manufacturer. One computer has a 23% chance of breaking down during the first year, the second has a 26% chance of breaking down during the same period. Let $A$ represent the event that the first computer breaks down in the first year and $B$ represent the event that the second computer breaks down. Tell how you will note the complement of these events and assume that the events are independent.
Find the following, noting if each is an event like $A \cap B$, $A \cup B$, or similar events involving the complement. Assume that events $A$ and $B$ are independent.
(i) The probability that both computers break down in the first year. (2)
(ii) The probability that one computer breaks down in the first year. (2)
(iii) The probability that no computer breaks down in the first year. (2)
(iv) If a breakdown costs you \$1000 (and no machine will break down more than once in a year), what is the mean and variance of your breakdown costs? (4)
b. I use an average of 2 boxes of paper in a day, but it takes 5 days to get a delivery. Given the average usage over 5 days, do the following:
(i) What is the probability of using at least 22 boxes in the 5 day period? (1)
(ii) What is the probability of using more than 22 boxes? (1)
(iii) If $x$ is the number of boxes used, what is $P(4 \le x \le 17)$? (2)
(iv) If I want to keep the probability of running out of paper below 1%, what is the minimum number of boxes I should have on hand when I reorder? (Why?) (2)

Solution: a. $\bar{A}$ is the complement of $A$.
(i) $P(A \cap B) = P(A)P(B) = .23(.26) = .0598$.
(ii) $P(A \cap \bar{B}) + P(\bar{A} \cap B) = P(A)P(\bar{B}) + P(\bar{A})P(B) = .23(.74) + .77(.26) = .1702 + .2002 = .3704$. (This is $P[(A \cap \bar{B}) \cup (\bar{A} \cap B)]$.)
(iii) $P(\bar{A} \cap \bar{B}) = P(\bar{A})P(\bar{B}) = .77(.74) = .5698$. These probabilities add to one.
(iv) If $x$ represents the number of breakdowns, $y = 1000x$ is the cost.

  $x$     $P(x)$    $xP(x)$   $x^2 P(x)$
  0      .5698     0         0
  1      .3704     .3704     .3704
  2      .0598     .1196     .2392
  sum   1.0000     .4900     .6096

$\mu_x = E(x) = \sum xP(x) = .49$ and $\sigma_x^2 = Var(x) = E(x^2) - \mu_x^2 = .6096 - (.49)^2 = .3695$. $E(ax) = aE(x)$ and $Var(ax) = a^2 Var(x)$, so $E(1000x) = 1000E(x) = 1000(.49) = 490$ and $Var(1000x) = 1000^2 Var(x) = 369500$.

b. Poisson distribution with mean $2(5) = 10$.
(i) $P(x \ge 22) = 1 - P(x \le 21) = 1 - .99930 = .00070$
(ii) $P(x > 22) = 1 - P(x \le 22) = 1 - .99970 = .00030$
(iii) $P(4 \le x \le 17) = P(x \le 17) - P(x \le 3) = .98572 - .01034 = .97538$
(iv) You must have 18 boxes. $P(x \le 18) = .99281$. This means $P(x > 18)$ is below 1%.

5. (Lee et. al.) The following joint probability table measures the satisfaction that customers have with local food stores. $x$ is the satisfaction level, with 4 representing the highest value.
$y$ represents the number of years that the consumer has lived in the area, with 2 representing long term residents.

                      $x$
  $y$       1     2     3     4    Total
  1        .01   .14   .23   .07    .45
  2        .10   .17   .23   .05    .55
  Total    .11   .31   .46   .12

a. Are $x$ and $y$ independent? Why? (2)
b. How do we know that this is a valid distribution? (1)
c. Compute the mean and standard deviation of $y$ (2)
d. Compute the covariance between $x$ and $y$ and interpret it. Does this mean that long term residents are more satisfied? (3.5)
e. Compute the correlation. What does this tell us that we couldn't learn from the covariance? (2.5)
f. Remember that these are joint probabilities. Find the conditional probability that a long-term resident is very satisfied, $P(x = 4 \mid y = 2)$. (2)
g. Use the same method to find the complete conditional probability of $x$ for long term residents, show that this is a valid distribution and compute the conditional mean for long-term residents. (5.5)
h. If you are ready for some real thinking, find the conditional mean for short-term residents and use it to compare the satisfaction levels for the two kinds of residents. (To do this correctly we would need variances as well. Have a nice break and we'll worry about that next semester.)

Solution: a. $x$ and $y$ are not independent. For example, $P(x = 1 \cap y = 1) = .01 \ne P(x = 1)P(y = 1) = .11(.45) = .0495$.

b. The probabilities add to one and are not negative or above 1.

c.
  $y$     $P(y)$   $yP(y)$   $y^2 P(y)$
  1       .45      0.45      0.45
  2       .55      1.10      2.20
  sum    1.00      1.55      2.65

$\mu_y = E(y) = \sum yP(y) = 1.55$, $\sigma_y^2 = E(y^2) - \mu_y^2 = 2.65 - (1.55)^2 = 0.2475$ and $\sigma_y = \sqrt{.2475} = .4975$.

d.
                        $x$
  $y$           1      2      3      4     $P(y)$   $yP(y)$   $y^2 P(y)$
  1            .01    .14    .23    .07     .45      0.45      0.45
  2            .10    .17    .23    .05     .55      1.10      2.20
  $P(x)$       .11    .31    .46    .12    1.00      1.55      2.65
  $xP(x)$      .11    .62   1.38    .48    2.59
  $x^2 P(x)$   .11   1.24   4.14   1.92    7.41

To summarize: $\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 2.59$, $E(x^2) = \sum x^2 P(x) = 7.41$, $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 1.55$ and $E(y^2) = \sum y^2 P(y) = 2.65$.

$E(xy) = .01(1)(1) + .14(2)(1) + .23(3)(1) + .07(4)(1) + .10(1)(2) + .17(2)(2) + .23(3)(2) + .05(4)(2)$
$= 0.01 + 0.28 + 0.69 + 0.28 + 0.20 + 0.68 + 1.38 + 0.40 = 3.92$

$\sigma_{xy} = Cov(x, y) = E(xy) - \mu_x \mu_y = 3.92 - 2.59(1.55) = -0.0945$.
Negative, so $x$ and $y$ move oppositely. This means that long term residents are less satisfied.

e. $\sigma_x^2 = E(x^2) - \mu_x^2 = 7.41 - (2.59)^2 = 0.7019$ and $\sigma_y^2 = E(y^2) - \mu_y^2 = 2.65 - (1.55)^2 = 0.2475$. So that $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{-0.0945}{\sqrt{.7019}\sqrt{.2475}} = \frac{-.0945}{0.8378(0.4975)} = -.22673$. Since the square of the correlation is between .05 and .06, on a zero-one scale it is quite weak.

f. By the multiplication rule, $P(x = 4 \mid y = 2) = \frac{P(x = 4 \cap y = 2)}{P(y = 2)} = \frac{.05}{.55} = .0909$.

g. If we divide the whole top row by .45, and the bottom row by .55, we get

                      $x$
  $y$       1      2      3      4     Total
  1        .022   .311   .511   .156   1.000
  2        .182   .309   .418   .091   1.000

For short-term residents: .022(1) + .311(2) + .511(3) + .156(4) = .022 + .622 + 1.533 + .624 = 2.801.
For long-term residents: .182(1) + .309(2) + .418(3) + .091(4) = .182 + .618 + 1.254 + .364 = 2.418.
Anyway, long-term residents seem less satisfied.

6. Answers to a-c can be left in factorial form.
a. A bridge hand consists of 13 cards. A deck is a population of 52 cards, with a total of 4 kings. What is the probability of 3 kings? What is the distribution you are using? (3)
b. What are the mean and variance of the number of kings in a hand? (2)
c. Now comes the fun. Let's say we take a load of decks (you do not need to know how many cards we have) and we deal you 13 cards. What is the probability of 3 kings now? (3)
d. Are the mean and variance of the number of kings in your hand in c) the same as the values in b)? What changes and why? (1)
e. (Keller, Warrack) The amount of gasoline sold by one of your service stations daily is uniformly distributed between a minimum of 2000 and a maximum of 5000 gallons.
(i) What is the mean and standard deviation of sales? (1.5)
(ii) What is the probability that sales on a given day will fall between 2500 and 3500 gallons? (2)
(iii) What is the probability that sales will be over 3500 gallons? (1)
(iv) What is the probability of sales between 1700 and 2700 gallons?
(1)
(v) If you did not do (iii) using cumulative distributions, do it now (2)
(vi) If you own 10 identical service stations, what is the probability that at least one has sales over 3500 gallons? (3)

Solution: a. Hypergeometric distribution with $M = 4$, $N = 52$, $n = 13$ and $x = 3$. $P(x) = \frac{C^M_x C^{N-M}_{n-x}}{C^N_n}$. So
$P(3) = \frac{C^4_3 C^{48}_{10}}{C^{52}_{13}} = \frac{\frac{4!}{3!1!} \cdot \frac{48!}{38!10!}}{\frac{52!}{39!13!}} = 4 \cdot \frac{48 \cdot 47 \cdots 39}{10 \cdot 9 \cdots 1} \cdot \frac{13 \cdot 12 \cdots 1}{52 \cdot 51 \cdots 40} = \frac{4(39)(13)(12)(11)}{52 \cdot 51 \cdot 50 \cdot 49} = .04120$

b. $p = \frac{M}{N} = \frac{4}{52} = \frac{1}{13} = .07692$, so $\mu = np = 13\left(\frac{1}{13}\right) = 1$ and
$\sigma^2 = npq\,\frac{N-n}{N-1} = 13\left(\frac{4}{52}\right)\left(\frac{48}{52}\right)\left(\frac{52-13}{52-1}\right) = \frac{39}{51}(13)\left(\frac{4}{52}\right)\left(\frac{48}{52}\right) = .76471(13)(.07101) = .70588$, so $\sigma = \sqrt{0.70588} = 0.84017$.

c. Binomial: $P(x) = C^n_x p^x q^{n-x}$. So $P(3) = C^{13}_3 \left(\frac{1}{13}\right)^3 \left(\frac{12}{13}\right)^{10} = \frac{13!}{3!10!}\left(\frac{1}{13}\right)^3\left(\frac{12}{13}\right)^{10} = \frac{13 \cdot 12 \cdot 11}{3 \cdot 2 \cdot 1}\left(\frac{1}{13}\right)^3\left(\frac{12}{13}\right)^{10} = 286(.000455166)(.449137) = .05847$

d. Because the population is now infinite, we remove the .76471 finite population adjustment from the variance formula. The mean is unchanged.

e. This is a continuous uniform distribution with $c = 2000$ and $d = 5000$.
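As a numerical cross-check on uniform-distribution answers of this kind, here is a minimal Python sketch. It assumes only the limits $c = 2000$ and $d = 5000$ stated in the problem; the helper names `uniform_cdf` and `uniform_prob` are hypothetical, introduced here for illustration, and are not part of the exam or of any library.

```python
import math


def uniform_cdf(x, c, d):
    """Cumulative probability F(x) for a continuous uniform distribution on [c, d]."""
    if x <= c:
        return 0.0  # no probability below the lower limit
    if x >= d:
        return 1.0  # all probability lies below the upper limit
    return (x - c) / (d - c)


def uniform_prob(a, b, c, d):
    """P(a <= x <= b), computed as F(b) - F(a)."""
    return uniform_cdf(b, c, d) - uniform_cdf(a, c, d)


# Limits taken from the gasoline-sales problem (assumed units: gallons per day).
c, d = 2000.0, 5000.0

mean = (c + d) / 2                       # (c + d) / 2
std_dev = math.sqrt((d - c) ** 2 / 12)   # sqrt((d - c)^2 / 12)

p_2500_3500 = uniform_prob(2500, 3500, c, d)
p_over_3500 = 1 - uniform_cdf(3500, c, d)
p_1700_2700 = uniform_prob(1700, 2700, c, d)  # the part below c contributes nothing

print(mean, round(std_dev, 3))
print(round(p_2500_3500, 4), round(p_over_3500, 4), round(p_1700_2700, 4))
```

Working through the cumulative distribution handles an interval such as 1700 to 2700, which starts below the lower limit, without a special case.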
$f(x) = \frac{1}{d-c} = \frac{1}{5000-2000} = \frac{1}{3000}$
(i) $\mu = \frac{c+d}{2} = \frac{2000+5000}{2} = 3500$, $\sigma^2 = \frac{(d-c)^2}{12} = \frac{(5000-2000)^2}{12} = 750000$, so $\sigma = \sqrt{750000} = 866.025$
(ii) $P(2500 \le x \le 3500) = \frac{3500-2500}{3000} = \frac{1000}{3000} = \frac{1}{3} = .3333$
(iii) $P(x \ge 3500) = \frac{5000-3500}{3000} = \frac{1}{2} = .5000$
(iv) $P(1700 \le x \le 2700) = \frac{2700-2000}{3000} = .2333$
(v) Recall that $F(x) = \frac{x-c}{d-c}$ for $c \le x \le d$, $F(x) = 0$ for $x < c$ and $F(x) = 1$ for $x > d$, and that $F(500)$ means $P(x \le 500)$. $P(x \ge 3500) = 1 - F(3500) = 1 - \frac{3500-2000}{3000} = 1 - \frac{1}{2} = .5000$
(vi) This is your basic Binomial problem, $P(x) = C^n_x p^x q^{n-x}$. $p = .5$, $q = .5$ and $n = 9$. You can use the Binomial table for this one. $P(x \ge 1) = 1 - P(0) = 1 - .00195 = .99805$

6. (ctd)
f. (Keller, Warrack) (Extra credit) The number of hours an alkaline battery lasts is exponentially distributed with a parameter $c$ of 0.04.
(i) What are the mean and standard deviation of a battery's life? (2)
(ii) What is the probability that the battery will last between 15 and 20 hours? (2)
(iii) What is the probability that it will last for more than 20 hours? (1)
(iv) Are you surprised that a jorcillator has two such batteries and that it works as long as one of the batteries works? What is the probability that the jorcillator lasts more than 20 hours? (2)
(v) (Difficult) What is the probability that the jorcillator lasts between 15 and 20 hours? (The answer to this will not be published!) (7)
g. The Muggle detector in front of my tower will identify a Muggle correctly as a Muggle 97% of the time and will identify a Wizard correctly as a Wizard 95% of the time. Assume that 2% of the population are Wizards and the rest Muggles. The Muggle detector says that you are a Wizard. What is the probability that you really are a Wizard? I only admit Wizards to my tower. (Hint: To do this you need decent notation or a good tree. Let $Y$ (yes!) be the event that it says you are a Wizard and $N$ (no!) be the event that it says you are a Muggle. $W$ is the event that you are a Wizard, $P(W) = .02$, and $M$ is the event that you are a Muggle. You need conditional probability to do this.)
(6)

Solution: f) You were told in advance that this section would use the exponential distribution. In the exponential distribution from the outline, $F(x) = 1 - e^{-cx}$, when the mean time to a success is $\frac{1}{c}$. Note that this is only for $x \ge 0$. There is no probability below zero.
(i) $\mu = \frac{1}{c} = \frac{1}{.04} = 25$. For the exponential distribution the standard deviation equals the mean, so $\sigma = 25$ as well.
(ii) $F(10) = 1 - e^{-.04(10)} = 1 - e^{-0.4} = 1 - .67032$ and $F(15) = 1 - e^{-.04(15)} = 1 - e^{-0.6} = 1 - .54881$. So $P(10 \le x \le 15) = .67032 - .54881 = .12151$
(iii) $P(x > 20) = 1 - F(20) = 1 - \left(1 - e^{-.04(20)}\right) = e^{-0.8} = .44933$
(iv) The probability that one component lasts less than 20 hours is 1 - .44933 = .55067. The probability that both fail before 20 hours is $(.55067)^2 = .30324$. The complement of this is .69676.

g. Not a hard problem. You are given $P(N \mid M) = .97$, $P(Y \mid W) = .95$, $P(M) = .98$ and $P(W) = .02$. This implies that $P(Y \mid M) = .03$ and $P(N \mid W) = .05$. You are asked for $P(W \mid Y)$. By Bayes' Rule, $P(W \mid Y) = \frac{P(Y \mid W)P(W)}{P(Y)}$.
$P(Y) = P(Y \mid W)P(W) + P(Y \mid M)P(M) = .95(.02) + .03(.98) = .0190 + .0294 = .0484$
So $P(W \mid Y) = \frac{P(Y \mid W)P(W)}{P(Y)} = \frac{.0190}{.0484} = .3926$
Another way to do this is to say that of 10000 people who come to my tower, 9800 will be Muggles and 200 will be Wizards. Of the Muggles, 97% or 9506 will be identified as Muggles. The remaining 3% or 294 will be wrongly identified as Wizards. Of the 200 Wizards, 95% or 190 will be correctly identified. So 294 + 190 = 484 are identified as Wizards. Of the 484, 190 or 39.3% actually are Wizards.
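The Bayes' Rule calculation for the Muggle detector can be reproduced in a few lines. The following is a minimal Python sketch, assuming only the three probabilities given in the problem; the function name `posterior_wizard` and its parameter names are hypothetical and chosen for illustration.

```python
def posterior_wizard(p_wizard, p_yes_given_wizard, p_no_given_muggle):
    """Return P(W | Y): the chance of being a Wizard given the detector says 'Wizard'."""
    p_muggle = 1 - p_wizard
    # A Muggle is wrongly flagged as a Wizard whenever the detector fails on a Muggle.
    p_yes_given_muggle = 1 - p_no_given_muggle
    # Total probability that the detector says 'Wizard'.
    p_yes = p_yes_given_wizard * p_wizard + p_yes_given_muggle * p_muggle
    # Bayes' Rule.
    return p_yes_given_wizard * p_wizard / p_yes


# Values from the problem: P(W) = .02, P(Y | W) = .95, P(N | M) = .97.
p_w_given_y = posterior_wizard(0.02, 0.95, 0.97)
print(round(p_w_given_y, 4))

# The same answer from the head count of 10000 visitors: 190 true Wizards out of 484 flagged.
print(round(190 / 484, 4))
```

Both lines should agree with the .3926 obtained above, which shows that the head-count argument and the formula are the same computation.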