Stat 366 Lab 1 Solutions (September 14, 2006)
TA: Yury Petrachenko, CAB 484, [email protected], http://www.ualberta.ca/~yuryp/

Review Questions, Chapters 2, 3, 4, 7, 8

2.105 A study of the residents of a region showed that 20% were smokers. The probability of death due to lung cancer, given that a person smoked, was ten times the probability of death due to lung cancer, given that the person did not smoke. If the probability of death due to lung cancer in the region is .006, what is the probability of death due to lung cancer given that the person is a smoker?

Solution. Consider an experiment consisting of a random selection of a late resident of the region. Let $S$ denote the event that the selected person smoked, and let $D$ denote the event that the person died of lung cancer. The problem states that $P(S) = 0.2$ and $P(D) = 0.006$. We are also given that $P(D \mid S) = 10 \cdot P(D \mid \bar{S})$, or $P(D \mid \bar{S}) = 0.1 \cdot P(D \mid S)$, where $\bar{S}$ is the event that the person didn't smoke. Apply the law of total probability to $P(D)$, conditioning on $S$ and $\bar{S}$:

\[ P(D) = P(D \mid S) P(S) + P(D \mid \bar{S}) P(\bar{S}). \]

Substitute all available information into this equation:

\[ 0.006 = P(D \mid S) \cdot 0.2 + 0.1 \cdot P(D \mid S) \cdot (1 - 0.2) = 0.28 \cdot P(D \mid S). \]

Hence $P(D \mid S) = 0.006 / 0.28 \approx 0.0214$. □

3.42 A particular sale involves four items randomly selected from a large lot that is known to contain 10% defectives. Let $Y$ denote the number of defectives among the four sold. The purchaser of the items will return the defectives for repair, and the repair cost is given by $C = 3Y^2 + Y + 2$. Find the expected repair cost.

Solution. This is a binomial model. There are $n = 4$ trials, each consisting of selecting an item from a large lot. Each trial has two outcomes: the item is either defective or not. Assuming the lot is large enough, the trials are independent. Since $Y$ is the number of defectives, it makes sense to call finding a defective a success.
Then, $p = 0.1$ and $Y \sim \mathrm{Binomial}(n, p)$.

To find the expected repair cost $C$, use the linearity of expectation:

\[ E[C] = E[3Y^2 + Y + 2] = 3E[Y^2] + E[Y] + E[2]. \]

The expectation of 2 is 2. The expectation of $Y$ is $np$ (since we know the distribution of this random variable), so $E[Y] = 4 \cdot 0.1 = 0.4$. To find $E[Y^2]$, recall the formula

\[ V[Y] = \sigma^2 = E[Y^2] - \mu^2 = E[Y^2] - (E[Y])^2. \]

With $q = 1 - p = 0.9$, we have

\[ E[Y^2] = V[Y] + (E[Y])^2 = npq + (0.4)^2 = 0.36 + 0.16 = 0.52. \]

Finally, plug everything in: $E[C] = 3 \cdot 0.52 + 0.4 + 2 = 3.96$. □

3.107 A salesperson has found that the probability of a sale on a single contact is approximately .03. If the salesperson contacts 100 prospects, what is the approximate probability of making at least one sale?

Solution. If we ignore the word "approximate", we can approach this problem with a binomial distribution. There are $n = 100$ prospects, assumed independent, and the salesperson makes a sale with probability $p = 0.03$ in each of the 100 cases. The question can be restated as follows: find $P(Y \ge 1)$ if $Y$ is binomially distributed with parameters $n$ and $p$. To answer it, use the fact that $\sum_{y=0}^{100} P(Y = y) = 1$, since $Y$ takes the values $0, 1, 2, \ldots, 100$. So,

\[ P(Y \ge 1) = \sum_{y=1}^{100} P(Y = y) = 1 - P(Y = 0) = 1 - \binom{100}{0} (0.03)^0 (0.97)^{100} \approx 0.9524. \]

To find a truly approximate value, we can use the Poisson distribution. Since $p$ is close to zero, the variance $npq = 2.91$ and the expected value $np = 3$ do not differ much, as a Poisson model requires. So, let $\lambda = np = 3$ and assume that $Y$ follows the Poisson distribution:

\[ P(Y = y) = \frac{\lambda^y}{y!} e^{-\lambda}, \quad y = 0, 1, 2, \ldots \]

With this assumption,

\[ P(Y \ge 1) = 1 - P(Y = 0) = 1 - \frac{3^0}{0!} e^{-\lambda} = 1 - e^{-3} \approx 0.9502. \]

The second method is faster and uses a model with only one parameter.
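The two calculations for Problem 3.107 can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper names `binomial_pmf` and `poisson_pmf` are illustrative choices and do not come from the lab handout.

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return lam**y * math.exp(-lam) / math.factorial(y)

n, p = 100, 0.03      # values given in Problem 3.107
lam = n * p           # Poisson parameter, lambda = np = 3

exact = 1 - binomial_pmf(0, n, p)    # 1 - (0.97)^100
approx = 1 - poisson_pmf(0, lam)     # 1 - e^(-3)

print(f"binomial: {exact:.4f}")
print(f"Poisson:  {approx:.4f}")
```

The printed values should agree with the two answers in the solution to four decimal places. The gap between them shows how close the Poisson approximation is for these values of $n$ and $p$.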
□

7.23 An anthropologist wishes to estimate the average height of men for a certain race of people. If the population standard deviation is assumed to be 2.5 inches and if she randomly samples 100 men, find the probability that the difference between the sample mean and the true population mean will not exceed .5 inches.

Solution. The anthropologist assumes $\sigma = 2.5$. Because $n = 100$ is large enough ($> 30$), we can use the Central Limit Theorem to justify treating the distribution of the sample mean $\bar{Y}$ as approximately Normal. Let $\mu$ be the true population mean. The investigator wants to know the probability that the absolute value of the difference between $\mu$ and $\bar{Y}$ is not larger than 0.5 inches. By the CLT we have

\[ P\left(a \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le b\right) \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\,dt, \]

where the right-hand side is the area under the standard Normal curve between $a$ and $b$. It is available in the tables at the end of the textbook. Applying the theorem to our case, we have

\[ P(|\bar{Y} - \mu| \le 0.5) = P(-0.5 \le \bar{Y} - \mu \le 0.5) = P\left(-\frac{0.5}{2.5/\sqrt{100}} \le \frac{\bar{Y} - \mu}{2.5/\sqrt{100}} \le \frac{0.5}{2.5/\sqrt{100}}\right) \approx P(-2 \le Z \le 2) \approx 0.9544. \]

($Z$ is a standard Normal random variable.) So, the probability is about 95%. □

8.2 Suppose that $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $V(\hat{\theta}_1) = \sigma_1^2$, and $V(\hat{\theta}_2) = \sigma_2^2$. Consider the estimator $\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$.
(a) Show that $\hat{\theta}_3$ is an unbiased estimator for $\theta$.
(b) If $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, how should the constant $a$ be chosen in order to minimize the variance of $\hat{\theta}_3$?

Solution. To prove that $\hat{\theta}_3$ is unbiased, consider its expectation:

\[ E[\hat{\theta}_3] = E[a\hat{\theta}_1 + (1 - a)\hat{\theta}_2] = aE[\hat{\theta}_1] + (1 - a)E[\hat{\theta}_2] = a\theta + (1 - a)\theta = \theta. \]

To answer (b) we need the formula for the variance of $\hat{\theta}_3$ as a function of $a$. Because $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, the variance is additive. Recall also that $V[aX] = a^2 V[X]$. We have:

\[ V[\hat{\theta}_3] = V[a\hat{\theta}_1 + (1 - a)\hat{\theta}_2] = a^2 V[\hat{\theta}_1] + (1 - a)^2 V[\hat{\theta}_2] = a^2\sigma_1^2 + (1 - a)^2\sigma_2^2. \]

We want to minimize this function.
Denote $f(a) = a^2\sigma_1^2 + (1 - a)^2\sigma_2^2$. Then

\[ f'(a) = 2a\sigma_1^2 - 2(1 - a)\sigma_2^2 = 0, \]

and solving for $a$:

\[ a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \]

Since $f''(a) = 2\sigma_1^2 + 2\sigma_2^2 > 0$, this value of $a$ minimizes the variance of the third estimator. □

8.11 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a population whose density is given by

\[ f(y) = \begin{cases} 3\beta^3 y^{-4}, & \beta \le y, \\ 0, & \text{elsewhere}, \end{cases} \]

where $\beta$ is unknown. Consider the estimator $\hat{\beta} = \min(Y_1, Y_2, \ldots, Y_n)$.
(a) Derive the bias of the estimator $\hat{\beta}$.
(b) Derive $\mathrm{MSE}(\hat{\beta})$.

Solution. See Prof. Prasad's notes.
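As a numerical complement to Problem 8.11 (not a substitute for the derivation in the notes), the bias and MSE of $\hat{\beta}$ can be estimated by simulation. The sketch below is in Python and makes several assumptions that are not in the handout: the values $\beta = 2$, $n = 5$, the number of replications, and the function names `sample_y` and `simulate` are all hypothetical. Sampling uses the inverse transform implied by the given density: the CDF is $F(y) = 1 - (\beta/y)^3$ for $y \ge \beta$, so $Y = \beta U^{-1/3}$ with $U$ uniform on $(0, 1]$.

```python
import random

def sample_y(beta, rng):
    """Draw one Y with density 3*beta^3*y^(-4) on y >= beta (inverse transform)."""
    u = 1.0 - rng.random()            # uniform on (0, 1], so u is never zero
    return beta * u ** (-1.0 / 3.0)   # solves F(y) = 1 - (beta/y)^3 = 1 - u

def simulate(beta, n, reps, seed=0):
    """Monte Carlo estimates of the bias and MSE of min(Y_1, ..., Y_n)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        beta_hat = min(sample_y(beta, rng) for _ in range(n))
        errors.append(beta_hat - beta)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

# Hypothetical settings chosen only for illustration.
bias, mse = simulate(beta=2.0, n=5, reps=100_000)
print(f"estimated bias: {bias:.4f}")
print(f"estimated MSE:  {mse:.4f}")
```

The estimated bias should be positive, because $\min(Y_1, \ldots, Y_n) \ge \beta$ always holds under this density. Rerunning with a larger `n` gives a quick check on how the bias and MSE found in parts (a) and (b) should behave as the sample grows.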