Download Problem Set 4 - Massachusetts Institute of Technology

Department of Urban Studies and Planning
Massachusetts Institute of Technology
11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1999

Homework Set #4 - Solutions
Due: Friday, April 16, 5:00 p.m. to Mark in 10-485. [Total = 100 points]

Probability, Probability Distributions, and Statistical Estimation

Question 1

The director of a local pollution control board is concerned that a particular company in the area may be illegally dumping certain chemical wastes into a river. Recent national studies have indicated that such practices are relatively widespread: 15 percent of the companies that are similar in nature to the local company do dump wastes illegally. Before undertaking a formal inquiry, the director can authorize the staff to sample the water quality a short distance downstream from the company's factory. The staff estimates that these water samples will be 75% accurate in predicting excessive dumping when it is in fact occurring and 80% accurate in predicting no illegal dumping when it is in fact not occurring.

[6] (a) Draw a complete probability tree for this problem and identify all of the nodes, branches, and probabilities associated with it.

The following tree uses these abbreviations: D = Dumping, ND = Not Dumping, PY = sample Predicts Yes (dumping), PN = sample Predicts No (dumping). The values p(D) = 0.15, p(PY|D) = 0.75 and p(PN|ND) = 0.8 were provided in the question itself; the others follow from them. Remember that the probabilities of each set of branches emanating from the same node always sum to 1.0.
First branching (company behavior):
● p(D) = 0.15
● p(ND) = 0.85

Second branching (sample prediction, given company behavior):
● p(PY|D) = 0.75, p(PN|D) = 0.25
● p(PY|ND) = 0.2, p(PN|ND) = 0.8

End points (joint probabilities):
● p(D and PY) = p(D)•p(PY|D) = (0.15)(0.75) = 0.1125 (A)
● p(D and PN) = p(D)•p(PN|D) = (0.15)(0.25) = 0.0375 (B)
● p(ND and PY) = p(ND)•p(PY|ND) = (0.85)(0.2) = 0.17 (C)
● p(ND and PN) = p(ND)•p(PN|ND) = (0.85)(0.8) = 0.68 (D)

[3] (b) What is the probability that the staff, having taken a sample, will predict "yes, there is illegal dumping"?

Referring to the letters on the right-hand side of the tree, the answer is the sum of the probabilities associated with end points (A) and (C):

p(PY) = p(PY and D) + p(PY and ND) = 0.1125 + 0.17 = 0.2825

[3] (c) What is the probability that the company is in fact dumping illegally when the staff predicts illegal dumping is occurring?

This is a conditional probability:

$$p(D \mid PY) = \frac{p(D \text{ and } PY)}{p(PY)} = \frac{0.1125}{0.2825} \approx 0.3982$$

Question 2

In the past few months, the Department of Urban Studies and Planning has submitted eight research proposals to various government agencies for funding. Our past experience is that, on average, one out of every ten proposals will be funded. In this case, because each of the proposals is to a different agency, it seems reasonable to believe that approval or denial of each proposal will have no bearing on the decision concerning the others.

[12] (a) Graph the probability histogram for the number of these eight proposals that will be funded. Clearly label both axes.

This is a binomial probability situation: each proposal either succeeds or fails (yes or no), independently of the others. The general binomial probability of getting x successes in n trials is:

$$p(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$
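The binomial formula can be checked numerically. The sketch below is a minimal illustration in Python, not part of the course materials; the helper name `binomial_pmf` is our own. It evaluates the formula with the standard-library `math.comb` (Python 3.8+) for n = 8 and p = 0.1 and prints a crude text histogram.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Question 2: n = 8 proposals, each funded independently with p = 0.1
n, p = 8, 0.1
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in table.items():
    # crude text histogram: one '#' per 0.01 of probability
    print(f"{x}  {prob:.3f}  {'#' * round(prob * 100)}")
```

The same helper also covers Question 6: `binomial_pmf(1, 5, 0.7)` evaluates to approximately 0.02835.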
We know that each proposal has an independent probability of success of 1 in 10, so p = 0.1, and we are asked to evaluate the probability of x proposals being successful in 8 trials, for x = 0, 1, 2, …, 8.

The probability of 0 proposals being successful in 8 trials is:

$$p(0 \text{ successes in 8 trials}) = \binom{8}{0} p^0 (1-p)^{8-0} = \frac{8!}{0!\,(8-0)!}(1)(0.9)^8 = \frac{8!}{(1)\,8!}(0.9)^8 = (1)(0.430) \approx 0.430$$

Similarly, the probability of 1 proposal being successful in 8 trials is:

$$p(1 \text{ success in 8 trials}) = \binom{8}{1} p^1 (1-p)^{8-1} = \frac{8!}{1!\,(8-1)!}(0.1)^1(0.9)^7 = \frac{8 \cdot 7!}{(1)\,7!}(0.1)(0.9)^7 = (8)(0.1)(0.4783) = (0.8)(0.4783) \approx 0.383$$

Once again, the probability of 2 proposals being successful in 8 trials is:

$$p(2 \text{ successes in 8 trials}) = \binom{8}{2} p^2 (1-p)^{8-2} = \frac{8!}{2!\,(8-2)!}(0.1)^2(0.9)^6 = \frac{8 \cdot 7 \cdot 6!}{(2 \cdot 1)\,6!}(0.01)(0.9)^6 = \frac{56}{2}(0.01)(0.531441) = (28)(0.00531441) \approx 0.149$$

The rest of the probabilities are calculated in the same way. The whole table for all 8 trials follows (values shown as 0.000 are below 0.0005):

| Number of Successes (x) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Number of Trials (n) | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| Binomial Probability | 0.430 | 0.383 | 0.149 | 0.033 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 |

The probability histogram, based on the above table, is shown below:

[Figure: probability histogram. Vertical axis: Probability of being Funded (0 to 0.5); horizontal axis: Number of Funded Proposals Out of Eight (0 to 8).]

Question 3

Families wanting to get into public housing often face long waiting periods before receiving a housing unit. Applicants to the Worcester Housing Authority face waits that are distributed "normally" with a mean (μ) of 4 years and a standard deviation (σ) of 9 months.
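Each part of this question reduces to evaluating the normal cumulative distribution function or its inverse. As a cross-check on the "white card" lookups, here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the variable names are illustrative. Because it does not round z to two decimals, its answers can differ slightly from the table-based ones (for example, about 0.0912 rather than 0.0918 for part (a)).

```python
from statistics import NormalDist

# Waiting time X ~ N(mean = 4 years, sd = 9 months = 0.75 years)
wait = NormalDist(mu=4, sigma=0.75)

p_a = 1 - wait.cdf(5)                     # (a) P(X >= 5): right-hand tail
x_b = wait.inv_cdf(0.80)                  # (b) 80th percentile: top 20% wait longer
x_c = wait.inv_cdf(0.25)                  # (c) 25th percentile
p_d = (wait.cdf(6) - wait.cdf(5)) / p_a   # (d) P(5 <= X <= 6 | X >= 5)

print(f"(a) {p_a:.4f}  (b) {x_b:.2f} yrs  (c) {x_c:.2f} yrs  (d) {p_d:.3f}")
```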
[3] (a) What is the probability that a randomly chosen family who is seeking public housing will have to wait at least five years for a public housing assignment?

We are looking for p(x ≥ 5 years). Remember that

$$z = \frac{x - \mu}{\sigma}.$$

[Figure: normal curve with the mean (4) and x = 5 marked.]

Here, x = 5 years, μ = 4 years, and σ = 9 months, or 0.75 years, so

$$z = \frac{5 - 4}{0.75} = \frac{1}{0.75} \approx 1.33$$

Look up 1.33 on the "white card" and you will obtain a probability of 0.9082. Recall that the "white card" only gives the probability for the left-hand tail, whereas here we are looking for the right-hand tail ("wait at least five years"), so we need to subtract that value from 1. Therefore p(x ≥ 5 years) = 1 - 0.9082 = 0.0918 (or 9.18%).

[3] (b) What is the amount of time past which the 20% of the families who wait the longest will have to wait?

[Figure: normal curve with the mean (4) marked and the upper 20% tail lying beyond an unknown Z.]

This time we want to find a value in years, not a probability; the probability is given to us in the question. Recall again that the "white card" only gives the probability for the left-hand tail, so we need to rephrase the question as "the time up to which the 80% of the families who wait the least will have to wait." We then look up 0.8000 inside the table and find the corresponding Z-score. The closest entry is 0.7995, which corresponds to a Z of 0.84. Remember that Z represents the number of standard deviations away from the mean. The mean is 4 and the standard deviation is 0.75, so

$$x = \mu + z\sigma = 4 + (0.84)(0.75) = 4 + 0.63 = 4.63 \text{ years}$$

[3] (c) What is the twenty-fifth percentile of waiting times?

This problem is similar to part (b) in that we are given a probability and are looking for a value in years, so once again we look inside the table to determine the Z to be multiplied by the standard deviation and added to the mean (4 years). This problem is more straightforward, since we are looking for a left tail, which is what the "white card" gives us.
[Figure: normal curve with the mean (4) marked and the lower 25% tail lying below an unknown Z.]

So we need to find the Z corresponding to p = 0.25 by looking inside the table. The closest probability is 0.2514, which is located in the left (negative-Z) half of the table. This corresponds to a Z of -0.67. Again, Z represents the number of standard deviations away from the mean, so

$$x = \mu + z\sigma = 4 + (-0.67)(0.75) = 4 - 0.5025 = 3.4975 \approx 3.5 \text{ years}$$

[3] (d) What is the probability that an applicant who has already waited five years will get an assignment within the next year?

This is a conditional probability. We are looking for p(5 ≤ x ≤ 6 | x ≥ 5 years). Recall that

$$p(A \mid B) = \frac{p(A \text{ and } B)}{p(B)} = \frac{p(A)\, p(B \mid A)}{p(B)}.$$

In this case, A is the event 5 ≤ x ≤ 6, so p(A) = p(5 ≤ x ≤ 6), and B is the event x ≥ 5 years, so p(B) = p(x ≥ 5 years). Here p(B|A), i.e. p(x ≥ 5 years | 5 ≤ x ≤ 6), is equal to 1: anyone who waits between 5 and 6 years certainly waits at least 5 years. So

$$p(A \mid B) = \frac{p(A) \cdot 1.0}{p(B)} = \frac{p(A)}{p(B)} = \frac{p(5 \le x \le 6)}{p(x \ge 5)}.$$

We already know the denominator, which was the answer to part (a) of this question (0.0918), so now we need the numerator. Since p(5 ≤ x ≤ 6) = p(x ≤ 6 years) - p(x ≤ 5 years), we find the Z for x = 5 and the Z for x = 6, look up their cumulative probabilities, and take the difference.

From part (a), $Z_{5\text{ yrs}} = 1.33$. Similarly,

$$Z_{6\text{ yrs}} = \frac{x - \mu}{\sigma} = \frac{6 - 4}{0.75} = \frac{2}{0.75} \approx 2.67$$

From the "white card", the cumulative probability for z = 2.67 is 0.9962, and for z = 1.33 it is 0.9082. So p(5 ≤ x ≤ 6) = p(x ≤ 6 years) - p(x ≤ 5 years) = 0.9962 - 0.9082 = 0.088. Finally, the conditional probability is

$$p(A \mid B) = \frac{p(5 \le x \le 6)}{p(x \ge 5)} = \frac{0.088}{0.0918} \approx 0.958 \approx 0.96$$

So the probability of getting a house within one year, once one has already waited 5 years, is 0.96, or 96%. (Note that, as one would intuitively expect, this is much higher than the probability, as assessed on day one of the application, of getting a house between 5 and 6 years later (8.8%), since there is a 90.8% chance of getting the house before 5 years.)

Question 4

In the early 1970s Dr. Troy Zimmer conducted a comparative study of female participation in higher education.¹ He developed his own measure to use in this study:

$$\text{Participation Ratio} = \frac{\text{Percentage of citizens enrolled in higher education who were female}}{\text{Percentage of citizens age 15-24 who were female}}$$

He calculated this Participation Ratio for a simple random sample of 105 countries, 58 of which he characterized as "western" and 47 of which he characterized as "non-western." The values of this variable for the 105 countries ranged from .08 (in the Congo, Guinea, and Saudi Arabia) to 1.10 (in the Philippines). The summary statistics for the data he collected are provided in Table 1:

Table 1: Data for Female Participation in Higher Education

| | Western Countries | Non-Western Countries |
|---|---|---|
| Number of Countries | 58 | 47 |
| Mean of Participation Ratio (x̄) | .66 | .34 |
| Standard Deviation of Participation Ratio (s) | .19 | .17 |

[4] (a) Explain in no more than two sentences exactly what this variable is measuring. (You might want to think about what a particularly high value and a particularly low value of the Participation Ratio would indicate. Also, note that it is possible for this variable to be greater than 1.00, as is the case for the Philippines.)

¹ Troy A. Zimmer, "Sexism in Higher Education: A Cross-National Analysis," Pacific Sociological Review, Vol. 18, No. 1, Jan. 1975.
The Participation Ratio measures the relative participation of females in higher education compared with their share of the college-age population. It shows whether the percentage of people enrolled in higher education who are women is higher (Participation Ratio > 1) or lower (Participation Ratio < 1) than the percentage of college-age individuals in the population who are women.

Another way to see this is to recognize that the Participation Ratio is the ratio of two conditional probabilities:

$$\frac{P(\text{female} \mid \text{enrolled in higher education})}{P(\text{female} \mid \text{15-24 years old})}$$

[4] (b) Using the information contained in this table and your knowledge of how to quantify the chance error inherent in estimating population parameters from sample statistics, estimate the mean Participation Ratio for all western countries.

In estimating population parameters from sample statistics while accounting for the chance error involved, one needs to calculate an interval estimate for the population parameter. Here we are dealing with sample and population means, so we construct a confidence interval around the sample mean. Since we do not know σ, we should use the t-distribution:

$$\mu \approx \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

However, since in this case n ≥ 30, we can use the z-distribution instead of the t-distribution:

$$\mu \approx \bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

You have to choose a confidence level. 90%, 95%, and 99% are the conventional choices, and nothing in this problem suggests choosing one over another. The calculations for each confidence level are:

● 90% confidence interval: $.66 \pm 1.64\left(\frac{.19}{\sqrt{58}}\right) = .66 \pm .04 = (.62, .70)$
● 95% confidence interval: $.66 \pm 1.96\left(\frac{.19}{\sqrt{58}}\right) = .66 \pm .05 = (.61, .71)$
● 99% confidence interval: $.66 \pm 2.57\left(\frac{.19}{\sqrt{58}}\right) = .66 \pm .06 = (.60, .72)$

NOTE: any one of the 3 answers is sufficient.
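The interval arithmetic can be reproduced with a short sketch. This is an illustration under stated assumptions, not the course's own method: `mean_ci` is a hypothetical helper that applies the large-sample z formula and takes z_{α/2} from the standard-library `statistics.NormalDist().inv_cdf`, so its 90% and 99% multipliers are about 1.645 and 2.576 rather than the rounded 1.64 and 2.57 used above.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar: float, s: float, n: int, level: float) -> tuple:
    """Large-sample confidence interval for a mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * s / sqrt(n)                 # z times the estimated standard error
    return xbar - margin, xbar + margin


# Summary statistics from Table 1: (mean, standard deviation, number of countries)
groups = {"western": (0.66, 0.19, 58), "non-western": (0.34, 0.17, 47)}

for level in (0.90, 0.95, 0.99):
    for name, (xbar, s, n) in groups.items():
        lo, hi = mean_ci(xbar, s, n, level)
        print(f"{level:.0%} {name}: ({lo:.2f}, {hi:.2f})")
```

Rounded to two decimals, the output matches the intervals for both groups, and at every level the western and non-western intervals fail to overlap. The same helper applies to Question 5(b): `mean_ci(12.7, 12.0, 400, 0.90)` gives roughly 11.71 to 13.69 miles.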
Also, if you used the t-distribution, your answer is still correct, and in fact slightly more accurate than the ones above.

[4] (c) Using the information contained in this table and your knowledge of how to quantify the chance error inherent in estimating population parameters from sample statistics, estimate the mean Participation Ratio for all non-western countries.

Once again, calculate the appropriate confidence interval(s):

● 90% confidence interval: $.34 \pm 1.64\left(\frac{.17}{\sqrt{47}}\right) = .34 \pm .04 = (.30, .38)$
● 95% confidence interval: $.34 \pm 1.96\left(\frac{.17}{\sqrt{47}}\right) = .34 \pm .05 = (.29, .39)$
● 99% confidence interval: $.34 \pm 2.57\left(\frac{.17}{\sqrt{47}}\right) = .34 \pm .06 = (.28, .40)$

NOTE: any one of the 3 answers is sufficient. Also, if you used the t-distribution, your answer is still correct, and in fact slightly more accurate than the ones above.

[2] (d) By themselves, what do these two results suggest about differences between female participation in higher education in western countries and in non-western countries? [Note that in the third part of the course we will develop the statistical tools to address this question more rigorously.]

For whatever confidence level you chose, the corresponding confidence intervals do not overlap. This suggests that the mean Participation Ratio for all western countries is higher than the mean Participation Ratio for all non-western countries. [Note: When we turn to hypothesis testing we will see a more explicit way of handling this question.]

Question 5

As part of the Annual Housing Survey the Census Bureau determines how far the head of a household has to commute to work. In 1974, this averaged 13 miles (i.e., the mean distance traveled was 13 miles). The standard deviation of distance traveled happened to be 13 miles too! (These are one-way distances to work.)
[4] (a) From these summary statistics, what do you know about the shape of the distribution of the variable "distance to work"?

A mean and a standard deviation both equal to 13 miles, combined with the fact that nobody can travel less than zero miles, imply that the distribution of "distance traveled to work" must be strongly skewed to the right: some commuters must be traveling distances much longer than 13 miles. (If the distribution were roughly normal, about 16% of commuters would lie more than one standard deviation below the mean, i.e. below 0 miles, which is impossible.)

A real estate office wanted to make a similar survey in a certain town, which has about 20,000 households. A simple random sample of 400 households was chosen, the occupants were interviewed, and it was determined that, on average, the heads of the sample households commuted 12.7 miles to work, with a standard deviation of 12.0 miles. (Note that if someone was not working, the commuting distance was defined as 0. This is the same procedure as that used by the Census Bureau.)

[6] (b) Using this information find the 90% confidence interval for the mean distance that all heads of households in that town commute to work.

When we do not know σ, the general formula for estimating the population mean from a sample mean is:

$$\mu \approx \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

However, once again, since the sample size is greater than 30, we can use the z-distribution instead of the t-distribution:

$$\mu \approx \bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

where s is the sample standard deviation, n is the sample size, and α is the proportion of the normal curve, split between the two tails, that is left out of the confidence interval. For a 90% confidence interval, α is 10%, or 0.1, so α/2 is 0.05. Since the "white card" gives the entire area to the left of Z, including the left α/2 tail, we need to look up a probability of 0.95 (the 90% plus the left 5%, α/2).
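The table lookup just described amounts to evaluating the inverse of the standard normal CDF at 1 - α/2. A minimal Python sketch of that single step, using the standard-library `statistics.NormalDist`; the helper name `z_critical` is illustrative, not from the course:

```python
from statistics import NormalDist


def z_critical(level: float) -> float:
    """z_{alpha/2}: the z value with 1 - alpha/2 of the standard normal to its left."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    # prints 1.645, 1.960 and 2.576 respectively
    print(f"{level:.0%} -> z = {z_critical(level):.3f}")
```

The 90% value lands between the two table entries 1.64 and 1.65, which is why the text interpolates to 1.645.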
Looking inside the table for the value closest to 0.95, we find both 0.9495, corresponding to a Z of 1.64, and 0.9505, corresponding to a Z of 1.65. Our desired Z lies exactly halfway between the two, i.e. 1.645. We know that x̄ = 12.7, s = 12.0 and n = 400. Plugging these values into the formula, we obtain:

$$\mu \approx 12.7 \pm 1.645\left(\frac{12}{\sqrt{400}}\right) = 12.7 \pm 1.645\left(\frac{12}{20}\right) = 12.7 \pm 1.645(0.6) = 12.7 \pm 0.987$$

i.e. an interval from about 11.71 to 13.69 miles.

NOTE: if you used the t-distribution, your answer is still correct, and in fact slightly more accurate than the one above.

[3] (c) Is the following statement true or false? Why? "90% of the heads of households in the town have one-way commuting distances that are between the bounds of the 90% confidence interval calculated in part (b) above."

The statement is false. The confidence interval estimates the mean commuting distance, not the range of individual commuting distances; with a standard deviation of 12 miles, far fewer than 90% of individual commutes fall in a band less than 2 miles wide. A correct statement would read: "There is a 90% probability that a 90% confidence interval constructed this way (as in part (b) above) contains the true mean commuting distance." This is a statement about the process, not a claim about the particular result (as the original statement was).

Another piece of information gathered in the real estate office's survey was that in 321 of the 400 sample households the head of the household commuted by car.

[6] (d) Find the 95% confidence interval for the percentage of all households in the town in which the head of the household commutes by car.

This problem asks us to estimate a population proportion based on a sample proportion. The general equation is:

$$\text{population proportion} = p \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where n = 400 and p̂ = 321/400 = 0.8025. In this case α is 0.05, so α/2 is 0.025. We need to look for a probability of 0.95 + 0.025 = 0.975 inside the Z table on the "white card" and find the corresponding Z value. From the table, we find that Z = 1.96.
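The proportion interval can be sketched the same way. `proportion_ci` below is a hypothetical helper implementing the large-sample (normal-approximation) formula for a proportion; it returns p̂ and the margin of error so that parts (d) and (f) can be compared side by side. It assumes z_{α/2} comes from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, level: float = 0.95) -> tuple:
    """Large-sample interval: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)      # z times the standard error
    return p_hat, margin


# Part (d): 321 of 400; part (f): 1,284 of 1,600 (same sample proportion)
for successes, n in [(321, 400), (1284, 1600)]:
    p_hat, margin = proportion_ci(successes / n, n)
    print(f"n = {n}: {p_hat:.2%} +/- {margin:.2%}")
```

Quadrupling n from 400 to 1,600 halves the margin (from about 3.90% to 1.95%). The same helper gives the Question 7 margin at 90% confidence: `proportion_ci(0.15, 286, 0.90)` returns a margin of about 0.0347.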
Plugging all our values back into the above equation, we obtain:

$$\begin{aligned}
p &\approx 0.8025 \pm 1.96\sqrt{\frac{(0.8025)(1-0.8025)}{400}} = 0.8025 \pm 1.96\sqrt{\frac{(0.8025)(0.1975)}{400}} = 0.8025 \pm 1.96\sqrt{\frac{0.15849375}{400}} \\
&= 0.8025 \pm 1.96\sqrt{0.000396234375} = 0.8025 \pm (1.96)(0.0199) = 0.8025 \pm 0.039
\end{aligned}$$

= 80.25% ± 3.9%

[3] (e) How would your answer to part (d) change if the town had had only 10,000 households instead of 20,000?

Since the sample estimate of a population proportion does not depend on the size of the population, the answer would not change. (The population size, N (capital N), is not part of the estimation equations.)

[3] (f) How would your answer to part (d) change if the survey had sampled 1,600 households and found that in 1,284 of them the head of the household commuted by car?

In general, an estimate of a population parameter based on a sample statistic is expected to improve with larger sample sizes. Therefore, we expect our estimate of the population proportion to become more precise: the random sampling error should get smaller. To check this, we use the same equations as in part (d) above, with n = 1600 and p̂ = 1284/1600 = 0.8025. Note that the sample proportion is the same as before.

$$\begin{aligned}
p &\approx 0.8025 \pm 1.96\sqrt{\frac{(0.8025)(1-0.8025)}{1600}} = 0.8025 \pm 1.96\sqrt{\frac{(0.8025)(0.1975)}{1600}} = 0.8025 \pm 1.96\sqrt{\frac{0.15849375}{1600}} \\
&= 0.8025 \pm 1.96\sqrt{0.00009905859375} = 0.8025 \pm (1.96)(0.00995) = 0.8025 \pm 0.0195
\end{aligned}$$

= 80.25% ± 1.95%

Q.E.D. As you recall, Mark mentioned in class that quadrupling the sample size cuts the error in half, which is what happened in this case. In general, the margin of error shrinks by the square root of the factor by which the sample size increases (it is proportional to 1/√n).

Question 6

A labor union has an examining board whose job is to select apprentices for admission into the apprenticeship program of the union.
Not everyone who qualifies is admitted to the apprenticeship program, but there is suspicion that the admissions that are made are made in a discriminatory manner. Records of the examining board show that it has admitted 70% of all the applicants who satisfy the basic set of requirements. Recently, five women who satisfied the basic requirements came before the board, but 4 out of the 5 were rejected. Only one was admitted to the program.

[6] (a) Calculate the probability that this would have happened using the assumption that the admissions process was non-discriminatory.

This is a binomial probability problem. We are being asked to determine the probability of getting 1 success out of 5 tries, where the probability of success is 70%, or 0.7. Using the binomial probability equation

$$p(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x},$$

we substitute our values and get

$$p(X = 1) = \binom{5}{1}(0.7)^1(1-0.7)^{5-1} = \frac{5!}{1!\,(5-1)!}(0.7)(0.3)^4 = \frac{5 \cdot 4!}{(1)\,4!}(0.7)(0.0081) = (5)(0.00567) = 0.02835$$

[2] (b) In order to answer part (a) you had to determine what non-discrimination means in a probabilistic sense. How was non-discrimination incorporated into the calculations you made for part (a)?

We assumed the probability of admission was independent of gender:
$$p(\text{Admission} \mid \text{Female}) = p(\text{Admission} \mid \text{Male}) = p(\text{Admission}) = 70\% = 0.7$$

Question 7

The 4 November 1985 Boston Globe contained an article entitled "Emission Tampering Found in Many Boston Autos." It reported on a study by the Environmental Protection Agency (EPA), in which the EPA attempted to estimate the proportion of automobiles with emission systems that had been tampered with (i.e., illegally modified by their owners):

"Fifteen percent of the emission control systems in 1975 to 1984 model cars in Boston have been tampered with, a government study has found, causing dangerous fumes to be spewed into the air...The study was conducted last year by pulling motorists over at roadsides or inspection stations. With the consent of the owners, inspectors examined emission control devices such as the catalytic converter system..."

In Boston one vehicle out of ten was stopped during a specified time period. 286 vehicles were stopped and examined, and fifteen percent of these vehicles had illegal emission systems. (You may assume that no one who was stopped refused to have their car examined.)

[2] (a) Assuming that the likelihood of being stopped was independent of whether or not the emission system had been tampered with (which seems to be a reasonable assumption), calculate the probability that a vehicle that had been tampered with would be stopped.

From the sentence "In Boston one vehicle out of ten was stopped…", we determine that the probability of being stopped is 0.10. Assuming that p(stopped) = p(stopped|tampered), as stated in the question, p(stopped|tampered) = 0.10.

[6] (b) Assume that the 286 examined automobiles were a simple random sample of all the 1975-1984 cars in Boston.
Using these results and your knowledge of how to quantify the chance error that comes with sampling, estimate the true proportion of all cars in Boston with emission systems that have been illegally tampered with.

This problem asks us to estimate a population proportion based on a sample proportion. The general equation is:

$$\text{population proportion} = p \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where n = 286 and p̂ = 15% = 0.15. In this case, you could have chosen any of the three most common confidence levels (90%, 95% and 99%), which correspond to α's of 0.1, 0.05 and 0.01, respectively, and hence to α/2's of 0.05, 0.025 and 0.005, respectively. We need to extract the 3 Z-scores from the "white card" for probabilities of 0.90 + 0.05 = 0.95, 0.95 + 0.025 = 0.975 and 0.99 + 0.005 = 0.995 inside the Z table. From the table, we find that $z_{0.05} = 1.645$, $z_{0.025} = 1.96$ and $z_{0.005} = 2.575$. Plugging all our values back into the above equation, we obtain, for each confidence level:

$$p \approx 0.15 \pm z_{\alpha/2}\sqrt{\frac{(0.15)(1-0.15)}{286}} = 0.15 \pm z_{\alpha/2}\sqrt{\frac{(0.15)(0.85)}{286}} = 0.15 \pm z_{\alpha/2}\sqrt{\frac{0.1275}{286}} = 0.15 \pm z_{\alpha/2}(0.0211)$$

● For a 90% confidence interval, $z_{0.05} = 1.645$, so the estimate of the proportion of Boston cars with tampered emission systems is 0.15 ± (1.645)(0.0211) = 0.15 ± 0.0347, i.e. 15% ± 3.47%;
● For a 95% confidence interval, $z_{0.025} = 1.96$, so the estimate is 0.15 ± (1.96)(0.0211) = 0.15 ± 0.0414, i.e. 15% ± 4.14%;
● For a 99% confidence interval, $z_{0.005} = 2.575$, so the estimate is 0.15 ± (2.575)(0.0211) = 0.15 ± 0.0543, i.e. 15% ± 5.43%.

(NOTE: any one of the 3 answers is sufficient.)

Question 8

A short article from the science section of the Boston Globe is reproduced below.
It describes a method of survey question design called "randomized response."

[2] (a) How would randomized response protect a respondent's privacy?

The key to randomized response, which may or may not be clear from the short article, is that the respondent flips the coin and the questioner does not see the result: only the respondent knows whether he or she flipped heads or tails. To make the article clearer, one should also understand that a respondent who flips "tails" gives an honest answer. To summarize: HEADS: always answer Yes; TAILS: give an honest answer (yes or no).

Privacy is protected because a person who, for example, actually did have sex with a prostitute should feel comfortable answering yes to such a sensitive question, since roughly half or more of all respondents (everyone who flipped "heads," plus any honest yes answers) will also answer yes. The questioner will never know who is answering yes because the coin came up "heads" and who is answering yes because, after flipping "tails," he or she is actually admitting to such an act.

[5] (b) How would a survey analyst use this type of question to estimate the proportion of a population that would answer "yes" to a particularly sensitive question? Be as specific as possible.

Given the random nature of coin flipping, 50% of the time the result will be "heads" and 50% of the time "tails." All "heads" produce an answer of "yes" to the sensitive question. Some percentage of the "tails" will also produce a "yes" response, and these represent the "true" yes answers (honest answers).
Because the coin flip is independent of how a person would truthfully answer, the proportion of "true YES" people is expected to be the same among those who flip tails as among those who flip heads; in the heads group it is simply hidden by the "forced YES" answers. Writing π for the true proportion who would answer yes:

$$p(\text{YES}) = p(\text{heads})\,p(\text{YES} \mid \text{heads}) + p(\text{tails})\,p(\text{YES} \mid \text{tails}) = (0.5)(1) + (0.5)\,\pi$$

So the share of all respondents who give an honest YES after flipping tails is the share of YES answers in excess of 50%:

$$p(\text{YES and tails}) = p(\text{YES}) - 0.5$$

and since only half of the respondents flip tails,

$$\pi = p(\text{true YES}) = [\,p(\text{YES}) - 50\%\,] \times 2.$$

[2] (c) There is a major unspoken assumption embedded in the use of randomized response. What is it?

The major unspoken assumption is that the respondents will clearly understand how their privacy is assured and will actually answer honestly whenever "tails" comes up. If they do not, the whole process will not yield useful results.
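The estimator from part (b) can be checked by simulation. The sketch below is hypothetical: it assumes a fair coin, that every respondent follows the heads/tails rule exactly (the assumption named in part (c)), and an invented true "yes" rate of 20% chosen only for illustration. The function names are our own.

```python
import random


def estimate_true_yes(p_yes: float) -> float:
    """Invert p(YES) = 0.5 * 1 + 0.5 * pi to recover pi, the true 'yes' rate."""
    return 2 * (p_yes - 0.5)


def simulate_share_yes(true_rate: float, n: int, seed: int = 1) -> float:
    """Simulate n respondents: heads forces YES, tails gives an honest answer."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        heads = rng.random() < 0.5              # fair coin, seen only by the respondent
        truly_yes = rng.random() < true_rate    # the respondent's honest answer
        if heads or truly_yes:                  # heads -> YES; tails -> honest answer
            yes_count += 1
    return yes_count / n


observed = simulate_share_yes(true_rate=0.20, n=100_000)
print(f"share answering YES: {observed:.3f}, estimated true rate: {estimate_true_yes(observed):.3f}")
```

For example, if 60% of respondents answer yes, the estimator gives (0.60 - 0.50) × 2 = 0.20.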
