Department of Urban Studies and Planning
Massachusetts Institute of Technology

11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1998

Midterm Exam: Solutions

Date: Wednesday, April 22, 1998.
Format: Open book, calculators allowed.

| | Total Possible | Your Score |
|---|---|---|
| Question 1 | 12 points | |
| Question 2 | 13 points | |
| Question 3 | 12 points | |
| Question 4 | 8 points | |
| Question 5 | 12 points | |
| Total | 57 points | |
| EXTRA CREDIT | 12 points | |
| Total Possible | 69 points | |

Tips:

(1) Please be sure to show all your work. We will give partial credit.

(2) Don’t forget to draw pictures when they are appropriate or helpful. For many of these questions how you set up the problem is just as important as whether or not you ultimately get the right answer.

(3) If you have any questions about the wording of the questions, please ask.

(4) Question 3 requires more reading time than the others, so plan accordingly.

(5) Please note that the last three parts of Question 5 are for extra credit. The exam will be graded on the basis of 57 points. Thus, the extra 12 points can help pull your course average up.

Your Name: _________________________________________________________________

Recitation (check one):
[ ] Anne Thompson
[ ] Sumeeta Srinivasan
[ ] Peter Vaz

Question 1

In order to proceed with a proposed development in the town of Middletown, a developer needs to obtain a zoning variance. Historical data indicate that in Middletown an average of 70% of all such applications are approved by the town. Because there are costs involved in submitting an application for a zoning variance, the developer wants to avoid the expense that would be involved in submitting an application that will not be approved. The developer is considering hiring a consultant who analyzes zoning variance applications and predicts their success.
This consultant has made a specialty of studying the various factors that tend to increase or decrease the probability that an application for a variance will be approved, factors that the developer has not studied. The consultant’s previous experience indicates that when a variance was approved he had actually predicted that it would be approved 9 times out of 10. But when a variance was not approved, he had predicted that it would not be approved only 6 times out of 10. (Note that in this utopian example hiring the consultant does not change the probability of approval; it merely increases the developer’s information about the relative likelihood of the outcomes.)

[6] (a) Draw a probability tree to represent this problem. Clearly identify each of the nodes, branches and outcomes and place the appropriate probabilities on the tree.

P (variance approved by town) = 0.7
    P (consultant predicts “approved” | approved) = 0.9; joint probability 0.9 x 0.7 = 0.63
    P (consultant predicts “not approved” | approved) = 0.1; joint probability 0.1 x 0.7 = 0.07

P (variance not approved by town) = 0.3
    P (consultant predicts “approved” | not approved) = 0.4; joint probability 0.4 x 0.3 = 0.12
    P (consultant predicts “not approved” | not approved) = 0.6; joint probability 0.6 x 0.3 = 0.18

(b) The developer would like to know something about the consultant’s reliability.

[3] • What is the probability that the variance will be approved if the consultant says it will be approved?

$$P(\text{variance is approved} \mid \text{consultant says it will be approved}) = \frac{.63}{.63 + .12} = .84$$

[3] • What is the probability that the variance will not be approved if the consultant says it will not be approved?
$$P(\text{variance is not approved} \mid \text{consultant says it will not be approved}) = \frac{.18}{.18 + .07} = .72$$

(Note: His reliability is higher when he predicts that the variance will be approved than when he predicts that it will not be approved.)

Question 2

The primary job of building inspectors is to detect violations of the building code, but building inspectors sometimes miss violations that are actually there. A particular building inspector detects an average of 90% of all the building code violations that actually exist in the buildings that she inspects. This inspector never “discovers” code violations when they in fact do not exist.

[3] (a) In a particular building the inspector has detected 15 code violations. Calculate a point estimate of the true number of code violations in this building. Explain your work.

$$.9 \times \text{Actual Number of Violations} = 15$$

$$\text{Actual Number of Violations} = \frac{15}{.9} \approx 16.7$$

Note that the point estimate does not have to be a whole number because it is an expected value (on average).

[6] (b) Assume that the inspector is equally likely to detect each potential code violation and that all potential code violations are independent of one another. In a building that actually has 10 code violations, what is the probability that she will detect eight or more of these code violations?

This is a binomial problem with n = 10 trials and P (success) = P (detection) = p = .9

P (eight or more successes out of 10 trials)
= P (eight successes out of 10 trials) + P (nine successes out of 10 trials) + P (10 successes out of 10 trials)
= .194 + .387 + .349 = .930

[4] (c) In part (b) you made two assumptions. Is each of those assumptions reasonable? Why or why not?

(1) Assumption that the inspector is equally likely to detect each different violation.
Some violations must be harder to detect than others because they are better hidden (behind walls, etc.), so unless the inspector works proportionately harder to detect those that are harder to detect, it seems that the probability of detection will differ.

(2) Assumption that potential code violations are independent of one another.

Surely building violations must be linked to one another. If the building has one particular violation it is entirely conceivable that the probability of another, linked violation will increase. Therefore, it seems unlikely that they are all independent of one another.

Question 3

On January 21, 1998 Atlantic Marketing Research presented to the Cambridge Community Development Department its Cambridge Rental Housing Study: Impacts of the Termination of Rent Control on Population, Housing Costs, & Housing Stock. Rent controls were eliminated in Cambridge on January 1, 1995, and this report had been commissioned to test what the implications had been for renters in Cambridge. The central questions, of course, were the degree to which rents had risen and for whom, but a number of other questions were addressed as well.

Atlantic Marketing took two basic samples. The first sample was a straightforward simple random sample taken from a list of all renter-occupied housing units in Cambridge. But this sample would not have included anyone who had lived in a rent controlled unit in Cambridge prior to January 1, 1995 and had moved out of Cambridge or had bought a unit in Cambridge. The second sample was an explicit attempt to identify and sample tenants who had lived in rent controlled housing and had moved either to other Cambridge addresses or to addresses outside of Cambridge.
Using various lists compiled by the City of Cambridge, Atlantic Marketing constructed a complete list of the approximately 600 apartments that had been formerly subject to rent control and from which individuals had moved between 1994 and 1997. Letters and mail survey questionnaires were sent to all of these tenants at their former, rent-controlled Cambridge addresses with the hope that the mail would be forwarded to their current addresses. Anyone who responded to this survey who had lived in a rental unit in Cambridge at the time of the survey was eliminated from this second sample because they already had the appropriate probability of being included in the first sample.

[6] (a) With respect to the second sample, the report states, “Significant difficulty was expected and was experienced in attempting to locate such households. While this latter effort falls outside truly random surveying techniques, it was believed to be the best way to reach relocated tenants, particularly those who moved outside Cambridge.” Identify two ways in which this second sample falls “outside truly random surveying techniques” and can introduce bias into the survey results. What can you say, if anything, about the likely direction of these biases?

There are two main problems with this sampling technique.

• The technique relies on the mail being forwarded correctly. In some cases it may simply be discarded without being forwarded. In other cases it may be forwarded to the wrong address. In yet other cases it may be forwarded to the correct address but one from which the addressee has once again moved.

• The technique relies on those who actually receive the survey to voluntarily send it back. In this case there is no way to follow up efficiently with intended respondents to encourage them to respond because you have no record of where they live.
So, there is an issue of response rate and possible response bias.

In either case it is hard to know the direction of the bias that might be introduced by these problems (though you might be able to come up with a reasonable theory that we have not thought of).

Some students pointed out a third problem:

• When households move they sometimes break up, with former roommates moving to different places. The survey would only be forwarded to one of the roommates and this might introduce bias as well.

Some people said that because the survey was sent to a census of the 600 apartments from which people had moved, a bias was introduced. A census is the ideal situation: no random sampling error, no non-random sampling error, and no identifiable bias as far as I can see. If you have everyone, you have everyone. The problems come with selective forwarding and selective response.

Eventually, the researchers combined the two samples for purposes of analysis. This combined sample included various groups of tenants, each of which would be particularly interesting to study on its own. The accuracy with which one can make estimates about each of these groups varies. Recognizing this, the analysts prepared the table below. (I have changed the descriptions of the various groups to make them more explicit, but otherwise the table remains the same.) In the words of the final report, this table is intended to give a guide as to how “survey results can be interpreted at a 95% confidence interval.”

[3] (b) Pick one of the groups that is identified in this table and show how the “accuracy” was calculated for that group.

| Group | Number in Sample | Accuracy | Accuracy Calculation |
|---|---|---|---|
| Tenants who remained in the same unit they had occupied under rent control. | 293 | ± 5.7% | $1.96\sqrt{.5(1-.5)/293}$ |
| Tenants who had resided in a rent controlled unit but who had moved out of that unit. | 97 | ± 10.0% | $1.96\sqrt{.5(1-.5)/97}$ |
| Tenants who moved into a decontrolled unit after the elimination of rent control but had not lived in a rent controlled unit. | 179 | ± 7.3% | $1.96\sqrt{.5(1-.5)/179}$ |
| Tenants of market rate units (units that had not been subject to rent control when it was eliminated). | 432 | ± 4.7% | $1.96\sqrt{.5(1-.5)/432}$ |
| All tenants who lived in decontrolled units at the time of the survey. | 474 | ± 4.5% | $1.96\sqrt{.5(1-.5)/474}$ |
| All tenants who lived in Cambridge market rate units at the time of the survey. | 470 | ± 4.5% | $1.96\sqrt{.5(1-.5)/470}$ |
| All current Cambridge renters. | 940 | ± 3.2% | $1.96\sqrt{.5(1-.5)/940}$ |
| All tenants in combined sample. | 1000 | ± 3.1% | $1.96\sqrt{.5(1-.5)/1000}$ |

The “accuracy” calculated here is the size of the random sampling error appropriate to a 95% confidence interval:

$$1.96\sqrt{\frac{p(1-p)}{n}}$$

Because one does not know p and because the researchers were calculating a general accuracy level for any proportion estimation problem that one might want to do within each group, they used the most conservative value of p = .5:

$$1.96\sqrt{\frac{.5(1-.5)}{n}}$$

They then calculated the random sampling error for each sample size by inserting the appropriate value of n. Those calculations are the ones that appear under the column labelled “accuracy.”

[3] (c) Accuracy obviously refers to the process of estimation. What type of estimation are the accuracy levels calculated in this table useful for?

These accuracy levels are for estimation problems in which a population proportion is being estimated for the group which each sample represents.
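The accuracy figures in the table for part (b) can be checked numerically. The following is a minimal sketch in Python and is not part of the original solutions; the function name `margin_of_error` and the shortened group labels are illustrative assumptions. It applies the conservative value p = .5 and z = 1.96 to each sample size from the table.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Random sampling error for an estimated proportion (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the table in part (b).
# The labels are shorthand chosen for this sketch, not the report's wording.
sample_sizes = {
    "remained in same unit": 293,
    "moved out of controlled unit": 97,
    "moved into decontrolled unit": 179,
    "tenants of market rate units": 432,
    "all in decontrolled units": 474,
    "all in market rate units": 470,
    "all current Cambridge renters": 940,
    "combined sample": 1000,
}

for label, n in sample_sizes.items():
    print(f"{label:<32} n = {n:>4}   +/- {100 * margin_of_error(n):.1f}%")
```

Because the error shrinks with the square root of n, quadrupling the sample size only halves the margin, which is why the smallest group (n = 97) has an accuracy more than three times worse than the combined sample.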
Question 4

Based on careful and complete collection of the relevant historical data you have established that the time that it takes you to get from your apartment or dorm room to the QR classroom is distributed normally with a mean, $\mu$, equal to 20.0 minutes and a standard deviation, $\sigma$, equal to 3.9 minutes. This morning you wanted to study until the last possible minute before heading off to the midterm exam.

[8] (a) You carefully calculated the latest time at which you could leave for the midterm exam and still be 90% certain of arriving on time (at 9:30 a.m.). What was that time? (You may ignore any adjustments that may have been necessary for the fact that we changed the room and you may have gotten lost.)

This is a straightforward normal distribution problem. Begin by asking what is the value of z that gives 90% of the probability in the left hand tail of the distribution (and 10% in the right hand tail). This is not a two tail problem. Looking up .9000 in the table, one finds that z = 1.28 standard deviations. Therefore, the amount of time that one will need 90% of the time is:

$$\mu + 1.28\,\sigma = 20.0 + (1.28 \times 3.9) = 20.0 + 4.992 \approx 25.0 \text{ minutes}$$

Thus, you had to leave at 9:05 a.m. to be 90% certain of arriving at the exam room by 9:30 a.m.

Question 5

A simple random sample was taken to estimate the mean number of sinks in single family houses in Middletown. A random sample of 36 single family houses was selected. Suppose that, unknown to the person taking the sample, the true value of $\mu$ is 2.8 sinks per house (including kitchen, bathroom, and basement sinks) and the standard deviation of the number of sinks, $\sigma$, is 0.4.

Note that your answers might differ slightly from the answers below depending on how and at what point you rounded off your calculations.
[3] (a) Calculate the expected value of the sample mean.

The expected value of the sample mean is simply the population mean: 2.8

[3] (b) Calculate the standard error of the sample mean.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.4}{\sqrt{36}} = \frac{.4}{6} \approx .07$$

[6] (c) Calculate the probability that the sample mean will be within 0.1 sinks of the expected value of the sample mean.

Sample means in this case would be distributed normally with a mean of 2.8 and a standard error of .07. We need to calculate what value of z establishes an interval of ±0.1 sinks:

$$.1 = z \times .07 \quad\Rightarrow\quad z = \frac{.1}{.07} \approx 1.43$$

Using the table of the normal distribution, the left hand tail of the distribution corresponding to a z value of 1.43 is .9236. This means that there is .0764 in the right hand tail. Thus, for ±0.1 sinks there is 2 x .0764 = .1528 in both tails. Therefore, 1 - .1528 = .8472 of the probability is within ±0.1 sinks.

The last three parts of this question are for EXTRA CREDIT. They involve concepts that we did not cover directly in class, but based on our class discussions and the information given below, you should be able to extend your understanding of the material to answer these questions.

Let the notation Md indicate the sample median and suppose that you have decided to do estimation of central tendency using medians rather than means. Like sample means, sample medians are distributed normally.

[3] (d) Calculate the expected value of the sample median.

It says above that sample medians are distributed normally. That means their distribution is bell-shaped and symmetric. This means that the population mean and the population median are the same. Thus, the expected value of the sample median is the population median, which is equal to the population mean = 2.8.

(e) The standard error of the sample median is not the same as the standard error of the sample mean, however.
The standard error of the sample median is given by the following formula:

$$\sigma_{Md} = \sqrt{\frac{1.57\,\sigma^2}{n}}$$

[6] Calculate the probability that the median number of sinks will be within 0.1 sinks of the expected value of the sample median.

First calculate the standard error:

$$\sigma_{Md} = \sqrt{\frac{1.57\,\sigma^2}{n}} = \sqrt{\frac{1.57 \times .4^2}{36}} \approx .08 \text{ sinks}$$

Following the calculations in part (c) above, but with the new standard error, we need to calculate what value of z establishes an interval of ±0.1 sinks:

$$.1 = z \times .08 \quad\Rightarrow\quad z = \frac{.1}{.08} = 1.25$$

Using the table of the normal distribution, the left hand tail of the distribution corresponding to a z value of 1.25 is .8944. This means that there is .1056 in the right hand tail. Thus, for ±0.1 sinks there is 2 x .1056 = .2112 in both tails. Therefore, 1 - .2112 = .7888 of the probability is within ±0.1 sinks.

[3] (f) On the basis of your answers to parts (c) and (e) above, what conclusion can you draw about the relative advantages of using the sample mean or the sample median to estimate $\mu$?

There is a lower probability that the sample median will be within ±0.1 sinks than that the sample mean will be within the same limits. This means that the sample mean gives us a tighter estimate (as expected). In the sense of giving a narrower confidence interval (for the same confidence level), the sample mean is a better estimate of the true population mean (and population median) than the sample median is.
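The comparison in parts (c), (e) and (f) can be reproduced without rounding the standard errors. This is a minimal sketch in Python, not part of the original solutions; the helper name `prob_within` is an illustrative assumption, and the normal table lookup is replaced by `statistics.NormalDist` from the standard library. Because the standard errors are not rounded to .07 and .08, the results come out at roughly .87 and .77 rather than .8472 and .7888, which is the kind of rounding difference the question warns about. The ordering, and so the conclusion in part (f), is unchanged.

```python
import math
from statistics import NormalDist

def prob_within(half_width, standard_error):
    """P(|estimator - expected value| <= half_width) for a normally distributed estimator."""
    z = half_width / standard_error
    return 2 * NormalDist().cdf(z) - 1

# Values given in Question 5.
sigma, n, half_width = 0.4, 36, 0.1

se_mean = sigma / math.sqrt(n)                # standard error of the sample mean, part (b)
se_median = math.sqrt(1.57 * sigma**2 / n)    # standard error of the sample median, part (e)

p_mean = prob_within(half_width, se_mean)
p_median = prob_within(half_width, se_median)

print(f"mean:   SE = {se_mean:.4f}, P(within 0.1 sinks) = {p_mean:.4f}")
print(f"median: SE = {se_median:.4f}, P(within 0.1 sinks) = {p_median:.4f}")
```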