Department of Urban Studies and Planning
Massachusetts Institute of Technology
11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1998
Midterm Exam—Solutions
Date:    Wednesday, April 22, 1998.
Format:  Open book, calculators allowed.

                Total Possible    Your Score
Question 1      12 points
Question 2      13 points
Question 3      12 points
Question 4      8 points
Question 5      12 points
Total           57 points
EXTRA CREDIT    12 points
Total Possible  69 points

Tips:
(1) Please be sure to show all your work. We will give partial credit.
(2) Don’t forget to draw pictures when they are appropriate or helpful. For many of
these questions how you set up the problem is just as important as whether or not
you ultimately get the right answer.
(3) If you have any questions about the wording of the questions, please ask.
(4) Question 3 requires more reading time than the others, so plan accordingly.
(5) Please note that the last three parts of Question 5 are for extra credit. The exam
will be graded on the basis of 57 points. Thus, the extra 12 points can help pull
your course average up.
Your Name:
_________________________________________________________________
Recitation (check one):
[ ] Anne Thompson
[ ] Sumeeta Srinivasan
[ ] Peter Vaz
Question 1
In order to proceed with a proposed development in the town of Middletown, a developer
needs to obtain a zoning variance. Historical data indicate that in Middletown an average
of 70% of all such applications are approved by the town. Because there are costs
involved in submitting an application for a zoning variance, the developer wants to avoid
the expense that would be involved in submitting an application that will not be
approved.
The developer is considering hiring a consultant who analyzes zoning variance
applications and predicts their success. This consultant has made a specialty of studying
the various factors that tend to increase or decrease the probability that an application for
a variance will be approved, factors that the developer has not studied. The consultant’s
previous experience indicates that when a variance was approved he had actually
predicted that it would be approved 9 times out of 10. But when a variance was not
approved, he had predicted that it would not be approved only 6 times out of 10.
(Note that in this utopian example hiring the consultant does not change the probability of
approval; it merely increases the developer’s information about the relative likelihood of
the outcomes.)
[6]
(a)
Draw a probability tree to represent this problem. Clearly identify each of the
nodes, branches and outcomes and place the appropriate probabilities on the tree.
                                                                    Joint Probabilities
P (variance approved by town) = 0.7
    P (consultant predicts “approved” | approved) = 0.9             0.9 x 0.7 = 0.63
    P (consultant predicts “not approved” | approved) = 0.1         0.1 x 0.7 = 0.07
P (variance not approved by town) = 0.3
    P (consultant predicts “approved” | not approved) = 0.4         0.4 x 0.3 = 0.12
    P (consultant predicts “not approved” | not approved) = 0.6     0.6 x 0.3 = 0.18
(b)
The developer would like to know something about the consultant’s reliability.
[3]
• What is the probability that the variance will be approved if the consultant
says it will be approved?
P (variance is approved | consultant says it will be approved) = $\frac{.63}{.63 + .12} = .84$
[3]
• What is the probability that the variance will not be approved if the
consultant says it will not be approved?
P (variance is not approved | consultant says it will not be approved) = $\frac{.18}{.18 + .07} = .72$
(Note: His reliability is higher when he predicts that the variance will be
approved than when he predicts that it will not be approved.)
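The two conditional probabilities in part (b) can be checked with a few lines of Python. This is only a sketch of the Bayes' rule arithmetic from the probability tree; the variable names are illustrative and not part of the exam.

```python
# Inputs taken from the problem statement.
p_approved = 0.7                    # P(variance approved by town)
p_says_yes_given_approved = 0.9     # P(predicts "approved" | approved)
p_says_no_given_denied = 0.6        # P(predicts "not approved" | not approved)

p_denied = 1 - p_approved

# Joint probabilities: the four end points of the probability tree.
joint_yes_approved = p_says_yes_given_approved * p_approved        # 0.63
joint_no_approved = (1 - p_says_yes_given_approved) * p_approved   # 0.07
joint_yes_denied = (1 - p_says_no_given_denied) * p_denied         # 0.12
joint_no_denied = p_says_no_given_denied * p_denied                # 0.18

# Bayes' rule: divide each joint probability by the total probability
# of the consultant's prediction.
p_approved_given_yes = joint_yes_approved / (joint_yes_approved + joint_yes_denied)
p_denied_given_no = joint_no_denied / (joint_no_denied + joint_no_approved)

print(round(p_approved_given_yes, 2))  # 0.84
print(round(p_denied_given_no, 2))     # 0.72
```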
Question 2
The primary job of building inspectors is to detect violations of the building code, but
building inspectors sometimes miss violations that are actually there. A particular
building inspector detects an average of 90% of all the building code violations that
actually exist in the buildings that she inspects. This inspector never “discovers” code
violations when they in fact do not exist.
[3]
(a)
In a particular building the inspector has detected 15 code violations. Calculate a
point estimate of the true number of code violations in this building. Explain your
work.
.9 x Actual Number of Violations = 15
Actual Number of Violations = $\frac{15}{.9} \approx 16.7$
Note that the point estimate does not have to be a whole number because it is an
expected value (on average).
[6]
(b)
Assume that the inspector is equally likely to detect each potential code violation
and that all potential code violations are independent of one another. In a building
that actually has 10 code violations, what is the probability that she will detect
eight or more of these code violations?
This is a binomial problem with n = 10 trials and P (success) = P (detection) = p = .9
P (eight or more successes out of 10 trials) = P (eight successes out of 10 trials) +
P (nine successes out of 10 trials) +
P (10 successes out of 10 trials)
= .194 + .387 + .349 = .930
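For those who want to check the three binomial terms without a table, here is a minimal sketch using Python's standard library. The helper name `binomial_pmf` is ours, chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.9  # 10 actual violations, each detected with probability .9

# P(8), P(9) and P(10) detections, then their sum.
terms = {k: binomial_pmf(k, n, p) for k in (8, 9, 10)}
p_at_least_8 = sum(terms.values())

for k, prob in terms.items():
    print(f"P({k} detected) = {prob:.3f}")
print(f"P(8 or more detected) = {p_at_least_8:.3f}")
```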
[4]
(c)
In part (b) you made two assumptions. Is each of those assumptions reasonable?
Why or why not?
(1) Assumption that the inspector is equally likely to detect each different
violation.
Some violations must be harder to detect than others because they are better
hidden (behind walls, etc.), so unless the inspector works proportionately
harder to detect those that are harder to detect, it seems that the probability
of detection will differ.
(2) Assumption that potential code violations are independent of one another.
Surely building violations must be linked to one another. If the building has
one particular violation it is entirely conceivable that the probability of
another, linked violation will increase. Therefore, it seems unlikely that they
are all independent of one another.
Question 3
On January 21, 1998 Atlantic Marketing Research presented to the Cambridge
Community Development Department its Cambridge Rental Housing Study: Impacts of
the Termination of Rent Control on Population, Housing Costs, & Housing Stock. Rent
controls were eliminated in Cambridge on January 1, 1995, and this report had been
commissioned to test what the implications had been for renters in Cambridge. The
central questions, of course, were the degree to which rents had risen and for whom, but a
number of other questions were addressed as well.
Atlantic Marketing took two basic samples. The first sample was a straightforward
simple random sample taken from a list of all renter-occupied housing units in
Cambridge. But this sample would not have included anyone who had lived in a rent
controlled unit in Cambridge prior to January 1, 1995 and had moved out of Cambridge
or had bought a unit in Cambridge.
The second sample was an explicit attempt to identify and sample tenants who had lived
in rent controlled housing and had moved either to other Cambridge addresses or to
addresses outside of Cambridge. Using various lists compiled by the City of Cambridge,
Atlantic Marketing constructed a complete list of the approximately 600 apartments that
had been formerly subject to rent control and from which individuals had moved between
1994 and 1997. Letters and mail survey questionnaires were sent to all of these tenants at
their former, rent-controlled Cambridge addresses with the hope that the mail would be
forwarded to their current addresses. Anyone who responded to this survey who had
lived in a rental unit in Cambridge at the time of the survey was eliminated from this
second sample because they already had the appropriate probability of being included in
the first sample.
(a)
[6]
With respect to the second sample, the report states, “Significant difficulty was
expected and was experienced in attempting to locate such households. While
this latter effort falls outside truly random surveying techniques, it was believed to
be the best way to reach relocated tenants, particularly those who moved outside
Cambridge.”
Identify two ways in which this second sample falls “outside truly random
surveying techniques” and can introduce bias into the survey results.
What can you say, if anything, about the likely direction of these biases?
There are two main problems with this sampling technique.
• The technique relies on the mail being forwarded correctly. In some
cases it may simply be discarded without being forwarded. In other
cases it may be forwarded to the wrong address. In yet other cases it
may be forwarded to the correct address but one from which the
addressee has once again moved.
• The technique relies on those who actually receive the survey
voluntarily sending it back. In this case there is no efficient way to
follow up with intended respondents to encourage them to respond
because you have no record of where they live. So, there is an issue of
response rate and possible response bias.
In either case it is hard to know the direction of the bias that might be
introduced by these problems (though you might be able to come up with a
reasonable theory that we have not thought of).
Some students pointed out a third problem:
• When households move they sometimes break up, with former
roommates moving to different places. The survey would only be
forwarded to one of the roommates and this might introduce bias as
well.
Some people said that because the survey was sent to a census of the 600
apartments from which people had moved a bias was introduced. A census is the
ideal situation: no random sampling error, no non-random sampling error, and
no identifiable bias as far as I can see. If you have everyone, you have everyone.
The problems come with selective forwarding and selective response.
Eventually, the researchers combined the two samples for purposes of analysis. This
combined sample included various groups of tenants, each of which would be particularly
interesting to study on its own. The accuracy with which one can make estimates about
each of these groups varies. Recognizing this, the analysts prepared the table below. (I
have changed the descriptions of the various groups to make them more explicit, but
otherwise the table remains the same.)
In the words of the final report, this table is intended to give a guide as to how “survey
results can be interpreted at a 95% confidence interval.”
[3]
(b)
Pick one of the groups that is identified in this table and show how the “accuracy”
was calculated for that group.
Group                                                         Number in   Accuracy   Accuracy Calculation
                                                              Sample
Tenants who remained in the same unit they had                293         ± 5.7%     $1.96 \times \sqrt{.5 \times (1 - .5) / 293}$
occupied under rent control.
Tenants who had resided in a rent controlled unit but         97          ± 10.0%    $1.96 \times \sqrt{.5 \times (1 - .5) / 97}$
who had moved out of that unit.
Tenants who moved into a decontrolled unit after the          179         ± 7.3%     $1.96 \times \sqrt{.5 \times (1 - .5) / 179}$
elimination of rent control but had not lived in a rent
controlled unit.
Tenants of market rate units (units that had not been         432         ± 4.7%     $1.96 \times \sqrt{.5 \times (1 - .5) / 432}$
subject to rent control when it was eliminated).
All tenants who lived in decontrolled units at the time of    474         ± 4.5%     $1.96 \times \sqrt{.5 \times (1 - .5) / 474}$
the survey.
All tenants who lived in Cambridge market rate units at       470         ± 4.5%     $1.96 \times \sqrt{.5 \times (1 - .5) / 470}$
the time of the survey.
All current Cambridge renters.                                940         ± 3.2%     $1.96 \times \sqrt{.5 \times (1 - .5) / 940}$
All tenants in combined sample.                               1000        ± 3.1%     $1.96 \times \sqrt{.5 \times (1 - .5) / 1000}$
The “accuracy” calculated here is the size of the random sampling error appropriate to
a 95% confidence interval:
$1.96 \times \sqrt{\frac{p \times (1 - p)}{n}}$
Because one does not know p and because the researchers were calculating a general
accuracy level for any proportion estimation problem that one might want to do within
each group, they used the most conservative value of p = .5:
$1.96 \times \sqrt{\frac{.5 \times (1 - .5)}{n}}$
They then calculated the random sampling error for each sample size by inserting the
appropriate value of n. Those are the calculations that appear under the
column labelled “accuracy.”
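The whole accuracy column can be reproduced with a short Python sketch, assuming the sample sizes listed under “Number in Sample” and the conservative p = .5. The function name `margin_of_error` is ours, for illustration only.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Random sampling error for a proportion at the 95% confidence level."""
    return z * sqrt(p * (1 - p) / n)

# Sample sizes from the "Number in Sample" column of the table.
sample_sizes = [293, 97, 179, 432, 474, 470, 940, 1000]
margins = {n: margin_of_error(n) for n in sample_sizes}

for n, m in margins.items():
    print(f"n = {n:4d}: ± {100 * m:.1f}%")
```

Trying other values of p in `margin_of_error` shows why p = .5 is the most conservative choice: p × (1 − p) is largest at .5, so the error is never understated.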
[3]
(c)
Accuracy obviously refers to the process of estimation. What type of estimation
are the accuracy levels calculated in this table useful for?
These accuracy levels are for estimation problems in which a population
proportion is being estimated for the group which each sample represents.
Question 4
Based on careful and complete collection of the relevant historical data you have
established that the time that it takes you to get from your apartment or dorm room to the
QR classroom is distributed normally with a mean, $\mu$, equal to 20.0 minutes and a
standard deviation, $\sigma$, equal to 3.9 minutes. This morning you wanted to study until the
last possible minute before heading off to the midterm exam.
[8]
(a)
You carefully calculated the latest time at which you could leave for the midterm
exam and still be 90% certain of arriving on time (at 9:30 a.m.). What was that
time? (You may ignore any adjustments that may have been necessary for the fact
that we changed the room and you may have gotten lost.)
This is a straightforward normal distribution problem.
Begin by asking what is the value of z that gives 90% of the probability in the left
hand tail of the distribution (and 10% in the right hand tail). This is not a two tail
problem.
Looking up .9000 in the table, one finds that z = 1.28 standard deviations.
Therefore, the amount of time that one will need 90% of the time is:
$\mu + (1.28 \times \sigma) = 20.0 + (1.28 \times 3.9) = 20.0 + 4.992 \approx 25.0$ minutes
Thus, you had to leave at 9:05 a.m. to be 90% certain of arriving at the exam
room by 9:30 a.m.
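The table lookup can be cross-checked in Python. This is a sketch assuming Python 3.8 or later, where `statistics.NormalDist` is available; rounding the travel time to a whole minute is our simplification.

```python
from datetime import datetime, timedelta
from statistics import NormalDist

# Travel time in minutes, as given in the question.
travel_time = NormalDist(mu=20.0, sigma=3.9)

# z value with 90% of the probability to its left (one tail).
z = NormalDist().inv_cdf(0.90)

# Time allowance that is sufficient on 90% of trips.
minutes_needed = travel_time.inv_cdf(0.90)

exam_start = datetime(1998, 4, 22, 9, 30)
leave_by = exam_start - timedelta(minutes=round(minutes_needed))

print(round(z, 2))                 # 1.28
print(round(minutes_needed, 1))    # 25.0
print(leave_by.strftime("%H:%M"))  # 09:05
```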
Question 5
A simple random sample was taken to estimate the mean number of sinks in single family
houses in Middletown. A random sample of 36 single family houses was selected.
Suppose that, unknown to the person taking the sample, the true value of $\mu$ is 2.8 sinks
per house (including kitchen, bathroom, and basement sinks) and the standard deviation
of the number of sinks, $\sigma$, is 0.4.
Note that your answers might differ slightly from the answers below depending on how
and at what point you rounded off your calculations.
[3]
(a)
Calculate the expected value of the sample mean.
The expected value of the sample mean is simply the population mean: $\mu = 2.8$
[3]
(b)
Calculate the standard error of the sample mean.
$\frac{\sigma}{\sqrt{n}} = \frac{.4}{\sqrt{36}} = \frac{.4}{6} \approx .07$
[6]
(c)
Calculate the probability that the sample mean will be within 0.1 sinks of the
expected value of the sample mean.
Sample means in this case would be distributed normally with a mean of 2.8 and a
standard error of .07
Have to calculate what value of z establishes an interval of ±0.1 sinks:
$.1 = z \times .07$
$z = \frac{.1}{.07} \approx 1.43$
Using the table of the normal distribution, the left hand tail of the distribution
corresponding to a z value of 1.43 is .9236.
This means that there is .0764 in the right hand tail. Thus, for ±0.1 sinks there is
2 x .0764 = .1528 in both tails. Therefore, 1-.1528 = .8472 of the probability is
within ±0.1 sinks.
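The note at the top of this question about rounding matters here, and a short Python sketch makes the effect visible. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper `prob_within` is a name of our own choosing.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, half_width = 0.4, 36, 0.1
std_normal = NormalDist()

def prob_within(half_width, standard_error):
    """P(sample statistic falls within +/- half_width of its expected value)."""
    z = half_width / standard_error
    return 2 * std_normal.cdf(z) - 1

# Standard error rounded to .07, as in the solution above.
p_rounded = prob_within(half_width, 0.07)

# Standard error left unrounded: .4 / 6 = .0667, so z = 1.5.
p_unrounded = prob_within(half_width, sigma / sqrt(n))

print(f"rounded SE:   {p_rounded:.3f}")
print(f"unrounded SE: {p_unrounded:.3f}")
```

With the rounded standard error the result agrees with the table-based answer to about two decimal places; carrying the unrounded standard error through gives a somewhat higher probability, roughly .87.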
The last three parts of this question are for EXTRA CREDIT. They involve concepts that
we did not cover directly in class, but based on our class discussions and the information
given below, you should be able to extend your understanding of the material to answer
these questions.
Let the notation Md indicate the sample median and suppose that you have decided to do
estimation of central tendency using medians rather than means. Like sample means,
sample medians are distributed normally.
[3]
(d)
Calculate the expected value of the sample median.
It says above that sample medians are distributed normally. That means their
distribution is bell-shaped and symmetric. This means that the population mean
and the population median are the same.
Thus, the expected value of the sample median is the population median, which is
equal to the population mean = 2.8.
[6]
(e)
The standard error of the sample median is not the same as the standard error of
the sample mean, however. The standard error of the sample median is given by
the following formula:
$\sigma_{Md} = \sqrt{\frac{1.57 \times \sigma^2}{n}}$
Calculate the probability that the median number of sinks will be within 0.1 sinks
of the expected value of the sample median.
First calculate the standard error:
$\sigma_{Md} = \sqrt{\frac{1.57 \times \sigma^2}{n}} = \sqrt{\frac{1.57 \times .4^2}{36}} \approx .08$ sinks
Following the calculations in part (c) above, but with the new standard error:
Have to calculate what value of z establishes an interval of ±0.1 sinks:
$.1 = z \times .08$
$z = \frac{.1}{.08} = 1.25$
Using the table of the normal distribution, the left hand tail of the distribution
corresponding to a z value of 1.25 is .8944.
This means that there is .1056 in the right hand tail. Thus, for ±0.1 sinks there is
2 x .1056 = .2112 in both tails. Therefore, 1-.2112 = .7888 of the probability is
within ±0.1 sinks.
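A sketch of the same calculation for the median in Python, using the standard error formula given in part (e). It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, half_width = 0.4, 36, 0.1

# Standard error of the sample median, from the formula in part (e).
se_median = sqrt(1.57 * sigma**2 / n)

# Two-sided probability with the standard error rounded to .08, as above.
z_rounded = half_width / 0.08
p_median_rounded = 2 * NormalDist().cdf(z_rounded) - 1

# The same probability with the unrounded standard error.
z_unrounded = half_width / se_median
p_median_unrounded = 2 * NormalDist().cdf(z_unrounded) - 1

print(f"standard error of the median: {se_median:.4f}")
print(f"rounded SE:   {p_median_rounded:.3f}")
print(f"unrounded SE: {p_median_unrounded:.3f}")
```

As in part (c), the answer shifts by a couple of percentage points depending on when you round, here downward because .08 understates the standard error.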
[3]
(f)
On the basis of your answers to parts (c) and (e) above, what conclusion can you
draw about the relative advantages of using the sample mean or the sample
median to estimate $\mu$?
There is a lower probability that the sample median will be within ±0.1 sinks than
the sample mean will be within the same limits. This means that the sample mean
gives us a tighter estimate (as expected). In the sense of giving a narrower
confidence interval (for the same confidence level), the sample mean is a better
estimate of the true population mean (and population median) than the sample
median is.