LAST NAME (Please Print):
FIRST NAME (Please Print):
HONOR PLEDGE (Please Sign):
Statistics 111
Midterm 4
• This is a closed book exam.
• You may use your calculator and a single page of notes.
• The room is crowded. Please be careful to look only at your own exam. Try to sit
one seat apart; the proctors may ask you to randomize your seating a bit.
• Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.
• All question parts count for 1 point.
1. Suppose a network has 5 nodes, and each pair of nodes is independently linked by an
edge with probability 0.25.
What is the probability that node A is not connected to some other
node?
What is the probability that nodes A and B are connected by a shortest path of length 2?
2. Consider the Rayleigh distribution with parameters θ0 and θ1, which has cumulative
distribution function
$$F(t) = 1 - \exp\left(-\theta_0 t - \tfrac{1}{2}\theta_1 t^2\right).$$
When will the Rayleigh distribution have increasing failure rate?
Write a modified cdf which can describe bathtub-shaped failure rates.
Suppose one observes one random failure time T1 = 4 from a Rayleigh distribution
with parameter θ0 = 0. Find the maximum likelihood estimate of θ1.
3. John is bidding against Yoko to own a first edition of the Theory of Games and
Economic Behavior. It is a sealed-bid auction. He believes that Yoko’s bid (in dollars)
for the book will be uniformly distributed between $100 and $150. His own top-dollar
value for the book is $140. What bid should he make in order to maximize his expected
profit?
4. Dr. Evil has bred a new species of cockroach, whose lifespan (in years) is exponentially
distributed with parameter λ = 0.5. In contrast, ordinary cockroaches have lifespans
that are exponential with λ = 1.2. (Recall that the mean of an exponential is 1/λ.)
Dr. Evil releases his cockroaches into the wild. If they are viable, then the U.N. will
declare a world cockroach emergency. To assess the threat, an entomologist collects
an egg from Dr. Evil’s island, hatches it, and observes its lifespan. She will declare a
cockroach emergency if it lives more than 2.3 years.
In words, what is her alternative hypothesis?
What is her α level?
What is the power of her test?
The hatchling lives 1.7 years. What is her significance probability?
5. A physician wants to predict lifespan from some sensible explanatory variables. To
assess predictive accuracy, she uses cross-validation. But her sample includes many
identical twins. How will this affect her estimate of predictive accuracy?
6. You want to describe the probability that someone dies between the ages of 20 and
30. Should you use a competing risks model or a Cox proportional hazards model?
Why?
7. Suppose that the baseline hazard function for a bridge has the Weibull distribution,
with
$$F(x) = 1 - \exp\left[-(x/\lambda)^k\right], \qquad f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left[-(x/\lambda)^k\right]$$
for k, λ > 0.
What is the hazard function for a bridge?
An engineer fits a Cox proportional hazards model to the lifespan of a bridge (in
decades). Her covariates are average annual traffic load (in millions), percentage
of rebar, and whether or not the bridge has a cantilever span (1 for yes), and her
estimates for the corresponding coefficients are -2, 0.15, and -0.5, respectively.
Suppose the Tacoma Narrows Bridge carries 3 million cars per year
on average, has 20% rebar, and uses a cantilever span. The Minneapolis I-35W
bridge carries 5 million cars per year, has 30% rebar, and does not use a cantilever span.
What is the hazard ratio for Tacoma Narrows compared to I-35W? (Tacoma Narrows
is in the numerator.)
Which bridge is safer?
8. To succeed, the Reese’s Co. requires both chocolate and peanut butter. They have
two suppliers for peanut butter (A, B), and three suppliers for chocolate (C, D, E).
Over the next year, the failure probabilities for these suppliers are as follows:
company               A     B     C     D     E
failure probability   0.5   0.3   0.4   0.8   0.3
If the failures for each supplier are independent, what is the probability that Reese’s
fails in the next year?
Why should you question the assumption that failures are independent?
9. When does multicollinearity occur?
What is the effect of multicollinearity?
10. What is a “large-world” social network?
11. In the Holland-Leinhardt model, assume that the baseline connectivity in the population is 0.2, that Tarzan has expansiveness 0.5, Jane has attractiveness 0.3, and the
tendency to reciprocity is 0.4. What is the probability of an edge from Tarzan to
Jane?
12. List all, and only, true statements.
A. Dunbar’s number represents the maximum number of close friends one can cognitively manage.
B. As points cluster more tightly around a line, the correlation increases.
C. Georg Simmel studied the six-degrees-of-separation theory.
D. In high-dimensional regression, most data sets are not multicollinear.
E. The mean of the residuals is the average of the dependent variables.
F. Roger Boisjoly questioned the accuracy of extrapolation.
G. People tend to overestimate common risks.
H. People tend to maximize their expected utility.
I. The typical utility curve for money is concave (i.e., looks like the graph of ln x).
J. In high dimensions, the number of possible models that can be fit explodes.
K. When there are many independent variables, one needs lots of data in order to
estimate dependent values accurately.
L. Statistics is a good thing to know.