C22.0015
Midterm sample problems (with solutions)
▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼▼
Sample problems for exam of Wednesday 2011.MAR.30.
This exam will use half of class time. The exam is open-book, open-notes.
These are sample problems from previous exams. The items here represent more than
just one exam. Solutions follow the problems.
1. The random variable X has the probability density
$$f(x) = \begin{cases} 0 & \text{if } x < 1 \\ \dfrac{2}{x^3} & \text{if } x \ge 1 \end{cases}$$
Note that the possible values for X begin at 1. Find the expected value E X.
2. The discrete random variable M takes values on the set {0, 1, 2, …, 35}. It happens
that E M = 13.6 and SD M = 3.0. Assuming that the distribution of M may be reasonably
approximated by a normal distribution, find the probability $P[\,9 \le M \le 14\,]$.
3. Stephanie is a psychology grad student who was given the task of collecting scores on
the chocolate-preference scale. As part of this task, she gave a survey questionnaire to 18
students in the 10 a.m. class, getting an average score of $\bar{X}$. Later in the day, she gave
the same questionnaire to 11 students in the 3 p.m. class, getting an average score of $\bar{Y}$.
It was assumed that these samples were taken from normal populations with the same
standard deviation $\sigma$. She was also able to produce $s$, an independent chi-squared-based
estimate of the population standard deviation, with 27 degrees of freedom. Specifically,
she had $\dfrac{27 s^2}{\sigma^2} \sim \chi^2_{27}$. Assuming that the 10 a.m. class and the 3 p.m. class had the same
population mean, find the value of $h$ for which $h\,\dfrac{\bar{X} - \bar{Y}}{s}$ will have a t distribution. Be
sure to identify the number of degrees of freedom in this t distribution.
4. A police surveillance unit has been watching the headquarters of a cocaine distribution
ring. They have estimated that, between the hours of 10 p.m. and 3 a.m., customers come
for drugs as a Poisson process, at the rate 2.4/hour. Based on subsequent arrests of the
customers, the police have estimated that the transactions have a mean weight of 600
grams, with a standard deviation of 120 grams.
Based on this information, give estimates for the mean and for the standard deviation of
the total weight sold in this time window over 20 consecutive days (a total of 100
“business hours”).
5. Consider the simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with the usual
assumptions on the $\varepsilon_i$'s (independent, normal, mean 0, standard deviation $\sigma$). The model
can equivalently be written $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, i = 1, 2, …, n. Find Fisher's
information for the slope $\beta_1$.
HINT: Write the model as $Y_i \sim N(\beta_0 + \beta_1 z_i, \sigma^2)$, where $z_i = x_i - \bar{x}$.
6. You have a sample of n observations from the density $f(x \mid \lambda) = \lambda e^{-\lambda x}\, I(x \ge 0)$. You
wish to test the null hypothesis $H_0: \lambda = 8$ versus $H_1: \lambda = 10$. Use the Neyman-Pearson
lemma to find the form of the best test. The answer will be a statement of form “Reject
H0 if (some function of x1, x2, … , xn) is in (some set).” There may of course be
unknown symbols in (some set), but you cannot use data to describe (some set).
SOLUTIONS
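1. This asks for the following integral calculation:
$$E\,X = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{\infty} x f(x)\,dx = \int_{1}^{\infty} x \cdot \frac{2}{x^3}\,dx = 2\int_{1}^{\infty} x^{-2}\,dx$$
$$= 2\left[-\frac{1}{x}\right]_{x=1}^{x=\infty} = 2\,[\,0 - (-1)\,] = 2$$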
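As a rough numerical check of this integral, the sketch below applies a midpoint rule to the integrand x f(x) = 2/x^2 on [1, B]. The helper name `partial_mean` and the cutoffs B are illustrative choices, not part of the original solution. The truncated integral equals 2(1 - 1/B) exactly, which approaches 2 as B grows.

```python
def partial_mean(upper, steps=200_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [1, upper],
    where f(x) = 2 / x**3 for x >= 1, so the integrand is 2 / x**2."""
    width = (upper - 1.0) / steps
    total = 0.0
    for k in range(steps):
        x = 1.0 + (k + 0.5) * width   # midpoint of the k-th subinterval
        total += 2.0 / x**2
    return total * width

# Compare the numerical value with the closed form 2 * (1 - 1/B).
for upper in (10, 100, 200):
    print(upper, round(partial_mean(upper), 4), 2 * (1 - 1 / upper))
```

The slow approach to 2 reflects the heavy tail of this density.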
2. This is done routinely by standardizing, but it helps to use the continuity correction to
bring the interval to (8.5, 14.5). Here are the details:
$$P[\,9 \le M \le 14\,] \approx P[\,8.5 < M < 14.5\,] = P\left[\frac{8.5 - 13.6}{3.0} < \frac{M - 13.6}{3.0} < \frac{14.5 - 13.6}{3.0}\right] \approx P[\,-1.7 < Z < 0.3\,]$$
$$= P[\,0 < Z < 1.7\,] + P[\,0 < Z < 0.3\,] = 0.4554 + 0.1179 = 0.5733$$
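The table lookup can be cross-checked with Python's standard library. This is only a sketch of the same continuity-corrected normal approximation; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal approximation to M with the stated mean and standard deviation.
m_dist = NormalDist(mu=13.6, sigma=3.0)

# Continuity correction: P[9 <= M <= 14] is approximated by P[8.5 < M < 14.5].
prob = m_dist.cdf(14.5) - m_dist.cdf(8.5)
print(round(prob, 4))
```

The result should agree with the table-based answer to about four decimal places.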
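3. Note that $\bar{X} - \bar{Y} \sim N\!\left(0,\ \sigma^2\left(\frac{1}{18} + \frac{1}{11}\right)\right)$ and therefore that
$$\frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{18} + \frac{1}{11}}} \sim N(0, 1).$$
Since $\dfrac{27 s^2}{\sigma^2} \sim \chi^2_{27}$, it follows that
$$\frac{s}{\sigma} = \sqrt{\frac{27 s^2/\sigma^2}{27}} \quad \text{will have the distribution} \quad \sqrt{\frac{\chi^2_{27}}{27}}.$$
Thus we can make a t distribution with 27 degrees of freedom from
$$\frac{\dfrac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{18} + \frac{1}{11}}}}{\dfrac{s}{\sigma}} = \frac{\bar{X} - \bar{Y}}{s\sqrt{\frac{1}{18} + \frac{1}{11}}} = \frac{\bar{X} - \bar{Y}}{s\sqrt{\frac{29}{198}}} = \sqrt{\frac{198}{29}}\ \frac{\bar{X} - \bar{Y}}{s}.$$
Thus we use $h = \sqrt{\dfrac{198}{29}} \approx 2.6130$.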
4. Let N be the number of customers. Since these come at a Poisson rate of 2.4/hour, the
random variable N will have a Poisson distribution with mean 100 × 2.4 = 240. Let T be
the total transactions for these N customers. We have easily that
$$E\,T = E\{\,E[\,T \mid N\,]\,\} = E\{\,600 N\,\} = 600 \times 240 = 144{,}000$$
This is in grams, so it represents 144 kg. This is a big operation.
We can get Var T through
$$\begin{aligned}
\operatorname{Var} T &= E\{\,\operatorname{Var}[\,T \mid N\,]\,\} + \operatorname{Var}\{\,E[\,T \mid N\,]\,\} \\
&= E\{\,120^2 N\,\} + \operatorname{Var}\{\,600 N\,\} \\
&= 120^2 \times 240 + 600^2 \times 240 \\
&= 3{,}456{,}000 + 86{,}400{,}000 = 89{,}856{,}000
\end{aligned}$$
This leads to SD(T) = $\sqrt{89{,}856{,}000} \approx 9{,}479$. This corresponds to 9.479 kg.
Equivalently, you could get this result through
$$\sigma_T^2 = \sigma_X^2\,\mu_N + \mu_X^2\,\sigma_N^2 = 120^2 \times 240 + 600^2 \times 240$$
and this will lead to the same number.
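The two formulas above are easy to evaluate directly. The following sketch simply plugs in the numbers from the problem; the variable names are illustrative, and the only distributional fact used is that a Poisson count has variance equal to its mean.

```python
from math import sqrt

rate_per_hour = 2.4      # Poisson arrival rate from the problem
hours = 100              # 20 days of 5 "business hours" each
mean_weight = 600.0      # grams per transaction
sd_weight = 120.0        # grams per transaction

mean_count = rate_per_hour * hours   # E N for the Poisson count
var_count = mean_count               # Var N equals E N for a Poisson count

# E T = E X * E N
mean_total = mean_weight * mean_count
# Var T = Var X * E N + (E X)^2 * Var N
var_total = sd_weight**2 * mean_count + mean_weight**2 * var_count
sd_total = sqrt(var_total)

print(mean_total, var_total, round(sd_total))
```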
This calculation does not allow for the sampling variability in estimating the Poisson rate
(2.4/hour) or in estimating the mean and standard deviation of the transaction amounts
(600 g and 120 g). And yes, prosecutors do have to resort to procedures like this, since
sentences are determined by the gross quantities of drug.
5. The likelihood is
$$L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}\left(y_i - \beta_0 - \beta_1 z_i\right)^2} = \frac{1}{\sigma^n (2\pi)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 z_i\right)^2}$$
Then write
$$\log L = -n \log \sigma - \frac{n}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 z_i\right)^2$$
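Next note
$$\frac{\partial}{\partial \beta_1}\log L = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(-2 z_i\right)\left(y_i - \beta_0 - \beta_1 z_i\right) = \frac{1}{\sigma^2}\sum_{i=1}^{n} z_i\left(y_i - \beta_0 - \beta_1 z_i\right)$$
Fisher's information is the variance of this. Thus
$$I(\beta_1) = \operatorname{Var}\left[\frac{1}{\sigma^2}\sum_{i=1}^{n} z_i\left(y_i - \beta_0 - \beta_1 z_i\right)\right] = \operatorname{Var}\left[\frac{1}{\sigma^2}\sum_{i=1}^{n} z_i y_i\right]$$
$$= \frac{1}{\sigma^4}\sum_{i=1}^{n} z_i^2\,\sigma^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n} z_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = \frac{S_{xx}}{\sigma^2}$$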
6. Begin by writing the likelihood for the whole problem. This is
$$f(x \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$
The best test, according to the Neyman-Pearson lemma, has the form
$$\text{Reject } H_0 \text{ if } \frac{f(x \mid \lambda = 8)}{f(x \mid \lambda = 10)} < c$$
By direct plug-in, this rule is
$$\text{Reject } H_0 \text{ if } \frac{8^n e^{-8\sum_{i=1}^{n} x_i}}{10^n e^{-10\sum_{i=1}^{n} x_i}} = \left(\frac{8}{10}\right)^{n} e^{2\sum_{i=1}^{n} x_i} < c$$
We can absorb the multiplier $\left(\frac{8}{10}\right)^{n}$ into the constant c, and then take logarithms (again
changing c) to get this:
$$\text{Reject } H_0 \text{ if } 2\sum_{i=1}^{n} x_i < c$$
We can divide by 2 and state the rule as
$$\text{Reject } H_0 \text{ if } \sum_{i=1}^{n} x_i < c$$
This could also be stated as rejecting for $\bar{x} < c$.
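To see how such a rule might be calibrated, the simulation sketch below estimates a cutoff c and the resulting power. The sample size n = 20, the level 0.05, the number of replications, and the seed are all assumptions made for illustration; the problem itself leaves c unspecified.

```python
import random

def simulate_sums(rate, n, reps, rng):
    """Simulate `reps` values of the sum of n exponential observations with the given rate."""
    return [sum(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2011)            # fixed seed, chosen arbitrarily
n, reps, alpha = 20, 20_000, 0.05    # assumed sample size, replications, and level

# Under H0 (rate 8), take the empirical alpha-quantile of the sum as the cutoff c.
null_sums = sorted(simulate_sums(8.0, n, reps, rng))
cutoff = null_sums[int(alpha * reps)]

# Under H1 (rate 10), estimate the power of "reject H0 if the sum is below c".
alt_sums = simulate_sums(10.0, n, reps, rng)
power = sum(s < cutoff for s in alt_sums) / reps

print(round(cutoff, 3), round(power, 3))
```

Small sums favor the larger rate, which is why the rejection region is the lower tail. An exact cutoff could instead be taken from the gamma distribution of the sum under H0.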