MS exam problems – Fall 2012
(From: Ryan Martin)
1. (Stat 401) Consider the following game with a box that contains ten balls—two
red, three blue, and five green. A player selects two balls from the box at random,
without replacement. The player wins $5 for each red ball selected, $1 for each blue,
and $0 for each green. Let X denote the player’s total winnings.
(a) Find P(X ≥ 4).
(b) Find P(X ≥ 4 | X > 0).
(c) Find E(X).
2. (Stat 411) You have a coin which you would like to test for fairness. Let θ ∈ (0, 1)
denote the probability of the coin landing on heads, and consider testing H0 : θ = 0.5
versus H1 : θ < 0.5. You toss the coin until you see the first head. Let X denote
the number of tosses.
(a) Show that the uniformly most powerful test rejects H0 if X ≥ c, where c > 1
is some constant to be determined.
(b) Without loss of generality, the constant c can be an integer. Find c such that
the significance level is no more than a specified α ∈ (0, 1).
Solutions
1. Start by writing the PMF table, where c = C(10,2)⁻¹ = 1/45:

x        0          1                2          5                6                10
pX(x)    c·C(5,2)   c·C(3,1)C(5,1)   c·C(3,2)   c·C(2,1)C(5,1)   c·C(2,1)C(3,1)   c·C(2,2)

After evaluating the binomial coefficients, the table looks like:

x        0       1       2      5       6      10
pX(x)    10/45   15/45   3/45   10/45   6/45   1/45
Now the rest is easy.
(a) P(X ≥ 4) = pX (5) + pX (6) + pX (10) = 17/45.
(b) Using the definition of conditional probability:
P(X ≥ 4 | X > 0) = P(X ≥ 4)/P(X > 0) = (17/45)/(1 − 10/45) = 17/35.
(c) E(X) = (1/45)(15 + 6 + 50 + 36 + 10) = 117/45 = 2.6.
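As a sanity check, the whole PMF and all three answers can be reproduced by enumerating the C(10,2) = 45 equally likely draws (a minimal sketch; the colors and payoffs are taken from the problem statement):

```python
from fractions import Fraction
from itertools import combinations

# 2 red ($5 each), 3 blue ($1 each), 5 green ($0 each)
payoff = {"R": 5, "B": 1, "G": 0}
balls = ["R"] * 2 + ["B"] * 3 + ["G"] * 5

# Tally the winnings X over all 45 equally likely unordered pairs.
pairs = list(combinations(range(10), 2))
pmf = {}
for i, j in pairs:
    x = payoff[balls[i]] + payoff[balls[j]]
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(pairs))

p_ge4 = sum(p for x, p in pmf.items() if x >= 4)  # (a) P(X >= 4)
p_cond = p_ge4 / (1 - pmf[0])                     # (b) P(X >= 4 | X > 0)
ex = sum(x * p for x, p in pmf.items())           # (c) E(X)
print(p_ge4, p_cond, ex)  # 17/45 17/35 13/5
```

Note that 13/5 is the same 2.6 obtained above, kept as an exact fraction here.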
2. (a) Fix θ1 < 0.5 and consider the simple hypothesis testing problem H0 : θ = 0.5
versus H1 : θ = θ1 . The Neyman–Pearson lemma states that the most powerful
test is one that rejects when the likelihood ratio L(0.5)/L(θ1 ) is too small. But
the likelihood ratio can be written as
L(0.5)/L(θ1) = (1/2)^X / [θ1 (1 − θ1)^(X−1)] = [(1 − θ1)/θ1] · [1/(2(1 − θ1))]^X.

Since θ1 < 0.5, it follows that 1/(2(1 − θ1)) < 1; therefore, the likelihood ratio is
monotone decreasing in X. This monotonicity implies that the likelihood ratio
is too small if and only if X is too large. Consequently, the most powerful test
for the simple hypotheses rejects if X ≥ c for some constant c. Since the
critical region is independent of θ1 , this test must be most powerful for all
θ1 < 0.5 and, hence, uniformly most powerful for H1 : θ < 0.5.
(b) To find the critical value c, we must solve the following inequality:
P0.5 (X ≥ c) ≤ α.
The left-hand side can be simplified using the fact that X ∼ Geo(0.5) under
H0 . Using some geometric series tricks, one gets (for integer c):
P0.5 (X ≥ c) = 0.5c−1 .
Then the smallest integer c such that 0.5^(c−1) ≤ α is the solution we want:
0.5^(c−1) ≤ α ⇐⇒ c ≥ 1 + log α/ log 0.5,
or, equivalently, c = 1 + ⌈log α/ log 0.5⌉. For example, if α = 0.05, then log α/ log 0.5 ≈ 4.32, so c = 6 (note that c = 5 would give level 0.5⁴ = 0.0625 > 0.05).
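The critical value is easy to compute directly; a small sketch (it searches for the smallest integer c with P0.5(X ≥ c) = 0.5^(c−1) ≤ α and checks it against the closed form):

```python
import math

def critical_value(alpha: float) -> int:
    """Smallest integer c > 1 with 0.5**(c - 1) <= alpha."""
    c = 2
    while 0.5 ** (c - 1) > alpha:
        c += 1
    return c

# Closed form from the solution: c = 1 + ceil(log(alpha) / log(0.5)).
closed = 1 + math.ceil(math.log(0.05) / math.log(0.5))
print(critical_value(0.05), closed)  # 6 6
```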
Statistics 401&411 – MS Exam
Fall Semester 2012
1. (STAT401) Suppose that X1, ..., Xn are i.i.d. continuous random variables with c.d.f. F(x).
Let X(1) ≤ ... ≤ X(n) be the order statistics.
(i) What is the joint distribution of F(X1), ..., F(Xn)?
(ii) Find P(F(X(1)) ≤ 0.5).
(iii) Find the joint distribution of F(X(1)) and F(X(n)).
Solution:
(i) First, F(X1), ..., F(Xn) are i.i.d. because they are functions of the i.i.d. X1, ..., Xn. Next, since F is the c.d.f. of a continuous r.v., its inverse F⁻¹ exists (at least where F is strictly increasing), and
P(F(X) ≤ y) = P(X ≤ F⁻¹(y)) = F(F⁻¹(y)) = y.
Therefore, F(X1), ..., F(Xn) are i.i.d. Unif(0, 1).
(ii) P(F(X(1)) ≤ 0.5) = P(at least one F(Xi) ≤ 0.5)
= 1 − P(all F(Xi) > 0.5) = 1 − ∏ P(F(Xi) > 0.5)
= 1 − 0.5^n.
(iii) First, the joint c.d.f. of F(X(1)) and F(X(n)) is
F1,n(y, z) = P(F(X(1)) ≤ y, F(X(n)) ≤ z) = P(F(X(n)) ≤ z) − P(F(X(1)) ≥ y, F(X(n)) ≤ z)
= ∏ P(F(Xi) ≤ z) − ∏ P(y ≤ F(Xi) ≤ z) = z^n − (z − y)^n,
for any 0 ≤ y ≤ z ≤ 1. Next, the joint p.d.f. of F(X(1)) and F(X(n)) is
∂²F1,n(y, z)/∂y∂z = n(n − 1)(z − y)^(n−2),
for any 0 ≤ y ≤ z ≤ 1.
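Part (ii) can be checked with a seeded Monte Carlo under the Unif(0, 1) reduction from part (i) (a sketch; n, the trial count, and the seed are arbitrary choices):

```python
import random

random.seed(42)
n, trials = 5, 200_000

# F(X_(1)) is the minimum of n i.i.d. Unif(0,1) draws; estimate P(min <= 0.5).
hits = sum(
    min(random.random() for _ in range(n)) <= 0.5
    for _ in range(trials)
)
estimate = hits / trials
exact = 1 - 0.5 ** n  # 0.96875 for n = 5
print(round(estimate, 3), exact)
```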
2. (STAT411) Suppose that X1, ..., Xn are i.i.d. discrete random variables with p.m.f.
f(x | λ) = (1 − e^(−λ)) e^(−λx), x = 0, 1, ...,
where λ > 0 is an unknown parameter.
(i) Find the MLE of λ, and the expected Fisher information in X1, ..., Xn concerning λ.
(ii) Construct an asymptotic 100(1 − α)% confidence interval for λ.
Solution:
(i) The log-likelihood function is l(λ) = n log(1 − e^(−λ)) − λ Σ xi. The score function is then
dl(λ)/dλ = n e^(−λ)/(1 − e^(−λ)) − Σ xi.
Setting it equal to zero and solving for λ yields the MLE
λ̂ = log(1 + 1/x̄),
where x̄ is the sample mean. Note that the MLE does NOT exist when x̄ = 0. Next,
In(λ) = E(−d²l(λ)/dλ²) = n e^(−λ)/(1 − e^(−λ))²
is the expected Fisher information of λ.
(ii) The MLE is asymptotically N(λ, In(λ)⁻¹); therefore an asymptotic 100(1 − α)% confidence interval for λ is
(λ̂ − z_{α/2} In(λ̂)^(−1/2), λ̂ + z_{α/2} In(λ̂)^(−1/2)).
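A numeric sketch of (i) and (ii) on a hypothetical sample (the data below are invented for illustration; 1.96 is the z quantile for α = 0.05):

```python
import math

xs = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]  # hypothetical counts, each x_i in {0, 1, 2, ...}
n = len(xs)
xbar = sum(xs) / n  # must be > 0 for the MLE to exist

# MLE: lambda_hat = log(1 + 1/xbar).
lam = math.log(1 + 1 / xbar)

# The score n e^{-lam}/(1 - e^{-lam}) - sum(x_i) should vanish at the MLE.
score = n * math.exp(-lam) / (1 - math.exp(-lam)) - sum(xs)

# Expected Fisher information and the asymptotic 95% interval.
info = n * math.exp(-lam) / (1 - math.exp(-lam)) ** 2
half = 1.96 / math.sqrt(info)
print(round(lam, 4), round(score, 10), (round(lam - half, 3), round(lam + half, 3)))
```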
3. (STAT 411) Suppose that X1, ..., Xn are i.i.d. from exp(1/θ) with c.d.f.
F(x | θ) = P(X ≤ x) = 1 − e^(−x/θ), x > 0,
where θ > 0 is an unknown parameter.
(i) Find a complete and sufficient statistic for θ.
(ii) Find a MVUE of g(θ) = 1 − F(t | θ), where t is a fixed constant.
Solution:
(i) The p.d.f. is f(x | θ) = θ^(−1) e^(−x/θ) I(0,∞)(x), and the likelihood is
L(θ) = θ^(−n) e^(−Σ xi/θ) ∏ I(0,∞)(xi) = exp{−n log θ − Σ xi/θ} ∏ I(0,∞)(xi).
Clearly, it is a member of the exponential family, and hence a complete and sufficient statistic for θ is Σ xi.
(ii) Note that I(t,∞)(X1) is an unbiased estimate of g(θ); therefore a MVUE of g(θ) is
T(x) = E(I(t,∞)(X1) | Σ Xi = Σ xi) = P(X1 ≥ t | Σ Xi = Σ xi)
= P(X1/Σ Xi ≥ t/Σ xi | Σ Xi = Σ xi) = P(X1/Σ Xi ≥ t/Σ xi).
The last equality follows from Basu's theorem and the fact that X1/Σ Xi ∼ Beta(1, n − 1) is an ancillary statistic, which is due to the fact that X1 ∼ Gamma(1, θ), Σ_{i=2}^n Xi ∼ Gamma(n − 1, θ), and they are independent. Applying the c.d.f. formula of the Beta distribution,
T(x) = P(X1/Σ Xi ≥ t/Σ xi) = (1 − t/Σ xi)^(n−1).
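The unbiasedness of this MVUE can be checked by a seeded simulation against g(θ) = e^(−t/θ) (a sketch; θ, t, n, the replication count, and the seed are arbitrary choices):

```python
import math
import random

random.seed(1)
theta, t, n, reps = 2.0, 1.0, 5, 100_000

total = 0.0
for _ in range(reps):
    xs = [random.expovariate(1 / theta) for _ in range(n)]  # Exp with mean theta
    s = sum(xs)
    # The estimator is (1 - t/s)^(n-1) when s > t, and 0 otherwise.
    total += (1 - t / s) ** (n - 1) if s > t else 0.0

estimate = total / reps
exact = math.exp(-t / theta)  # e^{-1/2} ~ 0.6065
print(round(estimate, 3), round(exact, 3))
```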
Stat401, Problem, Fall 2012:
Let Y1 < Y2 < ... < Yn be the order statistics of a random sample from a distribution with pdf
Let Zn = Yn − log n.
(a) Find the cdf of Zn.
(b) Determine the limiting distribution of Zn as n goes to ∞.
Stat401, Solution, Fall 2012:
Stat 481, Problem, Fall 2012:
An engineer in a textile mill studies the effect of temperature and time on the brightness of a synthetic
fabric in a process involving dye. Several small randomly selected fabric specimens were dyed under
each temperature and time combination. The brightness of the dyed fabric was measured on a 50-point
scale, and the results of the investigation are as follows:
Time (cycles)    Temperature (degrees Fahrenheit)
                 350           375           400
40               38, 32, 30    37, 35, 40    36, 39, 43
50               40, 45, 36    39, 42, 46    39, 48, 47
Denote the observations by Yijk, where i = 1, 2 (factor A, time), j = 1, 2, 3 (factor B, temperature), k = 1, 2, 3 (replications). Some relevant summary statistics are Σ Yijk² = 28584; Y··· = 712; Y1·· = 330, Y2·· = 382; Y·1· = 221, Y·2· = 239, Y·3· = 252. We also find that SSError = 186.0.
(a) Obtain the appropriate ANOVA table and test whether the row (factor A), column (factor B), and interaction (AB) effects are significant. You may need these critical F-values: F(0.05; 1, 12) = 4.75, F(0.05; 2, 12) = 3.89.
(b) Summarize your findings and tell the engineer how the time factor and the temperature factor affect the brightness of the dyed fabric.
Stat 481, Solution, Fall 2012, Jie Yang:
(a) SSTO = 28584 − 712²/18 = 420.4;
SSA = (330² + 382²)/9 − 712²/18 = 150.2;
SSB = (221² + 239² + 252²)/6 − 712²/18 = 80.8;
SSAB = 420.4 − 150.2 − 80.8 − 186.0 = 3.4.
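The hand computation above can be reproduced from the raw data (a minimal sketch of the standard two-way ANOVA sums of squares):

```python
# data[i][j] = replicates for time level i (40, 50) and temperature level j (350, 375, 400)
data = [
    [[38, 32, 30], [37, 35, 40], [36, 39, 43]],  # time = 40 cycles
    [[40, 45, 36], [39, 42, 46], [39, 48, 47]],  # time = 50 cycles
]
a, b, r = 2, 3, 3
N = a * b * r

grand = sum(y for row in data for cell in row for y in cell)      # 712
sumsq = sum(y * y for row in data for cell in row for y in cell)  # 28584
cf = grand ** 2 / N  # correction factor 712^2/18

ssto = sumsq - cf
ssa = sum(sum(y for cell in row for y in cell) ** 2 for row in data) / (b * r) - cf
ssb = sum(sum(sum(data[i][j]) for i in range(a)) ** 2 for j in range(b)) / (a * r) - cf
sse = sumsq - sum(sum(cell) ** 2 for row in data for cell in row) / r
ssab = ssto - ssa - ssb - sse

mse = sse / (a * b * (r - 1))  # 186/12 = 15.5
f_a, f_b, f_ab = ssa / mse, (ssb / 2) / mse, (ssab / 2) / mse
print(round(ssa, 1), round(ssb, 1), round(ssab, 1), round(f_a, 2), round(f_b, 2))
# 150.2 80.8 3.4 9.69 2.61
```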
The ANOVA table is then obtained as follows:

Source            SS      df    MS      F
A (time)          150.2    1    150.2   9.69
B (temperature)    80.8    2     40.4   2.61
AB                  3.4    2      1.7   0.11
Error             186.0   12     15.5
Total             420.4   17
The interaction is insignificant since FAB = 0.11 < F(0.05; 2,12) = 3.89. Time is an important factor
since FA = 9.69 > F(0.05; 1, 12)=4.75. Temperature is not quite significant since FB = 2.61 < F(0.05;
2, 12) = 3.89.
(b) The data analysis shows the absence of interaction and a strong main effect of time, with longer times increasing the brightness. The increase in temperature increases brightness too. However, because of the large variability of individual measurements, the temperature effect is only borderline significant. Additional observations would help strengthen the evidence for a temperature effect.
STAT 416 - August 2012
Two populations X and Y have the same distribution form but different measures of central tendency (means). A random sample of size 5 is drawn from each of the two populations, and the data are recorded in the following table:
X : 12.6, 11.2, 13.2, 9.4, 12;
Y : 16.1, 13.4, 15.4, 11.3, 14.
(1). State the hypotheses to test if population Y has a larger mean than population X.
(2). Choose an appropriate test statistic and make a decision on the hypotheses given α = 0.05.
Solution:
(1). Hypotheses: H0 : θ = µY − µX = 0 vs H1 : θ = µY − µX > 0
(2). Pooled sequence: 9.4, 11.2, 11.3, 12, 12.6, 13.2, 13.4, 14, 15.4, 16.1.
Wilcoxon rank sum test statistic for the rank sum of the X sample: WN = 1 + 2 + 4 + 5 + 6 = 18. From the table for n1 = n2 = 5, the p-value of the test is P(WN ≤ 18) = 0.028 < 0.05.
So the null hypothesis is rejected in favor of H1 : θ > 0 at significance level α = 0.05.
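The rank sum is easy to recompute (a sketch; ranks are 1-based positions in the pooled sorted sample, and there are no ties here):

```python
x = [12.6, 11.2, 13.2, 9.4, 12]
y = [16.1, 13.4, 15.4, 11.3, 14]

pooled = sorted(x + y)
# Rank of each X observation in the pooled sample (no ties, so index + 1 works).
ranks = sorted(pooled.index(v) + 1 for v in x)
w = sum(ranks)
print(ranks, w)  # [1, 2, 4, 5, 6] 18
```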
STAT 481 - August 2012
A linear regression model with two covariates was fit to n=20
observations {(x1i , x2i , Yi ) , i = 1, ..., 20} .
(1). Write down the linear regression model with coefficients
β0 , β1 , β2 and necessary model assumptions.
(2). It is given that the regression sum of squares is SSR = 66 and the total sum of squares is SST = 200. Calculate and interpret the coefficient of determination.
(3). Construct ANOVA table and test H0 : β1 = β2 = 0 at
significance level 0.05. [F0.05 (1, 17) = 4.45; F0.05(2,17) = 3.59.]
(4). Provided that the studentized t-statistics are t β̂0 = 3.3, t β̂1 = −4.6, t β̂2 = 1.2, could the bivariate model be simplified at level α = 0.05? [t0.025(17) = 2.11; t0.05(17) = 1.74.]
Solution:
(1). Bivariate linear regression model Yi = β0 + β1 x1i + β2 x2i + εi, where the errors εi are i.i.d. N(0, σ²).
(2). Coefficient of determination: R² = 66/200 = 0.33. 33% of the total variability in the response is explained by the fitted model.
(3). ANOVA table:

Source    SS    DF    MS      F
Reg        66    2    33      4.19
Error     134   17     7.88
Total     200   19
F statistic: F = 4.19 > F0.05 (2, 17) = 3.59. We reject H0 : β1 =
β2 = 0.
(4). Critical region C = {|t0| > t0.025(17)} = {|t0| > 2.11}. Since |t β̂2| = 1.2 < 2.11, the coefficient β2 is not significantly different from 0. Hence we can simplify the model to a simple linear regression model: Yi = β0 + β1 x1i + εi, i = 1, ..., n.
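Parts (2) and (3) follow mechanically from SSR and SST (a short sketch):

```python
ssr, sst, n, p = 66, 200, 20, 2  # p = number of covariates

sse = sst - ssr          # error sum of squares: 134
r2 = ssr / sst           # coefficient of determination
msr = ssr / p            # 33
mse = sse / (n - p - 1)  # 134/17
f = msr / mse
print(round(r2, 2), round(mse, 2), round(f, 2))  # 0.33 7.88 4.19
```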
STAT 431 : Sampling Techniques for Fall 2012
Problem:
The purpose is to estimate the proportion of UIC students of a particular minority community
studying on ‘bank loan’ of an amount exceeding US$ 10,000.00 per year. It is known that there
are altogether N = 753 students belonging to this specific community. It is desired to take a
simple random sample without replacement [SRS (N, n)] of a certain number, say ‘n’, of these
students and ask each of them whether or not he or she is taking a bank loan of the stipulated amount.
Formulate the problem as one of determination of sample size (n) and provide reasonable
solutions, with or without any assumption on the nature of true proportion. Explain your
solutions with illustrative examples.
------------------------------------------------Solution:
As we see, there is a specific community-based population of students of size N and we want to
study a specific feature of the population as a whole, viz., proportion of students among them
studying on ‘bank loan’ exceeding US$ 10,000.00 per year.
Let ‘P’ denote the true [unknown] proportion of such students in this community. We want to
determine adequate sample size ‘n’ under SRS(N, n) so that the sample proportion ‘p’ will
ensure a tolerable and acceptable deviation from the true unknown P. We denote by 'd' the acceptable deviation. Interpretation of acceptable deviation: if p = 0.73 and d = 0.05, then we wish P to belong to the interval [0.68, 0.78]. Again, if p = 0.38 and d = 0.01, then we wish P to be included in the interval [0.37, 0.39]. Theoretically, it is known that for any choice of 'd' there is a suitable value of 'n'; in the extreme, n = N leads to what is known as a 'census', which corresponds to d = 0. Clearly, narrowing down the acceptable deviation calls for an increasing sample size.
Technically, we need −d < p − P < d, i.e., p − d < P < p + d, where 'p' is based on SRS(N, n).
We must note that we cannot give this assurance in 100% of cases; we will definitely miss out at times, so we would like to keep a high chance of meeting the acceptable deviation level.
With this motivation, we formulate the problem as one of satisfying the requirement
Pr[p − d < P < p + d] = 1 − α. ............ (1)
where ‘α’ is a small fraction close to 0. We now use the fact that under
SRS(N, n),
(i) E(p) = P;
(ii) V(p) = (1/n − 1/N) P(1 − P).
So, defining Z = (p − P)/√V(p), which is approximately N(0, 1), we have
z_{α/2} = d/√V(p),
which yields, using z for z_{α/2},
d²/z² = V(p) = P(1 − P)(1/n − 1/N). ..... (2)
From (2), assuming a large population size, we have
n (approx.) = P(1 − P) z²/d². ............ (3)
In the absence of any knowledge about the nature of the true proportion P, we may replace P by 1/2, i.e., P(1 − P) by 1/4. So,
n = z²/(4d²). ............ (4)
And for α = 0.05 we know z = 1.96 ≈ 2.00, so that an 'initial' estimate of n is
n0 = 1/d². ............ (5)
Now we can apply the 'finite population correction' and conclude that the 'final' sample size is given by
nf = n0 × N/(N + n0). ............ (6)
Illustrative Examples:
(i) d = 0.05, α = 0.05, so that n0 = 1/d² [approx.] = 400 and hence nf = 400 × 753/(753 + 400) = 262.
(ii) d = 0.01, α = 0.01, so that z = 2.58 and hence n0 = z²/(4d²) = (2.58)²/(4 × (0.01)²) = 16641. So finally nf = 16641 × 753/(753 + 16641) = 721.
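Both examples can be reproduced with a short sketch of formulas (4)-(6), rounding up to the next whole student:

```python
import math

def sample_size(d: float, z: float, N: int) -> int:
    """Conservative n0 = z^2/(4 d^2) with the finite population correction (6)."""
    n0 = z ** 2 / (4 * d ** 2)
    return math.ceil(n0 * N / (N + n0))

print(sample_size(0.05, 2.0, 753))   # 262  (example (i), z ~ 2 for alpha = 0.05)
print(sample_size(0.01, 2.58, 753))  # 721  (example (ii))
```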
Remark: Note that the basic formula for sample size computation is given by (3). At times, we may have some 'guess' values about P, such as P < 0.35 or P > 0.62 and the like. In such cases, we can use this kind of vague information to 'improve' on our initial estimate of n. The idea is not to replace P by 0.5 as was done in arriving at (4); rather, replace P by the upper or lower limit as the case may be (in each case, the admissible value closest to 0.5, which maximizes P(1 − P)). However, if the vague information in terms of an interval includes the value 0.5, then we need to use 1/2 in place of P. The following examples illustrate the procedure to be followed:
(I) P < 0.35: replace P by 0.35;
(II) P > 0.65: replace P by 0.65;
(III) 0.30 < P < 0.45: replace P by 0.45;
(IV) 0.57 < P < 0.77: replace P by 0.57;
(V) 0.43 < P < 0.67: replace P by 0.5.
MS Exam (Fall 2012) Solution for STAT 461 problems
Problem 1. I roll a six-sided die and observe the number N on the uppermost face. I then toss a
fair coin N times and observe X, the total number of heads to appear. What is the probability
that N=3 and X=1? What is the probability that X=4?
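Problem 1 can be checked by direct enumeration over the die outcome N, since X | N = n is Binomial(n, 1/2) (a sketch using exact fractions):

```python
from fractions import Fraction
from math import comb

def p_joint(n: int, k: int) -> Fraction:
    """P(N = n, X = k) for a fair die and a fair coin tossed N times."""
    return Fraction(1, 6) * comb(n, k) * Fraction(1, 2) ** n

p_n3_x1 = p_joint(3, 1)                         # P(N = 3, X = 1)
p_x4 = sum(p_joint(n, 4) for n in range(4, 7))  # P(X = 4): needs N >= 4
print(p_n3_x1, p_x4)  # 1/16 29/384
```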
Problem 2. Let {X_n} be a Markov chain with the state space {0, 1, 2}. The transition probability matrix is given by

P = [ 1    0    0
      0.1  0.5  0.4
      0.2  0.4  0.4 ]
If the chain starts from state 1, find the probability that it does not visit state 2 prior to its absorption.
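A first-step sketch for Problem 2: with h the probability, starting from state 1, of being absorbed at state 0 without ever entering state 2, conditioning on the first step gives h = 0.1 + 0.5h (the 0.4 mass going to state 2 contributes nothing):

```python
# Transition probabilities out of state 1, read from the matrix above.
p10, p11, p12 = 0.1, 0.5, 0.4

# First-step analysis: h = p10 + p11 * h  =>  h = p10 / (1 - p11).
h = p10 / (1 - p11)
print(h)  # 0.2
```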