Department of IOMS
Inference and Regression
Assignment 4 - Solutions
Professor William Greene Phone: 212.998.0876
Office: KMC 7-90
Home page: www.stern.nyu.edu/~wgreene
Email: [email protected]
Course web page: www.stern.nyu.edu/~wgreene/Econometrics/Econometrics.htm
The following are in Rice, Mathematical Statistics and Data Analysis, 3rd Edition
1. Problem 32, page 318
a. The sample mean is 3.610869. The sample standard deviation is 1.848808, so the sample variance is 1.848808² = 3.418.
b. Confidence intervals (based on the t distribution with 15 degrees of freedom). The
standard error of the mean is 1.848808/sqr(N), where N = 16, i.e., 1.848808/4.
The 90% confidence interval is 3.610869 +/- 1.753(1.848808/4) = 2.801 – 4.421
The 95% confidence interval is 3.610869 +/- 2.131(1.848808/4) = 2.626 – 4.596
The 99% confidence interval is 3.610869 +/- 2.947(1.848808/4) = 2.249 – 4.973
c. Confidence intervals for σ.
The confidence interval for σ² runs from (N−1)s²/χ²(1−α/2) to (N−1)s²/χ²(α/2),
where the terminal chi-squareds are the 1−α/2 and α/2 critical values.
N = 16, s = 1.848808, so s² = 3.418. The critical chi-squared values (15 degrees of freedom) are:
for 1%, (4.60, 32.80); for 5%, (6.26, 27.49); for 10%, (7.26, 25.00).
The intervals for σ² are as follows:
The 99% confidence interval is 15s²/32.80 to 15s²/4.60 = 1.56 – 11.15
The 95% confidence interval is 15s²/27.49 to 15s²/6.26 = 1.86 – 8.19
The 90% confidence interval is 15s²/25.00 to 15s²/7.26 = 2.05 – 7.06
Take square roots for the corresponding intervals for σ: (1.25 – 3.34), (1.37 – 2.86), (1.43 – 2.66).
d. The length of the confidence interval is 2(t*)s/4 where s is the sample standard
deviation and 4 is the square root of the sample size. To halve the length of the
interval, that 4 must become an 8, so N must go from 16 to 64 – up by a factor of
4.
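As a numerical cross-check of parts b and c, here is a short Python sketch using scipy. The inputs are the summary statistics from part a (the raw data are not reproduced in this solution); the intervals it prints agree with those above up to rounding.

# Cross-check of the intervals in parts b and c, using the summary statistics from part a.
from scipy import stats

n, xbar, s = 16, 3.610869, 1.848808                  # sample size, mean, standard deviation
se = s / n ** 0.5                                    # standard error of the mean = 1.848808/4
for conf in (0.90, 0.95, 0.99):
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)    # t critical value, 15 degrees of freedom
    chi_lo, chi_hi = stats.chi2.ppf([(1 - conf) / 2, 1 - (1 - conf) / 2], df=n - 1)
    mean_ci = (xbar - t * se, xbar + t * se)                          # interval for mu
    var_ci = ((n - 1) * s ** 2 / chi_hi, (n - 1) * s ** 2 / chi_lo)   # interval for sigma^2
    sd_ci = (var_ci[0] ** 0.5, var_ci[1] ** 0.5)                      # interval for sigma
    print(conf, mean_ci, var_ci, sd_ci)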
2. Problem 33, page 319
c should be chosen so that the probability to the left of c is .95, so that the probability to
the right of c is .05. That puts c 1.645 standard deviations above x-bar.
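A minimal sketch of this calculation. The sample mean and standard error for this problem are not reproduced in this solution, so the values below are placeholders only (borrowed from problem 1 above for illustration).

# c sits 1.645 standard errors above x-bar, leaving probability .05 to its right.
from scipy.stats import norm

xbar, se = 3.610869, 1.848808 / 16 ** 0.5   # placeholder values, for illustration only
c = xbar + norm.ppf(0.95) * se              # norm.ppf(0.95) is about 1.645
print(c)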
3. Problem 1, page 362
a. The probability of a type 1 error is Prob(X = 0 or 10 | p = .5) = 2 × 1/1024 = 1/512.
b. The power of the test is the probability it will reject p = .5 given that p really is .1. This
is the probability of 0 heads given p = .1 plus the probability of 10 heads given p = .1
= .9^10 + .1^10 = .3487.
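A quick check of both numbers with the binomial distribution (the test rejects when all ten tosses, or none, come up heads):

# Type I error and power for the test that rejects when X = 0 or X = 10 heads in 10 tosses.
from scipy.stats import binom

alpha = binom.pmf(0, 10, 0.5) + binom.pmf(10, 10, 0.5)   # P(reject | p = .5) = 2/1024
power = binom.pmf(0, 10, 0.1) + binom.pmf(10, 10, 0.1)   # P(reject | p = .1) = .9^10 + .1^10
print(alpha, power)                                      # about .00195 and .3487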
4. Problem 5, page 362
a. False. The significance level is the probability that the test will reject the null
hypothesis when it is true.
b. “If the significance level of a test is decreased, the power would be expected to
increase.” The significance level, α is the probability that the null hypothesis will be
rejected when it is true. The power of the test will be the probability that the hypothesis
is rejected when it is false. The power of the test is a function of the alternative
hypothesis. In general, decreasing the significance level of a test makes it harder to
reject the hypothesis. This should operate similarly on the power. Take an example.
Suppose the hypothesis is that the mean of a normal distribution is zero. If the
significance level is .05, the test is rejected if the sample mean is less than -1.96
standard errors or greater than +1.96 standard errors. Suppose the specific alternative
is that the mean is .1 (measured in standard-error units). Then, the power of the test is the
probability that the sample mean will be more than 1.86 standard errors above .1 or more
than 2.06 standard errors below .1. The power is .051142. Now, reduce the significance
level to .01. The probability of the type I error is now .01. The probability that the null will
be rejected when the alternative, µ = .1, is true becomes about .0104. Reducing the
significance level reduces the power of the test as well. What does increase when the
significance level is reduced is the probability of a type II error, not rejecting the null when
it is false. This is one minus the power. The probability of a type II error increases when
the probability of a type I error decreases. The power of the test is one minus the
probability of a type II error. So, the answer to the question is FALSE. (A numerical check
of these two power figures appears after part h below.)
c. False. The probability is 1 - α, not α.
d. False. The probability that the null is falsely rejected is α, the significance level, not the power.
e. True, but it must be qualified that the null hypothesis must also be true.
f. Not necessarily. It depends on the application.
g. False. The power is determined by the alternative.
h. True. It is a function of a statistic computed from the data.
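Here is the numerical check of the two power figures in part b: a two-sided z test of µ = 0, with the alternative mean .1 standard errors away from zero.

# Power of the two-sided z test of H0: mu = 0 when the true mean is .1 standard errors.
from scipy.stats import norm

delta = 0.1                                  # true mean, in standard-error units
for alpha in (0.05, 0.01):
    z = norm.ppf(1 - alpha / 2)              # 1.96 or 2.576
    power = norm.sf(z - delta) + norm.cdf(-z - delta)
    print(alpha, power)                      # about .0511 and .0104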
5. Problem 9, page 363
The rejection region is the set of values of x-bar greater than 1.28 standard errors above
zero; that is, the cutoff is chosen so that the probability that x-bar exceeds it is .1 when the
null is true. The standard error is σ/sqr(n) = 10/5 = 2. So, the rejection region is
values of x-bar above 2.56. The power of the test is the probability that it will reject H0
when it is false. That is the probability that x-bar will exceed 2.56 when µ is 1.5. If µ is
1.5, then this is the probability that z is greater than (2.56 – 1.5)/2 = prob(z > .53) =
.2981. If α is .01, then the 1.28 becomes 2.33 so the critical value is 4.66. The
probability that z exceeds 4.66 if µ is 1.5 is Prob(z > (4.66 – 1.5)/2) = Prob(z > 1.58) =
.0571.
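The same calculation as a short sketch (σ = 10, n = 25, one-sided test of µ = 0 against µ > 0, with the alternative mean 1.5). The answers differ slightly from the figures above because 1.28 and 2.33 are rounded z values.

# One-sided test of H0: mu = 0 with sigma = 10, n = 25; power at the alternative mu = 1.5.
from scipy.stats import norm

sigma, n, mu1 = 10, 25, 1.5
se = sigma / n ** 0.5                         # standard error = 2
for alpha in (0.10, 0.01):
    cutoff = norm.ppf(1 - alpha) * se         # rejection region: x-bar above this value
    power = norm.sf((cutoff - mu1) / se)      # P(reject | mu = 1.5)
    print(alpha, cutoff, power)               # cutoffs about 2.56 and 4.65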
6. Problem 12, page 363.
The likelihood under the null hypothesis is θ0^N exp(−θ0 Σi xi). Under the alternative, the
likelihood is maximized by setting θ = 1/x-bar. So, the likelihood under the alternative
hypothesis is (1/x-bar)^N exp(−(1/x-bar) Σi xi). But Σi xi = N(x-bar), so the likelihood under
the alternative is (1/x-bar)^N exp(−N). The likelihood ratio is
L = θ0^N exp(−θ0 N x-bar) / [(1/x-bar)^N exp(−N)]
  = (θ0 x-bar)^N [exp(−θ0 x-bar)/exp(−1)]^N
  = [θ0 x-bar exp(−θ0 x-bar)/exp(−1)]^N.
Since θ0 and exp(−1) are constants, the rejection region is the set of values of x-bar for
which [x-bar exp(−θ0 x-bar)]^N is small.
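A small sketch of the ratio as a function of x-bar (θ0 = 1 here is just an illustrative null value). The ratio is largest when x-bar = 1/θ0 and small when x-bar is far from 1/θ0 in either direction, which is the rejection region.

# Generalized likelihood ratio for an exponential sample, H0: theta = theta0 versus
# the unrestricted alternative (whose MLE is theta-hat = 1/xbar).
import math

def likelihood_ratio(xbar, n, theta0):
    # L = [theta0 * xbar * exp(-theta0 * xbar) / exp(-1)]**n
    return (theta0 * xbar * math.exp(-theta0 * xbar) * math.e) ** n

for xbar in (0.2, 1.0, 5.0):                  # illustrative sample means
    print(xbar, likelihood_ratio(xbar, n=10, theta0=1.0))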
7. Problem 43, parts a,b, page 369
a. The mean would be .5(17950) = 8975 and the variance = .5(.5)17950 = 4487.5. The
standard deviation is 66.988. 9207 heads is (9207 – 8975)/66.988 = 3.4633 standard
deviations from the mean. Using the normal approximation, this seems like a significant
difference.
b. In tosses of 5 coins, assuming they are fair, the probabilities of 0 through 5 heads are
1/32, 5/32, 10/32, 10/32, 5/32, 1/32 = (.03125, .15625, .3125, .3125, .15625, .03125). The
expected values in 3590 tosses of 5 coins are (112.18, 224.37, 448.75, 448.75, 224.37, 112.18).
Computing the chi-squared = Σ[(Observed − Expected)²/Expected] = 21.57 with 5
degrees of freedom. The critical value is about 11.07. Something is wrong in here: those
expected values do not sum to 3590; with the probabilities above they should be
3590 × (1/32, 5/32, 10/32, 10/32, 5/32, 1/32) = (112.19, 560.94, 1121.88, 1121.88, 560.94, 112.19).
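A sketch of the goodness-of-fit machinery for part b. The observed counts from Rice's table are not reproduced in this transcript, so the function below is left to be applied to them.

# Expected counts and chi-squared goodness of fit for the number of heads in 3590
# tosses of 5 fair coins.
import numpy as np
from scipy.stats import chi2

probs = np.array([1, 5, 10, 10, 5, 1]) / 32    # binomial(5, .5) probabilities for 0-5 heads
expected = 3590 * probs                        # expected counts

def gof_statistic(observed):
    # observed: the six counts of 0, 1, ..., 5 heads from the problem's table
    observed = np.asarray(observed, dtype=float)
    return ((observed - expected) ** 2 / expected).sum()

print(expected)                                # 112.19, 560.94, 1121.88, 1121.88, 560.94, 112.19
print(chi2.ppf(0.95, df=5))                    # 5% critical value, about 11.07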
8. Problem 54, page 373.
a. If x ~ N[µ, σ²], then
f(x) = [1/(σ sqr(2π))] exp{ −(1/2)[(x − µ)/σ]² }.
Let y = e^x, so x = log y and dx = (1/y) dy. Then
f(y) = [1/(y σ sqr(2π))] exp{ −(1/2)[(log y − µ)/σ]² },
which is the lognormal density.
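A quick numerical confirmation of the change of variables, comparing the derived density with scipy's lognormal density (scipy's lognorm takes the shape s = σ and scale = exp(µ); the µ and σ below are arbitrary illustrative values).

# Check f(y) = exp(-0.5*((log y - mu)/sigma)**2) / (y * sigma * sqrt(2*pi)) against scipy.
import math
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5                          # illustrative parameter values

def f_y(y):
    return math.exp(-0.5 * ((math.log(y) - mu) / sigma) ** 2) / (y * sigma * math.sqrt(2 * math.pi))

for y in (0.5, 2.0, 5.0):
    print(y, f_y(y), lognorm.pdf(y, s=sigma, scale=math.exp(mu)))   # the two columns agree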
b. Hint: According to the problem, the data given come from a lognormal population.
Thus, logs (base e) will be normally distributed. Take the logs, then sort the data.
Divide the observed data into a set of ranges. Observe the proportions of observations
that fall in those ranges. You can compute the counterparts to these proportions as the
predicted probabilities for the ranges from a normal distribution with the mean and
variance that occur in the data. A chi squared goodness of fit can then be used. The
sorted data on the logs of the variables are listed below:
Set 1 (18 values): 1.60944  1.79176  1.79176  2.07944  2.19722  2.30259  2.30259  2.39790  2.39790
                   2.48491  2.48491  2.70805  2.70805  2.77259  2.77259  2.77259  2.77259  2.83321
Set 2 (18 values): 2.83321  2.83321  2.83321  2.89037  2.89037  2.94444  2.94444  2.99573  2.99573
                   3.04452  3.04452  3.09104  3.09104  3.09104  3.13549  3.13549  3.25810  3.25810
Set 3 (18 values): 3.29584  3.29584  3.33220  3.33220  3.36730  3.36730  3.36730  3.36730  3.40120
                   3.46574  3.46574  3.49651  3.55535  3.58352  3.61092  3.68888  3.71357  3.76120
Set 4 (18 values): 3.76120  3.76120  3.78419  3.78419  3.80666  3.85015  3.89182  3.91202  3.91202
                   3.93183  3.95124  3.95124  3.98898  4.00733  4.00733  4.04305  4.09434  4.11087
Set 5 (22 values): 4.14313  4.15888  4.17439  4.20469  4.24850  4.29046  4.29046  4.31749  4.33073
                   4.38203  4.40672  4.40672  4.43082  4.44265  4.55388  4.70048  4.74493  4.75359
                   4.86753  5.01064  5.16479  5.24702
(94 observations in all.)
The mean and standard deviation of the data are 3.508279 and 0.785343.
I divided the data into the 5 sets above. The boundaries of the 5 regions are
(-∞, 2.83321], (2.83321, 3.25810], (3.25810, 3.76120], (3.76120, 4.11087], (4.11087, +∞).
Standardized (by taking (value − 3.508279)/.785343), these are
(-∞, -.86], (-.86, -.32], (-.32, +.32], (+.32, +.77], (+.77, +∞).
Using the normal table, the probabilities for these 5 intervals are .195, .180, .251, .154,
.220. (Rounding error of .001, so I reduced the top cell to make these add to 1.000.)
Based on my division of the data (yours might be different), the 5 sample proportions are
.191, .191, .191, .191, .236 (again, a bit of rounding error in the top cell). The chi-squared
is N × Σ[(Observed − Expected)²/Expected] = 2.364. The number of degrees of freedom is
5 − 1 = 4. The critical chi-squared for 4 degrees of freedom is 9.49 (5% significance), so
the normality hypothesis is not rejected.
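The same arithmetic as a short sketch, using the summary numbers above (mean 3.508279, standard deviation 0.785343, the five cell boundaries, and my sample proportions, with N = 94 values listed above). The statistic differs slightly from 2.364 because the cell probabilities here are not rounded, but the conclusion is the same.

# Chi-squared goodness of fit for normality of the logged data, using the cell
# boundaries and sample proportions reported above (N = 94 observations).
import numpy as np
from scipy.stats import norm, chi2

n = 94
mean, sd = 3.508279, 0.785343
edges = np.array([-np.inf, 2.83321, 3.25810, 3.76120, 4.11087, np.inf])

expected_p = np.diff(norm.cdf(edges, loc=mean, scale=sd))   # normal cell probabilities
observed_p = np.array([.191, .191, .191, .191, .236])       # sample proportions, as above

chi_sq = n * ((observed_p - expected_p) ** 2 / expected_p).sum()
print(chi_sq, chi2.ppf(0.95, df=4))     # well below the 5% critical value 9.49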
9. Problem 32, page 464
Prob(X < Y) = Prob(X − Y < 0). Z = X − Y is normally distributed with mean µX − µY and
variance σX² + σY². So, the desired probability is Prob(z < [0 − (µX − µY)]/sqr(σX² + σY²)).
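The formula as a one-line function. Since the problem is stated in general terms, the arguments in the example call are arbitrary illustrative values, and X and Y are taken to be independent (which is what the variance formula above assumes).

# P(X < Y) = P(X - Y < 0) for independent X ~ N(mu_x, sigma_x^2) and Y ~ N(mu_y, sigma_y^2).
from scipy.stats import norm

def prob_x_less_than_y(mu_x, mu_y, sigma_x, sigma_y):
    return norm.cdf((0 - (mu_x - mu_y)) / (sigma_x ** 2 + sigma_y ** 2) ** 0.5)

print(prob_x_less_than_y(1.0, 2.0, 1.0, 1.0))   # about .76 for these illustrative values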
10. Problem 36, page 465
The means are 85.260 and 84.807. The difference is 0.453.
The standard deviation of x-bar − y-bar is the square root of
Var(x-bar) + Var(y-bar) − 2Cov(x-bar, y-bar): sqr(21.196² + 21.545² − 2(446.028))/15 = .308.
If it were assumed that the pairings were independent, the covariance term in the result
above would be omitted, and the standard deviation of x-bar − y-bar would be estimated by
sqr(21.196² + 21.545²)/15 = 2.015.
Are they different? Rice seems to think not. Whichever method is used, the difference
in the means is not statistically different from zero. The independence assumption
seems dubious given that the correlation of the two sets of values is .98.
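The two standard errors and the correlation, computed from the summary statistics quoted above (following the same arithmetic, including the divisor of 15):

# Standard error of x-bar minus y-bar with and without the covariance term.
sx, sy, cov = 21.196, 21.545, 446.028                     # sample sds and covariance, from above

se_paired = (sx ** 2 + sy ** 2 - 2 * cov) ** 0.5 / 15     # about .308
se_independent = (sx ** 2 + sy ** 2) ** 0.5 / 15          # about 2.015
correlation = cov / (sx * sy)                             # about .98
print(se_paired, se_independent, correlation)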
11. Problem 16, page 535.
There is an error in the problem. The table contains 299 responses, not 250.
Use the chi squared test of independence. The observed frequencies are in the table.
I convert these to proportions
             Favorable     Neutral      Unfavorable    Total
Cautious     .264 (.205)   .033 (.030)  .033 (.095)    .330
Midroad      .194 (.208)   .027 (.030)  .114 (.096)    .335
Explorer     .164 (.208)   .030 (.030)  .141* (.096)   .335
Total        .622          .090         .288           1.00
(* rounded up)
Expected proportions are the products of the marginals, given in parentheses.
The chi squared is 299 times the sum over the 9 cells of [(observed − expected)²/expected].
matrix ; list ; a=[.264,.194,.164/.033,.027,.030/.033,.114,.141] $
A|            1           2           3
--------+-------------------------------------
      1|   .264000     .194000     .164000
      2|   .0330000    .0270000    .0300000
      3|   .0330000    .114000     .141000
matrix ; list ; cs = 1'a$
matrix ; list ; rs = a*[1/1/1] $
matrix ; list ; f = rs*cs $
F|            1           2           3
--------+-------------------------------------
      1|   .205260     .208370     .208370
      2|   .0297000    .0301500    .0301500
      3|   .0950400    .0964800    .0964800
matrix ; list ; d = a-f $ Observed - expected
matrix ; list ; dd = dirp(d,d) $ Square element by element
matrix ; list ; fi = diri(f)$ Reciprocals of elements
matrix ; list ; c = dirp(dd,fi)$ Product, element by element
matrix ; list ; c2 = 299* 1'c*[1/1/1]$ Sum all elements * 299
The result is 27.5584.
This is a chi squared with (3-1)(3-1) = 4 degrees of freedom.
This is much larger than the critical chi squared with 4 degrees of freedom, 14.86 (the 0.5% value).
The hypothesis of independence is rejected. The test does not indicate the nature of the
relationship; it merely establishes that investor type and opinion are not independent.
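The same test as a cross-check in Python, using the observed proportions from the table above and N = 299:

# Chi-squared test of independence for the 3x3 table of observed proportions (N = 299).
import numpy as np
from scipy.stats import chi2

n = 299
p = np.array([[.264, .194, .164],
              [.033, .027, .030],
              [.033, .114, .141]])    # rows: Favorable/Neutral/Unfavorable; cols: Cautious/Midroad/Explorer

expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)   # products of the marginals
chi_sq = n * ((p - expected) ** 2 / expected).sum()
print(chi_sq, chi2.ppf(0.95, df=4))   # about 27.6, far above the 5% critical value 9.49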