Department of IOMS
Inference and Regression
Assignment 4 - Solutions
Professor William Greene
Phone: 212.998.0876  Office: KMC 7-90
Home page: www.stern.nyu.edu/~wgreene
Email: [email protected]
Course web page: www.stern.nyu.edu/~wgreene/Econometrics/Econometrics.htm

The following are in Rice, Mathematical Statistics and Data Analysis, 3rd Edition.

1. Problem 32, page 318
a. The sample mean is 3.610869. The sample standard deviation is 1.848808, so the sample variance is 1.848808^2 = 3.418.
b. Confidence intervals for the mean (based on the t distribution with 15 degrees of freedom). The standard error of the mean is 1.848808/sqr(N), where N = 16, so the standard error is 1.848808/4 = 0.462202.
The 90% confidence interval is 3.610869 +/- 1.753(1.848808/4) = 2.801 to 4.421.
The 95% confidence interval is 3.610869 +/- 2.131(1.848808/4) = 2.626 to 4.596.
The 99% confidence interval is 3.610869 +/- 2.947(1.848808/4) = 2.249 to 4.973.
c. Confidence intervals for σ. The confidence interval for σ^2 is
   (N-1)s^2/χ^2[1-α/2] < σ^2 < (N-1)s^2/χ^2[α/2],
where the terminal chi-squared values are the 1-α/2 and α/2 points for N-1 = 15 degrees of freedom. Here N = 16 and s = 1.848808. The critical chi-squared pairs (χ^2[α/2], χ^2[1-α/2]) are (4.60, 32.80) for 1%, (6.26, 27.49) for 5%, and (7.26, 25.00) for 10%. The intervals for σ^2 are:
The 99% confidence interval: 15s^2/32.80 to 15s^2/4.60 = 1.56 to 11.15.
The 95% confidence interval: 15s^2/27.49 to 15s^2/6.26 = 1.86 to 8.19.
The 90% confidence interval: 15s^2/25.00 to 15s^2/7.26 = 2.05 to 7.06.
Take square roots for the intervals for σ: (1.25, 3.34), (1.37, 2.86), (1.43, 2.66).
d. The length of the confidence interval is 2(t*)s/sqr(N), where s is the sample standard deviation and sqr(N) = sqr(16) = 4. To halve the length of the interval (treating t* and s as fixed), that 4 must become an 8, so N must go from 16 to 64, up by a factor of 4.

2. Problem 33, page 319
c should be chosen so that the probability to the left of c is .95, that is, so that the probability to the right of c is .05. That puts c 1.645 standard deviations above x-bar.

3. Problem 1, page 362
a. The probability of a type I error is Prob(X = 0 or X = 10 | p = .5) = 2(1/1024) = 1/512.
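As a numerical cross-check (not part of the original solution), the interval arithmetic in problem 1 can be reproduced in a few lines of Python; the t and chi-squared critical values are the 15-degree-of-freedom table values quoted above.

```python
from math import sqrt

xbar, s, n = 3.610869, 1.848808, 16
se = s / sqrt(n)  # standard error of the mean = 1.848808/4

# two-sided t critical values for 15 degrees of freedom (from the t table)
t_crit = {0.90: 1.753, 0.95: 2.131, 0.99: 2.947}
for conf, t in t_crit.items():
    print(f"{conf:.0%} CI for the mean: ({xbar - t * se:.3f}, {xbar + t * se:.3f})")

# chi-squared critical values for 15 degrees of freedom: (alpha/2, 1-alpha/2) points
chi2_crit = {0.90: (7.26, 25.00), 0.95: (6.26, 27.49), 0.99: (4.60, 32.80)}
s2 = s * s
for conf, (q_lo, q_hi) in chi2_crit.items():
    lo, hi = (n - 1) * s2 / q_hi, (n - 1) * s2 / q_lo
    print(f"{conf:.0%} CI for sigma^2: ({lo:.2f}, {hi:.2f}); "
          f"for sigma: ({sqrt(lo):.2f}, {sqrt(hi):.2f})")
```

The printed intervals match the ones computed by hand above to rounding.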
b. The power of the test is the probability that it will reject p = .5 given that p really is .1. This is Prob(X = 0 | p = .1) + Prob(X = 10 | p = .1) = .9^10 + .1^10 = .3487 (the second term is negligible).

4. Problem 5, page 362
a. False. The significance level is the probability that the test will reject the null hypothesis when it is true.
b. "If the significance level of a test is decreased, the power would be expected to increase." The significance level α is the probability that the null hypothesis will be rejected when it is true. The power of the test is the probability that the hypothesis is rejected when it is false, so the power is a function of the alternative hypothesis. In general, decreasing the significance level of a test makes it harder to reject the hypothesis, and this operates similarly on the power. Take an example. Suppose the hypothesis is that the mean of a normal distribution is zero. At significance level .05, the hypothesis is rejected if the sample mean is less than -1.96 standard errors or greater than +1.96 standard errors. Suppose the specific alternative is that the mean is .1 (in standard-error units). Then the power of the test is the probability that the sample mean falls more than 1.86 standard errors above .1 or more than 2.06 standard errors below .1, which is .051142. Now reduce the significance level to .01, so the probability of a type I error is .01. The probability that the null will be rejected when the alternative, µ = .1, is true becomes about .0104. Reducing the significance level thus reduces the power of the test as well. What does increase when the significance level is reduced is the probability of a type II error, not rejecting the null when it is false; this is one minus the power. The probability of a type II error increases when the probability of a type I error decreases. So the answer to the question is FALSE.
c. False. α is the probability that the test rejects given that the null hypothesis is true, not the probability that the null hypothesis is true given a rejection.
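The power calculations in problems 3b and 4b can be reproduced with the standard normal CDF from the Python standard library (a sketch, not part of the original solution):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Problem 3b: power of the "reject if X = 0 or X = 10" test when p = .1
power_3b = 0.9 ** 10 + 0.1 ** 10
print(round(power_3b, 4))  # ~ 0.3487

# Problem 4b: two-sided z test of mean 0; power against a mean of
# 0.1 standard errors, at significance levels .05 and .01
def power(z_alpha, mu=0.1):
    return (1 - Phi(z_alpha - mu)) + Phi(-z_alpha - mu)

print(round(power(1.96), 6))   # ~ .051142
print(round(power(2.576), 6))  # ~ .0104, smaller, as argued above
```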
d. False. The probability that the null hypothesis is falsely rejected is α, the significance level; the power is the probability of rejecting when the null is false.
e. True, with the qualification that the null hypothesis must also be true.
f. Not necessarily. It depends on the application.
g. False. The power is determined by the distribution of the test statistic under the alternative.
h. True. The likelihood ratio is a function of the data, so it is a statistic and hence a random variable.

5. Problem 9, page 363
The rejection region is the set of values of x-bar greater than 1.28 standard errors above zero, i.e., the region where the probability that x-bar exceeds the critical value is .1. The standard error is σ/sqr(n) = 10/5 = 2, so the rejection region is x-bar > 1.28(2) = 2.56. The power of the test is the probability that it rejects H0 when H0 is false, that is, the probability that x-bar exceeds 2.56 when µ = 1.5. This is Prob(z > (2.56 - 1.5)/2) = Prob(z > .53) = .2981. If α = .01, the 1.28 becomes 2.33, so the critical value is 4.66, and the power is Prob(z > (4.66 - 1.5)/2) = Prob(z > 1.58) = .0571.

6. Problem 12, page 363
The likelihood under the null hypothesis is θ0^N exp(-θ0 Σi xi). Under the alternative, the likelihood is maximized by setting θ = 1/x-bar, so the maximized likelihood under the alternative is (1/x-bar)^N exp(-(1/x-bar) Σi xi). But Σi xi = N(x-bar), so this equals (1/x-bar)^N exp(-N). The likelihood ratio is
   L = θ0^N exp(-θ0 N x-bar) / [(1/x-bar)^N exp(-N)] = (θ0 x-bar)^N exp(N - θ0 N x-bar) = [θ0 (x-bar) exp(1 - θ0 x-bar)]^N.
The rejection region is the set of values of x-bar for which (x-bar) exp(-θ0 x-bar) is small.

7. Problem 43, parts a and b, page 369
a. The mean would be .5(17950) = 8975 and the variance .5(.5)(17950) = 4487.5; the standard deviation is 66.988. A count of 9207 heads is (9207 - 8975)/66.988 = 3.4633 standard deviations above the mean. By the normal approximation, this is a highly significant difference.
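A quick script confirms the z-value in part a of problem 7; the two-sided p-value, which the solution does not compute explicitly, is added here for reference:

```python
from math import sqrt
from statistics import NormalDist

n, p = 17950, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))   # 8975 and 66.988
z = (9207 - mean) / sd
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(round(z, 4), p_two_sided)  # z ~ 3.4633; p-value well below .01
```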
b. For tosses of 5 coins, assuming they are fair, the probabilities of 0 through 5 heads are 1/32, 5/32, 10/32, 10/32, 5/32, 1/32 = (.03125, .15625, .3125, .3125, .15625, .03125). The expected counts in 3590 sets of 5 tosses are therefore (112.19, 560.94, 1121.88, 1121.88, 560.94, 112.19). The goodness-of-fit statistic is chi-squared = Σ[(Observed - Expected)^2/Expected], with 6 - 1 = 5 degrees of freedom; the 5% critical value is 11.07. The statistic of 21.57 computed here was based on expected counts (112.18, 224.37, 448.75, 448.75, 224.37, 112.18) that are inconsistent with these probabilities and do not sum to 3590; something is wrong in that calculation, and the statistic should be recomputed from the observed counts in the text.

8. Problem 54, page 373
a. If x ~ N[µ, σ^2], then
   f(x) = [1/(σ sqr(2π))] exp[ -(1/2)((x - µ)/σ)^2 ].
Let y = e^x, so x = log y and dx/dy = 1/y. Then
   f(y) = [1/(y σ sqr(2π))] exp[ -(1/2)((log y - µ)/σ)^2 ].
b. (Hint: According to the problem, the data come from a lognormal population, so the logs (base e) are normally distributed. Take the logs, then sort the data. Divide the observations into a set of ranges and observe the proportion falling in each range. The counterparts to these proportions are the predicted probabilities for the ranges from a normal distribution with the mean and variance observed in the data. A chi-squared goodness-of-fit test can then be used.)
The sorted logs of the data, in the five groups used below, are:

Group 1 (18 obs.): 1.60944 1.79176 1.79176 2.07944 2.19722 2.30259 2.30259 2.39790 2.39790
                   2.48491 2.48491 2.70805 2.70805 2.77259 2.77259 2.77259 2.77259 2.83321
Group 2 (18 obs.): 2.83321 2.83321 2.83321 2.89037 2.89037 2.94444 2.94444 2.99573 2.99573
                   3.04452 3.04452 3.09104 3.09104 3.09104 3.13549 3.13549 3.25810 3.25810
Group 3 (18 obs.): 3.29584 3.29584 3.33220 3.33220 3.36730 3.36730 3.36730 3.36730 3.40120
                   3.46574 3.46574 3.49651 3.55535 3.58352 3.61092 3.68888 3.71357 3.76120
Group 4 (18 obs.): 3.76120 3.76120 3.78419 3.78419 3.80666 3.85015 3.89182 3.91202 3.91202
                   3.93183 3.95124 3.95124 3.98898 4.00733 4.00733 4.04305 4.09434 4.11087
Group 5 (22 obs.): 4.14313 4.15888 4.17439 4.20469 4.24850 4.29046 4.29046 4.31749 4.33073
                   4.38203 4.40672 4.40672 4.43082 4.44265 4.55388 4.70048 4.74493 4.75359
                   4.86753 5.01064 5.16479 5.24702

The mean and standard deviation of the logged data are 3.508279 and 0.785343.
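As a sanity check on the density derived in part a of problem 8 (not part of the original solution), a crude midpoint-rule integration of the lognormal density should return total probability close to 1; the values µ = 0, σ = 1 below are arbitrary illustration choices.

```python
from math import exp, log, pi, sqrt

mu, sigma = 0.0, 1.0  # illustrative values; any choice should integrate to 1

def f(y):
    # density of y = e^x when x ~ N(mu, sigma^2), from the derivation above
    return exp(-0.5 * ((log(y) - mu) / sigma) ** 2) / (y * sigma * sqrt(2 * pi))

# midpoint rule on (0, 60]; the omitted upper tail is negligible for these values
h = 0.001
total = sum(f(h * (i + 0.5)) * h for i in range(60000))
print(round(total, 4))  # close to 1
```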
I divided the data into the 5 groups above. The boundaries of the 5 regions are (-∞, 2.83321], (2.83321, 3.25810], (3.25810, 3.76120], (3.76120, 4.11087], (4.11087, +∞). Standardized (by taking (value - 3.508279)/.785343), these are (-∞, -.86), (-.86, -.32), (-.32, +.32), (+.32, +.77), (+.77, +∞). Using the normal table, the probabilities for these 5 intervals are .195, .180, .251, .154, .220. (Rounding error of .001, so I reduced the top cell to make these add to 1.000.) Based on my division of the data (yours might be different), the 5 sample proportions are .191, .191, .191, .191, .236 (four groups of 18 observations and one of 22, out of N = 94; again, a bit of rounding in the top cell). The chi-squared statistic is N × Σ[(Observed - Expected)^2/Expected] = 2.364. The number of degrees of freedom is 5 - 1 = 4. The critical chi-squared for 4 degrees of freedom is 9.49 (5% significance), so the normality hypothesis is not rejected.

9. Problem 32, page 464
Prob(X < Y) = Prob(X - Y < 0). Z = X - Y is normally distributed with mean µX - µY and variance σX^2 + σY^2. So the desired probability is Prob(z < [0 - (µX - µY)]/sqr(σX^2 + σY^2)).

10. Problem 36, page 465
The means are 85.260 and 84.807; the difference is 0.453. The standard deviation of x-bar - y-bar is the square root of Var(x-bar) + Var(y-bar) - 2Cov(x-bar, y-bar) = sqr(21.196^2 + 21.545^2 - 2(446.028))/15 = .308. If the pairings were assumed independent, the covariance term would be omitted, and the standard deviation of x-bar - y-bar would be estimated by sqr(21.196^2 + 21.545^2)/15 = 2.015. Are they different? Rice seems to think not. Whichever method is used, the difference in the means is not statistically different from zero. The independence assumption seems dubious, though, given that the correlation of the two sets of values is .98.

11. Problem 16, page 535
There is an error in the problem: the table contains 299 responses, not 250. Use the chi-squared test of independence. The observed frequencies are in the table.
I convert these to proportions (expected proportions, the products of the marginals, in parentheses):

            Favorable     Neutral       Unfavorable    Total
Cautious    .264 (.205)   .033 (.030)   .033 (.095)    .330
Midroad     .194 (.208)   .027 (.030)   .114 (.096)    .335
Explorer    .164 (.208)   .030 (.030)   .141* (.096)   .335
Total       .622          .090          .288           1.00
(* rounded up)

The chi-squared statistic is 299 times the sum over the 9 cells of (observed - expected)^2/expected. The matrix listing below carries out the computation (note that the matrix a holds the table transposed: its rows are the Favorable, Neutral, and Unfavorable proportions):

matrix ; list ; a = [.264,.194,.164 / .033,.027,.030 / .033,.114,.141] $
A|            1          2          3
--------+---------------------------------
      1|  .264000    .194000    .164000
      2|  .033000    .027000    .030000
      3|  .033000    .114000    .141000
matrix ; list ; cs = 1'a $                Column sums (marginals)
matrix ; list ; rs = a*[1/1/1] $          Row sums (marginals)
matrix ; list ; f  = rs*cs $              Expected proportions
F|            1          2          3
--------+---------------------------------
      1|  .205260    .208370    .208370
      2|  .029700    .030150    .030150
      3|  .095040    .096480    .096480
matrix ; list ; d  = a - f $              Observed - expected
matrix ; list ; dd = dirp(d,d) $          Square, element by element
matrix ; list ; fi = diri(f) $            Reciprocals of elements
matrix ; list ; c  = dirp(dd,fi) $        Product, element by element
matrix ; list ; c2 = 299 * 1'c*[1/1/1] $  Sum all elements, times 299

The result is 27.5584. This is a chi-squared statistic with (3-1)(3-1) = 4 degrees of freedom. It is much larger than 14.86, the critical chi-squared for 4 degrees of freedom at the 0.5% level, so the hypothesis of independence is rejected. The test does not indicate the nature of the relationship; it merely establishes that the two classifications are related.
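The same chi-squared can be recomputed in a few lines of Python as a cross-check on the matrix listing above (the cell proportions are those tabulated earlier):

```python
obs = [[.264, .194, .164],   # Favorable proportions across the three strategies
       [.033, .027, .030],   # Neutral
       [.033, .114, .141]]   # Unfavorable
n = 299

row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
exp = [[r * c for c in col_tot] for r in row_tot]   # products of the marginals

chi2 = n * sum((o - e) ** 2 / e
               for o_row, e_row in zip(obs, exp)
               for o, e in zip(o_row, e_row))
print(round(chi2, 4))  # ~ 27.56, matching the matrix computation
```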