7.84 Given a binomial random variable with n = 10 and p = 0.3, use the formula to find the following probabilities.

$$P(k) = \frac{10!}{k!\,(10-k)!}\,(0.3)^{k}\,(0.7)^{10-k}$$

a. P(X = 3) = 0.267
b. P(X = 5) = 0.103
c. P(X = 8) = 0.00145

7.97 In the US, voters who are neither Democrat nor Republican are called Independents. It is believed that 10% of all voters are Independents. A survey asked 25 people to identify themselves as Democrat, Republican, or Independent.

a. What is the probability that none of the people are Independent?

If p is the probability of an outcome (like being Independent) in a single trial (like a random sampling of a person's affiliation), the probability of obtaining this outcome exactly k times out of n trials is given by the binomial distribution:

$$P(k) = \frac{n!}{k!\,(n-k)!}\,p^{k}\,(1-p)^{n-k}$$

Thus, the probability of obtaining exactly 0 Independents out of n = 25 randomly selected voters is

$$P(0) = (1-p)^{25} = (1-0.1)^{25} = (0.9)^{25} = 0.0718,$$

or 7.18%.

b. What is the probability that fewer than five people are Independent?

The probability that fewer than five are Independent is the sum of the individual probabilities of finding zero, one, two, three or four Independents:

$$P(k<5) = \sum_{k=0}^{4} P(k) = P(0) + P(1) + P(2) + P(3) + P(4) = 0.9020,$$

or 90.20%.

c. What is the probability that more than two people are Independent?

The probability that more than two are Independent is

$$\sum_{k=3}^{25} P(k) = P(3) + P(4) + \cdots + P(25).$$

Using the fact that the sum over all k is 1 (100% probability), we can write

$$P(k>2) = \sum_{k=3}^{25} P(k) = 1 - \sum_{k=0}^{2} P(k) = 1 - \left[P(0) + P(1) + P(2)\right] = 1 - 0.5371 = 0.4629,$$

or 46.29%.

8.35 X is normally distributed with mean 1,000 and standard deviation 250. What is the probability that X lies between 800 and 1,100?

We want the area under the distribution from 800 to 1,100. To simplify this calculation, we standardize the given normal distribution to a normal distribution with zero mean and unit standard deviation.
The probability is then the area between the two z-scores corresponding to 800 and 1,100:

$$z_1 = \frac{800 - 1000}{250} = -\frac{4}{5} \quad\text{and}\quad z_2 = \frac{1100 - 1000}{250} = \frac{2}{5}.$$

Calculating the cumulative normal probability, N(z), for each z-score and taking the difference gives the area between the two:

$$N\!\left(\tfrac{2}{5}\right) - N\!\left(-\tfrac{4}{5}\right) = 0.6554 - 0.2119 \approx 0.444.$$

So there is a 44.4% chance that X lies between 800 and 1,100.

8.42 Travelbyus is an Internet-based travel agency wherein customers can see videos of the cities they plan to visit. The number of hits daily is a normally distributed random variable with a mean of 10,000 and a standard deviation of 2,400.

a. What is the probability of getting more than 12,000 hits?

The z-score corresponding to 12,000 hits is

$$z = \frac{12{,}000 - 10{,}000}{2{,}400} = \frac{5}{6}.$$

The probability for z > 5/6 is 20.233%.

b. What is the probability of getting fewer than 9,000 hits?

The z-score corresponding to 9,000 hits is

$$z = \frac{9{,}000 - 10{,}000}{2{,}400} = -\frac{5}{12}.$$

The probability for z < −5/12 is 33.85%.

13.5 In random samples of 25 from each of two normal populations, we found the following statistics:

$\bar{x}_1 = 524$, $s_1 = 129$
$\bar{x}_2 = 469$, $s_2 = 141$

a. Estimate the difference between the two population means with 95% confidence.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 1.439$$

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = 47$$

For the following calculations, I used the T-score calculator at http://www.usablestats.com/calcs/2samplet&summary=1

Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference: 38.2215
DF: 47
95% Confidence Interval for the Difference: (-21.8902, 131.8902)
T-Value: 1.439
Population 1 ≠ Population 2: P-Value = 0.1568
Population 1 > Population 2: P-Value = 0.9216
Population 1 < Population 2: P-Value = 0.0784

b. Repeat part (a) increasing the standard deviations to s1 = 255 and s2 = 260.
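Parts (a) and (b) apply the same formulas to different standard deviations, so the standard error, t-value and degrees of freedom can be checked without the online calculator. What follows is a minimal sketch in Python, assuming the unequal-variance formulas shown in part (a) and rounding the degrees of freedom down as the calculator output appears to do. The function name `welch_summary` is hypothetical, and the confidence interval is not computed because the standard library has no t quantile function.

```python
import math


def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, standard error, t statistic, df) from summary statistics.

    Uses the unequal-variance formulas for t and df given in part (a).
    """
    v1 = sd1 ** 2 / n1  # s1^2 / n1
    v2 = sd2 ** 2 / n2  # s2^2 / n2
    diff = mean1 - mean2
    se = math.sqrt(v1 + v2)
    t = diff / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, t, df


# Part (a): s1 = 129, s2 = 141; part (b): s1 = 255, s2 = 260 (n = 25 in both samples).
part_a = welch_summary(524, 129, 25, 469, 141, 25)
part_b = welch_summary(524, 255, 25, 469, 260, 25)

for label, (diff, se, t, df) in (("a", part_a), ("b", part_b)):
    # df is rounded down to a whole number (an assumption about the calculator).
    print(f"({label}) diff={diff} se={se:.4f} t={t:.4f} df={math.floor(df)}")
```

Only the standard deviations change between the two calls, which makes it easy to see that the difference stays at 55 while the standard error grows.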
Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference: 72.8354
DF: 47
95% Confidence Interval for the Difference: (-91.523, 201.523)
T-Value: 0.7551
Population 1 ≠ Population 2: P-Value = 0.454
Population 1 > Population 2: P-Value = 0.773
Population 1 < Population 2: P-Value = 0.227

c. Describe what happens when the sample standard deviations get larger.

The interval in which we can expect the true difference to fall widens as the standard deviations get larger. The observed difference is 55, and we expect the true difference to be "near" this value, but we can only say it is within some range with some given level of confidence.

d. Repeat part (a) with samples of size 100.

Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference: 19.1107
DF: 196
95% Confidence Interval for the Difference: (17.3118, 92.6882)
T-Value: 2.878
Population 1 ≠ Population 2: P-Value = 0.0044
Population 1 > Population 2: P-Value = 0.9978
Population 1 < Population 2: P-Value = 0.0022

e. Discuss the effects of increasing the sample size.

Increasing the sample size gives us better estimates of each individual mean, and hence of the difference between them. The range in which we can expect the true difference to fall is thus narrower than before.

13.8 Random sampling from two normal populations produced the following results:

$\bar{x}_1 = 412$, $s_1 = 128$, $n_1 = 150$
$\bar{x}_2 = 405$, $s_2 = 54$, $n_2 = 150$

For these calculations I used the T-score calculator at http://www.usablestats.com/calcs/2samplet&summary=1

a. Can we infer at the 5% significance level that µ1 is greater than µ2?

Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference: 11.3431
DF: 200
95% Confidence Interval for the Difference: (-15.3675, 29.3675)
T-Value: 0.6171
Population 1 ≠ Population 2: P-Value = 0.5378
Population 1 > Population 2: P-Value = 0.7311
Population 1 < Population 2: P-Value = 0.2689

For the alternative µ1 > µ2, the relevant figure is the upper-tail probability of t = 0.6171, which the calculator lists in its "Population 1 < Population 2" row; it is the complement of the 0.7311 shown against "Population 1 > Population 2".
The one-sided p-value for the alternative µ1 > µ2 is 0.2689, which is greater than 0.05, so at the 5% significance level we cannot infer that µ1 > µ2.

b. Repeat part (a) decreasing the standard deviations to s1 = 31 and s2 = 16.

Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference: 2.8484
DF: 223
95% Confidence Interval for the Difference: (1.3867, 12.6133)
T-Value: 2.4575
Population 1 ≠ Population 2: P-Value = 0.0148
Population 1 > Population 2: P-Value = 0.9926
Population 1 < Population 2: P-Value = 0.0074

The one-sided p-value for µ1 > µ2 is now 0.0074, well below 0.05, so we can infer that µ1 > µ2.

c. Describe what happens when the sample standard deviations get smaller.

The confidence interval for the difference shrinks. (See the answer to part (c) of the previous problem.)

d. Repeat part (a) with samples of size 20.

Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference: 31.0644
DF: 25
95% Confidence Interval for the Difference: (-56.9771, 70.9771)
T-Value: 0.2253
Population 1 ≠ Population 2: P-Value = 0.8236
Population 1 > Population 2: P-Value = 0.5882
Population 1 < Population 2: P-Value = 0.4118

The one-sided p-value for µ1 > µ2 is now 0.4118, so the evidence that µ1 > µ2 is weaker still.

e. Discuss the effects of decreasing the sample size.

Decreasing the sample size increases the uncertainty in the estimated difference of the means.

f. Repeat part (a) changing the mean of sample 1 to $\bar{x}_1 = 409$.

Observed difference (Sample 1 - Sample 2): 4
Standard Deviation of Difference: 11.3431
DF: 200
95% Confidence Interval for the Difference: (-18.3675, 26.3675)
T-Value: 0.3526
Population 1 ≠ Population 2: P-Value = 0.7248
Population 1 > Population 2: P-Value = 0.6376
Population 1 < Population 2: P-Value = 0.3624

The one-sided p-value for µ1 > µ2 rises to 0.3624, compared with 0.2689 in part (a).

g. Discuss the effect of decreasing $\bar{x}_1$.
With a smaller observed difference between the two sample means, it is less certain that µ1 > µ2.

15.48 A random sample of 50 observations yielded the following frequencies for the standardized intervals:

Interval        Frequency
Z ≤ −1          6
−1 < Z ≤ 0      27
0 < Z ≤ 1       14
Z > 1           3

Can we infer that the data are not normal (use α = .10)?

In a normal distribution we expect the following probabilities in each interval:

Interval        Probability   Expected count (of 50)   Rounded
Z ≤ −1          0.159         7.9                      8
−1 < Z ≤ 0      0.341         17.1                     17
0 < Z ≤ 1       0.341         17.1                     17
Z > 1           0.159         7.9                      8

We will use a χ² test to see whether our observed data (6, 27, 14 and 3) are statistically different from these values. With k = 4 intervals there are k − 1 = 3 degrees of freedom, and the critical χ² value for 3 degrees of freedom and α = 0.10 is 6.25. Calculating the χ² statistic for the observed data:

$$\chi^2 = \frac{(6-8)^2}{8} + \frac{(27-17)^2}{17} + \frac{(14-17)^2}{17} + \frac{(3-8)^2}{8} = 10.036$$

Since 10.036 > 6.25, there is sufficient evidence to reject the null hypothesis. We infer that the data are not normally distributed.

15.56 Suppose that the personnel department in Exercise 15.55 continued its investigation by categorizing absentees according to the shift on which they worked, as shown in the table below. Is there sufficient evidence at the 10% significance level of a relationship between the days on which employees are absent and the shift on which the employees work?
Shift      Monday   Tuesday   Wednesday   Thursday   Friday
Day        52       28        37          31         33
Evening    35       34        34          37         41

Observed
           M        T         W           Th         F          Total
Day        52       28        37          31         33         181
Evening    35       34        34          37         41         181
Total      87       62        71          68         74         362

Expected = (Row total × Column total) / Total
           M        T         W           Th         F          Total
Day        43.5     31        35.5        34         37         181
Evening    43.5     31        35.5        34         37         181
Total      87       62        71          68         74         362

(Observed − Expected)² / Expected
           M          T          W          Th         F          Total
Day        1.66092    0.290323   0.06338    0.264706   0.432432   2.711761
Evening    1.66092    0.290323   0.06338    0.264706   0.432432   2.711761
Total      3.321839   0.580645   0.126761   0.529412   0.864865   5.423521

χ² = 5.4235

Degrees of freedom: (Rows − 1)(Columns − 1) = 1 × 4 = 4

The critical value of χ² at α = 0.10 with 4 degrees of freedom is 7.78. Since 5.42 < 7.78, there is insufficient evidence to reject the null hypothesis. We cannot infer a relationship between the shift worked and the day of absence.
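The expected counts and the χ² statistic above can be reproduced in a few lines. This is a minimal sketch in Python using only the standard library. The function name `chi_square_independence` is hypothetical, and the critical value 7.78 is taken from the text rather than computed.

```python
# Observed absences by shift (rows) and weekday (columns: M, T, W, Th, F).
observed = {
    "Day": [52, 28, 37, 31, 33],
    "Evening": [35, 34, 34, 37, 41],
}


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count = (row total * column total) / grand total
            expected = row_total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
CRITICAL_VALUE = 7.78  # chi-square critical value for alpha = 0.10, df = 4, as quoted in the text
reject = statistic > CRITICAL_VALUE
print(f"chi2 = {statistic:.4f}, df = {df}, reject H0: {reject}")
```

Because both shifts have the same row total (181), each column's expected count is simply half the column total, which is why the two rows of the expected table are identical.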