Probability 2

7.84 Given a binomial random variable with n = 10 and p = 0.3, use the formula to find the
following probabilities.
$$P(k) = \frac{10!}{k!\,(10-k)!}\,(0.3)^{k}\,(0.7)^{10-k}$$
a. P(X=3) = 0.267
b. P(X=5) = 0.103
c. P(X=8) = 0.00145
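As a quick check of the three values above, the formula can be evaluated directly. This is a minimal sketch using only Python's standard library; the function name `binomial_pmf` is an illustrative choice, not something from the exercise.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) is n! / (k! (n - k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
for k in (3, 5, 8):
    print(f"P(X={k}) = {binomial_pmf(k, n, p):.5f}")
```

The printed values should agree with the answers above after rounding.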
7.97 In the US, voters who are neither Democrat nor Republican are called
Independents. It is believed that 10% of all voters are Independents. A survey asked 25
people to identify themselves as Democrat, Republican, or Independent.
a. What is the probability that none of the people are Independent?
If p is the probability of an outcome (like being Independent) in a single trial (like a
random sampling of a person's affiliation), the probability of obtaining this outcome
exactly k times out of n trials is given by the binomial distribution:
$$P(k) = \frac{n!}{k!\,(n-k)!}\,p^{k}\,(1-p)^{n-k}$$
Thus, the probability of obtaining exactly 0 Independents out of n = 25 randomly
selected voters is
$$P(0) = (1-p)^{25} = (1-0.1)^{25} = (0.9)^{25} \approx 0.0718 = 7.18\%$$
b. What is the probability that fewer than five people are Independent?
The probability that fewer than five are Independent is the sum of the individual
probabilities of finding zero, one, two, three or four Independents:
$$P(k<5) = \sum_{k=0}^{4} P(k) = P(0) + P(1) + P(2) + P(3) + P(4) = 90.20\%$$
c. What is the probability that more than two people are Independent?
The probability that more than two are Independent is
$$\sum_{k=3}^{25} P(k) = P(3) + P(4) + \cdots + P(25).$$
Using the fact that the sum over all k is 1 (100% probability), we can write
$$P(k>2) = \sum_{k=3}^{25} P(k) = 1 - \sum_{k=0}^{2} P(k) = 1 - [P(0) + P(1) + P(2)] = 1 - 0.5371 = 46.29\%$$
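The three parts of 7.97 can be checked numerically by summing the binomial probabilities. This is a sketch with the standard library only; `binomial_pmf` and `binomial_cdf` are illustrative helper names.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the pmf from 0 up to and including k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 25, 0.10                            # 25 voters, 10% Independents
p_none = binomial_pmf(0, n, p)             # part a: P(k = 0)
p_fewer_than_5 = binomial_cdf(4, n, p)     # part b: P(k < 5) = P(k <= 4)
p_more_than_2 = 1 - binomial_cdf(2, n, p)  # part c: complement of P(k <= 2)

print(f"a. {p_none:.4f}  b. {p_fewer_than_5:.4f}  c. {p_more_than_2:.4f}")
```

Part c uses the complement so that only three terms are summed instead of twenty-three.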
8.35 X is normally distributed with mean 1,000 and standard deviation 250. What is the
probability that X lies between 800 and 1,100?
We want the area under the distribution from 800 to 1,100. To simplify this calculation,
we standardize the given normal distribution to a normal distribution with zero mean and
unit standard deviation. The probability is then the area between the two z-scores
corresponding to 800 and 1,100:
$$z_1 = \frac{800 - 1000}{250} = \frac{-200}{250} = -\frac{4}{5} \quad\text{and}\quad z_2 = \frac{1100 - 1000}{250} = \frac{100}{250} = \frac{2}{5}.$$
Calculating the cumulative normal probability, N(z), for each z-score and taking the
difference gives the area between the two:
$$N\!\left(\tfrac{2}{5}\right) - N\!\left(-\tfrac{4}{5}\right) = 0.6554 - 0.2119 \approx 0.444.$$
So there is a 44.4% chance that X lies between 800 and 1,100.
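The area between the two z-scores can also be computed without a table. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# X ~ Normal(mean = 1000, standard deviation = 250), as in exercise 8.35
x = NormalDist(mu=1000, sigma=250)

# z-scores of the two endpoints
z1 = (800 - x.mean) / x.stdev    # -4/5
z2 = (1100 - x.mean) / x.stdev   #  2/5

# P(800 < X < 1100) is the difference of the cumulative probabilities
p_between = x.cdf(1100) - x.cdf(800)

# Same area expressed on the standard normal scale
p_between_z = NormalDist().cdf(z2) - NormalDist().cdf(z1)

print(f"z1 = {z1}, z2 = {z2}, P = {p_between:.4f}")
```

Working on the original scale or on the standardized scale gives the same area, which is the point of the standardization step.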
8.42 Travelbyus is an Internet-based travel agency wherein customers can see videos of
the cities they plan to visit. The number of hits daily is a normally distributed random
variable with a mean of 10,000 and a standard deviation of 2,400.
a. What is the probability of getting more than 12,000 hits?
The z-score corresponding to 12,000 hits is
$$z = \frac{12{,}000 - 10{,}000}{2{,}400} = \frac{5}{6}$$
The probability for z > 5/6 is 20.233%.
b. What is the probability of getting fewer than 9,000 hits?
The z-score corresponding to 9,000 hits is
$$z = \frac{9{,}000 - 10{,}000}{2{,}400} = -\frac{5}{12}$$
The probability for z < −5/12 is 33.85%.
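Both tail probabilities in 8.42 can be checked in the same way. This is a sketch with `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Daily hits ~ Normal(mean = 10,000, standard deviation = 2,400)
hits = NormalDist(mu=10_000, sigma=2_400)

# part a: upper tail beyond 12,000 hits (z = 5/6)
p_more_than_12000 = 1 - hits.cdf(12_000)

# part b: lower tail below 9,000 hits (z = -5/12)
p_fewer_than_9000 = hits.cdf(9_000)

print(f"a. {p_more_than_12000:.5f}  b. {p_fewer_than_9000:.5f}")
```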
13.5 In random samples of 25 from each of two normal populations, we found the
following statistics:
x̄1 = 524, s1 = 129
x̄2 = 469, s2 = 141
a. Estimate the difference between the two population means with 95% confidence.
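$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 1.439$$
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \approx 47$$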
For the following calculations, I used the T-score calculator at
http://www.usablestats.com/calcs/2samplet&summary=1
Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference : 38.2215
DF : 47
95% Confidence Interval for the Difference ( -21.8902 , 131.8902 )
T-Value 1.439
Population 1 ≠ Population 2: P-Value = 0.1568
Population 1 > Population 2: P-Value = 0.9216
Population 1 < Population 2: P-Value = 0.0784
b. Repeat part (a) increasing the standard deviations to s1 = 255 and s2 = 260.
Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference : 72.8354
DF : 47
95% Confidence Interval for the Difference ( -91.523 , 201.523 )
T-Value 0.7551
Population 1 ≠ Population 2: P-Value = 0.454
Population 1 > Population 2: P-Value = 0.773
Population 1 < Population 2: P-Value = 0.227
c. Describe what happens when the sample standard deviations get larger
The interval in which we can expect the true difference to fall widens as the standard
deviations get larger. The observed difference is still 55, and we expect the true difference
to be "near" this value, but we can only say it lies within some range at a given level of
confidence. Here that range grows from (−21.9, 131.9) to (−91.5, 201.5).
d. Repeat part (a) with samples of size 100.
Observed difference (Sample 1 - Sample 2): 55
Standard Deviation of Difference : 19.1107
DF : 196
95% Confidence Interval for the Difference ( 17.3118 , 92.6882 )
T-Value 2.878
Population 1 ≠ Population 2: P-Value = 0.0044
Population 1 > Population 2: P-Value = 0.9978
Population 1 < Population 2: P-Value = 0.0022
e. Discuss the effects of increasing the sample size.
Increasing the sample size gives us better estimates of each individual mean, and hence of
the difference between them. The interval in which we expect the true difference to fall is
thus narrower than before: (17.3, 92.7) instead of (−21.9, 131.9).
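The standard error, t-value and degrees of freedom reported for parts (a), (b) and (d) follow from the unequal-variance formulas in part (a). The sketch below recomputes them with the standard library; `welch_summary` is an illustrative name. It does not compute p-values or confidence limits, because those need the t-distribution, which the standard library does not provide.

```python
from math import sqrt


def welch_summary(mean1, s1, n1, mean2, s2, n2):
    """Return (standard error, t-value, degrees of freedom) for two samples."""
    v1, v2 = s1**2 / n1, s2**2 / n2        # variance of each sample mean
    se = sqrt(v1 + v2)                     # standard deviation of the difference
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, t, df


scenarios = {
    "a": (524, 129, 25, 469, 141, 25),
    "b": (524, 255, 25, 469, 260, 25),     # larger standard deviations
    "d": (524, 129, 100, 469, 141, 100),   # larger samples
}
results = {part: welch_summary(*args) for part, args in scenarios.items()}

for part, (se, t, df) in results.items():
    print(f"{part}: SE = {se:.4f}, t = {t:.4f}, df = {df:.2f}")
```

The degrees of freedom come out fractional; the DF values quoted from the calculator match these results rounded down.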
13.8 Random sampling from two normal populations produced the following results:
x̄1 = 412, s1 = 128, n1 = 150
x̄2 = 405, s2 = 54, n2 = 150
For these calculations I used the T-score calculator at
http://www.usablestats.com/calcs/2samplet&summary=1
a. Can we infer at the 5% significance level that µ1 is greater than µ2?
Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference : 11.3431
DF : 200
95% Confidence Interval for the Difference ( -15.3675 , 29.3675 )
T-Value 0.6171
Population 1 ≠ Population 2: P-Value = 0.5378
Population 1 > Population 2: P-Value = 0.7311
Population 1 < Population 2: P-Value = 0.2689
The calculator lists 0.7311 under "Population 1 > Population 2", so the upper-tail p-value
for the alternative µ1 > µ2 is 1 − 0.7311 = 0.2689 (the value listed under "Population 1 <
Population 2"). Since 0.2689 > 0.05, we cannot infer at the 5% significance level that µ1 is
greater than µ2.
b. Repeat part (a) decreasing the standard deviations to s1 = 31 and s2 = 16.
Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference : 2.8484
DF : 223
95% Confidence Interval for the Difference ( 1.3867 , 12.6133 )
T-Value 2.4575
Population 1 ≠ Population 2: P-Value = 0.0148
Population 1 > Population 2: P-Value = 0.9926
Population 1 < Population 2: P-Value = 0.0074
The upper-tail p-value for the alternative µ1 > µ2 is now 1 − 0.9926 = 0.0074. Since
0.0074 < 0.05, we can infer at the 5% significance level that µ1 is greater than µ2.
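The one-sided comparisons in parts (a) and (b) can be approximated without a t-table. The sketch below is an assumption-laden shortcut, not the calculator's method: it replaces the t-distribution with the standard normal, which is reasonable only because the samples are large (n = 150 each). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_normal_approx(mean1, s1, n1, mean2, s2, n2):
    """Return (t, approximate p-value) for the alternative mu1 > mu2.

    Assumption: the standard normal stands in for the t-distribution,
    so the p-value is only approximate.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (mean1 - mean2) / se
    return t, 1 - NormalDist().cdf(t)


alpha = 0.05
t_a, p_a = upper_tail_p_normal_approx(412, 128, 150, 405, 54, 150)  # part a
t_b, p_b = upper_tail_p_normal_approx(412, 31, 150, 405, 16, 150)   # part b

reject_a = p_a < alpha   # is there evidence that mu1 > mu2 in part a?
reject_b = p_b < alpha   # and in part b?

print(f"a: t = {t_a:.4f}, p ~ {p_a:.4f}, reject H0: {reject_a}")
print(f"b: t = {t_b:.4f}, p ~ {p_b:.4f}, reject H0: {reject_b}")
```

The approximate p-values land close to the calculator's lower one-sided values, and the decisions at the 5% level are the same.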
c. Describe what happens when the sample standard deviations get smaller.
The confidence interval for the difference shrinks. (See answer to part C in previous
problem.)
d. Repeat part (a) with samples of size 20.
Observed difference (Sample 1 - Sample 2): 7
Standard Deviation of Difference : 31.0644
DF : 25
95% Confidence Interval for the Difference ( -56.9771 , 70.9771 )
T-Value 0.2253
Population 1 ≠ Population 2: P-Value = 0.8236
Population 1 > Population 2: P-Value = 0.5882
Population 1 < Population 2: P-Value = 0.4118
The upper-tail p-value for the alternative µ1 > µ2 is now 1 − 0.5882 = 0.4118, so the
evidence that µ1 > µ2 is even weaker than in part (a).
e. Discuss the effects of decreasing the sample size.
Decreasing the sample size increases the uncertainty in the estimated difference of the
means.
f. Repeat part (a) changing the mean of sample 1 to x̄1 = 409.
Observed difference (Sample 1 - Sample 2): 4
Standard Deviation of Difference : 11.3431
DF : 200
95% Confidence Interval for the Difference ( -18.3675 , 26.3675 )
T-Value 0.3526
Population 1 ≠ Population 2: P-Value = 0.7248
Population 1 > Population 2: P-Value = 0.6376
Population 1 < Population 2: P-Value = 0.3624
The upper-tail p-value for the alternative µ1 > µ2 is now 1 − 0.6376 = 0.3624, larger than
the 0.2689 found in part (a).
g. Discuss the effect of decreasing x̄1.
With a smaller observed difference between the two sample means, there is less evidence
that µ1 > µ2.
15.48 A random sample of 50 observations yielded the following frequencies for the
standardized intervals:
Interval      Frequency
Z ≤ -1            6
-1 < Z ≤ 0       27
0 < Z ≤ 1        14
Z > 1             3
Can we infer that the data are not normal? (Use α = .10.)
In a normal distribution we expect the following probabilities (and counts out of 50) in
each bin:
Z ≤ -1        0.159 = 7.9 / 50 ≈ 8 / 50
-1 < Z ≤ 0    0.341 = 17.1 / 50 ≈ 17 / 50
0 < Z ≤ 1     0.341 = 17.1 / 50 ≈ 17 / 50
Z > 1         0.159 = 7.9 / 50 ≈ 8 / 50
We will use a chi-square test to see if our observed data (6, 27, 14 and 3) are statistically
different from these values. With 4 bins there are 4 − 1 = 3 degrees of freedom, and the
critical χ² value for 3 degrees of freedom and α = 0.10 is 6.25. Calculating the chi-square
statistic for the observed data:
$$\chi^2 = \frac{(6-8)^2}{8} + \frac{(27-17)^2}{17} + \frac{(14-17)^2}{17} + \frac{(3-8)^2}{8} = 10.036$$
Since 10.036 > 6.25, there is sufficient evidence to reject the null hypothesis of normality
at α = 0.10. We can infer that the sample data are not normally distributed.
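The goodness-of-fit calculation can be reproduced step by step. This is a sketch with the standard library; it rounds the expected counts to whole numbers as the solution does, and it takes the critical value 6.25 from the text rather than computing it. Variable names are illustrative.

```python
from statistics import NormalDist

observed = [6, 27, 14, 3]          # counts for Z<=-1, -1<Z<=0, 0<Z<=1, Z>1
n = sum(observed)                  # 50 observations

# Bin probabilities under a standard normal distribution
z = NormalDist()
cdf_at_edges = [0.0] + [z.cdf(edge) for edge in (-1.0, 0.0, 1.0)] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf_at_edges, cdf_at_edges[1:])]

# Expected counts, rounded to whole numbers as in the worked solution
expected = [round(n * p) for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 6.25                    # chi-square critical value quoted in the text
reject = chi2 > critical

print(f"expected = {expected}, chi2 = {chi2:.3f}, reject normality: {reject}")
```

Using the unrounded expected counts (about 7.9 and 17.1) would give a slightly different statistic, but one that still exceeds 6.25.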
15.56 Suppose that the personnel department in Exercise 15.55 continued its
investigation by categorizing absentees according to the shift on which they worked, as
shown in the below table. Is there sufficient evidence at the 10% significance level of a
relationship between the days on which employees are absent and the shift on which the
employees work?
Shift     Monday  Tuesday  Wednesday  Thursday  Friday
Day         52      28        37         31       33
Evening     35      34        34         37       41
Observed
          M         T         W         Th        F         Total
Day       52        28        37        31        33        181
Evening   35        34        34        37        41        181
Total     87        62        71        68        74        362

Expected = (Row total * Column total) / Total
          M         T         W         Th        F         Total
Day       43.5      31        35.5      34        37        181
Evening   43.5      31        35.5      34        37        181
Total     87        62        71        68        74        362

(Observed - Expected)² / Expected
          M         T         W         Th        F         Total
Day       1.66092   0.290323  0.06338   0.264706  0.432432  2.711761
Evening   1.66092   0.290323  0.06338   0.264706  0.432432  2.711761
Total     3.321839  0.580645  0.126761  0.529412  0.864865  5.423521
χ² = 5.4235
Degrees of freedom: (Rows − 1)(Columns − 1) = 1 × 4 = 4
The critical value of χ² at α = 0.10 is 7.78. Since 5.42 < 7.78, there is insufficient
evidence to reject the null hypothesis. We cannot infer a relationship between the shift
worked and the days absent.
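The contingency-table calculation for 15.56 can be reproduced in a few lines. This sketch uses only the standard library and takes the critical value 7.78 from the text; variable names are illustrative.

```python
# Observed absences: rows are shifts (Day, Evening), columns are Monday..Friday
observed = [
    [52, 28, 37, 31, 33],
    [35, 34, 34, 37, 41],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

critical = 7.78                    # chi-square critical value quoted in the text
reject = chi2 > critical

print(f"chi2 = {chi2:.4f}, df = {df}, reject independence: {reject}")
```

Because both shifts have the same row total (181), the two rows of expected counts are identical and each column contributes the same amount from both rows.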