Chapter 7 - La Sierra University

In Statistics, it is important to know which distribution to use, and when.
Example 1. To determine the probability for the number of successful surgeries out of 35
assuming the outcomes are independent of each other, and the probability of success on
each surgery is .98, we would use the binomial distribution, with the formula
$P(r) = C_{n,r}\, p^{r} q^{n-r}$
where r is the number of successes, n=35, p=.98 and q=.02. Below is the table for values
of r ranging from 26 to 35. Notice when r is 29 or less, the probabilities are practically 0.
Successes    Probability
26           2.13795E-08
27           3.49199E-07
28           4.88879E-06
29           5.78226E-05
30           0.000566661
31           0.004478452
32           0.027430521
33           0.122190504
34           0.352196158
35           0.493074621
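The table values can be reproduced directly from the binomial formula. The following is a minimal Python sketch using only the standard library; the function name binomial_pmf and the variable names are illustrative choices, not part of the original text.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials.

    Hypothetical helper name; implements P(r) = C(n, r) * p^r * q^(n - r).
    """
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Values from Example 1: 35 surgeries, success probability .98
n, p = 35, 0.98

# Probabilities for r = 26, ..., 35, as in the table
table = {r: binomial_pmf(r, n, p) for r in range(26, 36)}

for r, prob in table.items():
    print(f"{r:2d}  {prob:.6g}")
```

Summing binomial_pmf(r, n, p) over all r from 0 to 35 should give 1, which is a quick sanity check on the formula.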
The corresponding histogram is as follows.
[Histogram: Distribution for n=35 and p=.98. Horizontal axis: Number of Successes (26 to 35); vertical axis: Probability (0 to 0.6).]
Could you use the normal distribution to find P(r ≤ 33)? Absolutely not! Look at the shape
of the distribution: it is nothing like bell shaped. Moreover, the rule of thumb for
using the normal distribution to approximate the binomial distribution is that we must
have np > 5 and nq > 5. In this example, nq= 35(.02) = 0.70 which is not greater than 5.
Notice here that n > 30; however, the rule of thumb for using the normal distribution
when n ≥ 30 is not used for binomial problems or sample proportions; it applies to
sample means.
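The rule of thumb is easy to check mechanically. Here is a small Python sketch; the function name normal_approx_ok and the threshold parameter are assumptions made for illustration, with the threshold of 5 taken from the rule stated above.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both np and nq exceed the threshold.

    Hypothetical helper: encodes the rule of thumb np > 5 and nq > 5
    for approximating a binomial distribution by a normal one.
    """
    q = 1 - p
    return n * p > threshold and n * q > threshold

# Example 1: n = 35, p = .98, so nq = 35(.02) = 0.70
n, p = 35, 0.98
print("np =", round(n * p, 2), " nq =", round(n * (1 - p), 2))
print("normal approximation appropriate?", normal_approx_ok(n, p))
```

Note that the check depends on both np and nq, so a large n by itself is not enough when p is close to 0 or 1.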
In contrast to this, consider the following example where np > 5 and nq> 5.
Example 2. Suppose a coin is weighted so that it comes up heads 65% of the time. Find
the probabilities for the number of heads when it is tossed 35 times. Here is a table
of the probabilities of r successes for r = 11 to 33; the other probabilities are almost 0.
Successes    Probability
11           4.16919E-05
12           0.000154855
13           0.000508811
14           0.001484897
15           0.003860732
16           0.008962414
17           0.018602657
18           0.034547791
19           0.05740648
20           0.085289628
21           0.113139302
22           0.133710084
23           0.140354064
24           0.130328773
25           0.106497226
26           0.076069447
27           0.04709061
28           0.024986854
29           0.011201004
30           0.004160373
31           0.001246195
32           0.000289295
33           4.8842E-05
Notice that the distribution looks approximately normal.
[Histogram: Binomial Distribution with n=35, p=.65. Horizontal axis: Number of Successes; vertical axis: Probability (0 to 0.16).]
For example, to find the probability of 20 to 24 successes (inclusive) we could add
P(20) + P(21) + P(22) + P(23) + P(24) = .60281
where we used the numbers from the table above. However, when n is large, it is often
difficult to compute these probabilities, so it is often desirable to use the normal
approximation when it is applicable. In this case it is, because
np = 35(.65) = 22.75 > 5 and nq = 35(.35) = 12.25 > 5
So we approximate by a normal distribution with
$\mu = np = 22.75$ and
$\sigma = (npq)^{1/2} = (35 \cdot 0.65 \cdot 0.35)^{1/2} \approx 2.82179$
Using the continuity correction, P(20 ≤ r ≤ 24) = P(19.5 < x < 24.5), so we compute
P(19.5 < x < 24.5) = P(-1.15 < z < .62) = .7324 - .1251 = .6073
This gives us a reasonably good approximation of the correct answer .60281 because the
distribution is approximately normal.
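The comparison between the exact binomial sum and the normal approximation can be sketched in Python as follows. This is an illustration under the values of Example 2, using the standard library's statistics.NormalDist; the variable names are assumptions. Because the code works with unrounded z-scores, its normal value can differ slightly in the third decimal place from one obtained with a printed z table.

```python
from math import comb, sqrt
from statistics import NormalDist

# Values from Example 2: 35 tosses, P(heads) = .65
n, p = 35, 0.65
q = 1 - p

# Exact binomial probability of 20 to 24 successes (inclusive)
exact = sum(comb(n, r) * p**r * q**(n - r) for r in range(20, 25))

# Normal approximation with mean np and standard deviation sqrt(npq)
mu = n * p
sigma = sqrt(n * p * q)
approx_dist = NormalDist(mu, sigma)

# Continuity correction: P(20 <= r <= 24) becomes P(19.5 < x < 24.5)
approx = approx_dist.cdf(24.5) - approx_dist.cdf(19.5)

print(f"mu = {mu:.2f}, sigma = {sigma:.5f}")
print(f"exact binomial sum   = {exact:.5f}")
print(f"normal approximation = {approx:.5f}")
```

Dropping the continuity correction (using 20 and 24 as the limits) gives a noticeably worse approximation, which is why the half-unit adjustment is used.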
Example 3. We would not use the normal distribution to determine the probability of
getting an even number on the toss of a fair die. The distribution of the outcome of a fair
die is uniform and is given below.
[Histogram: Distribution for tossing a fair die. Horizontal axis: Outcome of Toss (1 to 6); vertical axis: Probability (0 to 0.25).]
Thus the probability that a single toss is even is
P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2
This uniform distribution has
Mean: $\mu = (1 + 2 + 3 + 4 + 5 + 6)(1/6) = 3.5$
Variance: $\sigma^2 = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)(1/6) - 3.5^2 = 2.91667$
Standard Deviation: $\sigma = (2.91667)^{1/2} = 1.70783$
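These values can be verified with exact fractions. The following is a short Python sketch; the variable names are illustrative, and Fraction is used only to avoid rounding in the intermediate arithmetic.

```python
from fractions import Fraction
from math import sqrt

outcomes = range(1, 7)   # faces of a fair die
prob = Fraction(1, 6)    # each face is equally likely

# Mean: sum of x * P(x)
mean = sum(x * prob for x in outcomes)

# Variance: sum of x^2 * P(x), minus the square of the mean
variance = sum(x**2 * prob for x in outcomes) - mean**2

std_dev = sqrt(variance)

# Probability of an even outcome: P(2) + P(4) + P(6)
p_even = sum(prob for x in outcomes if x % 2 == 0)

print("mean     =", float(mean))
print("variance =", float(variance))
print("std dev  =", round(std_dev, 5))
print("P(even)  =", p_even)
```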
Example 4. The central limit theorem says the sampling distribution of means of 50
tosses is approximately normal (since n = 50 ≥ 30) even though the original distribution
is not normal. For this we would use the following formula to convert to the standard
normal
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
because the sampling distribution is approximately normal with mean and standard
deviation
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
where $\mu$ and $\sigma$ were computed in Example 3.
Therefore, the sampling distribution has mean 3.5 and standard deviation approximately .2415. So
the probability of having 50 tosses with an average of more than 3.7 is approximately
P(z > (3.7 – 3.5)/.2415) = P(z > .83) = 1 - .7967 = .2033
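The same calculation can be sketched in Python with the standard library. The variable names are assumptions for illustration; the z-score is rounded to two decimals only to mirror a lookup in a printed z table.

```python
from math import sqrt
from statistics import NormalDist

# Mean and standard deviation of a single fair-die toss (Example 3)
mu = 3.5
sigma = sqrt(35 / 12)   # same as sqrt(2.91667), about 1.70783
n = 50                  # tosses per sample

# Standard deviation of the sample mean: sigma / sqrt(n)
std_error = sigma / sqrt(n)

# Convert the sample mean 3.7 to a z-score
z = (3.7 - mu) / std_error

standard_normal = NormalDist()
p_table = 1 - standard_normal.cdf(round(z, 2))   # z rounded as in a z table
p_unrounded = 1 - standard_normal.cdf(z)         # z kept at full precision

print(f"standard deviation of the mean = {std_error:.4f}")
print(f"z = {z:.4f}")
print(f"P(mean > 3.7), table-style z = {p_table:.4f}")
print(f"P(mean > 3.7), unrounded z   = {p_unrounded:.4f}")
```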
Based on a simulation of 400 groups of 50 tosses, the relative frequency histogram for the
sample means was as follows.
[Relative Frequency Histogram. Horizontal axis: Average of 50 tosses (2.8 to 4.2); vertical axis: relative frequency (0 to 0.18).]
Notice that it does have an approximately normal shaped distribution (even though the
original distribution did not look at all normal). Of those 400 sample means, 85 were
greater than 3.7; thus in our simulation, 21.25% of the means were
greater than 3.7, which is quite close to the predicted percentage of 20.33% found above.
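A simulation of this kind can be sketched in a few lines of Python. This is not the simulation used to produce the histogram above: the function name simulate_sample_means and the seed are assumptions chosen for illustration, and a different seed will give a somewhat different count than 85 out of 400.

```python
import random

def simulate_sample_means(groups=400, tosses=50, seed=0):
    """Return the average of `tosses` fair-die rolls for each of `groups` samples.

    Hypothetical helper; the seed is arbitrary and only makes the run repeatable.
    """
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(tosses)) / tosses
        for _ in range(groups)
    ]

means = simulate_sample_means()

# Relative frequency of sample means greater than 3.7
share_above = sum(m > 3.7 for m in means) / len(means)

print("number of sample means:", len(means))
print("average of the sample means:", round(sum(means) / len(means), 3))
print("share of means above 3.7:", share_above)
```

Increasing the number of groups should bring the simulated share closer to the probability for the discrete die sums, which the normal calculation only approximates.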