Chapter 4. Probability and Probability Distributions
Importance of Knowing Probability
• Because a sample is almost never identical to the
population from which it was selected, it is necessary to
assess the degree of accuracy with which the sample mean,
sample standard deviation, or sample proportion
represents the corresponding population value.
• To decide at what point the result of the observed
sample is too improbable to be attributed to chance alone.
– This means that we need to know how to find the probability of
obtaining a particular sample outcome.
– Probability is the tool that enables us to make an inference.
Definition of Probability (1)
• Classical definition
– Each possible distinct result is called an outcome; an event is
identified as a collection of outcomes.
– The probability of an event E is computed by taking the ratio
of the number of outcomes favorable to event E (Ne) to the
total number N of possible outcomes:
P(event E) = Ne / N
Definition of Probability (2)
• Relative frequency
– If an experiment is conducted n different times and if event E
occurs on ne of these trials, then the probability of event E is
approximately
P(event E) ≈ ne / n
Basic Event Relations and
Probability Laws (1)
• The probability of an event, say event A, will always
satisfy the property:
0 ≤ P( A) ≤ 1
• Mutually exclusive
– Two events A and B are said to be mutually exclusive if they
cannot occur simultaneously.
P( A or B) = P ( A) + P ( B )
Basic Event Relations and
Probability Laws (2)
• Complement
– The complement of an event A is the event that A does not
occur. The complement of A is denoted by Ā.
P(A) + P(Ā) = 1
• Union
– The union of two events A and B is the set of all outcomes that
are included in either A or B (or both).
• Intersection
– The intersection of two events A and B is the set of all
outcomes that are included in both A and B.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Basic Event Relations and
Probability Laws (3)
• Conditional Probability
– When probabilities are calculated with a subset of the total
group as the denominator, the result is called a conditional
probability.
– Consider two events A and B with nonzero probabilities, P(A)
and P(B). The conditional probability of event A given event B
is:
P(A | B) = P(A ∩ B) / P(B)
Basic Event Relations and
Probability Laws (4)
• Independence
– The occurrence of event A is not dependent on the occurrence
of event B; that is, A and B are independent events.
P(A | B) = P(A)
– When events A and B are independent, it follows that:
P(A ∩ B) = P(A)P(B | A) = P(A)P(B)
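These relations are easy to check empirically. Below is a minimal Python sketch (not part of the original slides; the two dice events are illustrative assumptions) that uses the relative-frequency definition to verify both the conditional-probability definition and the independence relation:

```python
import random

# Estimate P(A), P(B), and P(A and B) by simulating two fair dice, then
# check P(A and B) = P(A)P(B) and P(A | B) = P(A and B) / P(B).
random.seed(1)
n = 100_000
count_a = count_b = count_ab = 0
for _ in range(n):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    a = d1 == 6          # event A: first die shows a 6
    b = d2 % 2 == 0      # event B: second die is even
    count_a += a
    count_b += b
    count_ab += a and b

p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n
print(f"P(A) = {p_a:.4f}, P(B) = {p_b:.4f}")
print(f"P(A and B) = {p_ab:.4f} vs P(A)P(B) = {p_a * p_b:.4f}")  # ~equal: independent
print(f"P(A | B) = {p_ab / p_b:.4f} vs P(A) = {p_a:.4f}")        # ~equal: independent
```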
Bayes’ Formula (1)
• Let A1, A2,…, Ak be a collection of k mutually exclusive
and exhaustive events with P(Ai)>0 for i=1,…, k. Then
for any other B for which P(B) >0
P(Aj | B) = P(Aj ∩ B) / P(B) = P(B | Aj)·P(Aj) / Σi=1…k P(B | Ai)·P(Ai),  j = 1, …, k
• Example 1.
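Bayes' formula translates directly into code. Here is a minimal Python sketch (not from the slides); the numbers plugged in are the book-club figures from Example 1 at the end of the chapter:

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(A_j | B) for mutually exclusive, exhaustive A_1..A_k.

    priors      -- list of P(A_i)
    likelihoods -- list of P(B | A_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_b = sum(joint)  # total probability: P(B) = sum_i P(B | A_i) P(A_i)
    return [j / p_b for j in joint]

# Book-club figures from Example 1: P(H)=0.20, P(M)=0.30, P(L)=0.50 and
# P(0 books in 3 months | group) = 0.05, 0.15, 0.60 respectively.
post = bayes_posterior([0.20, 0.30, 0.50], [0.05, 0.15, 0.60])
print([round(p, 3) for p in post])  # P(L | 0) is the last entry, about 0.845
```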
Bayes’ Formula (2)
• Sensitivity
– The sensitivity of a test (or symptom) is the probability of a positive
test result (or presence of the symptom) given the presence of the
disease.
• Specificity
– The specificity of a test (or symptom) is the probability of a negative
test result (or absence of the symptom) given the absence of the
disease.
• False positive
– The false positive of a test (or symptom) is the probability of a positive
test result (or presence of the symptom) given the absence of the
disease.
• False negative
– The false negative of a test (or symptom) is the probability of a
negative test result (or absence of the symptom) given the presence of
the disease.
Bayes’ Formula (3)
• Predictive value positive
– The predictive value positive of a test (or symptom) is the
probability that a subject has the disease given that the subject
has a positive test result (or has the symptom).
• Predictive value negative
– The predictive value negative of a test (or symptom) is the
probability that a subject does not have the disease, given that
the subject has a negative test result (or does not have the
symptom).
• Example 2.
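When test results are cross-tabulated against disease status in a 2×2 table (as in Example 2 below), all six quantities reduce to simple ratios of cell counts. A minimal Python sketch, with made-up counts purely for illustration:

```python
def test_metrics(a, b, c, d):
    """Screening-test metrics from a 2x2 table (hypothetical helper).

    a: test positive, disease present    b: test positive, disease absent
    c: test negative, disease present    d: test negative, disease absent
    """
    return {
        "sensitivity": a / (a + c),                # P(positive | disease)
        "specificity": d / (b + d),                # P(negative | no disease)
        "false positive": b / (b + d),             # P(positive | no disease)
        "false negative": c / (a + c),             # P(negative | disease)
        "predictive value positive": a / (a + b),  # P(disease | positive)
        "predictive value negative": d / (c + d),  # P(no disease | negative)
    }

# Made-up counts, purely for illustration:
print(test_metrics(a=90, b=30, c=10, d=870))
```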
Discrete and Continuous Variables
• Discrete random variable
– When observations on a quantitative random variable can
assume only a countable number of values, the variable is
called a discrete random variable.
• Continuous random variable
– When observations on a quantitative random variable can
assume any one of an uncountable number of values in a line
interval, the variable is called a continuous random variable.
Probability Distribution for
Discrete Random Variables (1)
• For discrete random variables, we can compute the probability of
specific individual values occurring.
• The probability distribution for a discrete random variable displays
the probability P(y) associated with each value of y.
• Properties of discrete random variables:
– The probability associated with every value of y lies between 0 and 1.
– The sum of the probabilities for all values of y is equal to 1.
– The probabilities for a discrete random variable are additive. Hence, the
probability that y = 1, 2, 3, …, or k is equal to P(1) + P(2) + P(3) + … + P(k).
– Example 3.
Probability Distribution for
Discrete Random Variables (2)
• Binomial distribution (or experiment)
– Properties:
• A binomial experiment consists of n identical trials.
• Each trial results in one of two outcomes. We will label one
outcome a success and the other a failure.
• The probability of success on a single trial is equal to π, and
π remains the same from trial to trial.
• The trials are independent; that is, the outcome of one trial
does not influence the outcome of any other trial.
• The random variable y is the number of successes
observed during the n trials.
General Formula for Binomial Probability
• The probability of observing y successes in n trials of a
binomial experiment is:
P(y) = [n! / (y!(n − y)!)] · π^y (1 − π)^(n−y)
where
n = number of trials
π = probability of success on a trial
1 − π = probability of failure on a trial
y = number of successes in n trials
n! = n·(n − 1)·(n − 2) ⋯ 3·2·1
• Example 6
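As a quick sketch (not part of the slides), the formula translates directly into Python, with math.comb supplying the binomial coefficient n!/(y!(n − y)!):

```python
from math import comb

def binom_pmf(y, n, pi):
    """P(y successes in n trials) -- the binomial formula above."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

# Cross-check against Example 6 later in the chapter (n = 5, pi = 0.9):
print(round(binom_pmf(5, 5, 0.9), 3))   # 0.590, all five heads employed
print(round(binom_pmf(4, 5, 0.9) + binom_pmf(5, 5, 0.9), 3))  # 0.919 (slide rounds to 0.918)
```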
Mean and Standard Deviation of the
Binomial Probability Distribution
• Mean (µ)
µ = nπ
• Standard deviation (σ)
σ = √(nπ(1 − π))
where π is the probability of success in a given trial and
n is the number of trials in the binomial experiment.
• Example 7
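A minimal Python sketch of these two formulas, using the seed-germination numbers from Example 7 later in the chapter:

```python
from math import sqrt

def binom_mean_sd(n, pi):
    """Mean and standard deviation of a binomial random variable."""
    return n * pi, sqrt(n * pi * (1 - pi))

# Numbers from Example 7 (n = 20 seeds, germination rate pi = 0.85):
mu, sigma = binom_mean_sd(20, 0.85)
print(mu, round(sigma, 2))           # 17.0 1.6
print(round((12 - mu) / sigma, 2))   # -3.13: 12 germinated seeds is far below the mean
```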
Probability Distributions for
Continuous Random Variables
• Theoretically, a continuous random variable is one that can
assume values associated with infinitely many points in a line
interval. It is impossible to assign a small amount of probability to
each value of y and retain the property that the probabilities sum
to 1.
• To overcome this difficulty, for continuous random variables, the
event of interest is an interval of values; that is, we compute the
probability of y falling in a given interval.
Normal Distribution (1)
• Normal distribution
[figure: normal density curve centered at µ, horizontal axis from −4 to 4]
• Normal probability density function
f(y) = (1 / (√(2π)·σ)) · e^(−(y − µ)² / (2σ²))
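As a sanity check (not part of the slides), the density function can be coded directly from the formula:

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma):
    """Normal probability density function, as defined above."""
    return exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# Sanity checks on the standard normal (mu = 0, sigma = 1):
print(f"{normal_pdf(0, 0, 1):.4f}")   # 0.3989, the peak height 1/sqrt(2*pi)
print(f"{normal_pdf(1, 0, 1):.4f}")   # 0.2420, the height one sigma from the mean
```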
Normal Distribution (2)
• Area under a normal curve
– P(µ − σ ≤ y ≤ µ + σ) ≈ 0.6826
– P(µ − 2σ ≤ y ≤ µ + 2σ) ≈ 0.9544
– P(µ − 3σ ≤ y ≤ µ + 3σ) ≈ 0.9974
[figure: three normal density curves with the areas 0.6826, 0.9544, and 0.9974
shaded within 1, 2, and 3 standard deviations of µ]
Normal Distribution (3)
• Z score
– To determine the probability that a measurement will be less
than some value y, we first calculate the number of standard
deviations that y lies away from the mean by using the formula:
z = (y − µ) / σ
– The value of z computed using this formula is sometimes
referred to as the z score associated with the y-value. Using the
computed value of z, we determine the appropriate probability
by using the z table.
– Example 8
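In code, the standard normal CDF can stand in for the z table. A minimal Python sketch using the standard-library statistics module and the milk-production numbers from Example 8 later in the chapter:

```python
from statistics import NormalDist

# Standing in for the z table: P(Z <= z) from the standard normal CDF.
# Numbers are from Example 8 (mu = 70 pounds, sigma = 13 pounds).
z = (60 - 70) / 13
print(f"z = {z:.4f}")                            # z = -0.7692
print(f"P(Z <= z) = {NormalDist().cdf(z):.4f}")  # 0.2209; the z table gives 0.2206
```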
Normal Distribution (4)
• 100pth percentile
– The 100pth percentile of a distribution is that value, yp, such that
100p% of the population values fall below yp and 100(1-p)% are above
yp.
– To find the 100pth percentile, yp, of a normal distribution with mean µ
and standard deviation σ, we first find zp, the z value having probability p
below it, in the z table, and then apply the reverse of the
standardization formula:
yp = µ + zp·σ
– Example 9
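A minimal Python sketch of the reverse standardization, using NormalDist.inv_cdf in place of the z-table lookup and the SAT numbers from Example 9 later in the chapter:

```python
from statistics import NormalDist

# Reverse standardization y_p = mu + z_p * sigma, with Example 9's
# SAT numbers (mu = 500, sigma = 100) and p = 0.10.
p = 0.10
z_p = NormalDist().inv_cdf(p)   # plays the role of the z table
y_p = 500 + z_p * 100
print(f"z_0.1 = {z_p:.2f}, y_0.1 = {y_p:.0f}")   # z_0.1 = -1.28, y_0.1 = 372
```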
Random Sampling
• Random number table
• Random number generator
Sampling Distributions (1)
• A sample statistic is a random variable; it is subject to random
variation because it is based on a random sample of measurements
selected from the population of interest.
• Like any other random variable, a sample statistic has a
probability distribution. We call the probability distribution of a
sample statistic the sampling distribution of that statistic.
• Example 10
Sampling Distributions (2)
• The sampling distribution of ȳ has mean µ_ȳ and standard
deviation σ_ȳ, which are related to the population mean µ and
standard deviation σ by the following relationships:
µ_ȳ = µ        σ_ȳ = σ/√n
• The simulated sample means (as in Example 10) have means that are
approximately equal to the population mean. Also, they have
standard deviations that are approximately equal to σ/√n. If all
possible values of ȳ had been generated, then the standard
deviation of ȳ would equal σ/√n exactly.
Central Limit Theorems (1)
• Let ȳ denote the sample mean computed from a random sample
of n measurements from a population having a mean µ and finite
standard deviation σ. Let µ_ȳ and σ_ȳ denote the mean and
standard deviation of the sampling distribution of ȳ, respectively.
Based on repeated random samples of size n from the population,
we can conclude the following:
1. µ_ȳ = µ
2. σ_ȳ = σ/√n
3. When n is large, the sampling distribution of ȳ will be approximately normal
(with the approximation becoming more precise as n increases).
4. When the population distribution is normal, the sampling
distribution of ȳ is exactly normal for any sample size n.
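Conclusions 1-3 are easy to see by simulation. A minimal Python sketch (not from the slides; the exponential population is an illustrative assumption):

```python
import random
from statistics import mean, stdev

# Sample means from a skewed exponential population still obey the CLT
# conclusions above: mean ~ mu, standard deviation ~ sigma / sqrt(n).
random.seed(1)
n, reps = 25, 5000    # exponential(1) population has mu = 1 and sigma = 1
means = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
print(round(mean(means), 3))    # close to mu = 1
print(round(stdev(means), 3))   # close to sigma / sqrt(n) = 0.2
```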
Central Limit Theorems (2)
• The Central Limit Theorems provide theoretical justification for
approximating the true sampling distribution of the sample
mean with the normal distribution. Similar theorems exist for the
sample median, sample standard deviation, and the sample
proportion.
• The Central Limit Theorems require no specific shape of the
population distribution in order to hold. However, the sample size
needed does depend on that shape: if the population distribution has
many extreme values or several modes, the sampling distribution of ȳ
requires a considerably larger n in order to achieve a symmetric bell
shape.
Central Limit Theorems (3)
• It is very unlikely that the exact shape of the population
distribution will be known. Thus, the exact shape of the sampling
distribution of ȳ will not be known either. The important point to
remember is that the sampling distribution of ȳ will be
approximately normally distributed with mean µ_ȳ = µ, the
population mean, and standard deviation σ_ȳ = σ/√n. The
approximation will be more precise as n, the sample size for each
sample, increases and as the shape of the population distribution
becomes more like the shape of a normal distribution.
• How large should the sample size be for the Central Limit
Theorem to hold? In general, the Central Limit Theorem holds
for n > 30. However, one should not apply this rule blindly. If the
population is heavily skewed, the sampling distribution of ȳ will
still be skewed even for n > 30. On the other hand, if the population
is symmetric, the Central Limit Theorem holds even for n < 30.
Central Limit Theorems (4)
• Central Limit Theorem for Σy:
– Let Σy denote the sum of a random sample of n measurements
from a population having a mean µ and finite standard
deviation σ. Let µ_Σy and σ_Σy denote the mean and standard
deviation of the sampling distribution of Σy, respectively.
Based on repeated random samples of size n from the
population, we can conclude the following:
1. µ_Σy = nµ
2. σ_Σy = √n·σ
3. When n is large, the sampling distribution of Σy will be approximately
normal (with the approximation becoming more precise as n increases).
4. When the population distribution is normal, the sampling
distribution of Σy is exactly normal for any sample size n.
Normal Approximation to the Binomial (1)
• The binomial random variable y is the number of successes in the
n trials. Define n random variables I1, I2, …, In as:
Ii = 1 if the ith trial results in a success, and Ii = 0 if the ith trial
results in a failure.
• Consider the sum of the random variables I1, I2, …, In, that is, Σi=1…n Ii.
A “1” is placed in the sum for each success that occurs and a “0”
for each failure that occurs. Thus, Σi=1…n Ii is the number of
successes that occurred during the n trials. Hence, we conclude
that y = Σi=1…n Ii.
• Because the binomial random variable y is the sum of independent
random variables, each having the same distribution, we can apply
the Central Limit Theorem for sums to y.
Normal Approximation to the Binomial (2)
• The normal distribution can be used to approximate the binomial
distribution when n is of an appropriate size. The normal
distribution that will be used has a mean and standard deviation
given by the following formula:
µ = nπ
σ = √(nπ(1 − π))
where π = the probability of success
• Example 11
Normal Approximation to the Binomial (3)
• The normal approximation to the binomial distribution can be
unsatisfactory if nπ < 5 or n(1 − π) < 5. If π is small and n is modest,
the actual binomial distribution is seriously skewed to the right. In
such a case, the symmetric normal curve will give an
unsatisfactory approximation. If π is near 1, so that n(1 − π) < 5, the
actual binomial will be skewed to the left, and again the normal
approximation will not be very accurate.
• The normal approximation is quite good when nπ or n(1 − π)
exceeds about 20. In the middle zone, with nπ or n(1 − π) between 5 and
20, a modification called the continuity correction makes a substantial
contribution to the quality of the approximation.
Normal Approximation to the Binomial (4)
• The point of the continuity correction is that we are using the continuous
normal curve to approximate a discrete binomial distribution. The
general idea of the continuity correction is to add or subtract 0.5 from a
binomial value before using normal probabilities. For example, with
n = 20 and π = 0.3:
Instead of
P(y ≤ 5) = P[z ≤ (5 − 20·0.3)/√(20·0.3·0.7)] = P(z ≤ −0.49) = 0.3121
use
P(y ≤ 5.5) = P[z ≤ (5.5 − 20·0.3)/√(20·0.3·0.7)] = P(z ≤ −0.24) = 0.4052
The actual binomial probability is:
P(y ≤ 5) = Σk=0…5 C(20, k)·0.3^k·0.7^(20−k)
= 0.00080 + 0.00684 + 0.02784 + 0.07160 + 0.13042 + 0.17886
= 0.41636
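A minimal Python sketch (not part of the slides) reproducing this comparison for n = 20 and π = 0.3:

```python
from math import comb, sqrt
from statistics import NormalDist

# Exact binomial P(y <= 5) vs. the normal approximation, with and
# without the 0.5 continuity correction, for n = 20 and pi = 0.3.
n, p_succ, y = 20, 0.3, 5
mu, sigma = n * p_succ, sqrt(n * p_succ * (1 - p_succ))

exact = sum(comb(n, k) * p_succ**k * (1 - p_succ)**(n - k) for k in range(y + 1))
plain = NormalDist().cdf((y - mu) / sigma)
corrected = NormalDist().cdf((y + 0.5 - mu) / sigma)
print(f"exact = {exact:.5f}")          # 0.41637 (the slide sums rounded terms)
print(f"plain = {plain:.4f}")          # ~0.313 (slide: 0.3121, z rounded to -0.49)
print(f"corrected = {corrected:.4f}")  # ~0.404 (slide: 0.4052), much closer to exact
```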
Homework
• 4.39, 4.40 (p.153)
• 4.95, 4.96 (p.181)
• 4.117 (p.189)
Example 1
• A book club classifies members as heavy, medium, or light purchasers, and
separate mailings are prepared for each of these groups. Overall, 20% of the
members are heavy purchasers, 30% medium, and 50% light. A member is
not classified into a group until 18 months after joining the club, but a test is
made of the feasibility of using the first 3 months’ purchases to classify
members. The following percentages are obtained from existing records of
individuals classified as heavy, medium, or light purchasers:

First 3 Months’ Purchases | Heavy (%) | Medium (%) | Light (%)
0  | 5  | 15 | 60
1  | 10 | 30 | 20
2  | 30 | 40 | 15
3+ | 55 | 15 | 5

• If a member purchases no books in the first 3 months, what is the probability
that the member is a light purchaser? (Note: This table contains conditional
percentages for each column.)
Answer to Example 1
P(L | 0) = ?
According to Bayes’ formula,
P(L | 0) = P(L ∩ 0) / P(0)
= P(L ∩ 0) / [P(0 | L)·P(L) + P(0 | M)·P(M) + P(0 | H)·P(H)]
= 0.60·0.50 / (0.60·0.50 + 0.15·0.30 + 0.05·0.20)
= 0.30 / 0.355
= 0.845
Example 2
• A screening test for a disease shows the result as the
following table. What are the sensitivity, specificity,
false positive, false negative, predictive value positive,
and predictive value negative?
Test Result   | Disease Present (D) | Disease Absent (D̄) | Total
Positive (T)  | a   | b   | a+b
Negative (T̄) | c   | d   | c+d
Total         | a+c | b+d | n
Answer to Example 2
sensitivity = P(T | D) = a/(a + c)
specificity = P(T̄ | D̄) = d/(b + d)
false positive = P(T | D̄) = b/(b + d)
false negative = P(T̄ | D) = c/(a + c)
predictive value positive = P(D | T) = a/(a + b)
= P(T | D)P(D) / [P(T | D)P(D) + P(T | D̄)P(D̄)]
predictive value negative = P(D̄ | T̄) = d/(c + d)
= P(T̄ | D̄)P(D̄) / [P(T̄ | D̄)P(D̄) + P(T̄ | D)P(D)]
Example 3
• An article in the March 5, 1998, issue of The New England Journal
of Medicine discussed a large outbreak of tuberculosis. One
person, called the index patient, was diagnosed with tuberculosis
in 1995. The 232 co-workers of the index patient were given a
tuberculin screening test. The number of co-workers recording a
positive reading on the test was the random variable of interest.
Did this study satisfy the properties of a binomial experiment?
Answer to Example 3
• Were there n identical trials? Yes
• Did each trial result in one of two outcomes? Yes
• Was the probability of success the same from trial to trial? Yes
• Were the trials independent? Yes
• Was the random variable of interest to the experimenter the
number of successes y in the 232 screening tests? Yes
• All five characteristics were satisfied, so the tuberculin screening
test represented a binomial experiment.
Example 4
• An economist interviews 75 students in a class of 100 to
estimate the proportion of students who expect to
obtain a “C” or better in the course. Is this a binomial
experiment?
Answer:
– Were there n identical trials? Yes
– Did each trial result in one of two outcomes? Yes
– Was the probability of success the same from trial to trial? No
(sampling 75 of the 100 students without replacement changes the
probability from trial to trial, so this is not a binomial experiment)
Example 5
• What is the probability distribution of the
number of heads in 10000 tosses of 4 coins?
Answer:
Let y be the number of heads observed. Then the empirical
sampling results for y are:

y | Frequency | Observed Relative Frequency | Expected Relative Frequency
0 | 638  | 0.0638 | 0.0625
1 | 2505 | 0.2505 | 0.2500
2 | 3796 | 0.3796 | 0.3750
3 | 2434 | 0.2434 | 0.2500
4 | 627  | 0.0627 | 0.0625

Answer to Example 5 (continued)
[figure: probability distribution P(y) for the number of heads when 4 coins
are tossed, y = 0, 1, 2, 3, 4]
Example 6
• Suppose that a sample of households is randomly selected from all
the households in the city in order to estimate the percentage in
which the head of the household is unemployed. To illustrate the
computation of a binomial probability, suppose that the unknown
percentage is actually 10% and that a sample of n = 5 is selected
from the population.
– What is the probability that all five heads of the households are
employed?
– What is the probability of one or fewer being unemployed?
Answer:
Let y be the number of employed heads of households, so π = 0.9.
P(y = 5) = [5!/(5!·0!)]·0.9^5·0.1^0 = 0.9^5 = 0.590
P(y = 4 or 5) = P(4) + P(5) = 5·0.9^4·0.1^1 + 0.9^5 = 0.9185
Example 7
• A company producing turf grass takes a sample of 20 seeds on
a regular basis to monitor the quality of the seeds. According to
the results from previous experiments, the germination rate of the
seeds is 85%. If in a particular sample of 20 seeds only 12
germinated, would the germination rate of 85% seem
consistent with the current results?
Answer:
µ = nπ = 20 × 0.85 = 17
σ = √(nπ(1 − π)) = √(20 × 0.85 × (1 − 0.85)) = 1.60
z = (12 − 17)/1.6 = −3.125
Thus, y = 12 seeds is more than 3 standard deviations less than the
mean number of seeds µ = 17; it is not likely that in 20 seeds we
would obtain only 12 germinated seeds if π really were equal to 0.85.
[figure: the binomial distribution for n = 20 and π = 0.85; horizontal axis:
Number of Germinated Seeds, 0-20; vertical axis: Count]
Example 8
• The mean daily milk production of a herd of Guernsey cows has a
normal distribution with µ = 70 pounds and σ = 13 pounds.
– What is the probability that the milk production for a cow
chosen at random will be less than 60 pounds?
– What is the probability that the milk production for a cow
chosen at random will be greater than 90 pounds?
– What is the probability that the milk production for a cow
chosen at random will be between 60 pounds and 90 pounds?
Answer to Example 8 (1)
To compute the z value corresponding to the value of 60 pounds:
z = (y − µ)/σ = (60 − 70)/13 = −0.7692
From the z table, P(y < 60) = P(z < −0.77) = 0.2206.
[figure: normal density with µ = 70, y from 20 to 120; the area 0.2206
below 60 is shaded]
Answer to Example 8 (2)
To compute the z value corresponding to the value of 90 pounds,
then check the z table to find the corresponding probability of
values greater than 90 pounds:
z = (y − µ)/σ = (90 − 70)/13 = 1.5384
P(y > 90) = P(z > 1.54) = 0.0618
[figure: normal density with µ = 70, y from 20 to 120; the area 0.0618
above 90 is shaded]
Answer to Example 8 (3)
The area between the two values 60 and 90 is determined by finding the
difference between the areas to the left of the two values:
P(y < 90) = 1 − 0.0618 = 0.9382
P(60 < y < 90) = 0.9382 − 0.2206 = 0.7176
[figure: normal density with µ = 70, y from 20 to 120; the area 0.7176
between 60 and 90 is shaded]
Example 9
• The Scholastic Assessment Test (SAT) is an examination used to
measure a person’s readiness for college. The mathematics scores
are assumed to have a normal distribution with mean 500 and
standard deviation 100.
– What proportion of the people taking the SAT will score below
350?
– To identify a group of students needing remedial assistance,
say, the lower 10% of all scores, what is the score on the SAT?
Answer to Example 9
• To find the proportion of scores below 350:
z = (y − µ)/σ = (350 − 500)/100 = −1.5
From the z table, P(z < −1.5) = 0.0668.
[figure: normal density with µ = 500, scores from 200 to 800; the area
0.0668 below 350 is shaded]
• To find the 10th percentile, we first find z0.1 in the z table. Since 0.1003
is the value nearest 0.1000 and its corresponding z is −1.28, we take
z0.1 = −1.28 and then compute:
y0.1 = µ + z0.1·σ = 500 + (−1.28)·100 = 500 − 128 = 372
Random Numbers
[figure: table of random digits]
Example 10 (1)
• The population consists of 500 pennies, from which we compute the
age of each penny: age = 2000 − date on penny. What are the
distributions of ȳ based on samples of sizes n = 5, 10, and 25?
(Given: the population mean µ = 15.070 and the population
standard deviation σ = 10.597.)
[figure: histogram of the ages of the 500 pennies, 0-40 years, vs. frequency]
Example 10 (2)

Sample Size n | Mean of ȳ | Standard Deviation of ȳ | σ/√n
1  | 15.070 | 10.597 | 10.597
5  | 15.042 | 4.728  | 4.739
10 | 15.039 | 3.324  | 3.351
25 | 15.078 | 2.075  | 2.119

[figure: sampling distributions of ȳ (histograms of mean age vs. frequency)
for n = 5, 10, and 25]
Example 11
• Use the normal approximation to the binomial to compute the
probability of observing 460 or fewer in a sample of 1000 favoring
consolidation if we assume that 50% of the entire population favor
the change.
Answer:
µ = nπ = 1000 × 0.5 = 500
σ = √(nπ(1 − π)) = √(1000 × 0.5 × 0.5) = 15.8
z = (y − µ)/σ = (460 − 500)/15.8 = −2.53
P(y ≤ 460) ≈ P(z ≤ −2.53) = 0.0057
[figure: normal density f(y) with µ = 500, y from 430 to 550; the area
0.0057 below 460 is shaded]