Chapter 2: Probability
2.1: Sample space
Experiment - an activity for which an outcome is uncertain
Example: Flip a coin – whether a head or a tail occurs is unknown until it is
observed
Example: Roll a pair of dice – the numbers rolled are
unknown until they are observed.
Example: Kick a field goal – the success or failure is
unknown until it is observed
Example: Clinical trial examining a new drug
Whether people are cured or not is unknown until
they are observed
An outcome of the experiment measured could be
the number of platelets in their blood.
Example: HIV test – whether or not a person has HIV is
unknown until the test outcome is observed.
Sample space – the set of all possible outcomes of a
statistical experiment; denoted by S
© 2005 Christopher R. Bilder
Example: Flip a coin – S = {H, T}
Example: Roll a pair of dice
Suppose the total on the dice is of interest. Then S
= {2, 3, …, 12}.
Suppose the actual value of each die is of interest.
Then S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}
Suppose the product of the die values is of
interest. Then S = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 30, 36}.
Example: Kick a field goal – S = {success, failure}
Example: Clinical trial examining a new drug – S =
{cured, not cured}
Example: HIV test – S = {positive, negative}
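The dice sample spaces above can also be enumerated by computer. The following is a minimal Python sketch; Python is not used in these notes, and the variable names are illustrative assumptions. It assumes two standard six-sided dice.

```python
from itertools import product

faces = range(1, 7)  # faces of one standard six-sided die (assumed)

# Sample space when the actual value of each die is of interest
pairs = list(product(faces, repeat=2))

# Sample space when only the total on the dice is of interest
totals = sorted({d1 + d2 for d1, d2 in pairs})

# Sample space when only the product of the die values is of interest
products = sorted({d1 * d2 for d1, d2 in pairs})

print(len(pairs))   # number of (die 1, die 2) outcomes
print(totals)       # possible totals
print(products)     # possible products
```

Note that the same experiment leads to different sample spaces depending on what is measured. Values such as 7 or 11 never appear among the products because no two die faces multiply to them.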
2.2: Events
Event – a subset of the sample space
Example: Flip a coin
Let A denote the event of observing a head. A is a
subset of S
Example: Roll a pair of dice
Suppose the total on the dice is of interest so that S
= {2, 3, …, 12}. Let A denote the event of observing
a total of 2. Also, we could let A denote the event of
observing 6 or less. In both cases, A is a subset of
S.
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}. Let
A denote the event of observing a total of 4. Then
the outcomes within A are (1,3), (2,2), (3,1).
Question: Why do we want to define experiments, sample
spaces, and events?
Complement of an event A – subset of all elements in S that
are not in A; denoted by A′ (or Ā or A^c)
Example: Flip a coin
Let A denote the event of observing a head. A′ is
the event of observing a tail.
Example: Roll a pair of dice
Suppose the total on the dice is of interest so that S
= {2, 3, …, 12}. Let A denote the event of observing
6 or less. A′ is the event of observing 7 or more.
Intersection of two events A and B – event containing all
elements that are common to A and B; denoted by A∩B (or
“A and B”)
Example: Roll a pair of dice
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Let A denote the event of observing a total of 4.
Let B denote the event of observing a 2 on at least
one of the rolls. Then B has the outcomes of (2,1),
(2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2),
and (6,2).
A∩B contains only (2,2).
Venn Diagrams are useful to see the last result above.
Events are represented by regions. Below is an
example corresponding to A∩B = {(2,2)}:
[Venn diagram: regions A and B overlapping inside S]
In A only: (1,3) (3,1)
In both A and B: (2,2)
In B only: (2,1) (2,3) (2,4) (2,5) (2,6) (1,2) (3,2) (4,2) (5,2) (6,2)
In S but in neither A nor B: (1,1) (1,4) (1,5) (1,6) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,3) (4,4) (4,5) (4,6) (5,1) (5,3) (5,4) (5,5) (5,6) (6,1) (6,3)
(6,4) (6,5) (6,6)
Mutually exclusive events – if A∩B = ∅ then A and B are
mutually exclusive or disjoint
Notice A∩A′ = ∅
Union of events – event containing all elements of A only or
B only or both A and B; denoted by A∪B (or “A or B”)
Example: Roll a pair of dice
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Let A denote the event of observing a total of 4, and
let B denote the event of observing a 2 on at least
one of the rolls.
A∪B = {(1,3), (3,1), (2,1), (2,2), (2,3), (2,4), (2,5),
(2,6), (1,2), (3,2), (4,2), (5,2), (6,2)}.
Some final results:
• A∩A′ = ∅
• A∪A′ = S
• A∩∅ = ∅
• A∪∅ = A
• (A′)′ = A
• De Morgan’s Laws: (A∩B)′ = A′∪B′ and (A∪B)′ = A′∩B′
To see the last result above, Venn Diagrams are useful:
[Venn diagram: A and B overlapping inside S, with the overlap shaded orange]
The orange region is the intersection of A and B. This
graphically represents A∩B. Everything outside of the
orange region is (A∩B)′. Now consider A′ and B′:
[Venn diagrams: one with A′ shaded, one with B′ shaded]
When you combine everything in A′ and B′, (A′∪B′), one
can see it includes everything excluding the intersection
A∩B (the orange area).
Example: Roll a pair of dice
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Let A denote the event of observing a total of 4, and
let B denote the event of observing a 2 on at least
one of the rolls.
(A∩B)′ = A′∪B′ = All elements in S except for (2,2)
(A∪B)′ = A′∩B′ = All outcomes that are only in the blue
area of the Venn Diagram (outside both A and B) = (1,1) (1,4) (1,5) (1,6)
(3,3) (3,4) (3,5) (3,6) (4,1) (4,3) (4,4) (4,5) (4,6) (5,1)
(5,3) (5,4) (5,5) (5,6) (6,1) (6,3) (6,4) (6,5) (6,6)
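The set results for this dice example can be checked with Python's built-in set type. This is a sketch only; the names S, A, B and complement are illustrative choices, and `&`, `|` and `-` are Python's set intersection, union and difference operators.

```python
from itertools import product

# Sample space: the actual value of each die
S = set(product(range(1, 7), repeat=2))

A = {o for o in S if sum(o) == 4}  # total of 4
B = {o for o in S if 2 in o}       # a 2 on at least one roll


def complement(event):
    """All outcomes in S that are not in the event."""
    return S - event


A_and_B = A & B  # intersection
A_or_B = A | B   # union

# De Morgan's laws, checked on this example
law1 = complement(A & B) == complement(A) | complement(B)
law2 = complement(A | B) == complement(A) & complement(B)

print(sorted(A_and_B))
print(len(A_or_B), len(complement(A_or_B)))
print(law1, law2)
```

A check on one example does not prove De Morgan's laws in general; it only confirms that the listed outcomes agree with them.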
Question: Why do we want to examine intersections and
unions?
2.3: Counting sample points
Often we want to count the number of possible
outcomes of an experiment or the number of items (or
points) in the sample space, S. This can be done a
number of different ways depending on the problem.
The counting is important so that we can eventually
assign probabilities to events.
Example: Roll a pair of dice
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Listed a different way, all possible outcomes in S
are:
Die #1  Die #2     Die #1  Die #2     Die #1  Die #2
1       1          3       1          5       1
1       2          3       2          5       2
1       3          3       3          5       3
1       4          3       4          5       4
1       5          3       5          5       5
1       6          3       6          5       6
2       1          4       1          6       1
2       2          4       2          6       2
2       3          4       3          6       3
2       4          4       4          6       4
2       5          4       5          6       5
2       6          4       6          6       6
There are a total of 36 different combinations. What
is a simpler way to determine this than listing out all
possible outcomes?
Generalized multiplication rule – If an “operation” can be
performed in n1 ways, a second operation in n2 ways, …,
and a kth operation in nk ways, then the sequence of k
operations can be performed in n1×n2×…×nk ways.
This assumes that each operation does not have an effect
on the outcome of the other operations.
Example: Roll a pair of dice
Suppose the actual value of each die is of interest
so that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
n1=6 and n2=6 so that the total number of outcomes
in S is n1n2 = 6×6 = 36.
Suppose the total on the dice is of interest so that S
= {2, 3, …, 12}. Notice that the generalized
multiplication rule cannot be used directly here.
The last statement in the generalized multiplication
rule is important. For example, suppose the actual
value of each die is of interest again. Suppose each
die is rolled separately and the type of die for the
second roll is dependent on what happens on the
first. The second die could be a die with a number
of sides equal to the outcome of first die. For
example, if a 3 is rolled on the first die, a 3 sided die
is rolled for the second die. This is an example
where the multiplication rule could not be used
directly.
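The difference between the two situations can be seen by counting outcomes directly. The sketch below is in Python with illustrative variable names. The dependent case follows the hypothetical described above, where the second die has as many sides as the value shown on the first.

```python
from itertools import product

n1, n2 = 6, 6

# Independent case: the second roll does not depend on the first,
# so the multiplication rule gives n1 * n2 outcomes.
independent = list(product(range(1, n1 + 1), range(1, n2 + 1)))

# Dependent case: if the first die shows d1, the second die has d1 sides.
dependent = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, d1 + 1)]

print(len(independent), n1 * n2)  # the two counts agree
print(len(dependent))             # 1 + 2 + ... + 6, not 6 * 6
```

In the dependent case the number of ways to perform the second operation changes with the first outcome, so there is no single n2 to multiply by.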
The rest of Section 2.3 discusses permutations and
combinations. You are not responsible for the
permutations material. We will discuss combinations in
Section 5.3.
2.4: Probability of an event
Notation: P(A) is read as “the probability that event A
happens”.
Example: Roll a pair of dice
Suppose the actual value of each die is of interest so
that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}. Listed a
different way, all possible outcomes in S are:
Die #1  Die #2     Die #1  Die #2     Die #1  Die #2
1       1          3       1          5       1
1       2          3       2          5       2
1       3          3       3          5       3
1       4          3       4          5       4
1       5          3       5          5       5
1       6          3       6          5       6
2       1          4       1          6       1
2       2          4       2          6       2
2       3          4       3          6       3
2       4          4       4          6       4
2       5          4       5          6       5
2       6          4       6          6       6
Suppose each outcome is EQUALLY likely.
Let A be the event the sum of the two dice is 2. Then
P(A) = 1/36 since there is only one way, (1,1), the sum
can be 2 and there are 36 different possible outcomes of
rolling two dice.
Less formally, this can be written as P(2) = 1/36.
Example: What is the probability that you will win in the Pick
5 game of the Nebraska lottery if you choose only one
combination of numbers? Note that 5 numbers are chosen
from 1 to 38 and a number can only be chosen once.
Combination   #1  #2  #3  #4  #5
1             1   2   3   4   5
2             1   2   3   4   6
⋮
501,942       34  35  36  37  38
(Section 2.3 discusses how to use a “combination” to
find that there are 501,942 different possibilities.)
Each outcome is EQUALLY likely.
P(win) = 1/501,942 = 1.99×10^-6
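The count of 501,942 can be reproduced with the standard-library function `math.comb` (Python 3.8 or later). This is a small sketch with illustrative variable names. It assumes, as stated above, that 5 distinct numbers are chosen from 1 to 38 and that order does not matter.

```python
import math

# Number of ways to choose 5 distinct numbers from 38, ignoring order
n_combinations = math.comb(38, 5)

# Every combination is equally likely, so one ticket wins with probability 1/N
p_win = 1 / n_combinations

print(n_combinations)
print(f"{p_win:.2e}")
```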
Probability Rules
• 0 ≤ P(A) ≤ 1 for any event A
Example: The probability it rains today cannot be 110%
or -10%
• Let A1,…,Ak be all possible events for an experiment and
they are MUTUALLY EXCLUSIVE. Then
P(A1 ∪ A2 ∪ ... ∪ Ak) = P(A1) + P(A2) + ... + P(Ak) = Σ P(Ai) = 1,
where the sum is over i = 1, …, k.
Example: NFL regular season football
P(win) + P(lose) + P(tie) = 1
Theorem 2.9: If an experiment can only result in one of N
different equally likely outcomes AND if exactly n of these
correspond to an event A, then
P(A) = n/N
Example: Roll a pair of dice
Suppose the actual value of each die is of interest so
that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Let A denote the event of observing a total of 4, and let
B denote the event of observing a 2 on at least one of
the rolls.
P(A) = P(total is 4) = 3/36 since A has the outcomes
of (1,3), (3,1), and (2,2)
P(B) = P(at least one die is a 2) = 11/36 since B
has the outcomes of (2,1), (2,2), (2,3), (2,4), (2,5),
(2,6), (1,2), (3,2), (4,2), (5,2), and (6,2).
2.5: Additive rules
Below are some important rules regarding probabilities.
Theorem 2.10: If A and B are any two events, then P(A∪B) =
P(A) + P(B) – P(A∩B).
Why?
[Venn diagram: A and B overlapping inside S, with the overlap A∩B shaded orange]
Notice the orange area, A∩B, is added in twice with A
and B. Thus, it needs to be subtracted out once.
This could also be reexpressed as P(A∩B) = P(A) + P(B)
– P(A∪B).
Corollary: If A and B are mutually exclusive, then
P(A∪B) = P(A) + P(B). If A1, A2,…, An are mutually
exclusive then P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + …
+ P(An).
What would mutually exclusive events look like in a
Venn Diagram?
Theorem 2.11: If A, B, and C are any three events, then
P(A∪B∪C) = P(A) + P(B) + P(C) – P(A∩B) – P(A∩C) –
P(B∩C) + P(A∩B∩C)
Show this on your own through a Venn Diagram!
Theorem 2.12: If A and A′ are complementary events, then
P(A) + P(A′) = 1 and P(A′) = 1 - P(A)
Example: Roll a pair of dice
Suppose the actual value of each die is of interest so
that S = {(1,1), (1,2), …, (1,6), (2,1),…,(6,6)}.
Let A denote the event of observing a total of 4, and let
B denote the event of observing a 2 on at least one of
the rolls.
P(A∪B) = P(A) + P(B) – P(A∩B) = 3/36 + 11/36 – 1/36 = 13/36
Note that A∩B has the outcome of (2,2).
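The additive rule can be checked against a direct count for this dice example. The sketch below is in Python; the helper name prob is an illustrative assumption. It uses exact fractions and relies on the 36 outcomes being equally likely, as in Theorem 2.9.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def prob(event):
    """P(event) = n/N for equally likely outcomes (Theorem 2.9)."""
    return Fraction(len(event), len(S))


A = {o for o in S if sum(o) == 4}  # total of 4
B = {o for o in S if 2 in o}       # a 2 on at least one roll

p_union_direct = prob(A | B)                    # count outcomes in the union
p_union_rule = prob(A) + prob(B) - prob(A & B)  # additive rule

print(p_union_direct, p_union_rule)
```

Both approaches give 13/36, since the outcome (2,2) would otherwise be counted twice.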
Example: Larry Bird (bird.xls)
Free throws (FTs) are typically shot in
pairs. Below is a “contingency table”
summarizing Larry Bird’s first and
second FT attempts during the 1980-81
and 1981-82 NBA seasons. The data
source is Wardrop (American
Statistician, 1995)
                      Second
               Made    Missed   Total
First  Made    251     34       285
       Missed  48      5        53
       Total   299     39       338
Interpreting the table:
• 251 first AND second FTs were both made
• 34 first FTs were made AND the second FTs were
missed
• 48 first FTs were missed AND the second FTs were
made
• 5 first AND second FTs were both missed
• 285 first FTs were made regardless of what happened on
the second attempt
• 299 second FTs were made regardless of what
happened on the first attempt
• 338 FT pairs were shot during these seasons
More formally,
• Let A = 1st FT is made. Then A′ is 1st FT is missed.
• Let B = 2nd FT is made. Then B′ is 2nd FT is missed.
        A      A′
B       251    48
B′      34     5
The “counts” table can be transformed into a table of
probabilities by dividing each numerical cell by 338.
                      Second
               Made    Missed   Total
First  Made    0.7426  0.1006   0.8432
       Missed  0.1420  0.0148   0.1568
       Total   0.8846  0.1154   1
• What is P(A) = P(1st made)?
• What is P(B) = P(2nd made)?
Probabilities on the margins of the table (total
column and row) are often called “marginal
probabilities”.
• What does 0.7426 represent in our symbolic notation?
• What is the most likely joint outcome of the first and
second FT to occur?
Probabilities in the body of the table are often called
“joint probabilities”.
• P(1st made)
= P(1st made ∩ 2nd made) + P(1st made ∩ 2nd missed)
= 0.7426 + 0.1006 = 0.8432
This can be expressed as P(A∩B) + P(A∩B′) = P(A)
• What is P(1st made ∪ 2nd made) = probability of making at
least one? There are a few different ways to find this.
1. Add the three joint probabilities in which at least one FT
is made (the cells highlighted in yellow in the original table):
0.7426 + 0.1006 + 0.1420 = 0.9852
2. P(A∪B) = P(A) + P(B) - P(A∩B) = 0.8432 + 0.8846 - 0.7426 = 0.9852
3. P(A∪B)
= 1 – P[(A∪B)′] using the complement
= 1 – P(A′∩B′) using De Morgan’s laws
= 1 – 0.0148 = 0.9852
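The notes carry out these calculations in Excel. As an alternative, here is a minimal Python sketch of the same steps. The dictionary layout and variable names are assumptions made for illustration; the counts are those in the contingency table above.

```python
# Counts keyed by (first FT, second FT)
counts = {
    ("made", "made"): 251,
    ("made", "missed"): 34,
    ("missed", "made"): 48,
    ("missed", "missed"): 5,
}
n = sum(counts.values())  # 338 FT pairs

# Joint probabilities: divide every cell by the total
joint = {cell: c / n for cell, c in counts.items()}

# Marginal probabilities: sum the joint probabilities across a row or column
p_A = joint[("made", "made")] + joint[("made", "missed")]  # P(1st made)
p_B = joint[("made", "made")] + joint[("missed", "made")]  # P(2nd made)

# P(at least one FT made), found three ways
way1 = joint[("made", "made")] + joint[("made", "missed")] + joint[("missed", "made")]
way2 = p_A + p_B - joint[("made", "made")]  # additive rule
way3 = 1 - joint[("missed", "missed")]      # complement and De Morgan's laws

print(round(p_A, 4), round(p_B, 4))
print(round(way1, 4), round(way2, 4), round(way3, 4))
```

Working from the counts rather than the rounded table entries avoids small rounding differences between the three methods.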
The use of Excel with a contingency table:
The use of absolute cell references was helpful when
copying formulas.
2.6 and 2.7: Conditional probability and multiplicative
rules
Conditional probability – The probability an event happens
conditioned on another event happening.
Consider two events A and B. The probability that A
occurs given that B occurred is called a conditional
probability. It is denoted by P(A|B). This is read as “the
probability of A GIVEN B”.
This probability can be found from
P(A|B) = P(A∩B)/P(B),
provided P(B) > 0.
Note that another conditional probability could also
be stated as P(B|A) = P(A∩B)/P(A).
Where does the formula P(A|B) = P(A∩B)/P(B) come from?
• Suppose the event B occurs and it had a particular
probability (P(B)) of occurring. This now limits the
possibility of what other events occur.
• To determine the probability that A occurs, we must
examine P(A∩B) since B occurs.
• To find the probability that A occurs given that B
occurred, we use P(A∩B)/P(B). This gives us the
probability of A occurring out of all possibilities where
B occurred.
Example: Larry Bird (bird.xls)
                      Second
               Made    Missed   Total
First  Made    0.7426  0.1006   0.8432
       Missed  0.1420  0.0148   0.1568
       Total   0.8846  0.1154   1
P(2nd made | 1st missed) = P(1st missed ∩ 2nd made)/P(1st missed)
= 0.1420/0.1568 = 0.9057
Written in terms of A and B, where A′ = 1st FT is missed and B = 2nd FT is made:
P(B|A′) = P(A′∩B)/P(A′) = 0.1420/0.1568 = 0.9057
Therefore it is still very likely that Larry Bird will make the
second free throw even if the first one is missed.
Question for basketball fans: Why would this probability
be important to know?
Verify on your own that P(2nd made | 1st made) = 0.8807.
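Both conditional probabilities can be computed from the original counts. The following Python sketch uses illustrative names (the function conditional is not from the notes). It uses the fact that dividing two counts by the same total of 338 leaves their ratio unchanged, so P(A∩B)/P(B) can be found as a ratio of counts.

```python
def conditional(joint_count, given_count):
    """P(event | given) as a ratio of counts from the same table."""
    return joint_count / given_count


# Counts from the Larry Bird contingency table
first_missed_second_made = 48
first_missed_total = 53
first_made_second_made = 251
first_made_total = 285

p_2nd_given_1st_missed = conditional(first_missed_second_made, first_missed_total)
p_2nd_given_1st_made = conditional(first_made_second_made, first_made_total)

print(round(p_2nd_given_1st_missed, 4))
print(round(p_2nd_given_1st_made, 4))
```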
Example: The Showcase
Showdown on the Price is Right
On the game show, The Price
is Right, three contestants are
given an opportunity to spin
the big wheel. The big wheel
has monetary values of 5, 10,
…, 100 cents on it. The
contestant that is closest to a
dollar (100 cents) in one or a
combination of two consecutive spins, without going
over, wins the game. If there is a tie, the tied players are
given one additional spin with the player having the
highest number in that spin winning.
Coe and Butterworth (American Statistician, 1995)
compute conditional win probabilities for the first person
spinning the big wheel. The probabilities are shown in
the table below.
First Spin (i)   P(win | spin once & 1st spin=i)   P(win | spin twice & 1st spin=i)
5                .00034                            .20595
10               .00121                            .20589
15               .00285                            .20574
20               .00540                            .20547
25               .00906                            .20502
30               .01415                            .20431
35               .02101                            .20326
40               .03009                            .20176
45               .04190                            .19966
50               .05704                            .19681
55               .08346                            .19264
60               .11829                            .18672
65               .16319                            .17856
70               .21563                            .16778
75               .28416                            .15357
80               .36818                            .13517
85               .46990                            .11167
90               .59169                            .08209
95               .73606                            .04528
100              .90567                            .00000
For example, P(win | spin once & 1st spin=5 cents) =
0.00034
What is the optimal strategy the first person should
follow in deciding whether or not to spin twice?
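One way to explore this question is to compare the two conditional win probabilities for each first spin. The Python sketch below does this with the table values. It assumes the only goal is to maximize the probability of winning, and the variable names are illustrative.

```python
first_spins = list(range(5, 105, 5))  # 5, 10, ..., 100 cents

# P(win | spin once & 1st spin = i), in the same order as first_spins
p_once = [.00034, .00121, .00285, .00540, .00906, .01415, .02101, .03009,
          .04190, .05704, .08346, .11829, .16319, .21563, .28416, .36818,
          .46990, .59169, .73606, .90567]

# P(win | spin twice & 1st spin = i)
p_twice = [.20595, .20589, .20574, .20547, .20502, .20431, .20326, .20176,
           .19966, .19681, .19264, .18672, .17856, .16778, .15357, .13517,
           .11167, .08209, .04528, .00000]

# First spins for which a second spin gives the higher win probability
spin_again = [i for i, once, twice in zip(first_spins, p_once, p_twice)
              if twice > once]
# First spins for which stopping is at least as good
stay = [i for i in first_spins if i not in spin_again]

print("Spin again after:", spin_again)
print("Stop after:", stay)
```

The comparison changes direction only once in the table, so under this assumption the strategy reduces to a single cutoff on the first spin.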
Independence – Events A and B are independent if P(A|B) =
P(A) or equivalently P(B|A) = P(B)
In words, this means the probability of event A is not
affected by event B and vice versa.
As a result of the conditional probability equation,
P(A∩B) = P(A)P(B) also means independence. Why?
Example: Larry Bird (bird.xls)
What does independence mean in this example?
P(2nd made | 1st missed) = 0.9057
P(2nd made) = 0.8846.
Dependence exists - but notice how close they are.
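The same comparison can be made with the product form of independence, P(A∩B) = P(A)P(B). This is a short Python sketch using the counts from the Larry Bird table; the variable names are illustrative.

```python
n = 338
p_A = 285 / n   # P(1st made)
p_B = 299 / n   # P(2nd made)
p_AB = 251 / n  # P(1st made and 2nd made)

# Under independence these two numbers would be exactly equal
print(round(p_AB, 4), round(p_A * p_B, 4))

# Equivalent check with a conditional probability
p_B_given_not_A = 48 / 53  # P(2nd made | 1st missed)
print(round(p_B_given_not_A, 4), round(p_B, 4))
```

The values are close but not equal, which matches the statement above. Whether the difference is more than sampling variability is a separate question for a hypothesis test.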
Notes:
• Only one conditional probability needs to be checked.
• Typically, one would consider the 338 free throw pairs here
a sample from the population of all Larry Bird’s free
throw attempts. This would be especially desirable to
do if Larry Bird were still playing basketball
professionally. Questions about whether this is a
representative sample would need to be addressed.
Assuming it was a representative sample, one may be
interested in drawing an inference from the sample to
the population of all free throws. A chi-square hypothesis
test for independence could be conducted using the
data. The result is there is not sufficient evidence to
conclude dependence. In my Chapter 2 lecture notes of
my STAT 875 Categorical Data Analysis course, I do
perform the test for the data if you would like to see
the results.
Independence is a VERY important concept to
understand and we will be using this frequently in the
future. Here is another example of where independence
can be used.
Example: Quality control
Experience has shown that a manufacturing operation
produces, on the average, only one defective unit in 10.
These are removed from the production line, repaired,
and returned to the warehouse. Suppose that during a
given period of time you observe five defective units
emerging in sequence from the production line.
1) If prior history has shown that defective units emerge
randomly from the production line, what is the
probability of observing a sequence of five
consecutive defective units?
Since units emerge “randomly”, this implies
independence.
Let A1=1st unit defective,…, A5=5th unit defective.
P(All 5 are defective)
= P(A1∩A2∩A3∩A4∩A5)
= P(A1)P(A2)P(A3)P(A4)P(A5) because of
independence
= 0.1×0.1×0.1×0.1×0.1
= 0.00001
Therefore, this would happen VERY rarely!
2) If five consecutive defective units did emerge from the
production line, what would you conclude about the
process?
There is something wrong with the manufacturing
process.
Multiplicative rule - P(A|B) = P(A∩B)/P(B) implies that P(A∩B) =
P(A|B)P(B)
Theorem 2.15: Consider the events of A1, A2,…, Ak. Then
o P(A1∩A2) = P(A1)P(A2|A1)
o P(A1∩A2∩A3) = P(A1)P(A2∩A3|A1)
= P(A1)P(A2|A1)P(A3|A2∩A1)
Why is P(A2∩A3|A1) = P(A2|A1)P(A3|A2∩A1)?
Remember that P(A2∩A3) = P(A2)P(A3|A2)
o In general, P(A1∩A2∩A3∩…∩Ak)
= P(A1) × P(A2|A1) × P(A3|A1∩A2) × … ×
P(Ak|A1∩A2∩...∩Ak-1)
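One possible way to see the three-event step is to write each conditional probability as a ratio and then multiply and divide by the same term. The derivation below is a sketch that assumes P(A1∩A2) > 0, so that every ratio is defined.

```latex
\begin{align*}
P(A_2 \cap A_3 \mid A_1)
  &= \frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1)} \\
  &= \frac{P(A_1 \cap A_2)}{P(A_1)} \cdot
     \frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)} \\
  &= P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2)
\end{align*}
```

This is the rule P(A2∩A3) = P(A2)P(A3|A2) with every probability additionally conditioned on A1.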
Sensitivity and specificity
Diagnostic tests are used to determine if a person has a
disease or not. These tests are not always correct. The
makers of the tests try to make them very “accurate” in
detecting a disease. However, this form of accuracy
comes at a cost in terms of incorrectly saying that some
people have the disease when they do not really have it.
Example: HIV testing
Suppose a clinical trial is being conducted on a new HIV
test. The test measures a number of different variables
related to the presence of HIV. Using the observed
results for a patient, the test decides if a person is HIV
positive or not. Below are the possible outcomes:
                    HIV test results
HIV actual   Negative                  Positive
No           Correct = True negative   Error = False positive
Yes          Error = False negative    Correct = True positive
The test is correct when a person with HIV actually tests
positive. Similarly, the test is correct when a person
without HIV actually tests negative. There is the
possibility the test could be incorrect. This happens
when someone has HIV and the test says the person is
negative. Also, this happens when someone does not
have HIV and the test says the person is positive.
Obviously, it is important to control the probabilities of
making these errors.
Statisticians, epidemiologists, physicians,… are
specifically interested in two particular probabilities
associated with the contingency table above:
• Sensitivity = P(Test is Positive | Actual is Yes)
This is the probability a person tests positive, given the
person actually has HIV.
• Specificity = P(Test is Negative | Actual is No)
This is the probability a person tests negative, given
the person does not actually have HIV.
According to the FDA, the ELISA test has a sensitivity of
0.993 and the specificity is 0.9999. What is the
probability of making each error?
• P(Test is Negative | Actual is Yes) =
• P(Test is Positive | Actual is No) =
Hint: P(A′|B) = 1-P(A|B).
Johnson and Gastwirth (1991) estimate the proportion
of “incidence of HIV positive in the general population of
people without known risk factors” to be 0.000025.
Obviously, this may be different now since this value is
>10 years old. I could not find an updated value.
This means P(Actual is Yes) = 0.000025 or 250
people out of 10,000,000 people have HIV.
Out of these 250 people, how many would the ELISA test
give a “Test is positive” result?
How many of the 10,000,000-250 = 9,999,750 people
who do not have HIV would the ELISA test give a “Test is
positive” result?
Therefore, the test gives ____ “Test is positive”
results, but only ____ actually have HIV.
____% of the “Test is positive” results are incorrect.
What should you do if you take the test and it turns up
positive?
How could we decrease the ____%?
Lower the “sensitivity” of the test. If this was done,
then more people who actually have HIV would receive a
“Test is negative” result.
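The blanks above can be worked through numerically. The Python sketch below uses the sensitivity, specificity and incidence values quoted earlier. It treats the results as expected counts in a hypothetical population of 10,000,000 people, and the variable names are illustrative.

```python
population = 10_000_000
prevalence = 0.000025  # P(Actual is Yes), from Johnson and Gastwirth (1991)
sensitivity = 0.993    # P(Test is Positive | Actual is Yes)
specificity = 0.9999   # P(Test is Negative | Actual is No)

n_hiv = population * prevalence  # expected number with HIV
n_no_hiv = population - n_hiv    # expected number without HIV

true_positives = n_hiv * sensitivity            # have HIV, test positive
false_positives = n_no_hiv * (1 - specificity)  # no HIV, test positive

total_positives = true_positives + false_positives
share_incorrect = false_positives / total_positives

print(round(true_positives, 2), round(false_positives, 2))
print(f"{share_incorrect:.1%} of positive results are false positives")
```

Because so few people in this population have HIV, the false positives outnumber the true positives even though the specificity is very high.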
For more on determining sensitivity and specificity, see my
Chapter 8 lecture notes for STAT 873. Specifically, see the
discussion on Receiver Operating Characteristic (ROC)
curves.