Probability
I. Why do we need to look at probability?
Probability is fundamental to statistics (okay, so that doesn't explain things).
Briefly, we use statistics to figure out how “probable” something is.
In other words, if we observe something (an event of some kind), we then figure out what the
probability of that event is.
If the probability of that event occurring “by chance” is very low, we tend to think
something else is going on.
A biological example:
We try out a new medicine to reduce blood pressure. We find the following (measuring systolic
blood pressure):

        Placebo    Medicine
        156        147
        186        163
        145        162
        168        143
        149        147
   ȳ    164.4      152.4
Does the medicine work?
To answer this question, we ask:
What is the probability of getting this result if the medicine does not work?
Another way of looking at this:
What is the probability of getting the above result completely due to chance?
If this probability is low (small), then we say that the result can't be due to chance. In
other words the medicine works.
(Incidentally, the probability of getting the above result due to chance is 27.5%;
do you think the medicine works?)
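One way to see where a "probability of getting this result by chance" can come from is a permutation test: pool all ten patients, consider every way of splitting them into two groups of five, and ask how often chance alone produces a mean difference at least as large as the observed one. This is a minimal Python sketch of that idea; the 27.5% quoted in the notes presumably comes from a different procedure (such as a t-test), so the permutation answer need not match it exactly.

```python
# Exact permutation test for the blood-pressure example:
# enumerate all 252 ways of labeling 5 of the 10 patients "placebo"
# and see how often the group-mean difference is at least as extreme
# as the one actually observed.
from itertools import combinations

placebo = [156, 186, 145, 168, 149]
medicine = [147, 163, 162, 143, 147]
pooled = placebo + medicine

# observed difference in group means
observed_diff = sum(placebo) / 5 - sum(medicine) / 5

splits = list(combinations(range(10), 5))  # all 252 possible "placebo" groups
extreme = 0
for group in splits:
    a = [pooled[i] for i in group]
    b = [pooled[i] for i in range(10) if i not in group]
    diff = sum(a) / 5 - sum(b) / 5
    if abs(diff) >= abs(observed_diff):  # two-sided: extreme in either direction
        extreme += 1

p_value = extreme / len(splits)
```

The p-value here is the fraction of random relabelings that look at least as "medicine-like" as the real data; a small value would suggest the medicine works.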
But before we can really understand what happened in this example, we need to understand probability.
Let's try another, non-biological, example:
We flip a coin 10 times. There are 11 possible results:
5 heads, 5 tails
4 heads, 6 tails
6 heads, 4 tails
3 heads, 7 tails
7 heads, 3 tails
2 heads, 8 tails
8 heads, 2 tails
1 head, 9 tails
9 heads, 1 tail
0 heads, 10 tails
10 heads, 0 tails
For which of these results would you think that the coin is unfair?
Most likely if you got one of the results near the bottom of either of these columns you'd
think the coin is unfair.
Not convinced? Suppose I give you a dollar for every head, and you give me a
dollar for every tail, and we flip the coin 10 times (my coin!) and we get 0 heads
and 10 tails.
You really think I’m being honest??
When we can figure out the probability of getting 8 tails and 2 heads, we will be done with basic
probability.
So, to summarize, we need probability to answer questions about our data: what is the probability our
results are due to chance?
II. Some very basic probability.
We won’t learn a lot of probability here. Just enough so that you understand a couple of basics.
Probability is an entire sub-discipline of mathematics; it's that complicated!
So let's define probability:
Probability is a number that describes the chance of an event happening. We note probability by:
Pr{E} (sometimes also just P{E})
This means “the probability of the event ‘E’“
So what is E?
Some examples:
E1: Heads, when tossing a coin.
E2: the # 6, when rolling a die.
E3: the # 7, when rolling two dice.
E4: the Orioles (or Nationals) winning the World Series.
so we have:
Pr{E1} = 1/2
Pr{E2} = 1/6
Pr{E3} = 1/6 (more on this soon)
Pr{E4} = ????? probably close to zero, but more soon.
Important:
Pr{E} is always between 0 and 1. More formally:
0 ≤ Pr{E} ≤ 1
If you ever calculate a probability outside this range, you have made a mistake!!
Your text is a little strange on the top of p. 90 [80] {86}. Let’s think about things this way instead
(this definition is a bit more old fashioned, but it's a bit easier to explain):
Pr{E} = (the number of ways the event E can occur) / (the number of possible outcomes)
So let's apply this to the probabilities we calculated above:
Pr{E1} = 1/2
because there is only 1 way of getting a head, but two
possible outcomes (heads or tails).
Pr{E2} = 1/6
again, because there is only 1 way of getting a 6, but there
are six possible outcomes (the numbers 1 through 6).
Pr{E3} = 1/6
this one is trickier. First, we need to figure out how many
different ways you can get 7:
        dice #1    dice #2
           1          6
           2          5
           3          4
           4          3
           5          2
           6          1
Note that the sum in each case is 7.
In other words, there are six different ways you can get a 7.
How many possible outcomes are there?
If you make up a list like above, you’ll get 36 different
possible outcomes.
So we have 6/36, which simplifies to 1/6.
But there's a simpler way to figure out all possible
outcomes:
The first die can give us 6 different outcomes.
The second die can give us 6 different outcomes.
Put these together and we have:
6 × 6 = 36
So there are 36 possible outcomes, and again we
have 6/36 = 1/6 for our answer.
Pr{E4} = ??
This one is actually quite difficult, and we won't illustrate this. But
a few comments:
For many years the Orioles were one of the worst teams in
baseball, in which case we could assume the numerator is
essentially 0, and we don't care about the denominator.
These days it would be harder to figure out since the
numerator is not 0.
A totally naive way would be to assume that every team is
doing equally well:
There are 30 teams in baseball (in 2015).
So if every team is doing equally well, we can just
do: 1/30.
(One possible way to win the World Series,
30 possible outcomes (30 teams)).
This is a rather stupid way to do things!
Professional sports statisticians use much more
complicated ways to figure this out!
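The "count the ways, divide by the outcomes" recipe for Pr{E3} is easy to check by brute force. This Python sketch enumerates all 36 (die 1, die 2) pairs:

```python
# Enumerate all 36 outcomes of rolling two dice and count the ways
# of getting a total of 7, confirming Pr{E3} = 6/36 = 1/6.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all (die 1, die 2) pairs
ways_to_get_7 = [o for o in outcomes if sum(o) == 7]

total = len(outcomes)           # 36 possible outcomes
favorable = len(ways_to_get_7)  # 6 ways: (1,6), (2,5), ..., (6,1)
prob = favorable / total        # 6/36 = 1/6
```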
Let's do another example and introduce some new/different notation:
For a single die, what is the probability of getting a 2, 3, or 4?
Let Y = our result
Remember, Y is our random variable
So now we can write our event as follows:
E: 2 ≤ Y ≤ 4
Notice that there are three possible ways to get what we
want (2, 3, or 4).
We have six possible outcomes (the numbers 1-6).
So we can write our probability as follows:
Pr{E} = Pr{2 ≤ Y ≤ 4} = 3/6 = 1/2.
See also your text on page 92 - 93 [96 - 97] {102-103} for this and other examples.
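The same counting argument can be written out in a couple of lines of Python:

```python
# Pr{2 <= Y <= 4} for one die: count favorable faces, divide by 6.
faces = range(1, 7)
favorable = sum(1 for y in faces if 2 <= y <= 4)  # 3 ways: 2, 3, 4
prob = favorable / 6                              # 3/6 = 1/2
```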
Combining probabilities:
Again, we’re just introducing the basics.
Suppose you wanted to figure out the probability of getting a 6 with a dice and heads with a coin.
You can count up all the possible outcomes, and do it that way, or you can multiply some
probabilities (but see note below on independence). For instance:
Let Y1 = outcome with dice, and let Y2 = outcome with coin
Pr{Y1 = 6} = 1/6
Pr{Y2 = heads} = 1/2
(See above if you don't remember how we got these probabilities)
So, assuming Y1 and Y2 are independent (see below) we have:
Pr{Y1 = 6 and Y2 = heads} = 1/6 x 1/2 = 1/12
Your book uses “probability trees”, which are very convenient when you don't have a lot
of outcomes, but can get out of hand very fast otherwise (as in our example).
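The multiplication rule for the dice-and-coin example can be verified by listing all 12 equally likely joint outcomes, as in this short Python sketch:

```python
# Verify Pr{Y1 = 6 and Y2 = heads} = 1/12 by enumerating every
# (die, coin) pair; there are 6 x 2 = 12 equally likely outcomes.
from itertools import product

outcomes = list(product(range(1, 7), ["H", "T"]))   # 12 joint outcomes
favorable = [o for o in outcomes if o == (6, "H")]  # exactly one outcome

prob = len(favorable) / len(outcomes)  # 1/12, matching (1/6) x (1/2)
```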
Independence: our events (e.g., the dice and coin above) cannot influence each other.
In statistical language, they must be INDEPENDENT.
For example, suppose you somehow connect your coin and dice so that whenever
you roll a 6 you ALWAYS get heads.
Obviously, the multiplication rule above is no longer valid.
We won't worry about instances like this, although they're obviously very
important - see your text if you're interested.
Let's do another example, this time using the text (example 3.19 [3.16] {3.2.9}):
Newborn infants sometimes suffer from a serious respiratory ailment known as “hypoxic
respiratory failure”. A possible alternative treatment to surgery is Nitric Oxide.
Does Nitric Oxide work?
Newborns with the disease were assigned into two equal sized groups. One group
was given nitric oxide, the other was not.
In the group given nitric oxide, 54.4% did not need surgery. In the control group,
only 36.4% did not need surgery.
So what is the overall probability of needing surgery?
In the control group, 1- .364 = .636 or 63.6% needed surgery.
In the treatment group 1- .544 = .456 or 45.6% needed surgery.
So the overall probability of needing surgery becomes:
Pr{surgery} = (.5 x .456) + (.5 x .636) = .228 + .318 = .546
So 54.6% of newborn infants with this condition needed surgery.
Note that we multiply the various probabilities and then add up the outcomes
we’re interested in.
(Caution: if you have the 2nd edition there is a slight math mistake in this
example.)
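The arithmetic above (weight each group's surgery rate by the 0.5 chance of being in that group, then add) looks like this in Python:

```python
# Overall probability of surgery in the nitric oxide example:
# each newborn has a 0.5 chance of being in either group.
p_treat, p_control = 0.5, 0.5
p_surgery_given_treat = 1 - 0.544    # 0.456 in the treatment group
p_surgery_given_control = 1 - 0.364  # 0.636 in the control group

# multiply within each branch, then add the branches
p_surgery = (p_treat * p_surgery_given_treat
             + p_control * p_surgery_given_control)  # 0.546
```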
Comment: if you want to express probabilities in %, that's fine (the book often asks you to do
this), but make sure you do not forget the % symbol.
Conditional probability:
When we talk about conditional probability, what we’re interested in is “what is the probability
that Event A happens GIVEN that Event B has happened”.
For example:
Pr{rolling an 8 with 2 dice GIVEN that the first dice shows a 3}
We usually write this like this:
Pr{rolling an 8 with 2 dice | the first dice shows a 3}
where the vertical bar (“|”) means “given”.
So lets solve the above problem:
One die shows a 3
This implies the other die MUST come up 5.
So what is Pr{rolling a 5}? Easy, just 1/6.
But notice the following (one way of looking at independence):
Pr{rolling an 8 with 2 dice } = 5/36
(You can do the math yourself, just add up all the possible ways of getting an 8 with 2
dice and divide by 36)
This is NOT the same answer we got above. This, incidentally, tells us that the events “rolling an
8 with 2 dice” and “the first dice shows a 3” are not independent.
A good way of thinking about conditional probability is to "redefine" your universe:
For example, in the above dice rolling experiment, we are no longer interested in all
possible outcomes with two dice, JUST THOSE in which the first dice you've rolled
shows a 3.
So you only look at the results where the first dice showed a 3, and ignore all others
For example, you're not interested in ANYTHING if the first die is a 2.
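"Redefining the universe" is easy to act out in code: start from all 36 rolls, then throw away every roll whose first die isn't a 3. A Python sketch:

```python
# Conditional probability by restricting the universe:
# keep only rolls where the first die shows 3, then look for a total of 8.
from itertools import product

all_rolls = list(product(range(1, 7), repeat=2))

# Unconditional: Pr{total = 8} over all 36 outcomes = 5/36.
p_eight = sum(1 for r in all_rolls if sum(r) == 8) / len(all_rolls)

# Conditional: the new universe has only the 6 rolls starting with a 3.
universe = [r for r in all_rolls if r[0] == 3]
p_eight_given_3 = sum(1 for r in universe if sum(r) == 8) / len(universe)  # 1/6
```

The two probabilities differ (5/36 vs 1/6), which is exactly the non-independence noted above.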
***************
There is a more formal approach in your text. See p. 604 - 605 [90 - 92] {97} if you're
really interested.
If you do look at the more formal approach, be aware that calculating the
probability for the numerator may not be straightforward since the events may
not be independent (so you can't always just multiply).
***************
Let's do an example from the book on p. 98 and 99 [86 & 87] {92 & 93}. This is interesting
because it also talks about medical testing.
First, let's list some probabilities (see Fig. 3.6 {3.2.5}):
Pr{having disease and (+) test } = .076
Pr{having disease and (-) test } = .004
Pr{not having disease and (+) test } = .092
Pr{not having disease and (-) test } = .828
Also note that the probability of having this disease is 0.08 (and therefore the probability
of not having the disease is 0.92).
It is obviously possible that someone who doesn't have the disease will test positive, or
someone who has the disease will test negative. These results would be called a “false
positive” or “false negative” respectively.
Just because the test comes back positive, doesn't mean one has the disease. For example
3.21 [3.18] {3.2.11}:
Suppose the test is positive, what is the probability someone actually has the
disease?
Pr{having disease | (+) test}
First we need to figure out the overall probability of a positive outcome:
Pr{ (+) test } = .076 + .092 = .168
(we just added up everything above with a (+) test. This is our new
universe. We’re no longer interested in anything with a (-) test)
Then we get Pr{having disease and (+) test } = .076
(this is straight from above)
Finally, we figure out what we want:
Pr{having disease | (+) test} = Pr{having disease and (+) test} / Pr{(+) test}
= 0.076 / 0.168 = 0.452
(For those who are really curious - notice the similarity with the
above and the more formal approach - it's not an accident!)
And finally we can say that
Pr{having disease | (+) test} = 0.452
This is pretty awful
It means that even if you have a positive test, chances are you DON’T
have the disease.
This is a horrible test!
Let’s take the example one step further and figure out:
Pr{no disease | (-) test}
very quickly now:
Pr{ (-) test }= .004 + .828 = .832
so Pr{no disease | (-) test} = .828/.832 = .995
This is much better.
If you have a negative test, you almost certainly don’t have the disease.
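The whole medical-test calculation, starting from the four joint probabilities in the text's table, fits in a few lines of Python:

```python
# Joint probabilities from the text's table (Fig. 3.6).
p_disease_and_pos = 0.076
p_disease_and_neg = 0.004
p_no_disease_and_pos = 0.092
p_no_disease_and_neg = 0.828

# "New universe" totals: everything with a (+) test, everything with a (-) test.
p_pos = p_disease_and_pos + p_no_disease_and_pos  # 0.168
p_neg = p_disease_and_neg + p_no_disease_and_neg  # 0.832

# The two conditional probabilities worked out above.
p_disease_given_pos = p_disease_and_pos / p_pos        # about 0.452
p_no_disease_given_neg = p_no_disease_and_neg / p_neg  # about 0.995
```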
(Note that your text doesn't use conditional probability for this problem, which makes it a
bit confusing to follow.)
III. Back to coins (and introducing the binomial distribution):
Let’s return to our problem from right at the beginning, flipping a coin 10 times.
So what IS the probability of getting 8 tails if the coin is really fair?
Let’s figure out the number of possible outcomes:
All possible ways of getting 0 tails: HHHHHHHHHH
All possible ways of getting 1 tail:
THHHHHHHHH
HTHHHHHHHH
HHTHHHHHHH
etc.
All possible ways of getting 2 tails: TTHHHHHHHH
THTHHHHHHH
THHTHHHHHH
etc.
Then you need to list all possible ways of getting 3 tails, then 4 tails, and so forth. You
could do it this way, but you’d be at it for a very long time.
Obviously we need to do something else.
The binomial coefficient:
In a situation like this where we have a bunch of trials (tosses), we can figure out how many
different possible ways we can get what we want by using something called the binomial
coefficient:
(n choose j) = nCj = n! / ( j! (n − j)! )
The “!” symbol means “factorial”. This is defined as follows (for any positive integer, x):
x! = x(x-1)(x-2)...(2)(1)
so, for example, 3! = 3 · 2 · 1 = 6, and 5! = 5 · 4 · 3 · 2 · 1 = 120
This gets big FAST (try 10! or 20!)
Also, 0! = 1
Why?? It doesn't seem to make any sense, but it works.
You can actually show this, but it's complicated and involves some messy
calculus.
So what about all the other stuff?
n = the number of trials
j = the number of “successes” (we'll be using “y” here eventually)
So, for our example, we have:
(10 choose 8) = 10! / ( 8! (10 − 8)! )
Don't just plug this in and use brute force to get an answer - often you can cancel
a lot of terms and even do this by hand (without a calculator) after you've
simplified it (demonstrate).
Incidentally, your book lists binomial coefficients in table 2 of the
appendix. To use it, figure out n and j (= y), then look down the row for n,
and the column for j.
For example, if n = 5 and j = y = 3 , the table gives 10.
In any case, the formula tells us that there are 45 different ways of getting 8 tails
if we toss a coin 10 times.
We could now go through and calculate the number of different ways of getting 0 tails, 1
tail, 2 tails, etc. to get our denominator, and then do 45/denominator.
This would work, but even this is still tedious.
Incidentally, if you want to know why the binomial coefficient works, see Appendix 3.2
in the 3rd edition, or 3.1 in the 4th edition.
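If you'd rather not use the table in the appendix, note that the binomial coefficient is built into Python's standard library (Python 3.8+), so the values are easy to check:

```python
# Binomial coefficients via the standard library.
from math import comb

ways_8_tails = comb(10, 8)  # number of ways to place 8 tails in 10 tosses
table_check = comb(5, 3)    # the table example: n = 5, j = 3
```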
So let’s multiply some probabilities instead:
The probability of getting a tail = .5, so if we want 8 tails, we can multiply:
.5 · .5 · .5 · .5 · .5 · .5 · .5 · .5 = .5^8 = 0.00390625
But if we have 10 trials, that means we have 2 heads, so we multiply that in as well:
.5 · .5 = .5^2 = .25
Now if we multiply everything together, we get:
0.00390625 × 0.25 = 0.0009765625
What we have now is the probability of one way of getting 8 tails and 2 heads
BUT there are 45 ways of getting 8 tails and 2 heads, so we have:
45 · 0.0009765625 = 0.0439
And finally, the probability of getting 8 tails in 10 tosses is 0.0439
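The tedious listing of sequences we abandoned earlier is no problem for a computer, and it confirms both the count of 45 and the 0.0439 answer. A Python sketch:

```python
# Brute force: list all 2^10 = 1024 head/tail sequences and count
# the ones with exactly 8 tails.
from itertools import product

sequences = list(product("HT", repeat=10))  # all 1024 possible sequences
eight_tails = [s for s in sequences if s.count("T") == 8]

prob = len(eight_tails) / len(sequences)  # 45/1024, about 0.0439
```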
This is an application of the binomial distribution formula, which is given as follows:
(n choose j) p^j q^(n−j) = (n choose j) p^j (1 − p)^(n−j)
where p = probability of success
q = 1 - p = probability of failure
n = the number of trials
j = y = the number of successes
Most statisticians prefer the second version of the formula since it has one less variable.
Also remember that we will be using y instead of j shortly (we're just using j right now to
be consistent with the text)
In the coin example we're working on we have:
p = probability of tail = .5
q = 1 - p = probability of head = .5
n = the number of tosses = 10
j = y = the number of successes = 8
so we have, (much easier now):
P(8 tails in 10 tosses) = (10 choose 8) · .5^8 · .5^2 = 0.0439
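The binomial formula translates directly into a small function. This sketch uses j for the number of successes, matching the text's notation:

```python
# The binomial distribution formula as a function:
# Pr{j successes in n trials} = (n choose j) * p^j * (1-p)^(n-j)
from math import comb

def binomial_prob(n, j, p):
    return comb(n, j) * p**j * (1 - p)**(n - j)

p_8_tails = binomial_prob(10, 8, 0.5)  # about 0.0439
```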
Let’s finish for today by going through example 3.29 [3.45] {3.6.4 - but note that for some
bizarre reason the 4th edition changed p from 0.39 to 0.37}.
We are sampling 5 individuals from a large population (basically, sampling with
replacement; we may talk more about this later)
39% of the individuals in the population are mutants (37% if you have the fourth edition).
So the probability of getting three mutants is:
P(3 mutants in a sample of 5) = (5 choose 3) · .39^3 · .61^2 = 10 × 0.0221 = .221
(If you have the fourth edition, you'll get 0.201 here).
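The mutant example runs through the same formula; this sketch checks both editions' values of p:

```python
# The mutant-sampling example for both editions' value of p.
from math import comb

def binomial_prob(n, j, p):
    return comb(n, j) * p**j * (1 - p)**(n - j)

p_3rd_edition = binomial_prob(5, 3, 0.39)  # p = 0.39, about 0.221
p_4th_edition = binomial_prob(5, 3, 0.37)  # p = 0.37, about 0.201
```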
Your book lists all possible combinations for a sample of 5 in table 3.5 {3.6.3}.
You should look at this and make sure you know how they got the results.
Your text presents the binomial distribution in a slightly different way (there’s nothing wrong
with it, it’s just different).