The Idea of Independence- Part II
in Probability and Statistics
Sampling
Henry Mesa
Use your keyboard’s arrow keys to move the slides
forward (▬►) or backward (◄▬)
Use the (Esc) key to exit
As you view the slides have paper and pencil handy. Take down
notes, and when asked to guess at a result do so before going on. If
something does not make sense, go back through the slides using
the backward (◄▬) key on your keyboard. If the slides do not
make sense to you then write down your question and ask your
instructor.
In the last series we looked at independence, and at how the
conditional notation P(A | B) tells us whether we have
independence.
P(A | B) = P(A) only if the two events are independent
But the examples we used last time were of the type that
I like to view as “starting from a whole, and then going
into a subgroup of the whole,” as depicted below.
For example, a deck of “regular” playing cards contains
52 cards: 13 clubs, 13 diamonds, 13 hearts, and 13 spades. These
four groups are called suits. Each suit contains one card called the
ace.
P(ace) = 4/52. There are four aces,
out of a deck of 52 cards.
Suppose that I remove all the cards except the club cards.
What is the chance of getting an ace now? P(ace | club) = ?
P(ace | club) =1/13
P(ace) = 4/52 = 1/13
We have independence between events.
Notice how we went from a large group, to a smaller group
within the larger group.
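If you would like to check this by counting, here is a minimal Python sketch. It builds the deck, then computes the chance of an ace in the full deck and again in the clubs-only deck. The names `deck`, `prob`, and `is_ace` are illustrative choices, not something from the slides.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

def prob(event, space):
    """Probability of `event` when every item in `space` is equally likely."""
    favorable = [card for card in space if event(card)]
    return Fraction(len(favorable), len(space))

def is_ace(card):
    return card[0] == "ace"

# Original sample space: all 52 cards.
p_ace = prob(is_ace, deck)

# Restricted sample space: only the clubs.
clubs = [card for card in deck if card[1] == "clubs"]
p_ace_given_club = prob(is_ace, clubs)

print(p_ace, p_ace_given_club)  # both print as 1/13
```

The only thing that changes between the two calculations is the list passed in as the sample space, which is exactly what the “| club” part of the notation says.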
The notation P(ace | club) makes it clear to
the reader that we have changed the sample space,
by using the symbols “ | club” in the function
notation. We want the probability of an ace
given that our new sample space contains only
club cards.
The original sample space started
with 52 cards.
The subspace is a subset of the
original sample space.
Also notice that the question is about a sample of size one;
that is, one card is chosen at random. This is a very important
distinction.
Sampling (more than one item) has the exact opposite
effect. We start with an original sample space and use it to
build a bigger sample space.
Here is the analogy that I like to present to you. The
elements in chemistry are supposed to be the simplest
structures that can be used to create more complicated
structures.
Thus, hydrogen, H, is an element, and so is oxygen, O. But
put them together in the right order and combination and we
get water, H2O, a compound.
We can do the same with sample spaces and events. I like
to think of sampling as the equivalent of creating
compounds in chemistry.
As we study probability, that is a good way to think about the more
complicated structures in probability: as compounds in
chemistry.
What is the chance of getting an ace on a single draw from
a deck of cards? P(ace) = 4/52; this is like an element in
chemistry, a simple event.
What is the chance of getting exactly three aces on a draw of
five cards from a deck of 52?
P(ace AND ace AND ace AND not ace AND not ace) = ?
This is like a compound in chemistry: we used the simple
event of getting an ace to create a more complicated
structure. But if you look closely, this involved sampling
from the deck multiple times.
But in my particular example using a deck of cards, my sample
space changed from one with 52 items in it, {10 of clubs, ace of
spades, 3 of hearts, …}, to a more complicated structure with millions
of items in it. Just one “thing” consists of five connected
smaller events, such as (ace of hearts, 10 of clubs, five of
spades, 7 of hearts, three of clubs). Those five smaller events
are just one item when I draw five cards from a deck of 52.
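To get a feel for “millions of items,” here is a small Python sketch that counts the five-card hands. It assumes a hand is counted without regard to the order in which the cards arrive, which is one common way to set up this sample space; the variable names are illustrative. It also counts the hands with exactly three aces in any position, which is more than the single ordering (ace, ace, ace, not ace, not ace).

```python
from fractions import Fraction
from math import comb

# Number of distinct five-card hands (order ignored) from 52 cards.
total_hands = comb(52, 5)

# Hands with exactly three aces:
# choose 3 of the 4 aces and 2 of the 48 non-aces.
three_ace_hands = comb(4, 3) * comb(48, 2)

p_three_aces = Fraction(three_ace_hands, total_hands)

print(total_hands)      # 2598960
print(three_ace_hands)  # 4512
print(p_three_aces)     # 94/54145
```

If you decide that order matters, the sample space is larger still, but the probability of exactly three aces comes out the same.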
To reiterate the analogy, this is like saying H is hydrogen and
O is oxygen, but to get one water molecule you need two
hydrogen atoms and one oxygen atom connected in a precise manner:
H2O. You need three atoms to get one water molecule.
So, to understand the concept of independence from the
sampling perspective, we need to see how the
notation P(A | B) is used when we sample more than one item.
After all, to show independence I need to show that
P(A | B) = P(A), and that will not change.
Also, keep in mind that when I ask about independence in a
problem where I am sampling more than one item, I am asking
about a change in probability (chance) during the sampling
procedure.
The key word here is “during.” Asking about independence during
the process of sampling means asking, “Has the probability of
getting a particular event changed because it is now, for example,
the fifth item to be chosen?”
So how do we interpret the symbol P(A | B) during sampling?
It turns out it is easier than you think. Notice when one
samples, we think of the actual sampling as taking place as a
sequence: sample 1st item, then second, then third, and so on
until you get the last item in your sampling. This does not
have to be the case; the sampling can occur simultaneously.
Whichever viewpoint you choose, it does not change the
final outcomes of this discussion.
So here is how the notation P(A | B) works.
Suppose that I am going to choose three cards from a deck
of 52 cards without replacement. Now I could ask, what is
the probability of choosing an ace?
As stated earlier P(ace ) = 4/52.
Now I actually sample a card and it turns out to be a 10 of
hearts. I keep the card and sample again.
What is the probability of an ace now? And here is how I
would write the question.
P(ace | 10 hearts) = ? I am using the “given that” symbol “|”
to indicate that my first card is a 10 of hearts. How do I know
that the notation refers to sampling as described here rather than
to the one-sample scenarios depicted earlier?
By the context of the scenario. The notation alone will
not indicate which scenario is which, but the context of
the situation, written in words, tells you the
intent of the notation.
So, P(ace | 10 hearts) = 4/51. Notice the denominator is 51,
since I have only 51 cards left, and all 4 aces are still among them.
Do I have independence during the sampling?
No, since on the first draw, P(ace) = 4/52, but on the second
draw P(ace | 10 hearts) = 4/51.
Thus, P(ace) ≠ P(ace | 10 hearts) and I do not have
independence during the sampling. Suppose that on the second
draw the card is a 5 of clubs.
What is the probability of an ace on the third attempt? See
if you can write down the question using function notation.
P(ace | 10 hearts and 5 clubs) = ?
Did you get it? Now what is the answer?
P(ace | 10 hearts and 5 clubs) = 4/50
Do you see the pattern?
P(item you want | what has occurred already during sampling)
Written this way, a question about independence during
sampling is easier to answer: just compare the two values.
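The pattern can be sketched in a few lines of Python. The function name `p_ace_next` and its argument are hypothetical, chosen only for this illustration; the function assumes a standard 52-card deck and sampling without replacement.

```python
from fractions import Fraction

def p_ace_next(cards_seen):
    """P(ace on the next draw | cards already drawn), without replacement.

    `cards_seen` is a list of rank strings for the cards drawn so far.
    """
    aces_left = 4 - sum(1 for rank in cards_seen if rank == "ace")
    cards_left = 52 - len(cards_seen)
    return Fraction(aces_left, cards_left)

print(p_ace_next([]))           # 1/13, that is 4/52
print(p_ace_next(["10"]))       # 4/51
print(p_ace_next(["10", "5"]))  # 2/25, that is 4/50
```

Everything to the right of the “|” symbol goes into `cards_seen`; the probability changes because both the count of aces left and the count of cards left depend on what has already been drawn.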
I have ten marbles in a bag, of which four are red and six
are green. I will sample twice from the bag, and I will not
put the marble back after each draw. Do I have
independence during the sampling procedure?
Can you answer the question?
I bet you can.
Remember what you want to answer is
P(green | red) = P(green)? Or P(red | red) = P(red)?
Or P(green | green) = P(green)? Now try it before
moving on.
Well, let's pick one. P(green | green) = P(green)?
P(green) = 6/10 and P(green | green) = 5/9, which means that
we do not have independence since 6/10 ≠ 5/9.
You may start to ask, how do we get independence during
sampling, then? Do we ever have independence during
sampling?
Consider the following sampling scenario.
We sample a marble from the bag, note the color,
put it back in the bag, shake the bag, and sample again.
Do we have independence now during sampling? There
are still 10 marbles in the bag, six green and four red.
Well? Give it a try first!
Remember what you want to answer is
P(green | red) = P(green)? Or P(red | red) = P(red)?
Or P(green | green) = P(green)? Now give it a try.
P(red) = 4/10
P(red | green) = 4/10. Yes we have independence during the
sampling procedure.
The probability of choosing a red marble, now that we put
the marbles back after each pick, remains the same from
sample to sample. We have independence during the
sampling procedure.
The sampling procedure as outlined is called sampling with
replacement, versus the first scenario which is sampling
without replacement.
By putting the marbles back I have created the scenario,
sampling with replacement.
If I keep the marbles after each pick, then this is called
sampling without replacement.
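Both marble scenarios can be checked by listing every equally likely pair of draws. This is a sketch only; the names `marbles`, `p_second_given_first`, `with_replacement`, and `without_replacement` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

marbles = ["red"] * 4 + ["green"] * 6  # the bag of ten marbles

def p_second_given_first(pairs, second, first):
    """P(second draw is `second` | first draw is `first`),
    found by counting equally likely ordered pairs."""
    given = [p for p in pairs if p[0] == first]
    both = [p for p in given if p[1] == second]
    return Fraction(len(both), len(given))

# Use positions in the bag so that physically different marbles
# of the same color are counted separately.
indices = range(len(marbles))
without_replacement = [(marbles[i], marbles[j])
                       for i, j in permutations(indices, 2)]
with_replacement = [(marbles[i], marbles[j])
                    for i, j in product(indices, repeat=2)]

p_green = Fraction(marbles.count("green"), len(marbles))
print(p_green)  # 3/5
print(p_second_given_first(without_replacement, "green", "green"))  # 5/9
print(p_second_given_first(with_replacement, "green", "green"))     # 3/5
print(p_second_given_first(with_replacement, "red", "green"))       # 2/5
```

Without replacement the conditional probability moves away from 6/10; with replacement it stays put, which is what independence during sampling means.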
So far the sample space did not come into play as promised.
Oh, but it did; I just did not emphasize it yet. So let's look
at some important issues here.
Why should I care? So, you can recognize what to do when
I am not around. Right now I am leading you, but later
when you are alone you will have some tools to fall back
on.
In the marble scenario the population I am sampling from
(sample space) is finite; 10 marbles. Keep in mind that my
sample space of my sampling procedure is much larger than 10
items (not important right now but keep this in the back of your
mind for future use).
Consider flipping a fair coin. I flip a coin three times. Do I
have independence during sampling? See if you can answer
the question.
Remember to use conditional notation to answer the
question.
You know P(A | B) = P(A) when you have independence.
Well?
P(Heads) = 0.5
P(Heads | Tails ) = 0.5
Do I have independence?
Yes, since the probability of getting a coin to appear heads
does not change from sample to sample.
Notice that there is no “replacement” here. Or is there? We
could argue for a while but let us not get sidetracked. We
could think of it as the coin, by the nature of the problem,
automatically resets itself. Some people like to think of it
as a bag with infinitely many coins in it, which will be a
helpful analogy in about two or three more slides. Some
other people say that “the coin has no memory.” All these
statements are trying to come to grips with the problem at
hand.
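For the coin, one way to see this is to list all the outcomes of three flips. The sketch below assumes the usual model of a fair coin, in which all eight sequences are equally likely; the function name `p_heads_on` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three flips of a fair coin.
outcomes = list(product("HT", repeat=3))

def p_heads_on(flip, given=None):
    """P(heads on flip number `flip`, counting from 0), optionally
    given a dict {flip_number: face} of earlier results."""
    given = given or {}
    space = [o for o in outcomes
             if all(o[i] == face for i, face in given.items())]
    hits = [o for o in space if o[flip] == "H"]
    return Fraction(len(hits), len(space))

print(p_heads_on(0))                    # 1/2
print(p_heads_on(1, {0: "T"}))          # 1/2
print(p_heads_on(2, {0: "T", 1: "T"}))  # 1/2
```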
Let me give you some scenarios and see if you can detect if
independence exists or does not during sampling.
In a class of forty children, 15 of them are girls.
I will sample five children at random and note down their
gender. Do I have independence during sampling? See if
you can answer the question.
To reiterate, use the notation P(A | B) = P(A) to show that I
do or do not have independence.
Well?
P(girl) = 15/40, but P(girl | girl) = 14/39 so I do not have
independence.
Notice that to answer the question, I assumed that I would not
pick the same child twice. Assuming that, then there is no
independence during sampling.
But what if the sampling was being done to give out prizes
randomly, and you allowed a child to win more than one
prize if their name was chosen more than once? One
scenario would have one child winning all the prizes. Do I
have independence now? Try to answer the question
before moving on.
In that case I do have independence during sampling.
P(girl | girl) = P(girl).
P(girl) = 15/40, and P(girl | girl) = 15/40, and so on.
Now you are going to toss a fair die five times and record the
value that the die shows. Do you have independence during
the throws?
Give it a try.
P(a six) = ?
P(a six | a four) = ?
I hope you can see that you do have independence during
the sampling procedure.
P(a six) = P(a six | a four) = 1/6
You are going to sample 10 students at random from a
university that has 15,000 registered students to participate in a
study. You cannot select the same student twice; that is, all ten
students are distinct.
Do we have independence during sampling?
Give it a try.
P(Shane) = 1/15000.
P(Shane | Marla) = 1/14999.
You can see that the two values are not the same,
thus I do not have independence, but…
Warning!
You must be very open minded in order to grasp what is
about to happen next!
Your ability to accept this or not will determine how well
you will do with future statistics concepts.
I am trying to let you in on a little secret that is not well
explained in most statistics books; the author expects
that you will eventually catch on subconsciously. I am
referring to what occurs when you are given a probability
question and you wonder, “How do I know I should be doing
this to calculate this probability?”
Alright, you do not have independence but, are the two
numbers that different from each other?
0.00006666666666 versus 0.00006667111141
Not much of a difference.
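To put a number on “not much of a difference,” here is a short Python sketch that compares the two values. The variable names are illustrative.

```python
p_first = 1 / 15000   # P(Shane) on the first draw
p_second = 1 / 14999  # P(Shane | Marla) on the second draw

abs_diff = p_second - p_first
rel_diff = abs_diff / p_first  # difference as a fraction of P(Shane)

print(f"{p_first:.14f}")
print(f"{p_second:.14f}")
print(f"relative difference: {rel_diff:.6%}")
```

The relative difference works out to 1/14999, well under one hundredth of one percent.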
So we don’t have independence, but for the practical
purposes of calculating probabilities we will assume we do.
What?
That’s correct. We recognize that the two numbers, being
almost identical, can be interchanged for calculating
purposes, recognizing of course that you will be off the true
value but maybe not by much once you recognize what to
look for.
This is an important viewpoint as you study Statistics.
Often, there is a correct way to do something, but that process might
be difficult. Thus, statisticians look for another process that would
give results that are nearly as good. Often that other process is what is actually
calculated in practice.
This may seem strange to you, but in reality most people employ a similar way of
thinking that leads to certain decisions: “that is good enough for…”
An aide rushes in: “Senator, you are leading the polls! You have
59% support.” A second aide rushes in: “Senator, I have a
better estimate. You are leading the polls with 57.92% of the
votes.”
Notice that both numbers essentially say the same thing. The Senator is
leading with more than half of the votes. One is not any better than the other
as far as making a decision on what to do next.
Let's run through some scenarios so you will start recognizing the
situations.
In California, 68% of the registered voters who voted rejected
a tax increase measure. A newspaper reporter wants to
interview 100 registered voters to see why they voted the way
they did. So we will think of the people we contact as
either being for the tax increase or against it. So as
we sample voters, do we have independence from voter to
voter? Think about what this means for a minute before
continuing.
So, as we sample each person, does the P(for) change? That is
P(for | for) = P(for) or P(for | against) = P(for)?
Think about it, and commit to a response before continuing.
I hope you said that we do not have independence, since
the reporter is sampling without replacement; once the
reporter interviews a person the person will not be chosen
again, essentially removing the person from the sample
space.
But, for practical purposes, the sampling procedure is nearly
independent. Let's say 100,000 people voted, so 32,000 voted for
the tax increase and 68,000 against it.
P(for) = 32,000/100,000, but P(for | for) = 31,999/99,999, which
is nearly identical. So while the sampling is not independent, for practical
purposes it is, and we will act as if it is.
Out of a shipment of 500,000 potatoes, let us say 10% would
not meet some criteria set by a factory and thus the potatoes
would need to be rejected.
A quality inspector will choose two random bucketfuls of
potatoes for inspection (about 50 potatoes) to decide whether
to accept the shipment.
As the inspector samples from the shipment, does the probability
of finding a defective potato change from sample to sample?
(If it does, we do not have independence.) Again,
what I am asking is: is P(defective | defective) = P(defective), or
is P(defective | defective and defective) = P(defective)?
Think about it before answering.
Again, I hope you said that we do not have independence, since
P(defective) = 50,000/500,000 while
P(defective | defective and defective) = 49,998/499,998.
But again the same issue arises. We do not have independence,
but for practical purposes (calculating probabilities) we do.
So as long as the population I am sampling from is very large
compared to my sample size, then (if I am sampling without
replacement) I may not have actual independence, but I am
close enough to say yes for practical purposes (calculating
probabilities).
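How far can the probability drift over a whole sample? The sketch below takes the potato shipment (500,000 potatoes, 10% defective, about 50 sampled) and looks at the most extreme case, in which every potato drawn so far was defective. The function name and its parameters are hypothetical, used only for illustration.

```python
from fractions import Fraction

def p_defective_next(population, defective, found_so_far):
    """P(next item is defective | `found_so_far` items were already
    removed and every one of them was defective)."""
    return Fraction(defective - found_so_far, population - found_so_far)

population, defective, sample_size = 500_000, 50_000, 50

first = p_defective_next(population, defective, 0)
last = p_defective_next(population, defective, sample_size - 1)

print(float(first))         # 0.1
print(float(last))
print(float(first - last))  # drift in this extreme case
```

Even in that extreme case the probability moves by less than 0.0001, which is why treating the draws as independent costs so little here.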
So let's end the slides by showing you some of these coveted
calculations you have been hearing so much about.
In the first example I will show you the “sample space” that
results from sampling, and do an actual calculation. With later
examples we can dispense with showing the sample
space; you will know it's there and how it can be created. For
the most interesting scenarios, its sheer size makes it
impossible to actually show.
So let's make sure we have this idea down by doing a simple
example. A bag has 5 marbles, of which 2 are red and 3 are
green.
What is the chance a marble chosen at random from the bag
is red? P(red) = ?
P(red) = 2/5.
But now let's ask what would happen if we sampled two
marbles from the bag.
What is the chance that two marbles chosen at random from
the bag are both green? P(green AND green) = ?
Let's look at a visual of the sample space we have
just created by asking the question, “What is the
chance of getting two green marbles in a row?”
Keep in mind that sample spaces are “created” by
the question you ask.
What is the probability of getting two green marbles in a row?
Let me use a table of all the possible outcomes; while the
marbles for a particular color are impossible to distinguish
they are physically different marbles. I will use subscripts to
denote which marble is which.
                First pick
Second pick     R1    R2    G1    G2    G3
G1              RG    RG          GG    GG
G2              RG    RG    GG          GG
G3              RG    RG    GG    GG
R1                    RR    GR    GR    GR
R2              RR          GR    GR    GR
The empty space in the table indicates that the marble has
already been taken; for example, once G1 is picked it cannot
be picked again, hence the empty space in its column. Every
marble is equally likely to be chosen; thus, counting how many
outcomes lead to the desired result, we get …
                First pick
Second pick     R1    R2    G1    G2    G3
G1              RG    RG          GG    GG
G2              RG    RG    GG          GG
G3              RG    RG    GG    GG
R1                    RR    GR    GR    GR
R2              RR          GR    GR    GR
P(G and G) = 6/20
What I needed you to see at the moment is how my sample
space just increased by sampling, instead of one item, 2
items. I also wanted you to see that the answer to the
question can be arrived at by counting how many outcomes
meet the criteria. Now I will use what we have learned so far
to make connections to previous material.
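The counting can also be done by a short Python sketch that lists every ordered pair of distinct marbles and tallies the color patterns. The labels match the subscripted marbles; the variable names are illustrative.

```python
from collections import Counter
from itertools import permutations

marbles = ["R1", "R2", "G1", "G2", "G3"]

# Every ordered pair (first pick, second pick), no marble picked twice.
pairs = list(permutations(marbles, 2))

# Reduce each pair to its colors, e.g. ("G1", "R2") becomes "GR".
colors = Counter(first[0] + second[0] for first, second in pairs)

print(len(pairs))    # 20
print(colors["GG"])  # 6
print(colors["RG"], colors["GR"], colors["RR"])  # 6 6 2
```

Six of the 20 equally likely pairs are GG, which gives P(G and G) = 6/20.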
We could have answered the question using conditional
probabilities.
P(G and G) = The first marble needs to be green, then
after that so does the second marble.
I will use a tree diagram to illustrate all that could occur
(sample space).
My original sample space contained just two items, {R, G}. My
new sample space, which comes from sampling more than one item
from that original sample space, consists of four things,
{RR, RG, GR, GG}; we can debate later about RG and GR being
different.
Now, I will add the probability of going from a particular node
to a particular ending.
The first part of the tree shows,
what occurs on the first draw.
P(green) = 3/5.
But on the second draw, if the first one is green, we then
have only four marbles to choose from of which only two are
green. The notation will be P(green | green) = 2/4.
Thus the probability of choosing two green marbles is
P(green)P(green | green) = (3/5)(2/4) = 6/20 just like in the
previous example.
In general we are saying that one way to calculate P(A and B) is
by having the following information:
P(A) and P(B | A)
Or
P(B) and P(A | B)
So then we can use the general formula
P(A and B) = P(A)P(B | A) or P(B)P(A|B)
Thus the probability of choosing two green marbles is
P(green AND green) = P(green)P(green | green) = (3/5)(2/4)
= 6/20
So we used the tree diagram as a tool for calculating
probabilities that describe sampling from a population more
than once.
Had we sampled three times instead of two, the
corresponding tree would have one more level, and
P(green AND green AND green) = (3/5)(2/4)(1/3) = 1/10, which
equals
P(green)P(green | green)P(green | green AND green)
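Walking down the “all green” branch of the tree is just repeated multiplication, where each factor is (green marbles left)/(marbles left). A minimal Python sketch, with a hypothetical function name `p_all_green` and the bag of 3 green and 2 red marbles as defaults:

```python
from fractions import Fraction

def p_all_green(draws, green=3, total=5):
    """P(green on every one of `draws` picks), without replacement,
    using the multiplication rule one draw at a time."""
    p = Fraction(1)
    for k in range(draws):
        # P(green | k greens already removed)
        p *= Fraction(green - k, total - k)
    return p

print(p_all_green(1))  # 3/5
print(p_all_green(2))  # 3/10, the same as 6/20
print(p_all_green(3))  # 1/10
```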
So do we have independence? In the language of sampling more
than one item, what I am asking is this: is the probability of getting
a particular outcome affected as I continue to sample?
Using my example, does the
probability of getting a green
marble change as I continue to
grab marbles from the bag?
I think you would agree the
answer is yes. On the first grab,
the chance is P(green) = 3/5.
But on the second round the probability has changed to
P(green | green) = 2/4 which is not the same as 3/5. So this
clearly shows I do not have independence as I sample from
the bag.
What would have to happen in order to have independence
during the sampling procedure?
To have independence I would
need to put the marble back
after each pick, shake the bag
and draw again. Producing the
following tree.
Notice that P(green) = 3/5, but after the second pick
P(green | green) = 3/5 again since the marble is put
back. We have independence.
Alright, what did we learn so far?
The idea of independence is still the
same as before, but now we ask
about independence during the
sampling procedure.
Just like before to show independence we need to show
P(A | B) = P(A). Keep this in mind as this will be critical
for what is coming next.
A Consequence of Having Independence
In general P(A and B) = P(A)P(B|A), furthermore
P(A and B and C) = P(A)P(B|A)P(C|B and A),
P(A and B and C and D) = P(A)P(B|A)P(C|B and A)P(D|A and B and C)
Let's say that events A and B are independent. Then
P(A and B) = P(A)P(B|A) = P(A)P(B), since
P(B | A) = P(B).
Let's say that events A, B, and C are independent.
Then
P(A and B and C) = P(A)P(B)P(C), and so on, since
P(B | A) = P(B) and P(C | B and A) = P(C). The latter says the
probability of C does not change if we change the sample space to the
scenario in which the events A and B have already
occurred.
So now I will pose a few problems of the kind you will see in
most textbooks, and show how you should view those problems.
In a population of 60,000 registered voters, 70% favor all day
kindergarten. What is the probability of choosing five
registered voters at random for an interview and having all five
favor all day kindergarten?
I want to answer the question
P(favor AND favor AND favor AND favor AND favor)
which could be answered by knowing
P(favor)P(favor | favor)P(favor| favor and favor) and so on.
The above calculation is not very pleasant to consider, but do
I have independence?
To answer this I can ask: is P(favor | favor) = P(favor)?
No, since after I interview the first person I will not
interview them again; sampling without replacement.
But for practical purposes do I have independence?
Yes, since each time I am taking only one person out of 60,000
voters. P(favor) ≈ P(favor | favor). Thus,
P(favor AND favor AND favor AND favor AND favor) ≈
(0.7)(0.7)(0.7)(0.7)(0.7) = (0.7)^5 ≈ 0.168
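To see how little is lost by acting as if we had independence, the sketch below computes the exact answer (sampling without replacement, one conditional factor per draw) next to the shortcut. It assumes exactly 42,000 of the 60,000 voters favor the measure; the variable names are illustrative.

```python
from fractions import Fraction

population = 60_000
favor = 42_000   # 70% of 60,000
sample_size = 5

# Exact answer: sampling without replacement,
# one conditional factor per draw.
exact = Fraction(1)
for k in range(sample_size):
    exact *= Fraction(favor - k, population - k)

# Shortcut: treat the five draws as independent.
approx = 0.7 ** sample_size

print(float(exact))
print(approx)
print(abs(float(exact) - approx))
```

The two answers agree to about four decimal places, so the easy calculation is good enough for practical purposes.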
One last example.
I toss a die three times. The die is fair. What is the
probability of getting three “ones” in a row?
Do I have independence during sampling, or do I have it only
for practical purposes?
Here I do have independence, since P(one | one) = P(one);
that is, a fair die has no memory of past outcomes. You can
think of it as sampling with replacement: in this case the die
automatically reverts to its original state after each throw.
P(one AND one AND one) = (1/6)^3 = 1/216
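As a check, the same value can be reached two ways in Python: by counting outcomes and by multiplying. This sketch assumes a fair six-sided die, so all 216 sequences of three throws are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 6**3 = 216 equally likely results of three throws of a fair die.
throws = list(product(range(1, 7), repeat=3))

p_by_counting = Fraction(sum(1 for t in throws if t == (1, 1, 1)),
                         len(throws))
p_by_product = Fraction(1, 6) ** 3

print(p_by_counting, p_by_product)  # 1/216 1/216
```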
You can see that if we can justify having independence or
almost having independence we can make some
calculations much easier; it removes a huge headache from
the calculation. As you continue studying statistics you will
see that many statistical tests use the fact that we have near
independence during the sampling process.
Having said this, there have historically been many situations in
which calculations assumed independence when that assumption
was wrong, with disastrous
results. A memorable case involved the Challenger
Space Shuttle disaster.
View this slide show as often as you need. Keep in mind
that I am not expecting you to get everything on the first
pass through. View this slide set, do some reading, attempt
some problems, think about what you did, and view the
slides again. Repeat and repeat. Write down questions you
may have and ask your instructor.
Continue to work hard.
The End