Download note taking guide chapter 6 - Germantown School District

Tuesday, October 30: 6.1-6.2 Chance Experiments, Events, and the Definition of Probability
FOCUS: When we do an experiment or observational study, we would like to know if our results are
statistically significant. That is, we want to know if the results we obtained were likely to occur simply
by chance. To determine if our results are statistically significant, we need to calculate a probability,
which is what we will study in the next chapter.
The study of ________________ is the systematic study of _____________________.
A ____________________________ is any activity or situation in which there is uncertainty about
which of two or more possible outcomes will result.
• flipping a coin
• rolling a die
• choosing a card
• asking a survey question
• giving a treatment in an experiment
The collection of all possible outcomes of a chance experiment is the ___________________ for the
experiment.
• flipping a coin:
• rolling a die:
• choosing a card:
In some chance experiments, more than one piece of data is collected.
For example, a randomly selected stats student will be asked for his or her gender, class, and period.
• Thus, one possible outcome would be:

• How many possible outcomes are there?

• To identify all possible outcomes, we can use a tree diagram, with each set of branches
corresponding to one variable and each “end” representing one outcome.
An _______ is any collection of outcomes from the sample space of a chance experiment.
• We usually denote events with capital letters: A, B, C, ... or a capital letter with subscripts:
E1, E2, ...

• A = choosing a senior male =
A ___________________ is an event consisting of exactly one outcome.

• B = choosing a junior female from 5th period =
Forming New Events: Let A and B denote 2 events:
1. The event _________ consists of all experimental outcomes that are not in event A. This event is
often called the __________________ of A and is usually denoted A', A^c, ~A, or Ā.
2. The event ______ consists of all experimental outcomes that are in at least one of the two events, that
is, in A, in B, or in both A and B. This event is called the _______ of events A and B and is denoted
A ∪ B.
3. The event ______ consists of all experimental outcomes that are in BOTH of the events A and B.
This event is called the _________________ of events A and B and is denoted A ∩ B.
Def: Two events that have no outcomes in common are said to be _____________ or ____________
________________.
Suppose we select a random sample of CDO students and recorded each student’s gender and
handedness. Let event A = the student is a male and let event B = the student is right handed. How
many possible outcomes are there?
List all possible outcomes in the following events:
A^c =

A  B  same as  Ac  B c 
c

A B 
A  Bc 
A  Ac 
The Definition of Probability
The Classical Approach to Probability:
Def: The probability of an event E, denoted P(E), is:
P(E) = (# of outcomes favorable to E) / (# of outcomes in the sample space)
Note: This method for calculating probabilities is ONLY appropriate when the outcomes of an
experiment are equally likely.
For example, rolling a die or selecting a card from a deck are chance experiments that have equally
likely outcomes.
• P(rolling a 6) = ?

• P(rolling an even #) = ?

• P(drawing the queen of spades) = ?

• P(drawing any queen) = ?
Which of these events are simple events?
Name some chance experiments where the outcomes are not equally likely:
If the probabilities are not equally likely, you CANNOT use the classical definition.
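As a check on the classical definition, here is a minimal Python sketch that counts favorable outcomes for the die and card questions above. The function name classical_probability and the way the deck is encoded are assumptions made for this illustration; the counting only makes sense because every outcome in each sample space is equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return (# outcomes favorable to the event) / (# outcomes in the sample space).

    Only meaningful when every outcome in sample_space is equally likely.
    """
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))

# Sample spaces (hypothetical encodings chosen for this sketch).
die = [1, 2, 3, 4, 5, 6]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

p_six = classical_probability({6}, die)
p_even = classical_probability({2, 4, 6}, die)
p_queen_of_spades = classical_probability({("Q", "spades")}, deck)
p_any_queen = classical_probability({("Q", suit) for suit in suits}, deck)

print(p_six, p_even, p_queen_of_spades, p_any_queen)  # 1/6 1/2 1/52 1/13
```

Only "rolling a 6" and "drawing the queen of spades" contain exactly one outcome, which is why their event sets have a single element.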
The Relative Frequency Approach to Probability:
P(E) ≈ (# of times E occurs) / (# of trials), as the # of trials becomes very large

• The probability of rolling a 3 ≈ (# of times a 3 is rolled) / (total # of times the die is rolled)

• The probability of rain ≈ (number of similar days with rain) / (number of similar days)

• What is the probability of getting heads when we SPIN a coin?
The ______________________________ says that as the number of trials increases, the relative
frequency of an event will approach the true probability of the event.
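The relative frequency idea can be illustrated with a short simulation. This sketch assumes a fair six-sided die modeled by Python's pseudo-random generator; the function name and the seed are arbitrary choices for the illustration.

```python
import random

def relative_frequency_of_three(num_rolls, seed=0):
    """Roll a simulated fair die num_rolls times; return the fraction of 3s."""
    rng = random.Random(seed)
    threes = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 3)
    return threes / num_rolls

# The relative frequency should settle near 1/6 as the number of rolls grows.
for num_rolls in (10, 100, 1_000, 100_000):
    print(num_rolls, relative_frequency_of_three(num_rolls))
```

With only 10 rolls the estimate can be far from 1/6; with 100,000 rolls it is typically within a few thousandths.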
The Subjective Approach to Probability:
This method assigns a probability based on the strength of the belief that the event will occur. It is
probably the most frequently used method in everyday life, but certainly the least reliable.
HW #1: 6.1, 2, 3, 8, 9, 11-13
Thursday, November 1: 6.3-6.4 Basic Properties of Probability, Conditional Probability
1. For any event E, 0 ≤ P(E) ≤ 1.
• If P(E) = 1, the event is guaranteed.
• If P(E) = 0, the event will never occur.
2. If S is the sample space for an experiment, then P(S) = 1.
• Since S represents all the outcomes in an experiment, one of them has to happen.
3. If two events E and F are disjoint (mutually exclusive), then P(E or F) = P(E ∪ F) = P(E) + P(F).

• when rolling one die: P(1 or 2) =

• P(1 or odd) =

• This rule works for more than 2 disjoint events as well.
4. The complement rule: For any event E,
• P(E) + P(not E) = 1
• P(E) = 1 - P(not E)

• If P(rain) = 30%, P(no rain) = ?
Def: A __________________________ is a list of all the outcomes in the sample space and their
probabilities.
Grade         A     B     C     D     F
Probability   .1    .3    .4    .15   ___

• P(F) =

• P(C or better) = ?

• P(A') = ?
Dice Problems:
• What is the probability distribution of the sum of the two dice?

• P(sum is even) = P(2 or 4 or …) = ?

• P(sum ≤ 4) = ?

• P(sum < 12) = ?
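One way to check the dice answers is to list all 36 equally likely outcomes and tally the sums. The sketch below assumes two fair six-sided dice; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(first + second for first, second in outcomes)

# Probability distribution of the sum.
distribution = {
    total: Fraction(count, len(outcomes)) for total, count in sorted(counts.items())
}
for total, probability in distribution.items():
    print(total, probability)

p_even = sum(p for total, p in distribution.items() if total % 2 == 0)
p_at_most_4 = sum(p for total, p in distribution.items() if total <= 4)
p_less_than_12 = 1 - distribution[12]  # complement rule

print(p_even, p_at_most_4, p_less_than_12)  # 1/2 1/6 35/36
```

The last line uses the complement rule: the only sum that is not less than 12 is 12 itself.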
Coin Problems:
Suppose you flipped a coin 3 times.
• What are the 8 possible outcomes? Are they equally likely?

• What is the probability you get 2 heads?

• Make a probability distribution for x = the number of heads in 3 flips.

• What is the probability you get at least one tail?
Conditional Probability
Sometimes the knowledge that one event has occurred changes the probability that another event will
occur.
• The probability of being in a car accident increases if you know that it is raining outside.
• Suppose that the pass rate on the AP exam is 80%. That is, for a randomly selected student,
P(pass) = .80. However, if you know that the student got a B in the class, then the probability
increases to 95%. That is, for a randomly selected student, P(pass given that the student has a B) = .95.
• Notation: P(pass | B) = .95
Probabilities in the form P(A | B) are called _________________ probabilities and pronounced “the
probability of A occurring given that B has already occurred”
The following data is about the 2201 passengers on the Titanic: 367 of the 1731 males survived, and overall
711 passengers survived. Express this in a two-way table with gender and survival as the variables.
A _________________ is a way to summarize the relationship between two categorical variables. It
lists the outcomes of one variable down the left side, the outcomes of the other variable across the top,
and the frequencies (or relative frequencies) in each cell.
Note: The distributions of the totals for gender and survival are called the ____________ distributions.
The distributions within each gender or survival category are called ________________ distributions.
Suppose you randomly selected a name from the Titanic’s passenger list.
a. P(survived) = ?
b. P(male) = ?
c. P(female ∩ survived) = ?
d. P(survived | female) = ?
e. P(male | survived) = ?
f. P(died | female) = ?
g. P(female | died) = ?
h. P(female | survived) = ?
Let E and F be two events. The conditional probability of event E given that event F has occurred is:
PE  F 
PE | F  
PF 
Ex: P(female | survived) = ?
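A short sketch of the Titanic example follows. It uses only the four counts given above (2201 passengers, 1731 males, 367 male survivors, 711 survivors overall) and fills in the other cells of the two-way table by subtraction; the variable names are assumptions for the illustration.

```python
from fractions import Fraction

# Counts given in the notes.
total = 2201
males = 1731
male_survived = 367
survived = 711

# Remaining cells of the two-way table follow by subtraction.
females = total - males                      # 470
female_survived = survived - male_survived   # 344
male_died = males - male_survived            # 1364
female_died = females - female_survived      # 126

p_survived = Fraction(survived, total)
p_female = Fraction(females, total)
p_female_and_survived = Fraction(female_survived, total)

# P(E | F) = P(E and F) / P(F)
p_female_given_survived = p_female_and_survived / p_survived
p_survived_given_female = p_female_and_survived / p_female

print(p_female_given_survived, float(p_female_given_survived))
print(p_survived_given_female, float(p_survived_given_female))
```

Dividing by P(F) is the same as restricting attention to one row or column of the table, so P(female | survived) is just 344 out of the 711 survivors.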
How do we graphically display the relationship between two numerical variables?
To graphically display the relationship between two categorical variables, we can use a “two-way
segmented bar chart” which JMP calls a “Mosaic Plot”.
Suppose that in a certain company, 5% of the employees use drugs. The company decides to test all of
its employees for drugs with a test that is 95% successful (it correctly identifies 95% of users as users
and 95% of non-users as non-users). Let D = event that the employee uses drugs and P = event that the
person tests positive.
a. Express the information given in terms of P and D.
b. Express the data in a tree diagram
c. If a person tests positive, what is the probability that they use drugs? Does this result surprise
you?
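The tree diagram for the drug-testing question can be mirrored in a few lines of arithmetic. This is a sketch using the numbers stated in the problem (5% users, 95% correct on both users and non-users); the variable names are assumptions for the illustration.

```python
from fractions import Fraction

p_d = Fraction(5, 100)                 # P(D): employee uses drugs
p_pos_given_d = Fraction(95, 100)      # P(P | D): users correctly flagged
p_neg_given_not_d = Fraction(95, 100)  # P(not P | not D): non-users correctly cleared
p_pos_given_not_d = 1 - p_neg_given_not_d  # false positive rate

# Multiply along each branch of the tree that ends in a positive test.
p_d_and_pos = p_d * p_pos_given_d
p_not_d_and_pos = (1 - p_d) * p_pos_given_not_d

# Add the branches to get P(P), then apply the conditional probability formula.
p_pos = p_d_and_pos + p_not_d_and_pos
p_d_given_pos = p_d_and_pos / p_pos

print(p_pos, p_d_given_pos)  # 19/200 1/2
```

Because non-users are so much more common than users, the false positives are as numerous as the true positives, which is why a positive result is only a 50-50 indication here.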
Note: the false positive rate of a test is P(positive | no drugs) and the false negative rate is P(negative |
drugs).
• Which of these errors is worse? For the employees? For the company?

• How can we change these probabilities?
HW #2: 6.15-18, 21, 26, 27, 29-33
Monday, November 5: 6.5 Independence
Yesterday we looked at problems where knowing that one event occurred changed the probability of
another event occurring. It is also possible that knowing one event occurred will not change the
probability of a second event. In this case, these two events are called independent.
If I flip a coin twice, does knowing the outcome of the first flip change the probabilities for the second
flip?
Suppose I survey all of my students and note their gender and handedness. The totals are shown in the
table below:
         Male   Female   Total
Right                    135
Left                     15
Total    70     80       150
If the variables are independent, what values would we expect to find in the table above?
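If gender and handedness are independent, each cell should hold the same share of its column as the row holds of the whole table, so the expected count is (row total × column total) / grand total. The sketch below applies that to the totals in the table; the dictionary layout is an assumption for the illustration.

```python
from fractions import Fraction

row_totals = {"Right": 135, "Left": 15}
col_totals = {"Male": 70, "Female": 80}
grand_total = 150

# Under independence: P(row and column) = P(row) * P(column),
# so expected count = grand_total * P(row) * P(column).
expected = {
    (hand, gender): Fraction(row * col, grand_total)
    for hand, row in row_totals.items()
    for gender, col in col_totals.items()
}

for (hand, gender), count in expected.items():
    print(hand, gender, count)
```

With these totals, 90% of each gender would be right-handed, which gives whole-number expected counts.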
Draw the mosaic plot for this data:
Def: Two events E and F are __________________ if: P(E | F) = P(E)
• In other words, knowing that F occurred doesn’t change the probability of E occurring.
• The concept of independence will be very important to us the rest of the year!
Note: If events E and F are independent and the data we have is from a random sample, then
P(E | F) ≈ P(E). In other words, the probabilities may not be exactly the same because of sampling
variability. How close do they need to be? Wait until chapter 12…
Def: Two events are _________________ if they are not independent.
• Were gender and survival independent on the Titanic?
Multiplication Rule for Independent Events:
The events E and F are independent if and only if P(E ∩ F) = P(E) · P(F).
Proof:
Recall that P(E | F) = P(E ∩ F) / P(F).
Therefore, P(E ∩ F) = P(E | F) · P(F).
However, if E and F are independent, then P(E | F) = P(E).
Therefore, P(E ∩ F) = P(E) · P(F).
If you were to toss a coin and roll a die, P(heads and 6) = ?
The multiplication rule also works for 3 or more independent events.
a. P(heads and 6 and Queen) = ?
b. P(5 heads in a row) = ?
c. P(at least one tail in 5 flips) = ?
If the probability of rain in Marana is 50% and the probability of rain in Oro Valley is 50%, what is the
probability that it will rain in both places?
If the probability of rain in New York City is also 50%, what is the probability that it will rain in both
Oro Valley and NYC?
HW #3: 6.36-6.43
Tuesday, November 6: 6.5-6.6: More Independence and General Probability Rules
When we do inference procedures in AP Statistics, one of the assumptions we make is that the
observations in our sample are independent. That is, knowing the outcome of previous selections does
not change the probability of future selections.
Consider a class of 15 girls and 15 boys. Suppose we want to select 2 students at random. Let A = the
event that the first person selected is a girl and let B = the event that the second person selected is a girl.
One way to sample is _________________________. That is, once we make our first selection, the
person selected goes back into the sampling frame and is eligible to be selected again.
The more realistic and common way to sample is ____________________________. That is, once a
person is selected, he is ineligible to be selected again.
What if we repeated these calculations for a school with 900 girls and 900 boys?
In general, the rule is: we can assume selections are independent when sampling without replacement if
the sample size (n) is less than 10% of the population size (N). Note: some books use 5%.
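To see why the 10% rule is reasonable, compare P(B | A) with P(B) for the small class and for the large school. This is a sketch; the function name is an assumption for the illustration.

```python
from fractions import Fraction

def p_second_girl_given_first_girl(girls, boys, with_replacement):
    """P(B | A): the second pick is a girl, given the first pick was a girl."""
    if with_replacement:
        return Fraction(girls, girls + boys)
    # Without replacement, one girl is gone from both the girls and the total.
    return Fraction(girls - 1, girls + boys - 1)

for girls, boys in [(15, 15), (900, 900)]:
    p_b = Fraction(girls, girls + boys)  # unconditional P(B), by symmetry
    without = p_second_girl_given_first_girl(girls, boys, with_replacement=False)
    print(girls + boys, float(p_b), float(without))
```

For the class of 30, P(B | A) = 14/29 is noticeably below 1/2; for the school of 1800, 899/1799 is nearly indistinguishable from 1/2, so treating the selections as independent costs very little.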
Note: This is why casinos like to play blackjack with multiple decks. They are increasing the
population size so the player’s knowledge of what cards have been played will be less informative.
Note: The 10% rule seems to imply that we want small samples. On the contrary, larger samples are
always better; it's just that the methods of analysis become more complicated when we cannot assume
independence.
General Probability Rules
The General Addition Rule: How to calculate P(A ∪ B) when events A and B are NOT disjoint.
In the Venn Diagram above, events A and B are not disjoint. Thus, if we use the rule: P(A ∪ B) =
P(A) + P(B), we will count the intersection twice. Thus, to avoid counting it twice, we subtract
P(A ∩ B) from the sum.
For ANY events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
When drawing a card from a deck, let A = the card is a heart and B = the card is a queen. Find the
probability of drawing a heart or queen.
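The heart-or-queen question can be checked two ways: with the general addition rule, and by directly counting the cards in the union. The deck encoding below is an assumption for this sketch.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

hearts = {card for card in deck if card[1] == "hearts"}  # event A
queens = {card for card in deck if card[0] == "Q"}       # event B

def p(event):
    """Classical probability of an event when drawing one card."""
    return Fraction(len(event), len(deck))

by_rule = p(hearts) + p(queens) - p(hearts & queens)  # general addition rule
by_counting = p(hearts | queens)                      # count the union directly

print(by_rule, by_counting)  # 4/13 4/13
```

The subtraction removes the queen of hearts, which would otherwise be counted once as a heart and once as a queen.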
What if A and B are disjoint?
The General Multiplication Rule: How to calculate P(A ∩ B) when A and B are NOT independent.
P(A ∩ B) = P(A) · P(B | A)
Suppose you draw 2 cards from a deck without replacement. Let event A = 1st card is a heart and event B =
2nd card is black. Find P(A ∩ B).
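As a check on the general multiplication rule, the sketch below computes P(A) · P(B | A) and compares it with a brute-force count over every ordered pair of distinct cards. The deck encoding is an assumption for this illustration.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]
black = {"clubs", "spades"}

# General multiplication rule: after a heart is removed,
# all 26 black cards remain among the 51 cards left.
by_rule = Fraction(13, 52) * Fraction(26, 51)

# Brute force: every ordered pair of distinct cards is equally likely.
pairs = list(permutations(deck, 2))
favorable = sum(
    1 for first, second in pairs if first[1] == "hearts" and second[1] in black
)
by_counting = Fraction(favorable, len(pairs))

print(by_rule, by_counting)  # 13/102 13/102
```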
What if A and B are independent?
Suppose that at CDO, 40% of the students are upperclassmen, 70% of the upperclassmen have a driver’s
license while only 10% of the lowerclassmen have one. Express this information in symbolic form and
in a tree diagram. Let U = student is an upperclassman and D = student has a driver’s license.
If you select a student at random, what is the probability:
a. he is an upperclassman and has a DL =
b. he has a DL =
c. he is an upperclassman given that he has a DL =
d. he is an upperclassman or has a DL =
e. Are U and DL disjoint events? Explain.
f. Are U and DL independent events? Explain.
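The tree-diagram calculations for the driver's license problem can be sketched as follows, using only the three percentages given. The variable names are assumptions for the illustration.

```python
from fractions import Fraction

p_u = Fraction(40, 100)              # P(U)
p_d_given_u = Fraction(70, 100)      # P(D | U)
p_d_given_not_u = Fraction(10, 100)  # P(D | not U)

# a. Multiply along the U branch (general multiplication rule).
p_u_and_d = p_u * p_d_given_u
# b. Add the two branches that end in D.
p_d = p_u_and_d + (1 - p_u) * p_d_given_not_u
# c. Conditional probability formula.
p_u_given_d = p_u_and_d / p_d
# d. General addition rule.
p_u_or_d = p_u + p_d - p_u_and_d

disjoint = (p_u_and_d == 0)            # e. disjoint events share no outcomes
independent = (p_d_given_u == p_d)     # f. independent if P(D | U) = P(D)

print(p_u_and_d, p_d, p_u_given_d, p_u_or_d)  # 7/25 17/50 14/17 23/50
print(disjoint, independent)                   # False False
```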
HW #4: 6.45-6.50, 6.53, 54, 56, 58, 60, 61
Thursday, November 8: Review 6.1-6.6, 6.7 Using Simulation to Estimate Probabilities
In a certain city, 45% of registered voters are Republicans, 40% are Democrats and the rest are
Independents. Also, 40% of the Republicans are women, 55% of the Democrats are women and 60% of
the Independents are women.
Make a tree diagram to display this information.
Suppose you randomly selected one registered voter. What is the probability that you choose:
a. person who is Republican and a woman
b. person who is a woman
c. person who is male or an Independent
d. person who is a Republican if you know he is a male
Are being a woman and being a Democrat independent? Justify.
Is it possible to use a table to solve this problem?
In general, if you are given counts, then a table (or Venn diagram) is easier to use. But, if you are given
conditional probabilities, then a tree diagram is usually better.
In a survey of 1200 college students, 380 said they had tried smoking, 800 had tried drinking, and
315 had tried both.
What is the probability that a randomly selected member of the sample:
a. has tried neither?
b. who has tried drinking has also tried smoking?
c. who has smoked hasn’t had a drink?
d. has tried at least one of them?
Is smoking independent of drinking? Justify.
Are smoking and drinking disjoint events?
Estimating Probabilities using Simulation (note: different than the book!)
In many cases, it is very difficult (or impossible) to calculate a probability using the rules we have
learned in this chapter so far. In these cases, we must estimate the probabilities empirically (through
repeated observations like spinning a coin) or by performing a simulation. Unfortunately, this will not
give us an exact answer, but remember that the Law of Large Numbers says that the precision of our
estimates increases as the number of observations gets larger.
Suppose a cereal company places one of four toys in every box of cereal. Furthermore, the company
claims that each toy is produced in the same quantity so each of the toys is equally likely to show up in a
randomly chosen box. Suppose I want to get all 4 toys, but it takes me 15 boxes to find each of the four.
Is this evidence that the toys are NOT uniformly distributed? Estimate the probability that it takes 15 or
more boxes to get all 4 toys.
There are 7 steps for a simulation:
1. Identify the component to be repeated.
2. Explain how you will model the outcomes.
3. Explain how you will simulate the trial. A trial (or run) is the sequence of events that we are
pretending will take place.
4. State clearly what the response variable is.
5. Give one example and run several trials. The more trials you run, the more precise your estimate will
be.
6. Analyze the response variable. This is usually accomplished with a table or graph.
7. State your conclusion in the context of the problem.
If it took me 20 boxes to find all 4, would that be convincing evidence against the company?
How else could we model the outcome (step 2)?
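The seven steps can also be carried out by computer. The sketch below is one possible model, not the method used in class: each box is the repeated component, toys are modeled as the integers 0 to 3 drawn with equal probability from a pseudo-random generator, a trial is opening boxes until all 4 toys have appeared, and the response variable is the number of boxes opened. The function names, seed, and number of trials are assumptions for the illustration.

```python
import random

def boxes_needed(rng, num_toys=4):
    """One trial: open boxes until every toy has been seen; return the count."""
    seen = set()
    boxes = 0
    while len(seen) < num_toys:
        seen.add(rng.randrange(num_toys))  # each toy equally likely
        boxes += 1
    return boxes

def estimate_p_at_least(threshold, trials=10_000, seed=1):
    """Estimate P(it takes `threshold` or more boxes to collect all the toys)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if boxes_needed(rng) >= threshold)
    return hits / trials

print(estimate_p_at_least(15))
print(estimate_p_at_least(20))
```

Running more trials makes the estimate more precise, as step 5 notes. A small estimate for 15 or more boxes would count as some evidence against the company's claim; how small is small enough is a judgment call at this point in the course.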
HW #5: 6.62, 63, 65, 66, 68, 69, Simulation worksheet 1-2 (do all 7 steps!)
Tuesday, November 13: 6.7 More Simulations
A person who claims to have ESP says that she can identify the symbol on the back of a card (circle,
triangle, or square) without seeing it. To test this claim, you shuffle the three cards, show her the back
of a randomly selected card, and record if she correctly identifies the symbol. Then, you replace the
card, reshuffle, and repeat this procedure 19 more times. Overall, she identified 10 correctly. Does this
support her claim of ESP, or could this have occurred by chance? Use a table of random digits.
Use a simulation to estimate the probability of getting 10 or more correct just by guessing.
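Here is a sketch of the ESP simulation. Note that it swaps the table of random digits for Python's pseudo-random generator: each guess is modeled as a random integer 0, 1, or 2, with 0 standing for a correct identification, so a pure guesser is right with probability 1/3. The function names, seed, and number of trials are assumptions for the illustration.

```python
import random

def correct_guesses(rng, num_cards=20, num_symbols=3):
    """One trial: count correct answers in num_cards pure guesses."""
    return sum(1 for _ in range(num_cards) if rng.randrange(num_symbols) == 0)

def estimate_p_at_least(threshold=10, trials=10_000, seed=2):
    """Estimate P(threshold or more correct out of 20) by guessing alone."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if correct_guesses(rng) >= threshold)
    return hits / trials

print(estimate_p_at_least())
```

With a table of random digits, the same model could be run by hand, for example by letting digits 1-3 mean "correct", 4-9 mean "incorrect", and skipping 0.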
Suppose that 150 seniors at a particular school (including 19 members of StuGo) signed up for “senior
parking,” which gives each student the right to a particular parking space close to the campus for the
year. Also, suppose that each student’s spot is determined by a lottery. The student body became
suspicious, however, when StuGo members were awarded 5 of the 10 best spots. Is the suspicion
warranted or could this have occurred by chance? Conduct a simulation to estimate the probability of
StuGo members getting at least 5 of the 10 best spots, assuming the lottery is fair.
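One way to simulate the parking lottery is sketched below. It assumes the seniors are numbered 0 to 149 with the 19 StuGo members labeled 0 to 18, and that a fair lottery is equivalent to drawing 10 distinct seniors at random for the 10 best spots. The names, seed, and number of trials are assumptions for the illustration.

```python
import random

def stugo_in_best_spots(rng, seniors=150, stugo=19, best_spots=10):
    """One trial: draw the 10 best spots fairly; count StuGo winners."""
    winners = rng.sample(range(seniors), best_spots)  # without replacement
    return sum(1 for student in winners if student < stugo)

def estimate_p_at_least(threshold=5, trials=20_000, seed=3):
    """Estimate P(StuGo gets `threshold` or more of the 10 best spots)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if stugo_in_best_spots(rng) >= threshold)
    return hits / trials

print(estimate_p_at_least())
```

Sampling without replacement matters here because the same student cannot win two spots.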
HW #6 Simulation Worksheet (3-5), Book: 6.75 (5 runs)
Thursday, November 15: Review chapter 6
Arrange presentations
Here is data from a random sample of 100 CDO students. Are gender and playing a sport independent
for students at CDO?
           Male   Female   Total
Sport      16     14       30
No Sport   32     38       70
Total      48     52       100
Assuming the variables are independent, how likely is it to get a difference this large due to sampling
variability? Use a simulation to estimate this probability. Is it possible that the variables really are
independent?
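One possible simulation, sketched below, is a shuffling approach: if gender and playing a sport are independent, the 30 "sport" labels are equally likely to land on any 30 of the 100 students, so the labels are reassigned at random and the difference in proportions (male minus female) is recomputed each time. Measuring "a difference this large" in either direction, along with the names, seed, and number of trials, is an assumption made for this illustration.

```python
import random

MALES, FEMALES, SPORT = 48, 52, 30

def difference(male_sport):
    """Proportion of males playing a sport minus proportion of females."""
    female_sport = SPORT - male_sport
    return male_sport / MALES - female_sport / FEMALES

observed = difference(16)  # 16/48 - 14/52

def estimate_probability(trials=10_000, seed=4):
    """Estimate P(|difference| >= observed) when the labels are assigned at random."""
    rng = random.Random(seed)
    students = ["M"] * MALES + ["F"] * FEMALES
    hits = 0
    for _ in range(trials):
        athletes = rng.sample(students, SPORT)  # random 30 students play a sport
        simulated = difference(athletes.count("M"))
        if abs(simulated) >= abs(observed) - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / trials

print(observed, estimate_probability())
```

A large estimated probability means a difference like the one observed happens often by chance alone, so the data would be consistent with independence.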
AP Question:
HW #7 Review Worksheet (1-5)
Extra Simulation questions for practice:
Simulation: Pat leaves for work at a random time between 7:00 am and 8:00 am. Pat’s paper arrives at
a random time from 7:30 am to 8:30 am. Design and conduct a simulation (do 20 runs) to estimate the
probability that the paper arrives before Pat leaves for work.
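A sketch of the newspaper simulation follows. It measures time in minutes after 7:00 am and assumes "a random time" means uniformly distributed over the stated hour; it also runs far more than the 20 runs asked for, to show where the estimate settles. The names, seed, and number of runs are assumptions for the illustration.

```python
import random

def paper_arrives_first(rng):
    """One run: True if the paper arrives before Pat leaves."""
    pat_leaves = rng.uniform(0, 60)      # 7:00 to 8:00, minutes after 7:00
    paper_arrives = rng.uniform(30, 90)  # 7:30 to 8:30, minutes after 7:00
    return paper_arrives < pat_leaves

def estimate(runs=100_000, seed=5):
    rng = random.Random(seed)
    return sum(1 for _ in range(runs) if paper_arrives_first(rng)) / runs

print(estimate(runs=20))  # a 20-run estimate can vary a lot from seed to seed
print(estimate())         # many runs
```

Under the uniform assumption the exact answer is 1/8: the paper can only beat Pat when both times fall between 7:30 and 8:00, which is a triangle covering one eighth of the 60-by-60 square of possible (leave, arrive) pairs.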
Simulation: Since only 90% of ticketed passengers actually show up for their flight (on average), it is
the practice of airlines to overbook their flights (sell more seats than they have). On a certain airline,
each plane holds 100 passengers and 108 tickets are sold for each flight. Unfortunately for travelers, if
more than 100 people show up for a particular flight, some of them are turned away. For example, if
103 people show up, then 3 people will be pretty mad!
a. Design and conduct a simulation to estimate the probability that at least one passenger will be
turned away.
b. If plane tickets cost $400 and the airline loses $1000 for each overbooked passenger, how much
extra money does the airline make per flight with this practice?
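The overbooking problem can be sketched as below. The model assumes each of the 108 ticket holders shows up independently with probability 0.90, and, for part b, that "extra money" means the revenue from the 8 additional tickets minus $1000 for each passenger turned away, compared with selling exactly 100 tickets. Both are assumptions for this illustration, as are the names, seed, and number of trials.

```python
import random

SEATS, TICKETS, SHOW_RATE = 100, 108, 0.90
TICKET_PRICE, BUMP_COST = 400, 1000

def passengers_showing_up(rng):
    """One flight: count ticket holders who show up (independent, 90% each)."""
    return sum(1 for _ in range(TICKETS) if rng.random() < SHOW_RATE)

def simulate(trials=10_000, seed=6):
    rng = random.Random(seed)
    flights_with_bump = 0
    total_bumped = 0
    for _ in range(trials):
        bumped = max(passengers_showing_up(rng) - SEATS, 0)
        total_bumped += bumped
        if bumped > 0:
            flights_with_bump += 1
    p_bump = flights_with_bump / trials
    average_bumped = total_bumped / trials
    # Extra revenue from the 8 extra tickets, minus the average cost of bumping.
    extra_profit = (TICKETS - SEATS) * TICKET_PRICE - BUMP_COST * average_bumped
    return p_bump, extra_profit

p_bump, extra_profit = simulate()
print(p_bump, extra_profit)
```

If passengers tend to travel in groups, the independence assumption fails and the real probability of turning someone away could differ from this estimate.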
Monday, November 19: Test Chapter 6
Tuesday, November 20: Project Presentations