Chapter 1. Random events and probability
Definition: A probability is a number between 0 and 1 representing how
likely it is that an event will occur.
Probabilities can be:
1. Frequentist: probability represents the long-run frequency of the event over many repetitions of the experiment,
2. Subjective: probability represents a person’s degree of belief that an
event will occur,
e.g. I think there is an 80% chance it will rain today, written as
P(rain) = 0.80.
Definition: A random experiment is an experiment whose outcome is
not known until it is observed.
Definition: A sample space, Ω, is a set of outcomes of a random
experiment.
Definition: A sample point is an element of the sample space.
Experiment: Toss a coin twice and observe the result.
Sample space: Ω = {HH,HT, TH, TT}
An example of a sample point is: HT
Experiment: Toss a coin twice and count the number of heads.
Sample space: Ω = {0, 1, 2}
Experiment: Toss a coin twice and observe whether the two tosses are the
same (e.g. HH or TT).
Sample space: Ω = {same, different}
Definition: An event is a subset of the sample space. Events will be
denoted by capital letters A,B,C,... .
Note: We say that event A occurs if the outcome of the experiment is one
of the elements in A.
Example: Toss a coin twice. Sample space: Ω = {HH,HT, TH, TT}
Let event A be the event that there is exactly one head.
We write: A =“exactly one head”
Then A = {HT, TH}.
Note: A is a subset of Ω, as in the definition. We write A ⊂ Ω.
Definition: Event A occurs if we observe an outcome that is a member of
the set A.
Note: Ω is a subset of itself, so Ω is an event. The empty set, ∅ = {}, is
also a subset of Ω. This is called the null event: the event with no
outcomes.
Example:
Experiment: throw 2 dice.
Sample space: Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2,
6), . . . , (6, 6)}
Event A = “sum of two faces is 5” = {(1, 4), (2, 3), (3, 2), (4, 1)}
Definition: Let A and B be events on the same sample space Ω: so A ⊂
Ω and B ⊂ Ω.
Definition: The complement of event A is written Ā and is given by
Ā = {s ∈ Ω : s ∉ A}: the set of all outcomes in Ω that are not in A.
Experiment: Pick a person in this class at random.
Sample space: Ω = {all people in class}.
Let event A =“person is male” and event B =“person travelled by bike
today”.
Suppose I pick a male who did not travel by bike. Then event A did
occur, but event B did not.
Venn diagrams are generally useful for up to 3 events, although they
are not used to provide formal proofs. For more than 3 events, the
diagram might not be able to represent all possible overlaps of
events. (This was probably the case for our transport Venn diagram.)
The following properties hold.
For any sets A, B, and C:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
Definition: Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ∅.
This means events A and B cannot happen together. If A happens, it
excludes B from happening, and vice-versa.
Note: Does this mean that A and B are independent?
No: quite the opposite. A EXCLUDES B from happening, so B depends
strongly on whether or not A happens.
Definition: Any number of events A1, A2, …, An are mutually exclusive if
every pair of the events is mutually exclusive: i.e. Ai ∩ Aj = ∅ for all i ≠ j.
Definition: A partition of the sample space Ω is a collection of mutually
exclusive events whose union is Ω: that is, events B1, B2, …, Bm such that
Bi ∩ Bj = ∅ for i ≠ j and B1 ∪ B2 ∪ … ∪ Bm = Ω.
Note: if B1, B2, …, Bm form a partition of Ω, then for any event A, the
events B1 ∩ A, B2 ∩ A, …, Bm ∩ A form a partition of A.
We will see that this is very useful for finding the probability of event A.
This is because it is often easier to find the probability of small ‘chunks’
of A (the partitioned sections) than to find the whole probability of A at
once. The partition idea shows us how to add the probabilities of these
chunks together: see later.
1.4 Probability
The key ingredient in the model for a random experiment is the specification
of the probability of the events. It tells us how likely it is that a particular
event will occur.
Definition: A probability P is a rule (function) which assigns a number to
each event, and which satisfies the following axioms:
1. P(A) ≥ 0 for every event A;
2. P(Ω) = 1;
3. if A1, A2, … are mutually exclusive events (Ai ∩ Aj = ∅ for i ≠ j), then
P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
As a direct consequence of the axioms we have the following properties
for P.
Theorem: Let A and B be events. Then:
1. P(∅) = 0.
2. P(Ā) = 1 − P(A).
3. If A ⊆ B, then P(A) ≤ P(B).
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
5. P(A) = P(A ∩ B) + P(A ∩ B̄).
6. If A1, A2, …, An are n arbitrary events in Ω, then (the inclusion-exclusion formula)
P(A1 ∪ A2 ∪ … ∪ An) = Σ_i P(Ai) − Σ_{1≤i<j≤n} P(Ai ∩ Aj)
+ Σ_{1≤i<j<k≤n} P(Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) P(A1 ∩ A2 ∩ … ∩ An).
7. If A1, A2, …, An is a finite sequence of mutually exclusive events
in Ω (Ai ∩ Aj = ∅ for i ≠ j), then
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An).
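Property 6 (inclusion-exclusion) can be checked numerically. The sketch below (Python) uses the two-dice sample space from the example above; the three events A, B, C are hypothetical choices for illustration, not from the text:

```python
from itertools import combinations

# Sample space: two dice throws; all 36 outcomes equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
prob = lambda event: len(event) / len(omega)

# Three hypothetical events for illustration:
A = {s for s in omega if s[0] + s[1] == 5}   # sum of faces is 5
B = {s for s in omega if s[0] == 1}          # first die shows 1
C = {s for s in omega if s[0] == s[1]}       # doubles

events = [A, B, C]
n = len(events)

# Right-hand side: alternating sums over intersections of size k = 1..n.
rhs = 0.0
for k in range(1, n + 1):
    for subset in combinations(events, k):
        inter = set.intersection(*subset)
        rhs += (-1) ** (k - 1) * prob(inter)

lhs = prob(A | B | C)
print(lhs, rhs)  # the two sides agree (up to floating-point rounding)
```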
Examples of basic probability calculations
300 Australians were asked about their car preferences in 1998. Of the
respondents, 33% had children. The respondents were asked what sort of
car they would like if they could choose any car at all. 13% of
respondents had children and chose a large car. 12% of respondents did
not have children and chose a large car.
Find the probability that a randomly chosen respondent:
(a) would choose a large car;
(b) either has children or would choose a large car (or both).
First formulate events:
C = “respondent has children”, L = “respondent would choose a large car”.
Next write down all the information given:
P(C) = 0.33; P(C ∩ L) = 0.13; P(C̄ ∩ L) = 0.12.
(a) Asked for P(L). Since C and C̄ partition Ω,
P(L) = P(C ∩ L) + P(C̄ ∩ L) = 0.13 + 0.12 = 0.25.
(b) Asked for P(L ∪ C):
P(L ∪ C) = P(L) + P(C) − P(L ∩ C) = 0.25 + 0.33 − 0.13 = 0.45.
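A quick numeric check of (a) and (b) in Python; the percentages are those given in the text, and the names C (children) and L (large car) are just labels:

```python
# Car-survey probabilities (values from the text).
p_C = 0.33            # has children
p_C_and_L = 0.13      # children and chose a large car
p_notC_and_L = 0.12   # no children and chose a large car

# (a) C and not-C partition the sample space, so sum the two chunks.
p_L = p_C_and_L + p_notC_and_L

# (b) Addition rule: P(L or C) = P(L) + P(C) - P(L and C).
p_L_or_C = p_L + p_C - p_C_and_L

print(round(p_L, 2), round(p_L_or_C, 2))  # 0.25 0.45
```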
Respondents were also asked their opinions on car reliability and fuel
consumption. 84% of respondents considered reliability to be of high
importance, while 40% considered fuel consumption to be of high
importance.
Formulate events:
R = “considers reliability of high importance”,
F = “considers fuel consumption of high importance”.
Information given: P(R) = 0.84 P(F) = 0.40.
(d) We cannot calculate P(R∩F) from the information given.
(e) Given the further information that 12% of respondents considered
neither reliability nor fuel consumption to be of high importance, find P(R
∪F) and P(R∩F).
P(R ∪ F) = 1 − P(R̄ ∩ F̄) = 1 − 0.12 = 0.88: the probability that the
respondent considers either reliability or fuel consumption, or both, of
high importance.
P(R ∩ F) = P(R) + P(F) − P(R ∪ F) = 0.84 + 0.40 − 0.88 = 0.36: the
probability that the respondent considers BOTH reliability AND fuel
consumption of high importance.
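A quick Python check of part (e), using the percentages given in the text:

```python
# Reliability/fuel-consumption survey (values from the text).
p_R, p_F = 0.84, 0.40
p_neither = 0.12  # considers neither of high importance

# "Neither R nor F" is the complement of "R or F".
p_R_or_F = 1 - p_neither

# Rearranging the addition rule gives the intersection.
p_R_and_F = p_R + p_F - p_R_or_F

print(round(p_R_or_F, 2), round(p_R_and_F, 2))  # 0.88 0.36
```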
1.5 Conditional Probability
Conditioning is another of the fundamental tools of probability: probably
the most fundamental tool. It is especially helpful for calculating the
probabilities of intersections, such as P(A|B), which themselves are
critical for the useful Partition Theorem.
Additionally, the whole field of stochastic processes is based on the idea
of conditional probability. What happens next in a process depends, or is
conditional, on what has happened beforehand.
Dependent events Suppose A and B are two events on the same sample
space. There will often be dependence between A and B. This means that
if we know that B has occurred, it changes our knowledge of the chance
that A will occur.
Example: Toss a die once. Let A = “the number is a 6” and B = “the
number is even”. Before the toss, P(A) = 1/6.
However, if we know that B has occurred, then there is an increased
chance that A has occurred: P(A|B) = 1/3.
Conditioning as reducing the sample space
The car survey in Examples of basic probability calculations also asked
respondents which they valued more highly in a car: ease of parking, or
style/prestige. Here are the responses:
Suppose we pick a respondent at random from all those in the table. Let
event A = “respondent thinks that prestige is more important”.
However, this probability differs between males and females. Suppose we
reduce our sample space from Ω = {all respondents in the table} to the set
of male respondents only.
This is our definition of conditional probability:
Definition: Let A and B be two events with P(B) > 0. The conditional
probability that event A occurs, given that event B has occurred, is
written P(A|B), and is given by
P(A|B) = P(A ∩ B) / P(B).
Note: Follow the reasoning above carefully. It is important to understand
why the conditional probability is the probability of the intersection
within the new sample space, B.
Conditioning on event B means changing the sample space to B.
Think of P(A|B) as the chance of getting an A, from the set of B's only.
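The reduced-sample-space idea can be illustrated by counting, using the two-coin-toss sample space from earlier. Event B here (“first toss is a head”) is a hypothetical choice for illustration:

```python
# Conditional probability by counting within the reduced sample space.
# Sample space: two coin tosses, all outcomes equally likely.
omega = ["HH", "HT", "TH", "TT"]

A = {s for s in omega if s.count("H") == 1}  # exactly one head
B = {s for s in omega if s[0] == "H"}        # hypothetical event: first toss is a head

# P(A|B): restrict attention to the outcomes in B only.
p_A_given_B = len(A & B) / len(B)

# Same answer from the definition P(A|B) = P(A and B) / P(B).
p = (len(A & B) / len(omega)) / (len(B) / len(omega))

print(p_A_given_B, p)  # 0.5 0.5
```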
The Multiplication Rule
For any events A and B,
P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A).
New statement of the Partition Theorem
The Multiplication Rule gives us a new statement of the Partition
Theorem: if B1, B2, …, Bm form a partition of Ω, then for any event A,
P(A) = Σ_i P(A ∩ Bi) = Σ_i P(A|Bi) P(Bi).
Both formulations of the Partition Theorem are very widely used, but
especially the conditional formulation Σ_i P(A|Bi) P(Bi).
Examples of conditional probability and partitions
Tom gets the bus to campus every day. The bus is on time with
probability 0.6, and late with probability 0.4.
The sample space can be written as Ω = {on time, late}.
We can formulate events as follows:
T = “on time”; L = “late”.
From the information given, the events have probabilities:
P(T) = 0.6; P(L) = 0.4.
(a) Do the events T and L form a partition of the sample space? Explain
why or why not.
Yes: they cover all possible journeys (probabilities sum to 1), and there is
no overlap in the events by definition.
The buses are sometimes crowded and sometimes noisy, both of which
are problems for Tom as he likes to use the bus journeys to do his Stats
assignments. When the bus is on time, it is crowded with probability 0.5.
When it is late, it is crowded with probability 0.7. The bus is noisy with
probability 0.8 when it is crowded, and with probability 0.4 when it is not
crowded.
(b) Formulate events C and N corresponding to the bus being crowded
and noisy. Do the events C and N form a partition of the sample space?
Explain why or why not.
Let C = “crowded”, N =“noisy”.
C and N do NOT form a partition of Ω. It is possible for the bus to be
noisy when it is crowded, so there must be some overlap between C and
N.
(c) Write down probability statements corresponding to the information
given above. Your answer should involve two statements linking C with T
and L, and two statements linking N with C:
P(C|T) = 0.5; P(C|L) = 0.7;
P(N|C) = 0.8; P(N|C̄) = 0.4.
(d) Find the probability that the bus is crowded.
T and L partition Ω, so by the Partition Theorem,
P(C) = P(C|T)P(T) + P(C|L)P(L) = 0.5 × 0.6 + 0.7 × 0.4 = 0.58.
(e) Find the probability that the bus is noisy.
C and C̄ partition Ω, so
P(N) = P(N|C)P(C) + P(N|C̄)P(C̄) = 0.8 × 0.58 + 0.4 × 0.42 = 0.632.
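Parts (d) and (e) can be sketched in Python, with all probabilities taken from the text:

```python
# Tom's bus: the Partition Theorem applied twice (probabilities from the text).
p_T, p_L = 0.6, 0.4                     # on time / late
p_C_given_T, p_C_given_L = 0.5, 0.7     # crowded given on time / late
p_N_given_C, p_N_given_notC = 0.8, 0.4  # noisy given crowded / not crowded

# (d) T and L partition the sample space.
p_C = p_C_given_T * p_T + p_C_given_L * p_L

# (e) C and not-C partition the sample space.
p_N = p_N_given_C * p_C + p_N_given_notC * (1 - p_C)

print(round(p_C, 3), round(p_N, 3))  # 0.58 0.632
```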
Bayes' Theorem: inverting conditional probabilities
Let A and B be events with P(A) > 0 and P(B) > 0. By the Multiplication
Rule, P(A ∩ B) = P(B|A)P(A) = P(A|B)P(B). Dividing through by P(A) gives
P(B|A) = P(A|B) P(B) / P(A).
This is the simplest form of Bayes' Theorem, named after Thomas Bayes
(1702-61), English clergyman and founder of Bayesian Statistics.
Bayes' Theorem allows us to “invert” the conditioning, i.e. to express
P(B|A) in terms of P(A|B).
This is very useful. For example, it might be easy to calculate
P(later event | earlier event),
but we might only observe the later event and wish to deduce the
probability that the earlier event occurred,
P(earlier event | later event).
Full statement of Bayes' Theorem: let B1, B2, …, Bm form a partition of
Ω, and let A be any event with P(A) > 0. Then, for each j = 1, …, m,
P(Bj|A) = P(A|Bj) P(Bj) / Σ_{i=1}^{m} P(A|Bi) P(Bi).
Example: The case of the Perfidious Gardener.
Mr Smith owns a hysterical rosebush. It will die with probability 1/2 if
watered, and with probability 3/4 if not watered. Worse still, Smith
employs a perfidious gardener who will fail to water the rosebush with
probability 2/3. Smith returns from holiday to find the rosebush . . .
DEAD!!
What is the probability that the gardener did not water it?
Let D = “rosebush is dead” and W = “gardener watered it”. We are given
P(D|W) = 1/2, P(D|W̄) = 3/4 and P(W̄) = 2/3. By Bayes' Theorem,
P(W̄|D) = P(D|W̄)P(W̄) / (P(D|W̄)P(W̄) + P(D|W)P(W))
= (3/4 × 2/3) / (3/4 × 2/3 + 1/2 × 1/3) = (1/2) / (2/3) = 3/4.
So the gardener failed to water the rosebush with probability 3/4.
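A sketch of the calculation in Python, using exact fractions and the probabilities from the text:

```python
from fractions import Fraction as F

# The perfidious gardener (probabilities from the text).
p_notW = F(2, 3)          # gardener fails to water
p_W = 1 - p_notW
p_D_given_W = F(1, 2)     # rosebush dies if watered
p_D_given_notW = F(3, 4)  # rosebush dies if not watered

# Partition Theorem for the denominator, then Bayes' Theorem.
p_D = p_D_given_W * p_W + p_D_given_notW * p_notW
p_notW_given_D = p_D_given_notW * p_notW / p_D

print(p_notW_given_D)  # 3/4
```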
Example: The case of the Defective Ketchup Bottle.
Ketchup bottles are produced in 3 different factories, accounting for 50%,
30%, and 20% of the total output respectively. The percentage of
defective bottles from the 3 factories is respectively 0.4%, 0.6%, and
1.2%. A statistics lecturer who eats only ketchup finds a defective bottle
in her wig. What is the probability that it came from Factory 1?
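A sketch of the Bayes calculation for the ketchup example in Python; the factory shares and defect rates are those given in the text, and the posterior works out to about 0.32:

```python
# Defective ketchup bottle: Bayes' Theorem with a three-way partition.
priors = [0.5, 0.3, 0.2]               # factory shares of total output
defect_rates = [0.004, 0.006, 0.012]   # defective fraction per factory

# P(defective) via the Partition Theorem.
p_defect = sum(p * d for p, d in zip(priors, defect_rates))

# Bayes' Theorem: posterior probability the bottle came from Factory 1.
p_factory1 = priors[0] * defect_rates[0] / p_defect

print(round(p_factory1, 4))  # 0.3226
```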
More than two events
To find P(A1 ∩ A2 ∩ A3), we can apply the multiplication rule successively:
P(A1 ∩ A2 ∩ A3) = P(A3 | A1 ∩ A2) P(A2 | A1) P(A1).
Example: A box contains w white balls and r red balls. Draw 3 balls
without replacement. What is the probability of getting the sequence
white, red, white? By the multiplication rule,
P(WRW) = (w / (w + r)) × (r / (w + r − 1)) × ((w − 1) / (w + r − 2)).
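A sketch of the chain-rule calculation with hypothetical counts w = 4 and r = 3 (not from the text), cross-checked by enumerating all ordered draws:

```python
from fractions import Fraction as F
from itertools import permutations

# Chain-rule answer for the sequence white, red, white,
# with hypothetical counts w = 4 white and r = 3 red balls.
w, r = 4, 3
p_chain = F(w, w + r) * F(r, w + r - 1) * F(w - 1, w + r - 2)

# Cross-check by brute force: enumerate all ordered draws of 3 balls.
balls = ["W"] * w + ["R"] * r
draws = list(permutations(range(len(balls)), 3))
hits = sum(1 for d in draws if [balls[i] for i in d] == ["W", "R", "W"])
p_brute = F(hits, len(draws))

print(p_chain, p_brute)  # 6/35 6/35
```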
1.6 Probabilities from combinatorics: equally likely outcomes
Sometimes, all the outcomes in a discrete finite sample space are equally
likely. This makes it easy to calculate probabilities. If all outcomes are
equally likely, then for any event A,
P(A) = (# outcomes in A) / (# outcomes in Ω).
Example: For a 3-child family, possible outcomes from oldest to youngest
are:
Ω = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}.
Let {p1, p2, …, p8} be a probability distribution on Ω. If every baby is
equally likely to be a boy or a girl, then all of the 8 outcomes in Ω are
equally likely, so p1 = p2 = … = p8 = 1/8.
Event A contains 4 of the 8 equally likely outcomes, so event A occurs
with probability P(A) = 4/8 = 1/2.
Counting equally likely outcomes
The number of permutations, nPr, is the number of ways of selecting r
objects from n distinct objects when different orderings constitute
different choices.
The number of combinations, nCr, is the number of ways of selecting r
objects from n distinct objects when different orderings constitute the
same choice.
Then
nPr = n! / (n − r)!  and  nCr = n! / (r! (n − r)!) = nPr / r!.
Use the same rule on the numerator and the denominator
When P(A) = (# outcomes in A) / (# outcomes in Ω), we can often think
about the problem either with different orderings constituting different
choices, or with different orderings constituting the same choice. The
critical thing is to use the same rule for both numerator and denominator.
Example: (a) Tom has five elderly great-aunts who live together in a tiny
bungalow. They insist on each receiving separate Christmas cards, and
threaten to disinherit Tom if he sends two of them the same picture. Tom
has Christmas cards with 12 different designs. In how many different
ways can he select 5 different designs from the 12 designs available?
Order of cards is not important, so use combinations. Number of ways of
selecting 5 distinct designs from 12 is 12C5 = 12! / (5! 7!) = 792.
(b) The next year, Tom buys a pack of 40 Christmas cards, featuring 10
different pictures with 4 cards of each picture. He selects 5 cards at
random to send to his great-aunts. What is the probability that at least
two of the great-aunts receive the same picture?
Let A = “at least 2 cards have the same picture”. It is easier to find
P(Ā), where Ā = “all 5 cards have different pictures”.
Total number of ordered selections of 5 cards from 40 is 40 × 39 × 38 × 37 × 36.
(Note: order mattered above, so we need order to matter here too.)
Number of ordered selections with all 5 pictures different is
40 × 36 × 32 × 28 × 24: each successive card must avoid the 4 copies of
every picture already chosen.
So P(Ā) = (40 × 36 × 32 × 28 × 24) / (40 × 39 × 38 × 37 × 36) ≈ 0.392.
Thus
P(A) = P(at least 2 cards are the same picture) = 1 − P(Ā) = 1 − 0.392 = 0.608.
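Both card counts can be checked with Python's math.comb and math.perm, using the numbers from the text:

```python
from math import comb, perm

# (a) Unordered choice of 5 distinct designs from 12.
n_ways = comb(12, 5)

# (b) Probability that at least two aunts get the same picture,
# counting ordered selections of 5 cards from the pack of 40
# (10 pictures, 4 copies of each).
total = perm(40, 5)                # 40 * 39 * 38 * 37 * 36
all_diff = 40 * 36 * 32 * 28 * 24  # each new card avoids pictures already used
p_at_least_two_same = 1 - all_diff / total

print(n_ways, round(p_at_least_two_same, 3))  # 792 0.608
```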
1.7 Statistical Independence
Two events A and B are statistically independent if the occurrence of one
does not affect the occurrence of the other: P(A|B) = P(A) and
P(B|A) = P(B). By the Multiplication Rule, this is equivalent to
P(A ∩ B) = P(A) P(B).
We use this as our definition of statistical independence.
For more than two events, we say: events A1, A2, …, An are mutually
independent if, for every sub-collection of the events, the probability of
the intersection equals the product of the individual probabilities.
Statistical independence for calculating the probability of an
intersection
We usually have two choices.
1. IF A and B are statistically independent, then
P(A ∩ B) = P(A) P(B).
2. If A and B are not known to be statistically independent, we usually
have to use conditional probability and the multiplication rule:
P(A ∩ B) = P(A|B) P(B).
This still requires us to be able to calculate P(A|B).
Note: If events are physically independent, then they will also be
statistically independent.
Pairwise independence does not imply mutual independence
Example: A jar contains 4 balls: one red, one white, one blue, and one
red, white & blue. Draw one ball at random.
Let A = “the ball drawn contains red”, B = “contains white”, and
C = “contains blue”. Each colour appears on 2 of the 4 balls, so
P(A) = P(B) = P(C) = 1/2. Any two colours appear together only on the
striped ball, so P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4 = P(A)P(B): the
events are pairwise independent. But P(A ∩ B ∩ C) = 1/4, whereas
P(A)P(B)P(C) = 1/8.
So A, B and C are NOT mutually independent, despite being pairwise
independent.
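The pairwise-but-not-mutual independence can be verified by enumerating the four balls (Python sketch):

```python
from fractions import Fraction as F

# Jar of 4 balls, each listed by the colours it carries.
balls = [{"red"}, {"white"}, {"blue"}, {"red", "white", "blue"}]
prob = lambda event: F(sum(1 for b in balls if event(b)), len(balls))

p_A = prob(lambda b: "red" in b)
p_B = prob(lambda b: "white" in b)
p_C = prob(lambda b: "blue" in b)
p_AB = prob(lambda b: {"red", "white"} <= b)
p_ABC = prob(lambda b: {"red", "white", "blue"} <= b)

print(p_AB == p_A * p_B)         # True: pairwise independent
print(p_ABC == p_A * p_B * p_C)  # False: not mutually independent
```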