Statistics 510: Notes 2
Reading: Sections 2.3, 2.4, 2.7.
I. Wrap-up of Section 2.2
Example 6 from last class: A fashionable country club has
100 members, 30 of whom are lawyers. Rumor has it that
25 of the club members are liars and that 55 are neither
lawyers nor liars. What proportion of the lawyers are liars?
Let \(A\) = set of lawyers, \(B\) = set of liars and \(S\) = set of all members of the country club.
Let the number of members in any set \(Q\) be denoted by \(N(Q)\). The proportion of the lawyers that are liars is equal to \(N(A \cap B)/N(A)\). We are given that
\(N(S) = 100\)
\(N(A) = 30\)
\(N(B) = 25\)
\(N((A \cup B)^C) = 55\)
The last statement implies that \(N(A \cup B) = 100 - 55 = 45\).
To use this information to calculate \(N(A \cap B)\), we can verify with a Venn diagram that
\[ N(A \cup B) = N(A) + N(B) - N(A \cap B). \]
Thus, \(N(A \cap B) = 30 + 25 - 45 = 10\), and the proportion of lawyers that are liars is
\[ \frac{N(A \cap B)}{N(A)} = \frac{10}{30} = \frac{1}{3}. \]
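The counting argument above can be checked mechanically. The sketch below is a minimal illustration that uses only the four counts given in the example; the function name `proportion_of_lawyers_who_are_liars` is hypothetical. It applies the same identity for counts and then confirms the answer on one made-up membership list that matches the given counts (the member numbering is an assumption for illustration only; any list with these counts gives the same answer).

```python
from fractions import Fraction

def proportion_of_lawyers_who_are_liars(n_total, n_lawyers, n_liars, n_neither):
    """Return N(A ∩ B) / N(A) using N(A ∪ B) = N(A) + N(B) - N(A ∩ B)."""
    n_union = n_total - n_neither            # N(A ∪ B) = N(S) - N((A ∪ B)^C)
    n_both = n_lawyers + n_liars - n_union   # solve the identity for N(A ∩ B)
    return Fraction(n_both, n_lawyers)

answer = proportion_of_lawyers_who_are_liars(100, 30, 25, 55)
print(answer)  # 1/3

# Hypothetical membership list consistent with the given counts:
# members are numbered 0..99; lawyers are 0..29 and liars are 20..44.
S = set(range(100))
A = set(range(30))       # 30 lawyers
B = set(range(20, 45))   # 25 liars, 10 of whom (20..29) are lawyers
assert len(S - (A | B)) == 55            # 55 are neither lawyers nor liars
assert Fraction(len(A & B), len(A)) == answer
```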
DeMorgan’s Laws:
Let A and B denote any two events. Use Venn diagrams to
show that
(a) the complement of their intersection is the union of their
complements:
\[ (A \cap B)^C = A^C \cup B^C \]
(b) the complement of their union is the intersection of their
complements:
\[ (A \cup B)^C = A^C \cap B^C \]
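Venn diagrams make DeMorgan's laws plausible; a brute-force check on a small finite sample space is another way to see them. The sketch below assumes an illustrative four-outcome sample space `S` and compares both sides of each law for every pair of events (all 16 × 16 pairs). It is a check, not a proof: it covers only this particular `S`.

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3, 4})  # illustrative sample space (an assumption)

def all_events(space):
    """Every subset of the sample space, i.e., every event."""
    items = sorted(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def complement(event, space):
    return space - event

def de_morgan_holds(space):
    events = all_events(space)
    for A in events:
        for B in events:
            # (a) (A ∩ B)^C = A^C ∪ B^C
            if complement(A & B, space) != complement(A, space) | complement(B, space):
                return False
            # (b) (A ∪ B)^C = A^C ∩ B^C
            if complement(A | B, space) != complement(A, space) & complement(B, space):
                return False
    return True

print(de_morgan_holds(S))  # True
```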
Review of Section 2.2: We have defined the key concepts
of an experiment, the sample space for an experiment and
events in the sample space. We have discussed relations
between events and introduced the Venn diagram as a tool
for examining the relations between events.
The relations between events will be useful for
manipulating probabilities. We now introduce the concept
of the probability of an event.
II. Frequency interpretation of probability (Section 2.3)
The relative frequency of an event is a proportion
measuring how often, or how frequently, the event occurs
in a sequence of experiments.
Example 1: Experiment: Toss a coin. The sample space is \(S = \{\text{heads}, \text{tails}\}\).
If the experiment is repeated many times, the relative frequency of heads will usually be close to ½:
• The French naturalist Count Buffon (1707-1788) tossed a coin 4040 times. Result: 2048 heads, a relative frequency of 2048/4040 = 0.5069 for heads.
• Around 1900, the English statistician Karl Pearson heroically tossed a coin 24,000 times. Result: 12,012 heads, a relative frequency of 0.5005.
• While imprisoned by the Germans during World War II, the South African mathematician John Kerrich tossed a coin 10,000 times. Result: 5067 heads, a relative frequency of 0.5067.
In the frequency interpretation of probability, the probability of an event \(A\) is the expected relative frequency of \(A\) in a large number of trials. In symbols, the proportion of times \(A\) occurs in \(n\) trials, call it \(P_n(A)\), is expected to be roughly equal to the theoretical probability \(P(A)\) if \(n\) is large:
\[ P_n(A) \approx P(A) \text{ for large } n. \]
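A simulation makes the statement \(P_n(A) \approx P(A)\) concrete. The sketch below assumes a fair coin, so \(P(\text{heads}) = 1/2\), and uses Python's pseudo-random generator in place of physical tosses; the seed and the values of \(n\) are arbitrary choices. Typically the printed proportions drift toward 0.5 as \(n\) grows, although any single run can fluctuate.

```python
import random

def relative_frequency_of_heads(n, rng):
    """P_n(heads): proportion of heads in n simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n))  # True counts as 1
    return heads / n

rng = random.Random(2024)  # fixed seed so the run is reproducible
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, rng))
```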
Example 2: Experiment: Observation of the sex of a child. The sample space is \(S = \{\text{girl}, \text{boy}\}\). The following table shows the proportion of boys among live births to residents of the U.S.A. over the 20 years from 1983 to 2002 (Source: Information Please Almanac).
Year   Number of births   Proportion of boys
1983   3,638,933          0.5126648
1984   3,669,141          0.5122425
1985   3,760,561          0.5126849
1986   3,756,547          0.5124035
1987   3,809,394          0.5121951
1988   3,909,510          0.5121931
1989   4,040,958          0.5121286
1990   4,158,212          0.5121179
1991   4,110,907          0.5112054
1992   4,065,014          0.5121992
1993   4,000,240          0.5121845
1994   3,952,767          0.5116894
1995   3,926,589          0.5084196
1996   3,891,494          0.5114951
1997   3,880,894          0.5116337
1998   3,941,553          0.5115255
1999   3,959,417          0.5119072
2000   4,058,814          0.5117182
2001   4,025,933          0.5111665
2002   4,021,726          0.5117154
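To see how stable the proportion is, one can pool all twenty years. The sketch below uses the figures from the table above, weights each year's proportion by its number of births (an approximation, since the published proportions are rounded to seven decimals), and reports the pooled proportion together with the range of the yearly values.

```python
# (year, number of births, proportion of boys), copied from the table above
births = [
    (1983, 3_638_933, 0.5126648), (1984, 3_669_141, 0.5122425),
    (1985, 3_760_561, 0.5126849), (1986, 3_756_547, 0.5124035),
    (1987, 3_809_394, 0.5121951), (1988, 3_909_510, 0.5121931),
    (1989, 4_040_958, 0.5121286), (1990, 4_158_212, 0.5121179),
    (1991, 4_110_907, 0.5112054), (1992, 4_065_014, 0.5121992),
    (1993, 4_000_240, 0.5121845), (1994, 3_952_767, 0.5116894),
    (1995, 3_926_589, 0.5084196), (1996, 3_891_494, 0.5114951),
    (1997, 3_880_894, 0.5116337), (1998, 3_941_553, 0.5115255),
    (1999, 3_959_417, 0.5119072), (2000, 4_058_814, 0.5117182),
    (2001, 4_025_933, 0.5111665), (2002, 4_021_726, 0.5117154),
]

total_births = sum(n for _, n, _ in births)
total_boys = sum(n * p for _, n, p in births)  # approximate number of boys
overall = total_boys / total_births            # births-weighted proportion
low = min(p for _, _, p in births)
high = max(p for _, _, p in births)

print(f"pooled proportion of boys, 1983-2002: {overall:.4f}")
print(f"range of yearly proportions: {low:.4f} to {high:.4f}")
```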
The relative frequency of boys among newborn children in the U.S.A. appears to be stable at around 0.512. This suggests that a reasonable model for the outcome of a single birth is \(P(\text{boy}) = 0.512\) and \(P(\text{girl}) = 0.488\).
This model for births is equivalent to the sex of a child being determined by drawing at random with replacement from a box of 1000 tickets, containing 512 tickets marked boy and 488 tickets marked girl.
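The box model can be simulated directly. The sketch below builds the box of 1000 tickets described above and draws from it with replacement using Python's `random.Random.choices`; the function name `simulate_births`, the seed and the number of draws are illustrative assumptions. The proportion of "boy" tickets drawn should typically come out close to 0.512.

```python
import random

# Box model from the text: 1000 tickets, 512 marked "boy" and 488 marked "girl".
box = ["boy"] * 512 + ["girl"] * 488

def simulate_births(n, rng):
    """Draw n tickets at random WITH replacement; return the proportion of boys."""
    draws = rng.choices(box, k=n)  # choices() samples with replacement
    return draws.count("boy") / n

rng = random.Random(1)  # arbitrary seed, for reproducibility
print(simulate_births(1_000_000, rng))
```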
III. Axioms of Probability (Section 2.3)
The frequency interpretation of probability is the way that many scientists think about what probability represents, but it is hard to turn it into a rigorous mathematical definition of probability.
Kolmogorov (1933) developed an axiomatic definition of probability, which he then showed can be interpreted, in a certain sense, as the limit of the relative frequency in a large number of experiments.
A probability function (measure) on the events in a sample space is a function \(P(E)\) on the events that satisfies the following three axioms:
Axiom 1: \(0 \le P(E) \le 1\) for all events \(E\).
Axiom 2: \(P(S) = 1\), where \(S\) is the sample space.
Axiom 3: For any sequence of mutually exclusive events \(E_1, E_2, \ldots\) (that is, events for which \(E_i \cap E_j = \emptyset\) when \(i \ne j\)),
\[ P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i). \]
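On a finite sample space, a probability function can be specified by assigning a nonnegative weight to each outcome and letting \(P(E)\) be the sum of the weights of the outcomes in \(E\). The sketch below (the helper names and the example weights are illustrative assumptions) checks Axioms 1 and 2 and checks additivity for every pair of disjoint events. For a finite \(S\), Axiom 3 reduces to this case: in a mutually exclusive sequence at most \(|S|\) events can be nonempty, an empty event gets probability 0 under this construction, and pairwise additivity extends to finitely many events by induction.

```python
from fractions import Fraction
from itertools import chain, combinations

def make_probability(weights):
    """Return P(E) = sum of the weights of the outcomes in E."""
    return lambda event: sum((weights[o] for o in event), Fraction(0))

def all_events(space):
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def satisfies_axioms(weights):
    S = frozenset(weights)
    P = make_probability(weights)
    events = all_events(S)
    axiom1 = all(0 <= P(E) <= 1 for E in events)
    axiom2 = P(S) == 1
    # Additivity for every pair of mutually exclusive (disjoint) events.
    axiom3 = all(P(E | F) == P(E) + P(F)
                 for E in events for F in events if not (E & F))
    return axiom1 and axiom2 and axiom3

# Illustrative weights on a three-outcome sample space (assumptions).
print(satisfies_axioms({"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}))  # True
print(satisfies_axioms({"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1, 2)}))  # False
```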
We refer to \(P(E)\) as the probability of an event \(E\).
Using these axioms, we shall be able to prove that if an experiment is repeated over and over again, then with probability 1, the proportion of times that a specific event \(E\) occurs converges to \(P(E)\), which is essentially the frequency interpretation of probability. This is called the strong law of large numbers, and we shall prove it in Chapter 8.
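Consequences of axioms:
1. \(P(\emptyset) = 0\).
Proof: Consider the sequence of events \(E_1, E_2, \ldots\), where \(E_1 = S\) and \(E_i = \emptyset\) for \(i > 1\). Then, as the events are mutually exclusive and as \(S = \bigcup_{i=1}^{\infty} E_i\), we have from Axiom 3 that
\[ P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset), \]
implying that \(P(\emptyset) = 0\): since \(P(S) = 1\) is finite and \(P(\emptyset) \ge 0\) by Axiom 1, the infinite sum of the constant \(P(\emptyset)\) can equal 0 only if \(P(\emptyset) = 0\).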
2. For any finite sequence of mutually exclusive events \(E_1, \ldots, E_n\),
\[ P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i). \]
Proof: Let \(E_i = \emptyset\) for \(i > n\). The result follows from Axiom 3 combined with the fact established above that \(P(\emptyset) = 0\).
IV. Examples of probability functions
Example 3: If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have
\[ P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}. \]
The probability of rolling an even number would equal, from Axiom 3,
\[ P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \frac{1}{2}. \]
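The same reasoning can be written out with exact fractions. This is a minimal sketch: the dictionary `P_face` and the helper `P` are illustrative names, and events are represented as Python sets of faces.

```python
from fractions import Fraction

# Equally likely faces: P({i}) = 1/6 for i = 1, ..., 6.
P_face = {i: Fraction(1, 6) for i in range(1, 7)}

def P(event):
    """Probability of an event (a set of faces), by additivity over its outcomes."""
    return sum((P_face[i] for i in event), Fraction(0))

print(P({2, 4, 6}))  # 1/2
```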
Example 4: A die is loaded in such a way that the
probability of any particular face’s showing is directly
proportional to the number on that face. What is the
probability that an even number appears?
Solving this requires Axiom 2, \(P(S) = 1\). The experiment – tossing a die – generates a sample space containing six outcomes. But the six are not equally likely: by assumption,
\[ P(\text{face } i \text{ appears}) = P(i) = ki, \quad i = 1, \ldots, 6, \]
where \(k\) is a constant. From Axiom 2,
\[ \sum_{i=1}^{6} P(\text{face } i \text{ appears}) = \sum_{i=1}^{6} ki = k \cdot \frac{6(6+1)}{2} = 21k = 1, \]
which implies that \(k = 1/21\) and
\[ P(\text{face } i \text{ appears}) = \frac{i}{21}. \]
It follows then from Axiom 3 that the probability that an even number appears is
\[ P(\text{even number}) = P(2) + P(4) + P(6) = \frac{2}{21} + \frac{4}{21} + \frac{6}{21} = \frac{12}{21} = \frac{4}{7}. \]
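The loaded-die calculation can also be carried out with exact fractions. The sketch below is a minimal illustration (the names `faces`, `P_face` and `P` are assumptions): it determines the constant \(k\) from Axiom 2 and then adds the probabilities of the even faces. `Fraction` keeps \(12/21\) exact and prints it in lowest terms.

```python
from fractions import Fraction

faces = range(1, 7)
k = Fraction(1, sum(faces))           # Axiom 2: k * (1 + 2 + ... + 6) = 21k = 1
P_face = {i: k * i for i in faces}    # P(i) = k*i, proportional to the face value

def P(event):
    """Probability of an event (a set of faces) under the loaded die."""
    return sum((P_face[i] for i in event), Fraction(0))

print(k)             # 1/21
print(P({2, 4, 6}))  # 4/7  (= 12/21)
```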
V. Probability as a Measure of Belief (Section 2.7)
Another interpretation of probability, besides the frequency
interpretation, is that probability measures an individual’s
belief in the statement that he or she is making. This is
called subjective or personal probability. Consider the
question,
“What is the probability that the Philadelphia Eagles will
win the Super Bowl this year?”
It is hard to interpret such a probability using the frequency
interpretation because the football season can only be
played once. The subjective interpretation of a statement
that the Eagles have a probability of 0.1 of winning the
Super Bowl is that:
• If the person making the statement were offered a chance to play a game in which the person was required to pay less than 10 cents to buy into the game and would win $1 if the Eagles win the Super Bowl, then the person would buy into the game.
• By contrast, if the person making the statement were offered a chance to play a game in which the person was required to pay more than 10 cents to buy into the game and would win $1 if the Eagles win the Super Bowl, then the person would not buy into the game.
More generally, if \(E\) is an event, a person's subjective probability \(P(E)\) has the following interpretation: for a game in which the person will be paid $1 if \(E\) occurs, \(P(E)\) is the largest amount of money the person would be willing to pay to buy into the game. Thus, if the person is willing to pay up to 50 cents to buy in, \(P(E) = 0.5\).
Note that this concept of probability is personal: \(P(E)\) may vary from person to person depending on their opinions.
A rational person has a “coherent” system of personal
probabilities: a system is said to be “incoherent” if there
exists some structure of bets such that the bettor will lose
no matter what happens. It can be shown that a coherent
system of personal probabilities requires that the personal
probabilities satisfy Axioms 1, 2 and 3 (for details on this,
see Hogg, McKean and Craig, Introduction to
Mathematical Statistics, Chapter 11.1).
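A small numerical illustration of incoherence, using hypothetical prices that are not from the notes: suppose a person would pay 60 cents for a $1 bet on \(E\) and also 60 cents for a $1 bet on \(E^C\). Axioms 2 and 3 together require \(P(E) + P(E^C) = 1\), but these prices add to 1.20. If the person buys both bets, exactly one of them pays $1 whatever happens, so the person loses 20 cents for sure. The function `net_gain` is a hypothetical helper for this sketch.

```python
from fractions import Fraction

def net_gain(price_E, price_not_E, E_occurs):
    """Net gain (in dollars) from buying a $1 bet on E and a $1 bet on E^C."""
    payoff_E = 1 if E_occurs else 0
    payoff_not_E = 0 if E_occurs else 1
    return payoff_E + payoff_not_E - (price_E + price_not_E)

# Hypothetical incoherent prices: 60 cents on E and 60 cents on E^C.
price_E = price_not_E = Fraction(60, 100)
for E_occurs in (True, False):
    print(E_occurs, net_gain(price_E, price_not_E, E_occurs))  # -1/5 either way

# Coherent prices (60 and 40 cents) leave no sure loss.
print(net_gain(Fraction(60, 100), Fraction(40, 100), True))  # 0
```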
Thus, whether the probability function is interpreted as a
measure of belief or as a long-run relative frequency, its
mathematical properties remain unchanged.
I personally think of probability in terms of the frequentist interpretation, but it is equally valid to view probability as a measure of belief; all results in the course apply equally to both interpretations.
VI. Propositions about Probability Functions Based on the Axioms (Section 2.4)
Proposition 4.1: \(P(E^C) = 1 - P(E)\).
Proof: Because \(E \cup E^C = S\), by Axiom 2 we have
\[ P(E \cup E^C) = P(S) = 1. \]
Because \(E\) and \(E^C\) are mutually exclusive, it follows from Axiom 3 that
\[ P(E \cup E^C) = P(E) + P(E^C). \]
Thus, \(P(E^C) = P(E \cup E^C) - P(E) = 1 - P(E)\).
Example 5: In a certain population, 10% of the people are
rich, 5% are famous and 3% are rich and famous. For a
person picked at random from this population (meaning
that each person has an equal probability of being picked),
what is the chance that the person is not rich?
Proposition 4.2: If \(E \subseteq F\) (meaning that every outcome in \(E\) is contained in \(F\)), then \(P(E) \le P(F)\).
Proof: Note that the event \(F\) may be written in the form
\[ F = E \cup (F \cap E^C), \]
where \(E\) and \(F \cap E^C\) are mutually exclusive. Therefore, by Axiom 3,
\[ P(F) = P(E) + P(F \cap E^C). \]
By Axiom 1, \(P(F \cap E^C) \ge 0\), so that \(P(F) \ge P(E)\).
Furthermore, from the proof of Proposition 4.2, we have the difference rule: if \(E \subseteq F\),
\[ P(F \text{ and not } E) = P(F \cap E^C) = P(F) - P(E). \]
Example 5 continued: For a person picked at random from
the population, what is the chance that the person is rich but
not famous?
Proposition 4.3: \(P(E \cup F) = P(E) + P(F) - P(E \cap F)\).
Proof: The Venn diagram suggests the statement of the proposition is true. More formally, we have from Axiom 3 that
\[ P(E) = P(E \cap F^C) + P(E \cap F) \]
\[ P(F) = P(E \cap F) + P(E^C \cap F) \]
\[ P(E \cup F) = P(E \cap F^C) + P(E \cap F) + P(E^C \cap F) \]
From the first two equations, we have that
\[ P(E \cap F^C) = P(E) - P(E \cap F), \qquad P(E^C \cap F) = P(F) - P(E \cap F). \]
Substituting these expressions in the expression for \(P(E \cup F)\), we conclude that
\[ P(E \cup F) = P(E) + P(F) - P(E \cap F). \]
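The propositions can also be spot-checked numerically. The sketch below uses the loaded die of Example 4 as a test case (an illustrative choice) and checks Proposition 4.1, Proposition 4.2, the difference rule and Proposition 4.3 for every event, or pair of events, on that sample space. Exact fractions make the equality tests exact. Like any finite check, it illustrates the propositions rather than proving them.

```python
from fractions import Fraction
from itertools import chain, combinations

# Loaded die of Example 4, used here only as a test case: P(i) = i/21.
S = frozenset(range(1, 7))
P_face = {i: Fraction(i, 21) for i in S}

def P(event):
    return sum((P_face[i] for i in event), Fraction(0))

events = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))]

for E in events:
    assert P(S - E) == 1 - P(E)                        # Proposition 4.1
    for F in events:
        if E <= F:                                     # E is a subset of F
            assert P(E) <= P(F)                        # Proposition 4.2
            assert P(F - E) == P(F) - P(E)             # difference rule
        assert P(E | F) == P(E) + P(F) - P(E & F)      # Proposition 4.3
print("Propositions 4.1-4.3 hold for all", len(events) ** 2, "pairs of events")
```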
Note: Proposition 4.3 can be extended to provide an expression for \(P(E_1 \cup E_2 \cup \cdots \cup E_n)\); see Proposition 4.4, the inclusion-exclusion identity.
Example 5 continued: What is the chance that the randomly
selected person is either rich or famous?
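One way to work the three parts of Example 5, sketched with exact fractions: writing \(R\) for rich and \(F\) for famous, Proposition 4.1 gives \(P(R^C)\), the difference rule applied to \(R \cap F \subseteq R\) gives \(P(R \cap F^C)\), and Proposition 4.3 gives \(P(R \cup F)\). The variable names are illustrative.

```python
from fractions import Fraction

P_R = Fraction(10, 100)        # rich
P_F = Fraction(5, 100)         # famous
P_R_and_F = Fraction(3, 100)   # rich and famous

p_not_rich = 1 - P_R                          # Proposition 4.1
p_rich_not_famous = P_R - P_R_and_F           # difference rule, since (R ∩ F) ⊆ R
p_rich_or_famous = P_R + P_F - P_R_and_F      # Proposition 4.3

print(p_not_rich, p_rich_not_famous, p_rich_or_famous)  # 9/10 7/100 3/25
```

That is, 0.90 for "not rich", 0.07 for "rich but not famous" and 0.12 for "rich or famous".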
Example 6: Winthrop, a premed student, has been
summarily rejected by all 126 U.S. medical schools.
Desperate, he sends his transcripts and MCATs to the two
least selective campuses he can think of, the two branch
campuses (\(X\) and \(Y\)) of Swampwater Tech. Based on the success his friends have had there, he estimates that his probability of being accepted at \(X\) is 0.7, and at \(Y\), 0.4.
He also suspects that there is a 75% chance that at least one
of his applications will be rejected. What is the probability
that at least one of the schools will accept him?
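One way to solve Example 6, sketched with exact fractions (variable names are illustrative): "at least one rejection" is the event \(X^C \cup Y^C\), which by DeMorgan's laws equals \((X \cap Y)^C\). Proposition 4.1 then gives \(P(X \cap Y) = 1 - 0.75 = 0.25\), and Proposition 4.3 gives \(P(X \cup Y) = 0.7 + 0.4 - 0.25 = 0.85\).

```python
from fractions import Fraction

P_X = Fraction(7, 10)                        # accepted at X
P_Y = Fraction(4, 10)                        # accepted at Y
P_at_least_one_rejected = Fraction(75, 100)

# DeMorgan: "at least one rejection" is X^C ∪ Y^C = (X ∩ Y)^C,
# so by Proposition 4.1, P(X ∩ Y) = 1 - P(at least one rejected).
P_both = 1 - P_at_least_one_rejected
# Proposition 4.3: P(X ∪ Y) = P(X) + P(Y) - P(X ∩ Y).
P_at_least_one_accepted = P_X + P_Y - P_both

print(P_both, P_at_least_one_accepted)  # 1/4 17/20
```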