Statistics 510: Notes 8
Reading: Sections 3.5, 4.1-4.3
I. P ( | F ) is a probability (Chapter 3.5)
The conditional probability P ( | F ) is a probability function
on the events in the sample space S and satisfies the usual
axioms of probability:
(a) 0  P( E | F )  1
(b) P ( S | F )  1
(c) If Ei , i  1, 2, are mutually exclusive events, then

P(
1

Ei | f )   P ( Ei | F )
1
Thus, all the formulas we have derived for manipulating
probabilities in Chapter 2 apply to conditional probabilities.
C
For example, P( E | F )  1  P( E | F ) .
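As a small added illustration (not part of the original notes), the following Python sketch checks this complement rule on the two-fair-dice sample space; the particular events E and F are chosen here purely for illustration.

# Check P(E^c | F) = 1 - P(E | F) on the 36 equally likely outcomes of two
# fair dice. E = "sum is at least 8", F = "first die shows 5" (illustrative
# choices, not from the notes).
from fractions import Fraction

S = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    return Fraction(sum(1 for s in S if event(s)), len(S))

def E(s):   # sum is at least 8
    return s[0] + s[1] >= 8

def F(s):   # first die shows 5
    return s[0] == 5

p_E_given_F = prob(lambda s: E(s) and F(s)) / prob(F)
p_Ec_given_F = prob(lambda s: (not E(s)) and F(s)) / prob(F)

print(p_E_given_F, p_Ec_given_F)     # 2/3 and 1/3
print(p_E_given_F + p_Ec_given_F)    # 1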
Conditional independence: An important concept in
probability theory is that of the conditional independence of
events. We say that events $E_1$ and $E_2$ are conditionally
independent given F if, given that F occurs, the conditional
probability that $E_1$ occurs is unchanged by information as
to whether or not $E_2$ occurs.
More formally, $E_1$ and $E_2$ are said to be conditionally
independent given F if
$$P(E_1 \mid E_2 \cap F) = P(E_1 \mid F)$$
or, equivalently,
$$P(E_1 \cap E_2 \mid F) = P(E_1 \mid F)\, P(E_2 \mid F).$$
Example 1: An insurance company believes that people can
be divided into two classes: those who are accident-prone
and those who are not. Their statistics show that an
accident-prone person will have an accident at some time
within a fixed 1-year period with probability .4, whereas
this probability decreases to .2 for a non-accident-prone
person. Thirty percent of the population is accident-prone.
Consider a two-year period. Assume that the event that a
person has an accident in the first year is conditionally
independent of the event that a person has an accident in
the second year, given whether or not the person is
accident-prone. What is the conditional probability that a
randomly selected person will have an accident in the second
year given that the person had an accident in the first year?
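A minimal Python sketch of this calculation (an added illustration, not from the notes; variable names are ours), using the law of total probability together with the assumed conditional independence:

# A = person is accident-prone; A1, A2 = accident in year 1, year 2.
p_prone = 0.30            # P(A), from the problem statement
p_acc_if_prone = 0.4      # P(A1 | A) = P(A2 | A)
p_acc_if_not = 0.2        # P(A1 | A^c) = P(A2 | A^c)

# Law of total probability for an accident in the first year:
p_A1 = p_prone * p_acc_if_prone + (1 - p_prone) * p_acc_if_not

# Conditional independence of the two years given accident-proneness:
p_A1_A2 = (p_prone * p_acc_if_prone ** 2
           + (1 - p_prone) * p_acc_if_not ** 2)

print(p_A1_A2 / p_A1)     # P(A2 | A1) = 0.076 / 0.26, roughly 0.29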
II. Random Variables
So far, we have been defining probability functions in
terms of the elementary outcomes making up an
experiment’s sample space.
Thus, if two fair dice were tossed, a probability was
assigned to each of the 36 possible pairs of upturned faces:
P((3,2))=1/36, P((2,3))=1/36, P((4,6))=1/36 and so on.
We have seen that in certain situations some attribute of an
outcome may hold more interest for the experimenter than
the outcome itself.
A craps player, for example, may be concerned only that he
throws a 7, not whether the 7 was the result of a 5 and a 2, a
4 and a 3 or a 6 and a 1.
That being the case, it makes sense to replace the
36-member sample space of (x, y) pairs with the more relevant
(and simpler) 11-member set of all possible two-dice sums,
$S = \{x + y : x + y = 2, 3, \ldots, 12\}$.
This redefinition of the sample space not only changes the
number of outcomes in the space (from 36 to 11) but also
changes the probability structure. In the original sample
space, all 36 outcomes are equally likely. In the revised
sample space, the 11 outcomes are not equally likely. The
probability of getting a sum equal to 2 is 1/36[=P((1,1))],
but the probability of getting a sum equal to 3 is
2/36[=P((1,2))+P((2,1))].
In general, rules for redefining sample spaces – like going
from (x,y)’s to (x+y)’s – are called random variables.
As a conceptual framework, random variables are of
fundamental importance: they provide a single rubric under
which all probability problems may be brought. Even in
cases where the original sample space needs no redefinition
– that is, where the measurement recorded is the
measurement of interest – the concept still applies: we
simply take the random variable to be the identity mapping.
Formal definitions for random variables:
A random variable is a real-valued function whose domain is
the sample space S. We denote random variables by
uppercase letters, often X, Y or Z.
A random variable that can take on a finite or at most
countably infinite number of values is said to be discrete; a
random variable that can take on values in an interval of
real numbers, bounded or unbounded, is said to be
continuous.
We will focus on discrete random variables in Chapter 4
and consider continuous random variables in Chapter 5.
Associated with each discrete random variable X is a
probability mass function (pmf) $p(a)$ that gives the
probability that X equals a:
$$p(a) = P\{X = a\} = P(\{s \in S \mid X(s) = a\}).$$
Example 2: Suppose two fair dice are tossed. Let X be the
random variable that is the sum of the two upturned faces.
X is a discrete random variable since it has finitely many
possible values (the 11 integers 2, 3, ..., 12). The
probability mass function of X is
P(X=2)=1/36
P(X=3)=2/36
P(X=4)=3/36
P(X=5)=4/36
P(X=6)=5/36
P(X=7)=6/36
P(X=8)=5/36
P(X=9)=4/36
P(X=10)=3/36
P(X=11)=2/36
P(X=12)=1/36
It is often instructive to present the probability mass
function in a graphical format, plotting $p(x_i)$ on the y-axis
against $x_i$ on the x-axis. See Figure 4.2 in the book.
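The pmf above can also be obtained by direct enumeration of the 36 equally likely outcomes; here is a short Python sketch (an added illustration, not part of the notes):

# Enumerate all outcomes of two fair dice and tabulate the pmf of the sum.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(f"P(X = {s}) = {p}")   # e.g. P(X = 7) = 1/6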
Suppose the random variable X can take on values $x_1, x_2, \ldots$.
Since the probability mass function is a probability function
on the redefined sample space that considers values of X,
we have that
$$\sum_{i=1}^{\infty} P(X = x_i) = 1.$$
[This follows from
$$1 = P(S) = P\left( \bigcup_{i=1}^{\infty} \{X = x_i\} \right) = \sum_{i=1}^{\infty} P(X = x_i).$$]
Example 3: Independent trials, consisting of the flipping of
a coin having probability p of coming up heads, are
continually performed until either a head occurs or a total
of n flips is made. Let X be the random variable that
denotes the number of times the coin is flipped. The
probability mass function for X is
P{ X  1}  P{H }  p
P{ X  2}  P{(T , H )}  (1  p) p
P{ X  3}  P{(T , T , H )}  (1  p) 2 p
P{ X  n  1}  P{(T , T ,
, T , H )}  (1  p) n  2 p
n2
P{ X  n}  P{(T , T ,
n 1
As a check, note that
, T , T ), (T , T ,
n 1
, T , H )}  (1  p) n 1
n
n 1
i 1
i 1
 P{ X  i}   p(1  p)
i 1
 (1  p ) n 1
1  (1  p) n 1 
n 1
 p
  (1  p)
 1  (1  p) 
 1  (1  p) n 1  (1  p) n 1
1
III. Expected Value
Probability mass functions provide a global overview of a
random variable’s behavior. Detail that explicit, though, is
not always necessary – or even helpful. Oftentimes, we
want to focus the information contained in the pmf by
summarizing certain of its features with single numbers.
The first feature of a pmf that we will examine is central
tendency, a term referring to the “average” value of a
random variable.
The most frequently used measure for describing central
tendency is the expected value. For a discrete random
variable X, the expected value is a weighted average of the
possible values X can take on, each value being weighted by
the probability that X assumes it:
$$E[X] = \sum_{x:\, p(x) > 0} x\, p(x).$$
Example 2 continued: The expected value of the random
variable X is
$$E[X] = 2 \cdot \tfrac{1}{36} + 3 \cdot \tfrac{2}{36} + 4 \cdot \tfrac{3}{36} + 5 \cdot \tfrac{4}{36} + 6 \cdot \tfrac{5}{36} + 7 \cdot \tfrac{6}{36} + 8 \cdot \tfrac{5}{36} + 9 \cdot \tfrac{4}{36} + 10 \cdot \tfrac{3}{36} + 11 \cdot \tfrac{2}{36} + 12 \cdot \tfrac{1}{36} = 7.$$
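This value can be checked by a direct computation in Python (an added verification, not part of the notes):

# Recompute E[X] for the sum of two fair dice from its pmf.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
expected = sum(s * Fraction(c, 36) for s, c in counts.items())
print(expected)   # 7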
Another motivation for the definition of the expected value
is provided by the frequency interpretation of probabilities.
The frequency interpretation assumes that if an infinite
sequence of independent replications of an experiment is
performed, then for any event E, the proportion of times E
occurs will be P(E). Now consider a random variable X
that takes on values $x_1, \ldots, x_n$ with probabilities
$p(x_1), \ldots, p(x_n)$. Then the mean value of X over many
repetitions of the experiment will be
$$E[X] = \sum_{x:\, p(x) > 0} x\, p(x).$$
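The frequency interpretation can be illustrated by simulation; the sketch below (an added illustration, not part of the notes) shows the average of many simulated two-dice sums settling near E[X] = 7:

# Simulate many rolls of two fair dice and average the sums.
import random

n_reps = 100_000
total = sum(random.randint(1, 6) + random.randint(1, 6) for _ in range(n_reps))
print(total / n_reps)   # close to 7 for large n_reps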