2.0 Probability Concepts
•  definitions: randomness, parent population, random variable, probability, statistical independence, probability of multiple events, derived random variables
•  the probability density function, pdf
•  the cumulative distribution function, cdf
•  discrete pdf for rolling two dice
•  continuous pdf for a fluorescence decay
Randomness
•  randomness: when repeated measurements vary in an unpredictable way, they are said to be random
•  parent population: the complete set of possible random outcomes, often written as a set, {...}
   §  finite: an enumerable number of outcomes, e.g. tossing a coin {H,T}, rolling a die {1,2,3,4,5,6}, drawing from a shuffled deck of cards
   §  infinite: an infinite number of outcomes
      > continuous: real-numbered values such as time or voltage
      > event trigger: tossing a coin until H is observed
   §  pseudo-infinite: enumerable but very large, e.g. the number of entities in a mole, which is so large it behaves like a real number
•  random variable: an ordered listing of all possible outcomes of the parent population
•  a numeric random variable can be discrete or continuous
•  discrete example: whole numbers, multiples of π
•  continuous example: any real number between 0 and 1
Probability
•  the probability, p, of observing a given value of the random variable is the fraction of a very large number of measurements yielding that outcome
•  the sum of the probabilities of all possible outcomes is 1
Example: A coin is tossed 10,000 times, yielding 5,013 heads and 4,987 tails. Thus, p(H) = 5013/10000 = 0.5013 and p(T) = 0.4987. Note that p(H) + p(T) = 1.
•  theory can often be used: rolling a die has the outcomes {1, ..., 6}, each with the same probability p = 1/6; drawing an ace from a deck of playing cards has p = 4/52
•  for discrete random variables, each outcome has a finite probability
•  for a continuous random variable, only ranges of outcomes have non-zero probability
Example: Consider a continuous random variable, x, which has a uniform probability over the range 0 ≤ x ≤ 1.
p(0.45 ≤ x ≤ 0.55) = (0.55 − 0.45)/(1 − 0) = 0.1
p(0.495 ≤ x ≤ 0.505) = 0.01
...
p(0.49999999995 ≤ x ≤ 0.50000000005) = 0.0000000001
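The frequency interpretation above lends itself to a quick numerical check. Below is a minimal Python sketch (not part of the original slides) that estimates p(H) from repeated simulated coin tosses and confirms that, for a uniform continuous variable, only ranges of x carry probability; the sample size and seed are illustrative choices.

```python
import random

random.seed(1)                # fixed seed so the run is reproducible
n = 10_000

# Coin toss: the fraction of heads estimates p(H); p(H) + p(T) = 1 by construction.
heads = sum(random.random() < 0.5 for _ in range(n))
print(f"p(H) ~ {heads / n:.4f}, p(T) ~ {(n - heads) / n:.4f}")

# Uniform continuous variable on [0, 1]: only ranges carry probability.
samples = [random.random() for _ in range(n)]
for lo, hi in [(0.45, 0.55), (0.495, 0.505)]:
    frac = sum(lo <= x <= hi for x in samples) / n
    print(f"p({lo} <= x <= {hi}) ~ {frac:.3f}  (exact {hi - lo:.3f})")
```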
Miscellaneous
Measurements are statistically independent when the knowledge that one outcome has been observed does not influence the outcome of a second observation.
Example: tossing a coin
Counter Example: drawing an ace from a deck of cards without replacement
The probability of observing multiple independent events is given by the product of the individual probabilities.
Example: the probability of tossing a coin to observe H and rolling a die to observe 5 is (1/2)×(1/6) = 1/12.
Many random variables used to test hypotheses are derived from one or more random measurements. The manner in which the randomness is transferred during the calculation is of extreme importance.
Example: rolling two dice and summing their face values
Example: computing the area of a circle by measuring its diameter
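As a rough illustration of the product rule and of a derived random variable (a sketch, not part of the original notes), the Python fragment below computes p(H and 5) exactly and enumerates the pdf of the sum of two dice:

```python
from fractions import Fraction
import itertools

# Product rule for independent events: p(H and die shows 5) = (1/2) * (1/6).
p_H, p_5 = Fraction(1, 2), Fraction(1, 6)
print("p(H and 5) =", p_H * p_5)                    # 1/12

# Derived random variable: the sum of two dice, built by enumerating all 36 outcomes.
counts = {}
for r, b in itertools.product(range(1, 7), repeat=2):
    counts[r + b] = counts.get(r + b, 0) + 1
pdf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
print(pdf)                                          # e.g. pdf[7] == Fraction(1, 6)
```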
Probability Functions
The probability density function, f(x), describes how probability is distributed over the random variable. For a discrete random variable, p(x) = f(x), and f(x) has no units. For a continuous random variable, dp(x) = f(x)dx, and f(x) has units of x⁻¹.
The cumulative distribution function, F(x), describes how probability accumulates as the range of allowed outcomes is increased. The accumulation starts with the first enumerated outcome of the random variable. F(x) has units of probability for both discrete and continuous random variables.
F(x) is described as a sum for discrete random variables and an integral for continuous random variables.
$$F(m) = \sum_{x=a}^{m} f(x), \qquad a \le m \le b \quad \text{(discrete)}$$

$$F(m) = \int_{a}^{m} f(x)\,dx, \qquad a \le m \le b \quad \text{(continuous)}$$
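As a small illustration of both definitions (a sketch assuming a fair die and a uniform pdf, not an example from the slides), the cdf can be built by summing a discrete pdf or numerically integrating a continuous one:

```python
from fractions import Fraction

def discrete_cdf(pdf, m):
    """F(m): sum of f(x) over all enumerated outcomes x <= m."""
    return sum(p for x, p in pdf.items() if x <= m)

def continuous_cdf(f, a, m, n=10_000):
    """F(m): numerical integral of f from a to m (simple midpoint rule)."""
    dx = (m - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Discrete example: a fair die, f(x) = 1/6 for x in {1, ..., 6}.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(discrete_cdf(die, 3))                        # 1/2 (= 3/6)

# Continuous example: the uniform pdf f(x) = 1 on 0 <= x <= 1 (units of x^-1).
print(continuous_cdf(lambda x: 1.0, 0.0, 0.55))    # ~0.55
```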
Discrete PDF - Two Dice
Consider an experiment where a red die and a blue die are rolled. The value of the blue die will be subtracted from the value of the red die. The possible outcomes are x = {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}.
Probability can be computed by counting the number of ways the value of each outcome can be obtained. -5 can be obtained only one way, (1 − 6), thus f(-5) = 1×(1/6)×(1/6) = 1/36. 0 can be obtained six ways, (1 − 1) ... (6 − 6), thus f(0) = 6/36.
The pdf can also be written as a two-part function:

$$f(x) = \frac{6 + x}{36}, \quad -5 \le x \le 0 \qquad\qquad f(x) = \frac{6 - x}{36}, \quad 0 < x \le 5$$

[Figure: the discrete pdf f(x) versus x for the difference of two dice, rising from f(-5) = 1/36 to a peak of f(0) = 6/36 and falling back to f(5) = 1/36; vertical axis 0.00 to 0.20.]
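This pdf is easy to verify by brute-force enumeration of the 36 equally likely (red, blue) pairs; the short Python check below (an illustrative sketch, not from the slides) reproduces f(-5) = 1/36, f(0) = 6/36, and the two-part formula:

```python
from fractions import Fraction
import itertools

# Enumerate red - blue for all 36 equally likely (red, blue) pairs.
pdf = {}
for red, blue in itertools.product(range(1, 7), repeat=2):
    x = red - blue
    pdf[x] = pdf.get(x, Fraction(0)) + Fraction(1, 36)

print(pdf[-5], pdf[0])                  # 1/36 and 1/6 (i.e. 6/36)

# Check the two-part form: (6 + x)/36 for x <= 0, (6 - x)/36 for x > 0.
for x in range(-5, 6):
    assert pdf[x] == Fraction(6 - abs(x), 36)
print("two-part pdf verified")
```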
Discrete CDF - Two Dice
The cdf can be obtained by summing up the individual probabilities, starting with -5:
F(-5) = 1/36
F(-4) = 3/36
F(-3) = 6/36
...
F(0) = 21/36
...
F(5) = 36/36

[Figure: the cdf F(x) versus x for the difference of two dice, climbing from 1/36 at x = -5 to 1.00 at x = 5.]
The cdf can also be written as a two-part function.
$$F(m) = \sum_{x=-5}^{m} \frac{6 + x}{36} = \frac{1}{36}\sum_{y=1}^{m+6} y = \frac{(m + 6)(m + 7)}{72}, \qquad -5 \le m \le 0$$

$$F(m) = \frac{42}{72} + \sum_{x=1}^{m} \frac{6 - x}{36} = \frac{42}{72} + \sum_{x=1}^{m} \frac{6}{36} - \sum_{x=1}^{m} \frac{x}{36} = \frac{m(11 - m) + 42}{72}, \qquad 0 < m \le 5$$
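A quick way to confirm both closed forms (a minimal sketch, not from the original notes) is to accumulate the pdf term by term and compare against the formulas for every m:

```python
from fractions import Fraction

# pdf of the difference of two dice, from the previous slide.
f = {x: Fraction(6 - abs(x), 36) for x in range(-5, 6)}

def F(m):
    """Cumulative distribution: sum the pdf from -5 up to m."""
    return sum(f[x] for x in range(-5, m + 1))

def F_closed(m):
    """Two-part closed form from the slide."""
    if m <= 0:
        return Fraction((m + 6) * (m + 7), 72)
    return Fraction(m * (11 - m) + 42, 72)

for m in range(-5, 6):
    assert F(m) == F_closed(m)
print(F(-5), F(0), F(5))    # 1/36  7/12 (= 21/36)  1 (= 36/36)
```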
Probability Calculation - Two Dice
The cdf is used to compute probability over an interval of the random variable. The probability that x falls in the range a < x ≤ b is given by F(b) − F(a).
What is the probability that the difference between two dice will have a value of -1, 0, or +1?
$$p(-2 < x \le 1) = F(1) - F(-2)$$

$$F(1) = \frac{1(11 - 1) + 42}{72} = \frac{52}{72} \qquad\qquad F(-2) = \frac{(-2 + 6)(-2 + 7)}{72} = \frac{20}{72}$$

$$p(-2 < x \le 1) = \frac{52 - 20}{72} = \frac{32}{72} = \frac{16}{36}$$
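The same answer can be checked in two independent ways, from the closed-form cdf and by direct counting of outcomes; the snippet below is an illustrative cross-check, not part of the original slides:

```python
from fractions import Fraction

def F(m):
    """Two-part cdf for the difference of two dice (from the previous slide)."""
    return Fraction((m + 6) * (m + 7), 72) if m <= 0 else Fraction(m * (11 - m) + 42, 72)

# p(-2 < x <= 1) = F(1) - F(-2)
p = F(1) - F(-2)
print(p)                              # 4/9, i.e. 16/36

# Cross-check by direct counting: |red - blue| <= 1 happens for 16 of the 36 pairs.
count = sum(abs(r - b) <= 1 for r in range(1, 7) for b in range(1, 7))
print(Fraction(count, 36))            # 4/9
```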
Continuous PDF - Exponential Decay
Consider an exponential fluorescence decay that has a lifetime, τ. The intensity of fluorescence as a function of time is normally written as
I(t) = I0×exp(-t/τ)
where I0 is the intensity at t = 0. Now suppose photons are measured and we wish to compute the probability of observing photons at various times, 0 ≤ t ≤ ∞. We first need to write the probability density function, remembering that f(t) has to have units of t⁻¹ in order that f(t)dt be unitless. Note that τ has to have units of t so that the exponent is unitless.
$$f(t) = \frac{1}{\tau}\exp(-t/\tau), \qquad 0 \le t \le \infty$$

[Figure: the pdf f(t) versus t (ns) for a 5-ns fluorescence decay (τ = 5 ns), decaying exponentially from f(0) = 0.2 ns⁻¹.]
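To see that this pdf is properly normalized, one can integrate it numerically; the sketch below (an illustration, not from the slides, with the integration truncated at 20 lifetimes) uses the τ = 5 ns value from the graph:

```python
import math

tau = 5.0                                       # lifetime in ns (value used on the slide)

def f(t):
    """Exponential-decay pdf, units of ns^-1."""
    return math.exp(-t / tau) / tau

# Numerical check that the pdf integrates to ~1 over 0 <= t < infinity
# (truncated at 20 lifetimes, where the remaining tail is negligible).
n, t_max = 200_000, 20 * tau
dt = t_max / n
total = sum(f((i + 0.5) * dt) for i in range(n)) * dt
print(f"integral of f(t) dt ~ {total:.6f}")     # ~1.000000
print(f"f(0) = {f(0):.3f} ns^-1")               # 0.200, the peak of the 5-ns pdf
```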
Continuous CDF - Exponential Decay
The cdf for an exponential decay can be obtained by integration of
the pdf.
$$F(m) = \frac{1}{\tau}\int_{0}^{m} \exp(-t/\tau)\,dt = \Big[-\exp(-t/\tau)\Big]_{0}^{m} = 1 - \exp(-m/\tau)$$
An important check is to make sure the cdf goes to unity over the
range of the random variable, in this case over the range of 0 to ∞.
F ( ∞ ) = 1 − exp ( −∞ / τ ) = 1 − 0 = 1
[Figure: the cdf F(t) versus t (ns) for a 5-ns fluorescence decay (τ = 5 ns), rising from 0 at t = 0 toward 1 at long times.]
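The analytic cdf can also be checked against a direct numerical integration of the pdf; the comparison below is a minimal sketch assuming τ = 5 ns and a simple midpoint rule:

```python
import math

tau = 5.0                                    # ns

def F(m):
    """Analytic cdf of the exponential decay: 1 - exp(-m/tau)."""
    return 1.0 - math.exp(-m / tau)

def F_numeric(m, n=100_000):
    """Midpoint-rule integration of the pdf from 0 to m, for comparison."""
    dt = m / n
    return sum(math.exp(-(i + 0.5) * dt / tau) / tau for i in range(n)) * dt

for m in (1.0, 5.0, 16.0, 50.0):
    print(f"F({m:4.1f}) = {F(m):.4f}   numeric = {F_numeric(m):.4f}")
# F(m) -> 1 as m -> infinity, confirming the cdf accumulates to unity.
```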
Probability Calculation - Decay
When using time-filtered detection of fluorescence, it is important to collect as many photons as possible. The fraction of photons collected directly affects the sensitivity of the method.
Two interferences prevent one from collecting all of the photons. At short times, Rayleigh and Raman scatter will be erroneously added to the signal. In contrast, photomultiplier dark counts will be distributed evenly over time. Thus, the temporal filter might start at 1 ns and end 3τ later. The cdf can be used to compute the fraction of the fluorescence collected. Do the calculation with τ = 5 ns.
$$p(1 < t \le 3\times 5 + 1) = F(16) - F(1) = \big[1 - \exp(-16/5)\big] - \big[1 - \exp(-1/5)\big]$$

$$= \exp(-1/5) - \exp(-16/5) = 0.82 - 0.04 = 0.78$$
Note that where you start has a greater impact on the sensitivity than
where you end. This means that a temporally narrower gate following
a laser pulse will dramatically improve the measurement.
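The gate calculation, and the claim that the start time matters more than the end time, can be reproduced with a few lines of Python (an illustrative sketch; the alternative gate edges of 0.5 ns and 21 ns are assumed values chosen to show the trend):

```python
import math

tau = 5.0                                                  # ns

def F(t):
    """cdf of the exponential decay."""
    return 1.0 - math.exp(-t / tau)

# Fraction of fluorescence photons collected in a gate from 1 ns to 1 + 3*tau = 16 ns.
p = F(16.0) - F(1.0)
print(f"p(1 < t <= 16) = {p:.2f}")                         # ~0.78

# Moving the gate start earlier gains much more than extending the gate end:
print(f"start at 0.5 ns: {F(16.0) - F(0.5):.3f}")          # ~0.86, a large gain
print(f"end at 21 ns   : {F(21.0) - F(1.0):.3f}")          # ~0.80, a marginal gain
```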