1 - Introduction
2 - Exploratory Data Analysis
3 - Probability Theory
4 - Classical Probability Distributions
5 - Sampling Distributions / Central Limit Theorem
6 - Statistical Inference
7 - Correlation and Regression
(8 - Survival Analysis)
1
What is the connection between
probability and random variables?
Events (and their corresponding
probabilities) that involve
experimental measurements can
be described by random variables.
2
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values xi      Probabilities p(xi)
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
Total              1

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…., xn

Data values xi     Relative Frequencies p(xi) = fi / n
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
xk                 p(xk)
Total              1
3
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Probability Histogram (Total Area = 1)
Over each population value x, the bar has height f(x) (“Density”) and width $\Delta x$, so its area is the probability:
$p(x) = f(x)\,\Delta x$

Pop values x       Probabilities p(x)
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
Total              1

p(x) = Probability that the random variable X is equal to a specific value x, i.e.,
p(x) = P(X = x)
“probability mass function” (pmf)
Consider the following discrete random variable…
Example: X = “value shown on a single random toss of a fair die (1, 2, 3, 4, 5, 6)”
X is said to be uniformly distributed over the values 1, 2, 3, 4, 5, 6.

Probability Table
x          p(x) = P(X = x)
1          1/6
2          1/6
3          1/6
4          1/6
5          1/6
6          1/6
Total      1

Probability Histogram: six bars, each with Density f(x) = 1/6; Total Area = 1.

“What is the probability of rolling a 4?”
$p(4) = P(X = 4) = 1/6$
5
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Probability Histogram (Total Area = 1)

F(x) = Probability that the random variable X is less than or equal to a specific value x, i.e.,
$F(x) = P(X \le x)$
“cumulative distribution function” (cdf)
Motivation ~ Consider the following discrete random variable…
Example: X = “value shown on a single random toss of a fair die (1, 2, 3, 4, 5, 6)”
X is said to be uniformly distributed over the values 1, 2, 3, 4, 5, 6.

Cumulative distribution
x          p(x) = P(X = x)     F(x) = P(X ≤ x)
1          1/6                 1/6
2          1/6                 2/6
3          1/6                 3/6
4          1/6                 4/6
5          1/6                 5/6
6          1/6                 1
Total      1
“staircase graph”
from 0 to 1
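The die table can be reproduced in a few lines of R. This is only a sketch: the names `x`, `p`, and `cdf` are illustrative, and the cdf is taken as the running total of the pmf.

```r
# Fair die: population values and pmf p(x) = P(X = x)
x <- 1:6
p <- rep(1/6, 6)

# cdf F(x) = P(X <= x) is the running total of the pmf
cdf <- cumsum(p)

# Probability table with both columns
print(data.frame(x = x, p = p, F = cdf))

# The last cdf value is the total probability
print(cdf[6])
```

Plotting `stepfun(x, c(0, cdf))` would draw the staircase graph described above.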
9
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop vals x     pmf p(x)     cdf F(x) = P(X ≤ x)
x1             p(x1)        F(x1) = p(x1)
x2             p(x2)        F(x2) = p(x1) + p(x2)
x3             p(x3)        F(x3) = p(x1) + p(x2) + p(x3)
⋮              ⋮            ⋮
Total          1            increases from 0 to 1

Calculating “interval probabilities”…
$F(b) = P(X \le b)$
$F(a^-) = P(X \le a^-)$, where $a^-$ is the population value just below a
$F(b) - F(a^-) = P(X \le b) - P(X \le a^-) = P(a \le X \le b) = \sum_{x=a}^{b} p(x)$

FUNDAMENTAL THEOREM OF CALCULUS (discrete form)
$\sum_{x=a}^{b} p(x) = F(b) - F(a^-)$, where $p(x) = f(x)\,\Delta x$. This is the discrete analogue of $\int_a^b f(x)\,dx = F(b) - F(a)$.
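As an illustration of the discrete interval formula, the sketch below uses the fair die from earlier with the assumed endpoints a = 2 and b = 4 (these endpoints are not from the slides). It computes P(a ≤ X ≤ b) both as F(b) − F(a−) and as a direct sum of the pmf.

```r
# Fair die pmf and cdf
x <- 1:6
p <- rep(1/6, 6)
cdf <- cumsum(p)

a <- 2   # assumed lower endpoint (illustrative)
b <- 4   # assumed upper endpoint (illustrative)

# F(b) - F(a-): here a- is a - 1, the die value just below a.
# Indexing cdf by position works only because the values are 1..6.
by_cdf <- cdf[b] - cdf[a - 1]

# Direct sum of p(x) over a <= x <= b
by_sum <- sum(p[x >= a & x <= b])

print(c(by_cdf = by_cdf, by_sum = by_sum))
```

Both routes give 3/6, since three of the six equally likely faces lie in the interval.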
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values x       Probabilities pmf p(x)
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
Total              1

Just as the sample mean $\bar{x}$ and sample variance $s^2$ were used to characterize “measure of center” and “measure of spread” of a dataset, we can now define the “true” population mean $\mu$ and population variance $\sigma^2$, using probabilities.
• Population mean
$\mu = \sum x\,p(x)$
Also denoted by E[X], the “expected value” of the variable X.
• Population variance
$\sigma^2 = \sum (x - \mu)^2\,p(x)$
13
Example 1: POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values xi      Probabilities p(xi)
210                1/6
240                1/3
270                1/2
Total              1

$\mu = \sum x\,p(x) = (210)(1/6) + (240)(1/3) + (270)(1/2) = 250$
$\sigma^2 = \sum (x - \mu)^2\,p(x) = (-40)^2(1/6) + (-10)^2(1/3) + (20)^2(1/2) = 500$
15
Example 2: POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)
Equally likely outcomes result in a “uniform distribution.”

Pop values xi      Probabilities p(xi)
180                1/3
210                1/3
240                1/3
Total              1

$\mu = \sum x\,p(x) = (180)(1/3) + (210)(1/3) + (240)(1/3) = 210$ (clear from symmetry)
$\sigma^2 = \sum (x - \mu)^2\,p(x) = (-30)^2(1/3) + (0)^2(1/3) + (30)^2(1/3) = 600$
16
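The arithmetic in Examples 1 and 2 can be checked with a short R sketch. The helper names `pop_mean` and `pop_var` are hypothetical (they are not built-in R functions); they simply apply the two defining sums.

```r
# Population mean: sum of x * p(x)
pop_mean <- function(x, p) sum(x * p)

# Population variance: sum of (x - mu)^2 * p(x)
pop_var <- function(x, p) sum((x - pop_mean(x, p))^2 * p)

# Example 1
x1 <- c(210, 240, 270); p1 <- c(1/6, 1/3, 1/2)
mu1  <- pop_mean(x1, p1)
var1 <- pop_var(x1, p1)

# Example 2 (uniform distribution)
x2 <- c(180, 210, 240); p2 <- c(1/3, 1/3, 1/3)
mu2  <- pop_mean(x2, p2)
var2 <- pop_var(x2, p2)

print(c(mu1 = mu1, var1 = var1, mu2 = mu2, var2 = var2))
```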
To summarize…
17
POPULATION
Discrete random variable X

Probability Table
Pop values xi      Probabilities pmf p(xi)
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
Total              1

Probability Histogram: Total Area = 1
$\mu = \sum x\,p(x)$
$\sigma^2 = \sum (x - \mu)^2\,p(x)$

SAMPLE of size n: x1, x2, x3, x4, x5, x6, …etc…., xn

Frequency Table
Data xi            Relative Frequencies p(xi) = fi / n
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
xk                 p(xk)
Total              1

Density Histogram: Total Area = 1
$\bar{x} = \sum x\,p(x)$
$s^2 = \frac{n}{n-1} \sum (x - \bar{x})^2\,p(x)$
18
One final example…
20
Example 3: TWO INDEPENDENT POPULATIONS

X1 = Cholesterol level (mg/dL)          X2 = Cholesterol level (mg/dL)
x          p1(x)                        x          p2(x)
210        1/6                          180        1/3
240        1/3                          210        1/3
270        1/2                          240        1/3
Total      1                            Total      1
$\mu_1 = 250$, $\sigma_1^2 = 500$       $\mu_2 = 210$, $\sigma_2^2 = 600$

D = X1 – X2 ~ ???
d          Outcomes
-30        (210, 240)
0          (210, 210), (240, 240)
+30        (210, 180), (240, 210), (270, 240)
+60        (240, 180), (270, 210)
+90        (270, 180)
21
Example 3: TWO INDEPENDENT POPULATIONS
D = X1 – X2 ~ ???
Counting outcomes alone would suggest the following probabilities:
d          Outcomes                                  p(d)
-30        (210, 240)                                1/9 ?
0          (210, 210), (240, 240)                    2/9 ?
+30        (210, 180), (240, 210), (270, 240)        3/9 ?
+60        (240, 180), (270, 210)                    2/9 ?
+90        (270, 180)                                1/9 ?
The outcomes of D are NOT EQUALLY LIKELY!!!
22
Example 3: TWO INDEPENDENT POPULATIONS

X1 = Cholesterol level (mg/dL)          X2 = Cholesterol level (mg/dL)
x          p1(x)                        x          p2(x)
210        1/6                          180        1/3
240        1/3                          210        1/3
270        1/2                          240        1/3
Total      1                            Total      1
$\mu_1 = 250$, $\sigma_1^2 = 500$       $\mu_2 = 210$, $\sigma_2^2 = 600$

D = X1 – X2 ~ ???
d       Outcomes                              Probabilities p(d)
-30     (210, 240)                            (1/6)(1/3) = 1/18 via independence
0       (210, 210), (240, 240)                (1/6)(1/3) + (1/3)(1/3) = 3/18
+30     (210, 180), (240, 210), (270, 240)    (1/6)(1/3) + (1/3)(1/3) + (1/2)(1/3) = 6/18
+60     (240, 180), (270, 210)                (1/3)(1/3) + (1/2)(1/3) = 5/18
+90     (270, 180)                            (1/2)(1/3) = 3/18

Probability Histogram: bars of height 1/18, 3/18, 6/18, 5/18, 3/18 at d = -30, 0, +30, +60, +90.

$\mu_D = (-30)(1/18) + (0)(3/18) + (30)(6/18) + (60)(5/18) + (90)(3/18) = 40$
$\mu_D = \mu_1 - \mu_2$

$\sigma_D^2 = (-70)^2(1/18) + (-40)^2(3/18) + (-10)^2(6/18) + (20)^2(5/18) + (50)^2(3/18) = 1100$
$\sigma_D^2 = \sigma_1^2 + \sigma_2^2$
26
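The distribution of D = X1 – X2 can also be built mechanically. The following is a sketch under the independence assumption of Example 3; the variable names (`d_all`, `p_all`, `p_d`) are illustrative only.

```r
# pmfs of the two independent populations
x1 <- c(210, 240, 270); p1 <- c(1/6, 1/3, 1/2)
x2 <- c(180, 210, 240); p2 <- c(1/3, 1/3, 1/3)

# All nine (x1, x2) pairs: their difference, and their joint
# probability p1 * p2 (a product because of independence)
d_all <- as.vector(outer(x1, x2, "-"))
p_all <- as.vector(outer(p1, p2))

# Add up the probabilities of the pairs that share the same difference d
p_d <- tapply(p_all, d_all, sum)
d   <- as.numeric(names(p_d))

# Mean and variance of D from its pmf
mu_D  <- sum(d * p_d)
var_D <- sum((d - mu_D)^2 * p_d)

print(p_d * 18)              # numerators over 18
print(c(mu_D = mu_D, var_D = var_D))
```

The printed mean and variance can be compared with 250 − 210 and 500 + 600.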
General: TWO INDEPENDENT POPULATIONS
X1 = Cholesterol level (mg/dL), with $\mu_1 = 250$, $\sigma_1^2 = 500$
X2 = Cholesterol level (mg/dL), with $\mu_2 = 210$, $\sigma_2^2 = 600$
D = X1 – X2

Mean (X1 – X2) = Mean (X1) – Mean (X2), i.e., $\mu_D = \mu_1 - \mu_2 = 40$
Var (X1 – X2) = Var (X1) + Var (X2), i.e., $\sigma_D^2 = \sigma_1^2 + \sigma_2^2 = 1100$

IF the two populations are dependent…
…then the formula for the mean still holds, BUT……
Var (X1 – X2) = Var (X1) + Var (X2) – 2 Cov (X1, X2)

These two formulas are valid for continuous as well as discrete distributions.
27
NOTICE TO STAT 324
• Slides 29-41 contain more details on properties of
Expected Values. They are not required for Stat
324, but if you are experiencing difficulty with the
formulas, you may find them of some benefit.
• Special note regarding Slide 41: Similar to the
“alternate computational formula” for sample
variance s2, such a formula also exists for
population variance σ 2, derived there. Stat 324
material picks up with the Binomial Distribution.
28
POPULATION
random variable X
Example: X = Cholesterol level (mg/dL)

Pop values x       Probabilities pmf p(x)
x1                 p(x1)
x2                 p(x2)
x3                 p(x3)
⋮                  ⋮
Total              1

General Properties of “Expectation” of X
Mean: $\mu_X = E[X] = \sum x\,p(x)$
Suppose X is transformed to another random variable, say h(X).
Then by def, $\mu_{h(X)} = E[h(X)] = \sum h(x)\,p(x)$
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
29
General Properties of “Expectation” of X
Suppose X is constant, say b, throughout entire population…
(Pop values b, b, b, … with probabilities p(x1), p(x2), p(x3), …; Total 1)
Then by def, $E[b] = \sum b\,p(x) = b \sum p(x) = b \cdot 1 = b$
Then… $E[b] = b$
31
General Properties of “Expectation” of X
Multiply X by any constant a…
(Pop values a x1, a x2, a x3, … with probabilities p(x1), p(x2), p(x3), …; Total 1)
Then by def, $E[aX] = \sum a\,x\,p(x) = a \sum x\,p(x) = a\,E[X]$
Then… $E[aX] = a\,E[X]$
i.e.,… $\mu_{aX} = a\,\mu_X$
33
General Properties of “Expectation” of X
Add any constant b to X…
(Pop values x1 + b, x2 + b, x3 + b, … with probabilities p(x1), p(x2), p(x3), …; Total 1)
$E[X + b] = \sum (x + b)\,p(x) = \sum x\,p(x) + \sum b\,p(x) = E[X] + E[b]$
Then… $E[X + b] = E[X] + b$
i.e.,… $\mu_{X+b} = \mu_X + b$
35
General Properties of “Expectation” of X
Mean: $\mu_X = E[X] = \sum x\,p(x)$
$E[aX + b] = a\,E[X] + b$
i.e., $\mu_{aX+b} = a\,\mu_X + b$
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
36
General Properties of “Expectation” of X
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
Multiply X by any constant a… then $\mu_X$ is also multiplied by a.
$\sigma_{aX}^2 = E[(aX - a\mu_X)^2] = E[a^2 (X - \mu_X)^2] = a^2\,E[(X - \mu_X)^2] = a^2\,\sigma_X^2$
i.e.,… $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
$\sigma_{aX} = |a|\,\sigma_X$
i.e.,… $\mathrm{SD}(aX) = |a|\,\mathrm{SD}(X)$
38
General Properties of “Expectation” of X
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
Add any constant b to X… then b is also added to $\mu_X$.
$\sigma_{X+b}^2 = E[((X + b) - (\mu_X + b))^2] = E[(X - \mu_X)^2] = \sigma_X^2$
i.e.,… $\mathrm{Var}(X + b) = \mathrm{Var}(X)$
$\sigma_{X+b} = \sigma_X$
i.e.,… $\mathrm{SD}(X + b) = \mathrm{SD}(X)$
40
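A numerical check of these rules, as a sketch only: it reuses the Example 1 population and picks arbitrary constants a = 2 and b = 5 (both are assumptions for illustration). The helper `E` is hypothetical and just applies the definition E[h(X)] = Σ h(x) p(x).

```r
# Example 1 population
x <- c(210, 240, 270)
p <- c(1/6, 1/3, 1/2)

# E[h(X)] = sum of h(x) * p(x); h is any function of x
E <- function(h) sum(h(x) * p)

a <- 2   # assumed multiplier (illustrative)
b <- 5   # assumed shift (illustrative)

mu_X  <- E(function(v) v)
var_X <- E(function(v) (v - mu_X)^2)

# Mean and variance of the transformed variable Y = aX + b
mu_Y  <- E(function(v) a * v + b)
var_Y <- E(function(v) (a * v + b - mu_Y)^2)

print(c(mu_Y = mu_Y, rule = a * mu_X + b))       # E[aX + b] = a E[X] + b
print(c(var_Y = var_Y, rule = a^2 * var_X))      # Var(aX + b) = a^2 Var(X)
```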
General Properties of “Expectation” of X
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
$= E[X^2 - 2X\mu_X + \mu_X^2]$
$= E[X^2] - 2\mu_X E[X] + \mu_X^2 E[1]$
$= E[X^2] - 2\mu_X^2 + \mu_X^2$
$= E[X^2] - \mu_X^2$
41
General Properties of “Expectation” of X
Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = \sum (x - \mu_X)^2\,p(x)$
$\sigma_X^2 = E[X^2] - \mu_X^2 = \sum x^2\,p(x) - \mu_X^2$
$\sigma_X^2 = E[X^2] - (E[X])^2 = \sum x^2\,p(x) - \left(\sum x\,p(x)\right)^2$
This is the analogue of the “alternate computational formula” for the sample variance $s^2$.
42
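The two variance formulas can be compared on the Example 1 population. A minimal sketch, with illustrative variable names:

```r
# Example 1 population
x <- c(210, 240, 270)
p <- c(1/6, 1/3, 1/2)

mu <- sum(x * p)                   # E[X]

var_def <- sum((x - mu)^2 * p)     # definition: E[(X - mu)^2]
var_alt <- sum(x^2 * p) - mu^2     # alternate: E[X^2] - (E[X])^2

print(c(var_def = var_def, var_alt = var_alt))
```

Here E[X²] = 63000 and (E[X])² = 62500, so both routes give 500.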
~ The Binomial Distribution ~
• Used only when dealing with binary outcomes (two categories: “Success” vs. “Failure”), with a fixed probability of Success ($\pi$) in the population.
• Calculates the probability of obtaining any given number of Successes in a random sample of n independent “Bernoulli trials.”
• Has many applications and generalizations, e.g., multiple categories, variable probability of Success, etc.
POPULATION
40% Male, 60% Female
For any randomly selected individual, define a binary random variable:
Y = 1 if Male, with prob $\pi = 0.4$
Y = 0 if Female, with prob $1 - \pi = 0.6$

RANDOM SAMPLE
n = 100
Discrete random variable
X = # Males in sample (0, 1, 2, 3, …, 99, 100)

How can we calculate the probability p(x) = P(X = x), for x = 0, 1, 2, 3, …, 100? That is, P(X = 0), P(X = 1), P(X = 2), …, P(X = 99), P(X = 100)?
And F(x) = P(X ≤ x), for x = 0, 1, 2, 3, …, 100?

Example: How can we calculate the probability p(25) = P(X = 25)?
Solution: Model the sample as a sequence of independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female), where P(H) = 0.4, P(T) = 0.6.
45
How many possible outcomes of n = 100 tosses exist? $2^{100}$
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
(Figure: 100 slots numbered 1, 2, 3, 4, 5, ......, 97, 98, 99, 100.)
X = 25 Heads: { H1, H2, H3,…, H25 }
There are 100 possible open slots for H1 to occupy.
For each one of them, there are 99 possible open slots left for H2 to occupy.
For each one of them, there are 98 possible open slots left for H3 to occupy.
…etc…etc…etc…
For each one of them, there are 77 possible open slots left for H24 to occupy.
For each one of them, there are 76 possible open slots left for H25 to occupy.
Hence, there are $100 \times 99 \times 98 \times \dots \times 77 \times 76$ possible outcomes.
This value is the number of permutations of 25 among 100, denoted 100P25.

This number unnecessarily includes the distinct permutations of the 25 among themselves, all of which have Heads in the same positions.
For example: We would not want to count the same 25 positions, with the labels H1, …, H25 rearranged, as a distinct outcome.
How many is that? By the same logic…... $25 \times 24 \times 23 \times \dots \times 3 \times 2 \times 1$
“25 factorial” - denoted 25!

$\frac{100 \times 99 \times 98 \times \dots \times 77 \times 76}{25 \times 24 \times 23 \times \dots \times 3 \times 2 \times 1} = \frac{100!}{25!\;75!}$
“100-choose-25” - denoted $\binom{100}{25}$ or 100C25
R: choose(100, 25)
Calculator: 100 nCr 25
This value counts the number of combinations of 25 Heads among 100 coins.
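The counting argument can be traced in R. This sketch computes the permutation count as a plain product, divides out the 25! rearrangements, and compares the result with `choose(100, 25)`; the names `n_perm` and `n_comb` are illustrative.

```r
# Permutations of 25 among 100: 100 * 99 * ... * 76
n_perm <- prod(100:76)

# Divide out the 25! orderings of the Heads among themselves
n_comb <- n_perm / factorial(25)

# Same count from the factorial formula 100! / (25! 75!),
# computed on the log scale to avoid overflow in 100!
n_fact <- exp(lfactorial(100) - lfactorial(25) - lfactorial(75))

print(c(by_product = n_comb, by_factorials = n_fact, by_choose = choose(100, 25)))
```

These values are far too large to be exact in floating point, so they agree to many significant digits rather than exactly.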
How many possible outcomes of n = 100 tosses exist? $2^{100}$
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Answer: $\binom{100}{25}$
What is the probability of each such outcome?
Recall that, per toss, P(Heads) = $\pi$ = 0.4
P(Tails) = $1 - \pi$ = 0.6
(Figure: one such outcome, tosses 1, 2, 3, 4, 5, ......, 97, 98, 99, 100 with probabilities 0.4, 0.6, 0.6, 0.4, 0.6, ......, 0.6, 0.4, 0.4, 0.6.)
Answer: Via independence in binary outcomes between any two coins,
$0.4 \times 0.6 \times 0.6 \times 0.4 \times 0.6 \times \dots \times 0.6 \times 0.4 \times 0.4 \times 0.6 = (0.4)^{25} (0.6)^{75}$.
Therefore, the probability P(X = 25) is equal to……. $\binom{100}{25} (0.4)^{25} (0.6)^{75}$
R: dbinom(25, 100, .4)
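A short sketch comparing the hand-built formula with R's `dbinom`; the name `by_hand` is illustrative.

```r
# (number of outcomes with 25 Heads) * (probability of each such outcome)
by_hand <- choose(100, 25) * 0.4^25 * 0.6^75

# Built-in binomial pmf: P(X = 25) for X ~ Bin(100, 0.4)
built_in <- dbinom(25, 100, 0.4)

print(c(by_hand = by_hand, built_in = built_in))
```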
Question: What if the coin were “fair” (unbiased), i.e., $\pi = 1 - \pi = 0.5$ ?
How many possible outcomes of n = 100 tosses exist? $2^{100}$
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Answer: $\binom{100}{25}$
What is the probability of each such outcome?
Recall that, per toss, P(Heads) = $\pi$ = 0.5
P(Tails) = $1 - \pi$ = 0.5
Answer: Via independence in binary outcomes between any two coins,
$0.5 \times 0.5 \times \dots \times 0.5 = (0.5)^{25} (0.5)^{75} = (0.5)^{100} = 1/2^{100}$.
This is the “equally likely” scenario!
Therefore, the probability P(X = 25) is equal to……. $\binom{100}{25} (1/2)^{100} = \binom{100}{25} / 2^{100}$
POPULATION
“Success” vs. “Failure”
For any randomly selected individual, define a binary random variable:
Y = 1 (“Success”), with prob $\pi$
Y = 0 (“Failure”), with prob $1 - \pi$

RANDOM SAMPLE
size n
Discrete random variable
X = # “Successes” in sample (0, 1, 2, 3, …, n)

Model the sample as a sequence of n Bernoulli trials: independent, with constant probability ($\pi$) per trial, where P(“Success”) = $\pi$, P(“Failure”) = $1 - \pi$.
Then X is said to follow a Binomial distribution, written X ~ Bin(n, $\pi$), with “probability mass function”
$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n-x}$, x = 0, 1, 2, …, n.

Example: with n = 100 and $\pi$ = 0.4, $P(X = 25) = \binom{100}{25} (0.4)^{25} (0.6)^{75}$.
Example: Blood Type probabilities, revisited

                 Rh Factor
Blood Type       +         –         Total
O                .384      .077      .461
A                .323      .065      .388
B                .094      .017      .111
AB               .032      .007      .039
Total            .833      .166      .999

Suppose n = 10 individuals are to be selected at random from the population.
Probability table for X = #(Type O)
Binomial model applies?
Check:
1. Independent outcomes?
Reasonably assume that outcomes “Type O” vs. “Not Type O” between two individuals are independent of each other. ✓
2. Constant probability $\pi$ ?
From table, $\pi$ = P(Type O) = .461 throughout population. ✓
Example: Blood Type probabilities, revisited
Suppose n = 10 individuals are to be selected at random from the population.
Probability table for X = #(Type O)
Binomial model applies. X ~ Bin(10, .461)
$p(x) = \binom{10}{x} (.461)^x (.539)^{10-x}$
R: dbinom(0:10, 10, .461)

x       p(x)                                                  F(x)
0       $\binom{10}{0} (.461)^{0} (.539)^{10}$ = 0.00207      0.00207
1       $\binom{10}{1} (.461)^{1} (.539)^{9}$ = 0.01770       0.01977
2       $\binom{10}{2} (.461)^{2} (.539)^{8}$ = 0.06813       0.08790
3       $\binom{10}{3} (.461)^{3} (.539)^{7}$ = 0.15538       0.24328
4       $\binom{10}{4} (.461)^{4} (.539)^{6}$ = 0.23257       0.47585
5       $\binom{10}{5} (.461)^{5} (.539)^{5}$ = 0.23870       0.71455
6       $\binom{10}{6} (.461)^{6} (.539)^{4}$ = 0.17013       0.88468
7       $\binom{10}{7} (.461)^{7} (.539)^{3}$ = 0.08315       0.96783
8       $\binom{10}{8} (.461)^{8} (.539)^{2}$ = 0.02667       0.99450
9       $\binom{10}{9} (.461)^{9} (.539)^{1}$ = 0.00507       0.99957
10      $\binom{10}{10} (.461)^{10} (.539)^{0}$ = 0.00043     1.00000
Total   1
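# Probability histogram of X ~ Bin(10, .461)
n <- 10                                  # number of individuals sampled
p <- 0.461                               # P(Type O)
pmf <- function(x) dbinom(x, n, p)       # binomial pmf
N <- 100000                              # size of the artificial dataset
x <- 0:n
# Repeat each value x in proportion to p(x); round() gives rep() whole-number counts
bin.dat <- rep(x, round(N * pmf(x)))
hist(bin.dat, freq = FALSE, breaks = c(-0.5, x + 0.5), col = "green")
axis(1, at = x)                          # tick mark at every integer value of x
axis(2)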
Example: Blood Type probabilities, revisited
Binomial model applies. X ~ Bin(10, .461)
Also, can show mean $\mu = \sum x\,p(x) = n\pi = (10)(.461) = 4.61$
and variance $\sigma^2 = \sum (x - \mu)^2\,p(x) = n\pi(1 - \pi) = 2.48$
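The probability table, mean, and variance for X ~ Bin(10, .461) can be reproduced as follows. This is a sketch with illustrative names (`pi_O`, `tab`); `dbinom` and `pbinom` are R's binomial pmf and cdf.

```r
n <- 10
pi_O <- 0.461                       # P(Type O), from the blood type table
x <- 0:n

p   <- dbinom(x, n, pi_O)           # p(x) = P(X = x)
cdf <- pbinom(x, n, pi_O)           # F(x) = P(X <= x)

# Probability table, rounded to five decimals
tab <- data.frame(x = x, p = round(p, 5), F = round(cdf, 5))
print(tab)

# Mean and variance from the definitions, next to n*pi and n*pi*(1 - pi)
mu     <- sum(x * p)
sigma2 <- sum((x - mu)^2 * p)
print(c(mu = mu, n_pi = n * pi_O))
print(c(sigma2 = sigma2, n_pi_1_minus_pi = n * pi_O * (1 - pi_O)))
```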
Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population.
Probability table for X = #(Type AB–)
From the table, $\pi$ = P(Type AB–) = .007.
Binomial model applies. X ~ Bin(1500, .007)
Therefore,
$p(x) = \binom{1500}{x} (.007)^x (.993)^{1500-x}$, x = 0, 1, 2, …, 1500.
Also, can show mean $\mu = \sum x\,p(x) = n\pi = 10.5$
and variance $\sigma^2 = \sum (x - \mu)^2\,p(x) = n\pi(1 - \pi) = 10.43$
RARE EVENT!
Example: Blood Type probabilities, revisited
X = #(Type AB–) among n = 1500 individuals, X ~ Bin(1500, .007)
RARE EVENT!
Long positive skew as x → 1500 …but contribution ≈ 0
Is there a better alternative?

Poisson distribution
$p(x) = \frac{e^{-\mu} \mu^x}{x!}$, x = 0, 1, 2, …,
where mean and variance are $\mu = n\pi = 10.5$ and $\sigma^2 = n\pi = 10.5$
X ~ Poisson(10.5)
Notation: Sometimes the symbol $\lambda$ (“lambda”) is used instead of $\mu$ (“mu”).
Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population.
X = #(Type AB–)
Poisson distribution, X ~ Poisson(10.5):
$p(x) = \frac{e^{-10.5} (10.5)^x}{x!}$, x = 0, 1, 2, …,
where mean and variance are $\mu = n\pi = 10.5$ and $\sigma^2 = n\pi = 10.5$

Ex: Probability of exactly X = 15 Type(AB–) individuals = ?
Binomial: $\binom{1500}{15} (.007)^{15} (.993)^{1485}$
Poisson: $\frac{e^{-10.5} (10.5)^{15}}{15!}$
(both ≈ .0437)
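The two answers can be put side by side with R's built-in pmfs. A sketch only, with illustrative names:

```r
n <- 1500
pi_AB <- 0.007                            # P(Type AB-), from the blood type table

binom_15 <- dbinom(15, n, pi_AB)          # exact binomial P(X = 15)
pois_15  <- dpois(15, n * pi_AB)          # Poisson approximation with mu = n*pi = 10.5

print(round(c(binomial = binom_15, poisson = pois_15), 4))
```

The two values agree to about three decimal places, which is the sense in which the Poisson model is a good stand-in for the binomial when n is large and the event is rare.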
Example: Deaths in Wisconsin
Assuming deaths among young adults are relatively rare, we know the following:
• Average λ = 584 deaths per year
• Mortality rate (α) seems constant.
Therefore, the Poisson distribution can be used as a good model to make future predictions about the random variable X = “# deaths” per year, for this population (15-24 yrs)… assuming current values will still apply.

• Probability of exactly X = 600 deaths next year
$P(X = 600) = \frac{e^{-584} (584)^{600}}{600!} \approx 0.0131$
R: dpois(600, 584)

• Probability of exactly X = 1200 deaths in the next two years
Mean of 584 deaths per yr ⇒ Mean of 1168 deaths per two yrs, so let λ = 1168:
$P(X = 1200) = \frac{e^{-1168} (1168)^{1200}}{1200!} \approx 0.00746$

• Probability of at least one death per day: λ = (584 deaths / yr) / (365 days / yr) = 1.6 deaths/day
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + …
True, but not practical.
$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-1.6} (1.6)^0}{0!} = 1 - e^{-1.6} = 0.798$
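All three Wisconsin calculations can be run with `dpois`. A sketch with illustrative variable names; the rates are the ones given above.

```r
lambda_year <- 584                        # average deaths per year

# Exactly 600 deaths next year
p_600 <- dpois(600, lambda_year)

# Exactly 1200 deaths in two years: the rate doubles to 1168
p_1200 <- dpois(1200, 2 * lambda_year)

# At least one death in a day: daily rate = 584 / 365 = 1.6
lambda_day <- lambda_year / 365
p_at_least_one <- 1 - dpois(0, lambda_day)

print(c(p_600 = p_600, p_1200 = p_1200, p_at_least_one = p_at_least_one))
```

The complement trick in the last step avoids summing the infinite series P(X = 1) + P(X = 2) + ….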
Poisson Distribution (discrete)
For x = 0, 1, 2, …, this calculates P(x Events) in a random sample of n trials coming from a population with rare P(Event) = $\pi$.
But it may also be used to calculate P(x Events) within a random interval of time from 0 to T, for a “Poisson process” having a known “Poisson rate” α.
Example: X = # “clicks” on a Geiger counter in normal background radiation.

Alternatively, X = time between events: “clicks” on a Geiger counter in normal background radiation, failures, deaths, births, etc.
• “Time-to-Event Analysis”
• “Time-to-Failure Analysis”
• “Reliability Analysis”
• “Survival Analysis”
Time between events is often modeled by the Exponential Distribution (continuous).
● Binomial ~ X = # Successes in n trials, P(Success) = $\pi$
● Poisson ~ As above, but n large, $\pi$ small, i.e., Success RARE
● Negative Binomial ~ X = # trials for k Successes, P(Success) = $\pi$
● Geometric ~ As above, but specialized to k = 1
● Hypergeometric ~ As Binomial, but $\pi$ changes between trials
● Multinomial ~ As Binomial, but for multiple categories, with $\pi_1 + \pi_2 + \dots + \pi_{last} = 1$ and $x_1 + x_2 + \dots + x_{last} = n$
POPULATION
random variable X
Example: X = “reaction time”
“Pain Threshold” Experiment: Volunteers place one hand on metal plate carrying low electrical current; measure duration till hand withdrawn.

SAMPLE
(Figure: density histograms of the sample, redrawn with time intervals = 5.0, 2.0, 1.0, 0.5 secs.)
In principle, as the # individuals in samples increases without bound, the class interval widths can be made arbitrarily small, i.e., the scale at which X is measured can be made arbitrarily fine, since it is continuous.
“In the limit…” we obtain a density curve f(x), with Total Area = 1.
67
“In the limit…” we obtain a density curve
f(x) = probability density function (pdf)
• f(x) ≥ 0
• Area = 1
Cumulative probability F(x) = P(X ≤ x) = Area under the density curve up to x
F(x) increases continuously from 0 to 1.
As with discrete variables, the density f(x) is the height, NOT the probability p(x) = P(X = x).
In fact, the zero-area “limit” argument would seem to imply P(X = x) = 0 ??? (Later…)
However, we can define “interval probabilities” of the form P(a ≤ X ≤ b), using the cdf F(x).
[Figure: density curve f(x) with the area between a and b shaded; cdf curve F(x) with F(a), F(b), and the difference F(b) − F(a) marked]
An “interval probability” P(a ≤ X ≤ b) can be calculated as the amount of area under the curve f(x) between a and b, or as the difference P(X ≤ b) − P(X ≤ a), i.e., F(b) − F(a).
(Ordinarily, finding the area under a general curve requires calculus techniques… unless the “curve” is a straight line, for instance. Examples to follow…)
Consider the following continuous random variable…
Example: X = “Ages of children from 1 year old to 6 years old”
Further suppose that X is uniformly distributed over the interval [1, 6].
Density: f(x) = 0.20 > 0 for 1 ≤ x ≤ 6
Check: Total Area = Base × Height = (6 − 1)(0.2) = 5 × 0.2 = 1
“What is the probability that a random child is 4 years old?” doesn’t mean P(X = 4.000000000......); unlike the die example, this is not 1/6.
A single value is one point out of an infinite continuum of points on the real number line.
The probability that a continuous random variable is exactly equal to any single value is ZERO!
“What is the probability that a random child is 4 years old?” actually means “What is the probability that a random child is between 4 and 5 years old?”
P(4 ≤ X < 5) = (5 − 4)(0.2) = 0.2
NOTE: Since P(X = 4) = P(X = 5) = 0, no change for P(4 ≤ X ≤ 5), P(4 < X ≤ 5), or P(4 < X < 5).
Cumulative probability F(x) = P(X ≤ x) = Area under the density curve up to x
For any x in [1, 6], the area under the curve f(x) = 0.20 up to x is F(x) = 0.2 (x − 1), or $F(x) = \int_1^x 0.2 \, dt$.
F(x) increases continuously from 0 to 1 (compare with the “staircase graph” for the discrete case).
“What is the probability that a random child is under 5 years old?”
F(5) = P(X ≤ 5) = 0.2 (5 − 1) = 0.8
“What is the probability that a random child is under 4 years old?”
F(4) = P(X ≤ 4) = 0.2 (4 − 1) = 0.6
“What is the probability that a random child is between 4 and 5 years old?”
P(4 ≤ X ≤ 5) = P(X ≤ 5) − P(X ≤ 4) = F(5) − F(4) = 0.8 – 0.6 = 0.2
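The uniform calculations above can be reproduced in R with the built-in uniform functions. This is only a sketch; the helper name cdf_unif is hypothetical.

```r
# Uniform density on [1, 6]: height 1 / (6 - 1) = 0.2
cdf_unif <- function(x) punif(x, min = 1, max = 6)  # equals 0.2 * (x - 1) on [1, 6]

F5 <- cdf_unif(5)   # 0.8
F4 <- cdf_unif(4)   # 0.6
p_between <- F5 - F4  # P(4 <= X <= 5) = 0.2

# The same interval probability, computed as an area under the density
p_area <- integrate(dunif, lower = 4, upper = 5, min = 1, max = 6)$value

print(c(F5 = F5, F4 = F4, p_between = p_between, p_area = p_area))
```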
Consider the following continuous random variable…
Example: X = “Ages of children from 1 year old to 6 years old”
Now suppose instead that X is not uniformly distributed, but has a density that rises linearly over the interval [1, 6]:
Density: f(x) = .08 (x − 1) ≥ 0 for 1 ≤ x ≤ 6
Check: Total Area = ½ × Base × Height = ½ (6 − 1)(0.4) = 1
Cumulative Distribution Function F(x)
Cumulative probability F(x) = P(X ≤ x) = Area under the density curve f(x) = .08 (x − 1) up to x
F(x) = ½ × base × height = ½ (x − 1) × .08 (x − 1) = .04 (x − 1)², i.e., $F(x) = \int_1^x .08\,(t - 1)\, dt$
“What is the probability that a child is under 4 years old?” P(X < 4) = F(4)
“What is the probability that a child is under 5 years old?” P(X < 5) = F(5)
“What is the probability that a child is between 4 and 5?” P(4 < X < 5) = F(5) − F(4)
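As a check on the three questions above, here is a small R sketch using the density f(x) = .08 (x − 1) and the cdf F(x) = .04 (x − 1)² from this example. The function names dens and cdf_tri are hypothetical.

```r
# Density and cdf from the example, valid for 1 <= x <= 6
dens    <- function(x) ifelse(x >= 1 & x <= 6, 0.08 * (x - 1), 0)
cdf_tri <- function(x) 0.04 * (x - 1)^2

total_area <- integrate(dens, 1, 6)$value   # should be 1

p_under4  <- cdf_tri(4)             # 0.04 * 9  = 0.36
p_under5  <- cdf_tri(5)             # 0.04 * 16 = 0.64
p_between <- p_under5 - p_under4    # 0.28

# Cross-check the interval probability as an area under the density
p_area <- integrate(dens, 4, 5)$value
```

Compared with the uniform case, where P(4 < X < 5) = 0.2, the rising density puts more probability on the older ages.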
In summary…
A continuous random variable X corresponds to a probability density function (pdf) f(x), whose graph is a density curve. f(x) is NOT a pmf!
• f(x) ≥ 0
• $\int_{-\infty}^{\infty} f(x)\, dx = 1$
• P(X = any constant a) = 0, not f(a)
Cumulative distribution function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt$
F(x) increases continuously from 0 to 1.
Fundamental Theorem of Calculus: $F'(x) = f(x)$
Moreover… $P(a \le X \le b) = \int_a^b f(x)\, dx = F(b) - F(a)$
In summary (continued)… for a continuous random variable X with pdf f(x):
Mean: $\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$
Variance: $\sigma^2 = E\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2$
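To make the mean and variance formulas concrete, the following R sketch evaluates them numerically for the uniform density on [1, 6] used earlier. Choosing that density here is an illustration, not something the summary slide does; the variable names are hypothetical.

```r
dens <- function(x) dunif(x, min = 1, max = 6)   # f(x) = 0.2 on [1, 6]

# Mean: integral of x f(x) dx
mu <- integrate(function(x) x * dens(x), 1, 6)$value             # 3.5

# Variance, two equivalent ways
sigma2_def   <- integrate(function(x) (x - mu)^2 * dens(x), 1, 6)$value
sigma2_short <- integrate(function(x) x^2 * dens(x), 1, 6)$value - mu^2

print(c(mu = mu, sigma2_def = sigma2_def, sigma2_short = sigma2_short))
```

Both variance computations should agree with the closed form (6 − 1)²/12 = 25/12 for a uniform distribution on [1, 6].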
SECTION 4.3 IN POSTED LECTURE NOTES
Four Examples: 1
For any b > 0, consider the following probability density function (pdf)...
$f(x) = \begin{cases} \dfrac{2}{b^2}\, x, & 0 \le x \le b \\ 0, & \text{else} \end{cases}$
Determine the cumulative distribution function (cdf) F(x) = P(X ≤ x).
For any x < 0, it follows that F(x) = P(X ≤ x) = 0.
For any 0 ≤ x ≤ b, it follows that F(x) = P(X ≤ x) = x²/b²:
without calculus... $F(x) = 0 + \tfrac{1}{2}\,(x - 0)\,\dfrac{2x}{b^2} = \dfrac{x^2}{b^2}$ (area of a triangle with base x and height 2x/b²)
with calculus... $F(x) = \int_{-\infty}^{x} f(t)\, dt = 0 + \int_0^x \dfrac{2}{b^2}\, t\, dt = \dfrac{x^2}{b^2}$
Note: F(b) = b²/b² = 1.
For any x > b, it follows that F(x) = P(X ≤ x) = 1 + 0 = 1.
Putting the three cases together:
$F(x) = P(X \le x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^2}{b^2}, & 0 \le x \le b \\ 1, & x > b \end{cases}$
[Figure: the pdf rises linearly from 0 at x = 0 to height 2/b at x = b; the cdf rises from 0 at x = 0 to 1 at x = b]
Monotonic and continuous from 0 to 1
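The cdf found in Example 1 can be checked numerically. The R sketch below assumes a hypothetical value b = 3 (any b > 0 would do) and compares the area under the pdf from 0 to x with the closed form x²/b². The names dens1 and cdf1 are made up for this illustration.

```r
b <- 3  # hypothetical value; the example holds for any b > 0

dens1 <- function(x) ifelse(x >= 0 & x <= b, 2 * x / b^2, 0)              # pdf
cdf1  <- function(x) ifelse(x < 0, 0, ifelse(x <= b, x^2 / b^2, 1))       # cdf

xs <- c(0.5, 1.5, 3)
numeric_cdf <- sapply(xs, function(x) integrate(dens1, 0, x)$value)  # area from 0 to x
closed_form <- cdf1(xs)

print(cbind(x = xs, numeric_cdf, closed_form))
```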
Four Examples: 2
For any b > a > 0, consider the probability density function (pdf)...
$f(x) = \begin{cases} \dfrac{2x}{ab}, & 0 \le x \le a \\ \dfrac{2(x - b)}{b(a - b)}, & a < x \le b \\ 0, & \text{else} \end{cases}$
Determine the cumulative distribution function (cdf) F(x) = P(X ≤ x).
For any x < 0, it follows that F(x) = 0.
For any 0 ≤ x ≤ a, it follows that $F(x) = 0 + \int_0^x \dfrac{2t}{ab}\, dt = \dfrac{x^2}{ab}$, so that $F(a) = \dfrac{a^2}{ab} = \dfrac{a}{b}$.
For any a < x ≤ b, it follows that $F(x) = F(a) + \int_a^x \dfrac{2(t - b)}{b(a - b)}\, dt = 1 + \dfrac{(x - b)^2}{b(a - b)}$.
For any x > b, it follows that F(x) = F(b) + 0 = 1.
Four Examples: 2
For any b > a > 0, consider the probability density function (pdf) f(x) = 2x/(ab) for 0 ≤ x ≤ a, f(x) = 2(x − b)/(b(a − b)) for a < x ≤ b, and f(x) = 0 else.
Determine the mean
$\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_0^a x\, f(x)\, dx + \int_a^b x\, f(x)\, dx = \int_0^a x\, \dfrac{2x}{ab}\, dx + \int_a^b x\, \dfrac{2(x - b)}{b(a - b)}\, dx$
Determine the variance
$\sigma^2 = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2 = \int_0^a x^2\, \dfrac{2x}{ab}\, dx + \int_a^b x^2\, \dfrac{2(x - b)}{b(a - b)}\, dx - \mu^2$
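The integrals for Example 2 can be evaluated numerically as a sanity check. The R sketch below assumes the hypothetical values a = 1 and b = 4. Because this pdf is a triangular density on [0, b] with its peak at a, the results can be compared against the standard triangular-distribution formulas (a + b)/3 for the mean and (a² + b² − ab)/18 for the variance. Those closed forms are not stated on the slide and are quoted here only as a cross-check; the helper name piecewise is hypothetical.

```r
a <- 1; b <- 4   # hypothetical values with b > a > 0

dens2 <- function(x) ifelse(x < 0 | x > b, 0,
                     ifelse(x <= a, 2 * x / (a * b),
                                    2 * (x - b) / (b * (a - b))))

# Integrate over [0, a] and [a, b] separately, mirroring the slide's split at x = a
piecewise <- function(g) integrate(g, 0, a)$value + integrate(g, a, b)$value

area   <- piecewise(dens2)                                 # should be 1
mu     <- piecewise(function(x) x * dens2(x))              # mean
sigma2 <- piecewise(function(x) x^2 * dens2(x)) - mu^2     # variance

print(c(area = area, mu = mu, sigma2 = sigma2))
print(c(mu_formula = (a + b) / 3, sigma2_formula = (a^2 + b^2 - a * b) / 18))
```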
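Four Examples: 3
Consider the following probability density function (pdf)...
$f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1 \\ 0, & x < 1 \end{cases}$
WARNING: “IMPROPER INTEGRAL”
Confirm pdf:
$\int_{-\infty}^{\infty} f(x)\, dx = \int_1^{\infty} \dfrac{2}{x^3}\, dx = \lim_{c \to \infty} 2 \int_1^c x^{-3}\, dx = \lim_{c \to \infty} 2 \left[ \dfrac{x^{-2}}{-2} \right]_1^c = \lim_{c \to \infty} \left[ -\dfrac{1}{x^2} \right]_1^c = 1 - \lim_{c \to \infty} \dfrac{1}{c^2} = 1$
Mean:
$\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_1^{\infty} x\, \dfrac{2}{x^3}\, dx = \int_1^{\infty} \dfrac{2}{x^2}\, dx = \lim_{c \to \infty} 2 \int_1^c x^{-2}\, dx = \lim_{c \to \infty} 2 \left[ \dfrac{x^{-1}}{-1} \right]_1^c = \lim_{c \to \infty} \left[ -\dfrac{2}{x} \right]_1^c = 2 - \lim_{c \to \infty} \dfrac{2}{c} = 2$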
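The improper integrals in Example 3 can also be checked numerically, since R's integrate function accepts Inf as an upper limit. This is a rough sketch with hypothetical variable names; the partial results for finite c use the antiderivatives worked out above.

```r
dens3 <- function(x) ifelse(x >= 1, 2 / x^3, 0)

area <- integrate(dens3, 1, Inf)$value                      # should be close to 1
mu   <- integrate(function(x) x * dens3(x), 1, Inf)$value   # should be close to 2

# Integrals from 1 up to a finite c, approaching the limits as c grows
cs <- c(10, 100, 1000)
partial_area <- 1 - 1 / cs^2   # integral of 2 / x^3 from 1 to c
partial_mean <- 2 - 2 / cs     # integral of 2 / x^2 from 1 to c

print(cbind(c = cs, partial_area, partial_mean))
```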
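Four Examples: 4
Consider the following probability density function (pdf)...
$f(x) = \begin{cases} \dfrac{1}{x^2}, & x \ge 1 \\ 0, & x < 1 \end{cases}$
WARNING: “IMPROPER INTEGRAL”
Confirm pdf:
$\int_{-\infty}^{\infty} f(x)\, dx = \int_1^{\infty} \dfrac{1}{x^2}\, dx = \lim_{c \to \infty} \int_1^c x^{-2}\, dx = \lim_{c \to \infty} \left[ \dfrac{x^{-1}}{-1} \right]_1^c = \lim_{c \to \infty} \left[ -\dfrac{1}{x} \right]_1^c = 1 - \lim_{c \to \infty} \dfrac{1}{c} = 1$
Mean:
$\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_1^{\infty} x\, \dfrac{1}{x^2}\, dx = \int_1^{\infty} \dfrac{1}{x}\, dx = \lim_{c \to \infty} \int_1^c x^{-1}\, dx = \lim_{c \to \infty} \Big[ \ln |x| \Big]_1^c = \lim_{c \to \infty} (\ln c) = \infty$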
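Example 4 is a valid pdf whose mean does not exist as a finite number. The R sketch below (hypothetical variable names) tabulates the integrals from 1 up to a finite c, using the antiderivatives −1/x and ln x: the area settles at 1 while the partial mean keeps growing.

```r
cs <- 10^(1:6)

partial_area <- 1 - 1 / cs   # integral of 1 / x^2 from 1 to c: approaches 1
partial_mean <- log(cs)      # integral of x * (1 / x^2) = 1 / x from 1 to c: unbounded

print(cbind(c = cs, partial_area, partial_mean))

# The total area itself is a convergent improper integral
area <- integrate(function(x) 1 / x^2, 1, Inf)$value
```

Each tenfold increase in c adds the same amount, ln 10, to the partial mean, so the limit is infinite.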
[Figure: histograms with time intervals = 5.0, 2.0, 1.0, and 0.5 secs]
Interval widths can be made arbitrarily small, i.e., the scale at which X is measured can be made arbitrarily fine, since it is continuous.
pmf: P(X = x) = p(x) = f(x) Δx, where the “density” f(x) is the height, Δx is the width, and p(x) is the area of the rectangle at x.
As Δx → 0 and # rectangles → ∞, this “Riemann sum” approaches the area under the density curve f(x), expressed as a definite integral.
pdf: Total Area: $\sum f(x)\, \Delta x = 1$ becomes $\int_{-\infty}^{\infty} f(x)\, dx = 1$
Interval probability: $P(a \le X \le b) = \sum_a^b f(x)\, \Delta x$ becomes $P(a \le X \le b) = \int_a^b f(x)\, dx$
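The “Riemann sum” limit can be illustrated numerically. The R sketch below reuses the density f(x) = .08 (x − 1) on [1, 6] from the earlier ages example and adds up rectangle areas f(x) Δx with left-endpoint heights. Using left endpoints is a choice made for this illustration, and the function name riemann is hypothetical.

```r
dens <- function(x) 0.08 * (x - 1)   # density on [1, 6] from the ages example

# Sum of rectangle areas f(x) * dx over n equal-width intervals on [1, 6]
riemann <- function(n) {
  dx     <- (6 - 1) / n
  x_left <- 1 + dx * (0:(n - 1))   # left endpoint of each interval
  sum(dens(x_left) * dx)
}

ns   <- c(5, 50, 500, 5000)
sums <- sapply(ns, riemann)
print(cbind(n = ns, riemann_sum = sums))  # equals 1 - 1/n for this linear density
```

As the number of rectangles grows and Δx shrinks, the sum approaches the total area 1 under the density curve.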
~ The Normal Distribution ~
(a.k.a. “The Bell Curve”)
Johann Carl Friedrich Gauss, 1777-1855
X ~ N(μ, σ), with mean μ and standard deviation σ
• Symmetric, unimodal
• Models many (but not all) natural systems
• Mathematical properties make it useful to work with
Standard Normal Distribution
Z ~ N(0, 1)
density function $\varphi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
Total Area = 1
The cumulative distribution function (cdf) is denoted by Φ(z).
It is not expressible in explicit, closed form, but is tabulated, and
computable in R via the command pnorm.
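As a small illustration of the relationship between the density and the cdf, the R sketch below codes the standard normal density formula directly (the function name phi is a hypothetical choice) and integrates it numerically, then compares the results with R's built-in dnorm and pnorm.

```r
# Standard normal density, written out from the formula
phi <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)

same_density <- all.equal(phi(1.2), dnorm(1.2))     # matches the built-in density

total_area <- integrate(phi, -Inf, Inf)$value       # should be close to 1
area_to_1.2 <- integrate(phi, -Inf, 1.2)$value      # numerical value of the cdf at 1.2

print(c(total_area = total_area, area_to_1.2 = area_to_1.2, pnorm = pnorm(1.2)))
```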
Example
Standard Normal Distribution, Z ~ N(0, 1)
Find Φ(1.2) = P(Z ≤ 1.2), the area to the left of the “z-score” 1.2.
• Use the included table (Lecture Notes Appendix…).
• Use R:
> pnorm(1.2)
[1] 0.8849303
So P(Z ≤ 1.2) = 0.88493, and P(Z > 1.2) = 1 − 0.88493 = 0.11507.
Note: Because this is a continuous distribution, P(Z = 1.2) = 0,
so there is no difference between P(Z > 1.2) and P(Z ≥ 1.2), etc.
Standard Normal Distribution
Why be concerned about this, when most “bell curves” don’t have mean = 0 and standard deviation = 1?
Any normal distribution X ~ N(μ, σ) can be transformed to the standard normal distribution Z ~ N(0, 1) via a simple change of variable:
$Z = \dfrac{X - \mu}{\sigma}$
Example
POPULATION
Random Variable X = Age at first birth, Year 2010
X ~ N(25.4, 1.5), i.e., μ = 25.4 and σ = 1.5
Question: What proportion of the population had their first child before the age of 27.2 years old?
P(X < 27.2) = ?
The x-score = 27.2 must first be transformed to a corresponding z-score:
Z = (X − μ)/σ = (27.2 − 25.4)/1.5 = 1.2
P(X < 27.2) = P(Z < 1.2) = 0.88493
• Using R:
> pnorm(27.2, 25.4, 1.5)
[1] 0.8849303
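The standardization step in this example can be written out in R as a short sketch (variable names are hypothetical), showing that the probability computed from the z-score agrees with the direct call using the mean and standard deviation.

```r
mu    <- 25.4   # mean age at first birth
sigma <- 1.5    # standard deviation
x     <- 27.2   # x-score of interest

z <- (x - mu) / sigma                       # z-score, 1.2

p_via_z  <- pnorm(z)                        # P(Z < 1.2) on the standard normal
p_direct <- pnorm(x, mean = mu, sd = sigma) # P(X < 27.2) directly

print(c(z = z, p_via_z = p_via_z, p_direct = p_direct))
```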
Standard Normal Distribution, Z ~ N(0, 1)
What symmetric interval about the mean 0 contains 95% of the population values?
That is, find the “.025 critical values” −z.025 and +z.025 that leave area 0.025 in each tail and 0.95 in between.
• Use the included table (Lecture Notes Appendix…).
• Use R:
> qnorm(.025)
[1] -1.959964
> qnorm(.975)
[1] 1.959964
−z.025 = −1.96 and +z.025 = +1.96
X ~ N(μ, σ) = N(25.4, 1.5)
What symmetric interval about the mean age of 25.4 contains 95% of the population values?
Z = (X − μ)/σ, so ±1.96 = (X − 25.4)/1.5
X = 25.4 ± (1.96)(1.5)
X = 25.4 ± 2.94
22.46 ≤ X ≤ 28.34 yrs
> areas = c(.025, .975)
> qnorm(areas, 25.4, 1.5)
[1] 22.46005 28.33995
Similarly…
Standard Normal Distribution, Z ~ N(0, 1)
What symmetric interval about the mean 0 contains 90% of the population values?
That is, find the “.05 critical values” −z.05 and +z.05 that leave area 0.05 in each tail and 0.90 in between.
• Use the included table: 0.95 = average of 0.94950 and 0.95053… so average 1.64 and 1.65.
• Use R:
> qnorm(.05)
[1] -1.644854
> qnorm(.95)
[1] 1.644854
−z.05 = −1.645 and +z.05 = +1.645
In general…
Standard Normal Distribution, Z ~ N(0, 1)
What symmetric interval about the mean 0 contains 100(1 – α)% of the population values?
The “α/2 critical values” −z_{α/2} and +z_{α/2} leave area α/2 in each tail and 1 – α in between.
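The general recipe can be sketched as a pair of small R helper functions. The names z_crit and central_interval are hypothetical, introduced only for this illustration; they wrap the same qnorm calls used in the 95% and 90% cases.

```r
# +z_{alpha/2}: the value leaving area alpha/2 in the upper tail of N(0, 1)
z_crit <- function(alpha) qnorm(1 - alpha / 2)

# Symmetric interval about mu containing 100(1 - alpha)% of X ~ N(mu, sigma)
central_interval <- function(alpha, mu = 0, sigma = 1) {
  mu + c(-1, 1) * z_crit(alpha) * sigma
}

z95 <- z_crit(0.05)    # about 1.96
z90 <- z_crit(0.10)    # about 1.645

ages95 <- central_interval(0.05, mu = 25.4, sigma = 1.5)  # about 22.46 to 28.34 yrs
print(list(z95 = z95, z90 = z90, ages95 = ages95))
```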
Normal Approximation to the Binomial Distribution
(a continuous approximation to a discrete distribution)
Suppose a certain outcome exists in a population, with constant probability π.
We will select a random sample of n individuals, so that the binary “Success vs. Failure” outcome of any individual is independent of the binary outcome of any other individual, i.e., n Bernoulli trials (e.g., coin tosses).
Discrete random variable X = # Successes in sample (0, 1, 2, 3, …, n)
P(Success) = π
P(Failure) = 1 – π
Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability function”
$p(x) = \binom{n}{x}\, \pi^x (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.
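Example: X ~ Bin(100, .2)
> dbinom(10, 100, .2)
[1] 0.00336282
[Figure: probability histogram; Area of the single rectangle at x = 10, i.e., P(X = 10)]
> pbinom(10, 100, .2)
[1] 0.005696381
[Figure: probability histogram; Area of the rectangles up to and including x = 10, i.e., P(X ≤ 10)]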
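Therefore, if…
X ~ Bin(n, π) with nπ ≥ 15 and n(1 – π) ≥ 15,
then…
$X \approx N\!\left( n\pi,\ \sqrt{n\pi(1 - \pi)} \right)$.
That is…
$\hat{\pi} = \dfrac{X}{n} \approx N\!\left( \pi,\ \sqrt{\dfrac{\pi(1 - \pi)}{n}} \right)$
“Sampling Distribution” of $\hat{\pi}$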
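A rough R comparison of the exact Binomial probability with its Normal approximation, using n = 100 and P(Success) = .2 as in the dbinom and pbinom calls earlier. The cutoff x = 25 and the continuity correction (evaluating the normal cdf at x + 0.5) are assumptions added for this sketch; the correction is a common refinement that the slide does not mention.

```r
n <- 100
p <- 0.2   # P(Success); named p to avoid masking R's constant pi

# Rule-of-thumb conditions stated above
conditions_ok <- (n * p >= 15) && (n * (1 - p) >= 15)

mu    <- n * p                     # 20
sigma <- sqrt(n * p * (1 - p))     # 4

x <- 25                            # hypothetical cutoff for illustration
exact  <- pbinom(x, n, p)          # exact Binomial P(X <= 25)
approx <- pnorm(x + 0.5, mu, sigma)  # normal approximation with continuity correction

# Standard deviation of the sample proportion X / n
sd_prop <- sqrt(p * (1 - p) / n)   # 0.04

print(c(exact = exact, approx = approx, sd_prop = sd_prop))
```

The approximation tends to be better near the center of the distribution than far out in the tails.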
● Normal distribution
● Log-Normal ~ X is not normally distributed (e.g., skewed), but
Y = “logarithm of X” is normally distributed
● Student’s t-distribution ~ Similar to the normal distribution, more flexible
● F-distribution ~ Used when comparing multiple group means
● Chi-squared distribution ~ Used extensively in categorical
data analysis
● Others for specialized applications ~ Gamma, Beta, Weibull…