Random Variables and
Probability Distributions
Modified from a PowerPoint by Carlos J. Rosas-Anderson
Probability distributions

We use probability distributions because they work – they fit lots of data in the real world.

[Histogram of Ht (cm) 1996; Mean = 35.3, Std. Dev = 14.76, N = 713]
Ex. height (cm) of Hypericum cumulicola at Archbold Biological Station
Probability distributions


Almost 2/3 of the class responded that they were familiar with the Normal Distribution, BUT…
Many variables relevant to biological and ecological studies are not normally distributed!

For example, many variables are discrete (presence/absence, # of seeds or offspring, # of prey consumed, etc.)
Since normal distributions apply only to continuous variables, we need other types of distributions to model discrete variables.
Random variable

The mathematical rule (or function) that
assigns a given numerical value to each
possible outcome of an experiment in the
sample space of interest.

2 Types:


Discrete random variables
Continuous random variables
The Binomial Distribution
Bernoulli Random Variables

Imagine a simple trial with only two possible outcomes:
 • Success (S)
 • Failure (F)

Examples
 • Toss of a coin (heads or tails)
 • Sex of a newborn (male or female)
 • Survival of an organism in a region (live or die)
Jacob Bernoulli (1654-1705)
The Binomial Distribution
Overview

Suppose that the probability of success is p

What is the probability of failure?
 • q = 1 – p

Examples
 • Toss of a coin (S = head): p = 0.5, q = 0.5
 • Roll of a die (S = 1): p = 0.1667, q = 0.8333
 • Fertility of a chicken egg (S = fertile): p = 0.8, q = 0.2
The Binomial Distribution
Overview

Imagine that a trial is repeated n times

Examples
 • A coin is tossed 5 times
 • A die is rolled 25 times
 • 50 chicken eggs are examined

ASSUMPTIONS: 1) p is constant from trial to trial, and 2) the
trials are statistically independent of each other
The Binomial Distribution
Overview

What is the probability of obtaining X successes in n trials?

Example
 • What is the probability of obtaining 2 heads from a coin that was tossed 5 times?

P(HHTTT) = (1/2)^5 = 1/32
The Binomial Distribution
Overview

But there are more possibilities:
HHTTT  HTHTT  THHTT  HTTHT  THTHT
TTHHT  HTTTH  THTTH  TTHTH  TTTHH

P(2 heads) = 10 × 1/32 = 10/32
The Binomial Distribution
Overview

In general, if trials result in a series of successes and failures,
FFSFFFFSFSFSSFFFFFSF…
then the probability of X successes in that order is

P(X) = q \cdot q \cdot p \cdot q \cdots = p^X \cdot q^{n-X}
The Binomial Distribution
Overview

However, if order is not important, then

P(X) = \frac{n!}{X!(n-X)!} \, p^X q^{n-X}

where \frac{n!}{X!(n-X)!} is the number of ways to obtain X successes in n trials, and n! = n(n-1)(n-2) \cdots 2 \cdot 1
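To make the formula concrete, here is a minimal R sketch (R is the language used for the course exercises) that computes P(2 heads in 5 tosses) both directly from the formula and with R's built-in dbinom():

```r
# Probability of X = 2 successes (heads) in n = 5 fair-coin tosses
n <- 5; X <- 2; p <- 0.5; q <- 1 - p

# Directly from the formula: n! / (X! (n - X)!) * p^X * q^(n - X)
manual <- factorial(n) / (factorial(X) * factorial(n - X)) * p^X * q^(n - X)

# Using R's built-in binomial probability mass function
builtin <- dbinom(X, size = n, prob = p)

manual   # 0.3125 = 10/32
builtin  # 0.3125
```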
The Binomial Distribution
Overview
[Bar plots of the binomial distribution Bin(p, 5) for p = 0.1, 0.3, 0.5, 0.7, and 0.9; number of successes (0–5) on the horizontal axis, probability on the vertical axis]
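The panels above can be reproduced with dbinom(); the sketch below is a minimal R version, with plot styling chosen for illustration rather than taken from the original slides:

```r
# Binomial pmfs for n = 5 and several success probabilities
n <- 5
ps <- c(0.1, 0.3, 0.5, 0.7, 0.9)

op <- par(mfrow = c(2, 3))          # arrange panels in a grid
for (p in ps) {
  probs <- dbinom(0:n, size = n, prob = p)
  barplot(probs, names.arg = 0:n,
          main = sprintf("Bin(%.1f, %d)", p, n),
          xlab = "Number of successes", ylab = "Probability")
}
par(op)
```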
The Poisson Distribution
Overview

When there are a large number of trials but a small probability of success, binomial calculations become impractical
 • Example: Number of deaths from horse kicks in the Army in different years
Simeon D. Poisson (1781-1840)

The mean number of successes from n trials is λ = np
 • Example: 64 deaths in 20 years out of thousands of soldiers
The Poisson Distribution
Overview

If we substitute λ/n for p, and let n approach infinity, the binomial distribution becomes the Poisson distribution:

P(x) = \frac{e^{-\lambda} \lambda^x}{x!}
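A minimal R sketch of this limiting argument, comparing a binomial with many trials and small p = λ/n to dpois(); the particular n and λ are illustrative choices:

```r
# Poisson as the limit of the binomial: fix lambda, let n grow
lambda <- 3.87
n <- 10000
p <- lambda / n

x <- 0:10
binom_probs <- dbinom(x, size = n, prob = p)  # binomial with many trials
pois_probs  <- dpois(x, lambda = lambda)      # Poisson limit

round(cbind(x, binom_probs, pois_probs), 4)   # the two columns nearly agree
```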
The Poisson Distribution
Overview

The Poisson distribution is applied when random events are
expected to occur in a fixed area or a fixed interval of time

Deviation from Poisson distribution may indicate some degree
of non-randomness in the events under study

Investigation of cause may be of interest
 • See Hurlbert 1990 for some caveats and suggestions for analyzing random spatial distributions using Poisson distributions
The Poisson Distribution
Example: Emission of α-particles

Rutherford, Geiger, and Bateman (1910) counted the number of α-particles emitted by a film of polonium in 2608 successive intervals of one-eighth of a minute
 • What is n?
 • What is p?

Do their data follow a Poisson distribution?
The Poisson Distribution
Emission of α-particles

Calculation of λ:
λ = No. of particles per interval = 10097/2608 = 3.87

Expected values:
2608 × P(x) = 2608 × \frac{e^{-3.87} (3.87)^x}{x!}

No. α-particles   Observed
      0               57
      1              203
      2              383
      3              525
      4              532
      5              408
      6              273
      7              139
      8               45
      9               27
     10               10
     11                4
     12                0
     13                1
     14                1
  Over 14              0
   Total            2608
The Poisson Distribution
Emission of α-particles

No. α-particles   Observed   Expected
      0               57         54
      1              203        210
      2              383        407
      3              525        525
      4              532        508
      5              408        394
      6              273        255
      7              139        140
      8               45         68
      9               27         29
     10               10         11
     11                4          4
     12                0          1
     13                1          1
     14                1          1
  Over 14              0          0
   Total            2608       2608
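The Expected column can be reproduced in R with dpois(); in this sketch the "Over 14" cell is taken as the remaining upper-tail probability, which is an assumption about how that row was computed:

```r
# Observed counts of alpha-particle emissions per 1/8-minute interval
observed <- c(57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1, 0)
n_intervals <- sum(observed)                 # 2608
lambda <- 10097 / n_intervals                # mean particles per interval, ~3.87

# Expected counts under a Poisson model: 2608 * P(x) for x = 0..14, plus the tail
expected <- n_intervals * c(dpois(0:14, lambda), ppois(14, lambda, lower.tail = FALSE))

round(expected)   # close to the 'Expected' column: 54, 210, 407, 525, 508, ...
```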
The Poisson Distribution
Emission of α-particles

[Diagrams: Random events, Regular events, Clumped events]
The Poisson Distribution
[Bar plots of the Poisson distribution for several values of λ; counts (0–12) on the horizontal axis, probability on the vertical axis]
The Expected Value of a Discrete
Random Variable
E(X) = \sum_{i=1}^{n} a_i p_i = a_1 p_1 + a_2 p_2 + \dots + a_n p_n
The Variance of a Discrete Random
Variable
\sigma^2(X) = E\left[ (X - E(X))^2 \right] = \sum_{i=1}^{n} p_i a_i^2 - \left( \sum_{i=1}^{n} a_i p_i \right)^2
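A minimal R sketch of these two formulas, using a small made-up discrete distribution (the values a and probabilities p below are illustrative only):

```r
# A small discrete random variable: values a_i with probabilities p_i
a <- c(0, 1, 2, 3)          # possible outcomes (illustrative)
p <- c(0.1, 0.4, 0.3, 0.2)  # their probabilities (sum to 1)

EX   <- sum(a * p)                # expected value: sum of a_i * p_i
VarX <- sum(p * a^2) - EX^2       # variance: E[X^2] - (E[X])^2

EX    # 1.6
VarX  # 0.84
```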
Uniform random variables

The closed unit interval, which contains all numbers between 0 and 1, including the two end points 0 and 1: [0,1]

The probability density function (PDF) of a uniform random variable on [0,10]:

f(x) = \begin{cases} 1/10, & 0 \le x \le 10 \\ 0, & \text{otherwise} \end{cases}

[Plot: f(x) = 0.1 over X from 0 to 10, with subintervals [3,4] and [5,6] marked; P(X) on the vertical axis]
The Expected Value of a Continuous
Random Variable
E(X) = \int x f(x)\,dx

For a uniform random variable X, where f(x) is defined on the interval [a,b] and where a < b:

E(X) = (a + b)/2

and

\sigma^2(X) = (b - a)^2 / 12
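A minimal R sketch that checks both formulas by simulation with runif(), using the same [0,10] interval as the PDF slide:

```r
# Uniform random variable on [a, b] (a = 0, b = 10 as in the PDF slide)
a <- 0; b <- 10
x <- runif(1e6, min = a, max = b)   # large sample of uniform draws

mean(x)          # close to (a + b) / 2 = 5
var(x)           # close to (b - a)^2 / 12
(b - a)^2 / 12   # 8.33
```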
The Normal Distribution
Overview

Discovered in 1733 by de Moivre as an approximation to the
binomial distribution when the number of trials is large

Derived in 1809 by Gauss

Importance lies in the Central Limit Theorem, which states that
the sum of a large number of independent random variables
(binomial, Poisson, etc.) will approximate a normal distribution

Example: Human height is determined by a large number of factors, both genetic and environmental, which are additive in their effects. Thus, it follows a normal distribution.

Abraham de Moivre (1667-1754)
Karl F. Gauss (1777-1855)
The Normal Distribution
Overview

A continuous random variable is said to be normally distributed with mean μ and variance σ² if its probability density function is

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}

f(x) is not the same as P(x)
 • P(x) would be virtually 0 for every x because the normal distribution is continuous
 • However, P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx
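A minimal R sketch of this relationship, integrating dnorm() over an interval and comparing the result with pnorm(); the values of μ, σ, x1, and x2 are illustrative:

```r
# P(x1 < X <= x2) as the integral of the normal density
mu <- 0; sigma <- 1
x1 <- -0.5; x2 <- 1.5

# Numerical integration of the density f(x)
by_integral <- integrate(dnorm, lower = x1, upper = x2, mean = mu, sd = sigma)$value

# Same probability from the cumulative distribution function
by_cdf <- pnorm(x2, mu, sigma) - pnorm(x1, mu, sigma)

c(by_integral, by_cdf)   # both about 0.6247
```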
The Normal Distribution
Overview
[Plot of the normal density f(x) for x from −3 to 3; vertical axis 0 to 0.45]
The Normal Distribution
Overview
[Plots: normal densities as the mean changes and as the variance changes]
The Normal Distribution
Length of Fish

A sample of rock cod in Monterey Bay suggests that the mean length of these fish is μ = 30 in. and σ² = 4 in.²

Assume that the length of rock cod is a normal random variable

If we catch one of these fish in Monterey Bay,
 • What is the probability that it will be at least 31 in. long?
 • That it will be no more than 32 in. long?
 • That its length will be between 26 and 29 inches?
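These three probabilities can be computed in R with pnorm(), using μ = 30 and σ = √4 = 2 from the problem statement; a minimal sketch:

```r
# Rock cod length: normal with mean 30 in. and variance 4 (sd = 2)
mu <- 30
sigma <- sqrt(4)

# P(X >= 31): at least 31 in. long
pnorm(31, mu, sigma, lower.tail = FALSE)     # about 0.31

# P(X <= 32): no more than 32 in. long
pnorm(32, mu, sigma)                         # about 0.84

# P(26 < X < 29): between 26 and 29 inches
pnorm(29, mu, sigma) - pnorm(26, mu, sigma)  # about 0.29
```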
The Normal Distribution
Length of Fish

 • What is the probability that it will be at least 31 in. long?

[Plot: normal density of fish length (in.), 25 to 35]
The Normal Distribution
Length of Fish

 • That it will be no more than 32 in. long?

[Plot: normal density of fish length (in.), 25 to 35]
The Normal Distribution
Length of Fish

 • That its length will be between 26 and 29 inches?

[Plot: normal density of fish length (in.), 25 to 35]
Standard Normal Distribution

μ = 0 and σ² = 1

[Histogram of a large sample from the standard normal distribution]
Useful properties of the normal
distribution

The normal distribution has useful
properties:


Can be added: E(X+Y) = E(X) + E(Y) and, for independent X and Y, σ²(X+Y) = σ²(X) + σ²(Y)
Can be transformed with shift and change-of-scale operations

Consider two random variables X and Y.
Let X ~ N(μ, σ) and let Y = aX + b, where a and b are constants.
Change of scale is the operation of multiplying X by a constant a, because one unit of X becomes "a" units of Y.
Shift is the operation of adding a constant b to X, because we simply move our random variable X "b" units along the x-axis.
If X is a normal random variable, then the new random variable Y created by these operations on X is also a normal random variable.
For X ~ N(μ, σ) and Y = aX + b:
 • E(Y) = aμ + b
 • σ²(Y) = a²σ²

A special case of a change of scale and shift operation in which a = 1/σ and b = −(μ/σ):
 • Y = (1/σ)X − (μ/σ) = (X − μ)/σ
 • This gives E(Y) = 0 and σ²(Y) = 1
 • Thus, any normal random variable can be transformed to a standard normal random variable.
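A minimal R sketch illustrating the shift and change-of-scale results by simulation; the values of μ, σ, a, and b are illustrative:

```r
# Shift and change of scale of a normal random variable
mu <- 10; sigma <- 3
x <- rnorm(1e6, mean = mu, sd = sigma)

# General transformation Y = a*X + b
a <- 2; b <- 5
y <- a * x + b
c(mean(y), var(y))          # close to a*mu + b = 25 and a^2 * sigma^2 = 36

# Special case a = 1/sigma, b = -mu/sigma gives the standard normal
z <- (x - mu) / sigma
c(mean(z), var(z))          # close to 0 and 1
```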
Log-normal Distribution


X is a log-normal random variable if its natural logarithm, ln(X), is a normal random variable.
Original values of X give a right-skewed distribution (A), but plotting on a logarithmic scale gives a normal distribution (B).
Many ecologically important variables are log-normally distributed.

[Panel A: histogram of rep 1994; Mean = 127.5, Std. Dev = 183.79, N = 765]
[Panel B: histogram of LOGREP94 (log scale); Mean = 4.00, Std. Dev = 1.44, N = 765]
SOURCE: Quintana-Ascencio et al. 2006; Hypericum data from Archbold Biological Station
Log-normal Distribution
mean = e^{\mu + \sigma^2/2}

variance = \left( e^{\sigma^2} - 1 \right) e^{2\mu + \sigma^2}
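A minimal R sketch that checks these formulas against simulated values from rlnorm(); μ = 4 and σ = 1.44 are borrowed from panel B of the log-normal figure and are used here only as illustrative parameters:

```r
# Log-normal random variable: ln(X) ~ Normal(mu, sigma^2)
mu <- 4; sigma <- 1.44
x <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

# Theoretical vs sample mean
c(exp(mu + sigma^2 / 2), mean(x))
# Theoretical vs sample variance (heavy tails: the sample value converges slowly)
c((exp(sigma^2) - 1) * exp(2 * mu + sigma^2), var(x))
```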
The Central Limit Theorem


Asserts that standardizing any random variable
that itself is a sum or average of a set of
independent random variables results in a new
random variable that is “nearly the same as” a
standard normal one.
The only caveats are that the sample size must
be “large enough” and that the observations
themselves must be independent and all drawn
from a distribution with common expectation
and variance.
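A minimal R sketch of the theorem: standardized means of samples from a non-normal (here exponential) distribution look approximately standard normal; the sample size and number of replicates are arbitrary choices:

```r
# Central Limit Theorem demo: standardized means of exponential samples
set.seed(1)
n <- 50          # sample size ("large enough")
reps <- 10000    # number of replicate samples

# Exponential(rate = 1) has expectation 1 and variance 1
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Standardize: subtract the expectation, divide by the sd of the mean
z <- (xbar - 1) / (1 / sqrt(n))

hist(z, breaks = 50, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE)   # overlays the standard normal density
```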
Exercise

On Friday, we will perform an exercise in R that
will allow you to work with some of these
probability distributions!