Random Variables and
Probability Distributions
Modified from a PowerPoint by Carlos J. Rosas-Anderson
Probability distributions

We use probability distributions because they work – they fit lots of data in the real world.

[Histogram of Ht (cm) 1996; Mean = 35.3, Std. Dev = 14.76, N = 713]
Ex. height (cm) of Hypericum cumulicola at Archbold Biological Station
Probability distributions


Almost 2/3 of the class responded that they were familiar with the Normal Distribution, BUT…
Many variables relevant to biological and ecological studies are not normally distributed!

For example, many variables are discrete (presence/absence, # of seeds or offspring, # of prey consumed, etc.)
Since normal distributions apply only to continuous variables, we need other types of distributions to model discrete variables.
Random variable

The mathematical rule (or function) that
assigns a given numerical value to each
possible outcome of an experiment in the
sample space of interest.

2 Types:


Discrete random variables
Continuous random variables
The Binomial Distribution
Bernoulli Random Variables

Imagine a simple trial with only two possible outcomes:
 • Success (S)
 • Failure (F)

Examples
 • Toss of a coin (heads or tails)
 • Sex of a newborn (male or female)
 • Survival of an organism in a region (live or die)
Jacob Bernoulli (1654-1705)
The Binomial Distribution
Overview

Suppose that the probability of success is p

What is the probability of failure?
 • q = 1 – p

Examples
 • Toss of a coin (S = head): p = 0.5, q = 0.5
 • Roll of a die (S = 1): p = 0.1667, q = 0.8333
 • Fertility of a chicken egg (S = fertile): p = 0.8, q = 0.2
The Binomial Distribution
Overview

Imagine that a trial is repeated n times

Examples
 • A coin is tossed 5 times
 • A die is rolled 25 times
 • 50 chicken eggs are examined

ASSUMPTIONS: 1) p is constant from trial to trial, and 2) the
trials are statistically independent of each other
The Binomial Distribution
Overview

What is the probability of obtaining X successes in n trials?

Example
 • What is the probability of obtaining 2 heads from a coin that was tossed 5 times?

P(HHTTT) = (1/2)^5 = 1/32
The Binomial Distribution
Overview

But there are more possibilities:
HHTTT  HTHTT  THHTT  HTTHT  THTHT
TTHHT  HTTTH  THTTH  TTHTH  TTTHH

P(2 heads) = 10 × 1/32 = 10/32
The Binomial Distribution
Overview

In general, if trials result in a series of successes and failures,
FFSFFFFSFSFSSFFFFFSF…
then the probability of X successes in that order is

P(X) = q \cdot q \cdot p \cdot q \cdots = p^X \cdot q^{n-X}
The Binomial Distribution
Overview

However, if order is not important, then

P(X) = \frac{n!}{X!(n-X)!} \, p^X q^{n-X}

where \frac{n!}{X!(n-X)!} is the number of ways to obtain X successes in n trials, and n! = n(n-1)(n-2) \cdots 2 \cdot 1
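To make the formula concrete, here is a minimal R sketch (R is the language used for the course exercises) that computes P(2 heads in 5 tosses) both directly from the formula and with R's built-in dbinom():

```r
# Probability of X = 2 successes (heads) in n = 5 fair-coin tosses
n <- 5; X <- 2; p <- 0.5; q <- 1 - p

# Directly from the formula: n! / (X! (n - X)!) * p^X * q^(n - X)
manual <- factorial(n) / (factorial(X) * factorial(n - X)) * p^X * q^(n - X)

# Using R's built-in binomial probability mass function
builtin <- dbinom(X, size = n, prob = p)

manual   # 0.3125 = 10/32
builtin  # 0.3125
```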
The Binomial Distribution
Overview
[Bar plots of the binomial distribution Bin(p, 5) for p = 0.1, 0.3, 0.5, 0.7, and 0.9; number of successes (0–5) on the horizontal axis, probability on the vertical axis]
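The panels above can be reproduced with dbinom(); the sketch below is a minimal R version, with plot styling chosen for illustration rather than taken from the original slides:

```r
# Binomial pmfs for n = 5 and several success probabilities
n <- 5
ps <- c(0.1, 0.3, 0.5, 0.7, 0.9)

op <- par(mfrow = c(2, 3))          # arrange panels in a grid
for (p in ps) {
  probs <- dbinom(0:n, size = n, prob = p)
  barplot(probs, names.arg = 0:n,
          main = sprintf("Bin(%.1f, %d)", p, n),
          xlab = "Number of successes", ylab = "Probability")
}
par(op)
```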
The Poisson Distribution
Overview

When there are a large number of trials but a small probability of success, binomial calculations become impractical
 • Example: Number of deaths from horse kicks in the Army in different years
Simeon D. Poisson (1781-1840)

The mean number of successes from n trials is λ = np
 • Example: 64 deaths in 20 years out of thousands of soldiers
The Poisson Distribution
Overview

If we substitute λ/n for p, and let n approach infinity, the binomial distribution becomes the Poisson distribution:

P(x) = \frac{e^{-\lambda} \lambda^x}{x!}
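A minimal R sketch of this limiting argument, comparing a binomial with many trials and small p = λ/n to dpois(); the particular n and λ are illustrative choices:

```r
# Poisson as the limit of the binomial: fix lambda, let n grow
lambda <- 3.87
n <- 10000
p <- lambda / n

x <- 0:10
binom_probs <- dbinom(x, size = n, prob = p)  # binomial with many trials
pois_probs  <- dpois(x, lambda = lambda)      # Poisson limit

round(cbind(x, binom_probs, pois_probs), 4)   # the two columns nearly agree
```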
The Poisson Distribution
Overview

The Poisson distribution is applied when random events are
expected to occur in a fixed area or a fixed interval of time

Deviation from Poisson distribution may indicate some degree
of non-randomness in the events under study

Investigation of cause may be of interest
 • See Hurlbert 1990 for some caveats and suggestions for analyzing random spatial distributions using Poisson distributions
The Poisson Distribution
Example: Emission of α-particles

Rutherford, Geiger, and Bateman (1910) counted the number of α-particles emitted by a film of polonium in 2608 successive intervals of one-eighth of a minute
 • What is n?
 • What is p?

Do their data follow a Poisson distribution?
The Poisson Distribution
Emission of α-particles

Calculation of λ:
λ = No. of particles per interval = 10097/2608 = 3.87

Expected values:
2608 × P(x) = 2608 × \frac{e^{-3.87} (3.87)^x}{x!}

No. α-particles   Observed
      0               57
      1              203
      2              383
      3              525
      4              532
      5              408
      6              273
      7              139
      8               45
      9               27
     10               10
     11                4
     12                0
     13                1
     14                1
  Over 14              0
   Total            2608
The Poisson Distribution
Emission of α-particles

No. α-particles   Observed   Expected
      0               57         54
      1              203        210
      2              383        407
      3              525        525
      4              532        508
      5              408        394
      6              273        255
      7              139        140
      8               45         68
      9               27         29
     10               10         11
     11                4          4
     12                0          1
     13                1          1
     14                1          1
  Over 14              0          0
   Total            2608       2608
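The Expected column can be reproduced in R with dpois(); in this sketch the "Over 14" cell is taken as the remaining upper-tail probability, which is an assumption about how that row was computed:

```r
# Observed counts of alpha-particle emissions per 1/8-minute interval
observed <- c(57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1, 0)
n_intervals <- sum(observed)                 # 2608
lambda <- 10097 / n_intervals                # mean particles per interval, ~3.87

# Expected counts under a Poisson model: 2608 * P(x) for x = 0..14, plus the tail
expected <- n_intervals * c(dpois(0:14, lambda), ppois(14, lambda, lower.tail = FALSE))

round(expected)   # close to the 'Expected' column: 54, 210, 407, 525, 508, ...
```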
The Poisson Distribution
Emission of α-particles

[Diagrams: Random events, Regular events, Clumped events]
The Poisson Distribution
[Bar plots of the Poisson distribution for several values of λ; counts (0–12) on the horizontal axis, probability on the vertical axis]
The Expected Value of a Discrete
Random Variable
E(X) = \sum_{i=1}^{n} a_i p_i = a_1 p_1 + a_2 p_2 + \dots + a_n p_n
The Variance of a Discrete Random
Variable
\sigma^2(X) = E\left[ (X - E(X))^2 \right] = \sum_{i=1}^{n} p_i a_i^2 - \left( \sum_{i=1}^{n} a_i p_i \right)^2
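A minimal R sketch of these two formulas, using a small made-up discrete distribution (the values a and probabilities p below are illustrative only):

```r
# A small discrete random variable: values a_i with probabilities p_i
a <- c(0, 1, 2, 3)          # possible outcomes (illustrative)
p <- c(0.1, 0.4, 0.3, 0.2)  # their probabilities (sum to 1)

EX   <- sum(a * p)                # expected value: sum of a_i * p_i
VarX <- sum(p * a^2) - EX^2       # variance: E[X^2] - (E[X])^2

EX    # 1.6
VarX  # 0.84
```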
Uniform random variables

The closed unit interval, which contains all numbers between 0 and 1, including the two end points 0 and 1: [0,1]

The probability density function (PDF) of a uniform random variable on [0,10]:

f(x) = \begin{cases} 1/10, & 0 \le x \le 10 \\ 0, & \text{otherwise} \end{cases}

[Plot: f(x) = 0.1 over X from 0 to 10, with subintervals [3,4] and [5,6] marked; P(X) on the vertical axis]
The Expected Value of a Continuous
Random Variable
E(X) = \int x f(x)\,dx

For a uniform random variable X, where f(x) is defined on the interval [a,b] and where a < b:

E(X) = (a + b)/2

and

\sigma^2(X) = (b - a)^2 / 12
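A minimal R sketch that checks both formulas by simulation with runif(), using the same [0,10] interval as the PDF slide:

```r
# Uniform random variable on [a, b] (a = 0, b = 10 as in the PDF slide)
a <- 0; b <- 10
x <- runif(1e6, min = a, max = b)   # large sample of uniform draws

mean(x)          # close to (a + b) / 2 = 5
var(x)           # close to (b - a)^2 / 12
(b - a)^2 / 12   # 8.33
```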
The Normal Distribution
Overview

Discovered in 1733 by de Moivre as an approximation to the
binomial distribution when the number of trials is large

Derived in 1809 by Gauss

Importance lies in the Central Limit Theorem, which states that
the sum of a large number of independent random variables
(binomial, Poisson, etc.) will approximate a normal distribution

Example: Human height is determined by a large number of factors, both genetic and environmental, which are additive in their effects. Thus, it follows a normal distribution.

Abraham de Moivre (1667-1754)
Karl F. Gauss (1777-1855)
The Normal Distribution
Overview

A continuous random variable is said to be normally distributed with mean μ and variance σ² if its probability density function is

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}

f(x) is not the same as P(x)
 • P(x) would be virtually 0 for every x because the normal distribution is continuous
 • However, P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx
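A minimal R sketch of this relationship, integrating dnorm() over an interval and comparing the result with pnorm(); the values of μ, σ, x1, and x2 are illustrative:

```r
# P(x1 < X <= x2) as the integral of the normal density
mu <- 0; sigma <- 1
x1 <- -0.5; x2 <- 1.5

# Numerical integration of the density f(x)
by_integral <- integrate(dnorm, lower = x1, upper = x2, mean = mu, sd = sigma)$value

# Same probability from the cumulative distribution function
by_cdf <- pnorm(x2, mu, sigma) - pnorm(x1, mu, sigma)

c(by_integral, by_cdf)   # both about 0.6247
```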
The Normal Distribution
Overview
[Plot of the normal density f(x) for x from −3 to 3; vertical axis 0 to 0.45]
The Normal Distribution
Overview
[Plots: normal densities as the mean changes and as the variance changes]
The Normal Distribution
Length of Fish

A sample of rock cod in Monterey Bay suggests that the mean length of these fish is μ = 30 in. and σ² = 4 in.²

Assume that the length of rock cod is a normal random variable

If we catch one of these fish in Monterey Bay,
 • What is the probability that it will be at least 31 in. long?
 • That it will be no more than 32 in. long?
 • That its length will be between 26 and 29 inches?
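These three probabilities can be computed in R with pnorm(), using μ = 30 and σ = √4 = 2 from the problem statement; a minimal sketch:

```r
# Rock cod length: normal with mean 30 in. and variance 4 (sd = 2)
mu <- 30
sigma <- sqrt(4)

# P(X >= 31): at least 31 in. long
pnorm(31, mu, sigma, lower.tail = FALSE)     # about 0.31

# P(X <= 32): no more than 32 in. long
pnorm(32, mu, sigma)                         # about 0.84

# P(26 < X < 29): between 26 and 29 inches
pnorm(29, mu, sigma) - pnorm(26, mu, sigma)  # about 0.29
```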
The Normal Distribution
Length of Fish

 • What is the probability that it will be at least 31 in. long?

[Plot: normal density of fish length (in.), 25 to 35]
The Normal Distribution
Length of Fish

 • That it will be no more than 32 in. long?

[Plot: normal density of fish length (in.), 25 to 35]
The Normal Distribution
Length of Fish

 • That its length will be between 26 and 29 inches?

[Plot: normal density of fish length (in.), 25 to 35]
Standard Normal Distribution

μ = 0 and σ² = 1

[Histogram of a large sample from the standard normal distribution]
Useful properties of the normal
distribution

The normal distribution has useful
properties:


Can be added: E(X+Y) = E(X) + E(Y) and, for independent X and Y, σ²(X+Y) = σ²(X) + σ²(Y)
Can be transformed with shift and change-of-scale operations

Consider two random variables X and Y.
Let X ~ N(μ, σ) and let Y = aX + b, where a and b are constants.
Change of scale is the operation of multiplying X by a constant a, because one unit of X becomes "a" units of Y.
Shift is the operation of adding a constant b to X, because we simply move our random variable X "b" units along the x-axis.
If X is a normal random variable, then the new random variable Y created by these operations on X is also a normal random variable.
For X ~ N(μ, σ) and Y = aX + b:
 • E(Y) = aμ + b
 • σ²(Y) = a²σ²

A special case of a change of scale and shift operation in which a = 1/σ and b = −(μ/σ):
 • Y = (1/σ)X − (μ/σ) = (X − μ)/σ
 • This gives E(Y) = 0 and σ²(Y) = 1
 • Thus, any normal random variable can be transformed to a standard normal random variable.
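A minimal R sketch illustrating the shift and change-of-scale results by simulation; the values of μ, σ, a, and b are illustrative:

```r
# Shift and change of scale of a normal random variable
mu <- 10; sigma <- 3
x <- rnorm(1e6, mean = mu, sd = sigma)

# General transformation Y = a*X + b
a <- 2; b <- 5
y <- a * x + b
c(mean(y), var(y))          # close to a*mu + b = 25 and a^2 * sigma^2 = 36

# Special case a = 1/sigma, b = -mu/sigma gives the standard normal
z <- (x - mu) / sigma
c(mean(z), var(z))          # close to 0 and 1
```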
Log-normal Distribution


X is a log-normal random variable if its natural logarithm, ln(X), is a normal random variable.
Original values of X give a right-skewed distribution (A), but plotting on a logarithmic scale gives a normal distribution (B).
Many ecologically important variables are log-normally distributed.

[Panel A: histogram of rep 1994; Mean = 127.5, Std. Dev = 183.79, N = 765]
[Panel B: histogram of LOGREP94 (log scale); Mean = 4.00, Std. Dev = 1.44, N = 765]
SOURCE: Quintana-Ascencio et al. 2006; Hypericum data from Archbold Biological Station
Log-normal Distribution
mean = e^{\mu + \sigma^2/2}

variance = \left( e^{\sigma^2} - 1 \right) e^{2\mu + \sigma^2}
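A minimal R sketch that checks these formulas against simulated values from rlnorm(); μ = 4 and σ = 1.44 are borrowed from panel B of the log-normal figure and are used here only as illustrative parameters:

```r
# Log-normal random variable: ln(X) ~ Normal(mu, sigma^2)
mu <- 4; sigma <- 1.44
x <- rlnorm(1e6, meanlog = mu, sdlog = sigma)

# Theoretical vs sample mean
c(exp(mu + sigma^2 / 2), mean(x))
# Theoretical vs sample variance (heavy tails: the sample value converges slowly)
c((exp(sigma^2) - 1) * exp(2 * mu + sigma^2), var(x))
```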
The Central Limit Theorem


Asserts that standardizing any random variable
that itself is a sum or average of a set of
independent random variables results in a new
random variable that is “nearly the same as” a
standard normal one.
The only caveats are that the sample size must
be “large enough” and that the observations
themselves must be independent and all drawn
from a distribution with common expectation
and variance.
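A minimal R sketch of the theorem: standardized means of samples from a non-normal (here exponential) distribution look approximately standard normal; the sample size and number of replicates are arbitrary choices:

```r
# Central Limit Theorem demo: standardized means of exponential samples
set.seed(1)
n <- 50          # sample size ("large enough")
reps <- 10000    # number of replicate samples

# Exponential(rate = 1) has expectation 1 and variance 1
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Standardize: subtract the expectation, divide by the sd of the mean
z <- (xbar - 1) / (1 / sqrt(n))

hist(z, breaks = 50, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE)   # overlays the standard normal density
```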
Exercise

On Friday, we will perform an exercise in R that
will allow you to work with some of these
probability distributions!