1/3/2017
Statistics and Probability Distributions
Random Variables and Probability Distributions
Certain probability distributions are assumed by many of the common statistical tests
ANOVA assumes variables follow a normal distribution (need to meet assumptions to use ANOVA)
Probability distributions fit many real-world data
Ecological Analyses
Random Variables
Discrete random variables
e.g., 1, 3, 5
Presence versus absence
Number of offspring born to swallows
Continuous random variables
Can have any value within an interval
e.g., body mass, wing length
Accuracy versus Precision
Which sample is ‘better’?
[Figure: samples labelled Precise and Accurate]
Precision, Accuracy and Bias
Accuracy is how close the estimated value is to the true value; this difference is the bias
Precision is the variation in the measurement
Your sample indicates precision, but you don’t know its accuracy!
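The distinction between bias and precision can be made concrete with a small numerical sketch. The measurements and the "true" wing length below are invented for illustration only; in practice the true value is unknown, which is exactly why a sample reveals precision but not accuracy.

```python
import statistics

def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean minus true value, and the sample standard deviation."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical wing lengths (mm); true_length is assumed known only for this sketch
true_length = 100.0
precise_but_biased = [104.9, 105.0, 105.1, 105.0]
accurate_but_imprecise = [92.0, 108.0, 97.0, 103.0]

bias_a, spread_a = bias_and_spread(precise_but_biased, true_length)
bias_b, spread_b = bias_and_spread(accurate_but_imprecise, true_length)
print(f"precise sample:  bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"accurate sample: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

The first sample has a small spread (high precision) but a large bias; the second is centred on the true value but varies widely.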
Discrete Random Variable Distributions
Bernoulli Random Variable
X ~ Bernoulli(p)
Experiment has only two outcomes (e.g., organism present or absent)
A Bernoulli random variable describes the outcome of such an experiment
The random variable X is distributed as a Bernoulli random variable with a single parameter ‘p’
The best example would be the toss of a ‘fair’ coin, in which either outcome is equally likely (i.e., p = 0.5)
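A Bernoulli trial is easy to simulate, which may help make the single parameter p tangible. The following is a minimal sketch using Python's standard random module; the function name and the seed are illustrative choices, not part of the lecture.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 ('success', e.g. organism present) with probability p, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the sketch is repeatable
tosses = [bernoulli_trial(0.5, rng) for _ in range(10_000)]
proportion_heads = sum(tosses) / len(tosses)
print(proportion_heads)  # should be close to p = 0.5 for a fair coin
```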
Bernoulli Random Variable
Might use a Bernoulli random variable to look at the presence or absence of a species in a number of different locations (e.g., habitats, lakes)
Binomial Random Variable
Many Bernoulli trials = binomial random variable
Necessary because we would also want to involve replication in our experiments
Binomial Random Variable
X ~ Bin(n,p)
A binomial random variable X is the number of successful results in n independent Bernoulli trials (parameters n and p)
If n = 1, then the result is equivalent to a Bernoulli trial
One of the most common types of random variables encountered in ecological studies
Binomial Random Variable
The probability of obtaining X successes for a binomial random variable is:
$$P(X) = \frac{n!}{X!\,(n-X)!}\, p^{X} (1-p)^{n-X}$$
where n is the number of trials, X is the number of successful outcomes (X ≤ n) and n! is n factorial (i.e., n × (n-1) × (n-2) × ... × 1)
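The binomial probability can be computed directly from its definition. The sketch below assumes a hypothetical survey of 10 lakes, each occupied with probability 0.3; these numbers are for illustration and do not come from the lecture.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): 'n choose x' arrangements, each with probability p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: 10 lakes surveyed, each occupied with probability 0.3
n_lakes, p_present = 10, 0.3
pmf = [binomial_pmf(x, n_lakes, p_present) for x in range(n_lakes + 1)]
print(f"P(X = 3) = {pmf[3]:.4f}")
print(f"sum over all X = {sum(pmf):.6f}")  # probabilities over every possible X add up to 1
```

With n = 1 the same function reduces to a Bernoulli trial: binomial_pmf(1, 1, p) equals p.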
Binomial Random Variable
Think of $\binom{n}{X} = \frac{n!}{X!\,(n-X)!}$ as “n choose X”, which is known as the binomial coefficient
Needed because there are many ways to obtain a given combination of successes and failures
Binomial Coefficient
Consider the following set of five small mammals: {(red-backed vole), (meadow vole), (deer mouse), (short-tailed shrew), (jumping mouse)}
How many unique pairs of small mammals can be formed from this set?
Binomial Coefficient
1. (red-backed vole), (meadow vole)
2. (red-backed vole), (deer mouse)
3. (red-backed vole), (short-tailed shrew)
4. (red-backed vole), (jumping mouse)
5. (meadow vole), (deer mouse)
6. (meadow vole), (short-tailed shrew)
7. (meadow vole), (jumping mouse)
8. (deer mouse), (short-tailed shrew)
9. (deer mouse), (jumping mouse)
10. (short-tailed shrew), (jumping mouse)
Binomial Coefficient
Using the binomial coefficient, we would set n = 5 and X = 2 and get 10 combinations:
$$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = \frac{120}{2 \times 6} = 10$$
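The enumeration of pairs and the binomial coefficient can be cross-checked in a few lines. This sketch uses only the Python standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

mammals = [
    "red-backed vole",
    "meadow vole",
    "deer mouse",
    "short-tailed shrew",
    "jumping mouse",
]

# Enumerate every unordered pair, in the same order as the numbered list
pairs = list(combinations(mammals, 2))
for i, (first, second) in enumerate(pairs, start=1):
    print(f"{i}. ({first}), ({second})")

# The binomial coefficient gives the count without listing the pairs
n_pairs = comb(len(mammals), 2)
print(n_pairs)
```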
The Binomial Distribution
The following example (details on pages 31 and 32) illustrates the distribution of the various X values obtained out of 25 trials
Binomial Random Variable
By having a predicted (or theoretical) distribution, we can then see if our observed results ‘fit’ that distribution
But first we need to know about the available distributions
Calculating the Binomial Distribution
The Binomial Distribution
Binomial Distribution with X~Bin(25,0.8)
Probability distribution
Symmetrical (both tails equal): true only when p = (1 - p) = 0.5
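The symmetry claim can be checked numerically by comparing P(X = x) with P(X = n - x) for every x. The sketch below does this for 25 trials with p = 0.5 and p = 0.8; the helper names are illustrative.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def is_symmetric(n, p):
    """True if both tails match, i.e. P(X = x) equals P(X = n - x) for every x."""
    return all(
        isclose(binomial_pmf(x, n, p), binomial_pmf(n - x, n, p), rel_tol=1e-9)
        for x in range(n + 1)
    )

symmetric_half = is_symmetric(25, 0.5)    # p = (1 - p) = 0.5
symmetric_skewed = is_symmetric(25, 0.8)  # p differs from (1 - p)
print(symmetric_half, symmetric_skewed)
```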
Poisson Random Variables
The binomial distribution is appropriate when there is a fixed number of trials (n) and the probability of success is not too small
Formula becomes awkward when n becomes large and p becomes small (i.e., for rare occurrences of animals or plants)
Also need to be able to directly count the trials themselves
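One way to see why the Poisson distribution takes over for rare events: when n is large and p is small, binomial probabilities are closely approximated by Poisson probabilities with mean np. The sketch below compares the two for a hypothetical n = 1000 and p = 0.002; these values are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

n, p = 1000, 0.002  # hypothetical: many trials, rare success
lam = n * p         # matching Poisson mean

for x in range(6):
    print(x, f"{binomial_pmf(x, n, p):.4f}", f"{poisson_pmf(x, lam):.4f}")

max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
```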
Poisson Random Variables
Instead, we frequently count the events that occur within a sample
Suppose that you are using a number of quadrats to sample for the presence of animal damage
Each occurrence represents the ‘success’ of an unobserved event
Can’t really determine how many ‘trials’ have taken place
Similar for trials in time: number of birds visiting a feeder over a period of time
We use the Poisson distribution
Poisson Random Variables
X ~ Poisson(λ)
X is the number of occurrences of an event recorded in a sample of fixed area or during a fixed time interval
Used when occurrences are rare (i.e., the most common number of counts in any sample is 0)
Poisson Random Variables
X ~ Poisson(λ)
X is the number of events in a sample
Described by a single parameter, λ
λ is the average value of the number of occurrences of the event in each sample
Poisson Random Variables
Suppose that the average number of damaged plants in a 10-m² quadrat is 2
What are the chances that a single quadrat will contain 3 damaged plants?
λ = 2, x = 3
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad P(3) = \frac{2^{3} e^{-2}}{3!} \approx 0.180$$
Poisson Random Variables
The chances that a plot will contain no damage would be (λ = 2, x = 0):
$$P(0) = \frac{2^{0} e^{-2}}{0!} = e^{-2} \approx 0.135$$
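The two quadrat calculations (λ = 2 with x = 3 and with x = 0) can be reproduced with a short function. This is a minimal sketch of the standard Poisson probability formula; the function name is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average of lam per sample."""
    return lam**x * exp(-lam) / factorial(x)

p_three = poisson_pmf(3, 2)  # a quadrat with exactly 3 damaged plants
p_none = poisson_pmf(0, 2)   # a quadrat with no damage
print(f"P(3) = {p_three:.3f}")  # prints P(3) = 0.180
print(f"P(0) = {p_none:.3f}")   # prints P(0) = 0.135
```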
Poisson Distributions
[Figure: Poisson distributions for λ = 0.1, 0.5, 1.0, 2.0 and 4.0]
Later we can test observed frequencies against these theoretical distributions to see if our predictions are met ...
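In place of the plotted panels, the shape of each distribution can be tabulated. The sketch below prints P(X = x) for x = 0 to 8 for the five values of λ; the layout and helper name are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lambdas = [0.1, 0.5, 1.0, 2.0, 4.0]
tables = {lam: [poisson_pmf(x, lam) for x in range(9)] for lam in lambdas}

print("lambda " + " ".join(f"x={x:<5d}" for x in range(9)))
for lam, probs in tables.items():
    print(f"{lam:<6} " + " ".join(f"{p:.4f} " for p in probs))

# For lambda below 1 the most probable count is 0 (events are rare)
most_probable = {lam: probs.index(max(probs)) for lam, probs in tables.items()}
```

As λ grows, the peak moves away from zero and the distribution becomes less skewed.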
Expected value of a Discrete Random Variable
The entire distribution can be summarized by determining the average value
Straight averaging can be misleading with probability distributions, because we need to weight by their probabilities
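The weighting described above corresponds to the usual definition E(X) = Σ aᵢ pᵢ, where each possible value aᵢ is multiplied by its probability pᵢ. The sketch below contrasts this with a straight average, using an invented distribution of offspring numbers (the values and probabilities are hypothetical).

```python
def expected_value(values, probabilities):
    """Probability-weighted average: sum of value * probability."""
    return sum(v * p for v, p in zip(values, probabilities))

# Hypothetical distribution of offspring number
offspring = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

straight_average = sum(offspring) / len(offspring)  # ignores the probabilities
weighted_average = expected_value(offspring, probs)
print(straight_average, round(weighted_average, 6))
```

Because larger broods are more probable in this made-up example, the expected value is higher than the straight average.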
Variance of a Discrete Random Variable
The variance of a random variable is a measure of how far the actual values of a random variable differ from the expected value
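In symbols, the variance is the expected squared deviation from the expected value, σ²(X) = E[(X − E(X))²]. A minimal sketch, again using an invented offspring distribution and a fair-coin Bernoulli variable as examples:

```python
def expected_value(values, probabilities):
    """Probability-weighted average of the values."""
    return sum(v * p for v, p in zip(values, probabilities))

def variance(values, probabilities):
    """Probability-weighted average of squared deviations from the expected value."""
    mean = expected_value(values, probabilities)
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probabilities))

# Hypothetical distribution of offspring number
var_offspring = variance([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4])

# Bernoulli variable with p = 0.5: variance equals p * (1 - p)
var_coin = variance([0, 1], [0.5, 0.5])
print(round(var_offspring, 6), var_coin)
```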
Discrete Statistical Distributions
[Figure: Female horseshoe crabs with satellite males; axis label: Number of Satellite Males]
Continuous Random Variables
Most ecological variables are not discrete:
Body mass
Wing length
Concentrations of chemicals
Heights and diameters of trees
Within an interval, there are infinitely many possible values for a variable
[Figure: Female horseshoe crabs with satellite males; axis label: Female Carapace Width (mm)]
Uniform Random Variables
We break up the continuous variable into discrete intervals
The sum of the probability of occurrence of all intervals will be 1.0
Uniform Random Variable
The probability that this uniform random variable X occurs in any subinterval
Uniform Random Variable
f(x) is a probability density function (PDF)
It assigns the probability P that a continuous variable X occurs within an interval I
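For a uniform random variable the density is constant, so the probability of a subinterval is simply its width divided by the width of the whole range. The sketch below assumes a uniform variable on the interval [0, 10]; that range is a hypothetical choice for illustration, not taken from the slides.

```python
from math import isclose

LOW, HIGH = 0.0, 10.0  # assumed range of the uniform random variable

def uniform_pdf(x, low=LOW, high=HIGH):
    """Constant density 1 / (high - low) inside the range, 0 outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def uniform_interval_prob(a, b, low=LOW, high=HIGH):
    """P(a <= X <= b): width of the part of [a, b] inside the range, over the range width."""
    overlap = max(0.0, min(b, high) - max(a, low))
    return overlap / (high - low)

# Break the range into ten unit-wide intervals; each has probability 0.1
interval_probs = [uniform_interval_prob(i, i + 1) for i in range(10)]
total = sum(interval_probs)
print(interval_probs[0], round(total, 6))
```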
Probability Density and Cumulative Distribution Functions
Cumulative Distribution Function
F(y) = P(X ≤ y)
CDF represents the tail probability: the probability that a random variable X is less than or equal to some value y
More when we look at statistical tests
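A cumulative distribution function can be sketched for the simplest continuous case, a uniform random variable. The range [0, 10] below is an assumption made for illustration. The example also shows that the probability of an interval is the difference of two CDF values.

```python
from math import isclose

def uniform_cdf(y, low=0.0, high=10.0):
    """F(y) = P(X <= y) for X uniform on [low, high] (range assumed for this sketch)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

tail_below_4 = uniform_cdf(4.0)                      # P(X <= 4)
between_2_and_4 = uniform_cdf(4.0) - uniform_cdf(2.0)  # P(2 < X <= 4)
print(tail_below_4, round(between_2_and_4, 6))
```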
Normal (Gaussian) Random Variables
Normal Random Variables
X ~ N(μ, σ)
E(X) = μ
σ²(X) = σ²
Symmetric around μ
Standard Normal: X ~ N(0, 1)
Properties of the Normal Distribution
Normal distributions can be added
The sum of two independent normal random variables is also a normally distributed random variable
E(X+Y) = E(X) + E(Y)
σ²(X+Y) = σ²(X) + σ²(Y)
Properties of the Normal Distribution
Normal distributions can be transformed
Properties of the Normal Distribution
Normal distributions can be standardized
A special case of a transformation Y = aX + b
If a = 1/σ and b = -1(μ/σ)
E(Y) = aμ + b and σ²(Y) = a²σ²
For X ~ N(μ, σ), Y = (1/σ)X - μ/σ = (X - μ)/σ
E(Y) = 0, σ²(Y) = 1
For each X, μ is subtracted and the result is divided by σ
[Figure: Log-normal and Exponential Distributions]
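The addition and standardization properties of normal random variables can be checked with Python's statistics.NormalDist class, which supports adding independent normal variables and shifting or scaling by constants. The means and standard deviations below are hypothetical values chosen so the arithmetic is easy to follow.

```python
from statistics import NormalDist

# Hypothetical independent normal variables (mean, standard deviation)
X = NormalDist(mu=50.0, sigma=4.0)
Y = NormalDist(mu=30.0, sigma=3.0)

# Sum of independent normals: means add and variances add
S = X + Y
print(S.mean, S.variance)  # 80.0 and 16 + 9 = 25.0

# Standardization: subtract the mean, then divide by the standard deviation
Z = (X - X.mean) / X.stdev
print(Z.mean, Z.stdev)     # 0.0 and 1.0, the standard normal
```

Note that it is the variances, not the standard deviations, that add: the standard deviation of the sum here is 5, not 7.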
Continuous Statistical Distributions
Central Limit Theorem
Cornerstone of probability and statistical analyses
Standardizing any random variable that
itself is a sum or average of a set of
independent random variables results in
a new random variable that is “nearly the
same as” a standard normal one
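The theorem can be illustrated by simulation. The sketch below draws many samples from a strongly skewed exponential distribution (mean 1, standard deviation 1), standardizes each sample mean, and checks that the results behave roughly like a standard normal variable. The sample size, number of replicates and seed are arbitrary choices for illustration.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 50                  # observations per sample
replicates = 5000       # number of samples drawn
mu, sigma = 1.0, 1.0    # mean and sd of an exponential distribution with rate 1

def standardized_mean(n, rng):
    """Draw one sample, then standardize its mean: (mean - mu) / (sigma / sqrt(n))."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return (statistics.fmean(sample) - mu) / (sigma / sqrt(n))

z = [standardized_mean(n, rng) for _ in range(replicates)]
z_mean = statistics.fmean(z)
z_sd = statistics.stdev(z)
within_196 = sum(abs(v) <= 1.96 for v in z) / replicates

print(f"mean = {z_mean:.3f}, sd = {z_sd:.3f}, within ±1.96 = {within_196:.3f}")
```

For a standard normal variable the mean is 0, the standard deviation is 1, and about 95% of values fall within ±1.96, so the simulated values should land close to those figures even though the raw data are far from normal.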
Central Limit Theorem
Allows us to use statistics that require a normal distribution even though the underlying data themselves may not be normally distributed
... provided the sample size is large enough ...
Summary
The distributions of random variables can be characterized by their expected values and variances
Discrete: Bernoulli, Binomial, Poisson
Continuous: Uniform, Normal, Exponential
Summary
The Central Limit Theorem asserts that the sums or averages of large, independent samples will approximately follow a normal distribution if standardized
For most ecological data, the Central Limit Theorem supports the use of statistical tests that assume normal distributions