Nonlinear function minimization (review)
Newton’s minimization method
Ecological detective p. 267
[Image: Isaac Newton]
Let $y = f(x)$. We want to find the minimum value of $f(x)$.
$h(x) = \dfrac{dy}{dx}$ (first derivative)
$h'(x) = \dfrac{dh(x)}{dx} = \dfrac{d^2 y}{dx^2}$ (second derivative)
Begin with a starting guess $x_0$ and a jump size $\gamma = 1$, then iterate
$$x_{i+1} = x_i - \gamma\,\frac{h(x_i)}{h'(x_i)}$$
Continue until $x$ stops changing.
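The iteration above can be sketched in Python as follows. This is a minimal illustration, not the book's code: it assumes both derivatives are available as functions, and the name `newton_minimize`, the tolerance, the iteration cap and the test function are all hypothetical choices.

```python
def newton_minimize(h, h_prime, x0, jump=1.0, tol=1e-10, max_iter=100):
    """Newton's method applied to h = f' to locate a minimum of f.

    h       -- first derivative of f
    h_prime -- second derivative of f
    jump    -- jump size multiplying each Newton step (1.0 = full step)
    Converges to a point where h(x) = 0; it is a minimum only if h'(x) > 0 there.
    """
    x = x0
    for _ in range(max_iter):
        x_new = x - jump * h(x) / h_prime(x)
        if abs(x_new - x) < tol:  # continue until x stops changing
            return x_new
        x = x_new
    raise RuntimeError("Newton's method did not converge")


# f(x) = (x - 3)**4 + x**2, so h(x) = 4(x - 3)**3 + 2x and h'(x) = 12(x - 3)**2 + 2
x_min = newton_minimize(lambda x: 4 * (x - 3) ** 3 + 2 * x,
                        lambda x: 12 * (x - 3) ** 2 + 2,
                        x0=0.0)
print(round(x_min, 6))  # 2.0
```

Because Newton's method only looks for $h(x) = 0$, it can also land on a maximum; checking that $h'(x) > 0$ at the answer is a cheap safeguard.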
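Golden section search
$\phi = (\sqrt{5} - 1)/2 \approx 0.618$ (irrational)
$x_1 = 0.618L + 0.382U, \quad x_2 = 0.382L + 0.618U$
[Figure: successive golden-section steps with upper bound U = 1. Step 1: L = 0, interior points x1 and x2. Step 2: L = x1, interior points x2 and x3. Step 3: L = x2, interior points x3 and x4.]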
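A minimal Python sketch of golden section search, assuming f has a single minimum on [L, U]; the function name and stopping tolerance are illustrative. Each step keeps one of the two interior points, and because of the golden ratio the surviving point is already in the right position, so only one new evaluation of f is needed per step.

```python
import math

PHI = (math.sqrt(5) - 1) / 2  # about 0.618


def golden_section(f, lower, upper, tol=1e-8):
    """Minimize f on [lower, upper], assuming f has a single minimum there."""
    x1 = PHI * lower + (1 - PHI) * upper  # 0.618L + 0.382U
    x2 = (1 - PHI) * lower + PHI * upper  # 0.382L + 0.618U
    f1, f2 = f(x1), f(x2)
    while upper - lower > tol:
        if f1 < f2:
            # minimum is in [lower, x2]; the old x1 becomes the new x2
            upper, x2, f2 = x2, x1, f1
            x1 = PHI * lower + (1 - PHI) * upper
            f1 = f(x1)
        else:
            # minimum is in [x1, upper]; the old x2 becomes the new x1
            lower, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lower + PHI * upper
            f2 = f(x2)
    return (lower + upper) / 2


print(golden_section(lambda x: (x - 0.3) ** 2, 0.0, 1.0))  # close to 0.3
```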
Simplex approach
This is a very sophisticated form of hill climbing, and is derivative-free. The algorithm (the Nelder-Mead method) is called "amoeba".
Source: http://optics.nuigalway.ie/people/larry/
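To show what the amoeba does, here is a compact Python sketch of a basic Nelder-Mead simplex (reflect, expand, contract, shrink) with the conventional coefficients 1, 2, 0.5 and 0.5. It is a simplified illustration written for this note, not the specific routine referenced on the slide, and it omits refinements found in library versions (for example outside contraction); the helper names are hypothetical.

```python
def _toward(centroid, vertex, t):
    """Point centroid + t * (vertex - centroid)."""
    return [c + t * (v - c) for c, v in zip(centroid, vertex)]


def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Minimize f(list_of_floats) with a basic Nelder-Mead ("amoeba") simplex."""
    n = len(x0)
    # starting simplex: x0 plus one vertex displaced along each coordinate axis
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # sort vertices from best (lowest f) to worst
        ranked = sorted(zip(values, simplex), key=lambda pair: pair[0])
        values = [val for val, _ in ranked]
        simplex = [vertex for _, vertex in ranked]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        reflected = _toward(centroid, worst, -1.0)      # flip worst through centroid
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = _toward(centroid, worst, -2.0)   # try going further
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = _toward(centroid, worst, 0.5)  # pull worst halfway in
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # shrink all vertices halfway toward the best one
                best = simplex[0]
                simplex = [best] + [_toward(best, v, 0.5) for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_index = values.index(min(values))
    return simplex[best_index]


print(nelder_mead(lambda v: (v[0] - 1) ** 2 + 2 * (v[1] + 0.5) ** 2, [0.0, 0.0]))
```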
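Simulated annealing
Randomly jump to a new spot. If it is better, stay there; if it is worse, usually go back, but occasionally accept the worse spot with a probability that shrinks as the search proceeds (the "temperature" cools), which lets the search escape local minima.
Source: http://www.stanford.edu/~hwang41/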
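A minimal Python sketch of simulated annealing for one parameter. The Gaussian jump size, starting temperature, geometric cooling rate, iteration count and seed are arbitrary illustrative settings (all assumptions), and real applications usually need them tuned. A worse spot is accepted with probability exp(-(increase in f) / temperature), the usual Metropolis rule.

```python
import math
import random


def simulated_annealing(f, x0, step=1.0, t_start=1.0, cooling=0.995,
                        n_iter=5000, seed=1):
    """Minimize a one-parameter function f by simulated annealing."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temp = t_start
    for _ in range(n_iter):
        candidate = x + rng.gauss(0.0, step)  # random jump to a new spot
        fc = f(candidate)
        # better: always move; worse: move with probability exp(-(fc - fx) / temp)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # cool down, so worse moves become rarer
    return best_x


print(simulated_annealing(lambda x: (x - 2.0) ** 2, x0=-5.0))  # close to 2
```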
Complications with model fitting
• Parameter confounding (correlations)
• Problems with numerical derivatives
• Non-continuous problems
• Integer parameters
• Multiple minima
• Constrained parameters
Constrained parameters
• Transform bounded parameters to unbounded using
$y = \dfrac{\arctan(x) + \pi/2}{\pi}$ maps $-\infty < x < \infty$ onto $0 < y < 1$
$y = a + (b - a)\,\dfrac{\arctan(x) + \pi/2}{\pi}$ maps $-\infty < x < \infty$ onto $a < y < b$
• Then let Solver search over x, but use y in the model equations
6 atan_demo.xlsx
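The transform can be written as a pair of Python helpers. `bounded` is the formula from the slide; `unbounded` is its inverse, added here (it is not on the slide) for turning a starting guess for y into a starting value of x. Both names are hypothetical.

```python
import math


def bounded(x, a=0.0, b=1.0):
    """Map any real x onto the open interval (a, b) with the arctan transform."""
    fraction = (math.atan(x) + math.pi / 2) / math.pi  # in (0, 1)
    return a + (b - a) * fraction


def unbounded(y, a=0.0, b=1.0):
    """Inverse of bounded(): map y in (a, b) back onto the whole real line."""
    return math.tan(math.pi * (y - a) / (b - a) - math.pi / 2)


print(bounded(0.0), bounded(0.0, 10.0, 20.0))  # 0.5 15.0
```

In a fitting routine the optimizer adjusts x freely while the model equations use `bounded(x, a, b)`, so the parameter can never leave (a, b).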
Arctan transformation
[Figure: values of x from −10 to 10 (horizontal axis) arctan-transformed to the 0 to 1 range (vertical axis, y)]
Hints for minimization
• Constrain population sizes to not go negative
• Bound parameters in code using ABS or ATAN
• Particularly problematic are multiple proportions
that must add to 1
– Fix each p to be 1-sum of the previous ones
• In Solver, make the convergence criterion smaller
• Keep away from extremely small or extremely
large values
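One way to combine the ATAN bound with the "1 − sum of the previous ones" rule for proportions is sketched below in Python. This construction (often called stick-breaking) is an illustration, not necessarily what the course spreadsheets do, and the function names are hypothetical. Each free value is mapped to (0, 1) and takes that fraction of whatever is left; the last proportion gets the remainder, so every proportion stays positive and the total is 1 (up to floating-point rounding).

```python
import math


def to_unit(x):
    """Arctan transform of any real x onto (0, 1)."""
    return (math.atan(x) + math.pi / 2) / math.pi


def proportions(free_values):
    """Turn k - 1 unbounded values into k positive proportions that sum to 1.

    Each proportion is a (0, 1) fraction of what is left, 1 - sum(previous);
    the last proportion is fixed to the remainder.
    """
    props = []
    remaining = 1.0
    for x in free_values:
        p = to_unit(x) * remaining
        props.append(p)
        remaining -= p
    props.append(remaining)
    return props


print(proportions([0.0, 0.0]))  # [0.5, 0.25, 0.25]
```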
Conclusions
• Non-linear minimization is as much art as science
• You cannot just plug numbers into a program and hope for the best; you must make checks to ensure convergence
• Takes time and experience, but is well rewarded
Probability distributions and likelihood
Readings
• Ecological detective:
– Chapter 3 Probability distributions
• Wikipedia (seriously!)
– e.g. Beta distribution, lognormal distribution, etc.
Overview
• Probability vs. likelihood
• Probability distributions: binomial, Poisson, normal, lognormal, negative binomial, beta, gamma, multinomial
• Likelihood profile
• The concept of support
• Model selection, likelihood ratio, AIC
• Robustness
• Contradictory data
Probability
• If I flip a fair coin 10 times, what is the probability of it landing heads up every time?
• Given the fixed parameter (p = 0.5), what is the probability of different outcomes?
• Probabilities add up to 1.
Likelihood
• I flipped a coin 10 times and obtained 10 heads. What is the likelihood that the coin is fair?
• Given the fixed outcomes (data), what is the likelihood of different parameter values?
• Likelihoods do not add up to 1.
• Hypotheses (parameter values) are compared using likelihood values (higher = better).
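The coin example can be made concrete in a few lines of Python (a sketch; the function name and the grid of p values are arbitrary). The same expression, p**10, is read as a probability when p is fixed and as a likelihood when the data (10 heads) are fixed and p varies.

```python
def prob_all_heads(p, n=10):
    """Pr{n heads in n flips | heads probability p} = p**n."""
    return p**n


# Probability: the parameter is fixed (p = 0.5) and we ask about an outcome
print(prob_all_heads(0.5))  # 0.0009765625

# Likelihood: the data are fixed (10 heads in 10 flips) and p varies
for p in (0.5, 0.7, 0.9, 1.0):
    print(p, prob_all_heads(p))
```

The four likelihoods sum to more than 1, illustrating that likelihoods do not add up to 1, and p = 1 has the highest likelihood for these data.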
[Figure: two panels showing a normal probability density over values of x from 0 to 30. Probability panel: the area under the curve between 5 and 10. Likelihood panel: the height of the curve at x = 10 and at x = 14.]
What is the probability that 5 ≤ x ≤ 10 given a normal distribution with µ = 13 and σ = 4? Answer: 0.204
What is the probability that −1000 ≤ x ≤ 1000 given a normal distribution with µ = 13 and σ = 4? Answer: 1.000
What is the likelihood that µ = 13 and σ = 4 if you observed a value of
(a) x = 10? (answer: the likelihood is 0.075)
(b) x = 14? (answer: the likelihood is 0.097)
Conclusion: if the observed value was 14, it is more likely that the parameters are µ = 13 and σ = 4, because 0.097 is higher than 0.075.
We use the same (normal) probability distribution
function for both probability and likelihood!
[Figure: normal probability density over values of x from 0 to 30]
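The answers above can be reproduced with Python's standard library (a sketch; `normal_pdf` and `normal_cdf` are hypothetical helper names, and the CDF uses the standard identity with `math.erf`). The CDF gives probabilities (areas under the curve) and the PDF gives likelihoods (heights of the curve).

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x: used as the likelihood."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


mu, sigma = 13, 4
print(round(normal_cdf(10, mu, sigma) - normal_cdf(5, mu, sigma), 3))            # 0.204
print(round(normal_cdf(1000, mu, sigma) - normal_cdf(-1000, mu, sigma), 3))      # 1.0
print(round(normal_pdf(10, mu, sigma), 3), round(normal_pdf(14, mu, sigma), 3))  # 0.075 0.097
```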
Common probability distributions
• Discrete: binomial, Poisson, negative binomial,
multinomial
• Continuous: normal, lognormal, beta, gamma,
(negative binomial)
7 distributions.xlsx
Examples of all distributions defined here, including Excel functions and functions defined directly in the spreadsheet
Binomial probability distribution
$$\Pr\{Z = k\} = \binom{N}{k} p^{k} (1 - p)^{N - k}$$
N = number of trials; k = number of successes; p = probability of success, in [0, 1]
$$\text{mean}(Z) = Np \qquad \text{var}(Z) = Np(1 - p)$$
The "factorial term" $\binom{N}{k} = \dfrac{N!}{k!\,(N - k)!}$ counts how many ways there are of selecting k objects from among N objects.
Example: probability of getting k = 5 heads when flipping a coin N = 10 times, if the coin is fair (p = 0.5). Note: the number of trials is known.
SD and CV (all distributions)
standard deviation (SD) $= \sqrt{\text{variance}}$
coefficient of variation (CV) $= \dfrac{\text{SD}}{\text{mean}}$
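A small Python check of the binomial coin example and of the SD and CV definitions (a sketch; the helper name is hypothetical). Summing over every possible outcome confirms mean = Np and var = Np(1 − p) for N = 10, p = 0.5.

```python
import math
from math import comb


def binomial_pmf(k, n, p):
    """Pr{Z = k}: k successes in n trials, each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
print(binomial_pmf(5, n, p))  # 0.24609375 (= 252 / 1024)

# mean and variance by summing over every possible outcome k = 0..n
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
sd = math.sqrt(var)   # standard deviation
cv = sd / mean        # coefficient of variation
print(mean, var, round(cv, 3))  # 5.0 2.5 0.316
```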
Poisson probability distribution
$$\Pr\{Z = k\} = \frac{e^{-\lambda}\lambda^{k}}{k!}$$
k = number of events; λ = expected number of events
$$\text{mean}(Z) = \lambda \qquad \text{var}(Z) = \lambda$$
Example: On average there are λ = 9.4 fatal traffic accidents in Washington State every week. What is the probability that there would be k = 0 in a week? (Note: rare event out of large number of possible events.)
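The traffic-accident example in Python (a sketch; the helper name is hypothetical). The infinite sums for the mean and variance are truncated at k = 100, an assumption that is harmless here because the probabilities beyond that are vanishingly small for λ = 9.4.

```python
import math


def poisson_pmf(k, lam):
    """Pr{Z = k} for a Poisson with expected number of events lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 9.4
print(poisson_pmf(0, lam))  # about 8.27e-05: a week with no fatal accidents is very unlikely

ks = range(100)  # truncate the infinite sum; P(k >= 100) is negligible for lam = 9.4
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean, 6), round(var, 6))  # 9.4 9.4
```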
Limitations of Poisson
• Has only one parameter, which is both the mean
and the variance
• We often have discrete count data, but in real-life
data the variance is often larger than predicted by
the Poisson
Thus we often use the negative binomial
• Closely related to the Poisson and binomial
• One extra parameter related to the variance
• VERY useful
• Looks scary, but don’t be scared!
Standard negative binomial
$$\Pr\{Z = k\} = \frac{(k + r - 1)!}{(r - 1)!\,k!}\,(1 - p)^{r} p^{k}$$
r = number of failures; k = number of successes; p = probability of a success
$$\text{mean}(Z) = \frac{pr}{1 - p} \qquad \text{var}(Z) = \frac{pr}{(1 - p)^2}$$
Squint a lot and this looks kind of like a binomial.
Example: a factory makes widgets successfully with probability p. How many successful widgets have been made by the time r = 3 failed widgets have been made? The distribution predicts the probability of k = 0, 1, 2, … successful widgets being made.
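A Python sketch of the standard negative binomial for the widget example. The success probability p = 0.9 is a hypothetical value chosen for illustration, and the sums are truncated at k = 2000, where the remaining probability is negligible.

```python
from math import comb


def neg_binomial_pmf(k, r, p):
    """Pr{Z = k}: k successes before the r-th failure, success probability p per trial."""
    # comb(k + r - 1, k) equals (k + r - 1)! / ((r - 1)! k!)
    return comb(k + r - 1, k) * (1 - p) ** r * p**k


r, p = 3, 0.9     # r = 3 failed widgets; p = 0.9 is a hypothetical success rate
ks = range(2000)  # truncation; the probability beyond k = 2000 is negligible here
total = sum(neg_binomial_pmf(k, r, p) for k in ks)
mean = sum(k * neg_binomial_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * neg_binomial_pmf(k, r, p) for k in ks)
print(round(total, 6))                                # 1.0
print(round(mean, 3), round(p * r / (1 - p), 3))      # both 27.0
print(round(var, 3), round(p * r / (1 - p) ** 2, 3))  # both 270.0
```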
Ecological usefulness?
• Almost no ecological problems can be thought of
as successes or failures in this way
• Great for factory production problems
• But we want a function with parameters for
– Mean
– Overdispersion (increased variance = increased chance
of extreme events)
• Integer events are rare in nature; we want to deal with real numbers
Practitioner’s negative binomial
$$\Pr\{Z = k\} = \frac{\Gamma(\theta^{-1} + k)}{\Gamma(\theta^{-1})\,\Gamma(k + 1)}\left(\frac{1}{1 + \theta\lambda}\right)^{\theta^{-1}}\left(\frac{\theta\lambda}{1 + \theta\lambda}\right)^{k}$$
Γ = gamma function (a factorial that accepts non-integers, see later); θ = overdispersion parameter; λ = predicted mean
$$\text{mean}(Z) = \lambda \qquad \text{var}(Z) = \lambda + \theta\lambda^{2}$$
As θ increases, variance increases, hence “overdispersion”.
As θ → ∞, var(Z) → ∞.
As θ → 0, var(Z) → λ, just like a Poisson!
Example: our data contain observations k, with mean λ and variance greater than λ. Find the value of overdispersion θ that best accounts for this increased variance.
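A Python sketch of the practitioner's negative binomial, computed on the log scale with `math.lgamma` (the same idea as Excel's GAMMALN). The values λ = 5 and θ = 0.5 are hypothetical; summing over k (truncated at 500) recovers mean λ and variance λ + θλ², and a very small θ gives nearly the Poisson probabilities.

```python
import math


def nb_pmf(k, lam, theta):
    """Practitioner's negative binomial: predicted mean lam, overdispersion theta > 0."""
    r = 1.0 / theta
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             - r * math.log1p(theta * lam)                        # (1 / (1 + theta*lam))**(1/theta)
             + k * math.log(theta * lam / (1.0 + theta * lam)))  # (theta*lam / (1 + theta*lam))**k
    return math.exp(log_p)


lam, theta = 5.0, 0.5  # hypothetical values
ks = range(500)        # truncation of the infinite sum
mean = sum(k * nb_pmf(k, lam, theta) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf(k, lam, theta) for k in ks)
print(round(mean, 6), round(var, 6), lam + theta * lam**2)  # 5.0 17.5 17.5

# As theta -> 0 the probabilities approach the Poisson
poisson_3 = math.exp(-lam) * lam**3 / math.factorial(3)
print(nb_pmf(3, lam, 1e-6), poisson_3)  # nearly equal
```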
Weird facts about the practitioner’s negative binomial
• When θ → 0 this doesn’t just smell like a Poisson,
and act like a Poisson, it is the Poisson (advanced
stats)
• By replacing the factorials with gamma functions,
the r and k can be real numbers not just integers
• What on earth is a gamma function???
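To get a feel for the gamma function, Python's standard library offers both `math.gamma` and its natural log `math.lgamma`. The sketch below checks two standard properties, Γ(z) = (z − 1)! for positive integers and Γ(z + 1) = zΓ(z), and shows why ratios of gamma functions are usually computed on the log scale: Γ itself overflows a double-precision number once its argument passes about 171. The values 200 and 30 are arbitrary illustrations.

```python
import math

# Gamma(z) = (z - 1)! when z is a positive integer
print(math.gamma(5), math.factorial(4))                     # 24.0 24

# Gamma(z + 1) = z * Gamma(z) also holds for non-integer z
z = 2.7
print(math.isclose(math.gamma(z + 1), z * math.gamma(z)))  # True

# Gamma(230) is far beyond the largest double, so the direct ratio fails ...
theta_inv, k = 200.0, 30
try:
    math.gamma(theta_inv + k) / (math.gamma(theta_inv) * math.gamma(k + 1))
except OverflowError:
    print("direct ratio overflows")

# ... but the log-scale version (the analogue of the GAMMALN route) works
ratio = math.exp(math.lgamma(theta_inv + k) - math.lgamma(theta_inv)
                 - math.lgamma(k + 1))
# here the ratio equals 229! / (199! 30!), a binomial coefficient
print(math.isclose(ratio, math.comb(229, 30), rel_tol=1e-9))  # True
```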
Gamma function Γ()
A generalized factorial function that accepts real numbers, not just integers
$\Gamma(z) = \int_0^\infty e^{-t}\,t^{z-1}\,dt$ when z is a real number (z > 0)
$\Gamma(z + 1) = z\,\Gamma(z)$ (one of its properties)
$\Gamma(z) = (z - 1)!$ when z is a positive integer
Excel: older versions (before Excel 2013) do not have a gamma function, but Excel has a ln of gamma function (GAMMALN), so compute
$$\frac{\Gamma(\theta^{-1} + k)}{\Gamma(\theta^{-1})\,\Gamma(k + 1)} = \exp\left[\ln\Gamma(\theta^{-1} + k) - \ln\Gamma(\theta^{-1}) - \ln\Gamma(k + 1)\right]$$
Multinomial probability distribution
$$\Pr\{X_i = x_i\} = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}\,p_2^{x_2}\cdots p_k^{x_k}$$
n = total number observed; $x_i$ = observed number in category i; $p_i$ = predicted proportion in category i
$$\text{mean}(X_i) = np_i \qquad \text{var}(X_i) = np_i(1 - p_i)$$
Example: fitting a model to proportions-at-age (or proportions-at-length) data. The model produces predicted proportions $p_i$ and the data give observed numbers $x_i$ in each category. Total number sampled: n = x1 + x2 + … + xk
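A Python sketch of the multinomial log-probability, the quantity typically summed (and negated) when fitting proportions-at-age data. It works on the log scale with `math.lgamma` so that large sample sizes do not overflow; the function name, counts and proportions in the example are hypothetical.

```python
import math


def multinomial_log_pmf(x, p):
    """Log of Pr{X_i = x_i} for observed counts x and predicted proportions p."""
    n = sum(x)
    # log of the factorial term n! / (x_1! x_2! ... x_k!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    # categories with x_i = 0 contribute p_i**0 = 1, so skip them (avoids log(0))
    return log_coef + sum(xi * math.log(pi) for xi, pi in zip(x, p) if xi > 0)


observed = [2, 1, 1]           # hypothetical counts in three age classes
predicted = [0.5, 0.25, 0.25]  # hypothetical model proportions
print(math.exp(multinomial_log_pmf(observed, predicted)))  # about 0.1875 (= 12 / 64)
```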
[Figure: probability density vs. values of x (0 to 26), predicted values compared with multinomial data, in one panel for n = 100 and one for n = 10000]
Unrealism of multinomial (and other distributions too!)
• Assumes every sampling event is completely
independent
• But there is much correlation in reality
– Same trawl, area, time of day, day of year, gender, etc.
• Real data never ever fit a multinomial this well
• Later lectures will introduce the concept of “effective sample size” n_eff, which will be smaller than the reported sample size n.
Normal distribution
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
$$\text{mean}(x) = \mu \qquad \text{var}(x) = \sigma^2$$
[Figure: normal probability density vs. values of x, 0 to 30]
Lognormal distribution
$$f(x) = \frac{1}{x}\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(\ln x - \ln\mu)^2}{2\sigma^2}\right]$$
$$\text{mean}(x) = \mu\exp\left(\frac{\sigma^2}{2}\right) \qquad \text{var}(x) = \mu^2\left[\exp(\sigma^2) - 1\right]\exp(\sigma^2)$$
[Figure: lognormal probability density vs. values of x, 0 to 30]
Lognormal: key notes
• 0 < x < ∞
• Mean(x) is not µ
• If we want the mean to be µ, then replace the model parameter with: $\mu^{*} = \mu\exp\left(-\dfrac{\sigma^2}{2}\right)$
• Used widely for abundance and biomass
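A quick simulation in Python illustrates the bias correction (a sketch; µ = 10, σ = 0.5, the seed and the sample size are arbitrary, and the helper name is hypothetical). `random.lognormvariate(m, s)` draws x with ln x ~ Normal(m, s), so passing ln µ matches the lognormal density with parameter µ.

```python
import math
import random

rng = random.Random(42)


def lognormal_sample_mean(mu, sigma, n=100_000):
    """Average of n draws with ln x ~ Normal(ln mu, sigma)."""
    return sum(rng.lognormvariate(math.log(mu), sigma) for _ in range(n)) / n


mu, sigma = 10.0, 0.5                          # hypothetical parameter values
print(lognormal_sample_mean(mu, sigma))        # near mu * exp(sigma**2 / 2), about 11.33
mu_star = mu * math.exp(-sigma**2 / 2)         # corrected parameter
print(lognormal_sample_mean(mu_star, sigma))   # near 10, the intended mean
```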
Beta distribution
$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha - 1}(1 - x)^{\beta - 1}$$
$$\text{mean}(x) = \frac{\alpha}{\alpha + \beta} \qquad \text{var}(x) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$
Note: $\dfrac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$ is often written as $B(\alpha, \beta)$.
[Figure: beta probability density vs. values of x (0 to 1) for (α, β) = (0.5, 0.5), (1, 1), (1.3, 1.3), (4, 4), (0.5, 2), (2, 6) and (50, 50)]
Beta: key notes
• Values confined to be 0 < x < 1
• Can mimic almost any shape within those bounds
• Although bounded, the bounds can be changed by rescaling (multiplying / dividing) the x values
• E.g. survival parameters
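A Python sketch of the beta density, checking the mean and variance formulas numerically for (α, β) = (2, 6), one of the plotted shapes. The midpoint-rule integration and the grid size are illustrative choices, and `beta_pdf` is a hypothetical helper name.

```python
import math


def beta_pdf(x, a, b):
    """Beta density with shape parameters a (alpha) and b (beta), for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)  # log of 1 / B(a, b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


alpha, beta = 2.0, 6.0
n = 20_000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]  # midpoints of n equal slices of (0, 1)
mean = sum(x * beta_pdf(x, alpha, beta) for x in xs) * dx
var = sum((x - mean) ** 2 * beta_pdf(x, alpha, beta) for x in xs) * dx
print(round(mean, 6), alpha / (alpha + beta))  # 0.25 0.25
print(round(var, 6), alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
```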
Gamma distribution
$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha - 1}e^{-\beta x}$$
$$\text{mean}(x) = \frac{\alpha}{\beta} \qquad \text{var}(x) = \frac{\alpha}{\beta^{2}}$$
[Figure: gamma probability density vs. values of x for (α, β) = (4, 1), (4, 2), (1.1, 0.5), (60, 5) and (0.9, 0.0001)]
Gamma: key notes
• 0≤x<∞
• Somewhat like an exponential, lognormal, or
normal
• Flexibility without being bounded like the beta
distribution
• E.g. salmon arrival numbers plotted over time
• The Excel function GAMMA.DIST() uses a scale rather than a rate parameter, so it assumes parameters α* = α and β* = 1/β
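A Python sketch of the gamma density in the rate form used on the slide, plus the parameter conversion needed for software that expects a scale parameter. Python's `random.gammavariate(alpha, beta)` treats its second argument as a scale, as Excel's GAMMA.DIST does, so the rate β is passed as 1/β. The values α = 4, β = 2 (one of the plotted shapes), the seed, the sample size and the helper name are illustrative.

```python
import math
import random


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and RATE beta (mean alpha / beta), for x > 0."""
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1) * math.log(x) - beta * x)


alpha, beta = 4.0, 2.0  # one of the plotted shapes; mean 2, variance 1
rng = random.Random(7)
# random.gammavariate expects a SCALE parameter, so pass 1 / beta
draws = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2), alpha / beta)  # sample mean close to 2.0
```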