Nonlinear function minimization (review)

Newton’s minimization method
Ecological detective p. 267
[Image: Isaac Newton]

Let $y = f(x)$. We want to find the minimum value of $f(x)$.

$$h(x) = \frac{dy}{dx} \quad \text{(first derivative)}$$

$$h'(x) = \frac{dh(x)}{dx} = \frac{d^2 y}{dx^2} \quad \text{(second derivative)}$$

Begin with a starting guess $x_0$ and jump size 1:

$$x_{i+1} = x_i - \frac{h(x_i)}{h'(x_i)}$$

Continue until $x$ stops changing.

Golden section search
$(\sqrt{5} - 1)/2 \approx 0.618$ (irrational)

$$x_1 = 0.618L + 0.382U, \qquad x_2 = 0.382L + 0.618U$$

[Figure: three steps of the search on the interval from L to U. Step 1: L = 0, x1, x2, U = 1. Step 2: L = x1, x2, x3, U = 1. Step 3: L = x2, x3, x4, U = 1.]

Simplex approach
This is a very sophisticated form of hill climbing, and is derivative-free. The algorithm is called “amoeba”.
Source: http://optics.nuigalway.ie/people/larry/

Simulated annealing
Randomly jump to a new spot; if it is better, stay there; if it is worse, go back to where the jump started.
Source: http://www.stanford.edu/~hwang41/

Complications with model fitting
• Parameter confounding (correlations)
• Problems with numerical derivatives
• Non-continuous problems
• Integer parameters
• Multiple minima
• Constrained parameters

Constrained parameters
• Transform bounded parameters to unbounded using

$$y = \frac{\arctan(x) + \pi/2}{\pi} \quad \text{maps } x \text{ onto } 0 \le y \le 1$$

$$y = a + (b - a)\,\frac{\arctan(x) + \pi/2}{\pi} \quad \text{maps } x \text{ onto } a \le y \le b$$

• Then let Solver search over x, but use y in the model equations
Spreadsheet: 6 atan_demo.xlsx

Arctan transformation
[Figure: arctan-transformed values (0 to 1 range) plotted against values of x from -10 to 10. Spreadsheet: 6 atan_demo.xlsx]

Hints for minimization
• Constrain population sizes to not go negative
• Bound parameters in code using ABS or ATAN
• Particularly problematic are multiple proportions that must add to 1
– Fix each p to be 1 minus the sum of the previous ones
• In Solver, set the convergence criteria smaller
• Keep away from extremely small or extremely large values

Conclusions
• Non-linear minimization is as much art as science
• You cannot just plug numbers into a program and hope for the best; you must make checks to assure convergence
• Takes time and experience, but is well rewarded

Probability distributions and likelihood

Readings
• Ecological detective:
– Chapter 3 Probability distributions
• Wikipedia (seriously!)
– e.g. Beta distribution, lognormal distribution, etc.

Overview
• Probability vs. likelihood
• Probability distributions: binomial, Poisson, normal, lognormal, negative binomial, beta, gamma, multinomial
• Likelihood profile
• The concept of support
• Model selection, likelihood ratio, AIC
• Robustness
• Contradictory data

Probability vs. likelihood
Probability
• If I flip a fair coin 10 times, what is the probability of it landing heads up every time?
• Given the fixed parameter (p = 0.5), what is the probability of different outcomes?
• Probabilities add up to 1.
Likelihood
• I flipped a coin 10 times and obtained 10 heads. What is the likelihood that the coin is fair?
• Given the fixed outcomes (data), what is the likelihood of different parameter values?
• Likelihoods do not add up to 1.
• Hypotheses (parameter values) are compared using likelihood values (higher = better).

[Figure: two normal density curves (x-axis: values of x, 0 to 30; y-axis: probability density). Probability panel: area under the curve between 5 and 10. Likelihood panel: height of the curve at x = 10 and at x = 14.]

Probability
• What is the probability that 5 ≤ x ≤ 10 given a normal distribution with µ = 13 and σ = 4? Answer: 0.204
• What is the probability that –1000 ≤ x ≤ 1000 given a normal distribution with µ = 13 and σ = 4? Answer: 1.000
Likelihood
• What is the likelihood that µ = 13 and σ = 4 if you observed a value of
– (a) x = 10 (answer: the likelihood is 0.075)
– (b) x = 14 (answer: the likelihood is 0.097)
• Conclusion: if the observed value was 14, it is more likely that the parameters are µ = 13 and σ = 4, because 0.097 is higher than 0.075.

We use the same (normal) probability distribution function for both probability and likelihood!
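The normal-distribution numbers quoted above (0.204, 0.075 and 0.097) can be reproduced with a few lines of code. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the spreadsheet functions used in the lecture; the variable names are illustrative and not from the slides.

```python
from statistics import NormalDist

# Normal distribution from the slides: mu = 13, sigma = 4
dist = NormalDist(mu=13, sigma=4)

# Probability: area under the density curve between 5 and 10
prob_5_to_10 = dist.cdf(10) - dist.cdf(5)

# Likelihood: height of the same density curve at each observed value
like_x10 = dist.pdf(10)
like_x14 = dist.pdf(14)

print(f"P(5 <= x <= 10)          = {prob_5_to_10:.3f}")
print(f"L(mu=13, sigma=4 | x=10) = {like_x10:.3f}")
print(f"L(mu=13, sigma=4 | x=14) = {like_x14:.3f}")
```

The probability comes from the cumulative distribution (an area), while the likelihood comes from the density (a height), which is why the same distribution function serves both purposes.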
[Figure: normal probability density curve; x-axis: values of x (0 to 30); y-axis: probability density]

Common probability distributions
• Discrete: binomial, Poisson, negative binomial, multinomial
• Continuous: normal, lognormal, beta, gamma, (negative binomial)
Spreadsheet: 7 distributions.xlsx. Examples of all distributions defined here, including Excel functions and functions defined directly in the spreadsheet.

Binomial probability distribution

$$\Pr\{Z = k\} = \binom{N}{k} p^k (1 - p)^{N - k}$$

• $N$: number of trials
• $k$: number of successes
• $p$: probability of success, in [0, 1]

$$\text{mean}(Z) = Np, \qquad \text{var}(Z) = Np(1 - p)$$

The “factorial term” counts how many ways there are of selecting k objects from among N objects:

$$\binom{N}{k} = \frac{N!}{k!\,(N - k)!}$$

Example: probability of getting k = 5 heads when flipping a coin N = 10 times, if the coin is fair (p = 0.5). Note: known number of trials.

SD and CV (all distributions)

$$\text{standard deviation (SD)} = \sqrt{\text{variance}}, \qquad \text{coefficient of variation (CV)} = \frac{\text{SD}}{\text{mean}}$$

Poisson probability distribution

$$\Pr\{Z = k\} = \frac{e^{-\lambda} \lambda^k}{k!}$$

• $\lambda$: expected number of events
• $k$: number of events

$$\text{mean}(Z) = \text{var}(Z) = \lambda$$

Example: On average there are λ = 9.4 fatal traffic accidents in Washington State every week. What is the probability that there would be k = 0 in a week? (Note: rare event out of large number of possible events.)

Limitations of Poisson
• Has only one parameter, which is both the mean and the variance
• We often have discrete count data, but in real-life data the variance is often larger than predicted by the Poisson

Thus we often use the negative binomial
• Closely related to the Poisson and binomial
• One extra parameter related to the variance
• VERY useful
• Looks scary, but don’t be scared!

Standard negative binomial

$$\Pr\{Z = k\} = \frac{(k + r - 1)!}{(r - 1)!\,k!} (1 - p)^r p^k$$

• $r$: number of failures
• $k$: number of successes
• $p$: probability of a success

$$\text{mean}(Z) = \frac{pr}{1 - p}, \qquad \text{var}(Z) = \frac{pr}{(1 - p)^2}$$

Squint a lot and this looks kind of like a binomial.

Example: a factory makes widgets successfully with probability p. How many successful widgets have been made when r = 3 failed widgets have been made? The distribution predicts the probability of k = 0, 1, 2, … successful widgets being made.

Ecological usefulness?
• Almost no ecological problems can be thought of as successes or failures in this way
• Great for factory production problems
• But we want a function with parameters for
– Mean
– Overdispersion (increased variance = increased chance of extreme events)
• Integer events are rare in nature; we want to deal with real numbers

Practitioner’s negative binomial

$$\Pr\{Z = k\} = \frac{\Gamma(\theta^{-1} + k)}{\Gamma(\theta^{-1})\,\Gamma(k + 1)} \left(\frac{1}{1 + \theta\lambda}\right)^{\theta^{-1}} \left(\frac{\theta\lambda}{1 + \theta\lambda}\right)^{k}$$

• $\Gamma$: gamma function (factorial that accepts non-integers, see later)
• $\theta$: overdispersion parameter
• $\lambda$: predicted mean

$$\text{mean}(Z) = \lambda, \qquad \text{var}(Z) = \lambda + \theta\lambda^2$$

As θ increases, variance increases, hence “overdispersion”.
As θ → ∞, var(Z) → ∞.
As θ → 0, var(Z) = λ, just like a Poisson!

Example: our data contain observations k, with mean λ and variance greater than λ. Find the value of overdispersion θ that best accounts for this increased variance.

Weird facts about the practitioner’s negative binomial
• When θ → 0 this doesn’t just smell like a Poisson, and act like a Poisson, it is the Poisson (advanced stats)
• By replacing the factorials with gamma functions, the r and k can be real numbers, not just integers
• What on earth is a gamma function???

Gamma function Γ()
A generalized factorial function that accepts real numbers, not just integers.

$$\Gamma(z) = \int_0^\infty e^{-t} t^{z - 1}\,dt \quad \text{when } z \text{ is a real number}$$

$$\Gamma(z + 1) = z\,\Gamma(z) \quad \text{(one of its properties)}$$

$$\Gamma(z) = (z - 1)! \quad \text{when } z \text{ is an integer}$$

Excel: does not have a gamma function but has a ln of gamma function (GAMMALN):

$$\frac{\Gamma(\theta^{-1} + k)}{\Gamma(\theta^{-1})\,\Gamma(k + 1)} = \exp\left[\ln\Gamma(\theta^{-1} + k) - \ln\Gamma(\theta^{-1}) - \ln\Gamma(k + 1)\right]$$

Multinomial probability distribution

$$\Pr\{X_i = x_i\} = \frac{n!}{x_1!\,x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

• $n$: total number observed
• $x_k$: observed number in category k
• $p_k$: predicted proportion in category k

$$\text{mean}(X_i) = n p_i, \qquad \text{var}(X_i) = n p_i (1 - p_i)$$

Example: fitting a model to proportions at age (or proportions at length) data. Model produces predicted proportions pi and data gives observed numbers xi in each category.
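The factorial and gamma terms in the practitioner's negative binomial and the multinomial overflow quickly, which is why the ln-of-gamma trick is useful. The following is a rough sketch of that trick, assuming Python's standard-library `math.lgamma` as a stand-in for Excel's GAMMALN and assuming the mean/overdispersion form of the negative binomial with variance λ + θλ². The function names `nb_log_prob` and `multinomial_log_prob` are hypothetical, not from the slides.

```python
import math

def nb_log_prob(k, lam, theta):
    """Log of the practitioner's negative binomial Pr{Z = k}.

    Assumes mean lam > 0 and overdispersion theta > 0.
    """
    inv = 1.0 / theta
    # ln[ Gamma(1/theta + k) / (Gamma(1/theta) * Gamma(k + 1)) ]
    log_coef = math.lgamma(inv + k) - math.lgamma(inv) - math.lgamma(k + 1)
    log_denom = math.log1p(theta * lam)  # ln(1 + theta * lam)
    return log_coef - inv * log_denom + k * (math.log(theta * lam) - log_denom)

def multinomial_log_prob(x, p):
    """Log of the multinomial probability of counts x given proportions p.

    Assumes the proportions p sum to 1; n is the sum of the counts.
    """
    n = sum(x)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    # Categories with zero counts contribute nothing to the product
    return log_coef + sum(xi * math.log(pi) for xi, pi in zip(x, p) if xi > 0)

# With very small theta the negative binomial approaches the Poisson:
# compare Pr{Z = 0} at lam = 9.4 with the Poisson value exp(-9.4).
print(math.exp(nb_log_prob(0, 9.4, 1e-6)), math.exp(-9.4))

# Illustrative (made-up) counts in three categories with predicted proportions
print(multinomial_log_prob([3, 5, 2], [0.3, 0.5, 0.2]))
```

Working on the log scale also suits model fitting, since the quantity being minimized is usually the negative log-likelihood summed over observations.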
Total numbers sampled: n = x1 + x2 + … + xk

[Figure: two panels comparing predicted values with multinomial data, one with n = 100 and one with n = 10000; x-axis: values of x (0 to 26); y-axis: probability density]

Unrealism of multinomial (and other distributions too!)
• Assumes every sampling event is completely independent
• But there is much correlation in reality
– Same trawl, area, time of day, day of year, gender, etc.
• Real data never ever fit a multinomial this well
• Later lectures will introduce the concept of “effective sample size” neff, which will be smaller than reported sample size n.

Normal distribution
[Figure: normal probability density curve; x-axis: values of x (0 to 30); y-axis: probability density]

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$\text{mean}(x) = \mu, \qquad \text{var}(x) = \sigma^2$$

Lognormal distribution
[Figure: lognormal probability density curve; x-axis: values of x (0 to 30); y-axis: probability density]

$$f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x - \ln\mu)^2}{2\sigma^2}\right)$$

$$\text{mean}(x) = \mu \exp\left(\frac{\sigma^2}{2}\right), \qquad \text{var}(x) = \mu^2 \left(\exp(\sigma^2) - 1\right) \exp(\sigma^2)$$

Lognormal: key notes
• 0 < x < ∞
• Mean(x) is not µ
• If we want the mean to be µ, then replace the model parameter with:

$$\mu^* = \mu \exp\left(-\frac{\sigma^2}{2}\right)$$

• Used widely for abundance and biomass

Beta distribution

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$$

$$\text{mean}(x) = \frac{\alpha}{\alpha + \beta}, \qquad \text{var}(x) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

Note: $\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}$ is often written as $\frac{1}{B(\alpha, \beta)}$.

[Figure: beta probability density curves for (α, β) = (0.5, 0.5), (1, 1), (1.3, 1.3), (4, 4), (0.5, 2), (2, 6) and (50, 50); x-axis: values of x (0 to 1); y-axis: probability density]

Beta: key notes
• Values confined to be 0 < x < 1
• Can mimic almost any shape within those bounds
• Although bounded, can change the bounds by multiplying / dividing x values
• E.g. survival parameters

Gamma distribution

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}$$

$$\text{mean}(x) = \frac{\alpha}{\beta}, \qquad \text{var}(x) = \frac{\alpha}{\beta^2}$$

[Figure: gamma probability density curves for (α, β) = (4, 1), (4, 2), (1.1, 0.5), (60, 5) and (0.9, 0.0001); x-axis: values of x; y-axis: probability density]

Gamma: key notes
• 0 ≤ x < ∞
• Somewhat like an exponential, lognormal, or normal
• Flexibility without being bounded like the beta distribution
• E.g. salmon arrival numbers plotted over time
• Excel function gamma.dist() assumes parameters α* = α and β* = 1/β
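The last note is about parameterization: the gamma density here uses β as a rate (mean α/β), whereas some software expects a scale parameter equal to 1/β. As a small sketch of that conversion, assuming only Python's standard library, the two hypothetical helper functions below (`gamma_pdf_rate` and `gamma_pdf_scale`, not from the slides) evaluate the same density both ways.

```python
import math

def gamma_pdf_rate(x, alpha, beta):
    """Gamma density with beta as a rate:
    f(x) = beta**alpha / Gamma(alpha) * x**(alpha - 1) * exp(-beta * x), x > 0."""
    log_f = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1) * math.log(x) - beta * x)
    return math.exp(log_f)

def gamma_pdf_scale(x, alpha, scale):
    """Gamma density with a scale parameter:
    f(x) = x**(alpha - 1) * exp(-x / scale) / (Gamma(alpha) * scale**alpha), x > 0."""
    log_f = ((alpha - 1) * math.log(x) - x / scale
             - math.lgamma(alpha) - alpha * math.log(scale))
    return math.exp(log_f)

# One of the plotted parameter pairs: alpha = 4, beta = 2 (mean = alpha / beta = 2)
alpha, beta = 4.0, 2.0
x = 1.5

rate_value = gamma_pdf_rate(x, alpha, beta)
scale_value = gamma_pdf_scale(x, alpha, 1.0 / beta)  # scale = 1 / rate

print(rate_value, scale_value)  # the two values should agree
```

Passing the rate where a scale is expected (or the reverse) gives a valid-looking but wrong curve, so it is worth checking the mean α/β against the fitted distribution.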