Bayesian estimation: Why and How to Run Your First Bayesian Model
Rens van de Schoot
rensvandeschoot.com

Classical null hypothesis testing
Wainer: "One Cheer for Null-Hypothesis Significance Testing" (1999; Psychological Methods, 4, 212-213) ... however ...

NHT vs. Bayes
Pr(Data | H0) ≠ Pr(Hi | Data)

Bayes' theorem
Pr(Hi | Data) = Posterior ∝ prior × data
The posterior probability is proportional to the product of the prior probability and the likelihood.

Intelligence (IQ)

IQ interval   Cognitive designation
40-54         Severely challenged (<1% of test takers)
55-69         Challenged (2.3% of test takers)
70-84         Below average
85-114        Average (68% of test takers)
115-129       Above average
130-144       Gifted (2.3% of test takers)
145-159       Genius (less than 1% of test takers)
160-175       Extraordinary genius

Prior knowledge
[Figures: five example priors for IQ, from a flat prior over (-∞, ∞) to increasingly peaked priors centered near 100 on the 40-180 scale; each prior is combined with the data (likelihood) to produce a different posterior.]

How to obtain the posterior?
In complex models, the posterior is often intractable (impossible to compute exactly)
Solution: approximate the posterior by simulation
– Simulate many draws from the posterior distribution
– Compute the mode, median, mean, 95% interval, et cetera from the simulated draws

ANOVA example
Four unknown means μj (j = 1, ..., 4) and one common but unknown variance σ2.
Statistical model: Y = μ1·D1 + μ2·D2 + μ3·D3 + μ4·D4 + E, with E ~ N(0, σ2)

The Gibbs sampler
Specify the prior: Pr(μ1, μ2, μ3, μ4, σ2)
Prior(μj) ~ N(μ0, var0), e.g. Prior(μj) ~ N(0, 10000)
Prior(σ2) ~ IG(0.001, 0.001), an inverse gamma with shape a and scale b

The Gibbs sampler
Combining the prior with the likelihood provides the posterior:
Post(μ1, μ2, μ3, μ4, σ2 | data)
... this is a 5-dimensional distribution ...

The Gibbs sampler
Iterative evaluation via conditional distributions:
Post(μ1 | μ2, μ3, μ4, σ2, data) ∝ Prior(μ1) × Data(μ1)
Post(μ2 | μ1, μ3, μ4, σ2, data) ∝ Prior(μ2) × Data(μ2)
Post(μ3 | μ1, μ2, μ4, σ2, data) ∝ Prior(μ3) × Data(μ3)
Post(μ4 | μ1, μ2, μ3, σ2, data) ∝ Prior(μ4) × Data(μ4)
Post(σ2 | μ1, μ2, μ3, μ4, data) ∝ Prior(σ2) × Data(σ2)

The Gibbs sampler
1. Assign starting values
2. Sample μ1 from its conditional distribution
3. Sample μ2 from its conditional distribution
4. Sample μ3 from its conditional distribution
5. Sample μ4 from its conditional distribution
6. Sample σ2 from its conditional distribution
7. Go to step 2 until enough iterations have been run

The Gibbs sampler

Iteration   μ1     μ2     μ3     μ4     σ2
1           3.00   5.00   8.00   3.00   10
2           3.75   4.25   7.00   4.30   8
3           3.65   4.11   6.78   5.55   5
...
15          4.45   3.19   5.08   6.55   1.1
...
199         4.59   3.75   5.21   6.36   1.2
200         4.36   3.45   4.65   6.99   1.3

Trace plot
[Figures: trace plots of the sampled parameter values per iteration; the histogram of the draws approximates the posterior distribution.]

Burn-in
The Gibbs sampler must run t 'burn-in' iterations before it reaches the target distribution f(Z)
– How many iterations are needed to converge on the target distribution?
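The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the slides: the simulated data (four groups with assumed true means 4, 3.5, 5, 6.5 and σ2 = 1) and the iteration count are illustrative choices, while the priors match the slide (N(0, 10000) for each mean, IG(0.001, 0.001) for the variance).

```python
import numpy as np

# Minimal Gibbs sampler sketch for the ANOVA model above.
rng = np.random.default_rng(1)

# Illustrative simulated data: 4 groups, true means 4, 3.5, 5, 6.5, true σ2 = 1
groups = [rng.normal(m, 1.0, size=50) for m in (4.0, 3.5, 5.0, 6.5)]

mu0, var0 = 0.0, 10000.0          # prior N(mu0, var0) for each group mean
a0, b0 = 0.001, 0.001             # IG(shape, scale) prior for σ2

mu = np.zeros(4)                  # step 1: starting values
sigma2 = 10.0
draws = []

for it in range(2000):
    # steps 2-5: sample each μ_j from its full conditional (normal)
    for j, y in enumerate(groups):
        n = len(y)
        post_var = 1.0 / (1.0 / var0 + n / sigma2)
        post_mean = post_var * (mu0 / var0 + y.sum() / sigma2)
        mu[j] = rng.normal(post_mean, np.sqrt(post_var))
    # step 6: sample σ2 from its full conditional (inverse gamma)
    resid = np.concatenate([y - mu[j] for j, y in enumerate(groups)])
    a = a0 + resid.size / 2.0
    b = b0 + 0.5 * (resid ** 2).sum()
    sigma2 = 1.0 / rng.gamma(a, 1.0 / b)   # IG draw via reciprocal of a gamma
    draws.append(np.r_[mu, sigma2])

draws = np.array(draws)[1000:]    # discard the first half as burn-in
print(draws.mean(axis=0))         # posterior means for μ1..μ4 and σ2
```

The posterior mode, median, and 95% interval mentioned on the slide can be read off `draws` in exactly the same way (e.g. with `np.percentile`).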
Diagnostics
– Examine a graph of the burn-in
– Try different starting values
– Run several chains in parallel

Convergence
[Figures: trace plots across several slides, illustrating chains that have and have not converged.]

Conclusions about convergence
Burn-in: Mplus deletes the first half of each chain
Run multiple chains (Mplus default: 2)
– Decrease Bconvergence: the default is .05, but it is better to use .01
ALWAYS do a graphical evaluation of each and every parameter

Summing up
Probability: degree of belief
Prior: what is known before observing the data
Posterior: what is known after observing the data
Informative prior: a tool to include subjective knowledge
Non-informative prior: tries to express the absence of prior knowledge; the posterior is mainly determined by the data
MCMC methods: simulation (sampling) techniques to obtain the posterior distribution and all posterior summary measures
Convergence: important to check

IQ example
Data are generated: N = 20, mean = 102, SD = 15
[Figures: histogram of the generated IQ data and the priors used in the table below.]

Prior      Type      Prior variance used       Posterior mean IQ   95% C.I./C.C.I.
ML         -         -                         102.00              94.42 - 109.57
Prior 1    A         -                         101.99              94.35 - 109.62
Prior 2a   M or A    large variance, SD=100    101.99              94.40 - 109.42
Prior 2b   M or A    medium variance, SD=10    101.99              94.89 - 109.07
Prior 2c   M or A    small variance, SD=1      102.00              100.12 - 103.87
Prior 3    A         -                         102.03              94.22 - 109.71
Prior 4    W         medium variance, SD=10    102.00              97.76 - 106.80
Prior 5    W         small variance, SD=1      102.00              100.20 - 103.90
Prior 6a   W         large variance, SD=100    99.37               92.47 - 106.10
Prior 6b   W         medium variance, SD=10    86.56               80.17 - 92.47

Uncertainty in Classical Statistics
Uncertainty = sampling distribution
– Estimate the population parameter θ by θ̂
– Imagine drawing an infinity of samples
– Distribution of θ̂ over the samples
The problem is that we have only one sample
– Estimate θ̂ and its sampling distribution
– Estimate the 95% confidence interval

Inference in Classical Statistics
What does a 95% confidence interval actually mean?
– Over an infinity of samples, 95% of these contain the true population value
– But we have only one sample
– We never know whether our present estimate θ̂ and confidence interval is one of those 95% or not

Inference in Classical Statistics
What does a 95% confidence interval NOT mean?
It does NOT mean that we have a 95% probability that the true population value is within the limits of our confidence interval
We only have the aggregate assurance that, in the long run, 95% of our confidence intervals contain the true population value

Uncertainty in Bayesian Statistics
Uncertainty = probability distribution for the population parameter
In classical statistics the population parameter has one single true value
In Bayesian statistics we imagine a distribution of possible values of the population parameter

Inference in Bayesian Statistics
What does a 95% central credibility interval mean?
We have a 95% probability that the population value is within the limits of our credibility interval

What have we learned so far?
Results are a compromise of prior and data
However, pay attention to:
– non/low-informative priors
– informative priors
– misspecification of the prior
– convergence
Results are easier to communicate (e.g., a CCI compared to a confidence interval)

Software
WinBUGS/OpenBUGS: Bayesian inference Using Gibbs Sampling; very general, the user must set up the model
R packages: LearnBayes, R2WinBUGS, MCMCpack
MLwiN: special implementation for multilevel regression
AMOS: special implementation for SEM
Mplus: very general (SEM + ML + many other models)

MPLUS - ML
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS ML;
MODEL: [IQ];

MPLUS – BAYES: default settings
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
MODEL: [IQ];

MPLUS – BAYES: default settings
Default prior for the mean of IQ: prior mean = 0, prior variance = 10^10 (essentially flat)

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(a,b);
a = prior mean
b = prior precision

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(100,10);

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(100,10);
PLOT: type is plot2;

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
CHAINS = 4;
BITERATIONS = (1000);
BCONVERGENCE = .01;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(100,10);
PLOT: type is plot2;

MPLUS – BAYES: change prior
DATA: FILE IS data.dat;
VARIABLE: NAMES ARE IQ;
ANALYSIS: ESTIMATOR IS BAYES;
CHAINS = 4;
BITERATIONS = (1000);
BCONVERGENCE = .01;
MODEL: [IQ] (p1);
MODEL PRIOR: p1 ~ N(100,10);
PLOT: type is plot2;
OUTPUT: stand sampstat TECH4 TECH8;

Bayesian updating
Dynamic interactionism: adolescents are believed to develop through a dynamic and reciprocal transaction between personality and the environment.

In 1998, Asendorpf and Wilpers stated that "empirical evidence on the relative strength of personality effects on relationships and vice versa is surprisingly limited".
Back in 1998 there had been very few longitudinal studies of personality development; personality was not often used as an outcome variable because it was seen as stable.
These authors investigated, for the first time, personality and relationships over time in a sample of young students (n = 132) after their transition to university.
The main conclusion of their analyses was that personality influenced change in social relationships, but not vice versa.
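The prior-times-likelihood mechanics from the IQ slides above can be sketched with a simple grid approximation. This is an illustration, not the analysis from the slides: the prior N(100, SD 15) is an assumed choice (not one of the priors from the table), while the data summary (mean 102, SD 15, N = 20) mirrors the generated IQ example.

```python
import numpy as np

# Grid approximation of Bayes' theorem: posterior ∝ prior × likelihood,
# evaluated point by point over candidate values of the mean IQ score.

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

grid = np.linspace(40.0, 180.0, 1401)            # candidate mean IQ values
dx = grid[1] - grid[0]

prior = normal_pdf(grid, 100.0, 15.0)            # prior knowledge (assumed)
likelihood = normal_pdf(grid, 102.0, 15.0 / np.sqrt(20))  # data, n = 20

posterior = prior * likelihood                   # Bayes' theorem, pointwise
posterior /= posterior.sum() * dx                # normalize to integrate to 1

post_mean = (grid * posterior).sum() * dx
print(post_mean)   # a compromise between prior mean (100) and data mean (102)
```

Because the likelihood is much more precise than this prior (SE of the mean ≈ 3.4 vs. prior SD 15), the posterior mean lands close to the data mean, just as in the table of posterior IQ means above.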
Bayesian updating
In 2001, Neyer and Asendorpf replicated the personality-relationship model, now using a large representative sample of young adults.
Based on the previous results, Neyer and Asendorpf "[...] hypothesized that personality effects would have a clear superiority over relationships effects".
In line with Asendorpf and Wilpers, they concluded that "Path analyses showed that once initial correlations were controlled, personality traits predicted change in various aspects of social relationships, whereas effects of antecedent relationships on personality were rare and restricted to very specific relationships with one's pre-school children".

Bayesian updating
[Figure: cross-lagged panel model with stability paths β1 (T1 → T2 Extraversion) and β2 (T1 → T2 Friends), and cross-lagged paths β3 (T1 Extraversion → T2 Friends), hypothesized to be >0, and β4 (T1 Friends → T2 Extraversion), hypothesized to be 0.]

Bayesian updating
In 2003, Asendorpf and van Aken continued working on studies into personality-relationship transaction.
The authors stated that "The aim of the present study was to apply the methodology used by Asendorpf and Wilpers (1998) and Neyer and Asendorpf (2001) to the study of personality–relationship transaction over adolescence, to try to replicate key findings of these earlier studies, particularly the dominance of [...] traits over relationship quality".
Asendorpf and van Aken confirmed the previous findings: "The stronger effect was an extraversion effect on perceived support from peers. This result replicates, once more, similar findings in adulthood." (p. 653)

Bayesian updating
In 2010, Sturaro, Denissen, van Aken, and Asendorpf once again investigated the personality-relationship transaction model.
Sturaro et al. found some results that contradicted the previously described studies: "[The Five-Factor theory] predicts significant paths from personality to change in social relationship quality, whereas it does not predict social relationship quality to have an impact on personality change.
Contrary to our expectation, however, personality did not predict changes in relationship quality"

Bayesian updating
In conclusion, the four papers described above clearly illustrate how theory building works in daily practice.
Asendorpf and Wilpers (1998) started with testing theoretical ideas on the association between personality and social relationships, tracing back to McCrae and Costa (1996); and although their results were replicated by Neyer and Asendorpf (2001) and Asendorpf and van Aken (2003), Sturaro, Denissen, van Aken, and Asendorpf (2010) were not able to do so.
This latter finding led to re-formulations of the original theoretical ideas.

Bayesian updating
Why not update the results instead of testing the null hypothesis over and over again? Let's use Bayesian updating and impose subjective priors.
In the first scenario we focus only on data sets with similar age groups. We therefore first re-analyze the data of Neyer and Asendorpf (2001) without using prior knowledge. Thereafter, we re-analyze the data of Sturaro et al. (2010) using prior information based on the data of Neyer and Asendorpf; both data sets contain young adults between 17 and 30 years of age.

Bayesian updating
In the second scenario we assume that the relation between personality and social relationships is independent of age, and we re-analyze the data of Sturaro et al. using prior information taken from Neyer and Asendorpf and from Asendorpf and van Aken.
In this second scenario we make a strong assumption, namely that the cross-lagged effects for young adolescents are equal to the cross-lagged effects of young adults. This assumption implies similar developmental trajectories across age groups and indicates a full replication study.
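The updating scheme behind these scenarios, using the posterior from one study as the prior for the next, can be sketched with repeated conjugate normal updates. The numbers below are purely illustrative stand-ins and do not reproduce the path-model re-analyses reported in the tables that follow.

```python
import numpy as np

# Sequential Bayesian updating sketch: with normal distributions, updating is
# repeated precision-weighted averaging of prior and new estimate.

def update(prior_mean, prior_var, est, se):
    """Combine a normal prior with a new normal estimate (est, se)."""
    prior_prec, data_prec = 1.0 / prior_var, 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return post_mean, post_var

# Start from a near-flat prior, then update with two successive study
# estimates (illustrative values for a cross-lagged path coefficient).
mean, var = 0.0, 100.0
mean, var = update(mean, var, est=0.30, se=0.10)   # "study 1"
mean, var = update(mean, var, est=0.25, se=0.08)   # "study 2"
print(mean, np.sqrt(var))   # pooled estimate; SD smaller than either study's
```

The shrinking SD is the mechanism behind the narrower posterior intervals in the tables below: each study's posterior carries its information forward instead of starting from scratch.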
Bayesian updating
[Figure: the same cross-lagged model as before, with β3 (T1 Extraversion → T2 Friends) hypothesized to be >0 and β4 (T1 Friends → T2 Extraversion) hypothesized to be 0.]

Scenario 1

Model 1: Neyer & Asendorpf data, without prior knowledge
     Estimate (SD)    95% PPI
β1   0.605 (0.037)    0.532 - 0.676
β2   0.293 (0.047)    0.199 - 0.386
β3   0.131 (0.046)    0.043 - 0.222
β4   -0.026 (0.039)   -0.100 - 0.051

Scenario 1

Model 2: Sturaro et al. data, without prior knowledge
     Estimate (SD)    95% PPI
β1   0.291 (0.063)    0.169 - 0.424
β2   0.157 (0.103)    -0.042 - 0.364
β3   0.029 (0.079)    -0.132 - 0.180
β4   0.303 (0.081)    0.144 - 0.462

Scenario 1

Model 3: Sturaro et al. data, with priors based on Model 1
     Estimate (SD)    95% PPI
β1   0.337 (0.058)    0.228 - 0.449
β2   0.287 (0.082)    0.130 - 0.448
β3   0.106 (0.072)    -0.038 - 0.247
β4   0.249 (0.067)    0.111 - 0.375

Scenario 2

Model 4: Asendorpf & van Aken data, without prior knowledge
     Estimate (SD)    95% PPI
β1   0.512 (0.069)    0.376 - 0.649
β2   0.115 (0.083)    -0.049 - 0.277
β3   0.217 (0.106)    0.006 - 0.426
β4   0.072 (0.055)    -0.036 - 0.179

Scenario 2

Model 5: Asendorpf & van Aken data, with priors based on Model 1
     Estimate (SD)    95% PPI
β1   0.537 (0.059)    0.424 - 0.654
β2   0.140 (0.071)    0.005 - 0.283
β3   0.212 (0.079)    0.057 - 0.361
β4   0.073 (0.051)    -0.030 - 0.171

Scenario 2

Model 6: Sturaro et al. data, with priors based on Model 5
     Estimate (SD)    95% PPI
β1   0.313 (0.059)    0.199 - 0.427
β2   0.246 (0.087)    0.079 - 0.420
β3   0.100 (0.076)    -0.052 - 0.248
β4   0.259 (0.070)    0.116 - 0.393

Final results: Sturaro et al.

     Model 2:                          Model 3 (Scenario 1):             Model 6 (Scenario 2):
     without prior knowledge           priors based on Model 1           priors based on Model 5
     Estimate (SD)   95% PPI           Estimate (SD)   95% PPI           Estimate (SD)   95% PPI
β1   0.291 (0.063)   0.169 - 0.424     0.337 (0.058)   0.228 - 0.449     0.313 (0.059)   0.199 - 0.427
β2   0.157 (0.103)   -0.042 - 0.364    0.287 (0.082)   0.130 - 0.448     0.246 (0.087)   0.079 - 0.420
β3   0.029 (0.079)   -0.132 - 0.180    0.106 (0.072)   -0.038 - 0.247    0.100 (0.076)   -0.052 - 0.248
β4   0.303 (0.081)   0.144 - 0.462     0.249 (0.067)   0.111 - 0.375     0.259 (0.070)   0.116 - 0.393

Conclusions
The updating procedure in both scenarios leads us to conclude that using subjective priors decreases the width of the intervals.
=> More certainty about the relations
However ...

Conclusions
Using subjective priors never changed the real issue, namely that Sturaro et al. found effects opposite to those of Neyer and Asendorpf.
The results supported the robustness of the conclusion that effects occurring between ages 17 and 23 are different from those occurring between ages 18 and 30, i.e., the clearly higher age range in the Neyer and Asendorpf data.

Overall Conclusions
Bayesian estimation is an excellent tool to include prior knowledge, if available
Estimates (including intervals) always lie in the sample space if the prior is chosen wisely
Results are easier to communicate
Better small-sample performance; large-sample theory is not needed
Analyses can be made less computationally demanding
BUT: Bayes does not solve misspecification of the model
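As a final note on the "run multiple chains and ALWAYS check convergence" advice given earlier, the multiple-chains check can be made numerical with the Gelman-Rubin diagnostic (potential scale reduction factor, "R-hat"). This is a sketch: the two "chains" below are illustrative independent normal draws standing in for real MCMC output.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat. chains: one row per chain, one column per draw."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)                # ≈ 1.0 when chains agree

rng = np.random.default_rng(0)

# Two well-mixed chains targeting the same distribution: R-hat close to 1
good = rng.normal(0.0, 1.0, size=(2, 1000))
print(gelman_rubin(good))

# Two chains stuck around different values: R-hat clearly above 1
bad = np.vstack([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
print(gelman_rubin(bad))
```

A numerical check like this complements, but does not replace, the graphical evaluation of every parameter recommended above.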