Applied Bayesian Inference for Agricultural Statisticians
Robert J. Tempelman
Department of Animal Science
Michigan State University
Outline of talk:
• Introduction
• Review of Likelihood Inference
• An Introduction to Bayesian Inference
• Empirical Bayes Inference
  – The Bayesian Connection to Generalized Linear Mixed Models (GLMM)
• The Bayesian Revolution:
  – Markov Chain Monte Carlo (MCMC) methods
  – Metropolis-Hastings sampling
• Comparisons of Bayesian with conventional GLMM analyses of agricultural experiments.
• Extensions of GLMM analyses of agricultural experiments using hierarchical Bayesian inference.
Warning
• You won’t learn Bayesian data analysis in one day… and there is still a lot I don’t know.
• Some great resources for agricultural statisticians/data analysts:
  – Sorensen and Gianola
  – Gelman et al.
  – Carlin and Louis
How did I get interested in Bayesian Statistics?
• 1986-1989: Master of Science in Animal Breeding, University of Guelph
  – Additive & dominance genetic variation for milk yield in dairy cows

Model (n records on q animals, with q > n):

  y(n×1) = Xb + Z_a u_a + Z_d u_d + e
  u_a(q×1) ~ N(0, Aσ²_a);  u_d(q×1) ~ N(0, Dσ²_d);  e(n×1) ~ N(0, Iσ²_e)

  A, D: known correlation matrices
  b: fixed effects
  u_a, u_d: random effects
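One way to make this notation concrete is to simulate from the model. Below is a minimal NumPy sketch; the dimensions, variance components, and the construction of A and D are illustrative assumptions only (in practice A and D are known matrices built from the pedigree, not simulated):

import numpy as np

rng = np.random.default_rng(1)
n, q = 50, 80                     # n records, q animals (q > n, as on the slide)

def random_corr(q, rng):
    # Illustrative positive-definite correlation matrix; stands in for the
    # pedigree-derived A and D, which would be known in a real analysis.
    M = rng.normal(size=(q, q))
    S = M @ M.T + q * np.eye(q)   # comfortably positive definite
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)     # rescale to unit diagonal

A, D = random_corr(q, rng), random_corr(q, rng)
s2a, s2d, s2e = 4.0, 1.0, 10.0    # assumed variance components

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
Za = np.zeros((n, q))
Za[np.arange(n), rng.integers(0, q, size=n)] = 1.0      # record → animal incidence
Zd = Za.copy()                    # same incidence links the dominance effects

b  = np.array([100.0, 2.0])       # fixed effects
ua = rng.multivariate_normal(np.zeros(q), s2a * A)      # additive effects
ud = rng.multivariate_normal(np.zeros(q), s2d * D)      # dominance effects
e  = rng.normal(scale=np.sqrt(s2e), size=n)

y = X @ b + Za @ ua + Zd @ ud + e # the model equation from the slide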
Inference issues
• What was known:
  – E(u | b = GLS(b), σ²_a, σ²_d, σ²_e, y) was the Best Linear Unbiased Predictor (BLUP) of u.
• But typically the variance components (VC) σ²_a, σ²_d, σ²_e are unknown.
  – Default: use REML (Restricted Maximum Likelihood) to estimate the VC.
  – E(u | b = E-GLS(b), REML(σ²_a, σ²_d, σ²_e), y) is the Empirical Best Linear Unbiased Predictor (E-BLUP).
  – Use Henderson’s Mixed Model Equations to get this (shown below).
  – What are the properties of E-BLUP and E-GLS based on REML estimates of the VC? … don’t ask, don’t tell.
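For reference, here are Henderson's mixed model equations for this two-random-effect model; this is the standard form under the slide's assumptions u_a ~ N(0, Aσ²_a), u_d ~ N(0, Dσ²_d), e ~ N(0, Iσ²_e):

\[
\begin{bmatrix}
X'X & X'Z_a & X'Z_d \\
Z_a'X & Z_a'Z_a + A^{-1}\lambda_a & Z_a'Z_d \\
Z_d'X & Z_d'Z_a & Z_d'Z_d + D^{-1}\lambda_d
\end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{u}_a \\ \hat{u}_d \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z_a'y \\ Z_d'y \end{bmatrix},
\qquad
\lambda_a = \sigma^2_e/\sigma^2_a, \;
\lambda_d = \sigma^2_e/\sigma^2_d .
\]

With the VC known, the solutions are GLS(b) and the BLUPs of u_a and u_d; substituting REML estimates of the variance ratios yields E-GLS and E-BLUP.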
A potentially even bigger inference issue
• Generalized linear mixed models (GLMM)
  – i.e., for the analysis of non-normal data
    • binary, count, etc.
  – Inference in GLMM analyses is often asymptotic (based on behavior in “large samples”) → even when the VC are known.
  – What are the implications if the VC are unknown?
From last year’s KSU workshop (Walt Stroup)
• Generalized linear mixed models: what’s really important
  – Probability distributions.
  – For non-normal data, the model equation form Y = Xb + Zu + e is not useful… it’s counterproductive.
• Formal tests are based on asymptotic (“large sample”) approximations.
  – Nice properties when n is “large”… but when is n large enough?
  – Quasi-likelihood (PROC GENMOD)… what’s that?
    • “Vacuous” (Walt Stroup) for repeated measures specifications → you can’t even simulate the data-generating process (see the sketch below).
• Can we do better? I think so.
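That last point is worth making concrete: a genuine GLMM specifies a full data-generating process that can be simulated end to end, whereas quasi-likelihood supplies only a mean-variance relationship, so there is no analogous recipe for drawing data. A minimal sketch of simulating from a logit-link binomial GLMM with a random block effect (the design, link, and parameter values are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(7)
n_blocks, n_per, m = 10, 4, 20     # blocks, units per block, binomial size

s2u = 0.5                          # assumed block variance component
u = rng.normal(scale=np.sqrt(s2u), size=n_blocks)  # random block effects
blk = np.repeat(np.arange(n_blocks), n_per)        # block index for each unit
trt = np.tile([0, 1], n_blocks * n_per // 2)       # a two-level treatment

eta = -0.5 + 1.0 * trt + u[blk]    # linear predictor on the logit scale
p = 1.0 / (1.0 + np.exp(-eta))     # inverse link: success probabilities
y = rng.binomial(m, p)             # conditional binomial response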
Fall 1989: PhD program at the University of Illinois, here I come!

[Image: journal article from the Journal of Animal Science, 1986]
My motivation for learning/understanding Bayesian statistics?
• Pragmatic, not philosophical.
  – Animal breeders are incredibly eclectic… they just want to solve problems in animal genetics!
• “Physicists and engineers very often become immersed in the subject matter. In particular, they work hand in hand with neuroscientists and often become experimentalists themselves. Furthermore, engineers (and likewise computer scientists) are ambitious; when faced with problems, they tend to attack, sweeping aside impediments stemming from limited knowledge about the procedures that they apply.”
  – From “What Is Statistics?” by Brown and Kass (2009) in The American Statistician.
  – This is also the culture of statistical genetics/genomics/animal breeding… and it is the culture of data analysts.
Bayesian statistics
• Why the fuss? Its philosophy is so messy…
• We’ve been doing things OK already… right?
• What’s wrong with our current toolkit?
  – Linear mixed models (LMM)
    • Nothing, really, under classical assumptions.
  – Generalized linear mixed models (GLMM)
    • Depends on the distribution… binary is the worst to deal with.
  – Nonlinear mixed models (NLMM)
    • Not much wrong under classical assumptions when n is “large enough”.
    • Won’t be addressed in this workshop.
The real issues?
• 1. Asymptotics
  – Likelihood inference is often based on approximations.
  – “Large n” really involves more than sample size:
    • it depends on p (the number of parameters);
    • it depends on the data distribution (e.g., binary vs. continuous);
    • it depends on model complexity (i.e., the design).
• 2. Flexibility
  – Can we go beyond the (G)(N)LMM?