Structure and Uncertainty
Peter Green, University of Bristol, 10 July 2003

Statistics and science
“If your experiment needs statistics, you ought to have done a better experiment”
Ernest Rutherford (1871-1937)

Graphical models
[Diagram: Mathematics, Modelling, Algorithms, Inference]

Graphical models
[Diagram: related fields: Markov chains, Contingency tables, Spatial statistics, Genetics, Regression, AI, Sufficiency, Covariance selection, Statistical physics]

1. Modelling

Structured systems
A framework for building models, especially probabilistic models, for empirical data
Key idea
– understand complex system
– through global model
– built from small pieces
  • comprehensible
  • each with only a few variables
  • modular

Mendelian inheritance - a natural structured model
[Figure: pedigree labelled with ABO blood groups (AB, AO, OO, A, O); Mendel]

Ion channel model
Hodgson and Green, Proc Roy Soc Lond A, 1999
[Diagram: model indicator, transition rates, hidden state, binary signal, levels & variances, data; state labels C1, C2, C3, O1, O2]

Gene expression using Affymetrix chips
[Figure: zoom from image of hybridised array (1.28cm) to hybridised spot. Slide courtesy of Affymetrix]
Figure labels:
• Single stranded, labeled RNA sample
• Oligonucleotide element
• 20µm
• Millions of copies of a specific oligonucleotide sequence element
• Approx. ½ million different complementary oligonucleotides
• Expressed genes
• Non-expressed genes

Gene expression is a hierarchical process
• Substantive question
• Experimental design
• Sample preparation
• Array design & manufacture
• Gene expression matrix
• Probe level data
• Image level data

Mapping of rare diseases using Hidden Markov model
[Maps: Larynx cancer in females in France, 1986-1993 (standardised ratios); Posterior probability of excess risk]
G & Richardson, 2002

Probabilistic expert systems

2. Mathematics

Graphical models
Use ideas from graph theory to
• represent structure of a joint probability distribution
• by encoding conditional independencies
[Graph on nodes A, B, C, D, E, F]

Where does the graph come from?
• Genetics – pedigree (family connections)
• Lattice systems – interaction graph (e.g. nearest neighbours)
• Gaussian case – graph determined by non-zeroes in inverse variance matrix

Inverse of (co)variance matrix: independent case
      A  B  C  D
  A   1  0  0  0
  B   0  2  0  0
  C   0  0  3  0
  D   0  0  0  1
[Graph on nodes A, B, C, D]

Inverse of (co)variance matrix: dependent case
      A  B  C  D
  A   2  1  0  0
  B   1  2  1  1
  C   0  1  4  2
  D   0  1  2  3
non-zero (B, D) entry: cov( B, D | A, C ) non-zero
[Graph on nodes A, B, C, D]
Few links implies few parameters - Occam’s razor

Conditional independence
• X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: p(X|Y,Z) = p(X|Y)
• X ⊥ Z | Y
[Graph: X - Y - Z]

Conditional independence as seen in data on perinatal mortality vs. ante-natal care….

  Both clinics combined
  Ante    Survived   Died   % died
  less         373     20      5.1
  more         316      6      1.9

  By clinic
  Clinic   Ante    Survived   Died   % died
  A        less         176      3      1.7
  A        more         293      4      1.3
  B        less         197     17      7.9
  B        more          23      2      8.0

Does survival depend on ante-natal care? .... what if you know the clinic?

Conditional independence
[Graph on nodes survival, ante, clinic]
• survival and clinic are dependent
• and ante and clinic are dependent
• but survival and ante are CI given clinic

[Graph on nodes A, B, C, D, E, F]
Conditional independence provides a mathematical basis for splitting up a large system into smaller components
[Diagram: the same graph split into components; B, D and E each appear in more than one component]

3. Inference

Bayesian paradigm in structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• properly accounting for variability at all levels
• including, in principle, uncertainty in model itself
• avoids over-optimistic claims of certainty

Bayesian structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in forensic statistics with DNA probe data…..
(thanks to J Mortera)

4. Algorithms

Algorithms for probability and likelihood calculations
Exploiting graphical structure:
• Markov chain Monte Carlo
• Probability propagation (Bayes nets)
• Expectation-Maximisation
• Variational methods

Markov chain Monte Carlo
• Subgroups of one or more variables updated randomly,
  – maintaining detailed balance with respect to target distribution
• Ensemble converges to equilibrium = target distribution ( = Bayesian posterior, e.g.)

Markov chain Monte Carlo
[Diagram: graph with one node marked “?”]
Updating ? - need only look at neighbours

Probability propagation
[Diagram: graph on nodes 1 to 7; form junction tree, with labels 267, 26, 236, 2, 12, 36, 3456]

Message passing in junction tree
[Diagram: junction tree with marked root]

Structured systems’ success stories include...
• Genomics & bioinformatics – DNA & protein sequencing, gene mapping, evolutionary genetics
• Spatial statistics – image analysis, environmetrics, geographical epidemiology, ecology
• Temporal problems – longitudinal data, financial time series, signal processing

http://www.stats.bris.ac.uk/~peter
[email protected]

…thanks to many
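Worked example: graph from an inverse covariance matrix. The sketch below takes the “dependent case” from the Gaussian slide, reading its rows as (2 1 0 0), (1 2 1 1), (0 1 4 2), (0 1 2 3) for A, B, C, D; that reading is a reconstruction of the slide layout and should be treated as an assumption. It lists the links as the non-zero off-diagonal entries, computes the partial correlation of B and D given A and C, and inverts the matrix exactly to show that a zero in the inverse does not mean a zero covariance. The function names are illustrative only.

```python
import math
from fractions import Fraction
from itertools import combinations

names = ["A", "B", "C", "D"]
# Inverse (co)variance matrix, as read from the slide (an assumption).
K = [
    [2, 1, 0, 0],
    [1, 2, 1, 1],
    [0, 1, 4, 2],
    [0, 1, 2, 3],
]

def edges(precision):
    """Links of the graph: the non-zero off-diagonal entries."""
    n = len(precision)
    return [(names[i], names[j])
            for i, j in combinations(range(n), 2)
            if precision[i][j] != 0]

def inverse(matrix):
    """Exact Gauss-Jordan inverse; assumes no zero pivot occurs,
    which holds for a positive definite matrix such as K."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

sigma = inverse(K)  # the covariance matrix
# Partial correlation of B and D given the rest: -K_BD / sqrt(K_BB * K_DD).
partial_corr_BD = -K[1][3] / math.sqrt(K[1][1] * K[3][3])

print("links:", edges(K))
print("partial correlation of B and D given A, C:", round(partial_corr_BD, 3))
print("K[A][C] =", K[0][2], "but cov(A, C) =", sigma[0][2])
```

A and C are not linked, so they are conditionally independent given B and D, yet their marginal covariance is non-zero because they are connected through B.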
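Worked example: the perinatal mortality data. This sketch recomputes the “% died” figures from the counts on the ante-natal care slide and compares the odds ratio of death (less care vs. more care) within each clinic with the odds ratio obtained when the clinics are pooled. The counts are those on the slide; the odds ratio is a summary chosen here for illustration and does not appear on the slide.

```python
# (survived, died) for each clinic and level of ante-natal care, from the slide.
counts = {
    ("A", "less"): (176, 3),
    ("A", "more"): (293, 4),
    ("B", "less"): (197, 17),
    ("B", "more"): (23, 2),
}

def pct_died(survived, died):
    """Deaths as a percentage of all births in the cell."""
    return 100.0 * died / (survived + died)

def pooled(ante):
    """Add the two clinics together for one level of care."""
    survived = sum(s for (_, a), (s, _) in counts.items() if a == ante)
    died = sum(d for (_, a), (_, d) in counts.items() if a == ante)
    return survived, died

def odds_ratio(less, more):
    """Odds of death with less care divided by odds of death with more care."""
    (s_less, d_less), (s_more, d_more) = less, more
    return (d_less / s_less) / (d_more / s_more)

within = {c: odds_ratio(counts[(c, "less")], counts[(c, "more")]) for c in ("A", "B")}
ignoring_clinic = odds_ratio(pooled("less"), pooled("more"))

for (clinic, ante), cell in counts.items():
    print(f"clinic {clinic}, {ante} care: {pct_died(*cell):.1f}% died")
for ante in ("less", "more"):
    print(f"both clinics, {ante} care: {pct_died(*pooled(ante)):.1f}% died")
print("odds ratio within each clinic:", {c: round(v, 2) for c, v in within.items()})
print("odds ratio ignoring clinic:", round(ignoring_clinic, 2))
```

Within each clinic the odds ratio is close to 1, while pooling the clinics gives a much larger value: survival and ante-natal care look dependent only until the clinic is known.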
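Worked example: an MCMC update that only looks at neighbours. The slides do not specify a model, so this sketch assumes one for illustration: binary variables on a square lattice with nearest-neighbour interaction (an Ising-type model, with a hypothetical strength parameter beta). Each site is redrawn from its full conditional, which depends on the rest of the lattice only through the neighbouring values; such single-site Gibbs updates maintain detailed balance with respect to the target.

```python
import math
import random

def neighbours(i, j, n):
    """Nearest neighbours of site (i, j) on an n x n lattice with free boundary."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]

def gibbs_sweep(x, beta, rng):
    """Update every site once, in random order, from its full conditional.

    Assumed target: p(x) proportional to exp(beta * sum of x_s * x_t over
    neighbouring pairs), with each x_s in {-1, +1}.
    """
    n = len(x)
    sites = [(i, j) for i in range(n) for j in range(n)]
    rng.shuffle(sites)
    for i, j in sites:
        # Only the neighbouring values enter the conditional distribution.
        s = sum(x[a][b] for a, b in neighbours(i, j, n))
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
        x[i][j] = 1 if rng.random() < p_plus else -1
    return x

rng = random.Random(0)
n = 8
x = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
for _ in range(200):
    gibbs_sweep(x, beta=0.3, rng=rng)
print("mean value after 200 sweeps:", sum(map(sum, x)) / n ** 2)
```

The cost of one update is proportional to the number of neighbours, not to the size of the system, which is what makes the graphical structure useful computationally.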