Structure and Uncertainty
Peter Green, University of Bristol, 10 July 2003
Statistics and science
“If your experiment needs statistics, you ought to have done a better experiment”
Ernest Rutherford (1871-1937)
Graphical models
Mathematics
Modelling
Algorithms
Inference
[Figure: "Graphical models" at the centre, surrounded by related fields: Markov chains, contingency tables, spatial statistics, genetics, regression, AI, sufficiency, covariance selection, statistical physics]
1. Modelling
Structured systems
A framework for building models, especially probabilistic models, for empirical data
Key idea – understand complex system
– through global model
– built from small pieces
  • comprehensible
  • each with only a few variables
  • modular
Mendelian inheritance - a
natural structured model
[Figure: pedigree labelled with ABO genotypes (AB, AO, OO) and blood groups (A, O); portrait of Mendel]
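A minimal sketch of why Mendelian inheritance gives a structured model: a child's genotype depends only on the genotypes of its two parents, each of whom passes on one of their two alleles with probability ½. The function names and the phenotype rule (O recessive, A and B codominant) are textbook assumptions added for illustration; they do not come from the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def child_genotype_dist(mother: str, father: str) -> dict:
    """Distribution of a child's ABO genotype given the parents' genotypes.

    Genotypes are two-letter strings such as "AO"; each parent transmits
    one of its two alleles with probability 1/2, independently.
    """
    counts = Counter()
    for m_allele, f_allele in product(mother, father):
        # Sort so that "OA" and "AO" count as the same genotype.
        counts["".join(sorted((m_allele, f_allele)))] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


def phenotype(genotype: str) -> str:
    """Blood group implied by a genotype: O is recessive, A and B codominant."""
    alleles = set(genotype) - {"O"}
    return "".join(sorted(alleles)) or "O"


for genotype, prob in child_genotype_dist("AO", "OO").items():
    print(genotype, phenotype(genotype), prob)
```

Because each call looks only at one parent-child triple, a whole pedigree can be assembled from many such small local pieces.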
Ion channel model
[Figure: graphical model with nodes: model indicator, transition rates, hidden state, binary signal, levels & variances, data]
Hodgson and Green, Proc Roy Soc Lond A, 1999
[Figure: the same graphical model, annotated with channel states C1, C2, C3, O1, O2 and a sketch of the data]
Gene expression using
Affymetrix chips
[Figure: image of hybridised array (1.28cm) with a zoomed view of a hybridised spot. Labels: single stranded, labeled RNA sample; oligonucleotide element (20µm); millions of copies of a specific oligonucleotide sequence element; approx. ½ million different complementary oligonucleotides; expressed genes; non-expressed genes]
Slide courtesy of Affymetrix
Gene expression is a hierarchical process
• Substantive question
• Experimental design
• Sample preparation
• Array design & manufacture
• Gene expression matrix
• Probe level data
• Image level data
Mapping of rare diseases
using Hidden Markov model
Larynx cancer in
females in France,
1986-1993
(standardised ratios)
Posterior probability
of excess risk
G & Richardson, 2002
Probabilistic expert systems
2. Mathematics
Graphical models
Use ideas from graph theory to
• represent structure of a joint probability distribution
• by encoding conditional independencies
[Figure: graph on nodes A, B, C, D, E, F]
Where does the graph come from?
• Genetics
– pedigree (family connections)
• Lattice systems
– interaction graph (e.g. nearest neighbours)
• Gaussian case
– graph determined by non-zeroes in inverse variance matrix
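Inverse of (co)variance matrix: independent case

       A  B  C  D
    A  1  0  0  0
    B  0  2  0  0
    C  0  0  3  0
    D  0  0  0  1

[Figure: graph on nodes A, B, C, D with no links]

Inverse of (co)variance matrix: dependent case

       A  B  C  D
    A  2  1  0  0
    B  1  2  1  1
    C  0  1  4  2
    D  0  1  2  3

non-zero (B, D) entry corresponds to non-zero cov(B, D | A, C)
[Figure: graph on nodes A, B, C, D with links matching the non-zero off-diagonal entries]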
Few links implies few parameters - Occam’s razor
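The link between zeroes in the inverse covariance matrix and missing edges can be checked numerically. The sketch below assumes the dependent-case matrix reads, in the order A, B, C, D, as rows (2 1 0 0), (1 2 1 1), (0 1 4 2), (0 1 2 3); this is a reading of the slide, not a confirmed value. The helper `conditional_cov` is a hypothetical name. It computes the covariance of each pair given the other two variables through a Schur complement of the covariance matrix.

```python
import numpy as np

# Inverse covariance (precision) matrix for (A, B, C, D); the entries are
# as read from the slide's dependent case, so treat them as an assumption.
K = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 1.0],
    [0.0, 1.0, 4.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
])
NAMES = "ABCD"


def conditional_cov(K: np.ndarray, i: int, j: int) -> float:
    """cov(X_i, X_j | all other variables) for a Gaussian with precision K."""
    sigma = np.linalg.inv(K)
    keep = [i, j]
    rest = [k for k in range(K.shape[0]) if k not in keep]
    s_kk = sigma[np.ix_(keep, keep)]
    s_kr = sigma[np.ix_(keep, rest)]
    s_rr = sigma[np.ix_(rest, rest)]
    # Schur complement: covariance of the kept pair after conditioning.
    cond = s_kk - s_kr @ np.linalg.solve(s_rr, s_kr.T)
    return float(cond[0, 1])


for i in range(4):
    for j in range(i + 1, 4):
        c = conditional_cov(K, i, j)
        status = "link" if abs(c) > 1e-12 else "no link"
        print(f"{NAMES[i]}-{NAMES[j]}: conditional cov = {c:+.3f} ({status})")
```

Pairs with a zero entry in K (A-C and A-D here) come out with zero conditional covariance, so they need neither an edge nor a parameter.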
Conditional independence
• X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X:
p(X|Y,Z) = p(X|Y)
• X ⊥ Z | Y
[Figure: graph X - Y - Z]
Conditional independence
as seen in data on perinatal mortality vs. ante-natal care…

Both clinics combined:

    Ante   Survived  Died  % died
    less      373     20     5.1
    more      316      6     1.9

By clinic:

    Clinic  Ante   Survived  Died  % died
    A       less      176      3     1.7
    A       more      293      4     1.3
    B       less      197     17     7.9
    B       more       23      2     8.0
Does survival depend on ante-natal care?
.... what if you know the clinic?
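The two questions can be answered by recomputing the death rates with and without clinic. The sketch below assumes the counts read as (died, survived) = (3, 176) and (4, 293) for clinic A with less and more care, and (17, 197) and (2, 23) for clinic B; this is a reconstruction of the slide's table, and the function names are illustrative.

```python
# (died, survived) counts by clinic and amount of ante-natal care,
# as reconstructed from the slide's table (an assumption).
counts = {
    ("A", "less"): (3, 176),
    ("A", "more"): (4, 293),
    ("B", "less"): (17, 197),
    ("B", "more"): (2, 23),
}


def pct_died(died: int, survived: int) -> float:
    """Percentage of deaths, rounded to one decimal place."""
    return round(100.0 * died / (died + survived), 1)


def pooled(care: str) -> tuple:
    """Add the two clinics together for one level of care."""
    died = sum(d for (_, c), (d, _) in counts.items() if c == care)
    survived = sum(s for (_, c), (_, s) in counts.items() if c == care)
    return died, survived


for care in ("less", "more"):
    print("pooled", care, pct_died(*pooled(care)))
for (clinic, care), (died, survived) in counts.items():
    print(clinic, care, pct_died(died, survived))
```

Pooled over clinics, less care goes with a visibly higher death rate. Within each clinic the two rates are nearly equal, which is what conditional independence of survival and care given clinic looks like in a table.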
Conditional independence
[Figure: graph survival - clinic - ante]
survival and clinic are dependent
and ante and clinic are dependent
but survival and ante are CI (conditionally independent) given clinic
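[Figure: graph on nodes A, B, C, D, E, F]
Conditional independence provides a mathematical basis for splitting up a large system into smaller components
[Figure: the same graph split into components, with nodes B, D and E appearing in more than one component]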
3. Inference
Bayesian paradigm in structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• properly accounting for variability at all levels
• including, in principle, uncertainty in model itself
• avoids over-optimistic claims of certainty
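One way to see ‘borrowing strength’ is the shrinkage that arises in a two-level normal model. The sketch below is an illustrative simplification, not the method of the talk: it assumes group means ybar_i ~ N(theta_i, sigma2) with theta_i ~ N(mu, tau2), treats both variances as known, and plugs in the grand mean for mu. A full Bayesian analysis would instead put priors on mu and the variances and integrate them out.

```python
def shrink(group_means, sigma2, tau2):
    """Posterior means of group effects in a normal-normal hierarchy.

    Assumed model (illustrative): ybar_i ~ N(theta_i, sigma2) and
    theta_i ~ N(mu, tau2), with mu replaced by the grand mean (plug-in).
    """
    mu = sum(group_means) / len(group_means)
    weight = tau2 / (tau2 + sigma2)  # weight given to each group's own data
    return [mu + weight * (y - mu) for y in group_means]


# Noisy groups (large sigma2) are pulled strongly towards the common mean.
print(shrink([0.0, 10.0], sigma2=1.0, tau2=1.0))
print(shrink([0.0, 10.0], sigma2=9.0, tau2=1.0))
```

Each group's estimate uses the other groups' data through the shared mean, and the pull is stronger when the group's own data are less informative.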
Bayesian structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in forensic statistics with DNA probe data…
(thanks to J Mortera)
4. Algorithms
Algorithms for probability and likelihood calculations
Exploiting graphical structure:
• Markov chain Monte Carlo
• Probability propagation (Bayes nets)
• Expectation-Maximisation
• Variational methods
Markov chain Monte Carlo
• Subgroups of one or more variables updated randomly,
– maintaining detailed balance with respect to target distribution
• Ensemble converges to equilibrium = target distribution (e.g. the Bayesian posterior)
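Both bullets can be checked exactly on a toy example. The sketch below builds the transition matrix of a Metropolis chain on four states, then verifies detailed balance and convergence to the target. The target probabilities and the ring-shaped proposal are arbitrary illustrative choices, not taken from the talk.

```python
import numpy as np


def metropolis_matrix(target: np.ndarray) -> np.ndarray:
    """Transition matrix of a Metropolis chain on states 0..n-1.

    Proposal (an illustrative choice): move to the left or right
    neighbour on a ring with probability 1/2 each.
    """
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            # Accept the proposed move with probability min(1, pi_j / pi_i).
            P[i, j] += 0.5 * min(1.0, target[j] / target[i])
        P[i, i] = 1.0 - P[i].sum()  # rejected proposals stay put
    return P


target = np.array([0.1, 0.2, 0.3, 0.4])
P = metropolis_matrix(target)

# Detailed balance: pi_i P_ij == pi_j P_ji for every pair of states.
flow = target[:, None] * P
assert np.allclose(flow, flow.T)

# Starting in state 0, the distribution after many steps approaches the target.
start = np.array([1.0, 0.0, 0.0, 0.0])
print(start @ np.linalg.matrix_power(P, 500))
```

In practice the chain is simulated rather than written out as a matrix, but the matrix form makes the two properties on the slide directly testable.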
Markov chain Monte Carlo
[Figure: graph with the variable being updated marked "?"]
Updating - need only look at neighbours
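The locality of the update can be sketched with a single-site Gibbs sampler for an Ising model on a small grid. The Ising model, the grid size and the coupling `beta` are illustrative assumptions, chosen because the conditional distribution of one spin has a simple closed form that involves only its nearest neighbours.

```python
import math
import random


def neighbours(i, j, n):
    """Nearest neighbours of site (i, j) on an n x n grid with free edges."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]


def prob_up(grid, i, j, beta):
    """P(spin at (i, j) = +1 | everything else) in an Ising model.

    Only the neighbouring spins enter the formula.
    """
    s = sum(grid[a][b] for a, b in neighbours(i, j, len(grid)))
    return 1.0 / (1.0 + math.exp(-2.0 * beta * s))


def gibbs_sweep(grid, beta, rng):
    """Resample every site once from its local conditional distribution."""
    n = len(grid)
    for i in range(n):
        for j in range(n):
            grid[i][j] = 1 if rng.random() < prob_up(grid, i, j, beta) else -1


rng = random.Random(0)
grid = [[rng.choice((-1, 1)) for _ in range(6)] for _ in range(6)]
for _ in range(10):
    gibbs_sweep(grid, beta=0.4, rng=rng)
```

The cost of one update depends on the number of neighbours, not on the size of the grid, which is what makes MCMC practical for large structured models.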
Probability propagation
[Figure: graph on nodes 1-7]
form junction tree
[Figure: junction tree with cliques 12, 267, 236, 3456 and separators 2, 26, 36]
Message passing in junction tree
[Figure: junction tree with the root clique marked]
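A minimal sketch of message passing, on the simplest possible case rather than the seven-node graph of the slide: a chain X1 - X2 - X3 of binary variables, whose junction tree is just the path of cliques {X1, X2} and {X2, X3}. The potentials are made-up numbers. The marginal of X3 from local messages is compared with a brute-force sum over the joint table.

```python
import itertools

import numpy as np

# Hypothetical potentials for a chain X1 - X2 - X3 of binary variables.
phi1 = np.array([0.6, 0.4])                 # factor on X1
psi12 = np.array([[0.9, 0.1], [0.2, 0.8]])  # factor on (X1, X2)
psi23 = np.array([[0.7, 0.3], [0.4, 0.6]])  # factor on (X2, X3)


def marginal_x3_by_messages():
    """Pass messages X1 -> X2 -> X3, summing out one variable at a time."""
    m12 = phi1 @ psi12  # message to X2: sum over x1
    m23 = m12 @ psi23   # message to X3: sum over x2
    return m23 / m23.sum()


def marginal_x3_brute_force():
    """Sum the full joint table over x1 and x2 (exponential in general)."""
    p = np.zeros(2)
    for x1, x2, x3 in itertools.product(range(2), repeat=3):
        p[x3] += phi1[x1] * psi12[x1, x2] * psi23[x2, x3]
    return p / p.sum()


print(marginal_x3_by_messages())
print(marginal_x3_brute_force())
```

The message-passing version never forms the joint table. In a general junction tree the same idea applies, with messages sent between cliques through their separators.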
Structured systems’ success stories include...
• Genomics & bioinformatics
– DNA & protein sequencing, gene mapping, evolutionary genetics
• Spatial statistics
– image analysis, environmetrics, geographical epidemiology, ecology
• Temporal problems
– longitudinal data, financial time series, signal processing
http://www.stats.bris.ac.uk/~peter
…thanks to many