Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Inductive probability wikipedia, lookup

Foundations of statistics wikipedia, lookup

Time series wikipedia, lookup

Approximate Bayesian computation wikipedia, lookup

Bayesian inference wikipedia, lookup

Transcript
```who was Bayes?
• Reverend Thomas Bayes (1702-1761)
–
–
–
–
part-time mathematician
buried in Bunhill Cemetary, Moongate, London
famous paper in 1763 Phil Trans Roy Soc London
was Bayes the first with this idea? (Laplace?)
• basic idea (from Bayes’ original example)
– two billiard balls tossed at random (uniform) on table
– where is first ball if the second is to its left (right)?

first
second
Y=0
Y=1
prior
pr() = 1
likelihood pr(Y | ) = 1–Y(1–)Y
posterior pr( |Y) = ?
Plant Microarray Course
1
what is Bayes theorem?
• posterior = likelihood * prior / C

pr( parameter | data ) =
pr( data | parameter ) * pr( parameter ) / pr( data)
pr ( , Y ) pr (Y |  )  pr ( )
pr ( | Y ) 

pr (Y )
pr (Y )
Y=0
Y=1
• prior: probability of parameter before observing data
– pr(  ) = pr( parameter )
– equal chance of first ball being anywhere on the table
• posterior: probability of parameter after observing data
– pr(  | Y ) = pr( parameter | data )
– more likely second to left if first is near right end of table
• likelihood: probability of data given parameters
– pr( Y |  ) = pr( data | parameter )
– basis for classical statistical inference about  given Y
Plant Microarray Course
2
6
8
10
prior mean
actual mean
n small prior
n large
n large
prior mean
n small
prior
actual mean
Bayes posterior for normal data
12
14
16
6
y = phenotype values
10
12
14
16
y = phenotype values
small prior variance
8
Plant Microarray Course
large prior variance
3
Bayes posterior for normal data
model
environment
likelihood
prior
Yi =  + Ei
E ~ N( 0, 2 ), 2 known
Y ~ N( , 2 )
 ~ N( 0, 2 ),  known
posterior:
single individual
mean tends to sample mean
 ~ N( 0 + B1(Y1 – 0), B12)
sample of n individuals
 ~ N BnY  (1  Bn )  0 , Bn 2 / n 
with Y  sum Yi / n
{i 1,..., n}
fudge factor
(shrinks to 1)
Bn 
n
n  1
1
Plant Microarray Course
4
n large
n small prior
posterior genotypic means Gq
6
qq
8
10
Qq
12
y = phenotype values
Plant Microarray Course
14
16
QQ
5
posterior genotypic means Gq
posterior centered on sample genotypic mean
but shrunken slightly toward overall mean


prior:
Gq ~ N Y ,  2
posterior:
Gq ~ N BqYq  (1  Bq )Y , Bq 2 / nq


nq  count {Qi  q}, Yq  sum Yi / nq
{Qi  q}
fudge factor:
Bq 
nq
nq  1
1
Plant Microarray Course
6
Are strain differences real?
SREBP1
BTBR
SCD1
2.6
2.3
B6
BTBR
BTBR
Plant Microarray Course
BTBR
G6Pase
0.0
1.0
2.0
1.0 1.4 1.8 2.2
2.2
1.8
1.4
B6
B6
3.0
PEPCK
BTBR
BTBR
2.9
2
-2 -1 0
BTBR
islet
BTBR
B6
PPARalpha
ACO
B6
muscle
BTBR
1
0.5
B6
B6
B6
PPARgamma
-0.5 0.0
few d.f. per gene
Can we trust SDg ?
1.2
1.9
1.7
-2.5
B6
noise negligible?
liver
GPAT
1.6
-1.5
similar pattern
parallel lines
no interaction
fat
FAS
2.1
strain differences?
B6
BTBR
B6
BTBR
7
Bayesian shrinkage of gene-specific SD
• gene-specific SD from replication
– SDg = gene-specific standard deviation (df = 1)
• robust abundance-based estimate
– (Ag) = smoothed over mRNA
– depends only on abundance level Ag (or constant)
• combine ideas into gene-specific hybrid
– “prior” g2 ~ inv-2(0, (Ag)2)
– “posterior” shrinkage estimate
1SDg2 + 0(Ag)2
1 + 0
– combines two “statistically independent” estimates
Plant Microarray Course
8
1.00
SD for strain differences
gene-specific g
0.50
smooth of g
main effects
liver (Ag)
interaction
fat-liver (Ag)
B6
0.05
0.10
0.20
fat (Ag)
B6
fat
liver
muscle
islet
BTBR
BTBR
-2
-1
Plant Microarray Course
0
1
average intensity
2
3
9
B6
0.05
95% 82 limits
new (shrunk) g
size of shrinkage
1g2 + 0(Ag)2
1 + 0
0.10
0.20
gene-specific g
abundance (Ag)
0.50
1.00
Shrinkage Estimates of SD
B6
fat
liver
muscle
islet
BTBR
BTBR
-2
Plant Microarray Course
-1
0
1
average intensity
2
3
10
How good is shrinkage model?
0.8
prior for gene-specific
0 = 5.45,  = 1
2 approximation with 
0 = 3.56,  = .809
0.4
0.2
0.0
2 approximation
Density
histogram of ratio
g2 / (Ag)2
empirical Bayes estimates
0.6
g2 ~ inv-2(0, (Ag)2)
1g2 + 0(Ag)2
1 + 0
0
2
Plant Microarray Course
4
6
8
10
11
B6
B6
liver
muscle
10
5
0
-5
islet
BTBR
-15
fat
-10
fat-liver interaction
shrinkage-based
abundance-based
9 genes identified
15
Effect of SD Shrinkage on Detection
0.02
BTBR
0.10
0.50
2.00
10.00
A = average intensity
Plant Microarray Course
12
QTL Mapping (Gary Churchill)
Key Idea: Crossing two inbred lines creates
linkage disequilibrium which in turn creates
QTL
Marker
Trait
Plant Microarray Course
13
Bayes factors for comparing models
• goal of BF: balance model fit with model "complexity“
– want “best model” that captures key features (model bias)
– want to avoid “overfitting” the data in hand (poor prediction)
• what is a Bayes factor (BF)?
– ratio of posterior odds to prior odds
– ratio of model likelihoods
• BF is same as Bayes Information Criteria (BIC)
– penalty on likelihood ratio (LR)
• want Bayes factor to be much larger than 1 (ideally > 10)
pr ( model 1 | data ) / pr ( model 2 | data ) pr (data | model 1 )
BF12 

pr ( model 1 ) / pr ( model 2 )
pr (data | model 2 )
 2 log( BF12 )  2 log( LR)  ( p2  p1 ) log( n)