Bayesian Modelling of Differential Gene Expression
Lewin A (1), Richardson S (1), Marshall C (1), Glazier A (2) and Aitman T (2) (2006), Biometrics 62, 1-9.
(1) Imperial College Dept. Epidemiology
(2) Imperial College Microarray Centre
Outline
Introduction to microarrays and differential expression
Bayesian hierarchical model for differential expression
Decision rules
Predictive model checks
Gene Ontology analysis for differentially expressed genes
Further work
Microarrays measure gene expression (mRNA)
(1) Array contains thousands of spots, with millions of strands of DNA of known sequence fixed to each spot
(2) Sample (unknown sequences of cDNA) labelled with fluorescent dye
(3) Matching sequences of DNA and cDNA hybridize together
    DNA  TGCT
    cDNA ACGA
(4) Array washed, so only matching samples are left (identified from the fluorescent spots)
Pictures courtesy of Affymetrix
Microarray experiment to find
genes associated with Cd36
Cd36: gene known to be important in insulin resistance
Aitman et al 1999, Nature Genet 21:76-83
Microarray Data
3 SHR compared with 3 transgenic rats (with Cd36)
3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
~12000 genes on each array
Biological Question
Find genes which are expressed differently between animals
with and without Cd36.
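Bayesian hierarchical model for differential expression
$y_{gsr}$ is log gene expression (gene g, condition s, replicate r)

1st level
$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$
$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$

$\alpha_g$: overall gene expression (fixed effect)
$\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)
$\beta_{r(g)s}$: array effect or normalisation (function of $\alpha_g$)
$\sigma_{gs}^2$: variance for each gene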
Prior for gene variances
[Figure: 3 wildtype mice]

2nd level
$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNorm}(\mu_s, \tau_s)$
Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model.

3rd level
$\mu_s \sim N(c, d)$
$\tau_s \sim \text{Gamma}(e, f)$
Variances estimated using information from all measurements (~12000 x 3) rather than just 3
Prior for array effects (Normalization)
Spline Curve
$\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$,
with coefficients $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints
[Figure: spline curve with break points a0, a1, a2, a3]
Locations of break points not fixed
Must do sensitivity checks on # break points
[Figure: array effect as function of gene effect; Bayesian posterior mean compared with loess]
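One way to picture a piecewise-quadratic array effect is the truncated power basis sketched below. This is an illustrative parametrisation only, not necessarily the one used in the paper; the function `quadratic_spline`, the break points and the coefficients are all made up.

```python
def quadratic_spline(x, breakpoints, b0, b1, coeffs):
    """Evaluate a continuous piecewise-quadratic curve at x.

    breakpoints: increasing list [a0, a1, ...]. Each break point a_k adds a
    term coeffs[k] * max(x - a_k, 0)**2, so the curve and its slope stay
    continuous while the curvature changes at every break point.
    """
    value = b0 + b1 * (x - breakpoints[0])
    for a_k, c_k in zip(breakpoints, coeffs):
        value += c_k * max(x - a_k, 0.0) ** 2
    return value

breaks = [4.0, 6.0, 8.0]        # hypothetical break points a0, a1, a2
coeffs = [0.05, -0.08, 0.04]    # hypothetical curvature changes

# Array effect evaluated at a few overall expression levels.
array_effect = [quadratic_spline(x, breaks, 0.1, -0.02, coeffs)
                for x in (4.0, 5.0, 7.0, 9.0)]
```

In the model the break point locations are not fixed, so a sampler would update `breaks` along with the coefficients; here they are held constant purely for illustration.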
Decision Rules for Inference: Fixed Effects Model
Inference on δ
(1) dg = E(δg | data), the posterior mean
Like point estimate of log fold change.
Decision Rule: gene g is DE if |dg| > δcut
(δcut reflects biological interest)
(2) pg = P( |δg| > δcut | data), the posterior probability (incorporates uncertainty)
Decision Rule: gene g is DE if pg > pcut
(δcut reflects biological interest, pcut reflects statistical confidence)
This allows biologist to specify what size of effect is interesting (not just statistical significance)
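Both decision rules can be computed directly from posterior draws of δg. The sketch below assumes such draws are already available (for example from MCMC output) and uses stand-in normal samples instead; the function names are hypothetical, and the default thresholds log(2) and 0.8 are the values used in the wildtype v. knock-out illustration.

```python
import math
import random

def posterior_summaries(delta_samples, delta_cut):
    """Return (d_g, p_g) from posterior draws of delta_g.

    d_g is the posterior mean; p_g is the fraction of draws with
    |delta_g| > delta_cut, a Monte Carlo estimate of the posterior probability.
    """
    n = len(delta_samples)
    d_g = sum(delta_samples) / n
    p_g = sum(abs(d) > delta_cut for d in delta_samples) / n
    return d_g, p_g

def is_de(delta_samples, delta_cut=math.log(2), p_cut=0.8):
    """Rule (2): call the gene DE if p_g exceeds p_cut."""
    _, p_g = posterior_summaries(delta_samples, delta_cut)
    return p_g > p_cut

# Stand-in posterior draws; a real analysis would use the sampler's output.
rng = random.Random(0)
strong = [rng.gauss(1.5, 0.3) for _ in range(2000)]   # well above log(2)
weak = [rng.gauss(0.1, 0.3) for _ in range(2000)]     # centred near zero

strong_is_de = is_de(strong)
weak_is_de = is_de(weak)
```

Rule (1) uses only `d_g`, so two genes with the same posterior mean are treated alike; rule (2) also takes the spread of the draws into account.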
Illustration of decision rule
3 wildtype v. 3 knock-out mice
pg = P( |δg| > log(2) and αg > 4 | data)
[Figure legend: x marks genes with pg > 0.8; Δ marks genes with t-statistic > 2.78 (95% CI)]
Predictive Model Checks
Key Points

Predict new data from the model (using the
posterior distribution)

Get Bayesian p-value for each gene

Use all genes together (1000’s) to assess model
fit (p-value distribution close to Uniform if model
is good)
Mixed Predictive Checks
Mixed prediction is less conservative than posterior prediction
[Figure: graphical model over μ,τ, σg, ybarg and Sg, showing posterior prediction of Sg and mixed prediction of Sg via σgpred]
Bayesian predictive p-values
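The idea of a mixed predictive check for the gene variances can be sketched as follows: a new variance is drawn from the log-normal prior, new replicates are simulated, and the predicted sample variance is compared with the observed one. This is a simplified illustration that treats the hyper-parameters as known (a full analysis would draw them from the posterior) and takes the second log-normal argument to be a standard deviation on the log scale; both are assumptions, as are the function names and numerical values.

```python
import random

def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def mixed_predictive_pvalue(s2_obs, mu, tau, n_rep, n_draws, rng):
    """Estimate P(S2_pred > s2_obs) by simulation.

    Each draw takes a fresh variance from the log-normal prior (not the
    gene's own posterior), simulates n_rep replicates and recomputes the
    sample variance. The mean is set to 0 because it does not affect S2.
    """
    exceed = 0
    for _ in range(n_draws):
        sd_pred = rng.lognormvariate(mu, tau) ** 0.5
        y_pred = [rng.gauss(0.0, sd_pred) for _ in range(n_rep)]
        if sample_variance(y_pred) > s2_obs:
            exceed += 1
    return exceed / n_draws

rng = random.Random(42)
mu, tau = -2.0, 0.5   # hypothetical hyper-parameters, treated as known

# "Observed" sample variances, simulated from the same model for 300 genes.
observed = []
for _ in range(300):
    sd = rng.lognormvariate(mu, tau) ** 0.5
    observed.append(sample_variance([rng.gauss(0.0, sd) for _ in range(3)]))

p_values = [mixed_predictive_pvalue(s2, mu, tau, 3, 200, rng) for s2 in observed]
mean_p = sum(p_values) / len(p_values)
frac_below_half = sum(p < 0.5 for p in p_values) / len(p_values)
```

Because the observed data here come from the assumed model, the p-values should look roughly Uniform(0, 1); a histogram piling up near 0 or 1 would point to a poorly fitting prior.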
Gene Ontology: network of terms
Links connect more general
to more specific terms
Directed Acyclic Graph
~16,000 terms
Picture from Gene Ontology website
Annotations of genes to a node
Each term may have
1000s of genes
annotated (or none)
Gene may be annotated
to several GO terms
Gene annotated to term A → annotated to all ancestors of A
Picture from Gene Ontology website
GO annotations of genes associated with the insulin-resistance gene Cd36
Compare GO annotations of genes most and least differentially expressed
Most differentially expressed ↔ pg > 0.5 (280 genes)
Least differentially expressed ↔ pg < 0.2 (11171 genes)
Use Fisher’s test to compare GO annotations of genes most and
least differentially expressed (one test for each GO term)
None significant with simple multiple testing adjustment, but there
are many dependencies
Inflammatory
response recently
found to be important
in insulin resistance
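The per-term test can be sketched with the hypergeometric distribution. The version below is a one-sided Fisher exact test for over-representation in the most differentially expressed list; whether the original analysis was one- or two-sided is not stated, so the one-sided choice is an assumption. The list sizes 280 and 11171 come from the comparison above, while the annotation counts for the example term are made up.

```python
from math import comb

def fisher_overrepresentation_p(k, n_most, n_annotated, n_total):
    """One-sided Fisher exact p-value for over-representation of a GO term.

    k           : genes in the 'most DE' list annotated to the term
    n_most      : size of the 'most DE' list
    n_annotated : genes annotated to the term in both lists combined
    n_total     : total number of genes in both lists combined
    Returns P(X >= k) under the hypergeometric null distribution.
    """
    upper = min(n_most, n_annotated)
    total = comb(n_total, n_most)
    tail = sum(comb(n_annotated, x) * comb(n_total - n_annotated, n_most - x)
               for x in range(k, upper + 1))
    return tail / total

n_most, n_least = 280, 11171          # list sizes from the comparison above
# Hypothetical term: 100 annotated genes overall, 10 of them in the most-DE list.
p_term = fisher_overrepresentation_p(10, n_most, 100, n_most + n_least)
```

One such p-value is computed per GO term. Since a gene annotated to a term is also annotated to all of its ancestors, the tests are strongly dependent, which is why a simple multiple testing adjustment is conservative here.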
Summary of work in Biometrics paper
Bayesian hierarchical model flexible, estimates variances robustly
Predictive model checks show exchangeable prior good for gene variances
Useful to find GO terms over-represented in the most differentially-expressed genes
BGmix: mixture model for
differential expression
Change the prior on the
differential expression
parameters δg


Group genes into 3 classes:
  non-DE
  over-expressed
  under-expressed
Estimation and classification is simultaneous
Choice of Null Distribution

True log fold changes = 0

‘Nugget’ null: true log fold
changes = small but not
necessarily zero
Choice of DE genes distributions

Gammas

Uniforms

Normal
Outputs

Point estimates (and s.d.) of log fold changes (stabilised and
smoothed)

Posterior probability for gene to be in each group

Estimate of proportion of differentially expressed genes based on
grouping (parameter of model)
Obtaining gene lists

Threshold on posterior probabilities
(Posterior probability of classification in the
null < threshold → gene is DE)

Estimate of False Discovery Rate
for any gene list (estimate =
average of posterior probabilities)

Very simple estimate!

Choice of decision rule:
  Bayes Rule
  Fix False Discovery Rate
  More complex rules for mixture of 3 components
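The thresholding rule and the simple FDR estimate can be sketched as below. This is an illustration rather than the BGmix implementation: the function names are hypothetical, the posterior null probabilities are made up, and the "fix the FDR" rule shown is one straightforward way to grow a list until the estimated FDR would exceed a target.

```python
def gene_list_with_fdr(p_null, threshold):
    """Select genes whose posterior null probability is below threshold.

    Returns (selected indices, estimated FDR). The FDR estimate is the
    average posterior null probability over the selected genes.
    """
    selected = [g for g, p in enumerate(p_null) if p < threshold]
    if not selected:
        return selected, 0.0
    fdr = sum(p_null[g] for g in selected) / len(selected)
    return selected, fdr

def largest_list_for_fdr(p_null, target_fdr):
    """Fix the FDR: add genes in order of increasing null probability
    while the running average stays at or below target_fdr."""
    order = sorted(range(len(p_null)), key=lambda g: p_null[g])
    chosen, running = [], 0.0
    for g in order:
        if (running + p_null[g]) / (len(chosen) + 1) > target_fdr:
            break
        running += p_null[g]
        chosen.append(g)
    return chosen

# Made-up posterior probabilities of classification in the null, one per gene.
p_null = [0.01, 0.90, 0.05, 0.40, 0.02, 0.75]

selected, fdr = gene_list_with_fdr(p_null, threshold=0.5)
fixed_fdr_list = largest_list_for_fdr(p_null, target_fdr=0.05)
```

With these numbers a threshold of 0.5 selects four genes, one of which has a fairly high null probability and pulls the estimated FDR up; fixing the FDR at 0.05 gives a shorter list.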
Predictive Checks for Mixture Model
Model checks for differential expression parameters δg
More complex for mixture model
Important point: we check each mixture component separately
[Figure: graphical model with nodes w, η, zg, σg, ybarg, Sg and μ,τ, showing mixed predictions of ybarg and of Sg (the latter via σgpred)]
Bayesian p-values for Mixture Model
[Figure panels: simulated data from incorrect model; simulated data from correct model]
Acknowledgements
Co-authors
Sylvia Richardson, Clare Marshall (IC Epidemiology)
Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)
Collaborators on BGX Grant
Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)
Helen Causton (IC Microarray Centre)
Peter Green (Bristol)
BBSRC Exploiting Genomics Grant
Papers and Software
Software:
WinBUGS code for model in Biometrics paper
BGmix (R package) includes mixture model
Papers:
BGmix paper, submitted
Paper on predictive checks for mixture prior, in preparation
http://www.bgx.org.uk/