Gene-set analysis
Danielle Posthuma & Christiaan de Leeuw
Dept. Complex Trait Genetics, VU University
Amsterdam
//danielle/2017/PW_dp.ppt
Boulder, TC31, March 8 2017
SNP associations
[Diagram: 16 SNPs map onto 6 genes, which map onto 6 functions]
SNP associations
[Diagrams: the SNPs mapped onto 6 genes, then the same SNPs mapped onto only 2 genes]
Are all associated SNPs randomly distributed or do they
cluster in genes?
SNP associations
[Diagrams: the SNPs mapped onto 6 genes with 6 functions, then onto 6 genes with only 2 functions]
Do all implicated genes have different functions or are
they functionally related?
Testing for functional clustering of SNP
associations
Single SNP analysis
- GWAS
- single (candidate) SNPs
Gene-based analysis
SNP-set analysis with gene as unit of analysis
- whole genome
- candidate gene
Gene-set analysis
SNP-set analysis with sets of genes as unit of analysis
- targeted gene-sets/pathways
- all known gene-sets/pathways
Testing for functional clustering of SNP
associations
In addition to single SNP, gene-based and gene-set analysis:
Gene-property analysis
Using quantitative characteristics of genes,
e.g. expression levels or probability of being a
member of a gene-set
Gene-based analysis
• Instead of testing single SNPs and annotating
the GWAS-significant ones to genes, we test for the
joint association effect of all SNPs in a gene,
taking into account LD (correlation between
SNPs)
• No single SNP needs to reach genome-wide
significance, yet if multiple SNPs in the same
gene have lower P-values than expected under
the null, the gene-based test can result in a low P-value
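The idea of a joint test that accounts for LD can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes the gene statistic is the sum of squared SNP Z-scores and that, under the null, the Z-scores follow a multivariate normal distribution whose correlation matrix is the LD matrix. The function names, the example Z-scores and the LD values are all hypothetical.

```python
import math
import random

def cholesky(R):
    """Lower-triangular L with L * L^T == R (R must be positive definite)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def gene_p_value(z_scores, ld, n_sim=20000, seed=1):
    """Monte Carlo P-value for T = sum(z^2), simulating null Z-scores ~ N(0, ld)."""
    rng = random.Random(seed)
    L = cholesky(ld)
    n = len(z_scores)
    observed = sum(z * z for z in z_scores)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent draws so they mimic the LD structure.
        z_null = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if sum(z * z for z in z_null) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

# Hypothetical gene with four SNPs, none genome-wide significant on its own.
z = [3.0, 3.1, 2.9, 3.2]
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
high_ld = [[1.0 if i == j else 0.9 for j in range(4)] for i in range(4)]

p_independent = gene_p_value(z, independent)
p_high_ld = gene_p_value(z, high_ld)
print(p_independent, p_high_ld)
```

With independent SNPs the four moderate signals count as four pieces of evidence and the gene-level P-value is small. With strong LD the same Z-scores largely repeat one signal, so the gene-level P-value is larger. The resolution of the Monte Carlo estimate is limited by `n_sim`.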
[Figures: SNP Manhattan plot; gene Manhattan plot]
Gene-based analysis
Unit of analysis is the gene
•Pros:
– reduces multiple testing (from 2.5M SNPs to
23k genes)
– accounts for heterogeneity within a gene
– immediate gene-level interpretation
•Cons:
– disregards regulatory (often non-genic)
information when based on location-based
annotation
– still a lot of tests
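As a rough illustration of the multiple-testing point, assume a family-wise error rate of 0.05 with Bonferroni correction (an assumption; the slides do not state which threshold is used). The per-test significance thresholds for 2.5M SNPs and 23k genes are then:

```latex
\alpha_{\text{SNP}} = \frac{0.05}{2.5 \times 10^{6}} = 2 \times 10^{-8},
\qquad
\alpha_{\text{gene}} = \frac{0.05}{2.3 \times 10^{4}} \approx 2.2 \times 10^{-6}
```

Under this assumption the gene-level threshold is roughly 100 times less stringent than the SNP-level one, although 23k tests still require a substantial correction.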
Gene-set analysis
Unit of analysis is a set of functionally related
genes
Pros:
–Reduces multiple testing by prioritizing genes in
biological pathways or in groups of (functionally)
related genes
–Increases statistical power
–Deals with genic heterogeneity
–Provides immediate biological insight
Cons:
•Crucial to select reliable sets of genes!
– Different levels of information
– Different quality of information
Choosing gene-sets
Gene-sets can be based on e.g.
-protein-protein interaction
-co-expression
-transcription regulatory network
-biological pathway
Use public or commercial databases: e.g. KEGG, Gene
Ontology, Ingenuity, BioCarta, STRING database, Human Protein
Interaction database
Or:
Create manually, expert-curated lists
Online databases vs. manual
Information in online databases tends to be
•somewhat biased
– not all genes included, disease genes tend to
be investigated more often
– genes that are investigated more often will
have more interactions
•not always reliable
– interactions often not validated, sometimes
only predicted. If experimentally seen,
unknown how reliable that experiment was
Statistical issues in gene-set
analyses
• Self-contained vs. competitive tests
• Different statistical algorithms test different
alternative hypotheses
• Different statistical algorithms have
different sensitivity to LD, number of genes,
number of SNPs, and background heritability (h²)
Self-contained vs. competitive tests
Null hypothesis:
Self-contained:
H0: The genes in the gene-set are not associated with the trait
Competitive:
H0: The genes in the gene-set are not more
strongly associated with the trait than the genes
not in the gene-set
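The two null hypotheses can be contrasted with a small simulation. This is a simplified sketch, not the procedure of any specific gene-set tool: it assumes each gene has an independent Z-score that is standard normal when the gene is not associated, and it uses a normal approximation for both tests. The function names and the simulated values are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance

def self_contained_p(z_set):
    """H0: genes in the set are not associated (mean gene Z-score is 0)."""
    stat = mean(z_set) * len(z_set) ** 0.5
    return NormalDist().cdf(-stat)  # one-sided upper-tail P-value

def competitive_p(z_set, z_rest):
    """H0: genes in the set are not more strongly associated than the rest."""
    se = (variance(z_set) / len(z_set) + variance(z_rest) / len(z_rest)) ** 0.5
    stat = (mean(z_set) - mean(z_rest)) / se
    return NormalDist().cdf(-stat)  # one-sided upper-tail P-value

rng = random.Random(2017)
# Hypothetical polygenic trait: every gene carries some signal (mean Z = 1).
all_genes = [rng.gauss(1.0, 1.0) for _ in range(1000)]
random_set, rest = all_genes[:50], all_genes[50:]
# A truly enriched set: the same genes with an extra shift of 1.
enriched_set = [z + 1.0 for z in random_set]

p_self_random = self_contained_p(random_set)
p_comp_random = competitive_p(random_set, rest)
p_comp_enriched = competitive_p(enriched_set, rest)
print(p_self_random, p_comp_random, p_comp_enriched)
```

In this setup an arbitrary set of 50 genes is strongly significant in the self-contained test simply because the trait is polygenic, whereas the competitive test only gives a small P-value for the set that is shifted relative to the background.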
Why use competitive tests
• Polygenic traits are influenced by thousands of
SNPs in hundreds of genes
• Very likely that many combinations (i.e.
gene-sets) of causal genes are
significantly associated with the trait
• Competitive tests define which
combinations are biologically most
interpretable
Polygenicity and number of significant gene-sets
in self-contained versus competitive testing
De Leeuw, Neale, Heskes, Posthuma. Nat Rev Genet, 2016
For self-contained methods, rates increase with heritability, whereas
they are constant for competitive methods.
Rates are deflated for the binomial and hypergeometric methods because of their
discrete test statistic.
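The effect of a discrete test statistic can be seen in a hypergeometric over-representation test. The sketch below is a generic textbook version, with hypothetical gene counts chosen only for illustration: it lists every P-value the test can attain, which shows that the test cannot hit a nominal level such as 0.05 exactly.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k): N genes in total, K in the gene-set, n significant, k overlapping."""
    total = comb(N, n)
    hi = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1)) / total

# Hypothetical counts: 1000 genes, 50 in the set, 100 significant genes.
N, K, n = 1000, 50, 100
attainable = [hypergeom_upper_tail(N, K, n, k) for k in range(min(K, n) + 1)]

# Largest attainable P-value that is still <= 0.05: this is the actual
# rejection rate under the null when testing at a nominal level of 0.05.
actual_level = max(p for p in attainable if p <= 0.05)
print(actual_level)
```

Because only a finite set of P-values can occur, the actual rejection rate under the null falls below the nominal level, which is one way a discrete statistic leads to deflated rates.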
Different statistical algorithms test different alternative
hypotheses
Strategy: Minimal P-value
Alternative hypothesis: At least one SNP in the gene or
gene-set is associated with the trait
Strategy: Combined P-value
Alternative hypothesis: The combined pattern of
individual P-values provides evidence for association
with the trait
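The difference between the two strategies can be made concrete with two simple stand-ins: a Šidák-corrected minimal P-value and Fisher's combined P-value. Both are chosen here only for illustration and both assume independent SNPs, i.e. they ignore LD, which a real gene-based test would have to account for. The function names and P-values are hypothetical.

```python
import math

def min_p_sidak(p_values):
    """Minimal-P strategy: smallest P-value, Sidak-corrected for the number of SNPs."""
    m = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** m

def fisher_combined(p_values):
    """Combined-P strategy: Fisher's method, X = -2 * sum(log p) ~ chi-square, 2k df."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for an even number of df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Many modest signals: no SNP stands out, but the overall pattern does.
spread = [0.04, 0.03, 0.05, 0.02, 0.04]
# One strong signal among otherwise null SNPs.
single = [1e-6, 0.5, 0.6, 0.7, 0.8]

for label, ps in (("spread", spread), ("single", single)):
    print(label, min_p_sidak(ps), fisher_combined(ps))
```

For the spread pattern the combined P-value is much smaller than the minimal P-value, while for the single strong SNP the minimal P-value is the smaller of the two, which reflects the different alternative hypotheses each strategy is sensitive to.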
Different algorithms: LD & Ngenes
De Leeuw, Neale, Heskes, Posthuma. Nat Rev Genet, 2016
Gene-set analysis: Practical