Introduction to genome-wide association studies
Biomedical Master: Metabolic diseases (B. Thorens)
Sven Bergmann
University of Lausanne & Swiss Institute of Bioinformatics
http://serverdgm.unil.ch/bergmann
Lausanne, November 8, 2010
A Systems Biology approach
Large (genomic) systems
• many uncharacterized elements
• relationships unknown
• computational analysis should:
– improve annotation
– reveal relations
– reduce complexity
Small systems
• elements well-known
• many relationships established
• quantitative modeling of systems properties like:
– Dynamics
– Robustness
– Logics
Overview
• Population stratification
• Our whole genome associations
• New Methods and Approaches
Genetic variation in SNPs
(Single Nucleotide Polymorphisms)
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
CoLaus = Cohort Lausanne
6’189 individuals
Genotypes: 500.000 SNPs
Phenotypes: 159 measurements, 144 questions
Collaboration with:
Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)
Analysis of Genotypes only
Principal Component Analysis reveals SNP-vectors explaining the largest variation in the data
Ethnic groups cluster according to geographic distances
[Figure: individuals plotted along PC1 and PC2]
PCA of POPRES cohort
Predicting location according to SNP-profile ...
… is pretty accurate!
The Swiss segregate according to language
PC-Analysis of genotypic profile
• Is surprisingly accurate!
• Is useful for forensic purposes or for
individuals interested in their ancestry
• Is useful for population stratification in
Genome-wide Association studies
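The principal component analysis of genotypes described above can be sketched as follows. This is a minimal illustration, assuming genotypes are coded as allele counts (0/1/2) and that NumPy is available; the function name `genotype_pca` and the two simulated populations are hypothetical and are not the CoLaus or POPRES data.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: array of shape (n_individuals, n_snps), allele counts 0/1/2.
    Returns (scores, loadings, explained_fraction).
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center every SNP column
    # Rows of vt are the SNP-vectors along which the data vary most.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]

# Toy data (assumption): two populations whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(50, 200))
pop_b = rng.binomial(2, freq_b, size=(50, 200))

scores, loadings, explained = genotype_pca(np.vstack([pop_a, pop_b]))
separation = abs(scores[:50, 0].mean() - scores[50:, 0].mean())
```

In this toy setting the two groups end up on opposite sides of PC1. In an association study, such component scores are commonly included as covariates to account for population stratification.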
Phenotypic variation: What is association?
[Figure: SNPs and a trait variant along a chromosome]
Genetic variation yields phenotypic variation
[Figure: distributions of the “trait” (phenotype) in the populations carrying each of the two alleles]
Association using regression
[Figure: phenotype plotted against coded genotype]
Regression formalism
• phenotype (response variable) of individual i, after a (monotonic) transformation
• coded genotype (feature) of individual i
• effect size (regression coefficient) β
• error (residual)
• p(β=0): p-value under the null hypothesis of no effect
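The equation on this slide did not survive extraction. Based on the annotations alone, a plausible form is the simple linear model below; the symbols f, y_i, x_i, ε_i and the intercept μ are assumptions introduced here for illustration, and only β appears in the original text.

```latex
f(y_i) = \mu + \beta\, x_i + \varepsilon_i
```

Here f is the (monotonic) transformation, y_i the phenotype of individual i, x_i the coded genotype, β the effect size and ε_i the residual.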
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
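The goal stated above can be illustrated with an ordinary least-squares fit for a single SNP. This is a sketch under assumptions: genotypes are coded 0/1/2, an intercept is included, and the p-value uses a normal approximation to the t-statistic, which is only reasonable for large samples. The function name `snp_association` and the simulated data are hypothetical.

```python
import math
import random

def snp_association(genotypes, phenotypes):
    """Fit phenotype = mu + beta * genotype + error by least squares.

    Returns (beta, standard_error, p_value), where p_value is the
    two-sided probability under the null hypothesis beta = 0
    (normal approximation, adequate for large n).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    mu = mean_y - beta * mean_x
    rss = sum((y - mu - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value

# Toy data (assumption): true effect size 0.5, unit-variance noise.
rng = random.Random(0)
genotypes = [rng.choice([0, 1, 2]) for _ in range(3000)]
phenotypes = [0.5 * g + rng.gauss(0.0, 1.0) for g in genotypes]
beta, se, p_value = snp_association(genotypes, phenotypes)
```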
Whole Genome Association
Current microarrays probe ~1M SNPs!
Standard approach:
Evaluate significance for association of each SNP independently
[Figure: Manhattan plot, significance against chromosome & position]
[Figure: Quantile-quantile plot, observed against expected significance]
GWA screens include a large number of statistical tests!
• Huge burden of correcting for multiple testing!
• Can detect only highly significant associations (p < α / #(tests) ~ 10^-7)
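The threshold p < α / #(tests) is the Bonferroni correction. The sketch below shows the arithmetic and how the expected significance values of a quantile-quantile plot can be obtained, assuming α = 0.05 and strictly positive p-values that are uniformly distributed under the null hypothesis. All function names are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

def significant_snps(p_values, alpha=0.05):
    """Indices of SNPs whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]

def qq_points(p_values):
    """Pairs (expected, observed) of -log10(p) for a quantile-quantile plot.

    Under the null hypothesis the k-th smallest of n p-values is
    expected at k / (n + 1). All p-values must be > 0.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10(k / (n + 1)) for k in range(1, n + 1))
    return list(zip(expected, observed))

threshold_500k = bonferroni_threshold(0.05, 500_000)   # about 1e-7
threshold_1m = bonferroni_threshold(0.05, 1_000_000)   # about 5e-8
hits = significant_snps([1e-9, 0.2, 0.03])
```

With 500.000 SNPs the threshold is about 10^-7, which matches the order of magnitude quoted above.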
Genome-wide meta-analysis for serum calcium
identifies significantly associated SNPs near the
calcium-sensing receptor (CASR) gene
Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka,
Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F.
Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth,
Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena
Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik,
James Scott, Unnur Thorsteinsdottir, Stefania Bandinelli, John Chambers, Kari
Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter
Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann
Current insights from GWAS:
• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations
• Many of these associations have been replicated in independent studies
• Each locus explains but a tiny (<1%) fraction of the phenotypic variance
• All significant loci together explain only a small fraction (<10%) of the variance
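To see why a single locus typically explains so little, the fraction of phenotypic variance attributable to one SNP can be estimated from its effect size and allele frequency. The sketch below assumes an additive model and Hardy-Weinberg proportions, so that the variance of the 0/1/2 genotype is 2p(1-p); the function name and the example numbers are hypothetical and are not results from the lecture.

```python
def variance_explained(beta, allele_freq, phenotype_variance):
    """Fraction of phenotypic variance explained by one additive SNP.

    beta: effect size per allele copy (in phenotype units)
    allele_freq: frequency p of the counted allele
    phenotype_variance: total variance of the phenotype
    """
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return beta ** 2 * genotype_variance / phenotype_variance

# Hypothetical SNP: 0.1 standard deviations per allele, allele frequency 0.3.
example = variance_explained(beta=0.1, allele_freq=0.3, phenotype_variance=1.0)
```

For these illustrative values the result is 0.0042, i.e. well below 1% of the variance.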
The “Missing variance”
(Non-)Problem
Why should a simplistic (additive) model
using incomplete or approximate features
possibly explain anything close to the
genetic variance of a complex trait?
… and it doesn’t have to, as long as Genome-wide Association Studies are meant as an undirected approach to elucidate new candidate loci that impact the trait!
How could our models become
more predictive?
1. Improve measurements:
- measure more variants (e.g. by UHS)
- measure other variants (e.g. CNVs)
- measure “molecular phenotypes”
2. Improve models:
- proper integration of uncertainties
- include interactions
- multi-layer models
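As one possible reading of "include interactions", the additive model can be extended with a product term between two SNPs. This is a minimal sketch, assuming NumPy and 0/1/2 genotype coding; the function name, coefficient names and toy data are hypothetical, and it is not presented as the approach used in the lecture.

```python
import numpy as np

def fit_interaction_model(x1, x2, y):
    """Least-squares fit of y = mu + b1*x1 + b2*x2 + b12*x1*x2 + error."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Design matrix: intercept, two main effects, one interaction term.
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["mu", "b1", "b2", "b12"], coef))

# Toy, noise-free data (assumption) covering all genotype combinations.
x1 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
x2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y = 1.0 + 0.2 * x1 - 0.1 * x2 + 0.5 * x1 * x2
fit = fit_interaction_model(x1, x2, y)
```

Testing all SNP pairs in this way multiplies the number of tests, which makes the multiple-testing burden considerably heavier.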
Towards a layered Systems Model
We need intermediate (molecular) phenotypes
to better understand organismal phenotypes
Network Approaches
for Integrative Association Analysis
Using knowledge on physical gene-interactions or pathways to
prioritize the search for functional interactions
Transcription Modules reduce Complexity
http://maya.unil.ch:7575/ExpressionView
SB, J Ihmels & N Barkai Physical Review E (2003)
Association of (average) module expression is
often stronger than for any of its constituent
genes
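A small simulation can illustrate why the average expression of a module may associate more strongly than its individual genes: averaging reduces gene-specific noise while the shared signal remains. The sketch assumes NumPy and a purely hypothetical module in which every gene carries the same genotype-dependent signal plus independent noise. It does not implement the module-identification method cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_genes = 500, 20

# Hypothetical setting: one SNP shifts the expression of all module genes.
genotype = rng.binomial(2, 0.5, size=n_individuals).astype(float)
shared_signal = 0.5 * genotype
noise = rng.normal(0.0, 1.0, size=(n_individuals, n_genes))
expression = shared_signal[:, None] + noise

def correlation(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Association of the SNP with each gene and with the module average.
gene_r = [correlation(genotype, expression[:, k]) for k in range(n_genes)]
module_r = correlation(genotype, expression.mean(axis=1))
```

In this setup `module_r` exceeds every entry of `gene_r`, because the independent noise terms partly cancel in the average.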
Take-home Messages:
• Analysis of genome-wide SNP data reveals
that population structure mirrors geography
• Genome-wide association studies elucidate
candidate loci for a multitude of traits, but
have little predictive power so far
• Future improvement will require
– better genotyping (CGH, UHS, …)
– new analysis approaches (interactions, networks, data integration)