7/10/2014
Genome‐Wide Association Study (GWAS)
Viji R. Avali
Outline
• Basic genetic concepts behind GWAS
• Genotyping technologies and common study designs
• Statistical concepts for GWAS analysis
• Replication, interpretation and follow‐up of association results
Goal of GWAS
• To use genetic risk factors to predict who is at risk
• To identify the biological underpinnings of disease susceptibility for developing new prevention and treatment strategies
Application in pharmacology
• Identifying DNA sequence variations associated with drug metabolism and efficacy, as well as adverse effects
• Example: warfarin, where genotype helps determine the appropriate dose
• Personalized medicine
Concepts underlying the study design
• Unit of genetic variation: the SNP (single nucleotide polymorphism)
– Single base pair changes in the DNA sequence that occur with high frequency in the human genome
• SNP (common) vs. Mutation (rare)
– Cystic fibrosis: mutations in the CFTR gene
• Linkage analysis: genotyping families affected by cystic fibrosis using a collection of genetic markers across the genome, and examining how these markers segregate with the disease across multiple families
Common Disease Common Variant Hypothesis
• Common disorders are likely influenced by genetic variation that is also common in the population
• If common genetic variants influence disease, the effect size (or penetrance) for any one variant must be small relative to that found for rare disorders.
• If common alleles have small genetic effects (low penetrance), but common disorders show heritability (inheritance in families), then multiple common alleles must influence disease susceptibility.
Figure 1. Spectrum of Disease Allele Effects.
Bush WS, Moore JH (2012) Chapter 11: Genome‐Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822
Capturing Common Variation
• 1. The location and density of commonly occurring SNPs are needed to identify the genomic regions and individual sites that must be examined by genetic studies
• 2. Population-specific differences in genetic variation must be cataloged so that studies of phenotypes in different populations can be conducted with the proper design
• 3. Correlations among common genetic variants must be determined so that genetic studies do not collect redundant information
International HapMap Project
• Used a variety of sequencing techniques to discover and catalog SNPs in populations of European descent, the Yoruba population of African origin, Han Chinese individuals from Beijing, and Japanese individuals from Tokyo
• Has since been expanded to include 11 human populations
Linkage Disequilibrium
• A property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of a SNP is inherited or correlated with an allele of another SNP within a population
• Helps to mathematically describe changes in genetic variations in a population over time
• Linkage between markers on a population scale
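The degree of correlation between alleles at two SNPs can be put into numbers. The sketch below computes three commonly used LD measures (D, |D'| and r²) from haplotype and allele frequencies. The function name `ld_stats` and its parameter names are illustrative assumptions, not something taken from the slides, and the frequencies in the usage lines are made up.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    (Hypothetical helper for illustration only.)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")

    # D: how far the observed haplotype frequency departs from independence
    d = p_ab - p_a * p_b

    # Largest |D| that is possible given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the alleles at the two SNPs
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Made-up frequencies: allele A and allele B always travel together
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)
```

An r² near 1 means one SNP is an almost perfect proxy for the other; values near 0 mean the two SNPs carry largely independent information.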
Linkage Disequilibrium
• Chromosomes are mosaics
• Extent and conservation of mosaic pieces depends on
– Recombination rate
– Mutation rate
– Population size
– Natural selection
http://www.sph.umich.edu/csg/abecasis/class/666.03.pdf
Figure 2. Linkage and Linkage Disequilibrium.
Bush WS, Moore JH (2012) Chapter 11: Genome‐Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822
Direct vs. Indirect Association
• LD creates two possible positive outcomes from a genetic association study
– Direct association: the SNP influencing a biological system that leads to the phenotype is directly genotyped in the study
– Indirect association: the influential SNP is not directly typed, but a tag SNP in high LD with the influential SNP is typed
• Therefore, a significant SNP association from a GWAS should not be assumed to be the causal variant
Genotyping Technologies
• Chip‐based microarray technology
– Affymetrix
• Prints short DNA sequences as a spot on the chip that recognizes a specific SNP allele
– Illumina
• Uses a bead‐based technology with slightly longer DNA sequences to detect alleles
• More expensive, but provides better specificity
• Choice of SNPs matters
– Ex: Chip with more SNPs with better overall genomic coverage needed for a study of African genomes
Study Design
• Case control vs. quantitative design
• Two primary classes of phenotypes: categorical or quantitative
• From the statistical perspective, quantitative traits are preferred, but not required for a successful study
Association Test
• 1. single‐locus analysis
• When a well‐defined phenotype has been selected for a study population and genotypes are collected using sound techniques, the statistical analysis can begin
• Quantitative traits: ANOVA (analysis of variance); the null hypothesis is that there is no difference between the trait means of any genotype group
• Dichotomous case/control traits: logistic regression; the null hypothesis is that there is no association between the phenotype and genotype
• http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
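The two single-locus tests above can be sketched for one SNP in plain Python. This is a minimal illustration under stated assumptions, not a GWAS pipeline: `anova_f` and `logistic_fit` are hypothetical helper names, genotypes are assumed to be coded as the number of copies of one allele (0, 1 or 2), the data are invented, and p-values and covariates are left out.

```python
import math


def anova_f(groups):
    """One-way ANOVA F statistic.

    groups: dict mapping genotype label -> list of quantitative trait values.
    """
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Variation of the genotype-group means around the grand mean
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Variation of individuals around their own genotype-group mean
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def logistic_fit(x, y, iterations=25):
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    x: genotype codes, y: 1 for case, 0 for control.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented data: trait values by genotype, and case/control status by allele count
f_stat = anova_f({"AA": [1, 2, 3], "AG": [2, 3, 4], "GG": [3, 4, 5]})
b0, b1 = logistic_fit(x=[0, 0, 1, 1, 1, 2, 2, 2], y=[0, 0, 0, 1, 0, 1, 1, 0])
odds_ratio_per_allele = math.exp(b1)
```

Under the null hypothesis the F statistic stays near 1 and `b1` stays near 0. In a real GWAS the same test is repeated for every SNP on the chip.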
Statistical replication
• Replication studies should be conducted in an independent dataset drawn from the same population as the GWAS
• Once an effect is confirmed in the target population, other populations may be sampled to determine if the SNP has an ethnic‐specific effect
• Identical phenotype criteria should be used in both GWAS and replication studies
• A similar effect should be seen in the replication set from the same SNP, or a SNP in high LD with the GWAS‐identified SNP
Meta‐analysis of multiple analysis results
• Meta-analysis was developed to examine and refine significance and effect size estimates from multiple studies examining the same hypothesis in the published literature
• However, it is rare to find multiple studies that match perfectly on all criteria
• Study heterogeneity is often statistically quantified in a meta-analysis to determine the degree to which studies differ.
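One common way to combine per-study results for a single SNP is an inverse-variance weighted fixed-effect estimate, with Cochran's Q and I² as heterogeneity measures. The slides do not say which method was intended, so treat the sketch below as one possible choice; the function name `fixed_effect_meta` and the example numbers are assumptions.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance fixed-effect meta-analysis for one SNP.

    effects    : per-study effect estimates for the same allele (e.g. log odds ratios)
    std_errors : their standard errors
    Returns (pooled_effect, pooled_se, cochran_q, i_squared).
    """
    # More precise studies (smaller standard error) get more weight
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)

    # Cochran's Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Invented results from three studies of the same SNP
pooled, se, q, i2 = fixed_effect_meta([0.10, 0.15, 0.12], [0.05, 0.04, 0.06])
```

An I² close to 0 suggests the studies are estimating the same effect; a large I² suggests they differ by more than sampling error would explain.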
Data Imputation
• To conduct a meta-analysis properly, the effect of the same allele across multiple distinct studies must be assessed.
• This can prove difficult if different studies use different genotyping platforms (which use different SNP marker sets).
• As this is often the case, GWAS datasets can be imputed to generate results for a common set of SNPs across all studies.
• Genotype imputation exploits known LD patterns and haplotype frequencies from the HapMap or 1000 Genomes project to estimate genotypes for SNPs not directly genotyped in the study [50].
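The idea behind imputation can be shown with a deliberately simplified toy: find the reference haplotypes that agree with a study haplotype at the typed SNPs, then take the most frequent allele at the untyped SNP among them. Real imputation software works on unphased genotypes with probabilistic haplotype models, so this is only an illustration of how LD lets typed SNPs predict untyped ones. The function `impute_allele` and the tiny reference panel are invented for the example.

```python
from collections import Counter


def impute_allele(reference_haplotypes, typed, target):
    """Guess the allele at an untyped SNP from a reference panel.

    reference_haplotypes : list of strings, one character per SNP position
    typed  : dict mapping position index -> allele observed in the study sample
    target : position index of the SNP that was not genotyped
    Returns (best_allele, fraction_of_matching_haplotypes_carrying_it),
    or (None, 0.0) when no reference haplotype matches the typed alleles.
    """
    # Keep only reference haplotypes consistent with the typed alleles
    matches = [h for h in reference_haplotypes
               if all(h[i] == allele for i, allele in typed.items())]
    if not matches:
        return None, 0.0
    # The most common allele at the target position among the matches
    counts = Counter(h[target] for h in matches)
    allele, n = counts.most_common(1)[0]
    return allele, n / len(matches)


# Invented reference panel of four haplotypes over three SNPs
panel = ["ACT", "ACT", "ACG", "GTG"]
allele, support = impute_allele(panel, typed={0: "A", 1: "C"}, target=2)
```

The returned fraction plays the role of a crude confidence score: the stronger the LD between the typed and untyped SNPs, the closer it gets to 1.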