Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
7/10/2014 Genome‐Wide Association Study (GWAS) Viji R. Avali Outline • Basic genetic concepts behind GWAS • Genotyping technologies and common study designs • Statistical concepts for GWAS analysis • Replication, interpretation and follow‐up of association results 1 7/10/2014 Goal of GWAS • To use genetic risk factors to predict who is at risk • Identify the biological underpinnings of disease susceptibility for developing new prevention and treatment strategies Application in pharmacology • Identifying DNA sequence variations associated w/ drug metabolism and efficacy as well as adverse effects • Example, warfarin‐‐‐determining the appropriate dose • Personalized medicine 2 7/10/2014 Concepts underlying the study design • Unit of genetic variation – SNP‐‐‐single nucleotide polymorphism – Single base pair changes in the DNA sequence that occur with high frequency in the human genome • SNP (common) vs. Mutation (rare) – Cystic fibrosis‐‐‐mutations in the CFTR gene • Linkage analysis‐‐‐genotyping families affected by cystic fibrosis using a collection of genetic markers across the genome and examining how these genetic markers segregate w/ the disease across multiple families Common Disease Common Variant Hypothesis • Common disorders are likely influenced by genetic variation that is also common in the population • If common genetic variants influence disease, the effect size (or penetrance) for any one variant must be small relative to that found for rare disorders. • If common alleles have small genetic effects (low penetrance), but common disorders show heritability (inheritance in families), then multiple common alleles must influence disease susceptibility. 3 7/10/2014 Figure 1. Spectrum of Disease Allele Effects. Bush WS, Moore JH (2012) Chapter 11: Genome‐Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822 Capturing Common Variation • 1. location and density of commonly occurring SNPs is needed to identify the genomic regions and individual sites that must be examined by genetic studies • 2. population‐specific differences in genetic variation must be cataloged so that studies of phenotypes in different populations can be conducted with the proper design • 3. correlations among common genetic variants must be determined so that genetic studies do not collect redundant information 4 7/10/2014 International HapMap Project • Used a variety of sequencing techniques to discover and catalog SNPs in European descent populations, the Yoruba populations of African origin, Han Chinese individuals from Beijing, and Japanese individuals from Tokyo • Has since been expanded to include 11 human populations Linkage Disequilibrium • A property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of a SNP is inherited or correlated with an allele of another SNP within a population • Helps to mathematically describe changes in genetic variations in a population over time • Linkage between markers on a population scale 5 7/10/2014 Linkage Disequilibrium • Chromosomes are mosaics • Extent and conservation of mosaic pieces depends on – Recombination rate – Mutation rate – Population size – Natural selection http://www.sph.umich.edu/csg/abecasis/class/666. 03.pdf Figure 2. Linkage and Linkage Disequilibrium. Bush WS, Moore JH (2012) Chapter 11: Genome‐Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822 6 7/10/2014 Direct vs. Indirect Association • LD creates two possible positive outcomes from a genetic association study – direct association‐‐‐‐the SNP influencing a biological system that leads to the phenotype is directly genotyped in the study – Indirect association‐‐‐‐the influential SNP is not directly typed, but instead a tag SNP in high LD with the influential SNP is typed Therefore, a significant SNP association from a GWAS should not be assumed as the causal variant Genotyping Technologies • Chip‐based microarray technology – Affymetrix • Prints short DNA sequences as a spot on the chip that recognizes a specific SNP allele – Illumina • Uses a bead‐based technology with slightly longer DNA sequences to detect alleles • More expensive, but provide better specificity • Choice of SNPs matters – Ex: Chip with more SNPs with better overall genomic coverage needed for a study of African genomes 7 7/10/2014 Study Design • Case control vs. quantitative design • Two primary classes of phenotypes: categorical or quantitative • From the statistical perspective, quantitative traits are preferred, but not required for a successful study Association Test • 1. single‐locus analysis • When a well‐defined phenotype has been selected for a study population and genotypes are collected using sound techniques, the statistical analysis can begin • Quantitative traits‐‐‐‐ANOVA (analysis of variance)‐‐‐ null hypothesis is that there is no difference between the trait means of any genotype group • Dichotomous case/ control traits are analyzed using logistic regression‐‐‐null hypothesis‐‐‐there is no association between the phenotype and genotype • http://luna.cas.usf.edu/~mbrannic/files/regression/Log istic.html 8 7/10/2014 Statistical replication • Replication studies should be conducted in an independent dataset drawn from the same population as GWAS • Once an effect is confirmed in the target population, other populations may be sampled to determine if the SNP has an ethnic‐specific effect • Identical phenotype criteria should be used in both GWAS and replication studies • A similar effect should be seen in the replication set from the same SNP, or a SNP in high LD with the GWAS‐identified SNP Meta‐analysis of multiple analysis results • Meta‐analysis developed to examine and refine significance and effect size estimates from multiple studies examining the same hypothesis in the published literature • However, it is rare to find multiple studies that match perfectly on all criteria • Study heterogeneity is often statistically quantified in a meta‐analysis to determine the degree to which studies differ. 9 7/10/2014 Data Imputation • To conduct a meta‐analysis properly, the effect of the same allele across multiple distinct studies must be assessed. • This can prove difficult if different studies use different genotyping platforms (which use different SNP marker sets). • As this is often the case, GWAS datasets can be imputed to generate results for a common set of SNPs across all studies. • Genotype imputation exploits known LD patterns and haplotype frequencies from the HapMap or 1000 Genomes project to estimate genotypes for SNPs not directly genotyped in the study [50]. 10