* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Jianfeng Xu, MD, DrPH: GWA - UCLA School of Public Health
Gene desert wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Copy-number variation wikipedia , lookup
Genetic engineering wikipedia , lookup
History of genetic engineering wikipedia , lookup
Minimal genome wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Metagenomics wikipedia , lookup
Behavioural genetics wikipedia , lookup
Microevolution wikipedia , lookup
Designer baby wikipedia , lookup
Heritability of IQ wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Human genome wikipedia , lookup
Genome (book) wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Pathogenomics wikipedia , lookup
Human genetic variation wikipedia , lookup
Human Genome Project wikipedia , lookup
Genomic library wikipedia , lookup
Genome editing wikipedia , lookup
Genome evolution wikipedia , lookup
GWA ─ promising but challenging Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director, Center for Human Genomics Wake Forest University School of Medicine Outline The need for genome-wide association studies The reality of genome-wide association studies Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene The need for GWA Current understanding of disease etiology is limited Therefore, candidate genes or pathways are insufficient Current understanding of functional variants is limited Therefore, the focusing on nonsynonymous changes is not sufficient Results from linkage studies are often inconsistent and broad Therefore, the utility of identified linkage regions is limited GWA studies offer an effective and objective approach Better chance to identify disease associated variants Improve understanding of disease etiology Improve ability to test gene-gene interaction and predict disease risk GWA is promising Many diseases and traits are influenced by genetic factors i.e., they are caused by sequence variants in the genome Over 6 millions SNPs are known in the genome i.e., some SNPs will be directly or indirectly associated with causal variants The cost of SNP Genotyping is reduced i.e., it is affordable to genotype a large number of SNPs in the genome Large numbers of cases and controls are available i.e., there is statistical power to detect variants with modest effect When the above conditions are met… …associated SNPs will have different frequencies between cases and controls GWA is challenging Many diseases and traits are influenced by genetic factors But probably due to multiple modest risk variants They confer a stronger risk when they interact True associated SNPs are not necessary highly significant Too many SNPs are evaluated False positives due to multiple tests Single studies tend to be underpowered False negatives Considerable heterogeneity among studies Phenotypic and genetic heterogeneity False positives due to population stratification Reality of GWA AMD, IBD, T1D, etc. Parkinson’s, nicotine dependence, T2D, etc. Prostate cancer, breast cancer, and other ongoing studies Heart diseases, lung diseases, psychiatric diseases, inflammatory diseases, cancers, and many other studies that are in planning stages Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Genome coverage Two major platforms for GWA Illumina: HumanHap300, HumanHap550, and HumanHap1M Affymetrix: GeneChip 100K, 500K, and 1M Genome-wide coverage The percentage of known SNPs in the genome that are in LD with the genotyped SNPs Calculated based on HapMap Calculated based on ENCODE Genome coverage Genome-wide coverage Genome coverage of common SNPs (MAF ≥ 0.05) Genome coverage of rare SNPs Genome coverage using multi-markers Pe’er, 2006 Genome coverage Genome coverage for common SNPs (MAF ≥ 0.05) Pe’er, 2006 Genome coverage Genome coverage for common SNPs (MAF ≥ 0.05) Genome coverage for common and rare SNPs Pe’er, 2006 Genome coverage Genome coverage of common SNPs (MAF ≥ 0.05) Genome coverage of common and rare SNPs Genome coverage using multi-markers Pe’er, 2006 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Strategies for pre-association analysis Quality control Filter SNPs by genotype call rates Filter SNPs by minor allele frequencies Filter SNPs by testing for Hardy-Weinberg Equilibrium Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Evaluate whether there is an upward bias in association tests Q-Q plot All SNPs Filter by call rate Adjust for stratification Clayton, 2006 Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Correct for stratification by adjusting association statistics at each SNP by a uniform overall inflation factor Is susceptible to over or under adjustment Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Structure (STRUCTURE) Used to assign the samples to discrete subpopulation clusters and then aggregate evidence of association within each cluster Estimate individual proportion of ancestry and treat it as a covariate Computationally intensive when there are a large number of AIMs Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Structure (STRUCTURE) Principal component analysis (EIGENSTRAT) Identify several eigenvectors (ancestries or geographic regions) Adjust genotypes and phenotypes along each eigenvector Compute association statistics using adjusted genotypes and phenotypes No need for AIMs Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Strategies for association analysis Single SNP analysis using pre-specified genetic models 2 x 3 table (2-df) Additive model (1-df), and test for additivity All possible genetic models Strategies for association analysis Single SNP analysis using pre-specified genetic models Haplotype analysis Two-marker and three-marker slide Multi-marker Within haplotype block Between two recombination hot spots Strategies for association analysis Single SNP analysis using pre-specified genetic models Haplotype analysis Gene-gene and gene-environment interactions Interaction with main effect Logistic regression Interaction without main effect: data mining Classification and recursive tree (CART) Multifactor Dimensionality Reduction (MDR) Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Sample size and false positives Estimate sample size Sample size OR MAF Type I error Power Quanto Effective sample size Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests Bonferroni correction Nominal significance level = study-wide significance / number of tests Nominal significance level = 0.05/500,000 = 10-7 Effective number of tests Take LD into account Permutation procedure Permute case-control status Mimic the actual analyses Obtain empirical distribution of maximum test statistic under null hypothesis Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests False discovery rate (FDR) Expected proportion of false discoveries among all discoveries Offers more power than Bonferroni Holds under weak dependence of the tests Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests False discovery rate (FDR) Bayesian approach Taking a priori into account, False-Positive Report Probability (FPRP) Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Confirmation in independent study populations The above approaches may limit the number of false positives Confirmation is needed to dissect true from false positives Replication, examine the results from the 2nd stage only Joint analysis, combining data from 1st stage with 2nd stage Multiple stages Replication vs. joint analysis Skol, 2006 Multiple stages 1st stage # of Risk SNPs # of SNPs tested # of true sig. SNPs (80% power) # of total sig. SNPs (a = 0.01) % of true sig. SNPs 2nd stage 3rd stage 20 16 13 500,000 5,016 63 16 13 10 5,016 63 10 0.38% 21% 100% Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotype, e.g. aggressive or early onset Study aggressive cases 0.20 Controls 0.18 Low grade High grade MAF of rs1447295 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 Iceland Sweden Chicago CGEMS JHU Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history Study cases with family history Antoniou and Easton, 2003 Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history Controls that are disease free Disease free controls 0.20 Controls 0.18 Low grade High grade MAF of rs1447295 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 Iceland Sweden Chicago CGEMS JHU Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history Controls that are disease free Increase their effects by studying a homogeneous population Lower levels of genetic heterogeneity Summary GWA studies are promising but difficult There are many important issues in GWA The impact of these issues can be minimized by a well- designed study