* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download SNP Discovery and Genotyping Workshop (PowerPoint)
Genetic testing wikipedia , lookup
Genetic drift wikipedia , lookup
Copy-number variation wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Heritability of IQ wikipedia , lookup
Human genome wikipedia , lookup
Genome (book) wikipedia , lookup
Gene expression profiling wikipedia , lookup
Metagenomics wikipedia , lookup
Designer baby wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Gene desert wikipedia , lookup
Pathogenomics wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Gene expression programming wikipedia , lookup
Behavioural genetics wikipedia , lookup
Population genetics wikipedia , lookup
Hardy–Weinberg principle wikipedia , lookup
Microevolution wikipedia , lookup
Public health genomics wikipedia , lookup
Genealogical DNA test wikipedia , lookup
Human genetic variation wikipedia , lookup
Pharmacogenomics wikipedia , lookup
HLA A1-B8-DR3-DQ2 wikipedia , lookup
Molecular Inversion Probe wikipedia , lookup
A30-Cw5-B18-DR3-DQ2 (HLA Haplotype) wikipedia , lookup
Haplogroup G-M201 wikipedia , lookup
SNP Discovery and Analysis: Application to Association Studies Mark J. Rieder, PhD Dana Crawford, PhD Deborah Nickerson, PhD SeattleSNPs PGA July 19-20, 2005 Practical Aspects of SNP Association Studies 1. SNP Discovery: Where do I find SNPs to use in my association studies? (e.g. databases, direct resequencing) 2. SNP Selection: How do I choose SNPs that are informative? (i.e. assessing SNP correlation - linkage disequilibrium) 3. SNP Associations: What analyses can I perform after genotyping these SNPs? (e.g. single SNP data, haplotype data) 4. SNP Replication/Function: How is function predicted or assessed. (e.g. nonsynonymous SNPs, conserved non-coding regions (CNS) transcription factor binding sites, gene expression) SeattleSNPs Program for Genomic Applications: Overview Aim 1: To establish a variation discovery resource capable of comprehensive resequencing of candidate genes related to HLBS. Biological Focus: Inflammation Genes and Pathways: Coagulation, Complement, Cytokines Interacting Partners SNPs in Candidate Genes SeattleSNPs Average Gene Size - 26.5 kb ~ Compare 2 haploid - 1 in 1,200 bp ~130 SNPs (200 bp) - 15,000,000 SNPs ~ 44 SNPs > 0.05 MAF (600 bp) - 6,000,000 SNPs SeattleSNPs PGA: Candidate Gene SNP Resource • • • • • 4.9 Mb in 47 individuals = 230 Mb total sequence Define sequence diversity - catalogue all SNPs Select “optimal” tagSNPs sets Determine haplotype structure Provide necessary baseline data for association studies Warfarin Pharmacogenetics 1. Background • Warfarin characteristics • Pharmacokinetics/Pharmacodynamics • Discovery of VKORC1 2. VKORC1 - SNP Discovery 3. VKORC1 - SNP Selection (tagSNPs) 4. VKORC1 - SNP Testing • SNP/Haplotype Inference • Haplotype Inference, Testing 5. VKORC1 - SNP Replication/Function Pharmacogenomics as a Model for Association Studies Clear genotype-phenotype link intervention variable response Pharmacokinetics - 5x variation Quantitative intervention and response drug dose, response time, metabolism rate, etc. Target/metabolism of drug generally known gene target that can be tested directly with response Reduce variability and identify outliers. Prospective testing Personalized Medicine Warfarin Background • Commonly prescribed oral anti-coagulant • In 2003, 21.2 million prescriptions were written for warfarin (Coumadin) • Prescribed following MI, atrial fibrillation, stroke, venous thrombosis, prosthetic heart valve replacement, and following major surgery • Difficult to determine effective dosage - Narrow therapeutic range - Monitoring of prothrombin time (INR) - 2.0 - 3.0 - Large inter-individual variation 50 Ave: 5.2 mg/d n = 186 European-American No. of patients 40 30x dose variability 30 20 Add warfarin dose distribution 10 0 0 2 4 6 8 10 12 14 16 Warfarin Dose (mg/d) Patient/Clinical/Environmental Factors Pharmacokinetic/Pharmacodynamic - Genetic Warfarin inhibits the vitamin K cycle Warfarin Epoxide Reductase CYP2C9 Inactivation Pharmacokinetic -Carboxylase (GGCX) Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z) Warfarin Metabolism (Pharmacokinetics) Major pathway for termination of pharmacologic effect is through metabolism of S-warfarin in the liver by CYP2C9 • CYP2C9 SNPs alter warfarin metabolism: CYP2C9*1 (WT) - normal CYP2C9*2 (Arg144Cys) - low/intermediate CYP2C9*3 (Ile359Leu) - low • CYP2C9 alleles occur at a significant minor allele frequency European: *2 - 10.7% *3 - 8.5 % Asian: *2 - 0% *3 - 1-2% African-American: *2 - 2.9% *3 - 0.8% Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes (Higashi et al., JAMA 2002) WARFARIN MAINTENANCE DOSE TIME TO STABLE ANTICOAGULATION CYP2C9-WT ~90 days CYP2C9-Variant ~180 days *2 or *3 carriers take longer to reach stable anticoagulation N 127 28 4 18 3 5 - Variant alleles have significant clinical impact - Still large variability in warfarin dose (15-fold) in *1/*1 “controls”? Analysis of Independent Predictors of Warfarin Dose Adapted from Gage et al., Thromb Haemost, 2004 Variable Change in Warfarin Dose P value Target INR, per 0.5 increase BMI, per SD Ethnicity (African-American, [Asian]) 21% 14% 13%, [ 10-15%] <0.0005 <0.0001 0.003 Age, per decade Gender, Female Drugs (Amiodarone) CYP2C9*2, per allele CYP2C9*3, per allele 13% 12% 24% 19% 30% <0.0001 <0.0001 0.007 <0.0001 <0.0001 ~ 30% of the variability in warfarin dose is explained by these factors What other candidate genes are influencing warfarin dosing? Warfarin acts as a vitamin K antagonist Pharmacodynamic Epoxide Reductase Warfarin CYP2C9 Inactivation -Carboxylase (GGCX) Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z) New Target Protein for Warfarin Epoxide Reductase (VKORC1) -Carboxylase (GGCX) Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z) 5 kb - chr 16 Rost et al. & Li, et al., Nature (2004) Warfarin Resistance VKORC1 Polymorphisms Rost, et. al. Nature (2004) • Rare non-synonymous mutations in VKORC1 causative for warfarin resistance (15-35 mg/d) • NO non-synonymous mutations found in ‘control’ chromosomes (n = ~400) Frequency Inter-Individual Variability in Warfarin Dose: Genetic Liabilities SENSITIVITY CYP2C9 coding SNPs - *3/*3 Common VKORC1 non-coding SNPs? RESISTANCE VKORC1 nonsynonymous coding SNPs 0.5 5 15 Warfarin maintenance dose (mg/day) SNP Discovery: Resequencing VKORC1 • PCR amplicons --> Resequencing of the complete genomic region • 5 Kb upstream and each of the 3 exons and intronic segments; ~11 Kb • SeattleSNPs PGA - pga.gs.washington.edu (24 African-Am./23 Europeans) • Warfarin treated clinical patients (UWMC): 186 European • Other populations: 96 European, 96 African-Am., 120 Asian SNP Discovery: Resequencing Results Summary of PGA samples (European, n = 23) Total: 13 SNPs identified 10 common/3 rare (<5% MAF) Clinical Samples (European patients n = 186) Total: 28 SNPs identified 10 common/18 rare (<5% MAF) 15 - intronic/regulatory 7 - promoter SNPs 2 - 3’ UTR SNPs 3 - synonymous SNPs 1 - nonsynonymous - single heterozygous indiv. - highest warfarin dose = 15.5 mg/d How does the comprehensive SNP discovery compare to what was known for this gene? SNP Discovery: dbSNP database dbSNP -NCBI SNP database SNP Discovery: dbSNP database (VKORC1) SeattleSNPs Resequencing 28 SNPs --> 15 SNPs gene region 10 dbSNPs • 8/10 confirmations • 3 frequency/genotype data • 7 new dbSNP entries generated by SeattleSNPs resequencing • 8 dbSNPs/15 SNPs (~50%) SNP Discovery: dbSNP database Nickerson and Kruglyak, Nature Genetics, 2001 Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs (validated)! SNP discovery is dependent on your sample population size Fraction of SNPs Discovered 2 chromosomes { GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC 1.0 96 48 24 16 8 0.5 2 0.0 0.0 0.1 0.2 0.3 Minor Allele Frequency (MAF) 0.4 0.5 SNP Discovery: dbSNP database dbSNP (Perlegen/HapMap) SeattleSNPs 25% Minor Allele Freq. (MAF) { 50% 75% Minor Allele Freq. (MAF) Rarer and population specific SNPs are found by resequencing dbSNP: Increasing numbers of SNPs now have genotype data 6.00 Validated SNPs SNPs with Genotypes SNPs(millions) 5.00 Perlegen Data 4.00 3.00 2.00 1.00 0.00 Jan-03 Mar-03 Jun-03 Aug-03 Oct-03 Jan-04 Mar-04 dbSNP Release Jun-04 Sep-04 Jan-05 Mar-05 HapMap Phase II Perlegen Current State of dbSNP 12.00 Total Reference SNPs Validated SNPs Genotyped SNPs SNPs(millions) 10.00 8.00 6.00 4.00 2.00 0.00 Jan-03 Mar-03 Jun-03 Aug-03 Oct-03 Jan-04 Mar-04 Jun-04 Sep-04 Jan-05 Mar-05 dbSNP Release Many SNPs left to validate and characterize. Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs validated! Coming Soon! 5.0 million validated SNPs with genotypes! SNP Discovery: dbSNP database dbSNP Issues: Not comprehensive catalog (50% of SNPs) Is the data confirmed? (50% are validated) Information about allele frequency/population (50%) No information about SNP correlations (linkage disequilibrium) genotyping efficiency SNP Selection: Using Linkage Disequilibrium Frequency • Common SNPs • VKORC1 - 28 total - 10 SNPs > 10% MAF Warfarin Dose (mg/d) Does common variation in VKORC1 have a role in determining warfarin dose? • Evaluate linkage disequilibrium (non-random association of genotype data) SNP Selection: Using Linkage Disequilibrium Site 1 Site 2 C : 50% A : 50% Maternal T : 50% G : 50% Paternal Possible 2-site comb. C C T T A G A G Expected Freq. 0.5 X 0.5 = 0.25 0.5 X 0.5 = 0.25 0.5 X 0.5 = 0.25 0.5 X 0.5 = 0.25 Site 1 C T Observed Freq. 0.50 * 0.01 0.01 0.48 * * Sites Correlated Site 2 A G SNP Selection: Using Linkage Disequilibrium • SNP discovery data (i.e. population of samples with genotypes) • Find all correlated SNPs to minimize the total number of SNPs • Maintains genetic information (correlations) for that locus LD_Select - SNP tagging/binning algorithm - based on LD (r2), not haplotypes Carlson, et al. AJHG (2004) SNP Selection: VG/LD_Select on the Web pga.gs.washington.ed/VG2 SNP Selection: tagSNP Data SNP Selection: VKORC1 tagSNPs SNP Testing: VKORC1 tagSNPs Five Bins to Test 1. 2. 3. 4. 5. 381, 3673, 6484, 6853, 7566 2653, 6009 861 5808 e.g. Bin 1 - SNP 381 9041 Bin 1 - p < 0.001 Bin 2 - p < 0.02 Bin 3 - p < 0.01 Bin 4 - p < 0.001 Bin 5 - p < 0.001 C/C C/T T/T SNP x SNP interactions - haplotype analysis? VKORC1 Summary: SNP Discovery/SNP Selection 1. VKORC1 candidate gene for warfarin dose response 2. SNP discovery performed using PCR/resequencing to catalog common SNPs • 28 SNPs found • 10 common SNPs 3. SNP discovery using dbSNP • 8/10 dbSNPs confirmed • 7 new SNPs added 4. SNP Selection using linkage disequilibrium • 10 common SNPs (> 10% MAF) • 5 informative SNPs for genotyping Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Pick tagSNPs Infer haplotypes Genotype samples Test for association Haplotypes in Genetic Association Studies 1. How can you get haplotypes? 2. What information do you get from haplotypes? 3. How do you use haplotypes to find tagSNPs? 4. How do you use haplotypes to test for associations? Haplotypes – The Definition “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997 Constructing Haplotypes Collect pedigrees Somatic cell hybrids C/C, A/G C/T, A/A Rodent Human TT GG T/T, G/G C/C, A/G CC AG Hybrid C/T, A/G CT AG Allele-specific PCR SNP 1 SNP 2 C/T A/G Constructing Haplotypes Examples of Haplotype Inference Software: EM Algorithm Haploview http://www.broad.mit.edu/mpg/haploview/index.php Arlequin http://lgb.unige.ch/arlequin/ PHASE v2.1 http://www.stat.washington.edu/stephens/software.html HAPLOTYPER http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm Haplotypes in SeattleSNPs • >200 genes re-sequenced in inflammation response • 2 populations: European- and African-Americans • PHASEv2.0 results posted on website • Interactive tool (VH1) to visualize and sort haplotypes http://pga.gs.washington.edu Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in SeattleSNPs Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Recombination Natural selection Haplotype block definition Population history Population demography Pick tagSNPs Infer haplotypes Test for association Measuring Pair-wise SNP Correlations • SNP correlation described by linkage disequilibrium (LD) • Pair-wise measures of LD: D´ and r2 D = pAB - pApB; D´ = D/Dmax r2 = D2 f(A1)f(A2)f(B1)f(B2) Recombination Power Using LD and Haplotypes to Pick tagSNPs • r2 is inversely related to power 1/r2 1,000 cases 1,000 controls r2=1.0 1,250 cases 1,250 controls Example: LDSelect • D´ is related to recombination history D´ = 1 D´ < 1 no recombination historical recombination Example: Haplotype “blocks” r2 = 0.80 Haplotype “Blocks” Daly etet al 2001 Daly al Nat. Genet. (2001) Strong LD Few Haplotypes Represent most chromosomes Block Definitions Daly etet al 2001 Daly al Nat. Genet. (2001) D´ [Gabriel et al Science (2002)] Block Definitions Four-gamete test: A B a b A B a A b b a B <4 haplotypes, D´=1 block 4 haplotypes, D´<1 boundary Haplotype Blocks and tagSNPs Identifying blocks and tagSNPs: • Manually • Algorithms – Haploview Haplotype Blocks and tagSNPs tagSNPs IL1B: 19 SNPs (MAF >5%) 4 “common” haplotypes Haplotype Blocks and tagSNPs Identifying blocks and tagSNPs: • Manually • Algorithms – HaploView LD and tagSNPs using Haploview VKORC1 European-Americans PHASEv2.1 data Minimal set of tagSNPs based on r2 Where to Find Tagging Software HaploBlockFinder Haploview LDSelect SNPtagger TagIT tagSNPs http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi http://www.broad.mit.edu/personal/jcbarret/haplo/ http://droog.gs.washington.edu/ldSelect.html http://www.well.ox.ac.uk/~xiayi/haplotype/index.html http://popgen.biol.ucl.ac.uk/software.html http://www-rcf.usc.edu/~stram/tagSNPs.html Haplotypes, TagSNPs, and Caveats • Haplotypes are inferred • Block-like structure assumed for some software • Different block definitions • Block boundaries sensitive to marker density • Genotype savings may not be great (recombination) Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Pick tagSNPs Infer haplotypes Genotype samples Test for association Genetic diversity of sample Multi-SNP analysis Multi-SNP testing: Haplotypes Five tagSNPs (10 total SNPs) 186 warfarin patients (European) PHASE v2.1 9 haplotypes/5 common (>5%) Multi-SNP testing: Haplotypes Test for association between haplotype and warfarin dose using multiple linear regression Adjusted for all significant covariates: age, sex, amiodarone, CYP2C9 genotype Multi-SNP testing: Haplotypes Explore the evolutionary relationship across haplotypes (381, 3673, 6484, 6853, 7566) 5808 861 9041 CCGATCTCTG-H1 CCGAGCTCTG-H2 A TCGGTCCGCA-H7 TAGGTCCGCA-H8 TACGTTCGCG-H9 B VKORC1 haplotypes cluster into divergent clades Patients can be assigned a clade diplotype: e.g. Patient 1 - H1/H2 = A/A Patient 2 - H1/H7 = A/B Patient 3 - H7/H9 = B/B VKORC1 clade diplotypes show a strong association with warfarin dose Low High Warfarin Dose (mg/d) 8 † A/A A/B B/B † 6 * * * 4 2 0 AA BB AB All patients AA AB BB 2C9 WT patients AA AB BB 2C9 VAR patients (n = 181) (n = 124) (n = 57) Independent of INR levels across all groups Multi-SNP testing: Haplotypes • European - mean ~ 5 mg/d • African-American - higher ~ 6.0-7.0 mg/d • Asian - lower ~ 3.0-3.5 mg/d Hypothesis: VKORC1 haplotypes contribute to racial variability in warfarin dosing. • “Control” populations: 120 Europeans 96 African-Americans 120 Asian Multi-SNP testing: Haplotypes Explore the evolutionary relationship across populations B (11%) B (58%) A (37%) European (CEPH) Clade Distribution Clade A = Low Clade B = High A (89%) Asian (Han) Clade Distribution Low dose phenotype Other (39%) A (14%) B (47%) African-American Clade Distribution High dose phenotype Common Errors in Association Studies • Small sample size Bell and Cardon (2001) • Subgroup analysis and multiple testing • Random error • Poorly matched control group • Failure to attempt study replication e.g., Second case/control study Gene expression studies • Failure to detect LD with adjacent loci • Overinterpreting results and positive publication bias • Unwarranted ‘candidate gene’ declaration after identifying association in arbitrary genetic region SNP Replication: VKORC1 8 Warfarin Dose (mg/d) Univ. of Washington n = 185 † † 6 * * * 4 2 0 AA AB BB All patients Brian Gage Howard McCleod Charles Eby 21% variance in dose explained 8 Warfarin Dose (mg/d) Washington University n = 386 AA AB BB 2C9 WT patients † † 6 * AA AB BB 2C9 VAR patients * † 4 2 0 AA AB BB All patients AA AB BB 2C9 WT patients AA AB BB 2C9 VAR patients SNP Function: VKORC1 Expression mechanism No nonsynonymous SNPs Several SNPs are present in evolutionarily conserved non-coding regions - mRNA expression in human liver cell lines SNP Function: VKORC1 Expression Expression in human liver tissue (n = 53) shows a graded change in expression. VKORC1 SNP alters liver-specific binding site SNP Discovery and Analysis Application to Association Studies Summary • Databases and resources available for SNP discovery • Software for tagSNP selection available • Both single and multi-SNP analysis are useful • Replication required by several journals SeattleSNPs Genotyping Service • Free genotyping (BeadArray or SNPlex) • Emphasis on young investigators • Research related to heart, lung, blood, or sleep disorders • Moderate to large population samples • Apply at pga.gs.washington.edu • Due: October 15th, 2005 SNP Typing Formats Scale Microtiter Plates - Fluorescence Low eg. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping Size Analysis by Electrophoresis Medium eg. SNPlex - Intermediate Multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput Arrays - Custom or Universal eg. Illumina, ParAllele, Affymetrics - Highly multiplexed - 1,500 SNPs and beyond (500K+) High Taqman Genotyping with fluorescence-based homogenous assays (single-tube assay) = 1 SNP/ tube SNP Typing Formats Scale Microtiter Plates - Fluorescence Low eg. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping Size Analysis by Electrophoresis Medium eg. SNPlex - Intermediate Multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput Arrays - Custom or Universal eg. Illumina, ParAllele, Affymetrics - Highly multiplexed - 1,500 SNPs and beyond (500K+) High Technological Leap - No advance PCR Universal PCR after preparing multiple regions for analysis Several based on primer specific on genomic DNA followed by PCR of the ligated products - different strategies and different readouts. SNPlex, Illumina, Parallele Also, reduced representation - Affymetrix - cut with restriction enzyme, then ligate linkers and amplify from linkers and follow by chip hybridization to read out. Detection 9. Characterize on Capillary Sequencer SNP 1 SNP 2 SNP Typing Formats Scale Microtiter Plates - Fluorescence Low eg. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping Size Analysis by Electrophoresis Medium eg. SNPlex - Intermediate Multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput Arrays - Custom or Universal eg. Illumina, ParAllele, Affymetrics - Highly multiplexed - 1,500 SNPs and beyond (500K+) High Multiplexed Genotyping C - Universal Tag Readouts G A T Locus 2 Specif ic Sequence Locus 1 Specific Sequence Tag1 sequence Tag2 sequence cTag1 sequence cTag2 sequence Subst rat e Bead or Chip Subst rat e Bead or Chip Bead Array Chip Array Tag 1 Tag 2 Tag 3 Tag 4 Illumina Multiplex ~1,000 SNPs ParAllele Not dependent on primary PCR Affymetrics Illumina Platform 96 Multi-array Matrix matches standard microtiter plates ~ 1,500 SNPs typed per matrix for 96 samples Affymetrix’s 100K Chip Optimized for 250-2000bp http://www.affymetrix.com/products/arrays/specific/100k.affx High Throughput Chip Formats Defining the scale of the genotyping project is key to selecting an approach: 1000 individuals 5 to 10 SNPs in a candidate gene - Many approaches (expensive ~ 0.60 per SNP/genotype) $6,000 48 ( to 96) SNPs in a handful of candidate genes (~ 0.25 per SNP/genotype) $12,000 384 0 1,536 SNPs (~0.15 - 0.08 per SNP/genotype) $57,600-122,880 10,000 cSNPs - defined format (~0.05 per SNP/genotype) $500,000 100,000 Genic SNPs - defined format (~0.005 per SNP/genotype $500,000 500,000 SNPs defined format (~0.004 per SNP/ genotype) $2,000,000 Acknowledgements Allan Rettie, Medicinal Chemistry Alex Reiner Dave Veenstra Dave Blough Ken Thummel Washington University Brian Gage Howard McLeod Charles Eby Joyce You - Hong Kong Noel Hastings Maggie Ahearn Josh Smith Chris Baier Peggy Dyer-Robertson