Consideration for Planning a Candidate Gene Association Study With TagSNPs
Shehnaz K. Hussain, PhD, ScM
[email protected]
Epidemiology 243: Molecular Epidemiology

Objectives
  Molecular genetics primer
  Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
  Factors influencing statistical power

Central dogma
  DNA (A, T, C, G) → mRNA → Protein

What are SNPs?
  More than 99% of all nucleotides are the same in all humans
  1% of nucleotides are polymorphic
  SNPs >> insertions-deletions
  Bi-nucleotide, e.g. T (80%) and A (20%)

Where do SNPs occur?
  Exons
  Introns
  Flanking regions

What are haplotypes?
  A haplotype is the pattern of nucleotides on a single chromosome
  Two "copies" of each chromosome

The haplotype inference problem
  [Figure: observed genotypes (TA, TT, CG, GG, TA, AA) with the phase of the underlying haplotypes unknown]

What is linkage disequilibrium?
  Linkage disequilibrium (LD) describes the nonrandom association of nucleotides on the same chromosome in a population
  One nucleotide at one position (locus) predicts the occurrence of another nucleotide at another locus
  [Figure: No LD versus LD]

What are markers?
  Test for association between the disease phenotype and marker loci
  Test for genetic association between the phenotype and the disease susceptibility locus (DSL)
  [Figure labels: candidate gene, marker loci (SNPs), LD, disease susceptibility locus]

What are tagSNPs?
  TagSNPs are a subset of all SNPs in a gene that mark groups of SNPs in LD
  Avoids redundant genotyping
  [Figure labels: marker loci (SNPs), LD, disease susceptibility locus]

The joint effect of tagSNPs in cytokine genes and cigarette smoking in cervical cancer risk

T-cell proliferation
  [Figure labels: activated T-cell, IL-2 gene, IFNγ gene, IL-2, IFNγ, IL-2 receptor, proliferation of TH1-cells]

Background
  Cigarette smoking: ↑ 1.5- to 3-fold cancer risk
  Cigarette smoking: ↓ levels of IL-2 and IFNγ (cervical and circulating)
  ↓ levels of IL-2 and IFNγ:
    HPV persistence in the cervix
    Cervical neoplasia
    Decreased survival from invasive cervical cancer

Model
  Cigarette smoking
  SNPs in IL-2, IL-2R, and IFNG
  HPV-associated squamous cell cervical cancer

Methods
  Study design
    Population-based case-only study
  Subjects
    308 Caucasian squamous cell cervical cancer cases diagnosed 1986-2004
    Residing in 3 western Washington counties
  Data collection
    Structured in-person interviews
    DNA isolated from buffy coats

Objectives
  Molecular genetics primer
  Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
  Factors influencing statistical power

Multi-stage tagSNP design
  1. Select reference panel
  2. Re-sequence panel, identify SNPs (many markers, few subjects)
  3. Choose tagSNPs
  4. Genotype tagSNPs in main study (few markers, many subjects)

1. Select reference panel
  Definition
  A sample of your study population
    Most representative
  Samples from the Coriell Repository
    Ability to integrate your data with other resources
  [Figure legend: candidate gene SNPs; HapMap SNPs]

2. Re-sequence reference panel
  Amplify and sequence DNA
  Phred (Ewing, 1998)
  Phrap (Ewing, 1998)
  PolyPhred (Nickerson, 1997)

Alternatives to re-sequencing
  Program for Genomic Applications (PGA)
    SeattleSNPs: inflammation
    NIEHS SNPs: environmental response
    Innate Immunity
  International HapMap Project
    5 million SNPs in four ethnically distinct populations

3. Choose tagSNPs (LD)

  Option                      LDSelect (Carlson, 2002)   Tagger (de Bakker, 2005)
  r² threshold (0.80)         Yes                        Yes
  SNP exclusions/inclusions   No                         Yes
  SNP design score            No                         Yes

LDSelect output for IL-2
  SeattleSNPs, r² ≥ 0.80, MAF ≥ 0.05, Caucasians

  Bin   Total number of sites   TagSNPs
  1     2                       rs2069763, rs2069772
  2     2                       rs2069776, rs2069778
  3     2                       rs2069777, rs2069779
  4     1                       rs2069762

Genomic context
  Exons (cSNPs)
    SIFT (Ng, 2002)
    PolyPhen (Ramensky, 2002)
  Upstream flanking region
  Intron-exon junctions

Sequence conservation
  UCSC Genome Browser, PhastCons score (Siepel, 2005)
  Repeat region
  Unique region

Objectives
  Molecular genetics primer
  Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
  Factors influencing statistical power

Minor allele frequency and genetic model
  300 cases, 300 controls, alpha = 0.05
  [Figure: power (0.0 to 1.0) versus effect size (1.0 to 2.5) under log-additive, dominant, and recessive genetic models, with one curve per minor allele frequency (0.10, 0.20, 0.30, 0.40, 0.50)]

LD

  SNPs genotyped   SNPs not genotyped   r²     Sample size requirement
                                               S1     S2
  S1 and S2        -                    -      600    600
  S1               S2                   1.00   600    600
  S1               S2                   0.85   600    706

  Sample size needed to detect an ungenotyped SNP through a tagSNP: N/r² (Pritchard, 2001)

Genotype error
  Generally non-differential
  Reduces your power
  Every 1% increase in the genotyping error rate requires a 2-8% increase in sample size (Zou et al, 2004, Genetic Epidemiology)
  Depends on error model

Power calculators
  Quanto
    G, E, G X E, G X G
    Case-control, case-sibling, case-parent, and case-only designs
    Quantitative or binary outcome
  htPowercc
    r²
  Power for Association With Error (PAWE)
    Genotyping errors

TagSNP summary
  Efficient yet comprehensive coverage of the genetic variation in our candidate genes
  Reduce costs
  Preference should be given to putatively functional variants: literature, gene context, sequence conservation
  Influences of statistical power: MAF, genetic model, LD, and genotyping error

Resources
  Programs for Genomic Applications
    SeattleSNPs, http://pga.mbt.washington.edu
    NIEHS, http://egp.gs.washington.edu/
    Innate Immunity, http://innateimmunity.net/
  International HapMap, http://www.hapmap.org/
  Coriell cell repository, www.coriell.org
  cSNP predictive analysis
    SIFT, http://blocks.fhcrc.org/sift/SIFT.html
    PolyPhen, http://coot.embl.de/PolyPhen
  Vista, http://genome.lbl.gov/vista/index.shtml
  The following programs can be found at the Rockefeller site, http://linkage.rockefeller.edu/soft/
    Tagger
    LDSelect
    PAWE
    Quanto
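
Worked example: LD and sample size
  The N/r² adjustment cited on the LD slide (Pritchard, 2001) can be checked with a few lines of Python. This is a minimal sketch and is not taken from the lecture. The function names r_squared and inflated_sample_size are hypothetical. The r² formula is the standard pairwise definition based on D = p(AB) - p(A)p(B), which the slides do not spell out. Rounding up to a whole subject is an assumption.

```python
import math


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def inflated_sample_size(n_direct: int, r2: float) -> int:
    """Sample size needed when a tagSNP stands in for an ungenotyped SNP.

    n_direct: sample size required if the causal SNP were genotyped directly
    r2: LD (r^2) between the tagSNP and the ungenotyped SNP
    Rounding up with ceil is an assumption of this sketch.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


if __name__ == "__main__":
    # Same inputs as the LD slide: N = 600 at r^2 = 1.00 and r^2 = 0.85
    for r2 in (1.00, 0.85):
        print(r2, inflated_sample_size(600, r2))
```

  With N = 600, r² = 1.00 leaves the requirement at 600 and r² = 0.85 raises it to 706, matching the S2 column of the LD table.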