Population Genetics Program on West Nile Virus
Genetic Association Studies and GWAS
October 16, 2015

Topics
• Study design
• Potential threats to validity:
  – sample recruitment
  – genotyping error
  – errors in data analysis
  – replication
  – population structure

West Nile Virus Transmission Cycle [figure]

West Nile Outbreaks
• Israel: 1951-1954, 1957
• France: 1962
• South Africa: 1974
• Romania: 1996
• Italy: 1998
• Russia: 1999

1999 West Nile Virus Activity, NYC [figure: mosquitoes, birds, humans]

Clinical Syndromes
• 80% asymptomatic
• 20% “West Nile fever”
• 1 in 20 symptomatic patients develops neuroinvasive disease:
  – meningitis
  – encephalitis
  – acute flaccid paralysis
• Apart from increased age, risk factors are ill defined

Hypothesis
• WNV neuroinvasive disease is a consequence of genetic factors that result in increased WNV replication and subsequent pathology

Q. Why detect genes associated with disease?
• Diagnosis
• Prognosis
• Therapeutics
• Basic mechanisms of disease

Objectives
• To assess the association between immune response genotype sets and susceptibility to neuroinvasive disease
• To characterize the relationship between gene polymorphisms, protein function, and WNV infection

Q. What sort of evidence do you look for to see if the question is worthwhile?

Is there evidence for a ‘familial’ effect?
• Migration studies
  – Do immigrants have a disease risk similar to that of their native population, or to that of the new population?
• Familial aggregation studies
  – FH+ subjects are relatives of a case; FH- subjects are relatives of a control

              FH+ (relative of a case)   FH- (relative of a control)
  Disease     a                          c
  Healthy     b                          d

• Sibling relative risk: $\lambda_s = P(\text{disease} \mid \text{sibling is affected}) / P(\text{disease})$
• Familial aggregation if $\lambda_s > 1$, or if $OR = ad/bc > 1$

Is there evidence for a ‘genetic’ effect?
• Familial correlations in phenotype?
• Heritability can be thought of as the similarity between related individuals that is due to shared genes.
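The two familial-aggregation measures on the slide reduce to simple arithmetic. A minimal Python sketch, assuming the 2x2 layout above (a and b are diseased and healthy FH+ relatives, c and d are diseased and healthy FH- relatives); the function names and the input numbers are hypothetical, chosen only for illustration, and are not study data.

```python
def familial_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = ad / bc for the familial-aggregation 2x2 table.

    a: diseased relatives of cases (FH+)    b: healthy relatives of cases (FH+)
    c: diseased relatives of controls (FH-) d: healthy relatives of controls (FH-)
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when b or c is zero")
    # (a / b) / (c / d): odds of disease in FH+ relatives over odds in FH- relatives
    return (a * d) / (b * c)


def sibling_relative_risk(p_given_affected_sib: float, p_population: float) -> float:
    """lambda_s = P(disease | sibling is affected) / P(disease)."""
    if p_population <= 0:
        raise ValueError("population prevalence must be positive")
    return p_given_affected_sib / p_population


# Hypothetical inputs, for illustration only (not WNV study data).
or_fh = familial_odds_ratio(a=30, b=70, c=10, d=90)
lambda_s = sibling_relative_risk(p_given_affected_sib=0.05, p_population=0.01)
print(f"OR = {or_fh:.2f}; lambda_s = {lambda_s:.1f}")
print("familial aggregation suggested:", or_fh > 1 or lambda_s > 1)
```

Either measure above 1 only indicates clustering in families; as the slides note, that clustering could still reflect shared environments rather than shared genes.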
Heritability – Familial Correlations
• If a trait is heritable, individuals who share genes should have a higher correlation between trait values than individuals who do not share genes:
  – parent and offspring trait values should be correlated
  – identical twins should be more correlated than siblings
  – sibling values should be more correlated than cousins’ values
[Figure: phenotype similarity (covariance) for a heritable vs. a non-heritable trait, across MZ twins, siblings/DZ twins, cousins, 2nd cousins, and distant relatives]

Calculating Heritability of a Disease
• Twin studies are one way to study heritability:
  – MZ twins share 100% of the genome
  – DZ twins share 50% of the genome, on average
• So if a disease is genetically determined, concordance in MZ twins > concordance in DZ twins
Note:
• Any variation in phenotype between MZ twins must be due to environmental variation
• Variation in phenotype among DZ twins is due to environmental variation AND genetic variation (they don’t necessarily have the same genes)

Genetic Epidemiology Questions
• Is there familial clustering? (Could be shared genes or shared environments)
• Is there evidence for a particular genetic model? (dominant, recessive, polygenic)
• Is there evidence for a genetic effect? (covariance structure may distinguish gene vs. environment)
• Where is the disease gene?
  – linkage
  – association
• How does this gene contribute to disease in the general population? (variant frequency, risk magnitude, attributable risk, environmental interactions)

Epidemic of Polio in North America
• Although sporadic epidemic disease occurred in North America in the first half of the 20th century, by the 1950s polio epidemics were widespread
• Prior to the introduction of vaccination, it has been estimated that 600,000 cases of paralytic poliomyelitis occurred annually
Nathanson N. Amer J Epidemiol 2010;172:1213-1229

Host Factors
• < 1% of individuals infected with poliovirus developed paralytic polio in the pre-vaccine era
• In families with a clinical case of poliomyelitis, the ratio of inapparent to apparent infection was between 3:1 and 7:1, versus 100:1 in the general population

Other Evidence for Genetic Predisposition
Herndon and Jennings. AJHG 1951;3:17-46

Genetic Epidemiology Process: Methods
• Familial clustering? – Familial aggregation studies
• Evidence for genetic effects? – Heritability studies
  (based on phenotype data; DNA not needed)
• Mode of inheritance model? – Segregation analyses
• Where is the disease gene? – Disease gene identification:
  – genome-wide, particular chromosomal regions, or candidate genes
  – linkage analysis (families): model-based or model-free
  – association studies (families or population samples): LD or direct

Human Genetic Analysis
[Figure: families → linkage studies; populations → association studies. Example C/C and C/T genotypes: cases 40% T, 60% C; controls 15% T, 85% C]

  Linkage studies (families)              | Association studies (populations)
  Simple inheritance (segregate)          | Complex inheritance (aggregate)
  Single gene with major effect           | Multiple genes with small contributions, in environmental contexts
  Variant rare in the population          | Variant(s) common in the population
  Markers: ~600 short tandem repeats      | Markers: > 1,000,000 single nucleotide polymorphisms (SNPs)

Q. What is the first step in designing the study?
Define the phenotype!
• The relationship between genotype and disease-related phenotype is a key concern in genetic epidemiology
• This can be very direct:
  – blood type A corresponds exactly to genotypes AA and AO
• Or very complicated:
  – serum APOE levels may be a function of APOE genotype as well as of other genes and environments

Step 1: Define Phenotype – What is the trait?
• The ‘phenotype’ is an observable trait in people
• The phenotype must be measurable:
  – External, e.g., hair color (qualitative); height (quantitative: 4 ft, 5 ft, 6 ft, ...)
  – Biological measurement, e.g., protein isoform (qualitative: APOE2, APOE3, APOE4); protein amount (quantitative); blood antigen (A, B, AB, O)

Mendelian Genetics
[Figure: A,a × A,a parents produce offspring genotypes AA, aA, Aa, aa]
• Dominant inheritance: the + phenotype corresponds to 2 genotypes (A,A and A,a); the - phenotype corresponds to 1 genotype (a,a)
• Recessive inheritance: the + phenotype corresponds to 1 genotype (a,a); the - phenotype corresponds to 2 genotypes (A,a and A,A)

Family Pedigree
[Figure: example pedigrees for a dominant trait/disease (e.g., early-onset Alzheimer’s disease), a recessive trait/disease (e.g., cystic fibrosis), a quantitative trait, and a complex trait/disease, with genotypes at a gene ‘T’]

Diagnostic Criteria
West Nile Meningitis
  A. Clinical signs of meningeal inflammation
  B. 1 or more of the following: T > 38 °C or < 35 °C; CSF cells; WBC > 10,000; compatible CT or MRI results
West Nile Encephalitis
  A. Encephalopathy ≥ 24 hrs
  B. 2 or more of the following: T > 38 °C or < 35 °C; CSF pleocytosis; WBC > 10,000; compatible neuroimaging; focal neurologic deficit; meningismus; EEG; seizures
Acute Flaccid Paralysis
  A. Acute onset of limb weakness with progression ≥ 48 hrs
  B. 2 or more of the following: asymmetric weakness; areflexia/hyporeflexia; absence of pain, paresthesia, or numbness in the affected limb; ≥ 5 leukocytes in CSF and ≥ 48 protein; WBC > 10,000; compatible neuroimaging; or EMG

Resistance to WNV in Mice
• First demonstrated in the 1920s
• The resistant phenotype is determined by a major locus, WNV/Flv, on chromosome 5
• Susceptibility is completely correlated with a point mutation resulting in truncation of the 2′-5′ OAS L1 isoform
• Homologous region on human chromosome 12q
Mashimo, PNAS 2002;99:11311-11316

Q. How do you find genes responsible for human disease?
It is difficult:
• Many risk models (genotype-phenotype correlations do not follow simple patterns: ‘complex disease’)
• Many possible genes (~30,000 human genes)
• Finding a disease gene is a difficult challenge, like finding a misspelled word in a set of encyclopedias!
[Figure: encyclopedia analogy: volume = ‘which chromosome?’; page = ‘chromosomal region’; sentence = ‘gene’; “This is a sentence in a paragraph…” misspelled as “This it a sentence in a paragraph…” = ‘mutation’]
• There are too many words to ‘read’ the entire set of volumes (genome) for every individual
• Need ‘markers’ to represent sections
• Need study designs and statistical methods to find regions (sets of markers) correlated with disease
• Then, ultimately, look for the specific disease-associated DNA variation

Definitions…
Genome
• The entire DNA sequence (across all chromosomes) of a particular species.
Gene
• A segment of DNA composed of a transcribed region and a regulatory sequence that makes transcription possible.
Genetic locus
• Loose term with several interpretations. Often: the specific location of a gene on a chromosome. However, some use the term to refer to the location of a putative gene. One definition: a region, or location, on the genome harboring a particular sequence of interest (a gene or several genes).
Genetic site
• Loose term with several interpretations. One definition: a particular nucleotide position on the genome.

Definitions…
[Figure: one possible visualization, nested from largest to smallest: genome, locus, gene, site]

Definitions…
Haplotype
• Haploid: one copy of each chromosome
• The set of alleles on a particular chromosome transmitted from parent to child (in the figure, pink marks the haplotype from Mom, blue the haplotype from Dad)
Diplotype
• Diploid: two homologous copies of each chromosome
• The set of two haplotypes carried by an individual (one from each parent), where phase is known
Examples (Mom’s | Dad’s): ATCTGA | ACCTGA; ATCTGA | ACCAGA; ACCAGA | ACCAGA
Phase
• Knowledge of the orientation of alleles on a particular transmitted chromosome

Illustration of Phase
A diploid person with genotypes at 4 sites: TC, CC, TA, GG
• Phase (the orientation of alleles on particular chromosomes) is unknown based solely on these genotypes. Two possibilities:
  – Diplotype 1: haplotypes TCTG | CCAG
  – Diplotype 2: haplotypes CCTG | TCAG

Broad Genetic Epidemiology Study Design Categories
• Linkage analysis
  – Follows meiotic events through families for co-segregation of disease and particular genetic variants
  – Large families
  – Sibling pairs (or other family pairs)
  – Works VERY well for ‘Mendelian’ diseases
• Association studies
  – Detect association between genetic variants and disease across families; exploit linkage disequilibrium
  – Case-control designs
  – Cohort designs
  – Parent-affected child trios (TDT)
  – May be more appropriate for complex diseases

Q. What approaches exist for association studies?
Linkage
[Figure: genotypes at four loci (A, B, C, D) for members of three families]
• All 4 loci are ‘linked’ to the (unobserved) disease allele WITHIN each of the 3 families

Linkage vs. Linkage Disequilibrium (LD)
[Figure: the same three families]
• All 4 loci are ‘linked’ to the (unobserved) disease allele WITHIN each of the 3 families
• Only alleles ‘B’ and ‘C’ are associated with the disease allele ACROSS families (LD)

Genetic Association Studies – Two Different Concepts
1. Candidate locus testing (direct method)
   – Testing whether a particular locus allele is a disease-predisposing allele
   – Not really ‘LD mapping’; more like the direct association test normally seen with ‘exposure status’ in traditional epidemiology
   – Very applicable to studies of a disease gene variant’s effect on population levels of disease (risk and attributable risk assessment)
2. ‘LD mapping’ (indirect method)
   – Exploitation of the relationship between linkage disequilibrium (LD) and genetic distance
   – Testing for LD between marker(s) and a (putative) disease allele

Genetic Association Studies – Two Different Concepts
[Figure: a known polymorphism; the SNP has a direct effect on protein and phenotype]
1. Direct method
   – Testing whether a particular allele is a disease-predisposing (causative) allele
   – ‘Exposure status’ directly measured
   E.g., a particular APOE allele (e4) changes the protein isoform:
   APOE gene on chromosome 19: ..GACTAAGGCCC CCGTTCAAGGAA.. (C/T SNP)
   • Genotype that particular site for the association study
   Fallin, L3a, 6/21/2005, slide 46

Genetic Association Studies – Two Different Concepts
[Figure: a known polymorphism; the SNP is a marker (proxy) in LD with an unmeasured allele that has a direct effect on protein and risk]
2. ‘LD mapping’ (indirect method)
   – ‘Exposure status’ not directly measured
   – Rely on MARKERS correlated with true exposure status; this correlation is due to linkage disequilibrium
   E.g., genotype a nearby genetic marker among study participants:
   APOE gene on chromosome 19: ..GACTAAGGCCC CCGTTCAAG…GA CCTG.. (C/T causal site; A/G marker)
   Rely on the correlation (LD) between these alleles to detect association!

Marker-based Studies
• We often do not measure the genetic variant of interest
• Instead, we genotype markers at known locations in the genome
• Look for markers that may indicate close proximity to a disease-related DNA variant

Candidate Gene Analysis
• Instead of a genome-wide approach, many pursue particular genes as ‘candidates’:
  – plausible biological role in the phenotype
  – location in regions where prior evidence for linkage or association has been observed (positional candidate)
Taken from: Makridakis and Reichardt, Molecular Epidemiology of Hormone-Metabolic Loci in Prostate Cancer. Epidemiologic Reviews, 23: 24-29.

Candidate Genes
• CD209 (DC-SIGN)
• VDR
• Fcγ receptor II
• TNF-, IL-10
• HLA-A, HLA-B
• TAP1, TAP2, and CTLA-4

Case-Control Comparison Groups [figure]

Genome-Wide Association Studies
• Large number of individuals with disease and a relevant comparison group
• DNA isolation and genotyping
• Statistical tests for association between the SNPs passing quality thresholds and the disease/trait
• Replication of identified associations in an independent population sample, or experimental examination of functional implications

Lessons Learned from Initial GWA Studies
• This actually works
• Size and luck matter!
• Replication matters
• Collaboration matters
• Controls matter, but can sometimes be shared
• Non-coding SNPs matter
• Current hypotheses regarding candidate genes and pathways may not matter so much
• Several genes influence more than one disease

Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls: comparison of P values for 2 different control groups [figure]

Q. What is the basis for LD?

Crossing Over and Recombination Fraction
[Figure: diploid parent with loci A, B, C; chromosome duplication in meiosis; cross-overs occur; gamete production yields 4 haploid gametes, one of which is passed on to the child]
• The chance of crossing over between 2 genes increases with the distance between them (roughly in proportion, for nearby loci)
• Sites closest together will have the fewest cross-overs between them. In the example above: 1 recombination between A and B, 1 between B and C, and 2 between A and C (further apart)
• The frequency of recombination between sites is a measure of ‘genetic distance’, often expressed as the recombination fraction

LD Mapping Caveats: Other Reasons for Observed Allelic Associations in Populations
• Population stratification/subdivision
• Recent admixture
• Genetic drift
• Selection
• Assortative mating
• Type 1 error

Q. What are key assumptions made in genetic epidemiology studies?
Random Mating
• Under random mating, all individuals of the opposite sex are equally likely to mate, regardless of their genotype
• The combination of two individual genotypes that produce offspring is referred to as a mating type
• If mating is random, the probability of each mating type is the product of the two genotype probabilities (frequencies) in the population:

  Mom \ Dad   AA              Aa              aa
  AA          P(AA) x P(AA)   P(AA) x P(Aa)   P(AA) x P(aa)
  Aa          P(Aa) x P(AA)   P(Aa) x P(Aa)   P(Aa) x P(aa)
  aa          P(aa) x P(AA)   P(aa) x P(Aa)   P(aa) x P(aa)

Random Mating…
• $P(MT) = P(\text{mom genotype}) \times P(\text{dad genotype})$

  Mom \ Dad   AA                  Aa                  aa
  AA          $p_{AA}^2$          $p_{AA} p_{Aa}$     $p_{AA} p_{aa}$
  Aa          $p_{AA} p_{Aa}$     $p_{Aa}^2$          $p_{Aa} p_{aa}$
  aa          $p_{AA} p_{aa}$     $p_{Aa} p_{aa}$     $p_{aa}^2$

• There are 6 distinct mating types (assuming parent sex doesn’t matter):

  Mating type   Probability of MT
  AA x AA       $p_{AA}^2$
  AA x Aa       $2 p_{AA} p_{Aa}$
  AA x aa       $2 p_{AA} p_{aa}$
  Aa x Aa       $p_{Aa}^2$
  Aa x aa       $2 p_{Aa} p_{aa}$
  aa x aa       $p_{aa}^2$

A Population-based Theoretical Example…
• With random mating, starting from a population with P(AA) = P(aa) = 0.5 and no heterozygotes, we should have the following:

  Mating type   P(MT)           P(AA|MT)   P(Aa|MT)   P(aa|MT)
  AA x AA       0.5 x 0.5       1          0          0
  AA x aa       2(0.5 x 0.5)    0          1          0
  aa x aa       0.5 x 0.5       0          0          1

• After one generation of random mating:
  $P(AA) = \sum_{MT} P(AA \mid MT) P(MT) = 1(0.5 \cdot 0.5) + 0 + 0 = 0.25$
  $P(Aa) = \sum_{MT} P(Aa \mid MT) P(MT) = 0 + 1(2 \cdot 0.5 \cdot 0.5) + 0 = 0.5$
  $P(aa) = \sum_{MT} P(aa \mid MT) P(MT) = 0 + 0 + 1(0.5 \cdot 0.5) = 0.25$
• Genotype frequencies will be $P(AA) = p^2$, $P(Aa) = 2pq$, $P(aa) = q^2$, where p and q are the frequencies of alleles A and a (here p = q = 0.5)

WNV Study: Potential Gene Categories
• Primary response modifiers (e.g., ISGs)
• Cytokines, chemokines, chemokine receptors, MHC
• Signal transduction proteins (e.g., JAK kinases)
• Transcription factors (e.g., IFN regulatory factors)
• Antiviral effector proteins (e.g., OAS)

WNV Study Methods: Genotyping
• Whole-genome screening of non-synonymous variants performed using the Illumina HumanNS-12 Infinium array
• 13,371 single nucleotide polymorphisms (SNPs) in ~6000 genes
• Mostly non-synonymous coding; also includes synonymous, UTR, and tag SNPs (MHC)

Case-Control Study
• Cases come from the states/provinces with the highest rates of WNV infection
• Cases meet CDC criteria for WNV infection and have evidence of neuroinvasive disease
• Controls meet criteria for WNV infection but did not develop neuroinvasive disease

Study Designs Used in Genome-wide Association Studies [figure]
Pearson, T. A. et al. JAMA 2008;299:1335-1344.

Implementation
• State and provincial public health agencies contact all WNV-infected individuals from 2002-2008
• 4 clinical centers: Pennsylvania, Texas, Nebraska, Ontario
• Whole blood is collected from participants and sent to the McGill Genome Center

Analysis
• Two-stage design: retest the best candidates in a second cohort
• 600 cases per stage, to detect alleles with MAF > 0.05 conferring a twofold increase in risk
• Unconditional logistic regression (LR) to compute odds ratios and 95% CIs, adjusted for site

Samples: Genotyped and Phenotyped
              Stage 1       Stage 2
  Cases       488 (445)     143+
  Controls    858 (813)     142+
  (numbers in parentheses: samples remaining after QC)

SNP Discovery Depends on Sample Size
[Figure: fraction of SNPs discovered (0.0-1.0) vs. minor allele frequency (MAF, 0.0-0.5), when sequencing 2 vs. 88 chromosomes. Example of two chromosomes differing at a single site:
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC]

Replication: A Must
Hirschhorn & Daly, Nat. Rev. Genet. 6: 95, 2005
NCI-NHGRI Working Group on Replication, Nature 447: 655, 2007

Examples of Multistage Designs in Genome-wide Association Studies [figure]
Pearson, T. A. et al. JAMA 2008;299:1335-1344. Copyright restrictions may apply.
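The random-mating slides can be checked by brute force: enumerate every mom x dad genotype pair, weight it by P(MT) = P(mom) x P(dad), and apply Mendelian segregation to get offspring genotypes. A minimal sketch for a single biallelic autosomal locus; `gamete_probs` and `next_generation` are hypothetical helper names, not taken from the lecture.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def gamete_probs(genotype: str) -> dict:
    """Mendelian segregation: chance that a parent transmits allele A or a."""
    n_big_a = genotype.count("A")  # case-sensitive: counts only 'A'
    return {"A": n_big_a / 2, "a": (2 - n_big_a) / 2}


def next_generation(freqs: dict) -> dict:
    """Offspring genotype frequencies after one generation of random mating.

    P(g) = sum over mating types MT of P(g | MT) * P(MT), where under random
    mating P(MT) = P(mom genotype) * P(dad genotype). Looping over ordered
    (mom, dad) pairs covers all 9 cells of the mating table, which is the same
    as using the 6 distinct mating types with the factor of 2 on mixed types.
    """
    offspring = {g: 0.0 for g in GENOTYPES}
    for mom, dad in product(GENOTYPES, repeat=2):
        p_mt = freqs[mom] * freqs[dad]
        for (allele_m, p_m), (allele_d, p_d) in product(
            gamete_probs(mom).items(), gamete_probs(dad).items()
        ):
            child = "".join(sorted(allele_m + allele_d))  # "aA" -> "Aa"
            offspring[child] += p_mt * p_m * p_d
    return offspring


# Theoretical example from the slide: P(AA) = P(aa) = 0.5, no heterozygotes.
gen1 = next_generation({"AA": 0.5, "Aa": 0.0, "aa": 0.5})
print(gen1)  # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}

# Any starting frequencies with p = P(AA) + P(Aa)/2 = 0.4 land on (p^2, 2pq, q^2).
print(next_generation({"AA": 0.3, "Aa": 0.2, "aa": 0.5}))
```

Running `next_generation` a second time on its own output leaves the frequencies unchanged, which is the equilibrium property the slides rely on.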
Results
• Phenotypic data on 1371 patients:
  – 488 with NI (neuroinvasive) disease
  – 858 controls
  – 25 equivocal

Samples: Age Distribution [figure: cases vs. controls]
Samples: Gender Distribution [figure]

Genotyping: Quality Control
Out of 13,371 SNPs:
• 133 failed
• 174 had a call rate below 95% and were considered failed
• The average call rate is 99.8% for the remaining 13,064 SNPs
Out of 1,677 unique samples genotyped:
• two failed (call rate < 88%)
• all others have a call rate > 98% (average 99.7%)
The minor allele frequency spectrum includes 1009 monomorphic SNPs.

Hypothetical Quantile-Quantile Plots in Genome-wide Association Studies [figure]
Pearson, T. A. et al. JAMA 2008;299:1335-1344.

Population Structure
[Figure: pairs who share more alleles (due to relatedness/identity by descent) vs. pairs who share fewer alleles (due to different ancestries/differences in allele frequencies)]
Population Structure [figures]
Cryptic Relatedness [figures]

Hardy-Weinberg Equilibrium
• For each SNP: test whether there is an excess of heterozygotes or homozygotes
• Markers that failed HWE at p < 0.0005 (48 SNPs) were excluded

Statistical Tests for HWP
• Q: Is an observed departure from HWP statistically significant?
  – $H_0: D_{HW} = 0$ vs. $H_A: D_{HW} \neq 0$
• Methods:
  – Chi-square goodness of fit (GOF)
    • H0: the data fit a model where genotype frequencies equal their expected values under HWP
  – Likelihood ratio test (LRT)
    • Compares how well a model assuming HWE fits the observed data with a model that does not assume HWE
    • I.e., compare the likelihood of the data with genotype frequencies fixed to HWP ($L_0$) versus the likelihood of the data without fixing genotype frequencies to match HWP ($L_1$)

Reasons for Departure from HWP
• Population allele frequencies can change from generation to generation due to:
  – migration/admixture
  – chance in small populations (genetic drift)
  – mutation
  – selection (depends on fertility of parents and viability of offspring)
• Survival bias and gender proportions: allele frequencies can also change with age within a generation, and could be sex-dependent
• Abnormal gene segregation (segregation distortion, meiotic drive: maternal and paternal gametic contributions are not all equally probable)

Why Is the H-W Model Useful to Genetic Epidemiology?
• Can use the HWP assumption to calculate genotype frequencies from observed phenotypes
• Can use HWP to obtain haplotype frequencies from observed genotypes (useful for assessing inter-locus equilibrium; later lectures)
• Can measure departures from HWP as an indication of population genetic features in a sample:
  – inbreeding
  – migration/admixture
• Can flag potential genotyping errors
Important to test for HWE!

Testing for Association
After applying all QC filters:
• 445 neuroinvasive cases, 813 controls
• 10,591 SNPs with MAF > 1% in controls entered the analysis
• Logistic model, adjusting for collection center
• X chromosome: risk of males = risk of homozygous females; sex as an additional covariate

Methods: Samples
• Data on 1371 patients, collected in centers in the USA and Canada
• All had been infected with WNV
• 488 developed neuroinvasive disease (meningitis, encephalitis, acute flaccid paralysis)
• 858 did not (controls)
• 25 equivocal
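For the per-SNP HWE filter, the chi-square goodness-of-fit and likelihood-ratio statistics from the “Statistical Tests for HWP” slide can be computed directly from genotype counts. A minimal sketch, assuming a biallelic autosomal SNP and 1 degree of freedom (3 genotype classes, minus 1, minus 1 estimated allele frequency); the lecture does not say which test or software the WNV study used, exact tests are often preferred when the minor allele is rare, and the genotype counts in the usage lines are hypothetical.

```python
import math


def hwe_tests(n_AA: int, n_Aa: int, n_aa: int) -> dict:
    """Chi-square GOF and likelihood-ratio tests of Hardy-Weinberg proportions.

    Assumes a biallelic autosomal SNP. Both statistics are compared with a
    chi-square distribution on 1 df.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # counts under HWP

    # Pearson chi-square; classes with zero expected count (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # LRT (G statistic): 2 * ln(L1 / L0) = 2 * sum O * ln(O / E).
    g_stat = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    def upper_tail_1df(x: float) -> float:
        # P(X > x) for X ~ chi-square(1) equals P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(max(x, 0.0) / 2))

    return {"chi2": chi2, "p_chi2": upper_tail_1df(chi2),
            "G": g_stat, "p_G": upper_tail_1df(g_stat)}


QC_THRESHOLD = 0.0005  # HWE exclusion threshold quoted on the WNV QC slide
for counts in [(25, 50, 25), (40, 20, 40)]:  # hypothetical genotype counts
    res = hwe_tests(*counts)
    verdict = "exclude" if res["p_chi2"] < QC_THRESHOLD else "keep"
    print(counts, f"chi2={res['chi2']:.1f}", f"p={res['p_chi2']:.2g}", verdict)
```

The second example has a large heterozygote deficit, which is the pattern the slides associate with inbreeding, admixture, or genotyping error.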
Preliminary Results: Manhattan Plot [figure]

Testing for Association
rs2066786, RFC1 (4p14-p13), p = 1.67 x 10^-6
  Site               Allele freq. (cases)   Allele freq. (controls)
  Alberta            .76                    .55
  Colorado           .67                    .50
  Nebraska           .63                    .55
  Ontario/Manitoba   .65                    .44
  Saskatchewan       .57                    .54
  Texas              .59                    .43
  OR: 1.64 (95% CI 1.34-2.01)

RFC1
• REPLICATION FACTOR C, 140-KD SUBUNIT: 25 exons
• Has been shown to be essential for coordinated synthesis of both DNA strands during simian virus 40 DNA replication in vitro
• rs2066786: coding synonymous (Pro847Pro), p = 1.67 x 10^-6
• No other SNPs in RFC1 on the genotyping array
• SNPs in or near RFC1 (rs2066786, or in LD with it) are potentially regulatory (p < 2.78 x 10^-9)

Testing for Association
rs2298771, SCN1A (2q24), p = 1.73 x 10^-4
  Site               Allele freq. (cases)   Allele freq. (controls)
  Alberta            .50                    .39
  Colorado           .40                    .34
  Nebraska           .37                    .27
  Ontario/Manitoba   .28                    .31
  Saskatchewan       .52                    .30
  Texas              .31                    .34
  OR: 1.50 (95% CI 1.21-1.86)

SCN1A
• SODIUM CHANNEL, NEURONAL TYPE I, ALPHA SUBUNIT: 26 exons
• Shown to be associated with generalized epilepsy with febrile seizures, myoclonic epilepsy, and familial hemiplegic migraine
• rs2298771: coding non-synonymous (Ala1056Thr), p = 1.73 x 10^-4
• No other SNPs in SCN1A on the genotyping array

Testing for Association
rs25651, ANPEP (15q26.1), p = 5.5 x 10^-4
  Site               Allele freq. (cases)   Allele freq. (controls)
  Alberta            .76                    .69
  Colorado           .64                    .65
  Nebraska           .72                    .63
  Ontario/Manitoba   .73                    .63
  Saskatchewan       .74                    .67
  Texas              .67                    .57
  OR: 1.47 (95% CI 1.18-1.83)

ANPEP
• ALANYL AMINOPEPTIDASE: 20 exons
• Serves as receptor for HCoV-229E (human coronavirus 229E); mediates human cytomegalovirus (HCMV) infection
• rs25651: coding non-synonymous (Ser752Asn), p = 5.5 x 10^-4
• rs8192297: coding non-synonymous (Ile603Met), p = 0.39
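The study’s ORs come from logistic regression adjusted for collection site. As a simpler cross-check, a crude allelic OR with a Woolf (log-scale) 95% CI, and a Mantel-Haenszel OR pooled across sites, can be computed from 2x2 allele counts. This is a swapped-in technique, not the study’s method: it counts 2 alleles per person as independent observations (which assumes HWE), and the per-site tables above give only frequencies, so all counts below are hypothetical. The first example reuses the illustrative 40% vs. 15% T frequencies from the “Human Genetic Analysis” slide, scaled to 100 alleles per group.

```python
import math


def allelic_or_ci(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int,
                  z: float = 1.959964) -> tuple:
    """Crude allelic odds ratio with a Woolf (log-scale) 95% confidence interval.

    Counts are alleles, not people (2 per person), so the test assumes HWE.
    """
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log = math.sqrt(sum(1 / n for n in counts))  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def mantel_haenszel_or(strata) -> float:
    """Mantel-Haenszel OR pooled over strata (e.g., collection sites).

    Each stratum is (case_alt, case_ref, ctrl_alt, ctrl_ref);
    OR_MH = sum(a * d / n) / sum(b * c / n).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Cases 40% T, controls 15% T, scaled to 100 alleles per group (hypothetical counts).
or_, lo, hi = allelic_or_ci(case_alt=40, case_ref=60, ctrl_alt=15, ctrl_ref=85)
print(f"allelic OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")

# Hypothetical per-site allele counts, for illustration only.
sites = [(60, 40, 45, 55), (30, 20, 25, 25)]
print(f"MH OR across sites = {mantel_haenszel_or(sites):.2f}")
```

Stratifying by site is one way to guard against the population-structure confounding discussed earlier; the logistic model in the study achieves the same goal with a site covariate.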
Validation and Replication Panel
Genotyping
• A panel of 33 SNPs was designed (Sequenom MassARRAY iPLEX Gold):
  – top 12 SNPs from the primary analysis, for validation and replication
  – tag SNPs in RFC1
Results in primary samples
• SNP reproducibility rate between Illumina and Sequenom: > 99.62%
• Tag-SNP results in RFC1: [figure]
Replication samples
• Data on 617 patients
• All had been infected with WNV
• 277 developed neuroinvasive disease (meningitis, encephalitis, acute flaccid paralysis)
• 340 did not (controls)

Sample Size Required (80% power; p < 0.001)
  SNP         Gene    Allele freq.   OR     Sample size required
  rs2066786   RFC1    0.53           1.64   285 cases / 285 controls
  rs2298771   SCN1A   0.30           1.50   450 cases / 450 controls
  rs25651     ANPEP   0.65           1.47   530 cases / 530 controls

Lack of Replication
  SNP                  Chr   Position    Allele1   Allele2   Primary P   Replication P   Joint P
  LOC56964-rs3738573    1    84636844    2         3         0.003108    0.53            0.007122
  SCN1A-rs2298771       2    166601034   2         4         0.000299    0.66            0.001455
  2'-PDE-rs2241988      3    57517213    4         2         0.001476    0.42            0.000713
  RFC1-rs4974996        4    38903943    2         1         0.000006    0.60            0.000103
  RFC1-rs11096990       4    38963344    4         2         0.001328    0.65            0.004485
  RFC1-rs3733282        4    38965339    3         1         0.053486    0.66            0.219224
  RFC1-rs17288828       4    38966786    1         3         0.583279    0.87            0.484087
  RFC1-rs2306597        4    38973595    1         3         0.009005    0.21            0.142987
  RFC1-rs2066786        4    38978424    4         2         0.000001    0.56            0.000030
  RFC1-rs2066789        4    38984582    3         1         0.028359    0.50            0.051783
  RFC1-rs13147094       4    38987277    1         3         0.036064    0.95            0.106146
  RFC1-rs4975003        4    38989865    2         3         0.000011    0.44            0.000114
  RFC1-rs3796517        4    39013348    3         1         0.003467    0.82            0.012938
  RFC1-rs6835022        4    39030221    4         1         0.187690    0.61            0.257738
  RFC1-rs6851075        4    39044049    2         4         0.003936    0.94            0.020592
  RFC1-rs12644680       4    39047276    2         4         0.922901    0.31            0.659812
  RFC1-rs13123782       4    39053800    1         2         0.000806    0.57            0.005485
  na-rs9380006          6    27764478    2         1         0.003029    0.13            0.064231
  TEX15-rs323347        8    30825766    3         1         0.000238    0.13            0.045269
  CWF19L1-rs2270962    10    102006034   4         2         0.003019    0.81            0.029862
  na-rs10778292        12    102784417   2         4         0.001459    0.83            0.011818
  F7-rs6046            13    112821160   1         3         0.002125    0.83            0.019520
  TLN2-rs3816988       15    60898792    2         4         0.004330    0.69            0.012611
  LOC56964-rs7163367   15    88061149    1         4         0.000446    0.45            0.008170
  ANPEP-rs25651        15    88136792    4         2         0.000438    0.88            0.003159
  ANPEP-rs17240268     15    88148818    1         3         0.127363    0.59            0.268328
  ANPEP-rs25653        15    88150562    2         4         0.074759    0.89            0.187736
  GOT2-rs11076256      16    57309967    4         2         0.003654    0.98            0.027808
  GRIN3B-rs2240154     19    954172      4         2         0.001397    0.86            0.007999
  XKR3-rs5748648       22    15660822    1         3         0.001685    0.89            0.005334

Age Distribution [figures: primary vs. replication]
Gender Distribution [figures: primary vs. replication]
Neuroinvasive Disease Type [figures: primary vs. replication]
Ancestry: U.S. Census 2000 [figure]
Forest Plots [3 figures: primary vs. replication]
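The “sample size required” figures can be approximated with the usual normal-approximation formula for comparing two allele frequencies. A minimal sketch, assuming the listed allele frequency is the control frequency, a per-allele (allelic) test, a two-sided alpha of 0.001, 80% power, and equal numbers of cases and controls; the slide does not state its power method, so these results need not match it exactly. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(p_ctrl: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and an allelic OR."""
    odds_case = odds_ratio * p_ctrl / (1 - p_ctrl)
    return odds_case / (1 + odds_case)


def cases_needed(p_ctrl: float, odds_ratio: float,
                 alpha: float = 0.001, power: float = 0.80) -> int:
    """Cases (= controls) needed for a two-sided allelic test, normal approximation.

    Works in alleles (2 per person) and converts back to people at the end.
    """
    p_case = case_allele_freq(p_ctrl, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_ctrl + p_case) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_ctrl * (1 - p_ctrl) + p_case * (1 - p_case))
    ) ** 2
    alleles_per_group = numerator / (p_case - p_ctrl) ** 2
    return math.ceil(alleles_per_group / 2)


# Inputs from the sample-size table (allele frequency assumed to be the control frequency).
for snp, freq, odds in [("rs2066786", 0.53, 1.64), ("rs2298771", 0.30, 1.50),
                        ("rs25651", 0.65, 1.47)]:
    print(snp, cases_needed(freq, odds), "cases and the same number of controls")
```

Under these assumptions the sketch gives roughly 290, 460 and 540 per group, close to but not identical to the slide’s 285, 450 and 530. It also shows why the 277-case replication sample was underpowered for all three SNPs.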