Applications in Bioinformatics, Proteomics, and Genomics
SNPs (II)
J. Gray (UT) [email protected]

Previous lecture: Introduction to SNPs

Today's lecture
1: Mapping complex traits using SNPs
2: Explanation of association analysis
3: Example of complex trait mapping: using SNPs to find a gene linked to retinal dystrophy (Janecke et al., 2004)
4: Example of trait mapping through whole genome sequencing (WGS) (Lupski et al., 2010)
5: WGS to determine drugs suitable for cancer treatment (Iyer et al., 2012)
6: Homework

1: Mapping complex traits using SNPs

Genome-wide association studies (GWAS) have been framed by the Common Disease/Common Variant (CD/CV) hypothesis, which holds that complex disease is largely attributable to a moderate number of common variants, each of which explains several per cent of the risk in ethnically diverse populations.
Common variants differ from very rare high-risk alleles, which have been discovered mainly through pedigree analysis (of related individuals).

CDCV versus alternative models
The CDCV model has now been refuted in light of the 'missing heritability problem': the observation that loci detected by GWASs almost without exception explain only a small minority of the inferred genetic variance (the contribution of genotypic differences among individuals to phenotypic variation).
Expected signatures from GWA studies for the CDCV model of disease.
Gibson G., 2012. Rare and common variants: twenty arguments. Nature Reviews Genetics 13:135-145

CDCV versus alternative models
The infinitesimal model: many variants of small effect. If half a dozen common variants explain 10% of risk in the population, the remainder is attributable to hundreds or thousands of variants that each explain considerably less than 1% of disease risk (including rare variants). The contribution of some genes is too small to measure.
Expected signatures from GWA studies for the infinitesimal model of disease (Gibson, 2012).

CDCV versus alternative models
The rare allele model: many rare alleles of large effect. The alternative view is that most of the variance for certain complex diseases is due to moderately or highly penetrant rare variants, whose allele frequency is typically <1%, and most of which are recently derived alleles in the human population.
Expected signatures from GWA studies for the rare allele model of disease (Gibson, 2012).

CDCV versus alternative models
The broad-sense heritability model: non-additive G×G and G×E interactions and epigenetic effects. Proponents of this model point to a long history of detecting genotype-by-genotype interactions (also known as epistasis) and genotype-by-environment interactions in quantitative genetic research on model organisms.
Expected signatures from GWA studies for the broad-sense heritability (G×E) model of disease. Green and orange represent different environments (Gibson, 2012).

Reconciliation: there are joint effects of rare and common variants
Different combinations of rare and common variants can tip the balance from health to disease states (figure: enzyme activity from low to high; Gibson, 2012).

Figure 4 | Joint effects of rare and common variants. A straightforward reconciliation of the effects of rare and common variants supposes that pervasive common variation influences the expression and activity of genes in pathways, establishing the background liability to disease that is then further modified by rare variants with larger effects.
In this hypothetical example of central metabolism, standing variation results in some individuals having lower flux than others (left versus right; colored boxes indicate enzyme activity differences, from low activity (red shading) to high activity (green shading)), but according to standard biochemical theory, systems evolve such that most variation is accommodated within the healthy range. The impact of a rare variant that knocks out one copy of the enzyme indicated by the cross is conditional on this liability: it pushes the individual on the left beyond the disease threshold, whereas the individual on the right can accommodate the mutation, given higher activity elsewhere in glycolysis.

2: Explanation of association analysis

Whatever the balance between rare and common variants, genome-wide association studies have succeeded in uncovering thousands of genomic variants associated with diseases. Very high-density microarrays and WGS now let us capture genome-wide variation on a huge scale. Association studies seek to correlate that variation with the underlying genetic cause of complex phenotypes (such as disease, height, etc.). But how?

Brief list of SNP-associated human diseases (Int. J. Mol. Med. 2003, 11:379-382)

Many GWAS have been performed to link SNPs to complex phenotypes: https://www.gwascentral.org
GWAS Central provides a centralized compilation of summary-level findings from genetic association studies, both large and small. It actively gathers datasets from public-domain projects and encourages direct data submission from the community.
Beck et al. 2014. GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies.
European Journal of Human Genetics, advance online publication 4 December 2013; doi: 10.1038/ejhg.2013.274

SNPs near genes are associated with disease traits
The high frequency and even distribution of SNPs across the genome make them very useful as markers for gene mapping studies, especially of complex traits.
Because new DNA mutations arise only occasionally, SNPs along a chromosome become associated with one another: the presence of one variant provides information about the presence of another (linkage disequilibrium, LD).
SNPs in the 5′ UTR of genes show the largest number of associations. SNPs in exons and the 3′ UTR are also enriched, SNPs in introns are only moderately enriched, and intergenic SNPs show a depletion of associations relative to the average SNP (Schork et al. 2013, PLOS Genetics, DOI: 10.1371/journal.pgen.1003449).

Closely linked SNPs may be in linkage disequilibrium
If two alleles (or two SNPs) tend to be inherited together more often than other pairs, we say that they are in linkage disequilibrium (LD; high LD).
Linkage disequilibrium is the non-random association of alleles at two or more loci that descend from single, ancestral chromosomes.
Imagine a particular population with a higher than average incidence of Alzheimer's disease. If one could track SNPs that are in LD and correlate them with the disease phenotype, we would have markers for loci that play a role in Alzheimer's disease.
To map a gene underlying a trait, one needs to find the chromosome segment (haplotype) that is linked with a mutation causing predisposition to the trait (circled patients in the diagram), e.g. segment A in the two affected offspring in the diagram. Within this segment, SNPs may be in linkage disequilibrium (LD). (Diagram spans many generations.)

Important points about LD
1. Population genetics describes the way mutation, recombination, natural selection and demographics affect patterns of LD.
2. There is no a priori way to predict the LD pattern in a particular genomic region.
3.
LD must be empirically assessed in a particular genomic region using appropriately chosen samples.
4. SNPs less than a few kb apart may still show weak LD (e.g. if recombination is high).

Linkage disequilibrium (LD) and haplotype analysis
Example: two SNPs linked to a monogenic disease. Notice that the two populations have different frequencies of the three possible genotypes.

Example of a more complex trait where three different genes influence the phenotype
Two populations are examined, and the phenotype is rated on a 0-7 scale. Individuals are genotyped at three loci (6 SNPs), giving a total of 27 possible genotypes (color-coded to aid visualization).
This case is more typical of a polygenic trait that can be mapped to quantitative trait loci (QTLs):
• The X'-Y' locus may have a dominant contribution (blue).
• The a'-b' locus may have a minor recessive contribution (red).
• The 1'-2' locus may also have a dominant contribution (red).
The phenotype is strongest when X'-Y' and 1'-2' occur together with an a'-b' homozygote (see next slide).
(Figure panels: Locus 1, Locus 2, Locus 3; Pop 1, Pop 2; phenotype score.)
The example is still rather simple: in reality many loci are used, with many SNPs per locus.
The study is population dependent: population 2 has a greater frequency of the disease, so it is more likely to reveal informative linkage disequilibrium (LD).
How many loci contribute to the phenotype? Which SNPs are linked to disease-causing alleles?

How to measure association of SNPs with each other (LD)
There are several measures of LD. With two alleles at each of two SNP loci, write the allele frequencies as $p_1, p_2$ (first locus) and $q_1, q_2$ (second locus), and the haplotype frequencies as $p_{11}, p_{12}, p_{21}, p_{22}$.
The magnitude of LD is $D = p_{11}p_{22} - p_{12}p_{21}$, and $D' = D/D_{max}$.
Another measure of LD is $r^2 = D^2/(p_1 p_2 q_1 q_2)$.
Compared with $|D'|$, $r^2$ values are generally lower and less prone to the upward bias that $|D'|$ shows with small sample sizes or rare alleles.

Example of LD calculation
Suppose there are two SNP loci on chromosome 5, each with two alleles:
            SNP1            SNP2
Allele 1    ACTGGTAT ... GATCAACCAG
Allele 2    ACTCGTAT ... GATCATCCAG
(SNP1 is G/C; SNP2 is A/T.)
Step 1.
Calculate allele frequencies ($p_1$ = frequency of G and $p_2$ = frequency of C at SNP1; $q_1$ = frequency of A and $q_2$ = frequency of T at SNP2).
Step 2. Calculate haplotype frequencies (GA, GT, CA, CT).
Step 3. Linkage equilibrium: when haplotype frequencies equal the product of their corresponding allele frequencies, the loci are in linkage equilibrium.
Step 4. Linkage disequilibrium: LD for each haplotype is the deviation of the observed haplotype frequency from the frequency expected from its allele frequencies under equilibrium, summarized as $D = p_{11}p_{22} - p_{12}p_{21}$.
Step 5. Calculation of linkage disequilibrium (D).
Case 1: allele frequencies $p_1$ and $q_1$ are both 0.5 (so $p_2$ and $q_2$ are also 0.5) and the loci are in equilibrium (haplotypes GA, GT, CA and CT all exist in the population):
$p_{11} = p_1 q_1 = 0.5 \times 0.5 = 0.25$
$p_{22} = p_2 q_2 = 0.5 \times 0.5 = 0.25$
$p_{12} = p_1 q_2 = 0.5 \times 0.5 = 0.25$
$p_{21} = p_2 q_1 = 0.5 \times 0.5 = 0.25$
$D = p_{11}p_{22} - p_{12}p_{21} = (0.25)(0.25) - (0.25)(0.25) = 0$
Case 2: allele frequencies $p_1$ and $q_1$ are both 0.5, but there is complete non-random association (only haplotypes GA and CT exist in the population):
$D = p_{11}p_{22} - p_{12}p_{21} = (0.5)(0.5) - (0)(0) = 0.25$
$p_{11} = p_1 q_1 + D = 0.25 + 0.25 = 0.5$
$p_{22} = p_2 q_2 + D = 0.25 + 0.25 = 0.5$
$p_{12} = p_1 q_2 - D = 0.25 - 0.25 = 0$
$p_{21} = p_2 q_1 - D = 0.25 - 0.25 = 0$
With $D = 0.25$: $D' = D/D_{max} = 0.25/0.25 = 1$, where $D_{max} = \min(p_1 q_2, p_2 q_1)$ for $D > 0$.
$r^2 = D^2/(p_1 p_2 q_1 q_2) = (0.25)^2/(0.5 \times 0.5 \times 0.5 \times 0.5) = 0.0625/0.0625 = 1$

LD is lower in more diverse populations, and vice versa
Mean linkage disequilibrium ($D'$) as a function of physical distance (kb) in samples from three ethnic groups (Trends in Genetics, 2002, 18(1):19-24).
Mean linkage disequilibrium ($r^2$) as a function of physical distance (kb) in samples from different ethnic groups (Trends in Genetics, 2002, 18(1):19-24).

Notes about Genome Wide Association Studies (GWAS)
1.
Association analysis differs from more traditional (pedigree-based) linkage analysis in that it compares the frequency of a set of alleles (tagSNPs) between "unrelated" patients and healthy controls.
2. It is easier to recruit large numbers of unrelated affected individuals than to collect large numbers of pedigrees.
3. Regions around a shared marker (tagSNP) are smaller between unrelated individuals, so larger populations are required.
4. Large SNP genotyping arrays and WGS now enable whole-genome scanning to find tagSNPs.

3: Examples of complex trait mapping
Mutations in RDH12 encoding a photoreceptor cell retinol dehydrogenase cause childhood-onset severe retinal dystrophy. Janecke et al., 2004. Nature Genetics 36: 850-854

Autosomal recessive childhood-onset severe retinal dystrophy
What was the aim of this research? How did they set about answering their questions?

What is autosomal recessive childhood-onset severe retinal dystrophy?
• First described in 1869 by T. Leber; also called Leber congenital amaurosis type III (LCA3).
• Groups together some of the most common causes of genetically inherited childhood blindness.
• Starts shortly after birth, with involuntary eye movement (nystagmus) and a sluggish pupillary response.
• A classic case of a two-locus trait, since unaffected children can be born to two affected parents.

Starting point: identifying three consanguineous families
Affected (living) individuals are shown in black; those marked with a line underwent ophthalmoscopy. A similar phenotype and a common geographic area suggest a probable unknown common ancestor to all cases.

Mapping 10K Array
• The GeneChip® Mapping 10K Array can assay over 10,000 genotypes on a single array.
• No need for locus-specific PCR.
• Requires only 250 ng of DNA for each sample.
• An average of one SNP every 210 kb across the genome (about 5 cM resolution).
• Automated genotype calling (99.6% accuracy).
• Extensive SNP annotations in the NetAffx™ Analysis Center.
• SNP assay method: sequence-specific, based on hybridization.

They genotyped DNA samples from 10 affected individuals and 9 carriers from the three families, using 10,894 autosomal markers and 301 X-linked SNPs.
They used GDAS v2.0 analysis software to call genotypes, checked for errors with PedCheck, and used the Merlin and Genehunter programs to reconstruct haplotypes.
They found 10 SNPs that were homozygous in affected individuals and heterozygous in carriers, in a 2.86 Mb interval on chromosome 14q23.3-q24.1. This region overlapped with an interval previously associated with LCA3.
The region contained 29 genes, one of which was RDH12, encoding a retinol dehydrogenase expressed in the neuroretina. These enzymes help convert vitamin A to 11-cis-retinal. The gene has 7 exons and encodes a 316 aa protein.
Could this be the gene that is disrupted in affected individuals?
They used PCR to amplify all segments of the RDH12 gene from affected individuals and looked for segments containing mutations using denaturing HPLC: DNA with a single-base mutation and wild-type DNA are heated and then cooled slowly to form a mixture of hetero- and homoduplexes, which are easily and quickly resolved by HPLC.
They detected four SNPs in the RDH12 gene, two of them in exons.
• 677A>G transition in exon 6 causes a Tyr226Cys substitution (found only in affected individuals).
• 482G>A transition in exon 6 causes an Arg161Gln substitution (also found in unaffected controls).
The 677A>G transition was also found in two more unrelated affected individuals in western Austria.
Other, non-Austrian individuals were found to be homozygous for the following mutations in RDH12:
Origin      Mutation
German      806delCCCTG (deletion of CCCTG) in exon 6
Turkey      565C>T, causing Q189X
American    146C>T (T49M) and 184C>T (R62X), on different alleles
All individuals with mutations in RDH12 had retinal dystrophy starting at age 2-4 years, with legal blindness by age 18-25.
Fig 2b. Attenuation of arterioles and peripheral pigment deposits in the fundus of a 5-year-old patient.

They then carried out a series of biochemical experiments to test the association they had found.
Normal and mutant enzymes were expressed in COS-7 cells and assayed. The C226 mutant lost nearly all activity. The M49 mutant seemed to produce more product than the wild type, in the forward reaction (retinal to retinol) and also in the back reaction (retinol to retinal) (Fig 3b and c).
Deglycosylated M49 protein shows a protein pattern similar to wild type on a western blot, but M49 carries a lower amount of glycosylation (anti-RDH12; GAPDH loading control in the lower panel).

Summary of paper
• Using the 10K SNP assay it was possible to quickly identify a candidate locus underlying genetic blindness in a local population.
• Success was accelerated by previous work on RDH enzymes, but could also have been achieved without this work.
• The study is convincing because of the follow-up genetic, biochemical and cellular studies.
• Challenge for the future: how to use this knowledge to stop children from becoming blind?
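The mutation names in this case study pair a coding-DNA position with a codon (residue) number. A minimal sketch, assuming positions are numbered from the A of the ATG start codon (the usual cDNA convention), checks that each reported position falls in the codon named by the protein change; `codon_of` is a hypothetical helper, and because no RDH12 sequence is given here, only residue numbers are checked, not the actual bases.

```python
def codon_of(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon).

    Assumes numbering starts at the A of the ATG start codon.
    """
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos + 2) // 3, (c_pos - 1) % 3 + 1


# Single-base changes reported for RDH12: cDNA position -> protein-level name
reported = {
    677: "Y226C",
    565: "Q189X",
    146: "T49M",
    184: "R62X",
}

for pos, change in reported.items():
    codon, base = codon_of(pos)
    # Residue number embedded in the protein-level name, e.g. 226 in "Y226C"
    residue = int(change[1:-1])
    status = "consistent" if codon == residue else "MISMATCH"
    print(f"{pos}: codon {codon}, base {base} of codon -> {change} ({status})")
```

All four changes land in the codon their protein-level names imply. For example, position 677 is the second base of codon 226, which fits a Tyr (TAT/TAC) to Cys (TGT/TGC) change at the middle base.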
Development of targeted diagnostic panels
New protocols enrich for the exon regions of genes (here 163 genes), followed by high-throughput sequencing. Tested 179 patients and found 45% novel mutations in the genes.
TruSeq Exome Enrichment Workflow: requires only 1 µg of DNA and spans 21,000 genes of interest (62 Mb coverage).
Wang, Xia et al. 2013. Comprehensive molecular diagnosis of 179 Leber congenital amaurosis and juvenile retinitis pigmentosa patients by targeted next generation sequencing. Journal of Medical Genetics 50(10): 674-688

From gene discovery to gene therapy
RPE65 is the isomerohydrolase essential for regeneration of 11-cis-retinal, the chromophore of visual pigments. Gene therapy is now being used to introduce normal genes into the retinal cells of patients with this form of LCA, with some success, and researchers are now looking at treating younger patients.
Bainbridge J. W., et al. 2008. Effect of gene therapy on visual function in Leber's congenital amaurosis. N. Engl. J. Med. 358: 2231-2239.
Cideciyan A. V., et al. 2009. Human RPE65 gene therapy for Leber congenital amaurosis: persistence of early visual improvements and safety at one year. Hum. Gene Ther. 20: 999-1004.
Annear, M. et al. 2013. Successful gene therapy in older Rpe65-deficient dogs following subretinal injection of an adeno-associated vector expressing RPE65. Human Gene Therapy 24(10): 883-893.

RPE65 gene therapy in humans: 3-year outcome
15 patients aged 3 to 29 years have been injected with an adeno-associated virus (AAV) vector carrying the RPE65 gene. Three years later, retinal degeneration continues, but the visual improvement is still maintained. Future attempts aim also to slow retinal degeneration.
Cideciyan A. V., et al. 2013. Human retinal gene therapy for Leber congenital amaurosis shows advancing retinal degeneration despite enduring visual improvement. Proc Natl Acad Sci USA 110(6): E517-E525, and http://www.pnas.org/content/110/19/E1706.full.pdf+html.

Functional rescue in the rd12 mouse retina after 7m8-mediated RPE65 gene transfer.
Directed evolution of an adeno-associated virus (AAV) vector to improve gene delivery to retinal cells
Deniz Dalkara et al. 2013. In vivo-directed evolution of a new adeno-associated virus for therapeutic outer retinal gene delivery from the vitreous. Sci Transl Med 5, 189ra76; DOI: 10.1126/scitranslmed.3005708

Other uses of SNPs
• Molecularly characterize all blood group variants for any individual, to optimize blood transfusions and transplants.
• Characterize predispositions to cancer, heart disease, Alzheimer's disease, drug side effects, etc.
• Map traits in crop and animal species for inclusion in breeding programs.

Success in complex trait mapping
1: The putative disease gene is located in a chromosomal region that cosegregates with the disease in affected families.
2: This region contains multiple independent mutations that are perfectly associated with disease status in the families.
3: The characteristics of the mutations clearly alter protein function in a way related to the disease phenotype.

Cautionary notes about association analyses
In genetic association studies of "common diseases" there is a very low prior probability of detecting true positives, so even a P value of 10^-5 can still be a false positive!
Other problems include:
• selection biases in sample collection
• genotyping errors
• population substructure
• subgroup analysis
• possible complex G×G or G×E interactions

4: Example of trait mapping through whole genome sequencing (WGS): Lupski et al. 2010
James R. Lupski, M.D., Ph.D., Baylor College of Medicine
Lupski, J. et al., 2010. Whole-Genome Sequencing in a Patient with Charcot–Marie–Tooth Neuropathy (CMT). New England Journal of Medicine 362(13): 1181

What are the causes of rare neurological diseases?
Neurodegeneration can result from subtle mutations acting over prolonged time periods in tissues that do not generally regenerate. It has been linked to: 1) conformational changes causing prion disease; 2) the inability to degrade accumulated toxic proteins, e.g.
amyloidopathies and α-synucleinopathies; 3) alteration in gene copy number (CNV) and/or expression levels through mechanisms such as uniparental disomy (UPD), chromosomal aberrations (e.g., translocations), and submicroscopic genomic rearrangements including duplications, deletions and inversions.

Charcot-Marie-Tooth (CMT) phenotypes
39 different genes have been linked to CMT neurodegeneration, but clinical tests were available for only 15, and none of these applied to the family in question.

Charcot-Marie-Tooth (CMT) disease
We identified a family with a recessive form of Charcot–Marie–Tooth disease for which the genetic basis had not been identified. CMT is a common inherited disorder that affects peripheral nerves. We:
1: sequenced the whole genome of the proband,
2: identified all potential functional variants in genes likely to be related to the disease, and
3: genotyped these variants in the affected family members.

Charcot-Marie-Tooth disease: sequencing
The study used SOLiD sequencing (Sequencing by Oligonucleotide Ligation and Detection, performed by Applied Biosystems). Its accuracy in sequencing 50-base reads is estimated at approximately 99.94%. The run yielded 89.6 Gb of sequence data, an average depth of coverage of approximately 30 times per base. No copy number variants were found in CMT-related genes.
About half a million new SNPs were uncovered compared with the 7 previously sequenced human genomes: a high rate of discovery of new SNPs at a relatively low cost.
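The SOLiD figures above can be cross-checked with two simple formulas: mean depth is total sequenced bases divided by genome length, and the expected number of miscalled bases per read is read length times the per-base error rate. This is a minimal sketch; the haploid human genome size of about 3.1 Gb is an assumed round figure not given on the slide, and treating base errors as independent is a simplification.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage = total sequenced bases / genome length."""
    return total_bases / genome_size


def expected_errors_per_read(read_length: int, per_base_accuracy: float) -> float:
    """Expected number of miscalled bases in one read, assuming independent errors."""
    return read_length * (1.0 - per_base_accuracy)


GENOME_SIZE = 3.1e9  # assumption: approximate haploid human genome size (bases)
YIELD = 89.6e9       # 89.6 Gb of SOLiD sequence data (from the slide)

print(f"mean depth: {mean_depth(YIELD, GENOME_SIZE):.1f}x")
print(f"expected errors per 50-base read: {expected_errors_per_read(50, 0.9994):.2f}")
```

With these inputs the sketch prints a mean depth of 28.9x, consistent with the "approximately 30 times" quoted, and 0.03 expected errors per 50-base read.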
Charcot-Marie-Tooth disease: candidate variants
Found 159 coding-region SNPs associated with Mendelian diseases.
Looked at 40 genes linked to neuropathy: these contained 54 coding-sequence SNPs out of 3,148 putative SNPs. Two of these were in the SH3TC2 locus: one known (R954X) and one novel (Y169H).
Genetic pedigree found in this study. In the new Y169H allele, the tyrosine (Y) is conserved across animals.

Looking forward
"In the 'old' days -- meaning last week -- experts would have had to suspect which disease the patient had, then hone in on the area of the genome thought to be associated with the disorder. Even then, the results could be far from certain. 'The breakthrough is that now we would be able to make this diagnosis without having any preconceived idea that the patient had Charcot-Marie-Tooth disease,' Marion said."
Cost: about $50,000 (in 2010), compared with $15,000 for current clinical tests covering only 15 genes.

5: WGS to determine drugs suitable for cancer treatment: Iyer et al., 2012
David B. Solit, M.D., Memorial Sloan-Kettering Cancer Center
Iyer, G. et al., 2012. Genome Sequencing Identifies a Basis for Everolimus Sensitivity. Science 338: 221
News feature: http://www.reuters.com/article/2013/09/15/healthcancer-superresponders-idUSL2N0GN20120130915

A genetic basis for "outlier" cancer patients?
Why do some patients respond to drug treatment and others do not?
Clinical trial of everolimus in 45 patients with bladder cancer: all died except one, whose metastatic tumor cleared. Computed tomography images of the index patient show complete resolution of metastatic disease (arrows).

WGS of the tumor
Why did this one patient respond to everolimus? The researchers performed WGS on tumor and blood samples. They found 17,136 somatic mutations (single-nucleotide variants and small indels), a mutation rate of 6.21 per Mb; 140 of these were non-synonymous mutations within protein-coding or ncRNA regions of the genome.
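A mutation rate per Mb is simply the mutation count divided by the megabases of sequence examined. The sketch below (helper names are illustrative, not from the study) computes that rate and, working backwards from the two figures on the slide, the number of megabases the 6.21 per Mb rate implies. The exact denominator Iyer et al. used is not given here, so the back-calculated value is only a consistency check.

```python
def mutations_per_mb(n_mutations: int, bases_examined: float) -> float:
    """Somatic mutation rate expressed per megabase of sequence examined."""
    return n_mutations / (bases_examined / 1e6)


def implied_megabases(n_mutations: int, rate_per_mb: float) -> float:
    """Megabases of sequence implied by a mutation count and a per-Mb rate."""
    return n_mutations / rate_per_mb


somatic = 17_136  # somatic mutations reported for the outlier responder
rate = 6.21       # reported mutations per Mb
nonsyn = 140      # non-synonymous mutations in protein-coding or ncRNA regions

mb = implied_megabases(somatic, rate)
print(f"implied sequence examined: {mb:,.0f} Mb")
print(f"round-trip rate: {mutations_per_mb(somatic, mb * 1e6):.2f} per Mb")
print(f"non-synonymous fraction: {nonsyn / somatic:.2%}")
```

The implied denominator is roughly 2,760 Mb, on the order of the whole genome. Only a small fraction (under 1%) of the somatic mutations were non-synonymous, which is why narrowing them down to TSC1 required additional evidence.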
Somatic abnormalities in the outlier responder's genome included (from outside to inside): CNVs; mutations at ~10-Mb resolution; regulatory, synonymous, missense, nonsense, nonstop, and frameshift indel mutations (black, orange, red, green, and dark green); and intra- and interchromosomal rearrangements (light and dark blue).
A 2-bp frameshift in the TSC1 gene correlated best with tumor shrinkage. Some other patients who exhibited tumor regression also had mutations in TSC1. Alterations in TSC1 have been associated with mTORC1 dependence in preclinical models.
Best overall response of 14 sequenced trial patients. Negative values indicate tumor shrinkage (red line: threshold for partial response). Gradient arrow: patient with rapid progression in bone.

Lessons learned from this study
1. mTORC1-directed therapies may be most effective in cancer patients whose tumors harbor TSC1 somatic mutations.
2. The study demonstrates the feasibility of using whole-genome and capture-based sequencing in the clinical setting to identify previously unrecognized biomarkers of drug response in genetically heterogeneous solid tumors.
3. Hundreds of drugs have been abandoned over the years after failing clinical trials, although many had their own exceptional responders. Some may now be resurrected for use (e.g. Avastin).
4. Analyzing one or a few genes in a tumor may miss important targets for tumor treatment.

6: Homework
Objectives
A: Become familiar with the database of the HapMap project website.
B: Learn to access the HapMap database using the "Guide to HapMart" and the HapMap Tutorial.
C: Use the tools on the HapMap website to find markers that would be useful for association analysis of the RDH12 locus discussed in class.
D: Use the browser at the www.1000genomes.org website to find some SNPs in the coding region of the RDH12 locus.

Thank you
email: [email protected]