The medical relevance of genome variability
Gabor T. Marth, D.Sc.
Department of Biology, Boston College

Lecture overview
1. Phenotypic effects caused by known genetic variants
2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies
3. Genome-wide association mapping resources – the HapMap
4. Structural and epigenetic variations in disease

1. Phenotypic effects caused by known genetic variants

Many SNPs do have phenotypic effects
Some notable genetic diseases:
• cystic fibrosis
• sickle-cell anemia
(Badano and Katsanis, NRG 2002)

Genetic variants in pharmacogenetics
(Evans and Relling, Science 1999)

Using genotype information in the drug development pipeline
(Roses, NRG 2004)

Are all genetic variants functional?
• ~10 million known SNPs
• on the scale of the genome, SNPs can be described well with the “neutral theory” of sequence variation
• the vast majority of SNPs are likely to have no functional effect
[Figure: panels for 4 kb, 8 kb, 12 kb and 16 kb regions]
How do we find the few functional variants in the background of millions of non-functional SNPs?

2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies

Genetic mapping

Allelic association (linkage)
• allelic association is the nonrandom assortment between alleles, i.e. it measures how well knowledge of the allele state at one site permits prediction of the allele state at another site
• significant allelic association between a marker and a functional site permits localization (mapping) even without having the functional site in our collection
• allelic association, and the use of genetic markers, is the basis for mapping functional alleles
[Figure labels: marker site, functional site]

Mendelian diseases have simple inheritance
[Figure labels: genotype inheritance; genotype + phenotype inheritance]

Linkage analysis compares the transmission of marker genotype and phenotype in families

Complex disease – complex inheritance
(Badano and Katsanis, NRG 2002)

Allele frequency and relative risk
(Brinkman et al., Nature Reviews Genetics, advance online publication, published online 14 March 2006, doi:10.1038/nrg1828)

Association study strategies
• region(s) interrogated: single gene, list of candidate genes (“candidate gene study”), or entire genome (“genome scan”)
• direct or indirect: genotype the causative variant itself, or a marker that is co-inherited with the causative variant
• single-SNP marker or multi-SNP haplotype marker
• single-stage or multi-stage

Association study strategies: marker selection
For economy, one cannot genotype every SNP in thousands of clinical samples. Marker selection is the process by which a subset of all available SNPs is chosen:
1. hypothesis-driven (i.e. based on gene function)
2. LD-driven – based entirely on reducing the redundancy created by the linkage disequilibrium (LD) between SNPs; tags represent the other SNPs they are correlated with

Marker selection depends on genome LD
(Daly et al., NG 2001)

Case-control association testing
• genotype clinical cases and clinical controls at various polymorphisms
• search for markers with “significant” allele frequency differences between cases and controls; these markers indicate regions that may harbor causative alleles
[Figure labels: AF(cases), AF(controls)]

3. Genome-wide association mapping resources – the HapMap

The HapMap resource
• goal: to map out human allele and association structure at the kilobase scale
• deliverables: a set of physical and informational reagents

LD structure in four human populations
(International HapMap Consortium, Nature 2005)

LD varies across samples
• there are large differences in LD between different human populations, e.g. the European reference (CEU) and the African reference (YRI)
• there are differences even between samples from the same population (other European samples)

Sample-to-sample LD differences make tagSNP selection problematic
• groups of SNPs that are in LD in the HapMap reference samples may not be in LD in a future set of clinical samples
• tags that were selected based on LD in the HapMap may no longer work (i.e. represent the SNPs they were supposed to) in the clinical samples
• this may result in missed disease associations

Marker selection with additional samples
Test whether markers selected from the HapMap continue to “tag” the other SNPs in their original LD group.

Representative computational samples
Two methods of computational sample generation:
Method 1. “Data-relevant coalescent”. This algorithm uses a population genetic model to connect mutations in the HapMap reference to mutations in future clinical samples. It is a full model but computationally slow.
Method 2. The PAC method (product of approximate conditionals, Li & Stephens). This method constructs “new” samples as mosaics of existing haplotypes, mimicking the effects of recombination. It is an approximation but fast.
[Figure labels: HapMap, “cases”, “controls”]

LD difference – comparison to extra experimental genotypes
• we have analyzed two extra genotype sets collected at the HapMap SNPs in three genome regions, from our clinical collaborators (Prof. Thomas Hudson, McGill; Prof. Stanley Nelson, UCLA)
[Figure values: 0.949 +/- 0.013; 0.963 +/- 0.014; 0.978 +/- 0.010]

Genome-wide scans for human diseases
SNPs in the Complement Factor H (CFH) gene are associated with age-related macular degeneration (AMD)
(Klein et al., Science 2005)

4. Somatic, structural and epigenetic variants in disease

Somatic mutations
The detection of somatic mutations, and their distinction from inherited polymorphisms, is important for separating predisposing variants from mutations that occur during disease progression, e.g. in cancer.
1. detect the mutations
2. classify whether somatic or inherited
(Figure © Brian Stavely, Memorial University of Newfoundland)

Detecting somatic mutations with comparative data
• based on comparison of cancer and normal tissue from the same individual
• cancer tissue is often highly heterogeneous, and the somatic mutant allele may be present at low allele frequency

Detecting somatic mutations with subtraction
• if normal tissue samples are not available, we detect SNPs in cancer tissue against e.g. the human genome reference sequence
• search for evidence that these mutations are genetic, i.e. inherited
• subtract apparent mutations that are present in sequence variation databases

Detecting somatic mutations in murine mtDNA
• we have applied our methods for somatic mutation detection to murine mitochondrial sequences
• we will be applying our methods to human nuclear DNA from our collaborators
[Figure labels: heteroplasmy, homoplasmy]

Structural variants in disease
(Feuk et al., Nature Reviews Genetics 7, 85–97, February 2006, doi:10.1038/nrg1767)

Structural variations and phenotype
(Feuk et al., Nature Reviews Genetics 7, 85–97, February 2006, doi:10.1038/nrg1767)

Epigenetics and cancer
(Baylin et al., NRC 2006)

Informatics of detection / integration of varied genetic and epigenetic data
• somatic mutations
• chromosome rearrangements
• methylation profiles
• chromatin structure
• copy number changes
• gene expression profiles
• repeat expansions
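
The two detection strategies for somatic mutations described in section 4 (comparison against matched normal tissue, and subtraction of variants already present in sequence variation databases) can be illustrated with a minimal sketch. This is not the lecture's actual software: the `Variant` record, the `classify_variants` function, the label strings and the example coordinates are all hypothetical, chosen only to show the logic. Real data would also need allele-frequency thresholds to handle heterogeneous tumor tissue, which this sketch ignores.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Variant(NamedTuple):
    """A hypothetical variant record: position plus reference/alternate allele."""
    chrom: str
    pos: int  # 1-based coordinate on the reference sequence
    ref: str
    alt: str


def classify_variants(
    tumor_calls: Iterable[Variant],
    known_polymorphisms: Iterable[Variant],
    normal_calls: Optional[Iterable[Variant]] = None,
) -> Dict[Variant, str]:
    """Label each tumor variant as 'inherited' or 'candidate_somatic'.

    If calls from matched normal tissue are given, use the comparative
    approach: a tumor variant also seen in normal tissue is inherited.
    Otherwise fall back to subtraction: a tumor variant already listed in
    a variation database is treated as an inherited polymorphism.
    """
    known = set(known_polymorphisms)
    normal = set(normal_calls) if normal_calls is not None else None

    labels: Dict[Variant, str] = {}
    for variant in tumor_calls:
        if normal is not None:
            inherited = variant in normal   # comparative data
        else:
            inherited = variant in known    # subtraction
        labels[variant] = "inherited" if inherited else "candidate_somatic"
    return labels


# Made-up example calls (assumed coordinates, for illustration only).
tumor = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 40, "G", "A"),
]
database = [Variant("chr1", 100, "A", "G")]
matched_normal = [Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "C", "T")]

by_subtraction = classify_variants(tumor, database)
by_comparison = classify_variants(tumor, database, normal_calls=matched_normal)

for v in tumor:
    print(v, by_subtraction[v], by_comparison[v])
```

In this example the subtraction route leaves two candidates, while the matched normal sample removes one more. That is the expected pattern, since a database can only subtract polymorphisms that have already been catalogued. The labels are deliberately cautious ("candidate"), because absence from a database does not prove that a variant is somatic.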