Download ppt - Chair of Computational Biology
Pharmacogenomics

Pharmacogenetics is an old discipline. One may distinguish pharmacogenetics (the study of a single gene) from pharmacogenomics (the study of many genes or entire genomes), or use pharmacogenomics for approaches that go beyond DNA to include mRNA and proteins.

Today, it is possible to assess entire pathways that might be relevant to disease or to drug response at the DNA, mRNA and protein levels. Eventually, the entire genome, transcriptome and proteome will be available. Therefore, pharmacogenetics/-genomics and disease genetics/genomics are undergoing similar transitions, with a shift in focus from Mendelian examples (one gene, one disease) to more complex modes of genetic causation.

12. Lecture WS 2003/04 Bioinformatics III

Where do drugs interact with proteins?

This figure shows the paths that are taken by the anti-epileptic drug phenytoin and the angiotensin-converting enzyme (ACE) inhibitor imidapril in the human body.

Phenytoin is absorbed into the bloodstream at the gut and circulated through the liver to the brain. It crosses the blood–brain barrier, where it binds and inhibits its target, neuronal sodium channels. It is pumped back out across the blood–brain barrier into the bloodstream by multidrug resistance protein 1 (MDR1, also known as ABCB1) efflux pumps. At the liver, phenytoin is metabolized by the cytochrome P450 enzymes CYP2C9 and CYP2C19, and it is eliminated through the kidneys.

Imidapril is a prodrug. After its absorption from the gut into the bloodstream it is hydrolysed in the liver to the active metabolite imidaprilat. Imidaprilat binds and inhibits ACE in the plasma. Imidaprilat is also eliminated through the kidneys.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

These associations were compiled from the literature by using the keywords „pharmacogenetics“ OR „pharmacogenomics“, „association study“ AND „drug response“, „polymorphism“ AND „drug response“.
Therefore, the list omits many polymorphisms and probably includes some false positives. Most of the polymorphisms are either in the drug target or in a protein that is in the pathway in which the target acts.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

The SNP Consortium

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

Haplotypes

The diagram shows 5 haplotypes. 12 SNPs are localized in order along the chromosome. The letters on the top indicate groups of SNPs that are in perfect pairwise linkage disequilibrium (LD) with one another, and the numbers on the bottom indicate each of the 12 SNPs. SNP 9 is the causal variant, which in this simple example determines drug response: allele C results in a therapeutic response, whereas allele G results in an adverse reaction.

In this example, selecting just one SNP from each of the groups A–E would be sufficient to fully represent all of the haplotype diversity. Each haplotype can be identified by just five tagging SNPs (tSNPs), and the causal variant would be tagged even if it were not itself typed. So, the tSNP profiles that are highlighted predict an adverse reaction to the medicine. Normally, LD patterns are not so clear-cut, and statistical methods are required to select appropriate sets of tSNPs.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

Haplotypes (b)

The diagram depicts the same 12 SNPs, but with different associations among them, as might happen in a different population group. Because the patterns of LD are different, some patients would be misclassified if the same five tSNPs were used and interpreted in the same way. Using the same SNP profiles as defined in population A, haplotype profiles 1, 2 and 3 are predicted to have allele C at the causal SNP 9 (a therapeutic response), whereas haplotype profiles 4 and 5 are predicted to have an adverse response.
However, because the pattern of association has changed, the new haplotypes 6 and 7 that occur in population B are misclassified.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

Discovering genotypes underlying phenotypes: from mendelian diseases to complex diseases

Traditional view: over the past decade, about 1200 genes causing human diseases or traits have been identified, largely by positional cloning. Identification of the gene, and with it knowledge of the relevant protein(s), often leads to an understanding of the molecular and physiological basis of the disease phenotype.

Successful examples of positional cloning:
- identification of genes underlying chronic granulomatous disease, X-linked muscular dystrophies, cystic fibrosis, Fanconi anemia, ataxia telangiectasia, neurofibromatosis I and Huntington disease
- identification of genes underlying hereditary predispositions to cancer, including retinoblastoma, breast cancer, polyposis colorectal cancer

Botstein & Risch, Nature Gen. 33, 228 (2003)

Linkage mapping

Positional cloning begins with linkage analysis. Families in which the disease phenotype segregates are analyzed using a group of DNA polymorphisms. This is an ideal method for diseases with a very clear diagnosis.

The limit of resolution remains the number of meioses in which crossovers might have occurred. In favorable cases (such as cystic fibrosis), the patterns of crossovers in the region of the gene among the cohorts studied leave only a few predicted genes, all within about 1 cM (~1 Mb), as likely candidates. In less favorable cases, there may be as many as a few hundred predicted genes that might be the relevant disease genes.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Linkage disequilibrium

Greater power in fine-mapping is obtained by haplotype analysis, in which all markers are considered simultaneously as haplotypes rather than individually.
Haplotype analysis allows the inference of likely historical crossover points, which localize the disease mutation. New algorithms based on haplotype analysis are being developed to estimate statistically the likely locations of such crossovers and thus the likely location of the disease mutation.

The success of linkage disequilibrium (LD) mapping depends heavily on the degree of genetic heterogeneity underlying a disease sample. Unless one or a few mutations account for most instances of disease, the signal will be too inconsistent to find mutations. Some degree of heterogeneity is tolerable and can be overcome by clustering of disease chromosomes.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Lessons from cloned mendelian genes

HGMD lists 27,000 mutations in 1222 genes associated with human diseases and traits. In-frame amino acid substitutions are the most frequent. Less than 1% are found in regulatory regions.

These data provide overwhelming support for the notion that mendelian clinical phenotypes are associated primarily with alterations in the normal coding sequence of proteins.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Criteria for amino acid replacements

Distinguish (1) the biochemical severity of missense changes, and (2) the location and/or context of the altered amino acid in the protein sequence.

A useful guide is the Grantham scale, which categorizes codon replacements into classes of increasing chemical dissimilarity between the encoded amino acids:
- conservative
- moderately conservative
- moderately radical
- radical
- „stop“ or nonsense

There is a clear relationship between the severity of an amino acid replacement and the likelihood of clinical observation.

Botstein & Risch, Nature Gen. 33, 228 (2003)
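The Grantham classes listed above can be sketched as a simple lookup. This is only an illustration: the distance table below holds a handful of pairs whose values are quoted from memory of the Grantham matrix and should be checked against the published table, and the class boundaries (50, 100, 150) are an assumed, commonly used convention that the slide does not state. The names GRANTHAM, CLASS_BOUNDS and classify_replacement are hypothetical.

```python
# Illustrative Grantham distances for a few amino acid pairs (one-letter codes).
# ASSUMPTION: values quoted from memory; verify against the published matrix.
GRANTHAM = {
    frozenset("LI"): 5,
    frozenset("KR"): 26,
    frozenset("DE"): 45,
    frozenset("ST"): 58,
    frozenset("AG"): 60,
    frozenset("RS"): 110,
    frozenset("CF"): 205,
    frozenset("CW"): 215,
}

# ASSUMPTION: class boundaries follow a common convention, not the slide.
CLASS_BOUNDS = [
    (50, "conservative"),
    (100, "moderately conservative"),
    (150, "moderately radical"),
]


def classify_replacement(ref: str, alt: str) -> str:
    """Return the severity class of replacing amino acid `ref` by `alt`.

    `alt` may be "*" to denote a stop codon (nonsense change).
    Raises KeyError for pairs missing from the small illustrative table.
    """
    if alt == "*":
        return "nonsense"
    if ref == alt:
        return "synonymous"
    distance = GRANTHAM[frozenset((ref, alt))]
    for upper_bound, label in CLASS_BOUNDS:
        if distance <= upper_bound:
            return label
    return "radical"


if __name__ == "__main__":
    for ref, alt in [("L", "I"), ("S", "T"), ("R", "S"), ("C", "W"), ("Q", "*")]:
        print(ref, "->", alt, ":", classify_replacement(ref, alt))
```

A real analysis would load the full 20 x 20 matrix instead of the eight pairs shown here.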
Clinical severity increases with severity of AA substitution

Purple bars represent the ratio of frequencies of the indicated class of change relative to conservative changes, for functional human genes compared with pseudogenes. Orange bars represent the ratio of the likelihood of clinical observation for the indicated class of change versus a conservative change.

A nonsense change is 9 times more likely to present clinically than a conservative amino acid substitution. For the other changes, the ratios are 3, 2.3, and 1.8.

The same trend exists for the relative abundance of the different types of substitutions found in SNPs from human genes as compared with their abundance in pseudogenes. Evolution selects against radical changes!

Botstein & Risch, Nature Gen. 33, 228 (2003)

Clinical significance correlates with degree of cross-species evolutionary conservation

An obvious way to measure the importance of a particular amino acid is its conservation across species. The figure shows that the disease probability decreases monotonically with the number of amino acid differences among species. In simple terms: if evolution allows mutations at a position between species, this amino acid cannot be so crucial.

Figure: relative risks (log odds ratios) for the observed versus the expected number of amino acid changes. Purple: severe diseases. Orange: milder disease mutations (G6PD).

Botstein & Risch, Nature Gen. 33, 228 (2003)

Correlation of clinical severity and severity of gene lesion

In numerous cases, genotype-phenotype correlation has identified milder forms of disease that are associated with less severe mutations. A classic example is Duchenne (severe) and Becker (mild) muscular dystrophy: Duchenne is caused primarily by frame-shift deletions, Becker is caused by in-frame changes.
Other examples:
- hemolytic anemia: associated with globin mutations
- hemochromatosis: high penetrance with a radical amino acid substitution, low penetrance with a milder amino acid substitution
- Gaucher disease: a common milder mutation is associated with fewer clinical symptoms
- G6PD deficiency: severity of the amino acid substitution correlates with clinical significance

Botstein & Risch, Nature Gen. 33, 228 (2003)

The future: understand complex diseases

Classical linkage analysis and positional cloning remain the methods of choice for identifying rare, high-risk, disease-associated mutations, owing to their clear inheritance patterns. Knowledge of the human genome sequence will certainly help.

But „simple“ mendelian inheritance is often not so simple:
- multiple different mutations are often identified in the same or in different loci, with variable phenotypic effects and highly variable associated risks.
- mutational or genotypic heterogeneity can explain some of the clinical variability observed in single-gene diseases, but usually not all of it: modifier genes and environmental contributors also play a role.

For non-mendelian diseases and for diseases with multi-gene effects, all contributing loci might be thought of as „modifiers“, as no single locus of large effect exists.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Large-scale SNP discovery projects

Two strategies: „map-based“ or „sequence-based“. It is unclear which one will be more effective.

The private sequencing effort has reported 2.1 million SNPs (Venter et al. 2001) and the public SNP consortium has identified 1.4 million SNPs (Sachidanandam et al. 2001).

Rates of false positives (10-15%) are modest. Rates of false negatives (undetected SNPs) are more problematic. Neither collection was based on the sequences of many individuals, so many lower-frequency (< 10%) SNPs were not detected, especially those that are specific to a single population.
Botstein & Risch, Nature Gen. 33, 228 (2003)

Fine-scale SNP discovery projects

Study A analyzed 313 genes (720 kb of genomic sequence) in 84 ethnically diverse individuals. Only 2% (or 6% excluding singletons) of the SNPs identified are in dbSNP, suggesting that there exist many more SNPs than the roughly 1.2 million unique SNPs in dbSNP.

Study B analyzed 65% of the unique sequence of chromosome 21 in 10 individuals. 36,000 SNPs were identified, implying > 6.4 million SNPs for the whole genome. Only 45% of the SNPs in dbSNP were found in this study.

Conclusion: the number of SNPs in the human genome (defined by a rare-allele frequency of 1% or greater in at least one population) is likely to be > 15 million. Note: there are only 30,000 genes.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Fine-scale SNP discovery projects

The alternative strategy to „map-based“ is based on genes and sequence. Here, genotyping focuses on SNPs identified in coding regions that alter or terminate the amino acid sequence, disrupt splice sites, or occur in promoter regions. The table shows that we expect 50,000-100,000 such gene-related SNPs.

Based on results from cloned mendelian diseases, one can prioritize amino acid replacements according to (a) the severity of the alteration, and (b) the degree of evolutionary conservation.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Can disease-associated alleles be predicted from sequence?

The main feature that distinguishes a map-based approach from a sequence-based approach to genome-wide association studies is the degree to which functional variants can be predicted on the basis of sequence in, for example, coding and/or conserved regions of the genome.

Table 1 showed that, for mendelian phenotypes, most diseases are the result of changes that cause loss of or alterations in the encoded proteins.
< 1% of listed mutations occur in regulatory regions (these would be more difficult to predict from sequence). The greatest risk of a disease phenotype is associated with splice-site mutations, deletions and insertions.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Can disease-associated alleles be predicted from sequence?

Can this distribution of risks be extrapolated to alleles of moderate to low relative risk, which are assumed to underlie complex disease phenotypes?

Literature: 18 changes, comprising 15 amino acid substitutions, 1 large deletion, 1 frameshift, and 1 variation in a promoter region. This is not very different from high-risk diseases and is also biased towards substitutions.

Botstein & Risch, Nature Gen. 33, 228 (2003)

Natural variation in human membrane transporter genes: identify evolutionary and functional constraints

Large-scale SNP and haplotype maps have analyzed only 24-40 chromosomes within an ethnic population and have therefore identified only common variants (> 5%) with good accuracy. These screens could not identify less common variants that may have more severe functional consequences.

Little is known about the relative levels of genetic diversity within classes of genes. Here: focus on membrane transporters, which are important drug targets.

Leabman et al. PNAS 100, 5896 (2003)

Structure of Membrane Transporters

Transmembrane (TM) helices are stretches of about 25 residues that are purely hydrophobic; they can be predicted with > 90% accuracy. Typically 12-14 TM helices align to form a pore. External domains are very variable in size.

Figure: predicted secondary structures of two representative membrane transporters from the ABC and SLC superfamilies. The transmembrane topology is schematically rendered.

Leabman et al.
PNAS 100, 5896 (2003)

Membrane Transporters

Membrane transporters play a critical role in many biological processes:
- they maintain cellular and organismal homeostasis by importing nutrients essential for cellular metabolism
- they export cellular waste products and toxic compounds
- they are important in drug response: they provide the targets for many commonly used drugs
- they are major determinants of drug absorption, distribution, and elimination

Two major superfamilies:
- ABC (ATP-binding cassette) transporters
- SLC (solute carrier) transporters, which take up neurotransmitters, nutrients, heavy metals ...

Here: screen for variation in a set of 24 genes encoding membrane transporters.

Leabman et al. PNAS 100, 5896 (2003)

24 TM transporters with potential roles in drug response

Transporters are grouped by transporter family (e.g., OCT1, OCT2, and OCT3 belong to the SLC22 family; CNT1 and CNT2 belong to the SLC28 family). Blue ovals: transporters of the SLC superfamily; red rectangles: ABC superfamily; green hexagon: P-type ATPase. Typical substrates for each family of transporters are listed. The direction of transport is indicated by an arrow pointing into the cell (influx) or out of the cell (efflux).

Leabman et al. PNAS 100, 5896 (2003)

Aims of SNP scan

Analyze 247 DNA samples from an ethnically diverse collection (100 European Americans, 100 African Americans, 30 Asians, 10 Mexicans, 7 Pacific Islanders). Identify SNPs.

Aim 1: determine the levels and patterns of genetic diversity
- in different ethnic groups
- in different transporter families
- across different structural regions of membrane transporters.

Aim 2: combine population-genetic and phylogenetic analysis to identify amino acid residues and protein domains that may be important for human fitness. Infer the functional consequences of amino acid substitutions.
To identify polymorphisms, screen all exons plus 35-100 bp of flanking intronic sequence.

Leabman et al. PNAS 100, 5896 (2003)

Variation in transporter genes

680 biallelic SNPs, 2 tri-allelic SNPs. 91/477 SNPs were already deposited in dbSNP.

Leabman et al. PNAS 100, 5896 (2003)

Population specificity

421/680 SNPs are population specific. 248/421 are singletons, i.e. they occur only once among 494 chromosomes. (This explains why large-scale SNP projects have so far identified far fewer SNPs.) Of the 259 SNPs that are not population specific, 83 are present in all 5 populations.

Few population-specific alleles were found at high frequency:
- only 4/278 African American-specific alleles had a frequency > 0.1
- only 1/50 Asian-specific alleles had a frequency > 0.1
- the European American population sample had no population-specific allele (0/80) at a frequency > 0.05.

The relatively high incidence of moderately frequent population-specific alleles in African Americans may facilitate the identification of ethnic-specific disease loci in this population.

Leabman et al. PNAS 100, 5896 (2003)

Analysis of Nucleotide Diversity

On average, genetic variation in membrane transporters (π) is similar to that in other genes. Next: study nucleotide diversity in TM domains and in loop domains.

Leabman et al. PNAS 100, 5896 (2003)

Variation across structural regions

As expected, amino acid diversity (πNS) is significantly lower in TM domains than in loops. This is consistent with the observation that TM domains are evolutionarily more conserved than loops, and suggests that there are constraints on the TM domains of transporters.

EC: evolutionarily conserved
EU: evolutionarily unconserved

The agreement suggests that constraints on structural regions of proteins (e.g. TM domains) occur across both long and short evolutionary distances for this set of proteins.

Leabman et al.
PNAS 100, 5896 (2003)

ABC and SLC superfamilies

The ABC and SLC superfamilies of transporters have evolved to transport structurally diverse biological molecules. The TM domains of both superfamilies contain residues and structural domains responsible for substrate specificity. Only the loops of the ABC transporters contain ATP-binding domains.

Observation: amino acid diversity (πNS) is extremely low in the TM domains of ABC transporters, much lower than in the TM domains of SLC family members.

Leabman et al. PNAS 100, 5896 (2003)

Paralogue identification

Figure: predicted secondary structures of two representative membrane transporters (BSEP and CNT1) from the ABC and SLC superfamilies, showing the positions of nonsynonymous SNPs (those leading to amino acid changes). The transmembrane topology schematic was rendered by using the program TOPO. Nonsynonymous amino acid changes are shown in red.

Leabman et al. PNAS 100, 5896 (2003)

Evolutionary conservation

Surprisingly, the extent of amino acid diversity did not parallel evolutionary conservation: the fraction of EU residues in the TM domains of the ABC superfamily is significantly higher than in the TM domains of the SLC superfamily. This implies that a protein segment (the TM domains of ABC transporters) is more constrained within humans than across species, which may be related to substrate properties.

For the SLC superfamily, NS-EC is significantly lower than NS-EU, both for the TM domains and for the loops. For the TM domains of the ABC superfamily, NS-EC ~ NS-EU. This may reflect special functional demands on the TM domains of this superfamily.

Again: variation among humans does not always parallel phylogenetic variation!

Leabman et al. PNAS 100, 5896 (2003)
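The diversity comparisons on the preceding slides rest on a per-site diversity statistic. As a minimal sketch, nucleotide diversity can be computed as the average number of pairwise differences per site among sampled chromosomes. The function name and the toy sequences below are hypothetical, and the study's own estimator (for example, how it restricts the calculation to nonsynonymous sites or to TM versus loop regions) is not described on the slides.

```python
from collections import Counter


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site.

    `haplotypes` is a list of equal-length, aligned sequences,
    one per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be non-empty and of equal length")

    n_pairs = n * (n - 1) // 2
    total_differences = 0
    for column in zip(*haplotypes):
        counts = Counter(column)
        # Pairs carrying the same allele at this site do not differ.
        identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
        total_differences += n_pairs - identical_pairs

    return total_differences / n_pairs / length


if __name__ == "__main__":
    toy_sample = ["AAAA", "AAAT", "AATT"]  # hypothetical data
    print(nucleotide_diversity(toy_sample))
```

Applying the same function separately to the columns of TM domains and of loops would give region-specific values that can then be compared.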
Back to Pharmacogenomics

With the linkage of genomics with transcriptomics and proteomics, pharmacogenomics is undergoing a similar shift in focus from Mendelian examples to more complex modes of genetic causation.

Candidate genes for variable drug response:
(1) genes that code for drug-metabolizing enzymes (DMEs). Most DME-encoding genes have polymorphisms that have been shown to influence enzymatic activity.
(2) proteins involved in drug transport. Drug transporters (e.g. ABC and SLC) show considerable genetic variation, including many functional polymorphisms.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)

Future of Pharmacogenomics

Detecting the effect of a gene variant that explains 5% of the total phenotypic variation in a quantitative response to a drug by typing 100 independent SNPs would require 500 patients to provide an 80% chance of detection, assuming an experiment-wide false-positive rate of 5%.

The behaviour of most drugs will be influenced by a wide range of gene products (DMEs, transporters, targets, and others), and in many cases the importance of polymorphisms in one of the relevant genes might depend on polymorphisms in other genes.

As a simple example, CYP1A2 and N-acetyltransferase 2 act at different stages in the pathway that metabolizes compounds in burnt meat. Variants might interact to influence the risk of colorectal cancer.

The polymorphisms indicate that regulatory variants have a far more important role in variable drug response than they do in Mendelian diseases.

Goldstein et al. Nature Rev. Gen. 4, 937 (2003)
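The sample-size statement on the last slide combines a multiple-testing correction with a power calculation. The sketch below shows one rough way to reason about it, under stated assumptions: a Bonferroni correction for the 100 SNPs, and a Fisher z approximation for detecting a correlation whose square equals the variance explained. Neither choice is taken from the source, so the result is only expected to land in the same range (a few hundred patients) and need not reproduce the figure of 500 quoted above. The function names are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def bonferroni_alpha(experiment_wide_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return experiment_wide_alpha / n_tests


def approx_sample_size(variance_explained: float,
                       alpha_per_test: float,
                       power: float) -> int:
    """Approximate sample size via the Fisher z transformation.

    ASSUMPTION: the variant's effect is treated as a correlation r with
    r**2 equal to the variance explained, tested two-sided.
    """
    r = sqrt(variance_explained)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_per_test / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    alpha = bonferroni_alpha(0.05, 100)  # 100 independent SNPs
    n = approx_sample_size(0.05, alpha, 0.80)
    print("per-test alpha:", alpha)
    print("approximate number of patients:", n)
```

The main point of the sketch is the scaling: typing more SNPs lowers the per-test threshold, and smaller effects (lower variance explained) push the required sample size up quickly.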