Computational structural and functional genomics and transcriptomics, Chapter 27

A DATABASE DESIGNED FOR THE POLYMORPHISMS OF THE HUMAN CCR2 GENE

Apasyeva N.V.1, Yudin N.S.1*, Ignatieva E.V.1, Voevoda M.I.1, 2, Romashenko A.G.1

1 Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia; 2 Institute of Internal Medicine, SB RAMS, Novosibirsk, Russia
* Corresponding author: e-mail: [email protected]

Key words: database, human CCR2 gene, polymorphism, disease, trait, population, allele frequency

SUMMARY

Motivation: Abundant information about all currently known human genomic polymorphic markers is stored in databases whose sophisticated structure makes it difficult to search efficiently for the required information. A specialized secondary database that presents the information more compactly can substantially facilitate the user’s work.

Results: We developed a specialized database that contains information about the polymorphic markers of the CCR2 gene and neighboring DNA regions, the population frequencies of certain polymorphisms, and SNP-associated diseases and traits. The database can be useful for identifying in silico the polymorphisms of the CCR2 gene that have a causal effect on the pathogenesis of diseases associated with immune system responses.

Availability: The database is available on request from the authors.

INTRODUCTION

Single nucleotide polymorphisms (SNPs) are currently the most informative markers for the genes that cause common complex diseases. SNPs are more abundant (1 SNP per 100–1,000 bp) than other genomic polymorphic markers, and their detection is cheaper and less labor-intensive. Information about SNPs and other polymorphic markers of the human genome is stored in well-known, freely available databases, including dbSNP, HGVbase, OMIM, and others.
However, these bulky universal archives have complicated structures, which hampers the search for information about genetic markers and their associations with diseases. Moreover, the archives usually do not contain data on traits and diseases, because that would require manual annotation of the continually expanding scientific literature. Assembling polymorphism data in more specialized databases allows them to be stored in a more compact and accessible format, which makes it easier for the user to track polymorphisms.

The CCR2 gene is of major interest in connection with certain widespread serious diseases (AIDS, cancer, diabetes) (Le et al., 2004). We have previously demonstrated that the substitution of valine by isoleucine at position 64 of the protein sequence (V64I) is associated with myocardial infarction (MI) (Voevoda et al., 2002). Support for this association subsequently came from two independent teams. However, detecting an association does not yet mean that this particular polymorphism is the cause of disease predisposition. It may simply be located on the chromosome near another polymorphism that is the true cause of the disease. A database of the human CCR2 polymorphisms is therefore required to search for the causative polymorphism at the MI predisposition locus. We hope that the newly created database will be a helpful tool for researchers working on the CCR2 gene.

BGRS’2006, Part 1

METHODS AND ALGORITHMS

Internet resources were searched using the National Center for Biotechnology Information service (http://www.ncbi.nlm.nih.gov/). The database was created as a set of tables in MS Excel format. Hyperlinks to the respective URLs were added manually where required. Annotated abstracts and full article texts were the main sources used to populate the database.
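The region boundaries quoted in the sections that follow (contig NT_079509 coordinates for the longest mRNA, isoform A) can be turned into a simple lookup that assigns a polymorphism to a gene region. This is only a minimal sketch in Python, not the authors' implementation: the database itself was built as MS Excel tables, the names `REGION_STARTS` and `classify_position` are hypothetical, the 5′UTR span of exon 2 (46056–46105) is inferred from the stated exon and coding boundaries, and every position upstream of exon 1 is labeled as promoter only because the paper gives no promoter coordinates.

```python
from bisect import bisect_right

# Region start positions in NT_079509 contig coordinates, taken from the
# boundaries quoted in the paper for isoform A. Each region runs from its
# start to one position before the next start.
# Assumption: the exon 2 5'UTR (46056-46105) is inferred from the stated
# exon 2 start (46056) and the stated coding start (46106).
REGION_STARTS = [
    (42728, "exon 1 (5'UTR)"),
    (42757, "intron 1"),
    (46056, "exon 2 (5'UTR)"),
    (46106, "exon 2 (coding)"),
    (47047, "intron 2"),
    (48255, "exon 3 (coding)"),
    (48439, "3'UTR"),
    (49506, "3' flanking region"),
]

_STARTS = [start for start, _ in REGION_STARTS]


def classify_position(pos: int) -> str:
    """Return the gene-region label for a contig position."""
    # bisect_right counts how many region starts are <= pos.
    i = bisect_right(_STARTS, pos)
    if i == 0:
        # Assumption: everything upstream of exon 1 is treated as promoter,
        # since the paper does not state where the promoter region begins.
        return "5' flanking region (promoter)"
    return REGION_STARTS[i - 1][1]
```

For example, `classify_position(46200)` returns `"exon 2 (coding)"`. Tallying the labels over a list of positions with `collections.Counter` would give per-region counts of the kind reported in the results.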
IMPLEMENTATION AND RESULTS

We have developed a specialized database that contains information about polymorphic markers (predominantly SNPs) in the CCR2 gene and its neighboring DNA regions, their population frequencies, and the traits and diseases associated with these polymorphisms. The database consists of 4 interrelated tables.

The “GENE” table contains general information about the gene: its full and short names, links to the gene's entries in the EntrezGene, GeneCards, EMBL/GenBank and NCBI databases, and a link to the protein entry in the SwissProt database.

The “POLYMORPHISMS” table contains the following data for each polymorphism: identification number (rs#) in the dbSNP database, nucleotide position in the chromosomal contig, positions of the substituted amino acids in the protein, and validation status. It also includes the effect of the polymorphism on the gene expression level (if available), links to entries in the NCBI, UCSC and SwissProt databases, and references to the published literature. The nucleotide sequences from dbSNP that flank the polymorphic site are provided as well.

The “DISEASES” table lists the names of the diseases, the presence of an SNP-disease association (yes or no), and the ethnic group, sex and age of the subjects in the examined sample.

The “POPULATIONS” table includes the country and region, the name of the examined population, and the frequencies of the minor allele and of its genotypes.

The third and fourth tables contain hyperlinks to the original publications and to their abstracts in the PubMed database, and currently cover SNP CCR2-64I only.

The compiled database contains information about 41 polymorphisms. Besides 36 SNPs, it covers 4 single-nucleotide deletions and 1 dinucleotide deletion. Of the 41 polymorphisms, 4 are located in the promoter region, 21 in the first intron (Fig. 1, positions 42757–46055), 2 in the second intron (positions 47047–48254), 6 in the coding parts of exons (positions 46106–47046 and 48255–48438; 3 of these are nonsynonymous), 4 in the 3′UTR (positions 48439–49505), and 4 in the distant 3′-flanking region of the gene. The polymorphisms were classified according to the structure of the longest mRNA (isoform A). The validation status of 6 polymorphisms is “unknown”; the existence of the others is experimentally supported. There are 66 records describing disease associations and 131 records of population frequencies. We intend to further annotate and improve the database.

DISCUSSION

The CCR2 gene has 3 exons and covers about 7 kb on human chromosome 3p21 (Fig. 1). There are two known alternative mRNA isoforms, A and B. Both have identical 5′-ends composed of exon 1 (positions 42728–42756) and the 5′-part of exon 2 (positions 46056–47046), but they differ in the mRNA regions encoding the carboxyl end of the protein and the 3′UTR. The two isoforms encode functional receptors that differ in subcellular localization.

The 41 polymorphisms in the database are unevenly distributed along the nucleotide sequence of the gene. SNPs are densest in the first intron and sparse in the second. The average polymorphism density (1 SNP per 200 bp) agrees with the average density in the human genome. However, because we pooled the data from numerous samples that differed in ethnic group, sex, age and other features, the SNP list will grow with time.

Evidently, the information about SNP associations with certain diseases is inconsistent. We found 5 publications that examined the association of CCR2-64I with MI (Gonzalez et al., 2001; Voevoda et al., 2002; Ortlepp et al., 2003; Petrkova et al., 2003; Bjarnadottir et al., 2005).
The association was confirmed in 3 of the 5 (Voevoda et al., 2002; Ortlepp et al., 2003; Petrkova et al., 2003). The environmental (diet, lifestyle, etc.) and/or genetic factors that may abolish the association between CCR2-64I and MI are yet to be found. Notably, the CCR2 entry in the popular OMIM database does not mention an association with MI. Regrettably, OMIM can so far be recommended only as a preliminary introduction to the problem. The information about the population frequencies of the CCR2-64I allele makes it possible to compare the frequency of this polymorphism with cardiovascular morbidity and mortality in the population under study.

Figure 1. A schematic representation of the CCR2 gene with known mRNAs and polymorphisms. a – the region of the gene with polymorphisms whose positions correspond to the contig nucleotide sequence with accession number NT_079509. The arrow points to the transcription start; the coding parts of exons are black, and the 5′UTR and 3′UTR are shaded. b – the known tissue-specific mRNAs, isoforms A and B. The coding parts of exons are black, the 5′UTR and 3′UTR are shaded, and the introns removed by splicing are shown by a thin line.

The nucleotide sequences flanking the different SNPs in the CCR2 gene, which we collected in the database, will make it possible to search in silico for the polymorphism that causes predisposition to MI. Functional analysis of the polymorphisms in the noncoding parts of the gene requires specialized software tools that predict potential transcription factor binding sites, splice sites and RNA secondary structures. Functional analysis of the SNPs that cause nonsynonymous substitutions in the protein can be combined with the software tools commonly used to predict protein 3D structures and to identify functional motifs in proteins.

Two CCR2 mRNAs have been detected. Isoform A includes all three exons of the CCR2 gene.
Isoform B contains the first and second exons and part of the second intron. Interestingly, the second intron differs from the rest of the CCR2 gene in its low density of polymorphisms. The polymorphisms are located on the flanks of this intron, not in its central part. It is possible that certain SNPs in the flanking sequences are involved in the regulation of splicing. We expect that computer-assisted contextual DNA analysis will clarify whether the polymorphisms are involved in regulating CCR2 gene expression. As a result, each polymorphism will be assigned a weight score and ranked according to the priority of its putative effect on the final phenotypic trait (MI).

The high cost of genotyping raises the question of how to choose the best set of SNPs, that is, the smallest and most informative one, for an association study of candidate genes or chromosomal loci. It would appear that preliminary weighting of SNPs according to their potential functional contributions to a particular trait (disease) would allow us to develop a new algorithm for finding the causative polymorphisms that predispose to common complex diseases.

Thus, the proposed database of human CCR2 gene polymorphisms provides informative guidelines for the in silico search for polymorphisms related to diseases associated with immune system responses, in particular those causing predisposition to MI.

ACKNOWLEDGEMENTS

This work was supported in part by the International Science and Technology Center (grant No. 2311), by the program “Dynamics of the gene resources of plant, animals and human” of the Russian Academy of Sciences, and by the innovation project of the Federal Agency of Science and Innovation IT-CP.5/001 “Development of software for computer modeling and design in postgenomic system biology (system biology in silico)”. The authors are grateful to Lokhova I.V. for assistance in retrieving the full texts of publications.
REFERENCES

Bjarnadottir K. et al. (2005) Examination of genetic effects of polymorphisms in the MCP-1 and CCR2 genes on MI in the Icelandic population. Atherosclerosis (Epub ahead of print).
Gonzalez P. et al. (2001) Genetic variation at the chemokine receptors CCR5/CCR2 in myocardial infarction. Genes Immun., 2(4), 191–195.
Le Y. et al. (2004) Chemokines and chemokine receptors: their manifold roles in homeostasis and disease. Cell. Mol. Immunol., 1(2), 95–104.
Ortlepp J.R. et al. (2003) Chemokine receptor (CCR2) genotype is associated with myocardial infarction and heart failure in patients under 65 years of age. J. Mol. Med., 81(6), 363–367.
Petrkova J. et al. (2003) CC chemokine receptor (CCR)2 polymorphism in Czech patients with myocardial infarction. Immunol. Lett., 88(1), 53–55.
Voevoda M.I. et al. (2002) Association of the CCR2 chemokine receptor gene polymorphism with myocardial infarction. Dokl. Biol. Sci., 385, 367–370.