Download a database designed for the polymorphisms of the human ccr2 gene

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Saethre–Chotzen syndrome wikipedia , lookup

Copy-number variation wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Metagenomics wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Human genome wikipedia , lookup

Genomics wikipedia , lookup

Pathogenomics wikipedia , lookup

History of genetic engineering wikipedia , lookup

Genetic engineering wikipedia , lookup

Point mutation wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Gene expression programming wikipedia , lookup

Gene wikipedia , lookup

NEDD9 wikipedia , lookup

Gene therapy of the human retina wikipedia , lookup

Genome evolution wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Genome-wide association study wikipedia , lookup

Gene expression profiling wikipedia , lookup

Gene therapy wikipedia , lookup

Gene desert wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Tag SNP wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Gene nomenclature wikipedia , lookup

Human genetic variation wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Helitron (biology) wikipedia , lookup

Genome (book) wikipedia , lookup

RNA-Seq wikipedia , lookup

Polymorphism (biology) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Microevolution wikipedia , lookup

Designer baby wikipedia , lookup

Public health genomics wikipedia , lookup

Transcript
Computational structural and functional genomics and transcriptomics
Chapter
27
#
A DATABASE DESIGNED FOR THE POLYMORPHISMS
OF THE HUMAN CCR2 GENE
Apasyeva N.V.1, Yudin N.S.1*, Ignatieva E.V.1, Voevoda M.I.1, 2, Romashenko A.G.1
1
Institute of Cytology and Genetics, SB RAS, Novosibirsk, 630090, Russia; 2 Institute of Internal
Medicine, SB RAMS, Novosibirsk, Russia
*
Corresponding author: e-mail: [email protected]
Key words:
database, human CCR2 gene, polymorphism, disease, trait, population, allele frequency
SUMMERY
Motivation: Abundant information about all the currently known human genomic
polymorphic markers is stored in the databases, whose sophisticated structure makes
difficult efficient search of the required information. The development of a specialized
secondary database with the information presented more compactly can substantially
facilitate the user’s work.
Results: We developed a specialized database that contains information about the
polymorphic markers of the CCR2 gene and neighboring DNA regions, population
frequencies of certain polymorphisms and SNP associated diseases and traits. The database
can be useful for extracting in silico the polymorphisms of the CCR2 gene that have causal
effect on the pathogenesis of diseases associated with immune system responses.
Availability: The database is available on request from the authors.
INTRODUCTION
Single nucleotide polymorphisms (SNPs) are currently the most informative markers for
the genes that cause common complex diseases. SNP are more abundant (1 SNP per 100–
1,000 bp), their detection is cheaper and less labor consuming than that of the other genomic
polymorphic markers. Information about the SNPs and other polymorphic markers of the
human genome is stored in the well known free available databases, including dbSNP,
HGVbase, OMIM, and others. However, the bulky universal archives have complicated
structures, this poses obstacles to search of the needed information about genetic markers
and associations with diseases. Besides, these archives usually do not contain the data about
the traits and diseases, because manual annotation of the continually expanding scientific
information is required. Assembly of the data for polymorphisms in more specialized
databases would allow to store them in the more compact and accessible format. Thus, the
user’s job to tracking polymorphisms would be facilitated.
The CCR2 gene is of major interest with reference to certain widespread threatening
diseases (AIDS, cancer, diabetes) (Le et al., 2004). We have previously demonstrated that
substitution of valine by isoleucine at position 64 of the protein sequence (V64I) is
associated with myocardial infarction (MI) (Voevoda et al., 2002). Support for this
association subsequently came from two independent teams. However, the detection of the
association does not yet mean that this particular polymorphism is the cause of disease
predisposition. It may be located on a chromosome nearby another truly disease-causative
polymorphism. The database for the human CCR2 polymorphisms is required for search of
the causative polymorphism at the predisposition locus to MI. It is hoped that the database
we newly created would be a helpful tool to researchers dealing with the CCR2 gene.
BGRS’2006
28
Part 1
METHODS AND ALGORITHMS
Search in the Internet resourses was done by using the National Center for Biotechnology
Information Service (http://www.ncbi.nlm.nih.gov/). The database was created as tables on the
MS Excel format. Hyperlinks to the respective URL were added manually when required.
Annotated abstracts and full article texts were the main sources for filling up the database.
IMPLEMENTATION AND RESULTS
We have developed a specialized database that contains information about
polymorphic markers (predominantly SNPs) in the CCR2 gene and its neighboring DNA
regions, their population frequencies and also about the trait and diseases associated with
these polymorphisms. The database consists of 4 interrelated tables. Table “GENE”
contains the general information about the gene: its complete and short names, references
to the cards of the gene in the databases EntrezGene, GeneCards, EMBL/GenBank,
NCBI, references to the protein card in the SwissProt databases. Table
“POLYMORPHISMS” contains the following data about polymorphisms: identification
number (rs#) in the dbSNP database, nucleotide position in the chromosomal contig,
positions of the substituted amino acids in the protein, validation status. The information
includes also the polymorphism effect on the gene expression level (if available), links to
cards in the NCBI, UCSC and SwissProt databases, references to the published literature
data; nucleotide sequences from the dbSNP database that flank the polymorphic site are
additionally provided. The "“DISEASES” table lists the names of the diseases, SNPdiseases association (yes or no), ethnic group, sex and age of the subjects in the examined
sample. The “POPULATIONS” table includes the country and region, the name of the
examined population, the frequencies of the minor allele and of its genotypes. The third
and fourth tables contain hyperlinks to the original publications and to their abstracts in
the PubMed database, and currently contain information about SNP CCR2-64I only.
The compiled database contains information about 41 polymorphisms. Besides 36
SNPs, the database provides information about 4 single nucleotide and 1 dinucleotide
deletions. 4 polymorphisms are located in the promoter region, 21 are in the first intron
(Fig. 1, positions 42757–46055), 2 are in the second intron (positions 47047–48254) 6 are
in the coding parts (positions 46106–47046; 48255–48438) of exons (of these, 3 are
nonsynonymous), 4 are in the 3’UTR (positions 48439–49505), and 4 are in the 3′flanking distant region of the gene. The classification of the polymorphisms was based on
the structure of the longest mRNA (isoform A). The validation status of 6 polymorphisms
is “unknown”; the existence of others is experimentally supported. There are 66 units that
describe disease associations and 131 information units for the population frequencies.
We intend to further annotate and improve the database.
DISCUSSION
The CCR2 gene has 3 exons and covers about 7 kb on human chromosome 3p21 (Fig. 1).
There are two known alternative mRNA isoforms, A and B. Both have identical 5′-ends
composed of exon 1 (positions 42728-42756) and the 5′-part of exon 2 (positions 46056–
47046), but they differ by the mRNA regions encoding their carboxyl end and 3′UTR.
The two isoforms code the functional receptors differing by subcellular localization. The
41 polymorphisms in the database are unevenly distributed according to gene nucleotide
sequence. SNPs are densest in the first intron, rare SNPs occur in the second. The average
polymorphism density (1 SNP per 200 bp) agrees with their average density in the human
genome. However, because we presented the total data for the numerous samples that
BGRS’2006
Computational structural and functional genomics and transcriptomics
29
differed by ethnic group, sex, age and other features, the SNP list will increase with time.
Evidently, the information about SNP associations with certain diseases is inconsistent.
We revealed 5 publications that examined the CCR2-64I association with MI (Gonzalez
et al., 2001; Voevoda et al., 2002; Ortlepp et al., 2003; Petrkova et al., 2003; Bjarnadottir
et al., 2005). The association was proven in 3 of the 5 (Voevoda et al., 2002; Ortlepp et
al., 2003; Petrkova et al., 2003). Probable environmental (diet, lifestyle etc) and/or
genetic factors that may abolish the association between CCR2-64I and MI are yet to be
found. It is of interest that in the popular OMIM database the CCR2 card does not
mention association with MI. Regrettably, the OMIM is advisable so far as a preliminary
introduction to the problem. The information about the population frequencies of the
CCR2-64I allele would allow comparing the frequency of this polymorphism with
morbidity and mortality of cardiovascular diseases among a population under study.
Figure 1. A schematic representation of the CCR2 gene with known mRNAs and polymorphisms.
a – the region of the gene with polymorphisms whose positions correspond to the contig nucleotide
sequence with accession number NT_079509. Arrow points to the transcription start, coding parts of
exons are black, the 5′UTR and 3′UTR are shaded, b – the known tissue-specific mRNAs, isoforms A
and B. The coding parts of exons are black, the 5′UTR and 3′UTR are shaded, and the introns removed
by splicing are shown by thin line.
The nucleotide sequences flanking the different SNPs in the CCR2 gene, which we
collected in the database, will make it possible to perform an in silico search of the
polymorphism causing predisposition to MI. The functional analysis of the
polymorphisms in the noncoding parts of the gene, using special software tools is
required. The technology would predict the potential transcription binding sites, splicing
sites and RNA secondary structures. The functional analysis of the SNPs that causes
nonsynonymous substitutions in the protein can be combined with the software tools
usually used to predict protein 3D structures and identify the functional motifs in protein.
Two CCR2 mRNAs were detected. The isoform A includes the three exons of the CCR2
gene. The isoform B contains the first, second exons, and part of the second intron.
Interestingly, the second intron differs from the other part of the CCR2 gene by low
density of the polymorphisms. The polymorphisms are located on the flanks of this intron,
not on its central part. It is possible that certain SNPs in the sequences of the flanks are
involved in splicing regulation. Using contextual DNA analysis, we expect that computerassisted data would clarify whether the polymorphisms are involved in regulation of the
CCR2 gene expression. As a result, each polymorphism will be assigned a weight score
and each will be ranged according to the priority of its putative effect on the final
phenotype trait (MI).
The high cost of genotyping raises the question, how to choose the best – the less
voluminous and most informative set of SNP polymorphisms for an associative survey of
candidate genes or loci on a chromosome. It would appear that preliminary SNPs
BGRS’2006
30
Part 1
weighing according to their potential functional contributions to a particular trait (disease)
would allow us to elaborate a new algorithm for search of causative polymorphisms
which predispose to common complex diseases. Thus, the proposed database for the
human CCR2 gene polymorphisms contains informative guidelines for in silico search of
the polymorphisms relating to diseases associated with the immune system responses, in
particular those causing MI predisposition.
ACKNOWLEDGEMENTS
Work was supported in part by International Science and Technology Center (grant
No. 2311), by the program “Dynamics of the gene resources of plant, animals and
human” of the Russian Academy of Science and innovation project of Federal Agency of
Science and Innovation IT-CP.5/001 “Development of software for computer modeling
and design in postgenomic system biology (system biology in silico)”. The authors are
grateful to Lokhova I.V. for assistance in retrieval of the full text of publications.
REFERENCES
Bjarnadottir K. et al. (2005) Examination of genetic effects of polymorphisms in the MCP-1 and CCR2
genes on MI in the Icelandic population. Atherosclerosis (Epub ahead of print).
Gonzalez P. et al. (2001) Genetic variation at the chemokine receptors CCR5/CCR2 in myocardial
infarction. Genes Immun., 2(4), 191–195.
Le Y. et al. (2004) Chemokines and chemokine receptors: their manifold roles in homeostasis and
disease. Cell. Mol. Immunol., 1(2), 95–104.
Ortlepp J.R. et al. (2003) Chemokine receptor (CCR2) genotype is associated with myocardial infarction
and heart failure in patients under 65 years of age. J. Mol. Med., 81(6), 363–367.
Petrkova J. et al. (2003) CC chemokine receptor (CCR)2 polymorphism in Czech patients with
myocardial infarction. Immunol. Lett., 88(1), 53–55.
Voevoda M.I. et al. (2002) Association of the CCR2 chemokine receptor gene polymorphism with
myocardial infarction. Dokl. Biol. Sci., 385, 367–370.
BGRS’2006