Articles in PresS. Physiol Genomics (May 6, 2003). 10.1152/physiolgenomics.00060.2003

Evolutionary relationships of the Tas2r receptor gene families in mouse and man

Caroline Conte (1), Martin Ebeling (2), Anne Marcuz (1), Patrick Nef (1), and Pedro J. Andres-Barquin (1)

(1) Neuroscience and (2) Bioinformatics, Pharma Research, F. Hoffmann-La Roche, Basel 4070, Switzerland

Running head: Evolution of the mouse and human TAS2Rs

Corresponding author: Pedro J. Andres-Barquin, Pharma Research Basel Discovery - Neuroscience, Bldg. 93/356, F. Hoffmann-La Roche Ltd., CH 4070 Basel, Switzerland. Phone: 41-61-688 73 29. Fax: 41-61-688 14 48. E-mail: [email protected]

Copyright (c) 2003 by the American Physiological Society.

Abstract

The early molecular events in the perception of bitter taste start with the binding of specific water-soluble molecules to G protein-coupled receptors (GPCRs) encoded by the Tas2r family of taste receptor genes. The identification of the complete TAS2R receptor family repertoire in mouse and a comparative study of the Tas2r gene families in mouse and man might help to better understand bitter taste perception. We have identified, cloned and characterized 13 new mouse Tas2r sequences, 9 of which encode putative functional bitter taste receptors. The encoded proteins are between 293 and 333 amino acids long and share between 18 and 54 percent sequence identity with other mouse TAS2R proteins. Including the 13 sequences identified, the mouse Tas2r family contains approximately 30% more genes and 60% fewer pseudogenes than the human TAS2R family. Sequence and phylogenetic analyses of the proteins encoded by all mouse and human Tas2r genes indicate that TAS2R proteins present a lower degree of sequence conservation in mouse than in human and suggest a classification in five groups that may reflect a specialization in their functional activity to detect bitter compounds.
Tas2r genes are organized in clusters in both mouse and human genomes, and an analysis of these clusters and phylogenetic analyses indicate that the five TAS2R protein groups were present prior to the divergence of the primate and rodent lineages. However, differences in subsequent evolutionary processes, including local duplications, interchromosomal duplications, divergence and deletions, gave rise to species-specific sequences and shaped the diversity of the current TAS2R receptor families during mouse and human evolution. Sequence data reported in this paper have been submitted to GenBank and assigned the accession numbers AF532785-AF532793 and AY145467-AY145470.

Key words: bitter taste receptors, GPCR, primates, rodents.

Introduction

The systems underlying chemical sensation are essential for animal survival. Even single-cell organisms possess receptors that allow them to respond to chemical signals. In most organisms, the chemical senses play a pivotal role in locating food, discriminating safe foods from toxic ones, motivating food intake, and regulating the aspects of social behaviour that are necessary for reproduction. The sense of taste bestows the organism with the ability to detect nutritionally important compounds, including sugars, salts and amino acids, and potentially harmful substances, including alkaloids and acids. The initial step in taste perception is the interaction of water-soluble molecules with receptors expressed at the surface of taste receptor cells. On binding taste molecules, the receptors trigger transduction cascades that activate synapses and thus cause activation of the nerve fibres (for review, see 13). Mammals taste many compounds but are believed to distinguish between only five primary tastes: sweet, bitter, sour, salty and umami (the taste of monosodium glutamate). The tastes sweet, bitter and umami are believed to be detected by G protein-coupled receptor (GPCR) signalling pathways (6, 32).
A number of taste GPCRs called TAS1Rs, TAS2Rs and Taste-mGluR4 have been identified and characterized in mammals. The taste cell-derived variant of the mGluR4 receptor has been proposed to function as an umami taste receptor (5). The TAS1R family of taste GPCRs is thought to function in the perception of sweetness (17, 19, 23, 29), as amino acid sensors (22), and as umami taste receptors (12). The TAS2R family of taste GPCRs is implicated in the perception of bitterness (1, 4, 16). Tas2r genes are expressed in taste receptor cells and map to regions of human and mouse chromosomes that have been related to the ability to sense a variety of bitter compounds (3, 14, 25). Few TAS2R receptors are known to respond to bitter substances, because Tas2r genes are difficult to express functionally in heterologous cell lines (2, 4). Studies in experimentally tractable model organisms, such as mouse, and the identification of orthologous relationships between human and mouse Tas2r genes can help to determine the ligand-binding properties of TAS2R receptors and to translate data from mouse studies into an understanding of human taste. A few pairs of mouse and human Tas2r orthologs have been reported to date (1).

The identification and characterization of the entire repertoire of TAS2R receptors are the basis for studies of receptor-ligand interactions and the understanding of the early molecular events in bitter taste perception. We previously searched the virtually completed human genome sequence and, together with others, identified 28 genes encoding putative functional bitter taste receptors of the TAS2R family (2, 7). In mouse, the Tas2r gene family is less well characterized than the human TAS2R family. A total of 28 Tas2r genes were identified prior to the publication of the draft of the mouse genome sequence (1, 16).
To identify the complete Tas2r family repertoire in mouse and make a comparative analysis of human and mouse Tas2r gene families, we have carried out homology-based searches in the recently published draft of the mouse genome sequence (20) and also in available unannotated raw sequences. We have identified and characterized 9 new full-length sequences encoding putative mouse taste receptors of the TAS2R family and also 4 sequences that may represent non-functional pseudogenes. Our almost complete repertoire of Tas2r genes allows us to describe the evolution of these genes and report new insights into the evolutionary processes that shaped the current Tas2r gene families in mouse and man.

Materials and Methods

Sequence database mining

A publicly available draft of the mouse genome was downloaded from the Mouse Genome Sequencing Consortium (MGSC; http://www.ensembl.org/Mus_musculus/Download/). The draft is based on BAC clones sequenced in the public domain and contains sequences up to those that were available in February, 2002. The available sequence covers the mouse genome roughly sevenfold, and the assembled sequence is estimated to cover 96% of mouse euchromatic DNA (http://www.ensembl.org/Mus_musculus/). Databases were prepared from the raw sequence data. These databases were searched with a collection of known mouse, human, and other vertebrate sequences from the public domain to obtain matching segments (HSPs, high-scoring segment pairs), which often roughly correspond to exons of genes on the chromosomes. The database build-up and the sequence similarity searches were performed using the BLAST2 suite of programs available from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/) on their publicly accessible ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast).
HSPs found by the same query sequence, and lying in proximity to each other on a single chromosome, were assembled into complete genes using the publicly available software GeneWise from the Wise2 package by Ewan Birney at the Sanger Centre, Hinxton, UK (http://www.sanger.ac.uk:80/Software/Wise2). This software aligns protein sequences to genomic DNA sequences and reconstructs splice sites at the exon-intron boundaries that comply with the well-known "GT-AG" rule for the splicing of eukaryotic genes. The resulting predicted genes and their corresponding protein translations were assembled into two databases, respectively. Using tools from the publicly available software package HMMER, version 2.1.1, by Sean Eddy at Washington University in St. Louis, USA (http://hmmer.wustl.edu/), an alignment of known, publicly available taste receptors was used to produce a Hidden Markov Model (HMM) characteristic of this family (programs HMMBUILD and HMMCALIBRATE from the HMMER package). This HMM was then used to search the database of protein translations of predicted mouse genes mentioned above, using the HMMSEARCH program from the HMMER package. The top-scoring matches were identified as potential taste receptors and were further analyzed. TAS2R candidates identified were compared to known TAS2R sequences in the public databases and also in a database of patented sequences (GeneSeq database obtained from Derwent Information, London, UK) using the BLAST2 suite of programs available from the NCBI (http://www.ncbi.nlm.nih.gov/).

Sequence alignments, phylogenetic analysis and sequence logos

Multiple sequence alignments were generated with ClustalW (31). The alignments were slightly modified to adjust the gap positions by visual inspection. The resulting alignment was graphically displayed using the PrettyBox program from the Wisconsin Package Version 10.2, Genetics Computer Group (GCG), Madison, WI.
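The grouping step that precedes gene-model building in the database mining procedure (HSPs from the same query, lying close together on one chromosome) can be illustrated with a minimal sketch. The `HSP` record, the function name `group_hsps` and the `max_gap` threshold are assumptions made for illustration; the paper does not state the distance cutoff that was used, and the actual gene reconstruction was done with GeneWise, which this sketch does not reproduce.

```python
from collections import defaultdict
from typing import NamedTuple


class HSP(NamedTuple):
    """Hypothetical minimal record for one high-scoring segment pair."""
    query: str   # name of the query protein
    chrom: str   # chromosome carrying the hit
    start: int   # genomic start of the hit
    end: int     # genomic end of the hit


def group_hsps(hsps, max_gap=10_000):
    """Group HSPs that share a query and chromosome and lie within max_gap bp.

    max_gap is an illustrative value, not a parameter taken from the paper.
    """
    by_key = defaultdict(list)
    for h in hsps:
        by_key[(h.query, h.chrom)].append(h)

    groups = []
    for key in sorted(by_key):
        current = []
        for h in sorted(by_key[key], key=lambda x: x.start):
            # Start a new group when the gap to the previous hits is too large.
            if current and h.start - max(x.end for x in current) > max_gap:
                groups.append(current)
                current = []
            current.append(h)
        groups.append(current)
    return groups
```

Each resulting group would then be a candidate locus to hand to a gene-model builder together with its query protein.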
Protein phylogenetic trees are based on the alignment of 36 mouse and 28 human TAS2R protein sequences. Phylogenetic analyses were performed using the PHYLIP package (Felsenstein 1993 PHYLIP [Phylogeny Inference Package] version 3.6a2. Department of Genetics, University of Washington, Seattle). Protein trees are based on sequence distances derived from the Jones-Taylor-Thornton substitution matrix (program PROTDIST from PHYLIP). From the original alignment, 1,000 alignments were obtained using the bootstrap procedure (program SEQBOOT). Distance matrices were calculated for each of them, and phylogenetic trees were obtained using the Neighbor-Joining algorithm as implemented in the program NEIGHBOR. The resulting trees were used to derive bootstrap support values for each of the branch points in the tree. Sequence logos are based on the alignment of 36 mouse TAS2R protein sequences. The sequence logos were generated using a web-based program developed by J. Gorodkin (http://www.cbs.dtu.dk/gorodkin/appl/plogo.html) (10, 30).

Genomic PCR and cloning

Tas2r DNA sequences were amplified from mouse genomic DNA by polymerase chain reaction (PCR) using Taq DNA polymerase (Roche Molecular Biochemicals, Basel, Switzerland) and oligonucleotide primers designed to amplify the full Tas2r open reading frames (ORF). PCR amplifications were performed with the primer pairs described in Table 1. PCRs were performed in a total volume of 50 µl containing 100 ng of DNA, 200 nM of each primer in 50 mM KCl, 10 mM Tris (pH 8.3), 1.5 mM MgCl2, 0.5 mM of each dNTP, and 2 U Taq DNA polymerase. An initial denaturation step at 94°C for 2 min was followed by 30 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 45 s, and extension at 72°C for 1 min on a Tpersonal machine (Biometra, Goettingen, Germany). The last extension step was 10 min. The PCR products obtained were analysed on 1% agarose gels stained with ethidium bromide.
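As a quick check of the PCR conditions just described, the following sketch works out the absolute reagent amounts per 50 µl reaction and the programmed cycling time. The function and variable names are illustrative. The calculation assumes that the 10 min extension is an additional step after the 30th cycle (the text could also be read as lengthening the last cycle), and it ignores ramp times between temperatures.

```python
# Values taken from the protocol text; names are illustrative.
REACTION_VOLUME_UL = 50.0
PRIMER_NM = 200.0   # concentration of each primer
DNTP_MM = 0.5       # concentration of each dNTP


def amount_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount in pmol for a concentration in nM and a volume in µl."""
    # 1 nM equals 1 fmol per µl, and 1000 fmol equal 1 pmol.
    return conc_nm * volume_ul / 1000.0


def programmed_seconds(cycles=30, denature_s=30, anneal_s=45, extend_s=60,
                       initial_s=120, final_extension_s=600) -> int:
    """Total programmed time, excluding temperature ramps."""
    # Assumption: the 10 min extension is an extra step after the last cycle.
    return initial_s + cycles * (denature_s + anneal_s + extend_s) + final_extension_s


primer_pmol = amount_pmol(PRIMER_NM, REACTION_VOLUME_UL)              # 10 pmol of each primer
dntp_nmol = amount_pmol(DNTP_MM * 1e6, REACTION_VOLUME_UL) / 1000.0   # 25 nmol of each dNTP
total_min = programmed_seconds() / 60                                 # 79.5 min
```

Under these assumptions each reaction contains 10 pmol of each primer and 25 nmol of each dNTP, and the programme runs for about 80 minutes before ramping is taken into account.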
The sequence of the Tas2r genes and pseudogenes was determined by subcloning the PCR products in pCRII-TOPO (Invitrogen, Paisley, UK). Oligonucleotides used to prime sequencing were Sp6 f5'-ATTTAGGTGACACTATAG-3' and T7 r5'-TAATACGACTCACTATAGGG-3'. Double-stranded templates were sequenced by dideoxynucleotide chain termination using dRhodamine Terminator Cycle Sequencing Ready Reaction (Perkin Elmer, USA) with an initial denaturation step at 96°C for 2 min followed by 25 cycles of denaturation at 96°C for 30 s, annealing at 50°C for 15 s, and extension at 60°C for 4 min. The samples were loaded on an ABI310 sequence analyser (ABI).

Results

The mouse genome contains 36 Tas2r genes

The TAS2R family of receptors has been shown to play a role in the perception of bitterness (1, 2, 4, 16). A first search in a partial mouse genome sequence led to the identification of 26 full-length gene members of the Tas2r family in mouse (1). To better understand the early events in bitter taste perception and complete the identification of the Tas2r family repertoire, we sought to identify new putative mouse taste receptors belonging to this family. For this purpose, we have undertaken a bioinformatics homology-based screen of the mouse genome draft for sequences related to the TAS2R family of taste receptors. In a first step we collected all TAS2R sequence information available in public databases and in the GeneSeq database of patented sequences. We found 32 entries of mouse Tas2r related sequences. Three of those entries were partial sequences, two contained a frameshift each and 27 appeared to be full-length Tas2r genes. We aligned all known, publicly available TAS2R receptor sequences and developed an HMM characteristic of the TAS2R family to search in a database of protein translations of predicted mouse genes and in ORFs predicted in unannotated high-throughput genomic sequences (HTGS) (see Materials and Methods).
ORF-encoding protein products that were shorter than 250 amino acids were not considered as full-length, uninterrupted ORFs. Sequences sharing more than 98% nucleotide or amino acid identity were considered to be identical, because they may represent sequencing errors or genetic polymorphism. Sequences containing one or more disruptions in a full-length ORF were considered as pseudogenes. We identified in mouse 13 new Tas2r gene sequences that we named Tas2r34 to Tas2r46. Four of those sequences, Tas2r41, 42, 45 and 46, are pseudogenes as they contain frameshifts and/or premature stop codons. To experimentally validate these findings, we performed PCR amplification of mouse genomic DNA. The reactions were primed with oligonucleotides designed to amplify each full-length gene sequence and pseudogene sequence (Table 1). Sequencing of the reaction products confirmed the correct identification of the 9 new Tas2r genes and the four new pseudogenes in the mouse genome (data not shown). Figure 1 presents a comparison of the predicted amino acid sequences of the identified full-length proteins with mouse TAS2R5 protein. The predicted proteins have between 293 and 333 amino acids and share 22 to 33% sequence identity with mouse TAS2R5 and 18 to 54% sequence identity with other mouse TAS2R proteins (Fig. 1 and data not shown). Sequence analysis of the new proteins also indicated the presence of seven transmembrane domains and large sequence conservation in the transmembrane domains and in the intracellular domains.

The TAS2R proteins of mouse present a lower degree of sequence conservation than their human counterparts

A comparison of the proteins encoded by the 27 intact mouse Tas2r genes found in databases and by the 9 full-length Tas2r genes we identified reveals a number of conserved features. Mouse TAS2R proteins are between 293 and 334 amino acids long and contain seven transmembrane domains and short amino- (1-29 amino acids) and carboxy-termini (8-54 amino acids).
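The screening rules used in the preceding subsection to call full-length genes, pseudogenes and redundant entries can be written down as a small decision function. This is only a sketch of the stated thresholds (250 amino acids, more than 98% identity, one or more ORF disruptions). The function names, the "partial" label and the order in which the checks are applied are assumptions, not details given in the paper.

```python
def classify_orf(protein_length: int, disruptions: int, min_length: int = 250) -> str:
    """Label a candidate Tas2r sequence using the thresholds quoted in the text.

    disruptions counts frameshifts and premature stop codons in the ORF.
    The check order and the "partial" label are illustrative assumptions.
    """
    if disruptions >= 1:
        return "pseudogene"
    if protein_length < min_length:
        return "partial"
    return "full-length"


def is_redundant(identity_percent: float, threshold: float = 98.0) -> bool:
    """Treat two sequences as the same gene when identity exceeds the threshold."""
    return identity_percent > threshold
```

For example, an uninterrupted 293-residue product would be labelled "full-length", while any candidate carrying a frameshift would be labelled "pseudogene" regardless of its length.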
As the mouse TAS2R family of proteins exhibits a high variability in primary structure, 16–77% amino acid identity between its members, we sought to visualize the conservation of residues between all mouse TAS2R proteins. We visualized the alignment of the full-length amino acid sequences in a sequence logo (10, 30). As shown in Figure 2A, a total of 11 residues were present in more than 90% of all TAS2Rs. Most of these residues were located in the transmembrane domains (TM), which have a higher degree of sequence conservation than the intra-cytoplasmic (IC) and extra-cellular (EC) loops (Fig. 2A). The intra-cytoplasmic loops also shared a considerable degree of sequence conservation between the TAS2R members in comparison to the extracellular loops, which are very variable. The intra-cytoplasmic loops and their adjacent transmembrane segments are the predicted sites of G protein interaction and the distinctive extracellular regions are the predicted regions of ligand binding (1, 15). The overall high degree of variability between TAS2R proteins precluded the identification of a conserved consensus sequence. To compare the degree of sequence conservation between mouse and human TAS2R proteins, we aligned the 36 mouse protein sequences and 28 human TAS2R protein sequences. We identified 36 amino acid positions as the most conserved in both mouse and human sequences (Fig. 2B). These amino acids are located in the intra-cytoplasmic loops and in the transmembrane segments. Only one position (leucine 198), located in the TM5 adjacent to IC3, is fully conserved in all mouse and human TAS2R proteins, suggesting that this residue is essential for the function of TAS2R receptors. A single position (leucine 202) is fully conserved only in the mouse proteins and 3 positions (leucine 51, tryptophan 98 and serine 201) are fully conserved only in the human proteins.
Many of the other positions show a lower degree of sequence conservation in the mouse proteins than in human (Fig. 2B). Sequence conservation can also be measured using information theory, where the “information content” of each position in the sequence is scored on the basis of the distribution of amino acids present, with conserved positions scoring more than variable positions (30). The total information content of the mouse and human proteins are 539.2 and 681.7 bits respectively, confirming that TAS2R proteins present a lower degree of sequence conservation in mouse than in human. Mouse Tas2r genes are located in chromosome regions that exhibit synteny with human TAS2R loci To determine the exact location of all mouse Tas2r family member sequences in the chromosomes, we mapped all Tas2r sequences longer than 500 bp to the mouse genome databases using BLAST2. With the exception of Tas2r34 and Tas2r19, located on chromosomes 2 and 15, respectively, all mouse Tas2r genes and pseudogenes are located on chromosome 6 (fig. 3). They are organized in two clusters: a cluster of ten Tas2r sequences spans approximately 10.4 Mb; a second cluster of 29 Tas2r sequences spans approximately 1.2 Mb. Chromosome localization analysis of mouse and human Tas2r sequences indicated that the distribution in clusters is very similar in both species and that the clusters 1 and 2 of mouse chromosome 6 are located in regions of the chromosome that exhibit synteny with TAS2R-rich regions of human chromosomes 7 and 12, respectively (Fig. 3) (http://www.ncbi.nlm.nih.gov/Homology/index.html). A number of mouse and human Tas2r genes located in these regions seem to be orthologs. Most of the Tas2r genes in cluster 1 of mouse chromosome 6 have putative orthologous genes in human chromosome 7 and many of the genes in cluster 2 of mouse chromosome 6 have putative orthologous genes in human chromosome 12 (Fig. 3). 
Thus, both Tas2r clusters were present when the primate and rodent lineages diverged and still exist now. As indicated by blue lines in Figure 3, cluster 2 underwent three expansions in mouse and two expansions in human. These species-specific groups of sequences could have originated from gene duplications or conversions since the divergence of those species, or from loss of the orthologous gene(s) (loss from the genome or because the datasets are incomplete). Also, chromosome localization analysis of mouse and human Tas2r genes indicated that, unlike on human chromosome 12, Tas2r genes in cluster 2 of mouse chromosome 6 are distributed in two sub-clusters separated by a region of 700 kb.

To study the evolutionary relationship between all mouse and human TAS2Rs, we used our alignment of human and mouse TAS2R proteins to generate a phylogenetic tree as described in Materials and Methods. The alignment also included three human TAS2R sequences, TAS2R30, 33 and 36, which do not appear in the current draft of the human genome but are available in the databases. As shown in Figure 4, most of the major clades in the phylogenetic tree contain both mouse and human sequences, suggesting that most TAS2R groups were present in the common ancestor. A number of mouse proteins appear in the tree close to their putative human orthologs, confirming the results obtained in the analyses of nucleotide sequence identity and chromosome localization. Analysis of this phylogenetic tree also allowed us to classify TAS2R proteins into five groups according to two different criteria, the phylogenetic cluster and the protein identity (Fig. 4). A phylogenetic tree generated from an alignment of all nucleotide sequences confirmed these results (data not shown). The classification of TAS2R receptors in groups may reflect a specialization in their functional activity to detect bitter compounds.
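The bootstrap support values attached to the branch points of the tree rest on resampling alignment columns with replacement, as described in Materials and Methods (1,000 replicates). The sketch below illustrates that resampling step only. It is not the PHYLIP SEQBOOT implementation, the function names are hypothetical, and it assumes the alignment is held as a dictionary of equal-length strings.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one pseudo-replicate: columns drawn with replacement.

    alignment maps sequence names to aligned strings of equal length.
    """
    n_cols = len(next(iter(alignment.values())))
    # The same column indices are applied to every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n: int = 1000, seed: int = 0) -> list:
    """Generate n pseudo-replicate alignments with a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

A distance matrix and a neighbor-joining tree would then be computed for each replicate, and the support for a branch point is the fraction of replicate trees in which the same grouping appears.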
The Tas2r gene family expands by local tandem duplications

An alignment of all human and mouse Tas2r genes shows that the Tas2r genes that are located close to each other in the genome are often very similar in sequence, indicating that tandem events (duplications and/or gene conversions) are the major evolutionary forces shaping the diversity of this gene family (Fig. 5). Thirty-one (86.1%) of the 36 full-length mouse Tas2r genes have their closest mouse relatives in the same cluster. In the human genome, 88% of the TAS2R genes with available chromosomal localization data have their closest human relatives in the same cluster. We found some differences between the two species regarding the temporal and spatial pattern of gene duplication. In human, an analysis of the percentage of amino acid identity between the sequences within each possible pair of TAS2R sequences shows that there are 10 pairs above the level of 80% (Fig. 5B and D). These turn out to be the ten possible pairs within a group of five sequences (TAS2R44, 50, 52, 53 and 54) that map very closely to each other in one cluster (Fig. 3), indicating that these genes were generated by duplications that arose intrachromosomally by local tandem events. As shown in Fig. 5B, a second group of 18 points represents pairs of sequences with identities between 60 and 80%. These pairs of sequences are all the possible pairs between 3 additional sequences (TAS2R51, 55 and 56), which are also located in the same cluster (Fig. 3), and the 5 mentioned sequences. These data indicate that a precursor gene generated a group of four genes, including TAS2R51, 55, 56 and a fourth gene that is one of the genes from the group of 5 mentioned above. Later during evolution, this fourth sequence gave rise to four additional sequences, generating the group of 5 most similar genes, TAS2R44, 50, 52, 53 and 54. From the available data, it is not possible to infer which of those five genes seeded the cluster.
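The pair counts quoted above for the human cluster follow from simple combinatorics, which the sketch below reproduces. One reading is assumed here: the count of 18 is obtained when the pairs among the three additional sequences (3) are counted together with the pairs linking them to the five most similar sequences (15). That reading is an inference from the numbers, not a statement in the text. The variable names are illustrative.

```python
from itertools import combinations

recent = ["TAS2R44", "TAS2R50", "TAS2R52", "TAS2R53", "TAS2R54"]
older = ["TAS2R51", "TAS2R55", "TAS2R56"]

# All pairs within the five most similar sequences: C(5, 2) = 10.
pairs_within_recent = list(combinations(recent, 2))

# All pairs among the eight sequences that involve at least one of the three
# additional sequences: 3 * 5 cross pairs + C(3, 2) pairs = 18.
pairs_involving_older = [
    pair for pair in combinations(recent + older, 2)
    if pair[0] in older or pair[1] in older
]

# Share of full-length mouse genes whose closest relative is in the same cluster.
same_cluster_percent = round(100 * 31 / 36, 1)

print(len(pairs_within_recent), len(pairs_involving_older), same_cluster_percent)
# prints "10 18 86.1"
```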
In mouse, all the pairs of TAS2R sequences share less than 80% identity and 99% of the pairs are below the level of 60%, suggesting that the most recent duplications arose earlier than those in human (Fig. 5A and C). Also, two TAS2Rs sharing 58% identity, TAS2R34 and TAS2R43, are encoded by genes located on two different mouse chromosomes, chromosome 2 and 6 respectively, indicating that an interchromosomal event occurred in this species.

Discussion

In this study we describe the identification and characterization of 13 new bitter taste receptor gene sequences in mouse, nine of which encode full-length putative receptors and four of which are pseudogenes. Including these sequences, the mouse Tas2r family is composed of at least 36 full-length genes and 6 pseudogenes. Because almost 96% of the mouse genome is sequenced, our survey should almost complete the Tas2r family repertoire in mouse. Comparison of all known mouse and human TAS2R receptors reveals that the family of TAS2R receptors is more divergent in mouse than in human. Also, the TAS2R repertoire is about 30% larger in mouse than in human (36 genes in mouse and 28 genes in human), which suggests that mouse is able to detect bitter molecules with very diverse chemical structures. Positive selection to provide a diverse repertoire of bitter tastant binding receptors in mouse and/or lower selective constraints on protein sequence in the mouse TAS2R family may account for an increased diversity in the TAS2R family during mouse evolution. Bitter taste has evolved as a central warning signal against the ingestion of potentially toxic substances, and rapid evolution of the TAS2R receptors may be necessary to detect new harmful substances appearing in the environment.
As bitter molecules are very numerous and differ greatly in their chemical structure, it is likely that a large number of divergent receptors are required to detect them and that selective pressure has favored evolutionary mechanisms that allow Tas2r genes to evolve into a more diverse repertoire in mouse than in human. It is also possible that deletions of TAS2R genes in the human lineage have exacerbated the differences between the two species. The presence in the mouse genome of Tas2r genes, for example Tas2r22, that are distantly related to all other mouse genes and lack an ortholog in human (Fig. 4) is consistent with this interpretation.

Comparison of the number of Tas2r pseudogenes in mouse and human reveals that a smaller number of pseudogenes exist in mouse. Approximately 17% of the mouse sequences were classified as pseudogenes, while in human the pseudogenes represent approximately 40% of the TAS2R sequences (7). Again, this marked difference suggests a steady selective pressure in mouse to maintain a functional Tas2r repertoire, but may also be due in some degree to a faster elimination of pseudogenes from the mouse genome than from the human genome (11). The lower comparative number of TAS2R genes and the higher comparative number of pseudogenes observed in human with respect to mouse may represent a decrease in selective advantage to detect and respond to gustatory stimuli during human evolution. This decrease in selective advantage could be explained by the fact that in man, the gustatory function is not as essential for survival as in mouse. Other families of seven transmembrane chemosensory receptors that parallel the gustatory system from an evolutionary point of view, including the olfactory receptor (OR) and the vomeronasal receptor (V1R) families, also contain a high proportion of pseudogenes (8, 18, 27, 34).
In the OR and V1R families, a larger number of pseudogenes and a lower number of genes exist in human in comparison with mouse (8, 9, 24, 26, 27, 33, 34). These observations are consistent with our results showing a higher comparative number of TAS2R pseudogenes and a lower comparative number of TAS2R genes in human with respect to mouse. Interestingly, in primates, a study of the OR repertoire also suggests a parallelism between the increase of the pseudogene rate in this family and a decrease in the olfactory sensory function during evolution (28).

All mouse Tas2r genes and pseudogenes are located on chromosomes 2, 6, and 15. The distribution along the chromosomes is not uniform because most of the sequences are organized in 2 clusters located on chromosome 6. This chromosomal distribution is similar to the distribution described in human chromosomes, where most TAS2R genes are organized in two individual clusters located on chromosomes 7 and 12 (1, 7, 16). Comparison of the chromosomal localization of Tas2r genes in mouse and human chromosomes reveals that cluster 1 of mouse chromosome 6 is located in a region of the chromosome that exhibits synteny with the region of human chromosome 7 containing a cluster of TAS2R genes (http://www.ncbi.nlm.nih.gov/Homology/index.html) (7). Similarly, cluster 2 of mouse chromosome 6 is located in a region of the chromosome that exhibits synteny with the region of human chromosome 12 containing the largest cluster of human TAS2R genes. This indicates that the general arrangement of these gene clusters was established before the divergence of the primate and rodent lineages. However, a number of species-specific groups of Tas2r genes have very likely originated from local gene duplication or conversion events since the primate and rodent lineages diverged. An analysis of the phylogenetic tree of all mouse and human TAS2R proteins and of the Tas2r gene clusters in mouse and human chromosomes supports this view.
Over half of all Tas2r genes in both species match another Tas2r gene within the same genome better than one in the genome of the other species. This suggests that the diversity of the Tas2r gene family has been largely shaped by a “birth-and-death” model, which proposes that new genes arise by gene duplication, followed by divergence and maintenance of some duplicate genes, and deletion or accumulation of mutations in other genes (21). The birth–and-death model has also shaped a variety of gene families with significant sequence diversity including the OR gene family (33) and the MHC and immunoglobulin gene families (21). In conclusion, we have identified, cloned and characterized 13 new mouse Tas2r sequences, 9 of which encode putative functional bitter taste receptors. The finding of these sequences significantly advances the identification of the complete Tas2r family repertory in mouse, which include at present 36 genes. Comparison of the Tas2r gene families in mouse and man give insights into the pressures and processes that shaped the diversity of the current TAS2R families during mouse and human evolution. These findings provide a focus for continuing studies of the TAS2R family of receptors that should contribute to a better understanding of the early molecular events in bitter taste. 16 Acknowledgments This work was supported by F. Hoffmann-La Roche Ltd. and Givaudan Flavors Corporation. We thank Clemens Broger, Jay P. Slack, Ping Zhong, and Gonzalo Acuña for helpful discussions. Disclosure Statement We have no conflicts of interest. 17 References 1. Adler E, Hoon MA, Mueller KL, Chandrashekar J, Ryba NJ, and Zuker CS. A novel family of mammalian taste receptors. Cell 100: 693-702, 2000. 2. Bufe B, Hofmann T, Krautwurst D, Raguse JD, and Meyerhof W. The human TAS2R16 receptor mediates bitter taste in response to beta- glucopyranosides. Nat Genet 32: 397401, 2002. 3. Capeless CG, Whitney G, and Azen EA. 
Chromosome mapping of Soa, a gene influencing gustatory sensitivity to sucrose octaacetate in mice. Behav Genet 22: 655-663, 1992. 4. Chandrashekar J, Mueller KL, Hoon MA, Adler E, Feng L, Guo W, Zuker CS, and Ryba NJ. T2Rs function as bitter taste receptors. Cell 100: 703-711, 2000. 5. Chaudhari N, Landin AM, and Roper SD. A metabotropic glutamate receptor variant functions as a taste receptor. Nat Neurosci 3: 113-119, 2000. 6. Chaudhari N and Roper SD. Molecular and physiological evidence for glutamate (umami) taste transduction via a G protein-coupled receptor. Ann N Y Acad Sci 855: 398-406, 1998. 7. Conte C., Ebeling M, Marcuz A, Nef P, and Andres-Barquin PJ. Identification and characterization of human taste receptor genes belonging to the TAS2R family. Cytogenet Genome Res 98: 45-53, 2002. 8. Giorgi D, Friedman C, Trask BJ, Rouquier S. Characterization of nonfunctional V1R-like pheromone receptor sequences in human. Genome Res 10: 1979-1985, 2000. 9. Glusman G, Yanai I, Rubin I, Lancet D. The complete human olfactory subgenome. Genome Res 11: 685-702, 2001. 10. Gorodkin J, Heyer LJ, Brunak S, and Stormo GD. Displaying the information contents of structural RNA alignments: the structure logos. Comput Appl Biosci 13: 583-586, 1997. 18 11. Graur D, Shuali Y, and Li WH. Deletions in processed pseudogenes accumulate faster in rodents than in humans. J Mol Evol 28: 279-285, 1989. 12. Li X, Staszewski L, Xu H, Durick K, Zoller M, and Adler E. Human receptors for sweet and umami taste. Proc Natl Acad Sci U S A 99: 4692-4696, 2002. 13. Lindemann B. Receptors and transduction in taste. Nature 413: 219-225, 2001. 14. Lush IE., Hornigold N, King P, and Stoye JP. The genetics of tasting in mice. VII. Glycine revisited, and the chromosomal location of Sac and Soa. Genet Res 66: 167-174, 1995. 15. Margolskee RF. Molecular mechanisms of bitter and sweet taste transduction. J Biol Chem 277: 1-4, 2002. 16. Matsunami H, Montmayeur JP, and Buck LB. 
A family of candidate taste receptors in human and mouse. Nature 404: 601-604, 2000.
17. Max M, Shanker YG, Huang L, Rong M, Liu Z, Campagne F, Weinstein H, Damak S, and Margolskee RF. Tas1r3, encoding a new candidate taste receptor, is allelic to the sweet responsiveness locus Sac. Nat Genet 28: 58-63, 2001.
18. Mombaerts P. The human repertoire of odorant receptor genes and pseudogenes. Annu Rev Genomics Hum Genet 2: 493-510, 2001.
19. Montmayeur JP, Liberles SD, Matsunami H, and Buck LB. A candidate taste receptor gene near a sweet taste locus. Nat Neurosci 4: 492-498, 2001.
20. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562, 2002.
21. Nei M, Gu X, and Sitnikova T. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc Natl Acad Sci U S A 94: 7799-7806, 1997.
22. Nelson G, Chandrashekar J, Hoon MA, Feng L, Zhao G, Ryba NJ, and Zuker CS. An amino-acid taste receptor. Nature 416: 199-202, 2002.
23. Nelson G, Hoon MA, Chandrashekar J, Zhang Y, Ryba NJ, and Zuker CS. Mammalian sweet taste receptors. Cell 106: 381-390, 2001.
24. Pantages E and Dulac C. A novel family of pheromone candidate receptors in mammals. Neuron 28: 835-845, 2000.
25. Reed DR, Nanthakumar E, North M, Bell C, Bartoshuk LM, and Price RA. Localization of a gene for bitter-taste perception to human chromosome 5p15. Am J Hum Genet 64: 1478-1480, 1999.
26. Rodriguez I, Greer CA, Mok MY, and Mombaerts P. A putative pheromone receptor gene expressed in human olfactory mucosa. Nat Genet 1: 18-19, 2000.
27. Rodriguez I, Del Punta K, Rothman A, Ishii T, and Mombaerts P. Multiple new and isolated families within the mouse superfamily of V1r vomeronasal receptors. Nat Neurosci 5: 134-140, 2002.
28. Rouquier S, Blancher A, and Giorgi D. The olfactory receptor gene repertoire in primates and mouse: evidence for reduction of the functional fraction in primates.
Proc Natl Acad Sci U S A 97: 2870-2874, 2000.
29. Sainz E, Korley JN, Battey JF, and Sullivan SL. Identification of a novel member of the T1R family of putative taste receptors. J Neurochem 77: 896-903, 2001.
30. Schneider TD and Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18: 6097-6100, 1990.
31. Thompson JD, Higgins DG, and Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680, 1994.
32. Wong GT, Gannon KS, and Margolskee RF. Transduction of bitter and sweet taste by gustducin. Nature 381: 796-800, 1996.
33. Young JM, Friedman C, Williams EM, Ross JA, Tonnes-Priddy L, and Trask BJ. Different evolutionary processes shaped the mouse and human olfactory receptor gene families. Hum Mol Genet 11: 535-546, 2002.
34. Zhang X and Firestein S. The olfactory receptor gene superfamily of the mouse. Nat Neurosci 5: 124-133, 2002.

TABLE AND FIGURE LEGENDS

Table 1. Oligonucleotide primers used to amplify Tas2r34-Tas2r46 DNA sequences by PCR.

Fig. 1. Identification of 9 new TAS2Rs in mouse. Alignment of the predicted sequences of the 9 mouse TAS2R proteins identified and mouse TAS2R5 protein (1), generated by ClustalW. The diagram is based on output obtained by the PrettyBox program from the Wisconsin Package Version 10.2, Genetics Computer Group (GCG), Madison, WI. Horizontal lines indicate the amino acid sequences corresponding to the predicted transmembrane domains TM1-TM7. Amino acids are shown in single-letter code and are numbered according to the complete predicted amino acid sequences. Black and grey boxes indicate amino acids that are identical and similar, respectively, to the consensus sequence.

Fig. 2. (A) Sequence conservation between the mouse TAS2R proteins: sequence logos for the open reading frames of TAS2Rs.
The N- and C-terminal sequence stretches were removed to avoid length heterogeneity; no considerable sequence conservation was found in these regions. The height of each amino acid symbol is proportional to its frequency of occurrence. The amino acid sequences corresponding to the predicted transmembrane domains (TM1 to TM7), intracytoplasmic loops (IC1 to IC3) and extracellular loops (EC1 to EC3) are indicated. Asterisks indicate the highly conserved residues. Green, black, red and blue colors represent, respectively, uncharged polar (except for glycine and cysteine), non-polar, acidic and basic residues. Arrowheads indicate the most conserved amino acids in both mouse and human sequences. (B) Human TAS2Rs are more conserved than mouse TAS2Rs. Frequency of the most common amino acids in the mouse and human TAS2R proteins. Amino acids are shown in single-letter code and are numbered according to the mouse TAS2R5 sequence shown in Figure 1. Frequencies in the mouse protein family are plotted as open circles and frequencies in the human protein family as closed circles. A total of 36 mouse and 28 human TAS2Rs were evaluated. The asterisk indicates the only residue that is fully conserved in all mouse and human sequences. The arrow indicates the only residue that is fully conserved only in the mouse sequences. Arrowheads indicate the residues that are fully conserved only in the human sequences.

Fig. 3. Orthologous relationship between mouse and human Tas2r clusters. The two mouse Tas2r clusters are located on chromosome 6. Black horizontal lines represent expansions of the mouse Tas2r gene clusters and green horizontal lines represent expansions of the human gene clusters on chromosomes 7 and 12. Mouse genes are ordered within the clusters according to their genomic positions in the draft produced by the Mouse Genome Sequencing Consortium and human genes are ordered according to the human genome databases (NCBI build 30).
A vertical line indicates the location of each Tas2r in the expansions. Each Tas2r is labeled with its corresponding number. Arrowheads indicate the transcription polarities. Pseudogenes are shown in italics and the newly identified Tas2r genes and pseudogenes are shown in bold type. Intergenic distances are drawn to scale as indicated. Gaps in the horizontal lines indicate breaks between genes or groups of genes. Blue lines indicate Tas2r genes sharing ≥ 65% nucleotide identity between both species. Red lines indicate putative orthologous genes.

Fig. 4. A phylogenetic tree showing the evolutionary relationship between all full-length mouse and human TAS2R proteins. The mouse sequences are shown in red and human sequences in blue. Amino acid sequences were aligned using ClustalW. Further details are described in the text. Numbers above branches are bootstrap support values derived from 1,000 bootstrap replicates, with only those above 50% shown. Circles divide the tree into 5 sub-trees or groups of TAS2Rs.

Fig. 5. The Tas2r gene family expands by local tandem duplications. Scatterplots of percent amino acid identity between pairs of mouse (A) and human (B) intact TAS2R sequences in the same cluster (y-axis) against their physical distance (x-axis). Comparisons in A and B comprise the sequences predicted from the genes located in cluster 2 of mouse chromosome 6 and in the cluster on human chromosome 12, respectively. Two subsets of data appear along the x-axis in A because of the physical distance separating two discrete sub-clusters of genes in cluster 2 of mouse chromosome 6. Histograms show the distribution of amino acid identity between the sequences predicted from each of the mouse (C) and human (D) full-length Tas2r genes and its most similar gene in the same cluster (white bars) or its most similar gene in another cluster (black bars).
The average identity of the best match in the same cluster is 44.8% in mouse and 55.4% in human, and the average identity of the best match in a different cluster is 29.2% in mouse and 33.7% in human.

Table 1
Tas2r    Primer forward                 Primer reverse                  Product size (bp)
Tas2r34  5'-ATGTCTTTCTCACATTCATTC-3'    5'-TTATGATCTGGGAATACAAAG-3'     897
Tas2r35  5'-ATGGGACCCATCATGTCC-3'       5'-TCAGCAGCAGCCCCTCT-3'         966
Tas2r36  5'-ATGAAATCACAGCCAGTGACA-3'    5'-TCAAGGTTTCTTTTCTTTCAGC-3'    984
Tas2r37  5'-ATGAGATTTATGAACAGAACAAG-3'  5'-TTATGAAGCAGAGGGTCCCT-3'      1002
Tas2r38  5'-ATGCTGAGTCTGACTCCTGT-3'     5'-TCAGAGTGTCCTGGGAGGA-3'       996
Tas2r39  5'-ATGGCTCAACCCAGCAAC-3'       5'-TCAGAATCTATTTTGTAAGTAC-3'    960
Tas2r40  5'-ATGAATGCTACTGTGAAGTG-3'     5'-CTAAGGACCTGGGAGTTC-3'        939
Tas2r41  5'-ACATCATGGACTAGGAGAAGA-3'    5'-TTAGGAATCTGAGGATTCTGC-3'     866
Tas2r42  5'-GTAACAGACTGTGGTATTCTC-3'    5'-TTAGGAACCAGAGAATCTTACA-3'    965
Tas2r43  5'-ATGCCCTCCACACCCACA-3'       5'-CTAAAACCTCATCTTCAGGG-3'      882
Tas2r44  5'-ATGGCAATAATTACCACAAATTC-3'  5'-CTACCTTTTAAGGTAAAGATGAA-3'   960
Tas2r45  5'-ATGCAGCATCTTTTAAAGATAAT-3'  5'-TTAGAGACCCAAAGTTTCTAG-3'     965
Tas2r46  5'-GGGTGCTGCTATCCTAGTTA-3'     5'-TTAAAATTGTACAAAAGTATCCTC-3'  970
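
As a reading aid for Table 1, the following Python sketch checks three rows transcribed from the table for the features one would expect if an amplicon spanned a complete open reading frame: a forward primer beginning with the ATG start codon, a reverse primer whose reverse complement ends in a stop codon, and a product size divisible by three. The helper names (`reverse_complement`, `check_row`) and the assumption that each product runs exactly from the first base of one primer to the first base of the other are illustrative and are not part of the original methods.

```python
# Illustrative sketch only: helper names are hypothetical, and the three rows
# below are transcribed from Table 1 (primers written 5' to 3').

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_row(forward: str, reverse: str, product_bp: int) -> dict:
    """Check ORF-like features, ASSUMING the product spans primer to primer."""
    return {
        "starts_with_atg": forward.startswith("ATG"),
        # The reverse primer anneals to the coding strand, so its reverse
        # complement gives the last bases of the product on the coding strand.
        "ends_with_stop": reverse_complement(reverse)[-3:] in STOP_CODONS,
        "length_multiple_of_3": product_bp % 3 == 0,
    }


# name: (forward primer, reverse primer, product size in bp)
ROWS = {
    "Tas2r34": ("ATGTCTTTCTCACATTCATTC", "TTATGATCTGGGAATACAAAG", 897),
    "Tas2r37": ("ATGAGATTTATGAACAGAACAAG", "TTATGAAGCAGAGGGTCCCT", 1002),
    "Tas2r41": ("ACATCATGGACTAGGAGAAGA", "TTAGGAATCTGAGGATTCTGC", 866),
}

if __name__ == "__main__":
    for name, row in ROWS.items():
        print(name, check_row(*row))
```

Under the stated assumption, a 1002-bp product would correspond to 334 codons, that is, 333 amino acids plus a stop codon. A row that fails one of the checks is only flagged for a closer look; the sketch does not by itself decide whether a sequence is a pseudogene.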