Evolution of Closely Linked Gene Pairs in Vertebrate Genomes

Erik Franck,* Tim Hulsen, Martijn A. Huynen, Wilfried W. de Jong,* Nicolette H. Lubsen,* and Ole Madsen*1

*Biomolecular Chemistry, 271 Nijmegen Center of Molecular Life Science, Radboud University Nijmegen, Nijmegen, The Netherlands; and Centre for Molecular and Biomolecular Informatics, NCMLS, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands

The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of intergenic DNA (gene duos). We show here that a lack of head-to-tail (h2t) gene duos is an even more distinctive characteristic of mammalian genomes, with the platypus genome as the only exception. In nonmammalian vertebrate and in nonvertebrate genomes, the frequency of h2h, h2t, and tail-to-tail (t2t) gene duos is close to random. In tetrapod genomes, the h2t and t2t gene duos are more likely to be part of a larger gene cluster of closely spaced genes than h2h gene duos; in fish and urochordate genomes, the reverse is seen. In human and mouse tissues, the expression profiles of gene duos were skewed toward positive coexpression, irrespective of orientation. The organization of orthologs of both members of about 40% of the human gene duos could be traced in other species, enabling a prediction of the organization at the branch points of gnathostomes, tetrapods, amniotes, and euarchontoglires. The accumulation of h2h gene duos started in tetrapods, whereas that of h2t and t2t gene duos only started in amniotes.
The apparent lack of evolutionary conservation of h2t and t2t gene duos relative to that of h2h gene duos is thus a result of their relatively late origin in the lineage leading to mammals; we show that once they are formed h2t and t2t gene duos are as stable as h2h gene duos.

Introduction

The textbook view of a eukaryote gene is a solitary functional entity, a monocistronic coding sequence whose expression is controlled by an autonomous promoter. In fact, there is a significant clustering of genes in the mammalian genome, and the genes in these clusters show coordinate expression (Hurst et al. 2004; The FANTOM Consortium 2005; Gierman et al. 2007; Purmann et al. 2007). Often coordinate expression is due to a similar regulation of autonomous promoters located within the same active chromatin region, but it can also be the result of promoter cross talk (Hampf and Gossen 2007), of sharing of a promoter (a bidirectional promoter; see also below), or of transcriptional interference through a variety of mechanisms such as promoter occlusion, promoter competition, or RNA polymerase collision (Callen et al. 2004; Leupin et al. 2005; Shearwin et al. 2005). Nonautonomous expression, that is, expression of a gene coupled to the expression of another gene, whether positively or negatively, is likely to require close proximity of the genes.

The orientation of closely linked genes in the human genome is not random: there are more closely linked head-to-head (h2h) genes, usually defined as genes divergently transcribed from opposite strands separated by an intergenic region of 1 kb or less (Adachi and Lieber 2002; Trinklein et al. 2004; Li et al. 2006), than expected. The region between these h2h gene pairs is usually denoted as a bidirectional promoter. Formal experimental proof that expression of an h2h gene pair is regulated by a common and shared bidirectional element is available for only a few of such bidirectional promoters (see e.g., Hansen et al. 2003).
However, close juxtaposition of 2 autonomous promoters does result in promoter cross talk (Hampf and Gossen 2007), unless an insulator is interposed (see e.g., Xie et al. 2007). It is therefore likely that the members of a closely linked h2h gene pair are no longer independently expressed. Indeed, most (Trinklein et al. 2004; Li et al. 2006; Lin et al. 2007; Yang et al. 2007), but not all (Takai and Jones 2004), expression analyses showed significant correlation, both negative and positive, between the expression of h2h gene pair members.

The usual explanation for the evolutionary origin of closely linked h2h pairs is that once created by chance, it becomes difficult to separate the pair as insertion of intergenic DNA, such as a repetitive element, would disturb expression of both genes. H2h gene pairs would thus slowly accumulate during evolution. This explanation is supported by the higher than average evolutionary conservation of h2h pairs (Koyanagi et al. 2005; Li et al. 2006) and a lack of repetitive elements in the bidirectional promoter region between human h2h pairs (Takai and Jones 2004). It is curious, however, that enrichment in closely linked h2h pairs is reported to be limited to mammals (Koyanagi et al. 2005); one would expect them to accumulate in all evolutionary lineages.

1 Present address: Animal Breeding and Genetics Group, University of Wageningen, Wageningen, The Netherlands.
Key words: head-to-head gene, bidirectional promoter, coordinate expression.
E-mail: [email protected].
Mol. Biol. Evol. 25(9):1909–1921. 2008
doi:10.1093/molbev/msn136
Advance Access publication June 19, 2008
© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

We have therefore examined the evolution of closely linked h2h gene pairs and compared the dynamics of the evolution of h2h gene pairs with that of closely linked convergently transcribed antisense gene pairs (tail-to-tail; t2t) and that of head-to-tail gene pairs (h2t; consecutive genes transcribed from the same strand). We show here that the enrichment in closely linked h2h gene pairs is not limited to mammalian genomes but is also seen in, for example, those of chicken and Xenopus tropicalis. A distinguishing feature of the mammalian gene organization, with the exception of platypus, as compared with that of other investigated vertebrates and lower eukaryotes, is the relative lack of closely linked h2t gene pairs. By tracing the emergence of the h2h, t2t, and h2t gene pairs closely linked in the human genome, we show that the accumulation of h2h pairs predates that of h2t and t2t pairs. However, once formed, the h2t and t2t pairs are as stable as the h2h pairs.

Methods

Data Sets and Gene Distribution

In Ensembl (versions 40 and 46; Hubbard et al.
2007), the following species were used: primates: Homo sapiens (Hs; man), Pan troglodytes (Pt; chimpanzee), Macaca mulatta (Mmu; macaque), and Otolemur garnettii (Og; bush baby); Scandentia: Tupaia belangeri (Tb; tree shrew); rodents: Rattus norvegicus (Rn; rat), Mus musculus (Mm; mouse), Spermophilus tridecemlineatus (St; squirrel), and Cavia porcellus (Cp; guinea pig); Lagomorpha: Oryctolagus cuniculus (Oc; rabbit); Laurasiatheria: Canis familiaris (Cf; dog), Felis catus (Fc; cat), Bos taurus (Bt; cow), Myotis lucifugus (Ml; microbat), Erinaceus europaeus (Ee; hedgehog), and Sorex araneus (Sa; shrew); Afrotheria: Loxodonta africana (La; elephant) and Echinops telfairi (Et; tenrec); Xenarthra: Dasypus novemcinctus (Dn; armadillo); marsupial: Monodelphis domestica (Md; opossum); monotreme: Ornithorhynchus anatinus (Oa; platypus); bird: Gallus gallus (Gg; chicken); amphibian: Xenopus tropicalis (Xt; pipid frog); fish: Tetraodon nigroviridis (Tn; spotted green puffer fish), Takifugu rubripes (Tr; fugu or Japanese puffer fish), Danio rerio (Dr; zebrafish), Gasterosteus aculeatus (Ga; stickleback), and Oryzias latipes (Ol; medaka); urochordates: Ciona intestinalis (Ci; sea squirt) and Ciona savignyi (Cs; sea squirt); and other eukaryotes: Caenorhabditis elegans (Ce; nematode), Drosophila melanogaster (Dm; fruit fly), Anopheles gambiae (Ag; mosquito), and Saccharomyces cerevisiae (Sc; yeast). For all species, the number of h2h, h2t, and t2t gene pairs was determined, together with the length of the intergenic region, by means of Python scripting on the Ensembl gene annotation files in Ensmart (script available from the authors). The intergenic region was defined as the number of base pairs between the beginning and/or ends of the transcripts as annotated in Ensembl.
In the better annotated genomes, this includes the 5′ and 3′ untranslated regions (UTR); in poorly annotated genomes, information about the UTRs may be incomplete, and the number of closely linked gene pairs could be underestimated.

Conservation and Dynamics of Gene Pairs

The cross species homology data (orthology files in Ensmart) were used to find orthologs of human gene pairs with an intergenic distance <600 bp (gene duos) in other species. When more than one possible ortholog was found, the most probable gene pair was chosen, that is, the one which most resembled the human situation in orientation and/or distance. For the orthologs of each member of a human gene duo, the location and organization in other species were determined with possible outcomes h2h, h2t, t2t, or not linked. In case of location on the same chromosome, the intergenic distance was determined as well as whether or not the 2 genes were separated by other genes. By combining the data from different species, the most likely organization of the members of the Hs gene duos at the primate–rodent divergence was then inferred. Similarly, the putative organization of the orthologs could be inferred at the other branching points of a vertebrate tree consisting of Hs, Mm, Rn, Gg, Xt, Tn, Tr, Dr, Ol, and Ga, in which we placed Mm and Rn together in a rodent group and the 5 fish species in a fish group. In analyzing these data, the maximum parsimony principle was applied, assuming the least chromosomal rearrangements. Gene duos that were inferred to be closely linked at a branching point, yet separated in a descendant, were considered to be lost again (e.g., genes that are h2h gene duos in Hs, Mm, and Gg but not in Rn). We could trace orthologs of 47% (365) of the h2h, 27% (99) of the h2t, and 41% (185) of the t2t Hs gene duos; orthologs of the remainder of the Hs gene duos could not be found in a sufficient number of species.

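The orientation classes and the gene duo cutoff described in the Methods can be illustrated with a minimal sketch. This is not the authors' script (which is only described as available on request); the `Gene` record, the function names, and the toy coordinates are hypothetical, and the sketch assumes all genes lie on one chromosome with 1-based inclusive coordinates. Overlapping neighbours are skipped, in line with the statement in the Results that overlapping genes were not taken into account.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive
    strand: int  # +1 or -1

DUO_CUTOFF_BP = 600  # intergenic distance below which a pair is a "gene duo"

def classify_pair(left: Gene, right: Gene) -> str:
    """Orientation of two neighbouring genes; `left` has the lower coordinate."""
    if left.strand == right.strand:
        return "h2t"  # consecutive genes on the same strand
    # Opposite strands: divergent (h2h) if the left gene is on the minus strand,
    # convergent (t2t) if the left gene is on the plus strand.
    return "h2h" if left.strand == -1 else "t2t"

def intergenic_distance(left: Gene, right: Gene) -> int:
    """Number of base pairs strictly between the two annotated transcripts."""
    return right.start - left.end - 1

def gene_duos(genes):
    """Return (left name, right name, orientation, distance) for every gene duo."""
    ordered = sorted(genes, key=lambda g: g.start)
    duos = []
    for left, right in zip(ordered, ordered[1:]):
        dist = intergenic_distance(left, right)
        if 0 <= dist < DUO_CUTOFF_BP:  # negative distance means overlap: skipped
            duos.append((left.name, right.name, classify_pair(left, right), dist))
    return duos

# Toy chromosome (made-up coordinates, for illustration only)
genes = [
    Gene("A", 1_000, 2_000, -1),
    Gene("B", 2_301, 4_000, +1),
    Gene("C", 4_501, 6_000, +1),
    Gene("D", 6_201, 7_000, -1),
    Gene("E", 9_000, 9_500, -1),
]
duos = gene_duos(genes)
for duo in duos:
    print(duo)
```

In this toy input, the D/E pair is a head-to-tail pair but falls outside the 600 bp cutoff, so it is counted as a gene pair and not as a gene duo.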
Gene Expression

An expression data set consisting of a subset of normal human and mouse tissue samples from the Gene Logic BioExpress Database product (http://www.genelogic.com/genomics/bioexpress/) was used. The human data set consists of 115 tissue categories (compiled from 3,269 tissue samples) and 44,792 cDNA fragments; and the mouse data set consists of 25 tissue categories (compiled from 859 tissue samples) and 36,701 cDNA fragments (Hulsen et al. 2006). First, the Pearson correlations between the expression profiles of all cDNA fragments in the human set and all genes in the mouse set were calculated (data available at http://www.cmbi.ru.nl/~timhulse/orthocomp/). A perfect correlation has a score of 1; a perfect anticorrelation has a score of -1. Second, the Affymetrix fragment IDs of the chip data were mapped to the Ensembl (version 40) IDs used in our study, using Ensembl-Affymetrix mapping files from the Ensembl FTP site (see also http://www.ensembl.org/info/about/docs/microarray_probe_set_mapping.html). When one Ensembl ID was mapped to multiple Affymetrix fragment IDs, the average of the multiple correlation coefficients was used. Of the 18,553 Ensembl Hs gene IDs mapped to 28,348 Affymetrix IDs, 10,497 map to a single ID, 4,854 to 2 IDs, 2,037 to 3 IDs, and 1,165 to 4 or more Affymetrix IDs. Of the 16,269 Ensembl Mm gene IDs mapped to 20,548 Affymetrix IDs, 11,146 map to a single ID, 3,588 to 2 IDs, 1,136 to 3 IDs, and 399 to 4 or more Affymetrix IDs. Finally, the correlation coefficients were mapped for human and mouse Ensembl (version 40) h2h, h2t, and t2t gene pairs and 3,249 (human) or 2,197 (mouse) randomly assembled gene pairs as control.

Results

Organization of Closely Linked Gene Pairs

Ensembl (Hubbard et al. 2007) provides a comprehensive and integrated source of annotation of genome sequences and provides orthology links between genes in annotated genomes.
This makes this database particularly suitable for analyzing the evolution of gene organization within and between species. The total number of tandem gene pairs in a genome is theoretically equal to the number of genes. In practice, in well-assembled genomes, the number counted can be slightly higher because more than one potential pair can be formed when genes overlap. In poorly assembled genomes, where contigs are still short, the number of tandem gene pairs detected is significantly smaller than the number of genes (compare e.g., Hs and Dn in table 1). If gene organization is random, then the frequency of the 3 possible orientations of gene pairs should be 50% h2t, 25% h2h, and 25% t2t. This ratio is indeed more or less observed in most of the 34 investigated eukaryotic genomes (21 mammalian and 13 nonmammalian; table 1). However, in some genomes, notably in that of the hedgehog (Ee) but also, for example, in Cp, Sa, and La, we see considerably deviating frequencies. This is likely due to incomplete assembly of the genome; for reasons that we do not understand, incomplete assembly tends to result in a bias toward t2t pairs. The mammalian genome has been reported to be enriched in closely linked h2h gene pairs (Trinklein et al. 2004; Koyanagi et al. 2005; Li et al. 2006). As the sequences of a number of other vertebrate genomes, both mammalian and nonmammalian, have become available in recent years, we reexamined whether enrichment in closely linked h2h gene pairs is indeed a characteristic of mammalian genomes only. The frequency of h2h, h2t, and t2t gene pairs relative to the intergenic distance between the members of those pairs in representatives of different vertebrate groups is shown in figure 1 (data for all species examined are shown in supplementary fig. S1 [Supplementary Material online]; note that we only selected protein-coding genes and that overlapping genes were not taken into account).
With the notable exception of the opossum (Md) and platypus (Oa) genomes, all mammalian genomes do show an enrichment in closely linked h2h gene pairs and in some cases also in closely linked t2t pairs. It is unlikely that the difference in organization of the Md and Oa genomes is due to incomplete annotation as other mammalian genomes that also contain very few closely linked gene pairs do nevertheless show at least some enrichment in h2h pairs (see e.g., Cp and Et in supplementary fig. S2, Supplementary Material online). Enrichment in closely linked h2h pairs is also seen in the genome of G. gallus (Gg) and, to a lesser extent, in that of X. tropicalis (Xt) but not in any of the 5 fish genomes. The puffer fish genomes (Tn and Tr) show an enrichment in h2t as do the nonvertebrate genomes. As these data indicate that between vertebrate clades genomes might differ not only in the enrichment of closely linked h2h gene pairs but also in the frequency of closely linked h2t and t2t gene pairs, we determined the number of closely linked h2h, h2t, and t2t pairs (table 1; as the enrichment of h2h gene pairs in the mammalian genomes is seen for gene pairs with an intergenic distance of 600 bp or less [arrows in fig. 1], we used 600 bp, rather than the 1,000 bp used in other studies, as the cutoff for closely linked gene pairs. We will refer below to such closely linked gene pairs as gene duos).

Table 1
The Number of Gene Pairs in Vertebrate and in Some Lower Eukaryotic Genomes (Ensembl version 46)

                                        Number of Gene Pairs (% of total)               Number of Gene Pairs with Intergenic Distance <600 bp (% of total)
Species(a)  Genome Size (Mb)  Protein-Coding Genes  h2h           h2t            t2t           h2h           h2t           t2t
Hs          3,253             22,701                5,809 (25%)   11,813 (50%)   5,792 (25%)   770 (49%)     363 (23%)     445 (28%)
Pt          2,929             20,572                5,266 (25%)   10,525 (50%)   5,271 (25%)   588 (53%)     185 (17%)     331 (30%)
Mmu         3,094             21,944                5,324 (25%)   10,844 (50%)   5,300 (25%)   421 (47%)     188 (21%)     293 (32%)
Og          1,969             15,444                1,671 (21%)   3,747 (48%)    2,433 (31%)   92 (46%)      17 (9%)       91 (45%)
Tb          2,137             15,458                1,539 (22%)   3,272 (46%)    2,233 (32%)   87 (42%)      28 (14%)      89 (44%)
Mm          3,378             24,118                5,989 (24%)   12,717 (52%)   5,943 (24%)   703 (50%)     303 (22%)     395 (28%)
Rn          2,507             22,993                5,711 (25%)   11,892 (51%)   5,695 (24%)   399 (43%)     199 (21%)     338 (36%)
St          1,913             14,828                1,484 (22%)   3,114 (47%)    2,085 (31%)   74 (40%)      18 (10%)      91 (50%)
Cp          1,950             14,064                783 (21%)     1,575 (43%)    1,336 (36%)   75 (41%)      30 (16%)      79 (43%)
Oc          2,076             15,439                1,011 (22%)   2,087 (45%)    1,579 (33%)   103 (52%)     19 (10%)      76 (38%)
Cf          2,385             19,305                4,762 (25%)   9,901 (51%)    4,750 (24%)   208 (39%)     95 (18%)      225 (43%)
Fc          1,643             14,839                1,780 (22%)   3,957 (48%)    2,422 (30%)   102 (42%)     33 (13%)      110 (45%)
Bt          3,247             21,755                5,529 (25%)   10,881 (50%)   5,508 (25%)   395 (47%)     159 (19%)     291 (34%)
Ml          1,674             16,229                1,557 (21%)   3,447 (47%)    2,347 (32%)   128 (48%)     31 (12%)      106 (40%)
Ee          2,133             14,588                424 (16%)     1,034 (38%)    1,244 (46%)   86 (44%)      13 (7%)       95 (49%)
Sa          1,833             13,290                623 (20%)     1,315 (42%)    1,166 (38%)   63 (48%)      15 (11%)      55 (41%)
La          2,296             15,717                903 (20%)     2,020 (45%)    1,589 (35%)   93 (53%)      19 (11%)      62 (36%)
Et          2,112             16,562                740 (20%)     1,524 (42%)    1,385 (38%)   111 (47%)     21 (9%)       103 (44%)
Dn          2,146             15,540                632 (22%)     1,231 (42%)    1,051 (36%)   76 (53%)      15 (10%)      53 (37%)
Md          3,502             19,520                4,828 (24%)   10,048 (51%)   4,833 (25%)   92 (34%)      58 (22%)      116 (44%)
Oa          1,918             18,527                2,259 (23%)   4,974 (50%)    2,613 (27%)   47 (23%)      99 (49%)      58 (28%)
Gg          1,051             16,736                4,344 (25%)   8,358 (49%)    4,354 (26%)   445 (43%)     373 (36%)     220 (21%)
Xt          1,511             18,025                3,761 (24%)   7,782 (51%)    3,843 (25%)   201 (41%)     224 (46%)     67 (14%)
Tn          342               28,005                6,240 (22%)   15,481 (55%)   6,239 (23%)   440 (14%)     1,883 (60%)   817 (26%)
Tr          393               21,880                4,517 (24%)   10,048 (52%)   4,541 (24%)   115 (12%)     628 (65%)     222 (23%)
Ga          447               20,791                4,893 (24%)   10,684 (52%)   4,875 (24%)   387 (26%)     653 (44%)     443 (30%)
Ol          700               20,131                4,577 (24%)   10,191 (52%)   4,596 (24%)   289 (34%)     290 (35%)     261 (31%)
Dr          1,527             21,322                4,691 (23%)   11,050 (54%)   4,699 (23%)   239 (43%)     160 (29%)     156 (28%)
Ci          173               14,180                2,704 (21%)   7,363 (58%)    2,729 (21%)   303 (14%)     1,344 (62%)   538 (25%)
Cs          177               11,604                2,478 (22%)   6,461 (56%)    2,526 (22%)   142 (12%)     695 (60%)     319 (28%)
Ce          100               20,069                4,647 (22%)   11,231 (55%)   4,692 (23%)   1,055 (16%)   3,525 (52%)   2,137 (32%)
Dm          133               14,086                3,829 (26%)   7,009 (48%)    3,745 (26%)   1,663 (31%)   2,323 (43%)   1,374 (26%)
Ag          278               12,457                3,192 (26%)   6,167 (49%)    3,186 (25%)   451 (23%)     786 (40%)     708 (36%)
Sc          12                6,697                 1,822 (27%)   3,224 (47%)    1,808 (26%)   1,017 (21%)   2,262 (48%)   1,479 (31%)

(a) Species, see Methods for abbreviations.

FIG. 1. Frequency of h2h (black diamonds), h2t (dotted gray line), and t2t (gray line) gene pairs as a function of the length of the intergenic region in the species indicated in the panels. The frequency was calculated by dividing the number of gene pairs separated by a particular distance (in intervals of 50 bp) by the total number of gene pairs in the genome. The arrows indicate the cutoff for gene duos used in subsequent analyses. Data for other species are shown in supplementary figure S2 (Supplementary Material online).

The percentages of the 3 possible orientations of gene duos relative to the total number of gene duos in different vertebrate and nonvertebrate species are plotted in figure 2. There is some variation in the pattern in tetrapods, but the overall trend is clear: there is an increase in h2h gene duos not only in all mammals, except Oa, but also in Gg and Xt. The t2t gene duos are also in excess in most mammals but not in Gg and Xt, which actually have fewer t2t gene duos than expected from a random distribution (note that the excess of t2t gene duos is a feature of both well [e.g., Rn] and poorly [e.g., Ee] assembled mammalian genomes and thus unlikely to be an assembly artifact). The most noticeable and consistent feature is the marked lack of h2t gene duos in mammalian genomes except again in that of Oa.
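The comparison with a random organization can be made concrete with a short worked example. If the strand of each gene is assumed to be independent and equally likely to be plus or minus, the four strand combinations of two neighbouring genes each have probability 1/4; two of them give h2t and one each gives h2h and t2t, which yields the 50% / 25% / 25% expectation stated above. The sketch below takes the gene duo counts for three species from Table 1 and reports the observed fraction and its ratio to that expectation. The function names and the ratio itself are illustrative choices, not a statistic used in the paper.

```python
# Expected orientation frequencies if strand orientation is random
EXPECTED = {"h2h": 0.25, "h2t": 0.50, "t2t": 0.25}

# Gene duo counts (intergenic distance <600 bp) copied from Table 1
DUO_COUNTS = {
    "Hs": {"h2h": 770, "h2t": 363, "t2t": 445},
    "Oa": {"h2h": 47, "h2t": 99, "t2t": 58},
    "Tn": {"h2h": 440, "h2t": 1883, "t2t": 817},
}

def orientation_fractions(counts):
    """Fraction of gene duos in each orientation."""
    total = sum(counts.values())
    return {orientation: n / total for orientation, n in counts.items()}

def enrichment(counts):
    """Observed fraction divided by the random expectation (1.0 means 'as expected')."""
    fractions = orientation_fractions(counts)
    return {o: fractions[o] / EXPECTED[o] for o in EXPECTED}

for species, counts in DUO_COUNTS.items():
    fractions = orientation_fractions(counts)
    ratios = enrichment(counts)
    summary = ", ".join(
        f"{o}: {fractions[o]:.0%} (x{ratios[o]:.2f})" for o in EXPECTED
    )
    print(f"{species}: {summary}")
```

A ratio well above 1 for h2h together with a ratio well below 1 for h2t corresponds to the mammalian pattern described in the text, whereas values near 1 correspond to a near-random organization.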
As can be expected from their much longer divergence times, the organization of the 5 fish genomes is much more variable than that of the mammalian genomes: Dr has a mammalian-like distribution with a lack of h2t gene duos and an overrepresentation of h2h gene duos, whereas in the Tn and Tr genomes, as in nonvertebrate genomes, the organization of the gene duos is more random.

Clustering of Gene Duos

If the location of genes were random, then gene density must correlate with genome size and a more compact genome such as that of Dm or Ce is likely to have more gene duos, merely due to a higher gene density. There is also a significant correlation between the closely linked h2h gene pair ratio and gene density in the human genome (Li et al. 2006). To determine whether gene duos are in general part of a larger gene dense cluster, we determined whether either one of the members of the gene pair is flanked by another gene within 1,000 bp or less. As expected from their compact genomes, in Tn and Ci, about one-third of the gene duos are part of a cluster of at least 3 genes, whereas in the larger Dr genome, this proportion is only around 10% (table 2). In the tetrapod genomes examined, on average 24% of the h2t gene duos and 21% of the t2t gene duos are part of a gene cluster. The h2h gene duos are more likely to be solitary, with an average of only 13% belonging to a larger dense gene cluster. This holds true for both well-annotated genomes (Hs and Mm) and less well-annotated genomes and is thus unlikely to be an artifact due to lack of information about the exact lengths of the 5′ and 3′ UTRs. A reason for the more solitary nature of h2h gene duos in tetrapod genomes may be that a gene neighboring the pair would be in an h2t or a t2t organization, and at least h2t gene duos are scarce in mammalian genomes (fig. 2). Such gene pairs are also scarce in the Dr genome, and h2h gene duos are also more likely to be solitary in this genome than h2t and t2t gene duos.
In the Tn and Ci genomes, in which gene duos have a more random organization, the h2h gene duos are more likely to be part of a larger gene cluster than the h2t gene duos.

FIG. 2. Frequency of h2h, h2t, or t2t gene duos relative to the total numbers of gene pairs with intergenic distance of <600 bp. Dotted lines indicate 25% and 50%, which is the expected distribution of h2h or t2t (25%) and h2t (50%) if gene organization is random. Species are indicated as Hs = Homo sapiens, Pt = Pan troglodytes, Mmu = Macaca mulatta, Og = Otolemur garnettii, Tb = Tupaia belangeri, Mm = Mus musculus, Rn = Rattus norvegicus, St = Spermophilus tridecemlineatus, Cp = Cavia porcellus, Oc = Oryctolagus cuniculus, Cf = Canis familiaris, Fc = Felis catus, Bt = Bos taurus, Ml = Myotis lucifugus, Ee = Erinaceus europaeus, Sa = Sorex araneus, La = Loxodonta africana, Et = Echinops telfairi, Dn = Dasypus novemcinctus, Md = Monodelphis domestica, Oa = Ornithorhynchus anatinus, Gg = Gallus gallus, Xt = Xenopus tropicalis, Tn = Tetraodon nigroviridis, Tr = Takifugu rubripes, Ga = Gasterosteus aculeatus, Ol = Oryzias latipes, Dr = Danio rerio, Ci = Ciona intestinalis, Cs = Ciona savignyi, Ce = Caenorhabditis elegans, Dm = Drosophila melanogaster, Ag = Anopheles gambiae, and Sc = Saccharomyces cerevisiae. For common names of these species, see Methods. For clarity's sake, the 2 nonmammalian tetrapod species (Gg and Xt) and the 2 urochordates (Ci and Cs) are shown in gray. The mammalian species are divided in separate groups shown in alternating black and striped bars (primates, Scandentia, Laurasiatheria, and Xenarthra in black; rodents, Lagomorpha, Afrotheria, marsupials, and monotremes in striped bars). The different taxonomic groupings are indicated at the bottom of the figure.

Gene duplication often results in gene clusters; the β-globin gene cluster is a prime example. To determine whether gene duplication is a significant cause of gene duos, we determined how many genes in the human genome are adjacent to a paralog gene. Paralogous genes were identified via the paralogy link in Ensembl. As shown in table 3, about 10% of all human protein-coding genes have a paralog neighbor transcribed from the same strand, but only 6% of the h2t gene duos are paralogous. About 8% of the human protein-coding genes have a paralog neighbor transcribed from the opposite strand, with an equal occurrence of divergent or convergent transcription. However, for both the divergently and the convergently transcribed gene duos, only 1% consists of paralogs (table 3). The members of h2t gene duos are thus 3 times more likely to be paralogs than those of h2h or t2t gene duos, but gene duplication events have not significantly contributed overall to the formation of gene duos. To have some idea as to when the gene duplications occurred, we checked for the presence of orthologs of all the Hs paralogous gene duos in other species. As shown in table 3, 5 out of the 7 h2h paralogous gene duos likely predate the gnathostomes, whereas only 11 out of the 22 h2t and 2 of the 4 t2t paralogous gene duos do so (see also below). Overall, at least half of the Hs paralogous gene duos that could be traced back in the vertebrate tree are the result of a gene duplication early in vertebrate evolution.
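The clustering criterion used earlier in this section (a gene duo counts as part of a larger dense cluster when either member is flanked by another gene within 1,000 bp) can be sketched as follows. This is an illustrative reading of that criterion, not the authors' code: the `(start, end)` tuples, the function names, and the toy coordinates are hypothetical, and genes are assumed to be on one chromosome and sorted by start position.

```python
CLUSTER_FLANK_BP = 1_000  # flanking distance used in the text

def gap(upstream, downstream):
    """Base pairs between two genes given as (start, end), upstream gene first."""
    return downstream[0] - upstream[1] - 1

def duo_in_cluster(genes, i):
    """True if the duo formed by genes[i] and genes[i + 1] has a third gene
    within CLUSTER_FLANK_BP on either side. `genes` must be sorted by start."""
    has_left_flank = i > 0 and gap(genes[i - 1], genes[i]) <= CLUSTER_FLANK_BP
    has_right_flank = (
        i + 2 < len(genes) and gap(genes[i + 1], genes[i + 2]) <= CLUSTER_FLANK_BP
    )
    return has_left_flank or has_right_flank

# Toy examples (made-up coordinates): the first two genes form a duo in both lists
dense = [(1_000, 2_000), (2_301, 4_000), (4_801, 6_000), (9_000, 9_500)]
sparse = [(1_000, 2_000), (2_301, 4_000), (9_000, 9_500)]

print(duo_in_cluster(dense, 0))   # third gene lies 800 bp downstream of the duo
print(duo_in_cluster(sparse, 0))  # nearest other gene lies 4,999 bp away
```

Applying such a test to every gene duo of one orientation and dividing by the number of duos of that orientation would give percentages of the kind reported in table 2.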
Table 2
The Extent to which h2h, h2t, or t2t Gene Duos Are Part of a Larger Dense Gene Cluster

             Percent of Gene Duos Having a Neighboring Gene within 1,000 bp
Species(a)   h2h (%)   h2t (%)   t2t (%)
Hs           13        27        22
Pt           12        26        21
Mm           13        22        20
Gg           14        22        23
Xt           11        23        21
Tn           37        24        33
Dr           8         13        11
Ci           38        32        31

(a) See Methods for abbreviations.

Table 3
The Contribution of Gene Duplication to the Formation of Gene Duos in the Human Genome

Percent of human gene pairs in a particular organization consisting of paralogous genes
             h2h (%)   h2t (%)   t2t (%)
All          4         10        4
Duos         1         6         1

Age of human paralogous gene duos (h2h, h2t, t2t)
(1)a Human 1 (4)a Euarchontoglires (2)a (1)a Amniotes 1 (1)a (4)a Tetrapods
Gnathostomes 5         11        2
Total        7         22        4

(a) Between brackets: number of gene duos that could not be traced back further.

Conservation of Gene Duos

If close apposition of genes has consequences for the regulation of expression of those genes, then one would expect conservation of gene organization. Previous studies have shown that h2h gene pairs are significantly more likely to have the same gene organization in other species than h2t pairs (Koyanagi et al. 2005; Li et al. 2006; Sémon and Duret 2006). Sémon and Duret (2006) showed that genes convergently transcribed (t2t) are less likely to be linked in other species than genes transcribed in the same (h2t) or divergent direction (h2h). However, this study focused on gene clusters with no constraint on intergenic distance, and gene pairs were still considered as pairs even if another gene was interposed. We have therefore repeated this analysis and compared the organization of the members of h2h, h2t, and t2t human gene duos with that of their orthologs in other species. Most orthologs were in the same orientation in tetrapod genomes; even in fish about half of the orthologous gene pairs have the same relative orientation as in man (fig. 3, panel a).
However, when the intergenic distance is also considered, then close apposition of orthologs is only seen in primates and rodents (fig. 3, panel b). In other vertebrates, the intergenic distance between the orthologs of the human gene duos is usually greater than 600 bp. When only the orthologs of the human gene duos are considered, there is little difference between the human h2h, h2t, and t2t gene duos with respect to similarity of their organization in other species (fig. 3, panel b). However, when the number of those orthologs in a species relative to the number of gene duos in that species is also taken into account, a different picture emerges (fig. 3, panel c). As expected, the larger the evolutionary distance is from man, the fewer the orthologous gene duos found. When ortholog gene duos are found, they are mostly h2h and not h2t or t2t gene duos. Thus, in species more distant from man, there are relatively fewer orthologous h2t and t2t gene duos than h2h gene duos. Assuming that the evolutionary rate of generation of gene duos is the same for h2h, h2t, and t2t, there are 2 possible explanations for the preponderance of orthologous h2h gene duos. One is that h2h gene duos were generated earlier in evolution than h2t and t2t genes and are therefore more likely to be common; the other is that the close linkage of h2h genes, once generated, is better conserved during evolution than that of h2t and t2t genes.

Dynamics of Formation of Gene Duos

To determine when a particular human gene duo was formed during evolution, we need to trace the organization of the genes in the ancestral species. To that end, we used the data about the organization of the orthologs of the members of the human gene duos in other species.
Of the human gene duos, 365 h2h, 99 h2t, and 185 t2t were phylogenetically informative, that is, the organization of the orthologs could be traced in a sufficient number of different vertebrate species to be able to infer the most likely organization of those orthologs at the branching points of the gnathostome, tetrapod, amniote, and euarchontoglires (primates and rodents) lineages (nodes A–D in fig. 4; see also Methods). The inferred rearrangements of the members of the human h2h, h2t, and t2t duos are outlined in figure 5. For example, in the genome of the common ancestor of the gnathostomes (fig. 5, node A), 58 of the 365 human h2h gene duos were already present as h2h gene duos, 153 were already gene pairs but with a larger intervening distance, 34 were separated by intervening genes, 45 were linked but in the wrong orientation, and 75 were dispersed. Between branching points A and B, of these 75 dispersed pairs, 6 became linked but in the wrong orientation, 4 became linked in the right orientation but separated by intervening genes, 38 became a gene pair separated by >600 bp, and 8 became a gene duo, thus leaving 19 as dispersed gene pairs at node B. Of the 45 gene pairs at node A, which were linked in the wrong orientation, 2 became linked in the right orientation but still separated by intervening genes, 22 became a gene pair separated by >600 bp, and 8 became a gene duo. This left 13 linked gene pairs, but together with the 6 gene pairs that now became linked, this yields a total of 19 at node B. Of the 34 gene pairs separated by intervening genes at node A, in 10 cases the intervening genes were lost but leaving an intergenic distance >600 bp, and in 4 cases a gene duo was formed. Together with the 2 + 4 gene pairs that were gained, this gives a total of 26 at node B.
Of the 153 gene pairs separated by >600 bp at node A, 19 now became a gene duo, presumably due to loss of intergenic DNA, while 10 + 22 + 38 were added, which then yielded 204 gene pairs at node B. Finally, to the 58 gene duos at node A, 19 + 4 + 8 + 8 were added, giving 97 gene duos at node B.

FIG. 3. Percentage of orthologs of human h2h, h2t, and t2t gene duos present in other eukaryotes. On the left (panel a, gray bars), the percentage of orthologs of the human gene duos with the same orientation is shown; in the middle (panel b, black bars), the percentage of orthologs of the human gene duos with the same orientation and close proximity (<600 bp) is shown. The percentage of h2h genes is shown in the top panels, that of the h2t genes in the middle panels, and that of the t2t genes in the bottom panels. Species are indicated as in the legend to figure 2. The percentage is relative to the total number of orthologous gene pairs found (see also table 4), thus correcting for missing orthologs due to incomplete gene annotation and/or orthology determination. On the right (panel c), a Venn diagram shows the overlap between Hs gene duos and those in other species. The left circle represents the Hs gene duos, and the right circle the gene duos in the species (denoted as in the legend to fig. 2) indicated on the left. The overlap represents the extent to which the gene duos are orthologous; the numbers are from left to right: the number of unique Hs gene duos, the number of common (orthologous) gene duos, and the number of gene duos unique to the species indicated. The left column shows the data for the h2h gene duos, the middle column those for the h2t gene duos, and the right column those for the t2t gene duos.
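The node A to node B bookkeeping for the 365 human h2h gene pairs can be checked mechanically. The following is a minimal sketch in Python that only replays the counts quoted in the text; the state labels and the function name `apply_transitions` are illustrative choices made for this sketch and are not the notation of figure 5.

```python
# Inferred organization of the 365 human h2h gene pairs at node A
# (gnathostome ancestor), as quoted in the text. The state names are
# hypothetical labels chosen for this sketch.
NODE_A = {
    "dispersed": 75,
    "linked_wrong_orientation": 45,
    "intervening_gene": 34,
    "pair_gt_600bp": 153,
    "duo": 58,
}

# (from_state, to_state, count) for the changes between nodes A and B,
# taken from the worked example in the text.
TRANSITIONS_A_TO_B = [
    ("dispersed", "linked_wrong_orientation", 6),
    ("dispersed", "intervening_gene", 4),
    ("dispersed", "pair_gt_600bp", 38),
    ("dispersed", "duo", 8),
    ("linked_wrong_orientation", "intervening_gene", 2),
    ("linked_wrong_orientation", "pair_gt_600bp", 22),
    ("linked_wrong_orientation", "duo", 8),
    ("intervening_gene", "pair_gt_600bp", 10),
    ("intervening_gene", "duo", 4),
    ("pair_gt_600bp", "duo", 19),
]


def apply_transitions(counts, transitions):
    """Return the state counts after moving gene pairs between states."""
    result = dict(counts)
    for source, target, n in transitions:
        result[source] -= n
        result[target] += n
    return result


NODE_B = apply_transitions(NODE_A, TRANSITIONS_A_TO_B)

if __name__ == "__main__":
    for state, count in NODE_B.items():
        print(f"{state}: {count}")
```

Because pairs are only moved between states, the total stays at 365 at both nodes, which is a quick consistency check on the quoted numbers.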
Figure 5 illustrates that the mode of formation of the gene duos, whether h2h, h2t, or t2t, is in general very similar: first, genes were rearranged such that they became linked in the proper orientation; then intergenic DNA was lost. For all 3 gene pair orientations, more than 80% of the pair members were already organized in the right orientation without an intervening gene in tetrapods (fig. 5, node B, and fig. 6, left panel). The most conspicuous difference between the h2h gene duos on the one hand and the h2t and t2t gene duos on the other hand is that about 50% of the h2h gene duos (172 out of 365) predate amniotes, whereas only 14% of the h2t (15 out of 99) and 28% of the t2t gene duos (52 out of 185) do so (see fig. 5, node C, and fig. 6, right panel). Formation of the human h2h gene duos thus started in early tetrapod evolution, whereas most of the human h2t and t2t gene duos were formed in amniotes (fig. 6).

We have also attempted to estimate the rate of loss of gene duos. In principle, the best measure is to count how many gene duos appear to have been lost again later in evolution (e.g., if orthologs form a gene duo in fish, Xt, rodents, and man but not in Gg, the inference is that the gene duo was lost in the Gg lineage). For that we need to know which gene duos were present in the ancestral genome. The numbers shown in figure 5 at a particular node are the sum of the gene duos present at the previous node plus the gene duos inferred to have been formed prior to divergence of the lineages deriving from that node. The latter gene duos thus represent those that are inferred to be present at that node because they are present in the descendant species. Hence, by definition, loss of those gene duos cannot be detected. That means that we can only determine whether a gene duo is lost after a particular node if that gene duo was present at the previous node. For example, the 172 h2h gene duos present at node C (fig. 5) should also be present at node D. Loss in the rodent lineage can then be inferred for those 172 h2h gene duos (table 4). The numbers are very small, particularly in the case of h2t gene duos, and the reliability is therefore not high. A rate of loss can also be calculated by combining the data presented in figures 3 and 5 and knowing how many orthologs of the human gene duos can be detected in other species (supplementary table S1, Supplementary Material online). From figure 5, it can be calculated how many of the orthologs of the human gene duos present in a particular species (supplementary table S1, Supplementary Material online) were likely to be already present at the nearest branch point; the loss then follows from the number of gene duos present in the genome of that species at this time (for sample calculation, see supplementary fig. S2, Supplementary Material online). For the h2h gene duos, these 2 approaches yield very similar rates: a loss between 11% and 14% per 100 Myr in the mouse lineage and between 8% and 10% per 100 Myr in the chicken lineage. For the h2t gene duos, the estimates for the mouse lineage are a loss between 6% and 15% per 100 Myr; in the chicken lineage, none would be lost. Finally, the loss of the t2t gene duos is estimated to be between 0% and 19% per 100 Myr in the mouse lineage and between 13% and 15% per 100 Myr in the chicken lineage. From these estimates, it appears that there is no major difference in stability of gene duos depending on orientation.

FIG. 4. Phylogenetic tree of the vertebrate species used to reconstruct the dynamics of human gene pair formation during vertebrate evolution. The branch points of gnathostomes (A), tetrapods (B), amniotes (C), and euarchontoglires (D) and the endpoint Homo sapiens (E) are indicated by black diamonds. Species are indicated as in the legend to figure 2.

FIG. 5. Dynamics of human gene pair formation during vertebrate evolution. The organization of the human gene pairs at the branch points of gnathostomes (A), tetrapods (B), amniotes (C), and euarchontoglires (D) and in H. sapiens (E) was inferred from the organization in the available vertebrate genomes. The various possible organizations are shown at the top and denoted as: dis = dispersed; linked = linked on the same chromosome but in the wrong orientation; 2 arrows flanking a black box = linked in the proper orientation but separated by another gene; 2 arrows flanking a slash = genes are adjacent in the proper orientation but the intergenic distance is larger than 600 bp; and 2 arrows = gene duo, that is, genes are adjacent in the proper orientation and with an intergenic distance <600 bp. The inferred numbers of gene pairs in the various predicted organizations are given for the branch points A–D; the arrows underneath each number indicate the number of those gene pairs that changed to a specific other type of organization before the next branch point, ultimately resulting in the observed numbers of human gene duos at point E. The data for the 365 human h2h gene duos are shown on the left, those for the 99 h2t gene duos in the middle, and those for the 185 t2t gene duos on the right.

FIG. 6. Extent of formation of gene pairs during evolution. In the left panel, the percentage of gene pairs, irrespective of intergenic distance, present at the branch and endpoints as in figures 4 and 5 is shown. The right panel shows the percentage of gene duos at these branch and endpoints. Note that in this figure, the combined data of 5 fish species are used; the data shown in figure 3 refer to a single species.

Gene Expression

The closer 2 genes are, the more likely they are to be located in the same expression cluster (Sémon and Duret 2006; Purmann et al. 2007).
However, coexpression may be more marked when gene pairs are considered. To test this, we plotted the Pearson correlation coefficient of expression of gene pairs grouped by intergenic distance, that is, a group of gene pairs that were nonlinked (control), one distantly linked (intergenic distance >100 kb), one linked with an intergenic distance between 10 and 100 kb, one linked with an intergenic distance between 0.6 and 10 kb, and the closely linked gene duos (less than 600 bp). The overall results are very similar (the closer the linkage, the higher the likelihood of coexpression), whether we looked at human or mouse tissue or whether we considered h2h, h2t, or t2t orientation, but there are some surprising differences (fig. 7A). For the human h2h genes, there was very little difference in the likelihood of coexpression with decreasing distance when the intergenic distance was less than 10 kb; in mice, closely linked gene pairs (<600 bp) are more likely to be coexpressed than more distantly linked gene pairs. The reverse was found for the h2t pairs: close linkage in man is more likely to result in coexpression than in mice. The effect of close linkage on coordinate expression was least for t2t pairs: the shift toward coexpression is less than for h2h and h2t pairs and slightly larger in mice than in man. To show the difference in the likelihood of coexpression relative to the orientation of closely linked gene pairs more directly, the curves for the h2h, h2t, and t2t gene duos were superimposed (fig. 7B). This clearly shows that for all orientations, close linkage correlates with coexpression but that this effect is largest for h2t gene pairs in man and for h2h gene pairs in mice. Note that all curves are skewed to the right, that is, toward a positive correlation between the expression of both members of a gene duo. In these experiments, we found no evidence for a negative correlation of expression of a significant number of gene pairs.
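The coexpression analysis described above can be illustrated as follows. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, expression profiles are assumed to be plain lists of per-tissue values, and the distance classes and the 0.2-unit correlation bins follow the legend to figure 7. Whether the 600 bp boundary is inclusive is an assumption (strictly less than 600 bp is used here).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def distance_class(intergenic_bp):
    """Assign a linked gene pair to one of the intergenic distance intervals."""
    if intergenic_bp < 600:  # assumption: gene duo means strictly < 600 bp
        return "<600"
    if intergenic_bp < 10_000:
        return "600-10k"
    if intergenic_bp < 100_000:
        return "10-100k"
    return ">100k"


def correlation_bin(r):
    """Index of the 0.2-wide bin containing r; bin 0 starts at -1.0, bin 9 ends at 1.0."""
    index = int((r + 1.0) * 5)
    return min(max(index, 0), 9)


def relative_histogram(coefficients):
    """Fraction of gene pairs falling in each of the ten correlation bins."""
    counts = [0] * 10
    for r in coefficients:
        counts[correlation_bin(r)] += 1
    total = len(coefficients)
    return [c / total for c in counts] if total else counts
```

A distribution that is shifted toward the higher-numbered bins for the closely linked classes, compared with randomly paired genes, corresponds to the rightward skew described in the text.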
Discussion

Mammalian genomes have been reported to be enriched in divergently transcribed cis-antisense gene pairs with an intergenic distance of less than 1 kb, the h2h genes with a bidirectional promoter region. The exact number of such genes in the human genome is still a matter of debate; the reported numbers vary from 677 (Koyanagi et al. 2005) to 1,262 (Li et al. 2006), 1,304 (Lin et al. 2007), 1,352 (Trinklein et al. 2004), and 5,653 (Yang et al. 2007). Most of these differences can be explained by the precision with which transcription start sites have been mapped, by whether not only major but also minor start sites were taken into account, and by whether or not only protein-coding genes were used. In addition, it is not always clear whether or not overlapping genes were excluded. In the experiments reported here, we have used only protein-coding genes, excluded overlapping transcription units, and restricted our analysis to nonoverlapping transcription units separated by an intergenic distance of 600 bp or less (dubbed gene duos), where the transcription units were defined as annotated in Ensembl. Using this definition, we found 770 h2h, 363 h2t, and 445 t2t gene duos in the human genome (table 1). The genomes of other mammals show a similar organization with respect to gene duos: on average, about 10% of the h2h gene pairs and about 6% of the t2t gene pairs have an intergenic distance of 600 bp or less (excluding the opossum and platypus genomes). One striking property of the mammalian gene organization is that close linkage between h2t gene pairs appears to be avoided, with on average only about 2% of the gene pairs in this orientation having an intergenic distance of 600 bp or less. In spite of this similarity in gene organization, many gene duos are lineage specific (fig. 3). Our data thus emphasize that the formation of gene duos is a dynamic process: they have been continuously formed and also continuously lost during evolution.

The reconstruction of the formation of the human gene duos showed that the present-day human h2h gene duos originated earlier in evolution than the h2t or t2t gene duos (figs. 5 and 6). A priori one would expect that there is an equal chance of a DNA rearrangement yielding an h2h, h2t, or t2t gene duo. The apparently later formation of the present-day human h2t and t2t gene duos would then be due to a lesser stability than that of the h2h gene duos: h2t and t2t gene duos formed prior to the emergence of the amniotes could just have been lost again (note that only about 15% instead of the expected 50% of the gene duos in eutherian genomes are h2t gene duos [fig. 2 and table 1], which suggests that fixation of an h2t gene duo is a rare event). However, our data (table 4 and supplementary fig. S2 [Supplementary Material online]) indicate that there is no major difference in stability between h2h, h2t, and t2t gene duos in the amniote lineage, which would imply that h2t and t2t gene duos were selected against earlier.

Evolutionary changes in the acceptability of closely linked genes are likely. For example, about one-third of the gene pairs in the Dm genome are closely linked gene pairs; in the Hs genome, only about 7% (table 1). Furthermore, in nonmammalian genomes, the frequency of h2h, h2t, and t2t gene duos is mostly close to random, whereas in eutherian mammalian genomes it is not (fig. 2). A potential problem with h2t gene pairs is transcriptional read-through from the upstream gene into the downstream gene. This can cause promoter occlusion (the elongating RNA polymerase could remove positive transcription factors) or promoter activation (the elongating RNA polymerase could remove repressors) (Callen et al. 2004; Leupin et al. 2005; Shearwin et al. 2005), or it can result in the synthesis of a read-through mRNA, which encodes a chimeric protein (Parra et al. 2006). To prevent transcriptional interference, a strong transcription termination signal is needed between the 2 genes. The exact mechanisms and sequence motifs that signal termination by polymerase II are still not fully understood (for recent reviews, see Buratowski 2005; Rosonina et al. 2006), but it is clear that the cleavage that precedes polyA addition is a prerequisite. In this respect, it is of interest to note that the polyA addition signaling in yeast is more complex than in mammalian cells (Zhao et al. 1999) and that the distance from the polyA addition site to transcription termination in yeast may be shorter (about 0.1 kb; Russo and Sherman 1989) than that in mammalian cells (>0.5 kb; Rosonina et al. 2006).

Table 4
Loss of Previously Formed Gene Duos during Later Evolution

                                     Number of Known        Number of Gene Duos Lost
Orientation  Lineage                 Gene Duos at Node(a)   Dispersed   >600 bp Intergenic   Inverted   Gene Insertion   Total Lost
h2h          Node D(b) to Mm, Rn(c)  172                    2 (1.2%)    16 (9.3%)            0 (0.0%)   1 (0.6%)         19 (11.0%)
h2h          Node C to Gg            97                     2 (2.1%)    24 (24.7%)           4 (4.1%)   1 (1.0%)         31 (31.9%)
h2h          Node B to Xt            58                     5 (8.6%)    16 (27.6%)           2 (3.4%)   1 (1.7%)         24 (41.4%)
h2t          Node D to Mm, Rn        15                     0 (0.0%)    3 (20.0%)            0 (0.0%)   0 (0.0%)         3 (20.0%)
h2t          Node C to Gg            10                     0 (0.0%)    0 (0.0%)             0 (0.0%)   0 (0.0%)         0 (0.0%)
h2t          Node B to Xt            10                     0 (0.0%)    0 (0.0%)             0 (0.0%)   0 (0.0%)         0 (0.0%)
t2t          Node D to Mm, Rn        52                     1 (1.9%)    8 (15.4%)            0 (0.0%)   0 (0.0%)         9 (17.3%)
t2t          Node C to Gg            35                     2 (5.7%)    11 (31.4%)           0 (0.0%)   1 (2.9%)         14 (40.0%)
t2t          Node B to Xt            29                     3 (10.3%)   12 (41.4%)           1 (3.4%)   2 (6.9%)         18 (62.1%)

(a) Note that the number of gene duos known to be present in the ancestral species at a particular node is the number present at the previous node.
(b) See figure 4.
(c) See Methods for abbreviations.
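The totals and percentages of table 4 can be re-derived from the per-category counts. The sketch below is only a recomputation of the tabulated values; the dictionary layout and the function name `loss_summary` are assumptions made for illustration. It does not convert the fractions into rates per 100 Myr, because that step requires divergence times that are not given in this section.

```python
# Table 4 as (orientation, lineage) -> (gene duos known at the previous node,
# number lost per category). Category keys are labels chosen for this sketch.
TABLE_4 = {
    ("h2h", "D to Mm, Rn"): (172, {"dispersed": 2, "gt_600bp": 16, "inverted": 0, "gene_insertion": 1}),
    ("h2h", "C to Gg"): (97, {"dispersed": 2, "gt_600bp": 24, "inverted": 4, "gene_insertion": 1}),
    ("h2h", "B to Xt"): (58, {"dispersed": 5, "gt_600bp": 16, "inverted": 2, "gene_insertion": 1}),
    ("h2t", "D to Mm, Rn"): (15, {"dispersed": 0, "gt_600bp": 3, "inverted": 0, "gene_insertion": 0}),
    ("h2t", "C to Gg"): (10, {"dispersed": 0, "gt_600bp": 0, "inverted": 0, "gene_insertion": 0}),
    ("h2t", "B to Xt"): (10, {"dispersed": 0, "gt_600bp": 0, "inverted": 0, "gene_insertion": 0}),
    ("t2t", "D to Mm, Rn"): (52, {"dispersed": 1, "gt_600bp": 8, "inverted": 0, "gene_insertion": 0}),
    ("t2t", "C to Gg"): (35, {"dispersed": 2, "gt_600bp": 11, "inverted": 0, "gene_insertion": 1}),
    ("t2t", "B to Xt"): (29, {"dispersed": 3, "gt_600bp": 12, "inverted": 1, "gene_insertion": 2}),
}


def loss_summary(known, losses):
    """Return (total number lost, percentage of the known gene duos lost)."""
    total = sum(losses.values())
    return total, round(100.0 * total / known, 1)


SUMMARY = {key: loss_summary(known, losses) for key, (known, losses) in TABLE_4.items()}

if __name__ == "__main__":
    for (orientation, lineage), (total, percent) in SUMMARY.items():
        print(f"{orientation} node {lineage}: {total} lost ({percent}%)")
```

All totals agree with the table. For the h2h node C to Gg row, the direct computation 31/97 gives 32.0%, whereas the table lists 31.9%; the tabulated value appears to be the sum of the rounded per-category percentages (2.1 + 24.7 + 4.1 + 1.0).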
Stringency of polyA addition and transcription termination could be a factor in the maintenance of h2t gene duos, and it would be of interest to examine whether polyA addition signaling is also more complex in, for example, fugu or urochordates than in eutherian mammalian cells. If transcription termination poses a problem for h2t gene pairs, why are t2t gene duos not depleted? One possible explanation is that collision could cause pausing of RNA polymerase II, which in turn enhances termination (Zhao et al. 1999; Buratowski 2005; Rosonina et al. 2006), thereby solving the termination problem. Another possibility is that the antisense transcripts serve a presumably regulatory, as yet unknown role. Antisense transcription abounds in the human genome (Yelin et al. 2003; Dahary et al. 2005; Sun et al. 2005; Engström et al. 2006) and over 40% of the human or mouse transcription units may have an antisense transcript, usually noncoding (Engström et al. 2006). The functional consequences of antisense transcription could be a factor in the selection against t2t gene duos. The depletion of h2t gene duos is not quite unique to eutherian mammalian genomes; we saw this also in the genome of the zebrafish, although not in other genomes of lower eukaryotes. What distinguishes the eutherian mammalian genomes from that of zebrafish is a nonuniform distribution of the distance between h2h genes (fig. 1). A subset of h2h genes with a short intergenic region is also seen in the Gg and Xt genomes. This would suggest that the trend toward formation of such a subset of h2h gene pairs started in tetrapods, before the mammalian divergence. However, the earliest offshoots in mammalian evolution, the monotremes (platypus) and marsupials (opossum), lack this subset of h2h gene pairs. The opossum genome has very few gene duos, which could be the reason that this subset is not detected.
The opossum genome otherwise shares the eutherian mammalian characteristic of depletion of h2t gene duos and enrichment in h2h gene duos. The lack of gene duos in the opossum genome does not appear to be a problem of assembly of the genome sequence, as the number of gene pairs that can be formed is about equal to the number of protein-coding genes (table 1). The platypus genome assembly is not yet complete, as the number of potential gene pairs is only half of the number of protein-coding genes. Although the paucity of gene duos may be the reason that a subset of h2h gene pairs is not seen, it cannot explain why the organization of the gene duos in the platypus genome is random. Incompletely assembled eutherian mammalian genomes with only a few gene duos (see table 1) do show the typical depletion of h2t genes (fig. 2).

FIG. 7. Correlation of expression of members of h2h, h2t, and t2t gene pairs. The relative number of gene pairs with specific Pearson correlation coefficients is plotted against the Pearson correlation coefficients in increments of 0.2 units. (A) Human and mouse distribution plots of h2h, h2t, and t2t gene pairs in several intergenic distance intervals (0–600 bp = <600; 600 bp–10 kb = 600–10 k; 10–100 kb = 10–100 k; and 100 kb–∞ = >100 k). The number of human gene pairs for h2h are, respectively, 599, 594, 1,638, and 624 pairs; for h2t, 247, 2,131, 3,513, and 1,015 pairs; and for t2t, 364, 1,328, 1,425, and 441 pairs. The number of mouse gene pairs for h2h are, respectively, 401, 548, 1,213, and 355 pairs; for h2t, 145, 1,727, 2,574, and 608 pairs; and for t2t, 252, 1,088, 926, and 247 pairs. The number of human randomly paired genes was 3,249 pairs; for mouse, 2,197 random gene pairs were selected. (B) Pearson correlation distribution plot for human or mouse gene duos for which microarray data are available (human: 599 h2h, 247 h2t, and 364 t2t; mouse: 401 h2h, 145 h2t, and 252 t2t). Data are based on Ensembl Version 40.
If the platypus gene organization reflects that of the mammalian ancestor, then we must conclude that the formation of a subset of closely linked h2h genes in chicken and Xenopus is evolutionarily independent of the emergence of such a gene organization in mammalian genomes. The alternative is that large rearrangements have taken place in the platypus genome. The latter alternative is the most likely, as almost all the gene duos likely to have been present in the last common ancestor are no longer gene duos in the platypus genome (data not shown). It is noteworthy that platypus is a typical mammal with respect to the density of repetitive elements in its genome; for at least one stretch, the density is even higher than in the human genome (Margulies et al. 2005; see also Warren et al. 2008). Continuous insertion of repetitive elements would tend to create gene-poor and gene-rich domains and has been suggested to be one of the factors driving genes together (e.g., Takai and Jones 2004). Whether the platypus genome indeed contains gene-poor and gene-rich domains awaits further analysis of that genome; if so, the organization of the gene duos in the platypus genome would then show that compaction to a gene-rich domain does not necessarily lead to an enrichment in closely linked h2h gene pairs. Amphibian and avian genomes are relatively poor in repetitive elements (Organ et al. 2007) but do contain a subclass of closely linked h2h genes. Hence, there is no strict correlation between density of repetitive elements and enrichment in closely linked h2h gene pairs. For the h2h gene duos, it has been suggested that the sharing of regulatory elements provides selection pressure to maintain the gene pair (Adachi and Lieber 2002; Trinklein et al. 2004; Lin et al. 2007; Yang et al. 2007).
In the case of the rare eutherian h2t gene duo, it could be the transcriptional coupling or the chimeric gene product that is favorable; for the t2t gene duo, the antisense transcript could have a regulatory role (RIKEN Genome Exploration Research Group and Genome Science Group [Genome Network Project Core Group] and the FANTOM Consortium 2005). It could also just be chance that gene duos stay together: insertion of DNA in such a short intergenic region would be a rare event. In the case of h2h or h2t gene duos, the target area for DNA insertion would be even smaller as repetitive elements tend to be excluded from the first 300 bp of the promoter region (Takai and Jones 2004).

Supplementary Material

Supplementary table S1 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Gene Logic Inc. for the use of a subset of normal human and mouse tissue samples from the Gene Logic BioExpress Database product. This work was financially supported by the Netherlands Organization for Advancement of Pure Research (NWO).

Literature Cited

Adachi N, Lieber MR. 2002. Bidirectional gene organization: a common architectural feature of the human genome. Cell. 109:807–809.
Buratowski S. 2005. Connections between mRNA 3′ end processing and transcription termination. Curr Opin Cell Biol. 17:257–261.
Callen BP, Shearwin KE, Egan JB. 2004. Transcriptional interference between convergent promoters caused by elongation over the promoter. Mol Cell. 14:647–656.
Dahary D, Elroy-Stein O, Sorek R. 2005. Naturally occurring antisense: transcriptional leakage or real overlap? Genome Res. 15:364–368.
Engström PG, Suzuki H, Ninomiya N, et al. (24 co-authors). 2006. Complex loci in human and mouse genomes. PLoS Genet. 2:e47.
Gierman HJ, Indemans MHG, Koster J, Goetze S, Seppen J, Geerts D, van Driel R, Versteeg R. 2007. Domain-wide regulation of gene expression in the human genome. Genome Res. 17:1286–1295.
Hampf M, Gossen M. 2007. Promoter crosstalk effects on gene expression. J Mol Biol. 365:911–920.
Hansen J, Bross P, Westergaard M, Nielsen M, Eiberg H, Børglum A, Mogensen J, Kristiansen K, Bolund L, Gregersen N. 2003. Genomic structure of the human mitochondrial chaperonin genes: HSP60 and HSP10 are localised head to head on chromosome 2 separated by a bidirectional promoter. Hum Genet. 112:71–77.
Hubbard TJP, Aken BL, Beal K, et al. (58 co-authors). 2007. Ensembl. Nucleic Acids Res. 35:D610–D617.
Hulsen T, Huynen M, de Vlieg J, Groenen P. 2006. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 7:R31.
Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5:299–310.
Koyanagi KO, Hagiwara M, Itoh T, Gojobori T, Imanishi T. 2005. Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system. Gene. 353:169–176.
Leupin O, Attanasio C, Marguerat S, Tapernoux M, Antonarakis SE, Conrad B. 2005. Transcriptional activation by bidirectional RNA polymerase II elongation over a silent promoter. EMBO Rep. 6:956–960.
Li Y-Y, Yu H, Guo Z-M, Guo T-Q, Tu K, Li Y-X. 2006. Systematic analysis of head-to-head gene organization: evolutionary conservation and potential biological relevance. PLoS Comput Biol. 2:e74.
Lin JM, Collins PJ, Trinklein ND, Fu Y, Xi H, Myers RM, Weng Z. 2007. Transcription factor binding and modified histones in human bidirectional promoters. Genome Res. 17:818–827.
Margulies EH, NISC Comparative Sequencing Program, Maduro VVB, Thomas PJ, Tomkins JP, Amemiya CT, Luo M, Green ED. 2005. Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. Proc Natl Acad Sci USA. 102:3354–3359.
Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature. 446:180–184.
Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, Antonarakis SE, Guigo R. 2006. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 16:37–44.
Purmann A, Toedling J, Schueler M, Carninci P, Lehrach H, Hayashizaki Y, Huber W, Sperling S. 2007. Genomic organization of transcriptomes in mammals: coregulation and cofunctionality. Genomics. 89:580–587.
RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) and the FANTOM Consortium. 2005. Antisense transcription in the mammalian transcriptome. Science. 309:1564–1566.
Rosonina E, Kaneko S, Manley JL. 2006. Terminating the transcript: breaking up is hard to do. Genes Dev. 20:1050–1056.
Russo P, Sherman F. 1989. Transcription terminates near the poly(A) site in the CYC1 gene of the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 86:8348–8352.
Sémon M, Duret L. 2006. Evolutionary origin and maintenance of coexpressed gene clusters in mammals. Mol Biol Evol. 23:1715–1723.
Shearwin KE, Callen BP, Egan JB. 2005. Transcriptional interference – a crash course. Trends Genet. 21:339–345.
Sun M, Hurst LD, Carmichael GG, Chen J. 2005. Evidence for a preferential targeting of 3′-UTRs by cis-encoded natural antisense transcripts. Nucleic Acids Res. 33:5533–5543.
Takai D, Jones PA. 2004. Origins of bidirectional promoters: computational analyses of intergenic distance in the human genome. Mol Biol Evol. 21:463–467.
The FANTOM Consortium. 2005. The transcriptional landscape of the mammalian genome. Science. 311:1709–1711.
Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. 2004. An abundance of bidirectional promoters in the human genome. Genome Res. 14:62–66.
Warren WC, Hillier LW, Marshall Graves JA, et al. (99 co-authors). 2008. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 453:175–183.
Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. 2007. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci USA. 104:7145–7150.
Yang MQ, Koehly LM, Elnitski LL. 2007. Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and ovarian cancer genes. PLoS Comput Biol. 3:e72.
Yelin R, Dahary D, Sorek R, et al. (16 co-authors). 2003. Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 21:379–386.
Zhao J, Hyman L, Moore C. 1999. Formation of mRNA 3′ ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev. 63:405–445.

Kenneth Wolfe, Associate Editor

Accepted June 12, 2008