Interchromosomal Segmental Duplications Explain the Unusual Structure
of PRSS3, the Gene for an Inhibitor-Resistant Trypsinogen
Lee Rowen,*1 Eleanor Williams,†1 Gustavo Glusman,* Elena Linardopoulou,† Cynthia Friedman,† Mary Ellen Ahearn,‡ Jason Seto,§ Cecilie Boysen,‖ Shizhen Qin,* Kai Wang,¶ Amardeep Kaur,*
Scott Bloom,* Leroy Hood,* and Barbara J. Trask†
*Institute for Systems Biology; †Division of Human Biology, Fred Hutchinson Cancer Research Center; ‡Departments of
Pediatrics and Genetics, University of Miami School of Medicine; §Bioinformatics and Computational Biology Program,
George Mason University; ‖ViaLogy; and ¶PhenoGenomics Corporation
Homo sapiens possess several trypsinogen or trypsinogen-like genes of which three (PRSS1, PRSS2, and PRSS3) produce
functional trypsins in the digestive tract. PRSS1 and PRSS2 are located on chromosome 7q35, while PRSS3 is found on
chromosome 9p13. Here, we report a variation of the theme of new gene creation by duplication: the PRSS3 gene was
formed by segmental duplications originating from chromosomes 7q35 and 11q24. As a result, PRSS3 transcripts display
two variants of exon 1. The PRSS3 transcript whose gene organization most resembles PRSS1 and PRSS2 encodes a functional protein originally named mesotrypsinogen. The other variant is a fusion transcript, called trypsinogen IV. We show
that the first exon of trypsinogen IV is derived from the noncoding first exon of LOC120224, a chromosome 11 gene.
LOC120224 codes for a widely conserved transmembrane protein of unknown function. Comparative analyses suggest that
these interchromosomal duplications occurred after the divergence of Old World monkeys and hominids. PRSS3 transcripts consist of a mixed population of mRNAs, some expressed in the pancreas and encoding an apparently functional
trypsinogen and others of unknown function expressed in brain and a variety of other tissues. Analysis of the selection
pressures acting on the trypsinogen gene family shows that, while the apparently functional genes are under mild to strong
purifying selection overall, a few residues appear under positive selection. These residues could be involved in interactions
with inhibitors.
Introduction
It is now widely appreciated that the human genome
has been shaped by mutation, gross rearrangement, and
duplications of gene-containing segments during its evolution. Roughly 5% of the human genome appears to have
arisen in the last 40 Myr through duplication of segments
of 1 kb or longer (Bailey et al. 2001). Genes contained in
these duplicate segments often evolve to take on distinct
functions (Hurles 2004) either through alteration of their
protein structure or their regulatory elements. Here, we report that duplicated segments from human chromosomes 7
and 11 have coalesced on chromosome 9, forming an unusual trypsinogen gene with two distinct promoters, derived
from each of the originating chromosomes.
Trypsinogens are the inactive precursors to trypsins,
a class of serine proteases that digest proteins by cleaving
at lysine or arginine residues. They are produced in the pancreas and secreted to the duodenum and intestines, where
they are activated by enterokinase (enteropeptidase, PRSS7)
to form trypsin, which, in turn, activates itself and other
digestive enzymes (Kitamoto et al. 1994). Trypsinogen
genes in mammals and birds constitute a multigene family
whose members are found within the beta T cell receptor
(TCRB) locus (Hood, Rowen, and Koop 1995; Wang et al.
1995; Rowen, Koop, and Hood 1996). In the fully characterized TCRB loci of human (Rowen, Koop, and Hood
1996) and mouse (AE000663, AE000664, AE000665),
clusters of trypsinogen (T) or trypsinogen-like (TL) genes
flank a region spanning hundreds of kilobases containing
the TCRB variable gene segments.
1 These authors contributed equally to the work.
Key words: segmental duplication, fusion gene, selection, trypsinogen.
E-mail: [email protected]; [email protected].
Mol. Biol. Evol. 22(8):1712–1720. 2005
doi:10.1093/molbev/msi166
Advance Access publication May 18, 2005
© The Author 2005. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
Analysis of the human TCRB locus on 7q35 revealed
genes coding for the functional trypsinogen proteins PRSS1
and PRSS2, also known, respectively, as the cationic and
anionic trypsinogens (Scheele, Bartelt, and Bieger 1981),
but not PRSS3, also called mesotrypsinogen (Rinderknecht
et al. 1984; Nyaruhucha, Kito, and Fukuoka 1997). The
3.6-kb five-exon genes coding for PRSS1 and PRSS2
are embedded within the first and last units, respectively,
of a tandem array of five 10.6-kb–duplication units located
near the 3′ end of the TCRB locus (Rowen, Koop, and Hood
1996). The three internal units of the tandem array contain
trypsinogen pseudogenes, none of which corresponds to the
mesotrypsinogen cDNAs in GenBank. Complicating matters further, cDNAs for an alternative form of mesotrypsinogen called trypsinogen IV were reported (Wiegand et al.
1993). Although exons 2–5 were the same as those found in
mesotrypsinogen, the sequence of exon 1 of trypsinogen IV
was completely different from exon 1 of any other trypsinogen. As a result, the predicted protein is missing the leader
signal required for the secretion of pancreatic enzymes.
Because a cluster of nonfunctional TCRB V gene
segments was previously localized to chromosome 9
(Charmley, Wei, and Concannon 1993; Robinson et al.
1993), we reasoned that the missing mesotrypsinogen-trypsinogen IV gene would be found in association with
these V genes. This hypothesis was confirmed by our sequence of this portion of chromosome 9p13. Moreover,
we report here that ~175 kb of chromosome 11 sequence
was also duplicated to chromosome 9 after the divergence
of hominids and Old World monkeys, and that trypsinogen IV’s first exon has been co-opted from another gene.
We show that the two PRSS3 variants are expressed in
different tissues and explore the selection pressures acting
on the trypsinogen gene family.
Materials and Methods
Sequencing
The sequence of the beta T cell/trypsinogen locus on
chromosome 7 (NG_001333; U66059, U66060, U66061)
was described earlier (Rowen, Koop, and Hood 1996). The
sources of chromosome 9 sequence containing the orphon
V gene segments and PRSS3 are annotated in AF029308.
For the region on chromosome 11, a BAC clone RP11-61J24 was identified from a library screen using probes
to trypsinogen IV exon 1 and sequenced (AC010583). The
human sequences analyzed in this report can be found at
chr7:141810473–141960472, chr9:33566742–33816741,
and chr11:129032812–129239214 in the 05/04 genome assembly (http://genome.ucsc.edu). A rhesus macaque BAC
clone, CHORI250-28G19, containing an orthologous trypsinogen gene cluster was sequenced and submitted to
GenBank as AC149201. All sequencing was done using
the high-redundancy shotgun method (Rowen, Lasky, and
Hood 1999) and finished to >99.99% accuracy.
Fluorescence In Situ Hybridization
Clones from regions involved in the chromosome 7-9
duplication (group A) and in the chromosome 11-9 duplication (group B) were used to determine the origin and timing of duplicated sequence in the chromosome 9 locus.
Group A comprises the following: human chromosome 7
cosmid B97 (subcloned from a YAC derived from the
CGM1 cell line) containing V20-V25; human chromosome
9 cosmid X91 (from ATCC 1475; AF029308) containing
orphon V20-V24; human chromosome 7 BAC CTD-2087C12
containing trypsinogen genes; and rhesus macaque BAC
CHORI250-28G19 (AC149201) containing trypsinogen
genes. Group B comprises the following: human chromosome 9 cosmid 3B9 (subcloned from BAC CTA-109D8;
AF029308) containing trypsinogen IV exon 1 and human
chromosome 11 BAC RP11-61J24 (AC010583) containing
LOC120224. Note that not all four clones in group A were
analyzed in all species.
Cosmid or BAC DNA was isolated from bacterial
cultures, biotinylated via nick translation, and hybridized
in the presence of human Cot1 DNA to metaphase spreads
prepared from phytohaemagglutinin-stimulated human
lymphocytes and fibroblast or lymphoblast cell lines of
various other primates using published procedures (Trask
1999). The cell lines used were CRL1847 or AG16618
for chimpanzee, CRL1854 or AG05251 for gorilla,
CRL1850 or GM06213 for orangutan, GM03443 for rhesus
macaque, CRL1495 for baboon, and H39 for gibbon. Cell
repository lines (CRL) were obtained from ATCC (www.atcc.org),
and AG- and GM-lines were obtained from the
Coriell Cell Repositories (http://locus.umdnj.edu/ccr/).
The sites of hybridization were detected with two layers of
fluorescein-conjugated avidin connected with biotinylated
goat antiavidin antibody. The chromosomes were counterstained with 4′,6-diamidino-2-phenylindole, and images were
collected for analysis as described elsewhere (Trask 1999).
For each assay, the number and location of hybridization
signals were analyzed in at least 10 metaphase spreads and
numerous interphase nuclei.
Sequence Analysis
Expressed sequence tags (ESTs) containing exon 1 of
mesotrypsinogen, trypsinogen IV, and LOC120224 were
identified from the 05/04 genome assembly. Library information was derived from the EST accession numbers and
the Image Consortium (http://image.llnl.gov/image/html/
humlib_info.shtml). To identify duplicated regions, we
used the TCRB/trypsinogen locus on chromosome 7 as
a starting query sequence for similarity searches. We
performed pairwise alignment of similar sequences on
chromosomes 7, 9, and 11 using Blast2 (Tatusova and Madden 1999), without repeat masking and with parameters set
so that alignments spanned interspersed-repeat integrations,
small insertions, and deletions. RepeatMasker (Smit et al.
2004) was used to identify interspersed-repeat sequences
either spanning or truncated at the breakpoints of similarity
between sequence pairs, allowing the original and rearranged sequences to be identified in some cases. To calculate divergence rates between species, regions of similarity
between human and chimpanzee (11/03 assembly) or rhesus macaque (BAC CHORI250-28G19) sequences were
first identified using BLAT (Kent 2002) searches or Blast2.
The percent identities/divergences of all regions of similarity were calculated excluding insertions-deletions (indels)
and applying Jukes-Cantor correction for multiple substitutions (Jukes and Cantor 1969).
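The divergence calculation described above (percent difference over aligned positions, ignoring indels, followed by Jukes-Cantor correction) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the use of "-" as the gap character, and the toy sequences in the usage note are assumptions made for the example.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped, which mirrors
    excluding insertions-deletions from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # indel column: not counted
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared

def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for observed difference p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jukes_cantor(p_distance("ACGT-ACGTA", "ACGTTACGAA"))` corrects an observed difference of 1 in 9 ungapped sites. At the few-percent divergences discussed in this report the correction is small: an observed difference of 5% becomes a corrected distance of about 5.2%.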
Detecting Natural Selection
Selection pressures acting on 3 human, 5 rhesus macaque, and 11 mouse putatively functional trypsinogen
genes were estimated (human: PRSS1, PRSS2, PRSS3;
rhesus macaque: try9, try12, try13, try14, try16; mouse:
try4, try5, try7, try8, try9, try10, try11, try12, try15,
try16, try20). The mouse genes are in sequence with accessions AE000663, AE000664, and AE000665. Genes were
classed as putatively functional if they did not contain
frame-shifts, premature stop codons, or the R122H mutation known to cause hereditary pancreatitis (Whitcomb
et al. 1996). In addition, one apparently functional gene
from rhesus macaque (try4) was excluded from selection
analyses as the GENECONV software (Sawyer 1989)
predicts that this gene has been involved in a gene-conversion event (P < 0.01 for both global permutation
and Bonferroni-corrected Karlin-Altschul calculated P values, all sites considered, mismatches allowed [g1]). A
multiple sequence alignment of coding regions of the 19
genes was made using ClustalW (Thompson, Higgins,
and Gibson 1994) and manually edited to keep codons in
frame and remove sites with gaps in any sequence (supplementary data files 1 and 2, Supplementary Material online).
FIG. 1. Paralogous relationships among regions on human chromosomes 9p13, 7q35, and 11q24 resulting from interchromosomal duplications and
local rearrangements. Creation of two transcriptional variants of PRSS3; trypsinogen IV is a hybrid of an exon copied from a chromosome 11 gene
LOC120224 and mesotrypsinogen exons duplicatively transferred from chromosome 7. Shaded panels indicate the extent of sequence shared by
two chromosomes. Segments that are in inverted orientation are shown with arrows below them. The positions of known genes (black blocks) and
the trypsinogen-duplication units (white blocks) are shown. Chromosome-specific insertions or deletions 3 kb, for example, of interspersed repetitive
elements, are not shown. Each rearrangement breakpoint that lies within an interspersed repetitive element is marked with either O (intact and thus
original) or T (truncated and thus derived).
We first compared the number of nonsynonymous
substitutions per nonsynonymous site (dN) to the number
of synonymous substitutions per synonymous site (dS) over
the length of the genes using the program yn00 (Yang and
Nielsen 2000). We then used the ADAPTSITE software
version 1.3.2 (Suzuki and Gojobori 1999; Suzuki, Gojobori,
and Nei 2001) and the CODEML program in the PAML
software version 3.14 (Yang 1997; Wong et al. 2004) to
determine whether any amino acid sites were likely to be
subject to positive selection. For ADAPTSITE, a phylogenetic tree was constructed using the p-distance tree option in
the NJBOOT program in the LINTREE software (Takezaki,
Rzhetsky, and Nei 1995). PAUP* (Swofford 2003) was
used to produce a neighbor-joining tree using the uncorrected p-distance for input into CODEML. ADAPTSITE
looks at each amino acid site and estimates the number
of synonymous and nonsynonymous sites and changes
throughout the tree using ancestral sequences constructed
by maximum parsimony. The program then tests whether
the proportion of nonsynonymous changes at each amino
acid site is significantly different from the neutral expectation (dN/dS = 1). CODEML uses maximum likelihood
methods to determine how well models that allow sites with
different classes of dN/dS (ω) ratios fit the data. The fit of
four models to the data was calculated. Model 1a permits
two site classes with 0 < ω0 < 1 and ω1 = 1. Model 2a adds
to Model 1a a third site class with ω2 > 1, thus allowing
some sites to be under positive selection. Model 7 assumes
a beta distribution of ω, with all values of ω between 0 and
1. Model 8 adds an extra class with ω > 1 to Model 7. These
models were chosen for analysis because the fit of the data
to the more general models (M2a and M8) can be compared
to the fit to the more specific models (M1a and M7) using
chi-square tests as described in Yang et al. (2000). The
program was run with three starting ω values and, in all
cases, converged on the same likelihood values, suggesting
that global rather than local maxima were achieved.
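The nested-model comparison described above is a likelihood ratio test, which can be sketched as below. The sketch assumes 2 degrees of freedom for both comparisons (M1a versus M2a and M7 versus M8), the value commonly used for these model pairs; the source only states that chi-square tests were used. The function name and the log-likelihood values in the usage note are hypothetical.

```python
import math

def lrt_two_extra_params(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    lnl_null: log-likelihood of the more specific model (e.g. M1a or M7).
    lnl_alt:  log-likelihood of the more general model (e.g. M2a or M8).
    Returns (test statistic, P value). For a chi-square distribution with
    2 degrees of freedom the upper-tail probability is exactly exp(-x / 2),
    so no statistics library is needed.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimization noise
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

With hypothetical log-likelihoods of -1000.0 for the specific model and -995.0 for the general model, `lrt_two_extra_params(-1000.0, -995.0)` gives a statistic of 10 and a P value of about 0.0067, so the model allowing ω > 1 would fit significantly better in that made-up case.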
Results and Discussion
Genomic Organization of the Mesotrypsinogen/
Trypsinogen IV Gene
The interval between NOL6 and UBE2R2, which encompasses PRSS3, is vastly larger in the genomes of human
and chimpanzee (>340 kb) than in the genomes of dog
(9.8 kb) or mouse (11.6 kb). This difference suggests that
the primate genome underwent rearrangement of this region
of chromosome 9p13 subsequent to the mammalian radiation. Approximately 190 kb of the additional DNA in
human and chimpanzee can be explained by duplicative
transfers of sequence to chromosome 9 from the TCRB
locus on chromosome 7q35 and from a region on 11q24.
The derivative sequence at 9p13 has two regions of similarity with the TCRB locus on chromosome 7 separated by a
66-kb region of similarity with portions of 11q24 (fig. 1). In
human, an additional 150 kb of sequence between the
NOL6 and UBE2R2 genes are accounted for by multiple
intrachromosomal duplications (see the “Segmental
Duplication” track of the Human Genome Browser, http://genome.ucsc.edu/).
The more telomeric segment of similarity between
9p13 and 7q35 spans ~88 kb on chromosome 9 and contains V gene segments V20 through V26 (fig. 1). The centromere-proximal region of chromosome 7 similarity spans
28 kb and contains V gene segment V29 (fig. 1). A 20-kb
region of chromosome 7 containing V gene segments V27
and V28 is not present on chromosome 9. In addition to
V29, the more centromeric region of chromosome 7 similarity contains a trypsinogen gene (PRSS3, mesotrypsinogen)
embedded in a ~10.6-kb trypsinogen-duplication unit and
the initial 1.2 kb of an adjacent trypsinogen-duplication
unit. Chromosome 7 is clearly the source of this duplicated
segment: this trypsinogen-duplication unit continues in the
chromosome 7 sequence. The paralogous region on chromosome 7 contains a total of five trypsinogen-duplication
units, which include PRSS1, three trypsinogen pseudogenes, and PRSS2, respectively (Rowen, Koop, and Hood
1996). Immediately upstream of the V20-V26–containing
segment of chromosome 7 similarity are two additional
short regions (1.9 kb and 0.6 kb) that both match in reverse orientation to portions of the trypsinogen-duplication
units. These small segments show high similarity to the
trypsinogen-duplication units located at the 3′ end of the
TCRB locus on chromosome 7, but might have been derived
from the unit at the 3′ end of the duplicated sequence on
chromosome 9. Both segments have higher percent identity
to the unit on chromosome 9 than to any of the five units on
chromosome 7 (average 95% vs. 92%).
The region of 11-9 paralogy (fig. 1) includes the
first exon of a six-exon gene, LOC120224, found on
chromosome 11 and represented by cDNA AK098106.
LOC120224 is the original source of this first exon as it
is contained in ESTs of LOC120224 from rhesus macaque,
a species that does not have the 11-9 duplication (see below). No other known transcriptional units were transferred
from chromosome 11 to chromosome 9. Curiously, in
contrast to the portion transferred from chromosome 7,
the portion of chromosome 11 represented on chromosome 9 underwent significant rearrangement, presumably
subsequent to the duplicative transfer. A total of 109 kb of
sequence found on chromosome 11 is absent from the chromosome 9 copy, with the result that nine discrete blocks with
similarity to chromosome 11 are observed (fig. 1). Moreover,
several blocks of sequence are rearranged in order and
orientation on chromosome 9 relative to chromosome 11.
Of the 18 rearrangement breakpoints, 9 lie in interspersed-repeat elements, and in all of these cases, the chromosome
11 copy contains an intact repetitive element spanning the
breakpoint, while chromosome 9 contains the derived, truncated form. Thus, in each case where the original state can be
deduced from the repeats present at the breakpoints, chromosome 11 represents the original, unaltered state, and the copy
on chromosome 9 has been rearranged.
Timing of the Duplications
Excluding the trypsinogen-duplication units, the two
main blocks of sequences on chromosome 9 derived from
chromosome 7 average 94.5% and 94.9% identity to their
counterparts on chromosome 7, suggesting that the two
regions were copied to chromosome 9 at the same time.
Interestingly, the percent identity between the chromosome 9
and 7 sequences drops at the start of the PRSS3 trypsinogen-duplication unit and is only 91.6% over the length of the unit.
This point is discussed separately below. The chromosome
9-11 paralogy has an overall percent identity of 95.2%. Duplication of sequence from chromosome 7, followed by the
chromosome 11-to-9 duplication is the most parsimonious
explanation for the interruption of the regions of similarity
with chromosome 7 and the replacement, possibly through
a double recombination, of the V27 and V28 gene segments
on chromosome 9 by chromosome 11 sequence.
The approximate average percent divergence of
5% for both the chromosome 7 versus 9 and 11 versus 9
comparisons suggests that both segmental duplications to
chromosome 9 occurred sometime after the separation of
hominids from Old World monkeys (~7% divergence;
Yi, Ellsworth, and Li 2002) but before the human-orangutan
split (~3% divergence; Chen and Li 2001), assuming no
homogenizing sequence exchanges occurred between the
paralogs. This conclusion is supported by our fluorescence
in situ hybridization (FISH) analyses in a variety of
primates using probes derived from regions involved in
either the chromosome 7-9 or chromosome 11-9 transfers.
Probes from both regions show hybridization signals on
two chromosome pairs in orangutan, gorilla, chimp,
and human, but on a single pair in baboon/macaque
and gibbon (not shown). Thus, both duplicative transfers
to chromosome 9 occurred after the divergence of hominids from Old World monkeys, but before the orangutan
lineage split off from the gorilla/chimp/human branch.
The ~3% greater divergence of the trypsinogen-repeat
block on chromosome 9 relative to the version on chromosome 7 compared to the rest of the duplication is intriguing.
Its divergence of 8% would imply that the chromosome 7
and 9 copies of the trypsinogen-containing portion of the
duplication began diverging 35 MYA (assuming a neutral
mutation rate of ~1.1 × 10⁻⁹ changes per nucleotide per
year [Chen and Li 2001]), but FISH results establish that
this region duplicated from chromosome 7 to 9 more recently (i.e., 15–20 MYA, after the gibbon lineage branched off
from the common ancestor of human and orangutan). We
excluded the formal possibility that the trypsinogen-repeat
units are diverging more rapidly than neighboring sequence
by comparing the divergence between human and chimpanzee within these units to that of neighboring regions. Human and chimpanzee sequences from the first introns in the
trypsinogen genes have diverged at a rate (1.3%, averaged
for four genes, 1-kb sequence compared) similar to the rate
observed for ~8.5 kb of nontandemly duplicated noncoding
sequence 5′ of the PRSS1-duplication unit (1.4%). These
rates are similar to previous estimates of human-chimp
divergence (e.g., 1.24 ± 0.07% in Chen and Li 2001).
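The conversion from sequence divergence to divergence time used in this paragraph can be illustrated with a short calculation. The sketch assumes the simplest neutral model, in which the two copies accumulate substitutions independently at the same rate, so that time = divergence / (2 × rate). The factor of 2 is an assumption of this sketch; it is chosen because it reproduces the approximate figure quoted in the text. The function name is hypothetical.

```python
def divergence_time_years(divergence, rate_per_site_per_year):
    """Estimate time since two copies began diverging.

    divergence: fraction of differing sites between the two copies (e.g. 0.08).
    rate_per_site_per_year: neutral substitution rate per lineage.
    Both lineages accumulate changes, hence the factor of 2 in the denominator.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)

# 8% divergence at ~1.1e-9 changes per nucleotide per year
time_mya = divergence_time_years(0.08, 1.1e-9) / 1e6
print(f"{time_mya:.1f} MYA")
```

Under these assumptions, 8% divergence corresponds to roughly 36 million years, in line with the approximate 35 MYA quoted above and clearly older than the 15–20 MYA window supported by the FISH results.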
The most plausible explanation for the anomalous divergence between the chromosome 7 and chromosome 9
paralogs of the trypsinogen units is that a single chromosome 7 segment duplicated to chromosome 9, 15–20
MYA, but the original source of the PRSS3-containing
portion is no longer present on chromosome 7, or its relationship with the PRSS3 unit has been obscured by gene-conversion events. Tandemly duplicated blocks are often
subject to deletion in unequal crossover events (Strachan
and Read 1999). Indeed, loss of the trypsinogen-duplication
unit containing the try6 pseudogene is a common polymorphism in humans (Seboun et al. 1989; Rowen, Koop, and
Hood 1996), and gene conversion is speculated to occur
among trypsinogens in human (Chen and Ferec 2000) and
other species (Roach et al. 1997). At least two of the
human-duplication units on chromosome 7 and two of the
rhesus macaque-duplication units show evidence of gene
conversion (P < 0.01) using GENECONV (Sawyer 1989).
Human versus rhesus macaque comparisons provide additional insight into trypsinogen gene evolution.
Human-rhesus divergence is greater for the trypsinogen
first intron (10.4% average divergence of best match,
1-kb sequence compared) than for the single-copy sequence
just 5′ of the PRSS1-duplication unit (5.7%). The divergence of the single-copy sequence is similar to that found
in 470 kb of sequence from the major histocompatibility
complex class II region (Daza-Vamenta et al. 2004). Furthermore, five of the six human trypsinogen intronic sequences match best to the trypsinogen gene at the 5′ end
of the rhesus array. These findings suggest that human
and rhesus macaque each retained a different subset of
the ancestral set of duplication units, or that orthologous
relationships are obscured by gene conversion between
the units, with possible loss of the donor units.
Gene Structures of PRSS3 and
Chromosome 11-Specific LOC120224
Analysis of the duplicative transfers from chromosomes 7 and 11 to chromosome 9 reveals the origin
of the mesotrypsinogen and trypsinogen IV transcriptional variants of PRSS3. The intron/exon organization of
mesotrypsinogen on chromosome 9 is the same as that of
PRSS1 and PRSS2 on chromosome 7 (fig. 2). Each of the
three genes spans ~3.6 kb. The five exons of mesotrypsinogen code for a protein that is 247 amino acids long.
Thirteen residues, including the leader peptide required for
secretion, are in the first exon of mesotrypsinogen derived
from chromosome 7 sequence (see cDNAs X15505 and
BC069476 for examples).
In contrast, the trypsinogen IV variant of PRSS3 spans
48.6 kb and derives its first exon from a sequence that
was duplicated from chromosome 11 (fig. 1). Most ESTs
and mRNAs for trypsinogen IV include the chromosome
11–derived exon 1 and chromosome 7–derived exons
2–5 of the PRSS3 gene. Allelic variants of this splice form
are called a and b (Wiegand et al. 1993) (fig. 2). None contains
mesotrypsinogen’s first exon. An mRNA for another
splice form (AY052783, named “isoform B”) includes an
additional exon after its chromosome 11–derived first exon.
This additional exon is from the chromosome 7–derived sequence
situated before the first exon of mesotrypsinogen.
The translation start site for the “a/b” splice form is thought
to be in the exon derived from chromosome 11 (Wiegand
et al. 1993). Translation of the “B” isoform is predicted to
start in its second exon (AY052783 annotation).
The first exon of all the trypsinogen IV variants corresponds
to exon 1 of a predicted gene on chromosome 11
currently named LOC120224 (AK098106; NM_138788).
This first untranslated exon of LOC120224 is 95% identical
over 80 nt (EST B1256410) to exon 1 of trypsinogen IV.
The LOC120224 mRNA codes for a 275-amino acid
protein that aligns well to predicted/hypothetical protein sequences
in mouse, rat, chicken, frog, and fish, suggesting
that although the gene’s function is currently unknown, it is
likely to be important. LOC120224 has six predicted transmembrane
domains, five of which have been annotated as
the “DUF 716” domain. A Psi-Blast search shows that this
protein is related to proteins involved in antiviral responses.
FIG. 2. The splice variants of PRSS1, mesotrypsinogen, and trypsinogen IV.
The PRSS1 gene is on chromosome 7, all other genes are on
chromosome 9. The proposed locations of the initiating methionine (M)
and the activation peptide sequence (+) are also shown.
Panels: PRSS1 trypsinogen; mesotrypsinogen; PRSS3 trypsinogen IV a/b; trypsinogen IV B.
Expression of the Mesotrypsinogen and
Trypsinogen IV Transcripts
Mesotrypsinogen, like cationic and anionic trypsinogen
mRNAs (PRSS1 and PRSS2), is transcribed predominantly
in the pancreas (Wiegand et al. 1993). Pancreatic
trypsinogen gene expression is thought to be regulated
by the pancreatic transcription factor PTF1, which contains
a subunit called p48, whose expression appears restricted to
the pancreas (Krapp et al. 1996). Based on ESTs sequenced
from an unnormalized HR85 normal pancreatic islet library
(Kaestner et al. 2003), mesotrypsinogen is expressed at
a markedly lower level than are the cationic and anionic
trypsinogens (mesotrypsinogen: 8 ESTs, trypsinogen IV:
0, PRSS1: 1371, PRSS2: 1331, out of a total of 69,008
available ESTs). The low relative level of mesotrypsinogen
mRNAs might not be due to transcriptional regulation because
the three genes are identical in the region hypothesized
to be a pancreatic-specific cis-regulatory element
(caggtgtgtttgt) 70–80 bases 5′ of the TATA box (Stevenson,
Hagenbuchle, and Wellauer 1986; Cockell et al. 1989).
However, other transcription factors and cis-control elements
could contribute to the expression of pancreatic trypsinogens,
and one or more of these elements could be
missing in the promoters of various trypsinogen genes.
An alternative explanation for differential expression levels
is that exon 1 of mesotrypsinogen has a nonconsensus GC
splice donor, which might render its splicing less efficient.
Mesotrypsinogen and trypsinogen IV show different tissue-distribution
patterns of expression (table 1), as would be
expected from differences in the cis-regulatory elements
5′ of the different first exons for the two transcript variants.
Like PRSS1 and PRSS2, mesotrypsinogen expression appears
to be restricted to the pancreas. In contrast, trypsinogen
IV appears to be expressed at a low level in a variety
of tissues and is not restricted to brain as earlier thought
(Wiegand et al. 1993) (table 1). Trypsinogen IV’s expression
pattern may be due to the gene-regulation signals that
affect the transcription of LOC120224 from chromosome
11. Electronic Northern data from Unigene indicate that
LOC120224 transcripts with and without exon 1 are expressed
in a variety of tissues, with an emphasis on colon
and stomach, but not in normal pancreas (table 1). The
tissue distributions of trypsinogen IV and LOC120224
ESTs containing the first exon show no overlap, but transcripts
of both genes are rare.
Table 1
Expression Patterns of PRSS3^a and LOC120224^b
Gene | Total Number of ESTs^c | ESTs from Normal Pancreas Tissue | ESTs from Cancer Pancreas Tissue | ESTs from Other Tissues
PRSS3-mesotrypsinogen | 16 | 12 | 4 (insulinoma) | None
PRSS3-trypsinogen IVa/b | 49 | 1 | 26 (ductal carcinoma MGC_110) | Hypothalamus, cerebellar cortex, brain, heart, uterus, several cancer cell lines
LOC120224 | 26 | 0 | 3 (insulinoma) | Stomach, nasopharynx, liver, cervix, lung, small intestine, unspecified tissue
a PRSS3 ESTs missing exon 1 cannot be assigned to either mesotrypsinogen or trypsinogen IV and are therefore not included in the table.
b Because LOC120224 might contain an alternative promoter (see BC016153), only ESTs containing the exon 1 similar to trypsinogen IV were included in this analysis.
c Based on the provided descriptions, most of the EST libraries were not normalized.
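The EST counts quoted in the text for the HR85 normal pancreatic islet library can be put on a common scale with a short calculation. The sketch below simply normalizes each count to ESTs per 10,000 library sequences and compares mesotrypsinogen with PRSS1. The variable and function names are illustrative, and raw EST counts from an unnormalized library are only a rough proxy for transcript abundance.

```python
# EST counts reported in the text for the HR85 pancreatic islet library
HR85_TOTAL_ESTS = 69008
HR85_COUNTS = {
    "mesotrypsinogen": 8,
    "trypsinogen IV": 0,
    "PRSS1": 1371,
    "PRSS2": 1331,
}

def per_10k(count, total):
    """Scale a raw EST count to ESTs per 10,000 sequences in the library."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 10000.0 * count / total

normalized = {gene: per_10k(n, HR85_TOTAL_ESTS) for gene, n in HR85_COUNTS.items()}

# Fold difference between the cationic trypsinogen and mesotrypsinogen
fold_prss1_over_meso = HR85_COUNTS["PRSS1"] / HR85_COUNTS["mesotrypsinogen"]

for gene, value in normalized.items():
    print(f"{gene}: {value:.2f} ESTs per 10,000")
print(f"PRSS1 / mesotrypsinogen: {fold_prss1_over_meso:.0f}-fold")
```

On these numbers mesotrypsinogen accounts for roughly 1 EST per 10,000, against about 200 per 10,000 for each of PRSS1 and PRSS2, which is the "markedly lower level" referred to in the text.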
Functional Significance of PRSS3
The proteolytic functions of the PRSS1 and PRSS2
trypsins in the digestive tract are well documented (Craik
and Halfon 1998), but the physiological role of mesotrypsin
is a matter of debate. Mesotrypsinogen comprises a minor
portion (~3%) of the total trypsinogen protein in the
pancreas (Szmola, Kukor, and Sahin-Toth 2003). Mesotrypsin is significantly more resistant to trypsin inhibitors
(Rinderknecht et al. 1984) than are the cationic and anionic
trypsins, presumably due to disruption of the inhibitor-binding site caused by the presence of arginine rather than
glycine at position 198 in the mesotrypsin amino acid sequence (Nyaruhucha, Kito, and Fukuoka 1997). One of the
rhesus monkey trypsinogens also has this substitution,
showing possible convergent evolution. Due to this amino
acid change, mesotrypsinogen can degrade soybean trypsin inhibitor (SBTI) and human pancreatic secretory inhibitor
(SPINK1) (Szmola, Kukor, and Sahin-Toth 2003). The normal physiological role for mesotrypsin is surmised to be to
digestively degrade naturally occurring trypsin inhibitors
found in food such as soybeans. However, should mesotrypsinogen become inappropriately activated in the pancreas, it could degrade the trypsin inhibitors that protect
the pancreas from damage caused by residual levels of
trypsin, thereby causing or contributing to pancreatitis
(Szmola, Kukor, and Sahin-Toth 2003).
The function of trypsinogen IV variants, if there is one,
is not yet known, but the expression data suggest a possible
significance in tissues other than pancreas. All trypsinogen
IV variants contain the activation peptide sequence of mesotrypsinogen (fig. 2) and thus have the potential to encode
functional trypsins identical to the mesotrypsin produced
in the pancreas. The intracellular transport and activation
of the trypsinogen IV variants might differ, however, from
trypsinogens expressed in pancreatic cells. The leader peptide required for secretion is encoded by the first exon of
other trypsinogens, and is therefore missing in trypsinogen
IV. Assuming that translation starts in the chromosome
11–derived exon, trypsinogen IVa/b contains four RXXR
furin cleavage–recognition sites (Molloy et al. 1992),
supporting the possibility of this alternative pathway for
secretion (Wiegand et al. 1993; Cottrell et al. 2004).
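The RXXR motif count mentioned above can be illustrated with a short Python sketch. It is only an illustration: the function name `find_rxxr_sites` and the test peptide are hypothetical, and the peptide is not the trypsinogen IV sequence. A zero-width lookahead is used so that overlapping motifs are each reported.

```python
import re

def find_rxxr_sites(protein: str) -> list[int]:
    """Return 1-based start positions of every Arg-X-X-Arg motif.

    Overlapping motifs are all reported, because the pattern is a
    zero-width lookahead and the scan advances one residue at a time.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=R..R)", protein.upper())]

# Hypothetical peptide for illustration only (NOT the trypsinogen IV sequence).
example_peptide = "MARSGRAKRLLRPGRTPRQ"
sites = find_rxxr_sites(example_peptide)
print(f"{len(sites)} candidate RXXR sites at positions {sites}")
```

A match to RXXR only marks a candidate site; whether furin actually cleaves there depends on context that a pattern search cannot capture.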
Trypsin IV is among the various proteases found to
activate protease-activated receptors (PARs) 2 and 4, and
the protease inhibitor resistance of trypsin IV has been
postulated to promote prolonged PAR-mediated signaling
in nonpancreatic cells (Cottrell et al. 2004). Trypsin IV
has also been implicated in the increased production of
glial fibrillary acidic protein and β-amyloid in the brain of
transgenic mice constructed to express trypsinogen IVa/b
in neurons (Minn et al. 1998). Additionally, levels of
PRSS1, PRSS2 (see references in Yamamoto et al. 2003),
and PRSS3 (Diederichs et al. 2004) are elevated in
various tumors, and trypsin might stimulate cancer cell
proliferation via PAR activation (Miyata et al. 2000).
Nothing is known about the expression and possible
functions of trypsinogen IV isoform B. If the translation
start of this variant is indeed within the second exon as predicted, the protein contains no furin cleavage–recognition
sites and might not be secreted.
Accretion of a novel first exon and promoter region in
the trypsinogen IV splice form appears to have expanded
PRSS3’s tissue expression beyond the pancreas, where
the trypsin protein could have physiological roles distinct
from those of the pancreatically expressed, inhibitor-resistant mesotrypsin. Competing selection pressures might arise between
these divergent functions. If both functions are selectively
advantageous, subfunctionalization could evolve to produce an inhibitor-resistant trypsin from another trypsinogen
duplicate, allowing the trypsinogen IV form of PRSS3 to
evolve independently.
Purifying Selection Acting on Trypsinogen Genes
We evaluated the selective pressures acting on 3 human, 5 rhesus monkey, and 11 mouse apparently functional
trypsinogen genes, including mesotrypsinogen. We first
looked at the dN/dS ratio calculated over the length of
the coding regions of the genes. Strong selection pressures
will result in differences in the fixation rates of nonsynonymous changes, compared to synonymous changes (Yang
and Nielsen 2000). The average dN/dS ratio was 0.26 ±
0.11. All dN/dS ratios were ≤0.67 except the comparison
of mouse try4 versus try5, for which the dN/dS ratio was
1.03. This overall signature of mild to strong purifying selection suggests three possibilities: (1) that the genes are
under purifying selection, because there is an advantage
to maintaining more than one functional trypsinogen gene,
as suggested by Roach et al. (1997), (2) that some paralogs
are now under neutral selection, but they have not yet accumulated many changes since they lost their function,
or (3) that subfunctionalization is underway with relaxation
of selection pressures on certain parts of some of the genes.
We consider below the possibility that some amino acid
sites in the duplicates are under positive selection while
adapting to new function(s).
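To make the whole-gene dN/dS comparison concrete, here is a simplified Python sketch. It is not the Yang and Nielsen (2000) method cited above. It assumes the simpler approach of applying a Jukes-Cantor correction to the proportions of nonsynonymous and synonymous differences, and all counts in the example are hypothetical.

```python
import math

def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous to synonymous distances."""
    dn = jc_distance(nonsyn_diffs / nonsyn_sites)
    ds = jc_distance(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds

# Hypothetical counts for one pairwise gene comparison (not data from this study).
ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=550, syn_diffs=40, syn_sites=190)
print(f"dN/dS = {ratio:.2f}")
```

Under this kind of measure, a ratio well below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as a possible sign of positive selection.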
Positive Selection Might be Acting on Some Amino Acid Residues
Because positive selection on a few sites would not
be reflected in the whole-gene dN/dS ratios, we used the
maximum parsimony method in the ADAPTSITE software
and the CODEML sites estimation method in the PAML
software to test for positive selection at specific sites.
ADAPTSITE finds no sites under positive selection, but
identifies 34 sites likely to be under purifying selection
(P < 0.05). CODEML models that allow some sites to
be under positive selection (Models 2a and 8) fit the data
significantly better than models that do not (Models 1a
and 7) (table 2). CODEML identifies amino acid sites 99
and 100 in our gap-free alignment (sites 101 and 102 in
conventional trypsinogen numbering) as likely to be under
positive selection. Seven different amino acids are found at
each of these two sites among the 19 genes studied, showing that the high dN/dS ratio of these sites is due to high
rates of nonsynonymous substitutions rather than locally
low rates of synonymous changes. The side chains of residues at sites 99 and 100 are solvent-exposed and lie in
loops ringing the active site (see Protein Data Bank entries
1TRN and 1H4W). Therefore, such changes are unlikely to
affect interactions within the catalytic site. However, these
sites could be involved in interactions with the propeptide,
thereby affecting autoactivation rates, or with trypsin inhibitors. Amino acid residue 99 was independently identified
as a possible determinant of specific interactions with trypsin inhibitors (Gaboriaud et al. 1996; residue 96 in their
chymotrypsinogen numbering system). We speculate that
these variants could have differential affinity to trypsin inhibitors and, either through resistance to inhibitors or the
ability to degrade them, allow a wider variety of trypsin
inhibitor-containing foods to be digested.
Table 2
Tests for Selection Pressures on Sites in Apparently Functional Human, Rhesus Macaque, and Mouse Trypsinogen Genes
Method | Models Compared in CODEML(a) | P value of Fit of Data in CODEML | ω Values Estimated in CODEML | Amino Acid Sites Likely to be Under Positive Selection
ADAPTSITE | — | — | — | None
CODEML | M2a versus M1a(b) | 0.000161 | M1a: 0.07, 1; M2a: 0.07, 1, 4.8 | 99***, 100***
CODEML | M8 versus M7(b) | 7.38 × 10^-5 | M7: 10 estimated ω's between 0 and 1; M8: 10 estimated ω's between 0 and 1, plus additional ω of 3.79 | 99***, 100***
(a) The model that fits the data better (M2a and M8, respectively) was shown in bold.
(b) The fit of the data to models that permit some sites to be under positive selection (Models 2a and 8) was compared to the fit of data to models in which ω must be ≤1 (Models 1a and 7).
*** Probability that site is under positive selection >0.99 (Bayes Empirical Bayes analysis).
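The CODEML model comparisons reported in table 2 are likelihood-ratio tests. The Python sketch below shows how such a P value relates to a pair of log-likelihoods. It assumes the test statistic is compared against a chi-square distribution with 2 degrees of freedom, which is the usual convention for M1a versus M2a and M7 versus M8 but is not stated in the text. The log-likelihood values are hypothetical. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> float:
    """P value of a likelihood-ratio test, assuming chi-square with 2 d.f."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the alternative model fits no better than the null
    return math.exp(-stat / 2.0)  # chi-square survival function for 2 d.f.

def lrt_statistic_from_p_df2(p: float) -> float:
    """Invert the 2-d.f. tail formula: the statistic implied by a P value."""
    return -2.0 * math.log(p)

# Hypothetical log-likelihoods for a null (e.g. M1a) and alternative (e.g. M2a) model.
print(f"P = {lrt_pvalue_df2(-4200.0, -4191.0):.2e}")

# Statistic implied by the P value reported for M2a versus M1a, under the 2-d.f. assumption.
print(f"2*delta(lnL) = {lrt_statistic_from_p_df2(0.000161):.2f}")
```

If a different number of degrees of freedom were used, the closed form would no longer apply and a general chi-square tail function would be needed.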
The discrepancy between the results from the
ADAPTSITE and CODEML analyses could be due to either a lack of power in ADAPTSITE to detect positively
selected sites in a data set of this size or false positives
in the CODEML results. ADAPTSITE fails to identify sites
under strong positive selection in simulation studies using
small (30 sequence) data sets (Wong et al. 2004) like ours.
However, CODEML has been shown to have high false-positive rates in some situations (Suzuki and Nei 2004;
Wong et al. 2004). This high false-positive rate has been
ameliorated in the version of CODEML used here by the
use of the Bayes Empirical Bayes method for the inference
of sites under positive selection (Yang, Wong and Nielsen
2005). Nevertheless, the sites identified should be subject to
further investigation before firm conclusions about their
function in the trypsin protein are drawn.
Conclusion
Gene duplication provides a mechanism for expanding
the functional repertoire of a family of proteins such as pancreatic trypsinogens. Upon duplication, genes have the possibility of acquiring new functions via changes in coding
sequences or regulatory regions. Our analyses of the selection forces acting upon the trypsinogen genes in human,
rhesus macaque, and mouse suggest that some members
of this gene family might be under positive selection for
amino acid substitution at some sites, and that these changes
might affect their interactions with trypsin inhibitors.
Duplicated genes may also become inactivated in order
to maintain the proper gene dosage or because they confer
no selective advantage. Indeed, some trypsinogen paralogs
in various species have become pseudogenes. In this report,
we document an unusual variation on the theme of creating
new genes by duplication, namely a duplicative transfer of
regions from chromosomes 7 and 11 to form a trypsinogen gene, PRSS3, on chromosome 9 with two different
promoters and first exons. PRSS3 encodes two different
inhibitor-resistant protein products, trypsinogen IV and
mesotrypsinogen. By virtue of its location on a different
chromosome than the other human trypsinogens, mesotrypsinogen might be less susceptible to gene-conversion events
that could reverse the amino acid substitution that confers
this resistance. It remains to be tested whether the co-opted
first exon of trypsinogen IV adds novel function to the trypsinogen gene family, by allowing expression in different
tissues such as brain and/or providing a different mechanism for protein secretion.
Supplementary Material
Supplementary data files 1 and 2 are available at
Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Supplementary Data File 1.—Nucleotide alignment of
the sequences of the 3 human, 5 rhesus macaque, and 11
mouse putatively functional trypsinogen genes used in
the analyses of natural selection.
Supplementary Data File 2.—Amino acid translation
of the nucleotide sequence alignment of the human, rhesus
macaque, and mouse trypsinogens used in the analyses of
natural selection.
Acknowledgments
This work was supported by grants from National
Institutes of Health (GM057070 and DC04209 to B.T.;
HG01791 to L.H.) and the Department of Energy. We thank
Ken Kidd and Michael Seamon for generously providing
gibbon cell line H39, Nikki Jerome, Dale Baskin, Carol
Loretz, Stephen Lasky, Ann Ramsey, and Sung Mo for assistance with mapping and sequencing, and Janet Young,
Jared Roach, and Pat Charmley for helpful discussions.
Literature Cited
Bailey, J. A., A. M. Yavor, H. F. Massa, B. J. Trask, and E. E.
Eichler. 2001. Segmental duplications: organization and
impact within the current human genome project assembly.
Genome Res. 11:1005–1017.
Charmley, P., S. Wei, and P. Concannon. 1993. Polymorphisms in
the Tcrb-V2 gene segments localize the Tcrb orphon genes to
human chromosome 9p21. Immunogenetics 38:283–286.
Chen, F. C., and W. H. Li. 2001. Genomic divergences between
humans and other hominoids and the effective population size
of the common ancestor of humans and chimpanzees. Am.
J. Hum. Genet. 68:444–456.
Chen, J. M., and C. Ferec. 2000. Gene conversion-like missense
mutations in the human cationic trypsinogen gene and insights
into the molecular evolution of the human trypsinogen family.
Mol. Genet. Metab. 71:463–469.
Cockell, M., B. J. Stevenson, M. Strubin, O. Hagenbuchle, and
P. K. Wellauer. 1989. Identification of a cell-specific DNA-binding activity that interacts with a transcriptional activator
of genes expressed in the acinar pancreas. Mol. Cell. Biol.
9:464–476.
Cottrell, G. S., S. Amadesi, E. F. Grady, and N. W. Bunnett. 2004.
Trypsin IV: a novel agonist of protease-activated receptors
2 and 4. J. Biol. Chem. 279:13532–13539.
Craik, C. S., and S. Halfon. 1998. Trypsin. Pp. 12–21 in
A. J. Barrett, N. D. Rawlings and J. F. Woessner, eds. Handbook of proteolytic enzymes. Academic Press, London.
Daza-Vamenta, R., G. Glusman, L. Rowen, B. Guthrie, and
D. Geraghty. 2004. Genetic divergence of the rhesus macaque major histocompatibility complex. Genome Res. 14:
1501–1515.
Diederichs, S., E. Bulk, B. Steffen et al. (14 co-authors). 2004.
S100 family members and trypsinogens are predictors of
distant metastasis and survival in early-stage non-small cell
lung cancer. Cancer Res. 64:5564–5569.
Gaboriaud, C., L. Serre, O. Guy-Crotte, E. Forest, and
J.-C. Fontecilla-Camps. 1996. Crystal structure of human
trypsin 1: unexpected phosphorylation of Tyr151. J. Mol.
Biol. 259:995–1010.
Hood, L., L. Rowen, and B. F. Koop. 1995. Human and mouse
T-cell receptor loci: genomics, evolution, diversity, and serendipity. Ann. NY Acad. Sci. 758:390–412.
Hurles, M. 2004. Gene duplication: the genomic trade in spare
parts. PLoS Biol. 2:E206.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein
molecules. Pp. 21–123 in H. N. Munro and J. B.
Allison, eds. Mammalian protein metabolism. Academic
Press, New York.
Kaestner, K. H., C. S. Lee, L. M. Scearce et al. (20 co-authors).
2003. Transcriptional program of the endocrine pancreas in
mice and humans. Diabetes 52:1604–1610.
Kent, W. J. 2002. BLAT—the BLAST-like alignment tool.
Genome Res. 12:656–664.
Kitamoto, Y., X. Yuan, Q. Wu, D. W. McCourt, and J. E. Sadler.
1994. Enterokinase, the initiator of intestinal digestion, is
a mosaic protease composed of a distinctive assortment of
domains. Proc. Natl. Acad. Sci. USA 91:7588–7592.
Krapp, A., M. Knofler, S. Frutiger, G. J. Hughes, O. Hagenbuchle,
and P. K. Wellauer. 1996. The p48 DNA-binding subunit of
transcription factor PTF1 is a new exocrine pancreas-specific
basic helix-loop-helix protein. EMBO J. 15:4317–4329.
Minn, A., M. Schubert, W. F. Neiss, and B. Muller-Hill. 1998.
Enhanced GFAP expression in astrocytes of transgenic mice
expressing the human brain-specific trypsinogen IV. Glia
22:338–347.
Miyata, S., N. Koshikawa, H. Yasumitsu, and K. Miyazaki. 2000.
Trypsin stimulates integrin alpha(5)beta(1)-dependent
adhesion to fibronectin and proliferation of human gastric
carcinoma cells through activation of proteinase-activated
receptor-2. J. Biol. Chem. 275:4592–4598.
Molloy, S. S., P. A. Bresnahan, S. H. Leppla, K. R. Klimpel, and
G. Thomas. 1992. Human furin is a calcium-dependent serine
endoprotease that recognizes the sequence Arg-X-X-Arg and
efficiently cleaves anthrax toxin protective antigen. J. Biol.
Chem. 267:16396–16402.
Nyaruhucha, C. N. M., M. Kito, and S.-I. Fukuoka. 1997. Identification and expression of the cDNA-encoding human mesotrypsin(ogen), an isoform of trypsin with inhibitor resistance.
J. Biol. Chem. 272:10573–10578.
Rinderknecht, H., I. G. Renner, S. B. Abramson, and C. Carmack.
1984. Mesotrypsin: a new inhibitor-resistant protease from a
zymogen in human pancreatic tissue and fluid. Gastroenterology
86:681–692.
Roach, J. C., K. Wang, L. Gan, and L. Hood. 1997. The molecular
evolution of the vertebrate trypsinogens. J. Mol. Evol.
45:640–652.
Robinson, M. A., M. P. Mitchell, S. Wei, C. E. Day, T. M. Zhao,
and P. Concannon. 1993. Organization of human T-cell receptor beta-chain genes: clusters of V beta genes are present on
chromosomes 7 and 9. Proc. Natl. Acad. Sci. USA 90:
2433–2437.
Rowen, L., B. F. Koop, and L. Hood. 1996. The complete 685-kilobase DNA sequence of the human beta T cell receptor
locus. Science 272:1755–1762.
Rowen, L., S. Lasky, and L. Hood. 1999. Deciphering genomes
through automated large-scale sequencing. Methods
Microbiol. 28:155–192.
Sawyer, S. 1989. Statistical tests for detecting gene conversion.
Mol. Biol. Evol. 6:526–538.
Scheele, G., D. Bartelt, and W. Bieger. 1981. Characterization of
human exocrine pancreatic proteins by two-dimensional isoelectric focusing/sodium dodecyl sulfate gel electrophoresis.
Gastroenterology 80:461–473.
Seboun, E., M. A. Robinson, T. J. Kindt, and S. L. Hauser. 1989.
Insertion/deletion-related polymorphisms in the human T cell
receptor beta gene complex. J. Exp. Med. 170:263–270.
Smit, A. F. A., R. Hubley, and P. Green. 1996–2004. RepeatMasker Open-3.0. http://www.repeatmasker.org.
Stevenson, B. J., O. Hagenbuchle, and P. K. Wellauer. 1986. Sequence organisation and transcriptional regulation of the mouse
elastase II and trypsin genes. Nucleic Acids Res. 14:8307–8330.
Strachan, T., and A. P. Read. 1999. Human molecular genetics 2.
2nd edition. John Wiley & Sons. New York.
Suzuki, Y., and T. Gojobori. 1999. A method for detecting
positive selection at single amino acid sites. Mol. Biol. Evol.
16:1315–1328.
Suzuki, Y., T. Gojobori, and M. Nei. 2001. ADAPTSITE:
detecting natural selection at single amino acid sites.
Bioinformatics 17:660–661.
Suzuki, Y., and M. Nei. 2004. False-positive selection identified
by ML-based methods: examples from the Sig1 gene of the
diatom Thalassiosira weissflogii and the tax gene of a human
T-cell lymphotropic virus. Mol. Biol. Evol. 21:914–921.
Swofford, D. L. 2003. PAUP*: phylogenetic analysis using
parsimony (*and other methods). Version 4. Sinauer
Associates, Sunderland, Mass.
Szmola, R., Z. Kukor, and M. Sahin-Toth. 2003. Human mesotrypsin is a unique digestive protease specialized for the degradation of trypsin inhibitors. J. Biol. Chem. 278:8580–8589.
Takezaki, N., A. Rzhetsky, and M. Nei. 1995. Phylogenetic test of
the molecular clock and linearized trees. Mol. Biol. Evol.
12:823–833.
Tatusova, T. A., and T. L. Madden. 1999. BLAST 2 sequences,
a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol. Lett. 174:247–250.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994.
CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673–4680.
Trask, B. 1999. Fluorescence in situ hybridization. Pp. 303–413 in
B. Birren, E. D. Green, P. Hieter, S. Klapholz, R. M. Myers,
H. Riethman, and J. Roskams, eds. Genome analysis: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y.
Wang, K., L. Gan, I. Lee, and L. Hood. 1995. Isolation and characterization of the chicken trypsinogen gene family. Biochem.
J. 307:471–479.
Whitcomb, D. C., M. C. Gorry, R. A. Preston et al. (15 co-authors). 1996. Hereditary pancreatitis is caused by a mutation
in the cationic trypsinogen gene. Nat. Genet. 14:141–145.
Wiegand, U., S. Corbach, A. Minn, J. Kang, and B. Muller-Hill.
1993. Cloning of the cDNA encoding human brain trypsinogen
and characterization of its product. Gene 136:167–175.
Wong, W. S. W., Z. Yang, N. Goldman, and R. Nielsen. 2004.
Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying
positively selected sites. Genetics 168:1041–1051.
Yamamoto, H., S. Iku, Y. Adachi et al. (11 co-authors). 2003.
Association of trypsin expression with tumour progression
and matrilysin expression in human colorectal cancer.
J. Pathol. 199:176–184.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary
models. Mol. Biol. Evol. 17:32–43.
Yang, Z., R. Nielsen, N. Goldman, and A. K. Pedersen. 2000.
Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.
Yang, Z., W. S. W. Wong, and R. Nielsen. 2005. Bayes Empirical
Bayes inference of amino acid sites under positive selection.
Mol. Biol. Evol. 22:1107–1118.
Yi, S., D. L. Ellsworth, and W. H. Li. 2002. Slow molecular clocks
in Old World monkeys, apes, and humans. Mol. Biol. Evol.
19:2191–2198.
John H. McDonald, Associate Editor
Accepted May 12, 2005