Survey of Conserved Alternative Splicing Events of mRNAs Encoding SR Proteins in Land Plants
Kei Iida* and Mitiko Go*
*Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Siga, Japan; and Ochanomizu University, Tokyo, Japan
The serine/arginine-rich (SR) protein family plays an important role in constitutive and alternative splicing (AS). These
proteins regulate AS in a tissue-specific and stress-responsive manner. Pre-mRNAs encoding SR proteins are themselves often alternatively spliced, and these AS events may be important for regulating AS of other pre-mRNAs. In this study, we
analyzed AS events of SR proteins in Arabidopsis thaliana and Oryza sativa (rice). We found three sets of AS events conserved between Arabidopsis and rice, in the plant-novel-SR protein, SC35-like
(SCL), and two-Zn-knuckles–type 9G8 subfamilies. Each member of these subfamilies has at least one RNA recognition
motif (RRM) and at least one intron in the RRM-encoding region. The conserved AS events occurred in these
introns and, in each case, resulted in mature mRNAs encoding proteins with incomplete RRMs. To
search for the evolutionary origin of these AS events, we analyzed SR proteins in Physcomitrella patens (moss) in addition to
those in Arabidopsis and rice. We found moss homologues of the plant-novel-SR protein, SCL, and two-Zn-knuckles–type 9G8
subfamilies in silico, and these homologues have long introns at the same positions as the conserved AS sites in Arabidopsis and rice. Such long introns are characteristic of the alternatively spliced introns of Arabidopsis SR protein
genes. The long introns found in the moss SR protein genes therefore suggest that AS events similar to those in Arabidopsis
and rice occur in moss SR protein genes. On this basis, we traced the evolutionary origin of the conserved AS events to about 400
MYA, when plants first invaded land. These events are likely important in the regulation of AS as a whole and likely
contribute to the complex transcriptome generated by AS. The complex transcriptome created by regulated AS
might have given plants tolerance to drought and temperature shifts and, with it, the ability to live on land.
Introduction
Alternative splicing (AS) is a mechanism by which multiple forms of mature mRNA are made from a single precursor mRNA (pre-mRNA). In Arabidopsis and rice (Oryza sativa),
10%–20% of all pre-mRNAs undergo AS (Kikuchi et al.
2003; Iida et al. 2004). The rates of AS in Arabidopsis
and rice are lower than those in human or mouse, in which
50% of all genes undergo AS (Kan, States, and Gish 2002;
Carninci et al. 2005). Despite the lower rate, AS is also important in plants. Several AS events play regulatory roles in
development, in specific tissues, or in response to environmental stress (Macknight et al. 2002; Shi et al. 2002;
Yoshimura et al. 2002). In our previous study, we found
large-scale changes in the relative quantities of AS transcripts (which we referred to as “AS profiles”) in some organs and in response to stress (Iida et al. 2004). In that
study, we discussed AS events in pre-mRNAs that encode
splicing factors; in particular, we regarded the serine/arginine-rich (SR) proteins as possible regulators of entire AS profiles.
In the Arabidopsis genome, there are 19 SR proteins (Kalyna
and Barta 2004). Many pre-mRNAs encoding SR proteins
undergo AS (Kalyna and Barta 2004; Wang and Brendel 2004).
If these AS events are truly important for the regulation of entire AS
profiles, we would expect them to be highly conserved across evolution.
The SR proteins of Arabidopsis are classified into
seven subfamilies: SF2/ASF, SC35, one-Zn-knuckle–type
9G8, two-Zn-knuckles–type 9G8, SCL, plant-novel-SR
protein, and SR45 (Kalyna and Barta 2004). In Arabidopsis, each subfamily contains at least two members, except for the SC35 and SR45 subfamilies, which have one member each. Although one might
expect gene duplication events in this family, no reports to
date have examined whether SR protein AS events have
a common evolutionary origin. Several
groups have reported AS events of pre-mRNAs encoding
SR proteins in other species (Gao, Gordon-Kamm, and
Lyznik 2004; Gupta et al. 2005) but did not determine
whether the AS events in those species were conserved.
In this study, we compared AS events of mRNAs encoding
SR proteins in Arabidopsis and rice and attempted to determine the evolutionary origin of these AS events.
Key words: Arabidopsis thaliana, Oryza sativa, Physcomitrella patens, transcriptome, stress response, land plant evolution.
E-mail: [email protected].
Mol. Biol. Evol. 23(5):1085–1094. 2006
doi:10.1093/molbev/msj118
Advance Access publication March 6, 2006
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
In tracing the origin of the AS events, moss (Physcomitrella patens) is an important target. Mosses and flowering plants are both land plants but are thought to have
diverged about 400 MYA (Nishiyama et al. 2003), about
twice as long ago as the divergence of Arabidopsis and rice
(145–206 MYA; Yu et al. 2002). By analyzing moss, we
expected to obtain information about the ancestral state
of AS events of SR proteins. Although the likelihood
of finding conserved AS events in moss was low, because
so few moss transcript data were available, we expected to
find AS candidates from the moss genomic sequence. In
Arabidopsis, the alternatively spliced introns of SR protein
genes are often remarkably long (>400 bp)
compared with other introns in Arabidopsis (Kalyna and Barta
2004), and most such long introns in the Arabidopsis SR protein
genes are alternatively spliced. Based on this
property, we examined the possibility of AS events in moss
genes encoding SR proteins. We compared the probable AS
events to those found in Arabidopsis and rice and traced
the evolutionary origin of these conserved AS events.
Materials and Methods
Data Set
For Arabidopsis thaliana, we used the complete genome sequence released by The Institute for Genomic
1086 Iida and Go
Research (TIGR) database (Haas et al. 2002, ver. 5.0) and
the annotated gene set. To identify AS events, we used transcript data from National Center for Biotechnology Information (NCBI) Unigene (Wheeler et al. 2003), RIKEN
full-length cDNAs (Seki et al. 2002), and Ceres Inc. full-length cDNAs (Haas et al. 2002). In the supplementary figures, sequence identifiers
beginning with “At#S,” “RAFL,” and “ceres”
denote Unigene, RIKEN full-length cDNAs, and Ceres Inc.
full-length cDNAs, respectively.
For O. sativa (rice), we used pseudomolecules of the complete genome (ver. 3.0) and the annotated gene set released
by TIGR. We used NCBI Unigene data and “kome” full-length
cDNAs (Kikuchi et al. 2003) for the rice transcripts. In the supplementary figures, sequence identifiers beginning with “Os#S”
and “kome” denote Unigene and kome full-length cDNAs, respectively.
For the P. patens (moss) genome,
we referenced PHYSCObase (Nishiyama et al. 2003). We
used Blast searches (Altschul et al. 1997) of PHYSCObase
to obtain the sequences of genomic fragments (database
date, September 6, 2005). For the moss transcripts, we used
contig sequences of full-length cDNAs on PHYSCObase and
Unigene (NCBI).
Identification of Exon-Intron Structures and AS Events
To detect AS events, we identified the exon-intron
structures of SR protein genes. For Arabidopsis and rice,
we identified exon-intron structures by mapping transcripts
to the genomes. We mapped transcripts in two steps. First,
we roughly mapped transcripts to the genomes using Blast
and determined their loci. Second, we precisely
aligned the transcripts to the locus sequences using GeneSeqer (Brendel, Xing, and Zhu 2004) and identified the
exon-intron structures of SR protein genes. We identified
AS events based on exon-intron structures (Okazaki
et al. 2002; Iida et al. 2004). For each locus, we clustered
the transcripts from the locus and determined the genomic
exon-intron structure. A nucleotide was treated as a genomic exon nucleotide if it was found in an exon of
any transcript. We then compared the exon-intron structure
of each transcript with the genomic exon-intron structure and identified AS events.
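The comparison described in this subsection can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the coordinate convention (1-based, inclusive, single strand), and the toy transcripts t1, t2, and t3 are assumptions made for illustration only. Genomic exon nucleotides are taken as the union of the exons of all transcripts at a locus, and an intron of a transcript is flagged as alternatively spliced when it overlaps any genomic exon nucleotide.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed convention: 1-based, inclusive coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union of intervals; overlapping or directly adjacent ones are merged."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns_of(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons of one transcript."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])
            if b_start - 1 >= a_end + 1]


def alternatively_spliced_introns(
        transcripts: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """For each transcript, list its introns that overlap genomic exon nucleotides."""
    # A nucleotide is a genomic exon nucleotide if any transcript has it in an exon.
    genomic_exons = merge_intervals(
        [exon for exons in transcripts.values() for exon in exons])
    events: Dict[str, List[Interval]] = {}
    for name, exons in transcripts.items():
        hits = [intron for intron in introns_of(exons)
                if any(intron[0] <= g_end and g_start <= intron[1]
                       for g_start, g_end in genomic_exons)]
        if hits:
            events[name] = hits
    return events


# Hypothetical locus: t2 skips the middle exon of t1, t3 retains the second intron of t1.
example = {
    "t1": [(1, 100), (201, 300), (401, 500)],
    "t2": [(1, 100), (401, 500)],
    "t3": [(1, 100), (201, 500)],
}
print(alternatively_spliced_introns(example))
```

In this toy locus, the second intron of t1 and the single intron of t2 are flagged, because each overlaps nucleotides that are exonic in another transcript; the intron of t3 is constitutive.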
Identification of Conserved AS Events
We surveyed for conserved AS events based on the
positions of alternatively spliced introns in multiple
alignments of amino acid sequences. We used reference
sequences for the multiple alignments. For Arabidopsis
and rice genes, we used the sequences annotated by TIGR
as the reference sequences. In loci with multiple spliced
forms, some sequences have incomplete domain organization, so we chose sequences with the characteristic domain organization of each subfamily. For each gene, we compared
the reference sequence and the transcripts from the locus and
mapped the AS events to the intron positions of the reference
sequence. We made multiple alignments of the
reference amino acid sequences of each subfamily using
ClustalW (Thompson, Higgins, and Gibson 1994), compared the positions of alternatively
spliced introns on the alignments, and defined
AS events at the same position on a multiple alignment as
“conserved AS events.” We created
phylogenetic trees to study the gene duplication events
in the evolutionary pathways leading to moss, Arabidopsis,
and rice. We made the phylogenetic trees using the maximum likelihood method of the PHYLIP software package
(http://evolution.genetics.washington.edu/phylip.html) and
displayed them using TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
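The definition of a conserved AS event as "AS events at the same position on a multiple alignment" can be sketched as follows. This is a simplified illustration rather than the procedure used in this study: it works at residue resolution and ignores intron phase, and the gene names, aligned sequences, and intron positions are hypothetical. Each alternatively spliced intron is recorded as the 0-based index of the residue it interrupts or follows in the ungapped reference sequence, that index is converted to an alignment column, and a column is reported when it carries AS introns from more than one species.

```python
from collections import defaultdict
from typing import Dict, List


def residue_to_column(aligned_seq: str) -> List[int]:
    """Map each ungapped residue index (0-based) to its alignment column."""
    return [col for col, ch in enumerate(aligned_seq) if ch != "-"]


def conserved_as_columns(alignment: Dict[str, str],
                         as_introns: Dict[str, List[int]],
                         species_of: Dict[str, str]) -> Dict[int, List[str]]:
    """Return alignment columns that hold AS introns from more than one species."""
    by_column = defaultdict(set)
    for gene, residues in as_introns.items():
        columns = residue_to_column(alignment[gene])
        for residue in residues:
            by_column[columns[residue]].add(gene)
    return {col: sorted(genes) for col, genes in by_column.items()
            if len({species_of[g] for g in genes}) > 1}


# Hypothetical gapped alignment (ClustalW-style "-" gaps) and AS intron positions.
alignment = {"AtX": "MK-LVRG", "OsY": "MKALV-G", "OsZ": "MKALVRG"}
as_introns = {"AtX": [3], "OsY": [4], "OsZ": [1]}
species_of = {"AtX": "Arabidopsis", "OsY": "rice", "OsZ": "rice"}

print(conserved_as_columns(alignment, as_introns, species_of))
```

Here the AS introns of AtX and OsY fall in the same alignment column even though their residue indices differ, because the gap in AtX shifts its coordinates; the AS intron of OsZ is not shared with the other species and is not reported.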
Assembling the Moss Genomic Sequence Fragments
The genome sequence of moss (P. patens) was not assembled as of September, 2005, so we could use only fragment sequences. We needed genomic sequences to identify
the exon-intron structures of SR protein genes and to search
for AS events in moss. We assembled several parts of the
moss genome. First, we searched for SR protein homologues from the moss transcript set using Blast. Next, we
searched for fragment sequences of loci encoding moss
SR protein genes, using Blast with the identified transcript
sequences as queries. Finally, we assembled the sequences
using the TIGR assembler (Sutton et al. 1995).
Results
Conserved AS Events in the SR Protein Family
For our analyses of SR proteins in three species, we first
examined Arabidopsis genes encoding SR proteins. SR protein studies in Arabidopsis are quite advanced (Kalyna and
Barta 2004; Wang and Brendel 2004); in the Arabidopsis
genome, there are 19 known genes encoding SR protein
family members. These members are classified into seven
subfamilies (table 1). The SF2/ASF, SC35, and one-Zn-knuckle–type 9G8 subfamilies are general splicing
factors and are essential for splicing activities (Graveley 2000).
Members of these three subfamilies are also found in the
genomes of animals (Graveley 2000), while the other four
subfamilies are plant specific. We searched for SR protein
homologues in the rice annotated gene set using a Blast
search with Arabidopsis SR protein sequences as queries. We found 24 SR protein family members in the rice
gene set (table 1); we collected mRNAs transcribed from
these loci and searched for AS events. Isshiki, Tsumoto,
and Shimamoto (2006) recently isolated and characterized
20 SR proteins from rice. Excluding Os01g21420 in the
SF2/ASF subfamily, Os03g24890 and Os11g47830 in the
SCL subfamily, and two genes in the SR45 subfamily, their
gene set is the same as ours. Fifteen of the 19 Arabidopsis SR
protein–encoding pre-mRNAs and 17 of the 24 rice SR protein–encoding pre-mRNAs underwent AS. Next, we
analyzed whether these AS events were conserved between
Arabidopsis and rice, and we found three sets of conserved AS
events. These events were in the plant-novel-SR, SCL,
and two-Zn-knuckles–type 9G8 subfamilies. In this article,
we define a “conserved AS event” as a set of AS events found
in introns at the same sites in the amino acid sequences of homologous genes. See Materials and Methods for details.
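For comparison with the genome-wide rate of 10%–20% quoted in the Introduction, the counts above can be converted to percentages. The short Python sketch below does only this arithmetic; the helper name is an illustrative assumption.

```python
def percent_with_as(genes_with_as: int, total_genes: int) -> float:
    """Percentage of genes whose pre-mRNAs undergo AS, rounded to one decimal."""
    return round(100.0 * genes_with_as / total_genes, 1)


# Counts taken from the text: 15 of 19 (Arabidopsis) and 17 of 24 (rice).
sr_counts = {"Arabidopsis thaliana": (15, 19), "Oryza sativa": (17, 24)}
for species, (with_as, total) in sr_counts.items():
    print(species, percent_with_as(with_as, total))
```

The resulting proportions, about 79% for Arabidopsis and about 71% for rice, are several times higher than the genome-wide rate.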
The Plant-Novel-SR Protein Subfamily
We found two members of the plant-novel-SR protein
subfamily in the rice genome (table 1). We aligned the
amino acid sequences of four members of Arabidopsis
Conserved AS of SR Protein mRNAs in Land Plants 1087
and two members of rice and compared the exon-intron structures of these genes. All had the same exon-intron structures
for RRM1 and RRM2 (fig. S1, Supplementary Material online). We could not obtain exact multiple alignments for the C-terminal
arginine-serine-rich (RS) domain because its amino acid
sequences are of low complexity. We found that the introns in the
RRM1-encoding regions of each member were alternatively
spliced and that these were conserved AS events (figs. 1 and
2 and table 2). In each gene, the AS events generated stop codons in the RRM1-encoding region (fig. S1, Supplementary
Material online), so the alternatively spliced forms of the mRNA
encoded proteins with a truncated RRM1. The AS events in
atRSp31-At3g61860, atRSp32-At2g46610, and atRSp40-At4g25500 were previously reported by Kalyna and Barta
(2004). We found that atRSp41-At5g52040 and the two rice
homologues also had AS events. Our phylogenetic analysis
of this subfamily yielded a tree indicating
that these AS events had the same origin (fig. 3A). We could
trace the origin of the conserved AS event to a time prior to
the divergence of Arabidopsis and rice.

Table 1
SR Proteins in Arabidopsis thaliana and Oryza sativa
Subfamily Name             Species      Number of Genes  Locus Names
SF2/ASF                    A. thaliana  4                At1g02840, At3g49430, At1g09140, At4g02430
                           O. sativa    4                Os05g30140, Os07g47630, Os03g22380, Os01g21420
Plant-novel-SR             A. thaliana  4                At4g25500, At5g52040, At3g61860, At2g46610
                           O. sativa    2                Os02g03040, Os04g02870
SC35                       A. thaliana  1                At5g64200
                           O. sativa    3                Os07g43050, Os8g37960, Os03g27030
SCL                        A. thaliana  4                At1g55310, At3g13570, At3g55460, At5g18810
                           O. sativa    6                Os03g25770, Os07g43950, Os02g15310, Os12g38430, Os03g24890, Os11g47830
One-Zn-knuckle–type 9G8    A. thaliana  3                At1g23860, At2g24590, At4g31580
                           O. sativa    3                Os06g08840, Os02g39720, Os02g54770
Two-Zn-knuckles–type 9G8   A. thaliana  2                At3g53500, At2g37340
                           O. sativa    4                Os05g07000, Os01g06290, Os03g17710, Os05g02880
SR45                       A. thaliana  1                At1g16610
                           O. sativa    2                Os01g72890, Os05g01540

FIG. 1. The domain structures of SR protein subfamilies. RRM,
RNA recognition motif; RS, arginine/serine-rich domain. Bars with
squares indicate the approximate positions where the conserved AS events
were found. In each subfamily, conserved AS events were found at introns
of RRM-coding regions. However, the positions of alternatively spliced
introns in these subfamilies were different.
SCL Subfamily
In the SCL subfamily, we found six homologues in the
rice genome (table 1). As in the plant-novel-SR protein subfamily, the exon-intron structures of
all members were the same, excluding the RS domains
(fig. S1B, Supplementary Material online). AS events of atSCL30a-At3g13570 and atSCL33-At1g55310 have been
reported previously (Kalyna and Barta 2004). Recently,
AS events of osSCL26-Os03g25770 and osSCL30b-Os12g38430 were also reported (Isshiki, Tsumoto, and
Shimamoto 2006). Our analysis determined that nearly
all members of the SCL subfamily (all six rice members
and three of the four Arabidopsis members [excluding
atSCL28-At5g18810]) had AS events (table 2). Six of these
genes (atSCL33-At1g55310, atSCL30a-At3g13570,
atSCL30-At3g55460, osSCL25-Os07g43950, osSCL30a-Os02g15310, and osSCL30b-Os12g38430) had conserved
AS events at introns in RRM-encoding regions (figs. 1 and
2 and table 2). Although various types of AS events were
present, they shared the property of generating
mRNAs that encode proteins with truncated RRMs
(table 2 and fig. S3 [Supplementary Material online]). The
position of the alternatively spliced introns was 4 aa (13 bp)
downstream of the conserved AS events of the
plant-novel-SR subfamily (fig. 2). In total, we found conserved
AS events in six of the 10 members of the SCL
subfamily. The results of our phylogenetic analysis clearly
showed that this AS event already existed at the time of the
divergence of Arabidopsis and rice (fig. 3B). On the other
hand, three genes (osSCL26-Os03g25770, Os03g24890,
and Os11g47830) had AS events in RS domain–encoding
regions (table 2). These AS events were at different positions on the multiple alignments, so we did not consider them
conserved AS events.
Two-Zn-Knuckles–Type 9G8 Subfamily
We found four members of the two-Zn-knuckles–type 9G8 subfamily in the rice genome (table 1). Several
AS events have previously been reported for this subfamily
(Kalyna, Lopato, and Barta 2003; Kalyna and Barta 2004;
Isshiki, Tsumoto, and Shimamoto 2006). We reanalyzed
AS events in this subfamily to find conserved AS events.
The exon-intron structures of all Arabidopsis and rice homologues were the same, excluding the RS domains (fig.
S1C, Supplementary Material online). We found conserved
AS events in the introns of the RRM-encoding regions of two
Arabidopsis members (atRSZ33-At2g37340 and atRSZ34-At3g53500) and three rice members (osRSZ37b-Os03g17710, osRSZ36-Os05g02880, and osRSZ37a-Os01g06290) (table 2). The alternatively spliced introns
were located 9 aa (26 bp) upstream of the conserved
AS events of the plant-novel-SR protein subfamily genes
(fig. 2). For the two Arabidopsis members and the
osRSZ36-Os05g02880 and osRSZ37b-Os03g17710 members of rice, the conserved AS events created stop codons in
the transcripts (fig. S4, Supplementary Material online).
In osRSZ37a-Os01g06290, 43 bp of exon3a and 40 bp of exon3b
were selected in a mutually exclusive manner (fig. 4A). Even if
exon3b was used, there were no stop codons or frameshifts.
We checked the RRM consensus sequences of each AS isoform using a Pfam search (Bateman et al. 2004). The mature
mRNA containing exon3b (the reference sequence) encoded an amino acid
sequence similar to the RRM consensus sequence, with an
E value of 2.6 × 10⁻⁶. The AS isoform containing
exon3a, on the other hand, encoded an amino acid sequence similar to the RRM consensus
with an E value of 1.0 × 10⁻⁴; that is, this AS isoform encoded
an RRM region with a weakened match to the
RRM consensus. For the Os05g02880 member of rice, we
found an alternative initiation event in addition to the AS
event. This alternative initiation event generated a transcript
starting within the intron region in which the conserved AS
events of this subfamily had been found (fig. 4B). Like
the conserved AS events, this alternative initiation event
generated an mRNA encoding a protein with a truncated
RRM. Phylogenetic tree analysis indicated that the conserved AS events had an evolutionary root prior to the
branching of Arabidopsis and rice (fig. 3C).

FIG. 2. A multiple alignment of amino acid sequences constructing the RRM region. Intron positions are indicated by black highlighted blocks.
Introns with two highlighted characters were phase zero, and the others were phase one or two. Subfamily classifications are indicated on the right: g1,
SCL subfamily; g2, two-Zn-knuckles–type 9G8 subfamily; and g3, plant-novel-SR protein subfamily. Conserved AS events are indicated below the
figure: i1, SCL subfamily; i2, two-Zn-knuckles–type 9G8 subfamily; and i3, plant-novel-SR protein subfamily. Asterisks (*) to the left of each gene
ID indicate genes containing the conserved AS events of each subfamily. Numbers to the right of the gene IDs indicate the amino acid positions of the
initiation sites of RRM regions in each protein. Sequences with gene IDs beginning with “At” are from Arabidopsis, while those with “Os” are from rice.

Table 2
AS Events Found in SR Protein–Coding Pre-mRNAs
Gene ID        Product Name      Site of AS  Type of AS [a]        Result of AS
Plant-novel-SR protein subfamily
 Arabidopsis thaliana
At4g25500.1    atRSp40/atRSP35   RRM1        CE or AA, AD          Stop codon
At5g52040.2    atRSp41           RRM1        CE or AA              Stop codon
At3g61860.1    atRSp31           RRM1        AA                    Stop codon
At2g46610.1    atRSp32           RRM1        AD                    Stop codon
 Oryza sativa
Os02g03040.3   osRSp33           RRM1        RI                    Stop codon
Os04g02870.2   osRSp29           RRM1        CE                    Stop codon
SCL subfamily
 Arabidopsis thaliana
At1g55310.1    atSR33/atSCL33    RRM         CE                    Stop codon
At3g13570.1    atSCL30a          RRM         CE, RI                Stop codon
At3g55460.1    atSCL30           RRM         CE                    Stop codon
At5g18810.1    atSCL28           -           -                     -
 Oryza sativa
Os03g25770.2   osSCL26           RS domain   AA                    -
Os07g43950.1   osSCL25           RRM         CE                    Stop codon
Os02g15310.1   osSCL30a          RRM         AD or RI, CE or AA    Stop codon
Os12g38430.1   osSCL30b          RRM         AD or RI, CE or AA    Stop codon
Os03g24890.2   -                 RS domain   AT                    -
Os11g47830.1   -                 RS domain   AD                    -
Two-Zn-knuckles–type 9G8 subfamily
 Arabidopsis thaliana
At2g37340.1    atRSZ33           RRM         AA, RI                Stop codon
At3g53500.2    atRSZ34           RRM         AA, RI                Stop codon
 Oryza sativa
Os03g17710.1   osRSZ37b          RRM         AA                    Stop codon
Os05g02880.3   osRSZ36           RRM         AD or AT (, AI)       Stop codon
Os01g06290.1   osRSZ37a          RRM         ME                    Weakened RRM
Os05g07000.1   osRSZ39           -           -                     -
[a] Each abbreviation is the same as in figure 3.

FIG. 3. Phylogenetic trees of SR protein subfamilies. (A), (B), and (C) represent the plant-novel-SR protein subfamily, the SCL subfamily, and the
two-Zn-knuckles–type 9G8 subfamily, respectively. Genes with names beginning with “Contig” or “Ppa” were from moss, and these genes were used as an
evolutionary outgroup. The numbers indicate the scores of bootstrap tests with 100 bootstrap replicates. We describe regions where we found AS events
and AS types to the right of each gene ID. CE, cassette-type exon; AA, alternative acceptor; AD, alternative donor; RI, retained intron; and ME, mutually
exclusive. AI indicates the alternative initiation event. Although it was not an AS event, we included this information in the tree. See the text for details.

FIG. 4. Exon-intron structures of the transcripts and reference sequences. (A) The osRSZ37a-Os01g06290 member of the rice two-Zn-knuckles–type
9G8 subfamily. Exon3a and exon3b were selected mutually exclusively. (B) The osRSZ36-Os05g02880 member of the rice two-Zn-knuckles–type
9G8 subfamily. We found an alternative initiation event in addition to the conserved AS event. The arrow indicates introns where the conserved AS events
were found.
Long Introns in Moss SR Protein Homologues
To trace the evolutionary origin of the conserved AS
events, we analyzed SR proteins in moss (P. patens). Moss
and flowering plants diverged about 400 MYA (Nishiyama
et al. 2003), and if we could determine AS events in moss,
we could then trace the evolutionary origin of the conserved
AS events to this era. In moss, we found one plant-novel-SR
protein subfamily member, three SCL subfamily members,
and one two-Zn-knuckles–type 9G8 subfamily member. The
actual number of SR protein genes is probably greater, but neither
the complete genome nor the gene set of moss was available at the time of this study. Although one member of SCL (pphn44h03)
had no introns, the others had introns at the same positions
as in the Arabidopsis and rice members (fig. 5). Although
we could not identify AS events in moss directly because of a lack of transcript data, we obtained a set of results strongly indicating
the existence of conserved AS events in moss. A previous
study of Arabidopsis SR proteins (Kalyna and Barta 2004)
determined that alternatively spliced introns
were remarkably long compared with other introns in the genome. Most of the alternatively spliced introns of the SR
protein genes were over 400 bp in length. Such long introns
are specific to alternatively spliced introns of Arabidopsis
SR protein genes. The same properties were true of the rice
SR protein genes (figs. S2–S4, Supplementary Material online). Based on these properties, we presumed the existence
of AS events in the moss SR protein genes. We analyzed
introns of SR protein genes in moss and found long introns
at the same positions as the conserved AS events found in
Arabidopsis and rice. The gene encoding the plant-novel-SR
protein in moss had an intron of over 670 bp
at the same position as that of the conserved AS event. The intron is likely longer than this because we could not
obtain complete sequence data. The two intron-containing genes of the
SCL subfamily had introns of 1,432 bp (Contig11106) and
>720 bp (Contig11624). The moss gene for two-Zn-knuckles–type 9G8 had an intron of >1,167 bp (figs. 5
and S5 [Supplementary Material online]).

FIG. 5. A comparison of the exon-intron structures of SR proteins between moss, Arabidopsis, and rice. (A), (B), and (C) represent the plant-novel-SR protein subfamily, the SCL subfamily, and the two-Zn-knuckles–type 9G8 subfamily, respectively. Excluding pphn44h03, all homologues had the
same exon-intron structures as those of Arabidopsis and rice. Arrows indicate conserved AS positions. The introns at these positions of moss were
remarkably long, suggesting AS. Contig11624 and Ppa#S17588830 were incomplete sequences.
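The length criterion used above to nominate candidate alternatively spliced introns can be sketched in a few lines of Python. The 400-bp threshold is the value quoted in the text; the function name, the coordinate convention (1-based, inclusive), and the exon coordinates in the example are hypothetical and were chosen only so that one intron is 1,432 bp long.

```python
from typing import List, Tuple

LONG_INTRON_BP = 400  # threshold quoted in the text for Arabidopsis SR protein genes


def long_introns(exons: List[Tuple[int, int]],
                 threshold: int = LONG_INTRON_BP) -> List[Tuple[int, int]]:
    """Return (intron number, length in bp) for introns longer than the threshold.

    Exons are (start, end) pairs in 1-based, inclusive coordinates on one strand.
    """
    exons = sorted(exons)
    flagged = []
    for number, ((_, prev_end), (next_start, _)) in enumerate(
            zip(exons, exons[1:]), start=1):
        length = next_start - prev_end - 1
        if length > threshold:
            flagged.append((number, length))
    return flagged


# Hypothetical gene model: intron 1 is 89 bp, intron 2 is 1,432 bp.
print(long_introns([(1, 120), (210, 300), (1733, 1800)]))
```

Only the second intron is reported. Where the genomic sequence is incomplete, as for several moss contigs here, a measured length is a lower bound, so an intron flagged by this rule remains flagged once the full sequence is known.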
Discussion
Conserved AS Events in Land Plants
We found three sets of AS events in SR protein families
that were evolutionarily conserved between
monocots and dicots. Each set was conserved within a single
subfamily: plant-novel-SR protein, SCL, or two-Zn-knuckles–type 9G8
(figs. 1 and 2 and table 2). Each of the conserved AS events
included several types of AS events, including cassette
exon type, alternative donor/acceptor type, and retained intron type. Because the alternatively spliced introns were at
the same positions across multiple alignments, the evolutionary conservation of these events was clear. The type
of AS event seemed to change along with evolutionary
divergence. Although the types of AS events varied, the conserved AS events had a common property: they
generated mRNAs encoding proteins with incomplete (truncated or weakened) RRMs. We hypothesized that, for AS events of SR proteins, the selection pressure
acted at the level of the encoded amino acid sequences.
In other words, all AS event types that generate
mRNAs encoding proteins with incomplete RRMs might
actually be caused by the same selection pressure. From
this point of view, the alternative initiation
event found in the osRSZ36-Os05g02880 member of the
two-Zn-knuckles–type 9G8 subfamily can be regarded as having the same origin as
the conserved AS events across the subfamily.

FIG. 6. A model for the regulation of the AS profile mediated by AS events in SR protein–encoding mRNAs. Transcriptional and AS regulations of
SR proteins lead to oscillating expression of the SR protein products. Transcriptomes adapted to each condition may be created by SR protein regulation.

Although all three conserved AS events were found
near the centers of RRM-coding regions, the positions of
the three events were not identical (fig. 2). We regarded
each of the three AS events as conserved within its subfamily but not across subfamilies. We emphasize that, although they seemed to be of differing origins,
all three conserved AS events generated stop codons in
RRM-encoding regions. This result suggests the importance of generating, by AS, mRNAs that encode proteins with incomplete RRMs.
Functions of Conserved AS Events
Each conserved AS event was found in an RRM-coding region. We expected that these AS events would greatly
influence SR protein function because the RRMs are essential for SR protein function (Chandler et al. 1997). The proteins with truncated RRMs that were generated by the AS
events should thus be nonfunctional. Another possibility is
that mRNAs with abnormal stop codons might be targeted
by nonsense-mediated mRNA decay (Lewis, Green, and
Brenner 2003). In either case, the function of the SR proteins might be decreased. For mouse SRp20 and human
SC35, ‘‘autoregulating’’ AS mechanisms have been
reported (Jumaa and Nielsen 1997; Sureau et al. 2001).
In Arabidopsis, autoregulation of atSRp30 and atRSZ33
has also been reported (Lopato et al. 1999; Kalyna, Lopato,
and Barta 2003). In these cases, pre-mRNA splicing of
an SR protein itself was influenced by the quantity of its
own protein products. At the same time, pre-mRNA splicing controlled the quantity of its own protein products. We
propose that a similar regulation mechanism could occur in
the SR protein families of Arabidopsis and rice. They are
not necessarily cases of self-regulation in Arabidopsis or
rice, however, because many SR protein genes were the result of gene duplications. Duplicated SR proteins would
create complicated regulation pathways. Regardless of
self-regulation or non–self-regulation, we assume that the
conserved AS events are important in controlling the
amount of functional SR protein products.
In our previous study, we reported large-scale changes
of AS profiles depending on the organ and on environmental stress (Iida et al. 2004). In that report, we
observed induced expression and AS events of mRNA
encoding SR proteins under similar conditions. Based on
these results, we assumed that regulating the pre- and posttranscriptional levels of SR proteins was critical in controlling whole AS profiles. Our hypothesis of transcriptome
regulation mediated by AS of SR protein mRNAs is shown
in figure 6. The expression levels of SR protein products are
regulated by both transcriptional and AS control. Inducing
the expression of pre-mRNA of SR proteins and AS events
that created mature mRNAs encoding truncated proteins induced oscillating expression of the SR protein product.
Entire AS profiles are influenced by SR proteins, and
transcriptomes are constructed to adapt to each condition.
Other elements, such as protein degradation, phosphorylation events, and localizations are certainly also important
for the regulation of AS. However, transcript-level regulation mediated by conserved AS events must form a critical
system of regulation as the events are highly conserved in
evolution.
We note that all subfamilies containing the conserved
AS events (plant-novel-SR, SCL, and two-Zn-knuckles–
type 9G8) were plant specific. These subfamilies are important in regulating AS (Lopato et al. 1999; Kalyna, Lopato,
and Barta 2003). The fact that all the conserved AS events
were found in plant-specific SR protein subfamilies supports our previous hypothesis that the conserved AS events
of SR protein pre-mRNAs are critical events in regulating AS.
The Origin of Conserved AS Events and
Land Plant Evolution
Each conserved AS event originated prior to the
branching between Arabidopsis and rice, a fact supported
by the results of our phylogenetic tree analysis (fig. 3) and
the widespread AS events in each subfamily. Arabidopsis
thaliana and O. sativa (a dicot and a monocot, respectively)
diverged 145–206 MYA (Yu et al. 2002), and we traced
the conserved AS events back to this era. We obtained
results indicating the possibility of an even more ancient
origin for these conserved AS events. Long introns found
in moss SR protein homologues might be alternatively
spliced, and the origin of the conserved AS events could
be as ancient as 400 MYA, when moss and flowering
plants diverged and ancestral plants first invaded land.
At that time, the ancestors of land plants were exposed to
drought conditions and drastic temperature changes. Land
plants acquired life cycles with various developmental
stages, as well as various tissues and organs. Such complicated
life cycles, developmental stages, tissues, and organs required a more complicated transcriptome.
We speculate that the AS events found in the SR proteins
greatly contributed to obtaining a transcriptome that was
adapted for each new requirement. A more complicated
transcriptome might well have allowed plants to live on
land.
Supplementary Material
Supplementary figures S1–S5 and multiple alignments
of transcript sequences for each locus are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
This work was supported by Grants-in-Aid for Scientific Research (C) and for Priority Area "Genome Information Science" to M.G. from the Ministry of Education,
Culture, Sports, Science, and Technology of Japan.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang,
W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Bateman, A., L. Coin, R. Durbin et al. (13 co-authors). 2004. The Pfam
protein families database. Nucleic Acids Res. 32:D138–D141.
Brendel, V., L. Xing, and W. Zhu. 2004. Gene structure prediction
from consensus spliced alignment of multiple ESTs matching
the same genomic locus. Bioinformatics 20:1157–1169.
Carninci, P., T. Kasukawa, S. Katayama et al. (194 co-authors).
2005. The transcriptional landscape of the mammalian
genome. Science 309:1559–1563.
Chandler, S. D., A. Mayeda, J. M. Yeakley, A. R. Krainer, and X.
D. Fu. 1997. RNA splicing specificity determined by the coordinated action of RNA recognition motifs in SR proteins.
Proc. Natl. Acad. Sci. USA 94:3596–3601.
Gao, H., W. J. Gordon-Kamm, and L. A. Lyznik. 2004. ASF/SF2-like maize pre-mRNA splicing factors affect splice site utilization and their transcripts are alternatively spliced. Gene
339:25–37.
Graveley, B. R. 2000. Sorting out the complexity of SR protein
functions. RNA 6:1197–1211.
Gupta, S., B. B. Wang, G. A. Stryker, M. E. Zanetti, and S. K. Lal.
2005. Two novel arginine/serine (SR) proteins in maize are
differentially spliced and utilize non-canonical splice sites.
Biochim. Biophys. Acta 1728:105–114.
Haas, B. J., N. Volfovsky, C. D. Town, M. Troukhan, N. Alexandrov, K. A. Feldmann, R. B. Flavell, O. White, and S. L.
Salzberg. 2002. Full-length messenger RNA sequences
greatly improve genome annotation. Genome Biol. 3:RESEARCH0029.
Iida, K., M. Seki, T. Sakurai, M. Satou, K. Akiyama, T. Toyoda,
A. Konagaya, and K. Shinozaki. 2004. Genome-wide analysis
of alternative pre-mRNA splicing in Arabidopsis thaliana
based on full-length cDNA sequences. Nucleic Acids Res.
32:5096–5103.
Isshiki, M., A. Tsumoto, and K. Shimamoto. 2006. The serine/
arginine-rich protein family in rice plays important roles in
constitutive and alternative splicing of pre-mRNA. Plant Cell
18:146–158.
Jumaa, H., and P. J. Nielsen. 1997. The splicing factor SRp20
modifies splicing of its own mRNA and ASF/SF2 antagonizes
this regulation. EMBO J. 16:5077–5085.
Kalyna, M., and A. Barta. 2004. A plethora of plant serine/arginine-rich proteins: redundancy or evolution of novel gene
functions? Biochem. Soc. Trans. 32:561–564.
Kalyna, M., S. Lopato, and A. Barta. 2003. Ectopic expression of
atRSZ33 reveals its function in splicing and causes pleiotropic
changes in development. Mol. Biol. Cell 14:3565–3577.
Kan, Z., D. States, and W. Gish. 2002. Selecting for functional
alternative splices in ESTs. Genome Res. 12:1837–1845.
Kikuchi, S., K. Satoh, T. Nagata et al. (74 co-authors). 2003. Collection, mapping, and annotation of over 28,000 cDNA clones
from japonica rice. Science 301:376–379.
Lewis, B. P., B. E. Green, and S. E. Brenner. 2003. Evidence for
the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc. Natl. Acad. Sci. USA
100:189–192.
Lopato, S., M. Kalyna, S. Dorner, R. Kobayashi, A. R. Krainer,
and A. Barta. 1999. atSRp30, one of two SF2/ASF-like proteins from Arabidopsis thaliana, regulates splicing of specific
plant genes. Genes Dev. 13:987–1001.
Macknight, R., M. Duroux, R. Laurie, P. Dijkwel, G. Simpson,
and C. Dean. 2002. Functional significance of the alternative
transcript processing of the Arabidopsis floral promoter FCA.
Plant Cell 14:877–888.
Nishiyama, T., T. Fujita, T. Shin-I et al. (12 co-authors). 2003.
Comparative genomics of Physcomitrella patens gametophytic
transcriptome and Arabidopsis thaliana: implication for land
plant evolution. Proc. Natl. Acad. Sci. USA 100:8007–8012.
Okazaki, Y., M. Furuno, T. Kasukawa et al. (137 co-authors).
2002. Analysis of the mouse transcriptome based on functional
annotation of 60,770 full-length cDNAs. Nature 420:563–573.
Seki, M., M. Narusaka, A. Kamiya et al. (20 co-authors). 2002.
Functional annotation of a full-length Arabidopsis cDNA
collection. Science 296:141–145.
Shi, H., L. Xiong, B. Stevenson, T. Lu, and J. K. Zhu. 2002. The
Arabidopsis salt overly sensitive 4 mutants uncover a critical role for vitamin B6 in plant salt tolerance. Plant Cell
14:575–588.
Sureau, A., R. Gattoni, Y. Dooghe, J. Stevenin, and J. Soret. 2001.
SC35 autoregulates its expression by promoting splicing
events that destabilize its mRNAs. EMBO J. 20:1785–1796.
Sutton, G., O. White, D. Adams, and A. Kerlavage. 1995. TIGR
assembler: a new tool for assembling large shotgun sequencing
projects. Genome Sci. Technol. 1:9–18.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids
Res. 22:4673–4680.
Wang, B. B., and V. Brendel. 2004. The ASRG database: identification and survey of Arabidopsis thaliana genes involved
in pre-mRNA splicing. Genome Biol. 5:R102.
Wheeler, D. L., D. M. Church, S. Federhen et al. (11 co-authors).
2003. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31:28–33.
Yoshimura, K., Y. Yabuta, T. Ishikawa, and S. Shigeoka. 2002.
Identification of a cis element for tissue-specific alternative
splicing of chloroplast ascorbate peroxidase pre-mRNA in
higher plants. J. Biol. Chem. 277:40623–40632.
Yu, J., S. Hu, J. Wang et al. (100 co-authors). 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92.
Takashi Gojobori, Associate Editor
Accepted March 1, 2006