Survey of Conserved Alternative Splicing Events of mRNAs Encoding SR Proteins in Land Plants

Kei Iida* and Mitiko Go*
*Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Siga, Japan; and Ochanomizu University, Tokyo, Japan

The serine/arginine-rich (SR) protein family plays an important role in constitutive and alternative splicing (AS). These proteins regulate AS in a tissue-specific and stress-responsive manner. Pre-mRNAs encoding SR proteins are often alternatively spliced, and these AS events may be important for the regulation of AS events of other pre-mRNAs. In this study, we analyzed AS events of SR proteins in Arabidopsis thaliana and Oryza sativa (rice). We found three sets of AS events conserved between Arabidopsis and rice. These conserved AS events were found in the plant-novel-SR protein, SC35-like (SCL), and two-Zn-knuckles–type 9G8 subfamilies. Each member of these subfamilies has at least one RNA recognition motif (RRM) and at least one intron in the RRM-encoded region. We found that the conserved AS events occurred in these introns and, in each case, the conserved AS events resulted in mature mRNAs encoding proteins with incomplete RRMs. To search for the evolutionary origin of these AS events, we analyzed SR proteins in Physcomitrella patens (moss) in addition to those in Arabidopsis and rice. We found moss homologues of the plant-novel-SR protein, SCL, and the two-Zn-knuckles–type 9G8 subfamilies in silico, and these homologues have long introns at the same location of the conserved AS sites in Arabidopsis and rice. Such long introns are quite specific for alternatively spliced introns concerning the Arabidopsis SR protein genes. The long introns found in the moss SR protein genes strongly suggested that conserved AS events in moss SR protein genes might be similar to those in Arabidopsis and rice. We traced the evolutionary origin of the conserved AS events to 400 MYA, when plants first invaded land.
These events are likely important in the regulation of whole AS events and likely contribute to the complicated transcriptome produced by AS. The complicated transcriptome created by regulated AS events might have provided plants tolerance against droughts or temperature shifts and given them the ability to live on land.

Introduction

Alternative splicing (AS) is a mechanism by which multiple forms of mature mRNA are made from a single precursor mRNA (pre-mRNA). In Arabidopsis and rice (Oryza sativa), 10%–20% of all pre-mRNAs undergo AS (Kikuchi et al. 2003; Iida et al. 2004). The rates of AS in Arabidopsis and rice are lower than those in human or mouse, in which 50% of all genes undergo AS (Kan, States, and Gish 2002; Carninci et al. 2005). Despite the lower rate, AS is also important in plants. Several AS events play regulatory roles in development, in specific tissues, or in response to environmental stress (Macknight et al. 2002; Shi et al. 2002; Yoshimura et al. 2002).

In our previous study, we found large-scale changes in the relative quantities of AS transcripts (which we referred to as "AS profiles") in some organs and in responses to stress (Iida et al. 2004). In that study, we discussed AS events in pre-mRNAs that encode splicing factors and, in particular, regarded the serine/arginine-rich (SR) proteins as possible regulators of entire AS profiles. In the Arabidopsis genome, there are 19 SR proteins (Kalyna and Barta 2004). Many pre-mRNAs encoding SR proteins undergo AS (Kalyna and Barta 2004; Wang and Brendel 2004). If they are truly important for the regulation of entire AS profiles, we would expect the AS events of pre-mRNAs that encode SR proteins to be highly conserved across evolution.

The SR proteins of Arabidopsis are classified into seven subfamilies: SF2/ASF, SC35, one-Zn-knuckle–type 9G8, two-Zn-knuckles–type 9G8, SCL, plant-novel-SR protein, and SR45 (Kalyna and Barta 2004).
For Arabidopsis, each subfamily contains at least two members, except for the SC35 and SR45 subfamilies. Although one might expect gene duplication events in this family, no reports to date have examined whether SR protein AS events have a common evolutionary origin. On the other hand, several groups have reported AS events of pre-mRNAs encoding SR proteins in other species (Gao, Gordon-Kamm, and Lyznik 2004; Gupta et al. 2005) but did not determine whether the AS events in those species were conserved.

Key words: Arabidopsis thaliana, Oryza sativa, Physcomitrella patens, transcriptome, stress response, land plant evolution.
E-mail: [email protected].
Mol. Biol. Evol. 23(5):1085–1094. 2006. doi:10.1093/molbev/msj118. Advance Access publication March 6, 2006.
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

In this study, we compared AS events of mRNAs encoding SR proteins in Arabidopsis and rice and attempted to determine the evolutionary origin of these AS events. In tracing the origin of the AS events, moss (Physcomitrella patens) is an important target. Mosses and flowering plants are both land plants but are thought to have diverged about 400 MYA (Nishiyama et al. 2003), roughly twice as long ago as the divergence of Arabidopsis and rice (145–206 MYA; Yu et al. 2002). By analyzing moss, we expected to obtain information about the ancestral state of AS events of SR proteins. Although the likelihood of finding conserved AS events in moss was low, because so little moss transcript data were available, we expected to find AS candidates from the moss genomic sequence. In Arabidopsis, the alternatively spliced introns of the genes of SR proteins are often remarkably long (>400 bp) when compared to other introns in Arabidopsis (Kalyna and Barta 2004). Most such introns in the Arabidopsis SR protein genes are alternatively spliced.
Based on this remarkable property, we studied the possibility of AS events in moss genes encoding SR proteins. We compared probable AS events to those found in Arabidopsis and rice and traced the evolutionary origin of these conserved AS events.

Materials and Methods

Data Set

For Arabidopsis thaliana, we used the complete genome sequence released in The Institute for Genomic Research (TIGR) database (Haas et al. 2002, ver. 5.0) and the annotated gene set. To identify AS events, we used transcript data from National Center for Biotechnology Information (NCBI) Unigene (Wheeler et al. 2003), RIKEN full-length cDNAs (Seki et al. 2002), and Ceres Inc. full-length cDNAs (Haas et al. 2002). In the supplementary figures, sequence identifiers begin with the letters "At#S," "RAFL," and "ceres" for Unigene, RIKEN full-length cDNAs, and Ceres Inc. full-length cDNAs, respectively.

For O. sativa (rice), we used pseudomolecules of the complete genome (ver. 3.0) and the annotated gene set released by TIGR. We used NCBI Unigene data and "kome" full cDNAs (Kikuchi et al. 2003) for the rice transcripts. In the supplementary figures, sequence identifiers begin with the letters "Os#S" and "kome" for Unigene and kome full cDNAs, respectively.

For the P. patens (moss) genome, we used PHYSCObase (Nishiyama et al. 2003). We used Blast searches (Altschul et al. 1997) of PHYSCObase to obtain the sequences of genomic fragments (database date, September 6, 2005). For the moss transcripts, we used contig sequences of full cDNAs on PHYSCObase and Unigene (NCBI).

Identification of Exon-Intron Structures and AS Events

To detect AS events, we identified the exon-intron structures of SR protein genes. For Arabidopsis and rice, we identified exon-intron structures by mapping transcripts to the genomes. We mapped transcripts in two steps. First, we roughly mapped transcripts to the genomes using Blast and determined their loci.
In the next step, we precisely aligned the transcripts to the loci sequences using GeneSeqer (Brendel, Xing, and Zhu 2004) and identified the exon-intron structures of SR protein genes. We identified AS events based on exon-intron structures (Okazaki et al. 2002; Iida et al. 2004). For each locus, we clustered transcripts from the locus and determined the genomic exon-intron structures. Nucleotides were treated as genomic exon nucleotides if they were found in an exon of any transcript. We compared the exon-intron structures of each transcript and genome and identified AS events.

Identification of Conserved AS Events

We surveyed for conserved AS events based on the positions of alternatively spliced introns in multiple alignments of amino acid sequences. We used reference sequences for the multiple alignments. For Arabidopsis and rice genes, we used the sequences annotated by TIGR as the reference sequences. In loci with multiple spliced forms, some sequences have incomplete domain organization. We chose sequences with the characteristic domain organization of each subfamily. For each gene, we compared the reference sequence and transcripts from the locus and mapped the AS events to the intron positions of the reference sequence. We compared the positions of alternatively spliced introns between multiple alignments and defined AS events at the same position on multiple alignments as "conserved AS events." We made multiple alignments of reference amino acid sequences of each subfamily using ClustalW (Thompson, Higgins, and Gibson 1994) and determined the conserved AS events from the alignments.

We created phylogenetic trees to study the gene duplication events in the evolutionary pathways leading to moss, Arabidopsis, and rice. We made phylogenetic trees using the maximum likelihood method of the PHYLIP software package (http://evolution.genetics.washington.edu/phylip.html).
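The definition of a conserved AS event given in this section (alternatively spliced introns that fall at the same position of a multiple alignment) can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, toy sequences, the convention of recording an intron by the index of the residue at which it falls, and the rule that two or more genes must share a column are all assumptions made for illustration.

```python
def residue_to_column(aligned_seq, residue_index):
    """Return the 0-based alignment column of the residue_index-th
    (0-based) ungapped residue of a gapped, aligned sequence."""
    seen = -1
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            seen += 1
            if seen == residue_index:
                return col
    raise ValueError("residue index beyond end of sequence")


def conserved_as_columns(alignment, as_introns):
    """Group alternatively spliced introns by alignment column.

    alignment  : dict gene -> gapped amino acid sequence (equal lengths)
    as_introns : dict gene -> list of residue indices at which an
                 alternatively spliced intron falls (hypothetical convention)
    Returns dict column -> sorted gene names, keeping only columns that
    are shared by at least two genes.
    """
    hits = {}
    for gene, positions in as_introns.items():
        for pos in positions:
            col = residue_to_column(alignment[gene], pos)
            hits.setdefault(col, set()).add(gene)
    return {col: sorted(genes) for col, genes in hits.items() if len(genes) >= 2}


# Toy alignment (hypothetical sequences, not taken from the paper).
alignment = {
    "geneA": "MK-TLYVGNL",
    "geneB": "MKSTLYVGNL",
    "geneC": "M--TLFVGNL",
}
as_introns = {"geneA": [4], "geneB": [5], "geneC": [3, 7]}

conserved = conserved_as_columns(alignment, as_introns)
print(conserved)
```

In this toy input the three genes have different residue indices for their alternatively spliced intron, but all map to the same alignment column once gaps are taken into account, which is why positions are compared on the alignment rather than on the raw sequences.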
The phylogenetic trees are displayed using TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).

Assembling the Moss Genomic Sequence Fragments

The genome sequence of moss (P. patens) had not been assembled as of September 2005, so we could use only fragment sequences. We needed genomic sequences to identify the exon-intron structures of SR protein genes and to search for AS events in moss. We assembled several parts of the moss genome. First, we searched for SR protein homologues from the moss transcript set using Blast. Next, we searched for fragment sequences of loci encoding moss SR protein genes, using Blast with the identified transcript sequences as queries. Finally, we assembled the sequences using the TIGR assembler (Sutton et al. 1995).

Results

Conserved AS Events in the SR Protein Family

For our analyses of SR proteins in three species, we first examined Arabidopsis genes encoding SR proteins. SR protein studies in Arabidopsis are quite advanced (Kalyna and Barta 2004; Wang and Brendel 2004); in the Arabidopsis genome, there are 19 known genes encoding SR protein family members. These members are classified into seven subfamilies (table 1). The SF2/ASF, SC35, and one-Zn-knuckle–type 9G8 subfamilies are general splicing factors and are essential for splicing activities (Graveley 2000). Members of these three subfamilies are also found in the genomes of animals (Graveley 2000), while the other four subfamilies are plant specific.

We searched for SR protein homologues from the rice annotated gene set using a Blast search and querying with Arabidopsis SR protein sequences. We found 24 SR protein family members in the rice gene set (table 1); we collected mRNAs transcribed from these loci and searched for AS events. Isshiki, Tsumoto, and Shimamoto (2006) recently isolated and characterized 20 SR proteins from rice.
Excluding Os01g21420 in the SF2/ASF subfamily, Os03g24890 and Os11g47830 in the SCL subfamily, and two genes in the SR45 subfamily, their gene set is the same as ours.

Fifteen of the 19 Arabidopsis SR protein–encoding pre-mRNAs and 17 of the 24 rice SR protein–encoding pre-mRNAs were subjected to AS. Next, we analyzed whether these AS events were conserved between Arabidopsis and rice, and we found three conserved AS events. These events were in the plant-novel-SR, SCL, and two-Zn-knuckles–type 9G8 subfamilies. In this article, we defined a "conserved AS event" as a set of AS events found at introns at the same sites of the amino acid sequences of homologous genes. See Materials and Methods for details.

Table 1
SR Proteins in Arabidopsis thaliana and Oryza sativa
Subfamily Name | Species | Number of Genes | Locus Names
SF2/ASF | A. thaliana | 4 | At1g02840, At3g49430, At1g09140, At4g02430
SF2/ASF | O. sativa | 4 | Os05g30140, Os07g47630, Os03g22380, Os01g21420
Plant-novel-SR | A. thaliana | 4 | At4g25500, At5g52040, At3g61860, At2g46610
Plant-novel-SR | O. sativa | 2 | Os02g03040, Os04g02870
SC35 | A. thaliana | 1 | At5g64200
SC35 | O. sativa | 3 | Os07g43050, Os8g37960, Os03g27030
SCL | A. thaliana | 4 | At1g55310, At3g13570, At3g55460, At5g18810
SCL | O. sativa | 6 | Os03g25770, Os07g43950, Os02g15310, Os12g38430, Os03g24890, Os11g47830
One-Zn-knuckle–type 9G8 | A. thaliana | 3 | At1g23860, At2g24590, At4g31580
One-Zn-knuckle–type 9G8 | O. sativa | 3 | Os06g08840, Os02g39720, Os02g54770
Two-Zn-knuckles–type 9G8 | A. thaliana | 2 | At3g53500, At2g37340
Two-Zn-knuckles–type 9G8 | O. sativa | 4 | Os05g07000, Os01g06290, Os03g17710, Os05g02880
SR45 | A. thaliana | 1 | At1g16610
SR45 | O. sativa | 2 | Os01g72890, Os05g01540

FIG. 1. The domain structures of SR protein subfamilies. RRM, RNA recognition motif; RS, arginine/serine-rich domain. Bars with squares indicate the approximate positions where the conserved AS events were found. In each subfamily, conserved AS events were found at introns of RRM-coding regions. However, the positions of alternatively spliced introns in these subfamilies were different.

The Plant-Novel-SR Protein Subfamily

We found two members of the plant-novel-SR protein subfamily in the rice genome (table 1). We aligned the amino acid sequences of four members of Arabidopsis and two members of rice and compared the exon-intron structures of these genes. All had the same exon-intron structures for RRM1 and RRM2 (fig. S1, Supplementary Material online). We could not obtain exact multiple alignments for the C-terminal arginine-serine-rich (RS) domain due to amino acid sequences of low complexity. We found that introns of the RRM1-encoding regions of each member were alternatively spliced and that they were conserved AS events (figs. 1 and 2 and table 2). In each gene, AS events generated stop codons in RRM1-encoding regions (fig. S1, Supplementary Material online). The alternatively spliced forms of the mRNA encoded proteins with a truncated RRM1. The AS events in atRSp31-At3g61860, atRSp32-At2g46610, and atRSp40-At4g25500 were previously reported by Kalyna and Barta (2004). We found that atRSp41-At5g52040 and two rice homologues also had AS events. Our phylogenetic analysis of this subfamily resulted in a phylogenetic tree indicating that these AS events had the same origin (fig. 3A). We could trace the origin of the conserved AS event to a time prior to the divergence of Arabidopsis and rice.

SCL Subfamily

In the SCL subfamily, we found six homologues in the rice genome (table 1). Similar to our findings for the plant-novel-SR protein subfamily, the exon-intron structures of all members were the same, excluding the RS domains (fig. S1B, Supplementary Material online). AS events of atSCL30a-At3g13570 and atSCL33-At1g55310 have been reported previously (Kalyna and Barta 2004). Recently, AS events of osSCL26-Os03g25770 and osSCL30b-Os12g38430 were also reported (Isshiki, Tsumoto, and Shimamoto 2006).
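The recurring observation in these subfamilies, that an AS event in an RRM-encoding intron introduces a stop codon and so yields a truncated RRM, can be illustrated with a short sketch. The sequences below are invented toy exons and a toy intron, not SR protein gene sequences, and a retained intron is used only as one example of the AS types reported; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop codon
    in a coding sequence that starts at its ATG, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i // 3
    return None


# Toy gene model (hypothetical): two exons separated by one GT...AG intron.
exon1 = "ATGGCTAAAGGT"    # codons: ATG GCT AAA GGT
intron = "GTATGATTTCAG"   # contains an in-frame TGA when retained
exon2 = "GATCGTCTGTAA"    # codons: GAT CGT CTG TAA (normal stop)

spliced = exon1 + exon2            # intron removed (reference form)
retained = exon1 + intron + exon2  # intron retained (AS form)

print("reference stop at codon", first_stop_codon(spliced))
print("retained-intron stop at codon", first_stop_codon(retained))
```

In the toy reference form the first stop is the normal terminator after seven sense codons, whereas retaining the intron brings a stop into frame after five, so the encoded protein is cut short within the region that the second exon would have encoded.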
Our analysis determined that nearly all members of the SCL subfamily (all six rice members and three of four members of Arabidopsis [excluding atSCL28-At5g18810]) had AS events (table 2). Six of these genes (atSCL33-At1g55310, atSCL30a-At3g13570, atSCL30-At3g55460, osSCL25-Os07g43950, osSCL30a-Os02g15310, and osSCL30b-Os12g38430) had conserved AS events at introns in RRM-encoding regions (figs. 1 and 2 and table 2). Although various types of AS events were present, they shared the property of generating mRNAs encoding proteins with truncated RRMs (table 2 and fig. S3 [Supplementary Material online]). The position of the alternatively spliced introns was 4 aa (13 bp) downstream relative to the conserved AS events of the plant-novel-SR subfamily (fig. 2). We found conserved AS events in six of the 10 members of the SCL subfamily. The results of our phylogenetic analysis clearly showed the existence of this AS event at the time of the divergence of Arabidopsis and rice (fig. 3B). On the other hand, three genes (osSCL26-Os03g25770, Os03g24890, and Os11g47830) had AS events in RS domain-encoding regions (table 2). These AS events were at different positions on multiple alignments, so we did not consider them as conserved AS events.

FIG. 2. A multiple alignment of amino acid sequences constructing the RRM region. Intron positions are indicated by black highlighted blocks. Introns with two highlighted characters were phase zero, and the others were phase one or two. Subfamily classifications are indicated on the right: g1, SCL subfamily; g2, two-Zn-knuckles–type 9G8 subfamily; and g3, plant-novel-SR protein subfamily. Conserved AS events are indicated below the figure: i1, SCL subfamily; i2, two-Zn-knuckles–type 9G8 subfamily; and i3, plant-novel-SR protein subfamily. Asterisks (*) to the left of each gene ID indicate genes containing the conserved AS events of each subfamily. Numbers to the right of the gene IDs indicate the amino acid positions of the initiation sites of RRM regions in each protein. Sequences with gene IDs beginning with "At" are from Arabidopsis, while those with "Os" are from rice.

Table 2
AS Events Found in SR Protein–Coding Pre-mRNAs
Subfamily | Species | Gene ID | Product Name | Site of AS | Type of AS (a) | Result of AS
Plant-novel-SR | Arabidopsis thaliana | At4g25500.1 | atRSp40/atRSP35 | RRM1 | CE or AA, AD | Stop codon
Plant-novel-SR | Arabidopsis thaliana | At5g52040.2 | atRSp41 | RRM1 | CE or AA | Stop codon
Plant-novel-SR | Arabidopsis thaliana | At3g61860.1 | atRSp31 | RRM1 | AA | Stop codon
Plant-novel-SR | Arabidopsis thaliana | At2g46610.1 | atRSp32 | RRM1 | AD | Stop codon
Plant-novel-SR | Oryza sativa | Os02g03040.3 | osRSp33 | RRM1 | RI | Stop codon
Plant-novel-SR | Oryza sativa | Os04g02870.2 | osRSp29 | RRM1 | CE | Stop codon
SCL | Arabidopsis thaliana | At1g55310.1 | atSR33/atSCL33 | RRM | CE | Stop codon
SCL | Arabidopsis thaliana | At3g13570.1 | atSCL30a | RRM | CE, RI | Stop codon
SCL | Arabidopsis thaliana | At3g55460.1 | atSCL30 | RRM | CE | Stop codon
SCL | Arabidopsis thaliana | At5g18810.1 | atSCL28 | - | - | -
SCL | Oryza sativa | Os03g25770.2 | osSCL26 | RS domain | AA | -
SCL | Oryza sativa | Os07g43950.1 | osSCL25 | RRM | CE | Stop codon
SCL | Oryza sativa | Os02g15310.1 | osSCL30a | RRM | AD or RI, CE or AA | Stop codon
SCL | Oryza sativa | Os12g38430.1 | osSCL30b | RRM | AD or RI, CE or AA | Stop codon
SCL | Oryza sativa | Os03g24890.2 | - | RS domain | AT | -
SCL | Oryza sativa | Os11g47830.1 | - | RS domain | AD | -
Two-Zn-knuckles–type 9G8 | Arabidopsis thaliana | At2g37340.1 | atRSZ33 | RRM | AA, RI | Stop codon
Two-Zn-knuckles–type 9G8 | Arabidopsis thaliana | At3g53500.2 | atRSZ34 | RRM | AA, RI | Stop codon
Two-Zn-knuckles–type 9G8 | Oryza sativa | Os03g17710.1 | osRSZ37b | RRM | AA | Stop codon
Two-Zn-knuckles–type 9G8 | Oryza sativa | Os05g02880.3 | osRSZ36 | RRM | AD or AT (, AI) | Stop codon
Two-Zn-knuckles–type 9G8 | Oryza sativa | Os01g06290.1 | osRSZ37a | RRM | ME | Weakened RRM
Two-Zn-knuckles–type 9G8 | Oryza sativa | Os05g07000.1 | osRSZ39 | - | - | -
(a) Each abbreviation is the same as in figure 3.

FIG. 3. Phylogenetic trees of SR protein subfamilies. (A), (B), and (C) represent the plant-novel-SR protein subfamily, the SCL subfamily, and the two-Zn-knuckles–type 9G8 subfamily, respectively. Genes with names such as "Contig;" or "Ppa;" were from moss, and these genes were used as an evolutionary outgroup. The numbers indicate the scores of bootstrap tests with 100 bootstrap replicates. We describe regions where we found AS events and AS types to the right of each gene ID. CE, cassette-type exon; AA, alternative acceptor; AD, alternative donor; RI, retained intron; and ME, mutually exclusive. AI indicates the alternative initiation event. Although it was not an AS event, we included this information in the tree. See the text for details.

FIG. 4. Exon-intron structures of the transcripts and reference sequences. (A) The osRSZ37a-Os01g06290 member of the rice two-Zn-knuckles–type 9G8 subfamily. Exon3a and exon3b were selected mutually exclusively. (B) The osRSZ36-Os05g02880 member of the rice two-Zn-knuckles–type 9G8 subfamily. We found an alternative initiation event in addition to the conserved AS event. The arrow indicates introns where the conserved AS events were found.

Two-Zn-Knuckles–Type 9G8 Subfamily

We found four members of the two-Zn-knuckles–type 9G8 subfamily in the rice genome (table 1). Several AS events have previously been reported for this subfamily (Kalyna, Lopato, and Barta 2003; Kalyna and Barta 2004; Isshiki, Tsumoto, and Shimamoto 2006). We reanalyzed AS events in this subfamily to find conserved AS events. The exon-intron structures of all Arabidopsis and rice homologues were the same, excluding the RS domains (fig. S1C, Supplementary Material online). We found conserved AS events in the introns of RRM-encoding regions of two Arabidopsis members (atRSZ33-At2g37340 and atRSZ34-At3g53500) and three rice members (osRSZ37b-Os03g17710, osRSZ36-Os05g02880, and osRSZ37a-Os01g06290) (table 2). The alternatively spliced introns were located 9 aa (26 bp) upstream relative to the conserved AS events of the plant-novel-SR protein subfamily genes (fig. 2).
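The AS type abbreviations used in table 2 and figure 3 (CE, AA, AD, and RI) can be made concrete by comparing the exon coordinates of an alternative transcript with those of a reference transcript. The following is a simplified sketch under stated assumptions, not the classification procedure used in this study: coordinates are hypothetical, genes are assumed to lie on the plus strand, exons are half-open (start, end) intervals, and mutually exclusive exons and alternative initiation are not handled.

```python
def introns(exons):
    """Introns implied by a list of (start, end) exons, as (start, end)."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify(ref_exons, alt_exons):
    """Return the set of AS types distinguishing alt from ref.

    CE: an exon of one transcript lies inside an intron of the other.
    RI: an intron of one transcript lies inside an exon of the other.
    AD: same acceptor, different donor (donor not used elsewhere).
    AA: same donor, different acceptor (acceptor not used elsewhere).
    """
    ref_in, alt_in = introns(ref_exons), introns(alt_exons)
    ref_don = {s for s, _ in ref_in}
    ref_acc = {e for _, e in ref_in}
    alt_don = {s for s, _ in alt_in}
    alt_acc = {e for _, e in alt_in}
    events = set()
    pairs = ((ref_exons, alt_in, ref_in, alt_exons),
             (alt_exons, ref_in, alt_in, ref_exons))
    for exons, other_introns, own_introns, other_exons in pairs:
        for e_s, e_e in exons:
            if any(i_s < e_s and e_e < i_e for i_s, i_e in other_introns):
                events.add("CE")
        for i_s, i_e in own_introns:
            if any(e_s <= i_s and i_e <= e_e for e_s, e_e in other_exons):
                events.add("RI")
    for r_s, r_e in ref_in:
        for a_s, a_e in alt_in:
            if r_e == a_e and r_s != a_s and a_s not in ref_don and r_s not in alt_don:
                events.add("AD")
            if r_s == a_s and r_e != a_e and a_e not in ref_acc and r_e not in alt_acc:
                events.add("AA")
    return events


# Hypothetical three-exon reference gene and four alternative forms.
ref = [(0, 100), (200, 300), (400, 500)]
print(classify(ref, [(0, 100), (400, 500)]))              # exon 2 skipped
print(classify(ref, [(0, 300), (400, 500)]))              # intron 1 retained
print(classify(ref, [(0, 100), (220, 300), (400, 500)]))  # shifted acceptor
print(classify(ref, [(0, 80), (200, 300), (400, 500)]))   # shifted donor
```

The extra condition on donors and acceptors keeps a skipped cassette exon from also being reported as an alternative donor or acceptor, since the splice sites of the longer intron are all used elsewhere in the reference transcript.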
For the two members of Arabidopsis and the osRSZ36-Os05g02880 and osRSZ37b-Os03g17710 members of rice, conserved AS events created stop codons in these transcripts (fig. S4, Supplementary Material online). In osRSZ37a-Os01g06290, 43 bp of exon3a was selected mutually exclusively over 40 bp of exon3b (fig. 4A). Even if exon3b was used, there were no stop codons or frameshifts. We checked the RRM consensus sequences of each AS isoform using a Pfam search (Bateman et al. 2004). Mature mRNA of exon3b (reference sequence) encoded amino acid sequences similar to the RRM consensus sequence, with an E value of 2.6 × 10⁻⁶. On the other hand, the AS isoform of exon3a encoded amino acid sequences similar to RRM, with an E value of 1.0 × 10⁻⁴. This AS isoform encoded an amino acid sequence of the RRM region with a weakened RRM consensus. For the Os05g02880 member of rice, we found an alternative initiation event in addition to the AS event. This alternative initiation event generated a transcript starting within the intron regions in which the conserved AS events of this subfamily had been found (fig. 4B). Similar to the conserved AS events, this alternative initiation event generated an mRNA encoding a protein with a truncated RRM. Phylogenetic tree analysis indicated that the conserved AS events had an evolutionary root prior to the branching of Arabidopsis and rice (fig. 3C).

Long Introns in Moss SR Protein Homologues

To trace the evolutionary origin of the conserved AS events, we analyzed SR proteins in moss (P. patens). Moss and flowering plants diverged about 400 MYA (Nishiyama et al. 2003), and if we could determine AS events in moss, we could then trace the evolutionary origin of the conserved AS events to this era. In moss, we found a plant-novel-SR protein subfamily member, three SCL subfamily members, and a two-Zn-knuckles–type 9G8 subfamily member.
The number of SR protein genes should be greater, but neither the complete genome nor the gene set from moss was available at the time of this study. Although a member of SCL (pphn44h03) had no introns, the others had introns at the same positions as in the Arabidopsis and rice members (fig. 5). Although we could not locate AS events in moss due to a lack of transcript data, we obtained a set of results strongly indicating the existence of conserved AS events in moss.

FIG. 5. A comparison of the exon-intron structures of SR proteins between moss, Arabidopsis, and rice. (A), (B), and (C) represent the plant-novel-SR protein subfamily, the SCL subfamily, and the two-Zn-knuckles–type 9G8 subfamily, respectively. Excluding pphn44h03, all homologues had the same exon-intron structures as those of Arabidopsis and rice. Arrows indicate conserved AS positions. The introns at these positions of moss were remarkably long, suggesting AS. Contig11624 and Ppa#S17588830 were incomplete sequences.

A previous study of Arabidopsis SR proteins (Kalyna and Barta 2004) determined that the lengths of alternatively spliced introns were remarkably long compared to other introns in its genome. Most of the alternatively spliced introns of the SR protein genes were over 400 bp in length. Such long introns are specific to alternatively spliced introns of Arabidopsis SR protein genes. The same properties were true of the rice SR protein genes (figs. S2–S4, Supplementary Material online). Based on these properties, we presumed the existence of AS events in the moss SR protein genes. We analyzed introns of SR protein genes in moss and found long introns at the same positions as the conserved AS events found in Arabidopsis and rice. The gene encoding plant-novel-SR protein in moss had an intron of over 670 bp in length at the same position as the conserved AS event.
The intron length is likely longer than this because we could not obtain complete sequence data. Each of the two genes of the SCL subfamily had introns of 1,432 bp (Contig11106) and >720 bp (Contig11624). The moss gene for two-Zn-knuckles–type 9G8 had an intron of >1,167 bp (figs. 5 and S5 [Supplementary Material online]).

Discussion

Conserved AS Events in Land Plants

We found three sets of AS events in SR protein families that were evolutionarily conserved between monocots and dicots. Each event was an intrasubfamily event and was found in the following subfamilies: plant-novel-SR protein, SCL, and two-Zn-knuckles–type 9G8 (figs. 1 and 2 and table 2). Each of the conserved AS events included several types of AS events, including cassette exon type, alternative donor/acceptor type, and retained intron type. Because the alternatively spliced introns were at the same positions across multiple alignments, the evolutionary conservation of these events was clear. The type of AS event seemed to change along with evolutionary divergence.

FIG. 6. A model for the regulation of the AS profile mediated by AS events in SR protein–encoding mRNAs. Transcriptional and AS regulations of SR proteins lead to oscillating expression of the SR protein products. Transcriptomes adapted to each condition may be created by SR protein regulation.

Although the types of AS events varied, conserved AS events had similar properties in that they generated mRNA encoding proteins with incomplete (truncated or weakened) RRMs. We hypothesized that in regard to AS events of SR proteins, the selection pressure was toward encoding amino acid sequences for conserved processes. In other words, all AS event types that generate mRNAs encoding proteins with incomplete RRMs might actually be caused by the same selection pressure.
Given this point of view, the origin of the alternative initiation event found in the osRSZ36-Os05g02880 member of the two-Zn-knuckles–type 9G8 subfamily was the same as that of the conserved AS events across the subfamily. Although all three conserved AS events were found near the centers of RRM-coding regions, the positions of the three events were not identical (fig. 2). We regarded each of the three AS events as conserved only within each subfamily and not across subfamilies. We emphasize that, although they seemed to be of differing origins, all three conserved AS events generated stop codons in RRM-encoding regions. This result suggested the importance of generating mRNAs encoding proteins with incomplete RRMs by AS.

Functions of Conserved AS Events

Each conserved AS event was found in an RRM-coding region. We expected that these AS events would greatly influence SR protein function because the RRMs are essential for SR protein function (Chandler et al. 1997). The proteins with truncated RRMs that were generated by the AS events should thus be nonfunctional. Another possibility is that mRNAs with abnormal stop codons might be targeted by nonsense-mediated mRNA decay (Lewis, Green, and Brenner 2003). In either case, the function of the SR proteins might be decreased.

For mouse SRp20 and human SC35, "autoregulating" AS mechanisms have been reported (Jumaa and Nielsen 1997; Sureau et al. 2001). In Arabidopsis, autoregulation for atSRp30 and atRSZ33 was also reported (Lopato et al. 1999; Kalyna, Lopato, and Barta 2003). In these cases, pre-mRNA splicing of an SR protein itself was influenced by the quantity of its own protein products. At the same time, pre-mRNA splicing controlled the quantity of its own protein products. We propose that a similar regulation mechanism could occur in the SR protein families of Arabidopsis and rice.
They are not necessarily cases of self-regulation in Arabidopsis or rice, however, because many SR protein genes were the result of gene duplications. Duplicated SR proteins would create complicated regulation pathways. Regardless of self-regulation or non–self-regulation, we assume that the conserved AS events are important in controlling the amount of functional SR protein products.

In our previous study, we reported large-scale changes of AS profiles depending on the organ and on environmental stresses (Iida et al. 2004). In that report, we observed induced expression and AS events of mRNA encoding SR proteins under similar conditions. Based on these results, we assumed that regulating the pre- and posttranscriptional levels of SR proteins was critical in controlling whole AS profiles. Our hypothesis of transcriptome regulation mediated by AS of SR protein mRNAs is shown in figure 6. The expression levels of SR protein products are regulated by both transcriptional and AS control. Induced expression of SR protein pre-mRNAs, combined with AS events that create mature mRNAs encoding truncated proteins, leads to oscillating expression of the SR protein products. Entire AS profiles are influenced by SR proteins, and transcriptomes are constructed to adapt to each condition. Other elements, such as protein degradation, phosphorylation events, and localizations are certainly also important for the regulation of AS. However, transcript-level regulation mediated by conserved AS events must form a critical system of regulation as the events are highly conserved in evolution.

We note that all subfamilies containing the conserved AS events (plant-novel-SR, SCL, and two-Zn-knuckles–type 9G8) were plant specific. These subfamilies are important in regulating AS (Lopato et al. 1999; Kalyna, Lopato, and Barta 2003).
The fact that all the conserved AS events were found in plant-specific SR protein subfamilies supports our previous hypothesis that the conserved AS events of SR protein pre-mRNAs are critical events in regulating AS.

The Origin of Conserved AS Events and Land Plant Evolution

Each conserved AS event originated prior to the branching between Arabidopsis and rice, a fact supported by the results of our phylogenetic tree analysis (fig. 3) and the widespread AS events in each subfamily. Arabidopsis thaliana and O. sativa, a dicot and a monocot, respectively, diverged 145–206 MYA (Yu et al. 2002), and we traced the conserved AS events back to this era. We obtained results indicating the possibility of an even more ancient origin for these conserved AS events. Long introns found in moss SR protein homologues might be alternatively spliced, and the origin of the conserved AS events could be as ancient as 400 MYA, when moss and flowering plants diverged and ancestral plants first invaded land. At that time, the ancestors of land plants were exposed to drought conditions and drastic temperature changes. Land plants acquired life cycles with various developmental stages, as well as various tissues and organs. For complicated life cycles, developmental stages, tissues, and organs, a more complicated transcriptome was required. We speculate that the AS events found in the SR proteins greatly contributed to obtaining a transcriptome that was adapted for each new requirement. A more complicated transcriptome might well have allowed plants to live on land.

Supplementary Material

Supplementary figures S1–S5 and multiple alignments of transcript sequences for each locus are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by Grants-in-Aid for Scientific Research (C) and for Priority Area "Genome Information Science" to M.G.
from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

Literature Cited

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Bateman, A., L. Coin, R. Durbin et al. (13 co-authors). 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138–D141.
Brendel, V., L. Xing, and W. Zhu. 2004. Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus. Bioinformatics 20:1157–1169.
Carninci, P., T. Kasukawa, S. Katayama et al. (194 co-authors). 2005. The transcriptional landscape of the mammalian genome. Science 309:1559–1563.
Chandler, S. D., A. Mayeda, J. M. Yeakley, A. R. Krainer, and X. D. Fu. 1997. RNA splicing specificity determined by the coordinated action of RNA recognition motifs in SR proteins. Proc. Natl. Acad. Sci. USA 94:3596–3601.
Gao, H., W. J. Gordon-Kamm, and L. A. Lyznik. 2004. ASF/SF2-like maize pre-mRNA splicing factors affect splice site utilization and their transcripts are alternatively spliced. Gene 339:25–37.
Graveley, B. R. 2000. Sorting out the complexity of SR protein functions. RNA 6:1197–1211.
Gupta, S., B. B. Wang, G. A. Stryker, M. E. Zanetti, and S. K. Lal. 2005. Two novel arginine/serine (SR) proteins in maize are differentially spliced and utilize non-canonical splice sites. Biochim. Biophys. Acta 1728:105–114.
Haas, B. J., N. Volfovsky, C. D. Town, M. Troukhan, N. Alexandrov, K. A. Feldmann, R. B. Flavell, O. White, and S. L. Salzberg. 2002. Full-length messenger RNA sequences greatly improve genome annotation. Genome Biol. 3:RESEARCH0029.
Iida, K., M. Seki, T. Sakurai, M. Satou, K. Akiyama, T. Toyoda, A. Konagaya, and K. Shinozaki. 2004. Genome-wide analysis of alternative pre-mRNA splicing in Arabidopsis thaliana based on full-length cDNA sequences. Nucleic Acids Res. 32:5096–5103.
Isshiki, M., A.
Tsumoto, and K. Shimamoto. 2006. The serine/arginine-rich protein family in rice plays important roles in constitutive and alternative splicing of pre-mRNA. Plant Cell 18:146–158.
Jumaa, H., and P. J. Nielsen. 1997. The splicing factor SRp20 modifies splicing of its own mRNA and ASF/SF2 antagonizes this regulation. EMBO J. 16:5077–5085.
Kalyna, M., and A. Barta. 2004. A plethora of plant serine/arginine-rich proteins: redundancy or evolution of novel gene functions? Biochem. Soc. Trans. 32:561–564.
Kalyna, M., S. Lopato, and A. Barta. 2003. Ectopic expression of atRSZ33 reveals its function in splicing and causes pleiotropic changes in development. Mol. Biol. Cell 14:3565–3577.
Kan, Z., D. States, and W. Gish. 2002. Selecting for functional alternative splices in ESTs. Genome Res. 12:1837–1845.
Kikuchi, S., K. Satoh, T. Nagata et al. (74 co-authors). 2003. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 303:376–379.
Lewis, B. P., B. E. Green, and S. E. Brenner. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc. Natl. Acad. Sci. USA 100:189–192.
Lopato, S., M. Kalyna, S. Dorner, R. Kobayashi, A. R. Krainer, and A. Barta. 1999. atSRp30, one of two SF2/ASF-like proteins from Arabidopsis thaliana, regulates splicing of specific plant genes. Genes Dev. 13:987–1001.
Macknight, R., M. Duroux, R. Laurie, P. Dijkwel, G. Simpson, and C. Dean. 2002. Functional significance of the alternative transcript processing of the Arabidopsis floral promoter FCA. Plant Cell 14:877–888.
Nishiyama, T., T. Fujita, T. Shin-I et al. (12 co-authors). 2003. Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: implication for land plant evolution. Proc. Natl. Acad. Sci. USA 100:8007–8012.
Okazaki, Y., M. Furuno, T. Kasukawa et al. (137 co-authors). 2002.
Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420:563–573.
Seki, M., M. Narusaka, A. Kamiya et al. (20 co-authors). 2002. Functional annotation of a full-length Arabidopsis cDNA collection. Science 296:141–145.
Shi, H., L. Xiong, B. Stevenson, T. Lu, and J. K. Zhu. 2002. The Arabidopsis salt overly sensitive 4 mutants uncover a critical role for vitamin B6 in plant salt tolerance. Plant Cell 14:575–588.
Sureau, A., R. Gattoni, Y. Dooghe, J. Stevenin, and J. Soret. 2001. SC35 autoregulates its expression by promoting splicing events that destabilize its mRNAs. EMBO J. 20:1785–1796.
Sutton, G., O. White, D. Adams, and A. Kerlavage. 1995. TIGR assembler: a new tool for assembling large shotgun sequencing projects. Genome Sci. Technol. 1:9–18.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Wang, B. B., and V. Brendel. 2004. The ASRG database: identification and survey of Arabidopsis thaliana genes involved in pre-mRNA splicing. Genome Biol. 5:R102.
Wheeler, D. L., D. M. Church, S. Federhen et al. (11 co-authors). 2003. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31:28–33.
Yoshimura, K., Y. Yabuta, T. Ishikawa, and S. Shigeoka. 2002. Identification of a cis element for tissue-specific alternative splicing of chloroplast ascorbate peroxidase pre-mRNA in higher plants. J. Biol. Chem. 277:40623–40632.
Yu, J., S. Hu, J. Wang et al. (100 co-authors). 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92.

Takashi Gojobori, Associate Editor
Accepted March 1, 2006