Splicing regulation: a structural biology perspective

Antoine Cléry1 and Frédéric H.-T. Allain1,2
1 Institute for Molecular Biology and Biophysics, ETH Zürich, CH-8093 Zürich, Switzerland
2 To whom correspondence should be addressed: [email protected]

Introduction

The spliceosome and its associated proteins form a highly dynamic RNP machine involving a complicated network of RNA-RNA, RNA-protein and protein-protein interactions. Mass spectrometric analyses of affinity-purified spliceosomal complexes indicate that the total number of spliceosome-associated factors is approximately 170 [1]. Among all the proteins involved in splicing, one can distinguish those that are part of the spliceosome (the spliceosomal proteins) from the others, which are referred to as splicing factors. Three chapters have been dedicated to this nuclear macromolecular machinery in humans, yeasts and plants. Here, we focus on the large number of splicing factors involved in the regulation of splicing (also referred to as alternative splicing). Recent estimates indicate that 80 to 95% of human multi-exon pre-mRNAs are alternatively spliced [2-4]. In higher eukaryotes, the high frequency of alternative-splicing events results from the presence of degenerate 5’ and 3’ splice sites that fail to recruit the spliceosome efficiently. As a result, additional RNA sequences located in both exons and introns are necessary to stimulate or inhibit splicing. Most of these cis-acting RNA sequences are bound by splicing factors, which either help or prevent the recruitment of the splicing machinery.
The numerous splicing factors identified to date can be categorized into three main families: the SR proteins (containing serine/arginine-rich sequences), which mostly facilitate splice-site recognition; the hnRNP proteins, which are considered to have a rather antagonistic function; and the tissue-specific splicing factors, which can play both roles (reviewed recently by Chen and Manley [5]). All these alternative-splicing factors contain different types of RNA binding domains (mostly RRMs, KH domains and zinc fingers), often in multiple copies (Fig. 1 and Table 1), and all of them recognize RNA in a sequence-specific manner. In this chapter, we review current knowledge of how alternative-splicing factors recognize RNA and proteins at the atomic level. Structural biology has been essential over the last decade in deciphering this vast protein-RNA and protein-protein interaction network. Structures have explained how certain cis-acting elements are discriminated by splicing factors, but also how RNA binding proteins can affect RNA structure. We have organized this chapter by grouping the splicing factors according to the types of RNA binding domains they contain rather than by protein family. We therefore review successively the structures of alternative-splicing factors containing RRMs (the most common RNA binding domains found in splicing factors), zinc fingers and finally KH domains. We describe and compare the different structures and show how the structures of alternative-splicing factors in complex with their RNA and protein partners contribute to a better understanding of the mechanism of action of these proteins in splicing regulation.

1. The RRM: a versatile scaffold for interacting with multiple RNA sequences and also proteins

The RNA recognition motif (RRM), also known as RBD (RNA binding domain) or RNP (ribonucleoprotein domain), is the most abundant RNA binding domain in higher vertebrates (it is present in about 0.5-1% of human genes) [6]. Over the last ten years, biochemical and structural studies have shown that this domain is involved not only in RNA/DNA recognition but also in protein-protein interactions. Both modes of interaction play a crucial role in splicing regulation.

1.1. RRM-RNA interaction and splicing regulation

An RRM is approximately 90 amino acids long, with a typical β1α1β2β3α2β4 topology that forms a four-stranded β-sheet packed against two α-helices (Fig. 2A). RRMs are found in almost all splicing factor families, in a single copy or in multiple copies (Fig. 1). Although the β-sheet is most commonly used to bind single-stranded RNAs, an extreme structural diversity of RRM-nucleic acid recognition modes has been selected during evolution, making the RRM a very versatile RNA binding platform [7, 8]. Most commonly, three aromatic side chains belonging to the two signature sequences RNP1 and RNP2, and exposed on the β-sheet surface (Fig. 2A and 2B), accommodate two nucleotides as follows: the bases of the 5’ and 3’ nucleotides stack on aromatic rings located in β1 (position 2 of RNP2) and in β3 (position 5 of RNP1), respectively (Fig. 2A). The third aromatic ring, usually located in β3 (position 3 of RNP1), is often inserted between the two sugar rings of the dinucleotide (Fig. 2A). However, deviations from this basic mode of binding are found in many RRM-RNA complexes, owing to the N- and C-terminal extensions of the domain, to the interdomain linker in proteins containing multiple RRMs, or to additional protein cofactors that can also modulate RNA binding specificity [8].
Several alternative-splicing factors containing one or multiple RRMs have been solved in complex with RNA over the years (Table 1), namely PTB (polypyrimidine-tract binding protein, 4 RRMs), HuD (2 of 3 RRMs), Sex-lethal (2 RRMs), hnRNP A1 (2 RRMs), U2AF65 (2 RRMs), Fox-1 (1 RRM), RBMY (1 RRM), SRp20 (1 RRM) and, more recently, RRM3 of CUG-BP.

1.1.1 RNA binding by splicing factors containing a single RRM

Splicing factors containing a single RRM are few compared with those containing multiple RRMs. Among SR and SR-like proteins, only SRp20, 9G8, SC35, SRp46, SRp54, SRrp86, RNPS1, Tra2α and Tra2β have a single RRM; among hnRNP proteins, only hnRNP C1/C2 and G; and among tissue-specific splicing factors, Fox-1 and Fox-2 (Fig. 1). With a single RRM, one would expect these proteins to bind RNA with lower affinity and less sequence specificity than multi-RRM proteins; as we will see, this is true for some (SRp20) but not for all (Fox-1). Among these factors, the structures of three single RRMs in complex with RNA have been determined, namely those of SRp20, Fox-1 and RBMY (a testis-specific protein whose RRM shares more than 80% identity with the RRM of hnRNP G). The NMR structure of the human SR protein SRp20 in complex with the 5’-CAUC-3’ RNA sequence remains the first and only structure to date of an SR protein in complex with RNA [9]. The structure reveals an additional aromatic residue (a tryptophan) on the β-sheet surface (in strand β2) that is responsible for binding the two 3’-most nucleotides (Fig. 2C). Although four nucleotides are bound, the affinity is rather weak (20 µM) owing to the unusual, semi-sequence-specific mode of RNA recognition by this RRM. Indeed, the structure reveals a binding consensus sequence CNNC (where N can be any nucleotide), which is compatible with the consensus established for this protein by in vitro and in vivo SELEX experiments [10].
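To make the difference between a degenerate consensus and an exact sequence concrete, the short Python sketch below scans an RNA string for the CNNC consensus of SRp20 using the standard IUPAC ambiguity codes (N = any nucleotide). The helper names (motif_to_regex, find_sites, expand) and the RNA fragment are hypothetical and chosen only for illustration. A motif match indicates sequence compatibility, not binding.

```python
import re
from itertools import product

# IUPAC ambiguity codes for RNA (U instead of T); only the codes used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG", "Y": "CU", "N": "ACGU"}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif (e.g. 'CNNC') into a regex.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences are all reported by finditer().
    """
    body = "".join(f"[{IUPAC[b]}]" for b in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(motif: str, rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every occurrence of motif in rna."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna.upper())]


def expand(motif: str) -> list[str]:
    """List every concrete sequence that a degenerate motif stands for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in motif.upper()))]


# Hypothetical exonic fragment, for illustration only.
fragment = "GACAUCUCCAGC"
print(find_sites("CNNC", fragment))  # [(2, 'CAUC'), (5, 'CUCC'), (8, 'CAGC')]
print(find_sites("CAUC", fragment))  # [(2, 'CAUC')]
print(len(expand("CNNC")))           # 16
```

Because each N admits four bases, CNNC corresponds to 16 different tetramers, whereas the exact bound sequence CAUC is only one of them. In this sense a semi-sequence-specific RRM can accept many more sites in a given stretch of RNA.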
This degenerate sequence specificity allows SRp20 to bind more diverse RNA sequences and weakens the evolutionary pressure on the bound RNA. This suits exonic sequences, because the natural SRp20 targets they contain must also satisfy protein-coding constraints [10]. The weak RNA binding affinity also allows frequent association and dissociation of SRp20 from the RNA, which is important for the highly dynamic processes involving this protein, present from RNA transcription to mRNA export. The structure of the RRM of human Fox-1 (a tissue-specific alternative splicing factor) in complex with 5’-UGCAUGU-3’ RNA reveals a radically different mode of binding compared with SRp20 [11]. Although both proteins contain a single RRM, the affinity of the Fox-1 RRM for 5’-UGCAUGU-3’ is extremely high (Kd in the subnanomolar range), reflecting a very high sequence specificity for the central pentamer GCAUG. To accommodate seven RNA nucleotides on a single domain, the Fox-1 RRM uses, in addition to the β-sheet surface, several loops joining secondary structure elements (Fig. 2D). In particular, a phenylalanine in the β1/α1 loop of the Fox-1 RRM is critical for RNA binding, as the first three nucleotides wrap around it (Fig. 2D) [11]. Although the mechanism by which Fox-1 and Fox-2 regulate splicing is not known, the clear sequence specificity of the protein allowed reliable mapping of its binding sites and the identification of strong correlations between the location of Fox-1 binding sites relative to splice sites and the effect of Fox-1 on splicing regulation [12, 13]. Given the very high affinity of the Fox-1 RRM, one would expect Fox-1, unlike SRp20, to remain bound to the RNA once it has found its target. The structure of the single RRM of the human testis-specific protein RBMY in complex with RNA revealed features common to both SRp20 and Fox-1 [14].
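The link between affinity and residence time can be made semi-quantitative with the standard relation Kd = koff/kon, where the mean lifetime of a complex is τ = 1/koff. The numbers below are a back-of-the-envelope sketch under two assumptions that are not taken from the cited studies. First, both RRMs are assumed to share the same association rate constant, kon ≈ 10^8 M^-1 s^-1. Second, the subnanomolar Kd of Fox-1 is rounded up to 1 nM, which, if anything, underestimates its residence time.

$$
\tau = \frac{1}{k_{\mathrm{off}}} = \frac{1}{k_{\mathrm{on}}\,K_d}
$$

$$
\tau_{\mathrm{SRp20}} \approx \frac{1}{(10^{8}\,\mathrm{M^{-1}\,s^{-1}})(2\times10^{-5}\,\mathrm{M})} = 5\times10^{-4}\,\mathrm{s},
\qquad
\tau_{\mathrm{Fox\text{-}1}} \approx \frac{1}{(10^{8}\,\mathrm{M^{-1}\,s^{-1}})(10^{-9}\,\mathrm{M})} = 10\,\mathrm{s}
$$

Under these assumptions a Fox-1-RNA complex would persist roughly 2×10^4 times longer than an SRp20-RNA complex. With a shared kon, this ratio is simply the ratio of the two Kd values. The estimate is consistent with SRp20 cycling rapidly on and off its targets while Fox-1 stays bound.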
Given the high sequence identity between the RRMs of human RBMY and hnRNP G, the structure suggests that hnRNP G can bind CAA motifs sequence-specifically on the β-sheet surface of its RRM (Fig. 2E). However, because hnRNP G and RBMY have different β2/β3 loops (in both length and sequence), only RBMY can bind a stem-loop containing a CAA motif in its loop, by inserting its β2/β3 loop into the major groove of the RNA stem (Fig. 2E). Although only putative targets have been identified for RBMY [14], it is interesting to note that the two tissue-specific splicing factors described here (RBMY and Fox-1) both bind RNA with high affinity and specificity using a single RRM.

1.1.2 RNA binding by splicing factors containing multiple RRMs

Most splicing factors contain multiple RRMs (Table 1). Structures of the two RRMs of Sex-lethal, U2AF65 and hnRNP A1, of RRM3 of CUG-BP, of RRM1 and RRM2 of HuD and of the four RRMs of PTB have been determined in complex with RNA (Table 1). From these few structures, it appears that RRMs within the same polypeptide chain generally bind very similar sequences, although not in an identical manner. This could explain the highly repetitive sequences observed in cis-acting elements that regulate splicing [5]. There are, of course, exceptions to this rule, for example five SR proteins (ASF/SF2, SRp30c, SRp40, SRp55 and SRp75) that all contain two very different RRMs (a canonical RRM and a pseudo-RRM), each with a different RNA binding specificity [10, 15].

Recognizing pyrimidine tracts by Sex-lethal, U2AF65 and PTB

The 3’ splice-site pyrimidine tract is a major cis-acting element for both constitutive and alternative splicing. Several trans-acting factors have been shown to bind the pyrimidine tract, resulting in either activation (U2AF65) or repression (Sex-lethal and PTB) of splicing.
The structures of the RRMs of these three proteins bound to pyrimidine tracts revealed the nature of the RRM-RNA interaction and the molecular basis of each protein's sequence specificity. The structures of all four RRMs of PTB were solved in complex with short 5’-CUCUCU-3’ pyrimidine tracts [16]. Each RRM of PTB binds a short pyrimidine tract: RRM1 and RRM4 bind three pyrimidines, RRM2 binds four and RRM3 binds five. RRM2 and RRM3 of PTB contain an additional fifth β-strand that extends the β-sheet and therefore allows additional nucleotides to be bound (Fig. 3A) [16]. The structures revealed similar, although not identical, sequence specificities for the four RRMs: RRM1, 2, 3 and 4 specifically recognize YCU, CUNN, YCUNN and YCN sequences, respectively (Y is a pyrimidine and N any nucleotide). The dissociation constant (Kd) of each RRM for a CUCUCU sequence is around 1 µM but increases substantially for polyU sequences, confirming the preference for pyrimidine tracts containing cytosines [17]. The structure of the pre-mRNA splicing factor U2AF65 in complex with a U-tract revealed a different mode of pyrimidine-tract recognition, although it still uses the β-sheet surface (Fig. 3B) [18]. This interaction is governed by hydrogen bonds involving flexible side chains of conserved U2AF65 residues and by water molecules mediating interactions between U2AF65 side chains and the uracil bases. The use of flexible side chains and the possible relocation of bound water molecules could explain how U2AF65 accommodates, at certain positions, the cytosines present in most 3’ splice-site pyrimidine tracts. As with the PTB RRMs, the two RRMs of U2AF65 bind RNA independently, which explains the similarly weak affinity (Kd in the µM range) observed for this splicing factor [19]. These structural data provide a better understanding of how PTB and U2AF65 compete for binding to the 3’ splice-site pyrimidine tract [20].
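The four PTB consensus sequences can be compared side by side with a small scan. This Python sketch rests on simplifying assumptions. It treats each consensus (Y = C or U, N = any nucleotide) as an exact pattern and ignores affinity, RNA structure and cooperativity. The names PTB_RRM_MOTIFS, sites and scan_ptb are hypothetical.

```python
import re

# IUPAC codes needed for the PTB consensus motifs (Y = C or U, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "Y": "CU", "N": "ACGU"}

# Per-RRM consensus sequences reported for PTB bound to CUCUCU.
PTB_RRM_MOTIFS = {"RRM1": "YCU", "RRM2": "CUNN", "RRM3": "YCUNN", "RRM4": "YCN"}


def sites(motif: str, rna: str) -> list[int]:
    """0-based start positions of (possibly overlapping) motif matches."""
    body = "".join(f"[{IUPAC[b]}]" for b in motif)
    # A zero-width lookahead lets finditer() report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={body})", rna)]


def scan_ptb(rna: str) -> dict[str, list[int]]:
    """Map each PTB RRM to the positions in rna compatible with its consensus."""
    return {rrm: sites(motif, rna.upper()) for rrm, motif in PTB_RRM_MOTIFS.items()}


print(scan_ptb("CUCUCU"))
# {'RRM1': [1, 3], 'RRM2': [0, 2], 'RRM3': [1], 'RRM4': [1, 3]}
print(scan_ptb("UUUUUU"))
# {'RRM1': [], 'RRM2': [], 'RRM3': [], 'RRM4': []}
```

Each of the four consensus sequences contains at least one cytosine, so a poly(U) tract yields no compatible positions. This is consistent with, though much cruder than, the measured loss of affinity for polyU sequences.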
U2AF65 preferentially binds uracil tracts but, owing to its versatile mode of RNA binding, can adapt to any pyrimidine tract, whereas PTB preferentially binds pyrimidine-rich sequences containing CU-tracts. This explains why alternative exons that are repressed by PTB and contain CU-tracts at the 3’ splice site can be converted into constitutive exons, and thus de-repressed, by several C-to-U changes [21, 22].

Binding of U-tracts by the two RRMs (RRM12) of Sex-lethal is quite different from that of the other two proteins. In the structure of the complex [23], Sex-lethal RRM12 sequence-specifically recognizes every nucleotide of 5’-UGUUUUUUU-3’ except U5, with RRM2 recognizing the 5’-UGU and RRM1 the 3’-UUUUUU sequences. Inter-RRM interactions upon RNA binding and contacts between the short interdomain linker and the RNA contribute to the overall high affinity of Sex-lethal for the RNA. Comparison between the two structures explains well how Sex-lethal can prevent U2AF65 from binding the U-tract, as observed in the Drosophila tra pre-mRNA [24]. Not only do Sex-lethal RRMs discriminate uracils from cytosines better than U2AF65, but the two RRMs of Sex-lethal can also bind U-tracts cooperatively, whereas the two RRMs of U2AF65 cannot. Although PTB, U2AF65 and Sex-lethal bind pyrimidine tracts using similar RNA recognition motifs and the same interaction surface (the β-sheet), subtle variations in side-chain composition on the β-sheet surface have allowed the RRMs of each protein to recognize UCU, YYY and UUU sequences, respectively. Additionally, the RRMs of Sex-lethal evolved to bind pyrimidine tracts cooperatively, whereas the RRMs of PTB and U2AF65 appear to bind RNA independently.

Recognizing purine-pyrimidine tracts by CUG-BP and HuD

Several purine-pyrimidine tracts, for example CA-tracts, UG-tracts or CUG-tracts, have been found to act as cis-acting elements regulating alternative splicing [25].
AU-rich elements were initially characterized for their importance in RNA stability and, more recently, in alternative-splicing regulation [26]. Several RRM-containing proteins have been identified as trans-acting factors binding these purine-pyrimidine tracts. Examples include hnRNP L, which binds CA-tracts; RBM35 and CELF proteins such as CUG-BP, which bind UG- and CUG-tracts [25]; and ELAV proteins such as HuD, which bind AU-tracts. The structures of HuD RRM1 and RRM2 bound to AU-rich RNA [27] and, more recently, of CUG-BP RRM3 in complex with RNA [28] have been determined and show how such RNA tracts are recognized by RRMs. HuD and CUG-BP share a similar domain organization: both proteins contain three RRMs, with the two N-terminal RRMs (RRM1 and RRM2) separated by a short interdomain linker (11 and 9 amino acids, respectively), while the C-terminal RRM3 lies much further from RRM2 (89 and 113 amino acids, respectively). The two structures therefore provide indications of the RNA binding mode of the numerous splicing factors containing three RRMs (Fig. 1). Given the high sequence similarity between RRM12 of HuD and RRM12 of Sex-lethal, it is perhaps not surprising that HuD bound to UAUUUAUUU [27] (Fig. 3C) and Sex-lethal bound to UGUUUUUUU [23] adopt very similar conformations. While most of the contacts with the pyrimidines are sequence-specific (Fig. 3C), the protein contacts to the adenines in HuD do not appear to be A-specific, similar to the contacts to guanines in Sex-lethal [23]. It is therefore unclear how the purines are discriminated by these two RNA binding proteins. In the case of HuD, it has even been suggested that adenines destabilize binding [29]. As for Sex-lethal, RRM1 and RRM2 of HuD very likely bind RNA cooperatively to increase RNA binding affinity and specificity [30].
The structure of CUG-BP1 RRM3 was recently determined in complex with the hexamer UGUGUG [28]. This NMR structure revealed sequence-specific recognition of the central UGU motif, although all six nucleotides are bound by RRM3 (Fig. 3D). In the free form, the 12 amino acids immediately N-terminal to the RRM interact strongly with the RRM surface by running across the β-sheet. This N-terminal extension also contributes to RNA binding by interacting with G4 and U5 (Fig. 3D). It partly explains how six nucleotides can be bound by this isolated RRM, although the binding affinity remains modest (Kd = 1.9 µM). The binding affinities and modes of sequence-specific binding of the two N-terminal RRMs of HuD and of the C-terminal RRM of CUG-BP are quite different. This possibly reflects the different roles played by these two parts in both proteins [30, 31], although this needs to be confirmed by the structure of a three-RRM protein bound to RNA. It also remains to be seen whether, in this context, the three RRMs have the same RNA binding specificity.

Recognizing polypurine tracts by hnRNP A1 and hnRNP F

Polypurine tracts are very frequently found as high-affinity binding sites for many alternative-splicing factors, including most SR and SR-related proteins [10], but also for the splicing repressor hnRNP A1 (binding sequence 5’-UAGGG-3’) and for members of the hnRNP H/F family, which bind G-tract-containing RNAs [32]. Among these proteins, the structure of hnRNP A1 in complex with DNA telomeric repeats (5’-TTAGGG-3’) has been determined [33]. For the three hnRNP F RRMs, the structure of the free form has been determined and supplemented by informative NMR binding studies [34]. In the hnRNP A1-DNA complex, sequence-specific interactions with 5’-TAGG-3’ sequences are observed on the β-sheet surface of both RRMs, with almost identical recognition of TAG in both domains.
The structure strongly suggests that UAGG RNA sequences would be recognized in an identical manner [33]. This sequence is reminiscent of the 3’ splice-site consensus sequence and is found in many cis-acting elements bound by hnRNP A1 that regulate alternative splicing [5, 32, 35]. The crystal structure of hnRNP A1 bound to two telomeric DNA repeats also revealed an unusual arrangement of the DNA and the protein. The complex is a dimer in which the 5’-TAGG of each DNA molecule contacts RRM1 of one subunit and the 3’-TAGG contacts RRM2 of the other subunit. Although this arrangement might be functionally important for telomeric repeats, it remains to be seen whether it is relevant for splicing regulation. A recent NMR investigation of the three RRMs of hnRNP F in complex with the 5’-CGGGAU-3’ RNA sequence revealed a non-canonical binding surface [34]. Instead of the β-sheet surface, this surface is formed by three loops of each RRM (the β1/α1, β2/β3 and α2/β4 loops, all located on the "south" side of the β-sheet). These RRMs are not canonical, since they lack the conserved aromatic residues in RNP1 and RNP2, which is why these domains were historically named qRRMs, for quasi-RRMs. Structures of these unusual RRMs bound to RNA might reveal why such an unusual mode of binding evolved.

1.2. RRM-RRM and RRM-protein interactions in splicing regulation

It has become apparent over the past several years that the RRM is not only an RNA binding platform but also (sometimes exclusively) a very good protein-protein interaction domain. Peptide-RRM as well as RRM-RRM interactions have been discovered and their structures determined. These protein-RRM interactions can also play a significant role in splicing regulation.

1.2.1 RRM-protein interactions without RNA binding

The UHM family (U2AF homology motif), a non-canonical RRM family, has been defined for RRMs sharing sequence and structural characteristics with U2AF [36].
This family is characterized by i) the absence of aromatic residues in the RNP2 sequence, ii) an extended, highly acidic α1-helix, and iii) a conserved Arg-X-Phe motif (X being any amino acid) in the α2/β4 loop. The UHM-ULM (UHM ligand motif) interaction plays an important role in the assembly of splicing factors at the 3’ splice site. For example, UHM-ULM contacts mediate the interaction of U2AF65 with SF1, U2AF35 or SF3b155. Interestingly, RNA binding by these UHM RRMs seems to be compromised by an additional C-terminal helix packed against the β-sheet. The Arg-X-Phe motif and the negatively charged, extended α1-helix form the protein-protein interaction surface. A recent structure of the SPF45 UHM in complex with the SF3b155 ULM has shed light on a role for the UHM-ULM interaction in alternative splicing (Fig. 3E) [37]. This interaction was found to be critical for the splicing regulation of the apoptosis regulatory gene FAS. Based on the structure, the authors showed that substitutions in the conserved UHM motif Arg375-X-Phe377, or mutation of Glu329 (in the α1-helix) or Asp319 (in the β1/α1 loop), affect differently the affinity of the SPF45 UHM for three natural ULM targets of SPF45 (SF3b155, SF1 and U2AF65). These data strongly suggest that SPF45 can repress splicing by interacting with the ULMs present in these three splicing factors. It therefore appears that RRM-containing proteins can repress splicing by very different mechanisms. For some, such as Sex-lethal, PTB or hnRNP A1, repression involves competition with splicing factors for RNA binding, whereas for others, such as SPF45, it involves direct interactions with splicing factors that prevent their assembly.

1.2.2 RRM-protein interactions allowing RNA binding

Another example of RRM-protein interactions regulating splicing is the binding of PTB RRM2 by its co-repressor Raver1 (Fig. 3F) [38].
The Raver1 peptide interacts with the shallow groove formed by the α1-helix and the α2/β4 loop of PTB RRM2, similarly to the binding of a ULM to a UHM (Fig. 3E) [39]. However, the tryptophan side chain typical of a ULM is replaced by conserved leucine residues at positions 500 and 501 of the Raver1 motif (499-SLLGEPP-505). Although similar to the UHM-ULM interaction, the PTB-Raver1 interaction is functionally different, as it is compatible with simultaneous RNA binding [38]. Because Raver1 contains four PTB RRM2 binding motifs, it has been suggested that Raver1 acts as a co-repressor by serving as a recruitment platform for multiple PTB molecules [38]. The interaction of RRMs with proteins can also limit the specificity of RRM-RNA recognition. Good illustrations are the crystal [40] and solution [41] structures of the complex between the p14 protein, a human component of the spliceosomal U2 and U11/U12 snRNPs, and a peptide derived from SF3b155. The p14 β-sheet is occluded by its own helix and by SF3b155, leaving only one pocket containing a conserved RNP2 residue (Tyr22) accessible to the solvent [40]. Biochemical and NMR studies suggest that this residue [40, 42], and possibly also Tyr28 (β1/α1 loop) and Arg85 (2/4 β-hairpin), are involved in branch point recognition [41]. This recognition has weak specificity and affinity, which may allow competitors to regulate the interaction.

1.2.3 Impact of RRM-RRM interactions on the splicing mechanism

RRMs also use their α-helices to interact with each other, keeping the β-sheet of each RRM completely free for RNA interactions. This can occur intramolecularly, as in PTB, or intermolecularly, as in hnRNP A1. The structures of PTB RRM3 and RRM4 free [43] and bound to RNA [16] revealed that the two RRMs are tightly associated in both forms through their helices (α1 and α2 of RRM3 and α2 of RRM4) and the interdomain linker, forming a large hydrophobic interface involving 27 protein side chains [43].
This tight interaction between the two RRMs results in an antiparallel orientation of their bound RNAs, implying that RRM3 and RRM4 could induce the formation of RNA loops. These structural data suggest that PTB might repress splicing by looping out alternative exons, a branch point or any other cis-acting element [16]. As described above, hnRNP A1 RRM1 and RRM2 dimerize upon telomeric DNA binding via intermolecular RRM-RRM interactions in the crystal structure. This suggests a potential mechanism by which hnRNP A1 might loop out alternative exons or help in the splicing of very large introns, as proposed by Blanchette and Chabot [44].

2. The zinc finger domain

The classical zinc finger domain is approximately 30 amino acids long and adopts a fold in which a β-hairpin and an α-helix are pinned together by a Zn2+ ion. These domains are classified according to the amino acids that coordinate the Zn2+ ion. Although zinc fingers are best known for interacting with DNA, a few structures of these domains in complex with RNA have also been solved [45]. Like RRMs, zinc fingers have been reported to interact specifically with RNA using hydrogen bonds and aromatic-base stacking interactions. However, the amino acids involved in the RNA interaction are located mainly in the protein loops rather than in the β-strands as in RRMs (Fig. 4A and 4B). Crystallography recently showed that the tandem CCCH zinc fingers 3 and 4 of muscleblind-like 1 (MBNL1) specifically interact with the 5’-GC-3’ sequence using intermolecular stacking and hydrogen-bonding interactions [46]. In zinc finger 3, the Arg195 side chain stacks over the G base, and the cytosine is sandwiched between the Phe202 ring, which is inserted between the two nucleotides, and the Arg186 side chain (Fig. 4A).
Sequence-specific recognition is mediated by four hydrogen bonds involving main-chain amide and carbonyl groups and by three hydrogen bonds involving the side chains of Glu183 and of the two zinc-coordinating cysteines, Cys185 and Cys200 (Fig. 4A). This mode of RNA interaction is reminiscent of how Tis11d binds a 5’-AU-3’ sequence, although the dinucleotide is different (PDB code: 1RGO) [47]. Tis11d is another CCCH zinc finger protein and is involved in mRNA stability. As for proteins containing several RRMs, the mode of RNA recognition is very similar for MBNL1 zinc fingers 3 and 4, suggesting a duplication of this motif during evolution. Interestingly, the antiparallel orientation of the RNA molecules bound by the two zinc fingers and the location of MBNL1 binding sites on natural targets suggest that the protein could induce RNA looping. Such looping could block 3’ splice-site recognition by U2 snRNP and result in exon skipping [46]. The human SR protein 9G8 contains one CCHC zinc finger located between an RRM and an RS domain (Fig. 1). It recognizes different RNA sequences in vitro depending on whether the zinc finger is intact or two of its zinc-coordinating cysteines are substituted by glycines [48]. Indeed, in vitro SELEX experiments with the wild-type protein selected 5’-GAC-3’ repeat RNA sequences, whereas the 5’-(A/U)C(A/U)(A/U)C-3’ motif was selected with the 9G8 mutant [48]. These results highlight the involvement of the zinc finger in specific RNA recognition by 9G8 [48]. Another RS domain-containing protein, ZRANB2, contains two RanBP2-type ("CCCC") zinc finger domains in place of RRMs. A crystal structure of these motifs in complex with the 5’-AGGUAA-3’ RNA sequence was recently determined [49]. Each domain is composed of two short β-hairpins sandwiching a zinc ion coordinated by four conserved cysteines (Fig. 4B).
A structural peculiarity of this RNA-protein complex is the guanine-Trp79-guanine "ladder" formed by the continuous stacking of these three residues. The G2, G3 and U4 bases are specifically recognized by hydrogen bonds involving protein side chains (N76, R81, R82 and N86), backbone groups (V77 carbonyl and W79 amide) and water-mediated hydrogen bonds (D68 and A80). These amino acids are mainly located in ZRANB2 loops, especially the one at the C-terminal end of the first β-hairpin (Fig. 4B). Based on functional data and the strong similarity between the ZRANB2 binding site and 5’ splice-site sequences, the authors suggest that this protein might bind a subset of 5’ splice sites and prevent their recognition by the spliceosome [49]. Here again, the structural data explain at the atomic level the molecular basis of specific RNA recognition by these proteins and also suggest possible modes of action for these splicing factors. Since MBNL1 and ZRANB2 both bind sequences containing 5’-GY-3’, it is interesting to compare their modes of RNA recognition. The two complexes share some features, such as the stacking of an aromatic ring (F202 and W79) on RNA bases (C3 and G2/G3 in the MBNL1-RNA and ZRANB2-RNA complexes, respectively). However, they also differ clearly in how the RNA bases are recognized. C3 is mainly recognized by MBNL1 main-chain atoms, whereas U4 interacts exclusively with ZRANB2 side chains (Fig. 4A and 4B). In addition, the G2 and C3 bases are perpendicular in the MBNL1-RNA complex, whereas the corresponding bases are parallel to each other in the ZRANB2 complex. Finally, only in the MBNL1-RNA complex are two of the zinc-coordinating cysteines also found contacting the RNA. These structural data clearly illustrate how closely related RNA sequences (GY) can be recognized very differently by zinc finger domains.
They also highlight the difficulty of predicting RNA recognition by protein domains and the need to solve additional structures of RNA-protein complexes in order to correctly characterize and better understand such interactions.

3. The KH domain

The hnRNP K homology (KH) domain is approximately 70 amino acids long. The KH motif is found in archaea, bacteria and eukaryotes and is known to interact with RNA or ssDNA targets with low micromolar affinity [50, 51]. Several copies of this domain can be found in a single protein, acting either independently or cooperatively; in the latter case, cooperation increases nucleic acid affinity and specificity [52]. Only a few structures of KH domains bound to nucleic acids have been deposited in the Protein Data Bank, and most of them concern the eukaryotic type I KH domain. This motif has a βααββα topology and is characterized by a β-sheet composed of three antiparallel β-strands packed against three α-helices [50, 51]. The β1- and β2-strands are parallel to each other and the β3-strand is antiparallel to both. The motif also contains a "GXXG loop" located between the α1 and α2 helices, which carries the conserved (I/L/V)-I-G-X-X-G-X-X-(I/L/V) motif, and a β2-β3 loop of variable length and sequence (Fig. 4C and D). The type II KH fold is typically found in prokaryotic proteins. It differs from type I by an αββααβ topology and a characteristic β-sheet in which the central strand is parallel to β3 and antiparallel to β1 [50, 51].

KH domains interact with their nucleic acid targets using common features. The single-stranded RNA or DNA is mostly bound by an extended surface including the α1 and α2 helices, the GXXG motif, the β2 strand and the variable loop [51]. Together, these elements form a binding cleft that usually accommodates four bases (Fig. 4C and 4D). Interestingly, KH motifs use a different mode of RNA recognition compared with RRMs.
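The GXXG consensus quoted above is a simple sequence pattern and can be searched with a regular expression. The Python sketch below assumes a minimal subset of PROSITE-style syntax: '-' separators, 'x' for any residue and '[...]' for alternatives. The helper names and the protein fragment are hypothetical, and matching the consensus does not by itself identify a KH domain.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regex.

    Only the subset used here is handled: elements separated by '-',
    'x' for any residue and '[...]' for a set of allowed residues.
    Repeat counts such as x(2) and exclusions such as {P} are not supported.
    """
    return "".join("." if el.lower() == "x" else el for el in pattern.split("-"))


# KH GXXG consensus as quoted in the text: (I/L/V)-I-G-X-X-G-X-X-(I/L/V).
GXXG = prosite_to_regex("[ILV]-I-G-x-x-G-x-x-[ILV]")


def find_gxxg(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each GXXG-like segment."""
    # The lookahead with a capture group reports overlapping matches too.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({GXXG}))", protein.upper())]


# Hypothetical protein fragment, for illustration only.
print(GXXG)                           # [ILV]IG..G..[ILV]
print(find_gxxg("MSEKVIGKGGETIKQL"))  # [(4, 'VIGKGGETI')]
```

The same converter would express the UHM Arg-X-Phe motif as R-x-F, giving the regex R.F.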
Rather than interacting via the β-sheet surface, as RRMs do, KH domains use an α/β platform. In addition, the KH RNA binding surface is very hydrophobic and, unlike in the canonical RNA binding mode of RRMs and zinc finger domains, aromatic residues are not involved in these interactions. This feature could partly explain the low affinity of KH domains for single-stranded nucleic acids. Nova2 (Neuro-oncological ventral antigen 2) is a tissue-specific alternative splicing factor containing three KH domains (Fig. 1 and Table 1). This protein is highly expressed in the neocortex and hippocampus and regulates the alternative splicing of transcripts encoding proteins with specific functions in the brain [53]. The crystal structure of the Nova2 KH3 domain in complex with an in vitro selected stem-loop RNA shows that this domain binds the single-stranded 5’-UCAC-3’ sequence located in the loop (Fig. 4C) [54]. U12 is recognized specifically but indirectly, through two water molecules that bridge it to Lys23 of the GKGG loop and Arg75 of the α3 helix (Fig. 4C). C13 and C15 interact directly with protein side chains from the β2 and β3 strands, whereas A14 is the only base hydrogen bonded to the main chain, through the amide and carbonyl groups of I41 (Fig. 4C). This structure revealed that the Nova2 KH3 domain specifically recognizes the 5’-UCAY-3’ RNA sequence (Y = C or U). In good agreement with this result, the 5’-UCAU-3’ sequence located upstream of the alternatively spliced exon 3A of the glycine receptor α2 pre-mRNA could be predicted to be a Nova binding site [55, 56]. These structural data have been crucial for the in vivo identification of several new Nova binding sites and for a better understanding of splicing regulation by this protein [53, 57, 58].
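Because the Nova2 KH3 structure defines a 4-nt consensus, 5’-UCAY-3’, candidate sites in a transcript can be listed with a short scan. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, only exact consensus matches are reported, and site clustering, RNA structure and accessibility are ignored. A zero-width lookahead is used so that overlapping sites, such as the two in 5’-UCAUCAC-3’, are both reported.

```python
import re

def find_ucay_sites(rna: str) -> list[int]:
    """Return 0-based start positions of 5'-UCAY-3' sites (Y = C or U), overlaps included."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input
    return [m.start() for m in re.finditer(r"(?=UCA[CU])", seq)]

print(find_ucay_sites("UCAUCAC"))   # [0, 3]
print(find_ucay_sites("GGUCACAA"))  # [2]
```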
Another KH-containing protein involved in splicing is SF1/mBBP, which specifically binds the 5’-UACUAAC-3’ intron branchpoint sequence (BPS) in human pre-mRNA transcripts [59] using a binding surface composed of a KH domain and a C-terminal helix known as the QUA2 (Quaking homology 2) domain [60]. This extended KH surface, with a βααββαα topology, allows the binding of seven nucleotides instead of the four usually bound by a single KH domain. The 3’ end of the BPS (5’-UAAC-3’, whose third nucleotide is the conserved branch point adenosine) is specifically recognized by the KH domain, whereas the 5’ end (5’-ACU-3’) is bound by conserved residues from the QUA2 domain. Amino acids from the α1 and α2 helices, the β2 strand, the GXXG motif and the variable loop of the KH domain are mainly used for binding RNA, through a combination of hydrophobic interactions, hydrogen bonds and electrostatic contacts (Fig. 4D) [60]. Interestingly, and in good agreement with the conservation of the branch point adenosine, the NMR structure shows that this base is specifically recognized by hydrogen bonds involving the main chain of I177 [60], similar to the contact between A14 and the main chain of I41 in Nova2 KH3 (Fig. 4C and 4D). The structures of the Nova2 KH3 and SF1 KH domains have been solved in complex with the similar 5’-UCAC-3’ and 5’-UAAC-3’ RNA sequences, respectively [54, 60]. These data show that, like RRMs and zinc finger domains, KH domains can specifically recognize RNA sequences. Moreover, the two proteins use a similar mode of RNA recognition (Fig. 4C and 4D). In addition to the similar contacts made with A14 (Nova2) and A8 (SF1), C13 (Nova2) and A7 (SF1) are both hydrogen bonded to an aspartate located in the β1-α1 loop. Finally, A8 and A14 are stacked on C9 and C15 in the SF1 KH-RNA and Nova2 KH3-RNA complexes, respectively.
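The branch-point recognition described above can be tied back to sequence with a similar sketch. It assumes an exact match to 5’-UACUAAC-3’, which is illustrative only since human branch sites are far more degenerate, and the function and constant names are hypothetical. It returns the position of the branch point adenosine, the sixth nucleotide of the motif, that is, the third nucleotide of the KH-bound 5’-UAAC-3’ segment.

```python
BPS = "UACUAAC"
BRANCH_OFFSET = 5  # 0-based index of the branch point A within UACUA(A)C

def find_branch_points(intron: str) -> list[int]:
    """Return 0-based positions of the branch point adenosine for each exact BPS match."""
    seq = intron.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    start = seq.find(BPS)
    while start != -1:
        hits.append(start + BRANCH_OFFSET)
        start = seq.find(BPS, start + 1)
    return hits

example = "GGUACUAACAG"
for pos in find_branch_points(example):
    print(pos, example[pos])  # 7 A
```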
The features shared by the Nova2 KH3 and SF1 KH complexes are also observed in the type II tandem KH domains of NusA (PDB code: 2ATW) [61], suggesting that KH domain-containing proteins can specifically target only a rather narrow range of sequences. This could partly explain why splicing factors containing KH domains are few compared with those containing RRMs (Fig. 1).

Conclusion and perspectives

In this chapter, we have described current knowledge of how splicing factors interact with RNA and proteins at the atomic level and participate in splicing regulation. Although few structures of splicing factors bound to RNA or proteins have been determined so far, compared with the vast number of proteins involved in splicing regulation, a few conclusions or hypotheses can nevertheless be drawn from these structures. It is clear from Figure 1 that the vast majority of splicing factors contain RRMs, which are used for RNA binding but in some cases also for protein-protein interactions. The main lesson we have learned over the years about the RRM [7, 8, 62] is the extreme versatility and plasticity of this small protein domain. We have shown here that RRM-containing proteins can specifically bind a large variety of sequences, as illustrated by the structures of RRMs bound to pyrimidine tracts (Sex-lethal, U2AF65 and PTB), purine-pyrimidine tracts (CUG-BP and HuD) and purine tracts (hnRNP A1, hnRNP F and SR proteins) (Table 1). RRMs also bind RNA with a wide range of affinities, as illustrated by the RRMs of SRp20 and Fox-1, which bind RNA specifically with low and high affinity, respectively. The extreme binding versatility of RRMs can be explained by the use of different combinations of side-chain and main-chain interactions with RNA, but also by the capacity of this domain to extend its RNA binding surface beyond the canonical β-sheet surface.
Indeed, there are examples of RRMs that use an additional β-strand (PTB RRM2 and RRM3), loops (the β2/β3 loop of RBMY and the β1/α1 loop of Fox-1) and RRM extremities (the C-terminus of PTB RRMs and the N-terminus of CUG-BP) to interact with RNA. Given such diversity in its modes of RNA interaction, it now seems almost logical that the RRM appears so frequently in splicing factors, considering the large repertoire of sequences that must be recognized with different affinities and specificities for splicing regulation. Unfortunately, one drawback of this versatility is that RRM-RNA interactions remain very hard to predict, which justifies the need to determine more structures of RRM-RNA complexes and, more generally, of protein-RNA complexes. The structural data have provided essential information for correctly mapping the binding sites of several splicing factors in vivo (the best examples are Fox-1 and Nova2). This mapping revealed that the position of the binding site relative to the splice sites appears to be a major element controlling the mode of action of the splicing factor. Although this information is not sufficient to fully characterise this mode of action, it contributes to a better understanding of the factors' functions. Splicing factors also work by competing with other factors for the same RNA binding sequence. The structural work on PTB, U2AF65 and Sex-lethal revealed how each protein adapts to the different pyrimidine tracts found at the 3’ splice site. Finally, solving the structures of alternative-splicing factors bound to RNA and proteins revealed unexpected features, such as the potential for RNA looping by PTB, hnRNP A1 or MBNL1 (Fig. 5), suggesting that splicing factors function not only by recognizing RNA sequences but also by remodelling RNA structure. Despite progress over the last decade in this growing field, many questions remain to be answered and will require a structural biology approach to fully understand the role of splicing factors in splicing regulation.
These questions range from simple ones that could be addressed rapidly to more complicated ones that will require multidisciplinary approaches or new methodologies. For example, we still need to address how a pseudo-RRM binds RNA or how RS domains mediate RNA and protein binding. More complicated questions are how splicing factors interact with the splicing machinery and how several factors assemble or multimerise on certain cis-acting elements. Also, how dynamic are protein-RNA interactions near splice sites, and how does phosphorylation influence these dynamics? How is splicing regulation coordinated among the different gene families, and how is this coordination mediated at the molecular level? Finally, since an increasing number of diseases appear to be connected with splicing regulation, all this emerging knowledge will be indispensable for developing new therapeutic treatments [63].

Acknowledgements

The authors would like to thank Prof. Steve Matthews for providing several models of RRM2-Raver1, the Swiss National Science Foundation (No. 3100A0-118118), the SNF-NCCR Structural Biology and EURASNET for financial support to FHTA, and the European Molecular Biology Organization for a postdoctoral fellowship to AC.

Figure captions

Table 1: Structures of RRMs, KH domains and zinc fingers from splicing factors solved in complex with RNA. The protein domains and target RNA sequences used for the structure determinations and the corresponding PDB codes are indicated. The nucleotides bound by the proteins are in bold.

Figure 1: Classification of the main human alternative splicing factors according to their RNA binding domain composition. The RRMs, quasi-RRMs (qRRMs), pseudo-RRMs (ΨRRMs), KH domains and zinc fingers are represented by rectangles coloured dark blue, pale blue, pale green, magenta and yellow, respectively. The RS (Arg/Ser-rich) domains are represented by red spheres.

Figure 2: The high versatility of single RRM interactions with RNA.
(A) Structure of hnRNP A1 RRM2 in complex with single-stranded telomeric DNA as a model of single-stranded nucleic acid binding [33]. (B) Scheme of the four-stranded β-sheet with the positions of the main conserved RNP1 and RNP2 aromatic residues indicated in green. The RNP1 and RNP2 consensus sequences of RRMs are shown (X denotes any amino acid). (C) Structure of SRp20 RRM in complex with the 5’-CAUC-3’ RNA [9]. In all the figures, the ribbon of the RRM is shown in grey, the RNA nucleotides in yellow and the protein side chains in green. The N, O and P atoms are in blue, red and orange, respectively. The N- and C-terminal extensions of the RRM and the 5’ and 3’ ends of the RNA are indicated. Hydrogen bonds are represented by purple dashed lines. (D) Structure of Fox-1 RRM in complex with the 5’-UGCAUGU-3’ RNA [11]. (E) Structure of RBMY RRM in complex with a stem-loop RNA capped by a 5’-CACAA-3’ pentaloop [14]. The figures were generated with the program MOLMOL [64].

Figure 3: Structures illustrating RRM-RNA and RRM-peptide interactions. (A) Structure of PTB RRM3 in complex with the 5’-CUCUCU-3’ RNA [16]. The β4-strand, the β4/β5 loop and the additional β-strand of RRM3, which are involved in the RRM-RNA interaction, are shown in red. (B) Structure of U2AF65 RRM1 in complex with U-tract RNA [18]. (C) Structure of HuD RRM1 and RRM2 in complex with the 5’-UAUUUAUUU-3’ RNA [27]. (D) Structure of CUG-BP1 RRM3 in complex with the 5’-UGUGUG-3’ RNA sequence [28]. (E) Structure of SPF45 UHM (in grey) in complex with the SF3b155 ULM (aa 333 to 342, in blue) [37]. (F) Structure of PTB RRM2 (in grey) in complex with a Raver1 peptide (in blue) [38] and RNA (in yellow) [16]. The colour schemes are the same as in Figure 2.

Figure 4: Structures of zinc fingers and KH domains from splicing factors in complex with RNA. (A) Crystal structure of MBNL1 ZnF3 bound to the 5’-GC-3’ RNA sequence [46]. The zinc atom and water molecules are represented by black and red spheres, respectively.
(B) Crystal structure of ZRANB2 ZnF in complex with the 5’-GGU-3’ RNA sequence [49]. (C) Crystal structure of Nova2 KH3 bound to the 5’-UCAC-3’ RNA sequence [54]. (D) NMR structure of the SF1 KH domain bound to the 5’-UAAC-3’ RNA sequence [60]. The colour schemes are the same as in Figure 2.

Figure 5: Models of splicing repression by RNA looping. The models are based on the structures of MBNL1 zinc fingers 3+4 [46], PTB RRM3+4 [16] and the hnRNP A1 RRM1+2 dimer [33] bound to RNA. These proteins repress splicing by looping out cis-acting elements essential for splicing: the pyrimidine-rich sequence at the 3’ splice site, as proposed by Teplova and co-workers [46] for MBNL1; a short alternative exon, as proposed by Oberstrass and co-workers [16] for PTB; and long alternative exons, as proposed by Blanchette and Chabot [44] for hnRNP A1.

References

1. Wahl, M.C., C.L. Will, and R. Luhrmann, The spliceosome: design principles of a dynamic RNP machine. Cell, 2009. 136(4): p. 701-18.
2. Pan, Q., et al., Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet, 2008. 40(12): p. 1413-5.
3. Sultan, M., et al., A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science, 2008. 321(5891): p. 956-60.
4. Wang, E.T., et al., Alternative isoform regulation in human tissue transcriptomes. Nature, 2008. 456(7221): p. 470-6.
5. Chen, M. and J.L. Manley, Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol, 2009. 10(11): p. 741-54.
6. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
7. Clery, A., M. Blatter, and F.H. Allain, RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol, 2008. 18(3): p. 290-8.
8. Maris, C., C. Dominguez, and F.H. Allain, The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J, 2005. 272(9): p. 2118-31.
9. Hargous, Y., et al., Molecular basis of RNA recognition and TAP binding by the SR proteins SRp20 and 9G8. EMBO J, 2006. 25(21): p. 5126-37.
10. Bourgeois, C.F., F. Lejeune, and J. Stevenin, Broad specificity of SR (serine/arginine) proteins in the regulation of alternative splicing of pre-messenger RNA. Prog Nucleic Acid Res Mol Biol, 2004. 78: p. 37-88.
11. Auweter, S.D., et al., Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J, 2006. 25(1): p. 163-73.
12. Yeo, G.W., et al., An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol, 2009. 16(2): p. 130-7.
13. Zhang, C., et al., Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev, 2008. 22(18): p. 2550-63.
14. Skrisovska, L., et al., The testis-specific human protein RBMY recognizes RNA through a novel mode of interaction. EMBO Rep, 2007. 8(4): p. 372-9.
15. Tacke, R. and J.L. Manley, The human splicing factors ASF/SF2 and SC35 possess distinct, functionally significant RNA binding specificities. EMBO J, 1995. 14(14): p. 3540-51.
16. Oberstrass, F.C., et al., Structure of PTB bound to RNA: specific binding and implications for splicing regulation. Science, 2005. 309(5743): p. 2054-7.
17. Auweter, S.D., F.C. Oberstrass, and F.H. Allain, Solving the structure of PTB in complex with pyrimidine tracts: an NMR study of protein-RNA complexes of weak affinities. J Mol Biol, 2007. 367(1): p. 174-86.
18. Sickmier, E.A., et al., Structural basis for polypyrimidine tract recognition by the essential pre-mRNA splicing factor U2AF65. Mol Cell, 2006. 23(1): p. 49-59.
19. Jenkins, J.L., et al., Solution conformation and thermodynamic characteristics of RNA binding by the splicing factor U2AF65. J Biol Chem, 2008. 283(48): p. 33641-9.
20. Singh, R., J. Valcarcel, and M.R. Green, Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science, 1995. 268(5214): p. 1173-6.
21. Chan, R.C. and D.L. Black, Conserved intron elements repress splicing of a neuron-specific c-src exon in vitro. Mol Cell Biol, 1997. 17(5): p. 2970.
22. Gromak, N., et al., Antagonistic regulation of alpha-actinin alternative splicing by CELF proteins and polypyrimidine tract binding protein. RNA, 2003. 9(4): p. 443-56.
23. Handa, N., et al., Structural basis for recognition of the tra mRNA precursor by the Sex-lethal protein. Nature, 1999. 398(6728): p. 579-85.
24. Valcarcel, J., et al., The protein Sex-lethal antagonizes the splicing factor U2AF to regulate alternative splicing of transformer pre-mRNA. Nature, 1993. 362(6416): p. 171-5.
25. Hui, J. and A. Bindereif, Alternative pre-mRNA splicing in the human system: unexpected role of repetitive sequences as regulatory elements. Biol Chem, 2005. 386(12): p. 1265-71.
26. Voelker, R.B. and J.A. Berglund, A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing. Genome Res, 2007. 17(7): p. 1023-33.
27. Wang, X. and T.M. Tanaka Hall, Structural basis for recognition of AU-rich element RNA by the HuD protein. Nat Struct Biol, 2001. 8(2): p. 141-5.
28. Tsuda, K., et al., Structural basis for the sequence-specific RNA-recognition mechanism of human CUG-BP1 RRM3. Nucleic Acids Res, 2009. 37(15): p. 5151-66.
29. Park-Lee, S., S. Kim, and I.A. Laird-Offringa, Characterization of the interaction between neuronal RNA-binding protein HuD and AU-rich RNA. J Biol Chem, 2003. 278(41): p. 39801-8.
30. Park, S., et al., HuD RNA recognition motifs play distinct roles in the formation of a stable complex with AU-rich RNA. Mol Cell Biol, 2000. 20(13): p. 4765-72.
31. Mori, D., et al., Quantitative analysis of CUG-BP1 binding to RNA repeats. J Biochem, 2008. 143(3): p. 377-83.
32. Han, K., et al., A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol, 2005. 3(5): p. e158.
33. Ding, J., et al., Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev, 1999. 13(9): p. 1102-15.
34. Dominguez, C. and F.H. Allain, NMR structure of the three quasi RNA recognition motifs (qRRMs) of human hnRNP F and interaction studies with Bcl-x G-tract RNA: a novel mode of RNA recognition. Nucleic Acids Res, 2006. 34(13): p. 3634-45.
35. Martinez-Contreras, R., et al., hnRNP proteins and splicing control. Adv Exp Med Biol, 2007. 623: p. 123-47.
36. Kielkopf, C.L., S. Lucke, and M.R. Green, U2AF homology motifs: protein recognition in the RRM world. Genes Dev, 2004. 18(13): p. 1513-26.
37. Corsini, L., et al., U2AF-homology motif interactions are required for alternative splicing regulation by SPF45. Nat Struct Mol Biol, 2007. 14(7): p. 620-9.
38. Rideau, A.P., et al., A peptide motif in Raver1 mediates splicing repression by interaction with the PTB RRM2 domain. Nat Struct Mol Biol, 2006. 13(9): p. 839-48.
39. Selenko, P., et al., Structural basis for the molecular recognition between human splicing factors U2AF65 and SF1/mBBP. Mol Cell, 2003. 11(4): p. 965-76.
40. Schellenberg, M.J., et al., Crystal structure of a core spliceosomal protein interface. Proc Natl Acad Sci U S A, 2006. 103(5): p. 1266-71.
41. Kuwasako, K., et al., Complex assembly mechanism and an RNA-binding mode of the human p14-SF3b155 spliceosomal protein complex identified by NMR solution structure and functional analyses. Proteins, 2008. 71(4): p. 1617-36.
42. Spadaccini, R., et al., Biochemical and NMR analyses of an SF3b155-p14-U2AF-RNA interaction network involved in branch point definition during pre-mRNA splicing. RNA, 2006. 12(3): p. 410-25.
43. Vitali, F., et al., Structure of the two most C-terminal RNA recognition motifs of PTB using segmental isotope labeling. EMBO J, 2006. 25(1): p. 150-62.
44. Blanchette, M. and B. Chabot, Modulation of exon skipping by high-affinity hnRNP A1-binding sites and by intron elements that repress splice site utilization. EMBO J, 1999. 18(7): p. 1939-52.
45. Hall, T.M., Multiple modes of RNA recognition by zinc finger proteins. Curr Opin Struct Biol, 2005. 15(3): p. 367-73.
46. Teplova, M. and D.J. Patel, Structural insights into RNA recognition by the alternative-splicing regulator muscleblind-like MBNL1. Nat Struct Mol Biol, 2008. 15(12): p. 1343-51.
47. Hudson, B.P., et al., Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat Struct Mol Biol, 2004. 11(3): p. 257-64.
48. Cavaloc, Y., et al., The splicing factors 9G8 and SRp20 transactivate splicing through different and specific enhancers. RNA, 1999. 5(3): p. 468-83.
49. Loughlin, F.E., et al., The zinc fingers of the SR-like protein ZRANB2 are single-stranded RNA-binding domains that recognize 5' splice site-like sequences. Proc Natl Acad Sci U S A, 2009. 106(14): p. 5581-6.
50. Grishin, N.V., KH domain: one motif, two folds. Nucleic Acids Res, 2001. 29(3): p. 638-43.
51. Valverde, R., L. Edwards, and L. Regan, Structure and function of KH domains. FEBS J, 2008. 275(11): p. 2712-26.
52. Lunde, B.M., C. Moore, and G. Varani, RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol, 2007. 8(6): p. 479-90.
53. Ule, J., et al., Nova regulates brain-specific splicing to shape the synapse. Nat Genet, 2005. 37(8): p. 844-52.
54. Lewis, H.A., et al., Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell, 2000. 100(3): p. 323-32.
55. Buckanovich, R.J. and R.B. Darnell, The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol Cell Biol, 1997. 17(6): p. 3194-201.
56. Jensen, K.B., et al., Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron, 2000. 25(2): p. 359-71.
57. Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5.
58. Ule, J., et al., An RNA map predicting Nova-dependent splicing regulation. Nature, 2006. 444(7119): p. 580-6.
59. Berglund, J.A., et al., The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell, 1997. 89(5): p. 781-7.
60. Liu, Z., et al., Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science, 2001. 294(5544): p. 1098-102.
61. Beuth, B., et al., Structure of a Mycobacterium tuberculosis NusA-RNA complex. EMBO J, 2005. 24(20): p. 3576-87.
62. Auweter, S.D., F.C. Oberstrass, and F.H. Allain, Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res, 2006. 34(17): p. 4943-59.
63. Tazi, J., N. Bakkour, and S. Stamm, Alternative splicing and disease. Biochim Biophys Acta, 2009. 1792(1): p. 14-26.
64. Koradi, R., M. Billeter, and K. Wuthrich, MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph, 1996. 14(1): p. 51-5, 29-32.