Splicing regulation: a structural biology perspective
Antoine Cléry¹ and Frédéric H.-T. Allain¹,²
¹ Institute for Molecular Biology and Biophysics, ETH Zürich, CH-8093 Zürich, Switzerland
² To whom correspondence should be addressed: [email protected]
Introduction
The spliceosome and its associated proteins form a highly dynamic RNP machine
involving a complex network of RNA-RNA, RNA-protein and protein-protein interactions.
Mass spectrometric analyses of affinity-purified spliceosomal complexes indicate that the
total number of spliceosome-associated factors is approximately 170 [1]. Among the proteins
involved in splicing, one can distinguish the proteins that are part of the spliceosome (the
spliceosomal proteins) from the others, which are referred to as splicing factors. Three
chapters have been dedicated to this nuclear macromolecular machinery in humans, yeast and
plants. Here, we focus on the large number of splicing factors involved in the regulation of
splicing (also referred to as alternative splicing). Recent estimates indicate that 80 to 95% of
human multi-exon pre-mRNAs are alternatively spliced [2-4]. In higher eukaryotes, the high
frequency of alternative-splicing events results from the presence of degenerate 5’ and 3’
splice sites that fail to recruit the spliceosome efficiently. As a result, additional RNA
sequences located in both exons and introns are necessary to stimulate or inhibit splicing.
Most of these cis-acting RNA sequences are bound by splicing factors that either promote or
prevent recruitment of the splicing machinery. The numerous splicing factors identified to
date can be grouped into three main families: the SR proteins (containing serine/arginine-rich
sequences), which mostly facilitate splice-site recognition; the hnRNP proteins, which are
generally considered to have an antagonistic function; and the tissue-specific splicing factors,
which can play both roles (reviewed recently by Chen and Manley [5]). All these
alternative-splicing factors contain different types of RNA binding domains (mostly RRMs,
KH domains and zinc fingers), often in multiple copies (Fig. 1 and Table 1), and all of them
recognize RNA sequence-specifically. In this chapter, we review the current knowledge on
how alternative-splicing factors recognize RNA and proteins at the atomic level. Structural
biology has been essential over the last decade in deciphering this vast network of
protein-RNA and protein-protein interactions. Structures have explained how certain
cis-acting elements can be discriminated by splicing factors, but also how RNA binding
proteins can affect RNA structure. We have organized this chapter by grouping the splicing
factors according to the types of RNA binding domains they contain rather than by protein
family. We therefore review in turn the structures of alternative-splicing factors
containing RRMs (the most common RNA binding domains found in splicing factors),
then those containing zinc fingers and finally those containing KH domains. We describe and
compare the different structures and show how the structures of alternative-splicing factors
in complex with their RNA and protein partners contribute to a better understanding of the
mechanism of action of these proteins in splicing regulation.
1. The RRM: a versatile scaffold for interacting with multiple RNA sequences and also proteins
The RNA-recognition motif (RRM), also known as RBD (RNA binding domain) or
RNP (ribonucleoprotein domain), is the most abundant RNA-binding domain in higher
vertebrates (this motif is present in about 0.5-1% of human genes) [6]. Over the last ten
years, biochemical and structural studies have shown that this domain is involved not only in
RNA/DNA recognition but also in protein-protein interactions. Both modes of interaction
play crucial roles in splicing regulation.
1.1. RRM-RNA interaction and splicing regulation
An RRM is approximately 90 amino acids long, with a typical β1α1β2β3α2β4 topology
that forms a four-stranded β-sheet packed against two α-helices (Fig. 2A). RRMs are found in
almost all families of splicing factors, in single or multiple copies (Fig. 1). Although the
β-sheet is most commonly used to bind single-stranded RNAs, evolution has selected a
remarkable structural diversity of RRM-nucleic acid recognition modes, making RRMs a very
versatile RNA binding platform [7, 8].
Most commonly, three aromatic side-chains belonging to the two signature sequences
RNP1 and RNP2 and exposed on the β-sheet surface (Fig. 2A and 2B) accommodate two
nucleotides as follows: the bases of the 5’ and 3’ nucleotides stack on aromatic rings
located in β1 (position 2 of RNP2) and in β3 (position 5 of RNP1), respectively (Fig. 2A).
The third aromatic ring, usually located in β3 (position 3 of RNP1), is often inserted
between the two sugar rings of the dinucleotide (Fig. 2A). However, deviations from this
basic mode of binding are found in many RRM-RNA complexes, owing to the N- and
C-terminal extensions of the domain, to the interdomain linker in proteins containing
multiple RRMs, or to additional protein cofactors that can also modulate RNA-binding
specificity [8]. Structures of several alternative-splicing factors containing one or multiple
RRMs have been solved in complex with RNA over the years (Table 1), namely PTB
(polypyrimidine-tract binding protein, 4 RRMs), HuD (2 of 3 RRMs), Sex-lethal (2 RRMs),
hnRNP A1 (2 RRMs), U2AF65 (2 RRMs), Fox-1 (1 RRM), RBMY (1 RRM), SRp20 (1 RRM)
and, more recently, RRM3 of CUG-BP.
1.1.1 RNA binding by splicing factors containing a single RRM
Splicing factors containing a single RRM are few compared with those containing
multiple RRMs. Among SR and SR-like proteins, only SRp20, 9G8, SC35, SRp46, SRp54,
SRrp86, RNPS1, Tra2α and Tra2β have a single RRM, as do hnRNP C1/C2 and G among
hnRNP proteins, and Fox-1 and Fox-2 among the tissue-specific splicing factors (Fig. 1). With
a single RRM, one might expect these proteins to bind RNA with lower affinity and
sequence specificity than multi-RRM proteins; as we will see, this holds for some (SRp20)
but not for others (Fox-1). Among these factors and their close homologues, the structures of
three single RRMs in complex with RNA have been determined: those of SRp20, Fox-1 and
RBMY (a testis-specific protein whose RRM shares more than 80% identity with that of
hnRNP G).
The NMR structure of the RRM of the human SR protein SRp20 in complex with the
5’-CAUC-3’ RNA sequence remains the first and only structure to date of an SR protein in
complex with RNA [9]. The structure reveals an additional aromatic residue (a tryptophan) on
the β-sheet surface (on strand β2) that is responsible for binding the two 3’-most nucleotides
(Fig. 2C). Although four nucleotides are bound, the affinity is rather weak (20 µM) because
of the unusual, only partially sequence-specific mode of RNA recognition by this RRM.
Indeed, the structure reveals a binding consensus sequence CNNC (where N can be any
nucleotide), which is compatible with the consensus established for this protein by in vitro
and in vivo SELEX experiments [10]. This degenerate specificity allows SRp20 to bind more
diverse RNA sequences and thereby weakens the evolutionary pressure on the bound RNA,
which suits the exonic sequences that contain natural SRp20 targets [10]. The weak affinity
also allows SRp20 to associate with and dissociate from RNA frequently, which is important
given the highly dynamic processes in which this protein participates, from transcription to
mRNA export.
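To put the degeneracy of the CNNC consensus in numbers, the short Python sketch below counts how many of the 256 possible RNA tetramers satisfy the strict sequence CAUC versus the degenerate consensus CNNC. It is purely illustrative: it assumes that every tetramer matching the consensus counts as an acceptable site and ignores affinity differences between them; the helper names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes needed here (N = any nucleotide, Y = pyrimidine, R = purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "Y": "CU", "R": "AG", "N": "ACGU"}


def matches(motif: str, seq: str) -> bool:
    """True if seq has the motif's length and every base is allowed at its position."""
    return len(seq) == len(motif) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, seq)
    )


def count_compatible(motif: str) -> int:
    """Count the RNA words of the motif's length that satisfy the consensus."""
    return sum(
        matches(motif, "".join(word)) for word in product("ACGU", repeat=len(motif))
    )


if __name__ == "__main__":
    total = 4 ** 4
    for motif in ("CAUC", "CNNC"):
        n = count_compatible(motif)
        print(f"{motif}: {n}/{total} tetramers ({n / total:.2%})")
```

Running it reports 1/256 tetramers (0.39%) for CAUC and 16/256 (6.25%) for CNNC, i.e. a 16-fold larger pool of sequences compatible with SRp20 binding.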
The structure of the RRM of human Fox-1 (a tissue-specific alternative-splicing factor)
in complex with the 5’-UGCAUGU-3’ RNA reveals a radically different mode of binding
compared with SRp20 [11]. Although both proteins contain a single RRM, the affinity of the
Fox-1 RRM for 5’-UGCAUGU-3’ is extremely high (Kd in the subnanomolar range),
reflecting a very high sequence specificity for the central pentamer GCAUG. To
accommodate seven nucleotides on a single domain, the Fox-1 RRM uses, in addition to the
β-sheet surface, several loops joining secondary structure elements (Fig. 2D). In particular, a
phenylalanine in the β1/α1 loop of the Fox-1 RRM is critical for RNA binding, as the first
three nucleotides are wrapped around it (Fig. 2D) [11]. Although the mechanism of action of
Fox-1 and Fox-2 in splicing regulation is not known, the clear sequence specificity of the
protein allowed a reliable mapping of its binding sites and the identification of strong
correlations between the location of Fox-1 binding sites relative to splice sites and their effect
on splicing regulation [12, 13]. Given the very high affinity of its RRM, Fox-1 would be
expected, unlike SRp20, to remain bound to the RNA once it has found its target.
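The contrast between SRp20 and Fox-1 can be made concrete with the simple 1:1 binding isotherm, fraction bound = [P]/([P] + Kd), which holds when free protein is in large excess over RNA. The sketch below uses the 20 µM Kd quoted for SRp20; for Fox-1 it assumes a hypothetical Kd of 0.5 nM as a stand-in for "subnanomolar", together with a hypothetical free protein concentration of 100 nM. These numbers are illustrative, not measured cellular values.

```python
def fraction_bound(protein_conc_m: float, kd_m: float) -> float:
    """Fraction of RNA sites occupied at equilibrium for simple 1:1 binding.

    Assumes free protein is in large excess over RNA, so [P]free ~ [P]total.
    """
    return protein_conc_m / (protein_conc_m + kd_m)


KD_SRP20 = 20e-6   # ~20 µM, the value quoted for the SRp20 RRM binding CAUC
KD_FOX1 = 0.5e-9   # hypothetical stand-in for the "subnanomolar" Fox-1 Kd
PROTEIN = 100e-9   # hypothetical free protein concentration (100 nM)

if __name__ == "__main__":
    for name, kd in (("SRp20 RRM", KD_SRP20), ("Fox-1 RRM", KD_FOX1)):
        theta = fraction_bound(PROTEIN, kd)
        print(f"{name}: Kd = {kd:.1e} M -> fraction bound = {theta:.3f}")
```

Under these assumptions roughly 0.5% of SRp20 sites would be occupied at any instant versus more than 99% for Fox-1, and the concentration giving half-saturation ([P] = Kd) differs by more than four orders of magnitude between the two proteins.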
The structure of the single RRM of the human testis-specific RBMY in complex with
RNA revealed features shared with both SRp20 and Fox-1 [14]. Given the high sequence
identity between the RRMs of human RBMY and hnRNP G, the structure suggests that
hnRNP G can bind CAA motifs sequence-specifically on the β-sheet surface of its RRM
(Fig. 2E). However, because hnRNP G and RBMY have different β2/β3 loops (in both length
and sequence), only RBMY can bind a stem-loop containing a CAA motif in its loop, by
inserting its β2/β3 loop into the major groove of the RNA stem (Fig. 2E). Although only
putative targets have been identified for RBMY [14], it is interesting to note that the two
tissue-specific splicing factors described here (RBMY and Fox-1) both bind RNA with high
affinity and specificity using a single RRM.
1.1.2 RNA binding by splicing factors containing multiple RRMs
Most splicing factors contain multiple RRM copies (Table 1). Structures of the two
RRMs of Sex-lethal, U2AF65 and hnRNP A1, of RRM3 of CUG-BP, of RRM1 and RRM2 of
HuD and of the four RRMs of PTB have been determined in complex with RNA (Table 1).
From these few structures, it appears that RRMs within the same protein chain generally bind
very similar sequences, although not in an identical manner. This could explain the highly
repetitive sequences observed in cis-acting elements that regulate splicing [5]. There are of
course exceptions to this rule, for example five SR proteins (ASF/SF2, SRp30c, SRp40,
SRp55 and SRp75) that each contain two very different RRMs (a canonical RRM and a
pseudo-RRM), each with a different RNA binding specificity [10, 15].
Recognizing pyrimidine-tracts by Sex-lethal, U2AF65 and PTB
The 3’ splice-site pyrimidine-tract is a major cis-acting element for both constitutive
and alternative splicing. Several trans-acting factors have been shown to bind the
pyrimidine-tract, resulting in activation (U2AF65) or repression (Sex-lethal and PTB) of
splicing. The structures of the RRMs of these three proteins bound to pyrimidine-tracts
revealed the nature of the RRM-RNA interaction and the molecular basis of the sequence
specificity of each protein.
The structures of all four RRMs of PTB were solved in complex with short
5’-CUCUCU-3’ pyrimidine-tracts [16]. Each RRM of PTB binds a short pyrimidine-tract:
RRM1 and RRM4 bind three pyrimidines, RRM2 binds four and RRM3 binds five. RRM2 and
RRM3 of PTB contain an additional fifth β-strand that extends the β-sheet and therefore
allows the binding of additional nucleotides (Fig. 3A) [16]. The structures revealed a similar,
although not identical, sequence specificity for the four RRMs, as RRM1, 2, 3 and 4
specifically recognize YCU, CUNN, YCUNN and YCN sequences, respectively (Y is a
pyrimidine and N any nucleotide). The dissociation constant (Kd) of each RRM for a
CUCUCU sequence is around 1 µM but increases substantially for poly(U) sequences,
confirming the sequence-specific preference for pyrimidine-tracts containing cytosines [17].
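The four PTB consensus motifs can be compared directly by scanning an RNA for them. The Python sketch below converts IUPAC-style consensus strings (Y = C or U, N = any nucleotide) into regular expressions and reports every overlapping match in CUCUCU and in a poly(U) control. It only tests sequence compatibility with each consensus and says nothing about affinity; the function and dictionary names are hypothetical.

```python
import re

# IUPAC nucleotide codes used by the PTB consensus motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}

PTB_RRM_MOTIFS = {"RRM1": "YCU", "RRM2": "CUNN", "RRM3": "YCUNN", "RRM4": "YCN"}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus (e.g. 'YCUNN') into a regex fragment."""
    return "".join(IUPAC[symbol] for symbol in motif)


def find_sites(motif: str, rna: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets consecutive matches overlap.
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(rna)]


if __name__ == "__main__":
    for rna in ("CUCUCU", "UUUUUU"):
        for rrm, motif in PTB_RRM_MOTIFS.items():
            print(f"{rna} {rrm} ({motif}): {find_sites(motif, rna)}")
```

On CUCUCU every motif finds at least one site (YCU and YCN at positions 1 and 3, CUNN at 0 and 2, YCUNN at 1, counting from 0), whereas none matches UUUUUU, in line with the cytosine requirement of PTB RRMs.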
The structure of the pre-mRNA splicing factor U2AF65 in complex with a U-tract
revealed a different mode of pyrimidine-tract recognition, although still using the β-sheet
surface (Fig. 3B) [18]. This interaction is governed by hydrogen bonds involving flexible
side-chains of conserved U2AF65 residues and by water molecules mediating interactions
between U2AF65 side-chains and the uracil bases. The use of flexible side-chains and the
possible relocation of bound water molecules could explain how U2AF65 accommodates
cytosines at certain positions, as found in most 3’ splice-site pyrimidine-tracts. Like the PTB
RRMs, the two RRMs of U2AF65 bind RNA independently, which explains the similarly
weak affinity (Kd in the µM range) observed for this splicing factor [19].
These structural data help explain how PTB and U2AF65 compete for binding to the
3’ splice-site pyrimidine-tract [20]. U2AF65 preferentially binds uracil tracts but, owing to its
versatile mode of RNA binding, can adapt to bind any pyrimidine-tract, whereas PTB
preferentially binds pyrimidine-rich sequences containing CU-tracts. This explains why
alternative exons that are repressed by PTB and contain CU-tracts at the 3’ splice site can be
converted into constitutive, and therefore de-repressed, exons by several C-to-U changes
[21, 22].
Binding of U-tracts by the two RRMs (RRM12) of Sex-lethal differs markedly from
that of the other two proteins. In the structure of the complex [23], Sex-lethal RRM12
sequence-specifically recognizes each nucleotide of 5’-UGUUUUUUU-3’ except U5, with
RRM2 recognizing the 5’ UGU and RRM1 the 3’ UUUUUU sequences. Inter-RRM
interactions upon RNA binding and contacts from the short interdomain linker to the RNA
contribute to the high overall affinity (Kd …) of Sex-lethal for the RNA. Comparison of the
two structures explains well how Sex-lethal can prevent U2AF65 from binding to the
U-tract, as observed in the Drosophila tra pre-mRNA [24]. Not only do the Sex-lethal RRMs
discriminate uracils from cytosines better than U2AF65, but the two RRMs of Sex-lethal can
also bind U-tracts cooperatively, whereas the two RRMs of U2AF65 cannot.
Although PTB, U2AF65 and Sex-lethal bind pyrimidine-tracts using similar RNA
recognition motifs and the same interaction surface (the β-sheet), subtle variations in the
side-chain composition of the β-sheet surface have allowed the RRMs of each protein to
recognize UCU, YYY and UUU sequences, respectively. Additionally, the RRMs of
Sex-lethal evolved to bind pyrimidine-tracts cooperatively, whereas the RRMs of PTB and
U2AF65 appear to bind RNA independently.
Recognizing purine-pyrimidine tracts by CUG-BP and HuD
Several purine-pyrimidine tracts, for example CA-tracts, UG-tracts or CUG-tracts,
have been identified as cis-acting elements regulating alternative splicing [25]. AU-rich
elements were initially characterized for their importance in RNA stability and more
recently in alternative-splicing regulation [26]. Several RRM-containing proteins have been
identified as trans-acting factors binding these purine-pyrimidine tracts, for example hnRNP L
binding CA-tracts, RBM35 and CELF proteins such as CUG-BP binding UG-tracts and
CUG-tracts [25], or ELAV proteins such as HuD binding AU-tracts. The structures of HuD
RRM1 and RRM2 bound to AU-rich RNA [27] and, more recently, of CUG-BP RRM3 in
complex with RNA [28] have been determined and show how such RNA tracts are
recognized by RRMs.
HuD and CUG-BP share a similar domain organization: both proteins contain three
RRMs, with the two N-terminal RRMs (RRM1 and RRM2) separated by a short interdomain
linker (11 and 9 amino acids, respectively), while the C-terminal RRM3 lies much further
away from RRM2 (89 and 113 amino acids, respectively). The two solved structures therefore
provide indications of the RNA binding mode of the numerous splicing factors containing
three RRMs (Fig. 1). Considering the high sequence similarity between RRM12 of HuD and
RRM12 of Sex-lethal, it is perhaps not surprising that the structures of HuD bound to
UAUUUAUUU [27] (Fig. 3C) and of Sex-lethal bound to UGUUUUUUU [23] adopt very
similar conformations. While most of the contacts with the pyrimidines are sequence-specific
(Fig. 3C), the protein contacts to the adenines in HuD do not appear to be A-specific,
similarly to the contacts to guanines in Sex-lethal [23]. It is therefore unclear how the purines
are discriminated by these two RNA binding proteins. In the case of HuD, it was even
suggested that adenines destabilize HuD binding [29]. As for Sex-lethal, RRM1 and RRM2 of
HuD very likely bind RNA cooperatively, increasing RNA binding affinity and specificity [30].
The structure of RRM3 of CUG-BP1 was recently determined in complex with the
hexamer UGUGUG [28]. This NMR structure revealed sequence-specific recognition of the
central UGU motif, although all six nucleotides are bound by RRM3 (Fig. 3D). The 12 amino
acids immediately N-terminal to the RRM interact strongly with the surface of the free RRM
by running across the β-sheet. This N-terminal extension also contributes to RNA binding by
interacting with G4 and U5 (Fig. 3D), which partly explains how six nucleotides can be bound
by this isolated RRM, although the binding affinity remains modest (Kd = 1.9 µM).
The binding affinity and the mode of sequence-specific binding of the two N-terminal
RRMs of HuD and of the C-terminal RRM of CUG-BP are quite different. This possibly
reflects the different roles played by these two regions in both proteins [30, 31], although
this needs to be confirmed by structures of the three-RRM proteins bound to RNA. It also
remains to be seen whether, in this context, the three RRMs share the same RNA binding
specificity.
Recognizing polypurine-tracts by hnRNP A1 and hnRNP F
Polypurine-tracts are very frequently found as high-affinity binding sites for many
alternative-splicing factors, including most SR proteins and SR-related proteins [10], but also
the splicing repressor hnRNP A1 (binding sequence 5’-UAGGG-3’) and members of the
hnRNP H/F family, which bind G-tract-containing RNAs [32]. Among these proteins, the
structure of hnRNP A1 in complex with DNA telomeric repeats (5’-TTAGGG-3’) has been
determined [33], and the structure of the free (apo) form of the three hnRNP F RRMs has
been solved and complemented by NMR binding studies [34].
In the hnRNP A1-DNA complex, sequence-specific interactions with 5’-TAGG-3’
sequences were observed on the β-sheet surface of both RRMs, with an almost identical
recognition of TAG in both domains. The structure strongly argues that UAGG RNA
sequences would be recognized in an identical manner [33]. This sequence is reminiscent of
the 3’ splice-site consensus sequence and is found in many cis-acting elements bound by
hnRNP A1 that regulate alternative splicing [5, 32, 35]. The crystal structure of hnRNP A1
bound to two telomeric DNA repeats also revealed an unusual arrangement of the DNA and
the protein: the complex is a dimer, with the 5’ TAGG of each DNA molecule contacting
RRM1 of one subunit and the 3’ TAGG contacting RRM2 of the second subunit. Although
this arrangement might be functionally important for telomeric repeats, it remains to be seen
whether it is relevant for splicing regulation.
A recent NMR investigation of the three RRMs of hnRNP F [34] in complex with the
5’-CGGGAU-3’ RNA sequence revealed a non-canonical binding surface formed by three
loops of each RRM (the β1/α1, β2/β3 and α2/β4 loops, all located on the “south” side of the
β-sheet) instead of the β-sheet surface [34]. These RRMs are not canonical, since they lack
the conserved aromatic residues in RNP1 and RNP2; this is why these domains were
historically named qRRMs (quasi-RRMs). The structures of these unusual RRMs might
reveal why such an unusual mode of binding evolved.
1.2. RRM-RRM and RRM-protein interactions in splicing regulation
It has become apparent over the past several years that the RRM is not only an RNA
binding platform but also (sometimes exclusively) a very good protein-protein interaction
domain. Peptide-RRM as well as RRM-RRM interactions have been discovered and their
structures determined. These protein-RRM interactions can also play a significant role in
splicing regulation.
1.2.1 RRM-protein interactions without RNA binding
The UHM (U2AF homology motif) family, a non-canonical RRM family, has been
defined for RRMs sharing sequence and structural characteristics with U2AF [36]. This
family is characterized by i) the absence of aromatic residues in the RNP2 sequence, ii) an
extended, highly acidic α1-helix, and iii) a conserved Arg-X-Phe motif (X is any amino
acid) in the α2/β4 loop. UHM-ULM (UHM-ligand motif) interactions play an important role
in the assembly of splicing factors at the 3’ splice site. For example, UHM-ULM contacts
mediate the interaction of U2AF65 with SF1, U2AF35 or SF3b155.
Interestingly, RNA binding by these UHM RRMs seems to be compromised by the presence
of an additional C-terminal helix packed against the β-sheet. The Arg-X-Phe motif and the
negatively charged, extended α1-helix form the surface for protein-protein interactions. A
recent structure of the SPF45 UHM in complex with the SF3b155 ULM has shed light on a
role of the UHM-ULM interaction in alternative splicing (Fig. 3E) [37]. This interaction was
found to be critical for the splicing regulation of the apoptosis regulatory gene FAS. Based on
the structure, the authors showed that substitutions in the conserved UHM motif
Arg375-X-Phe377 (in 3’) or mutation of Glu329 (in the α1-helix) or Asp319 (in the β1/α1
loop) differently affect the affinity of the SPF45 UHM for three natural ULM targets of
SPF45 (SF3b155, SF1 and U2AF65). These data strongly suggest that SPF45 can repress
splicing by interacting with the ULMs present in these three splicing factors. RRM-containing
proteins can therefore repress splicing by very different mechanisms: for some, like
Sex-lethal, PTB or hnRNP A1, repression involves competition with splicing factors for RNA
binding, whereas for others, like SPF45, it involves direct interactions with splicing factors
that prevent their assembly.
1.2.2 RRM-protein interactions allowing RNA binding
Another example of RRM-protein interactions regulating splicing is the binding of
PTB RRM2 by its co-repressor Raver1 (Fig. 3F) [38]. The Raver1 peptide interacts with the
shallow groove formed by the α1-helix and the α2/β4 loop of PTB RRM2, similarly to the
binding of a ULM to a UHM (Fig. 3E) [39]. However, the tryptophan side-chain typical of a
ULM is replaced by conserved leucine residues at positions 500 and 501 of the Raver1 motif
(499-SLLGEPP-505). Although similar to the UHM-ULM interaction, the PTB-Raver1
interaction is functionally different, as it is compatible with simultaneous RNA binding [38].
Because Raver1 contains four PTB RRM2-binding motifs, it has been suggested that Raver1
could act as a co-repressor by serving as a recruitment platform for multiple PTB
molecules [38].
The interaction of RRMs with proteins can also limit the specificity of RRM-RNA
recognition. Good illustrations are the crystal [40] and solution [41] structures of the complex
between the p14 protein, a human component of the spliceosomal U2 and U11/U12 snRNPs,
and a peptide derived from SF3b155. The p14 β-sheet is occluded by its own helix and by
SF3b155, so that only one pocket, containing a conserved RNP2 residue (Tyr22), remains
accessible to the solvent [40]. Biochemical and NMR studies suggest that this residue [40, 42],
and possibly also Tyr28 (β1/α1 loop) and Arg85 (2/4 β-hairpin), are involved in branch-point
recognition [41], but with weak specificity and affinity, allowing possible regulation of this
interaction by competitors.
1.2.3. Impact of RRM-RRM interaction on splicing mechanism
RRMs can also use their α-helices to interact with each other, keeping the β-sheet
completely free for RNA interactions. This can occur intramolecularly, as in PTB, or
intermolecularly, as in hnRNP A1. The structures of PTB RRM3 and RRM4, free [43] and
bound to RNA [16], revealed that the two RRMs are tightly associated in both forms through
their helices (α1 and α2 of RRM3 and α2 of RRM4) and the interdomain linker, forming a
large hydrophobic interface involving 27 protein side-chains [43]. This tight interaction
between the two RRMs results in an antiparallel orientation of their bound RNAs, implying
that RRM3 and RRM4 could induce the formation of RNA loops. These structural data
suggest that PTB might repress splicing by looping out alternative exons, a branch point or
any other cis-acting element [16].
As described above, hnRNP A1 RRM1 and RRM2 were found in the crystal structure to
dimerize upon telomeric DNA binding via intermolecular RRM-RRM interactions. This
suggests a potential mechanism by which hnRNP A1 might loop out alternative exons or help
in the splicing of very large introns, as proposed by Blanchette and Chabot [44].
2. The zinc finger domain
The classical zinc finger domain is approximately 30 amino acids long and displays a
ββα protein fold in which a β-hairpin and an α-helix are pinned together by a Zn2+ ion. These
domains are classified according to the amino acids that coordinate the Zn2+ ion. Although
zinc fingers are mostly known for interacting with DNA, a few structures of these domains in
complex with RNA have also been solved [45]. Like RRMs, zinc fingers have been reported
to interact specifically with RNA through hydrogen bonds and aromatic-base stacking.
However, the amino acids involved in RNA interaction are located mainly in the protein
loops rather than in the β-strands as for RRMs (Fig. 4A and 4B).
It has recently been shown by crystallography that the tandem CCCH zinc fingers 3
and 4 of muscleblind-like 1 (MBNL1) specifically interact with the 5’-GC-3’ sequence
through intermolecular stacking and hydrogen-bonding interactions [46]. In zinc finger 3, the
Arg195 side chain stacks over the G base, and the cytosine is sandwiched between the Phe202
ring, which is inserted between the two nucleotides, and the Arg186 side chain (Fig. 4A).
Sequence-specific recognition is mediated by four hydrogen bonds involving main-chain
amide and carbonyl groups and three hydrogen bonds involving the side chains of Glu183 and
of the two zinc-coordinating cysteines (Cys185 and Cys200) (Fig. 4A). This mode of RNA
interaction is reminiscent of how Tis11d (another CCCH zinc finger protein, involved in
mRNA stability) binds a 5’-AU-3’ sequence, although the dinucleotide sequence is different
(PDB code: 1RGO) [47]. As for proteins containing several RRMs, the mode of RNA
recognition is very similar for MBNL1 zinc fingers 3 and 4, suggesting a duplication of this
motif during evolution. Interestingly, the antiparallel orientation of the RNA molecules bound
by the two zinc fingers and the location of MBNL1 binding sites on natural targets suggest
that the protein could induce RNA looping that blocks recognition of the 3’ splice site by U2
snRNP, resulting in exon skipping [46].
The human SR protein 9G8 contains one CCHC zinc finger located between an RRM
and an RS domain (Fig. 1), and it recognizes different RNA sequences in vitro depending on
whether the zinc finger is intact or two of its zinc-coordinating cysteines are substituted by
glycines [48]. Indeed, in vitro SELEX experiments with the wild-type protein selected
5’-GAC-3’ repeat RNA sequences, whereas the 5’-(A/U)C(A/U)(A/U)C-3’ motif was
selected with the 9G8 mutant [48]. These results highlight the involvement of the zinc finger
in specific RNA recognition by 9G8 [48]. Another RS-domain-containing protein, ZRANB2,
contains two RanBP2-type (“CCCC”) zinc finger domains in place of RRMs. A crystal
structure of these motifs in complex with the 5’-AGGUAA-3’ RNA sequence was recently
determined [49]. Each domain is composed of two short β-hairpins sandwiching a zinc ion
that is coordinated by four conserved cysteines (Fig. 4B). A structural particularity of this
RNA-protein complex is the guanine-Trp79-guanine “ladder” formed by the continuous
stacking of these three residues. The G2, G3 and U4 bases are specifically recognized
through hydrogen bonds involving protein side chains (N76, R81, R82 and N86), backbone
groups (V77 carbonyl and W79 amide) and water-mediated hydrogen bonds (D68 and A80).
These amino acids are mainly located in the ZRANB2 loops, especially the one at the
C-terminal end of the first β-hairpin (Fig. 4B). Based on functional data and the strong
similarity between the ZRANB2 binding site and 5’ splice-site sequences, the authors suggest
that this protein might interact with a subset of 5’ splice sites, preventing their recognition by
the spliceosome [49]. Here again, in addition to explaining the molecular basis of specific
RNA recognition by these proteins at the atomic level, structural data suggest possible modes
of action for these splicing factors.
Since MBNL1 and ZRANB2 both bind 5’-GY-3’-containing sequences, it is
interesting to compare their modes of RNA recognition. Both complexes show similarities,
such as the stacking of one aromatic ring (F202 and W79) on RNA bases (C3 and G2/G3 in
the MBNL1-RNA and ZRANB2-RNA complexes, respectively). However, there are also
clear differences in how the RNA bases are recognized. C3 is mainly recognized by MBNL1
main-chain groups, whereas U4 interacts exclusively with ZRANB2 side chains (Fig. 4A and
4B). In addition, the G2 and C3 bases are perpendicular in the MBNL1-RNA complex,
whereas the corresponding bases are parallel to each other in the ZRANB2 complex. Finally,
only in the MBNL1-RNA complex are two zinc-coordinating cysteines also found contacting
the RNA. These structural data clearly illustrate how closely related RNA sequences (GY)
can be recognized very differently by zinc finger domains. They also highlight the difficulty
of predicting RNA recognition by protein domains, and the need to solve additional
structures of RNA-protein complexes in order to characterize and better understand such
interactions.
3. The KH domain
The hnRNP K homology (KH) domain is approximately 70 amino acids long. The KH
motif is found in archaea, bacteria and eukaryotes and is known to interact with RNA or
ssDNA targets with low micromolar affinity [50, 51]. A protein can contain several copies of
this domain, acting either independently or cooperatively; in the latter case, cooperation
increases nucleic acid affinity and specificity [52]. Only a few structures of KH domains
bound to nucleic acids have been deposited in the Protein Data Bank, and most of them
concern the eukaryotic type I KH domain. This motif has a βααββα topology and is
characterized by a three-stranded β-sheet packed against three α-helices [50, 51]. The β1- and
β2-strands are parallel to each other and the β3-strand is antiparallel to both. In addition, a
“GXXG loop” containing the conserved (I/L/V)-I-G-X-X-G-X-X-(I/L/V) motif, located
between the α1 and α2 helices, and a β2-β3 loop of variable length and sequence are also
found in this motif (Fig. 4C and D). The type II KH fold is typically found in prokaryotic
proteins. It differs from type I by an αββααβ topology and a characteristic β-sheet in which
the central strand is parallel to β3 and antiparallel to β1 [50, 51].
13
KH domains have been shown to interact with their nucleic acid targets through
common features. The single-stranded RNA or DNA molecule is mostly bound by an
extended RNA-binding surface including the α1 and α2 helices, the GXXG motif, the β2-strand and the variable loop [51]. Together, these elements form a binding cleft that usually
accommodates four bases (Fig. 4C and 4D). Interestingly, KH motifs use a mode of
RNA recognition different from that of RRMs: instead of interacting via the β-sheet surface, they
use an α/β platform. In addition, the KH RNA-binding surface is very hydrophobic and,
unlike in the canonical RNA-binding modes of RRMs and zinc finger domains, aromatic
residues are not involved in these interactions. This feature could partly explain the low
affinity of KH domains for single-stranded nucleic acids.
Nova2 (Neuro-oncological ventral antigen 2) is a tissue-specific alternative splicing
factor containing three KH domains (Fig. 1 and Table 1). This protein is highly expressed in
the neocortex and hippocampus and regulates the alternative splicing of transcripts coding for
proteins with specific functions in the brain [53]. The crystal structure of the Nova2 KH3
domain in complex with an in vitro-selected stem-loop RNA shows that this protein interacts
with the single-stranded 5’-UCAC-3’ sequence located in the loop (Fig. 4C) [54]. U12 is
specifically but indirectly recognized through two water molecules that form bridges with Lys23
of the GKGG loop and Arg75 of the α3 helix (Fig. 4C). C13 and C15 directly
interact with protein side chains from the β2 and β3 strands, whereas A14 is the only base
hydrogen-bonded to the protein main chain, via the amide and carbonyl groups of I41 (Fig. 4C). This structure
revealed that the Nova2 KH3 domain interacts specifically with the 5’-UCAY-3’ RNA
sequence (Y = C or U). Consistent with this result, the 5’-UCAU-3’ sequence located upstream of
the alternatively spliced exon 3A of the glycine receptor α2 pre-mRNA could be predicted to
be a Nova binding site [55, 56]. These structural data have been crucial for the in vivo
identification of several new Nova binding sites and for a better understanding of splicing
regulation by this protein [53, 57, 58].
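Because the recognition rule inferred from the Nova2 KH3 structure is a simple tetranucleotide, it can be applied computationally to list candidate Nova sites. The sketch below is hypothetical (the helper `find_ucay_sites` is not taken from the cited studies) and assumes that any exact 5’-UCAY-3’ match counts as a candidate; the published site predictions combine motif matches with further experimental data that this sketch ignores.

```python
import re

# Y is the IUPAC code for a pyrimidine (C or U in RNA).
# The lookahead makes the pattern zero-width, so overlapping
# matches such as UCAUCAC (UCAU at 0, UCAC at 3) are all reported.
UCAY = re.compile(r"(?=(UCA[CU]))")


def find_ucay_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every exact UCAY match.

    DNA input is accepted: the sequence is upper-cased and T is
    converted to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in UCAY.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical fragment, not a real glycine receptor sequence.
    print(find_ucay_sites("ggtcatcctcacaa"))
    # -> [(2, 'UCAU'), (8, 'UCAC')]
```

An exact-match scan is deliberately minimal: it ignores structural context (the crystallized site lies in a stem-loop) and any clustering of sites, both of which could matter in vivo.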
Another KH-containing protein involved in splicing is SF1/mBBP, which specifically
binds the 5’-UACUAAC-3’ intron branchpoint sequence (BPS) in human pre-mRNA
transcripts [59] using a binding surface composed of a KH domain and a C-terminal helix
known as the QUA2 domain (Quaking homology 2) [60]. This extended KH surface, with a
βααββαα topology, allows the binding of seven nucleotides instead of the four nucleotides
usually bound by a single KH domain. The 3’-end of the BPS (5’-UAAC-3’), which contains
the conserved branch point adenosine (the second A of this tetranucleotide), is specifically recognized by the KH
domain, whereas the 5’-end (5’-ACU-3’) is bound by conserved residues from the QUA2
domain. Amino acids from the α1 and α2 helices, the β2 strand, the GXXG motif and the
variable loop of the KH domain bind the RNA mainly through a combination of
hydrophobic interactions, hydrogen bonds and electrostatic contacts (Fig. 4D) [60].
Interestingly, consistent with the conservation of the branch point adenosine, the
NMR structure shows that this base is specifically recognized by hydrogen bonds involving
the main chain of I177 [60], similar to the contact between A14 and I41 in Nova2 KH3 (Fig. 4C
and 4D).
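Since the branch point adenosine sits at a fixed position within the UACUAAC consensus (U-A-C-U-A-[A]-C), locating a BPS match directly gives the coordinate of the candidate branch nucleotide. The following is a minimal, hypothetical sketch (`branch_adenosines` is an illustrative name) that assumes an exact match to the seven-nucleotide consensus; natural branch sites in human introns are more degenerate, so such a scan would miss many of them.

```python
BPS = "UACUAAC"
# 0-based index of the branch point adenosine within U-A-C-U-A-[A]-C
BRANCH_OFFSET = 5


def branch_adenosines(seq: str) -> list[int]:
    """Return the 0-based position of the candidate branch A for each exact BPS match.

    DNA input is accepted: the sequence is upper-cased and T is converted to U.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    start = rna.find(BPS)
    while start != -1:
        hits.append(start + BRANCH_OFFSET)
        # continue searching from the next nucleotide
        start = rna.find(BPS, start + 1)
    return hits


if __name__ == "__main__":
    intron = "aatactaacttttcag"  # hypothetical intron fragment
    print(branch_adenosines(intron))  # -> [7]
```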
The structures of the Nova2 KH3 and SF1 KH domains have been solved in complex with
the similar 5’-UCAC-3’ and 5’-UAAC-3’ RNA sequences, respectively [54, 60]. These data
show that, like RRMs and zinc finger domains, KH domains are able to specifically
recognize RNA sequences. Interestingly, these two proteins use a similar mode of RNA
recognition (Fig. 4C and 4D). In addition to the similar contacts to A14 (Nova2) and A8
(SF1), C13 (Nova2) and A7 (SF1) are both hydrogen-bonded to an aspartate located
in the β1-α1 loop. Finally, A8 and A14 are stacked on C9 and C15 in the SF1 KH- and Nova2
KH3-RNA complexes, respectively. Interestingly, these features are also observed in the type
II tandem KH domains of NusA (PDB code: 2ATW) [61], suggesting that only a rather small range of
sequences could be targeted specifically by KH domain-containing proteins. This
could partially explain the small number of splicing factors containing KH domains
compared with splicing factors containing RRMs (Fig. 1).
Conclusion and perspectives
In this chapter, we have described the current knowledge on how splicing factors
interact with RNA and proteins at the atomic level and participate in splicing regulation.
Although few structures of splicing factors bound to RNA or proteins have yet been
determined compared with the vast number of proteins involved in splicing regulation, a few
conclusions or hypotheses can nevertheless be drawn from these structures. It is clear from
Figure 1 that the vast majority of splicing factors contain RRMs, which are used for RNA
binding and, in some cases, also for protein-protein interactions. The main lesson we have
learned over the years about the RRM [7, 8, 62] is the extreme versatility and plasticity of this
small protein domain. We showed here that RRM-containing proteins can specifically bind a
large variety of sequences, as shown by the structures of RRMs bound to pyrimidine tracts
(Sex-lethal, U2AF65 and PTB), purine-pyrimidine tracts (CUG-BP and HuD) and purine
tracts (hnRNP A1, hnRNP F and SR proteins) (Table 1). RRMs also bind RNA with a wide
range of affinities, as illustrated by the RRMs of SRp20 and Fox-1, which bind RNA specifically
with low and high affinity, respectively. The extreme versatility of RRM binding can
be explained by the use of different combinations of side-chain and main-chain RNA
interactions, but also by the capacity of this domain to extend its RNA-binding surface
beyond the canonical β-sheet surface. Indeed, there are examples of RRMs using an additional
β-strand (PTB RRM2 and 3), loops (β2/β3 loop of RBMY and β1/α1 loop of Fox-1) and
RRM extremities (C-terminus of PTB RRMs and N-terminus of CUG-BP) to interact with
RNA. With such diversity in its modes of RNA interaction, it now seems almost logical
that the RRM appears so frequently in splicing factors, considering the large repertoire of
sequences that must be recognized with different affinities and specificities for splicing
regulation. Unfortunately, one drawback of this versatility is that RRM-RNA interactions are
still very hard to predict, which justifies the need to determine more structures of RRM-RNA and, more generally, of protein-RNA complexes.
The structural data have provided essential information for correctly mapping the binding sites
of several splicing factors in vivo (the best examples are Fox-1 and Nova2). This mapping
revealed that the position of the binding site relative to the splice sites appears to be a
major element controlling the mode of action of the splicing factor. Although this information
is not sufficient to fully characterise this mode of action, it contributes to a better
understanding of their functions. Splicing factors also work by competing against other
factors for the same RNA binding sequence. The structural work on PTB, U2AF65 and Sex-lethal revealed how each protein adapts to the different pyrimidine tracts found at the 3’
splice site. Finally, solving the structures of alternative-splicing factors bound to RNA and
proteins revealed unexpected features, such as the potential for RNA looping by PTB, hnRNP A1
or MBNL1 (Fig. 5), suggesting that splicing factors function not only by recognizing RNA sequences
but also by remodelling RNA structure.
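The observation that binding-site position relative to the splice sites matters implies a simple bookkeeping step when mapped sites are analysed. The sketch below uses hypothetical names and a hypothetical coordinate convention (0-based, exon occupying the half-open interval [exon_start, exon_end)); it only places a motif start relative to an alternative exon and measures its distance to the nearest splice site. It deliberately does not predict whether a site activates or represses splicing, since that depends on the factor and is not encoded here.

```python
from typing import NamedTuple


class SitePosition(NamedTuple):
    region: str    # "upstream intron", "exon" or "downstream intron"
    distance: int  # nt from the nearest splice site; 0 for exonic sites


def position_relative_to_exon(site: int, exon_start: int, exon_end: int) -> SitePosition:
    """Place a 0-based motif start relative to an exon occupying [exon_start, exon_end).

    exon_start is the first exonic nucleotide (3' splice site side) and
    exon_end is the first intronic nucleotide after the exon (5' splice site side).
    """
    if exon_start >= exon_end:
        raise ValueError("exon_start must be smaller than exon_end")
    if site < exon_start:
        # upstream intron: count nucleotides back from the 3' splice site
        return SitePosition("upstream intron", exon_start - site)
    if site >= exon_end:
        # downstream intron: count nucleotides past the 5' splice site
        return SitePosition("downstream intron", site - exon_end + 1)
    return SitePosition("exon", 0)


if __name__ == "__main__":
    # Hypothetical exon spanning positions 100-149 of a pre-mRNA.
    for s in (90, 120, 150):
        print(s, position_relative_to_exon(s, 100, 150))
```

Coordinates from any motif scan can be fed through this function to tabulate where candidate sites fall around an alternative exon.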
Despite progress over the last decade in this growing field, many questions remain to be
answered and will require a structural biology approach to fully understand the role of
splicing factors in splicing regulation. These range from simple questions that could be
addressed rapidly to more complicated ones that will require multidisciplinary approaches or
new methodologies. For example, we still need to address how a pseudo-RRM binds RNA or
how RS domains mediate RNA and protein binding. More complicated questions are how
splicing factors interact with the splicing machinery and how several factors assemble or
multimerise on certain cis-acting elements. Also, how dynamic are protein-RNA interactions
near splice sites, and how does phosphorylation influence these dynamics? How coordinated is
splicing regulation among the different gene families, and how is this coordination mediated
at the molecular level? Finally, since an increasing number of diseases appear to be connected
with splicing regulation, all this emerging knowledge will be indispensable for developing new
therapeutic treatments [63].
Acknowledgements
The authors would like to thank Prof. Steve Matthews for providing several models of RRM2-Raver1, the Swiss National Science Foundation (No. 3100A0-118118), the SNF-NCCR
Structural Biology and EURASNET for financial support to FHTA, and the European
Molecular Biology Organization for a post-doctoral fellowship to AC.
Figure captions
Table 1: Structures of RRMs, KH domains and zinc fingers from splicing factors solved
in complex with RNA.
The protein domains and target RNA sequences used for the structure determinations and the
corresponding PDB codes are indicated. The nucleotides bound by the proteins are in bold.
Figure 1: Classification of the main human alternative splicing factors according to
their RNA-binding domain composition.
The RRMs, the quasi-RRMs (qRRMs), the pseudo-RRMs (ΨRRMs), the KH domains and
the zinc fingers are represented by rectangles coloured in dark blue, pale blue, pale green,
magenta and yellow, respectively. The RS (Arg/Ser-rich) domains are represented by red
spheres.
Figure 2: The high versatility of single RRM interactions with RNA.
(A) Structure of hnRNP A1 RRM2 in complex with single-stranded telomeric DNA as a
model of single-stranded nucleic acid binding [33]. (B) Scheme of the four-stranded β-sheet
with the positions of the main conserved RNP1 and RNP2 aromatic residues indicated in green.
RNP1 and RNP2 consensus sequences of RRMs are shown (X is for any amino acid). (C)
Structure of SRp20 RRM in complex with the 5’-CAUC-3’ RNA [9]. In all the figures, the
ribbon of the RRM is shown in grey, the RNA nucleotides are in yellow and the protein side chains are in green. The N, O and P atoms are in blue, red and orange, respectively. The N- and C-terminal extensions of the RRM and the 5’- and 3’-ends of the RNA are indicated. Hydrogen
bonds are represented by purple dashed lines. (D) Structure of Fox-1 RRM in complex with
the 5’-UGCAUGU-3’ RNA [11]. (E) Structure of RBMY RRM in complex with a stem-loop
RNA capped by a 5’-CACAA-3’ pentaloop [14]. The figures were generated by the program
MOLMOL [64].
Figure 3: Structures illustrating RRM-RNA and RRM-peptide interactions.
(A) Structure of PTB RRM3 in complex with the 5’-CUCUCU-3’ RNA [16]. The β4-strand,
the β4/β5 loop and the additional β-strand of RRM3, which are involved in the RRM-RNA
interaction, are shown in red. (B) Structure of U2AF65 RRM1 in complex with U-tract RNA
[18]. (C) Structure of HuD RRM1 and 2 in complex with the 5’-UAUUUAUUU-3’ RNA
[27]. (D) Structure of the CUG-BP1 RRM3 in complex with the 5’-UGUGUG-3’ RNA
sequence [28]. (E) Structure of SPF45 UHM (in grey) in complex with the SF3b155 ULM (aa
333 to 342, in blue) [37]. (F) Structure of PTB RRM2 (in grey) in complex with Raver1
peptide (in blue) [38] and RNA (in yellow) [16]. The colour schemes are the same as in
Figure 2.
Figure 4: Structures of zinc fingers and KH domains from splicing factors in complex
with RNA.
(A) Crystal structure of MBNL1 ZnF3 bound to the 5’-GC-3’ RNA sequence [46]. The zinc
atom and water molecules are represented by black and red spheres, respectively. (B) Crystal
structure of ZRANB2 ZnF in complex with the 5’-GGU-3’ RNA sequence [49]. (C) Crystal
structure of Nova2 KH3 bound to the 5’-UCAC-3’ RNA sequence [54]. (D) NMR structure of
SF1 KH domain bound to the 5’-UAAC-3’ RNA sequence [60]. The colour schemes are the
same as in Figure 2.
Figure 5: Splicing repression models by RNA looping.
Models are based on the structures of the MBNL1 zinc fingers 3+4 [46], the PTB RRM3+4
[16] and the hnRNP A1 RRM1+2 dimer [33] bound to RNA. These proteins repress splicing
by looping out cis-acting elements essential for splicing: the pyrimidine-rich sequence located
at the 3’ splice site, as proposed by Teplova and co-workers [46] for MBNL1; short alternative
exons, as proposed by Oberstrass and co-workers [16] for PTB; and long alternative exons, as
proposed by Blanchette and Chabot [44] for hnRNP A1.
References
1. Wahl, M.C., C.L. Will, and R. Luhrmann, The spliceosome: design principles of a dynamic RNP machine. Cell, 2009. 136(4): p. 701-18.
2. Pan, Q., et al., Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet, 2008. 40(12): p. 1413-5.
3. Sultan, M., et al., A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science, 2008. 321(5891): p. 956-60.
4. Wang, E.T., et al., Alternative isoform regulation in human tissue transcriptomes. Nature, 2008. 456(7221): p. 470-6.
5. Chen, M. and J.L. Manley, Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol, 2009. 10(11): p. 741-54.
6. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
7. Clery, A., M. Blatter, and F.H. Allain, RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol, 2008. 18(3): p. 290-8.
8. Maris, C., C. Dominguez, and F.H. Allain, The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J, 2005. 272(9): p. 2118-31.
9. Hargous, Y., et al., Molecular basis of RNA recognition and TAP binding by the SR proteins SRp20 and 9G8. EMBO J, 2006. 25(21): p. 5126-37.
10. Bourgeois, C.F., F. Lejeune, and J. Stevenin, Broad specificity of SR (serine/arginine) proteins in the regulation of alternative splicing of pre-messenger RNA. Prog Nucleic Acid Res Mol Biol, 2004. 78: p. 37-88.
11. Auweter, S.D., et al., Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J, 2006. 25(1): p. 163-73.
12. Yeo, G.W., et al., An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol, 2009. 16(2): p. 130-7.
13. Zhang, C., et al., Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev, 2008. 22(18): p. 2550-63.
14. Skrisovska, L., et al., The testis-specific human protein RBMY recognizes RNA through a novel mode of interaction. EMBO Rep, 2007. 8(4): p. 372-9.
15. Tacke, R. and J.L. Manley, The human splicing factors ASF/SF2 and SC35 possess distinct, functionally significant RNA binding specificities. EMBO J, 1995. 14(14): p. 3540-51.
16. Oberstrass, F.C., et al., Structure of PTB bound to RNA: specific binding and implications for splicing regulation. Science, 2005. 309(5743): p. 2054-7.
17. Auweter, S.D., F.C. Oberstrass, and F.H. Allain, Solving the structure of PTB in complex with pyrimidine tracts: an NMR study of protein-RNA complexes of weak affinities. J Mol Biol, 2007. 367(1): p. 174-86.
18. Sickmier, E.A., et al., Structural basis for polypyrimidine tract recognition by the essential pre-mRNA splicing factor U2AF65. Mol Cell, 2006. 23(1): p. 49-59.
19. Jenkins, J.L., et al., Solution conformation and thermodynamic characteristics of RNA binding by the splicing factor U2AF65. J Biol Chem, 2008. 283(48): p. 33641-9.
20. Singh, R., J. Valcarcel, and M.R. Green, Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science, 1995. 268(5214): p. 1173-6.
21. Chan, R.C. and D.L. Black, Conserved intron elements repress splicing of a neuron-specific c-src exon in vitro. Mol Cell Biol, 1997. 17(5): p. 2970.
22. Gromak, N., et al., Antagonistic regulation of alpha-actinin alternative splicing by CELF proteins and polypyrimidine tract binding protein. RNA, 2003. 9(4): p. 443-56.
23. Handa, N., et al., Structural basis for recognition of the tra mRNA precursor by the Sex-lethal protein. Nature, 1999. 398(6728): p. 579-85.
24. Valcarcel, J., et al., The protein Sex-lethal antagonizes the splicing factor U2AF to regulate alternative splicing of transformer pre-mRNA. Nature, 1993. 362(6416): p. 171-5.
25. Hui, J. and A. Bindereif, Alternative pre-mRNA splicing in the human system: unexpected role of repetitive sequences as regulatory elements. Biol Chem, 2005. 386(12): p. 1265-71.
26. Voelker, R.B. and J.A. Berglund, A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing. Genome Res, 2007. 17(7): p. 1023-33.
27. Wang, X. and T.M. Tanaka Hall, Structural basis for recognition of AU-rich element RNA by the HuD protein. Nat Struct Biol, 2001. 8(2): p. 141-5.
28. Tsuda, K., et al., Structural basis for the sequence-specific RNA-recognition mechanism of human CUG-BP1 RRM3. Nucleic Acids Res, 2009. 37(15): p. 5151-66.
29. Park-Lee, S., S. Kim, and I.A. Laird-Offringa, Characterization of the interaction between neuronal RNA-binding protein HuD and AU-rich RNA. J Biol Chem, 2003. 278(41): p. 39801-8.
30. Park, S., et al., HuD RNA recognition motifs play distinct roles in the formation of a stable complex with AU-rich RNA. Mol Cell Biol, 2000. 20(13): p. 4765-72.
31. Mori, D., et al., Quantitative analysis of CUG-BP1 binding to RNA repeats. J Biochem, 2008. 143(3): p. 377-83.
32. Han, K., et al., A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol, 2005. 3(5): p. e158.
33. Ding, J., et al., Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev, 1999. 13(9): p. 1102-15.
34. Dominguez, C. and F.H. Allain, NMR structure of the three quasi RNA recognition motifs (qRRMs) of human hnRNP F and interaction studies with Bcl-x G-tract RNA: a novel mode of RNA recognition. Nucleic Acids Res, 2006. 34(13): p. 3634-45.
35. Martinez-Contreras, R., et al., hnRNP proteins and splicing control. Adv Exp Med Biol, 2007. 623: p. 123-47.
36. Kielkopf, C.L., S. Lucke, and M.R. Green, U2AF homology motifs: protein recognition in the RRM world. Genes Dev, 2004. 18(13): p. 1513-26.
37. Corsini, L., et al., U2AF-homology motif interactions are required for alternative splicing regulation by SPF45. Nat Struct Mol Biol, 2007. 14(7): p. 620-9.
38. Rideau, A.P., et al., A peptide motif in Raver1 mediates splicing repression by interaction with the PTB RRM2 domain. Nat Struct Mol Biol, 2006. 13(9): p. 839-48.
39. Selenko, P., et al., Structural basis for the molecular recognition between human splicing factors U2AF65 and SF1/mBBP. Mol Cell, 2003. 11(4): p. 965-76.
40. Schellenberg, M.J., et al., Crystal structure of a core spliceosomal protein interface. Proc Natl Acad Sci U S A, 2006. 103(5): p. 1266-71.
41. Kuwasako, K., et al., Complex assembly mechanism and an RNA-binding mode of the human p14-SF3b155 spliceosomal protein complex identified by NMR solution structure and functional analyses. Proteins, 2008. 71(4): p. 1617-36.
42. Spadaccini, R., et al., Biochemical and NMR analyses of an SF3b155-p14-U2AF-RNA interaction network involved in branch point definition during pre-mRNA splicing. RNA, 2006. 12(3): p. 410-25.
43. Vitali, F., et al., Structure of the two most C-terminal RNA recognition motifs of PTB using segmental isotope labeling. EMBO J, 2006. 25(1): p. 150-62.
44. Blanchette, M. and B. Chabot, Modulation of exon skipping by high-affinity hnRNP A1-binding sites and by intron elements that repress splice site utilization. EMBO J, 1999. 18(7): p. 1939-52.
45. Hall, T.M., Multiple modes of RNA recognition by zinc finger proteins. Curr Opin Struct Biol, 2005. 15(3): p. 367-73.
46. Teplova, M. and D.J. Patel, Structural insights into RNA recognition by the alternative-splicing regulator muscleblind-like MBNL1. Nat Struct Mol Biol, 2008. 15(12): p. 1343-51.
47. Hudson, B.P., et al., Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat Struct Mol Biol, 2004. 11(3): p. 257-64.
48. Cavaloc, Y., et al., The splicing factors 9G8 and SRp20 transactivate splicing through different and specific enhancers. RNA, 1999. 5(3): p. 468-83.
49. Loughlin, F.E., et al., The zinc fingers of the SR-like protein ZRANB2 are single-stranded RNA-binding domains that recognize 5' splice site-like sequences. Proc Natl Acad Sci U S A, 2009. 106(14): p. 5581-6.
50. Grishin, N.V., KH domain: one motif, two folds. Nucleic Acids Res, 2001. 29(3): p. 638-43.
51. Valverde, R., L. Edwards, and L. Regan, Structure and function of KH domains. FEBS J, 2008. 275(11): p. 2712-26.
52. Lunde, B.M., C. Moore, and G. Varani, RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol, 2007. 8(6): p. 479-90.
53. Ule, J., et al., Nova regulates brain-specific splicing to shape the synapse. Nat Genet, 2005. 37(8): p. 844-52.
54. Lewis, H.A., et al., Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell, 2000. 100(3): p. 323-32.
55. Buckanovich, R.J. and R.B. Darnell, The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol Cell Biol, 1997. 17(6): p. 3194-201.
56. Jensen, K.B., et al., Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron, 2000. 25(2): p. 359-71.
57. Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003. 302(5648): p. 1212-5.
58. Ule, J., et al., An RNA map predicting Nova-dependent splicing regulation. Nature, 2006. 444(7119): p. 580-6.
59. Berglund, J.A., et al., The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell, 1997. 89(5): p. 781-7.
60. Liu, Z., et al., Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science, 2001. 294(5544): p. 1098-102.
61. Beuth, B., et al., Structure of a Mycobacterium tuberculosis NusA-RNA complex. EMBO J, 2005. 24(20): p. 3576-87.
62. Auweter, S.D., F.C. Oberstrass, and F.H. Allain, Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res, 2006. 34(17): p. 4943-59.
63. Tazi, J., N. Bakkour, and S. Stamm, Alternative splicing and disease. Biochim Biophys Acta, 2009. 1792(1): p. 14-26.
64. Koradi, R., M. Billeter, and K. Wuthrich, MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph, 1996. 14(1): p. 51-5, 29-32.