Download The genesis of new proteins and new

Document related concepts

Cryptochrome wikipedia , lookup

Glossary of plant morphology wikipedia , lookup

Arabidopsis thaliana wikipedia , lookup

Transcript
The genesis of new proteins
and new biosyntheses
Benjamin Pouvreau
MSc (Plant Biology)
This thesis is presented for the degree of Doctor of Philosophy of
The University of Western Australia in 2016
School of Chemistry and Biochemistry, Faculty of Science
Abstract
The emergence of new proteins with novel functions is at the root of adaptive
evolutionary innovation. Understanding how proteins are made and how new
proteins evolve is of fundamental importance. Increasingly, it is clear that de
novo protein evolution is frequent and important. The cyclic peptide SFTI-1
(Sunflower Trypsin Inhibitor 1) represents a shortcut to de novo protein
evolution as it represents a protein that has evolved de novo within another
protein instead of evolving fully de novo, i.e. from non-coding DNA. SFTI-1
biosynthesis and its evolutionary origin have been partly resolved. In this study I
used synthetic genes and Arabidopsis transgenesis to provide more evidence
for the requirement for asparaginyl endopeptidase (AEP) in processing SFTI-1
from its dual-product precursor PawS1 (Preproalbumin with SFTI-1). My data
showed that SFTI-1 is not a by-product of its adjacent albumin. SFTI-1 can be
processed from non-native locations within PawS1 as well as from unrelated
protein precursors in an AEP-dependent fashion. In nature, the sequence for
SFTI-1 emerged by expansion within an ancestral albumin gene, without
altering the coding frame, creating a dual-product gene encoding a single
protein precursor named PawS1 that is processed by enzymatic cleavage to
produce the two distinct mature proteins. I termed neoproteinisation this type of
de novo protein evolution and I believe it has been overlooked. To support this,
I reveal a different family of similarly buried plant peptides (VBP for Vicilin
Buried Peptides) that appear to have also evolved by neoproteinisation within
ancestral vicilin precursors. VBPs are approximately 30 residues, possess a
conserved CXXXC(X10-14)CXXXC motif and present as an alpha helical hairpin
tethered by two disulfide bonds. VBPs appeared at the emergence of flowering
plants and are now widespread in most flowering plants except Brassicales.
This discovery confirms that neoproteinisation has occurred at least twice and
could be a widespread and overlooked mechanism for de novo protein
evolution.
I
Acknowledgement
I thank the University of Western Australia for the financial support of my PhD
by awarding me an International Postgraduate Research Scholarship (IPRS)
and an Australian Postgraduate Award (APA).
I thank my supervisor Josh Mylne, lecturer and research group leader at the
School of Chemistry and Biochemistry for welcoming me in his research group
and supervising me during this PhD.
I thank the ARC Centre of Excellence in Plant Energy Biology for giving me
access to plant growth facilities and mass-spectrometry facilities.
I thank Aurélie Benfield for generating the aep triple mutant lines and a third of
the transgenic lines used during my PhD.
Special thanks to Ricarda for her magical talent with targeted proteomics.
Special thanks to Owen for coping with my infinite number of questions on
mass-spectrometry.
I thank all the members of the Mylne lab for their support during my PhD.
Special thanks to Mark for his computing tips.
Special thanks to Amy for correcting many and many mistakes in this
manuscript.
Special thanks to Max for his enthusiasm and good mood.
Special thanks to Julie for coping with me during my PhD thesis writing.
I also thank all the administrative staff of UWA and the School of Chemistry and
Biochemistry who helped me many times during my PhD.
Special thanks to Adam and Nehal for their continuous good mood and help
with my unusual product requests, such as the now famous Craig Balls.
II
Thesis declaration
The research presented in this thesis is composed of my original work except
where due reference has been made in the text. I have clearly stated the
contribution by others to jointly-authored works that I have included in my
thesis. This work was carried out in the School of Chemistry and Biochemistry,
The University of Western Australia.
The content of my thesis is the result of work I have carried out since the
commencement of my candidature for the degree of Doctor of Philosophy of
The University of Western Australia and does not include a substantial part of
work that has been submitted to qualify for the award of any other degree or
diploma in any university or other tertiary institution.
The thesis is presented in separate Chapters, including three that will be
submitted for publication. The bibliographical details of the work and where it
appears in the thesis are as outlined below:
Chapter 1: Pouvreau, B.: SFTI-1, an unusual plant cyclic peptide. Introduction
Chapter 2: Pouvreau, B., Mylne, J.S.: The importance of seed storage protein
maturation for germination.
Author contribution: Pouvreau, B.: performed experiments (100%), wrote
Chapter (100%); Mylne, J.S.: conceived the study (100%), edited Chapter.
Chapter 3: Pouvreau, B., Mylne, J.S.: AEP function in dark-induced leaf
senescence.
Author contribution: Pouvreau, B.: performed experiments (100%), wrote
Chapter (100%); Mylne, J.S.: conceived the study (100%), edited Chapter.
III
Chapter 4: Pouvreau, B., Mylne, J.S.: Using SFTI-1 to study protein trafficking
defects.
Author contribution: Pouvreau, B.: conceived the study (30%), performed
experiments (100%), wrote Chapter (100%); Mylne, J.S.: conceived the study
(70%), edited Chapter.
Chapter 5: Pouvreau, B., Fenske, R., Mylne, J.S.: How readily will SFTI-1
hijack other sequences? (Main body of a manuscript ready for publication)
Author contribution: Pouvreau, B.: conceived the study (50%), performed
experiments (90%), wrote and edited manuscript (50%); Fenske, R.: performed
experiments (10%); Mylne, J.S.: conceived the study (50%), wrote and edited
manuscript (50%).
Chapter 6: Pouvreau, B., Mylne, J.S.: Neoproteinisation: a new mechanism for
de novo evolution. (Review ready for publication)
Author contribution: Pouvreau, B.: wrote manuscript (100%); Mylne, J.S.:
edited manuscript.
Chapter 7: Pouvreau, B., Mylne, J.S.: Neoproteinisation in vicilin precursors.
(Partly included in the discussion of a manuscript ready for publication)
Author contribution: Pouvreau, B.: Conceived the study (50%), Performed
experiments (100%), wrote manuscript (100%); Mylne, J.S.: Conceived the
study (50%), edited manuscript.
Chapter 8: Pouvreau, B., Mylne, J.S.: General discussion and future
directions. (Conclusion to the thesis)
Author contribution: Pouvreau, B.: wrote manuscript (100%); Mylne, J.S.:
edited manuscript.
IV
Abbreviations
In alphabetical order:
1‑D: One dimension
5d: Five days
9d: Nine days
A. thaliana: Arabidopsis thaliana
ACTH: Adrenocorticotropin
ADH (or adh): Alcohol dehydrogenase
AEP: Asparaginyl endopeptidase
aep‑null: aep quadruple null mutant (aep1 aep2 aep3 aep4)
AFGP: Antifreeze glycopeptides
AGI: Reference gene model for the protein in the A. thaliana information
resource (TAIR) database
AP: Aspartic protease
Asn: Asparagine
AUC: Area under the curve
C2: Original CXXXC(10-14)CXXXC motif peptide characterised in Cucurbita
maxima
C2a: First form of the C2 peptide from Cucurbita maxima
C2b: Second form of the C2 peptide from Cucurbita maxima
C2m: Potential C2 peptide homologue from Cucumis melo
C2ps: Potential C2 peptide homologue from Pisum sativum
C2pv: Potential C2 peptide homologue from Phaseolus vulgaris
V
C2s: Potential C2 peptide homologue from Cucumis sativus
C2vf: Potential C2 peptide homologue from Vicia faba
CaMV35S: Cauliflower mosaic virus 35s promoter
ClaI: Caryophanon latum L. restriction enzyme 1
CO8: Cycloviolacin-O8 from Viola odorata
Col‑0: Arabidopsis thaliana ecotype Columbia-0
CraS1: Chimeric construct with SFTI-GLDN inserted inside CRU1
CRU1: CRUCIFERIN A1
cSlo: Chicken homologue of the Drosophila Slowpoke gene
CtA: Cyclotheonamide A
Ctc: Cyclotide genes from Clitoria ternatea
CtM: Cyclotide M from Clitoria ternatea
C‑terminal: Carboxy-terminal
cTI: Cyclic trypsin inhibitor from Momordica cochinchinensis
CXXXC motif: amino acid motif consisting of two cysteine residues spaced by
three other amino acid residues
CXXXC(10-14)CXXXC motif: amino acid motif consisting of two CXXXC motifs
spaced by 10 to 14 other amino acid residues
DNA: Deoxyribonucleic acid
cDNA: complementary DNA
Dscam: Down syndrome cell adhesion molecule
DTT: Dithiothreitol
E. coli: Escherichia coli
EDTA: Ethylenediaminetetraacetic acid
VI
ER: Endoplasmic reticulum
ER‑signal: Peptide signal for ER targeting
F(1 to 6): Filial generation of offspring of distinctly different parental types.
Fg01: Arbitrary attributed mouse gene name
fgf4: Fibroblast growth factor 4 gene
Flt1: Vascular endothelial growth factor receptor 1
sFlt1: Alternatively spliced Flt1
sFlt1‑14: Alternatively spliced Flt1 that is human specific
GABI: Generation of flanking sequence tags from T-DNA mutagenised A.
thaliana
G‑protein: Guanine nucleotide-binding protein
GTPase: Guanosine triphosphate hydrolase
HIV: Human immunodeficiency virus
HPLC: High performance liquid chromatography
Hsmar1: Homo sapiens mariner elements
IDL: Individually darkened leaf
JGW (or jgw): Jingwei
KB: kalata B1 type cyclotides
KLK4: human kallikrein‑related peptidase 4
LC: Liquid chromatography
LEA: Late embryogenesis abundant
LeaS1: Chimeric construct with SFTI-GLDN inserted inside a LEA precursor
LINE‑1: Long interspersed element 1
LPH: Lipotropin
VII
lola: Longitudinal lacking
LSU: large subunit
Ltd: Limited (referring to private companies)
MAG: Maigo (means a stray or lost child in Japanese)
MALDI‑TOF: Matrix-assisted laser desorption/ionisation time-of-flight
MES: 2-(N-morpholino)ethanesulfonic acid
MiAMP2: Macadamia integrifolia antimicrobial peptide 2
MS media: Murashige-Skoog media
MS: Mass spectrometry
MS/MS: Tandem mass spectrometry
MSH: Melanotropin
NMR: Nuclear magnetic resonance
NAHR: Non-allelic homologous recombination
NHEJ: Non-homologous end joining
NOS1: Nitric oxide synthetase 1
nos 3′: nopaline synthase 3′ UTR
N‑terminal: NH2-terminal
Oak: Oldenlandia affinis kalata gene
OLE1: Oleosin 1
OleoS1: Chimeric construct with SFTI-GLDN inserted inside OLE1 precursor
ORF: Open reading frame
PA1: Pea albumin 1
pAC161: Plasmid artificial chromosome P1 version 161
VIII
pAOV: Plasmid (antisense orientation vector) used to create the transformants
generated during this PhD
PAP85: Plastid lipid associated protein 85
PapS1: Chimeric construct with SFTI-GLDN inserted inside PAP85 precursor
PawS1: Preproalbumin with Sunflower inhibitor 1
Paw[N65]S1: Chimeric construct with SFTI-GLDN removed from its native
position and placed after asparagine 65 inside PawS1
Paw[N84]S1: Chimeric construct with SFTI-GLDN removed from its native
position and placed after asparagine 84 inside PawS1
Paw[N96]S1: Chimeric construct with SFTI-GLDN removed from its native
position and placed after asparagine 96 inside PawS1
Paw[N143]S1: Chimeric construct with SFTI-GLDN removed from its native
position and placed after asparagine 143 inside PawS1
Paw[N146]S1: Chimeric construct with SFTI-GLDN removed from its native
position and placed after asparagine 146 inside PawS1
Paw4SFTI: Chimeric construct with four tandem SFTI-GLDN variants placed at
SFTI-GLDN native position inside PawS1
PawS1[Δ2‑21]: Chimeric construct with ER-signal removed from PawS1
PawS2: Name given to the sunflower PawS1 homologue
pBI121: Plasmid binary version 121
PCR: Polymerase chain reaction
pDAP101: Plasmid used to create some of the SAIL lines
PDB: Protein Data Bank
PhA: Petunia hybrida cyclotide A
Phc1: Petunia hybrida cyclotide gene 1
IX
pOLEOSIN:PawS1: Construct, expressing native PawS1 protein precursor
under the control of a seed specific A. thaliana OLEOSIN promoter
POMC: Proopiomelanocortin
pROK2: Plasmid used to create some of the SALK lines
pipsl: Phosphatidylinositol-4-phosphate 5-kinase and proteasome subunit like
PV100: Protein vacuolar 100
QqQ: Triple quadrupole mass spectrometer
Q‑TOF: Quadrupole time-of-flight mass spectrometer
RACE: Rapid amplification of cDNA ends
RNA: Ribonucleic acid
mRNA: Messenger RNA
SacI: Streptomyces achromogenes restriction enzyme 1
SAG12: Senescence associated gene 12
SAIL: Syngenta Arabidopsis insertion library
SDS: Sodium dodecyl sulfate
SDS‑PAGE: Sodium dodecyl sulfate - polyacrylamide gel electrophoresis
SESA1 (or 4): SEED STORAGE ALBUMIN 1 (or 4)
SesaS1: Chimeric construct with SFTI-GLDN inserted inside SESA1
SesaS2: Chimeric construct with SFTI-GLDN inserted inside SESA1 without
internal propeptide.
SET: Su(var)3-9 enhancer-of-zeste trithorax domain
setmar: Fusion of SET and Hsmar1 DNA sequence
SFTI‑1: Sunflower trypsin inhibitor 1 (cyclic peptide)
1(to 4)SFTI: Chimeric constructs with one to four tandem SFTI-GLDN variants
placed at SFTI-GLDN native position inside a PawS1 variant with no albumin
X
SFTI‑1F12Y: SFTI‑1 variant with tyrosine 12 replaced by phenylalanine
SFTI‑1I10V: SFTI‑1 variant with isoleucine 10 replaced by valine
SFTI‑1K5R: SFTI‑1 variant with lysine 5 replaced by arginine
SPAD: Minolta SPAD-502 leaf chlorophyll meter (SPAD is just a brand name)
SSU: Small subunit
SUMO: Small ubiquitin-like modifier
T1 (or T2): First (or second) transgenic generation
TAIR10: The A. thaliana information resource database (version10)
tbc1d3: TBC (Tre-2/Bub2/Cdc16) 1 Domain Family, Member 3 gene
TI: Trypsin inhibitors
UniProtKB: Universal protein knowledgebase
usp: Ubiquitin-specific protease gene
VaA: Viola arvensis peptide A
VEGF: Vascular endothelial growth factor
Voc: Viola odorata cyclotide gene
Vok: Viola odorata kalata gene
VPE: Vacuolar processing enzyme (or AEP, legumain)
VPS: Vacuolar protein sorting
VSR: Vacuolar sorting receptor
Ws‑2: Arabidopsis thaliana ecotype Wassilewskija-2
WT: Wild type
XhoI: Xanthomonas holcicola restriction enzyme 1
XI
XL: Large upstream exon from the gene for G-protein first subunit
ALEX: Product of the alternative splicing of the XL exon
ynd: yande gene
ZAG1: Zea mays AGAMOUS 1
ZMM2: Zea mays MADS 2 (MADS is a conserved amino acid motif)
XII
Contents
ABSTRACT
I
ACKNOWLEDGEMENT
II
THESIS DECLARATION
III
ABBREVIATIONS
V
CONTENTS
XIII
CHAPTER 1
1
1
2
SFTI-1, AN UNUSUAL PLANT CYCLIC PEPTIDE
1.1 Plant cyclic peptides
2
1.1.1
Plant cyclic peptide identification
2
1.1.2
Plant cyclic peptide biosynthesis
4
1.1.3
Plant cyclic peptide evolution
8
1.2 Discovery of the sunflower trypsin inhibitor SFTI-1
10
1.2.1
Identification of SFTI-1
10
1.2.2
Multiple applications for SFTI-1 properties
12
1.3 Discovery of SFTI-1 biosynthetic origins
14
1.3.1
Identification of PawS1 (Preproalbumin with SFTI-1)
14
1.3.2
The napin-type seed storage albumins
14
1.3.3
Seed albumins are matured by AEP
16
1.4 SFTI-1 is matured by the seed albumin processing machinery
19
1.4.1
SFTI-1 requires AEP for proper processing in transgenic lines
19
1.4.2
A recombinant AEP can perform SFTI-1 maturation
22
1.5 SFTI-1 as a model to study protein biosynthesis and evolution
23
1.5.1
SFTI-1 is an unusual buried cyclic peptide
23
1.5.2
SFTI-1 stepwise evolution
26
1.6 Thesis aim
28
XIII
CHAPTER 2
2
29
THE IMPORTANCE OF SEED STORAGE PROTEIN MATURATION FOR
GERMINATION
30
2.1 Introduction
30
2.2 Material and methods
32
2.2.1
Plant material
32
2.2.2
Plant growth conditions
32
2.2.3
Germination assays
32
2.2.4
Protein extraction
33
2.2.5
Protein 1-D gel electrophoresis
33
2.3 Results
34
2.3.1
Preliminary study
34
2.3.2
Germination rate is not influenced by media composition
35
2.3.3
Germination rate varies in a fixed genotype
37
2.3.4
Impaired seed storage protein maturation does not influence germination rate
37
2.4 Discussion
38
CHAPTER 3
39
3
40
AEP FUNCTION IN DARK-INDUCED LEAF SENESCENCE
3.1 Introduction
40
3.1.1
AtAEP3 expression is highest in senescing leaves
40
3.1.2
Senescence is a natural recycling plant mechanism
41
3.1.3
Dark-induced senescence as a tool to study senescence
42
3.2 Material and Methods
43
3.2.1
Plant material
43
3.2.2
Chlorophyll abundance measurement
43
3.2.3
Plant Growth Conditions, selection, crosses and seed collection
43
3.3 Results
44
3.3.1
Preliminary study
44
3.3.2
Natural variation for response to dark-induced senescence
46
3.3.3
AEP is not involved in dark-induced senescence
47
3.4 Discussion
49
XIV
CHAPTER 4
50
4
51
USING SFTI-1 TO STUDY PROTEIN TRAFFICKING DEFECTS
4.1 Introduction
51
4.2 Material and Methods
54
4.2.1
Plant material and growth conditions
54
4.2.2
Plant genotyping
54
4.2.3
Plant crossing and selection
55
4.2.4
Segregation analysis
56
4.2.5
Seed peptide extraction
56
4.2.6
Acyclic SFTI-1 detection and quantification
57
4.2.7
Protein 1-D gel electrophoresis
58
4.3 Results
59
4.3.1
Generation of hybrid lines
59
4.3.2
SFTI-1 as a tool to monitor seed storage protein maturation
59
4.3.3
Seed storage protein trafficking pathways
61
4.4 Discussion
64
CHAPTER 5
65
5
66
HOW READILY WILL SFTI-1 HIJACK OTHER SEQUENCES?
5.1 Introduction
66
5.2 Material and methods
67
5.2.1
Synthetic genes for in planta expression
67
5.2.2
Plant transformation and growth conditions
68
5.2.3
Seed peptide extraction
68
5.2.4
SFTI-1 (cyclic and acyclic) detection and acyclic SFTI-1 quantification
69
5.2.5
Shotgun proteomics of A. thaliana seeds
69
5.2.6
Targeted proteomics
70
5.3 Results
71
5.3.1
The SFTI-1 sequence can emerge from non-native locations
71
5.3.2
The SFTI-1 sequence can recruit AEP
74
5.3.3
The SFTI-1 sequence can recruit AEP without an adjacent protein
77
5.4 Conclusion
78
XV
CHAPTER 6
6
79
NEOPROTEINISATION: A NEW MECHANISM FOR DE NOVO
EVOLUTION
80
6.1 Introduction
80
6.2 How do new proteins arise?
81
6.2.1
New protein evolution by DNA based gene duplication and divergence
81
6.2.2
New protein evolution through RNA based gene duplication and divergence
83
6.2.3
Creation of new proteins through gene or transcript fusion
85
6.2.4
Acquisition of de novo proteins by horizontal gene transfer
89
6.2.5
Emergence of de novo protein by new alternative gene coding
90
6.2.6
De novo protein evolution
91
6.2.7
Neoproteinisation, a new mechanism of divergence that creates de novo proteins 94
6.3 How to discover other neoproteinisation events?
6.3.1
Neoproteinisation creates unusual genes encoding multiple proteins
6.3.2
Leads for discovery of other neoproteinisation events
95
95
100
6.4 Conclusion
102
CHAPTER 7
103
7
104
NEOPROTEINISATION IN VICILIN PRECURSORS
7.1 Introduction
104
7.2 Material and methods
107
7.2.1
Vicilin alignment
107
7.2.2
Seed extract fractions preparation
107
7.2.3
MALDI-TOF-MS analysis
108
7.2.4
LC/MS analysis
108
7.3 Results
109
7.3.1
Several classes of vicilin precursors
109
7.3.2
Emergence of conserved CXXXC(10-14)CXXXC motifs in vicilin precursors
111
7.3.3
Are CXXXC(10-14)CXXXC motifs a neoproteinisation event?
114
7.3.4
Identification of CXXXC(10-14)CXXXC peptides
115
7.3.5
Identification of C2 homologues from closely related species
117
7.3.6
More widespread evidence for C2-like peptides
118
7.4 Conclusion
120
XVI
CHAPTER 8
121
8
122
GENERAL DISCUSSION AND FUTURE DIRECTIONS
8.1 Seed storage protein trafficking
122
8.2 SFTI-1 maturation
123
8.3 SFTI-1 evolution, neoproteinisation
124
8.4 Neoproteinisation of vicilin precursors
125
CHAPTER 9
126
9
127
SUPPLEMENTARY INFORMATION
9.1 Gene sequences, relative to trafficking mutants (Chapter 4)
127
9.2 Oleosin promoter used for transgenic expression in A. thaliana
137
9.3 Synthetic gene sequences from Chapter 5
138
9.4 Chimeric protein precursors from Chapter 5
144
REFERENCES
147
XVII
Chapter 1
SFTI-1, an unusual plant cyclic
peptide
An introduction
1
1
SFTI-1, an unusual plant cyclic peptide
1.1 Plant cyclic peptides
1.1.1 Plant cyclic peptide identification
Within the last two decades, there has been significant advancement in
research to identify circular proteins in higher organisms and in the
development of synthetic cyclic proteins. In general, backbone cyclic peptides
have several advantages over their linear counterparts. They are more
thermodynamically stable than non cyclic peptides and because they have no
free ends, they resist digestion by exopeptidases, making them much more
stable than their linear counterpart. Cyclic peptides are also more efficient than
non cyclic ones at creating binding interactions with target proteins because the
cyclic structure leads to reduced entropic losses upon binding (Marx,
Korsinczky et al. 2003).
Cyclic peptides have various functions and are widespread. They have been
isolated from several tissues of plants and mammals, and are also found in
fungi and bacteria (Craik, Mylne et al. 2010). Although the biological function of
cyclic peptides in plants often remains unknown, the initial interest in cyclic
peptides from plants was driven by their diverse and often very potent biological
activities. Their bioactivities include uterotonic, anti-HIV and neurotensin
inhibitory activities, amongst others. Cyclic peptides were initially discovered
through biochemical analysis of indigenous medicinal plants (Gran 1970; Gran
1973)
and
later
during
pharmaceutically
oriented
screening
programs
(Gustafson, Sowder et al. 1994; Witherup, Bogusky et al. 1994). A typical
example is the plant cyclic peptide kalata B1 extracted from the leaves of the
African plant Oldenlandia affinis. Kalata B1 was originally discovered as the
active component in a native African medicine from the Congo region, made
from Oldenlandia affinis boiled leaves and used by women to accelerate
childbirth (Gran 1970; Gran 1973). The kalata B1 (founding member of the
cyclotide family of plant cyclic peptide, see next section for a more detail
definition of all families of plant cyclic peptides) sequence and structure was
characterised in 1995, revealing its cyclic nature (Saether, Craik et al. 1995).
2
Kalata B1 was later also reported to have antimicrobial (Tam, Lu et al. 1999),
haemolytic (Barry, Daly et al. 2003), anti-HIV (Daly, Gustafson et al. 2004) and
insecticidal activities (Jennings, West et al. 2001). More recently, several cyclic
peptides, and particularly cyclotides, have been reported to have insecticidal
activity (Jennings, West et al. 2001; Jennings, Rosengren et al. 2005; Barbeta,
Marshall et al. 2008). Many other activities also related to defence have been
reported for cyclotides and other specific cyclic peptides, including antimicrobial,
antifouling, cytotoxic, molluscicidal and nematocidal activities
(Lindholm,
Göransson et al. 2002; Göransson, Sjogren et al. 2004; Colgrave, Kotze et al.
2008; Plan, Saska et al. 2008; Colgrave, Kotze et al. 2009). However these
studies lack native in planta evidence and the reported activities might not
reflect the native functions and could be a result of their known property for
interfering into differing degrees with membranes (Craik, Mylne et al. 2010).
Nevertheless, all their biological properties and structural advantages make
cyclic peptides promising scaffolds for the design of stable pharmaceuticals.
Naturally occurring peptidic inhibitors of known structure are a promising
starting point for the design of novel drugs. They can provide a relatively large
surface area to enhance the potential of designing selective analogues. In
addition, as the overall three dimensional structure of the original protein is
preserved in the new variants; the interpretation of interactions at the atomic
level is possible without further structural analysis. Although linear peptides are
useful for some applications, these molecules tend to be unstable in vivo due to
the hydrolytic activity of peptidases. Therefore, cyclic peptides are promising
scaffolds for the design of stable drugs (Boy, Mier et al. 2010). Numerous
examples of re-engineered plant cyclic peptides, and particularly cyclotides,
have been reported, with potential applications in the treatment of many
diseases including vascular disorders (Gunasekera, Foley et al. 2008; Chan,
Gunasekera et al. 2011; Getz, Cheneval et al. 2013), allergic asthma
(Sommerhoff, Avrutina et al. 2010), inflammatory pain (Wong, Rowlands et al.
2012), obesity (Eliasen, Daly et al. 2012), cancer (Ji, Majumder et al. 2013), HIV
(Aboye, Ha et al. 2012) and other diseases related to the immune system
(Wang, Gruber et al. 2014). For example, kalata B1 was re-engineered in
targeted studies focusing on the development of therapeutics for the treatment
of multiple sclerosis, an inflammatory disorder of the central nervous system, in
which the immune system has a significant role (see next page).
3
Figure 1.1. Plant cyclic peptides. For each major class the size range is
given, the typical structure (when known) is represented by canonical example
of each group and presence of disulfide bonds is highlighted in yellow; (a)
SFTI-1, PDB code 1SFI (Luckett, Garcia et al. 1999), (b) kalata B1, PDB code
1KAL (Saether, Craik et al. 1995), (c) cTI2, PDB code 1HA9 (Heitz, Hernandez
et al. 2001), (d) Segetalin A (6 amino acid residues orbitide) chemical
schematic (Condie, Nowak et al. 2011; Arnison, Bibb et al. 2013), (e) Adouetine
X chemical schematic (Dias, Gressler et al. 2007; Trevisan, Maldaner et al.
2009).
The treatments currently used usually involve repressing the patient immune
system, thus creating the particularly strong side effect of not being able to have
an immune response to pathogens infections. Peptide antigens are a promising
alternative strategy, to modulate the immune system instead of shutting it down,
but their therapeutic potential is often limited by their poor in vivo stability. The
exceptional stability of some cyclic peptide provides another alternative, and the
potential of re-engineered plant cyclic peptides in the treatment of complex
diseases has already been demonstrated. In a recent study, three of the 29
amino acid residues of kalata B1 were replaced by a seven amino acid residues
peptide corresponding to the active site of an immune-regulatory protein called
myelin oligodendrocyte glycoprotein. Injection of this so-called grafted peptide
reduced significantly both clinical and histological symptoms of multiple
sclerosis in mouse system (Wang, Gruber et al. 2014). Moreover, the use of a
naturally occurring peptide as scaffolds for the design of new therapeutic
molecules also leaves open the possibility of using plants as “factories” for
producing the modified inhibitors via a low cost route (Quimbar, Malik et al.
2013). The numerous plant cyclic peptides reported with insecticidal activities
could also be used to develop alternative approaches to chemical pesticides,
such as the production of transgenic plants that express naturally occurring
plant cyclic peptides to protect themselves against pathogens (Craik, Mylne et
al. 2010).
1.1.2 Plant cyclic peptide biosynthesis
In order to engineer plants for the production of modified cyclic peptides, it is
necessary to understand how plants make cyclic peptides. In this regard, plant
cyclic peptides can be separated into five families according to how their
specific structures and biosynthesis are known (figure 1.1).
Some are thought to be non-gene-encoded and synthesised by a non-ribosomal
peptide synthetase, but the regulation of the process is not well understood
(Tan and Zhou 2006) and poorly studied. Some non-gene-encoded cyclic
peptides have been reported in other organisms, from bacteria to human. They
can be produced during other proteins degradation by enzymes that cut pieces
off different proteins and ligate and cyclise them together, or by a non-ribosomal
protein synthesis pathway with spontaneous cyclisation (Xu, Li et al. 2011).
4
Figure 1.2. Primary structures of cyclotides and their precursors. (a)
Conserved primary structure of cyclotides, represented by kalata B1 (KB1). (b)
Sequences of characteristic cyclotides for each plant family where cyclotides
have been found, presented in an alignment highlighting their strong sequence
conservation, in particular for the cysteine residues. Alignment with CLC
Genomics (CLCbio), black for perfect homology, grey for strong homology
cysteine residues highlighted in red. (c) Primary structure of the protein
precursors for characteristic cyclotides. The Oak genes from Rubiaceae encode
the kalata B (KB) cyclotides and can encode one to three cyclotides. In
Violaceae kalata B1 is produced from a multiple precursor named Vok1 also
containing two copies of the cyclotides VaA, but single precursors also exist
such as Voc1 containing the cyclotide CO8. The truncated type of precursor
from Solanaceae (Phc1) and the unusual Ctc type of dual precursor from
Fabaceae are also presented. Sequences are from UniProtKB; Oak1 (P56254)
containing kalata B1 (KB1), Oak4 (P58454) containing kalata B2 (KB2), Oak2
(P58455) containing kalata B3/B6 (KB3/6), Oak3 (P58457) containing kalata B7
(KB7), Voc1 (P58440) containing Cycloviolacin-O8 (CO8), Vok1 (Q5USN7)
containing cyclotide VaA, Phc1 (B3EWH5) containing cyclotide PhA and Ctc1
(P86899) containing the cyclotide CtM. Precursor structure deduced from
sequences and previously published data (Gruber, Elliott et al. 2008; Nguyen,
Zhang et al. 2011; Poth, Mylne et al. 2012).
The cyclic
peptides
of the orbitide family are mostly represented in
Caryophyllaceae and close relatives. They typically consist of a single ring of
five to twelve amino acid residues, that are thought to originate from
gene-encoded
linear
precursors,
in
a
two-step
process
involving
serine-protease-like enzymes, but non-gene-encoded amino acids can also be
added to the structure, and their biosynthesis remain to be fully elucidated
(Condie, Nowak et al. 2011; Barber, Pujara et al. 2013).
On the other hand, the members of the three other known families of plant
cyclic peptides are thought to have their sequence totally gene-encoded and
their biosynthesis have been described in more details and thus represent a
more evident choice for transgenic applications.
The third family of plant cyclic peptides, and the most broadly represented by
far, is the cyclotide family, which members have been isolated from various
species in the Rubiaceae (Craik 2001; Jennings, West et al. 2001), Violaceae
(Dutton, Renda et al. 2004), Fabaceae (Nguyen, Zhang et al. 2011; Poth,
Colgrave et al. 2011), and Solanaceae (Poth, Mylne et al. 2012) which are
phylogenetically distant plant families. Cyclotides are disulfide-rich cyclic
peptides from 29 to 31 amino acid residues and have a unique structure (figure
1.1 and 1.2) consisting of a distorted cyclised triple-stranded β-sheet, with a
knotted arrangement of three disulfide bonds (Craik, Daly et al. 1999). This
cyclic cystine-knot motif makes cyclotides particularly resistant to enzymatic
degradation, but also to thermal or chemical degradation (Colgrave and Craik
2004). Their sequence and structure (figure 1.2) are highly conserved among
species (Dutton, Renda et al. 2004).
They are characterised by the first
cyclotide to have been discovered, kalata B1. Kalata B1 was originally
discovered in Oldenlandia affinis (Rubiaceae) leaves, but is also found in the
phylogenetically distant species Viola odorata (Violaceae) (Gruber, Elliott et al.
2008). Cyclotides are typically encoded by a gene producing a larger protein
precursor consisting of a signal-peptide (endoplasmic-reticulum targeting), a
propeptide and one to three identical or slightly different cyclotide domains,
which themselves consist of a cyclotide surrounded by two propeptides.
However, little is known about the enzymes involved in releasing and cyclising
cyclotides
from
their
linear
precursors. An enzyme called asparaginyl
endopeptidase (AEP, also referred to as VPE, vacuolar processing enzyme or
5
legumain) was suggested to be involved in the removal of the C-terminal
propeptide of cyclotides (Saska, Gillon et al. 2007) , but the enzymes involved
in the removal of the two N-terminal propeptides are still unknown (Mylne, Chan
et al. 2012). However, an unusual cyclotide precursor was recently described in
a plant from the Fabaceae family. Ctc type cyclotide genes from Clitoria
ternatea encode dual protein precursors that also contain an albumin precursor
similar to PA1 (Pea albumin 1) seed storage albumin. Although no data were
provided to support the existence of the albumin at the protein level, the Ctc
cyclotides were found in most tissues of this plant and then characterised. They
share very high sequence and structure similarity with most other plant
cyclotides and process cytotoxic activities like many other plant cyclotides
(Nguyen, Zhang et al. 2011). These cyclotides, characterised by the CtM
cyclotide, only require cleavage by AEP at their C-terminus to be released from
their precursor, because their N-terminus is unmasked by endoplasmic
reticulum (ER) signal removal.
6
Figure 1.3. Structures of squash trypsin inhibitors and their precursors.
(a) Conserved primary structure of cyclic squash trypsin inhibitors, represented
by cTI2 from Momordica cochinchinensis (Cucurbitaceae). (b) Sequence of
cyclic squash trypsin inhibitors from Momordica cochinchinensis for which the
precursor and encoding gene has been identified (cTI - 1, 2, 4, 7 and 8),
presented in an alignment (as for Figure 1.2) highlighting their strong sequence
conservation between each other, but also with their linear counterparts (TI5
and TI6) and with other common linear (*) squash trypsin inhibitors from
pumpkin (TI-I and TI-IV). Contrastingly their sequence strongly differs from
cyclotides, exemplified here by kalata B1 (KB1), , apart from the structural
cysteines and the initial Gly critical for cyclisation. (c) Primary structure of the
known precursors of cyclic squash trypsin inhibitors, compared to a common
cyclic squash trypsin inhibitor precursor. Sequences and precursor structure
from previously published data (Mylne, Chan et al. 2012) excepted sequences
of TI-I (P01074) and TI-IV (P07853) that are from UniProtKB. (d) Structure of
cTI2, PDB code 1HA9 (Heitz, Hernandez et al. 2001) and TI-I, PDB code 2V1V
(Zhukov, Jaroszewski et al. 2000). The terminal amino acid residues for acyclic
TI-I and the additional “cyclic-loop” in cTI2 are labelled.
The cyclic squash trypsin inhibitors, or macrocyclic knottins, from the
Cucurbitaceae family, set apart and constitute a fourth family of plant cyclic
peptide (Huang, Liu et al. 1993; Mylne, Chan et al. 2012). They share the same
cystine-knot motif as cyclotides and for this reason they have been used
similarly to cyclotides as drug-scaffolds, but they strongly diverge in sequence.
In addition, whereas cyclotides can naturally have various activities and are
found in phylogenetically dispersed plant families, cyclic squash trypsin
inhibitors have only been found in the seeds of Momordica cochinchinensis
(Cucurbitaceae), a tropical liana, and their native activity is limited to trypsin
inhibition, although they have been re-engineered for
diverse different
applications (Thongyoo, Bonomelli et al. 2009). The cyclic squash trypsin
inhibitors are matured from large precursors that also differ from traditional
cyclotides precursors. Their precursors are constituted by tandem series of
cyclic peptide sequences, separated by identical spacers and terminate with a
sequence for an acyclic squash trypsin inhibitor of similar sequence and
structure (figure 1.3). Contrary to their cyclic homologues, this type of acyclic
squash trypsin inhibitor, such as the typical trypsin inhibitor one (TI-I) from
pumpkin, is widespread in Cucurbitaceae, with almost identical sequence and
structure compared to acyclic trypsin inhibitors from Momordica cochinchinensis
(figure 1.3). The squash cyclic trypsin inhibitors are thought to be processed at
both ends and cyclised by AEP, whereas the maturation of the acyclic
counterpart remains unclear, but is independent of AEP. The identical
structures and repetitive sequences of the cyclic squash trypsin inhibitors
suggest an evolution by internal duplication of the same motif within an
ancestral gene (Mylne, Chan et al. 2012).
Sunflower trypsin inhibitor 1 (SFTI-1) is the first member of a newly discovered
family of cyclic peptides, restricted to sunflower (Astereceae) and close relatives
(termed Helianthae), and will be introduced in detail in the other parts of this
Chapter.
7
Figure 1.4. Structures of PA1b , Ctc cyclotides and of their precursors.
The sequences of some of the PA1 and Ctc known precursors are presented in
separated alignments (as for Figure 1.2), associated with their schematic of
their primary structure as previously described (Nguyen, Zhang et al. 2011).
Sequences are from UniProtKB; PA1-1 (P62926), PA1-2 (P62927), PA1-3
(P62928), PA1-4 (P62929), PA1-5 (P62930), PA1- 6(P62931), Ctc1 (P86899),
Ctc2 (G1CWI0), Ctc3 (G1CWH9), Ctc4 (G1CWH1). The structure of PA1b, PDB
code 1P8B (Jouvensal, Quillien et al. 2003) and CtM, PDB code 2LAM (Poth,
Colgrave et al. 2011) are presented.
1.1.3 Plant cyclic peptide evolution
Although it is unclear how cyclotides with identical sequence and similar
precursors originated in a few evolutionary dispersed plant families (Mylne,
Chan et al. 2012), the discovery of the unusual Ctc gene family in Fabaceae
brought a first clue. The Ctc gene family share many similarities with the PA1
gene family from pea, another Fabaceae (figure 1.4). The gene PA1 encodes a
dual protein precursor that is cleaved into two different and independent
proteins, an albumin, PA1a, and a linear cystine-knot peptide, PA1b (Higgins,
Chandler et al. 1986). Several copies of the gene exist in the pea genome, with
little sequence variation. The PA1a albumin is the most abundant low molecular
weight seed storage albumin from pea, but it differs from most other low
molecular weight seed albumins, such as napin-type. PA1a is monomeric
whereas most napin-type albumins are structured into two subunits connected
by disulfide bonds. However, to some extent, PA1a seems to share some
moderate sequence similarity with a napin-type albumin sub-unit including a
similar cysteine pattern that is highly conserved in most seed storage albumins
(Higgins, Chandler et al. 1986). Contrastingly, whereas acyclic, PA1b shares
high similarity in structure, but also in sequence, with CtM and most plant
cyclotides and in particular conserves the cystine-knot motif (figure 1.4). PA1b
is a highly potent insecticidal peptide used in several transgenic approaches to
improve crop resistance to pests in alternatives to chemical pesticide (Gressent,
Da Silva et al. 2011).
The strong structure and sequence homology between the Ctc genes and PA1,
described in the previous paragraph, suggest that they are related. In Clitoria
ternatea, the lack of protein evidence for the PA1a-like albumin from Ctc
precursor would suggest that this albumin might not be produced, whereas the
Ctc cyclotides are found in most tissues. On the other hand, in pea the PA1a
albumin is the most abundant seed-storage albumin and PA1b-type peptides
are not cyclic peptides and are only found in seeds. Taken together this
suggests that Ctc genes might have evolved from PA1-type genes in the
Fabaceae lineage (Nguyen, Zhang et al. 2011). This also suggests that
cyclisation and ubiquitous expression of the Ctc cyclotides was achieved by
divergence. However, the fate of their adjacent albumin remains unknown, but
8
is likely to have diverged from the ancestral seed storage function in regard to
the gene expression pattern.
This hypothesis also implies a strong positive evolutionary advantage for
favouring the knotted peptide production over the co-encoded albumin, which is
quite likely due to the high potency of PA1b, Ctc peptides and many other
cyclotides as insecticidal and cytotoxic molecules. Furthermore, the common
redundancy of albumin genes in plants genomes allows some degree of liberty
for divergence. In this scenario evolution achieving cyclisation of the favoured
peptide would have constituted an undeniable advantage because cyclic
peptides are traditionally much more stable than their linear counterparts.
Additionally, achieving more ubiquitous production would have provided a
serious advantage in regard to their activity. If this model of evolution is correct,
then it is also possible that in other species such dual genes would have
diverged even further and lost the albumin region to create the typical cyclotide
precursors currently known. However the origin of this dual precursor itself
remains unknown and it could likely also have appeared from the fusion of an
ancestral cystine-knot peptide precursor with an ancestral albumin precursor.
Much more investigations would be required before having a clear answer, but it
seems that cyclic peptides and seed storage proteins could be evolutionary
linked.
Thus it appears that cyclotides might have evolved from linear counterparts also
harbouring a cystine-knot motif. Interestingly, the cyclic squash trypsin inhibitors
might also have evolved from linear counterparts. Indeed, their precursors are
very likely to have evolved by internal duplication of the same motif, and are
terminated by an acyclic variant. Because these acyclic variants are almost
identical to other more traditional linear squash trypsin inhibitors (figure 1.3), it
is likely that they constitute the ancestral version, and that atypical divergence
and gene internal duplication in the squash Momordica cochinchinensis gave
rise to the cyclic squash trypsin inhibitor precursors (Mylne, Chan et al. 2012).
However, once again, more data are required to confirm this hypothesis.
9
Figure 1.5. Structure of SFTI-1. (a) The structure of SFTI-1 (cyan) was first
elucidated in complex to bovine β-trypsin (grey), PDB code 1SFI, (Luckett,
Garcia et al. 1999) and later the NMR solution structure (lighter), PDB code
1JBL, (Korsinczky, Schirra et al. 2001)), identified that the peptide did not differ
greatly in conformation to the crystal structure in complex, apart from the
orientation of the active lysine residue (in red) which is flexible; bonded
cysteines are highlighted (yellow). (b) 8 Bowman-Birk inhibitors (PDB codes
1BBI, 1D6R, 1G9I, 1H34, 1MVZ, 1SMF, 1TAB, 2BBI, line format, magenta)
overlaid to SFTI-1 (PDB code 1SFI, stick format, cyan) illustrating the structural
homology of SFTI-1 with the active arm of the Bowman-Birk inhibitor family. (c)
SFTI-1 (cyan) overlaid, in a zoom, to CtA (Cyclotheonamide A in blue) and the
previous Bowman-Birk inhibitors. The α-ketrohomo-arginine of CtA overlays
with reactive lysine of SFTI-1 (red) and the four other residues mimic one loop
of SFTI-1 reasonably well, although CtA is not wholly peptidic and display less
homology than SFTI-1 toward Bowman-Birk inhibitors, CtA PDB code (1TYN),
(Ganesh, Tulinsky et al. 1996).
1.2 Discovery of the sunflower trypsin inhibitor SFTI-1
1.2.1 Identification of SFTI-1
In 1999 a screen for protease inhibitor activities in various seed extracts was
conducted in Peter Shewry’s lab (Shewry 1999; Konarev, Anisimova et al. 2002;
Konarev, Griffin et al. 2004). In extracts from sunflower seeds, the most potent
trypsin inhibitor had a very low molecular weight, suggesting a peptidic nature,
and this sample was labelled SFTI-1 for Sunflower Trypsin Inhibitor 1. The
sequence and cyclic structure of SFTI-1 were only revealed after soaking
trypsin crystals with SFTI-1 and analysing the structure of the complex between
trypsin and SFTI-1 (Luckett, Garcia et al. 1999). Later structural analysis
conducted on a synthesised SFTI-1 revealed that its structure in complex did
not differ significantly from its solution structure, with the exception of the Lys5,
(figure 1.5.a). This lysine residue was shown to be flexible in solution and to be
responsible for the potent activity of SFTI-1 (Korsinczky, Schirra et al. 2001).
SFTI-1 is a 14 residues cyclic peptide and adopts an anti-parallel β-sheet
conformation stabilised by a single disulfide bond dividing the peptide into a
nine residues loop region, the ‘reactive loop’, and a five residues turn, the ‘cyclic
loop’ (figure 1.5.a). As its name suggests, SFTI-1 can inhibit trypsin, but also
several other proteinases including cathepsin, elastase, chymotrypsin and
thrombin (Luckett, Garcia et al. 1999).
The catalytic mechanism of serine proteases such as trypsin, chymotrypsin and
elastase has been characterised. The active site of these proteases contains
conserved residues, a histidine, a serine and an aspartic acid. These three
amino acid residues constitute a catalytic triad, each playing an essential role
for peptide bond cleavage (Kraut 1977).
The flexible lysine residue of SFTI-1 binds to trypsin by forming contacts with
two residues of the trypsin binding pocket and one of the main chains of trypsin.
This binding places the bond between the flexible lysine residue of SFTI-1 and
the next residue close enough to the histidine and serine of the trypsin catalytic
triad to allow hydrogen bonds formation. As a result SFTI-1 is locked within the
10
trypsin active site and inhibits trypsin without release, thus making SFTI-1 a
potent (Ki 0.1 nM) inhibitor of trypsin (Luckett, Garcia et al. 1999).
The structure of SFTI-1 mimics the functional arm of the Bowman-Birk protease
inhibitors (figure 1.5.b). In particular the flexible lysine residue of SFTI-1 and its
next residue perfectly overlay with the two residues of the functional arm of the
Bowman-Birk inhibitors that is responsible for the binding with proteases
(Luckett, Garcia et al. 1999). Bowman-Birk inhibitors are found in some
legumes and cereals seeds and are serine proteinase inhibitors. Bowman-Birk
inhibitors precursors are typically around 100 amino acid residues long and
about 65 for their mature forms, which possess a complex structure typically
stabilised by seven disulfide bonds. Bowman-Birk inhibitors traditionally have
two
active
sites
that
can
inhibit
two
proteinases
simultaneously
or
independently. Most of them can inhibit trypsin with one reactive site and
chymotrypsin with the second reactive site (Korsinczky, Schirra et al. 2001).
The active loops of these inhibitors are characteristic for their ability to retain
their conformation after binding to their target enzyme (Laskowski and Kato
1980; Bode and Huber 1992). Similarly, SFTI-1 retains its conformation when
complexed with bovine β-trypsin (Luckett, Garcia et al. 1999).
All these observations led to SFTI-1 being described as the smallest
Bowman-Birk-type inhibitor, but SFTI-1 differs from this family of proteases
inhibitors on several points. Although SFTI-1 structurally mimics the N-terminal
trypsin inhibitory reactive site of Bowman-Birk-type inhibitors, SFTI-1 is much
smaller (14 residues versus 65 on average). SFTI-1 also structurally differs from
these inhibitors by possessing a cyclic backbone, and has an increased potency
for inhibition. For example SFTI-1 is six times more potent than the soybean
Bowman-Birk inhibitor (Luckett, Garcia et al. 1999), the most potent known
inhibitor of this inhibitor family before the discovery of SFTI-1 (Voss, Ermler et
al. 1996), and is twice more potent than cyclotheonamide, the most potent small
serine protease inhibitor before SFTI-1 identification (Ganesh, Tulinsky et al.
1996). Cyclotheonamide is a cyclic pentapeptide, isolated from sponges, that
also mimics the trypsin inhibitory arm of Bowman-Birk inhibitors, but with a
weaker homology than SFTI-1 (figure 1.5.c).
11
Despite the extensive knowledge about its sequence, structure and biochemical
properties, the biological role of SFTI-1 in sunflower seeds is still undetermined.
It is known that SFTI-1 inhibits several proteinases, including many digestive
enzymes and particularly trypsin and chymotrypsin (the two most common
digestive enzymes of insects and mammals). Thus SFTI-1 might be produced
as a component of the sunflower seeds constitutive defence against predation
by gramnivores, such as seed-boring insects. In almost all kingdoms of life,
cyclic peptides with defence functions have been characterised. For example
the bacterial cyclic peptide subtilosin is antimicrobial (Babasaki, Takao et al.
1985; Blond, Péduzzi et al. 1999), and many other plants cyclotides have been
shown to be insecticidal or antiviral (Jennings, West et al. 2001; Daly, Clark et
al. 2006; Pelegrini, Quirino et al. 2007; Colgrave, Kotze et al. 2008). In addition
seeds generally produce and accumulate many serine protease inhibitors as a
part of their defence system against pathogens and pests (Tedford, Fletcher et
al. 2001). Nevertheless, the biological relevance of SFTI-1 as a sunflower seed
protease inhibitor remains to be demonstrated.
1.2.2 Multiple applications for SFTI-1 properties
SFTI-1 is not only a very potent trypsin inhibitor, but it also has other interesting
bioactivities. It can inhibit matriptase with a similar inhibition constant to that
against bovine trypsin (Long, Lee et al. 2001; Yuan, Chen et al. 2011).
Matriptase is a type II epithelial transmembrane serine protease that was
initially purified from human milk, but it is also produced by normal and
cancerous breast epithelial cells in culture. Recent studies have suggested that
its mis-regulation can contribute to abnormal cell proliferation, mobility, and
states of differentiation, and thereby can cause pathogenic states, such as
cancer (Lin, Anders et al. 1999). These results revealed that SFTI-1 can also
inhibit a protease of strong clinical importance with high specificity, launching
several studies to develop therapeutics adapted for specific targets by using
SFTI-1 as a backbone (Korsinczky, Schirra et al. 2001; Daly, Chen et al. 2006;
Kawakami, Ohta et al. 2009; Swedberg, Nigon et al. 2009).
For example, a modified SFTI-1 with three substitutions was generated. This
modified SFTI-1 loses its ability to inhibit trypsin and can specifically and
selectively block the activity of the human kallikrein-related peptidase 4 (KLK4),
12
a trypsin-like serine protease that can activate many tumorigenic and metastatic
pathways causing prostate cancer. Currently, there are no KLK4-specific
small-molecule inhibitors available for therapeutic development. This modified
SFTI-1 was shown to be extremely robust and to have a high potency of
inhibition for KLK4, making it an excellent candidate for further therapeutic
development (Swedberg, Nigon et al. 2009). This illustrates the high potency of
SFTI-1 as a scaffold for therapeutic molecules with high specificity. Other
analogues of SFTI-1 were successfully designed to specifically inhibit the
activity of chymotrypsin, leukocyte elastase (Pereira, Salgado et al. 2009), or
cathepsin-G (Legowska, Debowski et al. 2009), mostly by replacing the Lys5 of
the SFTI-1 active site with other residues to change its binding specificity.
These consecutive successes also drew attention to SFTI-1 from biotechnology
and pharmaceutical industries (Robinson, DeMarco et al. 2008; Pratley, Nauck
et al. 2010).
13
Figure 1.6. PawS1 and SFTI-1 sequences and structures. (a) PawS1
sequence with ER signal (rose), SFTI-1 (aqua), PawS1 albumin small subunit
(green) and large subunit (orange). Spacer regions or the GLDN tail are
coloured black. (b) PawS1 protein precursor (after ER signal removal)
hypothetical 3D peptide model in ribbon format. It was generated by homology
modelling using EasyModeller4.0 (Kuntal, Aparoy et al. 2010), using all
available napin-type albumin structures (1SM7A, 1S6DA, 1W2QA, 1PSYA,
2LVFA) as template for the albumin part of PawS1 and SFTI-1
three-dimensional structures as a template for the SFTI-1 part of PawS1.
Disulfide bonds are indicated as sticks. Model evaluation data: ProSA-web
(Wiederstein and Sippl 2007) Z-Score= -4.87, Procheck (Laskowski, MacArthur
et al. 1993) Ramachandran parameters: Procheck Ramachandran Plot
(Allowed= 90.8%, Additionally allowed= 8.5%, Generously Allowed= 0.7%,
Disallowed region= 0.0%) and Procheck G-factors (Dihedrals= -0.11, Covalent=
0.26, Overall= 0.02). (c) SFTI-1 solution structure by nuclear magnetic
resonance spectroscopy, (PDB code 1JBL, (Korsinczky, Schirra et al. 2001)),
presented in stick format. The carbon backbone is represented in cyan with the
disulfide bond in yellow, nitrogen in dark blue, and oxygen in red.
1.3 Discovery of SFTI-1 biosynthetic origins
1.3.1 Identification of PawS1 (Preproalbumin with SFTI-1)
Despite the lack of a characterised function, all the remarkable bioactivities and
potential therapeutic applications displayed by SFTI-1 have drawn attention to
this low molecular mass protein. Thus an effort was made to identify the DNA
sequence encoding SFTI-1. In 2011, Mylne et al. performed 3′ then 5′ RACE
PCR using mature sunflower seed RNA and identified the SFTI-1 encoding
sequence, which revealed something rather unexpected. The sequence
encoding SFTI-1 is buried within PawS1 (Preproalbumin with SFTI-1), which is
also a precursor for a sunflower napin-type seed storage albumin. The gene
PawS1 encodes a protein precursor of 151 amino acid residues, composed of
an ER-signal, SFTI-1 and two subunits that constitute a typical napin-type seed
storage albumin precursor, separated by spacer propeptides (figure 1.6). Mylne
et al. also found a second similar gene from sunflower, that they called PawS2.
It also encodes a napin-type albumin and a small cyclic peptide and displays
strong sequence similarities with PawS1 for the encoded albumin, but less for
the small cyclic peptide. This second cyclic peptide was called SFT-L1, for
‘SFTI-Like 1’, but cannot inhibit trypsin and no bioactivity has been determined
for it (Mylne, Colgrave et al. 2011).
In this study, both SFTI-1 and the napin-type pro-albumin adjacent to it in
PawS1 protein precursor (figure 1.6) were shown to require proteolytic
cleavage for maturation (Mylne, Colgrave et al. 2011).
1.3.2 The napin-type seed storage albumins
As described above, SFTI-1 is buried inside the N-terminal propeptide of a
napin-type albumin precursor produced in sunflower seeds. So far this type of
dual-precursor has only been found in sunflower and close relatives (Elliott,
Delay et al. 2014). Angiosperms and other seed plants produce seeds to
propagate and disperse themselves, but seeds are also the major plant organ
consumed by humans and livestock and constitute their major source of protein
(Shewry, Napier et al. 1995). Seed storage proteins can constitute over 50% of
14
total seed proteins (Shewry and Halford 2002) and therefore have been of
major interest since the 18th century and were already extensively reviewed in
1909 (Osborne 1909). Plant seed storage proteins are accumulated as a source
of nitrogen and sulphur for the germinating seedling (Shewry, Napier et al.
1995; Herman and Larkins 1999) and are classified into four categories based
on which solvents they dissolve in most readily. In seed extracts albumins,
globulins, prolamins, and glutelins can be dissolved respectively in water, saline
solution, water/alcohol mixture, and acid or basic solution (Osborne 1909).
Seed albumins are highly stable and constitute a major portion of seed storage
proteins in dicots (Shewry and Pandya 1999).
Seed albumins are found in most dicot seeds and are also referred as 2S
albumins. Despite their obvious importance as a source of nutrients for the
emerging seedling, various seed albumins have been reported to possess
bactericidal (Maria-Neto, Honorato et al. 2011) and fungicidal (Terras, Schoofs
et al. 1992; Ribeiro, Taveira et al. 2012; Freire, Vasconcelos et al. 2015)
activities. Seed albumins have also often been of particular interest in selection
programs due to their allergenic properties. For example the two principal
albumin allergen from peanuts have been silenced (Zhou, Wang et al. 2013),
and albumins with allergenic effect are found in many other seeds of
commercial importance such as Brazil nut (Nordlee, Taylor et al. 1996), yellow
mustard seeds (Menéndez-Arias, Moneo et al. 1988), castor bean seeds
(Thorpe, Kemeny et al. 1988) and peanut (Lehmann, Schweimer et al. 2006).
It appears that seed albumins represent a large and diverse group of seed
proteins. Among this diversity of seed albumins, the seed storage albumins are
the most widely represented in dicotyledonous plants seeds. The structure and
properties of seed storage albumin are exemplified by napins of oilseed rape
(Brassica napus), the first well studied seed storage albumins, with napin-type
albumins being the most common type of seed storage albumins in
dicotyledonous plants seeds (Shewry and Pandya 1999). Plants usually contain
multiple napin-type albumins genes that are often intron-less. Napin-type
albumins have a highly conserved pattern of cysteine residues and are often
rich in glutamine. Each mature napin-type albumin is processed from a
precursor with a conserved structure. A typical napin-type albumin precursor
contains an ER-signal and a pro-domain which are both removed during the
15
maturation process. During proteolytic maturation of the napin-type albumin
precursor, the napin-type albumin is often cleaved (by removal of an internal
propeptide) into a heterodimer of small and large subunits with molecular
weights of around 4 and 9 kDa respectively that remain connected by disulfide
bonds (Ericson, Rödin et al. 1986). Mature napin-type albumins generally have
eight conserved cysteine residues, two in the small subunit and six in the large
subunit (Shewry, Napier et al. 1995; Shewry and Pandya 1999). However, the
albumins from Fabaceae, such as the pea PA1a, are different and constitutea
single unit with only two internal couples of bonded-cysteines and are encoded
in a dual-precursor (Higgins, Chandler et al. 1986), as described above.
1.3.3 Seed albumins are matured by AEP
AEP belongs to the superfamily of C13 peptidases and cleaves preferentially on
the C-terminal side of asparagine or aspartic acid residues. The active site of
AEP contains a conserved catalytic dyad, constituted of one histidine residue
and one cysteine residue, which are both preceded by four conserved
hydrophobic residues and separated by approximately 35 residues. In plants,
the AEP active site also contains highly conserved residues, an arginine residue
and a serine residue that play important roles in the binding of the targeted
residue in the enzyme active site. Additionally, plant AEPs possess another
highly conserved sequence of four residues at the C-terminus that is essential
for their vacuolar targeting (Kuroyanagi, Nishimura et al. 2002).
AEP function in plants has been deeply studied and characterised. The first
report of AEP activity was in vacuolar extracts of maturing pumpkin seeds which
could convert seed pro-globulin to mature globulin (Hara-Nishimura and
Nishimura 1987). Soon after, AEP activity was demonstrated in extracts of
castor bean (Ricinus communis) extracts (Hara-Nishimura, Inoue et al. 1991).
AEP was then characterised as a cysteine protease that cleaves seed storage
protein precursors, for their maturation, on the C-terminal side of asparagine
residues preferentially or aspartic acid residues occasionally. (Hara-Nishimura,
Inoue et al. 1991; Hara-Nishimura, Takeuchi et al. 1993; Sheldon, Keen et al.
1996; Hiraiwa, Kondo et al. 1997; Hiraiwa, Nishimura et al. 1999). AEP activity
has also been studied in barley (Hordeum vulgare L.) cell walls (Linnestad,
Doan et al. 1998), soybean (Glycine max) (Jung, Scott et al. 1998) and kidney
16
bean (Phaseolus vulgaris) (Rotari, Dando et al. 2001). However sequence
analyses of several substrates of various AEPs showed a strong variability for
the residues preceding the asparagine or aspartic acid residues targeted by
AEP (Jung, Scott et al. 1998; Rotari, Dando et al. 2001). Other studies suggest
that substrate conformation is particularly important for recognition by proteases
and that most proteases usually recognize their target residues when they are
placed at an extremity of a β-strand (Tyndall, Nall et al. 2005).
Thus, the conditions that make a given asparagine or aspartic acid residue an
AEP target remain to be understood.
Further studies conducted on the model plant Arabidopsis thaliana (A. thaliana),
proved that AEP is a key enzyme essential for the processing of all seed
albumins and globulins (Shimada, Yamada et al. 2003; Gruis, Schulze et al.
2004), the major components of the seed storage proteins. A. thaliana has four
AEP genes (Kinoshita, Nishimura et al. 1995; Gruis, Schulze et al. 2004;
Nakaune, Yamada et al. 2005). These are AtAEP1 (At2g25940), AtAEP2
(At1g62710), AtAEP3 (At4g32940) and AtAEP4 (At3g20210).
In A. thaliana, the AEP genes are expressed in many plant tissues, but each
has specific expression and function. AtAEP2 is mainly expressed in seeds and
is the most important for maturation of storage proteins in A. thaliana seeds.
AtAEP1 and AtAEP3 are also expressed in seeds at lower level than AtAEP2
and can compensate for most of its function when AtAEP2 is knocked out,
AtAEP3 being more effective at seed storage protein maturation than AtAEP1,
although their activity could also be dependent upon AtAEP2 (Shimada,
Yamada et al. 2003; Gruis, Schulze et al. 2004). AtAEP4 is expressed only in
maturing flower and young seed coat and was shown to be involved in seed
coat maturation by programmed cell death of specific cell layers. It is not
present within seeds and is not involved in seed protein storage maturation
(Nakaune, Yamada et al. 2005).
It has been shown that A. thaliana aep1 aep2 aep3 triple mutants (expressing
AEP4 only) and aep quadruple mutants (aep1 aep2 aep3 aep4), also called
aep-null mutants, abnormally accumulate many precursors of storage proteins
in their seeds. However, it has also been reported that under normal growth
condition these triple and quadruple mutants have no morphological or
17
developmental differences from wild-type plants. A. thaliana aep-null lines were
reported to germinate normally under standard conditions with no morphological
or developmental difference from the wild-type (Kinoshita, Nishimura et al.
1995; Gruis, Selinger et al. 2002; Shimada, Yamada et al. 2003; Gruis, Schulze
et al. 2004). This is surprising as in higher plants, seed storage proteins are
deposited in protein storage vacuoles as a source of carbon, nitrogen, and
sulphur during seed germination. Thus remobilisation of amino acids that
compose these proteins is critical to germination and provide building blocks for
rapid growth (Shewry, Napier et al. 1995; Herman and Larkins 1999). Because
aep-null seeds don’t accumulate correctly matured seed storage proteins, one
might have expected this nutritional source used by the developing seedlings to
be less stable or harder to re-mobilise, and potentially result in germination
problems. Thus I investigated the possibility of germination delay phenotypes
for several aep mutants and I present my results in Chapter 2.
Although AEP is the major seed storage protein processing enzyme (Shimada,
Yamada et al. 2003; Gruis, Schulze et al. 2004), other functions have been
reported for AEP, in different tissues than the seed. AtAEP3 is also expressed
in stem and leaves and is up-regulated in the lytic vacuoles of vegetative
tissues during senescence and under various stress conditions (Kinoshita,
Yamada et al. 1999). Several studies suggest that AEP could have a role in
programmed cell death (see review by Hatsugai, Yamada et al., 2015 for
details). For example, AtAEP3 was reported to play an essential role in
programmed cell death of pathogen infected leaves (Kuroyanagi, Yamada et al.
2005). However, despite the up-regulation of AtAEP3 in senescing leaves, to
date no function has been reported for this gene that would imply a role in
senescence. In Chapter 3, I present the results of my investigations for a
function for A. thaliana AEPs in dark-induced senescence of individual leaves.
18
Figure 1.7. Acyclic and cyclic SFTI-1. (a) Acyclic SFTI-1 (green, PDB code
1JBN, (Korsinczky, Schirra et al. 2001)) and (b) cyclic SFTI-1 (cyan, PDB code
1JBL, (Korsinczky, Schirra et al. 2001)), shown in stick mode with the carbon
backbone represented in green or cyan, disulfide bonds in yellow, nitrogen in
dark blue and oxygen in red. Acyclic and cyclic SFTI-1 display close structural
similarity even though the acyclic version has no peptide bond between SFTI-1
first glycine and last aspartic acid residues. In SFTI-1 acyclic version these two
residues have their respective free termini maintained in close proximity.
1.4 SFTI-1 is matured by the seed albumin processing machinery
1.4.1 SFTI-1 requires AEP for proper processing in transgenic lines
The role played by AEP in SFTI-1 processing at both peptide proto-termini was
first suggested in Mylne et al. (2011). As SFTI-1 is buried in a precursor
(PawS1) for a seed albumin, generally processed by AEP (Shimada, Yamada et
al. 2003; Gruis, Schulze et al. 2004), and because, inside PawS1, SFTI-1 is
preceded by an asparagine residue and ends with an aspartic acid residue,
which are both AEP specific proteolysis sites (Hara-Nishimura, Inoue et al.
1991; Hiraiwa, Nishimura et al. 1999) AEP was likely to be the SFTI-1
processing enzyme.
It was also suggested that AEP might perform SFTI-1 cyclisation. First, The
napin-type albumin precursors form their disulfide bonds before AEP processing
(Krebbers, Herdies et al. 1988; D'Hondt, Van Damme et al. 1993; HaraNishimura, Takeuchi et al. 1993; Müntz 1998), suggesting that SFTI-1 would
also acquire a constrained acyclic structure prior to proteolytic maturation, thus
already bringing its two proto-termini closer together. This can be observed in
the solution structure of the acyclic version of SFTI-1 (PDB code 1JBN,
(Korsinczky, Schirra et al. 2001)) that shows that the first and last residues of
SFTI-1 remain in close proximity even when not linked to each other by a
peptide bond (figure 1.7). Then, the proposed sequence of events for SFTI-1
maturation, based on the fact that AEP targets preferentially asparagine
residues over aspartic acid residues (Hara-Nishimura, Inoue et al. 1991;
Hiraiwa, Nishimura et al. 1999), would be that the first glycine residue of SFTI-1
is liberated before cleavage at its last asparagine residue. These features would
create highly energetically favourable conditions for a cleavage-coupled
cyclisation reaction.
19
Figure 1.8. SFTI-1 maturation in sunflower and in A. thaliana.
MALDI-TOF-MS profiles of seed peptide extracts from sunflower as well as
A. thaliana expressing a synthetic PawS1 in wild-type (WT) and aep-null
backgrounds. A mass for acyclic SFTI-1 (1,531.82 Daltons) is the major form
detectable in WT A. thaliana. Only mis-processed products are detectable in the
aep-null background.
In Mylne et al. 2011, an pOLEOSIN:PawS1 construct expressing native PawS1
under the control of a seed specific A. thaliana OLEOSIN promoter, was
introduced in the genome of the model plant A. thaliana (Mylne, Colgrave et al.
2011). The availability of substantial mutant collections makes A. thaliana a
great model for functional studies. A. thaliana has four AEP genes and several
aep-null (quadruple null mutant) lines are available. In addition, no small
peptides from A. thaliana have ever been described and A. thaliana seed
peptide profile on MALDI-TOF-MS appears to lack any masses of significance
in the mass range and experimental conditions used (figure 1.8). This provides
a great opportunity to use A. thaliana as a genetic tool to study SFTI-1
maturation and the potential involvement of AEP in this process. Thus, the
pOLEOSIN:PawS1 construct was introduced in a wild-type and in an aep-null
background. Seeds peptide extracts of these transgenic lines were analysed by
MALDI-TOF-MS.
A mass (1,513.82 Daltons) identical to that of sunflower SFTI-1 as well as a
more abundant mass corresponding to acyclic SFTI-1 (1,531.82 Daltons) can
be detected in seeds of wild-type A. thaliana pOLEOSIN:PawS1 lines, but not in
extracts of aep-null A. thaliana seeds (figure 1.8). Further analyses have
demonstrated that the 1,513.82 Daltons and 1,531.82 Daltons masses
produced in A. thaliana pOLEOSIN:PawS1 lines correspond to cyclic and
acyclic SFTI-1 (Mylne, Colgrave et al. 2011). These data confirm that
A. thaliana can process SFTI-1 correctly from PawS1, but is less efficient than
sunflower at cyclisation (figure 1.8).
Several lines of evidence were presented to support a role for AEP in SFTI-1
processing. First, no mass for cyclic or acyclic SFTI-1 was detectable in
transgenic A. thaliana pOLEOSIN:PawS1 seeds from the aep-null genetic
background, (figure1.8); but instead masses corresponding to SFTI-1 with
additional or less amino acid residues at its N- and C- termini were detected,
corresponding to the amino acid residues sequence in PawS1 and thus
suggesting mis-processing of SFTI-1 (figure1.8). These observations are
suggesting that the enzyme (or enzymes) responsible for accurate SFTI-1
cleavage was lacking in the aep-null genetic background. Then, additional
studies revealed that mutations of the asparagine residue preceding the SFTI-1
N-terminus or the aspartic acid residue at the SFTI-1 C-terminus caused
20
mis-processing of SFTI-1 in a wild-type background (Mylne, Colgrave et al.
2011). This strongly supported the idea that AEP could be responsible for
SFTI-1 processing. Finally, using targeted mass spectrometry, unprocessed
PawS1 albumin and SFTI-1 were shown to abnormally accumulate in an
aep-null genetic background. Several tryptic fragments from the mature PawS1
albumin large subunit were used as controls and the tryptic fragment
SIPPICFPDGLDNPR, that contains SFTI-1 C-terminus and PawS1 albumin
small subunit N-terminus, was selected to represent unprocessed PawS1. The
tryptic
fragment SIPPICFPDGLDNPR
increased 160-fold in an aep-null
background relative to wild-type (Mylne, Colgrave et al. 2011), attesting that
SFTI-1 processing was strongly impaired in an aep-null background.
In wild-type A. thaliana seeds, the processing of SFTI-1 is correct, but
cyclisation is not efficient. The great majority of SFTI-1 is produced as an
acyclic form in this context, with less than 5% of SFTI-1 produced as its native
cyclic form estimated (Mylne, Colgrave et al. 2011). The sequencing of the
1,531.82 Daltons mass by MS/MS confirmed its acyclic nature and sequence.
Three possibilities could explain these results. AEP is not responsible for
SFTI-1 cyclisation at all, A. thaliana AEPs do not have efficient cyclisation
capability, or an activity that cleaves at aspartic acid is too dominant and masks
efficient macro cyclisation. The second scenario would suggest that sunflower
AEP would have evolve unusual cyclisation abilities. Additionally parallel studies
showed that other plant cyclic peptides were also processed and, in some
cases, cyclised by AEP (Mylne, Chan et al. 2012).
21
Figure 1.9. Proposed model for PawS1 maturation. PawS1 structure was
generated as described above (see figure 1.6). SFTI-1 (aqua), PawS1 albumin
small subunit (green) and large subunit (orange), spacer regions (grey) GLDN
tail (black). Napin-type albumins have a conserved proteolytic maturation
process (Hara-Nishimura, Takeuchi et al. 1993; Hiraiwa, Kondo et al. 1997;
Otegui, Herder et al. 2006) and the model for SFTI-1 maturation was proposed
in 2015 (Bernath-Levin, Nelson et al. 2015). Seed storage protein precursors
meet AEP in multi-vesicular bodies originating for the Golgi complex and
targeted to the protein storage vacuole. (a) First AEP (pale yellow shape)
cleaves the three asparagine residues preceding all mature domains of PawS1
(SFTI-1 and both PawS1 albumin subunits). (b) Then AEP performs the GLDN
tail cleavage coupled to SFTI-1 cyclisation and the PawS1 albumin remaining
cleavage is perform by another enzyme (brown shape) called aspartic protease
(Hiraiwa, Kondo et al. 1997), to release (c) mature SFTI-1 and PawS1 albumin.
1.4.2 A recombinant AEP can perform SFTI-1 maturation
In complement to the A. thaliana system approach described in the previous
section, Bernath-Levin et al. (2015) developed an in vitro approach to
investigate a potential role for AEP in SFTI-1 processing and cyclisation. In an
in situ assay they confirmed that crude extract made from sunflower seeds can
produce SFTI-1 (GRCTKSIPPICFPD) by cleavage and cyclisation of a synthetic
SFTI-GLDN precursor (GRCTKSIPPICFPDGLDN). In these artificial conditions,
the reaction appeared to be less efficient than in vivo as a high proportion of
acyclic SFTI-1 was produced.
However, the unexpected presence of acyclic SFTI-1 after incubation allowed
an additional experiment in which they introduced heavy water to the in situ
assay. By performing the in situ in
18 O-water,
a 1.9 Dalton mass increase was
consecutively observed for the acyclic version of SFTI-1 produced, but no mass
shift at all for the cyclic SFTI-1 produced. The 1.9 Da mass shift was consistent
with the incorporation of a heavy oxygen atom (from the heavy water) during
hydrolysis. This confirmed the hypothesis that SFTI-1 cyclisation does not
involve separate hydrolysis and ligation. It suggested that the macro cyclisation
reaction is coupled energetically to the cleavage reaction at the last aspartic
acid residue of SFTI-1 (Bernath-Levin, Nelson et al. 2015).
Bernath-Levin et al. (2015) also confirmed that a recombinant jack bean AEP
produced in E. coli could perform the coupled cleavage and cyclisation reaction
and generate cyclic SFTI-1 in vitro. Of five plant AEPs tested, only jack bean
AEP could perform the macro cyclisation reaction, suggesting that this activity
might be restricted to some specific AEPs (Bernath-Levin, Nelson et al. 2015).
In sunflower, which contains several AEPs, it is possible that one AEP might
have diverged from the others and acquired unusual cyclisation capability.
These data strongly support the idea that AEP is responsible for the maturation
of both mature products of PawS1 precursor (see model in figure 1.9).
Nevertheless, to date no AEP from sunflower with macro cyclisation capability
has been discovered.
22
Figure 1.10. PawS1, an unusual napin precursor. Napins are parts of the 2S
albumin class of seed storage proteins. Napin precursor structure (as shown
above) is usually highly conserved. They have a signal peptide targeting them
to the endoplasmic reticulum (ER, pink) and two subunits linked by disulfide
bonds, the small subunit (SSU, green) and a the large subunit (LSU, orange).
Napins are matured into a heterodimer by proteolytic removal of propeptide
regions (black lines). PawS1 is a unique napin precursor because it evolved to
also contain a precursor for a small cyclic peptide, SFTI-1 (blue) and its tail
(dark). This tail is not in the mature SFTI-1, but is essential for its maturation.
1.5 SFTI-1 as a model to study protein biosynthesis and evolution
1.5.1 SFTI-1 is an unusual buried cyclic peptide
In many regards, SFTI-1 is a unique type of plant cyclic peptide. First its
structure greatly differs from the traditional cystine-knot motif found in many
plant cyclic peptides and other acyclic trypsin inhibitory peptides from plants,
and consists of a twisted cyclic loop, with a single disulfide bond (Mylne,
Colgrave et al. 2011). SFTI-1, like most plant cyclic peptides described so far, is
encoded by a gene. However, unlike SFTI-1, most plant cyclic peptides are
encoded by genes encoding a single protein or variants of the motif in
multi-copies (Mylne, Chan et al. 2012), suggesting that they have undergone
extensive gene internal duplication of the cyclic peptide domain. The protein
precursor for SFTI-1 varies from all other cyclic peptide precursors. It is
encoded by the gene PawS1, which unlike other genes encoding cyclic
peptides precursors, is a dual protein gene that encodes a traditional napin-type
albumin precursor adjacent to a completely different type of protein (a small
cyclic peptide). PawS1 and PawS2 are the first members to have been
discovered (Mylne, Colgrave et al. 2011) of a family of genes restricted to
sunflower and its close relatives and encoding a dual precursor for a napin-type
albumin and a cyclic peptide (Elliott, Delay et al. 2014). These genes also differ
from the Ctc gene families, which also encode dual precursors which could
appear similar as they also encode a cyclic peptide and an albumin. However,
this albumin appears to be a non-functional version (Nguyen, Zhang et al. 2011)
of the PA1a albumin, which is different from the napin type albumins (Higgins,
Chandler et al. 1986). In addition Ctc genes contain a small intron inside the
sequence encoding the ER signal, which is unusual to napin-type albumin
genes which are traditionally intron-less, but quite common for cyclotide genes
(Nguyen, Zhang et al. 2011). Thus the originality of SFTI-1 precursor raises
many questions, but also provides unique study opportunities. For example,
SFTI-1 is produced from the dual precursor PawS1, which also encodes a
napin-type albumin, the most common type of seed storage albumin. However
PawS1 dual type of napin and additional peptide precursors is restricted to
23
sunflower and its close relatives, and most napin precursors in other plants
produce only napin (figure 1.10).
This originality provides an interesting opportunity to address current limitations
to techniques available for the study of napin and seed storage protein in
general. The study of seed storage protein maturation and trafficking, for
example, particularly suffers from the limitations of current techniques applied
for protein quantification from tissues samples. Many proteomics laboratories
have developed protein quantification methods based on mass spectrometry
detection of tryptic peptides, thus requiring digestion of the protein samples
prior to detection. However, these methods are poorly suited to studies of
protein maturation, because it is difficult to define whether tryptic peptides are
derived from the mature or immature form, and these methods cannot allow
identification of native (non-digested with trypsin or other enzymes) proteins
(Bantscheff, Lemeer et al. 2012). Other methods can allow identification of
native proteins, and thus could be used for detection of seed storage proteins,
but these methods are not suited for quantification (Karas and Hillenkamp 1988;
Fenn, Mann et al. 1989). Contrary to larger proteins, SFTI-1, as many small
peptides, is easily and precisely detectable in its native form using mass
spectrometry (Mylne, Colgrave et al. 2011; Elliott, Delay et al. 2014). Because
of their small size, small peptides are also cheap and easy to synthesise, in
order to be used as controls necessary for any quantification technique. This
allows the development of simple quantitative mass spectrometry methods that
precisely quantify such small peptides. In Chapter 4, I present a proof of
concept application for such a quantification assay, in the study of seed storage
protein trafficking. Indeed, PawS1 is an original seed storage albumin precursor
that co-produces the small cyclic peptide SFTI-1 along with PawS1 albumin
upon maturation. In transgenic A. thaliana producing PawS1, the cyclisation of
SFTI-1 is inefficient and the major form of SFTI-1 is acyclic (Mylne, Colgrave et
al. 2011). Because acyclic SFTI-1 and PawS1 albumin are produced from the
same protein precursor, processed by the same enzyme and both accumulate
in dry seeds, their respective quantity in dry seeds should be linked, as well as
their fate. Thus in dry seeds of wild type and mutants affected in seed storage
protein maturation, the variation of acyclic SFTI-1 quantity should be
proportional to the variation of total mature albumin quantity. So I propose to
use acyclic SFTI-1 quantification as a tool, to provide an alternative method to
24
compare this variation in several mutants of the seed storage protein trafficking
pathway, and gather more understanding about the complex regulation of the
fundamental process of protein trafficking, which is intimately associated with
protein maturation.
The manner by which SFTI-1 is produced from the dual precursor PawS1 also
raises questions about its biosynthesis. Its position within the precursor PawS1,
in the N-terminal propeptide that is traditionally removed during napin-type
albumin maturation, and its requirement for maturation by AEP, the central seed
storage protein maturation enzyme, suggest that SFTI-1 could be produced
opportunistically as a by-product of its adjacent albumin maturation. In
Chapter 5, I will present a study based on the use of synthetic genes in which
the sequence for SFTI-1 is put in different contexts. I introduced these genes
into A. thaliana and then using the same quantification method described in
Chapter 4, I compared acyclic SFTI-1 quantity in the seeds of each transgenic
line. By using this method, I provide new data supporting that AEP is
responsible for SFTI-1 release from its protein precursor and demonstrating that
SFTI-1 has the ability to recruit AEP for its maturation, and is not a by-product
of its adjacent albumin maturation.
Previous work also showed that perturbing the double cysteine bond or that
single alanine substitutions within SFTI-1 sequence in PawS1 preproalbumin
(e.g. R37A, T39A, P43A, P44A, I45A, F47A) abolish in vivo acyclic SFTI-1
expression (Mylne, Colgrave et al. 2011; Elliott, Delay et al. 2014). This implies
that not just any sequence will emerge from such chimeric proteins and raises
questions about how such an efficient sequence has evolved. Contrary to other
plant cyclic peptides, SFTI-1 evolution has partially been elucidated, and, as I
will elaborate later, it depicts a quite unusual path for protein evolution in
general that has been taken in sunflower and close relatives toward the
production of cyclic protease inhibitors.
25
Figure 1.11. Model for SFTI-1 evolution. PawS1 is the first member of a
recently discovered family of napin-type albumin precursors with buried
peptides that evolved within the Asteraceae family over 40 million years ago.
SFTI-1 is the first member of a family of buried peptides and appears to have
evolved de novo in a napin-type albumin gene (that became PawS1), via a
genetic expansion that did not alter the napin-type albumin coding frame and
allowed two proteins to be matured from it instead of one. (a) First the ancestral
processing site between the N-term propeptide and the first subunit of the
ancestral albumin expanded and (b) in the daisy family further expansion led to
the creation of an extra processed peptide compared to the original precursor.
(c) Later, after acquisition of the double bounded cysteines and the complete
SFTI-1 type processing sites, allowing cyclisation, the peptide became stable.
(d) Further events led to evolve a mimic of a Bowman-Birk inhibitor loop in this
peptide, and (e) in sunflower close relatives only it evolved trypsin inhibitory
activity in sunflower close relatives only and (f) diverged in SFTI-1 in sunflower.
AEP processing site are indicated on SFTI-1 precursor. During maturation
SFTI-1-GLDN is first cut out by the strong AEP preference for asparagine (red
Arrows). SFTI-1-GLDN is then processed by an AEP that specifically
recognises Asp14 on SFTI-1 and simultaneously cuts (blue arrow) and ligates it
with SFTI-1 Gly1.
1.5.2 SFTI-1 stepwise evolution
SFTI-1 is the first member of a recently discovered family of buried peptides
that evolved within the daisy (Asteraceae) family over forty million years ago
(Elliott, Delay et al. 2014). SFTI-1 appears to have evolved via a genetic
expansion in an ancestral napin-type albumin gene, without altering the mature
napin-type albumin coding frame, and thus allowed the two proteins to be
matured from one precursor (Elliott, Delay et al. 2014). In figure 1.11, I present
a model for SFTI-1 evolution, based on the original model and data from Elliott,
Delay et al. (2014). In this model, I illustrate, what I believe represents the first
example of, the natural and stepwise de novo evolution of a coding sequence
for a new independent protein within a pre-existing gene without altering the
reading frame for the ancestral protein. In Chapter 6, I compare this type of
evolutionary mechanism to other established evolutionary mechanisms and
illustrate how this peptide emerged by a mechanism that has never been
reported previously. I call this mechanism neoproteinisation and highlight the
differences
between
previously
described
evolution
mechanisms
and
neoproteinisation.
One characteristic of neoproteinisation, which I also highlight in Chapter 6, is to
create unusual genes encoding multiples protein (more than just one protein).
SFTI-1 differs from the so-called cryptides or crypteins (Autelitano, Rajic et al.
2006) because SFTI-1 is not a breakdown product of a functional protein, rather
it emerges independently from a biologically inert ‘spacer’ or pro-region of a
proalbumin that is removed during maturation into albumin. PawS1 also differs
from most polyprotein genes as it encodes two proteins differing in size,
structure, sequence and function. SFTI-1 is a 14 residues macrocycle and
inhibits several proteinases (Luckett, Garcia et al. 1999), many of which are
digestive enzymes, implicating a role protecting seeds from gramnivores. By
contrast, PawS1 albumin is a linear heterodimer of a 24 residues small subunit
connected by disulfide bonds to a 67 residues large subunit forming a bundle of
five α-helices, folded in a right-handed super-helix (see figure 1.9). Mature
PawS1 albumin contributes to a pool of sunflower seed albumins (Jayasena,
Franke et al. 2016) that are catabolised during germination to provide a nitrogen
and sulphur source for the developing seedling.
26
Based on this characteristic, I looked in the literature for other similar examples.
I
believe
that,
similarly
to
newly
discovered
evolution
mechanisms,
neoproteinisation has been overlooked, and that many other examples exist. In
Chapter 7, I present one potential example that I found and investigated that led
to the discovery of what I am sure will be a new family of small
cysteine-bounded peptides buried in larger protein precursor, namely seed
storage vicilins, that evolved by neoproteinisation.
27
1.6 Thesis aim
The overarching goal of this thesis was to use the original cyclic peptide SFTI-1
as a model to study the maturation and evolution of proteins, and to explore
eventual new functions for AEP. As the several studies that I conducted toward
that goal relate to different subjects, I present their results in separate Chapters.
The enzyme proposed to be responsible for SFTI-1 maturation is AEP. Despite
a wide expression range across many plant tissues, no function has been
confirmed for AEP apart from seed protein maturation. So another goal of the
thesis was to investigate other potential functions for AEP in plants. In
Chapter 2, I present the results of my investigation of AEP-deficiency
implications on in seed germination. Then, in Chapter 3, I present the results of
my investigation of the potential role for AEP in senescence.
The unusual nature of the PawS1 precursor also provided an opportunity to
study protein biosynthesis and evolution. In Chapter 4, I propose using SFTI-1
as a tool to estimate seed storage protein production. I present an example of
application in protein trafficking studies, and provide insights into the complex
regulation of protein trafficking and maturation. In Chapter 5, I present the study
of SFTI-1 production from various chimeric protein precursors. I provide more
evidence supporting AEP requirement for SFTI-1 maturation from its protein
precursor. I also demonstrate that SFTI-1 and its trailing GLDN sequence have
the ability to recruit AEP for precise cleavage from alternative host proteins, and
that SFTI-1 is not a mere by-product of its adjacent albumin maturation.
In Chapter 6, I compare and highlight the differences between previously
described protein evolution mechanisms and neoproteinisation. I also illustrate
how genes which have undergone neoproteinisation differ from other genes
encoding multiple proteins. The SFTI-1 family is the first example of a
neoproteinisation event, but in Chapter 7, I introduce a new family of small
proteins, also buried in larger protein precursors that are potentially widespread
among most flowering plant families. The evidence that I gathered suggest that
this other peptide family also evolved by neoproteinisation, and bolsters the
notion that neoproteinisation has been overlooked in the past and that many
more cases of neoproteinisation are waiting to be discovered.
28
Chapter 2
The importance of seed storage
protein maturation for
germination
29
Figure 2.1. One-dimensional SDS-PAGE analysis of A. thaliana seed
protein content in various aep backgrounds. Protein precursors (pro) of
albumins and globulins are indicated, as well as alpha and beta subunits of
globulins.
2
The
importance
of seed
storage
protein
maturation
for
germination
2.1 Introduction
A. thaliana aep-null lines lack mature seed storage proteins and accumulate
mis-processed seed storage proteins precursors compared to wild-type
(Kinoshita, Nishimura et al. 1995; Gruis, Selinger et al. 2002; Shimada, Yamada
et
al.
2003;
Gruis,
Schulze
et
al.
2004).
Triple
mutant
lines
aep2-3 aep3-1 aep4-1 and aep1-1 aep2-3 aep3-1 have a similar seed profile to
the aep-null, whereas aep1-1 aep3-1 aep4-1 and aep1-1 aep2-3 aep4-1 have a
similar seed profile to wild-type (figure 2.1). This mis-processing of seed
storage proteins could have an impact on seed physiology during germination
and the remobilisation of seed storage proteins. If the mis-processing of seed
storage proteins leads to germination defects, then the aep-null line and the
triple mutant lines aep2-3 aep3-1 aep4-1 and aep1-1 aep2-3 aep3-1 should
exhibit a delay in germination compared to wild-type, whereas one would expect
aep1-1 aep3-1 aep4-1 and aep1-1 aep2-3 aep4-1 to germinate similarly to
wild-type.
No abnormal germination was reported for aep-null lines, but in the previous
studies a problem with germination could have been missed if performed in
standard laboratory conditions. These experiments were not detailed and were
described as ”data not shown”, but were likely to be done on nutrient-rich
media. Most commonly, germination studies on A. thaliana are conducted on
traditional Murashige-Skoog media, often complemented with sucrose or
glucose. These nutrients help the plants to develop in these artificial conditions.
During the early germination process, the seed naturally only uses its reserves
to develop its first roots before it can use nutrients from the environment
(Shewry, Napier et al. 1995; Herman and Larkins 1999). Thus the artificial
presence of high concentrations of soluble nutrients, such as sugar or nitrogen
commonly supplemented in germination media, could have complemented
germination defects during the previous studies.
30
As no data were provided to support the report that aep-null seeds germinate as
wild-type, in this study I wanted to investigate aep-null seeds germination
abilities in more detail.
To provide a detailed analysis of a potential germination defect due to an
altered seed storage protein composition, I conducted germination tests on the
four possible combinations of aep triple mutants and aep-null lines of A. thaliana
on traditional Murashige-Skoog (MS)
media. In order to test potential
supplementation effects due to the germination media I also tested variations of
the traditional Murashige-Skoog media, but not complemented with sugar and
eventually depleted in one type of the major nutrients usually used for seed
germination medium (nitrogen, salts and vitamins).
31
2.2 Material and methods
2.2.1 Plant material
The wild-type plants used in this study were Arabidopsis thaliana ecotype
Columbia (Col-0). The A. thaliana aep-null corresponding to the aep quadruple
mutant
aep1-1 aep2-3 aep3-1 aep4-1
αvpe-1 βvpe-3 γvpe-1 δvpe-1)
expressing
(also
no
AEP,
the
known
as
triple
mutant
aep1-1 aep2-3 aep3-1, as well as the aep4-1 single mutant were generously
provided by Professor Ikuko Hara-Nishimura from Graduate School of Science,
Kyoto University, Japan. This aep-null line has a mixed genetic background of
Columbia (Col-0) and Wassilewskija (Ws-2). The A. thaliana triple mutants
aep2-3 aep3-1 aep4-1, aep1-1 aep3-1 aep4-1 and aep1-1 aep2-3 aep4-1, each
expressing only one AEP, were previously generated in my supervisor’s
research group from crosses between aep1-1 aep2-3 aep3-1 aep4-1 and the
aep4-1 single mutant. All these triple mutants also have a mixed genetic
background of Columbia (Col-0) and Wassilewskija (Ws-2).
A second aep quadruple mutant, or aep-null, aep1-3 aep2-5 aep3-1 aep4-1
(also known as αvpe-3 βvpe-5 γvpe-1 δvpe-1) with a Columbia (Col-0) genetic
background only (CS67918) was obtained from the Arabidopsis Biological
Resource Center.
2.2.2 Plant growth conditions
For each assays, seeds originated from parents grown side-by-side under
identical conditions. Parent seeds were sown on soil, stratified for 3 days at 4°C
in the dark and grown at 22°C with 16 h of light per 24 h days, with fluorescent
light of 150 μmol photons m −2 s −1 at 22°C and 65% relative humidity. Parent
plants were allowed to self-fertilise and produce seeds. Seeds were harvested
independently for each parent plant after complete natural senescence.
2.2.3 Germination assays
For seed germination assays, precisely 100 seeds per replicate of each line
were
surface-sterilised
with
chlorine
32
gas.
Seeds
were
placed
in
a
microcentrifuge tube, on a rack, inside a desiccator jar where a beaker with 100
mL bleach was also placed. Just before closing the desiccator jar 3 mL
concentrated HCl was added to the bleach to allow chlorine gas to form, and
sterilisation was conducted for 3 h. Surface-sterilised seeds were then sown,
individually for each batch of 100 seeds, on normal or modified sterile
Murashige-Skoog 1% (w/v) agar media (Austratec Pty. Ltd.) plates, eventually
depleted in salt or nitrogen, and then stratified and grown in the same
conditions described above. For each assay the whole set of plates were
randomised for position under light. Seed germination rate, established by
radicle protrusion, was then quantified every day for up to 13 days.
2.2.4 Protein extraction
Total protein was extracted from whole seeds of each tested genotype
according to Gruis, Selinger et al. (2002). Total protein was extracted from
precisely 100 seeds, crushed in liquid nitrogen and then resuspended in 200 μL
of 2% (w/v) SDS, 160 mM DTT, and 50 mM Tris-HCl, pH 6.8. Immediately after
addition of the buffer, samples were vortexed, incubated for 5 minutes in a dry
bath incubator set at 100°C and centrifuged for 10 minutes at 20,800 x g.
Supernatants were collected and vortexed before 20 μL of protein samples
were adjusted with a SDS-PAGE sample buffer to final concentrations of 62.5
mM Tris-HCl pH 6.8, 2.5% (w/v) SDS, 0.002% (w/v) bromophenol blue, 5% (v/v)
β-mercaptoethanol, and 10% (v/v) glycerol.
2.2.5 Protein 1-D gel electrophoresis
Samples adjusted with SDS-PAGE sample buffer were loaded on a precast
Bolt™ 4-12% Bis-Tris Plus Gel, fifteen wells (Thermo Fisher Scientific). 1-D gel
electrophoresis was performed for 35 min at 165 V in SDS-PAGE running
buffer: 50 mM MES (2-[N-morpholino]ethanesulfonic acid), 50 mM Tris base, 1
mM EDTA, 0.01% (w/v) SDS, using the compatible Mini Gel Tank (Thermo
Fisher Scientific). Gels were stained with 0.5% (w/v) Coomassie Brilliant Blue
G-250 (made in 50% (v/v) methanol, 10% (v/v) acetic acid), and later destained
in 40% (v/v) methanol with 10% (v/v) acetic acid.
33
Figure 2.2. A. thaliana germination rate in various aep backgrounds,
assay 1. Germination assays were conducted on 100 seeds per genotype and
per plate. For each genotype, 5 times 100 surface sterilised seeds were placed
on five identical germination media plates. Germination was assessed by
radicle protrusion counted every day at the same time from the original time of
transfer to the growth chamber.
2.3 Results
2.3.1 Preliminary study
In a preliminary experiment, I conducted germination tests on Col-0, an aep-null
line (aep1-1 aep2-3 aep3-1 aep4-1)
and each of the four corresponding
combinations of aep triple mutants of A. thaliana, on normal MS media. For
each genotype I used seeds from one plant, and did five replicates, grown for
13 days. The number of germinated seeds was determined for each plate every
day from the second day after being placed under the light, and averaged per
genotype (figure
2.2). Each genotype presented a low variability for
germination rate from one plate to another as indicated by the standard
deviation. During this assay, the aep-null demonstrated a significant (P < 1.10-4)
delay in germination compared to Col-0, the wild-type control (figure 2.2).
However, parts of these preliminary results were intriguing. Among all the other
triple
mutants,
only
aep1-1 aep2-3 aep4-1
exhibited
an
intermediary
germination rate pattern, between the aep-null and the wild-type and the three
other triple mutants geminated as wild-type (figure 2.2). However, if the delay in
germination observed for this aep-null line was due to misprocessing of seed
storage
proteins,
aep2-3 aep3-1 aep4-1
similar
and
results
should
have
been
aep1-1 aep2-3 aep3-1,
but
obtained
not
for
for
aep1-1 aep2-3 aep4-1 that should germinate like wild-type.
Thus despite the apparent delay in germination for the aep-null seeds
compared to wild-type, my first results did not provide further understanding of
the mechanisms underlying this difference.
34
Figure 2.3. A. thaliana germination rate in various aep backgrounds,
assay 2. Germination assays were conducted on 100 seeds per genotype. For
each plant, 100 surface sterilised seeds were placed on five different
germination media, with five replicates per media. Germination was assessed
by radicle protrusion counting every day at the same time than original transfer
to growth chamber.
2.3.2 Germination rate is not influenced by media composition
In a second experiment, I assessed the possibility that the outcome of the
germination assay could be biased by the composition of the germination
media. Using a similar experimental set-up as previously described, the six
different genotypes were sown on duplicates of five different media plates. The
traditional MS media used for growing plants on plates, in particular A. thaliana,
is mostly constituted of salts and nitrogen. So to test if these components could
help an impaired line to germinate as wild-type by complementing its deficiency,
I conducted tests on normal MS media, on MS media with a half dose of salts or
nitrogen, and on MS media completely depleted in salts or nitrogen. The 60
total plates were randomised for position under light and grown for 4 days.
Based on my previous results, I established that most wild-type seeds would
germinate in 4 days (figure 2.2). The number of germinated seeds was
established for each plate every day after being placed under the light, and
averaged per genotype (figure 2.3).
For each genotype, the results obtained on each duplicate of media were
similar. Surprisingly, for each genotype no difference was noticeable with
respect to rate of germination on any type of media, depleted or not in nitrogen
or salt (data not shown). Thus, all results obtained per genotype were averaged
regardless of the germination media used. The very low (less than 10%)
standard deviation at each time point and for each line attest that the different
germination medium had no influence (P > 0.05) on germination rates of these
lines (figure 2.3).
Interestingly germination rates
differed from
one genotype to another,
regardless of the germination media. Most of the genotypes exhibited a quite
different germination pattern than during the preliminary experiment. More
remarkably
the
wild-type
and
the
aep2-3 aep3-1 aep4-1
and
aep1-1 aep2-3 aep4-1 lines were the most delayed in germination, whereas the
aep-null line was the fastest to germinate during this assay.
35
Whereas intriguing, these results nevertheless demonstrate that the nutrient
composition of the media does not influence the germination rate of a given line,
and that an aep-null line can germinate normally even on nitrogen or salt
depleted media.
At this stage I realised that my experiments could be biased. Though I had
replicates, during both germination tests I used a single independent line per
genotype, but seeds from a different parent plant were being used from one
assay to the other. It was then possible that the differences observed for the
same genotype between the two assays were due to internal phenotypic
variation in this genotype.
36
Figure 2.4. A. thaliana germination rate in various aep backgrounds,
assay 3. Germination assays were conducted on seeds from eight different
plants per genotype, all harvested at the same time and stored at 25⁰C for at
least 4 weeks. For each plant, 100 surface sterilised seeds were plated on
germination media. Germination was assessed by radicle protrusion counting
every day at the same time than original transfer to growth chamber.
(a) Germination rates of seedlings of eight different plants, either Col-0 or
aep-null. (b) Averaged germination rates of aep-null seeds and wild-type ones.
2.3.3 Germination rate varies in a fixed genotype
To test the hypothesis that germination rate could be variable from one plant
seedling to another with the same genotype, I conducted a third assay on Col-0
and aep-null.
Based on my previous results suggesting no correlation between nutrient
composition of the media and germination rate, I conducted the next
germination assays on traditional MS media. For each genotype, eight
individual lines were sown on media plates. Germination rates were established
up to 4 days after light induction.
Additionally, to prevent any phenotypic variability that could be linked to the
mixed genetic background of the previously used mutants, this new assay was
conducted on a different aep-null line (aep1-3 aep2-5 aep3-1 aep4-1) with a
pure Col-0 genetic background.
As expected each genotype exhibited a great internal variability for their
germination rate (figure 2.4a). These data suggest that regardless of the seed
protein content, other factors independent from genetic information have a
strong effect upon germination rate (please see discussion on next page).
2.3.4 Impaired seed storage protein maturation does not influence
germination rate
Results obtained for all lines of each genotype were averaged and compared.
Overall, the aep-null line had no statistically significant difference in germination
rate compared to the wild-type line plant (figure 2.4b). This result indicates the
impaired seed storage protein maturation that can be observed in aep-null
seeds and in seeds of some aep triple mutants does not influence seed
germination. My findings concur with these of the previous authors (Kinoshita,
Nishimura et al. 1995; Gruis, Selinger et al. 2002; Shimada, Yamada et al.
2003; Gruis, Schulze et al. 2004).
37
2.4 Discussion
In this study, I investigated the germination rates of several aep mutants and
confirmed the previous undocumented reports attesting that aep mutants
germinate normally compared to wild-type. My results also revealed a great
variability for the germination rate in A. thaliana, unrelated to the genetic
information, which was not previously reported in the literature to my
knowledge. In previous studies Col-0 germination rate was much more uniform
(Wang, Chen et al. 2016; Waterworth, Footitt et al. 2016; Yan, Wu et al. 2014).
This suggests that in addition to the genetic control of germination, other factors
might also be involved in germination control and generate germination rate
fluctuation inside a fixed genotype lineage.
For example, an increasing number of studies have shown that epigenetic
regulations
modifies
gene
transcription
during
seed
development
and
germination and thus could contribute to the control of seed germination
(Nakabayashi, Okamoto et al. 2005; Wolny, Braszewska-Zalewska et al. 2014).
In addition, recent reports demonstrated that new epigenetic variants are
generated at a higher rate than genetic variants in homogeneous A. thaliana
population during several generations (Becker, Hagmann et al. 2011; Schmitz,
Schultz et al. 2011) and several studies also showed that epigenetic variations
can be influenced by the environment and inherited at least to the immediate
progeny (Boyko, Blevins et al. 2010; Kathiria, Sidler et al. 2010).
Thus these observations suggest that epigenetic variations between plants of a
population with a fixed genotype can lead to variability in germination control of
the progeny plants, resulting in variable germination rates for seeds originating
from different sister-plants. This mechanism has ecological significance
because it allows a fixed genetic population to have phenotypic variability for
germination rate, preventing synchronised germination of a whole population
which could represent a great disadvantage compared to time-spaced
germination in various environmental conditions.
38
Chapter 3
AEP function in dark-induced leaf
senescence
39
Figure 3.1. Relative expression of AEPs in A. thaliana. Data based on
mRNA microarray experiments from Gene Expression Map of A. thaliana
Development (Schmid, Davison et al. 2005), and the Nambara lab for the
imbibed and dry seed stages (Nakabayashi, Okamoto et al. 2005). Data are
normalised by the Affymetrix® GeneChip® Operating Software method, target
value of 100. Most tissues were sampled in triplicate. Data downloaded from the
‘‘Electronic Fluorescent Pictograph’’ Browser by B. Vinegar (Winter, Vinegar et
al. 2007), top cartoons drawn by J. Allis and N. Provart.
3
AEP function in dark-induced leaf senescence
3.1 Introduction
3.1.1 AtAEP3 expression is highest in senescing leaves
As introduced in Chapter 1, AEP function in plants has been intensively studied
and characterised. AEP is a cysteine protease that cleaves seed storage
protein precursors on the C-terminal side of an asparagine or an acid aspartic
residue for their maturation (Hara-Nishimura, Inoue et al. 1991; Hara-Nishimura,
Takeuchi et al. 1993; Sheldon, Keen et al. 1996; Hiraiwa, Nishimura et al.
1999). Four homologues of AEP have then been discovered in A. thaliana. The
A. thaliana AEP genes are expressed in many plant tissues, but each AtAEP
has specific expression (figure 3.1) and probably a different function.
In A. thaliana, the role of AEP in maturation of seed storage proteins has been
confirmed and fully detailed with AtAEP2 (mainly expressed in seeds) being the
most important for this function, but AtAEP3 can compensate (as well as
AtAEP1 to a less extent) for this function if AtAEP2 is knocked out (Kinoshita,
Nishimura et al. 1995; Gruis, Selinger et al. 2002; Shimada, Yamada et al.
2003; Gruis, Schulze et al. 2004). AtAEP3 is mainly expressed (as well as
AtAEP1 more moderately) in stem and leaves (figure 3.1). AtAEP3 is also
involved in programmed cell death of pathogen infected leaves (Kuroyanagi,
Yamada et al. 2005) and is up-regulated in leaves during senescence and
under various stress conditions (Kinoshita, Yamada et al. 1999). Despite the
fact that AtAEP3 expression is highest in senescing leaves, a role for AEPs in
senescence has not been reported.
40
3.1.2 Senescence is a natural recycling plant mechanism
Senescence is the final stage of leaf development and of every plant organ,
resulting in the death of single cells, tissues or whole plants. Plant senescence
is best known from the public in the form of autumnal leaf senescence. During
autumn in temperate regions of the world, deciduous trees develop colours of
great aesthetic value, when leaves change colour from green to yellow, orange,
red and eventually brown, before they die and fall from the tree. These colour
changes are due to the preferential degradation of specific pigments, such as
chlorophylls, over others like carotenoids or anthocyanins (Gan and Amasino
1997; Keskitalo, Bergquist et al. 2005). This transformation is just the visible
part of a much more complex process. During senescence most nutrients
contained in leaves are re-distributed to other parts of the plant before the
leaves die (Leopold 1961) to prepare for the winter. This mechanism allows
deciduous trees to retain valuable nutrients, which are often present in limiting
amounts in the soil, instead of losing them when the leaves die and detach
(Vitousek and Howarth 1991). Senescence also occurs in annual plants, and
generally the whole plant turns from green to golden as the seeds ripens before
harvest (in the case of crops) or the plant simply dies (Buchanan-Wollaston,
Earl et al. 2003). Finally, senescence can also occur in response to pathogens
or environmental stresses. These external factors can lead a plant to favour
some organs over others where senescence is consequently induced (Quirino,
Normanly et al. 1999; Weaver and Amasino 2001; Hickman, Hill et al. 2013).
During senescence, the visible transformation is accompanied by active
metabolic changes that result in the degradation of the concerned organ’s
components accumulated during development, such as chlorophylls, proteins,
carbohydrates, lipids, and nucleic acids. The nutrients are then redirected to
other parts of the plant, where they can be recycled for the production of new
vegetative, reproductive or storage organs (Watanabe, Balazadeh et al. 2013).
Many proteases are involved in senescence, particularly for protein degradation
(Roberts, Caputo et al. 2012), but no report suggests that this function has ever
been investigated for the protease AEP.
41
3.1.3 Dark-induced senescence as a tool to study senescence
In 2001, Weaver and Amasino developed an assay to induce senescence on
individual leaves by covering a few individual leaves per plant with aluminium
foil for several days. This assay mimics a natural event, the coverage of
individual leaves by other leaves, organs or other objects. Upon removal,
covered leaves from wild-type plants would exhibit a typical leaf depigmentation
and desiccation phenotype of senescence. Biochemical analysis revealed that
chlorophyll and total protein remobilisation had occurred and that several
markers of senescence such as SAG12 were induced (Weaver and Amasino
2001). Indeed, senescence appears to be a logical biological response to
recycle the contents of a covered leaf that is less efficient for photosynthesis
than uncovered leaves and to use these remobilised contents as nutrients to
build new leaves. This assay was used in several studies to identify effectors
and inhibitors of senescence such as the A. thaliana mitochondrial enzyme
NOS1 involved in nitric oxide synthesis. NOS1 inhibits senescence by reducing
the production of reactive oxygen species and oxidation of proteins and lipids
that naturally occur during senescence (Guo and Crawford 2005). It was also
used to analyse the specific genes regulation and hormone signalling (van der
Graaff, Schwacke et al. 2006) that occur in senescing leaves and to provide
detailed analysis of the chlorophyll breakdown pathway (Pruzinska, Tanner et
al. 2005), chloroplast autophagy (Wada, Ishida et al. 2009) and the early
disruption
of
the
microtubule
network
(Keech, Pesquet et al. 2010)
accompanying leaf senescence in A. thaliana. In most of these studies,
senescence was induced on individual leaves of 3 to 4 weeks old plants if
grown under long day conditions or of 8 weeks old plants if grown in short days
condition. Senescence induction of 5 days was enough to induce an easily
noticeable phenotype and was conducted for up to 9 days, which was sufficient
to result in total leaf desiccation and protein remobilisation, mimicking the
natural death by senescence of an aging leave (Keech, Pesquet et al. 2010).
To investigate in detail a possible role for AEP proteases in senescing leaves, I
artificially induced leaf senescence in aep mutant plants using dark treatment
on single leaves. I used several aep mutants to see if they would exhibit
abnormal responses to this senescence induction.
42
3.2 Material and Methods
3.2.1 Plant material
The wild-type plants used in this study were Arabidopsis thaliana ecotype
Columbia (Col-0) and Wassilewskija (Ws-2).
The A. thaliana mutant lines used here were previously described in 2.2.1.
3.2.2 Chlorophyll abundance measurement
During the senescence induction assays, at various time points (from 1 to 9
days of darkness), chlorophyll content was estimated from covered and control
(uncovered) leaves using a chlorophyll meter (or SPAD meter), a small portable
spectrometer that allowed non-destructive measurement.
3.2.3 Plant
Growth
Conditions,
selection,
crosses
and
seed
collection
All plant genotypes were propagated as in 2.2.2.
For induction of senescence assays, plants were sown on soil and grown at
22°C under short-day conditions (8 h light per 24 h day) with fluorescent light of
150 μmol photons m −2 s −1 at 22°C and 65% relative humidity. Leaves from 7 or
8 week-old plants were individually covered with aluminium foil for up to 9 days.
I chose to dark-induce senescence on leaf 8 for all tested plants. In this way, all
individually darkened leaves (IDLs) were at the same physiological state, which
minimised risks of experimental artefacts. For the same reason, I used leaves 7
and 9 from the same plant as non-induced controls, because they are of similar
physiological state as leaf 8 and constitute better controls than a non-covered
leaf 8 from a sister plant, which could be in a different physiological state due to
various external factors.
43
a
b
Figure 3.2. Chlorophyll content variation in 5 days IDLs (8 weeks old
plants). For each line, 16 plants were grown for 8 weeks before dark-induction
of senescence on IDLs. (a) For each line, an IDL is presented after 5 days of
dark-induction, associated with a corresponding control uncovered leaf from the
same plant. After 5 days of induction, the IDLs were uncovered and chlorophyll
content was estimated by a SPAD index, as well as for two control uncovered
leaves per plant. (b) Results are summarised as average values obtained for
each line.
3.3 Results
3.3.1 Preliminary study
In a preliminary experiment, I used the aep quadruple null line used in previous
studies (Mylne, Colgrave et al. 2011), and the related aep triple mutants. This
aep-null line had a mixed Col-0/Ws-2 genetic background, with two mutated
AEP genes originally isolated in a Columbia ecotype (Col-0), and the two others
from a Wassilewskija ecotype (Ws-2). As a result the various triple mutants also
had a mixed genetic background. Thus I used both Col-0 and Ws-2 ecotypes as
controls during this analysis. Because these two ecotypes have different
flowering times, all lines were cultivated under short day conditions to delay
flowering and allow a similar vegetative growth for all genetic background. After
8 weeks of vegetative growth, selected leaves were individually covered with
aluminium foil to induce senescence. Each individually darkened leaf (IDL) was
uncovered after 5 or 9 days (5d or 9d). After 5 days of induction only the
wild-type Col-0 genetic background IDLs exhibited a clear, but light, yellowing
phenotype characteristic of senescence compared to their respective uncovered
control leaves (figure 3.2). All the mutants and the wild-type Ws-2 genetic
background appeared to be much less sensitive to the dark treatment, their
respective IDLs only exhibiting a slight lightening compared to their respective
controls.
To quantify this colour change mostly due to chlorophyll remobilisation, I
estimated leaf chlorophyll content using a chlorophyll meter (or SPAD meter).
The chlorophyll meter gives values (called SPAD index) exponentially linked to
the tested leaf chlorophyll contents (Blackmer and Schepers 1995; Markwell,
Osterman et al. 1995; Uddling, Gelang-Alfredsson et al. 2007). The resulting
values (figure 3.2) reflect the phenotypes described above. A strong decrease
in chlorophyll content can be observed in the wild-type Col-0 genetic
background IDLs compared to their control, as expected for a wild-type plant.
44
a
b
Figure 3.3. Chlorophyll content variation in 9 days IDLs (8 weeks old
plants). For each line, 16 plants were grown for 8 weeks before dark-induction
of senescence on IDLs. (a) For each line, an IDL is presented after 9 days of
dark-induction, associated with a corresponding control uncovered leaf from the
same plant. After 9 days of induction the IDLs were uncovered and chlorophyll
content was estimated by a SPAD index, as well as for two control uncovered
leaves per plant. (b) Results are summarised as average values obtained for
each line.
This decrease is much less obvious for the other lines and particularly for the
aep-null line and the Ws-2 wild-type control. I also observed that the uncovered
Ws-2 wild-type leaves were on average less green than the other lines and this
was reflected by the SPAD measurement.
After 9 days of dark treatment, the induction of senescence was much more
clearly identifiable. The Col-0 and Ws-2 ecotype exhibited a clear senescence
phenotype for their IDLs as expected, but much more severe for the Col-0
phenotype than for Ws-2 (figure 3.3). By contrast the aep-null and
aep1-1 aep2-3 aep3-1 lines exhibited a stay-green phenotype in response to
the dark treatment. IDLs from the three other triple mutant backgrounds also
exhibited a yellowing phenotype similar to wild-type.
Overall these results would indicate that AEP is involved in senescence, as
aep-null does not respond to induction of senescence. Furthermore the results
obtained with the triple mutants suggest that the three AEP1, AEP2, AEP3 can
fulfil this function, but not the seed-coat specific AEP4. These results correlate
with the expression pattern reported for these respective AEPs.
As for the previous induction time, after 9 days of induction the chlorophyll
contents of IDLs and control leaves was estimated using a chlorophyll meter.
The values obtained indicated a clear and strong chlorophyll loss for the
wild-type Col-0 genetic background IDLs (figure 3.3). The aep-null and the
aep1-1 aep2-3 aep3-1 lines IDLs also showed lower chlorophyll contents than
their respective controls, but the decrease was much less important than for
Col-0, suggesting a delay in senescence induction. Intermediary results were
obtained for the three other triple mutants. Unfortunately during this assay, all
the wild-type Ws-2 plants had a light-green phenotype compared to all the other
lines tested (figure 3.3) and also more variability in their response to the
darkening treatment, as reflected by the chlorophyll contents measurements
(figure 3.3). Thus I could not obtain statistically significant results from this
analysis whereas the aep-null and the aep1-1 aep2-3 aep3-1 lines seemed to
be delayed for senescence induction compared to wild-type Col-0 ecotype line.
45
Figure 3.4. Leaf phenotype of 9 days IDLs (7 weeks old plants). For each
line an IDL is presented after 9 days of dark-induction, associated with a
corresponding control uncovered leaf from the same plant.
3.3.2 Natural variation for response to dark-induced senescence
I repeated the same experiment previously described in 3.3.1. In the previous
trial, I encountered an unexpected coloration difference between wild-type
leaves of Ws-2 and Col-0 genetic background after 9 days of senescence
induction. It appeared that the wild-type Ws-2 plants would start flowering at
approximately 9 weeks under these specific growth conditions. As a result, the
probable stress generated by flowering induction under short day conditions
might have induced a more generalised, but partial, remobilisation response in
wild-type Ws-2 plants, thus interfering with this assay. However, for the plants
induced for 5 days, this difference was not noticeable, indicating that a change
occurred in the Ws-2 plants during this time lapse, probably linked to flowering
induction. To prevent this problem, in this assay the plants were grown for 7
weeks instead of 8 in the previous assay.
After 9 days of senescence induction, IDLs were uncovered for analysis. As for
the
previous
experiment, the wild-type Col-0 plants
as
well as
the
aep2-3 aep3-1 aep4-1, aep1-1 aep3-1 aep4-1 and aep1-1 aep2-3 aep4-1 plants
all exhibited a typical yellowing phenotype indicating advanced senescence
(figure 3.4). Surprisingly, this time aep-null and aep1-1 aep2-3 aep3-1 IDLs did
not exhibit the stay green phenotype observed during the first trial. Instead they
seemed to have also undergo senescence, but to a less severe extent,
suggesting a delay in senescence. Unfortunately, the wild-type Ws-2 plants also
had a similar reaction, suggesting a possible genetic background dependent
response. This strongly suggests that Col-0 and Ws-2 ecotypes have very
different senescence responses.
46
Figure 3.5. Chlorophyll content kinetic in IDLs (7 weeks old plants). For
each line, 16 plants were grown for 7 weeks before IDLs senescence induction.
After up to 9 days of induction the IDLs were uncovered and chlorophyll content
was estimated by a SPAD index. Results are summarised as average values
obtained for each line. (a) Col-0 and Ws-2 display different chlorophyll
remobilisation profiles, with some aep mutants displaying intermediate profiles.
(b) Most aep mutants exhibit a similar response than wild-type Col-0 to
senescence induction.
To further investigate this possibility a second set of plants were induced and
IDLs were uncovered after 3, 6 or 9 days to evaluate their chlorophyll contents
with a chlorophyll meter. For all lines the chlorophyll contents of IDLs
decreased, establishing a specific remobilisation pattern for each line (figure
3.5).
Interestingly, wild-type Col-0 and Ws-2 genetic background present a different
remobilisation pattern during dark-induced senescence, suggesting a different
regulation of senescence in these two ecotypes. Most triple mutants had a
wild-type
However,
remobilisation
the
pattern
pattern, correlating the similar
of
chlorophyll
remobilisation
visual changes.
for
aep-null
and
aep1-1 aep2-3 aep3-1, that exhibit a delay in yellowing compared to Col-0, but
similar to Ws-2, seem to be combinations of Col-0 and Ws-2 chlorophyll
remobilisation pattern. This strongly suggested that the genetic background
variation introduced a bias in this study, and for this reason, it was impossible to
get statistically clear results, even if aep-null and aep1-1 aep2-3 aep3-1
chlorophyll remobilisation pattern clear suggest a delay in senescence induction
compared to Col-0 in this assay, which confirms the phenotypic observations.
Nevertheless, whether the senescence induction delay observed for aep-null
and aep1-1 aep2-3 aep3-1 is due to their genetic background or is actually due
to the lack of enzyme is still to be determined.
3.3.3 AEP is not involved in dark-induced senescence
As previously said, Col-0 and Ws-2 A. thaliana ecotypes have very different
flowering time, Col-0 is a late-flowering ecotype and Ws-2 is rather an
early-flowering ecotype, indicating different regulations of flowering initiation.
Though flowering is known to induce senescence, interactions between
flowering regulation and dark induced senescence regulation are poorly
understood or studied. However, a recent study suggested the signalling
pathways involved in the regulation of flowering and dark-induced senescence
might be connected (Breeze, Harrison et al. 2011).
47
Figure 3.6. Leaf phenotype of wild-type IDLs. Eight Col-0 plants were grown
for 7 weeks before individual leaf coverage for dark induced senescence assay.
For each plant, IDLs after 1, 3, 5 or 7 days of senescence induction are
presented associated with a corresponding non-covered control leaf.
It appears that further analysis of the natural variability that exists between the
Col-0 and Ws-2 ecotypes for senescence could bring more insight on the
regulation of initiation and inhibition of senescence. However this was not the
subject of this study aiming to identify the function of AEP in senescing leaves,
and rather represented a serious issue, that I will explain now.
The aep-null line I was using for this study had a mixed genetic background of
the two A. thaliana ecotypes, Col-0 and Ws-2. Thus, in addition to the AEP
mutations, these mutants also have a different allelic combination of genes.
Because Col-0 and Ws-2 demonstrate different senescence responses it is
likely that they possess
different alleles of important and unidentified
senescence regulators. The resulting mix of these regulators in the aep-null
line, and in the aep triple mutant lines too, makes it difficult to compare these
mutants to their wild-type parents for AEP specific responses to dark induced
senescence.
However aep-null and aep1-1 aep2-3 aep3-1 lines repeatedly exhibited a
different response to the induction compared to both Col-0 and Ws-2 wild-type
lines, suggesting that AEP might be required for dark-induced senescence. To
elucidate this possible involvement of AEP in senescence and to counter the
unexpected outcome of my previous assays, I ordered a new aep-null line from
the Arabidopsis Biological Resource Center (Ohio, USA). This aep quadruple
mutant is exclusively resulting from the original crossings of Col-0 ecotype
single aep mutant alleles, and thus has a fixed Col-0 genetic background.
I repeated the previous experiment with this new 100% Columbia aep-null line,
and a wild-type Col-0 line. This assay revealed that this mutant has no
abnormal phenotype related to dark-induced senescence (figures 3.6 and 3.7).
It is clear here that beside some small internal variability, the aep-null line
(figure 3.7) exhibits the same response to senescence induction than the
wild-type line (figure 3.6). Chlorophyll content profiles (not shown) were almost
identical and presented no significant difference (P > 0.05). These results imply
that the differences that I observed in my previous assays between aep-null and
aep1-1 aep2-3 aep3-1 lines were more likely to be due to the mixed genetic
background of the mutants and not to the aep mutations. This also suggests
convincingly that AEP has no involvement in dark induced leaf senescence.
48
Figure 3.7. Leaf phenotype of aep-null IDLs. Eight aep-null (in a pure Col-0
background) plants were grown for 7 weeks before individual leaf coverage for
dark induced senescence assay. For each plant, IDLs after 1, 3, 5 or 7 days of
senescence induction are presented associated with a corresponding
non-covered control leaf.
3.4 Discussion
In this study, I investigated the possible involvement of A. thaliana AEPs in
senescence. Using genetic knock-out lines I showed that plants totally deficient
in AEP would respond similarly to wild-type plants to dark-induced senescence.
Although several proteases have been shown to be crucial for protein
remobilisation during senescence (Roberts, Caputo et al. 2012), and A. thaliana
AEP1 and AEP3 are preferentially expressed in leaves, particularly senescing
leaves (Kinoshita, Yamada et al. 1999; Yamada, Shimada et al. 2005), my
results suggest that A. thaliana AEPs are not essential to achieve dark induced
leaf senescence.
Thus, an AEP function in senescing leaves remains unknown, and the reason
for high expression of AEP3 in senescing leaves remains unexplained.
However, AEP could still be involved in some aspect of leaf senescence, and
contribute to the degradation of some of the re-mobilised protein content, but
much finer and exhaustive analysis than the one I conducted here would be
required to investigate such hypothesis, such as investigating the degradation
of leaf proteins during senescence relative to the presence of AEP in the leaves
or not. However, AEP is surely not a major protease required for senescence,
and gathering more knowledge of AEP function during senescence won’t help
understanding how plants resist senescence, which is the main goal of most
studies on leaf senescence.
49
Chapter 4
Using SFTI-1 to study protein
trafficking defects
A proof-of-concept study
50
Figure 4.1. PawS1 albumin and SFTI-1 biosynthesis. 1 PawS1 gene (blue
helix) is transcribed into mRNA (red curve). 2 mRNA exit the nucleus via
nuclear pores and is recognised by ribosome for translation into protein (black).
3 Translation continues on the ER, where the protein is integrated, the peptide
signal is cut (purple) and disulfide bonds form. 4 Protein precursors are
secreted from the ER to the Golgi complex. MAG2, located on the ER, and
MAG4, located on the Golgi complex, are involved in this process. 5 In the Golgi
complex, VSR1 recognises and binds the C-terminal region of seed storage
protein precursors. 6 VSR1 is also involved in seed storage protein precursors
sorting to multivesicular bodies via Golgi vesicles. 7 In the multivesicular bodies,
seed storage protein precursors are sequentially processed as they traffic to the
protein storage vacuole. 8 AEP cuts PawS1 at the three C-term extremities of
the acid aspartic residues preceding SFTI-1, the small and the large PawS1
albumin subunits. 9 Napin-type pro-albumins are further processed by an
aspartic peptidase. 10 SFTI-GLDN is simultaneously cleaved and cyclised by
AEP into SFTI-1. 11 After delivery of mature seed storage proteins in the
protein storage vacuole, VSR1 is recycled by the retromer complex (which
contains MAG1, VPS35 and other proteins) back to the Golgi complex.
4
Using SFTI-1 to study protein trafficking defects
4.1 Introduction
As described previously SFTI-1 is encoded by a multiprotein gene also
encoding a napin-type albumin, called PawS1. The full details of SFTI-1
biosynthesis are still to be confirmed, but the processing pathway for napin-type
albumins is well characterised and allows us to infer a robust description of
PawS1 maturation, which is as follows (figure 4.1). After translation, the PawS1
protein precursor is transported to the endoplasmic reticulum. Upon entry in the
endoplasmic reticulum, the PawS1 pre-pro-protein (151 amino acid residues) is
cleaved and the ER-signal peptide is removed to release PawS1 pro-protein
(130 amino acid residues). As most seed storage precursors, napin-type
albumin precursors are then transported to the Golgi complex where they
accumulate in specific buds that give rise to electron-dense vesicles released in
the cellular matrix (Hillmer, Movafeghi et al. 2001). In the same way, protease
precursors involved in the maturation of seed storage proteins are also released
from the Golgi complex in separate specialised vesicles. These two types of
vesicles fuse with each other to give rise to multivesicular bodies (Otegui,
Herder et al. 2006). Then, as they traffic to protein storage vacuoles in
multivesicular bodies, napin-type albumin precursors are processed by AEP
and aspartic
peptidase, to release the disulfide-bonds-linked-two-subunit
napin-type albumins (Hara-Nishimura, Takeuchi et al. 1993; Otegui, Herder et
al. 2006), and SFTI-1 in the case of PawS1 (Mylne, Colgrave et al. 2011;
Bernath-Levin, Nelson et al. 2015). AEP cleaves PawS1 at three points to
release the two albumin subunits and SFTI-1-GLDN. Then, the GLDN trailing
sequence of SFTI-1 is cut by AEP and SFTI-1 is simultaneously cyclised by a
head to tail ligation between the first and the last residues of SFTI-1 mature
sequence (Mylne, Colgrave et al. 2011; Bernath-Levin, Nelson et al. 2015). The
internal propeptide separating the two albumin subunits is then traditionally
removed from the small subunit by the action of aspartic peptidase (Hiraiwa,
Kondo et al. 1997)
51
Protein biosynthesis and maturation have been extensively studied and seed
storage proteins have often been used as models to study this process due to
their abundance and major contribution to the worldwide food production
(Shewry, Napier et al. 1995). The trafficking of seed storage protein during
maturation is a very complex process, and although the key events happening
during this process are well understood (Otegui, Herder et al. 2006), few
regulators have been characterised, mostly in A. thaliana.
The A. thaliana mutant lines mag2 and mag4 are altered in their transport of
seed storage protein precursors from the endoplasmic reticulum to the Golgi
complex. Their seed cells develop a large number of abnormal structures
containing storage protein precursors and these precursors accumulate in dry
seeds (Li, Shimada et al. 2006; Takahashi, Tamura et al. 2010). MAG2 is a
peripheral membrane protein of the ER involved in the transport of seed storage
protein precursors between the ER and the Golgi complex in plants (Li,
Shimada et al. 2006). MAG4 was shown to be localised on the Golgi complex in
A. thaliana root and cultured protoplast cells and is involved in the transport of
seed storage protein precursors from the ER to the Golgi complex in plants
(Takahashi, Tamura et al. 2010). However, the mode of action of these two
proteins remains unknown and their role in seed storage protein trafficking is
still poorly understood.
The A. thaliana various vps35 double and triple mutants, as well as the mutant
lines mag1 and vsr1, are altered in sorting proteins from the Golgi complex to
protein storage vacuoles in seeds. They abnormally accumulate the precursors
of storage proteins in their seeds as well as mis-sort storage proteins and
partially secrete them out of the seed cells (Shimada, Fuji et al. 2003; Shimada,
Koumoto et al. 2006; Yamazaki, Shimada et al. 2008). VSR1 has been shown
to bind the C-terminal peptides of seed storage protein precursors, which
suggests that it acts as a specific receptor of seed storage proteins and targets
them from the Golgi complex to the protein storage vacuoles (Shimada, Fuji et
al. 2003). VPS35 (three copies in A. thaliana) and MAG1 (also called VPS29)
are part of the retromer complex and were shown to interact together and to
co-localise in pre-vacuolar compartments. Thus MAG1 and VPS35 are thought
to interact in the recycling of VSR1 from the pre-vacuolar compartments back to
the Golgi complex (Shimada, Koumoto et al. 2006; Yamazaki, Shimada et al.
52
2008). Nevertheless, as for MAG2 and MAG4, the function of MAG1 and
VPS35 in seed storage protein trafficking remain largely unclear.
Although all of the mutants in these studies displayed seed storage protein
processing defects, there was no quantitative data regarding the levels of
mature or misprocessed seed storage proteins in the mutants. Indeed, in
contrast to the relative simplicity in observing the accumulation of the seed
storage proteins precursors in these mutants (Shimada, Fuji et al. 2003; Li,
Shimada et al. 2006; Shimada, Koumoto et al. 2006; Yamazaki, Shimada et al.
2008; Takahashi, Tamura et al. 2010), no quantification methods are currently
available
(Karas and Hillenkamp 1988; Fenn, Mann et al. 1989; Bantscheff,
Lemeer et al. 2012) that can allow to compare, relatively simply, but precisely,
the amount of mature seed storage protein versus the amount of unprocessed
seed storage proteins present inside the seeds. Thus, alternative methods to
assess the accumulation of mature seed storage proteins in seeds from these
mutants would be of great value and could help to clarify the functions of
effectors of the trafficking pathways of seed storage proteins.
The biosynthetic output of the seed storage protein precursor PawS1 gives an
unusual opportunity to address this problem. In this study, I developed an assay
for acyclic SFTI-1 relative quantification, and monitored acyclic SFTI-1
production in seeds of transgenic A. thaliana producing PawS1 in a wild-type
background and in various seed storage protein trafficking mutants. I used
acyclic SFTI-1 relative quantification as a tool to evaluate the relative quantity of
mature seed storage protein in these lines. Based on these results, I propose
additional explanations for the previously described phenotypes of the several
trafficking mutants used in this study.
53
Forward
primer
Sequence
Reverse
primer
Sequence
Associated
Product
genotype
length
when +
F1
F3
F5
CGTTGGCTCGCTTGTTATTT
GGATAAAAGTTTCGGCGTCA
TTTTTGACTGTTGTAGGAAATCC
R2
R4
R6
TCGCCAACATCGTTGTAATG
AAACTCCAGAAACCCGCGGCTGAG
ATATTGACCATCATACTCATTGC
MAG1
mag1-1
mag1-2
884
620
460
F7
F9
AGCACGATCCACAGGAAGTT
AATAATCGAACGATCCAAATCAGT
R8
R6
ACCAGAGCTGGTTAGTTCAATTTC
ATATTGACCATCATACTCATTGC
MAG2
mag2-3
750
430
F10
F14
F12
F15
GCCTTATAGGTAATCCGGAACG
GGTGTATGCCAGGCTTTGTT
TTCCTGACAGAGCTTCTTGTTG
TAGCATCTGAATTTCATAACCAATCTCGATACAC
R11
R11
R13
R13
GAACTGCTGCGAAGAAGATTGT
GAACTGCTGCGAAGAAGATTGT
CCATTCCCTGATGAGAGAGC
CCATTCCCTGATGAGAGAGC
MAG4
mag4-1
MAG4
mag4-2
506
513
655
400
F17
F6
TGATTTGTACCACCTGCGAGT
ATATTGACCATCATACTCATTGC
R18
R18
AGCATGTTACTAGTGATGCCTAC
AGCATGTTACTAGTGATGCCTAC
VSR1
vsr1-2
585
450
F19
F23
AACCAATTGCATACGAATTTTTCACCC
TGGTTCACGTAGTGGGCCATCG
R20
R20
GCGTAGTTGCAAAGAATGATTCAGCAG
GCGTAGTTGCAAAGAATGATTCAGCAG
VPS35b
vps35b-1
862
500
F21
F21
CGTTTACCCAACTACAGATTATTTTCA
CGTTTACCCAACTACAGATTATTTTCA
R22
R23
TTCAGCAGTTCGACATATAGAGACACC
TGGTTCACGTAGTGGGCCATCG
VPS35c
vps35c-1
1370
950
Figure 4.2. A. thaliana MAG1, MAG2, MAG4, VSR1, VPS35b and VPS35c
genes. Filled boxes, exons; solid lines, introns; thin lines, untranslated region.
The position of each mutation and T-DNA insertion is indicated, but the T-DNA
is not shown to scale. Orange, pAC161; dark-purple, pBI121; light-purple,
pDAP101; green, pROK2. The position of each primer used for genotyping is
indicated with a red arrow (see table for primer sequence). See full genes
sequences in supplemental figures 9.1.
4.2 Material and Methods
4.2.1 Plant material and growth conditions
The wild-type plants used in this study were Arabidopsis thaliana ecotype
Columbia (Col-0). The A. thaliana homozygous (single insertion locus)
pOLEOSIN:PawS1 transgenic Col-0 lines used in this study, producing a
PawS1 protein precursor, were generated for a previous study (Mylne, Colgrave
et al. 2011). The mutant lines mag1-1 (T-DNA insertion line, see (Shimada,
Koumoto et al. 2006)), mag1-2 (GABI_G037C03), mag2-2 (complete gene
deletion mutant, see (Li, Shimada et al. 2006)), mag2-3 (GABI_391C01),
mag4-1 (partial gene deletion mutant, see Takahashi, Tamura et al. 2010),
mag4-2 (SAIL_1281_G08), vsr1-2 (GABI_503A05) and the double mutant
vps35b-1c-1 (SALK_039689 crossed with SALK_099735) were generously
provided by Professor Ikuko Hara-Nishimura from the Graduate School of
Science, Kyoto University, Japan. All plants were grown as in 2.2.2.
4.2.2 Plant genotyping
For further plants crosses, plants were genotyped to select homozygous
individuals. Genomic DNA of mag1-1, mag1-2, mag2-2, mag2-3, mag4-1,
mag4-2, vsr1-2 single mutants and vps35 b-1c-1 double mutant plants was
extracted as described in Edwards, Johnstone et al. ( 1991). Homozygous and
double homozygous plants were selected using polymerase chain reaction
(PCR) and specific primers (figure 4.2). Later in the project, the same method
was used to select what I call hybrid lines, homozygous for one of these
mutations and carrying a transgenic pOLEOSIN:PawS1 insertion (selected by
another means, see next paragraph).
54
4.2.3 Plant crossing and selection
Plants homozygous for the mutations mag1-1, mag1-2, mag2-2, mag2-3,
mag4-1, mag4-2, vsr1-2 and double homozygous plants vps35b-1c-1 were
crossed with a pOLEOSIN:PawS1 transgenic A. thaliana homozygous line. To
prevent the eventuality of the pOLEOSIN:PawS1 transgene being linked to one
of the mutant loci (thus resulting in unsuccessful crosses), two different
pOLEOSIN:PawS1 transgenic A. thaliana lines were used. The homozygous
mutant lines were used as female parents and the pOLEOSIN:PawS1
transgenic as male parents to simplify the first selection stages. The selection
marker used in pOLEOSIN:PawS1 construct confers plant resistance to
glufosinate ammonium, thus the successful crosses can easily be selected by
Basta® herbicide selection. The F1 (first filial generation) plants resulting from
the crosses were sown and grown on soil under the same condition previously
described. At the cotyledon stage, F1 plants were selected for the presence of
pOLEOSIN:PawS1 by Basta® spray at a final concentration of 0.4 g/L
glufosinate ammonium (Bayer Cropscience Pty Ltd). Surviving F1 plants were
allowed to grow and self-fertilise. F2 (second filial generation) plants were
selected for the presence of pOLEOSIN:PawS1 with Basta® herbicide as the
previous generation and then surviving plants homozygous for desired
mutations were selected by genotyping as described above. From this stage, all
the plants selected were originating from the same pOLEOSIN:PawS1
transgenic line with which all crosses were successful. With most trafficking
mutations being deleterious when homozygotes, the targeted F2 genotype were
grown to generate a larger F3 generation of hybrids for further analyses. F3
plants were also selected for the presence of pOLEOSIN:PawS1 with Basta®
herbicide and the survivors allowed to self-fertilize.
55
4.2.4 Segregation analysis
Further selection steps were conducted to obtain hybrids also homozygous for
the pOLEOSIN:PawS1 transgene. Seeds of F3 plants were surface-sterilised
and then sown on sterile Murashige-Skoog media, complemented with 40 μL of
Basta® herbicide per litre of media, and grown in the same conditions
previously
described in 2.2.2. Segregation of Basta® resistance was
determined a week later by the ratio [living plants / total plants]. Seeds of F3
plants 75% resistant to Basta® herbicide were considered as issued from an
heterozygous plant for the pOLEOSIN:PawS1 transgene and seeds of F3 plants
100% resistant to Basta® herbicide were considered as issued from an
homozygous plant for the pOLEOSIN:PawS1 transgene. Selected F4 plants
were genotyped as described above to confirm the homozygous stage of the
desired mutation, and then allowed to self-fertilise. F5 plants were grown as
previously described to multiply the few desired hybrids obtained and F6 seeds
were collected for further analyses.
4.2.5 Seed peptide extraction
Peptides were extracted from 100 A. thaliana F6 seeds, for ten plants of each
hybrid generated. Seeds were placed in a 2.0 mL plastic tube with a 3/16"
stainless steel ball, frozen in liquid nitrogen and shaken for 1 min at an
oscillation frequency of 30 Hertz in a Retsch mixer mill MM301. Tissue powder
was resuspended in 0.25 mL methanol and 0.25 mL chloroform. For phase
separation, 0.1 mL of 0.05% (v/v) trifluoroacetic acid was added and mixed for 2
min at room temperature before centrifugation for 5 min at 20000 x g at room
temperature. The aqueous phase was retained and a 50-fold dilution from this
extract was made with 50% (v/v) acetonitrile 0.1% (v/v) trifluoroacetic acid
before analysis by MALDI-TOF-MS.
56
4.2.6 Acyclic SFTI-1 detection and quantification
The matrix used for MALDI-TOF-MS was prepared by adding an excess of
α-cyano-4-hydroxycinnamic acid matrix (Fluka) to 90% (v/v) acetonitrile 0.1%
(v/v) trifluoroacetic acid, followed by sonication in a water bath for 15 min and
then centrifugation for 5 min at 10,000 x g. The supernatant was diluted 10-fold
in acetonitrile to make the final matrix solution used in MALDI-TOF-MS.
For acyclic SFTI-1 detection in each seed peptide extract, the previously diluted
aqueous phase was mixed in equal proportion with the final matrix solution, and
2 μL of this mix were spotted onto MTP AnchorChip™ 384 plate. Dried spots
were analysed with an UltraFlex III MALDI-TOF/TOF mass spectrometer
(Bruker Daltonics) at 30% laser intensity. For each spot, ten spectra, each
corresponding to 500 shots, were accumulated to generate the final spectrum
used for acyclic SFTI-1 detection. Spectra were analysed using FlexControl
software (Bruker Daltonics).
To quantitate acyclic SFTI-1, I adopted the widely accepted assumption that
peptide abundance directly correlates with the integrated area under the curve
(AUC) of precursor ion spectra (Fenyö and Beavis 2008; Benk and Roesli
2012). The standard I selected for acyclic SFTI-1 quantification is a synthetic
acyclic SFTI-1 mimic with a diselenide bond. Diselenide bonds are excellent
structural mimics of disulfide bonds and the dominant Se78 and Se80 isotopes
give this peptide a distinctive isotopic pattern and higher molecular mass than
natural acyclic SFTI-1, thus making this diselenide version of acyclic SFTI-1 a
suitable structural analogue for use as an standard (Duncan, Roder et al. 2008).
Therefore, acyclic SFTI-1 quantity in a sample can be directly deduced by
comparing AUC for the mass corresponding to acyclic SFTI-1 (1531.8 Daltons)
relative to AUC for the mass corresponding to the exogenously spiked acyclic
SFTI-1 with a diselenide bond (1627.7 Daltons). To introduce the standard, the
aqueous phase of each diluted sample was mixed in equal proportion with the
standard also diluted in 50% (v/v) acetonitrile 0.1% (v/v) trifluoroacetic acid to a
concentration of 2 pM. This concentration was previously determined to be
enough for quantification, but not to suppress the acyclic SFTI-1 signal in the
sample. This mix was then diluted to 50% (v/v) with the matrix solution and the
same MALDI-TOF-MS analysis as for acyclic SFTI-1 detection was conducted.
57
Areas of the peaks at 1531.8 Daltons and 1627.7 Daltons corresponding
respectively to the acyclic SFTI-1 ion and the control synthetic peptide ion were
calculated with the provided Bruker software. For each seed peptide extract, the
relative acyclic SFTI-1 quantity was then obtained by the ratio of AUC (1531.8
Daltons) to AUC (1627.7 Daltons). This MALDI-TOF-MS analysis is similar to
the one previously described in Bernath-Levin et al. (2015). For each plant
tested (10 per hybrid), four technical repeats were performed, corresponding to
four independent analyses of the same seed peptide extract. Final results were
normalised relative to the parental transgenic line expressing PawS1 in a
wild-type genetic background.
4.2.7 Protein 1-D gel electrophoresis
To check the seed protein contents of the different mutants, 12 μL of the same
seed extracts previously used for MALDI-TOF-MS analyses were adjusted with
a SDS-PAGE sample buffer and analysed by protein 1 D gel electrophoresis as
described already in 2.2.5. All replicates previously used for MALDI-TOF-MS
analyses were tested with apparent identical results (data not shown). For figure
4.3, all replicates for each line were combined in equal proportion. For each
line, 12 μL of this combined sample were adjusted with a SDS-PAGE sample
buffer and analysed by protein 1 D gel electrophoresis as previously described.
58
4.3 Results
4.3.1 Generation of hybrid lines
In this study, I generated what I refer to as hybrid A. thaliana lines, homozygous
for a pOLEOSIN:PawS1 transgene and for one of the mag1-1, mag1-2, mag2-2,
mag2-3, mag4-1, mag4-2, vsr1-2 or vps35b-1c-1 mutations. These mutations
are responsible for abnormal accumulation and mis-trafficking of seed storage
protein precursors. All hybrids were obtained from the same pOLEOSIN:PawS1
single locus insertion line to ascertain identical PawS1 production in all hybrids.
Successful hybrid lines were selected using PCR to select homozygous lines for
the mag1-1, mag1-2, mag2-2, mag2-3, mag4-1, mag4-2, vsr1-2 or vps35b-1c-1
mutations, and using segregation analyses of the selective marker (Basta
resistance) to select homozygous lines for the pOLEOSIN:PawS1 transgene.
4.3.2 SFTI-1 as a tool to monitor seed storage protein maturation
All the mutants used in this study were reported to abnormally accumulate seed
storage protein precursors. Whereas no quantitative data were produced,
comparison of protein profiles suggests that mag2-2 and mag2-3 seeds might
accumulate similar levels of mature albumin than wild-type (Li, Shimada et al.
2006), as well as mag4-1 (Takahashi, Tamura et al. 2010) and vps35b-1c-1 dry
seeds (Yamazaki, Shimada et al. 2008). On the contrary, immunoblot analyses
suggest that mag4-2 seeds produces more mature albumin than wild-type or
mag4-1 seeds (Takahashi, Tamura et al. 2010). Similarly comparison of protein
profiles suggests that, compared to wild-type, albumin accumulation is
increased in mag1-1 seeds and decreased in mag1-2 seeds (Shimada,
Koumoto et al. 2006). Only the study of vsr1-2 did not allow any inference
toward levels of seed albumin production (Shimada, Fuji et al. 2003).
Although these mutants all accumulated unprocessed seed storage proteins,
many appear to have wild-type levels or even higher levels of properly matured
seed storage proteins, with no obvious correlation with the proposed function of
mutated genes. Surprisingly this was not commented on in these manuscripts.
59
Figure 4.3. Protein profile and acyclic SFTI-1 (acSFTI-1) quantity in
A. thaliana seeds of various backgrounds. Soluble proteins were extracted
from 100 seeds of wild-type and various trafficking mutants. Relative production
of acyclic SFTI-1 in 100 seeds based on MALDI-TOF-MS analysis of seed
peptide extracts. Stars indicate significant difference with wild type.
To understand the processes deregulated in the various mutants I developed a
simple method for precise relative quantification of acyclic SFTI-1. I introduced
the PawS1 expression construct in various mutant backgrounds and used this
semi quantitative method to compare acyclic SFTI-1 production level in the
seed of the resulting lines. Because acyclic SFTI-1 production is bound to the
PawS1 albumin production, relative variation of acyclic SFTI-1 quantity in the
seeds should reflect relative variation of the level of properly matured storage
protein accumulated in the seeds (figure 4.3).
The level of acyclic SFTI-1 produced in seeds of hybrids derived from mag2-2,
mag2-3, mag4-1 and vps35b-1c-1 did not differ significantly (P > 0.05) from the
level obtained for seeds of the pOLEOSIN:PawS1 transgenic Col-0 line. On the
contrary, compared to the pOLEOSIN:PawS1 transgenic Col-0 line, acyclic
SFTI-1 production was significantly (P < 0.0001) reduced to 40% in the mag1-2
derived hybrid and to less than 20% in the vsr1-2 derived hybrid, and increased
to 170% or 220% respectively in the hybrids derived from the mag4-2 and
mag1-1.
My results correlate with the relative quantity variations for mature albumins in
these mutant seeds noticeable on a protein gel (figure 4.3). The 1-D protein gel
is not a quantitative method, but for each sample, the same quantity of seeds
and volumes were used so that the variations observed on gel reflect the
variations between seeds contents of the different mutants. Similar procedures
were used for the MALDI-TOF-MS analyses. The relative quantity variations
observed on figure 4.3 for acyclic SFTI-1 seed contents seems to correlate with
the relative quantity variations of mature albumins and globulins in these same
seeds that can be deduced from figure 4.3.
Overall acyclic SFTI-1 quantity seems to be a good indicator of mature
albumins and globulins quantity in the various mutant seeds, and thus validate
the concept of SFTI-1 as a tool to evaluate mature seed storage quantity in
seeds. I propose to use this acyclic SFTI-1 simple quantification method to infer
the levels of mature albumins and globulins accumulating in lines transformed
with PawS1 and to provide more understanding about seed storage protein
trafficking pathways.
60
4.3.3 Seed storage protein trafficking pathways
The results presented above suggest that in mag2 seeds, despite abnormal
accumulation of seed storage protein precursors, similar amounts of mature
seed storage proteins are produced. The mag2-2 mutation is a complete gene
deletion, thus mag2-2 seeds don’t produce any MAG2 protein, and the mag2-3
mutation is a complete loss of function. MAG2 was shown to be involved in
seed storage protein precursor trafficking from the ER to the Golgi complex (Li,
Shimada et al. 2006). To explain the presence of mature seed storage proteins
in mag2 seeds, the authors suggested that a MAG2 homologue might
compensate for its function, or that an independent pathway might exist.
However, these two possibilities would not explain similar amounts of mature
seed storage proteins in mag2 seeds than in wild-type seeds, as their
precursors are partly mis-sorted to abnormal intracellular structures where they
remain unprocessed. In agreement with the previous conclusions, my results
suggest that in mag2-2 another effector can effectively compensate for MAG2
function, but also that similar amounts of protein precursors are correctly
trafficked and matured in the seeds. In this case, the accumulation of
precursors in the same mutant seeds implies that more precursors were
produced to compensate for their partial mis-trafficking and that the MAG2
homologue or the other pathway compensating for MAG2 loss of function is less
effective. Overall, these results reveal that in mag2 seeds an unknown
mechanism can sense the accumulation of mis-trafficked seed storage protein
precursors in abnormal structures and triggers an overproduction of seed
storage protein precursors to compensate for their inefficient trafficking.
MAG4 was also shown to be involved in trafficking seed storage protein
precursors from the ER to the Golgi complex (Takahashi, Tamura et al. 2010),
with mag4 seed cells also accumulating abnormal structures containing seed
storage protein precursors. The mutant mag4-2 was shown to have complete
loss of function for MAG4 whereas this loss was only partial for mag4-1, with
mag4-2 plants exhibiting a much stronger phenotype than mag4-1 plants. As
previously described, mag4-2 is a null mutant that contains a T-DNA insertion in
MAG4 gene and the mag4-1 deletion in MAG4 gene results in the production of
a truncated MAG4 protein. As a result, the MAG4 transcription is undetectable
in mag4-2 while it is similar to wild type in the mag4-1 mutant (Takahashi,
61
Tamura et al. 2010). The presence of mature seed storage proteins in mag4
seeds confirmed the potential existence of other effectors than MAG2 and
MAG4 or of a Golgi-independent pathway. In addition, my results suggest that
the accumulation of seed storage protein precursors in the mag4 seeds is
correlated with production of mature seed storage proteins in quantities similar
or greater than wild-type respectively for mag4-1 and mag4-2. Overall, these
results support my previous deduction and indicate that the accumulation of
seed storage protein precursors in abnormal structures is detected and
proportionally compensated in mag4 seeds.
VSR1 is a receptor originally localised on the internal Golgi surface and that
selectively binds C-termini of seed storage protein precursors. The precursors
are thereby gathered in vesicles produced by the Golgi apparatus and targeted
to the protein storage vacuole. During their transport to the protein storage
vacuole, these vesicles merge with other vesicles containing processing
proteases to form multivesicular bodies and allow seed storage protein
maturation to start (Shimada, Fuji et al. 2003; Otegui, Herder et al. 2006).
Accordingly,
vsr1
seeds
abnormally
accumulate
seed
storage protein
precursors, and immunogold analyses, with antibodies that cannot differentiate
mature proteins or protein precursors, showed that some globulins and
albumins are abnormally secreted from seed cells in vsr1 lines (Shimada, Fuji et
al. 2003). My results also suggest that the abnormal accumulation of seed
storage protein precursors is accompanied with a reduction in mature seed
storage proteins production. Combined with the previous studies, these
conclusions imply that the lack of VSR1 results in the secretion of some seed
storage protein precursors outside the cell, thus never coming into contact with
their processing proteases. This ultimately results in less mature seed storage
proteins being produced. In contrast to the previous mag2 and mag4 mutants,
this reduction is not compensated. This confirms the theory that a compensation
mechanism can detect the accumulation of seed storage protein precursors in
the cell, which is not activated in the vsr1 seeds where these precursors are
secreted out of the cell.
62
MAG1 (or VPS29) and VPS35 are components of the retromer complex, which
is responsible for retrograde transport of proteins (Coudreuse, Roel et al. 2006;
Oliviusson, Heinzerling et al. 2006; George, Leahy et al. 2007; Nielsen,
Gustafsen et al. 2007). The mag1 and vps35 seeds accumulate seed storage
protein precursors, both in abnormal intracellular or extracellular structures
(Shimada, Koumoto et al. 2006; Yamazaki, Shimada et al. 2008). The authors
concluded that the retromer complex was involved in VSR1 recycling (Shimada,
Koumoto et al. 2006; Yamazaki, Shimada et al. 2008), but this would not
explain the accumulation of seed storage protein precursors inside the cell, or
why mag1 and vps35 lines exhibit a strong dwarf phenotype and pleiotropic
phenotypes. Rather these results imply that the retromer complex is involved in
retrograde transport of proteins at large and that mis-sorting of seed storage
proteins in retromer components mutants is just a side-effect of the deregulation
of protein transport in general.
This correlates with my results indicating that in the partial loss of function
mutants vps35b-1c-1 and mag1-1 the accumulation of seed storage protein
precursors is compensated to produce similar or even larger amounts of mature
seed storage proteins as wild-type, similarly to mag2 and mag4 lines. This
suggest that other components of the seed storage protein trafficking pathway
(potentially MAG2 and MAG4) might also be partially impaired in the
vps35b-1c-1
and
mag1-1
lines,
resulting
in
the
previously
reported
accumulation of seed storage protein precursors inside the cell, and that
similarly to mag2 and mag4 lines this accumulation is detected and
compensated. This mechanism might be deregulated as well in the mag1-1
lines, resulting in the over production of mature seed storage protein suggested
by my results. In contrast, when the loss of function is complete (such as for the
mutant mag1-2), my results indicate that the mis-sorting of seed storage protein
precursors cannot be compensated by another mechanism, supporting the idea
that not only is VSR1 recycling greatly impaired in mag1-2, but protein transport
in general, thus greatly impairing protein production and maturation at large.
63
4.4 Discussion
In this study, I developed a tool to monitor mature seed albumin production. The
specificities of PawS1 albumin precursor, suggested it could be used as a tool
for estimating albumin abundance in seeds. It contains the sequence for the
small cyclic peptide SFTI-1 and for an albumin, both of which are processed by
AEP for their maturation. This suggests that SFTI-1 and its adjacent albumin
fate should be linked and their respective quantity in seeds should be
proportional. I developed a method to compare acyclic SFTI-1 quantity in seeds
of A. thaliana lines. Then, I created several hybrids, each expressing transgenic
PawS1 protein precursor in a different seed storage protein trafficking mutant
contexts and compared their acyclic SFTI-1 quantity in mature seeds, allowing
us to infer on mature seed storage protein production in each mutant
background.
This method allowed us to gain a greater understanding of the modifications
that occur in various protein trafficking mutants. I elucidated in more detail these
mutants phenotypes and, more importantly, I identified the existence of a
mechanism (that remains to be characterised) that can detect when seed
storage protein precursors accumulate inside the cells of these mutants, but
can’t when they are secreted outside of the cells. This mechanism can
compensate for the detected accumulation of the precursors and consequently
results in similar or greater production of mature seed storage proteins
compared to wild-type.
This suggests that this compensation mechanism acts at the translational or at
the transcriptional level to increase the production of protein precursors and
thus compensate the defects resulting in maturation of only some of these
precursors. More detailed analyses of the transcript levels for seed storage
proteins and of the production of their protein precursor during seed
development in the various mutants are required to provide a better
understanding
of
this
proposed
sensing
mechanism.
Additionally,
the
generation of mutated populations from these mutants might enable the
characterisation of effectors of this compensation mechanism in targeted
screens.
64
Chapter 5
How readily will SFTI-1 hijack
other sequences?
SFTI-1 is not a passive by-product of its adjacent
albumin maturation
65
Figure 5.1. PawS1 and SFTI-1 processing. (a) PawS1 sequence with ER
signal (rose), SFTI-1 (aqua), PawS1 albumin small subunit (green) and large
subunit (orange). Spacer regions or the GLDN tail are coloured black.
Asparagine residues are numbered. (b) PawS1 (after ER signal removal)
hypothetical 3D peptide model in ribbon format. It was generated by homology
modelling using EasyModeller4.0 (Kuntal, Aparoy et al. 2010) and using all
available napin-type albumin structures (1SM7A, 1S6DA, 1W2QA, 1PSYA,
2LVFA) as template for the albumin part of PawS1 and SFTI-1
three-dimensional structures as a template for the SFTI-1 part of PawS1. Model
evaluation data: ProSA-web (Wiederstein and Sippl 2007) Z-Score= -4.87,
Procheck (Laskowski, MacArthur et al. 1993) Ramachandran parameters:
Procheck Ramachandran Plot (Allowed= 90.8%, Additionally allowed= 8.5%,
Generously Allowed= 0.7%, Disallowed region= 0.0%) and Procheck G-factors
(Dihedrals= -0.11, Covalent= 0.26, Overall= 0.02). Disulfide bonds are indicated
as sticks. Asparagine residues are numbered and their side chains shown in
line format. (c) MALDI profiles of seed peptide extracts from sunflower as well
as A. thaliana expressing a synthetic PawS1 in wild-type (WT) and aep-null
backgrounds. A mass for acyclic SFTI-1 (1,531.82 Daltons) is the major form
detectable in WT A. thaliana. Only mis-processed PawS1 derivatives are
detectable in the aep-null background.
5
How readily will SFTI-1 hijack other sequences?
5.1 Introduction
As introduced in Chapter 1, SFTI-1 is a small macrocyclic peptide from
sunflower seeds that has potent trypsin inhibitory activity (Luckett, Garcia et al.
1999). The gene encoding SFTI-1 (figure 5.1) was named Preproalbumin with
SFTI-1 (PawS1) as it also encodes a napin-type seed storage albumin (Mylne,
Colgrave et al. 2011). Using transgenic A. thaliana wild-type and AEP knockout
lines, it was shown that both SFTI-1 and its adjacent albumin require the
protease AEP to be processed from PawS1 dual precursor (Mylne, Colgrave et
al. 2011) and SFTI-1 biosynthesis has been reconstituted in situ with sunflower
seed extracts as well as in vitro with a recombinant AEP (Bernath-Levin, Nelson
et al. 2015). Processing by AEP is one obvious requirement for SFTI-1
maturation, but little is known about the ability of this peptide sequence to
emerge from its protein precursor successfully. In this study, I investigated
whether SFTI-1 really overcomes its adjacent albumin for maturation or whether
SFTI-1 is a passive proteinaceous passenger produced as a result of PawS1
albumin maturation. To examine how readily SFTI-1 would emerge from other
proteins in vivo, I chose to work in the genetic model A. thaliana. It was
previously shown (Mylne, Colgrave et al. 2011) that a synthetic PawS1 gene
coupled to an A. thaliana seed-specific promoter can produce SFTI-1 in vivo, in
A. thaliana seeds, by accurate AEP-dependant cleavages at the asparagine
preceding SFTI-1 and at the terminal acid aspartic residue of SFTI-1 (figure
5.1). The macrocyclisation reaction is inefficient in A. thaliana so the majority of
product is acyclic SFTI-1. Our goal was to monitor the ability for the SFTI-1
sequence
to
emerge from
host sequences
rather
than to efficiently
macrocyclise, so in this study I focussed solely on the mass for acyclic SFTI-1. I
created synthetic genes with the DNA sequence for SFTI-1 (and a short trailing
sequence) at different locations within PawS1 or in genes for unrelated seed
proteins. Then, I used a MALDI-TOF-MS based method to compare the quantity
of SFTI-1 cleaved from these synthetic precursors, in the seeds of the various
transgenic lines created, and to get more insight on SFTI-1 maturation.
66
5.2 Material and methods
5.2.1 Synthetic genes for in planta expression
Synthetic genes (GENEART®) were designed with a ClaI restriction site
preceding the start ATG and a SacI restriction site after the stop codon
(supplemental figures 9.3 and 9.4) to allow sub-cloning into a binary vector.
Most synthetic genes (except 3SFTI, 2SFTI, 1SFTI and PawS1[∆2-21]) were
ordered this way. A synthetic PawS1 gene was also generated using the same
technology to use as a control in this study, because the other synthetic variants
of PawS1 generated were codon optimised for expression in A. thaliana. From
original plasmids containing the ordered synthetic genes, variants were
generated by PCR. The PCR were conducted using phosphorylated primers
surrounding a region to delete in an original plasmid. The whole plasmid was
amplified by PCR, at the exception of the small region surrounded by the couple
of primers. The resulting truncated and linearized plasmid was then
re-circularised thanks to ligation of the two phosphorylated free ends. Synthetic
genes encoding 3SFTI, 2SFTI and 1SFTI were generated this way from the one
encoding 4SFTI, using the primer couples BP37 (5'-GTT ATC AAG ACC ATC
TGG ATA GCA-3') and BP40 (5'-TGA TGA GAG CTC ATT AAT TAA-3'), BP38
(5'-ATT ATC CAA ACC ATC TGG GAA ACA-3') and BP40 or BP39 (5'-GTT
ATC GAG TCC ATC AGG GAA-3') and BP40 respectively to remove one, two
or three SFTI-GLDN domains. To remove the ER signal and generate the
artificial gene encoding PawS1[∆2-21], the PawS1 synthetic gene was used as
template in a PCR reaction with primers BP51 (5'-GGA TAC AAG ACC TCT
ATC TC-3') and BP52 (5'-CAT TGT ATC GAT TGG CGC GC-3'). To create
binary constructs for plant transformation, the vector pAOV (Mylne and Botella
1998) was cut with XhoI/SacI to remove its CaMV35S promoter, and then the
OLEOSIN promoter with XhoI/ClaI restriction sites and a synthetic gene
digested with ClaI and SacI were triple ligated with the open promoter-less
pAOV. Upon ligation, the synthetic gene is then preceded by the OLEOSIN
promoter and followed by a Nopaline synthase terminator (nos3’) included in
pAOV.
67
5.2.2 Plant transformation and growth conditions
All plants were grown as in 2.2.2.
Constructs carrying the synthetic genes, E. coli resistance to tetracycline and
plant resistance to glufosinate ammonium, were tri-parental mated into
Agrobacterium tumefaciens strain LBA4404 and used to transform A. thaliana
wild-type Columbia (Col-0) or an aep1-1 aep2-3 aep3-1 aep4-1 (also known as
αvpe-1 βvpe-3 γvpe-1 δvpe-1) quadruple mutant (Nakaune, Yamada et al.
2005) by floral dip (Bechtold, Ellis et al. 1993). Seeds resulting from the parental
transformation were collected and sown on soil for selection of the first
transgenic generation (T1) with Basta® herbicide with a final concentration of
0.4 g/L glufosinate ammonium (Bayer Cropscience Pty Ltd). The segregation
ratio of resistant to susceptible seedlings in the second transgenic generation
(T2) was used to select single locus transgene insertion lines.
5.2.3 Seed peptide extraction
Peptides were extracted from fifty T2 seeds from each single locus transgenic
line generated (or from one seed in the case of sunflower), as in 4.2.5.
Seeds were placed in a 2.0 mL plastic tube with a 3/16" stainless steel ball,
frozen in liquid nitrogen and shaken in a mixer mill (Retsch MM301). Tissue
powder was resuspended in 0.25 mL methanol and 0.25 mL chloroform. For
phase separation 0.1 mL of 0.05% (v/v) trifluoroacetic acid was added and
mixed for 2 min at room temperature before centrifugation for 5 min at 20000 x
g. The aqueous phase was retained and diluted 50-fold in 50% (v/v) acetonitrile
0.1% (v/v) trifluoroacetic acid for analysis by MALDI-TOF-MS.
68
5.2.4 SFTI-1 (cyclic and acyclic) detection and acyclic SFTI-1
quantification
To detect SFTI-1 and quantify acyclic SFTI-1 in each seed peptide extract the
same procedure as described in 4.2.6 was conducted. For each synthetic gene
I created and introduced in A. thaliana, T2 seeds of a minimum of eight
independent transgenic lines were tested and a minimum of four technical
repeats per line were performed.
5.2.5 Shotgun proteomics of A. thaliana seeds
To determine the abundant seed proteins in wild-type A. thaliana (Col-0
ecotype), proteins were extracted from 240 seeds in six independent biological
replicates and digested with trypsin as described (Mylne, Colgrave et al. 2011).
For each sample, 8 µg of digested proteins were injected into an Agilent 6550
Q-TOF using HPLC via Chip Cube interface with a C18 trapping/analytical
Polaris chip for protein binding, and a 45 min gradient of 10% to 30% (v/v)
acetonitrile in 0.1% (v/v) formic acid for elution. Settings used were positive ion
mode, eight mass spectrometry (MS) scans at 250 to 1,400 mass-to-charge
ratio per second, maximum of eight precursors fragmented per cycle with an
absolute threshold of 5,000 counts, scan speed varied according to abundance,
and charge state selection set to +2 and +3. Spectra were searched against
TAIR10 protein sequences using Mascot 2.3.02 (Matrix Science) to infer the
forty most abundant proteins in dry A. thaliana seeds (table 5.1).
69
5.2.6 Targeted proteomics
For each construct, proteins were extracted from 240 seeds (30 seeds for eight
independent transformation events with acyclic SFTI-1 production) were
pooled). Proteins were extracted and digested with trypsin as previously
described (Mylne, Colgrave et al. 2011). Based on a theoretical digest of each
of the protein, possible peptide transitions were determined using Skyline,
version 1.1.0.2905 (MacLean, Tomazela et al. 2010). These were then
optimised following an algorithm specific for Agilent 6400 series instruments.
For PawS1 albumin, I selected the fragment QLQQGQGGQQQVQQMVK (from
the large subunit) as it is the only fragment unaffected by any of the artificial
SFTI-GLDN insertion. As an internal control I targeted the tryptic fragment
QEEPVCVCPTLR from SESA4, an endogenous A. thaliana napin-type albumin.
For each fragment, I selected at least three transitions for one precursor ion.
For identification of the selected daughter ions, 2 µg of digested protein were
injected using an Agilent Technologies 1200 series nano/capillary LC system on
an Agilent 6490 QqQ mass spectrometer with an HPLC Chip Cube interface
with C18 trapping/analytical Polaris chip, and eluted with a 15.5 min and 3% to
100% (v/v) acetonitrile gradients in 0.1% (v/v) formic acid. The mass
spectrometer was run in positive ion mode, with a drying gas temperature of
365°C and flow rate of 5 litres per min; for each transition, the fragmentor was
set to 130 and dwell time was 5 ms. As reference for the expected
fragmentation pattern and retention times for PawS1 albumin tryptic fragment, I
performed an identical proteomic analysis on recombinant 6-His tagged PawS1
albumin produced as previously described (Mylne, Colgrave et al. 2011).
70
5.3 Results
5.3.1 The SFTI-1 sequence can emerge from non-native locations
In PawS1, the sequence for SFTI-1 and its trailing GLDN sequence
(GRCTKSIPPICFPDGLDN, called SFTI-GLDN hereafter) immediately precede
the sequence of mature PawS1 albumin. This location avoids interfering with
the adjacent albumin. The GLDN sequence trailing SFTI-1 is conserved within
the PawS1 family of protein precursors and is essential for SFTI-1 maturation
(Mylne, Colgrave et al. 2011; Elliott, Delay et al. 2014). Although AEP only
cleaves at three asparagine (Asn) residues, there are seven in PawS1 (figure
5.1): Asn35 precedes SFTI-1, Asn53 ends SFTI-GLDN and precedes PawS1
albumin small subunit, Asn65 is in the middle of PawS1 albumin small subunit,
Asn84 precedes PawS1 albumin large subunit whereas the remaining three
(Asn96, Asn143, Asn146) are within PawS1 albumin large subunit and are
interspersed with conserved cysteine residues that form disulfide bonds and
impart the five helical structure conserved in all known structures for napin-type
seed storage albumin (Mylne, Hara-Nishimura et al. 2014). I forced SFTI-1 and
PawS1 albumin to compete for processing by creating PawS1 variants with
SFTI-GLDN placed directly after one of the five alternative existing asparagine
residues in and around what become the small and large PawS1 albumin
subunits during maturation (figure 5.1). The SFTI-GLDN sequence was
removed from its native position and inserted at Asn65, Asn84, Asn96, Asn143,
Asn146
in
the
constructs
Paw[N65]S1,
Paw[N84]S1,
Paw[N96]S1,
Paw[N143]S1, and Paw[N146]S1 respectively. Of these five asparagine
residues, only Asn84 is targeted by AEP in native PawS1. I also generated
PawS1[Δ2-21], a variant lacking the N-terminal ER signal sequence. Each
synthetic open reading frame was placed in a binary vector under control of the
seed-specific OLEOSIN promoter (Parmenter, Boothe et al. 1995) and
transformed into A. thaliana wild-type plants by Agrobacterium-mediated in
planta transformation.
71
Figure 5.2. Acyclic SFTI-1 production from non-native locations in PawS1.
(left) Synthetic precursor proteins with ER signal peptide (pink), SFTI-1 (cyan),
GLDN (black), PawS1 albumin small subunit (green), PawS1 albumin large
subunit (orange) and propeptides (line). (middle) acyclic SFTI-1 relative
quantity based on MALDI-TOF-MS analysis of seed peptide extracts from
transgenic A. thaliana wild-type, stars indicate significant difference compared
to the PawS1 line (grey). (right) MALDI-TOF-MS spectra of seed peptide
extracts. Each spectrum typically displays only the 1531.82 Daltons mass for
acyclic SFTI-1, excepted for lines expressing the chimeric precursors with no
ER signal, that produce no peptide. For full sequences see supplemental
figures 9.3 and 9.4.
To determine whether acyclic SFTI-1 could emerge from these non-native
locations within PawS1, seed extracts from transgenic lines were analysed by
matrix-assisted
laser
desorption-ionisation
mass
spectrometry
(MALD-TOF-MS). Although the objective was to look for the presence of a mass
for acyclic SFTI-1 (1,531 Daltons), full spectra from 900 Daltons to 3,000
Daltons were acquired, to look for mis-processing events. A mass consistent
with acyclic SFTI-1 was detectable for all five SFTI-GLDN-insertion constructs
and
there
were
no
additional
masses,
which
would
have indicated
mis-processing. No acyclic SFTI-1 was detectable when the ER signal was
removed from PawS1 (figure 5.2).
To compare acyclic SFTI-1 production from each construct, MALDI-TOF-MS
spectra were acquired for at least nine independent, single-locus lines. Relative
quantitation from spectra peak areas was enabled by including a spiked
standard identical to acyclic SFTI-1, but possessing a diselenide bond instead
of disulfide. This gave the spiked standard comparable ionisation, but a higher
mass (1627.7 Daltons) and distinct isotopic peak pattern. I found that the acyclic
SFTI-1 levels produced when the SFTI-GLDN sequence was within PawS1
albumin small subunit (Paw[N65]S1) or at PawS1 albumin large subunit release
point (Paw[N84]S1) did not differ significantly from the levels produced by native
PawS1. Acyclic SFTI-1 continued to emerge when it was buried in the large
albumin subunit (Paw[N96]S1, Paw[N143]S1, and Paw[N146]S1), but in
significantly (P <0.01) reduced levels, about half that when at its native location
(figure 5.2). Overall these data suggest that acyclic SFTI-1 can emerge
successfully from multiple locations within PawS1 provided this protein
precursor has an ER signal. The ability for acyclic SFTI-1 to consistently
emerge from within PawS1 albumin suggests that its disulfide bond is formed
correctly and that AEP is able to process acyclic SFTI-1 from all tested
locations. It is also interesting to note here that the artificial presence of the
SFTI-GLDN sequence downstream asparagine residues not usually targeted by
AEP converted them into AEP processing targets.
72
Figure 5.3. PawS1 albumin production from PawS1 variants. Targeted
proteomic analysis of seed protein extracts from A. thaliana lines, each
expressing one of the PawS1 derivate synthetic constructs. PawS1 (transgenic)
and SESA4 (native) albumins presence were each determined by detection of
one specific tryptic fragment, both selected for being the most easily and
reproducibly detectable in preliminary analysis. The first column shows the
retention time and relative intensity of the detected fragments. The two next
column shows MS/MS fractionation of respectively the PawS1 albumin fragment
first and the SESA4 one then, when being detected.
Within PawS1, SFTI-1 and PawS1 albumin are neighbours, but when the
SFTI-GLDN sequence is artificially buried in the mature PawS1 albumin
sequence, acyclic SFTI-1 continues to emerge. To find out if unprocessed or
truncated PawS1 albumin persists in these lines, its presence was determined
using targeted LC-MS/MS with endogenous A. thaliana SEED STORAGE
PROTEIN 4 (SESA4) as an internal control. Using this targeted proteomics
approach I found no evidence for mature PawS1 albumin in transgenic lines
expressing constructs in which PawS1 albumin was interrupted by SFTI-GLDN
(figure 5.3). The endogenous control SESA4 was detected in all transgenic
lines and PawS1 albumin was detectable in lines expressing unaltered PawS1.
Combined with acyclic SFTI-1 quantification, these data confirm that although
acyclic SFTI-1 and PawS1 albumin both emerge from PawS1 when they reside
side-by-side, when SFTI-GLDN resides within the mature PawS1 albumin
domain, acyclic SFTI-1 will keep emerging to the detriment of PawS1 albumin.
73
AGI
At5g44120
At4g28520
At1g03880
At4g27150
Function/identity
Cruciferin, 12S globulin, seed storage
Cruciferin, 12S globulin, seed storage
Cruciferin, 12S globulin, seed storage
Napin type 2S albumin, seed storage
At4g27160
Napin type 2S albumin, seed storage
At4g27170
Napin type 2S albumin, seed storage
At5g66400
At3g22640
Dehydrin family, stress induced
RmlC-type cupin, vicilin, 7S seed
storage
Lipase/lipooxygenase, PLAT/LH2 family
Napin type 2S albumin, seed storage
Polyketide cyclase/dehydrase
Lipid transfer
1-cysteine peroxiredoxin antioxidant
Napin type 2S albumin, seed storage
At4g39730
At5g54740
At1g14950
At2g38530
At1g48130
At4g27140
At1g03890
At3g15670
At2g40170
At5g12030
At5g40420
At1g52690
At4g25140
At4g39260
At3g46230
At1g05510
At3g02480
At5g42980
At5g45690
At3g17520
At5g06760
At2g42560
At3g51810
At4g38740
At3g53040
At5g01300
At2g36640
At3g05020
At3g50980
At3g15260
RmlC-type cupin, vicilin, 7S seed
storage
Late embryogenesis abundant
Late embryogenesis abundant
Heat shock, chaperone activity
Oleosin, seed lipid accumulation
Late embryogenesis abundant
Oleosin, seed lipid accumulation
Cold induction, circadian rhythm related
Heat shock protein
Oil body associated and ABA
responsive
Late embryogenesis abundant
Thioredoxin (H-TYPE)
Unknow n function
Late embryogenesis abundant
Late embryogenesis abundant
Late embryogenesis abundant
Stress induced protein
Rotamase cyclophilin
Late embryogenesis abundant
Phosphatidylethanolamine-binding
Late embryogenesis abundant
Acyl carrier
Unknow n function
Protein phosphatase 2C family
Nam e
CRU1, CRA1
CRU3, CRC
CRU2, CRB
SESA2,
AT2S2
SESA3,
AT2S3
SESA4,
AT2S4
RAB18, ATDI8
PAP85
PLAT1
SESA5
LTP2, CDF3
PER1
SESA1,
AT2S1
LEA6, EM6
HSP17.6
OLE2, OLEO2
LEA7
OLE1, OLEO1
CCR1, GRP8
HSP17.4
OBAP1A
TRX3, TRXH3
LEA-LIKE
LEA4-5
GEA1, EM1
ROC1
PEBP
ECP63
ACP, ACP1
XERO1
aa
472
524
455
170
Hits
263
413
189
89
frag
9
20
10
5
cov.
31.05
55.07
33.40
36.98
Q
29.72
21.20
18.85
16.59
164
78
6
33.05
13.43
166
67
7
37.87
9.57
186
486
25
87
3
12
25.12
24.58
8.71
7.28
181
165
155
118
216
164
13
32
31
17
38
13
2
6
6
3
8
3
11.38
25.23
40.77
34.78
36.27
18.08
6.00
5.76
5.34
5.20
4.83
4.28
451
32
8
17.22
4.20
225
92
156
199
169
173
169
156
241
31
15
8
8
12
12
5
11
29
8
4
2
2
4
3
2
3
9
40.13
37.88
10.30
12.77
26.43
20.00
11.00
18.80
37.20
4.00
3.91
3.90
3.54
3.48
3.41
3.38
3.37
3.31
68
118
247
298
158
635
152
172
479
162
448
137
128
289
12
6
15
11
8
28
10
5
10
5
9
3
2
4
4
2
5
4
3
11
4
3
5
2
5
2
1
4
54.88
20.05
25.16
14.90
28.25
22.25
30.70
23.60
10.50
12.48
11.86
20.43
16.78
13.56
3.29
3.27
3.08
2.74
2.56
2.49
2.19
2.08
2.00
2.00
1.77
1.55
1.29
1.17
Table 5.1. Shotgun proteomic analysis of A. thaliana Col-0 dry seeds.
Presented results are averaged on analysis of six independent biological
replicates. Only proteins identified in at least five of six replicates with more than
10% coverage are presented. Proteins are ordered according to their relative
abundance in the samples. AGI: reference gene model for the protein in the
A. thaliana information resource (TAIR) database. Function/identity: summary
of functional information available for the protein on TAIR and the Universal
Protein Resource. Name: list of the most common names given to the protein.
aa: number of amino acid in the protein. Hits: total number of peptide fragments
identified and matching to the protein sequence. frag: total number of unique
peptide fragments identified and matching to the protein sequence. cov:
Percentage of the total protein precursor sequence matching with identified
peptide fragments. Q: relative quantity of proteins (in 1 µg A. thaliana Col-0
ecotype dry seeds total protein extract) was roughly estimated based on
average number of hits per unique sequences identified by shotgun proteomics.
5.3.2 The SFTI-1 sequence can recruit AEP
The above results show that acyclic SFTI-1 can emerge from non-native
locations and triumphs over PawS1 albumin when forced into competition for
proper
processing. To determine whether acyclic SFTI-1 can emerge
successfully from other protein precursors I created synthetic chimeric genes by
fusing the DNA sequence for SFTI-GLDN to genes encoding different and
abundant A. thaliana seed proteins. To determine a set of such proteins from
wild-type A. thaliana seeds, I used shotgun proteomics, and selected five (table
5.1). The proteins were selected to include similar and different processing
routes to PawS1 albumin. The most similar was SEED STORAGE ALBUMIN 1
(At4g27140, SESA1) which, like PawS1 albumin, is napin-type seed storage
albumin. The second was CRUCIFERIN A1 (At5g44120, CRU1), which is a
different type of seed storage protein referred to as cruciferin or more broadly
as globulin (Fujiwara, Nambara et al. 2002). Globulins are other major
AEP-processed seed storage proteins and mutations in trafficking or processing
machinery affect albumins and globulins processing similarly (Gruis, Schulze et
al. 2004; Takagi, Renna et al. 2013). The third, Plastid-lipid-Associated Protein
85 (At3g22640, PAP85), is a vicilin-like seed storage protein (Rozwadowski,
Yang et al. 2008), another type of seed storage protein. Vicilin genes and vicilin
precursor maturation are largely poorly described and understood (Shewry,
Napier et al. 1995; Ribeiro, Monteiro et al. 2014), but at least one vicilin
precursor has been shown to be processed by AEP for maturation (Yamada,
Shimada et al. 1999). The fourth was a Late Embryogenesis Abundant protein
(At3g17520, LEA-like) which is unrelated to albumin, vicilin and globulin storage
proteins. LEA is ER targeted (Shih, Hoekstra et al. 2008) , but is not thought to
require proteolytic maturation. The fifth protein chosen was an oleosin
(At4g25140, OLE1), which differs from all other proteins chosen as it has no ER
signal and so does not traffic via the secretory pathway, but is synthesised on
the ER (Hsieh and Huang 2004).
74
Figure 5.4. Acyclic SFTI-1 production from chimeric precursors. (left)
Chimeric precursor proteins with ER signal peptide (pink), SFTI-1 (cyan), GLDN
(black), albumin small subunit (green), albumin large subunit (orange), other
protein units (brown) and propeptides (line). (middle) acyclic SFTI-1 relative
quantity based on MALDI-TOF-MS analysis of seed peptide extracts from
transgenic A. thaliana wild-type (grey) or the aep-null background (black, aep), .
stars indicate significant difference compared to the PawS1 lines (right)
MALDI-TOF-MS spectra of seed peptide extracts. Each spectrum typically
displays only the 1531.82 Daltons mass for acyclic SFTI-1, excepted for lines
expressing the chimeric precursors with no ER signal, that produce no peptide.
The only spectrum showing an unexpected mass was for the CraS1 construct,
due to minor acyclic SFTI-1 misprocessing. For full sequences see
supplemental figures 9.3 and 9.4.
The first synthetic gene I designed (SesaS1) encodes a chimeric protein
precursor with SFTI-GLDN inserted inside the precursor for SESA1, as a mimic
of PawS1 and the second (SesaS2) is a variant with no N-terminal propeptide
encoding sequence, placing SFTI-GLDN immediately adjacent to the predicted
ER signal (see figure 5.4). Previous PawS1 deletion analysis showed the
sequence between the ER signal and SFTI-GLDN could be removed without
preventing acyclic SFTI-1 production (Mylne, Colgrave et al. 2011). For the
three next chimeras (CraS1, PapS1 and LeaS1), SFTI-GLDN is inserted
immediately after each ER signal such that ER signal cleavage would liberate
the first amino acid residue of SFTI-1. For all chimeric constructs except
SesaS1, proto-N-terminal processing of SFTI-1 is done by ER signal removal
instead of AEP cleavage so only a single cleavage is required at Asp14 on
SFTI-1 for acyclic SFTI-1 to be successfully released. For the oleosin chimera
(OleoS1), as there is no ER signal, SFTI-GLDN was inserted immediately after
the OLE1 start methionine. All chimeric constructs were stably transformed into
A. thaliana under control of the OLEOSIN promoter (Parmenter, Boothe et al.
1995) and seeds of multiple transgenic lines analysed by MALDI-TOF-MS.
A mass for acyclic SFTI-1 (1531.8 Daltons) was detectable in seed peptide
extracts for all constructs except the OLE1 chimera (figure 5.4). Quantification
for each line (relative to the PawS1 line) with a spiked standard (1627.7
Daltons) found the level of acyclic SFTI-1 produced was not significantly
different (P = 0.66) when SFTI-GLDN was buried in a similar location within
SESA1 (SesaS1 construct), although removal of the spacer regions between
the ER signal and SFTI-1 (SesaS2 construct) reduced acyclic SFTI-1
production significantly (P < 0.01). Acyclic SFTI-1 levels in lines expressing
CraS1, PapS1 or LeaS1 lines were significantly (P <0.013) reduced to
respectively 60%, 40% and 60% of PawS1 lines. A mis-processed product was
only detectable in the CraS1 lines (figure 5.4), indicating precise acyclic SFTI-1
excision overall. The extra mass in CraS1 lines was consistent with the absence
of the first glycine residue of acyclic SFTI-1, possibly due to some flexibility for
ER signal removal.
75
Overall, these results demonstrate that acyclic SFTI-1 is robust and can emerge
successfully and precisely from a range of A. thaliana seed proteins including
some that do not require AEP for processing, and are probably subject to
different trafficking. The only construct that failed to produce acyclic SFTI-1 was
OleoS1, which lacks
an ER
signal. Combined with previous PawS1
mutagenesis studies (Mylne, Colgrave et al. 2011; Elliott, Delay et al. 2014), our
results imply the only prerequisites for successful acyclic SFTI-1 maturation is
that the protein precursor it resides within is delivered to the secretory system
and that acyclic SFTI-1 is flanked by the appropriate AEP-target residues for
release.
To investigate whether acyclic SFTI-1 emerges from chimeric proteins in an
AEP-dependent fashion, the same suite of constructs were introduced into an
A. thaliana quadruple knockout line lacking AEP. This aep-null line has no
obvious phenotype apart from misprocessing of seed storage proteins
(Shimada, Yamada et al. 2003; Gruis, Schulze et al. 2004). In all cases, acyclic
SFTI-1 production dropped or was abolished, demonstrating that AEP is
required for the processing of acyclic SFTI-1 from the chimeric constructs
(figure 5.4). A mass for acyclic SFTI-1 (1531.8 Daltons) was detected only for
the lines expressing the chimeric PapS1 and LeaS1 precursors, which implies
non-AEP proteases might process acyclic SFTI-1 from the PapS1 and LeaS1
chimeric protein precursors. The acyclic SFTI-1 level from PapS1 and LeaS1 in
aep-null background are significantly reduced (P < 0.01) by more than 50%
relative to wild-type background (figure 5.4), suggesting that AEP is still
involved in acyclic SFTI-1 production from PapS1 and LeaS1. Overall, these
data illustrate that, provided it is attached to an ER targeted protein, acyclic
SFTI-1 can emerge with precision from non-native protein contexts and recruits
AEP to do so.
76
Figure 5.5. Acyclic SFTI-1 production from albumin-less PawS1 variants.
(left) Synthetic precursor proteins with ER signal peptide (pink), SFTI-1 (cyan),
GLDN (black), PawS1 albumin small subunit (green) and large subunit (orange)
and propeptides (line). The SFTI-1 variant in each construct are indicated as
followed. R corresponds to SFTI-1K5R (acyclic mass 1559.82 Daltons), Y
corresponds to SFTI-1F12Y (acyclic mass 1547.82 Daltons) and V corresponds
to SFTI-1I10V (acyclic mass 1517.82 Daltons). (middle) acyclic SFTI-1 relative
quantity based on MALDI-TOF-MS analysis of seed peptide extracts from single
insertion lines of transgenic A. thaliana with wild-type (grey) or the aep-null
(black, aep) genetic background, stars indicate significant difference compared
to the PawS1 lines. (right) MALDI-TOF-MS spectra of seed peptide extracts.
Spectrum for the PawS1 line is not shown. Each spectrum displays the
expected masses for the single amino acid residue variants of acyclic SFTI-1
Spectra for the albumin-less variants are from multiple insertion lines, to have a
high enough signal on the picture. For full sequences see supplemental
figures 9.3 and 9.4.
5.3.3 The SFTI-1 sequence can recruit AEP without an adjacent
protein
The ability of acyclic SFTI-1 to emerge from various locations and various
proteins implies that the SFTI-GLDN sequence has the ability to overtake
adjacent proteins processing and recruit AEP to proteins (or protein locations)
not naturally targeted by AEP. To ascertain whether SFTI-1 requires an
adjacent protein I created versions of PawS1 without the sequence for PawS1
albumin. To guard against the possibility that a single SFTI-GLDN might traffic
or be processed improperly due to its short length, I created four PawS1
variants encoding one to four tandem SFTI-GLDN sequences with no adjacent
albumin (respectively called 1SFTI to 4SFTI), each varying by a single amino
acid residue so as to be distinguishable based on mass. I used previous results
(Mylne, Colgrave et al. 2011) to select mutations that would not impair
processing. I also used a variant PawS1 that encodes the four tandem
SFTI-GLDN and the adjacent PawS1 albumin (called Paw4SFTI) as a positive
control (figure 5.5). This Paw4SFTI construct mimics natural variants of PawS1
such as the 4-cassette PawS1 gene from Galinsoga quadriradiata (Elliott, Delay
et al. 2014). I transformed these chimeric genes into A. thaliana and could
detect masses for all expected acyclic SFTI-1 variants in all seed peptide
extracts (figure 5.5). Acyclic SFTI-1 production from albumin-less precursors
was only 5% to 10% of the level when embedded next to PawS1 albumin
(figure 5.5). This result shows that although acyclic SFTI-1 does not have an
absolute requirement for an adjacent protein, its production benefits from being
co-matured with an adjacent protein. I also wanted to test whether AEP was
responsible for acyclic SFTI-1 production from this type of precursors. For this
reason, the synthetic genes encoding the chimeric protein precursors containing
four tandems SFTI-GLDN and either an adjacent PawS1 albumin or no albumin
(namely Paw4SFTI and 4SFTI) were also transformed into the aep-null
background. I performed the same MALDI-TOF-MS analysis on these
transformed lines and acyclic SFTI-1 production was not detectable in their
seed extracts (figure 5.5), demonstrating that AEP is required for the proper
processing of acyclic SFTI-1 and the three acyclic SFTI-1 variants from these
constructs. Overall, these results demonstrate the SFTI-GLDN sequence alone
is capable of recruiting AEP for its processing.
77
5.4 Conclusion
In the native context, SFTI-1 and its adjacent albumin are co-translated as a
single precursor and cleaved into their independent and mature forms by AEP.
Here, by making chimeric constructs I showed acyclic SFTI-1 can also emerge
from alternative locations within its native protein precursor, to the detriment of
PawS1 albumin. The SFTI-1 sequence also emerged with precision from
chimeric protein precursors by recruiting AEP anew to the artificial precursors
deriving from various other types of seed protein precursors than albumins. I
also found that acyclic SFTI-1 can be produced from an albumin-less PawS1
variant, confirming that the sequence for SFTI-1 alone can recruit AEP for
processing from within its protein precursor. The production of acyclic SFTI-1
was not dependent upon the ancillary albumin, but was greatly reduced without
it. In addition, this result alone raises questions about the relationship between
SFTI-1 and its adjacent albumin and why do they remain linked, which will be
discussed in the general discussion to this thesis.
The dominance of SFTI-1 over its host protein, the ease with which it emerges
from other proteins and its ability to recruit AEP for its maturation demonstrate
that SFTI-1 is not just a passive by-product of PawS1 albumin production.
Overall, I demonstrate with a large variety of constructs that the SFTI-1
precursor sequence (SFTI-GLDN) can recruit AEP for its processing. Previous
studies conducted in my research group on SFTI-1 evolution sequence of
events, illustrated an unusual evolution mechanism. SFTI-1 coding sequence
evolved de novo by sequential expansion in a napin-type albumin gene, without
modifying the coding frame or sequence for the mature ancestral protein (Elliott,
Delay et al. 2014). Such a mechanism bypasses many of the impediments that
other de novo evolving coding sequences generally face before encoding a
stable, new, functional protein. This mechanism seems to differ from any other
evolutionary mechanism previously described; which will be discussed in the
next Chapter. Combined together, these data suggest that this original route for
production of new bioactivities is remarkably efficient.
78
Chapter 6
Neoproteinisation: a new
mechanism for de novo evolution
A review
79
6
Neoproteinisation: a new mechanism for de novo evolution
6.1 Introduction
The emergence of new proteins with potentially new bioactivities is essential for
the evolution of any living organism and for the emergence of new species, and
also contributes to increase the complexity of living organism proteomes. Many
evolutionary mechanisms leading to the emergence of new proteins have been
described, most of them based on remodelling existing material, and fewer
based on de novo creation (Kaessmann 2010). In a recent study, the evolution
of a small protein called SFTI-1 was described. The DNA sequence encoding
SFTI-1 evolved de novo, through successive steps of expansion inside a
napin-type albumin gene and without disturbing the sequence or the coding
frame for the ancestral napin-type albumin (Elliott, Delay et al. 2014). This novel
evolutionary mechanism has not previously been reported. In this review, I
illustrate what differentiates this new mechanism from other previously
described evolutionary mechanisms, and propose to term it neoproteinisation.
Neoproteinisation occurs when a genetic expansion in an ancestral gene, that
does not alter the coding frame or the sequence for the ancestral protein, leads
to the creation of a new protein sequence that is co-produced with the ancestral
protein in a single protein precursor cleaved during proteolytic maturation, thus
allowing two proteins instead of one to be produced from a single gene. I also
suggested that, similarly to de novo gene evolution, which had been overlooked
until a decade ago, neoproteinisation might have been overlooked until now. A
consequence of neoproteinisation is the creation of multi-protein-coding genes.
I will also illustrate how new genes generated after neoproteinisation vary from
most multi-protein-coding genes, and how understanding these differences can
help reveal other potential cases of neoproteinisation.
80
6.2 How do new proteins arise?
Evolutionary mechanisms can create new proteins in living organisms by
various ways. The most commonly described mechanisms of duplication and
divergence produce new protein-coding genes from ancestral gene templates in
the lineage of a given organism (Kaessmann 2010). A new protein can also
emerge de novo in a lineage after transfer of genetic material between different
organisms encoding a protein anew to the receiving organism (Zhou and Wang
2008). New proteins completely anew to a given organism can also emerge de
novo by the evolution of new coding DNA from ancestral non-coding DNA
(Long, VanKuren et al. 2013; Light, Basile et al. 2014).
6.2.1 New protein evolution by DNA based gene duplication and
divergence
Gene duplication is the most commonly and extensively studied mechanism for
new protein evolution and is very frequent in all prokaryotes (Romero and
Palacios 1997) and eukaryotes (Zhou, Zhang et al. 2008). After duplication, one
of the two duplicate gene copies can undergo neofunctionalisation (figure 6.1)
and subsequently encode for a new protein, when the other copy carries on the
ancestral function (Ohno 1970). Alternatively, the eventual multiple functions of
an ancestral gene can be divided between duplicate genes in a process called
subfunctionalisation (Force, Lynch et al. 1999), and thus give birth to two new
protein-coding genes (figure 6.1). Extensive studies have described a large
quantity of examples for both models in many organisms, strongly confirming
their relevance (Zhang, Zhang et al. 2002; Long, Betrán et al. 2003).
A quite interesting example of neofunctionalisation was illustrated in a leaf
eating monkey. It was shown that adaptive evolution in Asian and African
leaf-eating monkeys led to the convergent duplication and divergence of the
pancreatic ribonuclease gene. In both organisms, different substitutions in the
gene copies led to convergent neofunctionalisation and profound changes of
the diverging copies, including a lower optimal pH more adapted to leaf
digestion and stronger target specificity than the parental ribonuclease. The
study showed that the resulting inability to digest double stranded RNA, a
potentially quite important function for antiviral defence, was directly due to the
81
adaptive evolution to a lower optimal pH. Thus, the pancreatic ribonucleases
adaptive evolution would probably have never occurred without prior duplication
(Zhang 2006). This illustrates how duplication and divergence can be an
efficient solution to adapt existing proteins to new selective constraints, as the
adaptation to a leaf-only diet here.
Evidence for subfunctionalisation events is often quite poor. For example the
floral organ identity gene AGAMOUS, broadly expressed in Arabidopsis carpels
and stamens, has two orthologues in maize, ZAG1 and ZMM2, which are
differentially expressed in carpels and stamens. ZAG1 and ZMM2 seem to have
undergone subfunctionalisation, with ZAG1 being more carpel specific, and
ZMM2 being more stamen specific. Unfortunately, the potential changes in the
regulatory regions of ZAG1 and ZMM2 that could explain the differential
expression pattern compare to the Arabidopsis orthologue remain to be
identified (Force, Lynch et al. 1999).
Surprisingly, the molecular events underlying gene duplication are less well
known. A gene duplicate can originate from small-scale events, such as the
duplication of a gene fragment, a whole gene region, or a chromosomal
segment containing several genes. These events can emerge through
non-allelic
homologous
recombination (NAHR) through the mediation of
repetitive elements (Roth, Porter et al. 1985; Bailey, Liu et al. 2003) or as a
result of recombination errors (non-homologous end joining, NHEJ) (Kidd,
Cooper et al. 2008; Yang, Arguello et al. 2008) Nevertheless, current
knowledge on these mechanisms is rather speculative due to the small number
of studied organisms (Zhou and Wang 2008).
A new gene can also result from the duplication of a whole genome through
various polyploidisation mechanisms (Semon and Wolfe 2007; Conant and
Wolfe 2008; Van de Peer, Maere et al. 2009). Whereas single-gene duplication
has been shown to have a great impact on new protein origination, and the
development of new specific traits (Maere, De Bodt et al. 2005), whole genome
duplication seem to have much less contributed to the creation of new activities
or functions in animal models such as yeast (Hakes, Pinney et al. 2007) and
human (Kaessmann 2010).
82
Figure 6.1. New protein evolution through duplication and divergence. The
gene information can get duplicated at two levels. On the DNA level (top panel)
a gene can duplicate. The two resulting duplicates can then diverge. Through
neofunctionalisation, one of the two duplicates conserves the ancestral
sequence and function and the other accumulates mutations until eventually
encoding a different protein. Modification of the promoter region only can
already change the expression pattern and confer a new function to the
duplicate. The two duplicates can also co-diverge and subsequently
subfunctionalise to both partly keep the ancestral sequence and function. The
two resulting proteins are then necessary to fulfil the ancestral function. On the
RNA level (bottom panel) a gene can be retroduplicated inside the genome.
This retrocopy lacks regulatory regions, and often requires evolving a new
promoter to be able to produce a protein. Thus the newly evolved retrogene
can, already at this stage, have a different function than its ‘parental gene’ due
to eventual expression pattern modifications. Then the retrocopy can also
diverge similarly than already described for a duplicate gene.
Contrastingly, whole genome duplication has been of significant importance for
plant evolution. Indeed, the emergence of the regulatory genes necessary for
seed and flower development was a result of two successive whole genome
duplication events, in ancestral lineages, that happened shortly before the
emergence of seed plants and flowering plants respectively (Jiao, Wickett et al.
2011). These major innovations have certainly contributed to the success of
seed plants (also called gymnosperms) and flowering plants (also called
angiosperms) that now represent the vast majority of land plants. This illustrates
how whole genome duplication, in the case of plants, has significantly
contributed to the creation of new important functions.
6.2.2 New protein evolution through RNA based gene duplication
and divergence
As outlined above, DNA-mediated gene duplication followed by divergence is
the most commonly described mechanism giving rise to new proteins and is still
considered
as
the
main
contributor
to
evolution.
Another
duplication
mechanism, known as retroposition or retroduplication, can give rise to new
gene copies (Kaessmann, Vinckenbosch et al. 2009). Retroduplication occurs
when a ‘‘parent’’ gene is transcribed and matured into a messenger RNA
(mRNA), and is then reversed transcribed into a complementary DNA copy and
inserted into a new location in the genome (figure 6.1), thus creating a new
gene copy (Brosius 1991). The specific enzymes (in particular the reverse
transcriptase) required for retroposition are encoded by retrotransposons (or
retrotransposable elements). Different retrotransposons
exist in different
species, for example LINE-1 in mammals, so that the number and specificities
of the retrotransposable elements present in a genome are directly linked to the
retroduplication occurrence (Kaessmann, Vinckenbosch et al. 2009). In plants,
this mechanism is less broadly studied and thus the retrotransposons involved
in retroduplication in plants remain to be characterised, but a recent study
revealed that approximately 1% of all A. thaliana protein-coding genes were
retrogenes, as opposed to 3.4% in human (Abdelsamad and Pecinka 2014).
Unlike common gene duplicates, the resulting new retrocopy usually lack
parental introns and rarely inherit the parental promoter, thus retaining only a
part of the original information. For some time, retroduplicated genes were
83
considered to be pseudogenes because of these defects (Petrov, Lozovskaya
et al. 1996; Mighell, Smith et al. 2000). Nevertheless, since the late 1980s
(McCarrey and Thomas 1987), individual functional retrocopies (so-called
retrogenes) were found. Many retrogenes have been reported in various
genomes of plants (Drouin and Dover 1990; Charlesworth, Liu et al. 1998;
Zhang, Wu et al. 2005; Wang, Zheng et al. 2006), fruit flies (Betran and Long
2003; Zhou, Zhang et al. 2008), and mammals (Marques, Dupanloup et al.
2005; Potrzebowski, Vinckenbosch et al. 2008). It appeared that retrogenes can
acquire regulatory sequences (figure 6.1) from various sources, allowing them
to be transcribed (a prerequisite for gene functionality), and evolve new
expression patterns (Kaessmann, Vinckenbosch et al. 2009).
However, the mechanisms underlying the acquisition of new regulatory
sequences seems to be biased, as retrogenes frequently evolve testis-related
gene expression patterns, such as 53% of all identified Drosophila retrogenes
and 42% of all identified human retrogenes (Vinckenbosch, Dupanloup et al.
2006; Bai, Casola et al. 2007) for example. Similarly, most recently discovered
retrogenes in A. thaliana had a strong preferential expression pattern for male
gametes (Abdelsamad and Pecinka 2014).
The acquisition of new expression patterns can lead retrogenes to evolve new
functions. However, studies that elucidate the functions of these newly
originated genes are limited. In 2009, Kaessmann et al. revealed that, in
primates, retrogenes evolving new spatial expression patterns not only adapt to
new
functions, but also have contributed to hominoid brain evolution
(Kaessmann, Vinckenbosch et al. 2009). A more striking example of new
retrogene formation demonstrates how important a retroduplication event may
be. In the domestic dog, chondrodysplasia, a short-legged phenotype common
to several dog breeds is due only to a recently acquired retrogene. This
retrocopy (fgf4), derived from a growth factor gene, encodes the same protein
as its parental gene, but with a different expression pattern leading to the
drastic modification of a morphological trait (Parker, VonHoldt et al. 2009). In
addition to a new set of regulatory regions, a new retrocopy can also undergo
modifications leading it to encode a completely new protein, thus creating a new
function (figure 6.1). For example, in the human genome several transcribed
retrogenes have acquired new exons de novo (Vinckenbosch, Dupanloup et al.
84
2006). More drastic modifications can also occur. For example, an unusual case
of a new protein encoded by a retrogene was reported. Fg01, a mouse
retrocopy of a ribosomal protein gene, encodes a protein completely different to
that encoded by its parental gene because it is transcribed from the reverse
strand. This new protein has an entirely new function directly linked to
resistance against the Alzheimer’s disease (Zhang, Liu et al. 2009), illustrating
that retroduplication can lead to profound functional innovations.
6.2.3 Creation of new proteins through gene or transcript fusion
The idea that fusion mechanisms would be a great source of new proteins goes
back to the discovery of introns (Gilbert 1978). It was thought that reshuffling
exons of different transcription units would generate new chimeric proteins
(Long, Betrán et al. 2003). Various mechanisms can lead to creation of chimeric
proteins and fusion can happen either at the gene level or at the transcript level
(Zhou and Wang 2008; Li, Zhao et al. 2009). Fusion mechanisms are by
definition generating new functions as long as the resulting chimeric transcripts
are subsequently translated into stable proteins.
As introduced above, new transcriptional units can be created by combining
exons from different genes or transcriptional units. This can happen at the
transcript level, by a mechanism called intergenic splicing or trans-splicing,
during RNA processing, when different pre-mRNAs are integrally or partially
fused together (figure 6.2). Individual cases of trans-splicing were first reported
in the late 1980s in protists, and then in all living kingdom (Bonen 1993). With
the advent of the genomics era, genome-wide searches led to the discovery of
a large number of chimeric RNAs in human (Akiva, Toporik et al. 2006), yeast
(Li, Zhao et al. 2009), fruit fly (McManus, Duff et al. 2010) and rice (Zhang, Guo
et al. 2010). However, these approaches are based on reverse transcription and
PCR followed by sequencing, so might create artefacts and generate artificial
chimeric cDNAs (McManus, Duff et al. 2010), suggesting that reports of
trans-splicing should be treated with caution.
85
Figure 6.2. New protein evolution through fusion mechanisms. Fusion
mechanisms can happen either at the DNA or at the RNA level. Two different
genes can get fused, after duplication or retroduplication (top panel), resulting in
a chimeric gene encoding a new protein. During RNA splicing (bottom panel),
the splicing machinery can eventually exchange blocks from two different
pre-mRNA (from two different genes or from the same gene) and create
chimeric mRNA. This chimeric mRNA encodes a new protein, and can also be
integrated into the genome through retroduplication.
Nevertheless the large number of confirmed trans-splicing events suggests that
this mechanism is universal and potentially creates novel functions by
increasing the complexity of the transcriptome. Such generated chimeric
proteins can have essential functions. For example, in fruit fly an entire family of
transcription factors, regulating the nervous system development, is generated
through trans-splicing from a single complex locus called lola (Horiuchi, Giniger
et al. 2003). It is constituted of 32 exons on the same DNA strand. In 5’, each of
the first four exons has an alternative transcription initiation site; the next four
exons are common to all splicing variants, and the last 24 exons are
alternatively spliced to the constant exons. More than 80 splicing variants can
be generated this way. So far, 20 Lola isoforms have been identified; different
isoforms specifying different aspects of axon growth and guidance. At least half
of these protein isoforms are generated through trans-splicing, meaning that
different parts from different pre-mRNAs variants generated from the lola locus
are subsequently re-arranged together to generate more variants. Such
complexity contributes to achieve a sophisticated expression pattern required
for
fine
regulations
of
developmental
processes.
Additionally,
transcription-induced chimeric mRNAs can eventually be integrated in the
genome by retroposition (figure 6.2). During human evolution, a transcript
chimera of two adjacent genes encoding respectively a specific type of kinase
and a proteasome subunit became a fixed gene. This newly emerged
retrogene, called pipsl, landed in a completely different chromosomic region, is
regulated differently to its parental gene and encodes a testis-specific
ubiquitin-binding protein unique to humans (Babushok, Ohshima et al. 2007).
This example illustrates the importance of trans-splicing for organismal
evolution.
New proteins can also be generated and integrated in a more persistent fashion
in an organism via gene fusion (figure 6.2). In this case, new chimeric genes
arise from adjacent, complete or partial, gene copies. These duplicates can be
provided from the previously described gene duplication, retroduplication and
eventually by retrotransposon transcripts carrying with them a copy of their
downstream adjacent genomic sequence. As these new genes are made from
copies of DNA sequences, the function of the parental sequence is preserved.
86
Gene duplication creates multiple genes copies in a given genome, and further
recombination errors can lead adjacent copies of related or completely different
genes to be fused together. The functional importance of gene fusion in early
evolution has been extensively illustrated (Patthy 1999), but fusion events of
gene duplicates have also been reported in relatively recently evolved species.
Several such chimeric genes have been identified in fruit fly (Zhou, Zhang et al.
2008), for example the new testes-specific expression chimeric gene Hun
(Arguello, Chen et al. 2006). In humans, the usp6 oncogene, encoding an
ubiquitin-specific protease, arose after partial duplication and fusion of two quite
different genes, usp32 and tbc1d3. They are both quite ubiquitously expressed
in human tissues, but usp32 is a unique ancient gene, highly conserved in
animals, encoding a protein with an ubiquitin-protease domain and tbc1d3 is a
newly evolved human specific gene, with multiple non-functional copies derived
from duplication. After partial duplication in a other part of the genome, the
fusion of the functional domain of tbc1d3 (Rab GTPase signalling function) and
of the ubiquitin-protease domain of usp32 created usp6, a testis specifically
expressed gene (Paulding, Ruvolo et al. 2003). New genes with testis-specific
expression patterns are often supposed to be involved in the emergence of
reproductive barriers. Overall, this specific case of gene fusion illustrated how
new chimeric genes may have contributed to human speciation, and organism
evolution in general.
Gene fusion can also happen after retroduplication, as this mechanism copies
coding sequences to new genomic locations. Retrocopies can land inside
another gene (figure 6.2), resulting in the formation of a new chimeric gene,
(Vinckenbosch, Dupanloup et al. 2006). Assuming that the gene where the
retrocopy landed has duplicates, so that its original function is conserved and/or
that the fusion is beneficial, this new chimeric gene might be retained through
evolution. A typical example of this is the first young gene ever characterised;
jingwei (jgw) in Drosophila (Long and Langley 1993). This gene, specific to the
yakuba
Drosophila
lineage emerged after
retroposition of the alcohol
dehydrogenase gene (adh) inside a partial duplicate of the gene yande (ynd)
and successive fusion of the adh retrocopy with the regulatory elements and the
three first exons of the ynd duplicate. The new JGW chimeric protein is also an
alcohol dehydrogenase, but has different targets in the pheromone and
hormone metabolism compared to the conserved parental ADH, representing a
87
positive selective advantage, which has been kept by natural selection (Zhang,
Dean et al. 2004). Similar fusion-after-retroduplication events have also been
reported in mammals (Sayah, Sokolskaja et al. 2004; Brennan, Kozyrev et al.
2008) and this process is extremely important in plants (Wang, Zheng et al.
2006; Zhu, Zhang et al. 2009). During plant evolution many mitochondrial genes
have been retrocopied into nuclear genes and fused to create new chimeras. In
most cases, the nuclear parental genes provided signal sequences allowing the
mitochondrial functions to be maintained by targeting the mitochondrion-derived
protein back into the mitochondria (Liu, Zhuang et al. 2009).
Retrotransposons can also contribute to the creation of new chimeric proteins.
Another fusion mechanism was discovered in human (Moran, DeBerardinis et
al. 1999) and later called retrotransposon-mediated transduction. During the
naturally occurring retrotransposons duplication events, the RNA transcription
machinery of a retrotransposon can accidently continue its transcription after
the
retrotransposon
polyadenylation
signal
down
to
an
alternative
polyadenylation site in adjacent DNA. The resulting retrotransposon transcript
will carry an extra copy of genomic DNA, and after retrotransposition this extra
piece of DNA as well as parts of the retrotransposon can be subjected to similar
fusion mechanisms as described above and eventually lead to the creation of a
new gene encoding a new functional protein. Few cases have been reported,
but a striking example is the gene setmar. During primate evolution, an Hsmar1
transposon got inserted downstream the set gene, and led to the creation of the
chimeric gene setmar, after fusion of the transposase coding sequence with the
set
gene.
The
resulting
chimeric
protein
exhibits
similar
histone
methyltransferase activity as its SET parent, but has an extra region of yet
unknown function (Cordaux, Udit et al. 2006).
Overall, the large number of described events and the diversity of fusion
mechanisms illustrate the importance of chimeric proteins for evolution. For
instance more than 30% of all new genes in Drosophila were shown to be
chimeric genes (Zhou, Zhang et al. 2008).
88
6.2.4 Acquisition of de novo proteins by horizontal gene transfer
Horizontal gene transfer (or lateral gene transfer) is a non-sexual mechanism by
which an organism incorporates genetic information from another organism
(figure 6.3). This mechanism can result in the production of a new protein in a
lineage, which occurs frequently in bacterial (Beiko, Harlow et al. 2005) and
microbial eukaryote evolution (Gladyshev, Meselson et al. 2008).
In plants, horizontal gene transfer is common between organelles to nuclei
(Keeling and Palmer 2008) or between hosts and plant parasites (Yoshida,
Maruyama et al. 2010). A recent study showed that during its evolution, the
mitochondrial genome of the angiosperm Amborella trichopoda acquired the
entire mitochondrial genome of two mosses in addition to whole-genome
transfers three from green algae (Taylor, Rice et al. 2015).
Horizontal gene transfer has been more rarely documented in animals. One
example of bacteria-to-animal horizontal gene transfer was reported in
Drosophila (Dunning Hotopp, Clark et al. 2007), but none of the genes have
been proven to have a function. More recently, more than 100 human genes
with important functions, involved in metabolism or immune responses for
example, have been reported as foreign genes because their sequences share
more identity with genes from non-animal organism than with other animals
(Crisp, Boschetti et al. 2015). However, evidence that they were acquired
through horizontal gene transfer is usually lacking, and they might have been
lost in most lineages and conserved only in human or could also have arisen by
other evolutionary mechanisms.
To summarise, except these events of endosymbiosis only a few validated
cases of horizontal gene transfer have been reported in eukaryotic lineages with
no confirmation of a new protein being consecutively produced in the receiving
organism (Gladyshev, Meselson et al. 2008; Pace, Gilbert et al. 2008).
Therefore, the contribution of horizontal gene transfer to the origin of new
protein-coding genes in multicellular eukaryotes remains to be confirmed and its
role and mechanism remain largely uncharacterised.
89
Figure 6.3. Rapid acquisition or emergence of de novo protein. A new
protein can appear in a given organism after horizontal transfer of genetic
information from another organism (top panel). Alternatively in a given organism
a gene can evolve a new translation initiation site de novo and thus encode two
protein instead of one, the ancestral one and an additional new protein.
Eventually this gene can further evolve to lose the ancestral translation initiation
site and then only encode the new protein (bottom panel).
6.2.5 Emergence of de novo protein by new alternative gene coding
As described above, various duplication and fusion mechanisms can contribute
in some rare cases to the creation of completely new proteins. The complexity
of the fusion or of the divergence after duplication can eventually lead to the
emergence of a completely new protein, with a unique sequence in the given
organism where it occurred and with a different function to it or its parental
sequences. However, duplication and fusion mechanisms generally lead to
variations of existing protein domains or re-shuffle them to create new assembly
of domains, and so are less likely to generate evolutionary innovations.
Another, originally less broadly observed, mechanism termed frameshifting or
overprinting can lead to the creation of new proteins from existing genes. It was
first thought to only concern micro-organisms. It was shown in bacteria (Ohno
1984; Delaye, Deluna et al. 2008), but mostly in viruses (Xue, Jones et al. 2003;
Carter, Daugherty et al. 2013; Pavesi, Magiorkinis et al. 2013), that mutations
leading to the creation of alternative open reading frames in a pre-existing gene
can lead to the emergence of a new protein sequence, with often conservation
of the pre-existing frame, allowing the production of two totally distinct proteins
from a single mRNA (figure 6.3). An alternative open reading frame coding
generates a completely different protein sequence due to the different codon
usage and represents a much greater potential for innovation than the
duplication or fusion of pre-existing genes. For instance, the emergence of
alternative open reading frames (ORFs) represents obvious advantages for
small genome organisms where duplication and fusion are limited, and it has
greatly contributed to virus evolution. It has been extensively studied and
reviewed for hepatitis B virus in particular (Torresi 2002; Zaaijer, van Hemert et
al. 2007; Zhang, Chen et al. 2010; Torres, Fernandez et al. 2013). In 2001, for
the first time in mammals, a case of overlapping reading frame was found
(Klemke, Kehlenbach et al. 2001). The mammalian gene encoding the first
subunit of the G-protein, involved in signal transduction particularly in
neuroendocrine cells, has an alternative translation initiation site 32 bases
downstream of the first one, thus the two ORFs have different frame. The first
exon (called XL) originally encodes for the first domain (called XL domain) of
the G-protein subunit 1. The presence of this alternative translation initiation site
90
and also of an alternative stop codon right at the end of the XL exon, lead to the
production of another protein called ALEX from the same mRNA, but encoded
only by the XL exon, thus much smaller than the original G-protein subunit 1,
and totally different in sequence due to the different frames. ALEX interacts with
the XL domain of the G-protein to achieve fine developmental and neurological
functions. This locus and its dual protein expression has been conserved from
mouse to human, with a rate of mutation much more important than average,
especially
between
primates
and
human,
suggesting
involvement
in
species-specific neurological differences (Nekrutenko, Wadhawan et al. 2005).
Such a striking example drew more attention on this mechanism and its
relevance in eukaryotic evolution too. Further studies whose goal was
systematic detection of de novo alternative gene coding events in mouse and
human genomes revealed hundreds of alternative open reading frames leading
to the production of newly expressed stable proteins (Okamura, Feuk et al.
2006; Vanderperre, Lucier et al. 2013), mainly being species-specific. This
illustrates the importance of this mechanism for the creation of novel functions,
and particularly for speciation (Wu and Sharp 2013).
6.2.6 De novo protein evolution
Other mechanisms, more recently discovered, can lead to the creation of new
proteins ‘from scratch’, therefore called de novo evolved protein. Similarly to
alternative gene coding or horizontal gene transfer, de novo evolution can lead
to the creation of completely new proteins with new sequences and functions,
anew to the concerned organism. By contrast to new proteins created by other
evolutionary mechanisms, de novo evolved proteins are encoded by a genomic
sequence with no link to a parent coding sequence, representing true orphan
genes that exist only in closely related species, at least at the time of its
emergence.
Before the omic era it was widely accepted that “The probability that a functional
protein would appear de novo by random association of amino acids is
practically zero” (Jacob 1977). In fact, it seems now that de novo evolution of
protein is quite common, but specific cases are complicated to identify, as it
requires complete assembly of genome or transcriptome sequences of
numerous related species. This kind of data is still difficult to put together
91
nowadays, but is becoming increasingly achievable. The evolution of de novo
genes is emerging as an important source of new proteins in addition to the
familiar ‘parental gene’ model common to the traditional and previously
described mechanisms (Long, Betrán et al. 2003; Bornberg-Bauer, Huylmans et
al. 2010; Long, VanKuren et al. 2013). Cases of de novo genes have now been
reported in all living kingdoms. More precisely, striking examples with important
functions have been described in bacteria (Yang and Huang 2011), yeast (Cai,
Zhao et al. 2008; Li, Dong et al. 2010), plants (Xiao, Liu et al. 2009), Drosophila
(Begun, Lindfors et al. 2007), mouse (Heinen, Staubach et al. 2009) and human
(Li, Zhang et al. 2010; Wu, Irwin et al. 2011). Comparative genomics and
transcriptomics approaches have then revealed a plethora of de novo evolved
transcripts (Carvunis, Rolland et al. 2012; Neme and Tautz 2013; Neme and
Tautz 2014; Zhao, Saelao et al. 2014) illustrating that de novo evolution is a
significant source of new proteins and new functions (Tautz 2014). The
contribution of de novo genes to organism evolution and speciation is now
accepted. For example, it was recently showed that de novo genes play
complex roles in the human brain and that most of them are expressed in the
evolutionarily newest part of human brain, the neocortex. Remarkably, these
genes arose soon after its morphological origin and their expression pattern
suggest a function in early brain development (Zhang, Landback et al. 2011).
This demonstrates how de novo genes have contributed to the emergence of
new brain functions during human evolution.
92
Figure 6.4. De novo protein evolution through de novo gene evolution. De
novo genes evolve from non-coding DNA. A non-coding DNA region can
become coding and acquire a transcription initiation site and regulating
sequences through several mutations and insertions. More changes are
required before the newly evolved gene can encode a stable protein.
Recent studies have begun to reveal mechanisms that have enabled de novo
gene evolution (Xie, Zhang et al. 2012; Reinhardt, Wanjiru et al. 2013). De novo
genes evolve from non-coding DNA (figure 6.4). First several mutations and
insertions lead this non-coding DNA to become transcriptionally active. This
notably includes emergence of a new transcription initiation site and acquisition
of regulating sequences. At this point of its evolution the emerging gene only
produces a non-coding RNA (figure 6.4). Further mutations then allow the new
gene to encode a protein, a long process during which the regulation of its
transcription keeps evolving too. Alternatively, a dormant open reading frame
can evolve new regulatory elements and become transcribed to create a new
coding gene (figure 6.4). Subsequent challenges awaiting a de novo evolved
gene before encoding a stable protein include successful trafficking of the
encoded protein precursor, folding, processing and storage of the matured
protein (figure 6.4). The complexity of the overall process of de novo gene
creation led to the general assumption that such new genes would have a short
intronless coding sequence. However, it has been shown that de novo genes
are a great source of long coding RNAs (Ruiz-Orera, Messeguer et al. 2014)
and that de novo genes can evolve introns before they become an ORF or
acquire some after (Yang and Huang 2011). In addition, many orphans genes
(present in only one species) were identified in the past as having important
functions, suggesting their involvement in the evolution of adaptive traits
(Domazet-Loso and Tautz 2003; Khalturin, Hemmrich et al. 2009) and recent
studies suggest that an important part of these orphans genes evolved de novo
(Toll-Riera, Bosch et al. 2009; Tautz and Domazet-Lošo 2011).
Overall, de novo gene evolution, a mechanism first widely neglected, then long
underestimated, is now emerging has one of the most important evolutionary
mechanisms for the creation of novel functions, and numerous examples will
continue to be discovered. Another rarely considered route to de novo protein
evolution is for a new protein to evolve within an already existing functional
protein precursor. De novo genes emerge from previously non-functional
genomic sequence, unrelated to any pre-existing functional genes. However, in
an organism, a sequence encoding for a new independent protein, unrelated to
any pre-existing sequence in this organism, can also emerge de novo inside a
pre-existing gene, through a mechanism termed neoproteinisation.
93
Figure 6.5. De novo protein evolution through neoproteinisation.
Neoproteinisation occurs when a genetic expansion in an ancestral gene, that
does not alter the coding frame or the sequence for the ancestral protein, leads
to the creation of a new protein sequence that is co-produced with the ancestral
protein in a single protein precursor cleaved during proteolytic maturation,
allowing two proteins instead of one to be produced from a single gene. To
ensure transcription of the newly evolved sequence and for the ancestral
protein precursor to be maintained, the expansion has to happen inside the
ORF, but in a region encoding a propeptide removed during the ancestral
protein proteolytic maturation. At some point in its evolution such expansion can
acquire the ability to be processed from the protein precursor and to remain
stable, and then continue to evolve new bioactivities.
6.2.7 Neoproteinisation, a new mechanism of divergence that
creates de novo proteins
SFTI-1 is the first member of a recently discovered family of buried peptides
that evolved within the daisy (Asteraceae) family over forty million years ago.
The SFTI-1 coding sequence appears to have evolved via a genetic expansion
within a napin-type albumin gene, without modifying the ancestral napin-type
albumin coding frame, allowing two different proteins to be proteolytically
matured from one precursor (Elliott, Delay et al. 2014) by the same
endoprotease. This evolutionary mechanism that I call neoproteinisation has
never been fully defined as an evolutionary mechanism (figure 6.5).
Several points differentiate neoproteinisation from other de novo evolution
mechanisms. Insertions and mutations in a gene can create an alternative
splicing site (Sela, Itin et al. 2008) or an alternative coding frame (Klemke,
Kehlenbach et al. 2001; Nekrutenko, Wadhawan et al. 2005) allowing a gene to
produce a new additional protein. Nevertheless, most de novo evolved proteins
described are encoded by de novo evolved genes (Zhou and Wang 2008;
Kaessmann 2010; Light, Basile et al. 2014). Although initially considered rare or
unusual (Jacob 1977), de novo gene evolution is now thought to be common,
and although protein evidence for these de novo genes is often lacking, some
have been shown to have acquired important functions (Heinen, Staubach et al.
2009; Chen, Zhang et al. 2010; Wu, Irwin et al. 2011; Reinhardt, Wanjiru et al.
2013). De novo gene evolution usually involves a silent or non-coding genome
region acquiring regulatory sequences and a transcription initiation site, thus
becoming transcriptionally active pseudogene. By further evolving a translatable
open reading frame, the newly evolved gene encodes a protein, but this still
must have proper processing, folding, and trafficking to be a stable component
of the proteome. When the new gene provides a selective advantage, it is
retained (Xie, Zhang et al. 2012; Reinhardt, Wanjiru et al. 2013).
94
Figure 6.6. Alternative splicing. All the different types of alternative splicing
events that can be encountered by a pre-mRNA are represented. They
generate alternative transcripts producing distinct proteins. Alternative
sequences are indicated by beams. The most extreme possibility of alternative
splicing, the use of alternative translation initiation sites in different frames,
leads to the production of a completely different protein sequences (represented
in different colours).
By contrast, neoproteinisation is not preceded by de novo gene evolution and
doesn’t require modification of the coding frame. The DNA sequence encoding
SFTI-1 emerged de novo within an ancestral transcriptionally active ORF and
did not change the coding frame or the sequence of the ancestral encoded
protein. Additionally, SFTI-1 uses the trafficking and processing machinery of its
adjacent protein (Mylne, Colgrave et al. 2011; Elliott, Delay et al. 2014). Thus,
neoproteinisation might be an unlooked for and effective route for de novo
protein evolution as it bypasses many of the evolutionary constraints de novo
genes face to evolve stable proteins.
6.3 How to discover other neoproteinisation events?
In order to discover other neoproteinisation events, it is first necessary to
understand
what
differentiate
neoproteinisation
from
other
previously
characterised evolutionary mechanisms.
6.3.1 Neoproteinisation creates unusual genes encoding multiple
proteins
Neoproteinisation is an evolutionary mechanism that leads to the creation of a
new gene (from an ancestral one) encoding the ancestral protein and an
additional de novo evolved protein. There are various ways for a gene to
encode more than just one protein, including mechanisms of alternative splicing
and the numerous possible post-translational modifications.
Alternative splicing is a mechanism regulated by histone modifications (Luco,
Pan et al. 2010) by which a single gene can produce different mRNAs (figure
6.6) by the use of alternative exons, transcription initiation sites or termination
sites or by the retention of introns (Blencowe 2006). Most eukaryotic genes are
alternatively spliced to generate two or more alternative mRNA, individually
translated into different proteins, increasing the complexity of an organism’s
proteome or of a cell-specific proteome (Nilsen and Graveley 2010).
In mammals, the gene for Flt1, a vascular endothelial growth factor (VEGF)
transmembrane receptor, is alternatively spliced to generate sFlt1, a truncated
receptor lacking the transmembrane and intracellular domains. This soluble
95
receptor acts as a VEGF inhibitor, because it has the same binding domain as
Flt1. In humans only, another splicing variant exists, called sFlt1-14, differing by
75 amino acid residues with sFlt1 on the N-terminal end. VEGF is a key factor
in mammals for embryo development and adult neovascularisation. Whereas
Flt1 and sFlt1 are broadly expressed and contribute to the transmission and
regulation of the VEGF signal, sFlt1-14 appears to be more placenta-specific
and associated with preeclampsia, a human-specific disease affecting about 5%
of pregnancies, whereby sFlt1-14 becomes the prevalent placental produce of
Flt1, resulting in the loss of Flt1 function. (Sela, Itin et al. 2008; Souders,
Maynard et al. 2015). This example alone illustrates how alternative splicing
contributes to establish fine regulations, signal transmission regulation in this
case, and possibly speciation, and how its dis-regulation can lead to severe
diseases. Other examples of alternative splicing are even more striking,
illustrating how alternative splicing can greatly contributes to increase the
proteome complexity and can achieve complex processes. A chicken specific
gene called cSlo encodes, through alternative splicing, 576 different proteins,
each preferentially expressed in the various hair cells of the avian cochlea to
specifically react to different sound frequencies and allow hearing (Black 1998).
In the fruit fly, the gene Dscam encodes a nervous system cell-surface protein
necessary for axon targeting and is alternatively spliced to create up to more
than 38,000 potential variants. Although the exact number of expressed variants
is unknown, it has been shown that specific cells express unique combinations
of up to 50 variants, potentially creating a form of unique cell identity (Neves,
Zucker et al. 2004). Another type of post-transcriptional change, known as RNA
editing, by which RNA undergo discrete changes, can change the genomic
information. However, RNA editing, common in mitochondria, is rare for nuclear
genes with very few examples reported in animals, and thus is regarded as a
minor contributor to organismal protein diversity (Brennicke, Marchfelder et al.
1999).
96
Figure 6.7. Protein post-translational events. After translation several events
can generate multiple proteins (in blue font). The direct product of translation is
often a protein precursor that requires post-translational modifications to
become functional. These post-translational modifications can include
methylation, acetylation, hydroxylation, phosphorylation, nitrosylation, sulfation,
or sugar, lipid or small protein attachment (a) A given protein precursor can
eventually undergo different post-translational modifications to have different
functions or to modulate its function, depending on various factors such as cell
type for example, thus the gene encoding this protein precursor produce
multiple different proteins. (b) The most common post-translational modification
is proteolytic cleavage, commonly used for signal sequence removal and
maturation of multi-units proteins from a single precursor (in light blue font). (c)
Proteolytic cleavage can also mature multiprotein precursors by cleaving them
into two or more proteins of identical, related or different sequence and function.
(d) Protein degradation can sometimes result in the production of cryptic
peptides with biological activities (in darker blue font).
Another way for a gene to encode for multiple proteins is through post
translational modifications. The primary product of translation often goes
through modification processes such as sugar, lipid or small protein (e.g.
ubiquitin, SUMO) attachment on specific residues or amino-acid modifications,
including methylation, acetylation, hydroxylation, phosphorylation, nitrosylation
or sulfation (Mann and Jensen 2003), thus increasing the protein diversity
(figure 6.7a). However, probably the most important set of modifications
occurring during protein maturation is generated by the action of proteases
(figure 6.7b). The process of protein maturation by proteolytic cleavage is
widely studied, and is required for the maturation of many proteins (Rogers and
Overall 2013). It is particularly important for the maturation of multi-chain
proteins originally produced as a single-chain precursor, such as insulin
(Steiner, Cunningham et al. 1967; Chance, Ellis et al. 1968; Rubenstein, Clark
et al. 1969). Another type of precursors, named polyproteins, depends on
proteases for maturation.
Polyproteins are encoded by genes producing a single mRNA that is translated
into a single protein precursor, but this precursor is cleaved into multiple
proteins during its maturation (figure 6.7c). It is particularly important for
viruses, as many viral genomes encode just one polyprotein, which is cleaved
into multiple functional proteins (Kazachkov Yu, Svitkin Yu et al. 1983;
Carrington and Dougherty 1988; Korant 1990). Polyproteins have also been
reported in all types of organism such as bacteria (Thony-Meyer, Bock et al.
1992), yeast (Singh, Chen et al. 1983; Gessert, Kim et al. 1994), amphibians
(Hoffmann, Bach et al. 1983), fish (Hsiao, Cheng et al. 1990), insects (Settle,
Green et al. 1995) and plants (Pearce, Moura et al. 2001) where it is particularly
frequent for seed storage proteins (Shewry, Napier et al. 1995). In animals,
most reported polyproteins produce hormones or neurotransmitters such as the
mollusc bag cell neurotransmitters, all cleaved from a single precursor
(Rothman, Mayeri et al. 1983), or the mammalian hormone dual precursors of
arginine-vasopressin and neurophysin II (Land, Schutz et al. 1982) or of
oxytocin and neurophysin I (Land, Grez et al. 1983).
97
Most known polyprotein precursors generate proteins with identical or related
sequences and functions (see (Douglass, Civelli et al. 1984) for review). These
precursors contain tandem repeats of the same or similar proteins, further
cleaved into independent functional proteins. For example in plant seeds,
storage albumin precursors can contain two identical or related copies of
albumin (Shewry and Pandya 1999) and a fish gene encoding antifreeze
glycopeptides (AFGP) produce a single polyprotein precursor including 44
tandem repeats of AFGP8 and two of AFGP7 all separated by a three amino
acid spacers, further cleaved to produce the 46 proteins (Hsiao, Cheng et al.
1990). More rarely, polyprotein genes can also encodes multiple proteins with
unrelated sequences and unique biochemical actions such as the two
neurophysin containing precursor described above. The most extensively
studied example is most probably the human proopiomelanocortin (or POMC)
prohormone precursor. The POMC precursor can be first cleaved into
y-melanotropin (y-MSH), adrenocorticotropin (ACTH) and b-lipotropin (b-LPH),
three hormones with unrelated sequences and structures and unique functions
(Mains, Eipper et al. 1977; Spiess, Mount et al. 1982). ACTH is primarily a
positive regulator of cortisol production and release, y-MSH induces melanin
production and b-LPH stimulates lipolysis. Further cleavages can occur and cut
these
proteins
into smaller
components
involved in various
biological
responses. ACTH can be cleaved to release another melanotropin (a-MSH) and
b-LPH
is
also
a
precursor
for
the
endorphins
(endogenous
opioid
neuropeptides) and b-MSH, a third melanotropin (Ueki, Someya et al. 2007).
The several proteins derived from POMC are preferentially produced in a
tissue-specific fashion thanks to cell-specific production of the processing
proteases (Autelitano, Rajic et al. 2006), illustrating how a single gene can
produce a unique translational product, but still encodes multiple proteins.
Because the secondary POMC derivatives are cleavage products of functional
proteins, thus representing additional bioactive polypeptides hidden in bigger
functional proteins, they are sometimes referred as cryptides or crypteins. Later,
many other examples of so-called cryptic peptides have been described and
reviewed (Autelitano, Rajic et al. 2006; Pimenta and Lebrun 2007; Ueki,
Someya et al. 2007; Samir and Link 2011). The extracellular matrix is supposed
to be the greatest contributor of cryptic fragments (Schenk and Quaranta 2003),
such as endostatin (Yamaguchi, Anand-Apte et al. 1999), restin (Sasaki,
98
Larsson et al. 1999), tumstatin (Maeshima, Sudhakar et al. 2002), canstatin
(Kamphaus, Colorado et al. 2000) and arresten (Colorado, Torre et al. 2000),
but all are degradation products of collagen chains and all are also reported to
act as modulators of angiogenesis and tumour cell growth. Unlike POMC
derivatives or more traditional protein encoded by polyprotein genes, cryptides
are degradation products of functional proteins (figure 6.7d) and not the
product of the complex and regulated maturation of a polyprotein precursor. In
addition, the finding that peptides derived from larger functional proteins can
affect biological processes does not prove that these peptides are true
endogenous signalling molecules if their production and eventual secretion are
not regulated (Gelman and Fricker 2010), which is still largely unknown.
Therefore, most cryptic peptides reported so far, cannot be considered as
additional proteins in a given proteome, but rather represent a new ‘ome’, the
cryptome (Pimenta and Lebrun 2007).
PawS1 represents an unusual polyprotein precursor. SFTI-1 is not a cryptic
peptide because it is produced from an immature protein precursor, along with
the other protein (a napin-type albumin). In this aspect, PawS1 is a traditional
polyprotein precursor, but unlike most of them, PawS1 is cleaved into two
proteins of unrelated sequence, size and function. Additionally, unlike these
rarely described polyprotein genes encoding unrelated proteins, PawS1 belongs
to a family of usually highly conserved genes, the plant napin-type albumin
genes. The PawS1 family of napin-type albumin genes is restricted to only
sunflower and its close relatives and has been shown to have evolved de novo
from an ancestral napin-type albumin gene to encode an additional protein
(Elliott, Delay et al. 2014), through a newly described mechanism I propose to
term neoproteinisation.
99
6.3.2 Leads for discovery of other neoproteinisation events
The proteolytic cleavage of large precursors to mature or ‘free’ functional
proteins can be regarded as the most important post-translational modification
mechanism due to its irreversibility and because it is ubiquitous to most
organisms (Rogers and Overall 2013). This maturation processes generates
by-products, the spacers and other propeptides, surrounding the different
chains or proteins included in the larger precursors. Once again, it is important
to note that such peptides are not to be considered as ‘cryptic’ because they are
produced during maturation of an inactive protein precursor, and not by the
degradation of a functional protein. Unfortunately, these peptides have been
historically considered as non-functional or even junk. To the contrary, the
emphasis created by the cryptome debate described above has brought to light
this so-called junk which then revealed that it could potentially be a great
reserve of a plethora of uncharacterised bioactive peptides. One of the best
examples is the human C-peptide produced during maturation of insulin. The
precursor for insulin was first described in 1968 (Chance, Ellis et al. 1968) along
with the sequence of the peptide connecting the two insulin chains. This peptide
was later called the C-peptide and in 1969 it was already reported to be
secreted from pancreas to blood along with insulin (Rubenstein, Clark et al.
1969) and proposed to have biological significance. Surprisingly this study
seems to have been long underestimated. More than two decades later
beneficial effects of C-peptide on diabetic patients only started to be reported
(Johansson, Kernell et al. 1993), illustrating the potential of this neglected
peptide. During the last decade several studies have elucidated the role of
C-peptide in reducing diabetes effects, showing that the C-peptide affects the
expression of specific genes to antagonise endothelial dysfunction (Luppi,
Cifarelli et al. 2008) and prevents renal hypertrophy by inducing arteriole
dilatation and repressing sodium reabsorption (Nordquist, Brown et al. 2009;
Nordquist and Wahren 2009). The C-peptide is now regarded as an
endogenous hormone, but no C-peptide receptors have been reported so far,
so its detailed mode of action is still unclear.
100
Interestingly the fact that the C-peptide was a small peptide lacking interspecies conservation, both in chain length and amino acid composition
(excepted for the conserved cleavage sites), were the major reasons leading to
it being neglected as having a function. No assumption should be taken
regarding the size of a protein (as one says “size doesn’t matter”). The past
assumption that small proteins (less than 100 amino acid residues) are less
likely to have important functions is no longer widely accepted. Several studies
have demonstrated how small peptides, even very small peptides, can have
incredibly important functions (see (Su, Ling et al. 2013) for review) such as the
eleven
amino
acid
residues
protein
TAL
from
Drosophila influencing
development by controlling gene expression and tissue folding (Galindo, Pueyo
et al. 2007). Similarly to the C-peptide, the N-terminal propeptide of napin-type
albumin precursors also displays low conservation (excepted for the highly
conserved AEP cleavage site) relative to the more highly conserved napin-type
albumin domain. Previous studies showed that, in some napin-type albumin
genes from the daisy family of plants only, this propeptide evolved de novo to
contain a new bioactive peptide, such as SFTI-1 in sunflower (Elliott, Delay et
al. 2014). Therefore, the characteristics that led to human C-peptide function to
be overlooked might exactly be what we should look for to discover new
bioactivities. The investigation of widely conserved protein families that require
proteolytic cleavages for their maturation, such as napin-type albumins or
insulins, could reveal a multitude of new bioactive peptides.
Furthermore, studying the detailed evolution of such peptides, like the previous
work
done
for
SFTI-1
evolution,
might
also
reveal
other
cases
of
neoproteinisation. For example, in the next Chapter, I present a study of vicilin
precursors (another type of seed storage protein) suggesting that in many
plants species, vicilin precursors can possess additional domains, cleaved from
the precursors by proteases during maturation, creating stable peptides with
functions unrelated to seed storage proteins, such as digestive proteases
inhibition and cytotoxic activities.
101
6.4 Conclusion
The SFTI-1 family is the first family of proteins to have been shown to have
evolved by neoproteinisation, a process by which a new protein-coding
sequence evolve de novo inside an ancestral gene, without modifying the
coding frame or sequence for the mature ancestral protein. Next generation
sequencing has already greatly helped change the old vision of evolution. It was
long-considered that de novo gene evolution was almost impossible and
probable examples were neglected rather than being studied. Nowadays, the
evolution of de novo genes is emerging as an important source of new proteins
in addition to the familiar ‘parental gene’ model and cases have now been
reported in all living kingdoms. I believe that, similarly, many other cases of
neoproteinisation might have been overlooked in the past, and the use of
modern approaches such as next generation sequencing allowing whole
genome and/or transcriptome sequencing for a growing number of species will
help reveal more neoproteinisation events.
This mechanism appears to be remarkably efficient in many regards. First,
neoproteinisation seems to create particularly efficient new bioactivities. Indeed,
as previously said, neoproteinisation led to the creation of SFTI-1, the most
potent trypsin inhibitor. SFTI-1 is also the second smallest trypsin inhibitor,
which can also be regarded as a form of efficiency in terms of size to achieve a
same function, compared to all the larger trypsin inhibitors. My previous results
also demonstrated that SFTI-1 was a remarkably efficient peptide sequence for
maturation. The AEP processing sites of SFTI-1 evolved prior the emergence of
the trypsin inhibitory activity (Elliott, Delay et al. 2014), suggesting a strong
dependency toward the processing enzyme for the evolution of SFTI-1. In
addition to seed storage protein maturation, AEP was already demonstrated to
be involved in the processing of many peptides. Overall, this suggests that AEP
processed proteins might represent potential “hot-beds” of neoproteinisation
events. Seed storage proteins have long precursors that require maturation by
AEP for their maturation and for this reason could be of particular interest for
the identification of other events of neoproteinisation. In the next Chapter, I
present evidence supporting that the vicilin type of seed storage proteins
precursors might also have evolved additional peptides by neoproteinisation.
102
Chapter 7
Neoproteinisation in vicilin
precursors
103
Figure 7.1. C2 and similar peptides. The sequences of the C2 peptide from
pumpkin, Luffin P1 from a pumpkin relative and the three MiAMP2 peptides
from macadamia nuts are presented and aligned. The NMR structure of native
Luffin P1, (Li, Yang et al. 2003). Luffin P1 shares 78% sequence identity with
C2.
7
Neoproteinisation in vicilin precursors
7.1 Introduction
If neoproteinisation was a universal mechanism, there should be other
examples. Following the recommendations I presented in the previous Chapter,
I searched published datasets for other cases of neoproteinisation. An unusual
peptide from pumpkin (Cucurbita maxima) seeds called C2 was shown to
emerge from the N-terminal domain of a precursor protein that also encoded a
very different type of seed storage protein called vicilin (Yamada, Shimada et al.
1999). The precursor was called PV100 and is matured by AEP into a 50 kDa
vicilin, a 5 kDa trypsin inhibitory peptide called C2, and three other peptides
from 4-5 kDa with cytotoxic activities (Yamada, Shimada et al. 1999). The C2
peptide has a pair of CXXXC motifs connected by disulfide bonds, and its
C-terminus presents two possible AEP cleavage sites. The two forms of C2,
C2a (5615 Da) and C2b (5829 Da), only differ by two amino acid residues (see
figure 7.1 for sequence). The C2 peptide is a trypsin inhibitor and the CXXXC
motifs were proposed to be the active sites by homology with other trypsin
inhibitors
where
conserved
interconnected
CXXXC
motifs
have
been
characterised as the active sites (Yamada, Shimada et al. 1999). Similarly,
another
research
group
isolated,
from
macadamia
nuts
(Macadamia
integrifolia), three similar peptides named MiAMP2 (b, c, and d) and which
display antimicrobial activities (Marcus, Green et al. 1999). These three
peptides are also encoded in a single gene encoding a polyprotein precursor
that also contains a typical vicilin domain. This polyprotein precursor was
reported to share lot of similarities with PV100. These MiAMP2 (b, c, and d)
peptides present a pair of CXXXC motifs like the C2 peptide from pumpkin and
they possess a similar sequence (figure 7.1). They are also spaced by AEP
target sites, as the similar peptides from the PV100 precursor, and possess an
AEP target site right before their N-terminus. However, their mature sequence
suggests they would receive additional processing after AEP processing, which
was demonstrated for MiAMP2c. The C-terminal propeptide that separate
MiAMP2c from the AEP cleavage site prior MiAMP2d was also sequenced in
the immature form of MiAMP2c (Marcus, Green et al. 1999). A few years later a
104
close homologue of the C2 peptide was identified in a pumpkin relative (Luffa
cylindrica), during a screen for ribosome inactivating peptide (Li, Yang et al.
2003). However, in this study only partial DNA sequence was obtained from the
gene encoding this peptide, and thus the full length protein precursor was
unknown and the homology with C2 was not noticed. Despite the lack of full
length gene sequence, the double CXXXC motifs, as well as the overall
sequence conservation (figure 7.1) implies that Luffin P1 is a C2 homologue
and is derived from a homologue of PV100. Luffin P1 was further characterised
and shown to display anti-HIV activities and its structure was solved by NMR
(Ng, Yang et al. 2011). It confirmed the cysteine bonding (figure 7.1) previously
assumed for the C2 peptide, based on an unrelated peptide with two CXXXC
motifs, as also suggested for the MiAMP2 peptides. Overall, the sequence of
C2 and MiAMP2 peptides is particularly conserved and they define a family of
peptides buried in vicilin precursors with various defence-related functions.
Additionally the C2 peptide and its homologues are proposed to be processed
entirely or partially by AEP.
PV100 is an AEP-matured polyprotein precursor producing a seed storage
vicilin plus additional peptides; this makes PV100 excitingly similar to PawS1
and SFTI-1 biosynthesis. It suggests that, similarly to the emergence of SFTI-1
in an ancestral albumin precursor, a family of C2-type peptides could have
evolved by neoproteinisation of an ancestral vicilin precursor. Vicilins constitute
another family of low molecular weight seed storage proteins, also referred as
7S globulins. They vary greatly in sequence and structure to other seed storage
proteins and are the subject of heavy post-translational modifications (Shewry,
Napier et al. 1995), but their proteolytic maturation has been poorly described
apart from PV100. Thus, it appears that seed storage protein precursors would
be of particular interest to look for neoproteinisation events. Additionally, the
MiAMP2 precursor contains four repeats of double CXXXC motifs, including
three that have been characterised at the protein level, and PV100 contains two
double CXXXC motifs, C1 and C2, but only C2 was characterised at the protein
level. This strongly suggests internal expansion in the N-terminal part of the
vicilin precursors in addition to neoproteinisation. Finally, the phylogenetic
distance between Cucurbita maxima and Macadamia integrifolia, where C2 and
MiAMP2 were respectively discovered, suggests that this family of doubleCXXXC-motif peptides is widespread and constitute a much more ancient
105
neoproteinisation event than SFTI-1, which is confined to the Asteroideae
subfamily of the Asterales plant order (ca 42 Mya, Jayasena et al. unpublished).
To roughly date the neoproteinisation of vicilin precursors, I investigated
published datasets to identify homologues of PV100 and MiAMP2 precursors in
the entire plant kingdom. I generated an alignment of most known vicilin
precursors described so far. I characterised different classes of vicilin
precursors, and in particular one that contains additional buried peptides similar
to these present in PV100 and MiAMP2 protein precursor. This class of vicilin
precursor contain several repeats of conserved CXXXC(10-14)CXXXC motifs
widespread in the plant kingdom, confirming that the C2 peptides and its
homologues represent a large and ancient family of peptides buried in vicilin
precursors. Then, using mass spectrometry, I identified several masses for
potential C2 peptide homologues in close relatives. Although more data will be
necessary to establish the detailed evolutionary story of these vicilin buried
peptides, they represent a second family of buried peptides that have evolved
by neoproteinisation.
106
7.2 Material and methods
7.2.1 Vicilin alignment
NCBI and UniProtKB databases were searched for protein sequences
annotated as vicilin and for protein sequences similar to PV100. All results were
gathered (983 unique entries) and only unique and full ORF sequences (452
sequences) were retained. Then, these sequences were further filtered to
eliminate, in a given species, multiple entries of the same sequences with minor
variations possibly due either to sequencing errors or recent gene duplication.
Such sequences would bring little information and could introduce a bias in the
alignment. In this way, the sequence number was reduced to 291 (including
PV100 and MiAMP2 precursors). These sequences were aligned with CLC
Genomics (CLCbio) and additional modifications to the alignment were made
manually to correct obvious alignment errors creating unnecessary large gaps
when only one or a few amino acid residues from one protein precursor would
not align with the 290 other protein precursors.
7.2.2 Seed extract fractions preparation
Seeds of all plant species studied in this study were frozen in liquid nitrogen and
ground to a fine powder with a mortar and pestle. The meal was covered with
an excess of hexane, mixed for 5 min before centrifugation for 5 min at 20000 x
g at room temperature, and the upper hexane/oil phase discarded. The meal
was covered with hexane a second time and the above process repeated. The
now defatted meal was allowed to dry and then resuspended in 1:1
methanol:chloroform and mixed for 5 min. To separate the phases, the mixture
was then diluted to 0.01% (v/v) trifluoroacetic acid with a 0.05% (v/v)
trifluoroacetic acid solution and was then mixed for 2 min before centrifugation
for 5 min at 20000 x g at room temperature. The aqueous phase was
desiccated and resuspended in 5% (v/v) acetonitrile 0.1% (v/v) trifluoroacetic
acid. Proteins were separated on a C18 column (Macrospin columns, 50-450
μL, The Nest Group Inc) with a 5% to 75% (v/v) acetonitrile linear gradient with
5% increments and fractions were collected at each elution points. After
desiccation, these fractions were resuspended 50% (v/v) acetonitrile for
107
MALDI-TOF-MS analysis or in water for subsequent trypsin digestion as
previously described (Mylne, Colgrave et al. 2011) and LC-MS/MS analysis.
7.2.3 MALDI-TOF-MS analysis
The matrix used for MALDI-TOF-MS was prepared by adding an excess of
α-cyano-4-hydroxycinnamic acid matrix (Fluka) to 90% (v/v) acetonitrile 0.1%
(v/v) trifluoroacetic acid, followed by sonication in a water bath for 15 min and
then centrifugation for 5 min at 10,000 x g. The supernatant was diluted 10-fold
in acetonitrile to make the final matrix solution used in MALDI-TOF-MS. For
each seed extract fractions, the previously resuspended fraction was mixed in
equal proportion with the final matrix solution, and 2 μL of this mix were spotted
onto MTP AnchorChip™ 384 plate. Dried spots were analysed with an UltraFlex
III MALDI-TOF/TOF mass spectrometer (Bruker Daltonics) at 30% laser
intensity.
7.2.4 LC/MS analysis
For each sample, 8 µg of digested proteins were injected into an Agilent 6550
Q-TOF and processed as described in 5.2.5. Spectra were searched against
the PV100 sequence using Mascot 2.3.02 (Matrix Science).
108
Figure 7.2. Alignment of 291 vicilin precursors. The result is presented in the
coloured top alignment with a consensus schema of plant vicilin precursors
above. The colour code is random with one colour specifically attributed to one
letter. The three major groups are presented, with group I constituting the
ancestral form, and two different events leading to the emergence of either
group II or III. The group IV is an exception inside group III, see explanations in
the text. The sequence of the characterised C2 and MiAMP2 (b, c, d) peptides
are marked by a star. The bottom alignments 1 and 2 are a zoom-in of the
boxes marked in the alignment. In these, shown in black and white, cysteine
residues are highlighted in red to accent the CXXXC(10-14)CXXXC motifs. An
additional zoom 3 is presented, in colour again to highlight the sequence
homology and contain sequences supported by mass-spectrometry as shown
by black boxes. The colour code is the same as the alignment above. Zoom 1
and 2 will be presented in larger resolution in the following part (7.3.2).
7.3 Results
7.3.1 Several classes of vicilin precursors
As introduced above, I generated an alignment of 291 vicilin precursors
published sequences. They represent data from 100 species, spread over 22
plant orders and covering most of the plant kingdom from ferns (Polypodiales)
to angiosperms. On the alignment presented in figure 7.2, I have outlined what
I see to be four classes of vicilin precursors, and they are represented in most
families of higher plants.
Class I vicilin precursors contains only an ER signal and the vicilin domain,
which is highly conserved throughout the plant kingdom. The class I is likely to
represent the ancestral form of vicilin precursors (see later comment). What I
call the class II vicilin precursors have an additional C-terminal domain rich in
arginine and glutamine residues. The class III vicilin precursors, including
PV100 and MiAMP2 precursors, has two additional N-terminal domains, a
cysteine-rich domain with conserved CXXXC(10-14)CXXXC motifs and a
domain rich in arginine and glutamine residues. These two domains are
constituted of repeats of the same motifs, and the number of repeats is variable
between species and inside each species ranging from one to eight repeats.
The characterised C2 peptide and three MiAMP2 peptides are marked. Their
dispersed positions on the alignment suggest that potentially many other plants
could also produce homologues of these peptides. Finally, the class IV vicilin
precursors have only an additional N-terminal domain rich in arginine and
glutamine residues, but this class appears restricted to legumes (Fabales).
Class IV align within class III due to high sequence homology between their
vicilin domains and these of other legume class III vicilin precursors.
109
Figure 7.3. Emergence of new classes of vicilin precursors. Results from
the alignment of 291 vicilin precursors summarised on a recently published
phylogenetic tree (Ruhfel, Gitzendanner et al. 2014). Species highlighted in blue
were represented on the alignment, as well as species highlighted in red for
which, additionally, evidence of CXXXC(10-14)CXXXC motifs peptides were
found.
The presence of the various classes of vicilin precursors is overlaid on a
recently published phylogenetic tree of green plants (Viridiplantae) from algae to
angiosperms (Ruhfel, Gitzendanner et al. 2014) and presented in figure 7.3.
The plants orders represented in the alignment are highlighted on the
phylogenetic tree. This figure suggests an ancient evolutionary story for vicilin
precursors.
The simple class I vicilin precursors contain only a vicilin domain and are the
most broadly distributed among all plants, from Polypodiales to Brassicales
(figure 7.3), which is consistent with a hypothesis that class I precursors
represent the ancestral form of vicilin precursors.
Then the two other classes of vicilin precursors, class II and III (figure 7.2) are
represented in most flowering plants, from Amborellales to Malvales, but not in
species from the order Brassicales, the last plant group on the phylogenetic tree
(figure 7.3), that only have class I and II.
Pinales, the non-flowering plant order the most closely related to flowering plant
orders, only have class I vicilin precursors, which is supported by data from five
different species, thus giving some relative confidence that the class II and of
vicilin
precursors
emerged
after
the
transition
between
Pinales
and
Amborellales. Interestingly, the transition between these two is unequivocally
considered to constitute the emergence of the flowering plants. Brassicales
contain the class II of vicilin precursor, but lack the class III. The absence of this
class of precursors in Brassicales is supported by data from ten different
species including several model species, which allow strong confidence to this
statement.
The class IV seems to be a variant inside class III. Indeed, these precursors are
represented only in a few species of legumes (Fabales), which could suggests
incomplete sequencing. However, the Fabales constituted the larger data set
and presented the most diversity of vicilin precursors, and this variant class
could also represent an intermediate evolutionary state that was fixed only in a
few legumes. As said above, their sequence is highly similar to other class III
vicilin precursor from the Fabales, apart for the CXXXC(10-14)CXXXC motifs.
110
7.3.2 Emergence of conserved CXXXC(10-14)CXXXC motifs in vicilin
precursors
From these results, an interesting evolutionary story for vicilin precursors can be
proposed. Non-flowering plants possess only class I vicilin precursors,
suggesting that class I represents the ancestral form of vicilin precursors,
whereas flowering plants possess up to four different classes of vicilin
precursors, including this ancestral form. It appears that the vicilin precursors
diverged by an internal genetic expansion after the emergence of the flowering
plants. This was most probably preceded by gene duplication, as many
flowering plants species possess classes I, II and III vicilin precursors. This is
consistent with the reports of whole genome duplication events shortly before
the emergence flowering plants (Jiao, Wickett et al. 2011).
Two different set of genetic events resulted in either the C-terminal or the
N-terminal region of the ancestral class I vicilin precursors to expand, creating
two new and distinct classes of vicilin precursors, classes II and III. These
modifications seem to have been conserved in most flowering plant. We can
note the exception of class III not being represented in Brassicales, suggesting
that this class of precursor might have been lost after Brassicales emergence.
We can also note, according to the different degree of sequence identity on the
alignment, that the expansion on the C-terminal side of vicilin precursors is
relatively conserved in size and sequence among flowering plants, suggesting
that class II vicilin precursors haven’t dramatically diverged after the emergence
of flowering plants, whereas the expansion on the N-terminal side of vicilin
precursors, which gave birth to variable numbers of CXXXC(10-14)CXXXC
conserved motifs between flowering plant species, suggests that class III vicilin
precursors continued to diverge after the emergence of flowering plants.
These CXXXC(10-14)CXXXC conserved motifs appear to be functional as
confirmed by the characterised C2 and three MiAMP2 peptides. The broad
representation of class III vicilin precursors among flowering plants suggests
that potentially many other plants could also produce vicilin-buried peptides.
111
Populus trichocarpa
Populus euphratica
Ricinus communis
Jatropha curcas
Ricinus communis
Jatropha curcas
Vitis vinifera
Vitis vinifera
Solanum tuberosum
Solanum lycopersicum
Nicotiana sylvestris
Nicotiana tomentosiformis
Nicotiana tomentosiformis
Nicotiana tomentosiformis
Nicotiana sylvestris
Solanum lycopersicum
Solanum tuberosum
Solanum lycopersicum
Erythranthe guttata
Erythranthe guttata
Sesamum indicum
Erythranthe guttata
Sesamum indicum
Beta vulgaris
Nelumbo nucifera
Macadamia integrifolia
Oryza glaberrima
Oryza sativa
Oryza sativa
Oryza glumipatula
Oryza meridionalis
Oryza sativa
Oryza brachyantha
Leersia perrieri
Zea mays
Zea mays
Zea mays
Zea mays
Sorghum bicolor
Setaria italica
Zea mays
Triticum aestivum
Triticum aestivum
Triticum aestivum
Triticum aestivum
Triticum aestivum
Triticum urartu
Hordeum vulgare
Musa acuminata
Musa acuminata
Musa acuminata
Musa acuminata
Musa acuminata
Musa acuminata
Elaeis guineensis
Elaeis guineensis
Phoenix dactylifera
Elaeis guineensis
Elaeis guineensis
Amborella trichopoda
FMVDPAKKKPGQCLEECQRQ
EGGKQKSLCRLRCQEKYEREPGRE EEGNM EEKE
E
FTVDPAKKKLGQCLEECQRQ
AGGEQKSLCSLRCQEKYERGPGRE EVGNM EEEE
E
STTDP EKRLRECQRQCERQ
EG QQRTLCRRRCQESYERERERE EEGGR GEREHGRE
AERRVGQCQSQCERQ
KG EERALCRFRCQQKFGRDEGEH SWG
EEK
EKSDDPRKQYERCLEICERQEGR
QKQQCKRRCYTQYEEQQKEWEEREHGGGGGNSE
TET
SEDPRVQYERCQELCDRQPER
QKSQCRGRCERQFEEKQKEREERGSD
RGT
NTDPEKQLQQCQKQCER Q
REQQKEQCRRQCQDTYERQRGEEGKGGGGQSGEKDE QGRQ
VDPEKQLQQCRKQCELLQ
PGRQREQCHQECQEKYDQQQLEEG
ESDLKQCKHQCKVLQQIEEY QREKCYEICEKLYHGEKQEKEE INF
YEGRE
ESDLKQCRHHCKVLQHIEEY QREKCYDICEKLYHSEKQEKEEETNS
YEGRE
DPELKQCKHQCKALQQIGEH QRKECYEMCER YHSEKQSKEEDTSTFL
PYRGRE
DPELKRCKHQCKALQQIGEH QRKECYEMCEK YDSEKQSKEEDTSTFF
PYRGKE
NQGRDPQRIFRECQQRCQQQEQ GRQQQRQCQQRCEEEFRREQEQ QRR
GQETGEERENPQ
NQGRDPQRIFRECQQRCQQQEQ GRQQQRQCQQRCEEEFRREQEQ QRR
GQETGEERENPQ
NQGRDPQRRFRVCQQRCQQQEQ GRQQQRRCQQRCEEEFRREQEQ QGR
GQETGEERENPQ
RLQECQRRCQSEQQG Q RLQECQQRCQQEYQRE
K
GQHQGET NPQ
KFQECQRRCQSEEQG Q QLQECQQRCQQEYQRE
K
GQ QVET NPE
SQCDDQNPQGQHPEE KFRECQQHCERQEQ
PEQEYQECRQECRRQSREGGSR QERCEERCQEQRREREREQPGR
REGGGGMYMYEGRE
PEQEYQECRQECRRQSREGGSRRGTS QERCEERCQEQRREREREQPGRREGGGNM YEGR
PEKQYQQCRLQCRRQGEGGGFSR
EHCERRREEKYREQQGR
EGGRGEM YEGRE
REDEPAAQRFKQCQSRCGKQEQG Q QRQYCQQKCQWEYEKQK
REEEQQHGGGRGGGDPTN
RRDPE QQYKQCQSRCAREERG E QRQYCQQKCQWEYERQK
REQGREQGGGGGS
TN
APRRDPEKAYRECRERCQEVEEG RREQQVCESECEQRRER
IRRDDEIVGRRSV
CQQLCDQ GRGEQ EKQLCRQGCEQRHRQQQKEQD
ERQREHC
NKRDPQQREYEDCRRRCEQQ
E PRQQHQCQLRCREQQRQHGRGGDMMN
RRRETSLRRCLQRCE
QDRPPYERARCVQECKDQQ QQQ QERRREHGGHDDDRRD RDR
RRRETSLRRCLQRCE
QDRPPYERARCVQECKDQQ QQQ QERRREHGGHDDDRRD RDR
RRRETSLRRCLQRCE
QDRPPYERARCVQECKDQQ QQQ QERRREHGGHDDDRRD RDR
RRRETSLRRCLQRCE
QDRPPYERARCVQECKDQQ QQQ QERRREHGGHDDDRRD RDR
RRRETSLRRCLQRCE
EDRPLYERARCVQECKDQQ QQQ QERRREHGRHDDDRRD RDR
RRRETSLRRCLQRCE
QDRPPYERARCVQECKDQQ QQQ QERRREHGGHDDDRRD RDR
RRGETSLGRCLQRCE
EDRPRYERARCVQECKEQQ QQQ EQERRREHGRHDDDRNG RDR
RRGETSLGRCMQRCE
EDRPSYERARCLQQCKEQQRQQQQ EEERRREHGRHDDDRSSSRDR
NHHHHGGHKSGQCVRRCE
DRPWHQRPRCLEQCREEEREKRQ ERS
RHEAD
DR
NHHHHGGHKSGQCVRRCE
DRPWHQRPRCLEQCREEEREKRQ ERS
RHEAD
DR
NHHHHGGHKSGRCVRRCE
DRPWHQRPRCLEQCREEEREKRQ ERS
RHEAD
DR
NHHHHGGHKSGRCVRRCE
DRPWHQRPRCLEQCREEEREKRQ ERS
RHEAD
DR
HDGH
RCARRCE
DRPWHQRARCVEQCREEEERERQ QQEERGRGDRHE H
DR
REPWRCARRCE
DRPRHERAQCVQECREEERERGR RDELGRRG
DR
NHHHHGGHKSGRCVRRCE
DRPWHQRPRCLEQCREEEREKRQ ERS
RHEAD
DR
EEDRRGGHSLQQCVQRCQ
QDRPRYSHARCVQECREDQQQHGRHEQEEQ
EEDRRGGRSLQQCVQRCH
QDRPRYSHARCVQECRDEQQQHGRHEQEEQ
EDDRRGGHSLQQCVQRCQ
QDRPRYSHARCVQECRDDQQQHGRHEQEEQ
EEDRRGGRSLQRCVQRCQ
QDRPRYSHARCVQECRDDQQQHGRHEQEEQ
EEDRRGGRSLQQCVQRCQ Q DRPRYSHARCVQECRDDQQQHGRHEQEEQ
EEDRRGGRSLQQCVQRCQ
QDRPRYSHAR
EDDRRGGHSLQQCVQRCR
QERPRYSHARCVQECRDDQQQHGRHEQEEEQGRGRGWHGEGEREEE
DPEKKR CAMECRGIPE
QQQRKLCVHRCLDDSSEQESG
ECRGIPE
QQQRKLCVHRCLDHSGEQESG
DPEKKR CVMECRGIPE
QQQRKLC
DPEKTR CVMECRGTSE
RE
AGREQEREEEAEG
DRAKKR CHMECRGTPE
GRRRKECVRQCLDHSGREREHGEVAEG
H EDPERRYRQCRQECTQRAQD KRQLQQCEHRCEEEYKEQRHREGN
EEPEKRLEECRRECREQAE
RRERRECEKRCEEEYKE HRGRSKDKEEGEEGRGEKRRESD
EEPEKRLEECRRECREQAE
RRERRECEKRCEEEYKE HRGRSKDKEEGEEGRGEKRRESD
HKGEDPEKRLEECRRECREQAEGRE
QRECEKRCEEECKE RRGESKEEEKREEEKGEKRRGSD
GKREDPRKQLEECRRECQEKQEGRR QQRECEQRCEEEYRK HRGGNKEEEEWEEGQEKKRREGH
RKVEECLSKCKESFFYHQDRQLLEKCEQICFDKYDGSH
ERERQERQCKSECERSREREREVR QCKQECEERYRRQK
Figure 7.4. Details of zoom 1 in the alignment on figure 7.2. Colour scheme
is identical to figure 7.2 and cysteine residues are highlighted in red.
Class III vicilin precursors appear to have two distinct domains. One domain is
particularly rich in arginine and glutamine residues, thus sharing some
similarities with the Arg/Glu-rich C-terminal domain of class II vicilin precursors.
The second domain contains the repeated CXXXC(10-14)CXXXC motifs. This
expansion event gave rise to PV100 and MiAMP2-type precursors. However,
the two N-terminal domains seem to be linked, especially in early emerging
flowering species, as it can be seen on the zoom of the alignment, presented
right after (figure 7.4).
In figure 7.4, are shown the sequences from “zoom 1” of the alignment,
displaying the conserved CXXXC(10-14)CXXXC motifs typical from the C2
peptide, and potentially several homologues. These sequences are of plants
from the orders Amborellales to Malpighiales (from bottom to top), that
represent the earliest emerged flowering plant orders from my data-set. The
sequence of the MiAMP2d is marked by highlighting the name Macadamia
integrifolia.
First, the double CXXXC motifs as well as their relative distance (10-14 amino
acid residues) are highly conserved among all species of displayed. However,
the rest of the sequence is poorly conserved between species, but conserved
within a species. This suggests that although the expansion that gave rise to the
CXXXC(10-14)CXXXC motifs is conserved, this expansion has subsequently
diverged within plant species. This is also supported by the various numbers of
repeats of the motifs inside the domains.
Another interesting observation from figure 7.4 is that CXXXC(10-14)CXXXC
motifs are still quite rich in arginine and glutamine. We previously observed that
in legumes a class IV of vicilin precursors contain only the N-terminal domain
rich in arginine and glutamine residues. Thus, it seems that the expansion on
the N-terminal side of vicilin precursors, that gave rise to PV100 and other class
III vicilin precursors, first expanded and incorporated several additional arginine
and glutamine residues in particular (probably similarly than the C-terminal side
of class II). Then, whereas in class II the C-terminal expansion was fixed and
relatively conserved among species, in class III the N-terminal expansion
continued to diverge and in most species ended in the emergence of the
CXXXC(10-14)CXXXC motifs, most probably by neoproteinisation.
112
Gossypium raimondii
Gossypium arboreum
Gossypium arboreum
Gossypium hirsutum
Gossypium raimondii
Gossypium raimondii
Gossypium hirsutum
Gossypium arboreum
Theobroma cacao
Anacardium occidentale
Pistacia vera
Citrus sinensis
Citrus clementina
Citrus clementina
Eucalyptus grandis
Eucalyptus grandis
Pyrus x bretschneideri
Malus domestica
Pyrus x bretschneideri
Prunus persica
Prunus mume
Prunus persica
Prunus mume
Fragaria vesca
Morus notabilis
Morus notabilis
Ficus pumila
Prunus persica
Prunus mume
Pyrus x bretschneideri
Corylus avellana
Carya illinoinensis
Juglans regia
Juglans nigra
Cucurbita maxima
Cucumis melo
Cucumis sativus
Cucumis melo
Cucumis sativus
Phaseolus vulgaris
Pisum sativum
Vicia faba
Glycine max
Glycine max
Glycine max
Glycine max
Glycine max
Medicago truncatula
Cicer arietinum
Arachis hypogaea
Arachis duranensis
Arachis hypogaea
Arachis hypogaea
Arachis hypogaea
Glycine soja
Glycine max
Glycine max
Glycine max
Glycine max
Glycine soja
Phaseolus vulgaris
Cicer arietinum
Cicer arietinum
Cicer arietinum
HPGGE LSECQSRC V
GLAGEVRQLCLFRCQQKWRRDYASRWEEEGKK QSKE
HPGDG LSECRSRC A
GLAGEVRQLCLFRCQQKWRRDYASRWEEEGKK QSKE
SQRQFQECQQHCHQQEQ RPERKQQCVRECRERYQENPWRREREE
SQRQFQECQQHCHQQEQ RPERKQQCVRECRERYQENPWRREREE
SQRQFQECQQHCHQQEQ RPEKKQQCVRECREKYQENPWRGEREE
PDKRFKECQQRCQWQEQ RPERKQQCVKECREQYQEDPWKGERENKW
PDKQFKECQQRCQWQEQ RPERKQQCVKECREQYQEDPWKGERENKW
PDKRFKECQHRCQRQEQ GPERKQQCVKECREQYQEDPGKGERENKW
LQRQYQQCQGRCQEQQQ GQREQQQCQRKCWEQYKEQE RGEHEN Y
STHEPAEKHLSQCMRQCERQE
GGQQKQLCRFRCQERYKKER
GQHNYK REDD
STHEPGEKRLSQCMKQCERQD
GGQQKQLCRFRCQEKYKKER
REHSYS RDEE
NPGEPAEKHLRQCQRECDRQEE GGQQRALCRFRCQEKYRREKEGEGGQHN T QEEE
NPGEPAEKHLRQCQRECDRQEE GGQQRALCRFRCQEKYRREKEGEGGQHN T QEEE
NHHRDPKWQHEQCLKQCERRES GEQQQQQCKSWCEKHRQKGQRRREKEGKFNPSSNW
TTTEDPQQQFQHCQQRCERQQG REPQRRQCLQECERQFQEQQR
GK
RKQLKECQKQCERRQQ KGRSREGCQQRCVEAYGRKGEDQPRNRRGKEDQEKERGR
DPELKQCKHQCKHQRGFEEW QKKQCERKC EDYIKQKREQEEQRG
DPELKQCKHQCKHQRG
RGNRRT SGGGGKEEEEEEEGG
EGGSFYI
DPELKQCQHQCKHQRGFDER QKQQCERKC EDYTGE
DPELKQCRHQCEHQQGFDSK QREQCEQGC DKYIKQKREEEKHRR
KSEGGGS
DPELKQCRHQCEHQQGFDSK QREQCEQGC DKYIKQKREEEKHRG
KSGGGGS
DHPQDAREEYFYCSQSCGTSEEPEQ
CETECRERFDEQLKKEA
DHPQDAREEYFYCSQSCGTSEDPEQ
CETECRKRFDEQLRKEA
DPGLRQCMHQCQHQEGYDEQ QKQICEQRC DDYIKQKRQEERQRR
RGGGDTS
DPELKQCRHQCKHQRGFEEE QRQQCEQRC EEYIREKKHREREEE
G
TASRDPEEKFRRCQQRCQEQQHV QERRRECQQECERKYREEERR
ERRRRDDDDD E
TIHDDPEEKLRRCQKRCQEQQHV QERRRECEQECQKKYREEEHR
GGRRRDGDEDVDD
QQFEHEKHQYKQCKRGCENQARDD FQKEQCKQQCTQQMSQQYGQQQEGSTGGGGLNQ
QQFEHEKHQYKQCKRGCEKQALDDD QKEQCKQQCTQQMSQQYGQQQEGSTGGGGPNQ
KQFEQEKQQFKQCKKGCEKQAQDDD QKKQCKLQCAQQGQQQQG
GSTGGVHLNQY
DPELKKCKHKCRDERQFDEQ QRRDGKQIC EEKARERQQEEGNSSEESYGKEQ
QDPQQQYHRCQRRCQTQEQS PERQRQCQQRCERQYKEQQGR EWG
PDQASPRR
QDPQQQYHRCQRRCQIQEQS PERQRQCQQRCERQYKEQQGR ERG
PE ASPRR
QDPQQQYHRCQRRCQIQEQS PERQRQCQQRCERQYKEQQGR ERG
PE ASPRR
NQRGSPRAEYEVCRLRCQVAER GVEQQRKCEQVCEERLREREQGRGEDVDEVERRDPEWER
NQE
VTEICRQWCQVMEPHGGEQQQRRCRQECEERLRDQKQG EDVED KWRDPERER
NQK
ETEICRQWCQVMKPQGGEEQ RRCQQECEERLRDQEQG EDVED KWRDPERER
DPELKQCKHQCKVQRQFDEQ QKRDCERSCDEYYKMKKEKGRN
DPELKQCKHQCKVQRQFDEQ QKRDCERSCDEYYKMKKEKGRN
TKVVG DPELKTCKHQCLQQQKYSED DKLICLRSCDEYHGLKHEREKQIEESSEKQIHEHEG
EKDPELTTCKDQCDMQRQYDEE DKRICMERCDDYIKKKQERQKHKEHEEEEEQEQEED
EKDPELTTCKDKCDLQGQYDEE DKRICMEKCEDYVRKKQERQKHKEHEKEEHEEEEEN
TEVEE DPELVTCKHQCQQQRQYTES DKRTCLQQCDS
MKQEREKQVEEETRE
TEVEE DPELVTCKHQCQQQRQYTES DKRTCLQQCDS
MKQEREKQVEE ETRE
TEVEE DPELVTCKHQCQQQRQYTES DKRTCLQQCDS
MKQEREKQVEE ETRE
TEVEEEDPELVTCKHQCQQPQQYTEG DKRVCLQRCDRYHRMKQEREKQIQESRERETRE
TEVEEEDPELVTCKHQCQQQQQYTEG DKRVCLQSCDRYHRMKQEREKQIQESRERETRE
ETDPELKTCIHQCKQQRQYDE DKGICMDKCEDYHRMKQEREKRQHQ
ERDPEIKTCKHQCRQQLQYSEE DKNVCMEECDEYHRMKKEREKRK
Y RKTENPCAQRCLQSCQQE
PDDLKQKACESRCTKLEYDPRCVYD
TGATNQRHPPG
Y RKTENPCAQRCLQSCQQE
PDDLKQKACESRCTKLEYDPRCVYD
TGATNQRHPPG
Y RKTENPCAQRCLQSCQQE
PDDLKQKACESRCTKLEYDPRCVYD
TGATNQRHPPG
YQKKTENPCAQRCLQSCQQE
PDDLKQKACESRCTKLEYDPRCVYDPRGHTGTTNQRSPPG
YQKKTENPCAQRCLQSCQQE
PDDLKQKACESRCTKLEYDPRCVYDPRGHTGTTNQRSPPG
WEKENPKHNKCLQSCNSE
RDSYRNQACHARCNLLKV EKEE CEEGEIPRPR
WEKENPKHNKCLQSCNSE
RDSYRNQACHARCNLLKV EKEE CEEGEIPRPR
WEKQNPKHNKCLQSCNSE
RDSYRNQACHARCNLLKV EKEEECEEGEIPRPR
WEKQNPSHNKCLRSCNSE
KDSYRNQACHARCNLLKV EEEEECEEGQIPRPR
QVKPPSHHECLQSCESE
TDPYKRKACKLSCKFVPEHGEQQHEEEEREREHPQ
WEKQNPSHNKCLRSCNSE
KDSYRNQACHARCNLLKV EEEEECEEGQIPRPR
TEKP SRRKCVKICESE
KDLSRGQTCQLRCKHVSEPGGK
EEEEREIPE
QGNPCIEKCLQRYNKVHETFKHHECRSHCEKEEEEE
HQGNPCLKKCMQRYNKVNEAFK
RREEEE
QYEQGNPCLKKCMQRYNKVNEAFK
RREEEE
Figure 7.5. Details of zoom 2 in the alignment on figure 7.2. Colours are
identical to figure 7.2, and cysteine residues are highlighted in red.
In Fabales only, some of these class III precursors seem to have not led to the
emergence of the CXXXC(10-14)CXXXC motifs or they could also have further
evolved and even regressed to lose the N-terminal cysteine-rich domain and
only maintain the N-terminal domain rich in arginine and glutamine residues.
This constitutes class IV vicilin precursors.
From Fabales to Malvales, the potential homologues of the C2 peptide, all
containing a CXXXC(10-14)CXXXC motifs, display much more sequence
conservation between species (figure 7.5).
On the figure
7.5, pumpkin (Cucurbita maxima)
is
highlighted. The
corresponding precursor fragment displayed corresponds to the PV100
fragment from which C2 emerges. All the other sequences on this figure, as well
as the ones from the figure 7.4, and also from the high-copy-number of multiCXXXC(10-14)CXXXC-motifs from some precursors that can be seen on the
alignment figure 7.2, constitute potential homologues of the C2 and MiAMP2
peptides.
113
7.3.3 Did CXXXC(10-14)CXXXC motifs arose by neoproteinisation?
Overall, the expansion within the vicilin precursors that created class III vicilin
precursors is consistent with neoproteinisation. This example potentially
constitutes a much earlier event of neoproteinisation than the event that created
the PawS1 precursor and accordingly SFTI-1, evidenced by class III vicilin
precursors being distributed widely within the plant kingdom.
Among class III vicilin precursors, that possess the CXXXC(10-14)CXXXC
motifs, PV100 has been shown to produce multiple stable peptides with potent
bioactivities from its extra N-terminal domain, including the C2 peptide
exhibiting CXXXC motifs and trypsin inhibitory activity. The C2 peptide from
PV100 seems to have homology with many other plant vicilin precursors.
Homologues have already been charactered in a close pumpkin relative (Luffin
P1) and in the more phylogenetically distant species Macadamia integrifolia
(MiAMP2). All these potential homologues share a lot of sequence similarities,
and particularly the CXXXC(10-14)CXXXC motifs, which are most likely to be
structurally important and potentially constitute the active sites, are very highly
conserved. Such a broad conservation of this cysteine-rich domain among the
plant kingdom, with the highly conserved CXXXC(10-14)CXXXC motifs in
particular, suggests that these peptides would be produced by the plants and
functional.
Thus, to provide further evidence of this potential new family of buried peptide I
searched many of the plants species containing the class III vicilin precursors
for the presence of CXXXC(10-14)CXXXC motifs peptides in their seeds.
114
Figure 7.6. MALDI-TOF-MS and LC-MS/MS evidence for C2. Soluble proteins
from pumpkin seeds (Cucurbita maxima) were extracted and fractionated. (a)
The mass spectra acquired by MALDI-TOF-MS of the fraction (25% acetonitrile
elution point) containing both forms of the C2 peptide is presented, with C2a
(5615 Daltons) and C2b (5829 Daltons). (b) Each fraction was then reduced,
alkylated and digested with trypsin before being loaded in a LC-MS/MS.
Resulting spectra were searched against a pumpkin protein sequences
database using Mascot 2.3.02 (Matrix Science). Results for the 25% acetonitrile
elution point of the pumpkin soluble protein extract are presented here.
7.3.4 Identification of CXXXC(10-14)CXXXC peptides
To validate potential C2 homologues, I first investigated methods for extracting
and characterise them, using the C2 peptide as a model, making the
assumption that C2 homologues should have the same or at least similar
characteristics to C2 in terms of extraction methods.
Using MALDI-TOF-MS for detection, I developed a sample preparation to obtain
a good signal for both forms of the C2 peptide (figure 7.6a), as it possesses
two possible C-terminal cleavage sites. The method that I previously used to
successfully extract SFTI-1, only allowed a poor signal for the targeted masses
in this case. I found that I required a second purification step, based on solvent
gradient separation on a C18 column to separate the different components of
the samples into several fractions. Doing so I could obtain a fraction where I
could detect, with a relatively good signal, masses corresponding to the
theoretical average masses of the two forms of the C2 peptides and also
corresponding to the masses previously observed (Yamada, Shimada et al.
1999).
Then I tried to obtain the sequence of the identified masses. It appeared that
direct MS/MS on the MALDI-TOF was not conclusive, probably due to the
native form of the peptide. However, any attempt to alter the structure of the
targeted peptide resulted in very complex mass spectra above analysing,
despite the apparent relative purity of the sample from the MS spectra obtained
on the MALDI. Thus, it appeared that the samples I used for MALDI based
detection were still quite complex and contained many proteins not detectable in
their native form.
As an alternative, this same protein extract fraction was reduced, alkylated and
digested with trypsin and then, using LC-MS/MS, searched for fragments
corresponding to C2 digested fragments (figure 7.6b). I was able to resolve
most of the C2 peptide sequence using this method.
115
Although I was able to detect C2 fragments, I could also detect some fragments
from the rest of the vicilin precursor (as well as many other seed protein) in the
same sample, making it difficult to be certain the detected fragments were
coming from the precursor or the matured form. Thus, I could not confirm by this
method the existence of the peptides as independent entities. This second
method unfortunately could not be used on native peptide of this size (about 5
kDa) due to the particular instrument settings. This machine was set to work
with small tryptic fragment for higher resolution, to the detriment of the massrange detection, but this is the most common proteomic approach nowadays.
This is routinely used in my research centre. Adjusting such settings would have
required attributing one machine to my project only for months, which was not
an option due to the limited number of such expensive equipment and the high
number of users in our research centre.
At this point, I had two different methods to efficiently detect my peptides of
interest. The first one allowed detection of native peptide mass, but didn’t allow
to sequence the detected masses. The second one allowed to sequence only
fragments of the targeted peptides, but not their native forms, thus requiring
prior purification to ascertain the origin of the detected fragments not being from
the larger protein precursors. I investigated other extraction and purification
methods, and used these detection methods to confirm the presence of the
targeted peptides in the samples. However, all the other purification or
extraction methods that I investigated resulted in the loss of the targeted
peptide, or at least it could not be detected anymore by the methods previously
described.
For these reasons, only MALDI-TOF-MS based detection was used for the
identification of potential C2 homologues in other species.
116
Figure 7.7. C2 homologues in cucurbits. (a) Sequence and theoretical
masses of C2 and homologues from cucumber (Cucumis sativus) and melon
(Cucumis melo). The calculated masses account for the presence of two
disulfide bonds and a post-translational modification changing the first
glutamine at the N-term into pyroglutamate, as described for C2 (Yamada,
Shimada et al. 1999). Cysteine residues are highlighted in red and putative AEP
cleavage site are highlighted in yellow. Colour code is identical to figure 7.1.
(b) MALDI-TOF-MS analysis of cucumber (Cucumis sativus) and melon
(Cucumis melo) fractionated seed extracts. The mass spectra of the cucumber
seed extract fraction containing C2s (4317 Daltons) and of the melon seed
extract fraction containing C2m (4453 Daltons) are presented.
7.3.5 Identification of C2 homologues from closely related species
In the vicilin alignment (figure 7.2) two homologues of PV100 were identified in
the close pumpkin relatives melon (Cucumis melo) and cucumber (Cucumis
sativa), which are other members of the plant order Cucurbitales that pumpkin
(Cucurbita maxima) belongs to. These PV100 homologues also contain
sequences for C2 peptide homologues, which were named C2m for the C2
peptide homologue from Cucumis melo and C2s for the C2 peptide homologue
from Cucumis sativus. The sequences of C2 and of the potential C2m and C2s
peptides, as well as their theoretical masses are presented in figure 7.7a.
Using the MALDI-TOF-MS based method developed for the C2 peptide
detection, I identified a mass in a Cucumis sativus seed extract fraction,
corresponding to C2s (figure 7.7b), and another mass in Cucumis melo
fractionated seed extract, corresponding to C2m (figure 7.7b). The masses
identified correspond to the theoretical masses predicted (with less than 0.5 Da
difference), allowing strong confidence that they correspond to the C2 peptide
homologues.
117
Figure 7.8. C2 homologues in legumes (Fabales). The calculated masses
account for the presence of two disulfide bonds. Cysteine residues are
highlighted in red and putative AEP cleavage site are highlighted in yellow.
Colour code is identical to figure 7.1. The mass spectrum of (a) the broad bean
(Vicia faba) fraction containing C2vf (5954 Daltons), (b) of the common bean
(Phaseolus vulgaris) fraction containing C2ps (5943 Daltons) and of (c) the pea
(Pisum sativum) fraction containing C2pv (5440 Daltons) are presented.
7.3.6 More widespread evidence for C2-like peptides
I similarly investigated seeds of many other plant species, represented in the
alignment in figure 7.2, for presence of potential C2 homologues. In three
species, all from the Fabales order, I found masses corresponding to theoretical
sequences of C2 homologues (figure 7.8). The masses identified suggest
trimming of a few amino acid residues on the C-terminal sides of the peptides
as described for the other peptides produced from PV100 (Yamada, Shimada et
al. 1999) and for the MiAMP2 peptides (Marcus, Green et al. 1999). The
sequence of these homologues, extracted from the alignment in figure 7.2, as
well as their theoretical mature sequences and masses are presented in figure
7.8. Only the mass identified in broad bean (Vicia faba) seed extract is a perfect
match with the theoretical mass for the corresponding C2 homologues called
C2vf (figure 7.8). The slight mass difference (4 Daltons) between the mass
detected in common bean (Phaseolus vulgaris) seed extract and the theoretical
mass for the corresponding C2 homologues called C2pv, as well as the
irregular peak of average mass of 5943.18 Daltons detected in pea (Pisum
sativum) seed extract matching the theoretical mass for the corresponding C2
homologues called C2ps (figure 7.8), suggest that these peptides might be
subjected more post-translational modifications.
Unfortunately, as it can be seen on the generated spectra, I almost reach the
detection limits of the instrument. Indeed, the resolution on the generated
spectra is quite poor, as shown by the large basis of the peaks that usually
covers over 200 Da and by the lack of isotopic resolution (cannot been seen on
the presented images), which was not a problem at all when I was working on
smaller masses, such as the work done with acyclic SFTI-1 (1531 Daltons).
This method already presented limitations due to the fact that the precise
cleavage sites of the potential C2 homologues are unknown. For some of them,
it was predictable that they could also be matured by AEP such as C2, and
when potential AEP cleavage sites were present in the precursor sequence,
such assumption was made as in C2 homologues from other Cucurbitaceae for
example. These close relatives already harbor an additional AEP cleavage site
prior to the two other conserved as in C2, and only this shorter form was
detected. Other C2 peptides relatives further diverge, such as the ones from
118
Phaseolus vulgaris, Pisum sativum and Vicia faba. Due to the absence of
potential AEP cleavage site or to a multitude of them in their precursors, no
clear assumption could be made and only a few additional eventual
homologues were then identified with relative confidence. However, the fact that
no homologues of the C2 peptide were identified in many of the investigated
species doesn’t mean they do not exist, but are rather due to limitations of the
method. For example the C2-like peptide masses from macadamia nuts couldn’t
be detected (not shown). This could be due to the particular composition of
these seeds not being compatible with my extraction method. This could also
have been the case for more seeds than expected and thus could explain the
identification of potential C2 homologues in a limited number of tested species
only. For these reasons other extraction and/or identification methods should be
investigated to resolve this problem.
This is when my PhD reached its end and I couldn’t further investigate this very
interesting topic, which will be continued in my research group.
119
7.4 Conclusion
In this study I described three main classes of vicilin precursors suggesting that
expansion on both the N- and C-terminal sides of the vicilin domain within vicilin
precursors has been extensive. During the emergence of flowering plants,
which was preceded by whole genome duplication events (Jiao, Wickett et al.
2011), I believe either the N- or C- termini of some vicilin precursors expanded,
seemingly independently as no vicilin precursor has both an N- and C-terminal
expansion. The C-terminal side of some of the duplicates appears to have
rapidly accumulated additional amino acid residues during evolution and was
also rapidly fixed in the plant kingdom. This is demonstrated by the overall
homogeneity of class II vicilin precursors in most flowering plants. By contrast
the expansion on the N-terminal side vicilin precursors appears to have evolved
more gradually. This gave rise to class III vicilin precursors which particularly
contain a variable number of CXXXC(10-14)CXXXC motifs at their N-terminal
side. The class IV vicilin precursors suggest an intermediary evolution state,
which would have been fixed in a few legumes only. The CXXXC(10-14)CXXXC
motif appears to have evolved through the gradual expansion of vicilin
precursors N-termini, similarly to SFTI-1 that evolved through the gradual
expansion of PawS1 albumin precursor. For this reason, the emergence of the
CXXXC(10-14)CXXXC motif suggests neoproteinisation.
However to support that these CXXXC(10-14)CXXXC motifs correspond to
additional peptides processed from the class III vicilin precursors, it is essential
to isolate stable CXXXC(10-14)CXXXC peptides that correspond to precursor
sequences. In this study I investigated several plant species and found
evidence for homologues of the previously characterised C2 peptide from
pumpkin, in the pumpkin relatives Cucumis melo and Cucumis sativa, and in
seeds from the more distantly related Fabales plant order, Phaseolus vulgaris,
Pisum sativum and Vicia faba. The C2 peptide and its homologues represent a
large and ancient family of peptides buried in vicilin precursors that remains to
be fully characterised. Although more data will be necessary to establish the
evolutionary story of these vicilin buried peptides, they represent a potential
second family of buried peptides that would have evolved by neoproteinisation.
120
Chapter 8
General discussion and future
directions
121
8
General discussion and future directions
8.1 Seed storage protein trafficking
During my PhD candidature, I investigated the potential of the protein precursor
PawS1 to be used as a tool to study seed storage protein maturation. The
trafficking of protein precursors is of particular importance for their maturation.
My investigations, presented in Chapter 4, revealed that deregulation of
trafficking pathways can have an effect on the quantity of total seed storage
precursor being produced. Previous work on trafficking mutants had usually
been targeted at one or two mutants and used protein gels standardised for
total protein content to show that there was mis-processing of seed storage
protein precursors in the mutant seeds. These studies did not provide data on
the proportion of mature seed storage protein present in the mutant seeds.
My
results
suggest
that
although
trafficking
mutants
all
accumulate
mis-processed seed storage protein precursors, the location of mis-processed
seed storage protein precursors would correlate to different level of correctly
matured
seed
storage
proteins.
Trafficking
mutants
that
accumulate
mis-processed seed storage protein precursors inside their cells would over
produce or still produce normal quantities of correctly matured seed storage
proteins, whereas trafficking mutants that export mis-processed seed storage
protein precursors outside their cells would produce less correctly matured seed
storage proteins in total.
I propose the existence of an unknown mechanism that senses the
accumulation of mis-processed seed storage protein precursor inside the cell
and triggers an overproduction of seed storage protein precursor to compensate
for their mis-processing. This suggests that the regulation of protein maturation
and gene expression could be linked. This hypothesis could be validated by
studying gene expression of seed storage protein precursors during seed
development of mutants impaired in seed storage protein trafficking.
This project will probably lead to further collaborations with Ikuko HaraNishimura’s lab, which identified the mutants that I used in my study.
122
8.2 SFTI-1 maturation
In Chapter 5, I used several chimeric constructs to provide more evidence of
AEP requirement for SFTI-1 maturation and demonstrated the efficiency of the
SFTI-GLDN sequence at being processed by AEP in various protein contexts.
My results suggested that SFTI-1 can be processed without an adjacent protein.
For the synthetic constructs without adjacent protein, the production of acyclic
SFTI-1 was 5% of what it was for the wild type protein with an adjacent albumin.
However, this might be due to the absence of targeting sequence for proper
trafficking. Indeed, it is known that seed storage proteins have a vacuolartargeting sequence in their C-termini (recognise by VSR1). Additionally seed
storage protein precursors are matured during their traffic between the Golgi
apparatus and the vacuole. Thus, it is quite possible that the low production of
acyclic SFTI-1 from the albumin-depleted PawS1 variants is only due to
inefficient targeting. This could be tested in my research group with a further set
of chimeric constructs.
Nevertheless, it appears that SFTI-1 could be matured from its precursor
without an adjacent protein. This is particularly interesting in regards to the
current knowledge of cyclic peptide evolution. This result suggests that the
precursor for SFTI-1 (PawS1) could evolve to lose its albumin domain and still
be matured into SFTI-1. Parallel studies in my research group demonstrated
that the contribution of PawS1 to the overall albumin content of sunflower seed
was minor (Jayasena, Franke et al. 2016), and thus there is no particular
evolutionary constraint into keeping this particular albumin. The current analysis
of more than 100 transcriptomes of sunflower close relatives in my research
group (Jayasena et al. unpublished) also suggests that genes that encode such
truncated precursors exist and are transcribed in some species, but to date no
protein level evidence have been produced for these precursors.
The parallel with cyclotides is astonishing. Cyclotides are another type of cyclic
peptides also thought to be partially matured by AEP and generally encoded in
precursors consisting of one to three cyclotides in tandem repeat. The existence
of dual precursors has also been reported for cyclotides, in the species Clitoria
ternatea. In these plants, cyclotides extracted from the leaves appeared to be
encoded in dual precursors also encoding an albumin domain (different to the
123
napin type). Thus, although it hasn’t been supported by additional data yet, it
can be hypothesised that cyclotides also could have evolved from a dual
precursor and diverged to lose the adjacent ancestral protein.
This subject is of particular interest to several research groups and should be
resolved in the near future.
8.3 SFTI-1 evolution, neoproteinisation
My research group is particularly interested in SFTI-1 evolution. Recent and
current work, conducted in my group, attests that SFTI-1 evolved by an unusual
mechanism. During my PhD candidature, my insight led to the realisation that
the evolutionary mechanism that led to the creation of SFTI-1, previously
decrypted in details by my colleagues mostly by the use of next generation
sequencing, differs to previously described de novo evolutionary mechanisms.
After a thorough comparison with these other mechanisms, I named this
mechanism neoproteinisation.
Next generation sequencing has already greatly helped change perceptions of
evolution. For a long time it was considered that de novo gene evolution was
almost impossible and probable examples were neglected rather than being
studied. Nowadays, the evolution of de novo genes is emerging as an important
source of new proteins in addition to the familiar ‘parental gene’ model and
cases
have now
been reported in all living kingdoms (Tautz 2014).
Neoproteinisation is another alternative to the traditional ‘parental gene’ model
and I believe it has been overlooked, similarly than de novo gene evolution in
the past. Indeed, my results suggest that this mechanism is particularly effective
to create new functions and many other events might have occurred during
evolution. Seed storage proteins were targets of choice for the identification of
other neoproteinisation events and I consecutively identified another potential
occurrence of neoproteinisation with the prototypic example being a vicilin
precursor called PV100. Thus, in an attempt to elucidate this particular
precursor evolution I started to study vicilin precursors more generally, which
revealed quite an unexpected outcome.
124
8.4 Neoproteinisation of vicilin precursors
I established that expansion has been rampant in vicilin precursors, and that
neoproteinisation might have given rise to a second family of peptides different
from SFTI-1 and possessing a conserved CXXXC(10-14)CXXXC motif.
However, to prove that these peptides with CXXXC(10-14)CXXXC motifs
emerge from the vicilin precursors as I expected, I needed to prove their
existence as independent protein entities. Thus, I developed a sample
preparation to clearly detect C2 and some of its homologues using
MALDI-TOF-MS. Using this method, I found mass evidence for homologues of
the previously characterised C2 peptide from pumpkin, in the pumpkin relatives
Cucumis
melo
and
Cucumis
sativa,
and
in other
seeds
from
the
phylogenetically more distant Fabales plant order, Phaseolus vulgaris, Pisum
sativum and Vicia faba.
Additionally to my work, another homologue of the C2 peptide was fully
characterised in a close pumpkin relative. My group is currently attempting to
recover the gene encoding this peptide (Luffin P1). Thus, in addition to my
results, it seems that C2 peptide homologues are common in Cucurbitaceae
and maybe also in Fabaceae.
Overall my results suggest that class III vicilin precursors could be a second
example of neoproteinisation event. This class of vicilin precursor, which I
discovered is represented in most flowering plant species and, contains
additional buried peptides with a conserved CXXXC(10-14)CXXXC motif. So far
the existence of the CXXXC(10-14)CXXXC containing peptides as independent
mature entities had only been confirmed in two phylogenetically distant species
(pumpkin and macadamia nuts), but I provide evidence for homologues in two
more pumpkin relatives and three legumes. This work has created a desire to
identify and characterise this new family of peptides.
This potential second and more ancient event of neoproteinisation suggests that
neoproteinisation could be a widespread mechanism for protein evolution. In
this regard, this work constitutes the foundation studies for the characterisation
of neoproteinisation.
125
Chapter 9
Supplementary information
126
9
Supplementary information
9.1 Gene sequences, relative to trafficking mutants (Chapter 4)
MAG1
gggaaaacaaaggatcttccatgggcgatcgctgagaagatccgcaagaacaaaaacatcaaattccttga
aatcaggaatcttattcgatagtctatttgagcccttgatctggtaagctctgtctgatcttctaaagtttcgctttttaa
cttttggggaaatcatccacctttgtatcaagtttctggcaatttcttctctgtagttagaatcattcaataagcccttta
atctcttcgacgcagctagtgttgaattacatctgggttttgtctactttgtgtcaaaaccatgcttttatgtttgattttcg
cgaagtatgatttaggttcgattttagtgaaagaaagacttaacagttgtagttaactgaaactatgcgatatctga
ttacgttggtgttatgattttattggaagggttttggaaatgattccacattgtattgtagatctcattgacttttggtttttg
atctgtttctcaatccagattctttatactttagtttaacaagtgacaaactttgatctttgattggttcaccaacgtcttt
gaaggttttttatttggtgaaagttttgaaATGGTGCTGGTATTGGCATTGGGGGATCTCCA
TGTACCCCATAGAGCGGCTGATCTACCTCCTAAGTTCAAATCAATGCTTGT
TCCTGGGAAGATTCAACACATCATCTGCACTGGAAATCTCTGCATCAAAgta
gtgttttttttccttcaaattttcaacttttacatctctagtatgctttattttttttcgttaatgtcgttgtgggatcattctgtctg
atgtaactgtttctgggatatttacttatgaagcagaaatgaggattcagaaggagggtagactctaatttcaagc
tttatatgatgcttttctctgatgcaacctagtctatgtttatcctgtattgggttgtttagaactattttcttacttggttgtgt
aagccttaccagaaatttgataaggagactcaatagtgaattatgtgatctactcggagcttgttctatatctgctg
gattgtgtagatttgaattcttaccttgtagtgtgagagtcgtgtaaaatgggaagaggaattccattacattttggtt
tgatcagttttagttttagatttgcttgtgaccatggaattgtaaagtgattaagtcagcatcgatgagtcatttatgag
aagccaatgatgttcttttagttggaagatatgttaagaagtttgtaaggtgtttagttgaaaacccatcttagaga
aactgcagacctgatttatttgtatgctgcaaagtcttatatggttcatgtttgtggtaagcttaacatatttttgactgtt
gtagGAAATCCATGACTACTTGAAGACTATCTGCCCTGATTTGCATATAGTTC
GAGGGGAGTTTGACGAAGATGCGCGATACCCTGAGAATAAAACCTTGACT
ATTGGGCAATTCAAGCTGGGATTGTGTCATGGCCACCAGgttcagtttcccttctttgt
gcatttcatcatgatctttacgagttagacgctagcatttgcagcatacgaataatccctcgtatcatacatcatctg
tttatcagaattctcagactagaaaaccaaattttgcttgcctttacttcattgctttgttggtgtgtgtcccttacccagt
gagtttcagcgttggctcgcttgttattttgcatgataatttaaacaggctttgaatatatat[mag1.2_insertio
n]cagatggaatattatctttctacccttttactttgtcatatggtaaacagtgataaaatagtttttgtctagatgcctc
taatctgcacatactttttctatatatgataagaactgtattggagattagtaaacgggaaaggtctaggataaaa
gtttcggcgtcaatgattttcattttcaatgtcaatgttttagactaaaactatgaaagagaataagagcaatggtta
gttaagttatataacttatttgcagtggtatgaaaacgagcttctctatttgtggtacagGTGATCCCATGG
GGTGATCTAGACTCACTCGCCATGCTTCAGAGACAACTGGGCGTCGACAT
TCTCGTAACAGGCCATACCCACCAGTTCACAGCGTACAAACACGAGGGAG
GAGTGGTGATAAACCCGGGCTCTGCAACCGGAGCTTACAGCAGCATAAAC
CAAGACGTTAACCCAAGCTTTGTCCTTATGGACATCGATGGTTTCCGAGCT
GTAGTCTATGTCTATGAGCTAATCGATGGAGAAGTTAAAGTCGACAAGATC
GAGTTCAAGAAGCCCCCTACTACCAGCTCTGGTCCGTAGttccttaaaaaaaaact
taaaccgttttatgttttcgtctgaaatcactcctaaaca[mag1.1_insertion]tttagaactttcttgtgaaa
aataatatttttaacaatttgcaaaacgaaccattgagaataaacaacttttaacaacttcctgcaaaagtcgaat
atttttattggcatttgcaattttgcattacaacgatgttggcgagtttattcggaaatattaagccctaataacccga
ccccacccgacccgacccgacccga
127
MAG2
ttaaagattttgctcgatttggggatccatgattgattgaatttggagattttgctcgacgtcactttgttttttttttgttttttt
ttcgctttgttgttatagaaaacaataacgttttgttgttatagaaagcaataatgttttgaattaacgggtatgttaat
aaaaaattaattttgttaacggaatctgttaatctacttaacatcatgtctaatatgaccattagcaaatattttgtattt
attaagaaagtcacataagtttgatatttattagaaaaatcaacataaatttgatatttatttcgtttcgttggatgttg
agaatttaaacaaatgttttcaaataaaagttggagattttagtgataattttccgataaaagaatgtcatttctttga
gaatgccattacagttttaatttacttgatatgttaaaaacatttttgtgggcttaaccagggcccattaatccgagtt
accaaaactcgcaactcgttaacactccgagtcattgactcagtatcttcttccaacgaaatttcaatttttcattgtt
cttgatgaaactgaggccacgacgaatcgcaaATGGAGGCGATCAAACCACTTCCACAG
GTCTCGAGCTTCTCCGCTTCGGTATTTAGCTTTCTCGATGGCAGATTCAAA
GAATCCACGGATCTCTCTCATTCTACGGGTCTGGTTTCCGAGTTACAGACG
GAGATTTCCGAATTGGATCAGAGATTGGCCGGATTAAATCGCCAGCTCGA
GTCAGGTCTCGCTGCTTACGCTTCGTTTTCCGATCGCGTCGGTGGTCTATT
TTTCGAGGTTAATGCCAAATTGGCTGATCTCTCTTCTTCTACCTCCGTTACT
CGCTCCGCGTCAGgttcgtatgatttcatcacgattcttctggatgtagattatgttcttgaaatgttgtagtc
aaatatgtaaattcagtcctggtgatggaagattgaatatattcccgttataatttgagaatttgataattgtttaattc
attcagATAGCGGAAAGGAGGAGGAGGCGACGGAGCATGTAGCCGGAGAGG
ATCTTCCTTCATTAGCAAAGGAAGTAGCACAAGTTGAGTCTGTGCGTGCTT
ATGCTGgtatgtttggtgttaaggcgttttgtatttgatggctgctttctgataaagactagctctattttatgatttg
gtgttatcgatgttgatcgggattagttagtcccaaagttggagttaccaaccttctttcttagtttggtcactgtggtat
tcaacattttaagatctttctctatgtttcagAGACTGCACTAAAACTTGACACTTTAGTTGGT
GATATTGAGGATGCTGTGATGTCTTCTTTGAATAAAAACTTAAGAACATCTC
GGTCAAGTGGTTTTGAAgtaagcacactgtcttcagcttagactgttccgtagttttagccagtgtgtgt
ggttcaatgttctaagtttttactgatttcagGAAGTGCGTCTACATGCTATTAAAACACTTAA
AACGACAGAAGAGATACTGAGTTCAGTAGCAAAGAGACATCCTCGGTGGG
CACGTCTTGTTTCTGCTGTTGATCATAGAGTTGATAGAGCTTTAGCTATGAT
GAGACCTCAGGCAATTGCTGATTACAGAGCGTTGCTTTCTTCTCTTCGATG
GCCACCTCAGCTTTCTACACTAACTTCGGCAAGTCTTGATTCAAAGTCAGA
AAATGTTCAAAATCCGCTTTTCAACATGGAAGGGAGCCTCAAAAGTCAATA
CTGTGGAAGTTTTCATGCCTTGTGTAGCCTGCAGGGGTTACAGTTGCAAAG
AAAGTCGCGGCAGCTTGGGATCCATAAGGGAGAAAATGTTCTTTTCCACCA
GCCACTCTGGGCTATTGAAGAGCTGGTCAACCCTCTGACAGTTGCATCTC
AGCGACATTTTACAAAGTGGAGTGAAAAGCCAGAATTCATTTTTGCCCTTG
TGTATAAAATCACAAGGGACTATGTCGATTCTATGGATGAGTTGTTACAAC
CGCTTGTGGATGAAGCAAAACTAGCTGGGTACAGTTGCCGAGAAGAGTGG
GTTTCGGCTATGGTAAGCTCACTGTCTTTGTACTTGGTGAAAGAGATCTTT
CCTATATATGTTGGTCAGCTAGACGAAGCAAATGAAACTGATCTTCGTTCT
GAGGCTAAGGTCTCGTGGCTCCATCTCATTGACCTGATGATCTCCTTTGAT
AAGCGAGTTCAGTCTTTGGTATCACAATCAGGAATACTTTCGCTTCAAGAA
GATGGGAATCTTTTGAGAATTTCCTCTCTCTCAGTTTTCTGTGACAGACCTG
ATTGGCTTGATTTATGGGCAGAGATAGAGCTAGATGAGAGGCTCGTCAAAT
TCAAGGAGGAGATTGATAACGACAGAAATTGGACAGCGAAGGTCCAAGAC
GAACTCATCTCCAGTTCCAACGTTTACAGACCACCAATCATTTCCAGCATC
TTTCTACAGCATTTGTCATCAATAATCGAACGATCCAAATCAGTGCCGGCC
TTATATTTGAGGGCTAGGTTCCTGAGACTGGCAGCCTCACCAACAATTCAT
AAGTTCTTGGATTGCCTCCTTCTCAGGTGCCAAGACGCCGACGGACTAAC
TGCATTAACTGAAAATAACGATCTAATCAAGGTCTCGAACTCTATTAATGCT
GGTCACTACATTGAATCTGTCTTAGAAGAATGGTCTGAGGATGTCTTTTTC
CTTGAAATGGGAACTGGACAGCACGATCCACAGGAAGTTCCAGGACTGGA
GAACTTTACTGAACCTTCTGAAGGTATTTTCGGAGAAGAATTTGAAAAGTTG
GAGAAGTTCCGG[mag2.3_insertion]CTAGAGTGGATAAACAAATTGTCGGT
GGTGATCTTGAGAGGCTTTGATGCTCGAATCCGAGAATACATAAAAAACAG
128
AAAGCAATGGCAGGAGAAGAAAGACAAAGAATGGACGGTGTCAAGGGCAC
TAGTTGGGGCTCTAGACTACTTGCAAGGAAAAACGTCTATAATAGAAGAAA
ATCTAAACAAAGCAGACTTTACCGCTATGTGGAGAACTCTAGCCTCAGAGA
TAGACAAGTTGTTCTTCAACAGCATCTTGATGGCGAACGTGAAGTTTACCA
ATGATGGAGTCGAAAGGTTAAAAGTAGACATGGAGGTTCTATATGGGGTTT
TCCGGACGTGGTGTGTTAGACCCGAAGGTTTCTTTCCTAAACTAAGTGAGG
GGCTTACGCTTCTGAAGATGGAAGAGAAGCAAGTGAAGGACGGTCTGAGT
AGAGGCGATAAGTGGCTACGTGAGAATAGAATTCGATATTTGAGTGAAGC
CGAAGCCAAGAAGGTAGCGAAGAGTAGAGTGTTCTCTTAGttagatatttttcagca
aatcaatatctcatacgaaggtacattcataggtgagatcatttcatttatttgtggattaagatataacacacaac
tttggatttttaattgaaattgaactaaccagctctggttctgtttcgtcttaaattaatgattcatgagttaatagagaa
acaaaacaaaa
MAG4
tgaaaccaaacaaaaatttcaaaataatcaaatggatactaaattttagaaccgaaaaaactgataaccgaa
caccgatgactaaagtctatgagcataacacaatccacaaaaaataaacctccacagttttactggctatgcc
ataaaagcccactggacctaaatgggcccatattcaagcatcctgtaatctcacaagagctagttatctggtgc
gggcatgagtataatacgttatcaatcttcctcttcgaccattaaagaagaaaaaaaacaatatcatcgtgacg
aatcgaacggtttatcagctcttctgactttgcttgaatagatcggtcaggtgaaaaacacgagagactctgtga
agaagagatcgctgcgatctatcacggttccattgcctggaggattcttcttccttgttttgatcttgttcttatcaacat
agtctgtgtaagctgttcgattctcgccattttcaatttctgttttccctctctgtgataaatttcggagttctagggttttgt
agtttgattctccgtttttgtactggtttgtgttgttgctatttcaattttggttgaaaatttcttgatttttgttgtttcaggtgatt
ttattggctataacgagatATGGATTTGGCATCCCGTTATAAGgtgaatatctttcttcgattgtga
gcttaatttcagctgtcaggttttcagggtgattaaattctgttttggatcgaatgcatgtttccttcttgtttccgtgttttc
gaatagctgtatttgttgagaggagtctgttttgcatttgctcggtgttgattccaatatgaattttagtagtttctcaatg
gtgtagaaagagtgactggtagtttactaatcatgaggatggcagagagctttatactatatttccatggcatttta
acagcagcattattaaagtttcccggtgtttttaacttgttttacgaacttttacagGGGGTTGTTGGAATG
GTTTTTGGCGACAATCAATCTTCTAATGAAGACAGgtacattttcttctcttttctttgccatg
agtcgggagatataggctatcatggctcgtaagtatgtcgagaaatgaagtaatgaggattggatttttatttggt
gtatttttgtagTTACATTCAACGCCTGCTGGATCGTATAAGTAATGGTACCTTGC
CTGACGATAGGAGAACTGCTATAGTAGAACTTCAGTCTGTTGTCGCAGAAA
GTAATGCTGCTCAATTAGCTTTTGGAGCAGCCGgtatgtgtctttctgttttatgttatgtgtct
gattgattgtgagataaaatctaaagttacaatttacattctgtcatttcagGATTTCCTGTGATAGTGG
GCATCCTAAAGGATCAACGAGATGATCTTGAAATGgtgctgtacccttattttttttttccat
atttcttttgttctctctcagtttgtgcatagtgctgtttgtccttacgtcatagcacaatctcttgtcctcactttatggaaa
atgtggtaaaaatcgttgaagcaaagtttgttaaagatggattttttttttttgctccaaacaaagtagttttgggtga
gtagcatttgtaattctacttttcgtagtccttgctaatagtgtagcgatgtataattcagaatcagaataggtccaca
agtccaggggaaaatcaattggaaaggaaaatgttgtaaatttccttcagtatggattctggtctagtgtagattct
taaccatgaaagttgcacttataaatctcttgttgttttactttgggattatccattgaatttcttgttaatgttaccgttttg
tgcttttcttcaacttatatttttgttgactgtccaggttcgaggtgccttagacttataattttgttgactgtccagGTT
CGAGGTGCTTTAGAAACTCTTCTTGGCGCTCTAACGCCAATCGATCATGCA
AGGGCACAGAAGACTGAAGTCCAGGCAGCTCTTATGAACTCCGACTTGCT
CTCCCGTGAAGCTGAAAATATCACTCTTCTTTTGAGTTTGCTGgttagagtttactc
aatattttcgaaatagaaaaagttattttttagttttgtttgttttttatgtgtgcatcctatggttaattttcctggtggtatg
acagGAGGAAGAAGACTTTTATGTTCGATATTACACTCTTCAGATTTTGACAG
CTCTGCTTATGAACTCTCAAAACAGgtatgtatcttaggatgtactatctctgtgtcatttttctatct
gtatgctatagagactgttgctgactgtctcttcagGTTGCAGGAAGCTATTTTGACCACTCCT
CGCGGTATTACTCGGCTCATGGATATGCTAATGGATAGAGAGgtaaataattaatt
129
atttattggatatgtatgatctttccattgtttacgtaagactctgacctctttaacctcctgtcaacctcaaataggag
gttatgagacggtaaactctggaagggaacaatatcatgtttgtaattttttttttggtgtatgccttatagGTAAT
CCGGAACG(mag4.1_deletion)AGGCTTTGTTGCTTCTTACGCACTTGACTC
GTGAAGCAGAGgtgagacctcctggttatatttgtaactgttccttttcttctgccttgagccgagggcagcc
attgatcgatcaattttacatttgtagGAGATCCAAAAAATTGTTGTCTTTGAAGGTGCTTT
TGAAAAGATTTTTAGCATCATTAAGGAGGAAGGGGGGTCAGATGGCGATG
TGGTTGTGCAGgtgaccagttgctcttttaaaaatatttatagcacttgtaatctgttaaaaatttggcatata
aaaaggataaatgaagaacccattctcctatttaggcctaaacattattggcttttacatttccatatgtcagcata
ctttgggattgcctgcttatgctaaacttcaaattccttgcccgtgcacttgcactaatcttgttctttatatacagGA
CTGTCTGGAGTTGTTGAACAATCTTCTTCGCAGCAGTTCATCTAACCAGgtttt
gaagctttgtttctggtcttgatttttctgattaatttactagatccaacatatgctaatatttgaacttatacagATAT
TACTAAGGGAAACCATGGGTTTTGAACCTATCATTTCAATTCTAAAACTGCG
AGGGATTACATATAAGTTCACCCAGCAAAAGgttagatcatagagtgcttctcccattatgg
atttattttaccagcttgtgcaattttgttaatctttgtccttctgaatgaacaaattcccagtagACTGTTAATC
TGCTTAGTGCACTGGAGACCATCAATATGCTGATCATGGGACGTGCAGAC
ACGGAGCCTGGAAAAGATTCAAACAAGCTGGCAAATCGAACAGTTTTAGTT
CAGgtttgttatgatattcaaatgttatgattccttattgtggacaggatcgaatatgtttcttattctatcatatttgatc
caataacagAAAAAACTGCTAGACTACCTACTGATGTTGGGTGTTGAAAGCCA
ATGGGCGCCTGTTGCTGTTCGCTGCATGgtaagcaaatcatctttattcgaaatcttcttcctt
agccccttccctttttttgttagtattcatatttcattgtatatcgaaatttcattcctgatttgcgcaactgtcaataagtt
ggaatagtttcattcagattctagaattcagctgttgaacattgtttctaggttgaccatgggaactgatatacttac
agtgatctattagcagtttttttctagagattgttgcactttctttttagccaattgctgaacagttgtctgaagcaagta
ttttattgatgtttttttctaactggtcttttgcattgtgattctttgctgctagACATTTAAATGCATTGGAGA
TCTAATTGATGGACATCCTAAGAACCGTGATATCCTTGCTAGCAAAGTTCTT
GGTGAAGATCGGCAAGTAGAACCTGCACTCAATTCTATCCTCCGGATCATT
TTACAGACGTCCAGTATTCAAGAGTTTGTTGCAGCTGATTACGTTTTCAAGA
CATTCTGTGAGgtttgaaactttttttcctatagatttttagttcatagtctaatgtgatattgcttcttctaatata
gtatgtgattggatctttatctagaatctcattcactaaacgttttcagAAAAATACTGAGGGTCAAAC
AATGCTAGCTTCGACTTTAATACCTCAACCACATCCCACATCCCGAGATCA
TCTAGAAGATGATGTTCACATGTCATTTGGAAGgttccatcttcttttctttcttctgcggctttc
gcagtgttgttttgaagggatctcactcttcattctcattgcgagtttgcatggtcctccaactttgacctacttctcttgt
attaaaacttatccaactttttttctatatgtgtcagTATGTTGTTACGAGGCCTTTGTTCTGGTG
AAGCTGATGGTGATCTTGAGgtaaaattgtttctctctgttggatattaatttaaaagcattccggaa
cttacagttatttttttacttcttaaaaacagACATGTTGCAGAGCTGCAAGCATTCTTTCTCA
TGTTGTCAAGGACAATCTTCGGTGCAAGGAAAAGgtaacaaagtcatcactttaaactt
attttctgtacgcaatatttttgtttttcctgcattgactgactttttgggaaactcaatagGCTTTGAAAATCG
TACTTGAATCACCGATGCCGTCCATGGGAACTCCAGAGCCTCTCTTTCAGA
GGATTGTCAGATACCTGGCTGTTGCTTCTTCCATGAAAAGCAAAGAGAAAT
CAAGCACACTGGGGAAATCATATATTCAACAGATCATTTTAAAACTGCTGG
TGACATGGACTGTTGATTGTCCGACTGCAGTTCAGTGCTTTTTAGATTCCC
GCCACCACCTAACGTTTTTGCTCGAGCTGGTAACAGACCCAGCTGCAACA
GTTTGCATAAGGGGATTGGCGTCTATTCTATTGGGAGAATGCGTCATTTAC
AATAAATCAATTGAGAATGGTAAAGATGCTTTTTCTGTAGTTGATGCTGTGG
GCCAGAAGATGGGGCTGACATCATACTTCTCAAAGTTTGAAGAGATGCAG
AACAGCTTCATCTTCTCTCCCTCCAAGAAACCTCCACAGGGTTATAAACCG
TTGACAAGAACACCCACGCCGAGTGAAGCCGAGATTAATGAGGTGGATGA
GGTAGATGAAATGGTTAAAGGGAATGAGGATCACCCAATGCTCTTATCTTT
GTTTGATGCTAGTTTCATAGGCTTAGTGAAGAGCTTGGAAGGTAACATCAG
AGAAAGAATCGTGGATGTGTATAGTCGGCCAAAAAGTGAGGTGGCTGTAG
TACCAGCAGATTTGGAGCAGAAGAGCGGAGAAAATGAGAAAGACTACATC
AACCGCTTGAAAGCCTTTATAGAGAAACAGTGCTCGGAGATTCAGgtatgtccc
tcttcagaactgtctatgctactgagtataatactgttagcatatctgaaaatatatactattttggtgttgattgcaatt
130
aaaagaagacaaatcttgagattcatcacaagtaacatgaaatttaaaaatagatgaaaagttcagggggttc
aactaaaaatatgggtattaggaagcattgcttattgctgaaacggaatataggaagcactaagttttggctaag
atgggactatttagaattggtcttatgttctcagctggatatcaaacaatattcagacaataggtctctgaattgatg
tgataaatatatgttgaggttaaccgttcaaaaaagaaatctcaacatcagttgttcctgacagagcttcttgttgtt
gtgacctctgtgttttacagAACCTTCTTGCTCGAAACGCTGCTTTGGCAGAAGACGT
AGCTAGCTCTGGCAGGAACGAGCAATCTCAAGGATCAGAGCAAAGAGCAA
GTACAGTAATGGACAAAGTTCAGATGGAAAGCATCAGACGAGAACTCCAA
GAAACATCTCAGCGTTTAGAGACAGTGAAGGCTGAAAAAGCCAAGATCGA
GTCTGAGGCATCAAGTAACAAGAACATGGCTGCAAAACTTGAGTTCGACCT
AAAAAGCTTATCGGATGCATACAACAGTCTAGAACAGGCGAATTACCATCT
TGAGCAGGAAGTGAAATCCTTGAAAGGAGGAGAGAGTCCCATGCAGTTTC
CTGATATAGAAGCCATCAAGGAAGAAGTGAGAAAAGAAGCTCAGAAAGAG
AGCGAAGACGAGTTAAATGACTTGTTAGTATGTTTGGGACAAGAGGAAAGC
A[mag4.2_insertion]AAGTCGAGAAACTAAGCGCGAAGCTGATAGAATTAGG
AGTTGATGTTGATAAGCTTTTAGAGGATATTGGAGACGAATCAGAAGCTCA
AGCGGAATCTGAAGAAGACTGAccattgaacaagttgtagagattttatatcctctaaaatgctct
ctcatcagggaatggaacttgtacagtcttgggtgatttatttcgatatttgacaattctctttgcttaatcttttagttat
aactctttaatttatctaccaaacttcttagattgcttgaaacatacacatcttaacaatgaaacacaatgtggtttta
agttactgcaggttgttatgcttttcatattttgttgaaacaatcaaacaagtactcagttcaattagtgttt
VPS35b
aaactacagacaacacattaaacagaaaacaacattggaagaccaaacaggtggtatccatgttgcagag
aaaactatttataatttgatttggttcgattttaatcaaaatgcaaatcaaaaatataataaaataacaaagtagtt
acaatattttatattaattgatatgtttattcataatacataaaatataaaattgaaataaattttgtttatccggttcggt
tgagttccaatatagcagaaacgattgcggttaaaattctggattttattcggtttgatgaaatagaatatggaccg
ttagatctaaagttaagtagaaacaacgaacggtggatagcgaggaaaacgtacacgtgtgaagcatgaca
taatttttttcccatccaaagactttcacaaggtcttttttttgaaacctgaaaagtgagaaacccagagatcgaga
ttgagatttagagaagacgaagatcagaagATGATCGCAGACGGATCAGAAGATGAAGA
GAAATGGCTCGCCGCCGGTGCTGCTGCTTTCAAGCAAAACGCATTTTACAT
GCAACGCGCTATTgtaaattctgcttttcccttcacttcttcctttcgtttctctgtgactattttcgttgaaattg
agaaatttccaaatttgtacttttcattgatgtgtagaattgaaggaaccctaattttggttttgggaaaaaacttgtct
tgtttttggttaattagGACTCGAATAATCTGAAAGATGCTCTCAAGTATTCGGCTCA
GATGCTAAGCGAGTTACGGACTTCGAAGCTATCACCTCACAAGTACTATGA
TCTATgtgagttgtttccgtcttctgatctaccttgatcatatttatttcgctcaatcccttaattgacttaatagggttt
tgtttgtgatctgatttgatggtttttagATATGAGAGCTTTTGATGAATTGAGGAAACTTGA
GATTTTCTTTATGGAAGAAACTCGTCGTGGCTGCTCAGTCATTGAACTCTAT
GAGCTGGTTCAGCATGCTGGTAACATATTACCGCGTTTgtaagttactctcatacaat
tccctttcgctgtctgttcctatgttttggcaagtttttttttatcgatcgtatcttcgtaaagaatcattttggtctttcttagtt
caaattcatgttatcttaaagatggtttatttgattactacttgatgttgctagcatttcttctgatatttagtctgttcacttc
tctagctaaagtaatatttgttcattttattttcccttccctagGTATCTCCTATGTACAGCAGGATCC
GTGTATATCAAAACCAAGGAAGCTCCTGCCAAGGAAATTCTTAAAGATCTT
GTTGAGATGTGCCGTGGGATTCAGCATCCTCTACGTGGTCTCTTCTTAAGA
AGTTACCTTGCGCAGATTAGTCGAGATAAATTACCTGACATTGGTTCTGAG
TATGAAGGgtaagcatgtagacactattgttgggtttcttcatagcagctgggactcccatggggttcttgctt
gtgcttaaattatcttcttagacaaagatccctagtttctttcgacttcatatagaatgttggctattaattaaaatcattt
tgattcatctagttggtgactcgttttcaaatctctacagAGATGCTGATACAGTCATAGATGCTG
TGGAGTTTGTACTACTGAACTTTACTGAGATGAATAAACTCTGGGTCAGAA
TGCAACATCAGgtgagttatgatttaagattgttatacacccttctactcattttggaaacgagcttgccaga
131
atgtttgcacttcatggagatttaattggactatgagagcatgattctgatcatgcaggtgttgtattgcttgtgaatc
attaccctttgatgatcctgagttgaagcttcctatcgggctttctggtctgcagGGACCTGCTCGAGAA
AAGGAGAGACGGGAGAAAGAGAGGGGCGAGCTTCGTGACCTTgtaatatgactt
tctttcagctggtaacagtgaactgaagtgccgaccgtttgttttgtactgatggaagcgtacgttttccctcacttat
cagGTTGGAAAGAACCTTCACGTGCTGAGTCAGTTAGAAGGTGTGGACCTT
GATATGTACAGAGATACAGTTCTTCCTAGAGTCTTAGAGCAGgtaaaatattatat
cctgtaaacacacatacacttgcaaacttaacttattccctaaccaaaaactaatgtttttctgatgatttctcagA
TTGTGAACTGCAGAGATGAGATTGCCCAATATTACCTAATAGACTGTATAAT
TCAAGTTTTTCCTGACGAGTATCACTTGCAGACTCTAGATGTACTTCTTGG
GGCGTGTCCTCAACTTCAGgtagtctgagtagctctctcgtagttagcattatgctctgctcctaatca
tctaacttgattgcaactgcactgtcggggtttttggtgtcagtggttcactaacctcctgtttttttcactttctctttggttt
tcttctcagGCATCAGTTGACATCATGACAGTGCTTTCCCGTTTAATGGAGAGG
CTGTCAAATTATGCTGCCTTAAATGCGGAAGTATTACCTTATTTCCTGCAAG
TGGAAGCTTTCTCAAAGTTGAATAATGCAATTGGAAAGgtaacattacttgtaggctgt
gagtcctgttacacttttgtttctgtatgaaagtatgaaagtaatgacagcttagaaatcttgttgactggacttgtta
gcatcacgaaagacatggagaaggcccaactatttttttgcttgacatttcttttttaagtgaaactcttcactttaga
cgattccgtcaacatctagaaagatgcattgttctttatctgttggatgtatccgctgcaaagcgtgaccctaatta
acacagttctgcatgtgttcacttcttaaattatgtggttttgacgccttttttcccaatgaattttttttcttaagtgggaat
actatgatgttgcagGTGATAGAAGCACAAGAAGACATGCCTATTCTGAGTGCAGT
AACCCTATATTCCTCCCTTCTCAAGTTTACTCTTCACGTTCACCCTGATCGG
CTTGATTATGCGGACCAAGTGTTGgtcagttttcttatagcgactgtatttacctattgttctcttctta
actcaagtttaaatactccgtcttctgtcaatgttgcttagtggcctatcgatctcattatttagattatcttgtcattctat
tcacattgctgagacattgctaaatttgtgatacttgtgtcaaaagatgccaatctttttatataaaaaaaaaaaca
gatttcaggttcatctttgtttatcattcttgtagcggatattggttcaggaatttatgttttattctgttatatgtatgtgctttt
ttatacatgtttatggtacgctacatttgcagaaaacttcctactacgtatcttatacgaattttttgctggaatagggt
acattgggtgaaagagtttgaatttaatgttcttcactaactcttcccagGGATCATGTGTTAAGCAAC
TGTCCGGAAAAGGAAAGATTGATGACACTCGTGCAACAAAGGAGCTTGTC
TCGCTTTTAAGTGCTCCCTTAGAGAAGTATAATGATGTTGTCACCGCCCTT
AAACTAACTAACTATCCCCTCGTGGTGGAGTACCTTGATACCGAAACAAAG
AGAATAATGGCTACTGTTATAGTTCGAAGCATTATGAAAAACAATACTCTTA
TTACTACAGCGGAGAAGgtaggatacctttatgccattaccctttgggttgaactatttgtgtggtaaaa
ttgaatttctttgtcttgttttactttggtagGTTGAAGCATTGTTTGAACTGATTAAAGGAATTA
TCAACGATTTGGATGAGCCGCAAGGTCTTGAGgttgttataagcattttgagatactttctat
cttgtacttaatatcctctcaatcacttgtgaccccaatattaagttgtattttgacttctgtacatatttttctacgctctta
gGTTGATGAAGATGATTTTCAGGAGGAGCAGAATTCTGTTGCGCTTCTCAT
TCATATGTTATATAATGATGACCCAGAAGAGATGTTTAAGgtgcattgagctcttgca
cattcagattcattaatgtgtgtcatatctttatgctgactttaggctttctagttattggcttacattaatatttttaatgttt
aatggcatttattcccagttgcctttattctttgtttgtaattttcaactgagacatagattatacttatccccaggagtc
atcatttccttcaaaattctattaaggttttggctttcagtgtttgaagctgtttaattgaatttcagATAGTCAAT
GTCCTGAAGAAGCATTTCCTGACAGGAGGGCCAAAGCGCTTAAAATTCAC
CATTCCTCCCCTTGTTGTTTCTACTCTAAAGgtttgatttcttaagcgaagttattctttaaagc
ccctcataatttatgttgagcttaatttgcgatgacatggtttggttgtttctatttatgtgccataccgcatagaagagt
atctgatatagtttgttaattttgttcttcttcaaattctgtaatttttgttgtctgttttgtttaaattgtagCTAATCAGG
CGATTGCCAGTGGAAGGAGACAATCCTTTTGGAAAAGAGGCTTCCGTTACT
GCTACTAAAATATTCCAATTTCTAAATCAGgtaagtttcataacttttcactgccttgtgtgttca
gatctgtttttaggtagaacaatgtgtctcagagctacattacttttttaaaacttagccaaagttagttcatacaagc
tgattgacaacgttgaaaattctcaatcaacagtttgtagatattagctaaagagactgctgttattgctctgtctcc
actacagtcatcagttcttgtatttattgttggatttttttgcttgttgcagATTATCGAAGCGCTACCTAA
TGTTCCATCACCTGACCTGGCATTCCGGTTGTACTTGCAATGTGCTGAGgta
aaactacgtacctgtttatttagtatgcacttgctcggtaaaatgctcacagatctttcctatttcatcagGCTGC
GGATAAGTGTGATGAAGAACCAATTGCATACGAATTTTTCACCCAGGCATA
CATCTTATACGAAGAAGAAATTTCGgttagttatgttttcgcggacgttcatatgtctcgtgtctaa
132
accaaagcaagtaggtgagtttaactctctctcgtgcgtcaaaaacttgccttttgaacagGACTCAAAG
GCCCAGGTGACTGCGTTACAACTTATAATTGGAACTCTGCAGAGGATGCAA
GTATTTGGTGTTGAGAATAGAGATACATTAACGCACAAGGCTACGGGGgtatt
tcatcatgatatgataattagtctcttaccatattcaattgtctctgcttaaaggctgacaagggaaaacttatactct
tgcagTATGCAGCGAAACTTCTAAAGAAACCTGATCAATGTCGAGCTGTTTAT
GCCTG[vps35b.1_insertion]TTCTCATTTGTTCTGGCTGGAAGATCGTGAGA
CCATACAAGATGGAGAAAGgtaatactttcttctgtgcgatagaacacagtgtgtttaaaatagctca
ttaggtttcttacttgaaatcagGGTTCTACTTTGTCTGAAACGAGCGCTTAAAATTGCA
AATTCGGCTCAACAAGTGGCCAACACAGCTCGGGGTAGTACAGGGTCTGT
TACCCTCTTCATCGAGATACTAAACAAgtaagcgctttttattgtttgttttacttctcattttcccag
atcagcaagcagcattatgtaatgtgaattatggcagGTACCTCTATTTCTATGAGAAAGGGG
TTCCACAGATAACAGTTGAATCAGTAGAAAGCCTGATAAAACTGATCAAGA
ACGAAGAATCGATGCCCTCTGATCCATCTGCTGAATCATTCTTTGCAACTA
CGCTTGAGTTCATGGAGTTCCAAAAGCAGAAAGAGGGTGCCATTGGTGAG
AGATACCAGGCGATCAAAGTATAGaatctcatgttggtcaaaagaagtcggtttgtttttattaaa
aaaaaaaaaagctttgctttgagtcgtttgagaatcggaaagggactgatattgtgttgcttgctgctcgaaatgt
aaacccgggtttaggtctagttttattcctcggattttcaaggcggttcatggtctgtaacttgtggagtttgatgtgtg
tttatacgaatctttagtattctgagtttaacattggatattacatattagataatgtccttcatgaaaaaagtattcattt
tatgtattctgttgaatgattattgacgcataaaaacattttcaataccattgtataacgcataattaacaacctttgt
gaggcttcagccttcaaatatatcaatagaaattcagtttacgaaatcaacttaaaaaattttgaaaaaataata
aaatttaaaaaaaaaacaaattgtaggttaaatttttttaaaaacaaaataaaacagaaaagaaacgagaaa
aaaaa
VPS35c
aatgtgaggaaaaagggtaaatcaaattctaaatcatgtgatctatttacgtaatttaaattaccagcttactcaa
aagtcaaaattgaagttgtgagagataagactaatgtagcctaatgactatgaaatgagttatgtcaccatctaa
tttaaagttaaagctgataatgaagaaagaacaaatttaaatgaggagagtttcaattcgcgaagatctcaga
aaattcggtcaaaattgagacttctctctgtagagtaaagtagtgaggaacatagaATGATCGCCGAC
GACGATGAGAAGTGGCTTGCCGCCGCAATCGCCGCCGTTAAGCAAAATGC
TTTCTATATGCAGCGCGCTATCgtaccactccttcccctctcctctcttcttcatcaatcaattttccat
tcgggatcttagaaccctaaatcctgcattgttttctaacgaatctccaatttatcctcgcagGACTCCAACA
ATCTCAAAGACGCTCTAAAGTTTTCTGCTCAAATGCTCAGCGAGCTTCGTA
CTTCCAAGCTTTCTCCTCATAAATACTATGAACTCTgtgagtcgactcgttcgttacccg
ctcttcctctattcgtagctcttgtgtgtgattttcatttccttactgcgatctgatttgttcagATATGCGAGTTTT
TAATGAATTGGGGACTTTGGAGATATTCTTCAAGGAAGAGACGGGACGTG
GTTGCTCCATCGCAGAGCTCTACGAGTTAGTTCAGCATGCCGGTAACATTT
TGCCAAGATTgtaagtttcttttgttcttgttgttcagaatacgttgtcatttgcacactctgattttaactttgtag
atactcatcaattgagtttcagGTATCTTTTATGTACCATAGGATCTGTATATATCAAAT
CCAAGGATGTTACTGCTACGGATATACTTAAGGATCTTGTTGAAATGTGTC
GAGCTGTTCAACATCCCTTGCGTGGTCTTTTCTTGAGAAGTTATCTTGCTC
AAGTTACTAGGGATAAGTTACCCAGCATTGGTTCTGACTTGGAAGGgtaaggc
cacacgacctgctgttagatgattttgtttaagctatgaaacacatgttaagaggcaagtgttttgcttttctctccag
AGATGGTGATGCACACATGAATGCTTTAGAGTTCGTGCTTCAAAATTTTAC
CGAGATGAACAAATTATGGGTTCGAATGCAACATCAGgttggttggaaatgctaatga
tatactgctctcatatccagcatgtatgtattgaatatgtttgcgtactgtgagagtccaatgtctggtcatgctgctat
tttactctttgtgattttttcctactgagctgaacctgagttgagactttcttgtatccagGGACCTAGTCGGG
AGAAAGAAAAACGTGAGAAAGAAAGGAATGAACTACGTGATCTTgtaatagcatt
aacttgtacctgattcttctcatctagaaatttgcgagctatggttcttattattttctgaaagacacttttttcttctcttta
133
ctgcttgattagGTTGGTAAGAACCTTCATGTACTGAGTCAGTTAGAAGGTGTTG
ACCTTGGAATCTACAGAGATACTGTTCTTCCTAGAATACTGGAGCAGgtctggt
ttttctctccgtttagatagtttcagcatccaaccttccgtctacatgttgtcaactattgtctcactatgtcttaaaagG
TTGTCAACTGTAAGGATGAACTTGCACAGTGCTACCTTATGGATTGCATTAT
TCAAGTCTTTCCTGATGATTTTCACCTTCAAACTCTCGATGTATTATTGGGA
GCTTGTCCACAACTTCAGgtacttttttatgtaaccgaccacttcttttttatatgcatttttctttttaaaaa
attagactgcttataaacctcatcccatcagaacatgtttctcctatttttctttttgtcattttttgttccagCCGTCT
GTTGATATCAAGACGGTTCTTTCGGGTCTAATGGAAAGGCTTTCGAACTAT
GCTGCATCCAGTGTGGAAGCATTGCCTAATTTCCTGCAAGTAGAAGCCTTT
TCAAAGTTAAATTATGCCATTGGGAAGgtaaatcttgcatctacacatgaagtcaaaagtgat
aacttaaattgacacttgaatgtaagtttatttattaggggtgctagtattagaaacagctgaaaaatctagcttaat
ccccattgtgttgtgatatttttgtgtaatttttgttgccctactttcctctgattgtatcacattgataatctattgtatttata
aacagctgatcgagtccacatctattatgattcagGTTGTAGAAGCACAGGCGGACTTGCCG
GCCGCAGCATCTGTAACATTGTATTTGTTTCTTCTAAAATTCACTCTCCATG
TCTATTCCGATCGTCTGGACTATGTGGACCAAGTTTTGgtataacgctttctgaggat
gtataagcgtgttaagtttattcatctgtagtacttttttttaatgctttaagtagttctggttgttgctaatagacattgttgt
ctctcttttttcctttagGGATCATGTGTTACTCAACTATCTGCCACAGGAAAGCTTTG
TGATGACAAGGCAGCAAAACAGATCGTCGCATTTCTTAGCGCTCCATTGGA
AAAGTATAATAATGTAGTAACTATCTTGAAGCTTACCAACTATCCTCTAGTA
ATGGAATACCTTGACCGTGAGACAAACAAAGCAATGGCAATCATTCTAGTT
CAGAGCGTTTTTAAAAATAATACTCACATTGCTACTGCTGATGAGgtgtccgtttc
cttcctcgatttagtttcttcttttagcattctctattgatgtggagtttccttttattgtttgttacagGTCGATGCAT
TGTTTGAACTGGCAAAGGGTCTCATGAAGGATTTTGACGGAACAATTGATG
ATGAGgttctagaaccctacccttgttcgtcaattcttattgcatttgtgttcaaagtattcttgtcactcgaaagag
ttgtttttctttagataacttacgctttcaaacgggaaactatgttgggttctagATTGATGAAGAAGATTT
CCAGGAAGAGCAGAACCTTGTTGCACGGCTTGTTAACAAGCTCTACATTGA
TGATCCAGAAGAAATGTCTAAGgtgtgtattataattagaggcgttactatcaatgtggcctgtgta
cgattcttaattcgtttacccaactacagATTATTTTCACTGTAAGGAAGCACATCGTGGCT
GGAGGACCTAAGCGTTTGCCTTTAACTATACCACCCCTTGTCTTTTCCGCG
CTCAAGgtttgtattttttactgatcactttttcaccccatatccaacttggaaaagtttttcaatttaaaatttggga
ctgttgtgtcgtttccagttttgttatggatttggtagacattgtgtttgaatttggaactccgcacattgtattgatatgct
tgtgcagTTAATTCGGCGACTGCGAGGTGGAGATGAAAACCCATTTGGAGATG
ATGCATCAGCAACACCGAAGAGAATTCTACAGCTGTTAAGCGAGgcaagtgta
catacatcttgttgttagaaaatagtgaggacagtttagttgccttgcaaacatcattggctttgtcccagtgaaac
attgtatttcgttttgccagaatttctatgatgccagtcgattttgaatctcaatttttacagACGGTGGAAGTT
CTATCTGATGTTTCCGCACCTGATTTGGCATTGCGCCTCTACTTGCAGTGT
GCTCAGgtatatgatcatttggactttggtgtatccactatcaaaaccaacaaaagatatagtgcatgagctg
aaattttcttttattgaatctttagGCAGCAAATAACTGTGAGTTAGAAACAGTTGCTTATG
AGTTCTTCACAAAGGCATATCTTCTGTATGAAGAAGAAATTTCAgtaagtgaattt
cttaggaggctacatgcaactttctgcaagaaagaccttgacgtggctataaattttgggtgttttagGATTCA
AAGGCACAAGTAACAGCACTACGTTTAATAATAGGGACACTCCAGAGGATG
CGTGTATTTAATG[vps35c.1_insertion]TTGAGAACAGAGATACTCTGACTC
ACAAGGCGACAGGGgtatctttcctgacaaaccagttgagtaagtgtgtgctacactgaatactaaca
aggctaaattatgttgttgtagTATTCTGCAAGACTCCTAAGGAAGCCAGATCAGTGTA
GAGCTGTTTATGAATGTGCGCATCTCTTCTGGGCTGACGAATGCGAGAAC
CTGAAAGATGGAGAGAGgtatgtacacatatgaatgtatactcttaacatactaaatctcaaagttaa
gattggtagctgtacgcaactaataaggggagaaaaattgaggcatattgagagtttgttgggggtttttgggat
accagGGTTGTGCTTTGCCTGAAGAGGGCACAAAGGATTGCAGATGCTGTT
CAACAAATGGCCAACGCATCCCGTGGTACCAGTAGCACTGGGTCGGTGTC
TCTATATGTCGAACTGCTGAACAAgtaagtttctctttaagttgcatctatttagtacctactaccta
cgctgcaaaatgcttgcaatcgagtactttcccagttgtcactgctttctcttaatggttttacgagtcaaatgcagG
TATCTCTACTTCCTGGAGAAAGGAAACCAACAAGTAACAGGCGATACAATC
134
AAGAGCTTAGCTGAATTAATCAAAAGCGAGACAAAGAAGGTTGAGTCAGG
CGCAGAGCCATTCATAAATAGTACTCTACGTTACATCGAGTTTCAGAGACA
GCAAGAAGATGGTGGCATGAATGAGAAATACGAGAAAATCAAAATGGAAT
GGTTTGAATGAatttatttcttgcattctgttaggcaaatcatcaataatcaatggcttctcgtcttgttgcaga
agcatgaagagtaatcttgagaaataaaacttgtatcattacttttcatttcctagttacactctctaccaataaaca
tgtttgcaaagtttaaaacagtaacatacaaaatatggtcagctgaagaaatggtgggaatgaaaactgattca
aaacatgtggtgtttgtgtggaattatgcataggactgaatcgta
VSR1
taccaatcttttgaatagttatttcacttgagctctgatcaacattttttatggtatggtaaaaaatccacctctcagat
aaagcatcacatcagatgaaacacgaattttctttttcagtaccgtcagattaaaagattgcccacacgtgtttcc
cgtttactcttcttagattcttattcaatttgtacaaactctttttttacatatccacgtttccggaaacgacaaagaag
aacaattcaataaagtgataaacccttctccattcctaaattaaaataattaattaactcttaacattcatgaatctc
taatcctaatcaccattaataatttatccattagattaaatcttctttcttctcttccttaacttagaagactcttaaatcta
ataactctgttcacttccttcacacgaatcaaaaacaaaaagcttcgtacatttcgcctagctttcttcgttctaggg
tttcggatcttacttaattacccaaaacccttgaatccattactctgtgtgagcttaaatctcttaccttttatcactttca
atcgattgggtattaagttgtaaaggaaggaacttggaatcATGAAGCTTGGGCTTTTCACTCT
CTCGTTTCTTCTGATCTTGAATCTAGCAATGGGTAGATTCGTTGTTGAGAA
GAACAATCTCAAAGTTACATCACCTGATTCGATCAAAGGTATTTACGAATGT
GCCATTGGTAATTTCGGAGTTCCTCAATACGGTGGTACTTTAGTCGGCACC
GTCGTCTATCCTAAATCCAATCAGAAAGCTTGTAAAAGCTACTCCGATTTC
GATATCTCCTTCAAATCCAAACCTGGACGATTACCAACTTTTGTCCTTATCG
ATCGTGGAGgttcgatttcaaagttcttacctttttagttgttcccttaattgggtattcaaaagatctgtttttgatt
caattttggaactgaattttgttttcagATTGTTACTTCACTTTGAAAGCATGGATAGCTCA
ACAAGCTGGAGCAGCAGCGATACTTGTAGCTGATAGTAAAGCTGAGCCAT
TGATTACAATGGATACACCTGAAGAAGATAAATCTGATGCGGATTATCTTCA
GAACATTACCATTCCTTCTGCTCTCATTACTAAAACATTGGGAGACAGTATA
AAGTCTGCTCTTTCCGGTGGTGATATGGTTAACATGAAGCTAGATTGGACT
GAGTCGGTTCCACATCCTGATGAGCGTGTAGAGTATGAACTGTGGACTAAT
AGCAATGACGAGTGTGGGAAGAAGTGTGATACTCAGATTGAGTTTCTCAAG
AATTTTAAAGGAGCTGCTCAGATTCTTGAGAAAGGTGGGCATACTCAGTTC
ACGCCACATTATATTACTTGGTACTGTCCTGAAGCGTTTACGTTGAGTAAA
CAGTGTAAGTCTCAGTGCATCAACCATGGAAGGTATTGTGCGCCTGATCCT
GAGCAGGATTTTACGAAAGGGTATGATGGAAAGGATGTTGTTGTTCAGAAT
CTACGTCAGGCTTGTGTCTACAGAGTGATGAATGATACCGGTAAGCCGTG
GGTCTGGTGGGACTATGTGACTGACTTTGCTATCCGGTGTCCAATGAAGG
AGAAGAAGTACACCAAGGAATGCGCAGATGGAATTATTAAGTCCCTTGgtga
gtataatgcttcattacttcgatggtatttgctcctgtggtatcatttaattaacatattacctgtggctagaaattagtct
agaaactgaatagtattttcagatatcagctcatccataatatagaacaaccaattgattgatttaaatgagctta
ccttgtatggccttcttgaatccttatattcaccaagtggatattttgttactttcaaatttgtgcatttcatcctaaattttc
tgactatcagctatatatgcattcagGCATTGATCTCAAAAAGGTGGACAAGTGTATCGG
AGACCCTGAAGCAGATGTGGAGAACCCAGTTCTTAAAGCAGAGCAGGAGT
CACAGgtttgcctttttgtaactaattggctcttacacgtcttcccaagatcttgttggttttaagttgccatatattct
ccattcctcgaggtgttcttaaaatgagtctacttcaatttcagATAGGCAAAGGTTCCCGTGGAG
ACGTGACTATACTCCCAACTCTTGTCGTGAACAACAGACAATACAGAGgtatt
ggggaaatgttgacttgtcagtttttcctccctttctacagcatctttcttatgtttttgctaattgaattatctttcatgctttt
agGTAAATTGGAAAAGGGGGCAGTGCTTAAGGCTATGTGTTCGGGTTTTCA
AGAGTCAACGGAACCAGCTATCTGTCTTACTGAAGgtactattatccacgacatttgca
135
actctattttttttaacagaaaccacagttctcttgataggccaatatatattttcaaaacttgcattagagctatcctt
aattctttatctattgttttatatcatgggagagaaggatagttttcagacatgcttctgatttttgcagATTTGGA
AACTAATGAATGTTTGGAAAACAACGGTGGATGCTGGCAAGACAAAGCTG
CCAACATTACTGCATGCAGGgtatcatttgggcatgctattcttttcattttctaacttttggttttgttgttg
ctactatgactgattactcttatttggtatgaatcggcattttagGATACTTTTAGGGGAAGATTGTG
TGAGTGCCCTACTGTTCAAGGTGTTAAATTCGTTGGTGACGGTTACACTCA
CTGCAAAGgtaaattattttttatttgtgtagcttaacatgttttgtgtggaacaagatacagaacaatgccata
ccctctgcttagcttctgtattccacaagtatttgttactggtagtcaaacaatactctgatttgtcccttgtcgatatttt
cttggaatcaacttagatggtctatttgttgatttgtaccacctgcgagtataatatcaagattgataaataaaattgt
cacaatcattgattaaattttactttgaaaaagtttacacttgaagaaattcgttgatgtgactgagtggaaacattc
taatgttttttatgcagttctactgaattttcccctttgcgtatatcataagttgtcttac[vsr1.2_insertion]tata
atctttgttaatgtgcattttctgtatccatatgttatttcacttggcagCCTCTGGAGCTTTGCATTGTG
GTATCAACAATGGAGGATGCTGGAGAGAATCCCGAGGTGGCTTCACTTAC
TCTGCTTGCGTAgtaagtatttcgcttctatgtcaatgtgtttgaacaatcatcttcttaacctaaaataaaa
cttctttcgttcttgaaatggatttttcagGATGATCATTCAAAGGATTGCAAATGCCCACTT
GGGTTCAAGGGCGATGGAGTGAAGAACTGTGAAGgtacttgctaagtattatttacaaa
atatatgcatgtggtctacttgcatatataaagagctctatgtcgatttatttagtaggcatcactagtaacatgctat
tagcctattgtttatatgcatgcatgtcggtctagaatatttagtgcagatacacaattatcagtggaattttttattgta
ctcaatcaaagaagtgagatcattgttaaacgtcaaattcaaattcataatcatattcttttgattgaatgtgttagA
TGTGGACGAGTGCAAAGAAAAAACGGTGTGCCAGTGCCCAGAGTGTAAAT
GTAAAAACACTTGGGGAAGTTATGAATGCAGCTGCAGCAACGGTTTGCTTT
ACATGCGTGAGCACGACACTTGCATAGgtaaaagaaatagatcttccttcctcctagcccca
tacttaatttggagtcattaatcagactttctgcttttgaatctggtgaatgtgtagGTTCAGGCAAAGTTG
GAACCACAAAACTCAGCTGGAGCTTTTTGTGGATCCTTATAATCGGGGTGG
GTGTTGCAGGTCTTTCTGGATATGCAGTCTACAAATACAGAATCAGGgtatac
gatgttctttactctggtcttgttatcatattaaaacataggaagattttgcggtctagatagagatgctaagtaaca
attgaaattatgcagAGTTACATGGATGCGGAAATTAGAGGGATCATGGCACAGT
ACATGCCATTGGAAAGTCAACCACCCAACACAAGTGGTCACCATATGGATA
TATGAgtgaagttatcaacacaagccaagagactctctctctttctatgagtagctttaagtttccatgtaaatg
aatcaatggacttgatacaactctagagaaaagaaaaaagccacaaacagaaataagtgtacctgcactatt
tgtaagaaatgtggatttgcagatggaatctcctttttgtctgcatattactcttacaatgtaaaatagacataactg
gatgaataatgagtgttacattatttactgttgaaactaacttaagatccataagaagaaaaacaataaactttga
ttgagagattg
136
9.2 Oleosin promoter used for transgenic expression in A. thaliana
aactcgAGGAACTCTCTGGTAAGCTAGCTCCACTCCCCAGAAACAACCGGCG
CCAAATTGCGCGAATTGCTGACCTGAAGACGGAACATCATCGTCGGGTCC
TTGGGCGATTGCGGCGGAAGATGGGTCAGCTTGGGCTTGAGGACGAGAC
CCGAATCCGAGTCTGTTGAAAAGGTTGTTCATTGGGGATTTGTATACGGAG
ATTGGTCGTCGAGAGGTTTGAGGGAAAGGACAAATGGGTTTGGCTCTGGA
GAAAGAGAGTGCGGCTTTAGAGAGAGAATTGAGAGGTTTAGAGAGAGATG
CGGCGGCGATGAGCGGAGGAGAGACGACGAGGACCTGCATTATCAAAGC
AGTGACGTGGTGAAATTTGGAACTTTTAAGAGGCAGATAGATTTATTATTTG
TATCCATTTTCTTCATTGTTCTAGAATGTCGCGGAACAAATTTTAAAACTAAA
TCCTAAATTTTTCTAATTTTGTTGCCAATAGTGGATATGTGGGCCGTATAGA
AGGAATCTATTGAAGGCCCAAACCCATACTGACGAGCCCAAAGGTTCGTTT
TGCGTTTTATGTTTCGGTTCGATGCCAACGCCACATTCTGAGCTAGGCAAA
AAACAAACGTGTCTTTGAATAGACTCCTCTCGTTAACACATGCAGCGGCTG
CATGGTGACGCCATTAACACGTGGCCTACAATTGCATGATGTCTCCATTGA
CACGTGACTTCTCGTCTCCTTTCTTAATATATCTAACAAACACTCCTACCTC
TTCCAAAATATATACACATCTTTTTGATCAATCTCTCATTCAAAATCTCATTC
TCTCTAGTAAACAAGAACAAAAAAatcgat
137
9.3 Synthetic gene sequences from Chapter 5
PawS1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACCCTAGAGGATGCCAGATCAGAATCCAGCAGCTTAACCATTG
CCAGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAA
CCCTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACCAGCTTCAAGAGGT
TGAGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGCTC
AGAAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGATG
GTGAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACCTCCAATGCTCTATC
TGATGAgagctc
PawS[N65]S1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGCTTAACGGAAGGT
GTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGGACTCGATAACCATT
GCCAGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAG
AACCCTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACCAGCTTCAAGAG
GTTGAGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGC
TCAGAAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGA
TGGTGAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACCTCCAATGCTCTA
TCTGATGAgagctc
PawS[N84]S1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGTTGAACCATTGCC
AGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAACG
GAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGGACTCGATA
ACCCAAAGCAACAGCAGCACCTTTCACTCTGCTGCAACCAGCTTCAAGAG
GTTGAGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGC
TCAGAAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGA
TGGTGAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACCTCCAATGCTCTA
TCTGATGAgagctc
PawS[N96]S1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGTTGAACCATTGCC
AGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAACC
CTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACGGAAGGTGTACCAAGT
CTATCCCTCCTATCTGCTTCCCTGATGGACTCGATAACCAGCTCCAAGAGG
TTGAGAAGCAGTGTCAGTGTGAGGCTATCAAGCAAGTGGTTGAGCAAGCT
CAGAAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGAT
GGTGAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACCTCCAATGCTCTAT
CTGATGAgagctc
138
PawS[N143]S1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGTTGAACCATTGCC
AGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAACC
CTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACCAGCTTCAAGAGGTTG
AGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGCTCAG
AAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGATGGT
GAAGAAAGCTCAGATGCTCCCTAACGGAAGGTGTACCAAGTCTATCCCTC
CTATCTGCTTCCCTGATGGACTCGATAACCAGTGCAACCTCCAGTGCTCTA
TCTGATGAgagctc
PawS[N146]S1
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGTTGAACCATTGCC
AGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAACC
CTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACCAGCTTCAAGAGGTTG
AGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGCTCAG
AAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGATGGT
GAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACGGAAGATGTACCAAGT
CTATCCCTCCTATCTGCTTCCCTGATGGACTCGATAACCTCCAGTGCTCTA
TCTGATGAgagctc
PawS1[Δ2 21]
atcgatATGGGATACAAGACCTCTATCTCTACCATCACCATCGAGGATAACGG
AAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGGACTCGATAA
CCCTAGAGGATGCCAGATCAGAATCCAGCAGCTTAACCATTGCCAGATGC
ACCTCACCTCATTCGATTACAAGCTCAGGATGGCTGTTGAGAACCCTAAGC
AACAGCAGCACCTTTCACTTTGCTGCAACCAGCTTCAAGAGGTTGAGAAGC
AGTGTCAGTGCGAGGCTATCAAGCAAGTTGTTGAGCAAGCTCAGAAGCAG
CTTCAACAAGGACAAGGTGGACAACAGCAGGTTCAGCAGATGGTGAAGAA
AGCTCAGATGCTCCCTAACCAGTGCAACCTCCAATGCTCTATCTGATGAga
gctc
SesaS1
atcgatATGGCTAACAAGCTTTTCCTCGTGTGCGCTGCTCTTGCTCTTTGCTT
CCTTCTCACCAACGCTTCTATCTACAGGACCGTGGTTGAGTTCGAAGAGGA
TGATGCTACCAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCC
TGATGGACTCGATAACCCTATCGGACCTAAGATGAGGAAGTGCAGAAAAG
AGTTCCAGAAAGAGCAGCACCTCAGAGCTTGTCAGCAGCTTATGCTTCAAC
AGGCTAGACAGGGAAGGTCTGATGAGTTCGATTTCGAGGATGATATGGAA
AACCCTCAGGGACAGCAGCAAGAACAGCAACTTTTCCAGCAGTGCTGCAA
CGAGCTTAGGCAAGAGGAACCTGATTGCGTTTGCCCTACTCTTAAGCAGG
CTGCTAAGGCTGTTAGACTCCAAGGACAACATCAGCCTATGCAGGTTAGG
AAGATCTACCAGACTGCTAAGCACCTCCCTAACGTGTGCGATATCCCTCAA
GTTGATGTGTGCCCTTTCAATATCCCTTCTTTCCCATCTTTCTACTGATGAg
agctc
139
SesaS2
atcgatATGGCTAACAAGCTTTTCCTCGTGTGCGCTGCTCTTGCTCTTTGCTT
CCTTCTTACCAACGCTGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTT
CCCTGATGGACTCGATAACCCTATCGGACCTAAGATGAGGAAGTGCAGAA
AAGAGTTCCAGAAAGAGCAGCACCTCAGAGCTTGTCAGCAGCTTATGCTT
CAACAGGCTAGACAGGGAAGGTCTGATGAGTTCGATTTCGAGGATGATAT
GGAAAACCCTCAGGGACAGCAGCAAGAACAGCAACTTTTCCAGCAGTGCT
GCAACGAGCTTAGGCAAGAGGAACCTGATTGCGTTTGCCCTACTCTTAAG
CAGGCTGCTAAGGCTGTTAGACTCCAAGGACAACATCAGCCTATGCAGGT
TAGGAAGATCTACCAGACTGCTAAGCACCTCCCTAACGTGTGCGATATCCC
TCAAGTTGATGTGTGCCCTTTCAATATCCCTTCTTTCCCATCTTTCTACTGA
TGAgagctc
CraS1
atcgatATGGCTAGGGTGTCATCTCTCCTCAGTTTCTGCCTTACTCTCCTCAT
CCTCTTCCACGGATATGCTGCTGGAAGGTGTACCAAGTCTATCCCTCCTAT
CTGCTTCCCTGATGGACTCGATAACCAGCAAGGACAACAAGGACAGCAGT
TCCCTAACGAGTGTCAGCTTGATCAGCTTAACGCTCTCGAGCCTTCTCACG
TTTTGAAGTCTGAGGCTGGAAGAATCGAGGTGTGGGATCATCATGCTCCT
CAGCTTAGATGCTCTGGTGTGTCTTTCGCTAGATATATCATCGAGTCTAAG
GGACTCTACCTCCCTTCATTCTTCAACACCGCTAAGCTCTCATTCGTGGCT
AAGGGAAGAGGACTTATGGGAAAGGTTATCCCTGGATGCGCTGAGACTTT
CCAGGATTCTTCTGAGTTCCAGCCTAGGTTCGAGGGACAAGGACAATCTC
AGAGGTTCAGGGATATGCACCAGAAGGTTGAGCACATCAGGTCTGGTGAT
ACAATCGCTACTACTCCTGGTGTGGCTCAGTGGTTCTACAACGATGGACAA
GAGCCTCTCGTGATCGTGTCTGTTTTCGATCTCGCTTCTCACCAGAACCAG
CTCGATAGAAACCCTAGGCCTTTCTACCTCGCTGGAAACAACCCTCAAGGA
CAGGTTTGGCTTCAGGGAAGAGAGCAACAGCCTCAAAAGAACATCTTCAA
CGGATTCGGACCTGAGGTGATCGCTCAGGCTCTTAAGATTGATCTCCAGA
CTGCTCAGCAGCTTCAGAACCAGGATGATAACAGGGGTAACATCGTGAGA
GTGCAGGGACCTTTCGGAGTTATCAGACCTCCACTCAGAGGACAAAGGCC
TCAAGAAGAAGAAGAGGAAGAAGGTAGACACGGAAGGCATGGAAACGGA
CTCGAAGAGACTATCTGCTCTGCTAGGTGCACCGATAACCTCGATGATCCT
TCTAGGGCTGATGTGTACAAGCCTCAGCTCGGTTACATCTCTACCCTCAAC
TCTTACGATCTCCCTATCCTCAGATTCATCAGGCTCAGTGCTCTCAGGGGT
TCTATCAGACAGAACGCTATGGTTCTCCCTCAGTGGAACGCTAACGCTAAT
GCTATCCTCTACGTGACCGATGGTGAGGCTCAAATCCAGATCGTGAACGA
TAACGGAAACAGGGTGTTCGATGGACAGGTGTCACAGGGACAACTTATCG
CTGTTCCTCAGGGATTCTCTGTGGTGAAGAGAGCTACCTCTAACAGGTTCC
AGTGGGTTGAGTTCAAGACCAACGCAAACGCTCAGATCAACACCCTCGCT
GGTAGAACTTCTGTGCTTAGAGGACTCCCTTTGGAGGTGATCACTAACGG
ATTCCAGATCTCTCCTGAAGAGGCTAGAAGGGTGAAGTTCAACACTCTCGA
GACTACCCTCACCCACTCATCTGGACCTGCTTCATATGGTAGACCTAGAGT
GGCTGCTGCTTGATGAgagctc
140
PapS1
atcgatATGGCTATCACCAACAAGTTGATTATCACCCTCTTGCTCCTCATCTCT
ATCGCTGTGGTGCATTCTGGAAGGTGTACCAAGTCTATCCCTCCTATCTGC
TTCCCTGATGGACTCGATAACCTCTCATTCAGGGTTGAGATTGATGAGTTC
GAGCCTCCTCAACAGGGTGAACAAGAAGGACCTAGAAGAAGGCCTGGTG
GTGGATCTGGTGAAGGATGGGAAGAAGAGTCTACCAACCACCCTTACCAC
TTCAGAAAGAGATCTTTCTCAGATTGGTTCCAGTCTAAAGAGGGATTCGTT
AGGGTTCTCCCTAAGTTTACCAAGCACGCTCCTGCTTTGTTCAGGGGAATC
GAGAACTACAGGTTCTCACTCGTTGAGATGGAACCTACCACCTTCTTCGTG
CCTCATCACCTTGATGCTGATGCTGTGTTCATCGTGCTTCAGGGAAAGGGT
GTGATCGAGTTCGTGACCGATAAGACCAAAGAGTCTTTCCACATCACCAAG
GGTGATGTTGTGAGGATCCCTTCTGGTGTGACCAACTTCATCACTAACACC
AACCAGACCGTGCCTCTTAGACTCGCTCAAATCACTGTGCCTGTGAACAAC
CCTGGAAACTACAAGGATTACTTCCCTGCTGCTTCTCAGTTCCAGCAGTCT
TACTTCAACGGATTCACCAAAGAGGTGCTCTCAACCTCTTTCAACGTGCCA
GAGGAACTTCTCGGTAGACTCGTGACCAGGTCAAAAGAGATTGGACAGGG
AATTATCAGAAGGATCTCTCCTGATCAGATCAAAGAGCTTGCTGAGCACGC
TACCTCTCCATCTAACAAGCACAAGGCTAAGAAAGAGAAAGAAGAGGATAA
GGATCTCAGGACCCTCTGGACTCCTTTCAACCTTTTCGCTATTGATCCTAT
CTACTCTAACGATTTCGGACACTTCCACGAGGCTCACCCTAAGAACTACAA
CCAGCTTCAGGATCTCCATATCGCTGCTGCTTGGGCTAATATGACTCAGG
GATCTCTTTTCCTCCCTCACTTCAACTCTAAGACCACCTTCGTGACCTTCGT
TGAGAACGGATGTGCTAGGTTCGAGATGGCTACCCCTTACAAGTTCCAAA
GGGGTCAACAACAATGGCCTGGACAGGGTCAAGAGGAAGAGGAAGATAT
GTCTGAGAACGTGCACAAGGTGGTGTCTAGAGTGTGCAAGGGTGAGGTGT
TCATTGTGCCTGCTGGTCACCCTTTCACCATCTTGTCTCAGGATCAGGATT
TCATTGCTGTGGGATTCGGAATCTACGCTACCAACTCAAAGAGGACCTTCC
TCGCTGGTGAAGAGAACCTTCTCTCTAACCTTAACCCAGCTGCTACCAGG
GTTACCTTCGGAGTTGGATCTAAGGTTGCAGAGAAGCTTTTCACCTCTCAG
AACTACTCTTACTTCGCTCCTACCTCTAGATCTCAGCAGCAGATTCCAGAG
AAGCACAAGCCATCTTTCCAGTCAATCCTCGATTTCGCTGGATTCTGATGA
gagctc
LeaS1
atcgatATGGGACTCGAGAGGAAGGTGTACGGACTCGTTATGGTTTCTCTCG
TGCTCATGGCTATCGCTACCATGTCATCTGTTCAGGCTGGTAGGTGTACCA
AGTCTATCCCTCCTATCTGCTTCCCTGATGGACTCGATAACACCATCGAAG
AAGAGGCTGCTAAGGATGAGTCTTGGACCGATTGGGCTAAAGAGAAGATC
GGACTCAAGCACGAGGATAACATCCAGCCTACTCACACTACTACCACCGT
GCAAGATGATGCTTGGAGGGCTTCTCAAAAGGCTGAGGATGCTAAAGAGG
CAGCAAAGAGGAAGGCTGAAGAAGCTGTTGGAGCTGCAAAAGAGAAGGC
AGGATCTGCTTACGAGACTGCTAAGTCTAAGGTGGAAGAGGGACTCGCTT
CTGTGAAGGATAAGGCTTCTCAGTCTTACGATTCTGCTGGACAGGTGAAG
GATGATGTGTCTCACAAGTCTAAGCAGGTTAAGGATTCTCTCAGTGGTGAT
GAGAACGATGAGAGTTGGACCGGATGGGCAAAAGAAAAAATTGGTATTAA
GAACGAGGATATCAACTCTCCTAACCTCGGAGAGACTGTGTCTGAGAAGG
CAAAAGAGGCTAAAGAAGCAGCTAAAAGAAAAGCTGGTGATGCTAAAGAA
AAGCTCGCTGAGACTGTGGAAACCGCAAAAGAAAAGGCTTCTGATATGAC
CTCTGCTGCTAAAGAGAAAGCAGAGAAGCTCAAAGAAGAAGCAGAGAGGG
AATCTAAGTCAGCAAAAGAGAAAATCAAAGAGTCTTACGAAACCGCTAAGA
GTAAGGCTGATGAGACACTCGAGTCAGCTAAAGATAAGGCTAGTCAGAGT
141
TACGATAGTGCTGCTAGGAAGTCAGAGGAAGCTAAGGATACCGTTTCTCA
CAAGAGTAAGAGGGTGAAAGAATCTCTCACCGATGATGATGCTGAGCTTT
GATGAgagctc
OleoS1
atcgatATGGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACCCTGCTGATACTGCTAGAGGAACCCACCACGATATCATCG
GAAGAGATCAGTACCCTATGATGGGAAGGGATAGGGATCAATACCAGATG
TCTGGAAGGGGATCTGATTACTCTAAGTCTAGGCAGATCGCTAAGGCTGC
TACTGCTGTTACAGCTGGTGGATCTCTTCTCGTGCTCTCTTCTCTTACTCTC
GTGGGAACTGTGATCGCTCTTACTGTTGCTACTCCTCTCCTCGTGATCTTC
TCTCCTATCCTTGTGCCTGCTCTTATCACCGTGGCTTTGCTCATTACCGGA
TTCCTCTCATCTGGTGGATTCGGAATCGCTGCAATCACCGTGTTCTCTTGG
ATCTACAAGTACGCTACTGGTGAGCACCCTCAGGGTTCTGATAAGCTCGAT
TCTGCTAGGATGAAGCTCGGATCTAAGGCTCAGGATCTTAAGGATAGGGC
TCAGTACTACGGACAGCAGCATACTGGTGGTGAGCATGATAGAGATAGAA
CCAGAGGTGGACAGCACACTACCTGATGAgagctc
Paw4SFTI
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACGGTAGATGTACTAGATCAATTCCACCAATCTGTTTCCCAGA
TGGTTTGGATAATGGTAGGTGTACAAAGAGTATCCCACCAATTTGCTATCC
AGATGGTCTTGATAACGGAAGATGTACAAAAAGTATCCCTCCTGTGTGTTT
TCCTGATGGATTGGATAACCCTAGGGGATGCCAGATCAGAATCCAGCAGC
TTAACCATTGCCAGATGCACCTCACCTCATTCGATTACAAGCTCAGGATGG
CTGTTGAGAACCCTAAGCAACAGCAGCACCTTTCACTTTGCTGCAACCAGC
TTCAAGAGGTTGAGAAGCAGTGTCAGTGCGAGGCTATCAAGCAAGTTGTT
GAGCAAGCTCAGAAGCAGCTTCAACAAGGACAAGGTGGACAACAGCAGGT
TCAGCAGATGGTGAAGAAAGCTCAGATGCTCCCTAACCAGTGCAACCTCC
AGTGCTCTATCTGATGAgagctc
4SFTI
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACGGTAGATGTACTAGATCAATTCCACCAATCTGTTTCCCAGA
TGGTTTGGATAATGGTAGGTGTACAAAGAGTATCCCACCAATTTGCTATCC
AGATGGTCTTGATAACGGAAGATGTACAAAAAGTATCCCTCCTGTGTGTTT
TCCTGATGGATTGGATAACTGATGAgagctc
3SFTI
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACGGTAGATGTACTAGATCAATTCCACCAATCTGTTTCCCAGA
TGGTTTGGATAATGGTAGGTGTACAAAGAGTATCCCACCAATTTGCTATCC
AGATGGTCTTGATAACTGATGAgagctc
142
2SFTI
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACGGTAGATGTACTAGATCAATTCCACCAATCTGTTTCCCAGA
TGGTTTGGATAATTGATGAgagctc
1SFTI
atcgatATGGCTAAGCTCATCATCCTCGTGGTGCTTGCTATCCTCGCTTTCGT
TGAGGTTTCAGTGTCTGGATACAAGACCTCTATCTCTACCATCACCATCGA
GGATAACGGAAGGTGTACCAAGTCTATCCCTCCTATCTGCTTCCCTGATGG
ACTCGATAACTGATGAgagctc
143
9.4 Chimeric protein precursors from Chapter 5
The above sequences are amino acid sequences of a synthetic PawS1, PawS1
derivatives
with SFTI-GLDN inserted after
alternative asparagine sites,
synthetic proteins that combine an abundant seed protein with SFTI-GLDN
inserted after the ER signal (or after start Met for Oleosin), a synthetic PawS1
with four tandem SFTI-GLDN variants and a suite of derivatives lacking PawS1
albumin and including from four down to one SFTI-GLDN (or variants). In this
last cases, each SFTI-1 variant sequence has one alternative residue that alters
the peptide mass and that does not interfere with peptide processing.
Sequences are colour coded to denote ER signal (rose), SFTI-1 (aqua), PawS1
albumin small subunit (green) and large subunit (orange) or other seed protein
(brown). Spacer regions or the GLDN tail are coloured black. Mutations to the
usual sequence of SFTI-1 are shown in red.
PawS1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDNPRGCQI
RIQQLNHCQMHLTSFDYKLRMAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQ
KQLQQGQGGQQQVQQMVKKAQMLPNQCNLQCSI
PawS[N65]S1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNPRGCQIRIQQLNGRCTKSIPPICF
PDGLDNHCQMHLTSFDYKLRMAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQ
KQLQQGQGGQQQVQQMVKKAQMLPNQCNLQCSI
PawS[N84]S1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNPRGCQIRIQQLNHCQMHLTSFDYK
LRMAVENGRCTKSIPPICFPDGLDNPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQ
KQLQQGQGGQQQVQQMVKKAQMLPNQCNLQCSI
PawS[N96]S1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNPRGCQIRIQQLNHCQMHLTSFDYK
LRMAVENPKQQQHLSLCCNGRCTKSIPPICFPDGLDNQLQEVEKQCQCEAIKQVVEQAQ
KQLQQGQGGQQQVQQMVKKAQMLPNQCNLQCSI
PawS[N143]S1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNPRGCQIRIQQLNHCQMHLTSFDYK
LRMAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQKQLQQGQGGQQQVQQMVK
KAQMLPNGRCTKSIPPICFPDGLDNQCNLQCSI
PawS[N146]S1
[151 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNPRGCQIRIQQLNHCQMHLTSFDYK
LRMAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQKQLQQGQGGQQQVQQMVK
KAQMLPNQCNGRCTKSIPPICFPDGLDNLQCSI
144
PawS1[Δ2-21]
[131 residues]
MGYKTSISTITIEDNGRCTKSIPPICFPDGLDNPRGCQIRIQQLNHCQMHLTSFDYKLR
MAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQKQLQQGQGGQQQVQQMVKKA
QMLPNQCNLQCSI
SesaS1
[182 residues]
MANKLFLVCAALALCFLLTNASIYRTVVEFEEDDATNGRCTKSIPPICFPDGLDNPIGP
KMRKCRKEFQKEQHLRACQQLMLQQARQGRSDEFDFEDDMENPQGQQQEQQLFQQCCNE
LRQEEPDCVCPTLKQAAKAVRLQGQHQPMQVRKIYQTAKHLPNVCDIPQVDVCPFNIPS
FPSFY
SesaS2
[166 residues]
MANKLFLVCAALALCFLLTNAGRCTKSIPPICFPDGLDNPIGPKMRKCRKEFQKEQHLR
ACQQLMLQQARQGRSDEFDFEDDMENPQGQQQEQQLFQQCCNELRQEEPDCVCPTLKQA
AKAVRLQGQHQPMQVRKIYQTAKHLPNVCDIPQVDVCPFNIPSFPSFY
CraS1
[490 residues]
MARVSSLLSFCLTLLILFHGYAAGRCTKSIPPICFPDGLDNQQGQQGQQFPNECQLDQL
NALEPSHVLKSEAGRIEVWDHHAPQLRCSGVSFARYIIESKGLYLPSFFNTAKLSFVAK
GRGLMGKVIPGCAETFQDSSEFQPRFEGQGQSQRFRDMHQKVEHIRSGDTIATTPGVAQ
WFYNDGQEPLVIVSVFDLASHQNQLDRNPRPFYLAGNNPQGQVWLQGREQQPQKNIFNG
FGPEVIAQALKIDLQTAQQLQNQDDNRGNIVRVQGPFGVIRPPLRGQRPQEEEEEEGRH
GRHGNGLEETICSARCTDNLDDPSRADVYKPQLGYISTLNSYDLPILRFIRLSALRGSI
RQNAMVLPQWNANANAILYVTDGEAQIQIVNDNGNRVFDGQVSQGQLIAVPQGFSVVKR
ATSNRFQWVEFKTNANAQINTLAGRTSVLRGLPLEVITNGFQISPEEARRVKFNTLETT
LTHSSGPASYGRPRVAAA
PapS1
[504 residues]
MAITNKLIITLLLLISIAVVHSGRCTKSIPPICFPDGLDNLSFRVEIDEFEPPQQGEQE
GPRRRPGGGSGEGWEEESTNHPYHFRKRSFSDWFQSKEGFVRVLPKFTKHAPALFRGIE
NYRFSLVEMEPTTFFVPHHLDADAVFIVLQGKGVIEFVTDKTKESFHITKGDVVRIPSG
VTNFITNTNQTVPLRLAQITVPVNNPGNYKDYFPAASQFQQSYFNGFTKEVLSTSFNVP
EELLGRLVTRSKEIGQGIIRRISPDQIKELAEHATSPSNKHKAKKEKEEDKDLRTLWTP
FNLFAIDPIYSNDFGHFHEAHPKNYNQLQDLHIAAAWANMTQGSLFLPHFNSKTTFVTF
VENGCARFEMATPYKFQRGQQQWPGQGQEEEEDMSENVHKVVSRVCKGEVFIVPAGHPF
TILSQDQDFIAVGFGIYATNSKRTFLAGEENLLSNLNPAATRVTFGVGSKVAEKLFTSQ
NYSYFAPTSRSQQQIPEKHKPSFQSILDFAGF
LeaS1
[316 residues]
MGLERKVYGLVMVSLVLMAIATMSSVQAGRCTKSIPPICFPDGLDNTIEEEAAKDESWT
DWAKEKIGLKHEDNIQPTHTTTTVQDDAWRASQKAEDAKEAAKRKAEEAVGAAKEKAGS
AYETAKSKVEEGLASVKDKASQSYDSAGQVKDDVSHKSKQVKDSLSGDENDESWTGWAK
EKIGIKNEDINSPNLGETVSEKAKEAKEAAKRKAGDAKEKLAETVETAKEKASDMTSAA
KEKAEKLKEEAERESKSAKEKIKESYETAKSKADETLESAKDKASQSYDSAARKSEEAK
DTVSHKSKRVKESLTDDDAEL
OleoS1
[192 residues]
MGRCTKSIPPICFPDGLDNPADTARGTHHDIIGRDQYPMMGRDRDQYQMSGRGSDYSKS
RQIAKAATAVTAGGSLLVLSSLTLVGTVIALTVATPLLVIFSPILVPALITVALLITGF
LSSGGFGIAAITVFSWIYKYATGEHPQGSDKLDSARMKLGSKAQDLKDRAQYYGQQHTG
GEHDRDRTRGGQHTT
145
Paw4SFTI
[205 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDNGRCTRS
IPPICFPDGLDNGRCTKSIPPICYPDGLDNGRCTKSIPPVCFPDGLDNPRGCQIRIQQL
NHCQMHLTSFDYKLRMAVENPKQQQHLSLCCNQLQEVEKQCQCEAIKQVVEQAQKQLQQ
GQGGQQQVQQMVKKAQMLPNQCNLQCSI
4SFTI
[107 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDNGRCTRS
IPPICFPDGLDNGRCTKSIPPICYPDGLDNGRCTKSIPPVCFPDGLDN
3SFTI
[89 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDNGRCTRS
IPPICFPDGLDNGRCTKSIPPICYPDGLDN
2SFTI
[71 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDNGRCTRS
IPPICFPDGLDN
1SFTI
[53 residues]
MAKLIILVVLAILAFVEVSVSGYKTSISTITIEDNGRCTKSIPPICFPDGLDN
146
References
Abdelsamad, A. and A. Pecinka (2014). "Pollen-specific activation of
Arabidopsis
retrogenes
is
associated
with
global
transcriptional
reprogramming." Plant Cell 26(8): 3299-3313.
Aboye, T. L., H. Ha, S. Majumder, F. Christ, Z. Debyser, A. Shekhtman, N.
Neamati and J. A. Camarero (2012). "Design of a novel cyclotide-based CXCR4
antagonist with anti-human immunodeficiency virus (HIV)-1 activity." Journal of
Medicinal Chemistry 55(23): 10729-10734.
Akiva, P., A. Toporik, S. Edelheit, Y. Peretz, A. Diber, R. Shemesh, A. Novik
and R. Sorek (2006). "Transcription-mediated gene fusion in the human
genome." Genome Research 16(1): 30-36.
Arguello, J. R., Y. Chen, S. Yang, W. Wang and M. Long (2006). "Origination of
an X-linked testes chimeric gene by illegitimate recombination in Drosophila."
PLoS Genetics 2(5): e77.
Arnison, P. G., M. J. Bibb, G. Bierbaum, A. A. Bowers, T. S. Bugni, G. Bulaj, J.
A. Camarero, D. J. Campopiano, G. L. Challis, J. Clardy, P. D. Cotter, D. J.
Craik, M. Dawson, E. Dittmann, S. Donadio, P. C. Dorrestein, K.-D. Entian, M.
A. Fischbach, J. S. Garavelli, U. Goransson, C. W. Gruber, D. H. Haft, T. K.
Hemscheidt, C. Hertweck, C. Hill, A. R. Horswill, M. Jaspars, W. L. Kelly, J. P.
Klinman, O. P. Kuipers, A. J. Link, W. Liu, M. A. Marahiel, D. A. Mitchell, G. N.
Moll, B. S. Moore, R. Muller, S. K. Nair, I. F. Nes, G. E. Norris, B. M. Olivera, H.
Onaka, M. L. Patchett, J. Piel, M. J. T. Reaney, S. Rebuffat, R. P. Ross, H.-G.
Sahl, E. W. Schmidt, M. E. Selsted, K. Severinov, B. Shen, K. Sivonen, L.
Smith, T. Stein, R. D. Sussmuth, J. R. Tagg, G.-L. Tang, A. W. Truman, J. C.
Vederas, C. T. Walsh, J. D. Walton, S. C. Wenzel, J. M. Willey and W. A. van
der Donk (2013). "Ribosomally synthesized and post-translationally modified
peptide natural products: overview and recommendations for a universal
nomenclature." Natural Product Reports 30(1): 108-160.
Autelitano, D. J., A. Rajic, A. I. Smith, M. C. Berndt, L. L. Ilag and M. Vadas
(2006). "The cryptome: a subset of the proteome, comprising cryptic peptides
with distinct bioactivities." Drug Discovery Today 11(7-8): 306-314.
Babasaki, K., T. Takao, Y. Shimonishi and K. Kurahashi (1985). "Subtilosin A, a
New Antibiotic Peptide Produced by Bacillus subtilis 168: Isolation, Structural
Analysis, and Biogenesis." Journal of Biochemistry 98(3): 585-603.
Babushok, D. V., K. Ohshima, E. M. Ostertag, X. Chen, Y. Wang, P. K. Mandal,
N. Okada, C. S. Abrams and H. H. Kazazian, Jr. (2007). "A novel testis
ubiquitin-binding protein gene arose by exon shuffling in hominoids." Genome
Research 17(8): 1129-1138.
Bai, Y., C. Casola, C. Feschotte and E. Betrán (2007). "Comparative genomics
reveals a constant rate of origination and convergent acquisition of functional
retrogenes in Drosophila." Genome Biology 8(1): R11-R11.
147
Bailey, J. A., G. Liu and E. E. Eichler (2003). "An Alu Transposition Model for
the Origin and Expansion of Human Segmental Duplications." American Journal
of Human Genetics 73(4): 823-834.
Bantscheff, M., S. Lemeer, M. M. Savitski and B. Kuster (2012). "Quantitative
mass spectrometry in proteomics: critical review update from 2007 to the
present." Analytical and Bioanalytical Chemistry 404(4): 939-965.
Barber, C. J. S., P. T. Pujara, D. W. Reed, S. Chiwocha, H. Zhang and P. S.
Covello (2013). "The two-step biosynthesis of cyclic peptides from linear
precursors in a member of the plant family Caryophyllaceae involves cyclization
by a serine protease-like enzyme." Journal of Biological Chemistry 288(18):
12500-12510.
Barbeta, B. L., A. T. Marshall, A. D. Gillon, D. J. Craik and M. A. Anderson
(2008). "Plant cyclotides disrupt epithelial cells in the midgut of lepidopteran
larvae." Proceedings of the National Academy of Sciences of the United States
of America 105(4): 1221-1225.
Barry, D. G., N. L. Daly, R. J. Clark, L. Sando and D. J. Craik (2003).
"Linearization of a naturally occurring circular protein maintains structure but
eliminates hemolytic activity." Biochemistry 42(22): 6688-6695.
Bechtold, N., J. Ellis and G. Pelletier (1993). "In planta Agrobacterium-mediated
gene transfer by infiltration of adult Arabidopsis thaliana plants." Comptes
rendus de l'Académie des sciences. Série III, Sciences de la vie 316: 11941199.
Becker, C., J. Hagmann, J. Muller, D. Koenig, O. Stegle, K. Borgwardt and D.
Weigel (2011). "Spontaneous epigenetic variation in the Arabidopsis thaliana
methylome." Nature 480(7376): 245-249.
Begun, D. J., H. A. Lindfors, A. D. Kern and C. D. Jones (2007). "Evidence for
de novo evolution of testis-expressed genes in the Drosophila
yakuba/Drosophila erecta clade." Genetics 176(2): 1131-1137.
Beiko, R. G., T. J. Harlow and M. A. Ragan (2005). "Highways of gene sharing
in prokaryotes." Proceedings of the National Academy of Sciences of the United
States of America 102(40): 14332-14337.
Benk, A. S. and C. Roesli (2012). "Label-free quantification using MALDI mass
spectrometry: considerations and perspectives." Analytical and Bioanalytical
Chemistry 404(4): 1039-1056.
Bernath-Levin, K., C. Nelson, Alysha G. Elliott, Achala S. Jayasena, A. H. Millar,
David J. Craik and Joshua S. Mylne (2015). "Peptide macrocyclization by a
bifunctional endoprotease." Chemistry & Biology 22(5): 571-582.
Betran, E. and M. Long (2003). "Dntf-2r, a young Drosophila retroposed gene
with specific male expression under positive Darwinian selection." Genetics
164(3): 977-988.
Black, D. L. (1998). "Splicing in the inner ear: a familiar tune, but what are the
instruments?" Neuron 20(2): 165-168.
Blackmer, T. M. and J. S. Schepers (1995). "Use of a chlorophyll meter to
monitor nitrogen status and schedule fertigation for corn." Journal of Production
Agriculture 8(1).
148
Blencowe, B. J. (2006). "Alternative splicing: new insights from global
analyses." Cell 126(1): 37-47.
Blond, A., J. Péduzzi, C. Goulard, M. J. Chiuchiolo, M. Barthélémy, Y. Prigent,
R. A. Salomón, R. N. Farías, F. Moreno and S. Rebuffat (1999). "The cyclic
structure of microcin J25, a 21-residue peptide antibiotic from Escherichia coli."
European Journal of Biochemistry 259(3): 747-756.
Bode, W. and R. Huber (1992). "Natural protein proteinase inhibitors and their
interaction with proteinases." European Journal of Biochemistry 204(2): 433451.
Bonen, L. (1993). "Trans-splicing of pre-mRNA in plants, animals, and protists."
FASEB Journal 7(1): 40-46.
Bornberg-Bauer, E., A. K. Huylmans and T. Sikosek (2010). "How do new
proteins arise?" Current Opinion in Structural Biology 20(3): 390-396.
Boy, R. G., W. Mier, E. M. Nothelfer, A. Altmann, M. Eisenhut, H. Kolmar, M.
Tomaszowski, S. Kramer and U. Haberkorn (2010). "Sunflower trypsin inhibitor
1 derivatives as molecular scaffolds for the development of novel peptidic
radiopharmaceuticals." Mol Imaging Biol 12(4): 377-385.
Boyko, A., T. Blevins, Y. Yao, A. Golubov, A. Bilichak, Y. Ilnytskyy, J. Hollander,
F. Meins, Jr. and I. Kovalchuk (2010). "Transgenerational adaptation of
Arabidopsis to stress requires DNA methylation and the function of Dicer-like
proteins." PLoS ONE 5(3): e9514.
Breeze, E., E. Harrison, S. McHattie, L. Hughes, R. Hickman, C. Hill, S. Kiddle,
Y. S. Kim, C. A. Penfold, D. Jenkins, C. Zhang, K. Morris, C. Jenner, S.
Jackson, B. Thomas, A. Tabrett, R. Legaie, J. D. Moore, D. L. Wild, S. Ott, D.
Rand, J. Beynon, K. Denby, A. Mead and V. Buchanan-Wollaston (2011).
"High-resolution temporal profiling of transcripts during Arabidopsis leaf
senescence reveals a distinct chronology of processes and regulation." Plant
Cell 23(3): 873-894.
Brennan, G., Y. Kozyrev and S. L. Hu (2008). "TRIMCyp expression in Old
World primates Macaca nemestrina and Macaca fascicularis." Proceedings of
the National Academy of Sciences of the United States of America 105(9):
3569-3574.
Brennicke, A., A. Marchfelder and S. Binder (1999). "RNA editing." FEMS
Microbiology Reviews 23(3): 297-316.
Brosius, J. (1991). "Retroposons: seeds of evolution." Science 251(4995): 753.
Buchanan-Wollaston, V., S. Earl, E. Harrison, E. Mathas, S. Navabpour, T.
Page and D. Pink (2003). "The molecular analysis of leaf senescence--a
genomics approach." Plant Biotechnology Journal 1(1): 3-22.
Cai, J., R. Zhao, H. Jiang and W. Wang (2008). "De novo origination of a new
protein-coding gene in Saccharomyces cerevisiae." Genetics 179(1): 487-496.
Carrington, J. C. and W. G. Dougherty (1988). "A viral cleavage site cassette:
identification of amino acid sequences required for tobacco etch virus
polyprotein processing." Proceedings of the National Academy of Sciences of
the United States of America 85(10): 3391-3395.
149
Carter, J. J., M. D. Daugherty, X. Qi, A. Bheda-Malge, G. C. Wipf, K. Robinson,
A. Roman, H. S. Malik and D. A. Galloway (2013). "Identification of an
overprinting gene in Merkel cell polyomavirus provides evolutionary insight into
the birth of viral genes." Proceedings of the National Academy of Sciences of
the United States of America 110(31): 12744-12749.
Carvunis, A.-R., T. Rolland, I. Wapinski, M. A. Calderwood, M. A. Yildirim, N.
Simonis, B. Charloteaux, C. A. Hidalgo, J. Barbette, B. Santhanam, G. A. Brar,
J. S. Weissman, A. Regev, N. Thierry-Mieg, M. E. Cusick and M. Vidal (2012).
"Proto-genes and de novo gene birth." Nature 487(7407): 370-374.
Chan, L. Y., S. Gunasekera, S. T. Henriques, N. F. Worth, S.-J. Le, R. J. Clark,
J. H. Campbell, D. J. Craik and N. L. Daly (2011). "Engineering pro-angiogenic
peptides using stable disulfide-rich cyclic scaffolds." Blood 118(25): 6709-6717.
Chance, R. E., R. M. Ellis and W. W. Bromer (1968). "Porcine proinsulin:
characterization and amino acid sequence." Science 161(3837): 165-167.
Charlesworth, D., F. L. Liu and L. Zhang (1998). "The evolution of the alcohol
dehydrogenase gene family by loss of introns in plants of the genus
Leavenworthia (Brassicaceae)." Molecular Biology and Evolution 15(5): 552559.
Chen, S., Y. E. Zhang and M. Long (2010). "New genes in Drosophila quickly
become essential." Science 330(6011): 1682-1685.
Colgrave, M. L. and D. J. Craik (2004). "Thermal, chemical, and enzymatic
stability of the cyclotide kalata B1: the importance of the cyclic cystine knot."
Biochemistry 43(20): 5965-5975.
Colgrave, M. L., A. C. Kotze, Y. H. Huang, J. O'Grady, S. M. Simonsen and D.
J. Craik (2008). "Cyclotides: natural, circular plant peptides that possess
significant activity against gastrointestinal nematode parasites of sheep."
Biochemistry 47(20): 5581-5589.
Colgrave, M. L., A. C. Kotze, S. Kopp, J. S. McCarthy, G. T. Coleman and D. J.
Craik (2009). "Anthelmintic activity of cyclotides: In vitro studies with canine and
human hookworms." Acta Trop 109(2): 163-166.
Colorado, P. C., A. Torre, G. Kamphaus, Y. Maeshima, H. Hopfer, K.
Takahashi, R. Volk, E. D. Zamborsky, S. Herman, P. K. Sarkar, M. B. Ericksen,
M. Dhanabal, M. Simons, M. Post, D. W. Kufe, R. R. Weichselbaum, V. P.
Sukhatme and R. Kalluri (2000). "Anti-angiogenic cues from vascular basement
membrane collagen." Cancer Research 60(9): 2520-2526.
Conant, G. C. and K. H. Wolfe (2008). "Turning a hobby into a job: how
duplicated genes find new functions." Nature Reviews Genetics 9(12): 938-950.
Condie, J. A., G. Nowak, D. W. Reed, J. John Balsevich, M. J. T. Reaney, P. G.
Arnison and P. S. Covello (2011). "The biosynthesis of Caryophyllaceae-like
cyclic peptides in Saponaria vaccaria L. from DNA-encoded precursors." The
Plant Journal 67: 682–690.
Cordaux, R., S. Udit, M. A. Batzer and C. Feschotte (2006). "Birth of a chimeric
primate gene by capture of the transposase gene from a mobile element."
Proceedings of the National Academy of Sciences of the United States of
America 103(21): 8101-8106.
150
Coudreuse, D. Y., G. Roel, M. C. Betist, O. Destree and H. C. Korswagen
(2006). "Wnt gradient formation requires retromer function in Wnt-producing
cells." Science 312(5775): 921-924.
Craik, D. J. (2001). "Plant cyclotides: circular, knotted peptide toxins." Toxicon
39(12): 1809-1813.
Craik, D. J., N. L. Daly, T. Bond and C. Waine (1999). "Plant cyclotides: A
unique family of cyclic and knotted proteins that defines the cyclic cystine knot
structural motif." J. Mol. Biol. 294(5): 1327-1336.
Craik, D. J., J. S. Mylne and N. L. Daly (2010). "Cyclotides: macrocyclic
peptides with applications in drug design and agriculture." Cellular and
Molecular Life Sciences 67(1): 9-16.
Crisp, A., C. Boschetti, M. Perry, A. Tunnacliffe and G. Micklem (2015).
"Expression of multiple horizontally acquired genes is a hallmark of both
vertebrate and invertebrate genomes." Genome Biol 16(50): 015-0607.
D'Hondt, K., J. Van Damme, C. Van Den Bossche, S. Leejeerajumnean, R. De
Rycke, J. Derksen, J. Vandekerckhove and E. Krebbers (1993). "Studies of the
role of the propeptides of the Arabidopsis thaliana 2S albumin." Plant
Physiology 102(2): 425-433.
Daly, N. L., Y.-K. Chen, F. M. Foley, P. S. Bansal, R. Bharathi, R. J. Clark, C. P.
Sommerhoff and D. J. Craik (2006). "The absolute structural requirement for a
proline in the P3'-position of Bowman-Birk protease inhibitors is surmounted in
the minimized SFTI-1 scaffold." Journal of Biological Chemistry 281(33): 2366823675.
Daly, N. L., R. J. Clark, M. R. Plan and D. J. Craik (2006). "Kalata B8, a novel
antiviral circular protein, exhibits conformational flexibility in the cystine knot
motif." Biochemical Journal 393(3): 619-626.
Daly, N. L., K. R. Gustafson and D. J. Craik (2004). "The role of the cyclic
peptide backbone in the anti-HIV activity of the cyclotide kalata B1." FEBS Lett.
574(1-3): 69-72.
Delaye, L., A. Deluna, A. Lazcano and A. Becerra (2008). "The origin of a novel
gene through overprinting in Escherichia coli." BMC Evolutionary Biology 8: 31.
Dias, G. C., V. Gressler, S. C. Hoenzel, U. F. Silva, Dalcol, II and A. F. Morel
(2007). "Constituents of the roots of Melochia chamaedrys." Phytochemistry
68(5): 668-672.
Domazet-Loso, T. and D. Tautz (2003). "An evolutionary analysis of orphan
genes in Drosophila." Genome Research 13(10): 2213-2219.
Douglass, J., O. Civelli and E. Herbert (1984). "Polyprotein gene expression:
generation of diversity of neuroendocrine peptides." Annual Review of
Biochemistry 53: 665-715.
Drouin, G. and G. A. Dover (1990). "Independent gene evolution in the potato
actin gene family demonstrated by phylogenetic procedures for resolving gene
conversions and the phylogeny of angiosperm actin genes." Journal of
Molecular Evolution 31(2): 132-150.
Duncan, M. W., H. Roder and S. W. Hunsucker (2008). "Quantitative matrixassisted laser desorption/ionization mass spectrometry." Brief Funct Genomic
Proteomic 7(5): 355-370.
151
Dunning Hotopp, J. C., M. E. Clark, D. C. Oliveira, J. M. Foster, P. Fischer, M.
C. Munoz Torres, J. D. Giebel, N. Kumar, N. Ishmael, S. Wang, J. Ingram, R. V.
Nene, J. Shepard, J. Tomkins, S. Richards, D. J. Spiro, E. Ghedin, B. E. Slatko,
H. Tettelin and J. H. Werren (2007). "Widespread lateral gene transfer from
intracellular bacteria to multicellular eukaryotes." Science 317(5845): 17531756.
Dutton, J. L., R. F. Renda, C. Waine, R. J. Clark, N. L. Daly, C. V. Jennings, M.
A. Anderson and D. J. Craik (2004). "Conserved structural and sequence
elements implicated in the processing of gene-encoded circular proteins." J.
Biol. Chem. 279(45): 46858-46867.
Edwards, K., C. Johnstone and C. Thompson (1991). "A simple and rapid
method for the preparation of plant genomic DNA for PCR analysis." Nucleic
Acids Research 19(6): 1349.
Eliasen, R., N. L. Daly, B. S. Wulff, T. L. Andresen, K. W. Conde-Frieboes and
D. J. Craik (2012). "Design, synthesis, structural and functional characterization
of novel melanocortin agonists based on the cyclotide kalata B1." Journal of
Biological Chemistry 287(48): 40493-40501.
Elliott, A. G., C. Delay, H. Liu, Z. Phua, K. J. Rosengren, A. H. Benfield, J. L.
Panero, M. L. Colgrave, A. S. Jayasena, K. M. Dunse, M. A. Anderson, E. E.
Schilling, D. Ortiz-Barrientos, D. J. Craik and J. S. Mylne (2014). "Evolutionary
origins of a bioactive peptide buried within preproalbumin." Plant Cell 26(3):
981-995.
Ericson, M. L., J. Rödin, M. Lenman, K. Glimelius, L. G. Josefsson and L. Rask
(1986). "Structure of the rapeseed 1.7 S storage protein, napin, and its
precursor." Journal of Biological Chemistry 261(31): 14576-14581.
Fenn, J., M. Mann, C. Meng, S. Wong and C. Whitehouse (1989). "Electrospray
ionization for mass spectrometry of large biomolecules." Science 246(4926): 6471.
Fenyö, D. and R. C. Beavis (2008). "Informatics development: challenges and
solutions for MALDI mass spectrometry." Mass Spectrom Rev 27(1): 1-19.
Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan and J. Postlethwait
(1999). "Preservation of duplicate genes by complementary, degenerative
mutations." Genetics 151(4): 1531-1545.
Freire, J. E. C., I. M. Vasconcelos, F. B. M. B. Moreno, A. B. Batista, M. D. P.
Lobo, M. L. Pereira, J. P. M. S. Lima, R. V. M. Almeida, A. J. S. Sousa, A. C. O.
Monteiro-Moreira, J. T. A. Oliveira and T. B. Grangeiro (2015). "Mo-CBP3, an
Antifungal chitin-binding protein from Moringa oleifera seeds, is a member of
the 2S albumin family." PLoS ONE 10(3): e0119871.
Fujiwara, T., E. Nambara, K. Yamagishi, D. B. Goto and S. Naito (2002).
"Storage proteins." Arabidopsis Book 1(10): 30.
Galindo, M. I., J. I. Pueyo, S. Fouix, S. A. Bishop and J. P. Couso (2007).
"Peptides encoded by short ORFs control development and define a new
eukaryotic gene family." PLoS Biology 5(5).
Gan, S. and R. M. Amasino (1997). "Making sense of senescence (molecular
genetic regulation and manipulation of leaf senescence)." Plant Physiology
113(2): 313-319.
152
Ganesh, V., A. Tulinsky, A. Y. Lee and J. Clardy (1996). "Comparison of the
structures of the cyclotheonamide a complexes of human α-thrombin and
bovine β-trypsin." Protein Science 5(5): 825-835.
Gelman, J. S. and L. D. Fricker (2010). "Hemopressin and other bioactive
peptides from cytosolic proteins: are these non-classical neuropeptides?" Aaps
J 12(3): 279-289.
George, A., H. Leahy, J. Zhou and P. J. Morin (2007). "The vacuolar-ATPase
inhibitor bafilomycin and mutant VPS35 inhibit canonical Wnt signaling."
Neurobiology of Disease 26(1): 125-133.
Gessert, S. F., J. H. Kim, F. E. Nargang and R. L. Weiss (1994). "A polyprotein
precursor of two mitochondrial enzymes in Neurospora crassa. Gene structure
and precursor processing." Journal of Biological Chemistry 269(11): 8189-8203.
Getz, J. A., O. Cheneval, D. J. Craik and P. S. Daugherty (2013). "Design of a
cyclotide antagonist of neuropilin-1 and -2 that potently inhibits endothelial cell
migration." ACS Chem Biol 8(6): 1147-1154.
Gilbert, W. (1978). "Why genes in pieces?" Nature 271(5645): 501.
Gladyshev, E. A., M. Meselson and I. R. Arkhipova (2008). "Massive horizontal
gene transfer in bdelloid rotifers." Science 320(5880): 1210-1213.
Göransson, U., M. Sjogren, E. Svangard, P. Claeson and L. Bohlin (2004).
"Reversible Antifouling Effect of the Cyclotide Cycloviolacin O2 against
Barnacles." J Nat Prod 67(8): 1287-1290.
Gran, L. (1970). "An oxytocic principle found in Oldenlandia affinis DC." Medd.
Nor. Farm. Selsk. 12: 173-180.
Gran, L. (1973). "Oxytocic principles of Oldenlandia affinis." Lloydia 36: 174178.
Gressent, F., P. Da Silva, V. Eyraud, L. Karaki and C. Royer (2011). "Pea
Albumin 1 subunit b (PA1b), a promising bioinsecticide of plant origin." Toxins
3(12): 1502-1517.
Gruber, C. W., A. G. Elliott, D. C. Ireland, P. G. Delprete, S. Dessein, U.
Göransson, M. Trabi, C. K. Wang, A. B. Kinghorn, E. Robbrecht and D. J. Craik
(2008). "Distribution and evolution of circular miniproteins in flowering plants."
Plant Cell 20(9): 2471-2483.
Gruis, D., J. Schulze and R. Jung (2004). "Storage protein accumulation in the
absence of the vacuolar processing enzyme family of cysteine proteases." Plant
Cell 16(1): 270-290.
Gruis, D. F., D. A. Selinger, J. M. Curran and R. Jung (2002). "Redundant
proteolytic mechanisms process seed storage proteins in the absence of seedtype members of the vacuolar processing enzyme family of cysteine proteases."
Plant Cell 14(11): 2863-2882.
Gunasekera, S., F. M. Foley, R. J. Clark, L. Sando, L. J. Fabri, D. J. Craik and
N. L. Daly (2008). "Engineering stabilized vascular endothelial growth factor-A
antagonists: synthesis, structural characterization, and bioactivity of grafted
analogues of cyclotides." J Med Chem 51(24): 7697-7704.
Guo, F. Q. and N. M. Crawford (2005). "Arabidopsis nitric oxide synthase1 is
targeted to mitochondria and protects against oxidative damage and darkinduced senescence." Plant Cell 17(12): 3436-3450.
153
Gustafson, K. R., R. C. I. Sowder, L. E. Henderson, I. C. Parsons, Y. Kashman,
J. H. I. Cardellina, J. B. McMahon, R. W. J. Buckheit, L. K. Pannell and M. R.
Boyd (1994). "Circulins A and B: Novel HIV-inhibitory macrocyclic peptides from
the tropical tree Chassalia parvifolia." J. Am. Chem. Soc. 116: 9337-9338.
Hakes, L., J. W. Pinney, S. C. Lovell, S. G. Oliver and D. L. Robertson (2007).
"All duplicates are not equal: the difference between small-scale and genome
duplication." Genome Biology 8(10): R209-R209.
Hara-Nishimura, I., K. Inoue and M. Nishimura (1991). "A unique vacuolar
processing enzyme responsible for conversion of several proprotein precursors
into the mature forms." FEBS Letters 294(1-2): 89-93.
Hara-Nishimura, I. and M. Nishimura (1987). "Proglobulin processing enzyme in
vacuoles isolated from developing pumpkin cotyledons." Plant Physiology 85(2):
440-445.
Hara-Nishimura, I., Y. Takeuchi, K. Inoue and M. Nishimura (1993). "Vesicle
transport and processing of the precursor to 2S albumin in pumpkin." Plant J. 4:
793-800.
Hara-Nishimura, I., Y. Takeuchi and M. Nishimura (1993). "Molecular
characterization of a vacuolar processing enzyme related to a putative cysteine
proteinase of Schistosoma mansoni." Plant Cell 5(11): 1651-1659.
Hatsugai, N., K. Yamada, S. Goto-Yamada, I. Hara-Nishimura (2015). "Vacuolar
processing enzyme in plant programmed cell death." Frontier in Plant Science
6(234).
Heinen, T. J. A. J., F. Staubach, D. Häming and D. Tautz (2009). "Emergence
of a new gene from an intergenic region." Current Biology 19(18): 1527-1531.
Heitz, A., J. F. Hernandez, J. Gagnon, T. T. Hong, T. T. Pham, T. M. Nguyen,
D. Le-Nguyen and L. Chiche (2001). "Solution structure of the squash trypsin
inhibitor MCoTI-II. A new family for cyclic knottins." Biochemistry 40(27): 79737983.
Herman, E. and B. Larkins (1999). "Protein storage bodies and vacuoles." The
Plant Cell 11(4): 601-614.
Hickman, R., C. Hill, C. A. Penfold, E. Breeze, L. Bowden, J. D. Moore, P.
Zhang, A. Jackson, E. Cooke, F. Bewicke-Copley, A. Mead, J. Beynon, D. L.
Wild, K. J. Denby, S. Ott and V. Buchanan-Wollaston (2013). "A local regulatory
network around three NAC transcription factors in stress responses and
senescence in Arabidopsis leaves." Plant Journal 75(1): 26-39.
Higgins, T. J., P. M. Chandler, P. J. Randall, D. Spencer, L. R. Beach, R. J.
Blagrove, A. A. Kortt and A. S. Inglis (1986). "Gene structure, protein structure,
and regulation of the synthesis of a sulfur-rich protein in pea seeds." Journal of
Biological Chemistry 261(24): 11124-11130.
Hillmer, S., A. Movafeghi, D. G. Robinson and G. Hinz (2001). "Vacuolar
storage proteins are sorted in the cis-cisternae of the pea cotyledon golgi
apparatus." The Journal of Cell Biology 152(1): 41-50.
Hiraiwa, N., M. Kondo, M. Nishimura and I. Hara-Nishimura (1997). "An aspartic
endopeptidase is involved in the breakdown of propeptides of storage proteins
in protein-storage vacuoles of plants." European Journal of Biochemistry 246(1):
133-141.
154
Hiraiwa, N., M. Nishimura and I. Hara-Nishimura (1999). "Vacuolar processing
enzyme is self-catalytically activated by sequential removal of the C-terminal
and N-terminal propeptides." FEBS Letters 447(2-3): 213-216.
Hoffmann, W., T. C. Bach, H. Seliger and G. Kreil (1983). "Biosynthesis of
caerulein in the skin of Xenopus laevis: partial sequences of precursors as
deduced from cDNA clones." Embo J 2(1): 111-114.
Horiuchi, T., E. Giniger and T. Aigaki (2003). "Alternative trans-splicing of
constant and variable exons of a Drosophila axon guidance gene, lola." Genes
& Development 17(20): 2496-2501.
Hsiao, K. C., C. H. Cheng, I. E. Fernandes, H. W. Detrich and A. L. DeVries
(1990). "An antifreeze glycopeptide gene from the antarctic cod Notothenia
coriiceps neglecta encodes a polyprotein of high peptide copy number."
Proceedings of the National Academy of Sciences of the United States of
America 87(23): 9265-9269.
Hsieh, K. and A. H. C. Huang (2004). "Endoplasmic reticulum, oleosins, and oils
in seeds and tapetum cells." Plant physiology 136(3): 3427-3434.
Huang, Q., S. Liu and Y. Tang (1993). "Refined 1.6 A resolution crystal
structure of the complex formed between porcine beta-trypsin and MCTI-A, a
trypsin inhibitor of the squash family. Detailed comparison with bovine betatrypsin and its complex." J. Mol. Biol. 229(4): 1022-1036.
Jacob, F. (1977). "Evolution and tinkering." Science 196(4295): 1161-1166.
Jayasena, A. S., B. Franke, K. J. Rosengren and J. S. Mylne (2016). "A tripartite
‘omic analysis identifies the major sunflower seed albumins." Theoretical and
Applied Genetics 129: 613-629.
Jennings, C., J. West, C. Waine, D. Craik and M. Anderson (2001).
"Biosynthesis and insecticidal properties of plant cyclotides: the cyclic knotted
proteins from Oldenlandia affinis." Proceedings of the National Academy of
Sciences of the United States of America 98(19): 10614-10619.
Jennings, C. V., K. J. Rosengren, N. L. Daly, M.
Scanlon, C. Waine, D. G. Norman, M. A. Anderson
"Isolation, solution structure, and insecticidal activity
protein with a twist: do Mobius strips exist in nature?"
860.
Plan, J. Stevens, M. J.
and D. J. Craik (2005).
of kalata B2, a circular
Biochemistry 44(3): 851-
Ji, Y., S. Majumder, M. Millard, R. Borra, T. Bi, A. Y. Elnagar, N. Neamati, A.
Shekhtman and J. A. Camarero (2013). "In vivo activation of the p53 tumor
suppressor pathway by an engineered cyclotide." Journal of the American
Chemical Society 135(31): 11623-11633.
Jiao, Y., N. J. Wickett, S. Ayyampalayam, A. S. Chanderbali, L. Landherr, P. E.
Ralph, L. P. Tomsho, Y. Hu, H. Liang, P. S. Soltis, D. E. Soltis, S. W. Clifton, S.
E. Schlarbaum, S. C. Schuster, H. Ma, J. Leebens-Mack and C. W. dePamphilis
(2011). "Ancestral polyploidy in seed plants and angiosperms." Nature
473(7345): 97-100.
Johansson, B. L., A. Kernell, S. Sjoberg and J. Wahren (1993). "Influence of
combined C-peptide and insulin administration on renal function and metabolic
control in diabetes type 1." Journal of Clinical Endocrinology & Metabolism
77(4): 976-981.
155
Jouvensal, L., L. Quillien, E. Ferrasson, Y. Rahbe, J. Gueguen and F. Vovelle
(2003). "PA1b, an insecticidal protein extracted from pea seeds (Pisum
sativum): 1H-2-D NMR study and molecular modeling." Biochemistry 42(41):
11915-11923.
Jung, R., M. P. Scott, Y.-W. Nam, T. W. Beaman, R. Bassüner, I. Saalbach, K.
Müntz and N. C. Nielsen (1998). "The Role of Proteolysis in the Processing and
Assembly of 11S Seed Globulins." The Plant Cell Online 10(3): 343-357.
Kaessmann, H. (2010). "Origins, evolution, and phenotypic impact of new
genes." Genome Research 20(10): 1313-1326.
Kaessmann, H., N. Vinckenbosch and M. Long (2009). "RNA-based gene
duplication: mechanistic and evolutionary insights." Nature reviews. Genetics
10(1): 19-31.
Kamphaus, G. D., P. C. Colorado, D. J. Panka, H. Hopfer, R. Ramchandran, A.
Torre, Y. Maeshima, J. W. Mier, V. P. Sukhatme and R. Kalluri (2000).
"Canstatin, a novel matrix-derived inhibitor of angiogenesis and tumor growth."
Journal of Biological Chemistry 275(2): 1209-1215.
Karas, M. and F. Hillenkamp (1988). "Laser desorption ionization of proteins
with molecular masses exceeding 10,000 daltons." Analytical Chemistry 60(20):
2299-2301.
Kathiria, P., C. Sidler, A. Golubov, M. Kalischuk, L. M. Kawchuk and I.
Kovalchuk (2010). "Tobacco mosaic virus infection results in an increase in
recombination frequency and resistance to viral, bacterial, and fungal
pathogens in the progeny of infected tobacco plants." Plant Physiology 153(4):
1859-1870.
Kawakami, T., A. Ohta, M. Ohuchi, H. Ashigai, H. Murakami and H. Suga
(2009). "Diverse backbone-cyclized peptides via codon reprogramming." Nature
Chemical Biology 5(12): 888-890.
Kazachkov Yu, A., V. Svitkin Yu and V. I. Agol (1983). "The position of
polypeptide G on the encephalomyocarditis virus polyprotein cleavage map."
FEBS Letters 154(1): 161-165.
Keech, O., E. Pesquet, L. Gutierrez, A. Ahad, C. Bellini, S. M. Smith and P.
Gardestrom (2010). "Leaf senescence is accompanied by an early disruption of
the microtubule network in Arabidopsis." Plant Physiology 154(4): 1710-1720.
Keeling, P. J. and J. D. Palmer (2008). "Horizontal gene transfer in eukaryotic
evolution." Nature Reviews Genetics 9(8): 605-618.
Keskitalo, J., G. Bergquist, P. Gardestrom and S. Jansson (2005). "A cellular
timetable of autumn senescence." Plant Physiology 139(4): 1635-1648.
Khalturin, K., G. Hemmrich, S. Fraune, R. Augustin and T. C. Bosch (2009).
"More than just orphans: are taxonomically-restricted genes important in
evolution?" Trends in Genetics 25(9): 404-413.
156
Kidd, J. M., G. M. Cooper, W. F. Donahue, H. S. Hayden, N. Sampas, T.
Graves, N. Hansen, B. Teague, C. Alkan, F. Antonacci, E. Haugen, T. Zerr, N.
A. Yamada, P. Tsang, T. L. Newman, E. Tuzun, Z. Cheng, H. M. Ebling, N.
Tusneem, R. David, W. Gillett, K. A. Phelps, M. Weaver, D. Saranga, A. Brand,
W. Tao, E. Gustafson, K. McKernan, L. Chen, M. Malig, J. D. Smith, J. M. Korn,
S. A. McCarroll, D. A. Altshuler, D. A. Peiffer, M. Dorschner, J.
Stamatoyannopoulos, D. Schwartz, D. A. Nickerson, J. C. Mullikin, R. K. Wilson,
L. Bruhn, M. V. Olson, R. Kaul, D. R. Smith and E. E. Eichler (2008). "Mapping
and sequencing of structural variation from eight human genomes." Nature
453(7191): 56-64.
Kinoshita, T., M. Nishimura and I. Hara-Nishimura (1995). "Homologues of a
vacuolar processing enzyme that are expressed in different organs in
Arabidopsis thaliana." Plant Mol Biol 29(1): 81-89.
Kinoshita, T., K. Yamada, N. Hiraiwa, M. Kondo, M. Nishimura and I. HaraNishimura (1999). "Vacuolar processing enzyme is up-regulated in the lytic
vacuoles of vegetative tissues during senescence and under various stressed
conditions." Plant J 19(1): 43-53.
Klemke, M., R. H. Kehlenbach and W. B. Huttner (2001). "Two overlapping
reading frames in a single exon encode interacting proteins--a novel way of
gene usage." EMBO J 20(14): 3849-3860.
Konarev, A. V., I. N. Anisimova, V. A. Gavrilova, T. E. Vachrusheva, G. Y.
Konechnaya, M. Lewis and P. R. Shewry (2002). "Serine proteinase inhibitors in
the Compositae: distribution, polymorphism and properties." Phytochemistry
59(3): 279-291.
Konarev, A. V., J. Griffin, G. Y. Konechnaya and P. R. Shewry (2004). "The
distribution of serine proteinase inhibitors in seeds of the Asteridae."
Phytochemistry 65(22): 3003-3020.
Korant, B. D. (1990). "Strategies to inhibit viral polyprotein cleavages." Annals
of the New York Academy of Sciences 616: 252-257.
Korsinczky, M. L., H. J. Schirra, K. J. Rosengren, J. West, B. A. Condie, L.
Otvos, M. A. Anderson and D. J. Craik (2001). "Solution structures by 1H NMR
of the novel cyclic trypsin inhibitor SFTI-1 from sunflower seeds and an acyclic
permutant." Journal of Molecular Biology 311(3): 579-591.
Kraut, J. (1977). "Serine Proteases: Structure and Mechanism of Catalysis."
Annual Review of Biochemistry 46(1): 331-358.
Krebbers, E., L. Herdies, A. De Clercq, J. Seurinck, J. Leemans, J. Van
Damme, M. Segura, G. Gheysen, M. Van Montagu and J. Vandekerckhove
(1988). "Determination of the processing sites of an Arabidopsis 2S albumin
and characterization of the complete gene family." Plant Physiology 87(4): 859866.
Kuntal, B.K., P. Aparoy, P. Reddanna (2010) "EasyModeller: A graphical
interface to MODELLER." BioMed Central Research Notes 3(226).
Kuroyanagi, M., M. Nishimura and I. Hara-Nishimura (2002). "Activation of
Arabidopsis Vacuolar Processing Enzyme by Self-Catalytic Removal of an
Auto-Inhibitory Domain of the C-Terminal Propeptide." Plant Cell Physiol. 43(2):
143-151.
157
Kuroyanagi, M., K. Yamada, N. Hatsugai, M. Kondo, M. Nishimura and I. HaraNishimura (2005). "Vacuolar processing enzyme is essential for mycotoxininduced cell death in Arabidopsis thaliana." Journal of Biological Chemistry
280(38): 32914-32920.
Land, H., M. Grez, S. Ruppert, H. Schmale, M. Rehbein, D. Richter and G.
Schutz (1983). "Deduced amino acid sequence from the bovine oxytocinneurophysin I precursor cDNA." Nature 302(5906): 342-344.
Land, H., G. Schutz, H. Schmale and D. Richter (1982). "Nucleotide sequence
of cloned cDNA encoding bovine arginine vasopressin-neurophysin II
precursor." Nature 295(5847): 299-303.
Laskowski, M., Jr. and I. Kato (1980). "Protein inhibitors of proteinases." Annu
Rev Biochem 49: 593-626.
Laskowski, R.A., M.W. MacArthur, D.S. Moss, J.M. Thornton (1993).
"PROCHECK - a program to check the stereochemical quality of protein
structures." Journal of Applied Crystallography 26:283-291.
Legowska, A., D. Debowski, A. Lesner, M. Wysocka and K. Rolka (2009).
"Introduction of non-natural amino acid residues into the substrate-specific P1
position of trypsin inhibitor SFTI-1 yields potent chymotrypsin and cathepsin G
inhibitors." Bioorganic & Medicinal Chemistry 17(9): 3302-3307.
Lehmann, K., K. Schweimer, G. Reese, S. Randow, M. Suhr, W.-M. Becker, S.
Vieths and P. Rösch (2006). "Structure and stability of 2S albumin-type peanut
allergens: implications for the severity of peanut allergic reactions." Biochemical
Journal 395(3): 463-472.
Leopold, A. C. (1961). "Senescence in plant development: the death of plants or
plant parts may be of positive ecological or physiological value." Science
134(3492): 1727-1732.
Li, C. Y., Y. Zhang, Z. Wang, C. Cao, P. W. Zhang, S. J. Lu, X. M. Li, Q. Yu, X.
Zheng, Q. Du, G. R. Uhl, Q. R. Liu and L. Wei (2010). "A human-specific de
novo protein-coding gene associated with human brain functions." PLoS
Computational Biology 6(3): e1000734.
Li, D., Y. Dong, Y. Jiang, H. Jiang, J. Cai and W. Wang (2010). "A de novo
originated gene depresses budding yeast mating pathway and is repressed by
the protein encoded by its antisense strand." Cell Research 20(4): 408-420.
Li, F., X.-x. Yang, H.-c. Xia, R. Zeng, W.-g. Hu, Z. Li and Z.-c. Zhang (2003).
"Purification and characterization of Luffin P1, a ribosome-inactivating peptide
from the seeds of Luffa cylindrica." Peptides 24(6): 799-805.
Li, L., T. Shimada, H. Takahashi, H. Ueda, Y. Fukao, M. Kondo, M. Nishimura
and I. Hara-Nishimura (2006). "MAIGO2 is involved in exit of seed storage
proteins from the endoplasmic reticulum in Arabidopsis thaliana." Plant Cell
18(12): 3535-3547.
Li, X., L. Zhao, H. Jiang and W. Wang (2009). "Short homologous sequences
are strongly associated with the generation of chimeric RNAs in eukaryotes."
Journal of Molecular Evolution 68(1): 56-65.
Light, S., W. Basile and A. Elofsson (2014). "Orphans and new gene origination,
a structural and evolutionary perspective." Current Opinion in Structural Biology
26: 73-83.
158
Lin, C. Y., J. Anders, M. Johnson, Q. A. Sang and R. B. Dickson (1999).
"Molecular cloning of cDNA for matriptase, a matrix-degrading serine protease
with trypsin-like activity." J Biol Chem 274(26): 18231-18236.
Lindholm, P., U. Göransson, S. Johansson, P. Claeson, J. Gulbo, R. Larsson, L.
Bohlin and A. Backlund (2002). "Cyclotides: A novel type of cytotoxic agents."
Mol. Cancer Ther. 1: 365-369.
Linnestad, C., D. N. P. Doan, R. C. Brown, B. E. Lemmon, D. J. Meyer, R. Jung
and O.-A. Olsen (1998). "Nucellain, a Barley Homolog of the Dicot VacuolarProcessing Protease, Is Localized in Nucellar Cell Walls." Plant Physiology
118(4): 1169-1180.
Liu, S. L., Y. Zhuang, P. Zhang and K. L. Adams (2009). "Comparative analysis
of structural diversity and sequence evolution in plant mitochondrial genes
transferred to the nucleus." Molecular Biology and Evolution 26(4): 875-891.
Long, M., E. Betrán, K. Thornton and W. Wang (2003). "The origin of new
genes: glimpses from the young and old." Nat Rev Genet 4(11): 865-875.
Long, M. and C. H. Langley (1993). "Natural selection and the origin of jingwei,
a chimeric processed functional gene in Drosophila." Science 260(5104): 91-95.
Long, M., N. W. VanKuren, S. Chen and M. D. Vibranovski (2013). "New gene
evolution: little did we know." Annual Review of Genetics 47: 307-333.
Long, Y. Q., S. L. Lee, C. Y. Lin, I. J. Enyedy, S. Wang, P. Li, R. B. Dickson and
P. P. Roller (2001). "Synthesis and evaluation of the sunflower derived trypsin
inhibitor as a potent inhibitor of the type II transmembrane serine protease,
matriptase." Bioorganic & Medicinal Chemistry Letters 11(18): 2515-2519.
Luckett, S., R. S. Garcia, J. J. Barker, A. V. Konarev, P. R. Shewry, A. R. Clarke
and R. L. Brady (1999). "High-resolution structure of a potent, cyclic proteinase
inhibitor from sunflower seeds." Journal of Molecular Biology 290(2): 525-533.
Luco, R. F., Q. Pan, K. Tominaga, B. J. Blencowe, O. M. Pereira-Smith and T.
Misteli (2010). "Regulation of alternative splicing by histone modifications."
Science 327(5968): 996-1000.
Luppi, P., V. Cifarelli, H. Tse, J. Piganelli and M. Trucco (2008). "Human Cpeptide antagonises high glucose-induced endothelial dysfunction through the
nuclear factor-kappaB pathway." Diabetologia 51(8): 1534-1543.
MacLean, B., D. M. Tomazela, N. Shulman, M. Chambers, G. L. Finney, B.
Frewen, R. Kern, D. L. Tabb, D. C. Liebler and M. J. MacCoss (2010). "Skyline:
an open source document editor for creating and analyzing targeted proteomics
experiments." Bioinformatics 26(7): 966-968.
Maere, S., S. De Bodt, J. Raes, T. Casneuf, M. Van Montagu, M. Kuiper and Y.
Van de Peer (2005). "Modeling gene and genome duplications in eukaryotes."
Proceedings of the National Academy of Sciences of the United States of
America 102(15): 5454-5459.
Maeshima, Y., A. Sudhakar, J. C. Lively, K. Ueki, S. Kharbanda, C. R. Kahn, N.
Sonenberg, R. O. Hynes and R. Kalluri (2002). "Tumstatin, an endothelial cellspecific inhibitor of protein synthesis." Science 295(5552): 140-143.
Mains, R. E., B. A. Eipper and N. Ling (1977). "Common precursor to
corticotropins and endorphins." Proceedings of the National Academy of
Sciences of the United States of America 74(7): 3014-3018.
159
Mann, M. and O. N. Jensen (2003). "Proteomic analysis of post-translational
modifications." Nat Biotech 21(3): 255-261.
Marcus, J. P., J. L. Green, K. C. Goulter and J. M. Manners (1999). "A family of
antimicrobial peptides is produced by processing of a 7S globulin protein in
Macadamia integrifolia kernels." Plant Journal 19(6): 699-710.
Maria-Neto, S., R. V. Honorato, F. T. Costa, R. G. Almeida, D. S. Amaro, J. T.
A. Oliveira, I. M. Vasconcelos and O. L. Franco (2011). "Bactericidal activity
identified in 2S albumin from sesame seeds and in silico studies of structure–
function relations." The Protein Journal 30(5): 340-350.
Markwell, J., J. C. Osterman and J. L. Mitchell (1995). "Calibration of the
Minolta SPAD-502 leaf chlorophyll meter." Photosynthesis Research 46(3): 467472.
Marques, A. C., I. Dupanloup, N. Vinckenbosch, A. Reymond and H.
Kaessmann (2005). "Emergence of young human genes after a burst of
retroposition in primates." PLoS Biology 3(11): e357.
Marx, U. C., M. L. Korsinczky, H. J. Schirra, A. Jones, B. Condie, L. Otvos, Jr.
and D. J. Craik (2003). "Enzymatic cyclization of a potent Bowman-Birk
protease inhibitor, sunflower trypsin inhibitor-1, and solution structure of an
acyclic precursor peptide." J. Biol. Chem. 278(24): 21782-21789.
McCarrey, J. R. and K. Thomas (1987). "Human testis-specific PGK gene lacks
introns and possesses characteristics of a processed gene." Nature 326(6112):
501-505.
McManus, C. J., M. O. Duff, J. Eipper-Mains and B. R. Graveley (2010). "Global
analysis of trans-splicing in Drosophila." Proceedings of the National Academy
of Sciences of the United States of America 107(29): 12975-12979.
Menéndez-Arias, L., I. Moneo, J. DomÍNguez and R. RodrÍGuez (1988).
"Primary structure of the major allergen of yellow mustard (Sinapis alba L.)
seed, Sin a I." European Journal of Biochemistry 177(1): 159-166.
Mighell, A. J., N. R. Smith, P. A. Robinson and A. F. Markham (2000).
"Vertebrate pseudogenes." FEBS Letters 468(2-3): 109-114.
Moran, J. V., R. J. DeBerardinis and H. H. Kazazian, Jr. (1999). "Exon shuffling
by L1 retrotransposition." Science 283(5407): 1530-1534.
Müntz, K. (1998). "Deposition of storage proteins." Plant Molecular Biology
38(1): 77-99.
Mylne, J. S. and J. R. Botella (1998). "Binary vectors for sense and antisense
expression of Arabidopsis ESTs." Plant Molecular Biology Reporter 16(3): 257262.
Mylne, J. S., L. Y. Chan, A. H. Chanson, N. L. Daly, H. Schaefer, T. L. Bailey, P.
Nguyencong, L. Cascales and D. J. Craik (2012). "Cyclic peptides arising by
evolutionary parallelism via asparaginyl-endopeptidase-mediated biosynthesis."
Plant Cell 24: 2765-2778.
Mylne, J. S., M. L. Colgrave, N. L. Daly, A. H. Chanson, A. G. Elliott, E. J.
McCallum, A. Jones and D. J. Craik (2011). "Albumins and their processing
machinery are hijacked for cyclic peptides in sunflower." Nature Chemical
Biology 7(5): 257-259.
160
Mylne, J. S., I. Hara-Nishimura and K. J. Rosengren (2014). "Seed storage
albumins: biosynthesis, trafficking and structures." Functional Plant Biology
41(7): 671-677.
Nakabayashi, K., M. Okamoto, T. Koshiba, Y. Kamiya and E. Nambara (2005).
"Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed
germination: epigenetic and genetic regulation of transcription in seed." Plant
Journal 41(5): 697-709.
Nakaune, S., K. Yamada, M. Kondo, T. Kato, S. Tabata, M. Nishimura and I.
Hara-Nishimura (2005). "A Vacuolar Processing Enzyme, {delta}VPE, Is
Involved in Seed Coat Formation at the Early Stage of Seed Development."
Plant Cell 17(3): 876-887.
Nekrutenko, A., S. Wadhawan, P. Goetting-Minesky and K. D. Makova (2005).
"Oscillating evolution of a mammalian locus with overlapping reading frames: an
XLalphas/ALEX relay." PLoS Genetics 1(2): e18.
Neme, R. and D. Tautz (2013). "Phylogenetic patterns of emergence of new
genes support a model of frequent de novo evolution." BMC Genomics 14(1):
117.
Neme, R. and D. Tautz (2014). "Evolution: dynamics of de novo gene
emergence." Current Biology 24(6): R238-240.
Neves, G., J. Zucker, M. Daly and A. Chess (2004). "Stochastic yet biased
expression of multiple Dscam splice variants by individual cells." Nature
Genetics 36(3): 240-246.
Ng, Y.-M., Y. Yang, K.-H. Sze, X. Zhang, Y.-T. Zheng and P.-C. Shaw (2011).
"Structural characterization and anti-HIV-1 activities of arginine/glutamate-rich
polypeptide Luffin P1 from the seeds of sponge gourd (Luffa cylindrica)."
Journal of Structural Biology 174(1): 164-172.
Nguyen, G. K., S. Zhang, N. T. Nguyen, P. Q. Nguyen, M. S. Chiu, A. Hardjojo
and J. P. Tam (2011). "Discovery and characterization of novel cyclotides
originated from chimeric precursors consisting of albumin-1 chain a and
cyclotide domains in the Fabaceae family." The Journal of Biological Chemistry
286(27): 24275-24287.
Nielsen, M. S., C. Gustafsen, P. Madsen, J. R. Nyengaard, G. Hermey, O.
Bakke, M. Mari, P. Schu, R. Pohlmann, A. Dennes and C. M. Petersen (2007).
"Sorting by the cytoplasmic domain of the amyloid precursor protein binding
receptor SorLA." Molecular and Cellular Biology 27(19): 6842-6851.
Nilsen, T. W. and B. R. Graveley (2010). "Expansion of the eukaryotic proteome
by alternative splicing." Nature 463(7280): 457-463.
Nordlee, J. A., S. L. Taylor, J. A. Townsend, L. A. Thomas and R. K. Bush
(1996). "Identification of a Brazil-Nut allergen in transgenic soybeans." New
England Journal of Medicine 334(11): 688-692.
Nordquist, L., R. Brown, A. Fasching, P. Persson and F. Palm (2009).
"Proinsulin C-peptide reduces diabetes-induced glomerular hyperfiltration via
efferent arteriole dilation and inhibition of tubular sodium reabsorption."
American Journal of Physiology - Renal Physiology 297(5): 9.
Nordquist, L. and J. Wahren (2009). "C-Peptide: the missing link in diabetic
nephropathy?" Rev Diabet Stud 6(3): 203-210.
161
Ohno, S. (1970). Evolution by gene duplication, Springer-Verlag Berlin
Heidelberg.
Ohno, S. (1984). "Birth of a unique enzyme from an alternative reading frame of
the preexisted, internally repetitious coding sequence." Proceedings of the
National Academy of Sciences of the United States of America 81(8): 24212425.
Okamura, K., L. Feuk, T. Marques-Bonet, A. Navarro and S. W. Scherer (2006).
"Frequent appearance of novel protein-coding sequences by frameshift
translation." Genomics 88(6): 690-697.
Oliviusson, P., O. Heinzerling, S. Hillmer, G. Hinz, Y. C. Tse, L. Jiang and D. G.
Robinson (2006). "Plant retromer, localized to the prevacuolar compartment and
microvesicles in Arabidopsis, may interact with vacuolar sorting receptors."
Plant Cell 18(5): 1239-1252.
Osborne, T. B. (1909). The vegetable proteins. London, New York Longmans,
Green.
Otegui, M. S., R. Herder, J. Schulze, R. Jung and L. A. Staehelin (2006). "The
proteolytic processing of seed storage proteins in Arabidopsis embryo cells
starts in the multivesicular bodies." Plant Cell 18(10): 2567-2581.
Pace, J. K., 2nd, C. Gilbert, M. S. Clark and C. Feschotte (2008). "Repeated
horizontal transfer of a DNA transposon in mammals and other tetrapods."
Proceedings of the National Academy of Sciences of the United States of
America 105(44): 17023-17028.
Parker, H. G., B. M. VonHoldt, P. Quignon, E. H. Margulies, S. Shao, D. S.
Mosher, T. C. Spady, A. Elkahloun, M. Cargill, P. G. Jones, C. L. Maslen, G. M.
Acland, N. B. Sutter, K. Kuroki, C. D. Bustamante, R. K. Wayne and E. A.
Ostrander (2009). "An Expressed Fgf4 Retrogene Is Associated with BreedDefining Chondrodysplasia in Domestic Dogs." Science (New York, N.Y.)
325(5943): 995-998.
Parmenter, D. L., J. G. Boothe, G. J. van Rooijen, E. C. Yeung and M. M.
Moloney (1995). "Production of biologically active hirudin in plant seeds using
oleosin partitioning." Plant Molecular Biology 29(6): 1167-1180.
Patthy, L. (1999). "Genome evolution and the evolution of exon-shuffling--a
review." Gene 238(1): 103-114.
Paulding, C. A., M. Ruvolo and D. A. Haber (2003). "The Tre2 (USP6)
oncogene is a hominoid-specific gene." Proceedings of the National Academy
of Sciences of the United States of America 100(5): 2507-2511.
Pavesi, A., G. Magiorkinis and D. G. Karlin (2013). "Viral proteins originated de
novo by overprinting can be identified by codon usage: application to the “gene
nursery” of Deltaretroviruses." PLoS Computational Biology 9(8): e1003162.
Pearce, G., D. S. Moura, J. Stratmann and C. A. Ryan (2001). "Production of
multiple plant hormones from a single polyprotein precursor." Nature 411(6839):
817-820.
Pelegrini, P. B., B. F. Quirino and O. L. Franco (2007). "Plant cyclotides: an
unusual class of defense compounds." Peptides 28(7): 1475-1481.
162
Pereira, H. J. V., M. C. O. Salgado and E. B. Oliveira (2009). "Immobilized
analogues of sunflower trypsin inhibitor-1 constitute a versatile group of affinity
sorbents for selective isolation of serine proteases." Journal of Chromatography
B 877(22): 2039-2044.
Petrov, D. A., E. R. Lozovskaya and D. L. Hartl (1996). "High intrinsic rate of
DNA loss in Drosophila." Nature 384(6607): 346-349.
Pimenta, D. C. and I. Lebrun (2007). "Cryptides: buried secrets in proteins."
Peptides 28(12): 2403-2410.
Plan, M. R., I. Saska, A. G. Cagauan and D. J. Craik (2008). "Backbone
cyclised peptides from plants show molluscicidal activity against the rice pest
Pomacea canaliculata (golden apple snail)." J Agric Food Chem 56(13): 52375241.
Poth, A. G., M. L. Colgrave, R. E. Lyons, N. L. Daly and D. J. Craik (2011).
"Discovery of an unusual biosynthetic origin for circular proteins in legumes."
Proceedings of the National Academy of Sciences of the United States of
America 108(25): 10127-10132.
Poth, A. G., M. L. Colgrave, R. Philip, B. Kerenga, N. L. Daly, M. A. Anderson
and D. J. Craik (2011). "Discovery of cyclotides in the Fabaceae plant family
provides new insights into the cyclization, evolution, and distribution of circular
proteins." ACS Chemical Biology on-line 20 Jan 2011: DOI:10.1021/cb100388j.
Poth, A. G., J. S. Mylne, J. Grassl, R. E. Lyons, A. H. Millar, M. L. Colgrave and
D. J. Craik (2012). "Cyclotides associate with leaf vasculature and are the
products of a novel precursor in Petunia (Solanaceae)." Journal of Biological
Chemistry 287(32): 27033-27046.
Potrzebowski, L., N. Vinckenbosch, A. C. Marques, F. Chalmel, B. Jégou and
H. Kaessmann (2008). "Chromosomal Gene Movements Reflect the Recent
Origin and Biology of Therian Sex Chromosomes." PLoS Biology 6(4): e80.
Pratley, R. E., M. Nauck, T. Bailey, E. Montanya, R. Cuddihy, S. Filetti, A. B.
Thomsen, R. E. Søndergaard and M. Davies (2010). "Liraglutide versus
sitagliptin for patients with type 2 diabetes who did not have adequate
glycaemic control with metformin: a 26-week, randomised, parallel-group, openlabel trial." The Lancet 375(9724): 1447-1456.
Pruzinska, A., G. Tanner, S. Aubry, I. Anders, S. Moser, T. Muller, K. H.
Ongania, B. Krautler, J. Y. Youn, S. J. Liljegren and S. Hortensteiner (2005).
"Chlorophyll breakdown in senescent Arabidopsis leaves. Characterization of
chlorophyll catabolites and of chlorophyll catabolic enzymes involved in the
degreening reaction." Plant Physiology 139(1): 52-63.
Quimbar, P., U. Malik, C. P. Sommerhoff, Q. Kaas, L. Y. Chan, Y. H. Huang, M.
Grundhuber, K. Dunse, D. J. Craik, M. A. Anderson and N. L. Daly (2013).
"High-affinity cyclic peptide matriptase inhibitors." J Biol Chem 288(19): 1388513896.
Quirino, B. F., J. Normanly and R. M. Amasino (1999). "Diverse range of gene
activity during Arabidopsis thaliana leaf senescence includes pathogenindependent induction of defense-related genes." Plant Molecular Biology 40(2):
267-278.
163
Reinhardt, J. A., B. M. Wanjiru, A. T. Brant, P. Saelao, D. J. Begun and C. D.
Jones (2013). "De novo ORFs in Drosophila are important to organismal fitness
and evolved rapidly from previously non-coding sequences." PLOS Genetics
9(10): e1003860.
Ribeiro, A. C., S. V. Monteiro, B. M. Carrapico and R. B. Ferreira (2014). "Are
vicilins another major class of legume lectins?" Molecules 19(12): 20350-20373.
Ribeiro, S. F. F., G. B. Taveira, A. O. Carvalho, G. B. Dias, M. Da Cunha, C.
Santa-Catarina, R. Rodrigues and V. M. Gomes (2012). "Antifungal and other
biological activities of two 2S albumin-homologous proteins against pathogenic
fungi." The Protein Journal 31(1): 59-67.
Roberts, I. N., C. Caputo, M. V. Criado and C. Funk (2012). "Senescenceassociated proteases in plants." Physiologia Plantarum 145(1): 130-139.
Robinson, J. A., S. DeMarco, F. Gombert, K. Moehle and D. Obrecht (2008).
"The design, structures and therapeutic potential of protein epitope mimetics."
Drug Discovery Today 13(21–22): 944-951.
Rogers, L. D. and C. M. Overall (2013). "Proteolytic post-translational
modification of proteins: proteomic tools and methodology." Molecular & Cellular
Proteomics 12(12): 3532-3542.
Romero, D. and R. Palacios (1997). "Gene amplification and genomic plasticity
in prokaryotes." Annual Review of Genetics 31: 91-111.
Rotari, V. I., P. M. Dando and A. J. Barrett (2001). Legumain forms from plants
and animals differ in their specificity. Biological Chemistry. 382: 953.
Roth, D. B., T. N. Porter and J. H. Wilson (1985). "Mechanisms of
nonhomologous recombination in mammalian cells." Molecular and Cellular
Biology 5(10): 2599-2607.
Rothman, B. S., E. Mayeri, R. O. Brown, P. M. Yuan and J. E. Shively (1983).
"Primary structure and neuronal effects of alpha-bag cell peptide, a second
candidate neurotransmitter encoded by a single gene in bag cell neurons of
Aplysia." Proceedings of the National Academy of Sciences of the United States
of America 80(18): 5753-5757.
Rozwadowski, K., W. Yang and S. Kagale (2008). "Homologous recombinationmediated cloning and manipulation of genomic DNA regions using Gateway and
recombineering systems." BMC Biotechnol 8(88): 1472-6750.
Rubenstein, A. H., J. L. Clark, F. Melani and D. F. Steiner (1969). "Secretion of
proinsulin C-peptide by pancreatic [beta] cells and its circulation in blood."
Nature 224(5220): 697-699.
Ruhfel, B. R., M. A. Gitzendanner, P. S. Soltis, D. E. Soltis and J. G. Burleigh
(2014). "From algae to angiosperms-inferring the phylogeny of green plants
(Viridiplantae) from 360 plastid genomes." BMC Evolutionary Biology 14(23):
1471-2148.
Ruiz-Orera, J., X. Messeguer, J. A. Subirana and M. M. Alba (2014). "Long noncoding RNAs as a source of new peptides." Elife 3: e03523.
Saether, O., D. J. Craik, I. D. Campbell, K. Sletten, J. Juul and D. G. Norman
(1995). "Elucidation of the primary and three-dimensional structure of the
uterotonic polypeptide kalata B1." Biochemistry 34: 4147-4158.
164
Samir, P. and A. J. Link (2011). "Analyzing the cryptome: uncovering secret
sequences." Aaps J 13(2): 152-158.
Sasaki, T., H. Larsson, J. Kreuger, M. Salmivirta, L. Claesson-Welsh, U.
Lindahl, E. Hohenester and R. Timpl (1999). "Structural basis and potential role
of heparin/heparan sulfate binding to the angiogenesis inhibitor endostatin." The
EMBO Journal 18(22): 6240-6248.
Saska, I., A. D. Gillon, N. Hatsugai, R. G. Dietzgen, I. Hara-Nishimura, M. A.
Anderson and D. J. Craik (2007). "An asparaginyl endopeptidase mediates in
vivo protein backbone cyclisation." Journal of Biological Chemistry 282(40):
29721-29728.
Sayah, D. M., E. Sokolskaja, L. Berthoux and J. Luban (2004). "Cyclophilin A
retrotransposition into TRIM5 explains owl monkey resistance to HIV-1." Nature
430(6999): 569-573.
Schenk, S. and V. Quaranta (2003). "Tales from the crypt[ic] sites of the
extracellular matrix." Trends in Cell Biology 13(7): 366-375.
Schmid, M., T. S. Davison, S. R. Henz, U. J. Pape, M. Demar, M. Vingron, B.
Scholkopf, D. Weigel and J. U. Lohmann (2005). "A gene expression map of
Arabidopsis thaliana development." Nature Genetics 37(5): 501-506.
Schmitz, R. J., M. D. Schultz, M. G. Lewsey, R. C. O'Malley, M. A. Urich, O.
Libiger, N. J. Schork and J. R. Ecker (2011). "Transgenerational epigenetic
instability is a source of novel methylation variants." Science 334(6054): 369373.
Sela, S., A. Itin, S. Natanson-Yaron, C. Greenfield, D. Goldman-Wohl, S. Yagel
and E. Keshet (2008). "A novel human-specific soluble vascular endothelial
growth factor receptor 1: cell-type-specific splicing and implications to vascular
endothelial growth factor homeostasis and preeclampsia." Circulation Research
102(12): 1566-1574.
Semon, M. and K. H. Wolfe (2007). "Consequences of genome duplication."
Current Opinion in Genetics & Development 17(6): 505-512.
Settle, S. H., Jr., M. M. Green and K. C. Burtis (1995). "The silver gene of
Drosophila melanogaster encodes multiple carboxypeptidases similar to
mammalian prohormone-processing enzymes." Proceedings of the National
Academy of Sciences of the United States of America 92(21): 9470-9474.
Sheldon, P. S., J. N. Keen and D. J. Bowles (1996). "Post-translational peptide
bond formation during concanavalin A processing in vitro." Biochemical Journal
320: 865-870.
Shewry, P. (1999). Enzyme Inhibitors of Seeds: Types and Properties. Seed
Proteins. P. Shewry and R. Casey. Dordrecht, Kluwer: 587-615.
Shewry, P. and M. Pandya (1999). The 2S Albumin Storage Proteins. Seed
Proteins. P. Shewry and R. Casey. Dordrecht, Kluwer: 563-586.
Shewry, P. R. and N. G. Halford (2002). "Cereal seed storage proteins:
structures, properties and role in grain utilization." Journal of Experimental
Botany 53(370): 947-958.
Shewry, P. R., J. A. Napier and A. S. Tatham (1995). "Seed storage proteins:
structures and biosynthesis." Plant Cell 7: 945-956.
165
Shih, M.-d., F. A. Hoekstra and Y.-I. C. Hsing (2008). Late embryogenesis
abundant proteins, Academic Press.
Shimada, T., K. Fuji, K. Tamura, M. Kondo, M. Nishimura and I. Hara-Nishimura
(2003). "Vacuolar sorting receptor for seed storage proteins in Arabidopsis
thaliana." Proc Natl Acad Sci U S A 100(26): 16095-16100.
Shimada, T., Y. Koumoto, L. Li, M. Yamazaki, M. Kondo, M. Nishimura and I.
Hara-Nishimura (2006). "AtVPS29, a putative component of a retromer
complex, is required for the efficient sorting of seed storage proteins." Plant Cell
Physiol 47(9): 1187-1194.
Shimada, T., K. Yamada, M. Kataoka, S. Nakaune, Y. Koumoto, M. Kuroyanagi,
S. Tabata, T. Kato, K. Shinozaki, M. Seki, M. Kobayashi, M. Kondo, M.
Nishimura and I. Hara-Nishimura (2003). "Vacuolar processing enzymes are
essential for proper processing of seed storage proteins in Arabidopsis
thaliana." Journal of Biological Chemistry 278(34): 32292-32299.
Singh, A., E. Y. Chen, J. M. Lugovoy, C. N. Chang, R. A. Hitzeman and P. H.
Seeburg (1983). "Saccharomyces cerevisiae contains two discrete genes
coding for the α-factor pheromone." Nucleic Acids Research 11(12): 4049-4063.
Sommerhoff, C. P., O. Avrutina, H. U. Schmoldt, D. Gabrijelcic-Geiger, U.
Diederichsen and H. Kolmar (2010). "Engineered cystine knot miniproteins as
potent inhibitors of human mast cell tryptase beta." Journal of Molecular Biology
395(1): 167-175.
Souders, C. A., S. E. Maynard, J. Yan, Y. Wang, N. K. Boatright, J. Sedan, D.
Balyozian, P. S. Cheslock, D. C. Molrine and T. A. Simas (2015). "Circulating
levels of sFlt1 splice variants as predictive parkers for the development of
preeclampsia." International Journal of Molecular Sciences 16(6): 12436-12453.
Spiess, J., C. D. Mount, W. E. Nicholson and D. N. Orth (1982). "NH2-terminal
amino acid sequence and peptide mapping of purified human beta-lipotropin:
comparison with previously proposed sequences." Proceedings of the National
Academy of Sciences of the United States of America 79(16): 5071-5075.
Steiner, D. F., D. Cunningham, L. Spigelman and B. Aten (1967). "Insulin
biosynthesis: evidence for a precursor." Science 157(3789): 697-700.
Su, M., Y. Ling, J. Yu, J. Wu and J. Xiao (2013). "Small proteins: untapped area
of potential biological importance." Front Genet 4(286): 00286.
Swedberg, J. E., L. V. Nigon, J. C. Reid, S. J. de Veer, C. M. Walpole, C. R.
Stephens, T. P. Walsh, T. K. Takayama, J. D. Hooper, J. A. Clements, A. M.
Buckle and J. M. Harris (2009). "Substrate-guided design of a potent and
selective kallikrein-related peptidase inhibitor for kallikrein 4." Chemistry &
Biology 16(6): 633-643.
Takagi, J., L. Renna, H. Takahashi, Y. Koumoto, K. Tamura, G. Stefano, Y.
Fukao, M. Kondo, M. Nishimura, T. Shimada, F. Brandizzi and I. HaraNishimura (2013). "MAIGO5 functions in protein export from Golgi-associated
endoplasmic reticulum exit sites in Arabidopsis." Plant Cell 25: 4658-4675.
Takahashi, H., K. Tamura, J. Takagi, Y. Koumoto, I. Hara-Nishimura and T.
Shimada (2010). "MAG4/Atp115 is a Golgi-localized tethering factor that
mediates efficient anterograde transport in Arabidopsis." Plant and Cell
Physiology 51(10): 1777-1787.
166
Tam, J. P., Y. A. Lu, J. L. Yang and K. W. Chiu (1999). "An unusual structural
motif of antimicrobial peptides containing end-to-end macrocycle and cystineknot disulfides." Proceedings of the National Academy of Sciences of the United
States of America 96(16): 8913-8918.
Tan, N.-H. and J. Zhou (2006). "Plant cyclopeptides." Chemical Reviews
106(3): 840-895.
Tautz, D. (2014). "The discovery of de novo gene evolution." Perspectives in
Biology and Medicine 57(1): 149-161.
Tautz, D. and T. Domazet-Lošo (2011). "The evolutionary origin of orphan
genes." Nature Reviews Genetics 12(10): 692-702.
Taylor, Z. N., D. W. Rice and J. D. Palmer (2015). "The complete moss
mitochondrial genome in the angiosperm Amborella is a chimera derived from
two moss whole-genome transfers." PLoS ONE 10(11).
Tedford, H. W., J. I. Fletcher and G. F. King (2001). "Functional Significance of
the β-Hairpin in the Insecticidal Neurotoxin ω-Atracotoxin-Hv1a." Journal of
Biological Chemistry 276(28): 26568-26576.
Terras, F. R. G., H. M. E. Schoofs, M. F. C. De Bolle, F. Van Leuven, S. B.
Rees, J. Vanderleyden, B. P. A. Cammue and W. F. Broekaert (1992). "Analysis
of two novel classes of plant antifungal proteins from radish (Raphanus sativus
L.) seeds." J. Biol. Chem. 267: 15301-15309.
Thongyoo, P., C. Bonomelli, R. J. Leatherbarrow and E. W. Tate (2009). "Potent
inhibitors of β-tryptase and human leukocyte elastase based on the MCoTI-II
scaffold." Journal of Medicinal Chemistry 52(20): 6197-6200.
Thony-Meyer, L., A. Bock and H. Hennecke (1992). "Prokaryotic polyprotein
precursors." FEBS Letters 307(1): 62-65.
Thorpe, S. C., D. M. Kemeny, R. C. Panzani, B. McGurl and M. Lord (1988).
"Allergy to castor bean: II. Identification of the major allergens in castor bean
seeds." Journal of Allergy and Clinical Immunology 82(1): 67-72.
Toll-Riera, M., N. Bosch, N. Bellora, R. Castelo, L. Armengol, X. Estivill and M.
M. Alba (2009). "Origin of primate orphan genes: a comparative genomics
approach." Molecular Biology and Evolution 26(3): 603-612.
Torres, C., M. D. Fernandez, D. M. Flichman, R. H. Campos and V. A. Mbayed
(2013). "Influence of overlapping genes on the evolution of human hepatitis B
virus." Virology 441(1): 40-48.
Torresi, J. (2002). "The virological and clinical significance of mutations in the
overlapping envelope and polymerase genes of hepatitis B virus." J Clin Virol
25(2): 97-106.
Trevisan, G., G. Maldaner, N. A. Velloso, S. Sant'Anna Gda, V. Ilha, C. Velho
Gewehr Cde, M. A. Rubin, A. F. Morel and J. Ferreira (2009). "Antinociceptive
effects of 14-membered cyclopeptide alkaloids." J Nat Prod 72(4): 608-612.
Tyndall, J. D. A., T. Nall and D. P. Fairlie (2005). "Proteases universally
recognize beta strands in their active sites." Chemical Reviews 105(3): 9731000.
Uddling, J., J. Gelang-Alfredsson, K. Piikki and H. Pleijel (2007). "Evaluating the
relationship between leaf chlorophyll concentration and SPAD-502 chlorophyll
meter readings." Photosynthesis Research 91(1): 37-46.
167
Ueki, N., K. Someya, Y. Matsuo, K. Wakamatsu and H. Mukai (2007).
"Cryptides: functional cryptic peptides hidden in protein structures." Biopolymers
88(2): 190-198.
Van de Peer, Y., S. Maere and A. Meyer (2009). "The evolutionary significance
of ancient genome duplications." Nature Reviews Genetics 10(10): 725-732.
van der Graaff, E., R. Schwacke, A. Schneider, M. Desimone, U. I. Flugge and
R. Kunze (2006). "Transcription analysis of Arabidopsis membrane transporters
and hormone pathways during developmental and induced leaf senescence."
Plant Physiology 141(2): 776-792.
Vanderperre, B., J. F. Lucier,
Vanderperre, M. Wisztorski, M.
"Direct detection of alternative
human significantly expands the
C. Bissonnette, J. Motard, G. Tremblay, S.
Salzet, F. M. Boisvert and X. Roucou (2013).
open reading frames translation products in
proteome." PLoS ONE 8(8): e70698.
Vinckenbosch, N., I. Dupanloup and H. Kaessmann (2006). "Evolutionary fate of
retroposed gene copies in the human genome." Proceedings of the National
Academy of Sciences of the United States of America 103(9): 3220-3225.
Vitousek, P. M. and R. W. Howarth (1991). "Nitrogen limitation on land and in
the sea: how can it occur?" Biogeochemistry 13(2): 87-115.
Voss, R. H., U. Ermler, L.-O. Essen, G. Wenzl, Y.-M. Kim and P. Flecker
(1996). "Crystal Structure of the Bifunctional Soybean Bowman-Birk Inhibitor at
0.28-nm Resolution." European Journal of Biochemistry 242(1): 122-131.
Wada, S., H. Ishida, M. Izumi, K. Yoshimoto, Y. Ohsumi, T. Mae and A. Makino
(2009). "Autophagy plays a role in chloroplast degradation during senescence in
individually darkened leaves." Plant Physiology 149(2): 885-893.
Wang, Z., F. Chen, X. Li, H. Cao, M. Ding, C. Zhang, J. Zuo, C. Xu, J. Xu, X.
Deng, Y. Xiang, W.J.J. Soppe, and Y. Liu (2016). "Arabidopsis seed
germination speed is controlled by SNL histone deacetylase-binding factormediated regulation of AUX1." Nature Communications 7(13412).
Wang, C. K., C. W. Gruber, M. Cemazar, C. Siatskas, P. Tagore, N. Payne, G.
Sun, S. Wang, C. C. Bernard and D. J. Craik (2014). "Molecular grafting onto a
stable framework yields novel cyclic peptides for the treatment of multiple
sclerosis." ACS Chem Biol 9(1): 156-163.
Wang, W., H. Zheng, C. Fan, J. Li, J. Shi, Z. Cai, G. Zhang, D. Liu, J. Zhang, S.
Vang, Z. Lu, G. K.-S. Wong, M. Long and J. Wang (2006). "High Rate of
Chimeric Gene Origination by Retroposition in Plant Genomes." The Plant Cell
18(8): 1791-1802.
Watanabe, M., S. Balazadeh, T. Tohge, A. Erban, P. Giavalisco, J. Kopka, B.
Mueller-Roeber, A. R. Fernie and R. Hoefgen (2013). "Comprehensive
dissection of spatiotemporal metabolic shifts in primary, secondary, and lipid
metabolism during developmental senescence in Arabidopsis." Plant Physiology
162(3): 1290-1310.
Waterworth, W.M., S. Footitt, C.M. Bray, W.E. Finch-Savage, C.E. West (2016).
"DNA damage checkpoint kinase ATM regulates germination and maintains
genome stability in seeds." Proceedings of the National Academy of Sciences of
the United States of America 113(34): 9647-9652.
168
Weaver, L. M. and R. M. Amasino (2001). "Senescence is induced in
individually darkened Arabidopsis leaves, but inhibited in whole darkened
plants." Plant Physiol 127(3): 876-886.
Wiederstein, M. and M.J. Sippl, (2007) "ProSA-web: interactive web service for
the recognition of errors in three-dimensional structures of proteins." Nucleic
Acids Research 35:W407-W410.
Winter, D., B. Vinegar, H. Nahal, R. Ammar, G. V. Wilson and N. J. Provart
(2007). "An "electronic fluorescent pictograph" browser for exploring and
analyzing large-scale biological data sets." PLoS ONE 2(8): e718.
Witherup, K. M., M. J. Bogusky, P. S. Anderson, H. Ramjit, R. W. Ransom, T.
Wood and M. Sardana (1994). "Cyclopsychotride A, A biologically active, 31residue cyclic peptide isolated from Psychotria longipes." J. Nat. Prod. 57:
1619-1625.
Wolny, E., A. Braszewska-Zalewska and R. Hasterok (2014). "Spatial
distribution of epigenetic modifications in Brachypodium distachyon embryos
during seed maturation and germination." PLoS ONE 9(7).
Wong, C. T., D. K. Rowlands, C. H. Wong, T. W. Lo, G. K. Nguyen, H. Y. Li and
J. P. Tam (2012). "Orally active peptidic bradykinin B1 receptor antagonists
engineered from a cyclotide scaffold for inflammatory pain treatment." Angew
Chem Int Ed Engl 51(23): 5620-5624.
Wu, D.-D., D. M. Irwin and Y.-P. Zhang (2011). "De novo origin of human
protein-coding genes." PLoS Genetics 7(11): e1002379.
Wu, X. and P. A. Sharp (2013). "Divergent transcription: a driving force for new
gene origination?" Cell 155(5): 990-996.
Xiao, W., H. Liu, Y. Li, X. Li, C. Xu, M. Long and S. Wang (2009). "A rice gene
of de novo origin negatively regulates pathogen-induced defense response."
PLoS ONE 4(2): e4603.
Xie, C., Y. E. Zhang, J. Y. Chen, C. J. Liu, W. Z. Zhou, Y. Li, M. Zhang, R.
Zhang, L. Wei and C. Y. Li (2012). "Hominoid-specific de novo protein-coding
genes originating from long non-coding RNAs." PLoS Genetics 8(9): e1002942.
Xu, W., L. Li, L. Du and N. Tan (2011). "Various mechanisms in cyclopeptide
production from precursors synthesized independently of non-ribosomal peptide
synthetases." Acta Biochimica et Biophysica Sinica 43(10): 757-762.
Xue, S. A., M. D. Jones, Q. L. Lu, J. M. Middeldorp and B. E. Griffin (2003).
"Genetic diversity: frameshift mechanisms alter coding of a gene (Epstein-Barr
virus LF3 gene) that contains multiple 102-base-pair direct sequence repeats."
Molecular and Cellular Biology 23(6): 2192-2201.
Yamada, K., T. Shimada, M. Kondo, M. Nishimura and I. Hara-Nishimura
(1999). "Multiple functional proteins are produced by cleaving Asn-Gln bonds of
a single precursor by Vacuolar Processing Enzyme." Journal of Biological
Chemistry 274(4): 2563-2570.
Yamada, K., T. Shimada, M. Nishimura and I. Hara-Nishimura (2005). "A VPE
family supporting various vacuolar functions in plants." Physiologia Plantarum
123(4): 369-375.
169
Yamaguchi, N., B. Anand-Apte, M. Lee, T. Sasaki, N. Fukai, R. Shapiro, I. Que,
C. Lowik, R. Timpl and B. R. Olsen (1999). "Endostatin inhibits VEGF-induced
endothelial cell migration and tumor growth independently of zinc binding." The
EMBO Journal 18(16): 4414-4423.
Yamazaki, M., T. Shimada, H. Takahashi, K. Tamura, M. Kondo, M. Nishimura
and I. Hara-Nishimura (2008). "Arabidopsis VPS35, a retromer component, is
required for vacuolar protein sorting and involved in plant growth and leaf
senescence." Plant Cell Physiol 49(2): 142-156.
Yan, A., M. Wu, L. Yan, R. Hu, I. Ali, Y. Gan (2014). "AtEXP2 is involved in
seed germination and abiotic stress response in Arabidopsis." PLoS One 9(1):
1–10.
Yang, S., J. R. Arguello, X. Li, Y. Ding, Q. Zhou, Y. Chen, Y. Zhang, R. Zhao, F.
Brunet, L. Peng, M. Long and W. Wang (2008). "Repetitive element-mediated
recombination as a mechanism for new gene origination in Drosophila." PLoS
Genetics 4(1): e3.
Yang, Z. and J. Huang (2011). "De novo origin of new genes with introns in
Plasmodium vivax." FEBS Letters 585(4): 641-644.
Yoshida, S., S. Maruyama, H. Nozaki and K. Shirasu (2010). "Horizontal gene
transfer by the parasitic plant Striga hermonthica." Science 328(5982): 1128.
Yuan, C., L. Chen, E. Meehan, N. Daly, D. Craik, M. Huang and J. Ngo (2011).
"Structure of catalytic domain of Matriptase in complex with Sunflower trypsin
inhibitor-1." BMC Structural Biology 11(1): 30.
Zaaijer, H. L., F. J. van Hemert, M. H. Koppelman and V. V. Lukashov (2007).
"Independent evolution of overlapping polymerase and surface protein genes of
hepatitis B virus." Journal of General Virology 88(Pt 8): 2137-2143.
Zhang, D., J. Chen, L. Deng, Q. Mao, J. Zheng, J. Wu, C. Zeng and Y. Li
(2010). "Evolutionary selection associated with the multi-function of overlapping
genes in the hepatitis B virus." Infection Genetics and Evolution 10(1): 84-88.
Zhang, G., G. Guo, X. Hu, Y. Zhang, Q. Li, R. Li, R. Zhuang, Z. Lu, Z. He, X.
Fang, L. Chen, W. Tian, Y. Tao, K. Kristiansen, X. Zhang, S. Li, H. Yang and J.
Wang (2010). "Deep RNA sequencing at single base-pair resolution reveals
high complexity of the rice transcriptome." Genome Research 20(5): 646-654.
Zhang, J. (2006). "Parallel adaptive origins of digestive RNases in Asian and
African leaf monkeys." Nature Genetics 38(7): 819-823.
Zhang, J., A. M. Dean, F. Brunet and M. Long (2004). "Evolving protein
functional diversity in new genes of Drosophila." Proceedings of the National
Academy of Sciences of the United States of America 101(46): 16246-16250.
Zhang, J., Y. P. Zhang and H. F. Rosenberg (2002). "Adaptive evolution of a
duplicated pancreatic ribonuclease gene in a leaf-eating monkey." Nature
Genetics 30(4): 411-415.
Zhang, Y.-w., S. Liu, X. Zhang, W.-B. Li, Y. Chen, X. Huang, L. Sun, W. Luo, W.
J. Netzer, R. Threadgill, G. Wiegand, R. Wang, S. N. Cohen, P. Greengard, F.F. Liao, L. Li and H. Xu (2009). "A Functional Mouse Retroposed Gene Fg01
Reduces Alzheimer’s β-Amyloid Levels and Tau Phosphorylation." Neuron
64(3): 10.1016/j.neuron.2009.1008.1036.
170
Zhang, Y., Y. Wu, Y. Liu and B. Han (2005). "Computational Identification of 69
Retroposons in Arabidopsis." Plant physiology 138(2): 935-948.
Zhang, Y. E., P. Landback, M. D. Vibranovski and M. Long (2011). "Accelerated
recruitment of new brain development genes into the human genome." PLoS
Biology 9(10): e1001179.
Zhao, L., P. Saelao, C. D. Jones and D. J. Begun (2014). "Origin and spread of
de novo genes in Drosophila melanogaster populations." Science 343(6172):
769-772.
Zhou, Q. and W. Wang (2008). "On the origin and evolution of new genes--a
genomic and experimental perspective." J Genet Genomics 35(11): 639-648.
Zhou, Q., G. Zhang, Y. Zhang, S. Xu, R. Zhao, Z. Zhan, X. Li, Y. Ding, S. Yang
and W. Wang (2008). "On the origin of new genes in Drosophila." Genome
Research 18(9): 1446-1455.
Zhou, Y., J.-s. Wang, X.-j. Yang, D.-h. Lin, Y.-f. Gao, Y.-j. Su, S. Yang, Y.-j.
Zhang and J.-j. Zheng (2013). "Peanut allergy, allergen composition, and
methods of reducing allergenicity: A review." International Journal of Food
Science & Technology 2013: 1-8.
Zhu, Z., Y. Zhang and M. Long (2009). "Extensive structural renovation of
retrogenes in the evolution of the Populus genome." Plant physiology 151(4):
1943-1951.
Zhukov, I., L. Jaroszewski and A. Bierzynski (2000). "Conservative mutation
Met8 --> Leu affects the folding process and structural stability of squash trypsin
inhibitor CMTI-I." Protein Sci 9(2): 273-279.
171