Download Manuscript - CSIRO Research Publications Repository

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of botany wikipedia , lookup

Venus flytrap wikipedia , lookup

Plant physiology wikipedia , lookup

Historia Plantarum (Theophrastus) wikipedia , lookup

Ornamental bulbous plant wikipedia , lookup

Plant morphology wikipedia , lookup

Arabidopsis thaliana wikipedia , lookup

Plant disease resistance wikipedia , lookup

Sustainable landscaping wikipedia , lookup

Flowering plant wikipedia , lookup

Embryophyte wikipedia , lookup

Glossary of plant morphology wikipedia , lookup

Plant evolutionary developmental biology wikipedia , lookup

Xylem wikipedia , lookup

Transcript
1
Comparative genomics reveals conservative evolution of the
2
xylem transcriptome in vascular plants
3
4
Xinguo Li, Harry X. Wu*, and Simon G. Southerton*
5
6
CSIRO Plant Industry, GPO Box 1600, Canberra ACT 2601, Australia
7
8
*
Corresponding authors
9
10
Email addresses:
11
XL: [email protected]
12
HW: [email protected]
13
SS: [email protected]
14
15
16
17
18
19
20
21
22
1
23
Abstract
24
Background: Wood is a valuable natural resource and a major carbon sink. Wood
25
formation is an important developmental process in vascular plants which played a
26
crucial role in plant evolution. Although genes involved in xylem formation have been
27
investigated, the molecular mechanisms of xylem evolution are not well understood.
28
We use comparative genomics to examine evolution of the xylem transcriptome to
29
gain insights into xylem evolution.
30
Results: The xylem transcriptome is highly conserved in conifers, but considerably
31
evolved in angiosperms. The functional domains of genes in the xylem transcriptome
32
are moderately to highly conserved in vascular plants, suggesting the existence of a
33
common ancestral xylem transcriptome. Compared to the total transcriptome derived
34
from a range of tissues, the xylem transcriptome is relatively conserved in vascular
35
plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known
36
proteins and transcription factors are relatively more conserved in vascular plants. A
37
total of 527 putative xylem orthologs were identified, which are unevenly distributed
38
in Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis
39
revealed that evolution of the xylem transcriptome has paralleled plant evolution. We
40
also identified 274 conifer-specific xylem unigenes, all of which are of unknown
41
function. These xylem orthologs and conifer-specific unigenes are likely to have
42
played a crucial role in xylem evolution.
43
Conclusions: Conifers have highly conserved xylem transcriptomes, while
44
angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a
45
common ancestral xylem transcriptome. In vascular plants the xylem transcriptome is
46
more conserved compared to the total transcriptome.
47
2
48
Background
49
The evolution of xylem was a critical step that allowed vascular plants to colonize
50
vast areas of the earth’s terrestrial surface. Xylem plays a crucial role in the transport
51
of water and nutrients and provides mechanical support for vascular plants.
52
Herbaceous vascular plants develop primary xylem, and woody plants also produce
53
secondary xylem (or wood). Evolution from tracheids to vessels reflects a more
54
efficient way for angiosperms to develop secondary xylem [1-3]. Primary xylem
55
consists of cellulose, hemicellulose, pectin and proteins, while secondary xylem
56
contains higher amounts of cellulose and lignin. From a practical point of view, wood
57
represents a renewable natural resource for the timber, fibre and biofuel industries and
58
it is a major carbon sink in natural ecosystems.
59
60
In the last few decades the molecular basis of the xylem formation and evolution has
61
been investigated. Syringyl (S) lignin biosynthesis was found to be an evolved lignin
62
pathway in angiosperms, representing an addition to the ancient and predominant
63
guaiacyl (G) lignin pathway conserved in land plants [3-6]. However, the S lignin
64
pathway was also recently observed in lycophytes [7], suggesting a complex
65
evolutionary history of lignin biosynthesis in vascular plants. Genes involved in wood
66
formation have been identified in many plant species [8-13], allowing investigation of
67
xylem evolution at the transcriptome level. A core xylem gene set was identified in
68
white spruce among which 31 transcripts are highly conserved in Arabidopsis [14].
69
The expanding genomic resources of model plant species [15-18] are invaluable for
70
exploring many aspects of plant development and evolution, including xylem
71
formation and evolution. Recent studies suggest whole-genome duplication and
72
reorganization [15-17] have been a major driving force in plant evolution.
3
73
74
Comparative genomics is a powerful tool for investigating plant evolution at the
75
whole-genome level [19-27]. Lower collinearity across dicots and monocots [16, 21]
76
and lineage-specific genes have been observed using comparative genomics [28, 29].
77
Comparisons between rice and Arabidopsis showed that 50% of the rice genome was
78
homologous to Arabidopsis [16], while as much as 88% of the poplar genome shares
79
homology with Arabidopsis [17]. Comparative genomics of poplar with Arabidopsis
80
revealed at least 11,666 protein-coding genes were present in the ancestral eurosid
81
genome [17]. The recently sequenced Selaginella moellendorffii [30], moss [31] and
82
Eucalyptus grandis [18] provide new opportunities for comparative genomics in
83
vascular plants.
84
85
Many commercial crops and forest trees do not have sequenced genomes; thus,
86
comparative genomics in these species must rely on studies of the transcriptome
87
(expressed sequence tags, ESTs). Using the transcriptome approach about half of
88
loblolly pine xylem unigenes did not match Arabidopsis unigenes [32], and 39-45%
89
of pine contigs had no hits in the genomes of Arabidopsis, rice and poplar [33]. Eighty
90
percent of white spruce and 32-36% of sitka spruce unigenes lacked homologs in
91
poplar, Arabidopsis and rice [12, 33]. Similar results were also observed in other
92
comparisons between gymnosperms and angiosperms [19, 34]. In contrast,
93
comparison
94
transcriptomes. For example, about 84% of the white spruce transcripts had matches
95
in the Pine Gene Index [12], and 60-80% of spruce contigs had homologs in loblolly
96
pine and other gymnosperms [33].
of
different
gymnosperm
97
4
species
revealed
highly
conserved
98
Although comparative genomics has increased our understanding of plant evolution at
99
the genome level, most studies to date have focussed on the entire genome sequence,
100
and/or mixed EST resources developed from a range of tissues. In order to examine
101
xylem evolution in vascular plants we focused our attention on genes expressed
102
specifically in xylem. We selected ten species (see Methods) to represent different
103
categories of vascular plants, including gymnosperms (pine and spruce), angiosperms
104
(woody dicots, herbaceous dicots, and monocots) and lycophytes. The non-vascular
105
plant moss was included as a reference for xylem evolution. Up-to-date public
106
genome sequences and xylem transcriptomes of selected plant species were used for
107
comparative genomic analysis, aiming to explore the evolution of the xylem
108
transcriptome in vascular plants.
109
110
Results
111
The xylem transcriptome is highly conserved in conifer species
112
Comparisons of xylem unigenes of the three conifer species (radiata pine, loblolly
113
pine and white spruce) with other vascular plants and moss are shown in Figure 1A-F.
114
At the nucleotide level 78-82% (E ≤ 10-5) and 63-66% (E ≤ 10-50) of radiata pine
115
xylem unigenes have homologs in the xylem unigenes of loblolly pine and white
116
spruce, while 93-95% (E ≤ 10-5) and 74-89% (E ≤ 10-50) have homologs in the gene
117
indices of pine and spruce (Figure 1A). Thus, the radiata pine xylem transcriptome is
118
highly conserved in other conifer species. Further analysis with loblolly pine and
119
white spruce revealed that conifer xylem transcriptomes are highly conserved at both
120
the nucleotide (Figure 1B and 1C) and deduced amino acid sequence level (Figure
121
1D-F).
122
5
123
Conifer xylem transcriptomes are considerably distinct from angiosperms and
124
lycophytes. At the nucleotide level, only 15-32% (E ≤ 10-5) and 1-4% (E ≤ 10-50) of
125
radiata pine xylem unigenes have homologs in the gene models or scaffolds of poplar,
126
Eucalyptus, Arabidopsis, rice, Selaginella and moss (Figure 1A). Similarly low
127
numbers of homologs were identified in analyses with loblolly pine (Figure 1B) and
128
white spruce (Figure 1C). Unsurprisingly, the deduced amino acid sequences of
129
conifer xylem unigenes have relatively more homologs in non-coniferous plants
130
(Figure 1D-F). However, the number of homologs is still considerably lower than the
131
number of matches among the three conifers (Figure 1D-F). Therefore, at both the
132
nucleotide and amino acid level the conifer xylem transcriptome is relatively distinct
133
from angiosperms, lycophytes and moss.
134
135
We examined the average nucleotide identity among the homologous xylem unigenes
136
shared by the different conifer species. The average nucleotide identity between
137
radiata pine and loblolly pine is 97.5%, and 91% between radiata pine and the two
138
spruce species (Figure 2A). Lower nucleotide identity (81-83%) was observed in the
139
homologous unigenes of radiata pine and angiosperms (Populus, Arabidopsis and
140
rice) or moss (Figure 2A). These results were also confirmed in analyses in loblolly
141
pine (Figure 2B), providing additional evidence for xylem transcriptome conservation
142
in conifers, and its divergence in non-coniferous plants.
143
144
The xylem transcriptome has evolved considerably in angiosperms
145
Poplars are woody angiosperms and are used as a model tree species for wood
146
formation. Surprisingly, at the level of nucleotide sequence less than 5% of poplar
147
xylem unigenes have homologs (E ≤ 10-50) in the scaffolds or gene models of the three
6
148
angiosperms (Eucalyptus, Arabidopsis and rice) (Figure 3A). Poplar xylem unigenes
149
also have few homologs in the gene indices of pine or spruce and in the gene models
150
of Selaginella and moss (Figure 3A). These results suggest that the poplar xylem
151
transcriptome is distinct, and that the xylem transcriptome has undergone considerable
152
evolution in angiosperms. This contrasts with the highly conserved xylem
153
transcriptomes of gymnosperms we observed earlier (Figure 1). Furthermore, poplar
154
also has a unique genome (at the nucleotide level) among the species examined in this
155
study (Figure 3C). However, the functional gene sequences of the poplar xylem
156
transcriptome (Figure 3B) and the entire genome (Figure 3D) are moderately (36-50%
157
at E ≤ 10-50) conserved, while their functional domains are highly conserved (77-85%
158
at E ≤ 10-5) in other vascular and non-vascular plants. These results suggest that land
159
plants share an ancestral genome and an ancestral xylem transcriptome .
160
161
The xylem transcriptome is relatively more conserved in vascular plants
162
We examined the pattern of the xylem transcriptome evolution by comparing it with
163
the total transcriptome derived from a range of tissues. Although the xylem
164
transcriptomes of pine, spruce and poplar have few homologs in the gene models of
165
Arabidopsis, rice, Selaginella and moss (1.2-5% at E ≤ 10-50), there are about two to
166
four times more homologs than there are for the total transcriptome (Figure 4A).
167
When using a lower stringency E-value (≤ 10-5), the xylem transcriptomes of pine,
168
spruce and poplar have more homologs (about 20-60%) in the above four model
169
species compared to the total transcriptome (9-34%). Thus, at the nucleotide level the
170
xylem transcriptome is relatively more conserved in vascular and non-vascular plants.
171
Interestingly, both the conifer xylem and the total transcriptomes have few homologs
172
in poplar (a woody species) and in herbaceous species (Arabidopsis, rice, Selaginella
7
173
and moss) (Figure 4A). This suggests that conifer xylem formation is distinct from
174
xylem formation in woody angiosperms.
175
176
The conservative evolution of the xylem transcriptome was more clearly observed
177
when using deduced amino acid sequences (Figure 4B). For example, 37-46% of the
178
conifer xylem transcriptome matched (E ≤ 10-50) the genomes of angiosperms
179
(Populus, Arabidopsis and rice), lycophyte and moss; nearly two times more than the
180
matches with the total conifer transcriptome (Figure 4B). Similar results were also
181
obtained with the poplar xylem transcriptome in the comparisons with the total
182
transcriptome (Figure 4B). Therefore, the functional gene sequences in the xylem
183
transcriptome are significantly more conserved in vascular plants.
184
185
Different functional gene groups in the xylem transcriptome have distinct
186
patterns of evolution
187
We divided the xylem transcriptome into different functional groups for investigating
188
their diverse patterns of evolution. The xylem unigenes with matches (blastx, E ≤ 10-
189
5
190
matches as vascular plant-specific xylem (VSX) genes. Among the ancestral xylem
191
genes derived from radiata pine, loblolly pine and white spruce, 96-99%, 78-94% and
192
48-60% (E ≤ 10-5), or 41-48%, 20-26% and 28-35% (E ≤ 10-50), respectively, have
193
homologs (blastx) in the genomes of the three angiosperms and the lycophyte (Figure
194
5A-C). In contrast, only 8-28% (E ≤ 10-5) and 0.4-1% (E ≤ 10-50) of the conifer VSX
195
genes are homologous to the genomes of these model vascular plants (Figure 5A-C).
196
These results suggest that the ancestral xylem genes are moderately to highly
197
conserved in non-coniferous vascular plants, while considerable evolution has
) in the moss gene models are defined as ancestral xylem genes, and those with no
8
198
occurred in VSX genes among conifers, angiosperms and lycophytes. Similar patterns
199
were observed in comparisons of VSX and ancestral xylem genes with the poplar
200
xylem transcriptome (Figure 5D).
201
202
Cell wall genes were identified according to the CellWallNavigator [35] and
203
MAIZEWALL [36] databases. We compared the level of conservation of cell wall
204
genes to non-cell-wall genes (Figure 5A-D). In the four woody species (radiata pine,
205
loblolly pine, white spruce and poplar) the cell wall genes have 1.5 to two times more
206
homologs than the non-cell-wall genes in the genomes of the four herbaceous model
207
plants (Figure 5A-D). This suggests that cell wall genes have been highly conserved
208
across diverse land plants. Comparisons of transcription factors (TFs) (based on
209
PlantTFDB [37]) with non-TFs and known functional genes with unknowns revealed
210
significantly more conservation of transcription factors and known functional genes
211
among different groups of vascular and non-vascular plants (Additional data file 1).
212
213
Identification of putative xylem orthologs in vascular plants
214
Putative xylem orthologs conserved in diverse plant groups were identified using
215
deduced amino acid or protein sequences (Additional data file 2), and their numbers
216
are briefly presented at two E-value cut-offs (≤ 10-50 and ≤ 10-5) in Figure 6. The
217
xylem orthologs (9,164 at E ≤ 10-5 or 3,460 at E ≤ 10-50) shared by loblolly pine, white
218
spruce and sitka spruce represent a common gene set expressed in conifer wood
219
formation. While the xylem orthologs (6,641 at E ≤ 10-5 or 1,339 at E ≤ 10-50)
220
common to the three conifers and poplar are putative woody plant xylem orthologs.
221
Surprisingly, the number of putative xylem orthologs in other plant groups is only
222
slightly smaller (Figure 6), suggesting that very limited numbers of xylem orthologs
9
223
are specific to each plant group. The majority (93-96%) of the vascular plant xylem
224
orthologs have matches in the gene models of a non-vascular plant (moss). This
225
suggests that the xylem transcriptome of vascular plants has largely evolved from a
226
subset of genes found in non-vascular plants. This is likely to be a group of genes
227
predominantly involved in cell wall biosynthesis that have been recruited to
228
synthesise secondary xylem in higher plants.
229
230
Of the 1,003 loblolly pine xylem unigenes with homologs (tblastx or blastx, E ≤ 10-50)
231
in the six vascular plants, 94% and 100% have hits in the uniprot and TIGR databases,
232
respectively, and 94% are assigned with GO terms (Additional data file 2). Among
233
these unigenes 349 (35%) were identified as cell wall genes, and the most abundant
234
gene family is tubulin (16 unigenes). Cellulose synthase is also abundant (10). Several
235
primary wall genes are abundant including pectate lyase (8), pectinesterase (7), XET
236
(7), expansin (6) and peroxidase (5). Lignin biosynthesis-related genes are well
237
represented in the cell wall gene lists, including SAMS (11), laccase (9), methionine
238
synthase (cobalamin-independent) (6), 4CL (3), COMT (2), CAD (2) and CCoAMT
239
(3). In addition, 49 transcription factors were found, including NAC, PHD, HB,
240
MYB, LIM, CCCH, etc. The large number of transcription factors suggests that wood
241
formation involves considerable transcriptional regulation. Interestingly, aquaporins
242
(16 unigenes) is one of the largest gene families in the 1,003 unigenes, reflecting the
243
importance of water transport in xylem formation.
244
245
The 1,003 loblolly pine xylem unigenes have between 749 and 894 close homologs
246
(blastx or tblastx, E ≤ 10-50) in white spruce, sitka spruce, poplar, Arabidopsis, rice
247
and Selaginella, with 527 xylem orthologs common to all the above species. These
10
248
common xylem orthologs matched (blastx, E ≤ 10-50) 501 unique gene models in moss,
249
indicating that at least 501 ancestral xylem orthologs are shared in land plants. We
250
also identified all loci in Arabidopsis and rice with homology to the 1,003 loblolly
251
pine xylem unigenes. All possible homologs in Arabidopsis are represented by 3,115
252
and 6,563 unique loci at E ≤ 10-50 or ≤ 10-5, respectively, and in rice by 3,057 and
253
8,458 unique loci. This suggests that each xylem ortholog has an average of 4-8
254
paralogs in Arabidopsis and 5-14 paralogs in rice.
255
256
The xylem orthologs and paralogs have an even distribution among chromosomes of
257
Arabidopsis (Additional data file 3A) and poplar (Additional data file 3C), but may be
258
unevenly distributed among rice chromosomes (Additional data file 3B). However,
259
eight hot spots were observed within the five Arabidopsis chromosomes (Figure 7A,
260
B), which contrasts with the even distribution of all Arabidopsis genes across
261
chromosomes [38] (Figure 7C). These hot spots may be due to historical patterns of
262
gene and chromosome duplication [15, 39, 40]. Within the chromosomes of rice and
263
poplar the xylem orthologs are relatively evenly distributed (Additional data file 4 and
264
Additional data file 5). Different distribution patterns of the xylem orthologs among
265
the Arabidopsis, rice and poplar genomes may reflect their diverse history of gene
266
duplication and segmental chromosomal rearrangements.
267
268
Molecular evolution of xylem orthologs in vascular plants
269
We used the 501 ancestral xylem orthologs for phylogenetic analysis because they
270
largely (95%) represent the 527 unique xylem orthologs and it allowed us to include
271
the non-vascular plant moss as a reference. Three different trees (Additional data file
272
6A-C) were constructed using four alignment methods (see Methods), in which
11
273
ClustalX2 and Mafft generated the same tree configuration (Additional data file 6B).
274
Among these trees, morphologically related species are consistently clustered in the
275
same group. The trees constructed by Kalign (Additional data file 6A) and ClustalX2
276
(or Mafft) (Additional data file 6B) clearly separate angiosperms and gymnosperms
277
from lycophytes and moss. Based on tree B moss and Selaginella emerged at the
278
earliest time point followed by the two conifers (loblolly pine and white spruce) and
279
then by the two dicots (poplar and Arabidopsis). The monocot xylem transcriptome
280
appears to be the most evolutionarily distinct. In summary, the phylogenetic analysis
281
suggests that molecular evolution of xylem orthologs parallels plant evolution.
282
283
Identification of putative conifer-specific xylem genes
284
A total of 274 loblolly pine xylem unigenes which matched unigenes of white spruce
285
and sitka spruce (tblastx, E ≤ 10-50), but have no homologs in the gene models of
286
poplar, Arabidopsis, rice, Selaginella and moss, as well as no hits in any other
287
angiosperms in the uniprot known protein database (blastx, E > 10-5), were identified
288
as putative conifer-specific xylem unigenes (Additional data file 7). All conifer-
289
specific xylem unigenes are unknown protein sequences based on the uniprot known
290
protein database, and only 16% were assigned with GO terms in the TIGR gene index
291
database. This contrasts with 94% of the xylem orthologs annotated with GO terms.
292
The poor annotation of the conifer-specific xylem genes is likely due to the intense
293
functional characterisation of genes that has taken place in angiosperms.
294
295
Of the conifer-specific xylem unigenes several relatively abundant transcripts have
296
low homology to arabinogalactan-like proteins (AGPs), glycine/proline-rich proteins
297
(GPRP), metallothionein-like class II proteins, neurofilament triplet H proteins, zinc
12
298
finger proteins, cytochrome c1, etc (Additional data file 7). Genes with low similarity
299
to other cell wall proteins are present, such as alpha tubulin, cellulose synthase like
300
and extensin-like proteins. The conifer-specific xylem unigenes also include some
301
genes similar to transcription factors, i.e. Aux/IAA, NAC, CCAAT-binding, WRKY,
302
R2R3-MYB, BHLH and CCAAT-binding (Additional data file 7). These conifer-
303
specific xylem unigenes may share the functions of related homologues, but some
304
may have unique roles in gymnosperm wood formation.
305
306
Discussion
307
Gymnosperms and angiosperms have distinct patterns of xylem transcriptome
308
evolution
309
Our data revealed that the xylem transcriptome is highly conserved in conifers but has
310
undergone considerably more diversification in angiosperms. This suggests the xylem
311
transcriptome has a distinct pattern of evolution in angiosperms compared to
312
gymnosperms. Pine and spruce diverged 120-140 Ma [41], slightly earlier than the
313
radiation of angiosperms (100-120 Ma) [16, 17]. The rapid evolution of the
314
angiosperm xylem transcriptome suggests greater sensitivity to evolutionary forces in
315
flowering plants. Our results also showed that poplar has a relatively distinct genome
316
and xylem transcriptome, thus poplar is unlikely to be a useful model for investigating
317
wood formation and other developmental processes in gymnosperms. Mechanisms
318
underlying the diverse patterns of xylem transcriptome evolution in angiosperms and
319
gymnosperms are not well understood. Gene duplications and segmental chromosome
320
rearrangements may be the major driving forces for this diversification; however, the
321
role of genome size or gene number remains unclear.
322
13
323
The common ancestor of the xylem transcriptome in vascular plants
324
Our data demonstrated that the nucleotide and protein sequences of the xylem
325
transcriptome are highly diversified among different categories of vascular plants, but
326
that their functional domains are moderately to highly conserved in vascular plants.
327
This is consistent with the diversification of all vascular plants from a common
328
ancestor which is thought to have lived about 400 Ma [42]. Some of these conserved
329
patterns are also consistent with comparisons using the total transcriptome in previous
330
studies [32, 43]. The conserved functional domains of the xylem transcriptome
331
suggest the existence of a shared ancestral xylem transcriptome of vascular plants.
332
333
The common ancestral xylem transcriptome should be present in the earliest ancient
334
vascular plants, which produced hilate/trilete spores during the Late Ordovician [44].
335
The lycophyte Selaginella is among the closest living relatives of these plants and its
336
xylem transcriptome is likely to resemble most closely the ancestral xylem
337
transcriptome. The 5,047 (27%) xylem unigenes of loblolly pine with homologs (E
338
≤10-5) in the six vascular plants including Selaginella could largely represent the
339
ancestral xylem transcriptome. Because the vast majority (95%) of the ancestral
340
xylem transcriptome has homologs (E ≤10-5) in the moss genome, the xylem
341
transcriptome is likely to have evolved largely from the cell wall transcriptome of
342
non-vascular plants, which can be traced back as far as 450 Ma [31].
343
344
Conservative evolution of the xylem transcriptome in vascular plants
345
Xylem functions in the transport of water and solutes throughout the plant and
346
together with the phloem makes up the vascular transport tissues of plants. Our data
347
suggests that the xylem transcriptome is relatively more conserved than the total
348
transcriptome in vascular plants. Several functional gene groups within the xylem
14
349
transcriptome are significantly more conserved among vascular plants, including cell
350
wall genes, ancestral xylem genes, transcription factors and known protein genes.
351
This implies that xylem has evolved more slowly compared to other organs of
352
vascular plants such as leaves, flowers and seeds. On the other hand, the unknown
353
function and VSX genes have been more rapidly evolving in vascular plants. The
354
rapid evolution of vascular plant-specific xylem (VSX) genes contrasts with the
355
generally conservative evolution of the total xylem transcriptome, suggesting VSX
356
genes are particularly sensitive to evolutionary forces. These genes will be useful
357
targets for functional characterisation in order to understand xylem formation and
358
evolution in vascular plants.
359
360
Our data showed that the loblolly pine xylem transcriptome has significantly more
361
homologs in spruce (white spruce and sitka spruce) than in poplar and Eucalyptus,
362
thus it is possible to identify genes specific to gymnosperm wood formation
363
(softwoods). Surprisingly, the number of homologs of the loblolly pine xylem
364
transcriptome in woody angiosperms (poplar and Eucalyptus) is similar to that in
365
herbaceous angiosperms (Arabidopsis and rice), suggesting that a limited number of
366
xylem genes are specific to woody plants. Furthermore, the number of homologs of
367
the conifer xylem transcriptome in angiosperms is only slightly more than the number
368
in lycophytes and moss. The limited number of xylem genes specific to woody plants
369
suggests that differential transcriptional regulation may have played an important role
370
in xylem evolution; in a similar way that transcriptional regulation gives rise to plant
371
diversity [20].
372
373
Comparison of gene expression between softwoods and hardwoods
15
374
A previous study in Cryptomeria japonica identified 56 putative conifer-specific
375
transcripts, including three specific to reproductive organs and one (unknown)
376
specific to woody tissues [45]. Here we identified 274 conifer-specific xylem
377
unigenes, all of which are of unknown function. Transcripts with low homology to
378
some cell wall genes are abundant or present in the conifer-specific xylem unigenes
379
(Additional data file 7). This suggests that some cell wall genes, such as tubulin and
380
CesA, may be specific to conifer wood formation. These conifer-specific xylem genes
381
could provide clues to the molecular processes that give rise to the distinct cell wall
382
structures and chemical properties of softwoods compared to hardwoods. Genes
383
related to lignin biosynthesis are not present in the conifer-specific xylem unigenes,
384
which is consistent with the earlier conclusion that the conifer lignin pathway is
385
conserved in other vascular plants [3].
386
387
Transcripts similar to genes encoding arabinogalactan proteins (AGPs) are the most
388
abundant genes in the conifer-specific xylem unigenes. AGPs are a large class of
389
hydroxyproline-rich glycoproteins. The conifer-specific xylem AGPs (AGP4 and
390
AGP-like) identified in this study had no homologs with any of the 47 AGPs of
391
Arabidopsis [46-48]. Most AGPs are anchored to the plasma membrane [46] and
392
released into the cell wall after cleavage of the GPI anchor [49]. AGPs may act as cell
393
wall plasticizers, enlarging the pectin matrix, and allowing wall extension and cell
394
expansion [50]. In loblolly pine six AGPs including AGP4 were identified in xylem
395
tissues [51]. Radiata pine AGPs are located in the compound middle lamella of newly
396
developed tracheids [52]. The radiata pine EST resource included 11 AGPs, in which
397
PrAGP4 is highly abundant in earlywood libraries [13].
398
16
399
We identified 527 xylem orthologs in vascular plants, including many primary and
400
secondary cell wall genes as well as transcription factors. Conservation of cell wall
401
genes suggests that cell wall biosynthesis is a central event in xylem formation in
402
vascular plants. These xylem orthologs maintain the stability of the basic xylem
403
machinery in vascular plants and may regulate development of xylem structures and
404
properties shared by different vascular plants, such as the common features of
405
softwoods and hardwoods. COMT was previously thought to be one of three enzymes
406
(F5H/Cald5H, COMT and SAD) specifically involved in S lignin synthesis of
407
angiosperms [3]. The occurrence of COMT genes in the xylem orthologs of vascular
408
plants suggests their involvement in diverse functions other than S lignin synthesis.
409
410
Conclusions
411
The xylem transcriptome is highly conserved in conifer species, but the nucleotide
412
sequences of the xylem transcriptome are significantly distinct among angiosperms,
413
thus considerable evolution of the xylem transcriptome has occurred in angiosperms.
414
The functional domains of the xylem transcriptome are moderately to highly
415
conserved among vascular and non-vascular plants. This suggests that vascular plants
416
share an ancestral xylem transcriptome which is likely to have evolved predominantly
417
from the cell wall transcriptome of non-vascular plants. In comparison to the total
418
transcriptome from a wide range of tissues, the xylem transcriptome has evolved
419
conservatively in vascular plants. Several functional gene groups within the xylem
420
transcriptome, including cell wall genes, ancestral xylem genes, transcription factors
421
and known function genes, are relatively more conserved in vascular plants. A total of
422
527 xylem orthologs of vascular plants and 274 conifer-specific xylem genes were
423
identified in this study. These genes provide good targets for molecular investigations
17
424
of xylem formation and evolution in softwoods (conifers) and hardwoods (woody
425
angiosperms). The xylem orthologs are unevenly distributed within Arabidopsis
426
chromosomes, with eight hot spots detected. Phylogenetic analysis of the 501
427
ancestral xylem orthologs suggests that molecular evolution of the xylem
428
transcriptome has paralleled plant evolution.
429
430
Methods
431
Plant species
432
We selected ten species to represent three major classes of vascular plants
433
(gymnosperms, angiosperms and lycophytes). Four conifer species: Pinus radiata
434
(radiata pine), Pinus taeda (loblolly pine), Picea glauca (white spruce) and Picea
435
sitchensis (sitka spruce), represent gymnosperms. Four species of dicots including
436
Populus tremula × Populus tremuloides (hybrid aspen), Populus trichocarpa,
437
Eucalyptus grandis and Arabidopsis thaliana, and one monocot (Oryza sativa, rice)
438
represent angiosperms. The recently sequenced Selaginella moellendorffii [30]
439
represents lycophytes. The non-vascular plant Physcomitrella patens (moss) was
440
included as a reference. The seven woody species (four conifer and three woody dicot
441
species) both undergo primary and secondary xylem formation. Arabidopsis also
442
develops secondary xylem and has been used as a model system for secondary xylem
443
development [10, 11, 53-56]. The monocot rice and the lycophyte S. moellendorffii
444
only produce primary xylem, while the non-vascular plant moss has no xylem at all.
445
446
Public genomic resources
447
We previously developed a radiata pine xylem transcriptome resource with 3,304
448
xylem unigenes [13]. Unigenes of loblolly pine, white spruce, sitka spruce, hybrid
18
449
aspen, P. trichocarpa, Arabidopsis, rice and moss (Additional data file 8) were
450
retrieved from the NCBI UniGene database [57]. The gene indices of pine, spruce and
451
poplar (Additional data file 8) were collected from the TIGR gene index database
452
[58]. These gene indices represent the total transcriptome from a wide range of tissues
453
(such as stem, shoots, xylem, leaves, flowers, bark, roots and seeds, etc). A sub-set of
454
xylem gene indices of pine, spruce and poplar was separately retrieved from pure
455
xylem libraries. These included genes expressed in pure xylem tissues, while mixed
456
xylem tissues (such as shoots, cambial regions with phloem, etc) were not considered.
457
After removing redundant sequences a total of 14,527, 15,262 and 9,109 xylem gene
458
indices of pine, spruce and poplar were identified. To minimize the bias in
459
comparative genomics, pure xylem gene indices were reassembled using the same
460
method and parameters as used in radiata pine [13], resulting in 18,320, 12,489 and
461
7,991 pure xylem unigenes for pine, spruce and poplar, respectively.
462
463
The public transcriptome resources (unigenes and gene indices) are unlikely to
464
include all transcripts expressed in xylem formation due to sampling limitations.
465
Therefore, selected fully sequenced plant species (P. trichocarpa, E. grandis,
466
Arabidopsis, rice, S. moellendorffii and P. patens) were added for comparative
467
genomics in this study (Additional data file 8). Gene models of P. trichocarpa (v1.1),
468
S. moellendorffii (v1.0) and P. patens (moss) (v1.1) were downloaded from the JGI
469
website [59]. E. grandis scaffolds were downloaded from EucalyptusDB [18]. A.
470
thaliana gene models (TAIR8) were downloaded from the TAIR website [60] and O.
471
sativa gene models (v6.0) were downloaded from the MSU website [61].
472
473
Comparative genomic analysis
19
474
Blast programs (blastn, tblastx, blastx, blastp and tblastn) were used for sequence
475
comparisons. All blast searches were performed locally using BlastStation2 (TM
476
software, Inc., CA). Expected-value (E-value or E), sequence identity and bit scores
477
were collected for evaluating single sequence similarity. Percentage of hits, average
478
sequence identity and average bit score at different E-value cut-offs were calculated
479
for the transcriptome comparisons between different plant species.
480
481
The E-value cut-offs for inferring sequence similarity vary from 10-3 [17, 24] to 10-50
482
[62, 63], but 10-10 to 10-30 have been widely used [19, 32, 43, 64-67]. To ensure high
483
confidence as previously suggested [62] and to increase the likelihood of inferring
484
biological significance, we used the following thresholds of E-values: (a) E = 0, the
485
two sequences are identical and considered to derive from the same gene; (b) 10-50 ≤ E
486
< 0, the two sequences are highly similar and likely to be from the same gene family;
487
(c) 10-5 ≤ E < 10-50, the two sequences are similar in one or more regions along the
488
whole sequence and likely to contain the same functional domain(s) or motif(s); (d) E
489
< 10-5, the two sequences are likely to be unrelated genes. As a general rule, E ≤ 10-50
490
was interpreted in this study to indicate whole sequence similarity of two given
491
sequences (gene conservation or apparent homologs); and 10-5-10-50 was interpreted to
492
indicate partial sequence similarity (domain or motif conservation). However, the
493
possible influences of different blast programs and database size were not considered
494
in setting the E-value thresholds.
495
496
Phylogenetic analysis
497
The predicted amino acid sequences of ancestral xylem orthologs from loblolly pine
498
and white spruce, and protein sequences from the five model species (poplar,
20
499
Arabidopsis, rice, Selaginella and moss) were used for phylogenetic analysis. For
500
each species all sequences of ancestral xylem orthologs were combined into one
501
sequence prior to phylogenetic analysis. Sequence alignments were compared using
502
four methods: ClustalX2 [68], Kalign [68], Mafft [68] and Mega4 [69]. Phylogenetic
503
trees were built using the neighbour-joining algorithm with a bootstrap of 1000.
504
505
Abbreviations
506
Mb, million base pairs; Ma, million years ago; EST, expressed sequence tag; E, E-
507
value or expected-value; GO, gene ontology; PGI, pine gene index; SGI, spruce gene
508
index; PplGI, Populus gene index; PGI xylem, gene index from pure xylem tissues of
509
pines; SGI xylem, gene index from pure xylem tissues of spruces; PplGI xylem, gene
510
index from pure xylem tissues of Populus; VSX genes, vascular plant-specific xylem
511
genes; TF, transcription factor; CesA, cellulose synthase; AGP, arabinogalactan
512
protein; COMT, caffeic acid 3-O-methyltransferase.
513
514
Authors’ contributions
515
XL contributed the project idea, collected relevant sequence data, conducted the
516
comparative analysis and prepared the manuscript. SS and HW proposed and guided
517
the research project. All the authors have approved the final manuscript.
518
519
Additional data files
520
Additional data file 1 is a figure showing that known protein genes and transcription
521
factors in the xylem transcriptome are relatively more conserved in diverse plants.
522
Additional data file 2 is a table listing 527 xylem orthologs in vascular plants based on
21
523
the loblolly pine xylem unigenes with homologs (tblastx or blastx, E ≤ 10-50) in the
524
unigenes or gene models of white spruce, sitka spruce, poplar, Arabidopsis, rice and
525
Selaginella. Additional data file 3 is a figure showing the distribution of xylem
526
orthologs on each chromosome of Arabidopsis, rice and poplar, and its significance
527
(P-value) compared to all loci in each chromosome. Additional data file 4 is a figure
528
showing locations of xylem orthologs on rice chromosomes (A: 3,057 homologs
529
based on 1E-50; B: 8,458 homologs based on 1E-5; C: 3,000 randomly selected loci).
530
Additional data file 5 is a figure showing locations of xylem orthologs on poplar
531
chromosomes (A: 3,523 homologs based on 1E-50; B: 7,436 homologs based on 1E-
532
5; C: 3,200 randomly selected loci). Additional data file 6 is a figure showing
533
phylogenetic trees for 501 ancestral xylem orthologs using four alignment methods (A:
534
Kalign; B: ClustalX2 or Mafft; C: Mega 4). Additional data file 7 is a table listing 274
535
putative conifer-specific xylem unigenes. Additional data file 8 is a table summarising
536
the public genomic resources of 11 plant species used in this study.
537
538
Acknowledgements
539
This work was funded by Forest and Wood Products Australia (FWPA), ArborGen
540
LLC, the Southern Tree Breeding Association (STBA), Queensland Department of
541
Primary Industry (QDPI) and the Commonwealth Scientific and Industrial Research
542
Organization (CSIRO). We thank Iain Wilson, Shannon Dillon, Colleen MacMillan,
543
Bryan Clarke and Jason Bragg for their critical comments on the manuscript.
544
545
References
546
547
1.
Sperry JS: Evolution of water transport and xylem structure. Int J Plant
Sci 2003, 164(3):S115-S127.
22
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
Boyce CK, Zwieniecki MA, Cody GD, Jacobsen C, Wirick S, Knoll AH,
Holbrook NM: Evolution of xylem lignification and hydrogel transport
regulation. Proc Natl Acad Sci U S A 2004, 101(50):17555-17558.
Peter G, Neale D: Molecular basis for the evolution of xylem lignification.
Curr Opin Plant Biol 2004, 7(6):737-742.
Li LG, Cheng XF, Leshkevich J, Umezawa T, Harding SA, Chiang VL: The
last step of syringyl monolignol biosynthesis in angiosperms is regulated
by a novel gene encoding sinapyl alcohol dehydrogenase. Plant Cell 2001,
13(7):1567-1585.
Li LG, Popko JL, Umezawa T, Chiang VL: 5-Hydroxyconiferyl aldehyde
modulates enzymatic methylation for syringyl monolignol formation, a
new view of monolignol biosynthesis in angiosperms. J Biol Chem 2000,
275(9):6537-6545.
Osakabe K, Tsao CC, Li LG, Popko JL, Umezawa T, Carraway DT, Smeltzer
RH, Joshi CP, Chiang VL: Coniferyl aldehyde 5-hydroxylation and
methylation direct syringyl lignin biosynthesis in angiosperms. Proc Natl
Acad Sci U S A 1999, 96(16):8955-8960.
Weng JK, Li X, Stout J, Chapple C: Independent origins of syringyl lignin
in vascular plants. Proc Natl Acad Sci U S A 2008, 105(22):7887-7892.
Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B,
Bhalerao R, Larsson M, Villarroel R et al: Gene discovery in the woodforming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc
Natl Acad Sci U S A 1998, 95:13330-13335.
Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E,
Campbell MM, Sederoff R et al: Analysis of xylem formation in pine by
cDNA sequencing. Proc Natl Acad Sci U S A 1998, 95(16):9693-9698.
Zhao CS, Craig JC, Petzold HE, Dickerman AW, Beers EP: The xylem and
phloem transcriptomes from secondary tissues of the Arabidopsis roothypocotyl. Plant Physiol 2005, 138(2):803-818.
Oh S, Park S, Han KH: Transcriptional regulation of secondary growth in
Arabidopsis thaliana. J Exp Bot 2003, 54(393):2709-2722.
Pavy N, Paule C, Parsons L, Crow JA, Morency MJ, Cooke J, Johnson JE,
Noumen E, Guillet-Claude C, Butterfield Y et al: Generation, annotation,
analysis and database integration of 16,500 white spruce EST clusters.
BMC Genomics 2005, 6:144.
Li X, Wu HX, Dillon SK, Southerton SG: Generation and analysis of
expressed sequence tags from six developing xylem libraries in Pinus
radiata D. Don. BMC Genomics 2009, 10:41.
Pavy N, Boyle B, Nelson C, Paule C, Giguere I, Caron S, Parsons LS, Dallaire
N, Bedon F, Berube H et al: Identification of conserved core xylem gene
sets: conifer cDNA microarray development, transcript profiling and
computational analyses. New Phytol 2008, 180(4):766-786.
Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, Feldblyum T,
Nierman W, Benito MI, Lin XY et al: Analysis of the genome sequence of
the flowering plant Arabidopsis thaliana. Nature 2000, 408(6814):796-815.
Yu J, Hu SN, Wang J, Wong GKS, Li SG, Liu B, Deng YJ, Dai L, Zhou Y,
Zhang XQ et al: A draft sequence of the rice genome (Oryza sativa L. ssp
indica). Science 2002, 296(5565):79-92.
Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U,
Putnam N, Ralph S, Rombauts S, Salamov A et al: The genome of black
23
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:15961604.
EucalyptusDB. [http://eucalyptusdb.bi.up.ac.za/ ]
Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y,
Yoshimura K, Tsumura Y: Comparative analysis of expressed sequence
tags of conifers and angiosperms reveals sequences specifically conserved
in conifers. Plant Mol Biol 2005, 59(6):895-907.
Quesada T, Li Z, Dervinis C, Li Y, Bocock PN, Tuskan GA, Casella G, Davis
JM, Kirst M: Comparative analysis of the transcriptomes of Populus
trichocarpa and Arabidopsis thaliana suggests extensive evolution of gene
expression regulation in angiosperms. New Phytol 2008, 180(2):408-420.
Liu H, Sachidanandam R, Stein L: Comparative genomics between rice and
Arabidopsis shows scant collinearity in gene order. Genome Res 2001,
11(12):2020-2026.
Feuillet C, Keller B: Comparative Genomics in the grass family: Molecular
characterization of grass genome structure and evolution. Annals of
Botany 2002, 89(1):3-10.
Paterson AH, Bowers JE, Feltus FA, Tang HB, Lin LF, Wang XY:
Comparative genomics of grasses promises a bountiful harvest. Plant
Physiol 2009, 149(1):125-131.
Nishiyama T, Fujita T, Shin-I T, Seki M, Nishide H, Uchiyama I, Kamiya A,
Carninci P, Hayashizaki Y, Shinozaki K et al: Comparative genomics of
Physcomitrella patens gametophytic transcriptome and Arabidopsis
thaliana: Implication for land plant evolution. Proc Natl Acad Sci U S A
2003, 100(13):8007-8012.
Paterson AH, Bowers JE, Burow MD, Draye X, Elsik CG, Jiang CX, Katsar
CS, Lan TH, Lin YR, Ming RG et al: Comparative genomics of plant
chromosomes. Plant Cell 2000, 12(9):1523-1539.
Caicedo AL, Purugganan MD: Comparative plant genomics. Frontiers and
prospects. Plant Physiol 2005, 138(2):545-547.
Bowman JL, Floyd SK, Sakakibara K: Green genes - Comparative genomics
of the green branch of life. Cell 2007, 129(2):229-234.
Campbell MA, Zhu W, Jiang N, Lin H, Ouyang S, Childs KL, Haas BJ,
Hamilton JP, Buell CR: Identification and characterization of lineagespecific genes within the Poaceae. Plant Physiol 2007, 145(4):1311-1322.
Yang XH, Jawdy S, Tschaplinski TJ, Tuskan GA: Genome-wide
identification of lineage-specific genes in Arabidopsis, Oryza and Populus.
Genomics 2009, 93(5):473-480.
Selaginella
moellendorffii
gene
models.
[http://genome.jgipsf.org/Selmo1/Selmo1.home.html]
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H,
Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y et al: The
Physcomitrella genome reveals evolutionary insights into the conquest of
land by plants. Science 2008, 319(5859):64-69.
Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C,
Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes
from wood-forming tissues of loblolly pine (Pinus taeda L.) with
Arabidopsis thaliana. Proc Natl Acad Sci U S A 2003, 100(12):7383-7388.
Ralph SG, Chun HJE, Kolosova N, Cooper D, Oddy C, Ritland CE,
Kirkpatrick R, Moore R, Barber S, Holt RA et al: A conifer genomics
24
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality,
sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis).
BMC Genomics 2008, 9:484.
Brenner ED, Katari MS, Stevenson DW, Rudd SA, Douglas AW, Moss WN,
Twigg RW, Runko SJ, Stellari GM, McCombie WR et al: EST analysis in
Ginkgo biloba: an assessment of conserved developmental regulators and
gymnosperm specific genes. BMC Genomics 2005, 6:143.
Girke T, Lauricha J, Tran H, Keegstra K, Raikhel N: The cell wall navigator
database. A systems-based approach to organism-unrestricted mining of
protein families involved in cell wall metabolism. Plant Physiol 2004,
136(2):3003-3008.
Guillaumie S, San-Clemente H, Deswarte C, Martinez Y, Lapierre C,
Murigneux A, Barriere Y, Pichon M, Goffner D: MAIZEWALL. Database
and developmental gene expression profiling of cell wall biosynthesis and
assembly in maize. Plant Physiol 2007, 143(1):339-363.
Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu XC, He
K, Luo JC: PlantTFDB: a comprehensive plant transcription factor
database. Nucleic Acids Res 2008, 36:D966-D969.
Barakat A, Matassi G, Bernardi G: Distribution of genes in the genome of
Arabidopsis thaliana and its implications for the genome organization of
plants. Proc Natl Acad Sci U S A 1998, 95(17):10044-10049.
Vision TJ, Brown DG, Tanksley SD: The origins of genomic duplications in
Arabidopsis. Science 2000, 290(5499):2114-2117.
Segmental
chromosome
duplication
of
Arabidopsis.
[http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml]
Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M,
Lascoux M, Gyllenstrand N: Multilocus patterns of nucleotide diversity,
linkage disequilibrium and demographic history of Norway spruce [Picea
abies (L.) Karst]. Genetics 2006, 174(4):2095-2105.
Palmer JD, Soltis DE, Chase MW: The plant tree of life: An overview and
some points of view. Am J Bot 2004, 91(10):1437-1445.
Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM,
Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH et al: A Populus
EST resource for plant functional genomics. Proc Natl Acad Sci U S A
2004, 101(38):13951-13956.
Steemans P, Le Herisse A, Melvin J, Miller MA, Paris F, Verniers J, Wellman
CH: Origin and radiation of the earliest vascular land plants. Science
2009, 324(5925):353-353.
Ujino-Ihara T, Tsumura Y: Screening for genes specific to coniferous
species. Tree Physiol 2008, 28(9):1325-1330.
Schultz CJ, Johnson KL, Currie G, Bacic A: The classical arabinogalactan
protein gene family of Arabidopsis. Plant Cell 2000, 12(9):1751-1767.
Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A:
Using genomic resources to guide research directions. The
arabinogalactan protein gene family as a test case. Plant Physiol 2002,
129(4):1448-1463.
Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ: The complex
structures of arabinogalactan-proteins and the journey towards
understanding function. Plant Mol Biol 2001, 47(1-2):161-176.
25
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
Knox JP: Up against the wall: arabinogalactan-protein dynamics at cell
surfaces - Commentary. New Phytol 2006, 169(3):443-445.
Coimbra S, Almeida J, Junqueira V, Costa ML, Pereira LG: Arabinogalactan
proteins as molecular markers in Arabidopsis thaliana sexual
reproduction. J Exp Bot 2007, 58(15-16):4027-4035.
Yang SH, Wang HY, Sathyan P, Stasolla C, Loopstra CA: Real-time RTPCR analysis of loblolly pine (Pinus taeda) arabinogalactan-protein and
arabinogalactan-protein-like genes. Physiol Plant 2005, 124(1):91-106.
Putoczki TL, Pettolino F, Griffin MDW, Moller R, Gerrard JA, Bacic A,
Jackson SL: Characterization of the structure, expression and function of
Pinus radiata D. Don arabinogalactan-proteins. Planta 2007, 226:11311142.
Zhao CS, Johnson BJ, Kositsup B, Beers EP: Exploiting secondary growth
in Arabidopsis. Construction of xylem and bark cDNA libraries and
cloning of three xylem endopeptidases. Plant Physiol 2000, 123(3):11851196.
Chaffey N, Cholewa E, Regan S, Sundberg B: Secondary xylem
development in Arabidopsis: a model for wood formation. Physiol Plant
2002, 114(4):594-600.
Funk V, Kositsup B, Zhao CS, Beers EP: The Arabidopsis xylem peptidase
XCP1 is a tracheary element vacuolar protein that may be a papain
ortholog. Plant Physiol 2002, 128(1):84-94.
Nieminen KM, Kauppinen L, Helariutta Y: A weed for wood? Arabidopsis as
a genetic model for xylem development. Plant Physiol 2004, 135(2):653659.
NCBI UniGene database. [ftp://ftp.ncbi.nih.gov/repository/UniGene/]
The
TIGR
plant
gene
index
database.
[http://compbio.dfci.harvard.edu/tgi/plant.html]
Gene models of Populus trichocarpa (v1.1), Selaginella moellendorffii
(v1.0) and Physcomitrella patens (moss) (v1.1). [http://genome.jgi-psf.org/]
Arabidopsis
thaliana
gene
models
(TAIR8).
[ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/]
Oryza
sativa
(rice)
gene
models
(v6.0).
[ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotati
on_dbs/]
Brenner SE: Practical database searching. Bioinformatics 1998:9-12.
Shimamura K, Ishimizu T, Nishimura K, Matsubara K, Kodama H, Watanabe
H, Hase S, Ando T: Analysis of expressed sequence tags from Petunia
flowers. Plant Sci 2007, 173(5):495-500.
Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow
TM, Mueller LA, Landherr LL, Hu Y et al: Floral gene resources from basal
angiosperms for comparative genomics research. BMC Plant Biology 2005,
5:15.
Krutovsky KV, Elsik CG, Matvienko M, Kozik A, Neale DB: Conserved
ortholog sets in forest trees. Tree Genetics & Genomes 2007, 3(1):61-70.
Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD: Identification,
analysis, and utilization of conserved ortholog set markers for
comparative genomics in higher plants. Plant Cell 2002, 14(7):1457-1467.
Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S,
Thibaud-Nissen F, Hamilton J, Childs K et al: Expressed Sequence Tags
26
747
748
749
750
751
752
753
754
755
756
68.
69.
from loblolly pine embryos reveal similarities with angiosperm
embryogenesis. Plant Mol Biol 2006, 62(4-5):485-501.
Labarga A, Valentin F, Anderson M, Lopez R: Web services at the
European Bioinformatics Institute. Nucleic Acids Res 2007, 35:W6-W11.
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary
genetics analysis (MEGA) software version 4.0. Mol Biol Evol 2007,
24(8):1596-1599.
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
27
777
Figure legends
778
Figure 1. Comparisons of conifer xylem unigenes with genes in other species.
779
Xylem unigenes of radiata pine, loblolly pine and white spruce were blasted against
780
xylem unigenes, gene indices, gene models or scaffolds of other plant species.
781
Percentage of hits is presented in the Y-axis at E-value cut-offs ranging from 1E-5 to
782
0 in the X-axis. A: Radiata pine vs. other species (blastn); B: Loblolly pine vs. other
783
species (blastn); C: White spruce vs. other species (blastn); D: Radiata pine vs. other
784
species (tblastx or blastx); E: Loblolly pine vs. other species (tblastx or blastx); F:
785
White spruce vs. other species (tblastx or blastx).
786
787
Figure 2. Average nucleotide identity between pine unigenes and respective
788
homologs in other species. Xylem unigenes of radiata pine and loblolly pine were
789
blasted using blastn against unigenes of other species, including loblolly pine, white
790
spruce, sitka spruce, hybrid aspen, P. trichocarpa, Arabidopsis, rice and moss.
791
Homologs at E-value cut-offs ranging from 0 to 1E-5 were collected and their average
792
nucleotide identities were calculated and presented. A: Radiata pine xylem unigenes
793
vs. homologous unigenes of other species; B: Loblolly pine xylem unigenes vs.
794
homologous unigenes of other species.
795
796
Figure 3. Comparisons of genes between poplar and other species. Poplar
797
xylem unigenes and gene models were blasted against gene indices, gene models or
798
scaffolds of other plant species. Percentage of hits is presented in the Y-axis at E-
799
value cut-offs ranging from 1E-5 to 0 in the X-axis. A: Poplar xylem unigenes vs.
800
other species (blastn); B: Poplar xylem unigenes vs. other species (tblastx or blastx);
28
801
C: Poplar gene models vs. other species (blastn); D: Poplar gene models vs. other
802
species (tblastn or blastp).
803
804
Figure 4. The xylem transcriptome is relatively more conserved in vascular and
805
non-vascular plants. Xylem gene indices and total gene indices of pine, spruce and
806
poplar were blasted against the gene models of Populus, Arabidopsis, rice, Selaginella
807
and moss using blastn and blastx. Percentage of hits is presented in the Y-axis at E-
808
value cut-offs of 1E-50 and 1E-5 in the X-axis. Statistical tests for the different
809
number of hits observed with xylem and total transcriptome were performed assuming
810
a binomial distribution. Almost all comparisons showed statistical significances
811
(P<0.001) (data not show), with only two exceptions from the blastn comparisons
812
(1E-50) between the two conifer species and poplar. A: Blastn; B: Blastx.
813
814
Figure 5. Cell wall genes and ancestral xylem genes are relatively more
815
conserved in diverse plants. Putative cell wall genes, non-cell-wall genes, vascular
816
plant-specific xylem (VSX) genes and ancestral xylem genes from xylem unigenes of
817
radiata pine, loblolly pine, white spruce and poplar were blasted using tblastx with
818
pine and spruce gene indices as well as blastx with gene models of poplar,
819
Arabidopsis, rice, Selaginella and moss. Percentage of hits is presented in the Y-axis
820
at E-value cut-offs of 0, 1E-50 and 1E-5 in the X-axis. Different numbers of hits
821
observed with cell wall and non-cell-wall genes, as well as with VSX and ancestral
822
xylem genes were statistically tested assuming a binomial distribution. Almost all
823
comparisons showed statistical significances (P<0.001) (data not show), except for a
824
few comparisons among conifers at 1E-5. A: Radiata pine; B: Loblolly pine; C: White
825
spruce; D: Poplar.
29
826
827
Figure 6. Number of putative xylem orthologs in different groups of land plants.
828
Loblolly pine xylem unigenes (18,320) were blasted using tblastx against the unigenes
829
of white spruce and sitka spruce, and the gene models of poplar, Arabidopsis, rice,
830
Selaginella and moss. The number of putative xylem orthologs common to different
831
plant groups is presented at two E-value cut-offs (1E-50 and 1E-5). Conifers 1:
832
loblolly pine plus white spruce; Conifers 2: conifers 1 plus sitka spruce; Woody
833
plants: conifers 2 plus poplar; Seed plants 1: woody plants plus Arabidopsis; Seed
834
plants 2: seed plants 1 plus rice; Vascular plants: seed plants 2 plus Selaginella; Land
835
plants: vascular plants plus moss. The xylem orthologs shared in conifers 2, woody
836
plants, seed plant 1, seed plants 2 and vascular plants represent genes specific to
837
conifers, woody plants, seed plants, lower seed plants (lacking secondary xylem) and
838
vascular plants, respectively. The high proportion of orthologs common to land plants
839
suggests that the ancestral xylem genome was largely present in non-vascular plants.
840
841
Figure 7. Hot spots of putative xylem orthologs on the chromosomes of
842
Arabidopsis. The 1,003 loblolly pine xylem unigenes highly conserved in other
843
vascular plants (xylem orthologs) have strong homology (1E-50) with 3,115 loci and
844
moderate homology (1E-5) with 6,563 loci in the Arabidopsis genome. Positioning of
845
these homologs on the five chromosomes of Arabidopsis revealed eight hot spots
846
(showed as numbers 1 to 8 with red colour) of putative Arabidopsis xylem orthologs.
847
A: 3,115 highly homologous loci; B: 6,563 moderately homologous loci; C: 6,000
848
randomly selected loci.
849
850
30