Comparative genomics reveals conservative evolution of the xylem transcriptome in vascular plants

Xinguo Li, Harry X. Wu*, and Simon G. Southerton*

CSIRO Plant Industry, GPO Box 1600, Canberra ACT 2601, Australia

* Corresponding authors

Email addresses:
XL: [email protected]
HW: [email protected]
SS: [email protected]

Abstract
Background: Wood is a valuable natural resource and a major carbon sink. Wood formation is an important developmental process in vascular plants which played a crucial role in plant evolution. Although genes involved in xylem formation have been investigated, the molecular mechanisms of xylem evolution are not well understood. We use comparative genomics to examine evolution of the xylem transcriptome to gain insights into xylem evolution.
Results: The xylem transcriptome is highly conserved in conifers, but considerably evolved in angiosperms. The functional domains of genes in the xylem transcriptome are moderately to highly conserved in vascular plants, suggesting the existence of a common ancestral xylem transcriptome. Compared to the total transcriptome derived from a range of tissues, the xylem transcriptome is relatively conserved in vascular plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known proteins and transcription factors are relatively more conserved in vascular plants. A total of 527 putative xylem orthologs were identified, which are unevenly distributed in Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis revealed that evolution of the xylem transcriptome has paralleled plant evolution. We also identified 274 conifer-specific xylem unigenes, all of which are of unknown function. These xylem orthologs and conifer-specific unigenes are likely to have played a crucial role in xylem evolution.
Conclusions: Conifers have highly conserved xylem transcriptomes, while angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a common ancestral xylem transcriptome. In vascular plants the xylem transcriptome is more conserved than the total transcriptome.

Background
The evolution of xylem was a critical step that allowed vascular plants to colonize vast areas of the earth’s terrestrial surface. Xylem plays a crucial role in the transport of water and nutrients and provides mechanical support for vascular plants. Herbaceous vascular plants develop primary xylem, and woody plants also produce secondary xylem (or wood). Evolution from tracheids to vessels reflects a more efficient way for angiosperms to develop secondary xylem [1-3]. Primary xylem consists of cellulose, hemicellulose, pectin and proteins, while secondary xylem contains higher amounts of cellulose and lignin. From a practical point of view, wood represents a renewable natural resource for the timber, fibre and biofuel industries and it is a major carbon sink in natural ecosystems.

In the last few decades the molecular basis of xylem formation and evolution has been investigated. Syringyl (S) lignin biosynthesis was found to be an evolved lignin pathway in angiosperms, representing an addition to the ancient and predominant guaiacyl (G) lignin pathway conserved in land plants [3-6]. However, the S lignin pathway was also recently observed in lycophytes [7], suggesting a complex evolutionary history of lignin biosynthesis in vascular plants. Genes involved in wood formation have been identified in many plant species [8-13], allowing investigation of xylem evolution at the transcriptome level. A core xylem gene set was identified in white spruce, among which 31 transcripts are highly conserved in Arabidopsis [14].
The expanding genomic resources of model plant species [15-18] are invaluable for exploring many aspects of plant development and evolution, including xylem formation and evolution. Recent studies suggest whole-genome duplication and reorganization [15-17] have been a major driving force in plant evolution.

Comparative genomics is a powerful tool for investigating plant evolution at the whole-genome level [19-27]. Lower collinearity across dicots and monocots [16, 21] and lineage-specific genes have been observed using comparative genomics [28, 29]. Comparisons between rice and Arabidopsis showed that 50% of the rice genome was homologous to Arabidopsis [16], while as much as 88% of the poplar genome shares homology with Arabidopsis [17]. Comparative genomics of poplar with Arabidopsis revealed at least 11,666 protein-coding genes were present in the ancestral eurosid genome [17]. The recently sequenced Selaginella moellendorffii [30], moss [31] and Eucalyptus grandis [18] provide new opportunities for comparative genomics in vascular plants.

Many commercial crops and forest trees do not have sequenced genomes; thus, comparative genomics in these species must rely on studies of the transcriptome (expressed sequence tags, ESTs). Using the transcriptome approach about half of loblolly pine xylem unigenes did not match Arabidopsis unigenes [32], and 39-45% of pine contigs had no hits in the genomes of Arabidopsis, rice and poplar [33]. Eighty percent of white spruce and 32-36% of sitka spruce unigenes lacked homologs in poplar, Arabidopsis and rice [12, 33]. Similar results were also observed in other comparisons between gymnosperms and angiosperms [19, 34]. In contrast, comparison of different gymnosperm species revealed highly conserved transcriptomes. For example, about 84% of the white spruce transcripts had matches in the Pine Gene Index [12], and 60-80% of spruce contigs had homologs in loblolly pine and other gymnosperms [33].

Although comparative genomics has increased our understanding of plant evolution at the genome level, most studies to date have focussed on the entire genome sequence, and/or mixed EST resources developed from a range of tissues. In order to examine xylem evolution in vascular plants we focused our attention on genes expressed specifically in xylem. We selected ten species (see Methods) to represent different categories of vascular plants, including gymnosperms (pine and spruce), angiosperms (woody dicots, herbaceous dicots, and monocots) and lycophytes. The non-vascular plant moss was included as a reference for xylem evolution. Up-to-date public genome sequences and xylem transcriptomes of selected plant species were used for comparative genomic analysis, aiming to explore the evolution of the xylem transcriptome in vascular plants.

Results
The xylem transcriptome is highly conserved in conifer species
Comparisons of xylem unigenes of the three conifer species (radiata pine, loblolly pine and white spruce) with other vascular plants and moss are shown in Figure 1A-F. At the nucleotide level 78-82% (E ≤ 10^-5) and 63-66% (E ≤ 10^-50) of radiata pine xylem unigenes have homologs in the xylem unigenes of loblolly pine and white spruce, while 93-95% (E ≤ 10^-5) and 74-89% (E ≤ 10^-50) have homologs in the gene indices of pine and spruce (Figure 1A). Thus, the radiata pine xylem transcriptome is highly conserved in other conifer species. Further analysis with loblolly pine and white spruce revealed that conifer xylem transcriptomes are highly conserved at both the nucleotide (Figure 1B and 1C) and deduced amino acid sequence level (Figure 1D-F).

Conifer xylem transcriptomes are considerably distinct from angiosperms and lycophytes.
At the nucleotide level, only 15-32% (E ≤ 10^-5) and 1-4% (E ≤ 10^-50) of radiata pine xylem unigenes have homologs in the gene models or scaffolds of poplar, Eucalyptus, Arabidopsis, rice, Selaginella and moss (Figure 1A). Similarly low numbers of homologs were identified in analyses with loblolly pine (Figure 1B) and white spruce (Figure 1C). Unsurprisingly, the deduced amino acid sequences of conifer xylem unigenes have relatively more homologs in non-coniferous plants (Figure 1D-F). However, the number of homologs is still considerably lower than the number of matches among the three conifers (Figure 1D-F). Therefore, at both the nucleotide and amino acid level the conifer xylem transcriptome is relatively distinct from angiosperms, lycophytes and moss.

We examined the average nucleotide identity among the homologous xylem unigenes shared by the different conifer species. The average nucleotide identity between radiata pine and loblolly pine is 97.5%, and 91% between radiata pine and the two spruce species (Figure 2A). Lower nucleotide identity (81-83%) was observed in the homologous unigenes of radiata pine and angiosperms (Populus, Arabidopsis and rice) or moss (Figure 2A). These results were also confirmed in analyses in loblolly pine (Figure 2B), providing additional evidence for xylem transcriptome conservation in conifers, and its divergence in non-coniferous plants.

The xylem transcriptome has evolved considerably in angiosperms
Poplars are woody angiosperms and are used as a model tree species for wood formation. Surprisingly, at the level of nucleotide sequence less than 5% of poplar xylem unigenes have homologs (E ≤ 10^-50) in the scaffolds or gene models of the three angiosperms (Eucalyptus, Arabidopsis and rice) (Figure 3A).
Poplar xylem unigenes also have few homologs in the gene indices of pine or spruce and in the gene models of Selaginella and moss (Figure 3A). These results suggest that the poplar xylem transcriptome is distinct, and that the xylem transcriptome has undergone considerable evolution in angiosperms. This contrasts with the highly conserved xylem transcriptomes of gymnosperms we observed earlier (Figure 1). Furthermore, poplar also has a unique genome (at the nucleotide level) among the species examined in this study (Figure 3C). However, the functional gene sequences of the poplar xylem transcriptome (Figure 3B) and the entire genome (Figure 3D) are moderately (36-50% at E ≤ 10^-50) conserved, while their functional domains are highly conserved (77-85% at E ≤ 10^-5) in other vascular and non-vascular plants. These results suggest that land plants share an ancestral genome and an ancestral xylem transcriptome.

The xylem transcriptome is relatively more conserved in vascular plants
We examined the pattern of the xylem transcriptome evolution by comparing it with the total transcriptome derived from a range of tissues. Although the xylem transcriptomes of pine, spruce and poplar have few homologs in the gene models of Arabidopsis, rice, Selaginella and moss (1.2-5% at E ≤ 10^-50), there are about two to four times more homologs than there are for the total transcriptome (Figure 4A). When using a lower stringency E-value (≤ 10^-5), the xylem transcriptomes of pine, spruce and poplar have more homologs (about 20-60%) in the above four model species compared to the total transcriptome (9-34%). Thus, at the nucleotide level the xylem transcriptome is relatively more conserved in vascular and non-vascular plants.
Interestingly, both the conifer xylem and the total transcriptomes have few homologs in poplar (a woody species) and in herbaceous species (Arabidopsis, rice, Selaginella and moss) (Figure 4A). This suggests that conifer xylem formation is distinct from xylem formation in woody angiosperms.

The conservative evolution of the xylem transcriptome was more clearly observed when using deduced amino acid sequences (Figure 4B). For example, 37-46% of the conifer xylem transcriptome matched (E ≤ 10^-50) the genomes of angiosperms (Populus, Arabidopsis and rice), lycophyte and moss; nearly two times more than the matches with the total conifer transcriptome (Figure 4B). Similar results were also obtained with the poplar xylem transcriptome in the comparisons with the total transcriptome (Figure 4B). Therefore, the functional gene sequences in the xylem transcriptome are significantly more conserved in vascular plants.

Different functional gene groups in the xylem transcriptome have distinct patterns of evolution
We divided the xylem transcriptome into different functional groups for investigating their diverse patterns of evolution. The xylem unigenes with matches (blastx, E ≤ 10^-5) in the moss gene models are defined as ancestral xylem genes, and those with no matches as vascular plant-specific xylem (VSX) genes. Among the ancestral xylem genes derived from radiata pine, loblolly pine and white spruce, 96-99%, 78-94% and 48-60% (E ≤ 10^-5), or 41-48%, 20-26% and 28-35% (E ≤ 10^-50), respectively, have homologs (blastx) in the genomes of the three angiosperms and the lycophyte (Figure 5A-C). In contrast, only 8-28% (E ≤ 10^-5) and 0.4-1% (E ≤ 10^-50) of the conifer VSX genes are homologous to the genomes of these model vascular plants (Figure 5A-C). These results suggest that the ancestral xylem genes are moderately to highly conserved in non-coniferous vascular plants, while considerable evolution has occurred in VSX genes among conifers, angiosperms and lycophytes. Similar patterns were observed in comparisons of VSX and ancestral xylem genes with the poplar xylem transcriptome (Figure 5D).

Cell wall genes were identified according to the CellWallNavigator [35] and MAIZEWALL [36] databases. We compared the level of conservation of cell wall genes to non-cell-wall genes (Figure 5A-D). In the four woody species (radiata pine, loblolly pine, white spruce and poplar) the cell wall genes have 1.5 to two times more homologs than the non-cell-wall genes in the genomes of the four herbaceous model plants (Figure 5A-D). This suggests that cell wall genes have been highly conserved across diverse land plants. Comparisons of transcription factors (TFs) (based on PlantTFDB [37]) with non-TFs and known functional genes with unknowns revealed significantly more conservation of transcription factors and known functional genes among different groups of vascular and non-vascular plants (Additional data file 1).

Identification of putative xylem orthologs in vascular plants
Putative xylem orthologs conserved in diverse plant groups were identified using deduced amino acid or protein sequences (Additional data file 2), and their numbers are briefly presented at two E-value cut-offs (≤ 10^-50 and ≤ 10^-5) in Figure 6. The xylem orthologs (9,164 at E ≤ 10^-5 or 3,460 at E ≤ 10^-50) shared by loblolly pine, white spruce and sitka spruce represent a common gene set expressed in conifer wood formation. The xylem orthologs (6,641 at E ≤ 10^-5 or 1,339 at E ≤ 10^-50) common to the three conifers and poplar are putative woody plant xylem orthologs.
Surprisingly, the number of putative xylem orthologs in other plant groups is only slightly smaller (Figure 6), suggesting that very limited numbers of xylem orthologs are specific to each plant group. The majority (93-96%) of the vascular plant xylem orthologs have matches in the gene models of a non-vascular plant (moss). This suggests that the xylem transcriptome of vascular plants has largely evolved from a subset of genes found in non-vascular plants. This is likely to be a group of genes predominantly involved in cell wall biosynthesis that have been recruited to synthesise secondary xylem in higher plants.

Of the 1,003 loblolly pine xylem unigenes with homologs (tblastx or blastx, E ≤ 10^-50) in the six vascular plants, 94% and 100% have hits in the uniprot and TIGR databases, respectively, and 94% are assigned with GO terms (Additional data file 2). Among these unigenes 349 (35%) were identified as cell wall genes, and the most abundant gene family is tubulin (16 unigenes). Cellulose synthase is also abundant (10). Several primary wall genes are abundant including pectate lyase (8), pectinesterase (7), XET (7), expansin (6) and peroxidase (5). Lignin biosynthesis-related genes are well represented in the cell wall gene lists, including SAMS (11), laccase (9), methionine synthase (cobalamin-independent) (6), 4CL (3), COMT (2), CAD (2) and CCoAMT (3). In addition, 49 transcription factors were found, including NAC, PHD, HB, MYB, LIM, CCCH, etc. The large number of transcription factors suggests that wood formation involves considerable transcriptional regulation. Interestingly, the aquaporins (16 unigenes) are one of the largest gene families in the 1,003 unigenes, reflecting the importance of water transport in xylem formation.
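The homolog percentages and ortholog counts reported above depend on the E-value cut-off applied to each BLAST search. As a minimal sketch of that bookkeeping (not the pipeline used in this study), the Python below assumes BLAST hits in the standard 12-column tabular layout, with the query ID in column 1 and the E-value in column 11. The function names and the toy rows are hypothetical.

```python
def best_evalues(blast_rows):
    """Return {query_id: smallest E-value over all of that query's hits}."""
    best = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query, evalue = fields[0], float(fields[10])  # column 11 = E-value
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def fraction_with_homolog(best, n_queries, cutoff):
    """Fraction of all query unigenes with at least one hit at E <= cutoff."""
    hits = sum(1 for evalue in best.values() if evalue <= cutoff)
    return hits / n_queries


# Toy input (hypothetical): four unigenes were searched; u4 returned no hit.
rows = [
    "u1\ts1\t98.0\t300\t5\t0\t1\t300\t1\t300\t1e-80\t500",
    "u1\ts2\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-20\t150",
    "u2\ts3\t82.0\t150\t27\t0\t1\t150\t1\t150\t1e-8\t60",
    "u3\ts4\t80.0\t60\t12\t0\t1\t60\t1\t60\t0.5\t30",
]
best = best_evalues(rows)
relaxed = fraction_with_homolog(best, n_queries=4, cutoff=1e-5)
stringent = fraction_with_homolog(best, n_queries=4, cutoff=1e-50)
print(f"E <= 1e-5: {relaxed:.0%}; E <= 1e-50: {stringent:.0%}")
```

Taking the denominator as the number of unigenes searched, rather than the number that returned any hit, keeps the percentages at the two cut-offs directly comparable.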

The 1,003 loblolly pine xylem unigenes have between 749 and 894 close homologs (blastx or tblastx, E ≤ 10^-50) in white spruce, sitka spruce, poplar, Arabidopsis, rice and Selaginella, with 527 xylem orthologs common to all the above species. These common xylem orthologs matched (blastx, E ≤ 10^-50) 501 unique gene models in moss, indicating that at least 501 ancestral xylem orthologs are shared in land plants. We also identified all loci in Arabidopsis and rice with homology to the 1,003 loblolly pine xylem unigenes. All possible homologs in Arabidopsis are represented by 3,115 and 6,563 unique loci at E ≤ 10^-50 or ≤ 10^-5, respectively, and in rice by 3,057 and 8,458 unique loci. This suggests that each xylem ortholog has an average of 4-8 paralogs in Arabidopsis and 5-14 paralogs in rice.

The xylem orthologs and paralogs have an even distribution among chromosomes of Arabidopsis (Additional data file 3A) and poplar (Additional data file 3C), but may be unevenly distributed among rice chromosomes (Additional data file 3B). However, eight hot spots were observed within the five Arabidopsis chromosomes (Figure 7A, B), which contrasts with the even distribution of all Arabidopsis genes across chromosomes [38] (Figure 7C). These hot spots may be due to historical patterns of gene and chromosome duplication [15, 39, 40]. Within the chromosomes of rice and poplar the xylem orthologs are relatively evenly distributed (Additional data file 4 and Additional data file 5). Different distribution patterns of the xylem orthologs among the Arabidopsis, rice and poplar genomes may reflect their diverse history of gene duplication and segmental chromosomal rearrangements.
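The 527 common xylem orthologs are, in effect, the query unigenes that retain a qualifying hit in every species compared. A minimal sketch of that intersection step follows, assuming the hits for each species have already been filtered at the chosen E-value cut-off. The unigene IDs and the function name are hypothetical.

```python
def common_orthologs(hits_by_species):
    """Return the query IDs that have a qualifying hit in every species."""
    id_sets = [set(ids) for ids in hits_by_species.values()]
    return set.intersection(*id_sets) if id_sets else set()


# Toy input (hypothetical): query unigenes with a hit in each target species.
hits_by_species = {
    "white spruce": {"u1", "u2", "u3"},
    "poplar": {"u1", "u3"},
    "rice": {"u1", "u3", "u4"},
}
shared = common_orthologs(hits_by_species)
print(sorted(shared))
```

Because the intersection can only shrink as species are added, the count of shared orthologs is bounded above by the smallest per-species homolog count.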

Molecular evolution of xylem orthologs in vascular plants
We used the 501 ancestral xylem orthologs for phylogenetic analysis because they largely (95%) represent the 527 unique xylem orthologs and it allowed us to include the non-vascular plant moss as a reference. Three different trees (Additional data file 6A-C) were constructed using four alignment methods (see Methods), in which ClustalX2 and Mafft generated the same tree configuration (Additional data file 6B). Among these trees, morphologically related species are consistently clustered in the same group. The trees constructed by Kalign (Additional data file 6A) and ClustalX2 (or Mafft) (Additional data file 6B) clearly separate angiosperms and gymnosperms from lycophytes and moss. Based on tree B, moss and Selaginella emerged at the earliest time point followed by the two conifers (loblolly pine and white spruce) and then by the two dicots (poplar and Arabidopsis). The monocot xylem transcriptome appears to be the most evolutionarily distinct. In summary, the phylogenetic analysis suggests that molecular evolution of xylem orthologs parallels plant evolution.

Identification of putative conifer-specific xylem genes
A total of 274 loblolly pine xylem unigenes which matched unigenes of white spruce and sitka spruce (tblastx, E ≤ 10^-50), but have no homologs in the gene models of poplar, Arabidopsis, rice, Selaginella and moss, as well as no hits in any other angiosperms in the uniprot known protein database (blastx, E > 10^-5), were identified as putative conifer-specific xylem unigenes (Additional data file 7). All conifer-specific xylem unigenes are unknown protein sequences based on the uniprot known protein database, and only 16% were assigned with GO terms in the TIGR gene index database. This contrasts with 94% of the xylem orthologs annotated with GO terms.
The poor annotation of the conifer-specific xylem genes is likely due to the intense functional characterisation of genes that has taken place in angiosperms.

Of the conifer-specific xylem unigenes several relatively abundant transcripts have low homology to arabinogalactan-like proteins (AGPs), glycine/proline-rich proteins (GPRP), metallothionein-like class II proteins, neurofilament triplet H proteins, zinc finger proteins, cytochrome c1, etc. (Additional data file 7). Genes with low similarity to other cell wall proteins are present, such as alpha tubulin, cellulose synthase-like and extensin-like proteins. The conifer-specific xylem unigenes also include some genes similar to transcription factors, i.e. Aux/IAA, NAC, CCAAT-binding, WRKY, R2R3-MYB and BHLH (Additional data file 7). These conifer-specific xylem unigenes may share the functions of related homologues, but some may have unique roles in gymnosperm wood formation.

Discussion
Gymnosperms and angiosperms have distinct patterns of xylem transcriptome evolution
Our data revealed that the xylem transcriptome is highly conserved in conifers but has undergone considerably more diversification in angiosperms. This suggests the xylem transcriptome has a distinct pattern of evolution in angiosperms compared to gymnosperms. Pine and spruce diverged 120-140 Ma [41], slightly earlier than the radiation of angiosperms (100-120 Ma) [16, 17]. The rapid evolution of the angiosperm xylem transcriptome suggests greater sensitivity to evolutionary forces in flowering plants. Our results also showed that poplar has a relatively distinct genome and xylem transcriptome, thus poplar is unlikely to be a useful model for investigating wood formation and other developmental processes in gymnosperms.
Mechanisms underlying the diverse patterns of xylem transcriptome evolution in angiosperms and gymnosperms are not well understood. Gene duplications and segmental chromosome rearrangements may be the major driving forces for this diversification; however, the role of genome size or gene number remains unclear.

The common ancestor of the xylem transcriptome in vascular plants
Our data demonstrated that the nucleotide and protein sequences of the xylem transcriptome are highly diversified among different categories of vascular plants, but that their functional domains are moderately to highly conserved in vascular plants. This is consistent with the diversification of all vascular plants from a common ancestor which is thought to have lived about 400 Ma [42]. Some of these conserved patterns are also consistent with comparisons using the total transcriptome in previous studies [32, 43]. The conserved functional domains of the xylem transcriptome suggest the existence of a shared ancestral xylem transcriptome of vascular plants.

The common ancestral xylem transcriptome should be present in the earliest ancient vascular plants, which produced hilate/trilete spores during the Late Ordovician [44]. The lycophyte Selaginella is among the closest living relatives of these plants and its xylem transcriptome is likely to resemble most closely the ancestral xylem transcriptome. The 5,047 (27%) xylem unigenes of loblolly pine with homologs (E ≤ 10^-5) in the six vascular plants including Selaginella could largely represent the ancestral xylem transcriptome. Because the vast majority (95%) of the ancestral xylem transcriptome has homologs (E ≤ 10^-5) in the moss genome, the xylem transcriptome is likely to have evolved largely from the cell wall transcriptome of non-vascular plants, which can be traced back as far as 450 Ma [31].

Conservative evolution of the xylem transcriptome in vascular plants
Xylem functions in the transport of water and solutes throughout the plant and together with the phloem makes up the vascular transport tissues of plants. Our data suggest that the xylem transcriptome is relatively more conserved than the total transcriptome in vascular plants. Several functional gene groups within the xylem transcriptome are significantly more conserved among vascular plants, including cell wall genes, ancestral xylem genes, transcription factors and known protein genes. This implies that xylem has evolved more slowly than other organs of vascular plants such as leaves, flowers and seeds. On the other hand, the unknown function and VSX genes have been evolving more rapidly in vascular plants. The rapid evolution of vascular plant-specific xylem (VSX) genes contrasts with the generally conservative evolution of the total xylem transcriptome, suggesting VSX genes are particularly sensitive to evolutionary forces. These genes will be useful targets for functional characterisation in order to understand xylem formation and evolution in vascular plants.

Our data showed that the loblolly pine xylem transcriptome has significantly more homologs in spruce (white spruce and sitka spruce) than in poplar and Eucalyptus, thus it is possible to identify genes specific to gymnosperm wood formation (softwoods). Surprisingly, the number of homologs of the loblolly pine xylem transcriptome in woody angiosperms (poplar and Eucalyptus) is similar to that in herbaceous angiosperms (Arabidopsis and rice), suggesting that a limited number of xylem genes are specific to woody plants. Furthermore, the number of homologs of the conifer xylem transcriptome in angiosperms is only slightly more than the number in lycophytes and moss.
The limited number of xylem genes specific to woody plants suggests that differential transcriptional regulation may have played an important role in xylem evolution, much as transcriptional regulation gives rise to plant diversity [20].

Comparison of gene expression between softwoods and hardwoods
A previous study in Cryptomeria japonica identified 56 putative conifer-specific transcripts, including three specific to reproductive organs and one (unknown) specific to woody tissues [45]. Here we identified 274 conifer-specific xylem unigenes, all of which are of unknown function. Transcripts with low homology to some cell wall genes are abundant or present in the conifer-specific xylem unigenes (Additional data file 7). This suggests that some cell wall genes, such as tubulin and CesA, may be specific to conifer wood formation. These conifer-specific xylem genes could provide clues to the molecular processes that give rise to the distinct cell wall structures and chemical properties of softwoods compared to hardwoods. Genes related to lignin biosynthesis are not present in the conifer-specific xylem unigenes, which is consistent with the earlier conclusion that the conifer lignin pathway is conserved in other vascular plants [3].

Transcripts similar to genes encoding arabinogalactan proteins (AGPs) are the most abundant genes in the conifer-specific xylem unigenes. AGPs are a large class of hydroxyproline-rich glycoproteins. The conifer-specific xylem AGPs (AGP4 and AGP-like) identified in this study had no homologs among the 47 AGPs of Arabidopsis [46-48]. Most AGPs are anchored to the plasma membrane [46] and released into the cell wall after cleavage of the GPI anchor [49]. AGPs may act as cell wall plasticizers, enlarging the pectin matrix, and allowing wall extension and cell expansion [50].
In loblolly pine six AGPs including AGP4 were identified in xylem tissues [51]. Radiata pine AGPs are located in the compound middle lamella of newly developed tracheids [52]. The radiata pine EST resource included 11 AGPs, in which PrAGP4 is highly abundant in earlywood libraries [13].

We identified 527 xylem orthologs in vascular plants, including many primary and secondary cell wall genes as well as transcription factors. Conservation of cell wall genes suggests that cell wall biosynthesis is a central event in xylem formation in vascular plants. These xylem orthologs maintain the stability of the basic xylem machinery in vascular plants and may regulate development of xylem structures and properties shared by different vascular plants, such as the common features of softwoods and hardwoods. COMT was previously thought to be one of three enzymes (F5H/Cald5H, COMT and SAD) specifically involved in S lignin synthesis of angiosperms [3]. The occurrence of COMT genes in the xylem orthologs of vascular plants suggests their involvement in diverse functions other than S lignin synthesis.

Conclusions
The xylem transcriptome is highly conserved in conifer species, but the nucleotide sequences of the xylem transcriptome are significantly distinct among angiosperms, thus considerable evolution of the xylem transcriptome has occurred in angiosperms. The functional domains of the xylem transcriptome are moderately to highly conserved among vascular and non-vascular plants. This suggests that vascular plants share an ancestral xylem transcriptome which is likely to have evolved predominantly from the cell wall transcriptome of non-vascular plants. In comparison to the total transcriptome from a wide range of tissues, the xylem transcriptome has evolved conservatively in vascular plants.
Several functional gene groups within the xylem transcriptome, including cell wall genes, ancestral xylem genes, transcription factors and known function genes, are relatively more conserved in vascular plants. A total of 527 xylem orthologs of vascular plants and 274 conifer-specific xylem genes were identified in this study. These genes provide good targets for molecular investigations of xylem formation and evolution in softwoods (conifers) and hardwoods (woody angiosperms). The xylem orthologs are unevenly distributed within Arabidopsis chromosomes, with eight hot spots detected. Phylogenetic analysis of the 501 ancestral xylem orthologs suggests that molecular evolution of the xylem transcriptome has paralleled plant evolution.

Methods
Plant species
We selected ten species to represent three major classes of vascular plants (gymnosperms, angiosperms and lycophytes). Four conifer species: Pinus radiata (radiata pine), Pinus taeda (loblolly pine), Picea glauca (white spruce) and Picea sitchensis (sitka spruce), represent gymnosperms. Four species of dicots including Populus tremula × Populus tremuloides (hybrid aspen), Populus trichocarpa, Eucalyptus grandis and Arabidopsis thaliana, and one monocot (Oryza sativa, rice) represent angiosperms. The recently sequenced Selaginella moellendorffii [30] represents lycophytes. The non-vascular plant Physcomitrella patens (moss) was included as a reference. The seven woody species (four conifer and three woody dicot species) undergo both primary and secondary xylem formation. Arabidopsis also develops secondary xylem and has been used as a model system for secondary xylem development [10, 11, 53-56]. The monocot rice and the lycophyte S. moellendorffii only produce primary xylem, while the non-vascular plant moss has no xylem at all.

Public genomic resources
We previously developed a radiata pine xylem transcriptome resource with 3,304 xylem unigenes [13]. Unigenes of loblolly pine, white spruce, sitka spruce, hybrid aspen, P. trichocarpa, Arabidopsis, rice and moss (Additional data file 8) were retrieved from the NCBI UniGene database [57]. The gene indices of pine, spruce and poplar (Additional data file 8) were collected from the TIGR gene index database [58]. These gene indices represent the total transcriptome from a wide range of tissues (such as stem, shoots, xylem, leaves, flowers, bark, roots and seeds). A subset of xylem gene indices of pine, spruce and poplar was retrieved separately from pure xylem libraries. This subset included only genes expressed in pure xylem tissues; mixed xylem tissues (such as shoots or cambial regions with phloem) were not considered. After removing redundant sequences, a total of 14,527, 15,262 and 9,109 xylem gene indices of pine, spruce and poplar, respectively, were identified. To minimize bias in the comparative genomics, the pure xylem gene indices were reassembled using the same method and parameters as used for radiata pine [13], resulting in 18,320, 12,489 and 7,991 pure xylem unigenes for pine, spruce and poplar, respectively.

The public transcriptome resources (unigenes and gene indices) are unlikely to include all transcripts expressed during xylem formation because of sampling limitations. Therefore, selected fully sequenced plant species (P. trichocarpa, E. grandis, Arabidopsis, rice, S. moellendorffii and P. patens) were added for comparative genomics in this study (Additional data file 8). Gene models of P. trichocarpa (v1.1), S. moellendorffii (v1.0) and P. patens (moss) (v1.1) were downloaded from the JGI website [59]. E. grandis scaffolds were downloaded from EucalyptusDB [18]. A. thaliana gene models (TAIR8) were downloaded from the TAIR website [60] and O. sativa gene models (v6.0) were downloaded from the MSU website [61].

Comparative genomic analysis
Blast programs (blastn, tblastx, blastx, blastp and tblastn) were used for sequence comparisons. All blast searches were performed locally using BlastStation2 (TM software, Inc., CA). Expected value (E-value or E), sequence identity and bit score were collected to evaluate the similarity of individual sequence pairs. The percentage of hits, average sequence identity and average bit score at different E-value cut-offs were calculated for the transcriptome comparisons between plant species.

The E-value cut-offs for inferring sequence similarity vary from 10⁻³ [17, 24] to 10⁻⁵⁰ [62, 63], but 10⁻¹⁰ to 10⁻³⁰ have been widely used [19, 32, 43, 64-67]. To ensure high confidence as previously suggested [62] and to increase the likelihood of inferring biological significance, we used the following E-value thresholds: (a) E = 0, the two sequences are identical and considered to derive from the same gene; (b) 0 < E ≤ 10⁻⁵⁰, the two sequences are highly similar and likely to be from the same gene family; (c) 10⁻⁵⁰ < E ≤ 10⁻⁵, the two sequences are similar in one or more regions along the whole sequence and likely to contain the same functional domain(s) or motif(s); (d) E > 10⁻⁵, the two sequences are likely to be unrelated genes. As a general rule, E ≤ 10⁻⁵⁰ was interpreted in this study to indicate whole sequence similarity of two given sequences (gene conservation or apparent homologs), and E between 10⁻⁵⁰ and 10⁻⁵ was interpreted to indicate partial sequence similarity (domain or motif conservation). However, the possible influences of different blast programs and database size were not considered in setting the E-value thresholds.
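
As a reading aid, the E-value categories (a) to (d) and the "percentage of hits" statistic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow used in the study (which relied on BlastStation2): the function names, the unigene identifiers and the E-values below are all hypothetical, and the input is assumed to be a mapping from each query sequence to the E-value of its best hit.

```python
# Illustrative sketch only: function names, unigene IDs and E-values are
# hypothetical and are not taken from the study.
from collections import Counter

WHOLE_SEQ_CUTOFF = 1e-50  # E <= 1E-50: whole sequence similarity
DOMAIN_CUTOFF = 1e-5      # 1E-50 < E <= 1E-5: domain or motif similarity


def classify_evalue(e_value):
    """Map a best-hit E-value to categories (a)-(d) described in the text."""
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    if e_value == 0:
        return "a: identical"
    if e_value <= WHOLE_SEQ_CUTOFF:
        return "b: highly similar"
    if e_value <= DOMAIN_CUTOFF:
        return "c: domain or motif similarity"
    return "d: unrelated"


def percentage_of_hits(best_evalues, n_queries, cutoff):
    """Percentage of all query sequences whose best hit has E <= cutoff."""
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    hits = sum(1 for e in best_evalues.values() if e <= cutoff)
    return 100.0 * hits / n_queries


# Hypothetical best-hit E-values for five queries; a sixth query had no hit.
best_hits = {
    "unigene_1": 0.0,
    "unigene_2": 1e-120,
    "unigene_3": 1e-50,
    "unigene_4": 3e-12,
    "unigene_5": 0.2,
}
n_queries = 6

category_counts = Counter(classify_evalue(e) for e in best_hits.values())
pct_whole = percentage_of_hits(best_hits, n_queries, WHOLE_SEQ_CUTOFF)
pct_domain = percentage_of_hits(best_hits, n_queries, DOMAIN_CUTOFF)
```

Note that the percentage is cumulative: a hit counted at the 1E-50 cut-off is also counted at 1E-5, which matches the way hit percentages are plotted against progressively relaxed cut-offs in the figures.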

Phylogenetic analysis
The predicted amino acid sequences of ancestral xylem orthologs from loblolly pine and white spruce, and protein sequences from the five model species (poplar, Arabidopsis, rice, Selaginella and moss), were used for phylogenetic analysis. For each species, all sequences of ancestral xylem orthologs were combined into one sequence prior to phylogenetic analysis. Sequence alignments were compared using four methods: ClustalX2 [68], Kalign [68], Mafft [68] and Mega4 [69]. Phylogenetic trees were built using the neighbour-joining algorithm with 1,000 bootstrap replicates.

Abbreviations
Mb, million base pairs; Ma, million years ago; EST, expressed sequence tag; E, E-value or expected value; GO, gene ontology; PGI, pine gene index; SGI, spruce gene index; PplGI, Populus gene index; PGI xylem, gene index from pure xylem tissues of pines; SGI xylem, gene index from pure xylem tissues of spruces; PplGI xylem, gene index from pure xylem tissues of Populus; VSX genes, vascular plant-specific xylem genes; TF, transcription factor; CesA, cellulose synthase; AGP, arabinogalactan protein; COMT, caffeic acid 3-O-methyltransferase.

Authors' contributions
XL contributed the project idea, collected relevant sequence data, conducted the comparative analysis and prepared the manuscript. SS and HW proposed and guided the research project. All the authors have approved the final manuscript.

Additional data files
Additional data file 1 is a figure showing that known protein genes and transcription factors in the xylem transcriptome are relatively more conserved in diverse plants. Additional data file 2 is a table listing 527 xylem orthologs in vascular plants based on the loblolly pine xylem unigenes with homologs (tblastx or blastx, E ≤ 10⁻⁵⁰) in the unigenes or gene models of white spruce, sitka spruce, poplar, Arabidopsis, rice and Selaginella.
Additional data file 3 is a figure showing the distribution of xylem orthologs on each chromosome of Arabidopsis, rice and poplar, and its significance (P-value) compared to all loci in each chromosome. Additional data file 4 is a figure showing locations of xylem orthologs on rice chromosomes (A: 3,057 homologs based on 1E-50; B: 8,458 homologs based on 1E-5; C: 3,000 randomly selected loci). Additional data file 5 is a figure showing locations of xylem orthologs on poplar chromosomes (A: 3,523 homologs based on 1E-50; B: 7,436 homologs based on 1E-5; C: 3,200 randomly selected loci). Additional data file 6 is a figure showing phylogenetic trees for 501 ancestral xylem orthologs using four alignment methods (A: Kalign; B: ClustalX2 or Mafft; C: Mega4). Additional data file 7 is a table listing 274 putative conifer-specific xylem unigenes. Additional data file 8 is a table summarising the public genomic resources of 11 plant species used in this study.

Acknowledgements
This work was funded by Forest and Wood Products Australia (FWPA), ArborGen LLC, the Southern Tree Breeding Association (STBA), Queensland Department of Primary Industry (QDPI) and the Commonwealth Scientific and Industrial Research Organization (CSIRO). We thank Iain Wilson, Shannon Dillon, Colleen MacMillan, Bryan Clarke and Jason Bragg for their critical comments on the manuscript.

References
1. Sperry JS: Evolution of water transport and xylem structure. Int J Plant Sci 2003, 164(3):S115-S127.
2. Boyce CK, Zwieniecki MA, Cody GD, Jacobsen C, Wirick S, Knoll AH, Holbrook NM: Evolution of xylem lignification and hydrogel transport regulation. Proc Natl Acad Sci U S A 2004, 101(50):17555-17558.
3. Peter G, Neale D: Molecular basis for the evolution of xylem lignification. Curr Opin Plant Biol 2004, 7(6):737-742.
4. Li LG, Cheng XF, Leshkevich J, Umezawa T, Harding SA, Chiang VL: The last step of syringyl monolignol biosynthesis in angiosperms is regulated by a novel gene encoding sinapyl alcohol dehydrogenase. Plant Cell 2001, 13(7):1567-1585.
5. Li LG, Popko JL, Umezawa T, Chiang VL: 5-Hydroxyconiferyl aldehyde modulates enzymatic methylation for syringyl monolignol formation, a new view of monolignol biosynthesis in angiosperms. J Biol Chem 2000, 275(9):6537-6545.
6. Osakabe K, Tsao CC, Li LG, Popko JL, Umezawa T, Carraway DT, Smeltzer RH, Joshi CP, Chiang VL: Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proc Natl Acad Sci U S A 1999, 96(16):8955-8960.
7. Weng JK, Li X, Stout J, Chapple C: Independent origins of syringyl lignin in vascular plants. Proc Natl Acad Sci U S A 2008, 105(22):7887-7892.
8. Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R et al: Gene discovery in the wood-forming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci U S A 1998, 95:13330-13335.
9. Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell MM, Sederoff R et al: Analysis of xylem formation in pine by cDNA sequencing. Proc Natl Acad Sci U S A 1998, 95(16):9693-9698.
10. Zhao CS, Craig JC, Petzold HE, Dickerman AW, Beers EP: The xylem and phloem transcriptomes from secondary tissues of the Arabidopsis root-hypocotyl. Plant Physiol 2005, 138(2):803-818.
11. Oh S, Park S, Han KH: Transcriptional regulation of secondary growth in Arabidopsis thaliana. J Exp Bot 2003, 54(393):2709-2722.
12. Pavy N, Paule C, Parsons L, Crow JA, Morency MJ, Cooke J, Johnson JE, Noumen E, Guillet-Claude C, Butterfield Y et al: Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics 2005, 6:144.
13. Li X, Wu HX, Dillon SK, Southerton SG: Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don. BMC Genomics 2009, 10:41.
14. Pavy N, Boyle B, Nelson C, Paule C, Giguere I, Caron S, Parsons LS, Dallaire N, Bedon F, Berube H et al: Identification of conserved core xylem gene sets: conifer cDNA microarray development, transcript profiling and computational analyses. New Phytol 2008, 180(4):766-786.
15. Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, Feldblyum T, Nierman W, Benito MI, Lin XY et al: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408(6814):796-815.
16. Yu J, Hu SN, Wang J, Wong GKS, Li SG, Liu B, Deng YJ, Dai L, Zhou Y, Zhang XQ et al: A draft sequence of the rice genome (Oryza sativa L. ssp indica). Science 2002, 296(5565):79-92.
17. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
18. EucalyptusDB. [http://eucalyptusdb.bi.up.ac.za/]
19. Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y, Yoshimura K, Tsumura Y: Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers. Plant Mol Biol 2005, 59(6):895-907.
20. Quesada T, Li Z, Dervinis C, Li Y, Bocock PN, Tuskan GA, Casella G, Davis JM, Kirst M: Comparative analysis of the transcriptomes of Populus trichocarpa and Arabidopsis thaliana suggests extensive evolution of gene expression regulation in angiosperms. New Phytol 2008, 180(2):408-420.
21. Liu H, Sachidanandam R, Stein L: Comparative genomics between rice and Arabidopsis shows scant collinearity in gene order. Genome Res 2001, 11(12):2020-2026.
22. Feuillet C, Keller B: Comparative Genomics in the grass family: Molecular characterization of grass genome structure and evolution. Annals of Botany 2002, 89(1):3-10.
23. Paterson AH, Bowers JE, Feltus FA, Tang HB, Lin LF, Wang XY: Comparative genomics of grasses promises a bountiful harvest. Plant Physiol 2009, 149(1):125-131.
24. Nishiyama T, Fujita T, Shin-I T, Seki M, Nishide H, Uchiyama I, Kamiya A, Carninci P, Hayashizaki Y, Shinozaki K et al: Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: Implication for land plant evolution. Proc Natl Acad Sci U S A 2003, 100(13):8007-8012.
25. Paterson AH, Bowers JE, Burow MD, Draye X, Elsik CG, Jiang CX, Katsar CS, Lan TH, Lin YR, Ming RG et al: Comparative genomics of plant chromosomes. Plant Cell 2000, 12(9):1523-1539.
26. Caicedo AL, Purugganan MD: Comparative plant genomics. Frontiers and prospects. Plant Physiol 2005, 138(2):545-547.
27. Bowman JL, Floyd SK, Sakakibara K: Green genes - Comparative genomics of the green branch of life. Cell 2007, 129(2):229-234.
28. Campbell MA, Zhu W, Jiang N, Lin H, Ouyang S, Childs KL, Haas BJ, Hamilton JP, Buell CR: Identification and characterization of lineage-specific genes within the Poaceae. Plant Physiol 2007, 145(4):1311-1322.
29. Yang XH, Jawdy S, Tschaplinski TJ, Tuskan GA: Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus. Genomics 2009, 93(5):473-480.
30. Selaginella moellendorffii gene models. [http://genome.jgi-psf.org/Selmo1/Selmo1.home.html]
31. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y et al: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319(5859):64-69.
32. Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci U S A 2003, 100(12):7383-7388.
33. Ralph SG, Chun HJE, Kolosova N, Cooper D, Oddy C, Ritland CE, Kirkpatrick R, Moore R, Barber S, Holt RA et al: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 2008, 9:484.
34. Brenner ED, Katari MS, Stevenson DW, Rudd SA, Douglas AW, Moss WN, Twigg RW, Runko SJ, Stellari GM, McCombie WR et al: EST analysis in Ginkgo biloba: an assessment of conserved developmental regulators and gymnosperm specific genes. BMC Genomics 2005, 6:143.
35. Girke T, Lauricha J, Tran H, Keegstra K, Raikhel N: The cell wall navigator database. A systems-based approach to organism-unrestricted mining of protein families involved in cell wall metabolism. Plant Physiol 2004, 136(2):3003-3008.
36. Guillaumie S, San-Clemente H, Deswarte C, Martinez Y, Lapierre C, Murigneux A, Barriere Y, Pichon M, Goffner D: MAIZEWALL. Database and developmental gene expression profiling of cell wall biosynthesis and assembly in maize. Plant Physiol 2007, 143(1):339-363.
37. Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu XC, He K, Luo JC: PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res 2008, 36:D966-D969.
38. Barakat A, Matassi G, Bernardi G: Distribution of genes in the genome of Arabidopsis thaliana and its implications for the genome organization of plants. Proc Natl Acad Sci U S A 1998, 95(17):10044-10049.
39. Vision TJ, Brown DG, Tanksley SD: The origins of genomic duplications in Arabidopsis. Science 2000, 290(5499):2114-2117.
40. Segmental chromosome duplication of Arabidopsis. [http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml]
41. Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N: Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics 2006, 174(4):2095-2105.
42. Palmer JD, Soltis DE, Chase MW: The plant tree of life: An overview and some points of view. Am J Bot 2004, 91(10):1437-1445.
43. Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH et al: A Populus EST resource for plant functional genomics. Proc Natl Acad Sci U S A 2004, 101(38):13951-13956.
44. Steemans P, Le Herisse A, Melvin J, Miller MA, Paris F, Verniers J, Wellman CH: Origin and radiation of the earliest vascular land plants. Science 2009, 324(5925):353.
45. Ujino-Ihara T, Tsumura Y: Screening for genes specific to coniferous species. Tree Physiol 2008, 28(9):1325-1330.
46. Schultz CJ, Johnson KL, Currie G, Bacic A: The classical arabinogalactan protein gene family of Arabidopsis. Plant Cell 2000, 12(9):1751-1767.
47. Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A: Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case. Plant Physiol 2002, 129(4):1448-1463.
48. Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ: The complex structures of arabinogalactan-proteins and the journey towards understanding function. Plant Mol Biol 2001, 47(1-2):161-176.
49. Knox JP: Up against the wall: arabinogalactan-protein dynamics at cell surfaces - Commentary. New Phytol 2006, 169(3):443-445.
50. Coimbra S, Almeida J, Junqueira V, Costa ML, Pereira LG: Arabinogalactan proteins as molecular markers in Arabidopsis thaliana sexual reproduction. J Exp Bot 2007, 58(15-16):4027-4035.
51. Yang SH, Wang HY, Sathyan P, Stasolla C, Loopstra CA: Real-time RT-PCR analysis of loblolly pine (Pinus taeda) arabinogalactan-protein and arabinogalactan-protein-like genes. Physiol Plant 2005, 124(1):91-106.
52. Putoczki TL, Pettolino F, Griffin MDW, Moller R, Gerrard JA, Bacic A, Jackson SL: Characterization of the structure, expression and function of Pinus radiata D. Don arabinogalactan-proteins. Planta 2007, 226:1131-1142.
53. Zhao CS, Johnson BJ, Kositsup B, Beers EP: Exploiting secondary growth in Arabidopsis. Construction of xylem and bark cDNA libraries and cloning of three xylem endopeptidases. Plant Physiol 2000, 123(3):1185-1196.
54. Chaffey N, Cholewa E, Regan S, Sundberg B: Secondary xylem development in Arabidopsis: a model for wood formation. Physiol Plant 2002, 114(4):594-600.
55. Funk V, Kositsup B, Zhao CS, Beers EP: The Arabidopsis xylem peptidase XCP1 is a tracheary element vacuolar protein that may be a papain ortholog. Plant Physiol 2002, 128(1):84-94.
56. Nieminen KM, Kauppinen L, Helariutta Y: A weed for wood? Arabidopsis as a genetic model for xylem development. Plant Physiol 2004, 135(2):653-659.
57. NCBI UniGene database. [ftp://ftp.ncbi.nih.gov/repository/UniGene/]
58. The TIGR plant gene index database. [http://compbio.dfci.harvard.edu/tgi/plant.html]
59. Gene models of Populus trichocarpa (v1.1), Selaginella moellendorffii (v1.0) and Physcomitrella patens (moss) (v1.1). [http://genome.jgi-psf.org/]
60. Arabidopsis thaliana gene models (TAIR8). [ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/]
61. Oryza sativa (rice) gene models (v6.0). [ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/]
62. Brenner SE: Practical database searching. Bioinformatics 1998:9-12.
63. Shimamura K, Ishimizu T, Nishimura K, Matsubara K, Kodama H, Watanabe H, Hase S, Ando T: Analysis of expressed sequence tags from Petunia flowers. Plant Sci 2007, 173(5):495-500.
64. Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow TM, Mueller LA, Landherr LL, Hu Y et al: Floral gene resources from basal angiosperms for comparative genomics research. BMC Plant Biology 2005, 5:15.
65. Krutovsky KV, Elsik CG, Matvienko M, Kozik A, Neale DB: Conserved ortholog sets in forest trees. Tree Genetics & Genomes 2007, 3(1):61-70.
66. Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD: Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell 2002, 14(7):1457-1467.
67. Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S, Thibaud-Nissen F, Hamilton J, Childs K et al: Expressed Sequence Tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis. Plant Mol Biol 2006, 62(4-5):485-501.
68. Labarga A, Valentin F, Anderson M, Lopez R: Web services at the European Bioinformatics Institute. Nucleic Acids Res 2007, 35:W6-W11.
69. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 2007, 24(8):1596-1599.

Figure legends
Figure 1. Comparisons of conifer xylem unigenes with genes in other species. Xylem unigenes of radiata pine, loblolly pine and white spruce were blasted against xylem unigenes, gene indices, gene models or scaffolds of other plant species. Percentage of hits is presented on the Y-axis at E-value cut-offs ranging from 1E-5 to 0 on the X-axis. A: Radiata pine vs. other species (blastn); B: Loblolly pine vs. other species (blastn); C: White spruce vs. other species (blastn); D: Radiata pine vs. other species (tblastx or blastx); E: Loblolly pine vs. other species (tblastx or blastx); F: White spruce vs. other species (tblastx or blastx).

Figure 2. Average nucleotide identity between pine unigenes and respective homologs in other species. Xylem unigenes of radiata pine and loblolly pine were blasted using blastn against unigenes of other species, including loblolly pine, white spruce, sitka spruce, hybrid aspen, P. trichocarpa, Arabidopsis, rice and moss. Homologs at E-value cut-offs ranging from 0 to 1E-5 were collected and their average nucleotide identities were calculated and presented. A: Radiata pine xylem unigenes vs. homologous unigenes of other species; B: Loblolly pine xylem unigenes vs. homologous unigenes of other species.

Figure 3. Comparisons of genes between poplar and other species. Poplar xylem unigenes and gene models were blasted against gene indices, gene models or scaffolds of other plant species. Percentage of hits is presented on the Y-axis at E-value cut-offs ranging from 1E-5 to 0 on the X-axis. A: Poplar xylem unigenes vs. other species (blastn); B: Poplar xylem unigenes vs. other species (tblastx or blastx); C: Poplar gene models vs. other species (blastn); D: Poplar gene models vs. other species (tblastn or blastp).

Figure 4. The xylem transcriptome is relatively more conserved in vascular and non-vascular plants. Xylem gene indices and total gene indices of pine, spruce and poplar were blasted against the gene models of Populus, Arabidopsis, rice, Selaginella and moss using blastn and blastx. Percentage of hits is presented on the Y-axis at E-value cut-offs of 1E-50 and 1E-5 on the X-axis. The difference in the number of hits observed with the xylem and total transcriptomes was tested statistically assuming a binomial distribution. Almost all comparisons were statistically significant (P<0.001) (data not shown), with only two exceptions from the blastn comparisons (1E-50) between the two conifer species and poplar. A: Blastn; B: Blastx.

Figure 5. Cell wall genes and ancestral xylem genes are relatively more conserved in diverse plants. Putative cell wall genes, non-cell-wall genes, vascular plant-specific xylem (VSX) genes and ancestral xylem genes from xylem unigenes of radiata pine, loblolly pine, white spruce and poplar were blasted using tblastx against pine and spruce gene indices, and using blastx against gene models of poplar, Arabidopsis, rice, Selaginella and moss. Percentage of hits is presented on the Y-axis at E-value cut-offs of 0, 1E-50 and 1E-5 on the X-axis. Differences in the number of hits observed with cell wall and non-cell-wall genes, as well as with VSX and ancestral xylem genes, were tested statistically assuming a binomial distribution. Almost all comparisons were statistically significant (P<0.001) (data not shown), except for a few comparisons among conifers at 1E-5. A: Radiata pine; B: Loblolly pine; C: White spruce; D: Poplar.

Figure 6. Number of putative xylem orthologs in different groups of land plants. Loblolly pine xylem unigenes (18,320) were blasted using tblastx against the unigenes of white spruce and sitka spruce, and the gene models of poplar, Arabidopsis, rice, Selaginella and moss. The number of putative xylem orthologs common to different plant groups is presented at two E-value cut-offs (1E-50 and 1E-5). Conifers 1: loblolly pine plus white spruce; Conifers 2: conifers 1 plus sitka spruce; Woody plants: conifers 2 plus poplar; Seed plants 1: woody plants plus Arabidopsis; Seed plants 2: seed plants 1 plus rice; Vascular plants: seed plants 2 plus Selaginella; Land plants: vascular plants plus moss. The xylem orthologs shared in conifers 2, woody plants, seed plants 1, seed plants 2 and vascular plants represent genes specific to conifers, woody plants, seed plants, lower seed plants (lacking secondary xylem) and vascular plants, respectively. The high proportion of orthologs common to land plants suggests that the ancestral xylem genome was largely present in non-vascular plants.

Figure 7. Hot spots of putative xylem orthologs on the chromosomes of Arabidopsis. The 1,003 loblolly pine xylem unigenes highly conserved in other vascular plants (xylem orthologs) have strong homology (1E-50) with 3,115 loci and moderate homology (1E-5) with 6,563 loci in the Arabidopsis genome. Positioning of these homologs on the five chromosomes of Arabidopsis revealed eight hot spots (shown as numbers 1 to 8 in red) of putative Arabidopsis xylem orthologs. A: 3,115 highly homologous loci; B: 6,563 moderately homologous loci; C: 6,000 randomly selected loci.
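
The cumulative plant groups in the Figure 6 legend (Conifers 1 through Land plants) can be read as successive set intersections: each group keeps only the loblolly pine xylem unigenes that still have a hit after one more species is added. The Python sketch below illustrates that reading under stated assumptions. The unigene identifiers and hit sets are hypothetical, and the sketch treats "has a hit at the chosen cut-off" as the only criterion, which may be simpler than the procedure actually used.

```python
# Illustrative sketch only: unigene IDs and hit sets are hypothetical.
from itertools import accumulate

# For each target species, the set of loblolly pine xylem unigenes with a hit
# at the chosen E-value cut-off (for example 1E-50). Species are listed in the
# order in which the Figure 6 legend adds them.
hits_by_species = {
    "white spruce": {"u1", "u2", "u3", "u4", "u5"},
    "sitka spruce": {"u1", "u2", "u3", "u4"},
    "poplar": {"u1", "u2", "u3"},
    "Arabidopsis": {"u1", "u2", "u3"},
    "rice": {"u1", "u2"},
    "Selaginella": {"u1", "u2"},
    "moss": {"u1"},
}

# Group labels from the Figure 6 legend, one per species added.
group_labels = [
    "Conifers 1",
    "Conifers 2",
    "Woody plants",
    "Seed plants 1",
    "Seed plants 2",
    "Vascular plants",
    "Land plants",
]


def cumulative_shared(hit_sets):
    """Running intersection: unigenes with hits in every species so far."""
    return list(accumulate(hit_sets, lambda shared, nxt: shared & nxt))


shared_sets = cumulative_shared(list(hits_by_species.values()))
shared_counts = dict(zip(group_labels, (len(s) for s in shared_sets)))
```

Because each set is intersected with the previous result, the counts can only stay level or fall as species are added, which is the pattern the legend describes when moving from conifers to land plants.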