Review

Tansley review

Conifer genomics and adaptation: at the crossroads of genetic diversity and genome function

Julien Prunier1, Jukka-Pekka Verta2 and John J. MacKay1,3

1 Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, QC G1V 0A6, Canada; 2 Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, Tübingen 72076, Germany; 3 Present address: Department of Plant Sciences, University of Oxford, Oxford, OX1 3RB, UK

Author for correspondence: John J. MacKay
Tel: +44 (0)1865 275088
Email: [email protected]

Received: 13 April 2015
Accepted: 14 June 2015

New Phytologist (2016) 209: 44–62
doi: 10.1111/nph.13565

Key words: Coniferales, gene expression, genome evolution, genomics of adaptation, next generation sequencing (NGS), RNA-sequencing, speciation-by-adaptation.

Contents
Summary
I. Introduction
II. Why conifer genomics?
III. Conifer genome structure and evolution
IV. Genomics of adaptation in conifers
V. Emerging research directions toward a more integrated understanding of genetic diversity and genome function
VI. Conclusion
Acknowledgements
References

Summary

Conifers have been understudied at the genomic level despite their worldwide ecological and economic importance, but the situation is rapidly changing with the development of next generation sequencing (NGS) technologies. With NGS, genomics research has simultaneously gained in speed, magnitude and scope. In just a few years, genomes of 20–24 gigabases have been sequenced for several conifers, with several others expected in the near future. Biological insights have resulted from recent sequencing initiatives as well as genetic mapping, gene expression profiling and gene discovery research over nearly two decades. We review the knowledge arising from conifer genomics research emphasizing genome evolution and the genomic basis of adaptation, and outline emerging questions and knowledge gaps.
We discuss future directions in three areas with potential inputs from NGS technologies: the evolutionary impacts of adaptation in conifers based on the speciation-by-adaptation model; the contributions of genetic variability of gene expression in adaptation; and the development of a broader understanding of genetic diversity and its impacts on genome function. These research directions promise to sustain research aimed at addressing the emerging challenges of adaptation that face conifer trees.

© 2015 The Authors. New Phytologist © 2015 New Phytologist Trust

I. Introduction

Genomics research has gained in speed, magnitude and scope simultaneously as a result of technology developments and, most recently, next generation sequencing (NGS) technologies that facilitate large-scale analysis with unprecedented data production capacity. The potential to analyze whole genomes of thousands of individuals in model plants and animals is nothing less than revolutionary. The impacts are also felt in nonmodel systems such as conifer trees. In just a few years, NGS has enabled the sequencing of several conifer genomes despite their very large size, estimated at 20–24 Gbp (Birol et al., 2013; Nystedt et al., 2013; Neale et al., 2014), with several others expected in the near future. Because of the massive size of conifer genomes, sequencing and analyses remain massive undertakings even though methods and capacity are being developed to overcome the inherent challenges. Recent genome sequencing initiatives and cDNA sequence resources developed over nearly two decades have led to insights into conifer biology, genetics and genome evolution.

Here, we begin by summarizing the major scientific and societal arguments which motivate conifer genomics research. We then review three major areas of research at the center of recent developments in conifer genomics.
First, the current understanding of conifer genome structure and evolution is reviewed in light of recent analyses examining the forces that have shaped these megagenomes and the functional implications at the level of gene coding content. Second, we review progress and perspectives toward understanding the genomics of adaptation, exploring two research themes: (1) the genomic basis of adaptation, for which we use the speciation-by-adaptation model as a conceptual framework; and (2) the role of genetic variability of gene expression. Third, we discuss emerging research directions which are expected to contribute to a more integrated understanding of genetic diversity and genome function.

II. Why conifer genomics?

1. Taxonomic position, ecological and economic importance of conifers

The conifers (Pinophyta or Coniferales) are one of four living groups of gymnosperms, which together with the extant flowering plants or angiosperms (Magnoliophyta) make up the seed plants (Raven et al., 2005; Gernandt et al., 2011). There are fewer than 1000 species of extant gymnosperms including 635 recognized conifer species (www.catalogueoflife.org/, Farjon & Page, 1999), compared to 250 000 angiosperm species (Kuzoff & Gasser, 2000) resulting from tremendous and more recent adaptive radiation (Kenrick, 1999). Despite their relatively small numbers, conifer species are distributed in a variety of habitats from evergreen subtropical forests (e.g. Pinus krempfii in Vietnam; Wang et al., 2014), to boreal forests in the Northern hemisphere (e.g. Picea mariana in Canada; Farrar, 1995), and from sea level (e.g. Pinus pinaster in Western Europe; Burban & Petit, 2003) to high mountainous locations (e.g. Picea mexicana in Mexico; Ledig et al., 1997). Many conifers are widely distributed and have large ecological amplitudes (e.g.
Pinus sylvestris found from Western Europe to Asia at elevations ranging from sea level to 2440 m altitude; Castro et al., 2004) but others are more narrowly distributed and occur in small areas with uniform environmental conditions (e.g. Picea chihuahuana limited to western mountain tops in Mexico; Ledig et al., 1997). Conifers play an important role in global carbon, nutrient and atmospheric cycles, provide many ecosystem services and are widely used in reforestation programs owing to their adaptability and dominance, especially in temperate and boreal forest ecosystems.

Conifers have gained significant economic importance worldwide because of their amenability to large-scale plantations to produce wood, their potential for rapid growth, and their relative ease of processing to make both paper and solid wood products. Over the last several decades, genetic selection and breeding programs have been implemented in both the Northern and Southern hemispheres for pines (Pinus spp.), spruces (Picea spp.), Douglas-fir (Pseudotsuga menziesii), larches (Larix spp.), Japanese cedar (Cryptomeria japonica) and firs (Abies spp.), among others. Conifers represent the longest lived individuals on Earth with maximum lifespans of a few thousand years for several species (Loehle, 1987), suggesting that their success is underpinned by considerable phenotypic plasticity and resistance to biodegrading processes.

2. Contemporary issues and emerging challenges

The health and adaptation of forest tree populations will become increasingly challenged by environmental changes expected before the end of the 21st Century (Alberto et al., 2013). Simulations indicate that up to 60% of tree species in boreal and temperate regions will struggle to adjust to the warmer climates predicted for 2085 (Hamann & Wang, 2006) and conifers may be among the most negatively affected (Hanewinkel et al., 2013).
Aitken et al. (2008) outlined the three possible outcomes for forest tree populations under present climate warming scenarios: adaptation, migration or extirpation. Of these three major scenarios, the adaptation potential is the most complex to ascertain and will depend upon phenotypic variation and standing genetic variation (Siol et al., 2010), strength of selection, fecundity and biotic interactions (Aitken et al., 2008). In recent years, niche modeling approaches have been developed that integrate phenotypic or genetic data, site conditions and climate models, and they have been applied to conifers in Europe and North America to project future species distributions in the face of climate change (Rehfeldt et al., 2014). These analyses indicate that inclusion of genetic variability as well as phenotypic plasticity data influences the outcomes of species distribution modeling (Benito et al., 2011), and that some areas will contain genotypes that become less and less adapted to the climate (Rehfeldt et al., 2014). Therefore, understanding which part of standing genetic variation is adaptive as opposed to neutral is a central research theme in evolutionary biology and was identified as a major challenge to address for forest tree genomics (Neale & Kremer, 2011).

The first decade of the 21st Century has already provided us with striking examples of biotic outbreaks resulting from climate change or globalization with devastating effects on conifers. For example, the mountain pine beetle has decimated tens of thousands of hectares of pine forest in western North America because of temperature-driven range expansion of this insect (Raffa et al., 2013). Meanwhile, plant pathogens such as Phytophthora spp. have moved around the world with globalization and in some cases have jumped to new hosts. In 2009, P. ramorum was reported to infect larch plantations on a large scale in the UK, although it was previously known to infect hardwood trees and cause sudden oak death (Brasier & Webber, 2010).
Developing an understanding of the evolution and adaptability of biotic threats, conifer tree defences and their interactions represents a rapidly growing challenge which may at least in part be addressed by genomics research.

A shift of conifer genomics research to these issues and challenges entails a better understanding of adaptation to climate, phenotypic plasticity and drivers of evolution.

III. Conifer genome structure and evolution

Genome sizes have been estimated from 18 Gbp to over 35 Gbp for major conifers (Murray et al., 2012), making them on average 200×, 24× and 10× the size of the genomes of Arabidopsis, poplar and maize, respectively. The very large size of conifer genomes has played a major role in shaping both the research questions and the experimental approaches in conifer genomics.

1. Conifer genome structure

Gene discovery

Large-scale EST and cDNA sequencing have occupied a central role in conifer genomics owing to the large size of conifer genomes, and have enabled the development of databases (Fernandez-Pozo et al., 2011), transcriptome profiling, and efficient genotyping platforms, among others (Neale & Kremer, 2011; Mackay et al., 2012; Ritland, 2012). Conifers have been shown to share a majority of expressed genes with angiosperm plants, thereby facilitating comparative analyses, but 30–40% of their genes do not match proteins of known function (Kirst et al., 2003; Rigault et al., 2011). Genotyping platforms have been derived from cDNA sequencing and have led to the construction of genetic maps of higher density which have enabled structural (Eckert et al., 2009b; Pelgas et al., 2011; Chancerel et al., 2013; Neves et al., 2014) and comparative genomics analyses (Komulainen et al., 2003; Pavy et al., 2012b).
More recently, gene sequence analysis from higher throughput pyrosequencing (Parchman et al., 2010) and RNA sequencing (RNA-seq) has enabled the identification of sequence variations, most prominently single nucleotide polymorphisms (SNPs), and the measurement of gene expression levels, although de novo transcriptome analysis from short reads still poses assembly challenges (Chen et al., 2012; Wang et al., 2013; Yeaman et al., 2014).

Genome sequencing

Whole-genome sequencing was recently reported for Norway spruce (Nystedt et al., 2013), white spruce (Birol et al., 2013) and loblolly pine (Neale et al., 2014); assemblies were released for sugar pine and Douglas-fir (http://pinegenome.org/pinerefseq/) and reduced-depth sequencing was reported for several other species within the Coniferales (Nystedt et al., 2013). These developments have been driven by progress in shotgun genome sequencing and associated bioinformatics methods (Simpson et al., 2009; Nystedt et al., 2013; Zimin et al., 2013, 2014) which have been applied to analyzing both haploid and diploid conifer DNA. The sequences and assemblies are shedding new light onto conifer genome evolution (Soltis & Soltis, 2013; De La Torre et al., 2014); however, assemblies reported to date are highly fragmented and each comprises > 10 million unordered scaffolds. The N50 scaffold sizes (i.e. the length of the shortest scaffold in the set of largest scaffolds that collectively cover 50% of the assembly) are between 6 and 67 kbp. Future developments aimed at turning these assemblies into contiguous genomes will need to increase scaffold size by about two orders of magnitude and position the scaffolds along chromosomes, forming pseudomolecules. These goals may be achieved through a combination of improved libraries (e.g. mate-pair libraries), longer reads (e.g. single molecule sequencing), improved computational analyses, more efficient and cost-effective physical mapping methods (e.g.
optical mapping) and integration of high-density genetic maps (expanding on the methods described in Pelgas et al., 2011; Neves et al., 2014).

Retrotransposons underpin the large size of conifer genomes

The accumulation of long terminal repeat retrotransposons (LTR-RT) was assumed to account for the very large sizes of conifer genomes, and this was confirmed by recent genome sequence analyses. In total, LTR-RT sequences were estimated to represent 58% of the genome in both Picea abies and Pinus taeda based on sequence homology to known elements (Nystedt et al., 2013; Neale et al., 2014; Wegrzyn et al., 2014). Estimates for other members of the Pinaceae were in the 40–50% range (Nystedt et al., 2013); these lower values may reflect shallower sequencing. The total representation of all transposable element (TE) classes was estimated as 69% in P. abies (Nystedt et al., 2013) and 80% in P. taeda (Wegrzyn et al., 2014), depending in part on the methods of identification. Some of the individual LTR-RT sequences were estimated to have tens to hundreds of thousands of copies in the genome (Morse et al., 2009; Raherison et al., 2012). Angiosperm genome size and structure are also largely influenced by LTR-RTs but their representation in the genome is highly variable between species (Bennetzen et al., 2005; Michael, 2014). Conifer genomes have very low content of type II transposable elements and LINEs compared to most angiosperms (Nystedt et al., 2013; Sena et al., 2014).

The abundant conifer LTR-RTs have features that are uniquely representative of conifers. Only three superfamilies, Ty3/Gypsy, Ty1/Copia and Gymny, make up the bulk of LTR-RTs in conifers as shown by recent genome annotations (Nystedt et al., 2013; Neale et al., 2014; Wegrzyn et al., 2014) and BAC sequencing (Morse et al., 2009; Kovach et al., 2010; Sena et al., 2014). The comparative analysis of LTR-RTs in conifers by Nystedt et al.
(2013) came to three main conclusions: (1) Ty3/Gypsy and Ty1/Copia showed high levels of expansion and diversity, exceeding those observed in any of the angiosperms; (2) the lack of removal of replicated LTR-RTs, rather than a higher rate of multiplication, is responsible for their massive accumulation; (3) the accumulation of retro-elements in conifer genomes is very ancient and has occurred over a very long timeframe spanning tens to hundreds of millions of years, in contrast to LTR-RT-rich monocot genomes (e.g. rice, maize) where multiplication has been concentrated in the last few million years. In line with this time frame, much of the diversification of LTR-RTs is estimated to have occurred after the divergence of genera within the conifers.

Although plant retrotransposons tend to be more abundant in regions of the genome that are gene-poor, an analysis of Arabidopsis thaliana showed that 7.4% of genes contained TE fragments (Lockton & Gaut, 2009). Sena et al. (2014) reported that traces of TE could be found in some conifer genes, but their occurrence in genes has not been examined at the level of the whole genome and their impacts on genes have not been studied in any detail.

Abundant gene-like fragments, pseudogenes and gene duplications

Gene duplication plays a central role in genome evolution and it is generally accepted that most duplicates are removed from the genome or remain as nonfunctional pseudogenes. Conifers may be inefficient at eliminating nonfunctional gene copies, as observed with the accumulation of LTR-RTs, leading to the accumulation of derelict gene copies and gene-like sequences. Conifer genome annotations have revealed a surprisingly large fraction of sequences classified as genes or gene-like fragments. Gene-like sequences represented 2.4% and 2.9% of the P. abies and P.
taeda genomes, respectively (Nystedt et al., 2013; Neale et al., 2014), and as much as 4% in earlier analyses (Morgante & De Paoli, 2011). These estimates would represent 500–800 Mbp of DNA and seem incompatible with the number of functional genes, which is estimated at between 50 000 and 59 000 by the same authors (Nystedt et al., 2013; Neale et al., 2014). This apparent discrepancy may be explained by the abundance of pseudogenes reported in conifers (Kovach et al., 2010); however, a large-scale characterization of pseudogenes is lacking and this hypothesis has not been directly tested at the genome level. Pseudogene frequency was surveyed among secondary metabolism gene families in the white spruce genome; pseudogenes were reported to be abundant but variable in number depending on the pathway (Warren et al., 2015). Understanding the origin, accumulation and fate of functional duplicated genes is key to unraveling genome function and evolution, but will likely be complicated by the pseudogenes found in conifers until significantly more contiguous genome sequences become available.

2. Genome evolution and speciation

The split between angiosperms and the gymnosperms dates back c. 325 Myr (Bowe et al., 2000). Gymnosperms have significantly slower evolution rates and less dynamic radiation than flowering plants (Leitch & Leitch, 2012). Early insights into the composition and structure of conifer genomes indicated that they are different from angiosperms and that their organization could not readily be predicted or deduced from lessons learned in other plant genomes (Mackay et al., 2012).

Comparative evolution of genome size and structure

Gymnosperm genomes are larger on average and less variable in size compared to angiosperm genomes (Leitch & Leitch, 2012). The extent of variation in genome size was estimated at 16-fold among the gymnosperms and 2400-fold among the angiosperms (Bennett & Leitch, 2005).
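The N50 scaffold size quoted earlier for the conifer assemblies can be made concrete with a short calculation. The following is a minimal Python sketch, assuming only a list of scaffold lengths in base pairs; the function name `n50` and the toy lengths are hypothetical and are not taken from any of the cited assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Scaffolds are sorted from longest to shortest and accumulated until
    they cover at least half of the total assembly length; the length of
    the last scaffold added is the N50.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (illustrative values only).
toy_scaffolds = [80_000, 70_000, 50_000, 40_000, 30_000, 20_000, 10_000]
toy_n50 = n50(toy_scaffolds)
print(f"N50 = {toy_n50} bp")
```

In this toy case the two longest scaffolds already cover half of the 300 kbp total, so the N50 is the length of the second scaffold. A fragmented assembly with millions of short scaffolds gives a small N50 even when its total length is close to the genome size.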
The modal and mean genome sizes are 1C = 10.0 pg and 1C = 18.8 pg, respectively, in gymnosperms, compared to just 1C = 0.6 and 5.9 pg in angiosperms (Leitch & Leitch, 2012). Within the gymnosperms, the Coniferales stand out as having the largest average genome size (Murray et al., 2012).

The number of chromosomes, the size of individual chromosomes and ploidy are remarkably less variable in gymnosperms and particularly in conifers compared to angiosperms. Conifers have large chromosomes and their 1n count varies from 9 to 33; nearly all members of the pine family (Pinaceae) have 12 chromosomes. In contrast, angiosperms have 1n = 2 to 320. In addition, genetic mapping showed a high level of synteny in gene order within and across major genera in the Pinaceae (Pavy et al., 2012b), consistent with their low level of variation in chromosome number. There is no evidence of recent whole-genome duplication in conifers despite their large genomes (Kovach et al., 2010; Nystedt et al., 2013), whereas angiosperm genomes have been shaped by several whole-genome duplications (Bennetzen et al., 2005; Michael, 2014). Leitch & Leitch (2012) concluded that chromosome structure is less dynamic in gymnosperms than angiosperms as a result of both ecological and genomic factors that have acted to favor slow genome evolution in gymnosperms.

Ancient and highly conserved gene content

Different lines of evidence reveal that the gene content of contemporary conifer genomes is ancient and largely conserved between species. The ancient origin of genes and gene families was shown in the pine family by the much larger fraction of gene duplication events that pre-date the angiosperm–gymnosperm split (c. 300 Myr ago (Ma)) compared to those that occurred more recently (Pavy et al., 2012b).
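The C-values above are given in picograms whereas assembly sizes elsewhere in this review are given in Gbp. The short Python sketch below relates the two units. It assumes the commonly used conversion of roughly 0.978 × 10^9 bp per pg of DNA, which is not stated in this review; the constant and function names are illustrative.

```python
# Assumed conversion factor (not from this review): 1 pg of DNA ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def pg_to_gbp(c_value_pg):
    """Convert a 1C genome size in picograms to gigabase pairs."""
    return c_value_pg * BP_PER_PG / 1e9


# 1C values quoted in the text (Leitch & Leitch, 2012).
c_values_pg = {
    "gymnosperm modal": 10.0,
    "gymnosperm mean": 18.8,
    "angiosperm modal": 0.6,
    "angiosperm mean": 5.9,
}

for label, pg in c_values_pg.items():
    print(f"{label}: {pg} pg is about {pg_to_gbp(pg):.2f} Gbp")
```

Under this assumption, the mean gymnosperm value of 18.8 pg corresponds to a little over 18 Gbp, which is of the same order as the 18 to 35 Gbp range reported for major conifers.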
In regard to gene structure, conifer genes tend to accumulate long introns, with the largest introns surpassing 60 kb in spruce (Nystedt et al., 2013) and 120 kb in pine (Wegrzyn et al., 2014). In general, one or a few long introns of several kb are found in conifer genes and conserved across species (Sena et al., 2014), although most introns are 100–200 bp and comparable in size to angiosperm introns (Sena et al., 2014; Wegrzyn et al., 2014). It might be expected that very long introns in conifers would reduce the level of gene expression, but no evidence has been found to support this hypothesis (Nystedt et al., 2013; Sena et al., 2014), in contrast to mammalian genes where high expression tends to be associated with shorter introns (Castillo-Davis et al., 2002).

Evolution of gene families and functional consequences

The evolution of genes and gene families is a fundamental mechanism for phenotypic change and adaptation. The rapid evolution of gene families in angiosperms has been a major facilitator of radiation and adaptation (Leitch & Leitch, 2012). In comparison, gene families evolve more slowly in conifers (Pavy et al., 2012b); for example, transcription factors are much less numerous in conifers (Rigault et al., 2011). Hence, although gene families largely overlap between conifers and angiosperms (Rigault et al., 2011; Wegrzyn et al., 2014), entire gene clades are sometimes not detected and differential patterns of expansion are observed within some clades (reviewed in MacKay & Dean, 2011; Mackay et al., 2012). As a result, inferring orthology relationships based on sequence alone is nearly impossible in large gene families. This has been observed for regulatory genes such as KNOX-I genes (Guillet-Claude et al., 2004) and MADS box genes (Gramzow et al., 2014), among others.
Several classes of enzymes also occur in gene families of moderate to large size in plants and are involved in conifer defenses, including beta glucosidases (Mageroy et al., 2015) and terpene synthases (Keeling et al., 2011), and in growth such as cellulose synthases (Nairn & Haselkorn, 2005).

One approach that facilitates insight into the biological role of conifer genes is to integrate results from large-scale sequence analyses and transcriptome profiling. We illustrate this approach in the NAC-domain gene family where attempts have been made to identify family members linked to vascular development and woody growth (Duval et al., 2014) (see Fig. 1). The Picea glauca gene PgNAC-7 (Fig. 1a) was identified as a functional ortholog of the Arabidopsis thaliana ATSND1 and VND6/7 genes, which regulate secondary cell wall formation (Demura & Fukuda, 2007); however, functional data are lacking for the other 20+ known NACs in P. glauca. Here, we integrated phylogenetic and transcript data to identify PgNAC-8 as a candidate ortholog of AtSND2, three genes which regulate complex carbohydrates during secondary cell wall formation (Zhong et al., 2010) (Fig. 1). Four other Picea glauca NAC-domain sequences are closely related (Fig. 1a) but only PgNAC-8 is preferentially expressed in secondary xylem (Fig. 1b). The transcript profiling data combined with the phylogenetic tree also identify potential cases of neo- or sub-functionalization following gene duplication events (Fig. 1b; see *, to the right).

Slow species evolution despite habitat diversity

Conifers occupy a high diversity of ecological niches and are dominant in many temperate and boreal ecosystems; therefore, a broader diversity of species might be expected than is observed in the conifer phylum.
Compared to angiosperms, conifers evolve more slowly (Leitch & Leitch, 2012) and have a synonymous mutation rate that is 15-fold lower (Buschiazzo et al., 2012). The slow evolution of conifers is also indicated by the abundance of natural or domestic hybridization events, for example in North American pines, indicating incomplete reproductive isolation between species.

In contrast to their relatively low species diversity, conifers are characterized by high levels of heterozygosity and intraspecific diversity. These are attributable to their large population sizes, outbreeding mating systems and efficient gene flow due to wind pollination, which in turn maintain common alleles over large geographical distances and low levels of genetic differentiation between populations (Achere et al., 2005; Namroud et al., 2008; Holliday et al., 2011; Prunier et al., 2012; Alberto et al., 2013). In addition, conifers have particularly low average linkage disequilibrium (LD) (Brown et al., 2004; Pavy et al., 2012a), which is often confined within genes (Namroud et al., 2010). The impacts of LD are also influenced by the large physical distances separating genes as shown by BAC analyses (Kovach et al., 2010; Magbanua et al., 2011; Sena et al., 2014). Taken together, these features lead to largely independent evolution of individual genes (Savolainen et al., 2007). They also favor efficient natural selection (compared to drift) in species with large populations, leading to the establishment of clines of locally adapted genotypes (Aitken et al., 2008; Alberto et al., 2013).

Fig. 1 Integrated analysis of gene family phylogeny and transcriptome profiling data in the NAC-domain gene family. (a) Phylogenetic tree of NAC-domain genes of known sequences from Picea glauca and selected Arabidopsis thaliana NACs ATSND1 and VND6/7 (in shaded box) that regulate secondary cell wall formation in vessels and fibers, constructed by Duval et al. (2014).
(b) Transcript abundance classification data (heatmap) from the PiceaGenExpress database (Raherison et al., 2012). Columns represent vegetative tissues: B, vegetative buds; F, foliage; XM, xylem (mature); XJ, xylem (juvenile); P, phelloderm; R, adventitious roots; M, megagametophytes; E, embryogenic cells. Transcript levels represent relative abundance classes within each tissue; grey is for missing data. *, Pairs of close paralogs with clearly differentiated gene expression indicative of functional divergence.

IV. Genomics of adaptation in conifers

The genomic basis of adaptation has been investigated in several tree species including conifers. Studies have largely focused on identifying intraspecific DNA sequence variations that are linked to adaptation (Neale & Kremer, 2011). Researchers have investigated the genetic architecture of adaptive traits such as phenology, cold acclimation and resistance to aridity, and described patterns of variation that may indicate the effects of evolutionary forces at the population level (Savolainen et al., 2007). Here, we first discuss the concept of speciation-by-adaptation as a framework to develop further evolutionary insights into the genomics of adaptation in conifers and, second, examine opportunities to gain a more functional understanding through investigations of gene expression variability. In both cases, we highlight opportunities to use NGS technologies to accelerate discoveries.

1. Genomics architecture and population genetics of adaptation

Speciation-by-adaptation: a framework for genomics of adaptation

A speciation-by-adaptation model has been developed from work integrating natural selection and genetic divergence with analyses of the genomic distributions of loci contributing to adaptive trait variations and reproductive barriers.
The speciation-by-adaptation model outlined by Turner et al. (2005) has been widely discussed and tested in a variety of organisms (Jones & Wang, 2012; Nosil, 2012; Renaut et al., 2013; Nadeau et al., 2014) but not in conifers. The model proposes that a locus undergoing divergent natural selection between two populations draws adjacent loci into high genetic divergence through hitchhiking which, in turn, could lead to reproductive isolation and speciation. Using an island/ocean metaphor to compare two populations settled in different environmental conditions, genomic regions that encompass polymorphisms undergoing divergent selection are expected to display between-population genetic differentiation levels (the ‘islands’) that are higher than the general neutral differentiation level of the whole genome (the ‘sea level’) (Nosil et al., 2009) (see Fig. 2, three left side panels). These islands-of-divergence could ultimately encompass polymorphisms leading to reproductive isolation between the two populations (Fig. 2c) (Nosil et al., 2009). For instance, an allele that is positively selected in drier conditions may be in genetic linkage with an allele of another gene affecting pollen phenology. As a consequence of hitchhiking under dry conditions, pollen production time may be delayed and populations growing in dry and wet conditions may consequently become reproductively isolated. Described as extrinsic reproductive isolation (Nosil & Schluter, 2011), this model plainly describes at the genome level how adaptation-with-gene-flow can lead to speciation (Butlin et al., 2012; Yeaman, 2013).

In this section (IV.1. ‘Genomics architecture and population genetics of adaptation’), we review the state of knowledge on adaptation in conifers and its genomic basis, providing additional insights as to how the speciation-by-adaptation model fits with conifers considering their unique genome features, their population genetics and patterns of speciation.
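The island and sea-level metaphor can be illustrated with a small calculation. The Python sketch below is a deliberately simplified illustration and not a method used in the studies cited here: it computes Wright's FST per biallelic locus for two equally weighted populations, takes the genome-wide median as the "sea level", and flags loci that exceed it by an arbitrary margin. The allele frequencies, the `margin` parameter and the function names are hypothetical; published outlier tests rely on neutral expectations obtained by simulation rather than on a fixed margin.

```python
import statistics


def fst_two_pops(p1, p2):
    """Wright's FST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def find_islands(freqs_a, freqs_b, margin=0.1):
    """Return per-locus FST, the median 'sea level' and indices of 'islands'.

    margin is an arbitrary illustrative cutoff, not a statistical test.
    """
    fsts = [fst_two_pops(a, b) for a, b in zip(freqs_a, freqs_b)]
    sea_level = statistics.median(fsts)
    islands = [i for i, f in enumerate(fsts) if f > sea_level + margin]
    return fsts, sea_level, islands


# Hypothetical allele frequencies along a chromosome in two populations
# (for example, low and high elevation); locus 3 is strongly differentiated.
low_elevation = [0.50, 0.52, 0.48, 0.10, 0.51, 0.49]
high_elevation = [0.52, 0.50, 0.50, 0.80, 0.49, 0.50]

fsts, sea_level, islands = find_islands(low_elevation, high_elevation)
print("Per-locus FST:", [round(f, 4) for f in fsts])
print("Loci flagged as islands:", islands)
```

With low LD, as described for conifers, a flagged locus is expected to stand alone as a narrow peak; under the speciation-by-adaptation model, hitchhiking would widen such peaks to include neighbouring loci.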
Adaptation at the phenotypic level

For over 50 yr, adaptation has been studied in conifers at the phenotypic level by using pedigrees and provenances in reciprocal transplant experiments or common garden tests. Variation has thus been demonstrated in several adaptive traits such as the response to cold (Morgenstern, 1969), among others. Genetic determinism has been established for many traits by estimating the heritable component of phenotypic variation (i.e. heritability, h²) or quantitative genetic differentiation between populations (QST). Results indicate that heritable genetic variations modulate phenotypes in response to selective factors (Savolainen et al., 2007; Prunier et al., 2011) and imply that adaptive variations are imprinted in conifer genomes. Phenotypic plasticity is also observed and each genotype may present a more or less characteristic reaction norm according to the site conditions. Given the distribution of adaptive trait variation among populations and families, quantitative genetics theory predicts that these traits are underpinned by many alleles and loci with small genetic effects, such that low levels of genetic differentiation are expected for the DNA polymorphisms involved (Le Corre & Kremer, 2012). Nevertheless, several polymorphisms with FST values significantly higher than expected under a purely neutral model have been identified in conifers.

Molecular studies of adaptation

More recently, genomes and population genetics have been studied in several conifers with emerging molecular marker tools being developed to accelerate breeding (Aitken et al., 2008; Neale & Kremer, 2011). Many of these studies have focused on the identification of genetic variations involved in climate adaptation to help breeding programs adjust to changing climate conditions.
Fig. 2 The expected distribution of genetic divergence along a chromosome between populations of conifers characterized by low linkage disequilibrium (LD) and distributed across an environmental gradient, using the ‘sea/islands’ metaphor proposed by Nosil et al. (2009). In this example, altitude variation (as illustrated on right-side panels) results in a differential pressure of natural selection upon populations and genotypes leading to the occurrence of loci diverging between populations. (a) Genetic divergence between populations located at different elevations in a single species reveals steep islands-of-divergence around loci of intra-specific altitudinal adaptation, clearly separated by genomic regions with divergence levels well within the neutral expectations, that is, the ‘sea level’. (b) In populations from hybridizing species that thrive at different elevations (the green species at low altitude, red species at high altitude, and hybrids or introgressed individuals are found at intermediate elevations), wider islands-of-divergence are expected around loci for interspecific adaptation still separated by ‘neutral genomic regions’ because LD has not built up enough (yet) to create a ‘continent-of-divergence’. (c) In populations from two reproductively isolated species that may partly overlap along the altitudinal gradient, a signature of speciation-by-adaptation may be expected, that is, continents-of-divergence that include loci involved in interspecific adaptation that have hitchhiked loci involved in the reproductive isolation.

SNP-arrays represent the most widely used technology because they facilitate the screening of large numbers of SNPs and genes (Namroud et al., 2008; Eckert et al., 2009a; Pavy et al., 2013). Experiments designed to detect SNPs involved in adaptation can be separated into two categories: (1) ‘bottom-up’ analyses that study genetic variation in species or populations distributed across different environments; and (2) ‘top-down’ analyses that aim to identify polymorphisms associated with the variation of a known adaptive trait (Storz, 2005). On the one hand, bottom-up analyses such as FST-based outlier detection are particularly well suited for studying adaptation in conifers because they assume that loci are independent and that hierarchical population genetic structure has been absent since the beginning of their development (Beaumont & Nichols, 1996). This close match between bottom-up analysis assumptions and conifer population properties may explain the success in identifying adaptive polymorphisms in conifers despite the very low levels of differentiation at adaptive loci that are predicted from quantitative genetics theory (Eveno et al., 2008; Prunier et al., 2011). On the other hand, top-down analyses are consistent with the predictions of quantitative genetics. For example, association genetics studies of adaptive traits such as budset phenology for temperature adaptation have usually found many SNPs involved in trait variation, each typically having a small effect (c. 1–5%) (Eckert et al., 2009a). Researchers usually deploy complementary approaches to confirm findings; therefore, a significant part of the detected polymorphisms are expected to be true adaptive SNPs. In model species, it has become routine to integrate diverse complementary methods; for example, approaches were recently developed that integrated association with differentiation analyses (Berg & Coop, 2014) or functional annotations (Daub et al., 2013). Recent work in humans using a variety of approaches has shown that selection at the phenotype level may translate into hundreds of advantageous alleles (Turchin et al., 2012).
Given quantitative genetics predictions, many adaptive polymorphisms are expected when testing SNPs for adaptation in conifers; therefore, the integration of different methods may facilitate their detection in conifer genomics as observed in other systems such as humans (Turchin et al., 2012). Despite the significant adaptive trait variations, the high level of standing genetic diversity (potentially serving as material for adaptation to selective environmental factors) and the evidence for many genetic polymorphisms involved in adaptation, it clearly appears that conifers did not experience the level of evolutionary radiation driven by adaptation that is reported for angiosperms (Leitch & Leitch, 2012). The low levels of LD and the independence between genes and sites within genes are likely to have differential consequences upon the evolution of conifers in the context of speciation-by-adaptation compared to angiosperms.

Is there speciation-by-adaptation in conifers?

Within a speciation-by-adaptation framework, conifers represent an interesting case given their unique combination of genetic and genomic attributes. When many alleles with small effects are proposed to underpin adaptation as is the case for conifers, low islands-of-divergence that are hard to distinguish from the neutral sea-level are expected (Feder et al., 2012). In such cases, reproductive isolation is more likely to result from ‘continents-of-divergence’ arising from several adaptive polymorphisms linked by LD than from islands-of-divergence (Fig. 2c; Yeaman, 2013).
Local LD could build up through structural variations arising from nonhomologous recombinations, for instance, and thus maintain haplotypes of favorable alleles (Yeaman, 2013); however, the low LD characteristic of conifer genomes would typically prevent the emergence of continents-of-divergence through hitchhiking, and conifer genomes appear to be less prone to structural variations and rearrangements than angiosperm genomes (as discussed in section II.2. ‘Contemporary issues and emerging challenges’). It has also been advocated that higher LD and islands-of-divergence might build from positive selection acting on clusters of adaptive loci arising from new mutations and favorable genomic rearrangements (Yeaman, 2013). In conifers, long generation time, large genome sizes and the relative scarcity of structural rearrangements (Leitch & Leitch, 2012) also reduce the likelihood of this scenario. A low likelihood for the appearance of continents-of-divergence is also suggested by the frequent observation of adaptive alleles separated by sea-level polymorphisms consistent with particularly low LD (Achere et al., 2005; Namroud et al., 2008; Prunier et al., 2011). The islands-of-divergence may be particularly steep in conifers (Fig. 2a) and encompass few genes (as few as one single gene); therefore, they would not be prone to integrate loci leading to reproductive isolation through hitchhiking. These arguments lead to the hypothesis that conifer genomes may be nearly impermeable to speciation-by-adaptation, which would contribute to maintaining their slow rate of evolution. As a whole, studies of adaptation to date have not surveyed whole gene sequences thoroughly enough to test this hypothesis.

NGS and RNA-seq to study adaptive evolution in conifers

The contribution of natural selection in conifer evolution can be investigated by finely describing the genome architecture of adaptation and studying adaptive divergence in different species.
The basic approach involves estimating the extent of divergence around adaptive polymorphisms and requires extensive sequence and polymorphism data. In recent years, RNA-seq has become a major technique to access gene polymorphisms without requiring a homologous genome reference and has been applied to conifers (Chen et al., 2012; Wang et al., 2013; Yeaman et al., 2014). Because RNA-seq facilitates de novo characterization of polymorphisms, it reduces discovery bias related to the identification of polymorphisms in one species subsequently scrutinized in another species (Nosil, 2012). A potential drawback of RNA-seq is biased estimation of allele frequencies associated with differential expression of alleles in diploid tissue samples (De Wit et al., 2015). Because RNA pools vary in complexity between tissues, the tissue sampling strategy will determine both the number and the nature of polymorphisms detected. One potential approach that is unique to conifers (gymnosperms) for overcoming biases or limitations involves characterizing populations by using megagametophytes, a haploid tissue that was reported to contain c. 75% of the known gene transcripts (Verta et al., 2013; see next section IV.2. ‘Role of gene expression variation in conifer adaptability’). Another requirement for studying islands-of-divergence is the ability to locate and order polymorphisms along a physical map or a high density genetic map of the genome (Renaut et al., 2013). Polymorphisms identified by RNA-seq may be ordered within a single gene; however, ordering polymorphisms in neighboring genes represents a challenge given the present status of even the most contiguous genome assemblies (Birol et al., 2013; Neale et al., 2014) and the density of the most recent genetic maps (Pelgas et al., 2011; Neves et al., 2014).
Delineating pseudomolecules with continuous advances of genome sequencing, shotgun assembly and genome mapping could overcome this obstacle in the near future, at least in a few highly investigated species. Different patterns of divergence distribution along the genome would be expected in conifers, according to the degree of adaptive divergence and reproductive isolation (Fig. 2). Broadly distributed species should be studied first because they have a higher probability of adaptive variations given the environmental variability within their geographic ranges. The best described conifers to date have wide ecological amplitudes; therefore, they are well-suited for studies of adaptation comparing populations in diverse environmental conditions. At the intraspecific level, the islands-of-divergence would be particularly steep (Fig. 2a) as discussed. In contrast, higher and wider islands-of-divergence would be expected in pairs of hybridizing species because the divergence time would predate the within-species divergence (Fig. 2b). However, the lack of reproductive isolation suggests that islands-of-divergence would remain limited because they would not encompass polymorphisms leading to reproductive isolation (Fig. 2c). In order to test this hypothesis, both inter- and intraspecific levels should be investigated by sampling at least two non-introgressed populations in each hybridizing species. Finally, comparing reproductively isolated species from the same genus with overlapping natural ranges may make it possible to size the islands-of-divergence related to specific environmental parameters differentiating the species’ ecological niches (Fig. 2c). Here again, comparing the size of islands-of-divergence at both intra- and interspecific levels will require sampling several populations from each species. These three levels of investigation represent a continuum of speciation that would be particularly informative to measure the impact of adaptation in conifer evolution.

2. Role of gene expression variation in conifer adaptability

Physiological and comparative gene expression profiling

Gene expression in conifers has been most extensively investigated in physiological changes linked to phenotypic plasticity in diverse traits and responses to changing conditions (see Table 1). The premise for this research is that transcriptome reorganisation underpins tissue differentiation (Table 1a), developmental changes (Table 1b), and responses to both biotic (Table 1c) and abiotic (Table 1d) stress factors. Microarray and, more recently, RNA-seq transcript profiling have identified differentially expressed genes belonging to cellular and biochemical pathways; results have been recently reviewed for topics such as wood formation (Carvalho et al., 2013) and defense (Kolosova & Bohlmann, 2012). The number of differentially expressed genes varies widely from just a few to several thousand, representing < 1% to > 25% of the genes tested, depending on the experiment and statistical criteria (Table 1). The number of differentially expressed genes is largest when comparing tissue types (Yeaman et al., 2014; Raherison et al., 2015) (Table 1a), as seen in other systems as well (Brawand et al., 2011). A transcriptome-wide understanding of the changes is being developed by reconstructing gene expression networks (Lorenz et al., 2011; Raherison et al., 2015), and underlying functional relationships are being explored with promoter–transcription factor interactions (Duval et al., 2014). Gene expression has also identified numerous candidate genes for association mapping aimed at delineating the genetic architecture of quantitative trait variations (Brown et al., 2004; Holliday et al., 2010).
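The dependence of the number of differentially expressed genes on statistical criteria can be made concrete with a small sketch. It assumes criteria of the kind summarized in Table 1, namely an adjusted P-value cutoff combined with a minimum absolute log2 expression ratio. The Benjamini-Hochberg adjustment used here is only one common choice and is an assumption of this example, as the adjustment applied in each study is not specified here; the function names and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(pvalues, log2_ratios, max_adj_p=0.05, min_abs_log2=1.0):
    """Indices of genes passing both the adjusted P-value cutoff and the
    minimum absolute log2 expression ratio."""
    adj = benjamini_hochberg(pvalues)
    return [
        i
        for i, (q, ratio) in enumerate(zip(adj, log2_ratios))
        if q <= max_adj_p and abs(ratio) >= min_abs_log2
    ]


if __name__ == "__main__":
    # Hypothetical raw P-values and log2 ratios for four genes.
    pvals = [0.01, 0.04, 0.03, 0.5]
    ratios = [2.0, 1.5, 0.2, 3.0]
    for cutoff in (0.05, 0.06):
        hits = differentially_expressed(pvals, ratios, max_adj_p=cutoff)
        print(cutoff, len(hits) / len(pvals))
```

Running the same data through stricter or looser cutoffs shows why proportions of significant genes are not directly comparable between experiments that apply different thresholds.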
Comparative gene expression profiling has developed as a complementary strand of investigation which contrasts individuals with different phenotypes (under the same conditions) or genotypes (see Table 2). Recently, this approach has been effective for discovering key genes in defensive pathways by exploiting intraspecific variation in resistance to insect herbivores (Verne et al., 2011; Mageroy et al., 2015) or fungal pathogens (Liu et al., 2013) (Table 2a). Researchers have also targeted genotypic variations in wood properties (Li et al., 2011; Liu et al., 2013) and populations adapted to different climates (Holliday et al., 2008) (Table 2a). Interspecific contrasts are also emerging and provide insights into the conservation of gene expression variation between species, which was found to be very strong for tissue preferential expression (Raherison et al., 2012) and strong but variable for responses to heat and water stresses (Yeaman et al., 2014) (Table 2b). Understanding how expression variation emerges and contributes to evolution and adaptation represents a challenging but important goal for the field of conifer genomics.

Expression evolution in conifers

Divergence in gene expression has long been hypothesized to be more important to species evolution than divergence in gene function (King & Wilson, 1975). Despite the intense search for signatures of adaptive gene expression divergence, major open questions remain and conifers could be uniquely informative for addressing some of the knowledge gaps. These include expression divergence in the face of strong gene flow or in outbreeding populations, which are not well represented by traditional research model organisms. Patterns of differential gene expression that are compatible with an adaptive hypothesis have been observed in Sitka spruce (Picea sitchensis) cold-hardiness (Holliday et al., 2008).
Drawing conclusions from natural expression variation observed in wild populations is far from simple, as multiple genetic parameters will determine the levels of observable expression variation in a natural setting. Here, we discuss how the population genetics landscape typical to conifers is likely to shape expression variation, and the genetic dissection of expression variation to understand adaptive divergence in gene expression. We may predict that the high levels of genetic variation typical of conifers are associated with ample expression variation. Because stabilizing selection seems to be the dominant evolutionary force acting on gene expression, most of the expression variation is expected to be deleterious and selected against (Jordan et al., 2005; Fay & Wittkopp, 2008; Bedford & Hartl, 2009; Brawand et al., 2011); however, some component is likely to be selectively neutral and scale with sequence divergence within the constraints posed by minimal and maximal expression levels (Whitehead & Crawford, 2006). Our current knowledge of the evolutionary importance of expression variation has largely been derived from studies of a few strains or genotypes of well-established model organisms. Therefore, the balance between beneficial, neutral, nearly neutral and deleterious expression variation is not understood in natural populations exposed to the whole spectrum of selection forces, although results are emerging in species such as stickleback fish (Leder et al., 2014). In this context, the numerous germplasm collections and common-garden experiments developed in conifer breeding (Neale & Kremer, 2011) provide a valuable resource for dissecting expression variation into genetic and environmental components and for pinpointing expression traits that show signatures compatible with selection, such as seen in population differentiation.
The dynamics of expression variation within a population are also intimately connected with dominance relationships between alleles, which influence both the fixation of beneficial mutations and the persistence of deleterious recessive variations under a mutation–selection balance (Hartl & Clark, 2007). Because large populations provide a large mutational target and facilitate efficient natural selection, gene expression in conifers with wide ranges and large ecological amplitudes is expected to evolve under a dynamic mutation–selection balance. On the one hand, as even low levels of recessivity increase the frequency of deleterious alleles in a population (Hartl & Clark, 2007), such variation may persist and represent a significant genetic load (Lemos et al., 2008; Charlesworth & Willis, 2009) as reported in conifer populations (Karkkainen et al., 1996). The dynamics of recessive expression variation are likely to differ from those expected in self-fertilizing and largely homozygous plants, where phenotypic masking by dominance relationships is more restricted and purging of detrimental expression variation is more efficient (Zhang et al., 2011). On the other hand, even a moderately beneficial expression divergence can be under efficient selection in conifer populations because of their large effective size, which lessens the effect of genetic drift. However, other population level evolutionary processes might interfere with the expected fixation outcome, such as variable selection pressures on alleles spreading outside their adaptive ranges (Holliday et al., 2011) or different genetic basis for the same adaptation in different lineages (Prunier et al., 2012).
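The effect of recessivity on the persistence of deleterious alleles can be illustrated with the standard deterministic approximations for mutation–selection balance in a large random-mating population. The symbols are introduced here for illustration and do not appear in the text above: \mu is the mutation rate toward the deleterious allele, s the selection coefficient against the deleterious homozygote, h the dominance coefficient and \hat{q} the equilibrium frequency of the deleterious allele.

```latex
\[
  \hat{q} \approx \sqrt{\frac{\mu}{s}} \quad (h = 0,\ \text{fully recessive}),
  \qquad
  \hat{q} \approx \frac{\mu}{h s} \quad (h > 0,\ hs \gg \mu).
\]
% Illustrative values (assumed): \mu = 10^{-6}, s = 0.1
%   h = 0   : \hat{q} \approx \sqrt{10^{-5}} \approx 3.2 \times 10^{-3}
%   h = 0.1 : \hat{q} \approx 10^{-6} / 0.01 = 10^{-4}
```

With these assumed values, a fully recessive allele is maintained at a frequency roughly 30 times higher than a partially dominant one, which is consistent with the expectation that recessive expression variants can persist and contribute to genetic load in large outbreeding populations.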
Table 1 Gene expression profiling of development and stress responses in conifer trees. Selected studies represent the diversity of species and biological questions investigated in conifers, as well as the different methods (microarrays, RNA-seq) and the numbers of genes tested.
Panels: (a) Comparative analyses of tissue types; (b) Comparative analyses of developmental stages; (c) Defenses and responses to biotic factors; (d) Responses to abiotic stress factors.
Columns: Species; Methods (1); Sample types; No. of features or sequences (2) (Tested; Significant); Statistical criteria (3); References.
Species: Picea glauca; Pinus contorta; Picea glauca × P. engelmannii; Cunninghamia lanceolata; Picea sitchensis; Pinus taeda; Pinus pinaster; Pinus monticola; Larix gmelinii; Cryptomeria japonica; Pinus radiata; Chamaecyparis obtusa; Picea abies; Pinus sylvestris.
References: Yang & Loopstra (2005); Ralph et al. (2006); Adomas et al. (2008); Holliday et al. (2008); Paiva et al. (2008); El Kayal et al. (2011); Lorenz et al. (2011); Villalobos et al. (2012); Liu et al. (2013); Men et al. (2013); Ranade et al. (2013); Wang et al. (2013); Dubouzet et al. (2014); Mishima et al. (2014); Sato et al. (2014); Yakovlev et al. (2014); Yeaman et al. (2014); Raherison et al. (2015).
(1) cDNA MA, cDNA microarray; oligo MA, oligonucleotide microarray; RNA-seq, RNA sequencing. (2) No. of features or sequences represents genes, transcripts, contigs or probes; (%), the number in parentheses is the proportion of significant features or sequences relative to the total number tested. (3) Statistical significance criteria: P, P-value; adjP, adjusted P-value.

Table 2 Comparative gene expression profiling of phenotypically or genetically differentiated trees.
Panels: (a) Intraspecific comparisons; (b) Interspecific comparisons.
Columns: Species; Methods (1); Comparisons; No. of features or sequences (2) (Tested; Significant in (a), Vary similarly in (b)); Statistical criteria (3); References.
Species: Picea sitchensis; Pinus monticola; Pinus taeda; Pinus radiata; Picea glauca; P. mariana; Picea glauca × P. engelmannii; Pinus contorta.
References: Yang & Loopstra (2005); Holliday et al. (2008); Li et al. (2011); Verne et al. (2011); Raherison et al. (2012); Liu et al. (2013); Yeaman et al. (2014); Mageroy et al. (2015).
(1) cDNA MA, cDNA microarray; oligo MA, oligonucleotide microarray; RNA-seq, RNA sequencing. (2) No. of features or sequences represents genes, transcripts, contigs or probes; (%), the number in parentheses is the proportion of differential features or sequences relative to the total number tested. (3) Statistical significance criteria: P, P-value; adjP, adjusted P-value.

There are also several conifer species that have small or fragmented populations with more restricted gene flow (Alberto et al., 2013) although they have generally been less studied. In small populations where the strength of selection relative to drift is expected to be reduced, a larger proportion of expression variation is likely to be driven mainly by neutral processes (Khaitovich et al., 2006; Rockman et al., 2010). Small or fragmented conifer populations are also susceptible to buildup of genetic load due to less efficient purging of slightly deleterious expression variation.

Approaches for studying expression evolution in conifers

The level of heritable expression variation in natural populations depends on many aspects of species’ biology, such as population characteristics and evolutionary history. As the effects of these parameters are especially challenging to discern in any population or species comparison, developing an understanding of the genetic basis of gene expression variation likely represents the best starting point to gain insights into expression evolution in conifers. Expression divergence between populations can be due to the effects of a few loci with manifold phenotypic effects or multiple loci with incremental influence on expression levels (see Fig. 3, hypotheses).
In the first case, changes in allele frequencies can have large effects on expression phenotypes irrespective of whether the changes result from selection or drift. In the latter case, selection must act on multiple loci strongly enough to overcome the effects of drift for the phenotype to have an adaptive value. In order to dissect the evolutionary importance of expression variation, we need to be able to discriminate between these types of scenarios. The biological characteristics of conifers such as late sexual maturity and obligate outbreeding rule out many experimental approaches used to investigate the genetic basis of expression variation in model systems (e.g. pure lines or recombinant inbreds) because they are impractical or would take decades to implement. However, the conserved seed morphology of gymnosperms and analysis of megagametophytes may provide a useful alternative for developing an approach that does not require any breeding (Fig. 3, system). Like all land plants, gymnosperms have a haplodiplontic life cycle in which a vegetative haploid stage occurs between meiosis and gamete (reproductive cell) formation (reviewed in Williams, 2009). The gametophyte development in conifers starts with meiosis leading to the production of pollen in the male strobili and a megagametophyte in the female strobili. The female megagametophyte grows to c. 2500 cells through mitosis and produces archegonia, cell structures that are fertilized by the male reproductive cell. A mature seed contains a fully developed diploid embryo surrounded by the haploid megagametophyte (Fig. 3, seed development). Forest genetics research has used the meiotic megagametophyte extensively to identify allelic protein variants (isozymes) or for genetic mapping purposes. For example, megagametophyte segregation analysis identified a null mutant of the cad gene in P. taeda and led to the discovery of unexpected metabolic plasticity in lignin biosynthesis (MacKay et al., 1997). Genetic mapping, most recently by Neves et al. (2014), and genome sequencing (Nystedt et al., 2013; Neale et al., 2014) have capitalized on the haploid nature of the megagametophytes to reduce genotypic complexity. The megagametophytes can also be used to study the genetic basis of gene expression variation in natural conifer populations by analyzing series of megagametophytes from a single tree (Verta et al., 2013). Because each megagametophyte is a haploid product of meiosis, heterozygous loci in the mother tree segregate 1 : 1 in the haploid megagametophytes (Fig. 3, system). Sequencing-based approaches (RNA-seq) are especially attractive to discern between alleles with expression differences because they facilitate parallel identification of genotypes and gene expression levels from the same data set (Fig. 3, RNA-sequencing), and thereby allow the identification of expression quantitative trait loci (eQTL). The megagametophyte approach could contribute a missing link between diversity and function in the burgeoning set of genomic resources available for keystone conifer species (Rigault et al., 2011; Raherison et al., 2012; Nystedt et al., 2013; Pavy et al., 2013; Neale et al., 2014). Because of its experimental simplicity, it can be applied to any wild individual or gymnosperm species with large enough seeds and thus facilitate population-wide analyses. Initial steps towards the application of megagametophyte analysis to natural conifer populations have already taken place (Verta et al., 2013). Allelic expression variation was found to be abundant and largely unique to individuals when comparing two trees of the same white spruce population.
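The logic of the megagametophyte design can be sketched in two steps: checking that a maternal heterozygous site segregates 1 : 1 among the haploid megagametophytes of one tree, and then asking whether expression of a gene differs between the two allele classes. The sketch below is a generic illustration and not the analysis pipeline of Verta et al. (2013); it uses a chi-square goodness-of-fit test with one degree of freedom and a label-permutation test, and all function names and data are hypothetical.

```python
import math
import random


def segregation_test(alleles):
    """Chi-square goodness-of-fit test (1 d.f.) of 1:1 segregation of two
    alleles among haploid megagametophytes of a single mother tree."""
    counts = {a: alleles.count(a) for a in set(alleles)}
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at a heterozygous site")
    expected = len(alleles) / 2.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts.values())
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def allelic_expression_test(alleles, expression, n_perm=10000, seed=0):
    """Permutation test of the difference in mean expression between the
    two allele classes (a simple stand-in for a local eQTL test)."""
    first, second = sorted(set(alleles))

    def mean_difference(labels):
        xa = [e for lab, e in zip(labels, expression) if lab == first]
        xb = [e for lab, e in zip(labels, expression) if lab == second]
        return sum(xa) / len(xa) - sum(xb) / len(xb)

    observed = mean_difference(alleles)
    rng = random.Random(seed)
    shuffled = list(alleles)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the allele-expression association
        if abs(mean_difference(shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical genotype calls and expression values for 12 megagametophytes.
    calls = ["A", "B", "A", "A", "B", "B", "A", "B", "A", "B", "B", "A"]
    levels = [8.1, 5.2, 7.9, 8.4, 5.0, 5.5, 8.0, 4.9, 7.7, 5.3, 5.1, 8.2]
    print(segregation_test(calls))
    print(allelic_expression_test(calls, levels, n_perm=2000))
```

Because the tissue is haploid, each sample carries a single allele, so allele class and expression level can be read from the same RNA-seq data without the confounding of allele-specific expression in diploid tissue.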
An overall low level of pleiotropy was found, suggesting that selection affecting the expression of one gene may have little consequence for the expression of other genes (Verta et al., 2013). Expression variation was over-represented in stress-response genes, indicating its potential significance in adaptation to biotic and abiotic stress factors, as also proposed in humans (Ye et al., 2013).

A logical next step is to take this approach to the population level, with the promise of uncovering the genetic architecture of gene expression traits involved in adaptation and evolution (Fig. 3, hypotheses). The role and genetic structure of expression divergence could be investigated by comparing locally adapted populations (Mimura & Aitken, 2010) or central and marginal populations (Savolainen et al., 2007) and by analysing the development and maintenance of reproductive barriers (De La Torre et al., 2014), among other areas.

V. Emerging research directions toward a more integrated understanding of genetic diversity and genome function

Research on the human genome has provided an understanding of the functional consequences of many different types of genetic or genomic variations associated with heritable disorders, predisposition to and development of cancer, and changes associated with development and aging, among other areas of inquiry. Structural variations, gene copy number variations (CNVs), epigenetic changes such as DNA methylation, and regulatory controls by noncoding RNAs and other functional DNA elements represent major mechanisms of genetic variation from which phenotypic variation may arise. By comparison, conifer genomics research has focused primarily on SNP variation in or around relatively small sets of genes (Brown et al., 2004; Eckert et al., 2009a) and has developed limited insights into the functional basis of complex traits.
We propose that analyses of diversity that encompass more broadly the different types of impacts on genome function are needed to address emerging challenges that face conifer trees and forests. For example, understanding adaptability to changing conditions depends on phenotypic plasticity, standing genetic variation and associated phenotypic variability, and the interactions between them.

Direct or indirect effects on gene expression are common denominators of several types of genomic variation which have been under-studied in conifers to date, including epigenetic control, CNVs (through dosage), sequence variations in regulatory elements (noncoding DNA) and variations at trans-acting loci. Several authors (Jordan et al., 2005; Wray, 2007) have argued from first principles that mutations altering the level of gene expression make a qualitatively distinct contribution to phenotypic evolution by affecting certain kinds of traits and being acted upon more efficiently by selection. The exact nature of that contribution is being delineated by recent empirical evidence in systems such as stickleback fish and Arabidopsis linking adaptation to regulatory sequence variations and variations in gene expression responsiveness to stress, respectively (Jones & Wang, 2012; Lasky et al., 2014). We may therefore predict that investigating regulatory and gene expression variations side-by-side with DNA sequence variations will lead to a more integrated understanding of the genomic basis of adaptability. Here, we discuss two areas of relevance that have begun to develop in conifers, namely epigenetics and copy-number variation.

1. Epigenetic variation

Epigenetic variation provides a mechanism for phenotypic diversity in the absence of genetic mutation and thus plays a role in phenotypic plasticity and adaptation in plants (Schmitz et al., 2011).
It has been proposed to be especially important for long-lived organisms such as trees (see reviews: Yakovlev et al., 2011; Bräutigam et al., 2013). A temperature-dependent epigenetic ‘memory’ conditioned by the temperature during early embryo development was reported in P. abies and shown to influence the timing of bud phenology in offspring (Johnsen et al., 2005; Yakovlev et al., 2010).

Epigenetic variation affects gene expression and has been associated with changes in DNA methylation and regulation by noncoding RNAs. In Arabidopsis, trans-generational epigenetic variation resulting in phenotypic diversity has been directly linked to DNA methylation altering transcription (Schmitz et al., 2011). In poplar, DNA methylation associated with aging shaped drought responses (Raj et al., 2011). Yakovlev et al. (2010) identified specific noncoding microRNAs whose differential expression indicated a putative role in epigenetic regulation in spruce (Picea abies). Conifers accumulate microRNAs that include both shared and distinct sequences compared to angiosperms (Yakovlev et al., 2011), but in contrast to angiosperms, they appear to produce much lower levels of 24-nt small interfering RNAs (Dolgosheina et al., 2008), except in reproductive tissues (Nystedt et al., 2013). Research on epigenetic memory in P. abies has contributed significantly to our understanding of epigenetic control in adaptability and, although the underlying mechanisms are only partly identified, the findings are stimulating the development of a field of ‘ecological epigenetics’ (Bräutigam et al., 2013).

2. Copy-number variation

Structural polymorphisms such as gene copy-number variations (CNVs) epitomize the dynamic nature of genomes (Chain et al., 2014).
Genome-wide analyses have associated CNVs with several disease phenotypes in humans (Craddock et al., 2010), and CNVs were associated with local adaptation in stickleback fish populations (Chain et al., 2014). Many CNVs modify transcript levels (Schlattl et al., 2011) and result in protein dosage effects which may be acted upon by selection if they alter fitness. In P. sitchensis, Hall et al. (2011) showed that weevil resistance was associated with CNVs in enzymes involved in (+)-3-carene biosynthesis. A first large-scale analysis in conifers was based on sequence capture of 7434 genes and identified 408 putative cases of presence–absence variation (PAV) in P. taeda (Neves et al., 2014). Including fine-scale CNV information was suggested as a means of increasing the resolution and power of association studies aimed at delineating the genetic architecture of complex traits (Schlattl et al., 2011). To this end, comparative genome hybridization arrays (aCGH) have been developed in Picea glauca and used to identify CNVs in several hundred genes; much variation in affected genes was observed between full-sib families from the same population (J. Prunier et al., unpublished).

VI. Conclusion

Conifer genomics has made huge strides in recent years owing to NGS as well as developments in quantitative and population genetics, bioinformatics, and evolutionary biology. We have argued that more integrated analyses of conifer genetic diversity and genome function will significantly enhance our understanding of the molecular basis of phenotypic plasticity and adaptation as determinants of adaptability in conifers. NGS technologies can be deployed to reveal variations in gene expression, DNA methylation, regulatory microRNAs, CNVs and other structural variations in addition to coding and regulatory sequence variation, all simultaneously.
These methodologies may also be invaluable for analyzing more species and thus acquiring an understanding that spans the diverse conifers that make up our forests. Developments may also be expected in areas such as interacting genomes and metagenomics, and protein and gene expression networks, among others, which we have not discussed here. Genomic selection methods (Grattapaglia & Resende, 2011), which were recently applied to hardwood and conifer trees and shown to have the potential to significantly accelerate breeding (Beaulieu et al., 2004; Resende et al., 2012), will also profit from higher throughput genomic technologies.

Some of the challenges that lie ahead include higher throughput and lower cost phenotypic analyses for conifer trees to facilitate genotype–phenotype investigations, and the development and validation of appropriate experimental systems for analyzing conifer gene functions. Transient assays using conifer cells were recently shown to enable hundreds of functional tests (Duval et al., 2014), offering a homologous system which may be advantageous to complement heterologous analyses in model systems.

Acknowledgements

The authors thank Sebastien Caron and Isabelle Giguere for preparing Fig. 1, Armand Seguin for the phylogenetic tree in Fig. 1, and Elie Raherison for assistance with the preparation of Table 1. J-P.V. and J.P. were supported by grants from the Natural Sciences and Engineering Research Council of Canada, the Quebec Ministry of the Economy, Innovation and Exports, and Genome Canada, Genome Quebec and Genome British Columbia for the SmarTForests project (J.J.M.). This review is an output of the partnership between the SMarTForest project and the ProCoGen European Network.

References

Achere V, Favre JM, Besnard G, Jeandroz S. 2005. Genomic organization of molecular differentiation in Norway spruce (Picea abies). Molecular Ecology 14: 3191–3201.
Adomas A, Heller G, Olson A, Osborne J, Karlsson M, Nahalkova J, Van Zyl L, Sederoff R, Stenlid J, Finlay R. 2008. Comparative analysis of transcript abundance in Pinus sylvestris after challenge with a saprotrophic, pathogenic or mutualistic fungus. Tree Physiology 28: 885–897.
Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-McLane S. 2008. Adaptation, migration or extirpation: climate change outcomes for tree populations. Evolutionary Applications 1: 95–111.
Alberto FJ, Aitken SN, Alía R, González-Martínez SC, Hänninen H, Kremer A, Lefevre F, Lenormand T, Yeaman S, Whetten R. 2013. Potential for evolutionary responses to climate change–evidence from tree populations. Global Change Biology 19: 1645–1661.
Beaulieu J, Perron M, Bousquet J. 2004. Multivariate patterns of adaptive genetic variation and seed source transfer in Picea mariana. Canadian Journal of Forest Research 34: 531–545.
Beaumont MA, Nichols RA. 1996. Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London B: Biological Sciences 263: 1619–1626.
Bedford T, Hartl DL. 2009. Optimization of gene expression by natural selection. Proceedings of the National Academy of Sciences, USA 106: 1133–1138.
Benito GM, Alia R, Robson TM, Zavala MA. 2011. Intraspecific variability and plasticity influence potential tree species distributions under climate change. Global Ecology and Biogeography 20: 766–778.
Bennett MD, Leitch IJ. 2005. Angiosperm DNA C-values database (release 6.0, Oct. 2005). [WWW document] URL http://www.kew.org/cvalues/ [accessed 15 October 2015].
Bennetzen JL, Ma J, Devos KM. 2005. Mechanisms of recent genome size variation in flowering plants. Annals of Botany 95: 127–132.
Berg JJ, Coop G. 2014. A population genetic signal of polygenic adaptation. PLoS Genetics 10: e1004412.
Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Yuen MM, Keeling CI, Brand D, Vandervalk BP et al. 2013.
Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 29: 1492–1497.
Bowe LM, Coat G, dePamphilis CW. 2000. Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales’ closest relatives are conifers. Proceedings of the National Academy of Sciences, USA 97: 4092–4097.
Brasier C, Webber J. 2010. Plant pathology: sudden larch death. Nature 466: 824–825.
Bräutigam K, Vining KJ, Lafon-Placette C, Fossdal CG, Mirouze M, Marcos JG, Fluch S, Fraga MF, Guevara M, Abarca D. 2013. Epigenetic regulation of adaptive responses of forest tree species to the environment. Ecology and Evolution 3: 399–415.
Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M. 2011. The evolution of gene expression levels in mammalian organs. Nature 478: 343–348.
Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences, USA 101: 15 255–15 260.
Burban C, Petit RJ. 2003. Phylogeography of maritime pine inferred with organelle markers having contrasted inheritance. Molecular Ecology 12: 1487–1495.
Buschiazzo E, Ritland C, Bohlmann J, Ritland K. 2012. Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms. BMC Evolutionary Biology 12: 8.
Butlin R, Debelle A, Kerth C, Snook RR, Beukeboom LW, Castillo Cajas RF, Diao W, Maan ME, Paolucci S, Weissing FJ et al. 2012. What do we need to know about speciation? Trends in Ecology & Evolution 27: 27–39.
Carvalho A, Paiva J, Louzada J, Lima-Brito J. 2013. The transcriptomics of secondary growth and wood formation in conifers. Molecular Biology International 2013: 974324, doi: 10.1155/2013/974324.
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. 2002. Selection for short introns in highly expressed genes. Nature Genetics 31: 415–418.
Castro J, Zamora R, Hódar JA, Gomez JM. 2004. Seedling establishment of a boreal tree species (Pinus sylvestris) at its southernmost distribution limit: consequences of being in a marginal Mediterranean habitat. Journal of Ecology 92: 266–277.
Chain FJ, Feulner PG, Panchal M, Eizaguirre C, Samonte IE, Kalbe M, Lenz TL, Stoll M, Bornberg-Bauer E, Milinski M. 2014. Extensive copy-number variation of young genes across stickleback populations. PLoS Genetics 10: e1004830.
Chancerel E, Lamy J-B, Lesur I, Noirot C, Klopp C, Ehrenmann F, Boury C, Le Provost G, Label P, Lalanne C. 2013. High-density linkage mapping in a pine tree reveals a genomic region associated with inbreeding depression and provides clues to the extent and distribution of meiotic recombination. BMC Biology 11: 50.
Charlesworth D, Willis JH. 2009. The genetics of inbreeding depression. Nature Reviews Genetics 10: 783–796.
Chen J, Uebbing S, Gyllenstrand N, Lagercrantz U, Lascoux M, Källman T. 2012. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms. BMC Genomics 13: 589.
Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF, Giannoulatou E. 2010. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464: 713–720.
Daub JT, Hofer T, Cutivet E, Dupanloup I, Quintana-Murci L, Robinson-Rechavi M, Excoffier L. 2013. Evidence for polygenic adaptation to pathogens in the human genome. Molecular Biology and Evolution 30: 1544–1558.
De La Torre AR, Roberts DR, Aitken SN. 2014. Genome-wide admixture and ecological niche modelling reveal the maintenance of species boundaries despite long history of interspecific gene flow.
Molecular Ecology 23: 2046–2059.
De Wit P, Pespeni MH, Palumbi SR. 2015. SNP genotyping and population genomics from expressed sequences–current advances and future possibilities. Molecular Ecology 24: 2310–2323.
Demura T, Fukuda H. 2007. Transcriptional regulation in wood formation. Trends in Plant Science 12: 64–70.
Dolgosheina EV, Morin RD, Aksay G, Sahinalp SC, Magrini V, Mardis ER, Mattsson J, Unrau PJ. 2008. Conifers have a unique small RNA silencing signature. RNA 14: 1508–1515.
Dubouzet JG, Donaldson L, Black MA, McNoe L, Liu V, Lloyd-Jones G. 2014. Heterologous hybridisation to a Pinus microarray: profiling of gene expression in Pinus radiata saplings exposed to ethephon. New Zealand Journal of Forestry Science 44: 21.
Duval I, Lachance D, Giguere I, Bomal C, Morency M-J, Pelletier G, Boyle B, MacKay JJ, Seguin A. 2014. Large-scale screening of transcription factor–promoter interactions in spruce reveals a transcriptional network involved in vascular development. Journal of Experimental Botany 65: 2319–2333.
Eckert AJ, Bower AD, Wegrzyn JL, Pande B, Jermstad KD, Krutovsky KV, St. Clair JBS, Neale DB. 2009a. Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182: 1289–1302.
Eckert AJ, Pande B, Ersoz ES, Wright MH, Rashbrook VK, Nicolet CM, Neale DB. 2009b. High-throughput genotyping and mapping of single nucleotide polymorphisms in loblolly pine (Pinus taeda L.). Tree Genetics & Genomes 5: 225–234.
El Kayal W, Allen CCG, Ju CJT, Adams E, King-Jones S, Zaharia LI, Abrams SR, Cooke JEK. 2011. Molecular events of apical bud formation in white spruce, Picea glauca. Plant, Cell & Environment 34: 480–500.
Eveno E, Collada C, Guevara MA, Leger V, Soto A, Diaz L, Leger P, Gonzalez-Martinez SC, Cervera MT, Plomion C et al. 2008. Contrasting patterns of selection at Pinus pinaster Ait.
drought stress candidate genes as revealed by genetic differentiation analyses. Molecular Biology and Evolution 25: 417–437.
Farjon A, Page CN. 1999. Conifers: status survey and conservation action plan. IUCN/SSC Conifer Specialist Group. Gland, Switzerland and Cambridge, UK: IUCN.
Farrar JL. 1995. Trees in Canada. Markham, Ontario, Canada: Canadian Forest Service & Fitzhenry and Whiteside Limited.
Fay J, Wittkopp P. 2008. Evaluating the role of natural selection in the evolution of gene regulation. Heredity 100: 191–199.
Feder JL, Gejji R, Yeaman S, Nosil P. 2012. Establishment of new mutations under divergence and genome hitchhiking. Philosophical Transactions of the Royal Society B 367: 461–474.
Fernandez-Pozo N, Canales J, Guerrero-Fernandez D, Villalobos DP, Díaz Perdiguero P, Moreno SM, Bautista R, Flores-Monterroso A, Guevara MA, Collada C. 2011. EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics 12: 366.
Gernandt DS, Willyard A, Syring JV, Liston A. 2011. The conifers (Pinophyta). In: Plomion C, Bousquet J, Kole C, eds. Genetics, genomics and breeding of conifers. New York, NY, USA: Edenbridge Science/CRC Press, 1–39.
Gramzow L, Weilandt L, Theißen G. 2014. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants. Annals of Botany 114: 1407–1429.
Grattapaglia D, Resende MDV. 2011. Genomic selection in forest tree breeding. Tree Genetics & Genomes 7: 241–255.
Guillet-Claude C, Isabel N, Pelgas B, Bousquet J. 2004. The evolutionary implications of knox-I gene duplications in conifers: correlated evidence from phylogeny, gene mapping, and analysis of functional divergence. Molecular Biology and Evolution 21: 2232–2245.
Hall DE, Robert JA, Keeling CI, Domanski D, Quesada AL, Jancsik S, Kuzyk MA, Hamberger B, Borchers CH, Bohlmann J. 2011.
An integrated genomic, proteomic and biochemical analysis of (+)-3-carene biosynthesis in Sitka spruce (Picea sitchensis) genotypes that are resistant or susceptible to white pine weevil. Plant Journal 65: 936–948.
Hamann A, Wang T. 2006. Potential effects of climate change on ecosystem and tree species distribution in British Columbia. Ecology 87: 2773–2786.
Hanewinkel M, Cullmann DA, Schelhaas M-J, Nabuurs G-J, Zimmermann NE. 2013. Climate change may cause severe loss in the economic value of European forest land. Nature Climate Change 3: 203–207.
Hartl DL, Clark AG. 2007. Principles of population genetics. Sunderland, MA, USA: Sinauer Associates.
Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN. 2008. Global monitoring of autumn gene expression within and among phenotypically divergent populations of Sitka spruce (Picea sitchensis). New Phytologist 178: 103–122.
Holliday JA, Ritland K, Aitken SN. 2010. Widespread, ecologically relevant genetic markers developed from association mapping of climate-related traits in Sitka spruce (Picea sitchensis). New Phytologist 188: 501–514.
Holliday JA, Suren H, Aitken SN. 2011. Divergent selection and heterogeneous migration rates across the range of Sitka spruce (Picea sitchensis). Proceedings of the Royal Society B 279: 75–83.
Johnsen Ø, Fossdal CG, Nagy N, Mølmann J, Dæhleno O, Skrøppa T. 2005. Climatic adaptation in Picea abies progenies is affected by the temperature during zygotic embryogenesis and seed maturation. Plant, Cell & Environment 28: 1090–1102.
Jones OR, Wang J. 2012. A comparison of four methods for detecting weak genetic structure from marker data. Ecology and Evolution 2: 1048–1055.
Jordan IK, Mariño-Ramírez L, Koonin EV. 2005. Evolutionary significance of gene expression divergence. Gene 345: 119–126.
Karkkainen K, Koski V, Savolainen O. 1996. Geographical variation in the inbreeding depression of Scots pine. Evolution 50: 111–119.
Keeling CI, Weisshaar S, Ralph SG, Jancsik S, Hamberger B, Dullat HK, Bohlmann J. 2011. Transcriptome mining, functional characterization, and phylogeny of a large terpene synthase gene family in spruce (Picea spp.). BMC Plant Biology 11: 43.
Kenrick P. 1999. Botany: The family tree flowers. Nature 402: 358–359.
Khaitovich P, Enard W, Lachmann M, Pääbo S. 2006. Evolution of primate gene expression. Nature Reviews Genetics 7: 693–702.
King M-C, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188: 107–116.
Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R. 2003. Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proceedings of the National Academy of Sciences, USA 100: 7383–7388.
Kolosova N, Bohlmann J. 2012. Conifer defense against insects and fungal pathogens. In: Matyssek R, Schnyder H, Oßwald W, Ernst D, Munch JC, Pretzsch H, eds. Growth and defence in plants. Ecological Studies Vol. 220. Heidelberg, Germany: Springer, 85–109.
Komulainen P, Brown G, Mikkonen M, Karhu A, Garcia-Gil MR, O’Malley D, Lee B, Neale D, Savolainen O. 2003. Comparing EST-based genetic maps between Pinus sylvestris and Pinus taeda. TAG. Theoretical and Applied Genetics 107: 667–678.
Kovach A, Wegrzyn JL, Parra G, Holt C, Bruening GE, Loopstra CA, Hartigan J, Yandell M, Langley CH, Korf I. 2010. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 11: 420.
Kuzoff RK, Gasser CS. 2000. Recent progress in reconstructing angiosperm phylogeny. Trends in Plant Science 5: 330–336.
Lasky JR, Des Marais DL, Lowry DB, Povolotskaya I, McKay JK, Richards JH, Keitt TH, Juenger TE. 2014. Natural variation in abiotic stress responsive gene expression and local adaptation to climate in Arabidopsis thaliana.
Molecular Biology and Evolution 31: 2283–2296.
Le Corre V, Kremer A. 2012. The genetic differentiation at quantitative trait loci under local adaptation. Molecular Ecology 21: 1548–1566.
Leder EH, McCairns RS, Leinonen T, Cano JM, Viitaniemi HM, Nikinmaa M, Primmer CR, Merilä J. 2014. The evolution and adaptive potential of transcriptional variation in sticklebacks – signatures of selection and widespread heritability. Molecular Biology and Evolution 32: 674–689.
Ledig FT, Jacob-Cervantes C, Hodgskiss PD, Eguiluz-Piedra T. 1997. Recent evolution and divergence among populations of a rare Mexican endemic, Chihuahua spruce, following Holocene climatic warming. Evolution 51: 1815–1827.
Leitch AR, Leitch IJ. 2012. Ecological and genetic factors linked to contrasting genome dynamics in seed plants. New Phytologist 194: 629–646.
Lemos B, Araripe LO, Fontanillas P, Hartl DL. 2008. Dominance and the evolutionary accumulation of cis- and trans-effects on gene expression. Proceedings of the National Academy of Sciences, USA 105: 14 471–14 476.
Li X, Wu HX, Southerton SG. 2011. Transcriptome profiling of Pinus radiata juvenile wood with contrasting stiffness identifies putative candidate genes involved in microfibril orientation and cell wall mechanics. BMC Genomics 12: 480.
Liu J-J, Sturrock RN, Benton R. 2013. Transcriptome analysis of Pinus monticola primary needles by RNA-seq provides novel insight into host resistance to Cronartium ribicola. BMC Genomics 14: 884.
Lockton S, Gaut BS. 2009. The contribution of transposable elements to expressed coding sequence in Arabidopsis thaliana. Journal of Molecular Evolution 68: 80–89.
Loehle C. 1987. Tree life history strategies. Canadian Journal of Forest Research 18: 209–222.
Lorenz WW, Alba R, Yu Y-S, Bordeaux JM, Simões M, Dean JF. 2011. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.). BMC Genomics 12: 264.
MacKay J, Dean J. 2011.
Transcriptomics. In: Plomion C, Bousquet J, Kole C, eds. Genetics, genomics and breeding of conifers. New York, NY, USA: Edenbridge Science/CRC Press, 323–357.
Mackay J, Dean JF, Plomion C, Peterson DG, Cánovas FM, Pavy N, Ingvarsson PK, Savolainen O, Guevara MA, Fluch S. 2012. Towards decoding the conifer giga-genome. Plant Molecular Biology 80: 555–569.
MacKay JJ, O’Malley DM, Presnell T, Booker FL, Campbell MM, Whetten RW, Sederoff RR. 1997. Inheritance, gene expression, and lignin characterization in a mutant pine deficient in cinnamyl alcohol dehydrogenase. Proceedings of the National Academy of Sciences, USA 94: 8255–8260.
Magbanua ZV, Ozkan S, Bartlett BD, Chouvarine P, Saski CA, Liston A, Cronn RC, Nelson CD, Peterson DG. 2011. Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine. PLoS One 6: e16214.
Mageroy MH, Parent G, Germanos G, Giguere I, Delvas N, Maaroufi H, Bauce E, Bohlmann J, Mackay JJ. 2015. Expression of the β-glucosidase gene Pgβglu-1 underpins natural resistance of white spruce against spruce budworm. Plant Journal 81: 68–80.
Men L, Yan S, Liu G. 2013. De novo characterization of Larix gmelinii (Rupr.) Rupr. transcriptome and analysis of its gene expression induced by jasmonates. BMC Genomics 14: 548.
Michael TP. 2014. Plant genome size variation: bloating and purging DNA. Briefings in Functional Genomics 13: 308–317.
Mimura M, Aitken S. 2010. Local adaptation at the range peripheries of Sitka spruce. Journal of Evolutionary Biology 23: 249–258.
Mishima K, Fujiwara T, Iki T, Kuroda K, Yamashita K, Tamura M, Fujisawa Y, Watanabe A. 2014. Transcriptome sequencing and profiling of expressed genes in cambial zone and differentiating xylem of Japanese cedar (Cryptomeria japonica). BMC Genomics 15: 219.
Morgante M, De Paoli E. 2011. Toward the conifer genome sequence. In: Plomion C, Bousquet J, Kole C, eds.
Genetics, genomics and breeding of conifers. New York, NY, USA: Edenbridge Science/CRC Press, 389–403.
Morgenstern EK. 1969. Genetic variation in seedlings of Picea mariana (Mill.) BSP.: II. Variation patterns. Silvae Genetica 18: 161–167.
Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA, Kubisiak TL, Amerson HV, Carlson JE, Nelson CD. 2009. Evolution of genome size and complexity in Pinus. PLoS One 4: e4332.
Murray BG, Leitch IJ, Bennett MD. 2012. Gymnosperm DNA C-values database (release 5.0, Dec 2012). [WWW document] URL http://www.kew.org/cvalues [accessed 15 October 2015].
Nadeau NJ, Ruiz M, Salazar P, Counterman B, Medina JA, Ortiz-Zuazaga H, Morrison A, McMillan WO, Jiggins CD, Papa R. 2014. Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato. Genome Research 24: 1316–1333.
Nairn CJ, Haselkorn T. 2005. Three loblolly pine CesA genes expressed in developing xylem are orthologous to secondary cell wall CesA genes of angiosperms. New Phytologist 166: 907–915.
Namroud MC, Beaulieu J, Juge N, Laroche J, Bousquet J. 2008. Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Molecular Ecology 17: 3599–3613.
Namroud MC, Guillet-Claude C, Mackay J, Isabel N, Bousquet J. 2010. Molecular evolution of regulatory genes in spruces from different species and continents: heterogeneous patterns of linkage disequilibrium and selection but correlated recent demographic changes. Journal of Molecular Evolution 70: 371–386.
Neale DB, Kremer A. 2011. Forest tree genomics: growing resources and applications. Nature Reviews Genetics 12: 111–122.
Neale DB, Wegrzyn JL, Stevens KA, Zimin AV, Puiu D, Crepeau MW, Cardeno C, Koriabine M, Holtz-Morris AE, Liechty JD. 2014. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biology 15: R59.
Neves LG, Davis JM, Barbazuk WB, Kirst M. 2014.
A high-density gene map of loblolly pine (Pinus taeda L.) based on exome sequence capture genotyping. G3 (Bethesda, Md.) 4: 29–37.
Nosil P. 2012. Ecological speciation. Oxford, UK: Oxford University Press.
Nosil P, Funk DJ, Ortiz-Barrientos D. 2009. Divergent selection and heterogeneous genomic divergence. Molecular Ecology 18: 375–402.
Nosil P, Schluter D. 2011. The genes underlying the process of speciation. Trends in Ecology & Evolution 26: 160–167.
Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A. 2013. The Norway spruce genome sequence and conifer genome evolution. Nature 497: 579–584.
Paiva J, Garnier-Gere P, Rodrigues J, Alves A, Santos S, Graça J, Le Provost G, Chaumeil P, Silva-Perez D, Bosc A. 2008. Plasticity of maritime pine (Pinus pinaster) wood-forming tissues during a growing season. New Phytologist 179: 1180–1194.
Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA. 2010. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics 11: 180.
Pavy N, Gagnon F, Rigault P, Blais S, Deschenes A, Boyle B, Pelgas B, Deslauriers M, Clement S, Lavigne P et al. 2013. Development of high-density SNP genotyping arrays for white spruce (Picea glauca) and transferability to subtropical and nordic congeners. Molecular Ecology Resources 13: 324–336.
Pavy N, Namroud MC, Gagnon F, Isabel N, Bousquet J. 2012a. The heterogeneous levels of linkage disequilibrium in white spruce genes and comparative analysis with other conifers. Heredity 108: 273–284.
Pavy N, Pelgas B, Laroche J, Rigault P, Isabel N, Bousquet J. 2012b. A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biology 10: 84.
Pelgas B, Bousquet J, Meirmans PG, Ritland K, Isabel N. 2011.
QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments. BMC Genomics 12: 145.
Prunier J, Gerardi S, Laroche J, Beaulieu J, Bousquet J. 2012. Parallel and lineage-specific molecular adaptation to climate in boreal black spruce. Molecular Ecology 21: 4270–4286.
Prunier J, Laroche J, Beaulieu J, Bousquet J. 2011. Scanning the genome for gene SNPs related to climate adaptation and estimating selection at the molecular level in boreal black spruce. Molecular Ecology 20: 1702–1716.
Raffa KF, Powell EN, Townsend PA. 2013. Temperature-driven range expansion of an irruptive insect heightened by weakly coevolved plant defenses. Proceedings of the National Academy of Sciences, USA 110: 2193–2198.
Raherison E, Rigault P, Caron S, Poulin P-L, Boyle B, Verta J-P, Giguere I, Bomal C, Bohlmann J, MacKay J. 2012. Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression. BMC Genomics 13: 434.
Raherison ES, Giguere I, Caron S, Lamara M, MacKay JJ. 2015. Modular organization of the white spruce (Picea glauca) transcriptome reveals functional organization and evolutionary signatures. New Phytologist 207: 172–187.
Raj S, Bräutigam K, Hamanishi ET, Wilkins O, Thomas BR, Schroeder W, Mansfield SD, Plant AL, Campbell MM. 2011. Clone history shapes Populus drought responses. Proceedings of the National Academy of Sciences, USA 108: 12 521–12 526.
Ralph SG, Yueh H, Friedmann M, Aeschliman D, Zeznik JA, Nelson CC, Butterfield YS, Kirkpatrick R, Liu J, Jones SJ. 2006. Conifer defence against insects: microarray gene expression profiling of Sitka spruce (Picea sitchensis) induced by mechanical wounding or feeding by spruce budworms (Choristoneura occidentalis) or white pine weevils (Pissodes strobi) reveals large-scale changes of the host transcriptome. Plant, Cell & Environment 29: 1545–1570.
Ranade SS, Abrahamsson S, Niemi J, García-Gil MR. 2013. Pinus taeda cDNA microarray as a tool for candidate gene identification for local red/far-red light adaptive response in Pinus sylvestris. American Journal of Plant Sciences 4: 479–493. Raven PH, Evert RF, Eichhorn SE. 2005. Biology of plants, 7th edn. London, UK: Macmillan. Rehfeldt GE, Jaquish BC, Lopez-Upton J, Sáenz-Romero C, St Clair JB, Leites LP, Joyce DG. 2014. Comparative genetic responses to climate for the varieties of Pinus ponderosa and Pseudotsuga menziesii: realized climate niches. Forest Ecology and Management 324: 126–137. Renaut S, Grassa C, Yeaman S, Moyers B, Lai Z, Kane N, Bowers J, Burke J, Rieseberg L. 2013. Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nature Communications 4: 1827. Resende MD, Resende MF, Sansaloni CP, Petroli CD, Missiaggia AA, Aguiar AM, Abad JM, Takahashi EK, Rosado AM, Faria DA. 2012. Genomic selection for growth and wood quality in Eucalyptus: capturing the missing heritability and accelerating breeding for complex traits in forest trees. New Phytologist 194: 116–128. Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay J. 2011. A white spruce gene catalog for conifer genome analyses. Plant Physiology 157: 14–28. Ritland K. 2012. Genomics of a phylum distant from flowering plants: conifers. Tree Genetics & Genomes 8: 573–582. Rockman MV, Skrovanek SS, Kruglyak L. 2010. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330: 372–376. Sato S, Yoshida M, Hiraide H, Ihara K, Yamamoto H. 2014. Transcriptome analysis of reaction wood in Gymnosperms by next-generation sequencing. American Journal of Plant Sciences 5: 2785–2798. Savolainen O, Pyhajarvi T, Knurr T. 2007. Gene flow and local adaptation in trees. Annual Review of Ecology Evolution and Systematics 38: 595–619. 
Schlattl A, Anders S, Waszak SM, Huber W, Korbel JO. 2011. Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Research 21: 2004–2013. Schmitz RJ, Schultz MD, Lewsey MG, O’Malley RC, Urich MA, Libiger O, Schork NJ, Ecker JR. 2011. Transgenerational epigenetic instability is a source of novel methylation variants. Science 334: 369–373. Sena JS, Giguere I, Boyle B, Rigault P, Birol I, Zuccolo A, Ritland K, Ritland C, Bohlmann J, Jones S et al. 2014. Evolution of gene structure in the conifer Picea glauca: a comparative analysis of the impact of intron size. BMC Plant Biology 14: 95. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Research 19: 1117–1123. Siol M, Wright SI, Barrett SC. 2010. The population genomics of plant adaptation. New Phytologist 188: 313–332. Soltis PS, Soltis DE. 2013. A conifer genome spruces up plant phylogenomics. Genome Biology 14: 122. Storz JF. 2005. Using genome scans of DNA polymorphism to infer adaptive population divergence. Molecular Ecology 14: 671–688. Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN, Genetic Investigation of ANthropometric Traits (GIANT) Consortium. 2012. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics 44: 1015–1019. Turner TL, Hahn MW, Nuzhdin SV. 2005. Genomic islands of speciation in Anopheles gambiae. PLoS Biology 3: 1572–1578. Verne S, Jaquish B, White R, Ritland C, Ritland K. 2011. Global transcriptome analysis of constitutive resistance to the white pine weevil in spruce. Genome Biology and Evolution 3: 851–867. Verta JP, Landry CR, MacKay JJ. 2013. Are long-lived trees poised for evolutionary change? Single locus effects in the evolution of gene expression networks in spruce. Molecular Ecology 22: 2369–2379. 
Villalobos DP, Díaz-Moreno SM, Said E-SS, Cañas RA, Osuna D, Van Kerckhoven SH, Bautista R, Claros MG, Cánovas FM, Canton FR. 2012. Reprogramming of gene expression during compression wood formation in pine: coordinated modulation of S-adenosylmethionine, lignin and lignan related genes. BMC Plant Biology 12: 100. Wang J, Abbott RJ, Ingvarsson PK, Liu J. 2014. Increased genetic divergence between two closely related fir species in areas of range overlap. Ecology and Evolution 4: 1019–1029. Wang Z, Chen J, Liu W, Luo Z, Wang P, Zhang Y, Zheng R, Shi J. 2013. Transcriptome characteristics and six alternative expressed genes positively correlated with the phase transition of annual cambial activities in Chinese Fir (Cunninghamia lanceolata (Lamb.) Hook). PLoS One 8: e71562. Warren RL, Keeling CI, Yuen MMS, Raymond A, Taylor GA, Vandervalk BP, Mohamadi H, Paulino D, Chiu R, Jackman SD et al. 2015. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer defense metabolism. Plant Journal 83: 189–212. Wegrzyn JL, Liechty JD, Stevens KA, Wu L-S, Loopstra CA, Vasquez-Gross HA, Dougherty WM, Lin BY, Zieve JJ, Martínez-García PJ. 2014. Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics 196: 891–909. Whitehead A, Crawford DL. 2006. Neutral and adaptive variation in gene expression. Proceedings of the National Academy of Sciences, USA 103: 5425–5430. Williams CG. 2009. Conifer reproductive biology. Berlin, Germany: Springer Science & Business Media. Wray GA. 2007. The evolutionary significance of cis-regulatory mutations. Nature Reviews Genetics 8: 206–216. Yakovlev IA, Asante DK, Fossdal CG, Junttila O, Johnsen Ø. 2011. Differential gene expression related to an epigenetic memory affecting climatic adaptation in Norway spruce. Plant Science 180: 132–139. Yakovlev IA, Fossdal CG, Johnsen Ø. 2010. 
MicroRNAs, the epigenetic memory and climatic adaptation in Norway spruce. New Phytologist 187: 1154–1169. Yakovlev IA, Lee Y, Rotter B, Olsen JE, Skrøppa T, Johnsen Ø, Fossdal CG. 2014. Temperature-dependent differential transcriptomes during formation of an epigenetic memory in Norway spruce embryogenesis. Tree Genetics & Genomes 10: 355–366. Yang S-H, Loopstra CA. 2005. Seasonal variation in gene expression for loblolly pines (Pinus taeda) from different geographical regions. Tree Physiology 25: 1063–1073. Ye K, Lu J, Raj SM, Gu Z. 2013. Human expression QTLs are enriched in signals of environmental adaptation. Genome Biology and Evolution 5: 1689–1701. Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences, USA 110: E1743–E1751. Yeaman S, Hodgins KA, Suren H, Nurkowski KA, Rieseberg LH, Holliday JA, Aitken SN. 2014. Conservation and divergence of gene expression plasticity following c. 140 million years of evolution in lodgepole pine (Pinus contorta) and interior spruce (Picea glauca × Picea engelmannii). New Phytologist 203: 578–591. Zhang X, Cal AJ, Borevitz JO. 2011. Genetic architecture of regulatory variation in Arabidopsis thaliana. Genome Research 21: 725–733. Zhong R, Lee C, Ye Z-H. 2010. Evolutionary conservation of the transcriptional network regulating secondary cell wall biosynthesis. Trends in Plant Science 15: 625–632. Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ. 2014. Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics 196: 875–890. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. 2013. The MaSuRCA genome assembler. Bioinformatics 29: 2669–2677. 
New Phytologist (2016) 209: 44–62 www.newphytologist.com © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust