UNIVERSITY OF CALGARY

Population divergence and candidate signatures of natural selection in alpine and lowland ecotypes of the allotetraploid plant, Anemone multifida (Ranunculaceae)

by

Jamie R McEwen

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF BIOLOGICAL SCIENCES

CALGARY, ALBERTA

AUGUST, 2012

© Jamie R McEwen 2012

Abstract

Adaptation plays a central role in population divergence and speciation. Studying how neutral evolutionary processes and natural selection have shaped the evolutionary history of populations enables the identification of genes under natural selection in the wild. In this thesis, I conducted a genome scan to elucidate candidate signatures of natural selection in alpine and lowland ecotypes of the allopolyploid plant, Anemone multifida. I found numerous signatures of divergent natural selection between alpine and lowland populations and between alpine populations, but natural selection appeared strongest in alpine environments. These results are consistent with findings in diploid species, but the neutral evolutionary structure of the polyploid A. multifida showed complex patterns of differentiation. Overall, these results indicate that divergent natural selection has generated adaptation to alpine and lowland environments despite a complex evolutionary history.

Acknowledgements

1. My supervisors Dr. Jana Vamosi and Dr. Sean Rogers, and committee members Dr. Lawrence Harder and Dr. Gordon Chua for their ideas and support
2. Grant support from NSERC, Prairie Adaptation Research Collaborative, Alberta Conservation Association, and the University of Calgary
3. Sean Rogers and Jana Vamosi lab members
4. Audra McEwen for support through my degrees

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Symbols, Abbreviations, Nomenclatures
Chapter 1: Introduction to Natural Selection and Population Genetics
1.1 Identifying Signatures of Natural Selection in the Genome
1.2 Challenges of Polyploidy
1.3 Amplified Fragment Length Polymorphism (AFLP)
1.3 Alpine and Lowland Environments
1.4 Research Objectives, Hypotheses and Predictions
Chapter 2: Materials and Methods
2.1 Study Species, Field Sampling and Population Characteristics
2.2 DNA Extraction, AFLP and Allele Scoring
2.3 Detection of Outlier Loci
2.4 Genetic and Population Structure Analyses
2.5 Phenotype Analyses
Chapter 3: Results
3.1 AFLP and the Detection of Outlier Loci
3.2 Population Structure of Neutral Loci
3.3 Population Structure of Outlier Loci
3.5 Phenotypic Differences in Height and Floral Colour
Chapter 4: Discussion
4.1 Genetic Population Structure at Neutral and Outlier Loci
4.2 Limitations and Alternate Explanations
4.3 Future Directions
Appendix A: Supplementary Data and Methods
Appendix B: AFLP Protocol Taken From the AFLP Plant Mapping Protocol for Regular Plant Genomes (Applied Biosystems)
Appendix C: Example of Electropherogram and Raw Data Produced from AFLP

List of Tables

Tables 1 to 5

List of Figures

Figures 1 to 12

List of Symbols, Abbreviations, Nomenclatures

AFLP: Amplified Fragment Length Polymorphism
AMOVA: Analysis of Molecular Variance
HWP: Highwood Pass alpine population
HSB: Hailstone Butte alpine population
WC: Willow Creek lowland population
BL: Beauvais Lake lowland population
BHS: Big Hill Springs lowland population

CHAPTER 1: INTRODUCTION TO NATURAL SELECTION AND POPULATION GENOMICS

Natural selection plays an important role in the adaptation of species to their environments, divergence between populations and species diversity (references). Phenotypically, natural selection increases the prevalence of traits in a population or species that confer some form of adaptation to a particular environment. Individuals with adaptive phenotypes have higher fitness, surviving to produce viable offspring with traits that are advantageous in a particular environment (references).
Genetically, the alleles underlying adaptive phenotypes increase in frequency between generations in response to natural selection, resulting in the evolution of populations towards adaptive traits, although the dynamics of adaptive shifts can vary (Kauffman 1987; Hadany 2003). Natural selection requires genetic variation within a population, and populations with little variation often have limited adaptive potential (Willi et al. 2006). Natural selection generally reduces genetic variation at loci under selection and causes differentiation between populations if different alleles or genes are the targets of natural selection (Lenormand 2002). Through genetic linkage, non-random segregation of alleles at two or more loci between generations, the regions surrounding genes under selection can also approach fixation, leading to further differentiation between populations, even at loci that are not directly affected by natural selection, a phenomenon also known as genetic hitchhiking (Kim & Nielsen 2004). Population divergence has been driven in large part by natural selection (Schluter 2001). Variation in environmental conditions between populations, or ecological opportunity during colonization or modification of a habitat can drive differentiation between populations or ecotypes (an ecotype is a genetically or phenotypically distinct group within a species; Schluter 2001). A suite of traits that are selected for in one environment can be non-adaptive or potentially deleterious in alternate environments, leading to lower fitness of individuals following migration or gene flow between populations (Lenormand 2002). 
For example, migrant individuals arriving with a different suite of adaptations in a new population are likely to produce fewer offspring with adaptive phenotypes than established individuals, causing eventual loss of non-adaptive phenotypes despite gene flow, although extensive gene flow can limit the adaptive potential of a population (Schluter 2001; Nosil et al. 2005; Bridle & Vines 2007). In extreme cases of divergence, hybrids between individuals with adaptations to different environments can be selected against, favouring assortative mating and reduced gene flow between ecotypes (Schluter 2001; Nosil et al. 2005). Reductions in gene flow due to divergent selection (which would select against individuals migrating between environments) can also enhance the impact of non-selective mechanisms that can cause genetic differentiation (e.g. genetic drift, or the random loss of alleles from a population which is enhanced in small populations), further accelerating differentiation (Dobzhansky 1957). These isolating effects of adaptation can eventually lead to large scale or whole genome differentiation between populations or ecotypes, eventually leading to speciation (Peichel et al. 2001; Rogers & Bernatchez 2005; Via & West 2008; Feder & Nosil 2010). Isolation by adaptation has been observed in a number of cases by the genetic breakdown of individuals hybridized between sufficiently diverged ecotypes (Burke & Voss 1998; Presgraves et al. 2003; Svedin et al. 2008; Renaut et al. 2012), although processes such as polyploidy may still bridge the gap between species at the late stages of speciation (Chapman & Abbott 2010). By studying how natural selection and neutral evolutionary processes have affected the adaptation and differentiation between populations and individuals within species we can start to understand the mechanisms by which major evolutionary events such as speciation initiate and progress. 
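The claim above that genetic drift is enhanced in small populations can be illustrated with a minimal simulation. The sketch below assumes a standard Wright-Fisher model (one biallelic locus, diploid individuals, discrete generations, no selection, mutation or migration); the function names, population sizes and replicate counts are illustrative choices, not values taken from this thesis.

```python
import random


def wright_fisher(pop_size, p0=0.5, generations=200, seed=0):
    """Simulate neutral drift at one biallelic locus in a diploid population.

    Each generation, 2 * pop_size gene copies are drawn at random from the
    previous generation's allele frequency. Returns the frequency trajectory,
    which stops early once the allele is lost (0.0) or fixed (1.0).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling of gene copies for the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):
            break
    return trajectory


def fraction_fixed_or_lost(pop_size, replicates=200, generations=200):
    """Fraction of replicate populations in which one allele was lost."""
    done = 0
    for rep in range(replicates):
        final = wright_fisher(pop_size, generations=generations, seed=rep)[-1]
        if final in (0.0, 1.0):
            done += 1
    return done / replicates
```

Comparing, for example, `fraction_fixed_or_lost(10)` with `fraction_fixed_or_lost(200)` shows that alleles are lost or fixed far more often in the smaller population over the same number of generations, which is the sense in which reduced gene flow and small population size can accelerate neutral differentiation.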
The neutral theory of population genetics states that many genes, and the phenotypes they affect, evolve through adaptively neutral processes (Lewontin 1974; Kimura 1983). For example, genetic drift can cause fixation of alleles in a population (Lewontin 1974; Kimura 1983). Mutation, while typically deleterious in effect, is an important source of new alleles in populations (Hamilton 2009). Demographic events, such as population bottlenecks or founder effects, can enhance allele fixation in a population by limiting genetic diversity (Maruyama & Fuerst 1985; Gavrilets & Hastings 2012). Conversely, gene flow from individuals migrating between populations can introduce or remove alleles from a population (Felsenstein 1976).

Determining the extent to which populations have evolved through neutral and adaptive processes is challenging. By demonstrating that variation in genetically based phenotypes is associated with fitness differences between individuals, the direction, magnitude and phenotypic loci of natural selection can be determined (Kingsolver et al. 2001). Demonstration that phenotypes of interest have a genetic basis is ultimately required to separate the effects of phenotypic plasticity (environmentally induced changes in phenotypes within generations) from selection for a phenotypic variant (Thompson 1991). Additionally, as many traits may have evolved primarily due to selectively neutral processes, it is necessary to differentiate between neutral and selective processes in phenotypic evolution (Luikart et al. 2003; Stinchcombe & Hoekstra 2008). However, determining patterns of neutral genetic population structure is generally not possible with phenotypic data alone. By incorporating genetic information from populations, it is possible to separate the effects of neutral and adaptive evolutionary processes for study, and potentially find the genetic basis for adaptive phenotypes.
By estimating the degree to which populations are genetically differentiated, signatures of natural selection can be distinguished at specific phenotypic or genetic loci with very high or low levels of population differentiation (Luikart et al. 2003). In doing so, the causes of the evolution of populations can be determined.

The development of genetic markers allowed examination of the genetic structure of populations and laid the groundwork for detecting natural selection (Charlesworth 2010). By estimating population differentiation due to non-selective evolutionary processes, such as reductions in genetic diversity from population fluctuations, age or sex structure, or the random loss of alleles from a population through genetic drift, early genetic markers permitted the separation of the effects of natural selection from non-selective evolutionary processes (Charlesworth 2010). Additionally, the genetic basis of phenotypes could in some cases be established through the discovery of associations between markers and phenotypes, allowing direct tests of the genetic basis of adaptive phenotypic variation (Charlesworth 2010). Genetic markers have also been used to estimate neutral population genetic structure to provide a baseline expectation for phenotypic variation between populations (Whitlock & Guillaume 2009). Genetic markers used in these early studies, such as allozymes, are generally unreliable for assessing population structure, because they may themselves be targets of natural selection (Luikart et al. 2003; Charlesworth 2010). The development of genetic markers that directly amplify DNA, such as microsatellites, helped circumvent these limitations, as only markers that show neutral variation between populations can be used to estimate population structure (Luikart et al. 2003; Charlesworth 2010). Microsatellite markers are reliable and accurate for estimating neutral population genetic structure (e.g. Forstmeier et al. 2012), but their relatively low genomic coverage has likely limited the discovery of genetic loci that are the targets of natural selection (Meudt & Clarke 2007). Ultimately, the development of population genomics, which utilizes genetic techniques that can amplify many markers distributed throughout the genome, has made discovery of the molecular bases of adaptation and divergence practical in any species (Luikart et al. 2003).

By studying the population structure of neutral and outlier loci, the relative contribution of neutral and adaptive evolutionary processes to population divergence can be determined (Luikart et al. 2003; Stinchcombe & Hoekstra 2008; Storz & Wheat 2010). This can be accomplished by generating a distribution of population differentiation estimates (e.g. Wright's Fixation Index, or FST) at multiple loci throughout the genome (Fig. 1; Stinchcombe & Hoekstra 2008). Neutral loci will have intermediate differentiation, whereas loci that are outliers relative to the variation in population differentiation at the majority of loci may represent signatures of natural selection (Fig. 1; Stinchcombe & Hoekstra 2008; for alternate hypotheses see Siol et al. 2010; Bierne et al. 2011). Recent technological advances in amplifying multiple markers simultaneously throughout the genome have provided the means to study large portions of the genome in many individuals, greatly enhancing simultaneous estimation of the extent of population divergence due to neutral evolutionary forces and identification of loci that may be involved in adaptation and population divergence (Luikart et al. 2003; Stinchcombe & Hoekstra 2008). Population genomics has high power to detect the molecular mechanisms underlying adaptive divergence, in addition to providing reliable estimates of population structure due to neutral evolutionary processes (Luikart et al. 2003; Stinchcombe & Hoekstra 2008).
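The outlier logic described above (a genome-wide distribution of per-locus FST with a threshold separating neutral from outlier loci) can be sketched in a few lines. This is a simplified illustration only: it assumes biallelic loci with known allele frequencies, equal population weights, and an empirical quantile as the threshold, whereas genome-scan software typically derives the threshold from a simulated or modelled neutral distribution. The function names and the `quantile` parameter are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: allele frequency of one allele in each population
    (populations are weighted equally in this sketch).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic locus carries no differentiation signal
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop
    return (h_t - h_s) / h_t


def flag_outliers(locus_freqs, quantile=0.95):
    """Flag loci whose F_ST lies at or above an empirical quantile.

    locus_freqs: dict mapping locus name -> list of per-population
    allele frequencies. Returns (fst_by_locus, threshold, outlier_names).
    The quantile cut-off is an assumption made for illustration.
    """
    fst = {locus: fst_biallelic(f) for locus, f in locus_freqs.items()}
    ranked = sorted(fst.values())
    idx = min(len(ranked) - 1, int(quantile * len(ranked)))
    threshold = ranked[idx]
    outliers = [name for name, v in fst.items() if v >= threshold and v > 0]
    return fst, threshold, outliers
```

In this sketch a locus with frequencies of 0.05 and 0.95 in two populations has FST = 0.81 and would be flagged against a background of weakly differentiated loci. Whether a flagged locus is a target of selection is a separate question that the threshold alone cannot answer.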
Many different types of studies have taken advantage of these techniques, including the mapping of quantitative trait loci associated with adaptive phenotypes involved in population divergence (Rogers & Bernatchez 2005; Mackay & Stone 2009; Hager et al. 2009; Schielzeth et al. 2012), and the detection of loci involved in natural selection and speciation (Stinchcombe & Hoekstra 2008; Stapley et al. 2010; Strasburg et al. 2012).

Figure 1. A theoretical distribution of FST estimates (a measure of population differentiation) from many genetic loci distributed throughout the genome. Neutral loci show intermediate differentiation (black points), whereas outlier loci (red points) may have been affected by natural selection. The dotted line represents the FST threshold for distinguishing between neutral and outlier levels of population differentiation at each locus. From Stinchcombe & Hoekstra (2008).

In addition to neutral evolutionary processes that can cause population divergence, past hybridization or polyploidy may also complicate the discovery of loci involved in adaptation, due to brief, rapid genomic divergence (Soltis & Soltis 1999). Hybridization and polyploidy can cause novel gene function and genomic restructuring, which can be facilitated by or cause rapid ecological divergence, adaptation to novel environments and speciation (Soltis & Soltis 1999; Osborn et al. 2003; Lexer et al. 2003; Adams & Wendel 2005; Baack et al. 2005; Lai et al. 2005; Whitney et al. 2006; Rieseberg et al. 2007). Extensive genomic changes are common, particularly amongst polyploids, and can leave large sections of chromosomes in a highly differentiated state between populations, meaning genetic variation in a large portion of the genome is caused primarily by genomic changes brought about by hybridization or polyploidy (Soltis & Soltis 1999; Otto & Whitton 2000; Leitch & Bennett 2004; Leitch & Leitch 2008).
Additionally, introgression following hybridization or polyploidization can cause gene transfer amongst species, introducing novel genetic variation into populations (Chapman & Abbott 2010; Whitney et al. 2010). If populations are sufficiently isolated or incompatible following these major genomic changes, populations may ultimately diverge via genetic processes unrelated to adaptation (Soltis & Soltis 1999). The need to distinguish population divergence caused by genomic rearrangements or hybridization from divergence caused by natural selection underscores the importance of sampling replicate populations within each environment.

The determination of the structure of genetic variation in populations that inhabit different environments provides insight into the effects of selective and neutral evolutionary processes. The expansion of genomics research to include a diverse range of organisms and environments will lead to novel insights into the function and causes of genomic structure and further understanding of evolution in a diverse range of conditions. In this study, I investigate the population genomics of an allopolyploid plant that occupies a wide range of environments. The study of the population genomics of an organism with a complex polyploid genome will develop a foundation for investigating the mechanisms of evolution in a group of organisms that has received relatively little attention. The goals of this study are to 1) illustrate the utility of genome scans with a non-model organism with a complex genome, 2) investigate whether polyploidy inhibits the ability to detect signatures of selection and/or population differentiation, and 3) determine whether divergent selection pressures in a wide-ranging plant occur in different environments, such as alpine and lowland habitats.
1.1 IDENTIFYING SIGNATURES OF NATURAL SELECTION IN THE GENOME

The population genomics approach provides a powerful tool to identify signatures of natural selection in wild populations for initial insight into the patterns and mechanisms of adaptation, and is usually the first step in identifying candidate genes for adaptation (Luikart et al. 2003; Stinchcombe & Hoekstra 2008). Although many studies have sought the genetic basis of traits known to be adaptive, conducting genome scans on populations or ecotypes that show ecological differentiation provides the means for finding any loci associated with divergence or adaptation (Storz 2005; Stinchcombe & Hoekstra 2008; Stapley et al. 2010; Strasburg et al. 2012), and the genomic processes associated with evolution (Kim & Nielsen 2004; Nosil et al. 2009a; Skrede et al. 2009; Tice & Carlon 2011). Given functional information about loci of interest, selection can be detected on traits not previously hypothesized to be involved in ecological differentiation and on traits not easily quantified by direct observation, such as physiological or biochemical traits or patterns of gene expression (Nichols et al. 2008; Derome et al. 2008; Whiteley et al. 2008; Paris et al. 2010; Pavey et al. 2010; Storz & Wheat 2010).

By studying many loci distributed throughout the genome, loci that evolve primarily due to neutral and selective processes can be characterized. Quantifying the degree to which these markers are differentiated amongst populations or ecotypes provides a method for distinguishing neutral or demographic effects affecting the whole genome, as reflected in population differentiation amongst the majority of markers, from non-neutral effects, such as natural selection, at each locus (Luikart et al. 2003; Buerkle et al. 2011).
Under Hardy-Weinberg conditions, neutral loci typically exhibit intermediate differentiation, whereas loci affected by directional or balancing selection may show very high or low differentiation between populations or ecotypes (Luikart et al. 2003). The distribution and frequencies of these outlier loci amongst populations or environments can suggest how selection caused the excessively low or high differentiation (Luikart et al. 2003).

Most investigations in non-model organisms that have found signatures of natural selection have utilized anonymous markers (markers that do not contain sequence information and cannot be directly used to infer function), such as AFLP, that easily and predictably amplify across a wide variety of taxa (Stinchcombe & Hoekstra 2008). An unfortunate limitation of using anonymous markers is the difficulty in obtaining functional information from loci that show signatures of natural selection (Stinchcombe & Hoekstra 2008), although a few studies show promising results from isolating anonymous markers for further investigation (Paris et al. 2010; Paris & Despres 2012). With the development of molecular techniques for sequencing in non-model organisms (e.g. Baird et al. 2008), there is potential to utilize high-throughput sequencing to simultaneously identify outlier loci and obtain sequence information that may be used to determine the function of the region under selection. AFLP, however, remains a cost-effective and powerful method for discovering outlier loci that may be the target of natural selection.

Although many studies have identified outlier loci that might be targets of natural selection, not all have included information about the ecological context in which the outlier loci were found (Stinchcombe & Hoekstra 2008).
When ecological information is considered, many identified outlier loci are associated with a particular environment or ecotype, suggesting that these loci may be targets of natural selection (Rogers & Bernatchez 2007; Poncet, Herrmann, Gugerli, et al. 2010). These signatures of natural selection have been discovered in a variety of organisms along gradients of temperature, precipitation and altitude (Bonin et al. 2006a; Poncet, Herrmann, & Gugerli 2010; Freedman et al. 2010; Bradbury & Hubert 2010; Nunes, Beaumont, & Butlin 2011; Cox & Broeck 2011), in relation to postglacial colonization and ecological divergence (Bernatchez et al. 2010; Schluter et al. 2010; Freedman et al. 2010; Renaut & Maillet 2012), hybridization and introgression (Minder & Widmer 2008; Gagnaire et al. 2009; Whitney et al. 2010), and host-use differentiation (Egan et al. 2008; Apple et al. 2010; Funk et al. 2011). Loci associated with anthropogenic impacts or artificial selection in wild populations have also been detected using genome scans (Paris et al. 2010; Orsini et al. 2012), demonstrating the widespread utility of genome scans for discovering loci associated with selection. Nevertheless, only a few studies have investigated the function of outlier loci or attempted to confirm they are actually under selection by demonstrating variation in fitness conferred by different alleles (Stinchcombe & Hoekstra 2008; Lowry et al. 2009; Bernatchez et al. 2010; Schluter et al. 2010). Additionally, existing studies have covered a variety of organisms in different ecological settings, but genome scans for signatures of natural selection in organisms with large, complex genomes (such as polyploids) with potentially different responses to natural selection have not yet been conducted.

1.2 CHALLENGES OF POLYPLOIDY

Polyploidy plays a major role in the evolution of plants, fungi and several animal lineages (Otto & Whitton 2000; Wendel 2000; De Bodt et al. 2005; Soltis et al. 2009), and diploid organisms are poor models for understanding polyploid genomics. Polyploid species have different modes of inheritance, undergo major genomic changes during formation, and often express genes differently than closely related diploids (Soltis & Soltis 1999; Adams & Wendel 2005). In addition to the challenges of the unconventional genomes of polyploids, common population-genetic models, such as the Hardy-Weinberg equilibrium, and metrics, such as F-statistics (e.g. FST), assume heterozygous individuals have a maximum of two alleles, posing a challenge for the analysis of polyploid populations.

Polyploids originate by two main mechanisms: autopolyploidy, or polyploidization within a single lineage; and allopolyploidy, or polyploidy associated with interspecific hybridization (Soltis & Soltis 1999). The hybridization of divergent genomes in allopolyploids can amplify allelic diversity at each genetic locus (Soltis & Soltis 2000). However, in all polyploids, genome reduction and genetic bottlenecks following polyploidization may reduce genetic diversity (Leitch & Bennett 2004). In addition to originating with potentially extensive genetic diversity, the mechanism of inheritance in polyploids (polysomic inheritance) lowers the probability of allelic loss through genetic drift or inbreeding, so that allele fixation requires many more generations than in diploid populations (Ronfort et al. 1998; Soltis & Soltis 2000). Detection of the genomic effects of natural selection may be constrained by the “fixed heterozygosity” commonly observed in polyploids due to the effects of polysomic inheritance. Fixed heterozygosity may prolong adaptive potential in polyploids (via retaining copies of alternative alleles), but it may also reduce polyploid fitness through the retention of deleterious alleles despite selection against them (Otto & Whitton 2000).
Fixed heterozygosity may confound attempts to discover outlier loci as signatures of selection, as alleles would be consistently present if duplicate copies were identical by descent. If so, profiling genome-wide expression in divergent ecotypes may be the best way to discover loci of ecological or evolutionary interest.

The potential for up to four alleles per individual in tetraploid populations poses a number of unique challenges for commonly used methods in population genetics based on F-statistics. F-statistics such as FST or FIS use estimates of heterozygosity in diploid systems to quantify the differentiation between populations (FST) or inbreeding within a population (FIS; Hamilton 2009). Mathematical methods have been developed to estimate FST from heterozygosity data in polyploids (e.g. Clark & Jasieniuk 2011), but assessing the allele dosage is difficult or impossible with commonly used genetic markers (but see Esselink et al. 2004), making the application of these methods impractical. However, several methods have been developed to estimate F-statistics from haploid data generated from amplified fragment length polymorphism (AFLP) analysis (Foll & Gaggiotti 2008; Foll et al. 2010), which in practice gives the same type of data for both polyploids and diploids (i.e. alleles are either present or absent, and there is no direct information on heterozygosity). AFLP data are also amenable to use with methods for assessing population structure that do not rely on F-statistics, such as distance- or model-based methods (Pritchard et al. 2000).

AFLPs can be used to estimate F-statistics in polyploids, but care must be taken when inferring population structure or evolutionary processes. For example, most polyploid species studied to date have multiple origins, and recurring hybridization and introgression are common amongst polyploid species and their progenitors (Soltis & Soltis 1999; Soltis et al. 2004; Grubbs et al. 2009; Wu et al. 2010).
Additionally, population structure in polyploids is often affected by differences in breeding systems. For instance, apomictic polyploid species often segregate based on self-compatibility, as opposed to geographic parameters, although several of these studies are based on triploid populations, which often have a higher rate of selfing and infertility than tetraploids (Chapman et al. 2000; Meirmans et al. 2003; Van Der Hulst et al. 2003; Lo et al. 2009; Symonds et al. 2010).

1.3 AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLP)

Molecular methods for amplifying a large number of loci in a single reaction continue to be refined and improved, particularly for high-throughput sequencing (e.g. Baird et al. 2008). Amplified fragment length polymorphism (AFLP) is a method for amplifying hundreds of markers throughout the genome and has a proven record in population genomics research in a diverse range of organisms (Meudt & Clarke 2007). The method has four basic steps: 1) digestion of DNA with restriction enzymes, 2) ligation of adaptors onto the sticky ends of the DNA at the restriction site, 3) preselective amplification of successfully ligated DNA fragments using PCR with primers complementary to the adaptors, and 4) selective amplification, using PCR, of DNA fragments that have sequences complementary to fluorescently labeled primers with three arbitrarily chosen nucleotides on the end of the primer. The resulting fragments are separated using capillary electrophoresis and each AFLP locus is scored based on fragment size in base pair length. AFLP therefore amplifies specific, repeatable sections of the genome that can be used for genomic investigation in any organism. The basis for detecting genetic variation in AFLP relies primarily on the presence or absence of mutations in the restriction site (Meudt & Clarke 2007).
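As a rough illustration of how sized fragments from capillary electrophoresis become a presence/absence data set, the sketch below bins peaks by rounded fragment length and builds a binary matrix across samples. It is not the scoring procedure used in this thesis: the function name, the size window and the peak-height cut-off are all hypothetical values chosen for the example.

```python
def score_aflp(peaks_by_sample, min_size=50, max_size=500, min_height=100):
    """Convert sized AFLP peaks into a binary presence/absence matrix.

    peaks_by_sample: dict mapping sample name -> list of
    (fragment_size_bp, peak_height) tuples.
    A locus is a fragment length rounded to the nearest base pair.
    The size window and height cut-off are illustrative assumptions.
    Returns (loci, matrix) where matrix[sample] is a list of 0/1 values
    aligned with the sorted list of loci.
    """
    calls = {}
    for sample, peaks in peaks_by_sample.items():
        # Keep only peaks inside the size window and above the height cut-off.
        calls[sample] = {
            round(size)
            for size, height in peaks
            if min_size <= size <= max_size and height >= min_height
        }
    # A locus exists if at least one sample shows a fragment of that length.
    loci = sorted(set().union(*calls.values())) if calls else []
    matrix = {
        sample: [1 if locus in present else 0 for locus in loci]
        for sample, present in calls.items()
    }
    return loci, matrix
```

For two samples where only one carries a scorable 201 bp fragment, the returned matrix has a column for locus 201 with values 1 and 0. Real data sets additionally require replicate-based error checking, which this sketch omits.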
If there is a mutation in a restriction site, the restriction enzyme will not cleave the DNA, thus preventing the ligation of adaptors and amplification in the final product. AFLP markers are scored as either present or absent (a binary state) at each locus (each locus being a particular fragment length in base pairs; see Appendix C for example data). The term “AFLP allele” refers to the two alternate states at these presence/absence loci and not to the actual alleles at a genetic locus, which can potentially number many more than two. Dominant AFLP data refers to the state in which an allele is scored as either present or absent. The height of the AFLP amplification peak can also be used as the basis for genotyping loci (Fischer et al. 2011), but the reliability and accuracy of peak height data for genomic inference is largely unknown, while dominant AFLP data continues to be a reliable basis for genome scans (e.g. Tice & Carlon 2011).

1.3 ALPINE AND LOWLAND ENVIRONMENTS

Alpine and lowland habitats differ extensively in abiotic and biotic conditions (Billings 1974), and several studies have found divergent adaptation between alpine and lowland populations (Emery & Chinnappa 1994; Bonin et al. 2006a; Poncet, Herrmann, & Gugerli 2010; Fischer et al. 2011). The extreme nature of alpine environments has favoured the evolution of alpine specialist species, and speciation itself may be accelerated in alpine habitats (Billings 1974; Hughes & Eastwood 2006), making alpine and lowland systems ideal habitats to study population divergence amongst terrestrial organisms (Schonswetter et al. 2003; Pinceel et al. 2005; Mráz et al. 2007). Alpine habitats are generally characterized as having extreme abiotic environments, being generally colder and more exposed, with a shorter growing season, lower predation, and more intense radiation with a greater short-wavelength (blue/UV) component than lowland environments (Billings 1974; Emery & Chinnappa 1994).
Lowland habitats are generally characterized as less extreme and less exposed environments, with more competition between species, warmer and longer growing seasons, and less intense radiation, but a higher far-red and infrared spectrum intensity (Billings 1974; Emery & Chinnappa 1994). Alpine and lowland habitats can be sources of divergent selection in species that occupy both habitats (Byars et al. 2007; Gonzalo-Turpin & Hazard 2009; Ikeda & Setoguchi 2010). If so, these differences in selection should be evident at the molecular level. Determining the genetic mechanisms that contribute to these differences would be important both for understanding the evolution of adaptive trait variation in these environments and for characterizing putative candidate genes associated with economically important traits, such as cold tolerance, response to drought and limited nutrients, and response to environmental stress in general.

In addition to functional regions of the genome that may be affected by natural selection, ecological differences between lowland and alpine environments can affect patterns of dispersal, rates of population divergence, and speciation (Hughes & Eastwood 2006; Alvarez et al. 2009; Huang et al. 2011; Buehler et al. 2012). Alpine environments in particular can be functionally similar to islands, with rapid divergence occurring between populations or species as alpine habitats are colonized (Hughes & Eastwood 2006). The isolation of mountain tops amongst intervening temperate habitats can reinforce this differentiation through restricted gene flow (Aegisdóttir et al. 2009; Huang et al. 2011; Buehler et al. 2012). Additionally, environmental variation between alpine sites and the generally extreme nature of alpine habitats may further limit the success of individuals dispersed between populations, thereby reducing gene flow and accelerating differentiation between alpine populations (Alvarez et al. 2009; Meirmans et al. 2011).
If conditions vary between alpine populations or different molecular mechanisms of adaptation have evolved between populations in response to alpine environmental conditions, then gene flow may be reduced at loci under selection (Lenormand 2002). Lowland environments can have a more continuous landscape than alpine environments, particularly in the semi-grassland habitats that were investigated in this study. The relatively homogeneous abiotic environment across lowland habitats and the lack of major barriers to dispersal allows more extensive gene flow between lowland populations, which may reduce genetic differentiation (e.g. Carter & Robinson 1993).

1.4 RESEARCH OBJECTIVES, HYPOTHESES AND PREDICTIONS

In this study, I conducted a genome scan for signatures of natural selection between alpine and lowland ecotypes of the allopolyploid plant Anemone multifida Poir. (Ranunculaceae). Anemone multifida is a widespread species (Argentina to Alaska) that occupies habitats from sea level to high alpine, making it a good candidate for a genome scan for signatures of natural selection in a species with a large, complex genome. A. multifida is hypothesized to be an allotetraploid based on observations of two distinct chromosome sets, one of which is similar to chromosomes from a clade of alpine specialist species, whereas the other set is more similar to chromosomes from a lowland clade (Meyer et al. 2010; Hoot et al. 2012). Therefore, A. multifida may possess alternate copies of alleles that are advantageous in alpine environments (from the “alpine” chromosome set) and lowland environments (from the “lowland” chromosome set), which may explain its wide habitat range (Meyer et al. 2010; Hoot et al. 2012). A. multifida is distributed from sea level to 4200 m (approximately 2300 m in North America) and has a discontinuous range throughout North America and temperate regions of South America (Hoot et al. 2012). Both sympatric and allopatric populations of A.
multifida exhibit extensive morphological variation (Meyer et al. 2012; Hoot et al. 2012). Throughout its North American distribution there are white, red and pink flowers, whereas only white flowers occur in South America.

The goals of this study were to determine 1) whether the genome of A. multifida includes outlier loci that may have been the target of natural selection, 2) whether populations in the same environment possess similar alleles that differ between environments, which would indicate that natural selection (vs. genetic drift) has led to alpine and lowland adaptation, or 3) whether neutral population genetic structure suggests that non-selective processes have driven population divergence. For the first goal, I conducted a genome scan on individuals collected from lowland and alpine populations in and along the Rocky Mountains of western Canada, and quantified differentiation as FST at each AFLP locus across all of the populations. Loci that showed very high or very low FST were deemed to be outliers, which could indicate signatures of natural selection in the genome amongst all populations, although genetic drift could also account for genetic differentiation at outlier loci. If natural selection has affected the genome of A. multifida, outlier loci should show very limited differentiation between populations in the case of balancing selection, or extensive differentiation if divergent selection has had an effect. Alternatively, if natural selection has not affected specific sites throughout the genome of A. multifida, all loci should exhibit similar intermediate differentiation, with no outlying loci. The population structure and distribution of outlier loci provides information about the context in which natural selection might be acting on outlier regions of the genome, but this analysis alone does not determine whether population structure caused by neutral evolutionary processes could be maintaining differentiation at outlier loci.
Therefore, population structure was analyzed for non-outlier (neutral) loci. Analysis across multiple populations from each environment allows separation of demographic and environmental factors in determining genetic population structure, and helps identify whether populations diverged primarily due to selectively neutral processes. If so, gene flow or genetic drift has affected population evolution, which should be evident at the whole genome scale (i.e. neutral loci). Given low gene flow and/or high genetic drift, populations should be highly differentiated at neutral loci. In contrast, high gene flow and/or negligible genetic drift should generate limited genomic differentiation between populations.

Although outliers within genetic data would reveal signatures of natural selection in the genome of Anemone multifida, the distribution of outlier alleles within and between each population is necessary to assess if the differing conditions in alpine and lowland habitats may be driving patterns of genetic differentiation. Therefore, for the second part of this study, I conducted multiple analyses of population structure for outlier loci using a combination of distance- and model-based methods, as well as estimates of FST and allele frequencies within and between populations, to determine whether outlier loci were segregated according to environment. To link the genetic data to potentially adaptive phenotypes, I also tested for associations between genetic markers, plant height and floral colour (as it is a variable trait in this species) amongst all populations. Shorter plant height may be selected in alpine environments to prevent damage from wind, falling debris and freezing, and taller plant height may be selected in lowland environments to avoid competition for light (Billings 1974; Emery & Chinnappa 1994). In the event of a genotype-phenotype correlation, I could determine whether phenotypes were subject to balancing or directional selection.
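As a rough illustration of the per-locus differentiation measure referred to in these objectives, the following Python sketch computes FST at a single dominant locus from band presence/absence scores. It is a sketch under stated assumptions, not the estimator used in this study: it assumes diploid inheritance and Hardy-Weinberg proportions, estimates the null-allele frequency as the square root of the band-absent frequency, and weights populations equally. The function names are hypothetical.

```python
from math import sqrt


def null_allele_freq(bands):
    """Square-root estimate of the null (band-absent) allele frequency.

    `bands` is a list of 0/1 scores for one locus in one population.
    Assumes diploidy and Hardy-Weinberg proportions (an assumption of
    this sketch, not a property of the thesis data).
    """
    absent = sum(1 for b in bands if b == 0)
    return sqrt(absent / len(bands))


def locus_fst(populations):
    """FST = (HT - HS) / HT for one dominant locus.

    `populations` is a list of per-population 0/1 score lists.
    Populations are weighted equally (a simplification).
    """
    qs = [null_allele_freq(p) for p in populations]
    # Mean expected heterozygosity within populations
    h_s = sum(2 * q * (1 - q) for q in qs) / len(qs)
    # Expected heterozygosity of the pooled (mean) allele frequency
    q_bar = sum(qs) / len(qs)
    h_t = 2 * q_bar * (1 - q_bar)
    # A locus with no variation carries no information on differentiation
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Band fixed in one population and absent in the other: maximal FST
fixed_difference = locus_fst([[0, 0, 0, 0], [1, 1, 1, 1]])
# Same band frequency in both populations: no differentiation
no_difference = locus_fst([[1, 0, 1, 0], [0, 1, 0, 1]])
```

In a genome scan, a value like this is obtained for every locus, and loci in the extreme tails of the resulting distribution become candidate outliers.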
CHAPTER 2: MATERIALS AND METHODS

2.1 STUDY SPECIES, FIELD SAMPLING AND POPULATION CHARACTERISTICS

Leaf tissue was sampled from A. multifida individuals during flowering from two alpine and three lowland sites in Alberta, Canada, during June and July, 2011 (Table 1, Figure 2). Within populations, plants were sampled along a transect with a minimum distance of 7 m between individuals. Leaf material was placed in plastic bags with silica gel for storage and future DNA extraction. Floral colour and plant height were also measured in the field. To measure floral colour, petals were collected from individuals displaying all floral colour morphs (white, red and pink) in a sample from Big Hill Springs, Alberta, placed in a cooler to prevent pigment degradation during transport, and scanned the same day with an Ocean Optics USB 2000 spectrophotometer to assess floral colour (following McEwen and Vamosi 2010). Floral colours generally fell into white (uniform transmittance across the visual spectrum), red (transmittance in visual-red wavelengths), and pink (slightly higher uniform transmittance and lower transmittance in visual-red spectrum) with no UV reflectance, so remaining floral colour phenotypes were scored according to their visual colour without a spectrophotometer. Above-ground plant height was measured on live individuals from the base of the plant at the soil to the tallest flowering shoot using a tape measure.

Table 1. Location and elevation of sites from which A. multifida was sampled. The lowland populations were from Big Hill Springs (BHS), Beauvais Lake (BL), and Willow Creek (WC), while the alpine populations were from Hailstone Butte (HSB) and Highwood Pass (HWP).

Population         Final Sample Size   Latitude (°N)   Longitude (°W)   Elevation (m)
Big Hill Springs   24                  51.251          114.386          1229
Beauvais Lake      25                  49.415          114.092          1472
Hailstone Butte    29                  50.205          114.445          2080
Highwood Pass      24                  50.604          114.984          2377
Willow Creek       21                  50.117          113.777          1055

Figure 2.
Location of populations sampled from Alberta, Canada, in and along the Rocky Mountains and foothills during June and July, 2011. Populations BHS, WC and BL are lowland (1055 – 1472 metres) and HWP and HSB are alpine (2080 – 2377 metres) populations.

The environmental differences between populations were not quantified in this study, but are readily available from other sources for populations adjacent to the sites used in this study (Emery & Chinnappa 1994). Alpine and lowland environments differ considerably in their abiotic characteristics (Table 2; Emery & Chinnappa 1994). Alpine environments generally have more photosynthetically active radiation, stronger winds, lower temperatures and briefer growing seasons than lowland habitats (Table 2; Emery & Chinnappa 1994). In lowland environments, the potential effects of more intense competition are evident in the lower soil moisture and nutrient content, as well as the greater biomass and height at herbaceous plant layers (Table 2; Emery & Chinnappa 1994).

Table 2. The environmental differences between an alpine and a lowland environment near the sites sampled in this study, based on Emery & Chinnappa (1994). Only soil NH3 is provided, as other soil nutrients (NO3 and PO4) follow a similar pattern (higher nutrient and organic content in alpine than lowland). PAR - photosynthetically active radiation.

Variable                           Alpine   Lowland
Elevation (m)                      2453     1310
PAR (µgE/sm2)                      2242     1627
Wind (m/s)                         6.6      2.8
Growing season temperature (°C)    7.9      14.7
Herb layer biomass (g/m2)          142.3    572.2
Herb layer height (cm)             15.3     72.9
Soil moisture (% wt)               60.9     35.5
Soil NH3 (µg/g dry mass)           62.1     13.9

2.2 DNA EXTRACTION, AFLP AND ALLELE SCORING

DNA was extracted from silica-dried leaf tissues using a standard CTAB/chloroform DNA extraction protocol (Khanuja et al. 1999). Leaves were crushed in a microfuge tube and incubated overnight in a CTAB/β-mercaptoethanol buffer to disrupt tissues and lyse cells.
DNA was separated with chloroform, precipitated in ethanol overnight and re-suspended in ddH2O. DNA quality was determined with agarose gel electrophoresis to assess any DNA degradation, and a Beckman Coulter DTX 880 Multimode Detector spectrophotometer (Beckman Coulter, Brea, CA, USA) was used to assess contamination from protein and RNA and quantify DNA. DNA concentration was standardized to 150 ng/µL and a total of 750 ng was used for amplified fragment length polymorphism (AFLP) analysis following the Amplification Kit for Regular Plant Genomes (Applied Biosystems, Carlsbad, CA, USA) using the restriction enzymes EcoR I and Mse I (New England BioLabs, Ipswich, MA, USA). DNA was digested by incubating overnight with the restriction enzymes, T4 DNA ligase, NaCl, BSA and the complementary adaptors, checked for complete digestion on an agarose gel and diluted to a 10X concentration in water for preselective amplification. Preselective amplification was conducted with the supplied reagents according to the manufacturer’s instructions (Appendix A), and checked on an agarose gel to verify that amplification occurred in the 100-1500 bp range. Preselective product was diluted to a 5X concentration for selective amplification. Selective amplification was performed on the preselective product with MseI - EcoRI adaptors CAA-ACG, CAC-ACG, and CTC-AGG with the AFLP Amplification Core Mix PCR master mix (Applied Biosystems, Carlsbad, CA, USA). AFLP fragments were separated on an Applied Biosystems 3500xL Genetic Analyzer (Applied Biosystems, Carlsbad, CA, USA) at the University of Calgary, Department of Biological Sciences. Allele sizes (in base pairs) were determined by reference to the internal sizing standard (GS-500 LIZ) in the software GENEMAPPER v4.0 (Applied Biosystems, Carlsbad, CA, USA).
Fragments between 100-500 bp were scored using automatic allele binning in Genemapper, with a cut-off intensity of 100 fluorescent units to minimize false-allele calling from low level artifacts in the electropherogram. The polymorphic peaks identified in Genemapper were then manually checked for quality and consistent scoring. AFLP alleles with multiple peaks were discarded due to the unreliable sizing of the fragments. AFLP alleles with amplification at or near the 100 fluorescent unit cut-off were manually checked for consistent scoring, as peaks with amplification just below the threshold can be a major source of allele dropout (Luikart et al. 2003). The identified alleles were first checked against five DNA sample replicates on different gels. The error rate after correcting for peak quality was determined as the proportion of inconsistently scored loci. Loci that were inconsistently scored between DNA replicates were removed from the final data set to reduce the error rate as much as possible for use in determining genotype and population structure. Samples that had weak amplification or high noise across the electropherogram were also discarded to avoid allele dropout and false-allele calling stemming from failed or non-optimal PCR conditions, leaving 479 loci in the final dataset.

2.3 DETECTION OF OUTLIER LOCI

Genemapper provides the option of exporting both the dominant (binary, present or absent allele information) and peak height data from each allele (if the allele is present). Throughout the history of AFLP, most analyses have chosen to use the dominant data as the basis of genotyping individuals (Foll et al. 2010). I used BayeScan (Foll & Gaggiotti 2008; Foll et al. 2010; Fischer et al. 2011) to identify outlier AFLP loci based on a decomposition of the logistic transformation of FST for locus i in population j onto locus-specific (αi) and population-specific (βj) components (Foll & Gaggiotti 2008).
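For reference, the decomposition described in the preceding sentence can be written out as follows. This is a restatement in the notation of Foll & Gaggiotti (2008), where the superscript ij (a notational choice made here) marks the FST coefficient for locus i in population j:

```latex
\log\left( \frac{F_{ST}^{ij}}{1 - F_{ST}^{ij}} \right) = \alpha_i + \beta_j
```

Here α_i collects effects shared by all populations at locus i, and β_j collects effects shared by all loci in population j.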
To identify outlier loci, the posterior probability that each locus is an outlier (αi ≠ 0) was estimated with a Markov Chain Monte Carlo method, as the proportion of iterations for which α was included in the model during sampling. In this study, I considered posterior odds > 10 (log10 posterior odds > 1) as indicating that a particular locus is an outlier, as in previous investigations (Foll & Gaggiotti 2008; White, Stamford, et al. 2010; Alberto et al. 2010; Foll et al. 2010; Fischer et al. 2011). Amongst the identified outlier loci, the mechanism of selection can be inferred from αi, with negative αi indicating candidates for balancing selection and positive values indicating candidates for directional selection (Foll & Gaggiotti 2008). In this study, I used a burn-in of 50,000 iterations, and a sample size of 10,000 with a thinning interval of 50 (following Foll & Gaggiotti 2008; Fischer et al. 2011). The number and identity of loci determined to be outliers with peak height and dominant data were compared to determine if any major discrepancies occurred when using either form of AFLP data, but only binary data was used for subsequent analyses, as most population-genetics programs currently available accept only dominant data inputs.

2.4 GENETIC AND POPULATION STRUCTURE ANALYSES

The number of distinct genetic clusters within each dataset was first identified with a principal components analysis of the AFLP genotype data (example in Appendix C) using R statistical software (R Development Core Team 2008), and any apparent clustering in the neutral and outlier loci data along the first and second principal component axes was assessed for both within and between population clustering (e.g. Bryc et al. 2010).
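To make the principal components step concrete, the following Python sketch extracts the first principal component from a small binary genotype matrix (rows are individuals, columns are AFLP loci). It is only an illustration: the analysis in this study was run in R, power iteration is swapped in here for a full eigendecomposition so that the example needs no external libraries, and the function name and toy data are hypothetical.

```python
from math import sqrt


def first_principal_component(matrix, iterations=200):
    """Return (scores, proportion of variance) for the first PC.

    `matrix` holds 0/1 band scores, one row per individual (at least
    two rows are needed for the covariance estimate).
    """
    n, m = len(matrix), len(matrix[0])
    # Centre each locus (column) on its mean band frequency
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Locus-by-locus covariance matrix
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration from an arbitrary non-zero start vector
    v = [1.0 / (k + 1) for k in range(m)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance at any locus
            break
        v = [x / norm for x in w]
    length = sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    # Rayleigh quotient gives the leading eigenvalue
    cov_v = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(m))
    total_variance = sum(cov[a][a] for a in range(m))
    explained = eigenvalue / total_variance if total_variance else 0.0
    # Project each individual onto the first axis
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centred]
    return scores, explained


# Toy data: two groups of individuals with opposite band patterns
toy = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
pc1_scores, pc1_explained = first_principal_component(toy)
```

Plotting the scores on the first two axes, grouped by site of origin, gives the kind of scatterplot used to look for clustering within and between populations.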
Clustering methods can also be useful for visualizations and initial investigations of clustering, but these distance-based methods alone do not constitute a rigorous test of genetic clustering and are prone to variation in interpretation of figures and the distance measurement used (Pritchard et al. 2000). I also used the individual assignment-based approach implemented in STRUCTURE version 2.3.3 (Pritchard et al. 2000; Falush et al. 2003, 2007; Hubisz et al. 2009). STRUCTURE takes a Bayesian approach by sampling the posterior probability of the number of distinct populations (or genetic clusters), given the observed number of genotypes, using Markov Chain Monte Carlo (MCMC) methods (Pritchard et al. 2000), with the parameters outlined below. Two ancestry models can be used with MCMC sampling in STRUCTURE, one assuming no admixture (i.e. all individuals come from one population of origin but populations have not interbred since) and one allowing admixture (i.e. gene flow may have occurred between two or more populations). I used the admixture model for this study, as it seemed most reasonable considering the ecological and evolutionary history of A. multifida. For the neutral and outlier loci data, simulations using a burn-in of 10000 iterations and 10000 MCMC replicates after burn-in were used to determine the probability of the model assuming 1 to 7 populations. These simulations were replicated 10 times at each level of K (the number of putative populations) to determine the variation in probability estimates, allowing for a correction of the STRUCTURE results such that the most likely number of unique genetic clusters was found (Evanno et al. 2005). I analyzed the distribution of genetic variation among and within populations with an analysis of molecular variance (AMOVA) for both the outlier and neutral loci using GenAlEx (Peakall & Smouse 2006; also see Gaudeul et al. 2004; Honnay et al. 2009).
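The correction applied to the replicated STRUCTURE runs can be sketched in Python as below. This is one common reading of the ΔK statistic of Evanno et al. (2005), computed here from the mean log-likelihood L(K) across replicates and the standard deviation of L(K); the function name, the input layout (a dictionary from K to replicate log-likelihoods) and the toy numbers are assumptions for illustration, not output from this study.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    `log_likelihoods` maps each K to a list of log-likelihoods, one per
    replicate run (at least two replicates per K). Delta K is undefined
    for the smallest and largest K tested, so those are skipped.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(log_likelihoods[k])
        if sd == 0:  # identical replicates: ratio is undefined
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / sd
    return result


# Toy replicate log-likelihoods (hypothetical values)
runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-48.0, -50.0],
    4: [-47.0, -49.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

The K with the largest ΔK is taken as the most likely number of clusters; because the statistic needs both neighbours, testing K from 1 to 7 yields ΔK only for K from 2 to 6.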
AMOVA can be used to test for genetic variance among populations (i.e. significant population structure), and differentiation amongst individuals within populations (i.e. the population reproduces sexually). The significance of the proportion of variance attributed to among-population effects (ϕ) is tested by comparing the observed ϕ to a distribution of ϕ based on simulated populations of randomly assigned individuals (Peakall & Smouse 2006). To estimate and test genetic population structure between sampled sites (FST), I used AFLPsurv v1.0 (Vekemans et al. 2002), which assesses FST from the frequency of the null allele using a number of options. AFLPsurv uses a Bayesian method to estimate the frequency of the null allele from the sample size (number of individuals) and the number of individuals that have a null allele (Vekemans et al. 2002). I chose a Bayesian method with non-uniform prior distribution of allele frequencies (following the model by Zhivotovsky 1999) assuming Hardy-Weinberg conditions were met, which has been regularly used for estimating null allele frequencies in AFLP studies (Vekemans et al. 2002; Bonin et al. 2007). AFLPsurv also assumes that individuals are diploid, possibly leading to higher estimates of population differentiation in polyploid species (i.e. there may be higher heterozygosity within populations due to the possibility of more than two alleles at each locus). To assess effects of isolation by distance for the neutral and outlier loci, I assessed the relation of FST (estimated with AFLPsurv) to inter-population distance with linear regression.

Determination of patterns of population structure at the neutral and outlier loci assumes that loci are transmitted independently of each other. Instead, loci may be in gametic-phase disequilibrium, tending to vary together within and between populations, because of physical linkage or other non-random associations between alleles.
In such cases, functional regions of the genome under selection typically impact areas surrounding the allele under selection (Nosil et al. 2009b; Feder & Nosil 2010). Testing for gametic-phase disequilibrium amongst outliers is necessary to isolate the effects of genetic hitchhiking from drift or selection at each locus. To test for gametic-phase disequilibrium amongst outlier loci, I used MultiLocus 1.3 (Agapow & Burt 2001), which calculates the index of association (Brown et al. 1980; Smith et al. 1993; Haubold et al. 1998) by comparing the number of loci that differ in pairwise comparisons of all individuals. The variation in the number of different loci between individuals is then tested against the number of loci that differ between individuals expected when loci are in equilibrium (Agapow & Burt 2001). Due to computational limitations, disequilibrium was not assessed between neutral loci.

2.5 PHENOTYPE ANALYSES

Population differences in floral colour and plant height were assessed to determine whether environment or demography may have affected the evolution of these ecologically important phenotypes. All statistical tests were done in R statistical software (R Development Core Team 2008). Differences in the frequencies of floral colour morphs among populations were assessed with a chi-square test of independence. ANOVA was used to test for differences in plant height among populations. Many phenotypes display remarkable plasticity, particularly between alpine and lowland environments (e.g. Chinnappa et al. 2005). To determine whether population differences in floral colour distribution or mean plant height represented plasticity, or may have been affected by natural selection, I searched the genetic data for associations with the measured phenotypes. Multiple Spearman correlations were used to detect associations between phenotypes and each allele.
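Because one correlation is tested per allele, the resulting p-values need adjusting for multiple comparisons. The Python sketch below shows two standard adjustments, Bonferroni and the Benjamini & Hochberg (1995) false discovery rate procedure, written to mirror what R's `p.adjust` returns for `method = "bonferroni"` and `method = "fdr"`. The function names and the example p-values are hypothetical and are not results from this study.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four allele-phenotype correlations
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

With these toy values, all four tests stay below 0.05 after the false discovery rate adjustment but only two do after Bonferroni, which illustrates why the less conservative procedure is useful when many loci are tested.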
As an initial correction for multiple comparisons, Bonferroni correction was used on the resulting p-values. Given the low power of this approach, particularly with many comparisons (Ryman & Jorde 2001), I also controlled the false discovery rate across the multiple comparisons using the “fdr” option in R, following Benjamini & Hochberg (1995).

CHAPTER 3: RESULTS

3.1 AFLP AND THE DETECTION OF OUTLIER LOCI

A total of 759 markers amplified with the three primer combinations. There were 511 AFLP markers that remained after discarding monomorphic markers and those with poor peak quality. Of these, 32 AFLP loci were inconsistently scored between DNA replicates (approximately 6.26%). After removing these 32 loci, 479 markers remained in the final dataset (see Table 3). Amongst all alpine and lowland populations, 13 loci (2.7%) were significant outliers amongst the dominant AFLP data, and nine were outliers (1.9%) based on peak height (all of which were included in the dominant data set). Overall, loci assessed with peak height had lower posterior odds at high FST values than the dominant data, but slightly higher posterior odds at moderate FST (Fig. 3, Fig. 4). The false-discovery rate for the dominant data was 0.022, almost half that for peak height (0.041). The power (1 – false negative rate) for peak height was 0.893, which was lower than, but in a similar range to, the power for the dominant data (0.898). All outlier loci have positive α, suggesting divergent selection (Foll & Gaggiotti 2008).

Table 3. Basic AFLP primer pair characteristics, including NBANDS, the number of bands scored, NSAMPLES, the number of samples successfully scored, HE, expected heterozygosity, HEprimer, expected heterozygosity averaged over primer combinations, HEpop, the expected heterozygosity averaged over populations, and P, the proportion of polymorphic markers.
Primer pair            Dye   NBANDS   NSAMPLES
EcoRI-CAA MseI-ACG     JOE   133      122
EcoRI-CAC MseI-ACG     JOE   163      122
EcoRI-CTC MseI-AGG     JOE   183      122

             EcoRI-CAA MseI-ACG   EcoRI-CAC MseI-ACG   EcoRI-CTC MseI-AGG
Population   HE      P            HE      P            HE      P            HEprimer
BHS          0.094   0.233        0.134   0.307        0.123   0.295        0.117
BL           0.084   0.248        0.104   0.294        0.113   0.295        0.100
HSB          0.104   0.263        0.127   0.307        0.131   0.328        0.121
HWP          0.106   0.301        0.128   0.344        0.135   0.388        0.123
WC           0.097   0.308        0.130   0.368        0.129   0.355        0.118
HEpop        0.097                0.124                0.126

Figure 3. Relation of FST to the log posterior odds (log(PO)) that a particular locus in the dominant dataset is an outlier, as identified in BayeScan (Foll et al. 2010; Fischer et al. 2011). A log posterior odds of 1 (vertical line) was used as the outlier threshold. Outliers may represent signatures of natural selection.

Figure 4. Relation of FST to the log posterior odds (log(PO)) that a particular locus in the dominant dataset is an outlier, as identified in BayeScan (Foll et al. 2010; Fischer et al. 2011). A log posterior odds of 1 (vertical line) was used as the outlier threshold. Outliers may represent signatures of natural selection.

3.2 POPULATION STRUCTURE OF NEUTRAL LOCI

The first and second principal components of the neutral locus dataset explained approximately 13.0% and 10.3% of the overall variance, respectively, for a cumulative proportion of 23.3%. Additional principal components explained less than 6% of the variance individually. At the neutral loci, most individuals clustered together, regardless of population of origin, although there was variation amongst individuals along the PC1 axis (Fig. 5). There were a number of individuals that deviated from the major cluster (Fig. 5). Of particular note, 5 individuals from the HWP alpine population clustered together and deviated from other individuals and populations (Fig. 5). There was also a cluster of 6 individuals from three populations, including individuals from both alpine and lowland populations (Fig. 5).
These individuals were closer to the main cluster than the more distant HWP cluster, but still showed a moderate degree of differentiation from most individuals at the neutral loci.

Figure 5. Scatterplot of the first two principal components of variation in AFLP genotype for neutral loci (loci linked to demographic processes, not including outlier loci). Alpine populations, HWP and HSB; lowland populations, BHS, BL, and WC.

STRUCTURE analysis detected K = 4 or 5 distinct populations. Other models including K from 1 to 3 or 6 to 7 had substantially lower likelihoods. After correction for the variance in probability estimates, according to Evanno et al. (2005), the model of K = 4 populations had the highest support, whereas K = 5 had substantially lower support than other models (Fig. 6). The number of distinct genetic clusters within the data was therefore deemed to be 4 for future plotting and analyses. The inference of 4 distinct candidate populations (beyond the 3 evident in the PCA) suggests genetic structure amongst populations. In agreement with the PCA, STRUCTURE identified unique groups in the HWP population (blue and yellow clusters in Fig. 7, lower plot), corresponding to the HWP and the HSB/BL/BHS groups as was found in the PCA. Two additional major clusters were identified (green and red, Fig. 7), with a few individuals assigned roughly equally to both the red and green clusters overall (Fig. 7).

Figure 6. The most likely K number of distinct genetic clusters at the neutral loci (denoted with the highest ΔK) following the correction method for STRUCTURE results by Evanno et al. (2005). ΔK in this case is the mean of the second order rate of change in K divided by the standard deviation of K as determined from 10 replicate simulations at each level of K.
The higher ΔK, the more likely K is the correct number of genetic clusters.

Figure 7. Barplots showing the probabilities of individual assignment to each genetic cluster (represented by different colours) as assigned using neutral loci and assuming 4 genetic clusters in STRUCTURE 2.3.3. The top plot sorts individuals by cluster, and the bottom plot sorts individuals by site. BHS, BL and WC are lowland and HSB and HWP are alpine sites.

AFLPsurv identified significant population structure amongst sampled sites at the neutral loci, with a global FST of 0.041 and a 99% upper limit FST of 0.021 (i.e. p < 0.01). The HWP alpine population was significantly subdivided from all other populations and significantly differentiated from the lowland populations (Table 4). Additionally, each alpine population represented a distinct genetic group, with significant population structure between both alpine populations (Table 4). The lowland sites did not exhibit significant genetic structure, with FST values for all comparisons not significantly different from zero (Table 4). Approximately 9% of the molecular variation at neutral loci occurred between sites (AMOVA, ϕ4,121 = 0.089, p < 0.001), again indicating significant population structure amongst sites. Neutral genetic population structure did not vary significantly with distance between sites (Linear Regression, r2 = 0.003, F1,8 = 0.024, p = 0.882; Fig. 8).

Table 4. FST estimates based on dominant data for all neutral (top panel) and outlier (bottom panel) AFLP loci for all pairs of five A. multifida populations. Estimates that differ significantly from zero at p < 0.01 are bolded, except in the outlier table in which all FST estimates are significantly greater than zero. The lowland populations are Big Hill Springs (BHS), Beauvais Lake (BL), and Willow Creek (WC). The alpine populations are Hailstone Butte (HSB), and Highwood Pass (HWP).
Neutral loci
       BL      HSB     HWP     WC
BHS    0.018   0.019   0.072   0.004
BL             0.041   0.095   0.013
HSB                    0.067   0.009
HWP                            0.057

Outlier loci
       BL      HSB     HWP     WC
BHS    0.074   0.206   0.338   0.092
BL             0.167   0.429   0.032
HSB                    0.445   0.118
HWP                            0.365

Figure 8. Isolation by distance between all pairs of populations in this study at the neutral loci (top panel) and the outlier loci (bottom panel). There was no significant effect of distance on neutral or outlier population structure.

3.3 POPULATION STRUCTURE OF OUTLIER LOCI

The first and second principal components of the outlier locus dataset explained approximately 29.5% and 16.8% of the overall variance, respectively, for a cumulative proportion of 46.3%. Further principal components explained less than 9% of the variance individually. Most individuals from the lowland sites grouped together along the PC1 and PC2 axes (Fig. 9). Alpine sites tended to cluster separately from each other and the lowland cluster (Fig. 9). HWP plants overlapped in PC values much less with lowland individuals than HSB plants (Fig. 9).

Figure 9. Scatterplot of the first two principal components from a PCA of outlier loci (candidate loci showing signatures of natural selection) for two alpine populations (HWP and HSB) and three lowland populations (BHS, BL, and WC).

STRUCTURE identified 3 or 4 distinct genetic clusters, with 3 clusters receiving highest support after variance correction (Fig. 10). Specifically, the three genetic clusters (represented by the different colours in Fig. 11) distinguished the HSB and HWP alpine sites from each other and from the lowland sites as a group (Fig. 11).
Although no individuals from the lowland or HSB populations had a high probability of assignment to the HWP cluster, many lowland individuals had high probabilities of assignment to the HSB cluster (Fig. 11). This contrast suggests some gene flow between the HSB and lowland sites, but not between the HWP and lowland sites (Fig. 11). Overall, the strong structuring of outlier loci in the alpine sites suggests contrasting selection between alpine and lowland environments.

Figure 10. The most likely number of distinct genetic clusters, K, at the outlier loci (denoted by the highest ΔK) detected by STRUCTURE following the correction method of Evanno et al. (2005). ΔK in this case is the mean absolute second-order rate of change of the log-likelihood of the data, L(K), with respect to K, divided by the standard deviation of L(K), as determined from 10 replicate simulations at each level of K. The higher ΔK, the more likely K is the correct number of genetic clusters.

Figure 11. Barplot showing the probability of individual assignment to each genetic cluster (represented by different colours) for outlier loci using the Bayesian approach implemented in STRUCTURE 2.3.3. Sites represented are lowland (BHS, BL, and WC) as well as alpine (HSB and HWP).

All analyses of the outlier loci consistently identified three groups of individuals, corresponding to each alpine site (HSB and HWP) and a lowland cluster. AFLPsurv identified significant population structure amongst sampled sites at the outlier loci, with a global FST of 0.255 and a 99% upper limit FST of 0.022 (i.e. p < 0.01). FST differed significantly from zero between all population pairs, indicating significant genetic population structure in all populations at the outlier loci (Table 4). Differentiation was most pronounced (i.e. highest FST) between lowland and alpine environments, as well as between the two alpine sites (Table 4).
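As an aside on the ΔK statistic described in the caption of Figure 10, the calculation can be sketched in a few lines. This is a minimal illustration, not the procedure used in this study. It assumes the common variant that takes the second difference of the mean log-likelihood at each K. The function name, the input format and the log-likelihood values are all hypothetical and are not STRUCTURE output.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both K-1 and K+1 available.

    log_likelihoods maps each K to the list of log-likelihood values, L(K),
    from the replicate runs at that K (hypothetical input format).
    """
    means = {k: mean(v) for k, v in log_likelihoods.items()}
    sds = {k: stdev(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order difference of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Made-up log-likelihoods for two replicate runs per K (illustration only).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)  # K with the highest delta K
print(dk, best_k)
```

In this toy input the log-likelihood rises steeply up to K = 3 and then plateaus, so ΔK peaks at K = 3. The smallest and largest K tested never receive a ΔK value because the second difference needs a neighbour on each side.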
Approximately 15% of the genetic variance occurred between environments in the outlier loci. Between-environment variance accounted for a significant proportion of the genetic variance (AMOVA, ϕ1,121 = 0.152, p < 0.01), indicating significant genetic population subdivision between environments. Genetic differentiation at the outlier loci did not vary significantly with distance between sites (Linear Regression, r2 = 0.057, F1,8 = 0.483, p = 0.507; Fig. 8).

The frequencies of alleles at outlier loci tended to vary most between alpine and lowland populations, with little variation between lowland populations (Table 5). Only one locus showed similar allele frequencies between the two alpine populations (locus 58); otherwise allele frequencies differed most between the HSB and HWP sites. The mean difference in allele frequencies was 0.179 between the HSB alpine population and the lowland populations, 0.393 between the lowland populations and the HWP alpine population, and 0.461 between the HSB alpine and the HWP alpine sites (Table 5). Thus, outlier loci tended to be associated with alpine environments. No genetic associations were detected amongst pairs of outlier loci (Appendix A).

Table 5. Allele frequencies of outlier loci at each lowland (BHS, BL and WC) and alpine (HSB and HWP) site.

Outlier    BHS        BL         WC         HSB       HWP
locus      (lowland)  (lowland)  (lowland)  (alpine)  (alpine)
locus1     0.156      0.217      0.158      0.691     0.221
locus58    0.479      0.211      0.333      0.030     0.013
locus78    0.015      0.009      0.043      0.010     0.715
locus169   0.858      0.924      0.807      0.773     0.274
locus176   0.293      0.033      0.264      0.371     0.018
locus185   0.516      0.625      0.746      0.196     0.668
locus194   0.628      0.563      0.378      0.106     0.326
locus203   0.512      0.565      0.448      0.511     0.043
locus209   0.464      0.742      0.753      0.662     0.200
locus220   0.576      0.541      0.515      0.469     0.041
locus254   0.149      0.016      0.195      0.020     0.782
locus333   0.013      0.009      0.018      0.009     0.527
locus427   0.693      0.361      0.382      0.297     0.918

3.4 PHENOTYPIC DIFFERENCES IN HEIGHT AND FLORAL COLOUR

Phenotypic variation was present between populations.
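As a worked check on the mean allele-frequency differences reported with Table 5 in the preceding section, the sketch below recomputes them from the tabulated frequencies. It rests on an assumption about how the means were obtained: that each difference is the absolute difference at a locus, averaged over the 13 outlier loci, and that the lowland value at each locus is the unweighted mean of BHS, BL and WC. Under that assumption the results round to the reported 0.179, 0.393 and 0.461. The variable and function names are illustrative only.

```python
# Allele frequencies transcribed from Table 5; each list follows LOCI order.
LOCI = [1, 58, 78, 169, 176, 185, 194, 203, 209, 220, 254, 333, 427]
FREQ = {
    "BHS": [0.156, 0.479, 0.015, 0.858, 0.293, 0.516, 0.628,
            0.512, 0.464, 0.576, 0.149, 0.013, 0.693],
    "BL": [0.217, 0.211, 0.009, 0.924, 0.033, 0.625, 0.563,
           0.565, 0.742, 0.541, 0.016, 0.009, 0.361],
    "WC": [0.158, 0.333, 0.043, 0.807, 0.264, 0.746, 0.378,
           0.448, 0.753, 0.515, 0.195, 0.018, 0.382],
    "HSB": [0.691, 0.030, 0.010, 0.773, 0.371, 0.196, 0.106,
            0.511, 0.662, 0.469, 0.020, 0.009, 0.297],
    "HWP": [0.221, 0.013, 0.715, 0.274, 0.018, 0.668, 0.326,
            0.043, 0.200, 0.041, 0.782, 0.527, 0.918],
}
LOWLAND = ["BHS", "BL", "WC"]


def lowland_mean():
    """Unweighted mean lowland frequency at each locus (assumed pooling)."""
    return [sum(FREQ[site][i] for site in LOWLAND) / len(LOWLAND)
            for i in range(len(LOCI))]


def mean_abs_diff(a, b):
    """Mean absolute difference between two frequency vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


low = lowland_mean()
hsb_vs_low = mean_abs_diff(FREQ["HSB"], low)
hwp_vs_low = mean_abs_diff(FREQ["HWP"], low)
hsb_vs_hwp = mean_abs_diff(FREQ["HSB"], FREQ["HWP"])
print(round(hsb_vs_low, 3), round(hwp_vs_low, 3), round(hsb_vs_hwp, 3))
```

The HSB versus HWP comparison involves no pooling and so does not depend on the lowland-mean assumption.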
Floral colour frequencies differed significantly between sites (χ2 = 30.78, df = 8, p < 0.001; Fig. 12). In particular, almost all individuals from the HSB and HWP alpine sites had white flowers. I observed only 2 red-flowered plants at alpine sites, both at HSB. Plant height also differed significantly between sites (ANOVA, F4,118 = 8.83, p < 0.0001; Fig. 12), as plants at HSB were shorter than plants at other sites (Tukey’s test, p < 0.05; Fig. 12). No AFLP loci showed significant association with floral colour or above-ground height phenotypes (Spearman correlation, df = 121, p > 0.05).

Figure 12. Variation in plant height and flower colour within and among lowland sites (BHS, BL and WC) and alpine sites (HSB and HWP).

CHAPTER 4: DISCUSSION

4.1 GENETIC POPULATION STRUCTURE AT NEUTRAL AND OUTLIER LOCI

In this study, I examined patterns of genetic variation between multiple populations of Anemone multifida, an allopolyploid plant with a large range spanning both alpine and lowland environments. Neutral population divergence was evident between alpine populations and between alpine and lowland environments, though the degree of divergence varied between alpine sites. In addition to neutral population structure, there was evidence for differentiation at the outlier loci according to alpine and lowland environments. Though the presence of neutral population structure may be related to population structure at the outlier loci, the differing allele frequencies between environments at the outlier loci suggest that adaptation to alpine and lowland environments may have occurred in A. multifida.

Amongst all alpine and lowland sites, an estimated 2.7% of the genome (1.9% with peak height data) represents possible signatures of natural selection, within the 1-4% range reported from other studies of contrasting environments (e.g. Apple et al. 2010; Fischer et al. 2011; Paris & Despres 2012).
Frequencies of outlier loci varied independently, suggesting that they have evolved independently. These loci were highly differentiated between populations, as expected from divergent selection, with no evidence for balancing selection at any locus. Allele frequencies at the outlier loci in each site are unknown from these data alone, so whether alpine and lowland environments are associated with population divergence at the outlier loci is also unknown. Additionally, as genetic drift is also a prominent form of evolutionary divergence between populations (Nosil, Funk, & Ortiz-Barrientos 2009), drift cannot be excluded as being responsible for the observed outlier genetic variation without comparing patterns of evolution at the neutral and outlier loci in multiple populations. However, separate analysis of outliers and neutrally evolving loci enabled the investigation of genetic structure at both sets of loci to determine the probable roles of genetic drift and divergent natural selection in the structure of genetic variation between alpine and lowland environments.

The presence of four distinct genetic clusters within the neutral data suggests neutral evolutionary divergence in A. multifida. Specifically, the HWP alpine population has apparently diverged neutrally from all other sampled sites, and the HSB alpine population has diverged from one lowland site. The presence of a unique cluster of individuals in the HWP alpine population likely contributed to the consistently high FST estimates for this population. Alpine sites can exert extreme abiotic selection (Billings 1974; Korner 2003), and are often isolated by major geographical barriers to gene flow (e.g. mountain ranges). The combined effects of extreme environment and restricted gene flow may have enhanced population divergence in A. multifida, which is consistent with findings of population divergence and speciation in alpine environments (Bonin et al.
2006b; Hughes & Eastwood 2006; Poncet, Herrmann, Gugerli, et al. 2010; Fischer et al. 2011). The neutral genetic structure amongst the sites in this study suggests neutral evolutionary processes, such as genetic drift or restricted gene flow, have contributed to population divergence. Furthermore, patterns of genetic differentiation at the outlier loci have likely been affected by neutral population divergence in conjunction with natural selection.

The significant genetic differentiation between all sites at the outlier loci suggests strong divergence at candidate loci for adaptation amongst environments. The particularly high FST between alpine and lowland sites suggests accelerated genetic differentiation at the outlier loci in alpine environments. Similarly, the high FST between the alpine populations at the outlier loci suggests different alleles are under selection at these sites, although neutral processes may be primarily responsible for the differentiation at the outlier loci between these populations. The outlier loci represent three genetic clusters: HSB-alpine, HWP-alpine and lowland populations as a group. Some individuals assigning to the HSB-alpine outlier group were present at lowland sites, and HSB-alpine individuals represented the entire HSB site. The STRUCTURE results suggest that the overlap in clustering between HSB and lowland sites evident in the principal component and dendrogram analyses primarily reflects the presence of HSB-alpine alleles in lowland sites and not lowland alleles in the HSB site. The variation in the proportion of HSB-alpine alleles in the lowland sites could be the primary cause of significant FST estimates at the outlier loci amongst lowland sites, which otherwise tend to have the lowest FST estimates. The outlier loci from the HWP-alpine cluster were present only in the HWP site, and appeared to be highly divergent from the lowland and HSB-alpine groups.
The consistently high FST estimates for all comparisons of outlier loci involving the HWP alpine population are consistent with the cluster analyses in suggesting extensive evolutionary divergence at these loci in this population. Overall, these results support divergent natural selection between environments, which is strongest in alpine habitats.

Common explanations of neutral genetic divergence could explain the observed genetic structure, but the allopolyploid history of Anemone multifida may have also affected patterns of neutral genetic differentiation. Many, if not most, polyploid species have multiple origins (Soltis & Soltis 1999; Symonds et al. 2010). Each allopolyploid origin could produce lineages with highly divergent genomes almost immediately. Similarly, differences in the effects of genomic downsizing and restructuring following polyploidization can cause newly synthesized polyploids to have highly differentiated genomes (Soltis & Soltis 1999; Otto & Whitton 2000), creating a polyploid complex. Polyploids tend to have fewer barriers to introgression with closely related species, including examples in Anemone (Heimburger 1959; Boraiah & Heimburger 1964; Heimburger & Boraiah 1964). Interbreeding with different species in some sites could introduce highly divergent genetic material into polyploid populations that occur sympatrically with other species. The sites sampled in this study are situated along the range limits of a number of western and central North American Anemone species that are closely related to A. multifida (Meyer et al. 2010; Hoot et al. 2012), raising the possibility of introgression. Although alpine environments were associated with neutral population divergence in A. multifida, neutral genetic divergence may also have been driven by multiple polyploidization events and genomic changes associated with hybridization and polyploidy, perhaps accounting for the two minor divergent clusters of individuals in the neutral data.
In this case, the genetic clusters may associate only weakly with sampled sites because each genetic cluster represents a parental lineage (i.e. the putative parental species of the allopolyploid A. multifida).

The presence of individuals assigning to the HSB-alpine cluster at the outlier loci at lowland sites suggests that selection is potentially not as strong on the HSB-alpine alleles in lowland environments as in the alpine. The fitness of these individuals is unknown, but they were flowering during sampling, indicating individuals with HSB-alpine alleles survived to reproductive stages in the lowland environment. In contrast, the absence of individuals with a high probability of assignment to the lowland group at the outlier loci in the HSB site suggests that selection acts strongly against lowland alleles in the HSB alpine site. Alpine environments generally exert extreme abiotic selection for survival and reproduction, and successful organisms must tolerate lower temperatures, shorter growing seasons, and exposure to wind, intense radiation, and falling debris (Billings 1974; Korner 2003). The more temperate abiotic conditions in lowland environments relax selection for extreme abiotic tolerance compared to alpine environments, but biotic stresses such as competition and herbivory may also exert selection (Billings 1974; Emery & Chinnappa 1994). The differences between these environments can eventually cause population divergence through adaptation to extremely different ecological conditions (Bonin et al. 2006a; Poncet, Herrmann, & Gugerli 2010; Fischer et al. 2011), which may have contributed to the neutral population divergence between alpine and lowland environments in this study. The apparently weaker selection against HSB-alpine alleles in the lowland sites may permit more gene flow from HSB to lowland sites, perhaps accounting for the lower neutral population divergence than between HWP and the lowland sites.
However, the lower frequency of lowland alleles at HSB indicates that selection for alpine adaptation maintains divergence at the outlier loci in alpine environments.

Despite the evidence for limited divergence at outlier loci between HSB and lowland environments, the lower frequency of HWP-alpine alleles in lowland sites, the consistently significant neutral genetic differentiation, and the nearly uniform distribution of HWP-alpine alleles in the HWP site suggest extensive differentiation at both the outlier and neutral loci in the HWP site. Additionally, the divergence between HSB and HWP alpine sites at the outlier loci suggests natural selection for alpine adaptation may have had different effects at these locations, perhaps because of neutral divergence of the HWP population. HSB is approximately 300 m lower than HWP, suggesting differences in environment along the elevation gradient may be driving the divergence between all three outlier clusters. Additionally, HSB is located at the eastern front range of the Rocky Mountains, closer to the lowland populations, whereas the HWP site is on the west side of the front range. The high alpine barrier between HWP and the lowland sites may account for the increased neutral and outlier population divergence, and for the apparently greater migration between the HSB and lowland sites. The HSB site was much more exposed to wind, and had thinner soil and lower shrub cover, than the HWP site. The ecological differences between the alpine sites could account for the population divergence at the outlier loci, but this has yet to be tested. Alternatively, individuals at these locations may have evolved different molecular mechanisms for convergent adaptations to similar ecological conditions, as has been observed in other species (Arendt & Reznick 2008).
Due to the extensive neutral population divergence in the HWP population, and without data from additional alpine populations, whether natural selection or neutral evolutionary processes are the primary cause of the differentiation at the outlier and neutral loci is uncertain.

The high frequency of white-flowered individuals in alpine environments and the shorter plants at the HSB site suggest these phenotypes reflect differences between alpine and lowland environmental conditions. Low shoots can reduce damage from exposure to wind and falling debris, and closer proximity to the ground can also limit freezing and frost formation (Billings 1974; Korner 2003). For example, the particularly windy conditions at the HSB site may explain its short plants, unlike at the HWP site, which is not as exposed to wind. The high frequency of white flowers in alpine environments could indicate that differences in the pollination community at alpine sites have favoured white floral colour. Alternatively, the lack of pigmentation could be the by-product of lower phytochemical production from generally lower herbivory in alpine environments (Billings 1974). The lack of correlation between genotype and phenotype for both floral colour and shoot height traits could indicate that the genome scan included too few loci to detect such associations. Further sampling with different restriction enzymes may yield markers associated with floral colour or plant height. Alternatively, floral colour or shoot height may be phenotypically plastic, as is often observed with both plant height and floral colour (Nicotra et al. 2010).

The higher false discovery rate and fewer detected loci based on AFLP peak height data than on dominant AFLP data suggest that peak height data provide lower power. This conclusion contrasts with a previous investigation that found band intensity had a higher power to discover loci that showed signatures of natural selection (Fischer et al. 2011).
Polyploidy may cause ambiguity in genotyping based on peak height. For tetraploids, the four allele copies at each locus introduce more variation in peak height than in diploids, possibly leading to a greater variation in estimates of genetic differentiation and/or decreased ability to detect outlier loci due to homoplasy. If so, peak-height estimates of the number of outliers in polyploid species would be more conservative than in diploid species. Additionally, Fischer et al.'s (2011) study may have involved different selection intensity, so that estimates of outliers may depend on biological factors other than the simple presence/absence of natural selection.

4.2 LIMITATIONS AND ALTERNATE EXPLANATIONS

Being amongst the first non-theoretical studies of the population genomics of a polyploid species, this study contributes information about evolutionary divergence and adaptation in a polyploid species. The results in this study suggest several key differences and similarities between the genomics of genetic divergence and adaptation for polyploids and diploids. Genome scans of diploids for signatures of natural selection consistently discover loci under the effects of divergent natural selection between environments with different ecological conditions (Luikart et al. 2003; Stinchcombe & Hoekstra 2008). Divergent selection has been associated with ecological differences between alpine and lowland environments (Byars et al. 2007; Fischer et al. 2011), along elevation, precipitation and temperature gradients (Bonin et al. 2006a; Gonzalo-Turpin & Hazard 2009; Poncet, Herrmann, Gugerli, et al. 2010; Freedman et al. 2010; Bradbury et al. 2010; Nunes, Beaumont, Butlin, et al. 2011; Cox & Broeck 2011), host-use differences (Egan et al. 2008; Apple et al. 2010; Funk et al. 2011), and ecological opportunity following major geological events (Hughes & Eastwood 2006; Bernatchez et al. 2010; Schluter et al. 2010).
Similar associations of outlier alleles with environment were evident for polyploid A. multifida. The mechanisms underlying genetic divergence at loci associated with divergent natural selection may therefore be similar in both polyploid and diploid species. Specifically, alleles under selection become increasingly common in sites across generations, leading to the characteristically low genetic variation at loci under the effects of natural selection (Stinchcombe & Hoekstra 2008). In Anemone multifida, the effects of polysomic inheritance (i.e. fixed heterozygosity) appear not to have constrained the effects of natural selection or neutral evolutionary divergence on the genome.

Neutral population divergence in this polyploid species is also similar to that in diploid species. At neutral loci, if population structure exists, diploid populations typically show some site-specific component to neutral genetic variation. For example, diploid populations in different regions typically experience lower gene flow, leading to the eventual whole-genome divergence between populations. Typical examples include neutral population divergence due to allopatry (Hoskin et al. 2005; Roberts 2006; Kuehne et al. 2007; Surget-Groba et al. 2012), or simply isolation by distance between populations (Sharbel et al. 2000; Epperson 2007; Pusadee et al. 2009). The finding of four distinct genetic clusters in the neutral data and significant genetic population structure between some sites suggests that neutral evolutionary divergence between polyploid populations operates similarly, though there was no apparent isolation by distance relationship between populations. Although the outcome of population divergence appears similar between polyploids and diploids, polyploidy may affect the time required for neutral population differentiation. Further investigation into the phylogeographic history of A.
multifida, and sequencing of neutral loci to determine variation in allele copy number, would elucidate the relative contribution of short- and long-term evolutionary processes to evolutionary divergence in polyploids.

The field of the genomics of population divergence and speciation has grown substantially during the past decade with the progression of molecular markers for studying genome-wide evolutionary processes (Charlesworth 2010). While this study identified genetic divergence at a number of loci between alpine and lowland environments and suggests that natural selection has had similar effects on polyploid and diploid genomes, the function and polyploid nature of loci that show signatures of natural selection remain undetermined. Without sequence data or other methods for determining gene copy number, whether fixed heterozygosity has affected the discovery of some outlier loci remains undetermined. The potential effects of polyploidy on outlier detection would reduce the number of estimated outlier loci, so the outliers found in this study may conservatively represent the extent of natural selection on the genome of A. multifida. Additionally, AFLP markers may be associated with certain regions of the genome, so the markers used in this study may not be randomly or evenly distributed throughout the genome (Rogers et al. 2007). Additional outliers may be found if different restriction enzymes are used. The co-migration of AFLP fragments with similar sizes (size homoplasy) can lead to overestimates of allele frequencies and potentially decreased estimation of differentiation at specific loci (Gort et al. 2006; Caballero et al. 2008), reducing the probability of detecting outlier loci. Overall, the number of outliers found in this study may underestimate the extent of natural selection on the genome between alpine and lowland ecotypes, and further genomic analyses may find more outliers associated with alpine and lowland adaptation.
In addition to limitations on outlier discovery, explanations other than environmentally based natural selection can account for genetic differentiation at the outlier loci (Bierne et al. 2011). Loci associated with genetic incompatibilities between populations, which can be heightened by natural selection between divergent environments, may be the primary cause of increased differentiation at the outlier loci (Rogers & Bernatchez 2006; Bierne et al. 2011). Although natural selection still acts on genetic incompatibilities, the outliers identified in this study may not have direct ecological function related to alpine or lowland adaptation and instead may correspond to incompatible genomic regions between lowland and alpine environments. Similarly, selection against newly arisen deleterious mutations in a population can cause local differentiation at the mutated locus that would appear similar to other loci associated with adaptation (Charlesworth et al. 1997), further emphasizing the importance of determining gene function following outlier identification. Neutral evolutionary processes in some cases can lead to the heightened differentiation characteristic of outlier loci, particularly if certain populations have shared ancestry or barriers to gene flow are more prevalent between a subset of populations (Excoffier et al. 2009; Bonhomme et al. 2010). Neutral mutations that arise in growing populations can also appear to be highly differentiated, as they increase in frequency as the population expands (Klopfstein et al. 2006; Hofer et al. 2009), but are unrelated to adaptation. Although environmental associations at the outlier loci suggest that divergent selection has played a role in the evolution of the outlier loci, determination of the exact cause of genetic differentiation will require further genomic analyses to test these alternate explanations.
In particular, the ascertainment of gene function will make it possible to distinguish the roles that environmentally based selection and alternative, non-environmentally based explanations have played in patterns of genetic variation at the outlier loci.

4.3 FUTURE DIRECTIONS

Direct identification of phenotypes that may be the target of natural selection and tests for associations with the alleles found in this study may eventually uncover the traits associated with lowland and alpine adaptation, but this would be time-consuming and would potentially ignore many phenotypes that are difficult to assess visually. Sequence data from large portions of the genome, or from the AFLP fragments in this study, provide the means for simultaneously discovering and characterizing outlier loci, and are more amenable to future experimentation than anonymous AFLP fragments (Stinchcombe & Hoekstra 2008; Storz & Wheat 2010). Determination of the identity of alleles that show signatures of natural selection and comparison of the sequence data to known genes is a straightforward means for finding phenotypes under natural selection, and lays the foundation for determining the function of ecologically important genes in the case that gene function has not been previously characterized. Determination of whether the alleles that show signatures of natural selection confer a fitness advantage to individuals in the wild provides a more rigorous test for confirming that natural selection actually acts on the alleles underlying adaptive phenotypes. This is important for removing false positives from the dataset, separating the effects of genetic drift from natural selection, and establishing a mechanistic link between genotype, phenotype and natural selection.

In addition to further investigation of the functions of outlier loci, many questions remain about the cause of neutral evolutionary divergence amongst different genetic clusters of A. multifida.
Neutral genetic population structure was associated with environmental differentiation between alpine and lowland habitats, but the causes of divergence of two minor clusters of individuals are unknown. Determination of the phylogeographic history of Anemone multifida across the range of the species would be a first step in assessing the larger-scale and longer-term evolutionary processes that caused the currently observed neutral evolutionary divergence. Increasing the sample of populations from across the range of A. multifida will allow for a reconstruction of the evolutionary history within this species. Similarly, the resolution of the phylogeny of A. multifida and closely related Anemone species will be critical to the determination of whether neutral genetic divergence can be attributed to multiple origins of A. multifida, and whether hybridization between species has contributed to neutral genetic differentiation. Reports of hybrid breakdown between some A. multifida individuals indicate that neutral divergence may have progressed to reproductive isolation (Heimburger & Boraiah 1964), and the examination of interfertility between different genetic groups, such as those identified in this study, could confirm whether the early stages of speciation have indeed occurred. The use of sequence data in some form to address these questions about neutral divergence will enable the determination of allele copy number, and thus the potential contribution of polyploidy to the patterns observed in this study. The comparison of these results in A. multifida with basic population genetic analysis in closely related diploid species will also clarify whether the patterns of neutral genetic variation observed in this study are due to polyploidy or are characteristic of the genus.

Perhaps the greatest limitation to this study is the need for more replication of alpine populations.
The HWP population in particular was highly divergent from all other populations, while there were lower levels of genetic differentiation between the HSB and lowland populations. In the absence of additional alpine populations for comparison, it is not possible to determine if the general trend of alpine adaptation and neutral population divergence is more like the HWP or HSB population (or neither) in A. multifida. Future studies of the population genomics of alpine and lowland adaptation in this or other species should seek to 1) replicate alpine populations to a sufficient level, 2) obtain sequence information to enable the identification of potential function of outlier loci or to establish a baseline for targeted investigations into gene function, and 3) establish a link between variation at outlier loci and fitness differences in nature through reciprocal transplant experiments. Only through sufficient replication of populations and determination of exactly what function outlier alleles have in adaptation can a clear picture emerge of how evolution by neutral and selective processes has occurred.

Natural selection plays a major role in the adaptation of species to different habitats, population divergence and species diversification. This study is amongst the first to examine the population genomics of a polyploid species on a molecular level. Polyploidy has played a prominent role in the evolution of plants, fungi and many animal lineages (Otto & Whitton 2000; Wendel 2000; De Bodt et al. 2005; Soltis et al. 2009), and the investigation of the mechanisms of adaptation in polyploids will answer many long-standing questions about the evolutionary significance of genome duplication. Studies of the genomics of adaptation and population divergence, even in diploids, are still in their initial stages, but applications of recently developed sequencing technology hold great promise for answering many questions about the effects of different evolutionary processes.
Through the study of the effects of natural selection on the genome, the molecular mechanisms of adaptation can be discovered and characterized, leading to a clear picture of how organisms adapt to different environmental conditions. The study of the whole-genome effects of population divergence provides the context for understanding the effects of evolution at single loci and the means for understanding the relative contribution of neutral evolutionary processes and natural selection to population divergence. Determination of the contribution of selective and non-selective evolutionary processes to adaptation will expand understanding of how evolution has shaped contemporary biological diversity and how adaptation and speciation will progress in the future.

Bibliography

Adams KL, Wendel JF (2005) Polyploidy and genome evolution in plants. Current Opinion in Plant Biology, 8, 135–41.
Aegisdóttir HH, Kuss P, Stöcklin J (2009) Isolated populations of a rare alpine plant show high genetic diversity and considerable population differentiation. Annals of Botany, 104, 1313–22.
Agapow PM, Burt A (2001) Indices of multilocus linkage disequilibrium. Molecular Ecology Notes, 1, 101–102.
Alberto F, Niort J, Derory J, et al. (2010) Population differentiation of sessile oak at the altitudinal front of migration in the French Pyrenees. Molecular Ecology, 19, 2626–39.
Alvarez N, Thiel-Egenter C, Tribsch A, et al. (2009) History or ecology? Substrate type as a major driver of spatial genetic structure in Alpine plants. Ecology Letters, 12, 632–40.
Apple JL, Grace T, Joern A, St Amand P, Wisely SM (2010) Comparative genome scan detects host-related divergent selection in the grasshopper Hesperotettix viridis. Molecular Ecology, 19, 4012–28.
Arendt J, Reznick D (2008) Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends in Ecology & Evolution, 23, 26–32.
Baack EJ, Whitney KD, Rieseberg LH (2005) Hybridization and genome size evolution: timing and magnitude of nuclear DNA content increases in Helianthus homoploid hybrid species. The New Phytologist, 167, 623–30. Baird NA, Etter PD, Atwood TS, et al. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PloS One, 3, e3376. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate : A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B, 57, 289–300. Bernatchez L, Renaut S, Whiteley AR, et al. (2010) On the origin of species: insights from the ecological genomics of lake whitefish. Philosophical Transactions of the Royal Society of London B, 365, 1783–800. Bierne N, Welch J, Loire E, Bonhomme F, David P (2011) The coupling hypothesis: why genome scans may fail to map local adaptation genes. Molecular Ecology, 20, 2044–72. 55 Billings W (1974) Adaptations and origins of alpine plants. Arctic & Alpine Research, 6, 129–142. De Bodt S, Maere S, Van de Peer Y (2005) Genome duplication and the origin of angiosperms. Trends in Ecology & Evolution, 20, 591–7. Bonhomme M, Chevalet C, Servin B, et al. (2010) Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended. Genetics, 186, 241-262. Bonin A, Ehrich D, Manel S (2007) Statistical analysis of amplified fragment length polymorphism data: a toolbox for molecular ecologists and evolutionists. Molecular Ecology, 16, 3737–58. Bonin A, Taberlet P, Miaud C, Pompanon F (2006) Explorative genome scan to detect candidate loci for adaptation along a gradient of altitude in the common frog (Rana temporaria). Molecular Biology & Evolution, 23, 773–83. Boraiah G, Heimburger M (1964) Cytotaxonomic studies on new world Anemone (section Eriocephalus) with woody rootstocks. Canadian Journal of Botany, 42, 891–922. Bradbury IR, Hubert S, Higgins B, et al. 
(2010) Parallel adaptive evolution of Atlantic cod on both sides of the Atlantic Ocean in response to temperature. Proceedings of the Royal Society B, 277, 3725–34. Bridle JR, Vines TH (2007) Limits to evolution at range margins: when and why does adaptation fail? Trends in Ecology & Evolution, 22, 140–7. Brown AHD, Feldman MW, Nevo E (1980) Multilocus structure of natural populations of Hordeum spontaneum. Genetics, 96, 523–536. Bryc K, Auton A, Nelson MR, et al. (2010) Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proceedings of the National Academy of Sciences, 107, 786–91. Buehler D, Graf R, Holderegger R, Gugerli F (2012) Contemporary gene flow and mating system of Arabis alpina in a Central European alpine landscape. Annals of Botany, 109, 1359–1367. Buerkle CA, Gompert Z, Parchman TL (2011) The n = 1 constraint in population genomics. Molecular Ecology, 20, 1575–81. Burke J, Voss T (1998) Genetic interactions and natural selection in Louisiana iris hybrids. Evolution, 52, 1304–1310. 56 Byars SG, Papst W, Hoffmann A a (2007) Local adaptation and cogradient selection in the alpine plant, Poa hiemata, along a narrow altitudinal gradient. Evolution, 61, 2925–41. Caballero A, Quesada H, Rolán-Alvarez E (2008) Impact of amplified fragment length polymorphism size homoplasy on the estimation of population genetic diversity and the detection of selective loci. Genetics, 179, 539–54. Carter AJ, Robinson ER (1993) Genetic structure of a population of the clonal grass Setaria incrassata. Biological Journal of the Linnean Society, 48, 55–62. Casper B, Jackson RB (1997) Plant competition underground. Annual Review of Ecology & Systematics, 1997, 545–570. Chapman MA, Abbott RJ (2010) Introgression of fitness genes across a ploidy barrier. The New Phytologist, 186, 63–71. Chapman HM, Parh D, Oraguzie N (2000) Genetic structure and colonizing success of a clonal, weedy species, Pilosella officinarum (Asteraceae). 
Heredity, 84, 401–409. Charlesworth B (2010) Molecular population genomics: a short history. Genetics Research, 92, 397–411. Charlesworth B, Nordborg M, Charlesworth D (1997) The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetics Research, 70, 155–74. Chinnappa C, Donald G, Sasidharan R, Emery RN (2005) The biology of Stellaria longipes (Caryophyllaceae). Botany, 83, 1367–1383. Clark LV, Jasieniuk M (2011) POLYSAT: an R package for polyploid microsatellite analysis. Molecular Ecology Resources, 11, 562–6. Cox K, Broeck AV (2011) Temperature related natural selection in a wind pollinated tree across regional and continental scales. Molecular Ecology, 20, 2724–38. Van Der Hulst RGM, Mes THM, Falque M, et al. (2003) Genetic structure of a population sample of apomictic dandelions. Heredity, 90, 326–35. Derome N, Bougas B, Rogers SM, et al. (2008) Pervasive sex-linked effects on transcription regulation as revealed by expression quantitative trait loci mapping in lake whitefish species pairs (Coregonus sp., Salmonidae). Genetics, 179, 1903–17. 57 Dobzhansky T (1957) An experimental study of interaction between genetic drift and natural selection. Evolution, 11, 311–319. Egan SP, Nosil P, Funk DJ (2008) Selection and genomic differentiation during ecological speciation: isolating the contributions of host association via a comparative genome scan of Neochlamisus bebbianae leaf beetles. Evolution, 62, 1162–81. Emery R, Chinnappa C (1994) Specialization, plant strategies, and phenotypic plasticity in populations of Stellaria longipes along an elevational gradient. International Journal of Plant Science, 155, 203–219. Epperson BK (2007) Plant dispersal, neighbourhood size and isolation by distance. Molecular Ecology, 16, 3854–65. 
Esselink GD, Nybom H, Vosman B (2004) Assignment of allelic configuration in polyploids using the MAC-PR (microsatellite DNA allele counting-peak ratios) method. TAG. Theoretical & Applied Genetics, 109, 402–8. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611–20. Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically structured population. Heredity, 103, 285–98. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567–87. Falush D, Stephens M, Pritchard JK (2007) Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes, 7, 574–578. Feder JL, Nosil P (2010) The efficacy of divergence hitchhiking in generating genomic islands during ecological speciation. Evolution, 64, 1729–1747. Felsenstein J (1976) The theoretical population genetics of variable selection and migration. Annual Review of Genetics, 10, 253–280. Fischer MC, Foll M, Excoffier L, Heckel G (2011) Enhanced AFLP genome scans detect local adaptation in high-altitude populations of a small rodent (Microtus arvalis). Molecular Ecology, 20, 1450–62. Foll M, Fischer MC, Heckel G, Excoffier L (2010) Estimating population structure from AFLP amplification intensity. Molecular Ecology, 19, 4638–47. 58 Foll M, Gaggiotti O (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics, 180, 977–93. Forstmeier W, Schielzeth H, Mueller JC, Ellegren H, Kempenaers B (2012) Heterozygosity-fitness correlations in zebra finches: microsatellite markers can be better than their reputation. Molecular Ecology, 21, 3237–49. 
Freedman AH, Thomassen HA, Buermann W, Smith TB (2010) Genomic signals of diversification along ecological gradients in a tropical lizard. Molecular Ecology, 19, 3773–88.
Funk DJ, Egan SP, Nosil P (2011) Isolation by adaptation in Neochlamisus leaf beetles: host-related selection promotes neutral genomic divergence. Molecular Ecology, 20, 4671–82.
Gagnaire PA, Albert V, Jónsson B, Bernatchez L (2009) Natural selection influences AFLP intraspecific genetic variability and introgression patterns in Atlantic eels. Molecular Ecology, 18, 1678–91.
Gaudeul M, Till-Bottraud I, Barjon F, Manel S (2004) Genetic diversity and differentiation in Eryngium alpinum L. (Apiaceae): comparison of AFLP and microsatellite markers. Heredity, 92, 508–18.
Gavrilets S, Hastings A (2012) Founder Effect Speciation: A Theoretical Reassessment. The American Naturalist, 147, 466–491.
Gonzalo-Turpin H, Hazard L (2009) Local adaptation occurs along altitudinal gradient despite the existence of gene flow in the alpine plant species Festuca eskia. Journal of Ecology, 97, 742–751.
Gort G, Koopman WJM, Stein A (2006) Fragment length distributions and collision probabilities for AFLP markers. Biometrics, 62, 1107–15.
Grubbs KC, Small RL, Schilling EE (2009) Evidence for multiple, autoploid origins of agamospermous populations in Eupatorium sessilifolium (Asteraceae). Plant Systematics and Evolution, 279, 151–161.
Hadany L (2003) Adaptive peak shifts in a heterogenous environment. Theoretical Population Biology, 63, 41–51.
Hager R, Cheverud JM, Wolf JB (2009) Relative contribution of additive, dominance, and imprinting effects to phenotypic variation in body size and growth between divergent selection lines of mice. Evolution, 63, 1118–28.
Hamilton MB (2009) Population Genetics. John Wiley and Sons, Chichester, UK.
Haubold B, Travisano M, Rainey PB, Hudson RR (1998) Detecting linkage disequilibrium in bacterial populations. Genetics, 150, 1341–8.
Heimburger M (1959) Cytotaxonomic studies in the genus Anemone. Canadian Journal of Botany, 37, 587–612. Heimburger M, Boraiah G (1964) Genome relationships of Anemone multifida. Canadian Journal of Genetics & Cytology, 6, 529–539. Hofer T, Ray N, Wegmann D, Excoffier L (2009) Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Annals of Human Genetics, 73, 95–108. Honnay O, Jacquemyn H, Van Looy K, Vandepitte K, Breyne P (2009) Temporal and spatial genetic variation in a metapopulation of the annual Erysimum cheiranthoides on stony river banks. Journal of Ecology, 97, 131–141. Hoot SB, Reznicek AA, Palmer JD (2012) Phylogenetic relationships in Anemone (Ranunculaceae) based on morphology and chloroplast DNA. Systematic Botany, 19, 169–200. Hoskin CJ, Higgie M, McDonald KR, Moritz C (2005) Reinforcement drives rapid allopatric speciation. Nature, 437, 1353–6. Huang CC, Hung KH, Hwang CC, et al. (2011) Genetic population structure of the alpine species Rhododendron pseudochrysanthum sensu lato (Ericaceae) inferred from chloroplast and nuclear DNA. BMC Evolutionary Biology, 11, 108. Hubisz MJ, Falush D, Stephens M, Pritchard JK (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322–32. Hughes C, Eastwood R (2006) Island radiation on a continental scale: exceptional rates of plant diversification after uplift of the Andes. Proceedings of the National Academy of Sciences, 103, 10334–9. Ikeda H, Setoguchi H (2010) Natural selection on PHYE by latitude in the Japanese archipelago: insight from locus specific phylogeographic structure in Arcterica nana (Ericaceae). Molecular Ecology, 19, 2779–91. Kauffman S (1987) Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128, 11–45. 
Khanuja SPS, Shasany AK, Darokar MP, Kumar S (1999) Rapid isolation of DNA from dry and fresh samples of plants producing large amounts of secondary metabolites and essential oils. Plant Molecular Biology Reporter, 17, 1–7.
Kim Y, Nielsen R (2004) Linkage disequilibrium as a signature of selective sweeps. Genetics, 167, 1513–24.
Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
Kingsolver JG, Hoekstra HE, Hoekstra JM, et al. (2001) The strength of phenotypic selection in natural populations. The American Naturalist, 157, 245–61.
Klopfstein S, Currat M, Excoffier L (2006) The fate of mutations surfing on the wave of a range expansion. Molecular Biology and Evolution, 23, 482–90.
Korner C (2003) Alpine Plant Life. New York.
Kuehne HA, Murphy HA, Francis CA, Sniegowski PD (2007) Allopatric divergence, secondary contact, and genetic isolation in wild yeast populations. Current Biology, 17, 407–11.
Lai Z, Nakazato T, Salmaso M, et al. (2005) Extensive chromosomal repatterning and the evolution of sterility barriers in hybrid sunflower species. Genetics, 171, 291–303.
Leitch I, Bennett M (2004) Genome downsizing in polyploid plants. Biological Journal of the Linnean Society, 82, 651–663.
Leitch AR, Leitch IJ (2008) Genomic plasticity and the diversity of polyploid plants. Science, 320, 481–483.
Lenormand T (2002) Gene flow and the limits to natural selection. Trends in Ecology & Evolution, 17, 183–189.
Lewontin R (1974) The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
Lewontin RC, Krakauer J (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics, 74, 175–195.
Lexer C, Welch ME, Raymond O, Rieseberg LH (2003) The origin of ecological divergence in Helianthus paradoxus (Asteraceae): selection on transgressive characters in a novel hybrid habitat. Evolution, 57, 1989–2000.
Lo EYY, Stefanovic S, Dickinson T (2009) Population genetic structure of diploid sexual and polyploid apomictic hawthorns (Crataegus; Rosaceae) in the Pacific Northwest. Molecular Ecology, 18, 1145–1160.
Lowry DB, Hall MC, Salt DE, Willis JH (2009) Genetic and physiological basis of adaptive salt tolerance divergence between coastal and inland Mimulus guttatus. The New Phytologist, 183, 776–88.
Luikart G, England PR, Tallmon D, Jordan S, Taberlet P (2003) The power and promise of population genomics: from genotyping to genome typing. Nature Reviews Genetics, 4, 981–94.
Mackay T, Stone E (2009) The genetics of quantitative traits: challenges and prospects. Nature Reviews Genetics, 10, 565–577.
Maruyama T, Fuerst PA (1985) Population bottlenecks and nonequilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics, 111, 675–689.
Meirmans PG, Goudet J, Gaggiotti OE (2011) Ecology and life history affect different aspects of the population structure of 27 high-alpine plants. Molecular Ecology, 20, 3144–55.
Meirmans PG, Vlot EC, Den Nijs JCM, Menken SBJ (2003) Spatial ecological and genetic structure of a mixed population of sexual diploid and apomictic triploid dandelions. Journal of Evolutionary Biology, 16, 343–52.
Meudt HM, Clarke AC (2007) Almost forgotten or latest practice? AFLP applications, analyses and advances. Trends in Plant Science, 12, 106–17.
Meyer KM, Hoot SB, Arroyo MTK (2010) Phylogenetic Affinities of South American Anemone (Ranunculaceae), including the Endemic Segregate Genera, Barneoudia and Oreithales. International Journal of Plant Sciences, 171, 323–331.
Michel A, Sim S, Powell T (2010) Widespread genomic divergence during sympatric speciation. Proceedings of the National Academy of Sciences, 107, 9724–9729.
Minder AM, Widmer A (2008) A population genomic analysis of species boundaries: neutral processes, adaptive divergence and introgression between two hybridizing plant species. Molecular Ecology, 17, 1552–63.
Mráz P, Gaudeul M, Rioux D, et al. (2007) Genetic structure of Hypochaeris uniflora (Asteraceae) suggests vicariance in the Carpathians and rapid post-glacial colonization of the Alps from an eastern Alpine refugium. Journal of Biogeography, 34, 2100–2114.
Nichols KM, Edo AF, Wheeler PA, Thorgaard GH (2008) The genetic basis of smoltification-related traits in Oncorhynchus mykiss. Genetics, 179, 1559–75.
Nicotra AB, Atkin OK, Bonser SP, et al. (2010) Plant phenotypic plasticity in a changing climate. Trends in Plant Science, 15, 684–92.
Nosil P, Funk DJ, Ortiz-Barrientos D (2009) Divergent selection and heterogeneous genomic divergence. Molecular Ecology, 18, 375–402.
Nosil P, Vines T, Funk DJ (2005) Reproductive isolation caused by natural selection against immigrants from divergent habitats. Evolution, 59, 705–719.
Nunes VL, Beaumont MA, Butlin RK, Paulo OS (2011) Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient. Molecular Ecology, 20, 193–205.
Orsini L, Spanier KI, De Meester L (2012) Genomic signature of natural and anthropogenic stress in wild populations of the waterflea Daphnia magna: validation in space, time and experimental evolution. Molecular Ecology, 21, 2160–2175.
Osborn TC, Pires JC, Birchler JA, et al. (2003) Understanding mechanisms of novel gene expression in polyploids. Trends in Genetics, 19, 141–147.
Otto S, Whitton J (2000) Polyploid incidence and evolution. Annual Review of Genetics, 34, 401–437.
Paris M, Boyer S, Bonin A, et al.
(2010) Genome scan in the mosquito Aedes rusticus: population structure and detection of positive selection after insecticide treatment. Molecular Ecology, 19, 325–37. Paris M, Despres L (2012) Identifying insecticide resistance genes in mosquito by combining AFLP genome scans and 454 pyrosequencing. Molecular Ecology, 1672–1686. Pavey SA, Collin H, Nosil P, Rogers SM (2010) The role of gene expression in ecological speciation. Annals of the New York Academy of Sciences, 1206, 110–29. 63 Peakall R, Smouse PE (2006) GenAlEx 6: genetic analysis in excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6, 288– 295. Peichel CL, Nereng KS, Ohgi KA, et al. (2001) The genetic architecture of divergence between threespine stickleback species. Nature, 414, 901–5. Pinceel J, Jordaens K, Pfenninger M, Backeljau T (2005) Rangewide phylogeography of a terrestrial slug in Europe: evidence for Alpine refugia and rapid colonization after the Pleistocene glaciations. Molecular Ecology, 14, 1133–50. Poncet BN, Herrmann D, Gugerli F, et al. (2010) Tracking genes of ecological relevance using a genome scan in two independent regional population samples of Arabis alpina. Molecular Ecology, 19, 2896–907. Presgraves DC, Balagopalan L, Abmayr SM, Orr HA (2003) Adaptive evolution drives divergence of a hybrid inviability gene between two species of Drosophila. Nature, 423, 715–9. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–59. Pusadee T, Jamjod S, Chiang Y-C, Rerkasem B, Schaal B a (2009) Genetic structure and isolation by distance in a landrace of Thai rice. Proceedings of the National Academy of Sciences, 106, 13880–5. Renaut S, Maillet N, Normandeau E, et al. (2012) Genome-wide patterns of divergence during speciation: the lake whitefish case study. Philosophical transactions of the Royal Society B, 367, 354–63. Rieseberg LH, Kim S-C, Randell RA, et al. 
(2007) Hybridization and the colonization of novel habitats by annual sunflowers. Genetica, 129, 149–65. Roberts T (2006) Multiple levels of allopatric divergence in the endemic Philippine fruit bat Haplonycteris fischeri (Pteropodidae). Biological Journal of the Linnean Society, 88, 329–349. Rogers SM, Bernatchez L (2005) Integrating QTL mapping and genome scans towards the characterization of candidate loci under parallel selection in the lake whitefish (Coregonus clupeaformis). Molecular Ecology, 14, 351–61. Rogers SM, Bernatchez L (2006) The genetic basis of intrinsic and extrinsic postzygotic reproductive isolation jointly promoting speciation in the lake whitefish species complex (Coregonus clupeaformis). Journal of Evolutionary Biology, 19, 1979–94. 64 Rogers SM, Bernatchez L (2007) The genetic architecture of ecological speciation and the association with signatures of selection in natural lake whitefish (Coregonus sp. Salmonidae) species pairs. Molecular Biology and Evolution, 24, 1423–38. Rogers SM, Isabel N, Bernatchez L (2007) Linkage maps of the dwarf and Normal lake whitefish (Coregonus clupeaformis) species complex and their hybrids reveal the genetic architecture of population divergence. Genetics, 175, 375– 98. Ronfort J, Jenczewski E, Bataillon T, Rousset F (1998) Analysis of population structure in autotetraploid species. Genetics, 150, 921–30. Ryman N, Jorde PE (2001) Statistical power when testing for genetic differentiation. Molecular Ecology, 10, 2361–73. Schielzeth H, Kempenaers B, Ellegren H (2012) QTL linkage mapping of Zebra finch beak color shows an oligogenic control of a sexually selected trait. Evolution, 66, 18–30. Schluter D (2001) Ecology and the origin of species. Trends in Ecology & Evolution, 16, 372–380. Schluter D, Marchinko KB, Barrett RDH, Rogers SM (2010) Natural selection and the genetics of adaptation in threespine stickleback. Philosophical transactions of the Royal Society of London B, 365, 2479–86. 
Schonswetter P, Paun O, Tribsch a., Niklfeld H (2003) Out of the Alps: colonization of Northern Europe by East Alpine populations of the Glacier Buttercup Ranunculus glacialis L. (Ranunculaceae). Molecular Ecology, 12, 3373–3381. Sharbel TF, Haubold B, Mitchell-Olds T (2000) Genetic isolation by distance in Arabidopsis thaliana: biogeography and postglacial colonization of Europe. Molecular Ecology, 9, 2109–18. Skrede I, Borgen L, Brochmann C (2009) Genetic structuring in three closely related circumpolar plant species: AFLP versus microsatellite markers and high-arctic versus arctic-alpine distributions. Heredity, 102, 293–302. Smith JM, Smith NH, O’Rourke M, Spratt BG (1993) How clonal are bacteria? Proceedings of the National Academy of Sciences, 90, 4384–8. Soltis DE, Albert VA, Leebens-Mack J, et al. (2009) Polyploidy and angiosperm diversification. American Journal of Botany, 96, 336–48. 65 Soltis D, Soltis P (1999) Polyploidy: recurrent formation and genome evolution. Trends in Ecology & Evolution, 14, 348–352. Soltis PS, Soltis DE (2000) The role of genetic and genomic attributes in the success of polyploids. Proceedings of the National Academy of Sciences 97, 7051–7. Soltis D, Soltis P, Pires J (2004) Recent and recurrent polyploidy in Tragopogon (Asteraceae): cytogenetic, genomic and genetic comparisons. Biological Journal of the Linnean Society, 82, 485–501. Stapley J, Reger J, Feulner PGD, et al. (2010) Adaptation genomics: the next generation. Trends in Ecology & Evolution, 25, 705–12. Stinchcombe JR, Hoekstra HE (2008) Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity, 100, 158–70. Storz J (2005) Using genome scans of DNA polymorphism to infer adaptive population divergence. Molecular Ecology, 14, 671–688. Storz JF, Wheat CW (2010) Integrating evolutionary and functional approaches to infer adaptation at specific loci. Evolution, 64, 2489–509. 
Strasburg JL, Sherman NA, Wright KM, et al. (2012) What can patterns of differentiation across plant genomes tell us about adaptation and speciation? Philosophical transactions of the Royal Society of London B, 367, 364–73. Surget-Groba Y, Johansson H, Thorpe RS (2012) Synergy between allopatry and ecology in population differentiation and speciation. International Journal of Ecology, 2012, 1–10. Svedin N, Wiley C, Veen T, Gustafsson L, Qvarnström A (2008) Natural and sexual selection against hybrid flycatchers. Proceedings of the Royal Society B, 275, 735–44. Symonds VV, Soltis PS, Soltis DE (2010) Dynamics of polyploid formation in Tragopogon (Asteraceae): recurrent formation, gene flow, and population structure. Evolution, 64, 1984–2003. R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing. Thompson JD (1991) Phenotypic plasticity as a component of evolutionary change. Trends in Ecology & Evolution, 6, 246–9. 66 Tice KA, Carlon DB (2011) Can AFLP genome scans detect small islands of differentiation? The case of shell sculpture variation in the periwinkle Echinolittorina hawaiiensis. Journal of Evolutionary Biology, 24, 1814–25. Vekemans X, Beauwens T, Lemaire M, Roldán-Ruiz I (2002) Data from amplified fragment length polymorphism (AFLP) markers show indication of size homoplasy and of a relationship between degree of homoplasy and fragment size. Molecular Ecology, 11, 139–51. Via S, West J (2008) The genetic mosaic suggests a new role for hitchhiking in ecological speciation. Molecular Ecology, 17, 4334–45. Wendel JF (2000) Genome evolution in polyploids. Plant Molecular Biology, 42, 225–49. White TA, Stamford J, Rus Hoelzel A (2010) Local selection and population structure in a deep-sea fish, the roundnose grenadier (Coryphaenoides rupestris). Molecular ecology, 19, 216–26. Whiteley AR, Derome N, Rogers SM, et al. 
(2008) The phenomics and expression quantitative trait locus mapping of brain transcriptomes regulating adaptive divergence in lake whitefish species pairs (Coregonus sp.). Genetics, 180, 147–64.
Whitlock MC, Guillaume F (2009) Testing for spatially divergent selection: comparing QST to FST. Genetics, 183, 1055–63.
Whitney KD, Randell RA, Rieseberg LH (2006) Adaptive introgression of herbivore resistance traits in the weedy sunflower Helianthus annuus. The American Naturalist, 167.
Whitney KD, Randell RA, Rieseberg LH (2010) Adaptive introgression of abiotic tolerance traits in the sunflower Helianthus annuus. The New Phytologist, 187, 230–9.
Willi Y, Van Buskirk J, Hoffmann AA (2006) Limits to the Adaptive Potential of Small Populations. Annual Review of Ecology, Evolution, & Systematics, 37, 433–458.
Wu LL, Cui XK, Milne RI, Sun Y-S, Liu JQ (2010) Multiple autopolyploidizations and range expansion of Allium przewalskianum Regel. (Alliaceae) in the Qinghai-Tibetan Plateau. Molecular Ecology, 19, 1691–704.
Zhivotovsky LA (1999) Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology, 8, 907–13.
Appendix A: Supplementary Data and Methods
Cluster dendrograms were generated using the outlier and neutral AFLP data to assist in visualizing genetic population structure. The dendrograms were generated using Euclidean distance to further estimate the degree and nature of any clustering within the outlier and neutral genetic data and to further characterize any apparent outlier individuals. I moved this analysis to the appendix because the dendrograms were largely consistent with the PCA and mostly redundant.
Figure A1. Cluster dendrogram of individuals using Euclidean distance based on the AFLP genotype data at the outlier loci. Individuals from lowland populations are colour coded yellow, orange and red, while individuals from alpine populations are coded light blue and dark blue.
Figure A2.
Cluster dendrogram of individuals using Euclidean distance based on the AFLP genotype data at the neutral loci. Individuals from lowland populations are colour coded yellow, orange and red, while individuals from alpine populations are coded light blue and dark blue.
Table A1. Linkage disequilibrium analysis in Multilocus following Agapow & Burt (2001). The index of association was calculated for pairwise comparisons between all outlier loci, with IA ≠ 0 indicating a statistically significant association (linkage) between two loci (Agapow & Burt 2001). There were no significant associations amongst any outlier loci.
Comparison   Observed Index of Association   p-value (p > x)
1&2          -0.013                          0.87
1&3          -0.018                          0.87
1&4          -0.007                          0.87
1&5          0.065                           0.87
1&6          0.163                           0.87
1&7          0.074                           0.87
1&8          -0.005                          0.87
1&9          -0.012                          0.87
1&10         -0.007                          0.87
1&11         -0.017                          0.87
1&12         -0.029                          0.87
1&13         0.063                           0.87
2&3          -0.092                          0.49
2&4          -0.072                          0.49
2&5          0.031                           0.49
2&6          -0.020                          0.49
2&7          0.036                           0.49
2&8          0.001                           0.49
2&9          -0.030                          0.49
2&10         0.022                           0.49
2&11         -0.038                          0.49
2&12         -0.097                          0.49
2&13         -0.017                          0.49
3&4          0.522                           0.67
3&5          -0.092                          0.67
3&6          -0.032                          0.67
3&7          0.000                           0.67
3&8          0.154                           0.67
3&9          0.335                           0.67
3&10         0.168                           0.67
3&11         0.598                           0.67
3&12         0.589                           0.67
3&13         -0.006                          0.67
4&5          -0.081                          0.21
4&6          0.011                           0.21
4&7          0.015                           0.21
4&8          0.179                           0.21
4&9          0.254                           0.21
4&10         0.113                           0.21
4&11         0.388                           0.21
4&12         0.356                           0.21
4&13         -0.024                          0.21
5&6          0.028                           0.17
5&7          -0.003                          0.17
5&8          0.017                           0.17
5&9          -0.029                          0.17
5&10         0.035                           0.17
5&11         -0.051                          0.17
5&12         -0.097                          0.17
5&13         0.026                           0.17
6&7          0.016                           0.44
6&8          0.003                           0.44
6&9          0.009                           0.44
6&10         0.003                           0.44
6&11         -0.022                          0.44
6&12         -0.049                          0.44
6&13         0.031                           0.44
7&8          -0.003                          0.55
7&9          -0.007                          0.55
7&10         0.000                           0.55
7&11         -0.007                          0.55
7&12         0.004                           0.55
7&13         0.022                           0.55
8&9          0.128                           0.62
8&10         0.104                           0.62
8&11         0.131                           0.62
8&12         0.125                           0.62
8&13         -0.003                          0.62
9&10         0.074                           0.79
9&11         0.277                           0.79
9&12         0.164                           0.79
9&13         -0.004                          0.79
10&11        0.169                           0.61
10&12        0.136                           0.61
10&13        0.007                           0.61
11&12        0.410                           0.9
11&13        0.033                           0.9
12&13        -0.030                          0.26
Table A2.
Phenotype data for each individual, with the population from which each individual was sampled. Measured phenotypes were plant height (in cm) and floral colour.
Individual   Population   Height (cm)   Floral Colour
BHS11        BHS          40.3          pink
BHS13        BHS          31.6          red
BHS14        BHS          39.8          red
BHS16        BHS          21.3          white
BHS17        BHS          23.2          pink
BHS18        BHS          32.5          white
BHS2         BHS          32.4          red
BHS20        BHS          23.6          white
BHS21        BHS          22.3          white
BHS22        BHS          17.3          white
BHS23        BHS          30.4          pink
BHS27        BHS          33.3          white
BHS28        BHS          24.3          red
BHS32        BHS          29.3          white
BHS34        BHS          29.1          pink
BHS35        BHS          24.2          pink
BHS36        BHS          27.5          white
BHS4         BHS          26.7          red
BHS40        BHS          23.7          red
BHS44        BHS          25.8          red
BHS46        BHS          34.4          red
BHS49        BHS          28.1          red
BHS5         BHS          29.6          pink
BHS6         BHS          15.3          red
BL1          BL           26.5          white
BL10         BL           27.9          white
BL11         BL           33.5          red
BL12         BL           19.3          white
BL14         BL           29.3          white
BL15         BL           19.1          white
BL17         BL           41.7          white
BL18         BL           26.3          white
BL19         BL           20.8          pink
BL2          BL           20.2          white
BL20         BL           38.6          white
BL21         BL           29.1          red
BL28         BL           27.2          white
BL30         BL           35.4          white
BL32         BL           27.1          white
BL36         BL           16.3          white
BL37         BL           25            white
BL39         BL           17.5          red
BL4          BL           33.7          white
BL42         BL           22.8          red
BL45         BL           31.4          pink
BL49         BL           30.1          red
BL50         BL           35.4          white
BL52         BL           30.2          white
BL8          BL           24.3          pink
HSB1         HSB          22.3          pink
HSB10        HSB          15.1          white
HSB11        HSB          20            white
HSB12        HSB          25.8          white
HSB13        HSB          23.7          white
HSB14        HSB          15.4          white
HSB15        HSB          15.5          white
HSB16        HSB          25.5          white
HSB17        HSB          14.3          white
HSB18        HSB          25.3          white
HSB2         HSB          22.2          white
HSB20        HSB          20.8          white
HSB21        HSB          26.7          white
HSB22        HSB          13.3          white
HSB23        HSB          18.4          white
HSB24        HSB          17.3          white
HSB25        HSB          18.5          red
HSB26        HSB          24.3          white
HSB27        HSB          25.5          white
HSB28        HSB          19.8          pink
HSB29        HSB          14.3          white
HSB3         HSB          14.5          white
HSB30        HSB          20            white
HSB4         HSB          24.5          white
HSB5         HSB          23.7          white
HSB6         HSB          23            red
HSB7         HSB          27.8          white
HSB8         HSB          15.4          white
HSB9         HSB          29            white
HWP1         HWP          21.8          white
HWP10        HWP          26.9          white
HWP11        HWP          30.3          white
HWP12        HWP          31.7          white
HWP13        HWP          25.1          white
HWP14        HWP          30.9          white
HWP15        HWP          26.4          white
HWP16        HWP          27.2          white
HWP18        HWP          20.1          white
HWP19        HWP          23.4          white
HWP20        HWP          20.6          white
HWP21        HWP          24.5          white
HWP22        HWP          22.8          white
HWP23        HWP          21.4          white
HWP24        HWP          20            white
HWP25        HWP          27.9          white
HWP26        HWP          27.6          white
HWP3         HWP          23            white
HWP4         HWP          20.5          white
HWP5         HWP          21.3          white
HWP6         HWP          31.8          white
HWP6         HWP          31.1          white
HWP8         HWP          26.9          white
HWP9         HWP          23.9          white
WC10         WC           31.5          red
WC11         WC           17.1          white
WC12         WC           32.1          white
WC17         WC           25.9          red
WC18         WC           36.2          white
WC2          WC           30.4          pink
WC22         WC           25.4          white
WC23         WC           29.7          white
WC25         WC           26.3          white
WC28         WC           22.4          red
WC3          WC           30.8          white
WC30         WC           27.9          white
WC31         WC           32.2          white
WC36         WC           31.4          white
WC4          WC           42.4          white
WC42         WC           24.1          white
WC45         WC           23.4          white
WC49         WC           19.4          red
WC7          WC           36.5          white
WC8          WC           28.3          red
WC9          WC           27.3          red
Appendix B: AFLP Protocol
Taken from the AFLP Plant Mapping Protocol for Regular Plant Genomes (Applied Biosystems).
Restriction-Ligation:
1. From the AFLP Ligation and Preselective Amplification Module, remove the tubes labeled MseI Adaptor Pair and EcoRI Adaptor.
2. Heat tubes in a water bath at 95 °C for 5 minutes.
3. Allow tubes to cool to room temperature over a 10-minute period.
4. Spin in a microcentrifuge for 10 seconds at 1400 × g (maximum).
5. Prepare the Enzyme Master Mix by combining the following in a sterile 0.5 mL microcentrifuge tube:
   a. 10 µL 10X T4 DNA ligase buffer with ATP
   b. 10 µL 0.5 M NaCl
   c. 5 µL 1 mg/mL BSA (diluted from 10 mg/mL stock)
   d. 100 Units MseI
   e. 500 Units EcoRI
   f. 100 Weiss Units T4 DNA Ligase
6. Add sterile distilled water to bring the total volume to 100 µL.
7. Mix gently.
8. Spin down in a microcentrifuge for 10 seconds.
9. Store on ice until ready to aliquot into individual reaction tubes.
10. Combine the following in a sterile 0.5 mL microcentrifuge tube:
   a. 1.0 µL 10X T4 DNA ligase buffer that includes ATP
   b. 1.0 µL 0.5 M NaCl
   c. 0.5 µL 1.0 mg/mL BSA (dilute from 10 mg/mL if necessary)
   d. 1.0 µL MseI adaptor
   e. 1.0 µL EcoRI adaptor
   f. 1.0 µL Enzyme Master Mix
11. Add 0.5 µg genomic DNA in 5.5 µL sterile distilled water.
12. Mix thoroughly, then spin down in a microcentrifuge for 10 seconds.
13. Incubate at room temperature overnight.
14. Add 189 µL of TE0.1 buffer to each restriction-ligation reaction.
15. Mix thoroughly.
Preselective Amplification
1. Combine the following in a PCR reaction tube:
   a. 4.0 µL diluted DNA prepared by restriction-ligation
   b. 1.0 µL AFLP preselective primer pairs
   c. 15.0 µL AFLP Core Mix
2. Place the samples in a thermal cycler at ambient temperature.
3. Run the following PCR method:
   a. 72 °C for 2 minutes
   b. 20 cycles of:
      i. 94 °C for 20 seconds
      ii. 56 °C for 30 seconds
      iii. 72 °C for 2 minutes
   c. 60 °C for 30 minutes
   d. 4 °C hold
4. Combine the following in a sterile 0.5 mL microcentrifuge tube:
   a. 10.0 µL preselective amplification reaction product
   b. 190.0 µL TE0.1
5. Mix thoroughly, then spin down in a microcentrifuge for 10 seconds.
6. Store the diluted preselective amplification product at 2–6 °C if not used immediately.
Selective Amplification
1. Combine the following in a PCR reaction tube:
   a. 3.0 µL diluted preselective amplification reaction product
   b. 1.0 µL MseI[Primer–Cxx] at 5 µM
   c. 1.0 µL EcoRI[Dye–primer–Axx] at 1 µM
   d. 15.0 µL AFLP Core Mix
2. Run PCR using the thermal cycler parameters:
   a. 94 °C for 2 minutes
   b. Cycle:
      i. 94 °C for 20 seconds
      ii. 66 °C (ramped down by 1 degree each cycle until 56 °C) for 30 seconds
      iii. 72 °C for 2 minutes
   c. 60 °C for 30 minutes
   d. 4 °C hold
Appendix C: Example of Electropherogram and Raw Data Produced from AFLP
Figure C1. An example electropherogram following separation of AFLP fragments via capillary electrophoresis. Dominant loci are scored either as the presence of a peak at a particular size (e.g. at 100 bp) or by using the height of the peak in fluorescent units (e.g. 100 FU at 100 bp).
Table C1. Example of binary (D) and peak height (PH) AFLP data exported from GeneMapper v4.0 for two of the outlier loci. Dominant AFLP alleles are scored either as present (1) or absent (0). Peak Height AFLP alleles are scored as the height of the amplification peak in the electropherogram (Fig.
C1) if present or zero if there was no amplification. Individual BHS11 BHS13 BHS14 BHS16 BHS17 BHS18 BHS2 locus 1 (D) 0 1 0 0 1 0 0 locus 58 (D) 1 1 1 0 1 0 1 78 locus 1 (PH) 0 236 0 0 113 0 0 locus 78 (PH) 0 0 0 0 0 0 0 BHS20 BHS21 BHS22 BHS23 BHS27 BHS28 BHS32 BHS34 BHS35 BHS36 BHS4 BHS40 BHS44 BHS46 BHS49 BHS5 BHS6 BL1 BL10 BL11 BL12 BL14 BL15 BL17 BL18 BL19 BL2 BL20 BL21 BL28 BL30 BL32 BL36 BL37 BL39 BL4 BL42 BL45 BL49 BL50 BL52 BL8 HSB1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 79 0 0 0 0 0 0 0 119 0 230 0 0 0 0 0 0 0 190 0 0 0 0 0 117 0 0 0 0 0 0 0 154 198 141 0 235 0 0 0 186 0 0 202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 HSB10 HSB11 HSB12 HSB13 HSB14 HSB15 HSB16 HSB17 HSB18 HSB2 HSB20 HSB21 HSB22 HSB23 HSB24 HSB25 HSB26 HSB27 HSB28 HSB29 HSB3 HSB30 HSB4 HSB5 HSB6 HSB7 HSB8 HSB9 HWP1 HWP10 HWP11 HWP12 HWP13 HWP14 HWP15 HWP16 HWP18 HWP19 HWP20 HWP21 HWP22 HWP23 HWP24 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 1771 192 202 215 0 185 0 153 137 218 187 226 0 171 207 154 197 370 225 199 193 332 205 175 208 317 194 162 0 225 0 148 133 0 0 212 242 0 159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 231 298 157 602 597 444 635 258 199 226 137 153 193 144 220 HWP25 HWP26 HWP3 HWP4 HWP5 HWP6 HWP8 HWP9 WC10 WC11 WC12 WC17 WC18 WC2 WC22 WC23 WC25 WC28 WC3 WC30 WC31 WC36 WC4 WC42 WC45 WC49 WC7 WC8 WC9 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 1 1 1 0 0 1 0 1 1 0 81 0 0 0 0 0 0 228 0 0 0 220 0 0 0 0 0 0 0 0 0 0 0 0 0 0 278 155 0 0 257 140 173 189 153 0 260 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 138
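
The relationship between the two scoring schemes in Table C1 can be illustrated with a short sketch. It assumes that a dominant (D) score is simply 1 wherever the peak height (PH) for the same locus is above a minimum height and 0 otherwise. The function name `score_dominant` and the `min_height` parameter are hypothetical and used only for illustration; the thesis does not state the peak-calling threshold applied in Genemapper, so the default of zero here is an assumption.

```python
from typing import List, Sequence


def score_dominant(peak_heights: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Convert peak heights (FU) to presence (1) / absence (0) scores.

    min_height is a hypothetical cut-off: peaks must be strictly greater
    than it to count as present. The default of 0.0 treats any amplified
    peak as present, which is an assumption, not the author's setting.
    """
    return [1 if height > min_height else 0 for height in peak_heights]


# Locus 1 peak heights for BHS11, BHS13, BHS14, BHS16, BHS17, BHS18 and BHS2,
# copied from the locus 1 (PH) column of Table C1.
locus1_ph = [0, 236, 0, 0, 113, 0, 0]

locus1_d = score_dominant(locus1_ph)
print(locus1_d)  # [0, 1, 0, 0, 1, 0, 0], matching the locus 1 (D) column
```

For these seven individuals the derived scores agree with the locus 1 (D) values in the table. Raising `min_height` would discard weak peaks, which is one reason peak-height and binary data sets can differ in practice.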