The Complex Inheritance of Maize Domestication Traits and Gene Expression

By

Zachary H. Lemmon

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

(Genetics)

at the

UNIVERSITY OF WISCONSIN – MADISON

2014

Date of final oral examination: 4/29/14

The dissertation is approved by the following members of the Final Oral Committee:
John F. Doebley, Professor, Genetics
David A. Baum, Professor, Botany and Genetics
Shawn M. Kaeppler, Professor, Agronomy
Patrick H. Masson, Professor, Genetics
Bret A. Payseur, Professor, Genetics

Acknowledgements

I want to extend my thanks to John Doebley for making this dissertation possible. John has been a constant voice of encouragement and insight throughout my graduate career. He has been instrumental in keeping me focused on the big question, while allowing me the freedom to chase down side interests and projects. John has taught me the importance of focusing my scientific inquiry on the core of a research question, which has shaped the way I approach research. While I have carried out the experiments described in this work, the first steps taken in these projects belong to John, and I am grateful for the chance I was given to shepherd them to completion. Every day and every conversation I have had with John as my advisor has made me a better scientist, and I am extremely thankful for the opportunity I was given six years ago when I joined the Doebley lab.

I have also been fortunate to work in an outstanding lab full of individuals who are supportive on both a personal and a professional level. The work performed by a number of my fellow lab members was crucial to the completion of these experiments. Without their help, the many DNA and RNA extractions, PCR reactions, phenotype measurements, and plants grown would simply not have happened. Fellow graduate students, postdocs, lab technicians, and undergraduate workers have all assisted in their own way.
I am also thankful that, in addition to being wonderful coworkers in a professional sense, lab members have made the lab a fun, exciting, and enjoyable place to spend my Ph.D. career. I will never forget the power of “Tak”, being “skinny up top”, or the “lab master”. To Tony, Laura, CJ, Ali, Bao, Tina, Lisa, Eric III, Jesse, Elizabeth, David, Claudia, Wei, and the numerous undergrads, thank you for making this wonderful experience possible.

In addition to my friends and colleagues at Wisconsin, I have been fortunate to be involved in a larger community of maize researchers at Cornell University, the University of Missouri, North Carolina State University, and the University of California, Davis. Working with these scientists has exposed me to a variety of questions and topics in maize research, spanning phenotypes, quantitative genetics, and large-scale data collection and analysis, and has greatly broadened my experience. In particular, collaborations with Qi Sun and Robert Bukowski at Cornell contributed greatly to the analysis in the third chapter of this thesis. Dialogue with Jeff Ross-Ibarra and Matt Hufford at UC Davis has also continually provided me with insight into the population genetics of maize domestication and given me a valuable resource to draw on.

My Ph.D. committee has been an excellent resource during my graduate career. Bret Payseur and Shawn Kaeppler in particular have provided valuable insight into scientific questions and suggested analyses that have become part of this dissertation. David Baum has always made time in his busy schedule to meet with me and keep up to date with my progress. Finally, Patrick Masson has been a constant source of encouragement and has assisted me in several capacities both within and outside of the Ph.D. committee.

I am also eternally grateful to my family, who have stood by my side throughout this process.
I thank my parents, Karen and Holden, for giving me the tools and opportunity to pursue my goals; my sisters, Addie and Kelsey, for always being there; and my wonderful nieces, Laney and Havi, for always making me smile. My amazing friend, Alex, has been a constant source of support in my life and is now one of the family. Finally, to my wife Megan: you have kept me grounded throughout these six years in Madison, in both the good and the bad times. You are my rock, and this would not have been possible without you.

Contents

Acknowledgements
Table of Contents
List of Figures
List of Tables
Abstract
Preface
1 Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
  1.1 Abstract
  1.2 Introduction
  1.3 Materials and Methods
    1.3.1 Plant Material, Genotypes, and Phenotypes
    1.3.2 Mixed Models and Heritability
    1.3.3 QTL Mapping
    1.3.4 Simulation Experiment
  1.4 Results
    1.4.1 QTL mapping
    1.4.2 Simulation Experiment
  1.5 Discussion
2 Fine mapping of chromosome five domestication genes in maize
  2.1 Abstract
  2.2 Introduction
  2.3 Materials and Methods
    2.3.1 Plant material
    2.3.2 Field Trials and Phenotypes
    2.3.3 Genotyping with PCR and next generation sequencing
    2.3.4 Statistical analysis and segregation of phenotypes
  2.4 Results
    2.4.1 RCNIL generation and phenotype least squared means
    2.4.2 PCR and GBS genotyping
    2.4.3 QTL fail to segregate as Mendelian traits
    2.4.4 Multiple factors contribute to culm diameter and kernel row number
  2.5 Discussion
    2.5.1 The complex genetic architecture of culm and kernel row number
    2.5.2 Future work on chromosome five QTL
3 The role of cis regulatory evolution in maize domestication
  3.1 Abstract
  3.2 Introduction
  3.3 Materials and Methods
    3.3.1 Plant material, RNA preparation, and sequencing
    3.3.2 Bioinformatics
    3.3.3 Maize:teosinte gene expression ratios
    3.3.4 Testing for cis and trans effects
    3.3.5 Candidate genes
    3.3.6 Proportion of cis variation in maize and teosinte
    3.3.7 Additive and dominant gene expression
    3.3.8 CCT gene enrichment in various functional categories
  3.4 Results
    3.4.1 RNAseq provides expression data for more than 17,000 genes per tissue
    3.4.2 Prolific regulatory variation characterized by relatively few consistent cis differences
    3.4.3 Possible directional bias in cis evolution
    3.4.4 Gene expression variation is greater in teosinte
    3.4.5 Selection candidate genes are enriched for CCT genes
    3.4.6 Microarray and RNAseq data partially correspond
    3.4.7 CCT genes are unrelated to differentially methylated regions
    3.4.8 Dominant and additive gene expression inheritance
    3.4.9 Candidate genes enriched in various functional categories
  3.5 Discussion
    3.5.1 Regulatory change between and within maize and teosinte
    3.5.2 What is the frequency of cis and trans regulatory change?
    3.5.3 Tissue specific expression of CCT candidates
    3.5.4 Bias toward increased maize expression?
    3.5.5 Selection-candidates enriched for cis regulatory change
    3.5.6 Leaf tissue candidates are enriched for photosynthesis and chloroplast GO terms
    3.5.7 Do crop domestication genes show cis differences?
    3.5.8 A catalog of genes with cis regulatory variation
Appendices
A Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
  A.1 Figures
  A.2 Tables
B Supplemental Content: Fine mapping of chromosome five domestication genes in maize
  B.1 Tables
C Supplemental Content: The role of cis regulatory evolution in maize domestication
  C.1 Figures
  C.2 Tables
D Characterization of domestication traits for selection candidate gene Zea agamous2
  D.1 Foreword
  D.2 Introduction
  D.3 Methods
    D.3.1 RCNILs
    D.3.2 Transgenic RNAi lines
  D.4 Results
    D.4.1 RCNILs
    D.4.2 Transgenic RNAi lines
  D.5 Discussion
References

List of Figures

1.1 Cumulative plot of QTL detected in the mapping experiment.
1.2 The number of detected QTL and mean detected QTL effect size versus number of simulated causative loci.
1.3 The proportion of detected QTL with zero, one, or more than one simulated causative genes in the 1.5 LOD support interval.
2.1 Histograms of least squared means for the culm diameter and kernel row number phenotypes.
2.2 GBS genotypes for kernel row number RCNILs.
2.3 RCNILs sorted by phenotype from least to greatest.
2.4 Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF.
2.5 QTL LOD profiles for fine mapping of culm diameter and kernel row number traits.
3.1 Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list.
3.2 Parent versus hybrid ear tissue allele specific expression ratios.
3.3 Proportion of expression divergence due to cis regulatory difference.
3.4 Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem.
3.5 The proportion of average maize to teosinte R² from linear models explaining F1 hybrid expression by maize and teosinte parent.
3.6 Density plots of ln(XPCLR) score of conserved versus CCT-AB candidate genes.
3.7 Proportion of cis only and trans only genes identified as having dominant or additive inheritance.
A.1 Histograms of the least squared means for phenotyped traits from the QTL mapping population.
A.2 Example histograms of simulated traits for several different conditions in terms of number of causative loci, effect size, and heritability.
A.3 Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval.
C.1 Parent versus hybrid leaf tissue allele specific expression ratios.
C.2 Parent versus hybrid stem tissue allele specific expression ratios.
C.3 Dominance by additivity ratio grouped by regulatory category.
D.1 Single kernel weight estimates for zag2 RCNILs.

List of Tables

1.1 NIRIL phenotyped traits, descriptions, approximate distribution, between year Pearson correlation coefficients, and Pearson p-values.
1.2 Final models selected for the thirteen NIRIL phenotypes.
1.3 Detected QTL for the T5S mapping population with position, heritability, and LOD score statistics.
2.1 Final linear mixed models used to produce least squared means for fine mapping RCNILs.
2.2 Detected QTL and HIF effects including LOD, percent variation explained, and additive effect.
3.1 Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher’s Exact Tests (FET).
3.2 Assignable RNAseq Read Counts from F1 hybrids and parents.
3.3 Genes for which RNAseq data was collected and expression was assayed.
3.4 Fisher’s Exact Tests for overlap of selection and CCT candidates.
3.5 Fisher’s Exact Tests for enrichment/depletion of cis and trans only genes in selection features.
3.6 Fisher’s Exact Tests for overlap between microarray and CCT differentially expressed genes.
3.7 Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.
A.1 RFLP Markers used during backcrossing of QTL mapping population.
A.2 Genetic markers used to score BC6S6 mapping population.
B.1 PCR markers used for genotyping RCNILs including gene or SNP target, AGPv2 position, and primer sequence.
C.1 Biological replicates for RNAseq experiment.
C.2 Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
C.3 Number of genomic paired end reads and coverage obtained for constructing pseudo-transcriptomes.
C.4 Proportion of divergence due to cis regulatory effect grouped by overall parental divergence.
C.5 The number of genes for which the maize or teosinte allele is expressed at a higher level.
C.6 Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
C.7 Allele specific expression variation among F1 hybrids explained by maize and teosinte parent.
C.8 Number of genes with significant cis expression variation explained by maize and/or teosinte.
C.9 Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and MicroArray assays in groups A, B, and C in the three tissue types.
C.10 Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
C.11 Fisher’s Exact Tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.
C.12 Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte and proportion in which expression data agrees with methylated status.
C.13 Dominance/additivity ratios for genome-wide gene expression.
C.14 Contingency tables for additive and dominant gene counts for A, AB, and ABC candidate lists.
C.15 Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.
C.16 Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5 LOD support intervals for QTL from a previous study.
C.17 Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
C.18 Significantly enriched and depleted GO terms from CCT and trans only gene lists.
D.1 Trait abbreviations and descriptions from the zag2 experiment.
D.2 Zag2 transgenic RNAi insertion event, background, phenotype, and t-test p-value.

Abstract

The genetic basis for morphological change in divergent species is a central question in evolutionary biology. The domestication of maize from its wild progenitor, teosinte, is an excellent system in which to address this question. Using a mapping population specific to chromosome five, we explore a poorly understood region of the maize genome with a large effect on domestication phenotypes. Unlike other large-effect regions of the maize genome, this region harbors multiple QTL for many traits that do not stack on a single locus, suggesting that multiple genes on the fifth chromosome influence domestication traits. Simulation studies show clear evidence of limited power to detect QTL for highly polygenic traits, such that the detected QTL do not accurately portray the true complexity of the underlying genetic architecture. Two QTL in different locations were chosen for fine mapping studies to identify the underlying causative genes. While a single gene was not identified for either QTL, both were narrowed to intervals of less than three centimorgans containing relatively few genes, including genes with evidence of positive selection during maize domestication. Finally, the first genome-wide effort to characterize cis and trans regulatory change between a domesticated crop and its wild progenitor found extensive regulatory variation, with relatively few genes having consistent cis differences; these genes are significantly associated with signatures of positive selection during the domestication and improvement of maize. Consistent with the loss of diversity during the domestication bottleneck, cis expression variation explained by the maize parent is reduced in comparison to teosinte, with an even greater reduction in cis candidate genes. A general increase in the expression of maize alleles was also observed, suggesting that domestication may have led to a general increase in gene expression in maize.
Collectively, these experiments shed light on the evolution of divergent phenotypes and gene regulation in domesticated maize and its wild progenitor.

Preface

The nature of the functional changes in genes responsible for phenotypic divergence between related species is a topic of ongoing research in evolutionary biology. Many types of genomic features have been shown to influence the development of novel phenotypes. Studies in closely related species have identified gene duplications [1], various types of expression modification [2–4], and coding changes [5, 6] that give rise to altered phenotypes. A major contributor to evolutionary biology research is the study of domesticated crops and their wild ancestors, where intense artificial selection for agronomic traits during domestication serves as a proxy for natural selection. Experiments characterizing the functional changes responsible for novel phenotypes in domesticated rice, tomato, wheat, and sorghum have been highly successful [7]. One of the most widely used crop domestication models is maize, for which scientists have extensively investigated the morphological differences between maize (Zea mays ssp. mays) and its wild progenitor (Zea mays ssp. parviglumis).

Maize is an excellent system in which to pursue evolutionary questions for a number of reasons. Maize was domesticated approximately 9,000 years ago in the Balsas River valley of Mexico [8]. As in other domesticated systems, maize-teosinte F1 hybrids are fertile, which allows the use of powerful genetic techniques to dissect the genetics of complex traits. The maize reference genome also greatly facilitates research by enabling sequence-based analyses and comparative genomics [9]. The collection of phenotypic differences commonly seen between domesticated crops and their wild progenitors is also observed when comparing maize and teosinte.
This “domestication syndrome” [10, 11] consists of phenotypes that improve the suitability of a crop for human use, such as loss of shattering (natural seed dispersal), increased apical dominance, loss of prolificacy (concentration of seed into one unit), and gigantism of vegetative and reproductive tissues.

One method commonly used to examine the genetic factors controlling morphological variation in maize is quantitative trait locus (QTL) mapping. Studies of maize domestication have shown that the QTL underlying the profound morphological differences between maize and its wild progenitor teosinte can be attributed primarily to six regions of large effect on the first five chromosomes of maize [8]. Three of these genomic regions have been further characterized, identifying in each a single gene of large, pleiotropic effect. The functional causative polymorphisms of these genes include new tissue-specific expression patterns [4], elevated expression [3], and coding sequence change [5]. In contrast to these well-characterized loci, other regions of the genome with large effects on domestication phenotypes are poorly understood.

A prominent theory in evolutionary biology holds that adaptive evolution occurs primarily through modification of cis regulatory elements [12, 13]. Consistent with this theory, altered cis regulatory elements account for a large proportion of the domestication genes identified in crops [7]. A striking characteristic of these genes is the variety of functional changes that result from cis regulatory change, including elevated and decreased expression [3, 14], novel tissue-specific expression patterns [4, 15], and heterochronic shifts in expression [16]. The demonstrated importance of gene regulatory change in the evolution of new forms has led to several studies of genome-wide gene expression in domesticated crops [17–19].
While measuring gene expression differences between a modern crop and its wild relative is an important step in exploring regulatory variation in an evolutionary context, it falls short of the global analyses in yeast and fruit fly [20, 21] that specifically dissect cis and trans regulatory variation.

The work presented in this dissertation explores two facets of diversification between maize and teosinte. First, quantitative genetic methods are used to assess the architecture of domestication QTL and causative genes on the fifth chromosome of maize, providing insight into the genetic factors underlying this previously uncharacterized region of large phenotypic effect in the maize genome. Second, cis and trans regulatory variation is investigated on a genome-wide scale using deep RNA sequencing. This work is presented in three chapters.

1. The first chapter describes a QTL mapping experiment specific to chromosome five. A large BC6S6 population was developed in which other regions known to affect domestication traits were fixed for a homozygous genotype. Thirteen phenotypes representing differences between teosinte and maize were measured over two summers, and QTL mapping was performed. We detected an average of approximately two QTL per trait, with QTL mapping to multiple regions. This suggests that, unlike other genomic regions of importance in maize domestication, the fifth chromosome houses a complex of linked loci that together contribute to the phenotypic effect. We also examined the power and precision of our mapping population using simulated trait datasets. The heritability of a trait had the primary influence on the maximum number of detectable QTL, and we observed the Beavis effect (overestimation of the effects of detected QTL when detection power is limited) in the estimated QTL effect sizes. This work provides a focused examination of a previously poorly understood region of the maize genome with large effects on domestication traits.

2. The second chapter focuses on fine mapping of two QTL on the fifth chromosome identified in chapter one, one for culm diameter and one for kernel row number. Our strategy used a population of plants with homozygous recombinant chromosomes in replicated field trials. Neither QTL was mapped to a single gene; however, the culm diameter QTL was greatly reduced in size (∼2.5% of the original 1.5 LOD support interval). The kernel row number QTL was analyzed with whole-genome genotyping data, and a complex set of genetic factors influencing the trait was identified. The main kernel row number QTL on chromosome five, in terms of LOD score, shifted to a region outside of the original support interval. The culm diameter and kernel row number QTL intervals contained 40 and 63 genes, respectively, which were examined for attractive candidate genes. Neither QTL had a clear best candidate, but several genes showed evidence of cis regulatory change and multiple genes had evidence of positive selection during the domestication of maize. While this work did not identify a single causative gene, it greatly reduced the size of the culm diameter QTL and found evidence for complex inheritance of the kernel row number phenotype.

3. Finally, the extent of genome-wide gene regulatory change is examined using next generation sequencing. Three tissues were harvested from a collection of maize-teosinte F1 hybrids and their inbred parents, and Illumina sequencing was performed to assess differential expression of alleles. Using a hierarchical series of statistical tests, we distinguish significant cis and trans regulatory effects for approximately 17,000 genes in each of the three tissues studied. We produce a filtered list of candidate genes (∼500 genes per tissue) with significant and consistent cis effects. These genes are significantly associated with selection features from a recent genome-wide scan for selection in maize, suggesting that genes with cis regulatory changes are frequently targets of positive selection. Additionally, the proportion of divergence due to cis effects was positively correlated with overall divergence. Several other characteristics of the candidate cis genes were also analyzed, including gene ontology and other functional annotations. This study represents the first genome-wide effort in a domesticated crop and its wild progenitor to dissect cis and trans effects by assessing allele-specific expression in F1 hybrids.

Chapter 1

Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL

1.1 Abstract

The domesticated crop maize and its wild progenitor, teosinte, have been used in numerous experiments to investigate the nature of divergent morphologies. Using a QTL mapping population specific to the fifth chromosome, this study examines a poorly understood region of that chromosome associated with a number of traits under selection during domestication. In contrast with other major domestication loci in maize, where large-effect, highly pleiotropic single genes are responsible for the phenotypic effects, our study found that the region on chromosome five fractionates into multiple QTL, none with singularly large effects. The smallest 1.5 LOD support interval for a QTL contained 54 genes, one of which encodes a MADS-box MIKCC transcription factor, a family of proteins implicated in many developmental programs. We also used simulated trait datasets to investigate the power of our mapping population to identify QTL for which there is a single underlying causal gene. This analysis showed that while QTL for traits controlled by single genes can be accurately mapped, our population design can detect no more than ∼4.5 QTL per trait even when there are 100 causal genes.
Thus, when a trait is controlled by five or more genes in the simulated data, the number of detected QTL can represent a simplification of the underlying causative factors. Our results show how a QTL region with effects on several traits may be due to multiple linked QTL of small effect as opposed to a single gene with large and pleiotropic effects.

1.2 Introduction

In evolutionary biology, quantitative trait locus (QTL) mapping has been used with great success to define the genetic architecture controlling morphological differences between species. These QTL mapping experiments have identified a number of QTL with large effects in animal [22–24] and plant systems [25–28]. Often these experiments identify QTL clusters in a relatively small number of genomic regions, suggesting an underlying genetic architecture of single pleiotropic genes or several closely linked genes [8, 24, 29–31]. The phenotypic effects of QTL have been successfully mapped to single large effect pleiotropic genes in many species [3, 5, 15, 16, 32–34]. However, these large effect genes often explain only a portion of the divergence between species, leaving a considerable amount of phenotypic difference unexplained. Characterization of QTL clusters not associated with single genes will lead to a more comprehensive understanding of the genetic architecture that contributes to divergent phenotypes.

Domesticated crop plants, and maize in particular, provide a well-suited system in which to study the evolution of new morphologies for a number of reasons. First, maize (Zea mays ssp. mays) and its wild progenitor teosinte (Z. mays ssp. parviglumis) differ for a suite of traits commonly seen between other crops and their wild progenitors. Collectively, these differences are known as the domestication syndrome and include reduced lateral branching, loss of natural seed dispersal, and gigantism of vegetative and reproductive tissues [10, 11].
Second, intense artificial selection on domesticated crops, including maize, for desirable agronomic traits leaves a signature of selection (reduced nucleotide diversity), allowing putative targets of artificial selection to be identified as selective sweeps [35]. Third, like most domestication events, maize domestication took place in the last 10,000 years, and surviving wild progenitor populations serve as reasonable surrogates for the ancestor [36]. In addition, maize and teosinte are interfertile, allowing the use of genetic techniques and crosses to dissect the genetic architecture underlying divergent traits [37, 38]. Finally, researchers studying maize have the advantage of a powerful tool in the reference maize genome sequence, which makes it possible to anchor genetic markers to physical positions, annotate candidate genes, and characterize important genomic features such as centromeres [9]. The combination of these characteristics and available tools makes maize an effective model system in which to study the evolution of new forms.

Previous work in maize and its wild progenitor suggests that the genes responsible for phenotypic change are scattered throughout the genome, but with several concentrations of genes (QTL) controlling large portions of the phenotypic differences [8, 25]. To date, three large effect pleiotropic genes have been mapped to these genomic regions of large phenotypic importance. The short arm of chromosome one is home to grassy tillers1 (gt1), which influences tillering [39] and is largely responsible for the concentration of seed into a single large ear [4]. The gene teosinte branched1 (tb1) is found on the long arm of chromosome one and has a large pleiotropic impact on plant and inflorescence branching [3, 40]. Finally, the gene teosinte glume architecture1 (tga1) liberates the kernel from its stony fruit case in teosinte [5].
In comparison to these extensively studied genes, little is known about the genetic factors on other chromosomes responsible for phenotypic divergence during maize domestication. While early studies identified tb1 as the gene responsible for much of the phenotypic effect on the long arm of chromosome one [41], a more recent study has identified at least two additional loci upstream of tb1 with significant effects on phenotype [42]. These loci influence the expression of tb1-like phenotypes in both additive and epistatic ways. The nearest of these loci was only 5 centimorgans (cM) away from tb1 itself and had an effect specific to ear traits, leaving plant architecture traits such as tillering unaffected. This suggests that secondary factors to major effect genes can be quite closely linked and could also mediate tissue-specific effects. Similarly, the work identifying gt1 also found evidence of a secondary factor, located downstream of the identified causative region, that slightly increases prolificacy (the number of ears) in plants carrying the teosinte allele [4].

One of the six genomic regions of large pleiotropic effect identified in maize is on chromosome five, where the genetic architecture underlying the large phenotypic effects is largely unknown [8]. Previous work has found a number of domestication QTL on chromosome five for culm diameter, kernel row number, ear diameter, disarticulation, and pedicellate spikelet length [8, 37, 38]. A more recent experiment also found QTL for a number of these traits on chromosome five, some of which (kernel row number, ear diameter, and disarticulation) had particularly large effects and LOD scores [25]. While these previous mapping experiments found significant QTL for domestication traits on chromosome five, they could not determine whether this region contained a major QTL with pleiotropic effects on several traits or multiple linked QTL.
In this paper, we undertook a QTL mapping study to better characterize the effect of chromosome five on domestication traits. This experiment used a population of nearly isogenic recombinant inbred lines (NIRILs) that concentrated informative crossover events in the region of interest (chromosome five), together with replicated block experiments to improve trait measurements. Both of these characteristics increase mapping power on chromosome five in comparison with a standard F2 mapping population, improving the ability to differentiate between closely linked, moderate to small effect, and interacting QTL. Our QTL mapping detected QTL at multiple locations on the fifth chromosome, none of which has a singularly large effect. This suggests that, unlike other regions of the maize genome with single large effect genes [3–5], chromosome five houses several linked factors influencing phenotype. We also performed a simulation study to gauge the power and precision of our mapping population. This analysis indicates that for some traits the genetic architecture could be more complex than observed with empirical data.

1.3 Materials and Methods

1.3.1 Plant Material, Genotypes, and Phenotypes

We conducted a QTL mapping experiment to investigate the genetic architecture of domestication traits on maize chromosome five using a collection of nearly isogenic recombinant inbred lines (NIRILs) in the summers of 2009 and 2010. The experimental population was built by introgressing the majority of the short arm of chromosome five and part of the long arm from a teosinte (Iltis and Cochrane collection 81) into the maize inbred W22 by six generations of backcrossing. RFLP markers (Supplemental Table A.1) were used during this process to follow the desired genomic segment and eliminate teosinte segments at other known domestication QTL identified in a previous study [43].
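To see why six backcross generations give a nearly isogenic W22 background, consider the textbook expectation for unlinked, unselected loci: the maize × teosinte F1 carries half of its genome from teosinte, and each backcross to W22 halves the remaining teosinte share, so after n backcrosses the expected W22 fraction is 1 − (1/2)^(n+1). The sketch below tabulates this expectation. It ignores linkage to the selected chromosome five segment, so it describes the genetic background rather than the introgressed region, and the function name is illustrative only.

```python
from fractions import Fraction


def expected_recurrent_fraction(n_backcrosses: int) -> Fraction:
    """Expected fraction of the recurrent-parent (W22) genome after
    n backcrosses to W22, starting from a maize x teosinte F1.

    Assumes unlinked, unselected loci: the F1 carries 1/2 donor genome
    and each backcross halves the remaining donor (teosinte) fraction.
    """
    donor_fraction = Fraction(1, 2) ** (n_backcrosses + 1)
    return 1 - donor_fraction


for n in range(1, 7):
    frac = expected_recurrent_fraction(n)
    print(f"BC{n}: {float(frac):.4f} of the background expected from W22")
# The last line printed is "BC6: 0.9922 of the background expected from W22".
```

These are expectations only; unlinked teosinte segments can persist by chance, which is why the RFLP marker screening described above was used to detect and eliminate them at known domestication QTL.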
The extensive backcrossing, in tandem with tracking and eliminating teosinte segments from specific regions of the genome, allowed the experiment to focus on the segregating teosinte introgression on chromosome five. Five BC6 individuals heterozygous for the target segment on chromosome five were selfed to produce five BC6S1 families. The families were then selfed for five additional generations to give an experimental BC6S6 population of 259 highly homozygous NIRILs, which carried a collection of teosinte fifth chromosome introgressions in an isogenic W22 background. Genomic DNA was extracted with a standard CTAB protocol from tissue collected from an average of 15 individuals from each NIRIL in the summer of 2009. A collection of 25 insertion/deletion and microsatellite markers (Supplemental Table A.2) was genotyped across the fifth chromosome introgression using standard PCR and gel electrophoresis methods. In total, there were 443 observed recombination breakpoints among the NIRILs, or approximately 1.7 events per line. The number of recombination breakpoints per line ranged from zero to six, with the majority of lines (51.7%) having either zero or a single recombination event. The numbers of lines with each number of breakpoints are as follows: 56 (0 breakpoints), 78 (1 breakpoint), 49 (2 breakpoints), 48 (3 breakpoints), 19 (4 breakpoints), 7 (5 breakpoints), and 2 (6 breakpoints). Phenotype data were collected for the experimental NIRILs in three replicated blocks, two in the summer of 2009 and one in 2010, grown at the West Madison Agricultural Research Station in Madison, Wisconsin. Blocks consisted of the 259 NIRILs planted in randomized plots of ten or twelve plants each in 2009 and 2010, respectively. Five plants from each plot were assessed for thirteen phenotypes (Table 1.1) representing a number of plant and inflorescence phenotypic differences between teosinte and maize.
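The breakpoint summary above can be recomputed from the per-class counts as a quick consistency check. The dictionary simply transcribes the numbers given in the text; the variable names are illustrative.

```python
# Number of NIRILs observed with each count of recombination breakpoints,
# transcribed from the text above.
lines_by_breakpoints = {0: 56, 1: 78, 2: 49, 3: 48, 4: 19, 5: 7, 6: 2}

n_lines = sum(lines_by_breakpoints.values())
n_breakpoints = sum(k * count for k, count in lines_by_breakpoints.items())
mean_per_line = n_breakpoints / n_lines
frac_zero_or_one = (lines_by_breakpoints[0] + lines_by_breakpoints[1]) / n_lines

print(n_lines, n_breakpoints, round(mean_per_line, 2), f"{frac_zero_or_one:.1%}")
# 259 443 1.71 51.7%
```

The totals agree with the 259 NIRILs, the 443 breakpoints (about 1.7 per line), and the 51.7% of lines with zero or one breakpoint reported above.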
Plant traits included plant height, days to pollen shed, the amount of tillering, length of the primary lateral branch, prolificacy, and culm diameter. Inflorescence traits measured in the female inflorescence (ear) were kernels per rank, kernel row number, ear diameter, ear length, and percent staminate spikelets. Several traits from the male inflorescence (tassel) were also measured, including pedicellate spikelet length and tassel branch number. Genotype and phenotype data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.7sq67.

1.3.2 Mixed Models and Heritability

We estimated the NIRIL phenotype for all traits by fitting a linear mixed model. Fixed effects consisted of NIRIL, NIRIL family, and position within block, while block and year were used as random effects. A model (Equation 1.1) was fit with the MIXED procedure in SAS [44] as an initial scope:

$$Y_{ijklmno} = \mu + a_i(f_j) + f_j + b_k + c_l(b_k) + d_m(b_k) + c_l(b_k) \times d_m(b_k) + t_n + e_{ijklmn} + g_{ijklmno} \qquad (1.1)$$

In this model, $Y_{ijklmno}$ is the individual trait value, $\mu$ the overall mean, $f_j$ the family effect, $a_i(f_j)$ the effect of line nested in family, $b_k$ the random block effect, $c_l(b_k)$ and $d_m(b_k)$ the horizontal and vertical positions in the field nested in block, respectively, $c_l(b_k) \times d_m(b_k)$ their interaction, $t_n$ the year, $e_{ijklmn}$ the experimental error (between plots), and $g_{ijklmno}$ the within-plot sampling error. Each model term was tested for significance on a trait-by-trait basis with t-tests for fixed effects and likelihood ratio tests with one degree of freedom for random effects. Terms whose likelihood ratio or t-tests had p-values greater than 0.05 were deemed not significant and were removed from the model. While the initial scope of the model included random block and year effects, none of the random effects were found to be significant. Following definition of appropriate models for the studied traits (Table 1.2), least squares means for each trait were calculated and used for QTL mapping.

Table 1.1: NIRIL phenotyped traits, descriptions, approximate distribution, between-year Pearson correlation coefficients (r), and Pearson p-values.

Trait | Description | Distribution | Pearson r | p-value
CULM | Diameter of culm | normal | 0.688 | <0.0001
DTP | Days to pollen shed | normal | 0.668 | <0.0001
EARD | Ear diameter | bimodal | 0.907 | <0.0001
EARL | Ear length | normal | 0.409 | <0.0001
KPR | Kernels per rank | bimodal | 0.698 | <0.0001
KRN | Kernel row number | bimodal | 0.718 | <0.0001
LBLH | Primary lateral branch length | normal | 0.519 | <0.0001
PLHT | Plant height | normal | 0.652 | <0.0001
PROL | Prolificacy, ears on lateral branch | exponential | 0.422 | <0.0001
SPLH | Spikelet length | normal | N/A | N/A
STAM | Percent staminate spikelets | exponential | 0.321 | <0.0001
TBN | Tassel branch number | normal | 0.691 | <0.0001
TILL | Tillering index | exponential | 0.346 | <0.0001

Table 1.2: Final models selected for the thirteen NIRIL phenotypes.

Trait | Model
CULM | line(family) + family + x(plot) + y(plot)
DTP | line(family) + family + x(plot) + y(plot) + x*y(plot)
EARD | line(family) + family + x(plot) + y(plot) + x*y(plot)
EARL | line(family) + family + x(plot) + y(plot) + x*y(plot)
KPR | line(family) + family + x(plot) + y(plot) + x*y(plot)
KRN | line(family) + family + x(plot)
LBLH | line(family) + family + x(plot) + y(plot) + x*y(plot)
PLHT | line(family) + family + x(plot) + y(plot)
PROL | line(family) + family + x(plot)
SPLH | line(family) + family + x
STAM | line(family) + family + x(plot) + y(plot) + x*y(plot)
TBN | line(family) + family + x(plot) + y(plot) + x*y(plot)
TILL | line(family) + family + y(plot)

Broad-sense heritabilities on a plot means basis ($H^2$) were calculated for each of the traits. The variance components needed for this calculation were found using a linear mixed model with plot means as the dependent variable and plot and line as random independent variables. Variance components for the line or genotypic component ($\sigma_g^2$), the plot ($\sigma_p^2$), and the residual variance due to environment ($\sigma_e^2$) were extracted, and Equation 1.2 was used to calculate $H^2$. The plot variance ($\sigma_p^2$) was estimated in the model as a known source of variation in phenotype. Because it is accounted for explicitly, it does not contribute to the unexplained environmental variation captured by the residual variance ($\sigma_e^2$), and it was not used to calculate heritability.

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} \qquad (1.2)$$

1.3.3 QTL Mapping

We mapped QTL using a model-based approach in R/qtl [45, 46] with phenotype, represented by least squares means, and 25 genetic markers for the NIRILs. The introgression on the fifth chromosome started as a heterozygous segment in the BC6 generation and segregates as an S6 population. Consequently, we analyzed the population as a BC0S6 in R/qtl. Genotypes were first used to produce a genetic map for the teosinte segment introgression using the Kosambi mapping function [47], with a 0.0001 genotyping error rate as implemented in R/qtl. Genetic marker order was initially found by BLAST to the AGPv2 genome and confirmed using the ripple function in R/qtl with a five marker window. Significant LOD score thresholds were determined for each trait with a 5% cutoff based on 10,000 permutations of the data. QTL models for each phenotype were determined by scanning for potential QTL using the Haley-Knott regression method and testing for QTL significance one by one.
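Equation 1.2 reduces to a one-line function once the variance components have been extracted from the mixed model. In the minimal sketch below, the function name and the example variance components are hypothetical; the plot component is left out, as described above.

```python
def broad_sense_h2(var_g: float, var_e: float) -> float:
    """Plot-mean broad-sense heritability, H^2 = var_g / (var_g + var_e).

    var_g: genotypic (line) variance component
    var_e: residual (environmental) variance component
    The plot variance component is deliberately excluded, as in Equation 1.2.
    """
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    total = var_g + var_e
    if total == 0:
        raise ValueError("total variance is zero")
    return var_g / total


# Hypothetical variance components, for illustration only.
print(broad_sense_h2(9.0, 1.0))  # 0.9
```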
QTL models were defined by first scanning for QTL with the R/qtl function scanone to find an initial QTL position with a LOD score greater than the 5% cutoff calculated by permutations. Next, we scanned for additional QTL using the addqtl function. If this secondary scan detected a QTL exceeding the 5% LOD score cutoff, it was added to the model and QTL positions were refined using the R/qtl function refineqtl. QTL were added to the model using this cycle of (1) scanning for additional QTL, (2) adding significant QTL to the model, and (3) refining QTL positions, until no more significant QTL could be added. Once all significant QTL were added, pairwise interactions between QTL were tested using the addint function of R/qtl. Significant pairwise interactions (F-test, p < 0.05) were added to the model one by one until no more significant interactions were detected. After the model was finalized, each QTL in the final QTL model was tested for significance with a drop-one ANOVA analysis.

1.3.4 Simulation Experiment

To explore the theoretical maximum number of QTL detectable in this study, we mapped QTL with simulated datasets in which causative genes were randomly chosen from the genes in the teosinte introgressed region. Simulated traits were made for one to 15 causative genes, then 20 to 50 genes in steps of five, and then 75 and 100 causative genes, for a total of 24 different causative gene set sizes. The 25 genotyped markers in our 259 NIRILs were used to assign genotype probabilities to the 2,576 total genes in the introgressed segment of chromosome five based on the genotypes of flanking markers. These genotype probabilities were assigned based on physical proximity to the two flanking markers, assuming physical distance was proportional to genetic distance, so that a gene closely linked to a given marker had a high probability of sharing that marker's genotype.
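The genotype-probability assignment can be illustrated with a short sketch. The linear weighting by physical distance to each flanking marker is an assumption about how "physical proximity" was translated into probabilities, and the helper names gene_genotype_probs and sample_genotype are hypothetical. Genes lying outside the outermost markers are not handled here.

```python
import random


def gene_genotype_probs(gene_pos, left_pos, left_geno, right_pos, right_geno):
    """Probability of each genotype at a gene lying between two flanking markers.

    Hypothetical linear interpolation: physical distance is taken as
    proportional to genetic distance, so the weight on each flanking marker
    falls off linearly with the gene's distance from it.
    Genotypes are labels such as "maize", "teosinte", or "het".
    """
    if not left_pos <= gene_pos <= right_pos:
        raise ValueError("gene must lie between its flanking markers")
    if left_geno == right_geno:
        # No crossover detected between the markers: the gene matches them.
        return {left_geno: 1.0}
    span = right_pos - left_pos
    w_right = (gene_pos - left_pos) / span if span > 0 else 0.5
    return {left_geno: 1.0 - w_right, right_geno: w_right}


def sample_genotype(probs, rng=random):
    """Draw one genotype label from a {genotype: probability} mapping."""
    genos = list(probs)
    return rng.choices(genos, weights=[probs[g] for g in genos], k=1)[0]


# A gene a quarter of the way from a maize marker to a teosinte marker.
p = gene_genotype_probs(2_500_000, 2_000_000, "maize", 4_000_000, "teosinte")
print(p)  # {'maize': 0.75, 'teosinte': 0.25}
```

When both flanking markers share a genotype, every gene between them receives that genotype with probability 1, which matches the behavior described for consecutive markers with identical genotypes.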
When consecutive markers had identical genotypes, this method assigned all genes between them the flanking genotype. Phenotypic trait values are based on both the underlying genetic contributions of genes and random environmental noise, which together define the heritability of a trait. The genetic values in the simulated data were set as follows. For each simulated dataset, the randomly chosen causative genes were assigned a genotype based on the previously derived genotype probabilities, and two effect types: equal and random gamma distributed (α = 1.36 and β = 1) [48]. The effects for each gene were given a positive, zero, or negative value depending on whether the assigned genotype was homozygous maize, heterozygous, or homozygous teosinte, respectively. Thus, each simulated causative gene had two numeric values (one for equal and one for gamma distributed effects) representing the magnitude and direction of its effect on the trait. The total genetic contribution to NIRIL phenotype was then found by summing the gene values (equal and gamma effects kept separate) over all simulated causative genes. Environmental noise was added to the summed NIRIL genetic values by taking random draws from a normal distribution with variance equal to the additional variance needed to reach the desired level of heritability. Two levels of heritability were simulated, 67% and 90%, to mimic the heritabilities of two actual traits, the moderately heritable culm diameter and the highly heritable ear diameter. Heritability of the simulated traits was required to be within 2.5% of the desired heritability; otherwise, the noise was redrawn from the normal distribution. This process resulted in each set of simulated causative genes having four states for the NIRILs: equal effect 67% H², equal effect 90% H², gamma effect 67% H², and gamma effect 90% H².
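The trait-generation procedure described above can be sketched in NumPy. This is a minimal illustration, not the custom R script used in the study: it codes genotypes as +1, 0, and −1, sums per-gene effects, sets the noise variance from the target heritability as σe² = σg²(1 − h²)/h², and redraws the noise until the realized heritability falls within the tolerance. Treating "within 2.5%" as an absolute tolerance of 0.025 on the heritability scale, and computing realized heritability as var(g)/var(g + e), are assumptions; the genotypes and effects in the usage lines are made up.

```python
import numpy as np


def simulate_trait(genotypes, effects, h2, tol=0.025, rng=None, max_tries=1000):
    """Add Gaussian noise to summed genetic values to reach a target heritability.

    genotypes: (n_lines, n_genes) array coded +1 (homozygous maize),
               0 (heterozygous), -1 (homozygous teosinte)
    effects:   (n_genes,) non-negative effect magnitudes (equal or gamma)
    h2:        target heritability in (0, 1], e.g. 0.67 or 0.90
    tol:       accepted |realized h2 - h2| (assumed absolute tolerance)
    Returns (phenotypes, realized_h2).
    """
    rng = np.random.default_rng() if rng is None else rng
    g = genotypes @ effects              # summed genetic value for each line
    var_g = g.var()
    if var_g == 0:
        raise ValueError("genetic values do not vary across lines")
    var_e = var_g * (1.0 - h2) / h2      # noise variance implied by the target h2
    for _ in range(max_tries):
        e = rng.normal(0.0, np.sqrt(var_e), size=g.shape)
        y = g + e
        realized = var_g / y.var()       # realized heritability of this draw
        if abs(realized - h2) <= tol:
            return y, realized
    raise RuntimeError("no noise draw met the heritability tolerance")


# Usage with made-up genotypes for 259 lines and 5 causative genes.
rng = np.random.default_rng(1)
genos = rng.choice([-1, 0, 1], size=(259, 5), p=[0.48, 0.04, 0.48])
equal_effects = np.ones(5)
gamma_effects = rng.gamma(shape=1.36, scale=1.0, size=5)
y_equal, h_equal = simulate_trait(genos, equal_effects, 0.67, rng=rng)
y_gamma, h_gamma = simulate_trait(genos, gamma_effects, 0.90, rng=rng)
```

With β = 1, the rate and scale parameterizations of the gamma distribution coincide, so passing scale=1.0 to NumPy gives the same distribution either way.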
We simulated twenty-four causative gene set sizes with two effect types and two heritabilities, for a total of 96 distinct simulated states. Each of these states was replicated 1,000 times, resulting in 96,000 simulated sets of phenotypes for the 259 NIRILs. These phenotype values were then used with actual NIRIL genotypes to map QTL in R/qtl using the same method as described in the previous section. Pairwise QTL interactions were not tested for or added in the simulated datasets because interactions were not part of the simulated conditions. Mapping QTL for thousands of simulated traits could not be accomplished manually and consequently was done with a custom R script that automated the addition of QTL and saved summary information, including QTL estimated effect size, position, LOD score, and number of QTL.

1.4 Results

1.4.1 QTL mapping

Previous work has shown chromosome five to be home to several high LOD score and large effect size QTL for a number of inflorescence and plant architecture domestication traits [8, 25]. We undertook a high resolution mapping experiment with a population of NIRILs carrying variable fifth chromosome teosinte introgressions in a W22 maize background. In the summers of 2009 and 2010, the 259 NIRILs were grown in randomized plots arranged in three replicated blocks. Phenotype data for thirteen traits were collected for five plants per plot. Spikelet length was collected for only a single block in the summer of 2010. We analyzed trait measurements from all three growing environments together in a single linear mixed model with block and year as random effects and position, NIRIL, and family as fixed explanatory variables. Least squares means were estimated from the mixed models and later used for QTL mapping. Histograms of the least squares means show several distribution types, including normal, bimodal, and exponential (Supplemental Figure A.1).
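The factorial design of the simulation can be enumerated directly, which confirms the state and dataset counts given above. The variable names are illustrative only.

```python
from itertools import product

# Causative gene set sizes: 1-15, then 20-50 in steps of five, then 75 and 100.
gene_set_sizes = list(range(1, 16)) + list(range(20, 51, 5)) + [75, 100]
effect_types = ["equal", "gamma"]
heritabilities = [0.67, 0.90]
replicates = 1_000

states = list(product(gene_set_sizes, effect_types, heritabilities))
print(len(gene_set_sizes), len(states), len(states) * replicates)
# 24 96 96000
```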
NIRILs genotyped as 100% maize (29 lines) and 100% teosinte (27 lines) were used to determine whether traits behaved as expected, with the full teosinte introgression lines having more teosinte-like phenotypes. Several traits believed not to be primary targets of selection during domestication, such as days to pollen shed and plant height, appear to have little or no overall difference between NIRILs carrying the maize and teosinte introgressions, while traits that were the primary focus of selection during domestication, including kernel row number (KRN) and ear diameter (EARD), have a substantial phenotypic difference between homozygous maize and teosinte NIRILs. For all domestication traits, we observed a difference (sometimes quite small) between the least squares means for maize and teosinte NIRILs consistent with the expected effect of domestication. Particularly large differences are seen for EARD and KRN, where the maize genotype is 17.3% and 14.8% larger than the teosinte genotype, respectively. Also of interest is CULM, where the maize genotype was 6.5% larger than teosinte.

There was a balanced representation of maize and teosinte genotypes with a high degree of homozygosity in the QTL mapping population. Overall, genotypes of the NIRILs were 48.3% maize, 48.2% teosinte, and 3.5% heterozygous. The NIRIL population included lines with teosinte introgressions across 162.24 megabases (Mbp), from position 6,985,619 to 169,231,037 on the maize reference genome (AGPv2). This introgression included 74.47% of the approximately 218 Mbp fifth chromosome. Of the 4,503 fifth chromosome genes in the Filtered Gene Set (version 5b), 411 genes on the tip of the short arm and 1,516 genes on the long arm were not included in the teosinte introgressions used in this study. The genetic map generated with the Kosambi mapping function in R/qtl was calculated to be 86.64 centimorgans (cM) long, giving an average ratio of 1.873 Mbp/cM. We analyzed 13 traits and identified 24 QTL (Figure 1.1, Table 1.3) with LOD scores ranging from 2.70 (KPR) to 47.22 (KRN). A single epistatic interaction was detected, between the two kernel row number QTL, suggesting epistasis among these QTL is minimal. QTL 1.5 LOD support intervals ranged from 2.3 cM (KRN) to 50.6 cM (KPR), with an average of approximately 12.5 cM. Heritability on a plot mean basis (Table 1.3) varied among traits, with an average H² of 63% and a range of 23% (PROL) to 90% (EARD). Five QTL clusters, defined as contiguous regions with five or more overlapping QTL 1.5 LOD support intervals, were found in the mapping region on chromosome five near 2, 51, 61, 70, and 84 cM (Figure 1.1). There is no clear single concentration of QTL, suggesting that this genomic region lacks a single gene of large, pleiotropic effect and that multiple linked factors at loci spread across the fifth chromosome are responsible for the previously identified influence of chromosome five on domestication traits.

Figure 1.1: Cumulative plot of QTL detected in the mapping experiment. Molecular marker positions are shown in centimorgans at the bottom. QTL names, consisting of an abbreviated trait name, chromosome number, and QTL number, are listed on the left side. The 1.5 LOD support intervals for QTL are indicated by horizontal bars and peak LOD scores by vertical lines. Hatched bars indicate interacting QTL, while solid bars are non-interacting. In total, 24 QTL were identified across the fifth chromosome with a variety of confidence interval sizes, maximum LOD scores, and effect sizes (see Table 1.3 for QTL statistics). Five QTL clusters, contiguous regions with five or more overlapping QTL 1.5 LOD support intervals, are indicated by grey shading. A grey-scale heat map depicting the number of overlapping QTL 1.5 LOD support intervals, from white (0) to black (8), is located at the top.
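Several of the figures above can be cross-checked against one another with a few lines of arithmetic. The sketch below uses only numbers transcribed from this section. It recovers a span of about 162.2 Mbp (the 162.24 Mbp quoted above, to within rounding), the 2,576 introgressed genes used in the simulation experiment, and the 1.873 Mbp/cM ratio.

```python
# Coordinates and counts transcribed from the text (AGPv2, Filtered Gene Set 5b).
start_bp, end_bp = 6_985_619, 169_231_037
map_length_cm = 86.64
chr5_genes, excluded_short_arm, excluded_long_arm = 4_503, 411, 1_516

span_mbp = (end_bp - start_bp) / 1e6
genes_in_introgression = chr5_genes - excluded_short_arm - excluded_long_arm
mbp_per_cm = span_mbp / map_length_cm

print(f"{span_mbp:.1f} Mbp, {genes_in_introgression} genes, {mbp_per_cm:.3f} Mbp/cM")
# 162.2 Mbp, 2576 genes, 1.873 Mbp/cM
```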
1.4.2 Simulation Experiment

We performed a simulation experiment to determine the power and precision of our mapping population. Using causative genes projected onto actual NIRIL genotypes, a total of 96 distinct simulated states in terms of number of genes (between one and 100), heritability (67% and 90%), and effect type (equal and gamma) were replicated 1,000 times, for a grand total of 96,000 simulated NIRIL trait datasets. Histograms of simulated traits with 90% heritability were clearly bimodal when one causative gene was simulated and progressively moved towards a normal distribution as more causative genes were simulated. In comparison, simulated traits with 67% heritability lacked a clear bimodal distribution even when only a single causative gene was simulated and were approximately normal when 100 genes were simulated (Figure A.2).

Table 1.3: Detected QTL for the T5S mapping population with position, heritability, and LOD score statistics.

QTL | LOD | 1.5 LOD SI (cM) | Peak location (cM) | Percent variation | H²
culm5.1 | 13.50 | 58.9–69.3 | 65.3 | 21.3% | 66.5%
dtp5.1 | 16.36 | 0.0–11.7 | 2.3 | 20.1% |
dtp5.2 | 18.76 | 75.7–80.0 | 77.4 | 23.6% |
dtp model | 28.93 | | | 40.1% | 67.3%
eard5.1 | 3.00 | 0.0–24.2 | 12.9 | 1.7% |
eard5.2 | 17.99 | 50.1–54.4 | 51.9 | 11.7% |
eard5.3 | 33.76 | 82.9–85.9 | 84.4 | 25.6% |
eard model | 65.62 | | | 69.0% | 90.0%
earl5.1 | 12.38 | 0.0–5.4 | 1.9 | 19.7% | 49.1%
kpr5.1 | 2.70 | 0.0–50.6 | 2.2 | 3.0% |
kpr5.2 | 6.80 | 44.9–64.8 | 63.2 | 7.9% |
kpr5.3 | 4.11 | 76.0–86.2 | 80.9 | 4.6% |
kpr model | 27.41 | | | 38.5% | 72.7%
krn5.1 | 6.22 | 18.8–24.7 | 21.5 | 4.8% |
krn5.2 | 47.22 | 82.6–84.9 | 83.8 | 53.4% |
krn5.1:2 | 3.32 | | | 2.5% |
krn model | 50.56 | | | 59.2% | 73.7%
lblh5.1 | 24.61 | 75.0–81.1 | 79.0 | 35.3% | 53.5%
plht5.1 | 7.64 | 0.0–2.4 | 0.0 | 11.3% |
plht5.2 | 2.89 | 24.3–39.2 | 31.7 | 4.1% |
plht model | 14.06 | | | 22.0% | 63.1%
prol5.1 | 8.38 | 56.9–71.6 | 64.2 | 13.8% | 22.9%
splh5.1 | 9.14 | 0.0–18.7 | 13.0 | 10.2% |
splh5.2 | 7.16 | 65.7–68.4 | 67.7 | 7.9% |
splh5.3 | 2.78 | 74.3–86.6 | 78.0 | 2.9% |
splh model | 30.60 | | | 41.8% | 88.3%
stam5.1 | 6.50 | 50.7–86.6 | 83.8 | 10.9% | 25.9%
tbn5.1 | 8.28 | 0.0–4.0 | 0.3 | 13.1% |
tbn5.2 | 4.60 | 43.6–53.2 | 47.3 | 7.1% |
tbn model | 10.46 | | | 16.9% | 69.9%
till5.1 | 7.21 | 44.1–62.9 | 58.7 | 9.8% |
till5.2 | 3.22 | 77.2–85.9 | 81.8 | 4.2% |
till model | 18.61 | | | 28.1% | 34.3%

Since calculating significant LOD score thresholds via permutations for all 96,000 simulated phenotype sets would have taken weeks of computation time, we calculated LOD score cutoffs for the first 50 replicates of each of the 96 states. The average threshold was lower for 90% heritability than for 67% heritability, with no clear difference in threshold caused by the effect type of causative genes. Simulated phenotypes with few causative genes had a lower threshold on average, with this effect more pronounced for the gamma distributed effect type. The range of LOD score thresholds was quite narrow (2.37 to 2.59 for gamma distributed and 2.38 to 2.60 for equal effects). Consequently, instead of running permutations for the remaining datasets, we set a conservative LOD score threshold for mapping all simulated traits: the maximum of the 5% cutoffs found in the first 50 replicates of each of the 96 states. After simulated phenotypes were generated and significance thresholds were set, QTL were mapped using the 96,000 simulated datasets with actual genotypes for the NIRILs in this study. Increasing the number of simulated causative genes from one to 100 caused the mean number of detected QTL to rise from one to ∼4.5 or ∼3.0 for simulated traits with 90% or 67% heritability, respectively (Figure 1.2). Thus, heritability was an important factor in determining the number of detectable QTL in our experiment. The simulated gamma effects, as opposed to equal effects, appeared to cause the maximum number of detectable QTL to be reached at a larger number of simulated causative genes, but there was no difference in the overall maximum number of QTL detected.
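The permutation approach to LOD thresholds (shuffle phenotypes, rescan, record the maximum LOD, repeat) can be sketched as follows. This is a simplified stand-in, not the R/qtl implementation: it uses single-marker linear regression on homozygous genotypes coded 0/1 rather than Haley-Knott regression on an interval grid, and the data in the usage lines are simulated with unlinked markers purely for illustration. Taking max() over per-state cutoffs computed this way corresponds to the conservative rule described above.

```python
import numpy as np


def marker_lods(y, genos):
    """Single-marker regression LOD score at each marker.

    y:     (n,) phenotype values, e.g. one least squares mean per line
    genos: (n, m) numeric marker genotypes, e.g. 0 = maize, 1 = teosinte
    LOD = (n / 2) * log10(RSS_null / RSS_marker)
    """
    n, m = genos.shape
    rss0 = np.sum((y - y.mean()) ** 2)
    lods = np.empty(m)
    for j in range(m):
        X = np.column_stack([np.ones(n), genos[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss1 = np.sum((y - X @ beta) ** 2)
        lods[j] = (n / 2) * np.log10(rss0 / rss1)
    return lods


def permutation_threshold(y, genos, n_perm=1000, alpha=0.05, rng=None):
    """(1 - alpha) quantile of the maximum LOD across markers when the
    phenotypes are randomly shuffled, breaking any genotype-phenotype link."""
    rng = np.random.default_rng() if rng is None else rng
    max_lods = [marker_lods(rng.permutation(y), genos).max() for _ in range(n_perm)]
    return float(np.quantile(max_lods, 1 - alpha))


# Usage with made-up data: 259 lines, 25 unlinked markers, a real effect at marker 10.
rng = np.random.default_rng(0)
genos = rng.integers(0, 2, size=(259, 25))
y = 1.0 * genos[:, 10] + rng.normal(size=259)
thr = permutation_threshold(y, genos, n_perm=200, rng=rng)
lods = marker_lods(y, genos)
```

Because real markers on one chromosome are linked, their LOD scores are correlated, which tends to lower the permutation threshold relative to this unlinked toy example.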
Our results show that QTL 1.5 LOD support intervals quickly become associated with multiple genes when many causative genes are simulated (Figure 1.3). In the case of five causative genes with equal effect and 67% heritability, the chance of a QTL containing a single causative gene has already dropped to approximately 50% (similar patterns are seen for gamma simulated phenotypes in Supplemental Figure A.3). This suggests that when making decisions about fine mapping of QTL, researchers would be well advised to consider factors such as trait heritability and the power of their mapping population to identify QTL support intervals that contain single causative genes.

Figure 1.2: The number of detected QTL and mean detected QTL effect size versus number of simulated causative loci. Black lines indicate 95% confidence intervals. (A) Simulations consistently detect one QTL when a single causative gene is simulated, but with as few as three or four causative genes, we lose the ability to distinguish between genes. With high numbers of simulated causative genes, the total number of QTL detected reaches a ceiling of ∼4.5 for simulated traits with 90% heritability and ∼3.0 for traits with 67% heritability. (B) The effects of unresolved genes are merged into the few large effect QTL that are detected, consistent with the Beavis Effect. This is seen in the positive correlation between mean estimated effect and number of causative genes.

In our simulation experiment, increasing the number of causative genes also led to an increase in the average estimated effect size of detected QTL (Figure 1.2). We interpreted this as the effects of multiple underlying causative genes being combined into a single detected QTL with a cumulative effect, consistent with the Beavis Effect, where multiple small effect loci are detected as single QTL of larger effect [49].
On average, the total additive effect for each simulated phenotype should be the product of the total number of simulated causative genes and the average effect size. We observed the expected relationship among the number of detected QTL, the average estimated additive effect of each detected QTL, and the expected total additive effect for both equal and gamma distributed effect sizes and both heritabilities. Our mapping results using empirical, measured traits found three QTL for a trait with heritability of 90% (ear diameter) and a single QTL for a trait with 67% heritability (culm diameter). Comparison of these results with the simulations shows that for traits with 90% heritability, when three or more QTL are detected there are likely to be anywhere from four to six underlying causative genes, making a 1:1 relationship between number of QTL and causative genes uncertain (Figure 1.2). In contrast, simulated traits with heritability of 67% and a single causative gene averaged a single detected QTL, which contained the causative gene 90% to 95% of the time. These observations have implications for future fine mapping efforts to identify the causative gene underlying QTL.

Figure 1.3: The proportion of detected QTL with zero, one, or more than one simulated causative gene in the 1.5 LOD support interval. High numbers of causative genes lead to detected QTL that contain multiple causative genes. A reasonable percentage of detected QTL in the simulations contain a single causative gene when few (fewer than four) causative genes are simulated, but as the number of simulated causative genes increases, we quickly lose the power to distinguish between closely linked causative genes and they become lumped into single detected QTL. The equal effect simulations shown here are very similar to those seen for the gamma distributed effects (Supplemental Figure A.3).
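The expected relationship in the paragraph above can be made concrete with a back-of-the-envelope sketch. It assumes each equal-effect gene contributes an effect of 1 (the text does not state the equal-effect magnitude) and takes the gamma mean as α × β = 1.36. Dividing the expected total by a detected QTL count, such as the ∼4.5-QTL ceiling at 90% heritability, shows roughly how much causal effect each detected QTL would have to absorb. The function names are hypothetical.

```python
GAMMA_SHAPE, GAMMA_SCALE = 1.36, 1.0  # alpha and beta from the Methods


def expected_total_additive_effect(n_genes: int, effect_type: str) -> float:
    """Expected summed additive effect of n causative genes.

    Equal effects are assumed to be 1 per gene (illustrative assumption);
    gamma effects have mean shape * scale.
    """
    if effect_type == "equal":
        mean_effect = 1.0
    elif effect_type == "gamma":
        mean_effect = GAMMA_SHAPE * GAMMA_SCALE
    else:
        raise ValueError("effect_type must be 'equal' or 'gamma'")
    return n_genes * mean_effect


def implied_effect_per_qtl(n_genes: int, effect_type: str, n_detected_qtl: float) -> float:
    """Average effect each detected QTL must carry if the detected QTL
    jointly account for the expected total additive effect."""
    return expected_total_additive_effect(n_genes, effect_type) / n_detected_qtl


print(round(implied_effect_per_qtl(100, "equal", 4.5), 1))  # 22.2
print(round(implied_effect_per_qtl(100, "gamma", 4.5), 1))  # 30.2
```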
1.5 Discussion
Previous studies in maize have found single genes underlying genomic regions of large effect on multiple domestication traits [3–5, 41, 50]. This is in stark contrast to our work on chromosome five, where the previously observed large effect of chromosome five on several domestication traits in maize [8, 25] is caused by multiple regions spread across the chromosome. This suggests that the genetic factors controlling domestication traits on chromosome five differ in nature from other large domestication loci in maize. Whether the situation on chromosome five is unique among maize loci or among crop plants remains to be seen, but the several loci identified in this study suggest that, in addition to acting on highly pleiotropic, large effect single genes, the domestication process can also act on several linked genes of variable effect to produce a chromosomal region of large QTL effect. Although our results show that several regions on chromosome five contain QTL affecting different traits, this chromosomal region was initially defined as several tightly clustered QTL in F2 crosses between teosinte and a small-eared primitive Mexican landrace [43]. In contrast, our NIRIL population was developed from a cross of teosinte by a modern agronomic maize inbred (W22) and is expected to harbor domestication QTL as well as improvement QTL selected during the 9,000 years since maize was domesticated. Thus, while results from this analysis suggest chromosome five houses a complex of multiple linked factors, we cannot discount the possibility that a simpler genetic architecture would have been observed had we used a primitive maize landrace rather than the W22 inbred line. One potential use of QTL mapping results is to interrogate the genes within QTL 1.5 LOD support intervals for likely candidates.
The marker density in our experiment leads to most QTL 1.5 LOD support intervals containing hundreds of annotated genes. However, two QTL had narrow support intervals that contained a relatively small number of genes. These two QTL were krn5.2 and eard5.3, which co-localize to the same ∼2.3 cM region. When expanded to the nearest genetic markers, these QTL fell between umc1348 and umc1966, which spanned a 4.81 cM region that included 2.654 Mbp with 54 genes from the maize filtered gene set (AGPv2). One interesting candidate in this range is AC212823.4_FG003, which encodes a MADS-box transcription factor previously cataloged as MADS-transcription factor 65 (mads65) in the GRASSIUS transcription factor database [51]. Initially identified in plants as important regulators of floral organ identity [52, 53], the MADS-box family of transcription factors has since been shown to be involved in a wide variety of developmental programs across plant organs and developmental stages [54]. This particular MADS-box gene is homologous to the rice gene OsMADS57, a type II MIKCC MADS gene. The large MIKCC subclass of MADS genes is quite diverse, with members involved in floral specification, phase transition, and root development, among other developmental functions [54]. A recent study also found this gene to have been selected during crop improvement [55], and it is expressed in many tissues according to the maize gene expression atlas [56]. All of these factors make AC212823.4_FG003 an attractive candidate for future studies to fine map the causative gene for kernel row number on chromosome five. The limits of a QTL experiment in terms of power and resolution are important factors to consider when undertaking an experiment in any mapping population. To better interpret our QTL results for empirically measured traits, we explored the computational limits of the experimental mapping population using simulated trait datasets.
In this experiment, we never detected more than six QTL for any of the simulated conditions. The most important characteristic of simulated traits in determining the number of detected QTL was heritability, not effect type. As expected, when the number of underlying causative genes increased to a high level, we saw the effects of multiple causative genes being rolled into single detected QTL. This result is consistent with the Beavis Effect [49], a phenomenon that describes the tendency for QTL of small effect to be combined into a single QTL with large estimated effect. If these polygenic QTL, which can have quite high LOD scores and effect sizes, were chosen for fine mapping, we would be unlikely to find a single underlying causative polymorphism. Consequently, when considering QTL for fine mapping, researchers should favor QTL for traits with high heritability, detected in mapping populations with sufficient power to resolve QTL to single genes. It is important to realize that the simulation results reflect the specific markers, genotypes, and mapping population used in this study. While some results are likely to apply generally to other QTL experiments, simulations using parameters specific to a given mapping population will provide the best insight into potential genetic architectures and into that population's power and precision. QTL mapping has been used to great effect to characterize the genomic regions controlling traits selected during domestication in maize. These studies have shown that while genetic factors controlling domestication traits are spread throughout the genome, there are concentrated genomic regions where QTL for several domestication traits are in close proximity to each other [8, 25]. In this study, we use a QTL mapping population of NIRILs with teosinte introgressions specific to chromosome five to closely examine previously mapped QTL for a number of domestication traits.
We confirmed that QTL for these traits exist on chromosome five; however, in our population these QTL further fractionate into multiple QTL. This is in contrast to other genomic regions of large effect in maize, where single pleiotropic genes were identified as the causative factor [3–5, 50]. The presence of multiple QTL at several locations on chromosome five suggests the existence of a complicated, linked, multi-gene locus controlling various aspects of domestication traits. This apparent complexity of the chromosome five locus is consistent with results from our simulation experiment, where we show that traits with multiple mapped QTL likely have a more complicated underlying genetic architecture than is indicated by the initial QTL mapping results.
Chapter 2
Fine mapping of chromosome five domestication genes in maize
2.1 Abstract
The fifth chromosome of Zea mays has previously been shown to contain a large effect QTL for several domestication traits. In this work I describe efforts to identify the causative polymorphisms responsible for several of these QTL for the domestication traits of culm diameter and kernel row number. These two QTL represent the highest and eighth highest LOD scores detected in the QTL mapping experiment of chapter 1. We used several heterogeneous inbred families drawn from a BC2 S3 mapping population that were heterozygous in the 1.5 LOD support interval of these QTL to generate two sets of recombinant chromosome nearly isogenic lines, one for the culm diameter QTL and one for the kernel row number QTL. Lines were grown in replicated, randomized blocks in four years and phenotypes were measured. A linear mixed model was used to obtain least squared means for each line, and we looked for segregation of the phenotype based on indel and genotyping by sequencing markers.
Simple Mendelian segregation of the lines was not observed for any of the traits of interest, suggesting that a single locus does not explain the differences in phenotype. Consequently, we used QTL mapping software to map QTL in the segregating regions of interest on chromosome five for culm diameter and kernel row number. These analyses showed a highly significant heterogeneous inbred family effect as well as multiple QTL in the target region for kernel row number, suggesting that the genetic architecture underlying kernel row number and culm diameter is complex, involving multiple loci on several chromosomes.
2.2 Introduction
The ultimate goal of many studies investigating the evolution of novel morphology in divergent lineages is identification of the causative genes responsible for phenotypic change. Towards this end, genes causing new forms have been identified a number of times in many species, including maize, tomato, wheat, barley, and, most successfully, rice. Over the years, more than 20 genes have been identified in rice with important effects on agronomic and domestication phenotypes, such as loss of shattering in domesticated plants [15], increased grain yield in terms of grain number [57], grain weight [58, 59], and plant architecture [60, 61]. In contrast, there are considerably fewer fine mapping success stories in other organisms. In maize, recent experiments have mapped several high LOD score, large effect domestication QTL to single genes, including teosinte branched1 (tb1) [3, 41], grassy tillers1 (gt1) [4, 39], teosinte glume architecture1 (tga1) [5], and ZmCCT [50, 62]. One common characteristic of these genes is that they were initially characterized as massive, high LOD, large effect QTL. In maize, domestication phenotypes have been shown to be largely controlled by six regions of the genome [8].
The large concentration of domestication QTL on the fifth chromosome has been repeatedly observed [25, 37, 43]; however, little is known about the causative genes and underlying polymorphisms responsible for this large effect. Experiments designed to examine chromosome five in maize face several challenges caused by characteristics of the chromosome. First, this chromosome carries gametophyte factor2 (ga2) [63], a pollen incompatibility factor that greatly influences pollination rates of specific genotype combinations. Second, there is an extended region of low recombination around the centromere (102.3 Mb to 109.2 Mb) that complicates collection of recombinant chromosomes for mapping experiments. In spite of these challenges, characterizing the many domestication QTL for plant architecture and inflorescence traits on the fifth chromosome of maize is a necessary step towards fully understanding the effect domestication had on the maize genome. While many traits have QTL that map to the fifth chromosome, QTL with exceptionally high LOD score and effect size are of particular interest for fine mapping studies. A high LOD score, large effect QTL for kernel row number (krn) and ear diameter (eard), previously reported on chromosome five of maize [25, 37, 43], was shown in Chapter 1 to fractionate into at least two or three QTL. The largest QTL for both of these traits in terms of LOD score and effect size (eard5.3 and krn5.2) were both located towards the right of the mapping interval, between umc1966 and umc1348. The krn5.2 QTL had a LOD score of 45.2, explained 51.98% of phenotypic variation, and was estimated to have an additive effect of −0.73 kernel rows. The co-localizing eard5.3 QTL also had the highest LOD score for its trait (32.7), explained 25.1% of variation, and had an effect of −1.41 mm. This region was ∼1.3 cM or 2.65 Mb and was the narrowest confidence interval found for the mapping population used in chapter 1.
The kernel row number and ear diameter traits are highly related, both affecting ear size in the transverse plane. This relationship, together with the co-localization of eard5.3 and krn5.2, suggests that a single gene influences both traits. In addition to the high LOD score QTL for krn and eard, the fifth chromosome of maize was shown (Chapter 1) to have QTL for plant architecture traits including tillering, lateral branch length, and culm diameter. The QTL for culm diameter in chapter 1 had the eighth highest LOD score detected. In contrast with the krn5.2 and eard5.3 QTL, mapping for culm diameter revealed a single QTL of moderate effect, culm5.1. This QTL had a considerably larger 1.5 LOD support interval (97.3 Mb), lower LOD score (19.8), lower variation explained (21.27%), and smaller additive effect size (−0.67 mm). These characteristics (number of QTL, LOD score, and effect size) make culm5.1 a different type of fine mapping candidate than krn5.2 and eard5.3.
An experiment was designed to further investigate and identify the causative polymorphisms behind the large effect, high LOD score krn5.2/eard5.3 QTL and the moderate effect culm5.1 QTL. This project used a collection of recombinant chromosome nearly isogenic lines (RCNILs) grown in replicated randomized blocks over multiple years. These RCNILs were derived from heterogeneous inbred families (HIFs) drawn from a BC2 S3 population with a massive ear diameter QTL [25] with a maximum LOD score of 144.4. Lines were generated, genotyped, and grown in replicated blocks in the summers of 2010, 2012, and 2013. RCNILs did not segregate cleanly in the target QTL 1.5 LOD support intervals for the kernel row number and culm diameter phenotypes. I next used genome-wide genotyping and QTL mapping methods to account for secondary segregating regions in the genome.
The results of this analysis suggest not only that secondary sites are segregating with significant effects on kernel row number and culm diameter, but also that multiple factors are again segregating within the initial target QTL support interval. Overall, these results suggest the genetic architecture controlling domestication traits is quite complex, with multiple loci across the genome contributing to kernel row number and culm diameter phenotypes. Chromosome five in particular appears to house a collection of genes affecting several domestication traits and represents at least three linked loci that may have been selected as a unit during maize domestication.
2.3 Materials and Methods
2.3.1 Plant material
We chose to identify the causative genes underlying the large LOD score and effect size QTL for kernel row number (krn) and culm diameter (culm) on chromosome five using recombinant chromosome nearly isogenic lines (RCNILs). These lines consist of individuals carrying two copies of a recombinant chromosome with a recombination breakpoint in the region of interest, which corresponds to the 1.5 LOD support intervals for culm5.1 and krn5.2. Based on QTL mapping results from chapter 1, the two QTL are adjacent to each other, with the culm diameter QTL spanning 54,416,924 to 151,717,831 bp and the kernel row number QTL spanning 166,576,639 to 169,231,037 bp. Base pair coordinates for these QTL are based on BLAST searches of flanking marker primer sequences against the second version of the maize reference genome (AGPv2) [9]. I chose to generate RCNILs from segregating heterogeneous inbred families (HIFs) taken from a large BC2 S3 mapping population. Four founding HIFs, two per QTL, heterozygous for the genomic region of interest defined by the QTL 1.5 LOD support intervals and surrounding regions, were used to produce RCNILs. Care was taken to use HIFs with limited heterozygosity adjacent to the primary region of interest and elsewhere in the genome.
In the summers of 2009 and 2010, a large number of plants from each HIF were screened with PCR-based insertion-deletion (indel) markers flanking the region of interest to identify plants with recombinant chromosomes. The initial screening used three flanking markers (ZHL0029, ZHL0033, and umc1966) located at 38,994,478 bp, 151,446,717 bp, and 169,230,959 bp, respectively. These markers were chosen to be as close as possible to the boundaries of the QTL. Individuals with recombinant chromosomes were self-pollinated, and the seed was harvested and planted in the following winter growing season. Plants were grown in winter seasons in a greenhouse, where they were genotyped again at the flanking markers to identify plants homozygous for the initially detected recombinant chromosome. These individuals were then self-pollinated to produce RCNIL seed, carrying two copies of the original recombinant chromosome, for use in subsequent summers in randomized phenotyping blocks and seed increases.
2.3.2 Field Trials and Phenotypes
The RCNILs were grown in a total of 16 replicated, randomized blocks in multiple summers between 2010 and 2013. Phenotyping experiments took place at the West Madison Agricultural Research Station (WMARS), with RCNILs for the culm5.1 QTL grown in 2010 and 2012 and the krn5.2 QTL lines grown in 2012 and 2013. When possible, seed for a single RCNIL was taken from up to five seed packets and mixed prior to planting in order to minimize the effect of any single seed lot (mother plant) on phenotype. In each summer, four blocks of RCNILs per QTL were grown in twelve-plant plots. Individuals were planted with equal spacing in 14-foot rows with 30 inches between adjacent rows and two-foot walkways separating the end of one row from the start of the next. Up to five individuals per plot were measured in 2010 and 2012, while in 2013 kernel row number was assessed for all possible plants.
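The screen for recombinant chromosomes described above amounts to asking whether a plant's genotype changes between adjacent flanking markers. A minimal sketch, assuming hypothetical genotype codes ('MM', 'MT', 'TT') and using the three flanking markers and AGPv2 positions given above; it illustrates the logic only and is not the scoring procedure used in the field. Identical calls at two markers do not rule out crossovers (for example, one on each homolog).

```python
from typing import Dict, List

# Flanking indel markers and AGPv2 positions (bp) from the text, ordered along chromosome five.
FLANKING = [("ZHL0029", 38_994_478), ("ZHL0033", 151_446_717), ("umc1966", 169_230_959)]


def recombinant_intervals(calls: Dict[str, str]) -> List[str]:
    """List adjacent flanking-marker intervals in which a plant shows a genotype change.

    calls maps marker name -> hypothetical genotype code ('MM', 'MT', 'TT').
    A change between adjacent markers implies at least one crossover between them.
    Intervals with a missing call at either marker are skipped (an assumption).
    """
    hits = []
    for (m1, _), (m2, _) in zip(FLANKING, FLANKING[1:]):
        g1, g2 = calls.get(m1), calls.get(m2)
        if g1 is None or g2 is None:
            continue
        if g1 != g2:
            hits.append(f"{m1}-{m2}")
    return hits
```

A plant called 'MT', 'MM', 'MM' at the three markers would be flagged as recombinant in the ZHL0029 to ZHL0033 interval only.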
In addition to the twelve-plant plots, select lines for the culm diameter QTL were grown in a phenotyping block of fully randomized single plant plots (SPP) in the summer of 2012. This block consisted of 60 individuals each from seventeen RCNILs and eight control RCNILs (homozygous for the maize or teosinte chromosomal segment) grown in a completely randomized scheme. The seventeen RCNILs were chosen because their recombination breakpoints were close to preliminary estimates of the causative gene location based on initial analysis of data from the summer of 2010. Individual plants were separated by a larger than normal distance (30 inches in the X and 48 inches in the Y dimension) to allow them to grow to their full phenotypic potential with minimal competition and shading from neighboring plants. Traits were measured by hand: culm diameter was taken in the field with calipers at the narrowest point of the stalk, and kernel row number was counted after harvest in the lab (2012) or in the field (2013). In the SPP we also measured culm diameter at the largest point to calculate culm area, along with other basic plant architecture traits (plant height and tiller number), for use in later analyses. In total, 3,182 individuals were assessed for culm diameter (1,021 in 2010 and 2,161 in 2012), and kernel row number was counted for 8,625 individuals (3,168 in 2012 and 5,457 in 2013). Ear diameter, a trait highly related to kernel row number with a co-localizing QTL detected in chapter 1 (see Table 1.3 for details), was also measured in some environments but was not considered in later analyses because of its close relationship to kernel row number.
2.3.3 Genotyping with PCR and next generation sequencing
Genomic DNA was extracted from the initial plant of each RCNIL with a standard CTAB method, and genotypes from this “founder” individual were used to represent the RCNIL genotype in later analyses.
The genotypes of RCNILs were obtained using two strategies: a PCR-based method targeting known polymorphisms and a high throughput next generation sequencing protocol. All RCNILs were genotyped using PCR of known indels and single nucleotide polymorphisms (SNPs), while a subset were genotyped using the high throughput genotyping by sequencing (GBS) protocol. All RCNILs developed for fine mapping of krn5.2 were genotyped by GBS, while only a subset of culm5.1 RCNILs were. However, culm5.1 lines were genotyped with a more extensive collection of PCR markers (18 markers) than krn5.2 RCNILs (5 markers). PCR-based genetic markers (Supplemental Table B.1) were scored by standard agarose gel electrophoresis, fluorescent fragment analysis, or Sanger sequencing of SNPs. These three types of marker were initially developed by identifying scorable polymorphisms that distinguished the maize and teosinte control RCNILs through Sanger sequencing of annotated genes in the maize reference genome (AGPv2). Size polymorphisms greater than approximately 10% of total PCR product length were scored on 4% agarose gels, and smaller size polymorphisms were redesigned with fluorescently labeled primers and genotyped using GeneScan software (v1.70) from Applied Biosystems. If the only scorable polymorphism was a SNP, RCNILs were genotyped by Sanger sequencing and hand calling of SNPs. While great care was taken to choose founding HIFs with minimal heterozygosity, all HIFs had secondary sites segregating elsewhere in the genome. In order to identify these regions and account for their effect on phenotype, we performed GBS [64] on RCNIL genomic DNA for all kernel row number lines and for the subset of culm diameter lines grown in the single plant plot experiment. Using the GBS protocol required additional molecular work.
DNA was treated with 1 µL of RNase I at room temperature for 30 minutes to remove RNA from the CTAB DNA preparation. Next, the samples were digested using the methylation-sensitive restriction enzyme ApeKI, and 96-plex barcoded sequencing adapters were ligated to individual samples. Finally, the 96 barcoded samples were pooled and sequenced (100 bp reads) on an Illumina HiSeq machine [64]. Sequence tags were aligned to the reference maize genome (AGPv2), and SNPs were called and imputed using the GBS pipeline as implemented at Cornell University. This GBS procedure yielded 955,650 SNPs, consisting of raw A, T, C, and G calls for the RCNILs across the ten maize chromosomes. Raw SNP calls were further processed with a custom Perl script to convert RCNIL genotypes into maize and teosinte calls. The genotype calls were made using SNP calls from the pure maize parent inbred line, W22. Only biallelic markers (43,025 total) were kept, and the non-W22 allele in the RCNILs was assumed to be the teosinte allele. After converting the genotypes into maize, teosinte, and heterozygous calls, SNPs separated by less than 100 base pairs were merged into a single marker, leaving 25,736 markers. If SNP genotypes within a merged marker did not agree, they were converted to missing data (“N”). After marker genotypes were called and merged, a final genotype imputation step was carried out using another custom Perl script. To allow this script to correct erroneous and missing data, all genotype calls were subject to imputation. The criteria for changing a call in any given RCNIL involved windows of ten markers both upstream and downstream of a given marker. If all markers on one side were 100% consistent, the genotype was changed if and only if seven of the ten markers on the other side were also of that same genotype.
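The window rule above can be read in more than one way, so the sketch below spells out one reading. It assumes each RCNIL's calls are ordered by map position and coded 'M', 'T', 'H', or 'N' (hypothetical codes), that "seven of the ten" means at least seven, that markers within ten positions of either chromosome end are left unchanged, and that every marker is judged against the uncorrected calls so that corrections do not cascade. The custom Perl script used in this study may differ on any of these points.

```python
from typing import List

WINDOW = 10     # markers examined on each side of the focal marker (from the text)
MIN_AGREE = 7   # assumed reading of "seven of the ten": at least seven


def impute_line(calls: List[str], window: int = WINDOW, min_agree: int = MIN_AGREE) -> List[str]:
    """Window-based correction of one RCNIL's ordered genotype calls.

    A marker is set to genotype g when one flanking window is entirely g
    (and g is not missing) and at least `min_agree` markers in the opposite
    window are also g. Decisions use the original calls, so edits never cascade.
    """
    out = list(calls)
    for i in range(window, len(calls) - window):
        left = calls[i - window:i]
        right = calls[i + 1:i + 1 + window]
        for solid, other in ((left, right), (right, left)):
            g = solid[0]
            if g != "N" and all(c == g for c in solid):
                if sum(c == g for c in other) >= min_agree:
                    out[i] = g
                    break
    return out
```

Under this reading a genuine recombination breakpoint survives, because the window on the far side of the switch never agrees with the solid window, while an isolated discordant or missing call inside a long homozygous run is overwritten.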
The imputation methods for GBS data described above greatly improved genotype continuity; however, some regions of the genome were still questionably called. The most inconsistently called regions included extended heterozygous segments and recombination breakpoints where genotypes switched. Following the processing steps using custom Perl scripts, the data were manually screened to remove and correct inconsistently called markers. Uninformative markers, whose genotypes were identical to those of an adjacent marker on either side, were also removed from the dataset. Regions of the genome where RCNILs had fixed maize or teosinte genotypes that associated with HIF (non-segregating regions fixed for different genotypes in the founding HIFs) were also removed. Finally, independently segregating regions on the same chromosome were given unique chromosome names (5a, 5b, etc.) to avoid inflation of the genetic map between fixed ancestral recombination breakpoints. After imputation and filtering, a total of 522 genome-wide GBS markers spread across 13 segregating regions on six chromosomes were used in the final analysis. The four other maize chromosomes were completely fixed for a single homozygous genotype and were consequently excluded from the analysis.
2.3.4 Statistical analysis and segregation of phenotypes
We used SAS to fit a linear mixed model with the PROC MIXED procedure [44]. Variables used in the model included the RCNIL, HIF, block, year grown, and position within the block. A forward model selection method was used in which the starting model had a minimum number of variables (fixed effects for HIF and RCNIL nested in HIF), and additional variables were added to the model one at a time until the Akaike Information Criterion (AIC) reached its lowest point. The most complicated model selected was for the culm diameter single plant plot experiment, where five explanatory variables were used (Table 2.1).
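The forward selection described above is a greedy loop over candidate terms. The sketch assumes a caller-supplied function that fits a model containing a given set of terms and returns its AIC (in this study that role was played by SAS PROC MIXED); the term names and the toy AIC surface at the bottom are invented purely to exercise the loop.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(base: Iterable[str], candidates: Iterable[str],
                   fit_aic: Callable[[FrozenSet[str]], float]) -> Tuple[List[str], float]:
    """Greedy forward selection: add the term that lowers AIC most; stop when none does."""
    model = list(base)
    best_aic = fit_aic(frozenset(model))
    remaining = [c for c in candidates if c not in model]
    while remaining:
        # Score every one-term extension of the current model.
        aic, term = min((fit_aic(frozenset(model + [c])), c) for c in remaining)
        if aic >= best_aic:  # no candidate improves the fit
            break
        model.append(term)
        remaining.remove(term)
        best_aic = aic
    return model, best_aic


# Toy AIC surface, invented only to exercise the loop (not real model fits).
def toy_aic(terms: FrozenSet[str]) -> float:
    return 100 - 10 * ("block" in terms) - 5 * ("year" in terms) + 2 * ("row" in terms)


selected, aic = forward_select(["HIF", "RCNIL(HIF)"], ["block", "year", "row"], toy_aic)
```

With the toy surface, block is added first, then year, and the loop stops because adding row would raise the AIC.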
In these models $Y$ stands for the measured phenotype, $\mu$ for the grand mean, $a_i$ for the RCNIL, $f_j$ for the HIF, and $b_k$ for the block; $c_l$ and $d_m$ denote the horizontal and vertical position in the block, respectively; $t_n$ stands for the year; $h_p$ is the tiller number phenotype, used in the SPP experiment only; and finally $e$ and $g$ are error terms. While the single plant plot culm diameter model was the most complex, each of the other models had only one fewer variable. Least squared means for the RCNIL nested in HIF effect were extracted and used as the average phenotype value of each line in subsequent analyses. The goal of these analyses was to associate RCNIL phenotype and genotype. If a single locus in the segregating region is responsible for the phenotypic effect, one should observe simple, clean Mendelian segregation of least squared means based on genotype. Towards this end, RCNILs were sorted by phenotypic value (as represented by least squared mean). Unfortunately, we did not observe segregation of least squared means based on genotype, suggesting that multiple factors influence the measured traits. We have two main hypotheses as to why RCNILs failed to segregate in a Mendelian manner. First, the founding BC2 S3 HIFs of the RCNILs are less advanced than the BC6 S6 population from chapter 1 and may have additional factors segregating elsewhere in the genome that confound Mendelian segregation. Second, the primary locus of interest on chromosome five is not a single gene but rather multiple linked genes that, when split up by the various recombination breakpoints in our RCNILs, lead to complicated segregation patterns. In order to investigate both possibilities, we obtained whole genome genotypes using GBS (described above) and mapped QTL in the R/qtl software package for plants grown in twelve-plant rows.
Table 2.1: Final linear mixed models used to produce least squared means for fine mapping RCNILs.
Trait | Linear mixed model
culm (rows) | $Y_{ijkmo} = \mu + a_i(f_j) + f_j + b_k + d_m(b_k) + e_{ijklm} + g_{ijkmo}$
culm (SPP) | $Y_{ijlmop} = \mu + a_i(f_j) + f_j + c_l + d_m + h_p + e_{ijklmnp} + g_{ijkmnpo}$
krn (rows) | $Y_{ijkno} = \mu + a_i(f_j) + f_j + b_k(t_n) + t_n + e_{ijklmn} + g_{ijkmno}$
The single plant plot culm diameter experiment was not analyzed with QTL mapping methods, since it included only seventeen RCNILs and consequently lacked power for a QTL analysis. The benefit of combining GBS with QTL mapping methods is the simultaneous exploration of multiple factors in the target QTL region and of secondary genomic regions of significant effect outside the QTL. A potential flaw in this approach is a lack of statistical power to differentiate closely linked, moderate effect QTL in the relatively small RCNIL fine mapping populations. The full set of RCNILs was used to map QTL for the krn and culm diameter traits in order to maximize our power to differentiate between tightly linked factors. In total, 75 lines were used in mapping of the culm5.1 QTL (67 recombinant RCNILs and 8 homozygous maize and teosinte controls). The krn5.2 QTL was mapped with 92 lines, all of which were recombinant chromosome lines. QTL mapping was conducted using the R/qtl package [45] with genetic maps calculated using the Kosambi mapping function with a 0.001 error rate. Ten thousand permutations of the data were used to define a significant QTL threshold. QTL were mapped using a step-wise, model-based approach in which QTL were added to a model one at a time using the addqtl, fitqtl, and refineqtl functions of R/qtl until no more significant QTL were detected. In addition to the detected QTL, the founding HIF was included in the model as an additive covariate to account for variation caused by fixed, non-segregating genomic regions that differ between HIFs and were removed during manual curation of the GBS genotypes. Details of the step-wise QTL mapping method are available in the chapter 1 methods.
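For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d (in Morgans) as d = (1/4) ln[(1 + 2r)/(1 − 2r)], with inverse r = (1/2) tanh(2d). R/qtl applies this internally (for example through the `map.function` argument of `est.map()`); the sketch below only illustrates the conversion, and the function names are hypothetical.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination fraction r, 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    # 100 cM per Morgan times (1/4) ln[(1 + 2r) / (1 - 2r)]
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2 * d_cm / 100)
```

A recombination fraction of 0.1 corresponds to roughly 10.14 cM under Kosambi, slightly less than the 11.16 cM given by the Haldane function, because Kosambi allows for some crossover interference.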
2.4 Results
2.4.1 RCNIL generation and phenotype least squared means
I screened 4,180 individuals from the four founding HIFs for recombinant chromosomes in the summers of 2009 to 2011. In total, 67 and 92 recombinant individuals were identified and turned into RCNILs in the 1.5 LOD support intervals of culm5.1 and krn5.2/eard5.3, respectively. The vast majority (3,230 of 4,180) of screened individuals came from HIFs intended for study of the culm diameter QTL. This large number of individuals was required because the centromere lies in the middle of the target QTL region, which greatly reduced the recombination rate and limited the number of recombinant individuals. Three linear mixed models were used to analyze the phenotype data for kernel row number and culm diameter. Each model was selected using a forward selection method in which one variable was added to the model at a time until the model fit, as measured by AIC, no longer improved. When plotted as histograms, the least squared means of the RCNILs followed a roughly normal distribution for culm diameter, while the kernel row number trait had a left skew. RCNILs homozygous for the maize and teosinte segments showed the expected relationship, with maize RCNILs having larger culm diameter and more kernel rows (Figure 2.1).
Figure 2.1: Histograms of least squared means for the culm diameter and kernel row number phenotypes. The distribution of least squared means is approximately normal for culm diameter. The kernel row number counts have a noticeable left skew. Average least squared means for homozygous teosinte and maize RCNILs (designated by solid and dashed lines, respectively) have the expected relationship, with the teosinte average always being the lower phenotypic value.
2.4.2 PCR and GBS genotyping
Initial genotyping of the RCNIL homozygous recombinant chromosome genomic DNA was carried out through traditional PCR-based methods.
In total, I placed 18 markers on 75 RCNILs (including four maize and four teosinte control lines) for the culm diameter QTL and five markers on 92 RCNILs (no maize or teosinte control lines) for the kernel row number QTL (Table B.1). Only five markers were placed on kernel row number RCNILs for two reasons. First, the kernel row number RCNILs had recombination events in a much smaller physical distance (17.78 Mb versus 112.45 Mb for the culm diameter QTL). Second, we expected to obtain thorough genome-wide genotyping using GBS, which had already been initiated for the krn RCNILs.
Figure 2.2: GBS genotypes for kernel row number RCNILs. Thirteen regions across the genome are segregating in the kernel row number RCNILs. The primary QTL of interest is located in the 5b region, where all RCNILs have crossover events. Secondary regions segregating in only one of the two founding HIFs are clearly visible for several genomic regions; for example, chromosome 8c segregates in HIF MR0841 but not MR0818. The figure is scaled by marker, so each unit of length represents a single marker.
Of the nearly one million original SNP calls, only about 5% appeared to be segregating in a biallelic manner. The final genotypes contained zero segregating markers on chromosomes one, two, four, and six. The structure of the founding HIFs implies that each independent heterozygous region of the genome segregates independently of other regions. To account for this, each segregating region was assigned its own linkage group (5a, 5b, etc.) for QTL mapping so that non-segregating segments between heterozygous regions would not influence the results. Overall, 522 markers in 13 linkage groups were segregating across the six other chromosomes (Figure 2.2).
2.4.3 QTL fail to segregate as Mendelian traits
RCNILs were sorted by phenotype least squared mean from least to greatest, and we looked for distinct maize and teosinte RCNIL groupings.
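The check for Mendelian segregation described above can be expressed as a simple per-marker test: if a single causative locus sits at a marker, every teosinte-genotype line should fall below every maize-genotype line once RCNILs are sorted by least squared mean. A minimal sketch, assuming hypothetical 'M'/'T' codes, assuming (as in Figure 2.1) that the teosinte allele lowers the phenotype, and ignoring heterozygous or missing calls; it illustrates the idea and is not the procedure used to produce Figure 2.3.

```python
from typing import Dict, List


def clean_split_markers(ls_means: Dict[str, float],
                        genotypes: Dict[str, Dict[str, str]]) -> List[str]:
    """Markers at which every teosinte ('T') line has a lower mean than every maize ('M') line.

    ls_means maps RCNIL -> least squared mean; genotypes maps marker -> {RCNIL: call}.
    Heterozygous, missing, or unphenotyped lines are ignored.
    """
    hits = []
    for marker, calls in genotypes.items():
        teo = [ls_means[line] for line, g in calls.items() if g == "T" and line in ls_means]
        mz = [ls_means[line] for line, g in calls.items() if g == "M" and line in ls_means]
        # A clean Mendelian split leaves no overlap between the two homozygous classes.
        if teo and mz and max(teo) < min(mz):
            hits.append(marker)
    return hits
```

An empty result for every marker in the target region is the pattern described next, and points toward multiple segregating factors rather than a single locus.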
There was no clean segregation of RCNILs into maize and teosinte classes at any single marker, suggesting that multiple factors within the primary QTL of interest or elsewhere in the genome influence the traits (Figure 2.3). The culm diameter trait came closer than the kernel row number trait to clean segregation, especially for lines planted in the single plant plots. An additional complication for both the culm diameter and kernel row number fine mapping was a distinct difference between the grand means of RCNILs derived from different founding HIFs. For kernel row number, RCNILs from the two HIFs differed by approximately 1.8 kernel rows on average, and the average ranks of RCNILs from the two HIFs differed by over 40. For culm diameter, the two HIFs differed by approximately 0.1 cm on average (Figure 2.4). Founding HIF was part of the linear mixed models used to produce the least squared means, but the models clearly failed to fully correct for differences between founding HIFs. With this in mind, HIF was included in the subsequent mapping models to further account for differences caused by the founding HIFs.

Figure 2.3: RCNILs sorted by phenotype from least to greatest. Genotypes for RCNILs are indicated on the left in green (teosinte), yellow (maize), grey (heterozygous), or white (N), with least squared means shown as bar plots on the right. (A) Culm diameter least squared mean from the twelve plant rows. (B) Culm area as measured in the single plant plots. (C) Kernel row number counted from the twelve plant rows. A single causative gene should segregate as a Mendelian locus when RCNILs are sorted by phenotype. This was not observed; the genotypes appear more or less random, suggesting that multiple factors influence the phenotype in the RCNILs.

Figure 2.4: Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF. Distinct differences between the distributions of the two founding HIFs are visible for culm diameter, in both the (A) single plant plot and (B) twelve plant row designs, as well as for (C) kernel row number. The overall phenotype mean for each HIF is marked by a dashed line (red HIF) or a solid line (blue HIF).

2.4.4 Multiple factors contribute to culm diameter and kernel row number
QTL mapping was performed using least squared means as phenotypes and merged genotypes from the GBS and PCR methods. Since only a limited number (17) of culm diameter RCNILs were genotyped with GBS, we used only PCR markers for culm diameter mapping; consequently, QTL were mapped only in the primary segregating region of interest on chromosome five, between 59.6 Mb and 144.8 Mb. In contrast, all 92 RCNILs generated for fine mapping of the kernel row number phenotype were genotyped by GBS, allowing QTL in genomic locations away from the primary QTL of interest to be fully accounted for. A single QTL was detected for the culm diameter trait, suggesting that a single factor could be responsible for culm5.1 (Figure 2.5). However, there was a highly significant founding HIF effect in the QTL mapping model (F-test, p < 8.59e-10; Table 2.2). This founding HIF effect suggests that secondary sites in the genome are still at play, which could explain why the RCNILs did not segregate cleanly by genotype at the QTL of interest. While mapping a single QTL for culm diameter is encouraging, the relatively weak QTL LOD score (5.1) and small additive effect (-0.035) compared with the HIF LOD score (8.7) and effect (0.098) indicate that secondary sites contribute more to culm diameter than the QTL we were seeking to fine map.
Results for kernel row number QTL mapping are not directly comparable to the culm diameter results because genome-wide genotypes were included, which extended the mapping to segregating sites elsewhere in the genome. Four QTL were detected (Figure 2.5): two in the primary region of interest on chromosome five and one each on chromosomes seven and ten (Table 2.2). Unsurprisingly, the founding HIF once again had a highly significant effect (F-test, p < 2e-16). The two QTL in the target region had the highest LOD scores and largest additive effects of the mapped QTL. As for culm diameter, the HIF effect had the highest LOD score and largest effect overall.

Figure 2.5: QTL LOD profiles for fine mapping of the culm diameter and kernel row number traits. QTL are color coded and labeled as "chromosome@position"; for example, the kernel row number QTL with the highest LOD score (5b@7.0) should be read as a QTL on chromosome 5b at position 7.0. (A) Culm diameter LOD profile for the single QTL detected in the primary mapping region, showing LOD score (y-axis) versus map position in centimorgans. (B) Kernel row number LOD profiles for the four detected QTL, two of which lie in the primary region of interest on chromosome 5b. The secondary QTL on chromosomes 7b and 10a have lower LOD scores and effect sizes than the 5b QTL. In addition to the significant QTL, a highly significant HIF effect with a high LOD score (culm = 8.688, krn = 19.787) was included in the QTL models for both traits.

Table 2.2: Detected QTL and HIF effects including LOD, percent variation explained, and additive effect.

Name         LOD      Var. Explained (%)   Add. Effect
krn5b.1      10.161    7.28%               -0.413
krn5b.2      13.309   10.40%               -0.387
krn7b.1       4.758    2.95%                0.264
krn10a.1      5.391    3.40%               -0.114
krn HIF      19.787   18.59%               -1.550
krn model    44.127   89.02%                —
culm5.1       5.126   18.73%               -0.0353
culm HIF      8.688   35.69%                0.0975
culm model   11.081   49.36%                —

2.5 Discussion
2.5.1 The complex genetic architecture of culm and kernel row number
Efforts to identify causative factors underlying QTL have recently met with great success in maize. These studies have identified genes contributing to loss of prolificacy [4], day length neutrality [50, 62], liberation of the kernel from its fruitcase [5], and apical dominance [3]. Our study set out to contribute to this growing list of genes by examining domestication QTL affecting important traits on the fifth chromosome. Unfortunately, we were unable to identify a single gene contributing to the domestication traits of culm diameter and kernel row number. Instead, we found evidence of multiple factors on chromosome five and on other chromosomes controlling kernel row number and culm diameter, suggesting that the underlying genetic architecture of these traits is quite complex. Prior analyses identified domestication QTL for culm diameter and kernel row number on the fifth chromosome spanning 54.4 to 151.7 Mb and 166.6 to 169.2 Mb, respectively. QTL mapping in the fine mapping RCNILs produced mixed results. The culm5.1 QTL was refined to a much smaller region, from 83.74 Mb to 86.26 Mb on the fifth chromosome, a reduction to ∼2.5 Mb from an initial 1.5 LOD support interval of close to 100 Mb. In contrast, the RCNILs used to fine map krn5.2 revealed multiple QTL on the fifth and other chromosomes, as well as major differences between founding HIFs. The fine mapping QTL for kernel row number closest to the original target region was located from 160.7 Mb to 163.94 Mb on the fifth chromosome, approximately 3 Mb upstream of the original interval.
It is interesting that the QTL with the largest LOD score from chapter 1 moved and fractionated into multiple factors, while culm5.1, a QTL with a comparatively smaller LOD score and effect size, was narrowed to an interval ∼2.5% the size of the original interval. In terms of the number of genes in the 1.5 LOD support intervals, the fine mapping region for culm diameter contained a total of 40 genes and the kernel row number QTL contained 63 genes. While this fell short of the ultimate goal of a single gene, the confidence intervals contain few enough genes to begin looking for interesting candidates. The forty genes in the culm diameter QTL were characterized using functional annotation, expression results from chapter 3 of this thesis, and inclusion in selection features from a recent genome-wide population genetics scan in maize [55]. In terms of protein functional annotations, these genes had a variety of biological functions, including nucleases, transmembrane proteins, metabolic enzymes, chlorophyll binding proteins, and a number of transcription factors. The allele specific RNAseq experiment provided regulatory data for 21 of the 40 genes. Seven genes were classified as having a significant cis regulatory change; however, none of these seven genes were on the final filtered candidate gene list. Eight of the forty genes were also inside domestication selection features, suggesting that some genes in the culm QTL were under positive selection during maize domestication. While there is evidence for both differential gene expression and selection among the genes in the culm5.1 QTL, no single gene has multiple lines of supporting evidence. Genes in the kernel row number QTL with the highest LOD score were also examined for interesting candidates. As with the culm5.1 QTL, genes in the kernel row number QTL had many different functions, including ubiquitin association, ribosomal proteins, nucleases, nuclear transporters, and several transcription factors.
Of the 63 genes, 36 were not assessed by the chapter 3 RNAseq experiment. Most of the 27 assayed genes (24) were not on the filtered candidate gene lists for cis regulatory change; however, three genes were. The most interesting candidate is an armadillo repeat containing protein with a U-box domain. Armadillo proteins were first characterized in the fruit fly and are implicated in a number of functions, including intracellular signaling and cytoskeletal regulation. The U-box family of proteins is a class of ubiquitin-protein E3 ligases. While there is evidence for positive selection during maize domestication in the krn QTL, none of the genes with cis regulatory change show signs of positive selection, leaving no ideal candidate for the kernel row number QTL defined in our fine mapping experiment. This work provides a cautionary note for researchers looking to identify the causal genes underlying QTL. In this study we set out to identify the causative genes underlying two QTL: one with a large effect, a high LOD score, and a narrow confidence interval, and one with a moderate effect and LOD score and a larger support interval. Contrary to expectations, we were more successful in narrowing the region for the weaker QTL: the lower LOD score culm diameter QTL yielded a greatly reduced confidence interval, while the high LOD score kernel row number QTL shifted position slightly and fractionated into multiple linked factors. The inheritance of genetic factors influencing kernel row number on chromosome five is therefore quite complicated.
2.5.2 Future work on chromosome five QTL
The fifth chromosome of maize has been implicated as a major contributor to maize domestication in several studies [8, 25].
QTL for the kernel row number and ear diameter traits are of particular interest due to their large effects, high LOD scores, and obvious link to desirable domestication phenotypes. The work in this thesis shows that the ear diameter and kernel row number QTL fractionate into multiple linked factors on the fifth chromosome. Evidence from fine mapping and from chapter 1 of this thesis places the kernel row number QTL between 160.7 and 169.2 Mb. Unfortunately, this region contains over 100 genes, and we could not identify a single highly attractive candidate gene based on gene annotation, expression profiles, and scans for selection. Even though these efforts were not fully successful, the importance of chromosome five for domestication traits (kernel row number and ear diameter in particular) cannot be overstated, and future studies of this chromosome are inevitable. Insights from this work can help such studies maximize their chances of success. I believe two primary insights would be useful to future researchers in this endeavor. First, the uniformity of the genetic background on chromosome five and on other chromosomes appears to be critically important. The founding HIFs taken from a BC2 S3 population in this experiment had a complicated background with multiple secondary segregating sites that caused problems when mapping in the target QTL interval. Consequently, a more advanced population would be desirable. Second, distinct differences between founding HIFs were detected for both the culm diameter and kernel row number QTL, suggesting that comparisons of RCNILs generated from different HIFs could be misleading. Either designing the experiment to draw on a single founding HIF or accounting for HIF in the analysis of the phenotype data will be important.
Despite accounting for founding HIF in the linear mixed models, I still observed a large difference in kernel row number between founding HIFs, suggesting that simplifying the experiment to use a single HIF may be the best design. More extensive backcrossing and generation of RCNILs from a single founding HIF will provide a more isogenic genomic background with minimal segregation outside of the desired region. Drawing starting HIFs from the BC6 S6 NIRIL population from chapter 1 is an easy and logical way to do this; moreover, the kernel row number QTL has already been confirmed in that population. Toward this end, we have started the crosses necessary to produce a new population of segregating RCNILs from several lines in the chapter 1 mapping population. These RCNILs will be used in future field trials for a new and improved fine mapping attempt of the highly important kernel row number and ear diameter phenotypes on chromosome five.

Chapter 3
The role of cis regulatory evolution in maize domestication

3.1 Abstract
Modification of cis regulatory elements, which produces gene expression differences between divergent lineages, is thought to be a critically important process in the evolution of species. In this study, we assay genome-wide cis and trans regulatory differences between maize and its wild progenitor, teosinte, using deep RNA sequencing of F1 hybrids and their parent inbred lines. Three tissues were sampled, and approximately 70% of ∼17,000 genes showed evidence of allele specific expression. Approximately 1,000 of these genes show consistent cis differences among the sampled maize and teosinte lines, of which ∼70% are specific to a single tissue. The number of genes with cis regulatory differences is greatest for the ear, which underwent a drastic transformation in form during domestication. Genes with cis effects also showed evidence of positive selection during maize domestication and improvement more often than expected by chance.
Across all genes, maize possesses less cis regulatory variation than teosinte, a deficit that is greatest for genes with cis regulatory divergence. We observed a directional bias in which genes with cis differences favored higher expression in maize, suggesting that domestication led to a general upregulation of gene expression. Finally, this work documents the cis and trans regulatory changes between maize and teosinte in over 17,000 genes for three tissues.

3.2 Introduction
Changes in the cis regulatory elements (CREs) of genes encoding functionally conserved proteins have been considered a key mechanism, if not the primary mechanism, by which the diverse forms of multicellular eukaryotic organisms evolved [12, 13, 65]. Variation in CREs allows for tissue specific patterning of gene expression, differences in the developmental timing of expression, and variation in the quantitative levels of gene expression. Furthermore, modifications of CREs, as opposed to coding sequence changes, are assumed to have less pleiotropy and consequently a lower chance of being deleterious through unintended consequences in secondary tissues. The importance of CREs for the development of novel morphologies is supported by a growing catalog of examples in which differences in the CREs of specific genes between closely related species contributed to the evolution of diversity in form and pigmentation patterning [66]. While compelling evidence for the importance of CREs in evolution has come from mapping causative variants to CREs, additional evidence has been emerging from genomic analyses. These analyses have shown that cis regulatory variation is abundant both within [67–70] and between species [20, 21, 71]. Some studies have reported a bias in which genes with cis differences between species or ecotypes often show preferential upregulation of the alleles of one parent, possibly as a result of natural selection [21, 68, 72].
Consistent with the proposal that cis differences are a key element of adaptive divergence, cis regulatory divergence between yeast species is more often associated with positive selection than trans divergence is [20, 73]. Crop plants offer a powerful system for the investigation of evolutionary mechanisms because they display considerable divergence in form from their wild progenitors, yet remain fully cross-fertile with them [7, 36, 74]. QTL fine-mapping experiments have provided multiple examples of changes in CREs that underlie trait divergence between crops and their ancestors. These include examples in which cis changes confer the upregulation of a gene during domestication [3], the downregulation of a gene [14, 62], the loss of a tissue specific expression pattern [15], the gain of a tissue specific expression pattern [4], and a heterochronic shift in the expression profile [16]. These diverse results suggest that changes in CREs offer a powerful means to fine-tune gene expression to generate new plant morphologies. Several genomic scale assays of gene expression differences between crops and their ancestors have been performed, although their experimental designs did not allow cis and trans effects to be separated. These studies have shown that hundreds or even thousands of genes have altered expression in crops compared with their progenitors, and that genes with altered expression are more likely to show evidence of past selection than genes with conserved expression [17–19]. The data suggest that massive alterations in gene expression profiles accompanied domestication. Work in cotton and maize shows more frequent upregulation of genes in the cultivated form than in the wild parent, although whether this was due to cis or trans effects could not be determined [17, 18]. In this study, we used RNAseq to parse genome-wide expression differences between maize and its progenitor, teosinte (Zea mays ssp.
parviglumis), into cis and trans effects. Three tissue types were assayed: immature ear, seedling leaf, and seedling stem. Approximately 70% of the 17,000 genes assayed show evidence of regulatory divergence between maize and teosinte. Over 1,000 genes show cis divergence that is highly consistent across our sampled lines of maize and teosinte. For ∼70% of genes with consistent cis effects, the cis effects are specific to just one of the three tissue types. The number of genes with cis differences is greatest for the ear, which underwent a profound transformation in form during domestication. Genes with cis regulatory differences between maize and teosinte more frequently show evidence of positive selection associated with domestication than do trans genes. Maize also possesses less cis regulatory variation than teosinte across all genes, and this deficit is greatest for genes with cis regulatory divergence from teosinte. We observed a directional bias in that genes with cis differences more frequently show upregulated expression of the maize allele relative to the teosinte allele, although we cannot exclude the possibility that this is an artifact. Finally, our data provide a catalog of cis and trans regulatory variation for over 17,000 genes in three tissue types for maize and teosinte.

3.3 Materials and Methods
3.3.1 Plant material, RNA preparation, and sequencing
Six maize inbred lines, nine teosinte inbred lines, and 29 of their 54 possible maize-teosinte F1 hybrids were used in this experiment (Supplemental Table C.1). An average of 1.96 biological replicates (range 1 to 4) was used for each genotype. Plants were grown in growth chambers with a 12 hour dark-light cycle for up to 6 weeks, after which they were moved to a greenhouse. Samples of 50 to 100 milligrams of immature ear, leaf, and seedling stem tissue were harvested for RNA extraction during this time. Leaf and seedling stem (including the shoot apical meristem) tissue was collected at the V4 leaf stage.
Single ears from maize and F1 hybrid plants were collected when the ears weighed 50 to 100 milligrams and silks were just beginning to be visible. Teosinte ears were also collected when silks were just starting to appear; however, because teosinte ears are small, 7 to 16 ears (average of 11.27) from each plant were pooled to obtain ∼50 milligrams of tissue. These three tissue types are referred to from here on as the ear, leaf, and stem tissues. Total RNA was extracted from the plant tissues using a standard TRIzol protocol, quantified by spectrophotometer, and normalized to 1 µg/µL in nuclease free water. Starting with 5 µg of total RNA, we generated polyA selected, strand specific, barcoded RNAseq libraries following a previously published protocol with a five minute fragmentation time and 12 PCR amplification cycles [75]. Library adapters carried barcode sequences of four and five base pairs (Supplemental Table C.2), designed to balance nucleotide composition within the first five base pairs of sequence reads and to differ by at least two base pairs from every other barcode. RNAseq libraries were then pooled in groups of 14 (F1s) or 15 (parents), and each pool was sequenced on one lane (parents) or two lanes (F1s) of an Illumina HiSeq2000 sequencer at the University of Wisconsin Biotech Center.
3.3.2 Bioinformatics
A pipeline was developed to quantify gene expression in F1 hybrid and parental inbred lines from the RNAseq reads. The pipeline, based on work by Wang et al. [76], has two main steps: (1) construction of a pseudo-transcriptome for each parent line from the B73 reference genome and polymorphisms derived from non-B73 genomic paired-end reads, and (2) alignment of RNAseq reads to the pseudo-transcriptomes followed by evaluation of read depth at segregating sites.
Pseudo-transcriptomes were constructed using the B73 reference genome (version AGPv2) and transcriptome (version ZmB73 5a WGS) plus an average of 403.1 million (17.5X coverage) paired-end genomic sequencing reads from each of the other 14 inbred lines (Supplemental Table C.3). For each of the 14 non-B73 inbreds, paired-end genomic sequencing reads were aligned to the reference genome with the BWA aligner (version 0.5.9) [77]. Only uniquely mapping reads with up to two mismatches were used, to limit false polymorphism detection caused by paralogous read alignment. Single nucleotide polymorphisms (SNPs) and small insertion or deletion (indel) polymorphisms were called using the GATK package (version 1.0.5588) [78, 79] and filtered to retain only polymorphisms that were homozygous in the inbred with a read depth of at least 4X. A strand bias filter was also applied to ensure that each polymorphism was detected on both the plus and minus strands. Polymorphisms surviving these filters were then inserted into the reference B73 transcriptome to make a pseudo-transcriptome for each parent. For each of the 29 maize-teosinte pairs, a robust set of segregating sites was determined by comparing the pseudo-transcriptomes of the two parents and keeping the sites where (i) the two parental alleles differed, (ii) genomic read coverage was at least four for both parents within the read length (88 bp) of the site, and (iii) no heterozygous polymorphisms were detected in the genomic read alignments of either parent within the read length of the site. RNAseq reads from each F1 hybrid and each corresponding pair of inbred parents were then aligned with the Bowtie aligner (version 0.12.7) [80] to the combined pseudo-transcriptomes of the two parents (for the B73 parent, the B73 reference transcriptome was used). Allele specific expression was assessed by counting the reads originating from each parent at the segregating sites defined above.
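The three segregating-site criteria (parental alleles differ, genomic coverage of at least four for both parents near the site, and no heterozygous calls near the site) can be sketched as a filter over a single transcript. This is a minimal illustration under stated assumptions, not the pipeline used here: positions are 0-based on one sequence, per-base genomic coverage is a list, homozygous SNP calls are `{position: allele}` dictionaries, heterozygous calls are sets of positions (all hypothetical input formats), indels are ignored, and "within the read length" is interpreted as any base that an 88 bp read overlapping the site could also cover.

```python
from typing import Dict, List, Set

READ_LEN = 88  # RNAseq read length given in the text
MIN_COV = 4    # minimum genomic read depth given in the text


def window_ok(coverage: List[int], het_sites: Set[int], pos: int) -> bool:
    """Check one parent's genomic data around a candidate site.

    Assumed window: every base an 88 bp read overlapping `pos` could touch,
    i.e. pos - 87 .. pos + 87, clipped to the sequence.
    """
    lo = max(0, pos - READ_LEN + 1)
    hi = min(len(coverage), pos + READ_LEN)
    if any(depth < MIN_COV for depth in coverage[lo:hi]):
        return False  # coverage below 4 somewhere in the window
    # Reject the site if a heterozygous call falls inside the window.
    return not any(lo <= h < hi for h in het_sites)


def segregating_sites(ref: str,
                      maize_snps: Dict[int, str], teo_snps: Dict[int, str],
                      maize_cov: List[int], teo_cov: List[int],
                      maize_het: Set[int], teo_het: Set[int]) -> List[int]:
    """Return SNP positions passing all three criteria for one maize-teosinte pair."""
    sites = []
    for pos in sorted(set(maize_snps) | set(teo_snps)):
        # A parent without a call at `pos` is assumed to carry the B73 reference base.
        maize_allele = maize_snps.get(pos, ref[pos])
        teo_allele = teo_snps.get(pos, ref[pos])
        if maize_allele == teo_allele:
            continue  # parental alleles identical: not a segregating site
        if window_ok(maize_cov, maize_het, pos) and window_ok(teo_cov, teo_het, pos):
            sites.append(pos)
    return sites
```

Checking each parent separately mirrors the text's requirement that both parents pass the coverage and heterozygosity criteria; a real implementation would also handle indels and multiple transcripts.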
Since only perfect alignments were allowed, assigning reads to parents was straightforward: a read from a given parent could align only to that parent's allele at a segregating site.
3.3.3 Maize:teosinte gene expression ratios
We calculated F1 hybrid and parent maize:teosinte expression ratios for each gene for each of the 29 individual F1 hybrid comparisons. The F1 expression ratio for an individual F1 (e.g. B73 x TIL01) was calculated as the number of maize reads to the number of teosinte reads, summed over all segregating sites in the gene. The parent expression ratio for an individual F1 comparison was calculated as the number of reads from the maize parent (e.g. B73) to the number of reads from the teosinte parent (e.g. TIL01), summed over all segregating sites in the gene after correcting for any difference in the total number of reads between the two parent lines. These calculations yielded a set of 29 matched F1 and parent read count ratios for each gene. For example, for the B73 x TIL01 comparison at a single gene, the F1 and parent maize:teosinte ratios could be 52:56 and 34:30, respectively. We also calculated overall F1 hybrid and parent maize:teosinte expression ratios for each gene by pooling the read depth values across the 29 F1 hybrids and their parents, respectively. To calculate the overall F1 expression ratio, the maize and teosinte read counts from the F1 hybrids were simply summed over all segregating sites in a gene and across all hybrids. The overall parent expression ratio required weighting, both to avoid counting a parent's reads multiple times (once for each F1 hybrid in which it was a parent) and to compensate for differences in the total number of reads among parents. Only genes with a read depth of at least 100 in both the F1 and parent data were included.
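The per-comparison ratio calculations above can be sketched as follows. This is an illustration, not the thesis code, and the function names are hypothetical. The library-size correction is an assumption: the text says only that parent counts were corrected for differences in total read number, so the sketch rescales the teosinte parent's count to the maize parent's library size. The weighting used for the overall parent ratio is not specified in the text and is not attempted here.

```python
from typing import Sequence, Tuple


def f1_ratio(maize_reads: Sequence[int], teo_reads: Sequence[int]) -> Tuple[int, int]:
    """Maize:teosinte allele counts in one F1, summed over the gene's segregating sites."""
    return sum(maize_reads), sum(teo_reads)


def parent_ratio(maize_reads: Sequence[int], teo_reads: Sequence[int],
                 maize_lib_total: int, teo_lib_total: int) -> Tuple[float, float]:
    """Maize:teosinte parent counts, summed over segregating sites.

    Assumed correction: scale the teosinte count by the ratio of library sizes
    so that both parents are expressed at the maize parent's sequencing depth.
    """
    maize = float(sum(maize_reads))
    teo = sum(teo_reads) * maize_lib_total / teo_lib_total
    return maize, teo


def overall_f1_ratio(per_f1: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Pool F1 allele counts across hybrids by simple summation, as described in the text."""
    return sum(m for m, _ in per_f1), sum(t for _, t in per_f1)


def passes_depth(f1: Tuple[float, float], parent: Tuple[float, float],
                 min_depth: int = 100) -> bool:
    """Keep a gene only if both the F1 and the parent data reach the depth threshold."""
    return sum(f1) >= min_depth and sum(parent) >= min_depth
```

With equal library sizes, `parent_ratio([20, 14], [15, 15], 10**6, 10**6)` returns `(34.0, 30.0)`, matching the 34:30 parent ratio used as an example in the text.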
The result of these pooled calculations was an overall F1 and parent read count ratio for each gene; for example, the overall F1 and parent maize:teosinte ratios for a gene could be 804:796 and 123:130, respectively.
3.3.4 Testing for cis and trans effects
The combination of F1 hybrid and parent inbred expression data allows us to estimate both the cis and trans effects on gene expression. In the F1 hybrids, the maize and teosinte alleles of each gene share a common trans cellular environment, so any deviation of the maize:teosinte F1 expression ratio from 1:1 reflects purely cis effects. By contrast, the maize:teosinte parent expression ratio combines cis and trans effects, and any deviation of this ratio from 1:1 reflects the combined cis plus trans effects. Therefore, the trans effects can be estimated by subtracting the F1 hybrid ratio (cis) from the parent ratio (cis plus trans). Maize and teosinte read depth counts for each gene were used for statistical testing of cis and trans effects. Significant cis and trans effects were determined using binomial and Fisher's Exact Tests as described in McManus et al. [21]. In brief, two binomial tests were used to identify genes with maize:teosinte expression ratios significantly different from 1:1 in the F1 hybrid and parent comparisons.

Table 3.1: Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher's Exact Tests (FET).

Category       Parent BT   Hybrid BT   FET        Favored allele?
Cis            Sig.        Sig.        Not Sig.   —
Trans          Sig.        Not Sig.    Sig.       —
Cis + Trans    Sig.        Sig.        Sig.       Same
Cis x Trans    Sig.        Sig.        Sig.       Opposite
Compensatory   Not Sig.    Sig.        Sig.       —
Conserved      Not Sig.    Not Sig.    —          —
Ambiguous      All other patterns of significant or not significant
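The classification in Table 3.1 can be sketched directly from allele counts. This is a simplified illustration, not the thesis code, and all function names are hypothetical: it implements exact two-sided binomial and Fisher's Exact Tests with the standard library, compares raw p-values with a fixed threshold `alpha` rather than using Storey's q-values, assumes integer parent counts, and expresses the trans effect as a difference of log2 ratios (an assumed scale; the text says only that the F1 ratio is subtracted from the parent ratio).

```python
from math import comb, log2


def binom_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # p = 0.5 is symmetric, so double the smaller tail


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Exact two-sided Fisher's test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tables no more probable than the observed one
            total += p
    return min(1.0, total)


def regulatory_category(f1_m: int, f1_t: int, par_m: int, par_t: int,
                        alpha: float = 0.005) -> str:
    """Assign a Table 3.1 category from F1 and parent maize:teosinte counts."""
    parent_sig = binom_two_sided(par_m, par_m + par_t) < alpha
    hybrid_sig = binom_two_sided(f1_m, f1_m + f1_t) < alpha
    # The FET is run only when at least one binomial test is significant.
    fet_sig = (parent_sig or hybrid_sig) and fisher_two_sided(f1_m, f1_t, par_m, par_t) < alpha
    same_allele = (f1_m > f1_t) == (par_m > par_t)  # F1 and parents favor the same allele?

    if parent_sig and hybrid_sig and not fet_sig:
        return "cis"
    if parent_sig and not hybrid_sig and fet_sig:
        return "trans"
    if parent_sig and hybrid_sig and fet_sig:
        return "cis + trans" if same_allele else "cis x trans"
    if not parent_sig and hybrid_sig and fet_sig:
        return "compensatory"
    if not parent_sig and not hybrid_sig:
        return "conserved"
    return "ambiguous"


def trans_effect(f1_m: int, f1_t: int, par_m: int, par_t: int) -> float:
    """Parent (cis + trans) minus F1 (cis) log2 ratio, leaving the trans component."""
    return log2(par_m / par_t) - log2(f1_m / f1_t)
```

For instance, a gene with a 3:1 maize bias in both the F1 and the parents is classed as "cis", whereas a balanced F1 with a biased parent ratio is classed as "trans".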
Genes with an expression ratio significantly different from 1:1 in the F1 hybrid and/or parent comparison were then subjected to a Fisher's Exact Test to determine whether the parent and F1 hybrid maize:teosinte expression ratios differed from one another. A false discovery rate (FDR) of 0.5%, estimated with Storey's q-value [81], was used to account for the large number of statistical tests performed. The combination of the two binomial tests and the Fisher's Exact Test allowed us to classify each gene into one of seven regulatory categories (Table 3.1) as described in McManus et al. [21].
3.3.5 Candidate genes
Genes whose expression level was the direct target of selection during maize domestication are expected to show a maize:teosinte cis expression ratio significantly different from 1:1. These genes can fall into either the cis only (C) or the cis plus trans (CT) category of Table 3.1, as determined by the binomial and Fisher's Exact Tests. We call this combined group the CCT genes; they are the differential expression candidates that are the focus of many of our analyses. The list of CCT genes from the overall test was large (5,609 ear; 5,392 leaf; 5,426 stem; see Results). This large number reflects the considerable statistical power to detect slight overall expression biases, given that some genes had thousands of reads aligning to segregating sites. We observed significant maize:teosinte expression biases as small as 1.0:1.02 in the overall tests. Such small differences seem unlikely to be biologically important, and genes showing them are weak candidates for cis expression variation that is causal in maize domestication and improvement. Therefore, to narrow the CCT gene list to the candidates with the strongest and most consistent evidence for differential cis regulation between maize and teosinte, we applied two filters.
(1) The best supported genes should not only fall in the CCT group for the overall test using the pooled data from all 29 F1 hybrid comparisons, but should also have data from a large proportion of our sampled maize and teosinte parents. Thus, we filtered the initial list of CCT genes for those with data from at least fifteen F1 hybrids that include at least three different maize inbreds and five different teosinte inbreds. (2) Genes with cis differences that contributed to maize domestication or improvement should not only appear in the CCT list from the overall test, but should also show a direction of expression bias that is highly consistent among the individual F1 hybrids. To classify CCT genes by the consistency of the direction of expression bias among the F1s, we partitioned the genes into groups with 100%, 90%, and 80% of F1s showing the same direction. In calculating these percentages, we weighted the contribution of each F1 by its read depth at the gene. We refer to the CCT genes with 100%, 90%, and 80% consistent directionality among the F1s as the A-list, B-list, and C-list, respectively. For comparative purposes, we made similar A, B, and C lists for the cis only and trans only classes.
3.3.6 Proportion of cis variation in maize and teosinte
The existence of multiple cis regulatory regimes within maize and teosinte populations is expected to manifest as variation in the expression ratios among F1 hybrids. We asked whether cis expression variation among F1 hybrid ratios was more heavily influenced by the maize or the teosinte inbred parent. Since three teosinte inbreds (TIL05, TIL10, and TIL15) were each involved in only a single F1, the three F1s involving these inbreds were removed from the data to balance the number of maize and teosinte inbred parents in the dataset for this analysis.
Genes were tested for variation among the F1 expression ratios (cis variation) using a linear model. The log2(maize:teosinte) F1 expression ratio was the dependent variable, and the maize (j=1 to 6) and teosinte (k=1 to 6) parents were the independent variables. Models were fit on a gene-by-gene basis. The data for each F1 were weighted by its total read depth at the gene to account for differing read depths among the F1 hybrids. Significant maize and teosinte parent terms were identified with an F-test (p < 0.05) using the drop1 function in R.

3.3.7 Additive and dominant gene expression

One theory for domesticated systems holds that the genes responsible for rapid morphological evolution carry primarily loss-of-function (LOF) alleles [82]. In this scenario, the non-domesticated allele would be dominant to the LOF domesticated allele. There is some support for this theory from rice diversification and improvement [83]. However, recent QTL mapping and domestication gene cloning experiments point to a more diverse collection of functional gene changes [84]. In domesticated systems, the mode of inheritance of gene expression in terms of additivity and dominance has yet to be explored. Our dataset of parent inbred and F1 hybrid expression profiles provides an opportunity to address the LOF hypothesis for gene expression on a genome-wide scale. We calculated the additive effect, the dominance effect, and the dominance/additivity (D/A) ratio for each gene and maize-teosinte F1 hybrid comparison. The overall maize-teosinte average D/A ratio was then calculated after excluding outlier F1 D/A ratios using the Dixon method [85]. Genes were then classified by D/A ratio as having overdominant (|D/A| > 1.25), dominant (0.75 < |D/A| < 1.25), semi-dominant (0.25 < |D/A| < 0.75), or additive (|D/A| < 0.25) gene action.
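The per-comparison quantities of Section 3.3.7 can be sketched as follows. The text does not spell out how the additive and dominance effects were defined. This sketch therefore assumes the standard quantitative-genetic definitions, applied to expression levels in comparable units:

- the additive effect a is half the difference between the maize and teosinte parents;
- the dominance effect d is the deviation of the F1 from the mid-parent.

Under this assumed sign convention, a D/A ratio of +1 means the F1 matches the maize parent, and -1 means it matches the teosinte parent. The Dixon outlier step is omitted, and the function names are hypothetical.

```python
from typing import Tuple


def additive_dominance(maize: float, teosinte: float, f1: float) -> Tuple[float, float, float]:
    """Return (a, d, D/A) for one gene in one maize x teosinte F1 comparison.

    Assumed convention (not confirmed by the text):
      a = (maize - teosinte) / 2          half the parental difference
      d = f1 - (maize + teosinte) / 2     F1 deviation from the mid-parent
    """
    a = (maize - teosinte) / 2.0
    if a == 0:
        raise ValueError("parents do not differ, so D/A is undefined")
    d = f1 - (maize + teosinte) / 2.0
    return a, d, d / a


def gene_action(da_ratio: float) -> str:
    """Classify gene action from |D/A| with the Section 3.3.7 thresholds.

    The text uses strict inequalities; values exactly on a threshold are
    assigned to the higher class here, which is an assumption.
    """
    x = abs(da_ratio)
    if x >= 1.25:
        return "overdominant"
    if x >= 0.75:
        return "dominant"
    if x >= 0.25:
        return "semi-dominant"
    return "additive"
```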
Following calculation of overall D/A ratios and assignment of gene action, we looked for patterns in D/A ratios and gene action that support the LOF hypothesis [82]. Specifically, we looked for evidence of extensive dominance of the teosinte (non-domesticated) allele for genes with trans only regulatory change.

3.3.8 CCT gene enrichment in various functional categories

We assessed whether CCT genes are over- or under-represented in several categories, compared to all genes or to genes with conserved expression levels between maize and teosinte. The categories we tested include transcription factors, several metabolic pathways, gene ontology (GO) categories, selection candidates, and domestication QTL. A list of maize transcription factors and their associated families was downloaded from the Plant Transcription Factor Database [86]. Metabolic enzyme cDNA sequences for the starch and lipid metabolism pathways in maize were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [87, 88]. These sequences were matched by BLAST to genes in the maize filtered gene set (version 5b). Matches (a single gene hit with percent identity greater than 95%) were found for 370 of 379 genes. These were used to test for enrichment of CCT genes in the various metabolic pathways. Genes under positive selection during maize domestication and improvement were taken from a recent genomic scan for selection [55]. We obtained a list of QTL associated with maize domestication and improvement traits from Table A.1 of Shannon [25]. In general, we tested for enrichment or depletion of CCT genes in each category using Fisher’s Exact Tests on 2x2 contingency tables that classify genes by CCT status and category membership. Testing was first done for the CCT-AB candidate genes and extended to the CCT-A and CCT-ABC lists when a result of interest emerged. The general approach also differed in a few respects depending on the category being analyzed.
For QTL, we looked for enrichment of CCT genes among the genes within the 1.5-LOD support interval for each trait separately. We included only QTL whose 1.5-LOD support intervals were narrow enough to encompass 20 or fewer genes. For genes under positive selection during domestication and improvement, we performed an additional three-tissue union comparison. In this comparison, genes on any of the three tissue CCT lists were considered CCT candidate genes. One expectation for genes selected for changes in their cis-regulatory elements (CREs) is a signature of selection at the CRE itself, upstream of the gene in question. There is no hard rule for how far upstream cis enhancer and repressor elements can function. We therefore addressed this expectation by examining selection at the transcriptional start site of each gene. The raw selection score from Hufford et al. [55], the cross-population composite likelihood ratio (XPCLR) [89], served as the test statistic for this analysis. A three-tissue union comparison was made between all genes on the CCT-AB lists and all genes identified as conserved in the initial assay. We compared XPCLR scores at the transcriptional start site between conserved and CCT genes with two tests: a Kolmogorov-Smirnov test for differences in the overall distribution, and a simple t-test for differences in the mean. Finally, we used the goseq package [90] in R [91] to test for GO term enrichment and depletion in our CCT gene lists, using median gene length to adjust the reference in the goseq analysis. The background GO term reference consisted of genes meeting three conditions: allele specific expression was assessed in at least 15 crosses, those crosses involved at least three unique maize and five unique teosinte inbred lines, and the cumulative depth at segregating sites in the F1 and parent comparisons was at least 100.
GO terms occurring at least five times in the background reference were tested for enrichment and depletion in the CCT-A, CCT-AB, and CCT-ABC gene lists. P-values were corrected for multiple testing using the Benjamini-Hochberg method [92].

3.4 Results

3.4.1 RNAseq provides expression data for more than 17,000 genes per tissue

We examined variation in gene expression on a genome-wide scale using RNAseq data from three tissues: seedling leaf, seedling stem (including the shoot apical meristem), and immature ear. Samples came from six maize inbreds, nine teosinte inbreds, and 29 of their 54 possible F1 hybrids. In total, 259 RNAseq libraries were constructed, with an average of 1.96 biological replicates for each parent inbred and F1. For ear, leaf, and stem tissue, respectively, we collected 996 million, 1.13 billion, and 1.21 billion F1 hybrid reads and 286 million, 283 million, and 276 million parent reads (Table 3.2). These reads were aligned to custom parent-specific pseudo-transcriptomes. Each of the 29 maize-teosinte contrasts contained an average of 54,000 segregating sites (SNPs or small indels). Of the F1 hybrid reads, 556 million, 670 million, and 716 million mapped to the pseudo-transcriptomes in ear, leaf, and stem tissue, respectively. Of the parent inbred reads, 171 million, 170 million, and 163 million mapped (Table 3.2). Thus, approximately the same percentage of reads mapped to the pseudo-transcriptomes in the F1 hybrid and parent datasets (58.1% and 59.6%, respectively). About 7.15% of all F1 hybrid reads mapped to segregating sites. The pooled reads from all 29 F1 hybrids and 15 parents that aligned to segregating sites represent 23,045, 23,434, and 23,792 genes for ear, leaf, and stem tissues, respectively (Table 3.3).
The union of these three groups is 24,983 genes, which is 63% of the 39,423 genes in the maize filtered gene set (version 5b). We then filtered these lists, requiring a read depth of at least 100 in both the parent inbreds and the F1 hybrids. This filter reduced the lists to 15,939, 15,925, and 16,018 genes in ear, leaf, and stem tissues, respectively. The union of these three groups is 17,575 genes, or about 45% of the filtered gene set. There is a large degree of overlap among the genes expressed in the three tissues. Of the 17,575 genes, 14,420 (82%) were seen in all three tissues. Of the remaining genes, 1,467 are in some combination of two tissues and 1,688 are in only a single tissue (Figure 3.1). All but 16 of these single- or two-tissue genes were also detected in the remaining tissue(s), albeit at a read depth below 100. For the 1,688 genes that reached a read depth of 100 in only a single tissue, an average of 67.4% of their reads come from the tissue with the most reads. For genes detected in all three tissues at a read depth of 100, this value is only 46.9%. Thus, very few of the 1,688 genes are strictly tissue specific. Nevertheless, this group shows greater differences in expression among tissues than the 14,420 genes detected in all three tissues.

Figure 3.1: Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list. Each compartment of the Venn diagram shows the tissue combination (top), the number of genes overall (middle), and the number of genes from the CCT-AB gene list (bottom). CCT-AB overlap numbers marked with “*” indicate significantly more overlap than expected by chance (permutation tests, p < 1e-5). In the overall analysis, the vast majority of genes (82%) were assayed in all three tissues. This percentage is much smaller for the CCT-AB candidate gene list (∼7%). Even so, the overlap is greater than expected by chance, which suggests that some CREs act in multiple tissues. There are also many single-tissue CCT-AB genes, pointing to cis elements that function in tissue-specific patterns.

Table 3.2: Assignable RNAseq read counts from F1 hybrids and parents.

Read class / Tissue      F1 hybrid count   Parent count    F1 hybrid percent   Parent percent
Total reads
  Ear                    996,210,711       286,233,926     -                   -
  Leaf                   1,133,517,167     282,553,096     -                   -
  Stem                   1,211,779,746     276,295,164     -                   -
Aligned reads
  Ear                    556,387,109       171,185,368     55.85%              59.81%
  Leaf                   670,175,942       169,564,817     59.12%              60.01%
  Stem                   716,223,906       162,866,225     59.11%              58.95%
Segregating site reads
  Ear                    74,556,872        85,296,872(a)   7.48%               29.80%(a)
  Leaf                   72,995,272        78,878,805(a)   6.44%               27.92%(a)
  Stem                   91,355,219        78,583,423(a)   7.54%               28.44%(a)

(a) A higher number and percentage of reads map to segregating sites in parents because each set of parent reads is used in multiple comparisons. In contrast, the reads from each F1 hybrid can map only to segregating sites between two pseudo-transcriptomes.

Table 3.3: Genes for which RNAseq data was collected and expression was assayed.(1)

Gene class                                            Ear      Leaf     Stem     Union
Genes with mapped RNAseq reads                        32,858   32,645   33,316   34,636
Genes with RNAseq reads and segregating sites         22,072   22,393   22,901   24,052
Overall genes (filtered, 100 read depth)              15,939   15,925   16,018   17,575
Total CCT genes                                       5,618    5,402    5,435    10,101
Filtered CCT genes (15 F1s, 3 maize, 5 teosinte)      4,770    4,490    4,601    8,398
ABC-list CCT                                          1,545    1,288    1,371    3,018
C-list CCT                                            990      843      940      2,314
B-list CCT                                            512      424      404      1,036
A-list CCT                                            43       21       27       69

(1) Only genes from the maize filtered gene set (version 5b) were considered.

3.4.2 Prolific regulatory variation characterized by relatively few consistent cis differences

We measured the log2 ratio of maize to teosinte read counts in F1 hybrids (the cis regulatory effect) and the parent log2 ratio (the combined cis and trans regulatory effect). The trans effect was estimated as the parent log2 ratio minus the F1 log2 ratio.
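The decomposition described above can be sketched per gene as follows. The sketch also includes the proportion-cis measure |cis|/(|cis| + |trans|) from McManus et al. [21] and the divergence bins of Figure 3.3. This is a minimal illustration with hypothetical function names. It assumes that:

- the parental counts are already normalized for library size, so their ratio is comparable to the within-F1 allele ratio;
- all counts are positive;
- the Figure 3.3 bins are taken on the absolute log2 parent ratio.

```python
import math
from typing import Tuple


def regulatory_effects(f1_maize: float, f1_teosinte: float,
                       parent_maize: float, parent_teosinte: float) -> Tuple[float, float, float]:
    """Return (cis, trans, total) log2 effects for one gene.

    cis   = log2 of the maize:teosinte allele ratio within the F1 hybrid
    total = log2 of the maize:teosinte parent ratio (cis + trans combined)
    trans = total - cis
    """
    cis = math.log2(f1_maize / f1_teosinte)
    total = math.log2(parent_maize / parent_teosinte)
    return cis, total - cis, total


def proportion_cis(cis: float, trans: float) -> float:
    """Share of regulatory divergence attributed to cis: |cis| / (|cis| + |trans|)."""
    denom = abs(cis) + abs(trans)
    return abs(cis) / denom if denom > 0 else float("nan")


def divergence_bin(total: float) -> str:
    """Bin |log2 parent ratio| as 0-1, 1-2, 2-3, 3-4, 4-5, or 5+."""
    b = int(abs(total))
    return "5+" if b >= 5 else f"{b}-{b + 1}"
```

For example, a 4:1 allele ratio in the F1 and an 8:1 ratio between the parents give cis = 2 and trans = 1. The cis share is then 2/3, and the gene falls in the 3-4 divergence bin.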
Binomial and Fisher’s Exact Tests were applied to read counts to determine whether these ratios deviated from 1:1 and to assign genes to one of seven regulatory categories (Table 3.1). In the overall maize versus teosinte comparison, about 69% of genes (69.27% of ear, 74.27% of leaf, and 63.82% of stem genes) showed some combination of significant cis and/or trans regulatory effects (Figure 3.2). The remaining genes were classified as having either conserved expression in maize and teosinte (18.6%, 15.3%, and 20.7%) or ambiguous expression patterns (12.1%, 10.4%, and 15.5%). All three tissues had similar proportions of genes in the different regulatory categories in the overall maize-teosinte comparison (ear: Figure 3.2; leaf: Supplemental Figure C.1; stem: Supplemental Figure C.2). We asked what proportion of regulatory divergence between maize and teosinte was due to cis effects by calculating the ratio |cis|/(|cis| + |trans|) [21]. Across all genes, cis effects account for 45%, 42% and 47% of regulatory divergence in ear, leaf and stem tissue, respectively (Supplemental Table C.4). We further examined the relative contributions of cis and trans effects to large expression differences. To do so, we binned genes by the overall expression difference between maize and teosinte (log2 parent ratio). This analysis shows that the contribution of cis regulatory change increases with total divergence in expression (Figure 3.3). At high degrees of expression divergence between maize and teosinte (log2 change of 5 or more), over 75% of the divergence is due to cis effects. Thus, large expression differences appear to be caused primarily by differences in cis regulation rather than trans regulation. A primary goal of this study was to identify genes with cis regulatory differences between maize and teosinte. Such genes are candidates for having been direct targets of selection for altered gene expression during maize domestication or improvement.
Genes selected for regulatory differences would be in either the cis only or the cis plus trans regulatory category. We designate this combined group CCT genes. We identified 5,618 ear, 5,402 leaf and 5,435 stem CCT genes in the overall analysis (Table 3.3). To narrow the list to CCT genes with a broad degree of support, we retained only those assayed in at least 15 maize-teosinte F1s involving at least three maize and five teosinte inbred lines. This filtering reduced the lists to 4,770 ear, 4,490 leaf, and 4,601 stem CCT genes. The union of these three sets includes 8,398 genes. Next, we asked whether the 8,398 genes on the filtered CCT list show a consistent direction of bias in favor of the maize or teosinte allele across the individual F1 hybrids. The goal was to exclude CCT genes for which the significant overall cis effect was caused by a large expression bias in a minority of the F1 crosses, or even in just one. We defined three levels of consistency: groups A, B and C, in which 100%, 90% and 80% of F1s showed the same direction, respectively. Combined across tissues, groups A, B, and C contained 69, 1,036, and 2,314 genes, respectively (Table 3.3). Thus, relatively few of the 8,398 filtered CCT genes show a significant overall cis effect that is highly consistent among 15 or more F1 hybrids.

Figure 3.2: Parent versus hybrid ear tissue allele specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Color designates the regulatory category, which was determined from the combination of significant statistical tests as described in Methods. The barplot in the lower right corner shows the proportion and count of genes in each regulatory category.

Figure 3.3: Proportion of expression divergence due to cis regulatory differences. The figure shows how much of the total differential expression between the maize and teosinte parents is due to the directly measured cis effect (F1 hybrid expression ratio). Error bars depict one standard error. Total divergence (parent expression ratio) was binned as 0-1, 1-2, 2-3, 3-4, 4-5, and 5+. Divergence due to cis effects increases with total divergence, suggesting that large expression differences tend to be caused by cis rather than trans regulatory differences.

3.4.3 Possible directional bias in cis evolution

Visual examination of Figure 3.2 shows a greater density of cis genes (black dots) with positive than with negative log2 hybrid expression ratios. This suggests that cis evolution during domestication more often favored alleles with increased expression in maize relative to teosinte. Consistent with this visual observation, the numbers of CCT (ABC list) genes with a positive (maize-biased) vs. negative (teosinte-biased) log2 hybrid expression ratio are 947:598, 814:474 and 826:545 for ear, leaf and stem, respectively (Supplemental Table C.5). All of these ratios differ significantly from the unbiased 50:50 expectation (binomial test, p < 0.001). Additionally, in all three tissue types, the distribution of log2 hybrid expression ratios for CCT genes shows a much greater density of genes with positive values (Figure 3.4). The apparent bias in the direction of cis evolution could instead result from error in our bioinformatics pipeline. One potential error is preferential alignment of maize RNAseq reads. Teosinte lines are, overall, more divergent in sequence from the reference transcriptome (B73) than non-reference maize inbred lines are, so teosinte reads may align less readily.
If such systematic error exists, the bias toward maize alleles should be greatest for F1s involving the reference line B73. In those crosses there is no alignment bias against maize reads but a high bias against teosinte reads. The bias should be less extreme for crosses between teosinte and non-reference maize lines, where there is moderate bias against the non-reference maize reads and high bias against teosinte reads.

Figure 3.4: Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem. CCT genes have a directional bias, with more genes overall favoring the maize allele than the teosinte allele. Genes with consistent cis regulatory differences tend to favor the domesticated maize allele, and this pattern is present in all three tissues. We cannot rule out reference bias as the cause. Nonetheless, this trend suggests there may be an overall directional bias in cis regulatory evolution during maize domestication.

To test this expectation, we counted CCT (ABC list) genes with positive (maize-biased) vs. negative (teosinte-biased) log2 hybrid expression ratios separately for F1s with B73 and with non-B73 maize parents. For ear tissue, B73 F1s show 569 teosinte-biased and 975 maize-biased genes, while non-B73 F1s show 606 teosinte-biased and 939 maize-biased genes. A Fisher’s Exact Test fails to reject the null hypothesis that these two ratios are equal (p = 0.18). There was also no evidence for unequal ratios in the other two tissue types (Supplemental Table C.6). Thus, crosses involving B73 show no significantly greater bias toward maize alleles than crosses involving the non-reference maize parents. This supports the argument that alignment bias from the use of pseudo-transcriptomes does not explain the excess of CCT genes in which the maize allele is expressed more highly than the teosinte allele.
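Both tests used in this section can be sketched with the Python standard library alone. This is an illustrative implementation, not the code used for the dissertation, and the function names are hypothetical. The two-sided p-value follows a common convention, used for example by R's `binom.test` and `fisher.test`: it sums the probabilities of all outcomes no more likely than the one observed, with a small tolerance for floating-point ties. The `__main__` block applies the sketch to the ear-tissue counts reported above.

```python
import math
from typing import List


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes in n trials."""
    cutoff = _log_binom_pmf(k, n, p) + 1e-7   # tolerance for floating-point ties
    total = 0.0
    for i in range(n + 1):
        lp = _log_binom_pmf(i, n, p)
        if lp <= cutoff:
            total += math.exp(lp)
    return min(1.0, total)


def _log_hypergeom(a: int, row1: int, col1: int, n: int) -> float:
    """log probability that the top-left cell equals a, given fixed 2x2 margins."""
    b, c = row1 - a, col1 - a
    d = n - row1 - c
    return (math.lgamma(row1 + 1) + math.lgamma(n - row1 + 1)
            + math.lgamma(col1 + 1) + math.lgamma(n - col1 + 1)
            - math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(b + 1)
            - math.lgamma(c + 1) - math.lgamma(d + 1))


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    cutoff = _log_hypergeom(a, row1, col1, n) + 1e-7
    total = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        lp = _log_hypergeom(x, row1, col1, n)
        if lp <= cutoff:
            total += math.exp(lp)
    return min(1.0, total)


if __name__ == "__main__":
    # Ear tissue, CCT (ABC list): 947 maize-biased vs 598 teosinte-biased genes.
    print(binomial_two_sided(947, 947 + 598))
    # Ear tissue, rows = B73 vs non-B73 F1s, columns = (maize-biased, teosinte-biased).
    print(fisher_exact_two_sided([[975, 569], [939, 606]]))
```

Working in log space with `math.lgamma` avoids overflow in the factorials for tables with thousands of genes, such as the B73 comparison.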
3.4.4 Gene expression variation is greater in teosinte

Both the domestication/improvement bottleneck and selection during domestication are expected to reduce variation in maize relative to teosinte. We asked whether these reductions in variation are apparent in our gene expression data. We wanted to quantify whether variation in maize or in teosinte was the source of the variation in expression ratios among F1 hybrids. To do so, we fit a linear model on a gene-by-gene basis with the maize and teosinte inbred parents as explanatory factors for the expression ratio. Across the ∼13,000 genes included in this analysis, the maize parent explains only 85% as much variation as the teosinte parent (Supplemental Table C.7). This reflects the general reduction in diversity of maize compared to teosinte, presumably a result of the domestication/improvement bottleneck.

Figure 3.5: The ratio of average maize to teosinte R² from linear models explaining F1 hybrid expression by maize and teosinte parent. Error bars represent ±1 standard error. In all three tissues, the maize-to-teosinte R² ratio decreases in the candidate CCT gene lists, and the strongest candidates (CCT-A) show the most extreme reduction.

The bottleneck should reduce expression variation in maize for all genes. Genes that were targets of selection for regulatory differences should show an even greater reduction. Consistent with this expectation, we observed a greater reduction in variation in maize relative to teosinte for CCT genes than for the full set of ∼13,000 genes (Figure 3.5, Supplemental Table C.7). This greater reduction likely reflects the combined effects of the bottleneck and selection during domestication. Maize explains 79% as much variation as teosinte for the full ABC group of CCT genes, about 74% for the AB group, and about 52% for the A group.
Thus, among our strongest candidates (A group) for genes with cis regulatory differences between maize and teosinte, maize explains only about half as much of the cis regulatory variation as teosinte. The reduction in gene expression variation in maize vs. teosinte is also seen in the number of individual genes with significant effects due to the maize and/or teosinte parent (Supplemental Table C.8). Among AB list genes, there were 2.0- to 2.5-fold more genes with a significant effect of the teosinte parent only than genes with a significant effect of the maize parent only. Among A list CCT genes, the excess was 5-fold.

3.4.5 Selection candidate genes are enriched for CCT genes

We compared our list of CCT genes to putative targets of selection during maize domestication and improvement [55]. CCT genes are significantly enriched among selection candidate genes in all three tissues (Table 3.4). The evidence for enrichment is strongest for the union of CCT genes from all three tissues. For example, there are 134 CCT-AB genes among the selected genes, while 86.7 would be expected by chance. Also, 10 CCT (A-list) genes from stem tissue are among the selected genes, although only 2.16 are expected by chance, a nearly 5-fold enrichment. XPCLR scores (cross-population composite likelihood ratios) [89] quantify the degree of support for positive selection on a genomic region. We drew on a recent study [55] that calculated XPCLR scores in 10-kilobase windows across the maize genome.

Table 3.4: Fisher’s Exact Tests for the overlap between domestication and improvement selection candidate genes and CCT genes from each of the three experimental tissues.

CCT group   Overlap    Ear        Leaf       Stem       Union
A           Expected   3.42       1.41       2.16       5.6
            Observed   11         5          10         20
            p-value    3.52e-04   9.73e-03   1.89e-05   2.49e-07
AB          Expected   44.71      35.29      34.78      86.7
            Observed   70         57         60         134
            p-value    9.12e-05   1.79e-04   1.74e-05   1.13e-07
ABC         Expected   125.48     105.68     109.89     248.92
            Observed   174        135        139        317
            p-value    2.11e-06   1.289e-03  1.626e-03  3.54e-07

We compared ln(XPCLR) scores at the transcriptional start site for CCT-AB genes and for genes with conserved expression between maize and teosinte. CCT genes have a higher mean XPCLR than conserved genes (Figure 3.6). The two distributions differ significantly in shape (Kolmogorov-Smirnov test, p = 1.06e-11) and in overall mean (t-test, p = 2.21e-10).

Figure 3.6: Density plots of ln(XPCLR) scores of conserved versus CCT-AB candidate genes. CCT genes have a significantly higher signature of selection in the 10 kb window containing the transcriptional start site. The natural log transformed XPCLR scores for CCT-AB genes are consistently and significantly higher than those of genes identified as conserved in the initial analysis. The distributions of conserved and CCT-AB genes differ significantly both by the shape-sensitive Kolmogorov-Smirnov test (p = 1.0587e-11) and by a simple difference-of-means t-test (p = 2.2119e-10).

A goal of this study was to explore the relative importance of cis versus trans regulatory divergence during maize domestication. To address this question, we compared the evidence for selection on genes with cis only effects with the evidence for genes with trans only effects. Genes in the cis only and trans only regulatory categories were filtered to include only those with consistent effects across the F1 hybrid contrasts. Consistency was defined as 100%, 90%, or 80% of hybrid contrasts favoring the same direction of effect. By this definition, the cis only group is simply the cis only subset of the CCT genes. For the trans only group, the trans effect was estimated from the parent and hybrid expression ratios. We then calculated a weighted percentage of hybrid contrasts favoring the maize or teosinte allele.
Fisher’s Exact Tests on 2x2 contingency tables cross-classified the cis only and trans only genes against the selection features of Hufford et al. [55]. Cis only genes are significantly enriched for selection (p < 0.05) in 7 of 9 comparisons. Trans only genes are never enriched, and are actually significantly underrepresented among selected genes in two cases (Table 3.5).

Table 3.5: Fisher’s Exact Tests for enrichment/depletion of cis only and trans only genes among selection features.

                       Cis only                     Trans only
Group      Statistic   Ear      Leaf     Stem       Ear      Leaf      Stem
A list     Observed    5        3        3          4        3         1
           Expected    1.998    0.751    1.316      5.327    2.346     0.282
           p-value     0.043    0.032    0.138      0.818    0.506     0.256
AB list    Observed    36       24       32         28       34        16
           Expected    24.449   13.516   19.647     41.954   38.388    12.032
           p-value     0.018    0.006    0.006      0.020    0.490     0.222
ABC list   Observed    95       54       84         78       91        42
           Expected    70.113   45.427   65.615     97.036   101.461   43.148
           p-value     0.002    0.175    0.016      0.033    0.273     0.935

3.4.6 Microarray and RNAseq data partially correspond

We assessed the degree of correspondence between our CCT genes and 612 differentially expressed genes identified by a recent microarray study in maize [18]. We constructed 2x2 contingency tables for differentially expressed (DE) and non-differentially expressed (NDE) genes from the two studies. Fisher’s Exact Tests show a highly significant degree of correspondence between the two studies in all three tissue types (Table 3.6). For our CCT-AB list, ∼25 genes are identified as DE in both studies, while about 7 are expected by chance. However, the absolute level of correspondence between the two studies is rather low. For example, of the 328 leaf genes identified as DE by RNAseq, only 25 (7.6%) were also identified by the microarray study (Supplemental Table C.9). Thus, the overlap between the two studies is statistically significant, yet the two methodologies produced largely different lists of DE genes.
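Each overlap test in this chapter cross-classifies a background set of genes by two memberships in a 2x2 table, as described above. The sketch below builds such a table from Python sets of gene identifiers. It also computes the overlap expected under independence, which appears to be the quantity reported in the "Expected" rows of Tables 3.4 to 3.6. The function names and gene identifiers are hypothetical. The resulting table could then be passed to any Fisher's exact test implementation, for example R's `fisher.test` or SciPy's `scipy.stats.fisher_exact`.

```python
from typing import Set, Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]


def contingency_2x2(candidates: Set[str], category: Set[str], background: Set[str]) -> Table:
    """Cross-classify background genes as ((both, candidate only), (category only, neither))."""
    cand = candidates & background   # ignore genes outside the assayed background
    cat = category & background
    both = len(cand & cat)
    cand_only = len(cand - cat)
    cat_only = len(cat - cand)
    neither = len(background) - both - cand_only - cat_only
    return (both, cand_only), (cat_only, neither)


def expected_overlap(table: Table) -> float:
    """Overlap expected if candidate status and category membership were independent."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return (a + b) * (a + c) / n
```

Restricting both sets to the background first matters. A category gene that was never assayed (for example, one lacking segregating sites) should not count against the candidate list.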
The largely different lists of DE genes from the microarray and RNAseq analyses could be due in part to the microarray analysis including genes with trans and cis x trans differences. To assess the proportions of the 612 genes with trans versus cis effects, we examined the regulatory categories of the ∼250 differentially expressed genes (241, 261, and 259 for ear, leaf, and stem) with both microarray and RNAseq data (Supplemental Table C.10). About 20% of these genes are classified as trans only or cis x trans by RNAseq, while 55% are classified as either cis only or cis + trans. The remainder (25%) are classified as conserved, ambiguous or compensatory. Including or excluding trans and cis x trans genes therefore does not explain all of the difference between the two studies. Instead, these results suggest that the very different DE gene lists from the two technologies largely reflect differences in tissue, germplasm, environment, sampling error, or technical error.

Table 3.6: Fisher’s Exact Tests for the overlap between differentially expressed genes from the microarray study and CCT genes from each of the three experimental tissues in our work.

CCT group   Overlap    Ear        Leaf       Stem       Union
A           Expected   0.556      0.274      0.359      1.040
            Observed   4          3          2          8
            p-value    2.14e-03   2.28e-03   4.92e-02   7.83e-06
AB          Expected   7.501      6.409      6.248      15.778
            Observed   23         25         25         48
            p-value    1.56e-06   4.84e-09   2.91e-09   1.61e-12
ABC         Expected   21.774     19.363     20.579     46.069
            Observed   52         48         46         90
            p-value    9.58e-10   1.69e-09   1.05e-07   6.34e-12

3.4.7 CCT genes are unrelated to differentially methylated regions

In a recent study, Eichten et al. [93] identified differentially methylated regions (DMRs) in maize and teosinte. We compiled the nearest upstream and downstream genes of each DMR, yielding a list of 332 genes. Of these, we have RNAseq data for 115, 116, and 121 genes in the ear, leaf, and stem tissues, respectively. Of those, 19, 14, and 17 were on the CCT-ABC gene lists (Supplemental Table C.11). We asked whether CCT-ABC list genes are over-represented among the DMR-associated genes relative to random expectation, and found that they are not (Fisher’s Exact Test; p = 0.1092, 0.4309, and 0.1755 for ear, leaf, and stem). Finally, the methylation status of the DMR does not correspond to the direction of differential expression between maize and teosinte alleles at CCT-ABC list genes. One might expect the more methylated allele to be expressed at a lower level. Instead, the methylated allele is expressed higher ∼50% of the time and lower ∼50% of the time (Supplemental Table C.12).

3.4.8 Dominant and additive gene expression inheritance

The dominance/additivity (D/A) ratio was calculated for genes assessed in at least 15 crosses involving at least three unique maize and five unique teosinte inbred lines. The overall average of gene D/A ratios was close to zero in all three tissues (Supplemental Table C.13). This suggests that there is no extreme overall trend toward dominance of non-domesticated teosinte alleles over domesticated maize alleles. The tissues with active developmental programs, immature ear and seedling stem, are quite close to a 1:1 ratio of genes with negative to genes with positive D/A ratios (1.084 and 0.982 for ear and stem, respectively). In contrast, the leaf tissue has substantially more genes with a negative D/A ratio (a negative-to-positive ratio of 1.287), indicating a higher rate of teosinte allele dominance in the leaf tissue.
Of the three experimental tissues, two (ear and leaf) have an overall mean D/A ratio significantly different from zero (z-test, p < 0.05). Both also have significantly more negative than positive D/A ratios (binomial test, p < 0.05), suggesting teosinte allele dominance (Supplemental Table C.13). The average D/A ratios of the seven regulatory categories and the three CCT gene lists are also fairly close to zero. Even for the smallest lists, the CCT-A lists (21 to 43 genes), the average D/A ratio was always below the fully dominant value of one. Density plots of D/A ratios grouped by the seven regulatory categories do not show an obvious shift in distribution (Supplemental Figure C.3). Thus, there is evidence for a weak overall tendency toward dominance of non-domesticated expression levels in the ear and leaf tissues. However, there is no evidence that this teosinte dominance is linked to a specific regulatory category or candidate CCT gene list. We compared the proportions of genes showing dominant versus additive gene action in the cis only and trans only regulatory classes. Our trans only genes will show dominant gene action when their trans regulators carry haplosufficient loss-of-function (LOF) alleles. In contrast, the effects of cis regulatory elements are expected to be purely additive in the absence of transvection or a similar mechanism [94]. A cis only gene classified as having dominant gene action may therefore indicate a classification error. Such an error would arise from trans effects on the gene's expression that were below the level of statistical detection. Consistent with the expectation that dominance is more likely for trans only genes, the proportion of genes classified as dominant is higher for trans only genes in all three tissue types (Figure 3.7, Supplemental Table C.14). It has been proposed that the allelic variants responsible for evolution during domestication are primarily recessive LOF alleles [82].
Under this model, a non-domesticated allele would be dominant to the recessive LOF domesticated allele. Among our cis only genes with dominant gene action, dominance of the maize versus the teosinte allele does not differ from the 50:50 expectation (Figure 3.7, Supplemental Table C.14). Among our trans only genes with dominant gene action, the maize allele is dominant to the teosinte allele more often than expected by chance. These results run counter to the proposal that domestication favored recessive LOF alleles.

Figure 3.7: The proportion of genes showing dominant (red) versus additive (blue) gene action for the cis only and trans only AB lists. In all tissues, trans only genes have a higher rate of dominance; however, this difference is significant only for the ear and leaf tissues (Fisher's exact test, p < 0.005, indicated by "*"). The proportion of genes in the trans only lists that are dominant for the teosinte allele (green) and the maize allele (yellow) is shown in the bar plot to the right of each pie chart. There is significant deviation from the neutral expectation (1:1) for the ear and leaf tissues (binomial test, p < 0.005, indicated by "*").

3.4.9 Candidate genes enriched in various functional categories
We examined our list of CCT genes for enrichment of several functional classes of maize genes, including transcription factors, genes in known metabolic pathways, genes underlying QTL, and gene ontology (GO) groups. First, a list of maize transcription factors and their corresponding families was compiled from the transcription factor database [86]. Fisher's Exact Tests showed that CCT genes (AB list) were slightly enriched for several transcription factor families (ARF, MADS-MIKC, and LBD), but these results do not survive Bonferroni multiple-test correction (Supplemental Table C.15). We conclude that there is no compelling evidence that CCT genes are enriched for transcription factors.
Our list of CCT (AB list) genes was also compared with results from a recent QTL mapping experiment for a number of domestication and improvement traits [25]. We compared the observed and expected overlap between CCT genes from the three tissues and the genes located within 1.5 LOD QTL support intervals for 16 traits. Testing was done trait by trait and was restricted to 1.5 LOD QTL intervals containing 20 or fewer genes. After Bonferroni correction for multiple testing, no significant enrichment of CCT-AB genes in domestication QTL was observed (Supplemental Table C.16). The greatest enrichment was for ear diameter. Four CCT genes assayed in ear tissue fell within that QTL interval, when only 1.22 were expected by chance (Fisher's Exact Test, p = 0.03). Enrichment of CCT and trans only genes in 15 metabolic pathways defined in the Kyoto Encyclopedia of Genes and Genomes (KEGG) was tested with Fisher's Exact Test on 2 × 2 contingency tables. There was no compelling evidence for enrichment or depletion of either group of genes in any of the 15 pathways tested (Supplemental Table C.17). The smallest p-value was for the cutin, suberine, and wax biosynthesis pathway in leaf tissue for trans only genes (p = 0.012); however, this result does not remain significant after Bonferroni correction. We also tested for GO term enrichment and depletion in the CCT and trans only gene lists. These analyses found significant GO term associations in the leaf CCT-ABC gene list for five categories: enrichment for chloroplast, plastid, thylakoid, and chloroplast thylakoid membrane, and depletion for DNA binding (Supplemental Table C.18). For trans only genes, significant enrichment was detected in the ear tissue for transcription factor and photosynthesis-related GO terms, with additional enrichment for ribosomal GO terms in the leaf tissue (Supplemental Table C.18).
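The enrichment tests above compare observed and expected overlap using Fisher's Exact Test on 2 × 2 tables, followed by Bonferroni correction. They can be sketched with the hypergeometric distribution, which gives the one-sided Fisher's exact p-value for over-representation. This is an illustrative implementation under stated assumptions, not the code used for this dissertation. It uses only the upper tail (enrichment); a depletion test would use the lower tail. The sidedness of the original tests is not stated in this section, and the function names are hypothetical.

```python
from math import comb


def overlap_enrichment(universe: int, n_list_a: int, n_list_b: int, observed: int):
    """One-sided Fisher's exact (hypergeometric upper-tail) test for an overlap.

    universe : number of genes assayed (the background set)
    n_list_a : genes in the first list (e.g. CCT genes in one tissue)
    n_list_b : genes in the second list (e.g. genes in one KEGG pathway)
    observed : genes present in both lists
    Returns (expected overlap under random sampling, P(overlap >= observed)).
    """
    expected = n_list_a * n_list_b / universe
    max_overlap = min(n_list_a, n_list_b)
    # Exact integer sum over the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_list_a, k) * comb(universe - n_list_a, n_list_b - k)
        for k in range(observed, max_overlap + 1)
    )
    return expected, tail / comb(universe, n_list_b)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In use, overlap_enrichment would be called once per trait or pathway, and the resulting list of p-values would then be passed to bonferroni.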
3.5 Discussion

3.5.1 Regulatory change between and within maize and teosinte
Of the ∼17,000 genes assayed, 70% have significant cis and/or trans regulatory differences, suggesting that considerable regulatory change has occurred during maize domestication and subsequent crop improvement. Recent studies comparing two species of Drosophila [21] and of yeast [73] found similar proportions of genes with cis and/or trans differences. This high amount of variation between maize and teosinte is not surprising given the remarkable diversity of maize. Even the simple presence or absence of gene expression is quite variable within maize. One recent study found that 27.9% of genes were expressed in only a subset of maize inbred lines [95]. The same study found over a thousand novel genes not present in the B73 reference genome, suggesting that considerable presence/absence variation (PAV) also exists within maize. This is consistent with another study of PAV and copy number variation (CNV), which found hundreds of CNVs and thousands of PAVs, including at least 180 single-copy genes [96]. These CNVs and PAVs are accompanied by millions of additional SNPs both within and between genes [97]. In light of this known diversity within maize, it is not particularly surprising to find abundant cis and trans regulatory variation in gene expression between maize and teosinte.

Gene expression differences between populations account for only part of the variation in the dataset. There is also a large amount of variation within the maize and teosinte populations. Considering only cis differences measured in F1 hybrids, upwards of 60% of genes show evidence for multiple maize or teosinte expression levels, and consequently for multiple alleles within a population.
Furthermore, our study shows a drop in expression variation in maize. This is consistent with the reduction in overall diversity caused by the domestication and improvement bottlenecks. The reduction is even greater for genes thought to be under additional artificial selection (CCT candidate genes) [55, 98]. The high level of expression variation still present in teosinte represents an untapped source of diversity for maize, which may be useful for future crop improvement and plant breeding efforts. This study sheds light on the large amount of expression variation within and between maize and teosinte. However, only a small fraction of this diversity results in consistent expression differences that distinguish maize and teosinte inbred lines. In this study, relatively few genes show consistent expression differences between maize and teosinte (∼1,000 of 17,000, ∼6%), a fraction similar to that reported in a recent study by Swanson-Wagner et al. [18]. Thus, this study reveals an immense amount of regulatory diversity within and between maize and teosinte. It also shows that only a small fraction of this diversity appears to be fixed for discrete expression patterns that distinguish maize and teosinte populations.

3.5.2 What is the frequency of cis and trans regulatory change?
Our study shows that cis and trans regulatory differences occur at similar frequencies. However, this is only part of the story, because we also show that cis effects are arguably more important in generating large expression divergence between maize and teosinte (Figure 3.3). A recent Drosophila study by McManus et al. [21] also found that cis effects account for the majority of large expression differences.
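The cis/trans decomposition underlying Supplemental Table C.4 and the seven regulatory categories can be sketched as follows. The sketch assumes the framework we believe is used in the Drosophila study cited as [21]:

- the cis effect is the log2 allelic ratio within the F1;
- the trans effect is the parental log2 ratio minus the cis effect;
- the proportion of divergence due to cis is |cis| / (|cis| + |trans|).

The three significance calls are taken as inputs. The statistical tests, thresholds, and exact category rules used in this dissertation are described in its methods and may differ. All names are hypothetical.

```python
from math import log2


def cis_trans_effects(parent_maize, parent_teosinte, f1_maize, f1_teosinte):
    """Split parental log2 divergence into cis and trans components.

    parental = log2(maize parent / teosinte parent)
    cis      = log2(maize allele / teosinte allele within the F1)
    trans    = parental - cis
    """
    parental = log2(parent_maize / parent_teosinte)
    cis = log2(f1_maize / f1_teosinte)
    return parental, cis, parental - cis


def cis_fraction(cis, trans):
    """Share of divergence due to cis: |cis| / (|cis| + |trans|), NaN if both are 0."""
    denom = abs(cis) + abs(trans)
    return abs(cis) / denom if denom else float("nan")


def regulatory_category(parent_sig, hybrid_sig, trans_sig, cis, trans):
    """Assign one of the seven regulatory categories named in the text.

    parent_sig: maize and teosinte parents differ; hybrid_sig: the two alleles
    differ within the F1 (cis); trans_sig: parental and F1 allelic ratios differ.
    These decision rules are an assumed, McManus-style scheme.
    """
    if not (parent_sig or hybrid_sig or trans_sig):
        return "conserved"
    if parent_sig and hybrid_sig and not trans_sig:
        return "cis only"
    if parent_sig and not hybrid_sig and trans_sig:
        return "trans only"
    if parent_sig and hybrid_sig and trans_sig:
        # Same sign: cis and trans favor the same allele; opposite sign: they oppose.
        return "cis + trans" if cis * trans > 0 else "cis x trans"
    if not parent_sig and hybrid_sig and trans_sig:
        return "compensatory"
    return "ambiguous"
```

Binning genes by the absolute parental log2 ratio and averaging cis_fraction within each bin would give a table with the shape of Supplemental Table C.4.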
The frequencies of cis and trans regulatory differences in our sampling of maize and teosinte are fairly similar across the three experimental tissues and consistent with work in Drosophila; however, cis regulatory effects account for a substantial portion of large expression divergence. In a recent study, Swanson-Wagner et al. [18] used microarrays to assess expression in a number of maize and teosinte inbred lines, many of which are shared with our RNAseq-based study. They found relatively few genes (612 of ∼18,000) with differential expression between maize and teosinte. All seven regulatory categories were found among the genes assayed in both our RNAseq study and the Swanson-Wagner microarray experiment. Approximately 25% were classified as cis only, 10% as trans only, and 25% as cis plus trans. Only ∼50% of the microarray differentially expressed genes were classified as cis only or cis plus trans in our study (potential CCT candidate genes). However, the overall low correlation between our RNAseq data and the Swanson-Wagner microarray experiment makes direct comparison difficult. Comparisons between two parental samples are expected to identify genes with cumulative cis plus trans regulatory differences. Consistent with this expectation, cis only, cis plus trans, and trans only were the three most frequent regulatory categories assigned to differentially expressed microarray genes. A prominent hypothesis in evolutionary biology is that mutation in the CREs of genes encoding functionally conserved proteins is the primary mechanism of morphological evolution [12]. Under this hypothesis, mutations in the CREs of highly pleiotropic "master regulator" genes, and the resulting downstream effects, contribute substantially to overall morphological change. If true, this predicts large-scale rearrangement of gene expression networks through trans effects.
While trans effects do occur at high frequency in this study, they are accompanied by an equal number of larger, cis-driven expression differences. Thus, we believe the changes to gene regulation during maize domestication are best interpreted as frequent "shaving" of expression by cis regulatory change to fine-tune individual pathway elements. These changes act in addition to broader adjustments of whole pathways through trans regulatory differences.

3.5.3 Tissue specific expression of CCT candidates
We compared the expression of candidate genes between tissues. There was significantly more overlap among the candidate genes from the three experimental tissues than expected by chance (permutation tests, p < 1e-5, Figure 3.1). This suggests a high degree of shared cis regulatory effects between tissues. CRE activity in multiple tissues is also supported by the high correlation between the direction and magnitude of cis effects in different tissues (adjusted R² ≈ 80%, Pearson correlation ≈ 90%). These results suggest that many CREs function in multiple tissues to drive expression of a gene. Despite this significant overlap of CCT genes between tissues, a very high proportion of all CCT genes (∼70%) are found in only a single tissue. For the CCT-AB list, the lowest overlap (52 genes) was between the ear and leaf tissues, arguably the two most developmentally different tissues studied. This trend holds both for candidate genes and for all assayed genes. Few genome-wide studies have used F1 hybrids to dissect cis and trans effects, and even fewer have considered multiple tissues [69, 72]. Our results are consistent with these previous studies, in which ∼70% of identified genes were found in a single tissue.
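A tissue-overlap permutation test like the one reported above (p < 1e-5) can be sketched as follows. Random gene lists of the same sizes are repeatedly drawn from the assayed-gene universe, and the p-value is the fraction of permutations whose shared-gene count reaches the observed value. The overlap statistic (genes shared by all lists), the add-one correction, and the names are assumptions for illustration. Under this estimator, a p-value below 1e-5 requires at least 100,000 permutations.

```python
import random


def shared_count(lists):
    """Number of genes present in every one of the given gene lists."""
    sets = [set(x) for x in lists]
    return len(set.intersection(*sets))


def permutation_overlap_p(universe, list_sizes, observed, n_perm=10_000, seed=0):
    """Permutation p-value for an overlap at least as large as `observed`.

    Each permutation draws random gene lists of the given sizes (without
    replacement within a list) from the universe of assayed genes. The add-one
    correction keeps the estimate above zero, so the smallest attainable value
    is 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    genes = list(universe)
    hits = 0
    for _ in range(n_perm):
        draws = [rng.sample(genes, k) for k in list_sizes]
        if shared_count(draws) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A fixed seed makes the estimate reproducible. Increasing n_perm lowers the floor on the reportable p-value.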
Overall, many CCT genes are shared between tissues, but the majority are tissue specific, suggesting that both globally active and tissue specific CREs were modified during maize domestication. Gene expression is highly correlated between tissues. Even so, the ear appears to carry more functional, consistent cis regulatory changes: the leaf (445) and stem (431) have roughly 20% fewer CCT genes than the ear (555). A recent study in Arabidopsis [72] observed a similar imbalance: the three tissues studied differed by approximately 80% in the number of differentially expressed genes. The maize and teosinte ears differ dramatically in size, spikelet placement, glumes, and the absence of the fruitcase in maize. These morphological differences may be due in part to these frequent tissue specific cis regulatory differences. This observation is again at odds with the view that large morphological change in evolution or domestication is caused by mutation of CREs for a few "master regulator" genes [12]. Instead, these data again point to many single-gene expression changes, in which allele-specific expression is "shaved" through modification of multiple tissue specific CREs.

3.5.4 Bias toward increased maize expression?
In the F1 hybrid analysis, ∼55% of genes have higher expression of the maize allele than of the teosinte allele. Higher expression of the maize allele is also seen in the comparison between parent inbred lines, except in leaf, where equal numbers of genes favor the maize and teosinte alleles. The same trend of upregulated maize expression extends to the CCT gene lists, where ∼60% of genes favor the maize allele.
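An exact two-sided sign (binomial) test can check whether a ∼55% or ∼60% share of maize-favoring genes departs from the 50:50 expectation. The section does not say which test, if any, was applied to these proportions, so the sketch below is illustrative only. It assumes that genes with no allelic bias have already been excluded, and the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_maize: int, n_teosinte: int) -> float:
    """Exact two-sided binomial test of H0: P(maize allele higher) = 0.5.

    Genes with no allelic difference are assumed to be excluded beforehand.
    """
    n = n_maize + n_teosinte
    k = max(n_maize, n_teosinte)
    # Upper tail P(X >= k) under Binomial(n, 0.5), summed with exact integers;
    # doubling it gives the two-sided p-value because the null is symmetric.
    upper = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * upper / 2 ** n)
```

As an example, the CCT-ABC ear counts in Supplemental Table C.5 are 947 maize-favoring and 598 teosinte-favoring genes. sign_test_two_sided(947, 598) returns a p-value far below 0.05. Treating genes as independent trials is itself an assumption.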
Our observation of higher expression from one parent (maize) is also consistent with several previous studies in multiple organisms, including maize [18], cotton [17], Arabidopsis [72], Cirsium [68], and fruit fly [21]. Our experimental method uses parent-derived pseudo-transcriptomes and requires perfect alignment at segregating sites. This should ameliorate alignment bias, but we cannot be sure it has been fully eliminated. Potential alignment bias prevents firm conclusions. However, genes that are consistent across all maize and teosinte inbreds are less likely to be artifacts, suggesting that the overall bias toward maize alleles in candidate genes is a real phenomenon.

3.5.5 Selection-candidates enriched for cis regulatory change
Changes in gene expression, specifically through altered CREs, are not uncommon in the history of domesticated crops. Such changes have led to increased fruit size in tomato [16], increased apical dominance in maize [3, 40, 99], loss of prolificacy in maize [4], and changes in rice yield and flowering time [57, 58]. In these examples, large and sometimes pleiotropic changes are caused by single genes. There is no disputing the important role of these types of genetic changes in creating some of the world's most productive crops. However, this study sheds light on the hundreds of other genes with CRE-driven differential expression between maize and teosinte. These genes are enriched in selection features [55] and show stronger selection upstream of and at the gene than conserved genes do. Evidence of positive selection on regulatory effects is restricted to genes with CRE differences; genes with trans only regulatory change are never enriched for selected genes. Not all genes with consistent CRE differences between populations are likely to have played large, equal, or even critical roles in maize domestication.
Corroborating evidence, such as that from selection scans, can help identify the truly important players in the domestication process, even if discovering the domestication role of every one of these genes is likely an impossible task. The magnitude of cis effects illustrates how such data can inform the evaluation of candidates. A number of genes in this study show large shifts in expression between maize and teosinte (log2(M:T) > 10). However, the magnitude of the cis effect is not correlated with the strength of selection, suggesting that effect magnitude is not particularly important. In retrospect, this is not surprising, since subtle changes in gene expression are known to cause drastic phenotypic differences. New tissue specific shifts in gt1 expression largely led to the elimination of secondary ears in maize [4], and a relatively moderate 2-fold change in tb1 expression leads to greatly increased apical dominance [3, 40]. In light of this result, selection on CREs during maize domestication may be best characterized as subtle fine-tuning of expression patterns to generate phenotypic change.

3.5.6 Leaf tissue candidates are enriched for photosynthesis and chloroplast GO terms
A number of gene ontology terms implicated in photosynthesis and carbon fixation were enriched in the leaf CCT-ABC list. Mapping these genes back to photosynthesis and carbon fixation pathways identifies two components of photosystem I, as well as part of the ATP synthase (delta subunit). Additionally, a number of enzymes involved in carbon fixation were found to be up- or downregulated in maize through cis regulatory means. Most of these enzymes are involved in reactions converting malate to other substrates in carbon fixation. Cytosolic and mitochondrial forms of malate dehydrogenase (mdh) were two of the identified differentially expressed genes.
Mdh2, a mitochondrial form, is expressed more highly in teosinte, whereas mdh4, a cytosolic form, is expressed more highly in maize. These expression differences suggest that malate-oxaloacetate flux between the mitochondria and cytoplasm was altered during maize domestication. Movement of oxaloacetate (OA) has important implications for energy metabolism and photorespiration [100, 101]. The changes in expression suggest that there may be less conversion between OA and malate within the mitochondrial matrix. This would lead to reduced malate in the mitochondria and reduced transport of OA into the mitochondria. In theory, more OA would then remain in the cytoplasm, where it would be available for conversion to malate and transport to bundle sheath cells for photosynthesis. This could improve rates of photosynthesis in maize. However, these results should be treated with caution, since the malate dehydrogenase enzymes identified are on a secondary candidate gene list and are not considered among our best candidates.

3.5.7 Do crop domestication genes show cis differences?
Domestication is characterized by a number of common phenotypes that are collectively considered the domestication syndrome [10, 11]. These include gigantism, loss of prolificacy, loss of shattering, changes to pollination mechanisms, apical dominance, and branching. Although these phenotypes are shared across crops, the genetic modifications that produce them may or may not involve homologous genes. Genes such as waxy [102–104], tb1 [3, 105], and ghd7 [50, 57] were targets of selection in multiple crop species; however, many more genes controlling domestication traits are unique to particular crops [106–109]. To assess the regulatory status of crop domestication genes in maize, we compiled a list of 28 domestication genes (6 maize and 22 non-maize) and identified the closest maize homolog of each by protein-to-protein BLAST (Table 3.7).
Of these 28 genes, only the maize gene sugary1, which encodes an isoamylase starch debranching enzyme, was on a CCT list (the CCT-B list in the ear). Furthermore, only two of the remaining genes were on the C list. We found no cis regulatory changes at maize homologs of non-maize domestication genes. This suggests that, in a domestication context, cis regulatory change may tend to act on different genes in different organisms rather than on a single gene with conserved function across species.

Locus Name tga1 ZmYAB2.1 Sh2 Su1 gt1 tb1 waxy Nud qFT10-4 BoCAL PsELF3 DTH2 GS6 GS5 qSH1 shat1 Bh4 TAC1 GW2 Ehd1 BADH2 OsSPL16 qPE9-1 Sh1 Tannin1 FAS Q Vrn1 Organism Maize Maize Maize Maize Maize Maize Amaranths Barley Brassica Brassica Pea Rice Rice Rice Rice Rice Rice Rice Rice Rice Rice Rice Rice Sorghum Sorghum Tomato Wheat Wheat Coding Expression Expression Coding Expression Expression Coding Deletion Expression Coding Coding Unclear Coding Expression Expression Coding Coding Expression Coding Coding Coding Expression Loss of Function expression Coding Expression Coding and expression Expression Functional Change trans only cis + trans conserved cis only trans only cis + trans cis + trans trans only cis only cis x trans trans only ambiguous conserved cis only comp. cis x trans comp. cis x trans cis x trans cis only comp. cis + trans conserved cis + trans cis x trans cis only D B D D D D C D D D D D D D D D D D - trans only comp. cis x trans trans only cis x trans trans only conserved conserved ambiguous conserved ambiguous conserved cis only cis only cis + trans trans only trans only conserved trans only - Reg. Cat. Reg. Cat. CCT Leaf Ear D D D D D D D D D C D D D D D - CCT comp. cis x trans ambiguous conserved trans only trans only cis + trans conserved conserved trans only trans only cis only cis only trans only cis only trans only conserved - Reg. Cat. Stem D D D D D D D D D D D D D D D - CCT
Table 3.7: Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.

3.5.8 A catalog of genes with cis regulatory variation
Like selection scans, this study produces a list of candidates for future investigation. The complete set of 25,000 genes will be a valuable tool for investigators screening for new genes of interest and answering preliminary questions about the expression of specific genes. For each gene, it includes RNAseq read counts, parent and F1 expression ratios, regulatory classification, and other summary information. For example, one attractive CCT candidate gene is barren stalk1 (ba1), a maize gene whose single-gene mutant causes a defect in branch formation in both the whole plant and the tassel [110]. The wild-type function of ba1 is inferred to be in branch initiation. In our study, ba1 was one of our strongest candidates, with all assayed crosses showing higher expression of the maize allele in the ear. The overall shift in expression was substantial (∼4-fold), and this shift is caused by cis regulatory differences alone. ba1 was also found to be under selection during maize domestication in two independent studies [55, 110]. These combined observations suggest that there was selection for a CRE that drives upregulation of ba1 in the ear, perhaps resulting in more rows (branches) of kernels in the maize ear than in the teosinte ear. Compelling evidence for this hypothesis could come from fine-mapping and identifying the hypothesized CRE. Expression assays would then need to show that the maize and teosinte alleles of the CRE have the predicted effects on gene expression during ear development and on phenotype (kernel row number) in the adult ear. ba1 illustrates the power of genomic scans to identify strong candidates for future study that can inform us about the fine details of evolution under domestication.
Appendices

Appendix A
Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL

A.1 Figures
Figure A.1: Histograms of the least-squares means for phenotyped traits from the QTL mapping population. Several of these distributions are approximately normal, while others are closer to exponential. Arrows labeled "M" (maize) and "T" (teosinte) mark the average least-squares mean for NIRILs with 100% maize and 100% teosinte genotypes.
Figure A.2: Example histograms of simulated traits under several conditions that vary the number of causative loci, effect size, and heritability. Columns, from left to right, show traits with equal effects at H² = 67%, equal effects at H² = 90%, gamma-distributed effects at H² = 67%, and gamma-distributed effects at H² = 90%. Rows, from top to bottom, show simulated traits with one, five, ten, twenty, fifty, seventy-five, and one hundred causative loci. Arrows labeled "M" (maize) and "T" (teosinte) mark the average simulated phenotype for NIRILs that are 100% maize and 100% teosinte.
Figure A.3: Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval. As in the equal-effect-size simulations, a large number of gamma-distributed causative genes leads to detected QTL containing multiple causative factors. When few (fewer than 4) causative genes are simulated, a reasonable percentage of detected QTL contain a single causative gene. As the number of simulated causative genes increases, we quickly lose the power to distinguish closely linked causative genes, and they become lumped into single detected QTL.

A.2 Tables
Table A.1: RFLP markers used during backcrossing of the QTL mapping population.
Chromosome | Markers
1 | bnl5.62, umc157, umc37b, npi255, BZ2, bnl8.10, npi615, umc107, npi225
2 | bnl8.45, umc53, npi320, npi421, umc6, umc34, umc134, umc131, umc2b, umc5a, php20005, umc122, umc49a, umc36
3 | umc32, umc121, php20042, umc42b, umc161, umc18, TE1, bnl5.37, bnl8.01, umc60, bnl12.97, php10080, npi425, umc2a
4 | php20725, umc19, umc127a, bnl10.17b, umc15, bnl8.23
5 | bnl8.33, bnl6.25, umc90, umc27, umc166, bnl7.71, npi412, umc54, umc127b, umc104a
6 | bnl6.29, umc65, umc21, umc46, umc132, umc62
8 | npi114, bnl9.11, umc117, umc7
9 | npi253, umc113, umc81, umc95
10 | bnl3.04, umc130, umc49b, umc117b, bnl7.49

Table A.2: Genetic markers used to score the BC6S6 mapping population.
Marker | Genetic position | AGPv2 position
umc2036 | 0.00 | 6,985,618
bnlg565 | 6.54 | 8,492,871
bnlg105 | 20.90 | 13,812,586
phi008 | 21.54 | 14,072,755
umc2293 | 25.26 | 15,110,054
umc2060 | 27.79 | 16,462,750
bnlg1046 | 31.75 | 18,701,374
umc2035 | 42.17 | 23,891,611
umc1705 | 45.36 | 28,196,243
umc1056 | 48.10 | 32,036,007
umc2294 | 48.43 | 33,783,084
umc1935 | 53.24 | 51,438,549
umc1850 | 54.79 | 54,416,924
mmp58 | 61.98 | 74,916,830
GRMZM2G116761 | 63.55 | 82,236,166
umc2298 | 65.07 | 84,800,717
umc1110 | 65.39 | 84,825,409
umc1224 | 66.70 | 92,368,617
umc1283 | 67.52 | 111,997,867
bnlg1287 | 67.69 | 121,584,002
dupssr10 | 68.70 | 142,483,421
bnlg2323 | 74.26 | 151,717,831
ZHL0301 | 77.01 | 159,447,730
umc1348 | 81.83 | 166,576,639
umc1966 | 86.64 | 169,231,037

Appendix B
Supplemental Content: Fine mapping of chromosome five domestication genes in maize

B.1 Tables
Table B.1: PCR markers used for genotyping RCNILs, including gene or SNP target, AGPv2 position, and primer sequences.
Gene or SNP name | AGPv2 position | Primers
GRMZM2G003313 | 38,994,478 | CCACAGAATCTCTCCACCAGA, CTTTTGCTTCTCACCCCAGA
GRMZM2G048045 | 62,595,351 | GCCTACGAGCTGCAACAGG, GCCCTCCGTTCTACACACAG
GRMZM2G116761 | 82,236,265 | TCGCATCTGGAAAGAGCTTC, TGAATTGCAAAAGAGGAAACA
PZE-105075181 | 82,970,868 | GGCCCGGGCTAGAGAACCGA, GTGCGGAGCTTGGGACCGAC
GRMZM2G158520 | 82,952,563 | TCGGGCACGAAAGGTGTCGC, CACTCTCTCCCGCTCCCGCT
GRMZM2G387127 | 83,436,098 | CGCAAGCCGATCTTTTACTC, GCAGTTGAACTCGAAGTGGA
GRMZM2G387127 | 83,436,808 | CGCAAGCCGATCTTTTACTC, GCAGTTGAACTCGAAGTGGA
GRMZM2G026117 | 84,249,368 | CTCAGGCCAAGGTCTCACTC, AGAGTGTGCGGCTTTCAGTT
umc1110 | 84,825,350 | TTACACCAAGGTCCGAAACAAGAT, TCTTGGAAGGCAAGACTCTACCTG
PZE-105076775 | 85,553,605 | CAAACCTCCCAAGAGAATGC, TTGATGCAGATTCGCTGAAC
GRMZM2G017882 | 85,864,165 | GTCCGCCTCGGCGACCTAGA, CCAGAGGGGACCTGTGGGGG
AC207043.3_FG002 | 86,014,290 | CCACACTCATTTGACCAACG, TGACGCGTGTTCTAGCTTGT
AC207043.3_FG002 | 86,014,338 | CCACACTCATTTGACCAACG, TGACGCGTGTTCTAGCTTGT
PZE-105077135 | 86,221,700 | AAAGACGCAGCAGGAGAGAG, TGCTACGTTACAGGCTGTCG
GRMZM2G102758 | 86,783,453 | AGCAGGGTCAAGGACTACCA, TCCTGCAGCTCCTCTTCTTC
GRMZM2G063106 | 87,114,719 | TGCATTTCTCTGACCTCCTTG, TCCGACTTGAGGATCCTGTT
umc1283 | 111,997,810 | CTGCTCCCTTATGATGTGATGATG, TGCACTGAGGTGTAGGTAGAGCAA
GRMZM2G012923 | 151,446,717 | AGCAAAGCATGGGCTAGTGT, GCCATGCTGCTTATGGATCT
GRMZM2G027886 | 159,447,674 | AACAGCTTTGCTTCCCTGAA, CCCAGAGGATCCAGAGTCAG
umc1348 | 166,576,570 | CTCACTGACACTTGAACACACACG, TTACTGGTCTCCTGATCCTTAGCG
umc1221 | 168,671,954 | GCAACAGCAACTGGCAACAG, AAACAGGCACAAAGCATGGATAG
umc1966 | 169,230,959 | GTTTTCGACGAGGGGACTACATTT, CACGGTTGAGAACTTCGCTTGTAG

Appendix C
Supplemental Content: The role of cis regulatory evolution in maize domestication

C.1 Figures
Figure C.1: Parent versus hybrid leaf tissue allele-specific expression ratios. Parent (x-axis) and F1 hybrid (y-axis) allele-specific expression ratios are plotted against each other.
Colors indicate the regulatory category, assigned from the combination of significant statistical tests as described in the Methods. The proportion and count of genes in each regulatory category are shown in the bar plot in the lower right-hand corner.
Figure C.2: Parent versus hybrid stem tissue allele-specific expression ratios. Parent (x-axis) and F1 hybrid (y-axis) allele-specific expression ratios are plotted against each other. Colors indicate the regulatory category, assigned from the combination of significant statistical tests as described in the Methods. The proportion and count of genes in each regulatory category are shown in the bar plot in the lower right-hand corner.
Figure C.3: Dominance/additivity ratio grouped by regulatory category. Density plots of gene dominance/additivity (D/A) ratios for the three tissues, grouped by regulatory category. There is no obvious shift in the distribution for any tissue or regulatory category, indicating that regulatory category does not appreciably affect overall additivity or dominance.

C.2 Tables
Table C.1: Biological replicates of F1 hybrid and parent inbred lines for the RNAseq expression study. Hybrid replicates are shown in the body of the table and parent inbred replicates around the perimeter.
B73 TIL01 TIL03 TIL05 TIL09 TIL10 TIL11 TIL14 TIL15 TIL25 2/2/2 2/1/1 2/2/2 2/2/1 Inbred 2/2/2 2/2/2 4/2/2 CML103 Ki3 2/2/2 1/2/2 2/2/2 2/2/2 Mo17 Oh43 W22 Inbred 0/2/2 2/2/2 1/2/2 2/2/2 2/2/2 2/2/2 3/2/2 2/2/2 2/1/2 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 2/1/1 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 2/2/2 4/3/2 2/2/2 2/2/2 2/2/2 2/2/2 1/2/2 2/2/2 3/2/2 2/2/2 2/2/2 2/2/2

Table C.2: Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
Adapter # | Adapter name | Barcode sequence | Barcode length
1 | PE YC3 | GCATGT | 5 nt
2 | PE YC4 | TGTGCT | 5 nt
3 | PE YC5 | AGTCAT | 5 nt
4 | PE YC6 | GTAAGT | 5 nt
5 | PE YC7 | TCCTCT | 5 nt
6 | PE YC8 | CAGGTT | 5 nt
7 | PE JM 1 | TCCAT | 4 nt
8 | PE JM 2 | TAGCT | 4 nt
9 | PE JM 3 | GTTCT | 4 nt
10 | PE JM 4 | CGATT | 4 nt
11 | PE TB 1 | ATCGT | 4 nt
12 | PE TB 2 | GCTAT | 4 nt
13 | PE TB 3 | TGGAT | 4 nt
14 | PE TB 4 | ATGCT | 4 nt
15 | PE ZL1 | CACTAT | 5 nt

Table C.3: Number of genomic paired-end reads and genome coverage obtained for constructing pseudo-transcriptomes.
Inbred line | # Reads | Genome coverage (×)
CML103 | 4.46E+08 | 21.24
Ki3 | 4.38E+08 | 19.85
Mo17 | 2.57E+08 | 11.37
Oh43 | 5.59E+08 | 20.56
TI01 | 3.44E+08 | 14.5
TI03 | 3.16E+08 | 13.15
TI05 | 4.76E+08 | 17.8
TI09 | 3.42E+08 | 15.21
TI10 | 5.29E+08 | 24.29
TI11 | 3.41E+08 | 15.97
TI14 | 3.22E+08 | 13.82
TI15 | 5.39E+08 | 24.22
TI25 | 4.27E+08 | 19.93
W22 | 3.07E+08 | 13.19
Average | 4.03E+08 | 17.51

Table C.4: Proportion of divergence due to cis regulatory effects, grouped by overall parental divergence.
Tissue | Gene group¹ | N | Proportion cis ± SE
Ear | All genes | 15939 | 0.4519 ± 0.0021
Ear | 0 to 1 | 14140 | 0.4583 ± 0.0022
Ear | 1 to 2 | 1312 | 0.3918 ± 0.0081
Ear | 2 to 3 | 268 | 0.3524 ± 0.0188
Ear | 3 to 4 | 95 | 0.337 ± 0.0298
Ear | 4 to 5 | 45 | 0.4713 ± 0.0495
Ear | 5+ | 79 | 0.7777 ± 0.0273
Leaf | All genes | 15925 | 0.4164 ± 0.0021
Leaf | 0 to 1 | 13784 | 0.4262 ± 0.0022
Leaf | 1 to 2 | 1739 | 0.3309 ± 0.0065
Leaf | 2 to 3 | 277 | 0.3752 ± 0.0173
Leaf | 3 to 4 | 52 | 0.4458 ± 0.0437
Leaf | 4 to 5 | 21 | 0.6534 ± 0.0566
Leaf | 5+ | 52 | 0.7707 ± 0.0298
Stem | All genes | 16018 | 0.4704 ± 0.0021
Stem | 0 to 1 | 14746 | 0.4715 ± 0.0022
Stem | 1 to 2 | 1000 | 0.4284 ± 0.0096
Stem | 2 to 3 | 149 | 0.4629 ± 0.0233
Stem | 3 to 4 | 40 | 0.5051 ± 0.0539
Stem | 4 to 5 | 23 | 0.6365 ± 0.059
Stem | 5+ | 60 | 0.8081 ± 0.0248
¹ Groups other than "All genes" bin genes by the absolute value of the parental log2(Maize:Teosinte) ratio.

Table C.5: The number of genes for which the maize or teosinte allele is expressed at a higher level.
CCT group | Tissue | Maize | Teosinte
A | Ear | 34 | 9
A | Leaf | 16 | 5
A | Stem | 19 | 8
B | Ear | 319 | 193
B | Leaf | 265 | 159
B | Stem | 249 | 155
C | Ear | 594 | 396
C | Leaf | 533 | 310
C | Stem | 558 | 382
ABC | Ear | 947 | 598
ABC | Leaf | 814 | 474
ABC | Stem | 826 | 545

Table C.6: Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
c b a ABC ABC B73 CML103 Ki3 Mo17 Oh43 W22 non-B73 B73 CML103 Ki3 Mo17 Oh43 W22 non-B73 c b a Stem Leaf Ear Tissue 524 582 555 520 512 546 545 465 556 506 478 477 502 494 569 661 602 605 594 640 606 Teosinte Bias 0 7 2 4 1 1 0 0 6 3 4 0 1 0 1 6 5 12 1 4 0 No Bias 847 739 793 806 857 814 826 823 688 760 765 807 775 794 975 839 915 845 949 889 939 Maize Bias Fisher's Exact Test for B73 versus cumulative non-B73 ratio, p = 0.1821. Fisher's Exact Test for B73 versus cumulative non-B73 ratio, p = 0.2539. Fisher's Exact Test for B73 versus cumulative non-B73 ratio, p = 0.4326. ABC CCT Group B73 CML103 Ki3 Mo17 Oh43 W22 non-B73 Maize Inbred 1.6164 1.2698 1.4288 1.5500 1.6738 1.4908 1.5156 1.7699 1.2374 1.5020 1.6004 1.6918 1.5438 1.6073 1.7135 1.2693 1.5199 1.3967 1.5976 1.3891 1.5495 Maize:Teosinte Ratio

Table C.7: Allele-specific expression variation among F1 hybrids explained by the maize and teosinte parent.
Tissue | Category | R² maize | R² teosinte | Maize/Teosinte | Gene count
Ear | All genes | 32.48% | 38.21% | 85.01% | 13194
Leaf | All genes | 31.76% | 37.18% | 85.43% | 13121
Stem | All genes | 32.04% | 38.56% | 83.09% | 13305
Ear | ABC | 32.25% | 41.37% | 77.96% | 1545
Leaf | ABC | 31.94% | 39.79% | 80.27% | 1288
Stem | ABC | 32.20% | 41.26% | 78.05% | 1371
Ear | AB | 30.76% | 42.95% | 71.64% | 555
Leaf | AB | 30.61% | 41.69% | 73.42% | 445
Stem | AB | 32.28% | 42.22% | 76.45% | 431
Ear | A | 26.58% | 48.86% | 54.41% | 43
Leaf | A | 20.11% | 47.63% | 42.22% | 21
Stem | A | 28.86% | 48.26% | 59.80% | 27

Table C.8: Number of genes for which the maize and/or teosinte parent contributed to the variance among the F1 hybrid gene expression ratios (heterogeneous) and genes for which there was no variance in expression attributable to the maize or teosinte parent (homogeneous).
CCT genes in groups A, B, and C in the three tissue types are shown.

| Tissue | Category | Heterogeneous: maize | Heterogeneous: teosinte | Heterogeneous: maize + teosinte | Homogeneous | Total |
|---|---|---|---|---|---|---|
| Ear | All genes | 1880 | 2959 | 2504 | 5851 | 13194 |
| Leaf | All genes | 1810 | 3005 | 2327 | 5979 | 13121 |
| Stem | All genes | 1924 | 3215 | 2645 | 5521 | 13305 |
| Ear | ABC | 195 | 417 | 350 | 583 | 1545 |
| Leaf | ABC | 165 | 322 | 285 | 516 | 1288 |
| Stem | ABC | 193 | 374 | 321 | 483 | 1371 |
| Ear | AB | 67 | 157 | 120 | 211 | 555 |
| Leaf | AB | 54 | 117 | 104 | 170 | 445 |
| Stem | AB | 57 | 128 | 105 | 141 | 431 |
| Ear | A | 3 | 17 | 5 | 18 | 43 |
| Leaf | A | 1 | 6 | 3 | 11 | 21 |
| Stem | A | 2 | 8 | 7 | 10 | 27 |

Table C.9: Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and microarray assays in groups A, B, and C in the three tissue types.

| Tissue | CCT group | RNAseq class | Observed: microarray NDE | Observed: microarray DE | Expected: microarray NDE | Expected: microarray DE |
|---|---|---|---|---|---|---|
| Ear | A | RNAseq-NDE | 8529 | 136 | 8498.77 | 166.23 |
| Ear | A | RNAseq-DE | 1083 | 52 | 1113.23 | 21.77 |
| Leaf | A | RNAseq-NDE | 8842 | 147 | 8813.36 | 175.64 |
| Leaf | A | RNAseq-DE | 943 | 48 | 971.64 | 19.36 |
| Stem | A | RNAseq-NDE | 8835 | 154 | 8809.58 | 179.42 |
| Stem | A | RNAseq-DE | 985 | 46 | 1010.42 | 20.58 |
| Union | A | RNAseq-NDE | 7970 | 121 | 7926.07 | 164.93 |
| Union | A | RNAseq-DE | 2170 | 90 | 2213.93 | 46.07 |
| Ear | AB | RNAseq-NDE | 9244 | 165 | 9228.50 | 180.50 |
| Ear | AB | RNAseq-DE | 368 | 23 | 383.50 | 7.50 |
| Leaf | AB | RNAseq-NDE | 9482 | 170 | 9463.41 | 188.59 |
| Leaf | AB | RNAseq-DE | 303 | 25 | 321.59 | 6.41 |
| Stem | AB | RNAseq-NDE | 9532 | 175 | 9513.25 | 193.75 |
| Stem | AB | RNAseq-DE | 288 | 25 | 306.75 | 6.25 |
| Union | AB | RNAseq-NDE | 9414 | 163 | 9381.78 | 195.22 |
| Union | AB | RNAseq-DE | 726 | 48 | 758.22 | 15.78 |
| Ear | ABC | RNAseq-NDE | 9587 | 184 | 9583.56 | 187.44 |
| Ear | ABC | RNAseq-DE | 25 | 4 | 28.44 | 0.56 |
| Leaf | ABC | RNAseq-NDE | 9774 | 192 | 9771.27 | 194.73 |
| Leaf | ABC | RNAseq-DE | 11 | 3 | 13.73 | 0.27 |
| Stem | ABC | RNAseq-NDE | 9804 | 198 | 9802.36 | 199.64 |
| Stem | ABC | RNAseq-DE | 16 | 2 | 17.64 | 0.36 |
| Union | ABC | RNAseq-NDE | 10097 | 203 | 10090.04 | 209.96 |
| Union | ABC | RNAseq-DE | 43 | 8 | 49.96 | 1.04 |

Table C.10: Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
| Regulatory category | Ear | Leaf | Stem |
|---|---|---|---|
| Ambiguous | 5.81% | 7.66% | 9.65% |
| Cis + trans | 25.73% | 29.12% | 22.39% |
| Cis only | 26.14% | 28.74% | 30.89% |
| Cis × trans | 6.64% | 6.13% | 8.49% |
| Compensatory | 7.05% | 8.05% | 6.56% |
| Conserved | 13.28% | 8.05% | 12.74% |
| Trans only | 15.35% | 12.26% | 9.27% |
| Total genes | 241 | 261 | 259 |

Table C.11: Fisher's exact tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.

| Overlap | Expected | Observed | p-value |
|---|---|---|---|
| Ear | 13.466 | 19 | 0.1092 |
| Leaf | 11.387 | 14 | 0.4309 |
| Stem | 12.468 | 17 | 0.1755 |
| Union | 27.493 | 34 | 0.1605 |

Table C.12: Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte and proportion in which expression data agrees with methylation status.

| Tissue | Total | A | B | C | Total agree | A agree | B agree | C agree |
|---|---|---|---|---|---|---|---|---|
| Ear | 19 | 1 | 3 | 15 | 57.90% | 100% | 100% | 46.70% |
| Leaf | 14 | 0 | 3 | 11 | 50.00% | NA | 33.30% | 54.50% |
| Stem | 17 | 0 | 3 | 14 | 58.80% | NA | 33.30% | 64.30% |

Table C.13: Characteristics of dominance/additivity (D/A) ratios from a genome-wide analysis, including summary statistics (minimum, maximum, median, and mean) and the average D/A ratio for seven regulatory categories and the CCT candidate lists.

| Statistic | Ear | Leaf | Stem |
|---|---|---|---|
| Min | -10.4557 | -273.675 | -27.8545 |
| Max | 10.56194 | 70.80451 | 78.71309 |
| Median | 0.032991 | 0.160156 | -0.01118 |
| Mean | 0.035682 | 0.211276 | -0.01638 |
| Positive D/A | 6863 | 7385 | 6593 |
| Negative D/A | 6331 | 5736 | 6712 |
| Pos:Neg ratio | 1.084031 | 1.287483 | 0.982271 |
| N | 13194 | 13121 | 13305 |
| Z-test p-value | 2.442e-05 | 1.486e-13 | 0.354 |
| Binomial p-value | 3.775e-06 | 4.741e-47 | 0.306 |
| Ambiguous | -0.00408 | 0.020225 | -0.00841 |
| Cis + trans | -0.00204 | 0.455915 | 0.05871 |
| Cis only | -0.02053 | 0.044602 | 0.063987 |
| Cis × trans | 0.14616 | 0.32702 | -0.16874 |
| Compensatory | 0.052921 | -0.08854 | 0.002721 |
| Conserved | 0.049997 | 0.009092 | -0.05574 |
| Trans only | 0.08708 | 0.382572 | -0.10058 |
| CCT-A | 0.03508 | 0.329661 | 0.026347 |
| CCT-AB | -0.0169 | 0.094785 | 0.129459 |
| CCT-ABC | -0.04257 | 0.208951 | 0.077445 |

Table C.14: Additive and dominant gene counts for the A, AB, and ABC cis-only and trans-only candidate lists.
Dominance cells give the number of genes for which the maize or teosinte allele was dominant, as maize:teosinte. Fisher's exact tests (FET) test whether the degree of dominance/additivity differs between the cis and trans classes. The binomial test (BT) asks whether the numbers of genes with a dominant maize allele and with a dominant teosinte allele are equal; * marks a binomial test p-value < 0.005.

| List | Class | Ear Add | Ear Dom (M:T) | Leaf Add | Leaf Dom (M:T) | Stem Add | Stem Dom (M:T) |
|---|---|---|---|---|---|---|---|
| A | Cis only | 11 | 1:0 | 5 | 1:0 | 3 | 2:1 |
| A | Trans only | 13 | 19:2* | 5 | 4:3 | 2 | 0:2 |
| A | FET (cis vs. trans) | p < 0.005 | | p > 0.05 | | p > 0.05 | |
| AB | Cis only | 95 | 22:18 | 53 | 18:17 | 52 | 19:20 |
| AB | Trans only | 112 | 89:35* | 72 | 81:29* | 23 | 10:13 |
| AB | FET (cis vs. trans) | p < 0.005 | | p < 0.005 | | p > 0.05 | |
| ABC | Cis only | 266 | 62:65 | 136 | 50:56 | 178 | 68:71 |
| ABC | Trans only | 203 | 112:68* | 121 | 107:65* | 67 | 35:42 |
| ABC | FET (cis vs. trans) | p < 0.005 | | p < 0.005 | | p < 0.05 | |

Table C.15: Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.

| Family | Tissue | Assayed genes | Observed overlap | Expected overlap | FET p-value |
|---|---|---|---|---|---|
| AP2 | Ear | 6 | 0 | 0.25 | 1 |
| ARF | Ear | 27 | 4 | 1.14 | 0.03 |
| ARR-B | Ear | 8 | 0 | 0.34 | 1 |
| B3 | Ear | 18 | 1 | 0.76 | 0.54 |
| BBR-BPC | Ear | 4 | 0 | 0.17 | 1 |
| BES1 | Ear | 3 | 0 | 0.13 | 1 |
| bHLH | Ear | 42 | 1 | 1.77 | 0.84 |
| bZIP | Ear | 51 | 0 | 2.15 | 1 |
| C2H2 | Ear | 28 | 2 | 1.18 | 0.33 |
| C3H | Ear | 42 | 1 | 1.77 | 0.84 |
| CAMTA | Ear | 8 | 0 | 0.34 | 1 |
| CO-like | Ear | 3 | 0 | 0.13 | 1 |
| CPP | Ear | 7 | 1 | 0.29 | 0.26 |
| DBB | Ear | 4 | 0 | 0.17 | 1 |
| Dof | Ear | 7 | 0 | 0.29 | 1 |
| E2F/DP | Ear | 10 | 0 | 0.42 | 1 |
| EIL | Ear | 4 | 0 | 0.17 | 1 |
| ERF | Ear | 17 | 0 | 0.72 | 1 |
| FAR1 | Ear | 15 | 2 | 0.63 | 0.13 |
| G2-like | Ear | 11 | 0 | 0.46 | 1 |
| GATA | Ear | 10 | 0 | 0.42 | 1 |
| GeBP | Ear | 14 | 0 | 0.59 | 1 |
| GRAS | Ear | 21 | 1 | 0.88 | 0.59 |
| GRF | Ear | 8 | 0 | 0.34 | 1 |
| HB-other | Ear | 14 | 0 | 0.59 | 1 |
| HB-PHD | Ear | 2 | 0 | 0.08 | 1 |
| HD-ZIP | Ear | 19 | 1 | 0.8 | 0.56 |
| HSF | Ear | 12 | 1 | 0.5 | 0.4 |
| LBD | Ear | 3 | 0 | 0.13 | 1 |
| LFY | Ear | 0 | 0 | 0 | NA |
| LSD | Ear | 3 | 0 | 0.13 | 1 |
| M-type | Ear | 6 | 1 | 0.25 | 0.23 |
| MIKC | Ear | 23 | 2 | 0.97 | 0.25 |
| MYB | Ear | 23 | 2 | 0.97 | 0.25 |
| MYB related | Ear | 42 | 4 | 1.77 | 0.1 |
| NAC | Ear | 25 | 0 | 1.05 | 1 |
| NF-X1 | Ear | 2 | 0 | 0.08 | 1 |
| NF-YA | Ear | 10 | 0 | 0.42 | 1 |
| NF-YB | Ear | 7 | 0 | 0.29 | 1 |
| NF-YC | Ear | 7 | 0 | 0.29 | 1 |
| Nin-like | Ear | 11 | 1 | 0.46 | 0.38 |
| RAV | Ear | 0 | 0 | 0 | NA |
| S1Fa-like | Ear | 0 | 0 | 0 | NA |
| SBP | Ear | 12 | 0 | 0.5 | 1 |
| SRS | Ear | 2 | 0 | 0.08 | 1 |
| STAT | Ear | 1 | 0 | 0.04 | 1 |
| TALE | Ear | 12 | 0 | 0.5 | 1 |
| TCP | Ear | 9 | 0 | 0.38 | 1 |
| Trihelix | Ear | 22 | 0 | 0.93 | 1 |
| VOZ | Ear | 2 | 0 | 0.08 | 1 |
| Whirly | Ear | 2 | 0 | 0.08 | 1 |
| WOX | Ear | 0 | 0 | 0 | NA |
| WRKY | Ear | 20 | 0 | 0.84 | 1 |
| YABBY | Ear | 4 | 0 | 0.17 | 1 |
| ZF-HD | Ear | 1 | 0 | 0.04 | 1 |
| ALL | Ear | 649 | 24 | 27.3 | 0.77 |
| AP2 | Leaf | 8 | 0 | 0.27 | 1 |
| ARF | Leaf | 27 | 0 | 0.92 | 1 |
| ARR-B | Leaf | 8 | 0 | 0.27 | 1 |
| B3 | Leaf | 16 | 1 | 0.54 | 0.42 |
| BBR-BPC | Leaf | 4 | 0 | 0.14 | 1 |
| BES1 | Leaf | 3 | 0 | 0.1 | 1 |
| bHLH | Leaf | 41 | 0 | 1.39 | 1 |
| bZIP | Leaf | 42 | 0 | 1.42 | 1 |
| C2H2 | Leaf | 29 | 2 | 0.98 | 0.26 |
| C3H | Leaf | 41 | 0 | 1.39 | 1 |
| CAMTA | Leaf | 8 | 0 | 0.27 | 1 |
| CO-like | Leaf | 5 | 0 | 0.17 | 1 |
| CPP | Leaf | 7 | 0 | 0.24 | 1 |
| DBB | Leaf | 6 | 0 | 0.2 | 1 |
| Dof | Leaf | 8 | 0 | 0.27 | 1 |
| E2F/DP | Leaf | 10 | 0 | 0.34 | 1 |
| EIL | Leaf | 4 | 0 | 0.14 | 1 |
| ERF | Leaf | 15 | 0 | 0.51 | 1 |
| FAR1 | Leaf | 14 | 0 | 0.47 | 1 |
| G2-like | Leaf | 16 | 0 | 0.54 | 1 |
| GATA | Leaf | 14 | 0 | 0.47 | 1 |
| GeBP | Leaf | 14 | 0 | 0.47 | 1 |
| GRAS | Leaf | 19 | 1 | 0.64 | 0.48 |
| GRF | Leaf | 6 | 0 | 0.2 | 1 |
| HB-other | Leaf | 14 | 1 | 0.47 | 0.38 |
| HB-PHD | Leaf | 2 | 0 | 0.07 | 1 |
| HD-ZIP | Leaf | 16 | 1 | 0.54 | 0.42 |
| HSF | Leaf | 10 | 0 | 0.34 | 1 |
| LBD | Leaf | 1 | 1 | 0.03 | 0.03 |
| LFY | Leaf | 0 | 0 | 0 | NA |
| LSD | Leaf | 3 | 0 | 0.1 | 1 |
| M-type | Leaf | 3 | 0 | 0.1 | 1 |
| MIKC | Leaf | 9 | 2 | 0.31 | 0.04 |
| MYB | Leaf | 31 | 2 | 1.05 | 0.28 |
| MYB related | Leaf | 44 | 1 | 1.49 | 0.78 |
| NAC | Leaf | 28 | 2 | 0.95 | 0.25 |
| NF-X1 | Leaf | 2 | 0 | 0.07 | 1 |
| NF-YA | Leaf | 9 | 0 | 0.31 | 1 |
| NF-YB | Leaf | 5 | 0 | 0.17 | 1 |
| NF-YC | Leaf | 8 | 0 | 0.27 | 1 |
| Nin-like | Leaf | 10 | 0 | 0.34 | 1 |
| RAV | Leaf | 0 | 0 | 0 | NA |
| S1Fa-like | Leaf | 0 | 0 | 0 | NA |
| SBP | Leaf | 11 | 0 | 0.37 | 1 |
| SRS | Leaf | 0 | 0 | 0 | NA |
| STAT | Leaf | 1 | 0 | 0.03 | 1 |
| TALE | Leaf | 12 | 0 | 0.41 | 1 |
| TCP | Leaf | 8 | 0 | 0.27 | 1 |
| Trihelix | Leaf | 22 | 1 | 0.75 | 0.53 |
| VOZ | Leaf | 2 | 0 | 0.07 | 1 |
| Whirly | Leaf | 2 | 0 | 0.07 | 1 |
| WOX | Leaf | 0 | 0 | 0 | NA |
| WRKY | Leaf | 16 | 0 | 0.54 | 1 |
| YABBY | Leaf | 4 | 0 | 0.14 | 1 |
| ZF-HD | Leaf | 1 | 0 | 0.03 | 1 |
| ALL | Leaf | 623 | 15 | 21.13 | 0.94 |
| AP2 | Stem | 8 | 0 | 0.26 | 1 |
| ARF | Stem | 27 | 3 | 0.87 | 0.06 |
| ARR-B | Stem | 8 | 0 | 0.26 | 1 |
| B3 | Stem | 14 | 0 | 0.45 | 1 |
| BBR-BPC | Stem | 4 | 0 | 0.13 | 1 |
| BES1 | Stem | 3 | 0 | 0.1 | 1 |
| bHLH | Stem | 50 | 2 | 1.62 | 0.49 |
| bZIP | Stem | 47 | 1 | 1.52 | 0.79 |
| C2H2 | Stem | 28 | 2 | 0.91 | 0.23 |
| C3H | Stem | 41 | 1 | 1.33 | 0.74 |
| CAMTA | Stem | 8 | 0 | 0.26 | 1 |
| CO-like | Stem | 4 | 0 | 0.13 | 1 |
| CPP | Stem | 7 | 0 | 0.23 | 1 |
| DBB | Stem | 6 | 0 | 0.19 | 1 |
| Dof | Stem | 8 | 0 | 0.26 | 1 |
| E2F/DP | Stem | 10 | 0 | 0.32 | 1 |
| EIL | Stem | 4 | 1 | 0.13 | 0.12 |
| ERF | Stem | 16 | 0 | 0.52 | 1 |
| FAR1 | Stem | 15 | 0 | 0.49 | 1 |
| G2-like | Stem | 14 | 0 | 0.45 | 1 |
| GATA | Stem | 12 | 0 | 0.39 | 1 |
| GeBP | Stem | 13 | 0 | 0.42 | 1 |
| GRAS | Stem | 20 | 0 | 0.65 | 1 |
| GRF | Stem | 7 | 0 | 0.23 | 1 |
| HB-other | Stem | 15 | 2 | 0.49 | 0.08 |
| HB-PHD | Stem | 2 | 0 | 0.06 | 1 |
| HD-ZIP | Stem | 17 | 1 | 0.55 | 0.43 |
| HSF | Stem | 14 | 0 | 0.45 | 1 |
| LBD | Stem | 2 | 0 | 0.06 | 1 |
| LFY | Stem | 0 | 0 | 0 | NA |
| LSD | Stem | 3 | 0 | 0.1 | 1 |
| M-type | Stem | 4 | 1 | 0.13 | 0.12 |
| MIKC | Stem | 10 | 2 | 0.32 | 0.04 |
| MYB | Stem | 23 | 2 | 0.75 | 0.17 |
| MYB related | Stem | 42 | 1 | 1.36 | 0.75 |
| NAC | Stem | 29 | 0 | 0.94 | 1 |
| NF-X1 | Stem | 2 | 0 | 0.06 | 1 |
| NF-YA | Stem | 10 | 1 | 0.32 | 0.28 |
| NF-YB | Stem | 6 | 0 | 0.19 | 1 |
| NF-YC | Stem | 7 | 0 | 0.23 | 1 |
| Nin-like | Stem | 11 | 0 | 0.36 | 1 |
| RAV | Stem | 0 | 0 | 0 | NA |
| S1Fa-like | Stem | 0 | 0 | 0 | NA |
| SBP | Stem | 11 | 0 | 0.36 | 1 |
| SRS | Stem | 2 | 0 | 0.06 | 1 |
| STAT | Stem | 1 | 0 | 0.03 | 1 |
| TALE | Stem | 13 | 0 | 0.42 | 1 |
| TCP | Stem | 6 | 1 | 0.19 | 0.18 |
| Trihelix | Stem | 23 | 0 | 0.75 | 1 |
| VOZ | Stem | 2 | 0 | 0.06 | 1 |
| Whirly | Stem | 2 | 0 | 0.06 | 1 |
| WOX | Stem | 0 | 0 | 0 | NA |
| WRKY | Stem | 19 | 0 | 0.62 | 1 |
| YABBY | Stem | 4 | 0 | 0.13 | 1 |
| ZF-HD | Stem | 0 | 0 | 0 | NA |
| ALL | Stem | 640 | 20 | 20.73 | 0.6 |
| AP2 | Union | 10 | 0 | 0.76 | 1 |
| ARF | Union | 27 | 6 | 2.06 | 0.01 |
| ARR-B | Union | 8 | 0 | 0.61 | 1 |
| B3 | Union | 18 | 2 | 1.38 | 0.41 |
| BBR-BPC | Union | 4 | 0 | 0.31 | 1 |
| BES1 | Union | 3 | 0 | 0.23 | 1 |
| bHLH | Union | 53 | 3 | 4.05 | 0.78 |
| bZIP | Union | 52 | 1 | 3.97 | 0.98 |
| C2H2 | Union | 31 | 4 | 2.37 | 0.21 |
| C3H | Union | 42 | 2 | 3.21 | 0.84 |
| CAMTA | Union | 8 | 0 | 0.61 | 1 |
| CO-like | Union | 5 | 0 | 0.38 | 1 |
| CPP | Union | 7 | 1 | 0.54 | 0.43 |
| DBB | Union | 6 | 0 | 0.46 | 1 |
| Dof | Union | 9 | 0 | 0.69 | 1 |
| E2F/DP | Union | 10 | 0 | 0.76 | 1 |
| EIL | Union | 4 | 1 | 0.31 | 0.27 |
| ERF | Union | 18 | 0 | 1.38 | 1 |
| FAR1 | Union | 15 | 2 | 1.15 | 0.32 |
| G2-like | Union | 18 | 0 | 1.38 | 1 |
| GATA | Union | 15 | 0 | 1.15 | 1 |
| GeBP | Union | 15 | 0 | 1.15 | 1 |
| GRAS | Union | 23 | 2 | 1.76 | 0.53 |
| GRF | Union | 8 | 0 | 0.61 | 1 |
| HB-other | Union | 15 | 2 | 1.15 | 0.32 |
| HB-PHD | Union | 2 | 0 | 0.15 | 1 |
| HD-ZIP | Union | 20 | 3 | 1.53 | 0.19 |
| HSF | Union | 14 | 1 | 1.07 | 0.67 |
| LBD | Union | 3 | 1 | 0.23 | 0.21 |
| LFY | Union | 0 | 0 | 0 | NA |
| LSD | Union | 3 | 0 | 0.23 | 1 |
| M-type | Union | 7 | 2 | 0.54 | 0.09 |
| MIKC | Union | 25 | 5 | 1.91 | 0.04 |
| MYB | Union | 32 | 3 | 2.45 | 0.45 |
| MYB related | Union | 48 | 4 | 3.67 | 0.51 |
| NAC | Union | 35 | 2 | 2.68 | 0.76 |
| NF-X1 | Union | 2 | 0 | 0.15 | 1 |
| NF-YA | Union | 10 | 1 | 0.76 | 0.55 |
| NF-YB | Union | 7 | 0 | 0.54 | 1 |
| NF-YC | Union | 8 | 0 | 0.61 | 1 |
| Nin-like | Union | 11 | 1 | 0.84 | 0.58 |
| RAV | Union | 0 | 0 | 0 | NA |
| S1Fa-like | Union | 0 | 0 | 0 | NA |
| SBP | Union | 12 | 0 | 0.92 | 1 |
| SRS | Union | 2 | 0 | 0.15 | 1 |
| STAT | Union | 1 | 0 | 0.08 | 1 |
| TALE | Union | 14 | 0 | 1.07 | 1 |
| TCP | Union | 9 | 1 | 0.69 | 0.51 |
| Trihelix | Union | 24 | 1 | 1.83 | 0.85 |
| VOZ | Union | 2 | 0 | 0.15 | 1 |
| Whirly | Union | 2 | 0 | 0.15 | 1 |
| WOX | Union | 0 | 0 | 0 | NA |
| WRKY | Union | 23 | 0 | 1.76 | 1 |
| YABBY | Union | 4 | 0 | 0.31 | 1 |
| ZF-HD | Union | 1 | 0 | 0.08 | 1 |
| ALL | Union | 724 | 49 | 55.34 | 0.84 |

Table C.16: Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5-LOD support intervals for QTL from a previous study.

| Trait | Tissue | Assayed genes | Observed overlap | Expected overlap | FET p-value |
|---|---|---|---|---|---|
| BARE | Ear | 0 | 0 | 0 | 1 |
| DIAM | Ear | 29 | 4 | 1.22 | 0.03 |
| DIS | Ear | 4 | 1 | 0.17 | 0.16 |
| DTP | Ear | 10 | 1 | 0.42 | 0.35 |
| GLCO | Ear | 3 | 0 | 0.13 | 1 |
| GLU | Ear | 0 | 0 | 0 | 1 |
| KRN | Ear | 15 | 2 | 0.63 | 0.13 |
| KW | Ear | 17 | 1 | 0.72 | 0.52 |
| LEN | Ear | 4 | 1 | 0.17 | 0.16 |
| PROL | Ear | 5 | 0 | 0.21 | 1 |
| STAM | Ear | 10 | 1 | 0.42 | 0.35 |
| BARE | Leaf | 0 | 0 | 0 | 1 |
| DIAM | Leaf | 28 | 0 | 0.95 | 1 |
| DIS | Leaf | 4 | 1 | 0.14 | 0.13 |
| DTP | Leaf | 9 | 0 | 0.31 | 1 |
| GLCO | Leaf | 3 | 0 | 0.1 | 1 |
| GLU | Leaf | 0 | 0 | 0 | 1 |
| KRN | Leaf | 13 | 0 | 0.44 | 1 |
| KW | Leaf | 17 | 3 | 0.58 | 0.02 |
| LEN | Leaf | 4 | 0 | 0.14 | 1 |
| PROL | Leaf | 5 | 0 | 0.17 | 1 |
| STAM | Leaf | 9 | 1 | 0.31 | 0.27 |
| BARE | Stem | 0 | 0 | 0 | 1 |
| DIAM | Stem | 28 | 1 | 0.91 | 0.6 |
| DIS | Stem | 4 | 0 | 0.13 | 1 |
| DTP | Stem | 10 | 0 | 0.32 | 1 |
| GLCO | Stem | 3 | 0 | 0.1 | 1 |
| GLU | Stem | 0 | 0 | 0 | 1 |
| KRN | Stem | 14 | 1 | 0.45 | 0.37 |
| KW | Stem | 18 | 3 | 0.58 | 0.02 |
| LEN | Stem | 4 | 0 | 0.13 | 1 |
| PROL | Stem | 5 | 0 | 0.16 | 1 |
| STAM | Stem | 10 | 0 | 0.32 | 1 |

Table C.17: Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
| Group¹ | Pathway | Pathway genes | Assayed genes | Overlap (obs) | Overlap (exp) | FET p-value |
|---|---|---|---|---|---|---|
| Ear CCT-A | Alpha-linolenic Acid Metabolism | 26 | 14 | 0 | 0.046 | 1 |
| Ear CCT-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.023 | 1 |
| Ear CCT-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.052 | 1 |
| Ear CCT-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.016 | 1 |
| Ear CCT-A | Ether Lipid Metabolism | 11 | 7 | 0 | 0.023 | 1 |
| Ear CCT-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.062 | 1 |
| Ear CCT-A | Fatty Acid Degradation | 34 | 27 | 0 | 0.088 | 1 |
| Ear CCT-A | Fatty Acid Elongation | 16 | 8 | 0 | 0.026 | 1 |
| Ear CCT-A | Glycerolipid Metabolism | 46 | 31 | 0 | 0.101 | 1 |
| Ear CCT-A | Glycerophospholipid Metabolism | 64 | 45 | 0 | 0.147 | 1 |
| Ear CCT-A | Linoleic Acid Metabolism | 12 | 5 | 0 | 0.016 | 1 |
| Ear CCT-A | Sphingolipid Metabolism | 21 | 13 | 0 | 0.042 | 1 |
| Ear CCT-A | Starch and Sucrose Metabolism | 98 | 59 | 0 | 0.192 | 1 |
| Ear CCT-A | Steroid Biosynthesis | 25 | 15 | 0 | 0.049 | 1 |
| Ear CCT-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.026 | 1 |
| Ear CCT-A | ALL | 353 | 223 | 0 | 0.727 | 1 |
| Ear CCT-AB | Alpha-linolenic Acid Metabolism | 26 | 14 | 1 | 0.589 | 0.452 |
| Ear CCT-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.294 | 1 |
| Ear CCT-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.673 | 1 |
| Ear CCT-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.21 | 1 |
| Ear CCT-AB | Ether Lipid Metabolism | 11 | 7 | 0 | 0.294 | 1 |
| Ear CCT-AB | Fatty Acid Biosynthesis | 32 | 19 | 2 | 0.799 | 0.189 |
| Ear CCT-AB | Fatty Acid Degradation | 34 | 27 | 1 | 1.136 | 0.687 |
| Ear CCT-AB | Fatty Acid Elongation | 16 | 8 | 0 | 0.337 | 1 |
| Ear CCT-AB | Glycerolipid Metabolism | 46 | 31 | 0 | 1.304 | 1 |
| Ear CCT-AB | Glycerophospholipid Metabolism | 64 | 45 | 0 | 1.893 | 1 |
| Ear CCT-AB | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.21 | 0.193 |
| Ear CCT-AB | Sphingolipid Metabolism | 21 | 13 | 0 | 0.547 | 1 |
| Ear CCT-AB | Starch and Sucrose Metabolism | 98 | 59 | 3 | 2.482 | 0.454 |
| Ear CCT-AB | Steroid Biosynthesis | 25 | 15 | 1 | 0.631 | 0.475 |
| Ear CCT-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.337 | 0.291 |
| Ear CCT-AB | ALL | 353 | 223 | 8 | 9.38 | 0.726 |
| Ear CCT-ABC | Alpha-linolenic Acid Metabolism | 26 | 14 | 5 | 1.639 | 0.018 |
| Ear CCT-ABC | Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.82 | 0.582 |
| Ear CCT-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 2 | 1.874 | 0.575 |
| Ear CCT-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.585 | 1 |
| Ear CCT-ABC | Ether Lipid Metabolism | 11 | 7 | 1 | 0.82 | 0.582 |
| Ear CCT-ABC | Fatty Acid Biosynthesis | 32 | 19 | 3 | 2.225 | 0.388 |
| Ear CCT-ABC | Fatty Acid Degradation | 34 | 27 | 5 | 3.162 | 0.203 |
| Ear CCT-ABC | Fatty Acid Elongation | 16 | 8 | 0 | 0.937 | 1 |
| Ear CCT-ABC | Glycerolipid Metabolism | 46 | 31 | 3 | 3.63 | 0.721 |
| Ear CCT-ABC | Glycerophospholipid Metabolism | 64 | 45 | 5 | 5.269 | 0.619 |
| Ear CCT-ABC | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.585 | 0.464 |
| Ear CCT-ABC | Sphingolipid Metabolism | 21 | 13 | 0 | 1.522 | 1 |
| Ear CCT-ABC | Starch and Sucrose Metabolism | 98 | 59 | 7 | 6.909 | 0.545 |
| Ear CCT-ABC | Steroid Biosynthesis | 25 | 15 | 2 | 1.756 | 0.539 |
| Ear CCT-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.937 | 0.631 |
| Ear CCT-ABC | ALL | 353 | 223 | 28 | 26.113 | 0.376 |
| Ear trans-A | Alpha-linolenic Acid Metabolism | 26 | 14 | 1 | 0.062 | 0.06 |
| Ear trans-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.031 | 1 |
| Ear trans-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.07 | 1 |
| Ear trans-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.022 | 1 |
| Ear trans-A | Ether Lipid Metabolism | 11 | 7 | 1 | 0.031 | 0.03 |
| Ear trans-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.084 | 1 |
| Ear trans-A | Fatty Acid Degradation | 34 | 27 | 0 | 0.119 | 1 |
| Ear trans-A | Fatty Acid Elongation | 16 | 8 | 0 | 0.035 | 1 |
| Ear trans-A | Glycerolipid Metabolism | 46 | 31 | 0 | 0.136 | 1 |
| Ear trans-A | Glycerophospholipid Metabolism | 64 | 45 | 2 | 0.198 | 0.017 |
| Ear trans-A | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.022 | 0.022 |
| Ear trans-A | Sphingolipid Metabolism | 21 | 13 | 0 | 0.057 | 1 |
| Ear trans-A | Starch and Sucrose Metabolism | 98 | 59 | 2 | 0.259 | 0.028 |
| Ear trans-A | Steroid Biosynthesis | 25 | 15 | 0 | 0.066 | 1 |
| Ear trans-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.035 | 1 |
| Ear trans-A | ALL | 353 | 223 | 5 | 0.98 | 0.003 |
| Ear trans-AB | Alpha-linolenic Acid Metabolism | 26 | 14 | 2 | 0.506 | 0.089 |
| Ear trans-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.253 | 1 |
| Ear trans-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 1 | 0.578 | 0.445 |
| Ear trans-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 2 | 0.181 | 0.012 |
| Ear trans-AB | Ether Lipid Metabolism | 11 | 7 | 1 | 0.253 | 0.227 |
| Ear trans-AB | Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.687 | 0.503 |
| Ear trans-AB | Fatty Acid Degradation | 34 | 27 | 2 | 0.976 | 0.255 |
| Ear trans-AB | Fatty Acid Elongation | 16 | 8 | 0 | 0.289 | 1 |
| Ear trans-AB | Glycerolipid Metabolism | 46 | 31 | 4 | 1.121 | 0.025 |
| Ear trans-AB | Glycerophospholipid Metabolism | 64 | 45 | 5 | 1.627 | 0.023 |
| Ear trans-AB | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.181 | 0.168 |
| Ear trans-AB | Sphingolipid Metabolism | 21 | 13 | 0 | 0.47 | 1 |
| Ear trans-AB | Starch and Sucrose Metabolism | 98 | 59 | 3 | 2.133 | 0.36 |
| Ear trans-AB | Steroid Biosynthesis | 25 | 15 | 0 | 0.542 | 1 |
| Ear trans-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.289 | 1 |
| Ear trans-AB | ALL | 353 | 223 | 15 | 8.062 | 0.016 |
| Ear trans-ABC | Alpha-linolenic Acid Metabolism | 26 | 14 | 2 | 1.213 | 0.345 |
| Ear trans-ABC | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.606 | 1 |
| Ear trans-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 1 | 1.386 | 0.766 |
| Ear trans-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 2 | 0.433 | 0.063 |
| Ear trans-ABC | Ether Lipid Metabolism | 11 | 7 | 2 | 0.606 | 0.118 |
| Ear trans-ABC | Fatty Acid Biosynthesis | 32 | 19 | 1 | 1.646 | 0.821 |
| Ear trans-ABC | Fatty Acid Degradation | 34 | 27 | 3 | 2.339 | 0.418 |
| Ear trans-ABC | Fatty Acid Elongation | 16 | 8 | 1 | 0.693 | 0.516 |
| Ear trans-ABC | Glycerolipid Metabolism | 46 | 31 | 6 | 2.686 | 0.047 |
| Ear trans-ABC | Glycerophospholipid Metabolism | 64 | 45 | 7 | 3.898 | 0.09 |
| Ear trans-ABC | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.433 | 0.364 |
| Ear trans-ABC | Sphingolipid Metabolism | 21 | 13 | 0 | 1.126 | 1 |
| Ear trans-ABC | Starch and Sucrose Metabolism | 98 | 59 | 5 | 5.111 | 0.588 |
| Ear trans-ABC | Steroid Biosynthesis | 25 | 15 | 2 | 1.299 | 0.378 |
| Ear trans-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.693 | 1 |
| Ear trans-ABC | ALL | 353 | 223 | 23 | 19.319 | 0.218 |
| Leaf CCT-A | Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.021 | 1 |
| Leaf CCT-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.011 | 1 |
| Leaf CCT-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.03 | 1 |
| Leaf CCT-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.01 | 1 |
| Leaf CCT-A | Ether Lipid Metabolism | 11 | 7 | 0 | 0.011 | 1 |
| Leaf CCT-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.03 | 1 |
| Leaf CCT-A | Fatty Acid Degradation | 34 | 30 | 0 | 0.048 | 1 |
| Leaf CCT-A | Fatty Acid Elongation | 16 | 9 | 0 | 0.014 | 1 |
| Leaf CCT-A | Glycerolipid Metabolism | 46 | 34 | 0 | 0.054 | 1 |
| Leaf CCT-A | Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.075 | 1 |
| Leaf CCT-A | Linoleic Acid Metabolism | 12 | 5 | 0 | 0.008 | 1 |
| Leaf CCT-A | Sphingolipid Metabolism | 21 | 14 | 0 | 0.022 | 1 |
| Leaf CCT-A | Starch and Sucrose Metabolism | 98 | 62 | 0 | 0.099 | 1 |
| Leaf CCT-A | Steroid Biosynthesis | 25 | 15 | 0 | 0.024 | 1 |
| Leaf CCT-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.013 | 1 |
| Leaf CCT-A | ALL | 353 | 236 | 0 | 0.378 | 1 |
| Leaf CCT-AB | Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.441 | 1 |
| Leaf CCT-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.237 | 1 |
| Leaf CCT-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 2 | 0.644 | 0.134 |
| Leaf CCT-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 2 | 0.203 | 0.016 |
| Leaf CCT-AB | Ether Lipid Metabolism | 11 | 7 | 1 | 0.237 | 0.215 |
| Leaf CCT-AB | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.644 | 1 |
| Leaf CCT-AB | Fatty Acid Degradation | 34 | 30 | 2 | 1.017 | 0.271 |
| Leaf CCT-AB | Fatty Acid Elongation | 16 | 9 | 0 | 0.305 | 1 |
| Leaf CCT-AB | Glycerolipid Metabolism | 46 | 34 | 1 | 1.153 | 0.691 |
| Leaf CCT-AB | Glycerophospholipid Metabolism | 64 | 47 | 2 | 1.594 | 0.477 |
| Leaf CCT-AB | Linoleic Acid Metabolism | 12 | 5 | 0 | 0.17 | 1 |
| Leaf CCT-AB | Sphingolipid Metabolism | 21 | 14 | 0 | 0.475 | 1 |
| Leaf CCT-AB | Starch and Sucrose Metabolism | 98 | 62 | 1 | 2.103 | 0.883 |
| Leaf CCT-AB | Steroid Biosynthesis | 25 | 15 | 0 | 0.509 | 1 |
| Leaf CCT-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.271 | 0.241 |
| Leaf CCT-AB | ALL | 353 | 236 | 9 | 8.004 | 0.408 |
| Leaf CCT-ABC | Alpha-linolenic Acid Metabolism | 26 | 13 | 1 | 1.276 | 0.739 |
| Leaf CCT-ABC | Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.687 | 0.515 |
| Leaf CCT-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 3 | 1.865 | 0.285 |
| Leaf CCT-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 2 | 0.589 | 0.111 |
| Leaf CCT-ABC | Ether Lipid Metabolism | 11 | 7 | 1 | 0.687 | 0.515 |
| Leaf CCT-ABC | Fatty Acid Biosynthesis | 32 | 19 | 2 | 1.865 | 0.569 |
| Leaf CCT-ABC | Fatty Acid Degradation | 34 | 30 | 7 | 2.945 | 0.023 |
| Leaf CCT-ABC | Fatty Acid Elongation | 16 | 9 | 0 | 0.883 | 1 |
| Leaf CCT-ABC | Glycerolipid Metabolism | 46 | 34 | 4 | 3.338 | 0.432 |
| Leaf CCT-ABC | Glycerophospholipid Metabolism | 64 | 47 | 4 | 4.614 | 0.691 |
| Leaf CCT-ABC | Linoleic Acid Metabolism | 12 | 5 | 0 | 0.491 | 1 |
| Leaf CCT-ABC | Sphingolipid Metabolism | 21 | 14 | 0 | 1.374 | 1 |
| Leaf CCT-ABC | Starch and Sucrose Metabolism | 98 | 62 | 6 | 6.086 | 0.577 |
| Leaf CCT-ABC | Steroid Biosynthesis | 25 | 15 | 1 | 1.472 | 0.788 |
| Leaf CCT-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.785 | 0.563 |
| Leaf CCT-ABC | ALL | 353 | 236 | 26 | 23.167 | 0.296 |
| Leaf trans-A | Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.026 | 1 |
| Leaf trans-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.014 | 1 |
| Leaf trans-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.038 | 1 |
| Leaf trans-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.012 | 1 |
| Leaf trans-A | Ether Lipid Metabolism | 11 | 7 | 0 | 0.014 | 1 |
| Leaf trans-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.038 | 1 |
| Leaf trans-A | Fatty Acid Degradation | 34 | 30 | 0 | 0.059 | 1 |
| Leaf trans-A | Fatty Acid Elongation | 16 | 9 | 0 | 0.018 | 1 |
| Leaf trans-A | Glycerolipid Metabolism | 46 | 34 | 0 | 0.067 | 1 |
| Leaf trans-A | Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.093 | 1 |
| Leaf trans-A | Linoleic Acid Metabolism | 12 | 5 | 0 | 0.01 | 1 |
| Leaf trans-A | Sphingolipid Metabolism | 21 | 14 | 0 | 0.028 | 1 |
| Leaf trans-A | Starch and Sucrose Metabolism | 98 | 62 | 0 | 0.123 | 1 |
| Leaf trans-A | Steroid Biosynthesis | 25 | 15 | 0 | 0.03 | 1 |
| Leaf trans-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.016 | 1 |
| Leaf trans-A | ALL | 353 | 236 | 0 | 0.468 | 1 |
| Leaf trans-AB | Alpha-linolenic Acid Metabolism | 26 | 13 | 1 | 0.447 | 0.365 |
| Leaf trans-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.241 | 1 |
| Leaf trans-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.653 | 1 |
| Leaf trans-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.206 | 1 |
| Leaf trans-AB | Ether Lipid Metabolism | 11 | 7 | 0 | 0.241 | 1 |
| Leaf trans-AB | Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.653 | 0.486 |
| Leaf trans-AB | Fatty Acid Degradation | 34 | 30 | 0 | 1.031 | 1 |
| Leaf trans-AB | Fatty Acid Elongation | 16 | 9 | 0 | 0.309 | 1 |
| Leaf trans-AB | Glycerolipid Metabolism | 46 | 34 | 0 | 1.169 | 1 |
| Leaf trans-AB | Glycerophospholipid Metabolism | 64 | 47 | 0 | 1.616 | 1 |
| Leaf trans-AB | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.172 | 0.16 |
| Leaf trans-AB | Sphingolipid Metabolism | 21 | 14 | 1 | 0.481 | 0.387 |
| Leaf trans-AB | Starch and Sucrose Metabolism | 98 | 62 | 2 | 2.131 | 0.634 |
| Leaf trans-AB | Steroid Biosynthesis | 25 | 15 | 1 | 0.516 | 0.408 |
| Leaf trans-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.275 | 1 |
| Leaf trans-AB | ALL | 353 | 236 | 6 | 8.112 | 0.826 |
| Leaf trans-ABC | Alpha-linolenic Acid Metabolism | 26 | 13 | 2 | 1.212 | 0.345 |
| Leaf trans-ABC | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.652 | 1 |
| Leaf trans-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 1 | 1.771 | 0.844 |
| Leaf trans-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.559 | 1 |
| Leaf trans-ABC | Ether Lipid Metabolism | 11 | 7 | 0 | 0.652 | 1 |
| Leaf trans-ABC | Fatty Acid Biosynthesis | 32 | 19 | 2 | 1.771 | 0.54 |
| Leaf trans-ABC | Fatty Acid Degradation | 34 | 30 | 3 | 2.796 | 0.539 |
| Leaf trans-ABC | Fatty Acid Elongation | 16 | 9 | 0 | 0.839 | 1 |
| Leaf trans-ABC | Glycerolipid Metabolism | 46 | 34 | 2 | 3.169 | 0.839 |
| Leaf trans-ABC | Glycerophospholipid Metabolism | 64 | 47 | 3 | 4.381 | 0.827 |
| Leaf trans-ABC | Linoleic Acid Metabolism | 12 | 5 | 1 | 0.466 | 0.387 |
| Leaf trans-ABC | Sphingolipid Metabolism | 21 | 14 | 1 | 1.305 | 0.746 |
| Leaf trans-ABC | Starch and Sucrose Metabolism | 98 | 62 | 3 | 5.779 | 0.937 |
| Leaf trans-ABC | Steroid Biosynthesis | 25 | 15 | 3 | 1.398 | 0.158 |
| Leaf trans-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.746 | 0.543 |
| Leaf trans-ABC | ALL | 353 | 236 | 17 | 21.997 | 0.897 |
| Stem CCT-A | Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.03 | 1 |
| Stem CCT-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.014 | 1 |
| Stem CCT-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.034 | 1 |
| Stem CCT-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.012 | 1 |
| Stem CCT-A | Ether Lipid Metabolism | 11 | 7 | 0 | 0.014 | 1 |
| Stem CCT-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.039 | 1 |
| Stem CCT-A | Fatty Acid Degradation | 34 | 30 | 0 | 0.061 | 1 |
| Stem CCT-A | Fatty Acid Elongation | 16 | 8 | 0 | 0.016 | 1 |
| Stem CCT-A | Glycerolipid Metabolism | 46 | 32 | 0 | 0.065 | 1 |
| Stem CCT-A | Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.095 | 1 |
| Stem CCT-A | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.012 | 1 |
| Stem CCT-A | Sphingolipid Metabolism | 21 | 14 | 0 | 0.028 | 1 |
| Stem CCT-A | Starch and Sucrose Metabolism | 98 | 61 | 1 | 0.124 | 0.117 |
| Stem CCT-A | Steroid Biosynthesis | 25 | 16 | 0 | 0.032 | 1 |
| Stem CCT-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.016 | 1 |
| Stem CCT-A | ALL | 353 | 235 | 1 | 0.477 | 0.382 |
| Stem CCT-AB | Alpha-linolenic Acid Metabolism | 26 | 15 | 1 | 0.486 | 0.39 |
| Stem CCT-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.227 | 1 |
| Stem CCT-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 1 | 0.551 | 0.429 |
| Stem CCT-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.194 | 1 |
| Stem CCT-AB | Ether Lipid Metabolism | 11 | 7 | 1 | 0.227 | 0.206 |
| Stem CCT-AB | Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.615 | 0.465 |
| Stem CCT-AB | Fatty Acid Degradation | 34 | 30 | 1 | 0.972 | 0.628 |
| Stem CCT-AB | Fatty Acid Elongation | 16 | 8 | 0 | 0.259 | 1 |
| Stem CCT-AB | Glycerolipid Metabolism | 46 | 32 | 0 | 1.037 | 1 |
| Stem CCT-AB | Glycerophospholipid Metabolism | 64 | 47 | 2 | 1.523 | 0.453 |
| Stem CCT-AB | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.194 | 1 |
| Stem CCT-AB | Sphingolipid Metabolism | 21 | 14 | 0 | 0.454 | 1 |
| Stem CCT-AB | Starch and Sucrose Metabolism | 98 | 61 | 1 | 1.976 | 0.866 |
| Stem CCT-AB | Steroid Biosynthesis | 25 | 16 | 1 | 0.518 | 0.41 |
| Stem CCT-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.259 | 0.232 |
| Stem CCT-AB | ALL | 353 | 235 | 8 | 7.613 | 0.494 |
| Stem CCT-ABC | Alpha-linolenic Acid Metabolism | 26 | 15 | 3 | 1.546 | 0.196 |
| Stem CCT-ABC | Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.721 | 0.533 |
| Stem CCT-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 2 | 1.752 | 0.535 |
| Stem CCT-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.618 | 1 |
| Stem CCT-ABC | Ether Lipid Metabolism | 11 | 7 | 2 | 0.721 | 0.157 |
| Stem CCT-ABC | Fatty Acid Biosynthesis | 32 | 19 | 1 | 1.958 | 0.874 |
| Stem CCT-ABC | Fatty Acid Degradation | 34 | 30 | 4 | 3.091 | 0.374 |
| Stem CCT-ABC | Fatty Acid Elongation | 16 | 8 | 1 | 0.824 | 0.581 |
| Stem CCT-ABC | Glycerolipid Metabolism | 46 | 32 | 1 | 3.297 | 0.969 |
| Stem CCT-ABC | Glycerophospholipid Metabolism | 64 | 47 | 9 | 4.843 | 0.048 |
| Stem CCT-ABC | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.618 | 1 |
| Stem CCT-ABC | Sphingolipid Metabolism | 21 | 14 | 0 | 1.443 | 1 |
| Stem CCT-ABC | Starch and Sucrose Metabolism | 98 | 61 | 9 | 6.286 | 0.172 |
| Stem CCT-ABC | Steroid Biosynthesis | 25 | 16 | 1 | 1.649 | 0.825 |
| Stem CCT-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.824 | 0.581 |
| Stem CCT-ABC | ALL | 353 | 235 | 29 | 24.215 | 0.176 |
| Stem trans-A | Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.006 | 1 |
| Stem trans-A | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.003 | 1 |
| Stem trans-A | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.006 | 1 |
| Stem trans-A | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.002 | 1 |
| Stem trans-A | Ether Lipid Metabolism | 11 | 7 | 0 | 0.003 | 1 |
| Stem trans-A | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.007 | 1 |
| Stem trans-A | Fatty Acid Degradation | 34 | 30 | 0 | 0.011 | 1 |
| Stem trans-A | Fatty Acid Elongation | 16 | 8 | 0 | 0.003 | 1 |
| Stem trans-A | Glycerolipid Metabolism | 46 | 32 | 0 | 0.012 | 1 |
| Stem trans-A | Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.018 | 1 |
| Stem trans-A | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.002 | 1 |
| Stem trans-A | Sphingolipid Metabolism | 21 | 14 | 0 | 0.005 | 1 |
| Stem trans-A | Starch and Sucrose Metabolism | 98 | 61 | 0 | 0.023 | 1 |
| Stem trans-A | Steroid Biosynthesis | 25 | 16 | 0 | 0.006 | 1 |
| Stem trans-A | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.003 | 1 |
| Stem trans-A | ALL | 353 | 235 | 0 | 0.088 | 1 |
| Stem trans-AB | Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.168 | 1 |
| Stem trans-AB | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.078 | 1 |
| Stem trans-AB | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.19 | 1 |
| Stem trans-AB | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.067 | 1 |
| Stem trans-AB | Ether Lipid Metabolism | 11 | 7 | 0 | 0.078 | 1 |
| Stem trans-AB | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.213 | 1 |
| Stem trans-AB | Fatty Acid Degradation | 34 | 30 | 0 | 0.336 | 1 |
| Stem trans-AB | Fatty Acid Elongation | 16 | 8 | 1 | 0.09 | 0.086 |
| Stem trans-AB | Glycerolipid Metabolism | 46 | 32 | 0 | 0.358 | 1 |
| Stem trans-AB | Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.526 | 1 |
| Stem trans-AB | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.067 | 1 |
| Stem trans-AB | Sphingolipid Metabolism | 21 | 14 | 0 | 0.157 | 1 |
| Stem trans-AB | Starch and Sucrose Metabolism | 98 | 61 | 1 | 0.683 | 0.498 |
| Stem trans-AB | Steroid Biosynthesis | 25 | 16 | 0 | 0.179 | 1 |
| Stem trans-AB | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 0 | 0.09 | 1 |
| Stem trans-AB | ALL | 353 | 235 | 2 | 2.632 | 0.743 |
| Stem trans-ABC | Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.601 | 1 |
| Stem trans-ABC | Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.28 | 1 |
| Stem trans-ABC | Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.681 | 1 |
| Stem trans-ABC | Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.24 | 1 |
| Stem trans-ABC | Ether Lipid Metabolism | 11 | 7 | 0 | 0.28 | 1 |
| Stem trans-ABC | Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.761 | 1 |
| Stem trans-ABC | Fatty Acid Degradation | 34 | 30 | 1 | 1.202 | 0.707 |
| Stem trans-ABC | Fatty Acid Elongation | 16 | 8 | 1 | 0.32 | 0.279 |
| Stem trans-ABC | Glycerolipid Metabolism | 46 | 32 | 1 | 1.282 | 0.73 |
| Stem trans-ABC | Glycerophospholipid Metabolism | 64 | 47 | 1 | 1.883 | 0.854 |
| Stem trans-ABC | Linoleic Acid Metabolism | 12 | 6 | 0 | 0.24 | 1 |
| Stem trans-ABC | Sphingolipid Metabolism | 21 | 14 | 0 | 0.561 | 1 |
| Stem trans-ABC | Starch and Sucrose Metabolism | 98 | 61 | 4 | 2.444 | 0.228 |
| Stem trans-ABC | Steroid Biosynthesis | 25 | 16 | 1 | 0.641 | 0.48 |
| Stem trans-ABC | Synthesis and Degradation of Ketone Bodies | 8 | 8 | 1 | 0.32 | 0.279 |
| Stem trans-ABC | ALL | 353 | 235 | 8 | 9.414 | 0.73 |

¹ Tissue, candidate list, and list level.

Table C.18: Significantly enriched and depleted GO terms from CCT and trans-only gene lists, including tissue, group, accession, description, counts, rate of occurrence, and FDR-corrected p-values.

| Group¹ | GO description | Cand. genes in acc. | Genes in acc. | Prop. cand. genes | Prop. assayed genes | FDR |
|---|---|---|---|---|---|---|
| Leaf CCT-ABC | chloroplast | 135 | 937 | 0.144 | 0.071 | 0.002 |
| Leaf CCT-ABC | plastid | 146 | 1062 | 0.137 | 0.081 | 0.007 |
| Leaf CCT-ABC | thylakoid | 35 | 171 | 0.205 | 0.013 | 0.012 |
| Leaf CCT-ABC | chloroplast thylakoid membrane | 26 | 115 | 0.226 | 0.009 | 0.017 |
| Leaf CCT-ABC | DNA binding² | 43 | 771 | 0.056 | 0.059 | 0.016 |
| Ear trans-A | chlorophyll biosynthetic process | 3 | 13 | 0.231 | 0.001 | 0.027 |
| Ear trans-AB | nucleic acid binding transcription factor activity | 26 | 228 | 0.114 | 0.017 | 0 |
| Ear trans-AB | sequence-specific DNA binding transcription factor activity | 26 | 228 | 0.114 | 0.017 | 0 |
| Ear trans-AB | regulation of transcription, DNA-dependent | 40 | 474 | 0.084 | 0.036 | 0 |
| Ear trans-AB | sequence-specific DNA binding | 20 | 160 | 0.125 | 0.012 | 0 |
| Ear trans-AB | chlorophyll biosynthetic process | 6 | 13 | 0.462 | 0.001 | 0.001 |
| Ear trans-AB | DNA binding | 49 | 798 | 0.061 | 0.06 | 0.013 |
| Ear trans-AB | biological process | 273 | 6578 | 0.042 | 0.499 | 0.022 |
| Ear trans-ABC | nucleic acid binding transcription factor activity | 45 | 228 | 0.197 | 0.017 | 0 |
| Ear trans-ABC | sequence-specific DNA binding transcription factor activity | 45 | 228 | 0.197 | 0.017 | 0 |
| Ear trans-ABC | regulation of transcription, DNA-dependent | 69 | 474 | 0.146 | 0.036 | 0.003 |
| Ear trans-ABC | chlorophyll biosynthetic process | 7 | 13 | 0.538 | 0.001 | 0.012 |
| Ear trans-ABC | sequence-specific DNA binding | 29 | 160 | 0.181 | 0.012 | 0.02 |
| Leaf trans-AB | ribosome | 30 | 294 | 0.102 | 0.022 | 0 |
| Leaf trans-AB | cell division | 11 | 77 | 0.143 | 0.006 | 0.03 |
| Leaf trans-AB | microtubule | 9 | 60 | 0.15 | 0.005 | 0.048 |
| Leaf trans-AB | structural constituent of ribosome | 20 | 224 | 0.089 | 0.017 | 0.048 |
| Leaf trans-ABC | cell division | 21 | 77 | 0.273 | 0.006 | 0.005 |

¹ Tissue, candidate list, and list level.
² Under-represented GO term.

Appendix D
Characterization of domestication traits for selection candidate gene Zea agamous2

D.1 Foreword
This appendix details unpublished work characterizing a selected maize gene, Zea agamous2 (zag2), a homolog of the Arabidopsis thaliana gene Agamous. I carried out this work, with other members of the Doebley Lab contributing to genotyping and phenotyping efforts.

D.2 Introduction
Multiple studies have sought to identify the signature of selection, both artificial and natural, in evolving species [20, 35, 111, 112]. In maize, recent studies have examined the signature of selection both on a gene-by-gene basis [113, 114] and in genome-wide scans [55]. The phenotypic impact of a gene that was under selection during domestication can be difficult to interpret, because population genetic analyses inherently lack any association with phenotype. Protein domain annotation and gene ontology tools can give some indication of a selected gene's phenotypic effect, but a concrete, empirically supported association between the gene and a phenotype is still desired.
One gene identified as a target of artificial selection in a recent study [114] is a known homolog of Agamous from Arabidopsis thaliana. This Agamous homolog (Zea agamous2, or zag2) is located on chromosome 3 at ∼137.2 Mb (AGPv2). The translated zag2 protein is 258 amino acids long. Downstream of the highly conserved MADS-box domain, it shares approximately 45% identity and 60% similarity with the Arabidopsis Agamous protein [115]. In Arabidopsis thaliana, Agamous expression is associated with the carpels of the flower. In maize, zag2 appears to be expressed exclusively in the carpels of developing ears [116]. Because zag2 mRNA is expressed in developing ears, zag2 likely affects domestication phenotypes of the female inflorescence.

Our study of zag2 used two approaches. First, we generated a set of recombinant chromosome near-isogenic lines (RCNILs) with recombination breakpoints between zag2 and either the nearest upstream gene or the nearest downstream gene. RCNILs were genotyped with three markers (upstream of zag2, at the gene, and downstream of zag2) to locate each recombination breakpoint relative to zag2. The lines were then planted and phenotyped in multiple environments for a large number of traits. These traits focused on the ear but also included other plant and tassel traits. Second, a transgenic RNAi construct carrying a portion of the zag2 gene was transformed into maize and backcrossed to two maize inbred lines. We used percent fill as a proxy for sterility and tested for presence of the construct by resistance to the herbicide BASTA. Neither experiment produced evidence of a concrete link between a domestication phenotype and zag2.

D.3 Methods
D.3.1 RCNILs
In the winter and summer of 2009, we screened 1,710 individuals drawn from a heterogeneous inbred family that was heterozygous at zag2. The markers used were umc1102 and PZD00100.
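Each line is genotyped at three positions: upstream of zag2, at zag2, and downstream of zag2. Those three genotypes place the breakpoint and assign the line to one of the four recombinant NIL classes or two control NIL classes described in these Methods. The sketch below illustrates that logic in Python; it is not the lab's actual pipeline. It assumes fully homozygous lines with hypothetical genotype codes, "M" for maize and "T" for teosinte. The function name and class labels are illustrative, and the text does not give their correspondence to the t1 to t4 labels in Figure D.1.

```python
from typing import Optional

# Hypothetical genotype codes for homozygous lines:
# "M" = maize allele, "T" = teosinte allele.
VALID_CODES = {"M", "T"}


def classify_rcnil(upstream: str, at_zag2: str, downstream: str) -> Optional[str]:
    """Assign a line to a NIL class from its three marker genotypes.

    Returns None for patterns that require two crossovers in the
    interval (e.g. "T", "M", "T"), which a single-breakpoint RCNIL
    should not show.
    """
    for code in (upstream, at_zag2, downstream):
        if code not in VALID_CODES:
            raise ValueError(f"unexpected genotype code: {code!r}")

    # The genotype at zag2 itself defines the maize/teosinte half of the class.
    origin = "maize" if at_zag2 == "M" else "teosinte"

    if upstream == at_zag2 == downstream:
        # No breakpoint in the interval: a control NIL.
        return f"{origin} control"
    if upstream != at_zag2 and at_zag2 == downstream:
        # Single crossover between the upstream marker and zag2.
        return f"upstream breakpoint, {origin} at zag2"
    if upstream == at_zag2 and at_zag2 != downstream:
        # Single crossover between zag2 and the downstream marker.
        return f"downstream breakpoint, {origin} at zag2"
    # Both flanking markers differ from zag2: a double crossover.
    return None
```

There are eight possible three-marker genotypes. Enumerating them yields exactly six classes (four recombinant, two control). The two remaining patterns, M-T-M and T-M-T, are rejected because a line with a single breakpoint cannot produce them.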
From this screen, we identified thirteen individuals with recombination breakpoints between the upstream and downstream genes. Recombinant individuals were selfed, and their progeny were genotyped with the same markers in the winter 2010 season. Homozygous individuals were identified and selfed again to produce the founding members of the RCNILs. RCNIL seed was then used in subsequent summers for seed increase and replicated field block trials. Genomic DNA extracted from the founding RCNIL individuals was also genotyped at the zag2 coding sequence with PZD00013.3 (a TaqMan SNP marker) and ZHL0285-ZHL0286 (an indel marker). We classified RCNILs by the location of the breakpoint (upstream or downstream of zag2) and by the genotype at zag2 (maize or teosinte). This gave four recombinant NIL classes and two control NIL classes.

Phenotyping blocks contained the RCNILs and several control NILs that were homozygous maize or homozygous teosinte across the entire zag2 region. Lines were planted in randomized twelve-plant plots in four blocks in each of the summers of 2010 and 2011 at the West Madison Agricultural Research Station (WMARS). Thirteen plant architecture traits and seven ear traits (Table D.1) were measured on up to five plants per phenotyping block. Phenotype measurements were fit to a basic linear mixed model (Equation D.1) in R [91] using the lme4 package. The model included only the RCNIL line as a fixed effect ($a_i$) and the block as a random effect ($b_j$). We used this simple model because the blocks were small, so positional variation within blocks (X and Y field position) seemed unlikely to be significant.

$$y_{ijk} = \mu + a_i + b_j + e_{ijk} \tag{D.1}$$

Here $y_{ijk}$ is the phenotype of plant $k$ from RCNIL line $i$ in block $j$, $\mu$ is the overall mean, and $e_{ijk}$ is the residual error. After fitting the model, we extracted the fixed-effect estimates and standard errors. We then looked for association between the phenotypes (represented by the fixed-effect estimates) and NIL class.

D.3.2 Transgenic RNAi lines
A zag2 RNA interference (RNAi) construct was developed and introduced into maize.
Thirteen insertion events of the RNAi construct were recovered and crossed to the maize inbreds B73 and A682. The resulting progeny were planted in the summer of 2009, and their ears were harvested for phenotyping. We scored the percent fill of ears to assess sterility in individuals with and without the RNAi construct. The construct carried a BASTA herbicide resistance gene, so its presence or absence could be scored by BASTA treatment. In total, we harvested and scored 275 individuals for the sterility phenotype. These included both BASTA-resistant individuals (construct present) and BASTA-susceptible individuals (construct absent). Percent fill was estimated in randomized order, blind to each individual's construct genotype, to avoid bias. Phenotypes were compared using simple t-tests in R [91].

Table D.1: Trait abbreviations and descriptions from the zag2 experiment.

| Trait abbreviation | Description |
|---|---|
| CULM | Culm diameter |
| BARE | Barren nodes |
| BRNO | Number of nodes with silks |
| LWID | Leaf width |
| LCS | Length of central spike |
| TBN | Tassel branch number |
| EAHT | Ear height |
| PLHT | Plant height |
| TILL | Tillering index |
| BRLH | Branch length including ear |
| NODE | Nodes on lateral branch |
| LBIL | Lateral branch internode length |
| PROL | Prolificacy |
| FILL | Percent fill |
| EARL | Ear length |
| EARD | Ear diameter |
| KRN | Kernel row number |
| CUPR | Cupules per rank |
| STAM | Percent staminate spikelets |
| KW | Single kernel weight |

D.4 Results
D.4.1 RCNILs
We sorted the fixed-effect estimates and standard errors from least to greatest and plotted them as barplots. We then inspected them for association with RCNIL type, defined by the genotype upstream of zag2, at zag2, and downstream of zag2. A few individual RCNILs differed from the others. However, RCNIL types did not cluster into clearly differentiated phenotype groups for any of the thirteen plant and seven ear phenotypes.
The phenotype estimates for the maize and teosinte control NILs also generally did not separate cleanly from each other. Figure D.1 shows, as an example, the RCNIL estimates for single kernel weight sorted from least to greatest. The maize and teosinte control NILs are not intermingled, but the four RCNIL types do not cluster by genotype. In addition, some RCNILs have lower estimates than the maize control NILs. This suggests that if zag2 influences kernel weight, it does so in an unexpected, underdominant manner.

D.4.2 Transgenic RNAi lines
Percent fill was generally high in transgenic plants. The two maize backgrounds (B73 and A682) did not differ significantly in percent fill (t-test, p = 0.525). Only three RNAi transformation events (35, 39, and 47) were scored in both the A682 and B73 backgrounds. Only one of these gave a consistent result across backgrounds: event 39 had no effect in either background (Table D.2). This suggests that the effect of an event depends on genetic background. Of the fifteen combinations of event and background, only four showed a significant difference in percent fill between resistant (construct-positive) and susceptible (construct-negative) plants. Three of these were large shifts, in which percent fill in resistant plants was more than 60 percentage points lower. The fourth was a more moderate change of about 11 percentage points, in which resistant plants had higher fill (event 47 in B73).

Figure D.1: Single kernel weight estimates for zag2 RCNILs. RCNIL class is indicated in each bar, and error bars show the standard error. Maize and teosinte NILs are not intermingled; however, there is also no clear separation of the RCNIL types (t1, t2, t3, t4) when lines are sorted by estimated phenotype. Furthermore, some RCNILs have lower estimates than either control NIL, suggesting that some form of underdominance may be at work.

Table D.2: zag2 transgenic RNAi insertion event, background, phenotype, and t-test p-value.
| Maize background | Event | Percent fill (resistant) | Percent fill (susceptible) | p-value |
|---|---|---|---|---|
| A682 | 17 | 97.8% | 98.9% | 6.63e-01 |
| B73 | 23 | 90.0% | 91.0% | 6.49e-01 |
| B73 | 24 | 100.0% | 97.8% | 3.47e-01 |
| B73 | 33 | 23.0% | 97.5% | 2.94e-04 |
| A682 | 35 | 20.0% | 94.0% | 6.01e-04 |
| B73 | 35 | 98.0% | 98.8% | 6.87e-01 |
| A682 | 39 | 98.0% | 90.0% | 1.15e-01 |
| B73 | 39 | 93.3% | 95.0% | 4.89e-01 |
| B73 | 43 | 32.2% | 95.0% | 1.17e-07 |
| A682 | 45 | 100.0% | 92.9% | 2.53e-01 |
| A682 | 46 | 95.6% | 94.4% | 8.29e-01 |
| A682 | 47 | 84.4% | 83.0% | 8.56e-01 |
| B73 | 47 | 100.0% | 88.9% | 2.75e-03 |
| B73 | 49 | 94.4% | 86.7% | 1.68e-01 |
| B73 | 50 | 91.1% | 90.0% | 3.47e-01 |

D.5 Discussion
The phenotypes measured in the RCNILs do not reveal a clear phenotypic effect of zag2. Given their standard errors, the estimates for the maize and teosinte control lines were never significantly different from each other. Furthermore, the four recombinant classes, distinguished by genotype upstream of, at, and downstream of zag2, failed to cluster into separate phenotypic groups. Overall, there is little, if any, evidence that zag2 affects any of the 20 measured phenotypes.

Reducing zag2 expression with transgenic RNAi constructs likewise failed to provide compelling evidence for an effect on the percent fill of the ear. Relatively few zag2 RNAi transformation events increased sterility (measured as reduced percent fill). The effect of a given event on sterility appears to depend strongly on genetic background: only one of the three events assessed in both maize backgrounds gave the same result in each. Most significant results were drastic increases in sterility, suggesting a major genetic dysfunction. We conclude that the zag2 RNAi constructs had largely non-significant effects, punctuated by several cases of severe genetic dysfunction. Furthermore, the inconsistent effects of specific transformation events in different maize backgrounds seem unlikely to be related to zag2.
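The event-level tallies behind this conclusion can be checked directly against Table D.2. There are four significant combinations of event and background, and only one of the three events scored in both backgrounds agrees across them. The sketch below transcribes the table. It assumes a significance threshold of α = 0.05, which the text does not state, and the function names are hypothetical.

```python
# Rows transcribed from Table D.2:
# (background, event, % fill resistant, % fill susceptible, t-test p-value)
TABLE_D2 = [
    ("A682", 17, 97.8, 98.9, 6.63e-01),
    ("B73", 23, 90.0, 91.0, 6.49e-01),
    ("B73", 24, 100.0, 97.8, 3.47e-01),
    ("B73", 33, 23.0, 97.5, 2.94e-04),
    ("A682", 35, 20.0, 94.0, 6.01e-04),
    ("B73", 35, 98.0, 98.8, 6.87e-01),
    ("A682", 39, 98.0, 90.0, 1.15e-01),
    ("B73", 39, 93.3, 95.0, 4.89e-01),
    ("B73", 43, 32.2, 95.0, 1.17e-07),
    ("A682", 45, 100.0, 92.9, 2.53e-01),
    ("A682", 46, 95.6, 94.4, 8.29e-01),
    ("A682", 47, 84.4, 83.0, 8.56e-01),
    ("B73", 47, 100.0, 88.9, 2.75e-03),
    ("B73", 49, 94.4, 86.7, 1.68e-01),
    ("B73", 50, 91.1, 90.0, 3.47e-01),
]

ALPHA = 0.05  # assumed threshold; the text does not state one


def significant_shifts(rows, alpha=ALPHA):
    """Return (background, event, shift) for rows with p < alpha.

    shift = resistant minus susceptible percent fill, in percentage points,
    so a negative value means lower fill in construct-positive plants.
    """
    return [(bg, ev, round(res - sus, 1))
            for bg, ev, res, sus, p in rows if p < alpha]


def background_consistency(rows, alpha=ALPHA):
    """For events scored in more than one background, report whether the
    significance call (p < alpha or not) agrees across backgrounds."""
    calls = {}
    for bg, ev, _res, _sus, p in rows:
        calls.setdefault(ev, {})[bg] = p < alpha
    return {ev: len(set(by_bg.values())) == 1
            for ev, by_bg in calls.items() if len(by_bg) > 1}
```

With these rows, `significant_shifts(TABLE_D2)` returns shifts of -74.5, -74.0, -62.8, and 11.1 percentage points for B73 event 33, A682 event 35, B73 event 43, and B73 event 47. `background_consistency(TABLE_D2)` returns True only for event 39.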
We failed to identify a phenotypic effect of zag2, even though the literature shows that zag2 is expressed in the ear [116] and encodes a homolog of a known floral development gene in Arabidopsis [115]. zag2 may control a phenotype that was under selection during maize domestication but that we did not measure. Work by Schmidt et al. [116] shows that zag2 and another Agamous homolog (zag1) are expressed in the endosperm after pollination, suggesting a potential role in kernel quality and composition. We measured kernel weight, but we did not assess many other factors that contribute to kernel quality and desirability. These include the ratio of hard to soft endosperm and the protein, oil, and starch content.

A potential complicating factor in our analysis of zag2 is the existence of three additional Agamous homologs in maize [115]. These homologs also share a high degree of identity with Arabidopsis Agamous and, consequently, a high degree of identity and similarity with each other. Zea mays Mads1 (zmm1) has particularly high protein identity with zag2, at over 95%. This high identity is concerning because the homologs are also expressed in the same tissues [116], which suggests functional as well as sequence conservation. For example, zmm1 might substitute functionally for zag2 in the developing ear. If so, an experiment looking for an ear phenotypic response (such as the RCNIL experiment) would need to account for the genotype at both zag2 and zmm1.

The failure to associate a domestication phenotype with zag2 shows how difficult it is to use a population genetics approach to identify interesting candidate genes. From a population genetics perspective, zag2 appears to have been under selection during maize domestication, and it is homologous to a known floral development gene in Arabidopsis.
An effect on an ear domestication phenotype therefore seems quite likely. However, we did not see any noticeable effects in the female inflorescence in these experiments. Our lab has encountered similar difficulties in associating phenotypes with two other selection candidate genes. The Prolamin-box Binding Factor1 gene was extensively phenotyped for plant architecture and ear traits (unpublished data) before a slight difference in kernel size and density was finally identified [14]. Additionally, the Zea agamous-like1 gene appears to have a significant effect on days to anthesis, or flowering time, in maize (unpublished data). However, flowering time is not a standard domestication trait. This study highlights the difficulty of associating a phenotype with a selection candidate gene and offers a word of caution for future studies that attempt it.

References
[1] Gaines T, Zhang W, Wang D, Bukun B, Chisholm ST, et al. (2010) Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proceedings of the National Academy of Sciences 107: 1029–34.
[2] Gompel N, Prud’homme B, Wittkopp PJ, Kassner V, Carroll SB (2005) Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433: 481–7.
[3] Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional transposon insertion in the maize domestication gene tb1. Nature Genetics 43: 1160–3.
[4] Wills DM, Whipple CJ, Takuno S, Kursel LE, Shannon LM, et al. (2013) From Many, One: Genetic Control of Prolificacy during Maize Domestication. PLoS Genetics 9: e1003604.
[5] Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, et al. (2005) The origin of the naked grains of maize. Nature 436: 714–9.
[6] Sun L, Li X, Fu Y, Zhu Z, Tan L, et al. (2013) GS6, a member of the GRAS gene family, negatively regulates grain size in rice. Journal of Integrative Plant Biology: 1–37.
[7] Olsen KM, Wendel JF (2013) A bountiful harvest: genomic insights into crop domestication phenotypes. Annual Review of Plant Biology 64: 47–70.
[8] Doebley J (2004) The genetics of maize evolution. Annual Review of Genetics 38: 37–59.
[9] Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–5.
[10] Allaby RG, Fuller DQ, Brown TA (2008) The genetic expectations of a protracted model for the origins of domesticated crops. Proceedings of the National Academy of Sciences 105: 13982–6.
[11] Pickersgill B (2007) Domestication of plants in the Americas: insights from Mendelian and molecular genetics. Annals of Botany 100: 925–40.
[12] Carroll SB (2008) Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134: 25–36.
[13] Wittkopp PJ, Kalay G (2012) Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nature Reviews Genetics 13: 59–69.
[14] Lang Z, Wills D, Lemmon Z, Shannon L, Bukowski R, et al. (2014) Defining the role of prolamin-box binding factor1 gene during maize domestication. The Journal of Heredity: In press.
[15] Konishi S, Izawa T, Lin SY, Ebana K, Fukuta Y, et al. (2006) An SNP caused loss of seed shattering during rice domestication. Science 312: 1392–6.
[16] Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, et al. (2000) fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289: 85–8.
[17] Rapp RA, Haigler CH, Flagel L, Hovav RH, Udall JA, et al. (2010) Gene expression in developing fibres of Upland cotton (Gossypium hirsutum L.) was massively altered by domestication. BMC Biology 8: 139.
[18] Swanson-Wagner R, Briskine R, Schaefer R, Hufford MB, Ross-Ibarra J, et al. (2012) Reshaping of the maize transcriptome by domestication. Proceedings of the National Academy of Sciences 109: 11878–83.
[19] Koenig D, Jiménez-Gómez JM, Kimura S, Fulop D, Chitwood DH, et al. (2013) Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. Proceedings of the National Academy of Sciences 110: E2655–62.
[20] Emerson JJ, Hsieh LC, Sung HM, Wang TY, Huang CJ, et al. (2010) Natural selection on cis and trans regulation in yeasts. Genome Research 20: 826–36.
[21] McManus CJ, Coolon JD, Duff MO, Eipper-Mains J, Graveley BR, et al. (2010) Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Research 20: 816–25.
[22] White MA, Stubbings M, Dumont BL, Payseur BA (2012) Genetics and evolution of hybrid male sterility in house mice. Genetics 191: 917–34.
[23] Alem S, Streiff R, Courtois B, Zenboudji S, Limousin D, et al. (2013) Genetic architecture of sensory exploitation: QTL mapping of female and male receiver traits in an acoustic moth. Journal of Evolutionary Biology 26: 2581–96.
[24] Miller CT, Glazer AM, Summers BR, Blackman BK, Norman AR, et al. (2014) Modular Skeletal Evolution in Sticklebacks Is Controlled by Additive and Clustered Quantitative Trait Loci. Genetics: In press.
[25] Shannon LM (2012) The Genetic Architecture of Maize Domestication and Range Expansion. Ph.D. thesis, University of Wisconsin - Madison.
[26] Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of sunflower. Genetics 176: 2589–99.
[27] Paterson AH, Damon S, Hewitt JD, Zamir D, Rabinowitch HD, et al. (1991) Mendelian factors underlying quantitative traits in tomato: comparison across species, generations, and environments. Genetics 127: 181–97.
[28] Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic factors controlling domestication-related traits of rice using an F2 population of a cross between Oryza sativa and O. rufipogon. Theoretical and Applied Genetics 98: 243–251.
[29] Peng J, Ronin Y, Fahima T, Röder MS, Li Y, et al. (2003) Domestication quantitative trait loci in Triticum dicoccoides, the progenitor of wheat. Proceedings of the National Academy of Sciences 100: 2489–94.
[30] Cai W, Morishima H (2002) QTL clusters reflect character associations in wild and cultivated rice. Theoretical and Applied Genetics 104: 1217–1228.
[31] Gyenis L, Yun SJ, Smith KP, Steffenson BJ, Bossolini E, et al. (2007) Genetic architecture of quantitative trait loci associated with morphological and agronomic trait differences in a wild by cultivated barley cross. Genome 50: 714–23.
[32] Simons KJ, Fellers JP, Trick HN, Zhang Z, Tai YS, et al. (2006) Molecular characterization of the major wheat domestication gene Q. Genetics 172: 547–55.
[33] Li C, Zhou A, Sang T (2006) Rice domestication by reducing shattering. Science 311: 1936–9.
[34] Cong B, Barrero LS, Tanksley SD (2008) Regulatory change in YABBY-like transcription factor led to evolution of extreme fruit size during tomato domestication. Nature Genetics 40: 800–4.
[35] Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects of artificial selection on the maize genome. Science 308: 1310–4.
[36] Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127: 1309–21.
[37] Briggs WH, McMullen MD, Doebley JF, Gaut BS (2007) Linkage mapping of domestication loci in a large maize teosinte backcross resource. Genetics 177: 1915–28.
[38] Doebley J, Stec A (1991) Genetic analysis of the morphological differences between maize and teosinte. Genetics 129: 285–95.
[39] Whipple CJ, Kebrom TH, Weber AL, Yang F, Hall D, et al. (2011) Grassy Tillers1 Promotes Apical Dominance in Maize and Responds To Shade Signals in the Grasses. Proceedings of the National Academy of Sciences 108: E506–12.
[40] Doebley J, Stec A, Hubbard L (1997) The evolution of apical dominance in maize. Nature 386: 485–8.
[41] Clark RM, Nussbaum-Wagler T, Quijada P, Doebley J (2006) A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nature Genetics 38: 594–7.
[42] Studer AJ, Doebley JF (2011) Do large effect QTL fractionate? A case study at the maize domestication QTL teosinte branched1. Genetics 188: 673–81.
[43] Doebley J, Stec A (1993) Inheritance of the morphological differences between maize and teosinte: comparison of results for two F2 populations. Genetics 134: 559–70.
[44] Littell R, Milliken G, Stroup W, Wolfinger R (1996) SAS system for mixed models. SAS Institute, Cary, NC, 2nd edition.
[45] Broman KW, Wu H, Sen S, Churchill G (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889–890.
[46] Broman KW, Sen S (2009) A Guide to QTL Mapping with R/qtl. Statistics for Biology and Health. New York, NY: Springer New York. doi:10.1007/978-0-387-92125-9. URL http://www.springerlink.com/index/10.1007/978-0-387-92125-9.
[47] Kosambi DD (1944) The Estimation of Map Distances from Recombination Values. Annals of Eugenics 12: 172–175.
[48] Orr HA (1998) The Population Genetics of Adaptation: The Distribution of Factors Fixed during Adaptive Evolution. Evolution 52: 935.
[49] Beavis WD (1998) QTL Analyses: Power, Precision, and Accuracy. In: Paterson AH, editor, Molecular Dissection of Complex Traits, New York, NY: CRC Press, chapter 10. 1st edition, pp. 145–162.
[50] Hung HY, Shannon LM, Tian F, Bradbury PJ, Chen C, et al. (2012) ZmCCT and the genetic basis of day-length adaptation underlying the postdomestication spread of maize. Proceedings of the National Academy of Sciences 109: E1913–21.
[51] Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, et al. (2009) GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiology 149: 171–80.
[52] Yanofsky MF, Ma H, Bowman JL, Drews GN, Feldmann KA, et al. (1990) The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 346: 35–9.
[53] Schwarz-Sommer Z, Huijser P, Nacken W, Saedler H, Sommer H (1990) Genetic Control of Flower Development by Homeotic Genes in Antirrhinum majus. Science 250: 931–6.
[54] Smaczniak C, Immink RGH, Angenent GC, Kaufmann K (2012) Developmental and evolutionary diversity of plant MADS-domain factors: insights from recent studies. Development 139: 3081–98.
[55] Hufford MB, Xu X, van Heerwaarden J, Pyhäjärvi T, Chia JM, et al. (2012) Comparative population genomics of maize domestication and improvement. Nature Genetics 44: 808–11.
[56] Sekhon RS, Lin H, Childs KL, Hansey CN, Robin Buell C, et al. (2011) Genome-wide atlas of transcription through maize development. The Plant Journal: 1–11.
[57] Xue W, Xing Y, Weng X, Zhao Y, Tang W, et al. (2008) Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nature Genetics 40: 761–7.
[58] Li Y, Fan C, Xing Y, Jiang Y, Luo L, et al. (2011) Natural variation in GS5 plays an important role in regulating grain size and yield in rice. Nature Genetics 43: 1266–9.
[59] Fan C, Xing Y, Mao H, Lu T, Han B, et al. (2006) GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theoretical and Applied Genetics 112: 1164–71.
[60] Yu B, Lin Z, Li H, Li X, Li J, et al. (2007) TAC1, a major quantitative trait locus controlling tiller angle in rice. The Plant Journal 52: 891–8.
[61] Jin J, Huang W, Gao JP, Yang J, Shi M, et al. (2008) Genetic control of rice plant architecture under domestication. Nature Genetics 40: 1365–9.
[62] Yang Q, Li Z, Li W, Ku L, Wang C, et al. (2013) CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proceedings of the National Academy of Sciences 110: 16969–74.
[63] Kermicle JL (2006) A selfish gene governing pollen-pistil compatibility confers reproductive isolation between maize relatives. Genetics 172: 499–506.
[64] Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6: e19379.
[65] Carroll SB (2005) Evolution at two levels: on genes and form. PLoS Biology 3: e245.
[66] Stern DL, Orgogozo V (2008) The loci of evolution: how predictable is genetic evolution? Evolution 62: 2155–77.
[67] Springer NM, Stupar RM (2007) Allele-specific expression patterns reveal biases and embryo-specific parent-of-origin effects in hybrid maize. The Plant Cell 19: 2391–402.
[68] Bell GDM, Kane NC, Rieseberg LH, Adams KL (2013) RNA-seq analysis of allele-specific expression, hybrid effects, and regulatory divergence in hybrids compared with their parents from natural populations. Genome Biology and Evolution 5: 1309–23.
[69] Song G, Guo Z, Liu Z, Cheng Q, Qu X, et al. (2013) Global RNA sequencing reveals that genotype-dependent allele-specific expression contributes to differential expression in rice F1 hybrids. BMC Plant Biology 13: 221.
[70] Zhang X, Borevitz JO (2009) Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics 182: 943–54.
[71] Tirosh I, Reikhav S, Levy AA, Barkai N (2009) A yeast hybrid provides insight into the evolution of gene expression regulation. Science 324: 659–62.
[72] He F, Zhang X, Hu J, Turck F, Dong X, et al. (2012) Genome-wide Analysis of Cis-regulatory Divergence between Species in the Arabidopsis Genus. Molecular Biology and Evolution 29: 3385–3395.
[73] Schaefke B, Emerson JJ, Wang TY, Lu MYJ, Hsieh LC, et al. (2013) Inheritance of gene expression level and selective constraints on trans- and cis-regulatory changes in yeast. Molecular Biology and Evolution 30: 2121–33.
[74] Purugganan MD, Fuller DQ (2009) The nature of selection during plant domestication. Nature 457: 843–8.
[75] Zhong S, Joung JG, Zheng Y, Chen YR, Liu B, et al. (2011) High-throughput illumina strand-specific RNA sequencing library preparation. Cold Spring Harbor Protocols 2011: 940–9.
[76] Wang X, Soloway PD, Clark AG (2011) A survey for novel imprinted genes in the mouse placenta by mRNA-seq. Genetics 189: 109–22.
[77] Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–60.
[78] DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43: 491–8.
[79] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297–303.
[80] Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10: R25.
[81] Storey JD (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64: 479–498.
[82] Lester RN (1989) Evolution under domestication involving disturbance of genic balance. Euphytica 44: 125–132.
[83] Gross BL, Olsen KM (2010) Genetic perspectives on crop domestication. Trends in Plant Science 15: 529–537.
[84] Burger JC, Chapman MA, Burke JM (2008) Molecular insights into the evolution of crop plants. American Journal of Botany 95: 113–122.
[85] Dean RB, Dixon WJ (1951) Simplified Statistics for Small Numbers of Observations. Analytical Chemistry 23: 636–638.
[86] Jin J, Zhang H, Kong L, Gao G, Luo J (2014) PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Research 42: D1182–7.
[87] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research 40: D109–14.
[88] Kanehisa M (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28: 27–30.
[89] Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Research 20: 393–402.
[90] Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11: R14.
[91] R Development Core Team (2013) R: A language and environment for statistical computing. URL http://www.r-project.org/.
[92] Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
[93] Eichten SR, Briskine R, Song J, Li Q, Swanson-Wagner R, et al. (2013) Epigenetic and genetic influences on DNA methylation variation in maize populations. The Plant Cell 25: 2783–97.
[94] Duncan IW (2002) Transvection effects in Drosophila. Annual Review of Genetics 36: 521–56.
[95] Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, et al. (2012) Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing. PLoS One 7: e33071.
[96] Springer NM, Ying K, Fu Y, Ji T, Yeh CT, et al. (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genetics 5: e1000734.
[97] Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, et al. (2012) Maize HapMap2 identifies extant variation from a genome in flux. Nature Genetics 44: 803–7.
[98] Tenaillon MI, U’Ren J, Tenaillon O, Gaut BS (2004) Selection versus demography: a multilocus investigation of the domestication process in maize. Molecular Biology and Evolution 21: 1214–25.
[99] Clark RM, Linton E, Messing J, Doebley JF (2004) Pattern of diversity in the genomic region near the maize domestication gene tb1. Proceedings of the National Academy of Sciences 101: 700–7.
[100] Hanning I, Baumgarten K, Schott K, Heldt H (1999) Oxaloacetate transport into plant mitochondria. Plant Physiology 119: 1025–32.
[101] Zoglowek C, Krömer S, Heldt HW (1988) Oxaloacetate and malate transport by plant mitochondria. Plant Physiology 87: 109–15.
[102] Hunt HV, Denyer K, Packman LC, Jones MK, Howe CJ (2010) Molecular basis of the waxy endosperm starch phenotype in broomcorn millet (Panicum miliaceum L.). Molecular Biology and Evolution 27: 1478–94.
[103] Fan L, Bao J, Wang Y, Yao J, Gui Y, et al. (2009) Post-domestication selection in the maize starch pathway. PLoS One 4: e7612.
[104] Park YJ, Nemoto K, Nishikawa T, Matsushima K, Minami M, et al. (2009) Waxy strains of three amaranth grains raised by different mutations in the coding region. Molecular Breeding 25: 623–635.
[105] Dussert Y, Remigereau MS, Fontaine MC, Snirc A, Lakis G, et al. (2013) Polymorphism pattern at a miniature inverted-repeat transposable element locus downstream of the domestication gene Teosinte-branched1 in wild and domesticated pearl millet. Molecular Ecology 22: 327–40.
[106] Sugimoto K, Takeuchi Y, Ebana K, Miyao A, Hirochika H, et al. (2010) Molecular cloning of Sdr4, a regulator involved in seed dormancy and domestication of rice. Proceedings of the National Academy of Sciences 107: 5792–7.
[107] Weller JL, Liew LC, Hecht VFG, Rajandran V, Laurie RE, et al. (2012) A conserved molecular basis for photoperiod adaptation in two temperate legumes. Proceedings of the National Academy of Sciences 109: 21158–63.
[108] Zhu BF, Si L, Wang Z, Zhou Y, Zhu J, et al. (2011) Genetic control of a transition from black to straw-white seed hull in rice domestication. Plant Physiology 155: 1301–11.
[109] Liu J, Van Eck J, Cong B, Tanksley SD (2002) A new class of regulatory genes underlying the cause of pear-shaped tomato fruit. Proceedings of the National Academy of Sciences 99: 13302–6.
[110] Gallavotti A, Zhao Q, Kyozuka J, Meeley RB, Ritter MK, et al. (2004) The role of barren stalk1 in the architecture of maize. Nature 432: 630–5.
[111] Carling MD, Brumfield RT (2009) Speciation in Passerina buntings: introgression patterns of sex-linked loci identify a candidate gene region for reproductive isolation. Molecular Ecology 18: 834–47.
[112] Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, et al. (2012) Population Genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genetics 8: e1003080.
[113] Zhao Q, Thuillet AC, Uhlmann NK, Weber AL, Rafalski JA, et al. (2008) The role of regulatory genes during maize domestication: evidence from nucleotide polymorphism and gene expression. Genetics 178: 2133–43.
[114] Zhao Q, Weber AL, McMullen MD, Guill K, Doebley J (2011) MADS-box genes of maize: frequent targets of selection during domestication. Genetics Research 93: 65–75.
[115] Theissen G, Strater T, Fischer A, Saedler H (1995) Structural characterization, chromosomal localization and phylogenetic evaluation of two pairs of AGAMOUS-like MADS-box genes from maize. Gene 156: 155–66.
[116] Schmidt RJ, Veit B, Mandel MA, Mena M, Hake S, et al. (1993) Identification and molecular characterization of ZAG1, the maize homolog of the Arabidopsis floral homeotic gene AGAMOUS. The Plant Cell 5: 729–37.