The Complex Inheritance of Maize Domestication Traits and
Gene Expression
By
Zachary H. Lemmon
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
(Genetics)
at the
UNIVERSITY OF WISCONSIN – MADISON
2014
Date of final oral examination: 4/29/14
The dissertation is approved by the following members of the Final Oral Committee:
John F. Doebley, Professor, Genetics
David A. Baum, Professor, Botany and Genetics
Shawn M. Kaeppler, Professor, Agronomy
Patrick H. Masson, Professor, Genetics
Bret A. Payseur, Professor, Genetics
Acknowledgements
I want to extend my thanks to John Doebley for making this dissertation possible. John
has been a constant voice of encouragement and insight throughout my graduate career.
He has been instrumental in keeping me focused on the big question, while allowing me
the freedom to chase down side interests and projects. John has taught me the importance
of focusing my scientific inquiry on the core of a research question, which has shaped the
way I approach research. While I have carried out the experiments described in this work,
the first steps taken in these projects belong to John, and I am grateful for the chance I
was given to shepherd them to completion. Every day and conversation I have had with
John as my advisor has made me a better scientist, and I am extremely thankful for
the opportunity I was given six years ago when I joined the Doebley lab.
I have also been fortunate to work in an outstanding lab full of supportive
individuals on both a personal and professional level. The work performed by a number
of my fellow lab members was crucial to the completion of these experiments. Without
their help, the many DNA and RNA extractions, PCR reactions, measured phenotypes,
and plants grown would simply not have happened. Fellow graduate students, postdocs,
lab technicians, and undergraduate workers have all assisted in their own way. I am
also thankful that in addition to being wonderful coworkers in a professional sense, lab
members have contributed to making the lab a fun, exciting, and enjoyable place to
spend my Ph.D. career. I will never forget the power of “Tak”, being “skinny up top”,
or the “lab master”. To Tony, Laura, CJ, Ali, Bao, Tina, Lisa, Eric III, Jesse, Elizabeth,
David, Claudia, Wei, and the numerous undergrads, thank you for making this wonderful
experience possible.
In addition to my friends and colleagues at Wisconsin, I have been fortunate enough to
be involved in a larger community of maize researchers at Cornell University, University of
Missouri, North Carolina State University, and University of California - Davis. Working
with these scientists has exposed me to a variety of questions and topics in maize research
regarding phenotype, quantitative genetics, and large scale data collection and analysis,
greatly broadening my experience. In particular, collaborations with Qi Sun
and Robert Bukowski at Cornell have greatly contributed to the analysis in the third chapter
of this thesis. Also, dialogue with Jeff Ross-Ibarra and Matt Hufford at UC-Davis has
continuously provided me with insight into the population genetics of maize domestication
and given me a valuable resource to draw on.
My Ph.D. committee has been an excellent resource during my graduate career. Bret
Payseur and Shawn Kaeppler in particular have provided valuable insight into scientific
questions and suggested analyses that have become part of this dissertation. David Baum
has always made time in his busy schedule to meet with me and keep up to date with my
progress. Finally, Patrick Masson has been a constant source of encouragement and has
assisted me in several capacities both within and outside of the Ph.D. committee.
I am also eternally grateful to my family, who have stood by my side throughout
this process. My parents, Karen and Holden, for giving me the tools and opportunity to
pursue my goals. My sisters, Addie and Kelsey, for always being there and my wonderful
nieces, Laney and Havi, for always making me smile. My amazing friend, Alex, who has
been a constant source of support in my life and is one of the family now. Finally, my
wife Megan, you have kept me grounded throughout these six years in Madison in both
the good and bad times. You are my rock and this would not have been possible without
you.
Contents
Acknowledgments
Table of Contents
List of Figures
List of Tables
Abstract
Preface
1 Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
1.1 Abstract
1.2 Introduction
1.3 Materials and Methods
1.3.1 Plant Material, Genotypes, and Phenotypes
1.3.2 Mixed Models and Heritability
1.3.3 QTL Mapping
1.3.4 Simulation Experiment
1.4 Results
1.4.1 QTL mapping
1.4.2 Simulation Experiment
1.5 Discussion
2 Fine mapping of chromosome five domestication genes in maize
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.3.1 Plant material
2.3.2 Field Trials and Phenotypes
2.3.3 Genotyping with PCR and next generation sequencing
2.3.4 Statistical analysis and segregation of phenotypes
2.4 Results
2.4.1 RCNIL generation and phenotype least squares means
2.4.2 PCR and GBS genotyping
2.4.3 QTL fail to segregate as Mendelian traits
2.4.4 Multiple factors contribute to culm diameter and kernel row number
2.5 Discussion
2.5.1 The complex genetic architecture of culm and kernel row number
2.5.2 Future work on chromosome five QTL
3 The role of cis regulatory evolution in maize domestication
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.3.1 Plant material, RNA preparation, and sequencing
3.3.2 Bioinformatics
3.3.3 Maize:teosinte gene expression ratios
3.3.4 Testing for cis and trans effects
3.3.5 Candidate genes
3.3.6 Proportion of cis variation in maize and teosinte
3.3.7 Additive and dominant gene expression
3.3.8 CCT gene enrichment in various functional categories
3.4 Results
3.4.1 RNAseq provides expression data for more than 17,000 genes per tissue
3.4.2 Prolific regulatory variation characterized by relatively few consistent cis differences
3.4.3 Possible directional bias in cis evolution
3.4.4 Gene expression variation is greater in teosinte
3.4.5 Selection candidate genes are enriched for CCT genes
3.4.6 Microarray and RNAseq data partially correspond
3.4.7 CCT genes are unrelated to differentially methylated regions
3.4.8 Dominant and additive gene expression inheritance
3.4.9 Candidate genes enriched in various functional categories
3.5 Discussion
3.5.1 Regulatory change between and within maize and teosinte
3.5.2 What is the frequency of cis and trans regulatory change?
3.5.3 Tissue specific expression of CCT candidates
3.5.4 Bias toward increased maize expression?
3.5.5 Selection-candidates enriched for cis regulatory change
3.5.6 Leaf tissue candidates are enriched for photosynthesis and chloroplast GO terms
3.5.7 Do crop domestication genes show cis differences?
3.5.8 A catalog of genes with cis regulatory variation
Appendices
A Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
A.1 Figures
A.2 Tables
B Supplemental Content: Fine mapping of chromosome five domestication genes in maize
B.1 Tables
C Supplemental Content: The role of cis regulatory evolution in maize domestication
C.1 Figures
C.2 Tables
D Characterization of domestication traits for selection candidate gene Zea agamous2
D.1 Foreword
D.2 Introduction
D.3 Methods
D.3.1 RCNILs
D.3.2 Transgenic RNAi lines
D.4 Results
D.4.1 RCNILs
D.4.2 Transgenic RNAi lines
D.5 Discussion
References
List of Figures
1.1 Cumulative plot of QTL detected in the mapping experiment.
1.2 The number of detected QTL and mean detected QTL effect size versus number of simulated causative loci.
1.3 The proportion of detected QTL with zero, one, or more than one simulated causative genes in the 1.5 LOD support interval.
2.1 Histograms of least squares means for the culm diameter and kernel row number phenotypes.
2.2 GBS genotypes for kernel row number RCNILs.
2.3 RCNILs sorted by phenotype from least to greatest.
2.4 Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF.
2.5 QTL LOD profiles for fine mapping of culm diameter and kernel row number traits.
3.1 Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list.
3.2 Parent versus hybrid ear tissue allele specific expression ratios.
3.3 Proportion of expression divergence due to cis regulatory difference.
3.4 Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem.
3.5 The proportion of average maize to teosinte R² from linear models explaining F1 hybrid expression by maize and teosinte parent.
3.6 Density plots of ln(XPCLR) score of conserved versus CCT-AB candidate genes.
3.7 Proportion of cis only and trans only genes identified as having dominant or additive inheritance.
A.1 Histograms of the least squares means for phenotyped traits from the QTL mapping population.
A.2 Example histograms of simulated traits for several different conditions in terms of number of causative loci, effect size, and heritability.
A.3 Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval.
C.1 Parent versus hybrid leaf tissue allele specific expression ratios.
C.2 Parent versus hybrid stem tissue allele specific expression ratios.
C.3 Dominance by additivity ratio grouped by regulatory category.
D.1 Single kernel weight estimates for zag2 RCNILs.
List of Tables
1.1 NIRIL phenotyped traits, descriptions, approximate distribution, between-year Pearson correlation coefficients, and Pearson p-values.
1.2 Final models selected for the thirteen NIRIL phenotypes.
1.3 Detected QTL for the T5S mapping population with position, heritability, and LOD score statistics.
2.1 Final linear mixed models used to produce least squares means for fine mapping RCNILs.
2.2 Detected QTL and HIF effects including LOD, percent variation explained, and additive effect.
3.1 Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher’s Exact Tests (FET).
3.2 Assignable RNAseq read counts from F1 hybrids and parents.
3.3 Genes for which RNAseq data were collected and expression was assayed.
3.4 Fisher’s Exact Tests for overlap of selection and CCT candidates.
3.5 Fisher’s Exact Tests for enrichment/depletion of cis and trans only genes in selection features.
3.6 Fisher’s Exact Tests for overlap between microarray and CCT differentially expressed genes.
3.7 Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.
A.1 RFLP markers used during backcrossing of QTL mapping population.
A.2 Genetic markers used to score BC6S6 mapping population.
B.1 PCR markers used for genotyping RCNILs including gene or SNP target, AGPv2 position, and primer sequence.
C.1 Biological replicates for RNAseq experiment.
C.2 Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
C.3 Number of genomic paired end reads and coverage obtained for constructing pseudo-transcriptomes.
C.4 Proportion of divergence due to cis regulatory effect grouped by overall parental divergence.
C.5 The number of genes for which the maize or teosinte allele is expressed at a higher level.
C.6 Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
C.7 Allele specific expression variation among F1 hybrids explained by maize and teosinte parent.
C.8 Number of genes with significant cis expression variation explained by maize and/or teosinte.
C.9 Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and microarray assays in groups A, B, and C in the three tissue types.
C.10 Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
C.11 Fisher’s Exact Tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.
C.12 Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte and proportion in which expression data agrees with methylated status.
C.13 Dominance/additivity ratios for genome-wide gene expression.
C.14 Contingency tables for additive and dominant gene counts for A, AB, and ABC candidate lists.
C.15 Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.
C.16 Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5 LOD support intervals for QTL from a previous study.
C.17 Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
C.18 Significantly enriched and depleted GO terms from CCT and trans only gene lists.
D.1 Trait abbreviations and descriptions from the zag2 experiment.
D.2 Zag2 transgenic RNAi insertion event, background, phenotype, and t-test p-value.
Abstract
The genetic basis for morphological change in divergent species is a central question in
evolutionary biology. The domestication of maize from its wild progenitor, teosinte, is an
excellent system to address this question. Using a mapping population specific to
chromosome five, we explore a poorly understood region of the maize genome with large
effects on domestication phenotypes. Unlike other large effect regions of the maize genome,
this region harbors multiple QTL for many traits that do not stack on a single locus,
suggesting that multiple genes on the fifth chromosome influence domestication traits.
Simulation studies show clear evidence of limited power to detect QTL for highly polygenic
traits, such that the detected QTL do not accurately portray the true complexity of the
underlying genetic architecture. Two QTL in different locations were chosen for fine
mapping studies to identify the underlying causative genes. While a single gene was not
identified for either QTL, both were narrowed to intervals of less than three centimorgans
containing relatively few genes and evidence of positive selection during maize
domestication. Finally, the first genome-wide effort to characterize cis and trans
regulatory change between a domesticated crop and its wild progenitor found extensive
regulatory variation, with relatively few genes having consistent cis differences; these
genes were significantly associated with signatures of positive selection during the
domestication and crop improvement of maize. Consistent with loss of diversity during the
domestication bottleneck, cis expression variation explained by the maize parent is reduced
in comparison to that explained by the teosinte parent, with an even greater reduction
in cis candidate genes. A general increase in the expression of maize alleles was also
observed, suggesting that domestication may have led to a general increase in gene
expression in maize. Collectively, these experiments shed light on the evolution of
divergent phenotypes and gene regulation in domesticated maize and its wild progenitor.
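The comparison above rests on how much cis expression variation is explained by the maize parent versus the teosinte parent. A minimal Python sketch can illustrate that comparison. It assumes that each F1 hybrid contributes one log2 allele-specific expression ratio for a gene. It also assumes that the variance explained by a parent is the R² of a linear model with that parent as the only categorical predictor, which equals the between-group sum of squares divided by the total sum of squares. The function name `parent_r2` and the toy hybrids are hypothetical and are not data or code from this work.

```python
from statistics import fmean


def parent_r2(values, parents):
    """R^2 of a one-way model: share of variance in `values` explained by `parents`.

    For a single categorical predictor this equals SS_between / SS_total.
    """
    grand = fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variation to explain
    groups = {}
    for v, p in zip(values, parents):
        groups.setdefault(p, []).append(v)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical F1 hybrids: (maize parent, teosinte parent, log2 allele-specific ratio).
hybrids = [
    ("M1", "T1", 0.1), ("M1", "T2", 0.9), ("M1", "T3", -0.5),
    ("M2", "T1", 0.2), ("M2", "T2", 1.0), ("M2", "T3", -0.4),
]
ratios = [h[2] for h in hybrids]
r2_maize = parent_r2(ratios, [h[0] for h in hybrids])
r2_teosinte = parent_r2(ratios, [h[1] for h in hybrids])
print(f"maize R^2 = {r2_maize:.3f}, teosinte R^2 = {r2_teosinte:.3f}")
```

In this toy example the maize parents barely differ, so nearly all of the variance is explained by the teosinte parent. That is the direction of the pattern expected after a domestication bottleneck.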
Preface
The nature of functional changes to the genes responsible for phenotypic divergence in related species is a topic of ongoing research in evolutionary biology. Many types of genomic
features have been shown to influence the development of novel phenotypes. Studies in
closely related species have identified gene duplications [1], various types of expression
modification [2–4], and gene coding changes [5, 6] that give rise to altered phenotypes. A
major contributor to evolutionary biology research is the study of domesticated crops and
their wild ancestors, where the intense artificial selection for agronomic traits during the
domestication process serves as a proxy for natural selection mechanisms. Experiments
characterizing the functional changes responsible for novel phenotypes in the domesticated
systems of rice, tomato, wheat, and sorghum have been met with great success [7].
Maize is one of the most successful crop models for studying domestication, and scientists
have extensively investigated the morphological differences between maize (Zea mays ssp.
mays) and its wild progenitor (Zea mays ssp. parviglumis). Maize is an excellent system for pursuing evolutionary questions for a number of reasons. Maize was domesticated
approximately 9,000 years ago in the Balsas River valley of Mexico [8]. As in other domesticated systems, maize-teosinte F1 hybrids are fertile, which allows the use of powerful
genetic techniques to dissect the genetics of complex traits. The maize reference genome
also greatly facilitates research by enabling sequence-based analyses and
comparative genomics [9]. A common collection of phenotypic differences seen between
domesticated crops and their wild progenitors is also observed when comparing maize and
teosinte. This “domestication syndrome” [10, 11] consists of phenotypes that improve the
suitability of a crop for human use such as loss of shattering (natural seed dispersal),
increased apical dominance, loss of prolificacy (concentration of seed into one unit), and
gigantism of vegetative and reproductive tissues.
One method commonly used to examine genetic factors controlling morphological variation in maize is quantitative trait locus (QTL) mapping. Studies examining the domestication of maize have shown that QTL for the profound morphological differences
between maize and its wild progenitor teosinte can be attributed primarily to six regions
of large effect on the first five chromosomes of maize [8]. Three of these regions
have been further characterized, identifying single genes of large, pleiotropic effect. The
functional causative polymorphisms of these genes include new tissue specific expression
patterns [4], elevated expression [3], and coding sequence change [5]. In contrast to these
well characterized loci, other regions of the genome with large effect on domestication
phenotypes are poorly understood.
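The core computation of a single-marker QTL scan can be sketched in Python. This minimal illustration assumes a normally distributed phenotype and individuals assigned to discrete genotype classes at each marker. The LOD score compares a model with one mean per genotype class against a null model with a single mean, LOD = (n/2) log10(RSS0/RSS1). The helper `lod_support_interval` approximates a 1.5 LOD support interval by the outermost markers still within 1.5 LOD of the peak. Both function names and the toy values are hypothetical; this is not the mapping software used in this work.

```python
import math


def marker_lod(genotypes, phenotypes):
    """LOD for one marker: class-mean model vs. single-mean null (normal errors)."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)
    classes = {}
    for g, y in zip(genotypes, phenotypes):
        classes.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in classes.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    if rss1 == 0:
        return math.inf  # genotype classes leave no residual variation
    return (n / 2) * math.log10(rss0 / rss1)


def lod_support_interval(positions_cm, lods, drop=1.5):
    """Return (left, peak, right) marker positions within `drop` LOD of the peak."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]


# Toy example: genotype 0 and 1 are the two homozygous classes at a marker.
print(marker_lod([0, 0, 0, 1, 1, 1], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
print(lod_support_interval([0, 5, 10, 15, 20, 25], [0.5, 2.0, 4.0, 5.0, 3.0, 1.0]))  # -> (10, 15, 15)
```

Published mapping methods usually evaluate LOD on a dense grid between markers and may interpolate interval bounds. Reporting marker positions keeps the sketch simple.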
A prominent theory in evolutionary biology holds that adaptive evolution occurs primarily
through modification of cis regulatory elements [12, 13].
Consistent with this theory, altered cis regulatory elements in domesticated crops account
for a large proportion of identified domestication genes [7]. A striking characteristic of
these genes is the variety of functional changes that result from cis regulatory change
with examples including elevated and decreased expression [3, 14], development of novel
tissue specific expression patterns [4, 15], and heterochronic shifts in expression [16]. The
demonstrated importance of gene regulatory change in the evolution of new forms has
led to several studies investigating genome-wide gene expression in domesticated crops
[17–19]. While measuring gene expression differences between a modern crop and its wild
relative is an important step in exploring regulatory variation in an evolutionary context,
it falls short of the global analyses in yeast and fruit fly [20, 21] that specifically dissect
cis and trans regulatory variation.
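The general F1 hybrid logic for separating cis from trans effects can be illustrated in Python. In an F1 hybrid both alleles share the same cellular environment, so the maize:teosinte allele ratio reflects cis differences alone. Any remaining parental divergence is attributed to trans. This minimal sketch assumes depth-normalized expression values on a log2 scale. It computes the proportion of divergence due to cis as |cis| / (|cis| + |trans|), which is one common convention and may differ from the exact measure used in this work. The function name is hypothetical.

```python
import math


def cis_trans(parent_maize, parent_teosinte, f1_maize_allele, f1_teosinte_allele):
    """Partition parental expression divergence into cis and trans parts (log2 scale).

    All inputs are assumed to be depth-normalized expression values.
    """
    total = math.log2(parent_maize / parent_teosinte)      # parental divergence
    cis = math.log2(f1_maize_allele / f1_teosinte_allele)  # allele ratio in the shared F1 environment
    trans = total - cis                                     # divergence not explained by cis
    denom = abs(cis) + abs(trans)
    prop_cis = abs(cis) / denom if denom > 0 else float("nan")
    return total, cis, trans, prop_cis


# Hypothetical gene: 4-fold parental difference, 2-fold allelic difference in the F1.
print(cis_trans(400, 100, 200, 100))  # -> (2.0, 1.0, 1.0, 0.5)
```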
The work presented in this dissertation seeks to explore two facets of diversification
between maize and teosinte. First, quantitative genetic methods are used to specifically
assess the architecture of domestication QTL and causative genes on the fifth chromosome
of maize, providing insight into the genetic factors underlying this previously uncharacterized region of large phenotypic effect in the maize genome. Second, regulatory variation due to cis and trans changes is investigated genome-wide using deep RNA sequencing. This work is presented in three chapters.
1. The first chapter describes a chromosome five specific QTL mapping experiment. A
large BC6S6 population was developed while fixing other regions known to affect
domestication traits for a homozygous genotype. Thirteen phenotypes representing
differences between the progenitor and maize were measured in two summers, and
QTL mapping was performed. We detected an average of approximately two QTL
per trait, with QTL mapping to multiple regions. This suggested that, unlike other
genomic regions of importance in maize domestication, the fifth chromosome houses
a complex of linked loci that all contribute to the phenotypic effect. Additional
efforts were made to examine the power and precision of our mapping population
with simulated trait datasets. Heritability of a trait was found to have the primary
influence on the maximum number of detectable QTL, and we observed the Beavis
effect, the inflation of estimated effect sizes for QTL detected with low power. This work provides a focused examination of
a previously poorly understood region of the maize genome with large phenotypic
effects on domestication traits.
2. The second chapter focuses on fine mapping two QTL on the fifth chromosome identified in chapter one, one for culm diameter and one for kernel row number. Our
strategy used a population of plants with homozygous recombinant chromosomes
in replicated field trials. Neither QTL was successfully mapped to a single gene;
however, the culm diameter QTL was greatly reduced in size (∼2.5% of the original
1.5 LOD support interval). The kernel row number QTL was analyzed with whole
genome genotyping data, and a complex set of genetic factors influencing the trait
was identified. On chromosome five, the kernel row number QTL with the highest LOD score
shifted to a region outside of the original support interval. The
culm diameter and kernel row number QTL contained 40 and 63 genes, respectively,
which were examined for attractive candidate genes. Neither QTL had a clear best
candidate, but several genes showed evidence for cis regulatory change and multiple
genes had evidence of positive selection during the domestication of maize. While
this work was unsuccessful in identifying a single causative gene, we greatly reduce
the size of the culm diameter QTL and find evidence for complex inheritance of the
kernel row number phenotype.
3. Finally, the extent of genome-wide gene regulatory change is examined using next
generation sequencing methods. Three tissues from a collection of maize-teosinte
F1 hybrids and their inbred parents were harvested and next generation Illumina
sequencing was performed to assess differential expression of alleles. Using a hierarchical series of statistical tests, we differentiate between significant cis and trans
regulatory effects for approximately 17,000 genes in each of the three tissues studied.
We produce a list of filtered candidate genes (∼500 genes per tissue) with significant
and consistent cis effects. These genes are significantly associated with selection
features from a recent genome-wide scan for selection in maize, suggesting genes
with cis regulatory changes are frequently the target of positive selection. Additionally, the proportion of expression divergence due to cis effects was positively correlated
with overall divergence. Several other characteristics of the candidate cis genes were
also analyzed including gene ontology and other functional annotations. This study
represents the first genome-wide effort in a domesticated crop and wild progenitor
to assess allele specific expression dissecting cis and trans effects using F1 hybrids.
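The hierarchical testing described in the third chapter summary can be sketched in Python using only the standard library. The sketch assumes read counts that are already scaled to comparable depth. It uses three tests:

* A two-sided binomial test on parental counts (P) detects overall divergence.
* A binomial test on F1 allele counts (H) detects cis differences.
* Fisher's exact test comparing the parental and F1 ratios (T) detects trans differences.

The category rules below follow a scheme common in the allele-specific expression literature and are an assumption here. The definitions actually used in this work are those of Table 3.1 and may differ. All function names and counts are hypothetical.

```python
import math


def binom_test_half(k, n):
    """Two-sided exact binomial test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    one_side = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_side)  # symmetric null, so double one tail


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = {x: math.comb(col1, x) * math.comb(n - col1, row1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the probabilities of all tables no more probable than the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def regulatory_category(parent_m, parent_t, f1_m, f1_t, alpha=0.05):
    """Assign a cis/trans category from maize/teosinte read counts (hypothetical rules)."""
    p_sig = binom_test_half(parent_m, parent_m + parent_t) < alpha    # parental divergence
    h_sig = binom_test_half(f1_m, f1_m + f1_t) < alpha                # cis: F1 allele imbalance
    t_sig = fisher_exact_2x2(parent_m, parent_t, f1_m, f1_t) < alpha  # trans: ratios differ
    if not (p_sig or h_sig or t_sig):
        return "conserved"
    if p_sig and h_sig and not t_sig:
        return "cis only"
    if p_sig and t_sig and not h_sig:
        return "trans only"
    if h_sig and t_sig:
        return "cis and trans" if p_sig else "compensatory"
    return "ambiguous"


print(regulatory_category(80, 20, 80, 20))  # -> cis only
print(regulatory_category(80, 20, 50, 50))  # -> trans only
```

A genome-wide analysis would also correct for multiple testing across roughly 17,000 genes per tissue. This sketch omits that step.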
Chapter 1
Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL
1.1 Abstract
The domesticated crop maize and its wild progenitor, teosinte, have been used in numerous experiments to investigate the genetic basis of divergent morphologies. This study uses a QTL mapping population specific to the fifth chromosome to examine a poorly understood region of that chromosome associated with a number of traits under selection during domestication. In contrast with other major domestication loci in maize, where large-effect, highly pleiotropic single genes are responsible for phenotypic effects, our study found that the region on chromosome five fractionates into multiple QTL, none with a singularly large effect. The smallest 1.5 LOD support interval for a QTL contained 54 genes, one of which encodes a MADS MIKCC transcription factor, a family of proteins implicated in many developmental programs. We also used simulated trait datasets to investigate the power of our mapping population to identify QTL for which there is a single underlying causal gene. This analysis showed that while QTL for traits controlled by single genes can be accurately mapped, our population design detects on average no more than ∼4.5 QTL per trait, even when there are 100 causal genes. Thus, when a trait is controlled by five or more genes in the simulated data, the number of detected QTL can represent a simplification of the underlying causative factors. Our results show how a QTL region with effects on several traits may be due to multiple linked QTL of small effect as opposed to a single gene with large and pleiotropic effects.
1.2 Introduction
In evolutionary biology, quantitative trait locus (QTL) mapping has been used with great
success to define the genetic architecture controlling morphological differences between
species. These QTL mapping experiments have identified a number of QTL with large
effects in animal [22–24] and plant systems [25–28]. Often these experiments identify QTL
clusters in a relatively small number of genomic regions, suggesting an underlying genetic
architecture of single pleiotropic genes or several closely linked genes [8, 24, 29–31]. The
phenotypic effects of QTL have been successfully mapped to single large-effect pleiotropic
genes in many species [3, 5, 15, 16, 32–34]. However, these large-effect genes often explain
only a portion of the divergence between species, leaving considerable phenotypic
differences unexplained. Characterization of QTL clusters not associated with single
genes should lead to a more comprehensive understanding of the genetic architecture
that contributes to divergent phenotypes.
Domesticated crop plants, and maize in particular, provide a well-suited system in which
to study the evolution of new morphologies for a number of reasons. First, maize (Zea
mays ssp. mays) and its wild progenitor teosinte (Z. mays ssp. parviglumis) differ for a
suite of traits commonly seen in domesticated crop pairs. Collectively, these differences
are known as the domestication syndrome and include reduced lateral branching, loss
of natural seed dispersal, and gigantism of vegetative and reproductive tissues [10, 11].
Second, intense artificial selection upon domesticated crops, including maize, for desirable
agronomic traits leaves a signature of selection (reduced nucleotide diversity) allowing for
identification of putative targets of artificial selection in selective sweeps [35]. Third, like
most domestication events, maize domestication took place in the last 10,000 years and
surviving wild progenitor populations serve as reasonable surrogates for the ancestor [36].
In addition, maize and teosinte are inter-fertile, allowing for the use of genetic techniques
and crosses to dissect the genetic architecture underlying divergent traits [37, 38]. Finally,
researchers studying maize benefit from the maize reference genome sequence, which
makes it possible to anchor genetic markers to physical positions, annotate candidate
genes, and characterize important genomic features such as centromeres [9]. The
combination of these characteristics and available tools makes maize an effective model
system in which to study the evolution of new forms.
Previous work in maize and its wild progenitor suggests that the genes responsible for
phenotypic change are scattered throughout the genome, but with several concentrations
of genes (QTL) controlling large portions of the phenotypic differences [8, 25]. To date,
three large-effect pleiotropic genes have been mapped to these genomic regions of large
phenotypic importance. The short arm of chromosome one is home to grassy tillers1 (gt1),
which influences tillering [39] and is largely responsible for the concentration of seed into
a single large ear [4]. The gene teosinte branched1 (tb1) is found on the long arm of
chromosome one and has a large pleiotropic impact on plant and inflorescence branching
[3, 40]. Finally, the gene teosinte glume architecture1 (tga1) liberates the kernel from
its stony fruit case in teosinte [5]. In comparison to these extensively studied genes,
little is known about the genetic factors on other chromosomes responsible for phenotypic
divergence during maize domestication.
While early studies identified tb1 as the gene responsible for much of the phenotypic
effect on the long arm of chromosome one [41], a more recent study has identified at least
two additional loci upstream of tb1 with significant effects on phenotype [42]. These loci
influence the expression of tb1-like phenotypes in both additive and epistatic ways. The
nearest of these loci was only 5 centimorgans (cM) away from tb1 itself and also had an
effect specific to ear traits, leaving plant architecture traits such as tillering unaffected.
This suggests that secondary factors acting alongside major-effect genes can be quite
closely linked to them and could also mediate tissue-specific effects. Similarly, the work identifying gt1 also found
evidence of a secondary factor located downstream of the identified causative region that
slightly increases prolificacy (the number of ears) in plants carrying the teosinte allele [4].
One of the six genomic regions of large pleiotropic effect identified in maize is on
chromosome five where the genetic architecture underlying the large phenotypic effects
is largely unknown [8]. Previous work has found a number of domestication QTL on
chromosome five for culm diameter, kernel row number, ear diameter, disarticulation,
and pedicellate spikelet length [8, 37, 38]. A more recent experiment also found QTL
for a number of these traits on chromosome five, some of which (kernel row number, ear
diameter, and disarticulation) had particularly large effects and LOD scores [25]. While
these previous mapping experiments found significant QTL for domestication traits on
chromosome five, they could not determine whether this region contained a major QTL
with pleiotropic effects on several traits or multiple linked QTL.
In this paper, we undertook a QTL mapping study to better characterize the effect of
chromosome five on domestication traits. The experiment used a population of nearly
isogenic recombinant inbred lines (NIRILs) that concentrated informative crossover
events in the region of interest (chromosome five), grown in replicated blocks to improve
trait measurements. Both features increase mapping power on chromosome five relative to
a standard F2 mapping population, improving the ability to distinguish closely linked,
moderate- to small-effect, and interacting QTL. We detected QTL at multiple locations on
the fifth chromosome, none with a singularly large effect. This suggests that, unlike other
regions of the maize genome with single large-effect genes [3–5], chromosome five houses
several linked factors influencing phenotype. We also performed a simulation study to
gauge the power and precision of our mapping population. This analysis indicates that
for some traits the genetic architecture could be more complex than observed with
empirical data.
1.3 Materials and Methods
1.3.1 Plant Material, Genotypes, and Phenotypes
We conducted a QTL mapping experiment to investigate the genetic architecture of domestication traits on maize chromosome five using a collection of nearly isogenic recombinant inbred lines (NIRILs) in the summers of 2009 and 2010. The experimental population
was built by introgressing the majority of the short arm of chromosome five and part of
the long arm from a teosinte (Iltis and Cochrane collection 81) into the maize inbred
W22 by six generations of backcrossing. RFLP markers (Supplemental Table A.1) were
used during this process to follow the desired genomic segment and eliminate teosinte
segments at other known domestication QTL identified in a previous study [43]. The
extensive backcrossing in tandem with tracking and eliminating teosinte segments from
specific regions of the genome allowed the experiment to be focused on the segregating
teosinte introgression on chromosome five. Five BC6 individuals heterozygous for the target segment on chromosome five were selfed to produce five BC6 S1 families. The families
were then selfed for five additional generations to give an experimental BC6 S6 population
of 259 highly homozygous NIRILs, which carried a collection of teosinte fifth chromosome
introgressions in an isogenic W22 background.
Genomic DNA was extracted with a standard CTAB protocol from tissue collected
from an average of 15 individuals from each NIRIL in the summer of 2009. A collection
of 25 insertion/deletion and microsatellite markers (Supplemental Table A.2) was
genotyped across the fifth chromosome introgression using standard PCR and gel
electrophoresis methods. In total, there were 443 observed recombination breakpoints
among the NIRILs, or approximately 1.7 per line. The number of breakpoints per line
ranged from zero to six, with the majority of lines (51.7%) having either zero or one
recombination event. The numbers of lines with each number of breakpoints are as
follows: 56 (0 breakpoints), 78 (1 breakpoint), 49 (2 breakpoints), 48 (3 breakpoints),
19 (4 breakpoints), 7 (5 breakpoints), and 2 (6 breakpoints).
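The breakpoint summary can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the per-class line counts reported above (the variable names are illustrative) and recomputes the number of lines, the total number of breakpoints, the mean per line, and the share of lines with zero or one breakpoint.

```python
# Number of NIRILs observed with each count of recombination breakpoints,
# as reported in the text (breakpoints per line -> number of lines).
lines_by_breakpoints = {0: 56, 1: 78, 2: 49, 3: 48, 4: 19, 5: 7, 6: 2}

n_lines = sum(lines_by_breakpoints.values())
total_breakpoints = sum(k * n for k, n in lines_by_breakpoints.items())
mean_per_line = total_breakpoints / n_lines

# Fraction of lines carrying zero or one breakpoint.
low_fraction = (lines_by_breakpoints[0] + lines_by_breakpoints[1]) / n_lines

print(n_lines, total_breakpoints, round(mean_per_line, 2), f"{low_fraction:.1%}")
```

Running it prints `259 443 1.71 51.7%`, matching the population size, breakpoint total, and percentages given in the text.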
Phenotype data were collected for the experimental NIRILs in three replicated blocks,
two in the summer of 2009 and one in 2010, grown at the West Madison Agricultural
Research Station in Madison, Wisconsin. Blocks consisted of the 259 NIRILs planted
in randomized plots of ten (2009) or twelve (2010) plants each. Five plants from each
plot were assessed for thirteen phenotypes (Table 1.1) representing a number of plant
and inflorescence differences between teosinte and maize. Plant traits included plant
height, days to pollen shed, the amount of tillering, length of the primary lateral branch,
prolificacy, and culm diameter. Traits measured in the female inflorescence (ear) were
kernels per rank, kernel row number, ear diameter, ear length, and percent staminate
spikelets. Two traits of the male inflorescence (tassel) were also measured: pedicellate
spikelet length and tassel branch number. Genotype and phenotype data are available
from the Dryad Digital Repository:
http://dx.doi.org/10.5061/dryad.7sq67.
1.3.2 Mixed Models and Heritability
We estimated the NIRIL phenotype for all traits by fitting a linear mixed model. Fixed
effects consisted of NIRIL, NIRIL family, and position within block, while block and year
were treated as random effects. As an initial scope, the following model was fit with the
MIXED procedure in SAS [44]:

$$Y_{ijklmno} = \mu + a_i(f_j) + f_j + b_k + c_l(b_k) + d_m(b_k) + c_l(b_k) \ast d_m(b_k) + t_n + e_{ijklmn} + g_{ijklmno} \tag{1.1}$$

In this model, $Y_{ijklmno}$ is the individual trait value, $\mu$ the overall mean, $f_j$ the
family effect, $a_i(f_j)$ the effect of line nested in family, and $b_k$ the random block
effect; horizontal and vertical position in the field nested in block are represented by
$c_l(b_k)$ and $d_m(b_k)$, respectively, and their interaction by $c_l(b_k) \ast d_m(b_k)$;
$t_n$ is the year effect, $e_{ijklmn}$ is the experimental error (between plots), and
$g_{ijklmno}$ is the within-plot sampling error. Each model term was tested for
significance on a trait-by-trait basis with t-tests for fixed effects and likelihood ratio
tests with one degree of freedom for random effects. Terms with p-values greater than
0.05 were deemed not significant and were removed from the model. Although the initial
scope of the model included random block and year effects, none of the random effects
were found to be significant. After appropriate models were defined for the studied traits
(Table 1.2), least squares means for each trait were calculated and used for QTL mapping.

Table 1.1: NIRIL phenotyped traits, descriptions, approximate distributions, between-year
Pearson correlation coefficients (r), and associated p-values.

Trait  Description                          Distribution  Pearson r  p-value
CULM   Diameter of culm                     normal        0.688      <0.0001
DTP    Days to pollen shed                  normal        0.668      <0.0001
EARD   Ear diameter                         bimodal       0.907      <0.0001
EARL   Ear length                           normal        0.409      <0.0001
KPR    Kernels per rank                     bimodal       0.698      <0.0001
KRN    Kernel row number                    bimodal       0.718      <0.0001
LBLH   Primary lateral branch length        normal        0.519      <0.0001
PLHT   Plant height                         normal        0.652      <0.0001
PROL   Prolificacy, ears on lateral branch  exponential   0.422      <0.0001
SPLH   Spikelet length                      normal        N/A        N/A
STAM   Percent staminate spikelets          exponential   0.321      <0.0001
TBN    Tassel branch number                 normal        0.691      <0.0001
TILL   Tillering index                      exponential   0.346      <0.0001

Table 1.2: Final models selected for the thirteen NIRIL phenotypes.

Trait  Model
CULM   line(family) + family + x(plot) + y(plot)
DTP    line(family) + family + x(plot) + y(plot) + x*y(plot)
EARD   line(family) + family + x(plot) + y(plot) + x*y(plot)
EARL   line(family) + family + x(plot) + y(plot) + x*y(plot)
KPR    line(family) + family + x(plot) + y(plot) + x*y(plot)
KRN    line(family) + family + x(plot)
LBLH   line(family) + family + x(plot) + y(plot) + x*y(plot)
PLHT   line(family) + family + x(plot) + y(plot)
PROL   line(family) + family + x(plot)
SPLH   line(family) + family + x
STAM   line(family) + family + x(plot) + y(plot) + x*y(plot)
TBN    line(family) + family + x(plot) + y(plot) + x*y(plot)
TILL   line(family) + family + y(plot)
Broad-sense heritabilities on a plot-means basis ($H^2$) were calculated for each of the
traits. The variance components needed for this calculation were estimated using a linear
mixed model with plot means as the dependent variable and plot and line as random
independent variables. Variance components for the line, or genotypic, component
($\sigma_g^2$), the plot ($\sigma_p^2$), and the residual variance due to environment
($\sigma_e^2$) were extracted, and Equation 1.2 was used to calculate $H^2$. The plot
variance ($\sigma_p^2$) was estimated in the model as a known source of phenotypic
variation. Because it is accounted for, it does not contribute to the unexplained
environmental variation captured by the residual variance ($\sigma_e^2$), and it was
therefore not used to calculate heritability.
$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} \tag{1.2}$$

1.3.3 QTL Mapping
We mapped QTL using a model-based approach in R/qtl [45, 46], with phenotypes
represented by least squares means and genotypes at the 25 genetic markers for the
NIRILs. The introgression on the fifth chromosome started as a heterozygous segment in
the BC6 generation and segregates as an S6 population. Consequently, we analyzed the
population as a BC0 S6 in R/qtl. Genotypes were first used to produce a genetic map
for the teosinte segment introgression using the Kosambi mapping function [47], with a
0.0001 genotyping error rate as implemented in R/qtl. Genetic marker order was initially
determined by BLAST searches against the AGPv2 genome and confirmed using the ripple
function in R/qtl with a five-marker window. Significance thresholds for LOD scores were
determined for each trait at the 5% level based on 10,000 permutations of the data.
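The Kosambi mapping function named above converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. A minimal Python sketch of the function and its inverse follows. It is illustrative only and is not R/qtl's implementation: R/qtl's map estimation also models genotyping error and, for a selfed cross like this one, recombination accumulated over several generations, neither of which the sketch attempts.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance (cM) for a per-meiosis recombination fraction r, 0 <= r < 0.5,
    using the Kosambi mapping function d = (1/4) ln[(1 + 2r) / (1 - 2r)] Morgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 100.0 * 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi function: r = (1/2) tanh(2d), with d converted to Morgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)


print(round(kosambi_cm(0.1), 2))             # 10.14
print(round(kosambi_r(kosambi_cm(0.2)), 6))  # 0.2
```

The inverse recovers r exactly because tanh of half the log-ratio simplifies back to 2r.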
QTL models for each phenotype were determined by scanning for potential QTL using
the Haley-Knott regression method and testing QTL for significance one by one. We first
scanned for QTL with the R/qtl function scanone to find an initial QTL position with a
LOD score greater than the 5% cutoff calculated by permutations. Next, we scanned for
additional QTL using the addqtl function. If this secondary scan detected a QTL that
exceeded the 5% LOD cutoff, it was added to the model and QTL positions were refined
using the R/qtl function refineqtl. QTL were added to the model using this cycle of
(1) scanning for additional QTL, (2) adding significant QTL to the model, and
(3) refining QTL positions, until no more significant QTL could be added. Once all
significant QTL were added, pairwise interactions between QTL were tested using the
addint function of R/qtl. Significant pairwise interactions (F-test, p < 0.05) were added
to the model one by one until no more significant interactions were detected. After the
model was finalized, each QTL in the final model was tested for significance with a
drop-one-term ANOVA.
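Each step above compares a LOD score with a 5% threshold obtained from permutations. The Python sketch below illustrates that idea under simplifying assumptions. It scores only the observed markers, comparing one phenotype mean per genotype class with a single overall mean (at fully genotyped markers this is essentially what Haley-Knott regression computes), rather than R/qtl's scan along an interpolated grid. All names are hypothetical. Shuffling phenotypes among lines destroys any genotype-phenotype association, so the upper 5% quantile of the maximum LOD across permutations estimates a chromosome-wide 5% significance threshold.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """Single-marker LOD: one mean per genotype class vs. one overall mean."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss1 += sum((y - group_mean) ** 2 for y in ys)
    if rss1 == 0.0:
        return math.inf  # genotype classes explain the phenotype perfectly
    return (n / 2) * math.log10(rss0 / rss1)


def max_lod(phenotypes, markers):
    """Largest LOD across markers; each marker is a list of genotype codes."""
    return max(marker_lod(phenotypes, g) for g in markers)


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the maximum LOD under permuted phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        maxima.append(max_lod(shuffled, markers))
    maxima.sort()
    index = min(math.ceil((1 - alpha) * n_perm) - 1, n_perm - 1)
    return maxima[index]


# Toy data: 40 lines, two unlinked markers, a strong effect at the first marker.
rng = random.Random(0)
markers = [[i % 2 for i in range(40)], [(i // 2) % 2 for i in range(40)]]
y = [5.0 * g + rng.gauss(0, 1) for g in markers[0]]
threshold = permutation_threshold(y, markers, n_perm=200)
print(max_lod(y, markers) > threshold)
```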
1.3.4 Simulation Experiment
To explore the maximum number of QTL detectable in this study, we mapped QTL with
simulated datasets in which causative genes were randomly chosen from the genes in the
teosinte introgressed region. Simulated traits were made for one to 15 causative genes,
then 20 to 50 genes in steps of five, and then 75 and 100 causative genes, for a total of
24 different causative gene set sizes. The 25 genotyped markers in our 259 NIRILs were
used to assign genotype probabilities to the 2,576 genes in the introgressed segment of
chromosome five based on the genotypes of flanking markers. These genotype
probabilities were assigned based on physical proximity to the two flanking markers,
assuming that physical distance was proportional to genetic distance, so that a gene
closely linked to a given marker had a high probability of sharing that marker's genotype.
When consecutive markers had identical genotypes, this method resulted in all genes
between them matching the flanking genotypes.
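The interpolation just described can be sketched as follows. This is one plausible reading, not the authors' script: it assumes that the probability that a gene shares a flanking marker's genotype falls linearly with physical distance from that marker, and the function names are hypothetical. As in the text, genes between two markers with the same genotype receive that genotype with probability 1.

```python
import random


def gene_genotype_probs(gene_pos, left_pos, left_geno, right_pos, right_geno):
    """Genotype probabilities for a gene between two flanking markers, weighting
    each marker by physical proximity (linear interpolation is an assumption)."""
    if not left_pos < right_pos:
        raise ValueError("left marker must precede right marker")
    if not left_pos <= gene_pos <= right_pos:
        raise ValueError("gene must lie between its flanking markers")
    if left_geno == right_geno:
        return {left_geno: 1.0}  # no crossover between markers: gene matches both
    frac = (gene_pos - left_pos) / (right_pos - left_pos)
    return {left_geno: 1.0 - frac, right_geno: frac}


def sample_gene_genotype(probs, rng):
    """Draw one genotype from a {genotype: probability} mapping."""
    r = rng.random()
    cumulative = 0.0
    for geno, p in probs.items():
        cumulative += p
        if r < cumulative:
            return geno
    return geno  # floating-point round-off: fall back to the last genotype


# A gene one quarter of the way from a maize-genotype marker to a teosinte one.
probs = gene_genotype_probs(25.0, 0.0, "M", 100.0, "T")
print(probs)  # {'M': 0.75, 'T': 0.25}
```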
Phenotypic trait values are based on both the underlying genetic contributions of genes
and random environmental noise, which together define the heritability of a trait. The
genetic values in the simulated data were set as follows. For each simulated dataset, the
randomly chosen causative genes were assigned a genotype based on the previously derived
genotype probabilities and were given effects of two types: equal effects and random
gamma-distributed effects (α = 1.36, β = 1) [48]. The effect types for each gene were given a positive, zero,
or negative value depending on whether the assigned genotype was homozygous maize,
heterozygous, or homozygous teosinte, respectively. Thus, each simulated causative gene
had two numeric values (one for equal and one for gamma distributed effects) representing
the magnitude and direction of effect on the trait. The total genetic contribution to NIRIL
phenotype was then found by simply summing the gene values (equal and gamma effects
kept separate) for all simulated causative genes.
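A minimal Python sketch of how the genetic values could be assembled, under stated assumptions: genotypes are coded M (homozygous maize), H (heterozygous), and T (homozygous teosinte); equal effects are given a magnitude of 1.0, which the text does not specify; and gamma magnitudes are drawn with Python's `random.gammavariate(alpha, beta)`, where beta is a scale parameter (with β = 1 the shape/scale and shape/rate parameterizations coincide). The function names are hypothetical.

```python
import random

# Sign of a causative gene's contribution by NIRIL genotype (from the text):
# homozygous maize -> positive, heterozygous -> zero, homozygous teosinte -> negative.
SIGN = {"M": 1, "H": 0, "T": -1}


def draw_effect_sizes(n_genes, rng, alpha=1.36, beta=1.0):
    """Return (equal, gamma) effect magnitudes for n causative genes.
    The equal-effect magnitude of 1.0 is an assumption; the text gives none."""
    equal = [1.0] * n_genes
    gamma = [rng.gammavariate(alpha, beta) for _ in range(n_genes)]
    return equal, gamma


def genetic_values(line_genotypes, effects):
    """Total genetic value per line: sum of signed effects over causative genes.
    line_genotypes holds, for each line, one genotype code per causative gene."""
    return [sum(SIGN[g] * e for g, e in zip(genos, effects))
            for genos in line_genotypes]


rng = random.Random(2014)
equal, gamma = draw_effect_sizes(3, rng)
lines = [["M", "T", "H"], ["M", "M", "M"], ["T", "T", "T"]]
print(genetic_values(lines, equal))  # [0.0, 3.0, -3.0]
print(genetic_values(lines, gamma))  # same genotypes, gamma-distributed magnitudes
```

Equal and gamma effects are kept in separate lists, mirroring the text's note that the two effect types were summed separately.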
Environmental noise was added to the summed NIRIL genetic values by taking random
draws from a normal distribution with variance equal to the additional variance needed
to reach the desired level of heritability. Two levels of heritability were simulated, 67%
and 90%, to mimic the heritabilities of two actual traits, the moderately heritable culm
diameter and the highly heritable ear diameter. The heritability of each simulated trait
was required to be within 2.5% of the desired heritability; otherwise, the noise was
redrawn from the normal distribution. For each set of simulated causative genes, this
process resulted in four states for the NIRILs: equal effect at 67% $H^2$, equal effect at
90% $H^2$, gamma effect at 67% $H^2$, and gamma effect at 90% $H^2$.
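Because Equation 1.2 gives $H^2 = \sigma_g^2/(\sigma_g^2 + \sigma_e^2)$, the environmental variance needed for a target heritability is $\sigma_e^2 = \sigma_g^2 (1 - H^2)/H^2$. The Python sketch below adds normal noise with that variance and redraws it until the realized heritability is close enough to the target. Two details are assumptions not stated in the text: realized heritability is computed as the ratio of genetic to phenotypic variance across lines, and "within 2.5%" is read as an absolute tolerance of 0.025 on the heritability scale. The function name and toy genetic values are hypothetical.

```python
import random
import statistics


def add_environmental_noise(genetic, target_h2, rng, tolerance=0.025, max_tries=10_000):
    """Add N(0, sigma_e^2) noise to genetic values so that the realized
    heritability var(G) / var(G + E) is within `tolerance` of `target_h2`."""
    var_g = statistics.pvariance(genetic)
    if var_g == 0:
        raise ValueError("genetic values do not vary; heritability is undefined")
    # Rearranging H2 = Vg / (Vg + Ve) gives the environmental variance needed.
    var_e = var_g * (1 - target_h2) / target_h2
    sd_e = var_e ** 0.5
    for _ in range(max_tries):
        phenotypes = [g + rng.gauss(0.0, sd_e) for g in genetic]
        realized_h2 = var_g / statistics.pvariance(phenotypes)
        if abs(realized_h2 - target_h2) <= tolerance:
            return phenotypes, realized_h2
        # Otherwise redraw the noise, as described in the text.
    raise RuntimeError("no noise draw reached the target heritability")


rng = random.Random(42)
genetic = [float(i % 2) for i in range(259)]  # toy genetic values for 259 lines
phenotypes, h2 = add_environmental_noise(genetic, 0.67, rng)
print(round(h2, 3))
```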
We simulated twenty-four causative gene set sizes with two effect types and two heritabilities for a total of 96 distinct simulated states. Each of these states was replicated
1,000 times resulting in 96,000 simulated sets of phenotypes for the 259 NIRILs. These
phenotype values were then used with actual NIRIL genotypes to map QTL in the R/qtl
software using the same method as described in the previous section. Pairwise QTL interactions were not tested for or added in the simulated datasets because interactions
were not part of the simulated conditions. Mapping of QTL for thousands of simulated
traits could not be accomplished manually and consequently was done with a custom R
script that automated the addition of QTL and saved summary information including
QTL estimated effect size, position, LOD scores, and number of QTL.
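The size of the simulation follows directly from the design. The short Python sketch below (variable names are illustrative) enumerates the causative gene set sizes listed in the methods and crosses them with the two effect types and two heritabilities.

```python
# Causative gene set sizes from the methods: 1-15, 20-50 in steps of 5, then 75 and 100.
gene_set_sizes = list(range(1, 16)) + list(range(20, 51, 5)) + [75, 100]
effect_types = ["equal", "gamma"]
heritabilities = [0.67, 0.90]
replicates = 1_000

states = [(n, effect, h2)
          for n in gene_set_sizes
          for effect in effect_types
          for h2 in heritabilities]
print(len(gene_set_sizes), len(states), len(states) * replicates)  # 24 96 96000
```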
1.4 Results
1.4.1 QTL Mapping
Previous work has shown chromosome five to be home to several QTL with high LOD scores
and large effect sizes for a number of inflorescence and plant architecture domestication
traits [8, 25]. We undertook a high-resolution mapping experiment with a population of
NIRILs carrying variable fifth chromosome teosinte introgressions in a W22 maize
background. In the summers of 2009 and 2010, the 259 NIRILs were grown in randomized
plots arranged in three replicated blocks. Phenotype data for thirteen traits were
collected for five plants per plot; spikelet length was collected only for a single block in
the summer of 2010. We analyzed trait measurements from all three growing environments
together in a single linear mixed model with block and year as random effects and
position, NIRIL, and family as fixed explanatory variables. Least squares means were
estimated from the mixed models and later used for QTL mapping.
Histograms of the least squares means show several distribution types, including normal, bimodal, and exponential (Supplemental Figure A.1). NIRILs genotyped as 100% maize (29 lines) and 100% teosinte (27 lines) were used to determine whether traits behaved as expected, with the full teosinte introgression lines having more teosinte-like phenotypes. Several traits not believed to be primary targets of selection during domestication, such as days to pollen shed and plant height, appear to have little or no overall difference between NIRILs containing the maize and teosinte introgressions, while traits that were the primary focus of selection during domestication, including kernel row number (KRN) and ear diameter (EARD), show a substantial phenotypic difference between homozygous maize and teosinte NIRILs. For all domestication traits, we observed a difference (sometimes quite small) between the least squares means for maize and teosinte NIRILs consistent with the expected effect of domestication. Particularly large differences are seen for EARD and KRN, where the maize genotype is 17.3% and 14.8% larger than the teosinte genotype, respectively. The CULM trait is also of interest, with the maize genotype 6.5% larger than teosinte.
There was a balanced representation of maize and teosinte genotypes, with a high degree of homozygosity, in the QTL mapping population. Overall genotypes of the NIRILs were 48.3% maize, 48.2% teosinte, and 3.5% heterozygous. The NIRIL population included lines with teosinte introgressions spanning 162.24 megabases (Mbp), from position 6,985,619 to 169,231,037 on the maize reference genome (AGPv2). This introgression covered 74.47% of the approximately 218 Mbp fifth chromosome. Of the 4,503 fifth chromosome genes in the Filtered Gene Set (version 5b), 411 genes at the tip of the short arm and 1,516 genes on the long arm were not included in the teosinte introgressions used
in this study.

Figure 1.1: Cumulative plot of QTL detected in the mapping experiment. Molecular marker positions are shown in centimorgans at the bottom. QTL names, consisting of an abbreviated trait name, the chromosome number, and a QTL number, are shown on the left side. The 1.5 LOD support intervals for QTL are indicated by horizontal bars and peak LOD scores by vertical lines. Hatched bars indicate interacting QTL, while solid bars indicate non-interacting QTL. In total, 24 QTL were identified across the fifth chromosome with a variety of confidence interval sizes, maximum LOD scores, and effect sizes (see Table 1.3 for QTL statistics). Five QTL clusters, contiguous regions with five or more overlapping QTL 1.5 LOD support intervals, are indicated by grey shading. A grey-scale heat map at the top depicts the number of overlapping QTL 1.5 LOD support intervals, from white (0) to black (8).

The genetic map generated with the Kosambi mapping function in R/qtl was calculated to be 86.64 centimorgans (cM) long, giving an average ratio of 1.873 Mbp/cM.
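The physical and genetic map figures can be cross-checked with simple arithmetic. The Python sketch below uses the AGPv2 coordinates, map length, and gene counts reported above (variable names are illustrative). The span computed from the coordinates is about 162.2 Mbp, and dividing by the 86.64 cM map gives the reported 1.873 Mbp/cM. The sketch also confirms that the 4,503 chromosome five genes minus the 411 and 1,516 genes outside the introgression leave the 2,576 genes used in the simulation experiment.

```python
start_bp, end_bp = 6_985_619, 169_231_037  # introgression limits on AGPv2
map_length_cm = 86.64                       # Kosambi genetic map length

span_mbp = (end_bp - start_bp) / 1e6
mbp_per_cm = span_mbp / map_length_cm
print(f"{span_mbp:.1f} Mbp over {map_length_cm} cM = {mbp_per_cm:.3f} Mbp/cM")

genes_on_chr5 = 4_503
genes_outside = 411 + 1_516  # short-arm tip + long-arm genes not introgressed
genes_inside = genes_on_chr5 - genes_outside
print(genes_inside)  # 2576, the gene count used in the simulation experiment
```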
We analyzed 13 traits and identified 24 QTL (Figure 1.1, Table 1.3) with LOD scores
ranging from 2.70 (KPR) to 47.22 (KRN). A single epistatic interaction was detected,
between the two kernel row number QTL, suggesting that epistasis among these QTL is
minimal. QTL 1.5 LOD support intervals ranged from 2.3 cM (KRN) to 50.6 cM (KPR),
with an average of approximately 12.5 cM. Heritability on a plot-mean basis (Table 1.3)
varied by trait, with an average $H^2$ of 63% and a range of 23% (PROL) to 90% (EARD).
Five QTL clusters, defined as contiguous regions with five or more overlapping QTL 1.5
LOD support intervals, were found in the mapping region on chromosome five near 2, 51,
61, 70, and 84 cM (Figure 1.1). There is no single clear concentration of QTL, suggesting
that this genomic region lacks a single gene of large, pleiotropic effect and that multiple
linked factors at loci spread across the fifth chromosome are responsible for the
previously identified influence of chromosome five on domestication traits.
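The cluster definition used here (contiguous regions where at least five support intervals overlap) amounts to a coverage count along the chromosome. The Python sketch below shows one way to compute it with a sweep over interval endpoints. The function name and the toy intervals in the usage example are hypothetical and are not the Table 1.3 intervals, and intervals that merely touch are treated as overlapping.

```python
def overlap_regions(intervals, min_depth):
    """Return (start, end) spans covered by at least `min_depth` intervals.

    intervals: (start_cM, end_cM) pairs treated as closed intervals, so
    intervals that merely touch are counted as overlapping."""
    events = []
    for start, end in intervals:
        events.append((start, 0, 1))  # openings sort before closings at a tie
        events.append((end, 1, -1))
    events.sort()
    regions, depth, region_start = [], 0, None
    for pos, _, delta in events:
        depth += delta
        if depth >= min_depth and region_start is None:
            region_start = pos
        elif depth < min_depth and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return regions


toy = [(0, 10), (2, 8), (3, 12), (20, 30), (22, 25)]
print(overlap_regions(toy, 3))  # [(3, 8)]
print(overlap_regions(toy, 2))  # [(2, 10), (22, 25)]
```

With five or more overlapping intervals as the cutoff, `overlap_regions(intervals, 5)` would return the cluster spans.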
1.4.2 Simulation Experiment
We performed a simulation experiment to determine the power and precision of our mapping population. Using causative genes projected onto actual NIRIL genotypes, 96 distinct simulated states defined by number of genes (between one and 100), heritability (67% and 90%), and effect type (equal and gamma) were each replicated 1,000 times, for a grand total of 96,000 simulated NIRIL trait datasets. Histograms of simulated traits with 90% heritability were clearly bimodal when one causative gene was simulated and progressively approached a normal distribution as more causative genes were simulated. In comparison, simulated traits with 67% heritability lacked a clear bimodal distribution even when only a single causative gene was simulated and were approximately normal when 100 genes were simulated (Figure A.2).
Table 1.3: Detected QTL for the T5S mapping population with position, heritability, and
LOD score statistics. Support intervals (SI) and peak locations are in cM; percent
variation is the percent of phenotypic variation explained. "Model" rows give statistics
for each trait's full QTL model, and krn5.1:2 is the interaction between krn5.1 and krn5.2.

QTL         LOD    1.5 LOD SI (cM)  Peak (cM)  Percent Variation  H2
culm5.1     13.50  58.9–69.3        65.3       21.3%              66.5%
dtp5.1      16.36  0.0–11.7         2.3        20.1%              -
dtp5.2      18.76  75.7–80.0        77.4       23.6%              -
dtp model   28.93  -                -          40.1%              67.3%
eard5.1     3.00   0.0–24.2         12.9       1.7%               -
eard5.2     17.99  50.1–54.4        51.9       11.7%              -
eard5.3     33.76  82.9–85.9        84.4       25.6%              -
eard model  65.62  -                -          69.0%              90.0%
earl5.1     12.38  0.0–5.4          1.9        19.7%              49.1%
kpr5.1      2.70   0.0–50.6         2.2        3.0%               -
kpr5.2      6.80   44.9–64.8        63.2       7.9%               -
kpr5.3      4.11   76.0–86.2        80.9       4.6%               -
kpr model   27.41  -                -          38.5%              72.7%
krn5.1      6.22   18.8–24.7        21.5       4.8%               -
krn5.2      47.22  82.6–84.9        83.8       53.4%              -
krn5.1:2    3.32   -                -          2.5%               -
krn model   50.56  -                -          59.2%              73.7%
lblh5.1     24.61  75.0–81.1        79.0       35.3%              53.5%
plht5.1     7.64   0.0–2.4          0.0        11.3%              -
plht5.2     2.89   24.3–39.2        31.7       4.1%               -
plht model  14.06  -                -          22.0%              63.1%
prol5.1     8.38   56.9–71.6        64.2       13.8%              22.9%
splh5.1     9.14   0.0–18.7         13.0       10.2%              -
splh5.2     7.16   65.7–68.4        67.7       7.9%               -
splh5.3     2.78   74.3–86.6        78.0       2.9%               -
splh model  30.60  -                -          41.8%              88.3%
stam5.1     6.50   50.7–86.6        83.8       10.9%              25.9%
tbn5.1      8.28   0.0–4.0          0.3        13.1%              -
tbn5.2      4.60   43.6–53.2        47.3       7.1%               -
tbn model   10.46  -                -          16.9%              69.9%
till5.1     7.21   44.1–62.9        58.7       9.8%               -
till5.2     3.22   77.2–85.9        81.8       4.2%               -
till model  18.61  -                -          28.1%              34.3%
Because calculating significant LOD score thresholds via permutations for all 96,000
simulated phenotype sets would have taken weeks of computation time, we calculated
LOD score cutoffs for the first 50 replicates of each of the 96 states. The average threshold
was lower for 90% heritability than for 67% heritability, with no clear difference in
threshold caused by the effect type of causative genes. Simulated phenotypes with few
causative genes had a lower threshold on average, and this effect was more pronounced for
the gamma-distributed effect type. The range of LOD score thresholds was quite narrow
(2.37 to 2.59 for gamma-distributed and 2.38 to 2.60 for equal effects). Consequently,
instead of running permutations for the remaining datasets, we set a conservative LOD
score threshold for mapping all simulated traits: the maximum of the 5% cutoffs found in
the first 50 replicates of each of the 96 states.
After simulated phenotypes were generated and significance thresholds were set, QTL
were mapped in the 96,000 simulated datasets using the actual genotypes of the NIRILs
in this study. Increasing the number of simulated causative genes from one to 100 caused
the mean number of detected QTL to rise from one to ∼4.5 or ∼3.0 for simulated traits
with 90% or 67% heritability, respectively (Figure 1.2). Thus, heritability was an
important factor in determining the number of detectable QTL in our experiment.
Simulated gamma effects, as opposed to equal effects, appeared to cause the maximum
number of detectable QTL to be reached at a larger number of simulated causative genes,
but there was no difference in the overall maximum number of QTL detected.
Figure 1.2: The number of detected QTL and mean detected QTL effect size versus
number of simulated causative loci. Black lines indicate 95% confidence intervals.
(A) Simulations consistently detect one QTL when a single causative gene is simulated,
but with as few as three or four causative genes, we lose the ability to distinguish
between genes. With high numbers of simulated causative genes, the total number of QTL
detected reaches a ceiling of ∼4.5 QTL for simulated traits with 90% heritability and
∼3.0 for traits with 67% heritability. (B) The effects of unresolved genes are merged into
the few large-effect QTL that are detected, consistent with the Beavis Effect. This is seen
in the negative correlation between mean estimated effect and number of causative genes.

Our results show that QTL 1.5 LOD support intervals quickly become associated with
multiple genes when many causative genes are simulated (Figure 1.3). In the case of
five causative genes with equal effects and 67% heritability, the chance of a QTL
containing a single causative gene has already dropped to approximately 50% (similar
patterns are seen for gamma-simulated phenotypes in Supplemental Figure A.3). This
suggests that, when making decisions about fine mapping of QTL, researchers would be
well advised to consider factors such as trait heritability and the power of their mapping
population to identify QTL support intervals that contain single causative genes.
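The classification in Figure 1.3 needs two steps: deriving a 1.5 LOD support interval from a LOD profile and counting the simulated causal genes inside it. A minimal Python sketch follows. It assumes a LOD profile evaluated on a grid of positions and takes the interval as the outermost grid positions, contiguous with the peak, whose LOD stays within 1.5 of the maximum. R/qtl's lodint function handles interval ends somewhat differently (for example, it can optionally expand to flanking markers), and the names here are hypothetical.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Bounds of the contiguous region around the LOD peak whose LOD is within
    `drop` of the maximum. positions and lods are parallel lists on a grid."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


def classify_interval(interval, causal_positions):
    """Label a detected QTL by the number of simulated causal genes in its interval."""
    left, right = interval
    n_inside = sum(left <= p <= right for p in causal_positions)
    if n_inside == 0:
        return "zero"
    return "one" if n_inside == 1 else "more than one"


positions = [0, 1, 2, 3, 4, 5, 6]  # cM
lods = [0.5, 1.0, 3.0, 4.0, 3.2, 1.0, 0.2]
interval = lod_support_interval(positions, lods)
print(interval)                                 # (2, 4)
print(classify_interval(interval, [2.5, 3.5]))  # more than one
```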
In our simulation experiment, increasing the number of causative genes also led to an
increase in the average estimated effect size of detected QTL (Figure 1.2). We interpreted
this as the effects of multiple underlying causative genes being combined into a single
detected QTL with a cumulative effect, consistent with the Beavis Effect, in which
multiple small-effect loci are detected as single QTL of larger effect [49]. On average,
the total additive effect for each simulated phenotype should be the product of the
number of simulated causative genes and the average effect size. We found this expected
relationship between the number of detected QTL, the average estimated additive effect
of each detected QTL, and the expected total additive effect for both equal and
gamma-distributed effect sizes and both heritabilities.
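The expected relationship can be written as mean detected effect ≈ (number of causative genes × mean gene effect) / number of detected QTL, assuming, as in the simulation, that every causative gene shifts the phenotype in the same direction. The Python sketch below is a back-of-the-envelope illustration, not a reanalysis. The unit effect of 1.0 is an assumption, the gamma mean of 1.36 follows from shape 1.36 and scale 1, and the function name is hypothetical.

```python
def expected_mean_detected_effect(n_causal, mean_gene_effect, n_detected):
    """Average effect per detected QTL if the detected QTL jointly absorb the
    total additive effect (n_causal * mean_gene_effect)."""
    if n_detected <= 0:
        raise ValueError("at least one QTL must be detected")
    total_effect = n_causal * mean_gene_effect
    return total_effect / n_detected


# 100 equal-effect genes (effect 1.0 each, an assumed unit) lumped into ~4.5 QTL
print(round(expected_mean_detected_effect(100, 1.0, 4.5), 1))   # 22.2
# Gamma(shape 1.36, scale 1) effects have mean 1.36
print(round(expected_mean_detected_effect(100, 1.36, 4.5), 1))  # 30.2
```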
Our mapping results for empirically measured traits found three QTL for a trait with
90% heritability (ear diameter) and a single QTL for a trait with 67% heritability (culm
diameter). Comparison of these results with the simulations shows that, for traits with
90% heritability, when three or more QTL are detected there are likely to be anywhere
from four to six underlying causative genes, making a 1:1 relationship between the
number of QTL and causative genes uncertain (Figure 1.2). In contrast, simulated traits
with 67% heritability and a single causative gene averaged a single detected QTL, which
contained the causative gene 90% to 95% of the time. These observations have
implications for future efforts to fine map the causative genes underlying QTL.
Figure 1.3: The proportion of detected QTL with zero, one, or more than one simulated causative gene in the 1.5 LOD support interval. High numbers of causative genes lead to detected QTL that contain multiple causative genes. A reasonable percentage of detected QTL in the simulations contain a single causative gene when few (fewer than four) causative genes are simulated. As the number of simulated causative genes increases, however, we quickly lose the power to distinguish between closely linked causative genes, and they become lumped into single detected QTL. The equal effect simulations shown here are very similar to those for gamma distributed effects (Supplemental Figure A.3).
1.5 Discussion
Previous studies in maize have found single genes underlying genomic regions of large effect on multiple domestication traits [3–5, 41, 50]. This is in stark contrast to our work on chromosome five, where the previously observed large effect of chromosome five on several domestication traits in maize [8, 25] is caused by multiple regions spread across the chromosome. This suggests that the genetic factors controlling domestication traits on chromosome five differ in nature from other large domestication loci in maize. Whether the situation of chromosome five is unique in maize or among crop plants remains to be seen. However, the several loci identified in this study suggest that, in addition to acting effectively on highly pleiotropic, large effect single genes, the domestication process can also act on several linked genes of variable effect to produce a chromosomal region of large QTL effect.
Although our results show that several regions on chromosome five contain QTL affecting different traits, this chromosomal region was initially defined as several tightly clustered QTL in F2 crosses between teosinte and a small-eared primitive Mexican landrace [43]. In contrast, our NIRIL population was developed from a cross of teosinte by a modern agronomic maize inbred (W22). It is therefore expected to harbor domestication QTL as well as improvement QTL selected during the 9,000 years since maize was domesticated. Thus, while results from this analysis suggest that chromosome five houses a complex made of multiple linked factors, we cannot discount the possibility that a simpler genetic architecture would have been observed had we used a primitive maize landrace rather than the W22 inbred line.
One potential use of QTL mapping results is to interrogate the genes within QTL 1.5 LOD support intervals for likely candidates. The marker density in our experiment leaves most QTL 1.5 LOD support intervals containing hundreds of annotated genes. However, two QTL had narrow support intervals that contained a relatively small number of genes. These two QTL were krn5.2 and eard5.3, which co-localize to the same ∼2.3 cM region. When expanded to the nearest genetic markers, these QTL fell between umc1348 and umc1966, which span a 4.81 cM region covering 2.654 Mbp with 54 genes from the maize filtered gene set (AGPv2). One interesting candidate in this range is AC212823.4_FG003, which encodes a MADS-box transcription factor previously cataloged as MADS-transcription factor 65 (mads65) in the GRASSIUS transcription factor database [51]. Initially identified in plants as important regulators of floral organ identity [52, 53], the MADS-box family of transcription factors has since been shown to be involved in a wide variety of developmental programs in various organs and stages of plant development [54]. This particular MADS-box gene is homologous to the rice gene OsMADS57, a type II MIKC^C MADS gene. The large MIKC^C subclass is quite diverse, with members involved in floral specification, phase transition, and root development, among other developmental functions [54]. A recent study also found this gene to have been selected during crop improvement [55], and it is expressed in many tissues according to the maize gene expression atlas [56]. Together, these factors make AC212823.4_FG003 an attractive candidate for future studies to fine map the causative gene for kernel row number on chromosome five.
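Pulling candidates from a support interval amounts to an overlap query between the interval and gene coordinates. The sketch below is illustrative only. The gene records (gene_a to gene_d) and their coordinates are hypothetical placeholders rather than AGPv2 annotations. The interval bounds are the krn5.2 coordinates given in section 2.3.1 (166,576,639 to 169,231,037 bp). A real analysis would load the filtered gene set from the AGPv2 annotation file instead.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def genes_in_interval(genes, chrom, left_bp, right_bp):
    """Return genes on chrom that overlap [left_bp, right_bp] at all."""
    return [g for g in genes
            if g.chrom == chrom and g.start <= right_bp and g.end >= left_bp]


# Hypothetical annotation records; IDs and coordinates are placeholders.
annotation = [
    Gene("gene_a", "5", 166_000_000, 166_010_000),  # left of the interval
    Gene("gene_b", "5", 166_570_000, 166_580_000),  # overlaps the left flank
    Gene("gene_c", "5", 168_000_000, 168_004_000),  # fully inside
    Gene("gene_d", "4", 167_000_000, 167_002_000),  # wrong chromosome
]

hits = genes_in_interval(annotation, "5", 166_576_639, 169_231_037)
print([g.gene_id for g in hits])
```

Genes that only partly overlap a flank are kept, which is the conservative choice when the flanking markers themselves are uncertain.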
The power and resolution limits of a QTL experiment are important factors to consider when undertaking an experiment in any mapping population. To better interpret our QTL results for empirically measured traits, we explored the limits of the experimental mapping population using simulated trait datasets. In this experiment, we never detected more than six QTL under any of the simulated conditions. The most important characteristic of simulated traits in determining the number of detected QTL was heritability, not effect type. As expected, when the number of underlying causative genes increased to a high level, the effects of multiple causative genes were rolled into single detected QTL. This result is consistent with the Beavis Effect [49], the tendency for several QTL of small effect to be combined into a single QTL with a large estimated effect. If these polygenic QTL, which can have quite high LOD scores and effect sizes, were chosen for fine mapping, we would be unlikely to find a single underlying causative polymorphism. Consequently, when choosing QTL for fine mapping, researchers should favor QTL for traits with high heritability, detected in mapping populations with sufficient power to resolve QTL to single genes. It is important to realize that the simulation results reflect the specific markers, genotypes, and mapping population used in this study. While some results are likely to apply generally to other QTL experiments, simulations using population-specific parameters will provide the best insight into potential genetic architectures and into population power and precision.
QTL mapping has been used to great effect to characterize the genomic regions controlling traits selected during maize domestication. These studies have shown that while genetic factors controlling domestication traits are spread throughout the genome, there are genomic regions where QTL for several domestication traits are concentrated in close proximity to each other [8, 25]. In this study, we used a QTL mapping population of NIRILs with teosinte introgressions specific to chromosome five to closely examine previously mapped QTL for a number of domestication traits. We confirmed that QTL for these traits exist on chromosome five; however, in our population these QTL further fractionate into multiple QTL. This is in contrast to other genomic regions of large effect in maize, where single pleiotropic genes were identified as the causative factors [3–5, 50]. The presence of multiple QTL at several locations on chromosome five suggests the existence of a complicated, linked, multi-gene locus controlling various aspects of domestication traits. This apparent complexity of the chromosome five locus is consistent with results from our simulation experiment, where we show that traits with multiple mapped QTL likely have a more complicated underlying genetic architecture than the initial QTL mapping results indicate.
Chapter 2
Fine mapping of chromosome five domestication genes in maize
2.1 Abstract
The fifth chromosome of Zea mays has previously been shown to contain a large effect QTL for several domestication traits. In this work I describe efforts to identify the causative polymorphisms responsible for two of these QTL, those for the domestication traits culm diameter and kernel row number. In the QTL mapping experiment of chapter 1, the kernel row number QTL had the highest LOD score detected and the culm diameter QTL the eighth highest. We used several heterogeneous inbred families, drawn from a BC2 S3 mapping population and heterozygous in the 1.5 LOD support intervals of these QTL, to generate two sets of recombinant chromosome nearly isogenic lines: one for the culm diameter QTL and one for the kernel row number QTL. Lines were grown in replicated, randomized blocks in four years and phenotypes were measured. A linear mixed model was used to obtain least-squares means for each line, and we looked for segregation of the phenotype based on indel and genotyping-by-sequencing markers. Simple Mendelian segregation of the lines was not observed for any of the traits of interest, suggesting that a single locus does not explain the differences in phenotype. Consequently, we used QTL mapping software to map QTL in the segregating regions of interest on chromosome five for culm diameter and kernel row number. These analyses showed a highly significant heterogeneous inbred family effect as well as multiple QTL in the target region for kernel row number. This suggests that the genetic factors underlying kernel row number and culm diameter have a complex relationship with multiple loci on several chromosomes.
2.2 Introduction
The ultimate goal of many studies investigating the evolution of novel morphology in divergent lineages is identification of the causative genes responsible for phenotypic change. Toward this end, genes causing new forms have been identified a number of times in many species, including maize, tomato, wheat, barley, and, most successfully, rice. Over the years, more than 20 genes have been identified in rice with important effects on agronomic and domestication phenotypes, such as loss of shattering in domesticated plants [15], increased grain yield in terms of grain number [57], grain weight [58, 59], and plant architecture [60, 61]. In contrast, there are considerably fewer fine mapping success stories in other organisms. In maize, recent experiments have mapped several high LOD score, large effect domestication QTL to single genes, including teosinte branched1 (tb1) [3, 41], grassy tillers1 (gt1) [4, 39], teosinte glume architecture1 (tga1) [5], and ZmCCT [50, 62]. One common characteristic of these genes is that they were initially characterized as massive, high LOD, large effect QTL.
In maize, domestication phenotypes have been shown to be largely controlled by six regions of the genome [8]. The large concentration of domestication QTL on the fifth chromosome has been repeatedly observed in several studies [25, 37, 43]; however, little is known about the causative genes and underlying polymorphisms responsible for this large effect. Experiments designed to examine chromosome five in maize face several challenges caused by characteristics of the chromosome. First, this chromosome carries gametophyte factor2 (ga2) [63], a pollen incompatibility factor that greatly influences pollination rates of specific genotype combinations. Second, an extended region of low recombination around the centromere (102.3 Mb to 109.2 Mb) complicates collection of recombinant chromosomes for mapping experiments. In spite of these challenges, characterizing the many domestication QTL for plant architecture and inflorescence traits on the fifth chromosome of maize is a necessary step toward fully understanding the effect domestication had on the maize genome. While many traits have QTL that map to the fifth chromosome, QTL with exceptionally high LOD scores and effect sizes are of particular interest for fine mapping studies.
A high LOD score, large effect QTL for kernel row number (krn) and ear diameter (eard), previously reported on chromosome five of maize [25, 37, 43], was shown in Chapter 1 to fractionate into at least two or three QTL. The largest QTL for both of these traits in terms of LOD score and effect size (eard5.3 and krn5.2) were located toward the right end of the mapping interval, between umc1966 and umc1348. The krn5.2 QTL had a LOD score of 45.2, explained 51.98% of phenotypic variation, and was estimated to have an additive effect of −0.73 kernel rows. The co-localizing eard5.3 QTL had the highest LOD score for its trait (32.7), explained 25.1% of variation, and had an effect of −1.41 mm. This region was ∼1.3 cM or 2.65 Mb and was the narrowest confidence interval found for the mapping population used in chapter 1. The kernel row number and ear diameter traits are highly related, both affecting ear size in the transverse plane. This fact, viewed in the context of the co-localization of eard5.3 and krn5.2, suggests that a single gene may influence both traits.
In addition to the high LOD score QTL for krn and eard, the fifth chromosome of
maize was shown (Chapter 1) to have QTL for plant architecture traits including tillering,
lateral branch length, and culm diameter. The QTL for culm diameter in chapter 1 had
the eighth highest LOD score detected. In contrast with the krn5.2 and eard5.3 QTL,
mapping for culm diameter revealed a single QTL of moderate effect, culm5.1. This QTL
had a considerably larger 1.5 LOD support interval (97.3 megabases), lower LOD score
(19.8), lower variation explained (21.27%), and smaller additive effect size (−0.67 mm).
The characteristics of culm5.1 in terms of number of QTL, LOD score, and effect size
make for a different type of fine mapping candidate than krn5.2 and eard5.3.
An experiment was designed to further investigate and identify the causative polymorphisms behind the large effect, high LOD score krn5.2/eard5.3 QTL and the moderate effect culm5.1 QTL. This project used a collection of recombinant chromosome nearly isogenic lines (RCNILs) grown in replicated randomized blocks over multiple years. These RCNILs were derived from heterogeneous inbred families (HIFs) drawn from a BC2 S3 population with a massive ear diameter QTL [25] that had a maximum LOD score of 144.4. Lines were generated, genotyped, and grown in replicated blocks in the summers of 2010, 2012, and 2013. RCNILs did not segregate cleanly in the target QTL 1.5 LOD support intervals for the kernel row number and culm diameter phenotypes. I next used genome-wide genotyping and QTL mapping methods to account for secondary segregating regions in the genome. The results of this analysis suggest not only that secondary sites with significant effects on kernel row number and culm diameter are segregating, but also that multiple factors are again segregating within the initial target QTL support interval. Overall, these results suggest that the genetic architecture controlling domestication traits is quite complex, with multiple loci across the genome contributing to kernel row number and culm diameter phenotypes. Chromosome five in particular appears to house a collection of genes affecting several domestication traits, representing at least three linked loci that may have been selected as a unit during maize domestication.
2.3 Materials and Methods
2.3.1 Plant material
We chose to identify the causative genes underlying the large LOD score and effect size QTL for kernel row number (krn) and culm diameter (culm) on chromosome five using recombinant chromosome nearly isogenic lines (RCNILs). These lines consist of individuals carrying two copies of a recombinant chromosome with a recombination breakpoint in the region of interest, corresponding to the 1.5 LOD support intervals for culm5.1 and krn5.2. Based on QTL mapping results from chapter 1, the two QTL are adjacent to each other, with the culm diameter QTL spanning 54,416,924 to 151,717,831 bp and the kernel row number QTL spanning 166,576,639 to 169,231,037 bp. Base pair coordinates for these QTL are based on BLAST of flanking marker primer sequences against the second version of the maize reference genome (AGPv2) [9]. I chose to generate RCNILs from segregating heterogeneous inbred families (HIFs) taken from a large BC2 S3 mapping population. Four founding HIFs, two per QTL, heterozygous for the genomic region of interest defined by the QTL 1.5 LOD support intervals and surrounding regions, were used to produce RCNILs. Care was taken to use HIFs with limited heterozygosity adjacent to the primary region of interest and elsewhere in the genome.
In the summers of 2009 and 2010, a large number of plants from each HIF were screened with PCR based insertion deletion (indel) markers flanking the region of interest to identify plants with recombinant chromosomes. The initial screening of HIFs for individuals with recombinant chromosomes used three flanking markers (ZHL0029, ZHL0033, and umc1966) located at 38,994,478 bp, 151,446,717 bp, and 169,230,959 bp, respectively. These markers were chosen to be as close as possible to the boundaries of the QTL. Individuals with recombinant chromosomes were self-pollinated, and seed was harvested and planted in the following winter growing season. Winter plants were grown in a greenhouse, where they were genotyped again at the flanking markers to identify plants homozygous for the initially detected recombinant chromosome. These individuals were then self-pollinated to make RCNIL seed, carrying two copies of the original recombinant chromosome, for use in subsequent summers in randomized phenotyping blocks and for seed increase.
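The recombinant screen reduces to comparing calls at adjacent flanking markers: a plant is recombinant in an interval when its genotype class changes between the two markers that bound it. The Python sketch below assumes a hypothetical genotype coding (M for homozygous maize, T for homozygous teosinte, H for heterozygous, None for a failed call). It uses the three screening markers and AGPv2 positions listed above; the function name is hypothetical. A double crossover inside one interval would be invisible to this check, as it would be to the original three-marker screen.

```python
# Flanking indel markers and AGPv2 positions (bp) used in the screen.
MARKERS = [("ZHL0029", 38_994_478), ("ZHL0033", 151_446_717), ("umc1966", 169_230_959)]


def recombinant_intervals(genotypes):
    """Return (left, right) marker pairs between which the genotype changes.

    genotypes maps marker name -> "M", "T", "H", or None (hypothetical coding).
    Intervals with a missing call at either end are skipped rather than guessed.
    """
    intervals = []
    for (left, _), (right, _) in zip(MARKERS, MARKERS[1:]):
        g_left, g_right = genotypes.get(left), genotypes.get(right)
        if g_left is None or g_right is None:
            continue
        if g_left != g_right:
            intervals.append((left, right))
    return intervals


# A plant heterozygous at ZHL0029 but homozygous maize further along the
# chromosome carries a crossover in the culm5.1 screening interval.
print(recombinant_intervals({"ZHL0029": "H", "ZHL0033": "M", "umc1966": "M"}))
```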
2.3.2 Field Trials and Phenotypes
The RCNILs were grown in a total of 16 replicated, randomized blocks over multiple summers between 2010 and 2013. Phenotyping experiments took place at the West Madison Agricultural Research Station (WMARS). RCNILs for the culm5.1 QTL were grown in 2010 and 2012, and the krn5.2 QTL lines were grown in 2012 and 2013. When possible, seed for a single RCNIL was taken from up to five seed packets and mixed prior to planting to minimize the effect of any single seed lot (mother plant) on phenotype. In each summer, four blocks of RCNILs per QTL were grown in twelve-plant plots. Individuals were planted with equal spacing in 14-foot rows, with 30 inches between adjacent rows and two-foot walkways separating the end of one row from the start of the next. Up to five individuals per plot were measured in 2010 and 2012, while in 2013 kernel row number was assessed for all possible plants.
In addition to twelve-plant plots, select lines for the culm diameter QTL were grown in a phenotyping block of fully randomized single plant plots (SPP) in the summer of 2012. This block consisted of 60 individuals each from seventeen RCNILs and eight control RCNILs (homozygous for the maize or teosinte chromosomal segment) grown in a completely randomized scheme. The seventeen RCNILs were chosen because their recombination breakpoints were close to preliminary estimates of the causative gene location, based on initial analysis of data from the summer of 2010. Individual plants were separated by a larger than normal distance (30 inches in the X and 48 inches in the Y dimension) to allow them to grow to their full phenotypic potential with minimal competition and shading from neighboring plants.
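A completely randomized single plant plot layout like this one can be generated by shuffling a list in which each line appears once per plant, then filling a grid at the stated spacing. In this sketch the line names, the 50-plants-per-row grid width, and the random seed are hypothetical choices. Only the 25 lines (seventeen RCNILs plus eight controls), the 60 plants per line, and the 30 by 48 inch spacing come from the text.

```python
import random


def spp_layout(lines, reps, n_cols, x_in=30, y_in=48, seed=2012):
    """Completely randomized single plant plot layout.

    Every line appears `reps` times; the planting order is a random
    permutation of all plants, filled row by row into a grid with n_cols
    plants per row. Returns (line, x_inches, y_inches) tuples.
    """
    plants = [line for line in lines for _ in range(reps)]
    random.Random(seed).shuffle(plants)
    layout = []
    for k, line in enumerate(plants):
        row, col = divmod(k, n_cols)
        layout.append((line, col * x_in, row * y_in))
    return layout


rcnils = [f"RCNIL_{i:02d}" for i in range(1, 18)]   # hypothetical names
controls = [f"CTRL_{i}" for i in range(1, 9)]       # maize/teosinte controls
layout = spp_layout(rcnils + controls, reps=60, n_cols=50)
print(len(layout))
```

Because the whole list is shuffled at once, there is no blocking structure, which matches the completely randomized design used for the SPP block.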
Traits were measured by hand. Culm diameter was measured in the field with calipers at the narrowest point of the stalk, and kernel row number was counted after harvest in the lab (2012) or in the field (2013). In the SPP we also measured culm diameter at the widest point to calculate culm area, as well as other basic plant architecture traits (plant height and tiller number) for use in later analyses. In total, 3,182 individuals were assessed for culm diameter (1,021 in 2010 and 2,161 in 2012), and kernel row number was counted for 8,625 individuals (3,168 in 2012 and 5,457 in 2013). Ear diameter, a trait closely related to kernel row number with a co-localizing QTL detected in chapter 1 (see Table 1.3 for details), was also measured in some environments. It was not considered in later analyses because of its close relationship to kernel row number.
2.3.3 Genotyping with PCR and next generation sequencing
Genomic DNA was extracted from the initial plant of each RCNIL with a standard CTAB method, and genotypes from this “founder” individual were used to represent the RCNIL genotype in later analyses. RCNIL genotypes were obtained using two strategies: a PCR based method targeting known polymorphisms and a high-throughput next generation sequencing protocol. All RCNILs were genotyped by PCR of known indels and single nucleotide polymorphisms (SNPs), while a subset were also genotyped using the high-throughput genotyping-by-sequencing (GBS) protocol. All RCNILs developed for fine mapping of krn5.2 were genotyped by GBS, while only a subset of culm5.1 RCNILs were. However, culm5.1 lines were genotyped with a more extensive collection of PCR markers (18 markers) than krn5.2 RCNILs (5 markers).
PCR based genetic markers (Supplemental Table B.1) were scored by standard agarose gel electrophoresis, fluorescent fragment analysis, or Sanger sequencing of SNPs. All three marker types were developed by identifying scorable polymorphisms that distinguished the maize and teosinte control RCNILs through Sanger sequencing of annotated genes in the maize reference genome (AGPv2). Size polymorphisms greater than approximately 10% of total PCR product length were scored on 4% agarose gels. Smaller size polymorphisms were redesigned with fluorescently labeled primers and genotyped using GeneScan software (v1.70) from Applied Biosystems. If the only scorable polymorphism was a SNP, RCNILs were genotyped by Sanger sequencing and hand calling of SNPs.
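The choice among the three assay types follows a simple rule. The function below is a hypothetical sketch of that rule. It assumes the approximately 10% threshold is measured relative to the longer of the two PCR products, which the text does not specify. It also treats equal-length products as SNP-only polymorphisms to be scored by Sanger sequencing.

```python
def choose_assay(maize_len, teosinte_len, threshold=0.10):
    """Pick a scoring method for a maize/teosinte polymorphism.

    maize_len, teosinte_len: PCR product lengths (bp) for the two alleles.
    Assumption: the ~10% rule is applied relative to the longer product.
    """
    if maize_len == teosinte_len:
        return "sanger_snp"              # no size difference: sequence the SNP
    diff = abs(maize_len - teosinte_len) / max(maize_len, teosinte_len)
    if diff > threshold:
        return "agarose_4pct"            # large indel resolvable on a 4% gel
    return "fluorescent_fragment"        # small indel: labeled primers, GeneScan


print(choose_assay(300, 250), choose_assay(300, 290), choose_assay(300, 300))
```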
While great care was taken to choose founding HIFs with minimal heterozygosity, all HIFs had secondary sites segregating elsewhere in the genome. To identify these regions and account for their effect on phenotype, we performed GBS [64] on RCNIL genomic DNA for all kernel row number lines and for the subset of culm diameter lines grown in the single plant plot experiment. Using the GBS protocol required additional molecular work. DNA was treated with 1 µL of RNase I at room temperature for 30 minutes to remove RNA from the CTAB DNA preparation. Next, the samples were digested with the methylation-sensitive restriction enzyme ApeKI, and 96-plex barcoded sequencing adapters were ligated to individual samples. Finally, the 96 barcoded samples were pooled and sequenced (100 bp reads) on an Illumina HiSeq machine [64]. Sequence tags were aligned to the maize reference genome (AGPv2), and SNPs were called and imputed using the GBS pipeline as implemented at Cornell University. This GBS procedure yielded 955,650 SNPs, consisting of raw A, T, C, and G calls for the RCNILs across the ten maize chromosomes.
Raw SNP calls were further processed with a custom Perl script to call RCNIL genotypes as maize or teosinte. Genotype calls were made relative to SNP calls from the maize parent inbred line, W22. Only biallelic markers (43,025 total) were kept, and the non-W22 allele in the RCNILs was assumed to be the teosinte allele. After converting genotypes into maize, teosinte, and heterozygous calls, SNPs separated by less than 100 base pairs were merged into a single marker, leaving 25,736 markers. If SNP genotypes within a merged marker did not agree, they were converted to missing data (“N”). After marker genotypes were called and merged, a final genotype imputation step was carried out using another custom Perl script. To correct bad and missing data, all genotype calls were subject to imputation. The criterion for changing a call in a given RCNIL used windows of ten markers both upstream and downstream of the marker. If all ten markers on one side were 100% consistent, the genotype was changed if and only if seven of the ten markers on the other side also had the same genotype.
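The conversion, merging, and imputation steps above were implemented as custom Perl scripts that are not reproduced here. The Python sketch below re-expresses the same rules under stated assumptions. Raw calls are strings such as "A" or "AG" (a heterozygote), with "N" for missing, which is a hypothetical encoding. Missing calls are ignored when checking agreement within a merged marker, and imputation is evaluated against the unimputed calls. "Seven of the ten" is read as at least seven. All function names are hypothetical.

```python
WINDOW = 10     # markers examined on each side of the focal marker
MIN_AGREE = 7   # matching markers required on the opposite side


def is_biallelic(w22, raw_calls):
    """True if W22 plus all RCNIL calls show exactly two nucleotides."""
    alleles = set("".join(c for c in raw_calls if c != "N")) | {w22}
    return len(alleles) == 2


def call_snp(raw, w22):
    """Convert one raw call ("A", "AG", "N", ...) to M(aize), T(eosinte), H(et) or N."""
    if raw == "N":
        return "N"
    alleles = set(raw)
    if len(alleles) == 2:
        return "H"
    return "M" if alleles == {w22} else "T"


def merge_close(snps, max_gap=100):
    """Merge consecutive SNPs less than max_gap bp apart into one marker.

    snps: list of (pos, calls) sorted by pos, calls already in M/T/H/N.
    Within a merged marker, an RCNIL whose non-missing calls disagree gets N.
    """
    groups = []
    for pos, calls in snps:
        if groups and pos - groups[-1][-1][0] < max_gap:
            groups[-1].append((pos, calls))
        else:
            groups.append([(pos, calls)])
    merged = []
    for group in groups:
        row = []
        for j in range(len(group[0][1])):
            seen = {calls[j] for _, calls in group} - {"N"}
            row.append(seen.pop() if len(seen) == 1 else "N")
        merged.append((group[0][0], row))   # position of first SNP in group
    return merged


def impute_line(geno):
    """Apply the ten-marker window rule to one RCNIL's ordered calls."""
    out = list(geno)
    for i in range(len(geno)):
        left = list(geno[max(0, i - WINDOW):i])
        right = list(geno[i + 1:i + 1 + WINDOW])
        for side, other in ((left, right), (right, left)):
            uniform = len(side) == WINDOW and len(set(side)) == 1 and side[0] != "N"
            if uniform and other.count(side[0]) >= MIN_AGREE:
                out[i] = side[0]
                break
    return out
```

Evaluating every window against the original calls keeps one imputed change from cascading into its neighbors; the original script may have handled this differently.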
The imputation methods for GBS data described above greatly improved genotype continuity; however, certain regions of the genome were still questionably called. The most inconsistently called genomic regions were extended heterozygous segments and recombination breakpoints where genotypes switched. Following the processing steps using custom Perl scripts, the data were manually screened to remove or correct inconsistently called markers. Uninformative markers, where the adjacent marker to either side had exactly the same genotypes, were also removed from the dataset. Regions of the genome where RCNILs had fixed maize or teosinte genotypes associated with HIF (non-segregating regions fixed for different genotypes in the founding HIFs) were also removed. Finally, independently segregating regions on the same chromosome were given unique chromosome names (5a, 5b, etc.) to avoid inflating the genetic map between fixed ancestral recombination breakpoints. After imputation and filtering, a total of 522 genome-wide GBS markers spread across 13 segregating regions on six chromosomes were used in the final analysis. The other four maize chromosomes were completely fixed for a single homozygous genotype and were therefore excluded from the analysis.
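Two of the curation rules can be written down compactly. The sketch below assumes each marker is a (name, genotype tuple across RCNILs) pair in map order. It reads the uninformative-marker rule as removing interior markers identical to both neighbors, so that the first and last marker of each identical run, which flank breakpoints, survive. It then labels each maximal run of segregating markers on a chromosome with its own suffix (5a, 5b, ...). The helper names are hypothetical, and in the study this curation also involved manual inspection.

```python
from string import ascii_lowercase


def drop_uninformative(markers):
    """Remove interior markers whose genotypes match both neighbours.

    markers: ordered list of (name, genotype_tuple); run endpoints are kept.
    """
    keep = []
    for i, (name, geno) in enumerate(markers):
        prev_same = i > 0 and markers[i - 1][1] == geno
        next_same = i < len(markers) - 1 and markers[i + 1][1] == geno
        if not (prev_same and next_same):
            keep.append((name, geno))
    return keep


def label_regions(chrom, markers):
    """Give each maximal run of segregating markers its own linkage group.

    A marker segregates if more than one non-missing genotype class appears
    across RCNILs. Supports up to 26 runs per chromosome (suffixes a to z).
    """
    labels, in_run, run = {}, False, -1
    for name, geno in markers:
        segregating = len(set(geno) - {"N"}) > 1
        if segregating and not in_run:
            run += 1                      # a fixed stretch ended: new group
        if segregating:
            labels[name] = f"{chrom}{ascii_lowercase[run]}"
        in_run = segregating
    return labels


markers = [("m1", ("M", "T")), ("m2", ("M", "M")), ("m3", ("T", "M")), ("m4", ("M", "T"))]
print(label_regions("5", markers))
```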
2.3.4 Statistical analysis and segregation of phenotypes
We used the statistical program SAS to fit linear mixed models with the PROC MIXED procedure [44]. Variables considered for the models included RCNIL, HIF, block, year grown, and position within the block. A forward model selection method was used: the starting model had a minimal set of variables (fixed effects for HIF and for RCNIL nested in HIF), and additional variables were added one at a time until the Akaike Information Criterion (AIC) reached its lowest point. The most complex model selected was for the culm diameter single plant plot experiment, which used five explanatory variables (Table 2.1). In these models, $Y$ is the measured phenotype, $\mu$ the grand mean, $a_i$ the RCNIL, $f_j$ the HIF, $b_k$ the block, $c_l$ and $d_m$ the horizontal and vertical position in the block, respectively, $t_n$ the year, and $h_p$ the tiller number phenotype (used in the SPP experiment only); $e$ and $g$ are error terms. While the single plant plot culm diameter model was the most complex, the other models had only one fewer variable.
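The forward search itself is independent of the model-fitting software. The Python sketch below shows only the selection loop, assuming a caller supplies a function that returns the AIC of a model with a given set of terms. In this chapter that fit was performed with SAS PROC MIXED, which the sketch does not reproduce. The term names and AIC values in the example are hypothetical and are not results from this study.

```python
def forward_select(base_terms, candidates, aic_of):
    """Greedy forward selection on AIC.

    base_terms: terms always in the model (e.g. HIF and RCNIL nested in HIF).
    candidates: optional terms (block, row/column position, year, ...).
    aic_of: callable taking a tuple of terms and returning the fitted AIC.
    """
    current = tuple(base_terms)
    best_aic = aic_of(current)
    remaining = list(candidates)
    while remaining:
        trial_aic, term = min((aic_of(current + (t,)), t) for t in remaining)
        if trial_aic >= best_aic:
            break                          # no candidate lowers AIC: stop
        current, best_aic = current + (term,), trial_aic
        remaining.remove(term)
    return current, best_aic


# Hypothetical AIC values for illustration only (not results from this study).
aic_table = {
    frozenset({"hif", "rcnil(hif)"}): 1000.0,
    frozenset({"hif", "rcnil(hif)", "block"}): 980.0,
    frozenset({"hif", "rcnil(hif)", "row"}): 995.0,
    frozenset({"hif", "rcnil(hif)", "block", "row"}): 985.0,
}
model, aic = forward_select(("hif", "rcnil(hif)"), ["block", "row"],
                            lambda terms: aic_table[frozenset(terms)])
print(model, aic)
```

Here adding block lowers AIC, but then adding row raises it, so the search stops with block as the only added term.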
Least-squares means for the RCNIL nested in HIF effect were extracted and used as the average line phenotype in subsequent analyses. The goal of these analyses was to associate RCNIL phenotype with genotype. If a single locus in the segregating region is responsible for the phenotypic effect, one should observe simple, clean Mendelian segregation of least-squares means by genotype. To test this, RCNILs were sorted by phenotypic value (as represented by least-squares mean). Unfortunately, we did not observe segregation of least-squares means by genotype, suggesting that multiple factors influence the measured traits.
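The Mendelian check amounts to asking whether any single marker splits the sorted least-squares means into a lower teosinte class and a higher maize class with no overlap. The sketch below assumes homozygous calls coded M and T (other calls ignored) and that the teosinte class has the lower mean, as in Figure 2.1. The RCNIL names, means, and marker calls in the example are hypothetical.

```python
def clean_split(lsmeans, genotypes):
    """True if every teosinte RCNIL has a lower least-squares mean than every
    maize RCNIL at this marker (heterozygous or missing lines are ignored)."""
    t = [lsmeans[k] for k, g in genotypes.items() if g == "T"]
    m = [lsmeans[k] for k, g in genotypes.items() if g == "M"]
    return bool(t) and bool(m) and max(t) < min(m)


def markers_with_clean_split(lsmeans, marker_genotypes):
    """Names of markers at which RCNILs segregate cleanly into two classes."""
    return [mk for mk, geno in marker_genotypes.items() if clean_split(lsmeans, geno)]


# Hypothetical data: mkA separates the classes, mkB does not.
lsmeans = {"r1": 14.0, "r2": 14.5, "r3": 16.2, "r4": 16.8}
marker_genotypes = {
    "mkA": {"r1": "T", "r2": "T", "r3": "M", "r4": "M"},
    "mkB": {"r1": "T", "r2": "M", "r3": "T", "r4": "M"},
}
print(markers_with_clean_split(lsmeans, marker_genotypes))
```

When no marker passes this test, as observed here, a single segregating locus cannot explain the phenotype differences on its own.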
We have two main hypotheses as to why RCNILs failed to segregate in a Mendelian manner. First, the BC2 S3 founding HIFs of the RCNILs are less advanced than the BC6 S6 population from chapter 1 and may carry additional segregating factors elsewhere in the genome that confound Mendelian segregation. Second, the primary locus of interest on chromosome five may not be a single gene but rather multiple linked genes that, when split up by the various recombination breakpoints in our RCNILs, lead to complicated segregation patterns. To investigate both possibilities, we obtained whole genome genotypes using GBS (described above) and mapped QTL in the R/qtl software package for plants grown in twelve-plant rows. The single plant plot culm diameter experiment was not analyzed with QTL mapping methods because it included only seventeen RCNILs and consequently lacked power for a QTL analysis.
Table 2.1: Final linear mixed models used to produce least-squares means for fine mapping RCNILs.
Trait          Linear mixed model
culm (rows)    $Y_{ijkmo} = \mu + a_i(f_j) + f_j + b_k + d_m(b_k) + e_{ijklm} + g_{ijkmo}$
culm (SPP)     $Y_{ijlmop} = \mu + a_i(f_j) + f_j + c_l + d_m + h_p + e_{ijklmnp} + g_{ijkmnpo}$
krn (rows)     $Y_{ijkno} = \mu + a_i(f_j) + f_j + b_k(t_n) + t_n + e_{ijklmn} + g_{ijkmno}$
The benefit of combining GBS with the statistical methods of QTL mapping is the simultaneous exploration of multiple factors in the target QTL region and of secondary genomic regions with significant effects outside the QTL. A potential drawback of this approach is a lack of statistical power to differentiate closely linked, moderate effect QTL in the relatively small RCNIL fine mapping populations. The full set of RCNILs was used to map QTL for the krn and culm diameter traits to maximize our power to differentiate between tightly linked factors. In total, 75 lines were used in mapping the culm5.1 QTL (67 recombinant RCNILs and 8 homozygous maize and teosinte controls). The krn5.2 QTL was mapped with 92 lines, all of which were recombinant chromosome lines. QTL mapping was conducted using the R/qtl package [45], with genetic maps calculated using the Kosambi mapping function and a genotyping error rate of 0.001. Ten thousand permutations of the data were used to define a significance threshold for QTL. QTL were mapped using a step-wise, model-based approach in which QTL were added to the model one at a time using the addqtl, fitqtl, and refineqtl functions of R/qtl until no more significant QTL were detected. In addition to the detected QTL, the founding HIF was included in the model as an additive covariate. This accounts for variation caused by fixed, non-segregating regions of the genome that differ between HIFs and that were removed during manual curation of the GBS genotypes. Details of the step-wise QTL mapping method are available in the chapter 1 methods.
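The permutation threshold can be illustrated with a simplified single-marker scan. The Python sketch below is not R/qtl's implementation, which uses interval mapping and, in this chapter, the step-wise addqtl/fitqtl/refineqtl procedure. It assumes two homozygous genotype classes per marker and computes a marker-regression LOD as (n/2) log10(RSS0/RSS1). It then takes the (1 - alpha) quantile of the maximum LOD over markers across permuted phenotypes. The 5% significance level is an assumption; the text does not state the level used.

```python
import math
import random


def marker_lod(pheno, geno):
    """LOD for a two-class (M vs T) marker regression; other calls skipped."""
    pairs = [(y, g) for y, g in zip(pheno, geno) if g in ("M", "T")]
    n = len(pairs)
    grand = sum(y for y, _ in pairs) / n
    rss0 = sum((y - grand) ** 2 for y, _ in pairs)     # null model: one mean
    rss1 = 0.0                                          # QTL model: class means
    for cls in ("M", "T"):
        grp = [y for y, g in pairs if g == cls]
        if grp:
            mu = sum(grp) / len(grp)
            rss1 += sum((y - mu) ** 2 for y in grp)
    return (n / 2) * math.log10(rss0 / rss1)


def permutation_threshold(pheno, markers, n_perm=10_000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the genome-wide maximum LOD under permutation."""
    rng = random.Random(seed)
    perm = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(perm)                               # break phenotype-genotype link
        maxima.append(max(marker_lod(perm, g) for g in markers))
    maxima.sort()
    k = min(n_perm - 1, max(0, round((1 - alpha) * n_perm) - 1))
    return maxima[k]
```

Taking the maximum over all markers in each permutation is what makes the threshold genome-wide rather than per-marker.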
2.4 Results
2.4.1 RCNIL generation and phenotype least-squares means
I screened 4,180 individuals from the four founding HIFs for recombinant chromosomes in the summers of 2009 to 2011. In total, 67 and 92 recombinant individuals were identified and turned into RCNILs in the 1.5 LOD support intervals of culm5.1 and krn5.2/eard5.3, respectively. The vast majority (3,230 of 4,180) of screened individuals came from HIFs intended for study of the culm diameter QTL. This large number of individuals was required because the centromere lies in the middle of the target QTL region, which greatly reduced the recombination rate and limited the number of recombinant individuals.
Figure 2.1: Histograms of least-squares means for the culm diameter and kernel row number phenotypes. The distribution of least-squares means is approximately normal for culm diameter, while the kernel row number least-squares means have a noticeable left skew. The average least-squares means of homozygous teosinte and maize RCNILs (solid and dashed lines, respectively) show the expected relationship, with the teosinte average always the lower phenotypic value.
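The recovery figures above support a back-of-the-envelope comparison of recombination in the two target intervals. The calculation below is a rough sketch under two assumptions. First, each recovered RCNIL reflects one crossover within the interval between the screening markers (ZHL0029 to ZHL0033 for culm5.1, ZHL0033 to umc1966 for krn5.2). Second, differences among HIFs and between screening seasons can be ignored.

```python
# Counts and marker positions taken from the text.
screened = {"culm5.1": 3_230, "krn5.2": 4_180 - 3_230}
recovered = {"culm5.1": 67, "krn5.2": 92}
span_mb = {
    "culm5.1": (151_446_717 - 38_994_478) / 1e6,   # ZHL0029 to ZHL0033
    "krn5.2": (169_230_959 - 151_446_717) / 1e6,   # ZHL0033 to umc1966
}

per_mb = {}
for qtl in screened:
    rate = recovered[qtl] / screened[qtl]          # RCNILs recovered per plant screened
    per_mb[qtl] = rate / span_mb[qtl]              # normalized by physical span
    print(f"{qtl}: {rate:.2%} of screened plants, {per_mb[qtl]:.2e} per Mb")

print(f"per-Mb ratio (krn5.2 / culm5.1): {per_mb['krn5.2'] / per_mb['culm5.1']:.1f}")
```

Under these assumptions, per-Mb recovery in the krn5.2 interval is roughly 30 times higher than in the centromere-spanning culm5.1 interval, consistent with the reduced recombination described above.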
Three linear mixed models were used to analyze the phenotype data for kernel row number and culm diameter. Each model was selected using a forward selection method in which one variable was added at a time until model fit, as measured by AIC, no longer improved. When plotted as histograms, the least-squares means of the RCNILs followed a roughly normal distribution for culm diameter, while the kernel row number trait had a left skew. RCNILs homozygous for the maize and teosinte segments showed the expected relationship, with maize RCNILs having larger culm diameter and more kernel rows (Figure 2.1).
2.4.2 PCR and GBS genotyping
Initial genotyping of genomic DNA from the homozygous recombinant RCNILs was carried out with traditional PCR methods. In total, I placed 18 markers on 75 RCNILs (including four maize and four teosinte control lines) for the culm diameter QTL and five markers on 92 RCNILs (no maize or teosinte control lines) for the kernel row number QTL (Table B.1). Only five markers were placed on kernel row number RCNILs for two reasons. First, the kernel row number RCNILs had recombination events in a much smaller physical distance (17.78 Mb versus 112.45 Mb for the culm diameter QTL). Second, we expected to obtain thorough genome-wide genotyping using GBS, which had already been initiated for the krn RCNILs.
Figure 2.2: GBS genotypes for kernel row number RCNILs. Thirteen regions across the genome are segregating in the kernel row number RCNILs. The primary QTL of interest is located in the 5b region, where all RCNILs have crossover events. Secondary regions that segregate in only one of the two founding HIFs are clearly visible; for example, chromosome 8c segregates in HIF MR0841 but not in MR0818. The figure is scaled by marker, so each unit of length represents a single marker.
Of the nearly one million original SNP calls, only about 5% appeared to segregate in a biallelic manner. The final genotype set contained no segregating markers on chromosomes one, two, four, and six. Because of the structure of the founding HIFs, each heterozygous region of the genome segregates independently of the other regions. To account for this, each segregating region was assigned its own linkage group (5a, 5b, etc.) for QTL mapping so that non-segregating segments between heterozygous regions would not influence the results. Overall, 522 markers in 13 linkage groups were segregating across the six remaining chromosomes (Figure 2.2).
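A simplified sketch of how segregating markers might be split into per-region linkage groups: consecutive segregating markers along a chromosome form one group, and any non-segregating marker closes it. The rule actually used to delimit regions in this study may differ (for example, it could follow the heterozygous segments of the founding HIFs directly), so the data layout and the splitting rule here are assumptions.

```python
from itertools import groupby
from string import ascii_lowercase

def assign_linkage_groups(markers):
    """Split segregating markers into per-region linkage groups.

    markers: (chromosome, position, is_segregating) tuples in any order.
    Returns {"5a": [positions], "5b": [...], ...}; chromosomes without any
    segregating marker get no group at all.
    """
    groups = {}
    ordered = sorted(markers, key=lambda m: (m[0], m[1]))
    for chrom, chrom_markers in groupby(ordered, key=lambda m: m[0]):
        n_groups = 0
        # each unbroken run of segregating markers becomes its own group
        for segregating, run in groupby(chrom_markers, key=lambda m: m[2]):
            if not segregating:
                continue
            label = f"{chrom}{ascii_lowercase[n_groups]}"
            groups[label] = [pos for _, pos, _ in run]
            n_groups += 1
    return groups
```

Treating each region as a separate group keeps the long non-segregating stretches between regions from being modeled as if recombination across them were informative.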
2.4.3 QTL fail to segregate as Mendelian traits
RCNILs were sorted by phenotype least squared mean from least to greatest, and we looked for distinct maize and teosinte RCNIL groupings. No single marker cleanly separated the RCNILs into maize and teosinte classes, suggesting that multiple factors within the primary QTL of interest or elsewhere in the genome influence the traits (Figure 2.3). Culm diameter came closer than kernel row number to clean segregation, especially for lines planted in the single plant plots.
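The check described above can be expressed as a simple test per marker. In this sketch the genotype codes "M" (homozygous maize) and "T" (homozygous teosinte) are hypothetical labels, any other code (heterozygous or missing) is ignored, and a marker is taken to separate the RCNILs cleanly when every maize-allele line has a higher least squared mean than every teosinte-allele line, matching the expected direction of effect.

```python
def cleanly_separates(phenotype, genotype):
    """True if all RCNILs with the maize allele ('M') at a marker outrank all
    RCNILs with the teosinte allele ('T'); other calls are ignored."""
    maize = [p for p, g in zip(phenotype, genotype) if g == "M"]
    teosinte = [p for p, g in zip(phenotype, genotype) if g == "T"]
    return bool(maize and teosinte) and min(maize) > max(teosinte)

def mendelian_markers(phenotype, genotypes):
    """Names of markers at which sorting by phenotype splits RCNILs into classes.

    phenotype: least squared means, one per RCNIL.
    genotypes: {marker_name: [genotype code per RCNIL, same order]}.
    """
    return [name for name, calls in genotypes.items()
            if cleanly_separates(phenotype, calls)]
```

For a single causative gene, at least the markers flanking it would be expected to appear in this list; an empty result corresponds to the pattern seen in Figure 2.3.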
An additional complication for both the culm diameter and kernel row number fine mapping was a distinct difference between the grand means of RCNILs derived from different founding HIFs. For kernel row number, RCNILs from different HIFs differed by approximately 1.8 kernel rows on average, and the mean phenotypic rank of RCNILs from the two HIFs differed by over 40. For culm diameter, the two HIFs differed by an average of approximately 0.1 cm (Figure 2.4). Founding HIF was part of the linear mixed model used to produce least squared means, but the model evidently did not fully correct for differences between founding HIFs. HIF was therefore included in the subsequent mapping models to further account for these differences.
Figure 2.3: RCNILs sorted by phenotype from least to greatest. Genotypes for RCNILs are indicated on the left by green (teosinte), yellow (maize), grey (heterozygous), or white (N), with least squared means as barplots on the right. (A) Culm diameter least squared mean from the twelve plant rows. (B) Culm area as measured in the single plant plots. (C) Kernel row number counted from twelve plant rows. A single causative gene should segregate as a Mendelian locus when RCNILs are sorted by phenotype. This was not seen: the genotypes appear more or less random, suggesting that multiple factors influence phenotype in the RCNILs.
Figure 2.4: Density plots of the culm diameter and kernel row number phenotypes grouped by founding HIF. The distributions differ distinctly between the two founding HIFs for culm diameter, in both the (A) single plant plot and (B) twelve plant row designs, as well as for (C) kernel row number. The overall phenotype mean for each HIF is marked by a dashed line (red HIF) or a solid line (blue HIF).
2.4.4 Multiple factors contribute to culm diameter and kernel row number
QTL mapping was performed using least squared means as phenotypes and merged genotypes from the GBS and PCR methods. Because only 17 culm diameter RCNILs were genotyped with GBS, we used only PCR markers for culm diameter mapping; consequently, QTL were mapped only in the primary segregating region of interest on chromosome five, between 59.6 Mb and 144.8 Mb. In contrast, all 92 RCNILs generated for fine mapping of kernel row number were genotyped by GBS, allowing full accounting of QTL in genomic locations away from the primary QTL of interest.
A single QTL was detected for culm diameter, suggesting that a single factor could be responsible for culm5.1 (Figure 2.5). However, founding HIF had a highly significant effect in the QTL mapping model (Table 2.2; F-test, p < 8.59e-10). This suggests that secondary sites in the genome are still at play, which could explain why the RCNILs did not segregate cleanly by genotype in the QTL of interest. While mapping a single QTL for culm diameter is encouraging, the relatively weak QTL LOD score (5.1) and small additive effect (-0.035), compared with the HIF LOD (8.7) and effect (0.098), indicate that secondary sites contribute more to culm diameter than the QTL we sought to fine map.
Kernel row number QTL mapping results are not directly comparable to the culm diameter results, because full-genome genotypes extended the mapping to segregating sites elsewhere in the genome. Four QTL were detected (Figure 2.5): two in the primary region of interest on chromosome five and one each on chromosomes seven and ten (Table 2.2). Unsurprisingly, founding HIF again had a highly significant effect (F-test, p < 2e-16). The two QTL in the target region had the highest LOD scores and additive effects of the mapped QTL. As for culm diameter, the HIF term had the highest LOD score and effect overall.
Figure 2.5: QTL LOD profiles for fine mapping of the culm diameter and kernel row number traits. QTL are color coded and labeled as "chromosome@position"; for example, the highest LOD score kernel row number QTL (5b@7.0) is the QTL on chromosome 5b at position 7.0. (A) Culm diameter LOD profile for the single QTL detected in the primary mapping region. LOD score (y-axis) is plotted against map position in centimorgans. (B) Kernel row number LOD profiles for the four detected QTL, two in the primary region of interest on chromosome 5b. Secondary QTL on chromosomes 7b and 10a have lower LOD scores and effect sizes than the 5b QTL. In addition to the significant QTL, a highly significant HIF effect with a high LOD score (culm = 8.688, krn = 19.787) was included in the QTL models for both traits.
Table 2.2: Detected QTL and HIF effects, including LOD, percent variation explained, and additive effect.

Name          LOD      Var. Explained (%)   Add. Effect
krn5b.1       10.161    7.28                -0.413
krn5b.2       13.309   10.40                -0.387
krn7b.1        4.758    2.95                 0.264
krn10a.1       5.391    3.40                -0.114
krn HIF       19.787   18.59                -1.550
krn model     44.127   89.02                 —
culm5.1        5.126   18.73                -0.0353
culm HIF       8.688   35.69                 0.0975
culm model    11.081   49.36                 —
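The percentages in Table 2.2 can be related to the LOD scores directly. Under the usual normal-model definition LOD = (n/2) log10(RSS_reduced / RSS_full), the full model explains 1 - 10^(-2 LOD/n) of the variance. The sketch below additionally assumes a drop-one definition for each term (the rise in residual sum of squares when that term is removed, as a fraction of the total sum of squares) and sample sizes of n = 92 kernel row number RCNILs and n = 75 culm diameter RCNILs, the numbers given in Section 2.4.2. Under these assumptions it agrees with the tabulated percentages to the reported precision; it is an illustration, not the code of the mapping software.

```python
def model_pve(model_lod, n):
    """Fraction of phenotypic variance explained by the full model, assuming
    LOD = (n / 2) * log10(RSS_null / RSS_full)."""
    return 1 - 10 ** (-2 * model_lod / n)

def term_pve(term_lod, model_lod, n):
    """Drop-one variance explained by one term: the rise in residual SS when the
    term is removed, expressed as a fraction of the total SS."""
    residual_fraction = 1 - model_pve(model_lod, n)   # RSS_full / SS_total
    return residual_fraction * (10 ** (2 * term_lod / n) - 1)

# Kernel row number: n = 92 RCNILs, full-model LOD 44.127 (Table 2.2)
print(round(100 * model_pve(44.127, 92), 2))          # 89.02
print(round(100 * term_pve(19.787, 44.127, 92), 2))   # 18.59 (krn HIF)
```

Because the term percentages are computed against the full model's residual, they need not sum to the model total when terms are correlated.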
2.5 Discussion
2.5.1 The complex genetic architecture of culm and kernel row number
Efforts to identify causative factors underlying QTL have recently been met with great
success in maize. These successful studies have identified genes contributing to loss of
prolificacy [4], day length neutrality [50, 62], liberation of the kernel from its fruitcase [5],
and apical dominance [3]. Our study set out to contribute to this growing list of genes
by examining domestication QTL affecting important traits on the fifth chromosome.
Unfortunately, we were unable to identify a single gene contributing to the domestication traits of culm diameter and kernel row number. Instead, we found evidence of multiple factors on chromosome five and other chromosomes controlling kernel row number and culm diameter, suggesting that the underlying genetic architecture of these traits is quite complex.
Prior analyses identified domestication QTL for culm diameter and kernel row number spanning the fifth chromosome from 54.4 to 151.7 Mb and from 166.6 to 169.2 Mb, respectively. QTL mapping in the fine mapping RCNILs produced mixed results. The culm5.1 QTL was refined to a much smaller region, from 83.74 Mb to 86.26 Mb on the fifth chromosome, a reduction to ∼2.5 Mb from an initial 1.5 LOD support interval of close to 100 Mb. Unfortunately, the RCNILs used to fine map krn5.2 revealed multiple QTL on the fifth and other chromosomes, along with major differences between founding HIFs. The fine mapped kernel row number QTL closest to the original target region was located from 160.7 Mb to 163.94 Mb on the fifth chromosome, approximately 3 Mb upstream of (at lower coordinates than) the original interval. It is interesting that the largest LOD score QTL from chapter 1 moved and fractionated into multiple factors, while culm5.1, with a comparatively smaller LOD score and effect size, was narrowed to an interval ∼2.5% the size of the original. In terms of gene content, the 1.5 LOD support intervals contained 40 genes for the culm diameter QTL and 63 genes for the kernel row number QTL. While this falls short of the ultimate goal of a single gene, the intervals contain few enough genes to begin looking for interesting candidates.
The forty genes in the culm diameter QTL were characterized using functional annotation, expression results from chapter 3 of this thesis, and overlap with selection features from a recent genome-wide population genetics scan in maize [55]. The genes encode proteins with a variety of biological functions, including nucleases, transmembrane proteins, metabolic enzymes, chlorophyll binding proteins, and a number of transcription factors. The allele specific RNAseq experiment provided regulatory results for 21 of the 40 genes. Seven genes were classified as having a significant cis regulatory change; however, none of these seven were on the final filtered candidate gene list. Eight of the forty genes were also inside domestication selection features, suggesting that genes in the culm QTL were under positive selection during maize domestication. While there is evidence of differential gene expression and selection for genes in the culm5.1 QTL, no single gene has multiple lines of supporting evidence.
Genes in the highest LOD score kernel row number QTL were also examined for interesting candidates. As in the culm5.1 QTL, genes in the kernel row number QTL had many different functions, including ubiquitin association, ribosomal proteins, nucleases, nuclear transporters, and several transcription factors. Of the 63 genes, 36 were not assessed by the chapter 3 RNAseq experiment. Most of the 27 assayed genes (24) were not on the filtered candidate gene lists for cis regulatory change, but three were. The most interesting candidate is an armadillo repeat containing protein with a U-box domain. Armadillo proteins were first characterized in the fruit fly and are implicated in a number of functions, including intracellular signaling and cytoskeletal regulation. The U-box family of proteins is a class of ubiquitin-protein E3 ligases. While there is evidence for positive selection during maize domestication in the krn QTL, none of the genes with cis regulatory change show signs of positive selection, leaving no ideal candidate for the kernel row number QTL defined in our fine mapping experiment.
This work provides a cautionary note for researchers seeking the causal genes underlying QTL. We set out to identify the causative genes underlying two QTL: a large effect, high LOD score QTL with a narrow confidence interval, and a moderate effect, moderate LOD score QTL with a larger support interval. Contrary to expectations, we were more successful in narrowing the weaker QTL: the lower LOD score culm diameter QTL yielded a greatly reduced confidence interval, while the high LOD score kernel row number QTL shifted position slightly and proved to be influenced by multiple factors. We show that the inheritance of genetic factors influencing kernel row number on chromosome five is quite complicated and that a previously mapped high LOD score QTL fractionates into multiple linked factors.
2.5.2 Future work on chromosome five QTL
The fifth chromosome of maize has been implicated as a major contributor to maize domestication in several studies [8, 25]. QTL for the kernel row number and ear diameter traits are of particular interest due to their large effect, high LOD score, and obvious link to desirable domestication phenotypes. The work in this thesis shows that ear diameter and kernel row number fractionate into multiple linked QTL on the fifth chromosome. Evidence from fine mapping and chapter 1 of this thesis places the kernel row number QTL between 160.7 and 169.2 Mb. Unfortunately, this region contains over 100 genes, and we could not identify a single highly attractive candidate gene based on gene annotation, expression profiles, and scans for selection. Even though these efforts were not fully successful, the importance of chromosome five for domestication traits (kernel row number and ear diameter in particular) cannot be overstated, and future studies of this chromosome are inevitable.
To aid future studies of these QTL, lessons from this work can be used to maximize the chances of success. I believe two primary insights would be useful to future researchers in this endeavor. First, the uniformity of the genetic background on chromosome five and other chromosomes appears to be critically important. The founding HIFs in this experiment, taken from a BC2S3 population, had a complicated background with multiple secondary segregating sites that caused problems when mapping in the target QTL interval. Consequently, a more advanced population is desirable. Second, distinct differences between founding HIFs were detected for both the culm diameter and kernel row number QTL, suggesting that comparisons of RCNILs generated from different HIFs could be misleading. It will be important either to design the experiment around a single founding HIF or to account for HIF in the analysis of the phenotype data. Despite accounting for founding HIF in the linear mixed models, I still observed a large difference in kernel row number between founding HIFs, suggesting that using a single HIF may be the best design.
More extensive backcrossing and generation of RCNILs from a single founding HIF will give a more isogenic genomic background with minimal segregation outside the desired region. Drawing starting HIFs from the BC6S6 NIRIL population from chapter 1 is an easy and logical way to do this; additionally, the kernel row number QTL has already been confirmed in that population. Towards this end, we have started the crosses necessary to produce a new population of segregating RCNILs from several lines in the chapter 1 mapping population. These RCNILs will be used in future field trials for a new and improved fine mapping attempt on the highly important kernel row number and ear diameter phenotypes on chromosome five.
Chapter 3
The role of cis regulatory evolution in maize domestication
3.1 Abstract
Gene expression differences between divergent lineages caused by modification of cis regulatory elements are thought to be critically important in the evolution of species. In this study, we assay genome-wide cis and trans regulatory differences between maize and its wild progenitor, teosinte, using deep RNA sequencing in F1 hybrid and parent inbred lines. Three tissues were sampled, and approximately 70% of ∼17,000 genes showed
evidence of allele specific expression. Approximately 1,000 of these genes show consistent
cis differences among the sampled maize and teosinte lines, of which ∼70% are specific to a
single tissue. The number of genes with cis regulatory differences is greatest for ear, which
underwent a drastic transformation in form during domestication. Genes with cis effects
were also under positive selection during maize domestication and improvement more often
than expected by chance. Over all genes, maize was shown to possess less cis regulatory
variation than teosinte, a deficit that is greatest for genes with cis regulatory divergence.
We observed a directional bias where genes with cis differences favored higher expression in
maize, suggesting domestication led to a general upregulation of gene expression. Finally,
this work documents the cis and trans regulatory changes between maize and teosinte in
over 17,000 genes for three tissues.
3.2 Introduction
Changes in the cis regulatory elements (CREs) of genes encoding functionally conserved proteins have been considered a key mechanism, if not the primary mechanism, by which the diverse forms of multicellular eukaryotic organisms evolved [12, 13, 65]. Variation in CREs allows for tissue specific patterning of gene expression, differences in the developmental timing of expression, and variation in quantitative levels of gene expression. Furthermore, modification of CREs, as opposed to coding sequence changes, is assumed to have less pleiotropy and consequently a lower chance of being deleterious through unintended consequences in secondary tissues. The importance of CREs for the development of novel morphologies is supported by a growing catalog of examples in which differences in the CREs of specific genes between closely related species contributed to the evolution of diversity in form and pigmentation patterning [66].
While compelling evidence for the importance of CREs in evolution has come from
mapping causative variants to CREs, additional evidence has been emerging from genomic
analyses. These analyses have shown that cis regulatory variation is abundant both within
[67–70] and between species [20, 21, 71]. Some studies have reported a bias such that genes
with cis differences between species or ecotypes often show preferential upregulation of
the alleles of one parent, possibly as a result of natural selection [21, 68, 72]. Consistent
with the proposal that cis differences are a key element of adaptive divergence, divergence
for cis regulation between yeast species is more often associated with positive selection
than trans divergence [20, 73].
Crop plants offer a powerful system for the investigation of evolutionary mechanisms
because they display considerable divergence in form from their wild progenitors, yet
exhibit complete cross-fertility with these progenitors [7, 36, 74]. QTL fine-mapping
experiments have provided multiple examples of changes in CREs that underlie trait
divergence between crops and their ancestors. These studies include examples in which
cis changes confer the upregulation of a gene during domestication [3], the downregulation
of a gene [14, 62], the loss of a tissue specific expression pattern [15], the gain of a tissue
specific expression pattern [4], and a heterochronic shift in the expression profile [16].
These diverse results suggest that changes in CREs offer a powerful means to fine-tune
gene expression to generate new plant morphologies.
Several genomic scale assays of gene expression differences between crops and their
ancestors have been performed, although the experimental designs used did not allow
the separation of cis and trans effects. These studies have shown that hundreds or even
thousands of genes have altered expression in crops as compared to their progenitors and
that genes with altered expression are more likely to show evidence for past selection than
genes with conserved expression [17–19]. The data suggest massive alterations in gene
expression profiles accompanied domestication. Work in cotton and maize shows more frequent upregulation of genes in the cultivated than in the wild parent; however, whether this was due to cis or trans effects could not be determined [17, 18].
In this study, we used RNAseq to parse genome-wide expression differences between
maize and its progenitor, teosinte (Zea mays ssp. parviglumis), into cis and trans effects.
Three tissue types were assayed: immature ear, seedling leaf, and seedling stem. Approximately 70% of the 17,000 genes assayed show evidence of regulatory divergence between
maize and teosinte. Over 1,000 genes show cis divergence that is highly consistent across
our sampled lines of maize and teosinte. For ∼70% of genes with consistent cis effects,
the cis effects are specific to just one of the three tissue types. The number of genes with
cis differences is greatest for the ear, which underwent a profound transformation in form
during domestication. Genes with cis regulatory differences between maize and teosinte
more frequently show evidence for positive selection associated with domestication than
do trans genes. Maize also possesses less cis regulatory variation than teosinte over all
genes, and this deficit in maize is greatest for genes with cis regulatory divergence from
teosinte. We observed a directional bias in that genes with cis differences more frequently
have upregulated expression of maize alleles over teosinte, although we cannot exclude
the possibility that this is an artifactual result. Finally, our data provide a catalog of cis
and trans regulatory variation for over 17,000 genes in three tissue types for maize and
teosinte.
3.3 Materials and Methods
3.3.1 Plant material, RNA preparation, and sequencing
Six maize inbred lines, nine teosinte inbred lines, and 29 of their 54 possible maize-teosinte
F1 hybrids were used in this experiment (Supplemental Table C.1). An average of 1.96 biological replicates (range 1 to 4) was used for each genotype. Plants were grown in growth chambers with a 12 hour dark-light cycle for up to 6 weeks, after which they were moved to a greenhouse. Samples of 50 to 100 milligrams of immature ear, leaf, and seedling stem were harvested for RNA extraction during this time. Leaf and seedling stem (including the shoot apical meristem) tissue was collected at the V4 leaf stage. Single ears from maize and F1 hybrid plants were collected when the ears weighed 50 to 100 milligrams, with silks just beginning to be visible. Teosinte ears were also collected when silks just started to appear; however, because teosinte ears are small, 7 to 16 ears (average of 11.27) from each plant were pooled to obtain ∼50 milligrams of tissue. These three tissue types are hereafter referred to as the ear, leaf, and stem tissues.
Total RNA was extracted from the plant tissues using a standard TRIzol protocol, quantified by spectrophotometer, and normalized to 1 µg/µL in nuclease free water. Starting with 5 µg total RNA, we generated polyA selected, strand specific, barcoded RNAseq libraries with a previously published protocol using a five minute fragmentation time and 12 PCR amplification cycles [75]. Library adapters used barcode sequences of four and five base pairs (Supplemental Table C.2), designed to balance percent nucleotide composition within the first five base pairs of sequence reads and to differ by at least two base pairs from every other barcode. RNAseq libraries were then pooled in groups of 14 (F1s) or 15 (parents), and each pool was sequenced on one lane (parents) or two lanes (F1s) of an Illumina HiSeq2000 sequencer at the University of Wisconsin Biotech Center.
3.3.2 Bioinformatics
A pipeline was developed to quantify gene expression in F1 hybrid and parental inbred lines using the RNAseq reads. The pipeline, based on work by Wang et al. [76], has two main steps: (1) construction of a pseudo-transcriptome for each parent line from the B73 reference genome and polymorphisms derived from non-B73 genomic paired-end reads, and (2) alignment of RNAseq reads to the pseudo-transcriptomes, followed by evaluation of read depth at segregating sites.
Pseudo-transcriptomes were constructed using the B73 reference genome (version
AGPv2) and transcriptome (version ZmB73 5a WGS) plus an average of 403.1 million
(17.5X coverage) paired-end genomic sequencing reads from each of the other 14 inbred
lines (Supplemental Table C.3). For each of the 14 non-B73 inbreds, paired-end genomic
sequencing reads were aligned to the reference genome with the BWA aligner (version
0.5.9) [77]. Only uniquely mapping reads with up to two mismatches were used to limit
false polymorphism detection due to paralogous read alignment. Segregating sites from
single nucleotide polymorphisms (SNPs) and small insertion or deletion (indel) polymorphisms were called using the GATK package (version 1.0.5588) [78, 79] and filtered to
include only polymorphisms that were homozygous in the inbred with read depth of at
least 4X. A strand bias filter was also applied to ensure that the polymorphism was detected on both the plus and minus strand. Polymorphisms surviving these filters were
then inserted into the reference B73 transcriptome to make a pseudo-transcriptome for
each parent.
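A minimal sketch of the filtering and insertion step follows. It assumes hypothetical Variant records already expressed in transcript coordinates (the pipeline itself works from GATK calls on the genome, which this sketch does not parse) and non-overlapping variants. Applying edits from right to left keeps earlier coordinates valid when an indel changes the sequence length.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    pos: int           # 0-based start in the reference transcript sequence
    ref: str           # reference allele
    alt: str           # alternative allele (SNP or small indel)
    homozygous: bool   # genotype call in the inbred
    depth: int         # genomic read depth at the site
    plus_reads: int    # supporting reads on the plus strand
    minus_reads: int   # supporting reads on the minus strand

def passes_filters(v, min_depth=4):
    """Keep homozygous calls with depth >= min_depth that were seen on both strands."""
    return v.homozygous and v.depth >= min_depth and v.plus_reads > 0 and v.minus_reads > 0

def pseudo_sequence(reference, variants):
    """Insert filtered polymorphisms into a reference sequence.

    Variants are applied from right to left so that an indel never shifts the
    coordinates of a variant that has not been applied yet.
    """
    seq = reference
    kept = sorted(filter(passes_filters, variants), key=lambda v: v.pos, reverse=True)
    for v in kept:
        if seq[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference allele mismatch at position {v.pos}")
        seq = seq[:v.pos] + v.alt + seq[v.pos + len(v.ref):]
    return seq
```

For example, a homozygous SNP and a one-base deletion are applied, while a heterozygous call and a call supported on only one strand are filtered out.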
For each of the 29 maize-teosinte pairs, a robust set of segregating sites was determined
by comparing the pseudo-transcriptomes of the two parents and taking the sites where: the
two parental alleles differed, coverage in genomic read alignment was at least four for both
parents within the read length (88bp) of the site, and no heterozygous polymorphisms
were detected in genomic read alignments of the two parents within the read length of
the site.
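The three criteria for a robust segregating site can be expressed as a filter over the two parental sequences. This sketch assumes SNP-only differences, so that the two pseudo-transcripts align position by position; per-position genomic coverage arrays for each parent; and a window of one read length on either side of the site. The function name and these representations are hypothetical.

```python
import bisect

def robust_sites(maize_seq, teo_seq, maize_cov, teo_cov, het_positions,
                 read_len=88, min_cov=4):
    """Positions where the parental alleles differ, both parents have genomic
    coverage >= min_cov throughout the read-length window, and neither parent has
    a heterozygous call inside that window."""
    het = sorted(het_positions)
    sites = []
    for i, (m, t) in enumerate(zip(maize_seq, teo_seq)):
        if m == t:
            continue                              # not a segregating site
        lo, hi = max(0, i - read_len), min(len(maize_seq), i + read_len + 1)
        if min(maize_cov[lo:hi]) < min_cov or min(teo_cov[lo:hi]) < min_cov:
            continue                              # insufficient genomic coverage
        k = bisect.bisect_left(het, lo)
        if k < len(het) and het[k] < hi:
            continue                              # heterozygous call nearby
        sites.append(i)
    return sites
```

Screening the whole read-length window, not just the site itself, matters because any read overlapping the site also spans its neighbors.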
RNAseq reads from each F1 hybrid and each corresponding pair of inbred parents
were then aligned to the combined pseudo-transcriptomes of the two parents (in the case
of the B73 parent, the B73 reference transcriptome was used) using the Bowtie aligner
(version 0.12.7) [80]. Allele specific expression was assessed by counting depths of reads
originating from each parent at segregating sites (determined as described above). Since
only perfect alignments were allowed, assignment of reads to parents was straightforward
(a read from a given parent could only align to this parent’s allele at a segregating site).
3.3.3 Maize:teosinte gene expression ratios
We calculated F1 hybrid and parent maize:teosinte expression ratios for each gene for each of the 29 individual F1 hybrid comparisons. The F1 expression ratio for an individual F1 (e.g. B73 x TIL01) was calculated as the number of maize reads to the number of teosinte reads, summed over all segregating sites in the gene. The parent expression ratio for an individual F1 comparison was calculated as the number of reads for the maize parent (e.g. B73) to the number of reads for the teosinte parent (e.g. TIL01), summed over all segregating sites in the gene after correcting for any difference in the total number of reads between the two parent lines. The result of these calculations was a set of 29 matched F1 and parent ratios of read counts for each gene. For example, for the B73 x TIL01 comparison at a single gene, the F1 and parent maize:teosinte ratios could be 52:56 and 34:30, respectively.
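The ratio calculations for one gene in one F1 comparison can be sketched as follows. The library-size correction shown (rescaling the teosinte parent's count to the maize parent's total read count) is an assumption, since the text states only that parent counts were corrected for differences in total read number. The per-site counts and library sizes are hypothetical, chosen to reproduce the 52:56 and 34:30 example above.

```python
def f1_ratio(site_counts):
    """Maize:teosinte read counts for one gene in one F1 hybrid.

    site_counts: (maize_reads, teosinte_reads) at each segregating site in the gene.
    """
    maize = sum(m for m, _ in site_counts)
    teosinte = sum(t for _, t in site_counts)
    return maize, teosinte

def parent_ratio(maize_site_reads, teosinte_site_reads, maize_library, teosinte_library):
    """Maize-parent:teosinte-parent read counts for one gene, summed over its
    segregating sites, with the teosinte count rescaled to the maize library size
    (assumed form of the library-size correction)."""
    maize = sum(maize_site_reads)
    teosinte = sum(teosinte_site_reads) * maize_library / teosinte_library
    return maize, teosinte

print(f1_ratio([(30, 20), (22, 36)]))                               # (52, 56)
print(parent_ratio([17, 17], [20, 25], 2_000_000, 3_000_000))       # (34, 30.0)
```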
We also calculated F1 hybrid and parent maize:teosinte expression ratios for each
gene summed over all F1 hybrid comparisons by pooling the read depth values for the 29
F1 hybrids and their parents, respectively. To calculate the overall F1 expression ratio,
the maize and teosinte read counts from the F1 hybrids were simply summed over all
segregating sites in a gene and across all hybrids. The calculation of the overall parent
expression ratio required weighting. The weighting was necessary to avoid counting the
parent reads multiple times for each of the F1 hybrids in which it was a parent and to
compensate for the fact that different parents had variable total numbers of reads. Only
genes with a read depth of at least 100 in both the F1 and its parent were included. The
result of these calculations was an overall F1 and parent ratio of read counts for each gene.
For example, for a gene, the overall F1 and parent maize:teosinte ratios could be 804:796
and 123:130, respectively.
3.3.4 Testing for cis and trans effects
The combination of F1 hybrid and parent inbred expression data allows us to estimate
both the cis and trans effects on gene expression. For the F1 hybrids, the maize and
teosinte alleles at each gene are in a common trans cellular environment, and thus any
deviation of the maize:teosinte F1 expression ratio from 1:1 represents purely cis effects.
By contrast, the maize:teosinte parent expression ratio is a combination of the cis and
trans effects and any deviation of this ratio from 1:1 reflects the combined cis plus trans
effects. Therefore, the trans effects can be estimated by subtracting the F1 hybrid ratio
(cis) from the parent ratio (cis plus trans).
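As a worked illustration of this subtraction, take the illustrative B73 x TIL01 counts from Section 3.3.3 (F1 52:56, parent 34:30) and assume the ratios are compared on a log2 scale, where subtracting one log ratio from another is equivalent to dividing the ratios. The log2 scale is an assumption of this example; the text above states only that the F1 ratio is subtracted from the parent ratio.

```latex
\begin{aligned}
\text{cis} &= \log_2 \frac{52}{56} \approx -0.107 \\
\text{cis} + \text{trans} &= \log_2 \frac{34}{30} \approx 0.181 \\
\text{trans} &= \log_2 \frac{34}{30} - \log_2 \frac{52}{56}
              = \log_2 \frac{34 \cdot 56}{30 \cdot 52}
              = \log_2 \frac{1904}{1560} \approx 0.287
\end{aligned}
```

In this hypothetical gene the F1 alleles are nearly balanced with a slight teosinte bias, while the parents favor maize, so the trans component accounts for most of the parental difference.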
Maize and teosinte gene expression, as measured by read depth counts at each gene, was used for statistical testing of cis and trans effects. Significant cis and trans effects were
Table 3.1: Regulatory category as defined by significant (Sig.) or not significant (Not Sig.) binomial tests (BT) and Fisher’s Exact Tests (FET).

Category       Parent BT   Hybrid BT   FET        Favored allele?
Cis            Sig.        Sig.        Not Sig.   —
Trans          Sig.        Not Sig.    Sig.       —
Cis + Trans    Sig.        Sig.        Sig.       Same
Cis x Trans    Sig.        Sig.        Sig.       Opposite
Compensatory   Not Sig.    Sig.        Sig.       —
Conserved      Not Sig.    Not Sig.    —          —
Ambiguous      All other patterns of significant or not significant tests
determined using binomial and Fisher’s Exact Tests as described in McManus et al. [21].
In brief, two binomial tests were used to identify genes with maize:teosinte expression
ratios significantly different from 1:1 in the F1 hybrid and parent comparisons. Genes
with an expression ratio significantly different from 1:1 for the F1 hybrid and/or parent
comparison were then subjected to a Fisher’s Exact Test to determine if the parent and F1
hybrid maize:teosinte expression ratios differed from one another. A false discovery rate (FDR) of 0.5%, based on Storey’s q-value [81], was used to compensate for the large number of statistical
tests being performed. The combination of the two binomial tests and Fisher’s Exact Test
allowed us to classify each gene into one of seven different regulatory categories (Table 3.1)
as described in McManus et al. [21].
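The classification in Table 3.1 can be sketched with exact tests written out from their definitions. Two simplifications are assumptions of this sketch: each gene is compared against a fixed per-gene threshold (alpha) in place of the Storey q-value FDR control used here, and the read counts passed in are taken to be already corrected for library size.

```python
from math import comb

def binom_two_sided(k, n):
    """Exact two-sided binomial test of k maize reads out of n against p = 0.5."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def fisher_two_sided(a, b, c, d):
    """Exact two-sided Fisher test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, c1)
    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / denom
    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):   # small tolerance for floating-point ties
            total += p
    return min(1.0, total)

def classify(parent, hybrid, alpha=0.005):
    """parent, hybrid: (maize_reads, teosinte_reads). Returns a Table 3.1 category."""
    pm, pt = parent
    hm, ht = hybrid
    p_sig = binom_two_sided(pm, pm + pt) < alpha
    h_sig = binom_two_sided(hm, hm + ht) < alpha
    if not p_sig and not h_sig:
        return "Conserved"                  # FET not needed
    f_sig = fisher_two_sided(pm, pt, hm, ht) < alpha
    same = (pm > pt) == (hm > ht)           # same favored allele in parent and F1?
    if p_sig and h_sig and not f_sig:
        return "Cis"
    if p_sig and not h_sig and f_sig:
        return "Trans"
    if p_sig and h_sig and f_sig:
        return "Cis + Trans" if same else "Cis x Trans"
    if not p_sig and h_sig and f_sig:
        return "Compensatory"
    return "Ambiguous"
```

With the illustrative overall counts from Section 3.3.3 (parent 123:130, F1 804:796), neither binomial test is significant and the gene is classified as Conserved.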
3.3.5 Candidate genes
Genes whose expression level was the direct target of selection during maize domestication
are expected to show a maize:teosinte cis expression ratio that is significantly different
from 1:1. These genes can fall into either the cis only (C) or cis plus trans (CT) groups in Table 3.1, as determined by the binomial and Fisher’s Exact Tests. We call this combined
group CCT genes and they are the differential expression candidates that are the focus
of many of our analyses.
The list of CCT genes from the overall test was large (5,609 ear; 5,392 leaf; 5,426 stem;
see results). The large number of CCT genes reflects the considerable statistical power
to detect slight overall expression biases given that some genes had thousands of reads
aligning to segregating sites. We observed significant maize:teosinte expression biases
as small as 1.0:1.02 in the overall tests. Such small differences seem unlikely to have
biological importance and genes showing these small differences are weak candidates for
genes with cis expression variation that is causal in maize domestication and improvement.
Therefore, we applied filters to identify candidates with the strongest and most consistent
regulatory differences.
To narrow the CCT gene list to the candidates with the strongest evidence for differential cis regulation between maize and teosinte, we applied two filters. (1) The best-supported genes should not only fall in the CCT group in the overall test using the pooled data from all 29 F1 hybrid comparisons, but should also have data from a large proportion of our sampled maize and teosinte parents. Thus, we filtered the initial list of CCT genes for those with data from at least fifteen F1 hybrids that include at least three different maize inbreds and five different teosinte inbreds. (2) Genes whose cis differences contributed to maize domestication/improvement should not only appear in the CCT list from the overall test, but should also show a highly consistent direction of expression bias among the individual F1 hybrids. To classify CCT genes by the consistency of the direction of expression bias among the F1s, we partitioned the genes into groups with 100%, 90% and 80% of F1s showing the same direction. In calculating these percentages, we weighted the contribution of each F1 by its read depth at the gene. We refer to the CCT genes with 100%, 90% and 80% consistent directionality among the F1s as the A-list, B-list and C-list, respectively. For comparative purposes, we made similar A, B and C lists of genes for the cis only and trans only classes.
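The two filters can be sketched in Python as follows. The thresholds (15 F1s, 3 maize and 5 teosinte parents; 100%, 90% and 80%) come from the text. The `F1Observation` record, the function names, and the handling of ties are illustrative assumptions: an F1 with equal maize and teosinte counts is counted as not agreeing. The lists are treated as non-overlapping bins, consistent with the per-tissue counts in Table 3.3, where each ABC total equals the sum of its A, B and C lists.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class F1Observation:
    """Allele counts for one gene in one maize x teosinte F1 (hypothetical record)."""
    maize_parent: str
    teosinte_parent: str
    maize_reads: int      # reads at segregating sites assigned to the maize allele
    teosinte_reads: int   # reads at segregating sites assigned to the teosinte allele

def passes_coverage_filter(obs: List[F1Observation], min_f1: int = 15,
                           min_maize: int = 3, min_teosinte: int = 5) -> bool:
    """Filter (1): enough F1s, built from enough distinct maize and teosinte parents."""
    return (len(obs) >= min_f1
            and len({o.maize_parent for o in obs}) >= min_maize
            and len({o.teosinte_parent for o in obs}) >= min_teosinte)

def weighted_consistency(obs: List[F1Observation]) -> float:
    """Filter (2): depth-weighted share of F1s biased in the pooled (overall) direction."""
    pooled_bias = sum(o.maize_reads - o.teosinte_reads for o in obs)
    overall_sign = 1 if pooled_bias > 0 else -1
    total_depth = sum(o.maize_reads + o.teosinte_reads for o in obs)
    agreeing_depth = sum(o.maize_reads + o.teosinte_reads for o in obs
                         if (o.maize_reads - o.teosinte_reads) * overall_sign > 0)
    return agreeing_depth / total_depth

def assign_list(obs: List[F1Observation]) -> Optional[str]:
    """Return 'A' (100%), 'B' (>= 90%), 'C' (>= 80%), or None for a CCT gene."""
    if not passes_coverage_filter(obs):
        return None
    share = weighted_consistency(obs)
    if share >= 1.0:
        return "A"
    if share >= 0.9:
        return "B"
    if share >= 0.8:
        return "C"
    return None
```

Weighting by depth means a single deeply sequenced F1 with the opposite bias can move a gene from the A-list to the B- or C-list. A shallowly covered F1 has correspondingly less influence.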
3.3.6 Proportion of cis variation in maize and teosinte
The existence of multiple cis regulatory regimes within maize and teosinte populations is expected to manifest as variation in the expression ratios among F1 hybrids. We asked whether cis expression variation among F1 hybrid ratios was more heavily influenced by the maize or the teosinte inbred parent. Because three teosinte inbreds (TIL05, TIL10, and TIL15) were each involved in only a single F1, the three F1s involving these inbreds were removed from the data to balance the number of maize and teosinte inbred parents in the dataset for this analysis. Genes were tested for variation among the F1 expression ratios (cis variation) using a linear model, with the log2(maize:teosinte) F1 expression ratio as the dependent variable and the maize (j = 1 to 6) and teosinte (k = 1 to 6) parents as the independent variables. All models were fit on a gene-by-gene basis. Significant maize and teosinte parent terms were identified with an F-test (p < 0.05) using the drop1 function in R. Each F1 was weighted by its total read depth at the gene to account for different read depths among the F1 hybrids.
3.3.7 Additive and dominant gene expression
One theory in domesticated systems holds that genes responsible for rapid morphological evolution are primarily loss-of-function (LOF) alleles [82]. In this scenario, the non-domesticated allele would be dominant to the LOF domesticated allele. While there is some support for this theory in rice diversification and improvement [83], recent QTL and domestication gene cloning experiments present a more diverse collection of functional gene changes [84]. In domesticated systems, the mode of inheritance of gene expression in terms of additivity and dominance has yet to be explored.
Our dataset, consisting of parent inbred and F1 hybrid expression profiles, provides the opportunity to address the LOF hypothesis in terms of gene expression on a genome-wide scale. We calculated the additive effect, dominant effect, and dominant/additive (D/A) ratio for each gene and maize-teosinte F1 hybrid comparison. The overall maize-teosinte average D/A ratio was then calculated after excluding outlier F1 D/A ratios using the Dixon method [85]. Genes were then classified as having overdominant (|D/A| > 1.25), dominant (0.75 < |D/A| < 1.25), semi-dominant (0.25 < |D/A| < 0.75), or additive (|D/A| < 0.25) gene action. Following calculation of overall D/A ratios and assignment of gene action, we looked for patterns in D/A ratios and gene action that support the LOF hypothesis [82]. Specifically, we looked for evidence of extensive dominance of the teosinte (non-domesticated) allele for genes with trans only regulatory change.
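The D/A calculation and gene-action classes can be sketched as follows. This sketch makes several assumptions not stated above. It assumes D/A is computed from normalized expression levels, with the additive effect a = (maize - teosinte)/2 and the dominance effect d = F1 - midparent. Under that convention, D/A = +1 means the F1 matches the maize parent and D/A = -1 means it matches the teosinte parent. The Dixon outlier step is omitted, values falling exactly on 0.25, 0.75 or 1.25 are assigned arbitrarily, and the function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

def dominance_ratio(maize: float, teosinte: float, f1: float) -> Optional[float]:
    """D/A for one F1: dominance deviation over additive effect (None if parents are equal)."""
    midparent = (maize + teosinte) / 2
    additive = (maize - teosinte) / 2       # a: half the parental difference
    if additive == 0:
        return None                         # D/A is undefined without a parental difference
    dominance = f1 - midparent              # d: F1 deviation from the midparent
    return dominance / additive

def gene_action(da: float) -> str:
    """Classify gene action from |D/A| using the thresholds in the text."""
    x = abs(da)
    if x > 1.25:
        return "overdominant"
    if x > 0.75:
        return "dominant"
    if x > 0.25:
        return "semi-dominant"
    return "additive"

def overall_da(comparisons: Sequence[Tuple[float, float, float]]) -> Optional[float]:
    """Average D/A across (maize, teosinte, F1) comparisons, skipping undefined ratios."""
    ratios = [r for r in (dominance_ratio(m, t, f) for m, t, f in comparisons)
              if r is not None]
    return mean(ratios) if ratios else None
```

For example, parents at 10 (maize) and 2 (teosinte) with an F1 at 10 give D/A = 1 (complete maize dominance). An F1 at 6 gives D/A = 0 (purely additive).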
3.3.8 CCT gene enrichment in various functional categories
We assessed whether CCT genes are over- or under-represented in several categories as compared to all genes or to genes with conserved expression levels between maize and teosinte.
The categories we tested include transcription factors, several metabolic pathways, gene
ontology (GO) categories, selection candidates, and domestication QTL. A list of maize
transcription factors and their associated families was downloaded from the Plant Transcription Factor Database [86]. Metabolic enzyme cDNA sequences for starch and lipid
metabolism pathways in maize were downloaded from the Kyoto Encyclopedia of Genes
and Genomes (KEGG) [87, 88] and matched with genes from the maize filtered gene set
(version 5b) by BLAST. Matches (single gene hit with percent identity greater than 95%)
were found for 370 out of 379 genes and used to test for enrichment of CCT genes in the
various metabolic pathways. Genes under positive selection during maize domestication
and improvement were taken from a recent genomic scan for selection [55]. We obtained a
list of QTL associated with maize domestication and improvement traits from Table A.1
in work by Shannon [25].
In general, we tested for enrichment or depletion of CCT genes in various categories using Fisher’s Exact Tests on 2x2 contingency tables that classify genes by CCT status and category membership. Statistical testing was first done for the CCT-AB candidate genes and was extended to the CCT-A and CCT-ABC lists when the CCT-AB result warranted it. There were also a few differences in this general approach depending on the category being analyzed. For QTL, we looked for enrichment of CCT genes among the genes within the 1.5 LOD support interval of each trait separately, and only included QTL whose 1.5 LOD support intervals were narrow enough to encompass 20 or fewer genes. For genes under positive selection during domestication and improvement, we performed an additional three-tissue union comparison in which genes on any of the three tissue CCT lists were considered CCT candidate genes.
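Each enrichment test reduces to a 2x2 table. The sketch below is written from scratch with exact integer arithmetic rather than a statistics library. It computes a two-sided Fisher’s Exact Test p-value by summing the probabilities of all tables with the same margins that are no more likely than the observed one. It also reports the count expected under independence (row total × column total / N). That the "Expected" columns in the enrichment tables were computed exactly this way is an assumption, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact Test for the 2x2 table [[a, b], [c, d]].

    Rows: CCT / not CCT; columns: in category / not in category.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x: int) -> int:
        # Hypergeometric numerator for cell a = x with fixed margins;
        # the common denominator is comb(n, row1).
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Exact integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)

def expected_overlap(a: int, b: int, c: int, d: int) -> float:
    """Count expected in cell a if CCT status and category membership were independent."""
    n = a + b + c + d
    return (a + b) * (a + c) / n
```

For the small table [[3, 1], [1, 3]], the expected overlap is 2 and the two-sided p-value is 34/70 ≈ 0.486.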
One expectation for genes under selection at their CREs is a signature of selection at the CRE itself, upstream of the gene in question. Because there is no hard rule for how far upstream cis enhancer and repressor elements can function, we addressed this expectation by examining the signature of selection at the transcriptional start site of each gene. The raw selection score, the cross-population composite likelihood ratio (XPCLR) [89] from Hufford et al. [55], served as the test statistic for this analysis. A three-tissue union comparison was made between all genes on the CCT-AB lists and all genes identified as conserved in the initial assay. Differences in XPCLR score at the transcriptional start site between conserved and CCT genes were tested with a Kolmogorov-Smirnov test (overall distribution) and a t-test (mean).
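The Kolmogorov-Smirnov comparison can be sketched without external libraries. The statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The p-value uses the asymptotic Kolmogorov series with the small-sample correction given in Numerical Recipes. That correction is an approximation and not necessarily the variant used by the software we ran. The function names are hypothetical, and the t-test is not shown.

```python
import math
from bisect import bisect_right
from typing import Sequence

def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample KS statistic: max |ECDF_x(v) - ECDF_y(v)| over all observed v."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        # bisect_right counts values <= v, i.e. the right-continuous ECDF at v
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d

def ks_pvalue(d: float, n: int, m: int, max_terms: int = 100) -> float:
    """Asymptotic two-sided p-value for D from samples of sizes n and m."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    total = 0.0
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-12:
            return min(1.0, max(0.0, total))
    return 1.0  # series did not converge: lam is tiny, so D gives no evidence of a difference
```

Because both empirical CDFs are step functions that change only at observed values, evaluating the gap at every observed value is enough to find the supremum.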
Finally, we used the goseq package [90] in R [91] to test for GO term enrichment and depletion in our CCT gene lists, using median gene length to correct for length bias in the goseq analysis. The background GO term reference consisted of genes for which allele-specific expression was assessed in at least 15 crosses involving at least three unique maize and five unique teosinte inbred lines, with a cumulative depth of 100 at segregating sites in the F1 and parent comparisons. GO terms occurring at least five times in the background reference were tested for enrichment and depletion in the CCT-A, CCT-AB, and CCT-ABC gene lists, with p-values corrected for multiple testing using the Benjamini-Hochberg method [92].
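The Benjamini-Hochberg adjustment applied to the GO-term p-values can be sketched as the standard step-up procedure. The p-values are ranked, each is scaled by m/rank, and monotonicity is then enforced from the largest rank down. The function name is hypothetical. In practice, R’s `p.adjust(p, method = "BH")` performs the same adjustment.

```python
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices from smallest p upward
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.2 become 0.04, 0.0533, 0.0533 and 0.2 after adjustment.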
3.4 Results
3.4.1 RNAseq provides expression data for more than 17,000 genes per tissue
RNAseq data for seedling leaf, seedling stem (including the shoot apical meristem), and
immature ear from six maize inbreds, nine teosinte inbreds, and 29 of their 54 possible
F1 hybrids were used to examine variation in gene expression on a genome-wide scale. In
total, 259 RNAseq libraries were constructed from an average of 1.96 biological replicates
for each parent inbred and F1 .
Overall, 996 million, 1.13 billion, and 1.21 billion F1 hybrid and 286 million, 283 million, and 276 million parent RNAseq reads were collected for ear, leaf, and stem tissue, respectively (Table 3.2). These reads were aligned to custom-made, parent-specific pseudo-transcriptomes containing an average of 54,000 segregating sites (SNPs or small indels) in each of the 29 maize-teosinte contrasts. Of the reads from the F1 hybrids, 556 million, 670 million, and 716 million mapped to the pseudo-transcriptomes in ear, leaf, and stem tissue, respectively. For parent inbred line reads, 171 million, 170 million, and 163 million mapped to the pseudo-transcriptomes (Table 3.2). Thus, approximately the same percentage of reads mapped to the pseudo-transcriptomes in the F1 hybrid and parent datasets (58.1% and 59.6%, respectively), and about 7.15% of the total F1 hybrid reads mapped to segregating sites.
The RNAseq reads from the pooled data for all 29 F1 hybrids and 15 parents that
aligned to segregating sites in the transcriptomes represent 23,045, 23,434, and 23,792
genes for ear, leaf and stem tissues, respectively (Table 3.3). The union of these three
groups is 24,983 genes, which is 63% of the 39,423 genes from the maize filtered gene set
(version 5b). We applied a filter to this list, requiring a read-depth of 100 in both the
parent inbreds and F1 hybrids. This filter reduced the lists to 15,939, 15,925, and 16,018
Figure 3.1: Overlap of genes assessed in the three tissues overall and in the CCT-AB gene list. Each compartment of the Venn diagram shows the tissue combination on top, the number of genes overall in the middle, and the number of genes from the CCT-AB gene list on the bottom. CCT-AB overlap numbers marked by an “*” indicate significantly more overlap than expected by chance (permutation tests, p < 1e-5). In the overall analysis, the vast majority of genes (82%) were assayed in all three tissues. While this percentage is much smaller for the CCT-AB candidate gene list (∼7%), it is still more overlap than expected by chance, suggesting that some CREs act in multiple tissues. There are also many single-tissue CCT-AB genes, which points to many cis elements functioning in tissue-specific patterns.
Table 3.2: Assignable RNAseq Read Counts from F1 hybrids and parents.
Read class               Tissue   F1 Hybrid Count   Parent Count    F1 Hybrid Percent   Parent Percent
Total Reads              Ear      996,210,711       286,233,926     -                   -
                         Leaf     1,133,517,167     282,553,096     -                   -
                         Stem     1,211,779,746     276,295,164     -                   -
Aligned Reads            Ear      556,387,109       171,185,368     55.85%              59.81%
                         Leaf     670,175,942       169,564,817     59.12%              60.01%
                         Stem     716,223,906       162,866,225     59.11%              58.95%
Segregating Site Reads   Ear      74,556,872        85,296,872a     7.48%               29.80%a
                         Leaf     72,995,272        78,878,805a     6.44%               27.92%a
                         Stem     91,355,219        78,583,423a     7.54%               28.44%a
a A higher number and percentage of reads map to segregating sites in parents because each set of parent reads is used in multiple comparisons. In contrast, reads from each F1 hybrid can map only to segregating sites between its two parental pseudo-transcriptomes.
Table 3.3: Genes for which RNAseq data was collected and expression was assayed.¹
                                                  Ear      Leaf     Stem     Union
Genes with mapped RNAseq reads                    32,858   32,645   33,316   34,636
Genes with RNAseq reads and segregating sites     22,072   22,393   22,901   24,052
Overall Genes (filtered, 100 depth)               15,939   15,925   16,018   17,575
Total CCT genes                                   5,618    5,402    5,435    10,101
Filtered CCT Genes (15 F1 + 3 M + 5 T)            4,770    4,490    4,601    8,398
ABC-List CCT                                      1,545    1,288    1,371    3,018
C-List CCT                                        990      843      940      2,314
B-List CCT                                        512      424      404      1,036
A-List CCT                                        43       21       27       69
¹ Only genes from the maize filtered gene set (version 5b) were considered.
genes in ear, leaf, and stem tissues, respectively. The union of these three groups is 17,575 genes, or about 45% of the filtered gene set. There is a large degree of overlap among the genes expressed in the three tissues. Of the 17,575 genes, 14,420 (82%) were seen in all three tissues. Of the remaining genes, 1,467 are in some combination of two tissues and 1,688 are in only a single tissue (Figure 3.1). All but 16 of these single- or two-tissue genes were also detected in the other tissue(s), but at a read depth below 100. Nevertheless, for the 1,688 genes that reached a read depth of 100 in only a single tissue, an average of 67.4% of their reads come from the tissue with the most reads. For genes detected in all three tissues at a read depth of 100, this value is only 46.9%. Thus, while very few of the 1,688 genes are absolutely tissue specific, this group shows greater differences in expression among tissues than the 14,420 genes detected in all three tissues.
3.4.2 Prolific regulatory variation characterized by relatively few consistent cis differences
We measured log2 of the ratio of maize to teosinte read counts in F1 hybrids (cis regulatory effect) and the parent log2 ratio (combined cis and trans regulatory effect). The
trans effect was estimated as the difference between the F1 and parent log2 ratios. Binomial and Fisher’s Exact Tests were used on read counts to determine whether these
ratios deviated from 1:1 and to assign genes to one of seven regulatory categories (Table 3.1). In an overall maize versus teosinte comparison, about 69% of genes (69.27% ear,
74.27% leaf, and 63.82% stem genes) from the three tissues were classified as having some
combination of significant cis and/or trans regulatory effect (Figure 3.2). The remaining
genes were classified as having conserved (18.6%, 15.3%, and 20.7%) expression in maize
and teosinte or ambiguous (12.1%, 10.4%, and 15.5%) expression patterns. All three
tissues had similar proportions of genes falling into the different regulatory categories in
the overall maize-teosinte comparison (Ear: Figure 3.2, Leaf: Supplemental Figure C.1,
Stem: Supplemental Figure C.2).
We asked what proportion of regulatory divergence between maize and teosinte was due to cis effects by calculating the ratio |cis|/(|cis| + |trans|) [21]. Across all genes, cis effects account for 45%, 42% and 47% of regulatory divergence in ear, leaf and stem tissue, respectively (Supplemental Table C.4). We further asked about the relative contribution of cis and trans in generating large expression differences by binning genes based on the overall expression difference between maize and teosinte (log2 parent ratio). This analysis shows that the proportion of divergence due to cis regulatory change increases with total divergence in expression (Figure 3.3). At high degrees of expression divergence between maize and teosinte (log2 change of 5 or more), over 75% of the divergence is due to cis. Thus, large expression differences appear to be caused primarily by differences in cis rather than trans regulation.
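The cis share and its binning can be sketched as follows. Following the definitions above, the cis effect is the F1 log2 ratio and the trans effect is the parent log2 ratio minus the F1 log2 ratio. Several choices in this sketch are our assumptions about how Figure 3.3 was built: binning on the absolute parent log2 ratio into 0-1, 1-2, ..., 5+, reporting mean ± one standard error per bin, and dropping genes with no divergence at all. The names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Optional, Tuple

BIN_LABELS = ["0-1", "1-2", "2-3", "3-4", "4-5", "5+"]

def cis_share(log2_hybrid: float, log2_parent: float) -> Optional[float]:
    """|cis| / (|cis| + |trans|) for one gene; None when both effects are zero."""
    cis = abs(log2_hybrid)
    trans = abs(log2_parent - log2_hybrid)
    if cis + trans == 0:
        return None
    return cis / (cis + trans)

def binned_cis_share(genes: Iterable[Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean cis share and its standard error, binned by |parent log2 ratio|."""
    bins = defaultdict(list)
    for log2_hybrid, log2_parent in genes:
        share = cis_share(log2_hybrid, log2_parent)
        if share is None:
            continue
        idx = min(int(abs(log2_parent)), 5)   # total-divergence bin; 5 and above pooled
        bins[BIN_LABELS[idx]].append(share)
    result = {}
    for label in BIN_LABELS:
        values = bins.get(label)
        if not values:
            continue                          # empty bins are omitted
        se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
        result[label] = (mean(values), se)
    return result
```

Note that a gene whose cis and trans effects oppose each other can still have a large cis share. For example, an F1 ratio of 2 with a parent ratio of 1 gives |cis| = 2 and |trans| = 1, a share of 2/3.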
A primary goal in this study was to identify genes with cis regulatory differences
between maize and teosinte. Such genes are candidates for being direct targets of selection
during maize domestication or improvement for altered gene expression. Genes selected
for regulatory differences would be in either the cis only or cis plus trans regulatory
categories. We designate this combined group CCT genes. We identified 5,618 ear, 5,402
leaf and 5,435 stem CCT genes in the overall analysis (Table 3.3). To narrow the list
of CCT genes to those with a broad degree of support, the list was filtered to include
only those assayed in at least 15 maize-teosinte F1s involving at least three maize and five
teosinte inbred lines. This filtering resulted in reduced lists of 4,770 ear, 4,490 leaf, and
4,601 stem CCT genes. The union of these three sets includes 8,398 genes.
Next, we asked if the 8,398 genes on the filtered CCT list from the overall analysis
have a consistent directionality in favor of the maize or teosinte allele in the individual F1
hybrids. The goal was to exclude CCT genes for which the significant overall cis effect was
Figure 3.2: Parent versus hybrid ear tissue allele-specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele-specific expression ratios are plotted against each other. Points are colored by regulatory category, as determined by the combination of significant statistical tests described in the Methods. The barplot in the lower right-hand corner shows the proportion and count of genes in each regulatory category.
Figure 3.3: Proportion of expression divergence due to cis regulatory difference. The
amount of total differential expression between the maize and teosinte parents due to the
directly measured cis effect (F1 hybrid expression ratio) is shown with error bars depicting
one standard error. Total divergence (parent expression ratio) was binned from 0-1, 1-2,
2-3, 3-4, 4-5, and 5+. Divergence due to cis effects increases with total divergence, suggesting large expression differences tend to be caused by cis rather than trans regulatory
differences.
caused by a large expression bias in a minority or even one of the F1 crosses. We defined
three levels of consistency: groups A, B and C for which 100%, 90% and 80% of F1s
showed the same directionality, respectively. Groups A, B, and C genes combined across
tissues contained 69, 1,036, and 2,314 genes respectively (Table 3.3). Thus, relatively
few of the 8,398 filtered CCT genes show a significant overall cis effect that is highly
consistent among 15 or more F1 hybrids.
3.4.3 Possible directional bias in cis evolution
Visual examination of Figure 3.2 shows a greater density of cis genes (black dots) with
positive log2 hybrid expression ratios than with negative ratios, suggesting cis evolution
during domestication more often favored alleles with increased expression in maize relative
to teosinte. Consistent with this visual observation, the numbers of CCT (ABC list)
genes with a positive (maize biased) vs. negative (teosinte biased) log2 hybrid expression
ratio are 947:598, 814:474 and 826:545 for ear, leaf and stem, respectively (Supplemental
Table C.5). All of these ratios are significantly different from a 50:50 unbiased expectation
(binomial test, p < 0.001). Additionally, a plot of the distribution of log2 hybrid expression
ratio for CCT genes shows a much greater density of genes with positive values (Figure 3.4)
for all three tissue types.
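Because the null hypothesis is an even 50:50 split, the exact two-sided binomial test has a simple closed form. The p-value is twice the probability of a split at least as lopsided in one direction, capped at 1. The sketch below evaluates it with exact integer arithmetic. Python divides the large integers directly, so counts such as 947:598 do not underflow. The function name is hypothetical, and the shortcut applies only to p = 0.5.

```python
from math import comb

def binomial_test_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of k 'successes' in n trials against p = 0.5."""
    tail = min(k, n - k)
    # Number of outcomes at least as extreme on one side; each has probability 2**-n.
    one_side = sum(comb(n, i) for i in range(tail + 1))
    # By symmetry the two tails are equal; the result is capped at 1 for even splits.
    return min(1.0, 2 * one_side / 2 ** n)

# Ear CCT-ABC genes: 947 maize-biased vs 598 teosinte-biased (Supplemental Table C.5)
print(binomial_test_half(947, 947 + 598) < 0.001)  # prints True
```

The same call with the leaf (814:474) and stem (826:545) counts can be used to check the other two tissues.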
The apparent bias in directionality of cis evolution could be the result of error in our
bioinformatics pipeline. One potential error is preferential alignment of maize RNAseq
reads due to overall greater sequence divergence of teosinte lines from the reference transcriptome (B73) in comparison to non-reference maize inbred lines. If such systematic
error exists, the observed bias in directionality of cis evolution would be expected to be
greatest for F1s involving the reference B73 (zero alignment bias of maize reads and high
bias for teosinte) and less extreme for crosses between teosinte and non-reference maize
lines (moderate bias for non-reference maize and high bias for teosinte).
Figure 3.4: Cis versus estimated trans regulatory effect for CCT-ABC genes in the ear, leaf, and stem. CCT genes have a directional bias, with more genes overall favoring the maize allele than the teosinte allele. Genes with consistent cis regulatory differences tend to favor the domesticated maize allele. This pattern exists in all three tissues. While we cannot rule out reference bias as the cause, this trend suggests there may be an overall directional bias in cis regulatory evolution during maize domestication.
To test this expectation, we calculated the number of CCT (ABC list) genes with
positive (maize biased) vs. negative (teosinte biased) log2 hybrid expression ratios separately for F1s involving B73 and non-B73 maize parents. For ear tissue, there are 569
teosinte-biased and 975 maize-biased genes for B73 F1s and 606 teosinte-biased and 939
maize-biased genes for non-B73 F1s. A Fisher’s Exact Test fails to reject the null hypothesis that these two ratios are equivalent (p = 0.18). There was also no evidence for
non-equivalent ratios with the other two tissue types (Supplemental Table C.6). Thus,
we see no evidence for significantly greater bias for maize alleles in crosses involving B73
versus the non-reference maize parents, supporting the argument that alignment bias introduced by use of pseudo-transcriptomes does not explain the excess of CCT genes with
the maize allele expressed higher than the teosinte allele.
3.4.4 Gene expression variation is greater in teosinte
Both the domestication/improvement bottleneck and selection during domestication are
expected to reduce variation in maize as compared to teosinte. We asked if these reductions in variation are apparent in our gene expression data. To quantify whether variation
in maize or in teosinte was the source of the variation in our expression ratios among F1 hybrids, we fit a linear model on a gene-by-gene basis where maize and teosinte inbred parent
were used as explanatory factors for the expression ratio. Among ∼13,000 genes included
in this analysis, the maize parent explains only 85% as much variation as the teosinte
parent (Supplemental Table C.7). This represents the general reduction in diversity of
maize as compared to teosinte, presumably a result of the domestication/improvement
bottleneck.
While the bottleneck should cause a reduction in expression variation in maize for all
genes, genes that were targets of selection for regulatory differences should have an even
greater reduction in expression variation. Consistent with this expectation, we observed
Figure 3.5: The ratio of average maize to teosinte R² from linear models explaining the F1 hybrid expression ratio by maize and teosinte parent. Error bars represent ± 1 standard error. In all three tissues, the ratio of maize to teosinte R² decreases in the candidate CCT gene lists, with the strongest candidates (CCT-A) showing the most extreme reduction.
a greater reduction in variation in maize as compared to teosinte for CCT genes than the
full set of ∼13,000 genes (Figure 3.5, Supplemental Table C.7). This greater reduction
likely reflects the combined effects of the bottleneck plus selection during domestication.
For the full ABC groups of CCT genes, maize explains 79% as much variation as teosinte; for the AB group, about 74%; and for the A group, about 52%. Thus,
among our strongest candidates (A group) for genes with cis regulatory difference between
maize and teosinte, the data indicate that maize explains only about half as much of the
cis regulatory variation as teosinte.
The reduction in gene expression variation in maize vs. teosinte is also seen in the
number of individual genes with significant effects due to the maize and/or teosinte parent
(Supplemental Table C.8). In terms of numbers of genes, there were 2.0- to 2.5-fold more genes for which only the teosinte parent effect was significant than genes for which only the maize parent effect was significant among AB list genes, and 5-fold more among the A list CCT genes.
3.4.5 Selection candidate genes are enriched for CCT genes
We compared our list of CCT genes to putative targets of selection during maize domestication and improvement [55]. There is significant enrichment for CCT genes among selection candidate genes for all three tissues (Table 3.4). The evidence for enrichment is strongest for the union of CCT genes from all three tissues. For example,
there are 134 CCT-AB genes among the selected genes, while 86.7 would be expected by
chance. Also, there were 10 CCT (A-list) genes from stem tissue among selected genes,
although only 2.16 are expected by chance, a nearly 5-fold enrichment.
XPCLR scores (cross population composite likelihood ratios) [89] quantify the degree of support for positive selection on a genomic region. We drew on a recent study
[55] looking at XPCLR score in 10 kilobase windows in maize on a genome-wide scale.
Figure 3.6: Density plots of ln(XPCLR) scores of conserved versus CCT-AB candidate genes. CCT genes have a significantly stronger signature of selection in the 10 kb window containing the transcriptional start site. The natural-log-transformed XPCLR scores of CCT-AB genes are consistently and significantly higher than those of genes identified as conserved in the initial analysis. The distributions of conserved and CCT-AB genes are significantly different by both the shape-sensitive Kolmogorov-Smirnov test (p = 1.0587e-11) and the difference-of-means t-test (p = 2.2119e-10).
Table 3.4: Fisher’s Exact Tests for the overlap between genes in domestication and improvement selection candidate genes and CCT genes from each of the three experimental tissues.
CCT Group   Overlap    Ear        Leaf        Stem        Union
A           Expected   3.42       1.41        2.16        5.6
            Observed   11         5           10          20
            p-value    3.52e-04   9.73e-03    1.89e-05    2.49e-07
AB          Expected   44.71      35.29       34.78       86.7
            Observed   70         57          60          134
            p-value    9.12e-05   1.79e-04    1.74e-05    1.13e-07
ABC         Expected   125.48     105.68      109.89      248.92
            Observed   174        135         139         317
            p-value    2.11e-06   1.289e-03   1.626e-03   3.54e-07
Comparison of the distributions of ln(XPCLR) scores at the transcriptional start site for CCT-AB genes and genes with conserved expression between maize and teosinte shows that CCT genes have a higher mean XPCLR than conserved genes (Figure 3.6). These two distributions are significantly different in terms of shape (Kolmogorov-Smirnov test, p = 1.06e-11) and overall mean (t-test, p = 2.21e-10).
A goal of this study was to explore the relative importance of cis versus trans regulatory divergence during maize domestication. To address this question, we looked at the
evidence for selection on genes with cis only effects in comparison to genes that had trans
only effects. Genes in the cis and trans only regulatory categories were filtered to only
include those that had consistent effects in the F1 hybrid contrasts. Consistent effect was
defined as 100%, 90%, and 80% of hybrid contrasts favoring the same directionality of
effect. By this definition, the cis only group is simply the cis only subset of CCT genes. For the trans only group in this analysis, the trans effect was estimated
from parent and hybrid expression ratios and a weighted percent of hybrid contrasts favoring maize or teosinte alleles was calculated. Fisher’s Exact Tests on 2x2 contingency
tables tabulating cis and trans genes with selection feature genes from Hufford et al.
[55] show cis only genes are significantly enriched (p-value < 0.05) for selection in 7 of
9 comparisons, while trans only genes are never enriched and are actually significantly
underrepresented among selected genes in two cases (Table 3.5).
3.4.6 Microarray and RNAseq data partially correspond
We assessed the degree of correspondence between our CCT genes and 612 differentially
expressed genes identified by a recent microarray study in maize [18]. We constructed
2x2 contingency tables for differentially expressed (DE) and non-differentially expressed
(NDE) genes from the two studies. A Fisher’s Exact Test shows a highly significant degree
of correspondence between the two studies for all three tissue types (Table 3.6). Using our
Table 3.5: Fisher’s Exact Tests for enrichment/depletion of cis and trans only genes in selection features.
Group      Regulatory Category   Tissue   Observed   Expected   p-value
A List     Cis only              Ear      5          1.998      0.043
                                 Leaf     3          0.751      0.032
                                 Stem     3          1.316      0.138
           Trans only            Ear      4          5.327      0.818
                                 Leaf     3          2.346      0.506
                                 Stem     1          0.282      0.256
AB List    Cis only              Ear      36         24.449     0.018
                                 Leaf     24         13.516     0.006
                                 Stem     32         19.647     0.006
           Trans only            Ear      28         41.954     0.020
                                 Leaf     34         38.388     0.490
                                 Stem     16         12.032     0.222
ABC List   Cis only              Ear      95         70.113     0.002
                                 Leaf     54         45.427     0.175
                                 Stem     84         65.615     0.016
           Trans only            Ear      78         97.036     0.033
                                 Leaf     91         101.461    0.273
                                 Stem     42         43.148     0.935
CCT-AB list, ∼25 genes are identified as DE in both studies, while about 7 are expected by chance. However, the absolute level of correspondence between the two studies is rather low. For example, of the 328 leaf genes identified as DE by RNAseq, only 25 (7.6%) were also identified by the microarray study (Supplemental Table C.9). Thus, while the overlap between our two studies is statistically significant, the two methodologies resulted in largely different lists of DE genes.
The largely different lists of DE genes identified by microarray and RNAseq analysis could be due in part to the fact that the microarray analysis includes genes with trans and cis x trans differences. To assess the proportion of the 612 genes that have trans versus cis effects, we examined the regulatory categories of the ∼250 differentially expressed genes (241, 261, and 259 for ear, leaf, and stem) for which there are both microarray and RNAseq data (Supplemental Table C.10). About 20% of these genes are classified as trans only or cis x trans by RNAseq, while 55% are classified as either cis only or cis + trans. The remainder (25%) are classified as conserved, ambiguous or compensatory. These results suggest that the very different lists of DE genes from the two technologies are to a large degree due to differences in tissue, germplasm, environment, sampling error, or technical error, and that the inclusion/exclusion of trans and cis x trans genes by the two studies does not explain all of the difference.
3.4.7 CCT genes are unrelated to differentially methylated regions
In a recent study, Eichten et al. [93] identified differentially methylated regions (DMRs) in maize and teosinte. We compiled the nearest upstream and downstream genes of each DMR, which gave a list of 332 genes. Of these genes, we have RNAseq data for 115, 116, and 121 in the ear, leaf, and stem tissues, respectively, and 19, 14, and 17 of them were on the CCT-ABC gene lists (Supplemental Table C.11). We asked if
Table 3.6: Fisher’s Exact Tests for the overlap between differentially expressed genes from the microarray study and CCT genes from each of the three experimental tissues in our work.
CCT Group   Overlap    Ear        Leaf       Stem       Union
A           Expected   0.556      0.274      0.359      1.040
            Observed   4          3          2          8
            p-value    2.14e-03   2.28e-03   4.92e-02   7.83e-06
AB          Expected   7.501      6.409      6.248      15.778
            Observed   23         25         25         48
            p-value    1.56e-06   4.84e-09   2.91e-09   1.61e-12
ABC         Expected   21.774     19.363     20.579     46.069
            Observed   52         48         46         90
            p-value    9.58e-10   1.69e-09   1.05e-07   6.34e-12
CCT-ABC list genes are over-represented among the DMR-associated genes as compared to random expectation, and found that they are not (Fisher’s Exact Test, p = 0.1092, p = 0.4309, p = 0.1755; ear, leaf, and stem). Finally, the methylation status of the DMR does not correspond to the differential expression of maize vs. teosinte alleles at CCT-ABC list genes. Rather than the more methylated allele being expressed at a lower level, the data show that ∼50% of the time the methylated allele is expressed higher and ∼50% of the time it is expressed lower (Supplemental Table C.12).
3.4.8 Dominant and additive gene expression inheritance
The dominance/additivity (D/A) ratio was calculated for genes that were assessed in
at least 15 crosses with three unique maize and five unique teosinte inbred lines. The
overall average of gene D/A ratios was close to zero in all three tissues (Supplemental
Table C.13), suggesting there is not an extreme overall trend for dominance of nondomesticated teosinte alleles over domesticated maize alleles. Tissues with active developmental programs, immature ear and seedling stem, are quite close to a 1:1 ratio of genes
with a positive D/A ratio to genes with negative D/A ratio (1.084 and 0.982 for ear and
stem, respectively). In contrast the leaf tissue has substantially more genes with a negative D/A value (1.287 ratio of positive to negative D/A ratios), indicating a higher rate of
domesticated maize allele dominance in the leaf tissue. Of the three experimental tissues,
two (Ear and Leaf) have an overall mean significantly different from zero (z-test, p <
0.05) and significantly more negative D/A ratios (binomial test, p < 0.05) than positive,
suggesting teosinte allele dominance (Supplemental Table C.13).
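The D/A ratios, the counts of positive and negative ratios, and the two tests named above can be sketched as follows. The sign convention (a = (maize - teosinte)/2 and d = F1 - midparent, so positive values mean maize allele dominance and negative values teosinte allele dominance) is our assumption, chosen to match the reading above; the expression values and function names are hypothetical.

```python
import math
import statistics
from math import comb


def da_ratio(maize_parent, teosinte_parent, f1):
    """Dominance/additivity ratio d/a for one gene (assumed sign convention).

    +1: F1 equals the maize parent (maize allele dominant)
    -1: F1 equals the teosinte parent (teosinte allele dominant)
     0: F1 equals the midparent (purely additive)
    """
    a = (maize_parent - teosinte_parent) / 2
    if a == 0:
        raise ValueError("parents do not differ, so D/A is undefined")
    d = f1 - (maize_parent + teosinte_parent) / 2
    return d / a


def sign_test(n_positive, n_negative):
    """Exact two-sided binomial test of positive vs negative counts against 1:1."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2**n)


def z_test_mean_zero(values):
    """Two-sided z-test (normal approximation) that the mean of values is zero."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.mean(values) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical mean expression values: (maize parent, teosinte parent, F1).
genes = [(10.0, 2.0, 9.0), (4.0, 8.0, 7.5), (6.0, 3.0, 4.0), (5.0, 1.0, 2.0)]
ratios = [da_ratio(m, t, f) for m, t, f in genes]
n_pos = sum(r > 0 for r in ratios)
n_neg = sum(r < 0 for r in ratios)
print(ratios, n_pos, n_neg, sign_test(n_pos, n_neg), z_test_mean_zero(ratios))
```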
The average D/A ratios of the seven regulatory categories and three CCT gene lists are also fairly close to zero. Even for the smallest lists (CCT-A, 21 to 43 genes), the average D/A ratio was always less than the value of one expected under full dominance. Density plots of D/A ratios grouped by the seven regulatory categories do not show an obvious shift in distribution (Supplemental Figure C.3). Thus, there is evidence for a weak overall tendency toward dominance of non-domesticated expression levels in the ear and leaf tissues, with no evidence that this teosinte dominance is linked to a specific regulatory category or candidate CCT gene list.
We compared the proportions of genes showing dominant versus additive gene action in the cis only and trans only regulatory classes. Our trans only genes will show dominant gene action when there are haplosufficient loss-of-function (LOF) alleles at their trans regulators. In contrast, the effects of cis regulatory elements are expected to be purely additive in the absence of transvection or a similar mechanism [94]. When one of our cis only genes is classified as having dominant gene action, this may instead indicate a classification error caused by trans effects on its expression that fell below the level of statistical detection. Consistent with the expectation that dominance is more likely for trans only genes, the proportion of genes classified as dominant is higher for trans only genes in all three tissue types (Figure 3.7, Supplemental Table C.14).
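Why cis only differences should be additive while trans only differences can be dominant can be illustrated with a toy model. In this hypothetical sketch, per-copy expression levels are arbitrary, the trans regulator is assumed to be haplosufficient, and the D/A sign convention is the assumed one (positive = maize allele dominant, negative = teosinte allele dominant).

```python
def da_ratio(maize, teosinte, f1):
    """d/a with a = (maize - teosinte)/2 and d = F1 - midparent (assumed convention)."""
    return (f1 - (maize + teosinte) / 2) / ((maize - teosinte) / 2)


def cis_only(per_copy_maize, per_copy_teosinte):
    """Cis-only difference: each allele's own CRE fixes its output regardless of
    the other allele, so the F1 is the sum of one maize and one teosinte copy."""
    maize = 2 * per_copy_maize
    teosinte = 2 * per_copy_teosinte
    f1 = per_copy_maize + per_copy_teosinte
    return maize, teosinte, f1


def trans_only_haplosufficient(low, high, functional_in_maize=False):
    """Trans-only difference: identical target alleles, but a regulator carries a
    loss-of-function allele in one parent. One functional regulator copy is
    assumed sufficient (haplosufficient) to switch every target copy to `high`."""
    def per_copy(functional_regulator_copies):
        return high if functional_regulator_copies >= 1 else low

    maize_copies, teosinte_copies = (2, 0) if functional_in_maize else (0, 2)
    maize = 2 * per_copy(maize_copies)
    teosinte = 2 * per_copy(teosinte_copies)
    f1 = 2 * per_copy(1)  # the F1 carries one regulator copy from each parent
    return maize, teosinte, f1


print(da_ratio(*cis_only(5.0, 1.0)))                          # additive
print(da_ratio(*trans_only_haplosufficient(1.0, 5.0)))        # teosinte dominant
print(da_ratio(*trans_only_haplosufficient(1.0, 5.0, True)))  # maize dominant
```

The cis case gives 0.0 (additive); a loss-of-function maize regulator gives -1.0 (teosinte dominant, the pattern predicted by the recessive LOF model discussed next); swapping which parent carries the functional regulator gives 1.0.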
It has been proposed that the allelic variants responsible for evolution during domestication are primarily recessive LOF alleles [82]. Under this model, a non-domesticated allele would be dominant to the recessive LOF domesticated allele. Among our cis only genes with dominant gene action, the frequency with which the maize rather than the teosinte allele is dominant does not differ from the 50:50 expectation (Figure 3.7, Supplemental Table C.14). Among our trans only genes with dominant gene action, the maize allele is dominant to the teosinte allele more often than expected by chance. These results run counter to the proposal that domestication favored recessive LOF alleles.
Figure 3.7: The proportion of genes showing dominant (red) versus additive (blue) gene action for cis only and trans only AB lists. For all tissues, trans only genes have a higher rate of dominance; however, this difference is significant only for the ear and leaf tissues (Fisher’s exact test, p < 0.005, indicated by “*”). The proportion of genes in the trans only lists that are dominant for the teosinte allele (green) and the maize allele (yellow) is shown in the barplot to the right of each pie graph. There is significant deviation from the neutral expectation (1:1) for the ear and leaf tissues (binomial test, p < 0.005, indicated by “*”).

3.4.9 Candidate genes enriched in various functional categories
We examined our list of CCT genes for enrichment of several functional classes of maize genes, including transcription factors, genes in known metabolic pathways, genes underlying QTL, and gene ontology (GO) groups. First, a list of maize transcription factors and their corresponding families was compiled from the transcription factor database [86]. Although CCT genes (AB list) were slightly enriched for several transcription factor families (ARF, MADS-MIKC, and LBD) by Fisher’s Exact Tests, these results do not survive Bonferroni correction for multiple testing (Supplemental Table C.15). We conclude that there is no compelling evidence that CCT genes are enriched for transcription factors.
Our list of CCT (AB list) genes was also compared with results from a recent QTL mapping experiment for a number of domestication and improvement traits [25]. We compared observed vs. expected overlap between CCT genes from the three tissues and the genes located within 1.5 LOD QTL support intervals for 16 traits. Testing was done on a trait-by-trait basis and restricted to 1.5 LOD QTL intervals containing 20 or fewer genes. After Bonferroni correction for multiple testing, no significant enrichment of CCT-AB genes in domestication QTL was observed (Supplemental Table C.16). The greatest enrichment was for ear diameter: four CCT genes assayed in ear tissue fell within the QTL interval, whereas only 1.22 were expected by chance (Fisher’s Exact Test, p = 0.03).
We tested for enrichment of CCT and trans only genes in 15 metabolic pathways defined in the Kyoto Encyclopedia of Genes and Genomes (KEGG) using Fisher’s Exact Tests on 2×2 contingency tables. There was no compelling evidence for enrichment or depletion of either group of genes in any of the 15 pathways tested (Supplemental Table C.17). The smallest p-value was for the cutin, suberine, and wax biosynthesis pathway in leaf tissue for trans only genes (p = 0.012); however, this result does not remain significant after Bonferroni correction.
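The pathway tests above combine two-sided Fisher’s Exact Tests on 2×2 tables with Bonferroni correction. A self-contained sketch of both steps follows; the table orientation, function names, and the 15-test example are illustrative assumptions (the number of tests actually corrected for in Supplemental Table C.17 may differ).

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    For example, rows: on gene list / not on list; columns: in pathway / not in
    pathway. The p-value sums the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    probs = {x: prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)}
    p_obs = probs[a]
    # Small relative tolerance so ties in probability are not lost to rounding.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(fisher_exact_2x2(8, 40, 120, 9000))    # hypothetical counts
print(bonferroni([0.012] + [0.5] * 14)[0])   # one of 15 hypothetical tests
```

If p = 0.012 were one of 15 tests, its Bonferroni-adjusted value would be 0.18, well above 0.05. SciPy’s scipy.stats.fisher_exact provides an equivalent two-sided test if a dependency is acceptable.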
We tested for GO term enrichment and depletion in the CCT and trans only gene lists. These analyses found significant associations for five GO terms in the leaf CCT-ABC gene list: enrichment for chloroplast, plastid, thylakoid, and chloroplast thylakoid membrane, and depletion for DNA binding (Supplemental Table C.18). For trans only genes, we detected significant enrichment for transcription factor and photosynthesis related GO terms in the ear tissue, and additional enrichment for ribosomal GO terms in the leaf tissue (Supplemental Table C.18).
3.5 Discussion
3.5.1 Regulatory change between and within maize and teosinte
Of the ∼17,000 genes assayed, 70% have significant cis and/or trans regulatory differences, suggesting that considerable regulatory change has occurred during maize domestication and subsequent crop improvement. A similar proportion of genes with cis and/or trans differences was found in recent studies comparing two species of Drosophila [21] and of yeast [73]. This high level of variation between maize and teosinte is not surprising given the extensive diversity of maize. Even the simple presence or absence of gene expression is quite variable within maize itself: a recent study found that 27.9% of genes were expressed in only a subset of maize inbred lines [95]. That study also found over a thousand novel genes not present in the B73 reference genome, suggesting that considerable presence/absence variation (PAV) also exists within maize. This finding is consistent with another study of PAV and copy number variation (CNV), which identified hundreds of CNVs and thousands of PAVs that included at least 180 single copy genes [96]. These CNVs and PAVs are accompanied by millions of additional SNPs both within and between genes [97]. In light of the known diversity within maize, it is not particularly surprising to see evidence for prolific cis and trans regulatory variation in gene expression between maize and teosinte.
Gene expression differences between populations account for only some of the variation seen in the dataset. There is also a large amount of variation within the maize and teosinte populations. Considering only cis differences detected in F1 hybrids, upwards of 60% of genes show evidence of multiple maize or teosinte expression levels, and consequently of multiple alleles within a population. Furthermore, our study shows a drop in expression variation in maize, consistent with the reduction in overall diversity caused by the domestication/improvement bottleneck, and an even greater reduction in expression variation for genes thought to be under additional artificial selection (CCT candidate genes) [55, 98]. The high level of expression variation still present in teosinte represents an unexplored source of diversity for maize, which may be useful for future crop improvement and plant breeding efforts.
This study sheds light on the large amount of expression variation within and between maize and teosinte. However, only a small fraction of this diversity results in consistent expression differences that distinguish maize and teosinte inbred lines. The relatively small number of genes in this study showing consistent expression differences between maize and teosinte (∼1000 of 17,000, ∼6%) is similar to the fraction of genes seen in another recent study by Swanson-Wagner et al. [18]. Thus, this study reveals an immense amount of regulatory diversity within and between maize and teosinte, while also showing that only a small fraction of this diversity appears to be fixed for discrete expression patterns that distinguish maize and teosinte populations.
3.5.2 What is the frequency of cis and trans regulatory change?
Our study shows that cis and trans regulatory differences occur at similar frequencies, and these frequencies are fairly consistent across the three experimental tissues and with work in Drosophila. However, this is only part of the story, since we also show that cis effects are arguably more important for generating large expression divergence between maize and teosinte (Figure 3.3). Our observation that cis effects account for the majority of large expression differences was also reported in a recent Drosophila study by McManus et al. [21].
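The partition of parental divergence into cis and trans components, grouped by divergence as in Supplemental Table C.4, can be sketched as follows. The sketch uses a formulation common in this literature: the F1 allelic log2 ratio is the cis effect, the remainder of the parental log2 ratio is the trans effect, and the cis fraction is |cis| / (|cis| + |trans|). Whether this exactly matches the computation behind Table C.4, and the half-open bin edges, are assumptions; the input values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def cis_fraction(parent_log2, hybrid_log2):
    """Fraction of parental divergence attributed to cis.

    parent_log2: log2(maize parent / teosinte parent) expression.
    hybrid_log2: log2(maize allele / teosinte allele) within the F1 (cis effect).
    The trans effect is the remainder, parent_log2 - hybrid_log2.
    """
    cis = hybrid_log2
    trans = parent_log2 - hybrid_log2
    denom = abs(cis) + abs(trans)
    return abs(cis) / denom if denom > 0 else None


def summarize_by_divergence(genes):
    """Group genes by |parent log2 ratio| into 0 to 1, ..., 4 to 5 and 5+ (assumed
    half-open bins); report (N, mean cis fraction, standard error) per group."""
    groups = defaultdict(list)
    for parent_log2, hybrid_log2 in genes:
        frac = cis_fraction(parent_log2, hybrid_log2)
        if frac is None:
            continue  # no divergence at all: the partition is undefined
        k = min(int(abs(parent_log2)), 5)
        label = "5+" if k == 5 else f"{k} to {k + 1}"
        groups[label].append(frac)
    summary = {}
    for label, values in groups.items():
        se = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else math.nan
        summary[label] = (len(values), statistics.mean(values), se)
    return summary


# Hypothetical (parent log2, hybrid log2) pairs.
example = [(0.4, 0.3), (-0.8, -0.2), (1.5, 0.5), (6.2, 5.9), (-5.5, -4.0)]
for label, (n, mean, se) in sorted(summarize_by_divergence(example).items()):
    print(label, n, round(mean, 3), se)
```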
In a recent study, Swanson-Wagner et al. [18] used microarrays to assess expression in a number of maize and teosinte inbred lines, many in common with our RNAseq based study. They found relatively few genes (612 of ∼18,000) with differential expression between maize and teosinte. Among the microarray differentially expressed genes that were also assayed in our RNAseq study, all seven regulatory categories were represented, with approximately 25% classified as cis only, 10% as trans only, and 25% as cis plus trans. While only ∼50% of the microarray differentially expressed genes were classified as cis only or cis plus trans in our study (potential CCT candidate genes), the overall low correlation between our RNAseq and the Swanson-Wagner microarray experiment makes direct comparison difficult. Comparisons made between two parental samples identify genes with cumulative cis plus trans regulatory differences; consistent with this expectation, cis only, cis plus trans, and trans only were the three most frequent regulatory categories assigned to differentially expressed microarray genes.
A prominent hypothesis in evolutionary biology is that mutation in the CREs of genes encoding functionally conserved proteins is the primary mechanism by which morphological evolution occurs [12]. Under this hypothesis, mutations in the CREs of highly pleiotropic “master regulator” genes, and the resulting downstream effects, contribute substantially to overall morphological change, which, if true, predicts large-scale rearrangement of gene expression networks through trans effects. Although trans effects occur at a high frequency in this study, they are accompanied by an equal number of cis regulatory differences that drive larger expression differences. Thus, we believe the changes to gene regulation during maize domestication are best interpreted as frequent “shaving” of expression by cis regulatory change to fine-tune individual pathway elements, in addition to broader adjustments to whole pathways through trans regulatory differences.
3.5.3 Tissue specific expression of CCT candidates
We compared the candidate genes identified in the three tissues. There was significantly more overlap between the candidate genes from the three experimental tissues than expected by random chance (permutation tests, p < 1e-5, Figure 3.1). This suggests a high degree of shared cis regulatory effects between tissues. The functioning of CREs in multiple tissues is also supported by the high correlation between the direction and magnitude of cis effects in different tissues (adjusted R² ≈ 80%, Pearson correlation ≈ 90%). These results suggest that many CREs function in multiple tissues to drive expression of a gene.
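The permutation tests for cross-tissue overlap (Figure 3.1) can be sketched as below. The sketch assumes a single shared background of assayed genes and draws random lists of the observed sizes; the function and parameter names are hypothetical. With the add-one estimator used here, reporting p < 1e-5 requires at least 100,000 permutations.

```python
import random


def permutation_overlap_test(universe, list_a, list_b, n_perm=10_000, seed=0):
    """Permutation test for whether two candidate lists share more genes than
    expected when lists of the same sizes are drawn at random from `universe`.

    Returns (observed_overlap, p_value); the add-one correction keeps the
    estimated p-value from ever being exactly zero.
    """
    rng = random.Random(seed)
    universe = list(universe)
    observed = len(set(list_a) & set(list_b))
    at_least = 0
    for _ in range(n_perm):
        a = set(rng.sample(universe, len(list_a)))
        b = rng.sample(universe, len(list_b))
        if sum(g in a for g in b) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical gene IDs: 2,000 assayed genes, two overlapping candidate lists.
genes = [f"gene{i}" for i in range(2000)]
ear = genes[:120]
leaf = genes[80:180]
print(permutation_overlap_test(genes, ear, leaf, n_perm=2000))
```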
While there is evidence for significant overlap of CCT genes between tissues, a very high proportion of all CCT genes (∼70%) are found in only a single tissue. The lowest overlap between tissues for the CCT-AB list (52 genes) was between the ear and leaf tissues, arguably the two most developmentally different tissues studied. This trend is seen in candidate genes as well as when considering all assayed genes. There have been relatively few genome-wide studies using F1 hybrids to dissect cis and trans effects and even fewer that consider multiple tissues [69, 72], but our results are consistent with these previous studies, in which ∼70% of identified genes were detected in only a single tissue. Overall, many CCT genes are shared between tissues, but the majority are tissue specific, suggesting that modification of both globally active and tissue specific CREs occurred during maize domestication.
Even though gene expression is highly correlated between tissues, the larger number of CCT genes in the ear tissue (555) than in leaf (445) and stem (431) provides evidence for roughly 25 to 30% more functional, consistent cis regulatory changes in the ear (555/445 ≈ 1.25; 555/431 ≈ 1.29). A similar imbalance in the number of differentially expressed genes across tissues was observed in a recent Arabidopsis study [72], where the three studied tissues had an approximately 80% difference in the number of differentially expressed genes. The maize and teosinte ears differ greatly in size, placement of spikelets, glumes, and the presence or absence of a fruit case. These morphological differences may be due in part to frequent tissue specific cis regulatory differences. This observation is again at odds with the view that large morphological change in evolution/domestication is caused by mutation of CREs in a few “master regulator” genes [12]. Instead, these data again point to many individual gene expression changes, achieved by “shaving” allele specific expression through modification of multiple tissue specific CREs.
3.5.4 Bias toward increased maize expression?
In the F1 hybrid analysis, ∼55% of genes have higher expression of the maize allele than the teosinte allele. Higher expression of the maize allele also occurs in the comparison between parent inbred lines, except in leaf, where equal numbers of genes favor the maize and teosinte alleles. The same trend of upregulated maize expression extends to the CCT gene lists, where ∼60% of genes favor the maize allele. Our observation of higher expression for one of the parents (maize) is also consistent with several previous studies in multiple organisms, including maize [18], cotton [17], Arabidopsis [72], Cirsium [68], and fruit fly [21]. Our experimental method, which uses parent-derived pseudo-transcriptomes and perfect alignment to segregating sites, should reduce alignment bias, but we cannot be sure that it has been fully eliminated. While potential alignment bias prevents firm conclusions, genes that are consistent across all maize and teosinte inbreds are less likely to be artifacts, suggesting that the overall bias toward maize alleles seen in candidate genes is a real phenomenon.
3.5.5 Selection candidates enriched for cis regulatory change
Changes in gene expression, specifically through altered CREs, are not uncommon in the history of domesticated crops. Such changes have led to increased fruit size in tomato [16], maize apical dominance [3, 40, 99], loss of prolificacy in maize [4], and changes in rice yield and flowering time [57, 58]. These examples represent cases where large, sometimes pleiotropic, phenotypic changes are caused by single genes. There is no disputing the important role of these types of genetic changes in creating some of the world’s most productive crops. However, this study sheds light on the hundreds of other genes with CRE-driven differential expression between maize and teosinte.
These hundreds of genes with regulatory differences between maize and teosinte are enriched for selection features [55] and show stronger signals of selection upstream of and within the gene than conserved genes do. Enrichment for positively selected genes is restricted to genes with CRE differences, since genes with trans only regulatory change are never enriched for selected genes. Not all genes with consistent CRE differences between populations are likely to play large, equal, or even critical roles in the domestication of maize. Corroborating evidence such as selection scans can provide the information needed to identify the truly important players in the domestication process, even if discovering the function of every one of these genes in domestication is likely an impossible task.
Data from other sources, such as selection scans, can also help evaluate candidates; one example is assessing the importance of cis effect magnitude. A number of genes in this study show large shifts in expression between maize and teosinte (log2(M:T) > 10, i.e., more than a 1000-fold difference); however, the magnitude of the cis effect is not correlated with the strength of selection, suggesting that magnitude of effect is not particularly important. In retrospect, this is not surprising, considering that subtle changes in gene expression are known to cause drastic phenotypic differences. New tissue specific shifts in gt1 expression largely led to the elimination of secondary ears in maize [4], and a relatively moderate 2-fold change in tb1 expression leads to greatly increased apical dominance [3, 40]. In light of this result, selection on CREs during maize domestication may be best characterized as subtle fine-tuning of expression patterns to generate phenotypic change.
3.5.6 Leaf tissue candidates are enriched for photosynthesis and chloroplast GO terms
A number of gene ontology terms implicated in photosynthesis and carbon fixation were found to be enriched in the leaf CCT-ABC list. Mapping these genes back to photosynthesis and carbon fixation pathways identified two components of photosystem I as well as part of the ATP synthase (delta subunit). Additionally, a number of enzymes involved in carbon fixation were found to be up- or downregulated in maize through cis regulatory means. Most of these enzymes are involved in reactions converting malate to other substrates in carbon fixation.
Cytosolic and mitochondrial forms of malate dehydrogenase (mdh) were two of the identified differentially expressed genes. The mitochondrial form, mdh2, is expressed at a higher level in teosinte, whereas the cytosolic form, mdh4, is expressed at a higher level in maize. These expression differences suggest that malate-oxaloacetate flux between the mitochondria and the cytoplasm changed during maize domestication. Movement of oxaloacetate (OA) has important implications for energy metabolism and photorespiration [100, 101]. The changes in expression suggest that there may be less interconversion of OA and malate within the mitochondrial matrix, leading to reduced malate in the mitochondria and reduced transport of OA into the mitochondria. In theory, this would leave more OA in the cytoplasm, where it would be available for conversion to malate and transport to bundle sheath cells for photosynthesis. This could lead to improved rates of photosynthesis in maize. However, these results should be treated with caution, since the malate dehydrogenase enzymes identified are on a secondary candidate gene list and are not considered to be our best candidates.
3.5.7 Do crop domestication genes show cis differences?
Domestication is characterized by a number of common phenotypes, including gigantism, loss of prolificacy, loss of shattering, changes to pollination mechanisms, apical dominance, and branching, that are collectively considered the domestication syndrome [10, 11]. While the domestication syndrome is characterized by several common phenotypes, the genetic modifications that lead to these traits may or may not involve changes in homologous genes. Genes such as waxy [102–104], tb1 [3, 105], and ghd7 [50, 57] were targets of selection in multiple crop species; however, there are many more unique genes controlling domestication traits [106–109]. To get a sense of the regulatory status of several crop domestication genes in maize, we generated a list of 28 domestication genes (6 maize and 22 non-maize) and identified the closest homologous gene in maize by protein-to-protein BLAST (Table 3.7). Of these 28 genes, only maize sugary1, an isoamylase starch debranching enzyme, was on the CCT-B gene list (in the ear). Furthermore, only two of the remaining genes were on the C list. The inability to identify cis regulatory changes in the maize homologs of non-maize domestication genes suggests that cis regulatory change in a domestication context may tend to act on different genes in different organisms, rather than on a single gene with conserved functions in multiple species.
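Picking the closest maize homolog from a protein-to-protein BLAST search can be sketched as below. This assumes the domestication proteins were searched with blastp against maize protein models and the results saved in the default 12-column BLAST+ tabular format (-outfmt 6). Taking the highest bit score as the “closest homolog” is our assumption, since the selection criterion is not stated here, and the identifiers in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text):
    """Keep the highest-bitscore subject for each query.

    Returns {query_id: (subject_id, bitscore, evalue)}. Comment lines starting
    with '#' (as written by -outfmt 7) and blank lines are skipped.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        bits, evalue = float(rec["bitscore"]), float(rec["evalue"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits, evalue)
    return best


# Hypothetical BLAST rows for one query protein and two maize models.
example = (
    "query1\tmaize_model_A\t55.0\t300\t120\t5\t1\t300\t10\t310\t1e-50\t180.5\n"
    "query1\tmaize_model_B\t70.2\t310\t90\t3\t1\t310\t5\t315\t1e-90\t320.0\n"
)
print(best_hits(example))
```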
3.5.8 A catalog of genes with cis regulatory variation
A product of this study, similar to selection scans, is a list of candidates for future investigation. The complete set of 25,000 genes (with information on RNAseq read counts, parent and F1 expression ratios, regulatory classification, and other summary information) will be a valuable tool for investigators screening for new genes of interest and answering preliminary questions about the expression of specific genes.

Table 3.7: Regulatory category of the closest maize homolog of 6 maize and 22 non-maize domestication loci.
Locus Name
tga1
ZmYAB2.1
Sh2
Su1
gt1
tb1
waxy
Nud
qFT10-4
BoCAL
PsELF3
DTH2
GS6
GS5
qSH1
shat1
Bh4
TAC1
GW2
Ehd1
BADH2
OsSPL16
qPE9-1
Sh1
Tannin1
FAS
Q
Vrn1
Organism
Maize
Maize
Maize
Maize
Maize
Maize
Amaranths
Barley
Brassica
Brassica
Pea
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Rice
Sorghum
Sorghum
Tomato
Wheat
Wheat
Coding
Expression
Expression
Coding
Expression
Expression
Coding
Deletion
Expression
Coding
Coding
Unclear
Coding
Expression
Expression
Coding
Coding
Expression
Coding
Coding
Coding
Expression
Loss of Function
expression
Coding
Expression
Coding and expression
Expression
Functional Change
trans only
cis + trans
conserved
cis only
trans only
cis + trans
cis + trans
trans only
cis only
cis x trans
trans only
ambiguous
conserved
cis only
comp.
cis x trans
comp.
cis x trans
cis x trans
cis only
comp.
cis + trans
conserved
cis + trans
cis x trans
cis only
D
B
D
D
D
D
C
D
D
D
D
D
D
D
D
D
D
D
-
trans only
comp.
cis x trans
trans only
cis x trans
trans only
conserved
conserved
ambiguous
conserved
ambiguous
conserved
cis only
cis only
cis + trans
trans only
trans only
conserved
trans only
-
Reg. Cat.
Reg. Cat.
CCT
Leaf
Ear
D
D
D
D
D
D
D
D
D
C
D
D
D
D
D
-
CCT
comp.
cis x trans
ambiguous
conserved
trans only
trans only
cis + trans
conserved
conserved
trans only
trans only
cis only
cis only
trans only
cis only
trans only
conserved
-
Reg. Cat.
Stem
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
-
CCT
For example, one attractive CCT candidate gene is barren stalk1 (ba1), a known maize gene whose single gene mutant has defects in branch formation in both the whole plant and the tassel [110]. The wild type function of ba1 is inferred to be in branch initiation. In our study, ba1 was one of our strongest candidates, with all assayed crosses showing higher expression of the maize allele in the ear. The overall shift in expression was substantial (∼4-fold), and this shift is caused by cis regulatory differences alone. ba1 was also found to be under selection during maize domestication in two independent studies [55, 110]. These combined observations suggest that there was selection for a CRE that drives the upregulation of ba1 in the ear, perhaps resulting in a greater number of rows (branches) of kernels in the maize ear compared with the teosinte ear. Compelling evidence for this hypothesis could be obtained by fine-mapping and identifying the hypothesized CRE and demonstrating with expression assays that the maize and teosinte alleles of the CRE have the predicted effects on gene expression during ear development and on phenotype (kernel row number) in the adult ear. ba1 illustrates the power of genomic scans to identify strong candidates for future study that can inform us about the fine details of evolution under domestication.
Appendices

Appendix A
Supplemental Content: Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL

A.1 Figures
Figure A.1: Histograms of the least squares means for phenotyped traits from the QTL mapping population. Several of these distributions are approximately normal, but others are closer to an exponential distribution. The average least squares mean for NIRILs with 100% maize or 100% teosinte genotypes is indicated by an arrow labeled “M” for maize or “T” for teosinte.
Figure A.2: Example histograms of simulated traits for several combinations of the number of causative loci, effect size distribution, and heritability. Columns, from left to right, show traits with equal effects and H² = 67%, equal effects and H² = 90%, gamma distributed effects and H² = 67%, and gamma distributed effects and H² = 90%. Rows, from top to bottom, show simulated traits with one, five, ten, twenty, fifty, seventy-five, and one hundred causative loci. The average simulated phenotype values for NIRILs that are 100% maize and 100% teosinte are indicated by arrows labeled “M” for maize and “T” for teosinte.
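The simulated traits in Figure A.2 combine a number of causative loci, an effect size distribution (equal or gamma distributed), and a target heritability (H²). A minimal sketch of such a simulation follows. The 0/1 genotype coding with equal allele frequencies, the gamma shape and scale, and the function name are illustrative assumptions rather than the settings used for the figure.

```python
import random
import statistics


def simulate_trait(n_lines, n_loci, h2, gamma_effects, seed=0):
    """Hypothetical sketch: simulate one quantitative trait for n_lines lines.

    Each line gets a 0/1 genotype (1 = maize allele) at n_loci causative loci.
    Effects are all equal (1.0) or drawn from an assumed gamma distribution.
    Environmental noise is scaled so that Vg / (Vg + Ve) equals h2 in expectation.
    Returns (genetic_values, phenotypes).
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    if gamma_effects:
        # Shape 0.5 and scale 1.0 are illustrative choices, not the original settings.
        effects = [rng.gammavariate(0.5, 1.0) for _ in range(n_loci)]
    else:
        effects = [1.0] * n_loci
    genotypes = [[rng.randint(0, 1) for _ in range(n_loci)] for _ in range(n_lines)]
    genetic = [sum(e * g for e, g in zip(effects, geno)) for geno in genotypes]
    vg = statistics.pvariance(genetic)
    if vg == 0:
        raise ValueError("no genetic variance among simulated lines")
    sd_env = (vg * (1.0 - h2) / h2) ** 0.5  # Ve = Vg (1 - h2) / h2
    phenotypes = [g + rng.gauss(0.0, sd_env) for g in genetic]
    return genetic, phenotypes


genetic, pheno = simulate_trait(2000, 10, 0.67, gamma_effects=True, seed=1)
print(statistics.pvariance(genetic) / statistics.pvariance(pheno))  # near 0.67
```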
Figure A.3: Proportion of detected QTL with zero, one, or multiple causative genes in the 1.5 LOD support interval. As in the equal effect size simulations, a high number of gamma distributed causative genes leads to detected QTL with multiple causative factors. A reasonable percentage of detected QTL contain a single causative gene when few (fewer than 4) causative genes are simulated, but as the number of simulated causative genes increases, we quickly lose the power to distinguish between closely linked causative genes, and they become lumped into single detected QTL.
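The classification in Figure A.3 amounts to counting how many simulated causative loci fall inside each detected QTL’s support interval. A small sketch follows; the function name and the positions in the example are hypothetical, and positions may be in any consistent unit.

```python
from collections import Counter


def classify_qtl(support_intervals, causal_positions):
    """Classify detected QTL by how many simulated causative loci fall inside
    their support interval.

    support_intervals: list of (start, end) positions, e.g. 1.5 LOD intervals.
    causal_positions:  positions of the simulated causative loci (same units).
    Returns the proportion of detected QTL with zero, one, or multiple loci.
    """
    tally = Counter()
    for start, end in support_intervals:
        n_inside = sum(start <= pos <= end for pos in causal_positions)
        if n_inside == 0:
            tally["zero"] += 1
        elif n_inside == 1:
            tally["one"] += 1
        else:
            tally["multiple"] += 1
    total = sum(tally.values())
    if total == 0:
        return {}
    return {k: tally[k] / total for k in ("zero", "one", "multiple")}


# Hypothetical detected intervals and causative locus positions.
print(classify_qtl([(10.0, 18.5), (40.0, 52.0), (70.0, 75.0)], [12.0, 44.1, 47.9, 90.0]))
```

Here the three intervals contain one, two, and zero causative loci, so each class receives a proportion of one third.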
A.2 Tables
Table A.1: RFLP Markers used during backcrossing of QTL mapping population.

Marker      Chromosome   Marker      Chromosome
bnl5.62     1            php20725    4
umc157      1            umc19       4
umc37b      1            umc127a     4
npi255      1            bnl10.17b   4
BZ2         1            umc15       4
bnl8.10     1            bnl8.23     4
npi615      1            bnl8.33     5
umc107      1            bnl6.25     5
npi225      1            umc90       5
bnl8.45     2            umc27       5
umc53       2            umc166      5
npi320      2            bnl7.71     5
npi421      2            npi412      5
umc6        2            umc54       5
umc34       2            umc127b     5
umc134      2            umc104a     5
umc131      2            bnl6.29     6
umc2b       2            umc65       6
umc5a       2            umc21       6
php20005    2            umc46       6
umc122      2            umc132      6
umc49a      2            umc62       6
umc36       2            npi114      8
umc32       3            bnl9.11     8
umc121      3            umc117      8
php20042    3            umc7        8
umc42b      3            npi253      9
umc161      3            umc113      9
umc18       3            umc81       9
TE1         3            umc95       9
bnl5.37     3            bnl3.04     10
bnl8.01     3            umc130      10
umc60       3            umc49b      10
bnl12.97    3            umc117b     10
php10080    3            bnl7.49     10
npi425      3
umc2a       3
Table A.2: Genetic markers used to score BC6 S6 mapping population.

Marker           Genetic Position   AGPv2 Position
umc2036          0.00               6,985,618
bnlg565          6.54               8,492,871
bnlg105          20.90              13,812,586
phi008           21.54              14,072,755
umc2293          25.26              15,110,054
umc2060          27.79              16,462,750
bnlg1046         31.75              18,701,374
umc2035          42.17              23,891,611
umc1705          45.36              28,196,243
umc1056          48.10              32,036,007
umc2294          48.43              33,783,084
umc1935          53.24              51,438,549
umc1850          54.79              54,416,924
mmp58            61.98              74,916,830
GRMZM2G116761    63.55              82,236,166
umc2298          65.07              84,800,717
umc1110          65.39              84,825,409
umc1224          66.70              92,368,617
umc1283          67.52              111,997,867
bnlg1287         67.69              121,584,002
dupssr10         68.70              142,483,421
bnlg2323         74.26              151,717,831
ZHL0301          77.01              159,447,730
umc1348          81.83              166,576,639
umc1966          86.64              169,231,037
Appendix B
Supplemental Content: Fine mapping of chromosome five domestication genes in maize

B.1 Tables
Table B.1: PCR markers used for genotyping RCNILs including gene or SNP target,
AGPv2 position, and primer sequence.
Gene or SNP Name
AGPv2 Position
Primers
GRMZM2G003313
38,994,478
CCACAGAATCTCTCCACCAGA
CTTTTGCTTCTCACCCCAGA
GRMZM2G048045
62,595,351
GCCTACGAGCTGCAACAGG
GCCCTCCGTTCTACACACAG
GRMZM2G116761
82,236,265
TCGCATCTGGAAAGAGCTTC
TGAATTGCAAAAGAGGAAACA
PZE-105075181
82,970,868
GGCCCGGGCTAGAGAACCGA
GTGCGGAGCTTGGGACCGAC
GRMZM2G158520
82,952,563
TCGGGCACGAAAGGTGTCGC
CACTCTCTCCCGCTCCCGCT
GRMZM2G387127
83,436,098
CGCAAGCCGATCTTTTACTC
GCAGTTGAACTCGAAGTGGA
GRMZM2G387127
83,436,808
CGCAAGCCGATCTTTTACTC
GCAGTTGAACTCGAAGTGGA
GRMZM2G026117
84,249,368
CTCAGGCCAAGGTCTCACTC
AGAGTGTGCGGCTTTCAGTT
umc1110
84,825,350
TTACACCAAGGTCCGAAACAAGAT
TCTTGGAAGGCAAGACTCTACCTG
PZE-105076775
85,553,605
CAAACCTCCCAAGAGAATGC
TTGATGCAGATTCGCTGAAC
GRMZM2G017882
85,864,165
GTCCGCCTCGGCGACCTAGA
CCAGAGGGGACCTGTGGGGG
AC207043.3 FG002
86,014,290
CCACACTCATTTGACCAACG
TGACGCGTGTTCTAGCTTGT
AC207043.3 FG002
86,014,338
CCACACTCATTTGACCAACG
TGACGCGTGTTCTAGCTTGT
PZE-105077135
86,221,700
AAAGACGCAGCAGGAGAGAG
TGCTACGTTACAGGCTGTCG
GRMZM2G102758
86,783,453
AGCAGGGTCAAGGACTACCA
TCCTGCAGCTCCTCTTCTTC
GRMZM2G063106
87,114,719
TGCATTTCTCTGACCTCCTTG
TCCGACTTGAGGATCCTGTT
umc1283
111,997,810
CTGCTCCCTTATGATGTGATGATG
TGCACTGAGGTGTAGGTAGAGCAA
GRMZM2G012923
151,446,717
AGCAAAGCATGGGCTAGTGT
GCCATGCTGCTTATGGATCT
GRMZM2G027886
159,447,674
AACAGCTTTGCTTCCCTGAA
CCCAGAGGATCCAGAGTCAG
umc1348
166,576,570
CTCACTGACACTTGAACACACACG
TTACTGGTCTCCTGATCCTTAGCG
umc1221
168,671,954
GCAACAGCAACTGGCAACAG
AAACAGGCACAAAGCATGGATAG
umc1966
169,230,959
GTTTTCGACGAGGGGACTACATTT
CACGGTTGAGAACTTCGCTTGTAG
Appendix C
Supplemental Content: The role of cis regulatory evolution in maize domestication

C.1 Figures
Figure C.1: Parent versus hybrid leaf tissue allele specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Color indicates the regulatory category, assigned from the combination of significant statistical tests as described in the Methods. The barplot in the lower right-hand corner shows the proportion and count of genes in each regulatory category.
Figure C.2: Parent versus hybrid stem tissue allele specific expression ratios. The parent (x-axis) and F1 hybrid (y-axis) allele specific expression ratios are plotted against each other. Color indicates the regulatory category, assigned from the combination of significant statistical tests as described in the Methods. The barplot in the lower right-hand corner shows the proportion and count of genes in each regulatory category.
Figure C.3: Dominance/additivity ratio grouped by regulatory category. Density plots of gene dominance/additivity (D/A) ratios for the three tissues, grouped by regulatory category. There is no obvious shift in the distribution for any of the tissues or regulatory categories, suggesting that regulatory category does not substantially affect overall additivity or dominance.

C.2 Tables
Table C.1: Number of biological replicates of F1 hybrids and parent inbred lines in the RNAseq expression study. Hybrid replicates are shown in the interior of the grid and parent replicates around the perimeter.
B73
TIL01
TIL03
TIL05
TIL09
TIL10
TIL11
TIL14
TIL15
TIL25
2/2/2
2/1/1
2/2/2
2/2/1
Inbred
2/2/2
2/2/2
4/2/2
CML103
Ki3
2/2/2
1/2/2
2/2/2
2/2/2
Mo17
Oh43
W22
Inbred
0/2/2
2/2/2
1/2/2
2/2/2
2/2/2
2/2/2
3/2/2
2/2/2
2/1/2
2/2/2
2/2/2
2/2/2
2/2/2
2/2/2
2/1/1
2/2/2
2/2/2
2/2/2
2/2/2
2/2/2
2/2/2
2/2/2
4/3/2
2/2/2
2/2/2
2/2/2
2/2/2
1/2/2
2/2/2
3/2/2
2/2/2
2/2/2
2/2/2
114
Table C.2: Adapter name, barcode sequence, and barcode length for Illumina adapters used in RNAseq libraries.
Adapter # | Adapter name | Barcode sequence | Barcode length (nt)
1 | PE YC3 | GCATGT | 5
2 | PE YC4 | TGTGCT | 5
3 | PE YC5 | AGTCAT | 5
4 | PE YC6 | GTAAGT | 5
5 | PE YC7 | TCCTCT | 5
6 | PE YC8 | CAGGTT | 5
7 | PE JM 1 | TCCAT | 4
8 | PE JM 2 | TAGCT | 4
9 | PE JM 3 | GTTCT | 4
10 | PE JM 4 | CGATT | 4
11 | PE TB 1 | ATCGT | 4
12 | PE TB 2 | GCTAT | 4
13 | PE TB 3 | TGGAT | 4
14 | PE TB 4 | ATGCT | 4
15 | PE ZL1 | CACTAT | 5
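Because the barcodes in Table C.2 differ in length, reads have to be assigned by prefix matching rather than by slicing a fixed number of bases. The sketch below assumes the barcode sits at the 5' start of the read and requires an exact match; both are assumptions, and `assign_adapter` is a hypothetical helper. Each listed sequence is one base longer than its stated length and ends in T, which suggests the stated length excludes that final T; the sketch simply matches the full listed sequence. No sequence in this set is a prefix of another, so an exact prefix match is unambiguous, and the code checks that property.

```python
# Barcode sequences from Table C.2, keyed by adapter name.
BARCODES = {
    "PE YC3": "GCATGT", "PE YC4": "TGTGCT", "PE YC5": "AGTCAT",
    "PE YC6": "GTAAGT", "PE YC7": "TCCTCT", "PE YC8": "CAGGTT",
    "PE JM 1": "TCCAT", "PE JM 2": "TAGCT", "PE JM 3": "GTTCT",
    "PE JM 4": "CGATT", "PE TB 1": "ATCGT", "PE TB 2": "GCTAT",
    "PE TB 3": "TGGAT", "PE TB 4": "ATGCT", "PE ZL1": "CACTAT",
}


def is_prefix_free(seqs) -> bool:
    """True if no sequence is a proper prefix of another one."""
    seqs = list(seqs)
    return not any(a != b and b.startswith(a) for a in seqs for b in seqs)


def assign_adapter(read: str, barcodes=BARCODES):
    """Return (adapter_name, read_without_barcode), or (None, read) if nothing matches.

    Hypothetical helper: exact 5' prefix match, no mismatches tolerated.
    """
    # Try longer barcodes first so a shorter one can never shadow a longer one.
    for name, bc in sorted(barcodes.items(), key=lambda kv: -len(kv[1])):
        if read.startswith(bc):
            return name, read[len(bc):]
    return None, read
```

For example, `assign_adapter("TCCATGGACGTTAGC")` returns `("PE JM 1", "GGACGTTAGC")`. A production demultiplexer would usually also tolerate one mismatch, which requires checking that barcodes stay distinguishable at that distance.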
Table C.3: Number of genomic paired-end reads and genome coverage obtained for constructing pseudo-transcriptomes.
Inbred line | # Reads | Genome coverage (×)
CML103 | 4.46E+08 | 21.24
Ki3 | 4.38E+08 | 19.85
Mo17 | 2.57E+08 | 11.37
Oh43 | 5.59E+08 | 20.56
TIL01 | 3.44E+08 | 14.5
TIL03 | 3.16E+08 | 13.15
TIL05 | 4.76E+08 | 17.8
TIL09 | 3.42E+08 | 15.21
TIL10 | 5.29E+08 | 24.29
TIL11 | 3.41E+08 | 15.97
TIL14 | 3.22E+08 | 13.82
TIL15 | 5.39E+08 | 24.22
TIL25 | 4.27E+08 | 19.93
W22 | 3.07E+08 | 13.19
Average | 4.03E+08 | 17.50714
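Genome coverage in Table C.3 is, in the usual Lander-Waterman sense, total sequenced bases divided by genome size. The table gives neither the read length nor the genome size used, and the coverage-per-read ratio differs between lines, so no single read length reproduces every row. Both are therefore parameters in this sketch; the 100 bp and 2.3 Gb values in the usage line are illustrative assumptions, not values from the study.

```python
def mean_coverage(n_reads: float, read_length_bp: float, genome_size_bp: float) -> float:
    """Expected mean sequencing depth C = N * L / G (Lander-Waterman)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


# Columns of Table C.3 (CML103 ... W22); their means reproduce the "Average" row.
READS = [4.46e8, 4.38e8, 2.57e8, 5.59e8, 3.44e8, 3.16e8, 4.76e8,
         3.42e8, 5.29e8, 3.41e8, 3.22e8, 5.39e8, 4.27e8, 3.07e8]
COVERAGE = [21.24, 19.85, 11.37, 20.56, 14.5, 13.15, 17.8,
            15.21, 24.29, 15.97, 13.82, 24.22, 19.93, 13.19]

avg_reads = sum(READS) / len(READS)           # about 4.03e8
avg_coverage = sum(COVERAGE) / len(COVERAGE)  # about 17.507

# Illustrative only: 100 bp reads and a 2.3 Gb genome are assumptions.
depth = mean_coverage(4.46e8, 100, 2.3e9)
```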
Table C.4: Proportion of divergence due to cis-regulatory effects, grouped by overall parental divergence.
Gene group¹ | N | Tissue | Proportion cis ± SE
All genes | 15939 | Ear | 0.4519 ± 0.0021
0 to 1 | 14140 | Ear | 0.4583 ± 0.0022
1 to 2 | 1312 | Ear | 0.3918 ± 0.0081
2 to 3 | 268 | Ear | 0.3524 ± 0.0188
3 to 4 | 95 | Ear | 0.337 ± 0.0298
4 to 5 | 45 | Ear | 0.4713 ± 0.0495
5+ | 79 | Ear | 0.7777 ± 0.0273
All genes | 15925 | Leaf | 0.4164 ± 0.0021
0 to 1 | 13784 | Leaf | 0.4262 ± 0.0022
1 to 2 | 1739 | Leaf | 0.3309 ± 0.0065
2 to 3 | 277 | Leaf | 0.3752 ± 0.0173
3 to 4 | 52 | Leaf | 0.4458 ± 0.0437
4 to 5 | 21 | Leaf | 0.6534 ± 0.0566
5+ | 52 | Leaf | 0.7707 ± 0.0298
All genes | 16018 | Stem | 0.4704 ± 0.0021
0 to 1 | 14746 | Stem | 0.4715 ± 0.0022
1 to 2 | 1000 | Stem | 0.4284 ± 0.0096
2 to 3 | 149 | Stem | 0.4629 ± 0.0233
3 to 4 | 40 | Stem | 0.5051 ± 0.0539
4 to 5 | 23 | Stem | 0.6365 ± 0.059
5+ | 60 | Stem | 0.8081 ± 0.0248
¹ Group (except for "All genes") indicates grouping of genes by the absolute value of the parental log2(Maize:Teosinte) expression ratio.
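Table C.4 bins genes by the absolute parental log2(maize:teosinte) ratio and reports the mean cis proportion with its standard error. The per-gene formula is not restated in this appendix, so this sketch assumes a commonly used definition: with P the parental log2 ratio and H the F1 allele-specific log2 ratio, the cis effect is H, the trans effect is P - H, and the cis proportion is |H| / (|H| + |P - H|). The half-open bin boundaries and all function names are also assumptions.

```python
import math
from collections import defaultdict

# Bins of |parental log2 ratio|, treated here as half-open intervals [lo, hi).
BINS = [(0, 1, "0 to 1"), (1, 2, "1 to 2"), (2, 3, "2 to 3"),
        (3, 4, "3 to 4"), (4, 5, "4 to 5"), (5, math.inf, "5+")]


def cis_proportion(parent_log2: float, hybrid_log2: float) -> float:
    """|cis| / (|cis| + |trans|) with cis = H and trans = P - H (assumed definition)."""
    cis = abs(hybrid_log2)
    trans = abs(parent_log2 - hybrid_log2)
    total = cis + trans
    return cis / total if total > 0 else float("nan")


def bin_label(parent_log2: float) -> str:
    x = abs(parent_log2)
    for lo, hi, label in BINS:
        if lo <= x < hi:
            return label
    raise ValueError(f"cannot bin {parent_log2}")


def summarize(genes):
    """genes: iterable of (P, H) pairs. Returns {group: (n, mean, standard error)}."""
    groups = defaultdict(list)
    for p, h in genes:
        value = cis_proportion(p, h)
        if math.isnan(value):
            continue  # no divergence at all, so the proportion is undefined
        groups["All genes"].append(value)
        groups[bin_label(p)].append(value)
    out = {}
    for label, vals in groups.items():
        n = len(vals)
        m = sum(vals) / n
        # Standard error = sample SD / sqrt(n); undefined for a single gene.
        se = math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1) / n) if n > 1 else float("nan")
        out[label] = (n, m, se)
    return out
```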
Table C.5: The number of genes for which the maize or teosinte allele is expressed at a higher level.
CCT group | Tissue | Maize | Teosinte
A | Ear | 34 | 9
A | Leaf | 16 | 5
A | Stem | 19 | 8
B | Ear | 319 | 193
B | Leaf | 265 | 159
B | Stem | 249 | 155
C | Ear | 594 | 396
C | Leaf | 533 | 310
C | Stem | 558 | 382
ABC | Ear | 947 | 598
ABC | Leaf | 814 | 474
ABC | Stem | 826 | 545
Table C.6: Bias for the maize allele grouped by inbred line for the three tissues in the CCT-ABC gene list.
CCT group: ABC

(a) Ear
Maize inbred | Teosinte bias | No bias | Maize bias | Maize:Teosinte ratio
B73 | 569 | 1 | 975 | 1.7135
CML103 | 661 | 6 | 839 | 1.2693
Ki3 | 602 | 5 | 915 | 1.5199
Mo17 | 605 | 12 | 845 | 1.3967
Oh43 | 594 | 1 | 949 | 1.5976
W22 | 640 | 4 | 889 | 1.3891
non-B73 | 606 | 0 | 939 | 1.5495
Fisher's exact test for B73 versus cumulative non-B73 ratio, p = 0.1821.

(b) Leaf
Maize inbred | Teosinte bias | No bias | Maize bias | Maize:Teosinte ratio
B73 | 465 | 0 | 823 | 1.7699
CML103 | 556 | 6 | 688 | 1.2374
Ki3 | 506 | 3 | 760 | 1.5020
Mo17 | 478 | 4 | 765 | 1.6004
Oh43 | 477 | 0 | 807 | 1.6918
W22 | 502 | 1 | 775 | 1.5438
non-B73 | 494 | 0 | 794 | 1.6073
Fisher's exact test for B73 versus cumulative non-B73 ratio, p = 0.2539.

(c) Stem
Maize inbred | Teosinte bias | No bias | Maize bias | Maize:Teosinte ratio
B73 | 524 | 0 | 847 | 1.6164
CML103 | 582 | 7 | 739 | 1.2698
Ki3 | 555 | 2 | 793 | 1.4288
Mo17 | 520 | 4 | 806 | 1.5500
Oh43 | 512 | 1 | 857 | 1.6738
W22 | 546 | 1 | 814 | 1.4908
non-B73 | 545 | 0 | 826 | 1.5156
Fisher's exact test for B73 versus cumulative non-B73 ratio, p = 0.4326.
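The footnotes to Table C.6 compare B73 with the non-B73 row using Fisher's exact test. A self-contained sketch of the two-sided test follows, using the convention shared by R's `fisher.test`: sum the probabilities of every table, with the same margins, that is no more likely than the observed one. Which counts entered the original test is an assumption here (this sketch uses the maize-bias and teosinte-bias columns and excludes "No bias").

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on both margins and sums the hypergeometric probabilities of
    all tables that are no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) with fixed margins, via exact integer arithmetic.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance keeps exact ties from being lost to rounding.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Stem panel: B73 row (maize 847, teosinte 524) versus non-B73 row (826, 545).
p_stem = fisher_exact_two_sided(847, 524, 826, 545)
```

The result can be compared against the reported Stem footnote; exact integer arithmetic keeps the hypergeometric probabilities accurate even for margins in the thousands.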
Table C.7: Allele-specific expression variation among F1 hybrids explained by the maize and teosinte parent.
Tissue | Category | R² maize | R² teosinte | R² maize / R² teosinte | Gene count
Ear | All genes | 32.48% | 38.21% | 85.01% | 13194
Leaf | All genes | 31.76% | 37.18% | 85.43% | 13121
Stem | All genes | 32.04% | 38.56% | 83.09% | 13305
Ear | ABC | 32.25% | 41.37% | 77.96% | 1545
Leaf | ABC | 31.94% | 39.79% | 80.27% | 1288
Stem | ABC | 32.20% | 41.26% | 78.05% | 1371
Ear | AB | 30.76% | 42.95% | 71.64% | 555
Leaf | AB | 30.61% | 41.69% | 73.42% | 445
Stem | AB | 32.28% | 42.22% | 76.45% | 431
Ear | A | 26.58% | 48.86% | 54.41% | 43
Leaf | A | 20.11% | 47.63% | 42.22% | 21
Stem | A | 28.86% | 48.26% | 59.80% | 27
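Table C.7 reports the share of variation in F1 allele-specific expression ratios attributable to the maize or the teosinte parent, plus their ratio (for ear, all genes, 32.48 / 38.21 ≈ 85%). One simple way to compute such a share, sketched here as an assumption rather than the dissertation's exact model, is the one-way R² = SS_between / SS_total obtained by grouping hybrids by one parent at a time. The toy ratios below are made up for illustration.

```python
from collections import defaultdict


def one_way_r2(values, groups) -> float:
    """Fraction of the total variance in `values` explained by the grouping factor.

    R^2 = SS_between / SS_total, where SS_between = sum over groups of
    n_g * (mean_g - grand_mean)^2.
    """
    values, groups = list(values), list(groups)
    if not values or len(values) != len(groups):
        raise ValueError("values and groups must be non-empty and equally long")
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in by_group.values())
    return ss_between / ss_total


ratios = [0.0, 0.2, 1.0, 1.2]               # toy F1 allele ratios (made up)
maize_parent = ["M1", "M1", "M2", "M2"]
teosinte_parent = ["T1", "T2", "T1", "T2"]
r2_maize = one_way_r2(ratios, maize_parent)        # 1.0 / 1.04, about 0.96
r2_teosinte = one_way_r2(ratios, teosinte_parent)  # 0.04 / 1.04, about 0.04
```

With a real per-gene design, the same function would be applied gene by gene and the resulting R² values averaged or tested for being nonzero.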
Table C.8: Number of genes for which the maize and/or teosinte parent contributed to the variance among the F1 hybrid gene expression ratios (heterogeneous) and genes for which no variance in expression was attributable to the maize or teosinte parent (homogeneous). Results are shown for all genes and for the CCT lists A, AB, and ABC in the three tissue types.
Tissue | Category | Heterogeneous: Maize | Heterogeneous: Teosinte | Heterogeneous: Maize+Teosinte | Homogeneous | Total
Ear | All genes | 1880 | 2959 | 2504 | 5851 | 13194
Leaf | All genes | 1810 | 3005 | 2327 | 5979 | 13121
Stem | All genes | 1924 | 3215 | 2645 | 5521 | 13305
Ear | ABC | 195 | 417 | 350 | 583 | 1545
Leaf | ABC | 165 | 322 | 285 | 516 | 1288
Stem | ABC | 193 | 374 | 321 | 483 | 1371
Ear | AB | 67 | 157 | 120 | 211 | 555
Leaf | AB | 54 | 117 | 104 | 170 | 445
Stem | AB | 57 | 128 | 105 | 141 | 431
Ear | A | 3 | 17 | 5 | 18 | 43
Leaf | A | 1 | 6 | 3 | 11 | 21
Stem | A | 2 | 8 | 7 | 10 | 27
Table C.9: Comparison of observed and expected numbers of genes classified as differentially expressed (DE) or not differentially expressed (NDE) by RNAseq and MicroArray assays for the CCT gene lists (ABC, AB, and A) in the three tissue types and their union.
Tissue | CCT group | RNAseq call | Observed MicroArray-NDE | Observed MicroArray-DE | Expected MicroArray-NDE | Expected MicroArray-DE
Ear | ABC | RNAseq-NDE | 8529 | 136 | 8498.77 | 166.23
Ear | ABC | RNAseq-DE | 1083 | 52 | 1113.23 | 21.77
Leaf | ABC | RNAseq-NDE | 8842 | 147 | 8813.36 | 175.64
Leaf | ABC | RNAseq-DE | 943 | 48 | 971.64 | 19.36
Stem | ABC | RNAseq-NDE | 8835 | 154 | 8809.58 | 179.42
Stem | ABC | RNAseq-DE | 985 | 46 | 1010.42 | 20.58
Union | ABC | RNAseq-NDE | 7970 | 121 | 7926.07 | 164.93
Union | ABC | RNAseq-DE | 2170 | 90 | 2213.93 | 46.07
Ear | AB | RNAseq-NDE | 9244 | 165 | 9228.50 | 180.50
Ear | AB | RNAseq-DE | 368 | 23 | 383.50 | 7.50
Leaf | AB | RNAseq-NDE | 9482 | 170 | 9463.41 | 188.59
Leaf | AB | RNAseq-DE | 303 | 25 | 321.59 | 6.41
Stem | AB | RNAseq-NDE | 9532 | 175 | 9513.25 | 193.75
Stem | AB | RNAseq-DE | 288 | 25 | 306.75 | 6.25
Union | AB | RNAseq-NDE | 9414 | 163 | 9381.78 | 195.22
Union | AB | RNAseq-DE | 726 | 48 | 758.22 | 15.78
Ear | A | RNAseq-NDE | 9587 | 184 | 9583.56 | 187.44
Ear | A | RNAseq-DE | 25 | 4 | 28.44 | 0.56
Leaf | A | RNAseq-NDE | 9774 | 192 | 9771.27 | 194.73
Leaf | A | RNAseq-DE | 11 | 3 | 13.73 | 0.27
Stem | A | RNAseq-NDE | 9804 | 198 | 9802.36 | 199.64
Stem | A | RNAseq-DE | 16 | 2 | 17.64 | 0.36
Union | A | RNAseq-NDE | 10097 | 203 | 10090.04 | 209.96
Union | A | RNAseq-DE | 43 | 8 | 49.96 | 1.04
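The expected counts in Table C.9 are what independence between the RNAseq and microarray calls would predict: each cell's expected value is its row total times its column total divided by the grand total. The sketch below reproduces the first Ear block (observed 8529/136 and 1083/52 genes) from its observed counts; the function name is hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: RNAseq-NDE, RNAseq-DE; columns: MicroArray-NDE, MicroArray-DE.
ear_observed = [[8529, 136], [1083, 52]]
ear_expected = expected_counts(ear_observed)
# Rounded to two decimals: [[8498.77, 166.23], [1113.23, 21.77]]
```

An observed count above its expected value in the DE/DE cell (52 versus 21.77 here) indicates that the two platforms agree more often than chance would predict.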
Table C.10: Regulatory categories for genes identified as differentially expressed between maize and teosinte by microarray assays.
Regulatory category | Ear | Leaf | Stem
Ambiguous | 5.81% | 7.66% | 9.65%
Cis + Trans | 25.73% | 29.12% | 22.39%
Cis only | 26.14% | 28.74% | 30.89%
Cis x Trans | 6.64% | 6.13% | 8.49%
Compensatory | 7.05% | 8.05% | 6.56%
Conserved | 13.28% | 8.05% | 12.74%
Trans only | 15.35% | 12.26% | 9.27%
Total genes | 241 | 261 | 259
Table C.11: Fisher's exact tests for the overlap between genes associated with differentially methylated regions (DMRs) and CCT-ABC genes from each of the three experimental tissues in our work.
Tissue | Expected overlap | Observed overlap | p-value
Ear | 13.466 | 19 | 0.1092
Leaf | 11.387 | 14 | 0.4309
Stem | 12.468 | 17 | 0.1755
Union | 27.493 | 34 | 0.1605
Table C.12: Number of candidate genes neighboring differentially methylated regions (DMRs) between maize and teosinte, and the proportion for which expression data agree with methylation status.
Gene group | Ear | Leaf | Stem
Total | 19 | 14 | 17
A | 1 | 0 | 0
B | 3 | 3 | 3
C | 15 | 11 | 14
Total-agree | 57.90% | 50.00% | 58.80%
A-agree | 100% | NA | NA
B-agree | 100% | 33.30% | 33.30%
C-agree | 46.70% | 54.50% | 64.30%
Table C.13: Characteristics of dominance/additivity (D/A) ratios from a genome-wide analysis, including basic statistics (minimum, maximum, median, and mean) and the mean D/A ratio for seven regulatory categories and the CCT candidate lists.
Statistic | Ear | Leaf | Stem
Min | -10.4557 | -273.675 | -27.8545
Max | 10.56194 | 70.80451 | 78.71309
Median | 0.032991 | 0.160156 | -0.01118
Mean | 0.035682 | 0.211276 | -0.01638
Positive D/A | 6863 | 7385 | 6593
Negative D/A | 6331 | 5736 | 6712
Pos:Neg ratio | 1.084031 | 1.287483 | 0.982271
N | 13194 | 13121 | 13305
Z-test p-value | 2.442e-05 | 1.486e-13 | 0.354
Binomial p-value | 3.775e-06 | 4.741e-47 | 0.306

Mean D/A by category | Ear | Leaf | Stem
Ambiguous | -0.00408 | 0.020225 | -0.00841
Cis + Trans | -0.00204 | 0.455915 | 0.05871
Cis only | -0.02053 | 0.044602 | 0.063987
Cis x Trans | 0.14616 | 0.32702 | -0.16874
Compensatory | 0.052921 | -0.08854 | 0.002721
Conserved | 0.049997 | 0.009092 | -0.05574
Trans only | 0.08708 | 0.382572 | -0.10058
CCT-A | 0.03508 | 0.329661 | 0.026347
CCT-AB | -0.0169 | 0.094785 | 0.129459
CCT-ABC | -0.04257 | 0.208951 | 0.077445
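The binomial p-values in Table C.13 ask whether positive and negative D/A ratios are equally common (success probability 0.5). An exact two-sided version of that test can be computed with log-gamma terms to avoid overflow at n ≈ 13,000. This is a sketch of the standard test, not necessarily the software call used in the dissertation; the function name is hypothetical.

```python
import math


def binom_test_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of H0: p = 0.5 for k successes out of n.

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value is
    2 * P(X >= max(k, n - k)), capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    log_norm = n * math.log(2)
    upper = max(k, n - k)
    tail = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) - log_norm)
        for i in range(upper, n + 1)
    )
    return min(1.0, 2.0 * tail)


p_stem = binom_test_half(6593, 13305)  # positive D/A genes out of N in stem; about 0.31
```

The same function fits the starred maize:teosinte dominance counts in Table C.14; for example, 19:2 gives 2 × 232 / 2^21 ≈ 2.2 × 10⁻⁴.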
Table C.14: Additive and dominant gene counts for the A, AB, and ABC cis-only and trans-only candidate lists. "Add" is the number of additive genes; "Dom" gives the numbers of genes for which the maize or the teosinte allele was dominant, written maize:teosinte. Fisher's exact tests (FET) ask whether the degree of dominance/additivity differs between the cis and trans classes. The binomial test asks whether maize-dominant and teosinte-dominant alleles occur in equal numbers.
List | Class | Ear Add | Ear Dom | Leaf Add | Leaf Dom | Stem Add | Stem Dom
A | Cis only | 11 | 1:0 | 5 | 1:0 | 3 | 2:1
A | Trans only | 13 | 19:2* | 5 | 4:3 | 2 | 0:2
A, FET (cis vs trans): Ear p<0.005; Leaf p>0.05; Stem p>0.05
AB | Cis only | 95 | 22:18 | 53 | 18:17 | 52 | 19:20
AB | Trans only | 112 | 89:35* | 72 | 81:29* | 23 | 10:13
AB, FET (cis vs trans): Ear p<0.005; Leaf p<0.005; Stem p>0.05
ABC | Cis only | 266 | 62:65 | 136 | 50:56 | 178 | 68:71
ABC | Trans only | 203 | 112:68* | 121 | 107:65* | 67 | 35:42
ABC, FET (cis vs trans): Ear p<0.005; Leaf p<0.005; Stem p<0.05
* Binomial test p-value < 0.005.
Table C.15: Degree of overlap between our CCT (AB list) genes and genes in different transcription factor families.
Family | Tissue | Assayed genes | Observed overlap | Expected overlap | FET p-value
AP2 | Ear | 6 | 0 | 0.25 | 1
ARF | Ear | 27 | 4 | 1.14 | 0.03
ARR-B | Ear | 8 | 0 | 0.34 | 1
B3 | Ear | 18 | 1 | 0.76 | 0.54
BBR-BPC | Ear | 4 | 0 | 0.17 | 1
BES1 | Ear | 3 | 0 | 0.13 | 1
bHLH | Ear | 42 | 1 | 1.77 | 0.84
bZIP | Ear | 51 | 0 | 2.15 | 1
C2H2 | Ear | 28 | 2 | 1.18 | 0.33
C3H | Ear | 42 | 1 | 1.77 | 0.84
CAMTA | Ear | 8 | 0 | 0.34 | 1
CO-like | Ear | 3 | 0 | 0.13 | 1
CPP | Ear | 7 | 1 | 0.29 | 0.26
DBB | Ear | 4 | 0 | 0.17 | 1
Dof | Ear | 7 | 0 | 0.29 | 1
E2F/DP | Ear | 10 | 0 | 0.42 | 1
EIL | Ear | 4 | 0 | 0.17 | 1
ERF | Ear | 17 | 0 | 0.72 | 1
FAR1 | Ear | 15 | 2 | 0.63 | 0.13
G2-like | Ear | 11 | 0 | 0.46 | 1
GATA | Ear | 10 | 0 | 0.42 | 1
GeBP | Ear | 14 | 0 | 0.59 | 1
GRAS | Ear | 21 | 1 | 0.88 | 0.59
GRF | Ear | 8 | 0 | 0.34 | 1
HB-other | Ear | 14 | 0 | 0.59 | 1
HB-PHD | Ear | 2 | 0 | 0.08 | 1
HD-ZIP | Ear | 19 | 1 | 0.8 | 0.56
HSF | Ear | 12 | 1 | 0.5 | 0.4
LBD | Ear | 3 | 0 | 0.13 | 1
LFY | Ear | 0 | 0 | 0 | NA
LSD | Ear | 3 | 0 | 0.13 | 1
M-type | Ear | 6 | 1 | 0.25 | 0.23
MIKC | Ear | 23 | 2 | 0.97 | 0.25
MYB | Ear | 23 | 2 | 0.97 | 0.25
MYB related | Ear | 42 | 4 | 1.77 | 0.1
NAC | Ear | 25 | 0 | 1.05 | 1
NF-X1 | Ear | 2 | 0 | 0.08 | 1
NF-YA | Ear | 10 | 0 | 0.42 | 1
NF-YB | Ear | 7 | 0 | 0.29 | 1
NF-YC | Ear | 7 | 0 | 0.29 | 1
Nin-like | Ear | 11 | 1 | 0.46 | 0.38
RAV | Ear | 0 | 0 | 0 | NA
S1Fa-like | Ear | 0 | 0 | 0 | NA
SBP | Ear | 12 | 0 | 0.5 | 1
SRS | Ear | 2 | 0 | 0.08 | 1
STAT | Ear | 1 | 0 | 0.04 | 1
TALE | Ear | 12 | 0 | 0.5 | 1
TCP | Ear | 9 | 0 | 0.38 | 1
Trihelix | Ear | 22 | 0 | 0.93 | 1
VOZ | Ear | 2 | 0 | 0.08 | 1
Whirly | Ear | 2 | 0 | 0.08 | 1
WOX | Ear | 0 | 0 | 0 | NA
WRKY | Ear | 20 | 0 | 0.84 | 1
YABBY | Ear | 4 | 0 | 0.17 | 1
ZF-HD | Ear | 1 | 0 | 0.04 | 1
ALL | Ear | 649 | 24 | 27.3 | 0.77
AP2 | Leaf | 8 | 0 | 0.27 | 1
ARF | Leaf | 27 | 0 | 0.92 | 1
ARR-B | Leaf | 8 | 0 | 0.27 | 1
B3 | Leaf | 16 | 1 | 0.54 | 0.42
BBR-BPC | Leaf | 4 | 0 | 0.14 | 1
BES1 | Leaf | 3 | 0 | 0.1 | 1
bHLH | Leaf | 41 | 0 | 1.39 | 1
bZIP | Leaf | 42 | 0 | 1.42 | 1
C2H2 | Leaf | 29 | 2 | 0.98 | 0.26
C3H | Leaf | 41 | 0 | 1.39 | 1
CAMTA | Leaf | 8 | 0 | 0.27 | 1
CO-like | Leaf | 5 | 0 | 0.17 | 1
CPP | Leaf | 7 | 0 | 0.24 | 1
DBB | Leaf | 6 | 0 | 0.2 | 1
Dof | Leaf | 8 | 0 | 0.27 | 1
E2F/DP | Leaf | 10 | 0 | 0.34 | 1
EIL | Leaf | 4 | 0 | 0.14 | 1
ERF | Leaf | 15 | 0 | 0.51 | 1
FAR1 | Leaf | 14 | 0 | 0.47 | 1
G2-like | Leaf | 16 | 0 | 0.54 | 1
GATA | Leaf | 14 | 0 | 0.47 | 1
GeBP | Leaf | 14 | 0 | 0.47 | 1
GRAS | Leaf | 19 | 1 | 0.64 | 0.48
GRF | Leaf | 6 | 0 | 0.2 | 1
HB-other | Leaf | 14 | 1 | 0.47 | 0.38
HB-PHD | Leaf | 2 | 0 | 0.07 | 1
HD-ZIP | Leaf | 16 | 1 | 0.54 | 0.42
HSF | Leaf | 10 | 0 | 0.34 | 1
LBD | Leaf | 1 | 1 | 0.03 | 0.03
LFY | Leaf | 0 | 0 | 0 | NA
LSD | Leaf | 3 | 0 | 0.1 | 1
M-type | Leaf | 3 | 0 | 0.1 | 1
MIKC | Leaf | 9 | 2 | 0.31 | 0.04
MYB | Leaf | 31 | 2 | 1.05 | 0.28
MYB related | Leaf | 44 | 1 | 1.49 | 0.78
NAC | Leaf | 28 | 2 | 0.95 | 0.25
NF-X1 | Leaf | 2 | 0 | 0.07 | 1
NF-YA | Leaf | 9 | 0 | 0.31 | 1
NF-YB | Leaf | 5 | 0 | 0.17 | 1
NF-YC | Leaf | 8 | 0 | 0.27 | 1
Nin-like | Leaf | 10 | 0 | 0.34 | 1
RAV | Leaf | 0 | 0 | 0 | NA
S1Fa-like | Leaf | 0 | 0 | 0 | NA
SBP | Leaf | 11 | 0 | 0.37 | 1
SRS | Leaf | 0 | 0 | 0 | NA
STAT | Leaf | 1 | 0 | 0.03 | 1
TALE | Leaf | 12 | 0 | 0.41 | 1
TCP | Leaf | 8 | 0 | 0.27 | 1
Trihelix | Leaf | 22 | 1 | 0.75 | 0.53
VOZ | Leaf | 2 | 0 | 0.07 | 1
Whirly | Leaf | 2 | 0 | 0.07 | 1
WOX | Leaf | 0 | 0 | 0 | NA
WRKY | Leaf | 16 | 0 | 0.54 | 1
YABBY | Leaf | 4 | 0 | 0.14 | 1
ZF-HD | Leaf | 1 | 0 | 0.03 | 1
ALL | Leaf | 623 | 15 | 21.13 | 0.94
AP2 | Stem | 8 | 0 | 0.26 | 1
ARF | Stem | 27 | 3 | 0.87 | 0.06
ARR-B | Stem | 8 | 0 | 0.26 | 1
B3 | Stem | 14 | 0 | 0.45 | 1
BBR-BPC | Stem | 4 | 0 | 0.13 | 1
BES1 | Stem | 3 | 0 | 0.1 | 1
bHLH | Stem | 50 | 2 | 1.62 | 0.49
bZIP | Stem | 47 | 1 | 1.52 | 0.79
C2H2 | Stem | 28 | 2 | 0.91 | 0.23
C3H | Stem | 41 | 1 | 1.33 | 0.74
CAMTA | Stem | 8 | 0 | 0.26 | 1
CO-like | Stem | 4 | 0 | 0.13 | 1
CPP | Stem | 7 | 0 | 0.23 | 1
DBB | Stem | 6 | 0 | 0.19 | 1
Dof | Stem | 8 | 0 | 0.26 | 1
E2F/DP | Stem | 10 | 0 | 0.32 | 1
EIL | Stem | 4 | 1 | 0.13 | 0.12
ERF | Stem | 16 | 0 | 0.52 | 1
FAR1 | Stem | 15 | 0 | 0.49 | 1
G2-like | Stem | 14 | 0 | 0.45 | 1
GATA | Stem | 12 | 0 | 0.39 | 1
GeBP | Stem | 13 | 0 | 0.42 | 1
GRAS | Stem | 20 | 0 | 0.65 | 1
GRF | Stem | 7 | 0 | 0.23 | 1
HB-other | Stem | 15 | 2 | 0.49 | 0.08
HB-PHD | Stem | 2 | 0 | 0.06 | 1
HD-ZIP | Stem | 17 | 1 | 0.55 | 0.43
HSF | Stem | 14 | 0 | 0.45 | 1
LBD | Stem | 2 | 0 | 0.06 | 1
LFY | Stem | 0 | 0 | 0 | NA
LSD | Stem | 3 | 0 | 0.1 | 1
M-type | Stem | 4 | 1 | 0.13 | 0.12
MIKC | Stem | 10 | 2 | 0.32 | 0.04
MYB | Stem | 23 | 2 | 0.75 | 0.17
MYB related | Stem | 42 | 1 | 1.36 | 0.75
NAC | Stem | 29 | 0 | 0.94 | 1
NF-X1 | Stem | 2 | 0 | 0.06 | 1
NF-YA | Stem | 10 | 1 | 0.32 | 0.28
NF-YB | Stem | 6 | 0 | 0.19 | 1
NF-YC | Stem | 7 | 0 | 0.23 | 1
Nin-like | Stem | 11 | 0 | 0.36 | 1
RAV | Stem | 0 | 0 | 0 | NA
S1Fa-like | Stem | 0 | 0 | 0 | NA
SBP | Stem | 11 | 0 | 0.36 | 1
SRS | Stem | 2 | 0 | 0.06 | 1
STAT | Stem | 1 | 0 | 0.03 | 1
TALE | Stem | 13 | 0 | 0.42 | 1
TCP | Stem | 6 | 1 | 0.19 | 0.18
Trihelix | Stem | 23 | 0 | 0.75 | 1
VOZ | Stem | 2 | 0 | 0.06 | 1
Whirly | Stem | 2 | 0 | 0.06 | 1
WOX | Stem | 0 | 0 | 0 | NA
WRKY | Stem | 19 | 0 | 0.62 | 1
YABBY | Stem | 4 | 0 | 0.13 | 1
ZF-HD | Stem | 0 | 0 | 0 | NA
ALL | Stem | 640 | 20 | 20.73 | 0.6
AP2 | Union | 10 | 0 | 0.76 | 1
ARF | Union | 27 | 6 | 2.06 | 0.01
ARR-B | Union | 8 | 0 | 0.61 | 1
B3 | Union | 18 | 2 | 1.38 | 0.41
BBR-BPC | Union | 4 | 0 | 0.31 | 1
BES1 | Union | 3 | 0 | 0.23 | 1
bHLH | Union | 53 | 3 | 4.05 | 0.78
bZIP | Union | 52 | 1 | 3.97 | 0.98
C2H2 | Union | 31 | 4 | 2.37 | 0.21
C3H | Union | 42 | 2 | 3.21 | 0.84
CAMTA | Union | 8 | 0 | 0.61 | 1
CO-like | Union | 5 | 0 | 0.38 | 1
CPP | Union | 7 | 1 | 0.54 | 0.43
DBB | Union | 6 | 0 | 0.46 | 1
Dof | Union | 9 | 0 | 0.69 | 1
E2F/DP | Union | 10 | 0 | 0.76 | 1
EIL | Union | 4 | 1 | 0.31 | 0.27
ERF | Union | 18 | 0 | 1.38 | 1
FAR1 | Union | 15 | 2 | 1.15 | 0.32
G2-like | Union | 18 | 0 | 1.38 | 1
GATA | Union | 15 | 0 | 1.15 | 1
GeBP | Union | 15 | 0 | 1.15 | 1
GRAS | Union | 23 | 2 | 1.76 | 0.53
GRF | Union | 8 | 0 | 0.61 | 1
HB-other | Union | 15 | 2 | 1.15 | 0.32
HB-PHD | Union | 2 | 0 | 0.15 | 1
HD-ZIP | Union | 20 | 3 | 1.53 | 0.19
HSF | Union | 14 | 1 | 1.07 | 0.67
LBD | Union | 3 | 1 | 0.23 | 0.21
LFY | Union | 0 | 0 | 0 | NA
LSD | Union | 3 | 0 | 0.23 | 1
M-type | Union | 7 | 2 | 0.54 | 0.09
MIKC | Union | 25 | 5 | 1.91 | 0.04
MYB | Union | 32 | 3 | 2.45 | 0.45
MYB related | Union | 48 | 4 | 3.67 | 0.51
NAC | Union | 35 | 2 | 2.68 | 0.76
NF-X1 | Union | 2 | 0 | 0.15 | 1
NF-YA | Union | 10 | 1 | 0.76 | 0.55
NF-YB | Union | 7 | 0 | 0.54 | 1
NF-YC | Union | 8 | 0 | 0.61 | 1
Nin-like | Union | 11 | 1 | 0.84 | 0.58
RAV | Union | 0 | 0 | 0 | NA
S1Fa-like | Union | 0 | 0 | 0 | NA
SBP | Union | 12 | 0 | 0.92 | 1
SRS | Union | 2 | 0 | 0.15 | 1
STAT | Union | 1 | 0 | 0.08 | 1
TALE | Union | 14 | 0 | 1.07 | 1
TCP | Union | 9 | 1 | 0.69 | 0.51
Trihelix | Union | 24 | 1 | 1.83 | 0.85
VOZ | Union | 2 | 0 | 0.15 | 1
Whirly | Union | 2 | 0 | 0.15 | 1
WOX | Union | 0 | 0 | 0 | NA
WRKY | Union | 23 | 0 | 1.76 | 1
YABBY | Union | 4 | 0 | 0.31 | 1
ZF-HD | Union | 1 | 0 | 0.08 | 1
ALL | Union | 724 | 49 | 55.34 | 0.84
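The expected overlaps in Tables C.15 to C.17 appear to follow the rule assayed genes × (candidate-list size / total genes assayed). For example, in ear the ARF row gives 27 × 555 / 13,194 ≈ 1.14, using the ear AB-list and all-gene counts from Table C.7. The FET p-values also appear consistent with a one-sided (enrichment) test: ear C3H, with 42 assayed genes and one observed overlap, gives P(X ≥ 1) ≈ 0.84, whereas a two-sided test would return 1. The sketch below makes both assumptions explicit; `overlap_enrichment` is a hypothetical name.

```python
from math import comb


def overlap_enrichment(assayed: int, observed: int, list_size: int, total: int):
    """Return (expected overlap, one-sided hypergeometric p-value P(X >= observed)).

    total:     all genes assayed genome-wide
    list_size: genes on the candidate list (e.g. a CCT list)
    assayed:   assayed genes in the family, QTL interval, or pathway
    observed:  genes in both sets
    """
    expected = assayed * list_size / total
    denom = comb(total, assayed)
    upper = min(assayed, list_size)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    p = sum(
        comb(list_size, x) * comb(total - list_size, assayed - x)
        for x in range(observed, upper + 1)
    ) / denom
    return expected, min(1.0, p)


expected, p = overlap_enrichment(42, 1, 555, 13194)  # about 1.77 and 0.84
```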
Table C.16: Degree of overlap between CCT (AB list) differentially expressed genes and genes in the 1.5-LOD support intervals for QTL from a previous study.
Trait | Tissue | Assayed genes | Observed overlap | Expected overlap | FET p-value
BARE | Ear | 0 | 0 | 0 | 1
DIAM | Ear | 29 | 4 | 1.22 | 0.03
DIS | Ear | 4 | 1 | 0.17 | 0.16
DTP | Ear | 10 | 1 | 0.42 | 0.35
GLCO | Ear | 3 | 0 | 0.13 | 1
GLU | Ear | 0 | 0 | 0 | 1
KRN | Ear | 15 | 2 | 0.63 | 0.13
KW | Ear | 17 | 1 | 0.72 | 0.52
LEN | Ear | 4 | 1 | 0.17 | 0.16
PROL | Ear | 5 | 0 | 0.21 | 1
STAM | Ear | 10 | 1 | 0.42 | 0.35
BARE | Leaf | 0 | 0 | 0 | 1
DIAM | Leaf | 28 | 0 | 0.95 | 1
DIS | Leaf | 4 | 1 | 0.14 | 0.13
DTP | Leaf | 9 | 0 | 0.31 | 1
GLCO | Leaf | 3 | 0 | 0.1 | 1
GLU | Leaf | 0 | 0 | 0 | 1
KRN | Leaf | 13 | 0 | 0.44 | 1
KW | Leaf | 17 | 3 | 0.58 | 0.02
LEN | Leaf | 4 | 0 | 0.14 | 1
PROL | Leaf | 5 | 0 | 0.17 | 1
STAM | Leaf | 9 | 1 | 0.31 | 0.27
BARE | Stem | 0 | 0 | 0 | 1
DIAM | Stem | 28 | 1 | 0.91 | 0.6
DIS | Stem | 4 | 0 | 0.13 | 1
DTP | Stem | 10 | 0 | 0.32 | 1
GLCO | Stem | 3 | 0 | 0.1 | 1
GLU | Stem | 0 | 0 | 0 | 1
KRN | Stem | 14 | 1 | 0.45 | 0.37
KW | Stem | 18 | 3 | 0.58 | 0.02
LEN | Stem | 4 | 0 | 0.13 | 1
PROL | Stem | 5 | 0 | 0.16 | 1
STAM | Stem | 10 | 0 | 0.32 | 1
Table C.17: Degree of overlap between our CCT (AB list) differentially expressed genes and genes in metabolic pathways defined in KEGG.
¹ Group: tissue, candidate, and level of list.
Pathway | Pathway genes | Assayed genes | Overlap (obs) | Overlap (exp) | FET p-value

Group¹: Ear CCT-A
Alpha-linolenic Acid Metabolism | 26 | 14 | 0 | 0.046 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.023 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.052 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.016 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.023 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.062 | 1
Fatty Acid Degradation | 34 | 27 | 0 | 0.088 | 1
Fatty Acid Elongation | 16 | 8 | 0 | 0.026 | 1
Glycerolipid Metabolism | 46 | 31 | 0 | 0.101 | 1
Glycerophospholipid Metabolism | 64 | 45 | 0 | 0.147 | 1
Linoleic Acid Metabolism | 12 | 5 | 0 | 0.016 | 1
Sphingolipid Metabolism | 21 | 13 | 0 | 0.042 | 1
Starch and Sucrose Metabolism | 98 | 59 | 0 | 0.192 | 1
Steroid Biosynthesis | 25 | 15 | 0 | 0.049 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.026 | 1
ALL | 353 | 223 | 0 | 0.727 | 1

Group¹: Ear CCT-AB
Alpha-linolenic Acid Metabolism | 26 | 14 | 1 | 0.589 | 0.452
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.294 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.673 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.21 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.294 | 1
Fatty Acid Biosynthesis | 32 | 19 | 2 | 0.799 | 0.189
Fatty Acid Degradation | 34 | 27 | 1 | 1.136 | 0.687
Fatty Acid Elongation | 16 | 8 | 0 | 0.337 | 1
Glycerolipid Metabolism | 46 | 31 | 0 | 1.304 | 1
Glycerophospholipid Metabolism | 64 | 45 | 0 | 1.893 | 1
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.21 | 0.193
Sphingolipid Metabolism | 21 | 13 | 0 | 0.547 | 1
Starch and Sucrose Metabolism | 98 | 59 | 3 | 2.482 | 0.454
Steroid Biosynthesis | 25 | 15 | 1 | 0.631 | 0.475
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.337 | 0.291
ALL | 353 | 223 | 8 | 9.38 | 0.726

Group¹: Ear CCT-ABC
Alpha-linolenic Acid Metabolism | 26 | 14 | 5 | 1.639 | 0.018
Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.82 | 0.582
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 2 | 1.874 | 0.575
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.585 | 1
Ether Lipid Metabolism | 11 | 7 | 1 | 0.82 | 0.582
Fatty Acid Biosynthesis | 32 | 19 | 3 | 2.225 | 0.388
Fatty Acid Degradation | 34 | 27 | 5 | 3.162 | 0.203
Fatty Acid Elongation | 16 | 8 | 0 | 0.937 | 1
Glycerolipid Metabolism | 46 | 31 | 3 | 3.63 | 0.721
Glycerophospholipid Metabolism | 64 | 45 | 5 | 5.269 | 0.619
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.585 | 0.464
Sphingolipid Metabolism | 21 | 13 | 0 | 1.522 | 1
Starch and Sucrose Metabolism | 98 | 59 | 7 | 6.909 | 0.545
Steroid Biosynthesis | 25 | 15 | 2 | 1.756 | 0.539
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.937 | 0.631
ALL | 353 | 223 | 28 | 26.113 | 0.376

Group¹: Ear trans-A
Alpha-linolenic Acid Metabolism | 26 | 14 | 1 | 0.062 | 0.06
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.031 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 0 | 0.07 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 0 | 0.022 | 1
Ether Lipid Metabolism | 11 | 7 | 1 | 0.031 | 0.03
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.084 | 1
Fatty Acid Degradation | 34 | 27 | 0 | 0.119 | 1
Fatty Acid Elongation | 16 | 8 | 0 | 0.035 | 1
Glycerolipid Metabolism | 46 | 31 | 0 | 0.136 | 1
Glycerophospholipid Metabolism | 64 | 45 | 2 | 0.198 | 0.017
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.022 | 0.022
Sphingolipid Metabolism | 21 | 13 | 0 | 0.057 | 1
Starch and Sucrose Metabolism | 98 | 59 | 2 | 0.259 | 0.028
Steroid Biosynthesis | 25 | 15 | 0 | 0.066 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.035 | 1
ALL | 353 | 223 | 5 | 0.98 | 0.003

Group¹: Ear trans-AB
Alpha-linolenic Acid Metabolism | 26 | 14 | 2 | 0.506 | 0.089
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.253 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 1 | 0.578 | 0.445
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 2 | 0.181 | 0.012
Ether Lipid Metabolism | 11 | 7 | 1 | 0.253 | 0.227
Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.687 | 0.503
Fatty Acid Degradation | 34 | 27 | 2 | 0.976 | 0.255
Fatty Acid Elongation | 16 | 8 | 0 | 0.289 | 1
Glycerolipid Metabolism | 46 | 31 | 4 | 1.121 | 0.025
Glycerophospholipid Metabolism | 64 | 45 | 5 | 1.627 | 0.023
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.181 | 0.168
Sphingolipid Metabolism | 21 | 13 | 0 | 0.47 | 1
Starch and Sucrose Metabolism | 98 | 59 | 3 | 2.133 | 0.36
Steroid Biosynthesis | 25 | 15 | 0 | 0.542 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.289 | 1
ALL | 353 | 223 | 15 | 8.062 | 0.016

Group¹: Ear trans-ABC
Alpha-linolenic Acid Metabolism | 26 | 14 | 2 | 1.213 | 0.345
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.606 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 16 | 1 | 1.386 | 0.766
Cutin, Suberine, and Wax Biosynthesis | 10 | 5 | 2 | 0.433 | 0.063
Ether Lipid Metabolism | 11 | 7 | 2 | 0.606 | 0.118
Fatty Acid Biosynthesis | 32 | 19 | 1 | 1.646 | 0.821
Fatty Acid Degradation | 34 | 27 | 3 | 2.339 | 0.418
Fatty Acid Elongation | 16 | 8 | 1 | 0.693 | 0.516
Glycerolipid Metabolism | 46 | 31 | 6 | 2.686 | 0.047
Glycerophospholipid Metabolism | 64 | 45 | 7 | 3.898 | 0.09
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.433 | 0.364
Sphingolipid Metabolism | 21 | 13 | 0 | 1.126 | 1
Starch and Sucrose Metabolism | 98 | 59 | 5 | 5.111 | 0.588
Steroid Biosynthesis | 25 | 15 | 2 | 1.299 | 0.378
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.693 | 1
ALL | 353 | 223 | 23 | 19.319 | 0.218

Group¹: Leaf CCT-A
Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.021 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.011 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.03 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.01 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.011 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.03 | 1
Fatty Acid Degradation | 34 | 30 | 0 | 0.048 | 1
Fatty Acid Elongation | 16 | 9 | 0 | 0.014 | 1
Glycerolipid Metabolism | 46 | 34 | 0 | 0.054 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.075 | 1
Linoleic Acid Metabolism | 12 | 5 | 0 | 0.008 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.022 | 1
Starch and Sucrose Metabolism | 98 | 62 | 0 | 0.099 | 1
Steroid Biosynthesis | 25 | 15 | 0 | 0.024 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.013 | 1
ALL | 353 | 236 | 0 | 0.378 | 1

Group¹: Leaf CCT-AB
Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.441 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.237 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 2 | 0.644 | 0.134
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 2 | 0.203 | 0.016
Ether Lipid Metabolism | 11 | 7 | 1 | 0.237 | 0.215
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.644 | 1
Fatty Acid Degradation | 34 | 30 | 2 | 1.017 | 0.271
Fatty Acid Elongation | 16 | 9 | 0 | 0.305 | 1
Glycerolipid Metabolism | 46 | 34 | 1 | 1.153 | 0.691
Glycerophospholipid Metabolism | 64 | 47 | 2 | 1.594 | 0.477
Linoleic Acid Metabolism | 12 | 5 | 0 | 0.17 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.475 | 1
Starch and Sucrose Metabolism | 98 | 62 | 1 | 2.103 | 0.883
Steroid Biosynthesis | 25 | 15 | 0 | 0.509 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.271 | 0.241
ALL | 353 | 236 | 9 | 8.004 | 0.408

Group¹: Leaf CCT-ABC
Alpha-linolenic Acid Metabolism | 26 | 13 | 1 | 1.276 | 0.739
Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.687 | 0.515
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 3 | 1.865 | 0.285
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 2 | 0.589 | 0.111
Ether Lipid Metabolism | 11 | 7 | 1 | 0.687 | 0.515
Fatty Acid Biosynthesis | 32 | 19 | 2 | 1.865 | 0.569
Fatty Acid Degradation | 34 | 30 | 7 | 2.945 | 0.023
Fatty Acid Elongation | 16 | 9 | 0 | 0.883 | 1
Glycerolipid Metabolism | 46 | 34 | 4 | 3.338 | 0.432
Glycerophospholipid Metabolism | 64 | 47 | 4 | 4.614 | 0.691
Linoleic Acid Metabolism | 12 | 5 | 0 | 0.491 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 1.374 | 1
Starch and Sucrose Metabolism | 98 | 62 | 6 | 6.086 | 0.577
Steroid Biosynthesis | 25 | 15 | 1 | 1.472 | 0.788
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.785 | 0.563
ALL | 353 | 236 | 26 | 23.167 | 0.296

Group¹: Leaf trans-A
Alpha-linolenic Acid Metabolism | 26 | 13 | 0 | 0.026 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.014 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.038 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.012 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.014 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.038 | 1
Fatty Acid Degradation | 34 | 30 | 0 | 0.059 | 1
Fatty Acid Elongation | 16 | 9 | 0 | 0.018 | 1
Glycerolipid Metabolism | 46 | 34 | 0 | 0.067 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.093 | 1
Linoleic Acid Metabolism | 12 | 5 | 0 | 0.01 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.028 | 1
Starch and Sucrose Metabolism | 98 | 62 | 0 | 0.123 | 1
Steroid Biosynthesis | 25 | 15 | 0 | 0.03 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.016 | 1
ALL | 353 | 236 | 0 | 0.468 | 1

Group¹: Leaf trans-AB
Alpha-linolenic Acid Metabolism | 26 | 13 | 1 | 0.447 | 0.365
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.241 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 0 | 0.653 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.206 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.241 | 1
Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.653 | 0.486
Fatty Acid Degradation | 34 | 30 | 0 | 1.031 | 1
Fatty Acid Elongation | 16 | 9 | 0 | 0.309 | 1
Glycerolipid Metabolism | 46 | 34 | 0 | 1.169 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 1.616 | 1
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.172 | 0.16
Sphingolipid Metabolism | 21 | 14 | 1 | 0.481 | 0.387
Starch and Sucrose Metabolism | 98 | 62 | 2 | 2.131 | 0.634
Steroid Biosynthesis | 25 | 15 | 1 | 0.516 | 0.408
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.275 | 1
ALL | 353 | 236 | 6 | 8.112 | 0.826

Group¹: Leaf trans-ABC
Alpha-linolenic Acid Metabolism | 26 | 13 | 2 | 1.212 | 0.345
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.652 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 19 | 1 | 1.771 | 0.844
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.559 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.652 | 1
Fatty Acid Biosynthesis | 32 | 19 | 2 | 1.771 | 0.54
Fatty Acid Degradation | 34 | 30 | 3 | 2.796 | 0.539
Fatty Acid Elongation | 16 | 9 | 0 | 0.839 | 1
Glycerolipid Metabolism | 46 | 34 | 2 | 3.169 | 0.839
Glycerophospholipid Metabolism | 64 | 47 | 3 | 4.381 | 0.827
Linoleic Acid Metabolism | 12 | 5 | 1 | 0.466 | 0.387
Sphingolipid Metabolism | 21 | 14 | 1 | 1.305 | 0.746
Starch and Sucrose Metabolism | 98 | 62 | 3 | 5.779 | 0.937
Steroid Biosynthesis | 25 | 15 | 3 | 1.398 | 0.158
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.746 | 0.543
ALL | 353 | 236 | 17 | 21.997 | 0.897

Group¹: Stem CCT-A
Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.03 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.014 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.034 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.012 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.014 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.039 | 1
Fatty Acid Degradation | 34 | 30 | 0 | 0.061 | 1
Fatty Acid Elongation | 16 | 8 | 0 | 0.016 | 1
Glycerolipid Metabolism | 46 | 32 | 0 | 0.065 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.095 | 1
Linoleic Acid Metabolism | 12 | 6 | 0 | 0.012 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.028 | 1
Starch and Sucrose Metabolism | 98 | 61 | 1 | 0.124 | 0.117
Steroid Biosynthesis | 25 | 16 | 0 | 0.032 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.016 | 1
ALL | 353 | 235 | 1 | 0.477 | 0.382

Group¹: Stem CCT-AB
Alpha-linolenic Acid Metabolism | 26 | 15 | 1 | 0.486 | 0.39
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.227 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 1 | 0.551 | 0.429
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.194 | 1
Ether Lipid Metabolism | 11 | 7 | 1 | 0.227 | 0.206
Fatty Acid Biosynthesis | 32 | 19 | 1 | 0.615 | 0.465
Fatty Acid Degradation | 34 | 30 | 1 | 0.972 | 0.628
Fatty Acid Elongation | 16 | 8 | 0 | 0.259 | 1
Glycerolipid Metabolism | 46 | 32 | 0 | 1.037 | 1
Glycerophospholipid Metabolism | 64 | 47 | 2 | 1.523 | 0.453
Linoleic Acid Metabolism | 12 | 6 | 0 | 0.194 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.454 | 1
Starch and Sucrose Metabolism | 98 | 61 | 1 | 1.976 | 0.866
Steroid Biosynthesis | 25 | 16 | 1 | 0.518 | 0.41
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.259 | 0.232
ALL | 353 | 235 | 8 | 7.613 | 0.494

Group¹: Stem CCT-ABC
Alpha-linolenic Acid Metabolism | 26 | 15 | 3 | 1.546 | 0.196
Arachidonic Acid Metabolism | 10 | 7 | 1 | 0.721 | 0.533
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 2 | 1.752 | 0.535
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.618 | 1
Ether Lipid Metabolism | 11 | 7 | 2 | 0.721 | 0.157
Fatty Acid Biosynthesis | 32 | 19 | 1 | 1.958 | 0.874
Fatty Acid Degradation | 34 | 30 | 4 | 3.091 | 0.374
Fatty Acid Elongation | 16 | 8 | 1 | 0.824 | 0.581
Glycerolipid Metabolism | 46 | 32 | 1 | 3.297 | 0.969
Glycerophospholipid Metabolism | 64 | 47 | 9 | 4.843 | 0.048
Linoleic Acid Metabolism | 12 | 6 | 0 | 0.618 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 1.443 | 1
Starch and Sucrose Metabolism | 98 | 61 | 9 | 6.286 | 0.172
Steroid Biosynthesis | 25 | 16 | 1 | 1.649 | 0.825
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 1 | 0.824 | 0.581
ALL | 353 | 235 | 29 | 24.215 | 0.176

Group¹: Stem trans-A
Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.006 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.003 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.006 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.002 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.003 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.007 | 1
Fatty Acid Degradation | 34 | 30 | 0 | 0.011 | 1
Fatty Acid Elongation | 16 | 8 | 0 | 0.003 | 1
Glycerolipid Metabolism | 46 | 32 | 0 | 0.012 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.018 | 1
Linoleic Acid Metabolism | 12 | 6 | 0 | 0.002 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.005 | 1
Starch and Sucrose Metabolism | 98 | 61 | 0 | 0.023 | 1
Steroid Biosynthesis | 25 | 16 | 0 | 0.006 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.003 | 1
ALL | 353 | 235 | 0 | 0.088 | 1

Group¹: Stem trans-AB
Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.168 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.078 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.19 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.067 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.078 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.213 | 1
Fatty Acid Degradation | 34 | 30 | 0 | 0.336 | 1
Fatty Acid Elongation | 16 | 8 | 1 | 0.09 | 0.086
Glycerolipid Metabolism | 46 | 32 | 0 | 0.358 | 1
Glycerophospholipid Metabolism | 64 | 47 | 0 | 0.526 | 1
Linoleic Acid Metabolism | 12 | 6 | 0 | 0.067 | 1
Sphingolipid Metabolism | 21 | 14 | 0 | 0.157 | 1
Starch and Sucrose Metabolism | 98 | 61 | 1 | 0.683 | 0.498
Steroid Biosynthesis | 25 | 16 | 0 | 0.179 | 1
Synthesis/Degradation of Ketone Bodies | 8 | 8 | 0 | 0.09 | 1
ALL | 353 | 235 | 2 | 2.632 | 0.743

Group¹: Stem trans-ABC
Alpha-linolenic Acid Metabolism | 26 | 15 | 0 | 0.601 | 1
Arachidonic Acid Metabolism | 10 | 7 | 0 | 0.28 | 1
Biosynthesis of Unsaturated Fatty Acids | 33 | 17 | 0 | 0.681 | 1
Cutin, Suberine, and Wax Biosynthesis | 10 | 6 | 0 | 0.24 | 1
Ether Lipid Metabolism | 11 | 7 | 0 | 0.28 | 1
Fatty Acid Biosynthesis | 32 | 19 | 0 | 0.761 | 1
Fatty Acid Degradation | 34 | 30 | 1 | 1.202 | 0.707
Fatty Acid Elongation | 16 | 8 | 1 | 0.32 | 0.279
Glycerolipid Metabolism | 46 | 32 | 1 | 1.282 | 0.73
Glycerophospholipid Metabolism | 64 | 47 | 1 | 1.883 | 0.854
12
6
0
0.24
1
21
14
0
0.561
1
98
61
4
2.444
0.228
25
16
1
0.641
0.48
8
8
1
0.32
0.279
353
235
8
9.414
0.73
Tissue, candidate, and level of list.
155
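Within each group of Table C.17, the Overlap (exp) column scales with the number of assayed genes. For example, every StemCCTABC row works out to roughly 0.103 expected candidates per assayed gene (1.752/17, 24.215/235). That pattern is consistent with a hypergeometric null, under which FET p-values like those in the last column can be computed. The stdlib sketch below implements that calculation. Two things are assumptions, not the author's stated method: the one-sided (enrichment) direction and the function and parameter names. The genome-wide totals are not given in this section, so the example uses toy numbers rather than reproducing a table row.

```python
from math import comb


def overlap_test(n_universe, n_candidates, n_in_pathway, n_observed):
    """One-sided Fisher's exact test for candidate-gene enrichment in a pathway.

    n_universe:    all assayed genes (the background set)
    n_candidates:  candidate genes among the assayed genes
    n_in_pathway:  assayed genes annotated to the pathway
    n_observed:    candidate genes that fall in the pathway
    Returns (expected overlap, P(overlap >= n_observed)) under the
    hypergeometric null of drawing pathway genes without replacement.
    """
    expected = n_in_pathway * n_candidates / n_universe
    total = comb(n_universe, n_in_pathway)
    # sum hypergeometric probabilities from the observed overlap upward
    hits = sum(
        comb(n_candidates, x) * comb(n_universe - n_candidates, n_in_pathway - x)
        for x in range(n_observed, min(n_candidates, n_in_pathway) + 1)
    )
    return expected, hits / total
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-sized universes. In R, fisher.test(..., alternative = "greater") on the matching 2 × 2 table should give the same one-sided p-value.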
Table C.18: Significantly enriched and depleted GO terms from CCT and trans only gene lists including tissue, group, accession, description, counts, rate of occurrence, and FDR corrected p-values.
Group¹ | GO description | Cand. genes in acc. | Genes in acc. | Prop. cand. genes | Prop. assayed genes | FDR
Leaf-CCTABC | chloroplast | 135 | 937 | 0.144 | 0.071 | 0.002
Leaf-CCTABC | plastid | 146 | 1062 | 0.137 | 0.081 | 0.007
Leaf-CCTABC | thylakoid | 35 | 171 | 0.205 | 0.013 | 0.012
Leaf-CCTABC | chloroplast thylakoid membrane | 26 | 115 | 0.226 | 0.009 | 0.017
Leaf-CCTABC | DNA binding | 43 | 771 | 0.056 | 0.059 | 0.016
Ear-transA | chlorophyll biosynthetic process | 3 | 13 | 0.231 | 0.001 | 0.027
Ear-transAB | nucleic acid binding transcription factor activity | 26 | 228 | 0.114 | 0.017 | 0
Ear-transAB | sequence-specific DNA binding transcription factor activity | 26 | 228 | 0.114 | 0.017 | 0
Ear-transAB | regulation of transcription, DNA-dependent | 40 | 474 | 0.084 | 0.036 | 0
Ear-transAB | sequence-specific DNA binding | 20 | 160 | 0.125 | 0.012 | 0
Ear-transAB | chlorophyll biosynthetic process | 6 | 13 | 0.462 | 0.001 | 0.001
Ear-transAB | DNA binding | 49 | 798 | 0.061 | 0.06 | 0.013
Ear-transAB | biological process | 273 | 6578 | 0.042 | 0.499 | 0.022
Ear-transABC | nucleic acid binding transcription factor activity | 45 | 228 | 0.197 | 0.017 | 0
Ear-transABC | sequence-specific DNA binding transcription factor activity | 45 | 228 | 0.197 | 0.017 | 0
Ear-transABC | regulation of transcription, DNA-dependent | 69 | 474 | 0.146 | 0.036 | 0.003
Ear-transABC | chlorophyll biosynthetic process | 7 | 13 | 0.538 | 0.001 | 0.012
Ear-transABC | sequence-specific DNA binding | 29 | 160 | 0.181 | 0.012 | 0.02
Leaf-transAB | ribosome | 30 | 294 | 0.102 | 0.022 | 0
Leaf-transAB | cell division | 11 | 77 | 0.143 | 0.006 | 0.03
Leaf-transAB | microtubule | 9 | 60 | 0.15 | 0.005 | 0.048
Leaf-transAB | structural constituent of ribosome | 20 | 224 | 0.089 | 0.017 | 0.048
Leaf-transABC | cell division | 21 | 77 | 0.273 | 0.006 | 0.005
¹ Tissue, candidate, and level of list.
² Under-represented GO term.
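In Table C.18, Prop. cand. genes is the number of candidate genes annotated to a GO term divided by the number of assayed genes in that term. For example, 135/937 ≈ 0.144 for chloroplast in Leaf-CCTABC. The FDR column holds corrected p-values, but this section does not say which correction produced them. The sketch below therefore assumes the Benjamini-Hochberg step-up procedure [92], written to agree with R's p.adjust(p, method = "BH"). The function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prop_candidates(cand_in_term, genes_in_term):
    """Fraction of a GO term's assayed genes that are candidates."""
    return cand_in_term / genes_in_term
```

For instance, bh_adjust([0.01, 0.04, 0.03, 0.005]) returns approximately [0.02, 0.04, 0.04, 0.02].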
Appendix D
Characterization of domestication traits for selection candidate gene Zea agamous2
D.1 Foreword
This appendix details unpublished work characterizing a selected gene in maize, Zea agamous2 (zag2), a homolog of the Arabidopsis thaliana gene Agamous. I carried out this work, with other members of the Doebley Lab contributing to genotyping and phenotyping.
D.2 Introduction
Multiple studies have sought to identify the signature of selection, both artificial and natural, in evolving species [20, 35, 111, 112]. In maize, recent studies have examined the signature of selection both gene by gene [113, 114] and in genome-wide scans [55]. Knowing that a gene was under selection during domestication, however, says little about its phenotypic impact, because population genetic analyses do not associate genotypes with phenotypes. Protein domain annotation and gene ontology tools can give some indication of the phenotypic effect of a selected gene, but a concrete, empirically supported association between the gene and a phenotype is still needed.
One gene identified as a target of artificial selection in a recent study [114] is a known homolog of Agamous from Arabidopsis thaliana. This Agamous homolog (Zea agamous2, or zag2) is located on chromosome 3 at ∼137.2 megabases (AGPv2). The translated zag2 protein is 258 amino acids long. Downstream of the highly conserved MADS-box domain, it shares approximately 45% identity and 60% similarity with the Arabidopsis Agamous protein [115]. In Arabidopsis thaliana, Agamous expression is associated with the carpels of the flower, and in maize zag2 appears to be expressed exclusively in the carpels of developing ears [116]. The expression of zag2 mRNA in developing ears suggests a likely effect on domestication phenotypes of the female inflorescence.
Our study of zag2 used two approaches. First, we generated a set of recombinant chromosome near isogenic lines (RCNILs) with recombination breakpoints between zag2 and either the nearest upstream or the nearest downstream gene. RCNILs were genotyped with three markers (upstream, at the gene, and downstream) to locate each recombination breakpoint relative to zag2. Lines were then planted and phenotyped in multiple environments for a large number of traits, focused on the ear but also including other plant and tassel traits. Second, a transgenic RNAi construct carrying a portion of the zag2 gene was transformed into maize and backcrossed to two maize inbred lines. We assessed percent fill as a proxy for sterility and tested for presence of the construct by resistance to the BASTA herbicide. Neither experiment produced evidence of a concrete link between zag2 and a domestication phenotype.
D.3 Methods
D.3.1 RCNILs
In the winter and summer of 2009, we screened 1,710 individuals drawn from a heterogeneous inbred family that was heterozygous at zag2, using the markers umc1102 and PZD00100. This screen identified thirteen individuals with recombination breakpoints between the upstream and downstream genes. Recombinant individuals were selfed, and their progeny were genotyped with the same markers in the winter 2010 season. Homozygous individuals were identified and selfed again to produce the founding members of the RCNILs. RCNIL seed was then used in subsequent summers for seed increase and replicated field block trials.
Genomic DNA was also extracted from founding RCNIL individuals and genotyped at the zag2 coding sequence with PZD00013.3 (a TaqMan SNP marker) and ZHL0285-ZHL0286 (an indel marker). We classified RCNILs by breakpoint location (upstream or downstream of zag2) and by genotype at zag2 (maize or teosinte). This yielded four recombinant NIL classes and two control NIL classes.
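The four recombinant classes come from two breakpoint positions crossed with two zag2 alleles (2 × 2). The two non-recombinant classes are the controls. A minimal sketch of this classification follows. The "M"/"T" genotype codes, the function name, and the class labels are hypothetical, and each line is assumed to be homozygous at the upstream marker, at zag2, and at the downstream marker.

```python
from typing import Tuple


def classify_nil(upstream: str, at_zag2: str, downstream: str) -> Tuple[str, str]:
    """Classify a homozygous NIL from genotypes at three ordered markers.

    Genotypes are "M" (maize) or "T" (teosinte). Returns
    (breakpoint side, zag2 allele); the side is "none" for control NILs.
    """
    for g in (upstream, at_zag2, downstream):
        if g not in ("M", "T"):
            raise ValueError(f"unexpected genotype code: {g!r}")
    allele = "maize" if at_zag2 == "M" else "teosinte"
    up_switch = upstream != at_zag2      # genotype changes between upstream marker and zag2
    down_switch = downstream != at_zag2  # genotype changes between zag2 and downstream marker
    if up_switch and down_switch:
        # switches on both sides would imply a double crossover around zag2
        raise ValueError("breakpoints on both sides of zag2")
    if up_switch:
        return ("upstream", allele)
    if down_switch:
        return ("downstream", allele)
    return ("none", allele)
```

For example, classify_nil("T", "M", "M") returns ("upstream", "maize"), and classify_nil("T", "T", "T") returns ("none", "teosinte"), which corresponds to a teosinte control NIL.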
Phenotyping blocks consisted of RCNILs and several control NILs that were homozygous maize or homozygous teosinte across the entire zag2 region. Lines were planted in randomized twelve-plant plots in four blocks in each of the summers of 2010 and 2011 at the West Madison Agricultural Research Station (WMARS). Thirteen plant architecture traits and seven ear traits (Table D.1) were measured for up to five plants per phenotyping block.
Phenotype measurements were fit to a basic linear mixed model (Equation D.1) in R [91] using the lme4 package. The model included only the RCNIL line ($a_i$) as an explanatory variable and the block as a random effect ($b_j$). X and Y field position were omitted because the blocks were small and positional variation within them seemed unlikely to be significant.

$$y_{ijk} = \mu + a_i + b_j + e_{ijk} \qquad \text{(D.1)}$$

Here $y_{ijk}$ is the phenotype of plant $k$ of line $i$ in block $j$, $\mu$ is the overall mean, and $e_{ijk}$ is the residual error.
After fitting the model, we extracted the fixed-effect estimates and standard errors for each line. We then examined whether these estimates, which summarize each line's phenotype, were associated with NIL class.
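The sketch below illustrates Equation D.1 and the sorting step used in the Results. It is a stdlib-only illustration, not the lme4 analysis. It simulates phenotypes as mu + a_i + b_j + e_ijk and estimates each line by its mean. That estimate coincides with the mixed-model estimate of mu + a_i only under an assumed balanced layout, with the same number of plants of every line in every block. The standard errors from lme4 would additionally reflect the block variance, and they are not reproduced here. The function names, line labels, and effect sizes are hypothetical, and block effects are passed in as fixed numbers for simplicity rather than drawn at random.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(line_effects, block_effects, plants_per_plot, mu=0.0, sd_e=1.0, seed=1):
    """Simulate y_ijk = mu + a_i + b_j + e_ijk for every line i, block j, plant k.

    Block effects are supplied as given numbers (not random draws) for simplicity.
    """
    rng = random.Random(seed)
    rows = []
    for line, a in line_effects.items():
        for block, b in block_effects.items():
            for _ in range(plants_per_plot):
                rows.append((line, block, mu + a + b + rng.gauss(0.0, sd_e)))
    return rows


def line_estimates(rows):
    """Per-line means; in a balanced layout these equal the GLS estimates of mu + a_i."""
    by_line = defaultdict(list)
    for line, _block, y in rows:
        by_line[line].append(y)
    return {line: mean(ys) for line, ys in by_line.items()}


def sorted_estimates(estimates):
    """Lines ordered from least to greatest estimate, as in the Results barplots."""
    return sorted(estimates.items(), key=lambda kv: kv[1])
```

With lme4 itself, the corresponding model would be written along the lines of lmer(y ~ line + (1 | block), data = d), using hypothetical column names. The fixed-effect estimates are read with fixef() and their standard errors from summary().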
D.3.2 Transgenic RNAi lines
A zag2 RNA interference (RNAi) construct was developed and introduced into maize. Thirteen insertion events of the RNAi construct were recovered and crossed to the maize inbreds B73 and A682. The resulting progeny were planted in the summer of 2009, and ears were harvested for phenotyping. We scored the percent fill of ears to assess the sterility of individuals with and without the RNAi construct. The construct carried a BASTA herbicide resistance gene, so presence or absence of the construct could be scored by BASTA herbicide treatment.
In total, 275 individuals, both BASTA resistant (construct present) and susceptible (construct absent), were harvested and scored for the sterility phenotype. Percent fill was estimated in a randomized, blind manner so that knowledge of an individual's construct genotype could not bias scoring. Phenotypes were analyzed with simple t-tests in R [91].
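The text does not say which t-test variant was used. Assuming R's default t.test, which is Welch's unequal-variance test, the test statistic for one event-by-background comparison can be sketched as below. The function name is hypothetical. The p-value would then come from the t distribution with the returned degrees of freedom, for example R's pt() or, in Python, scipy.stats.ttest_ind(x, y, equal_var=False). That step is omitted here to avoid a third-party dependency.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    x: percent-fill scores of construct-positive (BASTA resistant) plants
    y: percent-fill scores of construct-negative (susceptible) plants
    Each group needs at least two observations.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

With made-up scores, welch_t([20, 25, 15], [98, 97, 99]) gives t ≈ -26.5 on about 2.16 degrees of freedom.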
Table D.1: Trait abbreviations and descriptions from the zag2 experiment.
Trait abbreviation | Description
CULM | Culm diameter
BARE | Barren nodes
BRNO | Number of nodes with silks
LWID | Leaf width
LCS | Length of central spike
TBN | Tassel branch number
EAHT | Ear height
PLHT | Plant height
TILL | Tillering index
BRLH | Branch length including ear
NODE | Nodes on lateral branch
LBIL | Lateral branch internode length
PROL | Prolificacy
FILL | Percent fill
EARL | Ear length
EARD | Ear diameter
KRN | Kernel row number
CUPR | Cupules per rank
STAM | Percent staminate spikelets
KW | Single kernel weight
D.4 Results
D.4.1 RCNILs
The fixed-effect estimates and standard errors were sorted from least to greatest, plotted as barplots, and inspected for association with RCNIL type. RCNIL type was defined by genotype upstream of, at, and downstream of zag2. Although a few individual RCNILs differed from the others, RCNIL types did not cluster into clearly differentiated phenotype groups for any of the thirteen plant and seven ear phenotypes. In general, the phenotype estimates for the maize and teosinte control NILs also did not separate cleanly from each other. As an example, Figure D.1 shows the RCNIL estimates for single kernel weight sorted from least to greatest. Although the maize and teosinte control NILs are not intermingled, the four RCNIL types do not cluster by genotype. Additionally, some RCNILs have lower phenotype estimates than the maize control NILs. This suggests that if zag2 influences kernel weight, it does so in an unexpected, underdominant manner.
D.4.2 Transgenic RNAi lines
Percent fill was generally high in transgenic plants. The two maize backgrounds (B73 and A682) did not differ significantly in percent fill (t-test, p = 0.525). Data were collected for only three RNAi transformation events in both the A682 and B73 backgrounds. Only one of these three gave a consistent result across backgrounds: event 39, which had no effect in either (Table D.2). This suggests that the effect of an event depends on genetic background. Of the fifteen combinations of transformation event and maize background, only four showed a significant difference in percent fill between resistant (construct-positive) and susceptible (construct-negative) plants. Three of these were large shifts, with percent fill more than 60 percentage points lower in construct-positive plants. The fourth (B73, event 47) was a more moderate 11-point change, with higher fill in construct-positive plants.
Figure D.1: Single kernel weight estimates for zag2 RCNILs. RCNIL class is indicated in each bar, and error bars show the standard error. Maize and teosinte NILs are not intermingled; however, there is also no clear separation of the RCNIL types (t1, t2, t3, t4) when lines are sorted by estimated phenotype. Furthermore, some RCNILs have a lower phenotype than either of the control NILs, suggesting that some form of underdominance may be at work.
Table D.2: Zag2 transgenic RNAi insertion event, background, phenotype, and t-test
p-value.
Maize background | Event | Percent fill (resistant) | Percent fill (susceptible) | p-value
A682 | 17 | 97.8% | 98.9% | 6.63e-01
B73 | 23 | 90.0% | 91.0% | 6.49e-01
B73 | 24 | 100.0% | 97.8% | 3.47e-01
B73 | 33 | 23.0% | 97.5% | 2.94e-04
A682 | 35 | 20.0% | 94.0% | 6.01e-04
B73 | 35 | 98.0% | 98.8% | 6.87e-01
A682 | 39 | 98.0% | 90.0% | 1.15e-01
B73 | 39 | 93.3% | 95.0% | 4.89e-01
B73 | 43 | 32.2% | 95.0% | 1.17e-07
A682 | 45 | 100.0% | 92.9% | 2.53e-01
A682 | 46 | 95.6% | 94.4% | 8.29e-01
A682 | 47 | 84.4% | 83.0% | 8.56e-01
B73 | 47 | 100.0% | 88.9% | 2.75e-03
B73 | 49 | 94.4% | 86.7% | 1.68e-01
B73 | 50 | 91.1% | 90.0% | 3.47e-01
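The counts quoted in the Results can be rechecked directly from Table D.2: four significant combinations out of fifteen, three of them shifts of more than 60 percentage points, and one consistent event among the three scored in both backgrounds. The sketch below hard-codes the table values. The 0.05 significance threshold is an assumption, since the text does not state one, and the helper names are hypothetical.

```python
# (background, event, % fill resistant, % fill susceptible, t-test p-value), from Table D.2
TABLE_D2 = [
    ("A682", 17, 97.8, 98.9, 6.63e-01),
    ("B73", 23, 90.0, 91.0, 6.49e-01),
    ("B73", 24, 100.0, 97.8, 3.47e-01),
    ("B73", 33, 23.0, 97.5, 2.94e-04),
    ("A682", 35, 20.0, 94.0, 6.01e-04),
    ("B73", 35, 98.0, 98.8, 6.87e-01),
    ("A682", 39, 98.0, 90.0, 1.15e-01),
    ("B73", 39, 93.3, 95.0, 4.89e-01),
    ("B73", 43, 32.2, 95.0, 1.17e-07),
    ("A682", 45, 100.0, 92.9, 2.53e-01),
    ("A682", 46, 95.6, 94.4, 8.29e-01),
    ("A682", 47, 84.4, 83.0, 8.56e-01),
    ("B73", 47, 100.0, 88.9, 2.75e-03),
    ("B73", 49, 94.4, 86.7, 1.68e-01),
    ("B73", 50, 91.1, 90.0, 3.47e-01),
]


def significant_shifts(rows, alpha=0.05):
    """(background, event, resistant minus susceptible fill) for rows with p < alpha."""
    return [(bg, ev, round(res - sus, 1)) for bg, ev, res, sus, p in rows if p < alpha]


def events_in_both_backgrounds(rows, alpha=0.05):
    """Map each event scored in both backgrounds to {background: significant?}."""
    calls = {}
    for bg, ev, _res, _sus, p in rows:
        calls.setdefault(ev, {})[bg] = p < alpha
    return {ev: c for ev, c in calls.items() if len(c) == 2}
```

Running these returns shifts of -74.5, -74.0, -62.8, and +11.1 points. Among events 35, 39, and 47, only event 39 receives the same call in both backgrounds.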
D.5 Discussion
The RCNIL phenotype measurements do not reveal a clear phenotypic effect of the zag2 gene. Given their standard errors, the RCNIL estimates for the maize and teosinte control lines were never significantly different from each other. Furthermore, the remaining four genotype classes, distinguished by genotype upstream of, at, and downstream of zag2, did not cluster into distinct groups by phenotype. Overall, there is little, if any, evidence that zag2 affects any of the 20 measured phenotypes.
Likewise, reducing zag2 expression with transgenic RNAi constructs failed to provide compelling evidence for an effect on percent fill of the ear. Relatively few zag2 RNAi transformation events increased sterility (measured by percent fill). The effect of a given event on sterility appears to depend strongly on genetic background, since fewer than half of the events assessed in multiple maize backgrounds gave the same result in each. Most significant results were drastic increases in sterility, suggesting a major genetic dysfunction. We conclude that the zag2 RNAi constructs largely had no significant effect, punctuated by several cases of severe genetic dysfunction. Furthermore, the inconsistent effects of specific transformation events across maize backgrounds seem unlikely to be related to zag2.
We failed to identify a phenotypic effect of zag2, despite evidence from the literature that zag2 is expressed in the ear [116] and encodes a homolog of a known floral development gene in Arabidopsis [115]. Perhaps zag2 controls a phenotype that was under selection during maize domestication but that we did not measure. Work by Schmidt et al. [116] shows that zag2 and another Agamous homolog (zag1) are expressed in the endosperm after pollination, suggesting a potential role in kernel quality and composition. Although we measured kernel weight, we did not assess many other factors that contribute to kernel quality and desirability. These include the ratio of hard to soft endosperm and protein, oil, and starch content.
A potential complicating factor in our analysis of zag2 is the existence of three additional Agamous homologs in maize [115]. These homologs also share a high degree of identity with Arabidopsis Agamous and, consequently, a high degree of identity and similarity with each other. Zea mays Mads1 (zmm1) has particularly high protein identity with zag2, at over 95%. This high identity among the maize Agamous homologs, combined with their expression in the same tissues [116], is concerning because it suggests functional as well as sequence conservation. For example, suppose zmm1 can substitute functionally for zag2 in the developing ear. Then an experiment looking for an ear phenotype (such as the RCNIL experiment) would need to account for the genotype at both zag2 and zmm1.
The failure to associate a domestication phenotype with zag2 illustrates the difficulty of using a population genetics approach to identify interesting candidate genes. Population genetic evidence indicates that zag2 was under selection during the maize domestication event, and the gene is homologous to a known floral development gene in Arabidopsis. An effect on a domestication phenotype of the ear therefore seemed quite likely; however, we saw no noticeable effects on the female inflorescence in these experiments. Our lab has encountered similar difficulties in associating phenotypes with two other selection candidate genes. The Prolamin-box Binding Factor1 gene was extensively phenotyped for plant architecture and ear traits (unpublished data) before a slight difference in kernel size and density was finally identified [14]. Additionally, the Zea agamous-like1 gene appears to have a significant effect on days to anthesis, or flowering time, in maize (unpublished data); however, flowering time is not a standard domestication trait. This study highlights the difficulty of associating a phenotype with a selection candidate gene and offers a word of caution for future studies with that goal.
References
[1] Gaines T, Zhang W, Wang D, Bukun B, Chisholm ST, et al. (2010) Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proceedings of the
National Academy of Sciences 107: 1029–34.
[2] Gompel N, Prud’homme B, Wittkopp PJ, Kassner V, Carroll SB (2005) Chance
caught on the wing: cis-regulatory evolution and the origin of pigment patterns in
Drosophila. Nature 433: 481–7.
[3] Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional
transposon insertion in the maize domestication gene tb1. Nature Genetics 43:
1160–3.
[4] Wills DM, Whipple CJ, Takuno S, Kursel LE, Shannon LM, et al. (2013) From
Many, One: Genetic Control of Prolificacy during Maize Domestication. PLoS
Genetics 9: e1003604.
[5] Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, et al. (2005) The origin
of the naked grains of maize. Nature 436: 714–9.
[6] Sun L, Li X, Fu Y, Zhu Z, Tan L, et al. (2013) GS6, a member of the GRAS gene
family, negatively regulates grain size in rice. Journal of Integrative Plant Biology: 1–37.
[7] Olsen KM, Wendel JF (2013) A bountiful harvest: genomic insights into crop domestication phenotypes. Annual Review of Plant Biology 64: 47–70.
[8] Doebley J (2004) The genetics of maize evolution. Annual Review of Genetics 38:
37–59.
[9] Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, et al. (2009) The B73 maize
genome: complexity, diversity, and dynamics. Science 326: 1112–5.
[10] Allaby RG, Fuller DQ, Brown TA (2008) The genetic expectations of a protracted
model for the origins of domesticated crops. Proceedings of the National Academy
of Sciences 105: 13982–6.
[11] Pickersgill B (2007) Domestication of plants in the Americas: insights from
Mendelian and molecular genetics. Annals of Botany 100: 925–40.
[12] Carroll SB (2008) Evo-devo and an expanding evolutionary synthesis: a genetic
theory of morphological evolution. Cell 134: 25–36.
[13] Wittkopp PJ, Kalay G (2012) Cis-regulatory elements: molecular mechanisms and
evolutionary processes underlying divergence. Nature Reviews Genetics 13: 59–69.
[14] Lang Z, Wills D, Lemmon Z, Shannon L, Bukowski R, et al. (2014) Defining the
role of prolamin-box binding factor1 gene during maize domestication. The Journal
of Heredity: In Press.
[15] Konishi S, Izawa T, Lin SY, Ebana K, Fukuta Y, et al. (2006) An SNP caused loss
of seed shattering during rice domestication. Science 312: 1392–6.
[16] Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, et al. (2000) fw2.2 : a quantitative trait locus key to the evolution of tomato fruit size. Science 289: 85–8.
[17] Rapp RA, Haigler CH, Flagel L, Hovav RH, Udall JA, et al. (2010) Gene expression
in developing fibres of Upland cotton (Gossypium hirsutum L.) was massively altered
by domestication. BMC Biology 8: 139.
[18] Swanson-Wagner R, Briskine R, Schaefer R, Hufford MB, Ross-Ibarra J, et al. (2012)
Reshaping of the maize transcriptome by domestication. Proceedings of the National
Academy of Sciences 109: 11878–83.
[19] Koenig D, Jiménez-Gómez JM, Kimura S, Fulop D, Chitwood DH, et al. (2013)
Comparative transcriptomics reveals patterns of selection in domesticated and wild
tomato. Proceedings of the National Academy of Sciences 110: E2655–62.
[20] Emerson JJ, Hsieh LC, Sung HM, Wang TY, Huang CJ, et al. (2010) Natural
selection on cis and trans regulation in yeasts. Genome Research 20: 826–36.
[21] McManus CJ, Coolon JD, Duff MO, Eipper-Mains J, Graveley BR, et al. (2010)
Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Research 20:
816–25.
[22] White MA, Stubbings M, Dumont BL, Payseur BA (2012) Genetics and evolution
of hybrid male sterility in house mice. Genetics 191: 917–34.
[23] Alem S, Streiff R, Courtois B, Zenboudji S, Limousin D, et al. (2013) Genetic
architecture of sensory exploitation: QTL mapping of female and male receiver
traits in an acoustic moth. Journal of Evolutionary Biology 26: 2581–96.
[24] Miller CT, Glazer AM, Summers BR, Blackman BK, Norman AR, et al. (2014)
Modular Skeletal Evolution in Sticklebacks Is Controlled by Additive and Clustered
Quantitative Trait Loci. Genetics: In Press.
[25] Shannon LM (2012) The Genetic Architecture of Maize Domestication and Range
Expansion. Ph.D. thesis, University of Wisconsin – Madison.
[26] Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of sunflower. Genetics 176: 2589–99.
[27] Paterson AH, Damon S, Hewitt JD, Zamir D, Rabinowitch HD, et al. (1991)
Mendelian factors underlying quantitative traits in tomato: comparison across
species, generations, and environments. Genetics 127: 181–97.
[28] Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic
factors controlling domestication-related traits of rice using an F2 population of a
cross between Oryza sativa and O. rufipogon. Theoretical and Applied Genetics 98:
243–251.
[29] Peng J, Ronin Y, Fahima T, Röder MS, Li Y, et al. (2003) Domestication quantitative trait loci in Triticum dicoccoides, the progenitor of wheat. Proceedings of the
National Academy of Sciences 100: 2489–94.
[30] Cai W, Morishima H (2002) QTL clusters reflect character associations in wild and
cultivated rice. Theoretical and Applied Genetics 104: 1217–1228.
[31] Gyenis L, Yun SJ, Smith KP, Steffenson BJ, Bossolini E, et al. (2007) Genetic
architecture of quantitative trait loci associated with morphological and agronomic
trait differences in a wild by cultivated barley cross. Genome 50: 714–23.
[32] Simons KJ, Fellers JP, Trick HN, Zhang Z, Tai YS, et al. (2006) Molecular characterization of the major wheat domestication gene Q. Genetics 172: 547–55.
[33] Li C, Zhou A, Sang T (2006) Rice domestication by reducing shattering. Science
311: 1936–9.
[34] Cong B, Barrero LS, Tanksley SD (2008) Regulatory change in YABBY-like transcription factor led to evolution of extreme fruit size during tomato domestication.
Nature Genetics 40: 800–4.
[35] Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects
of artificial selection on the maize genome. Science 308: 1310–4.
[36] Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127: 1309–21.
[37] Briggs WH, McMullen MD, Doebley JF, Gaut BS (2007) Linkage mapping of domestication loci in a large maize teosinte backcross resource. Genetics 177: 1915–28.
[38] Doebley J, Stec A (1991) Genetic analysis of the morphological differences between
maize and teosinte. Genetics 129: 285–95.
[39] Whipple CJ, Kebrom TH, Weber AL, Yang F, Hall D, et al. (2011) Grassy Tillers1
Promotes Apical Dominance in Maize and Responds To Shade Signals in the
Grasses. Proceedings of the National Academy of Sciences 108: E506–12.
[40] Doebley J, Stec A, Hubbard L (1997) The evolution of apical dominance in maize.
Nature 386: 485–8.
[41] Clark RM, Nussbaum-Wagler T, Quijada P, Doebley J (2006) A distant upstream
enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and
inflorescent architecture. Nature Genetics 38: 594–7.
[42] Studer AJ, Doebley JF (2011) Do large effect QTL fractionate? A case study at
the maize domestication QTL teosinte branched1. Genetics 188: 673–81.
[43] Doebley J, Stec A (1993) Inheritance of the morphological differences between maize
and teosinte: comparison of results for two F2 populations. Genetics 134: 559–70.
[44] Littell R, Milliken G, Stroup W, Wolfinger R (1996) SAS system for mixed models.
SAS Institute, Cary, NC., 2nd edition.
[45] Broman KW, Wu H, Sen S, Churchill G (2003) R/qtl: QTL mapping in experimental
crosses. Bioinformatics 19: 889–890.
[46] Broman KW, Sen S (2009) A Guide to QTL Mapping with R/qtl. Statistics for Biology and Health. New York, NY: Springer New York. doi:10.1007/978-0-387-92125-9.
URL http://www.springerlink.com/index/10.1007/978-0-387-92125-9.
[47] Kosambi DD (1944) The Estimation of Map Distances from Recombination Values.
Annals of Eugenics 12: 172–175.
[48] Orr HA (1998) The Population Genetics of Adaptation: The Distribution of Factors
Fixed during Adaptive Evolution. Evolution 52: 935.
[49] Beavis WD (1998) QTL Analyses: Power, Precision, and Accuracy. In: Paterson
AH, editor, Molecular Dissection of Complex Traits, New York, NY: CRC Press,
chapter 10. 1 edition, pp. 145–162.
[50] Hung HY, Shannon LM, Tian F, Bradbury PJ, Chen C, et al. (2012) ZmCCT and
the genetic basis of day-length adaptation underlying the postdomestication spread
of maize. Proceedings of the National Academy of Sciences 109: E1913–21.
[51] Yilmaz A, Nishiyama MY, Fuentes BG, Souza GM, Janies D, et al. (2009) GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant
Physiology 149: 171–80.
[52] Yanofsky MF, Ma H, Bowman JL, Drews GN, Feldmann KA, et al. (1990) The
protein encoded by the Arabidopsis homeotic gene agamous resembles transcription
factors. Nature 346: 35–9.
[53] Schwarz-Sommer Z, Huijser P, Nacken W, Saedler H, Sommer H (1990) Genetic
Control of Flower Development by Homeotic Genes in Antirrhinum majus. Science
250: 931–6.
[54] Smaczniak C, Immink RGH, Angenent GC, Kaufmann K (2012) Developmental and
evolutionary diversity of plant MADS-domain factors: insights from recent studies.
Development 139: 3081–98.
[55] Hufford MB, Xu X, van Heerwaarden J, Pyhäjärvi T, Chia JM, et al. (2012) Comparative population genomics of maize domestication and improvement. Nature
Genetics 44: 808–11.
[56] Sekhon RS, Lin H, Childs KL, Hansey CN, Robin Buell C, et al. (2011) Genome-wide atlas of transcription through maize development. The Plant Journal: 1–11.
[57] Xue W, Xing Y, Weng X, Zhao Y, Tang W, et al. (2008) Natural variation in
Ghd7 is an important regulator of heading date and yield potential in rice. Nature
Genetics 40: 761–7.
[58] Li Y, Fan C, Xing Y, Jiang Y, Luo L, et al. (2011) Natural variation in GS5 plays
an important role in regulating grain size and yield in rice. Nature Genetics 43:
1266–9.
[59] Fan C, Xing Y, Mao H, Lu T, Han B, et al. (2006) GS3, a major QTL for grain
length and weight and minor QTL for grain width and thickness in rice, encodes a
putative transmembrane protein. Theoretical and Applied Genetics 112: 1164–71.
[60] Yu B, Lin Z, Li H, Li X, Li J, et al. (2007) TAC1, a major quantitative trait locus
controlling tiller angle in rice. The Plant Journal 52: 891–8.
[61] Jin J, Huang W, Gao JP, Yang J, Shi M, et al. (2008) Genetic control of rice plant
architecture under domestication. Nature Genetics 40: 1365–9.
[62] Yang Q, Li Z, Li W, Ku L, Wang C, et al. (2013) CACTA-like transposable element
in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proceedings of the National Academy of Sciences 110:
16969–74.
[63] Kermicle JL (2006) A selfish gene governing pollen-pistil compatibility confers reproductive isolation between maize relatives. Genetics 172: 499–506.
[64] Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust,
simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS
One 6: e19379.
[65] Carroll SB (2005) Evolution at two levels: on genes and form. PLoS Biology 3:
e245.
[66] Stern DL, Orgogozo V (2008) The loci of evolution: how predictable is genetic
evolution? Evolution 62: 2155–77.
[67] Springer NM, Stupar RM (2007) Allele-specific expression patterns reveal biases
and embryo-specific parent-of-origin effects in hybrid maize. The Plant Cell 19:
2391–402.
[68] Bell GDM, Kane NC, Rieseberg LH, Adams KL (2013) RNA-seq analysis of allele-specific expression, hybrid effects, and regulatory divergence in hybrids compared
with their parents from natural populations. Genome Biology and Evolution 5:
1309–23.
[69] Song G, Guo Z, Liu Z, Cheng Q, Qu X, et al. (2013) Global RNA sequencing
reveals that genotype-dependent allele-specific expression contributes to differential
expression in rice F1 hybrids. BMC Plant Biology 13: 221.
[70] Zhang X, Borevitz JO (2009) Global analysis of allele-specific expression in Arabidopsis thaliana. Genetics 182: 943–54.
[71] Tirosh I, Reikhav S, Levy AA, Barkai N (2009) A yeast hybrid provides insight into
the evolution of gene expression regulation. Science 324: 659–62.
[72] He F, Zhang X, Hu J, Turck F, Dong X, et al. (2012) Genome-wide Analysis of
Cis-regulatory Divergence between Species in the Arabidopsis Genus. Molecular
Biology and Evolution 29: 3385–3395.
[73] Schaefke B, Emerson JJ, Wang TY, Lu MYJ, Hsieh LC, et al. (2013) Inheritance of
gene expression level and selective constraints on trans- and cis-regulatory changes
in yeast. Molecular Biology and Evolution 30: 2121–33.
[74] Purugganan MD, Fuller DQ (2009) The nature of selection during plant domestication. Nature 457: 843–8.
[75] Zhong S, Joung JG, Zheng Y, Chen YR, Liu B, et al. (2011) High-throughput illumina
strand-specific RNA sequencing library preparation. Cold Spring Harbor Protocols
2011: 940–9.
[76] Wang X, Soloway PD, Clark AG (2011) A survey for novel imprinted genes in the
mouse placenta by mRNA-seq. Genetics 189: 109–22.
[77] Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–60.
[78] DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing
data. Nature Genetics 43: 491–8.
[79] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. (2010) The
Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation
DNA sequencing data. Genome Research 20: 1297–303.
[80] Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome. Genome Biology 10: R25.
[81] Storey JD (2002) A direct approach to false discovery rates. Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 64: 479–498.
[82] Lester RN (1989) Evolution under domestication involving disturbance of genic
balance. Euphytica 44: 125–132.
[83] Gross BL, Olsen KM (2010) Genetic perspectives on crop domestication. Trends in
Plant Science 15: 529–537.
[84] Burger JC, Chapman MA, Burke JM (2008) Molecular insights into the evolution
of crop plants. American Journal of Botany 95: 113–122.
[85] Dean RB, Dixon WJ (1951) Simplified Statistics for Small Numbers of Observations.
Analytical Chemistry 23: 636–638.
[86] Jin J, Zhang H, Kong L, Gao G, Luo J (2014) PlantTFDB 3.0: a portal for the
functional and evolutionary study of plant transcription factors. Nucleic Acids
Research 42: D1182–7.
[87] Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration
and interpretation of large-scale molecular data sets. Nucleic Acids Research 40:
D109–14.
[88] Kanehisa M (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic
Acids Research 28: 27–30.
[89] Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective
sweeps. Genome Research 20: 393–402.
[90] Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis
for RNA-seq: accounting for selection bias. Genome Biology 11: R14.
[91] R Development Core Team (2013) R: A language and environment for statistical
computing. URL http://www.r-project.org/.
[92] Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical
and powerful approach to multiple testing. Journal of the Royal Statistical Society
Series B (Methodological) 57: 289–300.
[93] Eichten SR, Briskine R, Song J, Li Q, Swanson-Wagner R, et al. (2013) Epigenetic
and genetic influences on DNA methylation variation in maize populations. The
Plant Cell 25: 2783–97.
[94] Duncan IW (2002) Transvection effects in Drosophila. Annual Review of Genetics
36: 521–56.
[95] Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, et al. (2012)
Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing. PLoS One
7: e33071.
[96] Springer NM, Ying K, Fu Y, Ji T, Yeh CT, et al. (2009) Maize inbreds exhibit high
levels of copy number variation (CNV) and presence/absence variation (PAV) in
genome content. PLoS Genetics 5: e1000734.
[97] Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, et al. (2012) Maize HapMap2
identifies extant variation from a genome in flux. Nature Genetics 44: 803–7.
[98] Tenaillon MI, U’Ren J, Tenaillon O, Gaut BS (2004) Selection versus demography:
a multilocus investigation of the domestication process in maize. Molecular Biology
and Evolution 21: 1214–25.
[99] Clark RM, Linton E, Messing J, Doebley JF (2004) Pattern of diversity in the
genomic region near the maize domestication gene tb1. Proceedings of the National
Academy of Sciences 101: 700–7.
[100] Hanning I, Baumgarten K, Schott K, Heldt HW (1999) Oxaloacetate transport into
plant mitochondria. Plant Physiology 119: 1025–32.
[101] Zoglowek C, Krömer S, Heldt HW (1988) Oxaloacetate and malate transport by
plant mitochondria. Plant Physiology 87: 109–15.
[102] Hunt HV, Denyer K, Packman LC, Jones MK, Howe CJ (2010) Molecular basis
of the waxy endosperm starch phenotype in broomcorn millet (Panicum miliaceum
L.). Molecular Biology and Evolution 27: 1478–94.
[103] Fan L, Bao J, Wang Y, Yao J, Gui Y, et al. (2009) Post-domestication selection in
the maize starch pathway. PLoS One 4: e7612.
[104] Park YJ, Nemoto K, Nishikawa T, Matsushima K, Minami M, et al. (2009) Waxy
strains of three amaranth grains raised by different mutations in the coding region.
Molecular Breeding 25: 623–635.
[105] Dussert Y, Remigereau MS, Fontaine MC, Snirc A, Lakis G, et al. (2013) Polymorphism pattern at a miniature inverted-repeat transposable element locus downstream of the domestication gene Teosinte-branched1 in wild and domesticated pearl
millet. Molecular Ecology 22: 327–40.
[106] Sugimoto K, Takeuchi Y, Ebana K, Miyao A, Hirochika H, et al. (2010) Molecular
cloning of Sdr4, a regulator involved in seed dormancy and domestication of rice.
Proceedings of the National Academy of Sciences 107: 5792–7.
[107] Weller JL, Liew LC, Hecht VFG, Rajandran V, Laurie RE, et al. (2012) A conserved
molecular basis for photoperiod adaptation in two temperate legumes. Proceedings
of the National Academy of Sciences 109: 21158–63.
[108] Zhu BF, Si L, Wang Z, Zhou Y, Zhu J, et al. (2011) Genetic control of a transition
from black to straw-white seed hull in rice domestication. Plant Physiology 155:
1301–11.
[109] Liu J, Van Eck J, Cong B, Tanksley SD (2002) A new class of regulatory genes
underlying the cause of pear-shaped tomato fruit. Proceedings of the National
Academy of Sciences 99: 13302–6.
[110] Gallavotti A, Zhao Q, Kyozuka J, Meeley RB, Ritter MK, et al. (2004) The role of
barren stalk1 in the architecture of maize. Nature 432: 630–5.
[111] Carling MD, Brumfield RT (2009) Speciation in Passerina buntings: introgression
patterns of sex-linked loci identify a candidate gene region for reproductive isolation.
Molecular Ecology 18: 834–47.
[112] Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, et al. (2012)
Population Genomics of sub-Saharan Drosophila melanogaster: African diversity
and non-African admixture. PLoS Genetics 8: e1003080.
[113] Zhao Q, Thuillet AC, Uhlmann NK, Weber AL, Rafalski JA, et al. (2008) The role
of regulatory genes during maize domestication: evidence from nucleotide polymorphism and gene expression. Genetics 178: 2133–43.
[114] Zhao Q, Weber AL, McMullen MD, Guill K, Doebley J (2011) MADS-box genes
of maize: frequent targets of selection during domestication. Genetics Research 93:
65–75.
[115] Theissen G, Strater T, Fischer A, Saedler H (1995) Structural characterization,
chromosomal localization and phylogenetic evaluation of two pairs of AGAMOUS-like MADS-box genes from maize. Gene 156: 155–66.
[116] Schmidt RJ, Veit B, Mandel MA, Mena M, Hake S, et al. (1993) Identification and
molecular characterization of ZAG1, the maize homolog of the Arabidopsis floral
homeotic gene AGAMOUS. The Plant Cell 5: 729–37.