* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Transcriptional profiling of wheat caryopsis development using
Point mutation wikipedia , lookup
X-inactivation wikipedia , lookup
Gene nomenclature wikipedia , lookup
RNA silencing wikipedia , lookup
Cancer epigenetics wikipedia , lookup
Pathogenomics wikipedia , lookup
Public health genomics wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Essential gene wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Epigenetics in learning and memory wikipedia , lookup
Protein moonlighting wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
RNA interference wikipedia , lookup
Non-coding RNA wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Primary transcript wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Microevolution wikipedia , lookup
Genome evolution wikipedia , lookup
Genome (book) wikipedia , lookup
History of genetic engineering wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Designer baby wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Gene expression programming wikipedia , lookup
Genomic imprinting wikipedia , lookup
Minimal genome wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Ridge (biology) wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Plant Mol Biol (2007) 63:651–668 DOI 10.1007/s11103-006-9114-y Transcriptional profiling of wheat caryopsis development using cDNA microarrays Debbie L. Laudencia-Chingcuanco Æ Boryana S. Stamova Æ Frank M. You Æ Gerard R. Lazo Æ Diane M. Beckles Æ Olin D. Anderson Received: 6 March 2006 / Accepted: 6 November 2006 / Published online: 9 January 2007 Springer Science+Business Media B.V. 2007 Abstract The expression of 7,835 genes in developing wheat caryopses was analyzed using cDNA arrays. Using a mixed model analysis of variance (ANOVA) method, 29% (2,237) of the genes on the array were identified to be differentially expressed at the 6 different time-points examined, which covers the developmental stages from coenocytic endosperm to physiological maturity. Comparison of genes differentially expressed between two time-points revealed a dynamic transcript accumulation profile with major re-programming events that occur at 3–7, 7–14 and 21–28 DPA. A k-means clustering algorithm grouped the differentially expressed genes into 10 clusters, revealing co-expression of genes involved in the same pathway such as carbohydrate and protein synthesis or preparation for desiccation. Functional annotation of genes that show peak expression at specific time-points Electronic supplementary material The online version of this article (doi: 10.1007/s11103-006-9114-y) contains supplementary material, which is available to authorized users. Boryana S. Stamova share first co-authorship D. L. Laudencia-Chingcuanco (&) G. R. Lazo O. D. Anderson Genomics and Gene Discovery, USDA-ARS-WRRC, 800 Buchanan Street, Albany, CA 94710, USA e-mail: [email protected] B. S. Stamova Genetic Resources Conservation Program, University of California-Davis, One Shields Avenue, Davis, CA 95616, USA F. M. You D. M. Beckles Plant Sciences Department, University of California-Davis, One Shields Avenue, Davis, CA 95616, USA correlated with the developmental events associated with the respective stages. Results provide information on the temporal expression during caryopsis development for a significant number of differentially expressed genes with unknown function. Keywords Wheat Endosperm Profiling Transcriptome Caryopsis Development Microarray cDNA ANOVA Introduction We are using cDNA microarrays to study global patterns of gene expression in developing wheat caryopsis. The caryopsis is the fruit of grasses in which the pericarp is fused to the seed coat at maturity and it is interchangeably referred to as the grain in cereals. The caryopsis consists of three major structures: embryo, endosperm and seed coat with the endosperm being the tissue of economic value. The endosperm is the primary site of storage of starch and protein, which are both important for global agriculture (Aquino et al. 1999; Triboi and Triboi-Blondel 2002). The endosperm starch and storage proteins are critical to bread and pasta quality and are exclusive to wheat such that no other cereal can be used similarly (Bechtel et al. 1990; Shewry et al. 2002). Wheat milling, baking, cooking and eating qualities are largely determined by the biochemical composition of the seed (Stoddard 1999a) and processed wheat starch and protein are also finding use as biopolymers (Davies et al. 2003; Woerdeman et al. 2004). To enhance wheat seed quality and functionality with the intention of broadening existing markets, it will be necessary to dissect the underlying 123 652 molecular mechanisms determining grain storage product accumulation (Davies et al. 2003; Francki and Appels 2002; Wilson et al. 2004). The physiological, biochemical and morphological changes that culminate during grain development determine the nutritional profile of the seed at harvest. These fundamental processes are almost certainly regulated at the level of transcription. Wheat endosperm development can be generally divided into five phases (Simmonds and O’Brien 1981): Phase I [0 days post-anthesis (DPA)] fertilization, Phase II (1–5 DPA) ‘‘coenocytic’’ endosperm, Phase III (6–13 DPA) cellularization and early grain filling, Phase IV (14–24 DPA) maximum grain filling, and Phase V (25–38 DPA) desiccation. Phase II is also commonly known as Grain Growth stage, Phase III and Phase IV as the grain filling stage, and Phase V as dry-down stage—where each stage describes the relationship between cell development and storage product accumulation. During Grain Growth, free nuclear division in the embryo sac produces the watery coenocytic tissue. Final cell number of the endosperm is established and differentiation into the outer aleurone layer and inner endosperm cells occurs. Grain filling is dominated by the rapid synthesis of starch and protein in the endosperm (Bechtel et al. 1990; Shewry and Halford 2002; Stoddard 1999b) and to a lesser degree, lipids. Cell division ceases and cell enlargement takes place to accommodate reserve storage. Dry down is characterized by cessation of most metabolic processes. The organ reaches maximal dry weight and the cells undergo a type of programmed cell death in preparation for dormancy (Olsen 2001; Olsen et al. 1999; Young and Gallie 2000). The changes in storage reserve accumulation during wheat grain maturation are well established; however, identifying key molecular determinants controlling carbon flux to the grain and the partitioning of carbon to starch and protein are more elusive. Much of our current knowledge is based on biochemical assays of protein and enzymatic activities of starch and protein biosynthesis during caryopsis development. Our understanding of how each pathway is controlled is complicated by the occurrence of multi-gene families encoding many of the enzymes in these biochemical pathways [reviewed in Morell and Myers (2005)], the interconnectedness of these different pathways (Giroux et al. 1994; Simmonds 1995; Triboi and TriboiBlondel 2002) and the strong influence of the environment on the amount and nature of the starch and protein made (Dupont and Altenbach 2003; Triboi et al. 2003). Gene networks and transcriptional regulators synchronously expressed with the biochemical 123 Plant Mol Biol (2007) 63:651–668 and physiological changes observed during grain development (Appels et al. 2003) may be informative. Classifying genes based on similarities or difference in transcript profile with phenotype can confirm existing knowledge, lead to the dissection and revelation of novel mechanisms determining nutrient partitioning, and generate new unbiased hypotheses (Leader 2005; Zhu et al. 2003). cDNA microarrays are used to provide a comprehensive description of transcript level in an organism after perturbation or during development [reviewed in Schaffer et al. (2000)]. They have been valuable in understanding aspects of grain growth and filling in cereals (Close et al. 2004; Gregersen et al. 2005; Sreenivasulu et al. 2004; Zhu et al. 2003). However, because of its recalcitrant genome (Langridge et al. 2001), this type of transcriptional analysis has only just seen wider utility in wheat (Clarke and Rahman 2005; Gregersen et al. 2005; Kawaura et al. 2006; Leader 2005; Lu et al. 2005; Wilson et al. 2004). In spite of their increasing use and vast contribution to opening new paradigms in plant biology the limitations of cDNA microarrays have been well documented (Alba et al. 2004; Donson et al. 2002; Leader 2005; Meyers et al. 2004). Inherent problems include cross-hybridization of homologous DNA and the poor reproducibility of the PCR amplicons spotted on the array (Meyers et al. 2004). Superior resources for global expression profiling such as tag-based technologies are available [reviewed in Meyers et al. (2004)]. However, they are not widely disseminated because they are technically demanding and relatively expensive. cDNA microarrays in contrast are cheap and relatively portable. The biological relevance of results obtained by this method may be enhanced by rigorous biological and technical replication, validation of results by independent methods and attention to quality control (Brazma et al. 2001; Yang and Speed 2002). We produced a cDNA microarray for wheat endosperm transcriptomics. Our aim was to establish an inhouse platform suitable for robust gene expression profiling of wheat caryopsis development. To support our objectives we created a ‘‘boutique’’ cDNA microarray with high coverage of clones isolated from developing caryopsis in order to focus on this process. Recent wheat transcriptional profiling studies have provided analyses of the changes underpinning caryopses development (Clarke and Rahman 2005; Gregersen et al. 2005; Leader 2005; Lu et al. 2005; Wilson et al. 2004; Xue et al. 2006). We wish to describe these changes, and to use the information as the basis to further explore changes in the wheat caryopsis transcriptome after genetic perturbation. Plant Mol Biol (2007) 63:651–668 In this paper we report, (1) the development and validation of the wheat cDNA microarray (2) the classification of the temporal patterns of gene expression during wheat grain filling, and (3) the correlation of those changes with storage product accumulation and caryopsis development. We show that the array is highly reproducible and validate our measurements by semi-quantitative reverse-transcription PCR. Finally we show that our array and methodology is stringent enough to further examine the mechanistic forces that underscore carbon partitioning in wheat. Materials and methods Plant material and total RNA extraction Triticum aestivum cv. Bobwhite was grown to maturity in 10-inch pots (6 plants per pot) in the greenhouse with maximum daytime temperature of 24C and minimum nighttime temperature of 17C. Natural light was supplemented with 100 W high-pressure sodium lights to maintain a day length of 16 h. Plants were fertilized with Plantex 20-20-20 using an automatic drip irrigation system. Several measures were implemented to reduce the variability of the biological samples. Pots were rotated every two-weeks to reduce the effects of environmental variations. The primary spike of each plant was tagged at anthesis and only grains at the middle of each spike were harvested. Grains were collected at 3, 7, 10, 14, 18, 21, 25, 28, 31, 35, 42, 45 and 50 DPA and immediately frozen with liquid nitrogen. For microarray experiments, RNA was isolated from developing grains at 3, 7, 14, 21, 28 and 35 DPA using a modified TRIZOL (Invitrogen, Carlsbad, CA). Frozen grains from each spike were ground using a mortar and pestle and the powdered tissue resuspended and incubated in a starch pre-extraction buffer (50 mM Tris–HCl pH 9.0, 200 mM NaCl, 1% Sarcosyl, 20 mM EDTA and 5 mM DTT). Starch was separated from the aqueous phase by phenol/chloroform extraction before proceeding with the TRIZOL protocol. Total RNA was treated with RQ1 DNAse (Promega, Madison, WI) and was re-purified using RNA Easy columns (QIAGEN Inc, Valencia, CA). The quality and concentration of RNA was measured by spectroscopy. RNA integrity and assessment of DNA contamination was done by resolving an aliquot of RNA sample in a 2% agarose gel with ethidium bromide and then UV– visualized. Alternatively, RNA integrity was determined using the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). 653 Determination of physiological parameters during grain development To determine the change in grain fresh and dry weight during development, grains from 9 to 20 single heads were collected, counted and weighed. Grains from a single head were freeze-dried to determine dry weight. Water content was calculated by subtracting dry weight from fresh weight. Values from individual heads were averaged for each time point. To determine the starch and protein content, freezedried grains were ground to a powder using an UDY mill (UDY Corporation, Fort Collins, CO, USA). Total starch content per 100 mg sample was assayed using the Megazyme total starch determination assay kit (Megazyme International, County Wicklow, Ireland). Total protein content was determined on 30 mg samples by N combustion analysis with a Leco nitrogen analyzer (Leco Corporation, St Joseph, MI, USA) using a protein to N ratio of 5.7:1. For both starch and proteins assays, three determinations were done per biological sample, and 2–5 biological samples were used per time point. Values were averaged for each time point. Wheat 8K cDNA array construction Two sets of clones were used for the construction of the wheat 8K cDNA arrays: 2,304 unique clones from developing wheat grain ESTs and the set of approximately 6,000 expressed sequence tag (EST) clones mapped to the wheat hexaploid genome (Qi et al. 2004). The cDNA clones for the developing wheat grain ESTs were derived from two non-normalized cDNA libraries: TA001E1X, generated from RNA isolated from 5 to 30 DPA developing endosperm tissue of T. aestivum cv. Cheyenne, and TA059E1X, generated from developing grains of T. aestivum cv. Butte plants subjected to different abiotic stresses (http://wheat.pw.usda.gov/cgi-bin/nsf/nsf_library. cgi). The set of mapped EST clones were derived from 48 different cDNA libraries generated from different plant tissues of normally grown plants and those subjected to different biotic and abiotic stresses. The description of the cDNA libraries and sequences of the mapped probes are available at http://wheat.pw.usda. gov/NSF/progress_mapping.html. The GenBank accession number (when available) of a clone spotted on the array is used as its unique ID. Clones that are duplicated from the two sources are assigned probe names appended with a letter. Clones that fail to generate a PCR product either due to poor growth or give a double or multiple bands when amplified 123 654 Plant Mol Biol (2007) 63:651–668 were not removed from the array. Instead the original clone was re-arrayed and new plasmid DNA isolated to generate a quality probe. The GenBank accession number of the re-arrayed clone was also appended with a letter. The new array contained 10,800 spots; aside from the control clones, 9,167 of the spots were wheat probes representing 7,835 unique genes. Clone inserts were amplified using a modified pair of universal M13 forward and M13 reverse primers: GTTTTCCCAGTCACGACGTTG and TGAGCGGATAACAATTTC ACACAG, respectively. PCR amplifications were carried out in 100 ll volume in an MJR Tetrad Thermal Cycler (Bio-Rad Laboratories, Hercules, CA) for 30 cycles with 57C annealing temperature and 2.5 min extension time. The reaction cocktail contained plasmid DNA, 1.5 mM MgCl2, 200 lM each of deoxynucleotides dATP, dCTP, dGTP and dTTP, 200 nM each of M13 forward and M13 reverse primers, 1.25 unit Taq polymerase (Promega Biosciences, Inc., San Luis Obispo, CA) and 1· reaction buffer (50 mM KCl, 10 mM Tris–HCl pH 9.0, 0.1% Triton X-100). Amplicon size, yield and integrity were determined by resolving 5 ll of the PCR product in a 1% agarose gel. Amplicons were purified using QIAquick 96 PCR purification kit (QIAGEN Inc, Valencia, CA), dried and resuspended in 50% DMSO as printing buffer. Probe DNA (approximately 300 ng/ll) were spotted from 384-well microtiter plates onto Corning UltraGAPS slides (Corning Inc, NY) using an Omnigrid 100 arraying machine (Genomics Solutions, Ann Arbor, MI) fitted with 48 CMP3 printing pens from Telechem (Sunnyvale, CA). Each array has 48 subgrids and each subgrid had 15 · 15 spots. The same pen prints all the spots on a subgrid. The printed slides were UV-crosslinked at 300 mJ and dried for 2 h in an 80C oven before use. Clone insert identities were verified by resequencing using the ABI Big Dye terminator mix from Perkin-Elmer and analyzed in a 3700 or 3730 ABI DNA analyzer. mation marker genes. The 23 Lucidea Score Card control DNA are artificial genes composed of sequences from yeast intergenic regions. Each of the 10 AFGC spiking controls was spotted on the first row of 44 of the 48 subgrids on the array. The 23 Lucidea Score Card control DNAs were spotted on 4 of the 48 subgrids on the array. The four subgrids (7, 18, 31 and 42) were evenly spaced along the entire length of the printed array. The mRNA for the 10 AFGC spiking controls were in vitro transcribed using Promega RiboProbe in vitro Transcription System-T3 (Promega, Madison, WI). The AFGC control clones are available at the European Arabidopsis Stock Center (http:// seeds.nottingham.ac.uk/Nasc). The set of in vitro transcribed mRNA for the Lucidea Score Card controls were purchased from Amersham. Microarray controls Hybridized slides were scanned using a Genepix 4000B microarray scanner (Axon Instruments, Union City, CA) and raw spot fluorescence intensities were collected using GenePix Pro version 5.0. A quality control filter is used to flag questionable spots on the array so they can be removed from analysis. The criteria for a fair array feature included intensity signal >55% of background +1 standard deviation (SD), less than 3% pixel saturation for both channels and feature shape or circularity. Array features annotated as ‘‘printing buffer’’, ‘‘blank’’ or ‘‘empty’’ were flagged and excluded from analysis. The DNA of several control clones were spotted on the array which could be used for normalization and data quality assessment: The control clones include the 18 Arabidopsis Functional Genomics Consortium (AFGC) microarray control set and 23 Lucidea Score Card (Amersham Biosciences, Palo Alto, CA) calibration, ratio, utility and negative controls. The AFGC control set includes 10 spiking controls derived from human genes which do not show significant cross hybridization with plant DNA and 8 plant transfor- 123 RNA labeling and microarray hybridization RNA was indirectly labeled with Alexa 555 and Alexa 647 fluorophores (Molecular Probes, Inc, Eugene, OR) using the protocol recommended by the manufacturer. Briefly, 10 lg total RNA was annealed to 0.5 lg oligo-dT and reverse transcribed using the following reaction cocktail: 200 units Superscript II (Invitrogen, Carlsbad, CA), 500 lM each of dATP, dCTP and dGTP, 150 lM dTTP, 300 lM amino-allyldUTP, 10 mM dithiothreitol, and 1· Superscipt II Strand buffer. The fluorophores were coupled to the cDNA for at least 2 h, quenched and the unbound fluorophores removed using Microcon YM-30 filters (Millipore, Billerica, MA). Equal amounts of in vitro transcribed mRNA for 10 mammalian spiking controls (0.05 ng each/lg sample RNA) and 8 Lucidea calibration controls were added to each labeling reaction mix. Hybridization was carried out using the Pronto! hybridization kit reagents (Corning Inc, NY) following the supplied protocol. Data collection and preprocessing Plant Mol Biol (2007) 63:651–668 Although sources of variance can be included in the analysis of variance (ANOVA) gene model, the data set were pre-normalized before ANOVA analysis was performed to further reduce non-biologically significant variations. The logarithms (base 2) of the original intensity values, not the ratios, for each probe obtained from each fluorescent channel were used. To reduce any dye bias introduced during RNA labeling, the global median value of all the signals per channel on each individual array was subtracted from each spot raw intensity signal and divided by the standard deviation. The expression values per gene were centered based on the median value of expression of the gene across all time points. Data analysis and visualization Mixed model ANOVA was used to assess the significance of the difference in expression of each gene across the 6 time-points surveyed (Kerr et al. 2000; Wolfinger et al. 2001). With the multiple steps required to carry out a successful microarray experiment, it is not unusual to have ‘‘noisy’’ data. To extract reliable information from the data, non-biologically significant sources of signal variation were identified and their effects removed. The following gene model was used to identify genes that were differentially expressed: Yijklm ¼ l þ Arrayi þ Dyej þ Treatmentk þ BioRepl þ ðDye TreatmentÞjk þ 655 Genesis software (Sturn et al. 2002). Genespring GX software version 7.3 (Agilent Technologies, Santa Clara, CA) was used to identify genes with pattern of expression correlating with the accumulation and rate of accumulation of physiological markers. RT-PCR To confirm the expression patterns observed by microarray data analysis, one-step semi-quantitative reverse transcription polymerase chain reaction (RTPCR) experiments were performed. Probes that exhibited different patterns of expression based on microarray data analysis were chosen. Primer pairs were selected and designed from sequences near the 3¢ end of the gene using PrimeSelect software (DNASTAR Inc. Madison, WI). If a clone belonged to a contig, the primer pair was designed from the consensus sequence of the virtual gene. An 18S rRNA gene was selected as a control. Total RNA isolated from grains of appropriately staged spikes of a fourth biological sample (different from the three sets used for the microarray experiment) was used as templates for amplification. The QIAGEN One Step RT-PCR kit and MJR Peltier thermal cycler were used to carry out the amplifications. RT-PCR fragments were resolved on an agarose gel, stained with ethidium bromide and visualized with UV light. Microarray probe annotation and GO functional categorization þ ðBioRep TreatmentÞlk þ ðBioRep DyeÞjl þ ðDye BioRep TreatmentÞjkl þ eijklm Yijkl denotes the transformed intensity for a gene, l denotes the average intensity and eijklm captures the random errors. The variation due to microarray slide used (Array) was designated as random effect, whereas, variations due to RNA fluorescent labeling (Dye), biological sample RNA (BioRep) and caryopsis developmental stage (Treatment) were treated as fixed effects. Only the main effects interacting with Treatment were included in the model. The Statistical Analysis Software version 9.0 (SAS Institute Inc, Cary, NC) PROC MIXED protocol was used for analysis. The data for the differentially expressed genes were visualized by implementing the hierarchical and nonhierarchical (k-means) clustering methods in the The DNA sequence for each of the unique probes spotted on the array was searched against the non-redundant GenBank DNA and protein databases (release 144) using BLASTN and BLASTX (Altschul et al. 1990, 1997). The best match was extracted using an in-house Perl script and used as a basis for obtaining annotations for each probe based on sequence identity. The DNA sequence for each of the unique probes was also searched against the UniProt database (Release 1.5, TrEMBL, Swiss-Prot, and PIR, http://www.ebi.ac.uk/uniprot) resources using BLASTX, and best matches (E value < 10–10) were compared to terms of the Gene OntologyTM (GO) Consortium. Using GO/UniProt comparison tables, candidate GO assignments were predicted based on EST matches to the UniProt reference sequences. Categories were assigned based on biological, functional, and molecular annotations available from GO (http://www.geneontology.org/). 123 656 Plant Mol Biol (2007) 63:651–668 Results and discussion Timing of developmental events in wheat caryopsis The environmental conditions by which wheat is grown affect the development and quality of the grain (Altenbach et al. 2003; Bhullar and Jenner 1986; Brooks et al. 1982; Daniel and Triboi 2000; Dupont and Altenbach 2003). The timing of grain development in particular is dramatically influenced by temperature (Altenbach and Kothari 2004; Dupont and Altenbach 2003). Altenbach and Kothari showed that high temperature advance and compress the transcriptional programs in developing wheat grain. However, they also observed that when grains grown under two different temperature regimes were staged based on physiological markers rather than on chronological time, the pattern of transcript accumulation in grains grown in high temperature compared with those grown in moderate temperature were equivalent. Presumably this results because the network of the grain transcriptional program is dependent on environmental conditions. Thus, the physiological markers of plant growth may serve as a better reference for analyzing the timing of gene expression. To associate the timing of caryopsis development with changes in gene expression, several key physiological markers of growth were determined for the materials used in the microarray experiments: caryopsis fresh weight, dry weight, water, protein and starch content. Fresh weight (FW) reflects the accumulation of both dry matter and uptake or loss of water. Dry weight (DW) is primarily influenced by the accumulation of the major seed storage molecules, proteins and starch, in the endosperm. Water content (WC), determined as the difference between FW and DW of each grain, can be used to mark grain cellular expansion and desiccation. Grains from appropriately staged spikes were collected, counted, weighed and dried. The dried caryopses in each spike were ground for protein and starch content determination. The results of the physiological marker determinations for 14 different time points are shown in Fig. 1. Our data show that caryopsis FW increased linearly to 31 DPA reaching a maximum of 62 mg/caryopsis (Fig. 1A). The rate of FW accumulation was at its maximum rate at 14 DPA. Caryopsis DW increased steadily and plateau at 35–38 DPA at 37–39 mg/caryopsis. The rate of DW accumulation was maximal at 21 DPA. Although the caryopsis accumulated water up to 21 DPA, the maximum %WC per caryopsis was at 10 DPA. The rate of water accumulation was maximal between 7 DPA and 14 DPA. The caryopsis began to 123 Fig. 1 Physiological marker of developmental events in wheat caryopsis. (A) Fresh weight (circle), dry matter (square) and water content (triangle), shown as amount in mg per caryopsis, were determined at 14 different time points in wheat caryopsis development. (B) Accumulation of storage compounds, starch (circle) and protein (square), during caryopsis development. (C) Classes of expressed genes expected and predicted to be associated with the biological events at each stage in caryopsis development lose water at 28–35 DPA and reached 4–5% of FW at 45–50 DPA. The caryopsis began to turn brown at around 28 DPA and reached physiological maturity at 35–38 DPA, with water content between 35% and 40% FW (Calderini et al. 2000). Starch accumulated at a maximum rate between 10–14 DPA and peaked at 35– 38 DPA. Total protein accumulated linearly between 7–28 DPA and reached its peak level at 38–42 DPA. Six time points in caryopsis development were used for the microarray experiment. These time-points (3, 7, 14, 21, 28 and 35 DPA) cover the different stages of development from coenocytic stage to desiccation. The time-points encompass the transition points in dry matter accumulation and water uptake and loss, which Plant Mol Biol (2007) 63:651–668 657 could reflect control points in the programming of gene expression in the developing caryopsis. The expected and predicted genes that may be expressed in these stages are shown in Fig. 1C. At 3 DPA, the caryopsis is at the coenocytic stage. It is small and has not begun accumulating grain storage molecules. At 7 DPA, the endosperm is actively undergoing cellularization and storage molecule accumulation is initiated. The cells are undergoing expansion as reflected by increasing rate of water uptake, which peaks at 14 DPA. At 14 DPA, the rate of dry matter increases and peaks at 21 DPA. At 21 DPA water content reaches its maximum and begins to decline at 28 DPA. At 35 DPA caryopsis attains maximum dry matter content and reaches physiological maturity. RNA profiling experimental design and data analysis We used an in-house generated wheat cDNA array containing probes for 7,835 genes to examine the gene expression dynamics in the developing caryopsis. The probe set is enriched for genes that are expressed in the endosperm and 75% of the probes (5,817 genes) have been mapped in the hexaploid wheat genome. Data analysis of the embedded spiking and hybridization controls on the arrays showed that we can reliably detect a signal from 0.3 pg of the calibration control RNA for every 1 lg of total RNA labeled for hybridization (see Supplemental Data I). Assuming that the amplification efficiency of the endogenous wheat RNA is similar to that of the spiked-in calibration controls and using the estimated 100–500 pg of total RNA in a plant cell (Goldberg et al. 1978; Kerk et al. 2003), and an average of 1,500 nucleotides/molecule of plant mRNA (Alexandrov et al. 2006; Goldberg et al. 1978; Ogihara et al. 2004), the cDNA microarrays could detect the expression of genes with as low as 36–181 RNA transcripts per cell. Using Goldberg and colleagues’ estimate of 17 transcripts, 340 transcripts and 4,500 transcripts per cell for low, moderate and highly expressed genes, respectively, the wheat 8K cDNA arrays can reliably detect moderately expressed genes and perhaps some of the low expressed genes. A connected loop design (Kerr and Churchill 2001a, b) was used to compare the expression of genes between time points (Fig. 2A). To reduce the effect of dye-bias, dye swaps for each RNA sample were performed for each comparison The experiment was carried out using three biological replicates per time point, wherein RNA was independently isolated from spikes of three different plants at the same stage of development. Overall 54 slides were used for 108 Fig. 2 Experimental design and data normalization. (A) Loop design: RNA level from a specific time-point, T, is directly compared with the RNA level from two adjacent time points and the time-point 3 times removed. The arrow represents independent RNA labeling and hybridization, wherein the RNA at the tail end of the arrow is labeled with Alexa 555 fluorophore and the RNA at the head of the arrow is labeled with Alexa 647 fluorophore. (B) Distribution of normalized data: The left panel shows the distribution of close to 1.17 million microarray spots raw intensity values after log (base 2) transformation. The right panel shows the distribution of all the data points after normalization (dye correction and median centering). Note that after data pre-processing all the data points were normally distributed and centered on zero hybridizations, which yielded 18 data-points per gene per time point. Raw signals from each slide were preprocessed to remove flagged data points and were prenormalized using the spiking control signals. The logarithmic (base 2) transformed signals across all slides showed a skewed distribution that was fixed after normalization (Fig. 2B). The normalized median-centered log2 transformed signals for each probe were used as input for a mixed model analysis of variance (ANOVA) to identify genes that were differentially expressed during development. Dynamics of gene expression in developing caryopsis We identified 2,295 validated probes on our array to be differentially expressed (at P £ 0.01) during caryopsis development (listed in Supplemental Data II). Examination of the distribution of the genes that are differentially expressed between two time-points (Fig. 3, solid bars) indicates that a significant number of genes 123 658 Plant Mol Biol (2007) 63:651–668 Numbre of DE Trnarcspits 1200 963 1000 800 725 600 400 364 401 525 460 256 200 88 69 0 3 to 7 7 to 14 14 to 21 21 to 28 13 28 to 35 Consecutive Time-Points Comparison (DPA) Fig. 3 Differentially expressed genes between two consecutive time points. Solid bars represent differentially expressed genes between two consecutive time points. The hatched bars represent genes that show significant differential expression (P-value = 0.01 or less) in two consecutive time-points only and not significantly expressed in other time points showed differential expression throughout development and that major reprogramming occurs during the early stages of caryopsis development especially between 7 DPA and 14 DPA which coincides with the onset for grain filling. To further gain insight into the transcriptional control of biological events at specific stages of grain development, we examined the transition points for genes that are differentially expressed between two time points. Transcripts that show rapid up-regulation or down-regulation only during a particular time of the grain development could be characteristic of a specific developmental stage. Genes differentially expressed between two time points were queried for genes that were differentially expressed ONLY in the two consecutive time-points chosen (at P £ 0.01) and not in other stages of development (Fig. 3, hatched bars). A significant number of the genes expressed in this manner were observed in three stages: between 3–7, 7–14 and 21–28 DPA. These three stages correspond to distinct developmental and metabolic events occurring in the caryopsis. Caryopsis endosperm at 3–7 DPA is transitioning from a coenocyte to a multi-cellular tissue. The onset of grain filling occurs at 7–14 DPA with a maximal rate of accumulation of transcripts encoding storage proteins and enzymes involved in starch synthesis. At 21–28 DPA, the caryopsis is beginning to undergo maturation and desiccation. A major proportion (48–55%) of the differentially expressed genes have not been ascribed a GO biological function annotation. For those genes with biological GO annotation, the majority falls into the cellular macromolecule metabolism category or generation of precursor metabolites and energy category. About 20% of these genes reach a maximum transcript level between 14 DPA and 21 DPA, when maximum grain filling occurs. The group of genes (69 transcripts) that were differentially expressed only between 14 DPA and 21 DPA contained three 14-3-3b interacting proteins, which were down regulated during that timespan. 14-3-3 are a family of proteins that play a regulatory role in several processes in plant development such as signal transduction, checkpoint control, apoptosis and nutrient-sensing pathways (Ferl 1996; Fulgosi et al. 2002). There are three genes of unknown function that closely follow the pattern of expression of 14-3-3. This co-expression can suggest their possible coinvolvement with these regulatory proteins. It has been proposed that 14-3-3b may inhibit starch biosynthesis (Fulgosi et al. 2002; Sehnke et al. 2001). This is consistent with our observation that 14-3-3b transcript accumulation is inversely related to starch accumulation. Among the genes that are significantly up-regulated only at 14–21 DPA are genes involved in generation of metabolite precursors and energy and those that are possibly involved in defense mechanisms. About half of the genes in this group are significantly down-regulated. These include EF1-a, actin, a-tubulin, carbohydrate metabolism and unknown genes. 123 Global patterns of transcript accumulation in developing caryopsis To assimilate the data generated, hierarchical and k-means clustering methods were used to discriminate and visualize patterns of gene expression during development. Hierarchical clustering (Eisen et al. 1998) allows for the analysis of the relationship of each gene to every other gene on the array, whereas, non-hierarchical clustering, like k-means clustering (Sturn et al. 2002), allows the separation of the genes (or the expression profiles of the transcripts) into distinct classes. To classify the genes into groups with a similar pattern of expression, k-means clusters of 5, 8, 10, 12, 15, 20 and 25 were tested (data not shown). Clustering into 10 groups was selected (Fig. 4) since it gave Plant Mol Biol (2007) 63:651–668 659 Fig. 4 Patterns of gene expression in developing wheat caryopsis The 2,295 differentially expressed genes across all 6 time points were grouped into 10 clusters using the k-means algorithm. The mean centered relative gene expression value (in log2 scale) for each gene is plotted on the y-axis and the time of development in days post-anthesis (DPA) is on the x-axis. The magenta curve represents the median of the gene expression values in each cluster. Tick marks on the x-axis represent the developmental time 3, 7, 14, 21, 28 and 35 DPA. Tick marks on the x-axis for 3 DPA and 35 DPA overlap with the sides of the cluster box enough resolution (after 50 iterations) of the different expression profiles without significant redundancy. The number of genes that grouped into the 10 different clusters ranged from 10 (Cluster 10) to 771 (Cluster 9). Based on trends of mRNA accumulation, the 10 kmeans cluster of gene expression patterns can be sorted into four classes: up-regulated (Fig. 4, Clusters 2, 3, 6, 7 and 10), down-regulated (Fig. 4, Clusters 1 and 5), bellshaped (Fig. 4, Cluster 8), and modulated (Fig. 4, Cluster 4 and 9). The range of gene expression based on fold change is shown in Table 1 and the distribution of the gene biological GO annotations for each cluster is in Table 2. Clusters 1, 4, 6 and 9 contain the majority of the genes and have a narrow range of change in gene expression (1.2–4.9) compared to the other clusters. Gene GO annotation shows a significant number involved in cellular and macromolecule synthesis; 41– 67% of the genes in these clusters have unknown biological roles in the cell. The activity of the genes in Clusters 1 and 5, which show peak of expression at 3–7 DPA, are consistent with the cellular events occurring during Grain Growth stage. These genes are involved in cell division and nucleic acid metabolism (e.g. histones, tubulins), photosynthesis (i.e. chlorophyll a/b binding proteins, ferredoxin) and protein metabolism (i.e. ribosomal Table 1 Level of expression of the developmentally regulated genes within each cluster Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10 a No. of genes Fold changeb Maximum fold D Minimum fold D Mean fold D ‡2.0 fold D Peak stagec 3 DPA 7 DPA 14 DPA 21 DPA 28 DPA 35 DPA 424 53 92 456 51 337 67 34 771 10 4.9 1.4 1.9 28% 37.2 5.7 11.9 100% 15.7 2.3 4.2 100% 2.6 1.2 1.4 1% 11.1 2.2 5.0 100% 3.2 1.2 1.7 22% 114.6 17.0 42.2 100% 9.7 2.2 4.2 100% 2.7 1.2 1.4 2% 13.3 4.3 9.0 100% 125 298 1 0 0 0 0 0 1 42 5 5 0 0 3 67 8 14 421 9 20 4 0 2 36 15 0 0 0 0 0 1 50 212 10 64 0 0 0 50 7 10 0 0 29 5 0 0 70 468 115 118 0 0 0 0 0 4 0 6 a The number of genes that group to a cluster as described in Fig. 4 Fold change (D) in level of gene expression was determined by subtracting the lowest from the highest value of the relative expression of a gene across all the 6 time points; since the gene relative expression value is in log2 the difference was converted to its natural number Maximum fold change is the fold-change in expression of the gene within a cluster with the highest fold-change value, whereas, the minimum fold-change is the fold-change in expression of the gene in a cluster with lowest value. Mean fold-change is the average fold-change in expression of all the genes in the cluster. Also shown is the portion of genes in a cluster that had ‡2.0-fold change in expression during development c The developmental time point when the maximum transcript level of a gene is reached is designated as ‘‘Peak Stage’’. Indicated in each row are the numbers of genes in each cluster that reached a peak stage at a defined time point b 123 660 Plant Mol Biol (2007) 63:651–668 Table 2 Functional annotation of genes in each cluster Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10 Carbohydrate metabolism Cellular biosynthesis Cellular catabolism Defense response Energy molecular precursor Lipid metabolism Macromolecule synthesis Nucleic acid metabolism Polymer biosynthesis Polymer catabolism Protein metabolism Unknown 9 24 11 5 15 8 2 8 23 6 5 8 4 11 5 6 14 10 3 7 10 10 1 12 8 5 10 7 4 6 3 1 3 21 10 29 35 26 6 18 7 14 12 4 8 0 0 10 10 0 3 36 14 1 5 23 41 0 19 6 0 9 11 49 0 10 14 1 5 8 61 2 23 11 1 7 14 54 4 51 35 0 2 35 35 1 13 7 1 5 7 67 3 4 3 0 6 3 64 3 47 9 15 15 18 35 4 27 13 1 7 16 52 20 10 0 0 0 10 50 The distribution of genes in each cluster based on selected GO functional annotations are reported in percent of total number of genes in a cluster. The percent total per cluster is not equal to 100% because some of the genes have two or more assigned biological functions in the cell proteins, translation initiation and elongation factors, proteosomal proteins). A significant number of the genes in these clusters (35–41%) still have unknown biological functions in the cell. The major metabolic activity occurring in the grain filling (14–24 DPA) stage of grain development is the synthesis and accumulation of storage molecules, starch and proteins. The main storage proteins are the prolamins, glutenins, and gliadins, which are the major components of the gluten polymer that determines the economic significance of wheat (Shewry and Halford 2002). Most of the storage protein genes are found in Clusters 2 and 7. Genes in these clusters show a dramatic rate of increase in transcript level between 7 DPA and 14 DPA, peak expression at 21 DPA and then plateau thereafter. The average fold change in gene expression in Cluster 2 is 12, those for Cluster 7 is 42. Cluster 7 contains the most highly differentially expressed genes in the endosperm, with a minimum fold change of 17 and the highest of 115. Most of which the genes in Cluster 7 are involved in grain filling: storage protein genes, grain softness protein, and aamylase inhibitors (involved in metabolic and possibly defense functions). 9 of the 67 genes in Cluster 7 are unknown proteins. The maximum rate of storage protein transcript and protein accumulation both occur at the same time between 7 DPA and 14 DPA, which is consistent with results obtained by others (Altenbach and Kothari 2004; Clarke et al. 2000). This observation supports the hypothesis that storage protein gene transcription, poly (A)+ RNA processing, and translation are closely coupled in the developing wheat caryopsis, suggesting that a major control of storage protein accumulation is at the transcriptional level. 123 Cluster 8, which contains 34 genes, shows a bell-shaped pattern of expression with a peak in transcript level at 14 DPA. The cluster represents mainly metabolic enzymes involved in carbohydrate and protein metabolism: orthophosphate dikinase, aspartic protease, aspartate aminotransferase, alanine aminotransferase, sucrose synthase 2 and ADP-glucose pyrophosphorylase small subunit, which is a component of an enzyme involved in the first committed step in starch biosynthesis (Tetlow et al. 2004). The bell-shaped expression of the carbohydrate and protein metabolic genes correlates with the observed rate of starch and protein accumulation in the grain (Fig. 1), except that the maximal rate of accumulation of these storage products is observed between 14 DPA and 21 DPA, following the maximum of the transcript expression. This is expected since the mRNA is the template for protein synthesis. It should be noted, that the accumulation of transcripts involved in starch biosynthesis occurs as early as 3 DPA. This could be from the synthesis of starch in the pericarp tissue, which is still photosynthetic at this stage. However, the presence of starch biosynthesis genes at 3 DPA could also support the morphological observation that starch granules are present as early as the coenocytic stage of endosperm development (Hughes 1976; Simmonds and O’Brien 1981). Cluster 10, which contains 10 genes, shows a maximum rate of transcript accumulation between 7 DPA and 21 DPA, and the transcripts continue to be up-regulated even after the desiccation program has begun late in grain development. The transcript level of 4 of the genes peaked at 21 DPA and the other 6 continue increasing thru the last data point at 35 DPA. Among the members of this cluster are gamma-purothionin, a well-characterized defense Plant Mol Biol (2007) 63:651–668 protein (Bruix et al. 1993; Colilla et al. 1990) and Ec-protein (‘early’ cysteine-labeled metalloprotein). Not much is known about the function of Ec in wheat (Hanley-Bowdoin and Lane 1983). A review of plant metalloproteins (Bilecen et al. 2005), suggested that these proteins may be involved in stress responses, programmed cell death, developmental regulation and heavy-metal metabolism. The cluster analysis, which classified genes with similar expression profiles across development, revealed the enrichment of genes associated in similar metabolic or developmental pathways. The co-expression of genes in a pathway has been observed in other large-scale expression analyses (the most recent ones include Gachon et al. 2005; Wellmer et al. 2006; Zhang et al. 2005). This association within the cluster could provide insight into the function of genes with unknown function. Clustering of similar or paralogous/ orthologous genes involved in the same pathway could also serve to improve approaches to manipulate these pathways by gene silencing. Finally, the genes in each cluster could provide a good source of promoters to drive the transcription of transgenes to a specific expression profile during caryopsis development. Genes of unknown function As described in the previous section, the majority of genes that are differentially expressed during grain development have unknown function. Many of these genes cluster together with annotated genes involved in defined biological processes such as cell division, carbohydrate and protein synthesis, and desiccation. Analysis of gene expression patterns combined with gene sequence analysis could provide testable hypotheses as to the function of these unknown genes. To test the assumption of ‘‘guilt-by-association’’ and gain insight into the character of unknown genes we looked more closely at the genes in Cluster 7 (Fig. 4). The majority of the 67 genes that grouped to Cluster 7 are storage proteins and defense proteins. Nine genes in this cluster could not be ascribed a function based on BLASTX and BLASTN best match hits. Using the TIGR wheat index EST annotator (http:// www.tigr.org/tigr-scripts/tgi/est_ann.pl?db=wheatest), we checked whether the probe for the unknown genes in Cluster 7 assembled with other ESTs to form a contig. If so, then the contig sequence was used to compare against the non-redundant GenBank DNA and protein databases. If the unknown gene had no match to known genes, the protein domain database was searched for common protein motifs (MarchlerBauer et al. 2005). With this strategy, the unknown 661 gene encoded by BG262302 showed identity to a seed storage protein (GenBank accession Q7XYF0) and BE606942 is similar to a ferrochelatase protein (GenBank accession Q9FEK8). Interestingly, the derived translation protein of BQ804784 sequence showed similarity to an mRNA binding-like protein, whereas, BQ167790 sequence has 3 RRM domains, an RNA recognition motif (Marchler-Bauer et al. 2005). The other 5 genes contain no recognizable domains, therefore, remained unknown. Expression of developmental stage-specific genes To further examine gene expression at different stages of grain development, the genes that showed peaks of transcript level at each stage were selected (see Supplemental Data II). Genes highly expressed at a particular stage may indicate relevance for stage-specific developmental functions. Of the 2,295 genes analyzed, 652 showed peaks of expression at 3 DPA, 791 genes peaked at 7 DPA, 219 at 14 DPA, 502 at 21 DPA, 30 at 28 DPA and 101 genes at 35 DPA (see Supplemental Data II). Genes that peaked in expression in the same stage showed clustering of genes of defined function. For example, the maximally expressed genes at 35 DPA include dehydrins and late embryo abundant (LEA) genes, protein synthesis inhibitors, including tritin, proteinase inhibitors, serpin and thaumatin. Dehydrins and LEA proteins (Farrant et al. 2004) are produced in response to environmental stimuli with a dehydrative nature, such as drought, low temperature, salinity, and developmental stages such as seed and pollen maturation (Campbell and Close 1997; Danyluk et al. 1998). Their up-regulation was characteristic only during the last stages of the grain development studied and can signify entering of the grain into the dry-down stage. Tritin is a ribosome inactivating protein (Nielsen and Boston 2001) and has been proposed to have a defense role (Massiah and Hartley 1995) or a suggested role in the programmed senescence of the endosperm at maturity in barley (Leah et al. 1991). Its transcript abundance is significantly increasing between 28 DPA and 35 DPA, when protein biosynthesis is terminating and the seed is preparing for desiccation. Graphical representation of the expression profiles of genes that peaked at different stages (Fig. 5) displays the continuous expression of these genes from one stage to another. For example, genes with a peak of expression at 14 DPA continue their expression thereafter, suggesting a continued functional requirement at later stages in grain development. Several genes involved in starch biosynthesis peak at 14 DPA (i.e. ADP-glucose pyrophosphorylase, starch synthase 123 662 Plant Mol Biol (2007) 63:651–668 Fig. 5 Gene expression at different stages of grain development The left panel shows the morphology of the whole caryopsis at the different time points of development used on the microarray experiments. The yellow bar in the topmost panel is 2 mm. The middle panel shows the pattern of expression of genes that peak at each specific stage. The relative expression value (yaxis) is plotted against developmental time (x-axis). The right panel shows the major physiological and biological processes occurring in the endosperm at different developmental stage based on histological sections and literature IIa-3 and starch branching enzyme IIb) and continue to be expressed up to 35 DPA. Overall, the analysis of the temporal expression patterns of developmentally regulated genes showed gene activities that were well correlated with the developmental events at the respective stages. Physiological markers of caryopsis development and associated differentially expressed genes We examined the relationship of the 497 genes that have changed at least 2-fold or greater in expression across the 6 time-points studied with the physiological markers FW, WC and DW of a developing caryopsis (Fig. 6A). Caryopsis FW is a function of DW and WC accumulation. FW is primarily due to WC at the early stages of development. At 14 DPA the accumulation of WC tapers off whereas that of DW continuous to rise. The increase in FW at the later stage is due to the accumulation of the storage compounds specially starch. Our 123 analysis showed that the pattern of expression of 234 genes correlated well (Pearson correlation = or greater than 0.95) with the pattern of accumulation of these physiological markers (Listed in Supplemental Data IV-a). A majority (196 genes) correlated with WC, 102 of which are shared with FW; 136 genes correlated with FW, 20 of which are unique to FW. Only 18 correlated with DW, 14 shared with FW. All of the transcripts from storage protein genes within this group of 497 genes showed similar pattern of accumulation to WC. To identify genes that may potentially play a role in the control of the accumulation of the physiological markers, we examined the pattern of expression of genes that correlates with the RATE of DW, FW and WC accumulation (Fig. 6B). Our data showed 36 genes with significant correlation (Listed in Supplemental Data IV-b). The majority of the genes (35) correlated with DW rate of accumulation. The expression of one gene (encoding a hypothetical protein) correlated with the rate of accumulation of FW and none Plant Mol Biol (2007) 63:651–668 663 Fig. 6 Physiological markers of development and coexpressed genes. Genes with transcript expression profile pattern similar to the (A) pattern of accumulation (or loss) FW, DW and WC per caryopsis and (B) the pattern of the rate of accumulation of each of the designated physiological markers. The numbers outside the circles corresponds to the number of genes that did not show similar pattern (at Pearson correlation >0.95) as the physiological markers. DW, Dry weight; FW, Fresh weight; WC, water content correlated with WC. The genes that correlated with the pattern of DW rate of accumulation include 11 hypothetical proteins and proteins without known function. Genes with annotations include those involve in the synthesis, accumulation and probable protection of the major storage compounds, starch and storage proteins, both of which substantially contribute to caryopsis DW. These include genes that code for starch synthesis enzymes, starch branching enzyme I and ADP-glucose pyrophosphorylase large subunit, a component of an enzyme complex, the activity of which catalyzes the first committed step in the biosynthesis of starch; inhibitors of protein degradation, protease inhibitor II, a putative Hageman factor (a serine protease inhibitor) and PUP88 protein (a member of the trypsin/a-amylase family of inhibitors); and probable defense proteins—puroindoline, spodomicin and purothionin. Interestingly, the expression of several genes involved in protein degradation also correlated with the rate of DW accumulation. These include serine carboxypeptidase I and II, and a putative 26S proteosome regulatory subunit. Serine carboxypeptidases are proteolytic enzymes reported to be expressed in the 123 664 Plant Mol Biol (2007) 63:651–668 Fig. 7 Transcription factors: A hierarchical clustering of the 49 transcription factors that are differentially expressed across all time-points examined using average linkage clustering and Euclidian distance. The hierarchical cluster color code: yellow bar represents the median value of gene expression during development; red bar represents higher expression than median value of gene expression and the green bar represents lower than median value of gene expression. k-Means clustering was used to group the developmentally regulated genes into 4 classes. The color of the k-means class letter ID corresponds to the color of the vertical bar between the GenBank accession number of each gene and its description; the vertical bar color marks which gene assembled to each of the k-means cluster aleurone layer and participate in the mobilization of endosperm storage proteins during cereal grain germination (Dal Degan et al. 1994). These genes are also expressed in the vascular tissue, where they might be involved in programmed cell death in germinating grains; and in the scutellum, where their function is still unclear (Dominguez et al. 2002). The 26S proteosome, which is a large ATP-dependent protease composed of two multiprotein complexes, a 19S regulatory complex and the 20S proteosome, degrades short-lived intra- cellular proteins marked by ubiquitin (Smalle and Vierstra 2004). Proteases are necessary for protein turnover and development—degradation of damaged, and misfolded proteins provides free amino acids required for the synthesis of new proteins. The breakdown of selected regulatory proteins by the ubiquitin/ proteosome pathway could control key aspects of grain growth and development; and limited proteolysis at highly specific sites is required in the maturation and sub-cellular targeting of enzymes and storage proteins. 123 Plant Mol Biol (2007) 63:651–668 Grain weight, which is a major component of grain yield, is determined by the rate and duration at which the grain accumulates dry matter. It has been reported that wheat cultivars show significant variations in the rate of dry matter accumulation (Bruckner and Fruhberg 1987; Chanda et al.1999) and that increases in rate of caryopsis dry matter accumulation may be an important factor leading to improved yield (Hartung et al. 1989). One could test whether the genes with expression correlating with the pattern of DW rate of accumulation could be used as possible markers in the selection for increased rate of dry matter accumulation that may result in greater yield potentials in crops. Transcription factors involved in grain development In spite of the limitations in the sensitivity of the arrays for detecting low expressed genes the expression of a significant number of transcription factors were detected. Of the 175 genes on the whole array that showed similarity to transcription factors (BLASTX at E-value £ 10–5), 49 were differentially expressed across the 6 time-points examined. Cluster analysis identified 4 groups of transcription factors (A–D) each with distinct patterns of expression (Fig. 7). Clusters A and C show maximal expression during later anthesis but with different kinetics. Cluster A shows a low expression at 3 DPA, rapidly increasing between 7 DPA and 14 DPA, peaking at 21 DPA then slightly decline but still highly expressed thereafter. This cluster is composed of 7 genes including three no apical meristem (NAM) transcription factors, a WRKY transcription factor, a negative regulator of transcription and an F-box containing protein. NAM is a member of a Fig. 8 Data validation by RT-PCR Comparison of gene expression profile of two genes using microarray and RT-PCR: the top panels show the expression profile of MADS-box 9 transcription factor and tritin at the 6 time-points examined using cDNA 665 family of transcription factors found only in plants (Ooka et al. 2003). NAM plays a role in the establishment of the shoot apical meristem. The bZIP transcription factors in this cluster show the highest change difference (5.4-fold) in expression from early anthesis to late anthesis. Their expression profile correlates with that of the storage proteins (gliadins) especially those involved in gluten formation. Previous studies have shown that gluten protein genes are regulated by bZIP domain containing transcription factors (Albani et al. 1997). Cluster B contains 4 genes; two MADS-box containing transcription factors, a putative transcription factor and a zinc-finger containing protein.Clusters C and D have a narrow range of change in gene expression from early to late anthesis. The average change of expression for these groups is 1.4-fold. These clusters include some of the general transcription factors like the TATA- and CCAAT-box binding proteins. Validation of wheat microarray data Semi-quantitative RT-PCR was used to determine the expression of selected clones with distinct patterns of expression to confirm the trends in mRNA accumulation determined by cDNA arrays (Fig. 8). RNAs isolated from a fourth biological sample (different from the set of biological samples used for hybridization on the array) were used as templates for amplification. The pattern of the MADS-box and tritin gene transcript accumulation correlated well with that obtained by cDNA arrays. The good agreement of gene expression on a different biological sample other than what was used on the array provides further support to microarrays. The middle panels show the PCR fragments amplified by RT-PCR from RNA isolated from developing grains using gene-specific primers. Bottom panel show the amplification product for the 18S rRNA control 123 666 the reliability of the microarray results. The patterns of expression of 5 other clones examined by RT-PCR were consistent with the pattern of expression of the same clones obtained using the cDNA arrays (data not shown). Our results indicate that the trend of RNA accumulation obtained with the use of the cDNA microarrays we generated is consistent with the data obtained with other methods used for measuring gene expression. Concluding remarks This study describes the expression dynamics of global gene expression during wheat caryopsis development using DNA microarrays. We report the construction and validation of an 8K wheat cDNA microarray for transcriptome analysis. We provide a comprehensive list of developmentally regulated wheat caryopsis genes and their expression profiles, including the information on the temporal expression of a significant number of uncharacterized genes. Several files containing supplemental data are provided for more comprehensive information on the differentially expressed genes discussed in this paper (also available at http://wheat.pw.usda.gov/pubs/2007/Laudencia). Acknowledgements We would like to thank Adaku Ude, Sarah Vela and Joseph Pham for their excellent technical assistance and Dr. Grace Chen, Dr. Michael Gitt, Dr. Kent McCue and Dr. Craig Parker for the critical reading of the manuscript. We apologize to those whose works we have not cited owing to restrictions in the length of this article. The USDA-ARS CRIS Project 5325-21000-011 funded this work. References Alba R, Fei ZJ, Payton P, Liu Y, Moore SL, Debbie P, Cohn J, D’Ascenzo M, Gordon JS, Rose JKC, Martin G, Tanksley SD, Bouzayen M, Jahn MM, Giovannoni J (2004) ESTs, cDNA microarrays, and gene expression profiling: tools for dissecting plant physiology and development. Plant J 39:697–714 Albani D, Hammond-Kosack MC, Smith C, Conlan S, Colot V, Holdsworth M, Bevan MW (1997) The wheat transcriptional activator SPA: a seed-specific bZIP protein that recognizes the GCN4-like motif in the bifactorial endosperm box of prolamin genes. Plant Cell 9:171–184 Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB, Feldmann KA (2006) Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 60:69–85 Altenbach SB, DuPont FM, Kothari KM, Chan R, Johnson EL, Lieu D (2003) Temperature, water and fertilizer influence the timing of key events during grain development in a US spring wheat. J Cereal Sci 37:9–20 Altenbach SB, Kothari KM (2004) Transcript profiles of genes expressed in endosperm tissue are altered by high temperature during wheat grain development. J Cereal Sci 40:115–126 123 Plant Mol Biol (2007) 63:651–668 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410 Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSIBLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402 Appels R, Francki M, Chibbar R (2003) Advances in cereal functional genomics. Funct Integr Genom 3:1–24 Aquino P, Carron F, Calvo R (1999) Selected wheat statistics. CIMMYT, Mexico Bechtel DB, Zayas I, Kaleikau L, Pomeranz Y (1990) Sizedistribution of wheat–starch granules during endosperm development. Cereal Chem 67:59–63 Bhullar SS, Jenner CF (1986) Effects of temperature on the conversion of sucrose to starch in the developing wheat endosperm. Aust J Plant Physiol 13:605 Bilecen K, Ozturk UH, Duru AD, Sutlu T, Petoukhov MV, Svergun DI, Koch MH, Sezerman UO, Cakmak I, Sayers Z (2005) Triticum durum metallothionein. Isolation of the gene and structural characterization of the protein using solution scattering and molecular modeling. J Biol Chem 280:13701–13711 Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FCP, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME) – toward standards for microarray data. Nat Genet 29:365–371 Brooks A, Jenner CF, Aspinall D (1982) Effects of water deficit on endosperm starch granules and on grain physiology of wheat and barley. Aust J Plant Physiol 9:423 Bruckner PL, Frohberg RC (1987) Rate and duration of grain fill in spring wheat. Crop Science 27:451–455 Bruix M, Jimenez MA, Santoro J, Gonzalez C, Colilla FJ, Mendez E, Rico M (1993) Solution structure of gamma 1-H and gamma 1-P thionins from barley and wheat endosperm determined by 1H-NMR: a structural motif common to toxic arthropod proteins. Biochemistry 32:715–724 Calderini DF, Abeledo LG, Slafer GA (2000) Physiological maturity in wheat based on kernel water and dry matter. Agron J 92:895–901 Campbell SA, Close TJ (1997) Dehydrins: genes, proteins, and associations with phenotypic traits. New Phytol 137:61–74 Chanda SV, Narmada K, Singh YD (1999) Dry matter accumulation and associated changes in biochemical parameters during wheat grain development. J Agron Crop Sci 182:153 Clarke B, Rahman S (2005) A microarray analysis of wheat grain hardness. Theor Appl Genet 110:1259–1267 Clarke BC, Appels R, Hobbs M, Skylas D (2000) Genes active in developing wheat endosperm. Funct Integr Genomics 1:44 Close TJ, Wanamaker SI, Caldo RA, Turner SM, Ashlock DA, Dickerson JA, Wing RA, Muehlbauer GJ, Kleinhofs A, Wise RP (2004) A new resource for cereal genomics: 22K barley GeneChip comes of age. Plant Physiol 134:960–968 Colilla FJ, Rocher A, Mendez E (1990) gamma-Purothionins: amino acid sequence of two polypeptides of a new family of thionins from wheat endosperm. FEBS Lett 270:191–194 Dal Degan F, Rocher A, Cameron-Mills V, Von Wettstein D (1994) The expression of serine carboxypeptidases during maturation and germination of the barley grain. Proc Nat Acad Sci USA 91:8209 Daniel C, Triboi E (2000) Effects of temperature and nitrogen nutrition on the grain composition of winter wheat: effects on gliadin content and composition. J Cereal Sci 32:45 Plant Mol Biol (2007) 63:651–668 Danyluk J, Perron A, Houde M, Limin A, Fowler B, Benhamou N, Sarhan F (1998) Accumulation of an acidic dehydrin in the vicinity of the plasma membrane during cold acclimation of wheat. Plant Cell 10:623–638 Davies EJ, Tetlow IJ, Bowsher CG, Emes MJ (2003) Molecular and biochemical characterization of cytosolic phosphoglucomutase in wheat endosperm (Triticum aestivum L. cv. Axona). J Exp Bot 54:1351–1360 Dominguez F, Gonzalez MC, Cejudo FJ (2002) A germinationrelated gene encoding a serine carboxypeptidase is expressed during the differentiation of the vascular tissue in wheat grains and seedlings. Planta 215:727 Donson J, Fang YW, Espiritu-Santo G, Xing WM, Salazar A, Miyamoto S, Armendarez V, Volkmuth W (2002) Comprehensive gene expression analysis by transcript profiling. Plant Mol Biol 48:75–97 Dupont FM, Altenbach SB (2003) Molecular and biochemical impacts of environmental factors on wheat grain development and protein synthesis. J Cereal Sci 38:133–146 Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863–14868 Farrant JM, Bailly C, Leymarie J, Hamman B, Come D, Corbineau F (2004) Wheat seedlings as a model to understand desiccation tolerance and sensitivity. Physiol Plant 120:563–574 Ferl RJ (1996) 14-3-3 Proteins and signal transduction. Ann Rev Plant Physiol Plant Mol Biol 47:49–73 Francki M, Appels R (2002) Wheat functional genomics and engineering crop improvement. Genome Biol 3:Rev 1013 Fulgosi H, Soll J, de Faria Maraschin S, Korthout HA, Wang M, Testerink C (2002) 14-3-3 Proteins and plant development. Plant Mol Biol 50:1019–1029 Gachon CM, Langlois-Meurinne M, Henry Y, Saindrenan P (2005) Transcriptional co-regulation of secondary metabolism enzymes in Arabidopsis: functional and evolutionary implications. Plant Mol Biol 58:229–245 Giroux MJ, Boyer C, Feix G, Hannah LC (1994) Coordinated transcriptional regulation of storage product genes in the maize endosperm. Plant Physiol 106:713–722 Goldberg RB, Hoschek G, Kamalay JC (1978) Sequence complexity of nuclear and polysomal RNA in leaves of the tobacco plant. Cell 14:123–131 Gregersen PL, Brinch-Pedersen H, Holm PB (2005) A microarray-based comparative analysis of gene expression profiles during grain development in transgenic and wild type wheat. Transgen Res 14:887–905 Hanley-Bowdoin L, Lane BG (1983) A novel protein programmed by the mRNA conserved in dry wheat embryos. The principal site of cysteine incorporation during early germination. Eur J Biochem 135:9–15 Hartung RC, Poneleit CG, Cornelius PL (1989) Direct and correlated responses to selection for rate and duration of grain fill in maize. Crop Science 29:740–745 Hughes CE (1976) The developing endosperm of Triticum aestivum (L.): an ultrastructural and morphometric study. PhD thesis, University of Nottingham, England, I–IV Kawaura K, Mochida K, Yamazaki Y, Ogihara Y (2006) Transcriptome analysis of salinity stress responses in common wheat using a 22k oligo-DNA microarray. Funct Integr Genomics 6:132–142 Kerk NM, Ceserani T, Tausta SL, Sussex IM, Nelson TM (2003) Laser capture microdissection of cells from plant tissues. Plant Physiol 132:27–35 Kerr MK, Churchill GA (2001a) Experimental design for gene expression microarrays. Biostatistics 2:183–201 667 Kerr MK, Churchill GA (2001b) Statistical design and the analysis of gene expression microarray data. Genet Res 77:123–128 Kerr MK, Martin M, Churchill GA (2000) Analysis of variance for gene expression microarray data. J Comput Biol 7:819– 837 Langridge P, Lagudah ES, Holton TA, Appels R, Sharp PJ, Chalmers KJ (2001) Trends in genetic and genome analyses in wheat: a review. Aust J Agric Res 52:1043–1077 Leader DJ (2005) Transcriptional analysis and functional genomics in wheat. J Cereal Sci 41:149–163 Leah R, Tommerup H, Svendsen I, Mundy J (1991) Biochemical and molecular characterization of three barley seed proteins with antifungal properties. J Biol Chem 266:1564–1573 Lu C, Hawkesford MJ, Barraclough PB, Poulton PR, Wilson ID, Barker GL, Edwards KJ (2005) Markedly different gene expression in wheat grown with organic or inorganic fertilizer. Proc Biol Sci 272:1901–1908 Marchler-Bauer A, Anderson JB, Cherukuri PF, De WeeseScott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH (2005) CDD: a conserved domain database for protein classification. Nucleic Acids Res 33:D192–D196 Massiah AJ, Hartley MR (1995) Wheat ribosome-inactivating proteins: seed and leaf forms with different specificities and cofactor requirements. Planta 197:633–640 Meyers BC, Galbraith DW, Nelson T, Agrawal V (2004) Methods for transcriptional profiling in plants. Be fruitful and replicate. Plant Physiol 135:637–652 Morell MK, Myers AM (2005) Towards the rational design of cereal starches. Curr Opin Plant Biol 8:204–210 Nielsen K, Boston RS (2001) RIBOSOME-INACTIVATING PROTEINS: A Plant Perspective. Annu Rev Plant Physiol Plant Mol Biol 52:785–816 Ogihara Y, Mochida K, Kawaura K, Murai K, Seki M, Kamiya A, Shinozaki K, Carninci P, Hayashizaki Y, Shin IT, Kohara Y, Yamazaki Y (2004) Construction of a full-length cDNA library from young spikelets of hexaploid wheat and its characterization by large-scale sequencing of expressed sequence tags. Genes Genet Syst 79:227–932 Olsen OA (2001) Endosperm development: cellularization and cell fate specification. Ann Rev Plant Physiol Plant Mol Biol 52:233–267 Olsen OA, Linnestad C, Nichols SE (1999) Developmental biology of the cereal endosperm. Trends Plant Sci 4:253–257 Ooka H, Satoh K, Doi K, Nagata T, Otomo Y, Murakami K, Matsubara K, Osato N, Kawai J, Carninci P, Hayashizaki Y, Suzuki K, Kojima K, Takahara Y, Yamamoto K, Kikuchi S (2003) Comprehensive analysis of NAC family genes in Oryza sativa and Arabidopsis thaliana. DNA Res 10:239–247 Qi LL, Echalier B, Chao S, Lazo GR, Butler GE, Anderson OD, Akhunov ED, Dvorak J, Linkiewicz AM, Ratnasiri A, Dubcovsky J, Bermudez-Kandianis CE, Greene RA, Kantety R, La Rota CM, Munkvold JD, Sorrells SF, Sorrells ME, Dilbirligi M, Sidhu D, Erayman M, Randhawa HS, Sandhu D, Bondareva SN, Gill KS, Mahmoud AA, Ma X-F, Miftahudin, Gustafson JP, Conley EJ, Nduati V, Gonzalez-Hernandez JL, Anderson JA, Peng JH, Lapitan NL, Hossain KG, Kalavacharla V, Kianian SF, Pathan MS, Zhang DS, Nguyen HT, Choi DW, Fenton RD, Close TJ, McGuire PE, Qualset CO, Gill BS (2004) A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics 168:701–712 123 668 Schaffer R, Landgraf J, Perez-Amador M, Wisman E (2000) Monitoring genome-wide expression in plants. Curr Opin Biotechnol 11:162–167 Sehnke PC, Chung HJ, Wu K, Ferl RJ (2001) Regulation of starch accumulation by granule-associated plant 14-3-3 proteins. Proc Natl Acad Sci USA 98:765–770 Shewry PR, Halford NG (2002) Cereal seed storage proteins: structures, properties and role in grain utilization. J Exp Bot 53:947–958 Shewry PR, Halford NG, Belton PS, Tatham AS (2002) The structure and properties of gluten: an elastic protein from wheat grain. Philos Trans R Soc Lond Ser B Biol Sci 357:133–142 Simmonds D, O’Brien T (1981) Morphological and biochemical development of the wheat endosperm. American Association of Cereal Chemists, St. Paul, Minnesota Simmonds NW (1995) The relation between yield and protein in cereal grain. J Sci Food Agric 67:309–315 Smalle J, Vierstra RD (2004) The ubiquitin 26S proteasome proteolytic pathway. Ann Rev Plant Biol 55:555–590 Sreenivasulu N, Altschmied L, Radchuk V, Gubatz S, Wobus U, Weschke W (2004) Transcript profiles and deduced changes of metabolic pathways in maternal and filial tissues of developing barley grains. Plant J 37:539–553 Stoddard FL (1999a) Survey of starch particle-size distribution in wheat and related species. Cereal Chem 76:145–149 Stoddard FL (1999b) Variation in grain mass, grain nitrogen, and starch B-granule content within wheat heads. Cereal Chem 76:139–144 Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18:207–208 Tetlow IJ, Morell MK, Emes MJ (2004) Recent developments in understanding the regulation of starch metabolism in higher plants. J Exp Bot 55:2131–2145 Triboi E, Martre P, Triboi-Blondel AM (2003) Environmentallyinduced changes in protein composition in developing grains of wheat are related to changes in total protein content. J Exp Bot 54:1731–1742 123 Plant Mol Biol (2007) 63:651–668 Triboi E, Triboi-Blondel AM (2002) Productivity and grain or seed composition: a new approach to an old problem – invited paper. Eur J Agron 16:163–186 Wellmer F, Alves-Ferreira M, Dubois A, Riechmann JL, Meyerowitz EM (2006) Genome-wide analysis of gene expression during early Arabidopsis flower development. PLoS Genet 2:e117 Wilson ID, Barker GLA, Beswick RW, Shepherd SK, Lu CG, Coghill JA, Edwards D, Owen P, Lyon R, Parker JS, Lenton JR, Holdsworth MJ, Shewry PR, Edwards KJ (2004) A transcriptomics resource for wheat functional genomics. Plant Biotechnol J 2:495–506 Woerdeman DL, Veraverbeke WS, Parnas RS, Johnson D, Delcour JA, Verpoest I, Plummer CJ (2004) Designing new materials from wheat protein. Biomacromolecules 5:1262–1269 Wolfinger R, Gibson G, Wolfinger E, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules R (2001) Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol 8:625–637 Xue GP, McIntyre CL, Chapman S, Bower NI, Way H, Reverter A, Clarke B, Shorter R (2006) Differential gene expression of wheat progeny with contrasting levels of transpiration efficiency. Plant Mol Biol 61:863–881 Yang YH, Speed T (2002) Design issues for cDNA microarray experiments. Nat Rev Genet 3:579–588 Young TE, Gallie DR (2000) Programmed cell death during endosperm development. Plant Mol Biol 44:283–301 Zhang X, Feng B, Zhang Q, Zhang D, Altman N, Ma H (2005) Genome-wide expression profiling and identification of gene activities during early flower development in Arabidopsis. Plant Mol Biol 58:401–419 Zhu T, Budworth P, Chen W, Provart N, Chang H-S, Guimil S, Su W, Estes B, Zou G, Wang X (2003) Transcriptional control of nutrient partitioning during rice grain filling. Plant Biotechnol J 1:59–70