Modeling and Associated Visualization Needs
A Trilogy in Four Parts

The Acts: Not in Chronological Order
• Overview of the G2P cyberinfrastructure
• Systems biology models (bottom up)
  – Viz needs: Multivariate Dynamics, Inner Space, & Sensitivity Analysis
• Ecophysiological models (top down)
  – Viz needs: The same, plus Outer Space
• Statistical models (non-mechanistic)
  – Viz needs: Help & fast!!

Solving the G2P problem means developing a methodology…
…that lets one start with some species & trait that one knows very little about and end with the ability to quantitatively predict trait scores for target genotype/environment combinations.
[Figure: path from Ignorance to Prediction Tools. Labels: Acquire data; Elicit hypotheses; Build quantitative models; Testing]

To work, such a methodology must be cyber-enabled
[Figure: cyberinfrastructure diagram. Labels include: Super-user; Developer; User; DI; inferred; Experiment; Hypothesis; Metabolic data; Whole plant data; Environment data; Expression data; Seq data; Modeling and Statistical Inference; Visualization]

Systems Biology Models

Modeling a single gene
[Figure: amount of M gene product at time t. Influx is controlled by the amounts of upstream regulatory gene products; some fraction of M degrades per unit time; Temperature is also labeled]
Rate = Change in amount / unit time = Influx amount / unit time − Efflux amount / unit time

Linking multiple genes…
[Figure: "A" gene (promoter region, codons, RNAP, DNA) is transcribed and translated (Prot. Syn.) into a transcription factor; "B" gene (promoter region, codons, RNAP, DNA)]

A "Bathtub" Model
• Temperature modulates all rates
• Other Gene Products affect degradation
• Transcription Factors modulate reading

What is a "product"?
• RNAs: messenger (mRNA) & otherwise
• Some models do not distinguish mRNA & protein (e.g., when time scales are long)
• Some models individually represent mRNA, cytosolic protein, and nuclear protein
• Some models will separate products by tissue/organ (e.g., leaves, phloem, meristem)
• Many models include metabolites & protein complexes
• Basic equation is still the same (influx − efflux)
Rate = Change in amount / unit time = Influx amount / unit time − Efflux amount / unit time
[Figure: candidate rate terms for dM/dt, each plotted as activation versus input: Constant; Linear (Frac. × M); Michaelis-Menten; Hill function (enhancer and repressor forms); Mass Action; etc. One panel is labeled with the saturating form M/(1 + M)]

Temperature effects
One form of temperature effect
[Figure: protein processing in the endoplasmic reticulum at 32.0 C and 39.5 C. Labels: Chaperone (folds/QC); Folded protein packaged to go; Bad protein (unreleased); Nucleus. Abstracted from Ellgaard et al. 1999]

Temperature effects
[Figure: the same rate-term panels (Constant; Linear; Michaelis-Menten; Hill function, enhancer and repressor; Mass Action; etc.), with a factor R appearing in the dM/dt expression]

A close up – the diurnal clock
[Figure: clock model equations with terms annotated: mRNA; Michaelis-Menten; Environmental effect (light); Mass action; Translation; Hill function; Influx − Efflux; Net transport into nucleus. Locke et al., 2005]
[Figure: Locke et al., 2005, 9 of 13 equations; Barak et al., 2000 (S. Brady)]
[Figure labels: Flowering time prediction; Plant growth & metabolism; Photosynthesis biochemistry; Root development; Soil conditions (water stress)]

Sensitivity Analysis & Sloppy Systems
[Figure: flowering-time network. Pathways: Photoperiod; Autonomous; Vernalization; Gibberellin. Inputs: Light; Temperature. Genes: TOC1, PHYB, LHY, CCA1, FPA, CRY2, GI, FVE, FCA, CO, FLC, FT, SOC1, AP1, LFY. Output: Flowering. Annotation: "Nearly nonfunctional in the Landsberg erecta strain". Nodes carry sensitivity letters (A to E, some with a minus sign); each letter is a power of two in sensitivity. Adapted from various literature; Z. Dong, 2003]

Stiff & Sloppy Directions
[Figure: ellipse around the optimum goodness-of-fit in the Parameter 1 / Parameter 2 plane, with its "Sloppy" direction and "Stiff" direction marked; Sloppy/Stiff ca. 1000]
• All parameter combinations inside this ellipse yield essentially identical goodness-of-fit values
• The "ellipses" may be "hyper-pancakes" with 15 to 30 sloppy directions.
• How can these be meaningfully visualized??

Sloppy directions in a clock model
[Figure: cytosolic 'Y' protein (GIGANTEA?), 0.000 to 0.024, versus hours after sunset, 0 to 120; curves for the Locke et al. model and a simplified model; 71 parameters reduced to 46 parameters]

Ecophysiological Models…
• …come in three flavors
  – Environmental physics models (1945 to present)
  – Crop simulation models (1965 to present)
  – Geochemical cycling models
    • Blend the characteristics of both of the above
    • Are more recent
• …are now poised to contribute to the G2P problem via a top-down approach

What is the focus of models in Environmental Physics?
• Mimics conditions inside a uniform plant canopy;
• The typical setting is an agricultural field;
  – Includes plant-related, edaphic (soil), and meteorological inputs;
• Based on physical principles;
  – Conservation of matter and energy; conduction, convection;
  – Some plant processes – gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon typically a few days with time steps on the order of minutes.
• Ergo plants often do not grow

Environmental Physics Models: 1945-75
• 1D or Bulk approach;
• Big Leaf / Big Root submodels;
• Bucket soil submodels;
• Resistance analogs used for the atmospheric environment;
• Limited prediction of soil or canopy scalar variables;
• Many empirical relationships;
• Nebulous controlling variables (e.g., canopy resistance to vapor flux);
• Poor plant/environment feedback.
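The "bucket soil" and "resistance analog" ideas listed above can be illustrated with a minimal sketch. This is not taken from any model cited in the talk: the function names, arguments, and units are hypothetical, and the bucket rule shown (fill with precipitation, drain by evaporation, spill the excess as runoff) is only the simplest common form of such a submodel.

```python
def resistance_flux(driving_difference, resistances):
    """Ohm's-law analog: flux = driving difference / total series resistance.

    `driving_difference` could be a vapor-pressure or temperature difference;
    `resistances` is a list of series resistances (hypothetical units).
    """
    return driving_difference / sum(resistances)


def bucket_step(storage, precip, evap_demand, capacity):
    """Advance a single-layer 'bucket' soil by one time step.

    Returns (new_storage, actual_evap, runoff). All quantities are depths
    of water in the same (assumed) unit, e.g. mm.
    """
    # Evaporation cannot exceed the water that is actually available.
    actual_evap = min(evap_demand, storage + precip)
    filled = storage + precip - actual_evap
    # Anything above the bucket's capacity leaves as runoff.
    runoff = max(0.0, filled - capacity)
    return filled - runoff, actual_evap, runoff


if __name__ == "__main__":
    print(resistance_flux(10.0, [2.0, 3.0]))
    print(bucket_step(80.0, 30.0, 5.0, 100.0))
```

A sketch like this also shows why feedback is poor in bulk models: the plant appears only through a single lumped resistance and a single evaporative demand.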
[Figure: bulk model schematic. Compartments: Atmosphere; Big Leaf; Big Root; Bucket of Soil. Variable labels: TAIR, VPD, TLEAF, LEAF, TSOIL, W]

Environmental Physics Models: 1975-90
• Multi-layer atmosphere, soil, and canopy;
• "Scaled leaf" approach within canopy layers;
• Relationships between photosynthesis, transpiration, and biophysics (e.g., stomatal action);
• Use finite difference methods to compute soil heat, water, and gas flows;
• Incorporate root density functions and soil physical properties.
[Figure: multi-layer schematic. Atmosphere Layers: TAIR, VPD, CO2, wind speed profiles. Canopy Layers (Sunlit, Shade): TCANOPY, VPD, CO2 canopy profiles. Soil Layers with Rooting Profile: TSOIL, W profiles]

What is a Crop Growth Model?
• Mimics one "average plant" at a field or smaller scale;
• The plant environment is an agricultural production setting;
  – Includes cultural- and production-related I/O variables;
  – Includes varietal, edaphic, and meteorological inputs;
• Based on physiological processes;
  – Photosynthesis, respiration, transpiration, nutrient uptake, carbon partitioning, growth, and phenological development;
• Plant structure consists of leaves, stems, roots, & grain;
• Annual time horizon with daily or hourly time steps.

What is the current status of Crop Growth Models?
• Skillful models can account for ca. 70% of yield variance;
• Ongoing work focuses on refinement and applications;
  – Problems being researched include methods for estimating cultivar and soil characteristics on an operational scale;
• Model structures and approaches have matured;
• Recent physical theory may not be emphasized;
• Physical theory does not seem to improve predictions.
Interestingly, incorporating crop growth model components into physical models does not guarantee improved predictability either, even though physical scientists recognize knowledge of the plant as limiting.
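To make the "one average plant, daily time step" structure concrete, here is a toy sketch. It is not any of the crop models discussed in the talk: the thermal-time phenology, the radiation-use-efficiency growth rule, the crude canopy-closure rule, and every parameter name and value are assumptions chosen only for illustration.

```python
def simulate_crop(daily_weather, base_temp=10.0, gdd_to_maturity=1200.0,
                  rue=1.5, max_intercept=0.9, grain_fraction=0.5):
    """Toy daily-step crop model (all parameters hypothetical).

    daily_weather: iterable of (tmin_C, tmax_C, par) tuples, one per day.
    Returns a dict with the maturity day (or None), biomass, and grain.
    """
    gdd = 0.0      # accumulated thermal time (degree-days)
    biomass = 0.0  # accumulated dry matter
    for day, (tmin, tmax, par) in enumerate(daily_weather, start=1):
        # Phenological development: accumulate degree-days above a base.
        gdd += max(0.0, (tmin + tmax) / 2.0 - base_temp)
        progress = min(1.0, gdd / gdd_to_maturity)
        # Crude canopy rule: light interception rises linearly and
        # reaches its maximum halfway to maturity.
        canopy = max_intercept * min(1.0, 2.0 * progress)
        # Growth: intercepted radiation times a fixed efficiency.
        biomass += rue * par * canopy
        if progress >= 1.0:
            # Carbon partitioning reduced to a fixed grain fraction.
            return {"maturity_day": day, "biomass": biomass,
                    "grain": biomass * grain_fraction}
    return {"maturity_day": None, "biomass": biomass, "grain": 0.0}


if __name__ == "__main__":
    season = [(15.0, 25.0, 10.0)] * 200
    print(simulate_crop(season))
```

Note that every cultivar difference here is a fixed number (base_temp, gdd_to_maturity, grain_fraction), and estimating such characteristics at operational scale is exactly the kind of problem listed above.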
Special case: Geochemical cycling models
• Used to model "ecosystem services" and/or "land surface processes" inside general circulation models
• Blend of both kinds of models;
• Includes plant-related, edaphic, and meteorological inputs;
• Based on physical principles
  – Conservation of matter and energy; conduction, convection;
  – Some plant processes – gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon of years with time steps on the order of minutes (depends on spatial scale).

Main points --
• Neither current crop growth models nor environmental physics models adequately depict plant process control mechanisms;
• This accounts for the failure of models to mimic the plasticity of real plants across different environments;
• The information needed to remedy this situation is emerging from the genomic sciences;
• Incorporating this information requires a reorganization of crop models

New Crop Growth Model Concept
[Figure: Sensors feed a Control Submodel ([CPAI], [KE60]), which is coupled to a Physical Submodel. Exchanged quantities include Energy, Water, N, and T]

Viz needs for ecophysiological models and G2P components
• Largely the same as for systems biology models – multivariate dynamics in spatially discrete plant parts
• Note that our "G2P solution" specifies predicting trait scores in non-constant environments.
  – That most directly refers to the outdoors
  – Therefore geographic variation must also be considered

A hazy shade of winter…
• One frame of a movie comparing the standard deviation of flowering time for the Columbia strain of A. thaliana germinating on each day.
• Projected by the gene-based model of Wilczek et al., 2009.
• The standard deviation is over five years (left: 2004-2009, real data; right: 2094-2099, A1B climate scenario).
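The quantity mapped in that movie frame, a standard deviation across years for each germination day, can be sketched as follows. The data layout and function name are hypothetical, and the choice of population rather than sample standard deviation is an assumption; the slide does not say which was used.

```python
from statistics import pstdev


def sd_by_germination_day(flowering_by_year):
    """Standard deviation of flowering time across years, per germination day.

    flowering_by_year: dict mapping year -> list of days-to-flowering,
    indexed by germination day (hypothetical layout; lists of equal length).
    """
    years = sorted(flowering_by_year)
    n_days = len(flowering_by_year[years[0]])
    return [
        pstdev([flowering_by_year[year][day] for year in years])
        for day in range(n_days)
    ]


if __name__ == "__main__":
    toy = {2004: [60, 50], 2005: [62, 50], 2006: [64, 50]}
    print(sd_by_germination_day(toy))
```

For a map, the same calculation would be repeated at every grid cell, giving one such list per location.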
Statistical genetic methods I
• Can be used to
  – Predict phenotypes based on genotypes
  – Locate regions of the genome likely to contain genes controlling particular phenotypes
• Can be used when
  – Knowledge of gene mechanisms is lacking
• Big Caveat
  – The mathematical form of the G2P relationship is just assumed to be linear
  – … and the data & models elaborated until the job gets done to adequate accuracy

Statistical genetic methods II
• Why does it work?
  – Because there are sufficient regimes of near linearity buried in mechanistic network eq'ns that general linear statistical models have levels of predictive skill useful for some purposes (e.g. crop breeding)
  – Rest assured that there are limits to what should be expected of these models
• How does it work?

What are genetic markers?
[Figure: aligned DNA sequences of 25 different genetic lines, by position within gene, with a single nucleotide polymorphism (SNP) marked. Data from the Purugganan Lab]

Different sibling lines will have different marker combinations
• The DNA sequence for line 1 has the same sequence as parent "B" at the location of marker "g17286"…
• …but in line 8 the DNA matches parent "A" at that location

Many different linear models
Pheno_m = β_1 X_{m,ve001} + β_2 X_{m,T1G11a} + … + β_150000 X_{m,last_marker}
Genome Wide Association

Finding quantitative trait loci (QTL)
Find markers i, j, and k such that
Pheno_m = β_i X_{m,i} + β_{kj} X_{m,k} X_{m,j} + other terms
is a good fit, where
X_{m,n} = 1 if marker n is from parent A in line m
X_{m,n} = 0 if marker n is from parent B in line m
etc.…

What a QTL analysis output looks like.
This is a "1d-scan" – i.e. X_{m,j}
(Buckler et al., Science, 2009)

Two Stat Inf Viz Problems
• Higher order scans, e.g. β_{k,j,l} X_{m,k} X_{m,j} X_{m,l}
  – Remember SNP numbers can be in the 150K to 3M range.
• eQTL viz problems
  – Can be 30K phenotypes…
  – …and higher order scans

eQTL Analysis – Looking for Regulators
[Figure: gene "A" (promoter region, codons, RNAP, DNA) yields a transcription factor via protein synthesis; gene "B" (promoter region, codons, RNAP, DNA)]
Let "Pheno" be the amount of mRNA (expression) produced by gene "B". This could be different in lines that varied either in the promoter of "B" or in lines that had differences in the coding region of gene "A". These are called "cis" and "trans" effects, respectively.

Massive eQTL Variation
• 75% of all genes have at least 1 eQTL
[Figure: position of eQTL for each of 15,771 genes, arranged by physical order, across chromosomes I to V; QTL effect shown as Bay + / Bay −; the Cis Diagonal and a Trans Hotspot are marked. (D. Kliebenstein)]

eQTL Viz Problems…
How to plot interaction effects? That is, X_{m,j} X_{m,k} and a gazillion phenotypes

Questions?
Virtual soybean simulations from Han et al. 2007
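As a footnote to the statistical slides, a "1d-scan" over markers can be sketched in a few lines. This is a minimal illustration, not the analysis behind the figures shown: it fits Pheno_m = μ + β_n X_{m,n} one marker at a time, where the intercept μ, the function name, and the data layout are assumptions. With 0/1 marker codes the least-squares β_n is simply the difference between the two parental group means.

```python
def one_d_scan(genotypes, phenotypes):
    """Single-marker scan.

    genotypes: list of lines, each a list of marker codes
               (1 = allele from parent A, 0 = allele from parent B).
    phenotypes: list of trait scores, one per line.
    Returns a list of (marker_index, effect, r_squared) tuples.
    """
    n_markers = len(genotypes[0])
    mean_y = sum(phenotypes) / len(phenotypes)
    ss_total = sum((y - mean_y) ** 2 for y in phenotypes)
    results = []
    for n in range(n_markers):
        group_a = [y for row, y in zip(genotypes, phenotypes) if row[n] == 1]
        group_b = [y for row, y in zip(genotypes, phenotypes) if row[n] == 0]
        if not group_a or not group_b or ss_total == 0:
            # Marker does not vary (or trait does not): nothing to estimate.
            results.append((n, 0.0, 0.0))
            continue
        mean_a = sum(group_a) / len(group_a)
        mean_b = sum(group_b) / len(group_b)
        # Residual sum of squares around the two group means.
        ss_resid = (sum((y - mean_a) ** 2 for y in group_a)
                    + sum((y - mean_b) ** 2 for y in group_b))
        results.append((n, mean_a - mean_b, 1.0 - ss_resid / ss_total))
    return results


if __name__ == "__main__":
    lines = [[1, 0], [1, 1], [0, 0], [0, 1]]
    trait = [10.0, 10.0, 4.0, 4.0]
    for marker, effect, r2 in one_d_scan(lines, trait):
        print(marker, effect, r2)
```

Plotting r_squared (or a test statistic) against marker position gives the familiar 1d scan profile. The visualization problem raised in the talk starts when the loop becomes doubly or triply nested over 150K to 3M markers and is repeated for tens of thousands of expression phenotypes.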