NextGen Pipeline: Enabling the Plant Science Community

Tom Brutnell (lead), Steve Rounsley (co-lead), Matt Vaughn (Engagement Lead)
Ed Buckler, Justin Borevitz, Todd Mockler, Pat Schnable, Bob Schmitz, Matt Hudson, Brad Barbazuk, Damian Gessler

What is NextGen?
• Ultra-high-throughput sequence analysis (UHTS)
• Several platforms, including 454, ABI SOLiD, and Illumina/Solexa, can generate from 1 Gb to hundreds of Gb of DNA sequence in a single run
• Library preparation is relatively simple, and kits are available
• Data analysis is computationally challenging (terabytes of data must be processed) and beyond the reach of many experimental biologists

UHTS-RNA, UHTS-DNA

How will UHTS change plant science?
• Makes phenotyping, not genotyping, rate limiting
  – Genome-wide association studies
  – Allele mining
• Enables a much deeper understanding of “non-model” species
  – 1000 genomes project (transcriptome of 1000 plant species)
  – Genome sequences now available for B. distachyon and S. italica, and for RILs of maize and rice
• Provides detailed transcriptional resolution on a global scale
  – Map 5’ and 3’ UTRs, TSSs, and transcript isoforms
  – Examine smRNA populations
  – Map methylation, TF binding sites, etc.

NextGen 1.0 Pipeline
• Develop a computational pipeline to process ultra-high-throughput sequence datasets
• The first iteration of the NextGen 1.0 Pipeline will perform simple variant detection or transcript quantification, starting from DNA- and RNA-derived datasets
• Designed explicitly to support modularity and extensibility
• Import FASTQ files and export data in SAM/BAM format
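
The NextGen 1.0 goals above (FASTQ in, SAM/BAM out, modular and extensible stages) could be sketched roughly as follows. This is an illustration only, not the project's implementation: the names `FastqRecord`, `read_fastq`, `mean_quality_filter`, `run_pipeline`, and `to_unaligned_sam` are hypothetical, quality scores are assumed to be Phred+33 encoded, and no alignment is performed, so reads are written as unmapped SAM records.

```python
import io
from typing import Callable, Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


# A stage is any function from a stream of reads to a stream of reads,
# so new stages can be added without touching the existing ones.
Stage = Callable[[Iterator[FastqRecord]], Iterator[FastqRecord]]


def read_fastq(handle: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    lines = (line.rstrip("\r\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(lines, None)
        separator = next(lines, None)
        quality = next(lines, None)
        fields = header[1:].split()
        if (quality is None or not header.startswith("@")
                or not separator.startswith("+") or not fields):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Keep only the read ID; SAM query names cannot contain whitespace.
        yield FastqRecord(fields[0], sequence, quality)


def mean_quality_filter(min_mean: float, offset: int = 33) -> Stage:
    """Build a stage that keeps reads whose mean Phred score is >= min_mean.

    Assumption: qualities are ASCII-encoded with the given offset (Phred+33).
    """
    def stage(records: Iterator[FastqRecord]) -> Iterator[FastqRecord]:
        for record in records:
            scores = [ord(ch) - offset for ch in record.quality]
            if scores and sum(scores) / len(scores) >= min_mean:
                yield record
    return stage


def run_pipeline(records: Iterator[FastqRecord],
                 stages: List[Stage]) -> Iterator[FastqRecord]:
    """Chain the stages lazily, each consuming the previous stage's output."""
    for stage in stages:
        records = stage(records)
    return records


def to_unaligned_sam(record: FastqRecord) -> str:
    """Format a read as an unmapped SAM record (FLAG 4, no reference)."""
    return "\t".join([record.name, "4", "*", "0", "0", "*", "*", "0", "0",
                      record.sequence, record.quality])


# Toy input: two made-up reads, one high quality and one low quality.
example = "@r1 sample\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
kept = list(run_pipeline(read_fastq(io.StringIO(example)),
                         [mean_quality_filter(20)]))
for read in kept:
    print(to_unaligned_sam(read))
```

Because each stage only sees an iterator of reads, a later version could slot in an aligner or a variant caller as one more stage without changing the reader or the writer.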
NextGen 2.0 Pipeline
• Subsequent versions will add functionality that may include:
  – Ability to process/compare multiple samples
  – Support for variant detection in non-reference genomes
  – Support for multiple methods of analysis (BWA, SOAP2/Bowtie)
  – Support for additional workflows (smRNA annotation, ChIP-seq, de novo assembly)
• Input from working groups is imperative
  – What is the decision tree for subsequent iterations?
  – What do the modeling/stats/viz groups need as NextGen deliverables?
  – How can NextGen exploit tools under development for G2P?

Meeting the needs of biological use cases
• Flowering time and photosynthesis
  – How can NextGen inform modeling efforts?
• Abiotic stress
  – Should we develop a smRNA pipeline for 2.0?
• Input from working groups is imperative
  – What is the decision tree for subsequent iterations?
  – What do the modeling/stats/viz groups need as NextGen deliverables?
  – How can NextGen exploit tools under development for G2P?

Integrating NextGen/Viz Pipeline

Workflow
• A pathway of operations
• Entities:
  – Operation
  – Data
  – Flow
• The flow through the operations is managed by the workflow software (e.g., VisTrails)
• Candidate software and packages are named

/ber = Bernice Rogowitz

Integrating NextGen/Viz/Modeling Pipelines

Workflow diagram (boxes as labeled on the slide):
• Literature search
• Modeling and Statistical Inference
• Candidate maize gene
• Homolog Finder (e.g., CoGe)
• List of homologous Arabidopsis gene IDs
• 5 genes of interest
• For each, examine structure of transcripts and expression over time (e.g., EFP Maize Genome Browser)
• Expression data for 20 maize genes
• Examine clusters that can handle maize data (e.g., eNorthern, MapMan) (note: very limited data for maize, so may need to go to rice)
• Co-Expression Analysis (e.g., ATTED-II)
• Expression Network of 10 Arabidopsis Genes
• Homolog Finder (e.g., CoGe)
• Find expression values for these genes (e.g., NextGen)
• List of 20 homologous maize gene IDs

/ber/tb
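
The three workflow entities named above (operation, data, flow) can be made concrete with a small sketch. It is one possible reading of the homolog and co-expression steps, not the behavior of VisTrails, CoGe, or ATTED-II: the classes `Operation` and `Workflow`, the helper `lookup`, and every gene ID and table below are made-up placeholders, and real tools would replace the dictionary lookups.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

GeneIds = List[str]  # the "data" entity: a list of gene identifiers


@dataclass
class Operation:
    """The "operation" entity: a named function from data to data."""
    name: str
    run: Callable[[GeneIds], GeneIds]


@dataclass
class Workflow:
    """The "flow" entity: an ordered chain; each output feeds the next input."""
    operations: List[Operation] = field(default_factory=list)

    def then(self, operation: Operation) -> "Workflow":
        self.operations.append(operation)
        return self

    def execute(self, data: GeneIds) -> Tuple[GeneIds, List[Tuple[str, GeneIds]]]:
        """Run every operation in order and record each intermediate result."""
        trace = []
        for operation in self.operations:
            data = operation.run(data)
            trace.append((operation.name, data))
        return data, trace


def lookup(table: Dict[str, GeneIds]) -> Callable[[GeneIds], GeneIds]:
    """Map each ID through a table, dropping duplicates but keeping order."""
    def run(ids: GeneIds) -> GeneIds:
        seen: Dict[str, None] = {}
        for gene_id in ids:
            for hit in table.get(gene_id, []):
                seen.setdefault(hit, None)
        return list(seen)
    return run


# Placeholder tables standing in for a homolog finder and a co-expression tool.
MAIZE_TO_ARABIDOPSIS = {"maize_1": ["ath_1", "ath_2"]}
ARABIDOPSIS_COEXPRESSED = {"ath_1": ["ath_1", "ath_3"], "ath_2": ["ath_3", "ath_4"]}
ARABIDOPSIS_TO_MAIZE = {"ath_1": ["maize_1"], "ath_3": ["maize_7"], "ath_4": ["maize_9"]}

workflow = (
    Workflow()
    .then(Operation("homolog finder (maize to Arabidopsis)", lookup(MAIZE_TO_ARABIDOPSIS)))
    .then(Operation("co-expression analysis", lookup(ARABIDOPSIS_COEXPRESSED)))
    .then(Operation("homolog finder (Arabidopsis to maize)", lookup(ARABIDOPSIS_TO_MAIZE)))
)

result, trace = workflow.execute(["maize_1"])
for step_name, step_output in trace:
    print(step_name, step_output)
```

Keeping the trace of intermediate results is a deliberate choice here: the visualization and modeling groups would likely want the gene list at every step, not only the final one.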