Visualization and Analysis Workflow
December 14, 2009 Draft
/ber

The Concept of a Workflow
• Express the analysis of plant systems in terms of the data and the operations on those data
  – Multiple types of data (e.g., experimental, computed, archival)
  – Multiple types of operations (e.g., analytical, visualization, search)
• Treat the data and operations as components that can be re-used, replaced, augmented, and extended

Workflow
• A pathway of operations
• Entities:
  – Operation
  – Data
  – Flow
• The flow through the operations is managed by the workflow software (e.g., VisTrails)

Multi-layer Workflows
• Conceptual Level: a high-level representation for casual users, with many defaults pre-selected
  (Diagram nodes: List of genes; Co-expression analysis; Network)
• Professional Level: visibility into the underlying workflows, with freedom to select tools and parameters
  (Diagram nodes: Pathways; Analysis of omics data; List of genes; Statistical analysis tool; Network; Interactive Visual Analysis; Metabolites)
• Infrastructure Level: explicit treatment of the underlying data, databases, data integration, tools, operations, parameters, defaults, wrappers, provenance, interconnectivity, access, etc.
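The "pathway of operations" model (operations, data, and flows, with the flow managed by the workflow software) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VisTrails API: the `Operation` and `Workflow` classes, the toy `coexpression_analysis` and `build_network` functions, and the gene IDs are all hypothetical. It wires up the conceptual-level pipeline (list of genes, co-expression analysis, network) and runs each operation in dependency order using the standard-library `graphlib` module (Python 3.9+).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass
class Operation:
    """One step in a workflow: a named function plus its parameters (hypothetical)."""
    name: str
    func: Callable[..., Any]
    params: dict[str, Any] = field(default_factory=dict)


class Workflow:
    """A pathway of operations; each flow passes one node's output to the next."""

    def __init__(self) -> None:
        self.ops: dict[str, Operation] = {}
        self.inputs: dict[str, list[str]] = {}  # op name -> upstream node names

    def add(self, op: Operation, after: list[str] | None = None) -> None:
        self.ops[op.name] = op
        self.inputs[op.name] = list(after or [])

    def run(self, initial: dict[str, Any]) -> dict[str, Any]:
        """Execute operations in dependency order and return every node's output."""
        results: dict[str, Any] = dict(initial)
        for name in TopologicalSorter(self.inputs).static_order():
            if name in results:  # source data supplied by the user
                continue
            op = self.ops[name]
            upstream = [results[src] for src in self.inputs[name]]
            results[name] = op.func(*upstream, **op.params)
        return results


def coexpression_analysis(genes, partners):
    # Hypothetical stand-in: look up co-expressed partners in a supplied table.
    return [(g, p) for g in genes for p in partners.get(g, [])]


def build_network(edges):
    # Undirected adjacency sets built from (gene, partner) pairs.
    net: dict[str, set[str]] = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net


wf = Workflow()
wf.add(Operation("coexpression", coexpression_analysis,
                 {"partners": {"geneA": ["geneB", "geneC"]}}), after=["genes"])
wf.add(Operation("network", build_network), after=["coexpression"])
out = wf.run({"genes": ["geneA"]})
print(out["network"])
```

Because each step is an `Operation` carrying its own parameters, replacing the co-expression tool or changing its parameters touches only one node, which mirrors the professional-level freedom to select tools and parameters behind a conceptual-level view.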
VisTrails: A Candidate Workflow Architecture
(Diagram labels: Professional workflow; Provenance and metadata; Conceptual workflow; Interactive visualizations)
• Visual programming interface for representing data and operations as workflows
• Loose coupling, using parameterizable Python wrappers
• Extensible, flexible, re-usable components and workflows
• Coupled with an attractive, flexible user interface (to be developed)

Example Workflows from the iPLANT Team
• Goals:
  – Demonstrate the use of a workflow model for representing the data and processes in plant genomics research
  – Provide a common structure for iPLANT use cases
  – Help define requirements for data integration
  – Motivate discussions about analyses that join multiple types of data, let users interact dynamically, and support interactive painting across visual representations (e.g., painting a metabolic pathway with gene expression magnitude)

Workflow for Maize Gene Analysis
Diagram nodes:
  – Modeling and Statistical Inference
  – Candidate maize gene
  – Homolog Finder (e.g., CoGe)
  – Literature search
  – List of homologous Arabidopsis gene IDs
  – 5 genes of interest
  – Examine clusters using tools that can handle maize data (e.g., eNorthern, MapMan); note: data for maize are very limited, so it may be necessary to use rice
  – For each gene, examine the structure of transcripts and expression over time (e.g., eFP, Maize Genome Browser)
  – Expression data for 20 maize genes
  – Co-Expression Analysis (e.g., ATTED-II)
  – Expression Network of 10 Arabidopsis Genes
  – Homolog Finder (e.g., CoGe)
  – Find expression values for these genes (e.g., Next Gen)
  – List of 20 homologous maize gene IDs
/tb/ber

Workflow for Analysis of Omics Data in a Model Species
• Interactive visual and statistical analysis
• Explicit support for iterative what-if analysis
Diagram nodes:
  – Gene expression data
  – Expression Analysis
  – Identify sub-cellular locations of gene products (e.g., Interactome)
  – Metabolite Data
  – Interactive Visual & Statistical Analysis (e.g., ViVA, co-expression analysis, PlantMetGenMap, GeneMANIA)
  – Visually identified, cell-based network regions of interest
  – Visually identified genes and metabolites to map onto functional pathways
  – Inferred protein-protein interactions
  – Visualize
  – Integrated gene expression and metabolomic data
  – Iterate
  – Iterate
  – Testable Hypotheses
  – Visualize
  – Visually identified enriched pathways
/rg/ber

Other Data Sources to be Incorporated
1. Motifs from regulatory regions in model species
2. Cell-specific expression
3. Pathways wiki: place gene(s) of interest in established pathways
4. Metabolites: incorporate information from Reactome
5. Literature: PubMed Assistant (?)

Depiction Needed
• Displays of inferred regulatory networks, as in GeneMANIA

Analysis of Gene Expression from a Partially Sequenced Species
1. Experimental exposure of plants to stress
2. Paint identified genes onto pathways (e.g., MapMan)
3. Identification of homologs in reference species (e.g., CoGe)
4. Identification of candidate homologs that have been reported as co-expressed (e.g., statistical correlation)
5. Compare magnitude of activity across reference pathways (e.g., PageMan, KEGG, GO, MapMan)
6. Meta Annotator: explore known features of these genes (e.g., signaling pathways, eFP, literature)
7. Formulate mechanistic models
Other diagram nodes:
  – Highly expressed genes
  – Ecophysiological data
  – Co-expressed genes for reference species
  – Visualization of enriched pathways
/rg/ber