Design Principles
* Separate components into a modular system
* Independent, standalone modules that are also runnable programs
  – Collaborator wants to run srf2FastQ at home, without a MetaDB
  – Researcher tries custom parameters, but still tracks his run in the MetaDB
* XML workflows that define jobs and data dependencies
  – Parameterized to reuse workflows on different …

Application Wrapper Interface
* Applications conform to a standard interface
* Developers and users do not have to understand the rest of the pipeline
* Forces users to adhere to best practices
  – Syntax, --help option
  – Required test harness
  – Verification of input, output, and parameters
* Wrapped applications must be runnable both …

Java API:

    public interface WrapperInterface {
        int init();                 // Optional
        int get_syntax();           // Usage text, reported by --help
        int do_test();              // Required test harness
        int do_verify_input();
        int do_verify_parameters();
        int do_run();
        int do_verify_output();
        int clean_up();             // Optional
    }

Local Execution:

    $ java SeqWareRunner bpostprocess --help
        → Reports get_syntax()
    $ java SeqWareRunner bpostprocess input
        → Run bpostprocess on the command line
    $ java SeqWareRunner bpostprocess --db input
        → Same as above, but without MetaDB feedback
    $ java SeqWareRunner bpostprocess --db input --config=config.txt
    $ java SeqWareRunner bpostprocess --db input -A 0 -n 8

XML Workflow
* Follows the DAX standard, which is the input format for Pegasus
* Defines jobs, arguments, configuration, and data dependencies
* Defines dependencies between jobs
* Uses Java Freemarker to populate the XML template for each experiment

    <?xml version="1.0" encoding="UTF-8"?>
    <adag xmlns="http://pegasus.isi.edu/schema/DAX"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-2.1.xsd"
          version="2.1" count="1" index="0" name="bfast" jobCount="3" fileCount="0" childCount="2">
      <!-- jobs -->
      <job id="ID0000001" namespace="seqware" name="runner" version="0.0.1">
        <argument>bfast matches %{reference_file} %{experiment}.fastq...</argument>
        <profile namespace="globus" key="max_memory">24576</profile>
        <profile namespace="globus" key="count">8</profile>
        <uses file="%{experiment}.fastq" link="input"/>
        <uses file="%{experiment}.bmf" link="output" transfer="false" register="false"/>
      </job>
      <job id="ID0000002" namespace="seqware" name="runner" version="0.0.1">
        <argument>bfast localalign ...</argument>
        <uses file="%{experiment}.bmf" link="input"/>
        <uses file="%{experiment}.baf" link="output" transfer="false" register="false"/>
      </job>
      <job id="ID0000003" namespace="seqware" name="runner" version="0.0.1">
        <argument>bfast postprocess ...</argument>
        <uses file="%{experiment}.baf" link="input"/>
        <uses file="%{experiment}.bam" link="output" transfer="true" register="true"/>
      </job>
      .....
      <!-- Dependencies -->
      <child ref="ID0000002">
        <parent ref="ID0000001"/>
      </child>
      <child ref="ID0000003">
        <parent ref="ID0000001"/>
        <parent ref="ID0000002"/>
      </child>
    </adag>

Pegasus
* Each task is a standalone application, independently runnable
  – A scientist asks "How do I run BFAST?"
  – A collaborator wants to run srf2FastQ at home, but does not have a pipeline or Metadata DB
  – A researcher wants to try some custom parameters, but we still want to track his run in the Metadata DB
* Each application conforms to a standard, well-defined interface
* The interface is abstract enough for users to wrap their applications without knowing Pegasus
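The wrapper lifecycle can be illustrated with a small driver that calls the WrapperInterface methods in a fixed order. This is a minimal sketch, not SeqWare's actual SeqWareRunner: the EchoWrapper and WrapperDriver classes are hypothetical, the call order (init, verify input, verify parameters, run, verify output, clean_up) is an assumption taken from the order the methods are declared, and status codes are assumed to follow the Unix convention of 0 for success. do_test() is left out of the normal run path, on the assumption that the test harness is invoked separately.

```java
import java.util.ArrayList;
import java.util.List;

// Same methods as the WrapperInterface on the slide; each returns a status code,
// assumed here to be 0 on success and non-zero on failure.
interface WrapperInterface {
    int init();
    int get_syntax();
    int do_test();
    int do_verify_input();
    int do_verify_parameters();
    int do_run();
    int do_verify_output();
    int clean_up();
}

// Hypothetical wrapper used only for illustration: it records each call
// and can be told to fail input verification.
class EchoWrapper implements WrapperInterface {
    final List<String> calls = new ArrayList<>();
    private final boolean inputValid;

    EchoWrapper(boolean inputValid) {
        this.inputValid = inputValid;
    }

    private int record(String step, int status) {
        calls.add(step);
        return status;
    }

    public int init() { return record("init", 0); }
    public int get_syntax() { return record("get_syntax", 0); }
    public int do_test() { return record("do_test", 0); }
    public int do_verify_input() { return record("do_verify_input", inputValid ? 0 : 1); }
    public int do_verify_parameters() { return record("do_verify_parameters", 0); }
    public int do_run() { return record("do_run", 0); }
    public int do_verify_output() { return record("do_verify_output", 0); }
    public int clean_up() { return record("clean_up", 0); }
}

// Hypothetical driver: --help only reports the syntax; otherwise the steps run
// in declaration order, stopping at the first non-zero status, and clean_up()
// is called even when an earlier step fails or throws.
class WrapperDriver {
    static int execute(WrapperInterface app, boolean helpRequested) {
        if (helpRequested) {
            return app.get_syntax();
        }
        try {
            int status = app.init();
            if (status == 0) status = app.do_verify_input();
            if (status == 0) status = app.do_verify_parameters();
            if (status == 0) status = app.do_run();
            if (status == 0) status = app.do_verify_output();
            return status;
        } finally {
            app.clean_up();
        }
    }
}
```

With this ordering, a failed input check skips do_run() entirely but still reaches clean_up(), which is the point of verifying inputs before any real work starts.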
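The <child>/<parent> elements in the DAX describe a directed acyclic graph, and any topological order of that graph is a valid execution order. Pegasus does the real planning; the sketch below only illustrates the idea with Kahn's algorithm. The DaxOrder class is hypothetical, and it takes each job's parent list as already-parsed input (XML parsing is omitted). Every job id must appear as a key, with an empty list if it has no parents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DaxOrder {
    // parentsOf maps each job id to the ids in its <parent ref="..."/> elements.
    static List<String> topologicalOrder(Map<String, List<String>> parentsOf) {
        Map<String, Integer> pending = new LinkedHashMap<>();      // unfinished parents per job
        Map<String, List<String>> childrenOf = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : parentsOf.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String parent : e.getValue()) {
                childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Jobs with no parents can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            // Finishing a job releases each child once all of its parents are done.
            for (String child : childrenOf.getOrDefault(job, List.of())) {
                int left = pending.merge(child, -1, Integer::sum);
                if (left == 0) ready.add(child);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cycle in job dependencies");
        }
        return order;
    }
}
```

For the three-job bfast workflow (ID0000002 depends on ID0000001; ID0000003 depends on both), the sketch returns ID0000001, ID0000002, ID0000003. A cycle leaves some jobs permanently blocked, which the final size check reports.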
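The workflow template is parameterized with %{...} placeholders that are filled in once per experiment. The slides say Java Freemarker does this, but Freemarker's default interpolation syntax is ${...}, so the sketch below is a plain java.util.regex stand-in rather than the Freemarker API. The PlaceholderFiller class and the sample values in the usage are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    // Matches %{name} placeholders as written in the DAX template.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("%\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces every placeholder with its value; fails loudly on unknown names
    // so a half-filled workflow is never handed to the planner.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = values.get(key);
            if (value == null) {
                throw new IllegalArgumentException("no value for placeholder " + key);
            }
            // quoteReplacement keeps '$' or '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling fill("bfast matches %{reference_file} %{experiment}.fastq", ...) with reference_file set to hg18.fa and experiment set to s1 yields "bfast matches hg18.fa s1.fastq"; looping over experiments with the same template gives one DAX per experiment.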