Convert your MS raw files into Open Data formats. In the cloud and anywhere.

Contributor: Steffen Neumann, IPB Halle

Only a few open tools support the proprietary formats used natively by the mass spectrometry vendor software. The reason is that these are binary formats, and usually no public documentation exists on how to read them and access the actual MS raw data. Sometimes, vendors provide software libraries to access their own formats. Currently, most of these interfaces require Windows dynamic link libraries (DLLs) for the actual file access, which are not compatible with other operating systems such as Mac OS X or Linux, and thus not with most high performance computing (HPC) or Docker cloud infrastructures.

The open mzML data format [1] is supported by a large number of community-developed metabolomics software tools [2].

Table 1: A selection of open source software libraries for reading, and some for writing, mzML
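Because mzML is plain XML, even a generic XML parser can inspect it, which is part of what makes the format easy to support. The following is a minimal sketch in Python using only the standard library. The function name `summarize_spectra` and the tiny embedded document are hypothetical illustrations: the sample is heavily abbreviated and not schema-valid, and the dedicated mzML libraries are the better choice for real work, since they also decode the binary peak arrays.

```python
import io
import xml.etree.ElementTree as ET

# XML namespace used by mzML documents.
MZML_NS = "{http://psi.hupo.org/ms/mzml}"

def summarize_spectra(source):
    """Yield (spectrum id, MS level, number of data points) per spectrum.

    `source` is a file name or a binary file object containing mzML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        ms_level = None
        for cv in elem.findall(MZML_NS + "cvParam"):
            # MS:1000511 is the controlled vocabulary term "ms level".
            if cv.get("accession") == "MS:1000511":
                ms_level = int(cv.get("value"))
        yield elem.get("id"), ms_level, int(elem.get("defaultArrayLength"))
        elem.clear()  # free memory, real files can hold many spectra

# Hypothetical, heavily abbreviated mzML content for illustration only.
TINY_MZML = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="3">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="5">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

if __name__ == "__main__":
    for spectrum_id, ms_level, n_points in summarize_spectra(io.BytesIO(TINY_MZML)):
        print(spectrum_id, ms_level, n_points)
```

Streaming with `iterparse` rather than loading the whole tree is a deliberate choice here, since converted raw files can be large.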
Hence, the first step in a metabolomics data processing workflow with Open Source tools is the conversion to an open raw data format. One of the main routes to mzML-formatted data is the Open Source converter msconvert [3], developed by the Proteowizard team, which is one of the reference implementations for mzML. msconvert is a command line tool for converting between various file formats.
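As a rough illustration of what a scripted conversion could look like, the sketch below builds an msconvert command line from Python. It assumes an `msconvert` executable is reachable on the PATH; the helper names `msconvert_command` and `convert`, and the file names `sample.raw` and `converted`, are hypothetical. Only the `--mzML` and `-o` options of msconvert are used.

```python
import shlex
import shutil
import subprocess

def msconvert_command(raw_file, out_dir, executable="msconvert"):
    """Build the argument list for converting one raw file to mzML."""
    return [
        executable,
        str(raw_file),
        "--mzML",            # write the output as mzML
        "-o", str(out_dir),  # directory for the converted file
    ]

def convert(raw_file, out_dir, executable="msconvert"):
    """Run the conversion; raises if msconvert is missing or fails."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    subprocess.run(msconvert_command(raw_file, out_dir, executable), check=True)

if __name__ == "__main__":
    # Only print the command here; 'sample.raw' is a hypothetical input file.
    print(shlex.join(msconvert_command("sample.raw", "converted")))
```

Keeping the command construction separate from its execution makes it easy to check the arguments without having msconvert or a vendor file at hand.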
It can convert to mzML from Sciex, Bruker, Thermo, Agilent, Shimadzu and Waters formats, and also from earlier file formats such as mzData or mzXML. Although Proteowizard initially targeted LC/MS data, it can also readily convert GC/MS data, for example from the Waters GCT Premier or Agilent instruments.

In PhenoMeNal, we have managed to package msconvert in a
Docker container and execute it in the WINE environment (“Run Windows applications on Linux, BSD, Solaris and Mac OS X”). Moreover, it is possible to integrate this into the Galaxy workflow system, again running on a Linux server. The Dockerfile build script is hosted as part of the PhenoMeNal github team in the docker-pwiz repository.

Of course, nothing is perfect (yet). For example, WINE currently crashes with some of the vendor formats, but its development version has improved a lot in this area. Also, the Galaxy integration is currently only a technology demonstration, where the Docker container is invoked locally. The real power comes when the dockerized Galaxy tools can execute anywhere, from local workstations to in-house OpenStack clusters and public cloud providers such as Google Cloud. In PhenoMeNal we also aim at massively parallelizing conversion and thus speeding up this computationally intensive and time-consuming process.

Figure 1: A tiny workflow.
Figure 2: The successful conversion with the docker-y-fied msconvert tool. Currently, mzXML and the Bruker vendor format have been tested as inputs.
Figure 3: The resulting mzML file for threonine.

References:
1. Martens L, Chambers M, Sturm M, et al. (2011) mzML—a Community Standard for Mass Spectrometry Data. Molecular & Cellular Proteomics: MCP, 10(1).
2. Rocca-Serra P, Salek RM, Arita M, et al. (2016) Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics, 12:14.
3. Chambers MC, Maclean B, Burke R, et al. (2012) A Cross-platform Toolkit for Mass Spectrometry and Proteomics. Nature Biotechnology, 30(10), 918-920.