Supplementary Information
Virtual Machine Image of drVM
Command line of drVM
Graphical user interface of drVM
Create new DB with no genus taxonomy
Complete viral sequence retrieval
Docker of drVM
Amazon Machine Image of drVM

Virtual Machine Image of drVM
Download Virtual Box
https://www.virtualbox.org/wiki/Downloads
Download the Virtual Machine image file of drVM via
https://sourceforge.net/projects/sb2nhri/files/drVM/drVM.zip (2.43GB)
Unzip the file to drVM.ova
Open VirtualBox
File -> Import Appliance...
Select the file to import
Or, directly double click on drVM.ova
You can change RAM to 16000 MB.
Please check the box of "Reinitialize the MAC address of all network cards"
Import
Click on Shared folders to add share (e.g. Data_drVM in your local computer)
Check Auto-mount
OK -> OK
Start
Devices -> Shared Clipboard -> Bidirectional
Open Terminal
Mount the shared folder:
sudo mount -t vboxsf Data_drVM MyData
(Password for manager: manager)
Command line of drVM
In terminal:
CreateDB.py -h
Usage:
CreateDB.py -s sequence.fasta -d db_size -kn on/off
-d <int> [Mbp per snap db. default: 200 (200 Mbp)]
-kn on/off [keep nogenus taxonomy. default: off]
Please give a sequence file.
drVM.py -h
usage:
drVM.py -1 read1.fastq -2 read2.fastq [options]
options:
-type iontorrent [default: illumina]
-dn on/off [digital normalization. default: on]
-t <int> [number of threads, default: 2]
-md <int> [min depth, default: 1]
-ar <float> [alignment rate, default: 0.5 (0.1~0.9)]
-bi <int> [blast identity, default: 80 (50~100)]
-cl <int> [contig length, to keep assembly, default: 3000]
-keep [keep sam file]
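When drVM.py is launched from a script, the option ranges documented in the usage text
can be checked before a run starts. The following is a minimal Python sketch and not part
of drVM: the helper name build_drvm_command and its keyword arguments are hypothetical,
and only the flags and ranges listed in the usage text are used.

```python
def build_drvm_command(read1, read2=None, platform="illumina", dn=True,
                       threads=2, min_depth=1, align_rate=0.5,
                       blast_identity=80, contig_length=3000, keep_sam=False):
    """Return a drVM.py argument list (hypothetical helper, not part of drVM)."""
    if platform not in ("illumina", "iontorrent"):
        raise ValueError("platform must be 'illumina' or 'iontorrent'")
    # Ranges taken from the usage text: -ar 0.1~0.9, -bi 50~100.
    if not 0.1 <= align_rate <= 0.9:
        raise ValueError("-ar must be between 0.1 and 0.9")
    if not 50 <= blast_identity <= 100:
        raise ValueError("-bi must be between 50 and 100")

    cmd = ["drVM.py", "-1", read1]
    if read2 is not None:
        cmd += ["-2", read2]
    # illumina is the documented default, so -type is only passed for Ion Torrent.
    if platform == "iontorrent":
        cmd += ["-type", "iontorrent"]
    cmd += ["-dn", "on" if dn else "off",
            "-t", str(threads), "-md", str(min_depth),
            "-ar", str(align_rate), "-bi", str(blast_identity),
            "-cl", str(contig_length)]
    if keep_sam:
        cmd.append("-keep")
    return cmd
```

The returned list can be passed to subprocess.run, or joined with spaces to print the
equivalent command line.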
The viral sequences (sequence.fasta, downloaded from NCBI on 16 March 2016 using the
query term “(complete[title]) AND (viridae[organism])”) were saved in
/home/manager/Tools/MyDB. The MyDB folder is the default reference location for
drVM. To use another location, export its path as MyDB. To retrieve viral sequences
yourself, please refer to Complete viral sequence retrieval.
To create reference databases:
cd Tools/MyDB
CreateDB.py -s sequence.fasta
To run drVM:
export MyDB='/home/manager/Tools/MyDB'
To make a new folder for reference databases:
cd ~/Tools/drVM
mkdir VMDB
cd VMDB
CreateDB.py -s yourpath/sequence.fasta -d 100 (For 8 GB RAM)
Please note that if drVM is run with less than 16 GB of RAM, a smaller db size should be
specified. Otherwise, an empty snap_index database will be produced.
To run drVM:
export MyDB='/home/manager/Tools/drVM/VMDB'
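The relation between available RAM and the -d value can be written down as a small lookup.
This is a sketch only: the function name suggest_db_size is hypothetical, the two values
(200 Mbp for 16 GB, 100 Mbp for 8 GB) are the ones given in this document, and using 100
for anything between 8 and 16 GB is a conservative assumption rather than a tested setting.

```python
def suggest_db_size(ram_gb):
    """Suggest a CreateDB.py -d value (Mbp per snap db) for the available RAM."""
    if ram_gb >= 16:
        return 200  # documented default, reported as OK for 16 GB RAM
    if ram_gb >= 8:
        return 100  # value used in this document for 8 GB RAM (assumed for 8-16 GB)
    # No value is documented below 8 GB, so refuse to guess.
    raise ValueError("no db size is documented for less than 8 GB of RAM")


print("CreateDB.py -s sequence.fasta -d", suggest_db_size(8))
```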
To run SRR1170797:
cd ~
cd Templates
mkdir Run
cd Run
mkdir SRR1170797
cd SRR1170797
fastq-dump --split-spot --skip-technical SRR1170797
(Or,
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR117/SRR1170797/SRR1170797.sra
fastq-dump --split-spot --skip-technical SRR1170797.sra)
drVM.py -1 SRR1170797.fastq -type iontorrent
09-14 11:04:51 : drVM, Start.  09-14 11:07:11 : drVM, Finish. (@16GB RAM)
A folder named myOutput_YYYYMMDD_HHMM is produced.
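The example paths in this document (myOutput_20160914_1104 for the run that started at
09-14 11:04:51) suggest that the folder name is built from the start date plus hour and
minute. Under that assumption, a script can predict the folder name as sketched below;
the function name expected_output_folder is hypothetical.

```python
from datetime import datetime


def expected_output_folder(start):
    """Folder name inferred from the examples in this document (an assumption)."""
    # Date as YYYYMMDD, then hour and minute of the start time.
    return start.strftime("myOutput_%Y%m%d_%H%M")


print(expected_output_folder(datetime(2016, 9, 14, 11, 4, 51)))
```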
Please note that the sequencing depth (Y-axis) was obtained by aligning the sequencing
reads to the de novo assembly
(Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa)
using snap (for Ion Torrent data). The depth was then placed on the reference genome
(KT832817, the best hit of the assembly) according to the corresponding coordinates
(from a blast of Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa
against KT832817).
To obtain a SAM alignment file:
cd myOutput_20160914_1104/result/Pestivirus_rawDB
snap index Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa snap_index -O1000
snap single snap_index ../../../SRR1170797.fastq -o snap.sam -x -h 250 -d 12 -n 25 -F a
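Once snap.sam exists, a per-position depth profile can be derived from the alignment
records. The sketch below is a generic illustration and not drVM's own code: it assumes a
single reference sequence, counts only aligned bases (CIGAR operations M, = and X), and
leaves deleted positions uncovered. The function name depth_from_sam is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def depth_from_sam(lines, ref_length):
    """Return a list with the read depth at each reference position (0-based)."""
    depth = [0] * ref_length
    for line in lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":      # unmapped read
            continue
        ref_pos = pos - 1                 # SAM positions are 1-based
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":               # aligned bases cover the reference
                for i in range(ref_pos, min(ref_pos + length, ref_length)):
                    depth[i] += 1
                ref_pos += length
            elif op in "DN":              # deletion or skip: advance without covering
                ref_pos += length
            # I, S, H and P do not consume reference positions
    return depth
```

For a real file, pass an open file handle, for example depth_from_sam(open("snap.sam"), n),
where n is the length of the assembled contig.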
To run DRR049387:
cd ~/Templates/Run
mkdir DRR049387
cd DRR049387
fastq-dump --split-spot --skip-technical --split-files DRR049387
drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq
09-14 12:26:54 : drVM, Start.  09-14 12:38:48 : drVM, Finish. (@16GB RAM)
raw_ctg.fa:
raw.blast.result (against refDB):
To change blast identity (minimum percent of identity):
drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -bi 70
09-14 12:54:37 : drVM, Start.  09-14 13:02:00 : drVM, Finish. (@16GB RAM)
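The -bi value is described above as a minimum percent identity. As a generic illustration
of what such a cutoff does (this is not drVM's internal code, and the format of drVM's own
result files is not assumed), the sketch below filters BLAST tabular output (-outfmt 6),
in which the third column is the percent identity. The function name and the example rows
are made up.

```python
def filter_hits_by_identity(tabular_lines, min_identity=80.0):
    """Keep BLAST tabular (-outfmt 6) hits with percent identity >= min_identity."""
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) of -outfmt 6 is pident, the percent identity.
        if float(fields[2]) >= min_identity:
            kept.append(fields)
    return kept


# Invented example rows, for illustration only.
example = [
    "ctg1\trefA\t92.00\t500\t40\t0\t1\t500\t1\t500\t1e-100\t700",
    "ctg2\trefB\t75.50\t400\t98\t0\t1\t400\t1\t400\t1e-50\t300",
]
print(len(filter_hits_by_identity(example, 80)))
print(len(filter_hits_by_identity(example, 70)))
```

With the default cutoff of 80 only the first invented hit is kept, while lowering the
cutoff to 70 keeps both.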
Although seven of the 13 assembled contigs (raw_ctg.fa) were annotated as
Rotavirus A using drVM’s refDB, they were all identified as human rotavirus A by blastn
against nt.
cd myOutput_20160914_1254/result/Rotavirus_refDB
blastn -db nt -query raw_ctg.fa -out my.blastn.html -html -remote -num_descriptions 1 -num_alignments 0
To run SRR062073:
cd ~/Templates/Run
mkdir SRR062073
cd SRR062073
fastq-dump --split-spot --split-files --skip-technical SRR062073
drVM.py -1 SRR062073_1.fastq -2 SRR062073_2.fastq
10-05 09:18:31 : drVM, Start.  10-05 09:34:05 : drVM, Finish. (@16GB RAM)
To run ERR690519:
cd ~/Templates/Run
mkdir ERR690519
cd ERR690519
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR690/ERR690519/ERR690519.sra
fastq-dump --split-spot --split-files --skip-technical ERR690519.sra
drVM.py -1 ERR690519_1.fastq -2 ERR690519_2.fastq
10-05 09:39:12 : drVM, Start.  10-05 15:13:42 : drVM, Finish. (@16GB RAM)
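The Start and Finish timestamps printed by drVM can be converted into a run time in
seconds. The following is a small sketch; the function name run_time_seconds is
hypothetical, and because the timestamps carry no year, it assumes that both fall in the
same year.

```python
from datetime import datetime


def run_time_seconds(start, finish, fmt="%m-%d %H:%M:%S"):
    """Seconds elapsed between two drVM log timestamps such as '10-05 09:39:12'."""
    t0 = datetime.strptime(start, fmt)   # the missing year defaults to 1900
    t1 = datetime.strptime(finish, fmt)
    return int((t1 - t0).total_seconds())


print(run_time_seconds("10-05 09:39:12", "10-05 15:13:42"))  # 20070
```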
To change RAM:
Again, please note that if drVM is run with less than 16 GB of RAM, a smaller db size
should be specified. Otherwise, an empty snap_index database will be produced. For 16
GB of RAM, a db size of 200 Mbp is adequate.
Run accession   Target virus                  Run time with 16 GB RAM   Run time with 8 GB RAM
SRR1170797      Bovine viral diarrhea virus   140 sec                   187 sec
SRR062073       Human papillomavirus          934 sec                   1,449 sec
DRR049387       Human rotavirus               714 sec                   562 sec
ERR690519       Influenza A virus             20,070 sec                24,478 sec
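To compare the two memory settings, the run times listed above can be expressed as a
ratio of the 8 GB run time to the 16 GB run time. The sketch below only restates the
numbers from the table; the names RUN_TIMES and ratio_8gb_to_16gb are hypothetical.

```python
# (seconds with 16 GB RAM, seconds with 8 GB RAM), copied from the table.
RUN_TIMES = {
    "SRR1170797": (140, 187),
    "SRR062073": (934, 1449),
    "DRR049387": (714, 562),
    "ERR690519": (20070, 24478),
}


def ratio_8gb_to_16gb(run):
    """Run time with 8 GB RAM divided by run time with 16 GB RAM, to 2 decimals."""
    t16, t8 = RUN_TIMES[run]
    return round(t8 / t16, 2)


for run in RUN_TIMES:
    print(run, ratio_8gb_to_16gb(run))
```

A ratio above 1 means the 8 GB run took longer; a ratio below 1 means it finished sooner.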
Graphical user interface of drVM
In the virtual machine of drVM, you can use the graphical user interface:
Double click on CreateDB.py.
Just click on Create to create references for drVM.
CreateDB.py takes less than thirty minutes (10:06 to 10:25 @ 16GB RAM) to produce
reference databases for drVM.
Double click on drVM.py to select reads for running drVM.
Change Blast identity to 70.
Click on Go.
An output folder will be generated on the desktop.
For SRR062073:
For SRR1170797:
Check iontorrent:
Create new DB with no genus taxonomy
Command line:
CreateDB.py -s sequence.fasta -kn on
GUI:
Right click to create a new folder: DB_nogenus
Change DB location
Check keep nogenus taxonomy
Create DB
cd ~/Templates/Run
mkdir ERR233428
cd ERR233428
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR233/ERR233428/ERR233428.sra
fastq-dump --split-spot --split-files --skip-technical ERR233428.sra
Change DB location and select reads
In addition to the regular genus folders in the result folder, there is a noGenus folder.
Reads aligned to reference genomes with no genus annotation were assembled into
contigs.
We take Torque teno virus (TTV) as an example: many TTV genomes have been completely
sequenced and deposited in NCBI, but there is no genus information for this
species.
Viruses; ssDNA viruses; Anelloviridae; unclassified Anelloviridae; Torque teno virus
In the publication by Cotten et al. (Cotten M, Oude Munnink B, Canuti M, Deijs M, Watson
SJ, et al. (2014) Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic
Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification
Algorithm. PLoS ONE 9(4): e93269. doi:10.1371/journal.pone.0093269), four viral genomes
were assembled for sample 17 (sequencing reads in ERR233428): adenovirus (KJ194501),
human papillomavirus (KJ194499), norovirus (KJ194507) and torque teno virus (KJ194502).
Blasting the assemblies against these genomes gave the following sequence identities:
the assembled HPV (length: 7782 bp) against KJ194499, 99.97%; the assembled adenovirus
(length: 34858 bp) against KJ194501, 99.98%; the assembled norovirus (length: 7594 bp)
against KJ194507 (length: 6326 bp), 99.97%; the assembled TTV (length: 3813 bp)
against KJ194502, 99.81%. The assembled sequences can be found in Table S1.
To align ERR233428.fastq on the assembled TTV using SOAP2 (-M 4 -r 2):
Alignment: 1349 ( 0.04%)
To align ERR233428.fastq on the reference KJ194502 using SOAP2 (-M 4 -r 2):
Alignment: 1308 ( 0.04%)
Please note that SNAP reported all aligned reads; the reads were then gathered into
FASTQ files (e.g. Alphapapillomavirus.x.fastq and noGenus.x.fastq) in the GenusFastq
folder. Two nearly identical contigs were de novo assembled from the collected reads
in two different genus folders, Alphapapillomavirus_rawDB and noGenus_rawDB.
To blast the assembled HPV in Alphapapillomavirus_rawDB against the assembled
HPV in noGenus_rawDB:
Complete viral sequence retrieval
1. via http://www.ncbi.nlm.nih.gov/
Select Nucleotide and search for “(complete[title]) AND (viridae[organism])”
Send to File
Format: FASTA
Create File
(Searched on Oct 20, 2016)
Please note that sequences in FASTA format do not display GI numbers by default.
drVM is able to build reference databases using sequences without GI numbers.
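For scripts that post-process sequence.fasta, both header styles (with and without a GI
number) can be reduced to the accession. This is a sketch only and not drVM's own parser;
the function name accession_from_header is hypothetical and the GI number in the example
is a placeholder.

```python
def accession_from_header(header):
    """Return accession.version from a FASTA header with or without a GI prefix."""
    name = header.lstrip(">").split()[0]
    if name.startswith("gi|"):
        # Older style: gi|<GI number>|<database>|<accession.version>|
        return name.split("|")[3]
    # Newer style: the identifier is the accession.version itself.
    return name


print(accession_from_header(">gi|123456|gb|KT832817.1| Bovine viral diarrhea virus 2"))
print(accession_from_header(">KT832817.1 Bovine viral diarrhea virus 2"))
```

Both calls print KT832817.1.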
2. using Entrez Direct (EDirect):
http://www.ncbi.nlm.nih.gov/books/NBK179288/
To install the EDirect software, copy the following commands and paste them into a
terminal window:
cd ~
perl -MNet::FTP -e \
'$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
$ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
unzip -u -q edirect.zip
rm edirect.zip
export PATH=$PATH:$HOME/edirect
./edirect/setup.sh
esearch -db nuccore -query "(complete[title]) AND (viridae[organism])" | efetch -format fasta > sequence.fasta
3. Download the pre-downloaded sequences (Mar 16, 2016) from sourceforge.
https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
Extract the archive to obtain sequence.fasta
Docker of drVM
Docker in Ubuntu (This test was performed in VirtualBox of drVM)
curl -sSL https://get.docker.com/ | sudo sh
(Password for manager: manager)
Start docker:
Check status:
sudo service docker status
Pull the docker image of drVM:
sudo docker run -t -i -v /home/manager/Templates:/drVM 990210oliver/drvm /bin/bash
cd drVM
mkdir VMDB
cd VMDB
wget https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
tar -zxvf sequence_20160316.tar.gz
CreateDB.py -s sequence.fasta
export MyDB='/drVM/VMDB'
drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16
drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16 -bi 70
drVM.py -1 SRR1170797.fastq -type iontorrent -t 16
drVM.py -1 SRR062073_1.fastq -2 SRR062073_2.fastq -t 16
Docker in Windows (Windows 10)
Please refer to https://docs.docker.com/docker-for-windows/ for details.
The Settings dialogs provide options to allow Docker auto-start, automatically check
for updates, share local drives with Docker containers, enable VPN compatibility,
manage CPUs and memory Docker uses, restart Docker, or perform a factory reset.
Share your local drives with Docker:
cmd.exe
docker --version
docker run -t -i -v c:/Users/Jade/test:/drVM 990210oliver/drvm /bin/bash
cd drVM
mkdir VMDB
cd VMDB
wget https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
tar -zxvf sequence_20160316.tar.gz
CreateDB.py -s sequence.fasta
export MyDB='/drVM/VMDB'
drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16
Advisory: Docker for Windows is currently in public beta. Some functionality may
change before the product becomes generally available.
Amazon Machine Image of drVM
Go to http://aws.amazon.com/
Sign in to the console (if you already have an account; otherwise sign up with a new
account)
Go to the ‘AWS Management Console’ and click ‘EC2’ at the upper left.
Before importing the AMI, make sure you are in the correct region. Amazon EC2 is
hosted in multiple locations world-wide, each with multiple Availability Zones, and
resources are not replicated across regions unless you specify it. Our AMI is stored in
region “US East (N. Virginia)”. Check the upper right corner next to your account name,
and make sure it is set to the correct region. If not, click it and select the correct one
from the dropdown menu.
Search Public Images for “drVM”
Next, click the blue ‘Launch Instance’ button
Or,