Supplementary Information

Virtual Machine Image of drVM
Command line of drVM
Graphical user interface of drVM
Create DB with no genus taxonomy
Complete viral sequence retrieval
Docker of drVM
Amazon Machine Image of drVM

Virtual Machine Image of drVM

Download VirtualBox:
    https://www.virtualbox.org/wiki/Downloads
Download the Virtual Machine image file of drVM (2.43 GB):
    https://sourceforge.net/projects/sb2nhri/files/drVM/drVM.zip
Unzip the file to obtain drVM.ova.
Open VirtualBox.
File -> Import Appliance...
Select the file to import. (Or double-click drVM.ova directly.)
You can change RAM to 16000 MB.
Please check the box "Reinitialize the MAC address of all network cards".
Import
Click on Shared folders to add a share (e.g. Data_drVM on your local computer).
Check Auto-mount.
OK -> OK
Start
Devices -> Shared Clipboard -> Bidirectional
Open Terminal.
Mount the shared folder:
    sudo mount -t vboxsf Data_drVM MyData
    (Password for manager: manager)

Command line of drVM

In terminal:

    CreateDB.py -h
    Usage: CreateDB.py -s sequence.fasta -d db_size -kn on/off
        -d <int>      [Mbp per snap db. default: 200 (200 Mbp)]
        -kn on/off    [keep nogenus taxonomy. default: off]
    Please give a sequence file.

    drVM.py -h
    usage: drVM.py -1 read1.fastq -2 read2.fastq [options]
    options:
        -type iontorrent   [default: illumina]
        -dn on/off         [digital normalization. default: on]
        -t <int>           [number of threads, default: 2]
        -md <int>          [min depth, default: 1]
        -ar <float>        [alignment rate, default: 0.5 (0.1~0.9)]
        -bi <int>          [blast identity, default: 80 (50~100)]
        -cl <int>          [contig length, to keep assembly, default: 3000]
        -keep              [keep sam file]

The viral sequences (sequence.fasta, 16 March 2016), downloaded from NCBI using the query term “(complete[title]) AND (viridae[organism])”, were saved in /home/manager/Tools/MyDB. The MyDB folder is the default reference location for drVM. If you want to use another location, you should export its path. For how to retrieve viral sequences, please refer to the section Complete viral sequence retrieval.

To create reference databases:
    cd Tools/MyDB
    CreateDB.py -s sequence.fasta
To run drVM:
    export MyDB='/home/manager/Tools/MyDB'

To make a new folder for reference databases:
    cd ~/Tools/drVM
    mkdir VMDB
    cd VMDB
    CreateDB.py -s yourpath/sequence.fasta -d 100    (for 8 GB RAM)
Please note that if drVM is run with less than 16 GB of RAM, a smaller db size should be specified. Otherwise, an empty snap_index database will be produced.
To run drVM:
    export MyDB='/home/manager/Tools/drVM/VMDB'

To run SRR1170797:
    cd ~
    cd Templates
    mkdir Run
    cd Run
    mkdir SRR1170797
    cd SRR1170797
    fastq-dump --split-spot --skip-technical SRR1170797
    (Or:
    wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR117/SRR1170797/SRR1170797.sra
    fastq-dump --split-spot --skip-technical SRR1170797.sra)
    drVM.py -1 SRR1170797.fastq -type iontorrent
    09-14 11:04:51 : drVM, Start.
    09-14 11:07:11 : drVM, Finish. (@16GB RAM)
A folder named myOutput_YYYYMMDD_HHMM is produced.

Please note that the sequencing depth (Y-axis) was obtained by aligning the sequencing reads to the de novo assembly (Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa) using snap (for Ion Torrent data).
The depth was then placed on the reference genome (KT832817, the best hit of the assembly) according to the corresponding coordinates (obtained by blasting Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa against KT832817).

To obtain a SAM alignment file:
    cd myOutput_20160914_1104/result/Pestivirus_rawDB
    snap index Bovine_viral_diarrhea_virus_2_strain_USMARC60764_acc_KT832817_1.ctg.fa snap_index -O1000
    snap single snap_index ../../../SRR1170797.fastq -o snap.sam -x -h 250 -d 12 -n 25 -F a

To run DRR049387:
    cd ~/Templates/Run
    mkdir DRR049387
    cd DRR049387
    fastq-dump --split-spot --skip-technical --split-files DRR049387
    drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq
    09-14 12:26:54 : drVM, Start.
    09-14 12:38:48 : drVM, Finish. (@16GB RAM)
[Figure: raw_ctg.fa]
[Figure: raw.blast.result (against refDB)]

To change blast identity (minimum percent identity):
    drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -bi 70
    09-14 12:54:37 : drVM, Start.
    09-14 13:02:00 : drVM, Finish. (@16GB RAM)
Although seven of the 13 assembled contigs (raw_ctg.fa) were annotated as Rotavirus A using drVM’s refDB, all of them were identified as human rotavirus A by blastn against nt.
    cd myOutput_20160914_1254/result/Rotavirus_refDB
    blastn -db nt -query raw_ctg.fa -out my.blastn.html -html -remote -num_descriptions 1 -num_alignments 0
e.g., [Figure: blastn result]

To run SRR062073:
    cd ~/Templates/Run
    mkdir SRR062073
    cd SRR062073
    fastq-dump --split-spot --split-files --skip-technical SRR062073
    drVM.py -1 SRR062073_1.fastq -2 SRR062073_2.fastq
    10-05 09:18:31 : drVM, Start.
    10-05 09:34:05 : drVM, Finish. (@16GB RAM)

To run ERR690519:
    cd ~/Templates/Run
    mkdir ERR690519
    cd ERR690519
    wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR690/ERR690519/ERR690519.sra
    fastq-dump --split-spot --split-files --skip-technical ERR690519.sra
    drVM.py -1 ERR690519_1.fastq -2 ERR690519_2.fastq
    10-05 09:39:12 : drVM, Start.
    10-05 15:13:42 : drVM, Finish. (@16GB RAM)

To change RAM:
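Because the db size passed to CreateDB.py with -d depends on the available RAM, the choice can be scripted. The sketch below is a hypothetical helper, not part of drVM. It encodes only the two values given in this document (-d 100 for 8 GB, the default of 200 Mbp for 16 GB) and assumes, without confirmation from the authors, that 100 Mbp is also suitable between 8 and 16 GB. The RAM size is passed in by hand as whole GB rather than detected.

```shell
#!/bin/sh
# Hypothetical helper (not part of drVM): map RAM in whole GB to a value
# for the -d option of CreateDB.py.
pick_db_size() {
  ram_gb=${1:-0}
  if [ "$ram_gb" -ge 16 ]; then
    echo 200   # default db size, stated as OK for 16 GB RAM
  elif [ "$ram_gb" -ge 8 ]; then
    echo 100   # value used in this document for 8 GB RAM (assumed for 8-15 GB)
  else
    # The document gives no value below 8 GB, so refuse to guess.
    echo "no documented db size for ${ram_gb} GB RAM" >&2
    return 1
  fi
}

# Example: print the CreateDB.py command for an 8 GB machine.
size=$(pick_db_size 8) && echo "CreateDB.py -s sequence.fasta -d $size"
```

For machines with less than 8 GB the helper fails on purpose, since a suitable db size would have to be found by trial.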
Please note that if drVM is run with less than 16 GB of RAM, a smaller db size should be specified. Otherwise, an empty snap_index database will be produced. For 16 GB RAM, a db size of 200 Mbp is OK.

Run accession   Target virus                  Run time with 16 GB RAM   Run time with 8 GB RAM
SRR1170797      Bovine viral diarrhea virus   140 sec                   187 sec
SRR062073       Human papillomavirus          934 sec                   1,449 sec
DRR049387       Human rotavirus               714 sec                   562 sec
ERR690519       Influenza A virus             20,070 sec                24,478 sec

Graphical user interface of drVM

In the virtual machine of drVM, you can use the graphical user interface:
Double-click on CreateDB.py.
Just click on Create to create references for drVM.
CreateDB.py takes less than thirty minutes (10:06 to 10:25 @ 16GB RAM) to produce reference databases for drVM.
Double-click on drVM.py to select reads for running drVM.
Change Blast identity to 70.
Click on Go.
An output folder will be generated on the desktop.
[Figure: For SRR062073]
[Figure: For SRR1170797, check iontorrent]

Create new DB with no genus taxonomy

Command line:
    CreateDB.py -s sequence.fasta -kn on
GUI:
Right-click to create a new folder: DB_nogenus
Change DB location
Check keep nogenus taxonomy
Create DB

    cd ~/Templates/Run
    mkdir ERR233428
    cd ERR233428
    wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR233/ERR233428/ERR233428.sra
    fastq-dump --split-spot --split-files --skip-technical ERR233428.sra
Change DB location and select reads.

In addition to the regular genus folders in the result folder, there is a noGenus folder. Reads aligned to reference genomes with no genus annotation were assembled into contigs. We take Torque teno virus (TTV) as an example: many TTV genomes have been completely sequenced and deposited in NCBI; however, there is no genus information for this species.
    Viruses; ssDNA viruses; Anelloviridae; unclassified Anelloviridae; Torque teno virus

In Cotten et al.’s publication (Cotten M, Oude Munnink B, Canuti M, Deijs M, Watson SJ, et al. (2014) Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm. PLoS ONE 9(4): e93269. doi:10.1371/journal.pone.0093269), four viral genomes were assembled for sample 17 (sequencing reads in ERR233428): adenovirus (KJ194501), human papillomavirus (KJ194499), norovirus (KJ194507) and torque teno virus (KJ194502).

Blast results of the drVM assemblies against these genomes:
    assembled HPV (length: 7782 bp) against KJ194499, sequence identity: 99.97%
    assembled adenovirus (length: 34858 bp) against KJ194501, sequence identity: 99.98%
    assembled norovirus (length: 7594 bp) against KJ194507 (length: 6326 bp), sequence identity: 99.97%
    assembled TTV (length: 3813 bp) against KJ194502, sequence identity: 99.81%
The assembled sequences can be found in Table S1.

To align ERR233428.fastq on the assembled TTV using SOAP2 (-M 4 -r 2):
    Alignment: 1349 (0.04%)
To align ERR233428.fastq on the reference KJ194502 using SOAP2 (-M 4 -r 2):
    Alignment: 1308 (0.04%)

Please note that SNAP reported all aligned reads; the reads were then gathered as FASTQ files (e.g. Alphapapillomavirus.x.fastq and noGenus.x.fastq) in the GenusFastq folder. Two nearly identical contigs in the different genus folders, Alphapapillomavirus_rawDB and noGenus_rawDB, were de novo assembled from the collected reads.
[Figure: blast of the assembled HPV in Alphapapillomavirus_rawDB against the assembled HPV in noGenus_rawDB]

Complete viral sequence retrieval

1. Via http://www.ncbi.nlm.nih.gov/
Select Nucleotide and search for “(complete[title]) AND (viridae[organism])”.
Send to -> File -> Format: FASTA -> Create File
(Searched on Oct 20, 2016)
Please note that sequences in FASTA format do not display GI numbers by default. drVM is able to build reference databases using sequences without GI numbers.

2. Using Entrez Direct (EDirect): http://www.ncbi.nlm.nih.gov/books/NBK179288/
To install the EDirect software, copy the following commands and paste them into a terminal window:
    cd ~
    perl -MNet::FTP -e \
      '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
       $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
    unzip -u -q edirect.zip
    rm edirect.zip
    export PATH=$PATH:$HOME/edirect
    ./edirect/setup.sh
    esearch -db nuccore -query "(complete[title]) AND (viridae[organism])" | efetch -format fasta > sequence.fasta

3. Download the pre-downloaded sequences (Mar 16, 2016) from SourceForge:
    https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
Extract the archive to obtain sequence.fasta.

Docker of drVM

Docker in Ubuntu (this test was performed in the VirtualBox image of drVM)
    curl -sSL https://get.docker.com/ | sudo sh
    (Password for manager: manager)
Start docker:
Check status:
    sudo service docker status
Pull and run the docker image of drVM:
    sudo docker run -t -i -v /home/manager/Templates:/drVM 990210oliver/drvm /bin/bash
    cd drVM
    mkdir VMDB
    cd VMDB
    wget https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
    tar -zxvf sequence_20160316.tar.gz
    CreateDB.py -s sequence.fasta
    export MyDB='/drVM/VMDB'
    drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16
    drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16 -bi 70
    drVM.py -1 SRR1170797.fastq -type iontorrent -t 16
    drVM.py -1 SRR062073_1.fastq -2 SRR062073_2.fastq -t 16

Docker in Windows (Windows 10)
Please refer to https://docs.docker.com/docker-for-windows/ for details.
The Settings dialogs provide options to allow Docker auto-start, automatically check for updates, share local drives with Docker containers, enable VPN compatibility, manage CPUs and memory Docker uses, restart Docker, or perform a factory reset.
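Sharing a local drive matters because the docker run commands in this document use -v to bind-mount a host folder at /drVM inside the container, so files written under /drVM (such as the VMDB folder) remain in that host folder. The sketch below is a hypothetical helper, not part of drVM, that only prints the docker run command for a given host folder. The image name and the /drVM mount point are taken from the commands in this document.

```shell
#!/bin/sh
# Hypothetical helper (not part of drVM): print the docker run command that
# bind-mounts a given host folder at /drVM inside the drVM container.
drvm_docker_cmd() {
  host_dir=${1:-}
  if [ -z "$host_dir" ]; then
    echo "usage: drvm_docker_cmd HOST_DIR" >&2
    return 1
  fi
  # -v HOST_DIR:/drVM makes the host folder visible as /drVM in the container.
  printf 'docker run -t -i -v %s:/drVM 990210oliver/drvm /bin/bash\n' "$host_dir"
}

# Example with the host folder used in the Ubuntu section
# (prefix the printed command with sudo where Docker requires it).
drvm_docker_cmd /home/manager/Templates
```

Printing the command instead of running it keeps the sketch usable on machines without Docker; the printed line can be checked and then pasted into a terminal.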
[Figure: Share your local drives with Docker]

In cmd.exe:
    docker --version
    docker run -t -i -v c:/Users/Jade/test:/drVM 990210oliver/drvm /bin/bash
    cd drVM
    mkdir VMDB
    cd VMDB
    wget https://sourceforge.net/projects/sb2nhri/files/drVM/sequence_20160316.tar.gz
    tar -zxvf sequence_20160316.tar.gz
    CreateDB.py -s sequence.fasta
    export MyDB='/drVM/VMDB'
    drVM.py -1 DRR049387_1.fastq -2 DRR049387_2.fastq -t 16
Advisory: Docker for Windows is currently in public beta. Some functionality may change before the product becomes generally available.

Amazon Machine Image of drVM

Go to http://aws.amazon.com/
Sign in to the console (if you already have an account; otherwise sign up for a new account).
Go to the ‘AWS Management Console’ option and click ‘EC2’ at the upper left.
Before importing the AMI, make sure you are in the correct region. Amazon EC2 is hosted in multiple locations worldwide with multiple Availability Zones, and resources are not replicated across regions unless you specifically do so. Our AMI is stored in the region “US East (N. Virginia)”. Check the upper right corner next to your account name, and make sure it is set to the correct region. If not, just click and select the correct one from the dropdown menu.
Search Public Images for “drVM”.
Next, click the blue ‘Launch Instance’ button.
Or,