Net BLAST - Microsoft Research
Bioinformatics at USDA-ARS Livestock Issues Research Unit
Scot E. Dowd, Joaquin Zaragoza, Mel Oliver and Paxton Payton

Projects
• Future: Interactive neural-network-based models to describe and predict gene expression in livestock and pathogens
• Present: Various projects in various states, leading to the future
– Molecular modeling
– Gene finding
– Distributed BLAST
– Whole-genome comparison
– Functional genomics and pathways
– Pathway- or system-targeted microarray design

Functional Genomics
• Functional genomics / Gene Ontology: controlled vocabulary
• Define, annotate, categorize, and describe large genetic datasets (e.g. EST, mRNA)
• We have developed a custom curated database for functional domain BLAST (regular BLAST and RPS-BLAST using KOG, COG, Pfam, hmmr, SMART domains)
• Ultimately this will become a comprehensive .NET suite of analyses for microarray design, from new sequence all the way to result visualization.

Ontology
• Annotation – propagation of error in definitions
• Ca

BLAST: need for speed (II)
• We are working with roughly 5000-100,000 queries against 1 GB databases
• 1 query takes a fairly fast PC 3 minutes to complete
– dual 3.2 GHz Xeon
– 6 GB RAM
– RAID 0 SCSI-320 HD
• Other methods: MPI-BLAST, WU-BLAST, threaded BLAST, SGE-BLAST, commercial TURBO BLAST, DNAstar, etc.

BLAST ALGORITHM
Cgtcgctcgctgtaagtac – query, e.g. a 1000-letter word
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990) A basic local alignment search tool. Journal of Molecular Biology 215, 403-410.
• What database sequence is most similar to my query?
• Databases: one of ours is 60 GB worth of letters
• BLAST generates statistics based upon similarity and substitution probabilities. In simplest form, purine to purine is better than purine to pyrimidine
• Slide along a 4 GB database, find a word match and try to extend
• BLASTX as example: translation into 6 reading frames, then search the database with these 6 sequences with a word size of 3.
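The BLASTX step described above (six reading frames, then word matches of size 3) can be illustrated with a minimal Python sketch. It is not the BLAST implementation: real BLAST also seeds on similar "neighborhood" words scored with a substitution matrix and then extends each hit, whereas this sketch only finds exact 3-letter matches. The function names (`six_frame_translations`, `word_hits`) are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the reverse complement; unknown letters become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(dna.upper()))


def translate(dna):
    """Translate complete codons only; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate three forward and three reverse-complement frames."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def word_hits(query, subject, word_size=3):
    """Exact word matches only (an assumption made for brevity).

    Returns (query_position, subject_position) pairs that an extension
    step could start from.
    """
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return sorted(hits)
```

Each of the six translated sequences would be passed to `word_hits` against every database protein, which is one reason a BLASTX search costs noticeably more than a plain nucleotide or protein search.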
• Time to BLAST
– Up to a point, decreased time correlated with the number of slaves available
– Average test machines (2.4 GHz / 1 GB RAM / SATA150)
– (e.g. 90 seq / 13 CPU / 3 min) vs (90 seq / 1 CPU / 38.5 min), 350 MB db, Gb LAN

.NET Distributed BLAST
• Take advantage of unused laboratory compute resources
• Provide an easy, powerful tool for distributing BLAST
• Target atmosphere – Windows LAN
• Current open-source distributed BLAST applications
– Require a server-class master or a version of UNIX
– Difficult to set up, configure databases, compile and submit jobs
– No large-job fault tolerance

W.ND BLAST: A bioinformatician promoting Windows?
• .NET C#
• First tests: Condor, MPI, a ported remote shell
• Contractor
• Project Manager
• Database formatter
• Worker machines
• Job leasing
• Output processing
• HT backend apps

Gotta GUI

Database formatter

Functionality
• Network bandwidth would eventually be limited
• Fault tolerant to worker failure
• Resume upon reboot if Contractor fails
• No statistical problems with search results
• Complete BLAST database on each worker node if resources allow
• Easy to install, a breeze to use

.NET Distributed BLAST
• Queue at each node
– Contractor only allows a maximum of two query sequences in each node's queue
– Ensures the application waits a minimal amount of time between completion and next job
• Thread per node
– Makes use of .NET asynchronous delegates (AD) – scalability ???
– Thread invokes BLAST on remote node
– Upon completion, remote node sends "finished" message to the Contractor
– The Contractor collects results and performs a validity check
– Once results are verified, remote worker BLAST starts on the queued sequence and the Contractor prepares and allocates a future job

.NET Distributed BLAST
• Fault tolerance, revisited
– Task migration handled through application-level checkpointing
– Worker encounters fault or crashes
– Contractor redirects the failed node's sequence to another worker node
– Minimal loss of time
• Integrating QoS functionality (currently in the works)
– Decrease priority when workstation is in use, based upon a remote system call checking CPU %, memory etc.
– GUI allows increasing or decreasing priority – rev gauges and throttles
– Storage requirement limitations: redirect query to other database source (working with the 10-connection limitation in XP Pro)

Future Directions
• Quality of Service
– Allow Contractor to set priority for application
• Contractor fault tolerance
• Large network optimization
– Sub-Contractors
• Asynch. Del. thread limit – ewww kewl WEB SERVICE!
• Shadow (Sub) Contractors – network load balance

The End!
Questions? Suggestions? Advice? Even Criticism?
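As a follow-up to the Contractor slides, the queueing and fault-tolerance behavior they describe (at most two queries per node queue, a validity check on returned results, and redirecting a failed node's sequences to another worker) can be sketched as a small single-process simulation. This is a sketch under stated assumptions, not the W.ND BLAST code: the real system is written in .NET C# with one thread per node, while Python is used here for brevity. The class and method names are hypothetical, the validity check is a placeholder, and simple re-queueing stands in for the application-level checkpointing mentioned in the slides.

```python
from collections import deque

MAX_QUEUE = 2  # from the slides: at most two query sequences per node queue


class Contractor:
    """Hypothetical stand-in for the Contractor described in the slides."""

    def __init__(self, queries, workers):
        self.pending = deque(queries)                 # not yet assigned
        self.queues = {w: deque() for w in workers}   # per-node queues
        self.results = {}

    def fill_queues(self):
        """Top up every live node's queue to MAX_QUEUE entries."""
        for queue in self.queues.values():
            while len(queue) < MAX_QUEUE and self.pending:
                queue.append(self.pending.popleft())

    @staticmethod
    def is_valid(output):
        """Placeholder validity check (assumption): non-empty output."""
        return bool(output)

    def worker_finished(self, worker, output):
        """Handle a 'finished' message: verify, store, allocate next job."""
        query = self.queues[worker].popleft()
        if self.is_valid(output):
            self.results[query] = output
        else:
            self.pending.appendleft(query)  # retry; no retry limit here
        self.fill_queues()

    def worker_failed(self, worker):
        """Redirect a failed node's queued sequences to other workers."""
        orphaned = self.queues.pop(worker)
        self.pending.extendleft(reversed(orphaned))  # keep original order
        self.fill_queues()


def run_all(contractor, blast, crash_worker=None):
    """Round-robin simulation; `blast` is a stub standing in for remote BLAST."""
    contractor.fill_queues()
    while any(contractor.queues.values()):
        for worker in list(contractor.queues):
            if worker == crash_worker:
                contractor.worker_failed(worker)
                crash_worker = None
            elif contractor.queues[worker]:
                query = contractor.queues[worker][0]
                contractor.worker_finished(worker, blast(query))
    return contractor.results
```

Keeping a second query queued at each node means a worker can start its next search immediately while the Contractor is still verifying the previous result, which matches the "minimal wait between completion and next job" goal stated in the slides.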