Integration of Tools

Summary

The tour How to cope with overwhelming information described how difficult it sometimes is to get tools of genome analysis to work together. The present tour shows that the task is certainly not impossible. PhAnToMe / BioBIKE offers a common interface in which the results of one tool may be used as input to the next. In this example, a set of proteins defined by the results of a Blast search is aligned, and the alignment is used to make a phylogenetic tree.

This is best viewed as a slide show. To view it, click Slide Show on the top tool bar, then View Show.

Outline
• How to get to PhAnToMe / BioBIKE (slides 4–8)
• Problem: Find, characterize rII-like proteins (slides 9–64)
  • Examine bacteriophage T4 genome (slides 10–18)
  • Define set of proteins similar to rII, per Blast (slides 19–37)
  • Align rII-like proteins (slides 38–51)
  • Make phylogenetic tree of rII-like proteins (slides 52–64)
• Reflections and coming attractions (slide 65)

To navigate to a specific slide, type the slide number and press Enter (works only within a Slide Show).

There are more tools useful in studying genomes than anyone would care to learn. It is often advantageous to combine tools, but this is often difficult. This problem is illustrated in the tour How to cope with overwhelming information? PhAnToMe/BioBIKE attempts to remove logistical barriers in combining tools, as illustrated in this tour.

(Slide graphic labels: Blast, Clustal, Phylip)

PhAnToMe/BioBIKE can be accessed by going to the PhAnToMe web site at www.phantome.org, mousing over the Tools menu, and clicking The Phage BioBIKE. Be sure you are using Firefox; BioBIKE will not function with other browsers. Enter your e-mail address and click New Login.

The first time you log in, you'll be asked for identifying information. This is so that any changes you make in the database are associated with you. After filling in the fields, click Register.
An alternate route is through the BioBIKE portal at http://biobike.csbc.vcu.edu. However you get to BioBIKE, this is what you'll see.

Now suppose that your goal is to characterize proteins similar to the rII protein of bacteriophage T4 (if you've never heard of this protein, no matter). Specifically:
- Find such proteins
- Align them
- Make a phylogenetic tree

First, let's take a look at phage T4. To do that, mouse over the Genome button and click SEQUENCE-OF. The SEQUENCE-OF function appears in the workspace. This function displays/returns the sequence of a gene, protein, genome, contig, replicon, or any arbitrary sequence you provide.

To tell the function which sequence you want to see, click the entity box, selecting it for entry. The entity box turns white and a cursor appears. You can type in the box, but unless you know the exact name of the phage, it's easier to pull the name off a menu. We want an organism (which is how BioBIKE considers phages), so mouse over the Organisms button, then mouse over the bacteriophage menu. Scroll through the menu until you find phage T4. Note that the phages are arranged alphabetically by their host. Click T4 to bring it into the SEQUENCE-OF function.

Now the function is complete (no open white boxes). Mouse over the function's action icon (the green wedge in the upper left corner) and click Execute. Colored gene sequences are presented within the context of the genome and its annotation. You can scroll through the genome, or search for specific genes or sequences, but for now, just X out of the sequence viewer (but first note or copy the name of the rIIA gene, T4p001).

That was interesting, but... what was the problem again?

Problem
- Find such proteins
- Align them
- Make a phylogenetic tree

OK. First step: find proteins with sequences similar to that of T4p001.
To do this, mouse over the Strings-Sequences button and click SEQUENCE-SIMILAR-TO. SEQUENCE-SIMILAR-TO allows a few ways of finding similar sequences, but the most common is BLAST (the default choice). Like BLAST, the function needs a query sequence. Click the query box and type the name of the gene, T4p001 (don't worry about upper/lower case). Then press Enter to close the box.

If you executed the function as it stands, it would search (by default) for protein matches. But if you didn't know this, you could specify explicitly what kind of search you want. To do this, mouse over the Options icon, click Protein-vs-Protein (equivalent to BlastP), and click Apply. It's possible to limit the search to different classes of proteins, but we'll just accept the default: all proteins from all organisms and phages within PhAnToMe.

The function is complete, so execute it. One way is to double-click the name of the function, SEQUENCE-SIMILAR-TO. But this time we'll do it the same way as before, through the action icon. Click Execute on the action menu.

The function displays the results in a popup window for human consumption, but it also shows the result in the Result pane (this shows what is available for future computation). There are evidently a great many proteins known that are similar to p-T4p001 (the protein encoded by the gene T4p001).

Let's use this result. First, X out of the popup display. The list of proteins can be used directly (e.g. to make an alignment), but it is better practice to give the list a name so you can recall later what you did. To give it a name, mouse over the Definition button and click DEFINE. The DEFINE function asks for two things from you: the values you want to name, and the name of the variable that will contain these values. The name can be anything you'll remember (upper/lower case doesn't matter). First, the name of the variable.
Click var to open the variable box. Type a name that makes sense (I chose rII-like) and press Enter to close the box. (The function cannot be executed if any box is open for entry.)

Next, the values. They were given by the function I just executed. Drag that function by clicking and holding its name, SEQUENCE-SIMILAR-TO, and dragging it towards the value box. When it reaches the value box, the box will become highlighted in red. At that point, release the mouse, and the function will now reside in the value box.

Execute the DEFINE function as you have the others, by clicking Execute on the function's action menu. Be careful not to use the action menu of the inner function, SEQUENCE-SIMILAR-TO. That will work, eliciting the sequence comparison, but no definition will take place.

Nothing drastic seems to have happened, but if you look carefully, you'll note two changes. First, a list of phages has appeared in the Result pane. Second, a new Variables button has appeared. We'll use it momentarily.

We wanted to use the Blast results, now stored in rII-like... for what?

Problem
- Find such proteins
- Align them
- Make a phylogenetic tree

Ah yes! The time has come to align the protein sequences. To do that, mouse over the Strings-Sequences menu, mouse over Bioinformatic-Tools, and click ALIGNMENT-OF. The ALIGNMENT-OF function asks for a sequence list. Fortunately, you now have one. Click the sequence-list box, mouse over your new Variables button, and click your new variable, rII-like, to bring it into the box.

The function is now ready for execution, but there are two ways you can tweak the function settings to make the output more useful. To make these changes, mouse over the Options icon. Click Colored to produce a graphical alignment rather than pure text, and click Label-with-organism to cause the alignment lines to be labeled with the names of the proteins' organisms rather than the proteins themselves.
Finally, click Apply, then go to the action icon to execute the completed function. The graphical output is produced by a Java applet called Jalview. Activate the applet. It might take several seconds to complete the alignment.

A useful alignment, perhaps. Now on to the phylogenetic tree. First, X out of the alignment. Go back to the Strings-Sequences menu, go to the Phylogenetic Tree submenu, and click TREE-OF.

Note that TREE-OF is asking for an alignment. Provide one by dragging the completed ALIGNMENT-OF function into the alignment box. Click and hold the ALIGNMENT-OF box and drag it towards its target, the alignment box. You'll know you've gotten there when it becomes highlighted. Release the function.

The Colored option is no longer useful (the output it provides is just for human consumption, not for TREE-OF). Get rid of it by clicking its Delete icon.

You may have noticed that the alignment you produced before had many columns that were mostly gapped. These are given too much weight by phylogeny programs. To remove those columns, modify the behavior of ALIGNMENT-OF by mousing over its Options icon, clicking the No-gapped-columns option, and finally clicking Apply.

Now you're ready to execute in the usual way. (This will take longer than the alignment, perhaps a few dozen seconds.) You should soon receive, in separate popup windows, a phylogenetic tree based on the no-gaps alignment of the rII-like sequences. As one might expect, the rII protein from phage T4 clusters with proteins from other enterobacteriophages.

Reflections and Coming Attractions

This tour presented three of the most common bioinformatic tools employed by biological researchers: searching by local alignment (Blast), multiple sequence alignment, and construction of phylogenetic trees. There are, of course, many, many more tools a researcher may find valuable, and the collective burden can be overwhelming.
The case was presented that much is gained by putting the tools within a single interface, BioBIKE. Granted, BioBIKE has its own idiosyncrasies to learn, but at least it's just one set. The interface that permits access to multiple tools and databases also permits the creation of new tools conceived by a researcher to address an immediate need, and this topic is explored in the tour Creating New Tools.
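
For readers curious about what removing mostly gapped columns from an alignment involves, here is a minimal Python sketch of the idea. It is not BioBIKE code and does not claim to reproduce what the No-gapped-columns option does internally. The function name drop_gapped_columns, the 50% threshold, and the toy sequences are all assumptions made for illustration.

```python
def drop_gapped_columns(aligned, max_gap_fraction=0.5, gap_chars="-."):
    """Remove alignment columns whose fraction of gaps exceeds max_gap_fraction.

    `aligned` is a list of equal-length strings, one per aligned sequence.
    The threshold is an illustrative assumption, not a BioBIKE setting.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    # Decide, column by column, whether enough sequences have a residue there.
    keep = []
    for col in range(width):
        gaps = sum(1 for row in aligned if row[col] in gap_chars)
        if gaps / len(aligned) <= max_gap_fraction:
            keep.append(col)

    # Rebuild each sequence from the columns that were kept.
    return ["".join(row[c] for c in keep) for row in aligned]


# Hypothetical toy alignment (made-up sequences, not real rII-like proteins).
toy_alignment = [
    "MK-TAY--L",
    "MKQTAY--L",
    "MK-TGYW-L",
    "MK-TAY--I",
]
trimmed = drop_gapped_columns(toy_alignment)
for row in trimmed:
    print(row)
```

Setting max_gap_fraction to 0.0 would keep only columns with no gaps at all, while 1.0 would keep every column. Which behavior suits a given tree-building run is a judgment call this sketch does not settle.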