SARS: Severe Acute Respiratory Syndrome
Author: Ryan D. Zaremba

Severe Acute Respiratory Syndrome, also known as SARS, is a new disease that has affected thousands of people worldwide. Although little is currently known about the disease, new findings about SARS are reported almost daily. The disease is caused by a virus that invades the human lungs, causing severe pneumonia-like symptoms. Viruses are able to change their genetic material, or mutate, and by doing so can become resistant to vaccines. Mutation is also why many different types of SARS viruses have been found throughout the world. Viruses are not classified as living organisms because they lack the proteins needed to replicate themselves, although they do contain a DNA or RNA genome (the SARS virus has an RNA genome). They require the replication proteins of a living cell to make more copies of the virus and begin an infection.

The SARS human coronavirus is transmitted through tiny droplets of saliva from infected people. When a healthy person inhales these droplets, the virus makes its way into the lungs. Once in the lungs, it attaches itself to the cells of the alveoli (tiny sacs in the lungs where oxygen and carbon dioxide are exchanged during normal breathing). The SARS virus then injects its genome into the cell, thereby taking over its replication proteins. The cell is held hostage by the replicating virus, and many copies of the virus are made. Finally, the cell becomes filled with viruses and bursts, releasing the newly synthesized viruses to invade other cells and repeat the process. This leads to a full-blown SARS viral infection. The SARS virus is also thought to be transmitted through the air (airborne).

This tutorial gives an overview of the sequences of the proteins within the virus that causes SARS.
You will use the basic tools of the Biology Workbench to view the sequence of a protein, find similar “mutated” SARS viruses, and see where the differences between the two lie. Next, you will use Protein Explorer to view a SARS protein in three dimensions. With this program, you will also be able to view the active site of the protein (the part of the protein that carries out its function). To make this an enjoyable and valuable learning experience, follow along step by step; we recommend that you print out a copy of this tutorial.

Part I: Opening An Account

Go to the Biology Student Workbench homepage. The address is: http://workbench.sdsc.edu
If you already have an account, go ahead and log into the program. If not, click on the link that says “Set up a free account”. Workbench will then ask you for some information. Type in all the information and click “Register”.
Type in your user name and password and click “OK”. You will be brought to the homepage for the Biology Workbench.
Scroll down the page and choose the background color that you want to work with on the Workbench. (Rose is the recommended color because it makes it easier to see the color contrasts that will be used later in sequence alignments.)
Click on “Session Tools”.
Note: There are a variety of different layouts for the Biology Workbench. You can toggle between the layouts by clicking on the Biology Workbench logo at the top of the screen. Click on it until you find the layout that matches this tutorial.

Starting a New Session:

The Workbench supplies several sets of tools: “Session Tools”, “Protein Tools”, “Nucleic Tools”, “Alignment Tools” and “Structure Tools (Alpha)”. For this tutorial, we will only concern ourselves with “Session Tools”, “Protein Tools”, and “Alignment Tools”.
To keep your research organized, you should create a new session for every new topic you research. First, highlight “Start New Session” and click on “Run”. The next screen will ask you to name the new session. Call this session “Severe Acute Respiratory Syndrome”. Then click on “Start New Session”. The new “Severe Acute Respiratory Syndrome” session will appear right below your “Default Session”. You will be able to click back and forth between the default and SARS sessions easily. But remember, you are working on the SARS session, so make sure you have “Severe Acute Respiratory Syndrome” selected.

In this tutorial you will learn how to use the tools that let you search the protein databases and analyze the protein sequences that you want to use. By clicking on the box that says “Protein Tools” at the top of the page, you will be brought to the Protein Tools homepage. It is “Empty” because you have not yet found and saved any protein sequences. To “import” a sequence means to save it for later use. When you import a sequence, the word “Empty” is replaced by the name of that sequence.

Part II: Importing Sequences from Protein Sequence Databases

In this section of the tutorial, you will learn how to search databases for a specific protein sequence. You will be working with a sequence that encodes the Replicase protein and a few other proteins of the SARS virus.
Notice the scrollable text box on the page that contains a variety of tools. You will use some of these tools in this tutorial. Highlight “Ndjinn – Multiple Database Search” and then click “Run”.
The next screen lists the different databases that you can search. In the search box at the top of the page type: “SARS”. This tells the search engine what to search for. When searching, you can also control the number of results you get. Notice the “Hits per page” drop-down menu.
This simply lets you decide how many sequences you want to display on the results page. To see every match, select “All”.
As you scroll down the page, you will notice a list of different databases. They all contain a variety of sequences that you can search for. They are separated into two distinct groups. The first group contains sequences from many different organisms (for example, the database “GBBCT” contains a large number of sequences from many different bacteria), whereas the second group contains the entire DNA genome sequences of specific organisms (for example, “Aaeo” contains the entire genome sequence of the bacterium Aquifex aeolicus).
Since you are looking for a viral protein, click on the box next to the SWISSPROT database. This is a large database filled with many different protein sequences from many different organisms. Scroll back up to the top of the screen and click “Search”.
You will then be sent to a page that contains the results of your search. At the time this tutorial was written, the search engine found 49 matches for “SARS”. If you get more than 49 results, do not panic. The number of search results can change because new sequences are added to the databases on a daily basis.
From the descriptions of the search results, we need to determine which one encodes a series of proteins including the Replicase protein. The sequence you want is “SWISSPROT:R1AB_CVHSA” (rank # 0). This sequence belongs to one specific mutated form of the SARS virus (known as a “strain”). There are minor differences between strains of SARS viruses.
Select this sequence and click on the “Import Sequence(s)” button at the bottom of the page. Your sequence will be transferred to the Protein Tools homepage.

Part III: Finding Another Similar Strain of SARS: Using BLASTP

Now that you have a strain of SARS, you can use it to find another similar strain of the virus.
Because we know that the SARS sequence you have found is similar to the sequences of other strains, we can use the original strain to search a database for sequences that are extremely similar (or homologous) to it. To do this, you are going to use a tool called BLASTP, more commonly called BLAST. Scroll down the text box on the Protein Tools homepage and look for the tool called “BLASTP – Compare a PS to a PS DB”. (This stands for “Compare a Protein Sequence to a Protein Sequence Database”.) Select this tool and make sure that your SWISSPROT:R1AB_CVHSA protein sequence has a checkmark in the box next to it. Then click on the “Run” button.
You will next be sent to a screen that gives you many options. These options allow you to refine your search. For the purposes of this tutorial you do not need to change them. The important step here is to choose the database that you want to use for your homology search. Scroll down in the text box until you come to the “GenBank Viral Sequences” database. Once again, you are only interested in the viral protein databases because you are dealing with viral sequences. Select “GenBank Viral Sequences” and then click on the “Submit” button at the bottom of the page.
You will be sent to a screen that contains the results of your BLASTP search.
The first column after the description of each sequence is the “Score (bits)” column. You will see that the sequences at the top of the list have very high Score (bits) values. A Score (bits) value above 400 means that the sequence is very similar to the original sequence you compared it to. However, you can be more certain of the extent of the similarity between the two sequences by looking at the “E Value”. This number is to the right of the “Score (bits)” value. The E Value estimates how many matches this good would be expected by chance alone, so the smaller the E Value, the more meaningful the similarity between the sequence and the original sequence you BLASTed. An E Value of zero represents a perfect or near-perfect match.
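The relationship between the bit score and the E Value can be sketched with a few lines of Python. This is only an illustration, not the Workbench's own code: it uses the commonly quoted approximation E = m × n × 2^(−S), where S is the bit score, m is the length of the query and n is the total length of the database. The lengths used below are made-up values chosen for illustration, and real BLAST programs apply corrected ("effective") lengths, so the numbers will not match the Workbench output exactly.

```python
import math


def log10_e_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Return log10 of the expected number of chance hits.

    Uses the approximation E = m * n * 2**(-S), where S is the bit score,
    m the query length and n the total database length (in residues).
    Working with logarithms avoids floating-point underflow for large scores.
    """
    return (
        math.log10(query_length)
        + math.log10(database_length)
        - bit_score * math.log10(2.0)
    )


# Hypothetical numbers, chosen only for illustration.
query_length = 7_000          # assumed length of the query protein
database_length = 10_000_000  # assumed total number of residues in the database

for bit_score in (50, 400, 5_000):
    log_e = log10_e_value(bit_score, query_length, database_length)
    print(f"bit score {bit_score:>5}: E is about 10^{log_e:.0f}")
```

For very high bit scores the E Value becomes far too small to display, which is one reason a results page can show it as zero.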
As you can see, all the E Values shown are zero. This shows the high degree of similarity between different strains of the SARS virus. You will be able to see how similar each of them is later in the tutorial.
Now it is time to decide which strain sequence you want to compare to your original sequence. Since you know that different strains of the SARS virus are very similar, you should expect them to be nearly identical (to have very high homology). By reading the descriptions of the sequences, you will notice that most of them say “[SARS coronavirus…]”. This indicates that the sequence comes from a SARS virus, and the code of letters that follows indicates the specific strain. You can find out more about the sequences by checking the box next to a sequence and clicking the “Show Record(s)” button. Check the first few boxes and click “Show Record(s)”. Notice that a detailed report is given for each sequence you selected. For this tutorial we will be working with one of the strains found in Shanghai, China.
If you are working in Internet Explorer, click on the “Back” button or, if you are working in Netscape, simply close the Records window (by clicking on the box in the top left-hand corner of the window). You should be back on the screen that contains the results of your BLASTP search. Make sure that only the Shanghai sequence (GBVRL:40457448_40457449) is selected. Now scroll to the bottom of the screen and click on “Import Sequence(s)”. Your Shanghai strain of the SARS virus will be imported to the Protein Tools homepage.
Now you have sequences from two different strains of the SARS virus, and you can compare and contrast them. It is time to see where the differences lie in these two sequences.

Part IV: Aligning the Sequences: Using CLUSTALW

In order to compare the protein sequences, they must be aligned one on top of the other. This is the purpose of the CLUSTALW tool.
The alignment process works by comparing the two sequences and finding common regions within them. Alignment is therefore a key step in determining where the differences in the sequences are located. The two sequences that you imported are aligned one on top of the other, using colors that show you where similarities and differences exist.
Scroll down the text box menu and highlight “CLUSTALW – Multiple Sequence Alignment”. Click on your two SARS virus sequences, SWISSPROT:R1AB_CVHSA and GBVRL:40457448_40457449. Then click on the “Run” button. Another screen will appear in which alignment options can be altered. We are going to use the default settings, so just click on the “Submit” button.
You will now be taken to a screen that shows you the aligned sequences. You will notice that in the middle of the alignment there are dashes (-----) for the sequence GBVRL:40457448_40457449. The dashes mark a gap: this sequence has no amino acids to line up with the other sequence at this spot, for example because the scientists who submitted the sequence to the database were unable to determine the sequence of the protein there. All amino acids that are the same in both sequences are colored bright blue. For the sake of this tutorial, all amino acids that are colored dark blue or green are very similar amino acids, but they ARE different. All amino acids that are colored black are not similar at all. These are the amino acids that make the strains of the SARS virus different from one another.

Conclusion

Using the various tools in the Workbench, you have shown that different strains of the SARS virus are caused by small mutations (changes) in the amino acids of protein sequences. What effect could this have on the proteins of different virus strains? In the next part of this tutorial, you will use a tool that will allow you to see a SARS protein in 3-D.
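The comparison you just made by eye in Part IV can also be sketched in Python. The example below is a minimal illustration, not part of the Workbench: it assumes the two sequences have already been aligned (so they have the same length and use "-" for gaps), and the two short sequences shown are invented for the example rather than taken from the SARS records.

```python
def compare_aligned(seq_a: str, seq_b: str):
    """Compare two already-aligned sequences column by column.

    Returns a tuple (identical, differences, gaps) where
    identical   is the number of columns with the same amino acid,
    differences is a list of (column, residue_a, residue_b), 1-based,
    gaps        is a list of columns where either sequence has a dash.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    differences = []
    gaps = []
    for column, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == "-" or b == "-":
            gaps.append(column)          # gap: nothing to compare here
        elif a == b:
            identical += 1               # same amino acid in both strains
        else:
            differences.append((column, a, b))
    return identical, differences, gaps


# Invented toy sequences, used only to show the idea.
strain_1 = "MESLVLGVNEKTHVQ"
strain_2 = "MESLVPGVN--THIQ"

identical, differences, gaps = compare_aligned(strain_1, strain_2)
print("identical columns:", identical)
print("differences:", differences)
print("gap columns:", gaps)
```

The list of differences corresponds to the positions that are not colored bright blue in the alignment, and the gap columns correspond to the dashes.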
Part V: Using Protein Explorer to Visualize the Mpro Protein

In the first part of this tutorial you identified the amino acid sequence of a SARS virus protein and noticed that there were minor differences between different strains of the virus. Now you are going to use a program that is available on the web at http://molvis.sdsc.edu/protexpl/frntdoor.htm (it is also accessible through the Biology Workbench website) to visualize a SARS protein in 3-D. The program will also show you where the active site of the protein is. The protein structure that we are going to use is:
1Q2W = SARS Mpro protein
First we will look at the 3-dimensional view of the protein and then we will highlight the active site.
WARNING: It is important that you follow these directions step by step and do everything exactly as described in this tutorial. The Protein Explorer software may crash if you try to move through it too quickly. This tutorial includes several warnings to prevent you from moving ahead too quickly. If you pay attention to them, you should not have a problem.

Part A:

1. Go to the website for the Biology Workbench by typing in the web address: http://workbench.sdsc.edu
2. Scroll down the page until you see the link that says “Protein Explorer” and click on it. You will arrive at a new screen where you need to type in the name of the protein that you would like to see. Type “1Q2W” in the first space provided. This will bring up the Mpro protein for us to see. Once you have typed in “1Q2W”, click on the “Go” button.
3. Be patient: it may take a few seconds to load the information. On Windows machines an intermediate window may appear; if so, click “Start Explorer Session”. A new window will then appear. Do not press the “OK” button until the red “Busy” sign says “Ready”. Once you have pressed the “OK” button AFTER the sign says “Ready”, we need to do a few more things to make sure that the program does not crash.
4. Click on the “Toggle Spinning” button.
This will stop the molecule from spinning; the spinning can cause the computer to crash. Once you have stopped the spinning, we are going to hide the water molecules because they can prevent us from seeing the things we want to see. Click on the button that says “Hide/Show Water”. If what you see on your screen does not match this description, re-read the directions and start over.
5. Before we move on, let’s take a look at our Mpro protein. Place your cursor (mouse arrow) on the molecule and hold down the left button on the mouse (or the only button, if your mouse has only one). Now move the mouse around. This will allow you to rotate your molecule and look at it from any angle you wish. We will use this technique later on to find the active site of the protein.
6. Now, we are going to find the active site of the protein. In the box on the top left, click on “Explore More!”. The screen on the left will change. Now, we are going to make some changes to our molecule that will make it easier for us to see what is going on.
7. First, let’s change the way the protein looks. Click on the menu that says “Display” and highlight “Cartoon” in the drop-down menu.
NOTE: IF YOU WANT TO PLAY AROUND WITH THE DIFFERENT TYPES, YOU CAN CHOOSE ANY OF THE CHOICES ABOVE THE LINE (SPACEFILL AND ABOVE) TO SEE WHAT THE MOLECULE WILL LOOK LIKE. WE WILL NOT USE ANY OF THE CHOICES BELOW THE LINE.
Make sure that “Cartoon” is highlighted before continuing with the tutorial.
8. Now we are going to change the color of the protein. As of now, it appears in two colors because two copies of the same smaller protein join to form one protein. This is called a protein dimer. Click on the menu box that says “Color” and choose “Blue”.
9. Now we are going to get more specific. We know that this protein has an active site.
So we are going to change the color of one of the smaller monomers (two monomers make up the dimer) so that it stands out from the rest of the protein. This will make it easier for us to see where the active site is. Click on the “Select” menu and choose “Chain A”. Nothing visible should happen right after you choose this. Go to the “Color” menu and choose the color “Red”. After you have chosen “Red”, Chain A in your protein should turn red. Rotate the protein with your mouse again to see what the protein looks like now. As you can see, the red chain and the blue chain are actually identical to each other. They are joined to each other at a diagonal.
10. Now that you can tell that there are really two proteins making up the SARS protein, let’s take a closer look. Because amino acids make up proteins, the active site of this protein should be made of amino acids also. Let’s see if we can find it…
11. Click on the link that says “Advanced Explorer” (you may need to scroll down).
REMEMBER: DO NOT PRESS ANY KEYS OR BUTTONS WHEN THE RED “BUSY” SIGN IS ON THE SCREEN UNDERNEATH THE PROTEIN! IF YOU PRESS SOMETHING, THE PROGRAM MIGHT CRASH!
12. After the “Busy” sign has changed to “Ready”, click on the link that says “Seq3D”. This is going to allow us to view individual amino acids on the protein. We will use this tool to find out where the active site on the protein is.
13. Go to the “CPK Colors” menu and choose “Sec’y Structure”. Nothing should have happened to the protein yet. This is going to allow us to pick specific amino acids to highlight.
14. Now, look at the bottom left-hand panel. Each one of those letters represents an amino acid in the protein sequence. We want to view the amino acids that make up the active site of the protein. Each of the 20 amino acids has a specific letter that represents it (F-phenylalanine, R-arginine, K-lysine, etc.).
As you roll the cursor over a letter, the text box above the sequences tells you which amino acid the cursor is on. It also tells you the position of the amino acid (for example, the first F in the A chain represents phenylalanine, and it is at position 3). We can use this information to find the active site. The amino acids of the active site of this protein, each written as its letter followed by its position, are:
C22 G23 T24 T25, L27, H41 V42, C44 T45 A46 E47, M49 L50 N51 P52, Y54, C145, H164 M165, D187 R188 Q189
The first amino acid of the active site is cysteine (C) at position 22. Glycine (G) is next at position 23. With your mouse, click on the “C” at position 22 on chain A. When you click on the C, the location of this amino acid should become visible on the protein. It should appear as a colored bump. Next, click on the “G” at position 23. Then, click on the “T” at positions 24 and 25. Continue to click on the amino acids that correspond to the active site sequence. When you are done, you should be able to see the active site highlighted on your protein.
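The active site list from step 14 can be written out as a small Python sketch, which may help you keep track of which letters to click. This is only a bookkeeping aid and not part of Protein Explorer; the names `SEGMENTS`, `AMINO_ACID_NAMES` and `expand_segments` are made up for this example. Each run of neighboring amino acids is stored with the position of its first letter, and the function fills in the remaining positions.

```python
# Each entry: (letters of neighboring amino acids, position of the first letter).
SEGMENTS = [
    ("CGTT", 22), ("L", 27), ("HV", 41), ("CTAE", 44), ("MLNP", 49),
    ("Y", 54), ("C", 145), ("HM", 164), ("DRQ", 187),
]

# One-letter codes for the amino acids that appear in the active site.
AMINO_ACID_NAMES = {
    "A": "alanine", "C": "cysteine", "D": "aspartic acid", "E": "glutamic acid",
    "G": "glycine", "H": "histidine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine", "R": "arginine",
    "T": "threonine", "V": "valine", "Y": "tyrosine",
}


def expand_segments(segments):
    """Turn (letters, start) runs into a flat list of (letter, position)."""
    residues = []
    for letters, start in segments:
        for offset, letter in enumerate(letters):
            residues.append((letter, start + offset))
    return residues


# Print a checklist of the amino acids to click on chain A.
for letter, position in expand_segments(SEGMENTS):
    print(f"{letter}{position}: {AMINO_ACID_NAMES[letter]}")
```

Working through the printed checklist from top to bottom follows the same order as the instructions in step 14.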