Exercise 2: Searching and retrieving data from molecular biology databases

Exercises - Day 1
Retrieval of data from molecular biology databases. Short Entrez and SRS Tutorials

I. Entrez

1. The signal recognition particle (SRP) has six different protein subunits. Retrieve the sequence of the human SRP54 subunit.

Connect to the Entrez page (www.ncbi.nlm.nih.gov/Entrez). Select Proteins as the database. Make sure "All Fields" and Mode = "Automatic" are selected. Enter the word srp54 as the search term for the Protein database and press Enter. To refine this search, enter human as a new search word and press Enter again. Click on "Retrieve ... documents". Can you find the human protein in the list of entries? (The correct one is P13624.) Click on the "GenPept report" link for that protein to see the details of the database entry. You will see the annotation section with information about the database entry. Scroll down to the bottom of the page to see the actual amino acid sequence of the protein.

Entrez can direct you to related entries from other databases. Click on one of the links under DBSOURCE (gi:...) to see references to the nucleotide database. Another way to see links to the other databases: on the previous page with the list of retrieved documents, click for instance on "Medline link".

In the search described above you entered one search word ("srp54") and then refined the search by adding another word ("human"). It is also possible to enter the two words at the same time:

human srp54
human AND srp54
srp54 human

are three alternatives that accomplish the same thing. Try one of these searches to see if you get the same result as above. What does a search like

srp54 OR human

achieve?

You can make a more specific search by selecting fields. Go back to the original page for searching the protein database (www.ncbi.nlm.nih.gov/Entrez/protein.html). Select "Title word" instead of "All fields". Enter srp54 as search term.
Refine the result by selecting the field "Organism" and searching for human. You should end up with only one entry, the human SRP54 protein.

2. We want to find articles in Medline that deal with SR proteins (involved in the regulation of RNA processing) and that were published in 1997 or 1998 in Nature, Cell or Science. To do this, first select the PubMed database. Enter "SR" and "splicing" as search words. Refine the resulting search by selecting the field "Publication date" and entering "1997 OR 1998" as search term. To further refine the search, select the "Journal" field and enter "Nature OR Science OR Cell" as search term. How many articles do you find? This type of search could also have been done with a single expression:

splicing [all fields] AND sr [all fields] AND ( nature [JOUR] OR cell [JOUR] OR Science [JOUR]) AND ( 1997 [publication date] OR 1998 [publication date])

3. Here's a more specialized type of search. We want to find protein sequences that have a sequence length between 10,000 and 20,000. Go to the protein database, select the "sequence length" field, and enter "010000:020000". (The leading zero is necessary because the sequence length terms are all six-digit integers. When in doubt, use "List terms" to see the terms in a list; the range operator will use the terms in the order that they appear.) How many sequences do you find? Click on one of them to see the exact number of amino acids.

II. Network Entrez client

Try to retrieve the same type of information as above, but using the Network Entrez client program. This program is available from your PC desktop. For the first exercise select "Protein" as database. Enter "human srp54" as search term. Press Enter or the Accept button.

III. Using SRS (Sequence Retrieval System)

1. Now let's try SRS and see if we can extract the same type of information from that service. Connect to the SRS page (srs.ebi.ac.uk:5000). Click on the alternative "Start a new SRS session".
Check "Swissprot" as database and then click "Continue". In the resulting page enter the word "srp54" in one of the search fields (AllText) and "human" in one of the others. Click on "Do query". You obtain a list of sequences that contains SRP54 but also proteins that are not SRP54. To make a more specific search, go back to the search form and select as "Gene name" srp54. In another frame select as "Organism" homo sapiens (or just homo). The result is the human SRP54 sequence. Click on that entry. Can you find links to DNA databases?

Another way to see links to the other databases: press the "Link" button at the top of the page. Select "PROSITE" as the database to be linked (check the box to the left of the "PROSITE" text). Press "Continue". Click on the resulting PROSITE entry (or entries). One line of the PROSITE entry reads:

P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].

This is the pattern typical for SRP54 proteins.

2. We want to find E. coli proteins that have information about DNA binding in the annotation section. Select Swissprot as database. Select as "FtKey" dna_bind (note the underscore character!) and enter as "Organism" *coli. (The expression *coli means any word that ends in 'coli'.)

3. We want to find Escherichia coli proteins that have (potential) transmembrane domains. Select Swissprot as database. Select "FtKey" = "transmem" and "Organism" = "*coli".

4. As above with Entrez, we want to find protein sequences that have a sequence length between 10,000 and 20,000. In SRS select Swissprot as database, select the field "SeqLength" and enter "10000:20000". Try another range if nothing is found. Do you get the same result as with Entrez?

This exercise is found at www.medkem.gu.se/edu/molbiol/retrieval.html

Retrieval of information about Hypoxanthine Guanine Phosphoribosyl transferase (HGPRT).

In these exercises we want to retrieve all sorts of information about a protein. We will answer questions like:

What scientific papers are associated with the protein?
How do we find the amino acid sequence?
What DNA sequences encode the protein?
How is the gene organized with respect to exons/introns?
What genes are adjacent to our gene of interest?
Is the 3D structure of the protein known? How do we find it?
Is the protein described in the human genome database?
Where is the gene positioned in the genome? On what chromosome is the gene?
Are there diagnostic markers for polymorphisms?
Are there genetic diseases related to the protein? What are the clinical symptoms?

To address these questions we will use the following WWW sites:

NCBI Entrez
SRS (Sequence retrieval system)
GDB (Genome data base)
PDB (Brookhaven Protein Data Bank)
Molecules R US
OMIM

As an exercise we will examine the enzyme hypoxanthine (guanine) phosphoribosyl transferase, often abbreviated 'hprt' or 'hgprt'.

Exercises

A. DNA sequence

1) Go to NCBI Entrez and select "Search WWW Entrez at NCBI Nucleotides". Try to find the gene that codes for human hprt, i.e. the entire genomic sequence including introns. (Clue: use as search words hprt, human, and complete.) Click on the button "Retrieve documents" to see the results of the search. To see individual sequences click "Genbank report". Examine the annotation section of the genomic hprt sequence to answer the following questions:

What is the accession number of the sequence? (First number in the "ACCESSION" line.)
How many exons are there in this gene?
What mutations are described and what diseases are they related to? Make a note of one of the mutations and the associated disease.

When you have retrieved the gene sequence, go back one page and choose "Graphical view". Click on one of the exons to zoom in on that area. Then go back and select "Protein link" instead of "Graphical view". Select "GenPept report". What size is human hprt (how many amino acid residues)?

B. Protein sequence

Go back to NCBI-Entrez (www3.ncbi.nlm.nih.gov/Entrez/). Select "Proteins" as database. Try to find the hprt from Escherichia coli.
How many amino acids are there in this protein? Does the E. coli protein have the same substrate specificity as the human protein? (Examine the different protein database entries corresponding to E. coli HPRT and see the annotation section under "Comment ...".) What scientific publication does the sequence refer to? (Examine links to MEDLINE.)

As a comparison to NCBI Entrez, use SRS (Sequence Retrieval System). From the main page of SRS select "Start a new SRS session". Choose Swissprot as database and click "Continue". Search for the hprt protein from Bacillus subtilis. (Enter "Bacillus subtilis" in the field "Organism" and "hprt" in the field "All-text".) Look in the annotation section for the Bacillus hprt. You should be able to find a link to PROSITE, a database of protein sequence motifs. What motif is characteristic of this group of enzymes?

Go back in SRS and select "Enzyme" as the database (instead of Swissprot, which you selected before). Search for "hypoxanthine". Can you find another enzyme that uses hypoxanthine as substrate?

Retrieve E. coli hprt using SRS and Swissprot as database. (Organism: Escherichia coli.) You may examine this sequence with BLAST by selecting the sequence (click in the box to the left of the protein name). Select "Launch BLAST" and "Continue" in the resulting window. What enzymes related to E. coli hprt can you find in the result from the BLAST search, in addition to hprt homologs from other organisms?

C. 3D structure

Select from NCBI-Entrez (www3.ncbi.nlm.nih.gov/Entrez/) the "3D structures" database. Is the three-dimensional structure of HPRT known? For what organisms? (Clue: use the word "hypoxanthine" as search word.) Retrieve the hprt proteins by selecting "Structure summary". Identify the human protein. View the molecule with the Entrez viewer (Launch viewer Cn3D) or Rasmol (Launch RasMol viewer). How many subunits make up the protein? (In Rasmol select "Color - Chain".
Each chain will be colored differently.) With the Entrez and Rasmol viewers you could see a small ligand bound to the protein. What compound is this? To answer that question select the PDB link ("1HMP") on the Entrez page. On the PDB page select "complete with coordinates" to see the actual PDB entry and information about the ligand. What closely related 3D structures are there? (Select on the Entrez page "Structure neighbors: A".) Compare to the result from the BLAST search above.

These exercises related to 3D structure may also be carried out using the "PDB" or "Molecules R US" sites.

D. Genome data

i. The human genome.

For these exercises go to GDB. Search for information on HPRT. From the main page of GDB select "All biological data" and "keyword" and choose for instance "hypoxanthine" as search word. What genes code for HPRT? Which of these genes has an associated phenotype (is related to a disease)? On what chromosome is this gene? On what arm and band? Try to find a map of the chromosome showing the location of hprt. What genes flank HPRT? Are there primer pairs for amplification of the entire HPRT coding region? What diseases are related to HPRT? What are the clinical symptoms?

ii. The E. coli genome.

The complete nucleotide sequence of the E. coli genome became available in 1997. Select NCBI Entrez "Search the NCBI genomes database" and go to the E. coli genome. Try to zoom in on the HPRT gene, which is located at nucleotide position ~142,000. What genes are adjacent to HPRT?

Many of these problems related to genome information may also be addressed with the NCBI-OMIM or SRS tools.

Hypoxanthine Guanine Phosphoribosyl transferase (HGPRT).

Purine nucleotides are assembled from a variety of simple compounds. The starting point for synthesis is ribose 5-phosphate. This pathway is referred to as the de novo pathway of purine synthesis. Its first step is the formation of phosphoribosyl pyrophosphate (PRPP) from ribose 5-phosphate; the committed step is the subsequent conversion of PRPP to 5-phosphoribosylamine.
Free purine bases are formed by the hydrolytic degradation of nucleic acids and nucleotides. Purine nucleotides can be synthesized from these preformed bases by a salvage reaction, which is simpler and much less costly than the reactions of the de novo pathway. In salvage reactions, the ribose phosphate moiety of PRPP is transferred to a purine to form the corresponding ribonucleotide:

PRPP + Purine --> Purine ribonucleotide + PPi

Two salvage enzymes with different specificities recover purine bases. Adenine phosphoribosyl transferase catalyzes the formation of adenylate:

Adenine + PRPP --> adenylate + PPi

whereas hypoxanthine-guanine phosphoribosyl transferase catalyzes the formation of inosinate and guanylate:

Hypoxanthine + PRPP --> inosinate + PPi
Guanine + PRPP --> guanylate + PPi
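
Returning to the database searches in the Entrez and SRS tutorials above, three of the conventions used there can be sketched in a few lines of Python: the single Boolean Entrez expression (Entrez exercise 2), the zero-padded sequence-length range (Entrez exercise 3), and the PROSITE pattern for SRP54 (SRS exercise 1). This is only an illustrative sketch. The function names are hypothetical and not part of any Entrez or SRS software, the code only builds strings and does not contact any server, and the pattern translation handles only the common PROSITE elements (x, x(n), [..], {..}, < and >).

```python
import re


def entrez_or(terms, field):
    """Join terms with OR, tagging each with the same field (hypothetical helper)."""
    return "( " + " OR ".join(f"{term} [{field}]" for term in terms) + " )"


def entrez_query(words, journals, years):
    """Combine free-text words, journals and years into one AND expression."""
    parts = [f"{word} [all fields]" for word in words]
    parts.append(entrez_or(journals, "JOUR"))
    parts.append(entrez_or(years, "publication date"))
    return " AND ".join(parts)


def length_range(low, high, width=6):
    """Zero-pad both bounds so they match six-digit sequence-length terms."""
    return f"{low:0{width}d}:{high:0{width}d}"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    elements = pattern.rstrip(".").split("-")    # drop final period, split on '-'
    converted = []
    for element in elements:
        element = element.replace("x", ".")      # x = any residue
        element = element.replace("{", "[^").replace("}", "]")  # {AB} = not A, not B
        element = element.replace("(", "{").replace(")", "}")   # x(4) = four residues
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        converted.append(element)
    return "".join(converted)


if __name__ == "__main__":
    print(entrez_query(["splicing", "sr"], ["nature", "cell", "Science"], [1997, 1998]))
    print(length_range(10000, 20000))
    srp54_regex = prosite_to_regex(
        "P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]."
    )
    print(srp54_regex)
    # Synthetic test peptide (not a real SRP54 sequence) containing the motif.
    print(bool(re.search(srp54_regex, "AAPLAFLGAGEAAAALAA")))
```

The regular expression produced this way can be applied with re.search to any protein sequence copied from a database entry, which gives a quick local check of whether a sequence carries the SRP54 motif reported by PROSITE.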