Exercise 2
Search and retrieval of data from molecular
biology databases
Exercises - Day 1
Retrieval of data from molecular biology databases.
Short Entrez and SRS Tutorials
I. Entrez
1. The signal recognition particle (SRP) has six different protein subunits. Retrieve the sequence of the
human SRP54 subunit. Connect to the Entrez page (www.ncbi.nlm.nih.gov/Entrez). Select Proteins as
the database. Make sure "All Fields" and Mode = "Automatic" are selected. Enter the word srp54 as the
search term for the Protein database and press Enter. To refine this search, enter human as a new
search word and press Enter again. Click on "Retrieve ... documents". Can you find the human protein
in the list of entries? (The correct one is P13624.) Click on the "GenPept report" link for that protein
to see the details of the database entry. You will see the annotation section with information about
the entry. Scroll down to the bottom of the page to see the actual amino acid sequence of the
protein.
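Counting the residues by eye at the bottom of a report is error-prone. The following is a minimal sketch, assuming the report uses the usual flat-file layout in which the sequence sits between an ORIGIN line and a closing "//" line. The function name origin_sequence and the toy entry are made up for illustration; the toy sequence is not the real SRP54 sequence.

```python
def origin_sequence(flatfile_text):
    """Return the sequence found between the ORIGIN line and the closing '//'."""
    residues = []
    in_origin = False
    for line in flatfile_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Keep letters only; position numbers and blanks are dropped.
            residues.extend(ch for ch in line if ch.isalpha())
    return "".join(residues).upper()

# Made-up entry used only to exercise the parser.
toy_report = """LOCUS       TOY_ENTRY
ORIGIN
        1 mktayiakqr qisfvkshfs rqleerlgli
       31 evqap
//
"""

sequence = origin_sequence(toy_report)
print(len(sequence), "residues")
```

Saving the real report as text and passing it to the same function should give the length of the retrieved protein, provided the layout assumption holds.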
Entrez can direct you to related entries from other databases. Click on one of the links under
DBSOURCE (gi:...) to see references to the nucleotide database. Another way to see links to other
databases: on the previous page, with the list of retrieved documents, click for instance on "Medline
link".
In the search described above you entered one search word ("srp54") and then refined the search by
adding another word ("human"). It is also possible to enter the two words at the same time:
human srp54
human AND srp54
srp54 human
are three alternatives that accomplish the same thing. Try one of these searches to see if you get the
same result as above.
What does a search like srp54 OR human achieve?
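The difference between the two operators can be pictured with plain set operations. This is a minimal sketch, not a description of how Entrez is implemented; the small index and the entry identifiers in it are invented for illustration.

```python
# Toy inverted index: search word -> set of (made-up) entry identifiers.
index = {
    "srp54": {"entry1", "entry2", "entry3"},
    "human": {"entry2", "entry4", "entry5", "entry6"},
}

def search_and(index, *words):
    """Entries that contain every word (the behaviour of 'human AND srp54')."""
    sets = [index.get(word, set()) for word in words]
    return set.intersection(*sets) if sets else set()

def search_or(index, *words):
    """Entries that contain at least one of the words."""
    return set().union(*(index.get(word, set()) for word in words))

both = search_and(index, "srp54", "human")
either = search_or(index, "srp54", "human")
print(sorted(both))   # the narrow result
print(len(either))    # the much broader result
```

AND can only shrink a result list, while OR can only grow it, which is why an OR search with a common word such as human returns a very large number of entries.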
You can make a more specific search by selecting fields. Go back to the original page of searching the
protein database (www.ncbi.nlm.nih.gov/Entrez/protein.html). Select "Title word" instead of "All
fields". Enter srp54 as search term. Refine the result by selecting the field "Organism" and searching
for human. You should end up with only one entry, the human SRP54 protein.
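A field-restricted search can also be written as one expression with field tags in square brackets. The sketch below builds such an expression and the matching request URL for the NCBI E-utilities esearch service. E-utilities is not part of this tutorial and is named here only as an assumption about how the search could be scripted; the tags [Title] and [Organism] are assumed to correspond to the "Title word" and "Organism" fields. No request is sent.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(pairs):
    """Join (word, field) pairs with AND, e.g. 'srp54[Title] AND human[Organism]'."""
    return " AND ".join(f"{word}[{field}]" for word, field in pairs)

def esearch_url(db, term):
    """Return the esearch URL for a database and a query term (nothing is fetched)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})

term = build_term([("srp54", "Title"), ("human", "Organism")])
url = esearch_url("protein", term)
print(term)
print(url)
```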
2. We want to find articles in Medline that deal with SR proteins (involved in the regulation of RNA
processing) and that were published in 1997 or 1998 in Nature, Cell or Science. To do this, first
select the PubMed database. Enter "SR" and "splicing" as search words. Refine the resulting search by
selecting the field "Publication date" and enter "1997 OR 1998" as search term. To further refine the
search select the "Journal" field and enter "Nature OR Science OR Cell" as search term. How many
articles do you find?
This type of search could also have been done with a single expression:
splicing [all fields] AND sr [all fields] AND ( nature [JOUR] OR cell [JOUR] OR Science [JOUR])
AND ( 1997 [publication date] OR 1998 [publication date])
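The parentheses matter: each OR group must be evaluated before it is combined with AND. A minimal sketch of assembling the same kind of expression from its parts is shown here; the helper names any_of and all_of are hypothetical and exist only in this example.

```python
def any_of(values, field):
    """Parenthesised OR group, e.g. '(nature [JOUR] OR cell [JOUR])'."""
    return "(" + " OR ".join(f"{value} [{field}]" for value in values) + ")"

def all_of(*parts):
    """Combine sub-expressions with AND."""
    return " AND ".join(parts)

query = all_of(
    "splicing [all fields]",
    "sr [all fields]",
    any_of(["nature", "cell", "science"], "JOUR"),
    any_of(["1997", "1998"], "publication date"),
)
print(query)
```

Building the groups separately makes it harder to forget a parenthesis when more journals or years are added.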
3. Here's a more specialized type of search. We want to find protein sequences that have a sequence
length between 10,000 and 20,000. Go to the protein database, select the "sequence length" field, and
enter "010000:020000"
(The leading zero is necessary because the sequence length terms are all six-digit integers. When in
doubt, use "List terms" to see the terms in a list; the range operator will use the terms in the order that
they appear.)
How many sequences do you find? Click on one of them to see the exact number of amino acids.
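Why the leading zero matters can be shown with ordinary string comparison. The sketch assumes that the range operator compares terms as text in list order, as the note above suggests; the function names are made up for this example.

```python
def pad(length, width=6):
    """Format a sequence length as a zero-padded term, e.g. 10000 -> '010000'."""
    return f"{length:0{width}d}"

def in_range(term, low, high):
    """Text comparison, mimicking a range over an alphabetically ordered term list."""
    return low <= term <= high

# Without padding, a length of 150000 appears to fall inside 10000:20000,
# because the strings are compared character by character.
unpadded_hit = in_range("150000", "10000", "20000")

# With six-digit terms the text order agrees with the numeric order.
padded_hit = in_range(pad(150000), pad(10000), pad(20000))
padded_ok = in_range(pad(15000), pad(10000), pad(20000))

print(unpadded_hit, padded_hit, padded_ok)
```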
II. Network Entrez client.
Try to retrieve the same type of information as above but using the Network Entrez client program.
This program is available from your PC desktop. For the first exercise select "Protein" as database.
Enter "human srp54" as search term. Press Enter or the Accept button.
III. Using SRS (Sequence Retrieval System)
1. Now let's try the SRS and see if we can extract the same type of information from that service.
Connect to the SRS page (srs.ebi.ac.uk:5000). Click on the alternative "Start a new SRS session".
Check "Swissprot" as database and then click "Continue". In the resulting page enter the word "srp54"
in one of the search fields (AllText) and "human" in one of the others. Click on "Do query". You obtain
a list that contains SRP54 sequences but also non-SRP54 proteins. To make a more specific search, go back to
the search form and select as "Gene name" srp54. In another frame select as "Organism" homo
sapiens (or just homo). The result is the human SRP54 sequence. Click on that entry. Can you find
links to DNA databases?
Another way to see links to the other databases: Press the "Link" button at the top of the page. Select
"PROSITE" as database to be linked (check the box to the left of the "PROSITE" text). Press
"Continue". Click on the resulting PROSITE entry (or entries). One line of the PROSITE entry reads:
P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].
This is the pattern typical for SRP54 proteins.
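In PROSITE notation, x stands for any residue, square brackets list the residues allowed at a position, and x(4) means four arbitrary residues in a row. A pattern like this can be turned into a regular expression and searched for in a sequence. The following is a minimal sketch that handles only the elements used here plus {..} exclusions; the anchors < and > are not handled, and the test peptide is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (4) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # residues that are not allowed
        # [..] groups and single letters are already valid regex.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

srp54_pattern = "P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]."
regex = prosite_to_regex(srp54_pattern)
print(regex)

# Made-up peptide that satisfies the pattern; not a real SRP54 fragment.
hit = re.search(regex, "MKPLAFLGAGEAAAALT")
print(hit.start() if hit else "no match")
```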
2. We want to find E. coli proteins that have information about DNA binding in the annotation section.
Select Swissprot as database. Select as "FtKey" dna_bind (note the underscore character!) and enter
as "Organism" *coli. (The expression *coli means any word that ends in 'coli'.)
3. We want to find Escherichia coli proteins that have (potential) transmembrane domains. Select
Swissprot as database. Select "FtKey" = "transmem" and "Organism" = "*coli".
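A wildcard such as *coli is convenient but can match more than intended. The sketch below imitates the behaviour with Python's fnmatch module, which is an assumption about how the wildcard works rather than a description of SRS itself. The organism list is a small hand-picked sample.

```python
from fnmatch import fnmatchcase

def matches_organism(pattern, organism):
    """True if any word of the organism name matches the wildcard pattern."""
    return any(
        fnmatchcase(word.lower(), pattern.lower())
        for word in organism.split()
    )

names = ["Escherichia coli", "Bacillus subtilis", "Campylobacter coli", "Homo sapiens"]
hits = [name for name in names if matches_organism("*coli", name)]
print(hits)
```

Any organism whose name contains a word ending in "coli" is returned, so it is worth checking the organism of each entry in the result list.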
4. As with Entrez above, we want to find protein sequences that have a sequence length between 10,000
and 20,000. In SRS select Swissprot as database, select the field "SeqLength" and enter
"10000:20000". Try another range if nothing is found. Do you get the same result as with Entrez?
This exercise is found at www.medkem.gu.se/edu/molbiol/retrieval.html
Retrieval of information about
Hypoxanthine Guanine Phosphoribosyl transferase (HGPRT)
In these exercises we want to retrieve all sorts of information about a protein. We will answer questions
like:
• What scientific papers are associated with the protein?
• How do we find the amino acid sequence? What DNA sequences encode the protein?
• How is the gene organized with respect to exons/introns?
• What genes are adjacent to our gene of interest?
• Is the 3D structure of the protein known? How do we find it?
• Is the protein described in the human genome database?
• Where is the gene positioned in the genome? On what chromosome is the gene?
• Are there diagnostic markers for polymorphisms?
• Are there genetic diseases related to the protein? What are the clinical symptoms?
To address these questions we will use the following WWW sites:
• NCBI Entrez
• SRS (Sequence Retrieval System)
• GDB (Genome Database)
• PDB (Brookhaven Protein Data Bank)
• Molecules R US
• OMIM
As an exercise we will examine the enzyme 'hypoxanthine (guanine) phosphoribosyl transferase'
(often abbreviated 'hprt' or 'hgprt').
Exercises
A. DNA sequence
1) Go to NCBI Entrez and select "Search WWW Entrez at NCBI Nucleotides". Try to find the gene
that codes for human hprt, i.e. the entire genomic sequence including introns. (Clue: Use as search
words: hprt, human, and complete.) Click on the button "Retrieve documents" to see the results of the
search. To see individual sequences click "Genbank report". Examine the annotation section of the
genomic hprt sequence to answer the following questions:
• What is the accession number of the sequence? (First number in the "ACCESSION" line)
• How many exons are there in this gene?
• What mutations are described and what diseases are they related to? Make a note of one of the
mutations and the associated disease.
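The exons can also be counted from the feature table of a saved report. This is a minimal sketch, assuming the usual flat-file layout in which each feature key is followed by a location such as 101..250. The excerpt is made up and is not the real hprt entry; locations written with complement(...) or join(...) are not handled.

```python
import re

# Made-up feature-table excerpt in flat-file layout (not the real hprt entry).
FEATURES = """\
     source          1..2000
     exon            101..250
                     /number=1
     intron          251..900
     exon            901..1010
                     /number=2
     CDS             join(130..250,901..980)
"""

def exon_locations(feature_table):
    """Return (start, end) pairs for every simple 'exon' feature line."""
    pairs = re.findall(
        r"^[ \t]+exon[ \t]+<?(\d+)\.\.>?(\d+)", feature_table, flags=re.MULTILINE
    )
    return [(int(start), int(end)) for start, end in pairs]

exons = exon_locations(FEATURES)
print(len(exons), "exons:", exons)
```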
When you have retrieved the gene sequence, go back one page and choose "Graphical view". Click on
one of the exons to zoom in on that area. Then go back and select "Protein link" instead of "Graphical
view". Select "GenPept report".
• What size is human hprt (how many amino acid residues)?
B. Protein sequence
Go back to NCBI-Entrez (www3.ncbi.nlm.nih.gov/Entrez/). Select "Proteins" as database. Try to find
the hprt from Escherichia coli.
• How many amino acids are there in this protein?
• Does the E. coli protein have the same substrate specificity as the human protein? (Examine
the different protein database entries corresponding to E. coli HPRT and see the annotation
section under "Comment ...")
• What scientific publication does the sequence refer to? (Examine links to MEDLINE.)
As a comparison to NCBI Entrez use SRS (Sequence Retrieval System). From the main page of SRS
select "Start a new SRS session". Choose Swissprot as database and click "Continue". Search for the hprt
protein from Bacillus subtilis. (Enter "Bacillus subtilis" in the field "Organism" and "hprt" in the field
"All-text".)
• Look in the annotation section for the Bacillus hprt. You should be able to find a link to
PROSITE, a database of protein sequence motifs. What motif is characteristic of this group of
enzymes?
Go back in SRS and select "Enzyme" as the database (instead of Swissprot that you selected before).
Search for "hypoxanthine".
• Can you find another enzyme that uses hypoxanthine as substrate?
Retrieve E. coli hprt using SRS and Swissprot as database. (Organism: Escherichia coli). You may
examine this sequence with BLAST by selecting the sequence (click in the box to the left of the protein
name). Select "Launch BLAST" and "Continue" in the resulting window.
• What enzymes related to E. coli hprt can you find in the result from the BLAST search in
addition to hprt homologs from other organisms?
C. 3D structure
Select from NCBI-Entrez (www3.ncbi.nlm.nih.gov/Entrez/) the "3D structures" database.
• Is the three-dimensional structure for HPRT known? For which organisms? (Clue: Use the word
"hypoxanthine" as search word.) Retrieve the hprt proteins by selecting "Structure summary".
Identify the human protein.
• View the molecule with the Entrez viewer (Launch viewer Cn3D) or Rasmol (Launch
RasMol viewer). How many subunits make up the protein? (In Rasmol select "Color - Chain".
Each chain will be colored differently.)
• With the Entrez and Rasmol viewers you could see a small ligand bound to the protein. What
compound is this? To answer that question select the PDB link ("1HMP") on the Entrez page.
On the PDB page select "complete with coordinates" to see the actual PDB entry and
information about the ligand.
• What closely related 3D structures are there? (Select on the Entrez page "Structure neighbors:
A".) Compare to the result from the BLAST search above.
These exercises related to 3D structure may also be carried out using the "PDB" or "Molecules R US"
sites.
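The number of chains can also be read directly from the coordinate file. In the PDB format the chain identifier of an ATOM record is stored in column 22. The sketch below collects the distinct identifiers; atom_line is a hypothetical helper that only produces correctly aligned toy records, and the toy data say nothing about the real 1HMP entry.

```python
def chain_ids(pdb_text):
    """Return the sorted chain identifiers found in ATOM records (column 22)."""
    return sorted({
        line[21]
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21
    })

def atom_line(serial, chain, resseq):
    """Hypothetical helper: the leading columns of a minimal ATOM record."""
    return f"ATOM  {serial:5d}  CA  GLY {chain}{resseq:4d}"

# Toy records for two chains; coordinates are omitted because they are not needed.
toy_pdb = "\n".join([
    atom_line(1, "A", 1),
    atom_line(2, "A", 2),
    atom_line(3, "B", 1),
])

chains = chain_ids(toy_pdb)
print(len(chains), "chains:", chains)
```

Applied to the text of a downloaded entry, the same function gives the chains present in the file, which may differ from the number of subunits in the biological assembly.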
D. Genome data
i. The human genome.
For these exercises go to GDB. Search for information on HPRT. From the main page of GDB select "All
biological data" and "keyword" and choose for instance "hypoxanthine" as search word.
• What genes code for HPRT?
• Which of these genes has an associated phenotype (is related to a disease)?
• On what chromosome is this gene? On what arm and band? Try to find a map of the
chromosome showing the location of hprt.
• What genes flank HPRT?
• Are there primer pairs for amplification of the entire HPRT coding region?
• What diseases are related to HPRT? What are the clinical symptoms?
ii. The E. coli genome.
The complete nucleotide sequence of the E. coli genome became available in 1997. Select NCBI Entrez
"Search the NCBI genomes database" and go to the E. coli genome. Try to zoom in on the HPRT gene
which is located at nucleotide position ~142,000.
• What genes are adjacent to HPRT?
Many of these problems related to genome information may also be addressed with the NCBI-OMIM
or SRS tools.
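Finding the neighbours of a gene from its position amounts to a lookup in a list of gene intervals sorted by start coordinate. The following is a minimal sketch with invented gene names and coordinates; it assumes that genes do not overlap and it does not use real E. coli annotation.

```python
from bisect import bisect_right

# Made-up (start, end, name) records sorted by start; not real E. coli annotation.
genes = [
    (139000, 140200, "geneA"),
    (140900, 141800, "geneB"),
    (141900, 142500, "geneC"),
    (142700, 143900, "geneD"),
]

def gene_and_neighbours(genes, position):
    """Return (left, gene, right) names for the gene covering a position, or None."""
    starts = [start for start, _end, _name in genes]
    i = bisect_right(starts, position) - 1
    if i < 0 or position > genes[i][1]:
        return None  # the position lies between genes or before the first one
    left = genes[i - 1][2] if i > 0 else None
    right = genes[i + 1][2] if i + 1 < len(genes) else None
    return left, genes[i][2], right

result = gene_and_neighbours(genes, 142000)
print(result)
```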
Hypoxanthine Guanine Phosphoribosyl transferase (HGPRT)
Purine nucleotides are assembled from a variety of simple compounds. The starting point for synthesis
is ribose 5-phosphate. This pathway is referred to as the de novo pathway of purine synthesis. Ribose
5-phosphate is first converted to phosphoribosyl pyrophosphate (PRPP); the committed step is the
subsequent formation of 5-phosphoribosylamine from PRPP.
Free purine bases are formed by the hydrolytic degradation of nucleic acids and nucleotides. Purine
nucleotides can be synthesized from these preformed bases by a salvage reaction, which is simpler and
much less costly than the reactions of the de novo pathway.
In salvage reactions, the ribose phosphate moiety of PRPP is transferred to a purine to form the
corresponding ribonucleotide:
PRPP + Purine --> Purine ribonucleotide
Two salvage enzymes with different specificities recover purine bases. Adenine phosphoribosyl
transferase catalyzes the formation of adenylate:
Adenine + PRPP --> adenylate + PPi
whereas hypoxanthine-guanine phosphoribosyl transferase catalyzes the formation of inosinate and
guanylate:
Hypoxanthine + PRPP --> inosinate + PPi
Guanine + PRPP --> guanylate + PPi