Biochemistry 3020
Experiment #1
Computers, the Web and Bioinformatics
The computer is a critical tool in laboratory research, particularly in biochemistry.
All major pieces of scientific equipment in research laboratories are connected to a
computer to enable data collection, particularly over extended periods of time. Not only
is it used for data collection and analysis as well as writing, but when connected to the
Internet it can be used for searching the biochemical literature, accessing data bases for
protein and nucleic acid structure, seeking research protocols and methodology, etc. In
this experiment, you will be introduced to bioinformatics and gain skills in it. The Internet is
a worldwide matrix that allows all connected computers and networks to communicate
with each other. File transfer protocol (ftp) is the most widely used facility on the
Internet, allowing for the placement and retrieval of network data.
The World Wide Web (WWW) is the most rapidly growing component of the
Internet. It permits the transfer of data as pages in multimedia form including text,
graphs, audio and video, linked together by hypertext pointers allowing for the retrieval
of data stored on different computers in different locations. Documents used on the Web
are written in Hypertext Markup Language (HTML). Accessing the documents on the
Internet requires a browser which is an interface program that allows for reading
hypertext documents and the display of Web pages on your computer. The most common
browsers are Netscape Navigator and Internet Explorer.
If you are using the university computers, the university home page will be
displayed as a starting point. Off campus, it will depend on the browser you are using.
Often Netscape Navigator or Internet Explorer home pages will be displayed. You may in
fact have your own home page as a starting point. Every home page will have a dialogue
box into which you can type text. To request a specific web page from another computer,
type in the web address, usually in the form of http://www.~ This will get you to the
specific home page which will generally display its own set of instructions for navigating
through the site. You may notice on some pages that certain words are highlighted. If you
click on these words, called hyperlinks, they quickly will connect you to another related
page that provides specific information related to the hyperlink. Clicking on the Back
button in the menu bar will take you back to the original home page. Table 1 contains
some useful web addresses or websites for biochemistry. However these are just a few
and such sites are continually changing, being either deleted or updated and new ones are
constantly being added to the Internet. In order to effectively access or find critical sites
one uses a search engine which is a searchable directory that organizes Web pages by
subject or classification according to the information one types in. Such search
engines include Google, Excite, HotBot, Lycos, Netscape Search, Yahoo, AltaVista, etc.
Surfing the net, as it is commonly called, allows you to find particular information
through these search engines.
The Biochemical Literature
The Internet should not replace the library. It is critical that you become aware of what is
in the library and how to use it. An important part of all research is to search the literature
and the library, as well as other sources such as the Internet. The Internet should be
considered a research tool much like any instrument in the laboratory.
Research begins with the generation of an idea, looking for the answer to a
question, proving a particular hypothesis, studying a particular problem etc., but this idea
most often develops after extensive reading of the literature. Reading the literature gives
one a clear idea of what has already been done and what is currently known
in terms of relevant research that might pertain to the research question. Once that is
known, a research direction is more easily focused. Knowledge of the literature, what has
already been done and what is currently being done, allows one to design and develop
experiments. Such design and development also requires knowledge of the books and
journals that are available in which to find such experimental methods. Throughout the
experiment the researcher may have to refer to the literature for physical
constants and known data with which to compare results. There are many handbooks and
encyclopedias that are specific to those types of constants.
Biochemistry, as you know, is an inter-related discipline overlapping and
connecting the biological sciences, the physical sciences, and the basic medical sciences.
Thus there are many textbooks, research journals, computer information retrieval services
and handbooks available.
Your Textbook:
Your first exposure to the biochemistry literature will be your textbook for the
course, a general textbook of biochemistry. There are many more advanced textbooks and
ones that are specific to a particular area of biochemistry, but in the beginning stages of
study a general, broad spectrum textbook is a necessary tool. This becomes your starting
point.
Research Journals and Methodology References
Research journals are critical to biochemical research and comprise the core of
biochemical literature. This is where all the current research is published. There are many
journals in every discipline, some more prestigious than others, all with the intent of
keeping scientists current and up to date. Many journals are now available on-line as well
as on CD-ROM. The Journal of Biological Chemistry, Biochimica et Biophysica Acta,
Biochemical Journal, The Journal of Biochemistry, and Biochemical and Biophysical
Research Communications are among the more commonly used and prestigious of
biochemistry research journals. Some of the more useful biochemical methodology
publications are Analytical Biochemistry (monthly), Analytical Chemistry (monthly),
Biochemical Preparations (annual), Current Protocols in Molecular Biology (two
volumes updated quarterly). Some important protocol textbooks include: Laboratory
Techniques in Biochemistry and Molecular Biology. T.S. Work and R.G. Burdon, (Eds).,
Methods of Enzymatic Analysis. H. Bergmeyer (Ed.), Methods in Enzymology, A
Practical Guide to Molecular Cloning. These methodology references are excellent
sources for describing techniques and aids in designing experiments.
Reference Books and Review Publications
Textbooks, particularly introductory textbooks, tend to contain little or no
specialized and detailed biochemical information. This type of information can be
found in reference books which can range from general to very specialized series, the
best of which are published in multi-volumes on a periodic basis. The periodic basis
allows for the publication of updated and current information and generally appears in
weekly, biweekly, monthly, semi-annual or annual publications. Each volume will cover
a specialized area with articles written by experts in the field. The Annual Review of
Biochemistry is one of the most widely used review publications. Trends in Biochemical
Sciences is another useful and widely read review publication containing shorter articles.
Handbooks of Chemical and Biochemical Data
Handbooks of Chemical and Biochemical Data are critical for providing
definitions of terms, important reference values such as Rf values, molecular weights,
physical constants such as boiling points, melting points, etc. Much of this data is now on
the Web but sometimes, particularly when writing a paper for a journal, the Web is not a
recognized literature source simply because sometimes the information is not correct.
Thus literature handbooks like the Dictionary of Biochemistry and Molecular Biology,
Glossary of Biochemistry and Molecular Biology, Merck Index, Practical Handbook of
Biochemistry and Molecular Biology, Worthington Enzyme Manual are more legitimate
sources.
Computer-Based Searches, Web Directories and Databases
When you are doing a search it is often a daunting task to review all the journals
and literature sources. It is easier to search abstracting publications, which provide brief
summaries of published articles, reviews, and patents. Such publications include Chemical
Abstracts and Biological Abstracts. Current Contents and Chemical Titles are two
publications that keep up with published articles and they are published every two weeks.
Both of these are published on line. There are many scientific databases online. Some of
the most useful STN databases for the life sciences include BIOSIS Previews/RN, CA
(chemical abstracts), MEDLINE and MEDLARS. Many of these databases can be
accessed free of charge, particularly if used from an institution, while others do have user
fees.
Databases are critical to retrieving bibliographic, nucleic acid sequence, protein
sequence and structure, metabolic pathway, transcription factor, enzyme and many
other types of information. The best way of collecting lists of information and tools
relevant to your research is by accessing directories that collect lists of information, tools
and other services. Many of these are hyperlinked to other useful sites. FASTA (used for
finding similar protein sequences), BLAST (used for comparing protein sequence
data), RasMol or RasMac (displays coordinates for protein structure manipulation), Chime
(protein structure coordinates), SWISS-MODEL (protein structure modeling), VAST
(protein structure similarities), and Molecules R Us (protein structure coordinates) are
some of the tools and databases used for sequence analysis and modeling.
Table 1: Web Databases, Directories and Tools
Protein Data Bank (PDB)—Protein Structures determined by X-ray and NMR
http://www.rcsb.org/pdb/
European Bioinformatics Institute—DNA Sequences
http://www.ebi.ac.uk/
National Center for Biotechnology Information (NCBI)— Variety of databases and
resources
http://www.nlm.nih.gov/
Swiss-Protein—Protein sequences and analysis
http://www.expasy.ch/tools/
Biocatalysis/Biodegradation Database of the University of Minnesota—Microbial
metabolism of many chemicals
http://www.labmed.umn.edu/umbbd/index.html
REBASE-The Restriction Enzyme Database—Restriction enzyme direction and action
http://rebase.neb.com/
Georgia Institute of Technology—Tutorials on PDB and RasMol
http://www.chemistry.gatech.edu/faculty/williams/bCourse_information/4582/lab
s/rasmol_pdb.html
The Institute for Genomic Research—Collection of genomic databases
http://www.tigr.org/
RasMol (RasMac)—Molecular graphics for proteins
http://www.umass.edu/microbio/rasmol/
Predict Protein—Protein sequence and structure prediction
http://www.embl-heidelberg.de/predictprotein
Gen Quiz—Protein function analysis based on sequence
http://www.sander.ebi.dc.uk/gqsrv/submit
Pedro’s Biomolecular Research Tools
http://www.public.iastate.edu/~pedro/research_tools.html
Biology Workbench
http://biology.ncsa.uiuc.edu
CMS Molecular Biology Resources
http://www.sdsc.edu/ResTools/smshp.html
BioTech
http://bioech.icmb.utexas.edu
Protocol Online
http://www.protocol-online.net
Chem Connection
http://chemconnect.com/news/journals.html
American Chemical Society
http://pubs.acs.org/
Table 2: Useful Programs for Exploring Structures and Sequences
BLAST--Searches for similar nucleic acid and protein sequences
Chime--Protein structure on moving 3-dimensional coordinates
Entrez (NCBI)--Database of gene sequences
FASTA--Searches for similar protein and nucleic acid sequences
GenBank (NCBI)--Database of gene sequences
Molecules R Us--Provides coordinates for protein 3D structure and manipulation
RasMol (RasMac)--Provides coordinates for protein 3D structure and manipulation
SRS (EMBL)--Sequence retrieval system for cross-referencing databases
Table 3: Internet Terminology
Biological databases--computer sites that contain organized and stored files of
information consisting of literature references, nucleic acid sequences, protein sequences
and protein structures.
Bookmark--a function within Netscape Navigator and other browsers that allows the
user to save a Web site address for later use.
Browser--An interface program such as Netscape that reads hypertext and displays Web
pages on your computer.
Domain—the computer user’s location or local network
e-mail—means of exchanging messages or connecting over the net via computers;
electronic mail
favorites—form of a bookmark used in Internet Explorer
Freeware—software provided free of charge by the developer and is generally able to be
downloaded from the internet
ftp—file transfer protocol; a mechanism of transferring files or data over the network
home page—the beginning page for access to the Web. Each institute or individual will
have a home page containing relevant information and often links to other relevant
information or sites.
HTML—HyperText Markup Language; a special coded language used to write Web
pages.
Hyperlink—a link or connection between web pages usually highlighted such that if you
click on it, it will take you to the page.
Hypertext—the language used to connect similar documents on the Web.
Internet—world wide connection of computers, a matrix that allows the communication
of all computers and networks.
Java—a language used on the Web to allow for the incorporation of multimedia into
Web pages
Modem—an electronic device that allows for the connection of computers by a signal
through phone lines.
Multimedia—the form of media that allows for the incorporation of all types of media
from text to graphics to video and audio, etc.
Search engine—a searchable directory on the Web that organizes Web pages and
information by category and subject classification.
Server—A large mainframe computer that acts as a storage site for retrievable data. The
university server, for example, contains all the data for the university and is accessible by
other computers on or off campus.
URL—Uniform Resource Locator; the standard address form used to identify and locate
a document on the Web, usually prefaced by http://
Web site—the collection of documents or Web pages on a server
WWW--World Wide Web, commonly referred to as "The Web"; the component of the
Internet that uses hypertext language to provide resources.
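As a small illustration of the URL entry in the terminology list, the sketch below splits one of the Web addresses from Table 1 into its parts using Python's standard library. The function name describe_url and the dictionary keys are illustrative choices only and are not part of the original handout.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a Web address into its protocol, server name and document path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # how the page is transferred, e.g. http
        "server": parts.netloc,    # the computer that stores the Web site
        "path": parts.path,        # the document requested from that server
    }

# RasMol address taken from Table 1 of this handout.
info = describe_url("http://www.umass.edu/microbio/rasmol/")
for name, value in info.items():
    print(name, "=", value)
```

Reading an address this way can help when a link from Table 1 no longer works: the server name alone is often still a valid starting point for finding the moved page.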
The purpose of this experiment is to help you gain knowledge and experience in
retrieving information from the Web. The following is a tutorial through which you will
work. You will use PubMed to search for mushroom tyrosinase and the other protein
databases to search for α-lactalbumin. Once you have completed the tutorial and are
familiar with the basics of searching the Internet for information on proteins, you
will be given a protein to search for and questions to answer for your report.
1. Searching the Biochemical Literature on PubMed
Tutorial:
1. Log into the computer you are working on and go to the university home page. At
the top of the page is the URL or university address.
2. Highlight all of the address and delete it back to http:// and then type in the URL
(http://www.nlm.nih.gov/) to connect to the United States National Library of
Medicine (National Institutes of Health), the home of the National Center for
Biotechnology Information. On the left-hand side there are some
topics of interest such as the Human Genome Resources; Library Catalogue and
Services; Network of Medical Libraries; Biomedical Research and Informatics;
Environmental Health and Toxicology, etc. Clicking on any of these will take you
to those sites.
3. You should see a hyperlink (highlighted link) “PubMed” on the right-hand side.
Click on this with your mouse and it takes you to the Web page. On the left-hand
side of the page you will see under Overview a link to the PubMed Tutorial. You
should work through this first so that you have a good idea how to navigate this
site. The following instructions will be pointers but it is up to you to become
familiar with navigating the site and its features.
4. If you click on Entrez in the upper menu bar it will take you to the features of
PubMed, the cross-database search page. At the bottom of the page it tells you
how to use the PubMed Search.
The menu bar at the top gives a list of searchable items and the left dialogue box
allows you to search specific areas for a topic. For example, if you search
PubMed for lysozyme (the enzyme you will be isolating in Project I), you will get
a number of articles pertaining to that topic. Beside each article will be a
highlighted “Related Articles” which when clicked on will take you to the related
article.
If you want to search all the data bases for a topic type it in and click on Go. The
number of articles found in each database will appear as a number on the left of
the database. For example, if you search for lysozyme you will see that PubMed
alone has more than 19,000 articles in its database, too many to search. Thus you
will have to refine your search by using Boolean operators (see #9 below).
5. Under Overview on the left hand side of the PubMed page is MEDLINE, NLM’s
premier bibliographic database. Clicking on the highlighted MEDLINE will take
you to the Fact Sheet. It can also be accessed through the NLM Gateway:
http://gateway.nlm.nih.gov. Of interest is the Fact Sheet, “What’s the Difference
Between MEDLINE and PubMed?”
MEDLINE has many features but the most basic and one of current interest is the
search capability.
6. If you are interested in searching the bibliography for a particular article you can
enter a search term, author name, journal name or article title in the dialogue box.
For example, you may want to search for lysozyme, an enzyme isolated from
hen egg white and a natural component of human tears. You can choose the
category you want to search under or do a general search of all the databases. For
some of the search categories such as 3D domains for 3D structures, you may
have to download the free software to view them. This would be better done on
your own computer. Using the category box on the left or the menu bar at the top
allows you to search categories quickly and also refine your search.
7. Click on “Search” once you have typed it in and more than 500 citations or
articles will appear. The lists will be composed of author(s), title and reference in
reverse chronological order.
8. Clicking on the author’s name (in hypertext) will allow you to retrieve the
abstract of the article. The hypertext “see Related Articles” is another useful and
time-saving feature as it allows you to quickly be linked to other related articles
without first having to search for them. Thus clicking on this will provide a list of
papers related to the specific citation.
9. 500 papers is too many to view at one time and is too broad so you may reduce
the number by modifying your search, making it more specific using Boolean
operators and the menu categories on the left or top menu bar.
Boolean operators are the uppercase terms AND, OR and NOT, used to refine a
search. They are processed from left to right. Thus parentheses should be used to
nest terms so they will be processed as a unit and then incorporated into the
overall search strategy. For example, if you were searching for lysozyme you could
refine it to (human lysozyme) NOT hen egg white.
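The nesting of Boolean terms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names nest and exclude are hypothetical, the query is one possible reading of the lysozyme example, and the NCBI E-utilities ESearch address used to show how such a query is encoded into a Web address is not part of this handout, which uses the browser dialogue box instead. The code builds the address but does not contact the server.

```python
from urllib.parse import urlencode

def nest(*terms, operator="AND"):
    """Join terms with AND or OR and wrap them in parentheses so that
    they are processed as a single unit."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return "(" + f" {operator} ".join(terms) + ")"

def exclude(query, unwanted):
    """Remove from 'query' any article that also matches 'unwanted'."""
    return f"{query} NOT {unwanted}"

# One reading of the example in the text: human lysozyme, not hen egg white.
query = exclude(nest("human", "lysozyme"), nest("hen", "egg", "white"))
print(query)

# Assumption: the public E-utilities ESearch endpoint for PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
search_url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query})
print(search_url)
```

Typing the printed query directly into the PubMed dialogue box should behave the same way as building it by hand; the point of the sketch is only that each parenthesized group is resolved before the NOT is applied.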
To become familiar with the site:
Search for the enzyme lysozyme again. Using the PubMed bibliographic searches search
for the following aspects of the enzyme.
After you click on Go, a number of articles will appear along with the menu bar at the top
of the articles.
The first box will say Display. The box next to it is the Category or Summary box; it
will have a scroll arrow in it and will contain various categories under which you can
display or categorize the articles, thus refining your search much like using Boolean
terminology. It allows you to choose from various categories such as briefs, abstracts,
citations, MEDLINE, related articles, etc.
The Show button next to Display, gives the number of articles under the categories you
have selected and the number of pages.
Sort allows you to sort the articles alphabetically by author or journal, or
chronologically by publication date.
The Send To button allows you to send the search to text, file, clipboard, e-mail or order.
Page tells the number of pages in the search. The number displayed in the box tells you
the page number you are on and the highlighted Next takes you to the next page. Typing
in a specific number will take you directly to that page rather than having to scroll
through all the pages.
Next to the article there is an empty box. Clicking on the box puts a checkmark in the
box thus selecting it. When you have gone through all the articles you can then collate all
the ones you have selected and print or save only those selected thus providing you a
bibliography of your search.
Under the box there is a paper icon. Clicking on this icon allows you to retrieve an
abstract of the article. Clicking on the highlighted title and authors will do the same.
Above the article there is a highlighted “full text” icon that allows you to download and
print the article if it is available. Not all articles are available.
In becoming familiar with the search features, answer the following questions.
a) How many references and pages are there?
b) What are the other sources of the enzyme?
c) How has it been purified?
d) Has it been expressed in other organisms such as E. coli?
e) What is the sequence of the gene coding for it?
f) What is the expected protein sequence?
g) What is the metal ion present in the native enzyme?
h) Find and cite correctly two references that study the inhibition of the enzyme.
i) What inhibitor molecules of the enzyme have been investigated?
j) What are other substrates for this enzyme?
k) What is the expected extinction coefficient for this enzyme in the next
experiment?
l) What is its expected molecular mass, pI and pH?
m) How can it be crystallized? What does its crystal structure look like?
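For the question on the extinction coefficient, a literature or database value should be your answer, but a rough estimate from the amino acid sequence is a useful cross-check. The sketch below uses a widely quoted approximation for the molar extinction coefficient at 280 nm (5500 per tryptophan, 1490 per tyrosine and 125 per disulfide bond, in M^-1 cm^-1). The approximation, the function name estimate_e280 and the short test peptide are assumptions introduced here for illustration and do not come from this handout.

```python
def estimate_e280(sequence, disulfide_bonds=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1)
    from a one-letter amino acid sequence.

    Assumed approximation: 5500 per Trp (W), 1490 per Tyr (Y) and
    125 per disulfide bond. The number of disulfide bonds cannot be
    read from the sequence alone, so it is supplied by the user.
    """
    seq = sequence.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    return 5500 * n_trp + 1490 * n_tyr + 125 * disulfide_bonds

# Hypothetical peptide used only to show the arithmetic (2 Trp, 1 Tyr).
demo_peptide = "MKWYAWC"
reduced = estimate_e280(demo_peptide)
oxidized = estimate_e280(demo_peptide, disulfide_bonds=1)
print(reduced, oxidized)
```

Replace the demonstration peptide with the sequence you retrieve in question f) and compare the estimate with the published value before relying on either.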
2. Web Tools and Biological Databases
Primary databases and structural analytical tools are important in protein biochemistry. In
this exercise we will analyze the structure of α-lactalbumin from bovine milk and
compare it to human α-lactalbumin. You will then be given a protein to analyze
on your own.
Tutorial:
1. Type the URL address (http://www.rcsb.org/pdb) into the address bar at the top of
the browser to take you to the Protein Data Bank (PDB).
2. There is much information on the PDB homepage and you should become
familiar with it by clicking on some of the hyperlinks and seeing what they do and
where they take you. Also familiarize yourself using the tutorial which is located
above the Search box as a question mark. Click on this to access the tutorial.
3. After you have become familiar with the homepage, scroll down to the
“SearchLite” under Search in the middle of the page. Clicking on SearchLite will
take you to the SearchLite page.
4. Type in “human alpha lactalbumin” (or your protein of choice) in the box and
click on Search. Seven or more structures should
appear with white square boxes to the left of them. The key at the top defines the
other symbols: the turquoise arrow, the page and the eye. Click on these to see
what they do.
5. Click on the first white square box to the left (Structure 1A4V) and “EXPLORE”
to the right to display the “Structure Explorer” with the “Summary Information”
about the structure of the protein. If you require help, click on the “?” and a
dialogue box will appear.
6. A number of functions will appear on the left side of the screen. “View
Structure”- displays the “Interactive 3D Display” and “Still Images”.
7. Click on “Still Images” initially to view the structure in ribbon or cylinder form.
To enlarge the structure, click on the appropriate choice (i.e. 250 x 250 or 500 x
500). The α-helices and β-sheets should be visible.
For some of the other features you will have to download the free programs (e.g.
Chime, RasMol, Swiss-PdbViewer). This is a secure site so there should be
no problems with virus contamination.
8. Once you are familiar with the structure of the protein, then you can view its
rotation about its axis by clicking on “Chime” under the “Interactive 3D
Display”. The mouse controls are listed under “Chime Help” at the bottom of the
screen.
9. Under the “Summary Information” you will find other functions.
Clicking on “Sequence Details” will give the amino acid sequence and definition
of the secondary structure for the protein. If you would like to ftp this file to
yourself you can download it by clicking on “Download in FASTA format”
which is the format that lists the amino acid sequences in single-letter
abbreviation for each amino acid.
To display a table of bond angles and lengths, click on “Geometry”. Clicking on
“Structural Neighbours” will display the neighbours and their angles and
lengths.
10. Other features such as “VAST” will display Sequence Neighbours and
Structure Neighbours. Sequence neighbours will display sequences similar to
your protein of choice while clicking on Structure Neighbours will display similar
structures to your protein of choice. Clicking on “Other Sources” will display
data files with references to your protein of choice.
11. Under “View Structure”, just above “Chime”, there is a hyperlink to RasMol
(or RasMac) which will allow you to view the detailed structure of a protein and
rotate it on its coordinates allowing you to view it from all its perspectives.
RasMol instructions can be viewed under “Help” or you may want to use the
RasMol Tutorial listed in the Web addresses above.
12. Swiss-PdbViewer, for which the address is given above, is another useful
protein viewer.
13. BLAST, available at the NCBI (www.ncbi.nlm.nih.gov), is a commonly used
sequence comparison and analysis tool. Clicking on “Basic BLAST search” will bring
up the dialogue box into which you can type the amino acid sequence of your
choice protein. You can also do this by downloading the amino acid sequence in
FASTA format into a file saved on your computer and then transferring that file
into the BLAST dialogue box to get a list of proteins with similar amino acid
sequence to the one you entered.
Note: When doing a BLAST search, amino acids have a specific code according
to the following table.
Table 4: Amino acid codes for BLAST analysis
A alanine
B aspartate or asparagine
C cysteine
D aspartate
E glutamate
F phenylalanine
G glycine
H histidine
I isoleucine
K lysine
L leucine
M methionine
N asparagine
P proline
Q glutamine
R arginine
S serine
T threonine
U selenocysteine
V valine
W tryptophan
Y tyrosine
Z glutamate or glutamine
X any
* translation stop
- gap of indeterminate length
14. Entrez is another approach to studying proteins and nucleic acids which can be
accessed through the NCBI home page by clicking on “Proteins” to obtain the
dialogue box and then entering your protein of choice and clicking on “Search”.
This will provide you with relevant documents. You may also access BLAST
through Entrez.
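The FASTA format and the one-letter amino acid codes used in the tutorial above can be checked with a short script before a sequence is pasted into the BLAST dialogue box. The following is a minimal sketch, not an official NCBI tool: the function names read_fasta and invalid_codes and the example record are hypothetical, and the set of accepted letters is simply the set listed in the amino acid code table of this handout.

```python
# Letters and symbols listed in the amino acid code table of this handout.
VALID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")

def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    A record starts with a line beginning with '>' (the header); all
    following lines up to the next '>' are joined into one sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def invalid_codes(sequence):
    """Return, in sorted order, any characters not found in the code table."""
    return sorted(set(sequence) - VALID_CODES)

# Hypothetical record for illustration only; not a real database entry.
example = """>demo_peptide hypothetical sequence
MKALIVLGL
CFLAAWYQ
"""
records = read_fasta(example)
for header, sequence in records:
    print(header, len(sequence), invalid_codes(sequence))
```

A file downloaded with “Download in FASTA format” can be read into a string and passed to read_fasta in the same way; an empty list from invalid_codes means every character appears in the code table.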
Procedure:
1. Using the techniques outlined in the above tutorial, explore the enzyme lysozyme.
View structures and look at the amino acid sequences. Provide all the information
outlined in the above tutorial for your lab report.
2. Provide two recent research articles on your enzyme and correctly give the
references.
3. What methods have been used to purify your protein? Briefly describe them.
4. Include the nucleotide sequence of the gene coding for your protein. Begin on the
NCBI home page and enter Entrez. Click on “Nucleotides” and do a search.
Review the GenBank report for the position of introns and exons and obtain a
FASTA report, transfer (download) the files and complete a BLAST search for
related sequences (this should cover many of the above steps outlined in the
tutorial).
5. Using the BLAST tool, compare the amino acid sequences to another protein and
repeat using BLAST to compare the nucleotide sequences for the genes coding for
the protein.
6. Restriction enzyme digestion is critical for determining information about the
structure of a gene. A commonly used restriction enzyme is HindIII. Using the
REBASE site, determine the specificity of this restriction enzyme.
7. Protein separation and elucidation is often done by SDS-PAGE. Using the Web
site on Biocatalysis/Biodegradation, outline the pathway for the microbial
degradation of the detergent, sodium dodecyl sulfate (SDS) used to denature
proteins for SDS-PAGE.
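Once you have looked up a recognition sequence in REBASE, you can locate its occurrences in the nucleotide sequence you retrieved from GenBank. The sketch below is one simple way to do this under stated assumptions: the function name find_sites and the short DNA string are hypothetical, and the example uses the well-known EcoRI site (GAATTC) so that the HindIII site is left for you to determine from REBASE.

```python
def find_sites(dna, recognition_site):
    """Return the 1-based start position of every occurrence of the
    recognition site in the DNA sequence, including overlapping ones.

    Only the strand given is scanned. For a palindromic site this is
    sufficient; a non-palindromic site would also need a scan of the
    reverse complement.
    """
    dna = dna.upper()
    site = recognition_site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)        # convert 0-based index to 1-based
        start = dna.find(site, start + 1)  # continue after this match start
    return positions

# Hypothetical DNA fragment for illustration only.
demo_dna = "TTGAATTCAAGGCCGAATTCTT"
ecori_positions = find_sites(demo_dna, "GAATTC")
print(ecori_positions)
```

Substituting the recognition sequence you find for HindIII and the gene sequence from step 4 gives the positions at which the enzyme would be expected to cut, which you can compare with the restriction map reported in the literature.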
References:
This procedure has been adapted in part from R. Boyer, Modern Experimental
Biochemistry, (2000), (3rd Ed.). Benjamin Cummings (Toronto) and the following
references.
A. Baxevanis and B. Ouellette (Eds), Bioinformatics: A Practical Guide to the Analysis of
Genes and Proteins (1998), John Wiley & Sons (New York). A new introduction to
computing.
R. Doolittle, (Ed), Methods in Enzymology (1996), “Computer Methods for
Macromolecule Sequence Analysis,” Vol. 266, Academic Press (San Diego).
D. Leon, S. Uridil, and J. Miranda, J. Chem. Ed. 75, 731-734 (1998). “Structural
Analysis and Modeling of Proteins on the Web.”
H. Salter, Biochem. Educ. 26, 3-10 (1998). “Teaching Bioinformatics.”
C. Smith, The Scientist, August 31, pp. 17-19 (1998). “Molecular Modeling.”