CHAPTER 22
Application and Experimental Questions
E1.
With regard to DNA microarrays, answer the following questions:
A.
What is attached to the slide? Be specific about the number of spots, the
lengths of DNA fragments, and the origin of the DNA fragments.
B.
What is hybridized to the microarray?
C.
How is hybridization detected?
Answer:
A.
A DNA microarray is a small slide that is dotted with many different
fragments of DNA. In some microarrays, DNA fragments, which were made
synthetically (e.g., by PCR), are individually spotted onto the slide. The DNA
fragments are typically 500 to 5,000 bp in length, and a few thousand to tens of
thousands are spotted to make a single array. Alternatively, short oligonucleotides
can be directly synthesized on the surface of the slide. In this process, the DNA
sequence at a given spot is produced by selectively controlling the growth of the
oligonucleotide using narrow beams of light. In this case, there can be hundreds
of thousands of different spots on a single array.
B.
In most cases, fluorescently labeled cDNA is hybridized to the microarray,
though labeled genomic DNA or RNA could also be used.
C.
After hybridization, the array is washed and placed in a scanning confocal
fluorescence microscope that scans each pixel (the smallest element in a visual
image). After correction for local background, the final fluorescence intensity for
each spot is obtained by averaging across the pixels in each spot. This results in a
group of fluorescent spots at defined locations in the microarray.
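The background correction and pixel averaging described in part C can be sketched as follows. This is a minimal illustration, not the scanner's actual software: the function name, the idea of estimating local background from a ring of pixels around the spot, and the pixel readings are all assumptions made for the example.

```python
from statistics import mean

def spot_intensity(spot_pixels, background_pixels):
    """Average the background-corrected pixel values of one spot.

    spot_pixels: fluorescence readings for pixels inside the spot.
    background_pixels: readings for nearby pixels outside the spot
    (assumed here to be how local background is estimated).
    """
    background = mean(background_pixels)
    # Correct each pixel for local background, then average over the spot.
    corrected = [p - background for p in spot_pixels]
    return mean(corrected)

# Hypothetical readings for a single spot and its surroundings.
spot = [120, 130, 125, 135]
surroundings = [20, 22, 18, 20]
print(spot_intensity(spot, surroundings))  # 107.5
```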
E2.
For two-dimensional gel electrophoresis, what physical properties of proteins
promote their separation in the first dimension and the second dimension?
Answer: In the first dimension (i.e., in the tube gel), proteins migrate to the point in the
gel where their net charge is zero. In the second dimension (i.e., the slab gel), proteins are
coated with SDS and separated according to their molecular mass.
E3.
Can two-dimensional gel electrophoresis be used as a purification technique?
Explain.
Answer: Yes, two-dimensional gel electrophoresis can be used as a purification
technique. A spot on a two-dimensional gel can be cut out, and the protein can be eluted
from the spot. This purified protein can be subjected to tandem mass spectroscopy to
determine peptide sequences within the protein. It should be mentioned, however, that
two-dimensional gel electrophoresis would not be used to purify proteins in a functional
state. The exposure to SDS in the second dimension would denature proteins and
probably inactivate their function.
E4.
Explain how tandem mass spectroscopy can be used to determine the sequence of
a peptide. Once a peptide sequence is known, how is this information used to determine
the sequence of the entire protein?
Answer: In tandem mass spectroscopy, the first spectrometer determines the mass of a
peptide fragment from a protein of interest. The second spectrometer determines the
masses of progressively smaller fragments that are derived from that peptide. Because the
masses of each amino acid are known, the molecular masses of these smaller fragments
reveal the amino acid sequence of the peptide. With peptide sequence information, it is
possible to use the genetic code and produce DNA sequences that could encode such a
peptide. More than one sequence is possible due to the degeneracy of the genetic code.
These sequences are used as query sequences to search a genomic database, and the search
program will (hopefully) locate a match. The genomic sequence can then be analyzed to determine
the entire coding sequence for the protein of interest.
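The idea that mass differences between progressively smaller fragments reveal the amino acid sequence can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, only a few residue masses are listed, the ladder of fragment masses is invented for the example, and the real read direction depends on which fragment ions are measured.

```python
# Monoisotopic residue masses (Da) for a few amino acids only.
# A real program would list all 20; leucine and isoleucine have the
# same mass and cannot be told apart this way.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "K": 128.09496, "M": 131.04049, "F": 147.06841,
}

def read_ladder(fragment_masses, tolerance=0.01):
    """Infer residues from the mass differences between successive fragments."""
    masses = sorted(fragment_masses)
    residues = []
    for smaller, larger in zip(masses, masses[1:]):
        diff = larger - smaller
        matches = [aa for aa, m in RESIDUE_MASS.items()
                   if abs(m - diff) <= tolerance]
        # "?" marks a difference that matches no residue in the table.
        residues.append(matches[0] if matches else "?")
    return "".join(residues)

# Hypothetical ladder: an arbitrary 18.0 Da offset plus G, then A, then V.
print(read_ladder([18.0, 75.02146, 146.05857, 245.12698]))  # GAV
```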
E5.
Describe the two general types of protein microarrays. What are their possible
applications?
Answer: The two general types of protein microarrays are antibody microarrays and
functional protein arrays. In an antibody microarray, many different antibody molecules,
each one recognizing a different peptide sequence, are spotted onto the array. Cellular
proteins are isolated, fluorescently labeled and exposed to the microarray. When a given
protein is recognized by an antibody, it will be captured by the antibody and remain
bound to the spot. Because each antibody recognizes a different peptide sequence, this
microarray can be used to monitor protein expression levels. A functional protein
microarray involves purifying cellular proteins and spotting them onto a slide. This type
of microarray can be analyzed with regard to substrate specificity, drug binding, and/or
protein-protein interactions.
E6.
What is a motif? Why is it useful for computer programs to identify functional
motifs within amino acid sequences?
Answer: A motif is a sequence that carries out a particular function. There are promoter
motifs, enhancer motifs, and amino acid motifs that play functional roles in proteins. For
a long genetic sequence, a computer can scan the sequence and identify motifs with great
speed and accuracy. The identification of amino acid motifs helps a researcher to
understand the function of a particular protein.
E7.
Discuss why it is useful to search a database to identify sequences that are
homologous to a newly determined sequence.
Answer: By searching a database, one can identify genetic sequences that are
homologous to a newly determined sequence. In most cases, homologous sequences carry
out identical or very similar functions. Therefore, if one identifies a homologous member
of a database whose function is already understood, this provides an important clue
regarding the function of the newly determined sequence.
E8.
The secondary structure of 16S rRNA has been predicted using a computer-based
sequence analysis. In general terms, discuss what type of information is used in a
comparative sequence analysis, and explain what assumptions are made concerning the
structure of homologous RNAs.
Answer: In a comparative approach, one uses the sequences of many homologous genes.
This method assumes that RNAs of similar function and sequence have a similar
structure. For example, computer programs can compare many different 16S rRNA
sequences to aid in the prediction of secondary structure.
E9.
Discuss the basis for secondary structure prediction in proteins. How reliable is it?
Answer: The basis for secondary structure prediction is that certain amino acids tend to
be found more frequently in α helices or β sheets. This information is derived from the
statistical frequency of amino acids within secondary structures that have already been
crystallized. Predictive methods are perhaps 60 to 70% accurate, which is not very good.
E10. To reliably predict the tertiary structure of a protein based on its amino acid
sequence, what type of information must be available?
Answer: The three-dimensional structure of a homologous protein must already be solved
by X-ray crystallography before one can attempt to predict the three-dimensional
structure of a protein of interest based on its amino acid sequence.
E11. In this chapter, we considered a computer program that can translate a DNA
sequence into a polypeptide sequence. A researcher has a sequence file that contains the
amino acid sequence of a polypeptide and runs a program that is opposite to the
TRANSLATION program. This other program is called BACKTRANSLATE. It can take
an amino acid sequence file and determine the sequence of DNA that would encode such
a polypeptide. How does this program work? In other words, what are the genetic
principles that underlie this program? What type of sequence file would this program
generate: a nucleotide sequence or an amino acid sequence? Would the
BACKTRANSLATE program produce only a single sequence file? Explain why or why
not.
Answer: The BACKTRANSLATE program works by knowing the genetic code. Each
amino acid has one or more codons (i.e., three-base sequences) that are specified by the
genetic code. This program would produce a sequence file that was a nucleotide base
sequence. The BACKTRANSLATE program would produce a degenerate base sequence
because the genetic code is degenerate. For example, lysine can be specified by AAA or
AAG. The program would probably store a single file that had degeneracy at particular
positions. For example, if the amino acid sequence was lysine–methionine–glycine–
glutamine, the program would produce the following sequence:
5’–AA(A/G)ATGGG(T/C/A/G)CA(A/G)–3’
The bases found in parentheses are the possible bases due to the degeneracy of the
genetic code.
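A program of the kind described in E11 might be sketched as follows. This is not the BACKTRANSLATE program itself, whose internals the chapter does not give; the function name and output notation are assumptions modeled on the example above. One caveat: writing degeneracy position by position is exact for amino acids such as lysine or glycine, but for amino acids with six codons (leucine, serine, arginine) it also admits a few codons that encode other amino acids.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def backtranslate(protein):
    """Return a degenerate DNA sequence that could encode the polypeptide."""
    out = []
    for aa in protein.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa}")
        for pos in range(3):
            # Bases seen at this codon position, listed in T, C, A, G order.
            options = [b for b in BASES if any(c[pos] == b for c in codons)]
            if len(options) == 1:
                out.append(options[0])
            else:
                out.append("(" + "/".join(options) + ")")
    return "".join(out)

# lysine-methionine-glycine-glutamine in one-letter code
print(backtranslate("KMGQ"))  # AA(A/G)ATGGG(T/C/A/G)CA(A/G)
```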
E12. In this chapter, we considered a computer program that can translate a DNA
sequence into a polypeptide sequence. Instead of running this program, a researcher could
simply look the codons up in a genetic code table and determine the sequence by hand.
What are the advantages of running this program compared to the old-fashioned way of
doing it by hand?
Answer: The advantages of running a computer program are speed and accuracy. Once
the program has been made, and a sequence file has been entered into a computer, the
program can analyze long genetic sequences quickly and accurately.
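To make the comparison with a by-hand lookup concrete, a translation program could be sketched as follows. This is an illustrative sketch, not the TRANSLATION program from the chapter: the function name is hypothetical, it reads only the first reading frame of the given strand, and it marks unrecognized codons with "X".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a coding-strand DNA sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons; leftover bases at the end are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGAAAGGTCAATAA"))  # MKGQ
```

The same table lookup a researcher would do by eye for each codon is repeated here in a loop, which is why sequence length has little effect on the effort required.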
E13. To identify the following types of genetic occurrences, would a program use
sequence recognition, pattern recognition, or both?
A.
Whether a segment of Drosophila DNA contains a P element (which is a
specific type of transposable element)
B.
Whether a segment of DNA contains a stop codon
C.
In a comparison of two DNA segments, whether there is an inversion in
one segment compared to the other segment
D.
Whether a long segment of bacterial DNA contains one or more genes
Answer:
A.
To identify a specific transposable element, a program would use
sequence recognition. The sequence of P elements is already known. The program
would be supplied with this information and scan a sequence file looking for a
match.
B.
To identify a stop codon, a program would use sequence recognition.
There are three stop codons that are specific three-base sequences. The program
would be supplied with these three sequences and scan a sequence file to identify
a perfect match.
C.
To identify an inversion of any kind, a program would use pattern
recognition. In this case, the program would be looking for a pattern in which the
same sequence was running in opposite directions in a comparison of the two
sequence files.
D.
A search by signal approach uses both sequence recognition and pattern
recognition as a means to identify genes. It looks for an organization of sequence
elements that would form a functional gene. A search by content approach
identifies genes based on patterns, not on specific sequence elements. This
approach looks for a pattern in which the nucleotide content is different from a
random distribution. The third approach to identify a gene is to scan a genetic
sequence for long open reading frames. This approach is a combination of
sequence recognition and pattern recognition. The program is looking for specific
sequence elements (i.e., stop codons) but it is also looking for a pattern in which
the stop codons are far apart.
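The open reading frame approach in part D, which combines recognition of specific stop codons with a search for a pattern of widely spaced stops, can be sketched as follows. The function name and the `min_codons` threshold are assumptions for illustration; this sketch scans only the three frames of the given strand and requires an ATG start codon, whereas a fuller program would also scan the complementary strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Return (start, end) positions of ORFs with at least min_codons codons.

    Positions are 0-based; end is one past the last base of the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # sequence recognition: a start codon
            elif start is not None and codon in STOP_CODONS:
                # pattern recognition: is the stop far enough from the start?
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

# Hypothetical toy sequence: ATG, five GCT codons, then a TAA stop.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))  # [(2, 23)]
```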
E14. Here is a short nucleotide sequence within a gene. Via the internet (e.g., see
www.ncbi.nlm.nih.gov/Tools), determine what gene this sequence is found within. Also,
determine the species in which this gene sequence is found.
5’-GGGCGCAATTACTTAACGCCTCGATTATCTTCTTGCGCCACTGATCATTA-3’
Answer: This sequence is within the lacY gene of the lac operon of E. coli. It is found on
page 588, nucleotides 801-850.
E15. Membrane proteins often have transmembrane regions that span the membrane in
an α-helical conformation. These transmembrane segments are about 20 amino acids long
and usually contain amino acids with nonpolar (i.e., hydrophobic) amino acid side chains.
Researchers can predict whether a polypeptide sequence has transmembrane segments
based on the occurrence of segments that contain 20 nonpolar amino acids. To do so,
each amino acid is assigned a hydropathy value, based on the chemistry of its amino acid
side chain. Amino acids with very nonpolar side chains are given a high value, whereas
amino acids that are charged and/or polar are given low values. The hydropathy values
usually range from about +4 to –4.
Computer programs have been devised that scan the amino acid sequence of a
polypeptide and calculate values based on the hydropathy values of the amino acid side
chains. The program usually scans a window of seven amino acids and assigns an
average hydropathy value. For example, the program would scan amino acids 1–7 and
give an average value, then it would scan 2–8 and give a value, then it would scan 3–9
and give a value, and so on, until it reached the end of the polypeptide sequence (i.e.,
until it reached the carboxyl terminus).
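The sliding-window calculation just described can be sketched as follows. The text does not say which hydropathy scale is used; the values below are the Kyte-Doolittle scale, chosen here as an assumption, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

def hydropathy_profile(sequence, window=7):
    """Average hydropathy for residues 1-7, then 2-8, 3-9, and so on."""
    values = [HYDROPATHY[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sequence: a nonpolar stretch followed by a charged stretch.
profile = hydropathy_profile("LLLLLLLKKKKKKK")
print(len(profile))  # 8 windows for 14 residues
```

Plotting these averages against the position of each window gives the kind of curve shown in a hydropathy plot; a run of high values spanning about 20 residues suggests a candidate transmembrane segment.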
The program then produces a figure, known as a hydropathy plot, which describes
the average hydropathy values throughout the entire polypeptide sequence. An example
of a hydropathy plot is shown here.
[Insert Text Art 22.4]
A.
How many transmembrane segments are likely in this polypeptide?
B.
Draw the structure of this polypeptide if it were embedded in the plasma
membrane. Assume that the amino terminus is found in the cytoplasm of the cell.
Answer:
A.
This sequence has two regions that are about 20 amino acids long and very
hydrophobic. Therefore, it is probable that this polypeptide has two
transmembrane segments.
B.
E16. Explain how a computer program can predict RNA secondary structure. What is
the underlying genetic concept that is used by the program to predict secondary structure?
Answer: RNA secondary structure is based on the ability of complementary sequences
(i.e., sequences that obey the AU/GC rule) to form a double helix. The program employs
a pattern recognition approach. It looks for complementary sequences based on the
AU/GC rule.
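A pattern search for complementary segments under the AU/GC rule can be sketched as follows. This is a deliberately simple illustration with hypothetical function names and parameters: it looks only for perfectly complementary segments of a fixed length that could form a stem, and it does not model the energy calculations or G-U pairing that fuller prediction programs may include.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the sequence that would base-pair with rna, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_stems(rna, stem_len=4, min_loop=3):
    """Return (i, j) start positions of segments that could pair as a stem.

    The segment starting at i and the segment starting at j are each
    stem_len bases long and separated by at least min_loop bases.
    """
    rna = rna.upper()
    stems = []
    for i in range(len(rna) - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for j in range(i + stem_len + min_loop, len(rna) - stem_len + 1):
            if rna[j:j + stem_len] == target:
                stems.append((i, j))
    return stems

# Hypothetical hairpin: GGGG pairs with CCCC across an AAAA loop.
print(find_stems("GGGGAAAACCCC"))  # [(0, 8)]
```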
E17. Are the following statements about protein structure prediction true or false?
A.
The prediction of secondary structure relies on information regarding the
known occurrence of amino acid residues in α helices or β sheets from X-ray
crystallographic data.
B.
The prediction of secondary structure is highly accurate, nearly 100%
correct.
C.
To predict the tertiary structure of a protein based on its amino acid
sequence, it is necessary that the protein of interest is homologous to another protein
whose tertiary structure is already known.
Answer:
A.
True.
B.
False. The programs are only about 60 to 70% accurate.
C.
True.