Question 1
(4 pts) The goal of this question is to familiarize you with the relationship between
amino acids and their corresponding DNA/RNA sequences and issues like the possibility
of multiple codons for the same amino acid, the directionality of DNA/RNA sequences
and the fact that translation occurs in units of 3 bases at a time.
a) (1 pt) Messenger RNA translation usually starts at an AUG codon. However,
there are two alternative start codons that are sometimes used in prokaryotes –
what are they, read 5'-3'?
Answer: GUG and UUG are the most commonly used alternatives; CUG and AUU are also
valid, though rarer, start codons in prokaryotes.
b) (1 pt) Do the three possible start codons code for different amino acids when they
are used as start codons? What amino acid(s) do they encode?
Answer: No, they all code for Met when used as start codons.
c) (2 pts) How many possible mRNA sequences are there that could encode the
amino acid sequence FYC and what are they, read in the 5'-3' direction?
Answer: phenylalanine - UUU, UUC; tyrosine - UAU, UAC; cysteine - UGU,
UGC, so there are 2 x 2 x 2 = 8 possibilities.
UUU UAU UGU
UUU UAU UGC
UUU UAC UGU
UUU UAC UGC
UUC UAU UGU
UUC UAU UGC
UUC UAC UGU
UUC UAC UGC
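The enumeration above can be reproduced with a short script. This is a minimal sketch: the `CODONS` table and the `encodings` helper are illustrative names, and the table covers only the three residues this question needs.

```python
from itertools import product

# Codons for each residue of the peptide FYC (standard genetic code).
# Only the three amino acids needed for this question are listed.
CODONS = {
    "F": ["UUU", "UUC"],
    "Y": ["UAU", "UAC"],
    "C": ["UGU", "UGC"],
}

def encodings(peptide):
    """Return every mRNA sequence (read 5'-3') that encodes the peptide."""
    choices = [CODONS[aa] for aa in peptide]
    # product() walks through one codon choice per residue, in order.
    return [" ".join(codons) for codons in product(*choices)]

sequences = encodings("FYC")
print(len(sequences))
for seq in sequences:
    print(seq)
```

Because each residue has two codons, the count is the product of the per-residue codon counts.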
Question 2
(6 pts) You have isolated the messenger RNA product of your favorite gene (YFG).
You reverse transcribe the start-to-stop codon portion of this mRNA to obtain
cDNA and proceed to sequence it. The file reversecomp.fa, on the course website,
contains this sequence in FASTA format.
Write a Python script that opens the file, reads it, and creates variables with the
following types, names and values (this is the “contract” for this problem set and it
must be adhered to):
- (2 pts) MRNA: A string variable that contains the mRNA sequence that
  the DNA sequence in the file came from.
- (1 pt) SEQBIT: A string variable that contains the segment of the mRNA
  sequence corresponding to amino acids 10-12 of YFG.
  Answer: ACAGAGGAA
- (3 pts) FYPOSITION: An integer variable that contains the position of the
  first occurrence of an in-frame set of nucleotides in the mRNA sequence
  that encode the amino acid sequence FY (the position of the first
  nucleotide in the coding sequence is defined as 1).
  Answer: 1162
Name the script decode.py and submit it electronically on the course website (see cover
page for instructions).
(Hint: You can partially debug whether you have the correct mRNA sequence
by checking that it starts and ends with nucleotides that correspond to start
and stop codons; also, remember that there are chemical differences between
the nucleotides used in DNA vs RNA molecules)
NOTES:
1. "Create variables with the following types, names and values" means that
when your program completes, there should be a string variable called
SEQBIT that is equal to the requested RNA sequence, an integer variable
called FYPOSITION that is equal to the requested value, and so on.
2. DO NOT DEVIATE FROM THE CONTRACT. Don't give the variables
different names, don't change their casing (i.e. don't change them to
"seqbit" or "fyposition") and don't give them types other than the ones
specified. If you don't follow the contract, you will be penalized. No
exceptions.
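One possible shape for decode.py is sketched below. It is not the official solution. It assumes, as the file name reversecomp.fa suggests, that the file holds the reverse complement of the coding sequence in a single FASTA record. The helper names (`read_fasta_sequence`, `cdna_to_mrna`, `codon_slice`, `first_in_frame_fy`) are illustrative, and returning -1 when no FY is found is a choice made here, not part of the contract.

```python
import os

COMPLEMENT = str.maketrans("ACGT", "TGCA")
F_CODONS = {"UUU", "UUC"}  # phenylalanine
Y_CODONS = {"UAU", "UAC"}  # tyrosine

def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">"))

def cdna_to_mrna(cdna):
    """Reverse-complement the cDNA, then swap T for U (assumed orientation)."""
    return cdna.upper().translate(COMPLEMENT)[::-1].replace("T", "U")

def codon_slice(mrna, first_aa, last_aa):
    """Nucleotides encoding amino acids first_aa..last_aa (1-based, inclusive)."""
    return mrna[3 * (first_aa - 1):3 * last_aa]

def first_in_frame_fy(mrna):
    """1-based position of the first in-frame FY codon pair, or -1 if absent."""
    for i in range(0, len(mrna) - 5, 3):  # step by whole codons only
        if mrna[i:i + 3] in F_CODONS and mrna[i + 3:i + 6] in Y_CODONS:
            return i + 1
    return -1

# The contract variables are created only when the course file is present.
if os.path.exists("reversecomp.fa"):
    MRNA = cdna_to_mrna(read_fasta_sequence("reversecomp.fa"))
    SEQBIT = codon_slice(MRNA, 10, 12)
    FYPOSITION = first_in_frame_fy(MRNA)
```

Following the hint, a quick sanity check is that MRNA begins with a start codon and ends with a stop codon (UAA, UAG or UGA). If it does not, the orientation assumption is probably wrong for the file.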
Question 3
(3 pts) Construct a Dynamic Programming matrix to optimally align the following
peptide sequences using the BLOSUM62 scoring matrix found on page 105 of Mount
and a gap penalty of -8. The BLOSUM62 scoring matrix is also available on the course
website for those without a textbook.
TGEK and STGCML
a) (1 pt) Use the last page included in this problem set to fill in the dynamic
programming matrix and turn it in with your homework.
b) (1 pt) Circle your traceback path for generating the alignment.
c) (1 pt) Write out the final alignment below the grid.
Answer:
_TG_EK
STGCML
(see grid on last page)
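The fill and traceback can be checked with a small global-alignment (Needleman-Wunsch) sketch. The `BLOSUM62_SUBSET` table below holds only the BLOSUM62 entries for the residue pairs that occur in this problem, and `global_align` is an illustrative name. When several traceback moves tie, this sketch prefers the diagonal, then a gap in the second sequence; other tie-breaking rules are equally valid.

```python
GAP = -8

# BLOSUM62 scores for only the residue pairs this problem needs
# (outer keys: residues of TGEK, inner keys: residues of STGCML).
BLOSUM62_SUBSET = {
    "T": {"S": 1, "T": 5, "G": -2, "C": -1, "M": -1, "L": -1},
    "G": {"S": 0, "T": -2, "G": 6, "C": -3, "M": -3, "L": -4},
    "E": {"S": 0, "T": -1, "G": -2, "C": -4, "M": -2, "L": -3},
    "K": {"S": 0, "T": -1, "G": -2, "C": -3, "M": -1, "L": -2},
}

def global_align(a, b, score=BLOSUM62_SUBSET, gap=GAP):
    """Return (matrix, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: leading gaps
        F[i][0] = i * gap
    for j in range(1, cols):          # first row: leading gaps
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + score[a[i - 1]][b[j - 1]],  # match/mismatch
                F[i - 1][j] + gap,                            # gap in b
                F[i][j - 1] + gap,                            # gap in a
            )
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score[a[i - 1]][b[j - 1]]:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("_"); i -= 1
        else:
            top.append("_"); bottom.append(b[j - 1]); j -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))

matrix, aligned_a, aligned_b = global_align("TGEK", "STGCML")
print(aligned_a)
print(aligned_b)
print(matrix[-1][-1])  # optimal alignment score
```

The bottom-right cell of the matrix is the optimal score, and the cells visited during traceback are the ones to circle in part b).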
Question 4
(10 pts) You are a staff scientist at the world-renowned Center for Dinosaur Genetics.
You are currently working on describing the function of an interesting dinosaur protein
X. You have been able to sequence the corresponding gene and know that the sequence
of protein X is as follows (also posted on the class website in file proteinx.fa):
> protein x
MEPQSQAFPGSAGTALQYPPPAYPAAKGGFQVPMIPDYLFPQQQGDLGLGTPDQKPFQG
LESRTQQPSLTPLSTIKAFATQSGSQDLKALNTSYQSQLIKPSRMRKYPNRPSKTPPHE
RPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEK
PFACDICGRKFARSDERKRHTKIHLRQKDKKADKSVVASSATSSLSSYPSPVATSYPSP
VTTSYPSPATTSYPSPVPTSFSSPGSSTYPSPVHSGFPSPSVATTYSSVPPAFPAQVSS
FPSSAVTNSFSASTGLSDMTATFSPRTIEICNFQNALNLQQTPLHLAVITNQLVKIAEA
LLGAGCCYRMLRDFRGNTPLHLACEQGCAKVSVGVLTQADGRTMNKLRLMLKAENGRQW
ETRHFLSYISDRTYNIYICILIIASDFDAEDGHKSYGCLCVNTLLHRCIMQWERTPDLV
SLLL
You have little clue as to what protein X might actually do, so you decide to use BLAST
(http://www.ncbi.nlm.nih.gov/BLAST/) to see what is known about proteins close to it in
sequence.
a. (1 pt) What version of BLAST is best suited for performing this task?
Answer: protein-protein BLAST (BLASTP)
Using the appropriate program, BLAST the sequence above against the nr database
(use default parameters).
b. (2 pts) What are the bit score, the E-value and the accession number of the top
hit? Do you think this is a significant match? (Hint: you will need to click the
“Format!” link after initially submitting your sequence to BLAST)
Answer: bit score 514, E-value 3 x 10^-144, accession number AAH73983. The match
is certainly significant: given the database and the query, we would expect to find
only 3 x 10^-144 matches this good at random (in other words, this would “never”
happen at random). Whether the match actually implies any functional similarity is a
different question.
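To make the significance argument concrete, an E-value can be converted to the probability of seeing at least one chance hit that good, using the Poisson relationship P = 1 - exp(-E) on which BLAST statistics are based. The function name below is illustrative.

```python
import math

def chance_hit_probability(e_value):
    """P(at least one chance hit scoring this well) = 1 - exp(-E)."""
    # expm1 keeps precision when E is tiny; plain 1 - exp(-E) would round to 0.
    return -math.expm1(-e_value)

print(chance_hit_probability(3e-144))  # essentially equal to E itself
print(chance_hit_probability(10.0))    # close to 1: such hits are expected
```

For very small E-values the probability and the E-value are practically identical, which is why the E-value alone is usually quoted.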
Let us call the top hit protein Y. Click on the “Gene info” link to the right of the
listing for protein Y (marked with a blue letter “G”) and follow to the gene
information page.
c. (1 pt) Judging from the summary and the references shown on the gene
information page, what is the function of protein Y?
Answer: protein Y (early growth response 1 in Homo sapiens) is a transcription
factor from the family of zinc finger transcription factors. The zinc finger domain
is used to bind DNA and is therefore crucial to the function of the protein.
Scroll to the bottom of the page to get to the “NCBI Reference Sequences (RefSeq)”
section and check for any conserved domains (to get more information about a
conserved domain, if any, click on its name).
d. (2 pts) Name any conserved domains and their function. Based on what you now
know about protein Y, is this domain crucial to its function?
Answer: There are two zinc finger domains present in this protein. One of the
common functions of zinc fingers is to bind the major groove of DNA.
Back on the gene information page the locations (in sequence) of the conserved
domains are listed along with their names. Compare these locations with the matching
regions in the original BLAST alignment of protein X with protein Y (you will need
to refer back to the BLAST results page).
e. (2 pts) Did protein X hit protein Y because of similarities in the conserved
domain(s)? Can you propose a hypothetical function for protein X?
Answer: Yes, the aligned portion of the hit contains the zinc finger domain
regions. Given this information, it is reasonable to propose that protein X is a zinc
finger transcription factor.
Refer back to the first page returned by BLAST (before the formatted results page).
On this page BLAST shows any significant matches between your sequence and the
Conserved Domains Database (CDD). Click on the graphic showing the locations of
matching domains. In the page that appears, you can mouse over the different
domains identified in your sequence and read short descriptions in the text box above.
Based on work done in earlier sections of this problem, you are probably not
surprised to see putative domains located close to the N-terminus. You notice,
however, that there is another domain at the C-terminus. Click its name to go to the
CDD and get more information.
f. (2 pts) Based on what you learned about this domain from the CDD, what else can
you propose about the possible functions of protein X?
Answer: Ankyrin repeats are common protein-protein interaction motifs. The fact
that our protein has this domain suggests that it might interact with other proteins
as part of its function. It is common for transcription factors to interact with other
proteins (regulatory proteins, other transcription factors or proteins of the
ribosome).
Dynamic programming matrix for Question 3 (BLOSUM62, gap penalty -8):

          -     S     T     G     C     M     L
    -     0    -8   -16   -24   -32   -40   -48
    T    -8     1    -3   -11   -19   -27   -35
    G   -16    -7    -1     3    -5   -13   -21
    E   -24   -15    -8    -3    -1    -7   -15
    K   -32   -23   -16   -10    -6    -2    -9