Question 1 (4 pts)
The goal of this question is to familiarize you with the relationship between amino acids and their corresponding DNA/RNA sequences, and with issues like the possibility of multiple codons for the same amino acid, the directionality of DNA/RNA sequences and the fact that translation occurs in units of 3 bases at a time.

a) (1 pt) Messenger RNA translation usually starts at an AUG codon. However, there are two alternative start codons that are sometimes used in prokaryotes – what are they, read 5'-3'?
Answer: GUG, UUG, CUG and AUU are all valid start codons in prokaryotes.

b) (1 pt) Do the three possible start codons code for different amino acids when they are used as start codons? What amino acid(s) do they encode?
Answer: No, they all code for Met when used as start codons.

c) (2 pts) How many possible mRNA sequences are there that could encode the amino acid sequence FYC and what are they, read in the 5'-3' direction?
Answer: phenylalanine - UUU, UUC; tyrosine - UAU, UAC; cysteine - UGU, UGC, so there are 2 x 2 x 2 = 8 possibilities:
UUU UAU UGU
UUU UAU UGC
UUU UAC UGU
UUU UAC UGC
UUC UAU UGU
UUC UAU UGC
UUC UAC UGU
UUC UAC UGC

Question 2 (6 pts)
You have isolated the messenger RNA product of your favorite gene (YFG). You reverse transcribe the start-to-stop codon portion of this mRNA to obtain cDNA and proceed to sequence it. The file reversecomp.fa, on the course website, contains this sequence in FASTA format. Write a python script that will open the file, read it and create variables with the following types, names and values (this is the “contract” for this problem set that has to be adhered to):

- (2 pts) MRNA: A string variable that contains the mRNA sequence that the DNA sequence in the file came from.
- (1 pt) SEQBIT: A string variable that contains the segment of the mRNA sequence corresponding to amino acids 10-12 of YFG.
  Answer: ACAGAGGAA
- (3 pts) FYPOSITION: An integer variable that contains the position of the first occurrence of an in-frame set of nucleotides in the mRNA sequence that encode the amino acid sequence FY (the position of the first nucleotide in the coding sequence is defined as 1).
  Answer: 1162

Name the script decode.py and submit it electronically on the course website (see cover page for instructions).

(Hint: You can partially debug whether you have the correct mRNA sequence by checking that it starts and ends with nucleotides that correspond to start and stop codons; also, remember that there are chemical differences between the nucleotides used in DNA vs RNA molecules.)

NOTES:
1. "Create variables with the following types, names and values" means that when your program completes, there should be a string variable called SEQBIT that is equal to the requested RNA sequence, an integer variable called FYPOSITION that is equal to the requested value, etc.
2. DO NOT DEVIATE FROM THE CONTRACT. Don't give the variables different names, don't change their casing (i.e. don't change them to "seqbit" or "fyposition") and don't give them types other than the ones specified. If you don't follow the contract, you will be penalized. No exceptions.

Question 3 (3 pts)
Construct a Dynamic Programming matrix to optimally align the following peptide sequences using the BLOSUM62 scoring matrix found on page 105 of Mount and a gap penalty of -8. The BLOSUM62 scoring matrix is also available on the course website for those without a textbook.

TGEK and STGCML

a) (1 pt) Use the last page included in this problem set to fill in the dynamic programming matrix and turn it in with your homework.
b) (1 pt) Circle your traceback path for generating the alignment.
c) (1 pt) Write out the final alignment below the grid.
Answer:
_TG_EK
STGCML
(see grid on last page)

Question 4 (10 pts)
You are a staff scientist at the world-renowned Center for Dinosaur Genetics.
You are currently working on describing the function of an interesting dinosaur protein X. You have been able to sequence the corresponding gene and know that the sequence of protein X is as follows (also posted on the class website in file proteinx.fa):

> protein x
MEPQSQAFPGSAGTALQYPPPAYPAAKGGFQVPMIPDYLFPQQQGDLGLGTPDQKPFQG
LESRTQQPSLTPLSTIKAFATQSGSQDLKALNTSYQSQLIKPSRMRKYPNRPSKTPPHE
RPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEK
PFACDICGRKFARSDERKRHTKIHLRQKDKKADKSVVASSATSSLSSYPSPVATSYPSP
VTTSYPSPATTSYPSPVPTSFSSPGSSTYPSPVHSGFPSPSVATTYSSVPPAFPAQVSS
FPSSAVTNSFSASTGLSDMTATFSPRTIEICNFQNALNLQQTPLHLAVITNQLVKIAEA
LLGAGCCYRMLRDFRGNTPLHLACEQGCAKVSVGVLTQADGRTMNKLRLMLKAENGRQW
ETRHFLSYISDRTYNIYICILIIASDFDAEDGHKSYGCLCVNTLLHRCIMQWERTPDLV
SLLL

You have little clue as to what protein X might actually do, so you decide to use BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to see what is known about proteins close to it in sequence.

a. (1 pt) What version of BLAST is best suited for performing this task?
Answer: protein-protein BLAST (BLASTP)

Using the appropriate program, BLAST the sequence above against the nr database (use default parameters).

b. (2 pts) What are the bit score, the E-value and the accession number of the top hit? Do you think this is a significant match? (Hint: you will need to click the “Format!” link after initially submitting your sequence to BLAST)
Answer: bit score 514, E-value 3*10^-144, accession number AAH73983. The match is certainly significant: given the database and the query, we expect to find only 3*10^-144 matches scoring this well by chance (in other words, this would “never” happen at random). Whether the match actually implies any functional similarity is a different question.

Let us call the top hit protein Y. Click on the “Gene info” link to the right of the listing for protein Y (marked with a blue letter “G”) and follow to the gene information page.

c. (1 pt) Judging from the summary and the references shown on the gene information page, what is the function of protein Y?
Answer: protein Y (early growth response 1 in Homo sapiens) is a transcription factor from the family of zinc finger transcription factors. The zinc finger domain is used to bind DNA and is therefore crucial to the function of the protein.

Scroll to the bottom of the page to get to the “NCBI Reference Sequences (RefSeq)” section and check for any conserved domains (to get more information about a conserved domain, if any, click on its name).

d. (2 pts) Name any conserved domains and their function. Based on what you now know about protein Y, is this domain crucial to its function?
Answer: there are two zinc finger domains present in this protein. One of the common functions of zinc fingers is to bind the major groove of DNA.

Back on the gene information page the locations (in sequence) of the conserved domains are listed along with their names. Compare these locations with the matching regions in the original BLAST alignment of protein X with protein Y (you will need to refer back to the BLAST results page).

e. (2 pts) Did protein X hit protein Y because of similarities in the conserved domain(s)? Can you propose a hypothetical function for protein X?
Answer: Yes, the aligned portion of the hit contains the regions of the zinc finger domains. Given this information, it is reasonable to propose that protein X is a zinc finger transcription factor.

Refer back to the first page returned by BLAST (before the formatted results page). On this page BLAST shows any significant matches between your sequence and the Conserved Domains Database (CDD). Click on the graphic showing the locations of matching domains. In the page that appears, you can mouse over the different domains identified in your sequence and read short descriptions in the text box above.
Based on work done in earlier sections of this problem, you are probably not surprised to see putative domains located close to the N-terminus. You notice, however, that there is another domain at the C-terminus. Click its name to go to the CDD and get more information.

f. (2 pts) Based on what you learned about this domain from the CDD, what else can you propose about the possible functions of protein X?
Answer: ankyrin repeats are common protein-protein interaction motifs. The fact that our protein has this domain suggests that it might interact with other proteins as part of its function. It is common for transcription factors to interact with other proteins (regulatory proteins, other transcription factors or components of the transcription machinery).

Dynamic programming matrix for Question 3 (TGEK down the rows, STGCML across the columns):

        -    S    T    G    C    M    L
   -    0   -8  -16  -24  -32  -40  -48
   T   -8    1   -3  -11  -19  -27  -35
   G  -16   -7   -1    3   -5  -13  -21
   E  -24  -15   -8   -3   -1   -7  -15
   K  -32  -23  -16  -10   -6   -2   -9
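
As a cross-check on the Question 3 grid, the matrix fill and traceback can be sketched in Python. This is only a minimal sketch, not the course's reference solution: the names `needleman_wunsch`, `score`, `COLS` and `SUBSET` are illustrative, and `SUBSET` holds just the BLOSUM62 entries for the residue pairs that occur in this problem, transcribed by hand. When several moves tie, the traceback here prefers the diagonal, then a gap in the row sequence; a different tie-breaking rule could yield a different but equally scoring alignment.

```python
GAP = -8  # linear gap penalty stated in Question 3

# Hand-transcribed BLOSUM62 scores, restricted to the pairs needed here:
# rows are residues of TGEK, columns follow the order in COLS.
COLS = "STGCML"
SUBSET = {
    "T": [1, 5, -2, -1, -1, -1],
    "G": [0, -2, 6, -3, -3, -4],
    "E": [0, -1, -2, -4, -2, -3],
    "K": [0, -1, -2, -3, -1, -2],
}


def score(a, b):
    """Substitution score for row residue a against column residue b."""
    return SUBSET[a][COLS.index(b)]


def needleman_wunsch(rows, cols, gap=GAP):
    """Global alignment; returns (matrix, aligned rows, aligned cols)."""
    n, m = len(rows), len(cols)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: leading gaps only.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Each cell is the best of a diagonal (match/mismatch), up, or left move.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(rows[i - 1], cols[j - 1]),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and F[i][j] == F[i - 1][j - 1] + score(rows[i - 1], cols[j - 1])):
            top.append(rows[i - 1])
            bottom.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:
            top.append("_")          # gap in the row sequence
            bottom.append(cols[j - 1])
            j -= 1
        else:
            top.append(rows[i - 1])
            bottom.append("_")       # gap in the column sequence
            i -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    matrix, top, bottom = needleman_wunsch("TGEK", "STGCML")
    for row in matrix:
        print(" ".join(f"{value:4d}" for value in row))
    print(top)
    print(bottom)
```

Run on TGEK and STGCML, this sketch ends with a bottom-right score of -9 and prints _TG_EK over STGCML, matching the grid and the alignment given in the answer.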