* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download BIO520 Final Exam 5/07 Jim Lund You may use any books, notes
Transcriptional regulation wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Non-coding DNA wikipedia , lookup
Gene regulatory network wikipedia , lookup
List of types of proteins wikipedia , lookup
Genome evolution wikipedia , lookup
Magnesium transporter wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Gene expression profiling wikipedia , lookup
Interactome wikipedia , lookup
Protein adsorption wikipedia , lookup
Point mutation wikipedia , lookup
Western blot wikipedia , lookup
Community fingerprinting wikipedia , lookup
Protein moonlighting wikipedia , lookup
Gene expression wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Protein structure prediction wikipedia , lookup
Protein–protein interaction wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Molecular evolution wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Proteolysis wikipedia , lookup
Homology modeling wikipedia , lookup
BIO520 Final Exam 5/07 Jim Lund You may use any books, notes, web pages, software programs, or related materials to complete this exam. You MAY NOT consult with any person regarding the exam’s intellectual content. Please email this lab to Lin Xue at both the email addresses: [email protected] [email protected] 1a (1 pt). Find the RefSeq protein entry for human CYP3A4, a cytochrome P450. Give the RefSeq accession number. 1b (1pt). Where is this gene located in the human genome? Give the chromosome and approximate location in Mb. Example: Ch2, 13.4 Mb. 1c (2pt). What genes are adjacent to CYP3A4 on the chromosome? 2a (1pt). Examine the PDB entry 1W0E for CYP3A4. What is the primary secondary structure found in this protein? 2b (1pt). What method was used to determine this structure, and what is its resolution? 2c (1pt). What is the large molecule in the center of the protein? 3. Human sirtuin 1 protein was BLASTed against the nr database. A one BLAST match is shown below. 3a (1 pt). Examine the HSP shown below. What do the ‘---‘ characters in the query sequence at aa 220-237 indicate? 3b (1 pt). Examine the HSP shown below. What is the difference between ‘Identities’ and ‘Positives’? 3c (1 pt). Examine the HSP shown below. Does the E-value for this HSP indicate this is a excellent, borderline, or insignificant match? 3d (1 pt). Examine the HSP shown below. Would you expect a BLASTN comparison of the DNA sequences for these two proteins to show higher or lower percent identical bps? >ref|XP_635962.1| Gene info NAD(+)-dependent deacetylase, silent information regulator protein (Sir2) family protein [Dictyostelium discoideum AX4] Length=542 Score = 277 bits (708), Expect = 2e-72, Method: Composition-based stats. Identities = 149/333 (44%), Positives = 211/333 (63%), Gaps = 31/333 (9%) Query 185 Sbjct 215 Query 232 Sbjct 271 YTFVQQHLMIGTDPRTILKDLLPETIPPPELD-DMTLWQIVINILSEP-----------Y +Q+ +G DP KD+ + EL+ D W+I+ L+ YKHIQEKKSLGIDPIEFTKDIGFKL----ELEKDDDAWEIITAFLTRKKVAVNLFLNYLK ------PKRKK--RKDINTIEDAVKLLQECKKIIVLTGAGVSVSCGIPDFRSRDGIYARL P RKK D++T E +L + K I+++TGAGVSVSCGIPDFRS+ G+Y + YNTLARPYRKKIATLDLSTFEKVCQLFESSKNIVIITGAGVSVSCGIPDFRSKGGVYETI 231 270 283 330 Query 284 Sbjct 331 Query 344 Sbjct 390 Query 402 Sbjct 448 Query 462 Sbjct 508 AVDFPDLPDPQAMFDIEYFRKDPRPFFKFAKEIYPGQFQPSLCHKFIALSDKEGKLLRNY + +LP P+++FDI Y R +P PFF+FAKEI+PG +PS H FI L D++GKLLRNY EKKY-NLPRPESLFDIHYLRANPLPFFEFAKEIFPGNHKPSPTHSFIKLLDEKGKLLRNY 343 TQNIDTLEQVAGIQR--IIQCHGSFATASCLICKYKVDCEAVRGDIFNQVVPRCPRCPAD TQNIDTLE VAGI R ++ CHGSF+TA+C+ CK VD +R I +P C +C + TQNIDTLEHVAGIDREKLVNCHGSFSTATCITCKLTVDGTTIRDTIMKMEIPLCQQC--N 401 EPLAIMKPEIVFFGENLPEQFHRAMKYDKDEVDLLIVIGSSLKVRPVALIPSSIPHEVPQ + + MKP+IVFFGENLP++F + + D ++DLLIV+GSSL+V+PV+L+P + ++PQ DGQSFMKPDIVFFGENLPDRFDQCVLKDVKDIDLLIVMGSSLQVQPVSLLPDIVDKQIPQ 461 ILINREPLPHLH-FDVELLGDCDVIINELCHRL ILINRE + H FD LGDCD + +L +++ ILINRELVAQPHEFDYVYLGDCDQFVQDLLNKV 389 447 507 493 540 4 (2pt). The best programs for finding genes in eukaryotic genomic DNA sequence make use of EST and protein database searches and alignment of matching database sequences to the genomic DNA to find and verify genes. If this approach is so important and useful, why are de novo gene finding criteria still necessary? 5 (2pt). The rhesus monkey, Macaca mulatta, genome sequence was recently published. You have the sequence for a human gene you are studying and are eager to find the rhesus genomic sequence so you can compare the human and monkey promoter sequences. You cannot find the rhesus sequence in the nr database. Which BLAST program/database search will allow you to find the genomic DNA for your gene? 6. Indicate one thing that can be learned from comparative sequence analysis in the three cases below that is specific to that case (no answers that would fit more than one of the questions). 6a (1pt). Species that diverged a billion years ago (1e9 years) such as yeast-tomato. 6b (1pt). Species that diverged a 100 million years ago (1e8 years) such as mouse-human. 6c (1pt). Species that diverged a 10 million years ago (1e7 years) such as human-chimpanzee. 7 (1pt). Why are two samples co-hybridized to spotted microarrays while a single sample is hybridized to Affymetrix microarrays? 8 (2pt). Aside from its sequence what other information describing a SNP is the most important and useful to know? 9 (2pt). You profile human adrenal tumor samples on a microarray and find that human estrogen receptor 1 (ESR1) is up-regulated in the tumor samples compared to the control samples. You want to find out if any genes known to bind ESR1 are also up-regulated. How would you find a complete and reliable list of proteins known to bind ESR1? To start with, an IntAct search indicates that human ESR1 interacts with 14 proteins. What would you do to expand or refine this list of proteins to arrive at your final list? 10 (1 pt). When building a phylogenetic tree a bootstrap analysis is often performed. What is learned from this type of analysis? 11 (2 pt). How does threading differ from homology modeling? You can describe either how the methods differ or the different circumstances in which they get used. 12 (2 pt). Examine the TMHMM predictions for D. melanogaster bride of sevenless (boss). How many membrane spanning domains predicted to be in this protein? For the N and C termini indicate the predicted location (cytoplasmic or extracellular). 13 (1 pt). Mass spectrometry (MS) or paired MS-MS analysis on a protein sample generates spectra or peptide mass fingerprints. Analysis of this data to identify the peptide fragments and the proteins they come from typically requires a database of proteins from the organism. Why is this database of proteins required? 14 (5 pts). Match each left column database/program with a single application in the right column. There are extra entries in the right column. Indicate your answer as (A8, B10, C4, etc). A. BLAST B. PHRED/PHRAP C. GeneMark D. PSORT E. Cluster F. ReadSeq G. OMIM H. VISTA 1. Protein threading. 2. Catalog of human genes and genetic disorders. 3. Transcription factor binding sites. 4. Order genes and arrays by a similarity measure. 5. DNA sequencing and assembly. 6. Protein structure database. I. Interpro 7. Phylogenetics trees construction and analysis. J. PDB 8. Genome database. K. CLUSTAL 9. Search a sequence database. L. MFOLD 10. Protein subcellular location. M. GENSCAN 11. Multiple sequence alignment. 12. NCBI protein structure database. 13. RNA structure database. 14. Eukaryotic de novo gene finding. 15. Sequence format conversion. 18. Protein domain database. 16. Find secondary structures for RNA. 19. Protein structure viewer. 17. Prokaryotic de novo gene finding. 20. Comparative sequence analysis and visualization.