2008 Spring Biological database Homework 1

This problem set is due by 2PM, March 25, 2008. You shall upload your answers to your web site as instructed by your TA. For all questions, please make a reference such as screen-shot to indicate the source of your answer.

1. Here is a nucleotide sequence:

CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG

Please use database mining tools of your choice to tell me as much as you can about this sequence.

i. What gene does this sequence represent in human? What is its GI number? GenBank Accession number? Gene symbol? Unigene ID?
    (1) erythropoietin
    (2) 62240996
    (3) NM_000799
    (4) EPO
    (5) Hs.2303

ii. What database(s) did you search, and what tool(s) did you use in your search? What parameter settings did you use?
    i. Database: NCBI
    ii. Tool: BLAST
    iii. Parameter: nucleotide blast, and the database is Human genomic plus transcript

iii. Retrieve one ortholog of this gene’s complete mRNA sequence and Protein sequence in FASTA format. Compare the results obtained by blastn vs. blastp.
    i. mRNA sequence

>gi|54792749|ref|NM_001006646.1| Canis lupus familiaris erythropoietin (EPO), mRNA
ATGTGTGAACCTGCCCCTCCAAAACCCACACAGTCAGCCTGGCACTCTTTTCCAGAATGTCCTGCCCTGC
TCCTTTTGCTGTCTTTGCTGCTGCTTCCTCTGGGCCTCCCAGTCCTGGGCGCCCCCCCTCGCCTCATTTG
TGACAGCCGGGTCCTGGAGAGATACATCCTGGAGGCCAGGGAGGCCGAAAATGTCACGATGGGCTGTGCT
CAAGGCTGCAGCTTCAGTGAGAATATCACCGTCCCAGACACCAAGGTTAATTTCTATACCTGGAAGAGGA
TGGATGTTGGGCAGCAGGCCTTGGAAGTCTGGCAGGGCCTGGCACTGCTCTCAGAAGCCATCCTGCGGGG
TCAGGCCCTGTTGGCCAACGCCTCCCAGCCATCTGAGACTCCGCAGCTGCATGTGGACAAAGCCGTCAGC
AGCCTGCGCAGCCTCACCTCTCTGCTTCGGGCGCTGGGAGCCCAGAAGGAGGCCATGTCCCTTCCAGAGG
AAGCCTCTCCTGCTCCACTCCGAACATTCACTGTTGATACTTTGTGCAAACTTTTCCGAATCTACTCCAA
TTTCCTCCGTGGAAAGCTGACACTGTACACAGGGGAGGCCTGCAGAAGAGGAGACAGGTGACCAGGTGCT
CCCACCCCAGGCACATCCACCACCTCACTCACTACCACTGCCTGGGCCACGCCTCTGCACCACCACTCCT
GACCCCTGTCCAGGGGTGATCTGCTCAGCACCAGCCTGTCCCTGTCCCTTGGACACTCCACGGCCAGTGG
TGATATCTCAAGGGCCAGAGGAACTGTCCAGAGCTCAAATCAGATCTAAGGATGTCACAGTGCCAGCCTG
AGGCCCGAAGCAGGAGGAATTCGGAGGAAATCAGCTCAAACTTGGGGACAGAGCCTTGCTCGGGAGACTC
ACCTCGGTGCCCTGCCGAACAGTGATGCCAGGACAAGCTGGAGGGCAATTGCCGATTTTTTGCACCTATC
AGGGAGAGACAGGAGAGGCTAGAGAACTAGGTGGCAAGCCATAAATCTTTTAGGCTTCGGGTCTCCTATG
ACAGCAAGAGCCCACTGGCAAAGGGGGGGGAGCCATGGAGATGGGATAGGGGCTGGCCCAAAAAAAAAAA
AA

    Protein sequence

>gi|54792750|ref|NP_001006647.1| erythropoietin [Canis lupus familiaris]
MCEPAPPKPTQSAWHSFPECPALLLLLSLLLLPLGLPVLGAPPRLICDSRVLERYILEAREAENVTMGCA
QGCSFSENITVPDTKVNFYTWKRMDVGQQALEVWQGLALLSEAILRGQALLANASQPSETPQLHVDKAVS
SLRSLTSLLRALGAQKEAMSLPEEASPAPLRTFTVDTLCKLFRIYSNFLRGKLTLYTGEACRRGDR

    ii. blastn
        blastp

iv. Retrieve at least 5 homologenes of this gene. Perform a multiple sequence alignment? The human sequence is most similar to what organism?
    i.
    ii. Pan troglodytes. Score is 99.

v. Is the secondary structure of this protein known?
If so, how many “helical folds” are there in its 3D protein structure? How did you determine the exact amino acid number of each helical region?
    i. Yes.
    ii. 4
    iii. I searched PDB for erythropoietin; the entry page shows a picture of the structure.

vi. Is the function of this protein known? If so, what does it do?
    i. Yes. PFAM Accession PF00758, PFAM ID EPO_TPO
    ii. Cytokines are regulatory peptides that can be produced by various cells for communicating and orchestrating the large multicellular system. Cytokines are key mediators of hematopoiesis, immunity, allergy, inflammation, tissue remodelling, angiogenesis and embryonic development [2]. Superfamily includes both the long and short chain helical cytokines. This

vii. Which normal human tissues is this gene mainly expressed in? How did you determine this?
    i. plasma and regulates red cell production
    ii. Found in NCBI Entrez Gene.

viii. Is this protein involved in any biological pathway(s)? If so, what does the pathway do?
    i. Yes.
       Putative erythropoietin signaling pathway (part 2)
       Role of Akt in hypoxia induced HIF1 activation
       hsa04060 Cytokine-cytokine receptor interaction
       hsa04630 Jak-STAT signaling pathway
       hsa04640 Hematopoietic cell lineage

ix. Do any other databases contain information about the superfamily of this target gene product? Which superfamily? How did you find out?
    i. SUPERFAMILY
    ii. Erythropoietin (EPO) mimetic peptides
    iii. From classmate.

x. Look for publications relevant to the function(s) of this protein in the biomedical literature. Show one abstract of a relevant article.

    Abstract
    The solution structure of human erythropoietin (EPO) has been determined by nuclear magnetic resonance spectroscopy and the overall topology of the protein is revealed as a novel combination of features taken from both the long-chain and short-chain families of hematopoietic growth factors.
    Using the structure and data from mutagenesis studies we have elucidated the key physiochemical properties defining each of the two receptor binding sites on the EPO protein. A comparison of the NMR structure of the free EPO ligand to the receptor bound form, determined by X-ray crystallography, reveals conformational changes that may accompany receptor binding.

xi. Show the protein 3-D structure if there is any.

2. Find the zebra fish homolog of the above gene. And answer the following questions:

i. The zebra fish homolog is located on which chromosome? And in Human?
    i. Chromosome 7: 19.59m
    ii. Chromosome 7: 100.16m

ii. Perform a cDNA and Polypeptide sequence alignment between human and zebra fish of this gene.
    i. cDNA

>ENSDART00000077483 cdna:KNOWN_protein_coding
ATGTTTCACGGTTCAGGACTCTTTGCCTTACTGCTGATGGTGCTGGAGTGGACCCGTCCA
GGCCTGTCCTCCCCATTACGCCCCATCTGTGACCTGCGCGTCCTCGACCATTTCATTAAG
GAGGCATGGGATGCAGAGGCTGCTATGAGAACTTGTAAGGACGATTGCAGCATTGCAACG
AACGTCACTGTTCCTCTGACCAGAGTCGATTTTGAAGTCTGGGAAGCGATGAATATAGAG
GAGCAAGCTCAGGAGGTCCAGTCAGGCTTACACATGCTGAACGAGGCCATTGGCTCATTA
CAGATATCTAATCAGACTGAAGTGCTTCAGTCTCACATAGATGCCAGTATTAGAAACATC
GCCAGCATCAGACAAGTGCTGCGAAGTCTCAGCATACCGGAATATGTACCTCCAACCAGT
AGTGGAGAAGACAAGGAGACACAGAAAATATCCTCGATCTCAGAGCTGTTTCAGGTCCAT
GTCAACTTTCTTCGGGGAAAAGCGCGTCTGCTGCTCGCCAATGCACCTGTCTGTCGACAG
GGTGTCAGCTGA

    Polypeptide

>ENSDART00000077483 peptide:ENSDARP00000071950 pep:KNOWN_protein_coding
MFHGSGLFALLLMVLEWTRPGLSSPLRPICDLRVLDHFIKEAWDAEAAMRTCKDDCSIAT
NVTVPLTRVDFEVWEAMNIEEQAQEVQSGLHMLNEAIGSLQISNQTEVLQSHIDASIRNI
ASIRQVLRSLSIPEYVPPTSSGEDKETQKISSISELFQVHVNFLRGKARLLLANAPVCRQ
GVS

    ii. cDNA alignment
        Polypeptide alignment

iii. How many exons does this gene have in zebrafish? How did you determine this?
    i. 5
    ii. There is an “Exon info” link in the first image; clicking it shows the exon information.

iv. What is the expression pattern of this gene in zebrafish? In human? In mouse?
    i. This gene can be found on Chromosome 7 at location 19,589,899-19,605,421.
    ii. This gene can be found on Chromosome 7 at location 100,156,359-100,159,257.
    iii. This gene can be found on Chromosome 5 at location 137,923,490-137,974,470.
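
One quick sanity check on the zebrafish sequences listed above is to translate the cDNA and compare the result with the polypeptide. The sketch below is only an illustration, assuming the standard genetic code and that the cDNA starts in frame at the ATG start codon. The names CODON_TABLE and translate are hypothetical helpers written for this example, not part of any Ensembl or NCBI tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cdna: str) -> str:
    """Translate a cDNA string in frame 1, stopping at the first stop codon.

    Whitespace and line breaks are ignored; codons containing characters
    other than A, C, G, T are reported as 'X'.
    """
    seq = "".join(cdna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon: end of the coding sequence
            break
        protein.append(aa)
    return "".join(protein)


# First 60 nucleotides of the zebrafish cDNA (ENSDART00000077483) from above.
first_line = "ATGTTTCACGGTTCAGGACTCTTTGCCTTACTGCTGATGGTGCTGGAGTGGACCCGTCCA"
print(translate(first_line))  # MFHGSGLFALLLMVLEWTRP
```

Pasting the full cDNA into translate and comparing the output with the polypeptide record is a simple way to confirm that the two FASTA entries belong together before aligning them against the human sequences.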