2008 Spring Biological Database: Homework 1

This problem set is due by 2 PM, March 25, 2008. Upload your answers to your web site as instructed by your TA. For all questions, please include a reference, such as a screenshot, to indicate the source of your answer.

1. Here is a nucleotide sequence:

CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG

Please use database mining tools of your choice to tell me as much as you can about this sequence. What gene does this sequence represent in human? What is its GI number? GenBank accession number? Gene symbol? UniGene ID?

From the search result, we know that this is the erythropoietin gene.

Gene: erythropoietin
GI number: 62240996 (the gi field of the NM_000799.2 record)
GeneID: 2056
GenBank accession: NM_000799
Gene symbol: EPO
UniGene ID: Hs.2303 (UGID:131206)

i. What database(s) did you search, and what tool(s) did you use in your search? What parameter settings did you use?
BLAST, UniGene, GenBank, CoreNucleotide, Google

ii. Retrieve one ortholog of this gene’s complete mRNA sequence and protein sequence in FASTA format. Compare the results obtained by blastn vs. blastp.

mRNA sequence in FASTA format:

>gi|62240996|ref|NM_000799.2| Homo sapiens erythropoietin (EPO), mRNA
CCCGGAGCCGGACCGGGGCCACCGCGCCCGCTCTGCTCCGACACCGCGCCCCCTGGACAGCCGCCCTCTC
CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG
GCACTCCCTTGGTGGCAAGAGCCCCCTTGACACCGGGGTGGTGGGAACCATGAAGACAGGATGGGGGCTG
GCCTCTGGCTCTCATGGGGTCCAAGTTTTGTGTATTCTTCAACCTCATTGACAAGAACTGAAACCACCAA
AAAAAAAAAA

Protein sequence in FASTA format:

>gi|62240997|ref|NP_000790.2| erythropoietin precursor [Homo sapiens]
MGVHECPAWLWLLLSLLSLPLGLPVLGAPPRLICDSRVLERYLLEAKEAENITTGCAEHCSLNENITVPD
TKVNFYAWKRMEVGQQAVEVWQGLALLSEAVLRGQALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALG
AQKEAISPPDAASAAPLRTITADTFRKLFRVYSNFLRGKLKLYTGEACRTGDR

blastn: nucleotide BLAST
blastp: protein BLAST

iii. Retrieve at least 5 HomoloGene homologs of this gene. Perform a multiple sequence alignment. The human sequence is most similar to what organism?

The human sequence is most similar to chimpanzee: their similarity is 99.48 (n, nucleotide) and 99.48 (a, amino acid).

iv. Is the secondary structure of this protein known? If so, how many "helical folds" are there in its 3D protein structure? How did you determine the exact amino acid number of each helical region?

Yes. There are 4 helical folds, and the helical regions contain 18, 28, 20, and 24 amino acids respectively.

v. Is the function of this protein known? If so, what does it do?

EPO is used in treating anemia resulting from chronic kidney disease, from the treatment of cancer (chemotherapy and radiation), and from other critical illnesses (heart failure). Erythropoietin is available as a therapeutic agent produced by recombinant DNA technology in mammalian cell culture.

vi. Which normal human tissues is this gene mainly expressed in? How did you determine this?

According to the expression profile, this gene is mainly expressed in eye and prostate.

vii. Is this protein involved in any biological pathway(s)? If so, what does the pathway do?

Erythropoiesis is the process by which red blood cells (erythrocytes) are produced. In human adults, this usually occurs within the bone marrow. In humans with certain diseases and in some animals, erythropoiesis also occurs outside the bone marrow, within the spleen or liver; this is termed extramedullary erythropoiesis.

viii. Do any other databases contain information about the superfamily of this target gene product? Which superfamily? How did you find out?

The GeneCards database contains information about the superfamily of this gene.

ix. Look for publications relevant to the function(s) of this protein in the biomedical literature. Show one abstract of a relevant article.

TOPIC: Stat5 activation enables erythropoiesis in the absence of EpoR and Jak2.

x. Show the protein 3-D structure if there is any.

2. Find the zebrafish homolog of the above gene, and answer the following questions:

i. The zebrafish homolog is located on which chromosome? And in human?

Human chromosome: 7; location: 7q22
Zebrafish chromosome: 1

Perform a cDNA and polypeptide sequence alignment between human and zebrafish for this gene.

cDNA sequence alignment:
Polypeptide sequence alignment:

ii. How many exons does this gene have in zebrafish? How did you determine this?

Exons: 5
Transcript length: 1,825 bp
Translation length: 182 residues

iii. What is the expression pattern of this gene in zebrafish? In human? In mouse?

Zebrafish:
Human:
Mouse:
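The species comparisons in this homework (human against chimpanzee, human against zebrafish) rest on identity scores taken from an existing alignment. As a minimal sketch of how such a score can be computed, the Python function below counts matching columns in two already-aligned sequences and skips any column that contains a gap. The function name percent_identity and the toy sequences are hypothetical illustrations, not real EPO data, and alignment tools may define identity differently (for example, by counting gapped columns in the denominator).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' as the gap character (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the two residues are identical
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (made-up sequences): 5 ungapped columns, 4 of them identical.
toy_identity = percent_identity("ACGT-A", "ACCTTA")
print(f"{toy_identity:.2f}")
```

The same function works for nucleotide and protein alignments, since it only compares characters column by column.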