2008 Spring Biological database Homework 1
This problem set is due by 2 PM, March 25, 2008. Upload your answers to your
web site as instructed by your TA. For every question, include a reference, such as a
screenshot, to indicate the source of your answer.
1. Here is a nucleotide sequence:
CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG
Please use database mining tools of your choice to tell me as much as you can
about this sequence.
i.
What gene does this sequence represent in human? What is its GI number?
GenBank Accession number? Gene symbol? Unigene ID?
(1) erythropoietin
(2) 62240996
(3) NM_000799
(4) EPO
(5) Hs.2303
ii.
What database(s) did you search, and what tool(s) did you use in your search?
What parameter settings did you use?
i.
Database: NCBI
ii.
Tool: BLAST
iii.
Parameters: nucleotide BLAST; database: Human genomic plus transcript
iii.
Retrieve one ortholog of this gene’s complete mRNA sequence and Protein
sequence in FASTA format. Compare the results obtained by blastn vs.
blastp.
i.
mRNA sequence
>gi|54792749|ref|NM_001006646.1| Canis lupus familiaris erythropoietin (EPO), mRNA
ATGTGTGAACCTGCCCCTCCAAAACCCACACAGTCAGCCTGGCACTCTTTTCCAGAATGTCCTGCCCTGC
TCCTTTTGCTGTCTTTGCTGCTGCTTCCTCTGGGCCTCCCAGTCCTGGGCGCCCCCCCTCGCCTCATTTG
TGACAGCCGGGTCCTGGAGAGATACATCCTGGAGGCCAGGGAGGCCGAAAATGTCACGATGGGCTGTGCT
CAAGGCTGCAGCTTCAGTGAGAATATCACCGTCCCAGACACCAAGGTTAATTTCTATACCTGGAAGAGGA
TGGATGTTGGGCAGCAGGCCTTGGAAGTCTGGCAGGGCCTGGCACTGCTCTCAGAAGCCATCCTGCGGGG
TCAGGCCCTGTTGGCCAACGCCTCCCAGCCATCTGAGACTCCGCAGCTGCATGTGGACAAAGCCGTCAGC
AGCCTGCGCAGCCTCACCTCTCTGCTTCGGGCGCTGGGAGCCCAGAAGGAGGCCATGTCCCTTCCAGAGG
AAGCCTCTCCTGCTCCACTCCGAACATTCACTGTTGATACTTTGTGCAAACTTTTCCGAATCTACTCCAA
TTTCCTCCGTGGAAAGCTGACACTGTACACAGGGGAGGCCTGCAGAAGAGGAGACAGGTGACCAGGTGCT
CCCACCCCAGGCACATCCACCACCTCACTCACTACCACTGCCTGGGCCACGCCTCTGCACCACCACTCCT
GACCCCTGTCCAGGGGTGATCTGCTCAGCACCAGCCTGTCCCTGTCCCTTGGACACTCCACGGCCAGTGG
TGATATCTCAAGGGCCAGAGGAACTGTCCAGAGCTCAAATCAGATCTAAGGATGTCACAGTGCCAGCCTG
AGGCCCGAAGCAGGAGGAATTCGGAGGAAATCAGCTCAAACTTGGGGACAGAGCCTTGCTCGGGAGACTC
ACCTCGGTGCCCTGCCGAACAGTGATGCCAGGACAAGCTGGAGGGCAATTGCCGATTTTTTGCACCTATC
AGGGAGAGACAGGAGAGGCTAGAGAACTAGGTGGCAAGCCATAAATCTTTTAGGCTTCGGGTCTCCTATG
ACAGCAAGAGCCCACTGGCAAAGGGGGGGGAGCCATGGAGATGGGATAGGGGCTGGCCCAAAAAAAAAAA
AA
Protein sequence
>gi|54792750|ref|NP_001006647.1| erythropoietin [Canis lupus familiaris]
MCEPAPPKPTQSAWHSFPECPALLLLLSLLLLPLGLPVLGAPPRLICDSRVLERYILEAREAENVTMGCA
QGCSFSENITVPDTKVNFYTWKRMDVGQQALEVWQGLALLSEAILRGQALLANASQPSETPQLHVDKAVS
SLRSLTSLLRALGAQKEAMSLPEEASPAPLRTFTVDTLCKLFRIYSNFLRGKLTLYTGEACRRGDR
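The FASTA records above carry the GI number and accession in their header lines. As a minimal sketch of how those identifiers could be pulled out programmatically, the following Python reads FASTA text and splits an NCBI-style `gi|...|ref|...|` header. The function names `read_fasta` and `parse_defline` and the regular expression are illustrative assumptions, not part of any NCBI tool, and the pattern only covers the header layout shown in this answer.

```python
import re
from typing import Dict, Iterator, Tuple

# Assumed header layout, as seen above: >gi|<GI>|<db>|<accession>.<version>| <description>
DEFLINE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>\w+)\|(?P<acc>[A-Z_]+\d+)\.(?P<ver>\d+)\|\s*(?P<desc>.*)$"
)


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_defline(header: str) -> Dict[str, str]:
    """Split a gi-style header into GI, database tag, accession, version, description."""
    match = DEFLINE.match(header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Header copied from the protein record above; sequence shortened for the demo.
    demo = (
        ">gi|54792750|ref|NP_001006647.1| erythropoietin [Canis lupus familiaris]\n"
        "MCEPAPPKPTQSAW\n"
    )
    for header, seq in read_fasta(demo):
        print(parse_defline(header), len(seq))
```

The same parser applies to the mRNA record, whose header uses the `NM_` accession prefix.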
ii.
blastn
blastp
iv.
Retrieve at least 5 homologenes of this gene. Perform a multiple sequence
alignment. The human sequence is most similar to that of which organism?
i.
ii.
Pan troglodytes.
Score is 99.
v.
Is the secondary structure of this protein known? If so, how many “helical
folds” are there in its 3D protein structure? How did you determine the exact
amino acid number of each helical region?
i.
Yes.
ii.
4
iii.
I used PDB to search for erythropoietin; the website then
shows a picture of the structure.
vi.
Is the function of this protein known? If so, what does it do?
i.
Yes.
PFAM Accession: PF00758
PFAM ID: EPO_TPO
ii.
Cytokines are regulatory peptides that can be produced by various cells for
communicating and orchestrating the large multicellular system. Cytokines
are key mediators of hematopoiesis, immunity, allergy, inflammation, tissue
remodelling, angiogenesis, and embryonic development [2].
This superfamily includes both the long and short chain helical cytokines.
vii.
Which normal human tissues is this gene mainly expressed in? How did you
determine this?
i.
Found in plasma; it regulates red cell production.
ii.
Found in NCBI Entrez Gene.
viii.
Is this protein involved in any biological pathway(s)? If so, what does the
pathway do?
i.
Yes.
Putative erythropoietin signaling pathway (part 2)
Role of Akt in hypoxia induced HIF1 activation
hsa04060 Cytokine-cytokine receptor interaction
hsa04630 Jak-STAT signaling pathway
hsa04640 Hematopoietic cell lineage
ix.
Do any other databases contain information about the superfamily of this
target gene product? Which superfamily? How did you find out?
i.
SUPERFAMILY
ii.
Erythropoietin (EPO) mimetic peptides
iii.
From a classmate.
x.
Look for publications relevant to the function(s) of this protein in the
biomedical literature. Show one abstract of a relevant article.
Abstract
The solution structure of human erythropoietin (EPO) has been determined by
nuclear magnetic resonance spectroscopy and the overall topology of the protein
is revealed as a novel combination of features taken from both the long-chain
and short-chain families of hematopoietic growth factors. Using the structure and
data from mutagenesis studies we have elucidated the key physiochemical
properties defining each of the two receptor binding sites on the EPO protein. A
comparison of the NMR structure of the free EPO ligand to the receptor bound
form, determined by X-ray crystallography, reveals conformational changes that
may accompany receptor binding.
xi.
Show the protein 3-D structure if there is any.
2. Find the zebrafish homolog of the above gene and answer the following
questions:
i.
The zebrafish homolog is located on which chromosome? And in human?
i.
Zebrafish: Chromosome 7: 19.59m
ii.
Human: Chromosome 7: 100.16m
ii.
Perform a cDNA and polypeptide sequence alignment between human and
zebrafish for this gene.
i.
cDNA
>ENSDART00000077483 cdna:KNOWN_protein_coding
ATGTTTCACGGTTCAGGACTCTTTGCCTTACTGCTGATGGTGCTGGAGTGGACCCGTCCA
GGCCTGTCCTCCCCATTACGCCCCATCTGTGACCTGCGCGTCCTCGACCATTTCATTAAG
GAGGCATGGGATGCAGAGGCTGCTATGAGAACTTGTAAGGACGATTGCAGCATTGCAACG
AACGTCACTGTTCCTCTGACCAGAGTCGATTTTGAAGTCTGGGAAGCGATGAATATAGAG
GAGCAAGCTCAGGAGGTCCAGTCAGGCTTACACATGCTGAACGAGGCCATTGGCTCATTA
CAGATATCTAATCAGACTGAAGTGCTTCAGTCTCACATAGATGCCAGTATTAGAAACATC
GCCAGCATCAGACAAGTGCTGCGAAGTCTCAGCATACCGGAATATGTACCTCCAACCAGT
AGTGGAGAAGACAAGGAGACACAGAAAATATCCTCGATCTCAGAGCTGTTTCAGGTCCAT
GTCAACTTTCTTCGGGGAAAAGCGCGTCTGCTGCTCGCCAATGCACCTGTCTGTCGACAG
GGTGTCAGCTGA
Polypeptide
>ENSDART00000077483 peptide:ENSDARP00000071950 pep:KNOWN_protein_coding
MFHGSGLFALLLMVLEWTRPGLSSPLRPICDLRVLDHFIKEAWDAEAAMRTCKDDCSIAT
NVTVPLTRVDFEVWEAMNIEEQAQEVQSGLHMLNEAIGSLQISNQTEVLQSHIDASIRNI
ASIRQVLRSLSIPEYVPPTSSGEDKETQKISSISELFQVHVNFLRGKARLLLANAPVCRQ
GVS
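As a consistency check on the two records above, the cDNA can be translated with the standard genetic code and compared with the listed polypeptide. The sketch below is a minimal illustration; the names `CODON_TABLE` and `translate` are hypothetical helpers, and the code assumes the reading frame starts at the first base, as it does for this coding sequence.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering
# (third codon position varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1; unknown codons become 'X'."""
    cdna = cdna.upper().replace("U", "T")
    protein = []
    # Ignore a trailing partial codon, if any.
    for i in range(0, len(cdna) - len(cdna) % 3, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases and last 12 bases of the zebrafish cDNA listed above.
    print(translate("ATGTTTCACGGTTCAGGACTCTTTGCCTTA"))  # MFHGSGLFAL
    print(translate("GGTGTCAGCTGA"))                    # GVS
```

Both fragments agree with the start (`MFHGSGLFAL`) and end (`GVS`) of the polypeptide record.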
ii.
cDNA alignment
Polypeptide alignment
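The tool used for these alignments is not recorded here. As a rough illustration of what a global pairwise alignment does, the following is a minimal Needleman-Wunsch sketch with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions rather than a substitution matrix such as BLOSUM62, and the function names are hypothetical. Because the human protein sequence is not reproduced in this document, the demo uses short fragments of the dog and zebrafish proteins listed earlier.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    dog_fragment = "PPRLICDSRVLERYILEA"        # from the dog protein above
    zebrafish_fragment = "PLRPICDLRVLDHFIKEA"  # from the zebrafish polypeptide
    top, bottom, total = needleman_wunsch(dog_fragment, zebrafish_fragment)
    print(top)
    print(bottom)
    print(total, round(percent_identity(top, bottom), 1))
```

For full-length sequences a dedicated aligner with a proper substitution matrix and affine gap penalties would be the usual choice; this sketch only shows the dynamic-programming idea.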
iii.
How many exons does this gene have in zebrafish? How did you determine
this?
i.
5
ii.
In the first image there is an “Exon info” link; clicking it shows the
exon information.
iv.
What is the expression pattern of this gene in zebrafish? In human? In mouse?
i.
This gene can be found on Chromosome 7 at location 19,589,899-19,605,421.
ii.
This gene can be found on Chromosome 7 at location 100,156,359-100,159,257.
iii.
This gene can be found on Chromosome 5 at location 137,923,490-137,974,470.
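The three locations above can be compared by the length of the genomic span each one covers. The short sketch below computes that length, assuming the coordinates are 1-based and inclusive at both ends; the helper name `span_length` is hypothetical.

```python
def span_length(location: str) -> int:
    """Length in bp of a 'start-end' range with comma-grouped digits (inclusive)."""
    start_text, end_text = location.replace(",", "").split("-")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"end precedes start in {location!r}")
    return end - start + 1


if __name__ == "__main__":
    # Ranges copied from answers i to iii above.
    for location in ("19,589,899-19,605,421",
                     "100,156,359-100,159,257",
                     "137,923,490-137,974,470"):
        print(location, span_length(location))
```

Under that assumption the listed ranges span 15,523 bp, 2,899 bp, and 50,981 bp respectively.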