Jeffrey Tsai
2/17/07
Lab 3 – Bioinformatics Lab Notebook Expectations
FGFR3 Gene
1. Name of starting gene from different organism and its protein sequence.
Homo sapiens fibroblast growth factor receptor 3
(achondroplasia, thanatophoric dwarfism)
Protein Sequence:
TRALPEDAGEYTCLAGNSIGFSHHSAWLVVLPAEEELVEADEAGSVYAGILSYGVGFFLFILVVAAVTLCR
LRSPPKKGLGSPTVHKISRFPLKRQVSLESNASMSSNTPLVRIARLSSGEGPTLANVSELELPADPKWELS
RARLTLGKPLGEGCFGQVVMAEAIGIDKDRAAKPVTVAVKMLKDDATDKDLSDLVSEMEMMKMIGKHKNII
NLLGACTQGGPLYVLVEYAAKGNLREFLRARRPPGLDYSFDTCKPPEEQLTFKDLVSCAYQVARGMEYLAS
QKCIHRDLAARNVLVTEDNVMKIADFGLARDVHNLDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFG
VLLWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTHDLYMIMRECWHAAPSQRPTFKQLVEDLDRVL
TVTSTDEYLDLSAPFEQYSPGGQDTPSSSSSGDDSVFAHDLLPPAPPSSGGSRT
2. E-values and TTHERM numbers of top three Tetrahymena homologs.
TTHERM number      Description             E-value   Aliases
TTHERM_00522150    Protein kinase domain   5.1e-22   alias=65.m00148,T000017367,PreTt12...
TTHERM_01502010    Protein kinase domain   1.4e-19   alias=427.m00006,T000023223,PreTt1...
TTHERM_00578620    Protein kinase domain   5.8e-19   alias=78.m00127,T000011046,PreTt05...
Note: I chose TTHERM_01502010 as the homolog to GFP-tag.
3. Coding sequence and translated (protein) sequence of Tetrahymena homolog that you
will GFP-tag.
Coding Sequence of TTHERM_01502010
ATGAAGAATCAGGAATTTAATAACTCAGCTCAAAAATCATCAACCTACAAATTACCAATA
AAGAAAAATAGCATGGGTTCCACGATTGGATGTTGGATATCATCTGCCAAGCGCAAAAAT
TTTTTATTTACAAGGGAAATACCATTTTCGGAGGATACTCCAAATTAAATGAACAAGATT
GAAGGGAAGAAAACGATATTTAATAGAAAAACAACTATTTAGTAAAAACAAAATCAGCTT
TAATAGCAACAAGTATAGCAGCAGACTATTTAAAATTCCAATTAGGGTGCATCTTTGTCT
GTTTTGCAGTCAGGACAGGCTCTTCCAAAGCTTAAAATATAAAATGATAGTGTCAACAAT
GCAGCAAATTAAAATTTAGGTGAATCTTCGACAGCTGAAAAAGCTGTTGAAGAGCAAGAG
CGCAAAAAGCCTCAAGGCTTTGTAGAAAAGATTAGGCAACTCAAGAATGTAAATGCAAAA
TAAGTAGAAAACGATTAGTAGCAAAATATCAATGTGAACAACAGCAAAGTAAATAAACTA
CAATCACCAGCGACCACGATGGCTTTCAGGTCTTTGAAGCATAATGAAGATATCCAATTA
TATCTAAATAATTATGAAGACCATGAAGAGGAAGATGAAGAAGAAATTGAATATAAAATT
AATAGAGAGGATACTTAGTTTTCAGACGAATCAGCAAATAGTCCAAATAAATCAAATAAA
TTCCAGCTCTCTTCTTCAGATGCTAGCAATGGACTCCAATCAACAAATGGCAAATCTATA
AATTTATTGCAGCAGAAAAAAGAAAATTAAGCAAATACTTAGCAGTAAGAAGAATTAATC
TCTGTGAACAATAGCAATGCTAAAAATATAGATTTCTATAAAGATGAATCAAAAAATAAT
CAAGAAGAAAAAAAAACATTTAAAAACGACCAAGATCAAATCATTTAGACAAAAAGCATC
AGAGAGAGATTCTCCTTACCAGAAAAACCGAAAGATTCTCAAATACCTCCAATAAAAGAA
ATCAGCAGAGCTAATAGTCAAAAAGATAATTCCACTTCACAGTCACCTTCGATACTTGCT
CAAAGCGAGCTAGAAAATTCAAAAGCCGAAGCTTCTCTATAAAAAATTGTCAAATGGAAG
TCAGGCGATTTCATAGGTGCAGGAAGCTTTGGTCAAGTATTTACAGCCATGAACTGCAAC
ACTGGAGAAATATTTGTCGTCAAAAAGATCATGGTCCATGGTCAATCGAAGTTAGACAAA
GAATTTTTGGATGAATAAGAAAAAGAACTAAGGATCATGCAAACATTATCTCATAAACAC
ATAATTTAATATAAAGGACATGAGAGATAATAAGATTGCTTGTGTATATTTTTGGAATAC
ATGAGCGAAGGAAATATTGATTAGATGCTGAAGAAGTTTGGGCCTTTGGAGGAATAGACA
ATTAAGGTTTATGCTAGACAAATATTAAGTGGAATTCAATATCTCCACTCACAAAAAGTG
ATACATAAAGATATCAAAGGCGCTAATATACTTGTTGGATCAGATGGCATAGTTAAGTTA
TCTGATTTTGGCTGCGCTAAGTAATTAGAACTCACTTTAAATAGTAACAAAGAAATGAAT
AAAACACTAAAGGGCTCGGTTCCTTGGATGTCTCCTGAAATAGTAACCTAGACTAAATAT
GATACAAAGGCTGACATTTGGTCATTTGGTTGCACAATACTTGAAATGGCTCAAGCAGAA
GCTCCATGGTCAAACTATTAGTTTGACAATCCGATTGCAGCTATAATGAAAATAGGTCTT
AGCGATGAAATCCCACAAATCCCAGAAACAATCTCTCCTGACCTCAATCAGTTCATCAGA
AAGTGCTTGCAGAGAGATCCTTCTAAAAGGCCAACTGCAACTGAGCTTTTAAACGACTCT
TTCTTAGCTGAAAATTGA
AA Sequence Translation of TTHERM_01502010
MKNQEFNNSAQKSSTYKLPIKKNSMGSTIGCWISSAKRKNFLFTREIPFSEDTPNQMNKI
EGKKTIFNRKTTIQQKQNQLQQQQVQQQTIQNSNQGASLSVLQSGQALPKLKIQNDSVNN
AANQNLGESSTAEKAVEEQERKKPQGFVEKIRQLKNVNAKQVENDQQQNINVNNSKVNKL
QSPATTMAFRSLKHNEDIQLYLNNYEDHEEEDEEEIEYKINREDTQFSDESANSPNKSNK
FQLSSSDASNGLQSTNGKSINLLQQKKENQANTQQQEELISVNNSNAKNIDFYKDESKNN
QEEKKTFKNDQDQIIQTKSIRERFSLPEKPKDSQIPPIKEISRANSQKDNSTSQSPSILA
QSELENSKAEASLQKIVKWKSGDFIGAGSFGQVFTAMNCNTGEIFVVKKIMVHGQSKLDK
EFLDEQEKELRIMQTLSHKHIIQYKGHERQQDCLCIFLEYMSEGNIDQMLKKFGPLEEQT
IKVYARQILSGIQYLHSQKVIHKDIKGANILVGSDGIVKLSDFGCAKQLELTLNSNKEMN
KTLKGSVPWMSPEIVTQTKYDTKADIWSFGCTILEMAQAEAPWSNYQFDNPIAAIMKIGL
SDEIPQIPETISPDLNQFIRKCLQRDPSKRPTATELLNDSFLAEN*
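Tetrahymena thermophila uses the ciliate nuclear genetic code (NCBI translation table 6), in which TAA and TAG code for glutamine (Q) and only TGA is a stop codon. That is why the TAA codon in the third line of the coding sequence shows up as Q (residue 56) in the translation above instead of ending the protein. A minimal, dependency-free sketch of such a translator follows; the names `translate`, `STANDARD_CODE` and `CILIATE_CODE` are hypothetical and not taken from any tool used in this lab.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Ciliate nuclear code (NCBI table 6): TAA and TAG are read as glutamine (Q).
CILIATE_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q"}


def translate(cds: str, code: dict = CILIATE_CODE) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon.

    Line breaks are removed, incomplete trailing codons are ignored, and
    codons containing anything other than A/C/G/T become 'X'.
    """
    cds = cds.upper().replace("\n", "")
    return "".join(code.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# First line (60 nt) of the TTHERM_01502010 coding sequence.
print(translate("ATGAAGAATCAGGAATTTAATAACTCAGCTCAAAAATCATCAACCTACAAATTACCAATA"))
# → MKNQEFNNSAQKSSTYKLPI
```

Calling `translate(seq, STANDARD_CODE)` on the first three lines of the coding sequence instead puts a premature stop (`*`) at residue 56, so the standard code would give a truncated protein for this gene.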
4. Translations of the coding sequence in all 3 reading frames. Indicate which reading
frame you predict is correct. Is this the same as predicted by the Tetrahymena genome
database?
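>EMBOSS_001_1 (Predicted correct frame: it is the only frame without internal stop codons, and it matches the TTHERM_01502010 translation above, so it agrees with the genome database prediction.)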
MKNQEFNNSAQKSSTYKLPIKKNSMGSTIGCWISSAKRKNFLFTREIPFSEDTPNQMNKI
EGKKTIFNRKTTIQQKQNQLQQQQVQQQTIQNSNQGASLSVLQSGQALPKLKIQNDSVNN
AANQNLGESSTAEKAVEEQERKKPQGFVEKIRQLKNVNAKQVENDQQQNINVNNSKVNKL
QSPATTMAFRSLKHNEDIQLYLNNYEDHEEEDEEEIEYKINREDTQFSDESANSPNKSNK
FQLSSSDASNGLQSTNGKSINLLQQKKENQANTQQQEELISVNNSNAKNIDFYKDESKNN
QEEKKTFKNDQDQIIQTKSIRERFSLPEKPKDSQIPPIKEISRANSQKDNSTSQSPSILA
QSELENSKAEASLQKIVKWKSGDFIGAGSFGQVFTAMNCNTGEIFVVKKIMVHGQSKLDK
EFLDEQEKELRIMQTLSHKHIIQYKGHERQQDCLCIFLEYMSEGNIDQMLKKFGPLEEQT
IKVYARQILSGIQYLHSQKVIHKDIKGANILVGSDGIVKLSDFGCAKQLELTLNSNKEMN
KTLKGSVPWMSPEIVTQTKYDTKADIWSFGCTILEMAQAEAPWSNYQFDNPIAAIMKIGL
SDEIPQIPETISPDLNQFIRKCLQRDPSKRPTATELLNDSFLAEN*
>EMBOSS_001_2
*RIRNLITQLKNHQPTNYQQRKIAWVPRLDVGYHLPSAKIFYLQGKYHFRRILQIK*TRL
KGRKRYLIEKQLFSKNKISFNSNKYSSRLFKIPIRVHLCLFCSQDRLFQSLKYKMIVSTM
QQIKIQVNLRQLKKLLKSKSAKSLKALQKRLGNSRMQMQNKQKTISSKISM*TTAKQINY
NHQRPRWLSGL*SIMKISNYIQIIMKTMKRKMKKKLNIKLIERILSFQTNQQIVQINQIN
SSSLLQMLAMDSNQQMANLQIYCSRKKKIKQILSSKKNQSL*TIAMLKIQISIKMNQKII
KKKKKHLKTTKIKSFRQKASERDSPYQKNRKILKYLQQKKSAELIVKKIIPLHSHLRYLL
KASQKIQKPKLLYKKLSNGSQAISQVQEALVKYLQP*TATLEKYLSSKRSWSMVNRSQTK
NFWMNKKKNQGSCKHYLINTQFNIKDMRDNKIACVYFWNT*AKEILIRC*RSLGLWRNRQ
LRFMLDKYQVEFNISTHKK*YIKISKALIYLLDQMAQLSYLILAALSNQNSLQIVTKK*I
KHQRARFLGCLLKQQPRLNMIQRLTFGHLVAQYLKWLKQKLHGQTISLTIRLQLQ*KQVL
AMKSHKSQKQSLLTSISSSESACREILLKGQLQLSFQTTLSQLKI
>EMBOSS_001_3
EESGIQQLSSKIINLQITNKEKQHGFHDWMLDIICQAQKFFIYKGNTIFGGYSKLNEQD*
REENDIQQKNNYLVKTKSALIATSIAADYLKFQLGCIFVCFAVRTGSSKAQNIK*QCQQC
SKLKFR*IFDS*KSC*RARAQKASRLCRKDQATQECKCKISRKRLVAKYQCEQQQSKQTT
ITSDHDGFQVFEAQ*RYPIISKQL*RP*RGR*RRN*IQNQQRGYLVFRRISKQSKQIKQI
PALFFRCQQWTPINKWQIYKFIAAEKRKLSKYLAVRRINLCEQQQCQKYRFLQR*IKKQS
RRKKNIQKRPRSNHLDKKHQREILLTRKTERFSNTSNKRNQQSQQSKRQFHFTVTFDTCS
KRARKFKSRSFSIKNCQMEVRRFHRCRKLWSSIYSHELQHWRNICRQKDHGPWSIEVRQR
IFG*IRKRTKDHANIISQTHNLIQRT*EIIRLLVYIFGIHERRKY*LDAEEVWAFGGIDN
QGLCQTNIKWNSISPLTKSDTQRYQRRQYTCWIRWHSQVI*FWLRQVIRTHFKQQQRNEQ
NTKGLGSLDVS*NSNLDQI*YKG*HLVIWLHNT*NGSSRSSMVKLLV*QSDCSYNENRSQ
R*NPTNPRNNLS*PQSVHQKVLAERSFQKANCN*AFKRLFLS*KL
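The choice of frame 1 can be reproduced with a simple rule: translate each forward frame with the ciliate code and count stop codons before the end, since a genuine open reading frame should contain none. The sketch below assumes that simplified rule (the helper names are hypothetical) and uses only the first 240 nt of the coding sequence to stay short.

```python
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Ciliate code (NCBI table 6): standard code with TAA and TAG read as glutamine.
CODE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODE.update({"TAA": "Q", "TAG": "Q"})


def translate(seq: str) -> str:
    """Translate codon by codon from the first base; partial codons are dropped."""
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def internal_stops(protein: str) -> int:
    """Count stop codons ('*'), ignoring any at the very end."""
    return protein.rstrip("*").count("*")


def best_frame(seq: str) -> int:
    """Return the forward frame (1, 2 or 3) with the fewest internal stops.

    Ties go to the lower-numbered frame.
    """
    counts = {f: internal_stops(translate(seq[f - 1:])) for f in (1, 2, 3)}
    return min(counts, key=counts.get)


# First 240 nt of the TTHERM_01502010 coding sequence (first four lines above).
cds_start = (
    "ATGAAGAATCAGGAATTTAATAACTCAGCTCAAAAATCATCAACCTACAAATTACCAATA"
    "AAGAAAAATAGCATGGGTTCCACGATTGGATGTTGGATATCATCTGCCAAGCGCAAAAAT"
    "TTTTTATTTACAAGGGAAATACCATTTTCGGAGGATACTCCAAATTAAATGAACAAGATT"
    "GAAGGGAAGAAAACGATATTTAATAGAAAAACAACTATTTAGTAAAAACAAAATCAGCTT"
)
for frame in (1, 2, 3):
    print(frame, internal_stops(translate(cds_start[frame - 1:])))
print("best frame:", best_frame(cds_start))  # → best frame: 1
```

Frame 2 already starts with a TGA stop, and frame 3 hits one near codon 60, while frame 1 runs through without interruption.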
5. Answers to questions in part II (3) of exercise.




Record how many HDAC domains your protein has:
o There is one S_TKc domain (serine/threonine protein kinase catalytic domain).
Approximately how big is it?
o It is about 270 amino acids long.
What amino acid numbers does it include?
o The S_TKc domain spans roughly residues 375 to 645.
Is it near the N-terminus, C-terminus, or in the middle?
o The domain occupies roughly the C-terminal 40% of the protein: it begins past the
midpoint and extends to, or very close to, the C-terminus.
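The domain estimates can be checked with a little arithmetic. The sketch assumes a protein length of 645 residues, counted from the TTHERM_01502010 translation above (ten 60-residue lines plus 45 residues before the stop), and uses a hypothetical rule that labels a domain by which third of the protein its midpoint falls in; `describe_domain` is likewise a hypothetical helper.

```python
PROTEIN_LENGTH = 645  # residues in the TTHERM_01502010 translation above, stop excluded


def describe_domain(start: int, end: int, protein_length: int = PROTEIN_LENGTH) -> str:
    """Summarize an inclusive residue range: length, relative span and region.

    Hypothetical rule: the region is the third of the protein that contains
    the domain's midpoint.
    """
    length = end - start + 1
    midpoint = (start + end) / 2 / protein_length
    if midpoint < 1 / 3:
        region = "N-terminal"
    elif midpoint > 2 / 3:
        region = "C-terminal"
    else:
        region = "central"
    first, last = start / protein_length, end / protein_length
    return f"{length} aa, residues {start}-{end} ({first:.0%}-{last:.0%} of the protein), {region}"


print(describe_domain(375, 645))
# → 271 aa, residues 375-645 (58%-100% of the protein), C-terminal
```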
6. Top 3 most similar proteins from other organisms in part II (4).
Protein                                                           Organism                                 E-value
gi|116061057|emb|CAL56445.1| putative MAP3K alpha 1 protein k     Ostreococcus tauri                       2e-63
gi|116643236|gb|ABK06426.1| HA-tagged protein kinase domain o     synthetic construct                      3e-62
gi|116643230|gb|ABK06423.1| HA-tagged protein kinase domain o     synthetic construct                      3e-61
gi|46389856|dbj|BAD15457.1| putative MEK kinase [Oryza sativa (j  Oryza sativa (japonica cultivar-group)   5e-61
gi|45861623|gb|AAS78640.1| MAP3Ka [Lycopersicon esculentum]       Solanum lycopersicum                     1e-60
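Two of the five hits above are synthetic constructs rather than proteins from other organisms. A small sketch of filtering them out and keeping the three lowest E-values; the data are transcribed from the table above, and the helper `top_natural_hits` is hypothetical.

```python
# (accession, organism, E-value), transcribed from the BLAST hits above.
HITS = [
    ("CAL56445.1", "Ostreococcus tauri", 2e-63),
    ("ABK06426.1", "synthetic construct", 3e-62),
    ("ABK06423.1", "synthetic construct", 3e-61),
    ("BAD15457.1", "Oryza sativa (japonica cultivar-group)", 5e-61),
    ("AAS78640.1", "Solanum lycopersicum", 1e-60),
]


def top_natural_hits(hits, n=3):
    """Return the n hits with the lowest E-values, skipping synthetic constructs."""
    natural = [hit for hit in hits if hit[1] != "synthetic construct"]
    return sorted(natural, key=lambda hit: hit[2])[:n]


for accession, organism, evalue in top_natural_hits(HITS):
    print(accession, organism, evalue)
# → CAL56445.1 Ostreococcus tauri 2e-63
# → BAD15457.1 Oryza sativa (japonica cultivar-group) 5e-61
# → AAS78640.1 Solanum lycopersicum 1e-60
```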
7. The 3 reference citations obtained in part II (5)
Bonaventure, J., et al. Common Mutations in the Fibroblast Growth Factor Receptor 3 (FGFR 3) Gene Account for Achondroplasia, Hypochondroplasia, and Thanatophoric Dwarfism. American Journal of Medical Genetics. 63:148-154.
Santos, H.G., et al. Clinical Hypochondroplasia in a Family Caused by a Heterozygous Double Mutation in FGFR3 Encoding GLY380LYS. American Journal of Medical Genetics Part A. 143A:355-359.
Van Oers, J.M., Wild, P.J., et al. FGFR3 Mutations and a Normal CK20 Staining Pattern Define Low-Grade Noninvasive Urothelial Bladder Tumours. European Urology. Epub ahead of print (Jan 12, 2007).
8. Restriction map of the genomic Tetrahymena gene sequence.
Source: http://tools.neb.com/NEBcutter2/index.php
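In a cloning project, a restriction map is mainly used to check whether the enzymes used for cloning would also cut inside the gene. At its core that check is a string search for each enzyme's recognition sequence. The sketch below is only an illustration, not how NEBcutter works internally; it includes just BamHI (GGATCC) and XhoI (CTCGAG), two enzymes named on the pIGF vector map, and the helper `find_sites` and the test sequence are made up.

```python
# Recognition sequences for two enzymes named on the pIGF vector map.
# Both are palindromic, so scanning one strand finds every site.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return the 1-based start position of every recognition site, per enzyme."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


# Made-up test sequence, not part of TTHERM_01502010.
print(find_sites("AAGGATCCTTCTCGAGAAGGATCC"))
# → {'BamHI': [3, 19], 'XhoI': [11]}
```

Running `find_sites` on the genomic sequence would show whether BamHI or XhoI cut inside the gene; an empty list for an enzyme means no site was found.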
9. Primer Design, Tms, % G/C content, primer length. Highlight where these will anneal
on the coding strand (assuming it is double-stranded).
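(Each primer is shown directly above the strand it anneals to.)
Sequence Beginning:
5-AAGAATCAGGAATTTAATAACTC-3   (forward primer; identical to coding-strand nt 4-26, just after the ATG)
3-TTCTTAGTCCTTAAATTATTGAG-5   (template strand; the forward primer anneals here)
Sequence Ending:
5-CTCTTTCTTAGCTGAAAATTGA-3    (last 22 nt of the coding strand, ending with the TGA stop codon)
5-TCAATTTTCAGCTAAGAAAGAG-3    (reverse primer; reverse complement of the line above)
3-AGTTAAAAGTCGATTCTTTCTC-5    (coding strand written 3' to 5'; the reverse primer anneals here)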
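The strand relationships in the primer layout can be checked directly: the reverse primer is the reverse complement of the last 22 nt of the coding strand, and the forward primer matches the coding strand starting right after the ATG. A minimal sketch using only Python's built-in `str.maketrans` and `str.translate` (the helper name `reverse_complement` is hypothetical):

```python
# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


coding_end = "CTCTTTCTTAGCTGAAAATTGA"  # last 22 nt of the coding strand
print(reverse_complement(coding_end))  # → TCAATTTTCAGCTAAGAAAGAG (reverse primer)

forward = "AAGAATCAGGAATTTAATAACTC"
# Complement without reversing = template strand written 3' to 5'.
print(forward.translate(COMPLEMENT))  # → TTCTTAGTCCTTAAATTATTGAG

cds_first_line = "ATGAAGAATCAGGAATTTAATAACTCAGCTCAAAAATCATCAACCTACAAATTACCAATA"
print(cds_first_line.find(forward) + 1)  # → 4 (forward primer starts at coding nt 4)
```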
Primers:
Beginning:
5-AAGAATCAGGAATTTAATAACTC-3
o Tm = 46 °C
o % G/C = 26
o Primer Length = 23
Calculation of Estimated Tm (Wallace rule):
Tm = 4(G+C) + 2(A+T)
= 4(6) + 2(17)
= 24 + 34
Tm = 58 °C
Ending:
5-TCAATTTTCAGCTAAGAAAGAG-3
o Tm = 47 °C
o % G/C = 32
o Primer Length = 22
Calculation of Estimated Tm (Wallace rule):
Tm = 4(G+C) + 2(A+T)
= 4(7) + 2(15)
= 28 + 30
Tm = 58 °C
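Each primer has two Tm values because two different formulas are involved. The 58 °C estimates come from the Wallace rule worked above, which is intended for short oligonucleotides and tends to run high for primers of 22 to 23 nt. The listed 46 °C and 47 °C values agree with a common GC-content formula, Tm = 64.9 + 41(G+C - 16.4)/N, where N is the primer length. The sketch below computes both, along with % G/C; the function names are hypothetical, and it assumes primers contain only A, C, G and T.

```python
def gc_count(primer: str) -> int:
    """Number of G and C bases in the primer."""
    primer = primer.upper()
    return primer.count("G") + primer.count("C")


def gc_percent(primer: str) -> float:
    return 100 * gc_count(primer) / len(primer)


def tm_wallace(primer: str) -> int:
    """Wallace rule: 4 °C per G/C plus 2 °C per A/T (meant for short oligos)."""
    gc = gc_count(primer)
    return 4 * gc + 2 * (len(primer) - gc)  # remaining bases are A or T


def tm_basic(primer: str) -> float:
    """GC-content formula: Tm = 64.9 + 41 * (G + C - 16.4) / N."""
    return 64.9 + 41 * (gc_count(primer) - 16.4) / len(primer)


for name, primer in [("Beginning", "AAGAATCAGGAATTTAATAACTC"),
                     ("Ending", "TCAATTTTCAGCTAAGAAAGAG")]:
    print(name, len(primer), round(gc_percent(primer)),
          tm_wallace(primer), round(tm_basic(primer)))
# → Beginning 23 26 58 46
# → Ending 22 32 58 47
```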
10. Map of cloning vector.
[Figure: cloning vector map of pIGF and pIGF-GTW. Labeled features: Gene X, Gateway, BamHI, XhoI, PspOMI, MTT promoter, GFP, Mtt, pmr-rRNA, 2.5 kbp, ampicillin resistance gene.]