Jeffrey Tsai
2/17/07
Lab 3 – Bioinformatics Lab Notebook Expectations
FGFR3 Gene

1. Name of starting gene from different organism and its protein sequence.

Homo sapiens fibroblast growth factor receptor 3 (achondroplasia, thanatophoric dwarfism)

Protein Sequence:
TRALPEDAGEYTCLAGNSIGFSHHSAWLVVLPAEEELVEADEAGSVYAGILSYGVGFFLFILVVAAVTLCR
LRSPPKKGLGSPTVHKISRFPLKRQVSLESNASMSSNTPLVRIARLSSGEGPTLANVSELELPADPKWELS
RARLTLGKPLGEGCFGQVVMAEAIGIDKDRAAKPVTVAVKMLKDDATDKDLSDLVSEMEMMKMIGKHKNII
NLLGACTQGGPLYVLVEYAAKGNLREFLRARRPPGLDYSFDTCKPPEEQLTFKDLVSCAYQVARGMEYLAS
QKCIHRDLAARNVLVTEDNVMKIADFGLARDVHNLDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFG
VLLWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTHDLYMIMRECWHAAPSQRPTFKQLVEDLDRVL
TVTSTDEYLDLSAPFEQYSPGGQDTPSSSSSGDDSVFAHDLLPPAPPSSGGSRT

2. E-values and TTHERM numbers of top three Tetrahymena homologs.

TTHERM            E-value   Description
TTHERM_00522150   5.1e-22   protein kinase domain; alias=65.m00148,T000017367,PreTt12...
TTHERM_01502010   1.4e-19   Protein kinase domain; alias=427.m00006,T000023223,PreTt1...
TTHERM_00578620   5.8e-19   Protein kinase domain; alias=78.m00127,T000011046,PreTt05...

Note: Chose TTHERM_01502010

3. Coding sequence and translated (protein) sequence of Tetrahymena homolog that you will GFP-tag.

Coding Sequence of TTHERM_01502010
ATGAAGAATCAGGAATTTAATAACTCAGCTCAAAAATCATCAACCTACAAATTACCAATA
AAGAAAAATAGCATGGGTTCCACGATTGGATGTTGGATATCATCTGCCAAGCGCAAAAAT
TTTTTATTTACAAGGGAAATACCATTTTCGGAGGATACTCCAAATTAAATGAACAAGATT
GAAGGGAAGAAAACGATATTTAATAGAAAAACAACTATTTAGTAAAAACAAAATCAGCTT
TAATAGCAACAAGTATAGCAGCAGACTATTTAAAATTCCAATTAGGGTGCATCTTTGTCT
GTTTTGCAGTCAGGACAGGCTCTTCCAAAGCTTAAAATATAAAATGATAGTGTCAACAAT
GCAGCAAATTAAAATTTAGGTGAATCTTCGACAGCTGAAAAAGCTGTTGAAGAGCAAGAG
CGCAAAAAGCCTCAAGGCTTTGTAGAAAAGATTAGGCAACTCAAGAATGTAAATGCAAAA
TAAGTAGAAAACGATTAGTAGCAAAATATCAATGTGAACAACAGCAAAGTAAATAAACTA
CAATCACCAGCGACCACGATGGCTTTCAGGTCTTTGAAGCATAATGAAGATATCCAATTA
TATCTAAATAATTATGAAGACCATGAAGAGGAAGATGAAGAAGAAATTGAATATAAAATT
AATAGAGAGGATACTTAGTTTTCAGACGAATCAGCAAATAGTCCAAATAAATCAAATAAA
TTCCAGCTCTCTTCTTCAGATGCTAGCAATGGACTCCAATCAACAAATGGCAAATCTATA
AATTTATTGCAGCAGAAAAAAGAAAATTAAGCAAATACTTAGCAGTAAGAAGAATTAATC
TCTGTGAACAATAGCAATGCTAAAAATATAGATTTCTATAAAGATGAATCAAAAAATAAT
CAAGAAGAAAAAAAAACATTTAAAAACGACCAAGATCAAATCATTTAGACAAAAAGCATC
AGAGAGAGATTCTCCTTACCAGAAAAACCGAAAGATTCTCAAATACCTCCAATAAAAGAA
ATCAGCAGAGCTAATAGTCAAAAAGATAATTCCACTTCACAGTCACCTTCGATACTTGCT
CAAAGCGAGCTAGAAAATTCAAAAGCCGAAGCTTCTCTATAAAAAATTGTCAAATGGAAG
TCAGGCGATTTCATAGGTGCAGGAAGCTTTGGTCAAGTATTTACAGCCATGAACTGCAAC
ACTGGAGAAATATTTGTCGTCAAAAAGATCATGGTCCATGGTCAATCGAAGTTAGACAAA
GAATTTTTGGATGAATAAGAAAAAGAACTAAGGATCATGCAAACATTATCTCATAAACAC
ATAATTTAATATAAAGGACATGAGAGATAATAAGATTGCTTGTGTATATTTTTGGAATAC
ATGAGCGAAGGAAATATTGATTAGATGCTGAAGAAGTTTGGGCCTTTGGAGGAATAGACA
ATTAAGGTTTATGCTAGACAAATATTAAGTGGAATTCAATATCTCCACTCACAAAAAGTG
ATACATAAAGATATCAAAGGCGCTAATATACTTGTTGGATCAGATGGCATAGTTAAGTTA
TCTGATTTTGGCTGCGCTAAGTAATTAGAACTCACTTTAAATAGTAACAAAGAAATGAAT
AAAACACTAAAGGGCTCGGTTCCTTGGATGTCTCCTGAAATAGTAACCTAGACTAAATAT
GATACAAAGGCTGACATTTGGTCATTTGGTTGCACAATACTTGAAATGGCTCAAGCAGAA
GCTCCATGGTCAAACTATTAGTTTGACAATCCGATTGCAGCTATAATGAAAATAGGTCTT
AGCGATGAAATCCCACAAATCCCAGAAACAATCTCTCCTGACCTCAATCAGTTCATCAGA
AAGTGCTTGCAGAGAGATCCTTCTAAAAGGCCAACTGCAACTGAGCTTTTAAACGACTCT
TTCTTAGCTGAAAATTGA

AA Sequence Translation of TTHERM_01502010
MKNQEFNNSAQKSSTYKLPIKKNSMGSTIGCWISSAKRKNFLFTREIPFSEDTPNQMNKI
EGKKTIFNRKTTIQQKQNQLQQQQVQQQTIQNSNQGASLSVLQSGQALPKLKIQNDSVNN
AANQNLGESSTAEKAVEEQERKKPQGFVEKIRQLKNVNAKQVENDQQQNINVNNSKVNKL
QSPATTMAFRSLKHNEDIQLYLNNYEDHEEEDEEEIEYKINREDTQFSDESANSPNKSNK
FQLSSSDASNGLQSTNGKSINLLQQKKENQANTQQQEELISVNNSNAKNIDFYKDESKNN
QEEKKTFKNDQDQIIQTKSIRERFSLPEKPKDSQIPPIKEISRANSQKDNSTSQSPSILA
QSELENSKAEASLQKIVKWKSGDFIGAGSFGQVFTAMNCNTGEIFVVKKIMVHGQSKLDK
EFLDEQEKELRIMQTLSHKHIIQYKGHERQQDCLCIFLEYMSEGNIDQMLKKFGPLEEQT
IKVYARQILSGIQYLHSQKVIHKDIKGANILVGSDGIVKLSDFGCAKQLELTLNSNKEMN
KTLKGSVPWMSPEIVTQTKYDTKADIWSFGCTILEMAQAEAPWSNYQFDNPIAAIMKIGL
SDEIPQIPETISPDLNQFIRKCLQRDPSKRPTATELLNDSFLAEN*

4. Translations of the coding sequence in all 3 reading frames. Indicate which reading frame you predict is correct. Is this the same as predicted by the Tetrahymena genome database?

>EMBOSS_001_1 (This one seems correct and matches the translation above)
MKNQEFNNSAQKSSTYKLPIKKNSMGSTIGCWISSAKRKNFLFTREIPFSEDTPNQMNKI
EGKKTIFNRKTTIQQKQNQLQQQQVQQQTIQNSNQGASLSVLQSGQALPKLKIQNDSVNN
AANQNLGESSTAEKAVEEQERKKPQGFVEKIRQLKNVNAKQVENDQQQNINVNNSKVNKL
QSPATTMAFRSLKHNEDIQLYLNNYEDHEEEDEEEIEYKINREDTQFSDESANSPNKSNK
FQLSSSDASNGLQSTNGKSINLLQQKKENQANTQQQEELISVNNSNAKNIDFYKDESKNN
QEEKKTFKNDQDQIIQTKSIRERFSLPEKPKDSQIPPIKEISRANSQKDNSTSQSPSILA
QSELENSKAEASLQKIVKWKSGDFIGAGSFGQVFTAMNCNTGEIFVVKKIMVHGQSKLDK
EFLDEQEKELRIMQTLSHKHIIQYKGHERQQDCLCIFLEYMSEGNIDQMLKKFGPLEEQT
IKVYARQILSGIQYLHSQKVIHKDIKGANILVGSDGIVKLSDFGCAKQLELTLNSNKEMN
KTLKGSVPWMSPEIVTQTKYDTKADIWSFGCTILEMAQAEAPWSNYQFDNPIAAIMKIGL
SDEIPQIPETISPDLNQFIRKCLQRDPSKRPTATELLNDSFLAEN*

>EMBOSS_001_2
*RIRNLITQLKNHQPTNYQQRKIAWVPRLDVGYHLPSAKIFYLQGKYHFRRILQIK*TRL
KGRKRYLIEKQLFSKNKISFNSNKYSSRLFKIPIRVHLCLFCSQDRLFQSLKYKMIVSTM
QQIKIQVNLRQLKKLLKSKSAKSLKALQKRLGNSRMQMQNKQKTISSKISM*TTAKQINY
NHQRPRWLSGL*SIMKISNYIQIIMKTMKRKMKKKLNIKLIERILSFQTNQQIVQINQIN
SSSLLQMLAMDSNQQMANLQIYCSRKKKIKQILSSKKNQSL*TIAMLKIQISIKMNQKII
KKKKKHLKTTKIKSFRQKASERDSPYQKNRKILKYLQQKKSAELIVKKIIPLHSHLRYLL
KASQKIQKPKLLYKKLSNGSQAISQVQEALVKYLQP*TATLEKYLSSKRSWSMVNRSQTK
NFWMNKKKNQGSCKHYLINTQFNIKDMRDNKIACVYFWNT*AKEILIRC*RSLGLWRNRQ
LRFMLDKYQVEFNISTHKK*YIKISKALIYLLDQMAQLSYLILAALSNQNSLQIVTKK*I
KHQRARFLGCLLKQQPRLNMIQRLTFGHLVAQYLKWLKQKLHGQTISLTIRLQLQ*KQVL
AMKSHKSQKQSLLTSISSSESACREILLKGQLQLSFQTTLSQLKI

>EMBOSS_001_3
EESGIQQLSSKIINLQITNKEKQHGFHDWMLDIICQAQKFFIYKGNTIFGGYSKLNEQD*
REENDIQQKNNYLVKTKSALIATSIAADYLKFQLGCIFVCFAVRTGSSKAQNIK*QCQQC
SKLKFR*IFDS*KSC*RARAQKASRLCRKDQATQECKCKISRKRLVAKYQCEQQQSKQTT
ITSDHDGFQVFEAQ*RYPIISKQL*RP*RGR*RRN*IQNQQRGYLVFRRISKQSKQIKQI
PALFFRCQQWTPINKWQIYKFIAAEKRKLSKYLAVRRINLCEQQQCQKYRFLQR*IKKQS
RRKKNIQKRPRSNHLDKKHQREILLTRKTERFSNTSNKRNQQSQQSKRQFHFTVTFDTCS
KRARKFKSRSFSIKNCQMEVRRFHRCRKLWSSIYSHELQHWRNICRQKDHGPWSIEVRQR
IFG*IRKRTKDHANIISQTHNLIQRT*EIIRLLVYIFGIHERRKY*LDAEEVWAFGGIDN
QGLCQTNIKWNSISPLTKSDTQRYQRRQYTCWIRWHSQVI*FWLRQVIRTHFKQQQRNEQ
NTKGLGSLDVS*NSNLDQI*YKG*HLVIWLHNT*NGSSRSSMVKLLV*QSDCSYNENRSQ
R*NPTNPRNNLS*PQSVHQKVLAERSFQKANCN*AFKRLFLS*KL

5. Answers to questions in part II (3) of exercise.

Record how many HDAC domains your protein has:
   o There is one S_TKc domain.
Approximately how big is it?
   o It is about 270 amino acids long.
What amino acid numbers does it include?
   o The S_TKc domain roughly includes AA numbers 375 – 645.
Is it near the N-terminus, C-terminus, or in the middle?
   o The domain is in the C-terminal part of the protein. It occupies roughly the final 40% of the 645-residue sequence, beginning just past the midpoint and ending at or very close to the C-terminus.

6. Top 3 most similar proteins from other organisms in part II (4).

Hit                                                                Organism                                 E-Value
gi|116061057|emb|CAL56445.1| putative MAP3K alpha 1 protein k      Ostreococcus tauri                       2e-63
gi|116643236|gb|ABK06426.1| HAtagged protein kinase domain o       synthetic construct                      3e-62
gi|116643230|gb|ABK06423.1| HAtagged protein kinase domain o       synthetic construct                      3e-61
gi|46389856|dbj|BAD15457.1| putative MEK kinase [Oryza sativa (j   Oryza sativa (japonica cultivar-group)   5e-61
gi|45861623|gb|AAS78640.1| MAP3Ka [Lycopersicon esculentum]        Solanum lycopersicum                     1e-60

7. The 3 reference citations obtained in part II (5)

Bonaventure, J. et al. Common Mutations in the Fibroblast Growth Factor Receptor 3 (FGFR 3) Gene Account for Achondroplasia, Hypochondroplasia, and Thanatophoric Dwarfism. American Journal of Medical Genetics. 63:148-154.

Santos, HG. et al. Clinical Hypochondroplasia in a Family Caused by a Heterozygous Double Mutation in FGFR3 Encoding GLY380LYS. American Journal of Medical Genetics Part A. 143A:355-359.

Van Oers, JM. and Wild, PJ. et al. FGFR3 Mutations and a Normal CK20 Staining Pattern Define Low-Grade Noninvasive Urothelial Bladder Tumours. European Urology. Epub ahead of print (Jan 12, 2007)

8. Restriction map of the genomic Tetrahymena gene sequence.

Source: http://tools.neb.com/NEBcutter2/index.php

9. Primer Design, Tms, % G/C content, primer length. Highlight where these will anneal on the coding strand (assuming it is double-stranded).

(Bold is where primers will anneal)

Sequence Beginning:
5- AAGAATCAGGAATTTAATAACTC -3
3- TTCTTAGTCCTTAAATTATTGAG -5

Sequence Ending:
5- CTCTTTCTTAGCTGAAAATTGA -3 (end of coding sequence)
5- TCAATTTTCAGCTAAGAAAGAG -3
3- AGTTAAAAGTCGATTCTTTCTC -5

Primers:

Beginning: 5-AAGAATCAGGAATTTAATAACTC-3
Tm = 46 ºC
% G/C = 26
Primer Length = 23
Calculation of Estimated Tm:
Tm = 4(G+C) + 2(A+T) = 4(6) + 2(17) = 24 + 34 = 58 ºC

Ending: 5-TCAATTTTCAGCTAAGAAAGAG-3
Tm = 47 ºC
% G/C = 32
Primer Length = 22
Calculation of Estimated Tm:
Tm = 4(G+C) + 2(A+T) = 4(7) + 2(15) = 28 + 30 = 58 ºC

10. Map of cloning vector.

[Figure: map of cloning vector pIGF-GTW. Labels recovered from the image: pIGF: Gene X; Gateway; MTT promoter; GFP; pmr-rRNA; Ampicillin resistance gene; 2.5 kbp; restriction sites BamHI, XhoI, PspOMI.]
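
Note: the reading-frame translations in item 4 and the primer arithmetic in item 9 can be checked with a short script. The sketch below is not part of the original exercise, and all function names are hypothetical. It assumes the ciliate nuclear genetic code (NCBI translation table 6, where TAA and TAG encode glutamine and TGA is the only stop codon), which appears consistent with the frame 1 translation recorded in item 4. The Tm estimate is the 4(G+C) + 2(A+T) rule used in item 9, often called the Wallace rule.

```python
# Sketch only: helper names are illustrative, not from the lab handout.

BASES = "TCAG"
# Assumption: NCBI translation table 6 (ciliate nuclear code).
# TAA and TAG encode Gln (Q); TGA is the only stop codon (*).
CILIATE_AMINO_ACIDS = (
    "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: CILIATE_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


def reverse_complement(dna):
    """Return the reverse complement, e.g. to derive the reverse primer."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Estimated Tm in degrees C: 4 per G/C base plus 2 per A/T base."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    forward = "AAGAATCAGGAATTTAATAACTC"
    coding_end = "CTCTTTCTTAGCTGAAAATTGA"
    reverse = reverse_complement(coding_end)
    for name, primer in (("Beginning", forward), ("Ending", reverse)):
        print(name, primer, len(primer), round(gc_percent(primer)), wallace_tm(primer))
    for frame in three_frames("ATGAAGAATCAGGAATTTAATAACTC"):
        print(frame)
```

Passing the full coding sequence from item 3 to three_frames should give translations comparable to the three EMBOSS frames in item 4. The rule-based Tm is only a rough estimate, which may explain why it differs from the 46 ºC and 47 ºC values also recorded in item 9.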