NCBI Sequence Analysis

PART 1 – Using NCBI to Determine Species Relatedness

In this assignment, you will use the NCBI database to compare protein sequences among various species. This information will be used to determine species relatedness. You will use the BLAST feature to align the sequences and compare them. The proteins to examine are listed in the table. Your task is to collect the data (i.e., most similar to least similar) for the top 4 organisms that align with your search sequence (from Homo sapiens).

Protein & Size                   Top orgs          # of diff.
Myoglobin
FGB preproprotein – 491 aa
Cytochrome B – 380 aa
Alpha-globin 1 – 142 aa
Preproinsulin – 110 aa
Cytochrome C – 105 aa
Beta-globin 1 – 147 aa
FGA preproprotein – 644 aa

Discussion/Analysis
1. Briefly explain the role of each protein that you are analyzing in your BLAST search.
2. Based on your analysis, which organism is most closely related to Homo sapiens? How did you decide this?
3. Does the order of the top four make sense in light of the types of organisms that are closely aligned in your BLAST searches? Explain.
4. Why doesn’t the order of organisms stay the same regardless of the protein analyzed? Explain this in detail!
5. What is a molecular clock, why is it useful, and how is it calculated?
6. Based on the number of amino acid differences between proteins from Homo sapiens and a closely related organism, do you see evidence of ‘faster clocks’ and ‘slower clocks’? Use an example from your data. What accounts for the different rates observed, and why is this information important?
7. Select one protein to build a phylogenetic tree between Homo sapiens and the top 4 organisms from that particular BLAST search. Copy and paste the sequence alignments and attempt to explain the amino acid changes in your tree.
8. What are the limitations of this type of analysis?

PART 2 – USING NCBI TO UNDERSTAND THE EVOLUTION OF CHORDATES

Use antennapedia as a test case.
1. Sog and Dpp control dorsal-ventral patterning in fruit flies. The first task is to determine the Sog and Dpp homologues in chordates. Search the protein sequence database and run a BLAST similar to Part 1 (REMEMBER TO EXCLUDE PROTOSTOMIA).
2. When you have completed step 1 and determined the homologues in chordates, your task is to research the role of these proteins in chordates.
3. Now compare the expression patterns of Sog and Dpp with those of the chordate homologues. You will notice that the zones of expression have changed. The question is: when did this occur? Did it occur before or after the formation of the deuterostome ancestor?
4. Summarize all your findings in a 1-page report.

PART 3 – Building Phylogenetic Trees using HOMOLOGENE

Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE. Run a search for Cytochrome C. Click DOWNLOAD. This will download to NOTEPAD. You can edit the names of the subjects to make your tree easier to read. COPY this file by selecting ALL and then using the COPY function. Return to the NCBI home page. Click on BLAST and then the COBALT tool. PASTE the file into the QUERY box. Hit the ALIGN button at the bottom. Once the results come in, hit PHYLOGENETIC TREE – SLANTED VERSION. Copy and paste this tree into a Word document. Repeat this procedure for any of the proteins from Parts 1 and 2.

ANALYSIS OF YOUR RESULTS – 1 Paragraph
1. Do the trees match? Discuss.
2. Do the trees make sense in light of other evidence? Provide examples.

PART 4 – Building Phylogenetic Trees using HOMOLOGENE – VERSION A

Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE. Run a search for Myoglobin. Select organisms of choice (min. 6). Click DOWNLOAD.
This will download to NOTEPAD. You can edit the names of the subjects to make your tree easier to read. COPY this file by selecting ALL and then using the COPY function. Return to the NCBI home page. Click on HOMOLOGY and then COBALT. Copy the sequences into the ENTER THE QUERY box. Hit the ALIGN button at the bottom. Once the results come in, hit DISTANCE TREE OF RESULTS. Then select SLANTED VERSION to view your tree. Copy and paste this tree into a Word document. Repeat this procedure for 6 of the following using the same organisms:

Insulin-like Growth Factor-1, Pax-6, Oxytocin receptor, Sonic Hedgehog, Fibrinogen Beta Chain, Hox-c6, BMP4, Gremlin, Lactase, Fibrinogen Gamma Chain, Distal-less, Insulin, Insulin receptor, FOXp2, Cytochrome B, Cytochrome C

ANALYSIS OF YOUR RESULTS
1. Briefly describe the 8 proteins that were used in this analysis.
2. What is meant by the term phylogenetics? How will the analysis being done here help you construct the phylogenetic relationships among the groups of animals being studied?
3. For each tree, determine whether there are any groupings. If so, circle them and label them (e.g., mammals, primates, tetrapods, vertebrates).
4. Analyze the trees. Is there an overall pattern that emerges from the trees (provide examples), and are there any glaring surprises (provide examples)?
5. Build your own master tree from the 8 you have generated.
6. Could you confidently determine species relatedness based on one protein? Explain. What are the limitations of this study?

EXTENSION
1. You are to analyze 2 sequences.
2. You are to build a molecular clock for each sequence by comparing the number of amino acid differences between humans and other specimens (this can be found on your BLAST results page) with the researched time of divergence between humans and those specimens.
3. Answer the following: What is a molecular clock? How is it built and what can it be used for?
4. Display your two graphs. Now compare them and examine the graphs from other students. Answer the following: Do all graphs show that molecular clocks tick at the same rate? Explain.
5. Based on your clock, how many differences would you expect to find in a specimen like Australopithecus afarensis? How many differences would there be in a specimen that is 200 million years old?

NCBI ASSIGNMENT
STUDYING PROTEINS, DOMAINS AND PROTEIN FAMILIES

FOCUS ON PROTEINS
http://www.ncbi.nlm.nih.gov/books/NBK26830/

A. What ultimately determines the shape of a protein (hint: energy)?
B. When a protein folds up, which side groups face outwards and which face inwards?
C. What are protein domains? Why are they significant? Explain the role of a protein kinase domain.
D. What is meant by a protein family? What family do elastase and chymotrypsin belong to?
E. What is meant by homologous proteins or homologous protein domains?
F. What are chaperones and why are they significant? What are heat shock proteins and why are they necessary?
G. Compare the structures of neuraminidase and hemoglobin. How are they similar and how are they different?
H. Mature active proteins are often modified from an inactive version. Explain the modification in the production of insulin from proinsulin. (Hint: explain proteolytic cleavage.)

USING PROTEIN BLAST TO ANALYZE PROTEINS
_______________________________________________________________

In this section you will be given unknown protein sequences and you will need to determine:
a) The protein source: the size of the protein, the role of the protein, and an image of it
b) Any domains, and the role of those domains
c) The protein family: what unites this family?

From the NCBI home page (http://www.ncbi.nlm.nih.gov/) select BLAST then PROTEIN BLAST. Enter the following into the query box:

tsedhfqpffnektfgageadcglrplfekkqvqdqtekelfesyiegrIVEGQDaevglSPWQV MLFrks

Hit Blast.
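Sequences copied from a handout often carry spaces, line breaks, gap characters, or mixed case, as the query above does. The following is a minimal sketch, assuming Python is available, that keeps only the letters and reports the residue count, which may help when recording the size of a protein. The function name `clean_sequence` is hypothetical and is not part of any NCBI tool; the cleanup is an optional convenience, not a required step of the assignment.

```python
def clean_sequence(raw: str) -> str:
    """Keep only letters (dropping spaces, digits, dashes, line breaks) and upper-case them.

    Note: this discards any meaning the original lower/upper case may have carried.
    """
    return "".join(ch for ch in raw if ch.isalpha()).upper()


# The query sequence given in the handout, pasted as-is (it contains one space).
query = "tsedhfqpffnektfgageadcglrplfekkqvqdqtekelfesyiegrIVEGQDaevglSPWQV MLFrks"

cleaned = clean_sequence(query)
print(cleaned)                     # the sequence as one unbroken upper-case string
print(len(cleaned), "residues")    # number of amino acids in the cleaned query
```

The same helper can be applied to each of the samples before pasting them into the query box.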
Analyze the following samples:

SAMPLE 1
pftyat lirqaimess drqltlneiy swftrtfayf vwtvdeveyq rrnaatwkna vrhnlslhkc fvrvenvkga

SAMPLE 2
lpsEYYGPLKTLLLKSPdvqPISASAAYILSEICRdKNDAVLPLVRLLLHHDKLVPFATAVAELDLKdtqd aNTIFRGNS
LATRCLDEMMKIVGGHYLKVTLKPILDEICDSSKSCEIDPIKLKEGDNVENNKENLRYYVDKLFNTIVKSS MSCPTVMCD
IFYSLRQMATQRFPndphVQYSAVSSFVFLRFFAVAVVSPHTFHLRPHHPDaQTIRTLTLISKTIQTLGSW GSLsks-ks
sFKETFMCEFfkMFQEEGYIIAVKKFLDEISStetkessgtSEPVHLKEGEMYKRAQGRTRiGKKNFKKRW FCLTSrelt

SAMPLE 3
ILYTARCpefktEEGQKKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEH

SAMPLE 4
GFETRVTVLGH VQRGGSPTAFDRVLASRLGARAVELLLEGKGGRCVGIQNn

SAMPLE 5
MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLVLYALFLTLTTKLTNTSISDAQEMETVWTILPA
IILVLIALPSLRILYMTDEVNDPSFTIKSIGHQWYWTYEYTDYGGLIFNSYMLPPLFLEPGDLRLLDVDN
RVVLPVEAPIRMMITSQDVLHSWAVPTLGLKTDAIPGRLNQTTFTATRPGVYYGQCSEICGANHSFMPIV
LELIPLKIFEMGPVFTL

SAMPLE 6
MLLLARCLLLVLVSSLLVCSGLACGPGRGFGKRRHPKKLTPLAYKQFIPNVAEKTLGASGRYEGKISRNS
ERFKELTPNYNPDIIFKDEENTGADRLMTQRCKDKLNALAISVMNQWPGVKLRVTEGWDEDGHHSEESLH
YEGRAVDITTSDRDRSKYGMLARLAVEAGFDWVYYESKAHIHCSVKAENSVAAKSGGCFPGSATVHLEQG
GTKLVKDLSPGDRVLAADDQGRLLYSDFLTFLDRDDGAKKVFYVIETREPRERLLLTAAHLLFVAPHNDS
ATGEPEASSGSGPPSGGALGPRALFASRVRPGQRVYVVAERDGDRRLLPAAVHSVTLSEEAAGAYAPLTA
QGTILINRVLASCYAVIEEHSWAHRAFAPFRLAHALLAALAPARTDRGGDSGGGDRGGGGGRVALTAPGA
ADAPGAGATAGIHWYSQLLYQIGTWLLDSEALHPLGMAVKSS

SAMPLE 7
MRAWILLLAVLATSQPIVQVASTEDTSISQRFIAAIAPTRTEPSAASAAAAAATATATATATTALAKAFN
PFNELLYKSSDSDRNNKNKGNKHSKSDANRQFNEVHKPRTDQLENSKNKPKQLVNKTNKMAVKDQKHHQP
QQQQQQHHKPATTTALTSTESHQSPIETIFVDDPALALEEEVASINVPANAGAIIEEQEPSTYSKKELIK
DKLKPDPSTLVEIENSLLSLFNMKRPPKIDRSKIIIPEAMKKLYAEIMGHELDSVNIPRPGLLTKSANTV
RSFTHKDSKIDDRFPHHHRFRLHFDVKSIPAEEKLKAAELQLTRDALAQAAVASTSANRTRYQVLVYDIT
RVGVRGQREPSYLLLDTKTVRLNSTDTVSLDVQPAVDRWLATPQKNYGLLVEVRTMRSLKPAPHHHVRLR
RSADEAHEQWQHKQPLLFAYTDDGRHKARSIRDVSGGGGGGVGGRNRRHQRRPARRKNHEETCRRHSLYV
DFADVGWDDWIVAPPGYDAYYCHGKCPFPLADHFNSTNHAVVQTLVNNLNPGKVPKACCVPTQLDSVAML
YLNDQSTVVLKNYQEMTVVGCGCR

SAMPLE 8
MMEGLLWILLSVIIASVHGSRLKTPALPIQPEREPMISKGLSGCSFGGRFYSLEDTWHPDLGEPFGVMHC
VMCHCEPQRSRRGKVFGKVSCRNMKQDCPDPTCDDPVLLPGHCCKTCPKGDSGRKEVESLFDFFQEKDDD
LHKSYNDRSYISSEDTSTRDSTTTEFVALLTGVTDSWLPSSSGVARARFTLSRTSLTFSITFQRINRPSL
IAFLDTDGNTAFEFRVPQADNDMICGIWKNVPKPHMRQLEAEQLHVSMTTADNRKEELQGRIIKHRALFA
ETFSAILTSDEVHSGMGGIAMLTLSDTENNLHFILIMQGLVPPGSSKVPVRVKLQYRQHLLREIRANITA
DDSDFAEVLADLNSRELFWLSRGQLQISVQTEGQTPRHISGFISGRRSCDTLQSVLSSGAALTAGQTGGV
GSAVFTLHPNGSLDYQLLVAGLSSAVLSVSIEMKPRRRNKRSVLYELSAVFTDQRAAGSCGRVEARHTHM
LLQNELFINIATALQPDGELRGQIRLLPYNGLDARRNELPVPLAGVLVSPPVRTGAAGHAWVSVDPQCHL
HYEIIVNGLSKSEDASISAHLHGLAEIGEMDDSSTNHKRLLTGFYGQQAQGILKDISVELLRHLNEGTAY
LQVSTKMNPRGEIRGRIHVPNHCESPAPRAEFLEEPEFEDLLFTREPTELRKDTHTHIHSCFFEGEQHTH
GSQWTPQYNTCFTCTCQKKTVICDPVMCPTLSCTHTVQPEDQCCPICEEKKESKETAAVEKVEENPEGCY
FEGDQKMHAPGTTWHPFVPPFGYIKCAVCTCKGSTGEVHCEKVTCPPLTCSRPIRRNPSDCCKECPPEET
PPLEDEEMMQADGTRLCKFGKNYYQNSEHWHPSVPLVGEMKCITCWCDHGVTKCQRKQCPLLSCRNPIRT
EGKCCPECIEDFMEKEEMAKMAEKKKSWRH

Building Phylogenetic Trees using HOMOLOGENE – VERSION B

Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE. Run a search for Myoglobin. Click DOWNLOAD. This will download to NOTEPAD. You can edit the names of the subjects to make your tree easier to read. COPY this file by selecting ALL and then using the COPY function. Return to the NCBI home page. Click on HOMOLOGY and then COBALT. Copy the sequences into the ENTER THE QUERY box. Hit the ALIGN button at the bottom. Once the results come in, hit DISTANCE TREE OF RESULTS. Then select SLANTED VERSION to view your tree. Copy and paste this tree into a Word document. Repeat this procedure for 11 of the following:

Insulin-like Growth Factor-1, Pax-6, Oxytocin receptor, Sonic Hedgehog, Fibrinogen Beta Chain, Hox-c6, BMP4, Gremlin, Lactase, Fibrinogen Gamma Chain, Distal-less, Insulin, Insulin receptor, FOXp2, Cytochrome B, alpha tubulin acetyltransferase 1, Cytochrome C, Somatostatin, NRAS, RRAS, NKX2

ANALYSIS OF YOUR RESULTS
1. Based on your 12 trees, construct the tree that would best display the relationships between 8 of the common organisms in your analysis.
2. Compare the trees. Do they all reveal the same relationships? Provide examples. Could you confidently determine species relatedness based on one protein analysis? If not, how would you properly determine species relatedness?
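When results from proteins of different sizes are compared, raw difference counts are not directly comparable, so it can help to express them as a fraction of the aligned positions. The following is a minimal sketch, assuming you have copied two aligned sequences of equal length (with `-` marking gaps) from an alignment. The function names, the decision to skip gapped positions, and the short toy sequences are illustrative assumptions; they are not real data and not part of any NCBI tool.

```python
from typing import Tuple


def count_differences(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Return (number of mismatches, number of positions compared).

    Positions where either sequence has a gap ('-') are skipped (an assumption;
    other conventions count gaps as differences).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    differences = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped positions
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def fraction_different(aligned_a: str, aligned_b: str) -> float:
    """Mismatches divided by compared positions, so proteins of different sizes can be compared."""
    differences, compared = count_differences(aligned_a, aligned_b)
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Toy aligned sequences (made up for illustration only).
human = "MGDVEKGKKI"
other = "MGDAEKGK-I"

diffs, compared = count_differences(human, other)
print(diffs, "differences over", compared, "compared positions")
print("fraction different:", fraction_different(human, other))
```

Computing this fraction for the same pair of organisms across several proteins gives one simple way to see whether different proteins tell the same story.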