NCBI Sequence Analysis
PART 1 – Using NCBI to Determine Species Relatedness
In this assignment, you will use the NCBI database to compare protein sequences
among various species and use this information to determine species relatedness.
You will use BLAST to align and compare the sequences. The proteins to examine are
listed in the table. For each protein, collect data on the top 4 organisms whose
sequences align with your search sequence (from Homo sapiens), ordered from most
similar to least similar.
Protein & Size             | Top orgs (most to least similar) | # of diff.
Myoglobin                  |                                  |
FGB preproprotein – 491 aa |                                  |
Cytochrome B – 380 aa      |                                  |
Alpha-globin 1 (142 aa)    |                                  |
Preproinsulin – 110 aa     |                                  |
Cytochrome C – 105 aa      |                                  |
Beta-globin 1 (147 aa)     |                                  |
FGA preproprotein – 644 aa |                                  |
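To double-check a "# of diff." value, the sketch below counts mismatches between two sequences that are already aligned, for example copied from a BLAST pairwise alignment with '-' marking gaps. It is a minimal sketch under assumptions: the function name `count_differences` is hypothetical, the example sequences are made-up fragments rather than real protein data, and whether gap positions count as differences is a choice you should state in your write-up. BLAST's pairwise view also reports an Identities line you can compare against.

```python
def count_differences(seq_a: str, seq_b: str, count_gaps: bool = False) -> tuple[int, float]:
    """Return (number of differing positions, percent identity) for two aligned sequences.

    Both sequences must come from the same alignment, so they have equal length.
    '-' marks a gap. By default gap positions are skipped; with count_gaps=True
    each gap position counts as one difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    diffs = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if count_gaps:
                diffs += 1
                compared += 1
            continue
        compared += 1
        if a != b:
            diffs += 1
    identity = 100.0 * (compared - diffs) / compared if compared else 0.0
    return diffs, identity


# Made-up aligned fragments, for illustration only.
diffs, identity = count_differences("MKTAYIAKQR", "MKTAHIAK-R")
print(diffs, round(identity, 1))  # 1 88.9
```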
Discussion/Analysis
1. Briefly explain the role of each protein that you are analyzing in your BLAST
search.
2. Based on your analysis, which organism is most closely related to Homo sapiens?
How did you decide this?
3. Does the order of the top four make sense given the types of organisms that align
most closely in your BLAST searches? Explain.
4. Why doesn’t the order of organisms stay the same regardless of the protein
analyzed? Explain this in detail!
5. What is a molecular clock, why is it useful and how is it calculated?
6. Based on the number of amino acid differences between proteins from Homo
sapiens and a closely related organism, do you see evidence of ‘faster clocks’ and
‘slower clocks’ – use an example from your data. What accounts for the different
rates observed and why is this information important?
7. Select one protein and build a phylogenetic tree for Homo sapiens and the top 4
organisms from that particular BLAST search. Copy and paste the sequence
alignments and attempt to explain the amino acid changes in your tree.
8. What are the limitations in this type of analysis?
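For question 7, once you have pairwise difference counts among Homo sapiens and the top four organisms, a tree can be built from that distance matrix. The sketch below uses UPGMA, one of the simplest distance methods. It is shown only to illustrate the idea and is not necessarily the method NCBI's tree tools use. UPGMA assumes every lineage changes at the same rate (a strict molecular clock), which ties in with questions 5 and 6. The function name and the difference counts in the example are hypothetical.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[tuple[str, str], float]) -> str:
    """Build a rooted tree with UPGMA and return it in Newick format.

    names: taxon labels.
    dist:  pairwise distances keyed by (name_a, name_b); every pair must be present.
    """
    # Store distances under unordered keys so (a, b) and (b, a) are the same entry.
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    # Each active cluster maps to (Newick text, number of taxa, node height).
    clusters = {n: (n, 1, 0.0) for n in names}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        merged = f"({newick_a}:{height - height_a:g},{newick_b}:{height - height_b:g})"
        label = f"cluster{next_id}"
        next_id += 1
        # Distance from the new cluster is the size-weighted average of its two parts.
        for other in clusters:
            d[frozenset((label, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[label] = (merged, size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


# Hypothetical amino acid difference counts, made up for illustration only.
diffs = {
    ("Human", "Chimp"): 2, ("Human", "Mouse"): 20, ("Human", "Chicken"): 40,
    ("Chimp", "Mouse"): 22, ("Chimp", "Chicken"): 40, ("Mouse", "Chicken"): 36,
}
print(upgma(["Human", "Chimp", "Mouse", "Chicken"], diffs))
# (Chicken:19.3333,(Mouse:10.5,(Human:1,Chimp:1):9.5):8.83333);
```

The Newick string can be pasted into most tree viewers. Branch lengths are half the merge distance, which is what makes the result ultrametric (all tips equally far from the root).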
PART 2 USING NCBI TO UNDERSTAND THE EVOLUTION OF CHORDATES
Use Antennapedia as a test case.
1. Sog and Dpp control dorsal-ventral patterning in fruit flies. Your first task is to
determine the Sog and Dpp homologues in chordates. Search the protein sequence
database and run a BLAST search as in Part 1 (REMEMBER TO EXCLUDE PROTOSTOMIA).
2. When you have completed step 1 and determined the homologues in chordates, your
task will be to research the role of these proteins in chordates.
3. Now you will compare the expression patterns of Sog and Dpp and the chordate
homologues. You will notice that the zones of expression have changed. The
question is when did this occur? Did this occur before or after the formation of the
deuterostome ancestor?
4. Summarize all your findings in a 1 page report.
PART 3 Building Phylogenetic Trees using HOMOLOGENE
Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE.
Run a search for Cytochrome C.
Click DOWNLOAD. This will download to NOTEPAD. You can edit the names of the
subjects to make your tree easier to read.
COPY this file by selecting ALL and then using the COPY function.
Return to the NCBI home page. Click on BLAST and then the COBALT tool.
PASTE the file into the QUERY box. Hit the ALIGN button at the bottom.
Once the results come in hit the PHYLOGENETIC TREE – SLANTED VERSION.
Copy and Paste this tree into a word document.
Repeat this procedure for any of the proteins from part 1 and 2.
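Instead of editing each name by hand, the sketch below shortens every header in the downloaded file to the organism name. It assumes the download is in FASTA format (a '>' header line followed by sequence lines) and that each header ends with the organism in square brackets, as NCBI protein headers commonly do (for example "... [Homo sapiens]"); check your own file first. The function name is hypothetical and the example input is made up. Repeated organism names get a numeric suffix so every label in the tree stays unique.

```python
import re


def rename_fasta_headers(fasta_text: str) -> str:
    """Replace each FASTA header with the organism name found in trailing [brackets].

    Headers without a bracketed name are left unchanged. Spaces become underscores,
    and repeated organism names get a numeric suffix (_2, _3, ...).
    """
    seen: dict[str, int] = {}
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = re.search(r"\[([^\]]+)\]\s*$", line)
            if match:
                name = match.group(1).strip().replace(" ", "_")
                seen[name] = seen.get(name, 0) + 1
                if seen[name] > 1:
                    name = f"{name}_{seen[name]}"
                line = ">" + name
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up example input; real downloaded headers may be formatted differently.
example = ">seq1 myoglobin [Homo sapiens]\nMKTAYIAK\n>seq2 myoglobin [Mus musculus]\nMKTAHIAK\n"
print(rename_fasta_headers(example), end="")
```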
ANALYSIS OF YOUR RESULTS – 1 Paragraph.
1. Do the trees match? Discuss.
2. Do the trees make sense in light of other evidence? Provide examples.
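One way to make question 1 concrete is to list the groupings (clades) each tree contains and see which ones the two trees share. The sketch below does this for rooted trees written by hand as nested Python tuples, in the spirit of the Robinson-Foulds comparison. The tuple notation, the function names and the example trees are assumptions for illustration, not output from COBALT.

```python
def clades(tree) -> set[frozenset[str]]:
    """Return the set of leaf groups below each internal node of a nested-tuple tree."""
    found: set[frozenset[str]] = set()

    def walk(node) -> frozenset[str]:
        if isinstance(node, str):  # a leaf is just a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def compare_trees(tree_a, tree_b):
    """Return (shared clades, clades only in tree_a, clades only in tree_b)."""
    ca, cb = clades(tree_a), clades(tree_b)
    return ca & cb, ca - cb, cb - ca


# Two made-up trees that disagree on where Chimp and Mouse branch off.
tree_1 = ((("Human", "Chimp"), "Mouse"), "Chicken")
tree_2 = ((("Human", "Mouse"), "Chimp"), "Chicken")
shared, only_1, only_2 = compare_trees(tree_1, tree_2)
print(sorted(sorted(c) for c in only_1))  # [['Chimp', 'Human']]
print(sorted(sorted(c) for c in only_2))  # [['Human', 'Mouse']]
```

Clades that appear in both trees are groupings the two proteins agree on; the clade containing every taxon is always shared, so it carries no information.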
PART 4 Building Phylogenetic Trees using HOMOLOGENE – VERSION A
Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE.
Run a search for Myoglobin
Select organisms of choice (min 6). Click DOWNLOAD. This will download to
NOTEPAD. You can edit the names of the subjects to make your tree easier to read.
COPY this file by selecting ALL and then using the COPY function.
Return to the NCBI home page. Click on HOMOLOGY and then COBALT.
Copy the sequences into the ENTER THE QUERY box.
Hit the ALIGN button at the bottom.
Once the results come in, click DISTANCE TREE OF RESULTS. Then select
SLANTED VERSION to view your tree.
Copy and Paste this tree into a word document.
Repeat this procedure for 6 of the following, using the same organisms:
Insulin-like Growth Factor-1, Pax-6, Oxytocin receptor, Sonic Hedgehog,
Fibrinogen Beta Chain, Hox-c6, BMP4, Gremlin, Lactase,
Fibrinogen Gamma Chain, Distal-less, Insulin, Insulin receptor, FOXp2,
Cytochrome B, Cytochrome C
ANALYSIS OF YOUR RESULTS
1. Briefly describe the 8 proteins that were used in this analysis.
2. What is meant by the term phylogenetics? How will the analysis being done here
help you construct the phylogenetic relationships amongst the groups of animals
being studied?
3. For each tree, determine whether there are any groupings; if so, circle and label
them (e.g., mammals, primates, tetrapods, vertebrates).
4. Analyze the trees: is there an overall pattern that emerges from the trees (provide
examples), and are there any glaring surprises (provide examples)?
5. Build your own master tree from the 8 you have generated.
6. Could you confidently determine species relatedness based on one protein –
explain. What are the limitations of this study?
EXTENSION
1. You are to analyze 2 sequences.
2. You are to build a molecular clock for each sequence by comparing the number of
amino acid differences between humans and other specimens (this can be found
on your BLAST results page) with the researched time of divergence between
humans and those specimens.
3. Answer the following: What is a molecular clock? How is it built, and what can it
be used for?
4. Display your two graphs. Compare them with each other and with the graphs of other
students, then answer the following: Do all graphs show that molecular clocks tick
at the same rate? Explain.
5. Based on your clock, how many differences would you expect to find in a
specimen like Australopithecus afarensis? How many differences would there be
in a specimen that is 200 million years old?
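A molecular clock for one protein can be sketched as a straight line relating amino acid differences to divergence time. The sketch below fits that line by least squares through the origin (assuming zero time since divergence means zero differences) and then reads predictions off it. The function names are hypothetical and the data points are placeholders, not real divergence times or counts; substitute your BLAST difference counts and researched divergence dates. Extrapolating far beyond your data, for example to 200 million years, assumes the rate stays constant, and over long times repeated substitutions at the same site make simple difference counts underestimate the true number of changes.

```python
def fit_clock(points: list[tuple[float, float]]) -> float:
    """Least-squares slope (differences per million years) for a line through the origin.

    points: (divergence time in millions of years, amino acid differences).
    """
    sum_tt = sum(t * t for t, _ in points)
    sum_td = sum(t * d for t, d in points)
    return sum_td / sum_tt


def predict_differences(rate: float, time_mya: float) -> float:
    """Expected number of differences after time_mya million years at a constant rate."""
    return rate * time_mya


# Placeholder data for illustration only: (time in Mya, differences).
data = [(10, 2), (50, 10), (100, 20)]
rate = fit_clock(data)
print(rate)  # 0.2
print(predict_differences(rate, 200))  # 40.0
```

Plotting your real points alongside the fitted line also shows how well a single constant rate describes them.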
NCBI ASSIGNMENT
STUDYING PROTEINS, DOMAINS AND PROTEIN FAMILIES
FOCUS ON PROTEINS
http://www.ncbi.nlm.nih.gov/books/NBK26830/
A. What ultimately determines the shape of a protein (hint: energy)?
B. When a protein folds up, which side groups are facing outwards and which are
facing in?
C. What are protein domains? Why are they significant? Explain the role of a protein
kinase domain.
D. What is meant by a protein family? What family do elastase and chymotrypsin
belong to?
E. What is meant by homologous proteins or homologous protein domains?
F. What are chaperones and why are they significant? What are heat shock proteins
and why are they necessary?
G. Compare the structure of neuraminidase and hemoglobin – how are they similar
and different?
H. Mature active proteins are often modified from an inactive version – explain the
modification in the production of insulin from proinsulin. (Hint: explain proteolytic
cleavage)
USING PROTEIN BLAST TO ANALYZE PROTEINS
_______________________________________________________________
In this section you will be given unknown protein sequences, and you will
need to determine:
a) The protein source – size of the protein, role of the protein and provide an
image of it
b) Any domains – role of the domains
c) Protein family – what unites this family?
From the NCBI home page (http://www.ncbi.nlm.nih.gov/ ) select BLAST then
PROTEIN BLAST.
Enter the following into the query box:
tsedhfqpffnektfgageadcglrplfekkqvqdqtekelfesyiegrIVEGQDaevglSPWQV
MLFrks
Click BLAST.
Analyze the following samples:
SAMPLE 1
pftyat lirqaimess drqltlneiy swftrtfayf
vwtvdeveyq
rrnaatwkna vrhnlslhkc fvrvenvkga
SAMPLE 2
lpsEYYGPLKTLLLKSPdvqPISASAAYILSEICRdKNDAVLPLVRLLLHHDKLVPFATAVAELDLKdtqd
aNTIFRGNS
LATRCLDEMMKIVGGHYLKVTLKPILDEICDSSKSCEIDPIKLKEGDNVENNKENLRYYVDKLFNTIVKSS
MSCPTVMCD
IFYSLRQMATQRFPndphVQYSAVSSFVFLRFFAVAVVSPHTFHLRPHHPDaQTIRTLTLISKTIQTLGSW
GSLsks-ks
sFKETFMCEFfkMFQEEGYIIAVKKFLDEISStetkessgtSEPVHLKEGEMYKRAQGRTRiGKKNFKKRW
FCLTSrelt
SAMPLE 3
ILYTARCpefktEEGQKKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEH
SAMPLE 4
GFETRVTVLGH
VQRGGSPTAFDRVLASRLGARAVELLLEGKGGRCVGIQNn
SAMPLE 5
MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLVLYALFLTLTTKLTNTSISDAQEMETVWTILPA
IILVLIALPSLRILYMTDEVNDPSFTIKSIGHQWYWTYEYTDYGGLIFNSYMLPPLFLEPGDLRLLDVDN
RVVLPVEAPIRMMITSQDVLHSWAVPTLGLKTDAIPGRLNQTTFTATRPGVYYGQCSEICGANHSFMPIV
LELIPLKIFEMGPVFTL
SAMPLE 6
MLLLARCLLLVLVSSLLVCSGLACGPGRGFGKRRHPKKLTPLAYKQFIPNVAEKTLGASGRYEGKISRNS
ERFKELTPNYNPDIIFKDEENTGADRLMTQRCKDKLNALAISVMNQWPGVKLRVTEGWDEDGHHSEESLH
YEGRAVDITTSDRDRSKYGMLARLAVEAGFDWVYYESKAHIHCSVKAENSVAAKSGGCFPGSATVHLEQG
GTKLVKDLSPGDRVLAADDQGRLLYSDFLTFLDRDDGAKKVFYVIETREPRERLLLTAAHLLFVAPHNDS
ATGEPEASSGSGPPSGGALGPRALFASRVRPGQRVYVVAERDGDRRLLPAAVHSVTLSEEAAGAYAPLTA
QGTILINRVLASCYAVIEEHSWAHRAFAPFRLAHALLAALAPARTDRGGDSGGGDRGGGGGRVALTAPGA
ADAPGAGATAGIHWYSQLLYQIGTWLLDSEALHPLGMAVKSS
SAMPLE 7
MRAWILLLAVLATSQPIVQVASTEDTSISQRFIAAIAPTRTEPSAASAAAAAATATATATATTALAKAFN
PFNELLYKSSDSDRNNKNKGNKHSKSDANRQFNEVHKPRTDQLENSKNKPKQLVNKTNKMAVKDQKHHQP
QQQQQQHHKPATTTALTSTESHQSPIETIFVDDPALALEEEVASINVPANAGAIIEEQEPSTYSKKELIK
DKLKPDPSTLVEIENSLLSLFNMKRPPKIDRSKIIIPEAMKKLYAEIMGHELDSVNIPRPGLLTKSANTV
RSFTHKDSKIDDRFPHHHRFRLHFDVKSIPAEEKLKAAELQLTRDALAQAAVASTSANRTRYQVLVYDIT
RVGVRGQREPSYLLLDTKTVRLNSTDTVSLDVQPAVDRWLATPQKNYGLLVEVRTMRSLKPAPHHHVRLR
RSADEAHEQWQHKQPLLFAYTDDGRHKARSIRDVSGGGGGGVGGRNRRHQRRPARRKNHEETCRRHSLYV
DFADVGWDDWIVAPPGYDAYYCHGKCPFPLADHFNSTNHAVVQTLVNNLNPGKVPKACCVPTQLDSVAML
YLNDQSTVVLKNYQEMTVVGCGCR
SAMPLE 8
MMEGLLWILLSVIIASVHGSRLKTPALPIQPEREPMISKGLSGCSFGGRFYSLEDTWHPDLGEPFGVMHC
VMCHCEPQRSRRGKVFGKVSCRNMKQDCPDPTCDDPVLLPGHCCKTCPKGDSGRKEVESLFDFFQEKDDD
LHKSYNDRSYISSEDTSTRDSTTTEFVALLTGVTDSWLPSSSGVARARFTLSRTSLTFSITFQRINRPSL
IAFLDTDGNTAFEFRVPQADNDMICGIWKNVPKPHMRQLEAEQLHVSMTTADNRKEELQGRIIKHRALFA
ETFSAILTSDEVHSGMGGIAMLTLSDTENNLHFILIMQGLVPPGSSKVPVRVKLQYRQHLLREIRANITA
DDSDFAEVLADLNSRELFWLSRGQLQISVQTEGQTPRHISGFISGRRSCDTLQSVLSSGAALTAGQTGGV
GSAVFTLHPNGSLDYQLLVAGLSSAVLSVSIEMKPRRRNKRSVLYELSAVFTDQRAAGSCGRVEARHTHM
LLQNELFINIATALQPDGELRGQIRLLPYNGLDARRNELPVPLAGVLVSPPVRTGAAGHAWVSVDPQCHL
HYEIIVNGLSKSEDASISAHLHGLAEIGEMDDSSTNHKRLLTGFYGQQAQGILKDISVELLRHLNEGTAY
LQVSTKMNPRGEIRGRIHVPNHCESPAPRAEFLEEPEFEDLLFTREPTELRKDTHTHIHSCFFEGEQHTH
GSQWTPQYNTCFTCTCQKKTVICDPVMCPTLSCTHTVQPEDQCCPICEEKKESKETAAVEKVEENPEGCY
FEGDQKMHAPGTTWHPFVPPFGYIKCAVCTCKGSTGEVHCEKVTCPPLTCSRPIRRNPSDCCKECPPEET
PPLEDEEMMQADGTRLCKFGKNYYQNSEHWHPSVPLVGEMKCITCWCDHGVTKCQRKQCPLLSCRNPIRT
EGKCCPECIEDFMEKEEMAKMAEKKKSWRH
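The unknown samples above mix upper- and lower-case letters and contain spaces, line breaks and a gap character ('-'). The sketch below strips everything except letters and gives a rough size estimate for part a). The estimate uses the common rule of thumb of about 110 Da per amino acid residue, so it is an approximation, not a measured mass; the function names are hypothetical. Some samples may be fragments rather than full-length proteins, so the BLAST hits are a better guide to the size of the complete protein.

```python
import re

AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass of one amino acid residue


def clean_sequence(raw: str) -> str:
    """Keep only letters (drops spaces, line breaks, digits and '-') and uppercase them."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def approx_size(raw: str) -> tuple[int, float]:
    """Return (length in residues, approximate mass in kDa) for a raw sample."""
    seq = clean_sequence(raw)
    return len(seq), len(seq) * AVG_RESIDUE_MASS_DA / 1000


sample_4 = """GFETRVTVLGH
VQRGGSPTAFDRVLASRLGARAVELLLEGKGGRCVGIQNn"""
length, kda = approx_size(sample_4)
print(length, round(kda, 2))  # 51 5.61
```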
Building Phylogenetic Trees using HOMOLOGENE – VERSION B
Starting with the NCBI home page, click on HOMOLOGY then HOMOLOGENE.
Run a search for Myoglobin
Click DOWNLOAD. This will download to NOTEPAD. You can edit the names of the
subjects to make your tree easier to read.
COPY this file by selecting ALL and then using the COPY function.
Return to the NCBI home page. Click on HOMOLOGY and then COBALT.
Copy the sequences into the ENTER THE QUERY box.
Hit the ALIGN button at the bottom.
Once the results come in, click DISTANCE TREE OF RESULTS. Then select
SLANTED VERSION to view your tree.
Copy and Paste this tree into a word document.
Repeat this procedure for 11 of the following:
Insulin-like Growth Factor-1, Pax-6, Oxytocin receptor, Sonic Hedgehog,
Fibrinogen Beta Chain, Hox-c6, BMP4 , Gremlin, Lactase,
Fibrinogen Gamma Chain, Distal-less, Insulin, Insulin receptor, FOXp2,
Cytochrome B, alpha tubulin acetyltransferase 1,
Cytochrome C, Somatostatin, NRAS, RRAS, NKX2
ANALYSIS OF YOUR RESULTS
1. Based on your 12 trees, construct the tree that would best display the relationships
between 8 of the common organisms in your analysis.
2. Compare the trees. Do they all reveal the same relationships? Provide examples.
Could you confidently determine species relatedness based on the analysis of one
protein? If not, how would you properly determine species relatedness?