Protein Information Tutorial
Relevant websites:
SMART: http://smart.embl-heidelberg.de/
CBS Prediction Servers: http://www.cbs.dtu.dk/services/
EMBOSS: http://www.bioinformatics.nl/emboss-explorer/
Characterizing a protein using protein domain identification and prediction servers on the web.
In this tutorial you will submit a known protein sequence to a variety of prediction servers and learn how to interpret the output from these servers.
Pay attention to the output from the various programs. If you do not understand it, look for help files or links to information explaining the output. For most of its programs, the CBS server links from the output to a page that describes the output in detail and how to interpret it. There are help files for the EMBOSS programs as well.
1) Use the SMART database
Because there are no SMART domains in the MTR1 protein sequence, we'll submit a different sequence to SMART: the human protein NEK2, UniProt accession number P51955. If you use UniProt/Swiss-Prot accession numbers, you can simply type the accession number into the text box “Sequence ID or ACC” shown in Figure 1. Check the boxes “PFAM domains” and “signal peptides”, then click the “Sequence SMART” button.
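If you would rather pull the NEK2 sequence into a script than copy it from a web page, here is a minimal Python sketch. It assumes that UniProt's REST service returns FASTA at https://rest.uniprot.org/uniprotkb/<accession>.fasta; the helper names parse_fasta and fetch_uniprot are hypothetical, not part of any library.

```python
"""Fetch a UniProt entry as FASTA and parse it (a minimal sketch)."""
import urllib.request

# Assumption: UniProt's REST endpoint serves FASTA at this URL pattern.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    header = lines[0][1:]          # drop the leading '>'
    sequence = "".join(lines[1:])  # join the wrapped sequence lines
    return header, sequence


def fetch_uniprot(accession):
    """Download and parse one UniProt record (requires network access)."""
    url = UNIPROT_FASTA_URL.format(acc=accession)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

For example, fetch_uniprot("P51955") should return the NEK2 header and its sequence, provided the machine has network access and the endpoint has not changed.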
Figure 1: Submission form for the SMART database. Note the batch access link at the bottom. Click on this to submit a list of protein accession numbers.
BCHM62802016
Protein Information on the Web
Page 1 of 6
The output includes a graphic of any domains found in the protein, as shown in Figure 2.
Figure 2: Domain found in the sequence Nek2_HUMAN (P51955).
The vertical lines represent intron positions. The number at the top of each line is the amino acid position of the intron-exon boundary, and the number at the bottom is the reading frame; the lines are drawn in different colors depending on the reading frame. If you pause the mouse over the domain graphic, the display expands to show more information about the domain, including the position of the match and the E-value for the match. Domains are drawn on the protein sequence at the positions where they are found. The bright green horizontal bars represent coiled-coil regions, and the bright pink/magenta bars represent regions of low complexity. You can follow various links from the output to learn more about the individual domains, as well as links to the protein record at UniProt.
Mouse over the domain until a graphic expands below the figure. Click on the link “go to full annotation”; the window will change and display the sequence of the domain with the catalytic sites highlighted in green, as shown in Figure 3. This can be a very useful feature if you are interested in making constitutively active or catalytically inactive mutants.
Figure 3: SMART domain detail page. Catalytic residues are shown in green.
At the bottom of the SMART output there are also a number of expandable menus that contain links to additional information about the domain shown.
3) Scan the protein sequence using the TMHMM program at the CBS prediction server
Submit the MTR1 protein sequence to the TMHMM program, selecting extensive graphics as the output. The output should look like that displayed in Figures 4a and 4b:
Figure 4a: TMHMM output from the CBS prediction server
Figure 4b: TMHMM graphic output from the CBS prediction server
The red bars at the top represent the positions of probable transmembrane (TM) domains. The blue or pink lines connecting the TM domains give the topology relative to the cell membrane (blue: inside; pink: outside). If you wanted to design a peptide antibody to this protein, would you choose a region predicted to be on the inside or the outside of the membrane?
Do these results agree with those obtained from InterProScan? What additional information do you get from this server that you do not get from InterProScan? Submit the NEK2 protein sequence to the TMHMM prediction server. Does it predict the presence of any TM domains?
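If you want to shortlist regions for a peptide antibody programmatically, here is a rough Python sketch that groups TMHMM segments by location. It assumes the per-segment lines in TMHMM's text output are whitespace-separated as name, TMHMM2.0, location, start, end. The example numbers are invented and are NOT the MTR1 prediction.

```python
"""Group TMHMM-style topology lines by location (a sketch)."""

# Hypothetical output in the assumed TMHMM 2.0 text layout; invented numbers.
EXAMPLE = """\
# query Length: 120
query\tTMHMM2.0\toutside\t     1    20
query\tTMHMM2.0\tTMhelix\t    21    43
query\tTMHMM2.0\tinside\t    44    70
query\tTMHMM2.0\tTMhelix\t    71    93
query\tTMHMM2.0\toutside\t    94   120
"""


def parse_topology(text):
    """Return a list of (location, start, end) tuples, skipping '#' comments."""
    segments = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        location, start, end = fields[2], int(fields[3]), int(fields[4])
        segments.append((location, start, end))
    return segments


def segments_at(segments, location):
    """Keep only the (start, end) ranges predicted at one location."""
    return [(s, e) for loc, s, e in segments if loc == location]


topology = parse_topology(EXAMPLE)
print("TM helices:", segments_at(topology, "TMhelix"))
print("outside:", segments_at(topology, "outside"))
print("inside:", segments_at(topology, "inside"))
```

Which side to target is still your call; the sketch only makes the predicted segments easy to list.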
4) Use the prediction servers to look for signal peptides and phosphorylation sites.
Scan either the MTR1 or the NEK2 protein sequence with the SignalP program at the CBS Prediction Servers site to look for signal peptides. Given the results of the TM predictions, would you expect this protein to be secreted? SignalP also predicts signal anchors, which are a more likely result given the number of predicted TM domains.
Use the NetPhosK program at the Prediction Servers site to predict potential serine, threonine and tyrosine phosphorylation sites. Keep in mind that these predictions are just that: predictions. While they are useful for informing possible experiments, they are not necessarily correct.
5) Scan the protein using the antigenic program in EMBOSS.
This program is located under the Protein Motif section of EMBOSS. It is a simple program designed to predict potentially antigenic regions of a protein sequence, and a good starting point for designing peptide antibodies. Read its manual to understand the method behind the program. The output is fairly simple: just a table of regions that have a high likelihood of being antigenic. Under the text box where you input the sequence, there is an option for increasing the minimum length of the antigenic region, as well as a section for changing the default output. The default output format is Emboss_motif, which shows the sequence marked by location and score. You may find the EMBOSS seqtable format a bit more useful. Try a few different output formats to see what information each one provides.
Figure 5: EMBOSS motif output
Figure 6: EMBOSS seqtable output
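To get a feel for how a sliding-window predictor works, the Python sketch below averages a residue scale over a window and reports stretches that score as hydrophilic. This is not the method antigenic implements (read its manual for that). It swaps in the Kyte-Doolittle hydropathy scale as a simpler stand-in, and the window size, threshold and minimum length are arbitrary illustrative defaults. The function name hydrophilic_regions is hypothetical.

```python
"""Flag hydrophilic stretches with a sliding window (a rough stand-in sketch)."""

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_regions(seq, window=7, threshold=-1.0, min_len=6):
    """Return 1-based (start, end) regions covered by windows averaging below threshold.

    Every residue inside at least one qualifying window is marked; runs of
    marked residues shorter than min_len are dropped. Unknown letters score 0.
    """
    seq = seq.upper()
    marked = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        values = [KD.get(aa, 0.0) for aa in seq[i:i + window]]
        if sum(values) / window < threshold:
            for j in range(i, i + window):
                marked[j] = True
    regions, start = [], None
    for pos, flag in enumerate(marked + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            if pos - start >= min_len:
                regions.append((start + 1, pos))
            start = None
    return regions


toy = "L" * 10 + "DEKRDEKRDE" + "L" * 10  # invented test sequence
print(hydrophilic_regions(toy))
```

For this toy sequence the charged stretch occupies residues 11 to 20, but the reported region runs from 9 to 22, because windows that overlap the charged stretch enough still qualify. Smoothing artifacts like this are one reason to treat any such output as a starting point for peptide design.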
Pattern matching
A common request I get is to find particular amino acid patterns in a sequence or group of sequences. Often these patterns are somewhat degenerate. There is a program within EMBOSS called fuzzpro, which uses PROSITE-style patterns to search protein sequences.
An example recognition or catalytic site might have the pattern: [FY]-[LIV]-G-[DE]-x(2)-{E}
In English, that is: F or Y, followed by L, I or V, followed by G, followed by D or E, followed by any 2 amino acids, followed by any amino acid EXCEPT E.
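The same reading can be written by hand as a Python regular expression, which makes each element's meaning explicit. This is a sketch of the idea only; fuzzpro does its own pattern handling, and the three test peptides are invented.

```python
import re

# [FY]-[LIV]-G-[DE]-x(2)-{E} hand-translated into a Python regular expression:
#   [FY]   -> [FY]   F or Y
#   [LIV]  -> [LIV]  L, I or V
#   G      -> G      exactly G
#   [DE]   -> [DE]   D or E
#   x(2)   -> .{2}   any two residues
#   {E}    -> [^E]   any residue except E
EXAMPLE_SITE = re.compile(r"[FY][LIV]G[DE].{2}[^E]")

for peptide in ["FLGDAAK", "YVGEKKE", "WLGDAAK"]:
    hit = EXAMPLE_SITE.fullmatch(peptide)
    print(peptide, "matches" if hit else "does not match")
```

FLGDAAK matches; YVGEKKE fails because its last residue is E, and WLGDAAK fails because it starts with W.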
Use fuzzpro to find the following patterns in the MTR1 protein:
xRVK
How many hits did you get?
Now try: QRVK
You should get only 1 hit.
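The difference between a degenerate and an exact pattern is easy to reproduce in Python. The sketch below uses a hypothetical toy sequence (not MTR1) and a regex lookahead so that overlapping matches are also reported; whether fuzzpro reports overlapping hits the same way is not something this sketch establishes. find_hits is a hypothetical helper name.

```python
import re


def find_hits(pattern, seq):
    """Return 1-based (start, matched_text) for every match, overlaps included."""
    # A zero-width lookahead lets finditer try every position, so a hit that
    # shares residues with the previous hit is still reported.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({pattern}))", seq)]


toy = "MQRVKASTLRVKGARVKRVK"  # invented sequence, NOT the MTR1 protein
print(find_hits(".RVK", toy))  # xRVK -> [(2, 'QRVK'), (9, 'LRVK'), (14, 'ARVK'), (17, 'KRVK')]
print(find_hits("QRVK", toy))  # -> [(2, 'QRVK')]
```

A plain re.findall(".RVK", toy) finds only three hits here, because ARVK and KRVK share the K at position 17.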
Although this example is very simplistic, pattern matching is a very powerful tool, both because of the flexibility of the search parameters (something you cannot do with the Find command in Word) and because at the command line you can set it up to search as many sequences as you want.
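Scanning many sequences at once takes only a few more lines. This Python sketch reads multi-record FASTA text and reports the 1-based start of every match in each record. read_fasta and scan_all are hypothetical helper names, the records are invented, and the pattern is written as a Python regex rather than in PROSITE syntax.

```python
import re


def read_fasta(text):
    """Yield (name, sequence) pairs from multi-record FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header line as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scan_all(pattern, fasta_text):
    """Map each record name to the 1-based start of every (overlapping) hit."""
    finder = re.compile(f"(?=({pattern}))")
    return {
        name: [m.start() + 1 for m in finder.finditer(seq.upper())]
        for name, seq in read_fasta(fasta_text)
    }


# Invented records for illustration only.
records = """\
>seqA toy record
MQRVKAST
>seqB
GGGGG
>seqC
LRVK
ARVK
"""
print(scan_all(".RVK", records))
```

Records with no hit still appear with an empty list, which makes it easy to see which sequences lack the motif.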
Here is the PROSITE signature for GPCR family 1:
[GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x-{PQ}-[LIVMNQGA]-{RK}-{RK}-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-{PE}-x-[LIVM]
Copy and paste that into the search pattern text box for fuzzpro.
Does it find a hit? What is the sequence that matches this pattern?
My results were:
Start   End   Pattern_name   Mismatch   Sequence
113     129   pattern1       .          GSIFNITGIAINRYCYI
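As a sanity check on the reported hit, the GPCR family 1 signature can be hand-translated into a Python regular expression ([..] stays as is, {..} becomes [^..], x becomes .) and tested against the 17-residue match. This only confirms that the reported sequence satisfies the pattern; it does not reproduce the fuzzpro run on MTR1.

```python
import re

# GPCR family 1 signature, hand-translated element by element (17 positions).
GPCR_FAMILY1 = re.compile(
    r"[GSTALIVMFYWC][GSTANCPDE][^EDPKRH].[^PQ][LIVMNQGA][^RK][^RK][LIVMFT]"
    r"[GSTANC][LIVMFYWSTAC][DENH]R[FYWCSH][^PE].[LIVM]"
)

reported_hit = "GSIFNITGIAINRYCYI"  # reported for positions 113-129
print("reported hit satisfies pattern:", bool(GPCR_FAMILY1.fullmatch(reported_hit)))
print("length agrees with 113-129:", len(reported_hit) == 129 - 113 + 1)
```

Changing the invariant R at position 13 of the hit (for example to K) makes the match fail, which is a quick way to see how strict a single fixed residue in a signature is.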
PROSITE pattern syntax
1. The standard IUPAC one-letter codes for the amino acids are used in PROSITE.
2. The symbol 'x' is used for a position where any amino acid is accepted.
3. Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets '[]'. For example: [ALT] stands for Ala or Leu or Thr.
4. Ambiguities are also indicated by listing, between a pair of curly brackets '{}', the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
5. Each element in a pattern is separated from its neighbor by a '-'.
6. Repetition of an element of the pattern can be indicated by following that element with a numerical value between parentheses or, if it is a gap ('x'), with a numerical range between parentheses.
Examples:
x(3) corresponds to x-x-x
x(2,4) corresponds to x-x or x-x-x or x-x-x-x
A(3) corresponds to A-A-A
Note: You can only use a range with 'x', i.e. A(2,4) is not a valid pattern element.
7. When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' are considered.
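The syntax rules above are mechanical enough to turn into a small translator from PROSITE patterns to Python regular expressions. This is a sketch, not the converter that fuzzpro or any PROSITE tool uses internally; prosite_to_regex is a hypothetical name, and it only checks that plain residues are single uppercase letters, not that they are valid amino acid codes.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string.

    Covers rules 1-7: x, [..], {..}, repeats such as x(2,4) or A(3),
    the '<' and '>' terminal anchors, and '>' inside square brackets.
    """
    pattern = pattern.strip().rstrip(".")  # full PROSITE entries end with '.'
    pieces = []
    for element in pattern.split("-"):  # rule 5: elements separated by '-'
        prefix = suffix = repeat = ""
        if element.startswith("<"):  # rule 7: anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # rule 7: anchored at the C-terminus
            suffix, element = "$", element[:-1]
        rep = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if rep:  # rule 6: repetition
            element, low, high = rep.groups()
            if high is not None and element != "x":
                raise ValueError(f"ranges are only allowed on x: {element}({low},{high})")
            repeat = f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        if element == "x":  # rule 2: any residue
            core = "."
        elif element.startswith("[") and element.endswith("]"):  # rule 3
            inner = element[1:-1]
            if ">" in inner:  # e.g. [G>]: G, or the end of the sequence
                core = f"(?:[{inner.replace('>', '')}]|$)"
            else:
                core = f"[{inner}]"
        elif element.startswith("{") and element.endswith("}"):  # rule 4
            core = f"[^{element[1:-1]}]"
        elif re.fullmatch(r"[A-Z]", element):  # rule 1: one residue code
            core = element
        else:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)
```

For example, prosite_to_regex("F-[GSTV]-P-R-L-[G>]") yields F[GSTV]PRL(?:[G]|$), which matches ...PRLG anywhere in a sequence or ...PRL at its very end, and prosite_to_regex("A(2,4)") raises ValueError, following the note under rule 6.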
Other prediction tools
The tools listed here are just a few of the hundreds that are available. If you are interested in a particular feature of a protein, such as whether it carries a myristoyl group or is ubiquitinated at certain residues, there are prediction programs for that. Start with a Google search and you can probably find one or more sites that offer a web interface to the prediction algorithm. Always keep in mind that a predicted modification does not necessarily occur in the cell. If there is more than one prediction algorithm, you might want to test your protein(s) with the other predictors as well to determine how much their results overlap.
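When two predictors disagree, one quick way to quantify their overlap is to compare the sets of predicted residue positions. The Python sketch below reports shared and tool-specific positions plus the Jaccard index (shared positions divided by all positions predicted by either tool). The positions are invented, and compare_predictions is a hypothetical helper; deciding which sites to trust is still a judgment call.

```python
def compare_predictions(first, second):
    """Summarize agreement between two collections of predicted positions."""
    a, b = set(first), set(second)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        # Jaccard index: 1.0 = identical predictions, 0.0 = no overlap.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Invented positions from two hypothetical predictors.
tool_a = [15, 48, 102, 230]
tool_b = [48, 102, 187]
summary = compare_predictions(tool_a, tool_b)
print(summary)
```

Sites predicted by both tools are a reasonable place to start experiments, but agreement between two algorithms trained on similar data is not independent confirmation.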