Protein Information Tutorial

Relevant websites:
SMART: http://smart.embl-heidelberg.de/
CBS Prediction Servers: http://www.cbs.dtu.dk/services/
EMBOSS: http://www.bioinformatics.nl/emboss-explorer/

Characterizing a protein using protein domain identification and prediction servers on the web.

In this tutorial you will take a known protein sequence and submit it to a variety of prediction servers to learn how to interpret the output from these servers.

Pay attention to the output from the various programs. If you do not understand it, look for help files or links to information explaining the output. For most of its programs, the CBS server links from the output to a page that describes the output in detail and how to interpret it. There are help files for the EMBOSS programs as well.

1) Use the SMART database

Because there are no SMART domains in the MTR1 protein sequence, we'll use a different sequence to submit to SMART: the human protein NEK2, UniProt accession number P51955. If you use UniProt/Swiss-Prot accession numbers, you can simply type the accession number into the "Sequence ID or ACC" text box shown in Figure 1. Check the boxes for PFAM domains and signal peptides, then click the Sequence SMART button.

Figure 1: Submission form for the SMART database. Note the batch access link at the bottom; click on it to submit a list of protein accession numbers.

BCHM 6280 2016 Protein Information on the Web Page 1 of 6

The output includes a graphic of any domains found in the protein, as shown in Figure 2.

Figure 2: Domain found in the sequence NEK2_HUMAN (P51955).
The vertical lines represent intron positions. The number at the top of each line is the amino acid position of the intron-exon boundary, and the number at the bottom is the reading frame; the lines are colored according to the reading frame. If you pause the mouse over the domain graphic, the display expands to show more information about the domain, including the position of the match and the E-value for the match. The domains are drawn on the protein sequence at the positions where they are found. The bright green horizontal bars represent coiled-coil regions, and the bright pink/magenta color represents regions of low complexity. You can follow various links from the output to learn more about the individual domains, as well as to the protein record at UniProt. Mouse over the domain until a graphic expands below the figure, then click the link "go to full annotation". The window will change to display the sequence of the domain with the catalytic sites highlighted in green, as shown in Figure 3. This can be a very useful feature if you are interested in making constitutively active or catalytically inactive mutants.

Figure 3: SMART domain detail page. Catalytic residues are shown in green.

At the bottom of the SMART output there are also a number of expandable menus that contain links to additional information about the domain shown.

3) Scan the protein sequences using the TMHMM program at the CBS prediction server

Submit the MTR1 protein sequence to the TMHMM program, selecting extensive graphics as the output. The output should look like that displayed in Figures 4a and 4b:

Figure 4a: TMHMM output from the CBS prediction server
Figure 4b: TMHMM graphic output from the CBS prediction server

The red bars at the top represent the positions of probable transmembrane-spanning domains. The pink or blue lines connecting the TM domains show the topology relative to the cell membrane (blue for regions predicted to be inside, pink/magenta for regions predicted to be outside). If you wanted to design a peptide antibody to this protein, would you choose a region predicted to be on the inside or the outside of the membrane?
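TMHMM can also summarize its prediction on one line (its short output format), where a Topology field encodes the same information as the colored lines, for example `Topology=i7-29o44-66i`. The sketch below assumes that format: `i` and `o` give the side (inside/outside) of each non-membrane stretch, and each `start-end` pair is a predicted helix. It splits the string into segments so you can list the loops on one side when thinking about peptide antibodies. The function names and the example topology are hypothetical; check the format against your own TMHMM output.

```python
import re
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (location, start, end), 1-based and inclusive


def parse_topology(topology: str, length: int) -> List[Segment]:
    """Split a TMHMM-style topology string such as 'i7-29o44-66i'.

    Assumed format: 'i'/'o' mark the side of each loop (inside/outside);
    each 'start-end' pair is a predicted transmembrane helix ('TM').
    `length` is the sequence length, needed to close the final loop.
    """
    # Accept either the bare value or a line containing 'Topology=...'.
    field = re.search(r"Topology=(\S+)", topology)
    if field:
        topology = field.group(1)
    segments: List[Segment] = []
    side = ""  # side of the loop currently being read
    pos = 1    # first residue not yet assigned to a segment
    for token in re.findall(r"[io]|\d+-\d+", topology):
        if token in ("i", "o"):
            side = token
            continue
        start, end = map(int, token.split("-"))
        if start > pos:  # loop preceding this helix
            segments.append((side, pos, start - 1))
        segments.append(("TM", start, end))
        pos = end + 1
    if pos <= length:  # loop after the last helix (or the whole protein)
        segments.append((side, pos, length))
    return segments


def loops_on_side(segments: List[Segment], side: str) -> List[Tuple[int, int]]:
    """Return (start, end) of every loop predicted on the given side ('i' or 'o')."""
    return [(start, end) for loc, start, end in segments if loc == side]
```

For example, `loops_on_side(parse_topology("i7-29o44-66i", 100), "o")` lists the stretches predicted to lie outside; a protein with no predicted helices comes back as a single loop covering the whole sequence.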
Do these results agree with those obtained from InterProScan? What additional information do you get from this server that you do not get from InterProScan? Submit the NEK2 protein sequence to the TMHMM prediction server. Does it predict the presence of any TM domains?

4) Use the prediction servers to look for signal peptides and phosphorylation sites

Scan either the MTR1 or the NEK2 protein sequence with the SignalP program at the Prediction Servers site to look for signal peptides. Given the results of the TM predictions, would you expect this protein to be secreted? This site also predicts signal anchor peptides, which is a more likely result given the number of predicted TM domains.

Use the NetPhosK program at the Prediction Servers site to predict potential threonine and tyrosine phosphorylation sites. Keep in mind that these predictions are just that: predictions. While useful for informing possible experiments, they are not necessarily correct.

5) Scan the protein using the antigenic program in EMBOSS

This program is located under the Protein Motif section of EMBOSS. It is a simple program designed to predict potentially antigenic regions of a protein sequence, which makes it a good starting point for designing peptide antibodies. Read its manual to understand the method behind the program. The output is fairly simple: just a table of regions that have a high likelihood of being antigenic. Under the text box where you input the sequence, there is an option for increasing the minimum length of the antigenic region, as well as a section for changing the default output. The default output is Emboss_motif, which shows the sequence marked by location and score. You may find the EMBOSS seqtable output a bit more useful. Try a few different outputs to see what information each one provides.

Figure 5: EMBOSS motif output
Figure 6: EMBOSS seqtable output

PATTERN matching

A common request I get is to find particular amino acid patterns in a sequence or group of sequences. Often these patterns are somewhat degenerate. There is a program within EMBOSS called fuzzpro that uses PROSITE-style patterns to search protein sequences.
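fuzzpro is the tool used in this tutorial. As a rough illustration of what a degenerate pattern search does, and of searching many sequences in one pass, the sketch below uses Python's standard `re` module instead. The motif `.R[VI]K` (any residue, then R, then V or I, then K) and the sequence names are made up for illustration; this is not how fuzzpro is implemented.

```python
import re
from typing import Dict, List, Tuple


def search_motif(regex: str, sequences: Dict[str, str]) -> List[Tuple[str, int, int, str]]:
    """Report every (possibly overlapping) match of `regex` in each sequence.

    Returns (sequence name, start, end, matched residues) with 1-based,
    inclusive positions, similar in spirit to fuzzpro's Start/End columns.
    """
    # Wrapping the motif in a lookahead lets hits overlap: finditer then
    # advances one residue at a time instead of jumping past each hit.
    finder = re.compile(r"(?=(%s))" % regex)
    hits = []
    for name, seq in sequences.items():
        for m in finder.finditer(seq.upper()):
            hit = m.group(1)
            hits.append((name, m.start() + 1, m.start() + len(hit), hit))
    return hits


# Hypothetical sequences, only to show the output shape.
demo = {"seqA": "MQRVKAARVK", "seqB": "GGRIKLL"}
for row in search_motif(".R[VI]K", demo):
    print(*row, sep="\t")
```

The lookahead matters for repetitive sequences: in `KAKAK`, the motif `K.K` occurs twice (positions 1-3 and 3-5), but a plain `re.finditer("K.K", ...)` would report only the first hit.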
An example recognition or catalytic site might have the pattern: [FY]-[LIV]-G-[DE]-x(2)-{E}

In English that is: F or Y, followed by L, I or V, followed by G, followed by D or E, followed by any 2 amino acids, followed by any amino acid EXCEPT E.

Use fuzzpro to find the following pattern in the MTR1 protein:

xRVK

How many hits did you get?

Now try: QRVK

You should get only 1 hit.

Although this example is very simplistic, pattern matching is a very powerful tool, both because of the flexibility of the search parameters (something you cannot get from the Find command in Word) and because at the command line you can set it up to search as many sequences as you want.

Here is the PROSITE signature for GPCR family 1:

[GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x-{PQ}-[LIVMNQGA]-{RK}-{RK}-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-{PE}-x-[LIVM]

Copy and paste that into the search pattern text box for fuzzpro.

Does it find a hit? What is the sequence that matches this pattern?

My results were:

Start   End   Pattern_name   Mismatch   Sequence
113     129   pattern1       .          GSIFNITGIAINRYCYI

PROSITE pattern syntax

1. The standard IUPAC one-letter codes for the amino acids are used in PROSITE.
2. The symbol 'x' is used for a position where any amino acid is accepted.
3. Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
4. Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
5. Each element in a pattern is separated from its neighbor by a '-'.
6. Repetition of an element of the pattern can be indicated by following that element with a numerical value or, if it is a gap ('x'), by a numerical range between parentheses.
   Examples:
   x(3) corresponds to x-x-x
   x(2,4) corresponds to x-x or x-x-x or x-x-x-x
   A(3) corresponds to A-A-A
   Note: You can only use a range with 'x', i.e. A(2,4) is not a valid pattern element.
7. When a pattern is restricted to either the N- or C-terminus of a sequence, that pattern either starts with a '<' symbol or ends with a '>' symbol, respectively. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' is considered.

Other prediction tools

The tools listed here are just a few of the hundreds that are available. If you are interested in a particular feature of a protein, such as whether it has a myristylation moiety or is ubiquitinated at certain residues, there are prediction programs for that. Start with a Google search and you can probably find one or more sites that offer a web interface to the prediction algorithm. Always keep in mind that a prediction does not mean that the modification actually occurs in the cell. If there is more than one prediction algorithm, you might want to test your protein(s) with the other algorithms to see how much their predictions overlap.
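The PROSITE syntax rules listed earlier map closely onto regular expressions, which is one way to reproduce a simple fuzzpro-style search outside EMBOSS. The sketch below translates a pattern rule by rule (the function names are illustrative). It is not fuzzpro's implementation: it has no support for the mismatches fuzzpro reports, and it does not reject invalid elements such as A(2,4).

```python
import re


def _element_to_regex(element: str) -> str:
    """Translate one PROSITE pattern element into regex syntax."""
    prefix = suffix = ""
    if element.startswith("<"):  # rule 7: anchored at the N-terminus
        prefix, element = "^", element[1:]
    if element.endswith(">"):    # rule 7: anchored at the C-terminus
        suffix, element = "$", element[:-1]
    repeat = ""
    m = re.fullmatch(r"(.+)\((\d+)(?:,(\d+))?\)", element)  # rule 6: x(2), x(2,4), A(3)
    if m:
        element = m.group(1)
        repeat = "{%s,%s}" % (m.group(2), m.group(3)) if m.group(3) else "{%s}" % m.group(2)
    if element == "x":             # rule 2: any amino acid
        core = "."
    elif element.startswith("["):  # rule 3: one of the listed residues
        allowed = element[1:-1]
        if ">" in allowed:         # rule 7: e.g. [G>] means G or the C-terminus
            core = "(?:[%s]|$)" % allowed.replace(">", "")
        else:
            core = "[%s]" % allowed
    elif element.startswith("{"):  # rule 4: any residue except those listed
        core = "[^%s]" % element[1:-1]
    else:                          # rule 1: a single one-letter code
        core = element
    return prefix + core + repeat + suffix


def prosite_to_regex(pattern: str) -> str:
    """Translate a whole PROSITE pattern (rule 5: elements joined by '-')."""
    pattern = re.sub(r"\s+", "", pattern).rstrip(".")  # tolerate pasted line breaks
    return "".join(_element_to_regex(e) for e in pattern.split("-"))
```

For example, `prosite_to_regex("[FY]-[LIV]-G-[DE]-x(2)-{E}")` gives `[FY][LIV]G[DE].{2}[^E]`, which can then be used with `re.finditer(regex, sequence)`. Note that `re.finditer` reports non-overlapping hits only; wrapping the regex in a lookahead, `(?=(...))`, is a common way to catch overlapping ones.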