Bioinformatics Methods in Redox Biology
Dmitri Fomenko
Redox Biology Center, University of Nebraska - Lincoln

Analysis workflow
1. Protein set
2. Searches for cysteines
3. Redox-cofactors binding prediction
4. Analysis of conservation profile for redox motifs and single cysteines
5. Filtering out metal-binding cysteines
6. Homology to sporadic selenoproteins and known thiol oxidoreductases
7. Secondary structure context prediction
8. Structure modeling
9. Annotation

Cysteine
1. Cysteines with redox-catalytic activity. These cysteines are directly involved in catalysis and occur in oxidoreductases. Examples: thioredoxins, glutaredoxins, glutathione peroxidases, peroxiredoxins, methionine sulfoxide reductases.
2. Regulatory cysteines. Protein activity is regulated by the redox state of these noncatalytic cysteines. Examples: the transcription factors OxyR and Yap1, the chaperone Hsp33, mitochondrial branched-chain aminotransferase.
3. Structural cysteines. These cysteines form intramolecular and intermolecular disulfide bonds during oxidative folding and occur in various protein types.
4. Metal-coordinating cysteines. These residues are involved in the coordination of divalent metal ions. Examples: iron-sulfur clusters, zinc-binding proteins, calcium-binding proteins.
5. Catalytic cysteines that do not change their redox state during catalysis. Examples: cysteine proteases, GAPDH.

Cysteine is one of the two least abundant amino acid residues in proteins, but it is the most conserved amino acid. Functional cysteines are highly conserved even in distantly related organisms.
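The statement about cysteine abundance can be checked for any protein of interest by counting residues. The following is a minimal sketch, not part of the original slides: the function name residue_fractions is hypothetical, and the test sequence is the thioredoxin fragment shown in the secondary structure section of these slides.

```python
from collections import Counter


def residue_fractions(sequence: str) -> dict[str, float]:
    """Return the fraction of each residue type in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Thioredoxin fragment copied from the secondary structure slides.
TRX = ("KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSF"
       "LIFKEGQLLGSYIGKERKSIDQIDQFLSQHI")

fractions = residue_fractions(TRX)
# Report residues from rarest to most common; Cys is near the rare end here.
for aa, frac in sorted(fractions.items(), key=lambda item: item[1]):
    print(f"{aa}: {frac:.3f}")
```

A single short protein says little about proteome-wide abundance; the same function would have to be run over a whole protein set to reproduce the claim.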
Redox motifs
Major redox motif: CxxC (x - any amino acid).
CxxC-derived redox motifs: CxxS, CxxT, SxxC, TxxC.
Cysteines in the CxxC redox motif may be replaced with selenocysteine (U).
Redox-active cysteines are accessible for interactions and located on the protein surface.

Thioredoxin fold (b-a-b-a-b-b-a)
Major representatives: thioredoxins, glutaredoxins, peroxiredoxins, glutathione peroxidases, protein disulfide isomerases (PDI).
More than 60% of known thiol oxidoreductases are thioredoxin-fold proteins.

[Figure: Amino acid distribution. x-axis: amino acid position (1 to 21); y-axis: 0 to 1; one series per amino acid (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y).]

Cysteine and selenocysteine
Cysteine: pKa = 8.3; codons TGT, TGC.
Selenocysteine: pKa = 5.2; codon TGA.

Selenocysteine insertion
Eukaryotes: 5' - AUG (initiation of translation) - UGA (selenocysteine) - STOP (UAA, UAG or UGA) - SECIS - AAAAAAA - 3'
Bacteria: 5' - AUG (initiation of translation) - UGA (selenocysteine) - SECIS - STOP (UAA, UAG or UGA) - 3'
Distance labels shown in the diagrams: 100 - 5000 bp; 20 - 50 bp.

Search for Cys/Sec pairs
Protein sequence database: NCBI nonredundant protein set; ORFs from completed genomes; ORFs from environmental genomes.
Nucleotide sequence database (searched as translated DNA with TBLASTN): completed genomes; dbEST; WGS; environmental genomes.
The search looks for alignments in which a cysteine (C) in a protein sequence pairs with a selenocysteine (U), encoded by an in-frame TGA codon, in a translated nucleotide sequence.

Example hit:
Query  GPKPTRKRYCINSASIEF
       GPK  + R+CI S+S++F
Sbjct  GPKRGQSRFUIFSSSLKF
DNA    gccccaagcggggacaatcgagattctgaatatttagcagctcactgaagttc

BLAST
One of the most popular sequence analysis and alignment tools is BLAST (Basic Local Alignment Search Tool). BLAST is a set of sequence comparison programs used to search sequence databases for optimal local alignments to a query sequence. BLAST programs have good sensitivity and are reasonably fast. The program is available as a standalone version for most computer platforms and operating systems, and as a web version at http://www.ncbi.nlm.nih.gov/BLAST/. NCBI sequence databases are updated daily and contain all protein and nucleotide sequences from open resources. There is a specialized, more sensitive tool for protein sequence alignment called PSI-BLAST (Position-Specific Iterative BLAST). This tool can be extremely useful for identifying similarity between distantly related proteins.

Example alignment (U - selenocysteine in the query; * - in-frame stop codon in a translated hit):
ID          Pos.  Sequence
Query          1  MEFLAARASALLGCYYUTTSHAMRLGMSGKD--TG
44262850     546  MEFLAARASALLGCYY*TTSHAMRLGMSGKD--TG
44272486     736  MEFLAARASALLGCYY*TTSHAMRLGMSGKD--TG
42886178      83  EIVAGRVSALNECFF*TTGHARFLRESGRD--TD
43971738     356  ELLAARVSSANECFY*TTSHVLLLRESSKA--IS
43196490     225  VEVVAGKVSALNECFY*TSSHARFLRESGRV--EN
43897127     193  MELIAGRVSSVNECFY*TSAHATMLRVSAMT--TE
44572416    1196  ELLAARVSSANECFY*TTSHVLLLRESSKA--IS
43293792     429  VELLASTVSALNECFY*TAAHVSLLRASSEA--LN
44390226     774  VEIIAARTSSINECFY*ATSHVMLLRASSKA--TE
43598071     414  EIVAGRISALNECFY*TNGHAKALREGAKL--AG
43498422     399  VEVIAARTSKLNECFY*TTSHALLLGLSGMN--QD
60035720     612  EMIAVCVAEASRCPYCADAHMIYLLASGMD--GD
60052585     472  LELVSFRVSQINGCAYCLDMHSKDLRVKGET--EQ

Sequence identifiers, CD identifiers and source organisms for the next alignment:
44275476 43834666 44586938 49176232 37525471 9658033 24372842 54302561 50905511 4406372 39996033 45360053 34557042 48834389 46106717 52007258 46317410 54029015 48838221 54025306 3955039 15607465 20807605 46581156 50874889 48847313 53760573 48768248 15596406 33603950 54029683 48782625 44357259 44624393 21242497 17548617 47573188 53730919 50874889 46112894 46142555 46198460 48847131 46580382 53691784 43887282 52006282 53757119 53729577 56420601 23099356 CD01448 CD00158 CD01444 CD01524 Environmental sequence Environmental sequence
Environmental sequence E. coli P. luminescens V. cholerae S. oneidensis P. profundum O. sativa D. glomerata G. sulfurreducens R. xylanophilus W. succinogenes Magnetococcus sp. R. xylanophilus T. denitrificans B. cepacia Polaromonas sp. M. barkeri N. farcinica S. peucetius M. tuberculosis T. tengcongensis D. vulgaris D. psychrophila G. metallireducens R. eutropha R. metallidurans P. aeruginosa B. bronchiseptica Polaromonas sp. B. fungorum Environmental sequence Environmental sequence X. axonopodis R. solanacearum R. gelatinosus D. aromatica D. psychrophila Exiguobacterium sp. M. burtonii T. thermophilus G. metallireducens D. vulgaris D. desulfuricans Environmental sequence T. denitrificans M. capsulatus D. aromatica G. kaustophilus O. iheyensis WLNFINRGKDKTFKTPEQIFEILNNAGVDPEKQIVTYUQ--GGIRAAHVMFVLALVSTFSPNINYDRVKVYDGSMGEWA WLEFIDENNNNKFKSQNEIESILNKQNITYEKQIATYUQ--GGIRAAHVFVVLKLIG-------YKNIKVYDASMGEYA WFNLMDR-QTHLFRSEEDIKAILADNGIALDKAIYTYUQ--AGVRAAHANFVLQLIG-------QSEARVYDGSMGEWA WTELVRE-GELKT--TDELDAIFFGRGVSYDKPIIVSCG--SGVTAAVVLLALATLD-------VPNVKLYDGAWSEWG WTMLVEN-GHFKS--ETEITDIFHKQGVDLNKPVITSCG--SGMTAAVLVLGLDIIG-------KKDVYLYDGSWAEWG FAELITG-HKLKE--QAELRPLLTHMLPETAQEYLFSCG--SGVTACIVLLAAYVCG-------YKNLSVYDGSWTEWG FGEVLNG-YKMKS--TTELQAIFQALVGNKALR-IFSCG--SGITACILILASVVAG-------HKSAVLYDGSWADWG FSQLIKD-GFFID--KELLVNRFNAVS-DIEQRLIFSCG--SGVTACVLALGAELAG-------RKMLTVYDGSWTEWG FLEMFDDAPMLLP--ADEIRKKFEQAGISLDRPIVVTCG--SGVTACILALGLYRIG-------KQDIPVYDGSWTEWE FPQILDASQALLP--ADELKKRFDQEGISLESPIVTSCG--TGVTACILALGLHRLG-------KSDVAVYDGSWTEWG YNEA--HIPTAVSIPFAELEKNPALLTASKDRLLVFYCGGVTUVLSPKSAGLAKKSGYE-------KVRVYLDGEPEWK YRQG--HLPGAVRYESAEQVRKLAPQKDAF-IVAY--CSNFNUHSSTRVARELAAMGYE-------NVYDYEGGKQDWV FRES--TIPGSIGVSDGKFKELWGRLPMDPNTKVVVFCGGYECELSHSVAGHMVAMGYK-------NFMTYSGGTPEWK YAAG--HLPGAVNIPLKDLEANLALLPAGQEVVAY--CRGPWCVLAFDAVARLRARGI--------KARRLQDGLPEWR YRAG--HIPGALSVPLERLEAYLAEIPKDQEIVAY--CRGPYCVFADEAVALLRSRGY--------RARRLQEGLPDWR 
YQAG--HIPGAVNIPIDELPHHLEALPQGQEIVAY--CRGPYCMLAFDAVATLRQAGY--------QARRLEDGFPEWK FTEG--HLPGALNIPLSELDARVSELPAGTEIVAY--CRGPYCVFAVEAVAALRARGF--------KAARLEDGFPEWK FTSA--HLPRARSLPVDELKKRLNELPKDVPIVAY--CRGPFCLMAKDAVELLRKKGY--------RAFHLTDGVAEWR YEMM--HIPGANSIPLEDLEKHLATLPINQEIVAY--CRGRCCLLSVEAVEILRAHGF--------KAVRLEASVQEWL YSAG--HIPGAINIPIDQLSDRIAELPADTEIVVY--CRGEYCVFAYDAVRLLTERGR--------RAVRLRDGMLEWR YLAG--HIPGAVCIPVAELTDRIGELAKDTEVVVY--CRGEYCALAYDAVRLLTDHGR--------RAIRLNDGMLEWR YQAG--HIPGAINIPIAELADRLAELTGDRDIVAY--CRGAYCVMAPDAVRIARDAGR--------EVKRLDDGMLEWR YEQA--HIKGSISIPLEELPNHLNCLSKDKLIVTY--CASYECTSCIEAAELLANYGF--------NVKVYRGGTKEWI FPIPDMNDWNMAETGDKSQQDFEALLGPDKNRPLVFYCGFVKCTRSHNGAVWAQKLGYT-------NVYRMPGGIVAWK WTSSEFKIKGAHRANPGKLDTWKSKFAKDKKIVLYCAUP--NESTSASLARKLTADGFS-------SVHALKGGWREWS AVGAINLPNDGPADIERIKQMELPFTKKDEIIV-YCSUA—-GEQASARVALVLIERGFT-------KTYVVRGGRQAVF VPIE--RIPGAVVVDMHGPLDALGGQLESRDIVVYCACP--NEISAAILAERLRVAGYG-------KTWALAGGFDEWK APIE--RIPGSIVMEIKGPFDTLSGHDASSDFVVYCACP--HEMSAAVLAERLRTAGYP-------NTWALAGGFDEWK DEPS--GIPGAIPVELNVSLKDLPGDLRDASIVIYCACP--HELSAAMLAQRLNASGFT-------RTWALAGGLDAWR RDEQ--RIPGAIAMDLRAPLQDLQFDPEAGDIVVYCACP--NEVSAAQLAKKLRAAGYR-------NTFALRGGYEAWR RAGGG-IIPGALVWSDLDRKMASLDLPHDAHVVVYCACP--NDASAAQVAKRLMAAGFS-------NVRPLHGGIDAWE RKLDPFVIPGTQFADERQLDEIVATYPRDQKLVIYCSCP--NEISAAWMARQLNEAGFS-------DVLPLRGGMEAWR RALDPFVIPGSQFADERQLDEIVATYPHDQKVVIYCSCP--NEISAAWMAKQMNEAGFA-------DVLPLRGGMEAWR RKLDPFTIPGAQFADERQIGDIVSRYPFSQKFVVYCSCP--NEFTAALMAKRLLDAGFT-------DALALRGGLDAWR RQLQPYTIPGAVFADERQLAQILASVPRDRSVVIYCACP--DEVSAAWLAARMRERGYR-------DVRPLLGGLDAWR RMSQPHRIPGAMLYDMSAKDGPIEIEGPDREIVIYCACP--NEASAVMLARTLMGRGFR-------RVRPLHGGIDAWM AGLDLRHIPGAWRVELSEVATHASQLPRDREIVLYCNCP--NEASAATAAQALRAAGLP-------RVRPLAGGLEGWA VAETG-PITGATVAEHDRLLDAVGEWPKNLPIVTLCACP--EDAGAIQAARQLLNAGFL-------SVRPLKGGYEAWL YENG--HIPGAKLIPVGQLESRLDELP--KDKPLVVYUA--IGGRSRVAVQLLAGKGFS-------KIYNLSGGINAWE FKGN--HIKGFKNIPLQVLPTQLDKIP--KDKEVIVICQ--SGMRSKQAVKQLKKAGYT-------QVTEVSGGMNAWR 
FNSG--HLEGAVNIEVSQLGTRLNEAP--ADKVILVYCR--TGVRSVRASKTLVNAGYT-------DVYNMKGGIMAWM FAGE--RIQGAVNIPIRDLPKRVGELP--KGKPIIVYCK--VGHRGSMAMMFLRGQGY--------NVQSISGGLDGWK FGQG--RLQGAVLIPINEVERRIGEIP--RNRPVVVYCA--VGSRSGLVAGFLSRKGYR-------EVYNMADGIVGWY YAEG--HIPGAMLMPLADLADGMRQLP--AENPLLVYCA--IGGRSRIAAQLLAGNGFS-------KVMNLSGGFKAWN YRQG--HLPGARLVPMGELSDRLDELE--RDGPTLVYCA--IGGRSRVAAQMLAGKGFK-------HVINMAGGFKDWE YEIC--SLPDSKLIPLGDLTSRVHELD--TADDIIVYCH--HGMRSLQAARMLKGMGYK-------KVRNLAGGIDAWA YAAG--HIPKAKHIPLGQLQSRLSELDKHKNKPVLVTCR--SGNRSAHACRILKKAGFE-------SVYNQAGGILAWE FAEG--HIEGAYHIPLGKLEERASEIAQYKEKPVIVTCQ--QGTRSPSACKTLTKQGFS-------RIYEMRGGMLAWR YASG--HLPDAKNIPVAKLADRIGELEKFKDKPIIVCCA--TGMRSNKACAELKKQGFD-------KLHNLAGGVDAWV YAFG--HIPGAVSIPLGELENRMAELP--KDKTIYVVCR--TGTRSDLAAQKLAEKGFD-------RVRNVIPGMSQWN FDKG--HILGARNIPMTQMKQRLIEMR--KDKPIYLYCQ--GSSRSARAAQLLHKKGYK-------EIYQLKGGFKKWT
99 79 60 28 89 45 37 44 51 41 14 69 90 62 70 07 79 45 99 23 85 01 37 01 87 22 71 24 0 69 98 01 15 41 58

Arsenic methylation (alternating reduction and methylation steps):
\[ \mathrm{As^{V}O_4^{3-}} \xrightarrow{+2e} \mathrm{As^{III}O_3^{3-}} \xrightarrow{+\mathrm{CH_3^{+}}} \mathrm{CH_3As^{V}O_3^{2-}} \xrightarrow{+2e} \mathrm{CH_3As^{III}O_2^{2-}} \xrightarrow{+\mathrm{CH_3^{+}}} \mathrm{(CH_3)_2As^{V}O_2^{-}} \xrightarrow{+2e} \mathrm{(CH_3)_2As^{III}O^{-}} \]

Source organisms for the next alignment:
Environment sequence Environment sequence Environment sequence Environment sequence D. psychrophila M. musculus R. norvegicus H. sapiens P. pygmaeus G. gallus T. nigroviridis S. pomeroyi M. degradans T. denitrificans M. magnetotacticum S. thermophilum M. acetivorans M. mazei C. hutchinsonii C. neoformans M. grisea D. hansenii A. nidulans A. variabilis Nostoc sp N. punctiforme Synechocystis sp. G. violaceus V. cholerae V. parahaemolyticus Polaromonas sp E. coli Y. pestis C. briggsae S.
aureus -WADV-VISNGVINLCA--DKKRVFQEIRRVLRPGGRLQFADIANGKAVPASAVR-NIDLWTAUIAGGLPCEGWRAMLEAVGF -------------NLCA--DKKQVFKEIWRVLRPGGRLQFADIANGKPVPPAALR-NIDLWTAUIAGGLPCEGWRSMLEEVGF -SVDV-VMSNGVINLTP--DKITAFSEVFRVLKPGGAFFYGDIVVAEELGESIRR-NIELWTGUIAGALPVAEITPVLTSVGF -------------------------------------------------VAVDCH-DPKLWASUIGGALPEGEVFRLLEESGF -SFDL-VISNGVFNLSL--QKTLLFAEVFRVLKPQGKLQFADIVLRKKLPTEMKG--AAAWSNUIGGAVSVGDQIEYMLEAGF ESYDI-VISNCVINLVP--DKQQVLQEVYRVLKHGGELYFSDVYASLEVPEDIKS-HKVLWGECLGGALYWKDLAIIAQKIGF ESYDI-VISNCVINLVP--DKQKVLREVYQVLKYGGELYFSDVYASLEVSEDIKS-HKVLWGECLGGALYWKDLAVIAKKIGF ESHDI-VVSNCVINLVP--DKQQVLQEAYRVLKHGGELYFSDVYTSLELPEEIRT-HKVLWGECLGGALYWKELAVLAQKIGF ESHDI-VVSNCVINLVP--DKQQVLQEAYRVLKHGGELYFSDVYTSLELPEEIRT-HKVLWGECLGGALYWKELAVLAQKIGF ESYDI-VISNCVINLTP--DKRAVLREAYRVLKPGGEMYFSDVYANQHLSEAMRK-HRVLWGECLAGALFWRDLYSIAKEVGF DSFDI-IISNCVVNLSP--DKKRVLAEAYSVLKDGGELYFSDVYSSGRLTEEMRN-HKVLWGECLSGALWWKDLLLLAEEVGF GSFDI-IVSNCVLNLAT--DKAAVLRGAQRLLKPGGEMYFSDVYADRRIPEALAR-DEVLYGECLSGALYWNDFLSLARGAGF NSFDV-IISNCVINLCT--DKTAVLKHAWHLLKEGGEFYFSDVYADRRIPTNLSQ-DPILYGECLSGAYYWNDFINAAKTAGF -TVDA-IISNCVINLSP--DKVQVFREAFRVLKPGGRLAFSDIVTTAELPEAMQR-EVALYTACVAGAASVDELTAMLADAGF -TADV-VISNCVINLSP--DKPAVLNDAFRVLKPGGRVAISDVVMLRPLPPELAA-MKELLTGCAAGAATVAELSNWLEQAGF -SVDV-IISNCVINLSP--EKEQVFREAFRVLRPGGRIAVADMVSLAPLPPEVRE-DLALYAGCVAGVATVGELRTMLTEAGF -SVDV-IISNCVINLAP--DKEKVFREAFRVLKPGGRMYVSDMVLLEDLPEDLKN-DCDLLAGCVAGALLKEEYLGLLKKAGF -SVDV-IISNCVINLAP--DKEKVFREAFRVLKPEGRMYISDMVLLDELPEELKN-DSELLAGCIAGAVLKEEYLGLLKKAGF -RADV-VVSNCVMNLVP--DKAKAFSEVFRILKPLGHFSISDIVLKGDLPDAIKK-EGEMYAGCVSGAIKKSEYLGILAEQGF -STDC-IISNCVINLVPHDDKHLVFKEIYRLLKAGGRVSVSDLLAKKQITPELQS-HLGLYVGCISGASLVGEYENWLKEAGF -IADC-IISNCVINLVPAAEKHLVFKEIFRLLKPGGRLAVSDILAKKPMPEKIRS-DIALYVGCISGASTVSEYEEFLKDAGF -TADV-VISNCVLNLVPDDEKPTTFKEIYRLLKSGGRVAISDLLSVRELPDTIKN-NLAFYVGCVSGARSVGEYEKWLKEAGF -SADC-IISNCVINLVPKDAKPIVFAEIARLLKPGGRVAISDILARKPLSPAFVS-DIALYVGCVAGASLVEEYEDWLGRAGL 
-SVDI-VAQNCLFNIFEPEDLTRALKEAYRVLKCGGRLQMSDPIATSPIPAHLQQ-DERLRAMCLSGALTYEEYTQRIIDAGF -SVDI-VAQNCLFNIFEPEDLTRALKEAYRVLKPGGRLQMSDPIATSPVPAHLQQ-DERLRAMCLSGALTYEEYTQRIIDAGF -SVDI-VAQNCLFNIFEPEDLNRALKEAYRVLKPGGRLQMSDPIATSQIPAHLQQ-DERLRAMCLSGALTYQEYTERITNAGF -SVDV-VAQNCLFNIFEPEDLSRALKEAYRVLKPHGRLIMSDPIAARPIPQHLRQ-DERLRAMCLSGALTYAEYIQHLIDTGF -AIDL-VAQNCLFNIFEPDDLLTALQEVRRVLVPGGRLVLSDPIASRPIPPHLQA-DERLRAMCLSGCLPLEQYLGCIVEAGF -YFDCITISFCLRNVT---DKDKALRSMFRVLKPGGRLLVLEFSKPILEPLSKLYDTYSFHILPKMGQLIANDADSYRYLAES -YFDVITISFCLRNVT---DKDKALRSMFRVLKPGGRLLVLEFSKPVLEPLSKVYDAYSFHLLPKMGELVANDAESYRYLAES -SCDA-VISNGVINLAP--DKRTVFREAARLLKPGGRLALADIVTETQLPEGITC-DTTLWAACIGGAMQVGDYTSAIEAAGL -TFDCITISFGLRNVT---DKDKALRSMYRVLKPGGRLLVLEFSKPIIEPLSKAYDAYSFHVLPRIGSLVANDADSYRYLAES -FFDCITISFGLRNVT---EKEKALRSMFRVLKPGGRLLVLEFSKPLLEPLSKAYDAYSFHILPKIGELVAQDAESYRYLAES -TYDLFTMSFGIRNCT---HPQKVIAEAFRILKPGGQLAILEFSQVN-AALKPIYDAYSFNVIPVLGEILASDRQSYQYLVES -SFDYVTIGFGLRNVP---DYLVALKEMNRVLKPGGMVVCLETSQPTLPVFKQMYALYFKFVMPIFGKLFAKSKEEYEWLQQS

Alignment annotation: SAM-dependent methyltransferase (CDD10371); Cys157; Cys207 (Cys/Sec).

Resources:
http://genomics.unl.edu/REDOX/REDOXCysSearch/
http://www.selenodb.org/
http://genome.unl.edu/SECISearch.html

Analysis of conservation profile for redox motifs and single cysteines
Redox-active cysteines are typically conserved even in distantly related proteins.
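One simple way to quantify this conservation is to count, for each alignment column, the fraction of sequences that carry cysteine or selenocysteine. The following is a minimal sketch under stated assumptions, not the author's pipeline: the function names and the threshold parameter are hypothetical, U is treated as equivalent to C, and the rows are the first 12 alignment columns of the selenoprotein W family sequences shown in these slides.

```python
def cys_conservation(alignment: list[str]) -> list[float]:
    """Per-column fraction of rows that carry Cys (C) or Sec (U)."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    return [
        sum(row[i] in "CU" for row in alignment) / len(alignment)
        for i in range(width)
    ]


def conserved_cys_columns(alignment: list[str], threshold: float = 0.9) -> list[int]:
    """0-based columns where the Cys/Sec fraction reaches the threshold."""
    return [i for i, f in enumerate(cys_conservation(alignment)) if f >= threshold]


# First 12 columns of the six selenoprotein W rows from the slides.
rows = [
    "PRIAIRYCTQCN",
    "PEIVITYCTQCQ",
    "AQIEIYYCRQCN",
    "VVVRVVYCGAUG",
    "LAVRVVYCGAUG",
    "VIVKVIYCGGUG",
]

print(conserved_cys_columns(rows))  # [7, 10]
```

The two reported columns are the two positions of the CxxC/CxxU motif. In practice the alignment would come from a BLAST or PSI-BLAST search rather than a hand-typed list.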
The blastall and PSI-BLAST programs from the BLAST package can be used for conservation profile analyses.

Filtering out metal-binding cysteines
Major metal-coordinating residues: cysteine and histidine.
Major metal-binding motif: CxxC.
Metal-binding cysteines are conserved even in distantly related organisms.
More than 90% of CxxC motifs are involved in metal coordination.
Metal-binding proteins are the major false-positive hits when identifying thiol oxidoreductases.
Some metal-binding cysteines are involved in redox regulation.

Distribution of metal-binding protein patterns and profiles in the PROSITE database (http://ca.expasy.org/prosite/)

Metal      Patterns  Profiles
Zinc             77        32
Iron             74         5
Calcium          24         4
Copper           22         4
Magnesium        21         0
Nickel           10         0
Manganese         8         0
Cobalt            2         0

There are 77 zinc-binding protein patterns, including 36 patterns that contain at least one cysteine. 30 of these 36 patterns contain one CxxC motif. Of 74 iron-binding protein patterns, 31 contain at least one cysteine. 15 of these 31 contain one CxxC motif.
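Because the same CxxC motif occurs in both redox and metal-binding proteins, a motif scan only produces candidates that still have to be filtered. The following is a minimal sketch of such a scan for CxxC and the CxxC-derived motifs listed in these slides (CxxS, CxxT, SxxC, TxxC, with U allowed in place of C); the function name is hypothetical and the regular expression is an illustration, not the author's search tool.

```python
import re

# A lookahead is used so that overlapping motifs are all reported.
# [CU]..[CUST] covers CxxC, CxxS and CxxT, with U allowed in place of C;
# [ST]..[CU] covers SxxC and TxxC.
MOTIF = re.compile(r"(?=([CU]..[CUST]|[ST]..[CU]))")


def find_redox_motif_candidates(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) pairs for every candidate motif."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Fragments of the thioredoxin and selenoprotein W sequences from the slides.
print(find_redox_motif_candidates("FTAGWCPDCKVIE"))  # [(5, 'CPDC')]
print(find_redox_motif_candidates("IRYCTQUNWLL"))    # [(3, 'CTQU')]
```

The scan assumes ungapped sequences, since the "." in the pattern would also match an alignment gap character.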
Examples: galactitol-1-phosphate dehydrogenases (Zn); peroxiredoxins (thiol/disulfide oxidoreductases).

Amino acid conservation based filter for the Cx1x2C motif
Conservation-profile-based distributions for x1 and x2:
x1: { N1(AA1), N2(AA2), N3(AA3), ..., Nn(AAn) }, N1 > N2 > N3 > ... > Nn
x2: { P1(AA1), P2(AA2), P3(AA3), ..., Pn(AAn) }, P1 > P2 > P3 > ... > Pn
AA - amino acid; Ni, Pi - number of occurrences of amino acid AAi at x1 and x2, respectively.

Metal-binding proteins:
\[ \frac{N_1 + N_2}{N_1 + N_2 + \dots + N_n} + \frac{P_1 + P_2}{P_1 + P_2 + \dots + P_n} \le 1 \]
Redox proteins:
\[ \frac{N_1 + N_2}{N_1 + N_2 + \dots + N_n} + \frac{P_1 + P_2}{P_1 + P_2 + \dots + P_n} > 1 \]

CxxC motifs distance based filter
Metal-binding proteins: two CxxC motifs separated by <= 80 amino acids (CxxCxxxxxx...xxxxxxCxxC).
Redox proteins: two CxxC motifs separated by > 80 amino acids.

Filters used: PROSITE pattern based filter; amino acid conservation based filter; CxxC motifs distance based filter.

Secondary structure context prediction
Secondary structure prediction tools:
PSIPRED - http://bioinf.cs.ucl.ac.uk/psipred/
SSpro - http://www.igb.uci.edu/tools/scratch/

[Figure: Secondary structure distribution (thioredoxin-fold proteins included). Series: alpha-helix, beta-strand, loop; x-axis: position (-10 to 10); y-axis: 0 to 0.9.]
In thioredoxin-fold proteins the redox motif is surrounded by a beta strand and an alpha helix: beta-CxxC-alpha.

[Figure: Secondary structure distribution (non-thioredoxin-fold proteins only). Series: alpha-helix, beta-strand, loop; x-axis: position (-10 to 10); y-axis: 0 to 0.9.]
The active cysteine is followed by an alpha helix in most redox proteins.
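The beta-CxxC-alpha context can be checked automatically once a secondary structure prediction is available. The following is a minimal sketch under stated assumptions: the function name and the flank parameter are hypothetical, the prediction string uses E for strand and H for helix as in the predictions shown in these slides, and the example is the first 20 residues of the thioredoxin sequence with the matching part of its prediction.

```python
import re


def motif_context(sequence: str, ss: str, flank: int = 10) -> list[tuple[int, bool, bool]]:
    """For each CxxC-type motif return (start, strand_before, helix_after).

    strand_before: an 'E' occurs within `flank` positions before the first Cys.
    helix_after:   an 'H' occurs within `flank` positions after the first Cys.
    """
    if len(sequence) != len(ss):
        raise ValueError("sequence and prediction must have the same length")
    results = []
    for m in re.finditer(r"(?=[CU]..[CUST])", sequence):
        i = m.start()
        before = ss[max(0, i - flank):i]
        after = ss[i + 1:i + 1 + flank]
        results.append((i, "E" in before, "H" in after))
    return results


# First 20 residues of the thioredoxin example and of its prediction string.
SEQ = "KGYTVFEFTAGWCPDCKVIE"
SS = "___EEEEEE____HHHHHHH"

print(motif_context(SEQ, SS))  # [(12, True, True)]
```

A candidate for which both flags are True matches the beta-CxxC-alpha pattern; a candidate with only helix_after set matches the CxxC-alpha pattern reported for non-thioredoxin-fold proteins.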
Typical context of the active cysteine: CxxC-alpha.

Secondary structure predictions (E - beta strand, H - alpha helix)

Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________

15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________

Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________

Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____

Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__

Selenoprotein H (a-b-a-b-b-a-a)
AAVVAVAEKREKLANGGEGMEEATVVIEHCTSCRVYGRNAAALSQALRLEAPELPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPEPQEVVEELKKYLS
_HHHHHH________________EEEEEE____HHHHHHHHHHHHHHHHH____EEEE_________EEEEE_______HHHHHHH____________HHHHHHHHHHHH_

Selenoprotein T (b-b-a-b-a-a-a-a-b-b)
GGVPSKRLKMQYATGPLLKFQICVSUGYRRVFEEYMRVISQRYPDIRIEGENYLPQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQAPSIWQWGQENKVYACMMVFFLSNMIENQCMSTGAFEITLNDVPVWSKLES
______EEEEEE____EEEEEEEEE___HHHHHHHHHHHHHH____EEE______HHHHHHHHHHHHHHHHHHHHHH_____HHHH______HHHHHH___HHHHHHHHHHHHHHHHHHH___EEEEEEE__EEEEEEE__

Homology to sporadic selenoproteins and known thiol oxidoreductases: family alignment
Secondary structure elements along the alignment: EEEEEEE HHHHHHHHHHHHHH EEEEEEE EEEEE EEEEEE HHHHHHHHHHHH

Selenoprotein W
2FA8  AGROBACTERIUM TUMEFACIENS  PRIAIRYCTQCN-WLLRAGWMAQEILQTFASDIGEVSLIPST--GGLFEITVD-----GTIIWERKRDG-----GFPG----PKELKQRIRDLID
SelW  PSEUDOMONAS FLUORESCENS    PEIVITYCTQCQ-WLLRAAWLAQELLSTFGDDLGKVSLVPGT--GGIFHITCN-----DVQIWERKADG-----GFPE----AKVLKQRVRDQID
SelW  VIBRIO CHOLERAE            AQIEIYYCRQCN-WMLRSAWLSQELLHTFSEEIEYVALHPDT--GGRFEIFCN-----GVQIWERKQEG-----GFPE----AKVLKQRVRDLID
SelW  BOS TAURUS                 VVVRVVYCGAUG-YKPKYLQLKKKLEDEFPSR-LDICGEGTPQVTGFFEVFVA-----GKLVHSKKGGD-----GYVDTESKFLKLVAAIKAALA
SelW  MUS MUSCULUS               LAVRVVYCGAUG-YKPKYLQLKEKLEHEFPGC-LDICGEGTPQVTGFFEVTVA-----GKLVHSKKRGD-----GYVDTESKFRKLVTAIKAALA
SelW  SEA URCHIN                 VIVKVIYCGGUG-YGPRYRRLKQELKDEFGDD-VDMAGESTPGTTGWLEVXVN-----GKLIHSKKNGD-----GYIDSESKLKKIVNAVSAAM-

Selenoprotein V
SelV  MUS MUSCULUS               ILIRVMYCGLUS-YGLRYIILKRTLEHQFPNL-LEFEEERATQVTGEFEVFVD-----GKLIHSKKKGD-----GFVD-ESGLKKLVGAIDEEIK
SelV  RATTUS NORVEGICUS          ILIRVMYCGLUS-YGLRYILLKKTLEHQFPNL-LEFEEERATQVTGEFEVFVD-----GKLIHSKKKGD-----GFVD-ETSLKKLVGAIDEEIK
SelV  HOMO SAPIENS               VLIRVTYCGLUS-YSLRYILLKKSLEQQFPNH-LLFEEDRAAQATGEFEVFVN-----GRLVHSKKRGD-----GFVN-ESRLQKIVSVIDEEIK

Selenoprotein H
SelH  MUS MUSCULUS               ATVVIEHCTSURVYGRHAAALSQALQLEAPE--LPVQVNPSKPRRGSFEVTLLRSDNSRVELWTGIKKGPPRKLKFPE----PQEVVEELKKYLS
SelH  HOMO SAPIENS               ATVVIEHCTSURVYGRNAAALSQALRLEAPE--LPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPE----PQEVVEELKKYLS
SelH  BOS TAURUS                 PSVVIEHCTSURVYGRNAAALSQALRLQAPE--LTVKVNPARPRRGSFEVTLLRADGSSAELWTGLKKGPPRKLKFPE----PHVVLEELKKYLS
SelH  DANIO RERIO                LRVVIEHCKSURVYGRNAVVVREALADSHPE--LKVMINPHNPRRNSFEITLMDG-ERADVLWSGIKKGPPRKLKFPE----PAEVVTALKQALE

Rdx12
Rdx12 SUS SCROFA                 VRIVVEYCEPCG-FEATYLELASAVKEQYPG--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDFIEAIRRASN
Rdx12 HOMO SAPIENS               VRIVVEYCEPCG-FEATYLELASAVKEQYPG--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDLIEAIRRASN
Rdx12 GALLUS GALLUS              VHIMVEYCEPCG-FGATYEELASAVREEYPD--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDLIEAIRRARN
Rdx12 DANIO RERIO 2A             VQIKVEYCGGUG-YEPRYQELKRVVTAEFTD--ADVSGFVGR--QGSFEIEIN-----GQLIFSKLETS-----GFPY----EDDIMGVIQRAYD
Rdx12 DANIO RERIO 2              VKVKIEYCGAUG-YEPRFQELKREICGNCPD--AEVSGFVGR--RGCFEIQIN-----DFLVFSKLESG-----GFPY----SEDIIEAVVKAKD

Selenoprotein T (shown in three segments)
Segment 1; secondary structure: EEEEEEEEE HHHHHHHHHHHHHH EEEEEE
SelT  BOS TAURUS                 PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT  SUS SCROFA                 PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT  MUS MUSCULUS               PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT  ANOPHELES GAMBIAE          ATMTFLYCYSCG-YRKAFDDYHNLILEKYPE--ITIRGSNYD
SelT  SOLANUM TUBEROSUM          NTVTIDFCSSCS-YRGTAVTMKNMLDNQFPG--IHVVLANYP
SelT  MEDICAGO TRUNCATULA        NTVSIDFCTSCS-YKGNAVSVKNTLESLFPG--INVVLANYP
Segment 2; secondary structure: EEEEE EEEEE HHHHHHHHHHH
SelT  BOS TAURUS                 TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT  SUS SCROFA                 TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT  MUS MUSCULUS               TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT  ANOPHELES GAMBIAE          SGAFEITLN-----DVPVWSKLETG-----RFPA----PQEMFQIIDNHLQ
SelT  SOLANUM TUBEROSUM          SGAFEVYCN-----GELVFSKLKEN-----RFPG----ELELKDLVGRKIA
SelT  MEDICAGO TRUNCATULA        SGAFEVYFN-----GELVFSKLKEN-----RFPG----EFELKELIGRRIG
Segment 3 (region located between segments 1 and 2 in the sequence); secondary structure: HHHHHHHHHHHHHHHHHHHHHHH HHHH HHHHH HHHHHHHHHHHHHHHHHHHHHHH
SelT  BOS TAURUS                 PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT  SUS SCROFA                 PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT  MUS MUSCULUS               PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT  ANOPHELES GAMBIAE          PSGVNMLLSKVLLVTKLLLIAALMSNYDIGRYIGNP-FAGWWQWC-FNNKLYASMMIFFLGNTLEAQLIS
SelT  SOLANUM TUBEROSUM          PPLPKRLLGKVVPVFQFGVIGLVMAGEQIFPRLGIAVPPPWFYQL-RANRFGTMATTWLLGNFFQSMLQS
SelT  MEDICAGO TRUNCATULA        PPLPKRALSKVVPVLQTGAIIAITAGDQIFPRLGVT-PPQLYYSL-RANKFGSIASIWLLSNFVQSFLQS

Structure modeling and annotation
Sequence or structural similarity to known proteins
Functional associations
3D-Jury system: http://bioinfo.pl/meta/

[Figure: SelR and MsrA across completed genomes. Genomes listed:]
Eukaryotes: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae
Archaea: Aeropyrum pernix, Sulfolobus solfataricus, Sulfolobus tokodaii, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus, Methanococcus jannaschii, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma acidophilum, Thermoplasma volcanium
Bacteria: Aquifex aeolicus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Synechocystis sp. PCC 6803, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis H37Rv, Bacillus halodurans, Bacillus subtilis, Clostridium acetobutylicum, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Ureaplasma urealyticum, Lactococcus lactis subsp. lactis, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N315, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Caulobacter crescentus, Agrobacterium tumefaciens, Mesorhizobium loti, Sinorhizobium meliloti, Rickettsia conorii, Rickettsia prowazekii, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Campylobacter jejuni, Helicobacter pylori 26695, Helicobacter pylori J99, Escherichia coli K12, Escherichia coli O157:H7, Escherichia coli O157:H7 EDL933, Yersinia pestis, Buchnera sp. APS, Vibrio cholerae, Xylella fastidiosa 9a5c, Haemophilus influenzae Rd, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Treponema pallidum, Thermotoga maritima, Deinococcus radiodurans

STRING - Search Tool: http://string.embl.de/
Example: yeast mitochondrial glutaredoxin Grx5 (COG0278)
>gi|6325198| Grx5p [Saccharomyces cerevisiae]
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR
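Before a sequence such as Grx5 is submitted to a search tool, it can be useful to read the FASTA record and list its cysteines. The following is a minimal sketch, not part of the original slides: the helper parse_fasta is hypothetical, positions are 1-based and refer to the record exactly as given, and the CxxS pattern is one of the CxxC-derived motifs listed earlier.

```python
import re


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


GRX5 = """>gi|6325198| Grx5p [Saccharomyces cerevisiae]
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR
"""

header, seq = parse_fasta(GRX5)[0]
# 1-based positions of all cysteines and of CxxS motifs in the record.
cys_positions = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
cxxs = [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(C..S))", seq)]

print(header)
print(cys_positions)  # [60, 117]
print(cxxs)           # [(60, 'CGFS')]
```

The single CxxS hit, CGFS at position 60, is the kind of candidate that the conservation and secondary structure steps of the workflow would then evaluate.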