Download 15 kDa Protein (babbba) - Redox Bioinformatics

Bioinformatics Methods in Redox Biology
Dmitri Fomenko
Redox Biology Center
University of Nebraska - Lincoln
Protein set
Searches for cysteines
Redox-cofactor binding prediction
Analysis of conservation profile for redox motifs and single cysteines
Filtering out metal-binding cysteines
Homology to sporadic selenoproteins and known thiol oxidoreductases
Secondary structure context prediction
Structure modeling
Annotation
Cysteine
1. Cysteines with redox-catalytic activity. Such cysteines are directly involved in
catalysis and occur in oxidoreductases. Examples: thioredoxins, glutaredoxins,
glutathione peroxidases, peroxiredoxins, methionine sulfoxide reductases.
2. Regulatory cysteines. Protein activity is regulated by the redox state of these
non-catalytic cysteines. Examples: the transcription factors OxyR and Yap1, the
chaperone Hsp33, mitochondrial branched-chain aminotransferase.
3. Structural cysteines. These cysteines are involved in the formation of intramolecular
and intermolecular disulfide bonds during oxidative folding and occur in various
protein types.
4. Metal-coordinating cysteines. These residues are involved in the coordination of
divalent metal ions. Examples: iron-sulfur clusters, zinc-binding proteins,
calcium-binding proteins.
5. Catalytic cysteines that do not change their redox state during catalysis.
Examples: cysteine proteases, GAPDH.
Cysteine is one of the two least abundant amino acid residues in proteins, but it is the
most conserved amino acid. Functional cysteines are highly conserved even in
distantly related organisms.
Major redox motif - CxxC
x – any amino acid
CxxC-derived redox motifs:
CxxS, CxxT, SxxC, TxxC
Cysteines in the CxxC redox motif may be replaced with
selenocysteine (U)
Redox-active cysteines are accessible for interactions and located
on the protein surface
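The motif definitions above (CxxC, the derived CxxS, CxxT, SxxC and TxxC forms, and selenocysteine U in place of cysteine) can be sketched as a regular-expression scan. This is a minimal illustration, not the search tool used in the talk; the function name `find_redox_motifs` and the output format are assumptions made for this example.

```python
import re

# Lookahead with a capture group so that overlapping 4-residue windows are all tested.
# [CUST] at both ends covers Cys, Sec (U), Ser and Thr; "." is any amino acid (x).
MOTIF = re.compile(r"(?=([CUST]..[CUST]))")


def find_redox_motifs(seq):
    """Return (1-based position, motif type, matched residues) for each candidate motif.

    Hypothetical helper: keeps only windows where at least one end is Cys or Sec,
    so SxxS, SxxT, TxxS and TxxT windows are discarded.
    """
    hits = []
    for match in MOTIF.finditer(seq.upper()):
        window = match.group(1)
        first, last = window[0], window[3]
        if first in "CU" or last in "CU":
            hits.append((match.start(1) + 1, f"{first}xx{last}", window))
    return hits


if __name__ == "__main__":
    # Fragments taken from sequences quoted later in this transcript.
    print(find_redox_motifs("TAGWCPDCKVI"))  # thioredoxin fragment
    print(find_redox_motifs("YCTQUNW"))      # selenoprotein W fragment
```

A scan like this only lists candidates; the conservation, metal-binding and secondary-structure filters described in the rest of the talk are what separate redox motifs from the many incidental matches.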
Thioredoxin (b-a-b-a-b-b-a)
Major representatives:
Thioredoxins
Glutaredoxins
Peroxiredoxins
Glutathione peroxidases
Protein disulfide isomerases (PDI)
More than 60% of known thiol oxidoreductases
are thioredoxin-fold proteins
[Figure: Amino acid distribution. Y-axis: 0 to 1; x-axis: amino acid position (1 to 21); one series per amino acid (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y).]
Cysteine: pKa = 8.3; TGT, TGC codons
Selenocysteine: pKa = 5.2; TGA codon
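[Diagram: selenocysteine insertion in eukaryotes and bacteria]
Eukaryotes: 5’ - AUG (initiation of translation) - UGA (selenocysteine) - STOP (UAA, UAG, UGA) - SECIS - AAAAAAA 3’
Bacteria: 5’ - AUG (initiation of translation) - UGA (selenocysteine) - SECIS - STOP (UAA, UAG, UGA) - 3’
Distance labels in the diagram: 100 - 5000 bp and 20 - 50 bp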
[Diagram: search for Cys/Sec pairs. Cysteine-containing protein sequences (xxxxxxxCxxxxx) are compared by TBLASTN with translated nucleotide sequences to find homologs that have selenocysteine (xxxxxxxUxxxxx) at the aligned position.]
Protein Sequence Database:
NCBI nonredundant protein set
ORFs from completed genomes
ORFs from environmental genomes
TBLASTN (protein query against translated DNA)
Nucleotide Sequence Database:
Completed genomes
dbEST
WGS
Environmental genomes
Example TBLASTN hit (protein query, match line, translated subject, subject DNA):
G P K P T R K R Y C I N S A S I E F
G P K     +   R + C I   S + S + + F
G P K R G Q S R F U I F S S S L K F
gccccaagcggggacaatcgagattctgaatatttagcagctcactgaagttc
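The example above pairs a cysteine in the query with an in-frame TGA in the subject DNA. A minimal sketch of that translation step follows, using the standard genetic code and an optional switch that reports TGA as selenocysteine (U). The function name `translate` and its parameters are assumptions for illustration; TBLASTN itself reports TGA as a stop (`*`) and searches all six reading frames. The DNA string is copied from the transcript, where it begins mid-codon, so the leading G of the subject line is not reproduced.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, offset=0, tga_as_sec=False):
    """Translate one forward reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")
        if tga_as_sec and codon == "TGA":
            aa = "U"  # assumption: treat the in-frame TGA as a Sec codon
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "gccccaagcggggacaatcgagattctgaatatttagcagctcactgaagttc"
    for offset in range(3):
        print(offset, translate(dna, offset), translate(dna, offset, tga_as_sec=True))
```

With offset 2 the translation matches the subject line of the example from its second residue onward, with `*` (or `U` when `tga_as_sec=True`) at the position aligned to the query cysteine.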
One of the most popular sequence analysis and alignment tools is
BLAST (Basic Local Alignment Search Tool). BLAST is a set of
sequence comparison programs used to search sequence
databases for optimal local alignments to a query sequence.
BLAST programs have good sensitivity and are reasonably fast. The
program exists as a standalone version for most computer platforms and
operating systems, and as a web version at
http://www.ncbi.nlm.nih.gov/BLAST/.
NCBI sequence databases are updated daily and contain all protein and
nucleotide sequences from open resources.
There is a specialized, extra-sensitive tool for protein sequence alignment
called PSI-BLAST (Position-Specific Iterative BLAST). This tool can be
extremely useful for identifying similarity between distantly related
proteins.
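To make "optimal local alignment" concrete, here is a minimal Smith-Waterman scoring sketch. This is a swapped-in textbook technique, not BLAST: BLAST uses a faster heuristic that approximates this kind of search and scores proteins with substitution matrices. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GPKPTRKRYCINSASIEF", "GPKRGQSRFUIFSSSLKF"))
```

The floor at zero is what makes the alignment local: a well-matching segment is scored on its own, regardless of unrelated flanking sequence.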
ID        Start Aligned sequence
Query     1     MEFLAARASALLGCYYUTTSHAMRLGMSGKD--TG
44262850  546   MEFLAARASALLGCYY*TTSHAMRLGMSGKD--TG
44272486  736   MEFLAARASALLGCYY*TTSHAMRLGMSGKD--TG
42886178  83     EIVAGRVSALNECFF*TTGHARFLRESGRD--TD
43971738  356    ELLAARVSSANECFY*TTSHVLLLRESSKA--IS
43196490  225   VEVVAGKVSALNECFY*TSSHARFLRESGRV--EN
43897127  193   MELIAGRVSSVNECFY*TSAHATMLRVSAMT--TE
44572416  1196   ELLAARVSSANECFY*TTSHVLLLRESSKA--IS
43293792  429   VELLASTVSALNECFY*TAAHVSLLRASSEA--LN
44390226  774   VEIIAARTSSINECFY*ATSHVMLLRASSKA--TE
43598071  414    EIVAGRISALNECFY*TNGHAKALREGAKL--AG
43498422  399   VEVIAARTSKLNECFY*TTSHALLLGLSGMN--QD
60035720  612    EMIAVCVAEASRCPYCADAHMIYLLASGMD--GD
60052585  472   LELVSFRVSQINGCAYCLDMHSKDLRVKGET--EQ
Conserved domains: CD01448, CD00158, CD01444, CD01524
ID        Species                 Aligned sequence
44275476  Environmental sequence  WLNFINRGKDKTFKTPEQIFEILNNAGVDPEKQIVTYUQ--GGIRAAHVMFVLALVSTFSPNINYDRVKVYDGSMGEWA
43834666  Environmental sequence  WLEFIDENNNNKFKSQNEIESILNKQNITYEKQIATYUQ--GGIRAAHVFVVLKLIG-------YKNIKVYDASMGEYA
44586938  Environmental sequence  WFNLMDR-QTHLFRSEEDIKAILADNGIALDKAIYTYUQ--AGVRAAHANFVLQLIG-------QSEARVYDGSMGEWA
49176232  E. coli                 WTELVRE-GELKT--TDELDAIFFGRGVSYDKPIIVSCG--SGVTAAVVLLALATLD-------VPNVKLYDGAWSEWG
37525471  P. luminescens          WTMLVEN-GHFKS--ETEITDIFHKQGVDLNKPVITSCG--SGMTAAVLVLGLDIIG-------KKDVYLYDGSWAEWG
9658033   V. cholerae             FAELITG-HKLKE--QAELRPLLTHMLPETAQEYLFSCG--SGVTACIVLLAAYVCG-------YKNLSVYDGSWTEWG
24372842  S. oneidensis           FGEVLNG-YKMKS--TTELQAIFQALVGNKALR-IFSCG--SGITACILILASVVAG-------HKSAVLYDGSWADWG
54302561  P. profundum            FSQLIKD-GFFID--KELLVNRFNAVS-DIEQRLIFSCG--SGVTACVLALGAELAG-------RKMLTVYDGSWTEWG
50905511  O. sativa               FLEMFDDAPMLLP--ADEIRKKFEQAGISLDRPIVVTCG--SGVTACILALGLYRIG-------KQDIPVYDGSWTEWE
4406372   D. glomerata            FPQILDASQALLP--ADELKKRFDQEGISLESPIVTSCG--TGVTACILALGLHRLG-------KSDVAVYDGSWTEWG
39996033  G. sulfurreducens       YNEA--HIPTAVSIPFAELEKNPALLTASKDRLLVFYCGGVTUVLSPKSAGLAKKSGYE-------KVRVYLDGEPEWK
45360053  R. xylanophilus         YRQG--HLPGAVRYESAEQVRKLAPQKDAF-IVAY--CSNFNUHSSTRVARELAAMGYE-------NVYDYEGGKQDWV
34557042  W. succinogenes         FRES--TIPGSIGVSDGKFKELWGRLPMDPNTKVVVFCGGYECELSHSVAGHMVAMGYK-------NFMTYSGGTPEWK
48834389  Magnetococcus sp.       YAAG--HLPGAVNIPLKDLEANLALLPAGQEVVAY--CRGPWCVLAFDAVARLRARGI--------KARRLQDGLPEWR
46106717  R. xylanophilus         YRAG--HIPGALSVPLERLEAYLAEIPKDQEIVAY--CRGPYCVFADEAVALLRSRGY--------RARRLQEGLPDWR
52007258  T. denitrificans        YQAG--HIPGAVNIPIDELPHHLEALPQGQEIVAY--CRGPYCMLAFDAVATLRQAGY--------QARRLEDGFPEWK
46317410  B. cepacia              FTEG--HLPGALNIPLSELDARVSELPAGTEIVAY--CRGPYCVFAVEAVAALRARGF--------KAARLEDGFPEWK
54029015  Polaromonas sp.         FTSA--HLPRARSLPVDELKKRLNELPKDVPIVAY--CRGPFCLMAKDAVELLRKKGY--------RAFHLTDGVAEWR
48838221  M. barkeri              YEMM--HIPGANSIPLEDLEKHLATLPINQEIVAY--CRGRCCLLSVEAVEILRAHGF--------KAVRLEASVQEWL
54025306  N. farcinica            YSAG--HIPGAINIPIDQLSDRIAELPADTEIVVY--CRGEYCVFAYDAVRLLTERGR--------RAVRLRDGMLEWR
3955039   S. peucetius            YLAG--HIPGAVCIPVAELTDRIGELAKDTEVVVY--CRGEYCALAYDAVRLLTDHGR--------RAIRLNDGMLEWR
15607465  M. tuberculosis         YQAG--HIPGAINIPIAELADRLAELTGDRDIVAY--CRGAYCVMAPDAVRIARDAGR--------EVKRLDDGMLEWR
20807605  T. tengcongensis        YEQA--HIKGSISIPLEELPNHLNCLSKDKLIVTY--CASYECTSCIEAAELLANYGF--------NVKVYRGGTKEWI
46581156  D. vulgaris             FPIPDMNDWNMAETGDKSQQDFEALLGPDKNRPLVFYCGFVKCTRSHNGAVWAQKLGYT-------NVYRMPGGIVAWK
50874889  D. psychrophila         WTSSEFKIKGAHRANPGKLDTWKSKFAKDKKIVLYCAUP--NESTSASLARKLTADGFS-------SVHALKGGWREWS
48847313  G. metallireducens      AVGAINLPNDGPADIERIKQMELPFTKKDEIIV-YCSUA--GEQASARVALVLIERGFT-------KTYVVRGGRQAVF
53760573  R. eutropha             VPIE--RIPGAVVVDMHGPLDALGGQLESRDIVVYCACP--NEISAAILAERLRVAGYG-------KTWALAGGFDEWK
48768248  R. metallidurans        APIE--RIPGSIVMEIKGPFDTLSGHDASSDFVVYCACP--HEMSAAVLAERLRTAGYP-------NTWALAGGFDEWK
15596406  P. aeruginosa           DEPS--GIPGAIPVELNVSLKDLPGDLRDASIVIYCACP--HELSAAMLAQRLNASGFT-------RTWALAGGLDAWR
33603950  B. bronchiseptica       RDEQ--RIPGAIAMDLRAPLQDLQFDPEAGDIVVYCACP--NEVSAAQLAKKLRAAGYR-------NTFALRGGYEAWR
54029683  Polaromonas sp.         RAGGG-IIPGALVWSDLDRKMASLDLPHDAHVVVYCACP--NDASAAQVAKRLMAAGFS-------NVRPLHGGIDAWE
48782625  B. fungorum             RKLDPFVIPGTQFADERQLDEIVATYPRDQKLVIYCSCP--NEISAAWMARQLNEAGFS-------DVLPLRGGMEAWR
44357259  Environmental sequence  RALDPFVIPGSQFADERQLDEIVATYPHDQKVVIYCSCP--NEISAAWMAKQMNEAGFA-------DVLPLRGGMEAWR
44624393  Environmental sequence  RKLDPFTIPGAQFADERQIGDIVSRYPFSQKFVVYCSCP--NEFTAALMAKRLLDAGFT-------DALALRGGLDAWR
21242497  X. axonopodis           RQLQPYTIPGAVFADERQLAQILASVPRDRSVVIYCACP--DEVSAAWLAARMRERGYR-------DVRPLLGGLDAWR
17548617  R. solanacearum         RMSQPHRIPGAMLYDMSAKDGPIEIEGPDREIVIYCACP--NEASAVMLARTLMGRGFR-------RVRPLHGGIDAWM
47573188  R. gelatinosus          AGLDLRHIPGAWRVELSEVATHASQLPRDREIVLYCNCP--NEASAATAAQALRAAGLP-------RVRPLAGGLEGWA
53730919  D. aromatica            VAETG-PITGATVAEHDRLLDAVGEWPKNLPIVTLCACP--EDAGAIQAARQLLNAGFL-------SVRPLKGGYEAWL
50874889  D. psychrophila         YENG--HIPGAKLIPVGQLESRLDELP--KDKPLVVYUA--IGGRSRVAVQLLAGKGFS-------KIYNLSGGINAWE
46112894  Exiguobacterium sp.     FKGN--HIKGFKNIPLQVLPTQLDKIP--KDKEVIVICQ--SGMRSKQAVKQLKKAGYT-------QVTEVSGGMNAWR
46142555  M. burtonii             FNSG--HLEGAVNIEVSQLGTRLNEAP--ADKVILVYCR--TGVRSVRASKTLVNAGYT-------DVYNMKGGIMAWM
46198460  T. thermophilus         FAGE--RIQGAVNIPIRDLPKRVGELP--KGKPIIVYCK--VGHRGSMAMMFLRGQGY--------NVQSISGGLDGWK
48847131  G. metallireducens      FGQG--RLQGAVLIPINEVERRIGEIP--RNRPVVVYCA--VGSRSGLVAGFLSRKGYR-------EVYNMADGIVGWY
46580382  D. vulgaris             YAEG--HIPGAMLMPLADLADGMRQLP--AENPLLVYCA--IGGRSRIAAQLLAGNGFS-------KVMNLSGGFKAWN
53691784  D. desulfuricans        YRQG--HLPGARLVPMGELSDRLDELE--RDGPTLVYCA--IGGRSRVAAQMLAGKGFK-------HVINMAGGFKDWE
43887282  Environmental sequence  YEIC--SLPDSKLIPLGDLTSRVHELD--TADDIIVYCH--HGMRSLQAARMLKGMGYK-------KVRNLAGGIDAWA
52006282  T. denitrificans        YAAG--HIPKAKHIPLGQLQSRLSELDKHKNKPVLVTCR--SGNRSAHACRILKKAGFE-------SVYNQAGGILAWE
53757119  M. capsulatus           FAEG--HIEGAYHIPLGKLEERASEIAQYKEKPVIVTCQ--QGTRSPSACKTLTKQGFS-------RIYEMRGGMLAWR
53729577  D. aromatica            YASG--HLPDAKNIPVAKLADRIGELEKFKDKPIIVCCA--TGMRSNKACAELKKQGFD-------KLHNLAGGVDAWV
56420601  G. kaustophilus         YAFG--HIPGAVSIPLGELENRMAELP--KDKTIYVVCR--TGTRSDLAAQKLAEKGFD-------RVRNVIPGMSQWN
23099356  O. iheyensis            FDKG--HILGARNIPMTQMKQRLIEMR--KDKPIYLYCQ--GSSRSARAAQLLHKKGYK-------EIYQLKGGFKKWT
Arsenic reduction and methylation steps:
\[ \mathrm{As^{V}O_4^{3-}} \xrightarrow{+2e} \mathrm{As^{III}O_3^{3-}} \xrightarrow{+CH_3^{+}} \mathrm{CH_3As^{V}O_3^{2-}} \xrightarrow{+2e} \mathrm{CH_3As^{III}O_2^{2-}} \xrightarrow{+CH_3^{+}} \mathrm{(CH_3)_2As^{V}O_2^{-}} \xrightarrow{+2e} \mathrm{(CH_3)_2As^{III}O} \]
Species               Aligned sequence
Environment sequence  -WADV-VISNGVINLCA--DKKRVFQEIRRVLRPGGRLQFADIANGKAVPASAVR-NIDLWTAUIAGGLPCEGWRAMLEAVGF
Environment sequence  -------------NLCA--DKKQVFKEIWRVLRPGGRLQFADIANGKPVPPAALR-NIDLWTAUIAGGLPCEGWRSMLEEVGF
Environment sequence  -SVDV-VMSNGVINLTP--DKITAFSEVFRVLKPGGAFFYGDIVVAEELGESIRR-NIELWTGUIAGALPVAEITPVLTSVGF
Environment sequence  -------------------------------------------------VAVDCH-DPKLWASUIGGALPEGEVFRLLEESGF
D. psychrophila       -SFDL-VISNGVFNLSL--QKTLLFAEVFRVLKPQGKLQFADIVLRKKLPTEMKG--AAAWSNUIGGAVSVGDQIEYMLEAGF
M. musculus           ESYDI-VISNCVINLVP--DKQQVLQEVYRVLKHGGELYFSDVYASLEVPEDIKS-HKVLWGECLGGALYWKDLAIIAQKIGF
R. norvegicus         ESYDI-VISNCVINLVP--DKQKVLREVYQVLKYGGELYFSDVYASLEVSEDIKS-HKVLWGECLGGALYWKDLAVIAKKIGF
H. sapiens            ESHDI-VVSNCVINLVP--DKQQVLQEAYRVLKHGGELYFSDVYTSLELPEEIRT-HKVLWGECLGGALYWKELAVLAQKIGF
P. pygmaeus           ESHDI-VVSNCVINLVP--DKQQVLQEAYRVLKHGGELYFSDVYTSLELPEEIRT-HKVLWGECLGGALYWKELAVLAQKIGF
G. gallus             ESYDI-VISNCVINLTP--DKRAVLREAYRVLKPGGEMYFSDVYANQHLSEAMRK-HRVLWGECLAGALFWRDLYSIAKEVGF
T. nigroviridis       DSFDI-IISNCVVNLSP--DKKRVLAEAYSVLKDGGELYFSDVYSSGRLTEEMRN-HKVLWGECLSGALWWKDLLLLAEEVGF
S. pomeroyi           GSFDI-IVSNCVLNLAT--DKAAVLRGAQRLLKPGGEMYFSDVYADRRIPEALAR-DEVLYGECLSGALYWNDFLSLARGAGF
M. degradans          NSFDV-IISNCVINLCT--DKTAVLKHAWHLLKEGGEFYFSDVYADRRIPTNLSQ-DPILYGECLSGAYYWNDFINAAKTAGF
T. denitrificans      -TVDA-IISNCVINLSP--DKVQVFREAFRVLKPGGRLAFSDIVTTAELPEAMQR-EVALYTACVAGAASVDELTAMLADAGF
M. magnetotacticum    -TADV-VISNCVINLSP--DKPAVLNDAFRVLKPGGRVAISDVVMLRPLPPELAA-MKELLTGCAAGAATVAELSNWLEQAGF
S. thermophilum       -SVDV-IISNCVINLSP--EKEQVFREAFRVLRPGGRIAVADMVSLAPLPPEVRE-DLALYAGCVAGVATVGELRTMLTEAGF
M. acetivorans        -SVDV-IISNCVINLAP--DKEKVFREAFRVLKPGGRMYVSDMVLLEDLPEDLKN-DCDLLAGCVAGALLKEEYLGLLKKAGF
M. mazei              -SVDV-IISNCVINLAP--DKEKVFREAFRVLKPEGRMYISDMVLLDELPEELKN-DSELLAGCIAGAVLKEEYLGLLKKAGF
C. hutchinsonii       -RADV-VVSNCVMNLVP--DKAKAFSEVFRILKPLGHFSISDIVLKGDLPDAIKK-EGEMYAGCVSGAIKKSEYLGILAEQGF
C. neoformans         -STDC-IISNCVINLVPHDDKHLVFKEIYRLLKAGGRVSVSDLLAKKQITPELQS-HLGLYVGCISGASLVGEYENWLKEAGF
M. grisea             -IADC-IISNCVINLVPAAEKHLVFKEIFRLLKPGGRLAVSDILAKKPMPEKIRS-DIALYVGCISGASTVSEYEEFLKDAGF
D. hansenii           -TADV-VISNCVLNLVPDDEKPTTFKEIYRLLKSGGRVAISDLLSVRELPDTIKN-NLAFYVGCVSGARSVGEYEKWLKEAGF
A. nidulans           -SADC-IISNCVINLVPKDAKPIVFAEIARLLKPGGRVAISDILARKPLSPAFVS-DIALYVGCVAGASLVEEYEDWLGRAGL
A. variabilis         -SVDI-VAQNCLFNIFEPEDLTRALKEAYRVLKCGGRLQMSDPIATSPIPAHLQQ-DERLRAMCLSGALTYEEYTQRIIDAGF
Nostoc sp             -SVDI-VAQNCLFNIFEPEDLTRALKEAYRVLKPGGRLQMSDPIATSPVPAHLQQ-DERLRAMCLSGALTYEEYTQRIIDAGF
N. punctiforme        -SVDI-VAQNCLFNIFEPEDLNRALKEAYRVLKPGGRLQMSDPIATSQIPAHLQQ-DERLRAMCLSGALTYQEYTERITNAGF
Synechocystis sp.     -SVDV-VAQNCLFNIFEPEDLSRALKEAYRVLKPHGRLIMSDPIAARPIPQHLRQ-DERLRAMCLSGALTYAEYIQHLIDTGF
G. violaceus          -AIDL-VAQNCLFNIFEPDDLLTALQEVRRVLVPGGRLVLSDPIASRPIPPHLQA-DERLRAMCLSGCLPLEQYLGCIVEAGF
V. cholerae           -YFDCITISFCLRNVT---DKDKALRSMFRVLKPGGRLLVLEFSKPILEPLSKLYDTYSFHILPKMGQLIANDADSYRYLAES
V. parahaemolyticus   -YFDVITISFCLRNVT---DKDKALRSMFRVLKPGGRLLVLEFSKPVLEPLSKVYDAYSFHLLPKMGELVANDAESYRYLAES
Polaromonas sp        -SCDA-VISNGVINLAP--DKRTVFREAARLLKPGGRLALADIVTETQLPEGITC-DTTLWAACIGGAMQVGDYTSAIEAAGL
E. coli               -TFDCITISFGLRNVT---DKDKALRSMYRVLKPGGRLLVLEFSKPIIEPLSKAYDAYSFHVLPRIGSLVANDADSYRYLAES
Y. pestis             -FFDCITISFGLRNVT---EKEKALRSMFRVLKPGGRLLVLEFSKPLLEPLSKAYDAYSFHILPKIGELVAQDAESYRYLAES
C. briggsae           -TYDLFTMSFGIRNCT---HPQKVIAEAFRILKPGGQLAILEFSQVN-AALKPIYDAYSFNVIPVLGEILASDRQSYQYLVES
S. aureus             -SFDYVTIGFGLRNVP---DYLVALKEMNRVLKPGGMVVCLETSQPTLPVFKQMYALYFKFVMPIFGKLFAKSKEEYEWLQQS
CDD10371
Cys157
SAM-dependent methyltransferase
Cys207 (Cys/Sec)
http://genomics.unl.edu/REDOX/REDOXCysSearch/
http://www.selenodb.org/
http://genome.unl.edu/SECISearch.html
Redox-active cysteines are typically conserved even in distantly
related proteins.
The blastall and PSI-BLAST programs from the BLAST suite can be used for
conservation profile analysis.
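Once homologs have been aligned, the conservation of a candidate cysteine can be read off column by column. The sketch below assumes a multiple alignment is already available as equal-length strings; the function names and the 0.9 threshold are assumptions for illustration, not values from the talk. The example rows are the first 12 columns of the selenoprotein W alignment shown later in this transcript.

```python
def cys_column_fraction(alignment, count_sec=True):
    """Fraction of rows with Cys (and optionally Sec, U) in each alignment column."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    targets = "CU" if count_sec else "C"
    return [
        sum(row[col] in targets for row in alignment) / len(alignment)
        for col in range(width)
    ]


def conserved_cys_columns(alignment, threshold=0.9, count_sec=True):
    """0-based columns where the Cys/Sec fraction reaches the (assumed) threshold."""
    fractions = cys_column_fraction(alignment, count_sec)
    return [col for col, frac in enumerate(fractions) if frac >= threshold]


if __name__ == "__main__":
    rows = [
        "PRIAIRYCTQCN",
        "PEIVITYCTQCQ",
        "AQIEIYYCRQCN",
        "VVVRVVYCGAUG",
        "LAVRVVYCGAUG",
        "VIVKVIYCGGUG",
    ]
    print(conserved_cys_columns(rows))                   # Cys and Sec counted together
    print(conserved_cys_columns(rows, count_sec=False))  # Cys only
```

Counting U together with C matters here: the second position of the motif is cysteine in some homologs and selenocysteine in others, so it only appears fully conserved when both are treated as equivalent.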
Major metal coordinating residues – Cysteine and Histidine
Major metal binding motif - CxxC
Metal-binding cysteines are conserved
even in distantly related organisms
More than 90% of CxxC motifs are
involved in metal coordination
Metal-binding proteins are the major false-positive
hits when identifying thiol oxidoreductases
Some metal-binding cysteines
are involved in redox regulation
Distribution of metal-binding protein patterns and profiles in the PROSITE database
http://ca.expasy.org/prosite/
Metal       Patterns  Profiles
Zinc        77        32
Iron        74        5
Calcium     24        4
Copper      22        4
Magnesium   21        0
Nickel      10        0
Manganese   8         0
Cobalt      2         0
There are 77 zinc-binding protein patterns, including 36 patterns that
contain at least one cysteine. 30 of these 36 patterns contain one CxxC motif.
Of 74 iron-binding protein patterns, 31 contain at least one cysteine. 15 of these 31
contain one CxxC motif.
Galactitol-1-phosphate dehydrogenases (Zn)
Peroxiredoxins – thiol/disulfide
oxidoreductase
Cx1x2C
Conservation-profile-based distribution for X1 and X2
\[ X_1 = \{ N_1(AA_1), N_2(AA_2), N_3(AA_3), \ldots, N_n(AA_n) \}, \qquad N_1 > N_2 > N_3 > \cdots > N_n \]
\[ X_2 = \{ P_1(AA_1), P_2(AA_2), P_3(AA_3), \ldots, P_n(AA_n) \}, \qquad P_1 > P_2 > P_3 > \cdots > P_n \]
AA - amino acid;
N_n, P_n - number of amino acids
Metal-binding proteins:
\[ \frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \cdots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \cdots + P_n(AA_n)} \le 1 \]
Redox proteins:
\[ \frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \cdots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \cdots + P_n(AA_n)} > 1 \]
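The conservation-based rule above can be sketched in a few lines: for each of the two x positions of the aligned CxxC motifs, take the share of the two most frequent residues, add the two shares, and compare the sum with 1. The function names are assumptions for this illustration, and the example motif lists are hypothetical rather than data from the talk.

```python
from collections import Counter


def top2_fraction(residues):
    """Share of the two most frequent residues at one motif position."""
    counts = Counter(residues).most_common()
    total = sum(n for _, n in counts)
    return sum(n for _, n in counts[:2]) / total


def classify_cxxc(motifs):
    """Apply the sum-of-top-two rule to aligned 4-residue motifs (C, x1, x2, C)."""
    x1 = [m[1] for m in motifs]
    x2 = [m[2] for m in motifs]
    score = top2_fraction(x1) + top2_fraction(x2)
    label = "redox" if score > 1 else "metal-binding"
    return label, score


if __name__ == "__main__":
    conserved = ["CGPC"] * 8 + ["CAPC", "CGHC"]           # hypothetical homolog motifs
    variable = ["CAKC", "CDEC", "CFGC", "CHIC", "CLMC"]   # hypothetical homolog motifs
    print(classify_cxxc(conserved))
    print(classify_cxxc(variable))
```

The rule rewards motifs whose x positions are dominated by one or two residues across homologs, and flags motifs with freely varying x positions as likely metal-binding sites.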
Metal-binding proteins: two CxxC motifs separated by at most 80 amino acids
CxxCxxxxxx……………..xxxxxxCxxC
Redox proteins: two CxxC motifs separated by more than 80 amino acids
CxxCxxxxxx……………..xxxxxxCxxC
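The distance rule can be sketched as follows. The 80-residue cutoff comes from the slide; measuring the distance as the number of residues between the end of one CxxC and the start of the next, the labels returned, and the function name are assumptions made for this example.

```python
import re

CXXC = re.compile(r"C..C")  # non-overlapping CxxC matches, scanned left to right


def cxxc_distance_filter(seq, max_metal_gap=80):
    """Label a sequence by the spacing of consecutive CxxC motifs."""
    spans = [m.span() for m in CXXC.finditer(seq.upper())]
    if not spans:
        return "no CxxC"
    # Residues between the end of one motif and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    if any(gap <= max_metal_gap for gap in gaps):
        return "metal-binding"
    return "redox candidate"


if __name__ == "__main__":
    print(cxxc_distance_filter("CAAC" + "G" * 20 + "CAAC"))
    print(cxxc_distance_filter("CAAC" + "G" * 100 + "CAAC"))
    print(cxxc_distance_filter("KGYTVFEFTAGWCPDCKVIEPDL"))
```

A sequence with a single CxxC passes this filter by construction, so it is best read as one of several filters rather than a classifier on its own.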
Prosite pattern based filter
Amino acid conservation based filter
CxxC motifs distance based filter
Secondary structure prediction:
PSI Pred - http://bioinf.cs.ucl.ac.uk/psipred/
SSPro - http://www.igb.uci.edu/tools/scratch/
[Figure: Secondary structure distribution (thioredoxin-fold proteins included). Y-axis: 0 to 0.9; x-axis: positions -10 to 21; series: a-helix, b-strand, loop.]
Thioredoxin-fold proteins: the redox motif is surrounded by
a beta strand and an alpha helix
beta-CxxC-alpha
[Figure: Secondary structure distribution (non-thioredoxin-fold proteins only). Y-axis: 0 to 0.9; x-axis: positions -10 to 21; series: a-helix, b-strand, loop.]
The active cysteine is followed by
an alpha helix in most redox
proteins.
CxxC – alpha
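The structural context described above (beta-CxxC-alpha, or CxxC followed by a helix) can be checked against a predicted secondary-structure string. The sketch assumes a PSIPRED-style string aligned residue by residue with the sequence, using E for strand, H for helix and `_` for loop, as in the annotated sequences in this transcript. The 10-residue window and the function name are assumptions; the example is the first 30 residues of the thioredoxin sequence and its prediction quoted in the transcript.

```python
import re

MOTIF = re.compile(r"[CU]..[CU]")  # CxxC with optional selenocysteine (U)


def motif_context(seq, ss, window=10):
    """Classify the secondary-structure context of each CxxC-type motif."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure string must align")
    results = []
    for m in MOTIF.finditer(seq.upper()):
        upstream = ss[max(0, m.start() - window):m.start()]
        downstream = ss[m.end():m.end() + window]
        if "E" in upstream and "H" in downstream:
            context = "beta-CxxC-alpha"
        elif "H" in downstream:
            context = "CxxC-alpha"
        else:
            context = "other"
        results.append((m.start() + 1, m.group(), context))
    return results


if __name__ == "__main__":
    seq = "KGYTVFEFTAGWCPDCKVIEPDLPKLEKKY"
    ss = "___EEEEEE____HHHHHHHHHHHHHHHH_"
    print(motif_context(seq, ss))
```

Requiring only that a strand or helix occurs somewhere in the window is a loose criterion; a stricter version could require the element to start within a few residues of the motif.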
Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__
Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________
Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____
Selenoprotein H (a-b-a-b-b-a-a)
AAVVAVAEKREKLANGGEGMEEATVVIEHCTSCRVYGRNAAALSQALRLEAPELPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPEPQEVVEELKKYLS
_HHHHHH________________EEEEEE____HHHHHHHHHHHHHHHHH____EEEE_________EEEEE_______HHHHHHH____________HHHHHHHHHHHH_
Selenoprotein T (b-b-a-b-a-a-a-a-b-b)
GGVPSKRLKMQYATGPLLKFQICVSUGYRRVFEEYMRVISQRYPDIRIEGENYLPQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQAPSIWQWGQENKVYACMMVFFLSNMIENQCMSTGAFEITLNDVPVWSKLES
______EEEEEE____EEEEEEEEE___HHHHHHHHHHHHHH____EEE______HHHHHHHHHHHHHHHHHHHHHH_____HHHH______HHHHHH___HHHHHHHHHHHHHHHHHHH___EEEEEEE__EEEEEEE__
Selenoprotein W
Secondary structure elements, in order: EEEEEEE, HHHHHHHHHHHHHH, EEEEEEE, EEEEE, EEEEEE, HHHHHHHHHHHH
2FA8   AGROBACTERIUM TUMEFACIENS  PRIAIRYCTQCN-WLLRAGWMAQEILQTFASDIGEVSLIPST--GGLFEITVD-----GTIIWERKRDG-----GFPG----PKELKQRIRDLID
SelW   PSEUDOMONAS FLUORESCENS    PEIVITYCTQCQ-WLLRAAWLAQELLSTFGDDLGKVSLVPGT--GGIFHITCN-----DVQIWERKADG-----GFPE----AKVLKQRVRDQID
SelW   VIBRIO CHOLERAE            AQIEIYYCRQCN-WMLRSAWLSQELLHTFSEEIEYVALHPDT--GGRFEIFCN-----GVQIWERKQEG-----GFPE----AKVLKQRVRDLID
SelW   BOS TAURUS                 VVVRVVYCGAUG-YKPKYLQLKKKLEDEFPSR-LDICGEGTPQVTGFFEVFVA-----GKLVHSKKGGD-----GYVDTESKFLKLVAAIKAALA
SelW   MUS MUSCULUS               LAVRVVYCGAUG-YKPKYLQLKEKLEHEFPGC-LDICGEGTPQVTGFFEVTVA-----GKLVHSKKRGD-----GYVDTESKFRKLVTAIKAALA
SelW   SEA URCHIN                 VIVKVIYCGGUG-YGPRYRRLKQELKDEFGDD-VDMAGESTPGTTGWLEVXVN-----GKLIHSKKNGD-----GYIDSESKLKKIVNAVSAAM-
Selenoprotein V
SelV   MUS MUSCULUS               ILIRVMYCGLUS-YGLRYIILKRTLEHQFPNL-LEFEEERATQVTGEFEVFVD-----GKLIHSKKKGD-----GFVD-ESGLKKLVGAIDEEIK
SelV   RATTUS NORVEGICUS          ILIRVMYCGLUS-YGLRYILLKKTLEHQFPNL-LEFEEERATQVTGEFEVFVD-----GKLIHSKKKGD-----GFVD-ETSLKKLVGAIDEEIK
SelV   HOMO SAPIENS               VLIRVTYCGLUS-YSLRYILLKKSLEQQFPNH-LLFEEDRAAQATGEFEVFVN-----GRLVHSKKRGD-----GFVN-ESRLQKIVSVIDEEIK
Selenoprotein H
SelH   MUS MUSCULUS               ATVVIEHCTSURVYGRHAAALSQALQLEAPE--LPVQVNPSKPRRGSFEVTLLRSDNSRVELWTGIKKGPPRKLKFPE----PQEVVEELKKYLS
SelH   HOMO SAPIENS               ATVVIEHCTSURVYGRNAAALSQALRLEAPE--LPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPE----PQEVVEELKKYLS
SelH   BOS TAURUS                 PSVVIEHCTSURVYGRNAAALSQALRLQAPE--LTVKVNPARPRRGSFEVTLLRADGSSAELWTGLKKGPPRKLKFPE----PHVVLEELKKYLS
SelH   DANIO RERIO                LRVVIEHCKSURVYGRNAVVVREALADSHPE--LKVMINPHNPRRNSFEITLMDG-ERADVLWSGIKKGPPRKLKFPE----PAEVVTALKQALE
Rdx12
Rdx12  SUS SCROFA                 VRIVVEYCEPCG-FEATYLELASAVKEQYPG--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDFIEAIRRASN
Rdx12  HOMO SAPIENS               VRIVVEYCEPCG-FEATYLELASAVKEQYPG--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDLIEAIRRASN
Rdx12  GALLUS GALLUS              VHIMVEYCEPCG-FGATYEELASAVREEYPD--IEIESRLGG--TGAFEIEIN-----GQLVFSKLENG-----GFPY----EKDLIEAIRRARN
Rdx12  DANIO RERIO 2A             VQIKVEYCGGUG-YEPRYQELKRVVTAEFTD--ADVSGFVGR--QGSFEIEIN-----GQLIFSKLETS-----GFPY----EDDIMGVIQRAYD
Rdx12  DANIO RERIO 2              VKVKIEYCGAUG-YEPRFQELKREICGNCPD--AEVSGFVGR--RGCFEIQIN-----DFLVFSKLESG-----GFPY----SEDIIEAVVKAKD
Selenoprotein T
Segment 1. Secondary structure elements, in order: EEEEEEEEE, HHHHHHHHHHHHHH, EEEEEE
SelT   BOS TAURUS                 PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT   SUS SCROFA                 PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT   MUS MUSCULUS               PLLKFQICVSUG-YRRVFEEYMRVISQRYPD--IRIEGENYL
SelT   ANOPHELES GAMBIAE          ATMTFLYCYSCG-YRKAFDDYHNLILEKYPE--ITIRGSNYD
SelT   SOLANUM TUBEROSUM          NTVTIDFCSSCS-YRGTAVTMKNMLDNQFPG--IHVVLANYP
SelT   MEDICAGO TRUNCATULA        NTVSIDFCTSCS-YKGNAVSVKNTLESLFPG--INVVLANYP
Segment 2. Secondary structure elements, in order: HHHHHHHHHHHHHHHHHHHHHHH, HHHH, HHHHH, HHHHHHHHHHHHHHHHHHHHHHH
SelT   BOS TAURUS                 PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT   SUS SCROFA                 PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT   MUS MUSCULUS               PQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQ-APSIWQWG-QENKVYACMMVFFLSNMIENQCMS
SelT   ANOPHELES GAMBIAE          PSGVNMLLSKVLLVTKLLLIAALMSNYDIGRYIGNP-FAGWWQWC-FNNKLYASMMIFFLGNTLEAQLIS
SelT   SOLANUM TUBEROSUM          PPLPKRLLGKVVPVFQFGVIGLVMAGEQIFPRLGIAVPPPWFYQL-RANRFGTMATTWLLGNFFQSMLQS
SelT   MEDICAGO TRUNCATULA        PPLPKRALSKVVPVLQTGAIIAITAGDQIFPRLGVT-PPQLYYSL-RANKFGSIASIWLLSNFVQSFLQS
Segment 3. Secondary structure elements, in order: EEEEE, EEEEE, HHHHHHHHHHH
SelT   BOS TAURUS                 TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT   SUS SCROFA                 TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT   MUS MUSCULUS               TGAFEITLN-----DVPVWSKLESG-----HLPS----MQQLVQILDNEMK
SelT   ANOPHELES GAMBIAE          SGAFEITLN-----DVPVWSKLETG-----RFPA----PQEMFQIIDNHLQ
SelT   SOLANUM TUBEROSUM          SGAFEVYCN-----GELVFSKLKEN-----RFPG----ELELKDLVGRKIA
SelT   MEDICAGO TRUNCATULA        SGAFEVYFN-----GELVFSKLKEN-----RFPG----EFELKELIGRRIG
Sequence or structural similarity to known proteins
Functional associations
3D-Jury system http://bioinfo.pl/meta/
[Table: occurrence of SelR and MsrA in the genomes listed below]
Eukaryotes
Homo sapiens
Drosophila melanogaster
Caenorhabditis elegans
Arabidopsis thaliana
Saccharomyces cerevisiae
Archaea
Aeropyrum pernix
Sulfolobus solfataricus
Sulfolobus tokodaii
Archaeoglobus fulgidus
Halobacterium sp. NRC-1
Methanothermobacter thermautotrophicus
Methanococcus jannaschii
Pyrococcus abyssi
Pyrococcus horikoshii
Thermoplasma acidophilum
Thermoplasma volcanium
Bacteria
Aquifex aeolicus
Chlamydia muridarum
Chlamydia trachomatis
Chlamydophila pneumoniae AR39
Chlamydophila pneumoniae CWL029
Chlamydophila pneumoniae J138
Synechocystis sp. PCC 6803
Mycobacterium leprae
Mycobacterium tuberculosis CDC1551
Mycobacterium tuberculosis H37Rv
Bacillus halodurans
Bacillus subtilis
Clostridium acetobutylicum
Mycoplasma genitalium
Mycoplasma pneumoniae
Mycoplasma pulmonis
Ureaplasma urealyticum
Lactococcus lactis subsp. lactis
Staphylococcus aureus subsp. aureus Mu50
Staphylococcus aureus subsp. aureus N315
Streptococcus pneumoniae R6
Streptococcus pneumoniae TIGR4
Streptococcus pyogenes M1 GAS
Caulobacter crescentus
Agrobacterium tumefaciens
Mesorhizobium loti
Sinorhizobium meliloti
Rickettsia conorii
Rickettsia prowazekii
Neisseria meningitidis MC58
Neisseria meningitidis Z2491
Campylobacter jejuni
Helicobacter pylori 26695
Helicobacter pylori J99
Escherichia coli K12
Escherichia coli O157:H7
Escherichia coli O157:H7 EDL933
Yersinia pestis
Buchnera sp. APS
Vibrio cholerae
Xylella fastidiosa 9a5c
Haemophilus influenzae Rd
Pasteurella multocida
Pseudomonas aeruginosa
Borrelia burgdorferi
Treponema pallidum
Thermotoga maritima
Deinococcus radiodurans
STRING - Search Tool http://string.embl.de/
Example - yeast mitochondrial glutaredoxin Grx5
>gi|6325198| Grx5p [Saccharomyces cerevisiae] COG0278
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR