Bioinformatics
Jiří Vondrášek
Institute of Organic Chemistry and Biochemistry
[email protected]
Jan Pačes
Institute of Molecular Genetics
[email protected]
http://bio.img.cas.cz/kurs
Secondary structure prediction
in proteins
Secondary structure elements in proteins
β-sheets: atom representations
α-helices: atom representations
The Chou-Fasman method
Uses tables of conformational parameters extracted from real structures and from CD spectroscopy.
The table gives, for each amino acid, the probability (propensity) of each secondary structure element.
1. Assign each residue in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. Such a region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues with an average P(a-helix) < 100 is reached; that marks the end of the helix. If the segment defined by this procedure is longer than 5 residues and its average P(a-helix) > P(b-sheet), the segment is assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify regions where 3 out of 5 residues have P(b-sheet) > 100. Such a region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues with an average P(b-sheet) < 100 is reached; that marks the end of the beta-sheet. Any segment located by this procedure is assigned as a beta-sheet if its average P(b-sheet) > 105 and its average P(b-sheet) > P(a-helix).
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta-sheet if the average P(b-sheet) > P(a-helix) for that region.
6. To identify a bend at residue number j, calculate p(t) = f(j) f(j+1) f(j+2) f(j+3), where the f(i) value is used for residue j, the f(i+1) value for residue j+1, the f(i+2) value for residue j+2 and the f(i+3) value for residue j+3. If (1) p(t) > 0.000075; (2) the average P(turn) in the tetrapeptide > 100 (1.00 before the x100 scaling used in the table); and (3) the averages for the tetrapeptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.
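The helix scan in the steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: the one-letter residue codes, the function name `find_helices` and the exact reading of the extension rule (a residue is added while the four-residue set it completes still averages at least 100) are assumptions, and the special rules for proline are left out. The P(a) and P(b) values are copied from the parameter table in these slides.

```python
# P(a-helix) and P(b-sheet), scaled by 100, from the parameter table in these slides.
RESIDUES = "ARDNCEQGHILKMFPSTWYV"
P_ALPHA = dict(zip(RESIDUES, [142, 98, 101, 67, 70, 151, 111, 57, 100, 108,
                              121, 114, 145, 113, 57, 77, 83, 108, 69, 106]))
P_BETA = dict(zip(RESIDUES, [83, 93, 54, 89, 119, 37, 110, 75, 87, 160,
                             130, 74, 105, 138, 55, 75, 119, 137, 147, 170]))


def mean(values):
    return sum(values) / len(values)


def find_helices(seq, window=6, min_hits=4, threshold=100):
    """Return half-open (start, end) index pairs of predicted helices."""
    pa = [P_ALPHA[aa] for aa in seq]
    pb = [P_BETA[aa] for aa in seq]
    helices = []
    i = 0
    while i + window <= len(seq):
        # Nucleation: at least min_hits of the window residues favour helix.
        hits = sum(1 for value in pa[i:i + window] if value > threshold)
        if hits < min_hits:
            i += 1
            continue
        start, end = i, i + window
        lower = helices[-1][1] if helices else 0  # do not run into the previous helix
        # Assumed reading of the extension rule: keep adding a residue while the
        # four-residue set it completes still averages at least the threshold.
        while end < len(seq) and mean(pa[end - 3:end + 1]) >= threshold:
            end += 1
        while start > lower and mean(pa[start - 1:start + 3]) >= threshold:
            start -= 1
        # Acceptance: longer than 5 residues and more helix-like than sheet-like.
        if end - start > 5 and mean(pa[start:end]) > mean(pb[start:end]):
            helices.append((start, end))
        i = end
    return helices


print(find_helices("GGGGAEELLKKAGGGG"))
```

The same loop with a window of 5, at least 3 hits, P(b) in place of P(a) and the 105 acceptance threshold would give the corresponding beta-sheet scan.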
CHOU-FASMAN RULES FOR ALPHA HELIX:
Helical residues = >1.0 for helix.
Helical breakers =
4/6 >1.0 nucleates helix.
Helix continues both ways until 4 contiguous residues average P(a) < 1.0.
Special rules for Proline.
Segment 5 residues or longer and P(a) > P(b) = helix.
CHOU-FASMAN RULES FOR BETA STRAND:
Beta residues = >1.0 for strand.
Beta breakers =
3/5 >1.0 nucleates strand.
Strand continues both ways until 4 contiguous residues average P(b) < 1.0.
Segment with average P(b) > 1.05 and P(b) > P(a) = strand
Name             P(a)   P(b)   P(turn)   f(i)    f(i+1)   f(i+2)   f(i+3)
Alanine          142     83      66      0.060   0.076    0.035    0.058
Arginine          98     93      95      0.070   0.106    0.099    0.085
Aspartic Acid    101     54     146      0.147   0.110    0.179    0.081
Asparagine        67     89     156      0.161   0.083    0.191    0.091
Cysteine          70    119     119      0.149   0.050    0.117    0.128
Glutamic Acid    151     37      74      0.056   0.060    0.077    0.064
Glutamine        111    110      98      0.074   0.098    0.037    0.098
Glycine           57     75     156      0.102   0.085    0.190    0.152
Histidine        100     87      95      0.140   0.047    0.093    0.054
Isoleucine       108    160      47      0.043   0.034    0.013    0.056
Leucine          121    130      59      0.061   0.025    0.036    0.070
Lysine           114     74     101      0.055   0.115    0.072    0.095
Methionine       145    105      60      0.068   0.082    0.014    0.055
Phenylalanine    113    138      60      0.059   0.041    0.065    0.065
Proline           57     55     152      0.102   0.301    0.034    0.068
Serine            77     75     143      0.120   0.139    0.125    0.106
Threonine         83    119      96      0.086   0.108    0.065    0.079
Tryptophan       108    137      96      0.077   0.013    0.064    0.167
Tyrosine          69    147     114      0.082   0.065    0.114    0.125
Valine           106    170      50      0.062   0.048    0.028    0.053
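The bend test from the Chou-Fasman procedure can be tried directly on these parameters. The sketch below is illustrative only: the one-letter codes, the dictionary name `CF` and the function name `is_turn` are assumptions, and the values are copied from the parameter table in these slides (P values scaled by 100, so the P(turn) threshold is 100).

```python
from math import prod

# Per residue: P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3),
# copied from the parameter table in these slides.
CF = {
    "A": (142, 83, 66, 0.060, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}


def is_turn(seq, j):
    """Apply the three bend criteria to the tetrapeptide starting at index j."""
    tetra = seq[j:j + 4]
    if len(tetra) < 4:
        return False
    rows = [CF[aa] for aa in tetra]
    # p(t): residue j uses f(i), j+1 uses f(i+1), j+2 uses f(i+2), j+3 uses f(i+3).
    p_t = prod(rows[k][3 + k] for k in range(4))
    avg_a, avg_b, avg_t = (sum(row[col] for row in rows) / 4 for col in range(3))
    return p_t > 0.000075 and avg_t > 100 and avg_a < avg_t > avg_b


print(is_turn("NPDG", 0), is_turn("AAAA", 0))
```

For Asn-Pro-Asp-Gly the product is 0.161 x 0.301 x 0.179 x 0.152, about 0.0013, and the average P(turn) is 152.5, so all three conditions hold; for four alanines the product is about 0.000009 and the test fails.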
Chou-Fasman propensities
(partial table)
Amino Acid   Pa     Pb     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
Conformation        Phi (°)   Psi (°)   Omega (°)   Residues per turn   Translation per residue (Å)
Antiparallel beta   -139      +135      -178        2.0                 3.4
Parallel beta       -119      +113       180        2.0                 3.2
Alpha helix          -57       -47       180        3.6                 1.5
3-10 helix           -49       -26       180        3.0                 2.0
Pi helix             -57       -70       180        4.4                 1.15
Polyproline I        -83      +158         0        3.33                1.9
Polyproline II       -78      +149       180        3.0                 3.12
Polyproline III      -80      +150       180        3.0                 3.1
Garnier-Osguthorpe-Robson
GOR
• Uses a table of propensities derived primarily from crystal structures
• The table contains one probability for each structure and each amino acid in a window 17 amino acids long
Information theory applied to structure prediction
What information do we gain about the probability that residue j is in a given state (H, E, T, C) from knowing which residue occupies position j+m (|m| ≤ 8), regardless of what residue j itself is?
For m = 0 this is similar to Chou-Fasman.
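The window sum behind GOR can be sketched as below. This is a simplified illustration under stated assumptions: the function name `gor_predict` and the nested layout `info[state][m][residue]` are hypothetical, the real information table has to be supplied by the caller, and the toy values in the usage lines are made up purely to exercise the code. They are not GOR parameters.

```python
def gor_predict(seq, info, half_window=8, states=("C", "H", "E", "T")):
    """Predict one state per residue by summing information over the window.

    info[state][m][residue] is the information that `residue` at position j+m
    carries about residue j being in `state`; missing entries count as 0.
    Ties are resolved in favour of the state listed first in `states`.
    """
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state in states:
            total = 0.0
            for m in range(-half_window, half_window + 1):
                k = j + m
                if 0 <= k < len(seq):  # window positions beyond the chain ends are skipped
                    total += info.get(state, {}).get(m, {}).get(seq[k], 0.0)
            scores[state] = total
        prediction.append(max(states, key=lambda s: scores[s]))
    return "".join(prediction)


# Made-up toy values, for illustration only: Glu favours helix at m = -1, 0, +1.
toy_info = {"H": {-1: {"E": 0.5}, 0: {"E": 1.0}, 1: {"E": 0.5}}}
print(gor_predict("GEG", toy_info))
```

With `half_window=0` only the residue itself contributes, which is the m = 0 case that resembles Chou-Fasman.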
[Figure: worked GOR example for the helix state (H): information values for the sequence Asn Glu Asp Glu Leu Lys His Gly at window offsets m = -7 to +7, with their sum (SUM) for each residue.]
Prediction accuracy
Both methods reach an accuracy of about 55-65%.
The main reason is that they overweight local context at the expense of global context, that is, the type of protein.
The same amino acids can adopt different conformations in a cytoplasmic and in a membrane protein.
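The slides do not say how this accuracy is measured. Assuming it is the usual per-residue percentage of correctly predicted states (often called Q3 for three states), it can be computed as in this small sketch; the function name `q3` is hypothetical.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty sequences have no defined accuracy")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(q3("HHHEC", "HHHCC"))  # 4 of 5 residues agree
```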
PSIPred output window
PSIPRED PREDICTION RESULTS
Key
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence
Conf: 952010265389973742568774158851022313889854542110122124543202
Pred: CCCCEECCCEEEEEECCHHHHHHHHCCCCCHHHHHCCCCCCCCCCEEECCCCEEEEEEEC
  AA: PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
              10        20        30        40        50        60

Conf: 102122066401257647861344327778750531369
Pred: CEEEEECCCCCEEEEEECCCCHHHHHHHHHHHHCCCCCC
  AA: QIIIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF
              70        80        90
Calculate PostScript, PDF and JPEG graphical output for this result using:
http://bioinf.cs.ucl.ac.uk/cgi-bin/psipred/graphics/nph-view.cgi?id=103942638010041
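A prediction string like the one above is easier to read as a list of segments. The following sketch joins the two Pred lines from this output and collapses runs of identical states; the function name `segments` and the 1-based inclusive numbering are choices made here for illustration, not part of PSIPRED.

```python
from itertools import groupby

# The two Pred lines of the PSIPRED output shown in these slides, joined.
PRED = ("CCCCEECCCEEEEEECCHHHHHHHHCCCCCHHHHHCCCCCCCCCCEEECCCCEEEEEEEC"
        "CEEEEECCCCCEEEEEECCCCHHHHHHHHHHHHCCCCCC")


def segments(pred):
    """Collapse a per-residue state string into (state, first, last) runs."""
    runs = []
    position = 1  # 1-based, matching the ruler under the sequence
    for state, run in groupby(pred):
        length = len(list(run))
        runs.append((state, position, position + length - 1))
        position += length
    return runs


helices = [run for run in segments(PRED) if run[0] == "H"]
print(helices)
```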
1. Avoid the Chou and Fasman algorithm.
2. Note the accuracy of the algorithms on standard benchmarks and in "real life" situations.
3. Use methods based on multiple alignments. Check the alignments carefully (avoid redundancies).
4. Use several independent methods of similar accuracy.
5. In case of disagreement, trust PHD, Jnet and Psipred.
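One simple way to combine several independent predictions is a per-residue majority vote, with ties handed to a method you trust more. The sketch below is only one possible reading of the advice above; the function name `consensus` and the `trusted` index parameter are hypothetical.

```python
from collections import Counter


def consensus(predictions, trusted=0):
    """Per-residue majority vote over equally long prediction strings.

    When no single state has the most votes, the state predicted by
    predictions[trusted] is used for that residue.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        best_count = counts[0][1]
        leaders = [state for state, count in counts if count == best_count]
        result.append(leaders[0] if len(leaders) == 1 else column[trusted])
    return "".join(result)


print(consensus(["HHHC", "HHEC", "HEEC"]))
```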
Secondary structure elements: problem formulation
• Given a protein sequence
– NWVLSTAADMQGVVTDGMASGLDKD...
• Predict the sequence of secondary structure states:
– LLEEEELLLLHHHHHHHHHHLHHHL...
• The "3-state" problem:
{ARNDCQEGHILKMFPSTWYV}^n ->
{L,H,E}^n
Prediction of secondary structure elements in proteins
Motivation for predicting secondary structure elements
- efficient conformational sampling for 3D protein folding
- improvement of other sequence and structure analysis methods
: sequence alignment
: homology and threading modelling (CASP)
: analysis of experimental data
: protein design
Even for proteins of known structure, assigning secondary structure is neither an unambiguous nor a simple task.
Two basic classification programs and procedures for assigning SS from crystal structures: DSSP and STRIDE
The results of the two procedures differ slightly.
References
Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577-637.
STRIDE: http://www.embl-heidelberg.de/argos/stride/stride_info.html
DSSP: http://www.cmbi.kun.nl/gv/dssp/
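DSSP assigns more than three states, so its output is usually reduced before it is compared with a three-state (L, H, E) prediction. The mapping below is one common convention and is an assumption here, not something stated in these slides; other reductions exist, for example keeping only H and E and sending everything else to L.

```python
# One common reduction (assumed here): helices H, G, I -> H; strand E and bridge B -> E;
# everything else (turn T, bend S, blank) -> L.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp):
    """Map a DSSP state string onto the three-state alphabet {L, H, E}."""
    return "".join(DSSP_TO_3.get(code, "L") for code in dssp)


print(reduce_dssp("HHGGEEBTS "))
```

Because the choice of reduction shifts segment boundaries, reported accuracies are only comparable when the same mapping is used.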
The occurrence of amino acids and their distribution across the respective secondary structure elements should guide the prediction of SS elements
degree of determination
Classical methods
- Chou-Fasman
- GOR (Garnier-Osguthorpe-Robson)
Adaptive methods
- Neural network method
the network is trained on a set of known proteins to predict the desired structure from sequence data
<nnpredict>
- Method based on homology of the query sequence with known proteins
<SOPM>
<PHD>
Neural Network methods
• A neural network with multiple layers is
presented with known sequences and
structures - network is trained until it can
predict those structures given those
sequences
• Allows network to adapt as needed (it can
consider neighboring residues like GOR)
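Before a network can look at neighboring residues, each window of the sequence has to be turned into numbers. The sketch below shows one widely used input encoding, a one-hot vector per window position with an extra unit for positions beyond the chain ends. The window size, the function name `encode_windows` and the padding unit are illustrative assumptions, and the network itself is not shown.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def encode_windows(seq, half_window=6):
    """Return one flat 0/1 input vector per residue of seq.

    Each of the 2 * half_window + 1 window positions gets 21 units:
    20 amino acids plus one unit marking positions outside the chain.
    """
    width = len(AMINO_ACIDS) + 1
    rows = []
    for j in range(len(seq)):
        row = []
        for k in range(j - half_window, j + half_window + 1):
            unit = [0] * width
            if 0 <= k < len(seq):
                unit[AMINO_ACIDS.index(seq[k])] = 1
            else:
                unit[-1] = 1  # padding beyond the N- or C-terminus
            row.extend(unit)
        rows.append(row)
    return rows


vectors = encode_windows("NWVLSTAADMQGVVTDGMASGLDKD")
print(len(vectors), len(vectors[0]))
```

Each vector would be paired with the known state (L, H or E) of the central residue as the training target.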
The different approaches: Only the original works and the more recent implementations are presented here.
First Generation
(information comes from a
single residue of a single
sequence)
Second Generation
(Local interactions are taken into
account)
Single residue
statistics
Explicit rules
Chou and Fasman
1974
GOR1 1978
Lim 1974
GOR3 1987
Zvelebil et al 1987
Third Generation
(Information coming from
homologous sequences is
incorporated)
PREDATOR
1996
DSC 1996
Nearest-Neighbors
Neural-Networks based
prediction
Levin et al 1986
Nishikawa and Ooi 1986
Holley and Karplus 1989
Qian and Sejnowski 1988
Yi and Lander 1993
NNSSP 1995
PHD 1993
Jnet 1999
Psipred 1999