Structural Bioinformatics
Chih-Hao Lu
陸志豪, Assistant Professor
[email protected]
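Education
Ph.D., Institute of Bioinformatics, National Chiao Tung University
Expertise
Structural bioinformatics, computational biology, evolutionary computation, and machine learning
Research areas
Protein local structural modules and function prediction
Protein structure and dynamics
Protein-molecule interactions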
Mechanism of drug actions
• To identify drugs that inhibit target proteins
involved in diseases and have therapeutic effect
against diseases
– Drugs often have stronger binding affinities than
natural compounds
[Figure: A pathway of disease, showing a natural compound and a drug binding the target protein, with blocked steps marked x.]
Classification of Drug Development
[Figure (DDT 2002): strategies arranged by whether the protein (receptor) structure and the compound structure are known or unknown: High-Throughput Screening (HTS), compound similarity search (similar compounds to a query), Structure-based Drug Design (SBDD), and SBDD or de novo design.]
Central Dogma
Why study protein structure?
• Proteins play crucial functional roles in all biological
processes: enzymatic catalysis, signaling messengers …
• Function depends on 3D structure.
• Easy to obtain protein sequences, difficult to determine
structure.
From primary to quaternary
Primary Structure
• The backbone of a protein is a long linear sequence built from twenty kinds of amino acids
• Amino acids are joined to one another by peptide bonds
Proteins are polypeptide chains
20 Amino Acids
Side-chain classes shown on the slide: Hydrophobic, Polar, Charged

Amino acid      3-letter  1-letter  Mass (Da)  Occurrence in proteins (%)
Glycine         Gly       G          75        7.2
Alanine         Ala       A          89        7.8
Valine          Val       V         117        6.6
Leucine         Leu       L         131        9.1
Isoleucine      Ile       I         131        5.3
Methionine      Met       M         149        2.3
Phenylalanine   Phe       F         165        3.9
Tyrosine        Tyr       Y         181        3.2
Tryptophan      Trp       W         204        1.4
Serine          Ser       S         105        6.8
Proline         Pro       P         115        5.2
Threonine       Thr       T         119        5.9
Cysteine        Cys       C         121        1.9
Asparagine      Asn       N         132        4.3
Glutamine       Gln       Q         146        4.2
Lysine          Lys       K         146        5.9
Histidine       His       H         155        2.3
Arginine        Arg       R         174        5.1
Aspartic acid   Asp       D         133        5.3
Glutamic acid   Glu       E         147        6.3
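As a small worked example, the masses in the table can be combined to estimate the mass of a peptide. Forming each peptide bond releases one water molecule (about 18 Da), so a chain of n residues weighs roughly the sum of the free amino-acid masses minus 18 × (n − 1). The sketch below uses the rounded table values, so the result is only approximate; the function name `approx_peptide_mass` is illustrative and not part of the slides.

```python
# Rounded free amino-acid masses (Da), copied from the table above.
MASS = {
    "G": 75, "A": 89, "V": 117, "L": 131, "I": 131, "M": 149, "F": 165,
    "Y": 181, "W": 204, "S": 105, "P": 115, "T": 119, "C": 121, "N": 132,
    "Q": 146, "K": 146, "H": 155, "R": 174, "D": 133, "E": 147,
}
WATER = 18  # Da released for each peptide bond formed


def approx_peptide_mass(sequence: str) -> int:
    """Approximate mass (Da) of a linear peptide written in one-letter codes."""
    if not sequence:
        return 0
    total = sum(MASS[aa] for aa in sequence)
    return total - WATER * (len(sequence) - 1)


if __name__ == "__main__":
    # The 46-residue example sequence used on the later slides.
    seq = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
    print(len(seq), "residues, about", approx_peptide_mass(seq), "Da")
```

The estimate ignores disulfide bonds and isotope averaging, which change the true mass by a few daltons.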
Secondary Structure
Sequence
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCII
α helix
• On average, every 3.6 residues form one turn
• The α helix is held together by hydrogen bonds
3₁₀ helix, α helix, π helix
The α helix has a dipole moment
Some amino acids are
preferred in α helices
• Good
– Ala Glu Leu Met
• Poor
– Pro Gly Tyr Ser
• The structure is amphipathic
– Hydrophobic
– Hydrophilic
Helical
wheel
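A helical wheel follows directly from the 3.6 residues per turn: consecutive residues sit 360° / 3.6 = 100° apart when the helix is viewed down its axis. The sketch below only computes those angles (it does not draw the wheel); the function name is illustrative.

```python
from typing import List, Tuple

STEP_DEG = 100  # 360 degrees / 3.6 residues per turn


def helical_wheel_angles(sequence: str) -> List[Tuple[str, int]]:
    """Return (residue, angle in degrees) for a helical-wheel projection.

    Residue 0 is placed at 0 degrees and each following residue is
    rotated a further 100 degrees around the helix axis.
    """
    return [(aa, (i * STEP_DEG) % 360) for i, aa in enumerate(sequence)]


if __name__ == "__main__":
    for aa, angle in helical_wheel_angles("TTCCPSIVARSNFNVC"):
        print(aa, angle)
```

With this spacing the pattern repeats every 18 residues (5 full turns). In an amphipathic helix, hydrophobic residues cluster on one side of the wheel and hydrophilic residues on the other.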
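β sheet
• A β sheet is a sheet formed by several ribbon-like β strands
• Each pair of adjacent β strands is either parallel or antiparallel
Antiparallel β sheets
Parallel β sheets
Turn or Loop
• Where the chain connects α helices or β strands, it must reverse direction by nearly 180 degrees; these regions are called turns
• Other irregular structures are collectively called loops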
Turn
Loop
Hairpin loops
Secondary structure elements are
connected to form simple motifs
Schematic diagrams of the
calcium-binding motif
(Luscombe, Genome Biology 2000)
The hairpin β motif occurs
frequently in protein structures
The Greek key motif is found in
antiparallel β sheets
Tertiary Structure
Sequence
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
Secondary Structure
• Several secondary structure elements pack together to form the tertiary structure of the protein
[Figure labels: β sheet, α helix, loop]
Simple motifs combine to form
complex motifs
Quaternary structure
• A complex assembled from several identical or different tertiary-structure molecules (subunits) is called the quaternary structure
How to determine the protein
structure?
• By experimentation
– X-Ray
– NMR (nuclear magnetic resonance spectroscopy)
• Sequence-Structure gap
Structure Determination (X-ray)
[Figure: pipeline from Target Selection through Crystallomics (Isolation, Expression, Purification, Crystallization), Data Collection, Structure Solution, and Structure Refinement to Functional Annotation, Publication, and PDB Deposition.]
The first X-ray crystallographic
structural results in 1958
First determination of a 3-D globular protein structure
(myoglobin) in 1958 – John Kendrew
Molecular visualization
• Abstract views of macromolecules
– well-defined secondary structure elements (α-helices and β-strands)
– Jane Richardson, 1985
• α-helix as a simple cylinder or a broad, spiral ribbon
• β-strand as a broad, flat ribbon
The structure of myoglobin
Molecular visualization
RasMol
PyMOL
Swiss-Pdb Viewer
MOLMOL
MolScript
MDL Chime
Green Fluorescent Protein (GFP)
The Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
Number of Structures Available
Structure-based databases
• Popular software and resources for protein structure
validation
– PDBSum, Procheck, What_Check
• Resources classifying protein structure
– SCOP, CATH, DALI, VAST, CE
• Popular resources of protein interactions
– Protein-Protein(DNA) interaction server, DIP, MINT
• Popular resources visualizing macromolecular
structures
– PDBSum, NDB Atlas, STING
Protein evolution and the SCOP database
http://scop.berkeley.edu/
SCOP
• Classes
– all-β proteins
• can have small adornments of α or 3₁₀ helix
– all-α structures
• may have several regions of 3₁₀ helix, and a small β-sheet
outside the α-helical core
– α/β (alpha and beta)
• mainly parallel β sheets (β-α-β units)
– α+β (alpha plus beta)
• mainly antiparallel β sheets (segregated α and β regions)
– others
• multidomain proteins, membrane and cell surface proteins,
small proteins, coiled coil proteins, low-resolution structures,
peptides, and designed proteins
SCOP Sample Hierarchy
[Figure: sample of the SCOP tree, listed level by level]
Root: scop
Class: α, β, α/β, α+β
Fold: Rossmann fold, Flavodoxin-like, α/β-Barrel
Superfamily: TIM, Trp biosynthesis, Glycosyltransferase, RuBisCo (C)
Family: β-Galactosidase (3), β-Glucanase, α-Amylase (N), β-Amylase
Protein: Acid α-amylase, Cyclodextrin glycosyltransferase, Oligo-1,6 glucosidase
Species: A. niger, B. circulans, B. stearothermophilus, B. cereus
PDB/Ref: 2aaa:1-353, 1cdg:1-382, 1cgt:1-382, 1cyg:1-378, J. Biochem 113:646-649
Annotations on the figure: "Determined by structure" and "Related by homology"
The CATH domain structure database
http://www.cathdb.info
CATH
http://www.cathdb.info/index.html
Structure quality assurance
• Not all structures are of equally high quality
• Models from X-ray crystallography
• Models from NMR spectroscopy
• Errors in deposited structures
• Procheck, What_Check
2YSB
Ramachandran Plot
• A plot of the backbone dihedral angles (phi against psi) of each amino acid residue in a protein.
• Due to steric hindrance from amino acid side chains, only certain angles are allowed in a folded protein.
• The plot can therefore serve to indicate how well the structure has been determined.
• Any deviations from the allowed values are called outliers and usually indicate bad geometry.
[Figure: Dihedral Angles, backbone drawn from N to C]
Ramachandran Plot
Standard Plot showing where
different secondary structures fit
into the plot.
A real life example. All non-glycine
residues are in allowed regions.
Validation
So what do you think about this ?
• Ideally, there should be no
outliers in the Ramachandran
plot, except for Glycine and
Proline, which are “special”
amino acids.
• However, there may be some
rational explanation for outliers
by the scientist depositing the
structure. (Always refer to the
publication!).
• Expect to find more than 85-90% of residues in the
red regions.
Secondary structure assignment
http://swift.cmbi.ru.nl/gv/dssp/
The role of secondary structure
• In structural genomics
– basic unit for structure classification
– main uses
• it is indicative of the fold
• it is an intuitive means of visualizing protein structure
• it influences the sequence alignment
• it is related to function
– applications (ex. Secondary Structure Element)
• speed up large-scale all-against-all alignment of 3D
structures
• comparative modeling and threading
Hydrogen Bonding is Key to
Automated Methods
• Why? - ~90% of backbone donors (NH)
and acceptors (C=O) form hydrogen
bonds
• Basic definition
– Angle N – (H) – O greater than 120 degrees
– H …O less than 2.5Å
– Note H’s not usually identified directly
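The basic definition above can be checked directly from atomic coordinates. The following is a minimal sketch, assuming the hydrogen position is already available (in practice it usually has to be placed, as the last bullet notes); the function names and the coordinate format (x, y, z tuples in Å) are illustrative choices.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b (in degrees) formed by the points a-b-c."""
    v1 = tuple(x - y for x, y in zip(a, b))
    v2 = tuple(x - y for x, y in zip(c, b))
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.dist(a, b) * math.dist(c, b))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(n, h, o, max_dist=2.5, min_angle=120.0):
    """Angle-distance test: H...O < 2.5 Å and angle N-H...O > 120 degrees."""
    return math.dist(h, o) < max_dist and angle_deg(n, h, o) > min_angle


if __name__ == "__main__":
    n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)   # N-H bond of 1 Å along x
    print(is_hbond(n, h, (3.0, 0.0, 0.0)))    # linear, H...O = 2 Å
    print(is_hbond(n, h, (1.0, 2.0, 0.0)))    # 90 degree angle
```

The two thresholds are passed as parameters so that stricter or looser cut-offs can be tried without changing the function.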
Angle-distance hydrogen bond
assignment
• Baker and Hubbard assigned hydrogen bonds according
to the angle N-H-O and to the distance rHO (1984)
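[Figure: N–H···O geometry with an N–H bond of 1 Å. The criteria are H···O < 2.5 Å and angle N–H···O > 120°. At the limiting case of 120° with H···O = 2.5 Å, the N···O distance is about 3.122 Å (components 2.165 Å and 1.25 Å); at 180° the three atoms are collinear.]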
Coulomb hydrogen bond
calculation – used by DSSP
$$E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{HO}} - \frac{1}{r_{NC'}}\right)$$
• f is a constant, 332 Å kcal/e²
• δ is the + and - polar charge in electrons
• Weakest H-bond: -0.5 kcal/mole in DSSP
• H not given – requires extrapolation – note assumes
planar geometry for peptide bond
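The Coulomb expression can be evaluated with a few lines of code: E = f · δ⁺δ⁻ · (1/r_NO + 1/r_HC' − 1/r_HO − 1/r_NC'). The sketch below assumes the partial charges 0.42 e and 0.20 e from the original DSSP paper (the slide does not give the values) and takes the hydrogen position as an input rather than extrapolating it. Function names are illustrative.

```python
import math

F = 332.0          # Å kcal/e², constant from the slide
Q1 = 0.42          # partial charge on C=O (assumed DSSP value, not on the slide)
Q2 = 0.20          # partial charge on N-H (assumed DSSP value, not on the slide)
HBOND_CUTOFF = -0.5  # kcal/mol, weakest hydrogen bond accepted by DSSP


def hbond_energy(n, h, c, o):
    """Electrostatic energy (kcal/mol) between an N-H group and a C=O group.

    All arguments are (x, y, z) coordinates in Å.
    """
    r_no = math.dist(n, o)
    r_hc = math.dist(h, c)
    r_ho = math.dist(h, o)
    r_nc = math.dist(n, c)
    return F * Q1 * Q2 * (1 / r_no + 1 / r_hc - 1 / r_ho - 1 / r_nc)


def is_dssp_hbond(n, h, c, o):
    """True when the energy is below the -0.5 kcal/mol cut-off."""
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


if __name__ == "__main__":
    # Collinear C=O ... H-N arrangement with H...O = 2 Å.
    c, o = (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)
    h, n = (3.2, 0.0, 0.0), (4.2, 0.0, 0.0)
    print(round(hbond_energy(n, h, c, o), 2), is_dssp_hbond(n, h, c, o))
```

For the collinear test geometry the energy comes out at roughly −2.6 kcal/mol, well below the cut-off, while the same groups moved far apart give an energy close to zero.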



DSSP
• H = α helix
• G = 3₁₀ helix
• I = π helix
• B = bridge (single-residue sheet)
• E = extended β strand
• T = β turn
• S = bend
• C = coil
http://e106.life.nctu.edu.tw/~hwhuang/dssp/
DSSP as Implemented in the PDB
1ATP
Identifying structural domain and
function in proteins
1NTY
Prediction of protein-protein or
protein-DNA interaction
• Sequence-based methods
– Homology
– Correlated Mutation
• Structure-based methods
– Physical docking
• Hybrid methods
Principles and methods of docking and
ligand design
• Structure-based design
– Docking
• Analog-based design
– QSAR (Quantitative structure-activity relationships)
Most force fields consist of a summation of
bonded forces associated with chemical bonds,
bond angles, and bond dihedrals, and nonbonded forces associated with van der Waals
forces and electrostatic charge.
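To make the nonbonded part concrete, here is a minimal sketch of one common functional form: a Lennard-Jones 12-6 term for van der Waals forces plus a Coulomb term for electrostatics. The slide does not specify a particular force field, so the functional form, the parameter names (`epsilon`, `sigma`) and any values passed in are assumptions for illustration; the bonded terms (bonds, angles, dihedrals) are left out.

```python
COULOMB_CONSTANT = 332.0  # converts e²/Å to kcal/mol


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy (kcal/mol) at separation r (Å)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2):
    """Coulomb energy (kcal/mol) between charges q1 and q2 (in e) at r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / r


def nonbonded_energy(r, epsilon, sigma, q1, q2):
    """Sum of the van der Waals and electrostatic terms for one atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    for r in (3.0, 3.5, 4.0, 6.0):
        print(r, round(nonbonded_energy(r, 0.2, 3.4, 0.3, -0.3), 3))
```

A docking or simulation program sums such pair terms over all atom pairs, usually with a distance cut-off.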
Fold recognition method
Prediction in 1D
– Secondary structure prediction
– Solvent accessibility prediction
– Disulfide bond prediction
– Fold recognition
– Enzyme class prediction
– Subcellular localization prediction
– Metal binding sites prediction
– Disulfide connectivity prediction
– Phi psi angle prediction
Secondary structure prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
EEEELLLLLHHHHHHHHHHHHLLLLLHHHHHHLLLLEEEELLLLL
H: α helix
E: β sheet
L: loop
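A per-residue prediction string like the one above is easier to read once it is collapsed into segments. The sketch below groups consecutive identical states and reports each run with 1-based start and end positions; the function name is illustrative.

```python
from itertools import groupby


def segments(states):
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive, matching residue numbering.
    """
    out = []
    pos = 1
    for state, run in groupby(states):
        length = len(list(run))
        out.append((state, pos, pos + length - 1))
        pos += length
    return out


if __name__ == "__main__":
    prediction = "EEEELLLLLHHHHHHHHHHHHLLLLLHHHHHHLLLLEEEELLLLL"
    for state, start, end in segments(prediction):
        print(state, start, end)
```

The same function works for any per-residue labelling, for example the buried/exposed strings used in solvent accessibility prediction.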
Solvent accessibility prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
EEEEBBBBEEEEEBBBBBBBEEEEEEBBBBBBBEEEEEEEBBBBEE
B: Buried
E: Exposed
Disulfide bond prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
[Figure: labels O, O, R, O, O, R marked along the sequence]
Fold recognition
[Figure: the query sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN is to be placed (?) in the SCOP (Structural Classification of Proteins) hierarchy]
Root
Classes: α, β, α/β, α+β, Multi-domain, Membrane.., Small protein, ..
Folds: TIM barrel, …
Superfamily: TIM, Aldolase, …
Family: TIM, …
Proteins: TIM, …
Species: Chicken, Human, …
SCOP statistics: 11 classes, 800 folds, 1294 superfamilies, 2327 families
Subcellular localization prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
?
Eukaryotic Cellular compartments
Metal binding sites prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
NNNNBNNBNNNNNBNNNNNBNNNNNNNNNNNNNNNNNNBNNN
B Binding
N Non-binding
Phi psi angle prediction
Ramachandran plot
• Phi: C(n-1) – N(n) – Cα(n) – C(n)
• Psi: N(n) – Cα(n) – C(n) – N(n+1)
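Both angles are ordinary dihedral angles defined by four consecutive backbone atoms, so one routine covers phi and psi. The sketch below is a plain-Python version of the standard projection method; it assumes coordinates are given as (x, y, z) tuples, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev, n, ca, c):
    """Phi of residue n: C(n-1), N(n), CA(n), C(n)."""
    return dihedral_deg(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi of residue n: N(n), CA(n), C(n), N(n+1)."""
    return dihedral_deg(n, ca, c, n_next)
```

Computing (phi, psi) for every residue of a structure and plotting the pairs gives the Ramachandran plot. The first and last residues lack one of the two angles because a neighbouring atom is missing.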
[Figure: Ramachandran plot divided into regions labelled A to R]
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
AADGJJKKCPGDANOOEEAAAAJJJJJJJJKKNNQQCCJJJJAAAA
Disulfide Connectivity Prediction
TTC1C2PSIVARSNFNVC3RLPGTPEAIC4ATYTGC5IIIPGATC6PGDYAN
[Figure: the six cysteines C1 to C6 drawn with their disulfide pairing]
Connectivity pattern: 1-6, 2-5, 3-4
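The size of the prediction problem can be seen by listing every possible pairing. With six bonded cysteines there are 5 × 3 × 1 = 15 candidate connectivity patterns, of which 1-6, 2-5, 3-4 is one. The sketch below finds the cysteine positions in the example sequence and enumerates all patterns; the function names are illustrative, and it assumes every cysteine takes part in exactly one disulfide bond.

```python
def cysteine_positions(sequence):
    """1-based positions of the cysteines in a one-letter sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == "C"]


def connectivity_patterns(n):
    """Yield every way to pair cysteines 1..n into disulfide bonds (n even)."""
    if n % 2:
        raise ValueError("an even number of bonded cysteines is required")

    def pair_up(items):
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        # Pair the first cysteine with each remaining one in turn.
        for k, partner in enumerate(rest):
            remaining = rest[:k] + rest[k + 1:]
            for tail in pair_up(remaining):
                yield [(first, partner)] + tail

    yield from pair_up(list(range(1, n + 1)))


if __name__ == "__main__":
    seq = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
    positions = cysteine_positions(seq)
    patterns = list(connectivity_patterns(len(positions)))
    print(positions, len(patterns))
```

The number of patterns grows quickly with the number of cysteines (105 for eight, 945 for ten), which is one reason connectivity is treated as a prediction problem in its own right.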
SVM
[Figure, training: training data (Class 1 to Class K, each described by Features 1~N) are given to the SVM, which produces the SVM model.]
[Figure, testing: testing data (Feature 1 to Feature N) and the SVM model are given to the SVM, which outputs one of Class 1 to Class K.]
Protein Structure Prediction
[Figure: flowchart]
Sequence → sequence homology to a known fold?
– >30%: Homology Modeling → Model
– <30%: Threading → Match found?
• Yes: Model
• No: Ab initio
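The flowchart amounts to a short decision rule, sketched below. The 30% threshold comes from the slide; how to treat a sequence identity of exactly 30% is not stated there, so sending it down the threading branch is an assumption, as are the function and parameter names.

```python
def choose_method(identity_percent, threading_match_found=False):
    """Pick a structure-prediction approach following the flowchart.

    identity_percent: best sequence identity (%) to a protein of known fold.
    threading_match_found: whether threading finds a compatible fold.
    """
    if identity_percent > 30:
        return "homology modeling"
    # Assumption: exactly 30% is handled like the <30% branch.
    if threading_match_found:
        return "threading"
    return "ab initio"


if __name__ == "__main__":
    print(choose_method(45))
    print(choose_method(20, threading_match_found=True))
    print(choose_method(20, threading_match_found=False))
```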
Homology modeling
• The goal of protein modeling is to predict a structure
from its sequence
– Template recognition and initial alignment
– Alignment correction
– Backbone generation
– Loop modeling
– Side-chain modeling
– Model optimization
– Model validation
What is Homology Modeling?
Target sequence (structure unknown, "?"):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Template sequence (homologous, shares a similar sequence; its structure is used as the template):
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
PDB entries shown: 1alc, 8lyz
Structure prediction by homology
modeling
Step 1
Step 2
Step 3
Step 4
Structure comparison and alignment
1CRN
1JXX
CE
http://cl.sdsc.edu/ce.html
DALI
http://ekhidna.biocenter.helsinki.fi/dali_server/