CISC 467/667 Intro to Bioinformatics
(Spring 2007)
Protein Structure Prediction
Protein Secondary Structure
CISC667, S07, Lec20, Liao
Protein structure
• Primary: the amino acid sequence of the protein
• Secondary: characteristic local structural units in 3-D (e.g., helices and sheets)
• Tertiary: the 3-dimensional fold of a protein subunit
• Quaternary: the arrangement of subunits in oligomers
Experimental Methods
• X-ray crystallography
• NMR spectroscopy
• Neutron diffraction
• Electron microscopy
• Atomic force microscopy
• Computational Methods for secondary structures
– Artificial neural networks
– SVMs
–…
• Computational Methods for 3-D structures
– Comparative (find homologous proteins)
– Threading
– Ab initio (Molecular dynamics)
• Alpha helix: one complete turn every 3.6 amino acids (AAs)
• Hydrogen bond between the carbonyl group (-C=O) of one AA and the amide group (-N-H) of the AA four residues further along the chain (i to i+4)
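The i to i+4 hydrogen-bonding pattern can be enumerated directly. The following is a minimal sketch, not part of the lecture: the function names `helix_hbond_pairs` and `helix_turns` and the 1-based residue numbering are assumptions made for illustration, and the helix is treated as ideal.

```python
def helix_hbond_pairs(n_residues, offset=4):
    """Return (C=O residue, N-H residue) index pairs for an ideal helix.

    Residues are numbered 1..n_residues (an assumed convention). The C=O of
    residue i pairs with the N-H of residue i + offset; offset = 4 matches
    the "4th neighboring AA" rule on the slide.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


def helix_turns(n_residues, residues_per_turn=3.6):
    """Number of turns spanned by n_residues at 3.6 residues per turn."""
    return n_residues / residues_per_turn


pairs = helix_hbond_pairs(10)
print(pairs)            # six pairs, from (1, 5) to (6, 10)
print(helix_turns(18))  # about five turns
```

A helix shorter than five residues has no i to i+4 pair, so the function returns an empty list in that case.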
Beta sheet: hydrogen bonds between the carbonyl oxygen atom on one chain and the NH group on the adjacent chain
Ramachandran Plot
Alpha helix: PHI: -57; PSI: -47
Ramachandran Plot
Parallel: PHI: -119; PSI: 113
Anti-parallel: PHI: -139; PSI: 135
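As a rough illustration of how these canonical angles can be used, the sketch below assigns a (PHI, PSI) pair to the nearest canonical conformation. The angle values are the ones quoted on the slides (-57, -47 being the alpha-helix values); the nearest-neighbor rule and the names `angle_diff` and `nearest_conformation` are assumptions for illustration, not a method from the lecture.

```python
# Canonical backbone angles in degrees, as quoted on the slides.
CANONICAL = {
    "alpha helix": (-57.0, -47.0),
    "parallel beta sheet": (-119.0, 113.0),
    "anti-parallel beta sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi, psi):
    """Return the name of the canonical conformation closest to (phi, psi)."""
    def dist(name):
        c_phi, c_psi = CANONICAL[name]
        return (angle_diff(phi, c_phi) ** 2 + angle_diff(psi, c_psi) ** 2) ** 0.5
    return min(CANONICAL, key=dist)


print(nearest_conformation(-60, -45))   # alpha helix
print(nearest_conformation(-140, 130))  # anti-parallel beta sheet
```

The periodic difference matters because angles wrap around at +/-180 degrees; real Ramachandran regions are broad areas rather than single points, so this is only a toy classifier.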
Residue conformation preferences
Helix: A, E, K, L, M, R
Sheet: C, I, F, T, V, W, Y
Coil: D, G, N, P, S
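The preference table can be turned into a naive per-residue lookup. This is only a sketch to make the table concrete: the state letters H, E and C, the `?` placeholder for residues the slide does not list, and the function names are assumptions, and a lookup like this ignores neighboring residues entirely.

```python
# Residue preferences as listed on the slide.
PREFERENCES = {
    "H": set("AEKLMR"),   # helix
    "E": set("CIFTVWY"),  # sheet
    "C": set("DGNPS"),    # coil
}


def preferred_state(residue):
    """Return the preferred state of one residue, or '?' if it is not listed."""
    for state, residues in PREFERENCES.items():
        if residue in residues:
            return state
    return "?"


def naive_prediction(sequence):
    """Label every residue with its preferred state, ignoring context."""
    return "".join(preferred_state(r) for r in sequence.upper())


print(naive_prediction("AEKLGPSVIY"))  # HHHHCCCEEE
```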
Artificial neural networks
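• Perceptron: $o(x_1, \ldots, x_n) = g\left(\sum_{j=0}^{n} W_j x_j\right)$, with $x_0 = 1$
[Figure: a perceptron. The input links $x_0 = 1, x_1, x_2, \ldots, x_n$ carry weights $W_0, W_1, W_2, \ldots, W_n$ into the input function $\sum_j W_j x_j$; the activation function $g$ turns that sum into the output $o$.]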
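The perceptron formula can be sketched in a few lines. The weights below are hypothetical values chosen so that the unit computes logical AND; they and the function names are illustrative assumptions, not taken from the slides.

```python
def perceptron(inputs, weights, g):
    """Compute o = g(sum_j W_j x_j), with x_0 = 1 prepended as the bias input."""
    xs = [1.0] + list(inputs)
    if len(xs) != len(weights):
        raise ValueError("need one weight per input plus W_0")
    total = sum(w * x for w, x in zip(weights, xs))
    return g(total)


def step(x, t=0.0):
    """Step activation: 1 if x >= t, 0 otherwise."""
    return 1 if x >= t else 0


# Hypothetical weights [W_0, W_1, W_2]: the sum reaches 0 only when both inputs are 1.
and_weights = [-1.5, 1.0, 1.0]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], and_weights, step))
```

Folding the threshold into W_0 through the constant input x_0 = 1 means learning only has to adjust weights.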
• Activation functions
– $\mathrm{Step}(x) = 1$ if $x \ge t$, $0$ otherwise
– $\mathrm{Sign}(x) = +1$ if $x \ge 0$, $-1$ otherwise
– $\mathrm{Sigmoid}(x) = 1/(1 + e^{-x})$
[Figure: plots of the three activation functions.]
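The three activation functions translate directly into code. A minimal sketch; the default threshold `t=0.0` for the step function is an assumption.

```python
import math


def step(x, t=0.0):
    """1 if x >= t, 0 otherwise."""
    return 1 if x >= t else 0


def sign(x):
    """+1 if x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """1 / (1 + e^-x). math.exp overflows for x below roughly -709."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 3))
```

Unlike step and sign, the sigmoid is differentiable everywhere, which is what gradient-based learning needs.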
Artificial Neural Networks
2-unit output
• Learning: determine the weights and thresholds for all nodes (neurons) so that the net approximates the training data within a given error range.
– Back-propagation algorithm
• Feed forward from input to output
• Calculate and back-propagate the error (the difference between the network output and the target output)
• Adjust the weights (by gradient descent) to decrease the error.
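The three back-propagation steps can be sketched for a tiny network with one hidden layer. Everything specific here is an assumption for illustration: the 2-2-1 layout, sigmoid units, the squared-error measure, the XOR training set, the learning rate and the random seed. The sketch only shows that the error goes down; it does not claim that XOR is learned perfectly.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Feed forward; each weight list starts with the bias weight."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    o = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], h)))
    return h, o


def total_error(data, w_hidden, w_out):
    """Sum over the training set of 0.5 * (target - output)^2."""
    return sum(0.5 * (t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data)


def train(data, n_hidden=2, rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x, w_hidden, w_out)           # 1. feed forward
            delta_o = (o - t) * o * (1 - o)              # 2. output error term
            delta_h = [delta_o * w_out[j + 1] * h[j] * (1 - h[j])
                       for j in range(n_hidden)]         #    back-propagated error
            w_out[0] -= rate * delta_o                   # 3. gradient-descent updates
            for j in range(n_hidden):
                w_out[j + 1] -= rate * delta_o * h[j]
                w_hidden[j][0] -= rate * delta_h[j]
                for i in range(n_in):
                    w_hidden[j][i + 1] -= rate * delta_h[j] * x[i]
    return w_hidden, w_out


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
before = total_error(xor, *train(xor, epochs=0))   # error at the initial weights
after = total_error(xor, *train(xor, epochs=5000))
print(before, after)
```

Calling `train` with `epochs=0` returns the initial weights for the same seed, which gives a baseline to compare the trained error against.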
Gradient descent
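$w_{\text{new}} = w_{\text{old}} - r \, \frac{\partial E}{\partial w}$
where $E[w]$ is the error as a function of the weights and $r$ is a positive constant called the learning rate, which determines the step size by which the weights are altered in the steepest-descent direction along the error surface.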
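The update rule can be tried on a one-dimensional error surface. The quadratic E(w) = (w - 3)^2 below is a hypothetical example chosen because its minimum is known; the function name and default parameters are likewise assumptions.

```python
def gradient_descent(grad, w0, rate=0.1, steps=200):
    """Repeat w_new = w_old - rate * dE/dw for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - rate * grad(w)
    return w


# Hypothetical error surface E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3).
def grad_e(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(grad_e, w0=0.0)
print(w_min)  # close to the minimum at w = 3

# With too large a learning rate the steps overshoot and move away from 3.
w_bad = gradient_descent(grad_e, w0=0.0, rate=1.5, steps=10)
print(w_bad)
```

On this surface each step multiplies the distance to the minimum by |1 - 2r|, so any rate above 1 diverges; this is the step-size trade-off the learning rate controls.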
Data representation
• Issues with ANNs
– Network architecture
• FeedForward (fully connected vs sparsely connected)
• Recurrent
• Number of hidden layers, number of hidden units within a layer
– Network parameters
• Learning rate
• Momentum term
– Input/output encoding
• One of the most significant factors for good performance
• Extract maximal info
• Similar instances are encoded to “closer” vectors
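One common way to encode input for such networks is a one-hot (orthogonal) vector per residue over a sliding window centered on the residue being predicted. The sketch below assumes that scheme; the window half-width, the `-` spacer symbol for positions past the sequence ends, and the function names are illustrative assumptions rather than details from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SPACER = "-"                           # assumed symbol for positions past the ends
ALPHABET = AMINO_ACIDS + SPACER


def one_hot(residue):
    """21-element vector with a single 1 at the residue's position."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(residue)] = 1   # raises ValueError for unknown symbols
    return vec


def encode_window(sequence, center, half_width=6):
    """Concatenate one-hot vectors for the window around position `center`."""
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else SPACER
        vec.extend(one_hot(residue))
    return vec


x = encode_window("MKTAYIAKQR", center=0, half_width=2)
print(len(x), sum(x))  # 105 5
```

In this encoding every pair of different residues is equally far apart, so it does not place similar residues at "closer" vectors; that is one motivation for richer encodings.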
An on-line service
• Performance
– Accuracy ceiling at about 65% for direct encoding
• Local encoding schemes present limited correlation information between residues
• Little or no improvement from using multiple hidden layers
– Surpassing 70% by
• Including evolutionary information (contained in multiple alignments)
• Using cascaded neural networks
• Incorporating global information (e.g., position-specific conservation weights)
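The slides do not say how these percentages are measured. Assuming they are per-residue accuracies over the predicted states, as is usual for secondary structure prediction, the measure can be sketched as follows; the state letters and the example strings are made up for illustration.

```python
def per_residue_accuracy(predicted, observed):
    """Percentage of positions where the predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example: 8 of 10 positions agree.
print(per_residue_accuracy("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```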
Cathy Wu, Computers Chem. 21 (1997) 237-256
Resources
Protein Structure Classification
– CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
– SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
– FSSP:
PDB: http://www.rcsb.org/pdb/