Introduction to Bioinformatics for Medical Research
Gideon Greenspan
Lecture 9
Protein Secondary Structure
Protein Secondary Structure
• Protein Structure
• Protein Folding
– Alpha helices, beta sheets, loops
• Secondary Structure Prediction
– Artificial Neural Networks
• PHDsec
• PSIpred
Structure Levels
Structure Prediction: Motivation
• Understand protein function
– Locate binding sites
• Broaden homology
– Detect similar function where sequence differs
• Explain disease
– See effect of amino acid changes
– Design suitable compensatory drugs
Prediction Methods
• Primary (sequence) to secondary structure
– Sequence characteristics
• Secondary to tertiary structure
– Fold recognition
– Threading against known structures
• Primary to tertiary structure
– Ab initio modelling
Protein Folding
• Proteins fold in an aqueous (watery) environment
– Hydrophobic residues should be buried in the interior
– But the main chain (backbone) is hydrophilic
• Hydrogen bonds between backbone groups neutralize the polar main chain
– Alpha helices and beta sheets form the core
• Many other chemical interactions
– Including interactions with external chaperones
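The preference for burying hydrophobic residues can be explored with a hydropathy scale. A minimal sketch, assuming the Kyte-Doolittle scale (not named on the slide) and an arbitrary window of 9 residues: it averages hydropathy over a sliding window, so stretches with high averages are more likely to be buried.

```python
# Kyte-Doolittle hydropathy scale (an assumption here; positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along seq.

    Returns len(seq) - window + 1 values; value i covers residues i .. i+window-1
    (0-based). The default window of 9 is an arbitrary illustrative choice.
    Non-standard letters raise KeyError.
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    # First 20 residues of the PSIpred example sequence in this lecture
    profile = hydropathy_profile("MQRSPLEKASVVSKLFFSWT", window=9)
    print([round(v, 2) for v in profile])
```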
Alpha Helices
• Right-handed spiral
– 5 to 40 amino acids (10 average)
– 3.6 amino acids per turn
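With 3.6 amino acids per turn, each residue advances 360 / 3.6 = 100 degrees around the helix axis. The sketch below (illustrative only, assuming an ideal helix with the first residue at 0 degrees) gives each residue its angle on a helical wheel projection; residue i+4 lands 40 degrees and residue i+3 lands 60 degrees away from residue i, so they sit on the same face of the helix.

```python
RESIDUES_PER_TURN = 3.6                        # from the slide
DEGREES_PER_RESIDUE = 360 / RESIDUES_PER_TURN  # 100 degrees per residue


def helical_wheel(seq):
    """Return (residue, angle in degrees) for an ideal helix, first residue at 0."""
    return [(aa, round(i * DEGREES_PER_RESIDUE % 360, 6))
            for i, aa in enumerate(seq)]


if __name__ == "__main__":
    for aa, angle in helical_wheel("LASKKNPKLI"):
        print(f"{aa} {angle:6.1f}")
```

After 18 residues (5 full turns) the pattern repeats, which is why helical wheels are usually drawn for 18 positions.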
Beta Sheets
• Parallel or anti-parallel strands
– Each strand has 5-10 amino acids (6 average)
– Up to 6 strands
Loop Regions
• All other protein regions
– Irregular shape and size
– Generally at protein surface
Secondary Structure Prediction
• Chou-Fasman / GOR Method
– Based on amino acid frequencies
– No more than 60% accurate
• Neural Network methods
– PHDsec and PSIpred
• Use multiple sequences
– Secondary structure based on family
• Best accuracy now ~78%
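Frequency-based methods start from propensities: how much more often an amino acid occurs in a given state than in proteins overall, P(aa | state) / P(aa). A minimal sketch, assuming training data as (sequence, structure) pairs with states H, E and C. The naive predictor that picks the highest-propensity state per residue is a deliberate simplification, not the actual Chou-Fasman nucleation and extension rules or the GOR window statistics. Accuracy figures like those above are usually per-residue three-state accuracy (Q3), which `q3` computes.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil


def propensities(pairs):
    """Propensity P(aa | state) / P(aa) from (sequence, structure) training pairs."""
    aa_counts, state_counts, joint = Counter(), Counter(), Counter()
    for seq, ss in pairs:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, state in zip(seq, ss):
            aa_counts[aa] += 1
            state_counts[state] += 1
            joint[aa, state] += 1
    total = sum(aa_counts.values())
    return {
        (aa, s): (joint[aa, s] / state_counts[s]) / (aa_counts[aa] / total)
        for aa in aa_counts
        for s in state_counts
    }


def naive_predict(seq, props):
    """Give each residue its highest-propensity state.

    Unseen (aa, state) pairs count as 0; ties go to the first state in STATES.
    """
    return "".join(max(STATES, key=lambda s: props.get((aa, s), 0.0)) for aa in seq)


def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Toy data, made up for illustration only
    train = [("AAAGG", "HHHCC"), ("VVTAG", "EEECC")]
    props = propensities(train)
    print(naive_predict("AVG", props), q3("HEC", "HEE"))
```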
Brain Neurons
• Outgoing signal determined by incoming signals
• Connected together in networks
• Learn from experience
Artificial Neurons
• Each input's weight (positive or negative) is learned
• Weighted inputs are summed
• Output simulates a threshold: the neuron fires only if the sum is high enough
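A minimal sketch of the artificial neuron described above: inputs are multiplied by their weights, summed together with a bias, and passed through a threshold. The bias and the hand-picked weights are illustrative assumptions. Networks trained by backpropagation usually replace the hard threshold with a smooth sigmoid, so a sigmoid version is included too.

```python
import math


def weighted_sum(inputs, weights, bias=0.0):
    """Sum of input * weight over all inputs, plus a bias term."""
    if len(inputs) != len(weights):
        raise ValueError("need one weight per input")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def threshold_neuron(inputs, weights, bias=0.0):
    """Fire (1) if the weighted sum exceeds zero, otherwise stay silent (0)."""
    return 1 if weighted_sum(inputs, weights, bias) > 0 else 0


def sigmoid_neuron(inputs, weights, bias=0.0):
    """Smooth version of the threshold, with output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(inputs, weights, bias)))


if __name__ == "__main__":
    # Weights chosen by hand so the threshold neuron computes logical AND
    w, b = [0.6, 0.6], -1.0
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, threshold_neuron(x, w, b), round(sigmoid_neuron(x, w, b), 3))
```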
Neural Networks
PHDsec and PSIpred
• PHDsec
– Rost & Sander, 1993
– Based on sequence family alignments
• PSIpred
– Jones, 1999
– Based on PSI-BLAST profiles
• Both consider long-range interactions
PHDsec Neural Net
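[Figure: inputs for one position (one unit per amino acid A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, plus '.', marking the amino acid at that position) feed a hidden layer, which gives outputs for alpha helix, beta strand and loop.]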
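The network in the figure can be sketched as below. Each position in a window around the residue being predicted gets 21 input units, one per amino acid plus '.', which this sketch uses for positions beyond the chain ends; one hidden layer of sigmoid units feeds three outputs for helix, strand and loop. The 13-residue window, the 5 hidden units, the softmax on the outputs, the omitted bias terms and the random (untrained) weights are all assumptions for illustration, so the printed probabilities mean nothing. With a family alignment, as PHDsec uses, each position's units would hold residue frequencies from the alignment instead of a single 1.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.' (here: beyond chain end)
STATES = ("H", "E", "L")            # alpha helix, beta strand, loop


def encode_window(seq, centre, half_width=6):
    """One-hot inputs for residues centre-half_width .. centre+half_width (0-based)."""
    x = []
    for pos in range(centre - half_width, centre + half_width + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else "."
        units = [0.0] * len(ALPHABET)
        units[ALPHABET.index(aa)] = 1.0  # raises ValueError for non-standard letters
        x.extend(units)
    return x


def forward(x, w_hidden, w_out):
    """One sigmoid hidden layer, then softmax over the three states (no biases)."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
              for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {state: e / total for state, e in zip(STATES, exps)}


if __name__ == "__main__":
    rng = random.Random(0)
    half_width, n_hidden = 6, 5                      # hypothetical sizes
    n_inputs = (2 * half_width + 1) * len(ALPHABET)  # 13 positions x 21 units
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in STATES]
    x = encode_window("MQRSPLEKASVVSKLFFSWT", centre=0, half_width=half_width)
    print(forward(x, w_hidden, w_out))
```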
PSIpred Input
• Input form fields
– Input sequence
– Type of analysis
– Include PSI-BLAST results?
– Email address
PSIpred Output
• Conf: Confidence (0=low, 9=high)
• Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
• AA: Target sequence
• Example output (residues 1-60, then 61-120)
Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
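Turning the Conf/Pred/AA lines into structure segments is a small text-processing task. A sketch, assuming the plain-text layout shown on this slide (blocks of `Conf:`, `Pred:` and `AA:` lines; legend and position-ruler lines are skipped): it joins the blocks and reports each run of identical states with 1-based start and end positions.

```python
from itertools import groupby


def parse_psipred_text(text):
    """Concatenate the Conf, Pred and AA lines of every block into three strings."""
    fields = {"Conf": "", "Pred": "", "AA": ""}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        # Data lines look like "Key: <string without spaces>"; legend lines contain spaces
        if sep and key in fields and value and " " not in value:
            fields[key] += value
    return fields


def segments(pred):
    """Runs of identical states as (state, start, end), 1-based and inclusive."""
    runs, start = [], 1
    for state, group in groupby(pred):
        length = sum(1 for _ in group)
        runs.append((state, start, start + length - 1))
        start += length
    return runs


EXAMPLE = """\
Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
"""

if __name__ == "__main__":
    result = parse_psipred_text(EXAMPLE)
    for state, start, end in segments(result["Pred"]):
        if state == "H":
            print(f"helix {start}-{end}: {result['AA'][start - 1:end]}")
```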
PHDsec Input (1)
• Input form fields
– Email address
– Type of prediction
– Additional output
– Output format
– Reduce processing
PHDsec Input (2)
• Input form fields
– Type (number) of input sequences
– Upload file
– Enter sequence
– Wait for results?
PHDsec Output (1)
• Output includes
– Protein classification
– Structure proportions
– Amino acid proportions
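Structure and amino acid proportions are simple composition counts. A sketch, using the first 60 residues of the PSIpred example prediction as stand-in input (PHDsec's own report is not reproduced in this lecture). How PHDsec turns these proportions into its protein classification is not given on the slide, so the sketch stops at the proportions.

```python
from collections import Counter


def proportions(s, alphabet=None):
    """Fraction of positions in s occupied by each symbol.

    If alphabet is given, every symbol in it is reported, including absent ones.
    """
    if not s:
        raise ValueError("empty string")
    counts = Counter(s)
    keys = alphabet if alphabet is not None else sorted(counts)
    return {k: counts[k] / len(s) for k in keys}


PRED = "CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH"
AA = "MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE"

if __name__ == "__main__":
    print({k: round(v, 2) for k, v in proportions(PRED, "HEC").items()})
    # {'H': 0.63, 'E': 0.0, 'C': 0.37}
    print(proportions(AA))
```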
PHDsec Output (2)
• Output includes
– Estimated structure
– Confidence level
– Structure with high confidence
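Showing only the structure predicted with high confidence amounts to masking low-confidence positions. A sketch, assuming one confidence digit (0-9) per residue as in the PSIpred output and a cutoff of 5 chosen for illustration; the cutoff PHDsec applies may differ.

```python
def high_confidence(pred, conf, cutoff=5):
    """Keep predicted states whose confidence digit is >= cutoff; mask the rest with '.'."""
    if len(pred) != len(conf):
        raise ValueError("pred and conf must have the same length")
    return "".join(s if int(c) >= cutoff else "." for s, c in zip(pred, conf))


if __name__ == "__main__":
    pred = "CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH"
    conf = "988766667637889999877999871289878877049963202468899999997887"
    print(pred)
    print(high_confidence(pred, conf))
```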
Future Directions
• Finer distinction between features
– E.g. different types of coil/loop
• Multiple neural networks
– Combine different approaches
• More global approaches
– Take information from the entire protein
• Accuracy still no higher than 80%
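One simple way to combine several networks is to average their per-residue output probabilities and take the most likely state. A sketch, assuming each network reports (helix, strand, coil) probabilities for every residue; plain averaging is only one option, alongside majority voting or feeding the outputs into a further network.

```python
STATES = "HEC"  # helix, strand, coil


def combine(predictions):
    """Average per-residue (pH, pE, pC) across networks and pick the top state.

    predictions: one list per network, each holding a probability triple per residue.
    """
    if not predictions:
        raise ValueError("need at least one network")
    n_res = len(predictions[0])
    if any(len(p) != n_res for p in predictions):
        raise ValueError("networks disagree on sequence length")
    result = []
    for i in range(n_res):
        avg = [sum(p[i][k] for p in predictions) / len(predictions) for k in range(3)]
        result.append(STATES[avg.index(max(avg))])
    return "".join(result)


if __name__ == "__main__":
    net_a = [(0.6, 0.1, 0.3), (0.2, 0.3, 0.5)]  # made-up outputs for two residues
    net_b = [(0.4, 0.1, 0.5), (0.1, 0.7, 0.2)]
    print(combine([net_a, net_b]))  # HE
```

Here neither network alone predicts "HE" (network A gives H then C), which shows how averaging can change individual calls.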