Homology Modeling
Workshop
GHIKLSYTVNEQNLKPERFFYTSAVAIL
Outline:
• Introduction to protein structure & databases
• Structure prediction approaches
– Ab-initio
– Threading
– Homology modeling
• Hands ON
From Sequence to Structure
Protein structure is hierarchical:
• Primary – sequence of covalently attached amino acids
• Secondary – local 3D patterns (helices, sheets, loops)
• Tertiary – overall 3D fold
• Quaternary – assembly of two or more protein chains
From Sequence to Structure
• All information about the native structure of a protein is encoded in its amino acid sequence together with its native solution environment.
• Although a huge number of conformations is possible, each protein adopts only one or a few native folds (Levinthal’s paradox).
• Protein folding is driven by various forces:
– Ionic forces
– Hydrogen bonds
– The hydrophobic effect
– ...
Protein 3D Structures
A protein’s structure has a critical effect on its function:
1. Binding pockets (PDB ID 1nw7)
2. Areas of specific chemical/electrical properties
3. Importance of the global fold for function
Motivation to Acquire a Structure
• Identifying active and binding sites
• Characterization of the protein’s mechanism
(catalysis & interactions)
• Searching for ligands of a given binding site
• Understanding the molecular basis of diseases
• Designing mutants
• Drug design
• And more...
Determining Structure
• NMR
• X-ray diffraction
• Electron Microscopy
Why predict protein structure if we can use experimental tools to determine it?
• Experimental methods are slow and expensive
• Some structures could not be solved experimentally
• A representative structure of a family can suffice to deduce the structures of all sequences in the family
Protein databases
Protein Sequence & Structure Databases
Some of the available databases:
• RCSB - the Protein Data Bank: all deposited structures
• UniProt - the main sequence database
– Swiss-Prot
– TrEMBL
• NCBI - many databases, including sequences and structures
• PDBsum - combines structural & sequence data
UniProt - Protein Sequence Database
• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).
• In 2002, the three institutes decided to pool their resources and expertise and formed the UniProt Consortium.
• The world's most comprehensive catalog of information on proteins
• Sequence, function & more…
• Composed mainly of two databases:
– Swiss-Prot – 516,081 entries – high-quality manual annotation, nonredundant & cross-referenced to many other databases.
– TrEMBL – 10,618,387 entries – computer translation of the genetic information from the EMBL Nucleotide Sequence Database; many proteins are poorly annotated, since only automatic annotation is generated.
Protein Data Bank (PDB)
• The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies.
• The structures in the archive range from tiny proteins and bits of DNA to complex molecular machines like the ribosome.
• There are currently 57,013 structures deposited in the PDB. However, removing redundant sequences (e.g. at 90% sequence identity) reduces the number of structures to 19,988…
• Each structure receives a unique four-character ID
Protein Data Bank (PDB)
http://www.rcsb.org/pdb/home/home.do
[Screenshot: the entry page of PDB ID 3mht, showing where to download the structure, the paper describing the structure, data concerning the structure (resolution, R-value…), and how to display the structure]
[Figure: PDB statistics by year]
PDBsum
• A database providing an overview of all biological macromolecular structures
• Connected to UniProt: find the UniProt sequence accession of a known PDB ID
• Detailed description of many structure properties, e.g.:
– EC number
– Chains & ligands and their interactions
– Clefts
– Secondary structure
– FASTA sequence of the structure
– …
• Search page (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/): search by PDB ID, free text, or sequence
• Useful tabs: UniProt accession; Chains & ligands
• Protein tab: secondary structure from the PDB
More Sequences Than Structures
• There is a large discrepancy between the number of known sequences and the number of solved structures:
5,047,807 UniRef90 entries vs. 25,566 non-redundant structures (at 90% identity)
Computational methods are needed to obtain more structures
Structure prediction approaches
1. Homology (Comparative) Modeling
Based on sequence similarity with a protein whose structure has been solved.
2. Threading (Fold Recognition)
Requires that the query adopt a fold similar to a known structure.
3. Ab-initio fold prediction
Not based on similarity to a known sequence/structure.
Ab-initio
Structure prediction from “first principles”:
Given only the sequence, try to predict the structure based on physico-chemical properties (energy, hydrophobicity etc.)
• Used when all else fails; works for novel folds
• Success shows that we understand the folding process
The Force Field (energy function)
A group of mathematical expressions describing the potential energy of a molecular system.
Each expression describes a different type of physicochemical interaction between atoms in the system:
• Covalent bonds
• Non-bonded terms:
– Van der Waals forces
– Hydrogen bonds
– Charges
– Hydrophobic effects
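To make the idea of a force field concrete, the sketch below implements two of the non-bonded terms: a 12-6 Lennard-Jones potential for van der Waals interactions and a Coulomb term for charges. It is an illustration under simplifying assumptions, not any real force field: a single epsilon/sigma pair (kcal/mol, Å) is used for every atom pair, the Coulomb constant is the approximate value 332.06 kcal·Å/(mol·e²), bonded pairs are not excluded, and bonded, hydrogen-bond and solvation terms are omitted.

```python
import math
from itertools import combinations

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), approximate conversion constant


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy (kcal/mol) at distance r (Angstrom)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy (kcal/mol) between point charges q1, q2 (units of e)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def nonbonded_energy(atoms, epsilon=0.1, sigma=3.4):
    """Sum LJ + Coulomb over all atom pairs.

    atoms: list of (x, y, z, charge) tuples. One epsilon/sigma pair is used for
    every pair of atoms, a simplification for illustration only.
    """
    total = 0.0
    for (x1, y1, z1, q1), (x2, y2, z2, q2) in combinations(atoms, 2):
        r = math.dist((x1, y1, z1), (x2, y2, z2))
        total += lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)
    return total
```

At r = 2^(1/6)·σ the Lennard-Jones term reaches its minimum of -ε, a quick sanity check for the implementation.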
Approaches to Ab-initio Prediction
1. Molecular Dynamics
• Simulates the forces that govern the protein in water.
• Since proteins usually fold spontaneously, a long enough simulation should, in principle, reach the native protein structure.
Problems:
• Thousands of atoms
• A huge number of time steps is needed to reach the folded protein
Therefore, it is feasible only for very small proteins.
Approaches to Ab-initio Prediction
2. Minimal Energy
Assumption: the folded form is the minimal-energy conformation of a protein
Main principles:
• Define an energy function.
• Search for the 3D conformation that minimizes the energy.
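The search step can be illustrated with Metropolis Monte Carlo combined with a slowly decreasing temperature (simulated annealing), one common way to look for low-energy conformations. In this minimal sketch the "conformation" is a single number and the energy function is supplied by the caller; both are assumptions for illustration, since real methods search over many coordinates or dihedral angles with a full energy function.

```python
import math
import random


def metropolis_minimize(energy, x0, steps=5000, step_size=0.5,
                        temperature=1.0, cooling=0.999, seed=0):
    """Search for a low-energy conformation x with Metropolis Monte Carlo.

    Each step proposes a random move. Downhill moves are always accepted,
    uphill moves with probability exp(-dE / T); T is lowered every step.
    Returns the lowest-energy conformation seen and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = temperature
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        d_e = e_new - e
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # annealing: accept fewer uphill moves as the search goes on
    return best_x, best_e
```

For example, `metropolis_minimize(lambda x: (x - 2) ** 2, x0=10.0)` should return a point close to the minimum at x = 2. Accepting some uphill moves while the temperature is high is what lets this kind of search escape local minima, which a purely downhill search cannot do.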
Ab-initio
• Current methods (e.g. Rosetta) mainly exploit the fact that, although we are far from observing all protein folds, we have probably seen nearly all substructures:
• A library of known substructures (fragments shorter than 10 residues) is created.
• A range of possible conformations is selected for each fragment of the query protein.
• The fragments are then assembled into candidate structures, which are scored with an energy function.
Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
Ab-initio - Example
Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
Fold Recognition (Threading):
Sequence-to-structure matching
Given a sequence and a library of folds, thread the sequence through each fold and take the one with the highest score.
• The method will fail if the new protein does not belong to any fold in the library.
• The threading score is computed from known physicochemical properties & statistics of amino acids.
• In practice, fold recognition methods are often mixtures of sequence matching and threading.
Threading: example
Input:
1. A sequence (residues marked as H-bond donor, H-bond acceptor, glycine, or hydrophobic)
2. A library of folds of known proteins
Threading: example
[Figure: the sequence threaded through three folds (legend: H-bond donor, H-bond acceptor, glycine, hydrophobic), scoring S = -2 (Z = -1), S = 5 (Z = 1.5) and S = 20 (Z = 5)]
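The scores S and Z can be illustrated with a toy threading scorer. Each fold in the library is reduced to a string of structural environments ('B' buried, 'E' exposed), each residue has a hypothetical preference for each environment, and Z compares the real score with the scores of shuffled copies of the same sequence. The environment alphabet, the preference values and the ungapped placement are all simplifying assumptions; real threading programs allow gaps and use statistically derived potentials.

```python
import random
import statistics

# Hypothetical residue classes; real threading potentials are statistically derived
HYDROPHOBIC = set("AVILMFWC")


def pair_score(residue, environment):
    """Toy preference of a residue for an environment ('B' buried, 'E' exposed)."""
    if environment == "B":
        return 1.0 if residue in HYDROPHOBIC else -1.0
    return 1.0 if residue not in HYDROPHOBIC else -0.5


def thread_score(sequence, fold):
    """Score S: best ungapped placement of the sequence along the fold."""
    best = float("-inf")
    for offset in range(len(fold) - len(sequence) + 1):
        s = sum(pair_score(r, fold[offset + i]) for i, r in enumerate(sequence))
        best = max(best, s)
    return best


def z_score(sequence, fold, n_shuffles=200, seed=0):
    """Return (S, Z); Z measures how far S lies above scores of shuffled sequences."""
    rng = random.Random(seed)
    real = thread_score(sequence, fold)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(sequence)
        rng.shuffle(letters)
        shuffled.append(thread_score("".join(letters), fold))
    sd = statistics.pstdev(shuffled)
    z = (real - statistics.mean(shuffled)) / sd if sd > 0 else 0.0
    return real, z


def best_fold(sequence, library):
    """Thread through every fold long enough to hold the sequence; return (name, S, Z)."""
    results = []
    for name, fold in library.items():
        if len(fold) < len(sequence):
            continue  # the sequence does not fit on this fold
        s, z = z_score(sequence, fold)
        results.append((name, s, z))
    return max(results, key=lambda item: item[2])
```

Ranking by Z rather than by raw S makes scores comparable between folds of different sizes and compositions, which is why the example above reports both.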
Fold recognition (threading)
Find the best fold for a protein sequence:
[Figure: the sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each fold 1, …, 56, …, n of the library (example scores -10, -123, 20.5); the best-scoring fold is the potential fold]
We need a scoring (energy) function to distinguish the native structure from misfolded structures.
Ideally, each misfolded structure should have an energy higher than the native energy, i.e. $E_{\text{misfolded}} - E_{\text{native}} > 0$.
Fold recognition: FFAS03
• The FFAS03 server provides an interface to the third generation of FFAS, a profile-profile alignment and fold recognition algorithm.
• Profile-profile alignments use the information present in sequences of homologous proteins to amplify the sequence conservation pattern that defines the family.
• The result: detection of remote homologies beyond the reach of other sequence comparison methods.
Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. (2005) FFAS03: a server for profile-profile sequence
alignments. Nucl. Acids Res. 33, W284-W288
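The core idea of profile-profile comparison can be sketched in a few lines: each MSA column becomes an amino-acid frequency vector, and two columns are scored by the dot product of their vectors. This is a toy illustration under stated assumptions (a pseudocount of 1, gaps ignored, ungapped comparison of equal-length profiles); FFAS03's actual scoring function and alignment step are more elaborate.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=1.0):
    """Turn an MSA (equal-length aligned strings) into per-column frequency dicts."""
    profile = []
    for col in range(len(msa[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            if seq[col] in counts:  # gaps and unknown letters are ignored
                counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def column_score(col_a, col_b):
    """Dot product of two frequency vectors: high when both columns favour the same residues."""
    return sum(col_a[aa] * col_b[aa] for aa in AMINO_ACIDS)


def profile_profile_score(profile_a, profile_b):
    """Sum of column scores for two profiles of equal length (no gaps allowed)."""
    return sum(column_score(a, b) for a, b in zip(profile_a, profile_b))
```

Because each column summarizes many homologues, two families can score well even when their individual sequences share few identical residues, which is the source of the extra sensitivity described above.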
Fold recognition: HHPRED
Profiles are based on Hidden Markov Models:
[Figure: a profile HMM with transition probabilities between states; states emit amino acids]
Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.
Fold recognition: HHPRED
• Profile Hidden Markov Models (HMMs) are similar to sequence profiles, but in addition to the amino acid frequencies they contain information about the frequencies of insertions and deletions.
• Using profile HMMs in place of simple sequence profiles should therefore further improve sensitivity.
• It was the first method to employ HMM-HMM comparison, based on a novel statistical method.
• Using HMMs on both the query and the database side greatly enhances sensitivity and selectivity over sequence-profile-based methods.
Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.
I-TASSER- Hybrid Approach
• In a recent community-wide blind experiment, I-TASSER generated the best 3D structure predictions among all automated servers.
• Based on secondary-structure-enhanced threading and on an iterative implementation of the Threading ASSEmbly Refinement (TASSER) program.
I-TASSER
Homology Modeling
Homology Modeling – Basic Idea
1. A protein structure is defined by its amino acid sequence.
2. Closely related sequences adopt highly similar structures; distantly related sequences may still fold into similar structures.
3. The three-dimensional structures of proteins from the same family are more conserved than their primary sequences.
[Figure: triosephosphate isomerases, 44.7% sequence identity, 0.95 Å RMSD]
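An RMSD such as the 0.95 Å above is measured after optimal superposition of equivalent atoms (typically Cα atoms). The minimal sketch below uses the Kabsch algorithm with NumPy and assumes that the two coordinate arrays already list equivalent atoms in the same order; in practice those equivalences come from a sequence or structural alignment.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between N x 3 coordinate arrays p and q after optimal
    rigid-body superposition (Kabsch algorithm). Units follow the input."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # 1. Remove translation: center both coordinate sets on the origin
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # 2. Optimal rotation from the SVD of the 3 x 3 covariance matrix
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid returning a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # 3. Rotate p onto q and measure the mean squared deviation per atom
    p_rot = p_c @ rotation.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum() / len(p)))
```

The superposition step matters: without it, two identical structures placed in different orientations would show a large, meaningless RMSD.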
Homology modeling requires handling structures & sequences
• Query - only the protein sequence is available, usually found in the UniProt database
• Template - once identified, both its structural and sequence-related data should be collected: UniProt (or NCBI databases), RCSB and PDBsum
Homology modeling – a widespread technique
Query protein sequence -> Identify a homologous protein as structural template -> Align query & template protein sequences -> Build model -> Evaluate model
(e.g. Fiser et al., 2004; Petrey et al., 2005; Zhang, 2008)
General Scheme
1. Searching for structures related to the query sequence
2. Selecting templates
3. Aligning the query sequence with template structures
4. Building a model for the query using information from the template structures (Modeller)
5. Evaluating the model
Fiser A et al. Methods in Enzymology 374: 461-491 (2004)
1. Searching For Structures
• Sequence search against the PDB sequences
• Sequence-profile search
• Threading: sequence-structure fitness function
1. Searching For Structures
If a BLAST search against the PDB fails to recognize adequate templates, turn to fold recognition (threading) servers:
• FFAS03 - http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl
• HHPRED - http://toolkit.tuebingen.mpg.de/hhpred
• HMAP (available through the PUDGE pipeline) - http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE
• I-TASSER - http://zhang.bioinformatics.ku.edu/I-TASSER/
These servers not only find candidate templates, but also suggest a pairwise alignment and, in some cases, even construct the 3D model.
2. Selecting Templates
How to select the right template?
• Higher sequence similarity - %ID
• Close subfamily - phylogenetic tree
[Figure: phylogenetic tree of Seq. 1 to Seq. 6]
• “Environment” similarity - solvent, pH, ligand, quaternary interactions
• The quality of the experimentally determined structure
• Purpose of modeling - e.g. protein-ligand model vs. geometry of active site
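Sequence identity (%ID) between the query and a candidate template is computed from their pairwise alignment. Definitions differ between tools, so this sketch assumes one common convention, identical positions divided by the number of aligned (non-gap) position pairs, and also reports the alternative of dividing by the query's length.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Return (identity over aligned pairs, identity over query length), in percent.

    Both strings come from the same pairwise alignment, so they have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_template)
             if a != gap and b != gap]
    identical = sum(1 for a, b in pairs if a == b)
    query_length = sum(1 for a in aligned_query if a != gap)
    over_pairs = 100.0 * identical / len(pairs) if pairs else 0.0
    over_query = 100.0 * identical / query_length if query_length else 0.0
    return over_pairs, over_query
```

The two numbers diverge when the template covers only part of the query, which is one reason to check coverage as well as %ID before choosing a template.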
2. Selecting Templates
More than one template
• Two ways to combine multiple templates:
– Global model – different templates are aligned to different domains of the target, with little overlap between them
– Local model – several templates are aligned to the same part of the target
2. Selecting Templates
More than one template
The more the merrier -
multiple structures with
the same fold:
2. Selecting Templates
Trial and error
• Generate a model for each candidate template and/or their combination.
• Evaluate the models by an energy or any other scoring function (will be discussed later…).
3. Aligning query and
template sequences
• All comparative modeling programs depend on a
target-template alignment.
• When the sequence similarity between the template
and target proteins is high, simple pairwise alignments
are usually fine (e.g. Needleman-Wunsch global
alignment).
• Gaps or low/medium sequence similarity indicate that
we should improve the alignment...
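For highly similar sequences a global alignment is enough. Below is a compact Needleman-Wunsch sketch with a linear gap penalty and a simple match/mismatch score; these scores are assumptions for illustration, whereas real aligners use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j]: best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right cell recovers one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("MKVLA", "MKLA")` returns `(2, "MKVLA", "MK-LA")`: four matches and one gap.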
3. Aligning query and
template sequences
Guidelines:
1. Create a multiple sequence alignment and extract the template-query pairwise alignment.
Pairwise alignments alone are not enough!
• Visual inspection of alignments is difficult to teach; it is a matter of experience…
2. Use secondary-structure information to improve the pairwise alignment: avoid gaps within helices and strands!
[Figure: query and template alignment]
3. Use previous biochemical and structural data
3. Aligning query and
template sequences
Tips for MSA building
• Where? (to find homologues)
• Structural templates - search against the PDB
• Sequence homologues - search against Swiss-Prot or UniProt (recommended!), usually using BLAST
• How many?
• As many as possible, as long as the MSA looks good
(next week…)
3. Aligning query and
template sequences
Tips for MSA building
• How long? (length of homologues)
• Fragments - short homologues (shorter than 50 to 60% of the query’s length) give a bad alignment
• Ensure your sequences contain the desired domain(s)
• The N- and C-termini tend to vary in length between homologues
• How close? (distance from the query sequence)
• All too close - no new information
• Too many too far - bad alignment
• Ensure that you have a balanced collection!
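The length and distance tips above can be written as a simple filter over search hits. The thresholds (60% coverage of the query, 35% to 95% identity) and the `Hit` record are hypothetical choices for illustration, not values prescribed by the workshop; servers such as ConSurf expose similar minimum-identity and redundancy settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    identity: float      # percent identity to the query
    aligned_length: int  # number of query residues covered by the hit


def select_homologues(hits, query_length, min_coverage=0.6,
                      min_identity=35.0, max_identity=95.0):
    """Keep hits that are long enough and neither too close nor too far from the query."""
    selected = []
    for hit in hits:
        if hit.aligned_length / query_length < min_coverage:
            continue  # fragment: likely to give a bad alignment
        if hit.identity > max_identity:
            continue  # too close: adds little information
        if hit.identity < min_identity:
            continue  # too far: likely to be misaligned
        selected.append(hit)
    return selected
```

Filtering cannot by itself guarantee a balanced collection; it only removes the obvious extremes, so looking at the resulting MSA is still necessary.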
3. Aligning query and
template sequences
Tips for MSA building
• From whom? (which species the sequence belongs to)
• Doesn’t matter, all homologues are welcome
• Orthologues/paralogues may be helpful
• Sequences from distant/close species provide different
types of information
• Which alignment method?
• The best today are MUSCLE, T-Coffee and MAFFT. All
available at
3. Aligning query and
template sequences
Tips for MSA building
• Most importantly, make sure that both the query and the selected template are included in the MSA.
• Sequences that are more distant than the template do not need to be included in the alignment.
3. Aligning query and
template sequences
Query-template alignment via a profile-to-profile approach:
1. Construct an MSA for the query, which serves as a profile depicting the protein family’s properties.
2. Align this profile to the profiles of all proteins in the PDB, using, e.g., FFAS03 or HHpred.
3. Compare the pairwise alignments constructed via the different methods, hoping to get a consensus prediction…
3. Aligning query and
template sequences
Different levels of similarity between the template & query call for different computational approaches:
4. Building a model
Once you have an improved pairwise
alignment between your query & template
Use Modeller to build your model!
A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial
restraints. J. Mol. Biol. 234, 779-815, 1993.
4. Building a model
Modeller
• Generation and refinement of models by satisfaction of spatial restraints
• Can perform additional tasks:
– de novo modeling of loops
– optimization of models using an objective function
– multiple alignment
– comparison of protein structures
4. Building a model
Modeller
• Spatial features of the templates, such as distances, hydrogen bonds and dihedral angles, are transferred to the target.
• Thus, a number of spatial restraints on its structure are obtained.
• The 3D model is obtained by satisfying all the restraints as well as possible.
4. Building a model
Modeller
• Distance and dihedral angle restraints on the target are calculated from its alignment with the template.
• Restraints were also derived from a statistical analysis of relationships observed in a large database of pairs of homologous structures.
• Various correlations were obtained, e.g. correlations between Cα-Cα distances. These relationships can be used directly as spatial restraints.
• Restraints and CHARMM energy terms are then combined into an objective function, which is optimized in 3D space.
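The principle of building a model by satisfying spatial restraints can be sketched with harmonic distance restraints minimized by plain gradient descent. This is a toy version of the idea, not Modeller's implementation: Modeller uses probability density functions for its restraints, combines them with CHARMM energy terms, and optimizes with conjugate gradients and molecular dynamics. The points, target distances, step size and iteration count below are assumptions.

```python
import math


def restraint_violation(coords, restraints):
    """Objective: sum over restraints (i, j, d0) of (|xi - xj| - d0)^2."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def satisfy_restraints(coords, restraints, step=0.05, iterations=2000):
    """Move 3D points downhill on the restraint objective by gradient descent."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0:
                continue  # gradient undefined for coincident points
            # d/dxi of (d - d0)^2 is 2 (d - d0) (xi - xj) / d; xj gets the opposite sign
            factor = 2.0 * (d - d0) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for c, g in zip(coords, grad):
            for k in range(3):
                c[k] -= step * g[k]
    return coords
```

A usage example: take the six pairwise distances of a known tetrahedron as restraints, start from distorted coordinates, and `satisfy_restraints` should bring `restraint_violation` close to zero.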
5. Model Evaluation
• The accuracy of the model depends on its
sequence identity with the template:
5. Model Evaluation
The model can be assessed at two levels:
• Global - reliability of the model as a whole.
* Useful when several models are generated and the best one must be chosen.
* When different models are based on different templates, it may help choose the best one.
• Local - reliability of the different regions, even specific residues, of the model.
* Useful to detect local mistakes, which often originate from alignment errors.
5. Model Evaluation
Examples of assessment approaches:
1. Assessment of the model’s stereochemistry
2. Prediction of unreliable regions of the model using a “pseudo-energy” profile: peaks indicate errors
3. Consistency with experimental observations
4. Consistency with evolutionary conservation rates
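Approach 2 can be sketched as follows: per-residue pseudo-energies (for example, exported from an evaluation tool) are smoothed with a sliding window, and residues whose smoothed value lies above a threshold are flagged as possibly unreliable. The window size, the zero threshold and the input values are assumptions for illustration; each evaluation tool defines its own scale and cutoff.

```python
def smooth(values, window=5):
    """Centered moving average; the window shrinks at the chain ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def flag_unreliable(energies, window=5, threshold=0.0, first_residue=1):
    """Residue numbers whose smoothed pseudo-energy lies above the threshold (the peaks)."""
    profile = smooth(energies, window)
    return [first_residue + i for i, e in enumerate(profile) if e > threshold]
```

Smoothing keeps single noisy residues from being flagged, so what remains are contiguous peaks, which often point back to a misaligned segment.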
Summary:
5 Basic Steps
Hands ON
The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
1. Searching For Structures
Get your sequence
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAG
KTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQ
AIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTA
LAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGE
RLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
http://www.uniprot.org/
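Before submitting the sequence anywhere, it can be worth checking that the FASTA record parses to the expected length (273 residues for DAPB_ECOLI). A minimal parser sketch, assuming plain FASTA text; libraries such as Biopython's `Bio.SeqIO` handle this more robustly.

```python
def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}; sequence lines are concatenated."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]  # keep the identifier, drop the description
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

For the record above, `len(read_fasta(text)["DAPB_ECOLI"])` should be 273, matching the sequence length of the query protein.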
1. Searching For Structures
Find templates with significant homology:
• BLAST against the sequences in the PDB
Also find more distant templates, using a profile-to-profile approach:
• FFAS03 server
• HHPRED server
1. Searching For Structures
Blast against the PDB
http://www.ncbi.nlm.nih.gov/BLAST/
1. Searching For Structures
Blast against the PDB
1. Paste the sequence
2. Select the PDB database
3.
http://www.ncbi.nlm.nih.gov/BLAST/
1. Searching For Structures
Blast against the PDB
http://www.ncbi.nlm.nih.gov/BLAST/
1. Searching For Structures
Use fold recognition - FFAS03
1. Paste the sequence
2. Select the PDB database
3. Run
1. Searching For Structures
Use fold recognition - HHPRED
http://toolkit.tuebingen.mpg.de/hhpred
1. Paste the sequence
2. Select the PDB database
3. Run
2. Selecting templates
Blast against the PDB
[Figure: BLAST hits, marking the real structure of our protein and the closest homologous structure]
2. Selecting templates
Blast against the PDB
The selected
template:
1VM6, chain A
http://www.ncbi.nlm.nih.gov/BLAST/
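When many PDB hits are returned, ranking them by script can help. This sketch assumes BLAST's standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the e-value cutoff and the ranking rule (lowest e-value, then highest identity) are assumptions, and the hit names in the test are made up.

```python
def parse_blast_tab(text):
    """Parse BLAST -outfmt 6 lines into dicts with the fields used for ranking."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append({
            "subject": f[1],
            "identity": float(f[2]),
            "length": int(f[3]),
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        })
    return hits


def rank_templates(hits, max_evalue=1e-5):
    """Keep significant hits, best first: lowest e-value, then highest identity."""
    significant = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(significant, key=lambda h: (h["evalue"], -h["identity"]))
```

The top-ranked hit is only a starting point: the other template-selection criteria (environment, structure quality, purpose of modeling) still apply.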
2. Selecting templates
Use fold recognition - FFAS03
http://ffas.ljcrf.edu/ffas-cgi/cgi/get_mu.pl?ses=&qdb=public&tdb=PDB0408&type=re&key=221830166.3750.0000000
2. Selecting templates
Use fold recognition - FFAS03
Scores below -9.5 are significant
2. Selecting templates
Use fold recognition - HHPRED
http://toolkit.tuebingen.mpg.de/hhpred/histograms/8455009
2. Selecting templates
Use fold recognition - HHPRED
2. Selecting templates
Who is our template?
PDB ID 1VM6 is
UniProt entry
‘DAPB_THEMA’
www.ebi.ac.uk/thornton-srv/databases/pdbsum
3. Alignment
http://consurftest.tau.ac.il/
3. Alignment
No model
yet…
We will use ConSurf to get homologues and build an MSA
3. Alignment
ConSurf settings:
• Set to max: 500
• Redundancy
• Min. identity
• Alignment method
• Database: Swiss-Prot / UniProt / UniRef90 / NR
3. Alignment
Fill in the job name and email
3. Alignment
ConSurf results: the PSI-BLAST result, the filtered sequences and the MSA (download the file with a right-click)
Easiest Using BioEdit
• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
• Easy-to-use sequence alignment editor
• View and manipulate alignments of up to 20,000 sequences.
• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing that behaves like a text editor.
• Reads and writes GenBank, Fasta, Phylip 3.2, Phylip 4, and NBRF/PIR formats. Also reads GCG and Clustal formats.
• Find a specific sequence: “Edit -> Search -> in titles”
• Erase/add sequences: “Edit -> cut/paste/delete sequence”
• “Sequence Identity Matrix” under “Alignment” is useful for a rough evaluation of distances within the alignment.
• After removing sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps.
• You can save an image using “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
3. Alignment
Extract query-template pairwise alignment
1. Open BioEdit: Start -> Phylogeny -> BioEdit
2. Open the alignment: File -> Open -> ‘query.aln’
3. Select the template: Edit -> Search -> Find in Titles -> “DAPB_THEMA”
3. Alignment
Extract query-template pairwise alignment
“DAPB_THEMA”
3. Alignment
Extract query-template pairwise alignment
4. Add the query to the selection: Ctrl+click on ‘query’
5. Invert the selection: Edit -> Invert title selection
6. Delete the other sequences: Edit -> Cut Sequence(s)
7. Minimize gaps: Alignment -> Minimize Alignment
8. Save the pairwise alignment: File -> Save as (Fasta format) -> “DAPB_ECOLI_1VM6.fas”
3. Alignment
Extract query-template pairwise alignment
[Screenshot: the query and DAPB_THEMA sequences, and the file name in the Save dialog]
Save in “fasta” format!
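The BioEdit steps (keep the query and the template, delete the other sequences, minimize the alignment, save as FASTA) can also be scripted. This sketch assumes the MSA is aligned FASTA text and that the two sequences are named exactly "query" and "DAPB_THEMA"; "Minimize Alignment" is approximated as removing the columns that are gaps in both remaining sequences.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA into an ordered {name: aligned sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def extract_pair(msa_text, query="query", template="DAPB_THEMA", gap="-"):
    """Keep two sequences from an MSA and drop columns that are gaps in both."""
    msa = read_aligned_fasta(msa_text)
    q, t = msa[query], msa[template]
    kept = [(a, b) for a, b in zip(q, t) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def to_fasta(records, width=60):
    """Write {name: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Gaps that remain in only one of the two sequences are kept, because they encode the insertions and deletions of the query-template alignment.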
3. Alignment
Use fold recognition - FFAS03
Scores below -9.5 are significant
3. Alignment
Use fold recognition - FFAS03
http://ffas.ljcrf.edu/ffas-cgi/cgi/get_mu.pl?ses=&qdb=public&tdb=PDB0408&type=re&key=221830166.3750.0000000
3. Alignment
Use fold recognition - HHPRED
http://toolkit.tuebingen.mpg.de/hhpred/histograms/8455009
3. Alignment
Use fold recognition - HHPRED
3. Alignment
Inspect query-template pairwise alignment
• Generally speaking, in this step we would compare the
pairwise alignments computed by the three approaches:
• MSA-derived
• FFAS03
• HHPRED
• We don’t have the time/patience for that now….
• Thus, we will now edit the MSA-derived pairwise alignment: Modeller requires a specific format (PIR), which we have to adjust manually
3. Alignment
Edit query-template pairwise alignment
The name of the query protein (this will
be the name of the modeled PDB file)
>P1;DAPB_ECOLI
Start, end and chain
sequence:DAPB_ECOLI:1:A:273:A::::
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGV
TVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAAD
IAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAH
ALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSR
MTFANGAVRSALWLSGKESGLFDMRDVLDLNNL*
The PDB file of the template
>P1;1VM6
(rename the DAPB_THEMA entry to 1VM6)
structureX:1VM6:1:A:212:A ::::
-----MKYGIVGYSGRMGQEIQKVFSE-KGHELVLKVDV-----------------------NGVEEL-DSPDVVIDFSSPEALPKTVDLCKKYRAGLVLGTTALKEEHLQMLRELSKE
VPVVQAYNFSIGINVLKRFLSELVKVLE-DWDVEIVETHHRFKKDAPSGTAILLESAL-------------------GK----SVPIHSLRVGGVPGDHVVVFGNIGETIEIKHRAISR
TVFAIGALKAAEFLVGKDPGMYSFEEVI-----*
Save as “dapb_ecoli_1vm6.pir”
4. Model Building
A script for Modeller: copy it into a text file…
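from modeller import *
from modeller.automodel import *   # loads the automodel class
log.verbose()                       # request verbose log messages
env = environ()                     # create a new Modeller environment
# Build a model of DAPB_ECOLI on the template 1VM6, using the PIR alignment
a = automodel(env,
              alnfile='dapb_ecoli_1vm6.pir',  # query-template alignment (PIR)
              knowns='1VM6',                  # template code; its file is 1VM6.pdb
              sequence='DAPB_ECOLI')          # query code in the alignment
a.starting_model = 1                # index of the first model
a.ending_model = 1                  # index of the last model: build a single model
a.make()                            # run the modeling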
4. Model Building
Get the template structure from http://www.rcsb.org/pdb/home/home.do:
1. Paste the template’s PDB ID “1VM6”
4. Model Building
Get the template structure: 1vm6 chain A
Save as “1VM6.pdb” (notice: the file name is case sensitive!)
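The template can also be fetched and trimmed to chain A by script. This sketch assumes the RCSB download address `https://files.rcsb.org/download/<ID>.pdb` (a newer address than the one on the slide) and the fixed-column PDB format, in which the chain identifier sits in column 22. Keeping only chain A matches the chain used in the alignment here; `download_pdb` needs network access and is not run in the check below.

```python
import urllib.request


def download_pdb(pdb_id):
    """Fetch a PDB-format entry from RCSB as text (requires network access)."""
    url = f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def keep_chain(pdb_text, chain="A"):
    """Keep the ATOM/HETATM records of one chain (PDB column 22, index 21), plus END."""
    kept = [line for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "HETATM")) and len(line) > 21 and line[21] == chain]
    kept.append("END")
    return "\n".join(kept) + "\n"
```

A typical use is to write `keep_chain(download_pdb("1vm6"), "A")` to a file named `1VM6.pdb`, so that the file name matches the template code in the alignment.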
4. Model Building
Running modeller:
1. Put the PDB file, the PIR alignment and the Modeller script in the same directory, e.g. c:\test
2. Desktop -> Modeller:
4. Model Building
Running modeller:
3. “cd c:\test”
4. “mod9v7 [modeller script name]”
4. Model Building
Running modeller:
5. The run completed successfully:
4. Model Building
Running modeller:
6. Output files:
• Model, e.g. “DAPB_ECOLI.B99990001.pdb”
• Log file - very important - reports the problems of the run
• Other, less important, files
7. Open PyMOL and look at your model….
8. Evaluate it - tomorrow!
4. Model Building
Edit query-template pairwise alignment
Watch out! Modeller can fail owing to:
1. Non-matching start and end points of the template in the PIR alignment and in the PDB template file
2. Small discrepancies between the template sequence in the PDB file and in the PIR alignment… you may have to edit the alignment manually a little…
This, and more, will be reported in the log file.