Manually Adjusting Multiple Alignments
Chris Wilton
Multiple Alignments
 Reviewing multiple alignments
– what is a multiple alignment?
 Analyzing a multiple alignment
– what makes a ‘good’ multiple alignment?
– what can it tell us, why is it useful?
 Adjusting a multiple alignment
– Alignment editors and HowTo
– Demonstration and practice
What is a Multiple Alignment?
 A comparison of sequences
– “multiple sequence alignment”
 A comparison of equivalents:
– Structurally equivalent positions
– Functionally equivalent residues
– Secondary structure elements
– Hydrophobic regions, polar residues
A Good Multiple Alignment?
 Difficult to define…
 Good ones look pretty!
– Aligned secondary structures
– Strongly conserved residues / regions
– Comparison with known structure helps
 Bad ones look chaotic and random.
[Figure: example alignment with conservation, quality and consensus annotation rows]
Multiple Alignment Features
 Barton (1993)
– “The position of insertions and deletions suggests
regions where surface loops exist…
– Conserved glycine or proline suggests a β-turn…
– Residues with hydrophobic properties conserved at i,
i+2, i+4 (etc) separated by unconserved or hydrophilic
residues suggests a surface β-strand…
– A short run of hydrophobic amino acids (4 or 5 residues)
suggests a buried β-strand…
– Pairs of conserved hydrophobic amino acids separated
by pairs of unconserved or hydrophilic residues
suggests an α-helix with one face packed in the protein
core. Similarly, an i, i+3, i+4, i+7 pattern of conserved
residues.”
 Cysteine is a rare amino acid and often forms disulphide bonds (look for pairs of conserved cysteines)
 Charged residues (histidine, aspartate, glutamate, lysine, arginine) and other polar residues embedded in a conserved region indicate functional importance
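The periodicity rules quoted from Barton (1993) can be checked mechanically. The following is a minimal sketch, not a secondary structure predictor: the hydrophobic residue set, the function names and the toy alignment are all assumptions made for illustration, and "conserved" is simplified to "every non-gap residue in the column is hydrophobic".

```python
# Assumed hydrophobic set for illustration; other classifications exist.
HYDROPHOBIC = set("AVLIMFWC")

def conserved_hydrophobic(alignment):
    """One flag per column: True if every non-gap residue is hydrophobic."""
    flags = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        flags.append(bool(residues) and all(r in HYDROPHOBIC for r in residues))
    return flags

def find_pattern(flags, offsets):
    """Start columns i where exactly the columns i+k (k in offsets) are
    flagged and every column in between is not flagged."""
    span = max(offsets)
    hits = []
    for i in range(len(flags) - span):
        if all(flags[i + k] == (k in offsets) for k in range(span + 1)):
            hits.append(i)
    return hits

STRAND_OFFSETS = (0, 2, 4)    # i, i+2, i+4: possible surface beta-strand
HELIX_OFFSETS = (0, 3, 4, 7)  # i, i+3, i+4, i+7: possible helix with one buried face

# Toy alignment (made up): hydrophobic columns alternate with polar ones.
toy = ["VKLTIE", "ISVRLD", "LTIKVN"]
flags = conserved_hydrophobic(toy)
strand_starts = find_pattern(flags, STRAND_OFFSETS)
helix_starts = find_pattern(flags, HELIX_OFFSETS)
```

Here the alternating pattern is reported as a strand candidate starting at column 0, while the alignment is too short to hold the helix pattern. Hits like these are hints to inspect by eye, not predictions.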
Quality Assessment
 Bad residues
– Large distance from column consensus
 Bad columns
– High average distance from consensus (“entropy”)
 Bad regions
– Low profile scores
 Bad quality doesn’t always mean badly aligned!
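One common way to put a number on a "bad column" is Shannon entropy over the residues in that column. The sketch below assumes that definition (the slides do not say which measure the quality score uses) and treats a gap as just another symbol. The five sequences are the motif example used later in this deck.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column.
    0.0 means fully conserved; higher means more mixed."""
    total = len(column)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(column).values())

alignment = [
    "RPDDWHLHLR",
    "GGIDTHVHFI",
    "GFTLTHEHIC",
    "PFVEPHIHLD",
    "PKVELHVHLD",
]

# zip(*alignment) walks the alignment column by column.
entropies = [column_entropy(col) for col in zip(*alignment)]
conserved = [i for i, e in enumerate(entropies) if e == 0.0]
```

The two histidine columns come out with zero entropy and every other column scores above zero. A high-entropy column is only a candidate for inspection: a variable surface loop can be perfectly well aligned and still score badly.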
Quality Assessment
 Profiles
– A profile holds scores for each residue type (plus gaps)
over every column of a multiple alignment
– Concepts:
• Consensus sequence
• Amino acid similarity
– Some multiple alignment programs use profiles to build
or add to an alignment
– Any alignment, or even one sequence, can be a profile
(one sequence isn’t a very good one…)
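As a rough illustration of the idea, a profile can be sketched as one table of symbol frequencies per column, with the consensus sequence read off as the most frequent symbol in each column. This is a simplification: real profiles usually also weight scores by amino acid similarity (a substitution matrix), which is left out here. The function names and the toy alignment are assumptions.

```python
from collections import Counter

def build_profile(alignment):
    """One dict per column, mapping each symbol (gaps included) to its
    frequency in that column."""
    n = len(alignment)
    return [{sym: count / n for sym, count in Counter(col).items()}
            for col in zip(*alignment)]

def consensus(profile):
    """Most frequent symbol per column; ties go to the symbol seen first."""
    return "".join(max(col, key=col.get) for col in profile)

toy = ["AC-", "ACG", "TCG"]   # made-up three-sequence alignment
profile = build_profile(toy)
cons = consensus(profile)
```

Calling `build_profile` on a single sequence also works, which shows why one sequence makes a poor profile: every column is 100% one residue and carries no information about tolerated substitutions.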
What can we do with a MA?
 Identify subgroups (phylogeny)
– Intra-group sequence conservation
– Evolutionary relatedness (view tree)
 Identify motifs (functionality)
– Evolutionary signals
– Highly conserved residues indicate
functional or structural significance!
 Widen search for related proteins
– MA better than single sequence
– Consensus sequence / profile useful
RPDDWHLHLR
GGIDTHVHFI
GFTLTHEHIC
PFVEPHIHLD
PKVELHVHLD
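The five aligned fragments above can be reduced to a motif by keeping only the fully conserved columns. The sketch below writes the result as a regular expression and uses it to scan another sequence. The function name, the threshold parameter and the query sequence are hypothetical; only the five fragments come from the slide.

```python
import re
from collections import Counter

def conserved_pattern(alignment, threshold=1.0):
    """Regex-style pattern: the residue where a column is conserved at or
    above the threshold, '.' (any residue) everywhere else."""
    pattern = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= threshold:
            pattern.append(residue)
        else:
            pattern.append(".")
    return "".join(pattern)

fragments = [
    "RPDDWHLHLR",
    "GGIDTHVHFI",
    "GFTLTHEHIC",
    "PFVEPHIHLD",
    "PKVELHVHLD",
]
pattern = conserved_pattern(fragments)

# Made-up query sequence, used only to show the search step.
query = "MKTAYIAKHAHGLS"
match = re.search(pattern, query)
```

Only the two histidines survive as fixed positions, so the pattern is an H-x-H signature. A pattern this short will match many unrelated proteins, which is one reason a full profile is a better search query than a bare motif.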
What do we want to do?
 Build a homology model?
– Accuracy
 Perform phylogenetic analysis?
– Completeness
 Functional analysis of a protein family?
– Diversity
Building the initial alignment
 Fetch related sequences and run alignment
– Clustal, Dialign, TCoffee, Muscle …
 Fetch a multiple alignment from a database
and add sequences of interest
– Pfam, ProDom, ADDA …
 Start from a motif-finding procedure
– MEME, Pratt, Gibbs Sampler …
Adjusting the alignment
1. Filter alignment:
– Remove any redundancy
– Remove unrelated sequences
– Remove unwanted domains
– Recalculate alignment if necessary
2. Look for conserved motifs, adjust any
misalignments. Try different colour schemes and
thresholds.
3. One step at a time…
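The "remove any redundancy" step in the filtering list can be sketched as a greedy filter on pairwise percent identity. The 90% cut-off, the function names and the toy sequences are assumptions; alignment editors provide their own redundancy tools, and this only illustrates the idea.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def remove_redundant(alignment, max_identity=90.0):
    """Greedy filter: keep a sequence only if it is no more than
    max_identity percent identical to every sequence kept so far."""
    kept = {}
    for name, seq in alignment.items():
        if all(percent_identity(seq, other) <= max_identity
               for other in kept.values()):
            kept[name] = seq
    return kept

# Made-up aligned sequences; seqB duplicates seqA exactly.
toy = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIKL",
    "seqC": "ACDE-GHIRL",
    "seqD": "MWDE-PQIRV",
}
kept = remove_redundant(toy)
```

The result depends on input order, since the first member of each redundant group is the one that survives. Put the sequences you care about most (for example those with known structures) at the top before filtering.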
Jalview Alignment Editor
Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004), "The Jalview Java Alignment Editor", Bioinformatics, 20, 426-7.
Colouring your alignment
 Hydrophobic / polar: hydrophobic vs. polar
 Buried index: buried vs. surface
 β-strand likelihood: probable vs. unlikely
 Helix likelihood: probable vs. unlikely
 By conservation thresholds
 Conservation index
– Amino acid property classification schema, e.g. Livingstone & Barton (1993)
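Threshold colouring can be imitated in plain text. The sketch below uses a simple identity threshold rather than the property-based conservation index of Livingstone & Barton, which is a deliberate simplification. A residue is "coloured" (shown in upper case) only when it is the most common residue in its column and that residue reaches the threshold; everything else is shown in lower case. The function name and the 60% threshold are assumptions.

```python
from collections import Counter

def highlight_conserved(alignment, threshold=0.6):
    """Upper case for residues matching a column consensus that reaches the
    threshold frequency; lower case for everything else."""
    n = len(alignment)
    top = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        # None marks a column that is below threshold or gap-dominated.
        top.append(residue if residue != "-" and count / n >= threshold else None)
    return ["".join(r.upper() if r == t else r.lower()
                    for r, t in zip(seq, top))
            for seq in alignment]

alignment = [
    "RPDDWHLHLR",
    "GGIDTHVHFI",
    "GFTLTHEHIC",
    "PFVEPHIHLD",
    "PKVELHVHLD",
]
rendered = highlight_conserved(alignment)
```

Raising the threshold to 1.0 leaves only the two histidine columns highlighted, while lowering it brings in partially conserved columns. Trying several thresholds, as suggested in the adjustment steps, is a quick way to see which signals are robust.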
Sequence Features
Check PDB Structures
 Load MA with sequence(s) for known PDB structure
– View >> Feature Settings >> Fetch DAS Features (wait...) OR
– Right-click >> Associate Structure with Sequence >> Discover PDB ids (quicker)
 Right-click sequence name >> View PDB Entry
 Structure opens in new window – residues acquire MA colours
 Highlight residues by hovering mouse over alignment or structure
 Label residues by clicking on structure
Compare Alignment to Structure
 Crucial way of checking alignment!
 Where are gaps / insertions / deletions?
– In secondary structures: bad
– In surface loops: okay
 Where are our key / functional residues?
– Are they in probable active site?
– Check they are clustered
– Check they are accessible, not buried
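The gap check above can be sketched in code if a secondary structure string is available for the alignment columns. Everything here is an assumption made for illustration: the per-column string (H for helix, E for strand, '-' for loop, in the style of a DSSP assignment mapped onto the alignment), the function name and the toy data.

```python
def classify_gap_columns(alignment, secondary_structure, structured="HE"):
    """Split gap-containing columns into those inside helices/strands
    (suspicious) and those in loops (usually acceptable)."""
    in_structure, in_loops = [], []
    for i, column in enumerate(zip(*alignment)):
        if "-" not in column:
            continue
        if secondary_structure[i] in structured:
            in_structure.append(i)
        else:
            in_loops.append(i)
    return in_structure, in_loops

# Made-up alignment and per-column secondary structure assignment.
toy = [
    "MKVLAAGDPETRWY",
    "MKVL-AGDPETRWY",
    "MRVLAAG--ETKWY",
]
ss = "-EEEEE-----HHH"

suspicious, acceptable = classify_gap_columns(toy, ss)
```

In this toy case the single gap in the second sequence falls inside a strand and is worth adjusting by hand, while the two-column gap in the third sequence sits in a loop and can probably stay.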
Demonstration and Practice
1. Start Jalview (click here)
2. Tools >> Preferences >>
– Visual: select Maximise Window, unselect Quality, set Font Size to 8 or 9, Colour >> Clustal, uncheck Open File
– Editing: check Pad Gaps When Editing
3. File >> Input Alignment >> from URL (use this one)
4. Get used to the controls – selecting and deselecting
sequences/groups (drag mouse), dragging sequences/groups (use shift/ctrl),
selecting sequence regions, hiding sequences/groups, removing columns and
regions… Then explore menus and tools.
5. Now load this alignment – I’ve messed up a good
alignment, and now I’d like you to correct it! There are two
groups of sequences and one single sequence to adjust.
6. View >> Feature Settings >> DAS Settings
– Select Uniprot, dssp, cath, Pfam, PDBsum_ligands, PDBsum_DNAbinding, then click ‘Save as default’
– Click Fetch DAS Features (then click yes at prompt) ...
– Move mouse over alignment and read information about features
– Move mouse over sequence names to check for PDB ids
7. Open a PDB structure (choose any)
8. View >> uncheck Show All Chains, then use up-arrow key
to increase structure size.
9. Hover mouse over structure (see how residues are
highlighted in the sequence), then do same for sequence.
Select residues in the structure by clicking them – a label
will appear. Click again to remove label.
10. Check position of insertions & deletions using this method.