Proper structural fold of protein
molecule is essential to execute its
precise functional mission
Md Abu Reza, PhD
Associate Professor
Dept of Genetic Eng & Biotech
University of Rajshahi
Bioinformatics Workshop-1
Date : 24th March, 2012
Venue : Dept of Statistics, RU
1
Higher Education Quality Enhance Project
Molecular Organization of a cell
2
Protein – The Master Molecule

Proteins control all biological systems in a cell

They either act in constituting structure or perform distinct
biological function in any physiological system

Although some proteins perform their functions independently, the vast
majority of proteins interact with others for proper biological
activity

To perform its function effectively, a protein needs a proper structure.
Without proper structure a protein is useless or may even cause
malfunction in the system

Conformation and functional-group chemistry control function

Made up of 20 different types of amino-acid monomers

Proteins define what an organism is, what it looks like, how it
behaves, etc. (responsible for most phenotype)
3
Protein Function
4
Function of Proteins
5
Protein Function is Related to Structure 6
What are Proteins?
• Proteins are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form in a biologically functional way.
• A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds.
• The 20 natural amino acids join in different permutations and combinations, in chains of different lengths.
• Once linked in the protein chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone.
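To give a feel for how many different chains 20 amino acids can form, here is a minimal sketch. It assumes every position in the chain can hold any of the 20 standard amino acids; the function name `sequence_space` is only illustrative.

```python
# Number of distinct polypeptide sequences of a given length,
# assuming each position can hold any of the 20 standard amino acids.
NUM_AMINO_ACIDS = 20


def sequence_space(length: int) -> int:
    """Return how many different sequences of `length` residues exist."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


# Print the size of the sequence space for a few short chain lengths.
for n in (2, 3, 10):
    print(n, sequence_space(n))
```

Even a short peptide has a very large number of possible sequences, which is why sequence alone defines such a wide variety of proteins.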
Amino Acids
Lysine with the
carbon atoms in the
side-chain labeled
Amino
Terminal
Carboxy
Terminal
8
How are peptide bonds formed?
• Here both amino acids are alanine, in which the R group is a methyl group (-CH3).
• The carboxyl end of the first amino acid is oriented toward the amino group of the second amino acid.
• The -OH group and an -H are removed to form water (condensation reaction).
• The bond forms between the carboxyl carbon of the first amino acid and the nitrogen of the second amino acid.
• The backbone of the molecule has the sequence N-C-C-N-C-C.
• Polypeptides maintain this sequence no matter how long the chain.
• The R groups project from the backbone.
• As amino acids are added during translation, the polypeptide folds up into its specific shape.
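Because each peptide bond releases one water molecule, the mass of a peptide is the sum of its free amino acid masses minus one water per bond. The sketch below illustrates that bookkeeping. The mass table holds only two amino acids and the function name `peptide_mass` is hypothetical; the values are rounded average masses in daltons.

```python
WATER_MASS = 18.015  # average mass of H2O in daltons

# Rounded average masses (Da) of two free amino acids; extend as needed.
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09}


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acids minus one water per peptide bond."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER_MASS


# Ala-Ala dipeptide: two alanines joined by one peptide bond.
print(round(peptide_mass("AA"), 3))
```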
9
Colour codes used for atoms
Element         Color name
Carbon          light grey
Oxygen          red
Hydrogen        white
Nitrogen        light blue
Sulfur          yellow
Phosphorus      orange
Chlorine        green
Bromine, Zinc   brown
Sodium          blue
Iron            orange
Magnesium       dark green
Calcium         dark grey
Unknown         deep pink
10
Stereochemistry
The CORN Law
11
Structure of the 20 naturally occurring Amino Acids
12
Structure of the 20 naturally occurring Amino Acids
13
Amino Acid Properties
The 20 amino acids can be divided into several groups based on their
properties. Important factors are charge, hydrophilicity or hydrophobicity, size,
and functional groups. Water-soluble proteins tend to have their hydrophobic
residues (Leu, Ile, Val, Phe, and Trp) buried in the middle of the protein,
whereas hydrophilic side chains are exposed to the aqueous solvent.
14
Livingstone & Barton, CABIOS, 9, 745-756, 1993
15
Group I: Nonpolar amino acids
Group I amino acids are alanine, valine, leucine, isoleucine, proline,
phenylalanine, methionine, and tryptophan. The R groups of these
amino acids have either aliphatic or aromatic groups. This makes them
hydrophobic (“water fearing”). In aqueous solutions, globular
proteins will fold into a three-dimensional shape to bury these
hydrophobic side chains in the protein interior.
16
Group II: Polar, uncharged amino
acids
Group II amino acids are glycine,
serine, cysteine, threonine, tyrosine,
asparagine, and glutamine. The side
chains in this group possess a spectrum
of functional groups. However, most
have at least one atom (nitrogen,
oxygen, or sulfur) with electron pairs
available for hydrogen bonding to water
and other molecules. Polar amino acids are
hydrophilic.
17
Group III: Acidic amino acids
The two amino acids in this group are aspartic acid and glutamic acid. Each has
a carboxylic acid on its side chain that gives it acidic (proton-donating) properties.
In an aqueous solution at physiological pH, all three functional groups on these
amino acids will ionize, thus giving an overall charge of −1. In the ionic forms, the
amino acids are called aspartate and glutamate.
18
Group IV: Basic amino acids
The three amino acids in this group are arginine, histidine, and lysine. Each side
chain is basic (i.e., can accept a proton). Lysine and arginine both exist with an
overall charge of +1 at physiological pH. The guanidino group in arginine’s side
chain is the most basic of all R groups (a fact reflected in its pKa value of 12.5).
As mentioned above for aspartate and glutamate, the side chains of arginine and
lysine also form ionic bonds. The chemical structures of Group IV amino acids
are
19
20
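The four groups described on the preceding slides can be written down as a simple lookup table. The sketch below uses the grouping exactly as listed there (other textbooks place some residues, such as glycine, differently). The names `GROUPS` and `group_composition` are illustrative only.

```python
from collections import Counter

# Grouping of the 20 amino acids (one-letter codes) as listed on the slides.
GROUPS = {
    "nonpolar": "AVLIPFMW",         # Group I
    "polar uncharged": "GSCTYNQ",   # Group II
    "acidic": "DE",                 # Group III
    "basic": "RHK",                 # Group IV
}

# Reverse lookup: residue letter -> group name.
RESIDUE_TO_GROUP = {
    residue: group for group, residues in GROUPS.items() for residue in residues
}


def group_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each group."""
    return Counter(RESIDUE_TO_GROUP[residue] for residue in sequence.upper())


# A short made-up peptide with one residue from each group.
print(group_composition("ACDK"))
```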
Why Proteins Need Structure!
Functions
Diverse functions related to structure:
• Structural components of cells
• Motor proteins
• Enzymes
• Antibodies
• Hormones
• Hemoglobin/myoglobin
• Transport proteins in blood
21
Protein structure - bonding
Interactions (forces) governing protein structure
Covalent interactions
• Peptide bond
• Disulfide bond
Non-covalent interactions
• Hydrogen bond
• Ionic bond (electrostatic interactions)
• Salt bridge
• Van der Waals interactions
• Hydrophobic force
22
Disulfide bond
• Covalent bond between sulfur atoms on two cysteine amino acids
• Very strong interaction
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p32
23
Levels of Protein Structure
24
Hierarchical nature of protein
structure
Primary structure (Amino acid sequence)
↓
Secondary structure (α-helix, β-sheet )
↓
Tertiary structure (Three-dimensional structure
formed by assembly of secondary structures)
↓
Quaternary structure (Structure formed by more
than one polypeptide chain)
25
Primary protein structure
Primary structure of insulin
• Linear sequence of amino acids forms primary structure
• Sequence essential for proper physiological function
Bettelheim & March (1990) Introduction to Organic & Biochemistry (International Edition) Philadelphia: Saunders College Publishing, p299
Sickle cell anemia
27
28
Sickle-Cell Disease
Secondary structure = local folding
of residues into regular patterns
29
Secondary protein structure
Peptide chains fold into secondary structures:
• α-helix
• β-pleated sheet
• Random coil

30
Peptide Bonds are Planar
• For a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α carbon atom and CO group of the first amino acid and the NH group and α carbon atom of the second amino acid.
• The C-N distance in a peptide bond is typically 1.32 Å.
• Two configurations are possible for a planar peptide bond. In the trans configuration, the two α carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds are trans.

The peptide bond is planar
32
Torsion Angle
• In contrast with the peptide bond, the bonds between the amino group and the α carbon atom and between the α carbon atom and the carbonyl group are pure single bonds. The two adjacent rigid peptide units may rotate about these bonds, taking on various orientations.
• This freedom of rotation about two bonds of each amino acid allows proteins to fold in many different ways. The rotations about these bonds can be specified by torsion angles.
• The angle of rotation about the bond between the nitrogen and the α carbon atoms is called phi (φ).
• The angle of rotation about the bond between the α carbon and the carbonyl carbon atoms is called psi (ψ).
• A clockwise rotation about either bond as viewed from the nitrogen atom toward the α carbon atom or from the carbonyl group toward the α carbon atom corresponds to a positive value.
• The φ and ψ angles determine the path of the polypeptide chain.
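A torsion angle is defined by four consecutive atoms. As a minimal sketch, the function below computes the signed angle from four 3D coordinates using plain Python. For phi the four atoms would be C(i-1), N, Cα, C; for psi they would be N, Cα, C, N(i+1). The function name `torsion_angle` and the test coordinates are illustrative, not taken from the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _scale(a, s):
    return tuple(x * s for x in a)


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Positive values correspond to a clockwise rotation when looking
    along the bond from p1 toward p2.
    """
    b0 = _sub(p0, p1)          # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)          # central bond (rotation axis)
    b2 = _sub(p3, p2)          # bond pointing forward from p2 to p3
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Remove the components along the axis, leaving the parts
    # perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four made-up points: the last atom is rotated a quarter turn from the first.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```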
The peptide bond is planar
34
Ramachandran plot: shows φ and ψ
angles for secondary structures
The rotations about the φ and ψ bonds
are measured as angles that lie
between -180° and +180°
φ and ψ angles for secondary structures
Secondary structure conformation
Conformation              Phi (φ)   Psi (ψ)
α helix                   -57       -47
α left-handed helix       +57       +47
π helix                   -57       -80
3-10 helix                -49       -56
β sheet, parallel         -119      113
β sheet, anti-parallel    -119      135
Residue Conformational Preference
Conformation   Preferred residues
α helix        A, L, M, Q, K, R, E
β strand       V, I, Y, C, W, F, T
Turn           G, N, P, S, D
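The tabulated angles can be used as reference points: a residue's measured (phi, psi) pair is assigned to whichever ideal conformation lies closest. The sketch below is one simple way to do that, using the values from the table above. Reading the third helix row as the π helix is an assumption, and the nearest-point rule and the name `nearest_conformation` are illustrative rather than a standard assignment method.

```python
import math

# Ideal (phi, psi) angles in degrees, as tabulated on the slide.
# The "pi helix" label for the (-57, -80) row is an assumption.
IDEAL_ANGLES = {
    "alpha helix": (-57, -47),
    "left-handed alpha helix": (57, 47),
    "pi helix": (-57, -80),
    "3-10 helix": (-49, -56),
    "parallel beta sheet": (-119, 113),
    "anti-parallel beta sheet": (-119, 135),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, allowing wrap-around at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi: float, psi: float) -> str:
    """Return the tabulated conformation whose ideal angles are closest to (phi, psi)."""
    def distance(name: str) -> float:
        ideal_phi, ideal_psi = IDEAL_ANGLES[name]
        return math.hypot(
            angular_difference(phi, ideal_phi),
            angular_difference(psi, ideal_psi),
        )
    return min(IDEAL_ANGLES, key=distance)


print(nearest_conformation(-60, -45))
print(nearest_conformation(-120, 120))
```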
36
Alpha Helix
• In the α-helix, the carbonyl oxygen of residue “i” forms a hydrogen bond with the amide of residue “i+4”.
• Although each hydrogen bond is relatively weak in isolation, the sum of the hydrogen bonds in a helix makes it quite stable.
• The propensity of a peptide for forming an α-helix also depends on its sequence.
α-helix
• Shape maintained by hydrogen bonds between C=O and N-H groups in backbone
• R groups directed outward from coil
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p28
38
α-Helix
• Each hydrogen bond closes a loop of 13 atoms.
• 3.6 amino acids per turn of helix.
• Helices observed in proteins can range from four to over forty residues long, but a typical helix contains about ten amino acids (about three turns).
• The α-helix is also called the 3.6₁₃ helix, compared with the π-helix (4.4₁₆) and the 3₁₀ helix.
• Proline is an α-helix breaker.
39
Propensities for forming α-helical structure
Different amino-acid sequences have different
propensities for forming α-helical structure. Methionine,
alanine, leucine, uncharged glutamate, and lysine
("MALEK" in the amino-acid 1-letter codes) all have
especially high helix-forming propensities, whereas
proline and glycine have poor helix-forming propensities.
Proline either breaks or kinks a helix, both because it
cannot donate an amide hydrogen bond (having no
amide hydrogen), and also because its side-chain
interferes sterically with the backbone of the preceding
turn - inside a helix, this forces a bend of about 30° in
the helix axis
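As a rough illustration of the "MALEK" rule, the sketch below counts strong helix formers and poor helix formers in a sequence. It is only a tally based on the residues named on this slide, not a real secondary-structure predictor, and the name `helix_tendency` is hypothetical.

```python
# Residues named on the slide as strong helix formers ("MALEK")
# and as poor helix formers (proline and glycine).
HELIX_FORMERS = set("MALEK")
HELIX_BREAKERS = set("PG")


def helix_tendency(sequence: str) -> tuple:
    """Return (number of strong helix formers, number of poor helix formers)."""
    seq = sequence.upper()
    formers = sum(residue in HELIX_FORMERS for residue in seq)
    breakers = sum(residue in HELIX_BREAKERS for residue in seq)
    return formers, breakers


# A made-up peptide: five helix formers followed by proline and glycine.
print(helix_tendency("MALEKPG"))
```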
40
Examples of α-Helical Proteins:
α-helical coiled coil
proteins:
Hair
Form superhelix
Found in myosin,
tropomyosin (muscle),
fibrin (blood clots),
keratin (hair)
Also fingernails and wool are
α-helical proteins; silk is β
41
β-sheet (β-pleated sheet)
• A polypeptide chain, called a β-strand, in a β-sheet is almost fully extended rather than being tightly coiled as in the α-helix.
• The distance between adjacent amino acids along a β strand is approximately 3.5 Å, in contrast to a distance of 1.5 Å along an α helix.
• A β sheet is formed by linking two or more β strands lying next to one another through hydrogen bonds.
• All residues in a β sheet have nearly the same φ and ψ angles.
• Hydrogen bonds can only be formed between adjacent polypeptide chains.
• R groups are directed above and below the backbone.
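The per-residue distances quoted above make it easy to compare how far the same number of residues extends as a helix or as a strand. This is a back-of-the-envelope sketch that simply multiplies the residue count by the rise per residue; the names used are illustrative.

```python
# Rise per residue along the chain axis, in angstroms, as quoted on the slide.
RISE_PER_RESIDUE = {"alpha_helix": 1.5, "beta_strand": 3.5}
RESIDUES_PER_HELIX_TURN = 3.6


def segment_length(n_residues: int, conformation: str) -> float:
    """Approximate axial length (in angstroms) of a segment of n residues."""
    return n_residues * RISE_PER_RESIDUE[conformation]


def helix_turns(n_residues: int) -> float:
    """Number of helical turns formed by n residues of alpha helix."""
    return n_residues / RESIDUES_PER_HELIX_TURN


# Ten residues: a typical helix length according to the slides.
print(segment_length(10, "alpha_helix"), segment_length(10, "beta_strand"))
print(round(helix_turns(10), 2))
```

The same ten residues span more than twice the distance as a β strand, which is what "almost fully extended" means in practice.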
42
Parallel or Anti-parallel β-Sheet
• The adjacent polypeptide chains in a β-sheet can be either parallel
or anti-parallel (having the same or opposite amino-to-carboxyl
orientations, respectively).
43
H bonds between 2 same aa
H bonds between different aa
Examples of β-sheet
Proteins:
Fatty acid binding
protein -> β barrels
structure
Antibodies
more β sheets
OmpX: E. coli porin
44
44
Tertiary Structure: 3D
structure of a polypeptide
chain
Quaternary
Structure:
Polypeptide chains
assemble into
multisubunit
structures
Tetramer of hemoglobin
Cell-surface receptor CD4
45
QUATERNARY
STRUCTURE
Deoxyhaemoglobin
46
β-Turns and Loops
• β-turns allow the protein backbone to make abrupt turns.
• Again, the propensity of a peptide for forming β-turns depends on its sequence.
• In these reverse turns, the CO group of residue i of a polypeptide is hydrogen bonded to the NH group of residue i + 3.
• In other cases, more elaborate structures are responsible for chain reversals. These structures are called loops or sometimes Ω loops (omega loops) to suggest their overall shape.
Why not here
48
49
Random coil
• Not really random structure, just non-repeating
• ‘Random’ coil has fixed structure within a given protein
• Commonly called ‘connecting loop region’
• Structure determined by bonding of side chains (i.e. not necessarily hydrogen bonds)
From: Elliott, WH. Elliott, DC. (1997) Biochemistry and Molecular Biology. Oxford: Oxford University Press. p27
50
Tertiary protein structure
• Secondary structures fold and pack together to form tertiary structure
  o Usually globular shape
  o But can be fibrous
• Tertiary structure stabilized by bonds between R groups (i.e. side-chains)
51
Tertiary structure = global folding of
a protein chain
52
Tertiary structures are quite varied
53
Quaternary structures
54
Each Protein has a unique
structure
Amino acid sequence
NLKTEWPELVGKSVEE
AKKVILQDKPEAQIIVL
PVGTIVTMEYRIDRVR
LFVDKLDNIAEVPRVG
Folding!
55
Protein Folding
Folding is a highly cooperative process (all or none)
Protein Folding by Chaperones
• Chaperone proteins provide a site where misfolded proteins can fold correctly.
Folding by stabilization of intermediates
56
56
Central Dogma
DNA
Transcription
Pre mRNA
(hnRNA)
Splicing, Processing and maturation
mRNA
Translation
protein
57
Chaperonins
Chaperonins Assist in Protein Folding
They segregate protein folding from “bad influences” in the cell
Classes of proteins
Functional definition:
• Enzymes: Accelerate biochemical reactions
• Structural: Form biological structures
• Transport: Carry biochemically important substances
• Defense: Protect the body from foreign invaders
Structural definition:
• Globular: Complex folds, irregularly shaped tertiary structures
• Fibrous: Extended, simple folds; generally structural proteins
Cellular localization definition:
• Membrane: In direct physical contact with a membrane; generally water insoluble.
• Soluble: Water soluble; can be anywhere in the cell.
59
Components of Tertiary Structure
• Fold: used differently in different contexts; most broadly, a reproducible and recognizable 3-dimensional arrangement
• Domain: a compact and self-folding component of the protein that usually represents a discrete structural and functional unit
• Motif (aka supersecondary structure): a recognizable subcomponent of the fold; several motifs usually comprise a domain
• As in all fields, these terms are not used strictly, which makes capturing data that conforms to these terms all the more difficult
60
Protein Structure Computational Goals
• Compare all known structures to each other
• Compute distances between protein structures
• Classify and organize all structures in a biologically meaningful way
• Discover conserved substructure domains
• Discover conserved substructural motifs
• Find common folding patterns and structural/functional motifs
• Discover relationships between structure and function
• Study interactions between proteins and other proteins, ligands and DNA (Protein Docking)
• Use known structures and folds to infer structure from sequence (Protein Threading)
• Use known structural motifs to infer function from structure
• Many more…
Structural Classification of Proteins (SCOP)
http://scop.berkeley.edu/
• Class
  o Similar secondary structure content
  o All α, all β, alternating α/β etc
• Fold (Architecture)
  o Major structural similarity
  o SSE’s in similar arrangement
• Superfamily (Topology)
  o Probable common ancestry
  o HMM family membership
• Family
  o Clear evolutionary relationship
  o Pairwise sequence
Classes of Protein Structures
• Mainly α
• Mainly β
• α/β alternating
  o Parallel β sheets, β-α-β units
• α + β
  o Anti-parallel β sheets, segregated α and β regions
  o Helices mostly on one side of sheet
• Others
  o Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins
Folds / Architectures
• Mainly α
  o Bundle
  o Non-Bundle
• Mainly β
  o Single sheet
  o Roll
  o Barrel
  o Clam
  o Sandwich
  o Prism
  o 4/6/7/8 Propeller
  o Solenoid
• α/β and α+β
  o Closed
    - Barrel
    - Roll, ...
  o Open
    - Sandwich
    - Clam, ...
The TIM Barrel Fold
A Conceptual Problem ...
Fold versus Topology
Another example:
Globin
vs.
Colicin
PDB: Protein Data Bank
http://www.rcsb.org/pdb/
• Protein Data Bank
  o Multiple Structure Viewers
  o Sequence & Structure Comparison Tools
  o Derived Data
    - SCOP
    - CATH
    - Pfam
    - GO Terms
  o Education on Protein Structure
  o Download Structures and Entire Database
Web services for domain identification
Program         Web access
DIAL            http://www.ncbs.res.in/~faculty/mini/ddbase/dial.html
DomainParser    http://compbio.ornl.gov/structure/domainparser
DOMAK           http://www.compbio.dundee.ac.uk/Software/Domak/domak.html
PDP             http://123d.ncifcrf.gov/pdp.html
70
Protein structure prediction has
remained elusive over half a century
“Can we predict a protein structure
from its amino acid sequence?”
71
Protein Misfolding Diseases
72
Table 6-4
Misfolded proteins and Resulting Disorders
• Prions: molecules resembling ion channels, causing serious illnesses in animals and humans
  o Cause protein fibrillation
  o Cause BSE (“mad cow disease”) in cattle
• Alzheimer’s Disease
73
73
MOLECULAR BIOLOGY OF
PRION DISEASE
A normal prion (left), compared to an
aberrant, disease-causing prion (right).
Cellular processing of PrP. (1) The PrP can be internalized before degradation by proteasome or lysosomal proteases. (2) In PrPsc, processing results in limited proteolysis. Limited degradation produces PrPsc fragments, which accumulate over time and may have a role in cell death. These fragments lead to propagation of the PrPsc infection in adjacent cells.
A) Normal PrP can refold into PrPsc in the extracellular space. B) Fragments of PrPsc may remain within the cell or may be externalized by transport vesicles or by cellular rupture upon death. C) Intracellular PrPsc could interact with PrP during intracellular processing, resulting in conversion of PrP to PrPsc in the infected cell. D) Intracellular PrP may spontaneously change conformation to PrPsc.
Possible routes of propagation of ingested prions. After oral uptake, prions may penetrate the intestinal mucosa through M cells and reach Peyer's patches as well as the enteric nervous system. Depending on the host, prions may replicate and accumulate in spleen and lymph nodes. Myeloid dendritic cells are thought to mediate transport within the lymphoreticular system. From the lymphoreticular system and likely from other sites, prions proceed along the peripheral nervous system to finally reach the brain, either directly via the vagus nerve or via the spinal cord, with involvement of the sympathetic nervous system.
75
PRIONS CONT.
Sheep with scrapie
Kuru and Creutzfeldt-Jakob
disease in humans
76
How To Determine Protein Structure ?
77
Protein Structure Prediction
Structure:
• Traditional experimental methods: X-ray or NMR to solve structures; these generate a few structures per day worldwide and cannot keep pace with new protein sequences
• Strong demand for structure prediction: more than 30,000 human genes; 10,000 genomes will be sequenced in the next 10 years
• Unsolved problem after efforts of two decades
78
Protein structure and functions are
intimately related
Proteins interact with each other
The structure of a protein influences
its function by determining the other
molecules with which it can interact
and the consequences of those
interactions.
79
Experimental
methods available
to detect protein
structure and
interactions vary in
their level of
resolution.
These observations
can be classified
into four levels: (a)
atomic scale, (b)
binary interactions,
(c) complex
interactions, and
(d) cellular scale.
80
Atomic-scale methods:
showing the precise structural
relationships between interacting
atoms and residues
The highest resolution methods: e.g.,
X-ray crystallography and NMR
Not yet applied to study protein
interactions in a high-throughput
manner.
81
Binary-interaction methods:
Methods to detect interactions
between pairs of proteins
Do not reveal the precise chemical
nature of the interactions but simply
report that such interactions take place
The major high-throughput
technology: the yeast two-hybrid
system
82
Complex-interaction methods:
Methods to detect interactions between
multiple proteins that form complexes.
Do not reveal the precise chemical nature of
the interactions but simply report that such
interactions take place.
The major high-throughput technology:
systematic affinity purification followed by
mass spectrometry
83
Cellular-scale methods:
Methods to determine where
proteins are localized (e.g.,
immunofluorescence)
It may be possible to determine
the function of a protein directly
from its localization
84
Principles of protein-protein interaction analysis
These small-scale analysis methods are also useful in proteomics because the large-scale methods tend to produce a significant number of false positives
They include (a) genetic methods, (b) bioinformatic methods, (c) affinity-based biochemical methods, and (d) physical methods.
85
Genetic methods
Classical genetics can be used to
investigate protein interactions by
combining different mutations in the
same cell or organism and observing
the resulting phenotype
Suppressor mutation: A secondary
mutation that can correct the
phenotype of a primary mutation.
86
Suppressor mutation
87
Synthetic lethal effect
88
Bioinformatic methods
(A) The domain fusion method (or Rosetta stone method):
The sequence of protein X (a single-domain protein from genome 1) is used as a similarity search query on genome 2. This identifies any single-domain proteins related to protein X and also any multi-domain proteins, which we can define as protein X-Y.
As part of the same protein, domains X and Y are likely to be functionally related.
89
The domain fusion method
(or Rosetta stone method)
The sequence of domain Y can then be used to
identify single-domain orthologs in genome 1.
Thus, Gene Y, formerly an orphan with no known
function, becomes annotated due to its association
with Gene X. The two proteins are also likely to
interact.
The sequence of protein X-Y may also identify
further domain fusions, such as protein Y-Z. This
links three proteins into a functional group and
possibly identifies an interacting complex.
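The logic of the domain fusion method can be shown with a toy example. In this sketch proteins are already reduced to domain labels, which is a simplifying assumption: a real analysis would find the related domains by sequence similarity search. The function name `rosetta_stone_links` is illustrative.

```python
from itertools import combinations


def rosetta_stone_links(single_domain_proteins, fused_proteins):
    """Link single-domain proteins whose domains occur together in a fused protein.

    single_domain_proteins: iterable of domain labels found as separate proteins
    fused_proteins: iterable of tuples of domain labels (multi-domain proteins)
    Returns a set of (domain_a, domain_b) pairs in alphabetical order.
    """
    singles = set(single_domain_proteins)
    links = set()
    for domains in fused_proteins:
        # Only domains that also exist as separate proteins can be linked.
        present = sorted(set(domains) & singles)
        for a, b in combinations(present, 2):
            links.add((a, b))
    return links


# Genome 1 has separate proteins X, Y and Z; genome 2 has fusions X-Y and Y-Z.
print(sorted(rosetta_stone_links({"X", "Y", "Z"}, [("X", "Y"), ("Y", "Z")])))
```

Here X is linked to Y and Y to Z, so all three proteins fall into one functional group, as described above.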
90
The domain fusion method
(or Rosetta stone method)
91
Bioinformatics methods
(B) The phylogenetic profile:
It describes the pattern of presence or absence of
a particular protein across a set of organisms
whose genomes have been sequenced. If two
proteins have the same phylogenetic profile (that
is, the same pattern of presence or absence) in all
surveyed genomes, it is inferred that the two
proteins have a functional link.
A protein’s phylogenetic profile is a nearly unique
characterization of its pattern of distribution
among genomes. Hence any two proteins having
identical or similar phylogenetic profiles are likely
to be engaged in a common pathway or complex.
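A phylogenetic profile can be stored as a vector of 1s and 0s, one entry per surveyed genome. The sketch below pairs up proteins whose profiles differ in at most a chosen number of genomes. The profiles are made-up data, and the mismatch threshold and the name `linked_pairs` are assumptions for illustration.

```python
from itertools import combinations


def hamming(a, b) -> int:
    """Number of genomes in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_mismatches
    ]


# Made-up profiles across four genomes (1 = present, 0 = absent).
profiles = {
    "P1": (1, 0, 1, 1),
    "P2": (1, 0, 1, 1),
    "P3": (0, 1, 1, 0),
}
print(linked_pairs(profiles))
```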
92
Sequence to Structure to Function
>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPG
SRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
Cell wall degrading enzyme
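The lysozyme entry above is in FASTA format: a header line beginning with ">" followed by the sequence, which may be wrapped over several lines. The sketch below reads such a record using only the standard library; the function name `parse_fasta_record` is illustrative.

```python
def parse_fasta_record(text: str):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # join the wrapped sequence lines
    return header, sequence


RECORD = """>132L:_ LYSOZYME (E.C.3.2.1.17)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPG
SRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
"""

header, sequence = parse_fasta_record(RECORD)
print(header)
print(len(sequence), "residues")
```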
93
Correlation Between Structure &
Function
•Homologous proteins
• Conserved sequence, similar structure and function
• Example: cytochrome c
•Similar function, different sequences
• Conserved and variable regions
• Example: dehydrogenases, kinases
•Similar structure, different function
• Example: thioredoxin
94
Why must we predict structures?
• Limitations of current techniques
  o Proteins often too large for molecular modeling techniques
  o Difficult to crystallize some proteins (X-ray), slow throughput
  o Difficulty getting NMR results, reliance on modeling
• Far more sequences elucidated than structures
• 3D structures are better conserved than sequence during evolution.
95
Predicting 3D structures from Sequence?
Levinthal’s paradox
• Protein with 100 amino acids => 3^100 possible structures
• 10^-13 seconds to sample each structure
• 1.6 × 10^27 years to go through each structure
Models improve these odds
• Based on structure stability, x-ray crystallography
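The arithmetic behind Levinthal's paradox can be checked in a few lines. The sketch assumes, as on the slide, three possible states per residue and 10^-13 seconds to sample each structure; the year length of 365.25 days and the name `levinthal_years` are choices made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_years(residues: int, states_per_residue: int = 3,
                    seconds_per_state: float = 1e-13) -> float:
    """Years needed to sample every conformation one after another."""
    conformations = states_per_residue ** residues
    return conformations * seconds_per_state / SECONDS_PER_YEAR


# A 100-residue protein with 3 states per residue.
print(f"{levinthal_years(100):.1e} years")
```

The result is of the order of 10^27 years, in line with the figure quoted above, which is why a random search cannot be how proteins fold.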
96
Structure prediction methods

Ab initio


Comparative/Homology modeling


Determining structure without reference to existing
protein structures.
Determines structure based on sequence similarity.
Fold recognition/threading

Limited number of folds

Determine structure similarities independent from
sequence similarity.
97
Structure Prediction Process
98
http://www.bmm.icnet.uk/people/rob/CCP11BBS/
Protein Structure-function paradigm

Origins in the lock and key model for enzymatic activity.

Claims that rigid 3D structure of protein determines the function.

Active areas of protein structure for example active sites on
enzymes are highly conserved, other regions are more variable.

Conserved motifs are responsible for conserved functionality.

Forms the basis of proteomic studies and many other branches.

Homology is claimed to be responsible for the correlation.
99
Structure Similarity
 Refers to how well (or poorly) 3D folded
structures of proteins can be aligned
 Expected to reflect functional similarities
(interaction with other molecules)
Proteins in the TIM barrel fold family
100
Structure Similarity
 Refers to how well (or poorly) 3D folded
structures of proteins can be aligned
 Is expected to reflect functional similarities
(interaction with other molecules)
 2000: ~ 20,000 structures in PDB
~ 4,000 different folds (1:5 ratio)
 Three possible reasons:
- evolution,
- physical constraints (e.g., few ways to maximize hydrophobic
interactions),
- limits in techniques used for structure determination
 Given a new structure, the probability is high that it is
similar to an existing one
101
Why Comparing Protein Folded
Structures?
sequence
similarity
Sequence
Structure
Function
structure
similarity
 Low sequence similarity may yield very similar structures
 Sometimes high sequence similarity yields different structures
 Structure comparison is expected to provide more pertinent
information about functional (dis-)similarity among proteins,
especially with non-evolutionary relationships or non-detectable
evolutionary relationships
102
Extensions of Paradigm
Allosteric
interactions
Proteins as
biomachines
Enzyme
catalysis
Proteomics
3D structure
analysis
Assisted Protein
folding
Structure-function
paradigm
de novo
proteins
Protein self
organization
Protein misfolding
and diseases
Biotechnology
Biomedicine
Protein
engineering
103
Paradigms in structure-function
theory

Orthologues possess similar function.

Enzyme homologues are enzymes.

Regulatory domain homologues are not enzymes.

Equivalent cellular functions are mediated in different species by
homologues.

Coding regions mutate at a slower rate than non coding regions.

Domain homologues are localised in sequence and 3d structure and
possess same order of sec. structures.

Disulphide bridges are invariant among homologues.

Convergent evolution of sequences does not occur.

Domains possess single conformations.
104
Function Assessment

Statistical analysis is hard to apply to functionality assessment.

Function prediction by homology is thus qualitative requiring expert
knowledge and careful study.

Assignment of experimental knowledge from one homologue to uncharacterized sequence is basis of function prediction.

Works best in case of orthologues, can be misleading in paralogues.
Orthologue identification is most powerful tool in molecular function
prediction. Paralogues also can have overlapping functionality, esp.
in eukaryotes.
105
Fold-function Correlation

Common folds are found in unrelated protein
families.

Folds accommodating many families are called
“superfolds”. ex: TIM-barrel

Folds in combination define overall function.

Function is better assessed as a whole of parts.
106
Exceptions to the rule – natively
unfolded proteins

Class of proteins with inherently unstable structure, yet
functional. Ex: Regulatory proteins.

Unfolded in physiological state, may fold during
functional cycle.

Lack of fixed structure allows binding to multiple targets.

Target induces folding in the protein. Ex : protein-DNA or
protein-RNA interactions

Unfolded proteins easier to transfer across membranes.
107
Ground rules for Structure Prediction
• Don't always believe what programs tell you
  o they're often misleading & sometimes wrong!
• Don't always believe what databases tell you
  o they're often misleading & sometimes wrong!
• Don't always believe what others tell you
  o they're often misleading & sometimes wrong!
• In short, don't be a naive user
  o when computers are applied to biology, it is vital to understand the difference between mathematical & biological significance
  o computers don’t do biology, they do sums quickly!
108
Implication of Protein Structure and Function
109
Implication of Protein Structure and Function
Structure-Based Drug Design
Structure-based
rational drug
design is still a
major method for
drug discovery.
HIV protease inhibitor
110
CD4 on Mini Scaffold
Rational engineering of a mini-protein that
reproduces the core of the CD4 site interacting with
HIV-1 envelope glycoprotein
Vita, C. et al. Proc. Natl. Acad. Sci. USA (1999)
HIV
• Envelope
• Viral Core
The Envelope
• Bi-layer lipid outer coat
• Layer of matrix protein p17
• ~72 copies of a complex HIV protein called spikes project through the surface of the virus particle (Gelderblom et al., Virology 1987)
• Spike protein
  o Cap (3 gp120 molecules)
  o Stem (3 gp41 molecules)
Gelderblom et al. 1987
HIV
• Envelope
• Viral Core
The Viral Core
• Bullet-shaped core or capsid made of viral protein p24
• The capsid surrounds 2 single strands of HIV RNA, each of which has a copy of the 9 viral genes
• ‘gag’, ‘pol’ and ‘env’ code for structural proteins to make new virus particles
• ‘env’ codes for gp160, which is broken by a viral enzyme to form gp120 and gp41 (Janeway et al. 1999)
• ‘rev’, ‘vif’, ‘vpr’, ‘nef’, ‘tat’, and ‘vpu’: infection and replication
• REVERSE TRANSCRIPTASE, INTEGRASE and PROTEASE
Gelderblom et al. 1987
OUR IMMUNE SYSTEM
Lymphocyte
T-Cell
B-Cell
Helper: recognizes antigen and releases cytokines, which signal B-cells to produce antibody; helps differentiation of B-cells
Suppressor: after the battle, stops antibody formation and slows down the activity of B-cells and other T-cells
Memory: memorizes the antigen and helps in a quick response on the next attack
Cytotoxic T-cell: recognizes and directly kills infected cells
Plasma cells: produce antibody, i.e. make enough receptor molecules in soluble form, which bind to the microorganism
Memory cells: same as memory T cells; both work together
LGL or Natural Killer Cell
Large granular lymphocyte
Function known fully; kills tumor cells and virus-infected cells
T-Cell

Four basic kinds of T cell – T helper

Secretion of chemical messengers (cytokines), which in turn stimulate more T helper cells.

So the T cells must have a particular receptor molecule to receive this message

This receptor molecule is referred to as a CD or Cluster of Differentiation (around 130 CDs have been identified so far)

CD4 is one of these receptors; it is the main target HIV uses to anchor to the T-cell and thereby gain entry to the cell and replicate there
gp41 fusogenic domain mediates fusion
Fight Against AIDS

Reverse transcriptase, integrase and protease are the enzymes targeted in designing anti-HIV drugs

9 of 15 FDA-approved drugs target ‘reverse transcriptase’, e.g. Zidovudine, Nevirapine, Delavirdine

These are big molecules and have severe side effects, mainly on the kidney

Most of the time they are not so effective, because the viral genome is able to undergo numerous mutations in its critical areas

Co-receptor (CCR5/CXCR4) blocking is another option, but with low efficiency

The other way to prevent HIV attack may be to block the viral glycoprotein from coming in contact with the CD4 receptor of the T-cell
Fool the virus:
Design a CD4 mimic using a mini scaffold
A group from France used scorpion toxin as scaffold and designed a
chimeric protein which mimics CD4 activity
Interaction between CD4 and gp120

Whole CD4 does not bind to gp120; only one domain binds. D1 is the most important domain of CD4 for binding to gp120

D1 has a CDR2-like loop, which is the main part of the D1 domain that interacts with gp120

Kwong et al. solved the structure of the CD4-gp120 complex

The solved structure showed that Phe at position 43 and Arg at position 59 are important for binding of CD4 to gp120
CD4
Designing CD4 mimic using mini scaffold
Designing of CD4M
D1 domain of CD4

Solvent-exposed amino acid residues of the CDR2-like loop of CD4 were transferred to a charybdotoxin scaffold

The chimeric miniprotein designed was 33 amino acid residues long
Charybdotoxin scaffold
Solvent-exposed residues
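The residue-transfer step can be pictured as a set of point substitutions on a fixed-length scaffold sequence. The sketch below is only an illustration of that idea: the function name `graft_residues`, the placeholder scaffold (33 copies of `X`) and the substitution positions are all hypothetical and are not the real charybdotoxin or CD4 sequences.

```python
def graft_residues(scaffold, substitutions):
    """Return a copy of `scaffold` with residues replaced at 1-based positions.

    `substitutions` maps a 1-based position to a one-letter residue code.
    The length of the scaffold is never changed.
    """
    chimera = list(scaffold)
    for position, residue in substitutions.items():
        if not 1 <= position <= len(scaffold):
            raise ValueError(f"position {position} is outside the scaffold")
        chimera[position - 1] = residue
    return "".join(chimera)


# Placeholder scaffold: 33 unknown residues (NOT the real toxin sequence).
placeholder_scaffold = "X" * 33
# Arbitrary example substitutions (NOT the residues used in the actual design).
example_substitutions = {5: "A", 7: "K"}

chimera = graft_residues(placeholder_scaffold, example_substitutions)
print(len(chimera), chimera)
```

Because grafting only substitutes residues, the chimera keeps the scaffold's length, which is why the designed miniprotein stays as short as the scaffold it is built on.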
Implications of designing a CD4 mimic
• The designed mimic can be used as an antiviral agent
• In complex with viral coat proteins, the CD4 mimic can be used to formulate a vaccine against AIDS
• The designed CD4 mimic can be used for developing broad-spectrum neutralizing antibodies
Fight Against AIDS
Homozygous 32 deletion in the HIV co-receptor CCR5
confers resistance to HIV infection
•
Samson, M. et al. Resistance to HIV-1 infection in Caucasian
individuals bearing mutant alleles of the CCR5 chemokine
receptor gene. Nature 382, 722–725 (1996)
•
Liu, R. et al. Homozygous defect in HIV-1 coreceptor accounts
for resistance of some multiply-exposed individuals to HIV-1
infection. Cell 86, 367–377 (1996)
 CCR5-Δ32 (or CCR5-D32 or CCR5 delta 32) is a genetic variant
of CCR5
 This allele is found in 5-14% of Europeans but is rare in Africans
and Asians
12
4
 It has been hypothesized that this allele was favored by natural
selection during the Black Death (1347), which was one of the
worst epidemic in history & 1/3 of the population of Europe died
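Since resistance is tied to the homozygous genotype, it may help to see how rare homozygotes are even where the allele is common. The sketch below assumes that the 5-14% figure is an allele frequency (the slide does not say whether it counts alleles or carriers) and that the population is in Hardy-Weinberg equilibrium. The function name `genotype_frequencies` is introduced here for illustration only.

```python
def genotype_frequencies(q):
    """Hardy-Weinberg genotype frequencies for a two-allele locus.

    q is the frequency of the variant allele (here assumed to be CCR5-delta32).
    Returns (wild-type homozygote, heterozygote, variant homozygote).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


# Lower and upper ends of the range quoted on the slide (assumed allele frequencies).
for q in (0.05, 0.14):
    wild_type, heterozygous, homozygous = genotype_frequencies(q)
    print(f"q = {q:.2f}: heterozygous {heterozygous:.4f}, "
          f"homozygous delta32 {homozygous:.4f}")
```

Under these assumptions only about 0.25% to 2% of such a population would be homozygous for the deletion, while roughly 10% to 24% would be heterozygous carriers.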
Homozygous 32 deletion in the HIV co-receptor CCR5
confers resistance to HIV infection
•
Samson, M. et al. Resistance to HIV-1 infection in Caucasian
individuals bearing mutant alleles of the CCR5 chemokine
receptor gene. Nature 382, 722–725 (1996)
•
Liu, R. et al. Homozygous defect in HIV-1 coreceptor accounts
for resistance of some multiply-exposed individuals to HIV-1
infection. Cell 86, 367–377 (1996)
 CCR5-Δ32 (or CCR5-D32 or CCR5 delta 32) is a genetic variant
of CCR5
 This allele is found in 5-14% of Europeans but is rare in Africans
and Asians
12
5
 It has been hypothesized that this allele was favored by natural
selection during the Black Death (1347), which was one of the
worst epidemic in history & 1/3 of the population of Europe died
The authors created CCR5-mutant T-cells and used these cells in vitro and in an in vivo mouse model to show that the mutation confers complete resistance to HIV
They used an engineered Zinc Finger Nuclease to target human CCR5 efficiently, generating a double-strand break at a predetermined site in the CCR5 coding region, the same as the CCR5-Δ32 genotype
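Why a deletion in the coding region disables the receptor can be seen from reading-frame arithmetic: Δ32 denotes a 32-base-pair deletion, and 32 is not a multiple of 3, so every codon downstream of the deletion is read out of frame. The sketch below uses a made-up toy sequence, not the CCR5 gene, and the helper names are hypothetical.

```python
CODON_LENGTH = 3


def shifts_reading_frame(deleted_bases):
    """A deletion shifts the frame unless its length is a multiple of 3."""
    return deleted_bases % CODON_LENGTH != 0


def delete_segment(sequence, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    if start < 0 or length < 0 or start + length > len(sequence):
        raise ValueError("deletion falls outside the sequence")
    return sequence[:start] + sequence[start + length:]


def codons(sequence):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(sequence) - len(sequence) % CODON_LENGTH
    return [sequence[i:i + CODON_LENGTH] for i in range(0, usable, CODON_LENGTH)]


# Toy coding sequence (NOT CCR5): a start codon followed by 20 GCT codons.
toy_gene = "ATG" + "GCT" * 20
toy_mutant = delete_segment(toy_gene, start=3, length=32)

print(shifts_reading_frame(32))   # 32 % 3 == 2, so the frame shifts
print(codons(toy_gene)[:4])
print(codons(toy_mutant)[:4])
```

In the toy mutant the codons after the start codon no longer match the original ones, which is the kind of downstream change that produces a non-functional protein.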
BIOINFORMATICS took the leading role for this development
Bioinformatics Bottlenecks in Bangladesh

Lack of Facilities

Lack of coordinated research

Improper course curriculum in Statistics

Improper course curriculum in Biology
THANK YOU