Computational Biology 1
Protein Structure, Stability and Folding
Guna Rajagopal,
Bioinformatics Institute,
References: Molecular Biology of the Cell, 4th Ed., Alberts et al., pp. 129–190
Introduction to Protein Structure, 2nd Ed., Branden and Tooze
The Big Picture
Genome projects are providing parts lists for the genetic and protein components of the cellular circuitry. Bioinformatics analysis of these data predicts protein function, and sometimes structure, by homology, and partially identifies regulatory sites on the DNA and functional RNAs. Partial networks can be constructed by homology to known biochemical networks. Genetic defects that lead to disease can also be identified at this level, and evolutionary relationships among organisms can be calculated from these data.
Structural biology provides experimental data on the three-dimensional structure of biomolecules, as well as computational approaches for predicting structure from sequence and for predicting biomolecular recognition. Both static and dynamic models of biomolecular interactions are the basis for rational drug design and automated prediction of biochemical reaction networks.
[Figure: genetic regulatory network for sporulation and competence: the phosphorelay KinA/KinB → Spo0F → Spo0B → Spo0A (KinA-P, Spo0F-P, Spo0B-P, Spo0A-P), with regulators AbrB, SinR/SinI, RapA/PhrA, Spo0E, KapB and Hpr, sigma factors σA and σH, and annotations "Commitment to sporulation" and "pathway for repression of sporulation in favor of competence?"]
Biochemical and genetic network analysis integrates data from all the steps above to provide a prediction of cellular system function. Such analyses provide insight into how cells process and act upon complex external and internal signals. These are the fundamental control mechanisms that: 1) lead to partial penetrance of genotype and maintenance of population heterogeneity, 2) determine the reliability of cellular function and the propensity for disease given partial failure of a network component, 3) govern adaptation of pathogens to pharmaceutical attack, and 4) may provide the basis for reversal of developmental defects and early detection of cellular control failure.
Systems Biology
Ultimately, integration of genomic data and genome-derived data, such as those from gene chips, structural and molecular dynamics data, network functional analyses and data, will lead to a quantitative understanding of differential developmental processes and, finally, a full tracing of the molecular basis of development from fertilized egg to adult organism.
[Figure: adult organism, 1.5 mm long, ~1000 cells]
Why Are the Quantitative Sciences Important to Biology?
Many of the technological innovations that allowed us to peer more closely into the workings of living systems involve or require physical, mathematical, and computational techniques.
•Enzymology & Metabolism
•System level understanding of regulation
•Pattern formation and development
•Protein structure & function
•Transport (Flagellar motors, ion pumps)
•Mechanisms of mutation and heredity
•Gel Electrophoresis
•NMR/XRAY structural analysis and imaging
•Sequence assembly and analysis
•Mass Spectrometry in Proteomics
•Control theory
•Neuronal signalling and modeling
•Stochastic processes
•Machine learning and knowledge discovery
•Data mining
•Adaptive complex systems theory
Biology in the High-Throughput Era
[Figure: levels of biological organization: Genomes; Gene Products; Pathways & Physiology; Structure & Function; Populations & Evolution; Ecosystems]
• Scientific Challenges
• Algorithmic Challenges
• Data Integration Challenges
• Computational Challenges
Recent Nobel prizes in medicine went to discoveries with profound physical implications for cellular function.
1997: Discovery of prions
A prion is an infectious agent that has no genetic material. Unlike most proteins, it can fold into more than one structure. One structure is “healthy”; the other forms long filaments that disrupt cellular function. Indeed, the unhealthy form catalyzes conversion of the healthy one!
Questions:
1) How can a protein have two stable “lowest energy states”?
2) What is the rate of inter-conversion between them?
3) What does the auto-catalysis do?
4) How much inter-conversion leads to disease?
5) Under what conditions can the disease be transmitted?
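The autocatalysis in question 3 can be explored with a toy kinetic model. This is an illustrative assumption, not a model from the lecture: it treats conversion as the single step H + P → 2P with a hypothetical rate constant k, ignores synthesis and clearance of the protein, and integrates the rate equations with a simple forward-Euler step.

```python
def simulate_prion_conversion(h0, p0, k, dt, steps):
    """Toy autocatalytic model H + P -> 2P (k is a hypothetical rate constant).

    h0, p0: initial amounts of healthy (H) and prion (P) forms, arbitrary units.
    Returns lists of H and P values at each time step (forward-Euler integration).
    """
    h, p = h0, p0
    h_traj, p_traj = [h], [p]
    for _ in range(steps):
        rate = k * h * p   # conversion rate grows with BOTH H and P: autocatalysis
        h -= rate * dt     # the healthy form is consumed...
        p += rate * dt     # ...and converted into the prion form
        h_traj.append(h)
        p_traj.append(p)
    return h_traj, p_traj


if __name__ == "__main__":
    # Hypothetical parameters: a 1% prion "seed", k = 1 per (unit amount * unit time)
    h, p = simulate_prion_conversion(h0=0.99, p0=0.01, k=1.0, dt=0.01, steps=2000)
    for i in range(0, 2001, 250):
        print(f"t = {i * 0.01:5.1f}  healthy = {h[i]:.3f}  prion = {p[i]:.3f}")
```

In this sketch the prion fraction follows an S-shaped (logistic) curve: a long lag while the seed is small, then rapid takeover. Questions 4 and 5 would need extra terms (synthesis, clearance, transmission) that this toy model deliberately leaves out.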
Biology after the Genome
•Great effort (and money) has gone into sequencing the human (and other) genome(s). The idea was that once we found all the parts of the cellular program, we would know how cells function.
•To determine the behavior of the whole, you need to know the physical behavior of each of the parts and how they interact among themselves and with the environment.
•We are now at a point where physical, mathematical, and computational approaches to integrating the available biological data are necessary.
Bacteria Have ~10^10 Molecules
Component                     % of total cell weight    Number of types of each molecule
Water                         70                        1
Inorganic ions                1                         20
Sugars and precursors         1                         250
Amino acids and precursors    0.4                       100
Nucleotides and precursors    0.4                       100
Fatty acids and precursors    1                         50
Other small molecules         0.2                       ~300
Macromolecules*               26                        ~3000
Total                         100                       ~4000
*proteins, nucleic acids, and polysaccharides
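As a rough check on the headline figure, the water fraction alone can be converted into a molecule count. This sketch assumes a total cell mass of about 1 pg (10^-12 g), a typical order of magnitude for a small bacterium; that mass is an assumption, not a value from the table.

```python
AVOGADRO = 6.022e23         # molecules per mole
WATER_MOLAR_MASS = 18.015   # g/mol

# Percent of total cell weight, from the table above
composition = {
    "water": 70, "inorganic ions": 1, "sugars": 1, "amino acids": 0.4,
    "nucleotides": 0.4, "fatty acids": 1, "other small molecules": 0.2,
    "macromolecules": 26,
}


def molecules_from_weight_fraction(cell_mass_g, percent, molar_mass):
    """Number of molecules of one species, given its share of total cell weight."""
    grams = cell_mass_g * percent / 100.0
    return grams / molar_mass * AVOGADRO


if __name__ == "__main__":
    cell_mass = 1e-12  # ASSUMPTION: ~1 pg total cell mass
    n_water = molecules_from_weight_fraction(cell_mass, composition["water"], WATER_MOLAR_MASS)
    print(f"percentages sum to {sum(composition.values()):.1f}")
    print(f"water molecules ~ {n_water:.1e}")
    dry = 100 - composition["water"]
    print(f"macromolecules are {100 * composition['macromolecules'] / dry:.0f}% of dry weight")
```

Under that assumption water alone gives roughly 2 × 10^10 molecules, the same order of magnitude as the slide title, while macromolecules make up almost 90% of the dry weight.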
Complex Behaviors of Living Systems
[Figure panels: a Myxococcus xanthus colony undergoing traveling-wave self-organization on its way to sporulation; a human neutrophil tracking a Staphylococcus; a developing Drosophila melanogaster embryo]
The Grand Challenges
• Improve in vitro macromolecular synthesis.
• Conceptually link atomic (mutational) changes to population evolution (via molecular & systems modeling).
• Novel polymers for smart-materials, mirror-enzymes & drug selection.
• Model combinations of external signals & genome-programming on expression.
• Manipulate stem-cell fate & stability.
• Engineer reduction of mutation & cancerous proliferation.
• Programmed cells to replace or augment (low toxicity) drugs.
• Programming of cell and tissue morphology.
• Quantitate robustness & evolvability.
• Engineer sensor-effector feedback networks where macro-morphologies determine
the functions; past (Darwinian) or future (prosthetic).
Protein Structure
Overview
• Proteins are biopolymers and the building blocks from which all cells are built.
• They execute nearly all cell functions: they are enzymes, channels/pumps, carriers of signals/messages, molecular machines, antibodies, toxins, hormones, elastic fibers, etc.
• They influence how our bodies function (or malfunction!).
• Their structure (or conformation) under physiological conditions governs their function.
Diversity of protein structures
The diversity of viable proteins has been constrained by natural selection to give:
• desired function
• adequate stability
• foldability
• evolvability from appropriate evolutionary precursors.
Levinthal’s paradox and folding pathways
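Levinthal's paradox is usually stated as a back-of-the-envelope count. The numbers in this sketch are illustrative assumptions, not values from the lecture: 100 residues, 3 backbone conformations per residue, and 10^-13 s to sample each conformation.

```python
import math


def levinthal_search_time(n_residues, states_per_residue, seconds_per_state):
    """Number of conformations, and years needed to enumerate them all by random search."""
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations * seconds_per_state
    return n_conformations, seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    n_conf, years = levinthal_search_time(100, 3, 1e-13)  # assumed, illustrative values
    print(f"conformations ~ 10^{math.log10(n_conf):.1f}")
    print(f"exhaustive search ~ {years:.1e} years")
```

The search time comes out around 10^27 years, while real proteins fold in milliseconds to seconds; folding therefore cannot be an exhaustive random search, which is what motivates the idea of folding pathways.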
Overview of Protein Function
Protein Architecture
Is based on 3 principles:
• Formation of a polypeptide chain
• Folding of this chain into a compact
function-enabling structure (i.e. the native
structure),
• Post-translational modification of the
folded structure.
Proteins are chains of amino acids
• Polymer – a molecule composed of repeating units
Views of a protein
Wireframe
Ball and stick
See the PDB website. These figures can also be produced with RasMol.
Views of a protein
Spacefill
Cartoon
CPK colors
Carbon = green, black, or grey
Nitrogen = blue
Oxygen = red
Sulfur = yellow
Hydrogen = white
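Molecular viewers such as RasMol read atom coordinates from PDB-format files, in which ATOM records use fixed columns (atom name in columns 13–16, x/y/z in 31–54, element symbol in 77–78). This sketch parses those columns and assigns the CPK colors listed above. Choosing grey for carbon and "other" for unlisted elements are assumptions, and `make_atom_line` is a hypothetical helper used only to build test records.

```python
# CPK colors from the list above (grey chosen for carbon; an arbitrary pick among green/black/grey)
CPK_COLORS = {"C": "grey", "N": "blue", "O": "red", "S": "yellow", "H": "white"}


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Hypothetical helper: build a minimal fixed-column PDB ATOM record (78 characters)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{0.0:>6.2f}{' ' * 10}{element:>2}")


def parse_atoms(lines):
    """Extract name, residue, coordinates, element and CPK color from ATOM/HETATM records."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, etc.
        name = line[12:16].strip()
        # Element symbol lives in columns 77-78; fall back to the first letter of the atom name
        element = line[76:78].strip() or name[:1]
        atoms.append({
            "name": name,
            "residue": line[17:20].strip(),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "element": element,
            "color": CPK_COLORS.get(element.upper(), "other"),
        })
    return atoms


if __name__ == "__main__":
    records = [
        make_atom_line(1, "N", "MET", "A", 1, 38.198, 19.582, 28.998, "N"),
        make_atom_line(2, "CA", "MET", "A", 1, 38.5, 20.0, 29.1, "C"),
    ]
    for atom in parse_atoms(records):
        print(atom["name"], atom["residue"], atom["xyz"], atom["color"])
```

The name-based fallback is only a convenience for records with a blank element column; it cannot tell calcium "CA" from an α-carbon "CA", so a real parser needs more care there.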
Illustrating tight packing inside a protein molecule
Different degrees of mobility within a protein molecule
Fluctuations of portions of the structure as seen in MD (molecular dynamics) simulations. Some parts of the protein appear more mobile than others.
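One common way to quantify which parts of a protein are more mobile in an MD simulation is the root-mean-square fluctuation (RMSF) of each atom about its average position, RMSF_i = sqrt( (1/T) Σ_t |r_i(t) − ⟨r_i⟩|² ). This sketch assumes the trajectory frames have already been superimposed on a reference structure, so overall rotation and translation are removed; the tiny two-frame trajectory is made up for illustration.

```python
import math


def rmsf(frames):
    """Per-atom root-mean-square fluctuation.

    frames: list of frames; each frame is a list of (x, y, z) tuples, one per atom,
    already aligned to a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over the trajectory
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


if __name__ == "__main__":
    # Made-up trajectory: atom 0 is rigid, atom 1 oscillates by ±1 Å along x
    trajectory = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ]
    print(rmsf(trajectory))  # [0.0, 1.0]
```

Plotting RMSF against residue number is a simple way to see which loops or termini move most.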
Shape and Structure of Proteins
• The shape of a protein is specified by its amino acid (AA) sequence.
• Proteins are made up of 20 different AAs, each linked to its neighbour by a covalent peptide bond.
• The 3-D shape (conformation) of a protein influences its function.
AA sequence and structure
Anfinsen’s experiment (see any Biochemistry book)
Amino acid composition
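• Basic amino acid structure: an amino group (H2N–) and a carboxyl group (–COOH) are attached to a central α-carbon (Cα), which also carries a hydrogen atom and a side chain, R: H2N–CαH(R)–COOH.
– The side chain, R, varies for each of the 20 amino acids.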
As part of a polypeptide chain, the side chains, R, differ in their tendency to interact with one another and with water because of their different electrical properties and sizes (steric effects). This influences the final conformation of the chain.
Side chain properties
• The electronegativity of carbon is at about the
middle of the scale for light elements
– Carbon does not make hydrogen bonds with water
easily – hydrophobic
– O and N are generally more likely than C to hydrogen
bond to water – hydrophilic
• We group the amino acids into three general
groups:
– Hydrophobic
– Charged (positive/basic & negative/acidic)
– Polar
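The three-way grouping above can be applied to a sequence directly. The group membership used here follows one common textbook convention and is an assumption: the placement of glycine, cysteine, histidine and tyrosine differs between sources. The example is the sequence fragment AGVGTVPMTAYGNDIQYYGQVT from the primary-structure example.

```python
from collections import Counter

# One common convention (an assumption; sources differ on G, C, H and Y)
GROUPS = {
    "hydrophobic": set("AVLIPFMWGC"),
    "charged": set("DEKRH"),   # D, E acidic; K, R, H basic
    "polar": set("NQSTY"),
}


def classify_residue(aa):
    """Return the side-chain group of a one-letter amino acid code."""
    for group, members in GROUPS.items():
        if aa in members:
            return group
    raise ValueError(f"unknown amino acid code: {aa!r}")


def composition(sequence):
    """Count residues of a one-letter sequence in each side-chain group."""
    return Counter(classify_residue(aa) for aa in sequence.upper())


if __name__ == "__main__":
    print(composition("AGVGTVPMTAYGNDIQYYGQVT"))
```

With this convention the fragment has 12 hydrophobic, 9 polar and 1 charged residue.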
The Hydrophobic Amino Acids
They engage only in van der Waals (VdW) interactions, and their tendency to avoid water is the basis for the hydrophobic effect.
Proline severely
limits allowable
conformations!
The Charged Amino Acids
The Polar Amino Acids
They can form hydrogen bonds with one another, with the peptide backbone, and with water.
More Polar Amino Acids
And then there’s…
The Peptide Bond
Convention – start at
amino terminus and
proceed to carboxy terminus
Polypeptides
• A chain of amino acids is called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
• Since part of each amino acid is lost during dehydration synthesis (one water molecule per peptide bond), we call the units of a protein amino acid residues.
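The loss of one water molecule per peptide bond shows up directly in a molecular-weight calculation. This sketch uses average masses of the free amino acids in daltons (standard values, rounded to two decimals) and subtracts one water (18.015 Da) for each of the n − 1 peptide bonds; it ignores post-translational modifications.

```python
WATER = 18.015  # Da, lost for each peptide bond formed

# Average masses of the free amino acids (Da), rounded to two decimals
AA_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}


def polypeptide_mass(sequence):
    """Average mass of a linear polypeptide: sum of amino acids minus one water per peptide bond."""
    n = len(sequence)
    if n == 0:
        return 0.0
    return sum(AA_MASS[aa] for aa in sequence) - (n - 1) * WATER


if __name__ == "__main__":
    print(f"Gly-Gly: {polypeptide_mass('GG'):.3f} Da")
    seq = "AGVGTVPMTAYGNDIQYYGQVT"
    mass = polypeptide_mass(seq)
    print(f"{seq}: {mass:.1f} Da, about {mass / len(seq):.0f} Da per residue")
```

Averaged over typical proteins a residue weighs about 110 Da, so the 50 to 400+ residue range above corresponds to roughly 5 to 45+ kDa.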
Interactions responsible for Stability of Polypeptides
Protein Stability
• High temperatures break the weak bonds that stabilize the native state, eventually converting it to the denatured state.
• The denatured state is identified by loss of biochemical activity.
• Because the free energy difference between the denatured and native states is so small, a single mutation can cause a stable protein to unfold. Conversely, a few additional interactions can increase stability, e.g. in Taq DNA polymerase, used in PCR.
• Thermophilic proteins retain their structure and activity at high temperatures (e.g. proteins from microorganisms that live in hydrothermal vents in the deep ocean).
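Why a small free-energy difference leaves a protein vulnerable to a single mutation can be made quantitative with a two-state model (native ⇌ denatured). Here ΔG is taken as G(denatured) − G(native), so a positive ΔG favours the native state, and the fraction folded is 1 / (1 + exp(−ΔG/RT)). The wild-type ΔG and the 3 kcal/mol mutational penalty below are illustrative assumptions, not measured values.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def fraction_folded(delta_g, temperature=298.15):
    """Two-state model; delta_g = G(denatured) - G(native) in kcal/mol."""
    k_unfold = math.exp(-delta_g / (R * temperature))  # equilibrium ratio [D]/[N]
    return 1.0 / (1.0 + k_unfold)


if __name__ == "__main__":
    wild_type = 2.0           # ASSUMED marginal stability, kcal/mol
    mutant = wild_type - 3.0  # ASSUMED destabilizing point mutation
    for label, dg in [("wild type", wild_type), ("mutant", mutant)]:
        print(f"{label:9s} dG = {dg:+.1f} kcal/mol -> folded fraction {fraction_folded(dg):.3f}")
```

Because the dependence is exponential, a change of a few kcal/mol (the energy of a handful of weak interactions) is enough to shift the protein from mostly folded to mostly unfolded.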
Post-translational Modifications
(Important process that determines protein function)
Primary & Secondary Structure
• Primary structure = the linear sequence of amino
acids comprising a protein:
AGVGTVPMTAYGNDIQYYGQVT…
• Secondary structure
– Regular patterns of hydrogen bonding in proteins give rise to two structures that occur in nearly every known protein structure: the α-helix and the β-sheet.
– The location and direction of these periodic, repeating structures is known as the secondary structure of the protein.
Secondary structure
Hydrogen bonds between amino acids in the backbone provide stability.
Planarity of the peptide bond
Phi (φ): the angle of rotation about the N–Cα bond.
Psi (ψ): the angle of rotation about the Cα–C bond.
The bond angles and bond lengths of the planar peptide unit are fixed, so the backbone conformation of each residue is described by the two angles φ and ψ.
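Because φ and ψ are dihedral (torsion) angles, they can be computed from atomic coordinates: φ of residue i uses the atoms C(i−1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The sketch below uses one standard vector formula for a dihedral angle, returning degrees between −180° and 180° with the IUPAC sign convention; the test coordinates are made up.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees) about the p1-p2 bond defined by four points."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: trans (~180°), cis (0°) and a -90° arrangement
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

For a real structure, φ would be `dihedral(C_prev, N, CA, C)` and ψ would be `dihedral(N, CA, C, N_next)`, with the coordinates taken from the PDB file.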
The alpha helix
φ ≈ ψ ≈ −60°
Properties of the alpha helix
• φ ≈ ψ ≈ −60°
• Hydrogen bonds between C=O of residue n,
and NH of residue n+4
• 3.6 residues/turn
• 1.5 Å/residue rise
• 100°/residue turn
• 4 – 40+ residues in length
• Often amphipathic or “dual-natured”
– Half hydrophobic and half hydrophilic
– Mostly when surface-exposed
• If we examine many α-helices, we find trends:
– Helix formers: Ala, Glu, Leu, Met
– Helix breakers: Pro, Gly, Tyr, Ser
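The helix parameters above fit together arithmetically: 3.6 residues per turn × 1.5 Å per residue gives a pitch of 5.4 Å, and 360°/3.6 gives the 100° turn per residue. Projecting residues onto a helical wheel with that 100° step shows why residues i, i+3, i+4 and i+7 end up on the same face, which is how an amphipathic helix can put its hydrophobic residues on one side. The ±60° window used to define a "face" is an arbitrary choice for illustration.

```python
RISE_PER_RESIDUE = 1.5    # Å, from the helix properties above
RESIDUES_PER_TURN = 3.6
TURN_PER_RESIDUE = 100.0  # degrees, i.e. 360 / 3.6


def helix_length(n_residues):
    """Approximate length (Å) of an ideal alpha helix along its axis."""
    return n_residues * RISE_PER_RESIDUE


def wheel_angle(i):
    """Angular position (degrees) of residue i on a helical wheel."""
    return (i * TURN_PER_RESIDUE) % 360.0


def same_face(n_residues, center=0.0, half_width=60.0):
    """Indices of residues whose wheel angle lies within half_width degrees of center."""
    face = []
    for i in range(n_residues):
        diff = (wheel_angle(i) - center + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        if abs(diff) <= half_width:
            face.append(i)
    return face


if __name__ == "__main__":
    print(f"pitch = {RESIDUES_PER_TURN * RISE_PER_RESIDUE:.1f} Å")
    print(f"20-residue helix ~ {helix_length(20):.0f} Å long")
    print("residues on the face at 0°:", same_face(11))
```

For the first 11 residues the face at 0° holds residues 0, 3, 4 and 7, the i, i+3, i+4, i+7 pattern.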
The beta strand (& sheet)
φ ≈ −135°, ψ ≈ +135°
Properties of beta sheets
• Formed of stretches of 5–10 residues in extended conformation
• Pleated: each Cα lies a bit above or below the previous one
• Parallel/antiparallel, contiguous/non-contiguous
The Ramachandran Plot
• G. N. Ramachandran – first calculations of sterically
allowed regions of phi and psi
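A crude way to label a residue from its (φ, ψ) pair is to compare it with the reference angles given above for the α-helix (−60°, −60°) and the β-strand (−135°, +135°). This is only an illustrative nearest-reference assignment with an arbitrary 50° cutoff; it is not Ramachandran's steric calculation, and the real allowed regions are irregular in shape.

```python
import math

# Reference (phi, psi) values from the alpha-helix and beta-strand slides
REFERENCE = {"alpha": (-60.0, -60.0), "beta": (-135.0, 135.0)}


def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees (handles wrap-around)."""
    return (a - b + 180.0) % 360.0 - 180.0


def assign_region(phi, psi, cutoff=50.0):
    """Label (phi, psi) by the nearest reference conformation, or 'other' if none is within cutoff."""
    best, best_dist = "other", cutoff
    for label, (ref_phi, ref_psi) in REFERENCE.items():
        dist = math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
        if dist <= best_dist:
            best, best_dist = label, dist
    return best


if __name__ == "__main__":
    for phi, psi in [(-57, -47), (-120, 130), (-135, -178), (60, 45)]:
        print((phi, psi), assign_region(phi, psi))
```

The wrap-around in `angle_diff` matters: ψ = −178° is only 47° away from ψ = +135° on the circle, so it still counts as β here.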
Experimental Observation of Secondary Structures via Circular Dichroism
Effect of secondary structure on polarized light
[Figure: computed CD spectra of poly(Lys) in α-helix, β-sheet and random-coil conformations]
Turns and Loops
• Secondary structure elements
are connected by regions of
turns and loops
• Turns – short regions
of non-α, non-β
conformation
• Loops – larger stretches with no secondary structure, often disordered
– “Random coil”
– Their sequences vary much more than those of secondary-structure regions
Secondary Structure
Prediction
All the prediction schemes seem to agree on the approximate location of α-helices and β-strands but disagree considerably on their lengths and end positions. Loops and turns are predicted very inconsistently.
Applying many methods is more informative than applying a single one.
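Combining several methods can be done with a simple per-residue majority vote over predictions written as strings of H (helix), E (strand) and C (coil). The vote rule and the choice to fall back to C on a tie are assumptions for illustration, and the three input predictions are made up.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length H/E/C secondary-structure strings."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        # Tie between the top states: fall back to coil (an arbitrary, conservative choice)
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append("C")
        else:
            result.append(counts[0][0])
    return "".join(result)


if __name__ == "__main__":
    methods = ["HHHHCCEEEC", "HHHCCCEEEE", "CHHHHCEECC"]  # made-up predictions
    print(consensus(methods))
```

Here the three made-up predictions disagree at the helix and strand ends, and the vote yields HHHHCCEEEC; with an even number of methods, ties become more common, so the tie rule matters more.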