Lecture - Liisa Holm's Bioinformatics Group

Protein Analysis 52930 (2 ov, Finnish study weeks)
Liisa Holm
Organization
• Lectures & exercises
– 30.3.-28.4.2005, Wed and Thu 14-16, LS 2012
– http://www.bioinfo.biocenter.helsinki.fi:8080/downloads/teaching/spring2005/proteiinianalyysi/index.html
• Exam
– Bonus points for active participation in the exercises
• Supplementary reading
– Lesk: Introduction to Bioinformatics. Oxford University Press.
Schedule
30.3. Wed: Lecture
31.3. Thu: Lecture
6.4. Wed: Exercise session 1
7.4. Thu: Lecture
13.4. Wed: Exercise session 2
14.4. Thu: Lecture
20.4. Wed: Exercise session 3
21.4. Thu: Lecture
27.4. Wed: Exercise session 4
28.4. Thu: Exam
Course objectives
• how protein sequences are read
• protein classification systems
• sequence – structure – function
• evolution
Related courses
• Prerequisites:
– Genetic Bioinformatics, 1-2 ov
• sequence comparison
• phylogenetic trees
• Application:
– Protein Analysis Practical Work, 3 ov
• using web tools
Introduction
Why proteins matter
• Proteins do most of the work in the cell and take part in:
– gene regulation
– metabolism
– signalling
– structural support (cytoskeleton)
– transport
– cell division
http://www.websters-online-dictionary.org/definition/english/ce/cell.html
Structural proteins
• Collagen
1K6F
http://www.aw-
Actin and muscles
Enzymes
• Catalytic triad: Asp, Ser, His
1CHO
Transcription factors
Ligand
DNA
1L3L
Where do proteins come from?
• DNA → RNA → protein
– the genetic code
• a DNA base triplet (codon) encodes one amino acid
• 20 amino acids
– linear sequence
• typical length 100-400 amino acids
• on average about 150 amino acids
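The codon-to-amino-acid mapping described above can be sketched in a few lines of Python. This is a minimal sketch assuming the standard genetic code (NCBI translation table 1), an A/C/G/T input read in frame from its first base, and translation that stops at the first stop codon; the names `CODON_TABLE` and `translate` are illustrative, not taken from any library.

```python
from itertools import product

# Standard genetic code: amino acids listed for all codons in T, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon until the first stop."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTGA"))  # prints "MAK" (Met-Ala-Lys, then stop)
```

Three bases per codon give 4^3 = 64 codons for 20 amino acids plus stop signals, which is why most amino acids have several codons.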
A big surprise …
The structure of DNA is highly regular
Watson & Crick (1953)
Myoglobin
Protein structure lacks symmetry
Kendrew & Perutz (1957)
1mbn
Proteins are unusual polymers:
• A given protein always has the same amino acid sequence
– The protein sequence is determined by the DNA sequence
• A given protein always has a unique three-dimensional structure.
– The protein structure is determined by the amino acid sequence.
"always" = always in the biological sense (exceptions exist)
No function without structure
• Natural proteins fold into a specific three-dimensional structure
– complementary to their interaction partner
• Denaturation destroys function
Evolution
Sequence – Structure – Function
[Diagram: DNA sequence, protein sequence, protein structure, protein function; natural selection]
Sequence
Identifying proteins
• classical biochemistry
– protein purification
– molecular weight
– isoelectric point
– CD and other spectroscopy
– etc.
• computational analysis
– DNA sequence → gene identification, exons/introns → translation into protein
– sequence comparisons
• post-genomics
– transcription profiling, protein-protein interactions, etc.
History
1953 Structure of DNA
1955 First protein sequence
1957 Structure of myoglobin
1975 DNA sequencing methods
1977 Phage φX174 'genome'
1995 Haemophilus influenzae genome
1996 Yeast genome
1998 Nematode genome
2000 Human genome
2000 Structural genomics projects
Genomes
• DNA sequencing
– enzymatic synthesis, specific terminators
– protein sequences are derived from the DNA sequence
• ORF, open reading frame
• verification: alignment with a known EST, cDNA or protein
• the exon-intron problem in eukaryotes
• genome projects
– about 136 organisms
– eukaryotes, archaebacteria and eubacteria
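Reading ORFs off a DNA sequence can be sketched as a scan of all six reading frames (three per strand) for an ATG followed in frame by a stop codon. This is a simplified sketch: the minimum length `min_codons` is an arbitrary illustrative threshold, only ATG is accepted as a start codon, ORFs nested inside a longer one are not reported, and the eukaryotic exon-intron problem is ignored entirely. The function name `find_orfs` is hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(dna: str, min_codons: int = 30) -> List[Tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are half-open and refer to the scanned string: the input for
    '+', its reverse complement for '-'. The end includes the stop codon.
    """
    dna = dna.upper()
    strands = [("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])]
    orfs = []
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:  # codons before the stop
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAACCCTAAGG", min_codons=2))
```

Real gene finders add codon-usage statistics and, for eukaryotes, splice-site models; the verification step on the slide (alignment to an EST, cDNA or known protein) is what confirms a predicted ORF.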
Proteome coverage
Organism | Biological features | Proteins
S. cerevisiae (yeast) | Genes for existence as a single-celled organism with the basic structure and organisation of the eukaryotic cell | 6231
E. coli (bacterium) | Genes for growth on external sources of energy, molecular cell transport through cell membrane, metabolic pathways and replication as a single cell | 4356 - 5333
C. elegans (nematode) | Genes for development by a unique cell lineage, nervous system and reproduction | 22515
D. melanogaster (fruit fly) | Model for developmental processes by hormones and cell-cell interactions | 17341
H. sapiens (human) | Duplicates many gene functions in other model organisms and in addition includes control of higher brain functions | 28814
About 136 complete proteomes deduced from complete genomes.
The complete proteome
• certainty about "missing" genes
• not all genes are expressed at the same time and in the same place
• alternative splicing, post-translational modifications: one gene can give rise to many proteins
– glycosylation
– phosphorylation
Databases
• EBI
– http://www.ebi.ac.uk
– http://www.ebi.ac.uk/proteome
• NCBI - Entrez
– http://www.ncbi.nlm.nih.gov
• nrdb, 'non-redundant database'
– 490,374,618 amino acids
– 1,504,726 sequences
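Dividing the two nrdb counts gives the mean length of a database entry at the time these figures were taken (a simple worked check on the numbers listed, not a figure quoted on the slide):

```latex
\bar{L} = \frac{490\,374\,618\ \text{amino acids}}{1\,504\,726\ \text{sequences}} \approx 325.9\ \text{amino acids per sequence}
```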
Structure
Protein structure
• Primary structure
• Secondary structure
• Super-secondary structure
• Tertiary structure
• Quaternary structure
Secondary structure
• backbone
– no amino acid side chains
• regular patterns
– of hydrogen bonds
– of backbone torsion angles
• types of secondary structure
– α-helix
α-Helix
hydrogen bond pattern: C=O of residue n to N-H of residue n+4
β-Sheet
[Figure: β-strands forming a β-sheet, viewed from the top and from the side]
http://broccoli.mfn.ki.se/pps_course_96
2TRX
Cartoon representation
2AAC
Supersecondary structures
• local arrangements of secondary structure elements
http://www.expasy.org/swissmod/course/text/chapter2.htm
Tertiary structure
1coh
Quaternary structure
1coh
Protein structure determination
• Protein expression
– membrane proteins
– aggregation
• X-Ray crystallography
• NMR (nuclear magnetic resonance)
• Cryo-EM (electron microscopy)
Structures by X-ray crystallography
• Crystallize protein
• Collect diffraction patterns
• Improve iteratively:
– Calculate electron density map
• Phase problem
– Fit amino acid trace through map
X-ray crystallography
• Crystallization
• “An art as much as a science”
Charges
http://crystal.uah.edu/~carter/protein/crystal.htm
Diffraction and electron density maps
[Figure: X-ray source, crystal, diffraction pattern (intensities), iterative refinement]
Resolution
Higher resolution =
more accurate positioning of atoms
http://www.sci.sdsu.edu/TFrey/Bio750/Bio750X-
NMR
• Create highly concentrated protein solution
• Record spectra
• Assign peaks to residues
• Calculate constraints
• Compute structure
NMR spectra
[Figure: 1D and 2D NMR spectra]
http://www.cryst.bbk.ac.uk/PPS2/projects/schirra/html/2dnmr.htm
Distance constraints from NMR
• From the sequence
– Topology
– Bond angles
– Bond lengths
• From the NMR experiment
– Torsion angles
– Distance constraints
[Figure: torsion angle about a backbone bond, with the H, Hα, side chain R and CO groups labelled]
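Torsion angles such as the backbone φ and ψ are dihedral angles defined by four consecutive atoms. The following is a minimal, dependency-free sketch of the standard atan2 formulation; the function name `dihedral` and the plain-tuple coordinate format are illustrative assumptions, and the sign follows the IUPAC convention (positive for clockwise rotation when viewed along the central bond).

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees (-180..180) of p0-p1-p2-p3 about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    # atan2 form: numerically stable near 0 and 180 degrees and signed.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Backbone phi of residue i uses C(i-1), N(i), CA(i), C(i);
# psi uses N(i), CA(i), C(i), N(i+1).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans: 180.0
```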
Ensemble of structures
SH3-domain
1aey
What is the true protein structure?
• X-ray
– "frozen" state of a protein
– crystal contacts
✔ large protein structures
• NMR
✔ protein in solution
– limited in size
Molecular complexes via X-ray
30S subunit of the ribosome
Protein
RNA
1fjg
Cryo-EM
Single particle image reconstruction
Bacteriophage MS2
Koning et al. (2003)
Fitting X-ray structures into density maps
GroEL complex
Hemoglobin
1gr6
Protein structure databases
http://www.wwpdb.org/index.html
Molecular function
Post-genomic view:
Function = Σ interactions
(From left to right, figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR
Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory
Center).
Enzymes
• Catalytic triad: Asp, Ser, His
1CHO
Mechanism
• Enzymes speed up chemical reactions
• Enzymes are not consumed by the reaction
• Stabilization of the transition state
• Charge-relay cascade
Convergent evolution in serine proteases
• same reaction
• same mechanism
• same orientation of catalytic residues
• different structures
– Chymotrypsin:
• His-57, Asp-102, Ser-195
– Subtilisin:
1cho / 1sib
Substrate specificity
Perona & Craik (1997)
Transcription factors
Ligand
DNA
1L3L
Hydrogen bonding pattern
Vannini (2002)
Determining function
• Biochemical analysis
• Genetic analysis, phenotype
• Protein-protein interactions
• Laborious methods
• The assay often has to be tailored separately for each function
Evolution
Application: Finding Homologues
• Find Similar Ones in Different Organisms
• Human vs. Mouse vs. Yeast
– Easier to do experiments on the latter!
(Section from NCBI Disease Genes Database Reproduced Below.)
Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins
Human disease | MIM # | Human gene | GenBank acc. # (human cDNA) | BLASTX P-value | Yeast gene | GenBank acc. # (yeast cDNA) | Yeast gene description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter
Application: Finding Homologues (cont.)
• Cross-referencing: linking one thing to another
• Sequence comparison and scoring
• Analogous problems for structure comparison
• Comparison has two parts:
(1) Optimally aligning 2 entities to get a comparison score
(2) Assessing the significance of this score in a given context
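Both parts of a comparison can be sketched in a few lines: (1) a Needleman-Wunsch global alignment score computed by dynamic programming, and (2) an empirical significance estimate obtained by re-scoring against shuffled copies of one sequence. This is an illustrative sketch only. The identity scoring (+1/-1) and linear gap penalty of -2 are placeholder assumptions, whereas real protein comparisons use substitution matrices such as BLOSUM62 with affine gap penalties. The shuffle test is also a swapped-in simple technique: BLAST-type tools assess significance analytically (Karlin-Altschul statistics). The names `nw_score` and `shuffle_pvalue` are hypothetical.

```python
import random
from typing import Tuple


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    # At the start of row i, prev[j] is the best score for a[:i-1] vs b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # a[i-1] against a gap
                         cur[j - 1] + gap)  # b[j-1] against a gap
        prev = cur
    return prev[-1]


def shuffle_pvalue(a: str, b: str, n: int = 200, seed: int = 0) -> Tuple[int, float]:
    """Score a vs b, then estimate how often shuffled versions of b score as well."""
    rng = random.Random(seed)
    observed = nw_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(n):
        rng.shuffle(residues)  # keeps composition, destroys order
        if nw_score(a, "".join(residues)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n + 1)


score, p = shuffle_pvalue("HEAGAWGHEE", "PAWHEAE")
print(score, p)
```

Shuffling preserves amino acid composition, so a low estimate says the order of residues, not just their composition, drives the score.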
What use could protein bioinformatics be?
• a hypothetical virus epidemic
– DNA sequence
– comparison with known viruses [10]
– development of antiviral drugs
• virus-specific proteins: replication or envelope proteins [01]
– database searches [15]
– homology modelling [25] / ab initio [55]
» drug design, antibody therapy [50]
» biological tolerability of the drug [75]
Sequence → structure
Properties of amino acids
• Proteins are self-organizing linear heteropolymers whose sequence has been refined by natural selection
• 20 amino acids
– peptide backbone
– side chain
• sequence determines structure
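Amino acids with sulfur-containing R groups
Amino acid | pK1 (α-COOH) | pK2 (α-NH3+)
Cysteine (Cys, C) | 1.9 | 10.8
Methionine (Met, M) | 2.1 | 9.3
Acidic amino acids and their amides
Aspartic acid (Asp, D) | 2.0 | 9.9
Asparagine (Asn, N) | 2.1 | 8.8
Glutamic acid (Glu, E) | 2.1 | 9.5
Glutamine (Gln, Q) | 2.2 | 9.1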
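For amino acids whose side chain does not ionize (here Met, Asn and Gln), the isoelectric point of the free amino acid is the mean of its two pK values, because at that pH the Henderson-Hasselbalch charges of the α-amino and α-carboxyl groups cancel. Cys, Asp and Glu have ionizable side chains whose pK values are not listed, so this shortcut does not apply to them. A minimal sketch, assuming the two listed values are the α-COOH and α-NH3+ pKs; the names `PK`, `net_charge` and `isoelectric_point` are illustrative.

```python
# The two pK values listed for each amino acid (assumed: alpha-COOH, alpha-NH3+).
PK = {
    "Met": (2.1, 9.3),
    "Asn": (2.1, 8.8),
    "Gln": (2.2, 9.1),
}


def net_charge(ph: float, pk_cooh: float, pk_nh3: float) -> float:
    """Henderson-Hasselbalch net charge of a free amino acid with no ionizable side chain."""
    positive = 1 / (1 + 10 ** (ph - pk_nh3))   # fraction of protonated alpha-NH3+
    negative = 1 / (1 + 10 ** (pk_cooh - ph))  # fraction of deprotonated alpha-COO-
    return positive - negative


def isoelectric_point(residue: str) -> float:
    """pI as the mean of the two pK values (valid only without an ionizable side chain)."""
    pk_cooh, pk_nh3 = PK[residue]
    return (pk_cooh + pk_nh3) / 2


for name in PK:
    print(name, round(isoelectric_point(name), 2))
```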
Basic Amino Acids
Amino acid properties
Levels of complexity in folding
WHAT DO WE KNOW ABOUT PROTEIN FOLDING?
• water-soluble proteins are "globular": tightly packed and folded up, with water excluded from the interior.
• bond lengths and bond angles don't vary much from their equilibrium positions.
• structures are stable and relatively rigid.
• folding possibilities are limited, both along the backbone chain and within the side-chain groups.
• folding motifs are used repetitively.
• among similar proteins (say, from different organisms), structure tends to be more conserved than the exact amino acid sequence.
• although sequence must determine structure, it is not yet possible to predict the entire structure accurately from sequence.
• the net stability of a folded protein is small, corresponding to only a few hydrogen bonds.
Secondary structure > tutorial
• a protein is like an oil droplet in water
• polar groups of the peptide backbone form hydrogen bonds
– NH -- O=C
• regular secondary structures arise
• the side chain modulates secondary structure preferences
DSSP
Dictionary of Protein Secondary Structure:
Pattern Recognition of Hydrogen-Bonded
and Geometrical Features
W. Kabsch & C. Sander
Biopolymers 22, 2577-2637 (1983)
Hydrogen bonds
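[Figure: partial charges +0.42e on C and -0.42e on O of the C=O group; -0.20e on N and +0.20e on H of the N-H group]
$$E \propto q_1 q_2 \left[ \frac{1}{r(\mathrm{ON})} + \frac{1}{r(\mathrm{CH})} - \frac{1}{r(\mathrm{CN})} - \frac{1}{r(\mathrm{OH})} \right], \quad q_1 = 0.42e,\ q_2 = 0.20e$$
Ideal H-bond is co-linear, r(NO) = 2.9 Å and E = -3.0 kcal/mol
Cutoffs in DSSP allow 2.2 Å excess distance and ±60° angle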
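The electrostatic H-bond energy can be evaluated directly from atomic coordinates. A minimal sketch, assuming the constants from the DSSP paper (q1 = 0.42e, q2 = 0.20e, and a dimensional factor f = 332 so that distances in Å give kcal/mol) and its energy cutoff of -0.5 kcal/mol for calling an H-bond. DSSP itself places the amide H from the backbone geometry; here its coordinates are simply passed in. The function names are illustrative.

```python
import math

Q1, Q2 = 0.42, 0.20   # partial charges (units of e) on C=O and N-H
F = 332.0             # converts e^2/Angstrom to kcal/mol
HBOND_CUTOFF = -0.5   # kcal/mol


def hbond_energy(c, o, n, h) -> float:
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h) -> bool:
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Co-linear N-H...O=C along x, assuming N-H = 1.0 A and C=O = 1.24 A, r(NO) = 2.9 A.
n, h, o, c = (0, 0, 0), (1.0, 0, 0), (2.9, 0, 0), (4.14, 0, 0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

With these assumed bond lengths the energy comes out at about -2.9 kcal/mol, close to the ideal -3.0 kcal/mol quoted above.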
Elementary H-bond patterns
• n-turn(i) =: Hbond(i,i+n), n=3,4,5
• Parallel bridge(i,j) =:
[ Hbond(i-1,j) AND Hbond(j,i+1) ] OR
[ Hbond(j-1,i) AND Hbond(i,j+1) ]
• Antiparallel bridge(i,j) =:
[ Hbond(i,j) AND Hbond(j,i) ] OR
[ Hbond(i-1,j+1) AND Hbond(j-1,i+1) ]
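The elementary patterns translate almost literally into predicates over a set of H-bonds. In this sketch an H-bond is stored as a pair `(i, j)` meaning the C=O of residue i is bonded to the N-H of residue j, matching the notation Hbond(i,j); the H-bond set is assumed to have been computed already (for example with the energy criterion above), and the function names are illustrative.

```python
from typing import Set, Tuple

# (i, j) in hbonds means: C=O of residue i is H-bonded to N-H of residue j.
HBonds = Set[Tuple[int, int]]


def hbond(hbonds: HBonds, i: int, j: int) -> bool:
    return (i, j) in hbonds


def n_turn(hbonds: HBonds, i: int, n: int) -> bool:
    """n-turn(i) := Hbond(i, i+n), for n = 3, 4, 5."""
    return hbond(hbonds, i, i + n)


def parallel_bridge(hbonds: HBonds, i: int, j: int) -> bool:
    return ((hbond(hbonds, i - 1, j) and hbond(hbonds, j, i + 1)) or
            (hbond(hbonds, j - 1, i) and hbond(hbonds, i, j + 1)))


def antiparallel_bridge(hbonds: HBonds, i: int, j: int) -> bool:
    return ((hbond(hbonds, i, j) and hbond(hbonds, j, i)) or
            (hbond(hbonds, i - 1, j + 1) and hbond(hbonds, j - 1, i + 1)))


# Residue 3 pairs with residue 10 via H-bonds from residue 2 and to residue 4.
example = {(2, 10), (10, 4)}
print(parallel_bridge(example, 3, 10), antiparallel_bridge(example, 3, 10))  # True False
```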
N-turns
[Figure: backbone (-N-C-C-) diagrams of the H-bonds defining a 3-turn, 4-turn and 5-turn]
Parallel bridge
[Figure: two parallel backbone strands linked by H-bonds]
Antiparallel bridge
[Figure: two antiparallel backbone strands linked by H-bonds]
Antiparallel beta-sheet is significantly more stable due to the well aligned H-bonds.
Cooperative H-bond patterns
• 4-helix(i,i+3) =: [4-turn(i-1) AND 4-turn(i)]
• 3-helix(i,i+2) =: [3-turn(i-1) AND 3-turn(i)]
• 5-helix(i,i+4) =: [5-turn(i-1) AND 5-turn(i)]
• Longer helices are defined as overlaps of
minimal helices
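The cooperative helix rule, with longer helices built from overlapping minimal helices, can be sketched as follows. This assumes an H-bond set given as pairs `(i, j)` (C=O of residue i bonded to N-H of residue j) and 0-based residue indices, and it marks residues with 'H' whatever n is; real DSSP uses distinct letters for 4-, 3- and 5-helices (H, G, I) and resolves overlaps between them, which is omitted here. The name `assign_helix` is hypothetical.

```python
from typing import Set, Tuple


def assign_helix(hbonds: Set[Tuple[int, int]], length: int, n: int = 4) -> str:
    """Mark residues covered by minimal n-helices.

    n-helix(i, i+n-1) := n-turn(i-1) AND n-turn(i), where n-turn(i) := Hbond(i, i+n).
    """
    turn = [(i, i + n) in hbonds for i in range(length)]
    state = ["-"] * length
    for i in range(1, length):
        if turn[i - 1] and turn[i]:
            # Minimal helix covers residues i .. i+n-1; overlapping ones merge.
            for k in range(i, min(i + n, length)):
                state[k] = "H"
    return "".join(state)


# Idealized alpha-helix: C=O(i) -> N-H(i+4) for residues 0..5 of a 10-residue segment.
ideal = {(i, i + 4) for i in range(6)}
print(assign_helix(ideal, 10))  # -HHHHHHHH-
```

Note that a single isolated 4-turn produces no helix: two consecutive turns are needed, which is what makes the definition cooperative.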
Beta-ladders and beta-sheets
• Ladder =: set of one or more
consecutive bridges of identical type
• Sheet =: set of one or more ladders
connected by shared residues
• Bulge-linked ladder =: two ladders or bridges of
the same type connected by at most one extra
residue on one strand and at most four extra
residues on the other strand
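Grouping ladders into sheets is a connected-components problem: ladders that share a residue belong to the same sheet. A minimal sketch, assuming each ladder has already been reduced to the set of residue indices it contains (bulge-linked ladders are not treated specially); each sheet is returned as the union of its ladders' residues, and the name `group_sheets` is hypothetical.

```python
from typing import Iterable, List, Set


def group_sheets(ladders: Iterable[Set[int]]) -> List[Set[int]]:
    """Merge ladders that share residues; returns one residue set per sheet."""
    sheets: List[Set[int]] = []
    for ladder in ladders:
        merged = set(ladder)
        disjoint = []
        for sheet in sheets:
            if sheet & merged:      # shared residue: same sheet
                merged |= sheet
            else:
                disjoint.append(sheet)
        # Existing sheets are pairwise disjoint, so one pass per ladder suffices.
        sheets = disjoint + [merged]
    return sheets


ladders = [{1, 2, 3, 20, 21, 22}, {20, 21, 22, 40, 41, 42}, {60, 61, 70, 71}]
print(len(group_sheets(ladders)))  # 2: a three-strand sheet and a two-strand sheet
```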
3-state secondary structure
• Helix
• Strand
• Loop
• The quoted consistency of secondary structure state assignments between the structures of sequence-similar proteins is ~70%
• Richer descriptions possible
– e.g. phi-psi regions
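Reducing DSSP's detailed states to the three-state alphabet, and measuring how consistent two assignments are, can be sketched as below. The mapping used (H, G, I → helix; E, B → strand; everything else → loop) is one common convention, not something this slide specifies, and the two assignments are assumed to be already aligned position by position with no gaps. The function names are illustrative.

```python
# One common 8-to-3 reduction (an assumption; conventions vary).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    """Map DSSP letters to H (helix), E (strand) or C (loop/other)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in dssp)


def state_agreement(a: str, b: str) -> float:
    """Fraction of aligned positions with the same 3-state label."""
    if len(a) != len(b) or not a:
        raise ValueError("assignments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


print(to_three_state("HHHGGTTEEEBS"))                  # HHHHHCCEEEEC
print(state_agreement("HHHHCCEEEC", "HHHCCCEECC"))     # 0.8
```

Applied to the structures of sequence-similar protein pairs, `state_agreement` is the kind of number behind the ~70% consistency figure.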
Amino acid preferences for
different secondary structure
• Alpha helix may be considered the default state
for secondary structure. Although the potential
energy is not as low as for beta sheet, H-bond
formation is intra-strand, so there is an entropic
advantage over beta sheet, where H-bonds
must form from strand to strand, with strand
segments that may be quite distant in the
polypeptide sequence.
• The main criterion for alpha helix preference is
that the amino acid side chain should cover and
protect the backbone H-bonds in the core of
the helix. Most amino acids do this with some
key exceptions.
– alpha-helix preference:
• Ala, Leu, Met, Phe, Glu, Gln, His, Lys, Arg
• The extended structure leaves the maximum space
free for the amino acid side chains: as a result, those
amino acids with large bulky side chains prefer to
form beta sheet structures:
– just plain large: Tyr, Trp, (Phe, Met)
– bulky and awkward due to a branched beta carbon: Ile, Val, Thr
– large S atom on the beta carbon: Cys
• The remaining amino acids have side chains that disrupt secondary structure and are known as secondary structure breakers:
– side chain H is too small to protect the backbone H-bond: Gly
– side chain is linked to the alpha N, so there is no N-H to form an H-bond; the rigid ring restricts phi to about -60°: Pro
– H-bonding side chains compete directly with backbone H-bonds: Asp, Asn, Ser
• Clusters of breakers give rise to regions known as loops
or turns which mark the boundaries of regular
secondary structure, and serve to link up secondary
structure segments.
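The idea that clusters of breakers mark loops can be turned into a crude scan. This is a toy sketch, not a secondary structure predictor: it uses only the breaker residues named above (Gly, Pro, Asp, Asn, Ser) and reports runs of at least `min_run` consecutive breakers, where `min_run` is an arbitrary illustrative threshold and `breaker_clusters` is a hypothetical name.

```python
from typing import List, Tuple

# Secondary structure breakers named on the slide: Gly, Pro, Asp, Asn, Ser.
BREAKERS = set("GPDNS")


def breaker_clusters(seq: str, min_run: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of >= min_run consecutive breakers."""
    runs = []
    start = None
    for i, aa in enumerate(seq.upper()):
        if aa in BREAKERS:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # A run may extend to the end of the sequence.
    if start is not None and len(seq) - start >= min_run:
        runs.append((start, len(seq)))
    return runs


print(breaker_clusters("AELKKAGPGALLEKMNDSAE"))  # [(6, 9), (15, 18)]
```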