Protein Analysis (Proteiinianalyysi) 52930 (2 ov, credit units)
Liisa Holm

Organisation
• Lectures & exercise sessions
  – 30.3.–28.4.2005, Wed and Thu 14–16, LS 2012
  – http://www.bioinfo.biocenter.helsinki.fi:8080/downloads/teaching/spring2005/proteiinianalyysi/index.html
• Exam
  – Bonus for active participation in the exercise sessions
• Supplementary reading
  – Lesk: Introduction to Bioinformatics. Oxford University Press.

Schedule
  30.3.  Wed  Lecture
  31.3.  Thu  Lecture
  6.4.   Wed  Exercise session 1
  7.4.   Thu  Lecture
  13.4.  Wed  Exercise session 2
  14.4.  Thu  Lecture
  20.4.  Wed  Exercise session 3
  21.4.  Thu  Lecture
  27.4.  Wed  Exercise session 4
  28.4.  Thu  Exam

Course objectives
• how to read protein sequences
• protein classification systems
• sequence – structure – function
• evolution

Other courses
• Prerequisites:
  – Genetic Bioinformatics (Geneettinen bioinformatiikka), 1-2 ov
    • sequence comparison
    • phylogenetic trees
• Application:
  – Protein Analysis Practical Work (Proteiinianalyysin harjoitustyöt), 3 ov
    • use of web tools

Introduction

The importance of proteins
• Proteins do all the work in the cell and take part in:
  – gene regulation
  – metabolism
  – signalling
  – the cytoskeleton
  – transport
  – cell division
[Figure source: http://www.websters-online-dictionary.org/definition/english/ce/cell.html]

Structural proteins
• Collagen [Figure: PDB 1K6F; source: http://www.aw-]

Actin and muscles

Enzymes
• Catalytic triad: Asp, Ser, His [Figure: PDB 1CHO]

Transcription factors
[Figure: PDB 1L3L; labels: ligand, DNA]

Where do proteins come from?
• DNA > RNA > protein
  – the genetic code
    • a DNA base triplet codes for one amino acid
    • 20 amino acids
  – linear sequence
    • typical length 100-400 amino acids
    • on average about 150 amino acids

A big surprise …
• The structure of DNA is very regular: Watson & Crick (1953)
• Myoglobin: the protein structure lacks symmetry: Kendrew & Perutz (1957) [Figure: PDB 1mbn]

Proteins are special polymers:
• A given protein always has the same amino acid sequence
  – The protein sequence is determined by the DNA sequence
• A given protein always has a unique three-dimensional structure
  – The protein structure is determined by the amino acid sequence
"always" = always in the biological sense (exceptions exist)

No function without structure
• Natural proteins fold into a specific three-dimensional structure
  – complementary to the interaction partner
• Denaturation destroys function

Evolution: Sequence – Structure – Function
[Diagram labels: DNA sequence, natural selection, protein sequence, protein function, protein structure]

Sequence

Identifying proteins
• classical biochemistry
  – protein purification
  – molecular weight
  – isoelectric point
  – CD and other spectroscopy
  – etc.
• computational analysis
  – DNA sequence: gene finding, exons/introns, translation into protein
  – sequence comparisons
• post-genomics
  – transcription profiling, protein-protein interactions, etc.

History
  1953  Structure of DNA
  1955  First protein sequence
  1957  Structure of myoglobin
  1975  DNA sequencing methods
  1977  'Genome' of phage φX-174
  1995  Genome of Haemophilus influenzae
  1996  Yeast genome
  1998  Nematode genome
  2000  Human genome
  2000  Structural genomics project

Genomes
• DNA sequencing
  – enzymatic synthesis, specific terminators
  – protein sequences are derived from the DNA sequence
    • ORF, open reading frame
    • verification: alignment with a known EST, cDNA or protein
    • the exon-intron problem in eukaryotes
• genome projects
  – about 136 organisms
  – eukaryotes, archaea and eubacteria

Proteome coverage
(organism, number of proteins: biological features)
• S. cerevisiae (yeast), 6231: genes for existence as a single-celled organism with the basic structure and organisation of the eukaryotic cell
• E. coli (bacterium), 4356 - 5333: genes for growth on external sources of energy, molecular transport through the cell membrane, metabolic pathways and replication as a single cell
• C. elegans (nematode), 22515: genes for development by a unique cell lineage, nervous system and reproduction
• D. melanogaster (fruit fly), 17341: model for developmental processes by hormones and cell-cell interactions
• H. sapiens (human), 28814: duplicates many gene functions in other model organisms and in addition includes control of higher brain functions
About 136 complete proteomes deduced from complete genomes.

Complete proteome
• certainty about "missing" genes
• not all genes are expressed at the same time and in the same place
• alternative splicing, post-translational modifications: one gene can give rise to many proteins
  – glycosylation
  – phosphorylation

Databases
• EBI
  – http://www.ebi.ac.uk
  – http://www.ebi.ac.uk/proteome
• NCBI - Entrez
  – http://www.ncbi.nlm.nih.gov
• nrdb, 'non-redundant database'
  – 490,374,618 amino acids
  – 1,504,726 sequences

Structure

Protein structure
• Primary structure
• Secondary structure
• Super-secondary structure
• Tertiary structure
• Quaternary structure

Secondary structure
• backbone
  – no amino acid side chains
• regular patterns
  – of hydrogen bonds
  – of backbone torsion angles
• types of secondary structure
  – α-helix

α-Helix
• hydrogen bond pattern: n, n+4

β-Sheet
[Figure: β-sheet and β-strands, view from the top and view from the side; PDB 2TRX; source: http://broccoli.mfn.ki.se/pps_course_96]

Cartoon representation
[Figure: PDB 2AAC]

Supersecondary structures
• local arrangements of secondary structure elements
[Figure source: http://www.expasy.org/swissmod/course/text/chapter2.htm]

Tertiary structure
[Figure: PDB 1coh]

Quaternary structure
[Figure: PDB 1coh]

Protein structure determination
• Protein expression
  – membrane proteins
  – aggregation
• X-ray crystallography
• NMR (nuclear magnetic resonance)
• Cryo-EM (electron microscopy)

Structures by X-ray crystallography
• Crystallize protein
• Collect diffraction patterns
• Improve iteratively:
  – Calculate electron density map
    • Phase problem
  – Fit amino acid trace through map

X-ray crystallography
• Crystallization
• "An art as much as a science"
[Figure label: charges; source: http://crystal.uah.edu/~carter/protein/crystal.htm]

Diffraction and electron density maps
[Figure labels: X-ray source, crystal, diffraction pattern, intensities, iterative refinement]

Resolution
• Higher resolution = more accurate positioning of atoms
[Figure source: http://www.sci.sdsu.edu/TFrey/Bio750/Bio750X-]

NMR
• Create highly concentrated protein solution
• Record spectra
• Assign peaks to residues
• Calculate constraints
• Compute structure

NMR spectra
[Figure: 1D and 2D spectra; source: http://www.cryst.bbk.ac.uk/PPS2/projects/schirra/html/2dnmr.htm]

Distance constraints from NMR
• From the sequence
  – Topology
  – Bond angles
  – Bond lengths
• From the NMR experiment
  – Torsion angles
  – Distance constraints
[Figure labels: H, R, Hα, CO, CO, torsion angle]

Ensemble of structures
[Figure: SH3 domain, PDB 1aey]

What is the true protein structure?
• X-ray
  – "frozen" state of a protein
  – crystal contacts
  ✔ large protein structures
• NMR
  ✔ protein in solution
  – limited in size

Molecular complexes via X-ray
[Figure: 30S subunit of the ribosome, PDB 1fjg; labels: protein, RNA]

Cryo-EM
• Single particle image reconstruction
[Figure: bacteriophage MS2, Koning et al. (2003)]

Fitting X-ray structures into density maps
[Figure: GroEL complex; hemoglobin; PDB 1gr6]

Protein structure databases
http://www.wwpdb.org/index.html

Molecular function

Post-genomic view: Function = Σ interactions
(From left to right, figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center).
Enzymes
• Catalytic triad: Asp, Ser, His [Figure: PDB 1CHO]

Mechanism
• Enzymes speed up chemical reactions
• Enzymes are not consumed by the reaction
• Stabilization of the transition state
• Charge-relay cascade

Convergent evolution in serine proteases
• same reaction
• same mechanism
• same orientation of catalytic residues
• different structures
  – Chymotrypsin:
    • His-57, Asp-102, Ser-195
  – Subtilisin
[Figure: PDB 1cho / 1sib]

Substrate specificity
[Figure: Perona & Craik (1997)]

Transcription factors
[Figure: PDB 1L3L; labels: ligand, DNA]

Hydrogen bonding pattern
[Figure: Vannini (2002)]

Determining function
• Biochemical analysis
• Genetic analysis, phenotype
• Protein-protein interaction
• The methods are laborious
• The assay often has to be tailored separately for each function

Evolution

Evolution: Sequence – Structure – Function
[Diagram labels: DNA sequence, natural selection, protein sequence, protein function, protein structure]

Application: Finding Homologues
• Find similar ones in different organisms
• Human vs. mouse vs. yeast
  – Easier to do experiments on the latter!
(Section from NCBI Disease Genes Database reproduced below.)

Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter

Application: Finding Homologues (cont.)
• Cross-referencing, one thing to another thing
• Sequence comparison and scoring
• Analogous problems for structure comparison
• Comparison has two parts:
  (1) Optimally aligning 2 entities to get a comparison score
  (2) Assessing the significance of this score in a given context

What use could protein bioinformatics be?
• a hypothetical virus epidemic
  – DNA sequence
  – comparison with known viruses [10]
  – development of antiviral drugs
    • virus-specific proteins: replication or envelope proteins [01]
      – database searches [15]
      – homology modelling [25] / ab initio [55]
        » drug design, antibody therapy [50]
        » biological tolerability of the drug [75]
[Diagram labels: sequence, structure]

Properties of amino acids
• Proteins are self-organising linear heteropolymers whose sequence has been refined by natural selection
• 20 amino acids
  – peptide backbone
  – side chain
• sequence determines structure

  Amino acid      Symbol    pK1 (α-COOH)   pK2 (α-NH3+)
Amino Acids with Sulfur-Containing R-Groups
  Cysteine        Cys - C   1.9            10.8
  Methionine      Met - M   2.1            9.3
Acidic Amino Acids and their Amides
  Aspartic acid   Asp - D   2.0            9.9
  Asparagine      Asn - N   2.1            8.8
  Glutamic acid   Glu - E   2.1            9.5
  Glutamine       Gln - Q   2.2            9.1
Basic Amino Acids

Properties of amino acids
[Figure: levels of complexity in folding]

WHAT DO WE KNOW ABOUT PROTEIN FOLDING?
• Water-soluble proteins are "globular": tightly packed, folded up, with water excluded from the interior.
• Bond lengths and bond angles don't vary much from their equilibrium positions.
• Structures are stable and relatively rigid.
• Folding possibilities are limited, both along the backbone chain and within the side chain groups.
• Folding motifs are used repetitively.
• Among similar proteins (say, from different organisms), structure tends to be more conserved than the exact sequence of amino acids.
• Although sequence must determine structure, it is not yet possible to predict the entire structure from sequence accurately.
• Net stability corresponds to a few hydrogen bonds.
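The last folding bullet can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the 5 to 15 kcal/mol range for the net stability of a folded globular protein is an assumed textbook-style figure that does not appear in these slides, and the 3.0 kcal/mol per bond is the ideal H-bond energy quoted on the DSSP hydrogen-bond slide. The function name `hbond_equivalents` is made up for this example.

```python
# Back-of-the-envelope check: how many ideal hydrogen bonds does the net
# folding stability of a globular protein correspond to?

# ASSUMPTION (not from the slides): typical net stability of a folded
# globular protein, as a magnitude in kcal/mol.
NET_STABILITY_RANGE_KCAL = (5.0, 15.0)

# Magnitude of the ideal backbone H-bond energy quoted on the DSSP slide
# (E = -3.0 kcal/mol), in kcal/mol.
IDEAL_HBOND_KCAL = 3.0


def hbond_equivalents(stability_kcal: float,
                      hbond_kcal: float = IDEAL_HBOND_KCAL) -> float:
    """Express a folding stability as a number of ideal H-bonds."""
    return stability_kcal / hbond_kcal


# Convert both ends of the assumed stability range.
low, high = (hbond_equivalents(s) for s in NET_STABILITY_RANGE_KCAL)
print(f"Net stability is roughly {low:.1f} to {high:.1f} ideal H-bonds")
```

Under these assumptions the answer is about two to five hydrogen bonds, even though a folded protein contains hundreds of them. The folded state is therefore only marginally more stable than the unfolded one.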
Secondary structure > tutorial
• a protein is like a fat droplet in water
• the polar groups of the peptide backbone form hydrogen bonds
  – NH -- O=C
• regular secondary structures arise
• the side chain modulates secondary structure preferences

DSSP
Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features
W. Kabsch & C. Sander, Biopolymers 22, 2577-2637 (1983)

Hydrogen bonds
[Figure labels, partial charges: H +0.20e, N -0.20e, C +0.42e, O -0.42e]
E ~ q1 q2 [ 1/r(ON) + 1/r(CH) - 1/r(CN) - 1/r(OH) ]
• The ideal H-bond is co-linear, with r(NO) = 2.9 Å and E = -3.0 kcal/mol
• Cutoffs in DSSP allow 2.2 Å excess distance and ±60° angle

Elementary H-bond patterns
• n-turn(i) =: Hbond(i,i+n), n=3,4,5
• Parallel bridge(i,j) =: [ Hbond(i-1,j) AND Hbond(j,i+1) ] OR [ Hbond(j-1,i) AND Hbond(i,j+1) ]
• Antiparallel bridge(i,j) =: [ Hbond(i,j) AND Hbond(j,i) ] OR [ Hbond(i-1,j+1) AND Hbond(j-1,i+1) ]

N-turns

3-turn
-N-C-C--N-C-C--N-C-C--N-C-C-
 H   O  H   O  H   O  H   O

4-turn
-N-C-C--N-C-C--N-C-C--N-C-C--N-C-C-
 H   O  H   O  H   O  H   O  H   O

5-turn
-N-C-C--N-C-C--N-C-C--N-C-C--N-C-C--N-C-C-
 H   O  H   O  H   O  H   O  H   O  H   O

Parallel bridge
-N-C-C--N-C-C--N-C-C--N-C-C--N-C-C-
 H   O  H   O  H   O  H   O  H   O

 H   O  H   O  H   O  H   O  H   O
-N-C-C--N-C-C--N-C-C--N-C-C--N-C-C-

Antiparallel bridge
-N-C-C--N-C-C--N-C-C--N-C-C-
 H   O  H   O  H   O  H   O

 O   H  O   H  O   H  O   H
-C-C-N--C-C-N--C-C-N--C-C-N-

The antiparallel beta-sheet is significantly more stable because its H-bonds are well aligned.
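The energy formula and the elementary patterns above can be sketched in a few lines of Python. This is a minimal illustration, not the DSSP program itself. The conversion factor 332 and the acceptance threshold of -0.5 kcal/mol come from the Kabsch & Sander paper rather than from the slide. The bond lengths used in the example geometry (C=O 1.23 Å, N-H 1.0 Å) are assumptions. The function names and the convention that a pair (i, j) means "C=O of residue i bonded to N-H of residue j" are chosen for this sketch.

```python
import math
from typing import Set, Tuple

Vec = Tuple[float, float, float]
HBonds = Set[Tuple[int, int]]  # (i, j): C=O of residue i bonded to N-H of residue j

Q1 = 0.42            # partial charge on C (+) and O (-), in units of e (from the slide)
Q2 = 0.20            # partial charge on H (+) and N (-), in units of e (from the slide)
F = 332.0            # converts e^2/Angstrom to kcal/mol (from the paper, not the slide)
HBOND_CUTOFF = -0.5  # kcal/mol acceptance threshold (from the paper, not the slide)


def hbond_energy(c: Vec, o: Vec, n: Vec, h: Vec) -> float:
    """Electrostatic energy between a C=O group and an N-H group (kcal/mol)."""
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))


def n_turn(hb: HBonds, i: int, n: int) -> bool:
    """n-turn(i) =: Hbond(i, i+n)."""
    return (i, i + n) in hb


def parallel_bridge(hb: HBonds, i: int, j: int) -> bool:
    """Parallel bridge(i, j) as defined on the slide."""
    return (((i - 1, j) in hb and (j, i + 1) in hb)
            or ((j - 1, i) in hb and (i, j + 1) in hb))


def antiparallel_bridge(hb: HBonds, i: int, j: int) -> bool:
    """Antiparallel bridge(i, j) as defined on the slide."""
    return (((i, j) in hb and (j, i) in hb)
            or ((i - 1, j + 1) in hb and (j - 1, i + 1) in hb))


# Co-linear test geometry on the x axis: C=O ... H-N with r(NO) = 2.9 Angstrom.
# ASSUMED bond lengths: C=O 1.23, N-H 1.0.
c_atom, o_atom = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h_atom, n_atom = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
ideal_energy = hbond_energy(c_atom, o_atom, n_atom, h_atom)
print(f"ideal H-bond energy: {ideal_energy:.2f} kcal/mol")
print("accepted as H-bond:", ideal_energy < HBOND_CUTOFF)

# A toy H-bond list: two consecutive 4-turns and one antiparallel pair.
toy_hbonds: HBonds = {(2, 6), (3, 7), (10, 20), (20, 10)}
print(n_turn(toy_hbonds, 2, 4), antiparallel_bridge(toy_hbonds, 10, 20))
```

With the assumed bond lengths, the co-linear geometry gives an energy close to the -3 kcal/mol quoted on the slide. Storing the H-bonds as a set of index pairs keeps each pattern test a direct transcription of its definition.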
Cooperative H-bond patterns
• 4-helix(i,i+3) =: [4-turn(i-1) AND 4-turn(i)]
• 3-helix(i,i+2) =: [3-turn(i-1) AND 3-turn(i)]
• 5-helix(i,i+4) =: [5-turn(i-1) AND 5-turn(i)]
• Longer helices are defined as overlaps of minimal helices

Beta-ladders and beta-sheets
• Ladder =: set of one or more consecutive bridges of identical type
• Sheet =: set of one or more ladders connected by shared residues
• Bulge-linked ladder =: two ladders or bridges of the same type connected by at most one extra residue on one strand and at most four extra residues on the other strand

3-state secondary structure
• Helix
• Strand
• Loop
• Quoted consistency of secondary structure state definition in structures between sequence-similar proteins is ~70 %
• Richer descriptions possible
  – e.g. phi-psi regions

Amino acid preferences for different secondary structure
• Alpha helix may be considered the default state for secondary structure. Although the potential energy is not as low as for beta sheet, H-bond formation is intra-strand, so there is an entropic advantage over beta sheet, where H-bonds must form from strand to strand, with strand segments that may be quite distant in the polypeptide sequence.
• The main criterion for alpha helix preference is that the amino acid side chain should cover and protect the backbone H-bonds in the core of the helix. Most amino acids do this, with some key exceptions.
  – alpha-helix preference:
    • Ala, Leu, Met, Phe, Glu, Gln, His, Lys, Arg
• The extended structure leaves the maximum space free for the amino acid side chains: as a result, those amino acids with large bulky side chains prefer to form beta sheet structures:
  – just plain large: Tyr, Trp, (Phe, Met)
  – bulky and awkward due to branched beta carbon: Ile, Val, Thr
  – large S atom on beta carbon: Cys
• The remaining amino acids have side chains which disrupt secondary structure, and are known as secondary structure breakers:
  – side chain H is too small to protect backbone H-bond: Gly
  – side chain linked to alpha N, has no N-H to H-bond; rigid structure due to ring restricts to phi = -60: Pro
  – H-bonding side chains compete directly with backbone H-bonds: Asp, Asn, Ser
• Clusters of breakers give rise to regions known as loops or turns, which mark the boundaries of regular secondary structure and serve to link up secondary structure segments.
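The three preference classes listed above can be turned into a simple lookup, which makes the remark about clusters of breakers concrete. The sketch below only labels each residue with the class from the slide and reports runs of consecutive breakers; it is not a secondary structure prediction method. Phe and Met are assigned to the helix class (the slide lists them in parentheses under the strand class as well). The example sequence, the function names, and the `min_len` parameter are made up for illustration.

```python
import re
from typing import List, Tuple

# One-letter codes for the classes listed on the slide.
HELIX_FORMERS = set("ALMFEQHKR")  # Ala, Leu, Met, Phe, Glu, Gln, His, Lys, Arg
STRAND_FORMERS = set("YWIVTC")    # Tyr, Trp, Ile, Val, Thr, Cys
BREAKERS = set("GPDNS")           # Gly, Pro, Asp, Asn, Ser


def preference_string(seq: str) -> str:
    """Label each residue: h = helix former, e = strand former, b = breaker."""
    labels = []
    for aa in seq.upper():
        if aa in HELIX_FORMERS:
            labels.append("h")
        elif aa in STRAND_FORMERS:
            labels.append("e")
        elif aa in BREAKERS:
            labels.append("b")
        else:
            labels.append("?")  # not one of the 20 standard amino acids
    return "".join(labels)


def breaker_clusters(seq: str, min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of consecutive breakers.

    min_len is a hypothetical threshold for what counts as a cluster.
    """
    pattern = "[GPDNS]{%d,}" % min_len
    return [m.span() for m in re.finditer(pattern, seq.upper())]


example = "AELLKKGNPDVTVYVS"  # made-up sequence, for illustration only
print(preference_string(example))
print(breaker_clusters(example))
```

In the made-up example, the run Gly-Asn-Pro-Asp separates a stretch of helix formers from a stretch of strand formers, which is the kind of boundary the last bullet describes. The single trailing Ser is a breaker but does not form a cluster.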