Structural Bioinformatics

Chih-Hao Lu (陸志豪), Assistant Professor
[email protected]
Education: Ph.D., Institute of Bioinformatics, National Chiao Tung University
Expertise: structural bioinformatics, computational biology, evolutionary computation and machine learning
Research areas:
• Prediction of local protein structural motifs and function
• Protein structure and dynamics
• Interactions between proteins and other molecules

Mechanism of drug actions
• To identify drugs that inhibit target proteins involved in diseases and have a therapeutic effect against those diseases
– Drugs often have stronger binding affinities than natural compounds
[Figure: a pathway of disease. Labels: natural compound, drug, target protein, protein]

Classification of Drug Development
[Figure: strategies arranged by whether the protein (receptor) structure and the compound structure are known or unknown. Entries: High-Throughput Screening (HTS); compound similarity search with a query compound and similar compounds; Structure-based Drug Design (SBDD); SBDD or de novo design. Source: DDT 2002]

Central Dogma

Why study protein structure?
• Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers …
• Function depends on 3D structure.
• Protein sequences are easy to obtain; structures are difficult to determine.

From primary to quaternary

Primary Structure
• The protein backbone is a long linear sequence built from 20 kinds of amino acids.
• The amino acids are linked to one another by peptide bonds.

Proteins are polypeptide chains

20 Amino Acids
Hydrophobic
Polar
Charged

Amino acid      Abbreviations   Mol. weight (Da)   Occurrence in proteins (%)
Glycine         Gly  G          75                 7.2
Alanine         Ala  A          89                 7.8
Valine          Val  V          117                6.6
Leucine         Leu  L          131                9.1
Isoleucine      Ile  I          131                5.3
Methionine      Met  M          149                2.3
Phenylalanine   Phe  F          165                3.9
Tyrosine        Tyr  Y          181                3.2
Tryptophan      Trp  W          204                1.4
Serine          Ser  S          105                6.8
Proline         Pro  P          115                5.2
Threonine       Thr  T          119                5.9
Cysteine        Cys  C          121                1.9
Asparagine      Asn  N          132                4.3
Glutamine       Gln  Q          146                4.2
Lysine          Lys  K          146                5.9
Histidine       His  H          155                2.3
Arginine        Arg  R          174                5.1
Aspartic acid   Asp  D          133                5.3
Glutamic acid   Glu  E          147                6.3

Secondary Structure
Sequence: TTCCPSIVARSNFNVCRLPGTPEAICATYTGCII

α helix
• On average, every 3.6 residues form one turn.
• The α-helix structure is formed by hydrogen-bond interactions.

3₁₀ helix, α helix, π helix

The α helix has a dipole moment

Some amino acids are preferred in α helices
• Good
– Ala, Glu, Leu, Met
• Poor
– Pro, Gly, Tyr, Ser
• The structure is amphipathic
– hydrophobic
– hydrophilic
[Figure: helical wheel]

β sheet
• A β sheet is a plane formed by several ribbon-like β strands.
• Any two adjacent β strands are arranged either parallel or antiparallel.

Antiparallel β sheets

Parallel β sheets

Turn or Loop
• Where α helices or β strands are connected, the peptide backbone has to turn by nearly 180 degrees; these regions are called turns.
• Other irregular structures are collectively called loops.
[Figure labels: turn, loop]

Hairpin loops

Secondary structure elements are connected to form simple motifs
Schematic diagrams of the calcium-binding motif (Luscombe, Genome Biology 2000)

The hairpin β motif occurs frequently in protein structures

The Greek key motif is found in antiparallel β sheets

Tertiary Structure
Sequence: TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
• Several secondary structure elements pack together to form the tertiary structure of the protein.
[Figure: sequence → secondary structure (β sheet, α helix, loop) → tertiary structure]

Simple motifs combine to form complex motifs

Quaternary structure
• A complex assembled from several identical or different tertiary-structure molecules (subunits) is called a quaternary structure.

How to determine the protein structure?
• By experimentation
– X-ray crystallography
– NMR (nuclear magnetic resonance spectroscopy)
• Sequence-structure gap

Structure Determination (X-ray)
[Figure: pipeline stages: Target Selection; Crystallomics (Isolation, Expression, Purification, Crystallization); Data Collection; Structure Solution; Structure Refinement; Functional Annotation; PDB Deposition; Publication]

The first X-ray crystallographic structural results in 1958
• First determination of a 3-D globular protein structure (myoglobin) in 1958 – John Kendrew

Molecular visualization
• Abstract views of macromolecules
– well-defined secondary structure elements (α-helices and β-strands)
– Jane Richardson, 1985
• α-helix as a simple cylinder or a broad, spiral ribbon
• β-strand as a broad, flat ribbon

The structure of myoglobin

Molecular visualization
• RasMol
• PyMOL
• Swiss-Pdb Viewer
• MOLMOL
• MolScript
• MDL Chime

Green Fluorescent Protein (GFP)

The Protein Data Bank
http://www.rcsb.org/pdb/home/home.do

Number of Structures Available

Structure-based databases
• Popular software and resources for protein structure validation
– PDBSum, Procheck, What_Check
• Resources classifying protein structure
– SCOP, CATH, DALI, VAST, CE
• Popular resources of protein interactions
– Protein-Protein(DNA) interaction server, DIP, MINT
• Popular resources visualizing macromolecular structures
– PDBSum, NDB Atlas, STING

Protein evolution and the SCOP database
http://scop.berkeley.edu/

SCOP
• Classes
– all-β proteins
   • can have small adornments of α or 3₁₀ helix
– all-α structures
   • may have several regions of 3₁₀ helix, and a small β-sheet outside the α-helical core
– α/β (alpha and beta)
   • mainly parallel β sheets (β-α-β units)
– α+β (alpha plus beta)
   • mainly antiparallel β sheets (segregated α and β regions)
– others
   • multidomain proteins, membrane and cell surface proteins, small proteins, coiled coil proteins, low-resolution structures, peptides, and designed proteins

SCOP Sample Hierarchy
[Figure: one path through the SCOP hierarchy. The upper levels are annotated "Determined by structure" and the lower levels "Related by homology".]
Root: scop
Class: α, β, α/β, α+β
Fold: Rossmann fold, TIM α/β-barrel, Flavodoxin-like
Superfamily: Trp biosynthesis, β-Galactosidase (3), Glycosyltransferase, RuBisCo (C)
Family: β-Glucanase, α-Amylase (N), β-Amylase
Protein: Acid α-amylase, Cyclodextrin glycosyltransferase, Oligo-1,6 glucosidase
Species: A. niger, B. circulans, B. stearothermophilus, B. cereus
PDB/Ref: 2aaa:1-353, 1cdg:1-382, 1cgt:1-382, 1cyg:1-378, J. Biochem 113:646-649

The CATH domain structure database
http://www.cathdb.info

CATH
http://www.cathdb.info/index.html

Structure quality assurance
• Not all structures are of equally high quality
• Models from X-ray crystallography
• Models from NMR spectroscopy
• Errors in deposited structures
• Procheck, What_Check
[Example structure: 2YSB]

Ramachandran Plot
• A plot of the backbone dihedral angles (phi against psi) of each amino acid in a protein.
• Due to steric hindrance from amino acid side chains, only certain angles are allowed in a folded protein.
• A plot of the dihedral angles of the individual amino acids in a protein can serve to indicate how well the structure has been determined.
• Any deviations from the allowed values are called outliers and usually indicate bad geometry.

Dihedral Angles
[Figure: backbone dihedral angles, with the C and N atoms labelled]

Ramachandran Plot
Standard plot showing where the different secondary structures fall.

A real-life example. All non-glycine residues are in allowed regions.

Validation
So what do you think about this?
• Ideally, there should be no outliers in the Ramachandran plot, except for glycine and proline, which are "special" amino acids.
• However, the scientist depositing the structure may have a rational explanation for outliers. (Always refer to the publication!)
• Expect more than 85–90% of residues to fall into the red regions.
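
The dihedral angles plotted in a Ramachandran plot can be computed directly from backbone atom coordinates. The sketch below is a minimal, standard-library illustration and not the method used by Procheck or What_Check; the function names `dihedral` and `phi_psi` and the toy coordinates are assumptions made for this example. It uses the usual atom order for phi (C of the previous residue, N, Cα, C) and psi (N, Cα, C, N of the next residue).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product a x b of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)          # points from p1 back to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)          # points from p2 on to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Return (phi, psi) in degrees for one residue (hypothetical helper)."""
    phi = dihedral(c_prev, n, ca, c)
    psi = dihedral(n, ca, c, n_next)
    return phi, psi


if __name__ == "__main__":
    # Toy coordinates (not from a real structure): a quarter-turn twist.
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

With real data, the five points passed to `phi_psi` would be taken from the ATOM records of consecutive residues in a PDB file; the first and last residues of a chain lack one of the two angles.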
Secondary structure assignment
http://swift.cmbi.ru.nl/gv/dssp/

The role of secondary structure
• In structural genomics
– basic unit for structure classification
– main uses
   • it is indicative of the fold
   • it is an intuitive means of visualizing protein structure
   • it influences the sequence alignment
   • it is related to function
– applications (e.g. Secondary Structure Element)
   • speed up large-scale all-against-all alignment of 3D structures
   • comparative modeling and threading

Hydrogen Bonding is Key to Automated Methods
• Why? About 90% of backbone donors (NH) and acceptors (C=O) form hydrogen bonds
• Basic definition
– N–H···O angle greater than 120 degrees
– H···O distance less than 2.5 Å
– Note: H atoms are not usually identified directly

Angle-distance hydrogen bond assignment
• Baker and Hubbard (1984) assigned hydrogen bonds according to the N–H···O angle and the H···O distance rHO
[Figure: example N–H···O geometries with their H···O distances and N–H···O angles, illustrating the 2.5 Å and 120° cut-offs]

Coulomb hydrogen bond calculation – used by DSSP
\[ E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{HO}} - \frac{1}{r_{NC'}}\right) \]
• f is a constant, 332 Å kcal/e²
• δ+ and δ− are the polar charges, in electrons
• Weakest H-bond: −0.5 kcal/mol in DSSP
• H is not given and requires extrapolation (note: this assumes planar geometry for the peptide bond)

DSSP
• H = alpha helix
• G = 3₁₀ helix
• I = pi helix
• B = bridge (single-residue sheet)
• E = extended beta strand
• T = hydrogen-bonded turn
• S = bend
• C = coil
http://e106.life.nctu.edu.tw/~hwhuang/dssp/

DSSP as Implemented in the PDB
[Example structure: 1ATP]

Identifying structural domain and function in proteins
[Example structure: 1NTY]

Prediction of protein-protein or protein-DNA interaction
• Sequence-based methods
– Homology
– Correlated mutation
• Structure-based methods
– Physical docking
• Hybrid methods

Principles and methods of docking and ligand design
• Structure-based design
– Docking
• Analog-based design
– QSAR (quantitative structure-activity relationships)

Most force fields consist of a summation of bonded forces associated with chemical bonds, bond angles, and bond dihedrals, and nonbonded forces associated with van der Waals forces and electrostatic charge.

Fold recognition method

Prediction in 1D
– Secondary structure prediction
– Solvent accessibility prediction
– Disulfide bond prediction
– Fold recognition
– Enzyme class prediction
– Subcellular localization prediction
– Metal binding sites prediction
– Disulfide connectivity prediction
– Phi psi angle prediction

Secondary structure prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
EEEELLLLLHHHHHHHHHHHHLLLLLHHHHHHLLLLEEEELLLLL
H = α helix, E = β sheet, L = loop

Solvent accessibility prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
EEEEBBBBEEEEEBBBBBBBEEEEEEBBBBBBBEEEEEEEBBBBEE
B = buried, E = exposed

Disulfide bond prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
[Figure: cysteines in the sequence labelled O or R]

Fold recognition
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN → ?
[Figure: the SCOP (Structural Classification of Proteins) hierarchy as the prediction target]
Root
Classes: α, β, α/β, α+β, multi-domain, membrane, small protein, …
Folds: TIM barrel, …
Superfamilies: TIM, Aldolase, …
Families: …
Proteins: TIM, …
Species: Chicken TIM, Human, …
SCOP statistics: 11 classes, 800 folds, 1294 superfamilies, 2327 families

Subcellular localization prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN → ?
[Figure: eukaryotic cellular compartments]

Metal binding sites prediction
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
NNNNBNNBNNNNNBNNNNNBNNNNNNNNNNNNNNNNNNBNNN
B = binding, N = non-binding

Phi psi angle prediction
Ramachandran plot
• Phi: C(n-1) – N(n) – Cα(n) – C(n)
• Psi: N(n) – Cα(n) – C(n) – N(n+1)
[Figure: Ramachandran plot divided into regions labelled A to R]
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
AADGJJKKCPGDANOOEEAAAAJJJJJJJJKKNNQQCCJJJJAAAA

Disulfide Connectivity Prediction
TTC1C2PSIVARSNFNVC3RLPGTPEAIC4ATYTGC5IIIPGATC6PGDYAN
Connectivity pattern: 1-6, 2-5, 3-4

SVM
[Figure: training. Training data (Class 1 to Class K, each described by Features 1~N) → SVM → SVM model]
[Figure: testing. Testing data (Feature 1 to Feature N) → SVM model → Class 1 to Class K]

Protein Structure Prediction
[Flowchart: Sequence → sequence homology to a known fold? >30%: Homology Modeling; <30%: Threading. Threading match found? Yes: Model; No: Ab initio]

Homology modeling
• The goal of protein modeling is to predict a structure from its sequence
– Template recognition and initial alignment
– Alignment correction
– Backbone generation
– Loop modeling
– Side-chain modeling
– Model optimization
– Model validation

What is Homology Modeling?
Target (structure unknown: ?):
KQFTKCELSQNLYDIDGYGRIALPELICTMF
HTSGYDTQAIVENDESTEYGLFQISNALWCK
SSQSPQSRNICDITCDKFLDDDITDDIMCAK
KILDIKGIDYWIAHKALCTEKLEQWLCEKE
Template:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK
FESNFNTQATNRNTDGSTDYGILQINSRWWCND
GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
Homologous proteins share similar sequences, so the known structure is used as the template.
[Structures shown: 1alc, 8lyz]

Structure prediction by homology modeling
[Figure: Step 1, Step 2, Step 3, Step 4]

Structure comparison and alignment
[Example structures: 1CRN, 1JXX]
CE http://cl.sdsc.edu/ce.html
DALI http://ekhidna.biocenter.helsinki.fi/dali_server/
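
The decision flow on the Protein Structure Prediction slide (more than 30% sequence homology to a known fold leads to homology modeling, less than 30% leads to threading, and ab initio is used when threading finds no match) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `percent_identity` and `choose_method` are hypothetical, the two sequences are assumed to be already aligned with `-` as the gap character, identity is counted over columns where neither sequence has a gap, and a value of exactly 30% is treated as below the threshold because the slide does not say which way it goes.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes the two strings come from an existing pairwise alignment and
    therefore have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def choose_method(identity_pct, threading_match_found=False, threshold=30.0):
    """Pick a modeling strategy following the flowchart on the slide.

    threshold=30.0 is the cut-off shown on the slide; how to treat exactly
    30% is an assumption of this sketch.
    """
    if identity_pct > threshold:
        return "homology modeling"
    if threading_match_found:
        return "threading"
    return "ab initio"


if __name__ == "__main__":
    # Toy alignment, not taken from the slides.
    identity = percent_identity("ACDE-G", "ACDFHG")
    print(identity, choose_method(identity))
```

In practice the identity would come from an alignment of the target against candidate templates, and `threading_match_found` from a fold recognition run; both are supplied by hand here.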