Chapter 1: Bio Primer
1.1 Cell Structure; DNA; RNA; transcription; translation; proteins
Prof. Yechiam Yemini (YY)
Computer Science Department
Columbia University
COMS 4761 --2007

Overview
  Cell structure and mechanisms
  DNA; RNA; Transcription; Regulation
  Translation; protein; sequence & structure
  References:
    B. Alberts et al., “Molecular Biology of the Cell”, 4th edition, Garland Science.
    R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
    J.D. Watson et al., “Molecular Biology of the Gene”, 5th edition, Pearson Benjamin Cummings.
  NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
  Animation sites:
    o http://www.johnkyrk.com/
    o http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite

Organisms Are Made of Cells

Prokaryotes & Eukaryotes Have Different Cells
  Prokaryotes: single-cell organisms without a nucleus
    E.g., bacteria: E. coli, H. pylori
  Eukaryotes: single- or multi-cell organisms with a nucleus
    E.g., yeast, plants, Drosophila, humans
  Timeline:
    Earth formed: -4.5B yrs
    Prokaryotic bacteria: -3.5B yrs
    Nucleated cells: -1.5B yrs
    Multi-cellular eukaryotes: -0.5B yrs
  (Figure © Pearson; Benjamin Cummings)

            Prokaryotes                      Eukaryotes
Structure   Single cell; size 0.2-2 µm       Single or multi cell; cell size 10-100 µm
            No nucleus                       Nucleus
            One membrane at cell boundary    Multiple membranes/compartments
            No organelles                    Organelles: mitochondria, Golgi, chloroplasts
            No cytoskeleton                  Cytoskeleton
DNA         Single circular DNA              Two or more chromosomes
            Genes code proteins              Genes have large non-coding regions (introns)
            90% of DNA encodes proteins      95-97% non-coding DNA
            ~10^5-10^6 base pairs            ~10^7-10^9 base pairs
            DNA is loosely organized         DNA is tightly packed (chromatin + histones)
            Cell division through fission    Mitosis
Proteins    1-2k protein species             5-20k protein species
            ~10^6 proteins per cell          ~10^9 proteins per cell

Cells Are Made of Macromolecules
  Small molecules (3%) -> Macromolecules (26%)
    Sugars -> Polysaccharides
    Fatty acids -> Fats, lipids, membranes
    Amino acids -> Proteins
    Nucleotides -> Nucleic acids (DNA, RNA)

  Molecules                                                % weight
  Water                                                    70%
  Inorganic ions                                           1%
  Sugars                                                   1%
  Amino acids                                              0.4%
  Nucleotides                                              0.4%
  Fatty acids                                              1%
  Other small molecules                                    0.2%
  Macromolecules (proteins, DNA, RNA, polysaccharides)     26%

DNA Structure

The Central Dogma of Biology
  DNA -> (Transcription) -> RNA -> (Translation) -> Protein
  DNA stores hereditary information
  DNA is transcribed into RNA
  RNA is translated into proteins
  Proteins perform the key functions of cells

DNA Consists of Sequences of Nucleotides
  DNA strands are sequences of nucleotides
  [Figure: a nucleotide (sugar, phosphate, base) and the backbone of a strand, e.g. A C T T A C G C]
  Bases: Adenine, Guanine, Thymine, Cytosine
  DNA is organized in complementary double strands
  Hydrogen bonds hybridize complementary pairs: A-T, C-G
  [Figure: double strand running from the 5’-end to the 3’-end, with hydrogen bonds joining the paired bases]

DNA Forms A Double Helix
  Helix full turn: 10.5 bp
  Vertical hydrogen bonds support the structure
  Major and minor grooves provide access by proteins (e.g., transcription factors)

DNA Is Tightly Packed
  DNA is 2 m long; needs to fold into a 10^-6 m nucleus
  Chromatin beads fold around 4 histones
  Transcription needs to unpack the DNA to copy it

Sample Bioinformatics Challenges
  Sequencing the genome
  Discovering sequence similarity
  Discovering genes
  Analyzing evolutionary relationships
  Discovering other important structures
    Distinguishing exons from introns
    Regulatory structures (promoters & transcription factors)
    Regions expressing micro RNA
    …

Transcription

Schematics
  DNA -> (Transcription) -> mRNA -> (Translation) -> Protein

Overview
  A. Assembling transcription complex
  B. Transcribing DNA to mRNA
  C. Removing introns

Animation: The Transcription Process

Transcription Details
  http://cwx.prenhall.com/horton/medialib/
  From PDB

Transcription Factors
  TFs bind to promoter regions and to RNA polymerases
  TFs regulate the rate of transcription (up/down)
  Regulation is not yet well understood

Transcription Is Regulated
  http://cwx.prenhall.com/horton/medialib/

Example: The Lac Operon
  Lac consists of 3 genes that are transcribed together
  Used by bacteria to transport and metabolize lactose
  cAMP activates transcription to initiate transport & metabolism of lactose

Lac Activation
  A low sugar level generates cAMP
  cAMP binds with CRP; CRP adjusts its alpha helix to fit the DNA grooves and binds to the DNA
  CRP-cAMP accelerates polymerase binding
  http://cwx.prenhall.com/horton/medialib/

Splicing The Introns
  http://cwx.prenhall.com/horton/medialib/

From Genes To Networks
  Regulation is organized in networks
  Top: gene network regulating the body development of sea urchin
  Middle: a promoter region
  Bottom: interaction of two modules

Regulatory Networks Can Be Complex
  Genetic regulatory network controlling the development of the body plan of the sea urchin embryo
  Davidson et al., Science, 295(5560):1669-1678.
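
As a toy illustration of the regulation logic in the Lac operon example, the following Python sketch treats Lac activation as a small Boolean model. It is a simplification under stated assumptions: the function name `lac_transcription` and its three output levels are hypothetical, and the lactose-dependent release of the Lac repressor is standard textbook behavior that these slides do not cover.

```python
def lac_transcription(glucose_low: bool, lactose_present: bool) -> str:
    """Toy Boolean model of Lac operon regulation (illustrative only)."""
    # From the slides: a low sugar (glucose) level generates cAMP,
    # which binds CRP so that CRP-cAMP can bind the DNA.
    camp_crp_bound = glucose_low

    # Assumption not stated in the slides: the Lac repressor blocks
    # transcription unless lactose is present.
    repressor_released = lactose_present

    if not repressor_released:
        return "off"
    # From the slides: CRP-cAMP accelerates polymerase binding.
    return "high" if camp_crp_bound else "low"


# Enumerate all four input combinations of the toy model.
for glucose_low in (False, True):
    for lactose_present in (False, True):
        level = lac_transcription(glucose_low, lactose_present)
        print(f"glucose_low={glucose_low} lactose_present={lactose_present} -> {level}")
```

Real regulatory networks, such as the sea urchin network cited above, involve graded rates and many interacting modules; a Boolean function like this only captures the on/off logic of a single operon.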

Sample Bioinformatics Challenges
  Discovering and analyzing transcription factors
  Evolutionary analysis; motif finding
  Discovering the structure of regulatory networks
  Analyzing the operations of regulatory networks
  Designing synthetic regulatory networks

Translation

RNA Encodes Protein Sequences
  DNA -> (Transcription) -> RNA -> (Translation) -> Protein
  Proteins are sequences of amino acids (AA)
  Translation uses the RNA sequence as a template to construct the AA sequence
  The coding problem: code sequences of 20 amino acids using 4 nucleotides
    2 nucleotides can code only 4^2 = 16 amino acids
    Codon: sequence of 3 nucleotides (4^3 = 64 combinations); encodes an amino acid
  Translation: translate mRNA codons to amino acids
  Start/Stop codons define an open reading frame (ORF)
  Translation requires reading/identifying codons and forming a respective protein sequence

The Genetic Code
  First                    Second base
  base    U           C           A            G
  U       UUU Phe     UCU Ser     UAU Tyr      UGU Cys
          UUC Phe     UCC Ser     UAC Tyr      UGC Cys
          UUA Leu     UCA Ser     UAA Stop     UGA Stop
          UUG Leu     UCG Ser     UAG Stop     UGG Trp
  C       CUU Leu     CCU Pro     CAU His      CGU Arg
          CUC Leu     CCC Pro     CAC His      CGC Arg
          CUA Leu     CCA Pro     CAA Gln      CGA Arg
          CUG Leu     CCG Pro     CAG Gln      CGG Arg
  A       AUU Ile     ACU Thr     AAU Asn      AGU Ser
          AUC Ile     ACC Thr     AAC Asn      AGC Ser
          AUA Ile     ACA Thr     AAA Lys      AGA Arg
          AUG Met     ACG Thr     AAG Lys      AGG Arg
  G       GUU Val     GCU Ala     GAU Asp      GGU Gly
          GUC Val     GCC Ala     GAC Asp      GGC Gly
          GUA Val     GCA Ala     GAA Glu      GGA Gly
          GUG Val     GCG Ala     GAG Glu      GGG Gly
  Abbreviations: Phe Phenylalanine, Leu Leucine, Ser Serine, Tyr Tyrosine, Cys Cysteine,
  Trp Tryptophan, Pro Proline, His Histidine, Gln Glutamine, Arg Arginine, Ile Isoleucine,
  Met Methionine, Thr Threonine, Asn Asparagine, Lys Lysine, Val Valine, Ala Alanine,
  Asp Aspartate, Glu Glutamate, Gly Glycine

tRNA Provides Translation Units
  Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
  It translates GCU to Alanine
  http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html

Translation Basics
  Initiation
    Ribosome binds to mRNA; moves in the 5’->3’ direction until it finds the Start codon AUG
  Elongation
    Ribosome recruits tRNA to match the next codon
    The tRNA’s AA is joined to the growing protein by a peptide bond
    Ribosome releases tRNA and moves to the next codon
  Termination
    Elongation continues until a Stop codon is reached
    Release factor releases the polypeptide from the ribosome
  http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html

Animation: Translation of RNA into proteins

Proteins Are Sequences of Amino Acids
  Proteins are constructed through peptide bonds
  Proteins are folded into complex conformations
  Proteins perform functions by binding
    Transcription factors and polymerase bind to DNA
    Enzymes bind to molecules to accelerate their reactions
    Globins bind to oxygen to transport it
    Antibodies bind to pathogens

Example: Hemoglobin

Sickle-Cell Anemia: A Single Nucleotide Change
  [Figure: codon 6 in β-globin; sickle structure]

Evolution of β-Globin
  (α-globin cluster is coded by chromosome 16)

The Evolution of α-Globin Across Species

Protein Structures

Protein Structure Is Of Central Importance
  Structure is found through complex experiments: X-ray crystallography (diffraction); NMR
  The holy grail: compute structure from sequence
    Ab initio: compute structure directly from sequence
    Homology techniques: use similarity to known proteins
  Structure is conserved across wide variations
    Small number of fold families (α-helix, β-sheets…)
    There are rules (e.g., hydrophobic AA are packed inside)
  Nature folds proteins very fast
    So why is it so difficult to predict structure?

SwissProt vs. PDB Statistics
  PDB ~30k structures

Proteins Interact Via Active Sites
  Protein interactions are defined by active sites
    E.g., antibody with pathogen
    E.g., drug design
  Proteins use geometry: ligands latch into holes
  Proteins use physics: electrical fields
  How can protein-protein interactions be computed?
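
Returning to the translation steps (initiation at AUG, elongation codon by codon, termination at a Stop codon) and the sickle-cell slide, the following Python sketch shows how a single nucleotide change alters the protein sequence. The names `CODON_TABLE` and `translate` are hypothetical helpers, amino acids use one-letter codes, and the mRNA fragment is an illustrative sequence rather than a verified β-globin transcript. The GAG -> GUG (Glu -> Val) substitution is the textbook sickle-cell change; the slide itself only names codon 6.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes ('*' = Stop),
# listed in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}  # 4**3 = 64 codons are enough to cover 20 amino acids plus Stop


def translate(mrna: str) -> str:
    """Translate the first open reading frame found in an mRNA string."""
    start = mrna.find("AUG")  # initiation: scan 5'->3' for the Start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: Stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative fragment (assumption, not a verified transcript).
normal_codons = ["AUG", "GUG", "CAC", "CUG", "ACU", "CCU", "GAG", "GAG", "AAG", "UAA"]
mutant_codons = list(normal_codons)
mutant_codons[6] = "GUG"  # single nucleotide change A -> U in codon 6

normal_mrna = "GG" + "".join(normal_codons) + "GC"
mutant_mrna = "GG" + "".join(mutant_codons) + "GC"

print(translate(normal_mrna))  # MVHLTPEEK
print(translate(mutant_mrna))  # MVHLTPVEK
```

The two outputs differ in exactly one residue (E, glutamate, becomes V, valine), which is the kind of change the sickle-cell slide depicts.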

Sample Bioinformatics Challenges
  Analyzing protein sequence similarity
    Evolutionary conservation/changes
  Computing structure from sequences
  Analyzing structure homologies
  Analyzing protein-to-protein interactions
  Inferring function from structure

The Cell Cycle

Cells Operate In Cycles
  G0 phase: cell is at rest
  G1 phase (4 hrs): cell either progresses into synthesis or leaves the cell cycle to differentiate
  S phase (10 hrs): DNA synthesis
    Checkpoint determines integrity of DNA
  G2 phase (4 hrs): cell prepares for mitosis
    Checkpoint determines integrity of DNA
    DNA is repaired or cell dies (apoptosis)
  Mitosis (2 hrs): chromosomes are separated; cell divides

The Cell Cycle is Regulated
  Transition among phases is controlled by a regulatory network
  Checkpoints are used to assure quality

Evolution

Optimizing Functionality
  DNA is substantially conserved through evolution
  Evolution = mutation + selection
    Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
    Selection = optimize fitness of species
  Examples
    Metabolic nets learn to optimize energy budget (Alon 05)
    Functional similarity, sequence similarity
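
Because the slides pair sequence similarity with functional similarity and name SNPs as one source of mutation, a minimal Python sketch of comparing two equal-length sequences position by position may help. The function names `hamming_distance` and `percent_identity` and the variant sequence are hypothetical; practical similarity searches rely on alignment methods that also handle insertions and deletions, which this sketch does not attempt.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not a:
        raise ValueError("sequences must be non-empty")
    return 100.0 * (len(a) - hamming_distance(a, b)) / len(a)


reference = "ACTTACGC"  # example strand shown earlier in the slides
variant = "ACTTGCGC"    # hypothetical variant with one substitution (A -> G)

print(hamming_distance(reference, variant))  # 1
print(percent_identity(reference, variant))  # 87.5
```

A single substitution in eight positions leaves 7 of 8 positions identical, hence 87.5% identity.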