Slides #5B (Green)
BIOM5010 Intro to Molecular Biology
James Green
Systems and Computer Engineering
Carleton University

References/sources
*1: “Introduction To Molecular Biology” by Salwa Hassan Teama (M.D.), slideshare.net
D.O.E. Human Genome Program, http://www.ornl.gov/hgmis
BIOC3101 slides, Prof Bill Willmore (CU)
*3: BIOC3102 slides, Prof Bill Willmore (CU)
*2: “Molecular Biology for Computer Scientists” by Lawrence Hunter, in Artificial Intelligence & Molecular Biology
*4: http://bix.ucsd.edu/bioalgorithms/slides.php

Overview
• Central Dogma (DNA --> RNA --> Protein)
• DNA
• RNA
• Protein
Quick intro: http://www.genome.gov/Pages/EducationKit/video/qt/3D.mov

Central Dogma of Molecular Biology (*1)

DNA
• DNA as a molecule (bases)
• DNA sequencing
• Transcription
• Exon/Intron
• Gene finding
• Chromosomes (centromeres)
• Chromosomal structure and impact on expression
• Epigenomics (Hamilton-Stelco story)

The human genome
25K??
Courtesy of U.S. D.O.E. Human Genome Program, http://www.ornl.gov/hgmis

(Source: *1)

(Source: *1)
There are four different types of nucleotides found in DNA, differing only in the nitrogenous base: A is for adenine, G is for guanine, C is for cytosine, and T is for thymine. These bases are classified by chemical structure into two groups: adenine and guanine are double-ringed structures termed purines; thymine and cytosine are single-ringed structures termed pyrimidines. The bases pair in a specific way: adenine (A) with thymine (T) (two hydrogen bonds) and guanine (G) with cytosine (C) (three hydrogen bonds). Within double-stranded DNA, the number of thymines is therefore always equal to the number of adenines, and the number of cytosines is always equal to the number of guanines. In contrast to DNA, RNA is single-stranded; the pyrimidine base uracil (U) replaces thymine, and ribose sugar replaces deoxyribose.
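The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names (`reverse_complement`, `hydrogen_bonds`, `duplex_base_counts`, `to_rna`) and the example strand are assumptions made for this sketch, not part of the original slides.

```python
from collections import Counter

# Watson-Crick pairing: A-T (2 hydrogen bonds), G-C (3 hydrogen bonds)
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return "".join(PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand + reverse_complement(strand))


def to_rna(strand: str) -> str:
    """Write a DNA sequence in the RNA alphabet (uracil replaces thymine)."""
    return strand.replace("T", "U")


if __name__ == "__main__":
    strand = "GATTACA"  # hypothetical example sequence
    print(reverse_complement(strand))  # TGTAATC
    print(hydrogen_bonds(strand))      # 16
    counts = duplex_base_counts(strand)
    # In the duplex, #A == #T and #G == #C
    print(counts["A"] == counts["T"], counts["G"] == counts["C"])
    print(to_rna(strand))              # GAUUACA
```

Counting over both strands is what makes the A = T and G = C equalities hold; a single strand on its own need not satisfy them.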
DNA as a molecule (*1)

Genomic DNA Organization (*1)

Sequencing a genome
Sanger sequencing: http://www.youtube.com/view_play_list?p=F0701633C91835BF
• Now using next-generation sequencing
• Fragment DNA, then sequence individual fragments
• Re-assemble into contigs to get the full sequence

RNA
• Creation of mature mRNA
• Translation (genetic code)
• Other types of RNA

(Source: *3)

(Source: *3)

Courtesy of U.S. D.O.E. Human Genome Program, http://www.ornl.gov/hgmis

Splicing to produce mature mRNA (*4)

The Genetic Code (*1)
The purine and pyrimidine bases of the DNA molecule are the letters, or alphabet, of the genetic code. All information contained in DNA is represented by four letters: A, T, C, G. Three consecutive nucleotides (1st, 2nd and 3rd) form a triplet codon. There are 64 possible codons, and most amino acids have more than one possible codon. Out of the 64 possible 3-base codons, 61 specify amino acids; the other three are stop signals (UAG, UAA, or UGA). The sequence of codons in the mRNA defines the primary structure of the final protein.

(Source: *1) http://www.accessexcellence.org/RC/VL/GG/genetic.php

(Source: *1)
Series of codons in part of an mRNA molecule. Each codon consists of three nucleotides, representing a single amino acid.

Other RNAs
• RNA has structure
• Can be functional on its own (e.g. microRNA, ribozymes, aptamers)
• Hammerhead ribozyme

Protein
• Chain of amino acids
• Protein folding/structure
• PTMs (cleavage, localization, AA modifications e.g. hydroxylation, etc)
• Sequence evolution/MSA
• MS for identifying proteins in a mixture
• Protein interactions
• Important types of proteins

The Protein (*1)
Proteins are the basic building materials of a cell, made by the cell itself; they are the final product of most genes. Proteins are chain-like polymers of a few to many thousands of amino acids. Amino acids are represented by codons, which are 3-nucleotide RNA sequences. Amino acids are joined together by peptide bonds to form a polypeptide. Proteins can be composed of one or more polypeptide chains.
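As a rough illustration of the genetic code described above, the sketch below builds the standard 64-codon table and translates a short mRNA into amino acids. The helper names (`CODON_TABLE`, `translate`) and the example mRNA are assumptions for this sketch; the table is the standard genetic code, written compactly in U, C, A, G order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA, codon by codon, until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
    print(len(CODON_TABLE), stops)    # 64 ['UAA', 'UAG', 'UGA']
    print(translate("AUGGCCUGGUAA"))  # MAW
```

The table has 61 sense codons for only 20 amino acids, which is why most amino acids have more than one codon.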
Proteins have many functions: they provide structure that helps maintain cell integrity and shape (e.g. collagen in bone); serve as enzymes and hormones; bind and carry substances; and control the activities of genes.

Polypeptide chain
[Figure: polypeptide backbone running from the amino terminus towards the carboxyl terminus, showing side chains (R), α-carbons (Cα), and the dihedral angles φ, ψ, and ω (ω: 0° = cis, 180° = trans)]

The big picture of protein structure
Genomic DNA: …GTC CAG TCA ATA GCG GTC …
  Transcription & Translation (DNA --> Protein)
Amino acid sequence (primary protein structure): …C A W V Q S I A W S Y D R M A…
  Local protein folding
Secondary protein structure: …T T H H H H H T T E E E E…
  Protein folding continues…
Tertiary protein structure

Tertiary structure
Definition
• Locations of all atoms in the protein chain
Data source
• Can be resolved through experimental techniques
  • X-ray crystallography, NMR spectroscopy
  • Unreliable, costly, not always possible
• Computational methods can be applied sometimes
  • Comparative modeling, fold recognition, ab-initio predictions

Secondary structure
• Regions of local repeating regular structure
  • Helices: α, 3₁₀, π
  • β-strands, which form β-sheets
• 46% of residues form non-regular structure
  • Connecting chain between regular structures
  • Turns, bends, binding sites
Data source
• Can be derived from tertiary structure

Alpha-helices
• Corkscrew shape
• H-bonds between residue i and i+4
• Most compact structure
• Most abundant regular structure (32-38% of residues)
From S-Star.org lecture 6, http://www.s-star.org

Beta-sheets
From S-Star.org lecture 6, http://www.s-star.org

The importance of protein structure
• Proteins are involved in almost all biological processes
• Function is determined by structure
• Determining protein structure is of fundamental importance to biology
• Example: Current drugs target only ~500 of 30,000 proteins

Review... (*2)

Types of gene expression control in eukaryotes (*1)
• Transcriptional: prevent transcription, so that mRNA is not synthesized.
  • Regulatory regions, chromosome structure
• Posttranscriptional: control mRNA after it has been produced.
  • microRNAs
• Translational: prevent translation; involves protein factors needed for translation.
• Posttranslational: control the protein after it has been produced.
  • Many…

Genomic DNA Organization (*1)
Transcriptional Control

Microarrays
• Measure gene expression by quantifying mRNA levels for each gene
• Matrix of wells, each with probes inside
• Quantify using brightness of well (spot)
• Complementary DNA sequence “catches” mRNA
• Wash rest away
• DNA tagged with fluorescent beads
• Being replaced with next-generation sequencing

Protein evolution
• Mutations
• Phylogenetics
• Multiple sequence alignment
• ID conserved regions

DNA Mutation (*1)
Mutations include both gross alterations of chromosomes and more subtle alterations to specific gene sequences. Gross chromosomal aberrations include large deletions, additions, and translocations (reciprocal and nonreciprocal). A mutation in a gene's DNA sequence can alter the amino acid sequence of the protein encoded by the gene. Point mutations are the result of the substitution of a single base. Frame-shift mutations occur when the reading frame of the gene is shifted by the addition or deletion of one or more bases (any number that is not a multiple of three). Mutations can have harmful, beneficial, neutral, or uncertain effects on health and may be inherited as autosomal dominant, autosomal recessive, or X-linked traits. Mutations that cause serious disability early in life are usually rare because of their adverse effect on life expectancy and reproduction.
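The difference between a point mutation and a frame-shift mutation can be seen by translating a short coding sequence before and after each change. This is a minimal sketch under stated assumptions: the gene fragment and the helper names (`translate`, `substitute`, `delete`) are hypothetical, and the codon table is the standard genetic code written in the DNA alphabet.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, DNA alphabet; '*' marks a stop codon
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence; a trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(dna: str, pos: int, base: str) -> str:
    """Point mutation: replace the single base at index pos."""
    return dna[:pos] + base + dna[pos + 1:]


def delete(dna: str, pos: int) -> str:
    """Single-base deletion at index pos, which shifts the reading frame."""
    return dna[:pos] + dna[pos + 1:]


if __name__ == "__main__":
    gene = "ATGGCCTGGGTCCAG"  # hypothetical fragment: ATG GCC TGG GTC CAG
    print(translate(gene))                      # MAWVQ
    print(translate(substitute(gene, 4, "A")))  # MDWVQ (only one residue changes)
    print(translate(delete(gene, 3)))           # MPGS (every downstream codon changes)
```

The substitution changes a single codon (GCC to GAC), while the deletion regroups every downstream base into new codons.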
GREAT site about SNPs, personalized medicine: http://learn.genetics.utah.edu/content/health/pharma/snips/

Multiple sequence alignment

(Source: *2)

Mass Spectrometry
• Important proteomic tool
• Analytic technique
• Identifies proteins from sample

Mass Spectrometry
…WDQYTDFUEFAGDUDDALLVKLKLKLMNEFLQWKEQW DGHQW…
[Figure sequence: charged (+) ions from the sample; survey ion spectrum (abundance vs. m/z ratio) with peaks labelled 1, 2, 3; a selected ion with residues V, L, L, D, K, A; product ion spectrum (abundance vs. m/z ratio) identified as “LLVK”; return to the survey ion spectrum]

The Central Dogma
DNA --> RNA --> Proteins

Post-translational modifications
• Phosphorylation
• Glycosylation
• Ubiquitination
• Methylation
• Others!

Random neat stuff
• Protein-protein interactions
• Functional genomics
• GFP
• Prions and viroids
• PCR

Protein-protein interactions
• Proteins often physically interact to perform function
• Can detect complexes in vitro or in silico

PIPE

PIPE: Homo Sapiens Global Scan
• First ever “complete” human interactome!
  • 242M pairs
  • 170K PPIs
• Other methods can only examine ~25% of protein pairs
  • Computational complexity (PIPE <1s per pair)
  • Availability of input features (e.g. structure)
• Used HPCVL’s Victoria Falls cluster
  • 1168 Sun UltraSparc T2+ cores
  • Total runtime: three months
Homo Sapiens (Human)*
* Image from BrainMaps.org

PIPE: Seasonal Allergic Rhinitis (SAR)
Collaborative project with:
• Department of Pediatrics, Gothenburg University, Gothenburg, Sweden.
• The Centre for Individualized Medication, Linköping University, Linköping, Sweden.
• Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, Toronto, Canada.
“Hay fever”
• Study to find new biomarkers to identify SAR in patients.
• Results were supported by patient data.

Cross-Organism Predictions
• PIPE is capable of cross-species predictions
• Makes it possible to predict PPIs in a newly sequenced organism, something most methods cannot do.
• Can predict host-pathogen interactions (HIV, Zika, Hepatitis)

PIPE: Volvox/Chlamy/Gonium
• Chlamydomonas (C. reinhardtii): unicellular (undifferentiated cells).
• Goniaceae (G. pectorale): unicellular, but forms colonies.
• Volvocaceae (V. carteri): multicellular.
Collaborative project with:
• Bradley Olson (Olson Lab, Kansas State)
• Pierre Durand (Wits University, South Africa)
• Jonathan Featherston (Agricultural Research Council, South Africa)
• Richard E. Michod (University of Arizona)
Richard E. Michod, Evolution of individuality during the transition from unicellular to multicellular life, PNAS, 2007

In-Silico Protein Synthesizer (InSiPS)
• PIPE + Genetic Algorithms + IBM Blue Gene/Q
• Create novel proteins
  • Bind strongly with target protein
  • Don’t bind to any other proteins (side-effects)
UV Light Exposure (DNA damage)
Fitness: 0.465, Target: 0.718, Max off-target: 0.352

Functional genomics
• ~4500 gene deletion strains of yeast
• Apply treatment; observe effect on colony growth

PCR

Questions?