Grundlagen der Bioinformatik, SS’08, D. Huson (this part by K. Nieselt), April 21, 2008

2 Introduction to Molecular Biology

We will start with a very short review of the basics of molecular biology, including a summary of DNA, RNA, genes, chromosomes, proteins, replication, transcription and translation. Each subject will be complemented by a typical bioinformatics problem that we will study during this course.

2.1 Genetic Information

The basic laws of inheritance were published by Gregor Mendel in 1866. He introduced the concept of what is now called a gene: the basic unit responsible for passing on characteristics to the next generation. About 75 years later, the biological role of DNA (deoxyribonucleic acid) as the carrier of genetic information was established by Oswald Avery, Colin MacLeod and Maclyn McCarty. In 1953 James Watson and Francis Crick deciphered the structure of DNA, which explained how it can serve as the carrier of genetic information in all living organisms.

[Figure: Gregor Mendel; J. Watson & F. Crick]

2.1.1 DNA

DNA is a linear molecule built from 4 different basic units called nucleotides. Each nucleotide contains a phosphate, a sugar and one of the four bases: adenine, guanine, cytosine and thymine (A, G, C and T). The structure of DNA is a double helix. Each of its two strands is a nucleotide polymer, chained together by phosphodiester bonds. The two strands are held together by hydrogen bonds. These bonds are formed between pairs of bases; each base pair consists of a purine (A or G) and a pyrimidine (C or T). The base pairing rules are: G pairs (preferably) with C, and A (preferably) with T.

2.1.2 Replication of DNA

DNA is translated into proteins, but it is also passed on to the next generation. During DNA replication the two strands of the DNA are separated and each strand serves as a template for the synthesis of a new strand, using the complementarity of bases to duplicate the genetic information. Replication is performed by an enzyme called DNA polymerase.
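The template mechanism described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the strict pairing rules A-T and G-C, ignores replication errors, and relies on the fact that the two strands of a duplex run antiparallel, so the new strand is written in reverse order. The names `COMPLEMENT`, `synthesize_complement` and `replicate` are hypothetical helpers, not part of the course material.

```python
# Strict Watson-Crick pairing rules (A-T, G-C), as in Section 2.1.1.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template: str) -> str:
    """Return the strand built against `template`, written 5'->3'.

    The two strands of a duplex run antiparallel, so the complementary
    bases are emitted in reverse order (the "reverse complement").
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def replicate(strand: str):
    """Model one error-free replication round.

    The partner strand is derived from `strand`; then each of the two
    old strands serves as the template for one new strand.
    """
    partner = synthesize_complement(strand)
    daughter_1 = (strand, synthesize_complement(strand))    # old strand + new copy of partner
    daughter_2 = (partner, synthesize_complement(partner))  # old partner + new copy of strand
    return daughter_1, daughter_2
```

Calling `replicate("ATGC")` yields two duplexes that each contain one original and one newly built strand; applying `synthesize_complement` twice returns the input sequence, which is the sense in which complementarity duplicates the information.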
2.1.3 Mutations

Errors can occur during the replication process. In particular, mutations are local changes in the primary sequence of the DNA. These may be
• substitutions, where one base is replaced by another base, or
• insertions and/or deletions, where one or more bases are inserted or deleted.

Further errors are rearrangements of whole segments along a chromosome or exchanges of segments between two chromosomes. Mutations are the source of the phenotypic variation on which natural selection acts, leading to better adapted species. Mutations may also lead to genetic diseases or cancer. Depending on the organism, substitution rates range between 10^-4 and 10^-9 per genome and replication round; rearrangements of segments are far less frequent than single mutations. Studying mutations and rearrangements in genomes leads to a better understanding of evolutionary processes. An example: the human and mouse genomes have 85% sequence identity on average. The largest difference between the two genomes is the internal arrangement of DNA segments.

2.1.4 Similarity

A main paradigm in molecular biology is that similar gene or protein sequences imply similar functions. What does similarity mean? To compare nucleotide sequences (or protein sequences) we need to “align” them:

The Alignment Problem: Given two nucleotide (protein) sequences, find the alignment whose “similarity score” is optimal. Generalize this problem to several sequences (multiple alignment problem).

An example:
[Figure: 4 tRNA sequences with the anticodon TTG]

2.2 Phylogeny

We assume that all living organisms on Earth share a common origin. Thus all animals, plants and bacteria are (more or less) related. Phylogenetic studies aim at the reconstruction of genealogies of organisms as well as the timing of speciation events.
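Phylogenetic comparisons start from aligned sequences, so it may help to make the Alignment Problem of Section 2.1.4 concrete first. The sketch below computes the optimal global alignment score of two sequences by dynamic programming (in the style of Needleman and Wunsch). The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment_score` are illustrative assumptions, not the scheme used in this course.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    After processing i characters of a, prev[j] holds the best score
    for aligning a[:i] with b[:j]. Only two rows of the table are kept.
    """
    # Row 0: aligning the empty prefix of a against b[:j] needs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # a[i-1] against a gap
                curr[j - 1] + gap,           # b[j-1] against a gap
            ))
        prev = curr
    return prev[-1]
```

With these assumed scores, `global_alignment_score("ACGT", "AGT")` corresponds to the alignment ACGT / A-GT: three matches and one gap. Recovering the alignment itself would additionally require storing the full table and tracing back through it.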
Under simple models of evolution, evolutionary relationships are considered to be hierarchical, and phylogenetic trees are used to represent them.

The Phylogenetic Tree Reconstruction Problem: Given a multiple alignment of sequences, compute a phylogenetic tree that represents the evolution of the sequences. How can this be done, and how can results be compared?

2.3 Gene structure

The organization and structure of genes in the DNA differs between prokaryotes and eukaryotes. In prokaryotic DNA there are no introns:

[Figure: prokaryotic gene structure]

In eukaryotic DNA each gene is transcribed from its own start site. The coding regions (exons) are often separated by noncoding regions (introns):

[Figure: eukaryotic gene structure. Labels: promoter (TATA), 5’ UTR, start site (ATG), initial exon, donor site (GT), intron, acceptor site (AG), internal exon(s), terminal exon, stop site (TAA, TAG, TGA), 3’ UTR, poly-A (AAATAAAA)]

The Gene Finding Problem: Given a DNA sequence, determine the genes present in the sequence and determine their structures.

2.4 Central Dogma of Biology

Central Dogma of Biology: DNA ⇒ mRNA ⇒ protein

2.4.1 Translation

The process of translating DNA into a protein consists of 2 phases:
• First, the open reading frame of the DNA is transcribed into a messenger RNA (mRNA). The mRNA is synthesized from one of the two strands of the double-stranded DNA helix. The transcription reaction is performed by an enzyme called RNA polymerase.
• Then the mRNA leaves the nucleus (in eukaryotes) and moves to a ribosome, which synthesizes the protein by reading the mRNA and using tRNAs to obtain the correct amino acids, which are attached to the growing polypeptide chain.

2.4.2 The Genetic Code

There are two types of genes:
• Non-coding genes encode RNA sequences that are used directly in the cell, for example miRNAs, which are used to regulate gene expression.
• Coding genes code for proteins.

The genetic code is a mapping that specifies how the genetic information of the DNA and/or RNA is translated into a protein sequence: three consecutive bases, known as a codon, uniquely determine an amino acid. There are 4^3 = 64 different codons and only 20 natural amino acids, thus the mapping is many-to-one. The stop codons are special, as they signal the end of translation. Because of the redundancy of the genetic code we distinguish between synonymous mutations, which do not result in a different amino acid, and non-synonymous mutations, which do.

The genetic code (rows: 1st base; columns: 2nd base; rightmost column: 3rd base):

1st   T          C          A          G          3rd
T     TTT Phe    TCT Ser    TAT Tyr    TGT Cys    T
      TTC Phe    TCC Ser    TAC Tyr    TGC Cys    C
      TTA Leu    TCA Ser    TAA Stop   TGA Stop   A
      TTG Leu    TCG Ser    TAG Stop   TGG Trp    G
C     CTT Leu    CCT Pro    CAT His    CGT Arg    T
      CTC Leu    CCC Pro    CAC His    CGC Arg    C
      CTA Leu    CCA Pro    CAA Gln    CGA Arg    A
      CTG Leu    CCG Pro    CAG Gln    CGG Arg    G
A     ATT Ile    ACT Thr    AAT Asn    AGT Ser    T
      ATC Ile    ACC Thr    AAC Asn    AGC Ser    C
      ATA Ile    ACA Thr    AAA Lys    AGA Arg    A
      ATG Met    ACG Thr    AAG Lys    AGG Arg    G
G     GTT Val    GCT Ala    GAT Asp    GGT Gly    T
      GTC Val    GCC Ala    GAC Asp    GGC Gly    C
      GTA Val    GCA Ala    GAA Glu    GGA Gly    A
      GTG Val    GCG Ala    GAG Glu    GGG Gly    G

2.5 RNA

The synthesis of proteins from mRNA requires certain RNA molecules, such as tRNAs. Many other RNA molecules are essential for different functions in the cell. In contrast to DNA, RNA molecules are single-stranded. In addition, the sugar in RNA is ribose, not deoxyribose. Also, thymine is replaced by uracil in RNA. Because it is single-stranded, RNA can fold over and base-pair with itself. Biophysical laws play an important role in the folding process.

2.5.1 Secondary structure of a tRNA

Secondary Structure of RNA Problem: For a given RNA sequence, determine the secondary structure that it will fold into.
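One classical simplification of this problem, not necessarily the approach taken later in the course, is to maximize the number of base pairs by dynamic programming (a Nussinov-style recursion). The sketch below assumes that only A-U, G-C and G-U pairs may form, that pairs do not cross, and that a hairpin loop needs at least 3 unpaired bases. The names `PAIRS`, `max_base_pairs` and `min_loop` are illustrative. Realistic folding methods minimize free energy rather than count pairs.

```python
# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of non-crossing base pairs in seq.

    dp[i][j] is the best pair count for the subsequence seq[i..j].
    Paired positions must enclose at least `min_loop` bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill the table by increasing distance between i and j.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, leaving a loop of >= min_loop
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the toy sequence `GGGAAACCC`, the three G-C pairs stack around the `AAA` loop, so the function returns 3. Reconstructing which positions pair would require a traceback through `dp`, omitted here for brevity.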
2.6 Proteins

Proteins are organic molecules that are responsible for most chemical reactions in the cell. A protein is a polypeptide, that is, a macromolecule consisting of amino acids that are chained together in a linear fashion. Proteins have a complex structure on four different levels. The amino acid sequence of a protein is its primary structure. Different regions of the sequence form local, regular secondary structure elements, such as α helices and β sheets. The tertiary structure results from the folding of these elements into a three-dimensional structure. The quaternary structure arises when multiple proteins form a complex.

For proteins used in organisms, the 3D structure is uniquely determined by the primary sequence of amino acids. Since the function of a protein is determined by its structure, predicting the 3D structure of a protein is very important for understanding its role. The 3D structure of a protein can be determined experimentally with the help of X-ray crystallography or NMR. This is a costly, lengthy and often unsuccessful process.

A “Holy Grail” of Bioinformatics: Given an amino acid sequence, predict the three-dimensional structure of the corresponding protein.