Grundlagen der Bioinformatik, SS’08, D. Huson, April 21, 2008
2 Introduction to Molecular Biology
We will start with a very short review of the basics of molecular biology, including a summary of
DNA, RNA, genes, chromosomes, proteins, replication, transcription and translation.
Each subject will be complemented by a typical bioinformatics problem that we will study during
this course.
2.1 Genetic Information
The basic laws of inheritance were discovered by Gregor Mendel and published in 1866. He introduced
the concept of what is now called a gene: the basic unit responsible for passing on characteristics to
the next generation. About 75 years later the biological role of DNA (deoxyribonucleic acid) was
elucidated by Oswald Avery and colleagues, who showed that DNA is the carrier of genetic information.
In 1953 James Watson and Francis Crick deciphered the structure of DNA, which is the carrier of
genetic information in all living organisms.
[Figure: portraits of Gregor Mendel and of J. Watson & F. Crick]
2.1.1 DNA
DNA is a linear molecule that is made from 4 different basic units, called nucleotides. Each contains
a phosphate, a sugar and one of the four bases: adenine, guanine, cytosine and thymine (A, G, C and
T).
The structure of DNA is a double helix. Each of the two strands is a nucleotide polymer, chained
together by phosphodiester bonds. The two strands are held together by hydrogen bonds. These bonds
are formed by pairs of bases. Each base pair consists of a purine (A or G) and a pyrimidine (C or T).
The base pairing rules are: G pairs (preferably) with C, and A (preferably) with T.
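The base pairing rules can be sketched in a few lines of Python. This is only an illustration: the function names `complement` and `reverse_complement` are chosen here and do not come from the lecture, and the code assumes the strand contains only the letters A, C, G and T.

```python
# Watson-Crick pairing: A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))
print(reverse_complement("ATGC"))
```

The reversal is needed because the two strands of the helix run in opposite directions, so the partner strand is conventionally read from its other end.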
Grundlagen der Bioinformatik, SS’08, D. Huson (this part by K. Nieselt) April 21, 2008
2.1.2 Replication of DNA
DNA encodes proteins, but it is also passed on to the next generation. During the process of
DNA replication the two strands of the DNA are separated and each strand serves as a template for
the generation of a new strand, using the complementarity of bases to duplicate genetic information.
Replication is performed by an enzyme called DNA polymerase.
2.1.3 Mutations
Errors can occur during the replication process. In particular, mutations are local changes in the
primary sequence of the DNA. These may be
• substitutions, where one base is exchanged for another base, or
• insertions and/or deletions, where one or more bases are either inserted or deleted.
Larger-scale errors include the rearrangement of whole segments along a chromosome or an exchange
of segments between two chromosomes.
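The local changes listed above (substitutions, insertions and deletions) can be illustrated on a sequence stored as a Python string. This is a minimal sketch; the helper names `substitute`, `insert` and `delete` and the 0-based positions are assumptions made for this example.

```python
def substitute(seq, pos, base):
    """Exchange the base at 0-based position pos for another base."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq, pos, bases):
    """Insert one or more bases in front of position pos."""
    return seq[:pos] + bases + seq[pos:]

def delete(seq, pos, length=1):
    """Delete length bases starting at position pos."""
    return seq[:pos] + seq[pos + length:]

original = "ACGT"
print(substitute(original, 1, "T"))
print(insert(original, 2, "AA"))
print(delete(original, 1, 2))
```

Note that a substitution keeps the sequence length unchanged, while insertions and deletions shift all downstream positions. This is why comparing mutated sequences requires an alignment rather than a position-by-position comparison.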
Mutations are the source of phenotypic variation on which natural selection acts, leading to better
adapted species. Mutations may also lead to genetic diseases or cancer.
Rearrangements of segments are far less probable than single mutations. Depending on the organism,
substitution rates range between 10^-4 and 10^-9 per genome and replication round.
Studying the mutations and rearrangements in genomes results in a better understanding of evolutionary processes.
An example: the human and murine genomes have 85% sequence identity, on average. The largest
difference between the two genomes is the internal arrangement of DNA segments.
2.1.4 Similarity
A main paradigm in molecular biology is that similar gene or protein sequences imply similar functions. What does similarity mean? For a comparison of nucleotide sequences (or protein sequences)
we need to “align” them:
The Alignment Problem: Given two nucleotide (protein) sequences, find the alignment whose
“similarity score” is optimized. Generalize this problem to several sequences (multiple alignment
problem).
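One classical way to solve the pairwise version of this problem is dynamic programming in the style of Needleman and Wunsch. The sketch below is an illustration only: the scoring values (match = 1, mismatch = -1, gap = -2) and the function name are assumptions made for this example, and the text above does not prescribe a particular scoring scheme.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))

best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The running time and memory are proportional to the product of the two sequence lengths. When several alignments share the optimal score, the traceback returns only one of them.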
An example: 4 tRNA sequences with the anticodon TTG:
2.2 Phylogeny
We assume that all living organisms on Earth share a common origin. Thus all animals, plants and
bacteria are (more or less) related. Phylogenetic studies aim at the reconstruction of genealogies of
organisms as well as the timing of speciation events. Under simple models of evolution, evolutionary
relationships are considered to be hierarchical and phylogenetic trees are used to represent them.
The Phylogenetic Tree Reconstruction Problem: Given a multiple alignment of sequences,
compute a phylogenetic tree that represents the evolution of the sequences.
How to do this, how to compare results?
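As one possible answer to the first question, the following sketch builds a tree with UPGMA, a simple distance-based clustering method. It is not necessarily the method used later in the course. The distance measure (fraction of differing alignment columns), the function names and the nested-tuple tree representation are all assumptions made for this example.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of alignment columns at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(alignment):
    """Cluster aligned sequences (dict name -> sequence) into a nested-tuple tree."""
    names = list(alignment)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> number of leaves
    dist = {(i, j): p_distance(alignment[names[i]], alignment[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average keeps distances equal to the mean over leaf pairs.
            dist[(k, new)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return next(iter(trees.values()))

toy_alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTTCAA", "D": "TCGTTCAA"}
print(upgma(toy_alignment))
```

UPGMA implicitly assumes that all lineages evolve at the same rate, which is often violated in real data. Other reconstruction methods make different assumptions, which is one reason why comparing results is itself a problem.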
2.3 Gene structure
The organization and structure of genes in the DNA differ between prokaryotes and eukaryotes.
In prokaryotic DNA there are no introns:
In eukaryotic DNA each gene is transcribed from its own start site. The coding regions (exons) are
often separated by noncoding regions (introns):
[Figure: eukaryotic gene structure. From 5' to 3': promoter (TATA), 5' UTR, start site (ATG), initial exon, intron (donor site GT, acceptor site AG), internal exon(s), intron (donor site GT, acceptor site AG), terminal exon, stop site (TAA, TAG or TGA), 3' UTR, poly-A signal (AAATAAAA).]
The Gene Finding Problem: Given a DNA sequence, determine the genes present in the sequence
and determine their structures.
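For the intron-free (prokaryotic) case, a very naive starting point is to scan for open reading frames: stretches that begin with the start codon ATG and run, codon by codon, to the first stop codon (TAA, TAG or TGA). The sketch below is a simplification under stated assumptions: it searches only the given strand, ignores promoters and splice sites, and the function name `find_orfs` and the `min_codons` threshold are chosen for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the given strand.

    start is the index of the A in ATG; end is one past the stop codon,
    so dna[start:end] is the ORF including its stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three possible reading frames
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                     # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None                    # look for the next ORF in this frame
    return orfs

sequence = "CCATGAAATAGGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

Real gene finders also search the reverse strand and use statistical models of coding regions. For eukaryotic genes they additionally have to model exon and intron boundaries.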
2.4 Central Dogma of Biology
Central Dogma of Biology: DNA =⇒ mRNA =⇒ protein
2.4.1 Translation
The process of turning the information in DNA into a protein consists of 2 phases:
• First the open reading frame of the DNA is transcribed into a messenger RNA (mRNA). The
mRNA is synthesized from one of the two strands of the double-stranded DNA helix. The
transcription reaction is performed by an enzyme called RNA polymerase.
• Then, the mRNA leaves the nucleus and moves to a ribosome, which synthesizes the protein
by reading the mRNA and using tRNAs to obtain the correct amino acids, which
are attached to the growing polypeptide chain.
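The first phase can be sketched as a string operation. The mRNA has the same sequence as the coding strand with U in place of T, and it is complementary to the template strand that RNA polymerase reads. The function names below are assumptions made for this example, and both strands are assumed to be written in 5'->3' direction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def transcribe(coding_strand):
    """mRNA for a coding strand given 5'->3': same sequence, with U instead of T."""
    return coding_strand.upper().replace("T", "U")

def transcribe_from_template(template_strand):
    """mRNA for a template strand given 5'->3': reverse complement, then T -> U."""
    coding = template_strand.upper().translate(COMPLEMENT)[::-1]
    return transcribe(coding)

print(transcribe("ATGGCC"))
print(transcribe_from_template("GGCCAT"))
```

Both calls describe the same double-stranded region, once from each strand, so they yield the same mRNA.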
2.4.2 The Genetic Code
There are two types of genes:
• Non-coding genes encode RNA sequences that are used directly in the cell, for example miRNAs,
which are used to regulate gene expression.
• Coding genes code for proteins.
The genetic code is a mapping that specifies how the genetic information of the DNA and/or RNA is
translated into a protein sequence: three consecutive bases, known as a codon, uniquely determine an
amino acid.
There are 4^3 = 64 different codons and only 20 natural amino acids, thus the mapping is many-to-one.
The stop codons are special, as they signal the end of translation. Because of the redundancy of the
genetic code we distinguish between synonymous mutations that do not result in a different amino
acid and non-synonymous mutations that do result in a different amino acid.
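The mapping can be written down as a lookup table. The sketch below encodes the standard genetic code as a 64-letter string of one-letter amino acid codes (with `*` for stop), ordered with the bases T, C, A, G at each codon position. The names `CODON_TABLE`, `translate` and `is_synonymous` are assumptions made for this example.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for codons TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(orf):
    """Translate an open reading frame (DNA or mRNA) up to the first stop codon."""
    orf = orf.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[pos:pos + 3]]
        if amino_acid == "*":       # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or both are stop codons)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]

print(len(CODON_TABLE))
print(translate("ATGGCCTAA"))
print(is_synonymous("CTT", "CTC"), is_synonymous("GAT", "GAA"))
```

In the example, CTT and CTC both encode Leu, so a mutation from one to the other is synonymous, whereas GAT (Asp) to GAA (Glu) is non-synonymous.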
The genetic code (position 1 selects the row block, position 2 the column, position 3 the row within the block):

1                  2                                       3
       T           C           A           G
T      TTT Phe     TCT Ser     TAT Tyr     TGT Cys         T
       TTC Phe     TCC Ser     TAC Tyr     TGC Cys         C
       TTA Leu     TCA Ser     TAA Stop    TGA Stop        A
       TTG Leu     TCG Ser     TAG Stop    TGG Trp         G
C      CTT Leu     CCT Pro     CAT His     CGT Arg         T
       CTC Leu     CCC Pro     CAC His     CGC Arg         C
       CTA Leu     CCA Pro     CAA Gln     CGA Arg         A
       CTG Leu     CCG Pro     CAG Gln     CGG Arg         G
A      ATT Ile     ACT Thr     AAT Asn     AGT Ser         T
       ATC Ile     ACC Thr     AAC Asn     AGC Ser         C
       ATA Ile     ACA Thr     AAA Lys     AGA Arg         A
       ATG Met     ACG Thr     AAG Lys     AGG Arg         G
G      GTT Val     GCT Ala     GAT Asp     GGT Gly         T
       GTC Val     GCC Ala     GAC Asp     GGC Gly         C
       GTA Val     GCA Ala     GAA Glu     GGA Gly         A
       GTG Val     GCG Ala     GAG Glu     GGG Gly         G

2.5 RNA
The synthesis of proteins from mRNA requires certain RNA molecules, such as tRNAs. Many other
RNA molecules are essential for the different functions in the cell. In contrast to DNA, RNA molecules
are single-stranded. In addition, the sugar in RNA is ribose, not deoxyribose. Also, thymine is replaced
by uracil in RNA. Because it is single-stranded, RNA can fold over and base-pair with itself.
During the folding process biophysical laws play an important role.
2.5.1 Secondary structure of a tRNA
Secondary Structure of RNA Problem: For a given RNA sequence, determine the secondary
structure that it will fold into.
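A strongly simplified version of this problem asks only for the largest number of base pairs that a sequence can form without crossing pairs, which can be computed by dynamic programming in the style of Nussinov. The sketch below is an illustration under stated assumptions: it ignores the biophysical energies mentioned above, allows A-U, G-C and G-U pairs, requires at least `min_loop` unpaired bases inside a hairpin, and uses function names chosen for this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(rna, min_loop=3):
    """Maximum number of non-crossing base pairs in an RNA sequence."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):        # case 2: base j pairs with base k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCC"))
```

For the example sequence the optimum is a single hairpin: the three G bases at the 5' end pair with U, C and C at the 3' end, enclosing the loop AAA. Practical folding programs replace the pair count with an energy model, but keep the same recursive decomposition.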
2.6 Proteins
Proteins are organic molecules that are responsible for most chemical reactions in the cell. A protein
is a polypeptide, a macromolecule consisting of amino acids that are chained together in a linear
fashion. Proteins have a complex structure on four different levels.
The amino acid sequence of a protein is the primary structure. Different regions of the sequence form
local regular secondary structure elements, such as α helices and β sheets. The tertiary structure results
from the folding of these elements into a three-dimensional structure. The quaternary structure
arises when multiple proteins form a complex.
For proteins used in organisms, the 3D structure is largely determined by the primary sequence of
the amino acids. Since the function of a protein is determined by its structure, a prediction of the
3D structure of a protein is very important for the understanding of its role. The 3D structure of
a protein can be determined experimentally with the help of X-ray crystallography or NMR. This is
a costly, lengthy and often unsuccessful process.
A “Holy Grail” of Bioinformatics: Given an amino acid sequence, predict the three-dimensional
structure of the corresponding protein.