Strings in molecular biology
Algorithms for Computational Biology
Zsuzsanna Lipták
Masters in Molecular and Medical Biotechnology
a.a. 2015/16, fall term

Strings and Sequences in Biology
Strings (also called sequences) are finite sequences of characters over an alphabet Σ.
• DNA (characters: nucleotides)
Σ = {A,C,G,T}
• proteins (characters: amino acids)
Σ = {A,C,D,E,F,...,W,Y}
• RNA (characters: nucleotides)
Σ = {A,C,G,U}
• many other problems in molecular biology can be modelled by strings (e.g. gene order, SNPs, haplotypes, ...)
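The definition "a string over an alphabet Σ" can be made concrete with a small membership check. This is a minimal Python sketch; the function name `is_over_alphabet` and the constants `DNA`, `RNA`, `PROTEIN` are illustrative choices, not part of the lecture.

```python
# Alphabets from the slide; PROTEIN holds the 20 one-letter amino acid codes.
DNA = frozenset("ACGT")
RNA = frozenset("ACGU")
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_over_alphabet(s: str, sigma: frozenset) -> bool:
    """Return True if every character of s belongs to the alphabet sigma."""
    return all(c in sigma for c in s)
```

For example, `is_over_alphabet("ACGU", RNA)` holds while `is_over_alphabet("ACGU", DNA)` does not, since U is not in the DNA alphabet. By this definition the empty string is a string over every alphabet.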
DNA: nucleotides
5’ ...AACAGTACCATGCTAGGTCAATCGA...3’
3’ ...TTGTCATGGTACGATCCAGTTAGCT...5’
• 4 characters: A, C, G, T: adenine, cytosine, guanine, thymine (bases, nucleotides)
• orientation (read from 5’ to 3’ end)
• length measured in bp (base pairs)
• double-stranded, the two strands are antiparallel
• A - T and C - G complementary (Watson-Crick pairs)
• reverse complement: (ACCTG)^rc = CAGGT

The central dogma of molecular biology
[figure; source: Wonderwikikids.com]
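The reverse complement on the slide (complement every base, then reverse, so that the result is the other strand read 5’ to 3’) can be sketched in a few lines of Python using the standard `str.maketrans` and `str.translate`. This assumes uppercase input over {A,C,G,T}; the name `reverse_complement` is a hypothetical helper, and characters outside the alphabet are passed through unchanged.

```python
# Watson-Crick complement of each DNA base (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Complement each base, then reverse: the opposite strand read 5'->3'."""
    return s.translate(COMPLEMENT)[::-1]
```

Applied to the top strand shown above, it yields the bottom strand read from its own 5’ end; applying it twice returns the original string.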
DNA: nucleotides (continued)
• during transcription, one strand is copied into mRNA (messenger RNA), except all T’s are replaced by U’s
• the strand which is identical to the mRNA is called the coding strand
• the other strand (the one which is used for the transcription) is called the template strand
• Both strands can be used as coding strands (for different genes).
• Some DNA strings are circular: bacterial DNA, mitochondrial DNA.

RNA: nucleotides
• like DNA, except:
• 4 characters: A, C, U, G: adenine, cytosine, uracil, guanine (U instead of T)
• RNA is single-stranded
• builds double-stranded hybrids with DNA
• RNA folds upon itself (makes complex 3-dim structures), using the Watson-Crick pairs and other bonds (RNA folding)
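The coding/template distinction can be illustrated with a minimal sketch. It assumes both strands are given as uppercase strings read 5’ to 3’ and that the template strand is exactly the reverse complement of the coding strand; the helper names are hypothetical, not from the lecture.

```python
# Hypothetical helpers illustrating coding strand vs. template strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe_from_coding(coding: str) -> str:
    """The mRNA equals the coding strand, with every T replaced by U."""
    return coding.replace("T", "U")


def transcribe_from_template(template: str) -> str:
    """Template given 5'->3': the mRNA is its reverse complement, with T -> U."""
    coding = template.translate(COMPLEMENT)[::-1]
    return transcribe_from_coding(coding)
```

Both routes produce the same mRNA when the two inputs are the two strands of the same DNA segment, e.g. coding strand `ATGCTA` and template strand `TAGCAT`.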
Protein: Amino acids
There are 20 common amino acids (aa’s); two systems of abbreviations are used: 3-letter-code and 1-letter-code. We usually use the 1-letter-code.

amino acid      3-letter  1-letter
alanine         Ala       A
arginine        Arg       R
asparagine      Asn       N
aspartic acid   Asp       D
cysteine        Cys       C
glutamine       Gln       Q
glutamic acid   Glu       E
glycine         Gly       G
histidine       His       H
isoleucine      Ile       I
leucine         Leu       L
lysine          Lys       K
methionine      Met       M
phenylalanine   Phe       F
proline         Pro       P
serine          Ser       S
threonine       Thr       T
tryptophan      Trp       W
tyrosine        Tyr       Y
valine          Val       V

The genetic code
[figure: the standard genetic code; source: Wikimedia commons]
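The table above maps directly onto a lookup dictionary. This sketch converts a 3-letter-code peptide into the 1-letter code; it assumes the input is written with hyphen separators such as `Met-Ala-Ser`, and `THREE_TO_ONE` and `to_one_letter` are hypothetical names.

```python
# 3-letter -> 1-letter codes for the 20 common amino acids (from the table above).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Ser' to 'MAS'; raises KeyError on an unknown code."""
    return "".join(THREE_TO_ONE[code] for code in peptide.split(sep))
```

Raising `KeyError` on unknown codes is a deliberate choice here, so that typos are not silently dropped.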
The genetic code
• standard genetic code (some organisms use a different one)
• 3 different reading frames for translation: The DNA sequence
5’ ...TATTCGAATCGGC...3’
can be translated in 3 different ways, leading to different aa sequences.
• degeneracy of the genetic code: 64 codons but only 20 aa’s plus stop codon
• silent mutations: if the third position mutates, this often does not alter the aa

Exercise:
Translate this DNA sequence according to the 3 different reading frames:
5’ ...TATTCGAATCGGC...3’
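Translation in the three reading frames, the degeneracy of the code, and silent mutations can all be explored with one small sketch. It assumes the standard genetic code (NCBI translation table 1, encoded compactly with bases ordered T, C, A, G), reads only the given strand 5’ to 3’, and ignores a trailing partial codon; the names `CODON_TABLE`, `translate`, `reading_frames`, and `is_silent` are hypothetical. Python 3.9+ is assumed for the `list[str]` annotation.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in the
# order T, C, A, G, first position varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reading_frames(dna: str) -> list[str]:
    """Translations of the given strand in the 3 reading frames (offsets 0, 1, 2)."""
    return [translate(dna[offset:]) for offset in range(3)]


def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if replacing codon[position] (0-based) by new_base keeps the same aa."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

Calling `reading_frames("TATTCGAATCGGC")` on the 13 bases shown in the exercise can be used to check a hand-made solution. `len(set(CODON_TABLE.values()))` gives 21 (20 aa’s plus the stop symbol) for 64 codons, which is the degeneracy; `is_silent("GGC", 2, "A")` is a silent third-position change, since GGC and GGA both code for glycine.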