Download 5` 3` - UTSA CS

CS5263 Bioinformatics
Lecture 2: Introduction to molecular biology
Polymer    Monomer
DNA        Deoxyribonucleotides
RNA        Ribonucleotides
Protein    Amino acids
DNA
• DNA: forms the genetic material of all
living organisms
• A string made from alphabet {A, C, G, T}
– e.g. ACAGAACGTAGTGCCGTGAGCG
• Each letter is called a base
– A deoxyribonucleotide
[Figure: a DNA strand drawn from 5’ to 3’, written 5’-AGCGACTG-3’ or simply AGCGACTG. Each nucleotide consists of a phosphate, a sugar (carbons numbered 1 to 5), and a base.]
Many biological processes go from 5’ to 3’, e.g. DNA replication, transcription, etc.
Base-pair: A=T, G=C
Forward (+) strand:  5’-AGCGACTG-3’
Backward (-) strand: 3’-TCGCTGAC-5’
One strand is said to be reverse-complementary to the other.
[Figure: DNA double helix]
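The reverse-complement relationship between the two strands can be sketched in a few lines of Python. This is only an illustration using the slide's example strand; the function name `reverse_complement` is chosen here and is not part of the lecture.

```python
# Watson-Crick pairing from the slide: A=T, G=C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

forward = "AGCGACTG"                      # 5'-AGCGACTG-3'
backward = reverse_complement(forward)    # backward strand, read 5' to 3'
print(backward)                           # CAGTCGCT
print(backward[::-1])                     # TCGCTGAC, i.e. the 3'-...-5' spelling on the slide
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.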
RNA
• Carries information from DNA to protein
– Other functions have been found
• a string made from alphabet {A, C, G, U}
– e.g. ACAGAACGUAGUGCCGUGAGCG
• Each letter is called a base
– A ribonucleotide
[Figure: an RNA strand drawn from 5’ to 3’, written 5’-AGUGACUG-3’ or simply AGUGACUG. Each nucleotide consists of a phosphate, a sugar (carbons numbered 1 to 5), and a base.]
Many biological processes go from 5’ to 3’, e.g. transcription.
RNA Secondary structures
• RNAs are normally single-stranded
• Can form complex structures by base-pairing with themselves
• A=U, C=G
Protein
• The actual “worker” for almost all
processes in the cell
• A string built from 20 letters
– E.g. MGDVEKGKKIFIMKCSQCHTVEKGGKH
• Each letter is called an amino acid
Protein zoom-in
• Composed of a chain of amino acids.
          R              (side chain)
          |
     H2N--C--COOH
          |
          H
(amino group: H2N, carboxyl group: COOH)
Amino acid
• 20 amino acids, only differ at side chains
– Each can be expressed by three letters
– Or a single letter: A-Y, except B, J, O, U, X
– Alanine = Ala = A
– Arginine = Arg = R
– Asparagine = Asn = N
– Lysine = Lys = K
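As a quick check of the single-letter rule above, the following sketch enumerates the letters A to Y minus the five excluded ones and confirms that exactly 20 remain. The variable names are illustrative, and the small dictionary holds only the four name/code pairs listed on the slide.

```python
import string

# Letters A through Y (the slide's range), minus the five excluded letters.
EXCLUDED = set("BJOUX")
ONE_LETTER_CODES = [c for c in string.ascii_uppercase[:25] if c not in EXCLUDED]

# The four three-letter / one-letter pairs given on the slide.
THREE_TO_ONE = {"Ala": "A", "Arg": "R", "Asn": "N", "Lys": "K"}

print(len(ONE_LETTER_CODES))        # 20
print("".join(ONE_LETTER_CODES))    # ACDEFGHIKLMNPQRSTVWY
```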
Amino acids => peptide
     R                      R
     |                      |
H2N--C--COOH     +     H2N--C--COOH
     |                      |
     H                      H

=>

     R          R
     |          |
H2N--C--CO--NH--C--COOH
     |          |
     H          H
Peptide bond: the CO--NH link between the two amino acids
Protein
H2N - R - R - R - R - R - R - … - COOH
(N-terminal on the left, C-terminal on the right)
• Has orientations
• Usually recorded from N-terminal to C-terminal
• Peptide vs protein: basically the same thing
• Conventions
– Peptide is shorter (< 50 aa), while protein is longer
– Peptide refers to the sequence, while protein has 2D/3D structure
Protein structure
• Linear sequence of amino acids folds to
form a complex 3-D structure.
• The structure of a protein is intimately
connected to its function.
Genome and chromosome
• Genome: the complete DNA sequences of
an organism
– May contain one (in prokaryotes) or more (in
eukaryotes) chromosomes
• Chromosome: a single large DNA
molecule in an organism
– May be circular or linear
– Contain genes as well as “junk DNAs”
– Highly packed!
Formation of chromosome
50,000 times shorter than extended DNA
Gene
• Gene: unit of heredity in living organisms
– A segment of DNA with information to make a
protein
Some statistics
                   Chromosomes   Bases         Genes
Human              46            3 billion     20k-25k
Dog                78            2.4 billion   ~20k
Corn               20            2.5 billion   50-60k
Yeast              16            20 million    ~7k
E. coli            1             4 million     ~4k
Marbled lungfish   ?             130 billion   ?
Human genome
• 46 chromosomes: 22 pairs + 2 sex chromosomes (X and Y)
• In each pair, 1 from mother, 1 from father
• Female: X + X
• Male: X + Y
Human genome
• Every cell contains the same genomic
information
– Except sperm and egg cells
– They contain only half of the genome
• Otherwise your children would have 46 + 46 chromosomes
• How does biology achieve that?
Cell division: meiosis
• A reproductive cell divides into four cells, each containing only half of the genome
– Diploid => haploid
• Two haploid cells (sperm + egg) form a zygote
– Which will then develop into a multi-cellular organism by mitosis
Cell division: mitosis
• A cell duplicates its genome and divides into two identical cells
• These cells build up different parts of your body
Central dogma of molecular biology
DNA replication is critical in both
mitosis and meiosis
DNA Replication
• The process of copying a double-stranded
DNA molecule
– Semi-conservative
Parent double strand:
5’-ACATGATAA-3’
3’-TGTACTATT-5’

Two daughter double strands, each with one old and one new strand:
5’-ACATGATAA-3’    5’-ACATGATAA-3’
3’-TGTACTATT-5’    3’-TGTACTATT-5’
• Mutation: changes in DNA base-pairs
• Proofreading and error-correcting mechanisms
exist to ensure extremely high fidelity
DNA synthesis
• Creating DNA synthetically in a laboratory
• Chemical synthesis
– Chemical reactions
– Arbitrary sequences
– Maximum length 160-200 bases
• Cloning: make copies based on a DNA template
– Biological reactions
– Requires template
– Many copies of a long DNA in a short time
in vivo Cloning
• Connect a piece of DNA to bacterial DNA,
which can then be replicated together with
the host DNA
in vitro Cloning
• Polymerase chain reaction (PCR)
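[Figure: one PCR cycle. (1) Denature: the double strand is separated into two single strands. (2) A primer (< 30 bases) binds to each strand. (3) DNA polymerase extends each primer using dNTPs, giving two double strands.]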
                     Reaction     Template   Speed                           Length
Chemical synthesis   Chemical     No         Fast                            Very short
In vivo cloning      Biological   Yes        Varies (depends on host cell)   Long
In vitro cloning     Biological   Yes        Fast                            Medium
Some terms
• Denaturation: a DNA double strand is separated into two single strands
– By raising the temperature
• Renaturation: two denatured DNA strands re-form a double strand
– By cooling down slowly
• Hybridization: two DNA strands from different sources form a double strand
– May have mismatches
– The rationale behind many molecular biology techniques, including DNA microarrays
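To make "may have mismatches" concrete, here is a minimal sketch that counts mismatched positions when two equal-length strands are paired antiparallel. The function name `count_mismatches` and the equal-length restriction are assumptions made for this illustration; real hybridization also depends on temperature and sequence composition, which this ignores.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def count_mismatches(strand_a: str, strand_b: str) -> int:
    """Count positions that fail to base-pair when strand_b (given 5'->3')
    is laid antiparallel against strand_a (given 5'->3')."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles strands of equal length")
    # The perfect partner of strand_b is its reverse complement.
    perfect_partner = "".join(COMPLEMENT[b] for b in reversed(strand_b))
    return sum(1 for x, y in zip(strand_a, perfect_partner) if x != y)

print(count_mismatches("AGCGACTG", "CAGTCGCT"))  # 0: perfectly complementary
print(count_mismatches("AGCGACTG", "CAGTCGCA"))  # 1: one mismatched pair
```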
Central dogma of molecular biology
Transcription
• The process in which a DNA sequence is copied to produce a complementary RNA
– Called messenger RNA (mRNA) if the RNA carries instructions on how to make a protein
– Called non-coding RNA if the RNA does not carry instructions on how to make a protein
– Only consider mRNA for now
• Similar to replication, but
– Only one strand is copied
Transcription
DNA-RNA pair: A=U, C=G, T=A, G=C
Coding strand (where genetic information is stored): 5’-ACGTAGACGTATAGAGCCTAG-3’
Template strand (for making mRNA):                   3’-TGCATCTGCATATCTCGGATC-5’
mRNA:                                                5’-ACGUAGACGUAUAGAGCCUAG-3’
Coding strand and mRNA have the same sequence, except that T’s in DNA are replaced by U’s in mRNA.
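The pairing rules can be checked on the slide's own example with a short sketch. The function name `transcribe` is illustrative; it takes the template strand written 3’ to 5’, as on the slide, so that the mRNA comes out 5’ to 3’ position by position.

```python
# DNA base on the template -> RNA base placed opposite it (A=U, T=A, G=C, C=G).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Template strand written 3'->5' in, mRNA written 5'->3' out."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

coding = "ACGTAGACGTATAGAGCCTAG"      # 5'->3'
template = "TGCATCTGCATATCTCGGATC"    # 3'->5'
mrna = transcribe(template)

print(mrna)                               # ACGUAGACGUAUAGAGCCUAG
print(mrna == coding.replace("T", "U"))   # True
```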
The genetic code
• There are four bases in DNA (A, C, G, T), and
four in RNA (A, C, G, U), but 20 amino acids in
protein
• How are amino acids encoded in mRNA?
– 4^1 = 4 (one base per amino acid: too few for 20)
– 4^2 = 16 (two bases: still too few)
– 4^3 = 64 (three bases: enough)
• The actual genetic code used by the cell is a
triplet.
– Each triplet is called a codon
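– Redundancy: 64 codons for 20 amino acids, so several codons can code for the same amino acid
– Universal: shared by nearly all organisms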
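The counting argument for a triplet code can be written as a tiny calculation. This is only a sketch of the arithmetic; `min_codon_length` is a name chosen here.

```python
def min_codon_length(alphabet_size: int, n_amino_acids: int) -> int:
    """Smallest word length whose number of possible words covers all amino acids."""
    length = 1
    while alphabet_size ** length < n_amino_acids:
        length += 1
    return length

for k in (1, 2, 3):
    print(k, 4 ** k)               # 1 4, 2 16, 3 64
print(min_codon_length(4, 20))     # 3
```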
The Genetic Code
[Figure: codon table, with the third letter of the codon along one axis]
Translation
• The sequence of codons is translated to a
sequence of amino acids
• Gene: -GCT TGT TTA CGA ATT-
• mRNA: -GCU UGU UUA CGA AUU-
• Peptide: -Ala-Cys-Leu-Arg-Ile-
• Start codon: AUG
– Also codes for Met
• Stop codons: UGA, UAA, UAG
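A minimal translation sketch, assuming the standard genetic code. The 64-letter string lists the amino acid for each codon with bases ordered U, C, A, G at every position, and `*` marks the three stop codons (UAA, UAG, UGA). The function name `translate` is illustrative, and the sketch reads from the first base rather than searching for a start codon.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GCUUGUUUACGAAUU"))   # ACLRI (Ala-Cys-Leu-Arg-Ile)
print(translate("AUGGCUUAAGCU"))      # MA (stops at UAA)
```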
Translation
• Transfer RNA (tRNA): a different type of RNA
– Floats freely in the cytoplasm
– Every amino acid has its own type of tRNA that binds to it alone
• Binding between anti-codon and codon is crucial
[Figure: tRNA]
More complexity
[Figure: a promoter in front of a gene, showing a transcription factor, RNA polymerase, and the transcription starting site]
• RNA polymerase binds to a certain location on the promoter to initiate transcription
• Transcription factors bind to specific sequences on the promoter to regulate transcription
– Recruit RNA polymerase: induce
– Block RNA polymerase: repress
– Multiple transcription factors may coordinate
More complexity
[Figure: promoter, transcription starting site, and gene; transcription of the gene produces pre-mRNA]
• Pre-mRNA needs to be “edited” to form mature mRNA
Pre-mRNA:    5’ UTR - exon - intron - exon - intron - exon - 3’ UTR
Splice
Mature mRNA: 5’ UTR - exon - exon - exon - 3’ UTR
Open reading frame (ORF): from the start codon to the stop codon
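Splicing and the open reading frame can be sketched on a toy sequence. Everything below is hypothetical illustration: the sequence, the exon coordinates, and the helper names `splice` and `find_orf` are made up, and taking the first AUG as the start codon is a simplification.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon intervals (0-based, end-exclusive), dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def find_orf(mrna: str):
    """Return the stretch from the first AUG to the first in-frame stop codon,
    or None if there is no such stretch."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return mrna[start:pos + 3]
    return None

# Toy pre-mRNA: 5' UTR + exon | intron | exon + 3' UTR
pre_mrna = "GGAUGGCU" + "GUAAGUAG" + "UGUUAACC"
mature = splice(pre_mrna, [(0, 8), (16, 24)])

print(mature)             # GGAUGGCUUGUUAACC
print(find_orf(mature))   # AUGGCUUGUUAA
```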
DNA sequencing: Basic idea
• PCR
Primer extension:
5’-TTACAGGTCCATACTA
3’-AATGTCCAGGTATGATACATAGG-5’
• We need to supply A, C, G, T for the synthesis to continue
• Besides A, C, G, T, we add some A*, C*, G*, and T*
– Very similar to A, C, G, T in all aspects, except that
– Extension stops once one of them is incorporated
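The idea can be simulated on the slide's primer and template. The sketch below lists, for each starred base, the product lengths at which extension could stop, and then reads the bases back in order of increasing length. That read-out step is an assumption about how the terminated fragments are used, since the slides that follow are figures; the function names are also illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_lengths(template_3_to_5: str, primer: str) -> dict:
    """For each starred base, the product lengths at which extension can stop."""
    full_product = "".join(COMPLEMENT[b] for b in template_3_to_5)  # 5'->3'
    if not full_product.startswith(primer):
        raise ValueError("primer does not pair with the start of the template")
    lengths = {base: [] for base in "ACGT"}
    for i in range(len(primer), len(full_product)):
        # A starred copy of this base incorporated here ends the product at length i + 1.
        lengths[full_product[i]].append(i + 1)
    return lengths

def read_sequence(lengths: dict) -> str:
    """Order the terminated products by length and read off their final bases."""
    by_length = sorted((n, base) for base, ns in lengths.items() for n in ns)
    return "".join(base for _, base in by_length)

primer = "TTACAGGTCCATACTA"                # 5'->3'
template = "AATGTCCAGGTATGATACATAGG"       # 3'->5'
lengths = terminated_lengths(template, primer)

print(lengths["T"])              # [17, 19, 21]
print(read_sequence(lengths))    # TGTATCC
```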
DNA sequencing, cont