Introduction to molecular biology
Summary
• Cells
• Chromosomes
• DNA
• RNA
• Amino acids
• Proteins
• Genomics
• Transcriptomics
• Proteomics
Cells
All living beings are composed of cells, which
are the basic unit of life. Every cell derives from
another cell.
Prokaryotes
No nucleus or internal membranes.
Eukaryotes
• Nucleus.
• Internal membranes.
• Organelles inside the cell that play
different and specific roles.
Organisms can be:
Unicellular
• Prokaryotes: bacteria, archaea.
• Eukaryotes: baker's yeast.
Multicellular
• Eukaryotes: animals, plants, fungi…
Human beings: roughly 10^13 to 10^14 cells (tens of trillions), of about 320 different types
Cells
Composition
70% Water
7% Small molecules:
• Salts
• Lipids
• Amino acids
• Nucleotides
23% Macromolecules:
• Proteins
• Polysaccharides
Cell functions:
A cell contains all the information it needs to
replicate itself (a virus does not!). Processes
carried out by cells include:
Metabolic pathways
Translation of RNA into proteins
…
Chromosomes
• The nucleus of eukaryotes contains
one or more DNA molecules (double
stranded). Each of these
supermolecules is called a
chromosome.
• For example, human beings have
22 pairs of autosomes and 1 pair of
sex chromosomes.
Cells
• Almost all the cells in an organism have the same
genome (sometimes there are slight differences).
• The DNA represents all the information needed by the
cell to perform its functions.
Three basic macromolecules for life
• DNA
– It contains all the information needed by the cell (the “hard drive”).
– In fact, since almost all the cells in an organism share the same
genome, it contains all the information needed by ANY cell to perform
its functions.
– It stays (almost) always in the nucleus.
• RNA
– RNA has two main functions:
• It copies the information in DNA (located in the nucleus) and carries it to
other parts of the cell where this information is used (messenger RNA,
mRNA).
• It has a crucial role in protein synthesis (transfer RNA, tRNA).
• Proteins
– Many different functions (signalling, structural, enzymes,
regulation…). They are the key constituents of the organism.
Central dogma of molecular biology
• It is not a DOGMA
– A dogma is an important part of a
faith that must be believed.
– The researcher who coined the term
later admitted: “I did not know what
dogma meant”.
– There is strong evidence supporting
it… so it is not a dogma (or at least
there are other fields of knowledge
that deserve this term much more ☺).
DNA vs RNA
DNA: code of life
DeoxyriboNucleic Acid
There are four different nucleotides, shared by all living beings: Adenine (A), Guanine
(G), Cytosine (C) and Thymine (T). They form two complementary pairs: A-T and
C-G
DNA structure
DNA replication
Structure of a nucleotide
• Purines: Adenine (A) and Guanine (G).
They have a double ring.
• Pyrimidines: Thymine (T), Cytosine (C) and
Uracil (U). Thymine is replaced by Uracil
in RNA. They have a single ring.
One “nucleotide” is a compound formed by one base (A, C, T or G), one sugar
molecule and a phosphate group.
How to read a DNA sequence?
Every nucleotide has two
attachment points: 5’ and 3’. The
numbers are the positions of the
carbon atoms in the sugar molecule.
Neighbouring nucleotides are joined
by a phosphodiester bond.
The DNA molecule is created when bonds between the 5’
and 3’ positions of the nucleotides are formed. DNA is always
read from 5’ to 3’. Getting the direction wrong leads to funny mistakes…
Sequence: TGACT
DNA
Code:
• It can be seen as a code with only 4 letters instead of 2 (binary coding).
• How many letters? 16 to describe the different possibilities.

Symbol   Meaning            Origin of the name
G        G                  Guanine
A        A                  Adenine
T        T                  Thymine
C        C                  Cytosine
R        G or A             puRine
Y        T or C             pYrimidine
M        A or C             aMino
K        G or T             Keto
S        G or C             Strong interaction (3 H bonds)
W        A or T             Weak interaction (2 H bonds)
H        A or C or T        not-G, H follows G in the alphabet
B        G or T or C        not-A, B follows A
V        G or C or A        not-T (not-U), V follows U
D        G or A or T        not-C, D follows C
N        G or A or T or C   aNy
-        None (gap)         ---
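The ambiguity symbols in the table can be handled in software as sets of allowed bases. The following is a minimal Python sketch of that idea; the names `IUPAC`, `matches` and `expand` are illustrative choices, not part of the original slides, and the gap symbol is left out for simplicity.

```python
from itertools import product

# Each symbol maps to the bases it may stand for (gap symbol "-" omitted).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}

def matches(pattern, sequence):
    """True if every base of `sequence` is allowed by the symbol at the same position."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[sym] for sym, base in zip(pattern, sequence))

def expand(pattern):
    """List every concrete sequence that an ambiguous pattern can represent."""
    return ["".join(p) for p in product(*(IUPAC[sym] for sym in pattern))]

print(matches("RYN", "ACG"))   # R allows A, Y allows C, N allows anything
print(expand("AR"))            # the two sequences covered by "AR"
```

Storing the allowed bases as strings keeps the lookup table close to the slide; a set-based representation would work equally well.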
DNA is double stranded
• Hydrogen bonds hold the nucleotide pairs together.
• DNA is not symmetric!! It has two directions and is
read, always, from 5’ to 3’.
• Both strands are complementary: A-T and C-G
– Forward strand
– Reverse strand
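Because the two strands are complementary and each is read 5’ to 3’, the reverse strand can be derived from the forward strand by complementing every base and reversing the result. A minimal Python sketch, with function names chosen here for illustration and the slide's example sequence TGACT as input:

```python
# Translation table implementing the pairing rules A-T and C-G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, in the same left-to-right order as the input."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return complement(seq)[::-1]

forward = "TGACT"
print(complement(forward))          # the paired bases, read 3' to 5'
print(reverse_complement(forward))  # the reverse strand, read 5' to 3'
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check for any implementation.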
Mitochondrial DNA
Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria. mtDNA
is inherited from the mother, since the mitochondria of the zygote come from the egg cell.
Mitochondria are sometimes described as "cellular power plants" because they generate
ATP, which is used as a source of chemical energy.
RNA
(RiboNucleic Acid)
• Protein synthesis occurs in the ribosomes
• They are located in the cytoplasm, outside the nucleus.
• DNA is in the nucleus.
• RNA transports the information from the nucleus to the ribosomes
• The mechanism that creates RNA complementary to DNA is called
Transcription.
DNA vs RNA:
• Thymine (T) is substituted by uracil (U).
• RNA is single stranded. It can fold back on itself and form double-stranded stretches where
the sequence is palindromic (“Sit on a potato pan, Otis”).
Messenger RNA (mRNA)
• Part of the DNA is
transcribed into RNA (the RNA
is a copy of the DNA).
• The RNA goes to the cytoplasm,
where the ribosomes use the
mRNA to build proteins.
• RNA itself is the message
from the nucleus to the
cytoplasm.
Transcription process
Initiation
In the first stage, RNA polymerase binds to a region of DNA
called the promoter. The enzyme opens the DNA and
allows the creation of an RNA molecule whose
sequence is complementary to the DNA.
Elongation
RNA polymerase moves along the template strand and
RNA nucleotides are added to the new RNA molecule.
Termination
Termination is a complex process (it involves
palindromes, i.e. hairpins, in prokaryotes and more complex
mechanisms in eukaryotes). Once it has finished, the DNA
closes again and the RNA moves from the nucleus to the
cytoplasm.
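At the sequence level, transcription can be sketched in a few lines of Python. The sketch below assumes the usual terminology in which the strand read by the polymerase is the template strand and the opposite strand is the coding strand; the function names are illustrative and the biochemistry (promoters, termination) is ignored.

```python
# DNA base -> RNA base that pairs with it (A pairs with U in RNA).
DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")

def transcribe_template(template_5to3):
    """mRNA (5'->3') made from a template strand given 5'->3'.

    The polymerase reads the template 3'->5', so the input is reversed
    before each base is paired with its RNA partner.
    """
    return template_5to3[::-1].translate(DNA_TO_RNA_PAIR)

def transcribe_coding(coding_5to3):
    """mRNA from the coding strand: same sequence, with T replaced by U."""
    return coding_5to3.replace("T", "U")

coding = "ATGGAA"      # hypothetical coding strand
template = "TTCCAT"    # its reverse complement, i.e. the template strand
print(transcribe_coding(coding))
print(transcribe_template(template))
```

Both calls describe the same mRNA, which is why sequence databases usually store only the coding strand.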
Transcription in action
RNA maturation
• In eukaryotes, the sequence that appears in the genome is not exactly the one that is
translated.
• RNA goes through a maturation process:
• Intermediate sequences called introns are removed.
• The exons are joined together (splicing, carried out by the spliceosome).
• A single gene (DNA) can give rise to several variants (using different exons). This process is
called Alternative Splicing.
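Splicing can be pictured as keeping selected intervals of the pre-mRNA and concatenating them. The Python sketch below uses a made-up pre-mRNA (exons in upper case, introns in lower case) and hypothetical exon coordinates; it only illustrates the bookkeeping, not how splice sites are recognised in the cell.

```python
def splice(pre_mrna, exons):
    """Join the given exons, each a (start, end) pair, 0-based and end-exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon1 + intron1 + exon2 + intron2 + exon3.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAAUUC" + "guccag" + "UGA"
exon1, exon2, exon3 = (0, 6), (16, 22), (28, 31)

full_length = splice(pre_mrna, [exon1, exon2, exon3])
skipped = splice(pre_mrna, [exon1, exon3])   # alternative form without exon 2

print(full_length)
print(skipped)
```

Choosing a different subset of exons from the same gene gives a different mature mRNA, which is the essence of alternative splicing.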
What is a gene?
• Promoter region: it contains the sequences needed to activate or deactivate
the gene. Its limits are fuzzy and depend on the gene. The proximal promoter is
considered to be 1000-50000 bp upstream of the TSS (transcription start site).
• Exons:
Coding regions of the gene (they are translated into protein).
1 to 178 exons/gene (average: 8.8)
8 bp to 17 kb/exon (average: 145 bp)
• Introns:
Non-coding regions flanked by two exons.
Size (average): 1 kb – 50 kb/intron (much larger than exons)
• Size of a gene: the largest is 2.4 Mb (dystrophin). Average: 27 kb.
PROTEINS
Amino acids
Amino acids are the basic structural building units of proteins. An amino acid
is a molecule that contains both amine and carboxyl functional groups
with the general formula H2NCHRCOOH, where R is an organic
substituent.
They form polymer chains:
short ones are called peptides, long ones polypeptides or proteins.
Translation
Process to form the protein according to the mRNA template.
As both the amine and carboxylic acid groups of amino acids can react to form amide bonds, one amino
acid molecule can react with another and become joined through an amide linkage. This polymerization
of amino acids is what creates proteins.
Amino acids
• 20 standard amino acids
• Bricks to build proteins.
• 10 essential amino acids
• They cannot be synthesized by
the human body.
• They must therefore be
obtained from food.
• Plants synthesize all of
them.
Amino acids

Amino Acid      3-Letter  1-Letter  Polarity  Acidity  Hydrophobicity
Alanine         Ala       A         nonpolar  neutral   1.8
Arginine        Arg       R         polar     basic    -4.5
Asparagine      Asn       N         polar     neutral  -3.5
Aspartic acid   Asp       D         polar     acidic   -3.5
Cysteine        Cys       C         polar     neutral   2.5
Glutamic acid   Glu       E         polar     acidic   -3.5
Glutamine       Gln       Q         polar     neutral  -3.5
Glycine         Gly       G         nonpolar  neutral  -0.4
Histidine       His       H         polar     basic    -3.2
Isoleucine      Ile       I         nonpolar  neutral   4.5
Leucine         Leu       L         nonpolar  neutral   3.8
Lysine          Lys       K         polar     basic    -3.9
Methionine      Met       M         nonpolar  neutral   1.9
Phenylalanine   Phe       F         nonpolar  neutral   2.8
Proline         Pro       P         nonpolar  neutral  -1.6
Serine          Ser       S         polar     neutral  -0.8
Threonine       Thr       T         polar     neutral  -0.7
Tryptophan      Trp       W         nonpolar  neutral  -0.9
Tyrosine        Tyr       Y         polar     neutral  -1.3
Valine          Val       V         nonpolar  neutral   4.2
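One common use of a hydrophobicity column like this one is to average it over a peptide to get a rough idea of how hydrophobic the whole chain is. The Python sketch below does that with the values listed above (they match the Kyte-Doolittle hydropathy scale); the function name `mean_hydropathy` is an illustrative choice.

```python
# Hydrophobicity per residue, keyed by 1-letter code (values from the table).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(protein):
    """Average hydrophobicity of a protein written in 1-letter codes."""
    if not protein:
        raise ValueError("empty sequence")
    return sum(HYDROPATHY[aa] for aa in protein) / len(protein)

# Positive averages suggest a mostly hydrophobic peptide, negative ones a hydrophilic one.
print(round(mean_hydropathy("MEVFKAPPIGI"), 2))
```

A sliding-window version of the same average is often used to look for hydrophobic stretches; that refinement is left out here.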
Proteins
• Proteins are large molecules composed of amino acids.
• Their 3D structure is complex
• It is not a double helix like DNA: the shape is different for each of
them.
• Proteins fold. This folding plays a crucial role in their function
• For example, mad cow disease is caused by the abnormal folding
of a protein.
Protein structure
• Protein structure is crucial in
determining their chemical
properties and even their
function.
• The 3D structure determines which
amino acids are on the
surface.
• There are 4 levels at which
structure can be studied:
1. Amino acid sequence
2. Polypeptide folding
3. Protein shape
4. Protein interactions (which include
changes in the positions of the
atoms).
Central Dogma (once again)
• Transcription carries the information
from DNA to RNA
• RNA travels from the nucleus to the
ribosomes
• Translation produces the protein according
to the genetic code and the
corresponding mRNA
– tRNA is used as a lorry to carry the
amino acids, as we will see.
Translation
• Translation is the second step in the central
dogma.
• mRNA is decoded using the genetic code
• Amino acids follow the sequence given
by the mRNA.
• This process takes place in the cytoplasm.
• tRNA is used as a “lorry” to carry the
amino acids.
• Ribosomes are the factories that build the
proteins.
Transfer RNA (tRNA)
tRNA is an RNA that carries amino acids to the ribosomes in order to build the proteins.
tRNA is more abundant than mRNA (roughly 15% vs 5% of the total RNA)
Most RNA in the cell is ribosomal RNA (rRNA)
tRNA recognizes the codon in the mRNA and transfers the corresponding amino acid to the protein
being built.
Genetic code
Codon: a sequence of 3 nucleotides that codes for an amino acid according to
this table.
AUG codes for methionine and is also the start codon. The first AUG in the mRNA is where translation starts.
The genetic code is almost universal, with some exceptions.
Other considerations…
• A codon is a sequence of 3 nucleotides (DNA or RNA) that
codes for a particular amino acid.
– There are 4 possible bases (RNA): A, C, G and U
– 3 bases per codon
• Therefore, there are 4 * 4 * 4 = 64 possible codons
• Special codons:
– Start codon: AUG. Translation starts at this codon. It also codes
for an amino acid (methionine)
– Stop codons (three flavours): UAA, UAG, UGA
• The remaining 61 codons code for the 20 amino acids
– The genetic code is redundant: the same amino acid may be coded
by several codons.
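The codon counts above can be checked by enumerating all triplets. The Python sketch below builds the standard genetic code from a compact 64-letter string (bases in the order U, C, A, G; `*` marks a stop codon); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Amino acids (1-letter codes) for UUU, UUC, UUA, UUG, UCU, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO))

stop_codons = sorted(c for c, aa in GENETIC_CODE.items() if aa == "*")
# How many codons code for each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in GENETIC_CODE.values() if aa != "*")

print(len(CODONS))               # all possible codons
print(stop_codons)               # the three stop codons
print(sum(degeneracy.values()))  # codons that code for an amino acid
print(len(degeneracy))           # distinct amino acids
print(degeneracy["L"], degeneracy["M"])  # leucine is highly redundant, methionine is not
```

The `degeneracy` counter makes the redundancy explicit: only methionine and tryptophan have a single codon.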
Translation again:
Anticodon: a sequence of 3
nucleotides in tRNA that recognizes
the corresponding codon in mRNA.
Using the anticodon, the amino acid to
include in the protein is selected.
tRNA carries the “free” amino acid and,
in the ribosome, it is joined to the
polypeptide chain that is being
created.
For example, the tRNA with anticodon
UAC (read 3’ to 5’) pairs with the AUG
codon that, in turn, codes for methionine.
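Codon and anticodon pair base by base, so one can be computed from the other. A small Python sketch follows; the function names are illustrative, and wobble pairing at the third position is deliberately ignored.

```python
# RNA base pairing: A-U and C-G.
PAIR = str.maketrans("ACGU", "UGCA")

def anticodon_aligned(codon):
    """Anticodon written so that each base sits under its partner (i.e. 3'->5')."""
    return codon.translate(PAIR)

def anticodon_5to3(codon):
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_aligned(codon)[::-1]

print(anticodon_aligned("AUG"))  # base-by-base partner of the start codon
print(anticodon_5to3("AUG"))     # same anticodon, conventional orientation
```

The two spellings describe the same three bases; which one appears in a text depends only on the direction in which the anticodon is written.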
ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG…
ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G…
 M   E   V   F   K   A   P   P   I   G   I  stop
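The decoding shown in this example can be reproduced with a short Python sketch. It assumes the standard genetic code written with DNA letters, starts at the first ATG and stops at the first in-frame stop codon; `translate` is an illustrative name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for TTT, TTC, TTA, TTG, TCT, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))

def translate(dna):
    """Translate from the first ATG up to (not including) the first in-frame stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = GENETIC_CODE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```

The trailing G after the stop codon is never read, because translation ends as soon as TAA is reached.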
Translation in action
In brief:
• Proteins are coded in the genes, in the DNA located in the nucleus. DNA
always stays in the nucleus.
• Ribosomes are factories that build proteins, located in the cytoplasm.
mRNA carries the message from the nucleus to the ribosomes.
There is an intermediate step called mRNA maturation in which
introns are excluded and exons are retained.
• Ribosomes build what the mRNA codes for, using amino acids that, in turn,
are carried by tRNA.
– Ribosomes are composed of proteins and rRNA (a third class of RNA…
and there are even more!!)
Some important definitions in BIOINFORMATICS
ORF (Open Reading Frame):
• Coding from DNA to protein is done by codons. There are three ways to split a sequence
into codons (starting at the first, the second or the third nucleotide), and we can use one
strand (forward) or the other (reverse). Each of these six possibilities is
called a reading frame. Only one of them is valid. For example, on the forward strand this
sequence can be read in three frames:
[ATG]CC (M)   A[TGC]C (C)   AT[GCC] (A)
• A sequence flanked by a start codon and a stop codon is called an Open Reading
Frame (ORF).
[Figure: a genomic sequence with an open reading frame running from ATG to TGA]
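The six reading frames and a naive ORF search can be sketched in Python as follows. This is a minimal illustration with hypothetical function names: `find_orfs` scans the forward strand only and reports every ATG that is followed by an in-frame stop codon, so nested start codons yield overlapping ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def codons(seq, offset):
    """Split seq into complete codons, starting at position `offset` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def six_frames(seq):
    """Three frames on the forward strand followed by three on the reverse strand."""
    return [codons(strand, offset)
            for strand in (seq, reverse_complement(seq))
            for offset in range(3)]

def find_orfs(seq):
    """Forward-strand ORFs: from an ATG to the first in-frame stop codon, inclusive."""
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orfs.append(seq[start:i + 3])
                break
    return orfs

print(six_frames("ATGCC"))
print(find_orfs("CCATGAAATGACC"))
```

Running `find_orfs` on the reverse complement as well covers the remaining three frames.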
ORFs as gene candidates
• A gene candidate is an open reading frame that begins with a start codon (ATG)
• Most prokaryotic genes code for proteins that are 60 or more
amino acids in length
• The probability that a random sequence of n codons
contains no stop codon is (61/64)^n
• When n is 50, there is a probability of about 91% that the random
sequence contains a stop codon
• When n is 100, this probability exceeds 99%
– A long sequence without stop codons is probably coding for a protein.
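The figures in this argument follow directly from the formula (61/64)^n. A minimal Python sketch, assuming that all 64 codons are equally likely and independent (function names are illustrative):

```python
def p_no_stop(n_codons):
    """Probability that n random codons contain no stop codon."""
    return (61 / 64) ** n_codons

def p_at_least_one_stop(n_codons):
    """Probability that n random codons contain one or more stop codons."""
    return 1 - p_no_stop(n_codons)

for n in (50, 100):
    print(n, round(p_at_least_one_stop(n), 4))
```

For n = 50 the result is roughly 0.91 and for n = 100 it is above 0.99, which is why a long stretch without stop codons is unlikely to arise by chance.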
Definitions:
• Nucleic acids are composed of nucleotides, also referred to as bases or
(in double-stranded molecules) base pairs
• Short forms: nt (nucleotides), bp (base pairs).
• 400 nt means 400 nucleotide positions (a double-stranded DNA
of that length contains 800 nucleotides!)
• 400 bp means 400 base pairs
• 1,000,000 bp = 1,000 kb = 1 Mb
Genomic analysis
How to build a whole genome in four steps:
– Cut it!:
• Restriction enzymes break the DNA at specific sites.
• It is divided into short pieces.
– Copy it!:
• It is easy to copy DNA (it was designed for that!).
• We get several clones of each DNA sequence using the Polymerase Chain
Reaction (PCR).
– Each cycle of PCR doubles the concentration of DNA
» 20 cycles of PCR increase the concentration by 2^20…
» (about 1 million times)
– Read it!:
• Electrophoresis is used to read small fragments.
– Assemble it!:
• The fragments have overlapping sequences that can be used to
perform the assembly (just like building a puzzle).
• This puzzle has “large sky regions” that are difficult to build: large parts of
the genome are quite repetitive (and they are also important).
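The puzzle-like assembly step can be illustrated with a toy greedy algorithm that repeatedly merges the two fragments with the longest overlap. This Python sketch is only an illustration under simplifying assumptions (error-free reads, a single strand, hypothetical function names); real assemblers are considerably more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns the contigs."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps any more
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

reads = ["ATGGAAGTA", "AGTATTTAAAG", "AAAGCGCC"]  # made-up fragments
print(greedy_assemble(reads))
```

Repetitive regions are exactly where this strategy breaks down: identical repeats produce equally good overlaps in several places, so the fragments can be joined in the wrong order.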
Genomic analysis and bioinformatics
Once we have the sequence we can find genes (using
statistical properties of the intragenic regions).
It is also important to measure gene expression and predict
their function.
Gene hunting
Protein sequence
analysis
DNA sequence analysis
2001: First draft version
of the human genome.
2003: Human genome
curated. First “release
version”. Mouse genome
completed.
Protein function can be inferred from
sequence and structure. Structure
analysis gives better results
Bioinformatics analysis:
• DNA
– Useful for genomic diseases
• Single gene (Mendelian), chromosomal.
• Multifactorial or complex diseases.
→ Predisposition to develop a disease
– Does not change if the organism acquires a disease
→ Not valid as a marker of an acquired disease
• RNA
– Easy to measure
– RNA concentration changes with the disease state
→ Early marker for different diseases
• Proteins
– It is difficult to perform a whole-proteome analysis.
– They ultimately explain most of the disease targets
→ Closer to the biological fact
→ Most reasonable drug targets
Genomes:

ORGANISM                             CHROMOSOMES  SIZE (bp)       GENE NUMBER
Homo sapiens (Humans)                23           3,200,000,000   ~30,000
Mus musculus (Mouse)                 20           2,600,000,000   ~30,000
Drosophila melanogaster (Fruit Fly)  4            180,000,000     ~18,000
Saccharomyces cerevisiae (Yeast)     16           14,000,000      ~6,000
Zea mays (Corn)                      10           2,400,000,000   ???
Transcriptome:
• The different mRNAs (including splice forms) of a particular organism.
– About 30,000 genes
– About 250,000 splice variants
• Other RNAs have functions related to the regulation of gene expression
– miRNA: small pieces of RNA that silence a gene by blocking the translation of its mRNA or triggering its degradation
Proteome
• The complete collection of proteins in an organism
– Nobody knows how many… At least several million.
– For each splice variant, post-translational modifications can produce
different proteins (with different functions).
• One gene → Several splice forms → Several proteins
– Proteins are modified by other molecules that are attached to them
• Phosphate, acyl, methyl, sugars, lipids, etc.
• They radically change the activity of the protein
– There are many proteins with two forms: idle and active. The
transition is made by adding a phosphate group.
• Many disparate biological activities can be assigned to a single gene.
Questions?
Problem:
Genetic code
AUG codes methionine, and it is also the start codon.