Computational Biology and Bioinformatics
Some relevant molecular biology
The cell
• The fundamental unit of life
• All cells have very similar internal mechanisms
• Cells reproduce themselves, passing along all the information necessary to reproduce all functionality (heredity)
• Cells are a bit like computers, since they have to respond to all kinds of information: food, threats, ...
The cell
There are two big classes: prokaryotes and eukaryotes
• Prokaryotes have no nucleus
• The internal structure of eukaryotic cells is much more complex
• All information concerning the functional and structural elements of the cell is encoded in the DNA (deoxyribonucleic acid)
• Information from one kind of cell can be processed by other kinds of cells
• DNA is part of a chromosome
• DNA contains the information to produce thousands of proteins
• A gene is a part of a DNA string that encodes a particular protein
• The genome is the collection of all DNA molecules of an organism
The cell
Organism
• A human has 46 DNA molecules in almost every cell, organised in chromosomes
• In prokaryotes, like bacteria, there is often one circular DNA molecule
The size of the genome for different species
Species                      Year   Size (Mb)
Mycoplasma genitalium        1995   0.6
Haemophilus influenzae       1995   1.8
Escherichia coli             1997   4.6
Saccharomyces cerevisiae     1996   12
Schizosaccharomyces pombe    2002   14
Caenorhabditis elegans       1998   97
Arabidopsis thaliana         2001   120
Oryza sativa                 2002   430
Drosophila melanogaster      2000   180
Gallus gallus                2004   1,200
Rattus norvegicus            2004   2,900
Mus musculus                 2002   3,400
Homo sapiens                 2001   3,400
1 Mb = 1,000,000 bases
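Since each position holds one of only four bases, a sequence can be stored in 2 bits per base, which turns the sizes in the table into a storage estimate. A minimal Python sketch, assuming the sequence contains only A, C, G and T (no ambiguity codes such as N); `pack_2bit` and `storage_bytes` are illustrative names, not a library API:

```python
# 2-bit code for each base (assumption: only A, C, G, T occur)
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack_2bit(seq: str) -> bytes:
    """Pack a DNA sequence into bytes, four bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)      # round up to whole bytes
    for i, base in enumerate(seq.upper()):
        # base i goes into byte i // 4, at bit offset 2 * (i % 4)
        packed[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(packed)

def storage_bytes(size_mb: float) -> int:
    """Bytes needed for a genome of size_mb megabases at 2 bits per base."""
    bases = round(size_mb * 1_000_000)           # 1 Mb = 1,000,000 bases
    return (bases * 2 + 7) // 8                  # 2 bits per base, rounded up to bytes

print(len(pack_2bit("ACGTACGTAC")))  # 3 (10 bases fit in 3 bytes)
print(storage_bytes(3400))           # 850000000 (Homo sapiens row)
print(storage_bytes(4.6))            # 1150000 (Escherichia coli row)
```

For the Homo sapiens row this gives about 850 MB for one copy of the genome; file formats used in practice also have to handle N and other symbols.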
The structure of ...
... DNA and RNA
DNA and RNA are linear structures that consist of 4 types of nucleotides
• DNA (deoxyribonucleic acid): structure described by Watson and Crick (1953)
• RNA (ribonucleic acid)
A nucleotide is constructed of 3 parts (Fig. A): a base, a sugar and a phosphate group
The difference between the four types is in the base part, which can be A, T, G or C
... DNA and RNA 2
... DNA and RNA 3
DNA molecules consist of millions of these nucleotides
The beginning of a DNA strand is annotated 5’ and its end 3’; the labels refer to the numbered carbon atoms of the sugar (the 5’ end carries a free phosphate group, the 3’ end a free hydroxyl group)
Example: 3’- GTAACGGTCA -5’
... DNA and RNA 4
DNA consists of two complementary strands
• The two strands are entangled and linked by hydrogen bonds (weak links) on the inside of the structure
• One strand is the coding (sense) strand, the other the anti-coding (antisense) strand
... DNA and RNA 5
DNA consists of two complementary strands
• A strand = a sequence of nucleotides
• The base pairs are always between a pyrimidine and a purine: A-T and C-G
• The two strands run in opposite directions (they are antiparallel)
• On the exterior: the main chain, or backbone
5’- CATTGCCAGT -3’
    ||||||||||
3’- GTAACGGTCA -5’
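The pairing rules translate directly into code. A minimal Python sketch with illustrative function names: `complement` pairs each base in place (so a strand written 5’ to 3’ gives its partner written 3’ to 5’, as in the figure), and `reverse_complement` flips the result so the partner strand is also read 5’ to 3’.

```python
# Watson-Crick pairing rules for DNA: A-T and C-G
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Pair each base in place; a 5'->3' input yields the partner written 3'->5'."""
    return "".join(DNA_PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Partner strand written in the conventional 5'->3' direction."""
    return complement(strand)[::-1]

coding = "CATTGCCAGT"              # 5'- CATTGCCAGT -3'
print(complement(coding))          # GTAACGGTCA (read 3'->5', as in the figure)
print(reverse_complement(coding))  # ACTGGCAATG (same strand, read 5'->3')
```

Applying `reverse_complement` twice returns the original strand, which mirrors the fact that each strand fully determines the other.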
RNA consists of a single strand
• These molecules also assume a 3D form, in which complementary parts of the RNA strand can interact through hydrogen bonds
• The base pairing is now between A-U and C-G
• Uracil (U) replaces thymine (T) in RNA!
... DNA and RNA 6
DNA replication
• During cell division the two DNA strands are separated
• Both strands act as templates on which the complementary strands are formed
• The genetic information is preserved in this way
• Error rate = about 1 per 10^9 bases
The central dogma of molecular biology
• Genes are expressed as proteins in two steps
• Step 1: transcription. Parts of the DNA are copied by RNA polymerase into shorter RNA strands called messenger RNA (mRNA)
• Step 2: translation. The mRNA is translated by the ribosome into a sequence of amino acids = a protein
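The error rate gives a quick estimate of how many new copying errors appear each time a genome is replicated: expected errors = number of bases × error rate. A minimal sketch, assuming the rate of 1 per 10^9 bases applies uniformly to every base and using the sizes from the genome table (one haploid copy); `expected_errors` is an illustrative name:

```python
ERROR_RATE = 1e-9  # errors per base copied (value from the slide)

def expected_errors(size_mb: float, error_rate: float = ERROR_RATE) -> float:
    """Expected copying errors when one genome copy of size_mb megabases is replicated."""
    return size_mb * 1_000_000 * error_rate

for name, size_mb in [("Escherichia coli", 4.6), ("Homo sapiens", 3400)]:
    print(f"{name}: about {expected_errors(size_mb):.4g} errors per replication")
```

Under these assumptions one copy of the human genome picks up about 3 errors per replication (a diploid cell, with two copies, about twice that), while an E. coli genome is copied without any error in most divisions.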
Transcription
Video on transcription and translation
• mRNA is a strand that is complementary to the non-coding (template) strand of the DNA, so it has the same sequence as the coding strand, with U instead of T
• The number of mRNA copies of a gene corresponds to the expression level of that gene in the cell
RNA 5’- CAUUGCCAGU -3’ (mRNA)
DNA 3’- GTAACGGTCA -5’ (non-coding strand)
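The example can be reproduced by applying the RNA pairing rules (A-U, C-G) to the template (non-coding) strand. The sketch assumes the template is written 3’ to 5’ as on the slide, so the mRNA comes out 5’ to 3’; it also checks the shortcut that the mRNA equals the coding strand with U in place of T. Function names are illustrative.

```python
# Pairing between a DNA template base and the RNA base built opposite it
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe_template(template_3_to_5: str) -> str:
    """Build the mRNA (5'->3') opposite a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())

def mrna_from_coding(coding_5_to_3: str) -> str:
    """Same mRNA, obtained from the coding strand by replacing T with U."""
    return coding_5_to_3.upper().replace("T", "U")

print(transcribe_template("GTAACGGTCA"))  # CAUUGCCAGU
print(mrna_from_coding("CATTGCCAGT"))     # CAUUGCCAGU
```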
Translation
• Translation from mRNA to proteins
• The mRNA is read in blocks of 3 nucleotides (= codon)
• Translation uses transfer RNA (tRNA)
• 4^3 possibilities = 64 codons, BUT only 20 amino acids
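Counting the possible codons is a one-liner: each of the 3 positions can hold any of the 4 bases, so there are 4 × 4 × 4 = 64 triplets for only 20 amino acids. A small Python check:

```python
from itertools import product

RNA_BASES = "ACGU"

# Every possible triplet of the four RNA bases
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

print(len(codons))       # 64 = 4**3
print(codons[:4])        # ['AAA', 'AAC', 'AAG', 'AAU']
print(len(codons) / 20)  # 3.2 codons per amino acid on average (stop codons included)
```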
Translation 2
• The translation is done by the ribosome
• The ribosome starts at the 5’ end and moves in the direction of the 3’ end
• It catches tRNA molecules, which can associate with specific codons in the mRNA
• The amino acid carried by each tRNA is linked through a peptide bond to the already existing chain
• Codons do not overlap
• The genetic code is degenerate: several codons can code for the same amino acid
Can you see the relation with a Turing-like machine?
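The Turing-machine analogy can be made concrete: the ribosome behaves like a read head that moves along the mRNA tape from 5’ to 3’ in non-overlapping steps of three and halts at a stop codon. The sketch below is a minimal Python illustration, not a model of the real ribosome; it uses the standard genetic code (NCBI translation table 1, encoded compactly in the usual U, C, A, G order) and simplifies by starting at the first AUG and ignoring an incomplete final triplet. `translate` is an illustrative name.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with the
# 1st, 2nd and 3rd base each in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str, start_at_aug: bool = True) -> str:
    """Read an mRNA 5'->3' in non-overlapping triplets and return one-letter amino acids."""
    mrna = mrna.upper()
    pos = mrna.find("AUG") if start_at_aug else 0
    if pos == -1:
        return ""                    # no start codon found
    protein = []
    while pos + 3 <= len(mrna):      # the read head moves 3 bases at a time
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":                # stop codon: halt
            break
        protein.append(aa)
        pos += 3
    return "".join(protein)

print(translate("CAUUGCCAGU", start_at_aug=False))  # HCQ (His Cys Gln)
print(translate("GGAUGUGCCAUUAAGG"))                # MCH (stops at UAA)
```

The table also makes the degeneracy visible: 64 codons map onto 20 amino acids plus 3 stop codons.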
The genetic code
RNA      5’- CAU UGC CAG U -3’ (mRNA)
Protein      His Cys Gln ...
• In most cases translation starts with the codon AUG = M(ethionine)
Evolution
• All cells are derived from a common ancestor
• Prokaryotes diverged into two big groups
• The analysis of the different genomes provides information on the evolutionary relationships between the different species
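One of the crudest ways to compare genomes is to count how often two aligned sequences differ, the so-called p-distance. A minimal sketch, assuming the sequences are already aligned to the same length with no gaps; real evolutionary analyses use alignment algorithms and substitution models, and the example sequences below are invented:

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

# Made-up example sequences, for illustration only
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2 (2 differences out of 10)
```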
Evolution 2
• Heredity = transfer of genetic information from the parent to the offspring
• When a cell divides, the DNA is copied and divided over the two new cells
• Sometimes this process produces errors (mutations) that can:
  • improve the functionality of the cell = selective advantage
  • destroy the functionality of the cell = the cell dies
  • not change the functionality of the cell = selectively neutral
Evolution 3
• Continuous trial and error allows cells and organisms to evolve
• Certain DNA parts are more prone to evolutionary change than others, e.g. the non-coding regions in DNA
• Yet regions important for protein function need to be conserved
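Because the genetic code is degenerate, a point mutation in a coding region does not always change the protein. The sketch classifies a single-base substitution in an mRNA codon as synonymous (same amino acid, a candidate for a selectively neutral change), missense (different amino acid), nonsense (premature stop) or stop-loss. This maps only loosely onto the three outcomes above, since whether a change helps or harms the cell depends on much more; it assumes the standard genetic code (NCBI translation table 1), and `classify_substitution` is an illustrative name.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); '*' = stop codon
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at position 0, 1 or 2 of an mRNA codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"   # same amino acid: protein unchanged
    if after == "*":
        return "nonsense"     # premature stop codon: truncated protein
    if before == "*":
        return "stop-loss"    # stop codon turned into an amino acid
    return "missense"         # one amino acid replaced by another

print(classify_substitution("CAU", 2, "C"))  # synonymous (CAC still codes His)
print(classify_substitution("CAU", 2, "G"))  # missense (CAG codes Gln)
print(classify_substitution("UGC", 2, "A"))  # nonsense (UGA is a stop codon)
```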
Evolution 4
New genes are built out of old ones in 2 ways
• Duplication and divergence
• Paralog genes = two genes in the same species that were produced by duplication and that diverged in their structure and/or function
• Ortholog genes = the same gene in two different species, derived from the same gene in their last common ancestor
Proteins
• The majority of objects inside a cell are proteins
• Proteins are ...
  • the elements that define the structure of a cell
  • the robots that perform almost every function inside the cell: catalysing reactions (enzymes), regulating the expression of genes (transcription factors), giving structure to the cell (cytoskeleton), signal transduction, ...
• The human genome contains roughly 20,000 protein-coding genes (early estimates were 30,000 to 35,000)
Structural levels
The structure of a protein can be described at different levels
Primary structure
Like DNA and RNA, proteins are sequences of highly modular building blocks = amino acids
Amino acids
All amino acids (AA) are built around a central carbon atom (Cα) that carries an amino group (NH2), a carboxyl group (COOH), a hydrogen atom and a side chain
The difference between the amino acids is in their side chain, which can be hydrophobic or hydrophilic
Amino acids 2
• The AA are chained together by peptide bonds
• For this reason proteins are also called polypeptides
• The (primary) structure is defined by the amino acids and their order in the sequence
Folding
• Hydrophobic amino acids point towards the interior, hydrophilic ones towards the outside
Folding 2
• The possible conformations of the main chain (backbone) are limited
• Torsion angles: the peptide bonds are planar, so rotation is only allowed around the N-Cα (φ) and Cα-C (ψ) bonds
• Only the φ and ψ angles that do not produce collisions are acceptable
• The Ramachandran plot shows the angles that are observed in existing proteins
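The φ and ψ angles are dihedral (torsion) angles, and each can be computed from the coordinates of four consecutive backbone atoms. A minimal pure-Python sketch of the standard projection-based formula; the sign convention is intended to match the usual one (positive for clockwise rotation viewed along the central bond), and the coordinates in the example are made-up points, not a real protein. Function names are illustrative.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (between -180 and 180) around the bond p1-p2."""
    b0 = _sub(p0, p1)                                 # first bond, pointing away from p1
    b1 = _sub(p2, p1)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))   # unit vector along the central bond
    b2 = _sub(p3, p2)                                 # last bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates chosen to give easy angles
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)), 1))   # 0.0 (cis)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))   # -90.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)), 1))  # 180.0 (trans)
```

For residue i, φ is the dihedral of the backbone atoms C(i-1), N(i), Cα(i), C(i), and ψ that of N(i), Cα(i), C(i), N(i+1); plotting (φ, ψ) for every residue gives a Ramachandran plot.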
Folding 3
The sequence of amino acids folds into its lowest-energy conformation
BUT remember that the structure is not static: it breathes!
Binding to another protein or molecule may cause structural changes
Tertiary structure
The folded 3D structure is the tertiary structure of a protein
• The folding of a protein is orchestrated by three types of interactions between the AA
• The stability of the protein is determined by the combination of all the forces between the residues
• These interactions are 30 to 300 times weaker than a peptide bond
• We discuss only globular proteins here; another type is the fibrous protein
Secondary structure
α helices and β strands
Analyzing the tertiary structure of proteins has revealed some geometric regularities
• Helices and strands are produced by hydrogen bonds between the NH and CO groups of the backbone (in red in the figure)
• Between 50% and 80% of the residues in a protein can be classified in terms of these regular structures
α helices
• In the standard (α) helix, the CO group of residue i is hydrogen-bonded to the NH group of residue i+4, so the interacting residues are separated by three other residues
• Certain amino acids are preferred in helices: Ala (A), Glu (E), Leu (L) and Met (M)
• Pro (P), Gly (G), Tyr (Y) and Ser (S) never or rarely occur in helices
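The i to i+4 rule fixes which residues are hydrogen-bonded once the helix boundaries are known. A minimal sketch for an ideal helix; `helix_hbonds` is an illustrative name, and the `offset` parameter is an addition: 4 for the α helix, 3 for the tighter 3₁₀ helix.

```python
def helix_hbonds(first: int, last: int, offset: int = 4) -> list[tuple[int, int]]:
    """Backbone hydrogen bonds in an ideal helix spanning residues first..last.

    Each pair (i, i + offset) means: the C=O of residue i accepts a hydrogen
    bond from the N-H of residue i + offset (offset=4: alpha helix).
    """
    return [(i, i + offset) for i in range(first, last - offset + 1)]

bonds = helix_hbonds(1, 8)
print(bonds)                          # [(1, 5), (2, 6), (3, 7), (4, 8)]
print({j - i - 1 for i, j in bonds})  # {3}: three residues between bonded partners
```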
α helices and β strands 2
α helices and β strands 3
β strands
• The hydrogen bonds are between two different parts of the protein
• The combination of β strands = β sheet
• The β strands can be organised in a parallel or anti-parallel manner
Loops
• Loops often play an important role in the function of a protein
Protein domains
Proteins can also consist of multiple globular parts = domains
Domains introduce
modularity in the protein
structure
Protein domains 2
Structure 1996, Vol 4 No 5 (page 496)
Let’s focus on WW domains for the rest of this session
• Each kind of domain has its own functionality
• The domain size is between 40 and 350 AA
• Domains can fold independently
Figure 1 [domain diagrams not reproduced; proteins shown: Nedd4, Rsp5, YAP65, DP71, CD45AP, 38D4, Dodo, ess-1, K01A6.1, ORF1, Db10, FE65, 56G7, dystrophin/utrophin, Yo61 and Ykb2, from human, rat, mouse, chicken, fish, fly, nematode, yeast and tobacco]
Schematic representation of several proteins containing WW domains (red). A single boxed W (e.g. in DP71 and K01A6.1) represents a portion of the domain containing either the first or second conserved tryptophan. The C2 domain (a domain known to mediate Ca2+-dependent association with phospholipids/membranes) is found in Nedd4/Rsp5 and also in PKC, PLA2, PLC, rasGAP, synaptotagmin I and other proteins. The HECT domain in Nedd4/Rsp5 and 56G7 is a ubiquitin ligase (E3) enzyme also present in E6AP, the yeast ykl162, rat p100 and UreB1. Pro represents the proline-rich (SH3 binding) region; Cys, cysteine-rich region; TM, transmembrane domain of CD45 associated protein (CD45AP); PH, pleckstrin homology domain also found in dynamin, SOS, PLCγ, IRS-1, rasGAP and Btk; BCR (breakpoint cluster region) homology domain, also shared by p85 of PI-3 kinase, rhoGAP and n-chimerin; PTB/PID domains, recently suggested to be a subclass of PH domains, are also present in Shc, numb and IRS-1; PPIase, peptidylprolyl cis-trans isomerase, known to associate with transcription factors; PTP, homologous to protein tyrosine phosphatases. Actin binding (CH, calponin homology) domain is homologous to calponin, actinin, vav and spectrin. In human ORF1 (IQGAP1), the Mp domain is homologous to the fly muscle protein mp20, and the GTPase activator is a rasGAP domain. The Y domain, shared by Yo61 and Ykb2, has no known function. Sizes of all proteins and domains are not to scale.
Let’s take a look at the article
ww-and-sh3-domains.pdf
… hydrophobic interactions with prolines. Future peptide library screens, mutation analysis and structure determination of WW domains from unrelated proteins (e.g. dystrophin, formin-binding proteins) should help in the elucidation of the differences in binding specificity between the different WW domains.
Function of WW domains
Protein domains 3
M.J. Macias et al. / FEBS Letters 513 (2002) 30–37
As the WW domains were first described only recently, identification of their role in the various proteins which harbour them is in its infancy. Nevertheless, exciting clues are emerging, some related to human genetic disorders such as Liddle’s syndrome or muscular dystrophy.
Liddle’s syndrome is a hereditary form of systemic renal hypertension. It is characterized by increased Na+ absorption in the distal nephron [14], which is caused by increased activity of the epithelial Na+ channel [15]. Recent genetic linkage analyses have demonstrated that the disease is caused by effective deletion of regions within the C termini of β [16] or γ [17] ENaC, invariably causing loss of the PY motifs in these subunits. Such deletions lead to increased activity of the channel, which is probably a result of an increased number of active channels at the plasma membrane [9,15,18]. We have recently identified Nedd4 (NPC expressed developmentally downregulated) as the binding partner for ENaC. Nedd4 [19] contains a C2 domain, 3 (or 4 in the human) WW domains, and a ubiquitin-ligase HECT (homology to the E6-AP C terminus) domain (Fig. 1). ENaC–Nedd4 interaction is mediated by the WW domains of Nedd4 which bind to the PY motifs of α, β and γ ENaC [8]. Mutations within the PY motif of β ENaC have been recently identified in Liddle’s patients [20,21]. These were shown to cause increased channel activity [9] and to lead to abrogation of Nedd4–WW binding [8]. As Nedd4 contains a ubiquitin-ligase domain, we speculate that this protein may be a suppressor of the epithelial Na+ channel; in Liddle’s syndrome patients, in which Nedd4-binding sites (PY motifs) in the channel subunits are lost, channel ubiquitination and degradation may be impaired, resulting in an increased number of active channels at the plasma membrane. It is interesting that a similar role, involving regulation of the number of transporters (permeases) at the plasma membrane, was recently proposed for the yeast homologue of Nedd4, Rsp5/Npi1 [22].
RSP5, an essential gene in yeast [22], was originally identified as a suppressor of mutations in the SPT3 gene, a transcription factor interacting with the TATA-binding protein TFIID [23]; these genetic interactions are likely to be indirect as RSP5 mutations suppress a deletion in SPT3. In …
Protein domains 4
Read sections 1 and 2 of the article "WW and SH3 domains: two different scaffolds to recognize proline-rich ligands" (2002) by Macias, Wiener and Sudol.
The family of WW domains has some highly conserved amino acids:
• the N- and C-terminal W residues (hence the name)
• the C-terminal P
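The conserved residues suggest a simple text search over protein sequences. The sketch assumes the two tryptophans are 20 to 22 residues apart (a spacing commonly quoted for WW domains) and that the conserved C-terminal proline sits three positions after the second tryptophan; the pattern is a hypothetical simplification, not the official PROSITE signature, and real domain detection relies on profiles or hidden Markov models. `WW_LIKE` and `find_ww_like` are illustrative names, and the test sequence is synthetic.

```python
import re

# Hypothetical, simplified WW signature (an assumption, not the PROSITE pattern):
# a Trp, 20 to 22 residues, a second Trp, any two residues, then a Pro.
WW_LIKE = re.compile(r"W.{20,22}W..P")

def find_ww_like(protein: str) -> list[tuple[int, int]]:
    """Return (start, end) positions of non-overlapping WW-like matches."""
    return [m.span() for m in WW_LIKE.finditer(protein.upper())]

# Synthetic sequence: 21 residues between the two Trp, then two residues and a Pro
toy = "MSA" + "W" + "A" * 21 + "W" + "TTP" + "KL"
print(find_ww_like(toy))  # [(3, 29)]
```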
[Figure from the article: multiple sequence alignment of selected WW domain sequences generated with Clustal W. The strictly conserved residues are boxed in black and the semi-conserved residues, previously used for classifying the WW sequences, are boxed in yellow. Other conserved residues are shown in gray, negatively charged residues in blue and positively charged residues in brown boxes. Residues shown in red correspond to the suggested second binding site, as explained in the text.]
… vitro and in vivo, suggesting that this modification could represent a negative regulation mechanism for a large subset of WW domains. Little is known about the regulation of the third strand. The two structural prolines of the peptide […] are packed between the highly conserved aromatic residues […] …
Analyse the text and determine what you
understand and don’t understand