Computational Biology and Bioinformatics
Some relevant molecular biology

The cell
• The fundamental unit of life
• All cells have very similar internal mechanisms
• Cells reproduce themselves, passing along all the information necessary to reproduce all functionality (heredity)
• Cells are a bit like computers, since they have to respond to all kinds of information: food, threats, ...

The cell 2
• There are two big classes: prokaryotes and eukaryotes
• Prokaryotes have no nucleus
• The internal structure of eukaryotic cells is much more complex

The cell 3
• All information concerning the functional and structural elements of the cell is encoded in the DNA (deoxyribonucleic acid)
• Information from one kind of cell can be treated by other kinds of cells
• DNA is part of a chromosome
• DNA contains information to produce thousands of proteins
• A gene is a part of a DNA string that encodes a particular protein
• The genome is the collection of all DNA molecules

The cell 4
• A human has 46 DNA molecules in every cell, organised in chromosomes
• In prokaryotes, like bacteria, there is often one circular DNA molecule

The size of the genome for different species (1 Mb = 1 000 000 bases)

Organism                     Year   Size (Mb)
Mycoplasma genitalium        1995   0.6
Haemophilus influenzae       1995   1.8
Escherichia coli             1997   4.6
Saccharomyces cerevisiae     1996   12
Schizosaccharomyces pombe    2002   14
Caenorhabditis elegans       1998   97
Arabidopsis thaliana         2001   120
Oryza sativa                 2002   5 000
Drosophila melanogaster      2000   180
Gallus gallus                2004   1 200
Rattus norvegicus            2004   2 900
Mus musculus                 2002   3 400
Homo sapiens                 2001   3 400

The structure of ... DNA and RNA
• DNA and RNA are linear structures that consist of 4 types of nucleotides
• DNA (deoxyribonucleic acid): Watson and Crick (1953)
• RNA (ribonucleic acid)
• A nucleotide is constructed of 3 parts (Fig. A): a base, a sugar and a phosphate group
• The difference between the four types is in the base part, which can be either A, T, G or C
DNA and RNA 3
• DNA molecules consist of millions of these nucleotides
• A strand = a sequence of nucleotides
• The beginning of a DNA strand is annotated with 5’ and the end with 3’

DNA and RNA 4
• DNA consists of two complementary strands
• The base pairs are always between a pyrimidine and a purine: A-T and C-G
• The two strands run in opposite directions (antiparallel)
• At the exterior: the main chain or backbone
• At the inside: the two strands are intertwined and linked by hydrogen bonds (weak links)

    5’ - CATTGCCAGT - 3’    coding strand / sense
         ||||||||||
    3’ - GTAACGGTCA - 5’    anti-coding strand / antisense

DNA and RNA 5
• RNA consists of a single strand
• These molecules also assume a 3D form, where complementary parts of the RNA strand can interact through hydrogen bonds
• The base pairing is now between A-U and C-G
• Uracil (U) replaces thymine (T) in RNA!

The central dogma of molecular biology ...
• The genes are translated into proteins in two steps
• Step 1: transcription. Parts of the DNA are copied into shorter RNA strands called messenger RNA (mRNA)
• Step 2: translation. The mRNA is translated by the ribosome into a sequence of amino acids = a protein

DNA and RNA 6
DNA replication
• During cell division the two DNA strands are separated
• Both strands act as templates on which the complementary strands are formed
• Error rate = 1 per 10^9 bases
• The genetic information is preserved in this way

Transcription
• Video on transcription and translation
• mRNA is a strand that is complementary to the non-coding strand of the DNA
• The number of mRNA copies corresponds to the gene expression level in the cell

    RNA  5’ - CAUUGCCAGU - 3’  (mRNA)
    DNA  3’ - GTAACGGTCA - 5’  (non-coding strand)

Translation
• Translation from mRNA to proteins
• The mRNA is read in blocks of 3 nucleotides (= codon)
• 4^3 = 64 possible codons BUT only 20 amino acids
• The genetic code is degenerate
• Uses transfer RNA (tRNA) to perform the translation
• The translation is done by the ribosome

Translation 2
• The ribosome starts at the 5’ end and moves in the direction of the 3’ end
• It catches tRNA molecules, which can associate with specific codons in the mRNA
• The amino acids linked to the tRNA become attached through a peptide bond to the already existing sequence
• Codons do not overlap
• Can you see the relation with a Turing-like machine?

The genetic code

    RNA      5’ - CAU UGC CAG U - 3’  (mRNA)
    Protein       His Cys Gln ...
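The base-pairing, transcription and translation rules from the slides above can be tried out on the example sequence with a short Python sketch. The function and variable names are illustrative only and do not come from the course material. Transcription is simplified to "copy the coding strand and replace T by U", which gives the same mRNA as complementing the non-coding strand. The codon table is the standard genetic code, with amino acids written as one-letter codes (His = H, Cys = C, Gln = Q) and stop codons as `*`.

```python
from itertools import product

# Watson-Crick base pairs in DNA
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Standard genetic code: 4^3 = 64 codons in U, C, A, G order,
# third base varying fastest; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(strand):
    """Return the complementary DNA strand, also written 5' -> 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


def transcribe(coding_strand):
    """Simplified transcription: the mRNA reads like the coding strand with U for T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Read non-overlapping codons from the 5' end; stop at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    coding = "CATTGCCAGT"                # 5' -> 3', from the slide
    print(reverse_complement(coding))    # non-coding strand, read 5' -> 3'
    mrna = transcribe(coding)
    print(mrna)
    print(translate(mrna))               # incomplete trailing codon is ignored
```

The table also shows the degeneracy of the code: 64 codons map onto only 20 amino acids plus the stop signal, so several codons encode the same amino acid.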
• In most cases the translation starts with the codon AUG = M(ethionine)

Evolution
• All cells are derived from a common ancestor
• The analysis of the different genomes provides information on the evolutionary relationships between the different species
[Figure labels: Multi-cellular organisms; Prokaryotes diverged into two big groups]

Evolution 2
• Heredity = transfer of genetic information from the parent to the offspring
• When a cell divides, the DNA is copied and divided over the two new cells
• Sometimes this process produces errors (mutations) that can:
  • improve the functionality of the cell = selective advantage
  • destroy the functionality of the cell = the cell dies
  • not change the functionality of the cell = selectively neutral

Evolution 3
• Continuous trial and error allows cells and organisms to evolve
• Certain DNA parts are more prone to evolutionary change than others, e.g. the non-coding regions in DNA
• Yet regions important for protein function need to be conserved

Evolution 4
• New genes are built out of old ones in 2 ways
• Duplication and divergence
• Paralog genes = two genes in the same species that were produced by duplication and that diverged in their structure and/or function
• Ortholog genes = the same gene in two different species, derived from the same gene in their last common ancestor

Proteins
• The majority of objects inside a cell are proteins
• Proteins are ...
  • the elements that define the structure of a cell
  • the robots that perform almost every function inside the cell: catalysing reactions (enzymes), regulating the expression of genes (transcription factors), giving structure to the cell (cytoskeleton), signal transduction, ...
• The human genome was estimated to encode 30 000 to 35 000 proteins (current estimates are closer to 20 000 protein-coding genes)

Structural levels
• The structure of a protein can be described at different levels

Primary structure
• Like DNA and RNA, proteins are sequences of highly modular building blocks = amino acids

Amino acids
• All amino acids (AA) are built from the same parts: the central Cα, the amino (nitrogen) group, the carboxyl group and the side-chain group
• The difference between the amino acids is in their side chain
[Figure labels: hydrophobic; hydrophilic]

Amino acids 2
• The AA are chained together by peptide bonds
• For this reason proteins are also called polypeptides
• The structure is defined by the amino acids and their order in the sequence

Folding
• Hydrophobic amino acids point towards the interior, hydrophilic ones to the outside

Folding 2
Torsion angles
• The possible conformations of the main chain (backbone) are limited
• The peptide bonds are planar. Rotation is only allowed around the N-Cα (φ) and Cα-C (ψ) bonds
• Only the φ and ψ angles that do not produce collisions are acceptable
• The Ramachandran plot shows the angles that are observed in existing proteins

Folding 3
• The sequence of amino acids folds into the lowest-energy conformation
• BUT remember the structure is not static, it breathes!
• Binding to another protein or molecule may cause structural changes

Tertiary structure
• The folded 3D structure is the tertiary structure of a protein
• The folding of a protein is orchestrated by three types of interactions between the AA
• The stability of the protein is determined by the combination of all the forces between the residues
• The strength of these interactions is 30 to 300 times weaker than a peptide bond
• We discuss only globular proteins here; another type is the fibrous protein

Secondary structure
• Analysing the tertiary structure of proteins has resulted in the discovery of some geometric regularities: α helices and β strands
• Helices and strands in the structure are produced by hydrogen bonds between the NH and CO groups (in red in the figure: the interactions between NH and CO)
• Between 50% and 80% of the residues in a protein can be classified in terms of these regular structures

α helices and β strands
α helices
• In the standard helix, the interacting residues are separated by three other residues (residue i bonds to residue i+4)
• Certain amino acids are preferred in helices: Ala (A), Glu (E), Leu (L) and Met (M)
• Pro (P), Gly (G), Tyr (Y) and Ser (S) never or rarely occur in helices

α helices and β strands 2
β strands
• The hydrogen bonds are between two different parts of the protein
• The combination of β strands = β sheet
• The β strands can be organised in a parallel and anti-parallel manner

α helices and β strands 3
Loops
• Loops often play an important role in the function of a protein

Protein domains
• Proteins can also consist of multiple globular parts = domains
• Domains introduce modularity in the protein structure
• Each kind of domain has its own functionality
• The domain size is between 40 and 350 AA
• Domains can fold independently

Protein domains 2
• Let’s focus on WW domains for the rest of this session
• Let’s take a look at the article ww-and-sh3-domains.pdf

Figure 1 (Structure 1996, Vol 4 No 5, p. 496)
[Diagram of domain architectures. Proteins shown: Nedd4, Rsp5, YAP65, DP71, CD45AP, 38D4, dodo, ess-1, K01A6.1, ORF1, Db10, FE65, 56G7, dystrophin/utrophin, Yo61 and Ykb2]
Schematic representation of several proteins containing WW domains (red). A single boxed W (e.g. in DP71 and K01A6.1) represents a portion of the domain containing either the first or second conserved tryptophan. The C2 domain (a domain known to mediate Ca2+-dependent association with phospholipids/membranes) is found in Nedd4/Rsp5 and also in PKC, PLA2, PLC, rasGAP, synaptotagmin I and other proteins. The HECT domain in Nedd4/Rsp5 and 56G7 is a ubiquitin ligase (E3) enzyme also present in E6AP, the yeast ykl162, rat p100 and UreB1. Pro represents the proline-rich (SH3 binding) region; Cys, cysteine-rich region; TM, transmembrane domain of CD45 associated protein (CD45AP); PH, pleckstrin homology domain also found in dynamin, SOS, PLCγ, IRS-1, rasGAP and Btk; BCR (breakpoint cluster region) homology domain, also shared by p85 of PI-3 kinase, rhoGAP and n-chimerin; PTB/PID domains, recently suggested to be a subclass of PH domains, are also present in Shc, numb and IRS1; PPIase, peptidylprolyl cis-trans isomerase, known to associate with transcription factors; PTP, homologous to protein tyrosine phosphatases. Actin binding (CH, calponin homology) domain is homologous to calponin, actinin, vav and spectrin. In human ORF1 (IQGAP1), the Mp domain is homologous to the fly muscle protein mp20, and the GTPase activator is a rasGAP domain. The Y domain, shared by Yo61 and Ykb2, has no known function. Sizes of all proteins and domains are not to scale.

... hydrophobic interactions with prolines.
Future peptide library screens, mutation analysis and structure determination of WW domains from unrelated proteins (e.g. dystrophin, formin-binding proteins) should help in the elucidation of the differences in binding specificity between the different WW domains.

Protein domains 3
Function of WW domains

As the WW domains were first described only recently, identification of their role in the various proteins which harbour them is in its infancy. Nevertheless, exciting clues are emerging, some related to human genetic disorders such as Liddle’s syndrome or muscular dystrophy.

Protein domains 4

Liddle’s syndrome is a hereditary form of systemic renal hypertension. It is characterized by increased Na+ absorption in the distal nephron [14], which is caused by increased activity of the epithelial Na+ channel [15]. Recent genetic linkage analyses have demonstrated that the disease is caused by effective deletion of regions within the C termini of β [16] or γ [17] ENaC, invariably causing loss of the PY motifs in these subunits. Such deletions lead to increased activity of the channel, which is probably a result of an increased number of active channels at the plasma membrane [9,15,18]. We have recently identified Nedd4 (NPC expressed developmentally downregulated) as the binding partner for ENaC. Nedd4 [19] contains a C2 domain, 3 (or 4 in the human) WW domains, and a ubiquitin-ligase HECT (homology to the E6-AP C terminus) domain (Fig. 1). ENaC–Nedd4 interaction is mediated by the WW domains of Nedd4, which bind to the PY motifs of α, β and γ ENaC [8]. Mutations within the PY motif of β ENaC have been recently identified in Liddle’s patients [20,21]. These were shown to cause increased channel activity [9] and to lead to abrogation of Nedd4–WW binding [8]. As Nedd4 contains a ubiquitin-ligase domain, we speculate that this protein may be a suppressor of the epithelial Na+ channel; in Liddle’s syndrome patients, in which Nedd4-binding sites (PY motifs) in the channel subunits are lost, channel ubiquitination and degradation may be impaired, resulting in an increased number of active channels at the plasma membrane. It is interesting that a similar role, involving regulation of the number of transporters (permeases) at the plasma membrane, was recently proposed for the yeast homologue of Nedd4, Rsp5/Npi1 [22].

RSP5, an essential gene in yeast [22], was originally identified as a suppressor of mutations in the SPT3 gene, a transcription factor interacting with the TATA-binding protein TFIID [23]; these genetic interactions are likely to be indirect as RSP5 mutations suppress a deletion in SPT3. In ...

• Read sections 1 and 2 of the article "WW and SH3 domains: two different scaffolds to recognize proline-rich ligands" (2002) by Macias, Wiener and Sudol
• The family of WW domains has some highly conserved amino acids
• See the N- and C-terminal W (hence the name) and the C-terminal P

Fig. ... Multiple sequence alignment of selected WW domain sequences generated with Clustal X [...]. The strictly conserved residues are boxed in black and the semi-conserved residues, previously used for classifying the WW sequences [...], are boxed in yellow. Other conserved residues are shown in gray, negatively charged residues in blue and positively charged residues in brown boxes. Residues shown in red correspond to the suggested second binding site, as explained in the text.

... vitro [...] and in vivo [...], suggesting that this modification could represent a negative regulation mechanism for a large subset of WW domains. Little is known about the regulation of the third strand.

The two structural prolines of the peptide [...] are packed between the highly conserved aromatic residues [...] and the W[...]. Compared to the ...

• Analyse the text and determine what you understand and don’t understand