Lecture 7
• Introduction to signaling pathways
• Reverse engineering of biological networks
• Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
• Self-organizing mapping (SOM)

Introduction to signaling pathways

Signaling networks are involved in the transduction of a "signal", usually from the outside to the inside of the cell. At the molecular level, signaling involves the same types of processes as metabolism: production and degradation of substances, molecular modifications (mainly phosphorylation, but also methylation and acetylation), and activation or inhibition of reactions. However, signaling pathways serve to process or transfer information, whereas metabolism mainly provides mass transfer.

Signal transduction often involves:
• The binding of a ligand to an extracellular receptor
• The subsequent phosphorylation of an intracellular enzyme
• Amplification and transfer of the signal
• The resulting change in cellular function, e.g. an increase or decrease in the expression of a gene

Signaling paradigm

A signaling network usually has three principal parts:
• Events around the membrane
• Reactions that link sub-membrane events to the nucleus
• Events that lead to transcription
(Source: Systems Biology in Practice, E. Klipp et al.)

[Figure: Schematic representation of receptor activation. Source: Systems Biology in Practice, E. Klipp et al.]

Steroids

A receptor is not always located at the membrane; the steroid receptors are an example. Sterol lipids include hormones such as cortisol, estrogen, testosterone and calcitriol. These steroids simply cross the membrane of the target cell and bind to an intracellular receptor, which releases the inhibitory molecule from the receptor. The receptor then traverses the nuclear membrane and binds to its site on the DNA to trigger transcription of the target gene.
(Source: Systems Biology, Bernhard O. Palsson)

G-protein signaling

G-protein coupled receptors (GPCRs) are important components of the signal transduction network; this class of receptor comprises 5% of the genes in C. elegans. The G-protein complex consists of three subunits (α, β and γ) and, in its inactive state, is bound to guanosine diphosphate (GDP). When a ligand binds to the GPCR, the G-protein exchanges its GDP for guanosine triphosphate (GTP). This exchange leads to the dissociation of the G-protein from the receptor and its split into a βγ complex and a GTP-bound α subunit. The GTP-bound α subunit is the active state, which initiates other downstream processes.
(Source: Systems Biology, Bernhard O. Palsson)

[Figure: G-protein signaling model. Source: Systems Biology in Practice, E. Klipp et al.]

[Figure: G-protein signaling model, time course of G-protein activation. The total number of molecules is 10000. The concentration of GDP-bound Gα stays low over the whole period because it rapidly forms a complex with the heterodimer Gβγ. Source: Systems Biology in Practice, E. Klipp et al.]

The JAK-STAT network

The JAK-STAT signaling system is an important two-step process involved in multiple cellular functions, including cell growth and the inflammatory response.
• A cell-surface receptor often dimerizes upon binding a cytokine.
• The monomeric form of the receptor is associated with a kinase called JAK.
• When the receptor dimerizes, the JAKs induce phosphorylation of themselves and of the receptor; this is the active state of the receptor.
• The active complex phosphorylates the STAT (signal transducer and activator of transcription) molecules.
• The STAT molecules then dimerize, move into the nucleus and trigger transcription.
(Source: Systems Biology in Practice, E. Klipp et al.)

[Figure: Schematic representation of the MAP kinase cascade.]
An upstream signal causes phosphorylation of the MAPKKK. The phosphorylated MAPKKK in turn phosphorylates the protein at the next level.
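The cascade logic can be explored with a small simulation. This is a minimal sketch under assumptions that are not taken from the slides: each tier is reduced to a single phosphorylation step (real MAPKK and MAPK are activated by dual phosphorylation), the phosphorylated fraction x_i of a tier is produced at rate k_i times the active fraction of the tier above (the upstream signal S for the MAPKKK), and dephosphorylation is first-order with rate d_i. The function names and all rate values are hypothetical.

```python
def simulate_cascade(signal, k=(1.0, 1.0, 1.0), d=(1.0, 1.0, 1.0), dt=0.01, t_end=200.0):
    """Explicit-Euler integration of a hypothetical three-tier phosphorylation cascade.

    x[0], x[1], x[2] are the phosphorylated fractions of MAPKKK, MAPKK and MAPK.
    Tier 0 is activated by the upstream signal, tier i > 0 by the active form of tier i - 1,
    and every tier is dephosphorylated continuously with first-order rate d[i].
    """
    x = [0.0, 0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        upstream = [signal, x[0], x[1]]
        rates = [k[i] * upstream[i] * (1.0 - x[i]) - d[i] * x[i] for i in range(3)]
        x = [x[i] + dt * rates[i] for i in range(3)]
    return x


def steady_state(signal, k=(1.0, 1.0, 1.0), d=(1.0, 1.0, 1.0)):
    """Analytic fixed point: x_i = k_i u_i / (k_i u_i + d_i), u_i = activity of the tier above."""
    x, upstream = [], signal
    for k_i, d_i in zip(k, d):
        x_i = k_i * upstream / (k_i * upstream + d_i)
        x.append(x_i)
        upstream = x_i
    return x


if __name__ == "__main__":
    print(simulate_cascade(1.0))   # approaches the analytic values below
    print(steady_state(1.0))
```

With the default rates, the simulation settles at the analytic fixed point (0.5, 1/3, 0.25). Setting dx_i/dt = 0 shows that a tier's steady-state activity exceeds that of the tier above when k_i(1 − u_i) > d_i, which is one way amplification can arise in this toy model.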
Dephosphorylation is assumed to occur continuously, by phosphatases or by autodephosphorylation.
(Source: Systems Biology in Practice, E. Klipp et al.)

Signaling pathways in baker's yeast

[Figure: The HOG pathway, activated by osmotic shock; the pheromone pathway, activated by pheromones from cells of the opposite mating type; and the pseudohyphal growth pathway, stimulated by starvation conditions. A MAP kinase cascade is a part of many signaling pathways; in this figure its components are indicated by a bold border. Source: Systems Biology in Practice, E. Klipp et al.]

Reverse engineering of biological networks

The task of reverse engineering a genetic network is to reconstruct, in a qualitative way, the interactions among biological entities (genes, proteins, metabolites, etc.) from experimental data, using algorithms that weight the nature of the possible interactions with numerical values.
• In forward modeling, a network is constructed from known interactions, and its topological and other properties are subsequently analyzed.
• In reverse engineering, the network is estimated from experimental data and then used for further predictions.

Reverse engineering of gene regulatory networks

By clustering gene expression data, we can determine co-expressed genes. Co-expressed genes might have similar regulatory characteristics, but clustering gives no information about the nature of the regulation.

Here we discuss a reverse engineering method that estimates the regulatory relations between genes from gene expression data, taken from the following paper:
M. K. Stephen Yeung, Jesper Tegnér and James J. Collins, "Reverse engineering gene networks using singular value decomposition and robust regression", Proc. Natl. Acad. Sci. USA 99:6163-6168.

It is assumed that the dynamics, i.e. the rate of change of a gene product's abundance, is a function of the abundances of all genes in the network.
For all N genes, the system of equations is
$$\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_N), \qquad i = 1, \ldots, N.$$
In vector notation,
$$\dot{X} = f(X),$$
where $X = (x_1, \ldots, x_N)^T$ and $f(X)$ is a vector-valued function.

Under a linear assumption, i.e. when $dx_i/dt$ depends linearly on the $x_j$, we can write
$$\frac{dx_i}{dt} = \sum_{j=1}^{N} A_{ij}\, x_j, \qquad i = 1, \ldots, N.$$
Here $A_{ij}$ is the coupling parameter that represents the influence of $x_j$ on the expression rate of $x_i$. In other words, $A = (A_{ij})_{N \times N}$ represents a network of the regulatory relations among the genes, and the target of reverse engineering is to determine $A$. Solving for $A$ requires a large number of measurements of $\dot{X}$ and $X$.

Measuring $\dot{X}$ directly is difficult, so it is estimated in one of several ways:
• If time-series data can be obtained, $\dot{X}$ can be approximated from the expression profiles measured at fixed time intervals.
• Alternatively, a cellular system at steady state can be perturbed by external stimulation; $\dot{X}$ can then be determined by comparing gene expression in the perturbed cell population with that in an unperturbed reference population.

If, by any of these methods, we obtain M measurements and collect them as the columns of the N×M matrices $X = (x^1, \ldots, x^M)$ and $\dot{X} = (\dot{x}^1, \ldots, \dot{x}^M)$, we can write
$$\dot{X} = A X,$$
or, if external perturbations are used,
$$\dot{X} = A X + B,$$
where $B_{N \times M}$ is the matrix representing the effect of the perturbations. The goal of reverse engineering is to use the measured data $B$, $X$ and $\dot{X}$ to deduce $A$, i.e. the connectivity matrix of the regulatory relations among the genes.

Taking the transpose, the system can be rewritten as
$$X^T A^T = \dot{X}^T - B^T,$$
where $A$ is the unknown. If M = N and X has full rank, we can simply invert X to find A. Typically, however, M << N, mainly because perturbations and measurements are costly. We therefore have an underdetermined problem: the number of linearly independent equations is smaller than the number of unknowns.
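Why M < N leaves A undetermined can be checked on a toy case. This sketch uses hypothetical numbers (N = 3 genes, M = 2 experiments, B = 0) and plain nested lists: a vector n orthogonal to both columns of X satisfies n^T X = 0, so adding any outer product u n^T to A leaves A X unchanged. With N = 3 and M = 2, a cross product is enough to find n; the helper names and values are illustrative only.

```python
def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def cross(a, b):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


# Hypothetical toy data: N = 3 genes, M = 2 experiments, no perturbation term (B = 0).
X = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]                      # column m holds the expression state in experiment m
A_true = [[-1.0, 0.0, 0.5],
          [0.0, -1.0, 0.0],
          [0.8, 0.0, -1.0]]
X_dot = matmul(A_true, X)             # the "measured" rates of change

# n is orthogonal to both columns of X, so n spans ker(X^T) and n^T X = 0.
n = cross([row[0] for row in X], [row[1] for row in X])

# A + u n^T reproduces the same data, because (u n^T) X = u (n^T X) = 0.
u = [0.3, -2.0, 1.5]                  # arbitrary, hypothetical choice
A_other = [[A_true[i][j] + u[i] * n[j] for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    print(n)                          # [6.0, -3.0, 1.0]
    diff = max(abs(p - q) for r1, r2 in zip(matmul(A_other, X), X_dot) for p, q in zip(r1, r2))
    print(diff)                       # zero up to rounding
```

Every choice of u gives a different network that fits the data equally well, which is exactly the non-uniqueness discussed next.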
Because the problem is underdetermined, it has no unique solution. One way around this is to use singular value decomposition (SVD) to decompose $X^T$ into
$$X^T = U W V^T,$$
where U and V are each orthogonal, meaning
$$U^T U = V^T V = I,$$
with I the identity matrix, and W is diagonal:
$$W = \mathrm{diag}(w_1, w_2, \ldots, w_N).$$
Without loss of generality, we may assume that all nonzero elements $w_k$ are listed at the end, i.e. $w_1 = w_2 = \cdots = w_L = 0$ and $w_{L+1}, w_{L+2}, \ldots, w_N \neq 0$, where $L := \dim(\ker(X^T))$. Then one particular solution for A is
$$A_0 = (\dot{X} - B)\, U\, \mathrm{diag}(1/w_k)\, V^T,$$
where $1/w_k$ is replaced by zero whenever $w_k = 0$. The general solution is given by the affine space
$$A = A_0 + C V^T, \tag{3}$$
with $C = (c_{ij})_{N \times N}$, where $c_{ij} = 0$ if $j > L$ and is otherwise an arbitrary scalar coefficient. The family of solutions in Eq. 3 represents all the possible networks that are consistent with the microarray data. Among these solutions, the particular solution $A_0$ is the one with the smallest L2 norm. The question is which of the solutions of Eq. 3 is the best.

Here we can rely on insights from earlier work on gene regulatory networks and from bioinformatics databases, which suggest that naturally occurring gene networks are sparse, i.e. each gene generally interacts with only a small percentage of all the genes in the genome. Imposing sparseness on the family of solutions in Eq. 3 means choosing the coefficients $c_{ij}$ to maximize the number of zero entries in A. This is a nontrivial problem.

The task is equivalent to the problem of finding the exact-fit plane in robust statistics, where one tries to fit a hyperplane to a set of points containing a few outliers. The authors chose L1 regression, in which the figure of merit is the sum of the absolute values of the errors, because of its efficiency.

In short, this method of reverse engineering can produce multiple solutions (gene networks) that are consistent with given microarray data.
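The minimum-norm solution and the L1 selection step can be traced on a toy problem. This is a sketch under stated assumptions, not the authors' code: the data are hypothetical (N = 3 genes, M = 2 experiments, B = 0, so $Y = \dot{X}$); because X has full column rank, the minimum-norm particular solution can be written $A_0 = Y (X^T X)^{-1} X^T$, which coincides with the SVD form of $A_0$; and because $\ker(X^T)$ is one-dimensional (L = 1), a cross product stands in for the SVD when finding it. The L1 regression then has one free coefficient per row and is solved exactly by checking the breakpoints of the piecewise-linear objective. For L > 1 the L1 problem is usually posed as a linear program instead.

```python
def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inv2(G):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def cross(a, b):
    """Cross product of two 3-vectors (orthogonal to both)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def min_norm_solution(X, Y):
    """A0 = Y (X^T X)^{-1} X^T: smallest-norm A with A X = Y (X is N x 2, full column rank)."""
    Xt = transpose(X)
    return matmul(matmul(Y, inv2(matmul(Xt, X))), Xt)


def sparsest_row(a0, n):
    """1-D L1 regression: pick c minimising sum_j |a0_j + c * n_j|, return a0 + c * n.

    The objective is convex and piecewise linear, so a minimiser lies at a breakpoint
    c = -a0_j / n_j; all breakpoints are evaluated and the best one is kept.
    """
    def l1(c):
        return sum(abs(a + c * v) for a, v in zip(a0, n))
    breakpoints = [-a / v for a, v in zip(a0, n) if v != 0]
    c_best = min(breakpoints, key=l1)
    return [a + c_best * v for a, v in zip(a0, n)]


# Hypothetical sparse network: N = 3 genes, M = 2 experiments, B = 0.
X = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]
A_true = [[0.0, 0.0, 0.5],
          [0.0, -1.0, 0.0],
          [0.0, 0.8, 0.0]]
Y = matmul(A_true, X)                      # plays the role of X_dot - B

A0 = min_norm_solution(X, Y)               # dense, minimum-norm member of Eq. 3
n = cross(*transpose(X))                   # spans ker(X^T), proportional to the first column of V
A_sparse = [sparsest_row(row, n) for row in A0]

if __name__ == "__main__":
    for row in A_sparse:
        print([round(v, 6) for v in row])
```

Here A0 fits the data but is dense, while the L1 step recovers the sparse rows of A_true. Minimizing the L1 norm is a convex stand-in for maximizing the number of zeros; it succeeds on this example but need not recover every network.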
The paper argues that the sparsest of these solutions is the best one, and uses L1 regression to identify it.

Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS

[Figure: Metabolomics workflow: tissue samples, MS, metabolite information (molecular weight and formula, fragmentation pattern) and experimental information, a species-metabolite relation DB, and interpretation of the metabolome.]

Data processing from FT-MS data acquisition of a time-series experiment to assessment of cellular conditions
[Figure panels: (a) metabolite quantities for time-series experiments (E. coli growth curve, OD600 vs. time in min, sampling points T1 to T8); (b) data preprocessing and construction of the data matrix; (c) classification of ions into metabolite-derivative groups; (d) annotation of ions as metabolites; (e) assessment of cellular condition by metabolite composition.]

(b) Data matrix
[Figure: one row per time point (time 1 … time 8) and one column per metabolite (metab. 1 … metab. 200); entry x_tj is the quantity of metabolite j at time point t.]
Software is provided by T. Nishioka (Kyoto Univ./Keio Univ.).

(c) Classification of ions into metabolite-derivative groups (DPClus)
[Figure: correlation network of individual ions (M-1 … M-17, PG1 … PG10).]
The intensity ratio between the monoisotopic ion (M) and its isotope ion (M+1) indicates the number of carbons in the molecular formula.

(d) Annotation of ions as metabolites using the KNApSAcK DB

Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate (species)
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid (Escherichia coli)
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid (Escherichia coli)
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid (Alicyclobacillus acidocaldarius)
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid (Alicyclobacillus acidocaldarius)
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid (Escherichia coli); cis-11-Octadecanoic acid (Lactobacillus plantarum); omega-Cycloheptylundecanoic acid (Alicyclobacillus acidocaldarius)
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid (Alicyclobacillus acidocaldarius)
297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid (Alicyclobacillus acidocaldarius)
297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid (Alicyclobacillus acidocaldarius)
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP (Escherichia coli K12)
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP; 3'-AMP; dGMP (Escherichia coli)
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP (Escherichia coli)
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP (Escherichia coli)
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate; ADP; dGDP (Escherichia coli)
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2; Antibiotic MI 178-34F18C2 (Actinomadura spiralis MI178-34F18)
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B (Streptomyces sp. strain RK-16)
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A; Kinamycin C (Streptomyces murayamaensis sp. nov.)
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP (Escherichia coli)
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose (Escherichia coli)
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose; UDP-D-galactose (Escherichia coli)
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine; UDP-N-acetyl-D-glucosamine (Escherichia coli)
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-mannoheptopyranose (Escherichia coli)
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD (Escherichia coli)

(e) Estimation of cell condition as a function of the composition of metabolites
[Figure: E. coli growth curve (OD600 vs. time in min) with measurement points T1 to T8.]
PLS (partial least squares regression) is used to extract important combinations of metabolites.
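PLS relates a response such as OD600 to the metabolite quantities through a linear model y = a_1 x_1 + … + a_M x_M, and it still works when there are far fewer measurement points than metabolites, where ordinary least squares would be underdetermined. The sketch below is a hypothetical one-component PLS1 in the NIPALS form, not the software used in this study; the data are synthetic (8 time points, 3 metabolites) and all names are illustrative. A real analysis usually uses several components chosen by cross-validation.

```python
def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_one_component(X, y):
    """One-component PLS1 regression (NIPALS form) on column-centred data.

    X has N rows (measurement points) and M columns (metabolite intensities);
    y holds the N responses (e.g. OD600). Returns coefficients a_1..a_M so that
    the centred response is approximated by sum_j a_j * (centred x_j).
    """
    n_rows, n_cols = len(X), len(X[0])
    cols = [center([X[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    yc = center(y)
    # Weight vector w proportional to X^T y (covariance of each metabolite with y).
    w = [sum(c[i] * yc[i] for i in range(n_rows)) for c in cols]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = X w, then regress the response on t.
    t = [sum(cols[j][i] * w[j] for j in range(n_cols)) for i in range(n_rows)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With one component, the regression coefficients are simply q * w.
    return [q * wj for wj in w]


# Synthetic illustration only (not the E. coli data): 8 time points, 3 metabolites.
od600 = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 2.5, 3.0]
X = [[2.0 * od + 0.1, 5.0 - od, 1.0] for od in od600]   # rising, falling, constant
coefficients = pls1_one_component(X, od600)

if __name__ == "__main__":
    print(coefficients)
```

Following the sign convention used in this lecture, the rising metabolite gets a_j > 0 (stationary-phase dominant), the falling one a_j < 0 (exponential-phase dominant), and the constant one a_j = 0.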
[Figure: PLS setup with one response, the cell condition Y (K = 1), and a data matrix X of N = 8 measurement points × M = 220 metabolites; the number of biological conditions is much smaller than the number of metabolites (N << M).]
$$Y_{\text{cell density}} = a_1 x_1 + \cdots + a_j x_j + \cdots + a_M x_M,$$
where $x_j$ is the quantity of the jth metabolite.

(e) Assessment of cellular condition by metabolite composition
Detection of stage-specific metabolites (PLS model of OD600 on metabolite intensities):
$$y_{\text{OD600}} = a_1 x_1 + \cdots + a_j x_j + \cdots + a_M x_M$$
• $a_j > 0$: stationary-phase-dominant metabolites
• $a_j < 0$: exponential-phase-dominant metabolites
[Figure: PLS coefficients a_j of the metabolites (axis from -0.15 to 0.1), split at 0.0 into groups of 80 and 120 metabolites, with selected ions identified by MS/MS analyses (e.g. nucleotides, nucleotide sugars, fatty acids and the phosphatidylglycerols PG1 to PG10). Red: E. coli metabolites; black: other bacterial metabolites.]

Phosphatidylglycerols detected by MS/MS spectra
[Figure: structures of unsaturated PGs and cyclopropanated PGs; cyclopropane formation of PGs between the exponential phase and the stationary phase.]
(b) Relation of mass differences among the marker molecules PG1 to PG10:
• Cluster 1: PG1, PG3, PG5, PG7, PG9 (e.g. 30:1 (14:0, 16:1) to 36:2 (18:1, 18:1))
• Cluster 2: PG2, PG4, PG6, PG8, PG10 (e.g. 31:0 (14:0, c17:0) to 37:1 (18:1, c19:0))
• Members are linked by Δ(CH2)2 (about 28.03), unsaturation (US, about 2.01) and cyclopropane fatty acid (CFA, about 14.01 to 14.02) mass differences.
Cyclopropane formation of PGs occurs in the transition from the exponential to the stationary phase.

Self-organizing maps

Time-series data
[Figure: growth curve (log scale) divided into stages 1, 2, …, j, …, T, next to the expression-profile matrix for Gene1 … GeneD over Stage 1 … T; x_ij is the expression level of gene i at stage j, so gene i has the profile x_i = (x_i1, …, x_iT).]
When we measure time-series microarrays, the gene expression profiles are represented by a matrix, where T is the number of time-series microarray experiments and D is the number of genes in a microarray. Comparing genes gives expression similarity; comparing stages gives stage similarity and state transitions. SOM makes it possible to examine gene similarity and stage similarity simultaneously.

Multivariate analysis
SOM shows the expression similarity of genes and the similarity of stages simultaneously. BL-SOM is available at http://kanaya.aist-nara.ac.jp/SOM/
• SOM was developed by Prof. Teuvo Kohonen in the early 1980s.
• Multidimensional data (input vectors) are mapped onto a two-dimensional array of nodes.
• In the original SOM, the output depends on the order in which the input vectors are presented. To remove this problem, Prof. Kanaya developed BL-SOM:
  [1] The initial model vectors are determined from a PCA of the data.
  [2] The learning process of BL-SOM makes the output independent of the order of the input vectors.

SOM algorithm
[Figures: SOM algorithm. Source: "Clustering Challenges in Biological Networks", edited by S. Butenko et al.]
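The contrast between an order-dependent online SOM and batch learning can be illustrated with a generic batch SOM. This is a sketch under stated assumptions, not Prof. Kanaya's BL-SOM: it uses a one-dimensional chain of nodes instead of a two-dimensional lattice, initializes the nodes along the first principal axis only (found by power iteration, standing in for the PCA step), and uses a Gaussian neighborhood whose width shrinks linearly. The function names, parameters and the synthetic data are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def first_principal_axis(data, iters=200):
    """Mean, leading covariance eigenvector and its standard deviation, via power iteration."""
    n, dim = len(data), len(data[0])
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    cov = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in data) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(dim) for b in range(dim))
    return mu, v, math.sqrt(variance)


def batch_som(data, n_nodes=5, epochs=30, sigma_start=2.0, sigma_end=0.5):
    """Batch SOM on a 1-D chain of nodes, initialised along the first principal axis.

    Every epoch (1) assigns each profile to its nearest node and (2) replaces each node
    by a neighbourhood-weighted mean of all profiles. Because step (2) uses the whole
    data set at once, the result does not depend on the order of the inputs.
    """
    mu, v, sd = first_principal_axis(data)
    dim = len(mu)
    nodes = [[mu[d] + (2.0 * k / (n_nodes - 1) - 1.0) * 2.0 * sd * v[d] for d in range(dim)]
             for k in range(n_nodes)]
    for epoch in range(epochs):
        sigma = sigma_start + (sigma_end - sigma_start) * epoch / max(epochs - 1, 1)
        winners = [min(range(n_nodes), key=lambda k: sq_dist(x, nodes[k])) for x in data]
        new_nodes = []
        for k in range(n_nodes):
            h = [math.exp(-((k - b) ** 2) / (2.0 * sigma ** 2)) for b in winners]
            total = sum(h)
            new_nodes.append([sum(hi * x[d] for hi, x in zip(h, data)) / total
                              for d in range(dim)])
        nodes = new_nodes
    return nodes


# Synthetic profiles (40 "genes" x 3 time points), for illustration only.
rng = random.Random(0)
profiles = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
nodes = batch_som(profiles)

if __name__ == "__main__":
    reordered = batch_som(list(reversed(profiles)))
    print(max(abs(p - q) for a, b in zip(nodes, reordered) for p, q in zip(a, b)))
```

Feeding the same profiles in reverse order gives the same node vectors up to floating-point rounding; an online SOM that updates after each input would generally not.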
Self-organizing mapping (summary)
[1] Detection of transition points in metabolite quantity based on batch-learning SOM (BL-SOM)
[2] Diversity of metabolites in species: species-metabolite relation database
• Arrangement of lattice points in the multidimensional expression space (axes X1, X2, …, XT, where T is the number of time-series microarray experiments and gene i is the point (x_i1, x_i2, …, x_iT)): the lattice points are optimized to reflect the data distribution.
• Gene classification: genes are classified into their nearest lattice points, so genes with similar expression profiles are clustered to identical or nearby lattice points.
• Feature mapping: for the k-th condition (time k, k = 1, 2, …, T), lattice points containing only highly (lowly) expressed genes are colored red (blue), e.g. x_k > Th(k) or x_k < -Th(k). This allows the stages of the time-series data to be compared visually.
SOM is a nonlinear projection of multidimensional gene expression profiles. The original dimension is conserved in the individual lattice points, so several types of information are stored in the SOM.

Estimation of transition points: Bacillus subtilis (LB medium) (Data: Kazuo Kobayashi, Naotake Ogasawara (NAIST))
[Figure: growth curve (cell density OD600 vs. time in min) divided into stages 1 to 8, and the SOM for the time-series expression profiles colored by log(probability density), from high to low probability.]
A state transition point is observed between stages 3 and 4.

Integrated analysis of gene expression profiles and metabolite quantity data of Arabidopsis thaliana (sulfur deficiency/control; data provided by the group of K. Saito and M. Hirai (PSC)), Nakamura et al. (2004)
[Figure: state transitions and feature maps of genes and metabolites (m/z) in leaf and root. Lattice points with large differences between 12 and 24 h are colored (blue: decreased; red: increased). Accurate molecular weights (error rate in ppm) are matched to candidate metabolites through the species-metabolite relation database.]

Download sites of BL-SOM
• RIKEN: http://prime.psc.riken.jp/
• NAIST: http://kanaya.naist.jp/SOM/

Application of BL-SOM to "-omics"
Genome
• Kanaya et al., Gene, 276, 89-99 (2001)
• Abe et al., Genome Res., 13, 693-702 (2003)
• Abe et al., J. Earth Simulator, 6, 17-23 (2003)
• Abe et al., DNA Res., 12, 281-290 (2005)
Transcriptome
• Hasegawa et al., Plant Methods, 2:5, 1-18 (2006)
Metabolome
• Kim et al., J. Exp. Botany, 58, 415-424 (2007)
• Fukusaki et al., J. Biosci. Bioeng., 100, 347-354 (2005)
Transcriptome and metabolome
• Hirai, M. Y., M. Klein, et al., J. Biol. Chem., 280, 25590-5 (2005)
• Hirai, M. Y., M. Yano, et al., Proc. Natl. Acad. Sci. USA, 101, 10205-10 (2004)
• Morioka, R., et al., BMC Bioinformatics, 8, 343 (2007)
• Yano et al., J. Comput. Aided Chem., 7, 125-136 (2007)

Summary of bioinformatics tools developed in our laboratory
http://kanaya.naist.jp/~skanaya/Web/JTop.html
All software and databases are freely accessible via the Web.
• Metabolomics: MS data processing
• Transcriptome and metabolomics profiling: estimation of transition points
• Species-metabolite DB
• Network analysis: PPI
• Transcriptomics: statistics, profiling, …

Introduction to self-organizing mapping software and to the software package Expander: http://acgt.cs.tau.ac.il/expander/