Biopharma

Systems Biology Solutions to Microarray Nightmare

By Teresa Sardón, Cristina Segú and José Manuel Mas at Anaxomics

Systems biology approaches may hold the key to overcoming the data analysis and interpretation difficulties of microarray technology. Here, a tool to understand a drug's mechanism of action from a microarray experiment is outlined.

Keywords: Gene expression; DNA microarrays; Systems biology; Mathematical modelling

In spite of the numerous benefits that microarray technology brings to pathology characterisation and drug development, it has drawbacks at several levels. Gene expression microarrays pose particular difficulties in data analysis and interpretation. Different approaches to the interpretation of microarray results have been put forward. Those based on systems biology seem to deliver more informative results; one example is a newly available tool that integrates microarray data into mathematical models to unveil the underlying mechanisms of action.

Microarray Drawbacks

The concept of the DNA microarray, developed in the early 1990s, evolved from the Southern blotting technique, which is based on solid-phase hybridisation technology. The method relies on the immobilisation of probe molecules onto a solid surface and the recognition of their complementary DNA target sequence by hybridisation. Microarray technology has dramatically accelerated many types of investigations, ranging from basic science to clinical applications (1,2). Gene expression microarrays are an extremely potent tool for researchers, enabling them to monitor the expression of thousands of genes simultaneously. However, implementing the technique entails difficulties at different stages of the process.

Experimental Design

Microarrays are not always useful, and sometimes it is not possible to draw clear conclusions from them.
To overcome this, one has to make sure that the variation to be measured depends on differences in gene expression, and that those differences can arise within the timeframe of the experiment. For example, a gene expression microarray is not adequate for measuring the response of the coagulation cascade to a specific intervention, since activation of the coagulation cascade factors does not involve changes in expression.

Once the suitability of the methodology has been established, it is essential to design the microarray experiments correctly (3). Several issues must be considered to obtain a well-designed experiment: the use of blocking to avoid confounding; blinding and randomisation to avoid bias; the type of data analysis to be employed; and the number of replicates needed to reach statistically significant conclusions (4).

Non-Biological Factors

Another drawback is that the microarray method itself can alter the results, with non-biological factors contributing to the variability of the data. Some characteristics of the technique, concerning both the method and the platform, can be sources of variation in gene expression measurements, including sample quality, differences in labelling and hybridisation efficiency, and spatial biases across the microarray surface (5). These difficulties have been extensively studied, and are normally addressed by running enough replicates and using normalisation methods (6).

Analysis Complexity

A further difficulty is that the analysis may yield only conclusions that are already known, or extremely abstract ones. When researchers receive the microarray data and try to draw conclusions from them, they tend to search for differences in the expression of proteins known to be involved in the process under study. This normally supports the mechanism already suspected, but does not reveal a new one.

Innovations in Pharmaceutical Technology, Issue 45, 30/05/2013, iptonline.com
In many other cases, the complexity of the solutions makes it difficult to obtain further conclusions. In addition, it can be difficult to discover unexpected patterns beyond the ideas that informed the study design. One of the advantages of microarrays is their capacity for probing the expression of thousands of genes at once. Nevertheless, this is also a handicap when drawing conclusions. Standard statistical techniques fail to summarise much of the information in all gene measures across the samples, owing to the high number of possible combinations of proteins differentially expressed between control and intervention, and the small number of samples analysed. The gene list obtained often depends on the statistical test used, and most tests lack the ability to control the expected number of false positives at a desirable level.

Systems Biology Tools

Systems biology can be used as an analytical tool for microarray data. There are two ways in which systems biology approaches can facilitate the analysis of results: contextualising and mathematical modelling.

Contextualising

Systems biology methods seek to understand not only each constituent of a biological network, but also how all constituents of a network function together, by studying their complex interactions (topology). The network can restrict the number of possible conclusions obtained from a microarray experiment. By observing the links between two significant proteins, one may begin to suspect that they represent much more than chance associations in the results, and that they are listed because of an underlying biological process.

[Figure: Analysis of hubs (a common connecting protein in a network) and bottlenecks (proteins connecting different networks)]

Mathematical Modelling

In order to obtain a complete and useful output from a gene expression microarray experiment, the large amount of information obtained can be filtered by making use of what is already known about the biological system under study. One possible way of combining existing knowledge with the newly generated microarray data is by using systems biology approaches (7).

There are different mathematical modelling approaches: those based only on topological information (topological modelling) and those that exploit additional information, such as function or expression. Though topological information is not always sufficiently informative (see the pros and cons box), the integration of other pieces of information generates more complete models. The computational analysis of these models allows the identification of new biological restrictions which, when applied to the microarray analysis, narrow the data down to the significant results.

Pros and cons of topological mathematical modelling
Pros
• Simple approach
• Fast
Cons
• Changing databases
• Lack of complete link information
• Errors in links reported
• Topology depends on the information used to construct the map. Labile information leads to labile conclusions
• All links are weighted equally (W1 = W2 = W3 = W4 = W5 = W6 = W7 = W8)

Case Study

One example of a systems biology approach, involving a mathematical model for integrating known biological information with microarray information, is the online SimsCells.com software. This uses a therapeutic performance mapping system to generate and explore mathematical models representing different organisms and cell types (8,9). To model biological processes, the technology exploits available biological and medical information, in addition to topological data.
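As a small illustration of the purely topological side of such analyses, the hub and bottleneck notions introduced earlier can be computed on a toy network. The sketch below is not taken from the software described here: the network and its node names are hypothetical, a hub is taken to be the node with the most links, and a bottleneck is approximated as a node whose removal splits the network into separate parts. Both definitions are assumptions chosen for simplicity.

```python
from collections import defaultdict

# Hypothetical toy interaction network: two modules joined only through X.
EDGES = [
    ("H", "P1"), ("H", "P2"), ("H", "P3"), ("H", "P4"), ("H", "X"),
    ("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "X"),
    ("X", "Q1"), ("X", "Q2"), ("Q1", "Q2"), ("Q1", "Q3"), ("Q2", "Q3"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def count_components(graph, skip=None):
    """Count connected components, optionally ignoring one node."""
    seen = set()
    components = 0
    for start in graph:
        if start == skip or start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour != skip and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def hubs(graph, top=1):
    """Assumed definition: the nodes with the largest number of links."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top]


def bottlenecks(graph):
    """Assumed definition: nodes whose removal disconnects the network."""
    base = count_components(graph)
    return sorted(n for n in graph if count_components(graph, skip=n) > base)


graph = build_graph(EDGES)
print("hubs:", hubs(graph))
print("bottlenecks:", bottlenecks(graph))
```

Every link counts the same in this sketch, which is exactly the equal-weighting limitation listed among the cons of topological modelling.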
Data relating specific inputs (for example, drug Z treatment, gene Y activation) with their corresponding biological outputs (such as clinical effect X, protein W inhibition) are collated in a database and used for training and validation of the models (see Figure 1, Step 1). The mathematical models generated help overcome the disadvantages of topological modelling methods: the lability of the conclusions and the equal weighting of the links.

Once a mathematical model has been trained to behave like the biological system it represents, it can be questioned about the mechanistic pathways that link a stimulus to its associated outcome. For instance, one may ask about the mechanism of action (MoA) that drives the side-effect of a drug, or the network that links the down-regulation of a gene with its downstream effect on other proteins.

Interestingly, when models generated through this approach are questioned, they do not provide one unique solution, but rather identify a universe of possible solutions that satisfy the restrictions set by the topology and the database. In a similar manner, different molecular responses to the same stimulus are observed in nature (for example, different side-effects in individuals treated with the same drug), as are different mechanistic explanations for the same biological response, such as in multifactorial diseases.

Figure 1: Scheme of mathematical modelling process

Step 1: Initial modelling. Previous knowledge from different scientific sources is used to generate a mathematical model whose responses to any stimulus comply with the biological restrictions. (A) Generated models are questioned about mechanistic solutions linking a new stimulus to its corresponding biological outcome. (B) Microarray experimental results can further restrict the models’ biological solutions.

Panel labels: Previous knowledge about humans (interaction data; gene/protein network; clinical analysis and microarrays; cell biology; clinical trials and drugs; biochemistry; physiology; molecular biology); Truth Table; Mathematical model; Microarray information; Universe of solutions fulfilling the network and the Truth Table restrictions:

Link   Solution 1   Solution 2   Solution 3   Solution n
W1     0.2          0.3          0.1          W1n
W2     0.6          0.4          0.6          W2n
W3     0.8          0.9          0.5          W3n
W4     0.7          0.5          0.6          W4n
W5     0.5          0.7          0.3          W5n
W6     0.2          0.3          0.2          W6n
W7     0.8          0.9          0.8          W7n
W8     0.5          0.4          0.9          W8n

Step 2: Sampling methods. All the possible solutions are compared and grouped in a 2D representation according to their common mechanistic patterns. By this clustering, it is possible to identify groups of similar outcomes. Alternatively, common patterns from the different clusters can be summarised in a unique ‘average’ solution.

In order to gain mechanistic insight into the pool of model outcomes, the software applies a strategy based on sampling methods, in which the common patterns in response to an intervention are identified. In Figure 1, the image in Step 2 shows a 2D spatial distribution of all the MoAs obtained. The way each particular MoA responds to a set of stimuli is computed and, by means of a conservative mathematical transformation, reduced to two dimensions. The closer two MoAs are placed in the 2D representation, the more similar their responses to a stimulus.

By extracting common patterns, it is possible to draw a unique ‘central’ solution (‘average’ MoA) which, with little variation, includes most of the possible solutions. In addition, it is possible to group the MoAs in clusters and study each cluster separately (see Figure 1, Step 2). Each cluster contains a set of MoAs that, without being equal, share a common pattern in their responses.
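The sampling step described above can be sketched on synthetic data. The example below is only an illustration under stated assumptions: the transformation actually used by the software is not specified here, so principal component analysis (computed with NumPy's SVD) stands in for the reduction to two dimensions, a plain two-cluster k-means stands in for the grouping, and the two response patterns and noise level are hypothetical.

```python
import numpy as np


def average_solution(solutions):
    """Element-wise mean over all sampled solutions (the 'average' MoA)."""
    return solutions.mean(axis=0)


def project_2d(solutions):
    """PCA via SVD: centre the data and keep the two leading components."""
    centred = solutions - solutions.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T


def two_clusters(points, iterations=20):
    """Plain 2-means, seeded with the extreme points along the first axis."""
    centroids = points[[points[:, 0].argmin(), points[:, 0].argmax()]]
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    return labels


# Hypothetical universe of solutions: 40 weight vectors over 8 links,
# scattered around two different response patterns.
rng = np.random.default_rng(0)
pattern_a = np.array([0.2, 0.6, 0.8, 0.7, 0.5, 0.2, 0.8, 0.5])
pattern_b = np.array([0.8, 0.2, 0.3, 0.1, 0.9, 0.7, 0.2, 0.4])
solutions = np.vstack([
    pattern_a + rng.normal(0, 0.02, size=(20, 8)),
    pattern_b + rng.normal(0, 0.02, size=(20, 8)),
])

central = average_solution(solutions)   # one 'central' solution
coords = project_2d(solutions)          # 2D map of all solutions
labels = two_clusters(coords)           # cluster membership per solution

print(central.round(2))
print(labels)
```

Solutions that lie close together in `coords` have similar weight vectors, which mirrors the reading of the 2D map in Figure 1, Step 2. With real model output the number of clusters would not be known in advance, so fixing it at two is a simplification.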
Considering the inter-individual mechanistic variations observed in nature, the different groups of calculated solutions could represent the most common patterns of responses, in the sense that in both cases the easiest solution is prioritised.

Figure 2: Microarray analysis using systems biology. The comparison of the microarrays becomes a comparison of mathematical models taking into account systems biology and other biological data. Panel labels: Control microarray; Intervention microarray; Systems biology data; Control model; Intervention model; Key proteins.

Drawing Conclusions

The size and complexity of microarray experiments often result in a wide variety of possible interpretations. Biologically significant changes in expression can be missed by expression arrays owing to technical limitations. Mathematical models therefore become an invaluable tool for analysing microarray results from the perspective of global cellular behaviour.

To extract the biologically relevant information from gene expression microarray experiments, the case study approach outlined here integrates microarray data into the training of the mathematical model, together with the topology and the database. In this way, gene expression data further restrict the model's possible outcomes. When the results of a microarray experiment are integrated into a mathematical model compiling all known biological information, the discrete information provided by the microarray (coming from a small number of samples) is converted into a mathematical model composed of thousands of ‘individuals’, thereby increasing the power of the analysis. By comparing the models corresponding to the control and the intervention, it is possible to obtain the key proteins that are differentially activated between the two groups of samples (see Figure 2). These differential proteins are candidates for having different expression levels in the microarray and are worth following up.
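The comparison of a control model with an intervention model can be illustrated with a minimal sketch. It assumes, hypothetically, that each model is available as a set of per-protein activation values across its sampled solutions, and it flags a protein as 'key' when the difference in mean activation is large relative to the pooled spread. The protein names, the values and the `min_effect` threshold are invented for illustration and are not part of the method described in this article.

```python
from statistics import mean, pstdev


def key_proteins(control, intervention, min_effect=2.0):
    """Return (protein, mean difference) pairs with a large standardised shift.

    control / intervention: dict mapping protein name to the list of
    activation values across the sampled solutions of each model.
    """
    hits = []
    for protein in control:
        c, i = control[protein], intervention[protein]
        diff = mean(i) - mean(c)
        pooled_sd = ((pstdev(c) ** 2 + pstdev(i) ** 2) / 2) ** 0.5
        if pooled_sd == 0:
            effect = float("inf") if diff else 0.0
        else:
            effect = diff / pooled_sd
        if abs(effect) >= min_effect:
            hits.append((protein, round(diff, 3)))
    # Largest absolute change first.
    return sorted(hits, key=lambda h: -abs(h[1]))


# Hypothetical activation values for three proteins in each model.
control_model = {
    "P1": [0.1, 0.2, 0.1, 0.2],
    "P2": [0.5, 0.6, 0.4, 0.5],
    "P3": [0.9, 0.8, 0.9, 0.8],
}
intervention_model = {
    "P1": [0.8, 0.9, 0.8, 0.9],
    "P2": [0.5, 0.4, 0.6, 0.5],
    "P3": [0.3, 0.2, 0.3, 0.2],
}

hits = key_proteins(control_model, intervention_model)
print(hits)
```

Because each model contributes many sampled solutions rather than a handful of arrays, a comparison of this kind works on far more data points per protein than the raw microarray samples would provide.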
By using this method, researchers can identify the key proteins relevant to the measured outcome of the experiment, reducing the number of false positives from the microarray data while at the same time obtaining a logical, biologically supported conclusion. The method provides insight that goes beyond what is already known from direct investigation of the phenomenon being studied, predicting properties that might not be evident to the experimenter.

Acknowledgement

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under the grant agreement number HEALTH-F4-2012-305869 (SysMalVac).

References

1. Sassolas A, Leca-Bouvier BD and Blum LJ, DNA biosensors and microarrays, Chemical Reviews 108(1): pp109-139, 2008
2. Russell S, Meadows LA and Russell RR, Microarray Technology in Practice, Academic Press/Elsevier, 2009
3. Falciani F, Microarray Technology Through Applications, Taylor & Francis Group, 2007
4. Stekel D, Microarray Bioinformatics, Cambridge University Press, 2003
5. Draghici S, Khatri P, Eklund AC and Szallasi Z, Reliability and reproducibility issues in DNA microarray measurements, Trends in Genetics: TIG 22(2): pp101-109, 2006
6. Reimers M, Making informed choices about microarray data analysis, PLoS Computational Biology 6(5): e1000786, 2010
7. Kozhenkov S, Dubinina Y, Sedova M, Gupta A, Ponomarenko J and Baitaluk M, Biological Networks 2.0 – an integrative view of genome biology data, BMC Bioinformatics 11: p610, 2010
8. Visit: www.simscell.com
9. Mas JM, Pujol A, Aloy P and Farrés J, Methods and systems for identifying molecules or processes of biological interest by using knowledge discovery in biological data, US Patent Application No. 12/912,535, 2010

Teresa Sardón is Head of Analytical Services at Anaxomics, where she is responsible for the business development and commercialisation of the SimsCell analysis platform. She graduated in Pharmacy at the University of the Basque Country and holds an MS and PhD in Biochemistry from the Universitat Autònoma de Barcelona. Her background includes five years of postdoctoral research at the European Molecular Biology Laboratory in Germany and four at the Centre for Genomic Regulation in Barcelona, working in the fields of cellular and molecular biology.

Cristina Segú is a Project Manager at Anaxomics. She has a degree in Biotechnology from the Universitat Autònoma de Barcelona. As a member of the Molecular Health Department, she has gained extensive experience in building computable descriptions of available molecular, biochemical and physiological data.

José Manuel Mas is a Founder and Chief Operations Officer at Anaxomics. He was previously the EU Head of R&D at RPS and Founder and Chief Technology Officer at Infociencia. José holds a degree in Biochemistry, an MSc in Biotechnology and a PhD in Computer Sciences. He has wide experience in the development of biocomputational tools and artificial intelligence techniques.