GENE-CBR - Indiana University School of Informatics
geneCBR: a case-based reasoning tool for cancer diagnosis using microarray datasets
Dr. Florentino Fdez-Riverola, University of Vigo
Computer System of New Generation

Outline
• DNA microarray technology: characteristics and model operation overview
• Bioinformatics and AI: new challenges and emerging research areas
• CBR systems: case-based reasoning
• GENE-CBR: human genome analysis using CBR systems
• Demo: geneCBR in action, cancer diagnosis using microarrays

Microarrays: characteristics
• Silicon chips that can measure the expression levels of thousands of genes simultaneously.
• Microarrays are based on a database of over 40,000 gene fragments called expressed sequence tags (ESTs).
• For the first time, they allow us to obtain a “global” view of the cells belonging to:
  • different individuals
  • the same individual at different time intervals
  • different tissues of the same individual
• Gene expression profiles can be used as inputs to large-scale data analysis:
  • as fingerprints to build more accurate molecular classifications
  • to discover hidden taxonomies
  • to increase our understanding of normal and disease states

Microarrays: model operation overview
How does the chip work?
• Microarray chips incorporate different dyed genes tiled in a grid-like fashion.
• The individual's DNA to be analyzed is dyed with a different colour.
• Both sets of labelled DNA strands are allowed to hybridize (bind).
• Hybridization events are detected by a DNA scanner, which identifies fluorescent changes in the strands; the scanner and its associated software perform various forms of image analysis to measure and report raw gene expression values.
• The scanned intensities show how active the genes represented by the ESTs are in the cell:
  • strong fluorescence indicates that the gene is very active in the cell
  • no fluorescence indicates that the gene is inactive in the cell
• After preprocessing, the result is a microarray data file.

Available data
• Bone marrow samples from 43 adult patients with acute myeloid leukemia (AML), plus 6 healthy individuals:
  • 10 patients with acute promyelocytic leukemia [APL]
  • 4 patients with acute myeloid leukemia with inv(16) [AML-inv(16)]
  • 7 patients with acute monocytic leukemia [AML-mono]
  • 22 patients with acute non-monocytic leukemia [AML-other]
  • 6 samples from healthy individuals [control samples]
• Volume of information processed: each microarray contains 22,283 ESTs (genes), so 49 microarrays yield 1,091,867 gene expression values.
• Data available today: 150 microarrays (Human Genome 133A) plus 210 microarrays (Human Genome - plus).

Challenges for microarray data mining
Three main types of data analysis are needed for biomedical applications:
• Gene selection (attribute selection in AI): find the genes most strongly related to a particular class.
• Classification (supervised classification in AI): classify diseases or predict outcomes based on gene expression patterns, and perhaps even identify the best treatment for a given genetic signature.
• Clustering (unsupervised classification in AI): find new biological classes or refine existing ones.
Three parallel research areas:
• convenient visualization of experiments and results
• discovery of biological knowledge (metabolic pathways, etc.)
• low-level analysis providing better readouts (preprocessing, normalization, etc.)

Problems with existing data
The analysis of microarrays presents a number of unique challenges for machine learning and data mining techniques. Its capacity for generating enormous amounts of data is, however, also a handicap:
• Each individual contributes a great amount of data (thousands of genes), which causes efficiency and memory problems.
• There is a lack of initial knowledge: what is the significance level of each gene?
• Given the difficulty of collecting microarray samples, the number of samples is likely to remain small in many interesting cases.
• Having so many fields relative to so few samples creates a high likelihood of finding false positives.
• These problems are compounded by the errors that can be present in microarray data (systematic and random errors).
Sophisticated data analysis techniques and robust methods are therefore required to extract biologically meaningful knowledge from the raw data.

CBR systems (Case-Based Reasoning)
Case-based reasoning is a problem-solving paradigm in AI (Kolodner, 1983a, 1983b).
It can be viewed as a methodology for reasoning and learning: “reasoning by re-using past cases is a powerful and frequently applied way to solve problems for humans” (Joh, 1997).
• The memory of the system (case base) stores a certain number of previously experienced situations: CASE = PROBLEM description + applied SOLUTION [+ RESULT].
• A new problem is solved by finding similar past cases and reusing them in the new problem situation (Riesbeck et al., 1989).
• Four cyclical steps are performed whenever a new problem must be solved (Kolodner, 1993; Aamodt and Plaza, 1994; Watson, 1997).
Case-based reasoning is, in effect, a cyclic and integrated process of solving a problem, learning from this experience, solving a new problem, and so on.

The CBR cycle
A new problem enters a cycle built around the system's memory (the case base):
(1) RETRIEVE: retrieving the most similar previously experienced case or cases.
(2) REUSE: reusing the case(s) in one way or another to produce a proposed solution.
(3) REVISE: revising the proposed solution based on reusing previous case(s).
(4) RETAIN: retaining the new experience (the confirmed solution) by incorporating it into the existing knowledge base (case base).
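The four steps of the cycle can be sketched as a toy case base in Python. This is a generic illustration under stated assumptions, not geneCBR's code (which, per the later slides, works on DFP genes and a GCS network): each case pairs a hypothetical numeric problem vector with a solution label, RETRIEVE takes the k nearest cases by Euclidean distance, REUSE is a similarity-weighted vote with an assumed weight of 1 / (1 + distance), REVISE is an optional expert callback, and RETAIN appends the confirmed case.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    problem: list[float]  # problem description (hypothetical numeric features)
    solution: str         # applied solution, e.g. a diagnosis label


@dataclass
class CaseBase:
    cases: list[Case] = field(default_factory=list)

    def retrieve(self, problem: list[float], k: int = 3) -> list[Case]:
        """(1) RETRIEVE: the k stored cases closest to the new problem (Euclidean)."""
        return sorted(self.cases, key=lambda c: math.dist(c.problem, problem))[:k]

    def reuse(self, problem: list[float], retrieved: list[Case]) -> str:
        """(2) REUSE: similarity-weighted vote; weight 1 / (1 + distance) is an assumption."""
        if not retrieved:
            raise ValueError("no cases retrieved: the case base is empty")
        votes: dict[str, float] = {}
        for c in retrieved:
            weight = 1.0 / (1.0 + math.dist(c.problem, problem))
            votes[c.solution] = votes.get(c.solution, 0.0) + weight
        return max(votes, key=votes.get)

    def revise(self, proposed: str,
               expert: Optional[Callable[[str], str]] = None) -> str:
        """(3) REVISE: an optional expert (e.g. a doctor) confirms or corrects."""
        return expert(proposed) if expert else proposed

    def retain(self, problem: list[float], solution: str) -> None:
        """(4) RETAIN: store the new experience as a case."""
        self.cases.append(Case(problem, solution))

    def solve(self, problem: list[float], k: int = 3,
              expert: Optional[Callable[[str], str]] = None) -> str:
        """One full RETRIEVE-REUSE-REVISE-RETAIN cycle for a new problem."""
        proposed = self.reuse(problem, self.retrieve(problem, k))
        confirmed = self.revise(proposed, expert)
        self.retain(problem, confirmed)
        return confirmed


if __name__ == "__main__":
    cb = CaseBase([Case([0.10, 0.20], "APL"),
                   Case([0.15, 0.25], "APL"),
                   Case([0.90, 0.80], "AML-other")])
    print(cb.solve([0.12, 0.22]))  # APL
    print(len(cb.cases))           # 4
```

Because RETAIN stores every confirmed case, the case base grows with use. Passing `expert=lambda proposed: "AML-inv(16)"` to `solve` simulates a doctor overriding the proposal before it is retained.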
Main characteristics of CBR systems
• They are adaptive and dynamic: the number of cases stored in the memory of the model changes, allowing the system to adapt to new situations.
• CBR allows general knowledge to be used in the resolution of a particular problem.
• CBR facilitates the indexing of the available information.
• CBR can use incomplete cases.
• CBR systems are aware of their limitations (perhaps a problem has no solution).
• CBR facilitates the use of representative and flexible data structures.
• Case adaptation helps to discover interconnections and hidden structures in the available data.
• CBR can be completely automated.

GENE-CBR

Goals and objectives of GENE-CBR
• For the doctor: develop an effective and reliable system able to diagnose cancer subtypes based on the analysis of microarray data. The system uses CBR (case-based reasoning): “solve new problems (new patients) based on previous experience (diagnosed patients)”.
• For the research group: implement a flexible tool for designing and testing new techniques and experiments (AI techniques for selection, clustering, inference, etc.).
• For the programmer: construct an advanced editing module for run-time modification of coded techniques (BeanShell programmer interface).

Logic architecture
• Modes: Diagnostic Mode (diagnosing) for the doctor, Expert Mode (testing techniques) for the research group, and Programming Mode (BeanShell) for the programmer.
• Components: wizard, JCBR, DFP, CBR, GCS.
• The CBR cycle around the case base: [1] RETRIEVE, [2] REUSE, [3] REVISE, [4] RETAIN.

Model overview
Gene selection (most relevant genes = DFP) → clustering (genetically similar patients) → initial prediction → revised prediction and final diagnosis, with reclassification and knowledge discovery.
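In the model overview, gene selection produces the DFP. According to the retrieval slides, this rests on a supervised fuzzy discretisation of expression values into Low, Medium and High, with the overlapping labels LM and MH, and on a fuzzy pattern (FP) per pathology. The sketch below is a minimal, hypothetical Python illustration of that idea, not the geneCBR algorithm: the trapezoidal membership functions, the breakpoints `a`-`d`, the `overlap` threshold, the "most frequent label reaching `min_freq`" rule for an FP and the "label differs between classes" criterion for the DFP are all assumptions, and the discretisation here uses fixed breakpoints rather than being supervised.

```python
from collections import Counter


def memberships(x, a=0.25, b=0.40, c=0.60, d=0.75):
    """Trapezoidal membership degrees of x in Low, Medium and High.

    Hypothetical breakpoints on a normalised [0, 1] scale (requires a < b <= c < d):
    Low and Medium overlap on [a, b], Medium and High overlap on [c, d].
    """
    def falling(v, lo, hi):  # 1 below lo, 0 above hi, linear in between
        if v <= lo:
            return 1.0
        if v >= hi:
            return 0.0
        return (hi - v) / (hi - lo)

    low = falling(x, a, b)
    high = 1.0 - falling(x, c, d)
    medium = min(1.0 - low, 1.0 - high)
    return {"Low": low, "Medium": medium, "High": high}


def fuzzy_label(x, overlap=0.3):
    """Label for x; LM / MH when two neighbouring sets both reach `overlap`."""
    m = memberships(x)
    if min(m["Low"], m["Medium"]) >= overlap:
        return "LM"
    if min(m["Medium"], m["High"]) >= overlap:
        return "MH"
    return max(m, key=m.get)


def fuzzy_pattern(samples, min_freq=0.8):
    """Hypothetical FP for one class: {gene index: label} for genes whose most
    frequent label occurs in at least `min_freq` of that class's samples."""
    pattern = {}
    for g in range(len(samples[0])):
        label, count = Counter(fuzzy_label(s[g]) for s in samples).most_common(1)[0]
        if count / len(samples) >= min_freq:
            pattern[g] = label
    return pattern


def discriminant_genes(patterns):
    """Hypothetical DFP criterion: genes whose FP label differs between classes."""
    labels = {}
    for pattern in patterns.values():
        for g, label in pattern.items():
            labels.setdefault(g, set()).add(label)
    return sorted(g for g, seen in labels.items() if len(seen) > 1)


if __name__ == "__main__":
    apl = [[0.10, 0.50, 0.90], [0.15, 0.55, 0.85], [0.12, 0.45, 0.20]]
    other = [[0.90, 0.50, 0.88], [0.85, 0.52, 0.92], [0.95, 0.48, 0.10]]
    fps = {"APL": fuzzy_pattern(apl, 0.6), "AML-other": fuzzy_pattern(other, 0.6)}
    print(fps["APL"])               # {0: 'Low', 1: 'Medium', 2: 'High'}
    print(discriminant_genes(fps))  # [0]
```

Raising `overlap` toward 0.5 makes the LM and MH labels rarer (inside an overlap region the two memberships sum to 1), while lowering `min_freq` admits more genes into each FP.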
GENE-CBR::[i] retrieval
Objectives:
• perform gene selection without losing information, by extracting simplified fuzzy patterns (FP) for each pathology
• make it possible to use AI techniques that were initially discarded
Main phases:
• supervised fuzzy discretisation of gene expression values into Low, Medium and High labels, plus the overlapping labels LM and MH
• supervised gene selection for each pathology
Advantages:
• independence from the ordering of the data
• takes data variability into account
• allows new knowledge to be discovered
• the results obtained are interpretable

GENE-CBR::[i] retrieval (diagrams)
For each class (healthy, APL, AML-inv(16), AML-mono, AML-other; the example shown is APL, acute promyelocytic leukemia) a fuzzy pattern is extracted: FP_healthy, FP_APL, FP_AML-inv(16), FP_AML-monocytic and FP_AML-other. These patterns lead to the DFP.

GENE-CBR::[ii] reuse
Objectives:
• unsupervised identification of genetic similarities between patients, taking into account only the previously selected genes (DFP)
Main phases:
• training a DFP-dimensional GCS network (Growing Cell Structures; Fritzke, 1993)
• presenting the new patient to the network
• classifying with a proportional weighting voting scheme
Advantages:
• clustering without taking the patient class into account
• definition of an indexing and similarity structure between nodes (relating patients)
• generation of clusters containing new, unknown cancer subtypes (knowledge discovery)

GENE-CBR::[ii] reuse (diagram)
Each patient (PAT.) is described by its gene expression values for the DFP genes and by its CLASS. The retrieved patients, ordered from most (+) to least (-) similar, are AML-inv(16), AML-inv(16), AML-other and AML-inv(16); the new patient, whose class is unknown, is classified as AML-inv(16).

GENE-CBR::[iii] revise
Objectives:
• provide doctors with meaningful information about the classification carried out by the system
• help in discovering new knowledge, with if-then rules as a decision-making support mechanism
Information supplied:
• identification of similar patients (from a genetic point of view)
• the proportional weighting vote and the assigned weights
• rules generated with See5 (Quinlan, 2000) from the DFP genes of the set of patients retrieved by the GCS network
Advantages:
• doctors can supervise the final decision proposed by the system
• new knowledge is generated in the form of easily understandable rules

GENE-CBR::[iii] revise (example)
Retrieved patients: AML-inv(16), AML-inv(16), AML-inv(16), AML-other, AML-inv(16). Panels: biological and clinical characteristics; karyotype.
Example rule:
Rule 6: (45/4, lift 1.1)
If X65962 (AFFX-HSAC07/X00351_5_at) is LOW then
    If U96781 (AFFX-BioDn-3_at) is LOW-MEDIUM then AML-other
    Else if D87845 (AFFX-hum_alu_at) is HIGH then AML-inv(16) [0.968]

GENE-CBR::[iv] retain
Objectives: feed the system back with new knowledge:
• new subclassification of existing cancer pathologies
• reclassification of existing patients
• identification of correlated genes
• discovery of new markers able to distinguish new pathologies
• identification of prototypical patients and rare cases
Main phases:
• update the case base with the new microarray every time a new classification is generated
• modify the parameters of the model
Advantages:
• new biological knowledge can easily be integrated into the hybrid system

Applied technologies
• 100% Java: Swing, BeanShell, Log4j, JFreeChart
• Design patterns: Action, Future, MVC, Singleton, Wizard
• Unified Modeling Language: Poseidon for UML

Future work
• Moving to a plug-in architecture: designing a core where each technique is implemented as a plug-in (aiBENCH).
• Implementing k-fold cross-validation: generating multiple training and test sets automatically.
• Supporting standard microarray data formats: MIAME (Minimum Information About a Microarray Experiment).
• Deploying GENE-CBR with Java Web Start: remote and automatic access to the latest versions of the project.
• On-line access from GENE-CBR to genetic sequence databases: GenBank (http://www.ncbi.nlm.nih.gov/Genbank).

Demo: GENE-CBR in action