Machine Learning/Data Mining for Cancer Genomics

Bernard Manderick, Vrije Universiteit Brussel
Henry Nyongesa, University of the Western Cape

Collaboration:
Artificial Intelligence Laboratory – VUB
Intelligent Systems Laboratory – UWC
South African National Bioinformatics Institute – SANBI
Interuniversity Institute of Bioinformatics in Brussels – (IB)2

Project Outline
• Machine Learning (ML) is a rapidly growing field of research, both in terms of new techniques and applications.
• The most exciting aspect of ML is the No Free Lunch principle, which states that no single ML algorithm is optimal on all types of problems; hence one cannot know a priori whether any one technique is the most suitable for a particular problem.
• In this research we focus on using ML for “mining” large genomic data sets in order to classify human tumours.

Big Data and the Curse of Dimensionality
• The number of different types of datasets in the public domain continues to grow exponentially.
• Academic computing research is currently addressing so-called “big data” solutions to make sense of the vast datasets coming out of research in other disciplines, including genomics and bioinformatics.
• Such data sets are large-scale and highly multi-dimensional, and are not easily amenable to traditional data analysis tools.
• ML techniques are suited for automated knowledge discovery from large, complex data sets.

Data Mining and Cancer Genomics
• Data mining addresses the problem of discovering patterns, regularities and structure within data collections.
• For this reason, the field is also referred to as “knowledge discovery from databases” (KDD).
• Such discovered knowledge can then be applied to make predictions on similar datasets, suggest explanations of dependencies between independent variables, or generally improve decision making.
• Cancer is becoming increasingly common in African populations.
• Gene expression profiling can be used to distinguish between known cancer sub-types, and to discover new types that may have remained unknown to pathologists.

Research Collaboration between VUB and UWC
• Capacity building in competences for advanced research and scholarship: VUB investigators will offer support, expertise and training to UWC staff and students.
• Collaborative research into novel machine learning and data mining techniques: next-generation data mining techniques will require new machine learning algorithms, and new methods for information storage and retrieval, feature selection and selection optimization, and optimization of decision making.
• International cooperation through staff and student exchange visits: research and educational material developed by either group will be made available and accessible to the collaborating groups.

Long Term Objectives
• Establishment of a Centre for Machine Learning and Data Mining Applications at UWC.
• Human capacity development in the field of machine learning and data mining.
• Recognition of South African science and technology through international cooperation and collaboration.
• Dissemination of South African research output in international workshops and conferences.

Significance of Research
This research aims to answer basic questions in cancer genomics research:
• What is the optimal and most relevant representation of genomic data?
• How can a patient be diagnosed based on their gene expression profile and on the knowledge gained from previously assayed patients?
• How can genome-wide analytic tools best be integrated with the large and rapidly increasing number of genome-wide datasets?

Mode of Collaboration
• Through staff and student exchange visits, and co-supervision of research students.
• Participate in joint authorship, publication and dissemination of research papers.
• Host a research conference/workshop at UWC in each year of the project, jointly organised by the partners, to which identified international experts and other scientists in the field will be invited.
• Make project decisions jointly in a democratic fashion, with the maximum amount of information available, after discussions at face-to-face meetings or by email.
• Develop and establish an online collaboration platform within the project to document data and code development, and other information related to the project.

Work Packages and Timelines
• WP1: Use Distributed/High Performance Computing for large-scale genome analysis.
• WP2: Visits to enhance collaboration between partners.
• WP3: Africa Workshops on Artificial Intelligence, Machine Learning, Data Mining and Bioinformatics.
• Establishment of UWC – Centre for Artificial Intelligence and Data Mining.

Collaboration
• Professor Alan Christoffels, Director, SANBI, UWC.
• Professor Ann Nowe, VUB/(IB)2 (Machine Learning/Bioinformatics)
• Professor Tom Lenaerts, VUB/(IB)2 (Bioinformatics)
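
Illustrative sketch
The diagnosis question raised under Significance of Research (classifying a new patient from a gene expression profile, using previously assayed patients) can be illustrated with a small example. This is only a sketch under stated assumptions, not the project's method: the data are synthetic random numbers rather than real expression measurements, the two sub-type labels are made up, and the technique shown (ranking genes by class-mean difference, then nearest-centroid classification) is just one simple baseline among many. All function names are hypothetical.

```python
import random
import statistics

def make_profiles(n_per_class, n_genes, n_informative, shift, seed):
    """Generate SYNTHETIC profiles for two hypothetical tumour sub-types."""
    rng = random.Random(seed)
    profiles, labels = [], []
    for label in ("subtype_A", "subtype_B"):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == "subtype_B":
                # Assumption: only the first n_informative genes differ.
                for g in range(n_informative):
                    row[g] += shift
            profiles.append(row)
            labels.append(label)
    return profiles, labels

def class_means(profiles, labels, gene):
    """Mean expression of one gene within each class."""
    return {
        c: statistics.fmean([p[gene] for p, l in zip(profiles, labels) if l == c])
        for c in sorted(set(labels))
    }

def select_genes(profiles, labels, k):
    """Keep the k genes whose class means differ the most (simple filter)."""
    scores = []
    for g in range(len(profiles[0])):
        means = list(class_means(profiles, labels, g).values())
        scores.append((max(means) - min(means), g))
    top = sorted(scores, reverse=True)[:k]
    return sorted(g for _, g in top)

def fit_centroids(profiles, labels, genes):
    """Per-class mean profile, restricted to the selected genes."""
    centroids = {c: [] for c in sorted(set(labels))}
    for g in genes:
        for c, m in class_means(profiles, labels, g).items():
            centroids[c].append(m)
    return centroids

def predict(profile, centroids, genes):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((profile[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min(centroids, key=sq_dist)

# "Previously assayed patients" (training) and new patients (held out).
train_x, train_y = make_profiles(20, 200, 5, 3.0, seed=0)
test_x, test_y = make_profiles(20, 200, 5, 3.0, seed=1)

selected = select_genes(train_x, train_y, k=5)
centroids = fit_centroids(train_x, train_y, selected)
predictions = [predict(x, centroids, selected) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)

print("selected genes:", selected)
print("held-out accuracy:", accuracy)
```

Selecting a handful of genes before classifying reflects the dimensionality concern noted earlier: with 200 genes and only 40 training profiles, most features carry noise. Real expression data would need normalisation and proper cross-validation, which this sketch omits.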