Machine Learning/Data Mining for Cancer Genomics
Bernard Manderick, Vrije Universiteit Brussel
Henry Nyongesa, University of the Western Cape
Collaboration:
Artificial Intelligence Laboratory – VUB
Intelligent Systems Laboratory - UWC
South Africa National Bioinformatics Institute - SANBI
Interuniversity Institute of Bioinformatics in Brussels - (IB)2
Project Outline
•
Machine Learning (ML) is a rapidly growing field of research both in
terms of new techniques and applications.
•
The most exciting aspect of ML is the No Free Lunch principle, which
states that no single ML algorithm is optimal on all types of problems;
one therefore cannot know a priori whether any one technique is the
most suitable for a particular problem.
•
In this research we focus on using ML for “mining” large genomic
data sets in order to classify human tumours.
Big Data and the Curse of Dimensionality
•
The number of different types of datasets in the public domain
continues to grow exponentially.
•
Academic computing research is currently addressing so-called “big
data” solutions to make sense of the vast datasets coming out of
research in other disciplines, including genomics and bioinformatics.
•
Such data sets are large-scale and high-dimensional, and are not
easily amenable to traditional data analysis tools.
•
ML techniques are suited for automated knowledge discovery from large
complex data sets.
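The difficulty that high-dimensional data poses for traditional distance-based analysis can be illustrated with a minimal sketch. It assumes uniformly random points in a unit hypercube (not real genomic data), and the function name `relative_contrast` is hypothetical. It measures how much the farthest point differs from the nearest one, relative to the nearest distance; this contrast tends to shrink as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the distances from one random
    query point to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # Synthetic data only: the contrast is expected to drop as dims grows.
    for dims in (2, 20, 200, 2000):
        print(dims, round(relative_contrast(200, dims), 3))
```

When the contrast is small, "nearest" and "farthest" neighbours are almost equally far away, which is one reason feature selection and dimensionality reduction are commonly applied before mining such data.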
Data Mining and Cancer Genomics
•
Data mining addresses the problem of discovering patterns, regularities
and structure within data collections.
•
For this reason, the field is also referred to as “knowledge discovery
in databases” (KDD).
•
Such discovered knowledge can then be applied to make predictions on
similar datasets, suggest explanations of dependencies between
variables, or generally improve decision making.
•
Cancer is becoming increasingly common in African populations.
•
Gene expression profiling can be used to distinguish between known
cancer sub-types, and to discover new types that may have remained
unknown to pathologists.
Research Collaboration between VUB and UWC
•
Capacity building in competences for advanced research and
scholarship: VUB investigators will offer support, expertise and training
to UWC staff and students.
•
Collaborative research into novel machine learning and data mining
techniques: next-generation data mining techniques will require new
machine learning algorithms and new methods for information storage
and retrieval, feature selection and selection optimization, and
optimization of decision making.
•
International cooperation through staff and student exchange visits:
research and educational material developed by either group will be
made available and accessible to the collaborating groups.
Long Term Objectives
•
Establishment of a Centre for Machine Learning and Data Mining
Applications at UWC.
•
Human capacity development in the field of machine learning and
data mining.
•
Recognition of South African science and technology through
international cooperation and collaboration.
•
Dissemination of South African research output in international
workshops and conferences.
Significance of Research
This research aims to answer basic questions in cancer genomics
research:
•
What is the optimal and most relevant representation of genomic
data?
•
How can a patient be diagnosed based on their gene expression profile
and on the knowledge gained from previously assayed patients?
•
How can genome-wide analytic tools best be integrated with the large
and rapidly increasing number of genome-wide datasets?
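The second question above, diagnosing a new patient from previously assayed patients, can be sketched as a nearest-centroid classifier. This is only an illustrative assumption, not the method chosen by the project: the expression values, the four genes, and the labels `subtype_A` and `subtype_B` are all made up, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(profiles, labels):
    """Average the expression profiles of each class into one centroid."""
    grouped = defaultdict(list)
    for profile, label in zip(profiles, labels):
        grouped[label].append(profile)
    return {
        label: [sum(values) / len(values) for values in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, profile):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


# Made-up expression values for four hypothetical genes (illustration only).
train_profiles = [
    [8.1, 2.0, 5.5, 1.2],
    [7.9, 2.3, 5.1, 0.9],
    [2.2, 7.8, 5.3, 6.1],
    [1.8, 8.4, 4.9, 6.5],
]
train_labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

centroids = fit_centroids(train_profiles, train_labels)

# A new, previously unseen (and equally made-up) patient profile.
new_patient = [7.5, 2.5, 5.0, 1.5]
predicted = predict(centroids, new_patient)
```

Real expression profiles contain thousands of genes and far fewer patients, so a practical pipeline would likely combine a classifier like this with feature selection and cross-validation.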
Mode of Collaboration
•
Through staff and student exchange visits, and joint supervision of research
students.
•
Participate in joint authorship, publication and dissemination of research
papers.
•
Host a research conference/workshop at UWC in each year of the project, jointly
organised by the partners, to which identified international experts and other
scientists in the field shall be invited.
•
Make project decisions jointly in a democratic fashion, and with the maximum
amount of information available, after discussions at face-to-face meetings or by
email.
•
Develop and establish an online collaboration platform within the project to
document data and code development, and other information related to the
project.
Work packages and Timelines
•
WP1: Use Distributed/High Performance Computing for large-scale
genome analysis.
•
WP2: Visits to enhance collaboration between partners
•
WP3: Africa Workshops on Artificial Intelligence, Machine
Learning, Data Mining and Bioinformatics
•
Establishment of UWC – Centre for Artificial Intelligence and Data
Mining.
Collaboration
•
Professor Alan Christoffels, Director, SANBI, UWC.
•
Professor Ann Nowe, VUB/(IB)2 (Machine
Learning/Bioinformatics)
•
Professor Tom Lenaerts, VUB/(IB)2 (Bioinformatics)