NeoMark: how to predict oral cancer recurrence through multiscale data analysis

Marco Picone, Sebastian Steger, Konstantinos Exarchos, Marco De Fazio, Gianfranco Chiari, Diego Ardigò, Elena Martinelli

Early prediction of cancer reoccurrence is a challenge for oncologists and surgeons. In the EU project NeoMark, scientists from different medical and biological research fields joined efforts with information technology experts to identify methods and algorithms able to predict the reoccurrence risk for oral squamous cell carcinoma (OSCC) at an early stage. The main challenge is to design and develop algorithms able to identify a “signature” or bio-profile of the disease by integrating multiscale and multivariate data from medical images, genomic profiles from tissue and circulating-cell RNA, and other medical parameters collected from patients before and after treatment. A limited number of relevant biomarkers will be identified and used in a real-time PCR device for early detection of disease reoccurrence.

The idea behind NeoMark is that, by analyzing a sufficiently large set of different types of data (clinical, biomedical, genomic, histological, from digital imaging, from surgical evidence, etc.) of patients affected by OSCC before treatment and at the time of remission, a set of relevant biomarkers that appear only in the presence of the disease might be identified. The recurrence of the same biomarker phenotype during post-remission follow-up may precede the clinical manifestation of the relapse, thus allowing earlier intervention.

Figure 1 - NeoMark System overview

The diverse user requirements, and especially the integration of heterogeneous input data, required a careful design of the NeoMark system. Our goal was to integrate as much functionality as possible into a single, unified, service-oriented system, achieving great flexibility and usability. These properties increase user acceptance and may decrease human error. The basic scheme of the implemented system is shown in Fig. 1.
The proposed architecture is a service-oriented architecture (SOA) that uses Web services to ensure interoperability between different systems. Several individual applications work as modules of the system and together provide a single starting point for users, who can add new information, review and edit available data, and run analyses on the stored information. The main module of this architecture is the data repository located on the NeoMark server. Different tools interact with this central component for data entry, genomic analysis, image processing, data mining and security. Some of these tools have a web-based access point; others, because of computational constraints, run on the client's machine, but always interact with the central unit. The NeoMark system is scalable: new hospitals or centers can easily be added and, after a short initialization procedure (setting up the sensitive data database and the standalone application), can immediately start storing and analyzing data.

The central repository is called the Integrated Health Record Repository (IHRR); its purpose is to store the many types of heterogeneous data coming from the different modules and layers of the system. In general, every system that handles patient data needs a concept for handling sensitive data, in order to protect the patient's right to privacy and to prevent data abuse. All data that allow a patient to be identified, either by themselves or in combination with other data, are considered sensitive. To address this, the NeoMark architecture has a central database for the clinical data (the IHRR) and separate local databases that store the sensitive data according to each hospital's constraints and restrictions.
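
As a rough illustration of this separation, the following Python sketch keeps identifying data in a hospital-local store and clinical data in a central store, linked only by a random identifier. It is a minimal sketch under stated assumptions, not the NeoMark implementation: the class names, the `patient_view` function, the field names and the use of a random UUID as the linking key are all hypothetical, and the sample patient is invented.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LocalSensitiveStore:
    """Hypothetical hospital-local store: maps pseudonymous IDs to identities."""
    hospital: str
    _identities: dict = field(default_factory=dict)

    def register(self, name: str, birth_date: str) -> str:
        # The linking key is random, so it carries no personal information.
        pid = uuid.uuid4().hex
        self._identities[pid] = {"name": name, "birth_date": birth_date}
        return pid

    def identity(self, pid: str) -> dict:
        return self._identities[pid]


@dataclass
class CentralRepository:
    """Hypothetical stand-in for the central repository: clinical data only."""
    _records: dict = field(default_factory=dict)

    def add_record(self, pid: str, clinical: dict) -> None:
        self._records.setdefault(pid, []).append(clinical)

    def records(self, pid: str) -> list:
        return self._records.get(pid, [])


def patient_view(local: LocalSensitiveStore, central: CentralRepository, pid: str) -> dict:
    """Join both sources, as a client inside the hospital network might do."""
    return {"identity": local.identity(pid), "clinical": central.records(pid)}


if __name__ == "__main__":
    local = LocalSensitiveStore("Hospital A")
    central = CentralRepository()
    pid = local.register("Jane Doe", "1960-01-01")  # invented sample patient
    central.add_record(pid, {"visit": "baseline", "tumor_stage": "T2"})
    print(patient_view(local, central, pid))
```

In this sketch the central store never sees a name or birth date; only a client that can reach the local store is able to join the two views.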
These local databases are located in each hospital's network, so that they are accessible only to local doctors who are authorized to see patients' personal information and who are connected from within that hospital's network. The interaction between the sensitive information and the NeoMark repository is managed by a dedicated standalone application, which hides this separation from end users and allows them to create, edit and manage sensitive information together with the clinical data stored in the centralized repository.

Figure 2 - Sensitive Data Management Architecture

Most of the user interaction takes place via the web interface. The physician can manage patients, enter clinical data, and view all features and the NeoMark results. The clinician can upload genomic data, and researchers can view anonymous statistics, which could serve as a basis for future research on oral cancer. However, there are three exceptions to this architecture:

• The NeoMark Image Processing Tool. This standalone Win64 application is installed on the radiologist's workstation. It is used to semi-automatically extract relevant features from medical images. Because of the huge amount of imaging data and the computational complexity of the image processing and analysis algorithms, it was not feasible to integrate this functionality into the rest of the system. However, the tool is connected to the NeoMark system via a network connection. The task of the feature extraction module is to extract, from that huge amount of data, meaningful numeric features of tumors and suspicious lymph nodes that appear to be important for reoccurrence prediction. Whether or not a feature is really important will eventually be determined by correlation analysis of each feature with the NeoMark result for a given training set. All images are acquired before treatment and then every 6 months during follow-up.
The high-resolution CT images (1 mm slice thickness) cover the entire head and neck region, whereas the MR images cover only the tumors and significant lymph nodes. Before the images can be loaded by the tool, they need to be anonymized in order to comply with privacy regulations on the handling of sensitive data.

Figure 3 - Image Processing Tool

• The PCR Chip Upload Tool. This tool downloads genomic features from a PCR chip reader device and submits them to the NeoMark system. Because it needs direct access to external hardware, this tool could not be integrated; it is a standalone application installed on the clinician's workstation. The qRT-PCR platform is under development at STMicroelectronics, in order to obtain quantitative information about the PCR amplification of the targeted genes. It is a portable, real-time, integrated analytical system based on qRT-PCR performed in an array of silicon micro chambers. The small size of the components, as well as the low power requirements, make this system an ideal candidate for further miniaturization into a hand-held, point-of-care device. The qRT-PCR lab-on-chip is disposable and relatively inexpensive, which makes this method of analysis economically viable. The excellent thermal conductivity of silicon makes it ideal for applications requiring rapid cycles of heating and cooling.

Figure 4 - PCR Tool

• The Genomic Data Cleaning and Filtering Tool. This tool is used to analyze gene expression data coming from Feature Extraction (FE) files. The analysis is based on control and duplicate features, filtering of genes with low data quality, and filtering of genes with a high number of missing values taken from. The relevant fields stored in the database are Feature Name, Probe Name, Gene Name, Systematic Name, Description and Log2-ratio. The application generates as output a small cleaned file that contains only this relevant information and that can be
uploaded from a specific page of the NeoMark web application into the database.

The analysis of the heterogeneous data is the cornerstone of the NeoMark artificial intelligence component. The aim of this component is twofold: i) to assess the risk of reoccurrence in the very early stages of treatment, i.e. as soon as the patient reaches remission, and ii) to model efficiently and effectively the evolution of the disease during the whole follow-up period based on a multitude of heterogeneous data, thus monitoring the patient's therapeutic progression. As described in the clinical scenario of the NeoMark project, a wide range of heterogeneous data is collected and analyzed for each patient who has been diagnosed with oral cancer. Specifically, because of the complex nature of the disease, a holistic approach is adopted that integrates a great multitude of clinical, imaging and genomic data in order to “frame” every possible aspect related to the onset and progression of oral cancer.

In the present study we employ dynamic Bayesian networks (DBNs) in order to identify potential relapses of the disease early, during the follow-up. A snapshot of the patient's medical condition is acquired at every predefined follow-up visit with the doctor. By exploiting the information in these historical snapshots, we aim to model the future progression of the disease. The proposed prognostic model is based on DBNs, which are temporal extensions of Bayesian networks (BNs).

We have presented a novel ICT-enabled method for predicting cancer reoccurrence and have described the system that implements this idea. In addition to the major innovation of collecting and jointly interpreting such an enormous amount of heterogeneous data, the development of the NeoMark system led to further innovations:

• The data analysis component predicts not only the overall probability of a relapse, but also the probability at a given time. All predictions are updated when new follow-up input data are received.
• For the first time, genomic data obtained from a PCR chip will eventually replace expensive and complex laboratory-based genomic data extraction.
• The innovative semi-automatic multimodal image feature extraction algorithms extract imaging features of tumors and lymph nodes that are well suited for further processing by the data analysis component because of their numeric nature and robustness.
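
To make the time-resolved prediction mentioned in the first point more concrete, the sketch below runs forward filtering on the simplest kind of dynamic Bayesian network, a two-state hidden Markov model with one observed variable per follow-up. This is only an illustration under stated assumptions: the structure and parameters of the actual NeoMark network are not given in this text, the single "biomarker finding" variable is a hypothetical placeholder, and every probability in the code is invented.

```python
# Minimal two-state DBN (hidden Markov model) for relapse monitoring.
# All probabilities below are invented for illustration, not NeoMark results.
STATES = ("remission", "relapse")

PRIOR = {"remission": 0.95, "relapse": 0.05}

# P(state at next follow-up | state now); relapse is treated as absorbing.
TRANSITION = {
    "remission": {"remission": 0.90, "relapse": 0.10},
    "relapse": {"remission": 0.00, "relapse": 1.00},
}

# P(hypothetical biomarker finding | state)
EMISSION = {
    "remission": {"negative": 0.85, "positive": 0.15},
    "relapse": {"negative": 0.20, "positive": 0.80},
}


def normalize(dist):
    total = sum(dist.values())
    return {s: p / total for s, p in dist.items()}


def predict(belief):
    """Advance the belief by one follow-up interval (time update)."""
    return {
        s2: sum(belief[s1] * TRANSITION[s1][s2] for s1 in STATES)
        for s2 in STATES
    }


def update(belief, observation):
    """Condition the belief on a new follow-up observation (Bayes' rule)."""
    return normalize({s: belief[s] * EMISSION[s][observation] for s in STATES})


def filter_followups(observations):
    """Return P(relapse) after each follow-up, given all data so far."""
    belief = update(PRIOR, observations[0])
    history = [belief["relapse"]]
    for obs in observations[1:]:
        belief = update(predict(belief), obs)
        history.append(belief["relapse"])
    return history


def forecast(belief, steps):
    """P(relapse) at each of the next `steps` follow-ups, with no new data."""
    out = []
    for _ in range(steps):
        belief = predict(belief)
        out.append(belief["relapse"])
    return out


if __name__ == "__main__":
    print(filter_followups(["negative", "negative", "positive"]))
    print(forecast({"remission": 1.0, "relapse": 0.0}, 3))
```

Here `filter_followups` corresponds to updating the prediction whenever a new follow-up snapshot arrives, while `forecast` gives the relapse probability at future time points. A realistic model would replace the single observed variable with the clinical, imaging and genomic features and would learn its parameters from patient data.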