Pre-Triage Decision Support Improvement in Maternity Care by means of Data Mining

Eliana Pereira(a), Andreia Brandão(a), Maria Salazar(c), Carlos Filipe Portela(b), Manuel Filipe Santos(b), José Machado(a), António Abelha(a), Jorge Braga(d)

(a) Computer Science and Technology Center (CCTC), University of Minho, Braga, Portugal; (b) Algoritmi Research Center, University of Minho, Guimarães, Portugal; (c) Serviços de Sistemas de Informação, Centro Hospitalar do Porto, Porto, Portugal; (d) Centro Materno Infantil do Norte, Centro Hospitalar do Porto, Porto, Portugal

ABSTRACT
Because conventional triage systems such as the Manchester Triage System (MTS) are not suitable for maternity care, a decision model for pre-triaging patients into emergency (URG) and consultation (ARGO) classes was built and incorporated into a Decision Support System (DSS) implemented in Centro Materno Infantil do Norte (CMIN). The DSS also produces several indicators to support clinical and management decisions. A recent data analysis revealed a bias in the classification of URG cases: cases classified as URG frequently correspond to ARGO. This misclassification was studied by means of Data Mining (DM) techniques in order to improve the pre-triage model and to discover knowledge for developing a new triage system based on waiting times and on a five-level scale of classes. This chapter presents a form of sensitivity analysis that combines input variables across six patient flowcharts and four data scenarios, using four different DM techniques. The CRISP-DM methodology was used to conduct the project.

INTRODUCTION
Nowadays, we live in an age in which information, knowledge and globalization are important issues. Organizations must be able to respond to new challenges and new demands in an environment of constant change.
In the healthcare sector, information technologies provide healthcare professionals with complete and reliable information to support their clinical and administrative decisions (Khodambashi, 2013). One example is the triage system of a hospital emergency unit. Various types of triage systems are used in hospital settings; the most commonly used consider five levels of severity: the Emergency Severity Index (ESI), the Manchester Triage System (MTS) and the Canadian Triage and Acuity Scale (CTAS). The main limitation of these scales is their lack of flexibility, since they are designed for general emergency units rather than for specialised ones (Portela et al., 2013). In Centro Hospitalar do Porto (CHP), and in particular in the women's emergency care of Gynaecology and/or Obstetrics (GO), it was found that the MTS, implemented in the urgency service, is not the most accurate system for specific cases of triage such as obstetrics and gynaecology. This is because many of the triage questions flagged cases as urgent when in fact they were not (false positives). This increases the number of patients in the emergency service and, consequently, the waiting times of patients who actually need priority attention. Given these limitations, a new system was developed to reduce the false positive rate. An Intelligent Decision Support System (IDSS) was implemented for pre-triaging patients into two classes: Urgent (URG), when the patient should be treated at the emergency service of CMIN, and Less Urgent (ARGO), when the patient is referred to a consultation at CMIN. The IDSS is designed to run in real time and to include business intelligence components (e.g. indicators of voluntary interruption of pregnancy and triage indicators) and Data Mining.
This system has been in operation since 2010 and, over its four years of existence, the number of GO patients in the urgency service of CHP has decreased significantly. However, this solves only part of the problems inherent to an emergency department: the system routes the patient efficiently but does not perform a priority triage according to the patient's symptoms. Further work is needed to understand and improve the pre-triage rules. Data Mining (DM) techniques were used to determine the level of accuracy of the implemented pre-triage system, identifying opportunities to improve the quality of patient care. To this end, classification models were induced to predict whether the assignment of URG and ARGO agrees with the questionnaire used to evaluate the clinical characteristics of the patient. The best result was obtained in the scenario without URG and ARGO (accuracy close to 100%), which demonstrates the reliability of the system for pre-triage. However, the other scenarios (accuracy of around 80% in the worst case) show the need to transform the pre-triage system into a five-level priority system that allows a better categorization of patients. Beyond this introduction, the document includes five further sections. The first covers background and related work: it describes the context in which the problem occurs, the process of Knowledge Discovery from Databases and the CRISP-DM method. The second section presents a DM case study following the CRISP-DM methodology. The results are discussed in the third section. The fourth section presents some conclusions and, finally, the fifth points out some future directions.

BACKGROUND AND RELATED WORK
Context
Centro Materno Infantil do Norte (CMIN) is a unit of Centro Hospitalar do Porto (formerly Maternidade Júlio Dinis). Women needing urgent gynaecological or obstetric care go through a pre-triage system developed at CHP.
This system aims to predict the appropriate routing of the patient, taking into account her clinical characteristics, and therefore helps select the best decision for each situation. Knowledge discovery and data mining techniques were used to develop it. This Intelligent Decision Support System (IDSS) uses data collected through a specific triage questionnaire. The knowledge for the first version of the decision models was obtained directly from the empirical and scientific expertise of health professionals; later, adjustments were made to optimize the process. The system has been running since January 2010. Over these four years, 66,730 patients were admitted to CMIN: 18,773 in 2010, 18,348 in 2011, 12,445 in 2012 and 17,164 in 2013. As mentioned before, the routing system implemented in CMIN distinguishes urgent cases (URG) from less urgent cases (ARGO); however, healthcare professionals are in charge of the final decision and can force a URG or ARGO classification if they do not agree with the result returned by the pre-triage system. Only a pre-triage is performed; no priorities are assigned.
Triage systems and pre-triage system in CMIN
Most emergency services in North America and Europe use triage tools to ensure that patients who need intensive care receive priority treatment and to determine which patients require minor care and can wait, giving priority to patients in the worst clinical condition (Murray, Bullard, & Grafstein, 2004; Smithson et al., 2013). In a hospital environment, various types of triage systems are used. The most commonly used are those with five levels of severity: the Emergency Severity Index (ESI), the Manchester Triage System (MTS) and the Canadian Triage and Acuity Scale (CTAS).
The main limitation of these scales is their lack of flexibility: typically, they are used only in emergency units (Murray, Bullard, & Grafstein, 2004) and are not prepared for the specificities of other units, such as maternity care. Patients who attend the obstetrics and gynaecology specialties are a particular case, since they present very specific symptoms typical of these fields. An example of a priority triage system specific to gynaecology and obstetrics is the Obstetric Triage Acuity Scale (OTAS). OTAS was developed from the Canadian Triage and Acuity Scale (CTAS), a tool introduced in 1999 and revised in 2006 and 2008 (Murray et al., 2004). OTAS is of limited use here because it covers only pregnant women, whereas CMIN also serves patients who are not pregnant (Smithson et al., 2013). Hence, a pre-triage system was built in CMIN using conventional knowledge acquisition and representation techniques, characterizing patients at two levels of importance: URG for urgent cases and ARGO for less urgent ones. This pre-triage system consists of a set of six flowcharts supported by a detailed questionnaire that attempts to cover every class of patient admitted to CMIN: pregnant women, postpartum women, non-postpartum women, women who may be pregnant, patients for VIP and patients for CTG.
Agency for Integration, Archive and Diffusion
Interoperability among the information systems in CHP is guaranteed by the Agency for Integration, Archive and Diffusion of Medical Information (AIDA), which uses intelligent agents to enable communication among different systems. This multi-agent system allows the standardization of clinical systems and overcomes the medical and administrative complexity inherent to different sources of information.
All medical information systems are supported by AIDA, including the Electronic Health Record (EHR) and the triage routing system implemented in CMIN (Peixoto, Santos, Abelha, & Machado, 2012; Abelha, Analide, Machado, Neves, & Novais, 2007).
SWOT analysis
SWOT analysis addresses Strengths, Weaknesses, Opportunities and Threats. It is an important tool to support decision making and is usually used to analyze strategic situations systematically and to assess an organization's position with respect to its internal and external environments. With this tool, strategies can be developed that build on strengths, eliminate weaknesses, exploit opportunities and counter threats. Strengths and weaknesses are identified by an internal review of the organization, while opportunities and threats result from an external review. The analysis helps organizations, projects or even individuals think systematically and diagnose the relevant factors comprehensively, so that positive and negative factors can be identified and a strategy that fits them can be developed (Salar & Salar, 2014; Shariatmadari, Sarfaraz, Hedayat, & Vadoudi, 2013; Pereira, Salazar, Abelha, & Machado, 2013).
Knowledge Discovery and Data Mining
The Knowledge Discovery from Databases (KDD) process encompasses a set of iterative activities aimed at discovering knowledge from databases.
It consists of five stages (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Krzysztof, Witold, Roman, & Lukasz, 2007): Selection: the data set needed to perform DM is selected; Pre-processing: the data are cleaned and processed to make them consistent; Transformation: the data are transformed according to the target variable; Data Mining: the objectives and the type of result to achieve are defined; according to the desired result, the type of task (e.g. classification, segmentation, summarization, dependency modelling) and the technique (e.g. decision trees, association rules, linear regression, artificial neural networks) are chosen, and the selected techniques are applied to the data set to obtain patterns; Interpretation/Evaluation: the patterns obtained are interpreted and evaluated, and the results are validated by applying the patterns found to new data sets (Azevedo, 2011). Until 1995, many authors considered KDD and DM equivalent terms. Nowadays, DM is regarded as one phase of the KDD process, consisting in finding patterns or relationships that may exist in the data stored in data repositories, while KDD refers to the whole process of discovering useful knowledge.
Cross Industry Standard Process for Data Mining (CRISP-DM)
The Cross Industry Standard Process for Data Mining (CRISP-DM) was the DM methodology adopted, owing to its characteristics. CRISP-DM is closely related to the phases of the KDD process described above.
CRISP-DM divides the data mining process into six major phases (Chapman, 2000; Krzysztof et al., 2007; Machado, Abelha, Rua, & Centre, 2013): Business Understanding: focuses on understanding the goal of the project from a business perspective and defining a first plan to achieve it; Data Understanding: involves data collection and initial activities to become familiar with the data and identify problems or interesting subsets; Data Preparation: includes all the tasks needed to build the table of cases used for modelling, namely selection of attributes, data cleaning and transformation, and possibly the derivation of new attributes from existing ones. These tasks are expected to be executed multiple times, and this phase can significantly improve the information that DM can discover; Modelling: modelling techniques (e.g. decision trees, association rules, linear regression, artificial neural networks) are applied and their parameters calibrated for optimization; it is common to return to the data preparation phase during this stage; Evaluation: at this point a model that appears to be of high quality from a data analysis perspective has been built, but it is necessary to check whether it meets the business objectives; Deployment: the knowledge obtained by the model is presented in a form that the customer can use.
Model Evaluation
DM models built to predict a particular process analyze the relationship between the variables used and their contribution to the target. For a two-class target, a confusion matrix M can be used. Cell M(1,1) holds the number of True Positive (TP) results, in which the predicted value corresponds to the expected value.
Cell M(2,1) holds the False Positive (FP) results, in which the model incorrectly predicts the occurrence of the procedure (type I error). Cell M(2,2) holds the True Negative (TN) results, in which the model correctly predicts the non-occurrence of the procedure, and cell M(1,2) holds the False Negative (FN) results, in which the occurrence of the procedure is not detected (type II error) (Beguería, 2006; Ripley, 2002). From these values, statistical metrics for assessing model quality can be derived, in particular: Sensitivity: the ability to correctly detect the occurrence of the procedure, given by the ratio of true positives to all actual positives, TP / (TP + FN); Specificity: the ability to correctly identify the non-occurrence of the procedure, given by the ratio of true negatives to all actual negatives, TN / (TN + FP); Accuracy: the proportion of correct results among all cases, (TP + TN) / (TP + TN + FP + FN) (João, 2007).

IMPROVEMENT OF PRE-TRIAGE DECISION USING DATA MINING
To validate the pre-triage system of CMIN, Data Mining classification techniques were used to verify whether the URG/ARGO decision model is well calibrated with respect to the questionnaires completed for patients. In other words, this set of experiments aims to predict whether the assignment of URG and ARGO can be standardized according to the specific questionnaire used to evaluate the patient's clinical characteristics. This procedure was carried out for all the flowcharts covering the six types of patient in CMIN (pregnant, postpartum, not pregnant, may be pregnant, VIP patients and patients for CTG).
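The sensitivity, specificity and accuracy metrics defined in the Model Evaluation subsection can be computed from paired actual and predicted labels, as in the minimal sketch below. It is an illustration, not the tooling used in the study: it assumes URG is the positive class (matching the 1/0 encoding later chosen for GLM), and the names `ConfusionMatrix` and `confusion_matrix` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Two-class confusion matrix M (rows = actual, columns = predicted).

    M(1,1) = TP, M(1,2) = FN, M(2,1) = FP, M(2,2) = TN.
    """
    tp: int
    fn: int
    fp: int
    tn: int

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # An undefined ratio (e.g. no actual positives in the test set) becomes NaN.
        return num / den if den else float("nan")

    def sensitivity(self) -> float:
        # TP / (TP + FN)
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # TN / (TN + FP)
        return self._ratio(self.tn, self.tn + self.fp)

    def accuracy(self) -> float:
        # (TP + TN) / (TP + TN + FP + FN)
        return self._ratio(self.tp + self.tn, self.tp + self.tn + self.fp + self.fn)


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "URG") -> ConfusionMatrix:
    """Count TP/FN/FP/TN, treating `positive` (assumed to be URG) as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fn, fp, tn)
```

For example, with actual labels URG, URG, ARGO, ARGO, URG and predictions URG, ARGO, ARGO, URG, URG, the sketch counts TP = 2, FN = 1, FP = 1 and TN = 1, giving a sensitivity of 2/3, a specificity of 1/2 and an accuracy of 3/5.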
Business Understanding
As mentioned before, the result of the pre-triage system implemented in CMIN can be URG (the patient is routed to the urgency service of CMIN) or ARGO (the patient is routed to an urgent consultation). Also as mentioned, the CMIN emergency service attends six types of patients, each associated with a set of specific questions that determine the state of Urgency (URG) or non-urgency (ARGO). In this context, the problem can be formulated as "How likely is the answer URG or ARGO, taking into account the clinical characteristics of patients?". This can be translated into a Data Mining problem as "How accurately is a patient identified as URG, given a set of specific clinical features?"
Data Understanding
The data were extracted from AIDA and analyzed in terms of the quality of the variables to be used in the process. The sample covers the period between 06.01.2010 and 08.04.2014. In total, 78,984 cases were analysed, divided as follows: 35,238 cases of pregnant women; 4,050 cases of postpartum women; 24,547 cases of non-postpartum women; 4,754 patients who may be pregnant; 2,843 patients who use CMIN for the Voluntary Interruption of Pregnancy (VIP) process; and 2,511 patients for Cardiotocography (CTG). Attributes were extracted from the different forms used in the pre-triage system, as explained below.
Patient "Pregnant" consists of the following variables: Result of the triage (RoT): the target variable, which dictates the outcome of the triage process; possible results: {URG, ARGO}; Symptoms: specific symptoms that can occur in pregnancy and relate to the well-being of the fetus or the pregnant woman.
Possible results: {Headache (Hd), Visual Changes (VC), Tension (blood pressure) Increase relative to Reference (TIR), Epigastric Pain/Right Hypochondrium (EP\RH), Nausea/Vomiting (N\V), Changes in Skin/Mucous Colour (CS\MC), Breakthrough Bleeding (BB), Decreased Fetal Movement (DFM), Loss of Amniotic Fluid (LAF), Trauma in Pregnancy (TP)}; Another pathological reason (APR): if none of the symptoms above is present, the pathological reason should be stated here; General state (GS): the nurses' assessment of the patient's general condition, with three possible outcomes: {good, bad, reasonable}; Pain Scale (PS): a scale from 1 to 10, where 1 represents total absence of pain and 10 the worst pain possible; Symptoms1: symptoms of a more general nature; possible results: {Fever (Fv), Urinary Symptoms (US), Hemorrhage (Hm), Convulsions (Cv), Syncope (Sc)}.
Patient "Postpartum" consists of the following variables: Symptoms2: specific symptoms that can occur in postpartum women and relate to their well-being; possible results: {Breast Swelling (BS), Foul Lochia (FL), Remove Suture (RS), Fluid (blood or other) Passes Through the Dressing (FPTD)}; the other variables, already described for patient type "Pregnant", are Result of the triage (RoT), Pain Scale (PS) and Symptoms1.
Patients "Not Postpartum" and "Maybe Pregnant" consist of the following variables, all already described for patient type "Pregnant": Result of the triage (RoT), Pain Scale (PS) and Symptoms1.
Patients "For VIP" and "For CTG" consist of the following variables, already described for patient type "Pregnant": Result of the triage (RoT) and Pain Scale (PS).
Data Preparation
At this stage, some studies were performed to construct scenarios for building the desired models. Four scenarios were considered: All data: all the data in the repository were used to build the models for each type of patient; Without ARGO: all the data were used except the records in which the target variable was set to ARGO although this did not match the outcome expected for that type of patient; Without URG: all the data were used except the records in which the target variable was set to URG although this did not match the expected outcome; Without URG and ARGO: all the data were used except the records in which either URG or ARGO did not match the expected outcome.
After a preliminary analysis, the data were found to be of adequate quality. A statistical analysis was then performed; Figure 1 represents the number of occurrences of each value of the target variable for each scenario and each flowchart.
Figure 1- Distribution of the values of the target Result of triage (ARGO or URG) in the different models for pregnant, postpartum, non-postpartum, maybe pregnant, VIP and CTG patients.
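The four Data Preparation scenarios can be read as filters over the triage records. The sketch below assumes that each record carries both the recorded Result of the triage and a hypothetical `expected` field holding the outcome the flowchart rules would produce; the source does not describe how that information was stored, so `TriageRecord`, `SCENARIOS` and `build_scenarios` are illustrative names only. A record whose recorded and expected outcomes disagree is treated as a forced case.

```python
from typing import Callable, Dict, List, NamedTuple


class TriageRecord(NamedTuple):
    rot: str       # recorded Result of the triage (target): "URG" or "ARGO"
    expected: str  # hypothetical field: outcome the flowchart rules would give


def _mismatched(record: TriageRecord, label: str) -> bool:
    """True when `label` was recorded although the flowchart expected the other class."""
    return record.rot == label and record.rot != record.expected


# Each scenario is a predicate deciding whether a record is kept.
SCENARIOS: Dict[str, Callable[[TriageRecord], bool]] = {
    "All data": lambda r: True,
    "Without ARGO": lambda r: not _mismatched(r, "ARGO"),
    "Without URG": lambda r: not _mismatched(r, "URG"),
    "Without URG and ARGO": lambda r: r.rot == r.expected,
}


def build_scenarios(records: List[TriageRecord]) -> Dict[str, List[TriageRecord]]:
    """Return the subset of records kept by each scenario filter."""
    return {name: [r for r in records if keep(r)] for name, keep in SCENARIOS.items()}
```

With one consistent URG record, one forced URG record, one forced ARGO record and one consistent ARGO record, the scenarios keep 4, 3, 3 and 2 records respectively. Expressing the scenarios as predicates keeps them applicable, unchanged, to each of the six flowcharts.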
Modelling
To induce classification models, four DM classification techniques were used (Fayyad et al., 1996; Rojão, 2011): Decision Tree (DT): automatically generates rules, i.e. conditional statements that show the logic used to build the tree; Naive Bayes (NB): uses Bayes' Theorem to calculate probabilities by counting the frequency of values and combinations of values in historical data; Generalized Linear Models (GLM): a popular statistical technique for linear modelling, used here for binary classification; Support Vector Machine (SVM): a powerful DM technique supporting linear and nonlinear models for binary and multiclass classification.
To use GLM for binary classification, it was necessary to return to the previous phase and transform the variable Result of the triage into a binary one. Its values are URG, when patients are routed to the urgency service of CMIN, and ARGO, when patients are routed to urgent consultations; because URG and ARGO are mutually exclusive classes, "1" was assigned to URG cases and "0" to ARGO cases.
The models developed for each flowchart can be represented by:
Mn ≡ <Af, Vi, TDMy>
Model Mn belongs to the approach (A) and is composed of variables (V) and a DM technique (TDM), where:
Af ∈ {Classification}
TDMy ∈ {SVM, NB, GLM, DT}
Each flowchart uses a different set of variables (V). For the flowchart "Pregnant":
Vi ∈ {RoT, Hd, VC, TIR, EP\RH, N\V, CS\MC, BB, DFM, LAF, TP, APR, GS, PS, Fv, US, Hm, Cv, Sc}
For the flowchart "Postpartum":
Vi ∈ {BS, FL, RS, FPTD, RoT, PS, Fv, US, Hm, Cv, Sc}
For the flowcharts "Not Postpartum" and "Maybe Pregnant":
Vi ∈ {RoT, PS, Fv, US, Hm, Cv, Sc}
Finally, for the flowcharts "For VIP" and "For CTG":
Vi ∈ {RoT, PS}
Overall, 96 models were induced (4 scenarios × 4 techniques × 6 flowcharts/types of patient × 1 target).
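The model space Mn ≡ <Af, Vi, TDMy> described above can be enumerated programmatically, which also confirms the count of 96 models. The sketch below is illustrative only: `ModelSpec`, `FLOWCHART_INPUTS`, `encode_target` and `enumerate_models` are hypothetical names, the abbreviations follow the Vi sets listed in the text (with RoT treated as the target rather than an input), and no model is actually trained.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

# Binary target encoding described in the text: URG -> 1, ARGO -> 0.
TARGET_CODES: Dict[str, int] = {"URG": 1, "ARGO": 0}

SCENARIOS = ("All data", "Without URG", "Without ARGO", "Without URG and ARGO")
TECHNIQUES = ("SVM", "NB", "GLM", "DT")

# Input variables Vi per flowchart; RoT is the target, so it is kept apart.
_GENERAL = ("PS", "Fv", "US", "Hm", "Cv", "Sc")
FLOWCHART_INPUTS: Dict[str, Tuple[str, ...]] = {
    "Pregnant": ("Hd", "VC", "TIR", r"EP\RH", r"N\V", r"CS\MC", "BB", "DFM",
                 "LAF", "TP", "APR", "GS") + _GENERAL,
    "Postpartum": ("BS", "FL", "RS", "FPTD") + _GENERAL,
    "Not Postpartum": _GENERAL,
    "Maybe Pregnant": _GENERAL,
    "For VIP": ("PS",),
    "For CTG": ("PS",),
}


@dataclass(frozen=True)
class ModelSpec:
    """One model Mn = <Af, Vi, TDMy>, tagged with its flowchart and scenario."""
    approach: str            # Af
    flowchart: str
    scenario: str
    inputs: Tuple[str, ...]  # Vi without the target
    technique: str           # TDMy
    target: str = "RoT"


def encode_target(values: List[str]) -> List[int]:
    """Map Result of the triage labels to the 1/0 codes used for GLM."""
    return [TARGET_CODES[v] for v in values]


def enumerate_models() -> List[ModelSpec]:
    """Cartesian product of flowcharts, scenarios and techniques."""
    return [
        ModelSpec("Classification", f, s, FLOWCHART_INPUTS[f], t)
        for f, s, t in product(FLOWCHART_INPUTS, SCENARIOS, TECHNIQUES)
    ]
```

Calling `len(enumerate_models())` yields 6 × 4 × 4 = 96 specifications, one per flowchart, scenario and technique combination, each of which could then be trained on its scenario's data.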
Evaluation
The DM models were evaluated with the metrics described before. The models used 60% of the data for training and 40% for testing (holdout). Sensitivity, specificity and accuracy were calculated for each model and each type of patient, as shown in Tables 1 to 6.

Table 1- Models evaluation for pregnant patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.953/0.660/0.800 | 0.951/0.685/0.818 | 0.951/0.685/0.818 | 0.952/0.603/0.751
Without URG              | 0.957/0.647/0.789 | 0.949/0.693/0.822 | 0.949/0.693/0.822 | 0.957/0.605/0.753
Without ARGO             | 1.000/0.702/0.850 | 1.000/0.701/0.849 | 1.000/0.702/0.850 | 1.000/0.614/0.778
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/0.838/0.918

Table 2- Models evaluation for postpartum patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.925/0.716/0.823 | 0.925/0.715/0.822 | 0.925/0.716/0.823 | 0.931/0.692/0.809
Without URG              | 0.911/0.998/0.947 | 0.911/0.993/0.945 | 0.911/0.998/0.947 | 0.916/0.942/0.927
Without ARGO             | 1.000/0.743/0.870 | 1.000/0.739/0.868 | 1.000/0.742/0.870 | 1.000/0.718/0.853
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/0.993/0.997 | 1.000/0.998/0.999 | 1.000/0.945/0.976

Table 3- Models evaluation for not postpartum patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.823/0.888/0.869 | 0.823/0.888/0.869 | 0.823/0.888/0.869 | 0.822/0.887/0.868
Without URG              | 0.821/1.000/0.942 | 0.821/1.000/0.942 | 0.821/1.000/0.942 | 0.821/0.999/0.942
Without ARGO             | 1.000/0.888/0.917 | 1.000/0.888/0.917 | 1.000/0.888/0.917 | 1.000/0.887/0.916
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/0.999/0.999

Table 4- Models evaluation for maybe pregnant patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.247/0.000/0.247 | 0.795/0.861/0.850 | 0.795/0.863/0.851 | 0.795/0.861/0.850
Without URG              | 0.789/1.000/0.961 | 0.789/0.997/0.959 | 0.789/1.000/0.961 | 0.789/0.997/0.959
Without ARGO             | 0.246/0.000/0.246 | 1.000/0.861/0.879 | 1.000/0.862/0.879 | 1.000/0.861/0.879
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/0.998/0.998 | 1.000/1.000/1.000 | 1.000/0.998/0.998

Table 5- Models evaluation for VIP patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.026/0.000/0.026 | 0.667/0.980/0.977 | 0.667/0.980/0.977 | 0.667/0.980/0.977
Without URG              | 0.667/1.000/0.997 | 0.667/1.000/0.997 | 0.667/1.000/0.997 | 0.667/1.000/0.997
Without ARGO             | 0.027/0.000/0.027 | 1.000/0.977/0.977 | 1.000/0.977/0.977 | 1.000/0.977/0.977
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 0.000/0.995/0.995

Table 6- Models evaluation for CTG patients.
Scenario (Sens/Spec/Acc) | SVM               | NB                | GLM               | DT
All data                 | 0.199/0.000/0.199 | 1.000/0.823/0.827 | 1.000/0.823/0.827 | 1.000/0.823/0.827
Without URG              | 0.941/1.000/0.998 | 0.941/1.000/0.998 | 0.941/1.000/0.998 | 0.941/1.000/0.998
Without ARGO             | 0.200/0.000/0.200 | 1.000/0.815/0.818 | 1.000/0.815/0.818 | 1.000/0.815/0.818
Without URG and ARGO     | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/1.000/1.000 | 1.000/1.000/1.000

Deployment
The models obtained will be used to improve the pre-triage system implemented in CMIN and will be integrated into the DSS implemented there. An increase in the quality of patient care and service is expected, through better resource allocation and shorter waiting times.
To assess the DSS and the pre-triage system and to define a strategy, a SWOT analysis was carried out:
Strengths: system calibrated to discriminate between URG and ARGO; specific system for gynaecology and obstetrics; usability; interoperability; high availability; health professionals interested in the benefits of the implemented IDSS; strong collaboration between clinical staff (nurses and physicians) and information systems staff.
Weaknesses: limited and reduced scope; possibility of error in referring patients; the system is only a routing model able to distinguish patients at two levels (URG, ARGO).
Opportunities: system with great potential for growth; evolution to a specific priority system similar to MTS and OTAS; introduction or improvement of real-time and online learning components; use of Data Mining to improve the DSS.
Threats to the pre-triage system: wrong diagnoses; system failures; competition from other similar systems; system security.

DISCUSSION
In the data preparation phase, it was found that a minority of the cases classified as URG (10%) do not have enough information to justify that classification. On the other hand, some cases labelled as ARGO have parameters associated with the URG classification. These situations occur only when the patients have one or two parameters associated with the URG classification. The reason is that the healthcare professionals responsible for triage have the final decision and can force a different outcome (URG, ARGO or EMERG) if they do not agree with the result of the pre-triage system. In 12% of the cases, healthcare professionals force a result different from the one produced by the pre-triage system of CMIN. The graphs of Figure 1, presented in the Data Preparation subsection, show the data distribution for each of the flowcharts/types of patient considered in this study.
Taking into account the models induced for each of the four proposed scenarios (all data, without URG, without ARGO, and without URG and ARGO), the best results obtained for each flowchart/type of patient are presented in Table 7. The number of correct and incorrect predictions was calculated on the 40% of the data held out for testing, for each target/data set.

Table 7. Number of cases that the best Data Mining technique(s) classified correctly and incorrectly in each scenario, for the pregnant, postpartum, non-postpartum, maybe pregnant, VIP and CTG flowcharts/types of patients.

Flowchart       Scenario              Best DM technique(s)  Correct  Incorrect  % correct
Pregnant        All data              NB                      11527       2568      81.78
Pregnant        Without URG           NB                      14544       3148      82.21
Pregnant        Without ARGO          GLM and SVM             14658       2596      84.95
Pregnant        Without URG and ARGO  GLM and SVM             11502          0     100.00
Postpartum      All data              GLM and SVM              1351        291      82.28
Postpartum      Without URG           GLM and SVM              1331         75      94.67
Postpartum      Without ARGO          SVM                      1398        208      87.05
Postpartum      Without URG and ARGO  SVM                      1311          0     100.00
Non-postpartum  All data              SVM, NB and GLM          8597       1295      86.91
Non-postpartum  Without URG           GLM, SVM and NB          8566        527      94.21
Non-postpartum  Without ARGO          GLM, SVM and NB          8519        775      91.66
Non-postpartum  Without URG and ARGO  GLM, SVM and NB          8488          0     100.00
Maybe pregnant  All data              GLM                      1592        278      85.13
Maybe pregnant  Without URG           GLM                      1589         64      96.13
Maybe pregnant  Without ARGO          GLM                      1650        227      87.91
Maybe pregnant  Without URG and ARGO  GLM and SVM              1519          0     100.00
VIP             All data              GLM, NB and DT           1105         26      97.70
VIP             Without URG           GLM, SVM, DT and NB      1091          3      99.73
VIP             Without ARGO          GLM, DT and NB           1066         25      97.71
VIP             Without URG and ARGO  GLM and SVM              1102          0     100.00
CTG             All data              GLM, NB and DT            801        167      82.74
CTG             Without URG           GLM, SVM, DT and NB       805          2      99.75
CTG             Without ARGO          GLM, DT and NB            788        175      81.83
CTG             Without URG and ARGO  GLM, SVM, DT and NB       802          0     100.00

In general, the all data scenario obtained the worst results. This is explained by the existence of records classified as URG that do not have any parameter associated with URG, and of records whose result was ARGO when it should have been URG (because they have parameters associated with URG). For this scenario, the worst performance corresponds to the pregnant flowchart, with 81.78% of correct classifications, and the best to VIP patients, with an accuracy of 97.70%. The VIP models can be considered acceptable for a decision support system, because in all scenarios their percentage of correct predictions was above 97.5%.
The scenarios without URG and without ARGO were modelled in order to validate the pre-triage system and to find possible improvement points. For the scenario without URG, the worst result corresponds to pregnant patients, with 82.21% of correct classifications, and the best to CTG, with 99.75%. The scenario without ARGO presented the worst result for CTG patients, with 81.83% of correct classifications, and the best for VIP, with about 97.71%. Removing the unexpected URG and ARGO cases increases accuracy. This means that when the output is not forced (scenario without URG and ARGO), the flowcharts are adequate, i.e., only in these cases does the pre-triage system work without failures. The pre-triage system can be improved, as witnessed by the all data, without ARGO and without URG scenarios. The scenario without URG and ARGO showed that the pre-triage system is calibrated (DM accuracy of 100%) and, consequently, that the flowcharts are adequate. However, in the other scenarios (all data, without URG and without ARGO) the attained results (accuracy lower than 89.71%) showed that it is sometimes necessary to force the pre-triage result in order to better characterize the patient condition.
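The percentages in Table 7 follow from the counts as correct / (correct + incorrect). The Python sketch below recomputes them from the transcribed counts and reproduces the best and worst flowchart per scenario quoted above; the scenario without URG and ARGO is omitted because every flowchart reaches 100% there. `TABLE7`, `percent_correct` and `worst_and_best` are illustrative names, and recomputed values may differ from the table in the second decimal because of rounding.

```python
# (correct, incorrect) test-set counts transcribed from Table 7.
TABLE7 = {
    "All data": {
        "Pregnant": (11527, 2568), "Postpartum": (1351, 291),
        "Non-postpartum": (8597, 1295), "Maybe pregnant": (1592, 278),
        "VIP": (1105, 26), "CTG": (801, 167),
    },
    "Without URG": {
        "Pregnant": (14544, 3148), "Postpartum": (1331, 75),
        "Non-postpartum": (8566, 527), "Maybe pregnant": (1589, 64),
        "VIP": (1091, 3), "CTG": (805, 2),
    },
    "Without ARGO": {
        "Pregnant": (14658, 2596), "Postpartum": (1398, 208),
        "Non-postpartum": (8519, 775), "Maybe pregnant": (1650, 227),
        "VIP": (1066, 25), "CTG": (788, 175),
    },
}


def percent_correct(correct: int, incorrect: int) -> float:
    """Accuracy on the test set, as a percentage."""
    return 100.0 * correct / (correct + incorrect)


def worst_and_best(scenario: str) -> tuple[str, str]:
    """Return the flowcharts with the lowest and the highest accuracy."""
    scores = {name: percent_correct(*counts)
              for name, counts in TABLE7[scenario].items()}
    return min(scores, key=scores.get), max(scores, key=scores.get)


for scenario in TABLE7:
    worst, best = worst_and_best(scenario)
    print(f"{scenario}: worst={worst}, best={best}")
```

The printed pairs (Pregnant/VIP, Pregnant/CTG and CTG/VIP) match the worst and best cases discussed for each scenario.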
After an analysis of patient records from January 2010 to April 2014, some weaknesses were found. Some errors may occur in the categorization of the patient outcome, and the system only distinguishes patients into two priority levels (URG or ARGO). For example, within the URG class patients are attended in order of arrival, which is not always consistent with their level of severity. Lastly, healthcare professionals still sometimes need to force an output different from the one expected by the system (URG or ARGO). The idea of transforming the pre-triage system into a priority triage system specific to gynaecology and obstetrics, similar to the Manchester Triage System, is gaining momentum. This change would allow patients who are currently triaged as URG to be distinguished among several priority levels. Such changes would bring great gains in healthcare, since patients with greater severity would be served first, and only afterwards those in need of minor clinical care.
The SWOT analysis showed some threats. The biggest threats are the possibility of wrong diagnosis and the existence of system failures. Security is an issue that administrators should always take into account in the development process of any system; in this sense, it is very important to ensure the security and confidentiality of the information. It is also very important to ensure system availability, which means that contingency plans should be followed in case of disaster or system failure (CMIN activity cannot be stopped). The DSS will be supported by the interoperability platform AIDA, which ensures a high level of security.
CONCLUSIONS
In the health sector it is very important to make quick and assertive decisions, because they are often related to human life. This paper presented data mining models for assessing a triage DSS implemented in CMIN for routing patients between URG and ARGO.
This system classifies patients into two levels and was built to reduce the number of cases classified as URG that were not, in fact, urgent. With this system, non-URG cases are routed to consultation. DM techniques were used to verify whether or not the system is calibrated as expected from its flowcharts, i.e., whether the pre-triage result matches the patient condition. Six distinct scenarios were explored. The scenarios without ARGO and without URG showed the need for improvements in the pre-triage system. This system should evolve into a priority-based system similar to MTS and OTAS, but prepared to attend all types of maternity care patients. This change can increase patient care quality and satisfaction by reducing the number of misclassified cases (cases where the nurses responsible for triage do not agree with the pre-triage system result) and by decreasing hospital waiting times. The new triage system will be more sensitive to the patient condition, will better characterize patient care needs, and will prioritize treatment according to clinical condition. The SWOT analysis demonstrated the pertinence of adapting the current pre-triage system into a priority system specific to Gynaecology and Obstetrics units. This new system will be similar to MTS, enabling the triage of patients into 5 levels of acuity, and, in the case of the pregnant flowchart, similar to OTAS, being a system specific to the Obstetrics and Gynaecology specialties.
FUTURE RESEARCH DIRECTIONS
Future work will combine the methodologies used by MTS and OTAS with the variables used in the pre-triage system implemented in CMIN, along with new variables shown to be relevant, in order to map the existing system to a priority system specific to Gynaecology and Obstetrics with 5 levels of acuity.
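A priority system with 5 levels of acuity would replace the current first-come, first-served order inside the URG class with a severity-first order. The sketch below illustrates this with a priority queue keyed by acuity level and then by arrival order, using Python's standard `heapq` module. The 5-level scale and its maximum waiting times (0, 10, 60, 120 and 240 minutes) are borrowed from the MTS convention as an assumption; the chapter does not define the levels or target times of the future CMIN system, and `PriorityTriageQueue` and `is_overdue` are hypothetical names.

```python
import heapq
from itertools import count

# Assumed 5-level scale with MTS-style maximum waiting times in minutes
# (1 = immediate ... 5 = non-urgent); not values defined for CMIN.
TARGET_MINUTES = {1: 0, 2: 10, 3: 60, 4: 120, 5: 240}


def is_overdue(level: int, waited_minutes: float) -> bool:
    """True if a patient has waited longer than the target for their level."""
    return waited_minutes > TARGET_MINUTES[level]


class PriorityTriageQueue:
    """Serve the most acute level first; arrival order breaks ties within a level."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrivals = count()  # monotonically increasing arrival index

    def add(self, patient_id: str, level: int) -> None:
        if level not in TARGET_MINUTES:
            raise ValueError(f"unknown acuity level: {level}")
        heapq.heappush(self._heap, (level, next(self._arrivals), patient_id))

    def next_patient(self) -> tuple[str, int]:
        level, _, patient_id = heapq.heappop(self._heap)
        return patient_id, level

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityTriageQueue()
for patient_id, level in [("A", 4), ("B", 2), ("C", 2), ("D", 1)]:
    queue.add(patient_id, level)

order = [queue.next_patient()[0] for _ in range(len(queue))]
print(order)  # ['D', 'B', 'C', 'A']
```

Patients B and C share level 2 and are served in arrival order, while D, who arrived last but is the most acute, is served first; under a two-level URG/ARGO scheme all four would simply be seen in order of arrival.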
Like the pre-triage system, this new priority system will not be limited to pregnant patients (as is the case with OTAS), but will consider the remaining types of patients treated at CMIN (postpartum, non-postpartum, maybe pregnant, IGO and CTG patients). Studies concerning the adaptation of this new triage system are currently being developed together with clinical specialists in the field of Maternity Care. Complementary work will be carried out in order to make the DSS intelligent by exploring the adaptive capacities of the triage system: through ensemble DM models, the recorded data can be used to adapt the triage model in real time.
ACKNOWLEDGEMENTS
This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within projects PEst-OE/EEI/UI0752/2014 and PEst-OE/EEI/UI0319/2014. The work of Filipe Portela was supported by a postdoctoral grant associated with FCT project INTCare II - PTDC/EEI-SII/1302/2012.
REFERENCES
Abelha, A., Analide, C., Machado, J., Neves, J., & Novais, P. (2007). Ambient Intelligence And Simulation In Health Care Virtual Scenarios. IFIP - The International Federation for Information Processing, 243, 461–468.
Beguería, S. (2006). Validation and evaluation of predictive models in hazard assessment and risk management. Natural Hazards, 37(3), 315–329.
Chapman, P. (2000). The CRISP-DM User Guide. NCR Systems Engineering Copenhagen.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge Discovery in, 37–54.
João, A. (2007). Avaliação de Artigos Científicos. Bases da Epidemiologia Clínica.
Khodambashi, S. (2013). Business Process Re-engineering Application in Healthcare in a Relation to Health Information Systems. Procedia Technology, 9(2212), 949–957. doi:10.1016/j.protcy.2013.12.106
Krzysztof, C., Witold, P., Roman, S., & Lukasz, K. (2007). Data Mining. A Knowledge Discovery Approach. Springer.
Machado, J., Abelha, A., Rua, F., & Centre, A. (2013). Real-time Predictive Analytics for Sepsis Level and Therapeutic Plans in Intensive Care Medicine. International Information Institute.
Murray, M., Bullard, M., & Grafstein, E. (2004). Revisions to the Canadian Emergency Department Triage and Acuity Scale implementation guidelines. CJEM, 6(6), 421–427. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17378961
Peixoto, H., Santos, M., Abelha, A., & Machado, J. (2012). Intelligence in Interoperability with AIDA. In L. Chen, A. Felfernig, J. Liu, & Z. Raś (Eds.), Foundations of Intelligent Systems (Vol. 7661, pp. 264–273). Springer Berlin Heidelberg. doi:10.1007/978-3-642-34624-8_31
Pereira, R., Salazar, M., Abelha, A., & Machado, J. (2013). SWOT Analysis of a Portuguese Electronic Health Record. In C. Douligeris, N. Polemi, A. Karantjias, & W. Lamersdorf (Eds.), I3E (Vol. 399, pp. 169–177). Springer.
Portela, F., Cabral, A., Abelha, A., Salazar, M., Quintas, C., Machado, J., … Santos, M. F. (2013). Knowledge Acquisition Process for Intelligent Decision Support in Critical Health Care. In R. Martinho, R. Rijo, M. Cruz-Cunha, & J. Varajão (Eds.), (Vol. Informatio). Hershey, PA: Medical Information Science Reference.
Ripley, B. D. (2002). Statistical Data Mining, (May).
Rojão, A. I. R. (2011). Data mining languages for business intelligence. University of Minho. Retrieved from http://repositorium.sdum.uminho.pt/handle/1822/22892
Salar, M., & Salar, O. (2014). Determining Pros and Cons of Franchising by Using Swot Analysis. Procedia - Social and Behavioral Sciences, 122, 515–519. doi:10.1016/j.sbspro.2014.01.1385
Shariatmadari, M., Sarfaraz, A. H., Hedayat, P., & Vadoudi, K. (2013). Using SWOT Analysis and Sem to Prioritize Strategies in Foreign Exchange Market in Iran. Procedia - Social and Behavioral Sciences, 99, 886–892. doi:10.1016/j.sbspro.2013.10.561
Smithson, D. S., Twohey, R., Rice, T., Watts, N., Fernandes, C. M., & Gratton, R. J. (2013). Implementing an obstetric triage acuity scale: interrater reliability and patient flow analysis. American Journal of Obstetrics and Gynecology, 209(4), 287–293. doi:10.1016/j.ajog.2013.03.031
ADDITIONAL READING SECTION
Angelini, D. J., Zannieri, C. L., Silva, V. B., Fein, E., & Ward, P. J. (1990). Toward a concept of triage for labor and delivery: staff perceptions and role utilization. The Journal of Perinatal & Neonatal Nursing, 4(3), 1–11.
Bellazzi, R., & Zupan, B. (2008). Predictive data mining in clinical medicine: current issues and guidelines. International Journal of Medical Informatics, 77(2), 81–97.
Cabral, A., Pina, C., Machado, H., Abelha, A., Salazar, M., Quintas, C., … Santos, M. F. (2011). Data acquisition process for an intelligent decision support in gynecology and obstetrics emergency triage. In Maria Manuela Cruz-Cunha, João Varajão, Philip Powell, & Ricardo Martinho (Eds.), ENTERprise Information Systems, Communications in Computer and Information Science (Vol. 221, pp. 223–232). Springer.
Cronin, J. G. (2003). The introduction of the Manchester triage scale to an emergency department in the Republic of Ireland. Accident and Emergency Nursing.
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms. Wiley-IEEE Press.
Koh, H. C., & Tan, G. (2011). Data mining applications in healthcare. Journal of Healthcare Information Management, 19(2), 65.
Maconochie, I., & Dawood, M. (2008). Manchester triage system in paediatric emergency care. BMJ (Clinical research ed.) (Vol. 337, p. a1507). Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/22334644
Martins, H. M. G., Cuña, L. M. D. C. D., & Freitas, P. (2009). Is Manchester (MTS) more than a triage system? A study of its association with mortality and admission to a large Portuguese hospital. Emergency Medicine Journal, 26(3), 183–186.
Peng, Y., Kou, G., Shi, Y., & Chen, Z. (2008). A descriptive framework for the field of data mining and knowledge discovery. International Journal of Information Technology & Decision Making, 7(04), 639–682.
Portela, F., Santos, M. F., Silva, Á., Machado, J., Abelha, A., & Rua, F. (2013). Data mining for real-time intelligent decision support system in intensive care medicine.
Khodambashi, S. (2013). Business Process Re-engineering Application in Healthcare in a Relation to Health Information Systems. In Maria Manuela Cruz-Cunha, João Varajão, Helmut Krcmar, & Ricardo Martinho (Eds.), Procedia Technology, 9(2212), 949–957.
Santos, M. F., & Azevedo, C. S. (2005). Data mining: descoberta de conhecimento em bases de dados. FCA editores.
Saúde, M. da. (2006). Serviço de Urgência - Recomendações para a organização dos cuidados urgentes e emergentes. Ministério Da Saúde - Hospitais SA.
Services, U. S. D. o. H. H. (2003). Emergency Severity Index - Five-level Triage Systems.
Ministério da Saúde (2013). Triagem Obstétrica - modelo de Triagem. Lisboa: Direção Geral de Saúde.
Turban E., A. J. E. & L. T.-P. (2005). Decision Support Systems and Intelligent Systems.
Bonney, W. (2013). Applicability of Business Intelligence in Electronic Health Record. Procedia - Social and Behavioral Sciences, 73, 257–262.
KEY TERMS & DEFINITIONS
AIDA - Platform developed to ensure interoperability among healthcare information systems.
Data Mining - Process of exploring large amounts of data in search of consistent patterns.
Decision Support System - A computerized information system used to support the decision-making process in an organization or business.
Interoperability - Autonomous ability to interact and communicate.
Knowledge Discovery from Databases - Process that encompasses a set of ongoing activities that share the knowledge discovered from databases. It consists of five stages, namely selection, pre-processing, transformation, data mining and interpretation/evaluation.
Manchester Triage System - The MTS is a scale used in the triage process of patients when they are admitted to the Emergency Department.
Maternity Care - Health institution where patients of the gynaecology and obstetrics specialties are admitted.
Obstetric Triage Acuity Scale - A 5-category scale used in the triage process of patients when they are admitted to the Emergency Department of an Obstetric unit.
SWOT analysis - Identification and discussion of strengths, weaknesses, opportunities and threats, with the purpose of better understanding and improving a system.
Triage System - A triage system mainly aims to improve the quality of care by providing a service based on clinical characteristics and target times.