Clinical text mining at Stockholm University and at other research groups in Europe

Hercules Dalianis
Clinical Text Mining Group
Department of Computer and Systems Sciences (DSV)
[email protected]
Hercules Dalianis, Donostia, April 6, 2016

Stockholm University (2016)
• 73 departments and centers
• 71,000 students
• 2,000 PhD students
• 5,500 employees

Frescati main campus

Dep. of Computer and Systems Sciences (DSV)
• 5,400 students
• 85 PhD students
• 173 employees

DSV in Kista, Silicon Valley of Sweden

Clinical text mining group 2007-2014
Aron Henriksson, Sara Brissman, Martin Hassel, Hideyuki Tanushi, Mia Kvist, Maria Skeppstedt, Sumithra Velupillai, Hercules Dalianis (not in photo: Claudia Ehrentraut and Rebecka Weegar)

HEALTH BANK, 2007-2014
• 2 million patient records
• 23,000 users (readers and writers)
• 6-7 different professions
• Structured information:
– Serial number, time points, clinical unit, age, gender, blood and laboratory values, ATC codes, ICD-10 diagnosis codes
• Unstructured text under different headings
– Anamnesis, Assessment, Social, Discharge letter

Research projects
• MINECAN - Data and text mining of cancer symptoms and comorbidities in electronic patient records in the Nordic languages, funded by The Nordic Information for Action e-Science Center of Excellence, 2014-2019.
• DADEL - High-Performance Data Mining for Drug Effect Detection, 2013-2016.
• AVID - Avidentifiering för sekundär användning av patientjournaler (De-identification for secondary use of patient records), 2016.
• Detect-HAI - Detection of Healthcare Associated Infections (finalized)

Supervised or unsupervised methods?
• Supervised methods need a lot of annotation effort by physicians
• Unsupervised methods can use already annotated data: ICD-10 codes, ATC drug codes, time stamps, patient gender and age.

Ethical issues around annotations
• Direct access to data
• Annotation is sensitive
• Annotation is difficult

Unsupervised methods
• Unsupervised methods can use the built-in structure.

Monitoring HAIs (healthcare associated infections)
• Compulsory manual reporting by personnel
– However, seldom carried out
• Point Prevalence Measures (PPM)
– Manual and carried out twice a year (during one day)
• Infektionsverktyget (the Infection tool)
– All prescriptions of antibiotics are reported centrally

Manual monitoring
• Difficult
• Tiresome
• Low inter-annotator agreement (IAA) between physicians
• Only on a small sample, 1-2 percent of all inpatients

Automatic HAI monitoring
• To ease the burden on clinicians
• To assist hospital management
• To get better reporting on a larger population

A Hospital Acquired Infection Case
123 H - IVA 322916614D 2007-08-21 9:12 1944 Woman
Anamnesis Pneumonia, I110. Heart failure, unspecified, I509. Got a urine catheter two days ago. Has now fever. Done a lab test on the urine and gave antibiotics, Penomax.
123 H - IVA 322916614D 2007-08-22 16:12 1944 Woman
No fever. The lab test on urine shows that she had bacteria in the urine.
The information is written in the patient record text, but it is also available in the structured fields for temperature, drugs and lab results.

Temporality and negation
Pat. op. för två dagar sedan
The pat. uw. sur. two days ago
Hon har inte feber, men mycket röd runt op. ställe
She does not have fever, but very red around op. place

NegEx for Swedish
Affirmed – The patient has fever
Non-affirmed – The patient has no fever
Pseudo-negations – Fever can not be ruled out; Not only fever but also…

Factuality of symptoms and findings
• Sumithra Velupillai's six levels
+ Certainly Positive: Patient has Parkinson's disease.
Probably Positive: Physical examination strongly suggests Parkinson.
Possibly Positive: Patient possibly has Parkinson.
Possibly Negative: Parkinson cannot yet be ruled out.
Probably Negative: No support for Parkinson.
- Certainly Negative: Parkinson can be excluded.

Factuality of diagnosis

Automatic classification - results
• 0.699 F-measure (all classes)
• 0.762 F-measure (merged classes)

Two step process
• Step 1 – Which diagnoses does a patient have?
– Aiming at finding the diagnosis.
• Step 2 – How certain is the diagnosis?
– Aiming at deciding the factuality level of the diagnosis.
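
The NegEx-style distinction between affirmed, non-affirmed and pseudo-negated mentions can be illustrated with a minimal rule-based sketch. The English trigger lists, the function name `classify_mention` and the scope rule (a trigger must precede the finding, and a comma or "but" ends its scope) are assumptions made for illustration; the Swedish NegEx adaptation uses its own trigger lists and may differ.

```python
import re

# Hypothetical English trigger lists, for illustration only.
PSEUDO_NEGATIONS = ["can not be ruled out", "cannot be ruled out", "not only"]
NEGATIONS = ["no", "not", "without", "denies"]


def classify_mention(sentence: str, finding: str) -> str:
    """Return 'affirmed', 'non-affirmed' or 'absent' for a finding in a sentence."""
    text = sentence.lower()
    # Mask pseudo-negations first so that their embedded 'not' does not trigger.
    for phrase in PSEUDO_NEGATIONS:
        text = text.replace(phrase, " ")
    pos = text.find(finding.lower())
    if pos == -1:
        return "absent"
    # Assumed scope rule: only look at the text before the finding,
    # and stop at the last comma or 'but' that precedes it.
    scope = re.split(r"\bbut\b|,", text[:pos])[-1]
    for trigger in NEGATIONS:
        if re.search(rf"\b{re.escape(trigger)}\b", scope):
            return "non-affirmed"
    return "affirmed"


for sentence in ["The patient has fever",
                 "The patient has no fever",
                 "Fever can not be ruled out"]:
    print(sentence, "->", classify_mention(sentence, "fever"))
```

The sketch only handles triggers placed before the finding; expressions such as "Parkinson can be excluded" would need post-finding triggers, which are left out here.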
Detecting Healthcare associated infections

Healthcare associated infections (HAIs): Statistics
• International studies have found that up to 10 per cent of patients at any given time have a healthcare associated infection (Humphreys and Smyth, 2006)
• 10 per cent or more of in-patients in Europe acquire an HAI
• Three million injured patients and 50,000 deaths yearly in Europe alone.

Definition of Health care Associated Infection (HAI)
[a]n infection occurring in a patient in a hospital or other health care facility in whom the infection was not present or incubating at the time of admission. This includes infections acquired in the hospital but appearing after discharge, and also occupational infections among staff of the facility

One classification approach for detecting HAI - Detect-HAI
• Pre-processing and
• Machine learning based approach

Machine learning based approach
• 215 hospitalisation records (vårdtillfällen)
– 128 with HAI, 1,300,000 tokens
– 85 without HAI, 300,000 tokens
• WEKA machine learning toolkit, using the SVM (Support Vector Machine) and RF (Random Forest) algorithms

IST, infection-specific terms
• 1,045 terminology entries
• CT (Computed tomography), kateter (catheter), dränage (drainage), sårinfektion (wound infection), intubering (intubation), operation (surgery), röd (red), urinstämma (urinary retention), ultraljud (ultrasound), feber (fever), ...

[Figure: hospitalisation records for training are used for machine learning (WEKA, SVM / Random Forest); hospitalisation records for decision are then classified as HAI or NON HAI]

Results detecting HAI
• SVM (Support Vector Machine): 74% recall and 86% precision, using terms + negation
• RF (Random Forest): 87% recall and 83% precision, using lemmas
– See Ehrentraut et al 2014.
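
To compare the two classifiers on a single number, the reported precision and recall can be combined into the balanced F-measure, F1 = 2PR / (P + R). The sketch below is plain arithmetic on the figures from the results slide; the F1 values are derived here and are not stated in the slides, and the function name `f_measure` is just an illustrative choice.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the results slide.
reported = {
    "SVM (terms + negation)": (0.86, 0.74),
    "Random Forest (lemmas)": (0.83, 0.87),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f_measure(precision, recall):.2f}")
```

Under this measure the Random Forest setting comes out ahead (roughly 0.85 against roughly 0.80), mainly because of its higher recall.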
Template for extracted data from tables (Mall för utdata extraherade från tabeller)
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ Mall start ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
@@@@@|patientnr|kon|fodelsear|handelsedatum|veckodag|@@@@@
<<<<<Journalanteckning>>>>>
#####|journalanteckning_id|vardenhet|yrke|mall|#####
%%%%%|sokord_term|vardeterm|%%%%%
ICD-10 kod|kod text or anteckning
%%%%%|sokord_term|(1)vardeterm(2)vardeterm(3)vardeterm...|%%%%%
ICD-10 kod|kod text or anteckning
....
<<<<<Läkemedelsmodul>>>>>
#####|lakemedel_id|#####
ATC-kod|kod text
....
<<<<<Mikrobiologiska Svar>>>>>
#####|svar_uid|undersokning|#####
analysnamn
#####|svar_uid|undersokning|#####
(1)analysnamn (2)analysnamn
....
<<<<<Kroppstemperatur>>>>>
kroppstemperatur
....
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ Mall slut ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
(Field names are Swedish: Mall start/slut = template start/end, patientnr = patient number, kon = gender, fodelsear = year of birth, handelsedatum = event date, veckodag = weekday, Journalanteckning = record note, vardenhet = care unit, yrke = profession, mall = template, sokord_term = keyword term, vardeterm = value term, kod = code, anteckning = note, Läkemedelsmodul = medication module, lakemedel = drug, Mikrobiologiska Svar = microbiology results, undersokning = examination, analysnamn = analysis name, Kroppstemperatur = body temperature.)

A hospitalisation record
@@@@@|011|M|1947|2012-04-29|tisdag|@@@@@
<<<<<Journalanteckning>>>>>
#####|25608293|H - Akutmott (Inf)|Läkare|Intagningsanteckning|#####
%%%%%|Tid/nuv.sjukdomar|-----|%%%%%
Välkänd pat på lungklin. Har emfysem och bronkiektasier sedan unga år. Senaste halvåret haft växt av pseudomonas i sputumodl vid upprepade tillfällen och pat har fått upprepade kurer med bredspektrumantibiotika, Tazocin + Meronem. Senaste kuren avslutad den 15/4 och man satte i stället in honom på Azitromax.
%%%%%|Aktuella läkemedel|-----|%%%%%
t Calcichew D3 1 x 2 t Betapred 05mg 5 x 1 i nedtrappande dos,
#####|14941941|Blododling, aerob och anaerob|#####
Ingen växt
<<<<<Kroppstemperatur>>>>>
38 38 38,5
@@@@@|011|M|1947|2012-04-30|onsdag|@@@@@
……
(English gloss: admission note written by a physician at the emergency department (infection). Previous/current diseases: patient well known at the pulmonary clinic, with emphysema and bronchiectasis since youth. During the last six months, repeated growth of pseudomonas in sputum cultures, and repeated courses of broad-spectrum antibiotics, Tazocin + Meronem. The latest course ended on 15/4 and he was instead started on Azitromax. Current medications: Calcichew D3 and Betapred tablets, the latter in tapering dose. Blood culture, aerobic and anaerobic: no growth.)

Conclusions of Detect-HAI
• Lower percentage than a physician
• But consistent analysis (physicians have low IAA)
• 100 per cent analysis of all records, 24/7

Health records

Questions?
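
The delimiter-based template for extracted hospitalisation records lends itself to simple line-oriented parsing. The sketch below is an assumption-laden illustration, not the project's actual code: it assumes that each marker (`@@@@@|...|@@@@@`, `<<<<<...>>>>>`) starts on its own line, and the function names `parse_header` and `split_sections` are hypothetical.

```python
import re
from typing import Dict, List

# Field order taken from the template's day header line.
HEADER_FIELDS = ["patientnr", "kon", "fodelsear", "handelsedatum", "veckodag"]


def parse_header(line: str) -> Dict[str, str]:
    """Parse a '@@@@@|...|@@@@@' day header into a field dictionary."""
    parts = line.strip().split("|")
    if (len(parts) != len(HEADER_FIELDS) + 2
            or parts[0] != "@@@@@" or parts[-1] != "@@@@@"):
        raise ValueError(f"not a day header: {line!r}")
    return dict(zip(HEADER_FIELDS, parts[1:-1]))


def split_sections(lines: List[str]) -> Dict[str, List[str]]:
    """Group lines under their '<<<<<name>>>>>' section markers."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        match = re.fullmatch(r"<<<<<(.+)>>>>>", line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return sections


header = parse_header("@@@@@|011|M|1947|2012-04-29|tisdag|@@@@@")
print(header["patientnr"], header["handelsedatum"])
```

A fuller parser would also split the `#####|...|#####` and `%%%%%|...|%%%%%` lines into their fields in the same way as the day header.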
Research groups in Europe
• Finland
• Austria
• Norway
• Bulgaria
• Denmark
• Italy
• United Kingdom
• Germany
• Spain
• France

Finland, Turku/Åbo
• University of Turku
• Information and language technology for health information and communication, ITKTIK group
– Computer Science: Prof Tapio Salakoski
– Nursing Science: Prof Sanna Salanterä
• Nursing narratives
• Machine learning

Norway
• NTNU, Norwegian University of Science and Technology, Trondheim
• Associate professor Øystein Nytrø
– Disease trajectories

Denmark
• DTU, Technical University of Denmark, and University of Copenhagen
– Prof Søren Brunak
• Biomedicine, systems biology
• Danish psychiatric and fertility records

United Kingdom
• Professors Donia Scott and Ehud Reiter
– Text generation from neo-natal data
• Professor Sophia Ananiadou, National Centre for Text Mining (NaCTeM), University of Manchester
– Biomedical text mining

Germany
• Professor Udo Hahn, Jena University Language & Information Engineering (JULIE), Jena
• Dr. Katrin Tomanek, Jena and Averbis
• Dr. Philipp Daumke, Averbis
• Tagging, active learning, biomedical text

France
• Assoc. Professor Pierre Zweigenbaum, LIMSI, Paris
– French clinical text mining
• Dr. Frederique Segond, Xerox Parc, Grenoble and now Viseo
• MD. Marie-Helene Metzger, University of Lyon's Hôpital de la Croix-Rousse
• Dr. Emmanuel Chazard, Université de Lille
– Detection of ADE (adverse drug events)

Austria
• Prof Stefan Schulz, University of Graz
– Medical language processing
– Secondary use of clinical data
• Prof MD Klaus-Peter Adlassnig
– Medical University of Vienna
– Medexter Healthcare GmbH
– Detection of healthcare associated infections and other ADEs

Bulgaria
• Professor Galia Angelova, Linguistic Modelling Department, Bulgarian Academy of Sciences
• Associate prof. Svetla Boytcheva
– Clinical text mining of Bulgarian
– 100,000 notes, etc.

Italy
• Prof. Giuseppe Attardi, Department of Informatics, University of Pisa
• Dr. Anita Alicante, Dipartimento di Ingegneria Elettrica e delle Tecnologie dell'Informazione (DIETI), University of Napoli "Federico II", Napoli, Italy
• Unsupervised entity and relation extraction from clinical records in Italian

Spain
• Professor Paloma Martínez Fernández
– Universidad Carlos III de Madrid
• Dr Isabel Segura Bedmar
– Extracting drug indications and adverse drug reactions from Spanish health social media
– Automatic Identification of Biomedical Concepts in Spanish Language Unstructured Clinical Texts
– Etc…

Questions?

References
• Humphreys, H. and E.T.M. Smyth. 2006. Prevalence surveys of healthcare-associated infections: what do they tell us, if anything? Clin Microbiol Infect. 12: 2-4.
• Proux D, Hagège C, Gicquel Q, Pereira S, Darmoni S, et al. 2011. Architecture and Systems for Monitoring Hospital Acquired Infections inside Hospital Information Workflows, in the Proceedings of Workshop on Biomedical Natural Language Processing, RANLP2011, Hissar, Bulgaria, 15 Sept 2011, pp 43-48.
• Klompas, M. and D. S. Yokoe. 2009. Automated surveillance of health care-associated infections. Clinical Infectious Diseases, 48(9):1268–1275.
• Ehrentraut, C., Kvist, M., Sparrelid, M. and Dalianis, H. 2014. Detecting Healthcare-Associated Infections in Electronic Health Records - Evaluation of Machine Learning and Preprocessing Techniques, in the Proceedings of the 6th International Symposium on Semantic Mining in Biomedicine (SMBM 2014). Bodenreider, O., Oliveira, J.L., Rinaldi, F. (Eds.), Aveiro, Portugal.
• Ehrentraut, C., H. Tanushi, H. Dalianis and J. Tiedemann. 2012. Detection of Hospital Acquired Infections in sparse and noisy Swedish patient records. A machine learning approach using Naïve Bayes, Support Vector Machines and C4.5. In the Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data, AND, December 9, 2012, held in conjunction with Coling 2012, Bombay.
• Tanushi, H., M. Kvist and E. Sparrelid. 2014. Detection of Healthcare-Associated Urinary Tract Infection in Swedish Electronic Health Records. Advances in Data & Knowledge Management for Healthcare. Invited Session in International Conference on Innovation in Medicine and Healthcare (InMed'14).
• Tanushi, H., H. Dalianis, M. Duneld, M. Kvist, M. Skeppstedt and S. Velupillai. 2013. Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg. The 19th Nordic Conference of Computational Linguistics.
• Freeman, R., Moore, L. S. P., Álvarez, L. G., Charlett, A., & Holmes, A. 2013. Advances in electronic surveillance for healthcare-associated infections in the 21st century: a systematic review. Journal of Hospital Infection, 84(2), 106-119.
Identifying adverse drug event information in clinical notes with distributional semantic representations of context
Work carried out with Aron Henriksson, Mia Kvist and Martin Duneld

Introduction: Adverse drug events (ADE)
• ADEs cause 3.7% of hospital admissions worldwide.
• One of the most common causes of death
• Seventh most common cause of death in Sweden

ADE-detection
• To detect known and unknown adverse drug effects
• Using real patient records with real patients
– Post-marketing drug safety surveillance
• Patient records with assigned ICD-10 codes denoting adverse drug events

Two steps: Named Entities and Relations classification
• Pre-annotation
• Annotation
• Machine learning using Conditional Random Fields (CRF++) for identifying Named Entities
• Classification with Random Forest

Pre-annotation using Clinical Entity Finder (CEF)
• The data set was pre-annotated with the named entities Finding, Disorder, Body structure and Pharmaceutical drug (Skeppstedt et al., 2014)
• CEF uses the CRF++ (Conditional Random Fields) machine learning system, trained on manually annotated notes from one internal medicine emergency unit.
• CEF obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug and 0.85 for Body Structure (around 4,000 training instances)

After pre-annotation: re-annotation and adding ADE relations
• Three annotators, one clinician and two computer scientists, all trained annotators
• Manual annotation correction
• Adding temporality, speculation, negation
• Manual relation annotation, indications and ADEs

Agreement between pre-annotator and human annotators

IAA, inter-annotator agreement – main annotator and sub annotators

NER on ADE
B: a window size of 1 + 1 and a regularization parameter of 9; +DSM: a window size of 2 + 2 and a regularization parameter of 9; +mDSM: a window size of 1 + 1 and a regularization parameter of 1

Indication and ADE relations
• Random forest for relation mining
• Features
– Distance between two entities
– Annotated tokens and class
– Context left of the first entity (ENTITY1), right of the second entity (ENTITY2) and words in between the entities
– Example: The patient obtained urticaria (ENTITY1) and was given Betapred (ENTITY2).
• Low results, below 30 per cent F-score

ADE Relation mining

Conclusion
• Access to clinical data and text
• Access to annotated text
• Access to terminologies/ontologies ICD-10/Snomed CT
• Natural language pre-processing
• Machine learning approach
• => Various systems for healthcare

Questions?

References
• Henriksson, A., Kvist, M., Dalianis, H., & Duneld, M. (2015). Identifying adverse drug event information in clinical notes with distributional semantic representations of context. Journal of biomedical informatics, 57, 333-349.
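
The relation features listed for indication and ADE mining (distance between the two entities, entity classes, and the context to the left, in between and to the right) can be sketched for a single entity pair as follows. The span representation, the window size of 2, the function name `relation_features` and the entity labels given to the example sentence are illustrative assumptions, not the feature extraction actually used in the study.

```python
from typing import Dict, List, Tuple

# An entity span: (start token index, end token index exclusive, class label).
Span = Tuple[int, int, str]


def relation_features(tokens: List[str], e1: Span, e2: Span,
                      window: int = 2) -> Dict[str, object]:
    """Build features for a candidate relation; e1 must precede e2."""
    start1, end1, label1 = e1
    start2, end2, label2 = e2
    return {
        "distance": start2 - end1,                        # tokens between the entities
        "entity1_class": label1,
        "entity2_class": label2,
        "left_context": tokens[max(0, start1 - window):start1],
        "between": tokens[end1:start2],
        "right_context": tokens[end2:end2 + window],
    }


tokens = "The patient obtained urticaria and was given Betapred .".split()
features = relation_features(tokens,
                             (3, 4, "Disorder"),              # urticaria (assumed label)
                             (7, 8, "Pharmaceutical drug"))   # Betapred
print(features)
```

Such dictionaries would still have to be turned into numeric vectors before they can be fed to a Random Forest classifier.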
Related relations mining
• Rule based
– Eriksson et al 2013 identified 35,477 unique ADEs in Danish patient records => 0.75 recall and 0.89 precision
– Wang et al 2009 studied seven specific drugs, with 25,074 English discharge summaries for evaluation: 0.75 recall and 0.31 precision for known ADEs
– Hazlehurst et al 2009, 450,000 patients, 0.74 to 0.31 PPV (precision)
• IAA for annotations
– Mihăilă et al 2013, protein-protein interactions IAA experiment, 885 relations, 0.64 F-score in IAA
– 0.51 F-score in IAA on average for cause and effect relations
• Machine learning and (rule) based
– Aramaki et al 2010 used 3,012 Japanese discharge summaries
– Annotated 1,045 drugs and 3,601 possible ADEs
– 7.7% of the discharge summaries contained an ADE
– 59% could be extracted automatically
– 0.41 precision and 0.92 recall using PTM (pattern matching methods)
– 0.58 precision and 0.62 recall using SVM
• Santiso et al 2014
– 6,100 concepts and 4,700 ADR (Adverse Drug Reaction) relations for training, evaluated on 2,100 concepts and 1,600 ADR relations
– 0.93 precision and 0.85 recall using the Random Forest algorithm
– IAA for the four annotators unknown
• Gurulingappa et al 2012
– Annotated 3,000 medical case reports containing ADE reports
– IAA on relations in many different variants, ranging from 0.1 to 0.6
– Maximum Entropy (MaxEnt) classifiers gave 0.75 precision and 0.64 recall

Future work
• Compound splitting
• Feature optimization
• More training data
• Balanced training data

References
• E. Aramaki, Y. Miura, M. Tonoike, T. Ohkuma, H. Masuichi, K. Waki, K. Ohe, Extraction of adverse drug effects from clinical records, Stud Health Technol Inform 160 (Pt 1) (2010) 739–743
• S. Santiso, A. Pérez, K. Gojenola, I. Taldea, A. Casillas, M. Oronoz, Adverse Drug Event prediction combining shallow analysis and machine learning, in: Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)@ EACL, 2014, pp. 85–89
• H. Gurulingappa, A. M. Rajput, A. Roberts, J. Fluck, M. Hofmann-Apitius, L. Toldo, Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports, Journal of biomedical informatics 45 (5) (2012) 885–892
• S. Mihăilă, T. Ohta, S. Pyysalo, S. Ananiadou, Biocause: Annotating and analysing causality in the biomedical domain, BMC bioinformatics 14 (1) (2013) 2.
• R. Eriksson, P. B. Jensen, S. Frankild, L. J. Jensen, S. Brunak, Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text, Journal of the American Medical Informatics Association 20 (5) (2013) 947–953
• X. Wang, G. Hripcsak, M. Markatou, C. Friedman, Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study, Journal of the American Medical Informatics Association 16 (3) (2009) 328–337
• B. Hazlehurst, A. Naleway, J. Mullooly, Detecting possible vaccine adverse events in clinical notes of the electronic medical record, Vaccine 27 (14) (2009) 2077–2083