Clinical text mining at Stockholm
University and at other research
groups in Europe.
Hercules Dalianis
Clinical Text Mining Group
Department of Computer and Systems Sciences (DSV)
Stockholm University (2016)
• 73 Departments and centers
• 71,000 Students
• 2,000 PhD students
• 5,500 Employees
Hercules Dalianis, Donostia, April 6, 2016
Frescati main campus
Dep. of Computer and Systems
Sciences (DSV)
• 5,400 students
• 85 PhD students
• 173 Employees
DSV in Kista, Silicon Valley of
Sweden
Clinical text mining group 2007-2014
Aron Henriksson, Sara Brissman, Martin Hassel, Hideyuki Tanushi,
Mia Kvist, Maria Skeppstedt, Sumithra Velupillai, Hercules Dalianis
(Not in photo Claudia Ehrentraut, and Rebecka Weegar)
Claudia Ehrentraut, and Rebecka Weegar
HEALTH BANK
2 mil. patient records
2007-2014
HEALTH BANK
• 23 000 users (readers and writers),
• 6-7 different professions
• Structured information:
– Serial number, time points, clinical unit, age,
gender, blood and laboratory values, ATC codes, ICD-10 diagnosis codes
• Unstructured text under different headings
– Anamnesis, Assessment, Social, Discharge
letter
Research projects
• MINECAN - Data and text mining of cancer symptoms and
comorbidities in electronic patient records in the Nordic
languages, funded by the Nordic Information for Action
e-Science Center of Excellence, 2014-2019.
• DADEL - High-Performance Data Mining for Drug Effect
Detection, 2013-2016.
• AVID - Avidentifiering för sekundär användning av
patientjournaler (De-identification for secondary use of
patient records), 2016.
• Detect-HAI - Detection of Healthcare Associated Infections
(finalized)
Supervised or un-supervised
methods?
• Supervised methods need a lot of annotation effort from
physicians
• Unsupervised methods can use data that is already coded:
ICD-10 codes, ATC drug codes, time stamps,
patient gender and age.
Ethical issues around annotations
• Direct access to data
• Annotation is sensitive
• Annotation is difficult
Unsupervised methods
• Unsupervised methods can use the structure
already built into the records.
Monitoring HAIs
• Compulsory manual reporting by personnel
– However seldom carried out
• Point Prevalence Measures (PPM)
– Manual and carried out twice a year (during a day)
• Infektionsverktyget (Infection tool)
– All prescriptions of antibiotics are reported centrally
Manual monitoring
• Difficult
• Tiresome
• Low inter-annotator agreement (IAA) between physicians
• Only on a small sample, 1-2 per cent of all inpatients
Automatic HAI monitoring
• To ease the burden on clinicians
• To assist hospital management
• To get better reporting on a larger population
A Hospital Acquired Infection Case
123 H - IVA 322916614D 2007-08-21 9:12
1944 Woman Anamnesis
Pneumonia, I110. Heart failure, unspecified,
I509.
Got a urine catheter two days ago. Has now
fever. Done a lab test on the urine and gave
antibiotics, Penomax.
123 H - IVA 322916614D 2007-08-22 16:12
1944 Woman
No fever. The lab test on urine shows that she
had bacteria in the urine.
The information is written in the patient record text but also in the
structured fields for temperature, drugs and lab results.
Temporality and negation
Pat. op. för två dagar sedan
The pat. uw. sur. two days ago
Hon har inte feber, men mycket röd runt op. ställe
She does not have fever, but very red around op. place
NegEx for Swedish
Affirmed – The patient has fever
Non-Affirmed – The patient has no fever
Pseudo-negations – Fever can not be ruled out
Not only fever but also….
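The three categories above can be illustrated with a much simplified NegEx-style rule, sketched here in Python. The trigger lists are hypothetical English examples chosen to match the slide's sentences, not the Swedish NegEx lexicon, and the five-token look-back window is an assumption; a real NegEx implementation also handles triggers that follow the finding and words that end a negation scope.

```python
import re

# Hypothetical English triggers for illustration only; the actual Swedish
# NegEx trigger lists are not shown in the slides.
PSEUDO_NEGATIONS = ["can not be ruled out", "cannot be ruled out", "not only"]
NEGATION_TRIGGERS = {"no", "not", "without"}
WINDOW = 5  # assumed number of tokens to look back from the finding


def classify_finding(sentence, finding):
    """Return 'affirmed' or 'non-affirmed' for `finding` in `sentence`."""
    text = sentence.lower()
    # Blank out pseudo-negations first so that their words are not
    # mistaken for real negation triggers.
    for phrase in PSEUDO_NEGATIONS:
        text = text.replace(phrase, " ")
    tokens = re.findall(r"\w+", text)
    target = finding.lower()
    if target not in tokens:
        raise ValueError("finding %r not present in sentence" % finding)
    index = tokens.index(target)
    preceding = tokens[max(0, index - WINDOW):index]
    if any(token in NEGATION_TRIGGERS for token in preceding):
        return "non-affirmed"
    return "affirmed"


for example in ["The patient has fever",
                "The patient has no fever",
                "Fever can not be ruled out",
                "Not only fever but also cough"]:
    print(example, "->", classify_finding(example, "fever"))
```

Masking the pseudo-negation phrases before the trigger lookup is what keeps "can not be ruled out" from being read as a plain negation.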
Factuality of symptoms and
findings
• Sumithra Velupillai's six levels
+ Certainly Positive: Patient has Parkinson's disease.
Probably Positive: Physical examination strongly suggests Parkinson.
Possibly Positive: Patient possibly has Parkinson.
Possibly Negative: Parkinson cannot yet be ruled out.
Probably Negative: No support for Parkinson.
- Certainly Negative: Parkinson can be excluded.
Factuality of diagnosis
Automatic classification - results
• 0.699 F-measure (all classes)
• 0.762 F-measure (merged classes)
Two step process
• Step 1
– Which diagnoses does a patient have?
– Aiming at finding the diagnosis.
• Step 2
– How certain is the diagnosis?
– Aiming at deciding the factuality level of the
diagnosis.
Detecting Healthcare associated
infections
Healthcare associated infections
(HAIs) : Statistics
• International studies have found that up to
10 per cent of patients at any given time have a
healthcare-associated infection (Humphreys
and Smyth, 2006)
• 10 per cent or more of in-patients acquire an
HAI in Europe
• Three million injured patients and 50 000 deaths
yearly in Europe alone.
Definition of Health care Associated
Infection (HAI)
[a]n infection occurring in a patient in a hospital or
other health care facility in whom the infection was
not present or incubating at the time of admission.
This includes infections acquired in the hospital but
appearing after discharge, and also occupational
infections among staff of the facility
One classification approach for
detecting HAI - Detect-HAI
• Pre-processing and
• Machine learning based approach
Machine learning based approach
• 215 hospitalisation records (vårdtillfällen)
– 128 with HAI, 1 300 000 tokens
– 85 without HAI, 300 000 tokens
• WEKA Machine learning toolkit using the
SVM, Support Vector Machine Algorithm and
RF, Random Forest
IST infection specific terms
1,045 terminology entries
• CT (Computed tomography), kateter (catheter),
dränage (drainage), sårinfektion (wound
infection), intubering (intubation), operation
(surgery), röd (red), urinstämma (urinary
retention), ultraljud (ultrasound), feber (fever), ...
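One way such a term list could feed a classifier is as a vector of term counts per hospitalisation record. The Python sketch below assumes exact token matching against a handful of the terms named above; the function name `term_features` is hypothetical, and the actual Detect-HAI feature extraction (lemmatisation, negation handling, the full 1,045 entries) is not described in enough detail here to reproduce.

```python
import re
from collections import Counter

# A few of the infection specific terms listed on the slide; the full
# terminology has 1,045 entries.
IST_TERMS = ["kateter", "dränage", "sårinfektion", "intubering",
             "operation", "röd", "urinstämma", "ultraljud", "feber"]


def term_features(record_text, terms=IST_TERMS):
    """Count how often each term occurs as a whole token in the record."""
    tokens = re.findall(r"\w+", record_text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in terms]


# One count per term, in the order of IST_TERMS.
print(term_features("Pat har feber. Kateter satt igår, feber kvarstår."))
```

Because matching is on exact tokens, inflected forms such as "katetern" would be missed; that is one reason a lemma-based representation can behave differently from a plain term list.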
[Figure: hospitalisation records for training → machine learning (WEKA, SVM / Random Forest); hospitalisation records for decision → HAI / NON HAI]
Results detecting HAI
• SVM, Support Vector Machine algorithm 74%
recall and 86% precision using Terms + negation
• RF, Random forest, 87% recall and 83%
precision, using lemmas
– See Ehrentraut et al 2014.
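The slide gives precision and recall separately. To compare the two classifiers on a single number, one can compute the balanced F-measure (the harmonic mean of precision and recall). The sketch below does this in Python; the resulting values are derived here and are not figures reported in the slides.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1.0 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as stated on the slide.
svm_f1 = f_measure(precision=0.86, recall=0.74)
rf_f1 = f_measure(precision=0.83, recall=0.87)
print("SVM F1: %.3f" % svm_f1)
print("Random Forest F1: %.3f" % rf_f1)
```

Under this measure the Random Forest result (about 0.85) is ahead of the SVM result (about 0.80), mainly because of its higher recall.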
Template for extracted data
from tables
Mall för utdata extraherade från tabeller
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ Mall start ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
@@@@@|patientnr|kon|fodelsear|handelsedatum|veckodag|@@@@@
<<<<<Journalanteckning>>>>>
#####|journalanteckning_id|vardenhet|yrke|mall|#####
%%%%%|sokord_term|vardeterm|%%%%%
ICD-10 kod|kod text
or
anteckning
%%%%%|sokord_term|(1)vardeterm(2)vardeterm(3)vardeterm...|%%%%%
ICD-10 kod|kod text
or
anteckning
....
<<<<<Läkemedelsmodul>>>>>
#####|lakemedel_id|#####
ATC-kod|kod text
....
<<<<<Mikrobiologiska Svar>>>>>
#####|svar_uid|undersokning|#####
analysnamn
#####|svar_uid|undersokning|#####
(1)analysnamn
(2)analysnamn
....
<<<<<Kroppstemperatur>>>>>
kroppstemperatur
....
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ Mall slut ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
A hospitalisation record
@@@@@|011|M|1947|2012-04-29|tisdag|@@@@@
<<<<<Journalanteckning>>>>>
#####|25608293|H - Akutmott (Inf)|Läkare|Intagningsanteckning|#####
%%%%%|Tid/nuv.sjukdomar|-----|%%%%%
Välkänd pat på lungklin. Har emfysem och bronkiektasier sedan unga år. Senaste
halvåret haft växt av pseudomonas i sputumodl vid upprepade tillfällen och pat har
fått upprepade kurer med bredspektrumantibiotika, Tazocin + Meronem. Senaste
kuren avslutad den 15/4 och man satte i stället in honom på Azitromax.
%%%%%|Aktuella läkemedel|-----|%%%%%
t Calcichew D3 1 x 2
t Betapred 05mg 5 x 1 i nedtrappande dos,
#####|14941941|Blododling, aerob och anaerob|#####
Ingen växt
<<<<<Kroppstemperatur>>>>>
38
38
38,5
@@@@@|011|M|1947|2012-04-30|onsdag|@@@@@
……
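The marker conventions in the template and the record above are regular enough to parse line by line. The following Python sketch groups a record into days and sections; the function name `parse_record` and the output layout are assumptions made for illustration, and the numbered multi-value forms such as (1)vardeterm(2)vardeterm are left unsplit.

```python
# Field names taken from the day header line of the template.
DAY_KEYS = ["patientnr", "kon", "fodelsear", "handelsedatum", "veckodag"]


def _fields(line):
    """Split a '|'-delimited marker line and drop the two marker ends."""
    return line.split("|")[1:-1]


def parse_record(lines):
    """Return a list of days, each with a dict of section name -> items."""
    days = []
    section = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@@@@@|"):  # a new day for the patient
            day = dict(zip(DAY_KEYS, _fields(line)))
            day["sections"] = {}
            days.append(day)
            section = None
            continue
        if not days:
            raise ValueError("content before the first day header")
        if line.startswith("<<<<<") and line.endswith(">>>>>"):
            section = days[-1]["sections"].setdefault(line[5:-5], [])
            continue
        if section is None:  # content that precedes any section header
            section = days[-1]["sections"].setdefault("", [])
        if line.startswith("#####|"):
            section.append(("entry", _fields(line)))
        elif line.startswith("%%%%%|"):
            section.append(("keyword", _fields(line)))
        else:
            section.append(("text", line))
    return days
```

Keeping each item tagged as "entry", "keyword" or "text" preserves the order of the original notes, which matters when later steps need to know under which heading a piece of free text was written.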
Conclusions of Detect-HAI
• Lower percentage than a physician
• But consistent analysis (physicians have low IAA)
• 100 per cent analysis of all records, 24/7
Health records
Questions?
Research groups in Europe
• Finland
• Austria
• Norway
• Bulgaria
• Denmark
• Italy
• United Kingdom
• Germany
• Spain
• France
Finland, Turku/Åbo
• University of Turku
• Information and language technology for
health information and communication,
ITKTIK group
– Computer Science
• Prof Tapio Salakoski
– Nursing Science
• Prof Sanna Salanterä
– Nursing narratives
• Machine learning
Norway
• NTNU-Norwegian University of Science and
Technology-Trondheim
• Associate professor Øystein Nytrø
– Disease trajectories
Denmark
• DTU, Technical University of Denmark, and
• University of Copenhagen
– Prof Søren Brunak
• Biomedicine, systems biology
• Danish psychiatric and fertility records
United Kingdom
• Professors Donia Scott and Ehud Reiter
– Text generation from neo-natal data
• Professor Sophia Ananiadou, National Centre for
Text Mining (NaCTeM), University of Manchester.
– Biomedical text mining
Germany
• Professor Udo Hahn, Jena University Language &
Information Engineering (JULIE), Jena
• Dr. Katrin Tomanek, Jena and Averbis
• Dr. Philipp Daumke, Averbis
• Tagging, active learning, biomedical text
France
• Assoc. Professor Pierre Zweigenbaum, LIMSI,
Paris
• French clinical text mining
• Dr. Frederique Segond, Xerox Parc, Grenoble
and now Viseo
• MD. Marie-Helene Metzger, University of Lyon's
Hôpital de la Croix-Rousse
• Dr. Emmanuel Chazard, Universite de Lille
• Detection of ADE
Austria
• Prof Stefan Schulz, University of Graz
– Medical language processing
– Secondary use of clinical data
• Prof MD, Klaus-Peter Adlassnig
– Medical University of Vienna
– Medexter Healthcare GmbH
– Detection of healthcare associated infections and
other ADEs
Bulgaria
• Professor Galia Angelova, Linguistic Modelling
Department, Bulgarian Academy of Sciences.
• Associate prof. Svetla Boytcheva
– Clinical text mining of Bulgarian.
– 100 000 notes, etc
Italy
• Prof. Giuseppe Attardi, Department of
Informatics, University of Pisa.
• Dr. Anita Alicante, Dipartimento di Ingegneria
Elettrica e delle Tecnologie dell'Informazione,
DIETI, University of Napoli "Federico II", Napoli,
Italy
• Unsupervised entity and relation extraction from
clinical records in Italian.
Spain
• Professor Paloma Martínez Fernández
– Universidad Carlos III de Madrid
• Dr Isabel Segura Bedmar
– Extracting drug indications and adverse drug
reactions from Spanish health social media
– Automatic Identification of Biomedical Concepts in
Spanish Language Unstructured Clinical Texts
– Etc…
Questions?
References
• Humphreys, H. and E.T.M. Smyth. 2006. Prevalence surveys of
healthcare-associated infections: what do they tell us, if anything?
Clin Microbiol Infect. 12: 2-4.
• Proux D, Hagège C, Gicquel Q, Pereira S, Darmoni S, et al. 2011.
Architecture and Systems for Monitoring Hospital Acquired
Infections inside Hospital Information Workflows, in the Proceedings
of Workshop on Biomedical Natural Language Processing,
RANLP-2011, Hissar, Bulgaria, 15 Sept 2011, pp 43-48.
• M. Klompas and D. S. Yokoe. 2009. Automated surveillance of health
care-associated infections. Clinical Infectious Diseases, 48(9):1268–1275.
• Ehrentraut, C., Kvist, M., Sparrelid, M. and Dalianis, H. 2014.
Detecting Healthcare-Associated Infections in Electronic Health
Records - Evaluation of Machine Learning and Preprocessing
Techniques, in the Proceedings of the 6th International Symposium
on Semantic Mining in Biomedicine (SMBM 2014). Bodenreider, O.,
Oliveira, J.L., Rinaldi, F. (Eds.), Aveiro, Portugal.
• Ehrentraut, C, H. Tanushi, H. Dalianis and J. Tiedemann. 2012.
Detection of Hospital Acquired Infections in sparse and noisy Swedish
patient records. A machine learning approach using Naïve Bayes,
Support Vector Machines and C4.5. In the Proceedings of the Sixth
Workshop on Analytics for Noisy Unstructured Text Data, AND,
December 9, 2012 held in conjunction with Coling 2012, Bombay.
• Tanushi, H., M. Kvist and E. Sparrelid. 2014. Detection of
Healthcare-Associated Urinary Tract Infection in Swedish Electronic Health
Records. Advances in Data & Knowledge Management for Healthcare.
Invited Session in International Conference on Innovation in Medicine
and Healthcare (InMed'14).
• Tanushi, H., H. Dalianis, M. Duneld, M. Kvist, M. Skeppstedt and S.
Velupillai. 2013. Negation Scope Delimitation in Clinical Text Using
Three Approaches: NegEx, PyConTextNLP and SynNeg. The 19th
Nordic Conference of Computational Linguistics.
• Freeman, R., Moore, L. S. P., Álvarez, L. G., Charlett, A., & Holmes,
A. 2013. Advances in electronic surveillance for healthcare-associated
infections in the 21st century: a systematic review. Journal of Hospital
Infection, 84(2), 106-119.
Identifying adverse drug event
information in clinical notes with
distributional semantic
representations of context
work carried out with
Aron Henriksson
Mia Kvist
Martin Duneld
Introduction
Adverse drug events (ADE)
• ADEs cause 3.7% of hospital admissions
worldwide.
• One of the most common causes of death
• Seventh most common cause of death in Sweden
ADE-detection
• To detect known and unknown adverse drug
effects
• Using real patient records with real patients
– Post marketing drug safety surveillance
• Patient records with assigned ICD-10 codes
denoting adverse drug events
Two steps:
Named Entities and
Relations classification
• Pre-annotation
• Annotation
• Machine learning using Conditional Random
Fields (CRF++) for identifying Named Entities
• Classification with Random Forest
Pre-annotation using
Clinical Entity Finder (CEF)
• The data set was pre-annotated with the named
entities: Finding, Disorder, Body structure and
Pharmaceutical drug, Skeppstedt et al., (2014)
• CEF uses CRF++ (Conditional Random Fields)
machine learning system trained on manually
annotated notes from one internal medicine
emergency unit.
• CEF obtained F-score of 0.81 for Disorder,
0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85
for Body Structure (Around 4,000 training instances)
After pre-annotation re-annotation
and adding ADE relations
• Three annotators, one clinician and two
computer scientists, all trained annotators
• Manual annotation correction
• Adding temporality, speculation, negation
• Manual relation annotation, indications and ADEs
Agreement between pre-annotator
and human annotators
IAA (inter-annotator agreement):
main annotator and sub annotators
NER on ADE
B: a window size of 1 + 1 and a regularization parameter of 9;
+DSM: a window size of 2 + 2 and a regularization parameter of 9;
+mDSM: a window size of 1 + 1 and a regularization parameter of 1
Indication and ADE relations
• Random forest for relation mining
• Features
– Distance between two entities
– Annotated tokens and class
– Context left of ENTITY1, context right of ENTITY2,
and the words in between the entities.
– Example: The patient obtained urticaria (ENTITY1) and was given Betapred (ENTITY2).
• Low results: below 30 per cent F-score
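The features listed above can be sketched as a small extraction function for one candidate entity pair. In this Python sketch the function name, the context window of two tokens, and the entity class labels passed in the usage example are assumptions; the slides do not specify how the features were encoded for the Random Forest.

```python
def relation_features(tokens, span1, class1, span2, class2, window=2):
    """Features for a candidate relation between two annotated entities.

    Spans are (start, end) token indices with end exclusive, and the
    first entity must come before the second one in the sentence.
    """
    (start1, end1), (start2, end2) = span1, span2
    if not (0 <= start1 < end1 <= start2 < end2 <= len(tokens)):
        raise ValueError("spans must be ordered and non-overlapping")
    return {
        "distance": start2 - end1,  # number of tokens between the entities
        "entity1": (tokens[start1:end1], class1),
        "entity2": (tokens[start2:end2], class2),
        "left_context": tokens[max(0, start1 - window):start1],
        "between": tokens[end1:start2],
        "right_context": tokens[end2:end2 + window],
    }


tokens = "The patient obtained urticaria and was given Betapred .".split()
# Class labels here are illustrative, using the entity types named earlier.
features = relation_features(tokens, (3, 4), "Disorder",
                             (7, 8), "Pharmaceutical drug")
for name, value in features.items():
    print(name, value)
```

For the example sentence the distance is three tokens ("and was given"), which are also the words a classifier would see between the two entities.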
ADE Relation mining
Conclusion
• Access to clinical data and text
• Access to annotated text
• Access to terminologies/ontologies ICD-10/Snomed CT
• Natural language pre-processing
• Machine learning approach
• => Various systems for healthcare
Questions?
References
• Henriksson, A., Kvist, M., Dalianis, H., & Duneld,
M. (2015). Identifying adverse drug event
information in clinical notes with distributional
semantic representations of context. Journal of
biomedical informatics, 57, 333-349.
Related relations mining
• Rule based
– Eriksson et al 2013, identified 35,477 unique ADEs in
Danish patient records
=> 0.75 recall and 0.89 precision
– Wang et al 2009, Studied seven specific drugs.
25,074 English discharge summaries for eval.
0.75 recall and 0.31 precision for known ADEs
– Hazlehurst et al 2009, 450,000 patients,
0.74 to 0.31 PPV (precision)
Related relations mining
• IAA for annotations
– Mihăilă et al 2013, Protein-Protein interactions IAA
experiment, 885 relations
0.64 F-Score in IAA
– 0.51 F-Score in IAA on average for cause and effect
relations
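When IAA is reported as an F-score, one annotator is treated as the reference and the other as the system being evaluated. The Python sketch below assumes annotations are compared by exact match on hashable tuples; the matching criterion used in the cited studies is not stated on the slide, so this is only one plausible reading.

```python
def iaa_f_score(annotations_a, annotations_b):
    """Pairwise F-score between two annotators, with A as the reference."""
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations as (start, end, label) tuples.
annotator_a = {(0, 5, "Drug"), (10, 18, "Disorder"), (20, 25, "Finding")}
annotator_b = {(0, 5, "Drug"), (10, 18, "Finding")}
print("IAA F-score: %.2f" % iaa_f_score(annotator_a, annotator_b))
```

The balanced F-score is symmetric, so swapping the two annotators gives the same value.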
Related relations mining
• Machine learning and (rule) based
– Aramaki et al, 2010, used 3,012 Japanese
discharge summaries
– Annotated 1,045 drugs and 3,601 possible ADE
– 7.7% of the discharge summaries contained ADE.
– 59% could be extracted automatically
– 0.41 precision and 0.92 recall using PTM (Pattern
matching methods)
– 0.58 precision and 0.62 recall using SVM
Related relations mining
• Santiso et al 2014, 6,100 concepts and 4,700
ADR-(Adverse Drug Reactions) relations for
training and evaluated on 2,100 concepts and
1,600 ADR-relations
• 0.93 precision and 0.85 recall using the Random
Forest algorithm.
• IAA for the four annotators unknown
Related relations mining
• Gurulingappa et al 2012
• Annotated 3,000 medical case reports containing
ADE reports.
• IAA on relations many different variants
stretching from 0.1 to 0.6
• Maximum Entropy (MaxEnt) classifiers gave
0.75 precision and 0.64 recall.
Future work
• Compound splitting
• Feature optimization
• More training data
• Balanced training data
References
• E. Aramaki, Y. Miura, M. Tonoike, T. Ohkuma, H. Masuichi,
K. Waki, K. Ohe, Extraction of adverse drug effects from
clinical records, Stud Health Technol Inform 160 (Pt 1)
(2010) 739–743
• S. Santiso, A. Pérez, K. Gojenola, I. Taldea, A. Casillas, M.
Oronoz, Adverse Drug Event prediction combining shallow
analysis and machine learning, in: Proceedings of the 5th
International Workshop on Health Text Mining and
Information Analysis (Louhi)@ EACL, 2014, pp. 85–89
• H. Gurulingappa, A. M. Rajput, A. Roberts, J. Fluck, M.
Hofmann-Apitius, L. Toldo, Development of a benchmark
corpus to support the automatic extraction of drug-related
adverse effects from medical case reports, Journal of
biomedical informatics 45 (5) (2012) 885–892
• S. Mihăilă, T. Ohta, S. Pyysalo, S. Ananiadou, Biocause:
Annotating and analysing causality in the biomedical domain,
BMC bioinformatics 14 (1) (2013) 2.
• R. Eriksson, P. B. Jensen, S. Frankild, L. J. Jensen, S.
Brunak, Dictionary construction and identification of possible
adverse drug events in Danish clinical narrative text, Journal
of the American Medical Informatics Association 20 (5)
(2013) 947–953
• X. Wang, G. Hripcsak, M. Markatou, C. Friedman, Active
computerized pharmacovigilance using natural language
processing, statistics, and electronic health records: a
feasibility study, Journal of the American Medical Informatics
Association 16 (3) (2009) 328–337
• B. Hazlehurst, A. Naleway, J. Mullooly, Detecting possible
vaccine adverse events in clinical notes of the electronic
medical record, Vaccine 27 (14) (2009) 2077–2083