How Much Does Automatic Text De-Identification Impact Clinical Problems, Tests, and Treatments?

Stéphane M. Meystre, MD, PhD¹,³, Óscar Ferrández, PhD⁴, Brett R. South, MS¹,²,³, Shuying Shen, MStat¹,²,³, Matthew H. Samore, MD¹,²,³

¹Department of Biomedical Informatics, ²Department of Internal Medicine, University of Utah; ³VA Health Care System, Salt Lake City, UT; ⁴Nuance Communications Inc., Burlington, MA

Abstract: Clinical text de-identification can overlap with clinical information such as medical problems or treatments, thereby causing this information to be lost. In this study, we analyzed the overlap between the 2010 i2b2 NLP challenge concept annotations and the PHI annotations produced by our best-of-breed clinical text de-identification application. Overall, 0.81% of the i2b2 concept annotations overlapped exactly with PHI annotations, and 1.78% partly overlapped.

Introduction: Clinical text de-identification (i.e., the removal of all Protected Health Information (PHI)), as defined by the HIPAA Safe Harbor method, allows clinical notes to be used for research without patient consent, which is often difficult, if not impossible, to obtain. We developed and evaluated a best-of-breed automatic de-identification application for VHA clinical notes (“BoB”)¹ and found that its potential impact on clinical data was not negligible. We added a clinical eponym disambiguation module to BoB and started several experiments on the impact of de-identification on subsequent uses of clinical notes, the first of which is presented here.

Methods: For reasons of accessibility and annotation availability, we used the 2010 i2b2 NLP challenge corpus and reference standard for this early study.
The 2010 i2b2 challenge focused on the annotation of medical problems, tests, and treatments, as well as on the assessment of their local context (e.g., “…denied chest pain”) and the extraction of specific relations between these concepts.² We then used BoB to automatically annotate all PHI, as well as clinical eponyms, in this corpus, and analyzed the overlap of these new annotations with the 2010 i2b2 NLP challenge reference standard annotations.

Results: The 2010 i2b2 NLP challenge corpus included a total of 47685 annotations; 849 partly overlapped with BoB’s PHI annotations, and 386 overlapped exactly. These counts reflect BoB’s correct reclassification of 112 clinical eponyms (e.g., Parkinson, Pfannenstiel, Holter, Foley, Whipple, Roux) as non-PHI. Overall, 0.81% (386/47685) of the i2b2 annotations overlapped exactly with BoB’s PHI annotations, and 1.78% (849/47685) partly overlapped. Most overlaps (76%) were annotated by BoB as person names; these name overlaps comprised treatment annotations (45% of all overlaps; e.g., Colace, Lopressor, Senna, Hickman), problem annotations (19%; e.g., E. coli, Fournier, Addison), and test annotations (12%; e.g., Apgars, Papanicolaou).

Overlap between i2b2 concept annotations and BoB’s PHI annotations, by i2b2 category (percentages are relative to the i2b2 annotations in each category):

i2b2 category                   Problem    Test  Treatment
i2b2 annotations                  19667   13833      14185
Partial overlap, PHI (#)            187     180        482
Partial overlap, eponyms (#)         18      41         53
Partial overlap (%)                0.95    1.30       3.40
Exact overlap, PHI (#)               65      40        281
Exact overlap, eponyms (#)            5      11          2
Exact overlap (%)                  0.33    0.29       1.98

Conclusion: This early study shows that even an efficient text de-identification system like BoB can mistakenly treat clinical information as PHI and hide or remove it. This overlap is small, but not negligible.
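The overlap analysis described in the Methods and Results can be illustrated with a minimal Python sketch. It assumes, without confirmation from BoB or the i2b2 challenge documentation, that annotations are character-offset spans, that an "exact" overlap means identical start and end offsets, that a "partial" overlap means intersecting spans with different boundaries, and that each i2b2 concept is counted at most once (exact taking precedence). The names `Span`, `overlap_type`, `count_overlaps`, `percent`, and the toy note are hypothetical; only the per-category counts at the end come from the table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # i2b2 concept type ("problem", "test", "treatment") or PHI type


def overlap_type(concept: Span, phi: Span) -> Optional[str]:
    """Classify one concept/PHI pair (assumed definitions, not BoB's code)."""
    if (concept.start, concept.end) == (phi.start, phi.end):
        return "exact"    # identical boundaries
    if concept.start < phi.end and phi.start < concept.end:
        return "partial"  # spans intersect but boundaries differ
    return None           # disjoint (merely touching spans do not count)


def count_overlaps(
    concepts: Iterable[Span],
    phi_spans: Iterable[Span],
    eponyms: Set[Tuple[int, int]],
) -> Dict[str, Dict[str, int]]:
    """Count, per concept type, concepts overlapped exactly or partly by PHI."""
    # Hypothetical eponym step: drop PHI candidates reclassified as eponyms.
    kept: List[Span] = [p for p in phi_spans if (p.start, p.end) not in eponyms]
    counts: Dict[str, Dict[str, int]] = {}
    for c in concepts:
        row = counts.setdefault(c.label, {"exact": 0, "partial": 0, "total": 0})
        row["total"] += 1
        kinds = {overlap_type(c, p) for p in kept}
        # Count each concept once; an exact match takes precedence.
        if "exact" in kinds:
            row["exact"] += 1
        elif "partial" in kinds:
            row["partial"] += 1
    return counts


def percent(n: int, total: int) -> float:
    """Overlap rate in percent, rounded to two decimals as in the table."""
    return round(100 * n / total, 2)


# Toy note (hypothetical): one exact, one partial, one eponym-filtered overlap.
text = "Colace given; E. coli UTI; Holter monitor placed."


def span_of(phrase: str, label: str) -> Span:
    start = text.find(phrase)
    return Span(start, start + len(phrase), label)


concepts = [span_of("Colace", "treatment"),
            span_of("E. coli UTI", "problem"),
            span_of("Holter monitor", "test")]
phi = [span_of("Colace", "NAME"), span_of("E. coli", "NAME"), span_of("Holter", "NAME")]
eponyms = {(phi[2].start, phi[2].end)}  # "Holter" treated as an eponym, not PHI
print(count_overlaps(concepts, phi, eponyms))

# Reproduce the overall rates from the counts reported in the table.
totals = {"problem": 19667, "test": 13833, "treatment": 14185}
exact = {"problem": 65, "test": 40, "treatment": 281}
partial = {"problem": 187, "test": 180, "treatment": 482}
print(percent(sum(exact.values()), sum(totals.values())))    # 0.81
print(percent(sum(partial.values()), sum(totals.values())))  # 1.78
```

Applied to the per-category counts, `percent` also reproduces the table rates (0.95, 1.30, and 3.40 for partial overlap; 0.33, 0.29, and 1.98 for exact overlap), which confirms that the reported percentages are relative to the i2b2 annotations rather than to BoB's annotations. Passing an empty eponym set turns the "Holter monitor" test concept into a partial overlap, mirroring how eponym disambiguation lowers the counts.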
Another recent, detailed study focused on the impact of text de-identification on the subsequent automatic extraction of medication names and found no significant impact.³ However, medications represent only a small part of the clinical information found in clinical notes, and a minority of the overlapping information we analyzed. We now plan to focus on the impact of de-identification on subsequent uses of VHA clinical notes.

Acknowledgments: Research supported by VA HSR HIR 08-374. Views expressed are those of the authors and not necessarily those of the Department of Veterans Affairs or affiliated institutions.

References
1. Ferrandez O, South BR, Shen S, Friedlin FJ, Samore MH, Meystre SM. BoB, a best-of-breed automated text de-identification system for VHA clinical documents. JAMIA. 2012 Sep 4.
2. Uzuner O, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. JAMIA. 2011 Aug 16;18(5):552–6.
3. Deleger L, Molnar K, Savova G, Xia F, Lingren T, Li Q, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. JAMIA. 2012 Aug 2.