Identifying Negation/Uncertainty Attributes for SHARPn NLP
Presentation to SHARPn Summit “Secondary Use”, June 11-12, 2012
Cheryl Clark, PhD
MITRE Corporation
Approved for Public Release: 12-2751 © 2012 The MITRE Corporation. All rights reserved.

The Challenge: Text Mentions versus Clinical Facts
■ Negation: event has not occurred or entity does not exist
  She had no fever yesterday.
■ Uncertainty: a measure of doubt
  The symptoms are not inconsistent with renal failure.
■ Conditional: could exist or occur under certain circumstances
  The patient should come back to the ED if any rash occurs.
■ Subject: person the observation is on; experiencer
  Mother had lung cancer.
■ Generic: no clear subject/experiencer
  E. coli is sensitive to Cipro but enterococcus is not
[Figure: a stack of sample discharge summaries feeding a table of extracted mentions and their attribute values]
  Mentions: fever, renal infarction, rash, lung cancer, Cipro, …
  Attribute values: no, uncertain, conditional, family member, generic
  Sample document:
    HOSPITAL-PEDIATRIC DISCHARGE SUMMARY
    NAME – ##### DATE OF ADMISSION – #### LOCATION – ##### BIRTH DATE - #####
    (REASON FOR ADMISSION) SWOLLEN, PAINFUL HANDS. VOMITING. SYMPTOMS OF 18 HOURS DURATION.
    (ABSTRACT) PATIENT, 1 YEAR OLD. IS KNOWN TO HAVE SICKLE CELL DISEASES AND 2 EPISODES OF MENINGITIS. DEVELOPED SWOLLEN, PAINFUL AND WARM HANDS. HAD SEVERAL EPISODES OF VOMIINT PRIOR TO ADMISSION. LABORATORY STUDIES DID NOT REVEAL ANEMIA OR SYSTEMIC INFECTION. HYDRATION THERAPY AND BED REST WERE PROVIDED, WITH IMPORVEMENT IN 48 HOURS. WAS DISCHARGED IMPROVED. TO BE FOLLOWED IN HEMATOLOGY CLINIC.

Background: Assertion Analysis Tool, Version 1
[Pipeline diagram]
  Inputs: input docs; i2b2 concepts
  ■ Negation & Uncertainty Cue/Scope Tagger
  ■ Compute scope enclosures by rule
  ■ Identify sections
  ■ Extract words, concepts, locations
  ■ Identify word classes and ordering
  ■ Assertion Classifier (Maximum Entropy)
  Output: i2b2 assertions
Independent Evaluation: i2b2/VA 2010 Clinical NLP Challenge Assertion Status Task, F Score = 0.93
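The “compute scope enclosures by rule” step in the pipeline above can be illustrated with a small sketch. It assumes that the cue/scope tagger has already produced character-offset scope spans and that each concept carries character offsets. The `Span` type, the function names, and the feature names are hypothetical; none of them are taken from the Assertion Analysis Tool itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled character span (hypothetical type, for illustration only)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "negation", "uncertainty", or a concept type


def encloses(scope: Span, concept: Span) -> bool:
    """True when the concept lies entirely inside the cue scope."""
    return scope.start <= concept.start and concept.end <= scope.end


def scope_features(concept: Span, scopes: list[Span]) -> dict[str, bool]:
    """Build one boolean feature per scope label seen in `scopes`."""
    features: dict[str, bool] = {}
    for scope in scopes:
        key = f"in_{scope.label}_scope"
        features[key] = features.get(key, False) or encloses(scope, concept)
    return features


text = "She had no fever yesterday."
# Assumed tagger output: the negation scope covers "no fever yesterday".
scopes = [Span(text.index("no"), text.index("."), "negation")]
fever = Span(text.index("fever"), text.index("fever") + len("fever"), "problem")
features = scope_features(fever, scopes)
print(features)
```

Features of this kind would be one input among several to the maximum entropy classifier; the slide on system errors later in the deck shows why span enclosure alone is not always sufficient.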
Assertion Status Integration within SHARPn Clinical Document Pipeline
[Pipeline diagram]
  Input docs …
  cTAKES analysis engines …
  Annotations (all annotations are UIMA Common Analysis Structure (CAS))
  ■ Negation & Uncertainty Cue/Scope Tagger
  ■ Compute scope enclosures by rule
  ■ Identify sections
  ■ Extract words, concepts, locations
  ■ Identify word classes and ordering
  ■ Assertion Classifier (Maximum Entropy)
  Updated attribute annotations

i2b2 Assertion Categories
■ Assertion classification system designed to meet requirements of 2010 i2b2/VA Challenge Assertion subtask
  – Present: default category
    Patient had a stroke
  – Absent: problem does not exist in the patient
    History inconsistent with stroke
  – Possible: uncertainty expressed
    We are unable to determine whether she has leukemia
  – Conditional: patient experiences the problem only under certain conditions
    Patient reports shortness of breath upon climbing stairs
  – Hypothetical: medical problems the patient may develop
    If you experience wheezing or shortness of breath
    Corresponds to SHARPn conditional
  – Not Patient: problem associated with someone who is not the patient
    Family history of prostate cancer

Re-architecting Assertions
■ i2b2 assertion output values
  – defined for medical problems
  – closed set of values
  – mutually exclusive (fixed priority when multiple values apply)
  Values (single, multi-way classifier): present, absent, possible, hypothetical, not patient, conditional (no SHARPn equivalent)
■ SHARPn assertion attributes
  – apply to various entities, events, relations
  – independent
  – attributes can have multiple values
  – additional attributes may be added
  Attributes (multiple classifiers, some binary):
    negation: yes/no
    uncertainty: yes/no
    conditional: yes/no
    subject: multi-valued (patient, family, donor, other…)
    …
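The contrast above, a single closed set of mutually exclusive i2b2 values versus independent SHARPn attributes, can be sketched as a small data structure plus a lookup from one to the other. This is a minimal illustration, not MASTIF code: the class and function names are hypothetical, the absent/negation and possible/uncertainty pairings follow the comparisons listed later in the deck, and the handling of “not patient” and of i2b2 “conditional” is an assumption made only so the example is complete.

```python
from dataclasses import dataclass


@dataclass
class SharpnAttributes:
    """Independent attributes; defaults describe a plain 'present' mention."""
    negation: bool = False
    uncertainty: bool = False
    conditional: bool = False
    subject: str = "patient"  # multi-valued: patient, family, donor, other...


def map_i2b2_to_sharpn(status: str) -> SharpnAttributes:
    """Map one i2b2 assertion status to SHARPn attribute values."""
    attrs = SharpnAttributes()
    if status == "absent":
        attrs.negation = True
    elif status == "possible":
        attrs.uncertainty = True
    elif status == "hypothetical":
        attrs.conditional = True  # i2b2 hypothetical ~ SHARPn conditional
    elif status == "not patient":
        # Assumption: i2b2 does not say who the subject is, so use "other".
        attrs.subject = "other"
    elif status in ("present", "conditional"):
        # Assumption: i2b2 conditional has no SHARPn equivalent, keep defaults.
        pass
    else:
        raise ValueError(f"unknown i2b2 assertion status: {status!r}")
    return attrs


mapped = map_i2b2_to_sharpn("hypothetical")
print(mapped)

# Independent attributes can express what one mutually exclusive label cannot:
combined = SharpnAttributes(negation=True, subject="family")
print(combined)
```

The last two lines show the motivation for the redesign: a negated problem attributed to a family member needs two attributes set at once, which a single i2b2 label cannot represent.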
Assertion Module Refactoring: Phase 1
■ Simple mapping from i2b2 assertion classes to SHARPn attributes
  – Uses existing i2b2-trained single classifier model
  – Identifies i2b2/SHARPn equivalences
  – Maps to SHARPn attribute values
Example: Please call physician [if you develop shortness of breath].
  i2b2 assertion status = “hypothetical”
  SHARPn conditional attribute = “true”

Assertion Module Refactoring: Phase 2
■ Direct assignment of SHARPn attribute values
■ Will use multiple classifiers trained on SHARPn data
  – Will identify attribute values directly
■ Benefits
  – Aligns with SHARPn concept attributes requirements
  – Aligns with SHARPn clinical data annotation
  – Enables more accurate meaning representation
Example: He does not smoke, has no hypertension, and has no family history of coronary artery disease.
  i2b2 2010 Paradigm: choose one of present, absent, possible, hypothetical, conditional, not patient
    Cues in the example: negator (absent); family (not patient)
  SHARPn Attribute Paradigm: negation = present; subject = family_member

System Errors => Need for Better Linguistic Analysis for Assertions
■ Need for phrasal structure; scope extent not always enough
  She had [no chest pain or chest pressure] with this and this was deemed a negative test.
  Figure labels: negated; not negated

Syntactic Approaches*
■ Insert a signifier node into constituency parse above entity
■ Use tree kernel methods to compare similarity with negated sentences in training data (can be used on other modifiers as well with varying degrees of success)
* Slide courtesy of Tim Miller, Children’s Hospital Boston
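The first step of the syntactic approach, inserting a signifier node into the constituency parse above the entity, can be sketched on a toy tree. This is an illustration under stated assumptions, not the implementation behind the slide: the tree is a hand-written nested list rather than parser output, the node label `ENTITY` and the function names are hypothetical, and the entity is located by token offsets.

```python
def leaves(node):
    """Return the tokens under a node, left to right."""
    if isinstance(node, str):
        return [node]
    tokens = []
    for child in node[1:]:
        tokens.extend(leaves(child))
    return tokens


def is_preterminal(node):
    """A preterminal is a POS node whose children are all tokens."""
    return all(isinstance(child, str) for child in node[1:])


def insert_signifier(tree, start, end, label="ENTITY", offset=0):
    """Wrap the smallest constituent covering tokens [start, end) in a new node.

    Trees are lists of the form [label, child, ...]; the input is not mutated.
    """
    if is_preterminal(tree):
        return [label, tree]
    position = offset
    for i, child in enumerate(tree[1:], start=1):
        width = len(leaves(child))
        if position <= start and end <= position + width:
            wrapped = insert_signifier(child, start, end, label, position)
            return tree[:i] + [wrapped] + tree[i + 1:]
        position += width
    # No single child covers the span, so this node is the smallest cover.
    return [label, tree]


# Hand-written parse of "She had no fever" (assumed, not parser output).
parse = ["S",
         ["NP", ["PRP", "She"]],
         ["VP", ["VBD", "had"],
          ["NP", ["DT", "no"], ["NN", "fever"]]]]

marked = insert_signifier(parse, 3, 4)  # token 3 is "fever"
print(marked)
```

A tree kernel can then compare marked trees from new sentences against marked trees from negated training sentences, since the signifier node tells the kernel which constituent the comparison is about.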
Tree kernel fragment mining*
■ Use TK model to extract tree fragment features (Pighin & Moschitti 07)
■ Allows interaction with other feature types
■ Faster to find fragments than to do whole-tree comparisons
* Slide courtesy of Tim Miller, Children’s Hospital Boston

Next Steps: Assertions for Relations
■ Some assertion attributes apply to relations, too.
  – negation
  – uncertainty
  – conditional
Example: The fundal AVMs are a potential site of bleeding although do not explain the extent of bleeding.
  Figure labels: location relation uncertain; negated causal relation

Next Steps: Classifier Retraining and Component Evaluation
■ Model Retraining
  – Models for individual attributes
  – Linguistic features based on parser output
  – Training on SHARPn data
  – Enhancements to parsers
■ Evaluation
  – Accuracy on i2b2 gold annotations vs. accuracy on SHARPn gold annotations
    ■ i2b2 absent vs. SHARPn negated
    ■ i2b2 possible vs. SHARPn uncertainty
    ■ i2b2 hypothetical vs. SHARPn conditional
  – Evaluation based on system-generated entity annotations
  – Evaluation on CEM concept rather than on individual mentions

Thank you!
SHARPn Negation/Uncertainty Team
  John Aberdeen
  David Carrell
  Cheryl Clark
  Matt Coarr
  Scott Halgrim
  Lynette Hirschman
  Donna Ihrke
  Tim Miller
  Guergana Savova
  Ben Wellner

Backup Slides

Clarifying Definitions
Negation and temporal
  The patient had the tumor removed.
  The text span “removed” indicates the tumor was there but does not exist anymore.
  Originally annotated as negated. No longer annotated as negated.
  Course: degree_of (tumor, CHANGED (span for “removed”))
Circumstantial negation (i2b2 calls this conditional)
  While smoking, he does not use his nicotine patch
  Annotated as negated
Allergens
  Medications mentioned as allergens originally negated
  Allergen status distinguished from negation
  Allergy_indicator_class
  ALLERGIES: PCN, Sulpha, Zocor, Asendin, Rocephin

System Errors => Need for Better Linguistic Analysis for Assertions
Legend: absent = negated; present = should not be negated
  She had no signs of infection [on her leg wounds] and she did have some mild erythema around her right great toe
Issue is structure and not simply span extent:
  She had [no chest pain or chest pressure] with this and this was deemed a negative test.
  Figure labels: negated; not negated

MASTIF-Generated SHARPn attributes in cTAKES Output
■ [Add screenshot]
  Figure labels: default values; calculated value

Assertions for Different Concept Types
  Figure label: polarity = -1 (negated)

Issues: Differences in training data annotation
UMLS CUI-driven annotation (SHARPn)
UMLS contains some concept-internal negation; concept-internal subject
  Cigarette smoker: Concept [C0337667] (finding)
  Never smoked: Concept [C0425293] Never smoked tobacco (finding)
  Non-smoker: Concept [C0337672] Non-smoker (finding)
  Mother smokes: Concept [C0424969] (finding)
  Father smokes: Concept [C0424968] (finding)
  Mother does not smoke: Concept [C2586137] (finding)
  Father does not smoke: Concept [C2733448] (finding)
i2b2 concept excludes contextual cues; SHARPn concept includes them.
  The patient has never smoked.
  i2b2 concept: smoked (negated)
  SHARPn concept: never smoked (not negated)

Issue: Differences in training data annotation
No known allergies
  Concept: [C0262580] No known allergies
  i2b2: concept = known allergies; type = problem; assertion = absent
  SHARPn: concept = no known allergies; type = disease/disorder (finding in UMLS); assertion = present
NKA
  i2b2: concept = nka; type = problem; assertion = absent

Abstract
We describe a methodology for identifying negation and uncertainty in clinical documents and a system that uses that information to assign assertion values to medical problems mentioned in clinical text. This system was among the top performing systems in the assertion subtask of the 2010 i2b2/VA community evaluation Challenges in natural language processing for clinical data, and has subsequently been packaged as a UIMA module called the MITRE Assertion Status Tool for Interpreting Facts (MASTIF), which can be integrated with cTAKES. We describe the process of extending MASTIF, which uses a single multi-way classifier to select among a closed set of mutually exclusive assertion categories, to a system that uses individual, independent classifiers to assign values to independent negation and uncertainty attributes associated with a variety of clinical concepts (e.g., medications, procedures, and relations) as specified by SHARPn requirements. We discuss the benefits that result from this new representation and the challenges associated with generating it automatically. We compare the accuracy of MASTIF on i2b2 data with accuracy on a subset of SHARPn clinical documents, and discuss the contribution of linguistic features to accuracy and generalizability of the system. Finally, we discuss our plans for future development.
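Comparing performance on i2b2 data with performance on SHARPn data, as the abstract describes, requires a score computed separately for each attribute. A minimal sketch follows, assuming gold and system values for one binary attribute are available as aligned lists of booleans. The function name is hypothetical and the values below are invented purely to exercise the code; they are not results from either corpus.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for one binary attribute (True = attribute set)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented negation labels for five mentions (illustration only).
gold_negation = [True, True, False, False, True]
system_negation = [True, False, False, True, True]
precision, recall, f1 = precision_recall_f1(gold_negation, system_negation)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Running the same function once per attribute (negation, uncertainty, conditional) on each gold set gives the paired figures that the i2b2 versus SHARPn comparison calls for.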