From free text to clinical data
Language and Computing
Davide Zaccagnini, MD
Karen Doyle, RN
October 23, 2007

Outline
• Reality of applying NLP to AHLTA documents
• Use cases
• Ontology-based NLP

Use Cases
• PRIMARY use case for health care documentation (compared with documentation produced for biomedical research)
  – Collect information to determine diagnosis(es), execute a plan of treatment and communicate with the healthcare team.
• By-products of electronic documentation
  – Coding for billing
  – Problem lists
  – Past medical history
  – Social history (14 elements, e.g. tobacco use, ETOH, toxin exposure, marital status)
  – Family history
  – Medications
  – Allergies
  – Bio-surveillance
  – Quality metrics: pay for performance, Joint Commission, HEDIS
  – Research

AHLTA offers a Structured Documentation Tool
• MEDCIN terms in blue
• Structured and unstructured text
DoD HA Policy Guidance, Ref. ASAD Health Affairs, August 7, 2007
Blue is the original code, calculated from the structured documentation. Pink shows how the doctor can change the subscores, but the document itself does not change.

Background of TATRC HPI Free Text (DUMMY)
• Lost data in S/O sections: what is the value?
• Patient history
  – Patient's "story"; reflects signs and symptoms
  – History of Present Illness
  – Review of Systems
  – Past family, social and medical history
  – Used to calculate Evaluation and Management (E&M) billing codes
• HPI: History of Present Illness
  – Definition: a chronological description of the present illness, starting from the first sign or symptom or from the last encounter
  – Comprises 8 elements used in calculating the E&M code: location, quality, severity, duration, timing, context, modifying factors, associated signs and symptoms
[Screenshot: HPI Dummy #1. Free-text section extracted manually for analysis; 100 texts for processing.]

Free Text to Data: What is desirable?
• HPI 1: 45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. Denies any pain. Not taking any pain meds. Staples removed on 9May. Appetite good. No N/V.
Normal bowel/bladder function. She is very happy with the outcome of surgery. Only concern is incision - very small area that has not healed completely. has been keeping the incision clean and dry.
• Expand abbreviations
• Codify terms to vocabularies: ICD-9, SNOMED, MEDCIN
• Negation
• Modality
• Applying rules
  – Financial billing
  – Obtain age, height, weight, blood pressure, dates
  – Quality metrics
  – Surveillance
  – History: family, past medical, current problems?

Free Text Example
[Figure: the HPI text worked through four steps (expand abbreviations, code to vocabularies, evaluate for negation, apply rules), highlighting terms such as f/c, N/V, TAH, pain, pain meds, appetite good and very happy.]

Ontology-based NLP
Natural Language Processing and Understanding
"…natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate." (Wikipedia)

Representations (formal or otherwise)
Data models          | Ontology
Agreed-upon terms    | Formally defined concepts
Predefined use       | No predefined use
Data driven          | Reality driven
Predefined context   | No predefined context
Specialized model    | Inferred model

What is fever? Each definition is accurate within its own model, but what is fever, and does the patient have fever?
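The first desirable steps for the HPI example are abbreviation expansion and negation detection. The Python sketch below illustrates both. It is not the system described in these slides, and it rests on two assumptions. First, the abbreviation table is hypothetical and built only from the HPI example. Second, negation uses a simplified NegEx-style rule, where a trigger such as "denies" or "no" negates the rest of its sentence. The slides instead describe negation resolved through syntax.

```python
import re

# Hypothetical abbreviation table drawn from the HPI example; a real system
# would take expansions from its lexicon instead.
ABBREVIATIONS = {
    "f/c": "fever/chills",
    "n/v": "nausea/vomiting",
    "s/p": "status post",
    "tah": "total abdominal hysterectomy",
}

# Simplified NegEx-style triggers (an assumption, not the slides' syntax-based method).
NEGATION_TRIGGERS = {"denies", "no", "not"}

# A token is a run of letters, optionally followed by "/letters" (e.g. "f/c", "N/V").
TOKEN = re.compile(r"[A-Za-z]+(?:/[A-Za-z]+)?")


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations, matching whole tokens case-insensitively."""
    return TOKEN.sub(lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)), text)


def negated_spans(text: str) -> list[str]:
    """For each sentence, return the words that follow its first negation trigger."""
    spans = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        words = sentence.rstrip(".").split()
        for i, word in enumerate(words):
            if word.lower() in NEGATION_TRIGGERS:
                spans.append(" ".join(words[i + 1:]))
                break  # only the first trigger per sentence is used
    return spans


hpi = "Denies f/c. Denies any pain. Not taking any pain meds. Appetite good. No N/V."
expanded = expand_abbreviations(hpi)
print(expanded)
print(negated_spans(expanded))
# ['fever/chills', 'any pain', 'taking any pain meds', 'nausea/vomiting']
```

A sentence-scoped trigger rule is easy to follow but over-simple. It cannot recognize a negation that ends mid-sentence, and it does not handle modality ("may have", "rule out"). Those cases are why the pipeline below resolves negation and modality after syntactic parsing.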
Formal representations
The world according to a database:
Patients {ID#, ZIP code, BP}
ID#      ZIP code   BP
001123   02139      80/120
001223   24425      65/130

The world according to an ontology:
patient has (identifier (is_a (ID#))) ∩ lives_in (geographic_area) ∩ has (blood_pressure (is_measured_by (blood pressure measurement (…))))
[Figure: the same data drawn as an ontology graph linking patient, identifier (ID#), geographical area (ZIP code), blood pressure, blood pressure measurement and its values (80/120, 65/130) through relations such as is_a, is_identified_by, lives_in, has, is_measured_by and generates.]

Ontologies: the meaning of data
An ontology:
• Explicitly specifies meaning
• Represents reality, not data
• Is a formal schema
• Can have its consistency enforced and checked automatically

NLP Workflow
• Example pipeline
Input handler -> Fetches the document and passes it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Maps tokens and multi-words to the ontology; rewriting to enhance mapping
Section labeler -> Assigns section labels to paragraphs
Syntactic parser -> Performs syntactic parsing, validating against the grammar
Fragment labeler -> Assigns fragment labels to pieces of text within sections
Lexeme filter -> Filters out function words (e.g. determiners) to reduce false mapping positives
Negation/modality -> Identifies negation, modality and future
Semantic tagger -> Further deduces concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor -> Extracts vital signs
Labs extractor -> Extracts lab results
FreePharma -> Extracts medications
Disambiguator -> Disambiguates concepts
Coder -> Codes to standard classification systems such as SNOMED CT, ICD-9, …
Concept filters -> Marks concepts that belong to different filters (e.g. diagnoses, procedures)
Relevance ranker -> Calculates relevance of concepts
Output handler -> Creates XML/HTML/… output

Semantic Tagging
Sample: "Demonstrated benign small polyps in the antrum"
• Morphological variations: polyps > polyp; antrum > antral
• Word clustering: antral polyp
• Known synonyms: maxillary sinus polyp, antral polyp
• Concept: SNOMED CT 29074008: POLYP OF ANTRUM (DISORDER)

Types of Disambiguation
By STRING: a lexical match between a term (or its inflections) and a concept in the ontology.
Ex.: "Patient presents fever" → fever, one of the symptoms (alongside cough) in the ontology.

By DEFINITION: a match between terms and concepts in the ontology, where these concepts meet necessary and sufficient conditions (logic-based reasoning).
Ex.: "Patient underwent a liver biopsy": is_a (biopsy) ∧ has_location (liver), both true = liver biopsy procedure.

By RELATIONSHIPS: a match between SOME of the terms, assigned to different concepts in the ontology, where these concepts compose the full definition of a concept through a "suggested parent".
Ex.: "CT of thyroid" = ?
CT of Neck = is_a (CT scan) ∧ has_location (neck). The text supplies is_a (CT scan) ∧ has_location (thyroid), and thyroid has_location neck. All three hold, so CT of Neck serves as the suggested parent.
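The example pipeline is a linear chain in which each component reads and enriches a shared document record. The following is a minimal Python sketch of that architecture, not the product's implementation. The `Document` fields, the toy lexicon and the stage bodies are all hypothetical. The slides' Segmenter (tokenize and map to the ontology) is split here into `segmenter` and a hypothetical `concept_mapper`, so that the lexeme filter can run before mapping.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    """Shared record that every stage reads and enriches."""
    text: str
    paragraphs: list[str] = field(default_factory=list)
    tokens: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)
    relevance: dict[str, int] = field(default_factory=dict)


Stage = Callable[[Document], Document]

# Hypothetical stand-ins for the ontology lexicon and the function-word list.
LEXICON = {"fever": "Fever", "cough": "Cough", "polyp": "Polyp", "polyps": "Polyp"}
FUNCTION_WORDS = {"a", "an", "the", "in", "of", "with"}


def paragrapher(doc: Document) -> Document:
    # Paragraph detection: split on blank lines.
    doc.paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return doc


def segmenter(doc: Document) -> Document:
    # Tokenize each paragraph, stripping punctuation and folding case.
    doc.tokens = [w.strip(".,;:").lower() for p in doc.paragraphs for w in p.split()]
    return doc


def lexeme_filter(doc: Document) -> Document:
    # Drop function words to reduce false mapping positives.
    doc.tokens = [t for t in doc.tokens if t not in FUNCTION_WORDS]
    return doc


def concept_mapper(doc: Document) -> Document:
    # Map the remaining tokens to concepts through the lexicon.
    doc.concepts = [LEXICON[t] for t in doc.tokens if t in LEXICON]
    return doc


def relevance_ranker(doc: Document) -> Document:
    # Toy relevance score: how often each concept is mentioned.
    doc.relevance = dict(Counter(doc.concepts))
    return doc


def run(pipeline: list[Stage], text: str) -> Document:
    doc = Document(text)
    for stage in pipeline:
        doc = stage(doc)
    return doc


INDEXING = [paragrapher, segmenter, lexeme_filter, concept_mapper, relevance_ranker]

doc = run(INDEXING, "Patient presents fever and cough.\n\nFever persists.")
print(doc.relevance)  # {'Fever': 2, 'Cough': 1}
```

All stages share one signature, so a different configuration is just a different list of stages. Adding an extractor or swapping in an indexer does not touch the other components.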
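Disambiguation by definition and by relationships can be illustrated with concepts whose full definitions are sets of (relation, filler) conditions. The following is a hypothetical Python sketch. An exact match on all conditions yields the defined concept. A match that holds only after generalizing a location through a part-of table (thyroid inside neck) yields a suggested parent. The mini-ontology and the `PART_OF` table are illustrative assumptions, not ontology content.

```python
from typing import Iterator, Optional

# Hypothetical mini-ontology: each concept's full definition is a set of
# (relation, filler) conditions that are jointly necessary and sufficient.
Condition = tuple[str, str]

DEFINITIONS: dict[str, set[Condition]] = {
    "liver biopsy": {("is_a", "biopsy"), ("has_location", "liver")},
    "CT of neck": {("is_a", "CT scan"), ("has_location", "neck")},
}

# Hypothetical anatomy table: structure -> region that contains it.
PART_OF = {"thyroid": "neck", "liver": "abdomen"}


def regions(site: str) -> Iterator[str]:
    """Yield the site itself and every region that transitively contains it."""
    current: Optional[str] = site
    while current is not None:
        yield current
        current = PART_OF.get(current)


def satisfies(condition: Condition, features: set[Condition]) -> bool:
    """True if some extracted feature meets the condition; a location may be
    generalized to a containing region."""
    relation, filler = condition
    for rel, value in features:
        if rel != relation:
            continue
        if value == filler:
            return True
        if rel == "has_location" and filler in regions(value):
            return True
    return False


def disambiguate(features: set[Condition]) -> list[tuple[str, str]]:
    results = []
    for concept, definition in DEFINITIONS.items():
        if definition == features:
            results.append((concept, "defined"))           # by definition
        elif all(satisfies(c, features) for c in definition):
            results.append((concept, "suggested parent"))  # by relationships
    return results


# "Patient underwent a liver biopsy"
print(disambiguate({("is_a", "biopsy"), ("has_location", "liver")}))
# [('liver biopsy', 'defined')]

# "CT of thyroid": no defined concept, but CT of neck subsumes it
print(disambiguate({("is_a", "CT scan"), ("has_location", "thyroid")}))
# [('CT of neck', 'suggested parent')]
```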
Examples of disambiguation

Ontology and NLP and data integration
[Figure: the LinKBase® Medical Ontology at the center. Natural language processing concepts, the lexicon and the grammar are mapped to terms in multiple languages (English, Spanish). Terminologies are cross-mapped to multiple coding systems (proprietary, ICD-9, MEDCIN, SNOMED CT, CPT, RadLex (partial)).]

Conclusion
• Ontologies are powerful NLP tools for:
  – Segmentation
  – Disambiguation
  – Higher-level inference
  – Interoperability of extracted data
• They require human resources for maintenance, but reduce the need for annotated data
• They are "white boxes"
• They are models that can be expanded and changed
• Combined with stochastic algorithms, they provide both formality and scalability

Thank you

NLP/U, formal representations
"Patients in the North East have higher blood pressure than the average population"
[Figure: the patient, ID#, ZIP code and blood pressure ontology graph from "Formal representations", repeated.]

Disambiguation
• Words in the document are mapped to concepts in the ontology
• When more than one candidate exists in the ontology, the system builds a graph of concept relations using:
  1. Nearness in the sentence
  2. IS_A relationships
  3. Horizontal relationships

Syntactic Parsing
«A very young patient was given a double dose by his mother.»
Note the passive construction.
• Negation via syntax
• Modality via syntax
• Reference resolution: "TeSSI" understands an indirect reference to the patient

Disambiguation
The system can disambiguate between two different meanings of "depressed" within the same sentence. It interprets "depressed" in "depressed patient" as a state of mind. It also recognizes "depressed" as part of "depressed fracture" and tags that noun phrase with the corresponding SNOMED code.

Fragment Labeling
• Sentences and phrases are labeled
• History, exam, impression, etc.
• Independent of superficial formatting
• One label – one type of information

Fragment Labeling
"HPI: The patient whose mother had breast cancer presents with loss of hearing"
→ Family History; Chief Complaint

FreePharma: Medication Extraction
• Example

Semantic Indexing
TeSSI: Terminology Supported Semantic Indexing
Input handler -> Fetch the document and pass it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to the ontology
Disambiguator -> Disambiguate concepts
Relevance ranker -> Calculate relevance of concepts
Indexer -> Write information to the index for quick access

Information Extraction
Input handler -> Fetch the document and pass it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to the ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against the grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Output handler -> Create XML/HTML/… output

Knowledge Discovery
Input handler -> Fetch the document and pass it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to the ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against the grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Ontology writer -> Add discovered knowledge to the ontology

Automatic Coding
Input handler -> Fetch the document and pass it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to the ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing, validating against the grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Code Calculator -> Calculate codes: E&M, ICD-9, CPT
Output handler -> Create XML/HTML/… output

NLP-based applications and products

Quality
Projects: CPR Technologies, JCAHO, Eclipsys
• Extraction of CMS Core Measures
• National Patient Safety Network
• Data warehousing

Coding
Projects: Kaiser Permanente, Convergent Solutions
• E&M coding
• SNOMED coding
• ICD-9 coding
• CPT (in development)

Medication Extraction
Projects: The Marshfield Clinic, Medquist, UAB
• Medication reconciliation
• Personalized Medication Project
• Validation of therapies from literature

Interoperability
Projects: Integic/DoD, Revolution Health
• Semantic integration of the military health systems
• Tie together free-text content and portal applications

Web Search and Retrieval
Projects: Revolution Health, Merck
• Ontology-enhanced search
• Concept-based indexing

Radiology
Projects: FUJIFILM MEDICAL SYSTEMS
• Findings and pertinent negatives extracted from radiology reports

Radiology
• Observation types
  • Findings
  • Pertinent negatives
  • Quality assurance
  • Unclassified
• Observation components
  • Fundamentals
  • Modifiers
  • Qualifiers
• Observation status
  • (Present) / Historical
  • Changed / Not changed / (not stated)

Observation Types
• Findings, e.g. "bilateral infiltrates"
• Pertinent negatives, e.g. "the lungs are clear"
• Quality assurance, e.g. "poor inspiration"
• Unclassified, e.g. "the lungs are unchanged"

Observation Components
• Fundamentals
  • Pathologic entities
  • Physiologic entities
  • Devices
  • Procedures
• Modifiers
  • Location
  • Qualitative
  • Quantitative
  • Uncertainty (modal)
  • Negation

Observation Status
• Historical
• (Non-historical)
• Change stated
• No change stated
• (Change not stated)
• Grouped
• Contains uncertain (modal) element

[Example screenshots: pertinent negatives and findings (modal); historical and grouped; change stated and no change stated; quality assurance. Findings: finding of PE in historical context, finding of devices, modifier in a long-distance dependency. Pertinent negatives: knowledge that the lungs should be clear, negation of abnormalities, statement of normality.]
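The radiology observation model can be sketched as a data structure plus a rule-based classifier. The Python below is a hypothetical illustration, not the FUJIFILM project's design. The field names and the rule order are assumptions. Earlier pipeline stages are assumed to have already set flags such as negation, a statement of normality or an image-quality remark. The example phrases come from the Observation Types slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObservationType(Enum):
    FINDING = "finding"
    PERTINENT_NEGATIVE = "pertinent negative"
    QUALITY_ASSURANCE = "quality assurance"
    UNCLASSIFIED = "unclassified"


class Change(Enum):
    CHANGE_STATED = "change stated"
    NO_CHANGE_STATED = "no change stated"
    NOT_STATED = "change not stated"


@dataclass
class Observation:
    text: str
    # Fundamentals
    pathologic: list[str] = field(default_factory=list)
    physiologic: list[str] = field(default_factory=list)
    devices: list[str] = field(default_factory=list)
    procedures: list[str] = field(default_factory=list)
    # Modifiers: location / qualitative / quantitative, keyed by kind
    modifiers: dict[str, str] = field(default_factory=dict)
    negated: bool = False
    uncertain: bool = False  # modal
    # Status
    historical: bool = False
    change: Change = Change.NOT_STATED
    # Hypothetical flags assumed to be set by earlier pipeline stages
    states_normality: bool = False
    image_quality: bool = False


def classify(obs: Observation) -> ObservationType:
    """Hypothetical rule order: quality first, then negatives, then findings."""
    if obs.image_quality:
        return ObservationType.QUALITY_ASSURANCE
    if obs.states_normality or (obs.negated and obs.pathologic):
        return ObservationType.PERTINENT_NEGATIVE
    if obs.pathologic or obs.physiologic or obs.devices or obs.procedures:
        return ObservationType.FINDING
    return ObservationType.UNCLASSIFIED


examples = [
    Observation("bilateral infiltrates", pathologic=["infiltrate"],
                modifiers={"location": "bilateral"}),
    Observation("the lungs are clear", states_normality=True),
    Observation("poor inspiration", image_quality=True),
    Observation("the lungs are unchanged", change=Change.NO_CHANGE_STATED),
]
for obs in examples:
    print(f"{obs.text!r}: {classify(obs).value}")
```

"The lungs are unchanged" has no fundamental entity, so it falls through to unclassified. Its change status is still recorded on the observation.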