Enhancing Quality of Retrieval Through Concept Edit History - EVS Update
Frank Hartel, Sherri De Coronado, Gilberto Fragoso, Iris Guo, Kim Ong
February 26, 2003, NCICB Jamboree

Outline
• Terminology development: concept creation, modification, split, merge, retirement
• Edit history usage
• TDE Ontylog editor extension
• Next steps
• Summary

Elementary Edit Actions in Terminology Development (Create, Modify, Split, Merge, Retire)
[Figure: successive terminology versions (Version 1 to Version 4) produced by create, modify, split, merge, and retire actions. Caption: Evolution of versions/baseline over time.]

Scientific Reasons for Concept Splits
• The oncogene ras was discovered based on sequence homology (hybridization) to the v-onc gene of the Harvey strain of murine sarcoma virus.
• Subsequently, it was discovered that there were multiple related ras genes, Ha-ras and Ki-ras. Later on, a new ras, N-ras, was found.

Scientific Reasons for Concept Merges
• The BCL1 gene was discovered in the vicinity of a t(11;14) translocation, involved in the malignant transformation of B cells.
• The PRAD1 gene was found in parathyroid adenomas bearing chromosomal abnormalities.
• CCND1 codes for one of a set of proteins, cyclins, that regulate cell cycle progression.
• BCL1, PRAD1, and CCND1 were later recognized as the same gene, so the three concepts are merged into one.

Concept Based Retrieval
[Figure: a user supplies concepts used for retrieval (C1, C2) to a search engine, which returns relevant documents. Documents are indexed by concept terms, e.g. D1<C1, C2> and D2<C1, C3, C4>.]

Edit History Usage
[Figure: pre-indexed document repositories R1 to R4, each indexed against a different thesaurus version (Version 1 to Version 4); the edit history between versions (new, modify, retire, merge, split); the search engine; and the concepts used for retrieval.]
• Documents are often indexed using different versions of the terminology.
• Re-indexing documents to keep pace with changes made to the terminology is impractical and can be very costly.
• Edit history can greatly enhance precision and recall.

Edit History Storage

Terminology Development Environment
• Previously, only three types of edit action were logged: add, modify, and delete.
• Concepts created through split actions are confounded with newly created concepts.
• Concepts merged into other concepts are indistinguishable from retired concepts.
• Failure to explicitly track merge and split edit actions may result in a low recall* rate in information retrieval.
*Recall is the number of relevant documents retrieved, expressed as a fraction of all relevant documents.

Approach Taken to Extend TDE
• Create a reusable concept edit tree Java bean
• Develop a user interface for processing split, merge, and retirement edit actions
• Log edit events in the TDE history database with clarity and precision

Extend Ontylog Editor With Plug-Ins
Use the Concept Edit Tree widget to build plug-ins.

TDE Extension - Split Panel
Roles and properties may be transferred from one concept to another using drag & drop. A new concept is created as a result of a split. The edit action is explicitly logged in the TDE History database as a split event.

TDE Extension - Merge Panel
[Figure labels: Concept to stay; Concept to retire]
Non-redundant roles and properties are transferred from the retiring concept to the resultant merged concept. The edit action is explicitly logged in the TDE History database as a merge event.

TDE Extension - Preretirement
[Figure label: Concept to retire]
• Sub-concepts are re-treed.
• Role relationships targeted at (i.e., pointing to) the retiring concept are either removed or re-targeted.
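The two preretirement preconditions can be illustrated with a small Java sketch (Java 16+ for records). It is not the TDE or Ontylog API. The concept model, the class `PreretirementCheck`, its methods, and the role name `someRole` are all hypothetical. The sketch assumes that sub-concepts are re-treed under the retiring concept's own parents. A concept passes the check only when no sub-concept remains under it and no role relationship still points to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Set;

// Hypothetical, minimal concept model for illustration only; not the TDE or Ontylog API.
public class PreretirementCheck {

    // A role relationship: source --name--> target (hypothetical representation).
    record Role(String source, String name, String target) {}

    // Concept code -> codes of its direct parents (the concept tree).
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final List<Role> roles = new ArrayList<>();

    void addParent(String child, String parent) {
        parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    }

    void addRole(String source, String name, String target) {
        roles.add(new Role(source, name, target));
    }

    // Re-tree: move each sub-concept of the retiring concept under the retiring concept's own parents.
    void reTree(String retiring) {
        Set<String> newParents = parents.getOrDefault(retiring, Set.of());
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            if (!e.getKey().equals(retiring) && e.getValue().remove(retiring)) {
                e.getValue().addAll(newParents);
            }
        }
    }

    // Re-target roles pointing at the retiring concept to 'replacement', or remove them if replacement is null.
    void reTargetRoles(String retiring, String replacement) {
        ListIterator<Role> it = roles.listIterator();
        while (it.hasNext()) {
            Role r = it.next();
            if (r.target().equals(retiring)) {
                if (replacement == null) {
                    it.remove();
                } else {
                    it.set(new Role(r.source(), r.name(), replacement));
                }
            }
        }
    }

    // Preconditions: no remaining sub-concepts and no role relationship targeted at the concept.
    boolean canRetire(String concept) {
        boolean hasSubConcepts = parents.values().stream().anyMatch(ps -> ps.contains(concept));
        boolean isRoleTarget = roles.stream().anyMatch(r -> r.target().equals(concept));
        return !hasSubConcepts && !isRoleTarget;
    }

    public static void main(String[] args) {
        PreretirementCheck t = new PreretirementCheck();
        t.addParent("C1", "C0");               // C1 is the concept to retire
        t.addParent("C2", "C1");               // C2 is a sub-concept of C1
        t.addRole("C3", "someRole", "C1");     // hypothetical role pointing at C1
        System.out.println(t.canRetire("C1")); // prints false
        t.reTree("C1");                        // C2 now sits under C0
        t.reTargetRoles("C1", "C4");           // the role now points at C4
        System.out.println(t.canRetire("C1")); // prints true
    }
}
```

Re-treeing under the retiring concept's parents is one simple choice that keeps sub-concepts reachable in the hierarchy. A modeler could equally choose other parents, or other re-targets, before the retire event is logged.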
A concept can be retired only if all preconditions are met.

TDE Extension - Retire Panel
A non-editable tree shows concept definition information pertinent to the retiring concept. The edit action is explicitly logged in the TDE History database as a retire event.

Next Steps
• Consolidate edit history logged by individual modelers in the terminology development environment (TDE) into concept history data useful to Distributed Terminology System (DTS) users
• Extend caBIO and DTS Server capability to facilitate high-quality information retrieval
[Figure: proposed architecture with End User Applications, Repositories of Indexed Documents, External Databases, an XMLRPC Client, caBIO.jar, a DTS XMLRPC History Server API, a DTS Extension, the DTS Server, the edit history database, and EVS concepts used for retrieval; a legend marks the components still to be developed.]

Summary
• Tracking explicit edit actions in TDE is absolutely essential to terminology- and concept-based information retrieval.
• We have successfully extended the TDE Ontylog editor to explicitly track split, merge, and retirement edit events.
• Concept history data and supporting APIs will soon become available to DTS users and developers through caBIO (Cancer Bioinformatics Infrastructure Objects).

EVS Team
Frank Hartel, Sherri De Coronado, Gilberto Fragoso, Margaret Haber, Larry Wright, Jim Oberthaler, Kim Ong, Iris Guo, Bob Dione
Northrop Grumman, Inc.; Kevric Corporation; Aspen Inc.; Apelon, Inc.

Contact
Dr. Francis W. Hartel
Center for Bioinformatics
National Cancer Institute
6116 Executive Blvd.
Rockville, MD 20892-8335
Phone: (301) 435-3869
Fax: (301) 480-4222
Email: [email protected]