Download Visualization of Relational Text Information

Visualization of Relational Text Information for Biomedical Knowledge Discovery
James W. Cooper
IBM T J Watson Research Center
Hawthorne, NY

Overview
• Prior work
• Java based text mining
• Computation of unnamed relations
• Graphical display of relations

Relations between terms
• Noun phrase co-occurrence statistics [Roark, Charniak]
• Choose seed words and look for terms near them [Brin] [Gravano, Agichtein]
  – Repeat
• Biomedical domain
  – Blaschke used a dictionary of common verbs
  – Pustejovsky found inhibit relations
• Stevens, Palakal, Mostafa
  – Detected abstract-wide co-occurrence using a dictionary of genes and useful verbs.

Graphical Displays
• BioLayout: protein similarity
• ProtInAct: interactive system using yFiles
• Zhang: interactive 3D system
• Jenssen: gene network
• Leroy: GeneScene

BioLayout (Enright and Ouzounis)
Five related protein families and their corresponding relationships. Spheres represent proteins and lines represent protein similarities.

ProInAct (Spencer and Bennett)
Proteins clustered by functional interaction.

Zhang: protein interaction mapping

Jenssen: a literature network
Lines connect genes that have co-occurred in 1 or more papers.

Leroy: GeneScene

What would we like to do?
• Find scientifically meaningful connections between important terms.
  – Such as Swanson's connection between Raynaud's disease and fish oil.
• Allow exploration of relations by the user.
• Filter the relations by ontology or term types.
• Perform path analysis.
• Let the user vary the graphical display.

Data we analyzed
• Two sets of patent data
  – 584 patents on Viagra and phosphodiesterase inhibitors.
  – 1514 patents on quinolones (like Cipro).
• Recognized major technical terms in each patent.
• Filtered organic chemical nomenclature.

The Talent text mining system
• Text Analysis and Language Engineering Tools
  – Finds multiword noun phrases
  – Does shallow parse
  – Can extract NPs and VGs, as well as all other sentence parts

The JTalent Library
• Java class library with JNI interface
  – To Talent DLL
• Creates database load files of terms
  – Paragraph
  – Sentence
  – Offset
  – Term type (NP, VG)

TalentShow Demo

The KSS Library
• Java class library of functions for
  – Accessing a database (DB2, Access)
  – Manipulating a search engine
  – Manipulating tables of information created by JTalent.

Database Tables
• Documents
  – Title, author, URL, ID
• TermDocs
  – Term
  – Paragraph
  – Sentence
  – Offset
  – Type
• Dictionary of terms, types and IDs
  – Such as MeSH

Computing term information
• Compute unique terms from TermDocs
• Compute frequency
• Compute salience
  – Based on frequency
  – Number of docs they appear in more than once

Compute term relations
• Named relations based on abbreviation expansions.
• Unnamed relations based on proximity, with weight based on how frequently they occur near each other.
• Mutual information weight:
  $$m = \log\left(\frac{\mathrm{totalterms} \times \mathrm{paircount}}{\mathrm{freq1} \times \mathrm{freq2}}\right)$$

Tuning Computed relations
• Select only terms above a salience threshold.
• Keep only relations in which one or both terms are members of an ontology.
• Store relations in a database table for rapid access:
  Term | weight | term

Original System
• Visual client
• SOAP server
  – Queries database to get relations
  – Round trip for each new query
• Instead, we export the data for the user to visualize as they wish.

Exporting relations
• Save relations and ontology information in an XML file.

    <relation>
      <term>
        <iq>78</iq>
        <source>MeSH</source>
        <relationDocuments>
          <doc>34</doc>
        </relationDocuments>
      </term>
      <term> </term>
    </relation>

• This XML file is a portable version of the computed relations that we can then use with any number of viewers.

A Graphical Relations Viewer
• Creates a Java Relations object for each relation it reads from the XML file.
• Inserts them into a Trie structure based on the lower-cased first term.
  – If there is already a Relation at that point, it adds them to a Vector for that term.
• Creates an alphabetical list of all terms in a 2nd Trie.

Using the Viewer
• When you enter part of a term, it shows all terms starting with that fragment in the left list box.
• When you click on a term, it shows all its relations in the right list box.

Lexical Navigation
• Displays relations between terms graphically and allows you to explore them without formulating a specific query.

Possible enhancements
• Show only terms belonging to an ontology.
• Show only higher IQ terms.
• Show the documents the relations occur in.
• Show the ontology reference.
• Show computed paths.
• Show more kinds of named relations.
  – Inhibits, expresses

Evaluations of Information Visualization
• Few, if any, graphical displays have been evaluated thus far for effectiveness.
• Usability studies are hard to construct and carry out.
• Intuition seems to show
  – that exploration may result in discoveries.
  – that relations more than one step apart are best displayed graphically.
• It remains to be shown that such visualizations are actually useful.

Differences in Intent
• Displays may represent information your system has discovered.
  – Gene-protein relations
• Or they may represent data from which the user may discover new information.
  – New 2nd or 3rd order relationships
• These are rather different applications of visualization technology.

Summary
• Java-based text mining system
• Database of terms and positions
• Computation of relations
• Export as XML
• Graphical relations viewer
• The value of such visual interfaces has not yet been established.

Acknowledgements
• Bhavani Iyer: XML export
• Eric Brown: DictMatcher hash code
• Daniel Tunkelang: graphical layout
• Bob Mack: paper suggestions
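
The lookup behavior described on the "A Graphical Relations Viewer" and "Using the Viewer" slides can be illustrated with a minimal sketch. This is not the viewer's actual code: the class names `RelationTrie` and `Relation`, their fields, and the method names are hypothetical, an `ArrayList` stands in for the `Vector` mentioned on the slide, and a single trie serves both lookups where the slides describe two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a trie keyed on the lower-cased first term of each relation. */
class RelationTrie {
    /** One computed relation: first term, weight, second term (illustrative fields). */
    static final class Relation {
        final String first;
        final double weight;
        final String second;
        Relation(String first, double weight, String second) {
            this.first = first;
            this.weight = weight;
            this.second = second;
        }
    }

    private static final class Node {
        // TreeMap keeps children sorted, so traversal yields terms in character order.
        final Map<Character, Node> children = new TreeMap<>();
        // All relations whose first term ends at this node.
        final List<Relation> relations = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Insert a relation under its lower-cased first term; repeats accumulate in the list. */
    void insert(Relation r) {
        Node node = root;
        for (char c : r.first.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.relations.add(r);
    }

    /** Walk the trie along the lower-cased key; null if no such path exists. */
    private Node find(String key) {
        Node node = root;
        for (char c : key.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node;
    }

    /** Right list box: every relation stored for exactly this term. */
    List<Relation> relationsFor(String term) {
        Node node = find(term);
        return node == null ? new ArrayList<>() : new ArrayList<>(node.relations);
    }

    /** Left list box: every stored term that starts with the typed fragment, in sorted order. */
    List<String> termsStartingWith(String fragment) {
        List<String> out = new ArrayList<>();
        Node node = find(fragment);
        if (node != null) {
            collect(node, new StringBuilder(fragment.toLowerCase(Locale.ROOT)), out);
        }
        return out;
    }

    private void collect(Node node, StringBuilder prefix, List<String> out) {
        if (!node.relations.isEmpty()) out.add(prefix.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            prefix.append(e.getKey());
            collect(e.getValue(), prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}
```

Typing a fragment would call `termsStartingWith`, and clicking a term would call `relationsFor`. Because keys are lower-cased on both insert and lookup, "Sildenafil" and "sildenafil" land on the same node.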

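
The mutual information weight from the "Compute term relations" slide can be sketched as a small function. This is only an illustration under stated assumptions: the slide does not give the logarithm base, so the natural log is assumed here, and the class name `MutualInfo`, the method name `weight`, and the sample counts in `main` are hypothetical.

```java
/** Hypothetical sketch of the slide's weight m = log((totalterms * paircount) / (freq1 * freq2)). */
class MutualInfo {
    /**
     * @param totalTerms total number of term occurrences in the collection
     * @param pairCount  number of times the two terms occur near each other
     * @param freq1      frequency of the first term
     * @param freq2      frequency of the second term
     * @return the weight, using the natural log (an assumption; the slide gives no base)
     */
    static double weight(long totalTerms, long pairCount, long freq1, long freq2) {
        if (totalTerms <= 0 || pairCount <= 0 || freq1 <= 0 || freq2 <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        // Convert to double before multiplying to avoid long overflow on large counts.
        return Math.log(((double) totalTerms * pairCount) / ((double) freq1 * freq2));
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        System.out.println(weight(1000, 10, 100, 100)); // pair occurs exactly as often as chance predicts
        System.out.println(weight(1000, 20, 100, 100)); // pair occurs twice as often as chance predicts
    }
}
```

A weight of zero means the pair co-occurs about as often as independent terms would. Positive values indicate terms that appear together more often than their individual frequencies predict, which is what makes the weight usable for ranking unnamed relations.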