Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Text Mining Special Interest Group [Therese Vachon, NIBR] Informatics and Knowledge Management/Information and Knowledge Engineering Novartis Institute for Biomedical Research, Cambridge, MA 4-6th October 2004 At NIBR, Text Mining is considered a solution for • Analyzing, tagging, annotating, exploring, structuring and classifying textual data and large document sets (internal and external) • Identifying meaningful concepts from text and relationships between concepts • Improving the quality of text retrieval methods • Detecting novel patterns • Detecting similarities across textual data • Improving navigation across data sources and document sets What are the Critical Business Needs that Text Mining Could Address? • Discovery of unexpected relationship and relevant information • Generation of new hypotheses • Assisted annotation and curation • Unified view of heterogeneous sources • Analysis of trends and patterns • Analysis of complex relationships between data elements • Detection of unexpected or emerging information • Knowledge inference • Contextual and semantic navigation What Questions do Users Want to Ask ? • Find information about specific products, targets, genes, proteins, companies, etc. • Link literature and experimental data • Identify biological interactions • Semantic text analysis of full text papers • Dynamic data flow analysis and categorization • Monitoring competitors’ capabilities and activities • Discovering networks of associations Text Mining Applications/Components Function Current Future Text retrieval methods Keyword searching, fuzzy search, semantic search Case-based reasoning Query inference Knowledge representation Thesauri, taxonomies, ontologies UIMA (unstructured information management architecture), … Annotation Manual annotation Assisted annotation Categorization & Clustering K-means partitioning, hierarchical clustering, SOMs, rule-based categorization SVM, Latent semantic analysis NLP or OBIIE Concept extraction IE 1st generation IE 2nd generation Analysis of positive and negative associations among data objects Full text access Indexing , access and mining of full text internal documents Indexing, access and mining of full text literature articles Visualisation Interactive network of cooccurences (XML SVG) Heat maps, factorial maps, etc. and graphlets technology Network of typed relations Current Text Mining Applications in Novartis/NIBR? • • • • • • Knowledge Space Portal Novartis Knowledge Miner (Ulix) Competitive Intelligence Analysis Platform News Analysis Platform Research EIS Text mining components What Future Applications are Planned? • • • • • • • • Text mining in genomics Text mining in chemoinformatics Text mining of full text publications Text mining for marketing and sales Mining of medical enquiries Expert location systems Generic text analysis platform Generic categorical variable analysis platform Knowledge Space Portal • Provide key elements for efficiently accessing Novartis-internal and external information relevant to daily decision in the drug discovery and development process: – Data integration across heterogeneous data sources and applications (internal and external) – Consistent user interface for data retrieval, exploration and analysis across all data types – Contextual (ultralink), tree-based (static or dynamic taxonomies) and semantic (knowledge map) navigation – Data exploration and analysis methods – Personalized views – Collaborative, annotation and information sharing tools – Alerting Display-Navigation-Ultralink Protease modulator in Literature DB (Medline-Embase) Easy navigation in record titles Sort capabilities Ranking value and access to document Analysis tools Search report: Number of Docs, Key-words extracted Graph Navigator – Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects Clustering Data Analysis – Protease modulators in CI DBs Univariate - Companies Univariate - Diseases conditionned by Companies July 2004 - ADIS & Pharmaprojects Univariate - MOA Clustering Diseases -MOAs Conclusions • Text mining techniques have already been implemented in relevant areas • Additional techniques need to be developed/tested/implemented especially in the field of information extraction/NLP and in the field of categorization • Collaboration with external partners/research institutes needed • Text mining components to be applied to all applications dealing with text