Download Text Mining SIG TV Novartis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Text Mining Special Interest Group
[Therese Vachon, NIBR]
Informatics and Knowledge Management/Information
and Knowledge Engineering
Novartis Institute for Biomedical Research,
Cambridge, MA 4-6th October 2004
At NIBR, Text Mining is considered a
solution for
• Analyzing, tagging, annotating, exploring,
structuring and classifying textual data and
large document sets (internal and external)
• Identifying meaningful concepts from text
and relationships between concepts
• Improving the quality of text retrieval
methods
• Detecting novel patterns
• Detecting similarities across textual data
• Improving navigation across data sources and
document sets
What are the Critical Business Needs that
Text Mining Could Address?
• Discovery of unexpected relationship and
relevant information
• Generation of new hypotheses
• Assisted annotation and curation
• Unified view of heterogeneous sources
• Analysis of trends and patterns
• Analysis of complex relationships between data
elements
• Detection of unexpected or emerging information
• Knowledge inference
• Contextual and semantic navigation
What Questions do Users Want to Ask ?
• Find information about specific products,
targets, genes, proteins, companies, etc.
• Link literature and experimental data
• Identify biological interactions
• Semantic text analysis of full text papers
• Dynamic data flow analysis and categorization
• Monitoring competitors’ capabilities and
activities
• Discovering networks of associations
Text Mining Applications/Components
Function
Current
Future
Text retrieval methods
Keyword searching, fuzzy
search, semantic search
Case-based reasoning
Query inference
Knowledge representation
Thesauri, taxonomies,
ontologies
UIMA (unstructured information
management architecture), …
Annotation
Manual annotation
Assisted annotation
Categorization &
Clustering
K-means partitioning,
hierarchical clustering,
SOMs, rule-based
categorization
SVM, Latent semantic analysis
NLP or OBIIE
Concept extraction
IE 1st generation
IE 2nd generation
Analysis of positive and negative
associations among data objects
Full text access
Indexing , access and mining
of full text internal
documents
Indexing, access and mining of full
text literature articles
Visualisation
Interactive network of cooccurences (XML SVG)
Heat maps, factorial maps,
etc. and graphlets
technology
Network of typed relations
Current Text Mining Applications in
Novartis/NIBR?
•
•
•
•
•
•
Knowledge Space Portal
Novartis Knowledge Miner (Ulix)
Competitive Intelligence Analysis Platform
News Analysis Platform
Research EIS
Text mining components
What Future Applications are Planned?
•
•
•
•
•
•
•
•
Text mining in genomics
Text mining in chemoinformatics
Text mining of full text publications
Text mining for marketing and sales
Mining of medical enquiries
Expert location systems
Generic text analysis platform
Generic categorical variable analysis platform
Knowledge Space Portal
• Provide key elements for efficiently accessing
Novartis-internal and external information
relevant to daily decision in the drug
discovery and development process:
– Data integration across heterogeneous data sources and
applications (internal and external)
– Consistent user interface for data retrieval, exploration and
analysis across all data types
– Contextual (ultralink), tree-based (static or dynamic
taxonomies) and semantic (knowledge map) navigation
– Data exploration and analysis methods
– Personalized views
– Collaborative, annotation and information sharing tools
– Alerting
Display-Navigation-Ultralink
Protease modulator in Literature DB (Medline-Embase)
Easy navigation in
record titles
Sort capabilities
Ranking value
and access to
document
Analysis tools
Search report:
Number of Docs,
Key-words
extracted
Graph Navigator –
Protease modulators in CI DBs
July 2004 - ADIS & Pharmaprojects
Clustering
Data Analysis –
Protease modulators in CI DBs
Univariate - Companies
Univariate - Diseases
conditionned by Companies
July 2004 - ADIS & Pharmaprojects
Univariate - MOA
Clustering Diseases -MOAs
Conclusions
• Text mining techniques have already been
implemented in relevant areas
• Additional techniques need to be
developed/tested/implemented especially in
the field of information extraction/NLP and in
the field of categorization
• Collaboration with external partners/research
institutes needed
• Text mining components to be applied to all
applications dealing with text