Analyzing the Web from Start to Finish
Knowledge Extraction from a Web Forum using KNIME
Feb 25, 2014
Bernd Wiswedel
KNIME.com AG, Zurich, Switzerland
Agenda
• KNIME Overview
• Demo / Intro
• KNIME forum analysis … using KNIME
• Q/A
A Brief History of KNIME
2004: KNIME development commences
2006: KNIME v1 released
2006: Spin-off in Konstanz, Germany
2006-2007: First commercial partners
2008: KNIME moves to Zurich
2010: Enterprise products released
2011: KNIME.com AG founded
2013: KNIME comes to the West Coast…
3,000+ Organizations Using KNIME
~30% Life Science
~70% Business Intelligence, Analytics
50+ Very Active Community Developers
“KNIME saved my life in a world of scripts that I do not want to learn!”
2012
Who’s Using KNIME?
• >22,000 Individuals
• ~3,000 Organizations worldwide
• ~400 KNIME.com Customers
The KNIME Platform
KNIME loads and integrates data from diverse data sources:
• Different databases
• Various file formats (CSV, XML, SDF, etc.)
KNIME provides a huge repository of easy-to-use, modular building blocks for:
• Data preprocessing
• Data fusion
• Data transformation
In addition to standard data mining techniques, KNIME adds cutting-edge data analysis algorithms (…thanks to its academic roots).
Interactive views provide data overviews and insights into the learned models. Interactive linking and brushing techniques allow for powerful exploration of models and data.
KNIME
Thanks to its open API and “node-in-a-sandbox” approach, additional (including external) tools are easily integrated, e.g.:
• Access to the statistics tool R
• Complete integration of the machine learning library WEKA
• Application-area-specific integrations, e.g. CDK (Chemistry Development Kit), RDKit, ImageJ, …
KNIME is Eclipse-based: integrating other Eclipse projects such as BIRT, DTP, etc. provides even more functionality.
KNIME Selected Node Highlights
Over 1,000 native and embedded nodes included:
Statistics
Data Mining
Time Series
Image Processing
Neighborgrams
Web Analytics
Text Mining
Network Analysis
Social Media Analysis
WEKA
R
Database Support
ETL
Text Processing
Data Generation
XML Read/Write
PMML Read / Write
Business Intelligence
Community Nodes
3rd Party Nodes
Advanced Visualization
Demo.
KNIME Forum Analysis
http://tech.knime.org/forum
KNIME Forum Analysis
Challenges:
• Get data into KNIME
• Extract simple statistics (how many posts,
response time, response length)
• Classify topics and detect topic shifts
• Identify content and users
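The "simple statistics" challenge can be illustrated outside KNIME. The Python sketch below is not the workflow shown in the talk; it assumes a hypothetical `Post` record (thread id, author, timestamp, body text) standing in for whatever the forum scrape actually yields. It also assumes our own definitions: response time is the delay until the first reply by someone other than the thread starter, and response length is words per post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record layout; the real forum export may look different.
    thread_id: int
    author: str
    created: datetime
    body: str


def thread_stats(posts):
    """Per thread: number of posts, hours until the first reply by another
    user (None if nobody replied), and mean post length in words."""
    threads = {}
    for post in posts:
        threads.setdefault(post.thread_id, []).append(post)

    stats = {}
    for thread_id, items in threads.items():
        items.sort(key=lambda p: p.created)
        opener = items[0]
        # First post in the thread written by someone other than the opener.
        reply = next((p for p in items[1:] if p.author != opener.author), None)
        hours = (reply.created - opener.created).total_seconds() / 3600 if reply else None
        word_counts = [len(p.body.split()) for p in items]
        stats[thread_id] = {
            "posts": len(items),
            "response_hours": hours,
            "mean_words": sum(word_counts) / len(word_counts),
        }
    return stats


# Usage with made-up posts:
posts = [
    Post(1, "alice", datetime(2013, 5, 1, 9, 0), "How do I read a CSV file"),
    Post(1, "bob", datetime(2013, 5, 1, 11, 30), "Use the File Reader node"),
    Post(2, "carol", datetime(2013, 6, 2, 8, 0), "Loop question"),
]
stats = thread_stats(posts)
print(stats[1])  # {'posts': 2, 'response_hours': 2.5, 'mean_words': 6.0}
```

Aggregating these per-thread values (e.g. the median response time per year) is a straightforward next step.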
Forum Analysis – Classify Posts
• Use text mining to classify forum posts into categories such as ‘io’, ‘manipulation’, ‘mining’, …
• No training set available
→ (mis-)use the KNIME node descriptions instead
• See how discussion topics evolved over the years
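Once each post carries a category label (however it was obtained), following the evolution of topics reduces to counting labels per year. A minimal Python sketch, assuming the classified posts are available as hypothetical `(year, category)` pairs; the talk does not say how the KNIME workflow aggregates them.

```python
from collections import Counter, defaultdict


def topic_evolution(labeled_posts):
    """Share of each category per year, from (year, category) pairs."""
    by_year = defaultdict(Counter)
    for year, category in labeled_posts:
        by_year[year][category] += 1

    shares = {}
    for year in sorted(by_year):
        total = sum(by_year[year].values())
        shares[year] = {cat: n / total for cat, n in sorted(by_year[year].items())}
    return shares


# Made-up labels, for illustration only:
labeled = [
    (2011, "io"), (2011, "mining"),
    (2012, "io"), (2012, "io"), (2012, "manipulation"), (2012, "mining"),
]
evolution = topic_evolution(labeled)
print(evolution[2012])  # {'io': 0.5, 'manipulation': 0.25, 'mining': 0.25}
```

Using shares rather than raw counts keeps years with very different post volumes comparable.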
Forum Analysis – Classify Posts
We want to classify forum posts (only the first post of each thread, no comments)…
Forum Analysis – Classify Posts
… using KNIME node description text
as labeled training set
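The idea on these two slides (train on node descriptions labeled by node category, then apply the model to forum posts) can be sketched in plain Python. This assumes a multinomial Naive Bayes classifier, which is a swapped-in technique: the talk does not say which learner the KNIME workflow uses. The training texts are made-up stand-ins, not real KNIME node descriptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-case alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up stand-ins for node descriptions, labeled with a node category.
node_descriptions = [
    ("Reads a CSV file from disk into a data table", "io"),
    ("Writes the input table to a CSV file", "io"),
    ("Filters rows of the table based on a column value", "manipulation"),
    ("Joins two tables on a key column", "manipulation"),
    ("Trains a decision tree model to predict a class column", "mining"),
    ("Clusters rows using k-means and outputs a cluster model", "mining"),
]
texts, labels = zip(*node_descriptions)
clf = NaiveBayes().fit(texts, labels)

print(clf.predict("How can I read my CSV file?"))               # io
print(clf.predict("Which node trains a decision tree model?"))  # mining
```

The catch the slides allude to with "(mis-)use" is domain shift: node descriptions are written in reference-manual style, while forum posts are informal questions, so accuracy on real posts would need to be checked by hand.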
Demo.
Forum Analysis – Content & Users
• Look at individual categories (KNIME
General, Developer, Reporting, …)
• Learn what is discussed
• See who is contributing
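"Learn what is discussed" typically ends in a tag cloud, where each term is sized by its frequency. A minimal Python sketch of that counting step, assuming plain word counts and a tiny, hypothetical stop-word list; in the talk this is done with KNIME's text mining extension, not with this code.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "i", "to", "how", "do", "is", "in",
              "of", "and", "my", "with", "can"}


def tag_cloud_terms(discussions, top_n=5):
    """Most frequent non-stop-words (longer than two letters) across all texts."""
    counts = Counter()
    for text in discussions:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


# Made-up discussions from one forum category:
discussions = [
    "How do I configure the database connector?",
    "Database connector fails with timeout",
    "Report shows empty database table",
]
print(tag_cloud_terms(discussions, top_n=2))  # [('database', 3), ('connector', 2)]
```

The returned `(term, count)` pairs map directly onto font sizes in a tag cloud view.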
Forum Analysis – Content & Users
The input is all discussions in one forum category…
Forum Analysis – Content & Users
The output is a multi-page report with a tag cloud and a user connection graph.
It combines KNIME’s text and network mining extensions.
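The slides do not define when two users count as "connected". The Python sketch below assumes one plausible definition: two users are linked if they posted in the same thread, with the edge weight counting shared threads. The input format (thread id mapped to a list of post authors) is hypothetical, and this is an illustration, not the KNIME network mining nodes.

```python
from collections import Counter
from itertools import combinations


def user_graph(threads):
    """Weighted undirected edges between users who posted in the same thread.

    threads: mapping of thread id -> list of post authors.
    Returns a Counter keyed by alphabetically sorted (user_a, user_b) pairs.
    """
    edges = Counter()
    for authors in threads.values():
        # Each pair of distinct participants is counted once per thread.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct users each user is connected to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Made-up threads:
threads = {
    1: ["alice", "bob", "alice"],
    2: ["dave", "bob"],
    3: ["alice", "bob", "carol"],
}
edges = user_graph(threads)
print(edges[("alice", "bob")])        # 2
print(degree(edges).most_common(1))   # [('bob', 3)]
```

A reply-to graph (edges only from a replier to the post they answer) would be a stricter alternative if the scraped data records reply targets.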
Demo.
Thank You
For more information:
[email protected]
http://www.KNIME.com
KNIME.com AG
Technoparkstr. 1
8005 Zurich
Switzerland
Tel: +41-44-445-2660
Fax: +41-44-445-2662