DataJewel: Tightly Integrating Visualization with Temporal Data Mining
Mihael Ankerst, David H. Jones, Anne Kao, Changzhou Wang
(DataJewel: US patent pending)

DataJewel: A Novel Architecture for Temporal Data Mining

Motivation:
• In different domains, different kinds of patterns are of interest -> an architecture that provides access to many temporal mining algorithms
• Databases are built based on organizational needs -> an architecture that links databases together
• Databases can be huge -> data has to be compressed
• Current data mining tools are for data mining experts -> an architecture that is very intuitive and easy to use

Visual Data Mining

[Figure: comparison of data mining algorithms and visualization, rated +/- on evaluation, flexibility, user interaction, and actionability; diagram relating data mining, visual data mining, and information visualization]

Visual Data Mining Architecture: Tightly Integrated Visualization

[Figure: three ways of combining visualization with a data mining (DM) algorithm on the path from data to result to knowledge:
- Preceding Visualization (PV): visualization of the data, then the DM algorithm
- Subsequent Visualization (SV): the DM algorithm, then visualization of the result
- Tightly Integrated Visualization (TIV): visualization + interaction alongside DM-algorithm steps 1 to n]

Architecture of DataJewel
• Data source layer: access and link multiple heterogeneous databases and data sources
• Statistical layer: compression, aggregation, sampling
• Data mining layer: extensible set of data mining algorithms for automatic pattern discovery
• Visualization layer: extensible set of visualizations for representing the data and the patterns, plus interaction capabilities that let the user incorporate domain expertise

The Visualization Component

Time         Event type    Location   …
09/11/2001   Door broken   Seattle    …
09/12/2001   …             …          …

[Figure: calendar display for January 2002 (columns S M T W T F S); detail for Tuesday, Jan 1st 2002 with the event types Doors, Lights, Engine, Landing Gear]

The Temporal Mining Component

Goal: mining algorithms should be
• Very efficient (results in interactive time)
• Tightly integrated with the visualization

Types of patterns:
• Single event: recurrence, periodicity, …
• Multiple events: similarity, causality, clustering, …

Solution: the algorithm computes a pattern and updates the visualization by assigning unique colors only to the events contained in the pattern.

All algorithms result in updating the color assignment:
- CalendarView visualizes the data and the patterns
- The same color assignment interface is used by the user and the algorithm

Implemented new mining algorithms:
• LongestStreak
• Most Deviations
• Correlated Events

The basic ideas of the algorithms are motivated by control charting (stabilized p-chart).

[Figure: control chart of frequency over time with the mean marked; values shown: 7, 5, 5, 6, 10]

The Statistical & Database Component
• Access to data from different databases
• Precompute compressed/aggregated/sampled data
• Use lookup tables to further compress data

Currently, we can analyze millions of records in real time.

[Figure: data sources labeled Airline_a, Airline_b, Procurement DB, and Maintenance DB, with sample maintenance records]

Airline_a (raw records):

Date        ATA   Complaint_txt   …
12/1/2000   73    …               …
12/1/2000   73    …               …
15/1/2000   49    …               …

Aggregate data with:

    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_a
    GROUP BY Date, ATA
    ORDER BY Date, ATA

Result:

Date        ATA   Freq
12/1/1999   73    27
15/1/1999   49    9
…           …     …

Airline_b (raw records):

Date        ATA   Complaint_txt   …
1/1/2000    35    …               …
1/1/2000    35    …               …
1/1/2000    39    …               …

Aggregate data with:

    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_b
    GROUP BY Date, ATA
    ORDER BY Date, ATA

Result:

Date        ATA   Freq
1/1/2000    35    344
1/1/2000    39    193
…           …     …

User-Centric Data Mining

[Figure: workflow diagram with the following steps]
• User selects data source/attributes
• Data is compressed and loaded
• Data is visualized
• User invokes algorithm
• User interacts with visualization
• User selects visualization technique
• User selects date range
• Raw data is shown

DataJewel – Scenario: Mining Algorithm

[Screenshots; captions: "Using 41 “different” colors…" and "Press here for running mining algorithm"]

DataJewel – Scenario: User Interaction

[Screenshots]

Screenshots

One airline, one model, ATA: 49 (airborne auxiliary power)

Conclusions
• Data mining algorithms and visualization techniques can complement each other well.
• CalendarView is a new visualization technique that represents the frequency of daily events.
• DataJewel uses the same visualization to represent the data and the patterns.
• The color assignment interface is used both by the user (to incorporate domain knowledge) and by the computer (to represent the discovered patterns).
• These two key properties greatly improve the usability of the system for domain experts.

Future work: user studies, new visualizations, algorithms, …
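
The aggregation step from The Statistical & Database Component can be tried out on a toy table. The following is a minimal sketch, not the DataJewel implementation: it runs the slide's GROUP BY query against an in-memory SQLite database using Python's standard sqlite3 module. The sample rows, the placeholder complaint text, the ISO date format, the column name Complaint_txt, and the function name aggregate_daily_frequencies are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows modeled on the airline_a maintenance records shown
# on the slides: (Date, ATA, complaint text). Dates are stored in ISO format
# (an assumption) so that ORDER BY on a text column sorts chronologically.
SAMPLE_ROWS = [
    ("2000-01-12", 73, "placeholder complaint text"),
    ("2000-01-12", 73, "placeholder complaint text"),
    ("2000-01-15", 49, "placeholder complaint text"),
]

# The aggregation query from the slides: one row per (Date, ATA) pair with
# the number of raw records as Freq.
AGGREGATE_SQL = """
    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_a
    GROUP BY Date, ATA
    ORDER BY Date, ATA
"""


def aggregate_daily_frequencies(rows):
    """Load rows into an in-memory table and return (Date, ATA, Freq) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column name Complaint_txt is an assumed reading of the slide header.
        conn.execute(
            "CREATE TABLE airline_a (Date TEXT, ATA INTEGER, Complaint_txt TEXT)"
        )
        conn.executemany("INSERT INTO airline_a VALUES (?, ?, ?)", rows)
        return conn.execute(AGGREGATE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for date, ata, freq in aggregate_daily_frequencies(SAMPLE_ROWS):
        print(date, ata, freq)
```

Because the free-text complaint column is dropped and identical (Date, ATA) pairs collapse into a single counted row, the aggregated table is typically much smaller than the raw data, which is the kind of compression the statistical layer relies on before the data reaches the visualization.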