Lecture 2: VIS - information visualization and data mining

Parallel coordinates
[Figure: parallel coordinates plot with axes Tax rates, Population, House price, Birth-rate. Example records: (23, 34000, 2300000, 27) and (28, 12000, 1900000, 25). Line patterns between adjacent axes: positive correlation, no obvious correlation, negative correlation.]

Table Lens
[Figure: Table Lens with columns MPG, Horsepower, Weight, Acceleration, Cylinders, Year.]

Mosaic plot
[Figure: mosaic plot of the Titanic data, built up over several slides; labels: 1st, 2nd, 3rd, Crew; Child, Adult; Female / Male.]

3D Representations
• Use 3D wisely
• More dimensions do not mean that more information is simultaneously displayed

Presentation Space Limitations
• Scrolling
• Overview + Detail
• Zoom and Pan

Scrolling

Overview + detail
• Focus+context
• No information is hidden
• Micro / macro readings

Distortion
[Figure: perspective wall.]
[Figure: Table Lens with columns MPG, Horsepower, Weight, Acceleration, Cylinders, Year; two rows readable as values: (14, 150, 4532, 135, 8, 72) and (11, 132, 4821, 110, 6, 71).]

Zoom and pan
[Figures: geometric zoom; geometric and semantic zoom.]

Interaction

Interaction techniques
• Brushing
  ‣ A collection of techniques to dynamically query and directly select elements in visual representations
[Figure: brushing on the axes MPG, Horsepower, Weight, Acceleration, Cylinders, Year.]

Interaction techniques
• Details on demand
[Figure: plot with axes Weight and Price; details shown for one item: Model = Saab, Boot = large, Cylinders = 4.]

Interaction techniques
• Coordinated and multiple views (CMV)
  ‣ An action in one view is immediately propagated to all other views

Demo
http://setebos.svt.ntnu.no/tomasz/gallery/Vul16/

Analysis of (very) Large Data

Data Mining
• Having an (enormous) amount of data
  ‣ Wonder what it can tell us
  ‣ Isolate (unexpected) relationships
  ‣ (Hopefully) find some which are
    - Interesting
    - Novel
    - Informative

Data Mining:
• Extraction of interesting (non-trivial, previously unknown and potentially useful) information or patterns from data in ((very) large) databases

Data Mining and Visualization
• Data mining provides complex representations
• Fits (optimizes) them to the data
• Then visualize the data mining results.

Visual Data Mining
[Figure: process diagram; labels: Possible patterns, Relevant Data, InfoViz, Data Mining, Data Warehouse, Selection, New Knowledge!, Data Cleaning, Database(s).]

Problems with Data
• Holes - Missing data values
• Errors and ‘estimates’
  ‣ Income of *exactly* 100000?
• Sample inconsistencies
  ‣ e.g. medical records with different numbers of readings for the same person

Data Mining Tasks
1. Exploratory Data Analysis
2. Descriptive Modelling
3. Predictive Modelling
4. Discovering Patterns and Rules
5. Retrieval by content

Exploratory analysis
• Pure data mining
• “Explore the data with no clear idea of what we are looking for”
• Typically very visual approach
  ‣ Very tied to ‘Visual Data Mining’
• Problems with:
  ‣ Large number of data points
  ‣ Large numbers of dimensions in data

Descriptive Modelling
• Attempt to describe all of the data
• Perhaps use:
  ‣ Model of overall probability distribution in the p-dimensional space
  ‣ Partitioning into groups e.g.:
    - Cluster analysis for natural grouping
    - Segmentation for user-desired groups

Predictive Modelling
• Form a model of the data set which allows prediction of a variable based on the known values of the others
• Classification
  ‣ Prediction of a discrete variable
• Regression analysis
  ‣ Prediction of a continuous variable
• (Prediction does not mean future here)

Discovering Rules and Patterns
• Concerned with the identification of local patterns in sub-sets of the space
• Examples:
  ‣ Frequently occurring sets of transactions
  ‣ Finding patterns of action indicating fraud

Retrieval by Content
• Using a pattern of interest to locate similar patterns
• Examples: Automatically…
  ‣ Finding images with similar content
    - Face recognition at airports
  ‣ Finding text documents with similar content
    - e.g. Urkund

Scoring functions
• All of the preceding classes of task share a common feature:
  ‣ The notion of “is like” or “similarity”
    - Or difference (dissimilarity)
  ‣ Defined through a ‘scoring function’
• In numerical data this is often easy
• In general it is not…

Scoring Functions
• Is an orange like an apple?
• Yes:
  ‣ Both are fruit.
  ‣ Both grow on trees.
• No:
  ‣ One is citrus, one isn’t.
  ‣ One is orange, one is green/red

Scoring Functions
• Specification of the scoring function(s) is crucial to the effectiveness of the system.
• One of the biggest contributions the user has to make!
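
The orange/apple question above can be made concrete with a small Python sketch. It is only an illustration: the function names, the feature names and the feature values are assumptions, not part of the lecture. It shows one simple scoring function for numerical records (Euclidean distance) and one for categorical records (the share of chosen features that differ), and how the categorical score changes with the features the user decides to compare.

```python
import math


def euclidean(a, b):
    """Dissimilarity for numerical records: straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch_fraction(a, b, features):
    """Dissimilarity for categorical records: share of the chosen features that differ."""
    return sum(a[f] != b[f] for f in features) / len(features)


# Numerical case: the two example records from the parallel coordinates slides
# (tax rate, population, house price, birth-rate).
record_a = (23, 34000, 2300000, 27)
record_b = (28, 12000, 1900000, 25)
numeric_gap = euclidean(record_a, record_b)

# Categorical case: hypothetical feature values that mirror the slide's argument.
orange = {"kind": "fruit", "grows_on": "tree", "citrus": True, "colour": "orange"}
apple = {"kind": "fruit", "grows_on": "tree", "citrus": False, "colour": "green/red"}

same_view = mismatch_fraction(orange, apple, ["kind", "grows_on"])  # 0.0: "yes, alike"
diff_view = mismatch_fraction(orange, apple, ["citrus", "colour"])  # 1.0: "no, not alike"
all_view = mismatch_fraction(orange, apple, list(orange))           # 0.5: it depends

print(numeric_gap, same_view, diff_view, all_view)
```

The same two objects score as identical, completely different, or half alike depending only on the feature list passed in, which is the choice the slides leave to the user.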
DM for Vis
• Modelling, Patterns and Rules are valid filters for mapping
• Simplification of data - modelling
• Extraction of interesting features:
  ‣ Patterns, Rules
• Form valid representations for data features

Sampling
• Take K items to be a representative set of M items
• Data abstraction
• Many ways of doing this
  ‣ Random
  ‣ Systematic
  ‣ Density-based
  ‣ …

Cluster Analysis (Descriptive Modelling)
• Cluster: a collection of data items
  ‣ Similar to one another within the same cluster
  ‣ Different from the items in other clusters
• Cluster analysis
  ‣ Grouping sets of data items into clusters
  ‣ Data abstraction
  ‣ Automatically

Major clustering approaches
• There are a number of approaches
  ‣ We will consider just one
• K-Means algorithm:
  ‣ Given a value k, find a partition of k clusters that minimizes the total intracluster variance

K-means, example with K=3
[Figure sequence: six slides stepping through a K-means example with K=3.]

K-means Method
1. Place K points into the space represented by the items that are being clustered
   - These points represent initial group centroids
2. Assign each data item to the group that has the closest centroid
3. When all items have been assigned, recalculate the positions of the K centroids
4. Repeat Steps 2 and 3 until the centroids no longer move

SMART Series: Sketch-based Matching through Approximated Ratios in Time Series
Searching for all possible patterns in a time series is a computationally complex problem.
DEMO
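
The four steps of the K-means method listed earlier can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: the function names, the sample points and the starting centroids are hypothetical, the starting centroids are passed in by the caller because the slides do not say how to place them, squared Euclidean distance is assumed as the scoring function, and an empty group simply keeps its previous centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(items, initial_centroids, max_iter=100):
    # Step 1: the K starting centroids are supplied by the caller (an assumption).
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each item to the group with the closest centroid.
        groups = [[] for _ in range(k)]
        for item in items:
            nearest = min(range(k), key=lambda i: squared_distance(item, centroids[i]))
            groups[nearest].append(item)
        # Step 3: recalculate each centroid as the mean of its group.
        # An empty group keeps its old centroid (an assumption).
        updated = [mean_point(g) if g else centroids[i] for i, g in enumerate(groups)]
        # Step 4: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


# Hypothetical 2D data: three loose groups, clustered with K=3.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (1.0, 9.0), (1.5, 8.0), (2.0, 9.5)]
centroids, groups = k_means(points, [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)])
for centre, members in zip(centroids, groups):
    print(centre, members)
```

With these starting positions each group of three points ends up in its own cluster. Other starting positions can lead to a different final partition, so the placement in Step 1 matters.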
