New Directions in Analysis and Visualization [Visual Analytics]
Dr Jeremy Walton
NAG Ltd, Oxford
[email protected]
Results Matter. Trust NAG
1 July, 2008, Research Methods Festival, St Catherine's College, Oxford

Overview
  Introduction: NAG, HECToR
  Visualization: distribution, collaboration, steering
  Data mining: classification, exploratory analysis
  The ADVISE project: large data, interactive analysis

NAG profile
  Products
    Mathematical, statistical, data analysis components
    3D visualization, compilers & tools
    HPC software engineering services
    HECToR support
  Users
    Academic researchers
    Professional developers
    Analysts / modelers
  Founded 1976
  Not-for-profit company

High-End Computing Terascale Resource (HECToR)
  Latest high-end computing service for UK
    funded by EPSRC, NERC & BBSRC
    will run from 2007-2013
  Partners:
    Hardware: Cray Inc
    Service Provision: University of Edinburgh HPCx Ltd
      hardware hosting, user services, help desk
    CSE Support: NAG Ltd
      technical assessment of project application
      porting / tuning / optimisation of user codes
      training courses (inc. visualization)
      best practice guides, documentation, FAQs

Visualization toolkits
  Help construct visualization applications
    no wheel-reinvention, stone canoes, chocolate teapots
  Proprietary supported commercial systems
    e.g. Excel, IRIS Explorer, Spotfire
  Open source, freely available software
    e.g. OpenDX, InfoVis

NAG’s IRIS Explorer…
  General purpose toolkit for data visualization
  Reusable building blocks (modules)
  Connect modules to build application
  Point-and-click development
    Visual programming approach
    Build, execute, reshape
  Add new modules, if required

…in action
  Application in map editor
  Modules in module librarian
    Reads data
    Colormaps it
    Makes ribbon
    Displays it

Make the connections

Add more modules...
  Adds axes

...and even more
  Adds caption

Some examples

Trendalyzer (Gapminder)

Worldmapper: area

Worldmapper: deaths by disease

Many eyes: shared visualization

NAG Data Mining Tools
  Data Cleaning
    Data imputation - adding missing values
    Outlier detection - finding suspect data records
  Data Transformation
    Scaling Data - before distance computation
    Principal Component Analysis - reducing # of variables
  Model fitting
    Cluster analysis - finding interesting groups
    Classification techniques - # of groups is known
    Regression - no groups, outcome is continuous
      Linear / Non-linear / Time series

Example: exploratory data analysis
  How many species of water vole (Arvicola) in UK?
  Measurement data
    Presence / absence of 13 skull characteristics
    300 observations, each in one of 14 regions
    3 groups: A. terrestris / A. sapidus / unclassified UK cases
  Treatment
    Average data within each region
    Gives 14 data points in 13 dimensions
  How to display dataset?

2D scatterplots

Analysis
  2D scatterplots?
    Structure is unclear
    (13 x 12) / 2 = 78 plots needed
  Principal components analysis?
    2 PCs explain 49% of the variance
    3 PCs explain 65% of the variance
    Should be > 85% for confident representation
    Fisher’s iris dataset (4 variables) is 95%
  Alternative technique
    Metric scaling

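The two checks on the Analysis slide (how many pairwise scatterplots are needed, and how much variance the leading principal components explain) can be sketched as below. This is a minimal illustration, not the NAG routines used in the talk: the water vole measurements are not reproduced here, so the 14 x 13 matrix `X` is random placeholder data, and the function names are hypothetical. The 85% threshold is the one quoted on the slide.

```python
import numpy as np

def n_pairwise_plots(p):
    """Number of distinct 2D scatterplots for p variables: p(p-1)/2."""
    return p * (p - 1) // 2

def cumulative_explained_variance(X):
    """Cumulative fraction of total variance explained by successive PCs."""
    Xc = X - X.mean(axis=0)                    # centre each variable
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values of centred data
    var = s ** 2                               # proportional to the PC variances
    return np.cumsum(var) / var.sum()

# Placeholder data only (assumption): 14 regions x 13 variables, NOT the vole data.
rng = np.random.default_rng(0)
X = rng.random((14, 13))

print(n_pairwise_plots(13))                    # 78, as on the slide

cum = cumulative_explained_variance(X)
print("2 PCs explain:", cum[1])
print("3 PCs explain:", cum[2])

# Smallest number of PCs whose cumulative share reaches the 85% threshold.
needed = int(np.searchsorted(cum, 0.85) + 1)
print("PCs needed for 85%:", needed)
```

If `needed` is well above 3, as the slide reports for the vole data, a 3D PCA display discards too much variance to be trusted, which is the motivation for trying metric scaling instead.
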
Metric scaling
  14 data points – one for each region
    Each point has values for 13 variables
  Construct 14 by 14 dissimilarity matrix, Δ
    Δij = distance between points i & j in 13D space
    Δ is symmetric, with zero diagonal elements
  Want to find a new matrix, Δ*
    set of 14 new data points in 3D space that preserve Δ
  Project Δ to Δ* using metric scaling
  Display data points in 3D

Exploratory data analysis conclusions
  2D scatterplots don’t indicate group structure
    cf. iris dataset
  3D PCA unreliable here
  Metric scaling of Δ used to reduce D from 13 to 3
  3D visualization reveals group structure
    Distinct A. sapidus group
    UK sample represents only A. terrestris

The ADVISE project
  DTI-funded research project, started March 2007
    NAG / VSN / University of Leeds
  Merge visualization & statistics (visual analytics)
    use statistics to identify key characteristics of dataset
    understand the characteristics through visualization
  User community
    pharmaceuticals
    environmental science
    engineering
  Initial user meeting held September 2007

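As a concrete instance of the "statistics first, then visualization" pattern, the metric scaling step from the water vole example can be sketched as follows. The slides do not say which variant of metric scaling was used, so this assumes classical metric scaling (principal coordinates analysis) on Euclidean distances. The data are random placeholders rather than the real measurements, and both function names are hypothetical.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance matrix between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_metric_scaling(delta, k=3):
    """Embed n points in k dimensions from an n x n dissimilarity matrix."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (delta ** 2) @ J            # double-centred squared distances
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]              # keep the k largest
    w_k = np.clip(w[idx], 0.0, None)           # guard against tiny negatives
    return V[:, idx] * np.sqrt(w_k)            # n x k coordinates

# Placeholder data only (assumption): 14 regions x 13 variables, NOT the vole data.
rng = np.random.default_rng(0)
X = rng.random((14, 13))

delta = pairwise_distances(X)                  # 14 x 14, symmetric, zero diagonal
Y = classical_metric_scaling(delta, k=3)       # 14 points in 3D, ready to plot
print(Y.shape)                                 # (14, 3)
```

The rows of `Y` are the 3D coordinates that would be handed to a visualization module. With Euclidean input, keeping all 13 dimensions reproduces Δ exactly, so the quality of the 3D display depends on how much of the structure the three largest eigenvalues capture.
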
Large datasets
  Size matters (but isn’t everything)
  Developer’s view: Too large for our current system
    Problems of performance, robustness
  User’s view: Too large for me to understand
  Current ADVISE datasets are “only” a few GB
    complications (e.g. comparing several) could raise this
    HECToR users have TB datasets

ADVISE ideas
  Retention of visual programming interface
  Re-use of algorithmic base
    IRIS Explorer modules
    GenStat statistics functionality (from VSN)
  Three layered architecture
    User interface
    Web service middleware
    Visualization components
  Distribution, tailored user interface, collaboration

ADVISE progress
  Porting IE modules to standalone environment
    some of these use GenStat for statistics
  New system used to revisit air quality demo
    early (IEEE Viz 96) web-based visualization
    new system more efficient
  Working with real user data

Conclusions
  NAG offers software components for developers
    no wheel-reinvention, stone canoes, chocolate teapots
  Visualization & data mining crucial for analysis
    distribution, steering, classification, exploration
    interactivity / interrogation important
    integration is an ongoing field of activity
  ADVISE project
    developing a new system for visual analysis
    working with real user problems
    improving understanding of data