New Directions in Analysis and Visualization [Visual Analytics]

Dr Jeremy Walton
NAG Ltd, Oxford
[email protected]

Results Matter. Trust NAG
1 July, 2008 Research Methods Festival, St Catherine's College, Oxford

Overview
- Introduction
  - NAG, HECToR
- Visualization
  - distribution, collaboration, steering
- Data mining
  - classification, exploratory analysis
- The ADVISE project
  - large data, interactive analysis

NAG profile
- Products
  - Mathematical, statistical, data analysis components
  - 3D visualization, compilers & tools
  - HPC software engineering services
  - HECToR support
- Users
  - Academic researchers
  - Professional developers
  - Analysts / modelers
- Founded 1976
- Not-for-profit company

High-End Computing Terascale Resource
- Latest high-end computing service for UK
  - funded by EPSRC, NERC & BBSRC
  - will run from 2007-2013
- Partners:
  - Hardware: Cray Inc
  - Service Provision: University of Edinburgh HPCx Ltd
    - hardware hosting, user services, help desk
  - CSE Support: NAG Ltd
    - technical assessment of project application
    - porting / tuning / optimisation of user codes
    - training courses (inc. visualization)
    - best practice guides, documentation, FAQs

Visualization toolkits
- Help construct visualization applications
  - no wheel-reinvention, stone canoes, chocolate teapots
- Proprietary supported commercial systems
  - e.g. Excel, IRIS Explorer, Spotfire
- Open source, freely available software
  - e.g. OpenDX, InfoVis

NAG’s IRIS Explorer…
- General purpose toolkit for data visualization
  - Reusable building blocks (modules)
  - Connect modules to build application
- Point-and-click development
  - Visual programming approach
  - Build, execute, reshape
- Add new modules, if required

…in action
- Application in map editor
- Modules in module librarian
- Reads data
- Colormaps it
- Makes ribbon
- Displays it

Make the connections

Add more modules...
- Adds axes

...and even more
- Adds caption

Some examples
- Trendalyzer (Gapminder)
- Worldmapper: area
- Worldmapper: deaths by disease
- Many eyes: shared visualization

NAG Data Mining Tools
- Data Cleaning
  - Data imputation - adding missing values
  - Outlier detection - finding suspect data records
- Data Transformation
  - Scaling Data - before distance computation
  - Principal Component Analysis - reducing # of variables
- Model fitting
  - Cluster analysis - finding interesting groups
  - Classification techniques - # of groups is known
  - Regression - no groups, outcome is continuous
    - Linear / Non-linear / Time series

Example: exploratory data analysis
- How many species of water vole (Arvicola) in UK?
- Measurement data
  - Presence / absence of 13 skull characteristics
  - 300 observations, each in one of 14 regions
  - 3 groups:
    - A. terrestris / A. sapidus / unclassified UK cases
- Treatment
  - Average data within each region
  - Gives 14 data points in 13 dimensions
- How to display dataset?

2D scatterplots

Analysis
- 2D scatterplots?
  - Structure is unclear
  - (13 x 12) / 2 = 78 plots needed
- Principal components analysis?
  - 2 PCs explain 49% of the variance
  - 3 PCs explain 65% of the variance
  - Should be > 85% for confident representation
  - Fisher’s iris dataset (4 variables) is 95%
- Alternative technique
  - Metric scaling
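
The three steps of the analysis above (counting the pairwise scatterplots, checking how much variance the leading principal components explain, and falling back on metric scaling) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the NAG routines used in the talk: the data are synthetic random numbers standing in for the 14 regional averages of 13 skull characteristics, the dissimilarity is assumed to be Euclidean distance, metric scaling is taken to mean classical (Torgerson) scaling, and the function names are hypothetical.

```python
import numpy as np


def n_pairwise_plots(n_vars: int) -> int:
    """Number of distinct 2D scatterplots for n_vars variables."""
    return n_vars * (n_vars - 1) // 2


def pca_cumulative_variance(X: np.ndarray) -> np.ndarray:
    """Cumulative fraction of variance explained by successive PCs."""
    Xc = X - X.mean(axis=0)                   # centre each variable
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values of centred data
    var = s ** 2                              # proportional to the PC variances
    return np.cumsum(var) / var.sum()


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean dissimilarity matrix (symmetric, zero diagonal)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D: np.ndarray, k: int = 3) -> np.ndarray:
    """Classical metric scaling of a dissimilarity matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared distances
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest
    w_k = np.clip(w[idx], 0.0, None)          # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_k)           # new coordinates, shape (n, k)


# Synthetic placeholder data: 14 "regions" x 13 "characteristics" (assumption).
rng = np.random.default_rng(0)
X = rng.random((14, 13))

D = pairwise_distances(X)
cum = pca_cumulative_variance(X)
Y = classical_mds(D, k=3)

print(n_pairwise_plots(13))                   # 78, as on the slide
print("variance explained by 2 and 3 PCs:", cum[1], cum[2])
print(Y.shape)                                # (14, 3): points ready for a 3D plot
```

With Euclidean distances, classical scaling recovers the same configuration as PCA, so the sketch only shows the mechanics; a real analysis of presence / absence data would presumably start from a dissimilarity measure better suited to that kind of data.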

Metric scaling
- 14 data points – one for each region
  - Each point has values for 13 variables
- Construct 14 by 14 dissimilarity matrix, Δ
  - Δij = distance between points i & j in 13D space
  - Δ is symmetric, with zero diagonal elements
- Want to find a new matrix, Δ*
  - set of 14 new data points in 3D space that preserve Δ
- Project Δ to Δ* using metric scaling
  - Display data points in 3D

Exploratory data analysis conclusions
- 2D scatterplots don’t indicate group structure
  - cf. iris dataset
- 3D PCA unreliable here
- Metric scaling of Δ used to reduce D from 13 to 3
- 3D visualization reveals group structure
  - Distinct A. sapidus group
  - UK sample represents only A. terrestris

The ADVISE project
- DTI-funded research project, started March 2007
  - NAG / VSN / University of Leeds
- Merge visualization & statistics (visual analytics)
  - use statistics to identify key characteristics of dataset
  - understand the characteristics through visualization
- User community
  - pharmaceuticals
  - environmental science
  - engineering
- Initial user meeting held September 2007

Large datasets
- Size matters (but isn’t everything)
- Developer’s view: Too large for our current system
  - Problems of
    - performance
    - robustness
- User’s view: Too large for me to understand
- Current ADVISE datasets are “only” a few GB
  - complications (e.g. comparing several) could raise this
- HECToR users have TB datasets

ADVISE ideas
- Retention of visual programming interface
- Re-use of algorithmic base
  - IRIS Explorer modules
  - GenStat statistics functionality (from VSN)
- Three layered architecture
  - User interface
  - Web service middleware
  - Visualization components
- Distribution, tailored user interface, collaboration

ADVISE progress
- Porting IE modules to standalone environment
  - some of these use GenStat for statistics
- New system used to revisit air quality demo
  - early (IEEE Viz 96) web-based visualization
  - new system more efficient
- Working with real user data

Conclusions
- NAG offers software components for developers
  - no wheel-reinvention, stone canoes, chocolate teapots
- Visualization & data mining crucial for analysis
  - distribution, steering, classification, exploration
  - interactivity / interrogation important
  - integration is an ongoing field of activity
- ADVISE project
  - developing a new system for visual analysis
  - working with real user problems
  - improving understanding of data