Data Mining and Visualization
Jeremy Walton
NAG Ltd, Oxford
24 October 2002

Tools
  NAG Data Mining Components
    Statistical & machine learning routines (written in C)
    Data cleaning
    Data transformation
    Model building: classification, clustering, prediction
  IRIS Explorer
    Modular visualization environment
    Visual programming interface
    Extensible - users can add new modules

Example
  Image from Landsat Multi-Spectral Scanner
    36 independent variables per region (9 pixels x 4 bands)
    Each region = 3 by 3 array of pixels
    4 spectral bands per pixel
    Each pixel = 80 m x 80 m
  Each pixel is in one of six classes
    types of land use
  Want to extrapolate for class values elsewhere

Treatment (1)
  Use principal component analysis
    reduces 36 dimensions to 2
    explains ~85% of variance
  Choose three classes
    2 - cotton crop
    4 - damp grey soil
    5 - soil with vegetation stubble
  2 independent variables, 1 class variable
  1364 points

Treatment (2)
  Model data using a decision tree (Quinlan's C4.5)
    Each node splits data into two sets
    Aims to maximise separation of classes
    Splitting continues recursively
  Classify original data using decision tree
    each point is assigned to a class
    92% agreement with original classes
  Classify new data using decision tree
    e.g. points on a grid
    establish boundaries of classes

Visualization
  [figure]

Original class values
  [figure]

Predicted class values
  [figure]

Visualization
  [figure]

Lessons learnt / next steps
  Visual programming interface
    Easy to link components & IRIS Explorer
    building modules using C interface
  Extensible to other data sets?
  What about larger data sets?
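
Sketch: the Treatment (2) steps in code

The three steps of Treatment (2) (build a tree by recursive binary splits, re-classify the original points to measure agreement, then classify points on a grid to find class boundaries) can be illustrated with a small pure-Python sketch. This is not the NAG Data Mining Components interface and not a full C4.5: it assumes plain information gain as the separation measure (C4.5 uses gain ratio and also prunes the tree), and the nine (pc1, pc2) points are invented stand-ins for the 1364 PCA-reduced samples. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(points, labels):
    """Return (gain, feature, threshold) of the binary split with the
    highest information gain, or None if no split is possible."""
    base, n, best = entropy(labels), len(points), None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(points, labels, min_gain=1e-9):
    """Split recursively; a leaf is a class label, an inner node is the
    tuple (feature, threshold, left_subtree, right_subtree)."""
    split = best_split(points, labels) if len(set(labels)) > 1 else None
    if split is None or split[0] <= min_gain:
        return Counter(labels).most_common(1)[0][0]  # majority class
    _, f, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[f] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[f] > t]
    return (f, t,
            build_tree([p for p, _ in left], [y for _, y in left], min_gain),
            build_tree([p for p, _ in right], [y for _, y in right], min_gain))

def classify(tree, point):
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree

# Invented (pc1, pc2) points for classes 2, 4 and 5; NOT the Landsat data.
points = [(-2.0, 0.5), (-1.5, 1.0), (-1.8, -0.2),   # class 2
          (1.0, 1.5), (1.4, 2.0), (0.8, 1.2),       # class 4
          (1.2, -1.0), (0.9, -1.6), (1.6, -0.8)]    # class 5
labels = [2, 2, 2, 4, 4, 4, 5, 5, 5]

# Step 1: model the data with a tree.
tree = build_tree(points, labels)

# Step 2: classify the original points and measure agreement.
predicted = [classify(tree, p) for p in points]
agreement = sum(a == b for a, b in zip(predicted, labels)) / len(labels)

# Step 3: classify points on a regular grid to trace class boundaries.
grid = [(x / 2, y / 2) for y in range(-4, 5) for x in range(-4, 5)]
grid_classes = [classify(tree, g) for g in grid]
```

Because every split tests a single variable against a threshold, the boundaries found in step 3 are axis-aligned rectangles in the (pc1, pc2) plane. On real, overlapping data a fully grown tree of this kind would overfit, so agreement below 100% (such as the 92% reported above) is what one would expect from a pruned tree.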