Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CS4031/CS5012 Data Mining and Visualization Yaji Sripada Dept. of Computing Science, University of Aberdeen 1 Time table • Lectures – 2 lectures • 11:00 -12:00 Tuesdays in Taylor A21 • 10:00 – 11:00 Wednesdays in Meston 311 • Practicals – 1 two hour practical on Mondays in Meston 311 (Room changed since the last lecture) • 15:00-17:00 Dept. of Computing Science, University of Aberdeen 2 Assessment • Course is worth 15 credits • Two components – 25% continuous assessment – 75% end of term exam • Continuous assessment – Issued in Week 6/7 – Due on the Friday of Week 10/11 Dept. of Computing Science, University of Aberdeen 3 Course Organization • Lectures – Discuss topics including • Practicals – 10 weeks • Mostly using some existing software • Developing your own software (PHP) • Data Mining – General Data – Time series – Spatial data • Information Visualization (InfoVis) • Case Studies – 8th week assessment Dept. of Computing Science, University of Aberdeen 4 Reading • Mostly lecture notes and some research papers • I will provide all the required reading material Dept. of Computing Science, University of Aberdeen 5 Introduction Dept. of Computing Science, University of Aberdeen 6 Overgrowth of Data • Humans accumulate large volumes of data in many domains – Business • Transactional data – Scientific • Complete sequence data from Human Genome Project – of 3 billion DNA units – Engineering • 100s of sensors on a gas turbine taking measurements every second – And many more? Dept. of Computing Science, University of Aberdeen 7 Information Hidden in Data • Data are raw facts • Humans routinely ‘dig’ useful abstractions from raw data – An example abstraction ‘mined’ from past exam results – No coursework submitted => will fail the exam as well • For small data sets (a few hundred bytes) – Simple and manual data analysis OK (Even preferred!!!) – Statistics • For large data sets (a few Gigabytes or more) – Manual analysis is impossible – Computer Assistance needed Dept. of Computing Science, University of Aberdeen 8 Two views of computer assistance • Data Mining View – Machines can automatically (or semiautomatically) extract meaningful and useful information from heaps of raw data • Information Visualization (InfoVis) View – Humans themselves can make sense of data if data are presented visually • We learn both these views in this course Dept. of Computing Science, University of Aberdeen 9 Data Mining • Process of automatically (or semiautomatically) discovering useful, novel and meaningful patterns from substantial quantities of data. • Tasks – Classification – Clustering – Association Rule Mining Dept. of Computing Science, University of Aberdeen 10 Typical Applications • Customer Relationship Management (CRM) • Linking gene variations among individuals to common illnesses (e.g. Cancer) • Identifying abnormal conditions in an operational gas turbine • More ? Dept. of Computing Science, University of Aberdeen 11 Information Visualization • Process of representing data in such a way (usually involves visual presentations) that enable users to gain useful insights into the data • Focus is on designing a data representation scheme that makes in underlying ‘information’ visible to the user • For rendering the representation scheme – Computer graphics technology is exploited • Good InfoVis techniques are based on – Good understanding of the information structures underlying the data – Good understanding of the human perception and cognition – Good graphics Dept. of Computing Science, University of Aberdeen 12 Typical Applications • • • • Newsmap Touchgraph ManyEyes CountryScape Dept. of Computing Science, University of Aberdeen 13 SVG • Scalable Vector Graphics – An XML-based Web Language to textually specify vector graphics – E.g. SVG specification of a circle <svg width="300" height="200" xmlns="http://www.w3.org/2000/svg"> <circle cx="125" cy="110" r="20" fill="red" /> </svg> • W3C Recommendation • Browser support for SVG content – Firefox provides built in support – IE needs an Adobe Plug-in • SVG content can be created – Using text editors (static) – Programmatically (dynamic) Dept. of Computing Science, University of Aberdeen 14 SVG in an HTML page • Three methods • Using the <embed> tag <embed src=“circle.svg" width="300" height="100" type="image/svg+xml" pluginspage="http://www.adobe.com/svg/viewer/install/" /> • Using the <object> tag – <object data="rect.svg" width="300" height="100" type="image/svg+xml" codebase="http://www.adobe.com/svg/viewer/install/" /> • Using the <iframe> tag – <iframe src="rect.svg" width="300" height="100"> </iframe> Dept. of Computing Science, University of Aberdeen 15 Learning SVG • In this course focus is on designing effective visualizations – We assume knowledge of SVG for rendering the visualizations • Fundamentals of SVG covered in lectures • Some practicals involve SVG creation – No practicals exclusively for learning SVG Dept. of Computing Science, University of Aberdeen 16 Detailed Course Plan Lectures Data Exploratory Data Analysis (EDA) Practicals Exploratory Data Analysis (EDA) Classification – I Classification – II Clustering – I Classification Clustering Clustering – II InfoVis – I InfoVis – II HCE Tree Maps InfoVis – III Association rule mining Association Rule Mining Time Series – I Time Series – II Time Series – III Assignment Time Searcher Sequence Mining Spatial Data GIS Time Series Representations GIS Spatial Data Mining Issues with Data mining and InfoVis Exploring spatio-temporal Data Users Dept. of Computing Science, University of Aberdeen 17 Summary • All modern organizations – possess large volumes of data and – Users want to understand these data • You learn technologies to – Extract and/Or present information from large data sets • Analytical methods • Visualization methods Dept. of Computing Science, University of Aberdeen 18