Why Intelligent Data Analysis?
Joost N. Kok
Leiden Institute of Advanced Computer Science
Universiteit Leiden

Overview
• Data Analysis
• Data Mining
• Applications
• Outlook

Data Analysis

Data Mining
• "Data Mining is one of the five key note technologies that will have a major impact across a wide range of industries within the next three to five years" (Gartner)
• "Data Mining is one of the top ten new technologies in which companies will invest during the next five years" (Gartner)
• "Data Mining is an overhyped concept" (OTR)

Data Analysis
• Data analysis = Processing data
• Exploratory vs. Confirmatory
  – are there interesting structures?
  – can we predict the value?
• Descriptive vs. Inferential
  – statement about data set
  – draw more general conclusions
• Data analysis = process of computing various summaries and derived values from the given collection of data

Tools
• Cookbook fallacy: Data analysis = picking and applying the right tool.
  – Tools are not independent.
  – Matching is an iterative process (which needs intelligence).

Stat vs. ML
• Statistics
  – Mathematics
• Machine Learning
  – Experimental Computer Science
• "Statistics is difficult"
• "Algorithms are not exact"

Models
• Models vs. Algorithms
• Empirical vs. Mechanistic Models
• Understanding vs. Prediction
• Models vs. Patterns
• Overfitting
• Constraints

Algorithms
• Enabling data analysis
• Too many: often no foundations, no applications
• In practice only a restricted set of algorithms is used

The nature of data
• Different kinds of data
  – Numerical Data
  – Text
  – Images
  – Sound
• Raw data has
  – missing values
  – distortions
  – misrecording
  – inadequate sampling
  – etc.
The nature of data
• Data sets can be large
  – horizontal
  – vertical
• Curse of dimensionality
• Experiments
• Sampling

The nature of data
• Too little
  – Example: storm situations
• Too much
  – Example: image segmentation
• Static vs. dynamic
• Off-line vs. On-line
• Infoglut
• What is collected?

Overview
• Statistical methods and concepts
• Bayesian methods
• Time series
• Rule induction
• Neural networks
• Fuzzy logic
• Stochastic search methods
• Applications

Overview
  Why Intelligent Data Analysis
  Fundamental Concepts of Statistics
  Intelligent Data Analysis: Issues and Challenges
  Artificial Neural Networks
  Fuzzy Logic
  Industrial Applications of NeuroFuzzy Networks
  Statistical Methods for Data Analysis
  Time Series Analysis

Overview
  Chaos and Reality
  Bayesian Networks
  ANN Visualization Tools
  Rule Induction
  Evolutionary Systems
  Data Analysis in Real-World Applications

Enrichment
• Data Fusion
  – combine data sets
• Example:
  – customer database
  – survey information

Data Mining
• Database technology
• Data visualization
• Data warehouse vs. Operational database
  – time-dependent
  – non-volatile
  – subject-oriented
  – integrated
• Target: decision making

Data Mining

Data Mining
• Selection
• Cleaning
• Enrichment
• Coding
• Data Mining
• Reporting

Cleaning
• Remove duplicates
• Check domain consistency
• Remove data
• Project data
• Combine data in one table

Coding
• Address - Region
• Date of birth - Age
• Scaling of numerical data
• Date - Number of months
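The Cleaning, Enrichment and Coding slides above list the steps without showing them. The following is only a minimal sketch of what those steps might look like, using the Python standard library. All field names, records, the postcode-to-region lookup, the date-of-birth range and the reference date are invented for illustration; none of them come from the slides.

```python
from datetime import date

# Hypothetical customer records (all values invented for illustration).
customers = [
    {"id": 1, "postcode": "2333", "born": date(1970, 5, 17),
     "income": 30000.0, "joined": date(1998, 1, 10)},
    {"id": 1, "postcode": "2333", "born": date(1970, 5, 17),
     "income": 30000.0, "joined": date(1998, 1, 10)},   # duplicate
    {"id": 2, "postcode": "1012", "born": date(1985, 11, 2),
     "income": 50000.0, "joined": date(1999, 6, 1)},
    {"id": 3, "postcode": "2333", "born": date(2090, 1, 1),
     "income": 40000.0, "joined": date(1999, 3, 1)},    # impossible birth date
]
# Hypothetical survey answers keyed by customer id (the "enrichment" source).
surveys = {1: {"owns_car": True}, 2: {"owns_car": False}}
# Assumed lookup table for the "Address - Region" coding step.
REGION_BY_POSTCODE = {"2333": "Leiden", "1012": "Amsterdam"}


def clean(records, today):
    """Cleaning: drop duplicate ids and records that fail a domain check."""
    seen, kept = set(), []
    for r in records:
        if r["id"] in seen:
            continue  # remove duplicates
        if not (date(1900, 1, 1) <= r["born"] <= today):
            continue  # domain consistency: birth date must be plausible
        seen.add(r["id"])
        kept.append(r)
    return kept


def enrich(records, extra):
    """Enrichment (data fusion): merge survey answers into matching customers."""
    return [{**r, **extra.get(r["id"], {})} for r in records]


def age_in_years(born, today):
    """Date of birth - Age: completed years on the reference date."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def months_between(start, end):
    """Date - Number of months: whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def code(records, today):
    """Coding: derive region, age, scaled income and months as a customer."""
    incomes = [r["income"] for r in records]
    lo, hi = min(incomes), max(incomes)
    span = (hi - lo) or 1.0  # avoid division by zero when all incomes are equal
    return [
        {
            "id": r["id"],
            "region": REGION_BY_POSTCODE.get(r["postcode"], "unknown"),
            "age": age_in_years(r["born"], today),
            "income_scaled": (r["income"] - lo) / span,  # min-max scaling to [0, 1]
            "months_customer": months_between(r["joined"], today),
            "owns_car": r.get("owns_car"),
        }
        for r in records
    ]


today = date(2000, 1, 1)  # arbitrary reference date for the example
table = code(enrich(clean(customers, today), surveys), today)
```

Min-max scaling is just one possible reading of "scaling of numerical data"; standardising to zero mean and unit variance would fit the slide equally well.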
Data Mining
• SQL queries
• Clustering
• Pattern Recognition
[Figure: diagram with labels ML, ES, KDD, DB, Visual, Statistics]

Nearest Neighbor
• Search k nearest points

Oil Search
• Shell research
• South-East Asia
  coring measurements
  kinds of stone

Applications

Outlook

Outlook
• Positive
  – Moore's Law
  – New kinds of computers
  – Data collection
  – More data is more easily reachable
• Negative
  – Collective memory gets lost
  – Infoglut
• Data battle

Outlook
• Merge of Machine Learning and Statistics
• Algorithms
  – Adaptive parameters
  – Black Box data mining
• From suites to tailored tools
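The Nearest Neighbor slide earlier in this part says only "search k nearest points". As a minimal sketch of that idea, the code below finds the k closest labelled points and takes a majority vote. Euclidean distance, the majority vote, the function names and the toy measurements with rock-type labels are all assumptions made for illustration; the slides do not specify any of them.

```python
import math
from collections import Counter


def k_nearest(points, query, k):
    """Return the k (coordinates, label) pairs closest to query (Euclidean distance)."""
    return sorted(points, key=lambda pl: math.dist(pl[0], query))[:k]


def knn_predict(points, query, k=3):
    """Predict a label for query by majority vote among its k nearest points."""
    neighbours = k_nearest(points, query, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Invented toy data: two measurements per sample plus a rock-type label.
samples = [
    ((1.0, 1.0), "sandstone"),
    ((1.2, 0.8), "sandstone"),
    ((0.9, 1.1), "sandstone"),
    ((5.0, 5.0), "shale"),
    ((5.2, 4.9), "shale"),
]

prediction = knn_predict(samples, (1.1, 1.0), k=3)
```

Sorting every point is the simplest way to do the search and is fine for small data sets. For large ones a spatial index such as a k-d tree is the usual replacement, although its advantage shrinks as the number of dimensions grows.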
Intelligent Data Analysis
• Intelligent Data Analysis
  – User Interaction
  – also uses tools from Machine Learning

NetTalk
[Figure: NetTalk. Labels: Sound generator; Speech-synthesis expert system; NetTalk neural network; "INTELLI"]