Data analysis techniques and tools
PROFESSOR FRANCESCO CIVARDI
COURSE AIMS
“To avoid the danger of "drowning in information, but starving for knowledge" the
branch of research known as data analysis has emerged, and a considerable number
of methods and software tools have been developed. However, it is not these tools
alone but the intelligent application of human intuition in combination with
computational power, of sound background knowledge with computer-aided
modeling, and of critical reflection with convenient automatic model construction,
that results in successful intelligent data analysis projects.” (Berthold et al., 2010).
The aim of the course is to give students a good command of the concepts that allow
them to apply data analysis techniques, data warehousing, OLAP, data mining and
machine learning algorithms to several application areas.
These concepts derive from the synergy between various subjects: artificial
intelligence, statistics, Bayesian methods, information theory, control theory,
computational complexity theory, neurophysiology, research into databases and
information retrieval techniques. Last, but not least, they draw on the new field of
Network Science (social, biological, etc.).
The areas of application include medical diagnosis, banking customer credit risk
analysis, supermarket and e-commerce customer purchase behaviour analysis,
industrial process optimization, fraud detection and the prediction of terrorist
attacks.
COURSE CONTENT
– Introduction to business intelligence, OLAP and data mining
– Concepts of data warehousing
– Multi-dimensional analysis. Dimensional modelling
– Relational and multidimensional databases
– Overview of SQL
– Introduction to MDX language
– Data mining topics: classification, prediction, clustering and association
– Decision trees. Entropy and information gain
– Overview of probability theory. Bayes' theorem
– Naive Bayes classifier. Bayesian networks
– Linear and multiple regression. Logistic regression
– Neural networks
– Support vector machines
– Validation and model comparison
– Cluster analysis: EM, k-means clustering and hierarchical algorithms
– Association analysis
– Introduction to Network Science
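As a small illustration of one topic listed above (entropy and information gain in decision trees), the following is a minimal sketch in Python. It is not taken from the course materials: the function names, the toy customer records and the attribute names are all hypothetical, chosen only to show how the gain of a candidate split can be computed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four customers and whether they bought a product.
rows = [
    {"student": "yes", "income": "high"},
    {"student": "yes", "income": "low"},
    {"student": "no", "income": "high"},
    {"student": "no", "income": "low"},
]
labels = ["buy", "buy", "skip", "skip"]

for attribute in ("student", "income"):
    print(attribute, information_gain(rows, labels, attribute))
```

In this toy data set the "student" attribute separates the two classes perfectly, while "income" leaves each group as mixed as the whole set, so a decision tree learner that splits on the highest information gain would choose "student" first.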
READING LIST
- Slides and lesson notes
- Websites and papers announced during the lectures
- M.R. BERTHOLD, C. BORGELT, F. HÖPPNER, F. KLAWONN, Guide to Intelligent Data Analysis, Springer, 2010.
- C. VERCELLIS, Business Intelligence - Modelli matematici e sistemi per le decisioni, McGraw-Hill, 2006.
Reference texts:
- R. KIMBALL, Data Warehouse: La guida completa, Hoepli, 2002.
- I.H. WITTEN, E. FRANK, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.
- J. HAN, M. KAMBER, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
TEACHING METHOD
Lectures and computer projects using free software (KNIME, Gephi, NodeXL).
ASSESSMENT METHOD
The assessment is based on active participation in the course and a final project, with
presentation and discussion of the results.
NOTES
Further information can be found on the lecturer's webpage at http://docenti.unicatt.it or
on the Faculty notice board.