Data Mining - Indiana University Bloomington

CSCI-B565
Data Mining
Course Objective
The objective of this course is to study algorithmic and practical aspects of discovering patterns and relationships in
large databases. The course introduces the basic concepts of data mining and provides hands-on experience in data
analysis, clustering, and prediction. Data mining is a dynamic field with wide applications in areas such as finance,
the life sciences, the social sciences, and medicine.
Textbooks
Required
Data Mining: Concepts and Techniques - by J. Han et al., Morgan Kaufmann 2006. (ISBN: 978-1-55860-901-3; $54.47)
Recommended
Introduction to Data Mining - by P.-N. Tan et al., Pearson 2006. (ISBN: 0-321-32136-7; $87.99)
Topics
• basic concepts
  o introduction to data mining, origins of data mining, data mining tasks
  o relational databases, transactional databases, data warehouses
• data
  o types of data, data quality, similarity metrics, summary statistics
  o data preprocessing: cleaning, normalization, reduction, transformation, integration
• data warehouse and OLAP technology for data mining
  o multidimensional data model and OLAP operations
  o warehouse architecture, implementations and relationship with data mining
• association rule mining
  o basic concepts: frequent itemset generation, rule generation, apriori and FP-growth algorithms
  o advanced concepts: graph data, sequential patterns, infrequent patterns, concept hierarchies
• classification and regression algorithms
  o Bayesian classification, k-nearest neighbor, neural networks, classification and regression trees, support vector machines, ensemble methods
  o handling biased data, and class-imbalanced data
• clustering
  o partitioning methods (k-means, k-medoids) and hierarchical methods (agglomerative/divisive clustering)
  o density-based, graph-based, prototype-based, and model-based clustering
  o clustering with constraints
• anomaly detection
  o statistical approaches to outlier detection
  o density-based, proximity-based, clustering-based techniques
• mining complex types of data
  o mining spatial, text, time-series and multimedia data
  o mining web data, mining graphs
  o mining streaming data
• human factors and social issues
  o ethics of data mining and social impacts
  o privacy-preserving data mining
  o user interfaces, data and result visualization
Grading
Midterm exam: 25%
Final exam/project: 25%
Homework assignments: 40%
Class participation/activity: 10%
School of Informatics and Computing, Indiana University