Data Mining and Knowledge Discovery

CSC 478
Programming Data Mining Applications
Course Summary
Bamshad Mobasher
DePaul University
What we did
 Data Mining Overview
 The KDD Process
 Data Preprocessing and Understanding
 Using Python, Numpy, Pandas
 Using Scikit-learn modules
 Some emphasis on visualizing and understanding characteristics of the data
 Supervised Knowledge Discovery
 Classification
 Regression Analysis
 Techniques such as KNN, Ridge Regression, Decision Trees, and Bayesian classification
 Lots of emphasis on model evaluation
 Evaluation metrics
 Train-Test methodologies such as cross-validation
 Systematic parameter selection (e.g., grid search)
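
As an illustration of the evaluation points above, here is a minimal sketch of grid search with cross-validation in scikit-learn. The Iris dataset, the KNN classifier, the parameter grid, and the 80/20 split are all assumptions chosen for illustration; they are not taken from the course materials.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset (an assumption, not the course's data).
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during parameter selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical grid: 5 neighbor counts x 2 weighting schemes = 10 candidates.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

# Each candidate is scored by 5-fold cross-validation on the training set only.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_                # best setting found by CV
cv_accuracy = search.best_score_                 # mean CV accuracy of that setting
test_accuracy = search.score(X_test, y_test)     # accuracy on the held-out test set

print(best_params, round(cv_accuracy, 3), round(test_accuracy, 3))
```

Keeping the test set out of the search means the final score is an estimate on data the selected parameters have not seen.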
What we did
 Unsupervised Knowledge Discovery
 Cluster analysis
 Using PCA and SVD for dimensionality reduction, data characterization, and noise reduction
 Association rule discovery
 Emphasis on using unsupervised approaches as components of larger knowledge discovery efforts
 E.g., using PCA before clustering; using clustering as the basis for classification
 Real application domains
 Text Mining and document analysis/filtering
 Recommender systems
 Predictive modeling for marketing/business applications
 Image analysis
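
The idea of using PCA before clustering, mentioned above, can be sketched as follows. This is only a minimal example under stated assumptions: the Iris dataset, standardization, two principal components, and three K-means clusters are illustrative choices, not the course's own setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (an assumption); the class labels are ignored.
X, _ = load_iris(return_X_y=True)

# Standardize so that no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 4 original features to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced space; k=3 is an assumed number of clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

explained = pca.explained_variance_ratio_.sum()  # variance kept by 2 components
cluster_sizes = np.bincount(labels)              # number of points per cluster

print(round(explained, 3), cluster_sizes)
```

Checking the explained variance ratio before clustering is a quick way to see how much information the reduced representation retains.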
What we did not do (and you should learn later)
 Approaches for mining sequential/temporal data
 Markov models, time series analysis, sequential pattern mining
 More Ensemble and Hybrid Classifiers/Predictors
 Combining multiple classifiers
 Random Forest classifiers
 Other meta-learners such as AdaBoost
 Support Vector Machines and Kernel-Based Classifiers
 Topic modeling with Latent factor models
 LDA (Latent Dirichlet Allocation)