Dr. Eick COSC 6342 2013 Midterm Exam Review List

The midterm exam has been scheduled for Wed., March 20, 2:30-2:50p in our classroom. The exam is open book and notes; you are not allowed to use any computers!

Relevant material: All transparencies covered in the lectures (except the transparencies associated with Topic 4 and the DENCLUE transparencies of Topic 9) and the following pages of the textbook (second edition): 1-14, 21-28, 30-42, 47-55, 61-73, 76-84, 87-93, 108-120, 125-128, 143-154, 163-172, 174-181 (except line smoother), 186-197, the DBSCAN paper, and some Wikipedia pages referenced by transparencies. Moreover, I recommend reading the descriptions of K-means, EM, and kNN in the "Top 10 data mining algorithms" article posted on the webpage.

Checklist:
* What is machine learning? Hypothesis class, VC-dimension, basic regression, overfitting, underfitting, training set, test set, cross-validation, model complexity, triple trade-off (Dietterich), performance measure/loss function.
* Bayes' theorem, naive Bayesian approach, losses and risks, deriving optimal decision rules for a given cost/risk function.
* Maximum likelihood estimation, variance and bias, noise, Bayes' estimator and MAP, parametric classification, model selection procedures, multivariate Gaussian, covariance matrix, Mahalanobis distance, PCA (goals and objectives, what it actually does, what it is used for, how many principal components to choose), multidimensional scaling (only what it is used for and how it differs from PCA).
* K-means (prototype-based/representative-based clustering, how the algorithm works, optimization procedure, algorithm properties), EM (assumptions of the algorithm, mixture of Gaussians, how it works, how cluster membership is estimated, how the model is updated from cluster membership, relationship to K-means), DBSCAN (how it forms clusters, input parameters, how it differs from K-means/EM).
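As a study aid for the K-means item above, here is a minimal sketch of the two alternating steps (assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster). It uses pure Python on made-up one-dimensional data; the points, starting centroids, and iteration count are illustrative choices, not from the course material.

```python
def kmeans(points, centroids, iterations=10):
    """Toy 1-D K-means: alternate assignment and update steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans(points, centroids=[0.0, 6.0])
print(centroids)  # two centroids, near 1.0 and 5.0
```

Note how this makes the optimization procedure visible: each step can only lower the sum of within-cluster distances, which is why the algorithm converges (though possibly to a local optimum).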
* Non-parametric density estimation (histogram, naive estimator, Gaussian kernel estimator), non-parametric regression (running mean and kernel smoother, how it differs from regular regression), k-nearest-neighbor classification (transparencies only).
* Decision trees (how they are generated from data sets, how they classify data, pruning, properties of decision tree classifiers), regression trees (what they are used for, how they are generated from a dataset).

Transparencies and Other Teaching Material
Course Organization, ML Spring 2011
Topic 1: Introduction to Machine Learning (Eick/Alpaydin Introduction; Tom Mitchell's Introduction to ML, only slides 1-8 and 15-16 will be used)
Topic 2: Supervised Learning (examples of classification techniques: Decision Trees, kNN)
Topic 3: Bayesian Decision Theory (excluding Belief Networks) and Naive Bayes (Eick on Naive Bayes)
Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML (read Bishop Chapter 1 in conjunction with this material; not covered in 2011)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA (PCA Tutorial, Arindam Banerjee's More Formal Discussion of the Objectives of Dimensionality Reduction)
Topic 7: Clustering 1: Mixture Models, K-Means and EM (Introduction to Clustering, Modified Alpaydin transparencies, Top 10 Data Mining Algorithms paper)
Topic 8: Non-Parametric Methods Centering on kNN and Density Estimation (kNN, Non-Parametric Density Estimation, Summary Non-Parametric Density Estimation, Editing and Condensing Techniques to Enhance kNN, Toussaint's survey paper on editing, condensing and proximity graphs)
Topic 9: Clustering 2: Density-based Clustering (DBSCAN paper, DENCLUE2 paper)
Topic 10: Decision Trees 2
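For the k-nearest-neighbor classification item on the checklist, here is a minimal sketch: find the k training points closest to the query (Euclidean distance) and take a majority vote over their labels. The toy points and labels are invented for illustration and are not from the course transparencies.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Toy 2-D kNN: train is a list of ((x, y), label) pairs."""
    # Sort training points by squared Euclidean distance to the query
    # (the square root is monotone, so it can be skipped for ranking).
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2
                         + (item[0][1] - query[1]) ** 2,
    )
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(train, (0.5, 0.5)))  # "a": all 3 nearest neighbors are "a"
```

The sketch also shows why kNN is called non-parametric: there is no training phase and no fitted model, only the stored examples, which is what makes editing and condensing techniques (Topic 8) relevant.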