Final Review

7-Text Mining
• Unstructured Data
• Two modes of mining: analysis vs. retrieval
• Precision vs. Recall as metric
  – With lots of data you can find anything
• Tools for text mining
  – Stopwords, stemming
  – Term document matrix
  – TF-IDF
• Latent Semantic Indexing (LSI)
  – Uses PCA to find ‘concepts’ (topics)
  – Documents that share concepts will be close
• Probabilistic Models
  – Naïve Bayes vs. Multinomial
• LDA: Documents from Topics from Words

8-Web Mining
• Detecting robots
• Markov Models for Page prediction
• Ranking web pages
  – Flow model
  – Power iteration
  – Random walk and the stationary distribution
• Spider traps and how to get around them
• Adwords model for advertising cost-per-click

9-Advanced Classification
• Neural Networks
  – Neuron: inputs, linear combination, activation function, output
  – Architecture: layers, nodes per layer
  – Training through backpropagation
  – Good for complex problems like face detection, speech, video
• Support Vector Machines
  – Assume classes are separable
  – Plus/minus plane, margin, support vectors
  – Finds the maximum-margin separating classifier
  – If not separable, use “kernel trick”

10-Ensembles
• Ensemble Methods
  – Collections of ‘small’ models can fit something complex
  – Typically beats individual models
  – Model Averaging
  – Boosting: fit models with errors upweighted
  – Bagging: fit to bootstrapped versions of data
  – Random Forests: fit to trees with random variables at each split

11-Bayesian Methods
• Hierarchical Modelling with MCMC
  – No pooling vs. complete pooling vs. Bayesian solution
  – Priors tell how much you should depend on the data
  – Conjugate priors (e.g. beta/binomial) make life easy
  – MCMC for other cases
  – Metropolis-Hastings: sample from the posterior
  – Use trace plots to assess convergence

12-Recommender Systems
• Netflix Prize
  – We won!
• Recommender Systems
  – Evaluation via RMSE or DCG
  – Nearest Neighbors
  – SVD
  – Ensembles (of teams, of models) very powerful

13-Networks
• Nodes and edges
  – Node and edge centrality
  – Degrees and degree distribution
• Network Models
  – Erdős-Rényi
  – Preferential Attachment
  – Power Law graphs
  – Small world networks
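
Worked sketch for 8-Web Mining (power iteration and spider traps). The review only names these ideas, so the code below is one possible illustration rather than the course's own implementation. The function name `pagerank`, the parameter `beta` (probability of following a link instead of jumping to a random page), the choice to spread a dead-end page's rank uniformly, and the four-page `toy_web` graph are all assumptions made for this example.

```python
def pagerank(links, beta=0.85, tol=1e-10, max_iter=200):
    """Power iteration for a random surfer with teleportation.

    links: dict mapping each page to the list of pages it links to.
    beta:  probability of following a link; with probability 1 - beta
           the surfer jumps to a uniformly random page (assumed scheme).
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Every page receives the teleport share first.
        new = {p: (1.0 - beta) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Split this page's rank equally among its out-links.
                share = beta * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dead end: spread its rank over all pages (assumed choice).
                share = beta * rank[p] / n
                for q in pages:
                    new[q] += share
        # Stop once the distribution has essentially stopped changing.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: "d" links only to itself, so it is a spider trap.
toy_web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["d"]}

ranks = pagerank(toy_web)                  # teleportation on
trapped = pagerank(toy_web, beta=1.0)      # pure random walk, no teleportation
```

With `beta=1.0` the walk has no way out of the trap, so `trapped["d"]` ends up holding essentially all of the rank. With teleportation switched on, "d" still gets the largest share in this toy graph, but the other pages keep a nonzero rank, and the result is the stationary distribution of the modified random walk.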