Final Review
7-Text Mining
• Unstructured Data
• Two modes of mining: analysis vs. retrieval
• Precision vs. recall as metrics
– With lots of data you can find anything
• Tools for text mining
– Stopwords, stemming
– Term document matrix
– TF-IDF
• Latent Semantic Indexing (LSI)
– Uses SVD (closely related to PCA) to find ‘concepts’ (topics)
– Documents that share concepts will be close
• Probabilistic Models
– Naïve Bayes vs. Multinomial
• LDA: Documents from Topics from Words
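As a minimal sketch of the term document matrix and TF-IDF tools listed above, the following computes one weight per term per document. It assumes whitespace tokenization and the common tf * log(N / df) weighting; other variants exist, and the function name `tf_idf` and the three toy documents are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
```

A term that appears in every document ("the") gets weight zero, which is the same effect a stopword list aims for; a term unique to one document ("dog") gets the largest weight.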
8-Web Mining
• Detecting robots
• Markov Models for Page prediction
• Ranking web pages
– Flow model
– Power iteration
– Random walk and the stationary distribution
• Spider traps and how to get around them
• AdWords model for advertising cost-per-click
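The ranking bullets above (flow model, power iteration, random walk, spider traps) can be sketched in a few lines. This is an illustrative version under stated assumptions: a damping factor of 0.85, a fixed number of iterations instead of a convergence test, and dead ends handled by spreading their rank evenly. The function name `pagerank` and the four-page graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration; links maps each page to the list of pages it links to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleport share: the random walker jumps to a random page
        # with probability (1 - damping), which defuses spider traps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A dead end (no outlinks) spreads its rank evenly over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web; "d" links only to itself, a simple spider trap.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": ["d"]}
ranks = pagerank(web)
```

With `damping=1.0` (no teleport) the trap page "d" absorbs essentially all of the rank; with teleport it still ranks highest but the other pages keep a nonzero share.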
9-Advanced Classification
• Neural Networks
– Neuron: inputs, linear combination, activation function, output
– Architecture: layers, nodes per layer
– Training through back propagation
– Good for complex problems like face detection, speech, video
• Support Vector Machines
– Assume classes are separable
– Plus/minus plane, margin, support vectors
– Finds the maximum-margin separating classifier
– If not linearly separable, use the “kernel trick”
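Both classifiers in this section start from the same linear combination w·x + b: a neuron passes it through an activation function, while a linear SVM takes its sign and measures the margin between the plus and minus planes. The sketch below assumes a sigmoid activation and hand-picked (untrained) weights; all function names are illustrative.

```python
import math

def linear(weights, bias, x):
    """Linear combination w . x + b shared by both models."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def neuron(weights, bias, x):
    """Single neuron: linear combination followed by a sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-linear(weights, bias, x)))

def svm_predict(weights, bias, x):
    """Linear SVM decision: which side of the separating plane x falls on."""
    return 1 if linear(weights, bias, x) >= 0 else -1

def svm_margin(weights):
    """Distance between the plus plane (w.x+b=+1) and minus plane (w.x+b=-1)."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

# Hypothetical weights and bias, chosen by hand rather than trained.
w, b = [3.0, 4.0], -5.0

on_plane = neuron(w, b, [1.0, 0.5])      # linear part is 0, so the output is 0.5
label = svm_predict(w, b, [2.0, 2.0])    # linear part is 9, so the label is +1
margin = svm_margin(w)                   # 2 / ||w|| = 2 / 5 = 0.4
```

Support vectors are the training points that lie exactly on the plus or minus plane, that is, where `linear(w, b, x)` equals +1 or -1.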
10-Ensembles
• Ensemble Methods
– Collections of ‘small’ models can fit something complex
– Typically beats individual models
– Model Averaging
– Boosting: fit successive models with misclassified points upweighted
– Bagging: fit models to bootstrapped versions of the data
– Random Forests: fit trees with a random subset of variables considered at each split
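A minimal sketch of bagging as described above: fit one 'small' model per bootstrap resample, then combine by majority vote. The small model here is assumed to be a one-dimensional threshold classifier (a decision stump); the toy data, function names, and the choice of 25 models are illustrative.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a 1-D threshold classifier: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_stumps(points, n_models=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points])
            for _ in range(n_models)]

def predict(thresholds, x):
    """Majority vote over the ensemble."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: label is 1 for larger x, with noisy points at x=4, 5.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
ensemble = bagged_stumps(data)
```

A random forest follows the same pattern with trees in place of stumps and a random subset of variables tried at each split; boosting instead fits models in sequence and reweights the data between fits.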
11-Bayesian Methods
• Hierarchical Modelling with MCMC
– No pooling vs. complete pooling vs. Bayesian solution
– Priors determine how much you should depend on the data
– Conjugate priors (e.g. beta/binomial) make life easy
– MCMC for other cases
– Metropolis Hastings: sample from the posterior
– Use trace plots to assess convergence
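The beta/binomial case is a convenient place to see both routes side by side, since the conjugate update gives the exact posterior and a sampler can be checked against it. The sketch below assumes a Beta(2, 2) prior, 7 successes and 3 failures, and a random-walk proposal with a hand-picked step size; because that proposal is symmetric, this is the plain Metropolis special case of Metropolis Hastings. All names and numbers are illustrative.

```python
import math
import random

def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures

def log_posterior(theta, a, b, successes, failures):
    """Unnormalised log posterior for theta in (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((a + successes - 1) * math.log(theta)
            + (b + failures - 1) * math.log(1.0 - theta))

def metropolis(a, b, successes, failures, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the posterior of theta."""
    rng = random.Random(seed)
    theta, samples = 0.5, []
    current = log_posterior(theta, a, b, successes, failures)
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, a, b, successes, failures)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

# Assumed prior Beta(2, 2) and assumed data: 7 successes, 3 failures.
post_a, post_b = beta_binomial_update(2, 2, 7, 3)   # exact posterior Beta(9, 5)
samples = metropolis(2, 2, 7, 3)
kept = samples[2000:]                               # drop an assumed burn-in
sample_mean = sum(kept) / len(kept)
exact_mean = post_a / (post_a + post_b)
```

Plotting `samples` against iteration number gives the trace plot; a chain that has converged wanders around `exact_mean` without a visible trend.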
12-Recommender Systems
• Netflix Prize
– We won!
• Recommender Systems
– Evaluation via RMSE or DCG
– Nearest Neighbors
– SVD
– Ensembles (of teams, of models) very powerful
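The two evaluation metrics named above can be written down directly. RMSE scores predicted ratings against actual ones; DCG scores a ranked list, rewarding relevant items near the top. The DCG version below assumes the common linear-gain form with a log2(rank + 1) discount; other variants exist, and the example ratings are made up.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def dcg(relevances):
    """Discounted cumulative gain; relevances are listed in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

# Hypothetical ratings and relevance scores.
error = rmse([4.0, 2.0], [3.0, 3.0])   # each prediction is off by 1, so RMSE is 1.0
good_order = dcg([3, 2, 0])            # most relevant items ranked first
bad_order = dcg([3, 0, 2])             # 3/1 + 0 + 2/2 = 4.0
```

The same items in a worse order give a lower DCG, which is what makes it suitable for judging ranked recommendations rather than individual rating predictions.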
13-Networks
• Nodes and edges
– Node and edge centrality
– Degrees and degree distribution
• Network Models
– Erdős-Rényi
– Preferential Attachment
– Power Law graphs
– Small world networks
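As a rough sketch of two of the network models above, the following generates an Erdős-Rényi random graph and a preferential attachment graph and counts node degrees. It assumes the G(n, p) form of the Erdős-Rényi model and the simplest preferential attachment rule (each new node adds one edge); function names and parameters are illustrative.

```python
import random
from collections import Counter

def erdos_renyi(n, p, seed=0):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

def preferential_attachment(n, seed=0):
    """Each new node links to one existing node chosen proportionally to degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # every node appears once per unit of degree
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints += [new, target]
    return edges

def degrees(edges):
    """Map each node that has at least one edge to its degree."""
    return Counter(node for edge in edges for node in edge)

# Hypothetical sizes, chosen so both graphs have a mean degree of about 2.
er_degrees = degrees(erdos_renyi(200, 2 / 199))
pa_degrees = degrees(preferential_attachment(200))
```

Tabulating `Counter(pa_degrees.values())` gives the degree distribution; preferential attachment tends to produce a few high-degree hubs (a heavy, power-law-like tail), whereas Erdős-Rényi degrees cluster around the mean.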