4.30 Machine Learning
Pádraig Cunningham
Machine Learning Group
University College Dublin

Outline
• Week 1
  - Introduction & General Overview of Matrix Decomposition
  - Nearest Neighbour Classifiers
  - Tutorial
• Week 2: Neural Networks
  - Simple Perceptron, Backpropagation
  - Other Architectures: Hopfield, Self-Organising Maps
  - Tutorial
• Week 3
  - Support Vector Machines
  - Kernel Methods & Evaluation
  - Tutorial
• Week 4
  - Decision Trees
  - Naïve Bayes
  - Tutorial

Intro to ML

Outline
• Week 5: Ensemble Techniques
  - Bagging
  - Boosting
  - Tutorial
• Coursework
  - 3-4 pieces, 15 hours, Weka & Java
• Week 6: Unsupervised Learning
  - Hierarchical Clustering
  - Other Clustering Algorithms: k-Means, Spectral Clustering
  - Tutorial
• Week 7: Dimension Reduction
  - Principal Components Analysis, LSI, SVD
  - Feature Selection
  - Tutorial
• Later
  - 2 revision tutorials

Why Machine Learning
• Recent progress in algorithms and theory
• Loads of processing power
• Computational power is available
• Growing flood of online data
  - Amazon
  - Google

3 niches for ML
• Data mining: using historical data to improve decisions
  - medical records → medical knowledge
• Software applications that cannot be programmed by hand
  - autonomous driving
  - speech recognition
  - i.e. weak theory domains
• Self customising programs
  - Personalised Newspaper
  - E-mail filtering

Data-mining in medical records
Quality Assurance in Maternity Care.
http://svr-www.eng.cam.ac.uk/projects/qamc/qamc.html

Rule Learning
The QAMC system uses decision trees (I think!)
It is also possible to extract rules from data:
  If    No previous normal delivery, and
        Abnormal 2nd Trimester Ultrasound, and
        Malpresentation at admission
  Then  Probability of Emergency C-Section is 0.6
  Over training data: 26/41 = 0.63
  Over test data: 12/20 = 0.6
<Rule taken from Machine Learning by Tom Mitchell>

Spam Filtering
• For Machine Learning…
  - Lots of training data
  - High dimensionality data (lots of features)
  - Email is a diverse concept
    - Porn, mortgage, religion, cheap drugs…
    - Work, family, play…
• Spam Filtering is a challenge because…
  - Arms race: spammers vs filters
  - False Positives are unacceptable
  - Spam is a changing concept

ALVINN
• Problems too difficult to program by hand
• ALVINN drives at 70mph on motorways

Autonomous Vehicles
• DARPA Grand Challenge 2005
• Winner: Stanley from Stanford
• Various modules use ML

SmartRadio
• Internet-based music radio
• Personalised
• Collaborative Recommendation
  - supported by knowledge discovery from log data
• Content-Based Recommendation
  - supported by feature extraction from sound files
  - feature selection refinement

Smart Radio
Smart Radio is a web based client-server music application which allows listeners to build, manage and share music programmes.
The project was set up:
• To look at a possible model for:
  - The regulated distribution of music on the web
  - A personalised stream of music service
• To provide an architecture and data to test our data mining and collaborative filtering algorithms

ML Dimensions
• Lazy vs Eager
  - k-NN vs rule learning
• Supervised vs Unsupervised
• Symbolic vs Sub-symbolic
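
The lazy vs eager dimension (k-NN vs rule learning) can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the course material: the toy data, the class names `LazyKNN` and `EagerThresholdRule`, and the single-threshold rule (a much simplified stand-in for a real rule learner) are all assumptions. The point it tries to show is where the work happens: the lazy learner only stores examples and defers everything to query time, while the eager learner compiles the data into an explicit rule during training and then no longer needs the data.

```python
from collections import Counter

# Hypothetical toy training set: (feature_vector, label)
TRAIN = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((0.9, 1.1), "spam"),
    ((3.0, 3.2), "ham"),
    ((3.1, 2.9), "ham"),
    ((2.8, 3.0), "ham"),
]


class LazyKNN:
    """Lazy learner: 'training' only stores the examples."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, examples):
        self.examples = list(examples)  # no generalisation happens here
        return self

    def predict(self, x):
        # All the work is deferred to query time: rank the stored examples by
        # squared Euclidean distance and take a majority vote over the k nearest.
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        nearest = sorted(self.examples, key=lambda ex: sq_dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


class EagerThresholdRule:
    """Eager learner (simplified): training searches for one explicit rule,
    'if feature[i] <= t then A else B', and keeps only that rule."""

    def fit(self, examples):
        # Assumes at least one feature takes two or more distinct values.
        best = None
        n_features = len(examples[0][0])
        labels = sorted({label for _, label in examples})
        for i in range(n_features):
            values = sorted({x[i] for x, _ in examples})
            # Candidate thresholds lie midway between consecutive observed values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                for below in labels:
                    for above in labels:
                        errors = sum(
                            (below if x[i] <= t else above) != y
                            for x, y in examples
                        )
                        if best is None or errors < best[0]:
                            best = (errors, i, t, below, above)
        _, self.feature, self.threshold, self.below, self.above = best
        return self

    def predict(self, x):
        # Query time is a single comparison; the training data is not consulted.
        return self.below if x[self.feature] <= self.threshold else self.above


knn = LazyKNN(k=3).fit(TRAIN)
rule = EagerThresholdRule().fit(TRAIN)
print(knn.predict((1.1, 0.9)), rule.predict((1.1, 0.9)))
```

On this toy data both learners agree on points near either cluster. The difference is in what each keeps after `fit`: `knn.examples` still holds all six examples, whereas `rule` holds only a feature index, a threshold and two labels.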