COMP 135: Introduction to Machine Learning and Data Mining
Fall 2016
Professor: Roni Khardon
Computer Science, Tufts University
9/5/16

What is Machine Learning?
• Have/collect some "data"
• Analyze it to produce "knowledge/insight"
• Use that knowledge for some "task"

What is Machine Learning?
• Traditionally this is partitioned into:
  – supervised learning
  – unsupervised learning
  – reinforcement learning
• But many novel forms exist, and new ones are still being discovered, invented, and developed

Supervised Learning Applications
Domain/problem: Classes/Labels
  – Character recognition: alphabet
  – Face recognition: male/female; specific people
  – Spam filtering: spam or not
  – Document classification: news heading
  – Structure-activity relation of molecules: carcinogenic or not; active drug or not
  – Protein fold prediction: fold types
  – Astronomy (sky survey): star types

Supervised Learning Applications
In many applications the response variable is numerical rather than categorical. These are known as regression problems.
Domain/problem: Response variable
  – Weather: temperature tomorrow
  – Health informatics: number of patients with influenza
  – Environmental engineering: predicted soil contamination level
  – Commerce: product demand

Unsupervised Learning
Clustering is often a form of data exploration, allowing us to identify groupings that are otherwise not apparent.
Domain/problem: Groups
  – Gene-array data: similar activity patterns
  – Text: word classes
  – Customer activity: customer "types" (phone, web, movies, etc.)

Unsupervised Learning
Association rules capture nuggets of imperfect prediction rules.
Domain/problem: Associations
  – Market-basket records:
      Nappy & Milk → Bread (10% / 80%)
      Nappy & Milk → Beer (10% / 75%)
  – Census data:
      (Age < 16) → Not in Army (X% / 100%)
      (Age > 30) & (Boston) → (has BA/BSc) (X% / Y%)

Reinforcement Learning
The agent can control the environment (take actions) and gets occasional rewards. The goal is to maximize long-term reward.
Domain/problem: Aim of policy
  – AI planning/robot control: plan a course of actions to achieve an explicit or implicit goal
  – (Computer) games: auto-player
  – City management: traffic control
  – Web site management: contents/advertising policy

Novel Forms of ML
• Supervised Learning
• Constrained Clustering
• Active Learning
• Collaborative Filtering
• Collective Classification
• Many more …
You can think about more forms and whether they are feasible.

• We will focus for a while on supervised learning.
• Is this really possible?
• When?
• Why?
• How?

Supervised Learning
[Diagram: Application → Training Data → Learning Algorithm → Classifier; New Data → Classifier → Predictions of labels for new data]

Data Representation
• What does it look like?
• Raw data depends on the application and on our interpretation of it.
• Let's look at 4 cases:
  – Toy example: PlayTennis
  – Soybean data
  – Mutagenesis molecule data
  – Astrophysics time series data

Learning Algorithm
• Given a data representation (e.g., as a table of features/attributes)
• How can we analyze the data to produce a classifier that makes correct predictions?

Classifier
• What a classifier does:
  – Input: a new example
  – Output: the new example's predicted label
• How should we represent the classifier?
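The pipeline on these slides (training data as a table of attributes, a learning algorithm, and a classifier that maps a new example to a predicted label) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The rows are hypothetical PlayTennis-style data invented for illustration, not the course dataset. `learn_one_rule` is a hypothetical name for a simple OneR-style learner, chosen only because it is short; it is not one of the algorithms the course covers. The point is the interface: the learner returns a function, which is one possible answer to "how should we represent the classifier?"

```python
from collections import Counter

# Hypothetical toy training data (made up for illustration): each example is a
# dict of attribute values paired with its label.
TRAINING_DATA = [
    ({"outlook": "sunny", "wind": "weak"}, "no"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "strong"}, "yes"),
    ({"outlook": "rain", "wind": "weak"}, "yes"),
]


def learn_one_rule(training_data):
    """OneR-style learner (hypothetical helper): for each attribute, map each
    value to its majority label, and keep the attribute whose rule makes the
    fewest training errors. Returns a classifier function."""
    labels = [label for _, label in training_data]
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    best = None  # (errors, attribute, value -> label table)
    for attribute in training_data[0][0]:
        counts = {}
        for example, label in training_data:
            counts.setdefault(example[attribute], Counter())[label] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(table[ex[attribute]] != label for ex, label in training_data)
        if best is None or errors < best[0]:
            best = (errors, attribute, table)
    _, attribute, table = best

    def classifier(example):
        # Input: a new example; output: its predicted label.
        return table.get(example.get(attribute), default)

    return classifier


classifier = learn_one_rule(TRAINING_DATA)
print(classifier({"outlook": "sunny", "wind": "strong"}))
```

On this toy data the learner picks `outlook` (one training error, versus two for `wind`), so the sunny example is predicted "no". Returning a closure keeps the learning algorithm and the classifier it produces clearly separate, matching the two boxes in the diagram.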
Evaluation
• Evaluating the classifier (the output of the algorithm) for a specific dataset/application
• Evaluating the learning algorithm, which should be effective for any/many applications

Learning Algorithm
• In the next few lectures we will discuss 4 different algorithms:
  – Nearest Neighbors
  – Decision Trees
  – Linear classifiers
  – Bayesian Classifier
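The first algorithm on the list and the first kind of evaluation can be illustrated together. This is a minimal sketch that assumes numeric feature vectors, Euclidean distance, and a single nearest neighbor (k = 1). The data points and the names `nearest_neighbor_classifier` and `accuracy` are hypothetical. Measuring accuracy on a held-out test set is one common way to evaluate a classifier, not necessarily the method the course will use.

```python
import math


def nearest_neighbor_classifier(training_data):
    """1-nearest-neighbor: predict the label of the closest training example
    under Euclidean distance. Examples are tuples of numeric features."""
    def classifier(example):
        _, label = min(training_data, key=lambda pair: math.dist(pair[0], example))
        return label
    return classifier


def accuracy(classifier, test_data):
    """Fraction of held-out examples whose predicted label matches the true label."""
    correct = sum(classifier(x) == y for x, y in test_data)
    return correct / len(test_data)


# Hypothetical 2-D data, made up for illustration.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
test = [((1.2, 1.1), "A"), ((5.5, 5.2), "B"), ((3.0, 3.0), "B")]

clf = nearest_neighbor_classifier(train)
print(accuracy(clf, test))
```

The test point (3.0, 3.0) is closer to the "A" example at (1.5, 2.0) than to any "B" example, so the classifier gets two of the three test points right. This measures one classifier on one application. Judging the learning algorithm itself would mean repeating such measurements across many datasets. (`math.dist` requires Python 3.8 or later.)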