CS 657/790 Machine Learning and Data Mining
Course Introduction

Student Survey
• Please hand in sheet of paper with:
  • Your name and email address
  • Your classification (e.g., 2nd year computer science PhD student)
  • Your experience with MATLAB (none, some or much)
  • Your undergraduate degree (when, what, where)
  • Your AI experience (courses at UWM or elsewhere)
  • Your programming experience

Course Information
• Course Instructor: Joe Bockhorst
  • email: [email protected]
  • office: 1155 EMS
• Course webpage: http://www.uwm.edu/~joebock/790.html
• office hours: ???
• Possible times:
  • before class on Monday (3:30-5:30)
  • Monday morning
  • Wednesday morning
  • after class Monday (7:00-9:00)

Textbook & Reading Assignment
• Machine Learning (Tom Mitchell)
  • Bookstore in union, $140 new
  • Amazon.com hard cover: $125 new, $80 used
  • Amazon.com soft cover: < $30
• Read (posted on class web page)
  • Preface
  • Chapter 1
  • Sections 6.1, 6.2, 6.9, 6.10
  • Sections 8.1, 8.2

PowerPoint Vs Whiteboard
• PowerPoint encourages words over pictures (not good)
• But PowerPoint can be saved, tweaked, easily shared, …
• Notes posted on course website following lecture
• Your thoughts?

Full Disclosure
• Slides are a combination of
  1) Jude Shavlik’s notes from UW-Madison machine learning course (Prof. I had)
  2) Textbook Slides (Google “machine learning textbook”)
  3) My notes

Class Email List
• Is there one?

Course Outline
• 1st half covers supervised learning
  • Algorithms: support vector machines, neural networks, probabilistic models …
  • Methodology
• 2nd half covers graphical probability models
  • Powerful statistical models very useful for learning in complex and/or noisy settings

Course "Style"
• Primarily algorithmic & experimental
• Some theory, both mathematical & conceptual (much on statistics)
• "Hands on" experience, interactive lectures/discussions
• Broad survey of many ML subfields
  • "symbolic" (rules, decision trees)
  • "connectionist" (neural nets)
  • Support Vector Machines
  • statistical ("Bayes rule")
  • genetic algorithms (if time)

Two Major Goals
• to understand what a learning system should do
• to understand how (and how well) existing systems work

Background Assumed
• Programming
  • Data structures and algorithms
  • CS 535
• Math
  • Calculus (partial derivatives)
  • Simple probability & statistics

Programming Assignments in MATLAB
• Why MATLAB?
  • Fast prototyping
  • Integrated plotting
  • Widely used in academia (industry too?)
  • Will save you time in the long run
• Why not MATLAB?
  • Proprietary software
  • Harder to work from home
• Optional Assignment: familiarize yourself with MATLAB, use MATLAB help system

Student Computer Labs
• E256, E280, E285, E384, E270
• All have MATLAB installed under Windows XP

Requirements
• Bi-weekly programming plus perhaps some “paper & pencil” homework
  • "hands on" experience valuable
  • HW0 – build a dataset
  • HW1 & HW2 supervised learning algorithms
  • HW3 & HW4 graphical probability models
• Midterm exam (after about 8-10 weeks)
• Final exam
• Find project of your choosing
  • during last 4-5 weeks of class

Grading
• HW's: 25%
• Project: 20%
• Midterm: 20%
• Final: 30%
• Quality Discussion: 5%

Late HW's Policy
• HW's due @ 4pm
• you have 5 late days to use over the semester
  • (Fri 4pm → Mon 4pm is 1 late "day")
• SAVE UP late days!
  • extensions only for extreme cases
• Penalty points after late days exhausted
  • 10% per day
• Can't be more than one week late

Machine Learning Vs Data Mining
• Machine Learning: computer algorithms that improve automatically through experience [Mitchell].
• Data Mining: Extracting knowledge from large amounts of data. [Han & Kamber] (synonym: knowledge discovery in databases (KDD))

What’s the difference?
Topics in ML and DM texts (Mitchell Vs Han & Kamber)
[Diagram with regions labeled ML and DM; topic groups as they appear on the slide:]
• Supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), …
• reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, …
• Data Warehouse, OLAP, query languages, association rules, presentation, …
We’ll try to cover topics in red

The learning problem
• Learning = improving with experience
  • Improve over task T, with respect to performance measure P, based on experience E
• Example: learn to play checkers
  • T: Play Checkers
  • P: % of games won
  • E: games played against self

Famous Example: Discovering Genes
• T: find genes in DNA sequences
  • ACGTGCATGTGTGAACGTGTGGGTCTGATGATGT…
• P: % of genes found
• E: experimentally verified genes
* Prediction of Complete Gene Structures in Human Genomic DNA, Burge & Karlin, J. Molecular Biology, 1997, 268, 78-94

Famous Example 2: Autonomous Vehicles Driving
• T: drive vehicle
• P: reach destination
• E: machine observation of human driver

ML key to winning DARPA Grand Challenge
• Stanford team won 2005 driverless vehicle race across Mojave Desert
• “The robot's software system relied predominately on state-of-the-art AI technologies, such as machine learning and probabilistic reasoning.” [Winning the DARPA Grand Challenge, Thrun et al., Journal of Field Robotics, 2006]

Why study machine learning?
(data mining)
• Data is plentiful
  • Retail, video, images, speech, text, DNA, bio-medical measurements, …
• Computational power is available
• Budding Industry
• ML has great applications
• ML still relatively immature

Next Time: HW0 – Create Your Own Dataset
• Think about this
  • will need to create it by week after next
• Google to find:
  • UCI archive (or UCI KDD archive)
  • UCI ML archive (UCI machine learning repository)

HW0 – Your “Personal Concept”
• Step 1: Choose a Boolean (true/false) concept
  • Subjective Judgement
    • Books I like/dislike
    • Movies I like/dislike
    • Web pages I like/dislike
  • “Time will tell” concepts
    • Stocks to buy
    • Medical outcomes
  • Sensory interpretation
    • Face recognition (See text)
    • Handwritten digit recognition
    • Sound recognition

HW0 – Your “Personal Concept”
• Step 2: Choosing a feature space
  • We will use fixed-length feature vectors
    • Choose N features
    • Each feature has Vi possible values
    (together these define a space)
  • Each example is represented by a vector of N feature values (i.e., is a point in the feature space)
    e.g.: <red, 50, round> (color, weight, shape)
  • Feature Types (in HW0 we will use a subset; see next slide)
    • Boolean
    • Nominal
    • Ordered
    • Hierarchical
• Step 3: Collect examples (“I/O” pairs)

Standard Feature Types for representing training examples – source of “domain knowledge”
• Nominal
  • No relationship among possible values
    e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
  • Possible values of the feature are totally ordered
    e.g., size ∈ {small, medium, large} ← discrete
          weight ∈ [0…500] ← continuous
• Hierarchical
  • Possible values are partially ordered in an ISA hierarchy
    e.g., for shape:
      closed
        polygon: square, triangle
        continuous: circle, ellipse

Example Hierarchy (KDD* Journal, Vol 5, No. 1-2, 2001, page 17)
[Figure: product hierarchy. Product → 99 Product Classes (e.g., Pet Foods, Tea) → 2302 Product Subclasses (e.g., Dried Cat Food, Canned Cat Food) → ~30k Products (e.g., Friskies Liver, 250g)]
• Structure of one feature!
• “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.”
  - From eds. Intro to special issue (on applications) of KDD journal, Vol 5, 2001
* Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers

Our Feature Types (for homeworks)
• Discrete
  • tokens (char strings, w/o quote marks and spaces)
• Continuous
  • numbers (int’s or float’s)
  • If only a few possible values (e.g., 0 & 1) use discrete
• i.e., merge nominal and discrete-ordered (or convert discrete-ordered into 1,2,…)
• We will ignore hierarchy info and only use the leaf values (it is rare anyway)

Today’s Topics
• Creating a dataset of fixed length feature vectors
• HW0 out on-line
  • Due next Monday

Some Famous Examples
• Car Steering (Pomerleau): Digitized camera image → Learned Function → Steering Angle
• Medical Diagnosis (Quinlan): Medical record (age = 13, sex = M, wgt = 18) → Learned Function → ill vs healthy
• DNA Categorization
• TV-pilot rating
• Chemical-plant control
• Backgammon playing
• WWW page scoring
• Credit application scoring

HW0: Creating your dataset
1. Choose a dataset
  • based on interest/familiarity
  • meets basic requirements
    • >1000 examples
    • category (function) learned should be binary valued
    • ~500 examples labeled class A, other 500 labeled class B
  → Internet Movie Database (IMD)

HW0: Creating your dataset
2. IMD has a lot of data that are not discrete or continuous or binary-valued for target function (category)
[Figure: IMD schema, as far as it can be recovered from the slide]
  • Studio: Name, Country, List of movies
  • Director/Producer: Name, Year of birth, List of movies
  • Actor: Name, Year of birth, Gender, Oscar nominations, List of movies
  • Movie: Title, Genre, Year, Opening Wkend BO receipts, List of actors/actresses, Release season
  • Relations to Movie: Made, Directed, Produced, Acted in

HW0: Creating your dataset
3. Choose a boolean or binary-valued target function (category)
  • Opening weekend box office receipts > $2 million
  • Movie is drama? (action, sci-fi, …)
  • Movies I like/dislike (e.g. Tivo)

HW0: Creating your dataset
4. How to transfer available attributes: Other example attributes (select predictive features)
  • Movie
    • Average age of actors
    • Number of producers
    • Percent female actors
  • Studio
    • Number of movies made
    • Average movie gross
    • Percent movies released in US

HW0: Creating your dataset
  • Director/Producer
    • Years of experience
    • Most prevalent genre
    • Number of award winning movies
    • Average movie gross
  • Actor
    • Gender
    • Has previous Oscar award or nominations
    • Most prevalent genre

HW0: Creating your dataset
David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:
• Opening weekend box office receipts > $2 million
  • 25 attributes
  • Accuracy = 83.3%
  • Default accuracy = 56%
• Movie is drama?
  • 12 attributes
  • Accuracy = 71.9%
  • Default accuracy = 51%
• http://kdl.cs.umass.edu/proximity/about.html

What Do You Think Machine Learning Means?

What is Learning?
“Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.”
  - Herbert Simon
“Learning is making useful changes in our minds.”
  - Marvin Minsky

Major Paradigms of Machine Learning
• Inducing Functions from I/O Pairs
  • Decision trees (e.g., Quinlan’s C4.5 [1993])
  • Connectionism / neural networks (e.g., backprop)
  • Nearest-neighbor methods
  • Genetic algorithms
  • SVM’s
• Learning without a Teacher
  • Conceptual clustering
  • Self-organizing systems
  • Discovery systems
  (Not in Mitchell’s textbook; will spend 0-2 lectures on this – but also in CS776)

Major Paradigms of Machine Learning
• Improving a Multi-Step Problem Solver
  • Explanation-based learning
  • Reinforcement learning
  (Will be covered briefly)
• Using Preexisting Domain Knowledge Inductively
  • Analogical learning
  • Case-based reasoning
  • Inductive/explanatory hybrids
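
A note on the "default accuracy" figures quoted in the HW0 movie slides: these are presumably the accuracy obtained by always predicting the most common class, which is the usual baseline a learned classifier has to beat. The sketch below illustrates that reading on HW0-style fixed-length feature vectors with a Boolean label. It is written in Python purely for illustration (the course assignments use MATLAB), and the feature names, the example movies and the helper functions `majority_label` and `default_accuracy` are all invented for this example rather than taken from the course materials.

```python
from collections import Counter

# Hypothetical HW0-style examples: each is a fixed-length feature vector
# (genre, release_season, num_producers) plus a Boolean label standing for
# "opening weekend receipts > $2 million". All values are invented.
EXAMPLES = [
    (("drama", "summer", 2), True),
    (("action", "summer", 3), True),
    (("drama", "winter", 1), False),
    (("comedy", "fall", 2), True),
    (("drama", "spring", 1), False),
]


def majority_label(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def default_accuracy(labels):
    """Accuracy of a 'classifier' that always predicts the majority label."""
    labels = list(labels)
    guess = majority_label(labels)
    return sum(1 for y in labels if y == guess) / len(labels)


labels = [label for _, label in EXAMPLES]
baseline = default_accuracy(labels)
print(f"majority label: {majority_label(labels)}, default accuracy: {baseline:.2f}")
```

Under this reading, the Naïve Bayes result of 83.3% on the box-office task is meaningful mainly because it is well above the 56% obtained without looking at any features; a near 50/50 class split, as HW0 asks for, keeps the baseline low and makes such comparisons easier to interpret.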