Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011

Objectives:
• Understand what machine learning is
• Motivate why it has become so important
• Identify types of learning and salient frameworks, algorithms and their utility
• Take a sneak peek at the next set of problems

Outline:
• What is learning?
• Why learn?
• Types of learning and salient frameworks
• Frontiers

What is learning?

Example: Learning to ride a bicycle
• T: Task of learning to ride a bicycle
• P: Performance of balancing while moving
• E: Experience of riding in many situations
• Is it wise to memorize all situations and appropriate responses by observing an expert?

Improve on task, T, with respect to performance metric, P, based on experience, E.
• T: Playing checkers
  P: Percentage of games won against an arbitrary opponent
  E: Playing practice games against itself
• T: Recognizing hand-written words
  P: Percentage of words correctly classified
  E: Database of human-labeled images of handwritten words
• T: Driving on four-lane highways using vision sensors
  P: Average distance traveled before a human-judged error
  E: A sequence of images and steering commands recorded while observing a human driver
• T: Categorize email messages as spam or legitimate
  P: Percentage of email messages correctly classified
  E: Database of emails, some with human-given labels
Source: Introduction to Machine Learning by Raymond J. Mooney

Determine f such that y_n = f(x_n) on the observed pairs and an error measure g(y, x) is minimized for unseen (x, y) pairs. The form of f is fixed, but some parameters can be tuned:
• So, y = fθ(x), where x is observed and y needs to be inferred
• e.g. y = 1 if mx > c, 0 otherwise, so θ = (m, c)
• Machine learning is concerned with designing algorithms that learn “better” values of θ given “more” x (and y) for a given problem

• What is the scope of the task?
• How will performance be measured?
• How should learning be approached?
  ▪ Scalability: How can we learn fast? How many resources are needed to learn?
  ▪ Generalization: How will it perform in unseen situations?
  ▪ Online learning: Can it learn and improve while performing the task?

Related disciplines:
• Artificial intelligence
• Data mining
• Probability and statistics
• Information theory
• Numerical optimization
• Adaptive control theory
• Neurobiology
• Psychology (cognitive, perceptual, developmental)
• Linguistics

Why learn?

• Develop systems that are too difficult or expensive to construct manually because they require specific, detailed skills or knowledge tuned to a specific task (the knowledge engineering bottleneck).
• Develop systems that can automatically adapt and customize themselves to individual users.
  ▪ Personalized news or mail filter
  ▪ Personalized tutoring
• Discover new knowledge from large databases (data mining).
  ▪ Market basket analysis (e.g. diapers and beer)
  ▪ Medical text mining (e.g. migraines to calcium channel blockers to magnesium)
Source: Introduction to Machine Learning by Raymond J. Mooney

• Computational studies of learning may help us understand learning in humans and other biological organisms.
  ▪ Hebbian neural learning: “Neurons that fire together, wire together.”
  ▪ Power law of practice
[Figure: power law of practice, log(performance time) vs. log(# training trials)]
Source: Introduction to Machine Learning by Raymond J. Mooney

• Many basic, effective and efficient algorithms are available
• Large amounts of data are available
• Large amounts of computational resources are available
Source: Introduction to Machine Learning by Raymond J. Mooney

• Automatic vehicle navigation
  ▪ Road recognition
  ▪ Automatic navigation
• Speech recognition
  ▪ Speech to text
  ▪ Automated services over the phone
• Face detection
  ▪ Facebook face tagging suggestions
  ▪ Camera autofocus for portraits

Types of learning and salient frameworks

Remember, y = fθ(x)?
• y can be continuous or categorical
• y may be known for some x or for none at all
• f can be simple (e.g. linear) or complex
• f can incorporate some knowledge of how x was generated or be blind to the generation
• etc.

Supervised learning:
• For y = fθ(x), a set of pairs (x_i, y_i) is known (y_i is usually a class label)
• Now predict y_j for a new x_j
• Examples:
  ▪ Two classes of protein with given amino acid sequences
  ▪ Labeled male and female face images

Multi-layer perceptron (MLP)
• In a nutshell:
  ▪ The input is non-linearly transformed by hidden layers; each hidden unit is usually a “fuzzy” (soft-thresholded) linear classifier of its inputs
  ▪ The output is a linear combination of the hidden-layer outputs
• Use when:
  ▪ You want to model a non-linear function
  ▪ Labeled data is available
  ▪ You don’t want to write new software
• Variations:
  ▪ Competitive learning for classification
  ▪ Many more…

Support vector machine (SVM)
• In a nutshell:
  ▪ Learns the optimal (maximum-margin) boundary between two classes
• Use when:
  ▪ Labeled class data is available
  ▪ You want to minimize the chance of error on test cases
• Variations:
  ▪ Non-linear mapping of the input vectors using “kernels”

Unsupervised learning:
• For y = fθ(x), only a set of x_i is known
• Predict y such that y is simpler than x but retains its essence
• Examples:
  ▪ Clustering (when y is a class label)
  ▪ Dimensionality reduction (when y is continuous)

Clustering
• In a nutshell:
  ▪ Grouping similar objects based on a definition of similarity
  ▪ That is, high intra-cluster similarity vs. low inter-cluster similarity, e.g. measured by distance from the center of the cluster
• Use when:
  ▪ Class labels are not available, but you have a desired number of clusters in mind
• Variations:
  ▪ Different similarity measures
  ▪ Automatic detection of the number of clusters
  ▪ Online clustering

Principal component analysis (PCA)
• In a nutshell:
  ▪ High-dimensional data where not all dimensions are independent, e.g. (x1, x2, x3) with x3 = a·x1 + b·x2 + c
• Use when:
  ▪ You want to perform linear dimensionality reduction
• Variations:
  ▪ ICA
  ▪ Online PCA

Manifold learning
• In a nutshell:
  ▪ Learning a lower-dimensional manifold (e.g. a surface) close to which the data lies
• Use when:
  ▪ You want to perform non-linear dimensionality reduction
• Variations:
  ▪ SOM

Generative models:
• For y = fθ(x), we have some idea of how x was generated given y and θ
• Examples:
  ▪ HMMs: given phonemes and {age, gender}, we know how the speech can be generated
  ▪ Bayesian networks: given {gender, age, race}, we have some idea of what a face will look like for different emotions

Discriminative models:
• Do not care about how the data was generated
• Finding the right features is of prime importance
• Followed by finding the right classifier
• Examples:
  ▪ SVM
  ▪ MLP
Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al. in Journal of Multimedia, Sep 2006

Frontiers

1980s:
• Advanced decision tree and rule learning
• Explanation-based learning (EBL)
• Learning and planning and problem solving
• Utility problem
• Analogy
• Cognitive architectures
• Resurgence of neural networks (connectionism, backpropagation)
• Valiant’s PAC learning theory
• Focus on experimental methodology

1990s:
• Data mining
• Adaptive software agents and web applications
• Text learning
• Reinforcement learning (RL)
• Inductive logic programming (ILP)
• Ensembles: bagging, boosting, and stacking
• Bayes net learning
Source: Introduction to Machine Learning by Raymond J. Mooney

2000s:
• Support vector machines
• Kernel methods
• Graphical models
• Statistical relational learning
• Transfer learning
• Sequence labeling
• Collective classification and structured outputs
• Computer systems applications
  ▪ Compilers
  ▪ Debugging
  ▪ Graphics
  ▪ Security (intrusion, virus, and worm detection)
• Email management
• Personalized assistants that learn
• Learning in robotics and vision
Source: Introduction to Machine Learning by Raymond J. Mooney

• Bioinformatics
  ▪ Gene expression prediction (just scratched the surface)
  ▪ Automated drug discovery
• Speech recognition
  ▪ Context recognition, e.g. for digital personal assistants (Siri?)
  ▪ Better than Google Translate; imagine visiting Brazil
• Image and video processing
  ▪ Automatic event detection in video
  ▪ “Seeing” software for the blind
• Robotics
  ▪ Where is my iRobot?
  ▪ Would you raise a “robot” child and make it learn?
• Advanced scientific calculations
  ▪ Weather modeling through prediction
  ▪ Vector field or FEM calculation through prediction
• Who knows…
  ▪ Always in search of new problems

• Learning the structure of classifiers
• Automatic feature discovery and active learning
• Discovering the limits of learning
  ▪ Information-theoretic bounds?
• Learning that never ends
• Explaining human learning
• Computer languages with ML primitives
Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006

Thank you!

• Inference: Using a system to get the output variable for a given input variable
• Learning: Changing parameters according to an algorithm to improve performance
• Training: Using a machine learning algorithm to learn function parameters from a dataset of inputs and (optionally) outputs, known as the “training set”
• Validation and testing: Using inference (without training) to test the performance of the learned system on data
• Offline learning: All training happens prior to testing, and no learning takes place during testing
• Online learning: Learning and testing happen on the same data, i.e. the system keeps learning while it performs the task
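The slides define y = fθ(x) with the example y = 1 if mx > c, 0 otherwise, θ = (m, c), and also define inference, training, testing and offline/online learning. The Python sketch below ties these together under stated assumptions: θ is learned one example at a time with the classic perceptron update rule (assumed here for illustration; the slides do not prescribe a learning rule), and accuracy is then measured on a held-out test set that was never used for training. The toy data and the names `predict`, `perceptron_step`, `train_online` and `accuracy` are hypothetical.

```python
# Toy 1-D data (hypothetical): the hidden rule is y = 1 if x > 2, else 0.
TRAIN = [(0.5, 0), (3.0, 1), (1.5, 0), (2.5, 1), (0.0, 0), (3.5, 1), (1.0, 0), (4.0, 1)]
TEST = [(0.2, 0), (1.2, 0), (1.8, 0), (2.2, 1), (2.8, 1), (3.8, 1)]


def predict(theta, x):
    """Inference: f_theta(x) = 1 if m*x > c, else 0, with theta = (m, c)."""
    m, c = theta
    return 1 if m * x > c else 0


def perceptron_step(theta, x, y, lr=1.0):
    """One learning step (assumed perceptron rule): change (m, c) only if (x, y) is misclassified."""
    m, c = theta
    err = y - predict(theta, x)  # +1 (missed a positive), -1 (false positive), or 0
    # The decision is m*x - c > 0, so c acts as a negated bias weight.
    return (m + lr * err * x, c - lr * err)


def train_online(data, epochs=100, lr=1.0):
    """Training: sweep the data one example at a time until an epoch has no mistakes."""
    theta = (0.0, 0.0)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(theta, x) != y:
                mistakes += 1
            theta = perceptron_step(theta, x, y, lr)
        if mistakes == 0:  # every training example is now classified correctly
            break
    return theta


def accuracy(theta, data):
    """Testing: fraction of (x, y) pairs that inference gets right, with no updates."""
    return sum(predict(theta, x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    theta = train_online(TRAIN)
    print(theta, accuracy(theta, TRAIN), accuracy(theta, TEST))  # (0.5, 1.0) 1.0 1.0
```

On this toy data training converges to θ = (0.5, 1.0), i.e. the boundary x > 2, which also classifies every test point correctly. Here all training finishes before testing, which is the offline setting defined above; because `perceptron_step` consumes one example at a time, the same update could keep running while the classifier is in use, which is the online setting. For linearly separable data like this, the perceptron rule is guaranteed to stop making training mistakes after finitely many updates.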

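The clustering slide describes grouping points by their distance from a cluster center when you have a desired number of clusters in mind. k-means (Lloyd’s algorithm) is one standard algorithm matching that description; the slides do not name it, so treat the Python sketch below as an illustration under that assumption. Points are tuples of equal length, the initial centers are supplied by the caller so the result is deterministic (in practice they are often chosen at random or with k-means++), and the function name `kmeans` and the toy data are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, centers, max_iters=100):
    """Lloyd's k-means. points/centers: lists of equal-length tuples; k = len(centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose center is nearest (Euclidean).
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:  # centers stopped moving, so assignments will not change
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(pts, centers=[(0, 0), (10, 10)])
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(centers)
```

On the toy data the first three points form one cluster and the last three another, and the centers settle at their means, roughly (0.33, 0.33) and (10.33, 10.33). The number of clusters k is fixed in advance through the initial centers, matching the slide’s “desired number of clusters in mind”; the outcome can also depend on where those initial centers are placed.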