Introduction Slide 1 of 21

CE802 MACHINE LEARNING AND DATA MINING
INSTRUCTOR: PAUL SCOTT
P.D.Scott University of Essex

Slide 2 of 21

INTRODUCTION
  COURSE ORGANISATION
  WHAT IS MACHINE LEARNING?
  WHAT IS DATA MINING?

Slide 3 of 21

COURSE ORGANISATION
  SYLLABUS
  ASSESSMENT
  RESOURCES
  “WHAT DO I NEED TO KNOW?”
  LECTURES
  CONTACT INFORMATION

Slide 4 of 21

SYLLABUS
  Introduction
  Learning to classify
  Learning to predict numeric values
  Evaluating learning procedures
  Association rules
  Clustering
  Reinforcement learning
  Multiple learners
  (Neural networks)

Slide 5 of 21

ASSESSMENT
EXAMINATION
  A two-hour examination early in the summer term.
  Worth 60% of module mark.
TWO ASSIGNMENTS
  Assignment 1: Worth 15% of module mark. Due Week 9.
  Assignment 2: Worth 25% of module mark. Due Week 16 (i.e. after Xmas break).

Slide 6 of 21

RESOURCES
COURSE NOTES
  Hard copy distributed in class.
  Electronic version available on module web site.
WEKA
  A public domain machine learning tool box.
MODULE WEB SITE
  Lots of information about the module including lists of relevant reading material and web sites.

Slide 7 of 21

“WHAT DO I NEED TO KNOW?”
“How much programming is there in the course?”
  This is not a programming course.
  You will not be assessed on your ability to program.
“How much maths do I need to know?”
  Most of the maths will be basic probability and statistics.
  Occasionally I will use vectors and a little calculus but this will not be examinable.

Slide 8 of 21

LECTURES
Lectures are scheduled as follows:
  Wednesday   11:00 am - 12:50 pm   5S.3.2
  Friday       3:00 pm -  3:50 pm   4.336
Normally the Wednesday session will be a lecture and the Friday session a problem class.
There are no labs for this module.
However, you are expected to spend some time familiarising yourself with software relevant to the assignment, notably WEKA.

Slide 9 of 21

CONTACT INFORMATION
  My office: 1NW.3.19 (Networks Building, same level as Square 1)
  My email address: [email protected]
  My office phone: Ext 2015
  Appointments: Email or phone. I can usually find a time within 24 hours.

Slide 10 of 21

WHAT IS MACHINE LEARNING?
Machine learning is that branch of artificial intelligence concerned with getting computers to learn from experience.
Very roughly:
  Programming is telling a machine what to do.
  Machine learning is showing a machine what you want it to do and expecting it to figure out how to do it.
But actually it can be more complicated than that.
Sometimes we cannot even show the machine what we want because we do not really know.
A smart machine can even help in this situation.

Slide 11 of 21

VERY BRIEF HISTORY OF MACHINE LEARNING
Machine learning has been part of artificial intelligence since before the subject began.
Late 1940s
  Turing’s refutation of Lady Lovelace’s Objection (“computers cannot be intelligent because they only do what their programmers tell them to”) was to suggest that computers could learn for themselves.
Early 1950s
  Arthur Samuel built a checkers (called “draughts” in UK) playing program that improved its performance through playing and reached state championship level.
Middle 1950s – Middle 1960s
  First neural network learning systems.
  Some work on symbolic learning systems.
Late 1960s
  Eclipse of neural networks as a result of wildly overoptimistic claims about their capabilities.
Early 1970s – Middle 1970s
  Very little done in machine learning: most researchers believed they must first solve “the representation problem”.
Late 1970s
  Renaissance of machine learning as potential solution to the “knowledge bottleneck” in expert systems.
Middle 1980s
  Renaissance of neural net approaches.
Late 1980s – Present
  CE802

Slide 12 of 21

KEY ELEMENTS OF A MACHINE LEARNER
Learning is often defined as improving performance at some task, so there must be:
  A task
  An associated performance measure
Since learning is defined as deriving from experiences or examples, there must be:
  A set of examples
  A representation format for examples
Learning must produce some information structure that can be used to perform the task.
This information structure can be viewed as a model of the domain in which learning takes place.
This in turn implies the existence of a set of possible models from which the one produced is selected.
This is determined by the model representation, which thus restricts the set of models that can be considered.
The process of learning itself can then be viewed as a search of the space of possible representations for a model that maximises the performance measure.
The restrictions of the representation and the strategy used by the search process lead to inductive bias: favouring some possible models over others.

Slide 13 of 21

A TAXONOMY OF LEARNING TASKS
Learning to classify
  Given a set of training examples and their associated classes
  Learn to correctly predict the classification of unclassified examples.
  e.g. A parent teaching a child to recognise animals by showing him/her pictures.
Learning to predict numerical values (regression)
  Given a set of training examples and associated numerical values
  Learn to correctly predict the numerical value for other examples in which it is not known.
  e.g. Learning to predict tomorrow’s temperature from meteorological records.
Learning to classify and regression are often called “supervised learning”.
Learning to form groups (clustering)
  Given a set of unclassified examples
  Develop a “sensible” scheme for classifying them.
  e.g. This slide!
Clustering is often called “unsupervised learning”.
Learning what to do next (reinforcement learning)
  Given the experience of engaging actively in a task
  Learn to improve performance when engaged in similar tasks in future.
  e.g. Samuel’s checkers player.

Slide 14 of 21

ATTRIBUTE TYPES
Levels of Measurement
Statisticians classify variables by what they term “level of measurement”.
There are three main groups:
Nominal
  The values taken by nominal variables simply define a mutually exclusive set of categories.
  No other relationship is assumed between members of the value set.
  For example: nationality, favourite colour, make of car.
  In machine learning these are often termed categorical attributes.
Ordinal
  The values of ordinal variables are totally ordered so they may be ranked, but there is no other numerical significance to the intervals between them.
  For example: degree classification.
  In machine learning it is common to treat ordinals as if they were either nominal or interval attributes.
Interval-Ratio
  Interval-ratio variables have the properties of ordinal variables and, in addition, the interval between values has an arithmetic meaning.
  Consequently, arithmetic operations may be applied to them.
  For example: age, number of siblings, income.
  In machine learning these are often called continuous or numeric attributes.

Slide 15 of 21

WHAT IS DATA MINING?
It is not easy to provide a concise definition of data mining because:
  the field is new and steadily evolving
  practitioners, researchers and authors differ on the precise boundaries they draw around the area.
It is easier to approach the subject in terms of the types of problems it is intended to address.
The Problem/Challenge
In the last few decades computing technology has enabled organizations to accumulate huge quantities of data about the domains in which they operate.
These huge archives are the result of a major investment of resources.
The archives may contain much potentially valuable information about the relationships between objects represented in the data.
In the past, these archives have been underutilised because of the difficulty of discovering such relationships.
A Broad Definition
Data mining is the development and application of computer tools to assist in the discovery of useful or interesting relationships in large databases.

Slide 16 of 21

Some Examples
Marketing
A businessman who knows his customer well enough to know what his customer wants will be a successful businessman.
Marketing costs contribute a significant fraction to the price of many products.
Much of this money is wasted in that it goes to people unlikely to buy the product.
If you want to sell luxury cars, it is probably stupid to:
  Send mailshots to districts where the average income is very low.
  Put TV ads out during children’s Saturday morning cartoons.
The more you know about the type of people who might be likely to buy your product, the more effectively you can use your marketing budget.
The above examples of bad marketing are obvious but would it be better to:
  Send mailshots to affluent suburbs or rural areas?
  Put TV ads out during football matches or news programmes?
Companies have quite a lot of information about the type of people who buy their products but they would benefit from even more.
Hence the “loyalty card” in supermarkets and other chain stores.

Slide 17 of 21

Loyalty Cards
These cards supply vast amounts of information about the products customers buy.
In particular they tell the company what combinations of products are frequently bought.
This provides an opportunity for cross selling.
Exploration of transaction data to discover groups of products purchased together is known as market basket analysis or affinity grouping.
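As a rough illustration of market basket analysis, the following Python sketch counts how often pairs of products appear in the same basket (their support) and estimates how often one product is bought given that another is (the confidence of a rule). The transactions, the 0.3 support threshold and the function names are all invented for illustration; they do not come from real loyalty card data or from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card transactions; each set is one shopping basket.
transactions = [
    {"paint", "paintbrush", "sandpaper"},
    {"paint", "paintbrush"},
    {"beer", "diapers", "crisps"},
    {"beer", "diapers"},
    {"paint", "beer"},
    {"diapers", "milk"},
]


def pair_supports(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


MIN_SUPPORT = 0.3  # arbitrary threshold chosen for this toy data
supports = pair_supports(transactions)
frequent_pairs = {pair for pair, s in supports.items() if s >= MIN_SUPPORT}

for pair in sorted(frequent_pairs):
    print(pair, round(supports[pair], 2), round(confidence(transactions, *pair), 2))
```

Counting every pair directly is only practical for small product ranges; real tools prune the search, but the quantities being estimated are the same.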
Of course, many such affinities are obvious.
  e.g. People who buy paint often buy paintbrushes.
But many are much more surprising:
A Much Quoted Example
A US supermarket applied data mining techniques to loyalty card transaction data and found:
  On Thursdays and Fridays customers frequently purchase diapers (UK nappies) and six-packs of beer.
Of course, once the rule is discovered it is often easy to find an explanation.
  e.g. Couples with babies cannot go out so easily so are more likely to spend Saturday evenings in, drinking beer in front of the TV.
Nevertheless this relationship was not deduced (cf. explanation based learning).
The supermarket concerned arranged that beer should be displayed near the diapers and hence boosted its beer sales.

Slide 18 of 21

Bioinformatics
The Problem
The human genome project and similar undertakings are generating vast amounts of information about the DNA sequences of humans and animals.
This information is even less useful than a core dump of a huge computer program unless it can be interpreted.
It is necessary to locate genes and translate the DNA sequences into the corresponding sequences of amino acids, that is, the proteins they encode.
Unfortunately, knowing the sequence of amino acids doesn’t tell you much about a protein’s shape and hence its function.
The Solution
Suppose you already know the function (and possibly the structure) of a family of proteins.
You can use data mining or statistical techniques to discover the general characteristics of the sequences of all known members of that protein family.
For example, you might construct a Hidden Markov Model.
This provides you with a “template” of the typical member of the family.
This template can then be matched against a database of protein sequences to discover other proteins that are likely to be members of the same family and hence have similar functions.
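The template idea can be illustrated with something much simpler than a Hidden Markov Model. The Python sketch below swaps in a position-specific frequency profile: it assumes the family sequences are already aligned and of equal length, and unlike an HMM it cannot handle insertions or deletions. The sequences, the candidate names and the pseudocount of 1 are made up for illustration and are not real protein data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def build_profile(family, pseudocount=1.0):
    """Per-position amino-acid probabilities from equal-length family sequences."""
    length = len(family[0])
    if any(len(seq) != length for seq in family):
        raise ValueError("this sketch assumes aligned, equal-length sequences")
    total = len(family) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in family)
        # The pseudocount keeps unseen residues from getting probability zero.
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile, seq):
    """Log-likelihood ratio of seq under the profile versus a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log(column[aa] / background) for column, aa in zip(profile, seq))


# Made-up toy sequences standing in for a known protein family and a database.
family = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
database = {"candidate_1": "MKVLA", "candidate_2": "WWPDC", "candidate_3": "MRILG"}

profile = build_profile(family)
scores = {name: log_odds_score(profile, seq) for name, seq in database.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A positive score means the sequence looks more like the family than like random residues; candidates would normally be reported only above some chosen score threshold.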
Slide 19 of 21

So What Is Data Mining?
Statisticians draw a distinction between:
Confirmatory Data Analysis
  The objective is to determine whether or not a set of data supports a given hypothesis.
  Example: Analysing a set of opinion poll data to confirm the hypothesis that women are more likely than men to vote Conservative.
Exploratory Data Analysis
  The objective is to discover relationships in a set of data.
  Example: Investigating a set of opinion poll data and discovering that a greater proportion of women than men said they would vote Conservative.
The distinction is not rigid: many tools can be used for both.
Traditional statistics has placed a greater emphasis on confirmatory than exploratory analysis.
An Alternative Definition
Data mining could be defined as the use of computer tools for exploratory data analysis in large data sets.
As such it draws on three parent areas:
  Machine Learning
  Statistics
  Visualization

Slide 20 of 21

Stages in Data Mining
The application of appropriate statistical, machine learning and/or visualization techniques is actually only a small part of most data mining projects.
Typically it takes less than 10% of the effort.
Main Stages
  1. Define the problem
  2. Identify the relevant sources of data
  3. Access the selected data sources
  4. Combine the data sources
  5. Apply appropriate data mining tools
  6. Inspect results and, if necessary, return to an earlier stage.
In practice stages 2, 3 and 4 dominate the work because:
  Data is always in mutually incompatible formats (Murphy’s Law [1]).
  None of these formats are suitable for the data mining tools you plan to use (Murphy’s Law again).
  The data will contain inconsistent, incomplete or erroneous elements (Murphy’s Law yet again).
All of this means a major effort in data transformation and cleaning is often needed before the actual data mining can begin.
[1] Murphy’s Law: “If anything could possibly go wrong, it will.”

Slide 21 of 21

Data Mining Tools
A large proportion of commercial data mining is done using specialised packages providing a variety of alternative procedures that can be applied to the data.
Typically they fall into one of two groups:
Visualization Tools
  These provide a range of techniques for visual display of data points.
  For example, 3D rotatable scatter plots.
  Such tools enable the user to exploit the powerful pattern recognition capabilities of the eye to detect regularities.
Analytic Tools
  These provide a range of machine learning and statistical techniques.
  Typical facilities may include:
    Data transformation routines
    Simple visualization tools
    Naïve Bayes Classifier
    Decision tree induction
    Back propagation neural nets
    Kohonen nets
    Multiple regression
    A graphical programming language that enables the user to construct sequences of operations using icons and a mouse.
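To make one of the facilities listed above concrete, here is a minimal Python sketch of a Naïve Bayes classifier over nominal attributes. It is not the implementation used by any particular package. The training data (district income and area type against whether a mailshot led to a purchase) is invented, and the use of Laplace smoothing is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values seen
    for example, label in zip(examples, labels):
        for i, value in enumerate(example):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, example):
    """Return the class with the highest (log) posterior score for example."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior for this class
        for i, value in enumerate(example):
            # Laplace smoothing (an assumption here) avoids zero probabilities.
            count = value_counts[(i, label)][value]
            score += math.log((count + 1) / (n + len(values_seen[i])))
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Invented toy data: (district income, area type) -> did the mailshot lead to a sale?
examples = [
    ("high", "suburb"), ("high", "suburb"), ("high", "rural"),
    ("low", "suburb"), ("low", "rural"), ("high", "rural"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train_naive_bayes(examples, labels)
print(predict(model, ("high", "suburb")))
print(predict(model, ("low", "rural")))
```

Both attributes here are nominal in the sense of the levels of measurement described earlier; numeric attributes would need either discretisation or a density estimate per class.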