Setting up ML Problem
Rao Vemuri
UC Davis
[email protected]

Definition of ML
• A computer program is said to learn from
  – experience E
  – with respect to some class of tasks T
  – and performance measure P,
  – if its performance at T, as measured by P, improves with experience E.
• Example:
  – T: Playing chess
  – P: Percentage of games won
  – E: Number of games played

Three Different Tasks

Quantifying P
• Percentage of games won
• f(won, lost, draw)
  – Win = 1 point
  – Loss = -1 point
  – Draw = 0 points
• Amount of money won
• Championship titles won

Capturing Experience E: A Data Set

     Outlook   Temperature  Humidity  Windy  Surfing?
  1  Sunny     Mild         Normal    True   Yes
  2  Sunny     Hot          High      False  No
  3  Rainy     Mild         Normal    False  No
  4  Overcast  Cool         High      True   Yes

Attributes or Features
• In the data set above, the columns Outlook, Temperature, Humidity and Windy are the attributes (features); Surfing? is the label.

Features: Medical Example

     Blood Pres.  Cholesterol  Glucose  (unnamed)  Heart Attack?
  1  normal       high         high     True       Yes

Notation: Instance
• Each row of the data set is one instance, e.g. instance 1 is (Sunny, Mild, Normal, True) with label Yes.

Note on Features
• Each instance is described by the same set of features
• The features may be
  – Continuous (e.g. Temperature)
  – Discrete (e.g. Cost in $)
  – Binary (e.g. True/False)
  – Categorical (e.g. Red/Blue/Yellow)

Notation: Data Set
• The data set is the collection of all instances (rows 1 to 4 of the table above).

Training & Test Sets
• Training Set: E (Experience)
  – The last column has labels like YES or NO
  – These labels are either given or inserted by an expert
• Test Set: t
  – The last column has no labels
  – Our job is to find those labels

Hypothesis
• A combination of attributes and our guess as to what the label should be for that combination
  – (Outlook = don't care) ^ (Temperature = Cool) ^ (Humidity = Normal) ^ (Windy = True) is one possible hypothesis.
  – For this hypothesis our machine should answer YES or NO

Types of ML Algorithms
• Supervised: You are given labeled training data. Create a function that fits the data
  – Classification (looking for discrete categories)
  – Regression (looking for a continuous function)
• Unsupervised: You are given unlabeled training data. Discover unknown, but useful, classes
• Reinforcement: The learner is not told which actions to take. It discovers which actions yield the best reward in the long run.
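
The data set, hypothesis and performance measure P introduced above can be tied together in a few lines of Python. This is only a minimal sketch under stated assumptions: the names `covers`, `predict`, `accuracy`, `DONT_CARE` and `WINDY_HYPOTHESIS` are hypothetical, the rule "answer Yes when the hypothesis covers the instance, otherwise No" is one possible reading of the Hypothesis slide, and P is taken to be the fraction of correctly labeled training instances.

```python
# Training set E: the four labeled instances from the surfing table.
TRAINING_SET = [
    ({"Outlook": "Sunny", "Temperature": "Mild", "Humidity": "Normal", "Windy": True}, "Yes"),
    ({"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Windy": False}, "No"),
    ({"Outlook": "Rainy", "Temperature": "Mild", "Humidity": "Normal", "Windy": False}, "No"),
    ({"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "High", "Windy": True}, "Yes"),
]

DONT_CARE = None  # hypothetical marker for a "don't care" attribute


def covers(hypothesis, instance):
    """Return True if every constrained attribute matches the instance."""
    return all(
        wanted is DONT_CARE or instance[name] == wanted
        for name, wanted in hypothesis.items()
    )


def predict(hypothesis, instance):
    """Assumed rule: answer Yes when the hypothesis covers the instance."""
    return "Yes" if covers(hypothesis, instance) else "No"


def accuracy(hypothesis, labeled_instances):
    """Performance measure P: fraction of instances labeled correctly."""
    correct = sum(predict(hypothesis, x) == y for x, y in labeled_instances)
    return correct / len(labeled_instances)


# The hypothesis written on the slide.
SLIDE_HYPOTHESIS = {
    "Outlook": DONT_CARE, "Temperature": "Cool",
    "Humidity": "Normal", "Windy": True,
}

# A second, made-up hypothesis for comparison: only Windy matters.
WINDY_HYPOTHESIS = {
    "Outlook": DONT_CARE, "Temperature": DONT_CARE,
    "Humidity": DONT_CARE, "Windy": True,
}

if __name__ == "__main__":
    print(accuracy(SLIDE_HYPOTHESIS, TRAINING_SET))  # 0.5
    print(accuracy(WINDY_HYPOTHESIS, TRAINING_SET))  # 1.0
```

Under this reading, the slide's hypothesis covers none of the four training instances, so it answers No everywhere and gets half of the labels right, while the made-up Windy-only hypothesis agrees with every label. Four instances are far too few to trust either result; the point is only how a hypothesis is scored against E.
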
WEKA
• W(aikato) E(nvironment) for K(nowledge) A(nalysis)
• Developed by the University of Waikato in New Zealand
• Machine Learning Tools and Techniques in Java
• Comprehensive suite of Java class libraries
  – Implements many state-of-the-art machine learning and data mining algorithms
• http://www.cs.waikato.ac.nz/~ml/index.html

WEKA Consists of…
• Explorer
• Experimenter
• Knowledge Flow
• Simple Command Line Interface
• Java Interface

Explorer
• Is WEKA's main graphical user interface
• Weka package consists of
  – Filters
  – Classifiers
  – Clusterers
  – Associations
  – Attribute Selection
  – Visualization tool

Pre-Processing
• Data loaded from URL or DB
• Preprocessing routines in WEKA are called 'filters'
  – MergeAttributeValuesFilter
  – NominalToBinaryFilter
  – DiscretiseFilter
  – ReplaceMissingValuesFilter

Homework Assignment 1
• Search for WEKA on the Web and write
  – (a) 4 short sentences about what the best features of WEKA are.
  – (b) One sentence on where WEKA is useful.
Assignment Due: A week from today (Sep 13). Write your answers in English and submit on one sheet of paper.
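
As a side note on the Pre-Processing slide, the idea behind two of the listed filters can be sketched in plain Python. This does not use WEKA or reproduce its API: the functions `replace_missing_with_mode` and `nominal_to_binary`, the `MISSING` marker and the `is_` column names are all hypothetical, and the missing Outlook value is made up for illustration.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing value


def replace_missing_with_mode(column):
    """Fill missing nominal values with the most frequent observed value."""
    observed = [value for value in column if value is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if value is MISSING else value for value in column]


def nominal_to_binary(column):
    """Turn one nominal attribute into one 0/1 attribute per category."""
    categories = sorted(set(column))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in column
    ]


# Outlook column of the surfing data, with instance 3 made missing on purpose.
outlook = ["Sunny", "Sunny", MISSING, "Overcast"]

filled = replace_missing_with_mode(outlook)
encoded = nominal_to_binary(filled)

if __name__ == "__main__":
    print(filled)      # ['Sunny', 'Sunny', 'Sunny', 'Overcast']
    print(encoded[3])  # {'is_Overcast': 1, 'is_Sunny': 0}
```

Both steps change only how the attributes are represented, which is why they run as filters before any classifier sees the data.
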