Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Machine Learning Definition 1 – “The subfield of AI concerned with programs that learn from experience” – Russell / Norvig, AIMA Definition 2 – “the application of induction algorithms, which is one step in the knowledge discovery process.” – Machine Learning definition in glossary from Machine Learning at http://robotics.stanford.edu/~ronnyk/glossary.html David R. Musicant Supervised Learning Classification Example: Cancer diagnosis Patient ID # of Tumors Avg Area Avg Density Diagnosis 1 5 20 118 Malignant 2 3 15 130 Benign 3 7 10 52 Benign 4 2 30 100 Malignant Use this training set to learn how to classify patients where diagnosis is not known: Patient ID # of Tumors Avg Area Avg Density Diagnosis 101 4 16 95 ? 102 9 22 125 ? 103 1 14 80 ? Input Data Training Set Test Set Classification The input data is often easily obtained, whereas the classification is not. David R. Musicant Classification Problem Goal: Use training set + some learning method to produce a predictive model. Use this predictive model to classify new data. Sample applications: Application Medical Diagnosis Input Data Noninvasive tests Optical Character Recognition Protein Folding Scanned bitmaps Research Paper Acceptance David R. Musicant Classification Results from invasive measurements Letter A-Z Amino acid construction Protein shape (helices, loops, sheets) Words in paper title Paper accepted or rejected Application: Breast Cancer Diagnosis Research by Mangasarian,Street, Wolberg David R. Musicant Breast Cancer Diagnosis Separation Research by Mangasarian,Street, Wolberg David R. Musicant Application: Document Classification The Federalist Papers – Written in 1787-1788 by Alexander Hamilton, John Jay, and James Madison to persuade residents of the State of New York to ratify the U.S. Constitution – All written under the pseudonym “Publius” Who wrote which of them? – Hamilton wrote 56 papers – Madison wrote 50 papers – 12 disputed papers, generally understood to be written by Hamilton or Madison, but not known which Research by Bosch, Smith David R. Musicant Federalist Papers Classification Graphic by Fung David R. Musicant Research by Bosch, Smith Application: Face Detection Training data is a collection of Faces and NonFaces Rotation and Mirroring added in to provide robustness Image obtained from work by Osuna, Freund, and Girosi at http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html David R. Musicant Face Detection Results Image obtained from work by Osuna, Freund, and Girosi at http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html David R. Musicant Nearest Neighbor Simple effective approach for supervised learning problems Patient ID # of Tumors Avg Area Avg Density Diagnosis 1 5 20 118 Malignant 2 3 15 130 Benign 3 7 10 52 Benign 4 2 30 100 Malignant Envision each example as a point in ndimensional space – Picture with 2 of them Classify test point same as nearest training point (Euclidean distance) David R. Musicant k-Nearest Neighbor Nearest Neighbor can be subject to noise – Incorrectly classified training points – Training anomalies k-Nearest Neighbor – Find k nearest training points (k odd) and vote on which classification Training time? Testing time? Works on numerical data David R. Musicant