9/5/16
Comp 135
Introduction to Machine Learning
and Data Mining
Fall 2016
Professor: Roni Khardon
What is Machine Learning?
•  Have/collect some “data”
•  Analyze it to produce “knowledge/insight”
•  Use that knowledge for some “task”
Computer Science
Tufts University
COMP 135
What is Machine Learning?
•  Traditionally this is partitioned into:
–  supervised learning
–  unsupervised learning
–  reinforcement learning
•  But many novel forms exist, and more are being discovered, invented, and developed
Roni Khardon, Tufts University
Supervised Learning Applications
Domain/problem | Classes/Labels
Character recognition | Alphabet
Face Recognition | Male/Female; Specific People
Spam filtering | Spam or not
Document Classification | News Heading
Structure activity relation of molecules | Carcinogenic or not; Active drug or not
Protein fold prediction | Fold types
Astronomy: sky survey | Star types
Supervised Learning Applications
In many applications the response variable is numerical and not categorical. These are known as regression problems.
Domain/problem | Response Variable
Weather | Temperature tomorrow
Health Informatics | Number of Patients with Influenza
Environmental Engineering | Soil contamination level
Commerce | Product demand
Unsupervised Learning
Clustering is often a form of data exploration
allowing us to identify groupings that are otherwise
not apparent
Domain/problem | Groups
Gene-array data | Similar activity patterns
Text | Word Classes
Customer Activity | Customer “types” (phone; web; movies; etc)
Unsupervised Learning
Association rules capture nuggets of imperfect prediction rules
Domain/problem | Associations
Market-basket Records | Nappy & Milk → Bread (10% / 80%); Nappy & Milk → Beer (10% / 75%)
Census Data | (Age < 16) → Not in Army (X% / 100%); (Age > 30) & (Boston) → (has BA/BSc) (X% / Y%)
Reinforcement Learning
Agent can control environment (take actions) and gets occasional rewards.
Goal is to maximize long term reward.
Domain/problem | Aim of Policy
AI Planning/Robot Control | Plan course of actions to achieve explicit or implicit goal.
(Computer) Games | Auto-player
City Management | Traffic Control
Web site management | Contents/advertising policy
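Returning to the association rules: the slide attaches a pair of percentages such as (10% / 80%) to each rule without defining them. A minimal sketch follows, assuming the first number is the fraction of records that match the rule's left-hand side and the second is the rule's confidence (the fraction of those records that also match the right-hand side). The function names `coverage` and `confidence` and the basket records are made up for illustration and are not from the course material.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def coverage(baskets: List[Basket], lhs: Iterable[str]) -> float:
    """Fraction of baskets that contain every item on the rule's left-hand side."""
    lhs_set = frozenset(lhs)
    return sum(1 for b in baskets if lhs_set <= b) / len(baskets)


def confidence(baskets: List[Basket], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Among baskets matching the left-hand side, the fraction that also contain the right-hand side."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    matching = [b for b in baskets if lhs_set <= b]
    if not matching:
        return 0.0  # rule never applies, so report zero confidence
    return sum(1 for b in matching if rhs_set <= b) / len(matching)


# Hypothetical market-basket records (assumption: not real data).
BASKETS: List[Basket] = [
    frozenset({"nappy", "milk", "bread"}),
    frozenset({"nappy", "milk", "bread", "beer"}),
    frozenset({"nappy", "milk", "beer"}),
    frozenset({"nappy", "milk", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread"}),
    frozenset({"beer"}),
    frozenset({"milk"}),
    frozenset({"bread", "beer"}),
    frozenset({"nappy"}),
]

if __name__ == "__main__":
    lhs = {"nappy", "milk"}
    for rhs in ({"bread"}, {"beer"}):
        print(sorted(lhs), "->", sorted(rhs),
              f"({coverage(BASKETS, lhs):.0%} / {confidence(BASKETS, lhs, rhs):.0%})")
```

A rule with confidence below 100% is exactly the kind of "imperfect prediction rule" the slide describes: it holds for most, but not all, of the records it applies to.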
Novel Forms of ML
•  Supervised Learning
•  Constrained Clustering
•  Active Learning
•  Collaborative Filtering
•  Collective Classification
•  Many more …
You can think about more forms and whether they are feasible
•  We will focus for a while on supervised
learning.
•  Is this really possible?
•  When?
•  Why?
•  How?
Supervised Learning
[Diagram: Application → Training Data → Learning Algorithm → Classifier; New Data → Classifier → Predictions of Labels for new data]
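The flow on this slide (training data goes into a learning algorithm, which outputs a classifier, which labels new data) can be sketched in a few lines. This is only an illustration under stated assumptions: the learner is a trivial majority-class baseline chosen for brevity, not one of the course's algorithms, and the names `majority_learner`, `training_data`, and `new_data` and the spam examples are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]                 # one example: feature name -> value
Classifier = Callable[[Example], str]    # maps a new example to a predicted label


def majority_learner(training_data: List[Tuple[Example, str]]) -> Classifier:
    """Learning algorithm: return a classifier that always predicts the most common training label."""
    label_counts = Counter(label for _, label in training_data)
    majority_label = label_counts.most_common(1)[0][0]

    def classifier(example: Example) -> str:
        # This baseline ignores the example's features entirely.
        return majority_label

    return classifier


# Hypothetical labeled training data (assumption: made up for illustration).
training_data = [
    ({"subject": "win money now"}, "spam"),
    ({"subject": "lecture notes"}, "not spam"),
    ({"subject": "meeting at noon"}, "not spam"),
]
# Hypothetical unlabeled new data.
new_data = [{"subject": "free prize"}, {"subject": "homework 1"}]

classifier = majority_learner(training_data)        # Training Data -> Learning Algorithm -> Classifier
predictions = [classifier(x) for x in new_data]     # New Data -> Classifier -> Predictions

if __name__ == "__main__":
    print(predictions)
```

The point of the sketch is the separation of roles: the learning algorithm is a function from training data to a classifier, and the classifier is itself a function from a new example to a label.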
Data Representation
•  What does it look like?
•  Raw data depends on the application and our interpretation of it.
•  Let’s look at 4 cases:
–  Toy example: PlayTennis
–  Soybean data
–  Mutagenesis molecule data
–  Astrophysics time series data
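As a rough illustration of what a table-of-features representation might look like for the toy PlayTennis case, the sketch below turns raw rows into (features, label) pairs. The slide only names the dataset; the feature names and the four rows here follow the commonly used textbook version of PlayTennis and should be read as assumptions, as should the helper name `to_examples`.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed feature names (not given on the slide).
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

# A few rows in the style of the textbook PlayTennis toy data (assumption).
# The last column is the label: whether tennis was played that day.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
]


def to_examples(rows: Sequence[Sequence[str]],
                features: List[str]) -> List[Tuple[Dict[str, str], str]]:
    """Split each row into a feature dictionary and its label (the last column)."""
    examples = []
    for row in rows:
        *values, label = row
        examples.append((dict(zip(features, values)), label))
    return examples


examples = to_examples(ROWS, FEATURES)

if __name__ == "__main__":
    for feature_values, label in examples:
        print(feature_values, "->", label)
```

The other cases on the list (molecules, time series) do not come as a ready-made table, which is why the choice of representation depends on how the raw data is interpreted.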
Classifier
•  What a classifier does:
–  Input: new example
–  Output: new example’s predicted label
•  How should we represent the classifier?
Learning Algorithm
•  Given data representation (e.g., as a table of features/attributes)
•  How can we analyze the data to produce a classifier that produces correct predictions?
Evaluation
•  Evaluating the Classifier (the output of the algorithm) for a specific data/application
•  Evaluating the Learning Algorithm that should be effective for any/many applications.
Learning Algorithm
•  In the next few lectures we will discuss 4 different algorithms:
–  Nearest Neighbors
–  Decision Trees
–  Linear classifiers
–  Bayesian Classifier
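Of the algorithms listed, nearest neighbors is probably the easiest to sketch, and it also gives a concrete case of evaluating a classifier on examples it was not trained on. The code below is a minimal sketch under several assumptions that are not from the slides: features are numeric, distance is Euclidean, only the single nearest neighbor is used, and the training and test points are made up. It is not necessarily the formulation the course develops.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Labeled = Tuple[Point, str]


def nearest_neighbor_predict(train: List[Labeled], x: Point) -> str:
    """Return the label of the training point closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda example: math.dist(example[0], x))
    return label


def accuracy(train: List[Labeled], test: List[Labeled]) -> float:
    """Fraction of held-out test examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if nearest_neighbor_predict(train, x) == y)
    return correct / len(test)


# Hypothetical 2-D data (assumption: made up for illustration).
TRAIN: List[Labeled] = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]
# Held-out examples, kept separate from TRAIN so the score reflects new data.
TEST: List[Labeled] = [
    ((1.2, 1.1), "A"),
    ((5.5, 5.0), "B"),
    ((3.0, 3.0), "B"),  # closer to an "A" training point, so it is misclassified
]

if __name__ == "__main__":
    print(f"held-out accuracy: {accuracy(TRAIN, TEST):.2f}")
```

Measuring accuracy on one held-out set evaluates this particular classifier for this particular data; judging the learning algorithm itself would require repeating the exercise across many datasets or applications.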