Artificial Intelligence
Machine Learning,
Pattern Recognition,
Data Mining
Dae-Won Kim
School of Computer Science & Engineering
Chung-Ang University
AI Scope
1. Search-based optimization techniques for real-life problems
• Hill climbing, Branch and bound, A*, Greedy algorithm
• Simulated annealing, Tabu search, Genetic algorithm
2. Reasoning: Logic, Inference, and knowledge representation
• Logical language: Syntax and Semantics
• Inference algorithm: Forward/Backward chaining, Resolution, and Expert System
3. Machine Learning/ Pattern Recognition/ Data Mining
• Classification: Bayesian algorithm, Nearest-neighbor algorithm, Neural network
• Clustering: Hierarchical algorithm, K-Means algorithm
4. Uncertainty based on Probability theory
5. Planning, Scheduling, Robotics, and Industry Automation
Have you ever heard of Big Data?
Progress in digital data acquisition
and storage technology has resulted
in the growth of huge databases.
Data mining is the extraction of
implicit, previously unknown, and
potentially useful information from
data.
We build algorithms that sift
through databases automatically,
seeking patterns.
Strong patterns, if found, will likely
generalize to make accurate
predictions on future data.
Algorithms need to be robust
enough to cope with imperfect data
and to extract patterns that are
inexact but useful.
Machine learning provides the
technical basis of data mining.
We will study simple machine
learning methods, looking for
patterns in data.
People have been seeking patterns in
data since human life began.
e.g., Samsung Galaxy and Samsung Pay:
managers at Samsung want to find the
consumption patterns of users so that they
can provide personalized services.
In data mining, computer algorithms
solve problems by analyzing
data in databases.
Data mining is defined as the
process of discovering patterns
(knowledge) in data.
We start with a simple example.
Q: Tell me the name of this fish.
Algorithm ??
We have 100 fish and have measured
their lengths (e.g., fish: x = [length]^T).
Our algorithm can measure the
length of a new fish, and estimate
its label.
Yes, this is a typical prediction task
solved with a classification technique.
But, it is often inexact and
unsatisfactory.
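As a rough illustration of classifying by length alone, the sketch below learns a single cut-off value from labeled fish. It is only one possible reading of the idea, not the lecture's own procedure: the four lengths and labels come from the training table used later in this lecture, and the function names (`count_errors`, `fit_threshold`, `predict`) are hypothetical.

```python
# Minimal sketch: classify fish with a single length threshold.
# Lengths and labels are the four training fish from this lecture's table.
lengths = [70.3, 75.5, 51.1, 49.9]
labels = ["Salmon", "Salmon", "Sea bass", "Sea bass"]


def count_errors(threshold, above_label, below_label):
    """Count training fish misclassified by the rule 'length > threshold'."""
    errors = 0
    for length, label in zip(lengths, labels):
        predicted = above_label if length > threshold else below_label
        if predicted != label:
            errors += 1
    return errors


def fit_threshold():
    """Try each midpoint between sorted lengths, in both label orientations."""
    values = sorted(lengths)
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    first, second = sorted(set(labels))
    best = None
    for t in midpoints:
        for above, below in [(first, second), (second, first)]:
            candidate = (count_errors(t, above, below), t, above, below)
            # Keep the first candidate with the fewest training errors.
            if best is None or candidate[0] < best[0]:
                best = candidate
    return best


errors, threshold, above_label, below_label = fit_threshold()


def predict(length):
    """Estimate the label of a new fish from its length."""
    return above_label if length > threshold else below_label
```

With only four fish the two classes happen to separate cleanly by length. With 100 fish the length ranges of the two species would typically overlap, which is why a single threshold tends to be inexact.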
Next, we measured their lightness.
(e.g., fish: x=[lightness])
Lightness separates the classes better than length.
Let us use both lightness and width.
(e.g., fish: x=[lightness, width])
Each fish is represented as a point
(vector) in 2D x-y coordinate space.
Everything is represented as an N-dimensional
vector in coordinate space.
The world is represented as a matrix.
We assume that you have learned
the basic concepts of linear algebra.
The objective is to find a line that
effectively separates two groups.
How can we find the line using simple
high-school math?
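One way to get a separating line with nothing beyond averages and the equation of a line is to take the perpendicular bisector of the segment joining the two class means. The sketch below assumes this nearest-mean construction, which is a swapped-in illustration and not necessarily the procedure the lecture has in mind. The [lightness, width] values are the four training fish from this lecture's table; the min-max scaling step and all function names are assumptions added so that width, which has much larger values, does not dominate lightness.

```python
# Sketch: a separating line as the perpendicular bisector between class means.
# Features are [lightness, width] of the four training fish in this lecture.
fish = [
    ([10.0, 36.0], "Salmon"),
    ([10.0, 128.0], "Salmon"),
    ([29.0, 164.0], "Sea bass"),
    ([36.0, 113.0], "Sea bass"),
]

# Min-max scaling (an assumption) maps each feature to the range [0, 1].
lows = [min(x[j] for x, _ in fish) for j in range(2)]
highs = [max(x[j] for x, _ in fish) for j in range(2)]


def scale(x):
    return [(x[j] - lows[j]) / (highs[j] - lows[j]) for j in range(2)]


def class_mean(label):
    """Average of the scaled feature vectors belonging to one class."""
    points = [scale(x) for x, y in fish if y == label]
    return [sum(p[j] for p in points) / len(points) for j in range(2)]


mean_salmon = class_mean("Salmon")
mean_bass = class_mean("Sea bass")

# Line w . x + b = 0: w points from the salmon mean to the sea bass mean,
# and the line passes through the midpoint between the two means.
w = [mean_bass[j] - mean_salmon[j] for j in range(2)]
midpoint = [(mean_bass[j] + mean_salmon[j]) / 2 for j in range(2)]
b = -sum(w[j] * midpoint[j] for j in range(2))


def predict(x):
    """Points on the sea bass side of the line get a positive score."""
    s = scale(x)
    score = sum(w[j] * s[j] for j in range(2)) + b
    return "Sea bass" if score > 0 else "Salmon"


training_predictions = [predict(x) for x, _ in fish]
```

On these four fish the line classifies every training pattern correctly. That is not guaranteed in general, since a nearest-mean line ignores how spread out each class is.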
We can build a complex nonlinear
boundary to provide exact separation.
The formal procedure is given as:
This shows a predictive task of data
mining, often called pattern
classification/recognition/prediction.
It is the act of taking in raw data and
taking an action based on the
category of the pattern.
We build a machine that can
recognize or predict patterns.
Q: How to represent and classify texts?
-Opinion mining
-Sentiment analysis
Another well-known task of data mining
is the descriptive task. Cluster analysis
is a well-known group-discovery
technique.
We will explore the basic issues
in the prediction task (pattern
classification) in the forthcoming weeks.
Some terms should be defined.
Given training data set: ‘n x d’ pattern/data matrix
‘d’ features (attributes, variables, dimensions, fields): the columns
‘n’ patterns (objects, observations, vectors, records): the rows

Fish     Lightness   Length   Weight   Width   Class Label
Fish-1   10          70.3     6.0      36      Salmon
Fish-2   10          75.5     8.8      128     Salmon
Fish-3   29          51.1     9.4      164     Sea bass
Fish-4   36          49.9     8.4      113     Sea bass
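To make the ‘n x d’ terminology concrete, the four-fish table can be held in memory as a list of rows plus a separate label list. This is only a sketch in plain Python; the variable names (`X`, `y`, `feature_names`) are conventional choices and not something the lecture prescribes.

```python
# Sketch: the four-fish table as an n x d data matrix plus a label vector.
feature_names = ["Lightness", "Length", "Weight", "Width"]
X = [
    [10.0, 70.3, 6.0, 36.0],    # Fish-1
    [10.0, 75.5, 8.8, 128.0],   # Fish-2
    [29.0, 51.1, 9.4, 164.0],   # Fish-3
    [36.0, 49.9, 8.4, 113.0],   # Fish-4
]
y = ["Salmon", "Salmon", "Sea bass", "Sea bass"]

n = len(X)      # number of patterns (rows)
d = len(X[0])   # number of features (columns)

# One pattern is a row of the matrix; one feature is a column.
fish_3 = X[2]
lightness = [row[feature_names.index("Lightness")] for row in X]
```

Here `n` and `d` are both 4, `fish_3` is the feature vector of Fish-3, and `lightness` is the first column of the matrix.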
Classification
General description
• Supervised pattern classification
• Labeled training patterns; the groups are known a priori
• Constructs rules for classifying new data into the known groups
Specific terms
• Pattern = object = observation, represented as a feature vector
• Distance measure for numeric and categorical data
• Training set (answer database) and test set (new observations)
• Prediction performance by accuracy, sensitivity, specificity, …
• ex) Bayesian classifier, Nearest-neighbor classifier, SVM, NN, LDA, …
Each pattern is represented as a
feature vector.
The training pattern matrix is stored
in a file or database.
Given labeled training patterns, the
class groups are known a priori.
We construct algorithms to classify
new data into the known groups.
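As one example of such an algorithm, the nearest-neighbor classifier named in this lecture assigns a new pattern the label of its closest training pattern. The sketch below assumes Euclidean distance on the raw numeric features and a single neighbor; the training rows come from this lecture's fish table, while the two query fish are hypothetical values made up for illustration.

```python
import math

# Training set: [lightness, length, weight, width] per fish, with labels,
# taken from the four-fish table in this lecture.
train_X = [
    [10.0, 70.3, 6.0, 36.0],
    [10.0, 75.5, 8.8, 128.0],
    [29.0, 51.1, 9.4, 164.0],
    [36.0, 49.9, 8.4, 113.0],
]
train_y = ["Salmon", "Salmon", "Sea bass", "Sea bass"]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def predict_1nn(x):
    """Return the label of the training pattern closest to x."""
    distances = [euclidean(x, t) for t in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]


# Hypothetical new fish (illustrative values, not from the lecture).
new_fish_a = [12.0, 72.0, 7.0, 40.0]
new_fish_b = [33.0, 50.0, 9.0, 120.0]
label_a = predict_1nn(new_fish_a)
label_b = predict_1nn(new_fish_b)
```

Because the features are not rescaled here, width contributes most to the distance; whether that is desirable depends on the data.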
Training data vs. Test data
Training data serve as known answers.
We train (learn) the algorithm using
the training data.
Test data are a set of new unseen
data. We predict class labels using
the learned algorithm.
Training data
# of data
# of features
data index   feature-1   feature-2   …   feature-N   class label
data index   feature-1   feature-2   …   feature-N   class label
…
data index   feature-1   feature-2   …   feature-N   …
Test data
# of data
# of features
data index   feature-1   feature-2   …   feature-N
data index   feature-1   feature-2   …   feature-N
…
data index   feature-1   feature-2   …   feature-N
(class label: to be predicted)
For example, we try to classify the
tumor type of breast cancer patients
Breast-cancer-training.txt
100
30
Patient-1     165   52   …   210   cancer
Patient-2     170   50   …   230   normal
…
Patient-100   160   47   …   250   …
Breast-cancer-test.txt
10
30
Patient-1     163   55   …   215
Patient-2     155   50   …   240
…
Patient-10    165   45   …   235
(predicted class label, e.g., cancer)
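A file in this layout could be read with a few lines of code. The sketch below rests on several assumptions that the slides do not state: fields are separated by whitespace, the first two lines hold the number of patterns and the number of features, and a class label, if present, follows the features on the same row. The function name `read_patterns` and the sample contents are hypothetical.

```python
import io


def read_patterns(stream):
    """Parse the layout: n, d, then n rows of 'index f1 ... fd [label]'."""
    n = int(stream.readline())
    d = int(stream.readline())
    indices, X, y = [], [], []
    for _ in range(n):
        tokens = stream.readline().split()
        indices.append(tokens[0])
        X.append([float(v) for v in tokens[1:1 + d]])
        # Training rows carry a label after the d features; test rows do not.
        y.append(" ".join(tokens[1 + d:]) or None)
    return indices, X, y


# Hypothetical two-patient, three-feature file (illustrative values only).
sample = io.StringIO(
    "2\n"
    "3\n"
    "Patient-1 165 52 210 cancer\n"
    "Patient-2 170 50 230 normal\n"
)
indices, X, y = read_patterns(sample)
```

For a file on disk, the same function could be called on an open file object, for example `with open("Breast-cancer-training.txt") as f: read_patterns(f)`, provided the real file follows the assumed layout. For a test file without labels, every entry of `y` comes back as `None`.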
To evaluate the performance of
prediction algorithms, we need a
performance measure (Accuracy).
                    Gold Standard (Truth)
Prediction Result   Positive         Negative
Positive            True Positive    False Positive
Negative            False Negative   True Negative
                    Suspicious Patients with Breast Cancer
Prediction Result   Positive (Cancer)   Negative (Normal)
Positive (Cancer)   True Positive       False Positive
Negative (Normal)   False Negative      True Negative
Accuracy = (True Positive + True Negative) /
(True Positive + False Positive + False Negative + True Negative)
                    Suspicious Patients with Breast Cancer
Prediction Result   Positive (Cancer)   Negative (Normal)
Positive (Cancer)   30                  5
Negative (Normal)   10                  55
Accuracy = (30 + 55) / (30 + 5 + 10 + 55) = 0.85 (85%)
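The same arithmetic can be checked in code. The sketch below computes accuracy from the four counts in the table, and also the sensitivity and specificity measures mentioned earlier in this lecture, using their standard definitions (the slides themselves give the formula for accuracy only). The function names are hypothetical.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(tp, fn):
    """Fraction of truly positive cases that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that are predicted negative."""
    return tn / (tn + fp)


# Counts from the breast cancer table above.
tp, fp, fn, tn = 30, 5, 10, 55

acc = accuracy(tp, fp, fn, tn)   # (30 + 55) / 100
sens = sensitivity(tp, fn)       # 30 / (30 + 10)
spec = specificity(tn, fp)       # 55 / (55 + 5)
```

For these counts the accuracy is 0.85, the sensitivity is 0.75, and the specificity is about 0.917, so this classifier misses cancer cases more often than it raises false alarms.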
References
# Textbooks
1) R.O. Duda, et al., “Pattern Classification”
2) S. Theodoridis, “Pattern Recognition”
3) T.M. Mitchell, “Machine Learning”
# Advanced Topic:
Emotional data mining
Personalized music recommendation (Nov.)