Universitas Singaperbangsa Karawang
Lecture on
Machine Learning
Ito Wasito
Faculty of Computer Science
University of Indonesia
4 April 2017
Outline & Content
• Why Machine Learning
• What is machine learning?
• Learning system model
• Training and testing
• Performance
• Algorithms
• Machine learning structure
• What are we seeking?
• Learning techniques
• Future Opportunities
• Conclusion
Why Machine Learning?
• Recent progress in algorithms
and theory
• Growing flood of online data
• Computational power is
available
• Budding industry
Three Niches for Machine Learning
Big Data
Application of Data Mining
Training Data

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree
Splitting Attributes: Refund, MarSt, TaxInc

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
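The decision tree above can be written out as a small function and checked against the ten training records. This is only a sketch: the function name `predict_cheat` is hypothetical, record 6 is assumed to carry the 60K income (the one income value not attached to a row in the transcript), and an income of exactly 80K is assumed to fall on the YES branch, since the slide only shows "< 80K" and "> 80K".

```python
# Training records from the slide:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Cheat)
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),   # assumption: 60K belongs to record 6
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def predict_cheat(refund, marital_status, taxable_income_k):
    """Follow the tree: Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is sent to the YES branch.
    return "No" if taxable_income_k < 80 else "Yes"


# Fraction of training records the tree gets wrong.
errors = sum(
    predict_cheat(refund, marital, income) != cheat
    for _, refund, marital, income, cheat in TRAINING_DATA
)
training_error = errors / len(TRAINING_DATA)
print("training error:", training_error)
```

The tree reproduces every label in the training table, which says nothing yet about how it behaves on unseen records.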
Programs Too Difficult to Program by Hand
Software that Customizes to User
http://scele.cs.ui.ac.id/
Machine Learning Definition
Machine Learning is the study of
computer algorithms that
improve automatically through
experience (T. Mitchell).
What is Machine Learning?
• Study of algorithms that
• improve their performance P
• at some task T
• with experience E
Well-defined learning task: <P,T,E>
Learning to detect objects in images
*) Proposed by Leslie Valiant
Function Approximation
Decision Tree
Decision Tree Learning
Decision Tree Learning (2)
Learning system model
[Diagram: Input Samples, Learning Method, and System blocks, with Training and Testing phases]
Training and Testing
[Diagram: data acquisition and practical usage; universal set (unobserved), training set (observed), testing set (unobserved)]
Training and Testing
• Training is the process of making the system
able to learn.
– Training set and testing set come from the same
distribution
– Need to make some assumptions or bias
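A minimal sketch of how a data set might be divided into a training set and a testing set. The helper name `train_test_split`, the 30% test fraction, and the fixed seed are illustrative assumptions. Shuffling before splitting is one simple way to make both sets resemble draws from the same distribution.

```python
import random


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the samples, then cut it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(samples)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))          # hypothetical stand-in for real records
train_set, test_set = train_test_split(samples)
print(len(train_set), "training samples,", len(test_set), "testing samples")
```

The testing set is held back during training and used only to estimate how the learned system behaves on unobserved data.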
Performance
• There are several factors affecting the performance:
– Types of training provided
– The form and extent of any initial background
knowledge
– The type of feedback provided
– The learning algorithms used
• Two important factors:
– Modeling
– Optimization
Algorithms
• The success of a machine learning system also
depends on the algorithms.
• The algorithms control the search to find and
build the knowledge structures.
• The learning algorithms should extract useful
information from training examples.
Algorithms
• Supervised learning
– Prediction
– Classification (discrete labels), Regression (real values)
• Unsupervised learning
– Clustering
– Probability distribution estimation
– Finding association (in features)
– Dimension reduction
• Semi-supervised learning
• Reinforcement learning
– Decision making (robot, chess machine)
Algorithms
[Figure: three panels titled Unsupervised learning, Supervised learning, and Semi-supervised learning]
Machine learning structure
• Supervised learning
Machine learning structure
• Unsupervised learning
What are we seeking?
• Supervised: low E-out, or maximize probabilistic terms
– E-in: error on the training set
– E-out: error on the testing set
• Unsupervised: minimum quantization error, minimum
distance, MAP (maximum a posteriori), MLE (maximum likelihood estimation)
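As a rough illustration of E-in and E-out, the sketch below measures the error rate of one fixed hypothesis on a training set and on a testing set. The data points, the threshold 2.5, and the names `error_rate` and `hypothesis` are all made up for the example. The testing-set error is only an estimate of performance on unobserved data.

```python
def error_rate(predict, samples):
    """Fraction of (x, y) pairs for which predict(x) differs from y."""
    mistakes = sum(1 for x, y in samples if predict(x) != y)
    return mistakes / len(samples)


# Hypothetical 1-D samples: (feature value, label in {-1, +1}).
train = [(1.0, -1), (2.0, -1), (3.0, +1), (4.0, +1)]
test = [(0.5, -1), (1.5, -1), (2.4, +1), (3.5, +1)]


def hypothesis(x):
    """A fixed threshold rule, chosen to fit the training set."""
    return +1 if x > 2.5 else -1


e_in = error_rate(hypothesis, train)    # error on the training set
e_out = error_rate(hypothesis, test)    # error on the testing set
print("E-in:", e_in, "E-out:", e_out)
```

Here the hypothesis fits the training set perfectly yet still misclassifies one testing sample, which is why a low E-in alone is not the goal.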
What are we seeking?
• Under-fitting vs. over-fitting (fixed N)
[Figure: error plot; model = hypothesis + loss functions]
Learning techniques
• Supervised learning categories and techniques
– Linear classifier (numerical functions)
– Parametric (Probabilistic functions)
• Naïve Bayes, Gaussian discriminant analysis (GDA), Hidden
Markov models (HMM), Probabilistic graphical models
– Non-parametric (Instance-based functions)
• K-nearest neighbors, Kernel regression, Kernel density
estimation, Local regression
– Non-metric (Symbolic functions)
• Classification and regression tree (CART), decision tree
– Aggregation
• Bagging (bootstrap + aggregation), Adaboost, Random forest
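To make "bootstrap + aggregation" concrete, here is a small sketch of bagging. The base learner is a hypothetical 1-D threshold rule placed halfway between the two class means. It stands in for whatever classifier is actually bagged, and the data values are invented for the example.

```python
import random


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement."""
    return [rng.choice(samples) for _ in samples]


def train_stump(samples):
    """Base learner: threshold halfway between the class means."""
    pos = [x for x, y in samples if y == +1]
    neg = [x for x, y in samples if y == -1]
    if not pos or not neg:
        # The resample contained a single class: predict it everywhere.
        constant = +1 if pos else -1
        return lambda x: constant
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    direction = 1 if mean_pos > mean_neg else -1
    return lambda x: direction if x > threshold else -direction


def bagging_train(samples, n_models=25, seed=0):
    """Train one base learner per bootstrap resample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(samples, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate by majority vote (an odd n_models avoids ties)."""
    votes = sum(model(x) for model in models)
    return 1 if votes > 0 else -1


# Hypothetical 1-D data: negatives on the left, positives on the right.
data = [(0, -1), (1, -1), (2, -1), (3, -1), (7, 1), (8, 1), (9, 1), (10, 1)]
models = bagging_train(data)
low_prediction = bagging_predict(models, 0.5)
high_prediction = bagging_predict(models, 9.5)
print(low_prediction, high_prediction)
```

Random forest follows the same pattern with decision trees as base learners plus random feature selection; AdaBoost differs in that it reweights samples sequentially instead of resampling independently.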
Learning techniques
• Linear classifier: h(x) = sign(w^T x), where w is a d-dim vector (learned)
• Techniques:
– Perceptron
– Logistic regression
– Support vector machine (SVM)
– Adaline
– Multi-layer perceptron (MLP)
Learning techniques
Using perceptron learning algorithm (PLA)
Training error rate: 0.10
Testing error rate: 0.156
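A minimal sketch of the perceptron learning algorithm for a linear classifier of the form sign(w · x). The toy data set is hypothetical and linearly separable, so it does not reproduce the error rates quoted above. The bias is folded into the weight vector by prepending a constant 1 to every input, and a score of exactly zero is treated as -1; both are conventions assumed here.

```python
def sign(value):
    """Return +1 for positive scores, -1 otherwise (zero counts as -1)."""
    return 1 if value > 0 else -1


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_train(samples, max_epochs=100):
    """samples: list of (x, y) with x a tuple of features, y in {-1, +1}."""
    d = len(samples[0][0])
    w = [0.0] * (d + 1)                      # w[0] is the bias weight
    for _ in range(max_epochs):
        updated = False
        for x, y in samples:
            xa = (1.0,) + tuple(x)           # augmented input
            if sign(dot(w, xa)) != y:        # misclassified: move w toward y * x
                w = [wi + y * xi for wi, xi in zip(w, xa)]
                updated = True
        if not updated:                      # a full pass with no mistakes
            break
    return w


def perceptron_predict(w, x):
    return sign(dot(w, (1.0,) + tuple(x)))


# Hypothetical, linearly separable toy data.
toy = [((2, 3), 1), ((3, 3), 1), ((-1, -2), -1), ((-2, -1), -1)]
w = perceptron_train(toy)
train_errors = sum(perceptron_predict(w, x) != y for x, y in toy)
print("weights:", w, "training errors:", train_errors)
```

On data that is not linearly separable the loop never reaches a mistake-free pass, so `max_epochs` is what stops it.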
Learning techniques
Using logistic regression
Training error rate: 0.11
Testing error rate: 0.145
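A sketch of logistic regression trained by batch gradient descent on the average log-loss. The 1-D data, learning rate, and epoch count are assumptions picked for the example, and they are unrelated to the error rates quoted above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_train(samples, learning_rate=0.3, epochs=5000):
    """samples: list of (x, y) with x a tuple of features, y in {0, 1}."""
    d = len(samples[0][0])
    w = [0.0] * (d + 1)                          # w[0] is the bias weight
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in samples:
            xa = (1.0,) + tuple(x)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xa)))
            for j, xj in enumerate(xa):
                grad[j] += (p - y) * xj          # gradient of the log-loss
        w = [wi - learning_rate * g / n for wi, g in zip(w, grad)]
    return w


def logistic_predict_proba(w, x):
    """Estimated probability that x belongs to class 1."""
    xa = (1.0,) + tuple(x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xa)))


# Hypothetical 1-D data: class 0 on the left, class 1 on the right.
toy = [((0.5,), 0), ((1.0,), 0), ((1.5,), 0),
       ((3.0,), 1), ((3.5,), 1), ((4.0,), 1)]
w = logistic_train(toy)
predictions = [1 if logistic_predict_proba(w, x) > 0.5 else 0 for x, _ in toy]
print("predicted labels:", predictions)
```

Unlike the perceptron, the model outputs a probability, and thresholding it at 0.5 gives the class label.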
Learning techniques
• Non-linear case
• Support vector machine (SVM):
– Linear to nonlinear: Feature transform and
kernel function
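The link between a feature transform and a kernel function can be checked numerically. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)^2 in two dimensions as an example (chosen here for illustration, not taken from the slides) and shows that it equals an ordinary dot product after an explicit transform.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def feature_transform(x):
    """Explicit map from 2-D input to 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Same inner product, computed without building the features."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, 4.0)
explicit = dot(feature_transform(x), feature_transform(z))
implicit = poly2_kernel(x, z)
print(explicit, implicit)
```

A linear classifier in the transformed space is a nonlinear classifier in the original space, and the kernel lets an SVM work there without ever computing the transformed vectors.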
Learning techniques
• Unsupervised learning categories and techniques
– Clustering
• K-means clustering
• Spectral clustering
– Density Estimation
• Gaussian mixture model (GMM)
• Graphical models
– Dimensionality reduction
• Principal component analysis (PCA)
• Factor analysis
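As an example of the clustering techniques listed above, here is a minimal k-means sketch. Using the first k points as initial centroids is a simplification assumed here (random or k-means++ initialization is more common), and the six 2-D points are made up.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iters=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(p) for p in points[:k]]   # assumption: first k points
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:                 # assignments stable: done
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                          # keep old centroid if empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Each iteration cannot increase the total squared distance from points to their assigned centroids, which is the quantization error that k-means minimizes.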
Gaussian-Mixtures Clustering Demo
f(x) = \sum_{k=1}^{K} \alpha_k f_k(x; \theta_k)
[Figure sequence, credited to Padhraic Smyth, UCI: scatter plot "ANEMIA PATIENTS AND CONTROLS" (x-axis: Red Blood Cell Volume; y-axis: Red Blood Cell Hemoglobin Concentration), followed by panels titled EM ITERATION 1, 3, 5, 10, and 15]
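The EM iterations in the demo can be sketched in code. The version below is a simplified assumption-laden illustration: it fits a two-component mixture in one dimension instead of two, the data values are invented (they are not the anemia measurements), and the starting means, variances, and weights are arbitrary. Each component is a Gaussian, and the mixture weights sum to one.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iters=15):
    """Run EM for a 1-D Gaussian mixture; returns updated copies."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)        # floor to avoid collapse
    return means, variances, weights


# Hypothetical 1-D measurements drawn around two centers.
data = [3.4, 3.5, 3.6, 3.5, 3.45, 3.55, 4.1, 4.2, 4.3, 4.2, 4.15, 4.25]
means, variances, weights = em_gmm_1d(
    data, means=[3.0, 4.5], variances=[0.25, 0.25], weights=[0.5, 0.5]
)
print("means:", means, "weights:", weights)
```

With 15 iterations, matching the last panel of the demo, the estimated means settle near the centers of the two groups.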
Future Opportunities
• Learn across full mixed-media data
• Learn across multiple internal databases,
plus the web and newsfeeds
• Learn by active experimentation
• Learn decisions rather than predictions
• Cumulative, lifelong learning
• Programming languages with learning
embedded?
Introduction Materials
• Text Books
– T. Mitchell (1997). Machine Learning, McGraw-Hill Publishers.
– N. Nilsson (1996). Introduction to Machine
Learning (drafts).
• Lecture Notes
– T. Mitchell’s Slides
– Introduction to Machine Learning
Tools
Scikit-learn: scikit-learn.org
WEKA: www.cs.waikato.ac.nz/ml/weka/
Orange: orange.biolab.si
Spark MLlib (big data)