Machine Learning
Documentation Initiative
Workshop on the Modernisation of Statistical Production
Topic iii) Innovation in technology and methods driving opportunities for modernisation
Kenneth Chu and Claude Poirier
Geneva, Switzerland, 15-17 April 2015
What is Machine Learning (ML)?
Application of artificial intelligence in which
algorithms use available information to process
(or assist the processing of) statistical data
Coding
Editing
Linkage
Collection
• 20 applications were reported.
Statistics Canada • Statistique Canada
2017-05-22
Why should we consider ML?
▪ Relatively new discipline of computer science
• No need for probabilistic models
• Less stringent assumptions, well suited to the Big Data era
▪ National statistical offices (NSOs) should all explore the use of ML
Classes of ML
SUPERVISED ML
▪ Ex.1: Logistic regression [statistics]
• Training data: binary response (0/1) and predictors
• Maximum likelihood estimation yields the model parameters
• The resulting model is used to predict responses
▪ Ex.2: Support Vector Machines [non-statistics]
• Training data: binary response (0/1) and predictors
• Hyperplanes in the space of predictors separate the responses
• The SVM optimisation problem comes from geometry
▪ Other examples: decision trees, neural networks, Bayesian networks
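The logistic regression example above can be sketched in a few lines. This is only an illustration under stated assumptions: a single predictor, made-up training data, and plain gradient ascent on the log-likelihood instead of the Newton-type solvers that statistical packages normally use. The names fit_logistic and predict are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Estimate intercept b0 and slope b1 by maximising the Bernoulli
    log-likelihood with gradient ascent (one predictor only)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient contribution of one record
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def predict(b0, b1, x):
    """Predict the binary response by thresholding the fitted probability at 0.5."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0

# Hypothetical training data: binary response (0/1) and one predictor.
# The classes overlap slightly, so the maximum likelihood estimate is finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
low_prediction = predict(b0, b1, 0.5)
high_prediction = predict(b0, b1, 4.0)
```

An SVM would use the same kind of training data but choose the separating hyperplane by maximising a geometric margin, with no likelihood involved.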
Classes of ML
UNSUPERVISED ML
▪ Ex.1: Principal Component Analysis [statistics]
• PCA summarizes a set of data by finding orthogonal sub-spaces that represent most of the variation
• There is no response variable in this setting
▪ Ex.2: Cluster Analysis [non-statistics]
• CA seeks to determine groupings in the given data
• Again, there is no response variable in this setting
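To make the PCA example concrete, here is a minimal sketch for two-dimensional data. It assumes made-up points and uses power iteration on the sample covariance matrix to find the first principal component. The function name first_principal_component is hypothetical, and real applications would rely on a linear algebra library.

```python
import math

def first_principal_component(points, iters=200):
    """Return the unit direction of greatest variance and the share of
    total variance it explains, for a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)

    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)

    # Variance along v, divided by the total variance (trace of the matrix).
    variance_along_v = (v[0] * (sxx * v[0] + sxy * v[1])
                        + v[1] * (sxy * v[0] + syy * v[1]))
    return v, variance_along_v / (sxx + syy)

# Hypothetical data lying close to the line y = x; no response variable.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, share = first_principal_component(points)
```

In two dimensions the second component is the direction orthogonal to the first, (-direction[1], direction[0]), and it carries the remaining share of the variance.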
Applications
▪ Automated Coding
• Bayesian classifier (Germany): Occupation coding
• CASCOT (United Kingdom): Occupation coding
• Indexing utility (Ireland): Individual consumption
• SVM (New Zealand): Occupation and Qualification
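As an illustration of how a Bayesian classifier can assign codes to free-text responses, the sketch below trains a multinomial naive Bayes model with add-one smoothing. It is not the method of any of the systems listed above; the training descriptions, the codes TEACH, NURSE and DRIVER, and the function names train and classify are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per code from (description, code) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        class_counts[code] += 1
        for word in text.lower().split():
            word_counts[code][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the code with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(class_counts):
        score = math.log(class_counts[code] / total)  # log prior
        denom = sum(word_counts[code].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[code][word] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code

# Hypothetical training data: occupation descriptions with made-up codes.
examples = [
    ("primary school teacher", "TEACH"),
    ("secondary school teacher", "TEACH"),
    ("registered nurse", "NURSE"),
    ("hospital nurse", "NURSE"),
    ("truck driver", "DRIVER"),
    ("bus driver", "DRIVER"),
]
model = train(examples)
code_a = classify("bus driver", model)
code_b = classify("nurse in a hospital", model)
```

In production coding, responses whose best score falls below a chosen threshold would typically be routed to manual coders; that step is omitted here.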
Applications
▪ Data Editing
• Bayesian Networks (Eurostat): Voting intentions
• Classification Trees (Portugal): Foreign trade data
• Cluster Analysis (USA): Census of agriculture
• CART (New Zealand): Census of population
• Random Forests (New Zealand): Donor imputation
• Association Analysis (New Zealand): Edit rules
Applications
▪ Record Linkage
• Unlike both coding and editing
• Quality of linkages depends more on pre-processing than on matching
• No applications of Machine Learning in official statistics were listed
Applications
▪ Other areas – Data collection
• Classification Tree (USA): Non-response prediction
• Classification Tree (USA): Reporting errors
• Naïve Bayes text mining (Italy): Web scraping
• K-nearest neighbours (Hungary): Tax audit
• Image Processing (Canada): Remote sensing
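For readers unfamiliar with k-nearest neighbours, the following sketch shows the basic idea: a new record receives the majority label of the k closest labelled records. The two scaled features, the labels and the function name knn_predict are hypothetical and do not describe the tax audit application listed above.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours
    (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled records: two features scaled to [0, 1].
training = [
    ((0.10, 0.20), "no audit"),
    ((0.20, 0.10), "no audit"),
    ((0.15, 0.30), "no audit"),
    ((0.80, 0.90), "audit"),
    ((0.90, 0.70), "audit"),
    ((0.85, 0.95), "audit"),
]

label_high = knn_predict(training, (0.90, 0.80))
label_low = knn_predict(training, (0.10, 0.10))
```

With two labels, an odd k avoids tied votes.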
Concluding remarks
▪ Several machine learning applications were reported
▪ Gap in the area of record linkage
▪ Attention required outside statistical paradigms
▪ Next: Applying Machine Learning to Big Data
• Will this be possible only on a case-by-case basis?
Thank you
▪ For more information, please contact:
[email protected]