Download Tensor

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Internal Seminar at OCBE
24 March 2017
Tensor Decompositions
and Applications
Hadi Fanaee Tork
Postdoctoral Researcher at OCBE
[email protected]
1
111 112 113
211 212 213
311 312 313
Convert DBs to
Tensor
111 112 113
Data Analysis
Applications
Starting Points
Why Tensors?
Tensor
Decomposition
Tensor
Tensor Datasets
Software
131 132 133
231 232 233
331 332 333
121 122 123
221 222 223
321 322 323
111 112 113
211 212 213
311 312 313
131 132 133
231 232 233
331 332 333
121 122 123
221 222 223
321 322 323
111 112 113
211 212 213
311 312 313
131 132 133
231 232 233
331 332 333
121 122 123
221 222 223
321 322 323
111 112 113
211 212 213
311 312 313
Tensor
• Geometric object for extension of scalars, vectors and matrices to
higher dimensions
111
Scalar
111 112 113
Vector
111 112 113
211 212 213
311 312 313
Matrix
131 132 133
231 232 233
331 332 333
121 122 123
221 222 223
321 322 323
111 112 113
211 212 213
311 312 313
Tensor
3
Tensor
4
Tensor analysis: A trending topic
Citation to papers dealing with tensor analysis (1964–2011)
M. Kroonenberg , History of Multiway Component Analysis and Three-Way Correspondence Analysis, 2014
5
An interdisciplinary community
•
•
•
•
•
•
•
•
Mathematics: linear algebra and optimization
Pyschometrics: multi-way analysis of behaviors
Chemometrics: multi-way analysis of chemical processes
Signal Processing: multidimensional signal processing
Neuroscience: EEG and fMRI data classification
Computer Vision: image and video data classification
Data Mining: unsupervised knowledge discovery
Machine Learning: dimensionality reduction and feature extraction
6
Why tensor analysis is emerging?
50 years ago
Now
7
Necessity of Tensors
Camera
row
111 112 113
211 212 213
311 312 313
column
Misleading knowledge!
Human Eye
row
131 132 133
231 232 233
331 332 333
121 122 123
221 222 223
321 322 323
111 112 113
211 212 213
311 312 313
depth
column
8
Necessity of Tensors
Misleading knowledge!
9
Tensors in medicine
Characteristics
array
Diagnosis
Region
And many more …
Row
Channel
Syndromic surveillance
(Epidemiology)
Neuroscience (fMRI)
Neuroscience (EEG)
Frequency
Subject
Genes
Subject
Gene expression dynamic
Medical diagnosis
Indicator
Questionnaire data
Column
10
How to generate tensor data?
Example on Syndromic surveillance data
Syndromic surveillance systems : continuously monitoring multiple pre-diagnostic streams of
indicators from different regions with the aim of early detection of disease outbreaks before
laboratory confirmations.
11
Univariate data
• Emergency department visits at Oslo during the past 30 days
Days
12
Multivariate data
• Emergency department visits at Oslo during the past 30 days
Days
• Medication sales over the counter at Oslo during the past 30 days
Days
13
Tensor data
Multiple Indicators through different regions and different sampling times
14
Transforming data to tensor format
Dimension#1
Locations
Dimension#2
Indicators
Dimension#3
Time
Oslo : 1
Bergen: 2
Tromsø: 3
Medic_sales: 1
Dead birds: 2
Schol absence : 3
Work absence : 4
Emergency_visits: 5
3) Tensor
2) Translating to index
Medication Sales at Oslo in 2016/05/01 :
12565 
X(1,1,1)= 12565
13434
12565
Medic_sales
Medication Sales t Oslo in 2016/05/02:
13434 
X(1,1,2)=13434
Indicators
1) Indexing modes:
10377
11
Dead birds
Schol absence
Work absence
Emergency visits
Dead birds at Oslo in 2016/05/01: 11
X(1,2,1)=11
2016/05/01: 1
2016/05/02: 2 Medication Sales at Bergen in
....
2016/05/01: 10377 
2016/05/30: 30 X(2,1,1)= 10377
Oslo Bergen Tromsø
Locations
15
Mode Concept
Indicators
 First law of geography: everything is related to everything
else, but near things are more related than distant things
(Tobler, 1970)
 Time is universal order of changes (Leibinz, 1956)
Region
A major phenomena that
affect individual’s state
(here sales)
16
Tensor Decomposition
Tool PCA?
Do youSingle
remember
SeveralDecomposition
Applications!
Tensor
is extension of PCA for
multi-way(tensor) data
Kroonenberg, Applied Multiway Analysis 2008
Projection of large data to a summarized subspace  Main goal in data mining
17
Tensor Decomposition Models
18
CP (Canonical-PARAFAC)
Canonical Polyadic Form (Hitchcock, 1927)
Parallel Factor Analysis (Harshman, 1970)
Canonical Decomposition (Carrol&Chang, 1970)
≈
Bd
Bd
Ad
Ad
X
19
Tucker
Tucker, 1966
B
X
≈
A
G
C
20
DEDICOM (DEcomposition into DIrectional COMponents)
Harshman, 1978
Modelling asymmetric relationships
T
R
B
X
(IxIxK)
≈
A
B
(D X D)
A
(D X D x K)
(D x I)
(D X D x K)
(I x D)
21
PARAFAC2
Modeling tensors with unequal-length in one dimension
Kiers and Bro, 1999
diagonal matrix
Xk
≈
Ck
A
≈
T
Bk
22
Block Term Decomposition
De Lathauwer, 2008
Sum of low rank Tucker tensors
Br
Br
X
≈
Ar
Gr
Ar
Cr
Gr
Cr
23
RESCAL
Restricted and symmetric Tucker model
Modeling multirelational Data
Nickel et al. [2011]
X
≈
(IxIxK)
24
Coupled Matrix and Tensor Decomposition
Joint decomposition of matrices and tensors
Acar, Kolda, Dunlavy, 2011
25
Applications
• Data Quality Assessment
• Missing value estimation
• Outlier detection
• Exploratory Analysis
•
•
•
•
Factor analysis
Clustering
Trend analysis
Pattern Discovery
• Predictive modelling
• Prediction
• Forecasting
• Data Integration
• Joint analysis of multiple sources of data
26
Data Quality Assessment
27
samples
Original Tensor
F=2
Samples
Subjects
Factor #1
Factor #2
Tensor
Decomposition
Factor #1
Factor #2
Characteristics
131 132 133
231 232 233
331 332 ???
121 ??? 123
221 ??? 223
321 322 323
??? 112 113
211 212 213
311 ??? 313
Factor #1
Factor #2
Characteristics
1. Missing value estimation
Reconstruction
131 132 133
231 232 233
331 332 332
121 121.5 123
221 225 223
321 322 323
110 112 113
211 212 213
311 308 313
Factor Matrices
(Tensor Model)
Predicted values
Reconstructed
Tensor
Acar, Evrim, et al. "Scalable tensor factorizations for incomplete data." Chemometrics and Intelligent Laboratory Systems
106.1 (2011): 41-56.
28
1. Missing value estimation (Example)
Rows
Rows
Columns
Liu, Ji, et al. "Tensor completion for estimating missing values in visual data." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 35.1 (2013): 208-220.
29
Samples
Original Tensor
F=2
Samples
Subject
Factor #1
Factor #2
Tensor
Decomposition
|225-10|=215  Outlier
|121.5-122|=0.5  Regular
|308-307|=1  Regular
Factor #1
Factor #2
Characteristics
131 132 133
231 232 233
331 332 333
121 122 123
221 10 223
321 322 323
109 112 113
211 212 213
311 307 313
Factor #1
Factor #2
Characteristics
2. Outlier detection
Reconstruction
131 132 133
231 232 233
331 332 332
121 121.5 123
221 225 223
321 322 323
110 112 113
211 212 213
311 308 313
Factor Matrices
(Tensor Model)
Reconstructed Tensor
Predicted values
Tan, Huachun, et al. "Traffic volume data outlier recovery via tensor model." Mathematical Problems in Engineering 2013 (2013).
30
Exploratory Analysis
31
Original Tensor
Characteristics
Subjects
Factor #1
Weight Alcohol
Factor Analysis
questionnaire3
Factor #1
questionnaire1
Samples
Factor #1
Factor #2
questionnaire2
Factor #2
Factor Matrices
(Tensor Model)
Suspicious
Factor #1
S1
Andrade, J. M., et al. "3-Way characterization of soils by Procrustes rotation, matrix-augmented principal
components analysis and parallel factor analysis." analytica chimica acta 603.1 (2007): 20-29.
Subjects
Samples
Factor #1
Factor #2
F=2
Height
Factor #2
Factor #1
Factor #2
Characteristics
Tensor
Decomposition
Age
Samples
Characteristics
3. Factor Analysis
S2
32
Factor #2
4. Clustering
Characteristics
Tensor
Decomposition
F=6
Samples
Factor #1
Factor #2
Factor #3
Factor #4
Factor #5
Factor #6
6
500
New
Feature
Space
For
Subjects
Clustering
Algorithm
(e.g. K-means)
N=2
1
2
2
2
.
.
.
Cancer
Normal
Normal
Normal
.
.
.
Factor Matrices for
Dimension “Subjects”
Hadi Fanaee-T, et. al., Event and Anomaly Detection Using Tucker3 Decomposition, (ECAI'2012-UDM Workshop), CEUR Workshop
Proceedings , Vol 960, PP. 8-12, 2012, DOI:10.13140/RG.2.1.2370.9286
33
5. Trend Analysis
Abnormal Trend in 2003-2005
Characteristics
Tensor
Decomposition
F=6
Years
Factor #1
Factor #2
Factor #3
Factor #4
Factor #5
Factor #6
factors
Factor Matrices for
Dimension “Year”
Multivariate
Statistics
Time
Multivariate Time series
Original Tensor
Fanaee-T, Hadi, and Gama, João. "Event detection from traffic tensors: A hybrid model." Neurocomputing 203 (2016): 22-33, Elsevier.
34
Pattern1: {Subject A & C, Q1 and Q2, Age and Alcohol}
6. Pattern Discovery
A
C
Time
Time
Alcohol
Pattern#1
Pattern#2
69157 x 1
69157 x 1
Age
+
357 x 1
+ …
Characteristics
Samples
Samples
357 x 1
Characteristics
F=25
Characteristics
Characteristics
CP Tensor
Decomposition
Pattern#25
Factor Matrices (CP Tensor Model)
Bader, Brett W., Michael W. Berry, and Murray Browne. "Discussion tracking in Enron email using PARAFAC." Survey of Text
Mining II. Springer London, 2008. 147-163.
35
Predictive Modelling
36
Characteristics
S=1
S=2
…
S=20
S=20
S=2
samples
S=1
Train Set
Normal
Cancer
…
Normal
Labels
Characteristics
7. Prediction
S=21
S=22
S=…
S=30
???
???
…
???
S=30
S=22
questionnaire S=21
Test Set
Phan, Anh Huy, and Andrzej Cichocki. "Tensor decompositions for feature extraction and classification of high dimensional datasets."37
Nonlinear theory and its applications, IEICE 1.1 (2010): 37-68
7. Prediction
(40 x 3) (30 x 3)
Characteristics
X
Train
F1 F2 F3
Target Labels
F1 F2 F3
Train
B
C
F=3
C. X . B = TR
20
9
Samples
Characteristics
Tensor
Decomp
osition
Dimensionality reduction: 40 x 30 = 1200  3 x 3 =9
C. Y . B = TS
Test
9
Samples
Normal
Cancer
.
.
Normal
Predicted Labels
Make prediction
using built model
10
S=1
S=2
.
.
S=20
(e.g. NN)
Factor Matrix for Dimension “Samples” and
“Characteristics”
Y
Test
Build
Predictive
Model
S=21
S=22
…
S=30
Cancer
Cancer
…
Normal
38
Subjects
Original Tensor
Factor Matrices
(Tensor Model)
Years
Reconstruct
Tensor
Characteristics
forecast
columns of
time factors
for t+1
Factor #1
Factor #2
Subjects
Years
Factor #1
Factor #2
F=2
Forecast of Subjects’s
Characteristics in the
next Year!
Factor #1
Factor #2
Characteristics
Tensor
Decomposition
Factor #1
Factor #2
Characteristics
8. Forecasting
Subjects
Predicted Tensor
Spiegel, Stephan, et al. "Link prediction on evolving data using tensor factorization." New Frontiers in Applied Data Mining.
39
Springer Berlin Heidelberg, 2011. 100-110. (PAKDD 2011)
Data Integration
40
Source 2 : Demographic data
Characteristics
9. Data Fusion with Coupled Matrix and Tensor
Factorization
Source 1 : Questionnaire data
Samples
Demographic
Information
Original Tensor
Source 3 : Microarray data
Subjects
Acar, Evrim, Tamara G. Kolda, and Daniel M. Dunlavy. "All-at-once optimization for coupled matrix and tensor factorizations."
41
arXiv preprint arXiv:1105.3422 (2011).
Software and Tools
42
Open Source Software
• MATLAB
•
•
•
•
•
• R
N-way toolbox (http://www.models.life.ku.dk/nwaytoolbox) July 2013
Tensor toolbox (http://www.sandia.gov/~tgkolda/TensorToolbox) Feb 2015
TDALAB (http://www.bsp.brain.riken.jp/TDALAB ( May 2013
TensorBox (http://www.bsp.brain.riken.jp/~phan) Dec 2014
Tensorlab (http://www.tensorlab.net) March 2016 - Recommended
• Threeway ,
PTAk and
Multiway
• Python
• scikit-tensor
• Java (Hadoop)
• BigTensor (http://datalab.snu.ac.kr/bigtensor), 2015 For Big Data
Big data example: freebase dataset: 38M * 532 * 38M (139M nonzeros)
• Standalone/Multi-platform/Open-source
• SimTensor : Synthetic Tensor Data Generator (www.simtensor.org) Nov 2016
• Tensoraia
43
Further reading
• Key survey papers
1.
2.
3.
4.
5.
Papalexakis, Evangelos E., Christos Faloutsos, and Nicholas D. Sidiropoulos. "Tensors for data mining
and data fusion: Models, applications, and scalable algorithms." ACM Transactions on Intelligent
Systems and Technology (TIST) 8.2 (2016): 16.
Hadi Fanaee-T, and João Gama. "Tensor-based anomaly detection: An interdisciplinary survey."
Knowledge-Based Systems 98 (2016): 130-147.
Kolda, Tamara G., and Brett W. Bader. "Tensor decompositions and applications." SIAM review 51.3
(2009): 455-500.
Acar, Evrim, and Bülent Yener. "Unsupervised multiway data analysis: A literature survey." Knowledge
and Data Engineering, IEEE Transactions on 21.1 (2009): 6-20.
Mørup, Morten. "Applications of tensor (multiway array) factorizations and decompositions in data
mining." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1.1 (2011): 24-40.
• Slides
• Morten Mørup, Applications of tensor decomposition in data mining and machine learning, 2010
• Johan Westerhuis, Multiway Data Analysis (www.cac.science.ru.nl/education/Brugcollege/multiway.ppt)
• Books
• Kroonenberg, Pieter M. Applied multiway data analysis. Vol. 702. John Wiley & Sons, 2008.
• Cichocki, Andrzej, et al. Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation. John Wiley & Sons, 2009.
44
Thank You!
Hadi Fanaee Tork
[email protected]
1
Related documents