Connection between Multilayer Perceptrons and Regression Using Independent Component Analysis
Aapo Hyvärinen and Ella Bingham
Preliminary version appeared in Proc. ICANN'99
Summarized by Seong-woo Chung
2001.9.14
Introduction
• Express the observed random variables x1, x2, …, xq as linear combinations of unknown component variables s1, s2, …, sn (n ≥ q for a nonsingular joint density):

    x = As
• The variables in x are divided into two parts, observed and missing
  – The first k variables form the vector of observed variables x_o = (x1, …, xk)^T, and the remaining variables form the vector of missing variables x_m = (x_{k+1}, …, x_q)^T:

    \begin{pmatrix} x_o \\ x_m \end{pmatrix} = \begin{pmatrix} A_o \\ A_m \end{pmatrix} s
Introduction (Continued)
• The problem is to predict x_m for a given observation of x_o
  – The regression \hat{x}_m is conventionally defined as the conditional expectation

    \hat{x}_m = E\{x_m \mid x_o\}

• Model the joint density of x by ICA; then, for a given sample of incomplete data, predict the missing values in x_m using the conditional expectation, which is well defined once the ICA model has been estimated (a numerical sketch follows below):

    E\{x_m \mid x_o\} = A_m \int_{A_o s = x_o} s\, p(s)\, ds
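To make the integral concrete, here is a minimal NumPy sketch for the special case used in the simulations below (square mixing matrix, exactly one missing variable, so the constraint set {s : A_o s = x_o} is a line in s-space). The function name `ica_regression_numint`, the `log_p` argument, and the `t_grid` parametrization are illustrative choices, not from the paper:

```python
import numpy as np

def ica_regression_numint(A, x_o, k, log_p, t_grid):
    """Approximate E{x_m | x_o} = A_m * integral of s p(s) over {s : A_o s = x_o}.

    Assumes A is square (n = q) and exactly one variable is missing
    (k = q - 1), so the constraint set is a one-dimensional line.
    """
    A_o, A_m = A[:k], A[k:]
    s_part = np.linalg.pinv(A_o) @ x_o                  # one point on the line
    v = np.linalg.svd(A_o)[2][-1]                       # unit vector spanning null(A_o)
    S = s_part[None, :] + t_grid[:, None] * v[None, :]  # sample points s(t) along the line
    log_w = log_p(S).sum(axis=1)                        # log p(s): the s_i are independent
    w = np.exp(log_w - log_w.max())                     # density weights, numerically stabilized
    w /= w.sum()                                        # normalized weights
    return A_m @ (w @ S)                                # A_m * E{s | A_o s = x_o}
```

Normalizing the density weights turns the line integral into a weighted average along the grid. With the Laplace density used later, `log_p` could be `lambda s: -np.sqrt(2) * np.abs(s) - 0.5 * np.log(2)` and `t_grid` something like `np.linspace(-10, 10, 4001)`.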
Regression by ICA and by an MLP: The connection
• Denote the probability densities of the s_i by p_i, and let g_i(u) = p_i'(u)/p_i(u) + cu; then

    E\{x_m \mid x_o\} \approx A_m\, g(A_o^T x_o)
• The regression function for data modeled by ICA is thus given by the output of an MLP with one hidden layer (a code sketch follows this slide)
• The weight vectors of the MLP are simple functions of the mixing matrix, and the nonlinear activation functions of the MLP are functions of the probability densities of the s_i
  – The vector A_o^T x_o can be interpreted as an initial linear estimate of s
  – The nonlinear aspect of g(·) consists largely of thresholding the linear estimates of s, to obtain ŝ = g(A_o^T x_o)
  – The final linear layer is basically a linear reconstruction of the form x̂_m = A_m ŝ
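A minimal sketch of this MLP-form predictor, assuming the ICA model (A_o, A_m) has already been estimated. Here `np.tanh` stands in for the density-dependent nonlinearity g (it is the slides' choice only for the weakly supergaussian case below), so treat it as a placeholder:

```python
import numpy as np

def mlp_regression(A_o, A_m, x_o, g=np.tanh):
    """One-hidden-layer MLP form of the ICA regression:
    hidden layer g(A_o^T x_o), followed by a linear output layer A_m."""
    s_hat = g(A_o.T @ x_o)   # initial linear estimate of s, thresholded componentwise
    return A_m @ s_hat       # linear reconstruction of the missing variables
```

The hidden-layer weights are the columns of A_o and the output weights are the rows of A_m, so this "MLP" needs no separate training once the ICA model has been estimated.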
Simulation
• The simulation data is 100-dimensional and there are 101000 data samples
  – The independent components, generated according to some probability density, are mixed using a randomly generated n×n mixing matrix
  – The mixtures x are divided into observed (x_o) and missing (x_m) parts
  – The dimensionality of x_o is 99 and the dimensionality of x_m is 1
  – The variables in x_o are uncorrelated and their variance is set to one
  – A training data set of size 100000 and a test data set of size 1000 are used
• ICA estimation on the training data set gives the estimated values for the source signals s and the mixing matrix A
• The value of the missing variable x_m is predicted either using numerical integration or using the approximation method (a reduced-size sketch of this setup follows this slide)
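A reduced-size sketch of this setup (10 dimensions instead of 100, Laplace sources, and an orthogonal mixing matrix so that A_o^T acts as an approximate inverse on whitened data). The slides do not name the ICA algorithm; scikit-learn's FastICA is assumed here, and tanh again stands in for the proper g:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, n_train, n_test = 10, 10000, 1000

# Unit-variance Laplace-distributed independent components
S = rng.laplace(scale=1 / np.sqrt(2), size=(n_train + n_test, n))
A = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random orthogonal mixing matrix
X = S @ A.T                                        # mixtures x = A s, one row per sample

X_train, X_test = X[:n_train], X[n_train:]

# Estimate the mixing matrix from the complete training data
ica = FastICA(n_components=n, whiten="unit-variance", random_state=0)
ica.fit(X_train)
A_hat = ica.mixing_

# Treat the last variable as missing on the test set and predict it
A_o, A_m = A_hat[:-1], A_hat[-1:]                  # rows: observed vs. missing
x_m_hat = np.tanh(X_test[:, :-1] @ A_o) @ A_m.T    # MLP form, batched over samples
print(np.corrcoef(x_m_hat.ravel(), X_test[:, -1])[0, 1])
```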
Simulation – Strongly Supergaussian Data

    p(s) = \frac{3}{2\,(1+|s|)^4}

    f'(s) = \frac{(\alpha+3)/d \cdot \mathrm{sign}(s)}{\sqrt{\alpha(\alpha+1)/2} + |s/d|}
Simulation – Laplace Distributed Data

    p(s) = \frac{\exp(-\sqrt{2}\,|s|)}{\sqrt{2}}

    f'(s) = \sqrt{2}\,\mathrm{sign}(s)
Simulation – Very Weakly Supergaussian Data

    p(s) = \frac{1}{2\cosh^2 s}

    f'(s) = \tanh s
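For reference, a small sketch collecting the three f'(s) nonlinearities above as Python functions. The parameters α and d belong to the strongly supergaussian family; the defaults α = 1, d = 1 appear to match the density shown on that slide (this correspondence is inferred, not stated on the slides):

```python
import numpy as np

def score_strong(s, alpha=1.0, d=1.0):
    """f'(s) for the strongly supergaussian family; alpha = 1, d = 1
    corresponds to p(s) = 3 / (2 * (1 + |s|)**4)."""
    return ((alpha + 3) / d) * np.sign(s) / (np.sqrt(alpha * (alpha + 1) / 2) + np.abs(s / d))

def score_laplace(s):
    """f'(s) = sqrt(2) sign(s) for p(s) = exp(-sqrt(2)|s|) / sqrt(2)."""
    return np.sqrt(2) * np.sign(s)

def score_weak(s):
    """f'(s) = tanh(s), as given on the slide, for p(s) = 1 / (2 cosh^2 s)."""
    return np.tanh(s)
```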
Conclusion
• Approximation
  – If the distributions of the independent components are close to Gaussian, it gives excellent results
  – If they are strongly supergaussian, the approximation is less accurate but still quite reasonable in the range we experimented with
• Regression
  – The stronger the supergaussianity, the better the quality of the regression
  – In contrast, for weakly supergaussian components, ICA regression does not really explain the data
Discussion
• Regression by ICA is computationally demanding, due to the integration
  – The integration may be approximated by the computationally simple procedure of computing the outputs of an MLP
• The output of each hidden-layer neuron corresponds to the estimation of one of the independent components
• The choice of the nonlinearity is a problem of estimating the probability densities of the independent components