International Journal of Approximate Reasoning 55 (2014) 294–310
Imaginary numbers for combining linear equation models
via Dempster’s rule
Liping Liu
The University of Akron, Akron, OH, United States
Article info
Article history:
Received 10 August 2012
Received in revised form 30 May 2013
Accepted 4 September 2013
Available online 17 September 2013
Keywords:
Linear belief functions
Dempster’s rule of combination
Sweeping
Imaginary numbers
Linear equation systems
Least-squares
Abstract
This paper proposes the concept of imaginary extreme numbers, which are like traditional imaginary numbers $a + bi$ with $i = \sqrt{-1}$ replaced by $e = 1/0$, along with the usual operations on these numbers, including addition, subtraction, and division. It then applies the concept to representing linear equations in knowledge-based systems. It proves that the combination of linear equations via Dempster's rule is equivalent to solving a system of simultaneous equations, or to finding a least-squares estimate when the equations are overdetermined.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
The concept of linear belief functions [3,7] unifies the representation of a diverse range of linear models in expert systems [9,16]. These linear models include linear equations that characterize linear deterministic relationships of continuous or discrete variables, and stochastic models such as linear regressions, linear time series, and Kalman filters, in which some variables are deterministic while others are stochastic. They also include normal distributions that describe probabilistic knowledge on a set of variables, a lack of knowledge such as ignorance and partial ignorance, and direct observations or observations with missing values. Despite their variety, the concept of linear belief functions unifies them as manifestations of a single concept, represents them as matrices with the same semantics, and combines them by a single mechanism, the matrix addition rule [3,8], which is consistent with Dempster's rule of combination [7].
What makes the unification possible is the sweeping operator, a matrix transformation that starts from one point in
a matrix, called a sweeping point, and gradually spreads the change to the entire matrix. Sweeping was first introduced
as an iterative method for manipulating matrices using computers [14]. The operation was later adopted in statistics, and
the term was coined by Beaton [1]. It was applied to the calculation of multivariate normal distributions [2] and used as
a conceptual tool for understanding the least squares process [5]. It was recently used for representing, transforming, and
combining linear belief functions [3,16,9].
As a variant of the Gauss–Jordan elimination method, the sweeping operator encounters the division-by-zero problem,
which the elimination method handles by pivoting, or parallel interchanges of matrix rows and/or columns so that nonzero
elements are moved to the main diagonal [4]. In the theory of linear belief functions, when two models are combined,
their matrix representations must be fully swept via the matrix addition rule [3,8]. For deterministic linear models such
as linear equations, sweeping points are often zero, and a sweeping, if it needs to be done, will have to divide regular
numerical values by zero, a mathematical operation that is not defined. Because elements at different positions in a matrix
representation have different meanings, pivoting is not applicable as a workaround. The division-by-zero issue has been a
challenge that hinders the development of intelligent systems that implement linear belief functions.
In this paper, I propose a concept of imaginary extreme numbers to deal with the division-by-zero problem. An imaginary extreme number is a complex number like $3 + 4e$ with extreme number $e = 1/0$. On these imaginary numbers, the usual
operations can be defined. The notion of imaginary extreme numbers makes it possible to represent linear equations as
knowledge in intelligent systems. As we will illustrate, a linear equation is transformed into an equivalent one by a sweeping
from a zero sweeping point and a reverse sweeping from an extreme sweeping point. The notion also makes it possible to
combine linear equations as independent pieces of knowledge via Dempster’s rule of combination. We will show that the
combination of linear equations corresponds to solving the equations when the equations are under- or just-determined or
finding the least-squares estimate when the equations are over-determined.
2. Matrix sweepings
Sweeping is a matrix transformation. It starts from a sweeping point and spreads the change across the entire matrix.
A sweeping point may be a nonzero matrix element, and in this case, two versions, respectively termed forward and reverse
sweepings, are defined as follows:
Definition 1 (Forward sweep). For any matrix $A = [a_{ij}]_{n\times m}$ and any sweeping point $(i, j)$, a forward sweeping of $A$ from $a_{ij}$, or location $(i, j)$, replaces element $a_{ij}$ by its negative inverse $-1/a_{ij}$, any other element $a_{ik}$ in row $i$ and $a_{kj}$ in column $j$ respectively by $a_{ik}/a_{ij}$ and $a_{kj}/a_{ij}$, and the remaining elements $a_{kl}$, which are not in the same row or column as $a_{ij}$, i.e., $k \neq i$ and $l \neq j$, by $a_{kl} - a_{il}a_{kj}/a_{ij}$.
Definition 2 (Reverse sweep). For any matrix $A = [a_{ij}]_{n\times m}$ and any sweeping point $(i, j)$, a reverse sweeping of $A$ from $a_{ij}$, or location $(i, j)$, replaces element $a_{ij}$ by its negative inverse $-1/a_{ij}$, any other element $a_{ik}$ in row $i$ and $a_{kj}$ in column $j$ respectively by $-a_{ik}/a_{ij}$ and $-a_{kj}/a_{ij}$, and the remaining elements $a_{kl}$, which are not in the same row or column as $a_{ij}$, i.e., $k \neq i$ and $l \neq j$, by $a_{kl} - a_{il}a_{kj}/a_{ij}$.
Note that forward and reverse sweepings defined above operationally differ only in the sign for the elements in the same
column or row as the sweeping point. Yet the difference is significant in that forward and reverse sweepings cancel each
other’s effects, and thus the modifiers “forward” and “reverse” are justified. Both forward and reverse sweeping operations
may also be defined to sweep from a nonsingular submatrix as a sweeping point.
Definition 3. Assume real matrix $A$ is made of submatrices as $A = (A_{ij})$ and assume $A_{ij}$ is a nonsingular submatrix. Then a forward (reverse) sweeping of $A$ from sweeping point $A_{ij}$ replaces submatrix $A_{ij}$ by its negative inverse $-(A_{ij})^{-1}$; any other submatrix $A_{ik}$ in row $i$ and any submatrix $A_{kj}$ in column $j$ are respectively replaced by $(-)(A_{ij})^{-1}A_{ik}$ and $(-)A_{kj}(A_{ij})^{-1}$; and the remaining submatrices $A_{kl}$ not in the same row or column as $A_{ij}$, i.e., $k \neq i$ and $l \neq j$, by $A_{kl} - A_{kj}(A_{ij})^{-1}A_{il}$.
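To make Definitions 1 and 2 concrete, here is a minimal Python sketch of the scalar case (the function name `sweep` and the numpy layout are mine, not from the paper):

```python
import numpy as np

def sweep(A, i, j, reverse=False):
    """Forward (or reverse) sweeping of A from location (i, j), per Definitions 1-2.

    The sweeping point a_ij must be nonzero; Section 3 extends this to zero
    points via extreme numbers.
    """
    A = np.asarray(A, dtype=float)
    R = np.empty_like(A)
    p = A[i, j]                       # the sweeping point a_ij
    s = -1.0 if reverse else 1.0      # reverse sweeping flips the row/column sign
    for k in range(A.shape[0]):
        for l in range(A.shape[1]):
            if k == i and l == j:
                R[k, l] = -1.0 / p                          # a_ij -> -1/a_ij
            elif k == i or l == j:
                R[k, l] = s * A[k, l] / p                   # row i and column j
            else:
                R[k, l] = A[k, l] - A[i, l] * A[k, j] / p   # remaining elements
    return R
```

One can check that a forward sweeping followed by a reverse sweeping from the same location restores the original matrix, which is the sense in which the two operations cancel each other's effects.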
When applied to a moment matrix that consists of a mean vector and a covariance matrix, sweeping operations can transform a normal distribution to its various forms, each with interesting semantics. Assume $X$ has mean vector $\mu$ and covariance matrix $\Sigma$. Then, in general, the moment matrix is

$$M(X) = \begin{bmatrix} \mu \\ \Sigma \end{bmatrix}$$

and its fully swept form

$$\overrightarrow{M}(X) = \begin{bmatrix} \mu\Sigma^{-1} \\ -\Sigma^{-1} \end{bmatrix}$$

represents the density function of $X$. Note that the arrow in $\overrightarrow{M}(X)$ symbolizes that $M(X)$ has been fully swept from the covariance matrix of $X$, or, to be brief, that $M(X)$ has been fully swept from $X$.

Besides density functions, a fully swept matrix can represent other non-probabilistic knowledge. In the parlance of belief functions, a zero fully swept matrix

$$\overrightarrow{M}(X) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{1}$$

is a vacuous linear belief function, representing full ignorance, because it is a neutral element in the knowledge base of linear belief functions: it neither adds knowledge to nor removes knowledge from any other belief function when combined with it via Dempster's rule.

It is interesting to imagine that, for usual normal distributions, if the inverse covariance matrix $\Sigma^{-1} \to 0$, then

$$\overrightarrow{M}(X) \to \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Although the zero swept matrix is the limit of $\overrightarrow{M}(X)$, it does not correspond to any probability distribution. It is also conceptually different from a normal distribution with an infinite variance that Bayesians have coined. According to Dempster in a personal communication, a vacuous belief function is a belief function whose mass function puts total mass one on the whole frame of discernment, whence all nontrivial beliefs are zero and all plausibilities are one. A normal distribution with infinite variance, on the other hand, is the limiting case of an ordinary normal distribution where the variance becomes infinite. In such a limit, the probability of any finite interval becomes zero, and the distribution can be thought of as uniform on the whole real line.
A partial sweeping has more interesting semantics. For example, for the normal distribution of $X$, $Y$, and $Z$ with moment matrix

$$M_1(X, Y, Z) = \begin{bmatrix} 3 & 4 & 2 \\ 4 & 2 & 0 \\ 2 & 5 & 2 \\ 0 & 2 & 6 \end{bmatrix},$$

its sweeping from the variance terms for $X$ and $Y$ is the partially swept matrix

$$M_1(\overrightarrow{X}, \overrightarrow{Y}, Z) = \begin{bmatrix} 0.4375 & 0.625 & 0.75 \\ -0.3125 & 0.125 & -0.25 \\ 0.125 & -0.25 & 0.5 \\ -0.25 & 0.5 & 5 \end{bmatrix}.$$

This matrix contains two pieces of information about the variables. First, the submatrix corresponding to variables $X$ and $Y$,

$$\begin{bmatrix} 0.4375 & 0.625 \\ -0.3125 & 0.125 \\ 0.125 & -0.25 \end{bmatrix},$$

represents the density function of $X$ and $Y$. Second, the partial matrix

$$\begin{bmatrix} & & 0.75 \\ & & -0.25 \\ & & 0.5 \\ -0.25 & 0.5 & 5 \end{bmatrix}$$

represents a regression model $Z = 0.75 - 0.25X + 0.5Y + \epsilon$ with $\epsilon \sim N(0, 5)$, or the conditional distribution of $Z$ given $X$ and $Y$ as $N(0.75 - 0.25X + 0.5Y,\ 5)$. Since this regression model alone carries no information on the independent variables $X$ and $Y$, the missing elements in the above partial matrix shall be zero according to Eq. (1). Thus, we use the following matrix to represent the regression model:

$$M_2(\overrightarrow{X}, \overrightarrow{Y}, Z) = \begin{bmatrix} 0 & 0 & 0.75 \\ 0 & 0 & -0.25 \\ 0 & 0 & 0.5 \\ -0.25 & 0.5 & 5 \end{bmatrix}.$$
Note that $M_2(\overrightarrow{X}, \overrightarrow{Y}, Z)$ and $M_1(\overrightarrow{X}, \overrightarrow{Y}, Z)$ look alike and carry similar meanings when interpreted as linear belief functions. However, they differ hugely in terms of probability. $M_1(\overrightarrow{X}, \overrightarrow{Y}, Z)$ represents a usual multivariate normal distribution (one may verify that it is the product of two distributions: a marginal density function of $X$ and $Y$, and a conditional distribution of $Z$ given $X$ and $Y$). In contrast, $M_2(\overrightarrow{X}, \overrightarrow{Y}, Z)$ does not correspond to any probability distribution; it contains a conditional normal distribution of $Z$ given $X$ and $Y$ but is vacuous on $X$ and $Y$.
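As a quick check, the `sweep` sketch from Section 2 reproduces the partially swept matrix above; two successive scalar sweeps from the variance terms of $X$ and then $Y$ amount to the block sweeping of Definition 3 (the layout, rows mean/X/Y/Z, follows the example):

```python
M1 = [[3, 4, 2],   # mean vector of (X, Y, Z)
      [4, 2, 0],   # covariance rows for X, Y, Z
      [2, 5, 2],
      [0, 2, 6]]

S = sweep(sweep(M1, 1, 0), 2, 1)   # sweep from var(X), then from var(Y)
print(np.round(S, 4))
# [[ 0.4375  0.625   0.75  ]
#  [-0.3125  0.125  -0.25  ]
#  [ 0.125  -0.25    0.5   ]
#  [-0.25    0.5     5.    ]]
```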
When the conditional variance of $Z$ vanishes, the conditional distribution reduces to a regular linear equation model $Z = 0.75 - 0.25X + 0.5Y$, and we use the following matrix to represent it:

$$M_3(\overrightarrow{X}, \overrightarrow{Y}, Z) = \begin{bmatrix} 0 & 0 & 0.75 \\ 0 & 0 & -0.25 \\ 0 & 0 & 0.5 \\ -0.25 & 0.5 & 0 \end{bmatrix}.$$

It has long been realized that a linear model such as a regression model or a linear equation is a limiting case of multivariate normal distributions [6]. However, the limits are not probability distributions; they can neither be represented as finite probability distributions nor satisfy the usual axioms of probability. Nevertheless, they are belief functions in the perfect sense. For example, a linear equation model specifies a hyperplane as the sole focal element and assigns the whole mass one to the focal element. In addition, with sweeping operations, the limits can be uniformly represented as a moment matrix or its partially swept form, like other linear belief functions, including normal distributions as special cases.
3. Imaginary extreme numbers
With sweeping operations, any linear belief function can be uniformly represented as a moment matrix or its partially or fully swept form. Combining linear belief functions via Dempster's rule entails that their matrix representations be fully swept [3]. To this end, one will encounter problems of various degrees for some linear models such as linear equations. For example, sweeping the matrix $M_3(\overrightarrow{X}, \overrightarrow{Y}, Z)$ from $Z$ will involve divisions by zero.
Dempster proposed replacing a division-by-zero operation by a division by δ , a symbol for an extremely small number,
as a workaround [3]. Its promise lies in the claim that, after reverse sweepings and letting δ approach zero, the result will
be free of δ and other anomalies. For example, for equations
$$Y = 3X + 5 \tag{2}$$

and

$$Y = -X + 6, \tag{3}$$

their matrix representations are respectively

$$M_4(\overrightarrow{X}, Y) = \begin{bmatrix} 0 & 5 \\ 0 & 3 \\ 3 & 0 \end{bmatrix}, \qquad M_5(\overrightarrow{X}, Y) = \begin{bmatrix} 0 & 6 \\ 0 & -1 \\ -1 & 0 \end{bmatrix}. \tag{4}$$

To fully sweep both matrices, we have to perform sweepings on the variance terms of $Y$, which are zero in both matrices. Instead of zero, we imagine they are very small numbers, denoted respectively by $\delta_1$ and $\delta_2$. (Here we follow the examples in [3] and use a different $\delta$ for each model. Using one single $\delta$ for all models also works for this example and would make the computation simpler, as an anonymous referee pointed out.) Then sweeping from them will result in:
$$\overrightarrow{M}_4(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} -15\delta_1^{-1} & 5\delta_1^{-1} \\ -9\delta_1^{-1} & 3\delta_1^{-1} \\ 3\delta_1^{-1} & -\delta_1^{-1} \end{bmatrix}, \qquad \overrightarrow{M}_5(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} 6\delta_2^{-1} & 6\delta_2^{-1} \\ -\delta_2^{-1} & -\delta_2^{-1} \\ -\delta_2^{-1} & -\delta_2^{-1} \end{bmatrix}.$$

Then the combination of these linear equations corresponds to the addition of the above two fully swept matrices:

$$\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} -15\delta_1^{-1}+6\delta_2^{-1} & 5\delta_1^{-1}+6\delta_2^{-1} \\ -9\delta_1^{-1}-\delta_2^{-1} & 3\delta_1^{-1}-\delta_2^{-1} \\ 3\delta_1^{-1}-\delta_2^{-1} & -\delta_1^{-1}-\delta_2^{-1} \end{bmatrix}.$$
Unsweeping $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y})$ from $Y$, we obtain

$$M(\overrightarrow{X}, Y) = \begin{bmatrix} -15\delta_1^{-1}+6\delta_2^{-1} - \dfrac{(3\delta_1^{-1}-\delta_2^{-1})(5\delta_1^{-1}+6\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}} & \dfrac{5\delta_1^{-1}+6\delta_2^{-1}}{\delta_1^{-1}+\delta_2^{-1}} \\[8pt] -9\delta_1^{-1}-\delta_2^{-1} - \dfrac{(3\delta_1^{-1}-\delta_2^{-1})(3\delta_1^{-1}-\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}} & \dfrac{3\delta_1^{-1}-\delta_2^{-1}}{\delta_1^{-1}+\delta_2^{-1}} \\[8pt] \dfrac{3\delta_1^{-1}-\delta_2^{-1}}{\delta_1^{-1}+\delta_2^{-1}} & \dfrac{1}{\delta_1^{-1}+\delta_2^{-1}} \end{bmatrix}.$$
To be free from $\delta_1$ and $\delta_2$, we also need to unsweep $M(\overrightarrow{X}, Y)$ from $X$. The result is complex. For example, at positions $(1, 1)$ and $(1, 2)$, the resulting elements are respectively

$$-\frac{-15\delta_1^{-1}+6\delta_2^{-1} - \frac{(3\delta_1^{-1}-\delta_2^{-1})(5\delta_1^{-1}+6\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}}}{-9\delta_1^{-1}-\delta_2^{-1} - \frac{(3\delta_1^{-1}-\delta_2^{-1})(3\delta_1^{-1}-\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}}}$$

and

$$\frac{5\delta_1^{-1}+6\delta_2^{-1}}{\delta_1^{-1}+\delta_2^{-1}} - \frac{3\delta_1^{-1}-\delta_2^{-1}}{\delta_1^{-1}+\delta_2^{-1}}\cdot\frac{-15\delta_1^{-1}+6\delta_2^{-1} - \frac{(3\delta_1^{-1}-\delta_2^{-1})(5\delta_1^{-1}+6\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}}}{-9\delta_1^{-1}-\delta_2^{-1} - \frac{(3\delta_1^{-1}-\delta_2^{-1})(3\delta_1^{-1}-\delta_2^{-1})}{\delta_1^{-1}+\delta_2^{-1}}}.$$
Finally, let $\delta_1 \to 0$ and $\delta_2 \to 0$. A very tedious calculation and analysis results in the limit of $M(X, Y)$ as

$$M(X, Y) = \begin{bmatrix} 0.25 & 5.75 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

That is, $X = 0.25$ and $Y = 5.75$ with a zero covariance matrix, or complete certainty. Note that these are the solution to the simultaneous linear equations (2) and (3).
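The tedium can be delegated to a computer algebra system. The following sympy sketch (my own illustration, not the author's implementation) replays the δ-approach for Eqs. (2) and (3) and recovers the limit above:

```python
import sympy as sp

def sweep(A, i, j, reverse=False):
    # symbolic version of Definitions 1-2
    p, s = A[i, j], (-1 if reverse else 1)
    R = sp.zeros(A.rows, A.cols)
    for k in range(A.rows):
        for l in range(A.cols):
            if (k, l) == (i, j):
                R[k, l] = -1 / p
            elif k == i or l == j:
                R[k, l] = s * A[k, l] / p
            else:
                R[k, l] = A[k, l] - A[i, l] * A[k, j] / p
    return R.applyfunc(sp.simplify)

d1, d2 = sp.symbols('delta1 delta2', positive=True)
M4 = sp.Matrix([[0, 5], [0, 3], [3, d1]])    # Y = 3X + 5, var(Y) = 0 -> delta1
M5 = sp.Matrix([[0, 6], [0, -1], [-1, d2]])  # Y = -X + 6, var(Y) = 0 -> delta2

M = sweep(M4, 2, 1) + sweep(M5, 2, 1)        # combine the fully swept forms
M = sweep(sweep(M, 2, 1, reverse=True), 1, 0, reverse=True)  # unsweep Y, then X
print(M.applyfunc(lambda t: sp.limit(sp.limit(t, d1, 0), d2, 0)))
# Matrix([[1/4, 23/4], [0, 0], [0, 0]]), i.e., X = 0.25 and Y = 5.75
```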
The δ-approach has a major difficulty: it turns a simple sweeping operation into complicated symbolic operations and mathematical limit analysis. The task is daunting to unaided humans, and especially when a problem involves large matrices, the complexity is often overwhelming or intractable. The following are two matrices of five variables:
$$M_1(X, Y, \overrightarrow{Z}, \overrightarrow{U}, \overrightarrow{V}) = \begin{bmatrix} 0 & 0 & 2.78 & 0.83 & 0.4 \\ 0 & 0 & 1.5 & 1 & 1 \\ 0 & 0 & 1.5 & 0 & 1 \\ 1.5 & 1.5 & -25 & 0 & 0 \\ 1 & 0 & 0 & -0.17 & 0 \\ 1 & 1 & 0 & 0 & -0.5 \end{bmatrix}, \qquad M_2(Z) = \begin{bmatrix} 2 \\ 0 \end{bmatrix},$$

where $M_2$ assumes variable $Z$ takes on value 2 for certain. To combine these two linear belief functions, we need to sweep $M_1$ from $X$ and $Y$ into $\overrightarrow{M}_1(\overrightarrow{X}, \overrightarrow{Y}, \overrightarrow{Z}, \overrightarrow{U}, \overrightarrow{V})$ as

$$\overrightarrow{M}_1(\overrightarrow{X}, \overrightarrow{Y}, \overrightarrow{Z}, \overrightarrow{U}, \overrightarrow{V}) = \begin{bmatrix} 0 & 0 & 2.78 & 0.83 & 0.4 \\ -\frac{1}{\delta} & 0 & \frac{1.5}{\delta} & \frac{1}{\delta} & \frac{1}{\delta} \\ 0 & -\frac{1}{\delta} & \frac{1.5}{\delta} & 0 & \frac{1}{\delta} \\ \frac{1.5}{\delta} & \frac{1.5}{\delta} & -25 - \frac{1.5\times 1.5}{\delta} - \frac{1.5\times 1.5}{\delta} & -\frac{1.5}{\delta} & -\frac{1.5}{\delta} - \frac{1.5}{\delta} \\ \frac{1}{\delta} & 0 & -\frac{1.5}{\delta} & -0.17 - \frac{1}{\delta} & -\frac{1}{\delta} \\ \frac{1}{\delta} & \frac{1}{\delta} & -\frac{1.5}{\delta} - \frac{1.5}{\delta} & -\frac{1}{\delta} & -0.5 - \frac{1}{\delta} - \frac{1}{\delta} \end{bmatrix}$$

and $M_2$ from $Z$ as

$$\overrightarrow{M}_2(\overrightarrow{Z}) = \begin{bmatrix} \frac{2}{\delta} \\ -\frac{1}{\delta} \end{bmatrix}.$$
(Here I use one $\delta$ for both matrices to simplify the result.) Then their combination is

$$\overrightarrow{M}_1(\overrightarrow{X}, \overrightarrow{Y}, \overrightarrow{Z}, \overrightarrow{U}, \overrightarrow{V}) + \overrightarrow{M}_2(\overrightarrow{Z}) = \begin{bmatrix} 0 & 0 & 2.78 + \frac{2}{\delta} & 0.83 & 0.4 \\ -\frac{1}{\delta} & 0 & \frac{1.5}{\delta} & \frac{1}{\delta} & \frac{1}{\delta} \\ 0 & -\frac{1}{\delta} & \frac{1.5}{\delta} & 0 & \frac{1}{\delta} \\ \frac{1.5}{\delta} & \frac{1.5}{\delta} & -25 - \frac{4.5}{\delta} - \frac{1}{\delta} & -\frac{1.5}{\delta} & -\frac{3.0}{\delta} \\ \frac{1}{\delta} & 0 & -\frac{1.5}{\delta} & -0.17 - \frac{1}{\delta} & -\frac{1}{\delta} \\ \frac{1}{\delta} & \frac{1}{\delta} & -\frac{3.0}{\delta} & -\frac{1}{\delta} & -0.5 - \frac{2}{\delta} \end{bmatrix}.$$
Now if we are to unsweep the combination from a variable, say $Z$, we will end up with a matrix that is too large to be wholly displayed here but whose last two columns are shown below:

$$\begin{bmatrix} 0.83 - \dfrac{(-\frac{1.5}{\delta})(2.78+\frac{2}{\delta})}{-25-\frac{5.5}{\delta}} & 0.4 - \dfrac{(-\frac{3.0}{\delta})(2.78+\frac{2}{\delta})}{-25-\frac{5.5}{\delta}} \\[8pt] \dfrac{1}{\delta} - \dfrac{(-\frac{1.5}{\delta})(\frac{1.5}{\delta})}{-25-\frac{5.5}{\delta}} & \dfrac{1}{\delta} - \dfrac{(-\frac{3.0}{\delta})(\frac{1.5}{\delta})}{-25-\frac{5.5}{\delta}} \\[8pt] 0 - \dfrac{(-\frac{1.5}{\delta})(\frac{1.5}{\delta})}{-25-\frac{5.5}{\delta}} & \dfrac{1}{\delta} - \dfrac{(-\frac{3.0}{\delta})(\frac{1.5}{\delta})}{-25-\frac{5.5}{\delta}} \\[8pt] -\dfrac{-\frac{1.5}{\delta}}{-25-\frac{5.5}{\delta}} & -\dfrac{-\frac{3.0}{\delta}}{-25-\frac{5.5}{\delta}} \\[8pt] -0.17 - \dfrac{1}{\delta} - \dfrac{(-\frac{1.5}{\delta})(-\frac{1.5}{\delta})}{-25-\frac{5.5}{\delta}} & -\dfrac{1}{\delta} - \dfrac{(-\frac{1.5}{\delta})(-\frac{3.0}{\delta})}{-25-\frac{5.5}{\delta}} \\[8pt] -\dfrac{1}{\delta} - \dfrac{(-\frac{1.5}{\delta})(-\frac{3.0}{\delta})}{-25-\frac{5.5}{\delta}} & -0.5 - \dfrac{2}{\delta} - \dfrac{(-\frac{3.0}{\delta})(-\frac{3.0}{\delta})}{-25-\frac{5.5}{\delta}} \end{bmatrix}.$$
The δ-approach makes it difficult to automate the computation. As we saw in the demonstration, the first sweeping, starting from a real matrix, will result in polynomial fractions in $\delta$. The second sweeping will create fractions of polynomial fractions. The succeeding sweepings will create layers upon layers of nested fractions. There are two perceivable approaches to these symbolic expressions. The obvious approach is to treat each matrix element as a potentially complex nonlinear expression of $\delta$, not as a number. This requires a very sophisticated parser to understand each matrix element.

The second approach is to turn each element into a polynomial fraction, but this requires transforming each nested fraction into a simple polynomial fraction using polynomial multiplications and divisions after each sweeping. Each sweeping involves 5 polynomial multiplications and some additions. The first sweeping will create polynomials of order 1, and the $n$th sweeping will create polynomials of order $2^{n-1}$. Usually we need to fully unsweep the matrix before we can take limits, and so we will need to perform $2n$ sweepings if $n$ is the number of variables. Thus, in the worst case, each polynomial is of order $2^{2n-1}$, and each polynomial multiplication involves $2^{4n}$ numerical multiplications. A sweeping operates on all $n^2$ elements of a matrix, and it will need $5n^2 2^{4n}$ multiplications. To finish $2n$ forward and backward sweepings, there will be $10n^3 2^{4n}$ multiplications in total. Therefore, the complexity increases exponentially with the number of variables. In addition, to store each matrix element, there is a need for two sorted lists of numerical values to represent two polynomials. The memory requirement is $n^2 2^{2n+1}$ because each polynomial is of order $2^{2n-1}$ in the worst case. Since the polynomials change from sweeping to sweeping, these lists have to change, additionally impacting computational performance.
The δ-approach was a major factor leading to the failure of two development projects that I conducted with colleagues years ago. The difficulty also undermined the application of linear belief functions to engineering and social science problem domains, where not all researchers have sufficient training in mathematical limit analysis. In this section, I propose a new type of imaginary numbers, called extreme numbers, and use it to resolve the division-by-zero issue. Just as a usual imaginary number uses $i$ for $\sqrt{-1}$, which does not exist within the real numbers, we use $e$ for $1/0$, which also does not exist. Also, as a usual imaginary number consists of two parts, a real part and an imaginary part, an imaginary extreme number does as well. For example, $3 - 2e$ is an extreme number with 3 as the real part and $-2$ as the imaginary part.

Definition 4. Assume $a$ and $b$ are any real numbers and the symbol $e$ stands for the (within the real numbers non-existent) $1/0$. Then we call $a + be$ an extreme number, $a$ its real part, and $b$ its imaginary part. When the imaginary part is nonzero, we call the extreme number true extreme. When the real part is zero, we call the extreme number pure extreme. When both real and imaginary parts are zero, the extreme number is zero, i.e., $a + be = 0$ if and only if $a = 0$ and $b = 0$.
When the imaginary part vanishes, an extreme number reduces to a real one. Thus, the system of extreme numbers
includes real numbers as a subset. Extreme numbers may be added, subtracted, or scaled as usual.
Definition 5. For any extreme number $a + be$ and a real number $c$, their multiplication, or the scaling of $a + be$ by scale $c$, is defined as

$$c(a + be) = (a + be)c = ac + bce. \tag{5}$$

Definition 6. For any two extreme numbers $a_1 + b_1e$ and $a_2 + b_2e$, their addition is defined as

$$(a_1 + b_1e) + (a_2 + b_2e) = (a_1 + a_2) + (b_1 + b_2)e. \tag{6}$$
Subtraction may be defined similarly, or it can be derived from addition and scaling. Note that extreme numbers form a vector space, and the system of extreme numbers is closed under the operations of scaling, addition, and subtraction.

Unlike usual imaginary numbers, the multiplication of two extreme numbers is not defined because it is not operationally closed. However, two other operations, division and crossing, can be defined, and they are closed when applied to sweepings from the diagonal elements of a symmetric submatrix (see later).
Definition 7. For any two extreme numbers $a_1 + b_1e$ and $a_2 + b_2e$, their division is defined as follows:

$$\frac{a_1 + b_1e}{a_2 + b_2e} = \begin{cases} \dfrac{b_1}{b_2}, & b_2 \neq 0, \\[6pt] \dfrac{a_1}{a_2} + \dfrac{b_1}{a_2}e, & b_2 = 0,\ a_2 \neq 0, \\[6pt] a_1e, & a_2 = b_2 = b_1 = 0. \end{cases} \tag{7}$$

Note that this definition makes sense from the δ-approach:

$$\frac{a_1 + b_1e}{a_2 + b_2e} = \frac{a_1 + b_1\frac{1}{\delta}}{a_2 + b_2\frac{1}{\delta}} = \frac{a_1\delta + b_1}{a_2\delta + b_2},$$

which approaches $b_1/b_2$ as $\delta \to 0$ if $b_2 \neq 0$. If the denominator is real, i.e., $b_2 = 0$, then division reduces to scaling. If the denominator is zero and the numerator is one, i.e., $b_1 = 0$ and $a_1 = 1$, the division becomes the standard definition of extreme numbers: $\frac{1}{0} = e$. Also, since $0e = 0$, $0/0$ is defined to be 0.
Because division generally cancels out imaginary parts, the operation of multiplication followed by division, called crossing, can be defined.
Definition 8. For any three extreme numbers $a_1 + b_1e$, $a_2 + b_2e$, and $a_3 + b_3e$, their crossing is defined as follows:

$$\frac{(a_1 + b_1e)(a_2 + b_2e)}{a_3 + b_3e} = \begin{cases} \dfrac{a_1b_2 + a_2b_1}{b_3} + \dfrac{b_1b_2}{b_3}e, & b_3 \neq 0, \\[6pt] \dfrac{a_1a_2}{a_3} + \dfrac{a_2b_1 + a_1b_2}{a_3}e, & a_3 \neq 0,\ b_3 = b_1b_2 = 0, \\[6pt] (a_1a_2)e, & a_3 = b_3 = b_1 = b_2 = 0. \end{cases} \tag{8}$$

This definition can also be understood from the δ-approach as follows:

$$\frac{(a_1 + b_1e)(a_2 + b_2e)}{a_3 + b_3e} = \frac{(a_1 + b_1\frac{1}{\delta})(a_2 + b_2\frac{1}{\delta})}{a_3 + b_3\frac{1}{\delta}} = \frac{a_1a_2 + (a_2b_1 + a_1b_2)\frac{1}{\delta} + b_1b_2(\frac{1}{\delta})^2}{a_3 + b_3\frac{1}{\delta}} = \frac{a_1a_2\delta + (a_2b_1 + a_1b_2) + b_1b_2\frac{1}{\delta}}{a_3\delta + b_3}.$$

Let $\delta \to 0$. Then the above expression reduces to

$$\frac{a_2b_1 + a_1b_2}{b_3} + \frac{b_1b_2}{b_3}e$$
if $b_3 \neq 0$. Crossing reduces to division if one of the multiplicands $a_1 + b_1e$ and $a_2 + b_2e$ is real, i.e., $b_1b_2 = 0$. If at the same time the denominator is a nonzero real number, i.e., $b_3 = 0$ and $a_3 \neq 0$, it reduces to scaling:

$$\frac{(a_1 + b_1e)(a_2 + b_2e)}{a_3} = \frac{a_1a_2}{a_3} + \frac{a_2b_1 + a_1b_2}{a_3}e.$$

It is consistent with the definition of extreme numbers if the divider $a_3 + b_3e = 0$ and $b_1 = b_2 = 0$.
Since real numbers are a special case of extreme ones, the above definitions for addition, subtraction, scaling, division,
and crossing apply to real numbers and are consistent with the counterpart operations for real numbers.
Remark 9. Division is not defined if the numerator is a true extreme number and the denominator is zero, i.e., $b_1 \neq 0$ but $a_2 + b_2e = 0$. Crossing is not defined in the two remaining cases: (1) $b_1 \neq 0$ or $b_2 \neq 0$ but $a_3 + b_3e = 0$, which would result in the division of a true extreme number by zero; and (2) $b_1b_2 \neq 0$ but $b_3 = 0$, which would result in the multiplication of two true extreme numbers.
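As a concrete illustration, the following Python sketch implements Definitions 4-8 and a sweeping that operates on extreme numbers (the class `Ext`, the function names, and the error handling are my own; the undefined cases of Remark 9 surface as exceptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ext:
    """Extreme number a + b*e, with e = 1/0 (Definition 4)."""
    a: float = 0.0   # real part
    b: float = 0.0   # imaginary (extreme) part

    def __add__(self, o):  return Ext(self.a + o.a, self.b + o.b)   # Eq. (6)
    def __sub__(self, o):  return Ext(self.a - o.a, self.b - o.b)
    def scale(self, c):    return Ext(self.a * c, self.b * c)       # Eq. (5)

    def __truediv__(self, o):                                       # Eq. (7)
        if o.b != 0:
            return Ext(self.b / o.b, 0.0)
        if o.a != 0:
            return Ext(self.a / o.a, self.b / o.a)
        if self.b == 0:
            return Ext(0.0, self.a)      # a/0 = a*e, including 0/0 = 0
        raise ZeroDivisionError("true extreme number divided by zero")

def cross(x, y, z):
    """Crossing (x*y)/z, per Eq. (8)."""
    if z.b != 0:
        return Ext((x.a * y.b + y.a * x.b) / z.b, (x.b * y.b) / z.b)
    if z.a != 0 and x.b * y.b == 0:
        return Ext((x.a * y.a) / z.a, (y.a * x.b + x.a * y.b) / z.a)
    if z.a == 0 and x.b == 0 and y.b == 0:
        return Ext(0.0, x.a * y.a)
    raise ZeroDivisionError("crossing undefined (Remark 9)")

def esweep(A, i, j, reverse=False):
    """Forward/reverse sweeping of a list-of-lists matrix of Ext numbers."""
    p, s = A[i][j], (-1.0 if reverse else 1.0)
    R = [row[:] for row in A]
    for k in range(len(A)):
        for l in range(len(A[0])):
            if (k, l) == (i, j):
                R[k][l] = Ext(-1.0) / p
            elif k == i or l == j:
                R[k][l] = (A[k][l] / p).scale(s)
            else:
                R[k][l] = A[k][l] - cross(A[i][l], A[k][j], p)
    return R
```

Note that dividing a real number by zero here yields a pure extreme number, which is exactly the effect of the δ-replacement in sweepings.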
Extreme numbers may be extended to extreme matrices, with the inverse of the zero matrix being defined as

$$0^{-1} = Ie,$$

where $I$ is an identity matrix. The idea is to replace a zero square matrix by $\delta I$ if the sweeping point is a zero matrix. The definition is consistent with sweeping operations when the sweeping points are individual zero elements. In general, an extreme matrix is $A + Be$ with real part $A$ and imaginary part $B$, where $A$ and $B$ are of the same dimensions. Operations on extreme matrices can be adopted from those for extreme numbers with slight modifications on division and crossing. For any two extreme matrices $A_1 + B_1e$ and $A_2 + B_2e$ of appropriate dimensions, if $B_2$ is nonsingular, then

$$(A_1 + B_1e)(A_2 + B_2e)^{-1} = B_1(B_2)^{-1}, \qquad (A_2 + B_2e)^{-1}(A_1 + B_1e) = (B_2)^{-1}B_1.$$
For any three extreme matrices $A_1 + B_1e$, $A_2 + B_2e$, and $A_3 + B_3e$, if $B_3$ is nonsingular,

$$(A_1 + B_1e)(A_3 + B_3e)^{-1}(A_2 + B_2e) = A_1(B_3)^{-1}B_2 + B_1(B_3)^{-1}A_2 + B_1(B_3)^{-1}B_2e.$$
Imaginary extreme numbers are a new concept. Here I briefly compare them with other similar devices. First, extreme numbers are similar to classic imaginary numbers except that $i = \sqrt{-1}$ is replaced by $e = 1/0$. Both systems follow the same addition, subtraction, and scaling rules. However, classic imaginary numbers do not follow the rules of the division and crossing operations. Second, extreme numbers are related to affinely extended real numbers, which include the improper elements $\infty$ and $-\infty$ [11]. Here $\infty$ is essentially the $e$ element in extreme numbers. However, there is a difference: an extreme number $a + be$ consists of one real part and one imaginary part, whereas an affinely extended real number is a usual real number or infinity, not both. As a result, affinely extended numbers cannot solve the division-by-zero problem in matrix sweepings. Third, extreme matrices have some flavor of pseudoinverses such as the Moore–Penrose inverse [12]. A pseudoinverse $A^+$ of a matrix $A$ is a matrix such that $AA^+A = A$ and $A^+AA^+ = A^+$, and both $AA^+$ and $A^+A$ are Hermitian. The concept extends the usual matrix inverse for a nonsingular, square matrix to singular and rectangular ones. For example, the pseudoinverse of a zero matrix is its transpose: $0^+ = 0^T$. Of course, one may already see its differences from extreme matrices, where $0^{-1} = Ie$, not $0^T$. Most importantly, if $A$ is a real matrix, then so is $A^+$, but $A^{-1}$ is in general an extreme matrix.
Extreme numbers may be useful for other applications. In this paper, I focus on the application to matrix sweepings. According to the definitions of forward and reverse sweepings, the operations involved in sweepings include divisions, crossings, and subtractions. Then, in combination, the only operation is addition. Therefore, the defined operations for extreme numbers are necessary and sufficient if we treat each cell of the matrix of a linear model as an extreme number, including a real number as a special case.
Are the operations defined valid? The question boils down to whether there are any cases in which an operation is undefined. As stated in Remark 9, the only case in which division may be undefined is dividing a true extreme number by zero. The two cases in which crossing may be undefined are: (1) the multiplication of two true extreme numbers followed by a division by a real number; and (2) a multiplication involving at least one true extreme number followed by a division by zero. These cases will not happen when sweeping and unsweeping from the diagonal elements of a symmetric submatrix, as in the case of representing and combining linear belief functions. In fact, assume $M$ is a real matrix. Then extreme number divisions and crossings are closed operations in sweepings of $M$ if the sweeping points are the diagonal elements of a symmetric submatrix. To see this, we just need to prove that, in applying a sweeping to $M$, or to its successive swept forms, neither dividing a true extreme number by zero nor dividing the product of two true extreme numbers by a real number will happen. Without loss of generality, assume $M = (m_{i,j})_{(n+1)\times n}$, where all cells are real numbers and the lower $n \times n$ submatrix is symmetric like a covariance matrix, whose main diagonal elements, $m_{2,1}, m_{3,2}, \ldots, m_{n+1,n}$, are to be swept from. $M$ may be the result of some successive sweepings on nonzero sweeping points. Let us assume we need to perform a sweeping operation on $m_{i+1,i} = 0$. Then extreme numbers will enter the matrix: (1) in row $i+1$, the numbers will be $(-)m_{i+1,j}e$; (2) in column $i$, the numbers will be $(-)m_{j,i}e$; and in any other location, the number will be $m_{k,l} - m_{k,i}m_{i+1,l}e$. Now, assume we need to sweep on another point, say $(j+1, j)$, with its current value being $m_{j+1,j} - m_{j+1,i}m_{i+1,j}e$. There are two general cases. First, if $m_{j+1,j} - m_{j+1,i}m_{i+1,j}e$ is a true extreme number, i.e., $m_{j+1,i}m_{i+1,j} \neq 0$, the operation will involve dividing by true extreme numbers only. The second case is when $m_{j+1,j} - m_{j+1,i}m_{i+1,j}e$ is a real number (including zero), or $m_{j+1,i}m_{i+1,j} = 0$. Note that $m_{j+1,i}m_{i+1,j} = 0$ implies either $m_{j+1,i} = 0$ or $m_{i+1,j} = 0$ or both; moreover, because the lower $n \times n$ submatrix of $M$ is symmetric, $m_{j+1,i} = m_{i+1,j}$, so both are zero. Thus, the entire row $j+1$ and the entire column $j$ will still consist of real numbers after sweeping on $m_{i+1,i} = 0$. Therefore, in this case, sweeping will involve either dividing a real number by $m_{j+1,j}$ in row $j+1$ or in column $j$, or, at other locations, subtracting from an extreme number a division of the product of two real numbers by $m_{j+1,j}$. Thus, it will not involve dividing a true extreme number by zero or dividing the product of two true extreme numbers by a real number, even when $m_{j+1,j} = 0$. The same argument applies if we need to sweep on another real or zero sweeping point.

This justifies the definition of extreme numbers as well as their operations. It implies that both division and crossing are well-defined operations for sweeping the matrix representation of any linear model.
Note that if sweeping points are not diagonal elements of a symmetric submatrix, one can run into situations that require dividing a true extreme number by zero or dividing the product of two true extreme numbers by a real number. For example, sweeping the following matrix from the zero at the $(1, 1)$ location,

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix},$$

we obtain the matrix below, in which another sweeping from $(2, 2)$ would involve dividing the true extreme numbers $e$ and $-e$ by zero:

$$\begin{bmatrix} -e & e & 0 \\ 0 & 0 & 1 \\ e & -e & 0 \end{bmatrix}.$$
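With the `Ext` sketch above, this counterexample can be replayed: the first sweeping succeeds, but the second one hits the undefined division of Eq. (7):

```python
A = [[Ext(0), Ext(1), Ext(0)],
     [Ext(0), Ext(0), Ext(1)],
     [Ext(1), Ext(0), Ext(0)]]

B = esweep(A, 0, 0)   # [[-e, e, 0], [0, 0, 1], [e, -e, 0]]
esweep(B, 1, 1)       # raises ZeroDivisionError: e and -e divided by zero
```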
Fig. 1. Matrices for Y = 3 X + 5 and Y = − X + 6.
4. Combining linear equation models
In this section, I apply imaginary extreme numbers to the representation, transformation, and combination of linear equation models. To motivate the reader, we first illustrate the applications using a numerical example. Let us sweep the matrices $M_4(\overrightarrow{X}, Y)$ and $M_5(\overrightarrow{X}, Y)$ in Eq. (4) from $Y$, both of which involve divisions by zero. For matrix $M_4(\overrightarrow{X}, Y)$, its sweeping from $Y$ is

$$\overrightarrow{M}_4(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} -15e & 5e \\ -9e & 3e \\ 3e & -e \end{bmatrix}.$$
Similarly, sweeping $M_5$ from $Y$ results in

$$\overrightarrow{M}_5(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} 6e & 6e \\ -e & -e \\ -e & -e \end{bmatrix}.$$
Then, according to the matrix addition rule, their addition

$$\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y}) = \overrightarrow{M}_4(\overrightarrow{X}, \overrightarrow{Y}) + \overrightarrow{M}_5(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} -9e & 11e \\ -10e & 2e \\ 2e & -2e \end{bmatrix}$$
represents the combination of the two linear equations $Y = 3X + 5$ and $Y = -X + 6$. To be free from imaginary numbers, we apply a reverse sweeping to $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y})$ from either $X$ or $Y$. Applying a reverse sweeping from $X$, we obtain its partially swept form $M(X, \overrightarrow{Y})$ as follows:

$$M(X, \overrightarrow{Y}) = \begin{bmatrix} -\dfrac{-9e}{-10e} & 11e - \dfrac{(-9e)\times 2e}{-10e} \\[6pt] -\dfrac{1}{-10e} & -\dfrac{2e}{-10e} \\[6pt] -\dfrac{2e}{-10e} & -2e - \dfrac{2e\times 2e}{-10e} \end{bmatrix} = \begin{bmatrix} -0.9 & 9.2e \\ 0 & 0.2 \\ 0.2 & -1.6e \end{bmatrix}.$$
Another reverse sweeping from $Y$ will result in the unswept moment matrix of $X$ and $Y$ as follows:

$$M(X, Y) = \begin{bmatrix} -0.9 - \dfrac{0.2\times 9.2e}{-1.6e} & -\dfrac{9.2e}{-1.6e} \\[6pt] 0 - \dfrac{0.2\times 0.2}{-1.6e} & -\dfrac{0.2}{-1.6e} \\[6pt] -\dfrac{0.2}{-1.6e} & -\dfrac{1}{-1.6e} \end{bmatrix} = \begin{bmatrix} 0.25 & 5.75 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Note that this is the same result as we obtained via Dempster's δ-approach in the last section. As we can see, however, the notion of imaginary numbers reduces the complex computation of nonlinear symbolic expressions to simple additions, divisions, and crossings of extreme numbers. Therefore, in a computerized system, symbolic operations are avoided, and only a simple data type of imaginary numbers, along with their division, crossing, and subtraction operations, is required.
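For instance, the whole numerical example above reduces to a few calls to the `Ext`/`esweep` sketch from Section 3 (rows are mean, X, Y; again an illustration, not the LMOS code):

```python
M4 = [[Ext(0), Ext(5)], [Ext(0), Ext(3)], [Ext(3), Ext(0)]]    # Y = 3X + 5
M5 = [[Ext(0), Ext(6)], [Ext(0), Ext(-1)], [Ext(-1), Ext(0)]]  # Y = -X + 6

S4, S5 = esweep(M4, 2, 1), esweep(M5, 2, 1)   # forward sweep both from Y
M = [[S4[k][l] + S5[k][l] for l in range(2)] for k in range(3)]   # combine
M = esweep(esweep(M, 1, 0, reverse=True), 2, 1, reverse=True)     # unsweep X, Y
print([[(c.a, c.b) for c in row] for row in M])
# X = 0.25 and Y = 5.75 with zero (co)variances, matching M(X, Y) above
```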
In a prototype system, dubbed LMOS Excel, sweepings and combination are all defined on extreme numbers rather than real ones. Fig. 1 shows a screenshot of LMOS Excel for linear equations, and Fig. 2 shows the combination of the extreme matrices and its reverse sweepings. Note that a red-colored value (with a solid border for monochrome printing) indicates a sweeping point that has been swept from.

It is not a coincidence that the combination of linear equations corresponds to solving a system of simultaneous equations. Intuitively, a linear equation carries partial knowledge on the values of some variables through a linear relationship with other variables. If each such equation is considered an independent piece of knowledge, its combination with other similar knowledge will render the values more certain. When there exists a sufficient number of linear equations, their combination may jointly determine a specific value of the variables with complete certainty, i.e., a solution to the system of simultaneous linear equations. In the following, I will formally prove this result and also show the correspondence between algebraic operations and linear belief function computations.
Fig. 2. The combination of Y = 3 X + 5 and Y = − X + 6.
4.1. Linear representation and transformation
When linear equations are expressed as linear belief functions or partially swept matrices, what does a sweeping operation mean to linear equations, and what does a linear transformation mean to linear belief functions? Let us first study their correspondence.

In general, a linear equation may be expressed explicitly as

$$X_n = b + a_1X_1 + a_2X_2 + \cdots + a_{n-1}X_{n-1}. \tag{9}$$

The matrix representation for the explicit form is straightforward:

$$M(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_{n-1}, X_n) = \begin{bmatrix} 0 & \ldots & 0 & b \\ 0 & \ldots & 0 & a_1 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & a_{n-1} \\ a_1 & \ldots & a_{n-1} & 0 \end{bmatrix}. \tag{10}$$

This partially swept matrix indicates that we have ignorance on the values of $X_1, X_2, \ldots$, and $X_{n-1}$; thus they correspond to a zero submatrix in the fully swept form. Given $X_1, X_2, \ldots$, and $X_{n-1}$, the value of $X_n$ is $b$ for sure; thus its conditional mean and variance are respectively $b$ and 0. Of course, in algebra, a variable on the right-hand side can be moved to the left-hand side through a linear transformation. For example, if $a_1 \neq 0$, Eq. (9) can be equivalently turned into

$$X_1 = -\frac{b}{a_1} - \frac{a_2}{a_1}X_2 - \cdots - \frac{a_{n-1}}{a_1}X_{n-1} + \frac{1}{a_1}X_n. \tag{11}$$
This transformation can also be done through sweepings of the matrix representations. For example, by sweeping the matrix in Eq. (10) from the zero variance term of $X_n$, we obtain

$$\overrightarrow{M}(\overrightarrow{X}_1, \overrightarrow{X}_2, \ldots, \overrightarrow{X}_{n-1}, \overrightarrow{X}_n) = \begin{bmatrix} -ba_1e & -ba_2e & \ldots & -ba_{n-1}e & be \\ -(a_1)^2e & -a_1a_2e & \ldots & -a_1a_{n-1}e & a_1e \\ -a_2a_1e & -(a_2)^2e & \ldots & -a_2a_{n-1}e & a_2e \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ -a_{n-1}a_1e & -a_{n-1}a_2e & \ldots & -(a_{n-1})^2e & a_{n-1}e \\ a_1e & a_2e & \ldots & a_{n-1}e & -e \end{bmatrix}$$
and then, by a reverse sweeping from $-(a_1)^2e$, the inverse variance term of $X_1$, we obtain

$$M(X_1, \overrightarrow{X}_2, \ldots, \overrightarrow{X}_{n-1}, \overrightarrow{X}_n) = \begin{bmatrix} -b/a_1 & 0 & \ldots & 0 & 0 \\ 0 & -a_2/a_1 & \ldots & -a_{n-1}/a_1 & 1/a_1 \\ -a_2/a_1 & 0 & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ -a_{n-1}/a_1 & 0 & \ldots & 0 & 0 \\ 1/a_1 & 0 & \ldots & 0 & 0 \end{bmatrix},$$

which is the matrix representation for Eq. (11).
A linear equation may also be expressed implicitly as

$$a_1X_1 + a_2X_2 + \cdots + a_{n-1}X_{n-1} + a_nX_n = b. \tag{12}$$

This implicit expression may be represented as two separate linear equations in explicit forms:

$$a_1X_1 + a_2X_2 + \cdots + a_{n-1}X_{n-1} + a_nX_n = U$$

and $U = b$. Their matrices are respectively
$$M_1(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, U) = \begin{bmatrix} 0 & \ldots & 0 & 0 \\ 0 & \ldots & 0 & a_1 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & a_n \\ a_1 & \ldots & a_n & 0 \end{bmatrix} \qquad \text{and} \qquad M_2(U) = \begin{bmatrix} b \\ 0 \end{bmatrix}.$$
To combine them via Dempster's rule, we sweep both matrices from $U$, respectively into

$$\overrightarrow{M}_1(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, \overrightarrow{U}) = \begin{bmatrix} 0 & \ldots & 0 & 0 \\ -(a_1)^2e & \ldots & -a_1a_ne & a_1e \\ \ldots & \ldots & \ldots & \ldots \\ -a_na_1e & \ldots & -(a_n)^2e & a_ne \\ a_1e & \ldots & a_ne & -e \end{bmatrix}$$

and

$$\overrightarrow{M}_2(\overrightarrow{U}) = \begin{bmatrix} be \\ -e \end{bmatrix},$$
and then add the results position-wise into

$$\overrightarrow{M}(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, \overrightarrow{U}) = \begin{bmatrix} 0 & \ldots & 0 & be \\ -(a_1)^2e & \ldots & -a_1a_ne & a_1e \\ \ldots & \ldots & \ldots & \ldots \\ -a_na_1e & \ldots & -(a_n)^2e & a_ne \\ a_1e & \ldots & a_ne & -2e \end{bmatrix}. \tag{13}$$
Note that we could vacuously extend $\overrightarrow{M}_2(\overrightarrow{U})$ into $\overrightarrow{M}_2(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, \overrightarrow{U})$ by adding zero elements corresponding to the variables $X_1, \ldots, X_n$, meaning ignorance on these variables, and then combine it with $\overrightarrow{M}_1(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, \overrightarrow{U})$ as a simple matrix addition. Of course, the result will be the same as in Eq. (13). To remove the auxiliary variable $U$, we shall turn the result into its unswept, or moment, form with respect to $U$:

$$M(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n, U) = \begin{bmatrix} \frac{ba_1}{2}e & \ldots & \frac{ba_n}{2}e & b/2 \\ -\frac{(a_1)^2}{2}e & \ldots & -\frac{a_1a_n}{2}e & a_1/2 \\ \ldots & \ldots & \ldots & \ldots \\ -\frac{a_na_1}{2}e & \ldots & -\frac{(a_n)^2}{2}e & a_n/2 \\ a_1/2 & \ldots & a_n/2 & 0 \end{bmatrix}$$

and then remove $U$ by projecting the above matrix to the variables $X_1, X_2, \ldots$, and $X_n$: removing the variables that are not swept corresponds to the marginalization of a joint belief function into a marginal one, just as in probability theory. Thus, we obtain:
Lemma 10. The matrix representation for the linear equation $a_1X_1 + a_2X_2 + \cdots + a_{n-1}X_{n-1} + a_nX_n = b$ is

$$\overrightarrow{M}(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_n) = \begin{bmatrix} \frac{b}{2}\,(a_1\ \ldots\ a_n)\,e \\[4pt] -\frac{1}{2}\begin{pmatrix} a_1 \\ \ldots \\ a_n \end{pmatrix}(a_1\ \ldots\ a_n)\,e \end{bmatrix}. \tag{14}$$
Note that, without extreme numbers, the linear equation $a_1X_1 + a_2X_2 + \cdots + a_{n-1}X_{n-1} + a_nX_n = b$ cannot readily be represented as a linear belief function in an explicit form. This is a manifestation of the reasons for introducing the dual representations of the concept of Gaussian belief functions [7].

Assuming coefficient $a_n \neq 0$, we can then unsweep from $X_n$ and obtain $M(\overrightarrow{X}_1, \ldots, \overrightarrow{X}_{n-1}, X_n)$ as
$$\begin{bmatrix} 0 & \ldots & 0 & b/a_n \\ 0 & \ldots & 0 & -a_1/a_n \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & -a_{n-1}/a_n \\ -a_1/a_n & \ldots & -a_{n-1}/a_n & 0 \end{bmatrix},$$

which is the matrix representation for an explicit form of Eq. (12):

$$X_n = \frac{b}{a_n} - \frac{a_1}{a_n}X_1 - \cdots - \frac{a_{n-1}}{a_n}X_{n-1}.$$
Therefore, for both implicit and explicit representations, sweeping operations correspond to linear transformations. In general, we have proved the following theorem:

Theorem 11. Assume $M(\overrightarrow{X}, \ldots, Y, \ldots)$ is a matrix representation of a linear equation. Then sweeping $M$ from $Y$ and unsweeping it from $X$ corresponds to the linear transformation that shifts $X$ to the left-hand side and $Y$ to the right-hand side of the linear equation.
4.2. Combination of linear equations
Now let us study the combination of multiple linear equations. First, we have a result for combining two explicit linear equations:

Lemma 12. Assume $Y$ is a single variable, $X$ is an $n$-dimensional horizontal vector, $b_1$ and $b_2$ are constant real values, and $A_1$ and $A_2$ are $n$-dimensional vertical real vectors. Then the combination of the two equations $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$ via Dempster's rule, after removing variable $Y$, corresponds to the equation

$$b_1 + XA_1 = b_2 + XA_2.$$
Proof. The matrix representations for $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$ are as follows:

$$M_1(\overrightarrow{X}, Y) = \begin{bmatrix} 0 & b_1 \\ 0 & A_1 \\ (A_1)^T & 0 \end{bmatrix}, \qquad M_2(\overrightarrow{X}, Y) = \begin{bmatrix} 0 & b_2 \\ 0 & A_2 \\ (A_2)^T & 0 \end{bmatrix}.$$

To combine them, we need to sweep both matrices from $Y$ and then add them position-wise into

$$\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y}) = \overrightarrow{M}_1(\overrightarrow{X}, \overrightarrow{Y}) + \overrightarrow{M}_2(\overrightarrow{X}, \overrightarrow{Y}) = \begin{bmatrix} -b_1(A_1)^Te - b_2(A_2)^Te & (b_1 + b_2)e \\ -A_1(A_1)^Te - A_2(A_2)^Te & (A_1 + A_2)e \\ (A_1 + A_2)^Te & -2e \end{bmatrix}.$$

Now, unsweeping $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{Y})$ from $Y$, we obtain

$$M(\overrightarrow{X}, Y) = \begin{bmatrix} \frac{1}{2}(b_2 - b_1)(A_1 - A_2)^Te & (b_1 + b_2)/2 \\ -\frac{1}{2}(A_1 - A_2)(A_1 - A_2)^Te & (A_1 + A_2)/2 \\ (A_1 + A_2)^T/2 & 0 \end{bmatrix}.$$

Projecting the above matrix to $X$, we obtain

$$\overrightarrow{M}(\overrightarrow{X}) = \begin{bmatrix} \frac{1}{2}(b_2 - b_1)(A_1 - A_2)^Te \\ -\frac{1}{2}(A_1 - A_2)(A_1 - A_2)^Te \end{bmatrix}.$$

Then, according to Lemma 10, $\overrightarrow{M}(\overrightarrow{X})$ is the matrix representation of the implicit linear equation of $X$: $X(A_1 - A_2) = b_2 - b_1$, which obviously is the result of solving the simultaneous linear equations $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$ by substitution: $b_1 + XA_1 = b_2 + XA_2$. □
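For instance, the running example of Section 3 instantiates Lemma 12 with $A_1 = 3$, $A_2 = -1$, $b_1 = 5$, and $b_2 = 6$: the combination corresponds to $X(A_1 - A_2) = b_2 - b_1$, i.e., $4X = 1$, giving $X = 0.25$ as before.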
When linear equations are expressed implicitly, their combination is equivalent to forming a larger system of linear
equations. The following is the formal result:
Lemma 13. The combination of two systems of linear equations via Dempster’s rule is identical to joining them into a larger system of
linear equations.
Proof. First, assume $XA = U$ and $XB = V$ are two systems of explicit linear equations on the vectors of variables $X$, $U$, and $V$, where $U$ and $V$ are distinct vectors of variables. Their matrix representations are

$$M_1(\overrightarrow{X}, U) = \begin{bmatrix} 0 & 0 \\ 0 & A \\ A^T & 0 \end{bmatrix}, \qquad M_2(\overrightarrow{X}, V) = \begin{bmatrix} 0 & 0 \\ 0 & B \\ B^T & 0 \end{bmatrix}.$$
It is easy to verify their combination by following the usual matrix addition rule [3], i.e., by first sweeping $M_1(\overrightarrow{X}, U)$ from $U$ and $M_2(\overrightarrow{X}, V)$ from $V$, then adding the resulting matrices into $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U}, \overrightarrow{V})$, and finally unsweeping $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U}, \overrightarrow{V})$ from both $U$ and $V$:

$$M_{12}(\overrightarrow{X}, U, V) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & A & B \\ A^T & 0 & 0 \\ B^T & 0 & 0 \end{bmatrix}, \tag{15}$$

which corresponds to

$$X\begin{pmatrix} A & B \end{pmatrix} = \begin{pmatrix} U & V \end{pmatrix},$$

a larger system containing the two sets of equations $XA = U$ and $XB = V$.
Next, to combine two systems of implicit equations $XA = u$ and $XB = v$, we just need to combine $XA = U$, $U = u$, $XB = V$, and $V = v$. Because the combination of linear belief functions is commutative and associative [8], we can combine $XA = U$ with $XB = V$ and $U = u$ with $V = v$ first, and then combine their intermediate results. $M_{12}(\overrightarrow{X}, U, V)$ in Eq. (15) is already the combination of $XA = U$ with $XB = V$. Equations $U = u$ and $V = v$ are represented as the matrices

$$M_3(U) = \begin{bmatrix} u \\ 0 \end{bmatrix}, \qquad M_4(V) = \begin{bmatrix} v \\ 0 \end{bmatrix},$$

and their combination is as follows:

$$M_{34}(U, V) = \begin{bmatrix} u & v \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
To combine $M_{34}(U, V)$ and $M_{12}(\overrightarrow{X}, U, V)$, we need to apply forward sweepings to both matrices from $U$ and $V$, or equivalently from the joint vector $(U, V)$, into the following:

$$\overrightarrow{M}_{12}(\overrightarrow{X}, \overrightarrow{U}, \overrightarrow{V}) = \begin{bmatrix} 0 & 0 \\ -e(A, B)(A, B)^T & e(A, B) \\ e(A, B)^T & -eI \end{bmatrix}, \qquad \overrightarrow{M}_{34}(\overrightarrow{U}, \overrightarrow{V}) = \begin{bmatrix} e(u, v) \\ -eI \end{bmatrix},$$

and then add them point-wise into:

$$\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U}, \overrightarrow{V}) = \begin{bmatrix} 0 & e(u, v) \\ -e(A, B)(A, B)^T & e(A, B) \\ e(A, B)^T & -2eI \end{bmatrix}.$$
Since $U$ and $V$ are auxiliary variables, to remove them we need to first reversely sweep $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U}, \overrightarrow{V})$ from $(U, V)$ into:

$$M(\overrightarrow{X}, U, V) = \begin{bmatrix} \frac{1}{2}e(u, v)(A, B)^T & \frac{1}{2}(u, v) \\ -\frac{1}{2}e(A, B)(A, B)^T & \frac{1}{2}(A, B) \\ \frac{1}{2}(A, B)^T & 0 \end{bmatrix}$$

and then project the matrix to $X$:

$$\overrightarrow{M}(\overrightarrow{X}) = \begin{bmatrix} \frac{1}{2}e(u, v)(A, B)^T \\ -\frac{1}{2}e(A, B)(A, B)^T \end{bmatrix}.$$

This matrix is simply the matrix representation of the following system of implicit equations:

$$X\begin{pmatrix} A & B \end{pmatrix} = \begin{pmatrix} u & v \end{pmatrix}. \qquad \square$$
To fully appreciate the meaning of the combination of linear equations, let us perform sweepings on the matrix representation of the combined system. First, we have a result that is similar to Theorem 11 but is for the combination of multiple equations:

Theorem 14. Assume $Y_i = XA_i$, $i = 1, 2, \ldots, m$, are $m$ linearly independent equations of the $n$-dimensional vector $X$, where $n \geq m$. Assume $M(\overrightarrow{X}, Y_1, \ldots, Y_m)$ is the matrix for the combined system of the $m$ linear equations. Then there is an $m$-dimensional subvector in $X$ such that a reverse sweeping from the subvector and a forward sweeping from $Y_1, \ldots, Y_m$ on $M(\overrightarrow{X}, Y_1, \ldots, Y_m)$ is equivalent to solving the system of $m$ linear equations for the subvector in terms of $Y_1, \ldots$, and $Y_m$ and the complement of the subvector in $X$.
Proof. Let $Y = (Y_1, \ldots, Y_m)$ and $A = (A_1, A_2, \ldots, A_m)$. Then, according to Lemma 13, the combination of the $m$ linear equations forms the system $XA = Y$, with $A$ being an $n \times m$ coefficient matrix. Because the $m$ linear equations are independent, i.e., none is a linear combination of the others, there is an $m$-dimensional subvector of $X$ that can be solved in terms of the other variables. Without loss of generality, assume $X = (X_1, X_2)$ with $X_1$ being the subvector of $m$ variables that can be solved, and

$$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$

with $A_1$ being a nonsingular $m \times m$ matrix. Then we have $X_1A_1 + X_2A_2 = Y$, which is represented as

$$M(\overrightarrow{X}_1, \overrightarrow{X}_2, Y) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & A_1 \\ 0 & 0 & A_2 \\ A_1^T & A_2^T & 0 \end{bmatrix}.$$

Let us apply a forward sweeping to $M(\overrightarrow{X}_1, \overrightarrow{X}_2, Y)$ from $Y$:

$$\overrightarrow{M}(\overrightarrow{X}_1, \overrightarrow{X}_2, \overrightarrow{Y}) = \begin{bmatrix} 0 & 0 & 0 \\ -eA_1A_1^T & -eA_1A_2^T & eA_1 \\ -eA_2A_1^T & -eA_2A_2^T & eA_2 \\ eA_1^T & eA_2^T & -eI \end{bmatrix}.$$

Now unsweep $\overrightarrow{M}(\overrightarrow{X}_1, \overrightarrow{X}_2, \overrightarrow{Y})$ from $X_1$. Noting that $A_1$ is nonsingular and

$$\left(A_1A_1^T\right)^{-1} = \left(A_1^T\right)^{-1}(A_1)^{-1},$$

we can easily verify that

$$M(X_1, \overrightarrow{X}_2, \overrightarrow{Y}) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -(A_1^T)^{-1}A_2^T & (A_1^T)^{-1} \\ -A_2(A_1)^{-1} & 0 & 0 \\ (A_1)^{-1} & 0 & 0 \end{bmatrix},$$

which is the matrix representation of $X_1 = -X_2A_2(A_1)^{-1} + Y(A_1)^{-1}$. □
According to Lemma 10, each implicit linear equation is represented as a matrix in fully swept form. Assume $\overrightarrow{M}_i(\overrightarrow{X})$ is the matrix for equation $i$, where $i = 1, 2, \ldots, m$. Then the combination of the $m$ implicit linear equations is simply the sum of these fully swept matrices:

$$\overrightarrow{M}(\overrightarrow{X}) = \sum_{i=1}^{m} \overrightarrow{M}_i(\overrightarrow{X}).$$

For the combined system of implicit linear equations, we have the following result:
Theorem 15. Assume $\overrightarrow{M}(\overrightarrow{X})$ is the matrix representation of the combined system of $m$ implicit linear equations for an $n$-dimensional vector $X$. Then $M(X)$, i.e., the fully unswept form of $\overrightarrow{M}(\overrightarrow{X})$, represents the solution to the system of the $m$ simultaneous linear equations if $m = n$ and all equations are linearly independent. In addition, if $m > n$ and there are $n$ linearly independent equations, $M(X)$ represents the least-squares estimate of $X$.
Proof. According to Lemma 13, the combined system of $m$ implicit linear equations is $XA = C$, where $C$ is a vector of $m$ real constants and $A$ is an $n \times m$ coefficient matrix. Using the auxiliary variable $U$, the system is equivalent to the combination of

$$M_1(\overrightarrow{X}, U) = \begin{bmatrix} 0 & 0 \\ 0 & A \\ A^T & 0 \end{bmatrix}$$

with

$$M_2(U) = \begin{bmatrix} C \\ 0 \end{bmatrix}.$$

Via extreme numbers, the combination is

$$\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U}) = \begin{bmatrix} 0 & Ce \\ -AA^Te & Ae \\ A^Te & -2Ie \end{bmatrix}.$$

Unsweeping $\overrightarrow{M}(\overrightarrow{X}, \overrightarrow{U})$ from the inverse covariance matrix of $U$, we obtain

$$M(\overrightarrow{X}, U) = \begin{bmatrix} \frac{1}{2}CA^Te & C/2 \\ -\frac{1}{2}AA^Te & A/2 \\ A^T/2 & 0 \end{bmatrix}.$$

Since $A$ has rank $n$, $AA^T$ is positive definite. Thus, we can unsweep $M(\overrightarrow{X}, U)$ from the inverse covariance matrix of $X$ and obtain

$$M(X, U) = \begin{bmatrix} CA^T(AA^T)^{-1} & \frac{1}{2}C\left[I + A^T(AA^T)^{-1}A\right] \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \tag{16}$$

implying that, after combination, variable $X$ takes on the value

$$X = CA^T\left(AA^T\right)^{-1}$$

with certainty. Note that this solution is the least-squares estimate of $X$ from the regression model $XA = C$, with $A$ being the observation matrix for the independent variables and $C$ being the observations of the dependent variable. In the special case when $m = n$, we have

$$\left(AA^T\right)^{-1} = \left(A^T\right)^{-1}A^{-1}.$$

Thus,

$$M(X, U) = \begin{bmatrix} CA^{-1} & C \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$

implying that $X = CA^{-1}$ and $U = C$ with certainty. This is simply the solution to $XA = C$. □
Summarizing Lemma 13 and the above two theorems, we can draw conclusions about the meaning of combining linear equations. When a number of linear equations are under-determined, their combination corresponds to solving the equations for some variables in terms of others. When they are over-determined, their combination corresponds to finding the least-squares estimate for all the variables. Finally, when they are just determined, their combination corresponds to solving the simultaneous equations for all the variables.
Theorem 15 presents an interesting result in the case when linear equations are over-determined, i.e., m > n. Note that
the most important application of pseudoinverses is to solve linear systems, and it is well known that the Moore–Penrose
inverse provides a least-squares solution to a system of linear equations [13]. Here Dempster’s rule of combination, along
with sweeping operations on extreme matrices, a seemingly unrelated method, achieved the same result.
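A quick numpy check makes the correspondence tangible; the illustrative numbers below are mine, not from the paper:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],    # n x m coefficient matrix (n = 2, m = 3)
              [1.0, 2.0, 1.0]])
C = np.array([1.0, 2.0, 4.0])     # m observed constants in X A = C

X_comb = C @ A.T @ np.linalg.inv(A @ A.T)         # Theorem 15's combination result
X_lsq, *_ = np.linalg.lstsq(A.T, C, rcond=None)   # textbook least squares for A^T x ~ C
print(np.allclose(X_comb, X_lsq))                 # True
```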
We shall also note another interesting aspect of the result: we may actually combine conflicting pieces of evidence in this case. This can be explained from two different perspectives.
First, in terms of finite belief functions, a linear equation is a belief function with mass one assigned to the hyperplane determined by the equation. The combination of two linear equations assigns mass one to the intersection of two hyperplanes. By the same logic, the combination of $n$ linearly independent equations assigns mass one to the intersection of $n$ linearly independent hyperplanes, which is the solution to the $n$ simultaneous equations. Then any additional linear equation will either pass through the solution or miss it. If it passes through the solution, the additional evidence is consistent with the former $n$ pieces of evidence, or $n$ linear equations. If it misses the solution, then the additional evidence is in conflict with some or all of these $n$ equations, so that the intersection of the $n + 1$ focal elements is empty. Therefore, in general, when combining $m$ linear equations of $n$ variables as finite belief functions, the intersection of the $m$ focal elements may be empty when $m > n$.
Second, we can also notice the possible conflict from Eq. (16), which implies that the auxiliary variable $U$ takes on the value

$$U = \frac{1}{2}C\left[I + A^T\left(AA^T\right)^{-1}A\right] \tag{17}$$

with certainty. This is in conflict with the initial component model $U = C$ if $A^T(AA^T)^{-1}A \neq I$. Note that $m > n$, and there exist only $n$ independent linear equations. Only $n$ variables of $U$ can take independent values, and the remaining $m - n$ variables take on the values derived from those $n$ independent values. Otherwise, $U = C$ will have internal conflicts among some or all components of $U$.
In the Dempster–Shafer theory, two finite belief functions that are in conflict are deemed incompatible [15]. Thus, the
Dempster–Shafer theory does not combine conflicting evidence. However, I do see the need to do so in practical problems,
where judgments and observations are often inherently fuzzy or imprecise, leading to conflicts. Being able to combine
conflicting evidence is a virtue and thus I see Theorem 15 as a useful one. The theorem essentially states that, when there
is a conflict, the combination finds a solution that is the closest to all the observations or judgments. The result may be
justified statistically or philosophically. It may prescribe a method for combining conflicting evidence. Of course, Dempster’s
rule of combination must be altered to allow such a result.
5. Conclusion
The division-by-zero enigma, in which one has to divide a number by zero or a matrix by a zero matrix, often arises in manipulating and combining linear belief functions. A current workaround is to replace the division by a symbolic one in the hope that the symbol may vanish in later operations. However, the computation is often intractable, and the complexity is on the order of $O(n^3 2^{4n})$ numerical multiplications and $O(n^2 2^{2n})$ memory size. To resolve this problem, this paper proposed a concept of imaginary extreme numbers, on which the usual operations such as addition, subtraction, and division can be defined.
The extreme number approach is essentially an approximation of the δ-approach; it does not keep any higher-order terms of $1/\delta$. Thus the computation is dramatically reduced. In comparison, using extreme numbers, each sweeping takes 3 multiplications, two divisions, and one addition. The worst-case complexity is polynomial: $2n \times n^2 \times 5 = 10n^3$ multiplications and divisions of imaginary numbers. The memory requirement is $2n^2$.
This paper showed that, when applied to matrix sweepings, the class of extreme numbers is closed under the defined operations. The paper then focused on the application of extreme numbers to representing and transforming linear equation models in both explicit and implicit expressions, and to combining them as linear belief functions via Dempster's rule. I showed that the combination of linear equations via Dempster's rule corresponds to joining them into a larger linear system, and that further sweeping operations correspond to finding a solution to the system. In particular, when a system of linear equations is under- or just-determined, the combination is equivalent to solving the equations, and when the equations are over-determined, it is equivalent to finding the least-squares estimate.

The δ-approach was a mere workaround in Dempster's illustration [3]. It has never been fully developed or validated, and it has never been applied to the combination of linear equations. Because the extreme number approach is its approximation, this paper indirectly proves that the δ-approach, including both Dempster's original multi-δ approach and the reduced single-δ approach, should work in general for combining linear equations, except that their computational complexity is undesirable.
Linear equations are a limiting case of multivariate Gaussian distributions and follow the definition of continuous belief functions [7]. Thus, they are manipulated as such in this paper via matrices. Linear equations may be qualitatively understood as finite belief functions, and the main result of this paper can be qualitatively explained via Dempster's rule of intersections and multiplications [10]. Essentially, a linear equation specifies a hyperplane as the only focal element, with the whole mass one assigned to it. It embodies a piece of knowledge on the involved variables such that their true location is somewhere on the hyperplane for sure, but there is no further knowledge justifying its whereabouts on the hyperplane, meaning we have partial ignorance. The combination of linear equations corresponds to the intersection of the hyperplanes representing the linear equations. It restricts the true location to a subset smaller than the original intersecting hyperplanes. The more equations are combined, the smaller their intersection is, implying that the combined knowledge becomes more certain. When the number of equations is just right, the intersection becomes one unique point; the combination renders perfect knowledge. Of course, this idea of intersecting focal elements can only go this far. When there are too many equations, some of them may conflict with each other and thus cannot be combined in the Dempster–Shafer theory. However, when these linear equations are combined as continuous belief functions, the combination determines a location that is closest to all the focal elements. This seems to suggest an alternative method for the combination of conflicting evidence in the future.
I anticipate that the concept of extreme numbers is useful in other contexts besides linear belief functions. For example, in a separate paper, I showed that, to compute the inverse of a symmetric matrix, one may sweep from each of its leading diagonal elements without pivoting, regardless of whether they are zero or not. Note that, in computing matrix inverses or solving linear equations, how to handle the division-by-zero problem in the Gaussian elimination method has been a long-standing problem [4]. Now the concept of extreme numbers renders pivoting or randomization unnecessary, resolving the problem in the computation of symmetric matrix inverses.
There are a few mathematical problems regarding the foundation of extreme numbers. For example, for knowledge representation, one may be concerned with the equivalence of models; sweeping from a nonzero real value will always produce an equivalent model, whereas sweeping from an extreme value may not. I have obtained some preliminary results toward proving the conjecture that, for any extreme matrix

$$A + Be = (a_{i,j} + b_{i,j}e)_{(n+1)\times n},$$

a sweeping from $a_{i+1,i} + b_{i+1,i}e$ is lossless, or produces an equivalent model, if and only if $a_{i+1,i}b_{i+1,i} = 0$.
The lossless conjecture has been numerically validated by two different systems developed based on extreme numbers. The first one, dubbed LMOS, is a sophisticated prototype that supports multiple users, multiple projects, and multiple database management systems. It allows user-friendly encoding of knowledge and one-stop combination of multiple linear models. The second one, LMOS Excel, is a lighter version built on the Microsoft Excel platform. It has the capability of representing and combining linear belief models, and in addition, it can sweep any matrix, which may not be the matrix representation of a linear model, from any sweeping point, which may not be on the principal diagonal of a symmetric submatrix. Thus, it can be used for solving general algebraic problems such as inverting a matrix, computing the rank or determinant of a matrix, or solving linear equations and finding least-squares estimates by standard textbook methods, i.e., not via Dempster's rule of combination as studied in this paper.
References
[1] A.E. Beaton, The use of special matrix operators in statistical calculus, Technical Report RB-64-51, Educational Testing Service, 1964.
[2] A.P. Dempster, Elements of Continuous Multivariate Analysis, Addison–Wesley, Reading, MA, 1969.
[3] A.P. Dempster, Normal belief functions and the Kalman filter, in: A.K.M.E. Saleh (Ed.), Data Analysis from Statistical Foundations, Nova Science Publishers, Hauppauge, New York, 2001, pp. 65–84.
[4] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore and London, 1989.
[5] J.H. Goodnight, A tutorial on the sweep operator, Am. Stat. 33 (3) (1979) 149–158.
[6] C.G. Khatri, Some results for the singular normal multivariate regression models, Sankhya, Ser. A 30 (1968) 267–280.
[7] L. Liu, A theory of Gaussian belief functions, Int. J. Approx. Reason. 14 (1996) 95–126.
[8] L. Liu, Local computation of Gaussian belief functions, Int. J. Approx. Reason. 22 (1999) 217–248.
[9] L. Liu, C. Shenoy, P.P. Shenoy, Knowledge representation and integration for portfolio evaluation using linear belief functions, IEEE Trans. Syst. Man
Cybern., Ser. A 36 (4) (2006) 774–785.
[10] L. Liu, R. Yager, Classic works on the Dempster–Shafer theory of belief function: An introduction, in: R. Yager, L. Liu (Eds.), Classic Works of the
Dempster–Shafer Theory of Belief Functions, Springer-Verlag, New York, NY, 2008, pp. 1–34.
[11] E.J. McShane, Unified Integration, Academic Press, Orlando, FL, 1983.
[12] R. Penrose, A generalized inverse for matrices, Proc. Camb. Philos. Soc. 51 (1955) 406–413.
[13] R. Penrose, On best approximate solution of linear matrix equations, Proc. Camb. Philos. Soc. 52 (1956) 17–19.
[14] A. Ralston, Mathematical Methods for Digital Computers, John Wiley & Sons, New York, NY, 1960.
[15] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[16] R.R. Srivastava, L. Liu, Applications of belief functions in business decisions: A review, Inf. Syst. Front. 5 (4) (2003) 359–378.