The Natural Order-Generic Collapse for ω-Representable Databases over the Rational and the Real Ordered Group

Nicole Schweikardt
Institut für Informatik / FB 17, Johannes Gutenberg-Universität, D-55099 Mainz
http://www.informatik.uni-mainz.de/~nisch/homepage.html

Abstract. We consider order-generic queries, i.e., queries which commute with every order-preserving automorphism of a structure's universe. It is well-known that first-order logic has the natural order-generic collapse over the rational and the real ordered group for the class of dense order constraint databases (also known as finitely representable databases). I.e., on this class of databases over ⟨ℚ, <⟩ or ⟨ℝ, <⟩, addition does not add to the expressive power of first-order logic for defining order-generic queries. In the present paper we develop a natural generalization of the notion of finitely representable databases, where an arbitrary (i.e. possibly infinite) number of regions is allowed. We call these databases ω-representable, and we prove the natural order-generic collapse over the rational and the real ordered group for this larger class of databases.

Keywords: Logic in Computer Science, Database Theory, Constructive Mathematics

1 Introduction and Main Results

In relational database theory a database is modelled as a relational structure over a fixed, possibly infinite universe U. A k-ary query is a mapping Q which assigns to each database A a k-ary relation Q(A) ⊆ U^k. In many applications the elements in U only serve as identifiers which are exchangeable. If this is the case, one demands that queries commute with every permutation of U. Such queries are called generic. If U is linearly ordered, a query may refer to the ordering. In this setting it is more appropriate to consider queries which commute with every order-preserving (i.e. strictly increasing) mapping of U. Such queries are called order-generic.
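The commutation condition behind order-genericity can be tried out on a small finite database. The following is only a sketch under stated assumptions: the relation name `P`, the two queries, and the map `alpha` are hypothetical illustrations that do not come from the paper, and a single map can only refute order-genericity, never establish it.

```python
from fractions import Fraction

def apply_map(alpha, relation):
    """Image of a finite relation (a set of tuples) under a map alpha: U -> U."""
    return {tuple(alpha(x) for x in row) for row in relation}

def commutes(query, alpha, database):
    """Check Q(alpha(A)) == alpha(Q(A)) for one database A and one map alpha."""
    mapped_db = {name: apply_map(alpha, rel) for name, rel in database.items()}
    return query(mapped_db) == apply_map(alpha, query(database))

# Two hypothetical unary queries on a database with one unary relation "P".
def has_larger(db):
    """Elements of P lying strictly below another element of P (uses < only)."""
    elems = {row[0] for row in db["P"]}
    return {(x,) for x in elems if any(x < y for y in elems)}

def has_successor(db):
    """Elements x of P such that x + 1 is in P as well (uses addition)."""
    elems = {row[0] for row in db["P"]}
    return {(x,) for x in elems if x + 1 in elems}

def alpha(x):
    """A strictly increasing bijection of the rationals: x -> 2x for x >= 0, x otherwise."""
    return 2 * x if x >= 0 else x

db = {"P": {(Fraction(1),), (Fraction(2),), (Fraction(5),)}}
print(commutes(has_larger, alpha, db))     # True
print(commutes(has_successor, alpha, db))  # False
```

The second query is definable with addition but fails to commute with `alpha`, so it is not order-generic. The collapse results discussed in this paper only concern formulas that are order-generic to begin with.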
A basic way of expressing order-generic queries is by first-order formulas that make use of the linear ordering and of the database relations. Database theorists distinguish between two different semantics: active semantics, where quantifiers only range over database elements, and the (possibly) stronger natural semantics, where quantifiers range over all of U. In the present paper we always consider natural semantics.

L. Fribourg (Ed.): CSL 2001, LNCS 2142, pp. 130–144, 2001. © Springer-Verlag Berlin Heidelberg 2001

It is a reasonable question whether the use of additional, e.g. arithmetical, predicates on U allows first-order logic to express more order-generic queries than with linear ordering alone. In some situations this question can be answered “yes” (e.g. if U is the set of natural numbers with + and × as additional predicates, cf. [3]). In other situations the question must be answered “no” (e.g. if U is the set of natural numbers with + alone, cf. [8]). Results of the latter kind are called collapse results, because first-order logic with the additional predicates collapses to first-order logic with linear ordering alone. A recent overview of this area of research is given in [3].

In classical database theory, attention usually is restricted to finite databases. In this setting Benedikt et al. [2] have obtained a strong collapse result: First-order logic has the natural order-generic collapse for finite databases over o-minimal structures. This means that if the universe U, together with the additional predicates, has a certain property called o-minimality, then for every order-generic first-order formula ϕ which uses the additional predicates, there is a formula with linear ordering alone which is equivalent to ϕ on all finite databases. Belegradek et al.
[1] have extended this result: Instead of o-minimality they consider quasi o-minimality, and instead of finite databases they consider finitely representable databases (also known as dense order constraint databases). Many structures interesting to database theory, including ⟨ℕ, <, +⟩, ⟨ℚ, <, +⟩, ⟨ℝ, <, +⟩, and ⟨ℝ, <, +, ×, e^x⟩, are indeed o-minimal or at least quasi o-minimal.

A database is called finitely representable if each of its relations can be explicitly defined by a first-order formula which makes use of the linear ordering and of finitely many constants in U. For U ∈ {ℚ, ℝ}, finitely representable databases are exactly those databases where every relation is defined by a Boolean combination of order-constraints over U. I.e., those database relations essentially consist of a finite number of multidimensional rectangles in U.

A reasonable question is whether such collapse results hold for even larger classes of databases. In [8] it was shown that over ⟨ℕ, <, +⟩ the natural order-generic collapse does indeed hold for arbitrary databases. However, this result cannot be carried over to dense linear orders: Belegradek et al. have shown (cf. [1, Theorem 3.2]) that e.g. over ⟨ℚ, <, +⟩ the natural order-generic collapse does not hold for arbitrary databases.

This result draws a borderline between finite and finitely representable databases on the one side and arbitrary databases on the other. In the present paper we extend that borderline. We develop a natural generalization of the notion of finitely representable databases. We call these databases ω-representable, and we obtain the following

Main Theorem 1. First-order logic has the natural order-generic collapse for ω-representable databases over ⟨ℚ, <, +⟩ and ⟨ℝ, <, +⟩.

We call a database ω-representable if each of its relations can be explicitly defined by a formula in infinitary logic which makes use of the linear ordering and of a countable, unbounded sequence of constants s_1 < s_2 < ⋯ in U.
For U ∈ {ℚ, ℝ}, ω-representable databases turn out to be exactly those databases where every relation is defined by an infinitary Boolean combination of order-constraints over U, (s_n)_{n≥1}. I.e., those database relations essentially consist of a finite or a countable number of multidimensional rectangles in U.

In particular, the theorem above shows that there is a natural class that contains “essentially infinite” databases, to which the collapse results of Benedikt et al. and Belegradek et al. can be generalized, for the special case of ⟨ℚ, <, +⟩ or ⟨ℝ, <, +⟩ as underlying structures.

The two main tools for proving Main Theorem 1 are (1.) a result of [8] that implies, for U ∈ {ℚ, ℝ}, the natural order-generic collapse over ⟨U, <, +⟩ for the class of ω-databases (these are the databases whose active domain is either finite or consists of an unbounded sequence s_1 < s_2 < ⋯ of elements in U), and (2.) the following Main Theorem 2, which allows us to lift collapse results for ω-databases to collapse results for ω-representable databases.

Main Theorem 2. Let ⟨U, <, …⟩ be an extension of ⟨U, <⟩ with arbitrary additional predicates. If first-order logic has the natural order-generic collapse over ⟨U, <, …⟩ for the class of ω-databases, then it also has the natural order-generic collapse over ⟨U, <, …⟩ for the class of ω-representable databases.

Structure of the Paper. In section 2 we provide the notation used throughout the paper. In section 3 we give an outline of the proof and we point out analogies and differences compared with related papers which use a similar proof method. In section 4 we explain the collapse result of [8] which gives us the collapse for ω-databases. In section 5 we examine infinitary logic and give a characterization of ω-representable relations. In section 6 we explain how an ω-representable database can be represented by an ω-database.
In section 7 we show that there are first-order interpretations that map an ω-representable database to an ω-database, and vice versa. In section 8 we prove the two main theorems. In section 9 we conclude the paper by pointing out further questions and a potential application.

2 Preliminaries

We use ℚ for the set of rationals, ℝ for the set of reals, and ω for the set of non-negative integers. For r, s ∈ ℝ we write int[r, s] to denote the closed interval {x ∈ ℝ : r ≤ x ≤ s}. Analogously, we write int[r, s) for the half-open interval int[r, s] \ {s}, and int(r, s) for the open interval int[r, s] \ {r, s}. Depending on the particular context, we use x⃗ as abbreviation for a sequence x_1, …, x_m or a tuple (x_1, …, x_m). Accordingly, if q is a mapping defined on all elements in x⃗, we write q(x⃗) to denote the sequence q(x_1), …, q(x_m) or the tuple (q(x_1), …, q(x_m)). If R is an m-ary relation on the domain of q, we write q(R) to denote the relation {q(x⃗) : x⃗ ∈ R}. Instead of x⃗ ∈ R we often write R(x⃗). For two disjoint sets A and B we write A ⊔ B to denote the disjoint union of A and B.

First-Order Logic FO(τ). A signature τ consists of finitely many relation and constant symbols. Each relation symbol R ∈ τ has a fixed arity ar(R) ∈ ω. Whenever we refer to some “c ∈ τ”, we implicitly assume that c is a constant symbol in τ. Analogously, “R ∈ τ” always means that R is a relation symbol in τ. We use x_1, x_2, … as variable symbols. Atomic τ-formulas are y_1 = y_2 and R(y_1, …, y_m), where R ∈ τ is of arity, say, m and y_1, …, y_m are constant symbols in τ or variable symbols. FO(τ)-formulas are built up as usual from the atomic τ-formulas and the logical connectives ∧, ∨, ¬, the variable symbols x_1, x_2, …, the existential quantifier ∃, and the universal quantifier ∀. We write qd(ϕ) to denote the quantifier depth of a formula ϕ, i.e., the maximum number of nested quantifiers that occurs in ϕ.
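The quantifier depth defined above is a simple recursion on the structure of a formula. The sketch below illustrates it; the class names `Atom`, `Not`, `BinOp`, and `Quant` are hypothetical choices made for this example and are not notation from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Atom:
    symbol: str             # a relation symbol, "=" or "<"
    args: Tuple[str, ...]   # variable or constant symbols

@dataclass(frozen=True)
class Not:
    sub: "Formula"

@dataclass(frozen=True)
class BinOp:
    op: str                 # "and" or "or"
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Quant:
    kind: str               # "exists" or "forall"
    var: str
    body: "Formula"

Formula = Union[Atom, Not, BinOp, Quant]

def qd(phi: Formula) -> int:
    """Quantifier depth: the maximum number of nested quantifiers in phi."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, Not):
        return qd(phi.sub)
    if isinstance(phi, BinOp):
        return max(qd(phi.left), qd(phi.right))
    if isinstance(phi, Quant):
        return 1 + qd(phi.body)
    raise TypeError(f"not a formula: {phi!r}")

# exists x1 ( P(x1) and forall x2 ( x1 < x2 or not P(x2) ) )
example = Quant("exists", "x1", BinOp(
    "and",
    Atom("P", ("x1",)),
    Quant("forall", "x2", BinOp(
        "or", Atom("<", ("x1", "x2")), Not(Atom("P", ("x2",)))))))
print(qd(example))  # 2
```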
We sometimes write ϕ(x_1, …, x_k) to indicate that x_1, …, x_k are the free variables of ϕ, i.e., those variables that are not bound by a quantifier. We say that ϕ is a sentence if it has no free variables. If we insert additional constant or relation symbols, e.g. < and +, into a signature τ, then we simply write FO(τ, <, +) instead of FO(τ ∪ {<, +}).

Structures. Let τ be a signature. A τ-structure A = ⟨U, τ^A⟩ consists of an arbitrary set U which is called the universe of A, and a set τ^A that contains
– an interpretation R^A ⊆ U^{ar(R)}, for each R ∈ τ, and
– an interpretation c^A ∈ U, for each c ∈ τ.

The active domain of A is the set of all constants of A, together with the set of all elements in U that belong to one of A's relations. Sometimes we explicitly want to specify the universe U of a τ-structure A. In these cases we say that A is a ⟨U, τ⟩-structure. In the present paper, we only consider structures with universe U ∈ {ℝ, ℚ}.

For a FO(τ)-sentence ϕ we say that A models ϕ and write A ⊨ ϕ to indicate that ϕ is satisfied when interpreting each symbol in τ by its interpretation in τ^A. We write A ⊭ ϕ to indicate that A does not model ϕ. For a FO(τ)-formula ϕ(x_1, …, x_k) and for elements a_1, …, a_k in the universe of A we write A ⊨ ϕ(a_1, …, a_k) to indicate that the (τ ∪ {x_1, …, x_k})-structure ⟨A, a_1, …, a_k⟩ models the FO(τ ∪ {x_1, …, x_k})-sentence ϕ.

Since it is more convenient for our proof, we will talk about structures instead of databases. A structure can be viewed as a database whose database schema may contain not only relation symbols but also constant symbols. This allows us to restrict ourselves to boolean queries (which are formulated by sentences) instead of considering the general case of k-ary queries for arbitrary k (which are formulated by formulas with k free variables).

Order-Generic Collapse. Let U ∈ {ℝ, ℚ}. A mapping α : U → U is called an order-automorphism of U if it is bijective and strictly increasing.
For a ⟨U, τ⟩-structure A we write α(A) to denote the ⟨α(U), τ⟩-structure with R^{α(A)} = α(R^A) for all R ∈ τ and c^{α(A)} = α(c^A) for all c ∈ τ. Let ⟨U, <, …⟩ be an extension of ⟨U, <⟩ with arbitrary additional predicates. A FO(τ, <, …)-sentence ϕ is called order-generic on A iff for every order-automorphism α of U it is true that “⟨A, <, …⟩ ⊨ ϕ iff ⟨α(A), <, …⟩ ⊨ ϕ”.

Let C be a class of structures. We say “first-order logic has the natural order-generic collapse over ⟨U, <, …⟩ on structures in C” to express that the following is valid for every signature τ: Let ϕ be a FO(τ, <, …)-sentence, and let K be the class of all ⟨U, τ⟩-structures in C on which ϕ is order-generic. There exists a FO(τ, <)-sentence ψ which is equivalent to ϕ on K, i.e., “⟨A, <, …⟩ ⊨ ϕ iff ⟨A, <⟩ ⊨ ψ” is true for all A ∈ K.

Infinitary Logic L∞ω(<, S). Infinitary logic is defined in the same way as first-order logic, except that arbitrary (i.e. possibly infinite) disjunctions and conjunctions are allowed. Only in the context of infinitary logic do we allow a signature to contain infinitely many symbols. What we need in the present paper is the following: Let S be a possibly infinite set of constant symbols. The logic L∞ω(<, S) is given by the following clauses:
– It contains all atomic formulas x=y and x<y, where x and y are variable symbols or elements in S.
– If it contains ϕ, then it contains also ¬ϕ.
– If it contains ϕ and if x is a variable symbol, then it contains also ∃x ϕ and ∀x ϕ.
– If Φ is a (possibly infinite) set of L∞ω(<, S)-formulas, then ⋁Φ and ⋀Φ are formulas in L∞ω(<, S).

The semantics is a direct extension of the semantics of first-order logic, where ⋁Φ is true if there is some ϕ ∈ Φ which is true; and ⋀Φ is true if every ϕ ∈ Φ is true.

In the present paper we use infinitary logic only for the universe U = ℝ or U = ℚ, where the constant symbols are interpreted by numbers in U. Consequently, we identify the set S of constant symbols with a set S ⊆ U.
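To see how a countable disjunction over constants can still be evaluated at a given point, consider the following sketch. It makes assumptions that are not in the paper: the constants are the hypothetical sequence s_n = n (strictly increasing and unbounded, as for the ω-representable databases of the introduction), and the formula is the disjunction over all n ≥ 1 of ¬(x < s_{2n−1}) ∧ x < s_{2n}.

```python
from fractions import Fraction
from itertools import count

def s(n):
    """Hypothetical constant sequence s_1 < s_2 < ...; here simply s_n = n."""
    return Fraction(n)

def disjunct(n, x):
    """The n-th disjunct: not (x < s_{2n-1}) and x < s_{2n}."""
    return (not (x < s(2 * n - 1))) and x < s(2 * n)

def big_or(x):
    """Truth value of the countable disjunction at the point x.

    The disjunction is true iff some disjunct is true. Because the sequence is
    strictly increasing and unbounded, every disjunct with s_{2n-1} > x is
    false, so the search for a witness stops after finitely many steps.
    """
    for n in count(1):
        if s(2 * n - 1) > x:
            return False
        if disjunct(n, x):
            return True

print(big_or(Fraction(7, 2)))  # True: 3 <= 7/2 < 4
print(big_or(Fraction(2)))     # False: 2 lies in the gap [2, 3)
```

The early exit relies on unboundedness of the sequence. For an arbitrary infinite set of constants, such a search need not terminate.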
Sets of Type at Most ω, ω-Structures, and ω-Representable Structures. Let U ∈ {ℝ, ℚ}. We say that S ⊆ U is of type ω if ⟨U, <, S⟩ is isomorphic to ⟨U, <, ω⟩. One can easily see that S is of type ω if and only if S = {s_1 < s_2 < ⋯}, where the sequence (s_n)_{n≥1} is strictly increasing and unbounded. Accordingly, we say that S is of type at most ω if S is finite or of type ω. We say that a ⟨U, τ⟩-structure A is an ω-structure if the active domain of A is of type at most ω.

A relation R ⊆ U^m is called ω-representable if there is a set S ⊆ U of type at most ω such that R is definable in L∞ω(<, S), i.e. there is a L∞ω(<, S)-formula ϕ_R(x_1, …, x_m) with R = {a⃗ ∈ U^m : U ⊨ ϕ_R(a⃗)}. Accordingly, a ⟨U, τ⟩-structure A is called ω-representable if each of A's relations is ω-representable.

For better readability, we formulate the rest of the paper only for the case U = ℝ. However, all statements remain correct if one replaces ℝ by ℚ.

3 Outline of the Proof – The Lifting Method

It is by now quite a common method in database theory to lift results from one class of databases to another. This lifting method can be described as follows:

Known: A result for a class of “easy” databases.
Wanted: The analogous result for a class of “complicated” databases.
Method:
(1.) Show that all the relevant information about a “complicated” database can be represented by an “easy” database.
(2.) Show that the translation from the “complicated” to the “easy” database (and vice versa) can be performed in an appropriate way (e.g. via an efficient algorithm or via FO-formulas).
(3.) Use this to translate the known result for the “easy” databases into the desired result for the “complicated” databases.

In the literature the “easy” database which represents a “complicated” database is usually called the invariant of the “complicated” database. Table 1 gives a listing of recent papers in which the lifting method has been used.

Table 1.
Some papers using the lifting method.

| paper | “compl.” dbs | “easy” dbs | result (“easy” dbs) | result (“compl.” dbs) |
|---|---|---|---|---|
| [9] | planar spatial dbs | finite dbs | evaluation of fixpoint+counting queries | evaluation of top. FO(R, <)-queries |
| [7] | region dbs | finite dbs | order-generic collapse over ⟨ℝ, <, +, ×⟩ (cf. [2]) | collapse from top. FO(R, <, +, ×)-queries to top. FO(R, <)-queries |
| [5] | finitely rep. dbs | finite dbs | logical characterization of complexity classes | complexity of query evaluation |
| [1] | finitely rep. dbs | finite dbs | order-generic collapse over quasi o-minimal structures | order-generic collapse over quasi o-minimal structures |
| [here] | ω-rep. dbs | ω-dbs | order-generic collapse over ⟨ℝ, <, +⟩ | order-generic collapse over ⟨ℝ, <, +⟩ |

In particular, Belegradek, Stolboushkin, and Taitslin [1] and Grädel and Kreutzer [5] show that all the relevant information about a finitely representable database (i.e. a database defined by a finite Boolean combination of order-constraints) can be represented by a finite database, and that the translation from finitely representable to finite (and vice versa, in [1]) can be done by a first-order interpretation.

Grädel and Kreutzer use this translation to carry over logical characterizations of complexity classes to results on the data complexity of query evaluation. They lift, e.g., the well-known logical characterization “PTIME = FO+LFP on ordered finite structures” to the result stating that the polynomial time computable queries against finitely representable databases are exactly the FO+LFP-definable queries.

Belegradek, Stolboushkin, and Taitslin use their FO-translations from finitely representable databases to finite databases (and vice versa) to lift collapse results for finite databases to collapse results for finitely representable databases.

In the present paper the same is done for ω-representable databases and ω-databases (instead of finitely representable databases and finite databases, respectively). I.e.:

(1.)
We show how all the relevant information about an ω-representable database can be represented by an ω-database (cf. sections 5 and 6). The representation here is considerably different from the representations of [1] and [5]. It is, as the author feels, more natural for the context considered in the present paper.
(2.) We show that the translation from the ω-representable to the ω-database (and vice versa) can be done by a first-order interpretation (cf. section 7).
(3.) We use this translation to carry over a collapse result for ω-databases from [8] to a collapse result for ω-representable databases (cf. section 8).

4 The Collapse Result for ω-Structures

In [8] a structure A is called nicely representable if it satisfies the following conditions:
(1) There is an infinite sequence (I_n)_{n∈ω} of intervals I_n = int[l_n, r_n], such that l_n ≤ r_n < l_{n+1}, and the sequence (r_n)_{n∈ω} is unbounded,
(2) ⋃_{n∈ω} I_n is the active domain of A,
(3) every relation R^A of A is constant on the multi-dimensional rectangles I_{n_1} × ⋯ × I_{n_{ar(R)}} (for all n_1, …, n_{ar(R)} ∈ ω). I.e., either all elements in I_{n_1} × ⋯ × I_{n_{ar(R)}} belong to R^A, or no element in I_{n_1} × ⋯ × I_{n_{ar(R)}} belongs to R^A.

Theorem 1 ([8], Theorem 4). First-order logic has the natural order-generic collapse over ⟨ℝ_{≥0}, <, +⟩ for nicely representable structures.

Let us mention that the class of ω-representable structures (considered in the present paper) properly contains both the class of finitely representable and the class of nicely representable structures, whereas the class of nicely representable structures does not contain the class of finitely representable structures.

The proof of Theorem 1 presented in [8] even shows the slightly stronger result which states that first-order logic has the natural order-generic collapse over ⟨ℝ, <, +⟩ for structures that satisfy the conditions (1), (2’), and (3), where the condition (2’) says that there is a set N ⊆ ω such that ⋃_{n∈N} I_n is the active domain of A.
In particular ω-structures, i.e. structures whose active domain is of type at most ω, do satisfy the conditions (1), (2’), and (3). This gives us the following

Corollary 1. First-order logic has the natural order-generic collapse over ⟨ℝ, <, +⟩ for ω-structures.

5 Infinitary Logic and ω-Representable Relations

It is well-known that FO(<, S) allows quantifier elimination over ℝ, for every set of constants S ⊆ ℝ. In this section we show that also L∞ω(<, S) allows quantifier elimination over ℝ, provided that S is of type at most ω. Recall from section 2 that S ⊆ ℝ is of type ω if and only if S = {s_1 < s_2 < ⋯}, where the sequence (s_n)_{n≥1} is strictly increasing and unbounded. Accordingly, S is of type at most ω if S is finite or of type ω. However, our aim is not only to show that L∞ω(<, S) allows quantifier elimination, but to give an explicit characterization of the quantifier free formulas. This characterization will give us full understanding of what ω-representable relations look like.

Before giving the formalization of the quantifier elimination let us fix some notation. For the rest of this paper let S ⊆ ℝ always be of type at most ω. We write S(i) to denote the i-th smallest element in S. For infinite S we define S(0) := −∞ and N(S) := ω, and we obtain ℝ = ⋃_{i∈N(S)} int[S(i), S(i+1)). For finite S we define S(0) := −∞, N(S) := {0, …, |S|}, and S(|S|+1) := +∞; and, as before, we obtain ℝ = ⋃_{i∈N(S)} int[S(i), S(i+1)).

For m ≥ 1 and ı⃗ = (i_1, …, i_m) ∈ N(S)^m we define S(ı⃗) := (S(i_1), …, S(i_m)), and
Cube_{S;ı⃗} := int[S(i_1), S(i_1+1)) × ⋯ × int[S(i_m), S(i_m+1)).
We say that S(ı⃗) are the coordinates of the cube Cube_{S;ı⃗}. Obviously, ℝ^m = ⋃_{ı⃗∈N(S)^m} Cube_{S;ı⃗}.

Let a⃗ = (a_1, …, a_m) ∈ ℝ^m. The type type_{a⃗;S;ı⃗} of a⃗ with respect to Cube_{S;ı⃗} is the conjunction of all atoms in {y_i=x_i, y_i<x_i, x_i=x_j, x_i<x_j : i, j ∈ {1, …, m}, i ≠ j} which are satisfied if one interprets the variables x_1, …, x_m, y_1, …, y_m by the numbers a_1, …, a_m, S(i_1), …, S(i_m). We define types_m to be the set of all complete conjunctions of atoms in {y_i=x_i, y_i<x_i, x_i=x_j, x_i<x_j : i, j ∈ {1, …, m}, i ≠ j}, i.e., the set of all conjunctions t where, for all i, j ∈ {1, …, m} with i ≠ j, either y_i=x_i or y_i<x_i occurs in t, and either x_i=x_j or x_i<x_j or x_j<x_i occurs in t. Of course, types_m is finite, and type_{a⃗;S;ı⃗} ∈ types_m. Analogously, we define Types_m to be the set of all subsets of types_m, i.e., Types_m = {T : T ⊆ types_m}. Of course, Types_m is finite.

For a relation R ⊆ ℝ^m we define Type_{R;S;ı⃗} := {type_{a⃗;S;ı⃗} : a⃗ ∈ R ∩ Cube_{S;ı⃗}} to be the set of all types occurring in the restriction of R to Cube_{S;ı⃗}. We say that Type_{R;S;ı⃗} is the type of Cube_{S;ı⃗} in R. Of course, Type_{R;S;ı⃗} ∈ Types_m.

In the formalization of the quantifier elimination we further use the following notation: If ϕ is a L∞ω(<, S)-formula with free variables x⃗ := x_1, …, x_k and y⃗ := y_1, …, y_m, we write ϕ(y⃗/S(ı⃗)) to denote the formula one obtains by replacing the variables y_1, …, y_m by the real numbers S(i_1), …, S(i_m).

Proposition 1 (Quantifier Elimination). Let S ⊆ ℝ be of type at most ω and let m ≥ 1. Every formula ϕ(x_1, …, x_m) in L∞ω(<, S) is equivalent over ℝ to the formula

$$\tilde{\varphi}(\vec{x}) \;:=\; \bigvee_{\vec{\imath} \in N(S)^m} \; \bigvee_{t \in \mathrm{Type}_{R;S;\vec{\imath}}} \Big( t\big(\vec{y}/S(\vec{\imath})\big) \;\wedge\; \bigwedge_{j=1}^{m} S(i_j) \le x_j < S(i_j+1) \Big)$$

where R ⊆ ℝ^m is the relation defined by ϕ(x⃗). I.e., R = {a⃗ ∈ ℝ^m : ℝ ⊨ ϕ(a⃗)} = {a⃗ ∈ ℝ^m : ℝ ⊨ ϕ̃(a⃗)}.

The proof is similar to the quantifier elimination for FO(<, S) over ℝ. Due to space limitations it is omitted here.

Recall from section 2 that a relation R ⊆ ℝ^m is called ω-representable iff there is a set S ⊆ ℝ of type at most ω such that R is definable in L∞ω(<, S). From Proposition 1 we know what R looks like: It is defined by an infinitary boolean combination of order-constraints over S, and it essentially consists of a finite or a countable number of multidimensional rectangles.
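For a finite set S, the cube containing a point and the type of the point with respect to that cube can be computed directly from the definitions of this section. The following is a minimal sketch under assumptions: S is given as a sorted Python list, S(0) = −∞ is modelled by a float, and the function names and the encoding of a type as a set of atom strings are hypothetical choices for this illustration.

```python
from bisect import bisect_right
from itertools import permutations

NEG_INF = float("-inf")  # stands for S(0)

def cube_index(S, a):
    """Index tuple (i_1, ..., i_m) with S(i_j) <= a_j < S(i_j + 1).

    S is a sorted list holding S(1), ..., S(|S|); index 0 means a_j < S(1).
    """
    return tuple(bisect_right(S, aj) for aj in a)

def coordinates(S, idx):
    """The coordinates S(i_1), ..., S(i_m) of the cube with index tuple idx."""
    return tuple(S[i - 1] if i > 0 else NEG_INF for i in idx)

def type_of(S, a):
    """All atoms y_j = x_j, y_j < x_j, x_i = x_j, x_i < x_j (i != j) that hold
    when x is interpreted by a and y by the coordinates of a's cube."""
    y = coordinates(S, cube_index(S, a))
    atoms = set()
    for j in range(len(a)):
        atoms.add(f"y{j+1}=x{j+1}" if y[j] == a[j] else f"y{j+1}<x{j+1}")
    for i, j in permutations(range(len(a)), 2):
        if a[i] == a[j]:
            atoms.add(f"x{i+1}=x{j+1}")
        elif a[i] < a[j]:
            atoms.add(f"x{i+1}<x{j+1}")
    return frozenset(atoms)

S = [0, 10, 20]
print(cube_index(S, (3, 10)))       # (1, 2)
print(sorted(type_of(S, (3, 10))))  # ['x1<x2', 'y1<x1', 'y2=x2']
```

Two points of the same cube with the same type, such as (3, 10) and (5, 10) here, cannot be distinguished by any formula of L∞ω(<, S). This is the content of Proposition 1.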
(Note, however, that also certain triangles are allowed, e.g. via the constraint S(i) ≤ x_1 < x_2 < S(i+1).) An ω-representable binary relation is illustrated in Figure 1.

Fig. 1. An ω-rep. binary relation R. The grey regions are those that belong to R.

6 ω-Representations of Relations and Structures

Definition 1. Let R ⊆ ℝ^m. A set S ⊆ ℝ is called sufficient for defining R if S is of type at most ω and R is definable in L∞ω(<, S).

Remark 1. We say that a relation R ⊆ ℝ^m is constant on a set M ⊆ ℝ^m if either all elements of M belong to R or no element of M belongs to R. From Proposition 1 we obtain that a set S ⊆ ℝ of type at most ω is sufficient for defining R if and only if R is constant on the sets
Cube_{S;ı⃗;t} := {b⃗ ∈ Cube_{S;ı⃗} : type_{b⃗;S;ı⃗} = t},
for all ı⃗ ∈ N(S)^m and all t ∈ types_m.

Let R ⊆ ℝ^m be ω-representable and let S ⊆ ℝ be sufficient for defining R. From Remark 1 we know, for all ı⃗ ∈ N(S)^m and all t ∈ types_m, that either R ∩ Cube_{S;ı⃗;t} = ∅ or R ⊇ Cube_{S;ı⃗;t}. This means that if we know, for each ı⃗ ∈ N(S)^m and each t ∈ types_m, whether or not R contains an element of Cube_{S;ı⃗;t}, then we can reconstruct the entire relation R.

For i_j ≠ 0 we represent the interval int[S(i_j), S(i_j+1)) ⊆ ℝ by the number S(i_j). Consequently, for ı⃗ ∈ (N(S) \ {0})^m, we can represent Cube_{S;ı⃗;t} ⊆ ℝ^m by the tuple S(ı⃗) ∈ S^m. The information whether or not R contains an element of Cube_{S;ı⃗;t} can be represented by the relation R_{S;t} := {S(ı⃗) : ı⃗ ∈ (N(S) \ {0})^m and R ∩ Cube_{S;ı⃗;t} ≠ ∅}.

In general, we would like to represent every Cube_{S;ı⃗;t}, for every ı⃗ ∈ N(S)^m, by a tuple in S^m. Unfortunately, the case where i_j = 0 must be treated separately, because S(0) = −∞ ∉ S. There are various possibilities for solving this technical problem. Here we propose the following solution: Use S(1) to represent the interval int[S(0), S(1)). With every tuple ı⃗ ∈ N(S)^m we associate a characteristic tuple char(ı⃗) := (c_1, …, c_m) ∈ {0, 1}^m and a tuple ı⃗′ ∈ (N(S) \ {0})^m via c_j := 0 and i′_j := 1 if i_j = 0, and c_j := 1 and i′_j := i_j if i_j ≠ 0. Now Cube_{S;ı⃗;t} can be represented by the tuple S(ı⃗′) ∈ S^m. The information whether or not R contains an element of Cube_{S;ı⃗;t} can be represented by the relations R_{S;t;u⃗} := {S(ı⃗′) : ı⃗ ∈ N(S)^m, char(ı⃗) = u⃗, and R ∩ Cube_{S;ı⃗;t} ≠ ∅} (for all u⃗ ∈ {0, 1}^m). This leads to

Definition 2 (ω-Representation of a Relation). Let R ⊆ ℝ^m be ω-representable, and let S ⊆ ℝ be sufficient for defining R.
(a) We represent the m-ary relation R over ℝ by a finite number of m-ary relations over S as follows: The ω-representation of R with respect to S is the collection
rep_S(R) := (R_{S;t;u⃗})_{t ∈ types_m, u⃗ ∈ {0,1}^m},
where R_{S;t;u⃗} := {S(ı⃗′) : ı⃗ ∈ N(S)^m, char(ı⃗) = u⃗, and R ∩ Cube_{S;ı⃗;t} ≠ ∅}. Here, for ı⃗ ∈ N(S)^m we define ı⃗′ and char(ı⃗) via i′_j := 1 and char(ı⃗)_j := 0 if i_j = 0, and i′_j := i_j and char(ı⃗)_j := 1 if i_j ≠ 0.
(b) For x⃗ ∈ Cube_{S;ı⃗;t} we say that u⃗ := char(ı⃗) is the characteristic tuple of x⃗ w.r.t. S, y⃗ := S(ı⃗′) is the representative of x⃗ w.r.t. S, and t is the type of x⃗ w.r.t. S. From Remark 1 we obtain that x⃗ ∈ R iff y⃗ ∈ R_{S;t;u⃗}.

We will now transfer the notion of “ω-representation” from relations to τ-structures. Recall from section 2 that a ⟨ℝ, τ⟩-structure A is called ω-representable iff each of A's relations is ω-representable.

Definition 3. Let A be a ⟨ℝ, τ⟩-structure. A set S ⊆ ℝ is called sufficient for defining A if
– S is of type at most ω,
– c^A ∈ S, for every constant symbol c ∈ τ, and
– S is sufficient for defining R^A, for every relation symbol R ∈ τ.

Let A be a ⟨ℝ, τ⟩-structure, and let S be a set sufficient for defining A. According to Definition 2, each of A's relations R^A of arity, say, m can be represented by a finite collection rep_S(R^A) = (R^A_{S;t;u⃗})_{t ∈ types_m, u⃗ ∈ {0,1}^m} of relations over S. I.e., A can be represented by a structure rep_S(A) with active domain S as follows:

Definition 4 (ω-Representation of a Structure).
Let τ be a signature.
(a) The type extension τ′ of τ is the signature which consists of
– the same constant symbols as τ,
– a unary relation symbol S, and
– a relation symbol R_{t;u⃗} of arity, say, m, for every relation symbol R ∈ τ of arity m, every t ∈ types_m, and every u⃗ ∈ {0, 1}^m.
(b) Let A be an ω-representable ⟨ℝ, τ⟩-structure and let S be a set sufficient for defining A. We represent A by the ⟨ℝ, τ′⟩-structure rep_S(A) which satisfies
– c^{rep_S(A)} = c^A (for each c ∈ τ),
– S^{rep_S(A)} = S (for the unary relation symbol S ∈ τ′), and
– R_{t;u⃗}^{rep_S(A)} = R^A_{S;t;u⃗} (for each R ∈ τ, each t ∈ types_{ar(R)}, and each u⃗ ∈ {0, 1}^{ar(R)}).

7 FO-Interpretations

The concept of first-order interpretations (or, reductions) is well-known in mathematical logic (cf., e.g. [4]). In the present paper we consider the following easy version:

Definition 5 (FO-Interpretation of σ in τ). Let σ and τ be signatures. A FO-interpretation of σ in τ is a collection
Φ = ⟨(ϕ_c(x))_{c∈σ}, (ϕ_R(x_1, …, x_{ar(R)}))_{R∈σ}⟩
of FO(τ)-formulas. For every ⟨U, τ⟩-structure A, the ⟨U, σ⟩-structure Φ(A) is given via
– {c^{Φ(A)}} = {a ∈ U : A ⊨ ϕ_c(a)}, for each constant symbol c ∈ σ,
– R^{Φ(A)} = {a⃗ ∈ U^{ar(R)} : A ⊨ ϕ_R(a⃗)}, for each relation symbol R ∈ σ.

Making use of a FO-interpretation of σ in τ, one can translate FO(σ)-formulas into FO(τ)-formulas (cf., [4, Exercise 11.2.4]):

Lemma 1. Let σ and τ be signatures, let Φ be a FO-interpretation of σ in τ, and let d be the maximum quantifier depth of the formulas in Φ. For every FO(σ)-sentence χ there is a FO(τ)-sentence χ′ with qd(χ′) ≤ qd(χ)+d, such that “A ⊨ χ′ iff Φ(A) ⊨ χ” is true for every ⟨U, τ⟩-structure A.

Proof. χ′ is obtained from χ by replacing every atomic formula R(x⃗) (resp. x=c) by the formula ϕ_R(x⃗) (resp. ϕ_c(x)).
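The replacement of atoms used in the proof of Lemma 1 is a purely syntactic operation. The sketch below illustrates it for relation symbols only (constants are left out), under assumptions: formulas are encoded as nested tuples, and the bound variables of the interpretation formulas are assumed not to occur in the sentence being translated, so that no variable capture can happen. The encoding and all names are hypothetical.

```python
def qd(phi):
    """Quantifier depth of a formula encoded as a nested tuple."""
    tag = phi[0]
    if tag == "atom":
        return 0
    if tag == "not":
        return qd(phi[1])
    if tag in ("and", "or"):
        return max(qd(phi[1]), qd(phi[2]))
    if tag in ("exists", "forall"):
        return 1 + qd(phi[2])
    raise ValueError(tag)

def rename(phi, mapping):
    """Rename the free variables of phi according to mapping (no capture check)."""
    tag = phi[0]
    if tag == "atom":
        return ("atom", phi[1], tuple(mapping.get(v, v) for v in phi[2]))
    if tag == "not":
        return ("not", rename(phi[1], mapping))
    if tag in ("and", "or"):
        return (tag, rename(phi[1], mapping), rename(phi[2], mapping))
    if tag in ("exists", "forall"):
        inner = {k: v for k, v in mapping.items() if k != phi[1]}
        return (tag, phi[1], rename(phi[2], inner))
    raise ValueError(tag)

def translate(chi, interp):
    """Replace every atom R(args) with R in interp by phi_R(args)."""
    tag = chi[0]
    if tag == "atom":
        _, symbol, args = chi
        if symbol in interp:
            params, body = interp[symbol]
            return rename(body, dict(zip(params, args)))
        return chi
    if tag == "not":
        return ("not", translate(chi[1], interp))
    if tag in ("and", "or"):
        return (tag, translate(chi[1], interp), translate(chi[2], interp))
    if tag in ("exists", "forall"):
        return (tag, chi[1], translate(chi[2], interp))
    raise ValueError(tag)

# phi_E(u1, u2) := exists z ( P(z) and u1 < z and z < u2 ), so d = 1.
phi_E = ("exists", "z", ("and", ("atom", "P", ("z",)),
         ("and", ("atom", "<", ("u1", "z")), ("atom", "<", ("z", "u2")))))
interp = {"E": (("u1", "u2"), phi_E)}

# chi := exists x forall y ( E(x, y) or x = y )
chi = ("exists", "x", ("forall", "y",
       ("or", ("atom", "E", ("x", "y")), ("atom", "=", ("x", "y")))))
chi_prime = translate(chi, interp)
print(qd(chi), qd(chi_prime))  # 2 3
```

In this instance the translated sentence has quantifier depth 3 = qd(χ) + d, which matches the bound stated in Lemma 1.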
The following lemma shows that A is first-order definable in rep_S(A), i.e.: all relevant information about A can be reconstructed from rep_S(A) (if A is ω-representable and if S is sufficient for defining A).

Lemma 2. There is a FO-interpretation Φ of τ in τ′ ∪ {<} such that Φ(⟨rep_S(A), <⟩) = A, for every ω-representable ⟨ℝ, τ⟩-structure A and every set S which is sufficient for defining A.

Proof (sketch). For every constant symbol c ∈ τ we define ϕ_c(x) := x=c. For every relation symbol R ∈ τ of arity, say, m we construct a formula ϕ_R(x⃗) which expresses that x⃗ ∈ R. From Definition 2(b) we know that x⃗ ∈ R iff y⃗ ∈ R_{S;t;u⃗}, where y⃗, t, and u⃗ are the representative, the type, and the characteristic tuple, respectively, of x⃗ w.r.t. S. It is straightforward to construct, for fixed t ∈ types_m and u⃗ ∈ {0, 1}^m, a FO(τ′, <)-formula ψ_{t;u⃗}(x⃗) which expresses that
– x⃗ has type t w.r.t. S,
– u⃗ is the characteristic tuple of x⃗ w.r.t. S, and
– for the representative y⃗ of x⃗ w.r.t. S it holds that R_{t;u⃗}(y⃗).
The disjunction of the formulas ψ_{t;u⃗}(x⃗), for all t ∈ types_m and all u⃗ ∈ {0, 1}^m, gives us the desired formula ϕ_R(x⃗) which expresses that x⃗ ∈ R.

We now want to show the converse of Lemma 2, i.e., we want to show that the ω-representation of A is first-order definable in A. Up to now the ω-representation rep_S(A) was parameterized by a set S which is sufficient for defining A. For the current step we need the existence of a canonical, first-order definable set S. For this canonization we can use the following result of Grädel and Kreutzer [5, Lemma 8]:

Lemma 3 (Canonical set sufficient for defining R). Let R ⊆ ℝ^m be ω-representable and let S_R be the set of all elements s ∈ ℝ which satisfy the following condition (∗): There are a_1, …, a_m, ε ∈ ℝ, ε > 0, such that one of the following holds:
– For all s′ ∈ int(s−ε, s) and for no s′ ∈ int(s, s+ε) we have R(a⃗[s/s′]). Here a⃗[s/s′] means that all components a_j = s are replaced by s′.
– For no s′ ∈ int(s−ε, s) and for all s′ ∈ int(s, s+ε) we have R(a⃗[s/s′]).
– R(a⃗[s/s′]) holds for all s′ ∈ int(s−ε, s+ε) \ {s}, but not for s′ = s.
– R(a⃗[s/s′]) holds for s′ = s, but not for any s′ ∈ int(s−ε, s+ε) \ {s}.
The following holds true:
(1.) S_R is included in every set S ⊆ ℝ which is sufficient for defining R.
(2.) S_R is sufficient for defining R.
The set S_R is called the canonical set sufficient for defining R.

It is straightforward to formulate a FO(R, <)-formula ζ_R(x) which expresses condition (∗), such that S_R = {s ∈ ℝ : ⟨ℝ, R, <⟩ ⊨ ζ_R(s)} for every ω-representable m-ary relation R.

Definition 6 (Canonical Representation of a Structure). Let τ be a signature and let A be an ω-representable ⟨ℝ, τ⟩-structure. The set
S_A := {c^A : c ∈ τ} ∪ ⋃_{R∈τ} S_{R^A}
is called the canonical set sufficient for defining A. Similarly, the representation canrep(A) := rep_{S_A}(A) is called the canonical representation of A.

Remark 2. It is straightforward to see that α(canrep(A)) = canrep(α(A)) is true for every ω-representable ⟨ℝ, τ⟩-structure A and every order-automorphism α of ℝ.

We are now ready to prove the converse of Lemma 2.

Lemma 4. There is a FO-interpretation Φ′ of τ′ in τ ∪ {<} such that Φ′(⟨A, <⟩) = canrep(A), for every ω-representable ⟨ℝ, τ⟩-structure A.

Proof (sketch). For every constant symbol c ∈ τ′ we define ϕ_c(x) := x=c. For every relation symbol R ∈ τ let ζ_R(x) be the formula from Lemma 3 describing the canonical set sufficient for defining R^A. Obviously, the formula ϕ_S(x) := ⋁_{c∈τ} x=c ∨ ⋁_{R∈τ} ζ_R(x) describes the canonical set sufficient for defining A. For every relation symbol R_{t;u⃗} ∈ τ′ of arity, say, m we construct a formula ϕ_{R_{t;u⃗}}(y⃗) which expresses that y⃗ ∈ R_{t;u⃗}. We make use of Definition 2(b). I.e., ϕ_{R_{t;u⃗}} states that y_1, …, y_m satisfy ϕ_S and that there is some x⃗ such that
– y⃗ is the representative of x⃗ w.r.t. S_A,
– R(x⃗),
– x⃗ has type t w.r.t. S_A, and
– u⃗ is the characteristic tuple of x⃗ w.r.t. S_A.
It is straightforward to formalize this in first-order logic.

8 The Main Theorems and Their Proofs

We first show the following:

Main Theorem 2. Let ⟨ℝ, <, ···⟩ be an extension of ⟨ℝ, <⟩ with arbitrary additional predicates. If first-order logic has the natural order-generic collapse over ⟨ℝ, <, ···⟩ for the class of ω-structures, then it has the natural order-generic collapse over ⟨ℝ, <, ···⟩ for the class of ω-representable structures.

Proof. Let τ be a signature, let ϕ be a FO(τ, <, ···)-sentence, and let K be the class of all ω-representable ⟨ℝ, τ⟩-structures on which ϕ is order-generic. We need to find a FO(τ, <)-sentence ψ such that “⟨A, <, ···⟩ ⊨ ϕ iff ⟨A, <⟩ ⊨ ψ” is valid for all A ∈ K.

Let τ′ be the type extension of τ. We first make use of Lemma 2: Let Φ be the FO-interpretation of τ in τ′ ∪ {<} which is obtained in Lemma 2. In particular, we have Φ(⟨canrep(A), <⟩) = A, for all A ∈ K. From Lemma 1 we obtain a FO(τ′, <, ···)-sentence ϕ′ such that “⟨canrep(A), <, ···⟩ ⊨ ϕ′ iff ⟨Φ(⟨canrep(A), <⟩), <, ···⟩ ⊨ ϕ iff ⟨A, <, ···⟩ ⊨ ϕ” is true for all A ∈ K.

From our assumption we know that first-order logic has the natural order-generic collapse over ⟨ℝ, <, ···⟩ for the class of ω-structures. Of course canrep(A) is an ω-structure. Furthermore, with Remark 2 we obtain that ϕ′ is order-generic on canrep(A) for all A ∈ K. Hence there must be a FO(τ′, <)-sentence ψ′ such that “⟨canrep(A), <, ···⟩ ⊨ ϕ′ iff ⟨canrep(A), <⟩ ⊨ ψ′” is true for all A ∈ K.

We now make use of Lemma 4: Let Φ′ be the FO-interpretation of τ′ in τ ∪ {<} which is obtained in Lemma 4. In particular, we have Φ′(⟨A, <⟩) = canrep(A), for all A ∈ K. According to Lemma 1, we can transform ψ′ into a FO(τ, <)-sentence ψ such that “⟨A, <⟩ ⊨ ψ iff ⟨Φ′(⟨A, <⟩), <⟩ ⊨ ψ′ iff ⟨canrep(A), <⟩ ⊨ ψ′” is true for all A ∈ K. Obviously, ψ is the desired sentence, and hence our proof is complete.

Main Theorem 2 and Corollary 1 directly give us the following

Main Theorem 1.
First-order logic has the natural order-generic collapse over ⟨ℝ, <, +⟩ for the class of ω-representable structures.

9 Conclusion

We have developed the notion of ω-representable databases, which is a natural generalization of the notion of finitely representable (i.e. dense order constraint) databases. We have shown that any collapse result for ω-databases can be lifted to the analogous collapse result for ω-representable databases. In particular, this implies that first-order logic has the natural order-generic collapse over ⟨ℝ, <, +⟩ and ⟨ℚ, <, +⟩ for ω-representable databases.

Recursive Databases. In theoretical computer science one is often interested in objects that admit a finite representation. This is not a priori true for ω-representable databases. However, there is a line of research considering recursive structures (cf. [6]). In this setting a database is called recursive if there is, for each of its relations, an algorithm which effectively decides whether or not an input tuple belongs to that relation. The results of the present paper are, in particular, true for the class of ω-representable recursive databases, which still is a rather natural extension of the class of finitely representable (i.e. dense order constraint) databases.

Open Questions. It is an obvious question whether the collapse results discussed in the present paper also hold for ℤ-databases (i.e. databases whose active domain is of type at most ℤ) and for ℤ-representable databases. It should be straightforward to transform the proof of Main Theorem 2 in such a way that it is valid for these databases. However, we do not know if the corresponding analogue to Corollary 1 is valid.

Another question is whether such a collapse result for ω-representable databases is valid also over structures other than ⟨ℝ, <, +⟩ and ⟨ℚ, <, +⟩. E.g.: Is it valid over ⟨ℝ, <, +, ×⟩, or even over all (quasi) o-minimal structures? (This would then fully generalize the results of Belegradek et al. [1].)
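As a reference point for such generalizations, the proof of Main Theorem 2 can be condensed into a single chain of equivalences, valid for every A ∈ K. The block below is only a summary sketch of that argument. In it, τ′ denotes the type extension of τ, ϕ′ is the FO(τ′, <, ···)-sentence obtained from ϕ via Lemma 2 and Lemma 1, ψ′ is the FO(τ′, <)-sentence provided by the assumed collapse for ω-structures, and ψ is the FO(τ, <)-sentence obtained from ψ′ via Lemma 4 and Lemma 1.

```latex
% Summary of the proof of Main Theorem 2, for every A in K.
% Requires the amsmath package for the align* environment.
\begin{align*}
  \langle A, <, \cdots \rangle \models \varphi
  &\iff \langle \mathrm{canrep}(A), <, \cdots \rangle \models \varphi'
     && \text{(Lemma 2 and Lemma 1)} \\
  &\iff \langle \mathrm{canrep}(A), < \rangle \models \psi'
     && \text{(collapse for $\omega$-structures, using Remark 2)} \\
  &\iff \langle A, < \rangle \models \psi
     && \text{(Lemma 4 and Lemma 1)}
\end{align*}
```

Only the middle step depends on the particular extension ⟨ℝ, <, ···⟩; the first and the last step rely solely on the two interpretations between A and canrep(A).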
We also want to mention a potential application concerning topological queries: Kuijpers and Van den Bussche [7] used the theorem of Benedikt et al. [2] to obtain a collapse result for topological first-order definable queries. One step of their proof was to encode spatial databases (of a certain kind) by finite databases, to which the result of [2] can be applied. Here the question arises whether there is an interesting class of spatial databases that can be encoded by ω-representable (but not by finite) databases in such a way that our main theorem helps to obtain some collapse result for topological queries.

Acknowledgements

I want to thank Luc Segoufin for pointing out to me the connection to topological queries. Furthermore, I thank Clemens Lautemann for helpful discussions on the topics of this paper.

References

1. O.V. Belegradek, A.P. Stolboushkin, and M.A. Taitslin. Extended order-generic queries. Annals of Pure and Applied Logic, 97:85–125, 1999.
2. M. Benedikt, G. Dong, L. Libkin, and L. Wong. Relational expressive power of constraint query languages. Journal of the ACM, 45:1–34, 1998.
3. M. Benedikt and L. Libkin. Expressive power: The finite case. In G. Kuper, L. Libkin, and J. Paredaens, editors, Constraint Databases, pages 55–87. Springer, 2000.
4. H.D. Ebbinghaus and J. Flum. Finite Model Theory. Springer, 1999.
5. E. Grädel and S. Kreutzer. Descriptive complexity theory for constraint databases. In Proc. CSL 1999, volume 1683 of Lecture Notes in Computer Science, pages 67–81. Springer, 1999.
6. D. Harel. Towards a theory of recursive structures. In Proc. MFCS 1998, volume 1450 of Lecture Notes in Computer Science, pages 36–53. Springer, 1998.
7. B. Kuijpers and J. Van den Bussche. On capturing first-order topological properties of planar spatial databases. In Proc. ICDT 1999, volume 1540 of Lecture Notes in Computer Science, pages 187–198. Springer, 1999.
8. C. Lautemann and N. Schweikardt. An Ehrenfeucht-Fraïssé approach to collapse results for first-order queries over embedded databases. In Proc. STACS 2001, volume 2010 of Lecture Notes in Computer Science, pages 455–466. Springer, 2001.
9. L. Segoufin and V. Vianu. Querying spatial databases via topological invariants. Journal of Computer and System Sciences, 61(2):270–301, 2000.