VERTICAL DATABASE DESIGN

The concept of vertical data files is not new. Copeland et al. (1985) presented an attribute-level Decomposition Storage Model (DSM), similar to the Attribute Transposed File model (ATF) (Batory, 1979), that stores each column of a relational table in a separate table. DSM was shown to perform well. It uses surrogate keys to link the individual attributes of a record, and therefore requires a surrogate key to be stored with each attribute of each record in the database. Attribute-level vertical decomposition is also used in remotely sensed imagery, e.g. Landsat Thematic Mapper imagery, where it is called Band Sequential (BSQ) format. Beyond attribute-level decomposition, Wong et al. (1985) presented the Bit Transposed File model (BTF), which further partitions each column down to the bit level and uses encoding methods to reduce storage space.

Because accessing files directly through the operating system is difficult, a higher access layer, the database, was introduced. Most databases store data horizontally, by record, which suits data retrieval but not data mining. A vertical database, by contrast, can serve both purposes. In a vertical database, data is stored vertically and processed horizontally through fast, multi-operand logical operations such as AND, OR, XOR, and complement. The predicate tree (P-tree) is one lossless vertical structure that meets this requirement. P-trees can represent both numerical and categorical data and have been used successfully in OLAP operations (Wang et al., 2003) and in various data mining applications, including classification (Khan et al., 2002), clustering (Denton et al., 2002), and association rule mining (Ding et al., 2002). A vertical database consists of a set of P-trees rather than a set of relational tables.
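As a minimal sketch of what "stored vertically and processed horizontally" can mean in practice, the Python fragment below answers two simple predicates using only AND and complement on bit vectors. It assumes plain, uncompressed bit vectors held as lists of 0/1 rather than actual P-trees; the vectors `b2`, `b1`, `b0` and the helper names are hypothetical and the values are illustrative, not taken from the paper.

```python
# Bit vectors of a hypothetical 3-bit attribute over 8 records.
# Position i in every vector refers to the same record i.
# The encoded record values are 6, 3, 5, 0, 7, 2, 6, 1 (illustrative only).
b2 = [1, 0, 1, 0, 1, 0, 1, 0]  # most significant bit
b1 = [1, 1, 0, 0, 1, 1, 1, 0]
b0 = [0, 1, 1, 0, 1, 0, 0, 1]  # least significant bit


def v_and(x, y):
    """Element-wise AND of two equal-length bit vectors."""
    return [a & b for a, b in zip(x, y)]


def v_not(x):
    """Element-wise complement of a bit vector."""
    return [1 - a for a in x]


# Records whose value equals 6 (binary 110): b2 AND b1 AND (NOT b0).
equals_6 = v_and(v_and(b2, b1), v_not(b0))
count_6 = sum(equals_6)

# Records whose value is at least 6 (binary 11x): b2 AND b1.
at_least_6 = v_and(b2, b1)

print(equals_6, count_6)   # [1, 0, 0, 0, 0, 0, 1, 0] 2
print(sum(at_least_6))     # 3
```

Neither query touches a horizontal record: each is evaluated across whole bit vectors, and the count is obtained by summing the resulting vector.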
To convert a relational table of horizontal records into a set of vertical P-trees, the table is first projected into columns, one per attribute, each retaining the original record order. Each attribute column is then decomposed into separate bit vectors, one for each bit position of the values in that attribute. Figure 1 shows a relational table with three attributes, all of them numeric. Figure 2 shows the decomposition of the relational table R into a set of bit vectors.

R(A1, A2, A3)
5 2 7 7 2 4 3 1 2 3 2 2 5 7 2 3 7 2 2 5 5 1 1 4

Figure 1. A relational table R.
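The two decomposition steps described above (project into columns, then split each column into bit vectors) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the table `sample_table`, its values, and the function names are hypothetical and are not the table of Figure 1, and the resulting bit vectors are left uncompressed rather than built into P-trees.

```python
def to_bit_vectors(table, attributes, width):
    """Decompose a horizontal table into vertical bit vectors.

    Returns a dict mapping (attribute, bit_position) to a list of 0/1,
    one entry per record, in the original record order.
    Bit position width-1 is the most significant bit.
    """
    vectors = {}
    for col_index, name in enumerate(attributes):
        # Step 1: project the table onto one attribute column.
        column = [row[col_index] for row in table]
        # Step 2: split the column into one bit vector per bit position.
        for pos in range(width - 1, -1, -1):
            vectors[(name, pos)] = [(value >> pos) & 1 for value in column]
    return vectors


def from_bit_vectors(vectors, attributes, width, n_rows):
    """Rebuild the horizontal records, showing the decomposition is lossless."""
    return [
        tuple(
            sum(vectors[(name, pos)][i] << pos for pos in range(width))
            for name in attributes
        )
        for i in range(n_rows)
    ]


# Hypothetical table with three 3-bit numeric attributes (illustrative values).
attributes = ["A1", "A2", "A3"]
sample_table = [(6, 3, 1), (0, 7, 2), (5, 5, 5), (3, 0, 4)]

vectors = to_bit_vectors(sample_table, attributes, width=3)
print(vectors[("A1", 2)])  # [1, 0, 1, 0]
print(vectors[("A1", 0)])  # [0, 0, 1, 1]

rebuilt = from_bit_vectors(vectors, attributes, 3, len(sample_table))
print(rebuilt == sample_table)  # True
```

Three attributes of three bits each yield nine bit vectors. Because record order is preserved in every vector, the original records can be reassembled exactly.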