Caching Management of Mobile DBMS
Jenq-Foung Yao
Department of Mathematics and Computer Science
Georgia College & State University
Milledgeville, GA 31061
Email: [email protected]
Phone: (912) 445-1626
Fax: (912) 445-2602
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, TX 75275
Email: [email protected]
Phone: (214) 768-3087
Fax: (214) 768-3085
Abstract
Unlike a traditional client-server network, a mobile computing environment has a very limited
bandwidth in a wireless link. Thus, one design goal of caching management in a mobile computing
environment is to reduce the use of wireless links. This is the primary objective for this research.
Quota data and private data mechanisms are used in our design so that an MU user is able to query
and update data from the local DBMS1 without cache coherence problems. The effect of the two
mechanisms is to increase the hit ratio. An agent on an MU along with a program on a base station
are used to handle the caching management, including prefetching/hoarding, cache use, cache
replacement, and cache-miss handling. The simulation results clearly indicate that our approaches
are improvements over previous research.
Keywords: Caching, Mobile Computing, Mobile DBMS, Mobile Unit (MU), Database, Agent, User
Profile, Validation Report (VR)
1 The local DBMS contains cache data on an MU.
1. INTRODUCTION
For the past ten years, personal computer technology has been progressing at an astonishing rate. The
size of a PC is becoming smaller, and the capacity of software and hardware functionality is increasing.
Simultaneously, the technologies of cellular communications, satellite services and wireless LAN are
rapidly expanding. These state-of-the-art PC and wireless technologies have brought about a new breed of
technology called mobile computing (MC). Several mobile computing examples have been discussed in
[7] and [1].
Most people acknowledge that the mobile environment is an extension of distributed systems. Mobile
units (MUs) and the interfacing devices (base stations that may interact with MUs) are added to the
existing distributed systems (see Figure 1). This is a client/server network setting. The servers would be
on some fixed hosts or base stations, and the clients could be fixed hosts or mobile units. The mobile
units are frequently disconnected for some periods because of the expensive wireless connection,
bandwidth competition, and limited battery power. To allow users to access resources at all times no
matter which mode they are in, many research issues need to be dealt with. The data caching/replication
on an MU is one of the important methods that can help to resolve this problem.
2. PREVIOUS WORKS
In existing research, caching management is handled at two different levels: the file system level
and the DBMS level. Issues at the file system level have been addressed widely [11] [22] [12] [21] [17].
Some of the file-system-level approaches have produced real systems in daily use, such as Coda [22].
These research efforts have some shortcomings, however. The major one is that all of them explicitly
exclude a DBMS. In addition, they use optimistic replication control. This kind of control allows
WRITE operations on different partitions (locations). Committing data in a timely fashion is not
important in these systems. Data are allowed to have several different versions on different
partitions, which are later integrated (and committed). In the academic environments where these
approaches were developed, users rarely write to the same file at the same time.
Most of the previous work in mobile DBMS [1] [2] [13] [3] [25] concentrated on the time window
"w" and the size of the Invalidation Report (IR). These efforts reduced wireless link usage to a
certain degree. However, they assumed read-only operation on the local cache, which is just one of the
ways cache data on an MU may be used. They also uplinked to the fixed network for cache-miss data,
which likewise is only one of the possible cache-miss handling strategies.
Only a few researchers at the DBMS level have dealt with the update issue on MUs. Chan, Si, and
Leong proposed a mobile caching mechanism based on an object-oriented paradigm [5]. Conceptually
their approach is based on an idea that is similar to ours. That is, to cache the frequently accessed
database items in MUs to improve performance of database queries and availability of database data items
for query processing during disconnection. This is a concept called hot spot [2]. This concept states that
frequently accessed data are likely to be accessed again in the future.
The other research, which dealt with WRITE operations, is in [24]. In this research, virtual resources
are pre-allocated on an MU so that the MU has its own potential share of data. The research is based on a
trucking distribution system where each truck is pre-assigned an amount of load. When a truck has
actually loaded goods, it then reports the actual load to the database server. Only the aggregate quantity
data have been dealt with. The approach is very similar to research proposed by O'Neil [19]. O'Neil
proposed an escrow transactional method that pre-allocates some fixed portion of an aggregate quantity
data to a transaction prior to its commit. When the time comes to commit the transaction, there ought to
be enough value of this data item available due to the pre-allocation. The whole mechanism of the
approach in [19] takes place in a centralized DBMS.
3. OBJECTIVES FOR THE RESEARCH
The objectives of this research are to provide some solutions for unaddressed issues in the DBMS
area. These issues include how to handle WRITE operations on the local cache, different techniques to
handle cache-misses, how to deal with cache coherence, etc. We will fully address the issues in the next
two sections. Performance evaluations based on simulation results are then discussed in section six.
4. MODEL OF CACHING MANAGEMENT
A client-agent-server architecture is used in our model. The rationale is that we would like to build a
mobile DBMS on top of existing DBMSs. The agent is an interface between a database server and a
mobile client. All the additional functionality beyond the existing DBMSs is built into the agent. That is, among
other things the agent handles cache prefetch, cache update, cache-miss handling, and cache
coherence. The agent includes an MU agent on an MU and a VR2 handler on a base station. The data
prefetching can be done either through a wired link or through a wireless link. However, the wired
link is always preferred whenever the situation allows. The cache will be stored on the local disk of
an MU, and organized in the format of relations. These relations are deemed part of the local RDBMS
and can be queried via the local RDBMS.
4.1. Model Assumptions
Assumptions in our research are as follows:
1. Only the issues at the DBMS level will be dealt with in this research.
2. The environment is a typical client-server network. The server is on a fixed network. The client
could be a mobile unit or a fixed host; we mainly deal with issues on a mobile unit.
3. A user is able to use the MU for an extended time frame. Therefore, data can be cached there for
long periods.
4. An MU has an independent local DBMS. This local DBMS and the database server on the fixed
network all support the relational models. SQL on the MU can query both the private RDBMS and
the RDBMS on the database server. An MU agent serves as a query interface among them.
5. All the MUs are portable laptop computers. We assume they are as powerful as their desktop
counterparts.
2 Please refer to Section 4.5.
6. The data prefetching can be done either through a wired link or through a wireless link. However,
the wired link is always preferred whenever the situation allows. In addition, a mass
prefetching is preferable in the case of low network traffic, such as overnight.
7. The cache will be stored in the local disk of an MU, and organized in the format of relations. These
relations are deemed part of the local RDBMS on the MU and can be queried via the local RDBMS.
8. We assume that downlink and uplink channels have the same bandwidth capacity for the evaluation
purpose.
9. The granularity of the fetched data is a portion of a relation.
4.2. Caching Granularity
The caching granularity is one of the key factors in caching management systems. Most of the
mobile systems at the operating system level use a file or a group of files (a cluster or replica) as the
caching granularity. Using a whole file as the granularity is not an appropriate choice for database data. On the DBMS level, the
caching granularities are usually an attribute [5], an object [5], a page [4], a tuple [9], or a semantic region
[6]. Attribute caching and tuple caching create undesirable overheads due to the large number of
independent cache attributes/tuples. On the other hand, object caching, page caching and semantic
caching reduce these overhead problems by collecting data in a group of tuples. The static grouping in
page caching and object caching lacks flexibility compared to attribute/tuple caching.
Semantic caching provides this flexibility, permitting dynamic grouping according to the requests of
current queries. However, the implementation details of semantic caching remain unclear, such as who is in
charge of the cache update [6].
We propose a portion of a relation as the caching granularity. This portion of the relation contains a
group of tuples from the original relation. These tuples are extracted from the original relation with query
operators SELECTION (σ) and PROJECTION (π). We also preserve the primary key's attributes in a cache
relation. Therefore, our approach is not exactly like that of an updateable snapshot. We call our
approach "Morsel Caching" because we cache a portion of a base relation as a cache relation. We then
call this caching granularity a "cache relation". The cache relation is defined in Definition 2. One may
view a cache relation as a special case of an updateable snapshot.
Definition 1 Let a database D = {Ri} be a set of base relations. For every base relation Ri, let Ai stand
for the set of its attributes. Let Aik be the set of attributes that are included in the primary key of Ri. A
user morsel, UM, is a tuple <UA, UP, UC>, where UC = πUA σUP (Rj), where Rj is one of the relations in D;
UA ⊆ Aj and Ajk ⊆ UA; UP = P1 ∨ P2 ∨ P3 ∨ … ∨ Pn, where each Pj is a conjunction of simple predicates, i.e. Pj =
bj1 ∧ bj2 ∧ bj3 ∧ … ∧ bjl, and each bjt is a simple predicate.
Definition 2 A Cache Relation, CR, is the smallest relation which contains a set of UC's from Definition 1,
all associated with the same base relation.
The reason we do not include a join when forming a cache relation is to make cache-miss
handling and update synchronization much easier.
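To make Definitions 1 and 2 concrete, the following sketch (not part of the paper's system) builds a user morsel by selection and projection over a base relation represented as a list of dictionaries; the relation and attribute names are invented for illustration.

```python
# Hypothetical sketch: a user morsel as selection + projection over a
# base relation, always preserving the primary-key attributes (Definition 1).
def user_morsel(base_relation, attrs, predicate, key_attrs):
    keep = set(attrs) | set(key_attrs)   # primary key is always kept
    return [{a: t[a] for a in keep} for t in base_relation if predicate(t)]

employees = [
    {"emp_id": 1, "name": "Ann", "dept": "CS", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "dept": "Math", "salary": 48000},
]
# Cache only CS employees' names, keeping the key attribute emp_id.
morsel = user_morsel(employees, ["name"], lambda t: t["dept"] == "CS", ["emp_id"])
```

A cache relation (Definition 2) would then be the union of such morsels drawn from the same base relation.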
Note that we use a cache relation as the caching granularity in prefetching and in cache replacement.
The cache update granularity, however, is only a subset of attributes of a tuple owing to the fact that the
update takes place via a wireless link whose bandwidth is limited. This is different from other
approaches, which use the same granularity for prefetching, cache replacement and cache update. Thus,
our approaches are much more flexible, in that the granularity is not fixed but dynamic for different
occasions.
4.3. User Profile
There are several previous approaches that use a mechanism called “hoarding profile”. The
hoarding profile lets users choose their preference data to cache. This is the most direct and effective way
to cache data that users need. The drawback is that it requires human involvement, and people may not
know what they will really use. Hence, only sophisticated users are able to provide the most effective
cache data. A classic example is Coda [22], which uses "hoard profile" command scripts to update
the “hoard database”. Data are fetched to the cache based on the hoard database. Another example is
Thor [10], which uses an object-oriented query language to describe the hoarding profile. Its applications
are similar to Coda’s.
Algorithm Extract-Cache-Relation:
Extract-Cache-Relation(user-profile) {
    /* This algorithm extracts data from the base relation and inserts it into the cache relation */
    for each entry of the user profile {
        if (cache_flag = 0) then       /* the corresponding cache relation does not exist */
            { create a cache relation; }
        set cache_flag = 1;            /* mark the cache relation as cached */
        for (each attribute j that is not in the cache relation) do
            { add a column for the attribute into the cache relation; }
        extract data from the base relation and insert the data into the cache relation;
    }
}
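The pseudocode above can be sketched as runnable Python; the in-memory representation of relations and profile entries here is an assumption for illustration, not the paper's actual data structures.

```python
# Hypothetical runnable sketch of Extract-Cache-Relation using dictionaries.
def extract_cache_relation(user_profile, base_relations, cache_relations):
    for entry in user_profile:
        name = entry["cache_relation"]
        if entry["cache_flag"] == 0:             # cache relation does not exist yet
            cache_relations[name] = {"attrs": set(), "tuples": []}
            entry["cache_flag"] = 1              # mark the cache relation as cached
        cr = cache_relations[name]
        cr["attrs"] |= set(entry["attrs"])       # add columns for missing attributes
        base = base_relations[entry["base_relation"]]
        # extract matching tuples from the base relation into the cache relation
        cr["tuples"] = [{a: t[a] for a in cr["attrs"]}
                        for t in base if entry["predicate"](t)]

base_relations = {"EMPLOYEE": [
    {"emp_id": 1, "name": "Ann", "dept": "CS"},
    {"emp_id": 2, "name": "Bob", "dept": "Math"},
]}
profile = [{"cache_flag": 0, "base_relation": "EMPLOYEE",
            "cache_relation": "MY_EMPLOYEES",
            "attrs": ["emp_id", "name"],
            "predicate": lambda t: t["dept"] == "CS"}]
cache = {}
extract_cache_relation(profile, base_relations, cache)
```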
We adapt this concept as part of our caching mechanism. In our model, data are fetched to the cache
based on the data access frequency and a user hint. A user hint contains the projected needs of a user as
specified in a user profile, and the relations that are presently used. The user profile is created prior to the
very first prefetching. We let a user create his user profile by running a simple program which prompts
the user to enter the information. This information includes the name of a cache relation, the name of the
related base relation, the attributes of the primary key, the attributes of the cache relation, and a set of
criteria that will be used to create the cache relation. The program then organizes the entered information
in the format shown in Figure 2. A user profile is an input to another program (see Algorithm Extract-Cache-Relation). The output of the program is a set of cache relations. Hence, incorporating a user
profile with the program extracts a portion of the base relations into cache relations. Alternatively, the
DBA could perform these tasks for users. Should a user lack a user profile for whatever reasons, the
cache relations would be the base relations on the database server. That is, if a user chooses not to have a
user profile, the cache relations would be the same as the base relations. Which base relations will be
fetched in the prefetching stage is based on the relation access frequency. Cache relations are fetched
onto an MU based on the user profile during the prefetching stage. A cache relation created with the user
profile has a different name from the base relation. It is up to the user to name a cache relation in the user
profile. If a cache relation is not created with a user profile, then the cache relation would share the same
name with the original relation. In this case, a cache relation is like a replication of the base relation.
The program (Extract-Cache-Relation) will be run against the user profile at the very first prefetching.
This is a one-time deal. Once the user profile has been created and has been used to extract cache
relations, it can also be used to assist in handling cache-misses. When a cache-miss occurs, the MU agent
can look up the user profile and trace back to the base relation. If there is no entry for the cache relation,
then the MU agent needs to create a new entry for the missing relation based on the cache-miss query. If
the cache-miss query involves a join, each relation involved in the join will be an entry that needs to be
created. That is, if three relations involve a join, three entries will be created. The new entry is created in
a temporary user profile. The MU agent then runs the extract-cache-relation program against the
temporary user profile. Once this has been done, the temporary user profile will be appended to the user
profile. In the future, if the user decides to cache more relations, the procedure is similar to the case of a
cache-miss.
This user profile is kept on the MU that a user is using. From time to time, it will be backed up to the
database server. Each entry could be used to create a user morsel (see Definition 1 in Section 4.2). There
are six parameters for each entry. The first parameter is the cache flag bit. The ON bit (1) means the
cache relation exists, and the OFF bit (0) means the cache relation does not exist. Initially, all the cache
flags are set to OFF (0) bit. The second parameter is the name of the base relation from which that data
will be extracted. The third parameter is a name for the new cache relation. It is the user’s choice for the
name. The fourth parameter is a set of attributes of the primary key from the base relation. The fifth
parameter is a set of attributes from the base relation that will be included in the cache relation. The last
parameter is a set of criteria that is used to select a user morsel. These criteria are in the same format as
the user morsel's criteria (UP), which are defined in Definition 1 in Section 4.2.
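As a concrete illustration of the six parameters, a profile entry might be represented as below; the relation and attribute names are hypothetical, not from the paper.

```python
# One user-profile entry with the six parameters described above.
profile_entry = {
    "cache_flag": 0,                      # 1st: 0 = cache relation does not exist yet
    "base_relation": "EMPLOYEE",          # 2nd: relation data is extracted from
    "cache_relation": "MY_EMPLOYEES",     # 3rd: user-chosen cache relation name
    "key_attrs": ["emp_id"],              # 4th: primary-key attributes
    "attrs": ["emp_id", "name", "dept"],  # 5th: attributes included in the cache relation
    "criteria": "dept = 'CS'",            # 6th: selection criteria (UP in Definition 1)
}
```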
To extract and insert data into the new cache relation with SQL, the MU agent first submits an
INSERT operation to the server to insert the data to a temporary base relation with the same name as the
cache relation. Note that the data is SELECTed from the corresponding permanent relations and inserted
into the temporary relation after it has been CREATEd. This relation is then moved to the local DBMS as a
cache relation.
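The CREATE/SELECT/INSERT flow above can be sketched with an in-memory SQLite database standing in for the server DBMS; the table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical sketch of the extract-and-insert step with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ann', 'CS'), (2, 'Bob', 'Math')")
# CREATE the temporary relation, then INSERT the SELECTed morsel into it.
conn.execute("CREATE TABLE MY_EMPLOYEES (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO MY_EMPLOYEES SELECT emp_id, name FROM EMPLOYEE WHERE dept = 'CS'")
rows = conn.execute("SELECT * FROM MY_EMPLOYEES").fetchall()
```

In the paper's model, the resulting temporary relation would then be shipped to the MU's local DBMS as the cache relation.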
Sometimes an existing cache relation needs to be modified, such as adding more columns for some
new attributes. This case happens when a cache relation needs to be extended, as when another entry of
the user profile attempts to create the same cache relation. We only allow one cache relation to be
extracted from a base relation. Thus, creating another cache relation is not allowed. The solution to this
problem is to add the new attributes in this entry of the user profile to the existing cache relation. To add
a column for a new attribute into an existing relation, we use the ALTER command in SQL. After adding
new attributes into an existing cache relation, we use the UPDATE command in SQL to insert values for the new
attributes of the cache relation.
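The ALTER-then-UPDATE extension step might look as follows, again using SQLite syntax as a stand-in and invented names.

```python
import sqlite3

# Hypothetical sketch: extend an existing cache relation with a new attribute,
# then fill in its values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_EMPLOYEES (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO MY_EMPLOYEES VALUES (1, 'Ann')")
conn.execute("ALTER TABLE MY_EMPLOYEES ADD COLUMN salary INTEGER")   # add the column
conn.execute("UPDATE MY_EMPLOYEES SET salary = 50000 WHERE emp_id = 1")  # fill its values
row = conn.execute("SELECT emp_id, name, salary FROM MY_EMPLOYEES").fetchone()
```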
Note that one base relation can only produce one cache relation and more attributes can be added into
the cache relation later. A cache relation will not be split into two or more relations, nor will it be
coalesced with other relation(s). This is very different from semantic caching, in which a semantic
region may be split into several semantic regions or coalesced with other semantic regions over a time
frame [6]. In addition, the MU user queries these cache relations as his own private relations on the
local DBMS; he is not aware of the morsels within a relation.
4.4. Cache Replacement Policy
The cache replacement policy is another factor that affects the caching performance. This policy
determines which part of cache will be replaced when the cache is running out of space for new cache
data. There are three different types of cache replacement policies, which are based on temporal locality,
spatial locality, and semantic locality. Temporal locality is the property that data in which have been used
recently will be used again soon, such as MRU. Therefore, a cache replacement policy using temporal
locality property may replace the lease recently used (LRU) data. Spatial locality is the property that the
data spatially close to the recently used data are likely to be used again in the near future. Thus, a cache
replacement policy using spatial locality property would replace the data that are spatially farther away
from the recently used data. The property of semantic locality is that a semantic region which is most
similar to a currently being accessed region is most likely to be used in the future. Regions which are not
related to the current queries, based on a semantic distance function, should be targets for replacement
first.
We propose a new replacement policy which uses the property that less frequently used data are less
likely to be used again (LFU). This proposal is based on our observation, Kenning’s empirical results
[15], and the common belief of “hot spot”. This property is thus called frequency locality. The
replacement policy, which uses the frequency locality, will replace the least frequently used cache relation
with a new cache relation when the cache does not have enough space. A frequency function constantly
records all the access frequencies for all relations on the database server and the local DBMS. The
relations of the local DBMS have different names from those of the base relations. However, we keep
one counter for both. In any case, it is reasonable to assume that less frequently used data on a DBMS is
less likely to be used again, because the access frequency history is a lifetime record. Therefore,
applying the frequency locality property to the cache replacement policy is suitable.
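The frequency-locality replacement rule above amounts to evicting the cache relation with the lowest lifetime access count. A minimal sketch, with invented relation names and counts:

```python
# Hypothetical LFU (frequency-locality) replacement sketch: the counter is a
# lifetime access-frequency record kept for base and cache relations together.
def choose_victim(cached, freq):
    """Return the cached relation with the lowest lifetime access count."""
    return min(cached, key=lambda r: freq[r])

freq = {"ORDERS": 120, "PARTS": 7, "CUSTOMERS": 45}
cached = ["ORDERS", "PARTS", "CUSTOMERS"]
victim = choose_victim(cached, freq)   # least frequently used relation is replaced
```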
4.5. Cache Update
There are two aspects in terms of cache update. The first aspect is how to update the cache on an MU
when data on the database server is updated. The second aspect is how the MU user writes on the cache,
and how to deal with the data consistency problem.
For the first aspect, we adapt the idea proposed in [1], in which an invalidation report (IR) has been
used to inform MUs which cache data are invalid. We use one of the three proposed models, the time
stamp model. In addition, we add modified data items to the report and change the file server type to
stateful. We call this report a Validation Report (VR). The motivation for using a VR is that our cache
relations are kept in the local DBMS for a long time, if not permanently. The base stations in our model
keep track of the cached public relations within their wireless cell. Only non-quota public
data that have been modified within a certain time (say L seconds) and have been requested by an MU during registration will
be included in the VR. When an MU enters the cell of a base station, the MU has to register with the base
station. The base station then asks the base station with which the MU was previously registered to hand
over all the information about the MU, such as its VR. This step is required in order for hand-off to be
completed. A data item in a VR includes a Time Stamp, Relation Name, Primary Key, and Updated
Attributes, as shown below:
(Time Stamp, Relation Name, Primary Key, Updated Attribute(1), Updated Attribute(2), … ,Updated
Attribute(n) )
Thus, a data item in a VR is the updated portion of a tuple. Each MU has one VR at the base station. A
VR is sent every L seconds to a specific MU. An MU waits and checks the incoming VR before
answering a normal query (see Figure 3). The broadcasting time (the time stamp) for a VR is also
included in the VR. When an MU has received its VR, it then sends an acknowledgment that contains the
broadcast time of the VR to the base station to confirm receipt. The base station keeps all the
modified data items in the VR from the last time it received an acknowledgment from the MU. Upon
receiving a new acknowledgment, the base station then compares each data item's timestamp in the VR
with the broadcast time in the acknowledgment. In addition, it discards the data items in the VR that are
older than the broadcast time in the acknowledgment.
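The base station's bookkeeping can be sketched as follows; the tuple layout mirrors the VR item format above, and all values are invented. Note that items timestamped at or after the acknowledged broadcast time are retained.

```python
# Hypothetical sketch of VR pruning at the base station: after an MU
# acknowledges a broadcast, discard VR items older than that broadcast time.
def prune_vr(vr_items, acked_broadcast_time):
    """Keep only VR items not older than the acknowledged broadcast time."""
    return [item for item in vr_items if item[0] >= acked_broadcast_time]

vr = [
    (100, "EMPLOYEE", 1, ("salary", 51000)),   # (timestamp, relation, key, update)
    (180, "EMPLOYEE", 2, ("dept", "CS")),
]
remaining = prune_vr(vr, acked_broadcast_time=150)
```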
For the second aspect, the MU user updates the cache itself, meaning the user writes on the cache.
We propose two new ideas here to make cache writes possible without violating the ACID properties. The
first new idea is categorizing relations into two types: the public relations and the private relations. When
a query accesses a private relation owned by the user, this query can be answered immediately, whereas
answering a query, which accesses a public relation, needs to wait for the next VR. This is owing to the
fact that a private relation is only modified by that one user. The data dictionary on an MU contains the
information about whether a relation is public or private. The second new idea is the quota mechanism.
information about whether a relation is public or private. The second new idea is the quota mechanism.
The MU user can read and write quota data. Both ideas are elaborated in the following two subsections.
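The query-routing rule for the first idea can be sketched as a simple lookup in the MU's data dictionary; the dictionary contents and relation names here are invented.

```python
# Hypothetical sketch: a query waits for the next VR only if it touches a
# public relation; queries over private relations are answered immediately.
data_dictionary = {"MY_NOTES": "private", "EMPLOYEE": "public"}

def must_wait_for_vr(relations):
    """Return True iff any relation touched by the query is public."""
    return any(data_dictionary[r] == "public" for r in relations)

wait_a = must_wait_for_vr(["MY_NOTES"])              # private only: answer now
wait_b = must_wait_for_vr(["MY_NOTES", "EMPLOYEE"])  # touches public: wait for VR
```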
4.6. Public Relations vs. Private Relations
Previous research does not differentiate among various types of relations. The most obvious differentiation
is to separate public relations from private relations. The difference between the two is that the public
relation is shared and can be modified by a group of people, whereas the private relation is solely owned
and used by one user. Many such private relations exist in the academic environment. For example, the
relations owned by students are private relations, which are solely for their personal use. Some other
private relations are written by one user but can be read by a group of people. The owner grants the read
rights. The public access to this type of private relation is read only. The owner is the one who can make
changes to these private relations. Therefore, this type of relation is also categorized as private relation.
We define the private relation in Definition 3, the public relation in Definition 4.
Definition 3 A Private Relation is a relation whose primary copy exists at the MU. Only the owner of the
private relation can modify this relation.
Definition 4 A Public Relation is a relation whose primary copy exists at a database server. A portion of it
may be cached at an MU. A group of authorized people can modify this relation.
Note that the database server knows about the ownership because the ownership information is part of
the DBMS' security mechanism, and usually stored in the data dictionary. Therefore, the database server
knows an MU user is the only one who may update the private relations on the MU, and nobody else is
able to modify the copy of the private relations on the database server.
The private relations should be downloaded at the very first prefetching. This is a one-time deal.
Some of them could be created on the MU. We assume that the primary copies of the private relations are
at the MU. Before an MU user begins using the MU, he or she may work on the database server for some
time (for instance, via a fixed host). Thus, there may already be private relations existing on the database
server. When the user switches to use the MU, the private data on the database server may need to be
downloaded to the MU's local DBMS (the cache). Because we assume that an MU is reliable, it is safe to
keep the primary copy of the data on the MU. In addition, from time to time the user may copy the data
back to the database server, just in case the MU user wants to share the private data with other users. The
sharing here should be READ only. The MU user may eventually switch back to using a fixed host. Thus,
a copy of the private data on the database server is necessary.
The differentiation of the private relations and the public relations has several advantages. First, the
private relations can be updated at a mobile unit without worrying about the data consistency problem.
Consequently, it avoids some uplink wireless traffic for handling WRITE operations at the
database server: the data consistency problems of normal WRITE operations must be taken
care of via communication over the wireless link, so a WRITE on a private relation avoids this
traffic. Second, owing to the nature of private relations, in that a user can always
write to a private relation on the MU, we could treat the WRITE operations on the private relations as
cache-hits. Thus, allowing the WRITE of the private relation on an MU increases the hit ratio. Third,
when a user on an MU submits a query to access a private relation, the query can be answered
immediately, whereas a query that accesses a public relation must wait for the next VR
to get the newest version of data. The rationale is that the private relations have only been modified by
the user on the MU. Therefore, the data version on the MU is always the newest one. Accessing the
private relations would always get the newest version of data. There is no reason to wait for the next VR.
Hence, it accelerates the response time. Thus differentiating relations into two types, namely the public
relations and the private relations, is worth the effort.
4.7. Quota Mechanism
If we wish to write on the public relations of the local DBMS, a complication arises because the
public relations are shared by a group of users. The complication is that the system must keep track of
all operations on the public relations to ensure the ACID property. To ensure data consistency, a locking
mechanism is one solution, a pessimistic approach. However, the disadvantage of a locking mechanism
is that a long-held lock prevents others from accessing the same data [16]. If a mobile system uses the
locking mechanism, such a long locking situation could happen quite often because MUs are frequently
disconnected for an extended length of time. Our solution to the problem is to use the quota mechanism
to download a quota of data items from the database server to the cache of an MU. The leftover quota
may be returned to the server, or alternatively more quota may be downloaded from the server. Using this
strategy, mobile clients have their own allowance of data to work on, avoiding a long wait. The
idea is quite simple, just like resource allocation. The database server allocates some data resources to the
cache on an MU, and these data resources become delegations of the database server on MUs.
There are two previous works addressing this concept [19][23]. In [19], the author proposed an
escrow transactional method that pre-allocates some fixed portion of an aggregate quantity data (see
Definition 5) to a transaction prior to its commit. When the transaction commits, there will be enough
value of this data item due to the pre-allocation. Only an aggregate quantity data can be updated this way
in this approach. The whole mechanism of this approach takes place in a centralized DBMS. Thus, it is
not quite the same concept as the one that we are addressing. The mechanism addressed in [23] is closer
to our approach. The idea in this paper is that they divide an aggregate quantity data in a server into
several fixed units. For instance, if there is a data value “20”, they could divide this value into four data
units with each data unit being a five. Each unit then is allocated to different clients. Each client can
have full authority over the data unit given to it. The server may choose to keep one unit of data for
itself. When a client does not have enough data to commit a transaction, it may request
more data unit(s) from either the server or another client that holds a unit of the same data item. These two
approaches only apply to aggregate quantity data. The data that can be handled are still very limited.
We build on these ideas so that aggregate quantity data can be dynamically allocated to different
MUs in data units of different sizes, called quotas (see Definition 7). Our approach also allows non-aggregate
data (see Definition 6) to be a quota. However, only aggregate quantity data can be divided into
several units and allocated as quotas. An MU must download whole non-aggregate data as a quota.
These approaches significantly enhance the ideas proposed by the two previous researches. Our approach
allows any kind of data to use the quota mechanism as long as the DBA of the DBMS defines data items
as “quota data” in the data dictionary. Our approach is also the first one that can have different sizes of
data unit. In addition, we are the first to propose use of the quota idea in a mobile DBMS environment.
Definition 5 Aggregate quantity data can be computed with mathematical operators (such as addition,
subtraction, etc.) in a database management system. The data type of aggregate quantity data is numerical
only.
Definition 6 Non-aggregate data cannot be computed with mathematical operators in a database
management system. The data type of non-aggregate data can be numerical, string, or character.
Note that numerical data is not necessarily aggregate quantity data; social security numbers, for
example, are numerical, but we do not compute them in a database management system.
For the preconditions of Definitions 7, 8, and 9, let D = {Ri} be a set of base relations and Dc = {CRi}
be a set of cache relations, where CRi is the cache relation for Ri. Also, let CAi be the attributes for CRi
and Ai be the attributes for Ri, with CAi ⊆ Ai. Let attribute ai ∈ CAi and tc ∈ CRi; then ∃ t ∈ Ri such that
tc(ai) = t(ai) for all ai. The notation t(ai) represents the value of attribute ai in tuple t [18]. To improve
performance by allowing updates at the MU, however, this may be relaxed: given an aggregate quantity
data item, we can cache part of the data value at the cache.
Definition 7 Given a data value qt, a quota q is defined as follows:
1. If qt is aggregate quantity data, then q = qt - qm, for some value qm.
2. If qt is non-aggregate data, then q = qt.
Now we explain how the quota concept is used in MU caching. Let tc ∈ CRi, and suppose aq ∈ CAi is an
attribute defined by the DBA to be a quota attribute. For case one in Definition 7, this means that
when the tuple is fetched to the MU, a quota of the attribute value (rather than the exact attribute value) is
fetched. Let t ∈ Ri be the corresponding tuple from Ri related to tc. For the quota attribute aq, tc(aq) =
t(aq) - qm. Therefore, tc(aq) is the quota q, t(aq) becomes the remainder value qm, and the total value qt is
not stored. Prior to the prefetch, t(aq) = qt; during the prefetch, t(aq) becomes qm and q is downloaded to the
cache. For case two in Definition 7, tc(aq) = t(aq), and t(aq) becomes "null" afterward. Examples of
case one in Definition 7 are a bank account balance and the total number of seats on a flight. A specific seat
on a flight belongs to case two of the definition.
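The two cases of Definition 7 can be sketched as a small helper (a hypothetical illustration; the function and variable names are ours, not the paper's):

```python
def fetch_quota(server_value, aggregate, qm=0):
    """Return (cache_value, new_server_value) after a quota fetch.

    Case 1 (aggregate quantity data): the quota q = qt - qm moves to the
    cache and the server retains the remainder qm.
    Case 2 (non-aggregate data): the whole value moves and the server
    holds null (None here) afterward.
    """
    if aggregate:
        q = server_value - qm
        return q, qm
    return server_value, None

# A balance of 20 with remainder 15 leaves a quota of 5 on the MU.
print(fetch_quota(20, aggregate=True, qm=15))    # (5, 15)
# A specific flight seat is non-aggregate: the whole value moves.
print(fetch_quota("seat 12A", aggregate=False))  # ('seat 12A', None)
```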
A quota attribute is an attribute of a relation. This attribute can have a quota value. The DBA
decides which attributes are quota attributes and keeps this information in the data dictionary. We define
quota attribute in Definition 8.
Definition 8 Suppose Aq is an attribute and Aq ∈ Ai. If the quota mechanism applies to Aq, then Aq is a
quota attribute.
When an MU agent caches aggregate quantity data, it could download only the quota amount, as
defined in case one of Definition 7. The quota threshold, recorded in the data dictionary by the DBA,
is the maximum value that can be given out as a quota; the quota threshold of a data value cannot be
greater than the data value itself. This is part of the quota constraints defined in the data dictionary. Quota
threshold is defined in Definition 9, where, for a real number V, ⌊V⌋ denotes the largest integer such that
⌊V⌋ ≤ V.
Definition 9 Let aq ∈ Ai be a quota attribute of an aggregate quantity data type and t ∈ Ri. Vqt is a quota
threshold for aq in t if there exists a real number Raq such that Vqt = ⌊Raq * t(aq)⌋ and 0 < Raq ≤ 1. This Raq
is a fixed value for aq and is stored in the data dictionary.
Note that non-aggregate data, such as a string, cannot have a partial value as a quota; this type of
data can only have a quota as in case two of Definition 7. A DBA, according to the intrinsic nature of the
attributes and the rules of the organization (such as security rules), defines in the data dictionary whether
an attribute can have a quota.
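Definition 9 can be computed directly; the sketch below (helper name ours) assumes Raq comes from the data dictionary:

```python
import math

def quota_threshold(value, r_aq):
    """Vqt = floor(Raq * t(aq)), with 0 < Raq <= 1 fixed per attribute."""
    if not (0 < r_aq <= 1):
        raise ValueError("Raq must satisfy 0 < Raq <= 1")
    return math.floor(r_aq * value)

# With Raq = 0.75, at most 15 of a stored value of 20 may be given out as quota.
print(quota_threshold(20, 0.75))  # 15
```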
Algorithm Post-Process
1.  Post-Process(relation) {  /* This program is run by the MU agent */
2.    Locate the record that contains the base relation in the user profile;
3.    For (each tuple that satisfies the criteria list in the record of the user profile) do {
4.      For (each quota attribute) do {
5.        if (quota attribute is aggregate quantity data) then
6.          The attribute value of the base relation = attribute_value - quota;  /* send a query to the database server to perform this statement */
7.        else
8.          The attribute value of the base relation = null;  /* send a query to the database server to perform this statement */
9.        The attribute value of the cache relation = quota;  /* send a query to the database server to perform this statement */
10.     }
11.   }
12. }
Quota downloading occurs in three situations. The first occurs during the prefetching stage: the quota
data is fetched along with other data in the same relation by a query. After running Algorithm
Extract-Cache-Relation (see Section 4.3) to fetch the data from the database server to the MU agent, both
the base relations and the cache relations must be post-processed by running Algorithm Post-Process to
accommodate the quota mechanism. The quota data cannot be used until this Post-Process program has
completed. The MU agent runs the program. Note that the information about quota attributes is obtained
from the data dictionary.
The second occurrence happens when a quota in the cache is not sufficient to answer a query. The
local DBMS returns an error message to the MU agent indicating that the data is insufficient to
answer the query. The MU agent then submits a query (a transaction) to get more quota value from the
database server. Upon receiving the query, the database server sends the quota to the MU. The server
waits for an acknowledgment from the requesting MU, to ensure that the quota has been received, and
then commits the quota transaction. If the server does not receive an acknowledgment within a certain
time frame, the transaction is aborted. Thus, from the server's point of view, downloading a quota of a
data item is a transaction requested by an MU; to be exact, it is the MU agent that submits this request
(see our model architecture in Section 4.9). Note that all these downloading operations are also recorded
in the log of the database server, so in the case of a system failure, the uncommitted transactions can be
redone according to the log. Recovery is another large research area that we will not explore further,
leaving it to future work. Quota is obtained through two types of update SQL: the first type updates the
base relation, and the other updates the cache relation.
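The commit-on-acknowledgment protocol described above can be sketched as a simplified, single-item simulation (function names and the acknowledgment callback are ours; logging and recovery are omitted):

```python
def grant_quota(base_value, amount, wait_for_ack, timeout=5.0):
    """Tentatively deduct a quota; commit on acknowledgment, abort on timeout.

    wait_for_ack(timeout) stands in for the server blocking until the MU's
    acknowledgment arrives or the timeout expires.
    """
    remainder = base_value - amount          # quota sent to the MU
    if wait_for_ack(timeout):                # ack received in time: commit
        return "committed", remainder
    return "aborted", base_value             # no ack: restore the base value

# An MU that acknowledges in time vs. one whose ack never arrives:
print(grant_quota(20, 5, wait_for_ack=lambda t: True))   # ('committed', 15)
print(grant_quota(20, 5, wait_for_ack=lambda t: False))  # ('aborted', 20)
```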
Algorithm Upload-Cache
1. Upload-Cache(base-relation, cache-relation, Quota-attributes) {
2.   update cache-relation to clean up the quota values;  /* send a query to the local DBMS to perform this statement */
3.   update base-relation to add the quota values;  /* send a query to the database server to perform this statement */
4. }
The third occurrence of quota downloading is the case of a cache-miss; that is, the data is either not in
an existing cache relation or the cache relation does not exist. The MU agent generates a query and
sends it to the server to get a quota for the missing data. New entry (or entries) in a temporary user profile
based on the query are created on the MU. The cache relation is created if it does not exist; a new
attribute is added if one is needed; and the missing data is fetched. The SQL to fetch the new data is
similar to that of the first occurrence, except that a new SQL statement is needed to add the new
attributes. This new SQL is a table-alteration command that adds one column to the cache relation for
each new attribute, and it is run prior to the fetch SQL. After all the SQL statements have run, the
temporary user profile is appended to the user profile. Alternatively, the query that requests the missing
data item could be sent to the database server; after the server obtains the query result, the result is sent
back to the MU. Please refer to the different scenarios of cache-miss handling stated in Section 5.
From the view of an MU, downloaded quotas become the MU's own data in the cache. Later, any
quota left over can be uploaded back to the server; Algorithm Upload-Cache, run by the MU agent,
demonstrates the upload procedure. Alternatively, more quota can be downloaded from the server. Two
SQL statements are used to upload the quota from a cache relation to a base relation.
The purpose of the quota mechanism is to increase cache-hits. Without the quota mechanism, an MU
may need to wait for its turn to work on a data item on the database server (check out the data item), that
is, wait until another user unlocks the data item (checks it in). There are two scenarios here: first, the data
is not already in the cache; second, the data is in the cache but cannot be used by the MU. We see both
cases as cache-misses. With the quota mechanism, we allocate part of the data resources to MU caches so
that the MUs' users do not have to wait for the data on the database server. This turns cache-misses for
these types of data into cache-hits; thus our quota mechanism increases the cache-hit ratio over the
non-quota approach.
4.8. Query Categorization
To further improve query response time, we categorize queries in the mobile environment into two
types. The first is a normal query, just like those performed on the fixed network. The MU must make
sure the cached data is validated before using it to answer a normal query; normal queries require data to
be consistent with that at the database server. Note that a normal query that accesses private data in a
private relation is answered immediately. The other query type, named a NOW query, is also answered
immediately, without waiting for up-to-date data. A user can thus use a NOW query to obtain timely data
when timeliness matters more to the user than having the correct version; stock price information is a
good example. Some stable data, such as historical data, can also use this kind of query. The NOW query
avoids the latency of data updates and thus improves query performance. Notice that this latency is
caused by the synchronous approach, in which an MU must wait for a VR to answer queries and to
ensure data validity. A user specifies a NOW query in her/his query request with the special predicate
"qtime = 'NOW'". The MU agent pre-processes each query request and determines which queries can
proceed without waiting for the next VR. Before the MU agent passes a NOW query to the local DBMS,
it removes this "qtime = 'NOW'" predicate.
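The pre-processing step can be sketched as below (a minimal illustration; the exact SQL dialect and agent interface are not specified in the paper, so the string handling here is an assumption):

```python
def preprocess(query):
    """Return (is_now_query, query with the qtime = 'NOW' predicate removed)."""
    marker = "qtime = 'NOW'"
    if marker not in query:
        return False, query
    stripped = query.replace(" AND " + marker, "").replace(marker, "").strip()
    if stripped.endswith("WHERE"):           # the marker was the only predicate
        stripped = stripped[: -len("WHERE")].strip()
    return True, stripped

print(preprocess("SELECT price FROM stocks WHERE sym = 'IBM' AND qtime = 'NOW'"))
# (True, "SELECT price FROM stocks WHERE sym = 'IBM'")
```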
4.9. Model Architecture
The relationship among a fixed network, a base station, and an MU is shown in Figure 4. A base
station is a coordinator between the fixed network and an MU. A base station keeps track of the relations
being cached on the MUs currently within its wireless cell. To reduce this overhead, we propose the use
of private relations: instead of keeping track of all relations cached on MUs, the base station only needs
to track the public relations, while all private relations are kept on the MU. If all cache relations are
private, there is no overhead at the base station. We further categorize public relations into quota public
relations and non-quota public relations. The base station also broadcasts validation reports (VRs) of the
modified non-quota public data to MUs. To be exact, only non-quota public data that has been requested
by an MU during registration3 and has been modified on the database server is included in a VR. A
program on the base station is in charge of gathering updated public data into VRs and sending the VRs
to the MUs in its cell.
An MU is the center of the entire architecture in our model because the major target of this research is
caching management on an MU. An MU agent (see Figure 4) is in charge of, among other things, data
caching, cache update, query pre-processing, and cache-miss handling. This MU agent acts as the
interface between the MU and the base station. An MU agent keeps track of relation access frequencies
for the MU user and saves this record in a linked list combined with a hash table, which we call the
history list. The MU agent fetches a portion of the whole database located at the server to the MU.
3. Each mobile unit must register with the base station when it enters the base station's cell.
This fetching operation is based on a priority list built from the user hints in a user profile, the
currently used relations, and the frequencies of use in the history list. The user profile is created by the
MU user and is stored on the MU. We use SQL as the interface to fetch relations from the database server
to the local DBMS. Information about the fetched relations, such as data types and data domains, is put in
the data dictionary of the local DBMS. A fetched relation may be just a portion of the base relation on the
database server. All these prefetching operations are performed by the MU agent. Moreover, the MU
agent is in charge of cache update: when it receives a validation report, it updates the cache accordingly.
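A minimal sketch of the history list (we approximate the paper's linked-list/hash-table hybrid with Python's hash-based dict; the class and method names are ours):

```python
class HistoryList:
    """Tracks per-relation access counts and ranks relations for prefetching."""
    def __init__(self):
        self.freq = {}                       # relation name -> access count
    def record_access(self, relation):
        self.freq[relation] = self.freq.get(relation, 0) + 1
    def most_frequent(self, k):
        return sorted(self.freq, key=self.freq.get, reverse=True)[:k]

h = HistoryList()
for rel in ["flights", "accounts", "flights", "flights", "accounts", "seats"]:
    h.record_access(rel)
print(h.most_frequent(2))  # ['flights', 'accounts']
```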
An MU agent is also an interface between the user and the local DBMS. When the user submits a
query, the MU agent submits the query to the local DBMS immediately in three situations: (1) the query
is a NOW query, (2) the query accesses private relations, or (3) the query accesses quota public data. In
turn, the local DBMS proceeds with the query. If a query does not fit one of these three situations, the
MU agent must wait for the next validation report before passing the query to the local DBMS, to ensure
that the newest cache data is used to answer it. In the case of a cache-miss, the local DBMS sends an
error message to the MU agent; the MU agent generates a query based on what data is missing and sends
it to the database server to handle the cache-miss. The user profile plays a role in prefetching. In addition,
the MU agent uses it as a road map to locate base relations on the database server and cache relations on
the MU during cache update and cache-miss handling, and to check whether a base relation is cached on
the MU.
5. CASES OF CACHING MANAGEMENT
In this section, we discuss different scenarios of cache use and cache-miss handling. We then derive
mathematical formulas that reflect the behavior of the different approaches with respect to hit ratio and
the number of queries that the wireless link can support.
5.1. Cache Use
There are several ways to use the data items in an MU's cache:
1. Perform only READ operations on the cache data; WRITE operations are uplinked to the fixed
network. This case is called UR.
2. Perform READs and private WRITEs on the local DBMS. Send each query performing a public
WRITE to the database server, which handles the query and sends the result back to the MU. This
case is called UR/PrivW.
3. Perform READs and public WRITEs on quota data on the local DBMS. Send any sub-query
performing a public WRITE on non-quota data to the database server, which handles it and sends the
result back to the MU. This case is called UR/Quota.
4. Perform READs, private WRITEs, and public WRITEs on quota data on the local DBMS. Send any
sub-query performing a public WRITE on non-quota data to the database server, which handles it and
sends the result back to the MU. This case is called UR/PrivW/Quota.
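As an illustration, the routing decision in case 4 (UR/PrivW/Quota) might look like the sketch below (the predicate flags are assumptions about information the MU agent can derive from the data dictionary and user profile):

```python
def route(op, is_private, is_quota):
    """Decide where an operation runs under the UR/PrivW/Quota strategy."""
    if op == "READ":
        return "local"        # all READs are served from the cache
    if is_private or is_quota:
        return "local"        # private WRITEs and quota public WRITEs stay local
    return "server"           # non-quota public WRITEs are uplinked

print(route("WRITE", is_private=False, is_quota=True))   # local
print(route("WRITE", is_private=False, is_quota=False))  # server
```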
There are other possibilities with clearly undesirable performance, such as uplinking to the server for
all READ and WRITE operations; we do not include them in this research.
Because we have an independent local DBMS on the MU, implementation is much easier. The
DBMS takes care of indexing as long as we include primary keys in the cache relations, which we have
done. In addition, cache relations and base relations have different names, which further simplifies
matters. A user knows the names of the cache relations on her MU because she defined them in the user
profile, so when she submits an SQL request, she knows which relation(s) to use. Thus, the user has the
flexibility to run SQL against either cache relations or base relations.
5.2. Cache-miss Handling
When data cannot be found in the local DBMS, a cache-miss occurs; that is, SQL processing returns
an error message indicating the specific data that cannot be found in the local DBMS. The MU agent
then needs to handle the cache-miss. Here too there are several possibilities:
1. Uplink to the fixed network for the missing data items; that is, send the query that attempts to use the
missing data to the fixed network. The database server handles the query and sends the result back to
the MU. This case is called MResult.
2. Send a query containing the relation names and the missing data items of the cache-miss to the server
on the fixed network. The server then sends back the missing data items. Note that the data items,
similar to a VR, are in a compressed format that contains a relation name and the tuples that contain
the missing data items. This case is called MData.
3. Send an error message to the user and abort the query. This case is called MError.
There are two scenarios with respect to a cache-miss. In the first scenario, the requested relation is
not in the cache. In this case, the relation can be dealt with as in case MData; that is, the MU agent
requests the database server to send the missing data, and the handling is very similar to fetching a new
relation into the local DBMS. First, the MU agent adds the new relation's information to the user profile.
Based on this new information, the MU agent then creates a new cache relation on the MU. The next step
is to extract data from the base relation and insert it into the new cache relation. Once this has been done,
the same query can be run against the new cache relation. With case MResult, the agent simply sends the
query to the database server to get a result. In case MError, the user is informed with an error message.
The second scenario occurs when the requested attributes are missing but the relation that contains
them exists. Here, only the tuples that contain the missing data are fetched, in the format stated in the
cache update section. While the system fetches the missing data, the MU agent submits queries to add
new columns for any needed new attributes in the cache relation. When the MU has obtained the data, it
puts the fetched tuples into the corresponding relation(s). Again, this second scenario is dealt with as in
case MData; the cases MResult and MError are handled in a manner similar to MResult and MError in
the first cache-miss scenario.
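The three cache-miss policies can be summarized as a dispatch sketch (the handler bodies are placeholders standing in for the behaviors described above):

```python
def handle_cache_miss(policy, query, missing_items):
    """Dispatch a cache-miss to one of the MResult/MData/MError policies."""
    if policy == "MResult":
        return f"uplink query to server: {query}"
    if policy == "MData":
        return f"request missing items from server: {missing_items}"
    if policy == "MError":
        return "abort query and report error to user"
    raise ValueError(f"unknown policy: {policy}")

print(handle_cache_miss("MData", "SELECT seats FROM flights", ["flights.seats"]))
```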
5.3. Mathematical Equations
There are four cases of cache use and three cases of cache-miss handling. Combining these, we have
twelve combined cases: UR-MResult, UR-MData, UR-MError, UR/PrivW-MResult, UR/PrivW-MData,
UR/PrivW-MError, UR/Quota-MResult, UR/Quota-MData, UR/Quota-MError, UR/PrivW/Quota-MResult,
UR/PrivW/Quota-MData, and UR/PrivW/Quota-MError.
In addition, another case, similar to UR-MResult but with an invalidation report (instead of a
validation report), has been addressed in six different papers [1] [2] [13] [3] [25] [6]; we call this case
UR'-MResult'. Chan, Si and Leong's approach [5] allows WRITEs on the cache, but only on cache-hit
data: a WRITE query must lock the data on the server prior to the write and update the server after
writing to the cache. In other words, their WRITE operation can in fact be deemed equivalent to the
cache-miss case MData, since the wireless-link traffic caused by sending the updated data from the MU
to the database server is the same as that caused by sending missing data from the server to the MU.
Hence, this approach can be categorized as case UR-MData. We, however, use private WRITE and quota
mechanisms in our model; therefore, the cases for our approach are UR/PrivW/Quota-MResult,
UR/PrivW/Quota-MData, and UR/PrivW/Quota-MError. Most of the previous approaches handle only
READ operations on an MU. Our performance evaluation shows the advantages of our approach.
In this section, we develop a mathematical equation for TQ for each case. TQ is the potential
maximum number of queries that the wireless link can handle in an interval L; this is the throughput we
use to evaluate the performance of the different cases. In addition, we assume h is the original hit ratio
for data access on the cache: if a requested data item is in the cache, this is traditionally deemed a
cache-hit. The hit ratio will be adjusted as we develop the mathematical equations. Some of the notation
is adapted from [1], and the parameters used in the equations are defined in Table 1.
We assume that the server begins to broadcast the validation report periodically at times Ti, where Ti =
iL. The server keeps a list Uj defined as follows:

Uj = {[j, tj] | j ∈ D and tj is the timestamp of the last update of j such that Ti-1 ≤ tj ≤ Ti}
We further assume that each query performs only a single READ or a single WRITE. In reality, a
query could contain several READs, WRITEs, or both; in our evaluation model, we break such a query
down into a set of queries that each contain one READ or WRITE. When we actually implement a query,
we do not break it down: if a query accesses some data that are cache-hits and some that are
cache-misses, only the cache-miss data is handled via the wireless link, by creating a new query to fetch
it. Hence, breaking down a query is done conceptually, for evaluation purposes only.
Under normal circumstances, if data can be found in the cache, it is deemed a cache-hit (counted in h)
regardless of whether the operation is a READ or a WRITE. If WRITEs on the cache are not allowed,
then we must adjust the original hit ratio h to a new one, h':

h' = rR * h

(see Table 1 for rR). On the other hand, if we allow private WRITEs on the local DBMS without asking
the database server's permission for data coherence, then a private WRITE can be deemed a cache-hit,
and the hit ratio is adjusted as follows:

h' = ((rR * h * QL) + QprivW) / QL

Now, if only WRITEs on quota data are allowed, then:

h' = ((rR * h * QL) + QqpubW) / QL

Finally, if WRITEs on both quota data and private data are allowed, then:

h' = ((rR * h * QL) + (QqpubW + QprivW)) / QL
As you can see, allowing different types of WRITE could increase cache-hit ratio to different extents.
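The three adjustments can be collected into one hedged helper (parameter names mirror Table 1, which we cannot reproduce here: rR is the READ fraction, QL the number of queries per interval, and QprivW / QqpubW the private and quota public WRITE counts):

```python
def adjusted_hit_ratio(r_read, h, q_total, q_priv_write=0, q_qpub_write=0):
    """h' = ((rR * h * QL) + QprivW + QqpubW) / QL."""
    return ((r_read * h * q_total) + q_priv_write + q_qpub_write) / q_total

# No WRITEs allowed on the cache: h' = rR * h.
print(adjusted_hit_ratio(0.8, 0.8, 100))                                    # ~0.64
# Counting private and quota public WRITEs as hits raises h'.
print(adjusted_hit_ratio(0.8, 0.8, 100, q_priv_write=10, q_qpub_write=10))  # ~0.84
```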
A general equation for the maximum capacity of the wireless link in all cases is derived as follows.
Assume that a VR contains all the modified data cached on an MU since the last time a VR was sent to
the MU, and that an MU listens constantly while connected (awake). In cache validation, the server
sends a VR to an MU every L seconds. The total number of bits available during the interval L is (L * B),
and the number of bits of all VRs is (n * bVR). Thus, the total number of bits available for query
answering is (L * B - n * bVR). A fraction TQ * (1 - h') of the TQ queries corresponds to the queries that
are cache-misses, and each cache-miss query takes (bq + ba) bits to answer. The traffic in bits on the
wireless link due to the cache-miss queries is therefore TQ * (1 - h') * (bq + ba).4 Setting these equal:

(L * B) - (n * bVR) = TQ * (1 - h') * (bq + ba)

Thus,

TQ = ((L * B) - (n * bVR)) / ((1 - h') * (bq + ba))
The equations for the different cases have different h' and ba.
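The throughput equation is straightforward to evaluate; the values below are purely illustrative, not the paper's Table 2 parameters:

```python
def throughput(L, B, n, b_vr, h_prime, b_q, b_a):
    """TQ = (L*B - n*b_VR) / ((1 - h') * (b_q + b_a))."""
    return (L * B - n * b_vr) / ((1 - h_prime) * (b_q + b_a))

# 10 s interval, 10 kbit/s link, 100 MUs, 64-bit VRs, h' = 0.8,
# 512-bit queries and 1024-bit answers (assumed values):
print(throughput(10, 10_000, 100, 64, 0.8, 512, 1024))  # ~304.7 queries
```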
4. We ignore acknowledgments because all cases incur the same amount of acknowledgment traffic; there is no point
in adding this factor to the comparisons.
6. SIMULATION RESULTS AND PERFORMANCE ANALYSIS
In this section, we analyze the performance of the cache management algorithms. Mathematical
models for all cases of caching management were created in the previous section. Crystal Ball is the
simulation software used. The simulation results are analyzed and compared with each other. The
comparisons among the caching strategies are based on wireless-link usage with respect to throughput
over a certain time period; in other words, we compare how many queries each approach can answer
within a certain time frame over a limited wireless-link bandwidth. We evaluate the impact on throughput
and on hit ratio of different percentages of WRITE, private WRITE, and quota public WRITE.
Simulation results are listed and discussed by experiment type. Note that the MError5 cases are not
included in this section because they generate no cache-miss traffic on the wireless link and are thus not
applicable. Performance evaluations and comparisons of the different cases are then carried out and
analyzed.
6.1. Experiment One
The first experiment is based on the most likely hit ratio (80%), WRITE probability (20%), private
WRITE probability (50%), and quota public WRITE probability (50%) to simulate all nine cases. The
purpose of this experiment is thus to establish the default cases and compare their performance. The
parameters for the first simulation are summarized in Table 2. Some of the figures in Table 2 are from
the previous papers [1] [2] [3]; we assigned the rest of the parameters realistic values. Most of the
parameters are generated from a triangular distribution, which we chose because we think it more
accurately reflects the distribution of the parameter values used in Table 2. The parameters estimating
sizes (the bx variables) will certainly not follow a uniform, normal, or exponential distribution, but we
can expect certain values to clearly stand out as the most likely; the other parameters similarly use a
triangular distribution. In actuality, since we run 1000 trials for each experiment and report the average,
we expect the choice of distribution to have little impact on the overall results. For the number of MUs
(n) in the cell of the base station within the interval L, we use a Poisson distribution. We assume MUs
arrive randomly, so the MU arrival rate is based on an exponential distribution, which generates random
arrival times; consequently, the number of MUs in the cell of a base station during a time interval follows
a Poisson distribution. We assume that the likeliest number of MUs in a cell during the interval L is 100,
and one thousand trials are drawn from a Poisson distribution with mean 100. Each case has been run
with one thousand trials in Crystal Ball. The simulation results provide the frequency distribution of one
thousand TQs.6 We are interested in the mean of the distribution, which represents TQ; the rest of the
statistics describe
5. Send an error message to the user when a cache-miss occurs.
6. TQ is the potential total number of queries that all MUs in a cell can submit and obtain results for successfully.
properties of the mean value of TQ. We use this mean value of TQ as the simulation result and compare
it with the mean values of TQ of the other cases.
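The arrival model can be checked with a short simulation: drawing exponential interarrival times and counting arrivals per interval yields Poisson-distributed counts (the rate and interval below are illustrative, chosen to give the mean of 100 MUs used in the experiment):

```python
import random

def count_arrivals(rate, interval, rng):
    """Count arrivals in [0, interval) with exponential interarrival times."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > interval:
            return count
        count += 1

rng = random.Random(42)
counts = [count_arrivals(rate=100, interval=1.0, rng=rng) for _ in range(1000)]
print(sum(counts) / len(counts))  # close to the Poisson mean of 100
```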
Normalized TQs are shown in Table 3; that is, the TQ of each case is divided by the TQ of case
UR'-MResult', which represents most of the previous research [1] [10] [3] [13]. Thus, case UR'-MResult'
is 1 in Table 3, and the other cases are ratios relative to it. The table clearly shows that our approaches,
cases UR/PrivW/Quota-MResult and UR/PrivW/Quota-MData, are much better than all the other cases:
in this simulation they are about 1.7 times better in terms of throughput than case UR'-MResult'. The
only cases that perform worse than case UR'-MResult' are cases UR-MResult and UR-MData. This is no
surprise, because case UR'-MResult' broadcasts IRs while cases UR-MResult and UR-MData broadcast
VRs, and the size of an IR is smaller than the size of a VR; thus case UR'-MResult' uses less of the
wireless link and achieves better throughput than cases UR-MResult and UR-MData, which have a cache
management mechanism similar to that of case UR'-MResult'. Case UR-MResult represents the previous
approaches in [6] [24], and case UR-MData represents the other previous approach in [5]. The simulation
results demonstrate that our approach is much better than all the previous research in terms of
throughput. The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData,
UR/W-MResult, and UR/W-MData also perform better than the previous approaches.
6.2. Experiment Two
In this experiment, we increase the WRITE probability from the default setting: we change the mean
value of the WRITE probability to 80%, the minimum to 70%, and the maximum to 90%. The rest of the
assumptions are the same as in the first simulation. The simulation results are shown in Table 3. As the
table shows, our approach performs even better in this simulation than in the previous one: in terms of
throughput, it is about 3.5 times better than case UR'-MResult', instead of just 1.7 times better as in the
previous simulation. The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult,
UR/Quota-MData, UR/W-MResult, and UR/W-MData also improve over case UR'-MResult' compared
with the previous simulation.
6.3. Experiment Three
In this experiment, we increase both the WRITE and the quota public WRITE probabilities from the
default settings to see the impact of the changes. We increase the probability of quota public WRITE to
80% (minimum 70%, maximum 90%); the percentage of WRITE is the same as in the second simulation,
80%, and the rest of the parameters are the same as in the first simulation. The simulation results are even
better than the second simulation's: our approaches are 5.6 times better than case UR'-MResult' (see
Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData,
UR/W-MResult, and UR/W-MData also perform better than in the second simulation, and even better
when compared with the first.
6.4. Experiment Four
We increase the probabilities of WRITE, private WRITE, and quota public WRITE in this
experiment to see the impact. In this simulation, we increase the ratio of private WRITE to 80%
(minimum 70%, maximum 90%); the rest of the parameters are the same as in the third simulation. The
results for our approaches are even better than the third simulation's: our approaches are 8.5 times better
than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData,
UR/Quota-MResult, UR/Quota-MData, UR/W-MResult, and UR/W-MData also perform better than in
the third simulation; cases UR/PrivW-MResult and UR/PrivW-MData are now 3 times better than case
UR'-MResult'. They perform even better when compared with the first two simulations.
6.5. Experiment Five
We decrease the hit ratio from the default value to a very low number in this experiment to see the impact. In this simulation, we decrease the hit ratio to 10% (minimum 5%, maximum 30%). The rest of the parameters are the same as in the first simulation. The simulation results of our approaches are not as good as in the previous simulations; nonetheless, our approaches are still 1.28 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, and UR/Quota-MData also perform better than case UR'-MResult'.
6.6. Experiment Six
In this experiment, we decrease the hit ratio to a medium value from the default to see the impact. In this simulation, we set the hit ratio to 50% (minimum 30%, maximum 80%). The rest of the parameters are the same as in the fifth simulation. The simulation results of our approaches are 1.47 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, and UR/Quota-MData also perform better. The impacts of the different hit ratios follow the same pattern as in the previous simulations.
6.7. Experiment Seven
In this experiment, we set the hit ratio to a very low number and the WRITE probability to a very high number to see the impact. As seen in experiment five, a low hit ratio alone does not make a large difference for our approaches, so we examine a low hit ratio combined with a high WRITE probability. In this simulation, we set the WRITE probability to 90% (minimum 70%, maximum 99%). The rest of the parameters are the same as in the fifth simulation. The simulation results of our approaches are much better than the fifth simulation’s: our approaches are about 3 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData, UR/W-MResult, and UR/W-MData also perform better than case UR'-MResult' in this simulation. Thus, as long as the WRITE probability is high, most of the cases can outperform case UR'-MResult' even when the original hit ratio is very low. The impacts of the different hit ratios follow the same pattern as in the previous simulations.
6.8. Impact of Private and Quota Public WRITE Combined
In this section, we report on several experiments run to see the combined impact of private WRITE and quota public WRITE. The original hit ratio, h, remains 80%. The simulation results are shown in Figures 5, 6, and 7. The breaking point at 20% WRITE is at 10%; the breaking points at both 50% WRITE and 90% WRITE are at about 5%. The impacts of our approaches follow a similar pattern. Not surprisingly, the three READ-only cases maintain the same low throughput over all percentages of WRITE, private WRITE, and quota public WRITE.
6.9. Summary of Simulations
Throughout the experiments, our approaches, cases UR/PrivW/Quota-MResult and UR/PrivW/Quota-MData, have the best performance of all the cases. The next best are UR/PrivW-MResult and UR/PrivW-MData, followed by UR/Quota-MResult and UR/Quota-MData. The worst three are UR'-MResult', UR-MResult, and UR-MData. In other words, none of the previous research approaches perform well. Case UR-MResult represents the previous approach in [6] [24] and case UR-MData represents the other previous approach in [5]; both perform worse than the other cases throughout. Case UR'-MResult', which represents most of the previous research [1] [3] [13], is a bit better than cases UR-MResult and UR-MData, but it is still outperformed by most of the approaches. Clearly, a READ-only approach is very expensive, and the reason is obvious: it treats only a READ as a cache hit, while WRITE operations, which require coordination with the database server, consume a lot of wireless link resources. Note that WRITE operations are allowed in [24] and [5]. However, the WRITE operations in these two approaches require coordination with the database server and are therefore treated as cache misses. This is why we classify these two approaches, [24] and [5], as READ only.
As regards the impacts on the adjusted hit ratios, all cases follow a similar pattern. The adjusted hit ratios, h', for all cases are listed in Table 4. All READ-only approaches have smaller adjusted hit ratios because WRITE operations are excluded as cache hits. Case UR/PrivW/Quota has very good adjusted hit ratios as long as one of three situations occurs: a high ratio of WRITE, a high ratio of private WRITE, or a high ratio of quota public WRITE. Note that in the real world the uplink has only a fraction of the capacity of the downlink; for evaluation purposes we assume that the downlink and uplink channels have the same bandwidth. Our approaches significantly improve the hit ratio over the other approaches, so their performance will be even better in a real-world situation7, because a higher hit ratio means lower uplink traffic.
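To make the pattern in Table 4 concrete, the adjusted hit ratios can be approximated directly from the ratios defined in Table 1. The following Python sketch is our reconstruction, not the exact simulation code (see [26] for that); it counts a WRITE as a cache hit only when it can be served locally, i.e., private WRITEs in the PrivW cases and quota public WRITEs in the Quota cases:

```python
# Hedged reconstruction of the adjusted hit ratio h'.  A READ hits with
# probability h; a WRITE is a "hit" only when it can be served locally.
def adjusted_hit_ratio(case, h, r_w, r_pubw, r_qpubw):
    read_hits = (1 - r_w) * h            # READ fraction that hits the cache
    priv = r_w * (1 - r_pubw)            # private WRITEs served locally
    quota = r_w * r_pubw * r_qpubw       # quota public WRITEs served locally
    extra = {"UR": 0.0,
             "UR/PrivW": priv,
             "UR/Quota": quota,
             "UR/PrivW/Quota": priv + quota}
    return read_hits + extra[case]

# Likeliest values from Table 2: h = 0.8, rW = 0.2, rpubW = rqpubW = 0.5
for case in ("UR", "UR/PrivW", "UR/Quota", "UR/PrivW/Quota"):
    print(case, round(adjusted_hit_ratio(case, 0.8, 0.2, 0.5, 0.5), 2))
# UR 0.64, UR/PrivW 0.74, UR/Quota 0.69, UR/PrivW/Quota 0.79
```

With these point estimates the sketch approximately reproduces the Experiment 1 column of Table 4 (64%, 73%, 69%, 79%); the small differences come from the triangular sampling of the actual simulation.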
We examined the impacts of different percentages of WRITE with different probabilities of private WRITE and quota public WRITE. At a WRITE probability of 20%, 10% is the breaking point for our approaches (see Figure 5): when both private WRITE and quota public WRITE exceed 10%, our approaches start to outperform the previous approaches. When the WRITE probability increases to 50% (or 90%), the breaking point drops to about 5% (see Figures 6 and 7). In addition, when the WRITE probability increases with a high percentage of private WRITE and of quota public WRITE (90% or more), our approaches dramatically outperform the previous approaches (cases UR'-MResult', UR-MResult, and UR-MData). Note that our approaches always perform better than the previous approaches UR-MResult and UR-MData. The reason why our approaches cannot beat case UR'-MResult' at low private WRITE and quota public WRITE is that case UR'-MResult' broadcasts IRs instead of VRs.
7 The uplink (backchannel) has only a fraction of the capacity of the downlink.
One last point we would like to discuss is the impact of different granularity sizes. In our cache update and cache-miss handling, the granularity is a set of tuples, where each tuple contains only a subset of the attributes. Semantic caching's granularity (a semantic region) is a set of tuples, and the granularity of page caching is a page. Thus our granularity is the smallest, a page is the largest, and a semantic region is in between. Consequently, our performance is the best, semantic caching is second, and page caching is the worst. In equation TQ, the variable bd = n * granularity, where n is the number of granules. A smaller granularity yields a smaller bd, and a smaller bd results in a larger TQ (throughput).
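The granularity argument can be illustrated with a simple cost model. The sketch below is only a hedged approximation of TQ, in which each query costs a bq-bit uplink request plus an answer of bd = n * granularity bits; the granule sizes used for the semantic-region and page cases are illustrative assumptions, not measured values:

```python
# Hypothetical link-capacity model: the wireless link carries B * L bits
# per interval; a query costs a bq-bit request plus an n-granule answer
# of n * g bits.
def max_queries(B, L, bq, n, g):
    bd = n * g                       # answer size, bd in the paper's notation
    return (B * L) / (bq + bd)       # upper bound on queries per interval

# Likeliest values from Table 2: B = 19200, L = 10, bq = 64; n = 4 granules
ours = max_queries(19200, 10, 64, 4, 128)       # attribute-subset tuples
semantic = max_queries(19200, 10, 64, 4, 512)   # assumed semantic-region size
page = max_queries(19200, 10, 64, 4, 4096)      # assumed page size
assert ours > semantic > page   # smaller granularity -> larger throughput
```

Whatever the exact constants, the ordering is monotone: shrinking the granule size shrinks bd and therefore raises the query capacity of the link.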
7. SUMMARY AND FUTURE WORK
In this research, we have designed and developed all the required algorithms8 for a mobile agent on an MU along with a program on a base station. The overall design aims at improving data caching/replication on a mobile unit, including, among other things, prefetching/hoarding, cache management, cache coherence, and cache replacement. The simulation results have shown that our approaches are far superior to the previous research. This is because we use a quota mechanism and categorize relations into private and public. These mechanisms enable a user to query private and quota data directly from the local DBMS on an MU without data coherence problems. In addition, our approaches significantly reduce usage of the valuable wireless link, which is the most limited resource in a mobile computing environment. The previous research [1] [3] [13] [6] assumes a READ-only approach, which has been shown to be inefficient when the probabilities of private WRITE and quota public WRITE are high. The approaches in [24] and [5] allow WRITE on the cache, but these WRITE operations must be kept in sync with the database server's, which consumes a large portion of the valuable wireless link.
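The quota idea can be illustrated with a short sketch. The class below is hypothetical (the paper's actual agent algorithms are in [26]); it shows the essential escrow-like behavior, in which an MU may update its local share of a quota relation without contacting the server until that share is exhausted:

```python
class QuotaCache:
    """Local share of a quota public relation held by one MU's agent."""
    def __init__(self, quota):
        self.quota = quota          # units (e.g. seats, stock) granted to this MU

    def local_write(self, amount):
        """Try to serve a WRITE from the local quota.
        Returns True on a cache hit; False means the quota is exhausted
        and the operation must go uplink to the database server."""
        if 0 < amount <= self.quota:
            self.quota -= amount
            return True
        return False

cache = QuotaCache(10)
print(cache.local_write(4))   # True  -> served locally, no wireless traffic
print(cache.local_write(7))   # False -> only 6 units left, uplink required
```

Because each MU writes only within its own quota, no coherence conflict with other MUs can arise, which is why such WRITEs can safely count as cache hits.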
There are several possible extensions of this paper for future research. First, we would like to translate all the algorithms into a high-level language, preferably Java. Java is highly portable and excellent for building a front-end interface, such as a web page; Java applets can talk to a RDBMS through JDBC, which in turn can interact with ODBC. Second, we would like to address the issue of how to generate VRs efficiently on the base station, including checking updated data against the database server. Lastly, efficient hand-off handling is another important issue that we would like to address in the future.
REFERENCES
[1] D. Barbara and T. Imielinski. Sleepers and Workaholics: Caching Strategies in Mobile Environments. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pages 1-12, May 1994.
[2] D. Barbara and T. Imielinski. Sleepers and Workaholics: Caching Strategies in Mobile Environments. MOBIDATA: An Interactive Journal of Mobile Computing, 1(1), Nov. 1994.

8 They are not all listed in this paper. If interested, please refer to [26].
[3] O. Bukhres and J. Jing. Performance Analysis of Adaptive Caching Algorithms in Mobile Environments. International Journal of Information Sciences (IJIS), North Holland, 1995.
[4] M. Carey, M. Franklin, and M. Zaharioudakis. Fine-Grained Sharing in Page Server Database Systems. In Proceedings of the ACM SIGMOD Conference, 1994.
[5] B. Y. Chan, A. Si, and H. V. Leong. Cache Management for Mobile Databases: Design and Evaluation. In Proceedings of the International Conference on Data Engineering, IEEE, pages 54-63, 1998.
[6] S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, and M. Tan. Semantic Data Caching and Replacement. In Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, pages 330-341, 1996.
[7] M. H. Dunham and A. Helal. Mobile Computing and Databases: Anything New? SIGMOD Record, 24(4), pages 5-9, Dec. 1995.
[8] M. J. Franklin, M. J. Carey, and M. Livny. Global Memory Management in Client-Server DBMS Architectures. In Proceedings of the International Conference on VLDB, pages 596-609, 1992.
[9] M. Franklin. Client Data Caching: A Foundation for High Performance Object Database Systems. Kluwer Academic Publishers, 1996.
[10] R. Gruber, F. Kaashoek, B. Liskov, and L. Shrira. Disconnected Operation in the Thor Object-Oriented Database System. In Proceedings of the Workshop on Mobile Computing Systems and Applications, pages 51-56, IEEE, Dec. 1994.
[11] J. S. Heidemann, T. W. Page, R. G. Guy, and G. J. Popek. Primarily Disconnected Operation: Experience with Ficus. In Proceedings of the Second Workshop on the Management of Replicated Data, Nov. 1992.
[12] P. Honeyman, L. Huston, J. Rees, et al. The LITTLE WORK Project. In Proceedings of the 3rd Workshop on Workstation Operating Systems, IEEE, April 1992.
[13] J. Jing, A. Elmagarmid, A. Helal, and R. Alonso. Bit-Sequences: A New Cache Invalidation Method in Mobile Environments. Purdue University, Department of Computer Sciences, Technical Report CSD-TR-95-076, Dec. 1995.
[14] J. Jing, A. Elmagarmid, A. S. Helal, and R. Alonso. Bit-Sequences: An Adaptive Cache Invalidation Method in Mobile Client/Server Environments. Mobile Networks and Applications Journal, 2(2), pages 115-127, 1997.
[15] G. H. Kuenning, G. J. Popek, and P. L. Reiher. An Analysis of Trace Data for Predictive File Caching in Mobile Computing. University of California, Los Angeles, Technical Report CSD-940016, Apr. 1994. Also appeared in Proceedings of the 1994 Summer USENIX Conference.
[16] W. Kim, N. Ballou, J. F. Garza, and D. Woelk. A Distributed Object-Oriented Database System Supporting Shared and Private Databases. ACM Transactions on Information Systems, 9(1), pages 31-51, Jan. 1991.
[17] H. Lei and D. Duchamp. Transparent File Prefetching. Columbia University, Computer Science Department, Mar. 1995.
[18] D. Maier. The Theory of Relational Databases. Computer Science Press, 1983.
[19] P. E. O'Neil. The Escrow Transactional Method. ACM Transactions on Database Systems, 11(4), Dec. 1986.
[20] E. O'Neil, P. O'Neil, and G. Weikum. The LRU-K Page Replacement Algorithm for Database Disk Buffering. In Proceedings of the ACM SIGMOD, pages 297-306, 1993.
[21] R. H. Patterson and G. A. Gibson. A Status Report on Research in Transparent Informed Prefetching. Carnegie Mellon University, School of Computer Science, Technical Report CMU-CS-93-113, Feb. 1993.
[22] M. Satyanarayanan, J. J. Kistler, L. B. Mummert, M. R. Ebling, P. Kumar, and Q. Lu. Experience with Disconnected Operation in a Mobile Computing Environment. Carnegie Mellon University, School of Computer Science, Technical Report CMU-CS-93-168, June 1993. Also published in Proceedings of the 1993 USENIX Symposium on Mobile and Location-Independent Computing, Cambridge, MA, Aug. 1993.
[23] N. Soparkar and A. Silberschatz. Data-Value Partitioning and Virtual Messages. In Proceedings of PODS, pages 357-367, 1990.
[24] G. Walborn and P. K. Chrysanthis. PRO-MOTION: Management of Mobile Transactions. In Proceedings of the 11th ACM Annual Symposium on Applied Computing, Special Track on Database Technology, pages 101-108, San Jose, CA, Mar. 1997.
[25] K. L. Wu, P. S. Yu, and M. S. Chen. Energy-Efficient Caching for Wireless Mobile Computing. In Proceedings of the 12th International Conference on Data Engineering, pages 336-343, Feb. 1996.
[26] J.-F. Yao. Caching Management of Mobile DBMS on a Mobile Unit. Ph.D. Dissertation, Southern Methodist University, August 1998.
Table 1. Parameters used in the mathematical equations

QL       Total number of queries submitted from an MU in the time interval L; QL = QR + QW.
QR       Number of READ queries in QL; QR = QL - QW.
QW       Number of WRITE queries in QL; QW = QpubW + QprivW and QW = rW * QL.
rW       The percentage of the queries in QL that perform a WRITE.
QpubW    Number of queries that perform a public WRITE; QpubW = QqpubW + QnqpubW and QpubW = rpubW * QW.
rpubW    The percentage of the queries in QW that perform a public WRITE.
QqpubW   Number of queries that write on quota public relations; QqpubW = rqpubW * QpubW.
rqpubW   The percentage of the queries in QpubW that perform a quota public WRITE.
QnqpubW  Number of queries that write on non-quota public relations.
QprivW   Number of queries that write on private relations; QprivW = (1 - rpubW) * QW.
L        VR broadcast interval.
B        The bandwidth of the wireless network.
N        Total number of MUs in the cell of a base station.
bq       Size of a query in bits.
ba       Size of an answer in bits; there are two types of ba: br and bd.
br       Size of a query result in bits.
bd       Size of a data item in bits, which is the cache update granularity.
TQ       The potential maximum number of queries that the wireless link can handle in the interval L.
bVR      Size of a VR in bits.
QC       Total number of queries that can be served completely from the cache during the time interval L; QC = rR * h * QL.
rR       The percentage of READ queries in QL; rR = 1 - rW.
h        The pre-defined hit ratio.
h'       The adjusted hit ratio, including WRITE queries, in the different cases.
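The count identities in Table 1 can be checked with a few lines of Python. This sketch simply restates the table's equations, evaluated here with the likeliest values from Table 2:

```python
# Restate the query-count identities from Table 1.
def query_breakdown(Q_L, r_W, r_pubW, r_qpubW, h):
    Q_W = r_W * Q_L                    # WRITE queries
    Q_R = Q_L - Q_W                    # READ queries
    Q_pubW = r_pubW * Q_W              # public WRITEs
    Q_privW = (1 - r_pubW) * Q_W       # private WRITEs
    Q_qpubW = r_qpubW * Q_pubW         # quota public WRITEs
    Q_nqpubW = Q_pubW - Q_qpubW        # non-quota public WRITEs
    Q_C = (1 - r_W) * h * Q_L          # queries served entirely from cache
    return Q_R, Q_W, Q_pubW, Q_privW, Q_qpubW, Q_nqpubW, Q_C

# Likeliest values from Table 2: QL = 100, rW = 0.2, rpubW = rqpubW = 0.5, h = 0.8
print([round(x, 1) for x in query_breakdown(100, 0.2, 0.5, 0.5, 0.8)])
# [80.0, 20.0, 10.0, 10.0, 5.0, 5.0, 64.0]
```

So of 100 queries, 80 are READs, 20 are WRITEs split evenly between private and public, the public half splits evenly between quota and non-quota, and 64 READs are expected to be pure cache hits.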
Table 2. Assumptions of first simulation

Parameter   Min    Likeliest   Max    Distribution
bd          64     128         256    Triangular
bq          32     64          96     Triangular
bIR         16     64          128    Triangular
br          64     128         256    Triangular
bVR         64     256         640    Triangular
h           0.7    0.8         0.9    Triangular
rW          0.1    0.2         0.5    Triangular
rqpubW      0.3    0.5         0.8    Triangular
rpubW       0.3    0.5         0.8    Triangular
N           10     100         120    Poisson
L           N/A    10          N/A    N/A
B           N/A    19200       N/A    N/A
QL          N/A    100         N/A    N/A
Table 3. Normalized throughput of all experiments

Cases                     Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6  Exp 7
UR'-MResult'              1      1      1      1      1      1      1
UR-MResult                0.86   0.86   0.86   0.86   0.86   0.86   0.86
UR-MData                  0.86   0.86   0.86   0.85   0.87   0.86   0.85
UR/PrivW-MResult          1.27   1.7    1.69   3.03   1.1    1.18   1.65
UR/PrivW-MData            1.27   1.7    1.68   2.98   1.11   1.18   1.63
UR/Quota-MResult          1.18   1.36   1.62   1.23   1.09   1.13   1.34
UR/Quota-MData            1.18   1.36   1.61   1.21   1.09   1.13   1.33
UR/PrivW/Quota-MResult    1.7    3.5    5.65   8.51   1.28   1.47   3.17
UR/PrivW/Quota-MData      1.7    3.49   5.62   8.36   1.29   1.46   3.13
Table 4. Adjusted hit ratios, h', for all cases

Cases            Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6  Exp 7
UR'              64%    16%    19%    19%    11%    37%    1.5%
UR               64%    16%    19%    19%    11%    36%    1.5%
UR/PrivW         73%    54%    54%    75%    24%    49%    44%
UR/Quota         69%    39%    50%    35%    19%    44%    27%
UR/PrivW/Quota   79%    76%    86%    91%    31%    57%    69%
Figure 1. Architecture for Mobile Systems (Adapted from Figure 1 in [7])
( cache_flag, base_relation_name[1],
cache_relation_name[1],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
( cache_flag, base_relation_name[2],
cache_relation_name[2],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
………………..
( cache_flag, base_relation_name[n],
cache_relation_name[n],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
Figure 2. User Profile
Figure 3. Broadcasting Validation Report (adapted from [1])
Figure 4. The Relationship among a Fixed Network, a Base Station, and an MU
[Chart: Wireless Link Usage (WRITE: 20%). Y-axis: number of queries (2000-6000); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 5. Impact of Private and Quota Public WRITE (write: 20%)
[Chart: Wireless Link Usage (WRITE: 50%). Y-axis: number of queries (1200-8200); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 6. Impact of Private and Quota Public WRITE (write: 50%)
[Chart: Wireless Link Usage (WRITE: 90%). Y-axis: number of queries (800-9800); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 7. Impact of Private and Quota Public WRITE (write: 90%)
Yao & Dunham