Caching Management of Mobile DBMS
Jenq-Foung Yao
Department of Mathematics and Computer Science
Georgia College & State University
Milledgeville, GA 31061
Email: [email protected]
Phone: (912) 445-1626
Fax: (912) 445-2602
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, TX 75275
Email: [email protected]
Phone: (214) 768-3087
Fax: (214) 768-3085
Abstract
Unlike a traditional client-server network, a mobile computing environment has a very limited
bandwidth in a wireless link. Thus, one design goal of caching management in a mobile computing
environment is to reduce the use of wireless links. This is the primary objective for this research.
Quota data and private data mechanisms are used in our design so that an MU user is able to query
and update data from the local DBMS1 without cache coherence problems. The effect of the two
mechanisms is to increase the hit ratio. An agent on an MU along with a program on a base station
are used to handle the caching management, including prefetching/hoarding, cache use, cache
replacement, and cache-miss handling. The simulation results clearly indicate that our approaches
are improvements over previous research.
Keywords: Caching, Mobile Computing, Mobile DBMS, Mobile Unit (MU), Database, Agent, User
Profile, Validation Report (VR)
1 The local DBMS contains cache data on an MU.
1. INTRODUCTION
For the past ten years, personal computer technology has been progressing at an astonishing rate. The
size of a PC is becoming smaller, and the capacity of software and hardware functionality is increasing.
Simultaneously, the technologies of cellular communications, satellite services and wireless LAN are
rapidly expanding. These state-of-the-art PC and wireless technologies have brought about a new breed of
technology called mobile computing (MC). Several mobile computing examples have been discussed in
[7] and [1].
Most people acknowledge that the mobile environment is an extension of distributed systems. Mobile
units (MUs) and the interfacing devices (base stations that may interact with MUs) are added to the
existing distributed systems (see Figure 1). This is a client/server network setting. The servers would be
on some fixed hosts or base stations, and the clients could be fixed hosts or mobile units. The mobile
units are frequently disconnected for some periods because of the expensive wireless connection,
bandwidth competition, and limited battery power. To allow users to access resources at all times no
matter which mode they are in, many research issues need to be dealt with. The data caching/replication
on an MU is one of the important methods that can help to resolve this problem.
2. PREVIOUS WORKS
In existing research, caching management is handled at two different levels: the file system level
and the DBMS level. Issues at the file system level have been addressed widely [11] [22] [12] [21] [17].
Some of the file-system-level approaches have produced real systems in daily use, such as Coda [22].
These research efforts have some shortcomings, however. The major one is that all of them explicitly
exclude a DBMS. In addition, they use optimistic replication control. This kind of control allows
WRITE operations on different partitions (locations). Committing data in a timely fashion is not
important in these systems. Data are allowed to have several different versions on different
partitions, which are later integrated (and committed). In the academic environments where these
approaches were developed, users rarely write to the same file at the same time.
Most of the previous work in mobile DBMS [1] [2] [13] [3] [25] concentrated on the time window
"w" and the size of the Invalidation Report (IR). These efforts reduced wireless link usage to a
certain degree. However, they assumed read-only operation on the local cache, which is just one of the
ways cache data on an MU may be used. They also uplinked to the fixed network for cache-miss data,
which likewise is only one of the possible cache-miss handling strategies.
Only a few researchers at the DBMS level have dealt with the update issue on MUs. Chan, Si, and
Leong proposed a mobile caching mechanism based on an object-oriented paradigm [5]. Conceptually
their approach is based on an idea that is similar to ours. That is, to cache the frequently accessed
database items in MUs to improve performance of database queries and availability of database data items
for query processing during disconnection. This is a concept called hot spot [2]. This concept states that
frequently accessed data are likely to be accessed again in the future.
The other research, which dealt with WRITE operations, is in [24]. In this research, virtual resources
are pre-allocated on an MU so that the MU has its own potential share of data. The research is based on a
trucking distribution system where each truck is pre-assigned an amount of load. When a truck has
actually loaded goods, it then reports the actual load to the database server. Only the aggregate quantity
data have been dealt with. The approach is very similar to research proposed by O'Neil [19]. O'Neil
proposed an escrow transactional method that pre-allocates some fixed portion of an aggregate quantity
data to a transaction prior to its commit. When the time comes to commit the transaction, there ought to
be enough value of this data item available due to the pre-allocation. The whole mechanism of the
approach in [19] takes place in a centralized DBMS.
3. OBJECTIVES FOR THE RESEARCH
The objectives of this research are to provide some solutions for unaddressed issues in the DBMS
area. These issues include how to handle WRITE operations on the local cache, different techniques to
handle cache-misses, how to deal with cache coherence, etc. We will fully address the issues in the next
two sections. Performance evaluations based on simulation results are then discussed in section six.
4. MODEL OF CACHING MANAGEMENT
A client-agent-server architecture is used in our model. The rationale is that we would like to build a
mobile DBMS on top of existing DBMSs. The agent is an interface between a database server and a
mobile client. All the additional functionality beyond the existing DBMSs is built into the agent. That is, among
other things the agent handles cache prefetch, cache update, cache-miss handling, and cache
coherence. The agent includes an MU agent on an MU and a VR2 handler on a base station. The data
prefetching can be done either through a wired link or through a wireless link. However, the wired
link is always preferred whenever the situation allows. The cache will be stored on the local disk of
an MU, and organized in the format of relations. These relations are deemed part of the local RDBMS
and can be queried via the local RDBMS.
4.1. Model Assumptions
Assumptions in our research are as follows:
1. Only the issues at the DBMS level will be dealt with in this research.
2. The environment is a typical client-server network. The server is on a fixed network. The client
could be a mobile unit or a fixed host; we mainly deal with issues on a mobile unit.
3. A user is able to use the MU for an extended time frame. Therefore, data can be cached there for
long periods.
4. An MU has an independent local DBMS. This local DBMS and the database server on the fixed
network all support the relational models. SQL on the MU can query both the private RDBMS and
the RDBMS on the database server. An MU agent serves as a query interface among them.
5. All the MUs are portable laptop computers. We assume they are as powerful as their desktop
counterparts.
2 Please refer to Section 4.5.
6. The data prefetching can be done either through a wired link or through a wireless link. However,
the wired link is always preferred whenever the situation allows. In addition, a mass
prefetching is preferable in the case of low network traffic, such as overnight.
7. The cache will be stored in the local disk of an MU, and organized in the format of relations. These
relations are deemed part of the local RDBMS on the MU and can be queried via the local RDBMS.
8. We assume that downlink and uplink channels have the same bandwidth capacity for the evaluation
purpose.
9. The granularity of the fetched data is a portion of a relation.
4.2. Caching Granularity
The caching granularity is one of the key factors in caching management systems. Most of the
mobile systems at the operating system level use a file or a group of files (a cluster or replica) as the
caching granularity. Using a whole file as the granularity is not an appropriate choice for database data. On the DBMS level, the
caching granularities are usually an attribute [5], an object [5], a page [4], a tuple [9], or a semantic region
[6]. Attribute caching and tuple caching create undesirable overheads due to the large number of
independent cache attributes/tuples. On the other hand, object caching, page caching and semantic
caching reduce these overhead problems by collecting data in a group of tuples. The static grouping in
page caching and object caching lacks flexibility compared to attribute/tuple caching.
Semantic caching provides this flexibility, permitting dynamic grouping according to the requests of
current queries. However, the implementation details of semantic caching remain unclear, such as who is in
charge of the cache update [6].
We propose a portion of a relation as the caching granularity. This portion of the relation contains a
group of tuples from the original relation. These tuples are extracted from the original relation with query
operators SELECTION (σ) and PROJECTION (π). We also preserve the primary key's attributes in a cache
relation. Therefore, our approach is not exactly like that of an updateable snapshot. We call our
approach "Morsel Caching" because we cache a portion of a base relation as a cache relation. We then
call this caching granularity a "cache relation". The cache relation is defined in Definition 2. One may
view a cache relation as a special case of an updateable snapshot.
Definition 1 Let a database D = {Ri} be a set of base relations. For every base relation Ri, let Ai stand
for the set of its attributes. Let Aik be the set of attributes that are included in the primary key of Ri. A
user morsel, UM, is a tuple <UA, UP, UC>, where UC = πUA σUP (Rj), where Rj is one of the relations in D;
UA ⊆ Aj and Ajk ⊆ UA; UP = P1 ∨ P2 ∨ P3 ∨ … ∨ Pn, where each Pj is a conjunction of simple predicates, i.e. Pj =
bj1 ∧ bj2 ∧ bj3 ∧ … ∧ bjl, and each bjt is a simple predicate.
Definition 2 A Cache Relation, CR, is the smallest relation which contains a set of UC's from Definition 1,
all associated with the same base relation.
The reason we do not include a join when forming a cache relation is to make cache-miss
handling and update synchronization much easier.
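To make Definitions 1 and 2 concrete, the following sketch (not part of the paper's system) builds a user morsel by selection and projection over a base relation represented as a list of dictionaries; the relation and attribute names are invented for illustration.

```python
# Hypothetical sketch: a user morsel as selection + projection over a
# base relation, always preserving the primary-key attributes (Definition 1).
def user_morsel(base_relation, attrs, predicate, key_attrs):
    keep = set(attrs) | set(key_attrs)   # primary key is always kept
    return [{a: t[a] for a in keep} for t in base_relation if predicate(t)]

employees = [
    {"emp_id": 1, "name": "Ann", "dept": "CS", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "dept": "Math", "salary": 48000},
]
# Cache only CS employees' names, keeping the key attribute emp_id.
morsel = user_morsel(employees, ["name"], lambda t: t["dept"] == "CS", ["emp_id"])
```

A cache relation (Definition 2) would then be the union of such morsels drawn from the same base relation.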
Note that we use a cache relation as the caching granularity in prefetching and in cache replacement.
The cache update granularity, however, is only a subset of attributes of a tuple owing to the fact that the
update takes place via a wireless link whose bandwidth is limited. This is different from other
approaches, which use the same granularity for prefetching, cache replacement and cache update. Thus,
our approaches are much more flexible, in that the granularity is not fixed but dynamic for different
occasions.
4.3. User Profile
There are several previous approaches that use a mechanism called “hoarding profile”. The
hoarding profile lets users choose their preference data to cache. This is the most direct and effective way
to cache data that users need. The drawback is that it requires human involvement, and people may not
know what they will really use. Hence, only sophisticated users are able to provide the most effective
cache data. A classic example is Coda [22], which uses "hoard profile" command scripts to update
the “hoard database”. Data are fetched to the cache based on the hoard database. Another example is
Thor [10], which uses an object-oriented query language to describe the hoarding profile. Its applications
are similar to Coda’s.
Algorithm Extract-Cache-Relation:
Extract-Cache-Relation(user-profile) {
    /* This algorithm extracts data from the base relation and inserts it into the cache relation */
    for each entry of the user profile {
        if (cache_flag = 0) then       /* the corresponding cache relation does not exist */
            { create a cache relation; }
        set cache_flag = 1;            /* mark the cache relation as cached */
        for (each attribute j that is not in the cache relation) do
            { add a column for the attribute into the cache relation; }
        extract data from the base relation and insert the data into the cache relation;
    }
}
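The pseudocode above can be sketched as runnable Python; the in-memory representation of relations and profile entries here is an assumption for illustration, not the paper's actual data structures.

```python
# Hypothetical runnable sketch of Extract-Cache-Relation using dictionaries.
def extract_cache_relation(user_profile, base_relations, cache_relations):
    for entry in user_profile:
        name = entry["cache_relation"]
        if entry["cache_flag"] == 0:             # cache relation does not exist yet
            cache_relations[name] = {"attrs": set(), "tuples": []}
            entry["cache_flag"] = 1              # mark the cache relation as cached
        cr = cache_relations[name]
        cr["attrs"] |= set(entry["attrs"])       # add columns for missing attributes
        base = base_relations[entry["base_relation"]]
        # extract matching tuples from the base relation into the cache relation
        cr["tuples"] = [{a: t[a] for a in cr["attrs"]}
                        for t in base if entry["predicate"](t)]

base_relations = {"EMPLOYEE": [
    {"emp_id": 1, "name": "Ann", "dept": "CS"},
    {"emp_id": 2, "name": "Bob", "dept": "Math"},
]}
profile = [{"cache_flag": 0, "base_relation": "EMPLOYEE",
            "cache_relation": "MY_EMPLOYEES",
            "attrs": ["emp_id", "name"],
            "predicate": lambda t: t["dept"] == "CS"}]
cache = {}
extract_cache_relation(profile, base_relations, cache)
```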
We adapt this concept as part of our caching mechanism. In our model, data are fetched to the cache
based on the data access frequency and a user hint. A user hint contains the projected needs of a user as
specified in a user profile, and the relations that are presently used. The user profile is created prior to the
very first prefetching. We let a user create his user profile by running a simple program which prompts
the user to enter the information. This information includes the name of a cache relation, the name of the
related base relation, the attributes of the primary key, the attributes of the cache relation, and a set of
criteria that will be used to create the cache relation. The program then organizes the entered information
in the format shown in Figure 2. A user profile is an input to another program (see Algorithm Extract-Cache-Relation). The output of the program is a set of cache relations. Hence, incorporating a user
profile with the program extracts a portion of the base relations into cache relations. Alternatively, the
DBA could perform these tasks for users. Should a user lack a user profile for whatever reasons, the
cache relations would be the base relations on the database server. That is, if a user chooses not to have a
user profile, the cache relations would be the same as the base relations. Which base relations will be
fetched in the prefetching stage is based on the relation access frequency. Cache relations are fetched
onto an MU based on the user profile during the prefetching stage. A cache relation created with the user
profile has a different name from the base relation. It is up to the user to name a cache relation in the user
profile. If a cache relation is not created with a user profile, then the cache relation would share the same
name with the original relation. In this case, a cache relation is like a replication of the base relation.
The program (Extract-Cache-Relation) will be run against the user profile at the very first prefetching.
This is a one-time deal. Once the user profile has been created and has been used to extract cache
relations, it can also be used to assist in handling cache-misses. When a cache-miss occurs, the MU agent
can look up the user profile and trace back to the base relation. If there is no entry for the cache relation,
then the MU agent needs to create a new entry for the missing relation based on the cache-miss query. If
the cache-miss query involves a join, each relation involved in the join will be an entry that needs to be
created. That is, if three relations involve a join, three entries will be created. The new entry is created in
a temporary user profile. The MU agent then runs the extract-cache-relation program against the
temporary user profile. Once this has been done, the temporary user profile will be appended to the user
profile. In the future, if the user decides to cache more relations, the procedure is similar to the case of a
cache-miss.
This user profile is kept on the MU that a user is using. From time to time, it will be backed up to the
database server. Each entry could be used to create a user morsel (see Definition 1 in Section 4.2). There
are six parameters for each entry. The first parameter is the cache flag bit. The ON bit (1) means the
cache relation exists, and the OFF bit (0) means the cache relation does not exist. Initially, all the cache
flags are set to OFF (0) bit. The second parameter is the name of the base relation from which that data
will be extracted. The third parameter is a name for the new cache relation. It is the user’s choice for the
name. The fourth parameter is a set of attributes of the primary key from the base relation. The fifth
parameter is a set of attributes from the base relation that will be included in the cache relation. The last
parameter is a set of criteria that is used to select a user morsel. These criteria are in the same format as
the user morsel's criteria (UP), which are defined in Definition 1 in Section 4.2.
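As a concrete illustration of the six parameters, a profile entry might be represented as below; the relation and attribute names are hypothetical, not from the paper.

```python
# One user-profile entry with the six parameters described above.
profile_entry = {
    "cache_flag": 0,                      # 1st: 0 = cache relation does not exist yet
    "base_relation": "EMPLOYEE",          # 2nd: relation data is extracted from
    "cache_relation": "MY_EMPLOYEES",     # 3rd: user-chosen cache relation name
    "key_attrs": ["emp_id"],              # 4th: primary-key attributes
    "attrs": ["emp_id", "name", "dept"],  # 5th: attributes included in the cache relation
    "criteria": "dept = 'CS'",            # 6th: selection criteria (UP in Definition 1)
}
```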
To extract and insert data into the new cache relation with SQL, the MU agent first submits an
INSERT operation to the server to insert the data to a temporary base relation with the same name as the
cache relation. Note that the data is SELECTed from the corresponding permanent relations and inserted
into the temporary relation after it has been CREATEd. This relation is then moved to the local DBMS as a
cache relation.
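The CREATE/SELECT/INSERT flow above can be sketched with an in-memory SQLite database standing in for the server DBMS; the table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical sketch of the extract-and-insert step with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ann', 'CS'), (2, 'Bob', 'Math')")
# CREATE the temporary relation, then INSERT the SELECTed morsel into it.
conn.execute("CREATE TABLE MY_EMPLOYEES (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO MY_EMPLOYEES SELECT emp_id, name FROM EMPLOYEE WHERE dept = 'CS'")
rows = conn.execute("SELECT * FROM MY_EMPLOYEES").fetchall()
```

In the paper's model, the resulting temporary relation would then be shipped to the MU's local DBMS as the cache relation.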
Sometimes an existing cache relation needs to be modified, such as adding more columns for some
new attributes. This case happens when a cache relation needs to be extended, as when another entry of
the user profile attempts to create the same cache relation. We only allow one cache relation to be
extracted from a base relation. Thus, creating another cache relation is not allowed. The solution to this
problem is to add the new attributes in this entry of the user profile to the existing cache relation. To add
a column for a new attribute into an existing relation, we use the ALTER command in SQL. After adding
new attributes into an existing cache relation, we use the UPDATE command in SQL to insert values for the new
attributes of the cache relation.
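The ALTER-then-UPDATE extension step might look as follows, again using SQLite syntax as a stand-in and invented names.

```python
import sqlite3

# Hypothetical sketch: extend an existing cache relation with a new attribute,
# then fill in its values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_EMPLOYEES (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO MY_EMPLOYEES VALUES (1, 'Ann')")
conn.execute("ALTER TABLE MY_EMPLOYEES ADD COLUMN salary INTEGER")   # add the column
conn.execute("UPDATE MY_EMPLOYEES SET salary = 50000 WHERE emp_id = 1")  # fill its values
row = conn.execute("SELECT emp_id, name, salary FROM MY_EMPLOYEES").fetchone()
```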
Note that one base relation can only produce one cache relation and more attributes can be added into
the cache relation later. A cache relation will not be split into two or more relations, nor will it be
coalesced with other relation(s). This is very different from semantic caching, in which a semantic
region may be split into several semantic regions or coalesced with other semantic regions over a time
frame [6]. In addition, the MU user queries these cache relations as his own private relations on the
local DBMS; he is not aware of the morsels within a relation.
4.4. Cache Replacement Policy
The cache replacement policy is another factor that affects the caching performance. This policy
determines which part of cache will be replaced when the cache is running out of space for new cache
data. There are three different types of cache replacement policies, which are based on temporal locality,
spatial locality, and semantic locality. Temporal locality is the property that data in which have been used
recently will be used again soon, such as MRU. Therefore, a cache replacement policy using temporal
locality property may replace the lease recently used (LRU) data. Spatial locality is the property that the
data spatially close to the recently used data are likely to be used again in the near future. Thus, a cache
replacement policy using spatial locality property would replace the data that are spatially farther away
from the recently used data. The property of semantic locality is that a semantic region which is most
similar to a currently being accessed region is most likely to be used in the future. Regions which are not
related to the current queries, based on a semantic distance function, should be targets for replacement
first.
We propose a new replacement policy which uses the property that less frequently used data are less
likely to be used again (LFU). This proposal is based on our observation, Kenning’s empirical results
[15], and the common belief of “hot spot”. This property is thus called frequency locality. The
replacement policy, which uses the frequency locality, will replace the least frequently used cache relation
with a new cache relation when the cache does not have enough space. A frequency function constantly
records all the access frequencies for all relations on the database server and the local DBMS. The
relations of the local DBMS have different names from those of the base relations. However, we keep
one counter for both. In any case, it is reasonable to assume that less frequently used data on a DBMS is
less likely to be used again, because the access frequency history is a lifetime record. Therefore,
applying the frequency locality property to the cache replacement policy is suitable.
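The frequency-locality replacement rule above amounts to evicting the cache relation with the lowest lifetime access count. A minimal sketch, with invented relation names and counts:

```python
# Hypothetical LFU (frequency-locality) replacement sketch: the counter is a
# lifetime access-frequency record kept for base and cache relations together.
def choose_victim(cached, freq):
    """Return the cached relation with the lowest lifetime access count."""
    return min(cached, key=lambda r: freq[r])

freq = {"ORDERS": 120, "PARTS": 7, "CUSTOMERS": 45}
cached = ["ORDERS", "PARTS", "CUSTOMERS"]
victim = choose_victim(cached, freq)   # least frequently used relation is replaced
```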
4.5. Cache Update
There are two aspects in terms of cache update. The first aspect is how to update the cache on an MU
when data on the database server is updated. The second aspect is how the MU user writes on the cache,
and how to deal with the data consistency problem.
For the first aspect, we adapt the idea proposed in [1], in which an invalidation report (IR) has been
used to inform MUs which cache data are invalid. We use one of the three proposed models, the time
stamp model. In addition, we add modified data items to the report and change the file server type to
stateful. We call this report a Validation Report (VR). The motivation for using a VR is that our cache
relations are kept in the local DBMS for a long time, if not permanently. The base stations in our model
keep track of the cached public relations within their wireless cell. Only non-quota public
data that have been modified within a certain time (say L seconds) and have been requested by an MU during registration will
be included in the VR. When an MU enters the cell of a base station, the MU has to register with the base
station. The base station then asks the base station with which the MU was previously registered to hand
over all the information about the MU, such as its VR. This step is required in order for hand-off to be
completed. A data item in a VR includes a Time Stamp, Relation Name, Primary Key, and Updated
Attributes, as shown below:
(Time Stamp, Relation Name, Primary Key, Updated Attribute(1), Updated Attribute(2), … ,Updated
Attribute(n) )
Thus, a data item in a VR is the updated portion of a tuple. Each MU has one VR at the base station. A
VR is sent every L seconds to a specific MU. An MU waits and checks the incoming VR before
answering a normal query (see Figure 3). The broadcasting time (the time stamp) for a VR is also
included in the VR. When an MU has received its VR, it then sends an acknowledgment that contains the
broadcast time of the VR to the base station to confirm receipt. The base station keeps all the
modified data items in the VR from the last time it received an acknowledgment from the MU. Upon
receiving a new acknowledgment, the base station then compares each data item's timestamp in the VR
with the broadcast time in the acknowledgment. In addition, it discards the data items in the VR that are
older than the broadcast time in the acknowledgment.
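The base station's bookkeeping can be sketched as follows; the tuple layout mirrors the VR item format above, and all values are invented. Note that items timestamped at or after the acknowledged broadcast time are retained.

```python
# Hypothetical sketch of VR pruning at the base station: after an MU
# acknowledges a broadcast, discard VR items older than that broadcast time.
def prune_vr(vr_items, acked_broadcast_time):
    """Keep only VR items not older than the acknowledged broadcast time."""
    return [item for item in vr_items if item[0] >= acked_broadcast_time]

vr = [
    (100, "EMPLOYEE", 1, ("salary", 51000)),   # (timestamp, relation, key, update)
    (180, "EMPLOYEE", 2, ("dept", "CS")),
]
remaining = prune_vr(vr, acked_broadcast_time=150)
```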
For the second aspect, the MU user updates the cache itself, meaning the user writes on the cache.
We propose two new ideas here to make cache writes possible without violating the ACID properties. The
first new idea is categorizing relations into two types: the public relations and the private relations. When
a query accesses a private relation owned by the user, this query can be answered immediately, whereas
answering a query, which accesses a public relation, needs to wait for the next VR. This is owing to the
fact that a private relation is only modified by that one user. The data dictionary on an MU contains the
information about whether a relation is public or private. The second new idea is the quota mechanism.
information about whether a relation is public or private. The second new idea is the quota mechanism.
The MU user can read and write quota data. Both ideas are elaborated in the following two subsections.
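The query-routing rule for the first idea can be sketched as a simple lookup in the MU's data dictionary; the dictionary contents and relation names here are invented.

```python
# Hypothetical sketch: a query waits for the next VR only if it touches a
# public relation; queries over private relations are answered immediately.
data_dictionary = {"MY_NOTES": "private", "EMPLOYEE": "public"}

def must_wait_for_vr(relations):
    """Return True iff any relation touched by the query is public."""
    return any(data_dictionary[r] == "public" for r in relations)

wait_a = must_wait_for_vr(["MY_NOTES"])              # private only: answer now
wait_b = must_wait_for_vr(["MY_NOTES", "EMPLOYEE"])  # touches public: wait for VR
```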
4.6. Public Relations vs. Private Relations
Previous research does not differentiate among various types of relations. The most obvious differentiation
is to separate public relations from private relations. The difference between the two is that the public
relation is shared and can be modified by a group of people, whereas the private relation is solely owned
and used by one user. Many such private relations exist in the academic environment. For example, the
relations owned by students are private relations, which are solely for their personal use. Some other
private relations are written by one user but can be read by a group of people. The owner grants the read
rights. The public access to this type of private relation is read only. The owner is the one who can make
changes to these private relations. Therefore, this type of relation is also categorized as private relation.
We define the private relation in Definition 3, the public relation in Definition 4.
Definition 3 A Private Relation is a relation whose primary copy exists at the MU. Only the owner of the
private relation can modify this relation.
Definition 4 A Public Relation is a relation whose primary copy exists at a database server. A portion of it
may be cached at an MU. A group of authorized people can modify this relation.
Note that the database server knows about the ownership because the ownership information is part of
the DBMS' security mechanism, and usually stored in the data dictionary. Therefore, the database server
knows an MU user is the only one who may update the private relations on the MU, and nobody else is
able to modify the copy of the private relations on the database server.
The private relations should be downloaded at the very first prefetching. This is a one-time deal.
Some of them could be created on the MU. We assume that the primary copies of the private relations are
at the MU. Before an MU user begins using the MU, he or she may work on the database server for some
time (for instance, via a fixed host). Thus, there may already be private relations existing on the database
server. When the user switches to use the MU, the private data on the database server may need to be
downloaded to the MU's local DBMS (the cache). Because we assume that an MU is reliable, it is safe to
keep the primary copy of the data on the MU. In addition, from time to time the user may copy the data
back to the database server, just in case the MU user wants to share the private data with other users. The
sharing here should be READ only. The MU user may eventually switch back to using a fixed host. Thus,
a copy of the private data on the database server is necessary.
The differentiation of the private relations and the public relations has several advantages. First, the
private relations can be updated at a mobile unit without worrying about the data consistency problem.
Consequently, it avoids some uplink wireless traffic for handling WRITE operations at the
database server: the data consistency problems of normal WRITE operations must be taken
care of via communication over the wireless link, so a WRITE on a private relation avoids this
traffic. Second, owing to the nature of private relations, in that a user can always
write to a private relation on the MU, we could treat the WRITE operations on the private relations as
cache-hits. Thus, allowing the WRITE of the private relation on an MU increases the hit ratio. Third,
when a user on an MU submits a query to access a private relation, the query can be answered
immediately, whereas a query that accesses a public relation must wait for the next VR
to get the newest version of data. The rationale is that the private relations have only been modified by
the user on the MU. Therefore, the data version on the MU is always the newest one. Accessing the
private relations would always get the newest version of data. There is no reason to wait for the next VR.
Hence, it accelerates the response time. Thus differentiating relations into two types, namely the public
relations and the private relations, is worth the effort.
4.7. Quota Mechanism
If we wish to write on the public relations of the local DBMS, a complication arises because the
public relations are shared by a group of users. The complication is that the system must keep track of
all operations on the public relations to ensure the ACID property. To ensure data consistency, a locking
mechanism is one solution, a pessimistic approach. However, the disadvantage of a locking mechanism
is that a long-held lock prevents others from accessing the same data [16]. If a mobile system uses the
locking mechanism, such a long locking situation could happen quite often because MUs are frequently
disconnected for an extended length of time. Our solution to the problem is to use the quota mechanism
to download a quota of data items from the database server to the cache of an MU. The leftover quota
may be returned to the server, or alternatively more quota may be downloaded from the server. Using this
strategy, mobile clients have their own allowance of data to work on, avoiding a long wait. The
idea is quite simple, just like resource allocation. The database server allocates some data resources to the
cache on an MU, and these data resources become delegations of the database server on MUs.
There are two previous works addressing this concept [19][23]. In [19], the author proposed an
escrow transactional method that pre-allocates some fixed portion of an aggregate quantity data (see
Definition 5) to a transaction prior to its commit. When the transaction commits, there will be enough
value of this data item due to the pre-allocation. Only an aggregate quantity data can be updated this way
in this approach. The whole mechanism of this approach takes place in a centralized DBMS. Thus, it is
not quite the same concept as the one that we are addressing. The mechanism addressed in [23] is closer
to our approach. The idea in this paper is that they divide an aggregate quantity data in a server into
several fixed units. For instance, if there is a data value “20”, they could divide this value into four data
units with each data unit being a five. Each unit then is allocated to different clients. Each client can
have full authority over the data unit given to it. The server may choose to keep one unit of data for
itself. When a client does not have enough data to commit a transaction, it may request
more data unit(s) from either the server or another client that holds a unit of the same data item. These two
approaches only apply to aggregate quantity data. The data that can be handled are still very limited.
We build on these ideas so that aggregate quantity data can be dynamically allocated to different
MUs in data units of different sizes, called quotas (see Definition 7). Our approach also allows non-aggregate
data (see Definition 6) to be a quota. However, only aggregate quantity data can be divided into
several units and allocated as quotas. An MU must download whole non-aggregate data as a quota.
These approaches significantly enhance the ideas proposed by the two previous researches. Our approach
allows any kind of data to use the quota mechanism as long as the DBA of the DBMS defines data items
as “quota data” in the data dictionary. Our approach is also the first one that can have different sizes of
data unit. In addition, we are the first to propose use of the quota idea in a mobile DBMS environment.
Definition 5 Aggregate quantity data can be computed with mathematical operators (such as addition,
subtraction, etc.) in a database management system. The data type of aggregate quantity data is numerical
only.
Definition 6 Non-aggregate data cannot be computed with mathematical operators in a database
management system. The data type of non-aggregate data can be numerical, string, or character.
Note that numerical data is not necessarily aggregate quantity data; social security numbers, for
example, are numerical, but we do not compute them in a database management system.
For the preconditions of Definitions 7, 8, and 9, let D = {Ri} be a set of base relations and Dc = {CRi}
be a set of cache relations, where CRi is the cache relation for Ri. Also, let CAi be the attributes for CRi
and Ai be the attributes for Ri, with CAi ⊆ Ai. Let attribute ai ∈ CAi and tc ∈ CRi; then ∃ t ∈ Ri such that
tc(ai) = t(ai) for all ai. The notation t(ai) represents the value of attribute ai in tuple t [18]. To improve
performance by allowing updates at the MU, however, this may be relaxed: given an aggregate quantity
data item, we can cache part of the data value at the cache.
Definition 7 Given a data value qt, a quota q is defined as follows:
1. If qt is aggregate quantity data, then q = qt - qm, for some value qm.
2. If qt is non-aggregate data, then q = qt.
Now we explain how the quota concept is used in MU caching. Let tc ∈ CRi, and suppose aq ∈ CAi is an
attribute defined by the DBA to be a quota attribute. For case one in Definition 7, this means that
when the tuple is fetched to the MU, a quota of the attribute value (rather than the exact attribute value) is
fetched. Let t ∈ Ri be the corresponding tuple from Ri related to tc. For the quota attribute aq, tc(aq) =
t(aq) - qm. Therefore, tc(aq) is the quota q, t(aq) becomes the remainder value qm, and the total value qt is
not stored. Prior to the prefetch, t(aq) = qt; during the prefetch, t(aq) becomes qm and q is downloaded to the
cache. For case two in Definition 7, tc(aq) = t(aq), and t(aq) becomes "null" afterward. Examples of
case one in Definition 7 are a bank account balance and the total number of seats on a flight. A specific seat
on a flight belongs to case two of the definition.
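The two cases of Definition 7 can be sketched as a small helper (a hypothetical illustration; the function and variable names are ours, not the paper's):

```python
def fetch_quota(server_value, aggregate, qm=0):
    """Return (cache_value, new_server_value) after a quota fetch.

    Case 1 (aggregate quantity data): the quota q = qt - qm moves to the
    cache and the server retains the remainder qm.
    Case 2 (non-aggregate data): the whole value moves and the server
    holds null (None here) afterward.
    """
    if aggregate:
        q = server_value - qm
        return q, qm
    return server_value, None

# A balance of 20 with remainder 15 leaves a quota of 5 on the MU.
print(fetch_quota(20, aggregate=True, qm=15))    # (5, 15)
# A specific flight seat is non-aggregate: the whole value moves.
print(fetch_quota("seat 12A", aggregate=False))  # ('seat 12A', None)
```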
A quota attribute is an attribute of a relation. This attribute can have a quota value. The DBA
decides which attributes are quota attributes and keeps this information in the data dictionary. We define
quota attribute in Definition 8.
Definition 8 Suppose Aq is an attribute and Aq ∈ Ai. If the quota mechanism applies to Aq, then Aq is a
quota attribute.
When an MU agent caches aggregate quantity data, it could download only the quota amount, as
defined in case one of Definition 7. The quota threshold, recorded in the data dictionary by the DBA,
is the maximum value that can be given out as a quota; the quota threshold of a data value cannot be
greater than the data value itself. This is part of the quota constraints defined in the data dictionary. Quota
threshold is defined in Definition 9, where, for a real number V, ⌊V⌋ denotes the largest integer such that
⌊V⌋ ≤ V.
Definition 9 Let aq ∈ Ai be a quota attribute of an aggregate quantity data type and t ∈ Ri. Vqt is a quota
threshold for aq in t if there exists a real number Raq such that Vqt = ⌊Raq * t(aq)⌋ and 0 < Raq ≤ 1. This Raq
is a fixed value for aq and is stored in the data dictionary.
Note that non-aggregate data, such as a string, cannot have a partial value as a quota; this type of
data can only have a quota as in case two of Definition 7. A DBA, according to the intrinsic nature of the
attributes and the rules of the organization (such as security rules), defines in the data dictionary whether
an attribute can have a quota.
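Definition 9 can be computed directly; the sketch below (helper name ours) assumes Raq comes from the data dictionary:

```python
import math

def quota_threshold(value, r_aq):
    """Vqt = floor(Raq * t(aq)), with 0 < Raq <= 1 fixed per attribute."""
    if not (0 < r_aq <= 1):
        raise ValueError("Raq must satisfy 0 < Raq <= 1")
    return math.floor(r_aq * value)

# With Raq = 0.75, at most 15 of a stored value of 20 may be given out as quota.
print(quota_threshold(20, 0.75))  # 15
```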
Algorithm Post-Process
1.  Post-Process(relation) {  /* This program is run by the MU agent */
2.    Locate the record that contains the base relation in the user profile;
3.    For (each tuple that satisfies the criteria list in the record of the user profile) do {
4.      For (each quota attribute) do {
5.        if (quota attribute is aggregate quantity data) then
6.          The attribute value of the base relation = attribute_value - quota;  /* send a query to the database server to perform this statement */
7.        else
8.          The attribute value of the base relation = null;  /* send a query to the database server to perform this statement */
9.        The attribute value of the cache relation = quota;  /* send a query to the database server to perform this statement */
10.     }
11.   }
12. }
Quota downloading occurs in three situations. The first occurs during the prefetching stage: the quota
data is fetched along with other data in the same relation by a query. After running Algorithm
Extract-Cache-Relation (see Section 4.3) to fetch the data from the database server to the MU agent, both
the base relations and the cache relations must be post-processed by running Algorithm Post-Process to
accommodate the quota mechanism. The quota data cannot be used until this Post-Process program has
completed. The MU agent runs the program. Note that the information about quota attributes is obtained
from the data dictionary.
The second occurrence happens when a quota in the cache is not sufficient to answer a query. The
local DBMS returns an error message to the MU agent indicating that the data is insufficient to
answer the query. The MU agent then submits a query (a transaction) to get more quota value from the
database server. Upon receiving the query, the database server sends the quota to the MU. The server
waits for an acknowledgment from the requesting MU, to ensure that the quota has been received, and
then commits the quota transaction. If the server does not receive an acknowledgment within a certain
time frame, the transaction is aborted. Thus, from the server's point of view, downloading a quota of a
data item is a transaction requested by an MU; to be exact, it is the MU agent that submits this request
(see our model architecture in Section 4.9). Note that all these downloading operations are also recorded
in the log of the database server, so in the case of a system failure, the uncommitted transactions can be
redone according to the log. Recovery is another large research area that we will not explore further,
leaving it to future work. Quota is obtained through two types of update SQL: the first type updates the
base relation, and the other updates the cache relation.
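The commit-on-acknowledgment protocol described above can be sketched as a simplified, single-item simulation (function names and the acknowledgment callback are ours; logging and recovery are omitted):

```python
def grant_quota(base_value, amount, wait_for_ack, timeout=5.0):
    """Tentatively deduct a quota; commit on acknowledgment, abort on timeout.

    wait_for_ack(timeout) stands in for the server blocking until the MU's
    acknowledgment arrives or the timeout expires.
    """
    remainder = base_value - amount          # quota sent to the MU
    if wait_for_ack(timeout):                # ack received in time: commit
        return "committed", remainder
    return "aborted", base_value             # no ack: restore the base value

# An MU that acknowledges in time vs. one whose ack never arrives:
print(grant_quota(20, 5, wait_for_ack=lambda t: True))   # ('committed', 15)
print(grant_quota(20, 5, wait_for_ack=lambda t: False))  # ('aborted', 20)
```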
Algorithm Upload-Cache
1. Upload-Cache(base-relation, cache-relation, Quota-attributes) {
2.   update cache-relation to clean up the quota values;  /* send a query to the local DBMS to perform this statement */
3.   update base-relation to add the quota values;  /* send a query to the database server to perform this statement */
4. }
The third occurrence of quota downloading is the case of a cache-miss; that is, the data is either not in
an existing cache relation or the cache relation does not exist. The MU agent generates a query and
sends it to the server to get a quota for the missing data. New entry (or entries) in a temporary user profile
based on the query are created on the MU. The cache relation is created if it does not exist; a new
attribute is added if one is needed; and the missing data is fetched. The SQL to fetch the new data is
similar to that of the first occurrence, except that a new SQL statement is needed to add the new
attributes. This new SQL is a table-alteration command that adds one column to the cache relation for
each new attribute, and it is run prior to the fetch SQL. After all the SQL statements have run, the
temporary user profile is appended to the user profile. Alternatively, the query that requests the missing
data item could be sent to the database server; after the server obtains the query result, the result is sent
back to the MU. Please refer to the different scenarios of cache-miss handling stated in Section 5.
From the view of an MU, downloaded quotas become the MU's own data in the cache. Later, any
quota left over can be uploaded back to the server; Algorithm Upload-Cache, run by the MU agent,
demonstrates the upload procedure. Alternatively, more quota can be downloaded from the server. Two
SQL statements are used to upload the quota from a cache relation to a base relation.
The purpose of the quota mechanism is to increase cache-hits. Without the quota mechanism, an MU
may need to wait for its turn to work on a data item on the database server (check out the data item), that
is, wait until another user unlocks the data item (checks it in). There are two scenarios here: first, the data
is not already in the cache; second, the data is in the cache but cannot be used by the MU. We see both
cases as cache-misses. With the quota mechanism, we allocate part of the data resources to MU caches so
that the MUs' users do not have to wait for the data on the database server. This turns cache-misses for
these types of data into cache-hits; thus our quota mechanism increases the cache-hit ratio over the
non-quota approach.
4.8. Query Categorization
To further improve query response time, we categorize queries in the mobile environment into two
types. The first is a normal query, just like those performed on the fixed network. The MU must make
sure the cached data is validated before using it to answer a normal query; normal queries require data to
be consistent with that at the database server. Note that a normal query that accesses private data in a
private relation is answered immediately. The other query type, named a NOW query, is also answered
immediately, without waiting for up-to-date data. A user can thus use a NOW query to obtain timely data
when timeliness matters more to the user than having the correct version; stock price information is a
good example. Some stable data, such as historical data, can also use this kind of query. The NOW query
avoids the latency of data updates and thus improves query performance. Notice that this latency is
caused by the synchronous approach, in which an MU must wait for a VR to answer queries and to
ensure data validity. A user specifies a NOW query in her/his query request with the special predicate
"qtime = 'NOW'". The MU agent pre-processes each query request and determines which queries can
proceed without waiting for the next VR. Before the MU agent passes a NOW query to the local DBMS,
it removes this "qtime = 'NOW'" predicate.
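The pre-processing step can be sketched as below (a minimal illustration; the exact SQL dialect and agent interface are not specified in the paper, so the string handling here is an assumption):

```python
def preprocess(query):
    """Return (is_now_query, query with the qtime = 'NOW' predicate removed)."""
    marker = "qtime = 'NOW'"
    if marker not in query:
        return False, query
    stripped = query.replace(" AND " + marker, "").replace(marker, "").strip()
    if stripped.endswith("WHERE"):           # the marker was the only predicate
        stripped = stripped[: -len("WHERE")].strip()
    return True, stripped

print(preprocess("SELECT price FROM stocks WHERE sym = 'IBM' AND qtime = 'NOW'"))
# (True, "SELECT price FROM stocks WHERE sym = 'IBM'")
```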
4.9. Model Architecture
The relationship among a fixed network, a base station, and an MU is shown in Figure 4. A base
station is a coordinator between the fixed network and an MU. A base station keeps track of the relations
being cached on the MUs currently within its wireless cell. To reduce this overhead, we propose the use
of private relations: instead of keeping track of all relations cached on MUs, the base station only needs
to track the public relations, while all private relations are kept on the MU. If all cache relations are
private, there is no overhead at the base station. We further categorize public relations into quota public
relations and non-quota public relations. The base station also broadcasts validation reports (VRs) of the
modified non-quota public data to MUs. To be exact, only non-quota public data that has been requested
by an MU during registration3 and has been modified on the database server is included in a VR. A
program on the base station is in charge of gathering updated public data into VRs and sending the VRs
to the MUs in its cell.
An MU is the center of the entire architecture in our model because the major target of this research is
caching management on an MU. An MU agent (see Figure 4) is in charge of, among other things, data
caching, cache update, query pre-processing, and cache-miss handling. This MU agent acts as the
interface between the MU and the base station. An MU agent keeps track of relation access frequencies
for the MU user and saves this record in a linked list combined with a hash table, which we call the
history list. The MU agent fetches a portion of the whole database located at the server to the MU.
3. Each mobile unit must register with the base station when it enters the base station's cell.
This fetching operation is based on a priority list built from the user hints in a user profile, the
currently used relations, and the frequencies of use in the history list. The user profile is created by the
MU user and is stored on the MU. We use SQL as the interface to fetch relations from the database server
to the local DBMS. Information about the fetched relations, such as data types and data domains, is put in
the data dictionary of the local DBMS. A fetched relation may be just a portion of the base relation on the
database server. All these prefetching operations are performed by the MU agent. Moreover, the MU
agent is in charge of cache update: when it receives a validation report, it updates the cache accordingly.
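A minimal sketch of the history list (we approximate the paper's linked-list/hash-table hybrid with Python's hash-based dict; the class and method names are ours):

```python
class HistoryList:
    """Tracks per-relation access counts and ranks relations for prefetching."""
    def __init__(self):
        self.freq = {}                       # relation name -> access count
    def record_access(self, relation):
        self.freq[relation] = self.freq.get(relation, 0) + 1
    def most_frequent(self, k):
        return sorted(self.freq, key=self.freq.get, reverse=True)[:k]

h = HistoryList()
for rel in ["flights", "accounts", "flights", "flights", "accounts", "seats"]:
    h.record_access(rel)
print(h.most_frequent(2))  # ['flights', 'accounts']
```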
An MU agent is also an interface between the user and the local DBMS. When the user submits a
query, the MU agent submits the query to the local DBMS immediately in three situations: (1) the query
is a NOW query, (2) the query accesses private relations, or (3) the query accesses quota public data. In
turn, the local DBMS proceeds with the query. If a query does not fit one of these three situations, the
MU agent must wait for the next validation report before passing the query to the local DBMS, to ensure
that the newest cache data is used to answer it. In the case of a cache-miss, the local DBMS sends an
error message to the MU agent; the MU agent generates a query based on what data is missing and sends
it to the database server to handle the cache-miss. The user profile plays a role in prefetching. In addition,
the MU agent uses it as a road map to locate base relations on the database server and cache relations on
the MU during cache update and cache-miss handling, and to check whether a base relation is cached on
the MU.
5. CASES OF CACHING MANAGEMENT
In this section, we discuss different scenarios of cache use and cache-miss handling. We then derive
mathematical formulas that reflect the behavior of the different approaches with respect to hit ratio and
the number of queries that the wireless link can support.
5.1. Cache Use
There are several ways to use the data items in an MU's cache:
1. Perform only READ operations on the cache data; WRITE operations are uplinked to the fixed
network. This case is called UR.
2. Perform READs and private WRITEs on the local DBMS. Send each query performing a public
WRITE to the database server, which handles the query and sends the result back to the MU. This
case is called UR/PrivW.
3. Perform READs and public WRITEs on quota data on the local DBMS. Send any sub-query
performing a public WRITE on non-quota data to the database server, which handles it and sends the
result back to the MU. This case is called UR/Quota.
4. Perform READs, private WRITEs, and public WRITEs on quota data on the local DBMS. Send any
sub-query performing a public WRITE on non-quota data to the database server, which handles it and
sends the result back to the MU. This case is called UR/PrivW/Quota.
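As an illustration, the routing decision in case 4 (UR/PrivW/Quota) might look like the sketch below (the predicate flags are assumptions about information the MU agent can derive from the data dictionary and user profile):

```python
def route(op, is_private, is_quota):
    """Decide where an operation runs under the UR/PrivW/Quota strategy."""
    if op == "READ":
        return "local"        # all READs are served from the cache
    if is_private or is_quota:
        return "local"        # private WRITEs and quota public WRITEs stay local
    return "server"           # non-quota public WRITEs are uplinked

print(route("WRITE", is_private=False, is_quota=True))   # local
print(route("WRITE", is_private=False, is_quota=False))  # server
```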
There are other possibilities with clearly undesirable performance, such as uplinking to the server for
all READ and WRITE operations; we do not include them in this research.
Because we have an independent local DBMS on the MU, implementation is much easier. The
DBMS takes care of indexing as long as we include primary keys in the cache relations, which we have
done. In addition, cache relations and base relations have different names, which further simplifies
matters. A user knows the names of the cache relations on her MU because she defined them in the user
profile, so when she submits an SQL request, she knows which relation(s) to use. Thus, the user has the
flexibility to run SQL against either cache relations or base relations.
5.2. Cache-miss Handling
When data cannot be found in the local DBMS, a cache-miss occurs; that is, SQL processing returns
an error message indicating the specific data that cannot be found in the local DBMS. The MU agent
then needs to handle the cache-miss. Here too there are several possibilities:
1. Uplink to the fixed network for the missing data items; that is, send the query that attempts to use the
missing data to the fixed network. The database server handles the query and sends the result back to
the MU. This case is called MResult.
2. Send a query containing the relation names and the missing data items of the cache-miss to the server
on the fixed network. The server then sends back the missing data items. Note that the data items,
similar to a VR, are in a compressed format that contains a relation name and the tuples that contain
the missing data items. This case is called MData.
3. Send an error message to the user and abort the query. This case is called MError.
There are two scenarios with respect to a cache-miss. In the first scenario, the requested relation is
not in the cache. In this case, the relation can be dealt with as in case MData; that is, the MU agent
requests the database server to send the missing data, and the handling is very similar to fetching a new
relation into the local DBMS. First, the MU agent adds the new relation's information to the user profile.
Based on this new information, the MU agent then creates a new cache relation on the MU. The next step
is to extract data from the base relation and insert it into the new cache relation. Once this has been done,
the same query can be run against the new cache relation. With case MResult, the agent simply sends the
query to the database server to get a result. In case MError, the user is informed with an error message.
The second scenario occurs when the requested attributes are missing but the relation that contains
them exists. Here, only the tuples that contain the missing data are fetched, in the format stated in the
cache update section. While the system fetches the missing data, the MU agent submits queries to add
new columns for any needed new attributes in the cache relation. When the MU has obtained the data, it
puts the fetched tuples into the corresponding relation(s). Again, this second scenario is dealt with as in
case MData; the cases MResult and MError are handled in a manner similar to MResult and MError in
the first cache-miss scenario.
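The three cache-miss policies can be summarized as a dispatch sketch (the handler bodies are placeholders standing in for the behaviors described above):

```python
def handle_cache_miss(policy, query, missing_items):
    """Dispatch a cache-miss to one of the MResult/MData/MError policies."""
    if policy == "MResult":
        return f"uplink query to server: {query}"
    if policy == "MData":
        return f"request missing items from server: {missing_items}"
    if policy == "MError":
        return "abort query and report error to user"
    raise ValueError(f"unknown policy: {policy}")

print(handle_cache_miss("MData", "SELECT seats FROM flights", ["flights.seats"]))
```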
5.3. Mathematical Equations
There are four cases of cache use and three cases of cache-miss handling. Combining these, we have
twelve combined cases: UR-MResult, UR-MData, UR-MError, UR/PrivW-MResult, UR/PrivW-MData,
UR/PrivW-MError, UR/Quota-MResult, UR/Quota-MData, UR/Quota-MError, UR/PrivW/Quota-MResult,
UR/PrivW/Quota-MData, and UR/PrivW/Quota-MError.
In addition, another case, similar to UR-MResult but with an invalidation report (instead of a
validation report), has been addressed in six different papers [1] [2] [13] [3] [25] [6]; we call this case
UR'-MResult'. Chan, Si and Leong's approach [5] allows WRITEs on the cache, but only on cache-hit
data: a WRITE query must lock the data on the server prior to the write and update the server after
writing to the cache. In other words, their WRITE operation can in fact be deemed equivalent to the
cache-miss case MData, since the wireless-link traffic caused by sending the updated data from the MU
to the database server is the same as that caused by sending missing data from the server to the MU.
Hence, this approach can be categorized as case UR-MData. We, however, use private WRITE and quota
mechanisms in our model; therefore, the cases for our approach are UR/PrivW/Quota-MResult,
UR/PrivW/Quota-MData, and UR/PrivW/Quota-MError. Most of the previous approaches handle only
READ operations on an MU. Our performance evaluation shows the advantages of our approach.
In this section, we develop a mathematical equation for TQ for each case. TQ is the potential
maximum number of queries that the wireless link can handle in an interval L; this is the throughput we
use to evaluate the performance of the different cases. In addition, we assume h is the original hit ratio
for data access on the cache: if a requested data item is in the cache, this is traditionally deemed a
cache-hit. The hit ratio will be adjusted as we develop the mathematical equations. Some of the notation
is adapted from [1], and the parameters used in the equations are defined in Table 1.
We assume that the server begins to broadcast the validation report periodically at times Ti, where Ti =
iL. The server keeps a list Uj defined as follows:

Uj = {[j, tj] | j ∈ D and tj is the timestamp of the last update of j such that Ti-1 ≤ tj ≤ Ti}
We further assume that each query performs only a single READ or a single WRITE. In reality, a
query could contain several READs, WRITEs, or both; in our evaluation model, we break such a query
down into a set of queries that each contain one READ or WRITE. When we actually implement a query,
we do not break it down: if a query accesses some data that are cache-hits and some that are
cache-misses, only the cache-miss data is handled via the wireless link, by creating a new query to fetch
it. Hence, breaking down a query is done conceptually, for evaluation purposes only.
Under normal circumstances, if data can be found in the cache, it is deemed a cache-hit (counted in h)
regardless of whether the operation is a READ or a WRITE. If WRITEs on the cache are not allowed,
then we must adjust the original hit ratio h to a new one, h':

h' = rR * h

(see Table 1 for rR). On the other hand, if we allow private WRITEs on the local DBMS without asking
the database server's permission for data coherence, then a private WRITE can be deemed a cache-hit,
and the hit ratio is adjusted as follows:

h' = ((rR * h * QL) + QprivW) / QL

Now, if only WRITEs on quota data are allowed, then:

h' = ((rR * h * QL) + QqpubW) / QL

Finally, if WRITEs on both quota data and private data are allowed, then:

h' = ((rR * h * QL) + (QqpubW + QprivW)) / QL
As you can see, allowing different types of WRITE could increase cache-hit ratio to different extents.
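The three adjustments can be collected into one hedged helper (parameter names mirror Table 1, which we cannot reproduce here: rR is the READ fraction, QL the number of queries per interval, and QprivW / QqpubW the private and quota public WRITE counts):

```python
def adjusted_hit_ratio(r_read, h, q_total, q_priv_write=0, q_qpub_write=0):
    """h' = ((rR * h * QL) + QprivW + QqpubW) / QL."""
    return ((r_read * h * q_total) + q_priv_write + q_qpub_write) / q_total

# No WRITEs allowed on the cache: h' = rR * h.
print(adjusted_hit_ratio(0.8, 0.8, 100))                                    # ~0.64
# Counting private and quota public WRITEs as hits raises h'.
print(adjusted_hit_ratio(0.8, 0.8, 100, q_priv_write=10, q_qpub_write=10))  # ~0.84
```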
A general equation for the maximum capacity of the wireless link in all cases is derived as follows.
Assume that a VR contains all the modified data cached on an MU since the last time a VR was sent to
the MU, and that an MU listens constantly while connected (awake). In cache validation, the server
sends a VR to an MU every L seconds. The total number of bits available during the interval L is (L * B),
and the number of bits of all VRs is (n * bVR). Thus, the total number of bits available for query
answering is (L * B - n * bVR). A fraction TQ * (1 - h') of the TQ queries corresponds to the queries that
are cache-misses, and each cache-miss query takes (bq + ba) bits to answer. The traffic in bits on the
wireless link due to the cache-miss queries is therefore TQ * (1 - h') * (bq + ba).4 Setting these equal:

(L * B) - (n * bVR) = TQ * (1 - h') * (bq + ba)

Thus,

TQ = ((L * B) - (n * bVR)) / ((1 - h') * (bq + ba))
The equations for the different cases have different h' and ba.
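The throughput equation is straightforward to evaluate; the values below are purely illustrative, not the paper's Table 2 parameters:

```python
def throughput(L, B, n, b_vr, h_prime, b_q, b_a):
    """TQ = (L*B - n*b_VR) / ((1 - h') * (b_q + b_a))."""
    return (L * B - n * b_vr) / ((1 - h_prime) * (b_q + b_a))

# 10 s interval, 10 kbit/s link, 100 MUs, 64-bit VRs, h' = 0.8,
# 512-bit queries and 1024-bit answers (assumed values):
print(throughput(10, 10_000, 100, 64, 0.8, 512, 1024))  # ~304.7 queries
```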
4. We ignore acknowledgments because all cases incur the same amount of acknowledgment traffic; there is no point
in adding this factor to the comparisons.
6. SIMULATION RESULTS AND PERFORMANCE ANALYSIS
In this section, we analyze the performance of the cache management algorithms. Mathematical
models for all cases of caching management were created in the previous section. Crystal Ball is the
simulation software used. The simulation results are analyzed and compared with each other. The
comparisons among the caching strategies are based on wireless-link usage with respect to throughput
over a certain time period; in other words, we compare how many queries each approach can answer
within a certain time frame over a limited wireless-link bandwidth. We evaluate the impact on throughput
and on hit ratio of different percentages of WRITE, private WRITE, and quota public WRITE.
Simulation results are listed and discussed by experiment type. Note that the MError5 cases are not
included in this section because they generate no cache-miss traffic on the wireless link and are thus not
applicable. Performance evaluations and comparisons of the different cases are then carried out and
analyzed.
6.1. Experiment One
The first experiment is based on the most likely hit ratio (80%), WRITE probability (20%), private
WRITE probability (50%), and quota public WRITE probability (50%) to simulate all nine cases. The
purpose of this experiment is thus to establish the default cases and compare their performance. The
parameters for the first simulation are summarized in Table 2. Some of the figures in Table 2 are from
the previous papers [1] [2] [3]; we assigned the rest of the parameters realistic values. Most of the
parameters are generated from a triangular distribution, which we chose because we think it more
accurately reflects the distribution of the parameter values used in Table 2. The parameters estimating
sizes (the bx variables) will certainly not follow a uniform, normal, or exponential distribution, but we
can expect certain values to clearly stand out as the most likely; the other parameters similarly use a
triangular distribution. In actuality, since we run 1000 trials for each experiment and report the average,
we expect the choice of distribution to have little impact on the overall results. For the number of MUs
(n) in the cell of the base station within the interval L, we use a Poisson distribution. We assume MUs
arrive randomly, so the MU arrival rate is based on an exponential distribution, which generates random
arrival times; consequently, the number of MUs in the cell of a base station during a time interval follows
a Poisson distribution. We assume that the likeliest number of MUs in a cell during the interval L is 100,
and one thousand trials are drawn from a Poisson distribution with mean 100. Each case has been run
with one thousand trials in Crystal Ball. The simulation results provide the frequency distribution of one
thousand TQs.6 We are interested in the mean of the distribution, which represents TQ; the rest of the
statistics describe
5. Send an error message to the user when a cache-miss occurs.
6. TQ is the potential total number of queries that all MUs in a cell can submit and obtain results for successfully.
properties of the mean value of TQ. We use this mean value of TQ as the simulation result and compare
it with the mean values of TQ of the other cases.
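The arrival model can be checked with a short simulation: drawing exponential interarrival times and counting arrivals per interval yields Poisson-distributed counts (the rate and interval below are illustrative, chosen to give the mean of 100 MUs used in the experiment):

```python
import random

def count_arrivals(rate, interval, rng):
    """Count arrivals in [0, interval) with exponential interarrival times."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > interval:
            return count
        count += 1

rng = random.Random(42)
counts = [count_arrivals(rate=100, interval=1.0, rng=rng) for _ in range(1000)]
print(sum(counts) / len(counts))  # close to the Poisson mean of 100
```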
Normalized TQs are shown in Table 3; that is, the TQ of each case is divided by the TQ of case
UR'-MResult', which represents most of the previous research [1] [10] [3] [13]. Thus, case UR'-MResult'
is 1 in Table 3, and the other cases are ratios relative to it. The table clearly shows that our approaches,
cases UR/PrivW/Quota-MResult and UR/PrivW/Quota-MData, are much better than all the other cases:
in this simulation they are about 1.7 times better in terms of throughput than case UR'-MResult'. The
only cases that perform worse than case UR'-MResult' are cases UR-MResult and UR-MData. This is no
surprise, because case UR'-MResult' broadcasts IRs while cases UR-MResult and UR-MData broadcast
VRs, and the size of an IR is smaller than the size of a VR; thus case UR'-MResult' uses less of the
wireless link and achieves better throughput than cases UR-MResult and UR-MData, which have a cache
management mechanism similar to that of case UR'-MResult'. Case UR-MResult represents the previous
approaches in [6] [24], and case UR-MData represents the other previous approach in [5]. The simulation
results demonstrate that our approach is much better than all the previous research in terms of
throughput. The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData,
UR/W-MResult, and UR/W-MData also perform better than the previous approaches.
6.2. Experiment Two
In this experiment, we increase the WRITE probability from the default setting: we change the mean
value of the WRITE probability to 80%, the minimum to 70%, and the maximum to 90%. The rest of the
assumptions are the same as in the first simulation. The simulation results are shown in Table 3. As the
table shows, our approach performs even better in this simulation than in the previous one: in terms of
throughput, it is about 3.5 times better than case UR'-MResult', instead of just 1.7 times better as in the
previous simulation. The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult,
UR/Quota-MData, UR/W-MResult, and UR/W-MData also improve over case UR'-MResult' compared
with the previous simulation.
6.3. Experiment Three
In this experiment, we increase both the WRITE and the quota public WRITE probabilities from the
default settings to see the impact of the changes. We increase the probability of quota public WRITE to
80% (minimum 70%, maximum 90%); the percentage of WRITE is the same as in the second simulation,
80%, and the rest of the parameters are the same as in the first simulation. The simulation results are even
better than the second simulation's: our approaches are 5.6 times better than case UR'-MResult' (see
Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData,
UR/W-MResult, and UR/W-MData also perform better than in the second simulation, and even better
when compared with the first.
6.4. Experiment Four
We increase the probabilities of WRITE, private WRITE, and quota public WRITE in this
experiment to see the impact. In this simulation, we increase the ratio of private WRITE to 80%
(minimum 70%, maximum 90%); the rest of the parameters are the same as in the third simulation. The
results for our approaches are even better than the third simulation's: our approaches are 8.5 times better
than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData,
UR/Quota-MResult, UR/Quota-MData, UR/W-MResult, and UR/W-MData also perform better than in
the third simulation; cases UR/PrivW-MResult and UR/PrivW-MData are now 3 times better than case
UR'-MResult'. They perform even better when compared with the first two simulations.
6.5. Experiment Five
We decrease the hit ratio from the default value to a very low number in this experiment to see the impact. In this simulation, we decrease the hit ratio to 10% (minimum 5%, maximum 30%). The rest of the parameters are the same as in the first simulation. The simulation results of our approaches are not as good as in the previous simulations; nonetheless, our approaches are still 1.28 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, and UR/Quota-MData also perform better than case UR'-MResult'.
6.6. Experiment Six
In this experiment, we decrease the hit ratio to a medium value from the default to see the impact. In this simulation, we set the hit ratio to 50% (minimum 30%, maximum 80%). The rest of the parameters are the same as in the fifth simulation. The simulation results of our approaches are 1.47 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, and UR/Quota-MData also perform better. The impacts of the different hit ratios follow the same pattern as in the previous simulations.
6.7. Experiment Seven
In this experiment, we set the hit ratio to a very low number and the WRITE probability to a very high number to see the impact. As seen in experiment five, a low hit ratio alone does not make a large difference for our approaches, so we examine a low hit ratio combined with a high WRITE probability. In this simulation, we set the WRITE probability to 90% (minimum 70%, maximum 99%). The rest of the parameters are the same as in the fifth simulation. The simulation results of our approaches are much better than the fifth simulation’s: our approaches are about 3 times better than case UR'-MResult' (see Table 3). The cases UR/PrivW-MResult, UR/PrivW-MData, UR/Quota-MResult, UR/Quota-MData, UR/W-MResult, and UR/W-MData also perform better than case UR'-MResult' in this simulation. Thus, as long as the WRITE probability is high, most of the cases can outperform case UR'-MResult' even when the original hit ratio is very low. The impacts of the different hit ratios follow the same pattern as in the previous simulations.
6.8. Impact of Private and Quota Public WRITE Combined
In this section, we report on several experiments run to see the combined impact of private WRITE and quota public WRITE. The original hit ratio, h, remains 80%. The simulation results are shown in Figures 5, 6, and 7. The breaking point at 20% WRITE is at 10%; the breaking points at both 50% WRITE and 90% WRITE are at about 5%. The impacts of our approaches follow a similar pattern. Not surprisingly, the three READ-only cases maintain the same low throughput over all percentages of WRITE, private WRITE, and quota public WRITE.
6.9. Summary of Simulations
Throughout the experiments, our approaches, cases UR/PrivW/Quota-MResult and UR/PrivW/Quota-MData, have the best performance of all the cases. The next best are UR/PrivW-MResult and UR/PrivW-MData, followed by UR/Quota-MResult and UR/Quota-MData. The worst three are UR'-MResult', UR-MResult, and UR-MData. In other words, none of the previous research approaches perform well. Case UR-MResult represents the previous approach in [6] [24] and case UR-MData represents the other previous approach in [5]; both perform worse than the other cases throughout. Case UR'-MResult', which represents most of the previous research [1] [3] [13], is a bit better than cases UR-MResult and UR-MData, but it is still outperformed by most of the approaches. Clearly, a READ-only approach is very expensive, and the reason is obvious: it treats only a READ as a cache hit, while WRITE operations, which require coordination with the database server, consume a lot of wireless link resources. Note that WRITE operations are allowed in [24] and [5]. However, the WRITE operations in these two approaches require coordination with the database server and are therefore treated as cache misses. This is why we classify these two approaches, [24] and [5], as READ only.
As regards the impacts on the adjusted hit ratios, all cases follow a similar pattern. The adjusted hit ratios, h', for all cases are listed in Table 4. All READ-only approaches have smaller adjusted hit ratios because WRITE operations are excluded as cache hits. Case UR/PrivW/Quota has very good adjusted hit ratios as long as one of three situations occurs: a high ratio of WRITE, a high ratio of private WRITE, or a high ratio of quota public WRITE. Note that in the real world the uplink has only a fraction of the capacity of the downlink; for evaluation purposes we assume that the downlink and uplink channels have the same bandwidth. Our approaches significantly improve the hit ratio over the other approaches, so their performance will be even better in a real-world situation7, because a higher hit ratio means lower uplink traffic.
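To make the pattern in Table 4 concrete, the adjusted hit ratios can be approximated directly from the ratios defined in Table 1. The following Python sketch is our reconstruction, not the exact simulation code (see [26] for that); it counts a WRITE as a cache hit only when it can be served locally, i.e., private WRITEs in the PrivW cases and quota public WRITEs in the Quota cases:

```python
# Hedged reconstruction of the adjusted hit ratio h'.  A READ hits with
# probability h; a WRITE is a "hit" only when it can be served locally.
def adjusted_hit_ratio(case, h, r_w, r_pubw, r_qpubw):
    read_hits = (1 - r_w) * h            # READ fraction that hits the cache
    priv = r_w * (1 - r_pubw)            # private WRITEs served locally
    quota = r_w * r_pubw * r_qpubw       # quota public WRITEs served locally
    extra = {"UR": 0.0,
             "UR/PrivW": priv,
             "UR/Quota": quota,
             "UR/PrivW/Quota": priv + quota}
    return read_hits + extra[case]

# Likeliest values from Table 2: h = 0.8, rW = 0.2, rpubW = rqpubW = 0.5
for case in ("UR", "UR/PrivW", "UR/Quota", "UR/PrivW/Quota"):
    print(case, round(adjusted_hit_ratio(case, 0.8, 0.2, 0.5, 0.5), 2))
# UR 0.64, UR/PrivW 0.74, UR/Quota 0.69, UR/PrivW/Quota 0.79
```

With these point estimates the sketch approximately reproduces the Experiment 1 column of Table 4 (64%, 73%, 69%, 79%); the small differences come from the triangular sampling of the actual simulation.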
We examined the impacts of different percentages of WRITE with different probabilities of private WRITE and quota public WRITE. At a WRITE probability of 20%, 10% is the breaking point for our approaches (see Figure 5): when both private WRITE and quota public WRITE exceed 10%, our approaches start to outperform the previous approaches. When the WRITE probability increases to 50% (or 90%), the breaking point drops to about 5% (see Figures 6 and 7). In addition, when the WRITE probability increases with a high percentage of private WRITE and of quota public WRITE (90% or more), our approaches dramatically outperform the previous approaches (cases UR'-MResult', UR-MResult, and UR-MData). Note that our approaches always perform better than the previous approaches UR-MResult and UR-MData. The reason why our approaches cannot beat case UR'-MResult' at low private WRITE and quota public WRITE is that case UR'-MResult' broadcasts IRs instead of VRs.
7 The uplink (backchannel) has only a fraction of the capacity of the downlink.
One last point we would like to discuss is the impact of different granularity sizes. In our cache update and cache-miss handling, the granularity is a set of tuples, where each tuple contains only a subset of the attributes. Semantic caching's granularity (a semantic region) is a set of tuples, and the granularity of page caching is a page. Thus our granularity is the smallest, a page is the largest, and a semantic region is in between. Consequently, our performance is the best, semantic caching is second, and page caching is the worst. In equation TQ, the variable bd = n * granularity, where n is the number of granules. A smaller granularity yields a smaller bd, and a smaller bd results in a larger TQ (throughput).
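The granularity argument can be illustrated with a simple cost model. The sketch below is only a hedged approximation of TQ, in which each query costs a bq-bit uplink request plus an answer of bd = n * granularity bits; the granule sizes used for the semantic-region and page cases are illustrative assumptions, not measured values:

```python
# Hypothetical link-capacity model: the wireless link carries B * L bits
# per interval; a query costs a bq-bit request plus an n-granule answer
# of n * g bits.
def max_queries(B, L, bq, n, g):
    bd = n * g                       # answer size, bd in the paper's notation
    return (B * L) / (bq + bd)       # upper bound on queries per interval

# Likeliest values from Table 2: B = 19200, L = 10, bq = 64; n = 4 granules
ours = max_queries(19200, 10, 64, 4, 128)       # attribute-subset tuples
semantic = max_queries(19200, 10, 64, 4, 512)   # assumed semantic-region size
page = max_queries(19200, 10, 64, 4, 4096)      # assumed page size
assert ours > semantic > page   # smaller granularity -> larger throughput
```

Whatever the exact constants, the ordering is monotone: shrinking the granule size shrinks bd and therefore raises the query capacity of the link.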
7. SUMMARY AND FUTURE WORK
In this research, we have designed and developed all the required algorithms8 for a mobile agent on an MU along with a program on a base station. The overall design aims at improving data caching/replication on a mobile unit, including, among other things, prefetching/hoarding, cache management, cache coherence, and cache replacement. The simulation results have shown that our approaches are far superior to the previous research. This is because we use a quota mechanism and categorize relations into private and public. These mechanisms enable a user to query private and quota data directly from the local DBMS on an MU without data coherence problems. In addition, our approaches significantly reduce usage of the valuable wireless link, which is the most limited resource in a mobile computing environment. The previous research [1] [3] [13] [6] assumes a READ-only approach, which has been shown to be inefficient when the probabilities of private WRITE and quota public WRITE are high. The approaches in [24] and [5] allow WRITE on the cache, but these WRITE operations must be kept in sync with the database server's, which consumes a large portion of the valuable wireless link.
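The quota idea can be illustrated with a short sketch. The class below is hypothetical (the paper's actual agent algorithms are in [26]); it shows the essential escrow-like behavior, in which an MU may update its local share of a quota relation without contacting the server until that share is exhausted:

```python
class QuotaCache:
    """Local share of a quota public relation held by one MU's agent."""
    def __init__(self, quota):
        self.quota = quota          # units (e.g. seats, stock) granted to this MU

    def local_write(self, amount):
        """Try to serve a WRITE from the local quota.
        Returns True on a cache hit; False means the quota is exhausted
        and the operation must go uplink to the database server."""
        if 0 < amount <= self.quota:
            self.quota -= amount
            return True
        return False

cache = QuotaCache(10)
print(cache.local_write(4))   # True  -> served locally, no wireless traffic
print(cache.local_write(7))   # False -> only 6 units left, uplink required
```

Because each MU writes only within its own quota, no coherence conflict with other MUs can arise, which is why such WRITEs can safely count as cache hits.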
There are several possible extensions of this paper for future research. First, we would like to translate all the algorithms into a high-level language, preferably Java. Java is highly portable and excellent for building a front-end interface, such as a web page; Java applets can talk to a RDBMS through JDBC, which in turn can interact with ODBC. Second, we would like to address the issue of how to generate VRs efficiently on the base station, including checking updated data against the database server. Lastly, efficient hand-off handling is another important issue that we would like to address in the future.
REFERENCES
[1] D. Barbara and T. Imielinski. Sleepers and Workaholics: Caching Strategies in Mobile Environments. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pages 1-12, May 1994.
[2] D. Barbara and T. Imielinski. Sleepers and Workaholics: Caching Strategies in Mobile Environments. MOBIDATA: An Interactive Journal of Mobile Computing, 1(1), Nov. 1994.

8 They are not all listed in this paper. If interested, please refer to [26].
[3] O. Bukhres and J. Jing. Performance Analysis of Adaptive Caching Algorithms in Mobile Environments. International Journal of Information Sciences (IJIS), North Holland, 1995.
[4] M. Carey, M. Franklin, and M. Zaharioudakis. Fine-Grained Sharing in Page Server Database Systems. In Proceedings of the ACM SIGMOD Conference, 1994.
[5] B. Y. Chan, A. Si, and H. V. Leong. Cache Management for Mobile Databases: Design and Evaluation. In Proceedings of the International Conference on Data Engineering, IEEE, pages 54-63, 1998.
[6] S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, and M. Tan. Semantic Data Caching and Replacement. In Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, pages 330-341, 1996.
[7] M. H. Dunham and A. Helal. Mobile Computing and Databases: Anything New? SIGMOD Record, 24(4), pages 5-9, Dec. 1995.
[8] M. J. Franklin, M. J. Carey, and M. Livny. Global Memory Management in Client-Server DBMS Architectures. In Proceedings of the International Conference on VLDB, pages 596-609, 1992.
[9] M. Franklin. Client Data Caching: A Foundation for High Performance Object Database Systems. Kluwer Academic Publishers, 1996.
[10] R. Gruber, F. Kaashoek, B. Liskov, and L. Shrira. Disconnected Operation in the Thor Object-Oriented Database System. In Proceedings of the Workshop on Mobile Computing Systems and Applications, pages 51-56, IEEE, Dec. 1994.
[11] J. S. Heidemann, T. W. Page, R. G. Guy, and G. J. Popek. Primarily Disconnected Operation: Experience with Ficus. In Proceedings of the Second Workshop on the Management of Replicated Data, Nov. 1992.
[12] P. Honeyman, L. Huston, J. Rees, et al. The LITTLE WORK Project. In Proceedings of the 3rd Workshop on Workstation Operating Systems, IEEE, April 1992.
[13] J. Jing, A. Elmagarmid, A. Helal, and R. Alonso. Bit-Sequences: A New Cache Invalidation Method in Mobile Environments. Purdue University, Department of Computer Sciences, Technical Report CSD-TR-95-076, Dec. 1995.
[14] J. Jing, A. Elmagarmid, A. S. Helal, and R. Alonso. Bit-Sequences: An Adaptive Cache Invalidation Method in Mobile Client/Server Environments. Mobile Networks and Applications Journal, 2(2), pages 115-127, 1997.
[15] G. H. Kuenning, G. J. Popek, and P. L. Reiher. An Analysis of Trace Data for Predictive File Caching in Mobile Computing. University of California, Los Angeles, Technical Report CSD-940016, Apr. 1994. Also appeared in Proceedings of the 1994 Summer USENIX Conference.
[16] W. Kim, N. Ballou, J. F. Garza, and D. Woelk. A Distributed Object-Oriented Database System Supporting Shared and Private Databases. ACM Transactions on Information Systems, 9(1), pages 31-51, Jan. 1991.
[17] H. Lei and D. Duchamp. Transparent File Prefetching. Columbia University, Computer Science Department, Mar. 1995.
[18] D. Maier. The Theory of Relational Databases. Computer Science Press, 1983.
[19] P. E. O'Neil. The Escrow Transactional Method. ACM Transactions on Database Systems, 11(4), Dec. 1986.
[20] E. O'Neil, P. O'Neil, and G. Weikum. The LRU-K Page Replacement Algorithm for Database Disk Buffering. In Proceedings of the ACM SIGMOD, pages 297-306, 1993.
[21] R. H. Patterson and G. A. Gibson. A Status Report on Research in Transparent Informed Prefetching. Carnegie Mellon University, School of Computer Science, Technical Report CMU-CS-93-113, Feb. 1993.
[22] M. Satyanarayanan, J. J. Kistler, L. B. Mummert, M. R. Ebling, P. Kumar, and Q. Lu. Experience with Disconnected Operation in a Mobile Computing Environment. Carnegie Mellon University, School of Computer Science, Technical Report CMU-CS-93-168, June 1993. Also published in Proceedings of the 1993 USENIX Symposium on Mobile and Location-Independent Computing, Cambridge, MA, Aug. 1993.
[23] N. Soparkar and A. Silberschatz. Data-Value Partitioning and Virtual Messages. In Proceedings of PODS, pages 357-367, 1990.
[24] G. Walborn and P. K. Chrysanthis. PRO-MOTION: Management of Mobile Transactions. In Proceedings of the 11th ACM Annual Symposium on Applied Computing, Special Track on Database Technology, pages 101-108, San Jose, CA, Mar. 1997.
[25] K. L. Wu, P. S. Yu, and M. S. Chen. Energy-Efficient Caching for Wireless Mobile Computing. In Proceedings of the 12th International Conference on Data Engineering, pages 336-343, Feb. 1996.
[26] J.-F. Yao. Caching Management of Mobile DBMS on a Mobile Unit. Ph.D. Dissertation, Southern Methodist University, August 1998.
Table 1. Parameters used in the mathematical equations

QL       Total number of queries submitted from an MU in the time interval L; QL = QR + QW.
QR       Number of READ queries in QL; QR = QL - QW.
QW       Number of WRITE queries in QL; QW = QpubW + QprivW and QW = rW * QL.
rW       The percentage of the queries in QL that perform a WRITE.
QpubW    Number of queries that perform a public WRITE; QpubW = QqpubW + QnqpubW and QpubW = rpubW * QW.
rpubW    The percentage of the queries in QW that perform a public WRITE.
QqpubW   Number of queries that write on quota public relations; QqpubW = rqpubW * QpubW.
rqpubW   The percentage of the queries in QpubW that perform a quota public WRITE.
QnqpubW  Number of queries that write on non-quota public relations.
QprivW   Number of queries that write on private relations; QprivW = (1 - rpubW) * QW.
L        VR broadcast interval.
B        The bandwidth of the wireless network.
N        Total number of MUs in the cell of a base station.
bq       Size of a query in bits.
ba       Size of an answer in bits; there are two types of ba: br and bd.
br       Size of a query result in bits.
bd       Size of a data item in bits, which is the cache update granularity.
TQ       The potential maximum number of queries that the wireless link can handle in the interval L.
bVR      Size of a VR in bits.
QC       Total number of queries that can be served completely from the cache during the time interval L; QC = rR * h * QL.
rR       The percentage of READ queries in QL; rR = 1 - rW.
h        The pre-defined hit ratio.
h'       The adjusted hit ratio, including WRITE queries, in the different cases.
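The count identities in Table 1 can be checked with a few lines of Python. This sketch simply restates the table's equations, evaluated here with the likeliest values from Table 2:

```python
# Restate the query-count identities from Table 1.
def query_breakdown(Q_L, r_W, r_pubW, r_qpubW, h):
    Q_W = r_W * Q_L                    # WRITE queries
    Q_R = Q_L - Q_W                    # READ queries
    Q_pubW = r_pubW * Q_W              # public WRITEs
    Q_privW = (1 - r_pubW) * Q_W       # private WRITEs
    Q_qpubW = r_qpubW * Q_pubW         # quota public WRITEs
    Q_nqpubW = Q_pubW - Q_qpubW        # non-quota public WRITEs
    Q_C = (1 - r_W) * h * Q_L          # queries served entirely from cache
    return Q_R, Q_W, Q_pubW, Q_privW, Q_qpubW, Q_nqpubW, Q_C

# Likeliest values from Table 2: QL = 100, rW = 0.2, rpubW = rqpubW = 0.5, h = 0.8
print([round(x, 1) for x in query_breakdown(100, 0.2, 0.5, 0.5, 0.8)])
# [80.0, 20.0, 10.0, 10.0, 5.0, 5.0, 64.0]
```

So of 100 queries, 80 are READs, 20 are WRITEs split evenly between private and public, the public half splits evenly between quota and non-quota, and 64 READs are expected to be pure cache hits.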
Table 2. Assumptions of first simulation

Parameter   Min    Likeliest   Max    Distribution
bd          64     128         256    Triangular
bq          32     64          96     Triangular
bIR         16     64          128    Triangular
br          64     128         256    Triangular
bVR         64     256         640    Triangular
h           0.7    0.8         0.9    Triangular
rW          0.1    0.2         0.5    Triangular
rqpubW      0.3    0.5         0.8    Triangular
rpubW       0.3    0.5         0.8    Triangular
N           10     100         120    Poisson
L           N/A    10          N/A    N/A
B           N/A    19200       N/A    N/A
QL          N/A    100         N/A    N/A
Table 3. Normalized throughput of all experiments

Cases                     Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6  Exp 7
UR'-MResult'              1      1      1      1      1      1      1
UR-MResult                0.86   0.86   0.86   0.86   0.86   0.86   0.86
UR-MData                  0.86   0.86   0.86   0.85   0.87   0.86   0.85
UR/PrivW-MResult          1.27   1.7    1.69   3.03   1.1    1.18   1.65
UR/PrivW-MData            1.27   1.7    1.68   2.98   1.11   1.18   1.63
UR/Quota-MResult          1.18   1.36   1.62   1.23   1.09   1.13   1.34
UR/Quota-MData            1.18   1.36   1.61   1.21   1.09   1.13   1.33
UR/PrivW/Quota-MResult    1.7    3.5    5.65   8.51   1.28   1.47   3.17
UR/PrivW/Quota-MData      1.7    3.49   5.62   8.36   1.29   1.46   3.13
Table 4. Adjusted hit ratios, h', for all cases

Cases            Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6  Exp 7
UR'              64%    16%    19%    19%    11%    37%    1.5%
UR               64%    16%    19%    19%    11%    36%    1.5%
UR/PrivW         73%    54%    54%    75%    24%    49%    44%
UR/Quota         69%    39%    50%    35%    19%    44%    27%
UR/PrivW/Quota   79%    76%    86%    91%    31%    57%    69%
Figure 1. Architecture for Mobile Systems (Adapted from Figure 1 in [7])
( cache_flag, base_relation_name[1],
cache_relation_name[1],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
( cache_flag, base_relation_name[2],
cache_relation_name[2],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
………………..
( cache_flag, base_relation_name[n],
cache_relation_name[n],
[Pattribute1, Pattribute2,…, Pattributem],
[attribute1, attribute2,…, attributen],
[criteria1, criteria2,…, criterial] )
Figure 2. User Profile
Figure 3. Broadcasting Validation Report (adapted from [1])
Figure 4. The Relationship among a Fixed Network, a Base Station, and an MU
[Chart: Wireless Link Usage (WRITE: 20%). Y-axis: number of queries (2000-6000); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 5. Impact of Private and Quota Public WRITE (write: 20%)
[Chart: Wireless Link Usage (WRITE: 50%). Y-axis: number of queries (1200-8200); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 6. Impact of Private and Quota Public WRITE (write: 50%)
[Chart: Wireless Link Usage (WRITE: 90%). Y-axis: number of queries (800-9800); X-axis: percentage of private and quota WRITE in all WRITE (0%-100%); one curve per case.]
Figure 7. Impact of Private and Quota Public WRITE (write: 90%)
Yao & Dunham