Download 6.Peter_SW

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Encyclopedia of World Problems and Human Potential wikipedia , lookup

Database model wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Clusterpoint wikipedia , lookup

Versant Object Database wikipedia , lookup

Transcript
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
Dr. Tomasz Imielinski
About Me
Peter Rosegger
 5th year Computer Science
 Specialization: Databases
 Graduation: December 2007
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
 Professor at George Mason University
 Several patents associated with mobile caching
Dr. Tomasz Imielinski
 Professor at Rutgers University
 Senior VP: Search Technology at Ask.com
1994
16 million cellular subscribers in US
1994
The Future of Mobile Computing
Use Habits:
 Large # of users
 Check weather, stocks, scores, etc.
 Mobile between cells (& wireless networks)
Hardware:
 Low-powered palmtop machines
 Poor battery life
 Narrow bandwidth
The Future of Mobile Computing
Query complex databases, but…
 Frequently powered off to save battery
 Frequently changing cells
 Network traffic must be minimized to
conserve bandwidth
Why Caching is Important
Conserve:
1. COMPUTATIONAL RESOURCES
2. BATTERY LIFE
3. BANDWIDTH
Traditional Strategies Fail
Server lacks knowledge of:
 Which units are in its cell
 Which units are powered ON
Client caches cannot be tracked
The Solution
Purpose of Sleepers & Workaholics:
"…to propose a taxonomy of different
cache invalidation strategies and study
the impact of clients' disconnection
times on their performance."
Strategies
 Timestamps (TS)
 Amnesic Terminals (AT)
 Signatures (SIG)
Control Strategy:
 No Cache (NC)
Timestamps
-Cache entries have timestamps
-Synchronous, history based, uncompressed reports
SERVER:
Notify clients of identifiers of items changed within last w
seconds
CLIENT:
For each item in cache:
 If in report, purge from cache
 If NOT in report, update timestamp to current time
Amnesic Terminals
-Cache entries have identifiers
-Synchronous, history based, uncompressed reports
SERVER:
Notify clients of identifiers of items changed within last w
seconds
CLIENT:
For each item in cache:
 If in report, purge from cache
 If NOT in report, do nothing
Signatures
-Checksums calculated over value of data to form Signature
-Signatures combined using XOR
-Synchronous, state based, compressed reports
SERVER:
Server broadcasts the set of combined signatures
CLIENT:
Item in cache is declared invalid if it belongs to “too many”
unmatching signatures (suspected of being out of date)
Analysis
Calculate THROUGHPUT for each strategy…
L = time between invalidation report broadcasts
W = bandwidth
B C = # bits in the broadcast (invalidation reports)
# bits available for answering queries (cache misses)
 LW  BC
Analysis
T = THROUGHPUT; queries per interval handled by the system
h = cache hit rate, expressed [0, 1]
b q = # bits for a query
b a = # bits to answer a query
Traffic (in bits) due to cache misses
 T(1 h)(bq  ba )
Throughput
T(1 h)(bq  ba )  LW  BC
LW  BC
T
(1 h)(bq  ba )
Effectiveness of a Strategy
T
e
Tmax
Maximal Throughput
Server knows:
-What units are in the cell
-What those units have in their caches
Server can:
-instantaneously notify units when an item changes
BC  0
h  MaximalHitRatio
Maximal Hit Ratio
The Hit Ratio achieved in ideal conditions:
MHR 

   

e
e d

0

MHR 


Maximal Throughput
BC  0
Tmax

h  MaximalHitRatio
LW

(1 M.H.R.)(bq  ba )
No Caching
-No invalidation report
-No intervals
BC  0
h 0
LW
Tnc  
(b

b
)
 q
a
Timestamps
LW  n c (log( n)  bT )
TTS 
(bq  ba )(1 hts)
Amnesic Terminals
LW  n L log( n)
TAT 
(bq  ba )(1 hat )
Signatures
Consider the probability of false diagnosis:
 Probability of a false positive
 Probability of a false negative
1
TSIG 
LW  6g( f  1)(ln( )  ln( n))

(bq  ba )(1 hsig )
Asymptotic Analysis
Analyze throughput in extreme cases:
 As probability of sleeping s0, s1
Analyze throughput as system parameters vary:




Database size
Update frequency
Bandwidth
Etc.
Workaholics
Unit sleeps less and less: s0
 All hit ratios approach the same value
 SIG lags behind TS and AT by a factor of
BEST THROUGHPUT:
 AT, because its report is the shortest

pnf
Sleepers
Unit sleeps more and more: s1

All hit ratios approach 0
BEST THROUGHPUT:

No Caching eventually wins as s becomes very large
 For practical purposes, SIG is the best choice
Infrequent Updates
Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth
Effectiveness as s ranges from 0 to 1
Update Intensive
Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth
Effectiveness as s ranges from 0 to 1
Conclusions on Effectiveness
Strategy depends on circumstances:
 SIG is best for sleepers
 TS is best for query-intensive scenarios, but…
 AT is best for workaholics
How can we improve effectiveness?
Relax: Consistency of the Cache
Depending on data type, data may not need to
be exact…
EX: stocks, weather, etc.
Makes shorter invalidation reports possible
How Do We Decide to Update?
- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency condition
attached to it
Coherency Conditions:
Delay Condition - updated based on time
Arithmetic Condition - updated based on difference
between data and quasi-copy
Adaptive Invalidation Reports
-Start with TS strategy
Use algorithms to optimize strategy.
Examples:
 If an item is queried very often by units that sleep
a lot, include it in reports for longer
 If an item changes frequently, do not bother
caching
Criticism
 Units rarely powered down
 Battery life better than predicted
 Battery life does not dictate use
 Units still lose reception frequently
 Today’s most common “sleeper” condition -explicitly excluded from definition in S&W
 Bandwidth better than predicted
However…
 Adjust “sleeper” to include lost reception
 Caching is still important
 Endless demand for computational resources
 Endless demand for battery life
 Endless demand for more bandwidth