Survey							
                            
		                
		                * Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Sleepers & Workaholics Caching Strategies in Mobile Computing Dr. Daniel Barbará Dr. Tomasz Imielinski About Me Peter Rosegger  5th year Computer Science  Specialization: Databases  Graduation: December 2007 Sleepers & Workaholics Caching Strategies in Mobile Computing Dr. Daniel Barbará  Professor at George Mason University  Several patents associated with mobile caching Dr. Tomasz Imielinski  Professor at Rutgers University  Senior VP: Search Technology at Ask.com 1994 16 million cellular subscribers in US 1994 The Future of Mobile Computing Use Habits:  Large # of users  Check weather, stocks, scores, etc.  Mobile between cells (& wireless networks) Hardware:  Low-powered palmtop machines  Poor battery life  Narrow bandwidth The Future of Mobile Computing Query complex databases, but…  Frequently powered off to save battery  Frequently changing cells  Network traffic must be minimized to conserve bandwidth Why Caching is Important Conserve: 1. COMPUTATIONAL RESOURCES 2. BATTERY LIFE 3. BANDWIDTH Traditional Strategies Fail Server lacks knowledge of:  Which units are in its cell  Which units are powered ON Client caches cannot be tracked The Solution Purpose of Sleepers & Workaholics: "…to propose a taxonomy of different cache invalidation strategies and study the impact of clients' disconnection times on their performance." Strategies  Timestamps (TS)  Amnesic Terminals (AT)  Signatures (SIG) Control Strategy:  No Cache (NC) Timestamps -Cache entries have timestamps -Synchronous, history based, uncompressed reports SERVER: Notify clients of identifiers of items changed within last w seconds CLIENT: For each item in cache:  If in report, purge from cache  If NOT in report, update timestamp to current time Amnesic Terminals -Cache entries have identifiers -Synchronous, history based, uncompressed reports SERVER: Notify clients of identifiers of items changed within last w seconds CLIENT: For each item in cache:  If in report, purge from cache  If NOT in report, do nothing Signatures -Checksums calculated over value of data to form Signature -Signatures combined using XOR -Synchronous, state based, compressed reports SERVER: Server broadcasts the set of combined signatures CLIENT: Item in cache is declared invalid if it belongs to “too many” unmatching signatures (suspected of being out of date) Analysis Calculate THROUGHPUT for each strategy… L = time between invalidation report broadcasts W = bandwidth B C = # bits in the broadcast (invalidation reports) # bits available for answering queries (cache misses)  LW  BC Analysis T = THROUGHPUT; queries per interval handled by the system h = cache hit rate, expressed [0, 1] b q = # bits for a query b a = # bits to answer a query Traffic (in bits) due to cache misses  T(1 h)(bq  ba ) Throughput T(1 h)(bq  ba )  LW  BC LW  BC T (1 h)(bq  ba ) Effectiveness of a Strategy T e Tmax Maximal Throughput Server knows: -What units are in the cell -What those units have in their caches Server can: -instantaneously notify units when an item changes BC  0 h  MaximalHitRatio Maximal Hit Ratio The Hit Ratio achieved in ideal conditions: MHR        e e d  0  MHR    Maximal Throughput BC  0 Tmax  h  MaximalHitRatio LW  (1 M.H.R.)(bq  ba ) No Caching -No invalidation report -No intervals BC  0 h 0 LW Tnc   (b  b )  q a Timestamps LW  n c (log( n)  bT ) TTS  (bq  ba )(1 hts) Amnesic Terminals LW  n L log( n) TAT  (bq  ba )(1 hat ) Signatures Consider the probability of false diagnosis:  Probability of a false positive  Probability of a false negative 1 TSIG  LW  6g( f  1)(ln( )  ln( n))  (bq  ba )(1 hsig ) Asymptotic Analysis Analyze throughput in extreme cases:  As probability of sleeping s0, s1 Analyze throughput as system parameters vary:     Database size Update frequency Bandwidth Etc. Workaholics Unit sleeps less and less: s0  All hit ratios approach the same value  SIG lags behind TS and AT by a factor of BEST THROUGHPUT:  AT, because its report is the shortest  pnf Sleepers Unit sleeps more and more: s1  All hit ratios approach 0 BEST THROUGHPUT:  No Caching eventually wins as s becomes very large  For practical purposes, SIG is the best choice Infrequent Updates Effectiveness as s ranges from 0 to 1 Increase Database Size & Bandwidth Effectiveness as s ranges from 0 to 1 Update Intensive Effectiveness as s ranges from 0 to 1 Increase Database Size & Bandwidth Effectiveness as s ranges from 0 to 1 Conclusions on Effectiveness Strategy depends on circumstances:  SIG is best for sleepers  TS is best for query-intensive scenarios, but…  AT is best for workaholics How can we improve effectiveness? Relax: Consistency of the Cache Depending on data type, data may not need to be exact… EX: stocks, weather, etc. Makes shorter invalidation reports possible How Do We Decide to Update? - Consider cached copies to be quasi-copies - Each quasi-copy has a coherency condition attached to it Coherency Conditions: Delay Condition - updated based on time Arithmetic Condition - updated based on difference between data and quasi-copy Adaptive Invalidation Reports -Start with TS strategy Use algorithms to optimize strategy. Examples:  If an item is queried very often by units that sleep a lot, include it in reports for longer  If an item changes frequently, do not bother caching Criticism  Units rarely powered down  Battery life better than predicted  Battery life does not dictate use  Units still lose reception frequently  Today’s most common “sleeper” condition -explicitly excluded from definition in S&W  Bandwidth better than predicted However…  Adjust “sleeper” to include lost reception  Caching is still important  Endless demand for computational resources  Endless demand for battery life  Endless demand for more bandwidth