World Wide Web Caching:
Trends and Technologies
Greg Barish & Katia Obraczka
USC Information Sciences Institute, USA, 2000
This report is presented by Loubna ALI
Introduction:
Web caching is the introduction of proxy servers at certain points in the
network that cache Web documents for faster client access.
Today web caching is very important because:
–HTTP traffic has grown rapidly to form the largest part of Internet
traffic, which causes more network congestion and server unavailability.
–The number of static Web pages almost doubles every year.
To be an attractive solution, web caching must provide the following
features:
• Bandwidth saving.
• Improved content availability.
• Improved web server availability.
• Reduced network latency.
• Server load balancing.
• Improved user perception of network performance.
In this paper we describe several web caching architectures, cache
deployment options, and design techniques. Finally, we present a summary
and future work.
Caching Architectures:
1-Proxy Caching: this kind of cache is deployed at the edge of the
network, and it has the following disadvantages:
–If the cache becomes unavailable, the network becomes unavailable to its users.
–It is a single point of failure.
–Users must manually reconfigure their browsers when the proxy fails (browser auto-reconfiguration is a recent trend).
2-Reverse Proxy Caching: here the proxies are located near the
content provider rather than near the clients.
3-Transparent Caching: its advantage is that it eliminates the need to
manually configure web browsers.
There are two kinds of transparent caching:
–Router-based transparent proxy caching
–Switch-based transparent proxy caching
Switch-based caching is better than router-based caching because it is
less expensive and it reduces the latency of load balancing.
4-Adaptive Web Caching: it uses distributed cache meshes to solve
the hot-spot problem, and it has the following properties:
–Caches dynamically join and leave groups based on content
demand.
–It is adaptive and self-organizing.
It uses the Cache Group Management Protocol (CGMP) and the Content
Routing Protocol (CRP).
5-Push Caching: it keeps data close to the clients that request this
information.
It assumes that caches can be launched across administrative
boundaries. Its disadvantage is the cost it incurs (storage and
transmission).
6-Active Caching: this approach applies caching to dynamic documents,
which matters because about 30% of client HTTP requests contain cookies.
It uses cache applets: when a client requests information, the server
provides the cache with the objects and any associated cache applets.
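All of the proxy-based architectures above share the same core behavior. A request that hits the cache is served locally. A miss is fetched from the origin server and stored for later requests. The following is a minimal sketch of that behavior. It assumes an in-memory store with least-recently-used (LRU) replacement, although the report does not name a replacement policy. It also uses a hypothetical `fetch_from_origin` callback in place of real HTTP.

```python
from collections import OrderedDict
from typing import Callable


class ProxyCache:
    """Minimal caching-proxy sketch: serve hits locally, fetch misses from the origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin-fetch callback
        self.store = OrderedDict()  # url -> body, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)  # mark as most recently used
            return self.store[url]
        self.misses += 1
        body = self.fetch_from_origin(url)  # miss: go to the origin server
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used object
        return body
```

In a consumer-oriented deployment, this object would sit near the clients. In a reverse-proxy deployment, it would sit in front of the content provider. The lookup logic is the same in both cases.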
Cache Deployment Options:
Near the content consumer (consumer-oriented)
This placement gives the best response time, with the further advantage
that requests are served locally.
Near the content provider (provider-oriented)
The advantages of this placement are:
–It improves access to logical sets of data.
–It improves the scalability and availability of content.
However, it is problematic for delay-sensitive content (audio, video).
At strategic points in the network
–Placement is based on user access patterns and on network topology and conditions.
However, administrative control is a problem.
Design Techniques:
Hierarchical Caching:
Caches are arranged in a tree-like structure:
-A child cache can query its parent caches and its siblings, but a parent
cache never queries its children.
-As a result, information gradually filters down toward the leaves.
The main problem is that parents can be swamped with requests. To avoid
this, clustering may be applied to the hierarchy.
Advantages:
–Bandwidth efficient, especially when cache servers are slow.
–Efficiently diffuses popular web pages toward the demand.
Disadvantages:
–Cache servers need to be placed at key access points of the network, which
requires coordination among caches.
–Each level adds delay.
–Higher levels become bottlenecks.
–Multiple copies of an object are stored at different cache levels.
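The following is a minimal sketch of the hierarchical lookup rule. A cache checks itself, then its siblings, then its parent. The root falls back to the origin, and a parent never queries its children. Two details are assumptions that the report does not specify. First, siblings are only asked whether they already hold the object and do not forward the request. Second, every cache that handles a miss keeps its own copy. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class HierarchicalCache:
    """One node in a cache tree (illustrative sketch)."""

    def __init__(self, name: str, parent: Optional["HierarchicalCache"] = None,
                 origin: Optional[Callable[[str], bytes]] = None):
        self.name = name
        self.parent = parent
        self.origin = origin  # only the root needs a way to reach the origin server
        self.siblings: List["HierarchicalCache"] = []
        self.store: Dict[str, bytes] = {}

    def lookup_local(self, url: str) -> Optional[bytes]:
        return self.store.get(url)

    def get(self, url: str) -> bytes:
        hit = self.lookup_local(url)
        if hit is not None:
            return hit
        # Siblings answer only from their own store; they do not forward the request.
        for sib in self.siblings:
            body = sib.lookup_local(url)
            if body is not None:
                self.store[url] = body
                return body
        # Otherwise go up the tree; children are never queried.
        if self.parent is not None:
            body = self.parent.get(url)
        elif self.origin is not None:
            body = self.origin(url)
        else:
            raise LookupError(f"{self.name}: no route to origin for {url}")
        self.store[url] = body  # each level on the miss path keeps a copy
        return body
```

Because every level on the miss path stores the object, popular pages diffuse toward the leaves. The same behavior explains why copies accumulate at several levels, as noted in the disadvantages above.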
Intercache Communication
This design is composed of multiple distributed caches that cooperate.
It uses the following protocols:
–ICP (Internet Cache Protocol) [Squid]: caches issue queries to other
caches to determine the best location from which to retrieve an object.
The main problem is message overhead.
–CRP (Content Routing Protocol): extends ICP with multicast to query
cache meshes.
–Cache digests [Squid]: compact summaries of the objects held in a cache.
–WCCP (Web Cache Communication Protocol) [Cisco]: enables
transparent redirection of HTTP traffic to the Cisco Cache Engine.
–CARP (Cache Array Routing Protocol) [Microsoft]: uses hashing
to determine which proxy holds the requested information.
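Cache digests avoid ICP's per-request query messages. Instead of asking every peer on each miss, a cache holds a compact summary of each peer's contents and tests that summary locally. Squid's digests are Bloom filters. The sketch below is a generic Bloom filter in that spirit. Its bit-array size, hash count, and use of MD5 slices as hash functions are illustrative assumptions, not Squid's exact digest format.

```python
import hashlib
from typing import Iterator


class CacheDigest:
    """Bloom-filter summary of the URLs held by one cache (illustrative sketch).

    False positives are possible; false negatives are not.
    """

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        if not 1 <= num_hashes <= 4:
            raise ValueError("this sketch derives at most 4 hashes from one MD5 digest")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str) -> Iterator[int]:
        digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
        # Each 4-byte slice of the digest acts as one hash function.
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, url: str) -> None:
        """Record that the owning cache holds this URL."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, url: str) -> bool:
        """True: the peer probably holds the URL. False: it definitely does not."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))
```

A cache would consult a peer's digest before choosing where to fetch a miss. A false positive costs only one wasted request to that peer.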
Hashing function
The principal idea of this design is to direct the local cache toward
other caches that have the object or can obtain it.
-Hash-Based Request Routing:
–Uses a hash function to map a key (such as the URL) to a cache within a
cluster.
–Reduces, or eliminates, the need for caches to query each other.
–Examples: NetCache (an MD5-indexed URL hash function) and CARP.
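CARP's design combines a hash of the URL with a hash of each member proxy's identity and routes the request to the highest-scoring member. The sketch below follows that highest-random-weight (rendezvous) idea. It uses MD5 over a hypothetical `name|url` string rather than CARP's actual hash function, and the cache names are made up.

```python
import hashlib
from typing import List


def _score(cache_name: str, url: str) -> int:
    # Hash the cache identity together with the URL; MD5 is used here only to spread keys.
    h = hashlib.md5(f"{cache_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(h[:8], "big")


def route(url: str, caches: List[str]) -> str:
    """Pick the cache with the highest score for this URL.

    Every client computes the same winner independently, so caches never
    have to query each other to find an object.
    """
    if not caches:
        raise ValueError("cluster has no caches")
    return max(caches, key=lambda name: _score(name, url))
```

Removing a cache remaps only the URLs that were routed to it. Every other URL keeps the same winner. A plain `hash(url) % n` scheme lacks this property.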
Optimized I/O:
It treats the object cache like a high-performance database. An in-memory
data structure determines whether an object has been cached, and disk
operations locate where on the disk the content is placed.
The advantage is that costly I/O operations can be avoided.
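The following is a minimal sketch of that idea. It assumes all objects are appended to a single data file and an in-memory dictionary maps each URL to its byte offset and length. A miss is answered from memory with no disk access, and a hit costs a single seek and read. Real caches use far more elaborate on-disk layouts, and the class name is hypothetical.

```python
import os
from typing import Dict, Optional, Tuple


class IndexedDiskCache:
    """Append-only object file plus an in-memory index (illustrative sketch)."""

    def __init__(self, path: str):
        self.path = path
        self.index: Dict[str, Tuple[int, int]] = {}  # url -> (offset, length)
        open(path, "ab").close()  # make sure the data file exists

    def put(self, url: str, body: bytes) -> None:
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # new object starts at the current end of file
            f.write(body)
        self.index[url] = (offset, len(body))

    def get(self, url: str) -> Optional[bytes]:
        loc = self.index.get(url)
        if loc is None:
            return None  # miss decided in memory: no disk I/O at all
        offset, length = loc
        with open(self.path, "rb") as f:  # hit: exactly one seek and one read
            f.seek(offset)
            return f.read(length)
```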
Microkernel Operating System:
This technique uses a microkernel operating system to control how the
cache's resources are managed.
The advantages are:
• Improved resource allocation.
• Optimized cache performance.
Content prefetching
The principal idea of this design is to fetch content before it is requested,
using data accumulated by the server, such as historical access information.
Prefetching can be implemented in three places:
–Between clients and servers
–Between clients and proxies
–Between proxies and servers
• Improvements:
–Lower latency (improvements ranging from 26% to 57%)
–Improved access time
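The report does not say which predictor the surveyed prefetching systems use. One simple possibility is a first-order model built from historical access sessions. The model counts which URL tends to follow which, then prefetches the successors whose estimated probability reaches a threshold. The function names and the 0.5 default threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def build_model(sessions: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count, from historical sessions, how often each URL follows another."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            follows[current][nxt] += 1
    return follows


def prefetch_candidates(model: Dict[str, Counter], url: str,
                        threshold: float = 0.5) -> List[str]:
    """URLs whose estimated probability of being requested next is >= threshold."""
    counts = model.get(url)
    if not counts:
        return []
    total = sum(counts.values())
    return [nxt for nxt, n in counts.most_common() if n / total >= threshold]
```

For example, if "/" was followed by "/news" in two of three past sessions, a request for "/" would trigger a prefetch of "/news". A lower threshold prefetches more aggressively, trading extra bandwidth for fewer misses.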
Cache coherency (consistency)
Cache coherency ensures that a cached object does not reflect stale or
defunct data.
The consistency techniques:
–Client polling: the cache compares its copy of an object with the original
object on the server.
–Invalidation callbacks: the server contacts the proxies when objects
change.
–TTL and adaptive TTL: each cached object is given a time-to-live after
which it is considered stale. Adaptive TTL sets this value from the object's age.
–If-Modified-Since: caches invalidate objects only when they are requested
and their expiration date has been reached.
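The following sketch combines adaptive TTL with If-Modified-Since-style validation. An object is served straight from the cache until its TTL runs out. Only when an expired object is requested does the cache check with the origin. It then recomputes the TTL as a fraction of the time since the object last changed. The 0.2 factor, the `Entry` fields, and the `origin` callback (returning `None` for "not modified", like an HTTP 304) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    last_modified: float  # origin's Last-Modified time, in seconds
    fetched_at: float     # when the cache last fetched or revalidated the copy
    ttl: float


def adaptive_ttl(fetched_at: float, last_modified: float, factor: float = 0.2) -> float:
    """Objects unchanged for a long time are assumed to stay unchanged longer."""
    return factor * max(0.0, fetched_at - last_modified)


def is_fresh(entry: Entry, now: float) -> bool:
    return now < entry.fetched_at + entry.ttl


# Hypothetical origin callback: (url, if_modified_since) -> None if unchanged,
# or (new_body, new_last_modified) if the object changed.
Origin = Callable[[str, float], Optional[Tuple[bytes, float]]]


def serve(cache: Dict[str, Entry], url: str, now: float, origin: Origin) -> bytes:
    entry = cache[url]  # sketch assumes the object is already cached
    if is_fresh(entry, now):
        return entry.body  # TTL not reached: no contact with the origin
    # Expired and requested: validate with an If-Modified-Since-style check.
    result = origin(url, entry.last_modified)
    if result is not None:
        entry.body, entry.last_modified = result  # object changed: replace the copy
    entry.fetched_at = now
    entry.ttl = adaptive_ttl(now, entry.last_modified)
    return entry.body
```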
Summary
As we have seen, there are many different cache designs, but they share
some common issues.
• Advantages:
1. Improve content availability.
2. Reduce network latencies.
3. Help address increasing bandwidth demands.
4. Can hide network problems.
5. Reduce server burden.
• Disadvantages:
1. Stale pages.
2. Information retained in caches.
However, the choice of the most suitable cache depends on the application
itself. One application may need the cache with the lowest latency, another
the cache with the strongest security properties, and so on.
Open Future Work (Trends):
To improve our understanding of this topic, we must discuss the problems
of security and real-time data.
Content Security:
To illustrate security, we present two examples of mechanisms
developed in 2002:
1. NetCache:
-The appliance is deployed in parallel with the firewall.
-The appliance can be used to control who accesses a web site.
-It performs virus scanning of all incoming content.
2. CacheFlow:
-Added content filtering to its caches.
Handling more complex objects and real-time data
RTEE (Real-Time Event Engine): captures, caches, and queries data at
rates greater than 12,000 events/s.
Web Caching Based on Ontology?
–User access pattern prediction
–Prefetching
–Cache placement/replacement
This article:
• Is well organized.
• Explains all kinds of web caching and all the techniques in a simple
manner.
• Presents the need for caches, their desirable properties, and the
expected gains, which helps in choosing the cache most suitable for our
applications.
But:
• It does not present or explain trends that are very important
in this area, such as security and real-time data.
• It lacks illustrations such as pictures, diagrams, etc.
Loubna ALI