Web Caching
By Amisha Thakkar

Overview
• What is a Web Cache?
• Caching Terminology
• Why use a cache?
• Disadvantages of Web Cache
• Other Features
• Caching Rules
• Caching Architectures
• Cache Deployment Schemes
• Active Caching
• Real World Solution
• Research Areas

What is a Web Cache?
• A cache is a place where temporary copies of objects are stored
• Cached information is generally closer to the requester than the permanent information is
• Objects: HTML pages, images, files

Caching Terminology
• Client: an application program that establishes connections for sending requests
• Server: an application program that accepts connections to service requests by sending back responses
• Origin server: the server on which the given resource resides or is to be created
• Proxy: an intermediary program that acts as both a server and a client, making requests on behalf of other clients
• A proxy is not necessarily a cache
  * A proxy does not always cache the replies passing through it
  * It may be used on a firewall to monitor accesses

Why use a cache?
• To reduce latency
• To reduce network traffic
• To reduce the load on origin servers
• To isolate end users from network failures

Disadvantages of Web Cache
• With cached data there is always a chance of receiving stale information
• Content providers lose access counts when cache hits are served
• Manual configuration is often required
• Operating a cache requires additional resources
• In some situations the cache can be a single point of failure

Other Features
• Depending on the perspective, the following may be good or bad
  * The cache makes requests on behalf of clients; the servers never see the clients' IP addresses
  * The cache provides an easy opportunity to monitor and analyze browsing activities
  * The cache can be used to block certain requests

Types of Web Caches
• Proxy caches
  * Serve a large number of users
  * Large corporations and ISPs often set them up on their firewalls
  * They are a type of shared cache
• Browser caches
  * Use a section of the computer's hard disk to store objects that you have seen

Caching Rules
• Rules on which caches work
  * Some are set in protocols
  * Some are set by the cache administrator
• Most common rules:
  * If the object is authenticated or secure, it won't be cached
  * The object's headers indicate whether the object is cacheable or not
  * An object is considered fresh when:
      * It has an expiry time or other age-controlling directive set and is still within the fresh period
      * The browser cache has already seen the object and has been set to check once a session
      * A proxy cache has seen the object recently and it was modified relatively long ago
  * Fresh documents are served directly from the cache without checking with the origin server
  * For a stale object, the origin server is asked to validate the object, i.e. to tell the cache whether the copy is still good
  * The most common validator is the time that the object was last changed

Caching Architectures: Hierarchical / Simple Cache
• Browser-cache interaction is the same as browser-host interaction, i.e. a TCP connection is made and the item is requested
• If the item is not found, the request is sent to the parent cache
• A hierarchy is built up, each level indirectly serving a wider community of users
[Figure: cache hierarchy with national networks at the top, regional networks below them, and institutional networks at the bottom]

Caching Architectures: Distributed / Co-operating Cache
• Decentralized (cache mesh)
• Multiple servers cooperate by sharing their individual caches to create one large distributed cache
• Simply put, caching proxies communicate with each other to serve different users
• On a cache miss, a proxy checks with other proxy caches before contacting the origin server
• Caches communicate among themselves using a protocol such as ICP (Internet Cache Protocol)
• Caches can be selected on the basis of
  * Distance from the end user
  * Specialization in particular URLs (location hint)
• Why distributed: limitations of a hierarchy
  * Width of the hierarchy: caches at the same level are inaccessible to each other
  * An LRU policy implies sufficient disk space
  * Cost of replicating disk storage
  * The amount of disk space required depends on the number of users served and their breadth of reading
  * The more users, the more disk space is needed higher in the hierarchy
  * Exponential growth in the number of documents on the WWW
• Caching close to the user is more effective; the higher the level, the lower the efficiency
• Can be created for load balancing
• Most effective when serving a community of interest
• First, a UDP packet is sent for the cache inquiry
• Cache selection is determined by RTT (round-trip time)
• Potential problem: network congestion because of UDP
• In favor: a UDP exchange takes 2 IP packets, TCP at least 8 packets
• The UDP reply from a cache can indicate
  a. Presence
  b. Speed
  c. Availability of requested documents

Caching Architectures: Hybrid Cache
Note: ICP

Cache Deployment Schemes: Proxy caching
• Advantages
  * Clients point all web requests directly to the cache: no effect on non-web traffic
  * Cost of upgrading hardware and software is limited
  * Administration of caches is limited to basic configuration
• Disadvantages
  * Every browser must be configured to point to the cache
  * Each client can hit only one cache
  * Single point of failure
  * Unnecessary duplication of data
  * Bottleneck in cases where content is otherwise available on the LAN

Cache Deployment Schemes: Transparent proxy caching
• Advantages
  * No browser configuration
  * Cost of upgrading hardware and software is limited
  * No administration of intermediate systems required
• Disadvantages
  * Each client can hit only one cache
  * If the cache goes down, Internet as well as intranet access is lost
  * Negative impact on non-web traffic
  * The cache has to route non-web traffic
  * Routing, packet examination and network address translation steal CPU cycles from the main cache-serving function

Cache Deployment Schemes: Transparent proxy caching with web cache redirection
• Advantages
  * A switch/router examines the packets
  * Minimal impact on non-web traffic
  * Frees up CPU cycles for the web cache
  * Allows client load to be dynamically spread over multiple caches
  * Eliminates the single point of failure, especially if redundant redirectors are used
• Disadvantages
  * Additional intermediate systems must be deployed
  * Increased expense

Active Caching
• Current problem: caches are unable to cache dynamic documents
• A cache applet is server-supplied code that is attached to a URL or a collection of URLs
• The applet is written in a platform-independent language
• On a user request, the applet is invoked by the cache
• The applet decides what is to be sent to the user
  * Giving the proxy a new document to send back to the user
  * Allowing the proxy to use the cached copy
  * Instructing the proxy to send the request to the web server
• Functions of the applet
  * Logging user accesses
  * Checking access permissions
  * Client-specific information distribution
• The proxy has the freedom not to invoke the applet and instead send the request to the server
• The proxy promises not to send back a cached copy without invoking the applet
• If the applet is too large, the proxy sends the request to the server
• The proxy is not obligated to cache any applet; in that case it agrees not to service the request for that document
• The proxy can devote resources to the applets associated with the URLs that are hottest among its users
• The proxy that receives the request is typically the proxy closest to the user, so the scheme automatically migrates server processing to the nodes that are close to users
• This increases the scalability of web-based services
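
The applet contract described under Active Caching can be illustrated with a minimal Python sketch. It is not an actual active-caching implementation: the names `ActiveProxy`, `Decision`, `Action`, `make_members_applet` and the applet signature are hypothetical, chosen only to show the three outcomes an applet can return and the proxy's promise never to serve a cached copy without invoking the applet. Applet size limits and applet download are not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Optional


class Action(Enum):
    NEW_DOCUMENT = auto()  # applet gives the proxy a new document to return
    USE_CACHED = auto()    # applet allows the proxy to use the cached copy
    FORWARD = auto()       # applet tells the proxy to ask the web server


@dataclass
class Decision:
    action: Action
    body: Optional[str] = None  # only meaningful for NEW_DOCUMENT


# Hypothetical applet signature: (url, client_id, cached_body) -> Decision
Applet = Callable[[str, str, Optional[str]], Decision]


class ActiveProxy:
    def __init__(self, fetch_from_server: Callable[[str], str]):
        self.fetch_from_server = fetch_from_server
        self.cache: Dict[str, str] = {}
        self.applets: Dict[str, Applet] = {}

    def handle(self, url: str, client_id: str) -> str:
        applet = self.applets.get(url)
        if applet is None:
            # No applet held for this URL: the proxy may not answer from
            # its cache, so the request goes to the server.
            return self.fetch_from_server(url)
        decision = applet(url, client_id, self.cache.get(url))
        if decision.action is Action.NEW_DOCUMENT and decision.body is not None:
            return decision.body
        if decision.action is Action.USE_CACHED and url in self.cache:
            return self.cache[url]
        # FORWARD, or nothing usable in the cache: fetch and keep a copy.
        body = self.fetch_from_server(url)
        self.cache[url] = body
        return body


def make_members_applet(allowed: set, log: list) -> Applet:
    """Example applet: logs every access and checks access permissions."""
    def applet(url: str, client_id: str, cached_body: Optional[str]) -> Decision:
        log.append((client_id, url))
        if client_id not in allowed:
            return Decision(Action.NEW_DOCUMENT, "403 Forbidden")
        if cached_body is None:
            return Decision(Action.FORWARD)
        return Decision(Action.USE_CACHED)
    return applet
```

In this sketch the server still sees an access log for every request (through the applet), even when the body is served from the proxy's cache, which is the point of moving that logic to the node closest to the user.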

Real World Solution
• CacheFlow has successfully implemented caching solutions for e-commerce
• It provides client-side and server-side solutions
• On the client-side the cache is placed between the network and the firewall, i.e. in front of the firewall and the web server
• Requests for dynamic content or secure transactions are passed to the origin servers for processing
• This offers several advantages
  * Offloads work from servers and firewalls
  * Scales the network to handle more customer transactions and large traffic spikes
  * Reduces capital and operating costs
  * Reduces the security risks of users accessing servers that are inside the firewall
• They have developed an operating system: CacheOS
• Main features related to caching: Adaptive Asynchronous Refresh (AAR), Object Pipelining
• Variables tracked for AAR:
  * Frequency of request (model of use)
  * Frequency of change (model of change)
  * Time cost to retrieve the object
• CacheOS then automatically determines the refresh pattern
• 90% hit rate
• Some facts:
  * As many as 90% or more of web objects can be static
  * 8 sec threshold
• Successful implementations:
  * Proflowers.com
  * Kbkids.com
  * delta-air.com
  * Xerox

Research Areas
• How are the cache proxies organized: hierarchically, distributed, or hybrid? (cache architectures)
• Where should a cache proxy be placed in order to achieve optimal performance? (proxy placement)
• How do proxies cooperate with each other? (proxy co-operation)
• What kind of data/information can be shared among co-operating proxies? (data sharing)
• How does a proxy decide what and when to prefetch from a Web server or other proxies to reduce access latency in the future? (prefetching)
• How does a proxy manage pages? (cache placement and replacement)
• How does a proxy maintain data consistency? (cache coherency)
• How is the control information distributed among pages? (control information distribution)
• How to deal with data which is not cacheable? (dynamic data caching)
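
The cache coherency question above comes back to the freshness and validation rules listed under Caching Rules. The following Python sketch shows one way those rules could be combined, under stated assumptions: the names `CachedObject`, `is_fresh` and `lookup` are hypothetical, and the 10% heuristic for "seen recently and modified relatively long ago" is an assumed value that does not come from the slides. The browser's "check once a session" setting is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed heuristic: without an explicit expiry, treat a copy as fresh for
# this fraction of the time it had gone unmodified when it was fetched.
HEURISTIC_FRACTION = 0.1


@dataclass
class CachedObject:
    fetched_at: float                   # when the cache stored the copy
    last_modified: Optional[float]      # validator: time of last change
    expires_at: Optional[float] = None  # explicit expiry time, if any
    cacheable: bool = True              # what the object's headers say
    secure: bool = False                # authenticated or secure object


def is_cacheable(obj: CachedObject) -> bool:
    # Authenticated or secure objects are not cached.
    return obj.cacheable and not obj.secure


def is_fresh(obj: CachedObject, now: float) -> bool:
    if obj.expires_at is not None:
        return now < obj.expires_at
    if obj.last_modified is not None:
        # Seen recently, and modified relatively long ago.
        unmodified_for = obj.fetched_at - obj.last_modified
        return (now - obj.fetched_at) < HEURISTIC_FRACTION * unmodified_for
    return False


def lookup(obj: CachedObject, now: float, origin_last_modified: float) -> str:
    """Return what the cache does for a request at time `now`."""
    if not is_cacheable(obj):
        return "fetch"
    if is_fresh(obj, now):
        return "serve-from-cache"  # no contact with the origin server
    # Stale: ask the origin to validate using the last-changed time.
    if obj.last_modified is not None and origin_last_modified <= obj.last_modified:
        return "revalidated"       # copy is still good
    return "fetch"                 # object changed, get a new copy
```

For example, a copy fetched at time 1000 that was last modified at time 0 is served directly until time 1100 under this heuristic; after that, the cache asks the origin server whether the copy is still good.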