Web Performance
• Why do we care?
• What is performance?
  – User Experience
  – Web Server
  – Network
• How can we tell how we are doing?
• What good is it?

Why do we care?
“Twenty-eight percent of shoppers who have suffered failed performance attempts said they stopped shopping at the web site where they had problems, and six percent said they stopped buying at that particular company’s off-line store.” (Boston Consulting Group, quoted in Infoworld / Computerworld 3/00)
“It takes only 8 ½ seconds for half of the subjects to [give up]” (Peter Bickford, “Worth the Wait?” in Netscape/View Source Magazine 10/97)
“Perhaps as much as $4.35 billion in e-commerce sales in the U.S. may be lost each year due to unacceptable download speeds and resulting user bailout behaviors.” (Zona Research 4/99)
“Fifty-eight percent of online customers surveyed indicated quick download time as a key factor in determining whether they would return to a web site.” (Forrester Research 1/99)
“One of the top three reasons cited by online shoppers for dissatisfaction with a web site is slow site performance.” (Jupiter Communications / NFO Worldwide 1/99)
“At one site, the abandonment rate fell from 30% to 6-8% because of a one second improvement in load time.” (Zona Research 4/99)

Effects of Poor Performance
• Lost prospective customer
  – If the site didn’t work, or took too long, your prospect may not return for a long time, if ever.
• Lost sale
  – If your competitor’s site was up and responsive, you may have lost a single sale.
• Lost customer
  – If this happens repeatedly, you’ve lost a customer,
  – AND the customer may stop going to associated web sites and physical locations!
• Lost reputation
  – People talk about poor performance; word spreads.
  – People are looking for a few good sites that they can trust!

What is performance?
• User Experience
  – How fast does the page load?
  – How available is the site?
• Web Server
  – How many requests/second can be served?
    • throughput
  – What is the effect of web proxies?
• Network
  – What is the network performance?
    • Latency, bandwidth

Network Performance
• At the network level, performance can be measured in terms of:
  – Latency
    • How long it takes a message to travel from one end of the network to the other
  – Bandwidth
    • The number of bits that can be transmitted over the network in a certain period of time

Network Performance Measures
• Overhead: latency of interface vs. Latency: network

Universal Performance Metrics
[Figure: timeline of one message from sender to receiver, showing sender overhead (processor busy), time of flight, transmission time (size ÷ bandwidth), receiver overhead (processor busy), transport latency, and total latency]
Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead
• Includes header/trailer in BW calculation?

Total Latency Example
• 1000 Mbit/sec, sending overhead of 80 µsec & receiving overhead of 100 µsec.
• A 10000 byte message (including the header); the network allows 10000 bytes in a single message
• 3 situations: distance 1000 km v. 0.5 km v. 0.01 km
• Speed of light ~ 300,000 km/sec (1/2 in media, so 150,000 km/sec)
• Transmission time = 10000 × 8 bits ÷ 1000 Mbit/sec = 80 µsec in all three cases
• Latency(0.01 km) = 80 + 0.01 km ÷ (50% × 300,000 km/sec) + 80 + 100 = 80 + 0.07 + 80 + 100 ≈ 260 µsec
• Latency(0.5 km) = 80 + 0.5 km ÷ (50% × 300,000 km/sec) + 80 + 100 = 80 + 3.33 + 80 + 100 ≈ 263 µsec
• Latency(1000 km) = 80 + 1000 km ÷ (50% × 300,000 km/sec) + 80 + 100 = 80 + 6667 + 80 + 100 = 6927 µsec
• Long time of flight => complex WAN protocol

So What?
• Long distance = long time of flight
  – Servers should be as close as possible to clients
• Low bandwidth = long transmission time
  – Servers should have high bandwidth links
• High overhead = long total latency, whatever the distance or bandwidth
  – Reduce the communication overhead as much as possible
  – Fast TCP implementation
  – More memory

The Internet
[Figure: a browser reaches a web server through access devices, an access provider with a DNS cache, and routers across the Internet; provider networks (PSInet, Digex, BBN, Verio, UUnet, Sprint, GTE, Worldcom, Mindspring) meet at peering points]

The Internet & Performance
• Routers
  – Read packet headers and send along
  – Each hop adds delay
• ISP Peering
  – Congestion may occur at peering points
  – End-to-end route in one direction may differ from route in the other direction
[Figure: routers of ISP “A” and ISP “B” joined at a peering point]

The Internet & Performance
• Network Connection
  – Performance of connection to ISP is generally a limiting factor
• ISP Services
  – Domain Name Service (DNS)
    • Each time a request is made, the server name must be translated into an IP address
    • Name Caching
      – DNS server retains addresses until “time to live” has passed
      – Client machine may also cache names for a short period of time
  – Web Proxies
    • Cache most frequently accessed pages
    • Zipf’s law

Web Server Performance
• Throughput: Requests per second
• How do you measure?
  – Live
    • May be too late….
  – Offline
    • Replay logs - does the past characterize the future?
    • Synthetic Workload - does it characterize reality?
• “...factoring out I/O, the primary determinant to server performance is the concurrency strategy.”
  -- JAWS: Understanding High Performance Web Systems

Applications of Workload Models
• Identify Performance Problems
  – Problems may only occur under high load
• Benchmark Web Components
  – Deployment decisions
  – Evaluate new features
• Capacity Planning
  – Determine network, memory, disk and clustering needs

Web Workload Characterization
• Based on the results of numerous studies
• Key properties
  – HTTP Message Characteristics
    • Several request methods and response codes
  – Resource Characteristics
    • Diverse content-type, size, popularity, and modification frequency
  – User Behavior
    • User browsing habits significantly affect workload

Category   Parameter
Protocol   Request method
           Response code
Resource   Content type
           Resource size
           Response size
           Popularity
           Modification frequency
           Temporal locality
           # embedded resources
Users      Session interarrival times
           # clicks per session
           Request interarrival times

Parameter Characterization
• Associate each parameter with quantitative values
• Statistics
  – Mean, median, mode
    • OK for parameters that don’t vary much
  – Probability Distributions
    • Capture how a parameter varies over a wide range of values

Probability Distribution
• Every random variable gives rise to a probability distribution
• Probability Density Function
  – Assigns a probability to every interval of the real numbers
• Cumulative Distribution Function
  – Describes the probability distribution of a real-valued random variable X
  – F(x) = P(X <= x)
  – The probability that a random variable will be less than or equal to x
• In the following slides, we will show the CDF of commonly used distributions

Poisson Distribution
• Probability of exactly k events in an interval: $P(X = k) = \lambda^k e^{-\lambda} / k!$, where $\lambda$ is the average number of events per interval
• Used to model the number of independent events in a fixed interval of time, when the events happen at a constant average rate
• The number of times a web server is accessed per minute follows a Poisson distribution
  – For instance, the number of edits per hour recorded on Wikipedia's Recent Changes page follows an approximately Poisson distribution.

Exponential Distribution
• $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, where $\lambda$ is the average event rate
• Used to model the time until the next occurrence of an event in a Poisson process
• Session interarrival times are exponential
  – Time between the start of one user session and the start of the next user session

Pareto Distribution
• $F(x) = 1 - (x/a)^{-k}$ for $x \ge a$
• k is shape, a is minimum value for x
• Power law
• 80-20 rule
  – 20% of the sample is responsible for 80% of the results
• Response sizes, Resource sizes, Number of embedded images, Request interarrival times
• Often used to model self-similar patterns

Probability Distributions in Web Workload Models

Distribution   Workload Parameter
Exponential    Session interarrival times
Pareto         Response sizes
               Resource sizes
               Number of embedded images
               Request interarrival times
Lognormal      Response sizes
               Resource sizes
               Temporal locality
Zipf-like      Resource popularity

Probability Distribution Conversion
• Most languages have random number library functions
  – Uniform distribution
• Must convert from uniform distribution to the chosen distribution
• Given: the cumulative distribution function, CDF, of the chosen distribution
  – 1. Generate a uniform random number between 0 and 1; call this number p
  – 2. Compute x such that CDF(x) = p
    • Determine the inverse of the CDF
  – 3. x is the random number you use

Inverse of the CDF
• For the exponential distribution: setting $1 - e^{-\lambda x} = p$ and solving for x gives $x = -\ln(1 - p) / \lambda$

User Experience
• 8-second rule
  – Probably 4 seconds today
• Typical page
  – Multiple requests
• Example
  – Page has 20 elements
  – Server must be capable of 5 requests/second (20 requests ÷ 4 seconds) to deliver one such page within the 4-second target

User Experience Performance Tips
• Check for web standards compliance
• Minimize the use of JavaScript and style sheets
• Turn off reverse DNS lookups on the server
• Get more memory
• Index your database tables
• Make fewer database queries
• Decrease the number of page components
• Decrease the size of each component
• Minimize Perceived Delay
  – Give the viewer something to look at while the page is loading

Website Analysis
• Websites quickly become large and difficult to test and optimize
• Use tools
  – Workload generators
    • Webstone
    • JMeter
  – Site analysis - log files
    • Webalizer

JMeter
[Figure]
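
Sketch: Probability Distribution Conversion in code
The three conversion steps (generate a uniform p, solve CDF(x) = p, use x) can be tried out with a few lines of Python. This is only a minimal sketch, not part of the original slides: the function names `sample_exponential` and `sample_pareto`, the rate of 2 sessions per second, and the seed are all hypothetical choices made for illustration. It assumes the exponential CDF F(x) = 1 - e^(-rate·x) and the Pareto CDF F(x) = 1 - (x/minimum)^(-shape).

```python
import math
import random


def sample_exponential(rate, rng=random):
    """Draw one exponential variate by inverting F(x) = 1 - exp(-rate * x)."""
    p = rng.random()                   # step 1: uniform p in [0, 1)
    return -math.log(1.0 - p) / rate   # steps 2-3: the x for which F(x) = p


def sample_pareto(shape, minimum, rng=random):
    """Draw one Pareto variate by inverting F(x) = 1 - (x / minimum) ** -shape."""
    p = rng.random()                   # step 1: uniform p in [0, 1)
    return minimum * (1.0 - p) ** (-1.0 / shape)  # steps 2-3


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed (arbitrary) so runs are repeatable
    rate = 2.0               # hypothetical: 2 new sessions per second on average
    gaps = [sample_exponential(rate, rng) for _ in range(100_000)]
    # The sample mean should land near 1 / rate = 0.5 seconds.
    print("mean session interarrival time:", sum(gaps) / len(gaps))
```

Using 1 - p rather than p inside the logarithm keeps the argument strictly positive, because `random()` can return 0.0 but never 1.0. A synthetic workload generator could call `sample_exponential` for session interarrival times and `sample_pareto` for response sizes, matching the distribution table above.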