Chapter 28
Applications: World Wide Web (HTTP)

Guoying Yang, Liangqi Guo, Song Ye

Introduction
* World Wide Web (WWW)
* HTTP: the primary protocol used to transfer a Web page from a server to a Web browser.

Importance of the Web
History
* During the early days of the Internet, FTP data transfers accounted for one third of Internet traffic.
* By 1995, Web traffic had become the largest consumer of Internet backbone bandwidth.
More people know about and use the Web.
Most companies have Web sites to do business.

Architectural Components
Web pages: the Web consists of a large set of documents that are accessible over the Internet. Each Web page is classified as a hypermedia document.
* Suffix "media": indicates that a document can contain items other than text.
* Prefix "hyper": indicates that a document can contain selectable links that refer to other, related documents.
Web browser: an application that a user invokes to access and display a Web page.
Web server: obtains a copy of the specified page and returns it in response to the client's request.
HyperText Markup Language (HTML)
* Tags: give guidelines for display. Some tags come in pairs that apply to all items between the pair.
* For example: <center> ... </center>

Uniform Resource Locators
Uniform Resource Locator (URL)
* Each Web page is assigned a unique name (its URL).
* A URL that follows the http scheme has the following form:
  http:// hostname [:port] / path [; parameters] [? query]
* port: an optional protocol port number.
* path: a string that identifies one particular document on the server.
* parameters: an optional string supplied by the client.
* query: an optional string used when the browser sends a question.
The absolute form of a URL
  http://www.cs.purdue.edu/people/comer
The relative URL
* Used once communication has been established with a specific server.
* Omits the address of the server.
* For example: only the string /people/comer/ is needed to specify the document named by the absolute URL above.
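The URL form above can be illustrated with a short Python sketch. It relies on the standard library's urllib.parse; the helper name split_http_url, the port 8080, the parameter string type=a, the query lang=en, and the base page index.html are all made up for illustration and do not come from the slides.

```python
from urllib.parse import urlparse, urljoin


def split_http_url(url):
    """Split an http URL into the fields named in the URL form above."""
    parts = urlparse(url)
    return {
        "hostname": parts.hostname,   # server name
        "port": parts.port,           # None when the optional port is omitted
        "path": parts.path,           # identifies one document on the server
        "parameters": parts.params,   # optional string after ';'
        "query": parts.query,         # optional string after '?'
    }


# Hypothetical URL that exercises every optional field.
example = "http://www.cs.purdue.edu:8080/people/comer;type=a?lang=en"
fields = split_http_url(example)

# A relative URL is resolved against a page already fetched from the server.
absolute = urljoin("http://www.cs.purdue.edu/index.html", "/people/comer/")

print(fields)
print(absolute)
```

Because the relative form omits the server address, it only has meaning once the browser knows which server it is talking to; urljoin models that by taking the current page as its first argument.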

An Example Document
  <HTML>
  The author of this text is
  <A HREF="http://nas.cl.uh.edu/perkins"> TCP/IP </A>
  </HTML>

Hypertext Transfer Protocol
What is HTTP?
The protocol used for communication between a browser and a Web server, or between intermediate machines and Web servers.
HTTP has the following characteristics:
* Application level
* Request/response
* Stateless
* Bi-directional transfer
* Capability negotiation
* Support for caching
* Support for intermediaries

HTTP GET Request
The browser sends a GET request, to which the server responds by sending the requested item.
Browser
* Sends a GET command followed by a URL and an HTTP version number.
* Examples:
  GET http://www.cs.purdue.edu/people/comer/ HTTP/1.1
  GET /people/comer/ HTTP/1.0
Server
* Responds by sending a copy of the page.

Some Useful URLs
  http://www.w3.org/Protocols/
  http://www.w3.org/Protocols/rfc2616/rfc2616.html
  http://www.w3.org/Library/Examples/
  http://www.w3.org/Library/User/Applications.html

Error Messages
When a Web server receives an illegal request, it usually generates an error message in valid HTML, which the browser then displays to the user.

Error Message Codes and Meanings
  400  Wrong request syntax.
  401  Authorization required. A list of allowed authorization schemes is also sent.
  402  Payment required (reserved for future use).
  403  Forbidden resource.
  404  The server cannot find the requested URL.
  405  The resource was accessed using a method that is not allowed.
  406  Resource type incompatible with what the client accepts.
  410  Resource no longer available, and no forwarding information exists.
  500  The server has encountered an internal error and cannot continue with the request.
  501  The server does not support the method of a legal request.
  502  A secondary server did not return a valid response.
  503  The service is unavailable because the server is too busy.
  504  A secondary server took too long to respond.

Persistent Connections And Lengths
What is a persistent connection?
Once a client opens a TCP connection to a particular server, the client leaves the connection in place across multiple requests and responses. When either the client or the server is ready to close the connection, it informs the other side, and the connection is closed.
* Advantage: reduced overhead.
* Disadvantage: the beginning and end of each item sent over the connection must be identified.

Data Length And Program Output
To allow a TCP connection to persist through multiple requests and responses, HTTP sends a length before each response.
A server that generates a page dynamically may not know its length in advance. To provide for dynamic Web pages, the HTTP standard specifies that if the server does not know the length of an item, it can inform the browser (client) that it will close the connection after transmitting the item.

Length Encoding and Headers
HTTP borrows its header format from e-mail: RFC 822 format with MIME extensions.
Example:
  Keyword: information
  Content-Length: 34
  Content-Language: en
  Content-Encoding: ascii
Other header:
  Connection: close    (used when the server does not know the length)

Negotiation
In addition to specifying details about an item being sent, HTTP uses headers to permit a client and server to "negotiate" capabilities.
Negotiable capabilities include characteristics of the:
* connection
* representation
* control
* content
Two basic types of negotiation:
* Server-driven (the server selects)
* Agent-driven (two steps)

Negotiation (continued): Agent-Driven
Agent-driven negotiation takes two steps:
1. The browser sends a request asking the server what is available, and the server returns a list of possibilities.
2. The browser selects one of the possibilities and sends a second request to obtain the item.
* Advantage: the browser has full control over the choice.
* Disadvantage: obtaining a single item requires two requests.

Accept Header
The browser uses the Accept header to specify which media types or representations are acceptable.
Example:
  Accept: text/html, text/plain; q=0.5, text/x-dvi; q=0.8
q is the preference level, from 0 to 1; a type listed without q defaults to 1.
Related headers:
* Accept-Encoding
* Accept-Charset
* Accept-Language

Conditional Request
HTTP allows a sender to make a request conditional.
Example:
  If-Modified-Since: Sat, 01 Jan 2000 05:00:01 GMT
(The server sends the item only if it has changed since that time, so the browser avoids fetching a copy that has not been updated since January 1, 2000.)

Proxy Server
A local server that is configured to cache copies of Web pages from their original source.
Advantages:
1. Decreases latency
2. Reduces the load on servers
To guarantee correctness, HTTP includes explicit support for proxy servers. The standard specifies:
1. How a proxy handles each request
2. How headers should be interpreted
3. How a browser negotiates with a proxy

Caching
The goal of caching is to improve efficiency: it reduces both latency and network traffic.
How long should an item be kept in a cache?
HTTP provides two ways to control caching:
1. The server can specify caching details.
2. The browser can force revalidation.

Summary
The World Wide Web consists of hypermedia documents stored on a set of Web servers and accessed by browsers.
Each document is assigned a URL that uniquely identifies it.
A browser and a server use HTTP to communicate.
HTTP is an application-level protocol with explicit support for negotiation, proxy servers, caching, and persistent connections.
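
As a closing illustration of the negotiation support summarized above, the following Python sketch shows one way a server might rank the media types in an Accept header by their q values and pick a representation it can supply. The function names parse_accept and choose_representation are hypothetical, and the sketch assumes exact type matching only (no wildcards such as */*); it is not taken from the HTTP standard or from any particular server.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a type listed without q defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value.strip())
        prefs.append((media_type, q))
    # The sort is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_representation(accept_header, available):
    """Server-driven choice: best acceptable type the server has, or None."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in available:
            return media_type
    return None


header = "text/html, text/plain; q=0.5, text/x-dvi; q=0.8"
print(parse_accept(header))
print(choose_representation(header, ["text/plain", "text/x-dvi"]))
```

With the Accept example from the slides, a server that holds only plain-text and DVI versions would select text/x-dvi, because its q value (0.8) is higher than that of text/plain (0.5).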