HTTP WEB
Risanuri Hidayat, Ir., M.Sc.

World Wide Web
T. Berners-Lee, R. Fielding, H. Frystyk: “Hypertext Transfer Protocol - HTTP/1.0”, RFC 1945, 1996.
• Naming scheme for resources: URL, URN, URI
• Multimedia documents: MIME encoding (RFC)
• Transfer protocol: HTTP/1.0, HTTP/1.1
• Implemented over TCP/IP
• Integrated with Internet infrastructure: DNS, SMTP

History
• Hypertext systems: no network access protocol
• Gopher, WAIS: no hyperlinks
• WWW @ CERN (Tim Berners-Lee, 1990)
• HTTP/0.9 (1992)

Internet Applications
Application               Application layer protocol        Underlying transport protocol
e-mail                    smtp [RFC 821]                    TCP
remote terminal access    telnet [RFC 854]                  TCP
Web                       http [RFC 2068]                   TCP
file transfer             ftp [RFC 959]                     TCP
streaming multimedia      proprietary (e.g. RealNetworks)   TCP or UDP
remote file server        NFS                               TCP or UDP
Internet telephony        proprietary (e.g. Vocaltec)       typically UDP

What is HTTP
HTTP stands for Hypertext Transfer Protocol. It is the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they are HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets (and this tutorial ignores other possibilities).
A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
HTTP is used to transmit resources, not just files.
A resource is some chunk of information that can be identified by a URL.

Request:
  method   URL or pathname                   HTTP version   headers   message body
  GET      //www.dcs.qmw.ac.uk/index.html    HTTP/1.1
Reply:
  HTTP version   status code   reason   headers   message body
  HTTP/1.1       200           OK                 resource data
• Resource := MIME-encoded data
• Content negotiation
• Authentication
Methods:
• GET, HEAD, POST
• PUT, DELETE, TRACE, OPTIONS, CONNECT

URL
URL: http://www.cdk3.net:8888/WebExamples/earth.html
DNS lookup gives the resource ID (IP number, port number, pathname):
  55.55.55.55   8888   WebExamples/earth.html
Web server: network address 2:60:8c:2:b0:5a, socket, file

HTTP Transactions
HTTP uses the client-server model: an HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions).

HTTP Protocol
http: hypertext transfer protocol
• WWW’s application layer protocol
• client/server model
  client: browser that requests, receives, “displays” WWW objects
  server: WWW server sends objects in response to requests
• http1.0: RFC 1945
• http1.1: RFC 2068
Figure: a PC running Explorer and a SUN running Netscape Navigator (clients) exchanging messages with a server running the Apache Web server.

HTTP Protocol
http: TCP transport service:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• http messages (application-layer protocol messages) exchanged between browser (http client) and WWW server (http server)
• TCP connection closed
http is “stateless”:
• server maintains no information about past client requests
Protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent and must be reconciled

HTTP Protocol
The formats of the request and response messages are similar, and English-oriented.
Both kinds of messages consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).

Request
Initial Request Line
A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:
  GET /path/to/file/index.html HTTP/1.0
• GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD; more on those later. Method names are always uppercase.
• The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
• The HTTP version always takes the form "HTTP/x.x", uppercase.

HTTP Request Header Format
Two types of messages: request, response.
http request message: ASCII (human-readable format)
  GET /somedir/page.html HTTP/1.1            request line (GET, POST, HEAD commands)
  Connection: close                          header lines
  User-agent: Mozilla/4.0
  Accept: text/html, image/gif,image/jpeg
  Accept-language:en
  (extra carriage return, line feed)         carriage return, line feed indicates end of message

Response/Reply
Initial Response Line (Status Line)
The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:
  HTTP/1.0 200 OK
or
  HTTP/1.0 404 Not Found

HTTP Reply Header Format
  HTTP/1.1 200 OK                            status line (protocol, status code, status phrase)
  Connection: close                          header lines
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …...
  Content-Length: 6821
  Content-Type: text/html

  data data data data data ...               data, e.g., requested html file
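The message layout described above (initial line, header lines, a blank line, optional body) can be sketched in a few lines of Python. This is only an illustration: the helper names `build_request` and `parse_response` are hypothetical, no network is involved, and header names are lower-cased for lookup as a simplifying assumption.

```python
CRLF = "\r\n"


def build_request(method, path, headers, version="HTTP/1.0"):
    """Assemble a request: request line, header lines, then a blank line."""
    lines = [f"{method.upper()} {path} {version}"]  # method names are uppercase
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # End the last line with CRLF, then add a CRLF by itself (the blank line).
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_response(raw):
    """Split a raw response into status-line parts, headers and body."""
    # Everything before the first blank line is the status line plus headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split(CRLF)
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/path/to/file/index.html", {"User-Agent": "HTTPTool/1.0"}))
    sample = (b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
              b"Content-Length: 4\r\n\r\ndata")
    print(parse_response(sample))
```

Splitting the status line at most twice matters because a reason phrase such as "Not Found" contains a space of its own.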
HTTP Reply Status Code
• 200 OK: request succeeded, requested object later in this message
• 301 Moved Permanently: requested object moved, new location specified later in this message (Location:)
• 400 Bad Request: request message not understood by server
• 404 Not Found: requested document not found on this server
• 505 HTTP Version Not Supported

Sample HTTP Exchange
To retrieve the file at the URL
  http://www.somehost.com/path/file.html
first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:
  GET /path/file.html HTTP/1.0
  From: [email protected]
  User-Agent: HTTPTool/1.0
  [blank line here]

Sample HTTP Exchange
The server should respond with something like the following, sent back through the same socket:
  HTTP/1.0 200 OK
  Date: Fri, 31 Dec 1999 23:59:59 GMT
  Content-Type: text/html
  Content-Length: 1354

  <html>
  <body>
  <h1>Happy New Millennium!</h1>
  (more file contents)
  . . .
  </body>
  </html>
After sending the response, the server closes the socket.

User-server interaction: authentication
Authentication goal: control access to server documents
• stateless: client must present authorization in each request
• authorization: typically name, password
  authorization: header line in request
  if no authorization presented, server refuses access, sends a WWW authenticate: header line in response
Message sequence over time:
  client -> server: usual http request msg
  server -> client: 401: authorization req., WWW authenticate:
  client -> server: usual http request msg + Authorization: line
  server -> client: usual http response msg
  client -> server: usual http request msg + Authorization: line
  server -> client: usual http response msg

User-server interaction: cookies
• Server sends “cookie” to client in response: Set-cookie: #
• Client presents cookie in later requests: cookie: #
• Server matches presented cookie with server-stored cookies
  authentication
  remembering user preferences, previous choices
Message sequence:
  client -> server: usual http request msg
  server -> client: usual http response + Set-cookie: #
  client -> server: usual http request msg, cookie: #
  server: cookie-specific action
  server -> client: usual http response msg
  client -> server: usual http request msg, cookie: #
  server: cookie-specific action
  server -> client: usual http response msg

User-server interaction: conditional GET
• Goal: don’t send object if client has up-to-date stored (cached) version
• client: specify date of cached copy in http request
  If-modified-since: <date>
• server: response contains no object if cached copy up-to-date:
  HTTP/1.0 304 Not Modified
Message sequence:
  object not modified:
    client -> server: http request msg, If-modified-since: <date>
    server -> client: http response, HTTP/1.0 304 Not Modified
  object modified:
    client -> server: http request msg, If-modified-since: <date>
    server -> client: http response, HTTP/1.1 200 OK … <data>

Message format: multimedia extensions
• MIME: Multipurpose Internet Mail Extensions, RFC 2045, 2046
• additional lines in msg header declare MIME content type
  From: [email protected]
  To: [email protected]
  Subject: Picture of yummy crepe.
  MIME-Version: 1.0                        MIME version
  Content-Transfer-Encoding: base64        method used to encode data
  Content-Type: image/jpeg                 multimedia data type, subtype, parameter declaration

  base64 encoded data .....                encoded data
  .........................
  ......base64 encoded data
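A message like the MIME example above can be produced with Python's standard `email` package. The sketch below is illustrative only: the function name `build_image_message`, the addresses and the four-byte "picture" are made-up placeholders, not values from the slides.

```python
import base64
from email.message import EmailMessage


def build_image_message(sender, recipient, subject, jpeg_bytes):
    """Build a single-part message whose body is a base64-encoded JPEG."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Passing bytes with a maintype/subtype sets Content-Type to image/jpeg,
    # base64-encodes the payload, and adds MIME-Version: 1.0.
    msg.set_content(jpeg_bytes, maintype="image", subtype="jpeg")
    return msg


# Placeholder data: a few bytes standing in for a real JPEG file.
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"not a real picture"
msg = build_image_message(
    "alice@example.org",   # hypothetical sender
    "bob@example.org",     # hypothetical recipient
    "Picture of yummy crepe.",
    fake_jpeg,
)

if __name__ == "__main__":
    print(msg.as_string())
    # Decoding the payload recovers the original bytes.
    print(base64.b64decode(msg.get_payload()) == fake_jpeg)
```

The receiver reads Content-Transfer-Encoding to learn how to decode the body, and Content-Type to learn what to do with the decoded bytes.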
MIME types
• Text: example subtypes: plain, html
• Image: example subtypes: jpeg, gif
• Audio: example subtypes: basic (8-bit mu-law encoded), 32kadpcm (32 kbps coding)
• Video: example subtypes: mpeg, quicktime
• Application: other data that must be processed by reader before “viewable”; example subtypes: msword, octet-stream

HTTP Headers (samples)
• User-Agent: Mozilla/4.0
• Accept: (client-side) text/html, image/*
• Content-type: (server-side) text/html
• Expires, Last-Modified, If-Modified-Since: absolute time stamps (1-sec resolution), e.g. Thu, 03 Jun 1999 20:16:34 GMT
• Accept-Language, Accept-Charset
• Content-encoding
Mean #bytes per header: 300 (requests), 160 (responses). Headers require parsing!

HTTP/1.1 Improvements
• B/W optimization
• persistent connections
• pipelining: does not block waiting for previous responses
• end-of-message mechanism
• Content-range: access only specified “range” of a resource
• Explicit cache control (Cache-control)
• Digest authentication; message integrity digest (Content-MD5)

Web Caches (proxy server)
Goal: satisfy client request without involving origin server
• User sets browser: WWW accesses via web cache
• client sends all http requests to web cache
  if object at web cache, web cache immediately returns object in http response
  else requests object from origin server, then returns http response to client
Figure: two clients connected to a proxy server, which is connected to two origin servers.

Why WWW Caching?
Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache “closer” to client
• decrease traffic to distant servers: link out of institutional/local ISP network is often the bottleneck
Figure: origin servers on the public Internet, reached over a 1.5 Mbps access link from an institutional network (10 Mbps LAN) that hosts the institutional cache.

Web caching (in)effectiveness
• Observed hit ratios below 50%
• even lower byte-weighted ratios!
Possible remedies?
• Prefetching
• Delta-encoding
• HTML macros
• Duplicate suppression (digest-based)

HTTP status & perspective
J. C. Mogul, “What’s wrong with HTTP (and why it doesn’t matter)”, Proc. USENIX Technical Conference, 1999
• Definitely not optimal
• Probably adequate
• It works well enough
• It’s not the only game in town
• Two-way initiation of operations
• Real-time
• Deferred delivery
• Revising it again would be too hard
  HTTP/1.0 -> HTTP/1.1 evolution took 4+ years!
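Returning to the Web cache and conditional GET slides, the two ideas combine naturally: a cache can keep a copy of an object and ask the origin server whether it has changed before reusing it. The following is a minimal in-memory sketch under stated assumptions. `OriginServer`, `WebCache` and `Response` are hypothetical classes, no sockets are used, the date strings are invented, and the server compares the If-modified-since value for equality rather than parsing and comparing dates as a real server would. Unlike the slide's cache, which returns a stored object immediately, this one revalidates on every request.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: bytes = b""
    last_modified: str = ""


class OriginServer:
    """Stand-in for an origin server holding one object per path."""

    def __init__(self):
        self.objects = {}        # path -> (last_modified, body)
        self.full_transfers = 0  # how many times a body was actually sent

    def put(self, path, last_modified, body):
        self.objects[path] = (last_modified, body)

    def get(self, path, if_modified_since=None):
        if path not in self.objects:
            return Response(404)
        last_modified, body = self.objects[path]
        # Simplification: equal time stamps mean "not modified".
        if if_modified_since == last_modified:
            return Response(304, last_modified=last_modified)
        self.full_transfers += 1
        return Response(200, body, last_modified)


class WebCache:
    """Proxy cache that revalidates its copy with a conditional GET."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}  # path -> cached 200 response

    def get(self, path):
        cached = self.store.get(path)
        stamp = cached.last_modified if cached else None
        reply = self.origin.get(path, if_modified_since=stamp)
        if reply.status == 304:
            return cached            # cached copy is still up to date
        if reply.status == 200:
            self.store[path] = reply  # remember the fresh copy
        return reply


if __name__ == "__main__":
    origin = OriginServer()
    origin.put("/page.html", "Thu, 06 Aug 1998 12:00:15 GMT", b"v1")
    cache = WebCache(origin)
    print(cache.get("/page.html").body, origin.full_transfers)
    print(cache.get("/page.html").body, origin.full_transfers)
```

The second request is answered from the cache after a 304 reply, so the object body crosses the link to the origin server only once.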