A Potted View of the Web
Who, where, when, why. Approximately.
Lee Gillam, Department of Computing

Key Milestones - some developments
• ARPANET: c1967, decentralized computer network (US DoD) => INTERNET: c1983, relying on TCP/IP as a means to split data into packets and route them to computers
• Email: c1972, software for ARPANET
• Personal Computer (affordable?): c1980, Apple Lisa/Mac and IBM PC
• SGML: c1970; Hypertext: c1987, Apple's HyperCard
• Web Browser: c1990, Tim Berners-Lee, then at CERN, the European Organization for Nuclear Research
  – Hypertext + Internet + PC to produce an information network enabling physicists at CERN to share experimental results
  – First webpage: http://info.cern.ch/hypertext/WWW/TheProject.html
• Browsers - TBL's browser, called “WorldWideWeb”; Mosaic: 1993; Opera: 1994; Mozilla (Netscape): 1994; Internet Explorer: 1995 ….. IE, Firefox, Safari, ….
• Web Servers - Apache, IIS, …..
• Search Engines - AltaVista, Google, MSN, …..
• Estimated size of the web: 125m sites (July 2007); 88m (July 2006). Google: ~5bn pages (+/- 14bn) in English. English: about 68% in 2001; 35% by 2004.
• What else? Video, Audio, IM, ….
• What next?
  – Semantic Web - the machine-understandable web (TBL)

How does it work?
• Using the Domain Name System (DNS)
  – A distributed database that converts names to addresses. Overall management by the Internet Corporation for Assigned Names and Numbers (ICANN), founded in the US in 1998.
  – http://www.bbc.co.uk (= http://www.bbc.tv) => http://212.58.224.116 (http://212.58.224.116:80/)
  – Hypertext Transfer Protocol (HTTP): for moving information (HTML) around the web.
  – The HTTP client establishes a TCP connection to a port (usually 80) [email (SMTP) uses port 25]; the HTTP server (program) responds with messages (e.g. a 404 error) and data (HTML).
  – Country code top-level domains (ccTLD): .uk (.gb), run by Nominet; .tv = Tuvalu, an island nation in the Pacific; .eu. Generic top-level domains (gTLD): .com, .net, .travel, .org ….
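As a minimal, self-contained sketch of the DNS lookup and HTTP exchange described above, the Python below resolves a host name to an address, opens a TCP connection to a port and sends a bare GET request by hand, then reads back the status (the "message") and the HTML (the "data"). To stay runnable offline it assumes a throwaway local server (the hypothetical `TinyHandler`, built on the standard library's `http.server`) in place of a real site, and it resolves `localhost`, which most systems answer from the hosts file rather than from DNS proper. The names `TinyHandler`, `start_server` and `http_get` are illustrative only.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TinyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a web server: one HTML page at "/", 404 elsewhere."""

    def do_GET(self):
        if self.path == "/":
            body = b"<h1>Hello</h1>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)  # the data (HTML)
        else:
            self.send_error(404)  # the message (e.g. a 404 error)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server() -> tuple[HTTPServer, int]:
    """Start TinyHandler on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", 0), TinyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def http_get(host: str, port: int, path: str) -> tuple[int, bytes]:
    """Resolve host, open a TCP connection, send an HTTP/1.0 GET; return (status, body)."""
    address = socket.gethostbyname(host)  # name -> IPv4 address
    with socket.create_connection((address, port), timeout=5) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:  # HTTP/1.0: the server closes the connection when it is done
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.0 200 OK"
    return int(status_line.split()[1]), body


# Usage: fetch an existing page and a missing one from the local server.
server, port = start_server()
home = http_get("localhost", port, "/")
missing = http_get("localhost", port, "/missing")
server.shutdown()
server.server_close()
print(home[0], missing[0])  # 200 404
```

Pointed at a real site on port 80, the same request would go through DNS proper, although many sites now redirect plain HTTP to HTTPS, so the status returned may be a redirect rather than 200.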
• Think about a distributed telephone conversation:
  – “26”
  – “26 what?”
  – “26 days”
  – “Full days or working days?”
• What do we assume?

<p>  <p>?  ??  <para>  An unknown or unexpected error has occurred.
• Communication and commonality (standards) are vital (TCP/IP, HTTP, HTML, ….) ….. Machines are dumb …..
• The browser still has to present the (represented) information somehow.

• HTML: Elements, Attributes and Values
  <a href="http://www.cs.surrey.ac.uk">Surrey Computing</a>
  – Elements delimit: e.g. <a>, </a>
  – Elements are additionally specified by attributes, e.g. href
  – Values fill elements or attributes, e.g. http://www.cs.surrey.ac.uk and Surrey Computing
  – Elements should be closed (with a corresponding </ >), although browsers can be forgiving
  – Elements and values are embedded (nested), e.g.
    <h1> Visit the <a href="http://www.cs.surrey.ac.uk"> Surrey Computing </a> Website </h1>
  – 1[h1 2[text, a 3[@href 4[text]4, text]3, text]2]1
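The element/attribute/value structure, and the nesting captured by the bracket notation, can be made visible with Python's standard `html.parser` module. The sketch below (the class `ElementTreePrinter` and function `outline` are illustrative names) records each element, each attribute with its value, and each text value, indented by nesting depth. It also keeps a stack of open elements so that missing closing tags can be reported, since `html.parser`, like a forgiving browser, accepts them without complaint.

```python
from html.parser import HTMLParser


class ElementTreePrinter(HTMLParser):
    """Record elements, attributes and text values, indented by nesting depth."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self.open_tags: list[str] = []  # stack of elements not yet closed

    def _indent(self) -> str:
        return "  " * len(self.open_tags)

    def handle_starttag(self, tag, attrs):
        self.lines.append(self._indent() + tag)  # element
        self.open_tags.append(tag)
        for name, value in attrs:  # attributes and the values that fill them
            self.lines.append(self._indent() + f"@{name} = {value}")

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:  # text value inside the current element
            self.lines.append(self._indent() + f"text: {text}")


def outline(html: str) -> tuple[list[str], list[str]]:
    """Return (indented outline, elements left unclosed) for an HTML fragment."""
    parser = ElementTreePrinter()
    parser.feed(html)
    parser.close()
    return parser.lines, parser.open_tags


lines, unclosed = outline(
    '<h1> Visit the <a href="http://www.cs.surrey.ac.uk"> Surrey Computing </a> Website </h1>'
)
print("\n".join(lines))
print(unclosed)  # []
```

The printed outline mirrors the bracket notation: `h1` contains a text value, an `a` element (holding `@href` and its own text) and a final text value. One simplification: void elements such as `<br>` never take a closing tag, so this sketch would wrongly report them as unclosed.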