A Potted View of the Web
Who, where, when, why.
Approximately.
Lee Gillam
Department of Computing
Key Milestones - some developments
• ARPANET: c1967, decentralized computer network (US DoD) =>
INTERNET: c1983, relying on TCP/IP as a means to split data into
packets and route them to computers
• Email: c1972, software for ARPANET
• Personal Computer (affordable?): c1980, Apple Lisa/Mac and
IBM PC
• SGML: c1970; Hypertext: c1987, Apple’s HyperCard
• Web Browser: c1990, Tim Berners-Lee then at CERN, the
European Organization for Nuclear Research
– Hypertext + Internet + PC to produce an information network
enabling physicists at CERN to share experimental results
– First webpage:
http://info.cern.ch/hypertext/WWW/TheProject.html
Key Milestones - some developments
• Browsers - TBL's browser, called “WorldWideWeb”; Mosaic: 1993;
Opera: 1994; Mozilla (Netscape): 1994; Internet Explorer: 1995
… IE, Firefox, Safari, …
• Web Servers - Apache, IIS, …..
• Search Engines - AltaVista, Google, MSN, …..
• Estimated size of the web: 125m sites (July 2007); 88m (July
2006). Google: ~5bn pages (+/- 14bn) in English. English: about
68% in 2001; 35% by 2004.
• What else? Video, Audio, IM, ….
• What next?
– Semantic Web - the machine-understandable web (TBL)
How does it work?
• Using the Domain Name System (DNS)
– A distributed database that converts names to addresses.
Overall management by the Internet Corporation for Assigned
Names and Numbers (ICANN). US, 1998.
– http://www.bbc.co.uk (= http://www.bbc.tv) =>
http://212.58.224.116 (http://212.58.224.116:80/)
– Hypertext Transfer Protocol (HTTP): for moving
information (HTML) around the web.
– HTTP client establishes a TCP connection to a port
(usually 80) [email (SMTP) uses port 25]; HTTP server (program)
responds with messages (e.g. 404 error) and data
(HTML).
– Country code top-level domains (ccTLD): Nominet for .uk
(.gb); .tv = Tuvalu, an island nation in the Pacific; .eu.
Generic top-level domains (gTLD): .com, .net, .org, .travel, …
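The steps above (name to address, TCP connection to a port, HTTP request and response) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `split_url`, `resolve`, `build_request` and `parse_status_line` are made up for this example, and `resolve` needs a working network connection, so the address it returns today may differ from the one shown on the slide.

```python
import socket
from urllib.parse import urlsplit

# Default ports assumed for this sketch (80 is the usual HTTP port).
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Return (host, port) for a URL, filling in the default port if none is given."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


def resolve(host):
    """Ask the system resolver (which consults DNS) for an IPv4 address."""
    return socket.gethostbyname(host)


def build_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason
```

For example, `split_url("http://www.bbc.co.uk")` gives the host name and port 80, `resolve` would then turn the host name into an address, and `parse_status_line` picks the 404 out of a server's "not found" reply.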
How does it work?
• Think about a distributed telephone conversation
– “26”
– “26 what?”
– “26 days”
– “Full days or working days?”
• What do we assume?
How does it work?
– “<p>”
– “<p>?”
– “??”
– “<para>”
– “An unknown or unexpected error has occurred.”
• Communication and commonality (standards) are
vital (TCP/IP, HTTP, HTML, …). Machines are
dumb. The browser still has to present the
(represented) information somehow.
How does it work?
• HTML: Elements, Attributes and Values
<a href="http://www.cs.surrey.ac.uk">Surrey Computing</a>
– Tags delimit elements: e.g. <a>, </a>
– Elements are additionally specified by attributes, e.g. href
– Values fill elements or attributes, e.g.
http://www.cs.surrey.ac.uk and Surrey Computing
– Elements should be closed (with a corresponding </ >), though browsers can be forgiving
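One way to see elements, attributes and values as a program sees them is to run the link above through a parser. The sketch below uses Python's standard `html.parser` module. The class name `Collector` and the helper `parse` are invented for this illustration, and the sketch is not a description of how a browser does it.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record start tags (with attributes), text values and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("href", "http://...")]
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if data.strip():
            self.events.append(("text", data.strip()))


def parse(html):
    """Return the list of events found in an HTML fragment."""
    collector = Collector()
    collector.feed(html)
    collector.close()
    return collector.events


events = parse('<a href="http://www.cs.surrey.ac.uk">Surrey Computing</a>')
for event in events:
    print(event)
```

The start tag carries the attribute `href` and its value, the text "Surrey Computing" is the value filling the element, and the end tag closes it.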
How does it work?
• HTML: Elements, Attributes and Values
– Elements and values are embedded, e.g.
<h1>
Visit the
<a
href="http://www.cs.surrey.ac.uk">
Surrey Computing
</a>
Website
</h1>
– 1[h1 2[text, a 3[@href 4[text]4, text]3, text]2 ]1
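The numbered-bracket view of the nesting can be reproduced, roughly, by tracking depth while parsing. The following is a minimal sketch with Python's standard `html.parser`. The class name `Outline`, the helper `outline`, and the choice to count an attribute's value as one level below the attribute are assumptions made to mirror the notation above.

```python
from html.parser import HTMLParser


class Outline(HTMLParser):
    """Collect (depth, label) pairs: elements, @attributes and text values."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.items.append((self.depth, tag))
        for name, _value in attrs:
            # An attribute sits one level inside its element,
            # and its value one level inside the attribute.
            self.items.append((self.depth + 1, "@" + name))
            self.items.append((self.depth + 2, "text"))

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        # Skip whitespace-only text such as the line breaks between tags.
        if data.strip():
            self.items.append((self.depth + 1, "text"))


def outline(html):
    """Return the nesting outline of an HTML fragment."""
    parser = Outline()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """<h1>
Visit the
<a
href="http://www.cs.surrey.ac.uk">
Surrey Computing
</a>
Website
</h1>"""

for depth, label in outline(SAMPLE):
    print("  " * (depth - 1) + label)
```

Each indent level in the printed outline corresponds to one bracket number in the notation: h1 at level 1, its text and the a element at level 2, @href and the link text at level 3, and the href value at level 4.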