Systems and Computer Engineering

Finding It on the Web
Search Engines, Tags, Digg
Introduction
• Every web resource has a name: a
unique address, or URL (Uniform
Resource Locator), of the form:
• www.webmama.com or
• This course
• Underlying the URL is an IP address: a unique
32-bit number, usually written in dotted decimal
form as four numbers from 0 to 255, e.g., 134.117.254.227
• In binary: 10000110 01110101 11111110 11100011
Winter 2008
Learning in Retirement - The Evolution of the Web
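Converting between the two forms above is simple arithmetic: each of the four decimal numbers is one 8-bit group, so the whole address is a single 32-bit number (which is also why the dotted form has at most 12 digits). A minimal Python sketch; the function names are illustrative only:

```python
def octets(dotted: str) -> list[int]:
    """Split a dotted-decimal IPv4 address into its four numbers (0-255)."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted-decimal IPv4 address: {dotted!r}")
    return parts


def to_binary(dotted: str) -> str:
    """Write each number as an 8-bit group: 4 x 8 = 32 bits in total."""
    return " ".join(format(p, "08b") for p in octets(dotted))


def to_integer(dotted: str) -> int:
    """Read the four groups as the digits of one base-256 number."""
    value = 0
    for p in octets(dotted):
        value = value * 256 + p
    return value
```

For the slide's example, `to_binary("134.117.254.227")` gives `10000110 01110101 11111110 11100011` and `to_integer` gives 2255879907. Python's standard `ipaddress` module performs the same conversion via `int(ipaddress.IPv4Address("134.117.254.227"))`.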
Finding the IP Address
• Finding the IP Address that
corresponds to a specific URL is a
service that is part of the Internet
called the Domain Name System, or
DNS.
• That’s all well and good, but what if
you don’t know the URL of the site
where the information you want is
stored?
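In practice a program rarely talks to DNS directly; it asks the operating system's resolver, which checks local sources such as the hosts file and then queries DNS servers. A minimal Python sketch using the standard `socket` module (the helper name `resolve` is illustrative):

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname.

    The resolver consults local sources and DNS; a name with no IPv4
    address raises socket.gaierror.
    """
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4,
    # sockaddr is (address, port). The same address repeats once per
    # socket type, so a set removes the duplicates.
    return sorted({info[4][0] for info in infos})
```

`resolve("www.google.com")` needs network access and returns whatever addresses DNS reports at that moment; an address that is already numeric, such as `"134.117.254.227"`, comes back unchanged.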
Search Engine
• That is where Search Engines come
into play.
• They help you find a web site, or an item
within a web site.
• Some popular search engines are:
Google, Yahoo! Search, Ask.com, Live
Search, Technorati, Alexa Internet.
Types of Search Engines
General – Open Source – Metasearch – Regional – People – Email-based – Visual – Job – Forum – Blog – News – Multimedia – Code – BitTorrent (P2P file transfer) – Accountancy – Medical – Property – Legal – Business – Comparison Shopping – Geographic – Social – Search engines for kids – Desktop search engines – Answer-based – Google-based – Yahoo!-based – Ask.com-based
General Search Engines
• Alexa Internet
• Ask.com (formerly Ask Jeeves)
• Exalead
• Gigablast
• Google
• Live Search (formerly MSN Search)
• MozDex
• WiseNut
• Yahoo! Search
Market Share – Search Engines
Google
• Google's mission is to organize the world's
information and make it universally
accessible and useful.
• As a first step to fulfilling that mission,
Google's founders Larry Page and Sergey
Brin developed a new approach to online
search that took root in a Stanford University
dorm room and quickly spread to
information seekers around the globe.
• Google is now widely recognized as the
world's largest search engine: an easy-to-use
free service that usually returns relevant
results in a fraction of a second.
Google Query Handling
1. The web server sends the query to the index
servers. The content inside the index servers is
similar to the index in the back of a book: it tells
which pages contain the words that match the query.
2. The query travels to the doc servers, which
actually retrieve the stored documents.
Snippets are generated to describe each search
result.
3. The search results are returned to the user in a
fraction of a second.
http://www.google.ca/intl/en/corporate/tech.html
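The three steps can be mimicked on a toy scale with an inverted index, the data structure behind the "back of a book" analogy. This is a simplified Python sketch; the corpus, the function names and the snippet rule are hypothetical, and Google's real servers are far more elaborate:

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for the doc servers: id -> stored text.
DOCS = {
    1: "Carleton University is in Ottawa",
    2: "Google was started in a Stanford University dorm room",
    3: "Ottawa is the capital of Canada",
}


def build_index(docs):
    """Index servers: map each word to the ids of the documents containing it,
    like the index at the back of a book."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def snippet(text, terms, width=5):
    """Doc servers: a few words around the first query word in the text."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in terms:
            start = max(0, i - 2)
            return " ".join(words[start:start + width])
    return " ".join(words[:width])


def search(query, index, docs):
    terms = set(query.lower().split())
    # Step 1: look up each query word; keep documents containing all of them.
    postings = [index.get(t, set()) for t in terms]
    matches = set.intersection(*postings) if postings else set()
    # Step 2: fetch each matching document and build its snippet.
    # Step 3: return the results (in id order here; real engines rank them).
    return [(d, snippet(docs[d], terms)) for d in sorted(matches)]
```

The index is built once, in advance; answering a query then only touches the index entries for the query words and the few documents they point to, which is what makes a fraction-of-a-second response possible.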
PageRank Technology
• PageRank reflects Google's view of the importance of
web pages by considering more than 500 million
variables and 2 billion terms. Pages that Google believes
are important pages receive a higher PageRank and are
more likely to appear at the top of the search results.
• PageRank treats a link from page A to page B as a
vote by page A for page B. It also considers the
importance of each page that casts a vote: votes from
important pages carry more weight and so give the
linked page greater value.
• Google's technology uses the collective intelligence of the
web to determine a page's importance. There is no
human involvement or manipulation of results, which is
why users have come to trust Google as a source of
objective information untainted by paid placement.
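The voting idea can be written down as the PageRank formula published by Page and Brin, here in its common normalized form: each page splits its current rank evenly among the pages it links to, and a damping factor (0.85 is the usual choice) models a reader who sometimes jumps to a random page instead of following a link. A minimal power-iteration sketch in Python; the sample web is made up, and this textbook version leaves out everything else Google combines with PageRank:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to (its "votes").
    Each round, a page passes damping * its rank, split evenly, to the pages
    it links to; every page also gets a baseline of (1 - damping) / N.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B links to C, and so on.
SAMPLE_WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
```

In the sample web, C collects votes from A, B and D and ends up with the highest rank; D, which nothing links to, keeps only the baseline (1 - 0.85) / 4 = 0.0375. A also ranks high, not because many pages link to it but because its one vote comes from the important page C.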
Hypertext-Matching Analysis
• Google's search engine also analyzes page
content. However, instead of simply
scanning for page-based text (which can be
manipulated by site publishers through
meta-tags), Google's technology analyzes
the full content of a page and factors in
fonts, subdivisions and the precise location
of each word.
• Google also analyzes the content of
neighboring web pages to ensure the results
returned are the most relevant to a user's
query.
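The idea of factoring in where and how a word appears can be illustrated with a toy scoring function. This is only a sketch under stated assumptions: the section names and weights are hypothetical (Google has not published its real factors or weights), and the analysis of neighbouring pages is not modelled:

```python
# Hypothetical weights: Google has not published its real factors or weights.
SECTION_WEIGHT = {"title": 3.0, "heading": 2.0, "bold": 1.5, "body": 1.0}


def hypertext_score(page, query):
    """Score a page for a query from where and how the query words appear.

    page is a list of (section, text) pairs in document order, where section
    is one of the keys of SECTION_WEIGHT.
    """
    terms = set(query.lower().split())
    score = 0.0
    position = 0  # index of the current word within the whole page
    for section, text in page:
        for word in text.lower().split():
            if word in terms:
                # Words in prominent sections and near the top count more.
                score += SECTION_WEIGHT[section] / (1 + 0.01 * position)
            position += 1
    return score
```

With these weights, a page whose title is "Carleton University" outscores a page that only mentions the same two words deep in its body text, even though both contain the query words.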
Google’s Computing Power
• Google searches are conducted by custom-built software on hundreds of thousands of
custom-built PCs housed in huge “computer
farms” scattered across the world.
• “The largest computer system in the world”
• “Working together, these customized
computers rapidly carry out searches by
breaking the queries down into tiny parts.
These parts are processed simultaneously
by comparing them to copies of the
Internet that have been indexed and
organized in advance.”
Quotes from: The Google Story, David A. Vise, PAN Books, 2006.
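The quote does not say exactly how the work is divided. One common design, assumed here, splits the pre-built index into shards that cover different documents, sends the same query to every shard at once, and merges the partial answers. A toy Python sketch with hypothetical shards, using threads from the standard library to stand in for separate machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index split into shards; each shard covers different documents
# and maps a word to the ids of its documents containing that word.
SHARDS = [
    {"carleton": {1, 4}, "ottawa": {1}},
    {"carleton": {7}, "university": {7, 9}},
    {"ottawa": {12}, "university": {12}},
]


def search_shard(shard, terms):
    """One machine's part of the job: documents in its shard that contain
    every query word."""
    postings = [shard.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def parallel_search(query, shards):
    terms = query.lower().split()
    # Scatter: every shard works on the same query at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_results = list(pool.map(lambda s: search_shard(s, terms), shards))
    # Gather: merge the partial answers into one sorted list of ids.
    return sorted(set().union(*partial_results))
```

Because each shard is searched independently, adding machines lets each one handle a smaller slice of the index, so response time stays low as the indexed web grows.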
Ten things Google has found to be true
1. Focus on the user and all else will follow.
From its inception, Google has focused on
providing the best user experience possible.
Google has steadfastly refused to make any
change that does not offer a benefit to the users
who come to the site:
• The interface is clear and simple.
• Pages load instantly.
• Placement in search results is never sold to anyone.
• Advertising on the site must offer relevant content
and not be a distraction.
2. It's best to do one thing really, really
well.
3. Fast is better than slow.
4. Democracy on the web works.
5. You don't need to be at your desk to
need an answer.
6. You can make money without doing
evil.
Google’s corporate motto is:
“Don’t Be Evil”
7. There's always more information
out there.
8. The need for information crosses
all borders.
9. You can be serious without a suit.
10. Great just isn't good enough.
Comparison: “Carleton University”
• Google: 1,820,000 results (0.22 seconds)
• Yahoo!: 5,380,000 results (about 0.24 seconds)
• Live Search: 4,680,000 results
• Ask.com: 792,200 results
Carleton Plug
Google Executive Management Group
• Dr. Eric Schmidt, Chairman of the Board and Chief Executive Officer
• Larry Page, Co-Founder & President, Products
• Sergey Brin, Co-Founder & President, Technology
• Shona Brown, Senior Vice President, Business Operations
• W. M. Coughran, Jr., Senior Vice President, Engineering
• David C. Drummond, Senior Vice President, Corporate Development and Chief Legal Officer
• Alan Eustace, Senior Vice President, Engineering & Research
• Urs Hölzle, Senior Vice President, Operations & Google Fellow
• Jeff Huber, Senior Vice President, Engineering
• Omid Kordestani, Senior Vice President, Global Sales & Business Development
• George Reyes, Senior Vice President & Chief Financial Officer
• Jonathan Rosenberg, Senior Vice President, Product Management
• Laszlo Bock, Vice President, People Operations
• Elliot Schrage, Vice President, Global Communications & Public Affairs
Shona Brown
• Carleton B. Eng. ’87. Computer Systems Engineering
• Rhodes Scholar: MA Oxford Economics and Philosophy
• Ph. D. Stanford in Industrial Engineering and Engineering
Management.
• Published her PhD thesis as “Competing on the Edge”, co-authored with her supervisor; it became a highly regarded
business best seller.
• Joined McKinsey and Company – a global management
consulting firm. Became a Principal.
• Hired by Google in 2003 as Vice President of Business
Operations – to guide Google’s growth after the IPO. Now
Senior Vice President of Business Operations: Human
Resources; Business Operations; and executive
troubleshooting.
• She is, essentially, the Chief Operating Officer of Google
Social Bookmarking
• Social bookmarking is a way for Internet users to
store, organize, share and search bookmarks of
web pages.
• In a social bookmarking system, users save links to
web pages that they want to remember or share.
• These bookmarks are usually public, but
depending on the service's features, may be saved
privately, shared only with specific people or
groups, shared only inside certain networks, or
another combination of public and private.
• People with access can usually view these
bookmarks chronologically, by category or tag, via
a search engine, or even randomly.
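The sharing rules and views described above map naturally onto a small data structure. A minimal Python sketch; the `Bookmark` fields, the visibility values and the group lookup are hypothetical, not any particular service's design:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Bookmark:
    url: str
    owner: str
    saved: date
    tags: set[str] = field(default_factory=set)
    # "public", "private", or the name of a group it is shared with.
    visibility: str = "public"


def can_see(bookmark, user, groups):
    """groups maps a group name to the set of its members' user names."""
    if bookmark.owner == user or bookmark.visibility == "public":
        return True
    if bookmark.visibility == "private":
        return False
    return user in groups.get(bookmark.visibility, set())


def view(bookmarks, user, groups, tag=None):
    """The bookmarks this user may see, newest first, optionally by tag."""
    shown = [b for b in bookmarks
             if can_see(b, user, groups) and (tag is None or tag in b.tags)]
    return sorted(shown, key=lambda b: b.saved, reverse=True)
```

The owner always sees everything; other users see public bookmarks plus those shared with a group they belong to. A random view could be added with `random.choice` over the same filtered list.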
Tags
• A tag is a (relevant) keyword or term
associated with or assigned to a piece
of information (e.g. a picture, a
geographic map, a blog entry, or video
clip), thus
• describing the item and
• enabling keyword-based classification
and search of information.
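When many people tag the same items, the tags serve both purposes named above: the most widely used tags describe an item, and a tag lookup finds items. A minimal Python sketch of such a shared tag store; the class and method names are hypothetical, and counting each user only once per tag is a design choice assumed here:

```python
from collections import defaultdict


class TagStore:
    """Many users attach free-form tags to shared items."""

    def __init__(self):
        # item -> tag -> set of users who applied that tag to the item
        self._tags = defaultdict(lambda: defaultdict(set))

    def add(self, user, item, *tags):
        for tag in tags:
            # Normalize, so "Ottawa" and "ottawa " count as the same tag.
            self._tags[item][tag.strip().lower()].add(user)

    def describe(self, item, n=3):
        """Classification: the item's n most widely used tags."""
        counts = {t: len(users) for t, users in self._tags.get(item, {}).items()}
        return sorted(counts, key=lambda t: (-counts[t], t))[:n]

    def search(self, tag):
        """Search: items carrying the tag, most widely tagged first."""
        tag = tag.strip().lower()
        hits = [(len(tags[tag]), item)
                for item, tags in self._tags.items() if tag in tags]
        return [item for _, item in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Storing the set of users behind each tag, rather than a plain counter, stops one person from pushing an item up the results by tagging it repeatedly.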
Some Web Sites That Use Tags
• del.icio.us - A social bookmarking site that
allows users to bookmark many sites and
then tag them with many descriptive
words, allowing other people to search by
those terms to find pages that other people
found useful.
• Flickr - A picture service that allows users
to tag images with many specific nouns,
verbs, and adjectives that describe the
picture. These tags are then searchable.
Other Tag Sites
• Gmail - A webmail site that was one of
the first to let users categorize email
with tags, which Gmail calls "labels".
• Technorati - A weblog search engine.
• Last.fm - A social music website that
allows users to tag artists, albums and
tracks.
Digg
• A social content website, launched
December 5, 2004.
• Digg is a community-based popularity
website with an emphasis on technology
and science articles, recently expanding to a
broader range of categories such as politics
and entertainment.
• It combines social bookmarking, blogging,
and syndication with a form of non-hierarchical, democratic editorial control.
From Wikipedia, the free encyclopedia
If you dig it, man! Digg It!
• News stories and websites are submitted by
users, and then promoted to the front page
through a user-based ranking system. This
differs from the hierarchical editorial system that
many other news sites employ.
• Readers can view all of the stories that have
been submitted by fellow users in the
"digg/News/Upcoming" section of the site. Once
a story has received enough "diggs", it appears
on Digg's front page.
• Should the story not receive enough diggs, or if
enough users report a problem with the
submission, the story will remain in the "digg
all" area, where it may eventually be removed.
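The promotion flow above behaves like a small state machine. A toy Python sketch; Digg never published its real ranking rules, so the thresholds, the one-digg-per-user rule and the `expire` step are all hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: Digg never published its real promotion rules.
PROMOTE_AT = 3  # diggs needed to reach the front page
BURY_AT = 2     # problem reports that keep a story off the front page


@dataclass
class Story:
    title: str
    diggs: set = field(default_factory=set)    # users who dugg the story
    reports: set = field(default_factory=set)  # users who reported a problem
    section: str = "upcoming"                  # "upcoming", "front page" or "removed"

    def digg(self, user):
        self.diggs.add(user)  # a set, so each user's digg counts once
        if (self.section == "upcoming" and len(self.diggs) >= PROMOTE_AT
                and len(self.reports) < BURY_AT):
            self.section = "front page"

    def report(self, user):
        self.reports.add(user)

    def expire(self):
        """Called when the (hypothetical) waiting period ends: a story that
        never made the front page is removed."""
        if self.section == "upcoming":
            self.section = "removed"
```

A story with enough distinct diggs moves to the front page; one that collects enough problem reports stays in the upcoming area however many diggs it gets, and is removed when its waiting period ends.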
Digging deeper
• Articles are short summaries of stories on
other websites with links to the stories, and
provisions for readers to comment on the
story.
• All content on the site, and access to it, is free,
but registration is compulsory for certain
elements, such as promoting ("digging")
stories, submitting stories and commenting
on stories.
• Digg also allows for stories to be posted to
a user's blog automatically when he or she
diggs a story.
More digging
• Originally, stories could be submitted in
the following categories:
deals, gaming, links, mods, music, robots,
security, technology, Apple, design,
hardware, Linux/Unix, movies,
programming, science and software.
• With the release of Digg 3.0 on June 26,
2006, the categories became divided into 6
containers: Technology, Science, World &
Business, Sports, Entertainment, Gaming,
with sub-categories.