Discussion Class 9
Google
Discussion Classes
Format:
Question
Ask a member of the class to answer.
Provide opportunity for others to comment.
When answering:
Stand up.
Give your name. Make sure that the TA hears it.
Speak clearly so that all the class can hear.
Suggestions:
Do not be shy at presenting partial answers.
Differing viewpoints are welcome.
Question 1: Indexing the Web
(a) Who are the authors of this paper?
(b) The authors criticize conventional ranking methods,
based on vector similarity. What are their criticisms?
Do you agree with them?
(c) Why not use standard full-text indexing with tf.idf
weighting?
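For reference when discussing part (c), here is a minimal sketch of one common tf.idf variant (raw term frequency times log(N / df)). It is a textbook formulation, not something taken from the paper; the function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: weight = tf * log(N / df), where tf is the raw
    count of the term in the document, N is the number of documents,
    and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy collection.
docs = ["web search engine", "web page ranking", "search engine ranking engine"]
w = tfidf_weights(docs)
```

Note that the weights depend only on the words inside each document, which is worth keeping in mind when considering how such a scheme behaves on pages whose authors control their own text.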
Question 2: Ranking
The authors of the paper state that their objective is to
maximize precision.
(a) What do they mean by "precision"?
(b) What assumptions does this imply about users and their
wishes?
(c) How does their view of relevance differ from the
conventional view?
(d) How well would you expect Google to perform in the
TREC ad hoc track?
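As a baseline for part (a), the conventional definition of precision is the fraction of retrieved documents that are relevant. The sketch below computes it over the top k results; whether the authors mean exactly this is the point of the question. The function name `precision_at_k` and the sample data are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are in the relevant set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


# Hypothetical ranking and relevance judgments.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p_at_2 = precision_at_k(ranked, relevant, 2)
```

Measuring only the first few results, rather than the whole result set, reflects an assumption that users look at the top of the list and rarely go further.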
Question 3: PageRank Algorithm
(a) Traditional text search engines rank hits by the similarity
of each document to a query. How does PageRank rank
the hits returned by a query?
(b) What is the concept behind PageRank?
(c) What other ranking methods does Google use?
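To make part (b) concrete, here is a minimal sketch of PageRank computed by repeated iteration over a toy link graph. It assumes the normalized form with damping factor d = 0.85, in which ranks sum to 1, and it spreads the rank of pages with no outgoing links evenly over all pages; both choices are assumptions of this sketch, and the paper's own formulation may differ in detail. The function name `pagerank` and the graph `toy_links` are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - d) / n + d * dangling / n for p in pages}

        # Each page passes its rank in equal shares to the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new_rank[t] += d * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
```

The ranks are computed from the link structure alone and do not depend on any query, which is relevant to part (a).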
Question 4: Anchor Text
(a) What is anchor text?
(b) How does Google use anchor text to index a web
page?
(c) What are the computational challenges in this
approach?
Question 5: Scaling
Much of the article is about scalability.
(a) How many pages were they indexing when they wrote
the article? How many today? How many queries does the
system handle every day?
(b) What is their strategy for scalability? Where do you
think the limitations lie?
(c) How did they manage to implement such a large-scale
(and ever changing) system with a small technical staff?
Question 6: Spamming
"There are even numerous companies which specialize in
manipulating search engines for profit."
(a) Explain this statement.
(b) How does Google overcome this problem?
(c) Why are the authors unenthusiastic about using metadata
for indexing the web?
Question 7: Implementation
(a) What is the function of the Google lexicon? How is it
stored?
(b) What is the function of the hit list? How is it stored?
(c) How can a Google search find a web page that has never
been indexed?