What is the "Invisible Web", a.k.a. [1] the "Deep Web"?
(“Görünmez Web” bir başka deyişle “Derin Web” nedir?)
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The "visible web" is what you see in the results pages from general web search engines. It's also what you
see in almost all subject directories (konu rehberleri). The "invisible web" is what you cannot retrieve ("see")
in the search results and other links contained (içermek) in these types of tools.
The first version of this web page was written in 2000, when this topic was new and baffling (şaşırtıcı, kafa
karıştırıcı) to many web searchers. Since then, search engines’ crawlers (arama motoru örümcekleri) and
indexing programs have overcome (üstesinden gelmek) many of the technical barriers that made it impossible
for them to find and provide (sağlamak, karşılamak) invisible web pages. These types of pages used to be
invisible but can now be found in most search engine results:



• Pages in non-HTML formats (pdf, Word, Excel, PowerPoint, etc.) are now "translated" into HTML by
most search engines and can be "seen" in search results.
• Script-based (kod/komut dizisi tabanlı) pages, whose links contain a "?" or other script coding, no
longer cause most search engines to exclude (hariç tutmak) them.
• Pages generated (oluşturmak) dynamically by other types of database software (e.g., Active Server
Pages, Cold Fusion) can be indexed if there is a stable (durağan, sabit) URL [2] somewhere that search
engine spiders can find. Once these were largely shunned (uzak durmak, kaçınmak) by search engines.
There are now many types of dynamically generated pages like these that are found in most general
web search engines. There must be a stable link to the page somewhere.
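To make the idea of a "script-based" link concrete, here is a minimal Python sketch that checks whether an address carries a query string (the part after the "?"). The function name and the example.org addresses are hypothetical, chosen only for illustration; real search engine crawlers use far more elaborate rules.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Split a URL and report whether it has a query string after the "?"."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        # A non-empty query string marks the link as script-based.
        "is_script_based": bool(parts.query),
        # Each parameter name maps to the list of values given for it.
        "parameters": parse_qs(parts.query),
    }


# Hypothetical addresses: one static page, one dynamically generated page.
static_page = describe_url("http://www.example.org/guides/invisible-web.html")
dynamic_page = describe_url("http://www.example.org/catalog/record.asp?id=42&lang=en")

print(static_page["is_script_based"])   # False
print(dynamic_page["is_script_based"])  # True
print(dynamic_page["parameters"])       # {'id': ['42'], 'lang': ['en']}
```

The second address is stable: anyone who follows the same link gets the same record, which is the condition the text gives for such pages to be indexed.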
Why isn't everything visible?
There are still some hurdles (engel) search engine spiders cannot leap (sıçrama, atlama). Here are some
examples of material that remains hidden from general search engines:
o The Contents of Searchable Databases. When you search in a library catalog, article database,
statistical database, etc., the results are generated "on the fly" (anında, irticalen, doğaçlama) in answer
to your search. Because the crawler programs cannot type or think, they cannot enter passwords on a
login screen or keywords in a search box. Thus, these databases must be searched separately.
   • A special case: Google Scholar is part of the public or visible web. It contains citations to journal
   articles and other publications, with links to publishers or other sources where one can try to
   access the full text of the items. This is convenient (uygun), but results in Google Scholar are only
   a small fraction (kesim) of all the scholarly publications (bilimsel yayınlar) that exist online. Much
   more, including most of the full text, is available through article databases that are part of the
   invisible web.
o Excluded Pages. Search engine companies exclude some types of pages by policy, to avoid cluttering
(karıştırmak) their databases with unwanted content.
   • Dynamically generated pages of little value beyond single use. Think of the billions of possible
   web pages generated by searches for books in library catalogs, public-record databases, etc. Each
   of these is created in response (yanıt) to a specific need. Search engines do not want all these
   pages in their web databases, since they generally are not of broad interest.
   • Pages deliberately (kasten) excluded by their owners. A web page creator who does not want
   his/her page showing up in search engines can insert special "meta tags" (üst veri etiketleri) that
   will not display on the screen, but will cause most search engines' crawlers to avoid the page.

[1] a.k.a. = "Also Known As"
[2] URL (Uniform Resource Locator) = the uniform, standard addresses used to reach web pages and sites.
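The owner-side exclusion described above is usually done with a "robots" meta tag in the page's head section. The Python sketch below shows how a crawler might read that tag before deciding whether to index a page. The class and function names and the two sample pages are hypothetical; the sketch checks only the "noindex" directive and ignores other signals a real crawler would consider.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def may_index(html: str) -> bool:
    """Return False when the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


# Hypothetical sample pages.
hidden_page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Private</body></html>'
public_page = "<html><head><title>Open</title></head><body>Public</body></html>"

print(may_index(hidden_page))  # False
print(may_index(public_page))  # True
```

The tag never appears on the screen, which matches the text: only programs that read the page source, such as crawlers, act on it.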
How to Find the Invisible Web
Simply think "databases" and keep your eyes open. You can find searchable databases containing invisible
web pages in the course of (süresince, boyunca) routine searching in most general web directories. Of
particular value in academic research are
• ipl2
• Infomine
Use Google and other search engines to locate searchable databases by searching a subject term and the word
"database". If the database uses the word database in its own pages, you are likely to find it in Google. The
word "database" is also useful in searching a topic in the Google Directory or the Yahoo! directory, because
they sometimes use the term to describe searchable databases in their listings.
EXAMPLES for Google & Yahoo:
plane crash (uçak kazası) database
languages database
toxic chemicals (zehirli kimyasallar) database
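The search pattern above (a subject term plus the word "database") can be sketched in a few lines of Python. The helper names are hypothetical, and the default search address is an assumption about the usual Google search URL form; any search engine that accepts a "q" parameter could be substituted.

```python
from urllib.parse import urlencode


def database_query(subject: str) -> str:
    """Append the word "database" to a subject term."""
    return f"{subject.strip()} database"


def search_url(subject: str, base: str = "https://www.google.com/search") -> str:
    """Build a search address for the subject-plus-"database" query."""
    return f"{base}?{urlencode({'q': database_query(subject)})}"


for subject in ["plane crash", "languages", "toxic chemicals"]:
    print(search_url(subject))
# https://www.google.com/search?q=plane+crash+database
# https://www.google.com/search?q=languages+database
# https://www.google.com/search?q=toxic+chemicals+database
```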
Remember that the Invisible Web exists. Remember that, in addition to what you find in search engine
results (including Google Scholar) and most web directories, there are other gold mines (altın madeni) you
have to search directly. This includes all of the licensed article, magazine, reference, news archives, and other
research resources that libraries and some industries buy for those authorized (yetkili) to use them. The
contents of these are not freely available: libraries and corporations (şirketler) buy the rights (haklar) for their
authorized users to view the contents. If they appear free, it's because you are somehow authorized to search
and read the contents (library card holder, member of the company, etc.).
As part of your web search strategy, spend a little time looking for databases in your field or topic of study or
research. Remember, however, that proprietary information (tescilli bilgi), which includes most journals,
magazines, news, and books, may not be freely available. Publishers and authors control them under
copyright and other distribution (dağıtım) rules. You will be prompted to pay (para ödemeye sevk etmek) or
enter a password to see full text. A library you have the rights to use may have access to what you want,
however.
The Ambiguity Inherent in the Invisible Web:
(Görünmez Web’in doğasında olan belirsizlikler)
It is very difficult to predict (önceden bildirmek) what sites or kinds of sites or portions (bölüm) of sites will or
won't be part of the Invisible Web. There are several factors involved (içermek):
o Which sites replicate (kopyalamak) some of their content in static pages [hybrid (karma) of
visible and invisible in some combination]?
o Which replicate it all [visible in search engines if you construct a search matching terms in the
page]?
o Which databases replicate none of their dynamically generated pages in links and must be
searched directly [totally invisible]?
o Search engines can change their policies on what they exclude and include.