Promoting Your Project Web Site

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
England
Email [email protected]
URL http://www.ukoln.ac.uk/

Project Manager for Exploit Interactive web magazine
http://www.exploit-lib.org/

UKOLN is funded by the Library and Information Commission, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

Approaches
What approaches can we take to raising the profile of our web site?
• Tell our friends and colleagues (at conferences in exotic places)
• Give away pens and bags
• Let it happen automatically
• Submitting resources
• Perhaps giving parts of our web site away?

Automated Indexing
Many users use search engines such as AltaVista, HotBot, Northern Light, etc. to find resources.
Issues:
• Will my site be indexed?
• Will it be near the top of a sensible search query?
• How can I improve things?

Problems in Being Indexed
Size of Index
Search engines are failing to keep up with the growth of the web
Not all pages on a web site will be indexed
Typically a 500 page sample will be indexed
Frames (and "splash screens")
Many indexing robots can't access framed web sites or web sites which use "splash screens"

Improving Indexing of Key Resources
How to ensure that quality pages are indexed:
• Don't publish non-work pages on the server
• Move from a single large institutional server to multiple (real or virtual) servers: instead of <www.ukoln.ac.uk/exploit/> use <exploit.ukoln.ac.uk/> or (even better) <exploit-lib.org/>
• Avoid use of frames (or provide a link to an alternative entry point)
These approaches will improve the chances of more complete indexing of the web site

Improving Indexing (2)
Do you know if your project web site uses the Robots Exclusion Protocol (REP) - a /robots.txt file?
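One way to answer that question for a particular page is to run the site's rules through a robots.txt parser. The sketch below is an illustration and not part of the original talk: it uses Python's standard urllib.robotparser module on the two Disallow rules that this talk uses as its example. The function name allowed_paths, the robot name ExampleBot, the host www.example.org, and the test paths are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two example rules discussed in this talk (assumed file contents).
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def allowed_paths(rules_text, paths, agent="ExampleBot"):
    """Return {path: bool} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(rules_text.splitlines())
    # www.example.org is a placeholder host; only the path matters here.
    return {p: parser.can_fetch(agent, "http://www.example.org" + p) for p in paths}


if __name__ == "__main__":
    checks = allowed_paths(RULES, ["/", "/issue1/", "/tmp/draft.html", "/cgi-bin/search"])
    for path, ok in checks.items():
        print(path, "may be indexed" if ok else "is blocked")
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the /robots.txt file before can_fetch() is called.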
    User-agent: *          # Following apply to all robots
    Disallow: /cgi-bin/    # Don't index /cgi-bin directory
    Disallow: /tmp/        # Don't index /tmp directory
Use the REP to:
• Prevent junk (old or draft versions, experimentation, etc.) from being indexed
Check your /robots.txt file to:
• Ensure that your web site can be indexed
Tools are available to help you manage the robots.txt file. For example RoboGen: <http://www.rietta.com/robogen/>

Improving Indexing (3)
Updating the /robots.txt file may be difficult. The (new) <META> feature allows HTML authors to control robots.
    <META NAME="robots" CONTENT="noindex, nofollow">
Use this in key menu pages for resources you don't want indexed (deliverables, reports, draft, personal).
See <http://info.webcrawler.com/mak/projects/robots/meta-user.html> and <http://www.kollar.com/robots.html>

Some Solutions (3)
Getting Your Web Site Indexed (cont)
Several search engines allow URLs to be submitted
Bulk Submissions
Turnaround time from a few days to several months
And what about bulk submission services?

Some Solutions (4)
Some Submission Engines
http://www.webposition.com/
http://www.netsubmitter.com/
http://www.registerpro.com/
http://www.pegasoweb.com/engenius/
http://www.exploit.com/wizard/
There are products for submitting sites to multiple search engines (and analysing your pages, reporting on your position in search engines, etc.)
But:
• How good are they?
• How ethical are they?
• How cost-effective are they?

Has It Worked?
How do you know if robots are visiting your web site?
The free BotWatch Perl program will analyse your log files and generate a report on visits by robots.
BotWatch is available at <http://www.tardis.ed.ac.uk/~sxw/robots/botwatch.html>
See also <http://www.botspot.com/>

Problems in Ranking
Typically large numbers of hits are obtained.
Metadata may help
    <meta name="keywords" content="exploit, web magazine, TAP, telematics">
    <meta name="description" content="Exploit Interactive is a ..">
    <meta name="DC.Title" content="Exploit ..">
But:
• "AltaVista" and Dublin Core metadata are not supported by all (many?) search engines
• Issues about maintenance of metadata

Some Solutions
Use of "AltaVista" metadata is a must for key pages
Use of Dublin Core:
• Could be used in specialist applications (domain-specific search engines, current awareness services, B2B, etc.)
• Think about additional benefits to you (e.g. local searching, auditing)
• Scope for discussions with search engine vendors?
• Need to think about deployment and maintenance
The Exploit Interactive web magazine uses Dublin Core metadata to enhance local searching. The metadata can also be used by 3rd parties

Analysis of NFP Web Sites
Report of an analysis of NFP (National Focal Point) web sites published in Exploit Interactive issue 3.
Of the 10 web sites:
• No significant use of metadata on main entry point
• Six made no use of REP, one disallowed all robots and three made sensible use
• No use of separate domain names
• One framed site
http://www.exploit-lib.org/issue3/nfp-websites/

Web Directories
Web directories (e.g. Yahoo!) provide manually compiled classifications of the web
Benefits to Projects:
• Additional place to be found
• "61% reach in UK Search engine market"
• Can be sensibly classified e.g. Ariadne magazine is in <http://www.yahoo.co.uk/Reference/Libraries/Professional_Resources/Internet_in_Libraries/>
Problems:
• Time-consuming for cataloguers
• Entries can be submitted, but this can be time-consuming
• "..sub-domains have difficulties in getting into Yahoo!"
Compare:
    www.ukoln.ac.uk/projects/eu/exploit/
    www.ukoln.ac.uk/~exploit/
    www.exploit-lib.org
    www.ukoln-exploit.ac.uk

Submission to Web Directories
It might be worth submitting to web directories such as Yahoo!
Remember that the information will be processed by humans.
See <http://www.searchenginewatch.com/webmasters/>

Give Your Web Site Away
Another way to promote your web site is to give it away!
You could give away:
• Parts of the site to robots (e.g. metadata)
• Parts of the interface
• The entire site
You could give away the interface to:
• your local indexer
• a remote indexing service e.g. HotBot
See <www.ariadne.ac.uk/issue21/webwatch/>
Search interface embedded in Exploit Interactive article at <http://www.exploit-lib.org/issue3/nfp-websites/>

Give Part of Your Site Away
OMNI gives an example of a site hosting remote search interfaces.
Enhances remote interface, but several issues. See article at <http://www.ariadne.ac.uk/issue21/webwatch/> for discussion
http://www.omni.ac.uk/other-search/

Give Your Web Site Away
Why not have your web site mirrored? Mirrors in, say, USA and Australia will help to promote your service.
Issues: Is your web site easily mirrored?
• Are relative URLs used?
• Do you use directory structures to delineate areas of your web site?
• If you use server-side scripting for management purposes, do you hide unusual URLs:
    /issue1/mag-features.asp            # Problems
    /issue1/mag-features/default.asp
    /issue1/mag-features/               # Usable on Unix (also techniques such as Apache rewrites)
If your web site can't be mirrored, can it be preserved?
See AlertBox column at <http://www.useit.com/alertbox/990321.html>

Citation
Is your project web site address easy to remember?
Issues:
• Short domain names are a winner
• Short URLs are desirable (try to avoid org. structure)
• Try to cite directories (shorter and less ambiguous):
    www.exploit.org/issue1/pride/article.htm    (article.html, article.asp)
    www.exploit.org/issue1/pride/               # pride/default.asp
• Very important for web site home page
• Try to avoid use of tilde (~)
• Avoid citing binary files (inaccessible, lack of metadata, alternative versions, etc.)
"Promoting Web Site" Talk
Given on 18 Nov 1999
Slides: [HTML] – [PowerPoint]

Let's Not Forget Publications
http://www.exploit-lib.org/issue3/
Getting published in a web magazine (such as Exploit Interactive) can have many benefits:
• Visibility to (variety of) readers
• Web magazine may submit its pages to search services
• Links in web magazine may be harvested
• Web magazine may be made available on CD ROM, free text system, etc.
Magazine articles may also be cited e.g. see <http://sunsite.berkeley.edu/CurrentCites/>

Measuring Your Success
Link popularity is growing in importance as search engines make use of citation analysis ("this site is best, as there are lots of links to it" or "this site is linked to by important sites").
LinkPopularity.com lets you check on the number of sites linking to your web site
"I tried [LinkPopularity.com], pointing out to a potential advertiser that EEVL had, according to HotBot, 1099 sites linking to it, whilst there were only 18 sites linking to their site, and suggested that what they needed was more exposure. It seems to have worked, as they have agreed to buy an ad on the soon to be released new design EEVL site."
Roddy McLeod, EEVL (posting to lis-elib list)

Don't Forget Your Stats
You will produce graphs of your web statistics for project reports
Do the graphs indicate:
• A healthy growth
• Growth in the number of robots
• Growth in the wrong community
Look beneath the surface
Think about "enterprise analysis packages"
    referer: ""                         # Entered directly
    referer: "www.foo.fr/goodstuff/"    # Followed link
If you record the referrer field you will be able to see the links users follow to arrive at your web site. This may help to inform dissemination strategies.

Universal Design
Many of the guidelines provided will have additional benefits:
• Robots and people with disabilities (e.g. blind users) have similar characteristics i.e. can't follow images, may not be able to access framed sites, etc.
• Indexing programs may index ALT attributes in <IMG> elements
• Sensibly-structured web sites can be more easily archived and mirrored.
• Metadata for general resource discovery can be reused for other applications (e.g. current awareness services).

Conclusions
To conclude:
• There are approaches to the web site architectural design which can help in promoting your project web site, including:
  – Project-specific domains
  – Use of the robots.txt file
  – Accessible web design
  – Short URLs
  – Metadata
• Once you have the correct architecture, you can assist in the promotion process through various submission tools
• Many of the solutions will have additional benefits
• Ideally the solutions will be implemented at the start of the project!
• Dialogue with your server administrator is important

Further Information
Book Reviews
    <http://www.hw.ac.uk/libWWW/irn/irn58/irn58d.html#recent>
    <http://www.hw.ac.uk/libWWW/irn/irn59/irn59d.html#recent>
Search Engine Watch <http://www.searchenginewatch.com/>
Deadlock <http://www.deadlock.com/promote/>
Did-it <http://www.did-it.com/>
VirtualPromote <http://www.virtualpromote.com/promotea.html>
Pegasoweb <http://www.pegasoweb.com/>
Yahoo! <http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Information_and_Documentation/Site_Announcement_and_Promotion/>
Broadcaster – URL submission service <http://www.broadcaster.co.uk/>
Submit-it – URL submission service <http://www.submit-it.com/>
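
As a closing illustration of the metadata and accessible-design points in the conclusions, a page can be checked mechanically for its META elements and for images that carry no ALT text. This is a minimal sketch using Python's standard html.parser module, not a tool mentioned in the talk. The names PageAudit, audit, and SAMPLE, and the sample page itself, are hypothetical.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect <meta name=... content=...> pairs and count <img> tags without ALT text."""

    def __init__(self):
        super().__init__()
        self.meta = {}                     # lower-cased meta name -> content
        self.images = 0                    # total <img> elements seen
        self.images_without_alt_text = 0   # <img> with missing or empty alt

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img":
            self.images += 1
            if not (attrs.get("alt") or "").strip():
                self.images_without_alt_text += 1


def audit(html_text):
    """Parse one HTML document and return the filled-in PageAudit object."""
    parser = PageAudit()
    parser.feed(html_text)
    parser.close()
    return parser


# Hypothetical page combining the META examples shown in this talk.
SAMPLE = """<html><head>
<META NAME="robots" CONTENT="noindex, nofollow">
<meta name="keywords" content="exploit, web magazine, TAP, telematics">
</head><body>
<img src="logo.gif" alt="Exploit Interactive">
<img src="spacer.gif">
</body></html>"""

if __name__ == "__main__":
    result = audit(SAMPLE)
    print("robots:", result.meta.get("robots"))
    print("has description:", "description" in result.meta)
    print(result.images_without_alt_text, "of", result.images, "images lack ALT text")
```

Running such a check over the main entry points of a site gives a quick picture of which key pages still lack keywords, a description, or text alternatives for images.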