Lies, damn lies and Web statistics
A brief introduction to using and abusing web statistics
Paul Smith, ILRT
July 2006

Overview
• Some key terms explained
• Log Analysers
  – What you can find out
  – What you can’t find out
  – Cache for questions
• Trackers / counters
• Further reading

Key terms
• Log file
• IP address
• Hit
• Visitor / visit / user session
• Page request / view
• Referrer
• Cache server
• Proxy server

Log Analysers
• A few examples:
  – from Google Directory listing
• What we use in ILRT:
  – UNIX scripts / tools
  – Analog [www.analog.cx]
  – AWStats [awstats.sourceforge.net]
  – WebTrends [www.webtrends.com]

What you can find out
• Number of requests made to your server
• When they were made
• Which files were asked for
• Which host asked you for them.

What you can find out (cont’d)
• What people told you their browsers were
• What the referring pages were
You should be aware, though, that:
• Many browsers deliberately lie
• Users can configure the browser name
• Some people use "anonymisers" which deliberately send false browsers and referrers.
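
To make the "what you can find out" list concrete, here is a minimal sketch of reading one log entry in Python. It assumes the server writes the widely used Combined Log Format; the sample line, the pattern name LOG_PATTERN and the function parse_line are invented for illustration and are not taken from any of the tools named above.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp text into a timezone-aware datetime.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Invented sample entry, for illustration only.
SAMPLE = (
    '192.0.2.10 - - [12/Jul/2006:10:15:32 +0100] '
    '"GET /index.html HTTP/1.1" 200 5120 '
    '"http://www.example.org/links.html" "Mozilla/4.0 (compatible; MSIE 6.0)"'
)

entry = parse_line(SAMPLE)
print(entry["host"], entry["time"].isoformat(), entry["path"])
print(entry["referrer"], "|", entry["agent"])
```

The host, time and path fields are recorded by the server itself. The referrer and agent fields are only what the client chose to send, which is why they cannot be taken at face value.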

What you can’t do
• You can't tell the identity of your users
• You can't tell how many visitors you've had
• You can't tell how many visits you've had
• Cookies don't solve these problems
• You can't follow a person's path through your site

What you can’t do (cont’d)
• You often can't tell where users entered your site, or where they linked to you from
• You can't tell how they left your site, or where they went next
• You can't tell how long people spent reading each page
• You can't tell how long people spent on your site

Cache for questions
• Caching proxy servers are the main problem:
  – if users get your pages from a local cache server, you will never know
  – many users can connect to your server using the same cache/proxy server
  – one user can appear to connect from many different hosts (e.g. AOL)

Trackers / Counters
• A more recent innovation:
  – Code embedded in each of your web pages
  – Makes call directly to host data server
  – Can reveal more detail (screen size, screen colours, originating host name, etc.)
• Examples:
  – SiteStat: [www.nedstat.co.uk/]
  – Google Analytics: [analytics.google.com]

Further reading
• How the Web works by Stephen Turner
• Interpreting WWW Statistics by Doug Linder
• Measuring Web Site Usage: Log File Analysis by Susan Haigh and Janette Megarity
• Who Goes There? Measuring Library Web Site Usage by Kathleen Bauer
• Why Web Usage Statistics are (Worse Than) Meaningless by Jeff Goldberg.
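
As a small illustration of the proxy problem described under "Cache for questions", the sketch below compares the number of real people with the number of distinct hosts a server would log. All names and host names are invented. A real log file holds only the host column, so the "person" column is exactly the information you do not have.

```python
# Each tuple is (real person, host name recorded in the server log).
# Invented data: a real log contains only the second value.
SHARED_PROXY = [
    ("alice", "proxy.campus.example"),
    ("bob", "proxy.campus.example"),
    ("carol", "proxy.campus.example"),
]
ROTATING_PROXY = [
    ("dave", "cache-01.isp.example"),
    ("dave", "cache-02.isp.example"),
    ("dave", "cache-03.isp.example"),
]


def people_and_hosts(requests):
    """Return (number of distinct people, number of distinct logged hosts)."""
    people = {person for person, _host in requests}
    hosts = {host for _person, host in requests}
    return len(people), len(hosts)


# Many users behind one proxy: counting hosts undercounts visitors.
print(people_and_hosts(SHARED_PROXY))    # (3, 1)
# One user routed through several proxy hosts: counting hosts overcounts.
print(people_and_hosts(ROTATING_PROXY))  # (1, 3)
```

Because the error runs in both directions, a count of distinct hosts is neither an upper nor a lower bound on the number of visitors.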