Lies, damn lies and Web statistics
A brief introduction to using
and abusing web statistics
Paul Smith, ILRT
July 2006
Overview
• Some key terms explained
• Log Analysers
– What you can find out
– What you can’t find out
– Cache for questions
• Trackers / counters
• Further reading
Key terms
• Log file
• IP address
• Hit
• Visitor / visit / user session
• Page request / view
• Referrer
• Cache server
• Proxy server
Log Analysers
• A few examples:
– from Google Directory listing
• What we use in ILRT:
– UNIX scripts / tools
– Analog [www.analog.cx]
– AWStats [awstats.sourceforge.net]
– WebTrends [www.webtrends.com]
What you can find out
• Number of requests made to your server
• When they were made
• Which files were asked for
• Which host asked you for them.
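As a rough illustration of the four items above, here is a minimal sketch that reads lines in the Common Log Format and counts requests per file and per host. It assumes the server writes that format; the function names and the sample lines (with documentation-reserved addresses) are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 10/Jul/2006:14:02:31 +0100
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request is normally "METHOD path protocol"; keep only the path.
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    return fields

def summarise(lines):
    """Return (total requests, requests per file, requests per host)."""
    files, hosts = Counter(), Counter()
    total = 0
    for line in lines:
        entry = parse_clf_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        total += 1
        files[entry["path"]] += 1
        hosts[entry["host"]] += 1
    return total, files, hosts

# Hypothetical sample lines, not taken from any real log.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Jul/2006:14:02:31 +0100] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [10/Jul/2006:14:02:32 +0100] "GET /logo.gif HTTP/1.0" 200 2048',
    '198.51.100.7 - - [10/Jul/2006:14:05:00 +0100] "GET /index.html HTTP/1.0" 200 5120',
]
```

Calling `summarise(SAMPLE_LOG)` gives the request total plus two counters. Note that these count requests and hosts, not people.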
What you can find out (cont’d)
• What people told you their browsers were
• What the referring pages were
You should be aware, though, that:
• Many browsers deliberately lie about their identity
• Users can configure the browser name
• Some people use "anonymisers" which deliberately send false browser names and referrers.
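A minimal sketch of tallying what clients reported, assuming the server logs in the Combined Log Format (Common Log Format plus quoted referrer and user-agent fields). The function name is hypothetical. The counts record only what was claimed, so the caveats above apply to every number it produces.

```python
import re
from collections import Counter

# Combined Log Format: the Common Log Format fields followed by
# "referrer" "user-agent".
COMBINED_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def reported_browsers_and_referrers(lines):
    """Count the user-agent and referrer strings exactly as the clients sent them."""
    agents, referrers = Counter(), Counter()
    for line in lines:
        match = COMBINED_PATTERN.match(line)
        if match is None:
            continue  # not a Combined Log Format line
        agents[match["agent"]] += 1
        # A lone "-" means the client sent no referrer at all.
        if match["referrer"] != "-":
            referrers[match["referrer"]] += 1
    return agents, referrers
```

Because both fields are supplied by the client, this is a tally of claims rather than a census of browsers or entry routes.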
What you can’t do
• You can't tell the identity of your users
• You can't tell how many visitors you've had
• You can't tell how many visits you've had
• Cookies don't solve these problems
• You can't follow a person's path through your site
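To show why "visits" are only ever an estimate, here is a sketch of a common kind of heuristic: requests from the same host count as one visit until that host has been silent for longer than a timeout. The 30-minute default and the function name are assumptions for this example, not something the log itself defines.

```python
from datetime import datetime, timedelta

def estimate_visits(requests, timeout=timedelta(minutes=30)):
    """Estimate the number of visits from (host, datetime) pairs.

    A new visit is counted whenever a host appears for the first time or
    reappears after more than `timeout` of silence. This is a guess, not a
    measurement.
    """
    last_seen = {}
    visits = 0
    for host, when in sorted(requests, key=lambda item: item[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[host] = when
    return visits
```

The heuristic treats one host as one person. Two people behind the same proxy are merged into a single visit, and one person whose requests arrive from several hosts is split into several, so changing the timeout changes the answer without making it any more true.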
What you can’t do (cont’d)
• You often can't tell where users entered your site, or where they linked to you from
• You can't tell how they left your site, or where they went next
• You can't tell how long people spent reading each page
• You can't tell how long people spent on your site
Cache for questions
• Caching proxy servers are the main problem:
– if users get your pages from a local cache server, you will never know
– many users can connect to your server using the same cache/proxy server
– one user can appear to connect from many different hosts (e.g. AOL)
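The first two points can be illustrated with a toy model. This is a deliberately simplified sketch: the `CachingProxy` class and host names are hypothetical, and real proxies also handle expiry and revalidation, which are ignored here.

```python
class CachingProxy:
    """Toy caching proxy that records what the origin server would log."""

    def __init__(self, proxy_host, origin_log):
        self.proxy_host = proxy_host
        self.origin_log = origin_log  # list of (host, path) seen by the origin server
        self.cache = set()

    def fetch(self, user, path):
        """Serve `path` to `user`; return True if it came from the cache."""
        if path in self.cache:
            # Cache hit: the request never reaches the origin server.
            return True
        # Cache miss: the origin server logs the proxy's host, not the user's.
        self.origin_log.append((self.proxy_host, path))
        self.cache.add(path)
        return False


origin_log = []
proxy = CachingProxy("proxy.example.net", origin_log)
for user in ["alice", "bob", "carol"]:
    proxy.fetch(user, "/index.html")
# origin_log now holds a single entry, attributed to the proxy.
```

Three people read the page, but the origin server's log shows one request from one host.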
Trackers / Counters
• A more recent innovation:
– Code embedded in each of your web pages
– Makes a call directly to the tracking service's data server
– Can reveal more detail (screen size, screen colours, originating host name, etc)
• Examples:
– SiteStat: [www.nedstat.co.uk/]
– Google Analytics: [analytics.google.com]
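The data flow behind a tracker can be sketched as follows. In practice the embedded code is usually JavaScript running in the visitor's browser; Python is used here only to show the shape of the call. The collector URL, the parameter names (`page`, `sw`, `sh`, `cd`) and both functions are hypothetical and do not describe SiteStat's or Google Analytics' actual interfaces.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of the tracking service's data server.
COLLECTOR = "http://stats.example.org/hit.gif"

def build_beacon_url(page, screen_width, screen_height, colour_depth):
    """Build the URL the embedded page code would request from the data server."""
    params = {
        "page": page,
        "sw": screen_width,
        "sh": screen_height,
        "cd": colour_depth,
    }
    return COLLECTOR + "?" + urlencode(params)

def read_beacon(url):
    """Recover the reported details on the data server's side (all values are strings)."""
    query = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in query.items()}
```

Because the call goes straight to the data server with a fresh query string on every page view, it is less likely to be answered from a cache than a request for the page itself. The details are still reported by the client, so they can be blocked or falsified.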
Further reading
• How the Web works by Stephen Turner
• Interpreting WWW Statistics by Doug Linder
• Measuring Web Site Usage: Log File Analysis by Susan Haigh and Janette Megarity
• Who Goes There?
• Measuring Library Web Site Usage by Kathleen Bauer
• Why Web Usage Statistics are (Worse Than) Meaningless by Jeff Goldberg