2: Web Server Log
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
An extract from KDnuggets web log
© 2006 KDnuggets
Web Server Log – An Example
KDnuggets.com server
Page contents: http://www.kdnuggets.com/jobs/
Web server log:
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET … HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /gps.html HTTP/1.1" 200
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200
…
Web (Server) Log – In Depth
A sample web log line
152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
The same line, split into its fields:
152.152.98.11
- -
[16/Nov/2005:16:32:50 -0500]
"GET /jobs/ HTTP/1.1"
200
15140
"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
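The field split shown on this slide can be reproduced with a regular expression. What follows is a minimal sketch in Python, assuming the nine-field layout of the sample line; the pattern, the field names in `FIELDS` and the `parse_line` helper are illustrative choices, not part of any log analysis tool.

```python
import re

# A quoted field: any run of characters that are not a quote or backslash,
# or a backslash followed by any character (an escaped quote, for example).
QUOTED = r'"((?:[^"\\]|\\.)*)"'

LOG_PATTERN = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] '   # IP, name, login, date/time/TZ
    + QUOTED                             # request
    + r' (\d{3}) (\d+|-) '               # status code, object size
    + QUOTED + ' ' + QUOTED              # referrer, user agent
)

# Field names are assumptions chosen to mirror the slide titles.
FIELDS = ("ip", "name", "login", "time", "request",
          "status", "size", "referrer", "agent")


def parse_line(line):
    """Return a dict of the nine fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


SAMPLE = (
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 '
    '"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"'
)

entry = parse_line(SAMPLE)
for name in FIELDS:
    print(f"{name:9s} {entry[name]}")
```

All values are kept as strings here; converting the time, status and size fields is left to later steps.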
Web log field: IP
152.152.98.11
IP address of the client. It can be converted to a host name, such as xyz.example.com, with a reverse DNS lookup
Web log fields: Name, Login
Name: the name of the remote user (usually omitted and replaced by a dash “-”)
Login: the login of the remote user (also usually omitted and replaced by a dash “-”)
Web log field: Date/Time/TZ
[16/Nov/2005:16:32:50 -0500]
Date: DD/Mon/YYYY
Time: HH:MM:SS
Time Zone: (+|-)HHMM, relative to GMT (e.g. -0500 is US EST)
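The date, time and time zone can be read in one step. A minimal sketch using Python's standard `datetime` module follows; the `parse_timestamp` name is an assumption made for this example, and the `%b` directive assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, timezone


def parse_timestamp(field):
    """Parse a field such as '[16/Nov/2005:16:32:50 -0500]'."""
    # Strip the surrounding brackets, then read DD/Mon/YYYY:HH:MM:SS and the offset.
    return datetime.strptime(field.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


ts = parse_timestamp("[16/Nov/2005:16:32:50 -0500]")
print(ts.isoformat())                           # 2005-11-16T16:32:50-05:00
print(ts.astimezone(timezone.utc).isoformat())  # 2005-11-16T21:32:50+00:00
```

Converting every timestamp to GMT/UTC makes entries from servers in different time zones comparable.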
Web log field: Request
"GET /jobs/ HTTP/1.1"
Method: GET, HEAD, POST, OPTIONS, …
URL: relative to domain
HTTP protocol: e.g. HTTP/1.0 or HTTP/1.1
Note: the request is recorded as sent, so it may contain errors, hacks, and any strange thing you can imagine
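Because the request is recorded as sent, code that splits it should not assume three well-formed parts. A minimal defensive sketch in Python, where `split_request` is a hypothetical helper name:

```python
def split_request(request):
    """Split a request field into (method, url, protocol).

    Returns None when the request does not have exactly three
    whitespace-separated parts ending in an HTTP/x.y token.
    """
    parts = request.split()
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        return tuple(parts)
    return None


print(split_request("GET /jobs/ HTTP/1.1"))
print(split_request("\\x16\\x03\\x01"))  # binary junk sent to the port: rejected
```

Rejected requests are usually worth counting separately, not discarding, since they often indicate scans or broken clients.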
Web log field: Status code
200
Status (response) code. The most important ones are:
200 – OK (most frequent, hopefully)
206 – partial content
301 – permanently redirected (e.g. access to /courses is redirected to /courses/)
302 – temporarily redirected
304 – not modified
404 – not found
…
Web log field: Object size
15140
Size of the object returned to the client, in bytes.
Can also be “-” if status code is 304 (not modified).
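Since the size field is not always numeric, a conversion step has to handle the dash. A minimal Python sketch, with `parse_size` as an assumed helper name and the choice of counting “-” as zero bytes as an assumption of this example:

```python
def parse_size(field):
    """Convert the object size field to an integer number of bytes."""
    # A dash means no size was recorded; this sketch counts it as 0 bytes.
    return 0 if field == "-" else int(field)


print(parse_size("15140"))
print(parse_size("-"))
```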
Web log field: Referrer
http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N
URL the visitor came from (here it was a Google query for “salary for data mining”, 2nd page of results – starting from 10).
Referrer can also be a static page, internal (same domain) or external (different domain), or “-” in case of a direct request (e.g. type-in, bookmark).
Referrer analysis is very valuable.
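A first step in referrer analysis is to classify each referrer and pull out the search terms. The sketch below uses Python's standard `urllib.parse`; the helper names, the default `own_host` value and the choice of the `q` parameter (as in the Google example) are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def classify_referrer(referrer, own_host="www.kdnuggets.com"):
    """Label a referrer as 'direct', 'internal' or 'external'."""
    if referrer in ("-", ""):
        return "direct"
    host = urlsplit(referrer).hostname
    return "internal" if host == own_host else "external"


def search_query(referrer, param="q"):
    """Return the decoded value of a query parameter, or None if absent."""
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None


google = ("http://www.google.com/search"
          "?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N")
print(classify_referrer(google), "|", search_query(google))
print(classify_referrer("http://www.kdnuggets.com/"))
print(classify_referrer("-"))
```

Other search engines use other parameter names; in the yisou.com referrer from the log extract the query is in `p`, so `search_query(url, param="p")` would be needed there.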
Web log field: User agent
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
User agent (browser): http://en.wikipedia.org/wiki/User_agent
Almost all browser user agent strings start with Mozilla, for historical reasons.
In many cases there is additional information:
Browser type, version: MSIE 6.0 – Internet Explorer 6.0
OS: Windows NT 5.1 (Windows XP; SV1 indicates SP2) with .NET Framework 1.1 installed
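The pieces described on this slide can be picked out of the user agent string with a few regular expressions. This is a rough Python sketch that only recognizes the tokens seen in the sample; `describe_agent` and its dictionary keys are hypothetical, and real user agent strings vary far more than this handles.

```python
import re


def describe_agent(agent):
    """Extract browser, OS and .NET tokens from an MSIE-style user agent."""
    info = {}
    browser = re.search(r"MSIE (\d+\.\d+)", agent)
    if browser:
        info["browser"] = "Internet Explorer " + browser.group(1)
    os_token = re.search(r"Windows NT (\d+\.\d+)", agent)
    if os_token:
        info["os"] = "Windows NT " + os_token.group(1)
    dotnet = re.search(r"\.NET CLR (\d+\.\d+)", agent)
    if dotnet:
        info["dotnet"] = dotnet.group(1)
    return info


agent = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
         ".NET CLR 1.1.4322)")
print(describe_agent(agent))
```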
Web Usage Mining
• Basic
  ◦ Totals
• Simple
  ◦ Request level breakdowns
• Advanced
  ◦ Visit level analysis
  ◦ Target pages; Conversion analysis
Web Log Analysis Programs
• Free
  ◦ Analog, AWStats, Webalizer
  ◦ Google Analytics
• Commercial
  ◦ WebTrends, WebSideStory, …
www.kdnuggets.com/software/web-mining.html
Web Usage Mining - Basic
• Totals for each component
  ◦ Hits – total number of requests
  ◦ Files – number of requests that returned a file (status code 200)
  ◦ Pages – number of HTML pages
  ◦ Sites – unique IP addresses
  ◦ Response codes
  ◦ Kbytes – total Kbytes transferred
  ◦ User Agents
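The totals listed above can be computed in a single pass over the log. The Python sketch below runs on the four lines of the KDnuggets extract; the regular expression, the `totals` function and the rule that a "page" is a status-200 request whose URL ends in `/`, `.html` or `.htm` are assumptions of this example and may differ from how webalizer counts.

```python
import re
from collections import Counter

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

MYIE2 = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"'
LINES = [
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 '
    '"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 '
    '"http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button'
    '&pid=400740_1006" ' + MYIE2,
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 '
    '"http://www.kdnuggets.com/" ' + MYIE2,
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] '
    '"GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 '
    '"http://www.kdnuggets.com/" ' + MYIE2,
]


def totals(lines, page_suffixes=("/", ".html", ".htm")):
    """Compute basic totals; lines that do not parse are skipped."""
    hits = files = pages = total_bytes = 0
    sites, agents, codes = set(), set(), Counter()
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue
        hits += 1
        codes[m["status"]] += 1
        sites.add(m["ip"])
        agents.add(m["agent"])
        if m["size"] != "-":
            total_bytes += int(m["size"])
        if m["status"] == "200":
            files += 1
            parts = m["request"].split()
            url = parts[1] if len(parts) > 1 else ""
            # Ignore any query string when checking the page suffix.
            if url.split("?")[0].endswith(page_suffixes):
                pages += 1
    return {"hits": hits, "files": files, "pages": pages,
            "sites": len(sites), "agents": len(agents),
            "bytes": total_bytes, "codes": dict(codes)}


summary = totals(LINES)
print(summary)
```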
Example: KDnuggets.com Nov 2005 totals
Monthly Statistics (from webalizer)

Total                     Value
Hits                  1,121,643
Files                   930,468
Pages                   312,889
Kbytes               10,578,535
Unique Sites (IP)        35,942
Unique URLs               6,769
Unique Referrers          7,213
Unique User Agents        2,724
Q: What is the meaning of the difference between Hits and Files?
Example: KDnuggets.com Nov 2005 totals, 2
Monthly stats for Files by Status Code

Code                               Hits
Code 200 - OK                   930,468
Code 206 - Partial Content        9,303
Code 301 - Moved Permanently      4,217
Code 302 - Found                    457
Code 304 - Not Modified         170,874
Code 404 - Not Found              6,297
Other                                27

Answer: the difference between Hits and Files is the number of requests with status code not 200.
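The answer can be checked against the November 2005 figures. A short Python sketch of the arithmetic, using the Hits total and the per-code counts from the two example slides (the variable names are illustrative):

```python
hits = 1_121_643  # Hits, from the monthly totals

by_status = {
    "200": 930_468,
    "206": 9_303,
    "301": 4_217,
    "302": 457,
    "304": 170_874,
    "404": 6_297,
    "other": 27,
}

files = by_status["200"]
non_200 = sum(count for code, count in by_status.items() if code != "200")

print(hits - files)              # 191175
print(non_200)                   # 191175
print(sum(by_status.values()))   # 1121643, equal to Hits
```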
Difference between Files and Pages
• Q: What is the meaning of the difference between Files and Pages?
• A: the difference between Files and Pages is the number of non-HTML files (e.g. images, JavaScript, etc.)
• In the November 2005 KDnuggets log, HTML pages were about 1/3 of all files (312,889 of 930,468)
• However, this data does not separate bot requests (which are heavily weighted towards HTML pages)
Notes: web log formats
• We used a web log in the Apache combined log format
• Some old logs use a different format (the common log format) without the last 2 fields (referrer and user agent), but these are now rare.
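A parser can accept both formats by making the last two fields optional. A minimal Python sketch under that assumption follows; the pattern and the `parse` helper are illustrative, and the second input is simply the sample line with its referrer and user agent removed.

```python
import re

LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    # Referrer and user agent are present only in the longer format.
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse(line):
    """Return a dict of fields; referrer and agent are None in the old format."""
    match = LOG_LINE.fullmatch(line.strip())
    return match.groupdict() if match else None


old_style = ('152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] '
             '"GET /jobs/ HTTP/1.1" 200 15140')
new_style = old_style + ' "http://www.kdnuggets.com/" "Mozilla/4.0"'

print(parse(old_style)["referrer"])
print(parse(new_style)["referrer"])
```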