ACCTG 6910, Spring 2003
DESB, University of Utah
Project Milestone 5 (April 3 – 17)
Question 1 (25%): Discover access patterns in web logs.
The supervisory council for University of Utah’s web portal has contacted the e.bis
Research Lab to discover user access patterns from its web logs.
As a volunteer in the Lab, you have been asked to perform association rule and
sequential pattern mining tasks on a small sample web log. It contains 4736 users,
10000 sessions and 11042 visits, with the following attributes:
Columns   Attribute
1-5       user id
7-11      session id
13-17     URL id
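As a rough illustration of the layout above, the following Python sketch reads one visit per line. It assumes that the ranges 1-5, 7-11 and 13-17 are 1-based, inclusive character positions and that each field holds a space-padded integer; neither assumption is confirmed by the handout. The names Visit, parse_visit and load_visits are hypothetical.

```python
from typing import NamedTuple


class Visit(NamedTuple):
    user_id: int
    session_id: int
    url_id: int


def parse_visit(line: str) -> Visit:
    """Parse one fixed-width record (assumed layout, see the note above)."""
    # Handout columns are taken as 1-based and inclusive, so columns 1-5
    # become the Python slice [0:5], 7-11 become [6:11], 13-17 become [12:17].
    return Visit(
        user_id=int(line[0:5]),
        session_id=int(line[6:11]),
        url_id=int(line[12:17]),
    )


def load_visits(path: str) -> list[Visit]:
    """Read every non-blank line of the log file into a list of Visit records."""
    with open(path) as handle:
        return [parse_visit(line) for line in handle if line.strip()]
```

Calling load_visits("weblog.txt") would then give a list of records that can be grouped by session before mining, provided the file really follows the assumed layout.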
Step 1: From the Project section of the class website, download the data set
weblog.txt and the text file urlmapping.txt, which maps the URL codes in
weblog.txt to URLs on UU's web site.
Step 2: Use IBM Intelligent Miner to mine the data set for large item sets, association
rules and large sequential patterns. Use a support level of 0.3% for both association
rule and sequential pattern mining, and a confidence level of 50% for association rule
mining. Then mine the data set again using two different support levels for both
association rule and sequential pattern mining.
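To make the support and confidence thresholds concrete, here is a minimal Python sketch that computes both measures for rules between single URL ids. It is not how IBM Intelligent Miner works internally, and it does not cover sequential patterns. It assumes that each session is one transaction and that visits are available as (user id, session id, URL id) tuples. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def sessions_to_itemsets(visits):
    """Group (user_id, session_id, url_id) rows into one set of URL ids per session."""
    baskets = defaultdict(set)
    for user_id, session_id, url_id in visits:
        # Key on both ids in case session ids are not unique across users.
        baskets[(user_id, session_id)].add(url_id)
    return list(baskets.values())


def pair_rules(baskets, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) between single URL ids."""
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for pair in combinations(sorted(basket), 2):
            pair_count[pair] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of sessions containing both URLs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # share of sessions with x that also contain y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```

With the thresholds from Step 2, the call would be pair_rules(baskets, 0.003, 0.5). Lowering min_support generally yields more rules, which is one thing to look for when the data set is mined again at other support levels.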
Step 3: Report and analyze the results. Identify 10 interesting association
rules and 10 large sequential patterns. Use urlmapping.txt to find the URLs
that match the URL ids in the rules and patterns. Write up a short analysis (one
to two paragraphs) of these rules and patterns, along with any actions you
recommend that the supervisory council consider.
Question 2 (25%): If the data file includes referrer and visit duration information for
each visit, please discuss how you might use clustering to help identify clusters in the
data file.