COURSE SYLLABUS
Semester: Fall 2013
Course Prefix/Number: CAP 6990
Course Title: Web Data Mining
Course Credit Hours: 3.0
Course Meeting Times/Places:
Online
Instructor and Contact Information:
Dr. Runa Bhuamik
E-mail: [email protected]
Course Web Site: http://elearning.uwf.edu/
Prerequisites or Co-requisites: Data Mining (CAP5771).
Course Description:
The primary focus of this course is on Web usage mining and its applications to e-commerce and
business intelligence. Specifically, we will consider techniques from machine learning, data
mining, text mining, and databases to extract useful knowledge from Web data which could be
used for site management, automatic personalization, recommendation, and user profiling. The
first half of the course will be focused on a detailed overview of the data mining process and
techniques, specifically those that are most relevant to Web mining. The second half will
concentrate on the applications of these techniques to Web and e-commerce data, and their use in
Web analytics, user profiling and personalization.
List of Topics:
The following issues and topics will be covered throughout the course. Many of these
topics will be revisited several times during the course in a variety of contexts.
Data Mining and Knowledge Discovery
o The KDD process and methodology
o Data preparation for knowledge discovery
o Overview of data mining techniques
o Market basket analysis
o Classification and prediction
o Clustering
o Memory-based reasoning
o Evaluation and interpretation
Web Usage Mining Process and Techniques
o Data collection and sources of data
o Data preparation for usage mining
o Mining navigational patterns
o Integrating e-commerce data
o Leveraging site content and structure
o User tracking and profiling
o E-Metrics: measuring success in e-commerce
o Privacy issues
Web Mining Applications and Other Topics
o Data integration for e-commerce
o Web personalization and recommender systems
o Web content and structure mining
o Web data warehousing
o Review of tools, applications, and systems
Teaching materials
Required Textbook:
o Web Data Mining - Exploring Hyperlinks, Contents and Usage Data, by Bing
Liu, Third Edition, Springer, July 2011, ISBN 978-3-642-19459-7
References
o Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber,
Morgan Kaufmann Publishers, ISBN 1-55860-489-8.
o Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The
MIT Press, ISBN 0-262-08290-X.
o Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin
Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
o Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7.
Data mining resource site: KDnuggets Directory
Topics (subject to change, slides may be changed too)
1. Introduction
2. Data pre-processing
o Data cleaning
o Data transformation
o Data reduction
o Discretization
3. Association rules and sequential patterns
o Basic concepts
o Apriori Algorithm
o Mining association rules with multiple minimum supports
o Mining class association rules
o Sequential pattern mining
o Summary
4. Supervised learning (Classification)
o Basic concepts
o Decision trees
o Classifier evaluation
o Rule induction
o Classification based on association rules
o Naive-Bayesian learning
o Naive-Bayesian learning for text classification
o Support vector machines
o K-nearest neighbor
o Bagging and boosting
o Summary
5. Unsupervised learning (Clustering)
o Basic concepts
o K-means algorithm
o Representation of clusters
o Hierarchical clustering
o Distance functions
o Data standardization
o Handling mixed attributes
o Which clustering algorithm to use?
o Cluster evaluation
o Discovering holes and data regions
o Summary
6. Information retrieval and Web search
o Basic text processing and representation
o Cosine similarity
o Relevance feedback and Rocchio algorithm
o Opinion spam or fake review detection
7. Recommender systems and collaborative filtering
o Content-based recommendation
o Collaborative filtering based recommendation
   - K-nearest neighbor
   - Association rules
   - Matrix factorization
8. Web data extraction
o Wrapper induction
o Automated extraction
References:
Weka’s site: http://www.cs.waikato.ac.nz/~ml/weka/
Grading Policy:
The final grade will be determined (tentatively) based on the following components:
Assignments = 65%
Final Project = 25% (throughout the semester)
Final Exam = 10%
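As an illustration of how the three components above combine, here is a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100; the function name `final_score` and the sample scores are hypothetical, and the weights remain tentative as stated above.

```python
# Tentative weights from the grading policy (they sum to 100%).
WEIGHTS = {"assignments": 0.65, "project": 0.25, "exam": 0.10}


def final_score(scores):
    """Weighted final percentage.

    Assumption: every component score is on a 0-100 scale.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical student: 90 on assignments, 80 on the project, 70 on the exam.
    print(round(final_score({"assignments": 90, "project": 80, "exam": 70}), 2))
```

Because assignments carry 65% of the weight, a change in the assignment average moves the final score far more than the same change on the exam.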
Assignments:
There will be 5-6 assignments during the semester involving the concepts and techniques
discussed in class. The assignments may involve experimenting with various tools, as
well as other written or problem-oriented exercises. Some assignments must be done
individually.
Late Policy:
1. You are expected to complete work on schedule. Deadlines are part of the real
world environment you are being prepared for.
2. Documentation of health or family problems may be required.
Late assignments will be penalized 25% per day (that is, an assignment will not be
accepted four days after the due date).
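The arithmetic of the late penalty can be sketched in Python as follows. This is one possible reading, not the instructor's stated formula: it assumes the 25% is deducted from the earned score for each whole day late, so nothing remains at four days. The function name `late_adjusted_score` is hypothetical.

```python
def late_adjusted_score(score, days_late):
    """Score remaining after the late penalty.

    Assumption: 25% of the earned score is lost per whole day late,
    so the work is worth nothing (not accepted) from day four onward.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    remaining = max(0.0, 1.0 - 0.25 * days_late)
    return score * remaining


if __name__ == "__main__":
    # Hypothetical assignment that earned 80 points before the penalty.
    for days in range(5):
        print(days, late_adjusted_score(80, days))
```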
Course Project:
For the class project, students can choose to do an implementation project, a data analysis
project, or a research paper. Implementation projects may be done individually or in
groups of 2 people (depending on the complexity and the type of the project). Research
papers and data analysis projects must be done individually. Each group or individual
will submit a specific project proposal to be approved. More details about the possible
project options, as well as due dates for the proposal and the final submission, will be
available later.
About this Course:
This course is delivered completely online. Students must have consistent access to the Internet.
Learning at a distance may be a very different environment for many of you. You will set your
own schedules, and work at your own pace.
You may require some additional time online during the first few days while you become
accustomed to the online format and you may even feel overwhelmed at times. It will get better.
You should be prepared to spend 8 to 10 hours or more per week online completing lessons
and activities and participating in class discussions. Finally, you may want to incorporate these tips
to help you get started:

Set a time at least twice a week (schedule) to:
o Check elearning postings to determine your tasks.
o Check elearning frequently throughout the week for updates.

Within the first week, become familiar with elearning and how to use it.
o It is a tool to help you learn!

Ask questions when you need answers.
o If you have problems, contact your instructor early.
Technology Requirements:
Knowledge of a machine learning tool – WEKA (on the Windows environment) will be
necessary for the project.
Expectations for Academic Conduct/Plagiarism Policy:
Academic Conduct Policy: (Web Format) | (PDF Format) | (RTF Format)
Plagiarism Policy: (Word Format) | (PDF Format) | (RTF Format)
Student Handbook: (PDF Format)
Assistance:
Students with special needs who require specific examination-related or other course-related
accommodations should contact Barbara Fitzpatrick, Director of Disabled Student Services
(DSS), [email protected], (850) 474-2387. DSS will provide the student with a letter for the
instructor that will specify any recommended accommodations.
Other Course Policies:
Class material and due dates: Students are responsible for all announcements and all material
presented. Students are expected to keep up with due dates and submit all assignments and work
into the elearning dropbox before the due date.
Communication: You are responsible for checking your e-mail and the elearning site regularly,
preferably once a day, to keep up with important announcements, assignments, etc.
Re-grading Assignments: It is your responsibility to check graded assignments/tests
when they are returned to you. I will gladly re-grade an assignment/test when a question or
mistake is brought to my attention. To ensure fairness, I reserve the right to re-grade the entire
assignment/test. As a result, your grade may increase, decrease, or remain the same. Grades will
not be changed after a week from the date graded assignments/tests are returned to the class.
Grades: Final grades will be calculated using a standard grade distribution. The last day of the
term for withdrawal from an individual course with an automatic grade of “W” is 3/24. Students
requesting late withdrawal (W or WF) from class must have the approval of the advisor,
instructor, and the department chairperson (in that order), and finally of the Academic Appeals
committee. Requests for late withdrawal may be approved only for the following reasons (which
must be documented):
1. A death in the immediate family.
2. Serious illness of the student or an immediate family member.
3. A situation deemed similar to categories 1 and 2 by all in the approval process.
4. Withdrawal due to Military Service (Florida Statute 1004.07)
5. National Guard Troops Ordered into Active Service (Florida Statute 250.482)
Requests without documentation will not be accepted. Requests for late withdrawal simply for
not succeeding in a course do not meet the criteria for approval and will not be approved.
Applying for an incomplete or “I” grade will be considered only if: (1) there are extenuating
circumstances to warrant it, AND (2) you have a passing grade and have completed at least 70%
of the course work, AND (3) the department chair approves.
Participation and Feedback: I encourage active participation and regular feedback. I believe
that effective communication between the instructor and students will make the course more
useful, interesting, and productive. Please contact me if you have any questions, concerns, or
suggestions!
Important Note: Any changes to the syllabus or schedule made during the semester take
precedence over this version. Check the elearning site (or email) regularly for up-to-date
information.
Overall Grading Scale:
1. A: 92 - 100
2. A-: 89 - 91
3. B+: 87 - 88
4. B: 82 - 86
5. B-: 79 - 81
6. C+: 77 - 78
7. C: 72 - 76
8. C-: 69 - 71
9. D+: 67 - 68
10. D: 59 - 66
11. F: 0 - 58
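The scale above can be read as a lookup on the lower bound of each band. The Python sketch below is illustrative only: the function name `letter_grade` is hypothetical, and it assumes a fractional score falls into the highest band whose lower bound it reaches (the syllabus does not say how scores are rounded).

```python
# (lower bound, letter) pairs taken from the scale above, highest band first.
SCALE = [
    (92, "A"), (89, "A-"), (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (72, "C"), (69, "C-"), (67, "D+"), (59, "D"), (0, "F"),
]


def letter_grade(score):
    """Return the letter for a 0-100 score.

    Assumption: no rounding is applied; the score must reach a band's
    lower bound to earn that letter.
    """
    for lower, letter in SCALE:
        if score >= lower:
            return letter
    raise ValueError("score must be non-negative")


if __name__ == "__main__":
    # Hypothetical final percentages.
    for s in (95, 88.4, 72, 58):
        print(s, letter_grade(s))
```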