4/8/16
IST 565 Data Mining
Course: Data Mining
Instructor: Jon Fox
Office: Bentonville, AR
Office Hours: by Appointment / Online
Semester: Summer 2016
Email: [email protected]
Phone: 706.399.1538 (cell)
Meeting Place: Online
Catalog Description
A broad introduction to data mining tools and techniques for information professionals.
Students will develop a portfolio of resources, demonstrations, recipes, and examples of
various data mining techniques.
Course Description
This course will introduce popular data mining methods for extracting knowledge from data.
The principles and theories of data mining methods will be discussed and related to current
opportunities in the business environment. Students will also acquire hands-on experience
using state-of-the-art software to develop data mining solutions to practical problems. The course
focuses on becoming familiar and comfortable with a range of the available analytic tools in the
context of several difficult, data-focused problems.
The topics of the course will include the key tasks of data mining, including data preparation,
concept description, association rule mining, classification, clustering, evaluation and analysis.
By addressing problems in creative ways and connecting sets of available tools and data, the
students are able to gain a practical understanding of analytics as a whole while identifying
areas of interest for additional exploration.
Learning Objectives
During the course, we will emphasize:
1. Experiential learning through reading and practical exercises.
2. Collaborative learning through online discussions between instructors and peers.
3. Self-learning with appropriate instructional support and timely feedback using data
mining case studies.
In order to be successful in this course, the student will:
1. Pro-actively research solution options vs. relying solely on textbook content.
2. Actively code while completing the reading assignments.
3. Present results in a professional manner. Comments – Clarity – Correctness – Credit.
4. Submit their assignments on time.
Upon completion of the course, the student will be able to:
1. Understand the fundamental processes, concepts, and techniques of data mining.
2. Appreciate the range of applicability of data mining to real problems in areas such as
business, science, and engineering.
3. Advance their understanding of contemporary data-mining tools.
4. Communicate their results in a meaningful way.
Required Course Materials
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining,
Pearson, 2005. (Free sample chapters are available at the following website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
The required software includes Weka, R, and Tableau. All of these software packages
are available through remote lab.
Course Assignments / Percent of Final Grades / Due Dates:
Task  Assignment            Percent of Course  Due Dates
1     Discussions           10%                All semester long
2     Homework Assignments  55%                Week 3, 5, 7, & 9
3     Project Checkpoints   5%                 Week 4 & Week 10
4     Final Project         30%                Due at end of semester
Assignment #1: Discussions
The online discussions provide an opportunity to discuss current data mining readings,
events, technologies, and methods. The discussions facilitate the first, second, and fourth
learning objectives of the course by providing the opportunity to demonstrate
understanding of data mining concepts and the range of applicability of data mining. There
are 10 possible discussions in this course worth a maximum of 1 point each. Maximum
points are possible if the submission is on-time, complete, and correct.
Assignment #2: Homework Assignments
Homework assignments provide open-ended problem solving experiences that build on the
material covered in the readings. The assignments facilitate the first and third learning
objectives of the course by providing the opportunity to apply techniques from class to
realistic problem solving situations. A separate instruction document will be provided with
specific instruction for each assignment. The first homework assignment is worth a
maximum of 10 points. The final 3 homework assignments in this course are worth a
maximum of 15 points each. Maximum points are possible if the submission is on-time,
complete, and correct.
Assignment #3: Final Course Project
For the final project, students will identify a data-mining problem, bring together different
data sources, conduct analysis, draw conclusions, and produce a report explaining the
results. Maximum points are possible if the submission is on-time, complete, and
demonstrates the student’s ability to match the appropriate data mining methods to the
chosen problem, draw appropriate conclusions, and present the results in a meaningful
way.
Class-Wide Phone Conferences:
The instructor will answer student questions during toll-free phone conferences. There will be
an introductory call early in the semester and then one call prior to each of the two project
checkpoints. The phone conferences are optional but participation is highly encouraged as
course learning objectives, specific concepts, and upcoming assignments will be discussed.
Course Grading:
Grades for specific assignments and the course final grade will be assigned by the instructor
through the course’s on-line site. There are 100 possible grade points in this course and each
Assignment’s grade value goes directly toward the total earned by each student. The numeric
final point total will translate to the final letter grade for the course as follows:
Grade  Points      Grade  Points
A      100-93      C+     79-78
A-     92-90       C      77-73
B+     89-88       C-     72-70
B      87-83       D      69-60
B-     82-80       F      < 60
Grades will be available for viewing in the Grade Book section for the course’s on-line site.
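As an illustration only, the point-to-letter scale above can be written as a small lookup. This is a sketch, not the instructor's grading procedure: the names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and the handling of fractional totals (a total that falls between two bands, such as 92.5, receives the lower grade) is an assumption the syllabus does not state.

```python
# Cutoffs copied from the grading scale: (minimum points, letter grade),
# listed from the highest band to the lowest.
GRADE_CUTOFFS = [
    (93, "A"), (90, "A-"),
    (88, "B+"), (83, "B"), (80, "B-"),
    (78, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def letter_grade(points):
    """Return the letter grade for a course point total between 0 and 100."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    for minimum, letter in GRADE_CUTOFFS:
        # Assumption: a fractional total below a cutoff drops to the next band.
        if points >= minimum:
            return letter
    return "F"


# Example: points earned for discussions, homework, checkpoints, final project.
earned = [10, 50, 5, 27]
print(letter_grade(sum(earned)))
```

The example sums points across the four assignment categories because each assignment's points count directly toward the 100-point course total.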
Academic Integrity
The academic community of Syracuse University and of the School of Information Studies
requires the highest standards of professional ethics and personal integrity from all members of
the community. Violations of these standards are violations of a mutual obligation
characterized by trust, honesty, and personal honor. As a community, we commit ourselves to
standards of academic conduct, impose sanctions against those who violate these standards,
and keep appropriate records of violations. The academic integrity statement can be found at
http://supolicies.syr.edu/ethics/acad_integrity.htm.
Blackboard
The iSchool uses Syracuse University’s Blackboard system to facilitate distance learning and
main campus resources. The environment is composed of a number of elements that will help
you be successful in both your current coursework and your lifelong learning opportunities. To
access Blackboard, go to the following URL: http://blackboard.syr.edu. Use your Syracuse
University NetID & Password to log into Blackboard. For questions regarding technical aspects
of Blackboard, please submit a help ticket to the iSchool dashboard at My.iSchool.Dashboard
(https://my.ischool.syr.edu). Log in with your NetID, select “Submit a Helpdesk Ticket,” and
select Blackboard as the request type. The iSchool Blackboard support team will assist you.
Students with Disabilities
In compliance with Section 504 of the Americans with Disabilities Act (ADA), Syracuse University
is committed to ensure that “no otherwise qualified individual with a disability … shall, solely by
reason of disability, be excluded from participation in, be denied the benefits of, or be
subjected to discrimination under any program or activity …” If you feel that you are a student
who may need academic accommodations due to a disability, you should immediately register
with:
Office of Disability Services (ODS)
804 University Avenue
Room 308 3rd Floor
315.443.4498 or 315.443.1371 (TTD only)
ODS is the Syracuse University office that authorizes special accommodations for students with
disabilities.
Course Schedule

Week 0 (5/16 – 5/22): Course Introduction
Readings: Syllabus
Assignments: Intro Lecture

Week 1 (5/23 – 5/29): Data Mining Intro
Align our class with the methods, goals, and expectations of the course.
Readings: Pang Ch. 1
Assignments: Syllabus Quiz; Discussion Profile-Post

Week 2 (5/30 – 6/5): Data Preparation
Before we use the data, we have to make sure it is ready.
Readings: Pang Ch. 2
Assignments: Adobe / Phone Conf 1; Discussions

Week 3 (6/6 – 6/12): Data Exploration
Understanding the structure and content of the data.
Readings: Pang Ch. 3
Assignments: Exercise 1; Discussions

Week 4 (6/13 – 6/19): Classification
Using decision trees to classify data.
Readings: Pang Ch. 4
Assignments: Final Project Proposal; Discussions

Week 5 (6/20 – 6/26): Model Evaluation
How good is our model? Does our algorithm help us understand the data?
Readings: Pang Ch. 4
Assignments: Exercise 2; Discussions

Week 6 (6/27 – 7/3): Classification
Using Naïve Bayes as a decision tree alternative.
Readings: Pang Ch. 5
Assignments: Discussions; Adobe / Phone Conf 2

Week 7 (7/4 – 7/10): Classification
Nearest neighbors and support vectors.
Readings: Pang Ch. 5
Assignments: Discussions; Exercise 3

Week 8 (7/11 – 7/17): Clustering
Finding the patterns and producing meaningful results.
Readings: Pang Ch. 8
Assignments: Discussions; Adobe / Phone Conf 3

Week 9 (7/18 – 7/24): Clustering
Finding the patterns and producing meaningful results.
Readings: Pang Ch. 8
Assignments: Exercise 4; Discussions

Week 10 (7/25 – 7/31): Association Rules
Finding oranges in your shopping basket.
Readings: Pang Ch. 6
Assignments: Final Project Status; Discussions

Week 11 (8/1 – 8/7): Project Preparation
Identify, resolve, and hopefully avoid the pitfalls of data mining.
Readings: NA
Assignments: Project Preparation; No assignments due

Week 12 (8/8 – 8/14): Project Presentation
Present results in a meaningful way.
Readings: NA
Assignments: Final Course Project; Due on 8/14

Read More About It:
Lantz, B. (2015). Machine Learning with R. Packt Publishing.
Mitchell, T. (1997). Machine Learning. McGraw-Hill: Boston. Available at the following
website: http://www.cs.cmu.edu/~tom/mlbook.html.
North, M. (2012). Data Mining for the Masses. Infinite.
Putler, D. D. & R. E. Krider (2012). Customer and Business Analytics. Boca Raton, FL: CRC Press.