Business Intelligence and Data Mining
ISOM 3360: Spring 2016

Instructor: Jia Jia, ISOM
Contact: Email: [email protected]
         Office: LSK 5045
         Begin subject: [ISOM3360]... <--- Note!
Office Hours: Mon 13:00 - 15:00 and by appt.
Course Schedule and Classroom:
         Lecture: Wed & Fri 15:00 - 16:30 (LSK1014)
         Lab: LA1 Mon 12:00 - 13:00 (LSKG005)
              LA2 Thu 13:30 - 14:30 (LSKG005)
Course Webpage: Accessible from Canvas

1. Course Overview
This course will change the way you think about data and its role in business.
Businesses, governments, and individuals create massive collections of data as a byproduct of
their activity. Increasingly, decision-makers rely on intelligent technology to analyze data
systematically to improve decision-making. In many cases, automating analytical and decision-making processes is necessary because of the volume of data and the speed with which new
data are generated.
In virtually every industry, data mining has been widely used across various business units such
as marketing, finance and management to improve decision making. In this course, we discuss
specific scenarios, including the use of data mining to support decisions in customer relationship
management, market segmentation, credit risk management, e-commerce, financial trading,
online recommendation, and search engine strategies.
The course will explain with real-world examples the uses and some technical details of various
data mining techniques. The emphasis primarily is on understanding the business application
of data mining techniques, and secondarily on the variety of techniques. We will discuss the
mechanics of how the methods work only if it is necessary to understand the general concepts
and business applications. You will learn to approach these problems analytically and understand
that proper application of technology is as much an art as it is a science.
In order to accommodate the emerging need for big data applications, modern distributed file
systems and MapReduce algorithms and some other large-scale algorithms will also be covered.
The course is designed for students with various backgrounds, although some basic mathematical
knowledge of linear algebra, calculus, and probability is a prerequisite.
After taking this course you should:
1. Approach business problems data-analytically (intelligently). Think carefully & systematically
about whether & how data can improve business performance.
2. Be able to interact competently on the topic of data mining for business intelligence. Know the
basics of data mining processes, techniques, & systems well enough to interact with business
analysts, marketers, and managers. Be able to envision data-mining opportunities.
3. Be able to identify the right BI tools/techniques for various business problems. Gain hands-on
experience in using popular BI tools and prepare for job positions that require familiarity
with these tools.
2. Lecture Notes and Readings
• Lecture notes
For most classes I will hand out lecture notes, which will outline the primary material for the
class. Other readings are intended to supplement the material we learn in class. They give
alternative perspectives and additional details about the topics we cover:
• Supplemental readings posted to Canvas or distributed in class.
• Supplemental books (optional):
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management,
Third Edition, by Michael Berry and Gordon Linoff, Wiley, 2011. ISBN: 0470650931
Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking,
by Foster Provost and Tom Fawcett, O'Reilly Media, 2013. ISBN: 9781449361327
Many students find these books to be excellent supplemental resources. In the class schedule
below I suggest the most important sections to read to supplement each class module.
3. Requirements and Grading
The grade breakdown is as follows:
1. Lab participation: 10%
2. Homework (4): 40%
3. Midterm quiz: 20%
4. Final exam: 30%
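As a rough illustration of how these weights combine, the sketch below computes a weighted course score. It assumes each component is already expressed on a 0-100 scale; the function name `course_score` and the dictionary keys are hypothetical and not part of any official grading tool.

```python
# Weights taken from the grade breakdown above.
WEIGHTS = {"lab": 0.10, "homework": 0.40, "midterm": 0.20, "final": 0.30}


def course_score(scores):
    """Return the weighted course score.

    Assumption: every component score is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"lab": 100, "homework": 80, "midterm": 70, "final": 90}
    print(course_score(example))
```

Under this assumption, the homework component moves the total the most, since it carries the largest weight.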
4. Important Notes on the Lab Session
This is primarily a lecture-based course, but student participation is an essential part of the
learning process in the form of active practice. You are NOT going to learn without practicing
the data analysis yourselves. During the lab session, I expect you to give the class your full
attention and follow the instructions. You should also actively link the empirical results you
obtain during the lab to the concepts you learned in the lectures.
During the lab session, you will gain hands-on experience with top data mining tools,
e.g., Weka and R.
5. Homework Assignment and Exams
There will be a total of four individual homework assignments, each comprising questions to be answered and
hands-on tasks. Completed assignments must be handed in prior to the start of the class on the
due date. If submitted by email they must arrive at least one hour prior to the start of class.
Assignments will be graded and returned promptly.
Assignments are due prior to the start of the lecture on the due date. Turn in your assignment
early if there is any uncertainty about your ability to turn it in on the due date. Assignments up to
24 hours late will have their grade reduced by 25%; assignments up to one week late will have
their grade reduced by 50%. After one week, late assignments will receive no credit.
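The late policy above can be sketched as a small function. This is one possible reading only: it assumes the reduction is applied as a multiplier on the earned grade and that the 24-hour and one-week boundaries are inclusive. The names `late_multiplier` and `penalized_grade` are hypothetical.

```python
def late_multiplier(hours_late):
    """Fraction of the earned grade kept, given hours past the deadline."""
    if hours_late <= 0:
        return 1.0   # on time: no reduction
    if hours_late <= 24:
        return 0.75  # up to 24 hours late: reduced by 25%
    if hours_late <= 7 * 24:
        return 0.50  # up to one week late: reduced by 50%
    return 0.0       # more than one week late: no credit


def penalized_grade(raw_grade, hours_late):
    """Apply the late multiplier to the grade the work would have earned."""
    return raw_grade * late_multiplier(hours_late)


if __name__ == "__main__":
    for hours in (0, 10, 48, 200):
        print(hours, penalized_grade(80, hours))
```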
The in-class midterm quiz is tentatively scheduled for March 23. Let me know as
early as possible if there is any unavoidable conflict. The final exam will be held during the final
examination period; the date will be announced later in the semester. The quiz and exam must
be taken at their scheduled times; make-up quizzes and exams will only be given for special
cases, in accordance with University guidelines.
Tentative Schedule of Lecture Topics
The following table shows the planned list of topics that we will cover in each class as well as
the assignment due dates. Please note that this schedule is tentative and may be adjusted as
the semester progresses.
Class Number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 22 23

Date: Feb. 3, Feb. 5, Feb. 12, Feb. 17, Feb. 19, Feb. 24, Feb. 26, Mar. 2, Mar. 4, Mar. 9, Mar. 11, Mar. 16, Mar. 18, Mar. 23, Apr. 1, Apr. 6, Apr. 8, Apr. 13, Apr. 15, Apr. 20, Apr. 22, Apr. 27, Apr. 29, May 4

Topics:
• Introduction. What is BI? Why BI now? What is data mining? DM process. DM tasks. DM Basics
• Decision tree learning. Business application: Customer segmentation
• Model evaluation. Cost-sensitive learning. ROC graph
• Linear and logistic regression. Business application: Customer retention
• "Naïve" Bayes and linear discriminant analysis
• Text Analytics/Mining. Business application: Spam filtering and financial news trading
• Midterm Review
• Midterm quiz (Coverage 1-12)
• Unsupervised learning: Association rule learning. Business application: Basket analysis
• Unsupervised learning: Clustering analysis. Business application: Customer segmentation
• Support vector machines. Business application: Handwritten digit recognition
• Search engine (SE) analytics: How does SE work? What is SE marketing? How to combat web spam?
• Nearest neighbors and recommender system in e-commerce
• Large-scale data processing: MapReduce framework
• Final Exam Review

Assignment Due Dates: Homework 1 Due, Homework 2 Due, Homework 3 Due, Homework 4 Due

Lab Session Schedule
Lab Number   Date                 Topics
1            Feb. 15 & 18         Weka installation, Weka demo (data type, format, loading)
2            Feb. 22 & 25         R installation and data exploration with R
3            Feb. 29 & Mar. 3     Decision tree learning with Weka
4            Mar. 7 & 10          Decision tree vs. logistic regression with Weka
5            Mar. 14 & 17         Model selection with R
6            Apr. 11 & 14         Financial news trading (naive Bayes) with Weka
7            Apr. 18 & 21         Association rule learning with Weka
8            Apr. 25 & 28         Target marketing application