IS 579 – Business Intelligence and Data Mining

Instructor: Dr. Debabrata (Deb) Dey          Office: PCAR 514
Phone: 206-543-1855                          Email: [email protected]
WWW: http://faculty.washington.edu/ddey/index.htm
Course WWW: UW Canvas (http://canvas.uw.edu/)
Office hours: Thursdays 4–6 PM

Text, Software, and Other Resources

Text: Data Mining: Practical Machine Learning Tools and Techniques, Witten, Frank, & Hall, Morgan Kaufmann, 3rd Ed., 2011.

Software: Microsoft Excel and WEKA, version 3.6.10 (http://www.cs.waikato.ac.nz/ml/weka/).

Links to web-based resources will occasionally be provided. (Many web sites provide tutorials and other background material on basic data mining concepts. Unfortunately, not all of them are correct. Since I cannot review every site for correctness, I suggest that you restrict yourself to the web resources I provide in class and at the Canvas site.)

Course Packet

For each topic discussed in this course, I have prepared an extensive set of PowerPoint slides. These slide decks are all posted at the Canvas site. They are not intended to substitute for regular reading from the textbook; they merely summarize the topics, issues, and concepts discussed in class. The slide decks will save you time on note-taking in class, so that you can make better use of that time by listening, asking questions, and participating. In addition, all practice questions and solutions will be posted at the Canvas site. (You do not have to buy a course packet from a copy center.)

Course Description and Objective

The objective of this course is to introduce students to the various techniques of data mining so that they can identify problems and opportunities in their companies and apply these techniques. Special attention will be given to existing real-world applications that make use of data mining techniques.
Students are expected to understand the basic concepts and their applicability, but are not expected to do programming or detailed implementation.

Course Motivation

Over the years, organizations have accumulated vast amounts of data in their enterprise-wide information systems. These data typically record daily operations and transactions within a business context. It is easy to see that an organization's business intelligence and rules are, in some way, embedded in these data. The question then becomes how we can mine these data in order to (i) learn the embedded business intelligence, and (ii) apply that intelligence to run the business more efficiently and effectively.

Over the last decade or so, data mining has become an important part of intelligent business practice, leading to higher revenues and lower costs while maintaining or enhancing the quality of products and services. Data mining can be applied in any functional area, or in a combination of areas, and real-world examples abound. Companies such as Blockbuster, Amazon, and American Express, for example, mine their transaction data to recommend appropriate products and services to their customers. WalMart uses data mining techniques for more efficient logistics, supply chain management, inventory control, and pricing. Many financial institutions use data mining for loan processing, credit rating, and target marketing. Companies also combine data mining with their online presence to provide technical support and customized solutions.

IS 579 Syllabus: Deb Dey

Classroom Expectations

Please bring copies of the posted slide decks to every class. Please display your nametag at each session. Please turn off your cell phone (or put it in silent mode) during class. We will be working with numbers in class to gain a better understanding of the concepts; please bring your calculator to every class.
If you bring a laptop computer or a handheld device to class, please restrict its use to class-related purposes only.

Class Participation

Interactive learning is not only fun but also very effective for grasping the material quickly. To promote a classroom environment conducive to interactive (often called active) learning, I will regularly grade students on their participation during class. Participation points are awarded for thoughtful, pertinent questions, answers, and discussion; frivolous questions, on the other hand, may lead to a deduction of participation points. Please place your nametag in front of you throughout the quarter so that I can grade you correctly; a missing nametag means no participation points.

Homework and Course Project

There are no graded homework assignments for this class. However, to provide regular practice, sets of practice questions will be handed out. Students are encouraged to work on these questions individually and check their work against the posted solutions. If needed, direct feedback on your work will also be provided. There is also a team project for this course; details will be provided separately.

Exams and Pop Quizzes

There will be two exams in this course: an in-class Exam 1 and a take-home Exam 2. Both exams are open book and open notes. For Exam 1, a calculator will be required; a computer is not. For Exam 2, you may use a calculator or a computer. Exam 2, the final exam, is cumulative and should be submitted at the Canvas site on or before the due date.

Makeup exams are usually not granted. In exceptional situations, a makeup oral or written exam can be arranged. Please consult me early in the quarter if you foresee conflicts with your work or work-related travel schedule.

I will regularly give in-class pop quizzes to ensure that you keep up with the basic concepts. The lowest quiz score will be dropped in grade calculations.
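The dropped-quiz rule and the grading weights (Course Project 20%, Pop Quizzes 20%, Exam 1 20%, Exam 2 30%, Class Participation 10%) combine by simple weighted arithmetic. The Python sketch below illustrates one way to compute a course score. It assumes that every component is scored on a 0–100 scale, that the quiz component is the plain average of the remaining quizzes after the single lowest one is dropped, and that the course score is the weighted sum of the components. The function names and sample scores are hypothetical, and letter-grade cutoffs are not covered.

```python
from typing import Dict, List

# Weights from the grading scheme (fraction of the course grade).
WEIGHTS: Dict[str, float] = {
    "project": 0.20,
    "quizzes": 0.20,
    "exam1": 0.20,
    "exam2": 0.30,
    "participation": 0.10,
}


def quiz_average(quiz_scores: List[float]) -> float:
    """Average the quiz scores after dropping the single lowest one.

    Assumption: quizzes are weighted equally within the quiz component.
    """
    if len(quiz_scores) < 2:
        raise ValueError("need at least two quizzes to drop the lowest")
    kept = sorted(quiz_scores)[1:]  # drop exactly one lowest score
    return sum(kept) / len(kept)


def course_score(project: float, quiz_scores: List[float], exam1: float,
                 exam2: float, participation: float) -> float:
    """Weighted course score on a 0-100 scale (all inputs assumed 0-100)."""
    components = {
        "project": project,
        "quizzes": quiz_average(quiz_scores),
        "exam1": exam1,
        "exam2": exam2,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


if __name__ == "__main__":
    # Hypothetical scores for one student, each on a 0-100 scale.
    score = course_score(project=90, quiz_scores=[60, 80, 90, 100],
                         exam1=85, exam2=88, participation=100)
    print(round(score, 2))
```

With these hypothetical scores, the lowest quiz (60) is dropped, the quiz average is 90, and the weighted score is 0.2·90 + 0.2·90 + 0.2·85 + 0.3·88 + 0.1·100 = 89.4.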
Grading

Course Project            20%
Pop Quizzes               20%
Exam 1 (in-class)         20%
Exam 2 (take-home)        30%
Class Participation       10%

Graded Work, Feedback, and Solutions

Graded work (pop quizzes and the midterm, Exam 1) and feedback on submitted practice questions will be returned promptly to your pick-up folder; please check your pick-up folder regularly. Solutions to practice questions will be posted at the Canvas site; please check your answers against them. If you have questions, please bring them to my attention as soon as possible.

Tentative Schedule

09/26/13, 10/03/13, 10/10/13, 10/17/13: Introduction; Database and Probability; Information Theory; Data Cleaning, Conversion, & Preparation; Basic Classification (two sessions); WEKA; Testing & Validation
10/24/13: Advanced Classification; Review for Exam 1
10/31/13: Exam 1
11/07/13, 11/14/13, 11/21/13: Association Rule Mining (two sessions); Numerical Prediction (two sessions); Other Classification Techniques; Implementation & Management; Review for Exam 2
11/25/13
11/28/13: No class (Thanksgiving)
12/05/13: Advanced Topics (TBA)

Due (in schedule order): Practice Set 1; Practice Set 2; Practice Set 3; Practice Set 4; Exam 1 (in-class); Practice Set 5; Practice Set 6; Exam 2 (take-home); Class Project; Exam 2 (take-home)

Additional Tutorial and Review

Date Posted    Time
10/10/13       9:00–10:00 PM
10/24/13       9:00–10:00 PM
11/07/13       9:00–10:00 PM
11/21/13       9:00–10:00 PM

Lecture Notes and Description of Topics

Chapter 1: Introduction
Topics: Basic concepts
Reading: Text Chapters 1 and 3

Chapter 2: Database and Probability Basics
Topics: Relational databases, SQL, Bayes’ rule, conditional independence, contingency tables

Chapter 3: Information Theory
Topics: Information theory, information gain, tree induction, gain ratio

Chapter 4: Data Cleaning, Conversion, and Preparation
Topics: Noise, redundancy, (lack of) specificity, heterogeneity, attribute expansion, attribute consolidation, input formatting, data partitioning
Reading: Text Chapter 2 and Section 7.2

Chapter 5: Basic Classification
Topics: Basic concepts of Naïve Bayesian and decision tree (ID3) classifiers
Reading: Text Sections 4.2 and 4.3

Chapter 6: Testing and Validation
Topics: Training versus testing data, partitioning datasets, accuracy, stratified accuracy, confusion matrix, relative information score (RIS), cost-based measure, lift ratio
Reading: Text Chapter 5

Chapter 7: Advanced Classification
Topics: Dealing with missing values and numeric features, decision trees (C4.5), pruning
Reading: Text Sections 4.2, 4.3, and 6.1

Chapter 8: Association Rules
Topics: Market basket analysis, items and itemsets, support, confidence, mining basics, Apriori-Gen algorithm, single-item transactions
Reading: Text Sections 4.5 and 6.3

Chapter 9: Numerical Prediction
Topics: Linear regression, multiple regression, performance measures, regression trees
Reading: Text Sections 4.6 and 6.5

Chapter 10: Other Classification Techniques
Topics: Rule-based and case-based classification
Reading: Text Sections 4.4, 4.7, and 6.4

Chapter 11: Implementation and Management
Topics: Data integration and preparation, data quality, choice of techniques, choice of software tools, build vs. buy, insource vs. outsource, performance issues, testing, deployment, change management
Reading: To be announced
