Syllabus for DSC 491: Introduction to Data Mining in Business
Course Goals
1. To give students confidence in using the computer to analyze very large data sets to discern,
understand, and interpret the truth about populations and processes.
2. To promote critical thinking by critically examining the appropriate uses of and conclusions drawn
from some of the most important statistical data mining methods (specifically cluster analysis,
market basket analysis, tree diagrams, logistic regression and neural nets).
3. To help students gain perspective in the general use of scientific methods by discussing both the
assumptions behind statistical methods and remedial actions needed when assumptions are
violated.
4. To promote understanding contexts by emphasizing that the current state of statistical practice is a
historical and changing situation.
5. To give students the opportunity to engage with other learners by discussing the practical and
theoretical meanings of statistical data analyses. Also, to give students practice in communicating
statistical results by reporting their conclusions in writing to the professor, who will provide
feedback.
6. To demonstrate how statistical analyses allow the evaluation of alternative choices in a business
context, and to show how these methods help us to reflect and act in a business context.
Text:
Required: Applied Analytics Using SAS Enterprise Miner 5
Recommended: Discovering Knowledge in Data: An Introduction to Data Mining, by Daniel T. Larose
Cheating, as outlined in the 2008-2009 Student Handbook, shall be dealt with severely in
accordance with the guidelines in that document. Copying from another person's exam, using
unapproved information on an exam, and receiving unauthorized help from another student are
all cheating. Discussion with others about a home task is permitted, but directly copying
another student's home task is cheating.
Proposed Topics
1. Data Mining vs. Data Warehousing: what are they and how are they different?
What kinds of business questions can be answered with Data Mining methods?
2. DMAIC, SEMMA, Virtuous Cycle, Vardeman and Jobe plan
3. Supervised vs. Unsupervised (Directed vs. Undirected) Methods
4. Types of variables: Target vs. Input; interval, nominal, ordinal, categorical, dummy
5. Hypothesis Testing, p-values, χ² statistic, F statistic, logworth
6. Cluster Analysis
7. Market Basket Analysis, i.e., affinity grouping; lift, confidence, support
8. Decision Trees, lift, cumulative gains, model evaluation, decision-making
9. Logistic Regression, model evaluation, decision-making
10. Neural Nets, model evaluation, decision-making
SAS Enterprise Miner 5.2 will be used extensively. Time permitting, JMP will be used as well.
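As a small illustration of the support, confidence, and lift measures named under Market Basket Analysis, the sketch below computes them for one rule using the usual textbook definitions. It is written in Python rather than SAS Enterprise Miner, the function name `basket_metrics` is hypothetical, and the five transactions are made-up toy data, so it should be read as a sketch and not as the way the course software reports these values.

```python
def basket_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Textbook definitions (assumed here):
      support    = P(antecedent and consequent)
      confidence = P(consequent | antecedent)
      lift       = confidence / P(consequent)
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_both = sum(1 for t in transactions if (a | c) <= set(t))
    if n == 0 or count_a == 0 or count_c == 0:
        raise ValueError("rule is undefined: an item set never occurs")
    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift


# Toy data, invented purely for illustration.
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence, lift = basket_metrics(toy_transactions, {"bread"}, {"milk"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift below 1, as in this toy rule, suggests that buying the antecedent makes the consequent slightly less likely than its baseline rate; a lift above 1 suggests a positive association.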
Grading Policy
You will be graded no tougher than the following:
3 Exams: 50%
Final Exam: 35%
Homework: 15%
Lowest A is 90
Lowest B is 75
Lowest C is 62
Lowest D is 50
Below 50 is F
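The weights and cutoffs above can be combined into a rough grade estimate, sketched here in Python. Two assumptions are not stated in the syllabus: that the three exams share the 50% equally, and that all scores are on a 0-100 scale. The names `course_score` and `letter_grade` are hypothetical. Because grading is "no tougher" than these cutoffs, the letter returned is a lower bound on the actual grade.

```python
# Weights and cutoffs taken from the grading policy above.
WEIGHTS = {"exams": 0.50, "final": 0.35, "homework": 0.15}
CUTOFFS = [(90, "A"), (75, "B"), (62, "C"), (50, "D")]


def course_score(exam_scores, final, homework):
    """Weighted course score on a 0-100 scale.

    Assumption: the exams are averaged equally before weighting.
    """
    exam_avg = sum(exam_scores) / len(exam_scores)
    return (WEIGHTS["exams"] * exam_avg
            + WEIGHTS["final"] * final
            + WEIGHTS["homework"] * homework)


def letter_grade(score):
    """Lowest letter guaranteed by the published cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical student scores, for illustration only.
score = course_score([80, 90, 100], final=80, homework=100)
print(round(score, 1), letter_grade(score))
```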