MIS2502.002 Fall 2016: Exam 3 Study Guide
Date/Time: 12/19, 1:00 – 3:00 pm
Place: Regular classroom
The exam will be a combination of multiple-choice and short-answer questions. It is a closed-book, closed-notes exam.
The exam will have 2 parts.
Part one will be 50 multiple choice questions, worth one point each. Students will record their answers to part
one questions on a SCANTRON sheet.
Part two will also be worth 50 points. Most questions in part two will require students to perform one or
more calculations.
You should bring a pencil. You should bring a calculator. You will not be able to use a computer or your
smartphone’s calculator during the exam.
The following is a list of items that you should review in preparation for the exam. Note that not every item on
this list may be on the exam, and there may be items on the exam not on this list.
Data Visualization
 Be sure to review the “data visualization” lecture from earlier in the semester. (Key concepts: Lie
Factor, Data Ink, Chart-Junk.)
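As a quick refresher on the Lie Factor, here is a minimal sketch assuming the usual definition: the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is measured as a relative change. The function names and the chart numbers are hypothetical, made up for illustration.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Hypothetical chart: a bar grows from 1.0 to 4.0 inches (a 300% change)
# while the underlying data grows from 100 to 150 (a 50% change).
factor = lie_factor(1.0, 4.0, 100, 150)
print(round(factor, 2))
```

A Lie Factor close to 1 means the graphic represents the data faithfully; values well above 1 mean the graphic exaggerates the change.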
Data Mining and Data Analytics Techniques
 Explain the three data analytics techniques we covered in the course
 Decision Trees, Clustering, and Association Rules
 What kinds of problems can each solve? Provide a business-oriented example.
 Make recommendations to a business based on the results of each type of analysis.
 Explain how data mining differs from OLAP analysis. How is data mining different from working with a
data cube or a pivot table?
Using R and RStudio
You will not need to generate blocks of R code for this exam. However, you should be familiar with the
basic syntax.
 Explain the difference between R and RStudio
 Explain the role of packages in R
 Generate and explain basic syntax for R, for example:
 Variable assignment
 Identify functions versus variables
 Identify how to access a variable (column) from a dataset (table)
Understanding Descriptive Statistics (Introduction to R)
 Be able to read and interpret a histogram
 Be able to read and interpret sample (descriptive) statistics
 Be able to read and interpret results from simple hypothesis testing (e.g., t-test)
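To make t-test output less of a black box, the following sketch computes the t statistic for two independent samples by hand. It assumes the unequal-variance (Welch) form, which is what R's t.test() uses by default; the sample values and the function name are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """t statistic for the difference in means, without assuming equal variances."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical samples, for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))
```

When reading the output, focus on the p-value: if it is below the chosen significance level (commonly 0.05), the difference between the two group means is considered statistically significant.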
Decision Tree Analysis (Decision Trees in R)
 Understand the role and structure of outcome and predictor variables in a decision tree
 Interpret a decision tree: determine the probability of an event happening based on predictor variable
values
 Understand the meaning of the complexity factor (COMPLEXITYFACTOR) and minimum split
(MINIMUMSPLIT), and how they can alter the decision tree
 Compute Chi-Square and understand how it is used in the tree induction algorithm
 Interpret the correct classification rate of a decision tree
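The two calculations in this section can be sketched as follows. This assumes the standard Pearson chi-square statistic (the sum of (observed - expected)^2 / expected over every cell) and defines the correct classification rate as correct predictions divided by total predictions. Both tables are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the predictor and the outcome were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def correct_classification_rate(confusion):
    """Share of cases on the diagonal (predicted class equals actual class)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical split: rows are predictor values, columns are outcomes
observed = [[10, 40],
            [30, 20]]
chi = chi_square(observed)

# Hypothetical confusion matrix: rows are actual classes, columns are predicted
confusion = [[40, 10],
             [5, 45]]
rate = correct_classification_rate(confusion)

print(round(chi, 2), rate)
```

A larger chi-square value means the outcome is distributed more differently across the values of the predictor, which makes that predictor a stronger candidate for a split.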
Cluster Analysis (Cluster Analysis Using R)
 Be able to read the output from a cluster analysis and interpret a scatter plot of two-dimensional data (i.e.,
the baseball example from the slides)
 Interpret within-cluster sum of squares error and between-cluster sum of squares error
 Within-cluster sum of squares error is also known as: intra-cluster sum of squares error, intra-cluster SSE,
or “withinss” in R
 Between-cluster sum of squares error is also known as: inter-cluster sum of squares error, inter-cluster
SSE, or “betweenss” in R
 Relate error to cohesion and separation
 What does it mean when those values are larger (or smaller)?
 What happens to those statistics as the number of clusters increases?
 What is the advantage of fewer clusters? (There’s a balance between cohesion and ease of interpretation)
 Interpret standardized cluster means for each input variable. Describe a particular cluster in relation to
the average for the entire data set
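To see where the two error measures come from, here is a minimal sketch on a tiny two-dimensional data set. It assumes the definitions used by R's kmeans(): within-cluster SSE is the sum of squared distances from each point to its own cluster centroid, and between-cluster SSE is the sum, over clusters, of cluster size times the squared distance from the cluster centroid to the overall centroid. The points and function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_ss(cluster):
    """Cohesion: squared distances from each point to its cluster centroid."""
    center = centroid(cluster)
    return sum(squared_distance(point, center) for point in cluster)


def between_ss(clusters):
    """Separation: size-weighted squared distances from cluster centroids to the overall centroid."""
    all_points = [point for cluster in clusters for point in cluster]
    overall = centroid(all_points)
    return sum(len(c) * squared_distance(centroid(c), overall) for c in clusters)


# Hypothetical data: two clusters of two points each
clusters = [[(1, 1), (1, 3)], [(5, 1), (5, 3)]]
within = [within_ss(c) for c in clusters]
between = between_ss(clusters)
print(within, between)
```

Lower within-cluster SSE means higher cohesion, and higher between-cluster SSE means greater separation. For a fixed data set the two add up to the total sum of squares, so as clusters are added and within-cluster SSE falls, between-cluster SSE rises.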
Association Rules (Association Rules Using R)
 Be able to read and interpret the output from an association rule analysis
 Find the strongest (or weakest) rule from a set of output
 Understand and be able to explain the difference between support, confidence, and lift.
 Can you have high confidence and low lift?
 Given a set of baskets, compute and interpret support, confidence, and lift for an association rule
 Given a table of aggregate purchase numbers for two products, compute and interpret the lift for the rule
based on those two products
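The three measures can be sketched as follows, assuming the standard definitions: support is the share of baskets containing all the items, confidence of A -> B is support(A and B) / support(A), and lift is confidence(A -> B) / support(B). The baskets, counts, and function names are hypothetical.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, lhs, rhs):
    """Of the baskets containing lhs, the share that also contain rhs."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)


def lift(baskets, lhs, rhs):
    """How much more often lhs and rhs occur together than if they were independent."""
    return confidence(baskets, lhs, rhs) / support(baskets, rhs)


def lift_from_counts(n_both, n_a, n_b, n_total):
    """Lift from aggregate purchase counts for two products."""
    return (n_both / n_total) / ((n_a / n_total) * (n_b / n_total))


# Hypothetical baskets, for illustration only
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]

s = support(baskets, {"milk", "bread"})        # 2 of 5 baskets
c = confidence(baskets, {"milk"}, {"bread"})   # 2 of the 3 milk baskets
l = lift(baskets, {"milk"}, {"bread"})
l_counts = lift_from_counts(n_both=2, n_a=3, n_b=3, n_total=5)
print(round(s, 3), round(c, 3), round(l, 3), round(l_counts, 3))
```

Lift above 1 means the items appear together more often than chance would predict; lift below 1 means less often. High confidence with low lift is possible: if the right-hand item is in 90% of all baskets and the rule's confidence is 0.9, the lift is only 1.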
Extra credit
There will be an extra credit opportunity on this exam. I will ask two questions. Each question can be
answered with a SQL statement and will be worth 5 points, for a possible total of 10
points of extra credit.
Technically, SQL is beyond the scope of exam 3. However, I am offering this extra credit opportunity as
compensation to students who feel they ran out of time on exams 1 or 2. Of course, proficiency with
writing SQL queries is a major emphasis of this course (and those two earlier exams) so this is a
reasonable offering.