University of Sydney
Discipline of Business Analytics
Fall 2013
QBUS 6810: Statistical Learning and Data Mining
Wednesdays, 6-9 pm
Room H04-159 (Seminar Room 6, Ground floor, Merewether Building)
Description
It is common for businesses to have access to very rich data sets, often generated
automatically as a by-product of the main institutional activity of a firm or business unit. Data
mining, or statistical learning, deals with inferring and validating patterns, structures and
relationships in data, as a tool to support decisions in the business environment. This postgraduate
course in statistical learning offers a survey of the main statistical methodologies for
visualization and analysis of business and market data. It provides the tools necessary to extract
information required for specific tasks such as credit scoring, prediction and classification,
market segmentation and product positioning. Emphasis will be given to business applications
of data mining using modern software tools.
The goals are that students
(1) know which business analytic tool is most relevant for what type of business problem,
(2) know advantages and limitations of each method,
(3) can extract information from large volumes of data readily available from the business
environment,
(4) can obtain and interpret a meaningful analytical result using a software package such as
STATA®, Gauss, Matlab or SAS,
(5) can present and write about findings effectively.
Lecturer
Artem Prokhorov, PhD
Tel.: 02 9351 6584
Office: Merewether-499
Email: [email protected]
Web: http://alcor.concordia.ca/~aprokhor
Office Hours
Wednesdays 10-12 or by appointment.
Emailing me your questions is often the fastest way to get an answer. Also, I am in the office
most of the time and can usually talk to students without an appointment.
Book
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by T. Hastie, R. Tibshirani and
J. Friedman, 2nd edition, Springer Series in Statistics, 2009.
Freely available at: http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
Grading
Exam = 50%
Presentations = 25%
Group Project = 25%
I reserve the right to adjust the final grade distribution as I see appropriate.
Exam
I will announce the time and place and provide a practice exam at the end of the semester.
Presentations
Presentations are in-class lectures given by a student to other students. They will cover
chapters of the Book that I will assign and can include any other material you find or I provide
you with. They will be followed by unmarked clicker-based multiple choice quizzes composed
by the presenter.
Group Project
Your own statistical analysis of your own problem using your own business, economic or
financial data, carried out in groups of 2-3 people. It involves:
1. finding a topic of interest to you (it can come from the applied portion of your previous
course paper, from a recent paper you saw, from the examples we cover in class, or from
me).
2. finding data for it (e.g., section Databases at alcor.concordia.ca/~aprokhor/links.html); those
who do not have a topic (and data) by the Easter break will be assigned one.
3. choosing an appropriate method (choose from those we study and talk to me).
4. estimating the model in the software of your choice. Do this by May 1 if you want my
feedback.
5. presenting the results in class in the last couple of weeks of the course.
6. incorporating any feedback you receive and writing up results in a 10-15 page summary
(background, method, findings, interpretation and limitations). Deadline for email
submission of the summary is June 1.
General Info
There are no make-ups for presentations, group projects or the exam.
Not showing up for a presentation or the exam, or not turning in the project, automatically earns
you a zero for the relevant part of the grade unless there is a well-documented medical excuse,
in which case the weight of the missing part is spread over the remaining parts.
Students must notify the instructor about religious observances at the beginning of the semester so
that they can be accommodated.
The following outline is tentative. We may add topics from the Book.
Tentative Outline
I. Introduction and linear algebra for data analysis: Introduction to statistical learning.
Vector spaces, inner product, matrices, matrix inverse. Covariance matrix.
II. Data visualization and introduction to supervised learning: The spectral and the
singular value decomposition of a matrix: the biplot. Optimal linear prediction. Loss
functions.
III. Linear regression model: Representation, inference (estimation and testing).
IV. Variable selection and shrinkage methods: Stepwise selection, ridge regression, lasso,
LARS, principal components regression.
V. Linear methods for classification: Linear probability model and logistic regression.
VI. Linear methods for classification: Canonical variates and discriminant analysis.
VII. Semiparametric regression: Regression splines and smoothing splines.
VIII. Kernel smoothing methods: Kernel smoothing. Local polynomial regression.
IX. Model assessment and selection; model inference and averaging.
X. Classification trees: Regression and classification trees, boosting.
XI. Neural networks: Neural networks, training.
XII. Unsupervised learning: Association analysis, market basket analysis, distance and
similarity, multidimensional scaling.
XIII. Cluster analysis: K-means, Gaussian mixtures, hierarchical methods.