CISC 492: Data Analytics
Instructor: D.B. Skillicorn
Data mining builds inductive models from data. Almost all organisations, and many
individuals, accumulate data from their interactions, and can use this data to improve
service, and sometimes profit.
The algorithms used for data mining must be efficient, because of the huge volumes of
data that have to be examined, and sophisticated, because the benefit of an extracted
concept depends heavily on how subtle it is.
This course is a project course, meeting for two 1.5-hour sessions each week. We will
examine a number of datasets, with each participant using a particular technique to
investigate each dataset and see what structure the technique discovers. You will have a
chance to try several different techniques during the course. You will present your
progress to the class each week. You may want to look at kaggle.com for some ideas
about the kind of data mining problems we will look at.
Good working knowledge of standard software environments is required, especially the
ability to develop scripts and plot data (e.g. Excel, Matlab, OpenGL + Perl, Python,
Awk).
Prerequisite: CISC 333.
Textbooks (suggested):
Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley.
Hand, Mannila, Smyth, Principles of Data Mining, MIT Press.
Assessment:
In-class performance: 70%, based on assessments from all class participants.
Take-home examination: 30%, involving a simulated data mining task.