CS 514/514G Syllabus - Institute for Academic Outreach

CS 5xx/5xxG: Data Mining and Machine Learning
Spring 2016
Instructor: Ruthie Dare-Halma, PhD
Office: VH 2230
Phone: 660-785-7235
Email: [email protected]
Office Hours: Monday, Wednesday 8:30-noon
Tuesday 8:30-11:30
*Videoconferencing and telephone visits are available at these times and by appointment.
Catalog description: Fundamental data mining concepts and techniques for discovering interesting patterns from data in
various applications, emphasizing machine learning methodologies. Prerequisites: CS 170 or CS 180; STAT 190.
Texts:
1) Data Mining: Concepts and Techniques, 3rd edition; Jiawei Han, Micheline Kamber and Jian Pei; The Morgan
Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011. ISBN 9780123814791.
2) Learning from Data; Yaser Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin; AMLbook.com, 2012. ISBN
978-1-60049-006-4.
Course topics: The following topics will be covered.
I Introduction
-What is data mining?
-Kinds of data that can be mined
-Kinds of patterns that can be mined
-Technologies used
-Targeted applications
-Major issues in data mining
II Data
-Data objects and attribute types
-Basic statistical descriptions of data
-Data visualization
-Measuring data similarity and dissimilarity
III Data Preprocessing
-Preprocessing overview
-Data cleaning
-Data integration
-Data reduction
-Data transformation and data discretization
IV Mining frequent patterns, associations, and correlations
-Basic concepts
-Frequent itemset mining methods
-Pattern evaluation methods
V Classification
-Basic concepts
-Decision tree induction
-Bayes classification methods
-Rule-based classification
-Model evaluation and selection
-Techniques to improve classification accuracy
VI Cluster Analysis
-Basic concepts
-Partitioning methods
-Hierarchical methods
-Density-based methods
-Grid-based methods
-Cluster evaluation
VII The Learning Problem
-Problem setup
-Types of learning: supervised, reinforcement, unsupervised, other
-Learning feasibility
-Error and Noise
VIII Training and Testing
-Theory of Generalization
-Interpreting the Generalization Bound
-Approximation-Generalization tradeoff
IX The Linear Model
-Classification
-Linear Regression
-Logistic Regression
-Nonlinear Transformation
X Overfitting
-When does overfitting occur?
-Regularization
-Validation
XI Learning Principles
-Occam’s Razor
-Sampling Bias
-Data Snooping
Learning Outcomes: By the end of the course it is expected that the student will:
-Develop an understanding of data mining
-Understand what machine learning is and why it is essential to designing intelligent data products
-Learn different methods for finding and describing structural patterns in electronically-stored data
-Understand the significance of data mining by studying real-world examples
-Become familiar with the potential ethical complications from data mining, including discrimination and use of
personal information
Competency Based Grading: This is a competency-based course. This means that you work at your own pace. You will
need to complete all homework and labs as you progress through the course material. All of the material (other than the
text) will be posted in Blackboard. You may complete the graded assessments whenever you think you are ready.
Your performance in class will be evaluated with several assessments. A score of 80% (equivalent to a B) or above will
signify competence for a given assessment, while a score of 90% (equivalent to an A) or above will signify mastery.
Competence in all assessments for a given course topic will be required to demonstrate competence for that topic, while
competence in all course topics will be required to demonstrate competence for the course.
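The competency rule above can be illustrated with a short sketch. This is only an illustration, not an official grade calculator: the function names, the topic names, and the sample scores are hypothetical, and only the 80% and 90% thresholds come from this syllabus.

```python
from typing import Dict, List

COMPETENCE = 80.0  # percent; threshold stated in the syllabus (equivalent to a B)
MASTERY = 90.0     # percent; threshold stated in the syllabus (equivalent to an A)


def assessment_level(score: float) -> str:
    """Classify a single assessment score (hypothetical helper)."""
    if score >= MASTERY:
        return "mastery"
    if score >= COMPETENCE:
        return "competence"
    return "not yet competent"


def topic_competent(scores: List[float]) -> bool:
    """A topic counts as competent only if every assessment in it is at least 80%.

    Assumption: a topic with no recorded assessments is not yet competent.
    """
    return bool(scores) and all(s >= COMPETENCE for s in scores)


def course_competent(topics: Dict[str, List[float]]) -> bool:
    """The course counts as competent only if every topic is competent."""
    return bool(topics) and all(topic_competent(s) for s in topics.values())


if __name__ == "__main__":
    # Made-up scores for two of the course topics.
    sample = {"Classification": [85.0, 91.0], "Cluster Analysis": [78.0]}
    for topic, scores in sample.items():
        print(topic, [assessment_level(s) for s in scores], topic_competent(scores))
    print("Course competent:", course_competent(sample))
```

In this sample, the single 78% keeps both the topic and the course below competence until that assessment is retaken and passed.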
Quizzes will be available for you to monitor your own progress throughout the course, and team projects will allow you to
practice the skills of the course. The assessments for determining competency and the final grade will be exams covering
each major area of the course, and a final project.
Students scoring lower than 80% in the course will not be deemed to have achieved competency. Students may retake
assessments until a grade signifying competency has been achieved. Failure to achieve this mark by the deadline
announced at the start of the course will result in a transcripted grade of F.
Academic Integrity: I believe students can learn a great deal by collaborating with others and discussing ideas. I
encourage you to work with colleagues on assignments and to study together as you prepare for exams and quizzes.
However, I expect that everything you turn in to me (unless otherwise specified in the assignment) will be your own work.
In your homework and assessment preparation, you may get an idea from another student, but the actual final product
must be written in your own words without referring to another student’s work (unless specifically stated otherwise in the
assignment). It is never acceptable to take credit for another person’s work or ideas. Plagiarism is a serious offense and
the consequences will be severe. Anyone submitting work to be graded that, in my estimation and beyond reasonable
doubt, is not his/her own will receive an F; I reserve the right to pursue formal integrity violation procedures should this
occur.
Persons with Disabilities: If you have a documented disability for which you are or may be requesting an
accommodation, you are encouraged to contact both your instructor and the Disability Services office (x4478)
immediately. I will be happy to make appropriate accommodations. If you desire to seek accommodation for a graded
activity, you must give at least one week's notice before the activity is to commence.