CS 5xx/5xxG: Data Mining and Machine Learning
Spring 2016

Instructor: Ruthie Dare-Halma, PhD
Office: VH 2230
Phone: 660-785-7235
Email: [email protected]
Office Hours: Monday, Wednesday 8:30-noon; Tuesday 8:30-11:30
*Videoconferencing and telephone visits are available at these times and by appointment.

Catalog description: Fundamental data mining concepts and techniques for discovering interesting patterns from data in various applications, emphasizing machine learning methodologies.

Prerequisites: CS 170 or CS 180; STAT 190.

Texts:
1) Data Mining: Concepts and Techniques, 3rd edition; Jiawei Han, Micheline Kamber and Jian Pei; The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011. ISBN 9780123814791.
2) Learning from Data; Yaser Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin; AMLbook.com, 2012. ISBN 978-1-60049-006-4.

Course topics: The following topics will be covered.

I Introduction
-What is data mining?
-Kinds of data that can be mined
-Kinds of patterns that can be mined
-Technologies used
-Targeted applications
-Major issues in data mining

II Data
-Data objects and attribute types
-Basic statistical descriptions of data
-Data visualization
-Measuring data similarity and dissimilarity

III Data Preprocessing
-Preprocessing overview
-Data cleaning
-Data integration
-Data reduction
-Data transformation and data discretization

IV Mining frequent patterns, associations, and correlations
-Basic concepts
-Frequent itemset mining methods
-Pattern evaluation methods

V Classification
-Basic concepts
-Decision tree induction
-Bayes classification methods
-Rule-based classification
-Model evaluation and selection
-Techniques to improve classification accuracy

VI Cluster Analysis
-Basic concepts
-Partitioning methods
-Hierarchical methods
-Density-based methods
-Grid-based methods
-Cluster evaluation

VII The Learning Problem
-Problem setup
-Types of learning: supervised, reinforcement, unsupervised, other
-Learning feasibility
-Error and Noise

VIII Training and Testing
-Theory of Generalization
-Interpreting the Generalization Bound
-Approximation-Generalization tradeoff

IX The Linear Model
-Classification
-Linear Regression
-Logistic Regression
-Nonlinear Transformation

X Overfitting
-When does overfitting occur?
-Regularization
-Validation

XI Learning Principles
-Occam’s Razor
-Sampling Bias
-Data Snooping

Learning Outcomes: By the end of the course it is expected that the student will
-Develop an understanding of data mining
-Understand what machine learning is and why it is essential to designing intelligent data products
-Learn different methods for finding and describing structural patterns in electronically stored data
-Understand the significance of data mining by studying real-world examples
-Become familiar with the potential ethical complications from data mining, including discrimination and use of personal information

Competency-Based Grading: This is a competency-based course. This means that you work at your own pace. You will need to complete all homework and labs as you progress through the course material. All of the material (other than the texts) will be posted in Blackboard. You may complete the graded assessments whenever you think you are ready. Your performance in class will be evaluated with several assessments. A score of 80% (equivalent to a B) or above will signify competence for a given assessment, while a score of 90% (equivalent to an A) or above will signify mastery. Competence in all assessments for a given course topic will be required to demonstrate competence for that topic, while competence in all course topics will be required to demonstrate competence for the course. Quizzes will be available for you to monitor your own progress throughout the course, and team projects will allow you to practice the skills of the course.
The assessments for determining competency and the final grade will be exams covering each major area of the course, and a final project. Students scoring lower than 80% in the course will not be deemed to have achieved competency. Students may retake assessments until a grade signifying competency has been achieved. Failure to achieve this mark by the deadline announced at the start of the course will result in a transcripted grade of F.

Academic Integrity: I believe students can learn a great deal by collaborating with others and discussing ideas. I encourage you to work with colleagues on assignments and to study together as you prepare for exams and quizzes. However, I expect that everything you turn in to me (unless otherwise specified in the assignment) will be your own work. In your homework and assessment preparation, you may get an idea from another student, but the actual final product must be written in your own words without referring to another student’s work (unless specifically stated otherwise in the assignment). It is never acceptable to take credit for another person’s work or ideas. Plagiarism is a serious offense and the consequences will be severe. Anyone submitting work to be graded which, in my estimation and beyond reasonable doubt, is not his/her own will receive an F; I reserve the right to pursue formal integrity violation procedures should this occur.

Persons with Disabilities: If you have a documented disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and the Disability Services office (x4478) immediately. I will be happy to make appropriate accommodations. If you wish to seek accommodation for a graded activity, you must give at least one week's notice before the activity is to commence.