DEREE COLLEGE SYLLABUS FOR: ITC 3333 DATA MINING AND BIG DATA 3/0/3
(Spring 2016)

PREREQUISITES:
ITC 1070 Information Technology Fundamentals or CS 1070 Introduction to Information Systems
ITC 2188 Introduction to Programming
MA 2010 Statistics I

CATALOG DESCRIPTION:
Data and feature selection, cleaning, extracting patterns from structured and unstructured data, evaluation, big data, tools, applications.

RATIONALE:
The volume of data that organisations collect is increasing exponentially. We are in the era of big data, but more data does not mean more knowledge. With data mining techniques, we can navigate through data that are chaotic, heterogeneous, unstructured and noisy in order to infer what is relevant. The applications range from decision support and marketing to fraud detection and medicine. The aim of this course is to provide an understanding of the basic principles and to apply them, through appropriate tools, to real-world problems that involve big data.

LEARNING OUTCOMES:
As a result of taking this course, the student should be able to:
1. Combine appropriate data mining techniques, while also considering scalability, to discover information nuggets that are appropriate for a specific problem.
2. Evaluate the quality of the inferred information by using a variety of evaluation methods.

METHOD OF TEACHING AND LEARNING:
In congruence with the learning and teaching strategy of the College, the following tools/activities are used:
• Classroom lectures and occasional laboratory practical sessions.
• Office hours held by the instructor to provide further assistance to students.
• Use of the Blackboard Learning platform, where instructors post lecture notes, assignment instructions, timely announcements, as well as additional resources.
ASSESSMENT:
Summative:
Project: Programming and/or tool usage to address one or more problems in data mining. 100%
Formative:
In-class quizzes or lab exercises. 0%
The formative assessment aims to shape teaching during the semester and to prepare students for the summative assessment.
The project tests Learning Outcomes 1 and 2.
(Assignment instructions and assessment rubrics are distributed on the first day of class with the Course Outline.)

READING LIST:
REQUIRED MATERIAL:
Tan, P., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson Addison Wesley.
Instructor notes.
FURTHER READING:
Hand, D., Mannila, H., & Smyth, P. (2001). Principles of Data Mining. MIT Press.
Zaki, M. J., & Meira, W. (2014). Data Mining and Analysis. Cambridge University Press.
RECOMMENDED MATERIAL:
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Heidelberg, Germany: Springer. i-xx, 740 pp. ISBN 0-387-31073-8. (Book review: Kybernetes, 36(2), 275. doi:10.1108/03684920710743466)
Mitchell, T. M. (1997). Machine Learning. McGraw Hill.

COMMUNICATION REQUIREMENTS:
• Daily access to the course's site on the College's Blackboard CMS.
• Use of word processing and/or presentation graphics software for documentation of assignments.

SOFTWARE REQUIREMENTS:
• Python and related libraries: scikit-learn, NumPy
• Apache Flink
• Weka and/or Java machine learning libraries

WWW RESOURCES:
http://www.kdnuggets.com/
http://www.autonlab.org/tutorials/
http://www-users.cselabs.umn.edu/classes/Fall2011/csci5523/index.php?page=homework%20and%20projects
http://archive.ics.uci.edu/ml/
http://www.sciencemag.org/site/feature/data/compsci/machine_learning.xhtml

INDICATIVE CONTENT:
1. Introduction to data mining
2. Data preprocessing
3. Data mining, knowledge representation
4. Data mining algorithms: Association rules
5. Data mining algorithms: Classification
6. Data mining algorithms: Prediction
7. Evaluation
8. Data mining algorithms: Clustering
9. Visualisation
10. Scalability of data mining
11. Big Data
12. Streaming
13. Advanced topics in data mining