George Mason University–Office of the Registrar
Undergraduate Course Approval Form

Please complete this form and attach a copy of the syllabus and catalog description for new courses. Forward the form and attachments to your departmental curriculum committee for approval, and then to your College/School curriculum committee, or Dean’s office, for final approval. The approved form should then be forwarded to the Academic Scheduling Office, MS 3D1.

This is for undergraduate course approval only. Please see the Provost Office/Graduate Council website to obtain a copy of the Graduate Course Approval Form and for details about the graduate course approval process.

Note: Colleges and Schools are responsible for submitting new or modified catalog descriptions (35 words or less, using catalog format) to Creative Services by deadlines outlined in the yearly Catalog production calendar.

Please indicate: New__X__ Modify_______ Delete_______
Department/Unit: _______CDS______________
Course Subject/Number: _____CDS 401_____________
Submitted by: ___John Wallin and Kirk Borne_____________ Ext: ___3-3617__ Email: [email protected]____
Course Title: _____Scientific Data Mining____________________________________________________
Effective Term (New/Modified Courses only): __Fall 2010
Final Term (deleted courses only):____________
Credit Hours: (Fixed) __3__ (Var.) ______ to ______
Grade Type (check one):
__X__ Regular graduate (A, B, C, etc.)
_____ Satisfactory/No Credit only
_____ Special graduate (A, B, C, etc. + IP)
Repeat Status*(check one): _X_ NR-Not repeatable ____ RD-Repeatable within degree ____ RT-Repeatable within term
*Note: Used only for special topics, independent study, or internships courses
Total Number of Hours Allowed: ___3___
Schedule Type Code(s):
1._LEC LEC=Lecture SEM=Seminar STU=Studio INT=Internship IND=Independent Study
2.____ LAB=Lab RCT=Recitation (second code used only for courses with Lab or Rct component)
Prereq _X_ Coreq ___ (Check one):________CDS 302______________________________________
__________________________________________________________________________________________
Note: Modified courses - review prereq or coreq for necessary changes; Deleted courses - review other courses to correct prereqs that list the deleted course.
Description of Modification (for modified courses):____________________________________________________________________
Special Instructions (major/college/class code restrictions, if needed):__________________________________________

Approval Signatures:
Department or Unit: _________________________________________ Date: _____________ (Signature)
College/School Committee: ____________________________________ Date: _____________ (Signature)

George Mason University
Undergraduate Course Coordination Form

Approval from other units: Please list those units outside of your own which may be affected by this new, modified, or deleted course. Each of these units should approve this action prior to its being submitted to the COS Curriculum Committee for approval.

Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:
Unit: Head of Unit’s Signature: Date:

COS Curriculum Committee approval: ______________________________________________ Date: ____________

Course Proposal Submitted to the COS Curriculum Committee

1. COURSE NUMBER AND TITLE: CDS 401 – Scientific Data Mining
Course Prerequisites: CDS 302
Catalog Description:

2. COURSE JUSTIFICATION:
Course Objectives: Students will be given a set of case studies and projects to develop and expand their understanding of data mining and its scientific applications. This will provide a foundation for future data-centric applications in their careers.
Course Necessity: This course will serve as an elective for Computational and Data Sciences majors, as well as for students from other disciplines who are interested in learning more about scientific data mining.
Course Relationship to Existing Programs: This course is uniquely tailored to the new program in Computational and Data Sciences. It may be used by students in other majors who are interested in learning about this field. Courses in IT&E do cover some aspects of this subject, but they are not tailored toward the unique aspects of data mining across the natural sciences.
Course Relationship to Existing Courses: The content of this course is similar to some graduate courses within the CSI Ph.D. program. However, the content will be suitably altered for an undergraduate audience.

3. APPROVAL HISTORY: This course was approved by the CDS department as part of its proposed undergraduate degree program.

4. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering: Spring 2010
Proposed Instructors: Kirk Borne, Robert Weigel

5. TENTATIVE SYLLABUS: See attached.

CDS 401 SCIENTIFIC DATA MINING -- SYLLABUS --
Prerequisites: CDS 302
Credits: 3
Instructor: Borne
Office Hours: TBD
Course Description: This course provides a broad overview of the data mining component of the knowledge discovery process, as applied to scientific research. Scientific databases are growing at near-exponential rates. As the amount of data has grown, so has the difficulty in analyzing these large databases. Data mining is the search for hidden, meaningful patterns in such databases.
Identifying these patterns and rules can provide a significant competitive advantage to scientific research projects and in other career settings. Data mining is motivated and analyzed as the “killer app” for large scientific databases. Data mining techniques, algorithms, and applications are covered, as well as the key concepts of machine learning, data types, data preparation, previewing, noise handling, feature selection, normalization, data transformation, similarity measures, and distance metrics. Algorithms and techniques will be analyzed specifically in terms of their application to solving particular problems. Several scientific case studies will be presented from the science research literature. The techniques that are presented will be drawn from well-known statistical, machine learning, visualization, and database algorithms, including clustering, decision trees, regression, Bayes' theorem, nearest neighbor, neural networks, and genetic algorithms. Topics will include informatics, semantic knowledge mining, and the integration of data mining with large (and often distributed) scientific databases.
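Illustration (not part of the submitted course materials): the description above names normalization, distance metrics, and nearest neighbor among the covered concepts. A minimal Python sketch of how these three pieces can fit together is given here. The sample measurements, the labels, and the function names min_max_normalize, euclidean, and nearest_neighbor are all hypothetical and chosen only for illustration.

```python
import math


def min_max_normalize(rows):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(train, labels, query):
    """Return the label of the training point closest to the query."""
    distances = [euclidean(point, query) for point in train]
    best = min(range(len(train)), key=distances.__getitem__)
    return labels[best]


# Hypothetical measurements: two features on very different scales.
raw = [[1.0, 200.0], [2.0, 220.0], [9.0, 900.0], [10.0, 1000.0]]
labels = ["star", "star", "galaxy", "galaxy"]  # made-up class labels
query_raw = [8.0, 850.0]

# Normalize the training points and the query together so both share one scale.
scaled = min_max_normalize(raw + [query_raw])
train, query = scaled[:-1], scaled[-1]

prediction = nearest_neighbor(train, labels, query)
print(prediction)
```

Without the normalization step, the second feature would dominate the distance simply because its numeric range is larger. Rescaling the query together with the training points is one simple choice made for this sketch; other workflows fit the scaling on the training data only.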
Lecture Content:
Data Mining Roots and Concepts
Scientific Motivation
Background Methods
 o Statistics
 o Machine Learning
 o Visualization
 o Rule-Based Algorithms
Software
 o ADaM
 o WEKA
 o JOONE
 o SNNS
 o YALE
 o Intelligent Miner
Data Preparation for Data Mining
 o Data Types
 o Feature Selection and Dimension Reduction
 o Previewing
 o Cleaning
 o Transformation and Normalization
 o Distance Measures and Similarity Metrics
Supervised Learning Methods
 o Decision Trees
 o Artificial Neural Networks
 o Bayes Networks
 o Markov Models
Unsupervised Learning Methods
 o Nearest Neighbor
 o Clustering
 o Link Analysis
 o Association Mining
 o Principal Components Analysis
 o Outlier Detection
Kernel Methods
 o Kernel-PCA
 o Support Vector Machines
Science Case Studies
 o Astronomy
 o Physics
 o Bioinformatics
 o Drug Discovery
 o Combinatorial Chemistry
 o Remote Sensing, Earth Sciences, Geographic Information Systems
 o Digital Libraries
 o Autonomous Science Discovery Robots
Special Topics
 o Text Mining
 o Image Mining
 o Temporal Mining
 o Spatial Mining
High-Performance Data Mining
 o Genetic Algorithms
 o Distributed Data Mining
 o Grid Mining
 o Parallel Mining
Next Generation Mining
 o Informatics
 o Semantic Mining
 o Knowledge Mining

Homework: (explain assignments) Students will use on-line and computational resources to learn about the paradigms, languages, and methods in scientific data mining. Examples will be drawn from a variety of scientific domains and will show students how to use statistical and machine learning tools to approach this field.

Project: (describe, if applicable) There will be no class project for this class.

Exams: (give details about midterm and final exams) Midterm and final exams will be given, based on the content of the lectures and the homework assignments. Short essays as well as analytic calculations about problem complexity, time, and simple examples from programs will be used.

Grades: Homework (40%), Projects (%), Midterm (30%), Final Exam (30%)

Required Texts: (list)
M. H. Dunham, Data Mining: Introductory and Advanced Topics, 1st Edition, Prentice-Hall, 2002.