George Mason University–Office of the Registrar Undergraduate Course Approval Form
Please complete this form and attach a copy of the syllabus and catalog description for new courses. Forward the form and
attachments to your departmental curriculum committee for approval, and then to your College/School curriculum
committee, or Dean’s office, for final approval. The approved form should then be forwarded to the Academic Scheduling
Office, MS 3D1. This is for undergraduate course approval only. Please see the Provost Office/Graduate Council
website to obtain a copy of the Graduate Course Approval Form and for details about the graduate course approval process.
Note: Colleges and Schools are responsible for submitting new or modified catalog descriptions (35 words or less, using
catalog format) to Creative Services by deadlines outlined in the yearly Catalog production calendar.
Please indicate: New__X__
Modify_______
Delete_______
Department/Unit: _______CDS______________ Course Subject/Number: _____CDS 401_____________
Submitted by: ___John Wallin and Kirk Borne_____________ Ext: ___3-3617__ Email: [email protected]____
Course Title: _____Scientific Data Mining____________________________________________________
Effective Term (New/Modified Courses only): __Fall 2010 Final Term (deleted courses only):____________
Credit Hours: (Fixed) __3__
(Var.) ______ to ______
Grade Type (check one):
__X__ Regular graduate (A, B, C, etc.)
_____ Satisfactory/No Credit only
_____ Special graduate (A, B, C, etc. + IP)
Repeat Status*(check one): _X_ NR-Not repeatable ____ RD-Repeatable within degree ____ RT-Repeatable within term
*Note: Used only for special topics, independent study, or internships courses
Total Number of Hours Allowed: ___3___
Schedule Type Code(s): 1._LEC LEC=Lecture SEM=Seminar STU=Studio INT=Internship IND=Independent Study
2.____ LAB=Lab RCT=Recitation (second code used only for courses with Lab or Rct component)
Prereq _X_ Coreq ___ (Check one):________CDS 302______________________________________
__________________________________________________________________________________________
Note: Modified courses - review prereq or coreq for necessary changes; Deleted courses - review other courses to correct prereqs that list the deleted course.
Description of Modification (for modified courses):____________________________________________________________________
Special Instructions (major/college/class code restrictions, if needed):__________________________________________
Approval Signatures:
Department or Unit: _________________________________________ Date: _____________
(Signature)
College/School Committee: ____________________________________ Date: _____________
(Signature)
George Mason University
Undergraduate Course Coordination Form
Approval from other units:
Please list those units outside of your own which may be affected by this new, modified, or deleted course. Each of these units should
approve this action prior to its being submitted to the COS Curriculum Committee for approval.
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
COS Curriculum Committee approval: ______________________________________________ Date: ____________
Course Proposal Submitted to the COS Curriculum Committee
1. COURSE NUMBER AND TITLE:
CDS 401 – Scientific Data Mining
Course Prerequisites: CDS 302
Catalog Description:
2. COURSE JUSTIFICATION:
Course Objectives:
Students will be given a set of case studies and projects to develop and expand their understanding
of data mining and its scientific applications. This will provide a foundation for future data-centric
applications in their careers.
Course Necessity:
This course will serve as an elective for Computational and Data Sciences majors, as well as an
elective for students from other disciplines who are interested in learning more about scientific
data mining.
Course Relationship to Existing Programs:
This course is uniquely tailored to the new program in Computational and Data Sciences. It may be used for
students in other majors who are interested in learning about this field. Courses in IT&E do cover some aspects
of this subject, but they are not tailored toward the unique aspects of data mining across the natural sciences.
Course Relationship to Existing Courses:
The content of this course is similar to some graduate courses within the CSI Ph.D. program. However, the
content will be suitably altered for an undergraduate audience.
3. APPROVAL HISTORY:
This course was approved by the CDS department as part of its proposed undergraduate degree program.
4. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering:
Spring 2010
Proposed Instructors:
Kirk Borne, Robert Weigel
5. TENTATIVE SYLLABUS: See attached.
CDS 401
SCIENTIFIC DATA MINING
-- SYLLABUS --
Prerequisites: CDS 302
Credits: 3
Instructor: Borne
Office Hours: TBD
Course Description:
This course provides a broad overview of the data mining component of the knowledge discovery
process, as applied to scientific research. Scientific databases are growing at near-exponential
rates. As the amount of data has grown, so has the difficulty in analyzing these large databases.
Data mining is the search for hidden, meaningful patterns in such databases. Identifying these
patterns and rules can provide significant competitive advantage to scientific research projects and in
other career settings. Data mining is motivated and analyzed as the “killer app” for large scientific
databases. Data mining techniques, algorithms, and applications are covered, as well as the key
concepts of machine learning, data types, data preparation, previewing, noise handling, feature
selection, normalization, data transformation, similarity measures, and distance metrics. Algorithms
and techniques will be analyzed specifically in terms of their application to solving particular
problems. Several scientific case studies will be presented from the science research literature. The
techniques that are presented will be drawn from well-known statistical, machine learning,
visualization, and database algorithms, including clustering, decision trees, regression, Bayes'
theorem, nearest neighbor, neural networks, and genetic algorithms. Topics will include informatics,
semantic knowledge mining, and the integration of data mining with large (and often distributed)
scientific databases.
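For illustration only, the following minimal Python sketch shows how three of the data preparation concepts named above (normalization, distance metrics, and nearest neighbor) fit together. The measurements, units, and function names are hypothetical and are not taken from the course materials; the sketch assumes min-max normalization and Euclidean distance, which are only one possible choice of each.

```python
import math

def min_max_normalize(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query_index, points):
    """Index of the point closest to points[query_index], excluding itself."""
    best_index, best_distance = None, math.inf
    for i, p in enumerate(points):
        if i == query_index:
            continue
        d = euclidean(points[query_index], p)
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index

# Hypothetical two-feature measurements on very different scales.
raw = [[1.0, 1000.0], [1.1, 1400.0], [4.0, 1050.0], [5.0, 3000.0]]
scaled = min_max_normalize(raw)

nn_raw = nearest_neighbor(0, raw)        # dominated by the large-scale feature
nn_scaled = nearest_neighbor(0, scaled)  # both features contribute equally
print(nn_raw, nn_scaled)
```

In this hypothetical data set, the nearest neighbor of the first record changes once the features are rescaled, which is why normalization is usually applied before a distance-based method.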
Lecture Content:
Data Mining Roots and Concepts
Scientific Motivation
Background Methods
o Statistics
o Machine Learning
o Visualization
o Rule-Based Algorithms
Software
o ADaM
o WEKA
o JOONE
o SNNS
o YALE
o Intelligent Miner
Data Preparation for Data Mining
o Data Types
o Feature Selection and Dimension Reduction
o Previewing
o Cleaning
o Transformation and Normalization
o Distance Measures and Similarity Metrics
Supervised Learning Methods
o Decision Trees
o Artificial Neural Networks
o Bayes Networks
o Markov Models
Unsupervised Learning Methods
o Nearest Neighbor
o Clustering
o Link Analysis
o Association Mining
o Principal Components Analysis
o Outlier Detection
Kernel Methods
o Kernel-PCA
o Support Vector Machines
Science Case Studies
o Astronomy
o Physics
o Bioinformatics
o Drug Discovery
o Combinatorial Chemistry
o Remote Sensing, Earth Sciences, Geographic Information Systems
o Digital Libraries
o Autonomous Science Discovery Robots
Special Topics
o Text Mining
o Image Mining
o Temporal Mining
o Spatial Mining
High-Performance Data Mining
o Genetic Algorithms
o Distributed Data Mining
o Grid Mining
o Parallel Mining
Next Generation Mining
o Informatics
o Semantic Mining
o Knowledge Mining
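As an illustration of the Clustering topic listed above, the following is a minimal Python sketch of k-means (Lloyd's algorithm) on one-dimensional data. The data values, starting centers, and function name are hypothetical, and the sketch is not drawn from the course materials or from any of the software packages listed.

```python
def kmeans_1d(values, centers, iterations=10):
    """Run Lloyd's algorithm on 1-D data from the given starting centers.

    Returns the final centers and the list of values assigned to each center.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its closest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters

# Hypothetical measurements forming two well-separated groups.
values = [1.0, 1.5, 0.5, 9.0, 9.5, 10.0]
final_centers, final_clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(final_centers, final_clusters)
```

The result of k-means depends on the starting centers; the fixed starting values here are chosen only to keep the example deterministic.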
Homework: (explain assignments)
Students will use on-line and computational resources to learn about the paradigms, languages, and methods in
scientific data mining. Examples will be drawn from a variety of scientific domains and will show students how
to use statistical and machine learning tools to approach this field.
Project: (describe, if applicable)
There will be no class project for this class.
Exams: (give details about midterm and final exams)
Midterm and final exams will be given, based on the content of the lectures and the homework assignments.
The exams will include short essays as well as analytic calculations about problem complexity and time, and
simple examples from programs.
Grades:
Homework (40%), Projects (0%), Midterm (30%), Final Exam (30%)
Required Texts: (list)
M. H. Dunham, Data Mining: Introductory and Advanced Topics, 1st Edition, Prentice-Hall, 2002.