Course Approval Form - Office of the Provost

For approval of new courses and deletions or
modifications to an existing course.
Course Approval Form
registrar.gmu.edu/facultystaff/curriculum
Action Requested:
Course Level:
x Create new course
Delete existing course
Modify existing course (check all that apply)
Title
Prereq/coreq
Other:
College/School:
Submitted by:
Subject Code:
Credits
Schedule Type
Repeat Status
Restrictions
Number:
Effective Term:
776
(Do not list multiple codes or numbers. Each course proposal must
have a separate form.)
Title:
Current
Banner (30 characters max including spaces)
New
Mining Massive Datasets
Credits:
3
(check one)
Grade Mode:
Fixed
Variable
x
(check one)
or
to
x
(check one)
Prerequisite(s):
CS750 or equivalent
Computer Science
Email:
[email protected]
x
Fall
Spring
Summer
Year
2011
Mining Massive Datasets
Repeat Status:
Regular (A, B, C, etc.)
Satisfactory/No Credit
Special (A, B, C, etc. + IP)
Undergraduate
Graduate
Grade Type
Department:
Ext:
X31627
Volgenau School of Engineering
Daniel Barbará
CS
x
Schedule
Type Code(s):
(check all that
apply)
Not Repeatable (NR)
Repeatable within degree (RD)
Repeatable within term (RT)
x
Lecture (LEC)
Lab (LAB)
Recitation (RCT)
Internship (INT)
Corequisite(s):
Maximum credits
allowed:
Independent Study (IND)
Seminar (SEM)
Studio (STU)
Instructional Mode:
x 100% face-to-face
Hybrid: ≤ 50% electronically delivered
100% electronically delivered
Special Instructions: (list restrictions for major, college, or degree; hard-coding; etc.)
Are there equivalent course(s)?
Yes
x No
If yes, please list
Catalog Copy for NEW Courses Only (Consult University Catalog for models)
Description (No more than 60 words, use verb phrases and present tense)
Notes (List additional information for the course)
Applications with massive amounts of data are becoming commonplace.
From social network data to genomics, the need for efficient, scalable means
to analyze data is pressing. This course covers the techniques to mine large
datasets, including Distributed File Systems and Map-Reduce, similarity
search, and data stream processing. It covers classic problems in data mining,
such as clustering, association rule mining, and others from the point of view
of scalability. The course includes a final project to exercise the concepts
covered in class.
Indicate number of contact hours:
Hours of Lecture or Seminar per week: 2.5
Hours of Lab or Studio:
When Offered: (check all that apply)
x Fall
Summer
Spring
Approval Signatures
Department Approval
Date
College/School Approval
Date
If this course includes subject matter currently dealt with by any other units, the originating department must circulate this proposal for review by
those units and obtain the necessary signatures prior to submission. Failure to do so will delay action on this proposal.
Unit Name
Unit Approval Name
For Graduate Courses Only
Unit Approver’s Signature
Date
Graduate Council Member
Provost Office
Graduate Council Approval Date
For Registrar Office’s Use Only: Banner_____________________________Catalog________________________________
revised 2/2/10
Course Proposal Submitted to the Curriculum Committee of the College of Science
1. COURSE NUMBER AND TITLE:
CS 776: Mining Massive Datasets
Course Prerequisites:
CS 750 or equivalent course
Catalog Description:
Applications with massive amounts of data are becoming commonplace. From social network data to genomics, the need for efficient, scalable
means to analyze data is pressing. This course covers the techniques to mine large datasets, including Distributed File Systems and Map-Reduce,
similarity search, and data stream processing. It covers classic problems in data mining, such as clustering, association rule mining, and others from the
point of view of scalability. The course includes a final project to exercise the concepts covered in class.
2. COURSE JUSTIFICATION:
Course Objectives:
To familiarize students with the emerging techniques for analyzing very large datasets.
To apply the concepts learned in class in a project utilizing massive datasets and a cluster of computers such
as the Hydra cluster.
Course Necessity:
Massive datasets are becoming commonplace in industry. While GMU has classes on Data Mining, it lacks
a class that focuses on the analysis of large datasets.
Course Relationship to Existing Programs:
This course can be used as an elective in the MS-CS and PhD-CS programs.
Course Relationship to Existing Courses:
This course is the natural extension of CS 688 and CS 750.
3. APPROVAL HISTORY:
4. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering: Fall 2011
Proposed Instructors: Dr. Daniel Barbará and Dr. Huzefa Rangwala
5. TENTATIVE SYLLABUS:
COURSE PROPOSAL BY THE DEPARTMENT OF COMPUTER SCIENCE
PROPOSAL DESIGNATION: New Course Proposal
I. CATALOG DESCRIPTION
A. CS 776: Mining Massive Datasets
B. Prerequisite: CS 750 or equivalent
Applications with massive amounts of data are becoming commonplace. From social network data to
genomics, the need for efficient, scalable means to analyze data is pressing. This course covers the techniques
to mine large datasets, including Distributed File Systems and Map-Reduce, similarity search, and data stream
processing. It covers classic problems in data mining, such as clustering, association rule mining, and others
from the point of view of scalability. The course includes a final project to exercise the concepts covered in
class.
II. JUSTIFICATION
A. Desirability of adding this course
Massive datasets are becoming commonplace in industry. While GMU has classes on Data Mining, it lacks
a class that focuses on the analysis of large datasets.
B. Relationship to other courses
This course is the natural follow-up to CS 750. It can be used as an elective in the MS and PhD CS programs, and
also in the Data Mining certificate.
III. SCHEDULING
A. This course will be offered in the Fall semester of 2011 and every year subsequently.
B. Proposed instructors are Dr. Daniel Barbará and Dr. Huzefa Rangwala.
IV. SAMPLE SYLLABUS
Syllabus: CS 776 Mining Massive Datasets
Course Objectives
This course addresses the techniques needed to analyze and mine very large datasets. Such datasets
are becoming ubiquitous in industry, government, and scientific organizations.
Topics Covered
• Distributed File Systems and Map-Reduce
• Scalable similarity search
• Data-stream analysis
• Search engines for large repositories of data
• Algorithms for clustering massive data
• Algorithms to find association rules in very large data
Grading Policy
• Individual assignments 35%
• Final exam 30%
• Final group project and presentation 35%
Sample Schedule
Class  Topic                                          Readings
1      Data mining refresher                          Rajaraman and Ullman (RU) Ch. 1
2      Large Scale File Systems and MapReduce         RU Ch. 2
3      Large Scale File Systems and MapReduce (II)    RU Ch. 2
4      Large-scale similarity searching               RU Ch. 3
5      Data-Stream mining (I)                         RU Ch. 4 and selected papers
6      Data-Stream mining (II)                        RU Ch. 4 and selected papers
7      Link Analysis (I)                              RU Ch. 5 and selected papers
8      Link Analysis (II)                             RU Ch. 5 and readings from Koller and Friedman's book on probabilistic graphical models
9      Large-scale Frequent Itemset finding           RU Ch. 6
10     Large-scale Clustering (I)                     RU Ch. 7
11     Large-scale Clustering (II)                    RU Ch. 7
12     Recommender Systems                            RU Ch. 9
13     Student presentations
14     Student presentations
15     Final exam
Textbooks
Required: Mining of Massive Datasets by Anand Rajaraman and Jeffrey Ullman. Soon to be published;
currently available online.
Reference: Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine
Learning) by Daphne Koller and Nir Friedman. The MIT Press (August 31, 2009)
Selected papers.
Course Description
776 Mining Massive Datasets (3:3:0) Prerequisite: CS 750 or equivalent. The course investigates techniques
to mine large datasets, including Distributed File Systems and Map-Reduce, similarity search, and data stream
processing. It covers classic problems in data mining, such as clustering, association rule mining, and others
from the point of view of scalability. The course includes a final project to exercise the concepts covered in
class.