George Mason University
Graduate Course Approval/Inventory Form
Please complete this form and attach a copy of the syllabus for new courses. Forward it as an email attachment
to the Secretary of the Graduate Council. A printed copy of the form with signatures should be brought to the
Graduate Council Meeting. Complete the Course Coordination Form on page 2 if changes in this course will affect
other units.
Please indicate:
__X_ NEW
____ MODIFY
____ DELETE
Local Unit: SCS
Graduate Council Approval Date:
Course Abbreviation: BINF
Course Number: 650
Full Course Title: Introduction to Bioinformatics Database Design
Abbreviated Course Title (24 characters max.): Intro to Bioinformatics DBs
Credit hours: 3
Repeatable for Credit?
___ D=Yes, not within same term; up to ___ hours
___ T=Yes, within the same term; up to ___ hours
_X_ N=Cannot be repeated for credit
Program of Record:
Activity Code (please indicate):
_X_ Lecture (LEC) ___ Lab (LAB)
___ Recitation (RCT)
___ Studio (STU)
___ Internship (INT) ___ Independent Study (IND)
____ Seminar (SEM)
Catalog Credit Format: 3 : 3 : 0
Course Level: GF(500-600) _X__ GA(700+) ____
Maximum Enrollment: 20
For NEW courses, first term to be offered: Spring 2006
Prerequisites or corequisites: BINF 634 or equivalent, or permission of the instructor
Catalog Description (35 words or less; please use catalog format and attach a copy of the syllabus for new
courses): Students will acquire skills needed to exploit public biological databases and establish and maintain
personal databases that support their own research; such skills include learning underlying data models, the
basics of DBMSs, and SQL.
For MODIFIED or DELETED courses as appropriate:
Last term offered:
Previous Course Abbreviation:
Previous number:
Description of modification:
APPROVAL SIGNATURES:
Submitted by: ________________________________ email: ________________
Department/Program: ________________________________ Date: __________________
College Committee: ________________________________ Date: _________________
Graduate Council Representative: ________________________________ Date: __________________
GEORGE MASON UNIVERSITY
Course Coordination Form
Approval from other units:
Please list those units outside of your own who may be affected by this new, modified, or deleted course. Each of these units must
approve this change prior to its being submitted to the Graduate Council for approval.
Unit: CSI (relevant course: 710)
Head of Unit’s Signature:
Date:
Unit: INFS (relevant courses: 614, 846)
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Unit:
Head of Unit’s Signature:
Date:
Graduate Council approval: ______________________________________________ Date: ____________
Graduate Council representative: __________________________________________ Date: ____________
Provost Office representative: ________________________________________ Date: __________
Course Proposal Submitted to the Graduate Council
by
The School of Computational Sciences
1. COURSE NUMBER AND TITLE:
BINF 650: Introduction to Bioinformatics Database Design
Prerequisites: BINF 634 (or equivalent), or permission of the instructor.
Catalog Description: This course focuses on the basics of creating and working with scientific and statistical
databases, with an emphasis on those housing biological data. Students will gain practical experience working
with database management systems and using SQL and other database tools to create their own research
databases.
2. COURSE JUSTIFICATION:
Course objectives: This course is essentially a Research Methods course, in which students learn to model and
instantiate, in a relational database management system, a practical database to support their own scientific
research; students also learn methods for accessing, retrieving, and integrating data from relevant public
databases that do not depend solely on pre-computed forms. Many biological databases use relational schemas,
often supplemented with XML schemas, and exploiting them requires an understanding of the data representations
and of how to perform joins and use SQL for data retrieval. Biological data is often very complex, and many valid
representations are possible, which makes integration difficult. Students will be introduced to strategies for
coping with this challenge. As students learn SQL and other programming tools for creating biological databases
to support their own research, they will also learn to understand and make effective use of existing public
repositories. Particular topics will include formats and schemas in important bioinformatics databases (GenBank,
EMBL, PDB), XML schema and XML exchange methods, using CGI for the query interface, using generic tools to
browse and manage databases (Tomcat and pgAdmin), relevant database applications of SOAP and CORBA, the types
of models used in designing databases, and how ontologies (such as the Gene Ontology, GO) affect database design
and queries. At the end of the course, students will be able to create, populate, and query a database supporting
a research area of biological interest (the focus changes each year and has included non-microarray gene
expression data and SNP data).
Course Necessity: Bioinformatics is defined by the NIH as the computational handling of biological data, and
by the NSF as the storage and organization of large amounts of data originating from biological samples. With
the advent of high-throughput data streams, such as those from sequencing and gene expression platforms, and of
the multiple outputs of modeling programs, storing and organizing both primary and derived data requires that
investigators be able to create and maintain research databases, understand how to query public biological data
repositories, and integrate the resulting data streams with their own data. Biological data represents one of the
most complex sets of data known, combining genetic, molecular, biochemical, and environmental components, and is
therefore a highly challenging subject for data modeling. This course will teach students fundamental concepts for
breaking scientific data into its atomic units and ways to create relationships that allow flexible retrieval. No
graduate course at GMU is dedicated to biological data and data modeling together with the structure of public
biological databases.
Course Relationship to Existing Programs: This course was prototyped as a Special Topics course, BINF 739, in
Spring 2004 and Spring 2005; it received positive reviews, and there was considerable, even irate, demand from our
students in 2006, when it was not offered. Inquiries to local employers indicate that database skills, especially
the ability to retrieve specific information from the large public repositories and integrate it with local
research project data, are among the top skills in demand. This course is intended as an elective for advanced MS
or beginning PhD students in the BINF programs.
Course Relationship to Existing Courses: No existing course at GMU has a similar focus or goal. INFS 614
overlaps in teaching SQL basics and ER diagrams, but it does not cover models important in certain public
bioinformatics databases (OO, star, ASN.1). Similarly, CSI 710 has some theoretical overlap, but it is a broader
survey class covering generic properties of scientific databases for a wide range of disciplines and does not
address the specific data representations and the integration of disparate data types required for
bioinformatics. In fact, over the past several years Jamison and Weller have been asked to give guest lectures in
CSI 710 to introduce its students to these topics. INFS 864 appears to cover a large number of advanced database
topics of interest to computer scientists, but it devotes only a single lecture to the problems posed by the Human
Genome Project, a DNA sequence problem with limited application to the problems presented by emerging data types.
Each of these courses requires three to five prerequisites, which are prohibitive given the preparation of
bioinformatics students, although we have recommended these courses to the small number of students who have the
appropriate background to profit from them.
3. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering: Spring 2007
Proposed instructors: Dr. Jennifer Weller, Dr. Olga Brazhnik
4. TENTATIVE SYLLABUS: See attached.
BINF 650 Syllabus
Week 1: Introduction to public bioinformatics DB resources; Structuring knowledge domains (ER, OO, star)
  Reading: E/N Ch 1-4, Appendices E, F; R/G Ch 1, 2; H/K Ch 1.1-1.3
Week 2: Framing bioinformatics experiments and expression as a schema; SQL-99 data and schema definition
  Reading: E/N Ch 8; R/G Ch 5; H/K Ch 4
Week 3: Model support for bioinformatics research; Relational algebra and calculus
  Reading: E/N Ch 5, 6; R/G Ch 3, 4
Week 4: Mapping between models and schemas; SQL: assertions, views
  Reading: E/N Ch 9
Week 5: Bioinformatics data set retrieval, validation and interpretation; Relational DB mapping, dependencies
  Reading: E/N Ch 7, 11; H/K Ch 2.2, 2.5
Week 6: Conceptual modeling of bioinformatics data structures; Modeling use cases and experiment requirements
Week 7: Practicum: Modeling a hatching enzyme experiment; Midterm questions
Week 8: Midterm
Week 9: Bioinformatics data types and conversion methods; Presentation of student workflows, conceptual models
  Reading: E/N Ch 12-14; R/G Ch 6-9; H/K Ch 3, 10
Week 10: Tools for populating and validating bioinformatics databases; Normalizing and indexing
  Reading: E/N Ch 10; R/G Ch 10, 11, 20
Week 11: Web access, programmatic access, security of clinical data; Complex queries
  Reading: E/N Ch 15; R/G Ch 12-14
Week 12: Data interchange protocols for bioinformatics; XML-based protocols (MAGE-ML, SBML)
  Reading: E/N Ch 26; R/G Ch 27; H/K Ch 4
Week 13: Student Presentations
Week 14: Final
1. "Fundamentals of Database Systems" by R. Elmasri and S. B. Navathe, Addison-Wesley, Fourth Edition, 2003.
Primary text [abbreviated E/N for chapter assignments].
Suggested additional texts:
2. "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill, Third Edition, 2002.
Highly recommended [abbreviated R/G for chapter assignments].
3. "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2001.
Optional but useful [abbreviated H/K for chapter assignments].