George Mason University
Graduate Course Approval/Inventory Form

Please complete this form and attach a copy of the syllabus for new courses. Forward it as an email attachment to the Secretary of the Graduate Council. A printed copy of the form with signatures should be brought to the Graduate Council Meeting. Complete the Coordinator Form on page 2, if changes in this course will affect other units.

Please indicate: _X_ NEW ____ MODIFY ____ DELETE
Local Unit: SCS
Graduate Council Approval Date:
Course Abbreviation: BINF
Course Number: 650
Full Course Title: Introduction to Bioinformatics Database Design
Abbreviated Course Title (24 characters max.): Intro to Bioinformatics DBs
Credit hours: 3
Program of Record:
Repeatable for Credit?
___ D=Yes, not within same term (up to ___ hours)
___ T=Yes, within the same term (up to ___ hours)
_X_ N=Cannot be repeated for credit
Activity Code (please indicate): _X_ Lecture (LEC) ___ Lab (LAB) ___ Recitation (RCT) ___ Studio (STU) ___ Internship (INT) ___ Independent Study (IND) ___ Seminar (SEM)
Catalog Credit Format: 3:3:0
Course Level: GF(500-600) _X_ GA(700+) ____
Maximum Enrollment: 20
For NEW courses, first term to be offered: Spring 2006
Prerequisites or corequisites: BINF 634 or equivalent, or permission of the instructor

Catalog Description (35 words or less). Please use catalog format and attach a copy of the syllabus for new courses.
Students will acquire skills needed to exploit public biological databases and establish and maintain personal databases that support their own research; such skills include learning underlying data models and the basics of DBMS and SQL.

For MODIFIED or DELETED courses as appropriate:
Last term offered:
Previous Course Abbreviation:
Previous number:
Description of modification:

APPROVAL SIGNATURES:
Submitted by: ________________________________ email: ________________
Department/Program: ________________________________ Date: __________________
College Committee: ________________________________ Date: _________________
Graduate Council Representative: ________________________________ Date: __________________

GEORGE MASON UNIVERSITY
Course Coordination Form

Approval from other units: Please list those units outside of your own who may be affected by this new, modified, or deleted course. Each of these units must approve this change prior to its being submitted to the Graduate Council for approval.

Unit: CSI (relevant course: 710) Head of Unit's Signature: Date:
Unit: INFS (relevant courses: 614, 846) Head of Unit's Signature: Date:
Unit: Head of Unit's Signature: Date:
Unit: Head of Unit's Signature: Date:
Unit: Head of Unit's Signature: Date:

Graduate Council approval: ______________________________________________ Date: ____________
Graduate Council representative: __________________________________________ Date: ____________
Provost Office representative: ________________________________________ Date: __________

Course Proposal Submitted to the Graduate Council by The School of Computational Sciences

1. COURSE NUMBER AND TITLE:
BINF 650: Introduction to Bioinformatics Database Design

Prerequisites: BINF 634 (or equivalent), or permission of the instructor.

Catalog Description: This course focuses on the basics of creating and working with scientific and statistical databases, with an emphasis on those housing biological data. Students will gain practical experience working with database management systems and using SQL and other database tools to create their own research databases.

2. COURSE JUSTIFICATION:
Course objectives: This course is essentially a Research Methods course, in which students learn to model and instantiate, in a relational database management system, a practical database to support their own scientific research. Students also learn methods for accessing, retrieving, and integrating data from relevant public databases that do not depend simply on pre-computed forms. Biological databases use relational schemas, often supplemented with XML schemas, and exploiting them requires an understanding of the data representations, of how to perform joins, and of how to use SQL for data retrieval. Biological data is often very complex and many valid representations are possible, leading to difficulties in integration. Students will be introduced to strategies for coping with this challenge. As students are introduced to SQL and other programming tools for creating biological databases to support their own research, they will also be expected to learn how to understand and make effective use of existing public repositories. Particular topics will include formats and schemas in important bioinformatics databases (GenBank, EMBL, PDB), XML schema and XML exchange methods, using CGI for the query interface, using generic database tools to browse and manage databases (Tomcat and pgAdmin), relevant database applications of SOAP and CORBA, the types of models used in designing databases, and how ontologies (such as GO) affect database design and queries. At the end of the course, the student will be able to create, populate, and query a database supporting a research area of biological interest (the focus changes each year and has included non-microarray gene expression data and SNP data).

Course Necessity: Bioinformatics is defined by the NIH as the computational handling of biological data, and by the NSF as the storage and organization of large amounts of data originating from biological samples. With the advent of high-throughput data streams, including those from sequencing and gene expression platforms, and the multiple outputs of modeling programs, storage and organization of both primary and derived data requires that investigators be able to create and maintain research databases, understand how to query public biological data repositories, and integrate the resulting data streams with their own data. Biological data represents one of the most complex sets of data known, combining genetic, molecular, biochemical and environmental components, and is thus a highly challenging subject for data modeling. This course will teach students fundamental concepts for breaking up scientific data into its atomic units and ways to create relationships that allow flexible retrieval. There is no graduate course at GMU that exclusively covers the topic of biological data and data modeling in addition to the structure of public biological databases.

Course Relationship to Existing Programs: This course has been prototyped as a Special Topics course, BINF739, in Spring 2004 and Spring 2005, and received positive reviews, as well as considerable, irate demand from our students in 2006, when it was not offered. Inquiries of local employers indicate that database skills, especially an understanding of how to retrieve specific information from the large public repositories and integrate it into local research project data, are among the top skills in demand. This course is intended as an elective for advanced MS or beginning PhD students in the BINF programs.

Course Relationship to Existing Courses: No existing course at GMU has a similar focus or goal. INFS 614 has some overlap in teaching SQL basics and ER diagrams, but does not include models important in certain public bioinformatics databases (OO, star, ASN.1). Similarly, CSI710 has some theoretical overlap, but it is a broader survey class covering generic properties of scientific databases for a wide range of disciplines and does not address the specific data representations and integration of disparate data types required for bioinformatics. In fact, Jamison and Weller have been asked to give guest lectures in CSI710 to introduce those students to these topics over the past several years. INFS864 appears to handle a large number of advanced database topics of interest to computer scientists, but devotes only a single lecture to the problems posed in the human genome project, a DNA sequence problem that has limited application to problems presented by emerging data types. These courses all require a number of prerequisites (from 3 to 5) that are prohibitive in light of the preparation of bioinformatics students, although we have recommended them to the small number of students who have the appropriate background to profit from them.

3. SCHEDULING AND PROPOSED INSTRUCTORS:
Semester of Initial Offering: Spring 2007
Proposed instructors: Dr. Jennifer Weller, Dr. Olga Brazhnik

4. TENTATIVE SYLLABUS:
See attached.

BINF 650 Syllabus

Week 1
Subject: Introduction to public bioinformatics DB resources; Structuring knowledge domains (ER, OO, star)
Reading: E/N: Ch 1-4, Appendix E, F; R/G: Ch 1, 2; H/K: Ch 1.1-1.3

Week 2
Subject: Framing bioinformatics experiments and expression as a schema; SQL-99 data and schema definition
Reading: E/N: Ch 8; R/G: Ch 5; H/K: 4

Week 3
Subject: Model support for bioinformatics research; Relational algebra and calculus
Reading: E/N: Ch 5, 6; R/G: Ch 3, 4

Week 4
Subject: Mapping between models and schemas; SQL: assertions, views
Reading: E/N: Ch 9

Week 5
Subject: Bioinformatics data set retrieval, validation and interpretation; Relational db mapping, dependencies
Reading: E/N: Ch 7, 11; H/K: 2.2, 2.5

Week 6
Subject: Conceptual modeling of bioinformatics data structures; Modeling use cases and experiment requirements

Week 7
Subject: Practicum: Modeling a hatching enzyme experiment; Midterm questions

Week 8
Subject: Midterm

Week 9
Subject: Bioinformatics data types and conversion methods; Presentation of student work flows, conceptual models
Reading: E/N: Ch 12-14; R/G: Ch 6-9; H/K: 3, 10

Week 10
Subject: Tools for populating and validating bioinformatics databases; Normalizing and indexing
Reading: E/N: Ch 10; R/G: Ch 10, 11, 20

Week 11
Subject: Web access, programmatic access, security of clinical data; Complex queries
Reading: E/N: Ch 15; R/G: Ch 12-14

Week 12
Subject: Data interchange protocols for bioinformatics; XML-based protocols (MAGEML, SBML)
Reading: E/N: Ch 26; R/G: Ch 27; H/K: 4

Week 13
Subject: Student Presentations

Week 14
Subject: Final

Texts:
1. "Fundamentals of Database Systems" by R. Elmasri and S.B. Navathe, Addison-Wesley, Fourth Edition, 2003. Primary text [abbreviated E/N for chapter assignments].

Suggested additional texts:
2. "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke, McGraw Hill, Third Edition, 2002. Highly recommended [abbreviated R/G for chapter assignments].
3. "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2001. Optional but useful [abbreviated H/K for chapter assignments].