Download The new Computational and Data Sciences

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
The new Computational and Data Sciences (CDS)
Undergraduate Program at George Mason University
Kirk D. Borne, John Wallin, Robert Weigel (GMU)
We present the new undergraduate science program in Computational and Data Sciences (CDS) at George Mason University (GMU),
which began offering courses for both major (B.S.) and minor degrees in Spring 2008. The overarching theme and goal of the program are to
train the next-generation scientists in the tools and techniques of cyber-enabled science (e-Science). New courses include Introduction to
Computational & Data Sciences, Scientific Data & Databases, Scientific Data & Information Visualization, Scientific Data Mining, and
Scientific Modeling & Simulation. This is an interdisciplinary science program, drawing examples, classroom materials, and student
activities from a broad range of physical and biological sciences, including Space Physics (Space Weather), Solar Physics, Astrophysics,
Geosciences, Computational Fluid Dynamics, Materials Science, Physics, Chemistry, and Bioinformatics. We will describe some of the
motivations and early results from the program. More information is available at http://cds.gmu.edu/ . The development of the GMU CDS
program is sponsored by the NSF CCLI (Course, Curriculum, and Laboratory Improvement) program, through award # 0737091.
Computational and Data Sciences:
• Computational and Data Sciences are emerging fields involving applications of sophisticated
simulation and data-oriented methods to build models and solve problems in science & engineering.
• Recently emerged interdisciplinary areas in the chemical, physical and biological sciences (such as
biotechnology, nanotechnology, molecular electronics, photonics in nanoscale systems, and
energetics of DNA/protein binding) require highly-qualified professionals with strong computational
skills to work closely with experimentalists in solving complex scientific and engineering problems.
• Emerging data-intensive science fields (such as Geoinformatics, Bioinformatics, Astroinformatics,
Materials Informatics, and Medical Informatics) require specially trained professionals with strong
data skills to address ubiquitous data-intensive applications in science, industry, and government.
• Computational and data sciences complement existing theoretical and experimental science
approaches and may be thought of as a new mode of scientific inquiry.
Existing – Graduate Program in Computational Science & Informatics (CSI) at GMU:
• The CSI graduate program has existed since 1992, having graduated 175 PhDs to-date.
• There are over 90 graduate courses in the CSI program, covering many science disciplines.
New! – Undergraduate Program in Computational & Data Sciences (CDS) at GMU:
• Why? Students graduating with a traditional discipline-based bachelor’s degree in biology,
chemistry, mathematics, or physics generally do not have the required computational background
necessary to participate as productive members of modern interdisciplinary scientific research
teams, which are becoming increasingly data- and computationally intensive.
• What? The BS program in CDS provides students with a variety of opportunities to become
research professionals possessing interdisciplinary knowledge, including sciences and applied
mathematics, augmented with strong computational and data-oriented skills.
• How? Students in this program are exposed to a wide range of computational and data science
applications, and will learn computational science tools, high-performance computing, applied and
theoretical computational techniques, modeling and simulation, statistical analysis, optimization,
data & information visualization, scientific database applications, scientific data mining & knowledge
discovery in databases (KDD), and data-intensive science research methods.
• Who? Students with a broad interest in computers and sciences will benefit from the program.
• The Results? Graduates from this program will acquire interdisciplinary knowledge and will be able
to apply scientific principles in solving complex real-world problems.
• When & Where? The program started offering courses in Spring 2008 at GMU.
http://cds.gmu.edu/programs/bs_cds/program6.php
Topics covered in CDS 101 (Introduction to Computational & Data Sciences):
•
•
•
•
•
•
•
•
•
•
•
•
•
The Scientific Method - Experiments, Observations, and Models
Computer Internals – binary numbers and logic circuits
Computer Algorithms and Tool – introduction to programming in Matlab
Data acquisition – linking sensors to computers
Signal Processing – understanding sources of noise and errors
Scientific Databases – storing and organizing scientific data
Data Reduction and Analysis – moving from data to information
Data Mining – moving from information to knowledge
Computer Models – using mathematics and algorithms to represent reality
Computer Simulations – solving linear systems and applications
Computer Visualization – seeing experiments as images
High Performance Computing – simulation at the cutting edge of technology
Future directions in computational science – languages, quantum computing and beyond
Topics covered in CDS 401 (Scientific Data Mining):
•
•
•
•
•
•
•
•
Scientific Motivation for data mining
Quantitative Concepts for use in data mining
Software packages for data mining
Data Preparation (e.g., data types, previewing, cleaning dirty data, normalization, transformation)
Distance and Similarity Metrics for Clustering and Classification
Supervised Learning Methods (e.g., decision trees, neural networks, Bayes, Markov models)
Unsupervised Learning Methods (e.g., clustering, nearest neighbor, networks, link/association analysis)
Scientific Data Mining Case Studies & Special Topics (e.g., time series, images, spatial data, outlier detection)
"The importance of data in science and engineering continues on a path of exponential growth; some even assert that the
leading science driver of high-end computing will soon be data rather than processing cycles. Thus it is crucial to provide major
new resources for handling and understanding data.“ (NSF Atkins Report on Cyberinfrastructure, 2005)
“Data science has developed to include the study of the capture of data, their analysis, metadata, fast retrieval, archiving,
exchange, mining to find unexpected knowledge and data relationships, visualization in two and three dimensions including
movement, and management.” (F. Smith, Data Science as an Academic Discipline, Data Science Journal, Vol. 5. p. 163, 2006)
National study reports identifying workforce demands for computational & data sciences:
•
•
•
•
•
•
NSF report on "Knowledge Lost in Information: Report of the NSF Workshop on Research Directions for
Digital Libraries" (2003)
NSB (National Science Board) report on "Long-lived Digital Data Collections: Enabling Research and
Education in the 21st Century" (2005)
NSF Atkins Report on "Revolutionizing Science and Engineering Through Cyberinfrastructure" (2005)
NSF-sponsored Computing Research Association report on “Cyberinfrastructure for Education and
Learning for the Future: A Vision and Research Agenda” (2005)
NSF + ARL report on "Long-term Stewardship of Digital Data Sets in Science and Engineering" (2006)
NSF report on "Cyberinfrastructure Vision for 21st Century Discovery" (2007)
Space Weather
Colliding Galaxies
Virtual Observatories
& Large Scientific DBs
CFD
Computational & Data Sciences Program of Study:
Students must complete a total of 18 credits in computational and data sciences core
courses, 15 credits in computer science, 23 credits in mathematics, 6 credits in statistics,
21-25 credits in a science concentration, and 3-9 credits in computational and data sciences
electives. Three concentration areas are currently offered: Physics, Chemistry, and Biology.
Additional concentrations may be added in the future. Students are encouraged to
undertake an optional research project that allows them to gain useful experience in the
development of simulations and other aspects of computational science.
Catalog Descriptions: Six required core computational & data sciences courses (18 credits):
CDS 101 Introduction to Computational and Data Sciences – Introduction to the use of
computers in scientific discovery through simulations and data analysis. Covers historical
development and current trends in the field.
CDS 301 Scientific Information and Data Visualization – The techniques and software used
to visualize scientific simulations, complex information, and data visualization for
knowledge discovery.
CDS 302 Scientific Data & Databases – Data & databases used by scientists. Includes basics
about database organization, queries, and distributed data systems. Student exercises will
include queries of existing systems, along with basic design of simple database systems.
CDS 401 Scientific Data Mining – Data mining techniques from statistics, machine learning,
and visualization to scientific knowledge discovery. Students will be given a set of case
studies and projects to test their understanding of this field and provide a foundation for
future applications in their careers.
CDS 410 Modeling and Simulations I – Numerical differentiation and integration, initialvalue and boundary-value problems for ordinary differential equations, methods of
solution of partial differential equations, iterative methods of solution of nonlinear systems,
and approximation theory.
CDS 411 Modeling and Simulation II – This course covers the application of modeling and
simulation methods to various scientific applications, including fluid dynamics, solid
mechanics, materials science, molecular mechanics, and astrophysics.
Early Results:
1. The B.S. degree (CDS major) was approved in 2007. The minor was approved in 2008.
2. The first courses (CDS 101 and 302) were offered in Spring 2008.
3. Additional courses (CDS 301 and 401) were offered in Fall 2008.
4. Currently, 10 active students are majoring in the program.
5. New course for Spring 2009: CDS 151 – Data Ethics in an Information Society
"Data-driven science is becoming a new scientific paradigm – ranking with theory, experimentation, and computational science.“
(JISC/NSF Workshop on Data Driven Science & Repositories, 2007)
“There is a growing gap between the generation of data and our understanding of it.”
(I. Witten & E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2005)
“While data doubles every year, useful information seems to be decreasing.“
(M. Dunham, Data Mining Introductory and Advanced Topics, 2002)
“The Information Revolution has provided unprecedented opportunity to ensure that scientific data are fully integrated in the
fundamental workings and decision making of our society.” (S. Iwata, Scientific Agenda of Data Science, Data Science Journal,
Vol. 7. p. 54, 2008)