Teaching Data Modeling and Database Design
Graeme Simsion, June 2005
Background
I have been teaching data modeling to information systems professionals at the
introductory, advanced and “master class” levels, and to college undergraduates and
postgraduates since the late 1980s.
I have been involved in the development of data modeling courses and curricula at
Melbourne, Monash and Swinburne Universities in Australia, and liaised with a number
of colleges that use my book “Data Modeling Essentials” [1] (first published 1994).
From 1982 to 1999, I managed a consultancy that was Australia’s leading provider of data
modeling consultancy and education. We hired a number of graduates into trainee data
modeler positions, and assisted clients to do the same. I am currently conducting research
on data modeling practice through the University of Melbourne.
Effectiveness of Education and Training
In my experience, data modeling and database design are often not well taught in the
college / university environment, particularly in terms of providing students with skills
that will be useful in industry. Specifically:
• Students who have completed introductory (and even advanced) courses in data
modeling are frequently unable to develop even very simple models in real
situations (as distinct from using descriptions devised by an instructor with a
‘correct answer’ in mind).
• Courses address aspects of data modeling / database design that students are
unlikely to be able to apply in practice – or not until they have substantial
practical experience. Examples include database tuning (likely to be unnecessary
for small projects, and likely to be the responsibility of a specialist for large ones)
and knowledge of alternative modeling conventions.
• Normalization – a key technique – is often taught in such a mathematical fashion
(typically without the use of diagrams) that students fail to gain any intuitive
understanding of its use; without that understanding the skill is quickly forgotten.
• The conventions taught in many colleges are not those widely used in industry.
To some extent this reflects academic interests in alternative approaches (e.g.
ORM) that do not have a significant user base, but there is a common
misunderstanding, reflected in many texts, that the Entity-Relationship approach
(Chen, 1976) is dominant in industry. In fact, practitioners generally (mis)use
the term “entity-relationship” to refer to the Information Engineering “crow’s
foot” conventions supported by most documentation tools. UML is the emerging
alternative standard.
[1] Third Edition, co-authored with Graham C. Witt, published by Morgan Kaufmann, San Francisco, 2005.

Similarly, despite the dominance of relational DBMSs (even for persistent data in
object-oriented applications) some courses and texts persist in covering outdated
architectures – hierarchical and network models.
Goals of Teaching Data Modeling and Database Design
Virtually every information systems practitioner deals with structured data on a daily
basis: specifying/writing programs and reports against a database, specifying additions
and changes, developing coding schemes, evaluating and adapting the data structures in
software packages. Knowledge of the principles of data representation and structure is
fundamental to information systems practice. This is the domain of data modeling.
Conversely, database tuning (physical database design) is generally recognized as a
specialized task – and one that has less direct relevance to the general IS practitioner.
In my view, the principal goal of teaching data modeling and database design is to
provide students with an understanding of how data is represented and organized in
modern databases, and of the principles and tradeoffs involved in good logical data
organization. In an undergraduate course, it is not unreasonable to expect them to be able
to devise sound relational (or extended relational) data structures to support simple (but
real) user requirements.
This is not to deny the value of teaching foundational theory: but without a firm
appreciation of how this translates into practice, the theory is quickly forgotten.
Some Guidelines
In my experience, there are four key rules for teaching data modeling and database design
effectively:
1. Work Backwards
Data modeling is frequently taught before database design and even before SQL
programming. While this reflects the sequence in which these activities are performed in
practice, it means that the student does not have a clear idea of how the outputs of his/her
activity will be used. The advice given to neophyte data modelers to “identify the things
of interest to the business” needs to be qualified by “with an understanding that these will
be implemented as tables”. Without this qualification, students will regularly represent
almost any noun in the problem description as an entity. Instructors resort to removing
such nouns from the problem description in order to lead the student towards the right
answer – a less than satisfactory preparation for the real world.
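
As a rough illustration of the “implemented as tables” qualification, the sketch below
takes an invented one-sentence requirement and shows one possible design in which only
some of its nouns become tables. The requirement, the table and column names, and the
design itself are assumptions made for illustration (other designs are defensible); the
code uses Python’s standard sqlite3 module.

```python
import sqlite3

# Hypothetical requirement: "A customer places an order on a date
# for a quantity of a product."
nouns = ["customer", "order", "date", "quantity", "product"]

# One possible design (an assumption, not the only defensible answer):
# only nouns about which several facts are recorded become tables.
schema = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    order_date  TEXT    NOT NULL,   -- "date" is a column, not a table
    quantity    INTEGER NOT NULL    -- "quantity" is a column, not a table
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(schema)

# List the tables that actually exist in the resulting database.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(f"{len(nouns)} nouns, {len(tables)} tables: {tables}")
```

A student who models every noun as an entity would arrive at five boxes here; thinking
ahead to the tables makes it easier to see that “date” and “quantity” describe an order
rather than stand on their own.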
I suggest the following broad teaching sequence, which will typically be spread across
more than one course/unit:
1. Programming against an existing relational database, preferably including one or
two structural problems (not fully normalized, non-atomic data) to illustrate the
importance of sound data structure.
2. Modification of a relational database to incorporate new requirements (and
programming of queries to prove the changes).
3. Instruction and practice in relational data structuring techniques – including
normalization.
4. Reverse engineering of a data model diagram from the relational model
(preferably using DSD/IE/“crow’s foot” conventions).
5. Development of data model diagram “top down” from a reverse engineered
description.
6. Development of data model diagram “top down” from real-world description.
7. Discussion of (and possibly practice with) alternative approaches and
conventions.
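
To make steps 1 to 3 of this sequence more concrete, here is a minimal sketch of the
kind of exercise I have in mind, written in Python with the standard sqlite3 module.
The tables (student_flat, student, enrolment), the course codes and the data are all
hypothetical, invented for illustration. The first table deliberately holds non-atomic
data (a comma-separated list of courses); the second design stores one enrolment per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Step 1: a deliberately flawed structure (hypothetical). The "courses"
# column holds a comma-separated list, i.e. non-atomic data.
conn.executescript("""
CREATE TABLE student_flat (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    courses    TEXT NOT NULL
);
INSERT INTO student_flat VALUES (1, 'Ann', 'DB101,DB201');
INSERT INTO student_flat VALUES (2, 'Bob', 'DB201');
INSERT INTO student_flat VALUES (3, 'Cy',  'DB2010');
""")

# Finding who takes DB201 needs string matching, and the obvious pattern
# also matches DB2010, so Cy is wrongly included.
flat_hits = [row[0] for row in conn.execute(
    "SELECT name FROM student_flat WHERE courses LIKE '%DB201%' ORDER BY name")]

# Step 3: restructured so that each enrolment is one row with atomic values.
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE enrolment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT NOT NULL,
    PRIMARY KEY (student_id, course_code)
);
INSERT INTO student SELECT student_id, name FROM student_flat;
""")
for student_id, courses in conn.execute(
        "SELECT student_id, courses FROM student_flat").fetchall():
    conn.executemany(
        "INSERT INTO enrolment VALUES (?, ?)",
        [(student_id, code) for code in courses.split(",")])

# The same question is now a plain equality test on one column.
normalized_hits = [row[0] for row in conn.execute(
    """SELECT s.name FROM student s
       JOIN enrolment e ON e.student_id = s.student_id
       WHERE e.course_code = 'DB201' ORDER BY s.name""")]

print(flat_hits)        # ['Ann', 'Bob', 'Cy']
print(normalized_hits)  # ['Ann', 'Bob']
```

Students who have had to write the first query, and seen it return the wrong answer,
tend to need little persuasion that the structure of the data matters.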
2. Teach Data Modeling as a Design Discipline
Data modeling is difficult: there is a good argument that it is better characterized as a
design discipline than a descriptive / inquiry discipline. Students who are able to quote
quite sophisticated theory are frequently unable to tackle even the simplest non-contrived
problems. Conversely, the theory makes more sense if it can be related to personal
practical experience.
I suggest teaching data modeling as you would any design discipline (architecture is
frequently used as a metaphor for data modeling). Provide plenty of practical examples
of increasing difficulty, and show the students models developed by others.
In practice there is seldom a ‘single right answer’. An exercise that admits only one
solution may be useful in teaching a particular technique, but is not representative of
problems encountered in practice. As students move to building models from real
situation descriptions (i.e. not written with a model solution in mind), they should be
encouraged to compare answers across a range of quality dimensions.
3. Teach the Relevant Theory
There is a substantial body of theory in data modeling. Academic research priorities have
not traditionally reflected the most pressing practical issues. While it is useful to provide
an overview of research, students are likely to gain more from theory which directly
illuminates the work they are doing – and that they are likely to be able to employ in
practice.
There is a substantial body of literature around expert vs novice performance; methods,
stages and deliverables; the relational model and normalization; and data warehouse
design (this largely in the practitioner domain). This is likely to be highly relevant to
students as they work on their assignments, and to their later practice.
Conversely, the very large body of literature on alternative models is worthy of note, but
is often given attention out of proportion to its relevance. Similarly, there is some current
research interest in application of ontology to data modeling – interesting to note but
perhaps not worth pursuing in detail except as an elective option for the student at this
level.
4. Use Popular Conventions
I strongly recommend that you use either the IE or UML conventions in teaching data
modeling. Research has generally shown these (and the Chen E-R conventions) to be at
least as useful as the more “fringe” approaches, which nevertheless have their advocates.
The downside of using alternative approaches such as ORM is that the student misses the
opportunity to acquire proficiency with a language that is used in industry. In my
experience working with modelers who have been trained in alternative approaches, any
additional “deeper understanding” is more than outweighed by this missed opportunity.
Finally, I use diagrams heavily in teaching relational data structures. For various reasons,
writers on the relational approach (in particular Date) have preferred a tabular
presentation to diagrams of structure. Experience in industry and teaching is that
diagrams can significantly help students understand the key features of relational
structures.
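
As a small illustration of how structure can be read off a relational schema and put in
diagram-like form, the sketch below lists one line per foreign key, using “-<” as a
plain-text stand-in for the crow’s foot at the “many” end. The schema (department,
employee, timesheet) and the “-<” notation are my own assumptions for illustration, and
the sketch treats every foreign key as a simple one-to-many relationship, ignoring
optionality. It uses Python’s standard sqlite3 module and SQLite’s
PRAGMA foreign_key_list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    department_id INTEGER NOT NULL REFERENCES department(department_id),
    name          TEXT NOT NULL
);
CREATE TABLE timesheet (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    week_no     INTEGER NOT NULL,
    hours       REAL NOT NULL,
    PRIMARY KEY (employee_id, week_no)
);
""")


def relationships(connection):
    """Return one 'parent -< child' line per foreign key in the schema."""
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = []
    for child in tables:
        # Each row is: id, seq, parent table, child column, parent column, ...
        # Table names come from the schema itself, so they are safe to embed.
        for fk in connection.execute(f"PRAGMA foreign_key_list({child})"):
            parent, column = fk[2], fk[3]
            lines.append(f"{parent} -< {child} (via {child}.{column})")
    return lines


diagram = relationships(conn)
print("\n".join(diagram))
# department -< employee (via employee.department_id)
# employee -< timesheet (via timesheet.employee_id)
```

Even this crude listing shows the chain from department down to timesheet at a glance,
which is harder to see from the table definitions alone.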