Database Notes (full version) - The ELCHK Yuen Long Lutheran

AS LEVEL Computer Application
Databases
YLLSS

In the syllabus, we have:

Applications of databases in society
• Students should be aware of the uses and applications of databases in everyday life (e.g. the library system, inventory system in a supermarket, credit card system, etc.).
• Students should be given opportunities to discuss the importance of databases in business environments and how they are related to the success of a business.

Concepts and terminology
• Students should understand the following terminology and concepts:
  o data and information
  o data, fields, records, tables, files and databases
  o common data types such as integer, real, character, string, boolean, date, etc.
  o indexes and keys
  o database management systems (e.g. data definition language, data manipulation language, data dictionary, transaction processing and access control, etc.)
  o program-data independence
  o data redundancy and data integrity

Basic concepts of a relational database
• Students should know the basic concepts underpinning relational databases such as entity, relation, attribute, domain, primary key, foreign key, candidate key, entity integrity, referential integrity, domain integrity, etc. Students should be able to identify these basic elements in examples taken from everyday applications.
• Students should know how to organise data differently but sensibly in a relational database and be able to establish the required relationships to link up the tables.

Creating a relational database
• Students should be able to create a simple relational database based on specified requirements using SQL.

Database maintenance and manipulation
• Students should be able to use SQL to maintain a simple relational database, manipulate its data or retrieve the required information.
They should be able to:
  o modify the structure of the tables
  o add, delete and modify the data in the tables
  o view, sort and select the contents by filtering
  o use appropriate operators and expressions such as the in, between and like operators, arithmetic operators and expressions, comparison operators and logical operators, etc., to perform specific operations
  o use simple built-in functions such as aggregate and string functions, etc.
  o perform multiple field indexing and multi-level ordering
  o perform queries on multiple tables including the use of equi-join, natural join and outer join
  o perform sub-queries (for 1 sub-level only)
  o export query results to, for example, text, HTML or spreadsheet format, etc.

The conceptual data model
• Students should understand the importance of good database design in effective database management. They should be aware of the three levels of data abstraction, namely conceptual level, physical level and view level.

Entity-Relationship modeling
• Students should be aware of the three types of relationship (one-to-one, one-to-many, many-to-many) among entities in a relational database.
• Students should be able to create simple entity-relationship (ER) diagrams involving binary relationships only in designing databases for simple business scenarios. This includes the resolution of many-to-many relationships into multiple one-to-many relationships in order to implement the database.
• Students should be able to transform the ER diagrams to tables in relational databases and be able to create a database schema for a given set of data to describe the characteristics of the database.

Introduction to Normalisation
• Students should be able to briefly explain the meaning and purpose of normalisation. They should be aware of the methods or measures used to reduce data redundancy.

Introduction to Databases

Data vs. Information

Numbers, text, images or any recording in a form that is accessible to human beings are classified as data.
Data themselves have no meaning. Only when data are interpreted does their content become meaningful. Interpreted data are referred to as information. For example, the number 33.5 tells us almost nothing. However, when readers are told that the number stands for a temperature in degrees centigrade, it makes sense. In this example, 33.5 is a piece of data whereas 33.5 as a temperature in degrees centigrade is a piece of information.

Information is stored in computers such that both its data value and its interpretation are recorded. In most cases, the interpretation of computer data is given by the corresponding data name. In the context of databases (which will be elaborated in the next section) as well as in daily use, the terms “information” and “data” are often used interchangeably, although such confusion is not desirable. In most cases, the interpretation of the term “data” should be clear from the context of discussion. In the context of databases, “data” usually means “information”.

The Data Hierarchy

Each information system has a hierarchy of data organization, and each succeeding level in the hierarchy is the result of combining the elements of the preceding level. The six levels are bits, characters (bytes), fields (data elements), records, files, and data base (see Figure 1). A bit is a binary digit which has a value of either 0 or 1. A byte is composed of 8 ordered bits.

Figure 1. Hierarchy of data organization.

Question to ponder
• Are byte and character types the same? (Answer: Not necessarily. This depends on the underlying encoding scheme being adopted. Even an ASCII character may need more than one byte of storage in certain Unicode encodings, such as UTF-16.)

Data Field/Element

A (data) field or data element is the lowest level “logical unit” in the data hierarchy that can be interpreted in a meaningful way, e.g., “David” for a name, “23469345” for a phone number.
The maximum number of characters (not bytes) that a field can have is called the field length. A field may consist of a single character only, e.g., M(ale) and F(emale) for representing sex. How fine the granularity of a field should be is a user’s decision, e.g., we can treat an address as a single field or as an aggregate of several fields such as flat-and-floor-number, street-number-and-street-name, district, city and country, etc. The key concern is the application needs. If certain processing is required to handle an address at city level, we of course need to divide the address field into its components.

Record

A record is a logical group of related data fields describing an event or an item, e.g., a student enrollment record consists of fields such as student-ID, student-name, programme-code, module-code, date-of-enrollment, etc. A record is the lowest level logical unit that can be accessed from a file. In other words, if one would like to access a data field within a record, the whole record has to be retrieved first before the required data field is identified.

File

A logical file is composed of occurrences of records. A physical file refers to a named area on a secondary storage device that contains a program, textual material, or even an image. One logical file is not necessarily mapped to one physical file and vice versa. For example, a logical file may consist of an index area and a record area such that each of the areas is associated with a separate physical file. End-users are usually concerned with logical files instead of physical files.

Questions to ponder
• Give an example in which a physical file contains more than one logical file.
• Give an example in which a logical file is stored in multiple physical files.

Data Base

A data base is a collection of files that are logically related and integrated with one another so that data redundancy is minimized or reduced. Data redundancy exists when a data field is stored in more than one logical file.
Data redundancy often cannot be eliminated entirely for various reasons, but it should be kept under control. A database management system is devised to control the data redundancy problem by ideally storing every data item once and/or by propagating data changes to all related record occurrences, possibly among a number of files, so that data integrity (which concerns the validity, accuracy and correctness of data) can be maintained. A database management system is often referred to as a DBMS, database or database system.

Teaching remark
• In many books or online learning resources, the term “data base” is often incorrectly referred to as “database”. It has come to a stage where people use the two terms interchangeably. In fact, the ASCA and ALCS Curriculum and Assessment Guides also use the terms interchangeably.

Need for Storing Persistent Data

Almost all computer applications require some data to be kept for describing some inherently stable properties or the up-to-date status of certain items or events. Let us think about the information kept by a bank for its savings account holders. For each savings account, the bank must store its unique account number, name(s) of account holder(s), contact address(es) of account holder(s), account balance, etc., to name a few. Those data are considered to be persistent data as they are not changed frequently. However, some data are more persistent than others. For example, an account number should never be changed, whereas there is a slim chance that changes would be required for the name(s) of the account holder(s). Account balance is the most susceptible to change among the listed data, as transactions like money deposit or withdrawal will affect its value. Obviously the correctness of all recorded persistent data is important to the functioning of the associated computer applications. Whether or not a piece of data is persistent varies from application to application.
Age may not be considered a piece of persistent data as it changes every year for most people. However, the age field is definitely persistent if it appears on a death certificate.

Problems of File Systems

Persistent data can be stored in file(s). However, there are potential problems with that.
1. Since files are designed to fit individual application needs, a data element may appear in several files if that piece of data is needed in several applications. For example, a bank customer may open a savings account and a stock account at the same time. For the stock account, the account balance is composed of the quantity of each stock purchased. Obviously at least two different files are needed to keep data for the two types of accounts, but data elements such as name(s) of account holder(s) and contact address(es) of account holder(s) are common. When the customer moves to a new address, both files must be updated. This is caused by the data redundancy problem. Data redundancy can cause a number of problems during data modifications; those problems are referred to as data anomalies (which will be detailed later).
2. A consequence of the data redundancy problem is the integrity problem, or data consistency problem. Data become inconsistent if copies of data are not updated simultaneously.
3. Traditional file systems suffer from a sharing problem and a security problem. If a new report which needs to use some but not all data from two files is required, should one be allowed to access both files? As access control on a file system can only be applied at the file level, allowing someone to read both files implies unnecessary exposure of data. If a new file is created to store all data needed to produce the new report, the data redundancy problem emerges.
4. Structural dependence (also known as program-data dependence) is exhibited in file systems. In order to use a file, a program needs to know the file structure, i.e., details of all data stored in the file.
A change in any file’s structure requires the modification of all programs using that file.

Aims of Database Systems

The aims of database systems are as follows:
• Reduce data redundancy and inconsistency
• Separate user data view from physical file structure (see next section for details)
• Impose data integrity constraints, e.g., for data validation
• Tackle the atomicity problem, i.e., all activities in a transaction are either completely performed or completely undone. For example, if money is transferred from a savings account to a stock account, the savings account will be debited whereas the stock account will be credited with the same amount. The corresponding transaction has to ensure that both data changes are done as one single unit.
• Allow concurrent data access
• Offer secured data access
• Help make data management more efficient and effective
• Allow quick answers to ad hoc queries using some query language
• Provide end users better access to more and better-managed data

Some databases may not be able to achieve all the above aims. Early databases may not support transaction processing or offer secured data access for concurrent users.

Separating User Data View from Physical File Structure

A key advantage of a database is that end-users and application programmers do not have to know how data files are organized and stored in the database. This is referred to as structural independence (or program-data independence). Thus changing the structure of a file does not necessarily require the computer programs that access the file to be modified. Databases achieve structural independence by organizing data through advanced data structures in which the data fields and records are related to each other. Computer programs do not access files directly for data.
Instead, computer programs that need to access data have to direct their requests to the DBMS, which in turn processes the requests against the data base; in other words, all operations on the data base are coordinated by the DBMS. Figure 2 describes the interactions between different parties in a database environment.

Figure 2. Interactions between various parties in a database environment.

Applications of Databases in Society

Almost all computer applications need to use a database to store persistent data. In a library system, at least the following data need to be kept.
• Library user ID number
• Library user name
• Library user contact address
• Maximum number of books that a library user can borrow
• Library user ID number, book’s call number and due date of the loan period for each book which is on loan
• Author name(s), publisher, year of publication, and status (e.g., on-loan, on-hold, on-request, and missing, etc.) of each book

The above information must be kept in order to support basic library operations like book search, borrowing and return, etc.

In a supermarket, inventory information needs to be stored so as to facilitate the inventory, purchasing, marketing and other business functions of the company. Some of the information to be kept is given below.
• Item ID number
• Item name (e.g., ABC dental cream)
• Item category (e.g., oral hygiene)
• Unit price
• Stock level
• Reorder level (below which an order needs to be placed for replenishment)
• Reorder amount (i.e., the number of items to be ordered)

In a credit card system, the following information should be recorded.
• Card number
• Card owner’s name, contact address and phone number
• Credit limit
• Credit amount used
• Card’s expiry date
• Card’s date of issue
• First issued date
• Number of times that the card was reported missing
• Number of times of late payment

Databases do not only support the day-to-day operations of organizations.
Applications can be built to analyze historical data in databases for planning purposes. Banks use various types of customer information such as account balances, salary information, saving patterns, credit card repayment patterns and mortgage repayment patterns to create their customers’ profiles. Customer details like occupation, age and marital status are recorded too. Such information is stored in databases and is analyzed so as to enable the banks to identify potential customers for specific products, e.g., fund investment and insurance. This kind of database application is known as data mining, which analyzes data in databases to look for data trends or anomalies without knowledge of the meaning of the data.

The amount of operational data would be too much for management staff to digest. Besides, the data would be too raw for them to make management decisions. In practice, operational data are typically summarized (and sometimes stored in a data warehouse) before they are presented to the management. All the data mentioned, whether in raw or digested form, are stored in some form of database.

Types of Databases

There are many ways to classify databases and two of them are listed below.

Number of Users

Many databases designed to run on personal computers are expected to be used by one user at a time. We usually refer to them as single-user databases. Earlier versions of Microsoft FoxPro and Access belong to this type. More sophisticated databases like MySQL, Microsoft SQL Server, IBM DB2 and Oracle are called multi-user databases as they have built-in facilities for secured and concurrent data access.

Location

A database may be either centralized or distributed. In a centralized database, all database functions run entirely on a single computer.
A distributed database is composed of a set of partially independent databases running on a group of networked computers that share a common schema (i.e., an overall design of the data base), and coordinate processing of transactions that access non-local data (Silberschatz et al., 1997).

Reference
Silberschatz, A., Korth, H.F., & Sudarshan, S. (1997). Database System Concepts (3rd ed.). McGraw Hill.

Another form of distributed implementation of databases, more commonly known as client-server databases, focuses on the distribution of various database functions over multiple computers. In particular, the database front-end functionality such as input validation is typically done by the client machines (which are usually personal computers) whereas the back-end functionality like transaction handling and data base update is provided by server systems, which are typically either data servers or transaction servers.

Data Models

A data model is a collection of logical constructs used to represent the data structure, data semantics and data relationships found within the database. Database models can be conceptual or implementation oriented.

Conceptual data models are used to describe data at the logical and (user) view levels. They offer no description of implementation issues. Conceptual models are often used as a communication tool between database designers and end-users so as to help the designers understand the data requirements of the end-users correctly. The entity-relationship model is an instance of a conceptual data model.

Another type of data model provides a high-level description of the implementation. Three popular implementation models are the hierarchical, network and relational models. Note that the problem of structural dependence in both hierarchical and network models is resolved in the relational model.
The key advantages of the relational model are as follows:
• Structural independence
• Improved conceptual simplicity as data are structured in simple-to-understand tables
• Easier database design, implementation, management, and use
• Ad hoc query capability with the use of the Structured Query Language (SQL)
• A powerful database management system can be built with the system’s complexity being hidden from the user view

Relational Database Concepts

Introduction

In this section, basic relational database terminology and concepts will be introduced. The definitions and characteristics of entity, relation, attribute, domain and key, etc., are detailed. In particular, the difference between keys and indexes, and three concepts about data integrity, namely entity integrity, referential integrity and domain integrity, are explained.

In order to help explain the above terminology and concepts, a problem scenario about a school library is introduced below:

The library of XYZ School has decided to computerize its services so as to make them more efficient and effective. Since computerization is relatively new to the school, the library aims to provide only basic library functions to the users initially through the implementation of a simple computerized library system. The system is expected to offer a computerized catalogue of all library items, e.g., books and past examination papers, and basic circulation functions such as item borrowing, returning and reserving. Obviously the system needs to keep library user information such as the number of library items that s/he is allowed to borrow, and the dates and call numbers of those library items that s/he has borrowed or requested, etc. Library item details such as call number, author(s), ISBN, year of publication and status (e.g., available, on loan, requested and damaged), etc., are also kept. As a teacher librarian of the school, you are asked to design a suitable database schema to support the mentioned library operations.
Whenever applicable, examples will be provided in relation to the above problem scenario so as to provide a clear context for illustrating the database terminology and concepts.

Entity and Entity Set/Type

An entity is a distinguishable object to be described. It can be any object such as a person, a place, an event or a thing, etc. Entities that share the same properties or attributes are collectively referred to as an entity set (or entity type). Example entity sets that can be found in a school environment are students (person), classrooms (place), examinations (event), and subjects (thing), etc.

Entity sets in the XYZ School library example:
o Suppose Linus and Jeff are students. They are entities (library users) because they share the properties of a student and are distinguishable objects in a school library system.
o Entity sets include library users, who may be teachers or students (person), library items (things), circulation transactions such as a book request (event), and user privilege (things), etc.

Teaching remark (out of syllabus)
• An entity set (type) may be further divided into a supertype and subtypes if required. In the XYZ School library example, both teachers and students are classified as library users. However, it is possible that we need to further divide teachers and students into separate subtypes to meet certain application needs. For example, a student library user needs to be associated with his/her class if the school would like to research the number of books borrowed per student from each class. Such an association also helps the teacher librarian to learn more about the reading habits of various classes of students. Obviously such a class association does not exist for teachers.
Conversely, the library may want to know the number of times that a teacher does not return borrowed items to the library on time and the cumulative number of days overdue (as students are fined for late return of library items but it is not always easy to implement a similar system for teachers). Such a function can help the teacher librarian to identify those colleagues who do not fully respect the library regulations. Storing those pieces of information is obviously not necessary for student library users. The similarities and differences in the application needs of various library users imply a need for a finer classification among them. For example, common attributes of library users such as library user ID, name, address, etc., are kept in the supertype whereas non-common attributes of teacher and student are kept in the corresponding refined entities. The supertype and corresponding subtypes are structured to form a generalization hierarchy.

In a relational database, an entity set is typically represented in terms of one or more relations (a mathematical term for tables), each of which is composed of rows and columns. Each tuple (a mathematical term for a row in a table) in a relation represents an entity of the associated entity set. Each column, which is uniquely named within the table that it is associated with, represents a category of information that corresponds to an attribute. A relational database is typically composed of a number of related tables. Note that the order of the rows and columns within a table is immaterial to the database.

As shown in the table below, the “user privilege” entity set of the XYZ School library example is composed of 6 rows, with each row defining the privilege of a user type for a given material type.

Table 1. The “user privilege” table of XYZ School library. (Labels in the table figure: column = attribute; row = tuple.)

Attributes

Attribute and Domain

Each entity has certain descriptive properties known as attributes (or fields).
Some potential attributes for the student entity are student-name, student-number, and sex, etc.

Attributes in the XYZ School library example:
o student ID, class name (in the “library user” entity set)
o call number, material type (e.g., CD-ROM, book), item name (e.g., book title)

The set of all possible values for an attribute is called its domain. For the student entity set, the domain of the attribute sex should be {female, male} whereas the domain of the attribute age should be any positive integer (although it may make more sense to set an upper bound for the domain).

Attribute domains in the XYZ School library example:
o Domain of “class name”: all valid class names found in XYZ School.
o Domain of “maximum number of library items that a user can borrow”: any positive integer not greater than 10.

Relational database theory does not restrict the data type that an attribute can be associated with. However, some commonly supported data types in relational databases are:
• Number (integer or real number)
• Text (fixed length or variable length)
• Boolean type
• Date and time

Simple vs. Composite Attributes

Attributes that cannot be divided into subparts are known as simple attributes (e.g., age); otherwise they are composite attributes (e.g., address). Whether there is a need to re-structure an attribute into finer attributes depends on the application needs. In the XYZ School library example, the library user name is kept as a single composite attribute; it is not further divided into simpler attributes such as first-name and surname. Such a representation does not cause any problem as the library does not need to process library information by its users’ first-name or surname. To facilitate detailed queries (in the future), many database designers prefer to change a composite attribute into a series of simple attributes.

Null Attributes

It is possible to use a null as the value of an attribute of an entity.
For example, the value of the ISBN field will be set to null for past examination papers, but a valid ISBN is needed for most books.

Derived Attributes

On some occasions, the value of an attribute can be derived from other related attributes or entities. Such an attribute is referred to as a derived attribute. Suppose a database keeps an employee table to store employee information like employee-number, employee-name and number-of-dependents, and a dependent table to record information on each employee’s dependents, one per row. In this case, the number-of-dependents attribute in the employee table is a derived attribute as its value is equal to the number of associated rows in the dependent table. In a good database design, an integrity constraint (which will be detailed later) should be defined between derived attributes and their base attributes in order to ensure that an update of the value of any base attribute will trigger a corresponding update of any associated derived attributes. Otherwise, data inconsistency will occur.

Intuitively, we should eliminate all derived attributes of a database because their values, if required, can be computed in real time. However, the use of derived attributes can improve the efficiency of a database. In the XYZ School library example, it is better to have (derived) attributes to record the number of times that a teacher does not return borrowed items to the library on time and the cumulative number of days overdue, although those pieces of information can be derived from the teacher’s circulation record history. The use of derived attributes in this example can greatly enhance the database efficiency when compared to rescanning all past circulation records of a teacher to compute the required information. In this example, the computational effort for maintaining the integrity of the values of the derived attributes and their base attribute values is small.
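
The employee and dependent tables described above can be sketched with Python’s built-in sqlite3 module. This is only an illustration under stated assumptions: the SQL names (employee, dependent, emp_no, num_dependents), the sample rows and the use of triggers are not taken from these notes, and a trigger is just one possible way of keeping a stored derived attribute consistent with its base rows.

```python
import sqlite3

# In-memory database; all table/column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_no         INTEGER PRIMARY KEY,
    emp_name       TEXT NOT NULL,
    num_dependents INTEGER NOT NULL DEFAULT 0   -- stored derived attribute
);
CREATE TABLE dependent (
    dep_id   INTEGER PRIMARY KEY,
    emp_no   INTEGER NOT NULL REFERENCES employee(emp_no),
    dep_name TEXT NOT NULL
);
-- Triggers keep the derived attribute in step with its base rows.
CREATE TRIGGER dependent_added AFTER INSERT ON dependent
BEGIN
    UPDATE employee SET num_dependents = num_dependents + 1
    WHERE emp_no = NEW.emp_no;
END;
CREATE TRIGGER dependent_removed AFTER DELETE ON dependent
BEGIN
    UPDATE employee SET num_dependents = num_dependents - 1
    WHERE emp_no = OLD.emp_no;
END;
""")

conn.execute("INSERT INTO employee (emp_no, emp_name) VALUES (1, 'Chan'), (2, 'Lee')")
conn.executemany(
    "INSERT INTO dependent (emp_no, dep_name) VALUES (?, ?)",
    [(1, "Amy"), (1, "Ben"), (2, "Cat")],
)
conn.execute("DELETE FROM dependent WHERE dep_name = 'Ben'")

# Stored (derived) values, read directly from the employee table.
stored = conn.execute(
    "SELECT emp_no, num_dependents FROM employee ORDER BY emp_no"
).fetchall()

# The same values computed on demand from the base rows.
computed = conn.execute("""
    SELECT e.emp_no, COUNT(d.dep_id)
    FROM employee AS e LEFT JOIN dependent AS d ON d.emp_no = e.emp_no
    GROUP BY e.emp_no ORDER BY e.emp_no
""").fetchall()

print(stored)
print(computed)
```

Storing num_dependents trades a little extra work on every insert and delete for a cheaper read, which is the efficiency argument made above; dropping the column and running only the COUNT query would be the design without a derived attribute.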
Keys

• A key is a value of one or more selected attributes used to identify an entity in an entity set. The concerned attribute(s) is/are known as the key field(s). A potential key field of the “library user” entity set of the XYZ School library example is the “library user ID”, which is unique for each library user.
• A superkey is a set of one or more attributes that, taken collectively, uniquely identify an entity in an entity set. However, a superkey may contain extraneous attributes. In the “user privilege” table of the XYZ School library example, all of the following combinations of attributes are superkeys:
  o “User type” and “Type of material”
  o “User type”, “Type of material” and “Loan period”
  o “User type”, “Type of material”, “Loan period”, and “Total number of items that can be borrowed”
  o “Description” and “Type of material”
  o “Description”, “Type of material” and “Loan period”
  Once the values of any of the above attribute combinations are given, we can always uniquely identify an entity (row) in an entity set (table). The following attribute combinations are NOT superkeys:
  o “User type” and “Description”
  o “Type of material” and “Loan period”
  because, given the values of any of the above attribute combinations, more than one entity (row) may be identified.

Teaching remarks
• The identification of superkeys for a table must be based on the semantics of the attributes of the table instead of the table content. In the “user privilege” table of the XYZ School library example (see Table 2), it appears that, given the values of “Loan Period” and “Total number of items that can be borrowed”, a unique entity (row) can be identified and thus the two attributes, when combined, can be taken as a superkey. However, this is misleading. Suppose school alumni are allowed to use the library and they are allowed to borrow up to 3 books for a maximum of 14 days.
This obviously makes the combination of “Loan Period” and “Total number of items that can be borrowed” no longer a superkey, as a junior student is also allowed to borrow the same number of books for the same loan period.
• In reality, teachers as well as textbooks often use table contents to explain the concept of key (and normalization, which will be covered later). Teachers must indicate to students their assumption that the table contents give an exhaustive illustration of the table semantics.

• Minimal superkeys are called candidate keys. Removal of any attribute from a candidate key will render the remaining attribute(s) no longer a key. In the “user privilege” table of the XYZ School library example, all of the following combinations of attributes are candidate keys:
  o “User type” and “Type of material”
  o “Description” and “Type of material”
  The above example clearly shows that a table may have more than one candidate key. However, multiple candidate keys in a table might imply the existence of transitive dependency in the table. Transitive dependency is an indicator of poor database design and should be avoided. The notion of transitive dependency will be introduced together with the notion of database normalization.

Teaching remark
• Like superkeys, the identification of candidate keys for a table must NOT be based on the table content, but on the semantics of the attributes of the table.

• A primary key is a candidate key chosen by the database designer as the major means of identifying an entity (row) within an entity set (table). No part of a primary key can be null. Unlike candidate keys, a table can have only one primary key.

Teaching remark
• Some textbooks on the market give an imprecise definition of candidate key and primary key. In one textbook, a primary key is defined as a field or combination of fields that uniquely and minimally identifies a particular record in a table.
According to this definition, a table could have more than one primary key, which is incorrect. The definition given in the book in fact describes a candidate key rather than a primary key.

• Any attribute which is not a part of any candidate key is known as a non-key attribute. In the XYZ School library example, the loan-period is a non-key attribute.

• A foreign key is an attribute (or set of attributes) that is not a superkey in its own table but is a candidate key in another table; its value may also be null. Suppose we have two tables, namely student-subject and subject, which store the subjects that a student has enrolled in and the subject descriptions respectively. The student-subject table records student-ID (a part of the primary key) and subject-ID (another part of the primary key), whereas the subject table stores subject-ID (primary key) and subject-descriptor. The subject-ID in the student-subject table is a foreign key to the subject table.

student-subject
student-ID    subject-ID
200425642     CS1132
200425654     CS1132
200425854     CS1145
(subject-ID is a foreign key to the subject table)

subject
subject-ID    subject-descriptor
CS1132        Databases
CS1145        Programming

Teaching remarks
• It is wrong to say simply that the subject-ID in the student-subject table is a foreign key. The notion of foreign key is defined on two tables, so the table that the key refers to must also be stated.
• Many textbooks do not explicitly state that the value of a foreign key can be null.

Indexes

• One or more indexes can be defined for a table for efficient data retrieval. Unlike a primary key, an index does not have to be unique. Whether or not an index is required for a table depends on the application needs. Inclusion or omission of an index in a table definition may affect the efficiency, but not the functionality, of any data retrieval.

• An index is an implementation structure such that, given one or more attribute values, the relevant rows can be efficiently retrieved. It is typically implemented through the use of sophisticated data structures like ISAM and B+ trees.
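The two points above can be tried out with a minimal sketch. It assumes SQLite driven from Python's standard sqlite3 module (not a system prescribed by this package), reuses the student-subject and subject tables from the foreign key example, and introduces the hypothetical names student_subject and idx_student_subject_subject purely for illustration. It runs the same query before and after a non-unique index is created and compares the results.

```python
import sqlite3

# In-memory database holding the two example tables (names adapted to SQL identifiers).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (
        subject_id         TEXT PRIMARY KEY,
        subject_descriptor TEXT NOT NULL
    );
    CREATE TABLE student_subject (
        student_id TEXT NOT NULL,
        subject_id TEXT NOT NULL REFERENCES subject(subject_id),
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO subject VALUES ('CS1132', 'Databases'), ('CS1145', 'Programming');
    INSERT INTO student_subject VALUES
        ('200425642', 'CS1132'),
        ('200425654', 'CS1132'),
        ('200425854', 'CS1145');
""")

query = ("SELECT student_id FROM student_subject "
         "WHERE subject_id = ? ORDER BY student_id")

# Result of the query without any extra index.
before = conn.execute(query, ("CS1132",)).fetchall()

# A non-unique index on subject_id: several rows share the value 'CS1132'.
conn.execute(
    "CREATE INDEX idx_student_subject_subject ON student_subject(subject_id)"
)

# Result of the same query once the index exists.
after = conn.execute(query, ("CS1132",)).fetchall()

# Map each index name on the table to its 'unique' flag (1 = unique, 0 = not unique).
index_flags = {row[1]: row[2]
               for row in conn.execute("PRAGMA index_list('student_subject')")}

print(before == after)                               # same rows either way
print(index_flags["idx_student_subject_subject"])    # 0, i.e. not a unique index
```

The index changes only how the rows are located, not which rows are returned, which is why it can be added or dropped without touching any query.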
Common mistakes
• Some people use the terms “index” and “secondary key” interchangeably, but this should be avoided. Keys are logical concepts whereas indexes are implementation concepts. In fact, there is no notion of “secondary key” or “index” in relational database theory.

Teaching remarks
• Most relational databases create an index for the primary key of each table for efficient data retrieval.
• Although indexing can facilitate efficient data retrieval, it should not be overused. Index creation and maintenance may involve a lot of computation that takes time to finish.

Data Integrity

As mentioned before, data integrity is concerned with the validity, accuracy and correctness of data. In a relational database, three types of data integrity are of particular concern. They are entity integrity, domain integrity and referential integrity.

Entity Integrity

Entity integrity is a property that ensures that
1. no rows are duplicated, and
2. no attributes that make up the primary key have a null value.
Note that condition 1 must be enforced or a primary key will not be able to uniquely identify an entity (a row) in an entity set (a table). As an example, the “user privilege” table meets the criteria of entity integrity.

Domain Integrity

Domain integrity is a property that ensures that whenever a new data item is entered into the database, it must be within the domain of the corresponding attribute. For instance, enforcing a domain constraint can stop one from entering a value other than “female” or “male” into the sex attribute.

Referential Integrity

Referential integrity is concerned with the data consistency between coupled tables. In particular, we may want to ensure that an attribute value that appears in one table also appears for a certain set of attributes in another table. For example, the XYZ School library database may keep one table to store library user personal information like user-ID, user-name, and contact-address, etc.
and another table to keep information about loaned books like user-ID, book-call-number, and due date, etc. The user-ID is the primary key of the library-user-details table, whereas the concatenation of user-ID and book-call-number forms the primary key of the loaned-book table. The user-ID attribute of the loaned-book table is a foreign key to the library-user-details table (as user-ID is not a superkey in the loaned-book table but is a candidate key in the library-user-details table). It is important to ensure that any value that appears in the user-ID attribute of the loaned-book table also appears in the user-ID attribute of the library-user-details table.

In relational databases, referential integrity is typically enforced by defining a referential constraint between a primary key and a foreign key. For referential integrity to hold, any attribute(s) in a table declared as a foreign key can contain only values from the primary key attribute(s) of the table that the foreign key refers to. Thus, deleting a row that contains a value referred to by a foreign key in another table would break referential integrity. In the XYZ School library example, this is equivalent to removing a library user from the library-user-details table without requiring the user to return all the books that s/he has borrowed.

It is important to note that a referential constraint may not enable us to avoid errors at the database design level. The following example illustrates such a problem. The table on the left stores ID numbers and names of all library users whereas the table on the right keeps all loaned books. ID and call number are the primary keys of the library user and loan event tables respectively. User ID in the loan event table is a foreign key to the library user table.
According to the definition of foreign key, it is acceptable to assign a null value to user ID, as found in the third record of the loan event table. From a user perspective it obviously does not make sense to allow a book to be loaned to an unknown person, but the referential constraint between the two tables does not stop the assignment of null to user ID. To avoid the problem, we need to make user ID in the loan event table a mandatory attribute.

Teaching remark
• SQL92 and SQL99 provide standard features for defining various data integrity constraints, but many commercial database management systems such as Microsoft Access tend to provide non-standard, customized features to serve the purpose. Such details are not within the curriculum and will not be discussed further here.

Introduction to Database Design Methodology

Three-Level Database Architecture

A database can be viewed at three levels of abstraction, namely the conceptual level, the physical (or internal) level and the view (or external) level. The key concerns of the three levels are as follows:

• View level or external level is concerned with how individual users see the data. Note that users may range from application programmers to casual users who interact with the database through ad-hoc query facilities. For example, a library user may be interested in the library collection but not the library user statistics. The librarian would not be expected to have any interest in information about an individual library user’s reading habits.

• Conceptual level is concerned with a community user view of the entire information content of the database that is of interest to the organization. At this level, no physical considerations are taken into account. A change in the internal view to improve performance may not involve any change in the conceptual view of the database.

• Physical level or internal level is concerned with how data is actually stored. Efficiency is the prime concern at this level.
The following aspects, among others, are considered at this level:
  1. Data structures chosen, e.g. B-trees, hashing, etc.
  2. Access paths, e.g. specification of primary and secondary keys, indexes and pointers and sequencing.
  3. Miscellaneous issues, e.g. data encryption and compression techniques.

Figure 1 outlines the three-level database architecture.

Figure 1. The three-level database architecture.

The key advantage of the three-level database architecture is that it separates (1) the conceptual view from the physical view, and (2) the external views from the conceptual view. The former enables a database designer to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. The latter enables a database designer to change the conceptual view without affecting the external views in most cases. This separation is sometimes called logical data independence.

Logical Data Modeling

In order to identify the data needs of an organization, logical data modeling is usually applied. Logical data modeling explores the concepts of a problem domain and the relationships between them. In databases, logical data modeling typically takes the form of entity relationship modeling. The basic idea is to identify data objects called (logical) entity sets, which are described by their (data) attributes, together with the relationships among them, so that all data requirements of the organization concerned are met. The result is typically expressed in a type of diagram called an entity relationship diagram (ERD). Logical data modeling, like entity relationship modeling, may be performed for the scope of a single project or for the entire enterprise.

Teaching remarks
• Different variants of ERD come with different notations, and it is important to tell students to describe any potentially ambiguous ERD notations when answering a question.
• For a comprehensive description of data modeling, the Information Technology Services of the University of Texas has produced an online practical guide to data modeling which is well worth reading. Note that the ERD notations used there are not always consistent with the ERD notations adopted in this package.

Terminology and Notation of Entity Relationship Modeling

Some of the terminology of entity relationship modeling that readers need to be familiar with, e.g., entity and attribute, has already been covered in the section entitled “Basic Terminology”. The description below offers some additional information about those terms as well as details of terms that have not been given previously. The corresponding ERD notation used in this package is also shown.

Entity

• An entity is a representation of any composite information of a real object (e.g., a bank customer) or an abstract object (e.g., a money withdrawal transaction of a bank).
  o Entities encapsulate data only, i.e., an entity is described only by its associated attributes. How its attributes will be manipulated is outside the scope of the entity. For example, an entity about a money withdrawal transaction of a bank is concerned with what amount of money is taken out from which account on a particular date. How the recorded data may be used for various purposes is immaterial from the logical data modeling perspective.
• Entities may be related to one another, e.g., a bank customer may perform a number of money withdrawal transactions over a given period.
• The ERD notation for entities is a rectangle. A STUDENT entity is represented below.

Figure 2. Notation for entities (rectangle).
Attribute

• Attributes define the properties of an entity so as to
  o name an instance of an entity
  o describe the instance
  o make reference to another instance
  Example: A school subject is an entity which is characterized by its subject code or name; a subject also has other attributes such as a subject description; and a subject may not be taken unless a student has completed its prerequisite subjects, which are entities themselves.
• The ERD notation for attributes is an oval. An attribute is linked to the associated entity by a single line or a double line, depending on whether or not the attribute is a multi-valued attribute. Suppose the previously mentioned STUDENT entity has two attributes only: name and address. The corresponding ERD representation is given below.

Figure 3. Notation for attributes (oval connected to a rectangle with a line).

The above example assumes that every student has exactly one name and one address. For a student that has more than one address, the corresponding ERD representation is as follows:

Figure 4. Notation for multi-valued attributes (oval connected to a rectangle with double lines).

If an attribute is the (primary) key or a component of the primary key of an entity, the attribute name may be underlined. Assuming each student has a unique name, the corresponding ERD representation is changed as below.

Figure 5. Notation for key attributes (attribute name underlined).

Teaching remarks
• Apparently the A/AS Level curricula do not require students to be familiar with how a multi-valued attribute is drawn in an ERD.
• In reality, it would be tedious to show the attributes of entities in an ERD due to space limitations. Besides, the attributes associated with a selected entity are usually clear from the context. Thus attributes of entities are typically omitted in an ERD.

Relationship

Relationships are links between entities that define how the entities are associated with one another.
There may be more than one relationship between two (or more) entities, e.g., customers open accounts and customers close accounts, in which open and close are relationships between the customer and account entity sets. Note that an entity may have a reflexive relationship with itself, e.g., the work supervisor of an employee of a company is also an employee of the company.

Although a relationship can be classified by its degree, cardinality, connectivity, direction, type, and existence, etc., not all modeling methodologies use all these classifications. This package focuses only on degree, cardinality, and existence.

The ERD notation for a relationship is a diamond labelled with the name of the relationship. The following ERD says that a teacher marks assignments.

Figure 6. Notation for relationships (diamond shape connected to associated entities).

Another occasionally used notation omits the diamond and simply puts the relationship name as a label on the line that represents the relationship. The previous example is then depicted as follows:

Figure 7. An alternative notation for relationships (line directly connected to associated entities).

Although in most cases relationships are not associated with any attributes, attributes may be required to describe some relationships. Suppose we have a relationship called borrow which relates Student and Book. Obviously we need to keep the due date for return for each book on loan. This information can only be attached to the borrow relationship, as it is not an attribute of Student or Book. In some literature, such a relationship is referred to as an associative entity.

Teaching remark
• In the Curriculum and Assessment Guide (C&A guide), no associative entity is mentioned. However, associating attributes with a relationship is very common in practice and the concept should be covered.
Although the literature usually introduces a separate notation for associative entities, the C&A guide does not provide one. Having said that, we may simply associate attribute(s) with a relationship to capture the essence of an associative entity.

So far, none of the above examples offers any information to answer the following questions.
• Would an assignment be marked by more than one teacher? What are the minimum and maximum numbers of teachers to mark an assignment?
• Would a teacher mark more than one assignment? What are the minimum and maximum numbers of assignments that a teacher needs to mark?
• Would there be any teacher who does not need to mark any assignment? Is it acceptable to have any unmarked assignment?
In order to answer the above questions, we need to know additional properties of the relationship.

Degree

The degree of a relationship is the number of entity sets associated with the relationship. Most relationships are binary relationships, where the degree is two, but ternary relationships that involve three entity sets can be found occasionally, e.g., teachers teach subjects to students. An n-ary relationship is a relationship with degree n. Many modeling approaches deal with binary relationships only; ternary or n-ary relationships are typically decomposed into two or more binary relationships. Thus this e-learning package focuses its discussion on binary relationships only.

Cardinality and Existence (or Modality)

Cardinality defines the actual number of entities that must be included in a relationship. Cardinality information can be divided into two types: minimum cardinality and maximum cardinality. Data modeling is concerned with whether or not the minimum cardinality is zero and whether or not the maximum cardinality is greater than one, i.e., one (1) or many (n or m), as such information will affect how a data model is translated into a data schema, i.e., database design.
Existence or modality denotes whether the existence of an entity instance is dependent upon the existence of another related entity instance. The existence of an entity in a relationship is defined as mandatory if every instance of the entity is involved in that relationship. Otherwise, the existence of the entity in the relationship is defined as optional. It is clear that the minimum cardinality of an entity that has an optional existence must be zero. Conversely, a mandatory existence of an entity in a relationship implies that the minimum cardinality of the entity in the relationship is a positive integer.

The following examples are devised to illustrate the above concepts.

Example 1 - Man is-married-to Woman

The minimum cardinality and maximum cardinality of the relationship are 0 and 1 respectively, as a man (or woman) may be married to no woman (or man) and the maximum number of women (or men) that a man (or woman) can be married to is one. Both entities (Man and Woman) optionally participate in the marry relationship. Thus the existence of both entities in the relationship is optional. In an ERD, a small circle is added on the line that joins an entity and a related relationship if the existence of the entity in the relationship is optional. An ERD that represents the connectivity and existence information of the marry relationship is given below.

Figure 8. The “Man is-married-to Woman” scenario.

Example 2 - Mother give-birth-to Child

As a mother must have given birth to at least one child, the corresponding ERD representation is as follows:

Figure 9. The “Mother give-birth-to Child” scenario.

The minimum cardinality and maximum cardinality of the relationship are 1 and n (where n is a positive number) respectively, as a mother must have one or more children. Regarding the existence of the entities in the relationship, both Mother and Child must be involved in the relationship, as every mother must have at least one child whereas every child must have a mother.
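As a rough illustration of how existence information may later surface in a table design, the sketch below models Example 2 in SQLite through Python's standard sqlite3 module. The table and column names (mother, child, mother_id) and the sample rows are hypothetical. It assumes that the mandatory existence of Child is expressed by a NOT NULL foreign key; the requirement that every mother has at least one child is not something a column constraint can state, so it is only checked with a query here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE mother (
        mother_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Every child must have a mother: the foreign key is declared NOT NULL.
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        mother_id INTEGER NOT NULL REFERENCES mother(mother_id)
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO mother VALUES (1, 'Mother A')")
conn.execute("INSERT INTO mother VALUES (2, 'Mother B')")
conn.execute("INSERT INTO child VALUES (1, 'Child X', 1)")

# A child without a mother violates the mandatory existence and is rejected.
try:
    conn.execute("INSERT INTO child VALUES (2, 'Child Y', NULL)")
    child_without_mother_rejected = False
except sqlite3.IntegrityError:
    child_without_mother_rejected = True

# Mothers with no child break the minimum cardinality of 1 on the other side;
# the table definitions above do not prevent this, so it is detected by a query.
mothers_without_child = conn.execute("""
    SELECT m.name
    FROM mother AS m
    LEFT JOIN child AS c ON c.mother_id = m.mother_id
    WHERE c.child_id IS NULL
""").fetchall()

print(child_without_mother_rejected)  # True
print(mothers_without_child)          # [('Mother B',)]
```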
Example 3 - Teacher teach Student

Assuming a normal school setting in which all teachers and students are involved in the teach relationship, the relationship is of the many-to-many (m:n) type, as many teachers would teach a student whereas a teacher would teach many students. According to the assumption, the minimum cardinality of the relationship is one. The maximum cardinality of the relationship is many. The corresponding ERD representation is as follows:

Figure 10. A “Teacher teach Student” scenario.

If there exists some teacher who is taking study leave and thus does not teach any student, the above ERD will become:

Figure 11. An alternative “Teacher teach Student” scenario.

The last example shows that an ERD can be correctly constructed only if all data requirements are collected. Any missing requirement may result in an inaccurate data model, which in turn would mislead a database designer into creating an incorrect data schema. Thus, it is important for a database designer to confirm with the end-users that all data requirements are correctly captured, typically with the use of an ERD as a communication tool. To enable effective communication between the two parties, the database designer must teach the end-users how to read an ERD.

Developing Entity Relationship Model

Steps in Developing Data Model

There is no standard way to build a data model. Typically, entities and relationships are modeled first, followed by key attributes, then non-key attributes. As an example, the steps described by the Information Technology Services of the University of Texas in its online practical guide to data modeling are listed below.
1. Identification of data objects and relationships
2. Drafting the initial ER diagram with entities and relationships
3. Refining the ER diagram
4. Adding key attributes to the diagram
5. Adding non-key attributes
6. Diagramming generalization hierarchies
7. Validating the model through normalization
8. Adding business and integrity rules to the model

Although the steps are presented in a linear manner, the process of database design is usually iterative, i.e., some steps may need to be repeated before a final design results. This note will only cover the first three steps. Steps 4-5 are straightforward to follow, whereas Steps 6-8 require a more elaborate discussion which is outside the scope of the current curricula of the A/AS level computer subjects.

In order to explain how a data model can be developed in accordance with the suggested steps, a problem scenario about a bookstore is given below, and illustrations based on the example will be given as far as possible.

ABC Bookstore is planning to automate its inventory, enquiry, sales and purchasing functions by introducing a database management system. The inventory system will keep track of the stock level of each book title. The sales system will keep track of the details of each sales order (which is supposed to be of cash sales type only). A sales order may involve multiple titles of any given quantities. When the inventory of a title drops below a re-order level, a pre-determined re-order quantity for that title must be ordered from the supplier of that book title. Each book title is assumed to be supplied by one publisher only and a publisher may supply multiple book titles.

At the end of each day, the purchasing system will be run to compile a number of purchase orders detailing the book titles and quantities needed from each publisher. Note that all book titles to be re-ordered from the same publisher must be grouped into a single order. Sales details will be removed from the sales system 6 months after the sale. Details of purchase orders will be removed from the purchasing system 6 months after the purchase orders are fulfilled. A purchase order is fulfilled when all the items in the order are delivered. For simplicity, we assume no partially fulfilled orders.
Concerning the enquiry function, the database should support enquiries based on author name and book titles.

Teaching remarks
• Developing an ERD from a problem description is not easy and requires a lot of expertise. Many learners find the skill difficult to learn because, without an expert’s advice or comments, they do not know whether the ERD that they have developed is correct or not. Thus teachers must be prepared to give a lot of feedback to students when teaching the topic.
• To help students learn the skill, give them very simple problems (that can be described in no more than three sentences) to start with.

Identification of data objects (entities) and relationships

Developing an ERD typically begins with a general description of the organization’s operations and procedures obtained during the requirements analysis. The purpose is
• to classify data objects as entities or attributes
• to identify relationships between entities
• to name and define identified entities, attributes, and relationships

While it is easy to define the basic constructs of the ER model, it is not easy to distinguish their roles in building the data model. Should a data object be modeled as an entity or an attribute? In the ABC Bookstore example, a book title apparently has attributes like author(s), ISBN, publisher, and year of publication, etc. It is also possible to model author as a separate entity. The correct answer usually depends upon the requirements of the database. Generally, the following guidelines are adopted.
• Entities contain descriptive information and they represent many things which share properties. It is unlikely that an entity set/type would be associated with no descriptive information or have one instance only.
• Attributes identify (i.e., act as an identifier), describe entities, or make reference to other entity instances.
• Relationships are associations between entities.
In order to identify all potential entities and attributes, all nouns (or noun phrases) in the problem description are singled out. Both entities and attributes tend to be associated with those descriptive noun phrases. If there is no descriptive information associated with a noun phrase, it is unlikely to be an entity.

Nouns/noun phrases (in the ABC Bookstore example)
ABC Bookstore, inventory, enquiry, sales, purchasing functions, database management system, inventory system, stock level, book title, sales system, details, sales order, cash sales type, re-order level, re-order quantity, supplier, publisher, day, purchasing system, purchase orders, quantities, fulfilled orders, enquiry function, author name, 6 months

As we will see soon, some of the above nouns/noun phrases are in fact irrelevant, whereas some additional data that do not appear in the problem description need to be added to the data model. Several guidelines can help learners identify candidates for entities and attributes.
• It is unlikely that an entity set/type would be associated with no descriptive information or have one instance only. For example, there is only one instance of ABC Bookstore. It is thus unlikely to be an entity set/type. It is not an attribute either. In fact the bookshop offers a context for the problem scenario, and all entities and relationships are under its umbrella. Another example is “database management system”.
• Some general terms like “system” can usually be safely removed, while some other general terms like “details” may need to be elaborated.
• A problem description may not be complete. Some data that need to be modeled may be omitted. It is important for learners to detect such omissions and put the omitted data objects back into the data model. For example, the dates of the sales and purchase orders have never been mentioned explicitly in the problem description, but it is clear that they must be kept in the database.
Another omission is the publisher’s details, such as contact information.
• Some descriptions may be related to the processing aspect instead of the data aspect of the application, and they can be safely skipped when developing a data model. For example, the second paragraph of the problem description gives details of the processing requirements, i.e., how data should be processed to give the results that users want. Basically what it says is that programs need to be run (1) to support the enquiry function; (2) to produce purchase orders; and (3) to remove old purchase and sales order details from the database. The three functions rely mostly on data already stored in the database and require only a few new data items to support them, e.g., the purchase order fulfillment date.

In reality, database designers often approach end-users to clarify data requirements when developing a database.

The entities identified from the problem description are
• Book
• Publisher
• Sales order
• Purchase order

Their attributes are
• Book: ISBN (unique for each book), book title, author(s), unit price, stock level, re-order level, re-order quantity.
• Publisher: publisher name (unique for each publisher), address, phone.
• Sales order: sales order number, sales order date, sales order amount, (for each book sold) ISBN, unit price, quantity.
• Purchase order: purchase order number, purchase order date, purchase order amount, order fulfillment date, (for each book ordered) ISBN, quantity.

It is possible that different people may come up with a slightly different set of entities and attributes even if they all work on the same problem. In the ABC Bookstore example, one may decide to store the stock level, re-order level, and re-order quantity of each book title as a separate entity. Such a proposal is also acceptable and will result in a slightly different ERD at the end. However, the data schema derived from both ERDs will be the same, as we will demonstrate later.
Teaching remarks
• Try not to judge too soon the correctness of a list of entities (and perhaps attributes too) that students derive from a problem description, as it is difficult to know whether or not their answer is correct without examining the whole ERD.
• It is a good idea to identify potential entities and attributes before proceeding to the identification of relationships. Relationships link entities, and thus we can focus on finding verbs/verb phrases that link the potential entities.

Many printed and online resources suggest identifying potential relationships by picking out verbs (or verb phrases) from the problem description. However, such a method does not work well in many cases. For example, some verbs or verb phrases that we have identified from the problem description are as follows:

Verbs/verb phrases (in the ABC Bookstore example)
… planning to automate its …
… introducing a database management system …
… keep track of the details of …
… is supposed to be …
… may involve multiple titles …
… drops below a re-order level …
… must be ordered …
… is assumed to be …
… supply multiple book titles …
… run to compile …
… grouped into …
… will be removed from …
… are fulfilled …
… are delivered …

It is not easy to see how they can hint at the identification of valid relationships. We propose the following steps to identify relationships; they are found to be particularly useful in dealing with small problems.
1. Identify all potential entities first.
2. Explore any possible relationship between each pair of the entities by cross-referencing them with the problem description. (Only binary relationships are considered.)
3. Read the problem description and see whether the identified entities and relationships can capture all the user requirements described in the problem description. If not, go back to Step 1.

Earlier on, we identified four entities: Book, Publisher, Sales order, and Purchase order.
The potential relationships among them are as follows.
• is-included-in: Book is-included-in Sales order
• is-published-by: Book is-published-by Publisher
• is-referred-in: Publisher is-referred-in Purchase order
• is-specified-in: Book is-specified-in Purchase order
No obvious relationship can be identified between Sales order and Purchase order, or between Publisher and Sales order.

Drafting the initial ERD with entities and relationships

The initial ERD aims to provide a pictorial representation of the major entities and the relationships between them. The cardinality of each relationship is required to be shown. The initial ERD for the ABC Bookstore example could be as follows:

Figure 12. An initial ERD for ABC Bookstore.

Figure 13 gives the initial ERD if the inventory information of a book title is modeled as a separate entity.

Figure 13. An alternative initial ERD for ABC Bookstore.

No attributes are shown in the ERDs above for simplicity. In practice, details of entities are shown in a separate document called the data object description. The document typically contains the name and purpose of each entity, the name and data type of each attribute of every entity, as well as attribute characteristics such as whether a value is unique and/or mandatory, etc.

Refining the ER diagram

Check whether the initial ERD meets all the user requirements specified in the problem description. If not, identify the inadequacy, propose new entities, attributes and/or relationships, and redraw the ERD. For example, one may leave the order fulfillment date out of the Purchase order entity in the initial ERD, but such an omission can be identified when checking whether the initial ERD is able to meet the user requirements specified in the problem description. The ERD given in Figure 12 (or the one in Figure 13) appears to meet all user requirements and thus will not be refined further.
Converting ERD to Database Tables
An ERD is the result of data analysis, and it is used in the data design process to help generate the data schema. A basic three-rule conversion process can be applied to translate an ERD into a data schema that meets the criteria of the third normal form (which will be detailed later). We refer to this as the basic conversion process.

Basic Conversion Process
The three rules in the process are as follows:
1. For a 1:1 cardinality relationship, all the attributes of the related entities are grouped into a single table.
2. For a 1:n cardinality relationship, model each of the related entities in a separate table and post the primary key of the “one” side entity as a foreign key attribute to the table that represents the “many” side entity.
3. For an m:n cardinality relationship, model each of the related entities in a separate table, create a new table (referred to as the intersection table) and post the primary key of each entity type as an attribute in the new table. If the relationship has its own attributes, those attributes are stored in the intersection table too. The primary key of the intersection table is a composite key that includes the primary key of each entity type concerned.

Example 1 – 1:1 Relationship
In the ABC Bookstore example, if an Inventory entity is introduced to represent the inventory information of a book title (Book), we will have the following relationship.

Figure 14. The Book is-associated-with Inventory relationship.

The relationship indicates that each book title is associated with exactly one piece of inventory information and vice versa. Since the relationship is of 1:1 type, all attributes of the entities will be stored in the same table according to the first rule of the basic conversion process. As a result, the attributes to be stored in the resultant table will be exactly the same as those of the table corresponding to the Book entity in the original ERD.
They are ISBN (unique for each book), book title, author(s), unit price, stock level, re-order level, and re-order quantity. This explains why different ERDs may lead to the same data schema.

For ease of reference, the attributes of a table are shown in the following notation.
TableName(key-attribute1, …, key-attributeN, other-attribute1, other-attribute2, …)

The attributes of the Book table are given below.
Book(ISBN, book_title, author, unit_price, stock_level, re-order_level, re-order_quantity)

Note that all author names of a book title are assumed to be stored in the author field. Besides, more attribute(s) will be added to the above Book table when we deal with the relationship between the Book and Publisher entities.

Example 2 – 1:n Relationship
In the ABC Bookstore example, we have the following relationship that links the Publisher and Purchase order entities.

Figure 15. The Publisher is-referred-in Purchase order relationship.

The relationship indicates that a publisher may be associated with any number of purchase orders (zero to many) whereas each purchase order is associated with exactly one publisher (as each purchase order will only be placed with one publisher). According to the second rule of the basic conversion process, the primary key (or identifier) of the Publisher entity must be posted to the table that represents the Purchase order entity. The resultant tables for representing the relationship will be as follows:
Publisher(publisher_name, address, phone)
Purchase_order(purchase_order_number, purchase_order_date, purchase_order_amount, order_fulfillment_date)

By the second rule, publisher_name also belongs in Purchase_order as a foreign key referencing Publisher, although the listing above shows only the attributes of the Purchase order entity itself.

Note that the ISBN and quantity of each book title specified in a purchase order are excluded from the Purchase_order table, as there exists an m:n relationship between the purchase order and book title entities. Such attributes need to be housed in a separate table, as illustrated in the next example.
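As an illustration of the second rule, the following sketch creates the two tables of Example 2 in an in-memory SQLite database through Python's sqlite3 module. The publisher_name column in Purchase_order is the posted foreign key that rule 2 calls for; the column data types and all sample rows are assumptions made only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE Publisher (
    publisher_name TEXT PRIMARY KEY,
    address        TEXT,
    phone          TEXT
);
CREATE TABLE Purchase_order (
    purchase_order_number  INTEGER PRIMARY KEY,
    purchase_order_date    TEXT,
    purchase_order_amount  REAL,
    order_fulfillment_date TEXT,
    -- Rule 2: key of the "one" side posted to the "many" side (assumed column).
    publisher_name TEXT NOT NULL REFERENCES Publisher(publisher_name)
);
""")

# Made-up sample data: one publisher with two purchase orders.
conn.execute("INSERT INTO Publisher VALUES ('Alpha Press', '1 Example Road', '0000 0000')")
conn.executemany(
    "INSERT INTO Purchase_order VALUES (?, ?, ?, ?, ?)",
    [(1, "2024-01-05", 1200.0, None, "Alpha Press"),
     (2, "2024-02-10", 800.0, None, "Alpha Press")],
)

# An order that names a publisher missing from Publisher is rejected.
try:
    conn.execute(
        "INSERT INTO Purchase_order VALUES (3, '2024-03-01', 50.0, NULL, 'Unknown House')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

orders_for_alpha = conn.execute(
    "SELECT COUNT(*) FROM Purchase_order WHERE publisher_name = 'Alpha Press'"
).fetchone()[0]
print(rejected, orders_for_alpha)
```

One publisher row is linked to many purchase-order rows, while each purchase order names exactly one publisher, which matches the cardinality described for Figure 15.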
Example 3 – m:n Relationship
In the ABC Bookstore example, we have the following relationship that links the Book and Purchase order entities.

Figure 16. The Book is-specified-in Purchase order relationship.

The relationship indicates that a book title may be associated with any number of purchase orders (zero to many) whereas each purchase order is associated with at least one book title. According to the third rule of the basic conversion process, the primary keys of both the Book and Purchase order entities must be posted to a new table, i.e. the intersection table, to link the tables that represent the entities concerned. The resultant tables for representing the relationship will be as follows:
Book(ISBN, book_title, author, unit_price, stock_level, re-order_level, re-order_quantity)
Purchase_order(purchase_order_number, purchase_order_date, purchase_order_amount, order_fulfillment_date)
Book_in_purchase_order(purchase_order_number, ISBN, ordered_quantity)

Example 4 – Data schema for the ABC Bookstore Example
After applying the three rules specified in the basic conversion process, we obtain the data schema for the ABC Bookstore example as follows:
Book(ISBN, book_title, author, unit_price, stock_level, re-order_level, re-order_quantity, publisher_name)
Publisher(publisher_name, address, phone)
Purchase_order(purchase_order_number, purchase_order_date, purchase_order_amount, order_fulfillment_date)
Book_in_purchase_order(purchase_order_number, ISBN, ordered_quantity)
Sales_order(sales_order_number, sales_order_date, sales_order_amount)
Book_in_sales_order(sales_order_number, ISBN, quantity_sold, unit_price)

Note that a new field, publisher_name, has been added to the Book table after considering the relationship between the Book and Publisher entities. By the same rule, the Publisher is-referred-in Purchase order relationship calls for publisher_name as a foreign key in Purchase_order as well, although the listing above does not show it. This illustrates that the definition of a table is not finalized until all relationships connected to the entity concerned have been considered.
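The third rule can be tried out with a small sketch that builds the intersection table of Example 3 in an in-memory SQLite database. Only a few columns of Book and Purchase_order are kept to stay short, and the ISBN values, titles and quantities are invented placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Book (
    ISBN       TEXT PRIMARY KEY,
    book_title TEXT,
    unit_price REAL
);
CREATE TABLE Purchase_order (
    purchase_order_number INTEGER PRIMARY KEY,
    purchase_order_date   TEXT
);
-- Intersection table: composite primary key built from both parents' keys,
-- plus the relationship's own attribute (ordered_quantity).
CREATE TABLE Book_in_purchase_order (
    purchase_order_number INTEGER REFERENCES Purchase_order(purchase_order_number),
    ISBN                  TEXT    REFERENCES Book(ISBN),
    ordered_quantity      INTEGER,
    PRIMARY KEY (purchase_order_number, ISBN)
);
INSERT INTO Book VALUES ('ISBN-A', 'Title A', 100.0), ('ISBN-B', 'Title B', 80.0);
INSERT INTO Purchase_order VALUES (1, '2024-01-05'), (2, '2024-02-10');
INSERT INTO Book_in_purchase_order VALUES
    (1, 'ISBN-A', 10), (1, 'ISBN-B', 5), (2, 'ISBN-A', 20);
""")

# One purchase order can list many book titles ...
titles_in_order_1 = [row[0] for row in conn.execute("""
    SELECT b.book_title
    FROM Book_in_purchase_order AS l
    JOIN Book AS b ON b.ISBN = l.ISBN
    WHERE l.purchase_order_number = 1
    ORDER BY b.book_title
""")]

# ... and one book title can appear in many purchase orders.
orders_for_book_a = [row[0] for row in conn.execute("""
    SELECT purchase_order_number
    FROM Book_in_purchase_order
    WHERE ISBN = 'ISBN-A'
    ORDER BY purchase_order_number
""")]

# The composite key stops the same title being listed twice in one order.
try:
    conn.execute("INSERT INTO Book_in_purchase_order VALUES (1, 'ISBN-A', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(titles_in_order_1, orders_for_book_a, duplicate_rejected)
```

The m:n relationship is thus carried by two 1:n links, from Book and from Purchase_order to the intersection table.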
Drawback of Basic Conversion Process
The basic conversion process does not guarantee that null attribute values are minimized, and problems may occur for entities with optional occurrences. Suppose a school has a number of lockers in different buildings and each student is entitled to one locker on request. Due to the uneven demand for lockers in different buildings, some lockers are unused whereas some students are assigned no locker. The relationship is given below.

Figure 17. The Student is-assigned-to Locker relationship to illustrate the drawback of the basic conversion process.

Since the relationship is of 1:1 type, we may put all attributes of the two entities together into one single table. Assume the attributes of the Student and Locker entities are:
• Student – student_ID, student_name, programme_enrolled.
• Locker – locker_ID, building, floor.
Two possible table structures can be developed as below.
Student(student_ID, student_name, programme_enrolled, locker_ID, building, floor)
Locker(locker_ID, building, floor, student_ID, student_name, programme_enrolled)

Both of the above table structures are problematic. In the first, lockers that are not assigned to any student cannot be represented. In the second, students who are not assigned any locker cannot be represented.

Optional-max Conversion Process
The problem illustrated in the last example can be overcome by adding another rule to the basic conversion process; we refer to the augmented process as the optional-max conversion process. The rules to be applied in the new process are as follows:
1. For every instance where the lower cardinality bound is zero and the upper cardinality bound is one, temporarily relabel the upper cardinality bound as n, i.e., many.
2. Apply the basic conversion process as usual.
After applying rule 1 of the optional-max conversion process, the relationship becomes:

Figure 18. The Student is-assigned-to Locker relationship after the first rule of the optional-max conversion process is applied.

Now the relationship is treated as being of m:n type and will be modeled by three tables according to the third rule of the basic conversion process. The resultant tables are:
Student(student_ID, student_name, programme_enrolled)
Locker(locker_ID, building, floor)
Assign(student_ID, locker_ID)
With the new table structures, details of both empty lockers and students who are not given any locker can be represented.

Teaching remark
• One may suggest handling the relationship as 1:m type. This will result in two tables, with either the student_ID posted to the Locker table or the locker_ID posted to the Student table. The proposed table structures can also represent empty lockers and students who are not given any locker. However, the proposal will result in null entries in at least one table. In situations that involve an associative entity (i.e. a relationship with attributes), more null entries would result. For example, if the date on which a locker is assigned to a student is to be recorded, the field will be null for an unassigned locker should we treat the relationship as 1:m type. The optional-max conversion process provides a more resilient solution to the problem, as the date field will be kept in the intersection table, i.e., the Assign table.

Introduction to Normalization
Normalization is a database design technique based on analyzing the relations among key and non-key attributes of database tables. The technique comprises a series of rules or steps that normalize the database into a number of tables, depending on the degree of normalization that one wants to achieve. A database design that complies with those rules corresponds to a specific normal form, such as first normal form (1NF), second normal form (2NF), third normal form (3NF), etc. Despite the existence of higher normal forms, only 1NF, 2NF and 3NF will be covered.
Higher normal forms imply a data schema with more tables, and querying such a database involves more effort in “joining” tables together. In practice, most database designers generate data schemata normalized to 3NF in order to strike a balance between maintainability and efficiency. Readers who would like an overview of the various normal forms (from 1NF to 6NF) may visit Wikipedia’s page on database normalization.

Why Normalization
The main purpose of normalization is to minimize data redundancy and anomalies. In the following section, we will show the problem of data redundancy and update anomalies through a problem scenario.

Data Anomaly
Data anomaly refers to the unexpected phenomena that occur when updating a database that exhibits data redundancy. There are several types of data anomaly: insertion, deletion and modification (or update) anomalies.

Insertion Anomaly
Could we record the insertion of some data object of interest in a table? If no, the table suffers from insertion anomaly (also called addition anomaly).

Deletion Anomaly
Could we record the deletion of some data object of interest in a table without losing any other information? If no, the table suffers from deletion anomaly.

Modification Anomaly
Would an update to one attribute’s value have to be recorded in a table more than once? If yes, the table suffers from modification anomaly.

Functional Dependencies
In order to understand why data anomalies exist, we need to understand the concept of functional dependencies. Functional dependencies describe the dependency between attributes within a table. Given that A and B are attributes of the same table, attribute B is functionally dependent on attribute A if each value of A is associated with one and only one value of B. This is written A → B and may be read as “A determines B”.

Suppose A is a composite attribute.
Attribute B is said to be fully functionally dependent on attribute A if B is functionally dependent on A and not functionally dependent on any proper subset of A. If B is functionally dependent on some proper subset of A, B is said to be partially dependent on A.

Teaching remarks
• Some textbooks and online resources on databases define full functional dependency as follows: attribute B is said to be fully functionally dependent on attribute A if B is functionally dependent on A and not functionally dependent on any subset of A. Such a definition is incorrect, as it fails to distinguish between a subset and a proper subset. Any set is a subset of itself. A proper subset of a set is any subset of that set excluding the set itself.
• An A/AS level textbook defines partial dependency as follows: one or more non-key attributes depend on part of the primary key. This is not entirely correct, as the notion of functional dependency does not restrict the determining attribute (attribute A) to being a primary key, as described in the book.

Suppose there is a Student_in_Society table storing information about student roles in various societies and clubs in a school. The table also contains information about the teacher supervisor of each society. The table has the following attributes (field names in parentheses): student_ID (StdID), student_name (StdName), society_ID (SocietyID), society_name (SocName), student_role_in_society (Position), society_teacher_ID (SupID), and society_teacher_name (Supervisor). Given that each society has exactly one society teacher to advise it, the primary key of the table is a composite key composed of student_ID and society_ID. Figure 19 shows the full functional dependencies among the attributes in the table.

Figure 19. Full functional dependencies among attributes in the Student_in_Society table.
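The definitions of functional, full and partial dependency can be tested mechanically on sample data. The sketch below uses the six rows that appear in the Student_in_Society table of Figure 21; the helper names holds and fully_dependent are hypothetical. Bear in mind that sample rows can only refute a dependency. Whether a dependency really holds is decided by the rules of the real-world situation, not by the data that happens to be stored.

```python
from itertools import combinations

COLUMNS = ("StdID", "SocietyID", "SocName", "SupID", "Supervisor", "Position")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("042123", "001", "Chinese", 1, "Mr. Wong", "Chairman"),
    ("042123", "003", "Maths",   2, "Ms. Chan", "Member"),
    ("042132", "001", "Chinese", 1, "Mr. Wong", "Member"),
    ("042142", "002", "English", 1, "Mr. Wong", "Member"),
    ("042142", "005", "Physics", 3, "Mr. Lee",  "Chairman"),
    ("042142", "008", "Biology", 4, "Miss Yu",  "Member"),
]]


def holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):  # sizes 1 .. len(lhs)-1: proper subsets only
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True


key = ("StdID", "SocietyID")
print(fully_dependent(ROWS, key, "Position"))  # full dependency on the composite key
print(fully_dependent(ROWS, key, "SocName"))   # only partial: SocietyID alone suffices
print(holds(ROWS, ("SocietyID",), "SocName"))
print(holds(ROWS, ("StdID",), "Position"))
```

The loop over sizes stops short of the full set, which is exactly the subset versus proper subset distinction made in the teaching remark.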
First Normal Form
If every attribute of a relation is atomic, the relation is said to be in first normal form (1NF). An attribute is atomic if it is not multi-valued, i.e. it has no repeating groups. A table which is not in 1NF is in unnormalized form (UNF). The Student_in_Society table below is in UNF as SocietyID is a multi-valued attribute.

StdID   StdName    SocietyID  SocName  SupID  Supervisor  Position
042123  May Wong   001        Chinese  1      Mr. Wong    Chairman
                   003        Maths    2      Ms. Chan    Member
042132  Katie Lee  001        Chinese  1      Mr. Wong    Member
042142  June Chan  002        English  1      Mr. Wong    Member
                   005        Physics  3      Mr. Lee     Chairman
                   008        Biology  4      Miss Yu     Member
Figure 20. The Student_in_Society table in UNF.

The usual way to convert a table in UNF to 1NF is to store the details of the repeating groups in a separate table. This will result in the following table structures.
Student(StdID, StdName)
Student_in_Society(StdID, SocietyID, SocName, SupID, Supervisor, Position)

The tables with data are shown in Figure 21.

Student table
StdID   StdName
042123  May Wong
042132  Katie Lee
042142  June Chan

Student_in_Society table
StdID   SocietyID  SocName  SupID  Supervisor  Position
042123  001        Chinese  1      Mr. Wong    Chairman
042123  003        Maths    2      Ms. Chan    Member
042132  001        Chinese  1      Mr. Wong    Member
042142  002        English  1      Mr. Wong    Member
042142  005        Physics  3      Mr. Lee     Chairman
042142  008        Biology  4      Miss Yu     Member
Figure 21. The Student and Student_in_Society tables in 1NF.

It is a bad idea to store the multi-valued data in the following table structure.
Student_in_Societies(StdID, StdName, SocietyID1, SocName1, SupID1, Supervisor1, Position1, SocietyID2, SocName2, SupID2, Supervisor2, Position2, SocietyID3, SocName3, SupID3, Supervisor3, Position3)
The table above cannot accurately represent the relationship in the real world because a student should not be restricted to joining three societies only.
Allowing a student to join a fourth society implies a modification of the table structure, which can be troublesome once data have been entered in the table. In any case, the table is not in 1NF.

Note that many data anomalies cannot be removed by normalizing tables to 1NF. For example, if Mr. Kwan replaces Mr. Wong as the society teacher of the Chinese Society, two rows in the Student_in_Society table in Figure 21 need to be updated (i.e., modification anomaly). The table also suffers from insertion anomaly, as we cannot store information about a new society until a student has joined it. Deletion anomaly arises when the last member of a society quits: the society information will then be permanently removed from the database.

Second Normal Form
A table is in the second normal form (2NF) if
• it is in 1NF, and
• it exhibits no partial dependencies, i.e., every non-key attribute in the table is fully functionally dependent on the primary key of the table.
If a table is in 1NF but not in 2NF, it must have a composite primary key, according to the second property of 2NF. To “promote” a table from 1NF to 2NF, we need to remove the partial dependencies in the table.

Let us work further on the Student_in_Society table in Figure 21 to illustrate the notion of 2NF. The functional dependencies for the table are as follows:
StdID, SocietyID → Position   (full functional dependency)
SocietyID → SocName           (partial dependency, as SocietyID is only a part of the primary key)
SocietyID → SupID             (partial dependency, as SocietyID is only a part of the primary key)
We can convert a table in 1NF to 2NF by extracting those fields that exhibit partial dependency to one or more separate tables. In our example, the Student_in_Society table can be made to conform to 2NF by extracting SocName, SupID and Supervisor to a separate table, say the Society table.
The attribute on which the three extracted fields are fully functionally dependent, i.e., SocietyID, will be copied to the Society table to serve as the table’s primary key. The new table structures are:
Student(StdID, StdName)
Society(SocietyID, SocName, SupID, Supervisor)
Student_in_Society(StdID, SocietyID, Position)

The tables in 2NF with their data are shown in Figure 22.

Student table
StdID   StdName
042123  May Wong
042132  Katie Lee
042142  June Chan

Society table
SocietyID  SocName      SupID  Supervisor
001        Chinese      1      Mr. Wong
002        English      1      Mr. Wong
003        Mathematics  2      Ms. Chan
005        Physics      3      Mr. Lee
008        Biology      4      Miss Yu

Student_in_Society table (revised)
StdID   SocietyID  Position
042123  001        Chairman
042123  003        Member
042132  001        Member
042142  002        Member
042142  005        Chairman
042142  008        Member
Figure 22. The Student, Society and Student_in_Society (revised) tables in 2NF.

Tables in 2NF do not remove all data anomalies either. Although the insertion and deletion anomalies associated with the Student_in_Society table (in 1NF) have gone, the modification anomaly still exists in the Society table in Figure 22. Suppose Mr. Wong resigns and a new teacher, Mr. Kwan, replaces Mr. Wong as the society teacher of all societies that Mr. Wong used to be responsible for. Note that Mr. Kwan will use the same SupID as Mr. Wong did. To reflect such a change in the Society table, two rows (instead of one) need to be updated.

Third Normal Form
A table is in 3NF if:
• it is in 2NF, and
• it exhibits no transitive dependencies.
A transitive dependency exists if one or more attributes are functionally dependent on some non-key attribute(s). Suppose a table has three attributes A, B and C such that A → B and B → C. Then A → C, and the attribute C is transitively dependent on A. In the Society table of our example, SocietyID → SupID and SupID → Supervisor, and thus SocietyID → Supervisor, which is a transitive dependency.
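The modification anomaly described for the Society table can be observed by counting the rows an UPDATE touches. The sketch below loads the Society data of Figure 22 into an in-memory SQLite table and, for contrast, a separate teacher table of the kind the 3NF discussion arrives at. The table name Society_2NF and the column data types are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Society table in 2NF: Supervisor depends on SupID, a non-key attribute.
CREATE TABLE Society_2NF (
    SocietyID  TEXT PRIMARY KEY,
    SocName    TEXT,
    SupID      INTEGER,
    Supervisor TEXT
);
INSERT INTO Society_2NF VALUES
    ('001', 'Chinese',     1, 'Mr. Wong'),
    ('002', 'English',     1, 'Mr. Wong'),
    ('003', 'Mathematics', 2, 'Ms. Chan'),
    ('005', 'Physics',     3, 'Mr. Lee'),
    ('008', 'Biology',     4, 'Miss Yu');

-- Supervisor extracted into its own table, keyed by SupID.
CREATE TABLE Society_Teacher (
    SupID      INTEGER PRIMARY KEY,
    Supervisor TEXT
);
INSERT INTO Society_Teacher VALUES
    (1, 'Mr. Wong'), (2, 'Ms. Chan'), (3, 'Mr. Lee'), (4, 'Miss Yu');
""")

# Mr. Kwan takes over SupID 1 from Mr. Wong.
rows_2nf = conn.execute(
    "UPDATE Society_2NF SET Supervisor = 'Mr. Kwan' WHERE SupID = 1"
).rowcount
rows_3nf = conn.execute(
    "UPDATE Society_Teacher SET Supervisor = 'Mr. Kwan' WHERE SupID = 1"
).rowcount

print(rows_2nf, rows_3nf)
```

In the 2NF table the teacher's name is stored once per society, so the change touches two rows. Once the name is stored once per teacher, a single row changes.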
To convert a table in 2NF to 3NF, attributes that contribute to transitive dependencies are extracted to separate table(s). The Society table can be made to conform to 3NF by extracting Supervisor to a new table, say the Society_Teacher table. The attribute on which the Supervisor field is fully functionally dependent, i.e., SupID, is copied to the Society_Teacher table to serve as that table’s primary key. This will result in the following table structures.
Student(StdID, StdName)
Society(SocietyID, SocName, SupID)
Student_in_Society(StdID, SocietyID, Position)
Society_Teacher(SupID, Supervisor)

The tables in 3NF with their data are shown in Figure 23.

Student table
StdID   StdName
042123  May Wong
042132  Katie Lee
042142  June Chan

Society table (revised)
SocietyID  SocName      SupID
001        Chinese      1
002        English      1
003        Mathematics  2
005        Physics      3
008        Biology      4

Student_in_Society table
StdID   SocietyID  Position
042123  001        Chairman
042123  003        Member
042132  001        Member
042142  002        Member
042142  005        Chairman
042142  008        Member

Society_Teacher table
SupID  Supervisor
1      Mr. Wong
2      Ms. Chan
3      Mr. Lee
4      Miss Yu
Figure 23. The Student, Society (revised), Student_in_Society and Society_Teacher tables in 3NF.

Figure 24 shows the full series of changes introduced to transform the original data schema (in UNF) to the final design (in 3NF).

Figure 24. How the original design evolved from UNF to 3NF.

Database Design Exercise 01
For each of the following scenarios, complete the ER diagram.
Scenario description:
1. In a school, students are allocated to different classes. Each student must be allocated to exactly one class, and a class is formed by at least 30 students. Each class must be managed by several different students, namely, prefect, monitor, etc.
[ER diagram to complete. Entities: STUDENT, CLASS, CLASS POST. Relationships: Is allocated to, Is managed by, Is assigned to.]
2. A construction company has over 1000 employees. A client can hire this company to do projects.
Usually, several types of employees are grouped together to finish a project, e.g. a project may require an accountant, 2 engineers, 1 manager and 1 system analyst. At the same time, an employee may take up more than one project. Also, finishing a project may require a number of pieces of equipment.
[ER diagram to complete. Entities: Equipment, Employee, Client, Project. Relationships: Is assigned to, Hires, Works.]
Transform the ER diagram into the database structure. Please show the structure of the database in the form Tablename (keyfield, field1, field2, …).
3. In a school, a student may be assigned one or more functional posts, like prefect, monitor or chairman. A post must be assigned to exactly one student. Complete the following E-R diagram.
[ER diagram to complete. Attributes: Stud_ID, Name, Address, Date_birth, Post_ID, Post_Name. Relationship: Is assigned to.]
Then, transform the above ER diagram into a database structure:
4. A chain store has a number of branches, and each branch has a manager and several staff. E.g. staff1 and staff2 belong to branchA whereas staff3 and staff4 belong to branchB. The salaries of the staff follow salary points, which depend on their positions and years of service. E.g. a manager with 5 years of service will have salary point 15, which is $25,000, and a junior staff member with 2 years of service will have salary point 2, which is $6,000.
Complete the ER diagram:
Then, write down the database structure:
5. Which of the following would be multi-valued attributes?
a) Contact person for a company
b) Qualification of a teacher
c) The name of the CEO of a company
d) The contact phone number for a student
e) The title of a book
f) Medicine for a patient
g) The owner of a credit card
h) The courses taken by an undergraduate
Give two examples of your own:
6. Staff in a trading company A purchase products from other companies through sales agents.
a) Construct the ER diagram if there is just one sales agent for each company and staff from different departments may contact the same company.
[ER diagram to complete: DEPARTMENT have STAFF contact COMPANY]
b) Construct another ER diagram if there may be more than one sales agent for a company.
[ER diagram to complete: DEPARTMENT have STAFF contact COMPANY]
c) To remove the multi-valued problem, we can transform the ER diagram into
[ER diagram: COMPANY through CONTACT_LIST]
The database structure would now become
COMPANY (Comp_ID, Name, Address)
CONTACT_LIST (Comp_ID, Agent_Name, Phone, Email)
7. A patient takes more than one medicine, and so the ER diagram will be
[ER diagram to complete. Relationship: take.]
8. Removing multi-valued attributes is the first normalization. For the following scenario, which attribute will be multi-valued? How can the first normalization be performed by modifying the database structure?
[Diagram attributes: Patient_ID, Name, Date_birth, Medicine_Name, Quantity]
9. Apart from multi-valued attribute problems, we should solve the problem of M:N relations. First we will look at the 1:1 relation:
a) Assume there is just one class master for every class, so the ER diagram would be
[ER diagram to complete. Relationship: belong.]
and so the database structure (database schema) would become
b) Then, we will look at a 1:M relation: assume each employee belongs to a department and a department has to have at least one employee. Then, the ER diagram will be
[ER diagram to complete. Relationship: belong.]
and so the database structure (database schema) would become
c) Last, we will look at an M:N relation: assume a teacher teaches a number of classes and each class has several teachers teaching different subjects, so the ER diagram would be
[ER diagram to complete. Relationship: teach.]
However, since it is an M:N relation, it will be transformed into
and so the database structure (database schema) would become

<End of Database Design Exercise 01>

Database Design Exercise 02
1. Given that the relationship Teaches between entities TEACHER and COURSE is one-to-many, table ______ should include a foreign key ______.
A. TEACHER, course_id
B. TEACHER, teacher_id
C. COURSE, teacher_id
D. COURSE, course_id
2. In transforming into a database schema, a multi-valued attribute
A. will be mapped into a foreign key
B. will be stored in multiple rows of the same table
C. will be stored in multiple columns of the same table
D. will require creating a new table

Study the paragraph below carefully and answer the following four questions:
In an air freight service company, each customer will request a sales order for a freight. Each sales order is taken care of by one salesperson. Each salesperson may take care of many sales orders. A sales order is a freight requested by a customer. Each customer has made a request for at least one freight.
3. In which of the following tables will salesperson_id not be found?
A. CUSTOMER   B. ORDER   C. SALESPERSON   D. none of the above
4. In which of the following tables will customer_id not be found?
A. CUSTOMER   B. ORDER   C. SALESPERSON   D. none of the above
5. In which of the following tables will salesperson_id be used as a foreign key?
A. CUSTOMER   B. ORDER   C. SALESPERSON   D. none of the above
6. In which of the following tables will customer_id be used as a foreign key?
A. CUSTOMER   B. ORDER   C. SALESPERSON   D. none of the above
7. Given that the relationship studies between STUDENT and SUBJECT is many-to-many,
A. a new table is needed.
B. the tables should be combined into one.
C. the relationship studies should be converted into an attribute.
D. a foreign key should be added to the table SUBJECT.

Answers:
1  2  3  4  5  6  7
C  D  A  C  B  B  A

1. The discipline master of a school wishes to store the late records of students. The following E-R diagram is drawn.
[ER diagram: STUDENT (attributes Stud_id, Name, Parents_name, Phone, Address) Commits LATE (attributes Date, Time, Reason), with cardinalities X and Y.]
If a student is late more than 3 times in a semester, a clerk will make a phone call to the parents. If a student is late more than 5 times in a semester, a clerk will send the parents a letter about the problem, using the address in the above ER diagram.
a) By investigating the above diagram, what problem will arise?
parents_name may be multi-valued.
b) For the side of the entity STUDENT, is it optional or mandatory?
It is mandatory.
c) What are the values of X and Y in the diagram?
X: 1, Y: M; it is a one-to-many relation.
d) Dissolve the ER diagram and present it as a database schema. Remember to identify the primary key of the tables involved.
STUDENT(stud_id, name, address, phone)
LATE(stud_id, date, time, reason)
STUDENT_PARENT(stud_id, parents_name)

<End of Database Design Exercise 02>

Database Design Exercise 03
M.C.
1. Which of the following is an example of an entity?
A. “Mr. Cheung”
B. Teacher “Mr. Cheung”
C. Teacher
D. The subject taught by “Mr. Cheung”
2. A primary key
(1) can be made up of more than one field
(2) is always a candidate key
(3) can have a null value
A. (1) only   B. (1), (2) only   C. (1), (3) only   D. (1), (2) and (3)
3. Which of the following is not an appropriate attribute for an entity “golf coach”?
A. name   B. sex   C. charge per hour   D. booked_date
4. The field name in a table must
(1) be unique
(2) be made up of English letters
(3) not be the same as the table name
A. (1) only   B. (1), (2) only   C. (1), (3) only   D. (1), (2) and (3)
5. Which of the following should use the memo data type?
A. sex   B. date of birth   C. product description   D. name of student
6. Which of the following is NOT a purpose of creating an index?
A. Carry out sorting with a smaller amount of data
B. Improve data searching performance
C. Improve data ordering performance
D. Improve data updating performance
7. Which of the following is / are important in maintaining referential integrity?
(1) Foreign keys
(2) Setting validation rules (constraints)
(3) Avoiding the use of derived attributes
A. (1) only   B. (1), (2) only   C. (1), (3) only   D. (1), (2) and (3)
8. Which of the following is / are derived attribute(s)?
(1) The attribute AverageMark in the table Student
    Student (ID, name, EngMark, MathMark, ChineseMark, AverageMark)
(2) The attribute AverageMark in the table Class
    Student (StuID, name, sex, ClassID)
    Subject (SubjCode, StuID, Mark)
    Class (ClassID, SubjCode, AverageMark)
(3) The attribute Post in the table ClubMember
    ClubMember (ClubID, StuID, Post)
A. (1), (2) only   B. (1), (3) only   C. (2), (3) only   D. (1), (2) and (3)
9. What would be the consequences of using a derived attribute?
(1) Data inconsistency may result when updating.
(2) Data can be retrieved more efficiently.
(3) Data security is lowered.
A. (1), (2) only   B. (1), (3) only   C. (2), (3) only   D. (1), (2) and (3)
10. What would be used to enhance domain integrity?
(1) Setting indexes on an attribute
(2) Setting foreign keys
(3) Setting validation rules (constraints)
A. (1) only   B. (2) only   C. (3) only   D. (1), (2) only
11. What would be used to enhance entity integrity?
(1) Setting a constraint such that the value of an attribute in a composite primary key cannot be NULL
(2) Setting a constraint such that the value of an attribute in a non-composite primary key cannot be NULL
(3) Setting a constraint such that only a particular set of data can be entered for an attribute
A. (1) only   B. (2) only   C. (3) only   D. (1), (2) only
12. Which of the following about a relational table is NOT true?
A. Table name is unique
B. Primary key is unique
C. Foreign key is unique
D. Field name is unique
13. The referential integrity constraint for a field requires
A. a primary key of a table to match the foreign key of another table
B. a foreign key of a table to match the primary key of another table
C. a primary key to be non-empty and unique
D. data to come from the same domain
14. In a relational table,
A. a field may have multiple values
B. the sequence of fields is insignificant
C. the sequence of rows is significant
D. rows can be duplicated
15. The advantage of program-data independence is that
A. the structure of data can be changed without having to change the application program
B. the program has no privilege to access the database
C. the structure of data can be known by studying the program code
D. a low-level programming language can be used
16. Data redundancy can be minimized by
A. entering data only when necessary
B. using the database approach
C. data validation
D. using a traditional file-processing system
17. The domain constraints for a field require the field to have
A. non-duplicating values
B. non-empty values
C. the same data type and range
D. the same values
18. In database architecture, a database can be viewed at the view level and the
(1) Conceptual level
(2) Physical level
(3) Logical level
A. (1) only   B. (2) only   C. (3) only   D. (1) and (2) only
19. The degree of the relationship “Students borrow books” is
A. 1   B. 2   C. 3   D. none of the above
20. Which of the following statements about the relation “audience watch TV programs” is correct?
A. The existence of the entity “audience” in the relationship “watch” is optional.
B. The existence of the entity “TV program” in the relationship “watch” is mandatory.
C. The maximum cardinality of the entity “TV program” is 1.
D. The minimum cardinality of the entity “TV program” is 1.

Answers:
1  2  3  4  5  6  7  8  9  10
C  B  D  A  C  A  A  A  A  C
11 12 13 14 15 16 17 18 19 20
D  C  B  B  A  B  C  D  B  A

1. The following database is used to store the students’ learning portfolio. This portfolio should contain data for the whole of secondary school life; data such as which club a student participates in, and in which year, should be included.
   Student (StuID, name, address, HKID, phone, sex, DateBirth)
   Club (ClubID, ClubName, TeacherInChargeID)
   JoinClub (RecordNumber, StuID, ClubID, Post)

a) Point out the primary key and the candidate keys of each table.

   Table      Primary key    Candidate keys
   Student    StuID          StuID, HKID
   Club       ClubID         ClubID
   JoinClub   RecordNumber   RecordNumber (not StuID + ClubID)

b) In what ways will this database schema not work properly? Try to write the SQL statements to overcome the problem.

   The schema is assumed to store the information of one particular year only. If it is used to store the data of several years, we have to add a new field called year to both the Club and JoinClub tables.

   ALTER TABLE Club ADD year char(4)
   ALTER TABLE JoinClub ADD year char(4)

2. Inspect the following database schemas and briefly describe some scenarios in which they will not perform correctly.

(i) Record (BookCode, StuID, DoB, Returned) where DoB means Date of Borrow.

   The field amount is lacking. The schema will not function properly if two books with the same BookCode are borrowed by the same student, which may in fact happen.

(ii) SportTrainer (ID, Name, typeofsport, charge, gender)

   If a trainer is able to train more than one type of sport, this database structure will run into problems.

(iii) For a fitness training center, the database structure is as follows:

   course (courseID, courseName, TrainerName, Charge)
   enrollment (courseID, memberID, IsPaid)
   membership (memberID, memberName, memberSex, memberDoB, expiryDate)
   courseDetail (courseID, DateCourse, timeZone) where DateCourse means the dates on which the course is held.

   A trainer will have no information in the database if he or she does not teach any course in the fitness center.

3. Now, you are the database administrator of a recreation center. You designed the form shown below:

   Tai Tai Recreation Center
   Facility Order Form

   Membership ID:
   Date to use the facility:   /   /

   Facility        FacilityCode   Charge   Location (choose one)
   Table Tennis    TT01           $20      Room 113, Room 114, Room 115
   Badminton       BN01           $45      SportsRoom1A, SportsRoom1B, SportsRoom1C
   BasketBall      BL01           $150     SportsRoom1, SportsRoom2
   Volleyball      VL01           $100     SportsRoom1, SportsRoom2

   Time to use the facility:

   Time zone   Duration        Choose
   1           12:00 - 1:00
   2           1:00 - 2:00
   3           2:00 - 3:00
   4           3:00 - 4:00
   5           4:00 - 5:00
   6           5:00 - 6:00
   7           6:00 - 7:00
   8           7:00 - 8:00
   9           8:00 - 9:00
   10          9:00 - 10:00

   Signature:                    Date:

It is supposed that a member cannot book more than one facility in the same time zone on a particular day, i.e. a member cannot book a table tennis court and a basketball court at the same time, nor two table tennis courts at the same time, but he can book a table tennis court for time zones 3 and 4.

You also designed the database schema shown below:

   Facility (FacilityCode, FacilityName, Location, charge)
   Membership (MemID, Name, Sex, DateBirth, address, PhoneNumber)
   FacilityRecord (MemID, FacilityCode, DoB, timezone) where DoB means Date of Booking.

Is there any problem in the database design? Briefly describe how to solve it.

There may be several different locations for a particular facility, e.g. three rooms for TT01, so the attribute Location in the table Facility will be multi-valued. To solve this problem, you should either create a new table to hold data like FacilityCode and Location, or assign a unique FacilityCode to each location even though the locations offer the same kind of facility.

Also, the primary key for the table FacilityRecord should be RecordNo + FacilityCode instead of RecordNo + MemID, because each RecordNo is supposed to be ordered by just one member. MemID is therefore fully functionally dependent on RecordNo, and hence a new table should be created.
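The first fix suggested above (a separate table for FacilityCode and Location) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MS Access, the table name FacilityLocation, the member IDs and the booking date are hypothetical, and the Membership table is left out for brevity. The booking rule from the question (one facility per member per time zone per day) is expressed here as the primary key of FacilityRecord, which is one possible choice and not necessarily the design the answer above has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.executescript("""
CREATE TABLE Facility (
    FacilityCode TEXT PRIMARY KEY,
    FacilityName TEXT NOT NULL,
    Charge       INTEGER NOT NULL
);
-- Hypothetical new table: one row per room, so Location is no longer multi-valued.
CREATE TABLE FacilityLocation (
    FacilityCode TEXT NOT NULL REFERENCES Facility (FacilityCode),
    Location     TEXT NOT NULL,
    PRIMARY KEY (FacilityCode, Location)
);
CREATE TABLE FacilityRecord (
    MemID        TEXT NOT NULL,
    FacilityCode TEXT NOT NULL,
    Location     TEXT NOT NULL,
    DoB          TEXT NOT NULL,     -- date of booking
    TimeZone     INTEGER NOT NULL CHECK (TimeZone BETWEEN 1 AND 10),
    -- Assumed rule: a member holds at most one booking per day and time zone.
    PRIMARY KEY (MemID, DoB, TimeZone),
    FOREIGN KEY (FacilityCode, Location)
        REFERENCES FacilityLocation (FacilityCode, Location)
);
INSERT INTO Facility VALUES ('TT01', 'Table Tennis', 20);
INSERT INTO Facility VALUES ('BL01', 'BasketBall', 150);
INSERT INTO FacilityLocation VALUES ('TT01', 'Room 113');
INSERT INTO FacilityLocation VALUES ('TT01', 'Room 114');
INSERT INTO FacilityLocation VALUES ('TT01', 'Room 115');
INSERT INTO FacilityLocation VALUES ('BL01', 'SportsRoom1');
INSERT INTO FacilityLocation VALUES ('BL01', 'SportsRoom2');
""")


def book(mem_id, code, location, date, zone):
    """Try to record one booking; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO FacilityRecord VALUES (?, ?, ?, ?, ?)",
                (mem_id, code, location, date, zone),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    book("M001", "TT01", "Room 113", "2005-10-01", 3),     # first booking
    book("M001", "TT01", "Room 113", "2005-10-01", 4),     # same court, next time zone
    book("M001", "BL01", "SportsRoom1", "2005-10-01", 3),  # clashes with the first booking
    book("M002", "TT01", "Room 999", "2005-10-01", 3),     # room does not exist
]
print(results)
```

The sketch does not stop two different members from booking the same room in the same time zone; the question does not state such a rule, so it is left out.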
<End of Database Design Exercise 03>

Database Design Exercise 04

Question 1:
Consider a relational database with three tables, STUDENT, COURSE and GRADE, as shown below:

   STUDENT
   S_NO   S_NAME
   1025   Mary Wu
   3350   Tom Leung
   4170   Peter Chow

   COURSE
   C_CODE    C_NAME                 CREDITS
   CHEM203   Organic Chemistry II   2
   COMF117   Computer Science I     3
   MATH001   Mathematics            4
   GEOG108   Geography              2

   GRADE
   StudentID   C_CODE    Score
   1025        CHEM203   70
   1025        COMF117   75
   1025        MATH001   80
   3350        COMF117   55
   3350        GEOG108   40
   4170        GEOG108   75

(a) What is the primary key in table GRADE? (1 mark)

StudentID + C_CODE <- It is called a composite key. We use a composite key because C_CODE would be multi-valued for a student and hence has to be extracted into a separate table. The ER diagram in this case is

   [ER diagram: entity Student (S_NO, S_Name) and entity Course (C_Code, C_Name, credit), linked by the M:N relationship “take”, which carries the attribute score.]

where we should note that score is an attribute of the relation “take”.

(b) Describe a scenario to illustrate the data integrity problem when deleting a record in one of the tables. How can the data integrity problem be avoided? (2 marks)

Scenarios:
   1. A student leaves the school and the corresponding record in STUDENT is deleted.
   2. A course is cancelled and the corresponding record in COURSE is deleted.

To avoid the data integrity problem, we may
   1. Delete the detail records (GRADE) before deleting the master record (STUDENT, COURSE).
   2. Enforce a referential integrity (or foreign key) constraint on the database.
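Both remedies listed in part (b) can be tried out with the data of this question. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) instead of MS Access; the helper name delete_student is hypothetical. With the foreign key in place, deleting a STUDENT record that GRADE still refers to is rejected, whereas deleting the detail records first succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.executescript("""
CREATE TABLE STUDENT (S_NO INTEGER PRIMARY KEY, S_NAME TEXT NOT NULL);
CREATE TABLE COURSE  (C_CODE TEXT PRIMARY KEY, C_NAME TEXT NOT NULL,
                      CREDITS INTEGER NOT NULL);
CREATE TABLE GRADE (
    StudentID INTEGER NOT NULL REFERENCES STUDENT (S_NO),
    C_CODE    TEXT    NOT NULL REFERENCES COURSE (C_CODE),
    Score     INTEGER,
    PRIMARY KEY (StudentID, C_CODE)       -- the composite key from part (a)
);
INSERT INTO STUDENT VALUES (1025, 'Mary Wu'), (3350, 'Tom Leung'), (4170, 'Peter Chow');
INSERT INTO COURSE VALUES ('CHEM203', 'Organic Chemistry II', 2),
                          ('COMF117', 'Computer Science I', 3),
                          ('MATH001', 'Mathematics', 4),
                          ('GEOG108', 'Geography', 2);
INSERT INTO GRADE VALUES (1025, 'CHEM203', 70), (1025, 'COMF117', 75),
                         (1025, 'MATH001', 80), (3350, 'COMF117', 55),
                         (3350, 'GEOG108', 40), (4170, 'GEOG108', 75);
""")


def delete_student(s_no):
    """Delete one STUDENT row; return False if the foreign key blocks it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM STUDENT WHERE S_NO = ?", (s_no,))
        return True
    except sqlite3.IntegrityError:
        return False


# Remedy 2: the constraint refuses to orphan the GRADE row of student 4170.
blocked = not delete_student(4170)

# Remedy 1: delete the detail records first, then the master record.
with conn:
    conn.execute("DELETE FROM GRADE WHERE StudentID = ?", (4170,))
deleted = delete_student(4170)

remaining_students = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
remaining_grades = conn.execute("SELECT COUNT(*) FROM GRADE").fetchone()[0]
print(blocked, deleted, remaining_students, remaining_grades)
```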
Question 2:
A teacher has designed a database, EXAM, to store the final examination results of students as follows:

   Field Name   Field Type   Description
   StdNo        Numeric      Unique student number
   Name         Character    Name of the student
   Class        Character    Class of the student
   Sex          Character    M = male, F = female
   SbjCode      Numeric      Unique subject code
   Subject      Character    Full name of the subject
   PassMk       Numeric      Pass mark of the subject
   Mark         Numeric      Mark of the student in the subject

(a) Explain briefly how this design leads to data redundancy. (2 marks)

If a student takes 2 or more subjects, there will be more than 1 record for the same student, and fields like Name, Class and Sex are stored multiple times. Similarly, if a subject is taken by more than 1 student, fields like Subject and PassMk are stored multiple times.

-> Now, we should state that the attributes Name, Class and Sex are fully functionally dependent on the primary key “StdNo”; however, SbjCode would be multi-valued. So the table is in unnormalized form.

To fix the problem of data redundancy, the teacher breaks EXAM into three interlinked tables, which use the above field names only.

(b) Complete the new design below and underline the primary key field(s) in the corresponding table. (2 marks)

   Table     Fields                      Key field(s)
   STUDENT   StdNo, Class, Name, Sex     StdNo
   SUBJECT   SbjCode, Subject, PassMk    SbjCode
   EXAM      StdNo, SbjCode, Mark        StdNo + SbjCode

Question 3:
What is wrong in the following ER diagram?

   [ER diagram titled “Inventory”: entities Client and Product joined by the M:N relationship “buy”; attribute labels shown: MemberID, MemberName, PointEarned, ProductID, Category, ProductName, Amount, Price.]

Answer:
The attribute Amount should not be put in the entity “Product”; it should instead be put in the relation “buy”. Also, the participation on the Client side should be optional instead of mandatory, i.e.

   [Corrected ER diagram: as above, with Amount attached to the relationship “buy” and the participation of Client made optional.]

<End of Database Design Exercise 04>

Past Paper Investigation: 2000 – AS – CA #1

1. (a) A teacher uses a database file to store the information about his students. The file has the following structure:

The teacher inputs marks and grades to the database file after each test or examination. At the end of the school term, he finds some problems in the file design. Identify fields that are redundant and explain why the fields are redundant. (4 marks)

Totaltest – it is simply the sum of the marks of test1 to test3, and all data in this field can be obtained from the data in those fields. No information is lost if this field is deleted. Therefore this field is redundant.

Grade – since this is obtained from the average of all marks in the fields of the database, no information is lost if this field is deleted, as long as the criteria for converting marks to grades stay the same. Therefore this field is redundant.

<- At this level, we should know that the fields TotalTest and Grade are redundant. In the real world, however, a database sometimes keeps redundant fields in order to speed up data retrieval. Even then, redundant fields would be created only for values that are used very frequently.

<- Of course, data redundancy would undermine data integrity, especially referential integrity.

(b) The teacher would like to add a field that will store the talent of the students (e.g. special skills, strengths, personal interests, etc.) to the database file. The teacher cannot decide whether the field should be declared a character type or a memo type. Compare the two data types and recommend the most suitable data type for the teacher to use. (4 marks)

The character type usually stores information whose length does not vary greatly. For example, names are stored as character type of length 25. Although some names are short and some are long, they would not be much longer than 25 characters.
However, the memo type stores information that may vary a lot in length. Memo fields can even include graphics or sounds. For example, talents may be very different among different students. Some students have fewer talents and thus will have just one or two words stored in the field, while students with many talents may have as much as several paragraphs stored in the field. For the above reasons, the memo type is recommended for the teacher to store the students' talents.

<- Of course, we can use the memo type as the data type for the field ‘talent’. It may look like

   Name            Talent
   Chan Tai Man    Tennis, Piano, C++
   Chan Siu Man    Flash, Piano, Violin
   Wong Siu Ling   Writing

By using the following SQL,

   SELECT name FROM student WHERE UPPER(talent) LIKE '%PIANO%'

we are able to find the names of the students who are good at piano. (The % wildcard follows SQL92; in the default Access syntax the wildcard is *.)

However, since talent is multi-valued, it is recommended to put talent into a new table such that the field talent contains just one skill per record. This is especially important when the skills are pre-defined, i.e. we can make the field ‘talent’ a foreign key which maps to another database table. On that foreign key, we can set the appropriate constraint, e.g. the value of the field talent must exist in the parent table.

To illustrate further, let us talk about these two items, students and talents. Originally in the question, student is regarded as the entity and talent is regarded as an attribute. Now, let us think of them as two separate entities related by ‘OWN’, i.e. STUDENTS OWN TALENTS. The participation of both entities is optional.

   [ER diagram: entities STUDENT and TALENT linked by the many-to-many (M:N) relationship OWN.]

Note: We should always be careful about cases such as what would happen when some students have no particular skills, i.e. null in the field ‘talent’ for some students, or when some students simply do not appear in the table ‘STUDENT’.
Contents

Introduction to Structured Query Language
   What is SQL?
   History for SQL (not within the curricula)
   Data Definition Language and Data Manipulation Language
   An Illustrative Example – A Library System
   Commonly Used Data Types in SQL
SQL Statements
   Creating Database Objects
      Create a database
      Create a table in a database
   Creating Table with Integrity Rule
      Create table with primary key
      Create table with foreign key
   Modifying Table Structure
      Add column
      Drop column
      Change columns’ data type
      Change column(s) to NOT NULL
      Add a primary key to an existing table
   Deleting Database Objects
      Delete a table
      Delete a database
   Adding Data to Tables
      Insert new row
      Insert new record with only specified column field(s)
   Retrieving Data from Database Table(s)
      Retrieve all fields from a table
      Retrieve value(s) from particular column(s) of a table
      Retrieve value(s) from particular column(s) of a table without duplication
      Retrieve data with specified selection criteria
   Creating and Deleting Data View
      Create a data view
      Delete a data view
      Update the value in a column
      Update values in a number of columns
      Delete record(s) from the table
   Result Presentation
      The ORDER BY clause
      The GROUP BY … HAVING clause
   Operators Used with WHERE
      The LIKE operator
      The IN operator
      The BETWEEN operator
      The AND operator
      The OR operator
      Add alias to a column
   Joining Tables
      Equijoin
      The NATURAL JOIN operator
      The INNER JOIN operator
      The LEFT (OUTER) JOIN operator
      The RIGHT (OUTER) JOIN operator
      The FULL (OUTER) JOIN operator
   Combining Query Results
      The UNION operator
      The INTERSECT operator
      The EXCEPT/MINUS operator
      Using nested SELECT statement
   Arithmetic Operators/Functions
   String Functions
   Aggregate Functions
      The AVG function
      The COUNT function
      The MAX function
      The MIN function
      The SUM function
      Create/Drop Table Index
   Exporting Data from MS Access
      Export Data from an MS Access Database to Another Access Database
      Export Data from an MS Access Database in other file formats

Introduction to Structured Query Language

What is SQL?

Structured Query Language (SQL) is a standard language for manipulating and querying database objects (e.g., table structures and contents) in a relational database management system. For simplicity, we refer to a relational database management system as a database from now on.

SQL allows you to access a database. SQL can be used to define database table structure and to store, select and manage data from the database including data insertion, update and deletion.
SQL is widely used in databases like MySQL, DB2, Oracle, PostgreSQL, Sybase, Microsoft SQL Server, MS Access, etc.

History for SQL (not within the curricula)

In the early 1970s, a seminal paper on the relational database model authored by E.F. Codd received considerable notice from the database community. The relational database model provided a sound theoretical framework for the development of a well-formed query language that the model could support. By 1974, IBM had defined a language called the ‘Structured English Query Language’ or SEQUEL. The name was later shortened to Structured Query Language (SQL).

In 1986, a standard for Structured Query Language (SQL) was defined by the American National Standards Institute (ANSI), and this became an international standard recognized by the International Organization for Standardization (ISO) in 1987. In 1989, a revised standard, commonly known as SQL89 or SQL1, was published.

The ANSI committee released the SQL92 standard in 1992 (also called SQL2). This standard addressed several weaknesses in SQL89 and set forth conceptual SQL features which at that time exceeded the capabilities of any existing RDBMS implementation. The SQL92 standard was approximately six times the length of its predecessor. Because of this disparity, the authors defined three levels of SQL92 compliance: Entry-level conformance, Intermediate-level conformance, and Full conformance. Some information about the difference among various levels of SQL92 compliance can be found here.

In 1999, the ANSI/ISO released the SQL99 standard (also called SQL3). This standard addresses some of the more advanced areas of modern SQL systems, such as object-relational database concepts, call level interfaces, and integrity management. SQL99 replaces the SQL92 levels of compliance with its own degrees of conformance: Core SQL99 and Enhanced SQL99. A short article that highlights some important changes in SQL99 can be found here.
Although various databases may implement their SQL slightly differently, they support the same major functions (such as SELECT, UPDATE, DELETE, INSERT, WHERE, etc.) in a similar way in order to fulfill the ANSI standard. The SQL statements introduced in this note are largely based on the Entry-level conformance of SQL92.

Teaching remarks
   • Apparently, the SQL statements that the A/AS level curricula cover are so basic that even the entry level of SQL92 supports them.
   • Most of the SQL statements included in this note have been tested on Microsoft Access 2003. It supports SQL92 but this requires some reconfiguration. The default database format is Access 2000, which is not compatible with SQL92. To change the default database format, start Access 2003. Click Tools, then Options. Click the Advanced tab and change the Default File Format to “MS Access2002-2003” (see Figure 1). To change the SQL syntax to SQL92, click Tools and then Options. Click the Tables/Queries tab and check both boxes (This database and Default for new databases) under SQL Server Compatible Syntax (ANSI 92) (see Figure 2).
   • It appears that the SQL92 supported by Access 2003 conforms to the entry level only. For example, it does not support some join features such as NATURAL JOIN and FULL OUTER JOIN. Other unsupported features include EXCEPT and INTERSECT, etc.
   • A subset of the SQL92 standard that is both usable and commonly supported can be found at http://www.firstsql.com/tutor.htm.

Figure 1. Setting Access 2003’s default database format to “Access 2002 – 2003” to support SQL92.

Figure 2. Setting Access 2003’s default SQL syntax to conform to SQL92.

Data Definition Language and Data Manipulation Language

SQL supports functions such as building and manipulating database objects, populating database tables with data, updating existing data in tables, deleting data, performing database queries, controlling database access and overall database administration.
Such functions can be classified into a number of categories, and the two best-known categories are Data Definition Language (DDL) and Data Manipulation Language (DML).

DDL allows users to create and restructure database objects, such as creating and deleting database tables. Besides, DDL can be used to define table indexes as well as foreign keys between tables. Some of the commonly used DDL commands are:
   • CREATE TABLE
   • ALTER TABLE
   • DROP TABLE
   • CREATE INDEX
   • ALTER INDEX
   • DROP INDEX
   • CREATE VIEW
   • DROP VIEW

DML allows users to manipulate data within the objects of a database. Some of the commonly used DML commands are:
   • SELECT
   • INSERT INTO
   • UPDATE
   • DELETE

In a nutshell, DDL allows database users to define database objects whereas DML allows database users to retrieve, insert, delete and update data in a database.

An Illustrative Example – A Library System

In order to help readers understand the SQL statements that we are going to introduce, those statements will be illustrated with a hypothetical library database as far as possible. The tables used in the simple library database are the Student, Book and LoanRecord tables, and their details are given below. Readers are reminded that the tables and fields kept in the proposed database are far fewer than what a real library system requires. We keep the example database simple and yet adequate for illustration purposes.

The Student table is used to store basic student information like student ID, name, the class that the student belongs to, and phone number. The data of the Student table are as follows:

   StdID     Name            Class   OverduePay   PhoneNo
   0002011   Chan Ming Wai   2C      12.5         21238782
   0002012   Wong Wai Ming   2B      30.5         21234456
   0002013   Cheung Ka Fai   2C      0            23212321
   0002014   Chang Wai Yee   4A      20.5         23213123
   0002015   Lee Oi Lam      5C      3            25214123
   0002016   Sze Yuk Ki      7B      1.5          26434534

   Table 1. Data in the Student table.

Table 2 describes the characteristics of the data fields in the Student table.
   Field Name   Description
   StdID        • Unique student number
                • Text string – 7 digits
                • Not null (i.e., the field is mandatory and a value is to be inserted)
   Name         • Student name
                • Text string – 30 characters
                • Not null
   Class        • The class in which the student studies
                • Text string – 2 characters
                • Not null
   PhoneNo      • Phone number
                • Text string – 8 digits
   OverduePay   • Overdue payment
                • A number with two decimal places (<= 999.99)

   Table 2. Characteristics of the data fields in the Student table.

Teaching remark
   • Some people may opt to define numeric data like StdID and PhoneNo as integers instead of text strings. The reason why we prefer to represent these fields as text strings is that the “numbers” are not used for computation.
   • Two different data types can be used to define text strings (see next section), and it is important for teachers to clarify to their students the key difference between the two data types.

The Book table contains the key information about the books in the library. Details of the Book table are shown in Table 3.

   BookID     Title        Type
   00000001   Apple Tree
   00000002   Bible
   00000003   Star Wing

   Table 3. Data in the Book table.

Table 4 describes the characteristics of the data fields in the Book table.

   Field Name   Description
   BookID       • Unique book ID
                • Text string – 8 digits
                • Not null
   Title        • Book title
                • Text string – 100 characters
                • Not null
   Type         • Book category
                • Text string – 3 digits

   Table 4. Characteristics of the data fields in the Book table.

The LoanRecord table contains information about the library items on loan (or once on loan). Details of the LoanRecord table are as follows:

   LoanRecID   StdID     BookID     DateOfBorrow   Status
   1           0002012   00000001   20051001       1
   2           0002011   00000002   20020112       2
   3           0002012   00000003   20031211       2
   4           0002013   00000002   20031001       2
   5           0002011   00000002   20051018       1

   Table 5. Data in the LoanRecord table.

Table 6 describes the characteristics of the data fields in the LoanRecord table.
   Field Name     Description
   LoanRecID      • Unique loan record ID
                  • Text string – 8 digits
                  • Not null
   StdID          • Student number
                  • Text string – 7 digits
                  • Not null
   BookID         • Book ID
                  • Text string – 8 digits
                  • Not null
   DateOfBorrow   • Date of the book being borrowed
                  • Date data type
                  • Not null
   Status         • Loan status (1 – on loan; 2 – returned; 3 – on hold)
                  • Text string – 1 digit
                  • Not null

   Table 6. Characteristics of the data fields in the LoanRecord table.

Commonly Used Data Types in SQL

The data type of a data item restricts the values that the data item can take and the operations which one can perform on that data item. Table 7 gives some of the commonly used data types in SQL.

   Data Type                 Description
   INTEGER or INT            Hold integers only. The three types differ in the minimum and
   SMALLINT                  maximum value that they can represent.
   TINYINT
   DECIMAL(size, decimal)    Hold numbers with fractions. The maximum number of digits is
   NUMERIC(size, decimal)    specified by size. The maximum number of decimal places is
                             specified by decimal.
   CHAR(size)                Hold a fixed length text string. The size of the fixed length
                             string is specified by size. Unused space is padded with space
                             characters.
   VARCHAR(size)             Hold a variable length string. The maximum size of the variable
                             length string is specified by size. Unused space is not padded
                             with any characters.
   DATE                      Date format may be different in various databases but they all
                             contain a calendar date with year, month and day.

   Table 7. Some basic data types used in SQL.

Note that the Boolean data type, which accepts TRUE or FALSE as its value, is not defined in SQL92, but in SQL99. However, some databases support the data type even though they do not conform to SQL99.

Teaching remarks
   • A character string stored in a CHAR column is left-justified and padded with trailing blanks to the length of the column. All the strings stored in a CHAR column have the same length. These trailing blanks are preserved in query results.
   • A character string stored in a VARCHAR column has exactly the same length as the source string or the expression that generated the string (including trailing blanks). Character strings stored in a VARCHAR column can vary in length.
   • A character string stored in a VARCHAR column incurs a 2-byte overhead. Do not use this data type for columns less than 6 bytes long or for columns that store strings of the same length. Use the CHAR data type instead.

SQL Statements

Creating Database Objects

Create a database

The CREATE DATABASE statement can be used to create a database with a specified name.

Syntax
   CREATE DATABASE database_name

Example
A database named “library_system” is created with the following statement.

   CREATE DATABASE library_system

Teaching remark
   • Some databases like Microsoft Access may require users to create a database through their own user interface instead of within an SQL environment.

Create a table in a database

The CREATE TABLE statement can be used to create a table with a specified name.

Syntax
   CREATE TABLE TableName (
      Column1 DataType1,
      Column2 DataType2,
      .......
   )

Example 1
Create a table called “Teacher” with two columns named “Name” and “Age” respectively.

   CREATE TABLE Teacher (
      Name varchar(30),
      Age int
   )

Sample Query: Q1_1_CreateTableTeacher

Result
An empty Teacher table with two fields – Name and Age – is created.

Teaching remark
   • The Teacher table is not required in the library system example. It is created to demonstrate another SQL statement which removes database tables.

Example 2
Create a table called “Book” that contains fields named “BookID”, “Title” and “Type” such that a value for “BookID” must be entered for each row and its value is unique within the table.

   CREATE TABLE Book (
      BookID char(8) NOT NULL UNIQUE,
      Title varchar(100),
      Type int
   )

Sample Query: Q1_2_CreateTableBook

Result
An empty Book table with three fields – BookID, Title and Type – is created.
The BookID field is mandatory (indicated by “NOT NULL”) and unique (indicated by “UNIQUE”) within the Book table. Creating Table with Integrity Rule Create table with primary key For each table, it is necessary to have a field or a combination of selected fields such that their values can be used to identify each table row uniquely. Such an identifier is known as a candidate key. The concept of candidate key is essential to good database design. The most commonly used candidate key of a table is typically selected to be the primary key of the table. The PRIMARY KEY keyword is used to specify the fields in a table that compose the table’s primary key. 78 Syntax CREATE TABLE TableName ( Column1 DataType, NOT NULL Column2 DataType, NOT NULL ....... PRIMARY KEY (Column1, Column2, …) ) Full Syntax Teaching remark  Technically, all fields in a primary key should be defined to be UNIQUE and NOT NULL. Although some databases like Microsoft Access 2003 may take all primary key fields as UNIQUE and NOT NULL even though they are not specified, it is a good practice to specify them explicitly. Example To create a table called “Student” with the primary key “StdID”, we can use the following statement: CREATE TABLE Student ( StdID char(7) NOT NULL UNIQUE, Name varchar(30), Class char(10), Age smallint, OverduePay decimal(5,2), PRIMARY KEY (StdID) ) Sample Query Q2_1_createStudent_PriKey Want to Try? Teaching remark  The length of the Class field is set to 10 characters long intentionally. characters long using another SQL statement later. We will alter it to 2 Create table with foreign key A foreign key (which may be composite) to another table ensures that the value of the foreign key field(s) can be found in the primary key of the foreign table. The following example shows how to create a table in a database with foreign key. 79 Syntax CREATE TABLE TableName1 ( Column1 DataType1, Column2 DataType2, ....... 
    FOREIGN KEY (ColumnX, ColumnY) REFERENCES TableName2 )

Example
In this example, we create a table “LoanRecord” with a primary key “LoanRecID” and two foreign keys, “StdID” and “BookID”, that reference the tables “Student” and “Book” respectively.
    CREATE TABLE LoanRecord ( LoanRecID char(8) NOT NULL, StdID char(7) NOT NULL, BookID char(8) NOT NULL, Dateofborrow date, Status char(1), PRIMARY KEY (LoanRecID), FOREIGN KEY (StdID) REFERENCES Student, FOREIGN KEY (BookID) REFERENCES Book )
Sample Query: Q2_2_CreateLoanRecord

The SQL script given above for creating the LoanRecord table cannot run successfully because a primary key has not been defined for the Book table created earlier. The problem must be rectified by altering the structure of the Book table before running the script again.

Important remark
• A special view on one or more tables in the database, in the form of a kind of “virtual” table, can be created with the CREATE VIEW statement. The data shown in the virtual table is extracted by a SELECT statement. Both the CREATE VIEW and SELECT statements will be covered later.

Modifying Table Structure
If required, a table structure can be altered with the use of various forms of the ALTER TABLE statement.

Add column
To add column(s) to a table, use ADD within the ALTER TABLE statement.

Syntax
    ALTER TABLE TableName ADD ColumnName DataType

Example
To add a column named “PhoneNo” to the “Student” table, we can use the following statement.
    ALTER TABLE Student ADD PhoneNo char(8)
Sample Query: Q3_1_AlterStudenttable

Drop column
To drop column(s) from a table, use DROP within the ALTER TABLE statement.

Syntax
    ALTER TABLE TableName DROP ColumnName

Example
To drop the column ‘Age’ from the “Student” table, we can use the following statement.
    ALTER TABLE Student DROP Age
Sample Query: Q3_2_Alterstudent_Drop
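The primary key, foreign key and ALTER TABLE … ADD statements shown so far can also be tried outside Access. The following is a minimal sketch that assumes Python's built-in sqlite3 module with an in-memory database. SQLite is only a stand-in here and differs from Access in places: it enforces foreign keys only after PRAGMA foreign_keys = ON, and it checks them when rows are inserted rather than when the table is created. The sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Parent tables with primary keys, as in the Student and Book examples.
conn.execute("""CREATE TABLE Student (
    StdID char(7) NOT NULL,
    Name varchar(30),
    Class char(10),
    PRIMARY KEY (StdID))""")
conn.execute("""CREATE TABLE Book (
    BookID char(8) NOT NULL,
    Title varchar(100),
    PRIMARY KEY (BookID))""")

# Child table whose foreign keys point at the parents' primary keys.
conn.execute("""CREATE TABLE LoanRecord (
    LoanRecID char(8) NOT NULL,
    StdID char(7) NOT NULL,
    BookID char(8) NOT NULL,
    PRIMARY KEY (LoanRecID),
    FOREIGN KEY (StdID) REFERENCES Student,
    FOREIGN KEY (BookID) REFERENCES Book)""")

# ALTER TABLE ... ADD appends a new column to an existing table.
conn.execute("ALTER TABLE Student ADD PhoneNo char(8)")
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]

conn.execute("INSERT INTO Student (StdID, Name) VALUES ('0002011', 'Chan Edward')")
conn.execute("INSERT INTO Book VALUES ('00000001', 'Apple Tree')")
conn.execute("INSERT INTO LoanRecord VALUES ('1', '0002011', '00000001')")  # accepted

try:
    # No student '9999999' exists, so referential integrity rejects this row.
    conn.execute("INSERT INTO LoanRecord VALUES ('2', '9999999', '00000001')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(student_columns, fk_rejected)
```

The rejected insert illustrates the purpose of a foreign key: a loan record can only refer to a student and a book that actually exist in the parent tables.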
Change columns’ data type
Apart from adding or dropping column(s) in a table, we can also change the data type and other characteristics of existing column(s) by using the ALTER TABLE … ALTER COLUMN statement.

Syntax
    ALTER TABLE TableName ALTER COLUMN Column1 NewDataType

Teaching remark
• If a new data type is set for an existing column, the values that already exist in the column must be compatible with the new data type. Otherwise, the query will not run successfully.

Example
To change the data type of ‘Class’ to char(2) in the “Student” table, we can use the following statement:
    ALTER TABLE Student ALTER COLUMN Class char(2)
Sample Query: Q3_3_Changedatatype

Change column(s) to NOT NULL

Syntax
    ALTER TABLE TableName ALTER COLUMN Column1 DataType NOT NULL

Example
To make the ‘Name’ column NOT NULL in the “Student” table, we can use the following statement:
    ALTER TABLE Student ALTER COLUMN Name varchar(30) NOT NULL
Sample Query: Q3_4_changefieldNotNull

Teaching remark
• In the MS Access 2003 interface, the item “Required” means a mandatory entry. In other words, the value for the field cannot be NULL, i.e., NOT NULL.

Add a primary key to an existing table
Apart from defining the primary key when creating a table, we can also add a primary key to an existing table.

Syntax
    ALTER TABLE TableName ADD PRIMARY KEY (ColumnName)

Example
    ALTER TABLE Book ADD PRIMARY KEY (BookID)
Sample Query: Q3_5_AddPriKey

Teaching remark
• As the Book table has a primary key now, the SQL script for creating the LoanRecord table that references the Book table can now run successfully.

Deleting Database Objects
If required, a database table or even the whole database can be deleted.

Delete a table
To delete a table, use the DROP TABLE statement.
Syntax
    DROP TABLE TableName

Example
    DROP TABLE Teacher
Sample Query: Q3_6_Droptable

Delete a database
We can delete the entire database with the DROP DATABASE statement.

Syntax
    DROP DATABASE DatabaseName

Example
    DROP DATABASE my_database

Teaching remarks
• The DROP DATABASE statement should be used very rarely.
• You will not be able to run the DROP DATABASE statement within the graphical user environment of MS Access 2003.

Adding Data to Tables
To insert data into a table, we can use the INSERT INTO statement. We can insert a complete new row, or a new row with values for specified fields only.

Insert new row

Syntax
    INSERT INTO TableName VALUES ( Value1, Value2, ....... )

Value1 is the value of the first field of the TableName table, in the order defined when the table was created. Value2 is the value of the second field of the table, and so on.

Example
The following statements insert data into the Student table.
    INSERT INTO Student VALUES ('0002011', 'Chan Edward', '1C', 12.5, '21238782');
    INSERT INTO Student VALUES ('0002012', 'Wong Wai Ming', '2B', 30.5, '21234456');
    INSERT INTO Student VALUES ('0002013', 'Cheung Ka Fai', '1C', 0, '23212321');
    INSERT INTO Student VALUES ('0002014', 'Chang Wai Yee', '4A', 20.5, '23123123');
    INSERT INTO Student VALUES ('0002015', 'Lee Oi Lam', '5C', 3, '25214123');
    INSERT INTO Student VALUES ('0002016', 'Sze Yuk Ki', '7B', 1.5, '26434534');
Sample Query: Q4_1_InsertData – Q4_6_InsertData

Insert new record with only specified column field(s)
The following statement shows how to insert a new record with values for specified column(s) only.

Syntax
    INSERT INTO TableName (Column1, Column2, ...) VALUES ( Value1, Value2, ....... )

Example
We insert the Book ID and titles of three books into the Book table. The book category information (stored in the Type field) is empty for the three books.
    INSERT INTO Book (BookID, Title) VALUES ('00000001', 'Apple Tree');
    INSERT INTO Book (BookID, Title) VALUES ('00000002', 'Bible');
    INSERT INTO Book (BookID, Title) VALUES ('00000003', 'Star Wing');
Sample Query: Q4_7_InsertSpecialField - Q4_9_InsertSpecialField

Retrieving Data from Database Table(s)
To select specific data from one or more tables, we can use the SELECT statement. The SELECT statement can be used in conjunction with other SQL statements to build sophisticated database queries.

Retrieve all fields from a table
In an SQL statement, the symbol “*” stands for “all fields”. The SELECT statement can be used to retrieve data from multiple tables, but we defer that discussion to a later stage. We can use the following statement to select all fields from a database table.

Syntax
    SELECT * FROM TableName

Example
The following SQL statement retrieves (and displays) all records in the Student table.
    SELECT * FROM Student
Sample Query: Q5_1_select

Retrieve value(s) from particular column(s) of a table
To select data from particular columns of a table, we can use the SELECT statement too.

Syntax
    SELECT Column1, Column2, … FROM TableName

Example
To select the ‘Name’ and ‘Class’ columns from the Student table, we can use the statement below.
    SELECT Name, Class FROM Student
Sample Query: Q5_2_SelectSpecificField

Teaching remark
• If the values of the selected columns are the same in different rows of the table, the same values will appear more than once in the output. To avoid the duplication, the SELECT DISTINCT statement is required.

Retrieve value(s) from particular column(s) of a table without duplication
The SELECT DISTINCT statement is used to select the value(s) of the specified column(s) with no duplication. The syntax of this statement is as follows.
Syntax
    SELECT DISTINCT Column1, Column2, … FROM TableName

Example
In this example, we would like to identify all students who have used the library service at least once. If we use the SELECT statement without the DISTINCT keyword, the same student may appear several times if that student has used the library services more than once. To avoid the duplication, we retrieve all distinct value(s) of the ‘StdID’ field from the LoanRecord table (see Table 5 for its content) with the SELECT DISTINCT statement.
    SELECT DISTINCT StdID FROM LoanRecord
Sample Query: Q5_3_SelectDistinct

Retrieve data with specified selection criteria
A WHERE clause can be appended to the basic SELECT statement to specify the condition(s) that the retrieved data need to fulfill. Rows that do not meet the specified condition(s) will not be retrieved. When more than one condition is specified, AND/OR may be used to join the conditions.

Syntax
    SELECT Column1, Column2, … FROM TableName WHERE Condition(s)

Common operators used in the WHERE clause are tabulated below.

Operator   Description
=          Equal to
<>         Not equal to
>          Greater than
<          Less than
>=         Greater than or equal to
<=         Less than or equal to
BETWEEN    Within the range
LIKE       Match the pattern

Example
In this example, we would like to retrieve the records of those students in the class “1C”.
    SELECT * FROM Student WHERE Class = '1C'
Sample Query: Q6_1_SelectwithCriteria

Teaching remark
• Except for numeric values, the operand(s) of the operator must be enclosed in a pair of single quotation marks (' ').

Creating and Deleting Data View

Create a data view
With the CREATE VIEW statement, users may create a special view on one or more tables (or views) in the database in the form of a new “virtual” table. The data view is defined by an associated SELECT statement. Most SQL statements that apply to a database table can also be applied to a data view.
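To make the idea of a “virtual” table concrete, here is a minimal sketch that assumes Python's built-in sqlite3 module and a cut-down, illustrative version of the Student and LoanRecord tables. The view name OnLoanView is hypothetical. The point shown is that a view stores the query rather than a copy of the rows, so a later change to a base table is reflected in the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID char(7) PRIMARY KEY, Name varchar(30));
CREATE TABLE LoanRecord (LoanRecID char(8) PRIMARY KEY, StdID char(7),
                         BookID char(8), Status char(1));
INSERT INTO Student VALUES ('0002011', 'Chan Ming Wai');
INSERT INTO Student VALUES ('0002012', 'Wong Wai Ming');
INSERT INTO LoanRecord VALUES ('1', '0002012', '00000001', '1');
INSERT INTO LoanRecord VALUES ('2', '0002011', '00000002', '2');

-- The view stores only the query, not a copy of the rows.
CREATE VIEW OnLoanView (StdID, Name, BookID) AS
    SELECT Student.StdID, Name, BookID
    FROM Student, LoanRecord
    WHERE Student.StdID = LoanRecord.StdID AND Status = '1';
""")

# A view is queried exactly like a table.
before = conn.execute("SELECT * FROM OnLoanView").fetchall()

# Changing the base table changes what the view returns.
conn.execute("UPDATE LoanRecord SET Status = '1' WHERE LoanRecID = '2'")
after = conn.execute("SELECT * FROM OnLoanView ORDER BY StdID").fetchall()

# Dropping the view leaves the base tables untouched.
conn.execute("DROP VIEW OnLoanView")
remaining_loans = conn.execute("SELECT COUNT(*) FROM LoanRecord").fetchone()[0]

print(before)
print(after)
print(remaining_loans)
```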
Syntax
    CREATE VIEW ViewName (Column1, Column2, …) AS Select-Statement;

Example
In this example, we create a data view that holds the Book ID of those library books that are currently on loan and their corresponding borrowers (Student ID and Name).
    CREATE VIEW BookOnLoan_n_Borrower_View (StdID, Name, BookID) AS SELECT Student.StdID, Name, BookID FROM Student, LoanRecord WHERE Student.StdID = LoanRecord.StdID AND Status = '1';
Sample Query: Q29_Create_View

Result
A data view known as BookOnLoan_n_Borrower_View is created.

Delete a data view
A data view can be deleted with the DROP VIEW statement.

Syntax
    DROP VIEW ViewName;

Example
The BookOnLoan_n_Borrower_View data view created earlier can be removed with the following SQL statement.
    DROP VIEW BookOnLoan_n_Borrower_View;
Sample Query: Q29_Drop_View

Result
The BookOnLoan_n_Borrower_View data view is removed.

Updating Data in a Table
Apart from retrieving data from a table, we can also modify selected data in a table by using the UPDATE statement, and delete selected row(s) from a table by using the DELETE statement.

Update the value in a column
To modify values in a selected column of one or more rows, we can use the UPDATE … SET statement.

Syntax
    UPDATE TableName SET Column = NewValue WHERE Condition(s)
The WHERE clause is optional. If the WHERE clause is not used, the value in the specified column of every row will be changed to the new value.

Example
Suppose we had wrongly entered ‘1C’ as the value of the ‘Class’ field for students in Class 2C (and no records for students from Class 1C have been entered). The problem can be rectified by the following SQL statement.
    UPDATE Student SET Class = '2C' WHERE Class = '1C';
Sample Query: Q7_1_updatetable

Teaching remark
• Except for numeric values, the operand(s) of the operator must be enclosed in a pair of single quotation marks (' ').
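The effect of the optional WHERE clause in UPDATE can be checked with a small experiment. This is a minimal sketch assuming Python's built-in sqlite3 module and three illustrative Student rows. The class value '3A' is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StdID char(7) PRIMARY KEY, Name varchar(30), Class char(2))"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [
        ("0002011", "Chan Edward", "1C"),
        ("0002012", "Wong Wai Ming", "2B"),
        ("0002013", "Cheung Ka Fai", "1C"),
    ],
)

# With a WHERE clause, only the rows that meet the condition are changed.
cur = conn.execute("UPDATE Student SET Class = '2C' WHERE Class = '1C'")
rows_with_where = cur.rowcount

# Without a WHERE clause, every row in the table is changed.
cur = conn.execute("UPDATE Student SET Class = '3A'")
rows_without_where = cur.rowcount

classes = [row[0] for row in conn.execute("SELECT DISTINCT Class FROM Student")]
print(rows_with_where, rows_without_where, classes)
```

Because an UPDATE without WHERE overwrites the column in every row, it is worth running the same condition in a SELECT first to see which rows would be affected.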
Update values in a number of columns
To modify values in a number of columns, we can use the following statement.

Syntax
    UPDATE TableName SET Column1 = NewValue1, Column2 = NewValue2 WHERE Condition(s)
The WHERE clause is optional. If the WHERE clause is not used, the values in the specified column(s) of every row will be changed to the new values.

Example
Suppose we have wrongly entered the name and phone number of the student with student ID ‘0002011’ in the Student table. We can use the UPDATE statement to fix the problem. The name and phone number of the student should be ‘Chan Ming Wai’ and ‘21111182’ respectively.
    UPDATE Student SET Name = 'Chan Ming Wai', PhoneNo = '21111182' WHERE StdID = '0002011';
Sample Query: Q7_2_Updateseveralcolumn

Delete record(s) from the table
To delete record(s) from a table, the DELETE statement can be used.

Syntax
    DELETE FROM TableName WHERE Condition(s)
The WHERE clause is optional.

Example 1
In this example, we would like to delete all records with the book ID ‘00000003’ from the “Book” table.
    DELETE FROM Book WHERE BookID = '00000003';
Sample Query: Q8_1_Deletefield

Teaching remark
• As the BookID field serves as a foreign key in the LoanRecord table to the Book table and there is a corresponding record with BookID equal to ‘00000003’ in the LoanRecord table, the above DELETE statement cannot be executed successfully. The corresponding rows in the LoanRecord table need to be removed first in order for the query to run successfully.

Example 2
To delete all records in the table, we can simply use the DELETE statement without any condition. After the following SQL statement runs successfully, the Book table will be empty.
    DELETE FROM Book
Sample Query: Q8_2_Deleteall
Teaching remark
• For the same reason as given in the previous teaching remark, the above DELETE statement cannot be executed successfully unless there are no corresponding rows in the LoanRecord table.

Result Presentation
Users may sometimes want to organize the result of a query in ascending or descending order of some selected fields. On other occasions, they may be interested in the value of some aggregated attribute of the retrieved data, e.g., the total number of books that a student has ever borrowed. The former can be achieved with the ORDER BY clause whereas the latter can be done with the GROUP BY clause, both in a SELECT statement.

The ORDER BY clause
A query result can be sorted in ascending or descending lexicographical order of one or more selected sort fields. Lexicographical ordering refers to how characters are ordered in the corresponding encoding table.

Syntax
    SELECT Column(s) FROM TableName ORDER BY Column1 [ASC|DESC], Column2 [ASC|DESC], ...
Optional parts are put inside square brackets. A vertical bar stands for disjunction. Thus [ASC|DESC] means that a user may use none of the keywords, or either one.

Teaching remarks
• The sort fields may or may not be selected for retrieval purposes.
• The default sorting order is ascending lexicographical order.

Example 1
Suppose we would like to sort all rows in the Student table in ascending order of the student name.
    SELECT * FROM Student ORDER BY Name
Sample Query: Q9_1_sort

Example 2
In the following example, we retrieve all rows in the Student table in ascending order of the Class field.
    SELECT * FROM Student ORDER BY Class ASC
Sample Query: Q9_2_SortASC

Example 3
In this example, we would like to sort all records in the Student table in two levels: first in descending order of the Class field, then in ascending order of the Name field.
    SELECT * FROM Student ORDER BY Class DESC, Name ASC
Sample Query: Q9_2_SortASC2

The GROUP BY … HAVING clause
To facilitate data analysis, it is sometimes necessary to group the result in a suitable way. This can be achieved with the GROUP BY clause.

Syntax
    SELECT Column(s) FROM TableName GROUP BY Column1, Column2, ... HAVING Condition(s)
The HAVING clause is optional.

Example 1
To count the number of students in each class, we can use the GROUP BY clause (without the HAVING part) as below:
    SELECT Class, COUNT(*) AS Num FROM Student GROUP BY Class ORDER BY Class DESC;
Sample Query: Q10_Group

The AS keyword enables a user to assign a new label to a selected object. In the above example, the output of the aggregate function COUNT(*), which counts the number of rows in each group (as specified by the GROUP BY clause), is labeled as ‘Num’.

HAVING is a clause that can only be used after the GROUP BY clause. It comes after GROUP BY (and before ORDER BY if that clause is needed as well). The purpose of HAVING is to set selection criteria based on some aggregate values. The following SQL query counts the number of students in each class whose students together owe the library more than 20 dollars in overdue fines.

Example 2
    SELECT Class, COUNT(*) AS Num FROM Student GROUP BY Class HAVING SUM(OverduePay) > 20 ORDER BY Class DESC;
Sample Query: Q10_Group_By-Having

Teaching remarks
• The WHERE clause sets selection criteria for the SELECT statement based on non-aggregate value(s) only. Any selection based on an aggregate value must be done with the HAVING clause. A common student mistake is to use aggregate function(s) in a WHERE clause. Aggregate functions do not work in a WHERE clause because it is given no information as to how records (i.e., table rows) are to be grouped.
Such grouping information is provided to the HAVING clause by the GROUP BY clause.
• The SELECT statement can reference values generated by the aggregate functions or columns specified in the GROUP BY clause only.
• The HAVING clause can reference values generated by the aggregate functions or columns specified in the GROUP BY clause only, as in the following example.
    SELECT Class, COUNT(*) AS Num FROM Student GROUP BY Class HAVING SUM(OverduePay) > 20 AND Class > '3' ORDER BY Class DESC;
• As shown in the above example, a column used as the parameter of an aggregate function in the HAVING clause need not be included among the columns selected by the SELECT statement.

Operators Used with WHERE
A number of operators can be used in conjunction with the WHERE clause to specify the condition(s) for data retrieval.

The LIKE operator
Earlier on, we learnt to use the SELECT statement with a WHERE clause to select data from one or more tables that meet specified condition(s). Most of those conditions require an exact match. Sometimes, we may want to retrieve data based on a partial match. This is supported in SQL by the LIKE operator. Wildcard characters (‘_’ and ‘%’) are used for specifying a retrieval pattern. The ‘_’ stands for any single character while the ‘%’ stands for any sequence of zero or more characters (including the empty string).

Teaching remarks
• LIKE can only be used with CHAR and VARCHAR field types.
• Unless the SQL-92 syntax is selected in Microsoft Access, the database uses ‘?’ and ‘*’ for ‘_’ and ‘%’ respectively.

Syntax
    SELECT Column(s) FROM TableName WHERE Column LIKE pattern

Example 1
In the following example, all student records with a name starting with ‘Ch’ are retrieved.
    SELECT * FROM Student WHERE Name LIKE 'Ch%';
Sample Query: Q11_like1

Example 2
In this example, we would like to select all student records in which the second letter of the student name is ‘h’ and the last letter is ‘i’.
    SELECT * FROM Student WHERE Name LIKE '_h%i';
Sample Query: Q11_like2

Example 3
The following query selects all student records with at least one ‘u’ character in the student name.
    SELECT * FROM Student WHERE Name LIKE '%u%'
Sample Query: Q11_like3

The IN operator
In a WHERE clause, IN can be used to specify a list of values for a selected column, so that only rows whose column value appears in the list are retrieved.

Syntax
    SELECT Column(s) FROM TableName WHERE Column IN (value1, value2, ...)
The value list (which is an operand) of the IN operator can be listed explicitly as shown in the above syntax or generated by another SELECT statement. The latter is known as a nested SELECT statement, which will be covered later.

Example
We use the IN operator to select records of student(s) whose name is ‘Cheung Ka Fai’ or ‘Wong Wai Ming’.
    SELECT * FROM Student WHERE Name IN ('Cheung Ka Fai', 'Wong Wai Ming');
Sample Query: Q12_in

The BETWEEN operator
We can specify a range of values for a selected column using the BETWEEN operator within a SELECT statement, so that the corresponding field values of the retrieved rows must lie within the specified range, boundaries included.

Syntax
    SELECT Column(s) FROM TableName WHERE Column BETWEEN value1 AND value2

Example
We use the BETWEEN operator to select students with a student ID between 0002013 and 0002015.
    SELECT * FROM Student WHERE StdID BETWEEN '0002013' AND '0002015';
Sample Query: Q13_between

Teaching remark
• The result of the above query may differ between databases, as some may include the boundary records while some may not. However, according to the SQL-92 and SQL-99 standards, boundary records are to be included.

The AND operator
By using the AND operator, we can require the retrieved row(s) to meet a number of filtering conditions simultaneously.
Syntax
    SELECT Column FROM TableName WHERE Condition1 AND Condition2

Example
The following query retrieves student record(s) such that the student is in class 2C and has an overdue fine to settle.
    SELECT * FROM Student WHERE Class = '2C' AND OverduePay > 0
Sample Query: Q14_AND

The OR operator
By using the OR operator, we can select data rows for which at least one of its operands (each of which is a condition) is fulfilled.

Syntax
    SELECT Column FROM TableName WHERE Condition1 OR Condition2

Example
The following query retrieves the student records from the Student table such that the student is either a member of Class 2C or named “Chang Wai Yee”.
    SELECT * FROM Student WHERE Class = '2C' OR Name = 'Chang Wai Yee';
Sample Query: Q15_OR

Example - using both AND and OR operators in a query
Retrieve the student record of a Class 2C student whose name is “Chan Ming Wai” and the record of another student whose name is “Chang Wai Yee”.
    SELECT * FROM Student WHERE (Class = '2C' AND Name = 'Chan Ming Wai') OR Name = 'Chang Wai Yee';
Sample Query: Q16_ANDOR

Add alias to a column
Sometimes, the column names of the resultant table may not be expressive enough for display purposes. In this case, we can assign an alias to a column of the resultant table using the AS keyword.

Syntax
    SELECT Column1 AS ColumnAlias1, Column2 AS ColumnAlias2, ... FROM TableName

Example
The following query assigns more meaningful labels to the fields retrieved from the Student table.
    SELECT StdID AS Student_ID, Name AS Student_Name, PhoneNo AS Phone_Number FROM Student;
Sample Query: Q17_aliases

Joining Tables
Sometimes, we may need to retrieve data from two or more tables. In this case, we can join the tables using their relevant field(s). In most cases, tables are joined according to search conditions that find only the rows with matching values; this type of join is known as an inner equijoin.
Occasionally, non-equijoins, for example those that express a greater-than or less-than relationship, may be used. On other occasions, decision-support analysis may require outer joins, which retrieve both matching and non-matching rows. The three types of outer joins are left outer join, right outer join, and full outer join.

Equijoin
We can retrieve data from tables by setting up a retrieval condition that requires column values of the “joined” tables to be equal. In brief, an equijoin is a join in which rows from two tables are combined and added to the result set when there are equal values in the joined columns.

Syntax
    SELECT TableName1.Column11, TableName1.Column12, ..., TableName2.Column21, TableName2.Column22, ... FROM TableName1, TableName2 WHERE equality_condition(s)

Example 1 (equijoin with repeated column)
In this example, we find details of students and the library services that they have accessed (i.e., borrowing, returning or reserving a book). The output will not include details of any student who has never used a library service. To do so, we retrieve all details from the LoanRecord table and the Student table where the values of column “StdID” in both tables are equal.
    SELECT * FROM LoanRecord, Student WHERE LoanRecord.StdID = Student.StdID;
Sample Query: Q18_EJoin

In the above example, the “StdID” field occurs twice in the equijoin output as it can be found in both the LoanRecord and Student tables. Obviously there is no point in repeating the same piece of information. One of the two identical columns can be eliminated by changing the SELECT list. The result is called a natural join. More exactly, the natural join operation produces a Cartesian product of its two argument tables, performs a selection that enforces equality on attributes that appear in both tables, and removes duplicate attributes at the end.
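The three stages described above (Cartesian product, equality selection, removal of the duplicate column) can be observed directly by comparing column lists and row counts. The following is a minimal sketch assuming Python's built-in sqlite3 module and a few illustrative rows. The helper columns_and_rows is hypothetical, and SQLite's NATURAL JOIN is used only as a stand-in since Access 2003 does not provide it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID char(7) PRIMARY KEY, Name varchar(30));
CREATE TABLE LoanRecord (LoanRecID char(8) PRIMARY KEY, StdID char(7), BookID char(8));
INSERT INTO Student VALUES ('0002011', 'Chan Ming Wai');
INSERT INTO Student VALUES ('0002014', 'Chang Wai Yee');
INSERT INTO LoanRecord VALUES ('2', '0002011', '00000002');
INSERT INTO LoanRecord VALUES ('5', '0002011', '00000002');
""")

def columns_and_rows(sql):
    """Run a query and return its column names together with its rows."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# 1. No join condition: every loan record is paired with every student.
cart_cols, cart_rows = columns_and_rows("SELECT * FROM LoanRecord, Student")

# 2. Equijoin: only pairs with equal StdID survive, but StdID appears twice.
equi_cols, equi_rows = columns_and_rows(
    "SELECT * FROM LoanRecord, Student WHERE LoanRecord.StdID = Student.StdID"
)

# 3. Natural join: same rows as the equijoin, with the shared column shown once.
nat_cols, nat_rows = columns_and_rows("SELECT * FROM LoanRecord NATURAL JOIN Student")

print(len(cart_rows), len(equi_rows), len(nat_rows))
print(len(equi_cols), len(nat_cols))
```

With two students and two loan records, the Cartesian product has four rows while both joins keep only the two matching pairs. The student without any loan record does not appear in either join.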
Example 2 (natural join)
Suppose we want to find not only the list of students who have accessed library services, but also the titles of the books each student borrowed, returned or reserved. To do this, we join all three tables with the following query.
    SELECT LoanRecord.LoanRecID, Student.Name, Book.Title FROM LoanRecord, Student, Book WHERE LoanRecord.StdID = Student.StdID AND LoanRecord.BookID = Book.BookID;
Sample Query: Q19_EJoin2

By selecting the fields of interest only, the repeated occurrences of the same piece of information shown in the previous example disappear, i.e. the result is a natural join. In that sense, the natural join is a subtype of the equijoin.

Teaching remark
• In SQL, all join conditions are to be specified explicitly. The fact that two tables have the same attribute name (e.g. StdID in the LoanRecord and Student tables) does not mean that a join will be done between them automatically. Omitting the join conditions when joining tables will result in an output that corresponds to the Cartesian product of the rows in the selected tables.

The NATURAL JOIN operator
A NATURAL JOIN operation uses the columns that have the same name (and type) in both tables to perform an equijoin. However, it relies on the SELECT statement to avoid retrieving the same pieces of information when implementing the natural join operation.

Syntax
    SELECT TableName1.Column11, TableName1.Column12, ... FROM TableName1 NATURAL INNER JOIN TableName2

Example
In this example, we list the students who have made use of the library service at least once, just as in the second equijoin example. This time, however, we do the query with a natural join.
    SELECT DISTINCT Student.Name FROM Student NATURAL INNER JOIN LoanRecord

Result
Name
Chan Ming Wai
Cheung Ka Fai
Wong Wai Ming

Note that the multiple occurrences of output records are eliminated with the use of DISTINCT.

Teaching remark
• NATURAL JOIN is not supported by Access 2003.
However, it is easy to model the NATURAL JOIN operation with the INNER JOIN operation, as shown in the next section.

The INNER JOIN operator
Rows of two tables can be joined together by using the INNER JOIN operator when the selected rows meet some specified condition(s). Rows that fail to meet the conditions will not be selected. As the prevailing condition type is the test of equality, INNER JOIN is often used to implement the concept of equijoin.

Syntax
    SELECT Column1, Column2, … FROM TableName1 INNER JOIN TableName2 ON Condition(s)

Example
The query below models the NATURAL JOIN example given in the last section.
    SELECT DISTINCT Student.Name FROM Student INNER JOIN LoanRecord ON (Student.StdID = LoanRecord.StdID)
Sample Query: Q19_InnerJoin

The LEFT (OUTER) JOIN operator
The result of a LEFT JOIN operation contains every row from the first table and all matching rows in the second table. Rows found only in the second table are not displayed. If a row in the first table has no match in the second table, the fields corresponding to the second table in the output row will be filled with null.

Syntax
    SELECT Column1, Column2, ... FROM TableName1 LEFT JOIN TableName2 ON Condition(s)

Example
To view all library services that the students have accessed as well as those students who have not made use of the library services at all, we can use the following query.
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID FROM Student LEFT JOIN LoanRecord ON LoanRecord.StdID = Student.StdID;
Sample Query: Q19_LeftJoin

The RIGHT (OUTER) JOIN operator
The RIGHT JOIN returns all the rows contained in the second table and all matching rows in the first table. If there is no match in the first table, the fields corresponding to the first table in the output row will be given a null.

Syntax
    SELECT Column1, Column2, ...
    FROM TableName1 RIGHT JOIN TableName2 ON Condition(s)

Example
To view the student ID, student name, and the library services that each student has made use of, we may use the following query.
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID FROM Student RIGHT JOIN LoanRecord ON LoanRecord.StdID = Student.StdID;
Sample Query: Q19_RightJoin

The FULL (OUTER) JOIN operator
Unlike LEFT JOIN and RIGHT JOIN, each of which leaves out the non-matching rows of one table, the result of the FULL (OUTER) JOIN contains those rows that are unique to each table as well as those rows that are common to both tables. The fields corresponding to any non-matching table rows will be given a null in the relevant output rows.

Syntax
    SELECT Column1, Column2, ... FROM TableName1 FULL OUTER JOIN TableName2 ON Condition(s)

Example
We modify the RIGHT JOIN example by replacing the RIGHT JOIN with a FULL JOIN.
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID FROM Student FULL JOIN LoanRecord ON LoanRecord.StdID = Student.StdID

Result
StdID     Name            LoanRecID
0002011   Chan Ming Wai   2
0002011   Chan Ming Wai   5
0002014   Chang Wai Yee
0002013   Cheung Ka Fai
0002015   Lee Oi Lam
0002016   Sze Yuk Ki
0002012   Wong Wai Ming   1
0002012   Wong Wai Ming   3
                          4

Teaching remark
• FULL (OUTER) JOIN is not supported by Access 2003. However, it is easy to model the FULL (OUTER) JOIN operation by “integrating” the results of the LEFT JOIN operation and the RIGHT JOIN operation, as shown in the next section.

Combining Query Results
Results of two queries can be merged into a single resultant table through UNION and INTERSECT. The former puts all component query results into the resultant table whereas the latter keeps only rows that appear in the results of both component queries. The results of one query can be removed from those of another using MINUS.
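The three ways of combining results described above can be compared side by side on one small data set. This is a minimal sketch assuming Python's built-in sqlite3 module and illustrative rows. SQLite spells the difference operator EXCEPT (MINUS is the Oracle spelling), and the helper ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdID char(7) PRIMARY KEY, OverduePay decimal(5,2))")
conn.execute("CREATE TABLE LoanRecord (LoanRecID char(8) PRIMARY KEY, StdID char(7))")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("0002011", 12.5), ("0002012", 30.5), ("0002013", 0), ("0002014", 20.5)],
)
conn.executemany(
    "INSERT INTO LoanRecord VALUES (?, ?)",
    [("1", "0002012"), ("2", "0002011"), ("3", "0002012"),
     ("4", "0002013"), ("5", "0002011")],
)

borrowers = "SELECT StdID FROM LoanRecord"                   # five rows, with repeats
debtors = "SELECT StdID FROM Student WHERE OverduePay > 0"   # three rows

def ids(sql):
    """Return the single-column result of a query as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

either = ids(f"{borrowers} UNION {debtors}")              # in at least one result
both = ids(f"{borrowers} INTERSECT {debtors}")            # in both results
borrowed_no_fine = ids(f"{borrowers} EXCEPT {debtors}")   # in the first result only

print(either)
print(both)
print(borrowed_no_fine)
```

Although the LoanRecord query alone returns five rows, each student ID appears only once in every combined result, because all three operators eliminate duplicate rows.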
Another way to combine query results is the nested query, which is implemented with multiple SELECT statements. In a nested query, the result of a SELECT statement is used as a part of another SELECT statement.

The UNION operator
It may be useful to merge the results of two queries to form a single output table. This can be done with the UNION operator. UNION only works if each query in the statement has the same number of columns, and each pair of corresponding columns is of the same type. When using UNION, all duplicate output rows are eliminated.

Syntax
    SQL_Statement1 UNION SQL_Statement2

Example
The following example implements the FULL OUTER JOIN example using LEFT JOIN, RIGHT JOIN and UNION.
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID FROM Student LEFT JOIN LoanRecord ON LoanRecord.StdID = Student.StdID
    UNION
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID FROM Student RIGHT JOIN LoanRecord ON LoanRecord.StdID = Student.StdID;
Sample Query: Q20_Union

The INTERSECT operator
The INTERSECT operator returns only those rows that are common to the results returned by two or more query expressions. INTERSECT only works if each query in the statement has the same number of columns, and each pair of corresponding columns is of the same type. When using INTERSECT, all duplicate output rows are eliminated.

Syntax
    SQL_Statement1 INTERSECT SQL_Statement2

Example
The following query identifies those students whose names contain both of the substrings “Wai” and “Chan”.
    SELECT Name FROM Student WHERE Name LIKE '%Wai%'
    INTERSECT
    SELECT Name FROM Student WHERE Name LIKE '%Chan%'

Result
Name
Chan Ming Wai
Chang Wai Yee

Teaching remark
• INTERSECT is not supported by Access 2003.

The EXCEPT/MINUS operator
The EXCEPT/MINUS operator returns only those rows that appear in the first query results but not the second query results.
When using EXCEPT/MINUS, all duplicating output rows are eliminated. EXCEPT is defined in SQL92 whereas MINUS is used by Oracle for the same purpose. Syntax SQL_Statement1 EXCEPT SQL_Statement2 Example The following query identifies those students whose names have the substring “Wai” but not “Chan”. SELECT Name FROM Student WHERE Name LIKE ‘%Wai%’ EXCEPT SELECT Name FROM Student WHERE Name LIKE ‘%Chan%’ Want to Try? 109 Result Name Wong Wai Ming Teaching remark  EXCEPT/MINUS is not supported by Access 2003. Using nested SELECT statement Besides using UNION function, we can use the nested SELECT statement to combine query results in a way that the result of a SELECT statement can be used as the values of some input parameter for another SELECT statement. For simplicity, we give the syntax of nested SELECT statements that use the =ANY operator or the IN operator only. Some other commonly used operators in nested SELECT statement are >ALL, <ALL, >=ALL, and <=ALL. The latter two are widely used to find the maximum and minimum values from a list of selected values respectively (see Example2) Syntax (=ANY) SELECT (Column1, Column2, ...) FROM TableName1,TableName2 WHERE Column =ANY SELECT (Column1, Column2, ...) FROM TableName3 Full Syntax Note that the =ANY operator can be replaced by the IN operator in the above case. Example1 In this example, we would like to find students in 2C class who had used some library service(s) before. SELECT name FROM Student WHERE (StdID =ANY (SELECT StdID FROM LoanRecord)) AND Class = '2C'; Sample Query Q21_NestSelect Another way to implement the above query is as follows: SELECT DISTINCT Name FROM Student, LoanRecord WHERE Student.StdID = LoanRecord.StdID AND Class = '2C'; To save the effort of referencing the LoanRecord table given earlier, its table content is displayed again as below. 
Table “LoanRecord”
LoanRecID | StdID | BookID | DateOfBorrow | Status
1 | 0002012 | 00000001 | 20051001 | 1
2 | 0002011 | 00000002 | 20020112 | 2
3 | 0002012 | 00000003 | 20031211 | 2
4 | 0002013 | 00000002 | 20031001 | 2
5 | 0002011 | 00000002 | 20051018 | 1

Example2
In this example, we would like to find the student who owes the largest amount of overdue fines to the library.

SELECT Name, OverduePay
FROM Student
WHERE OverduePay >=ALL (SELECT OverduePay FROM Student)

Sample Query: Q21_NestSelect_2

Arithmetic Operators/Functions
The following arithmetic operators/functions can be used in relevant expressions within a SQL statement.

Operator/Function | Description
+ | The arithmetic add operator or unary plus operator
- | The arithmetic subtract operator or unary minus operator
* | The multiply operator (also a shorthand for all columns after the SELECT keyword)
/ | The arithmetic divide operator
ABS(numeric-expression) | The ABS() function returns the absolute value of a numeric expression.

Table 8. Some SQL arithmetic operators/functions.

For example, if the loan period for a library item is 28 days, we can use NOW()+28 to compute the due date for return if the item is loaned to a library user now.

String Functions
Some SQL-92 string functions are included below.

Function: CHAR_LENGTH(string-expression) or CHARACTER_LENGTH(string-expression) for SQL92; LENGTH(string-expression) for Oracle; LEN(string-expression) for Access
Description: Returns the number of characters in a string-expression.
Example: The following statement will return the value 7.
SELECT CHAR_LENGTH('library')

Function: LOWER(string-expression) for SQL92; LCASE(string-expression) for Access
Description: Converts all letters in a string to lower case.
Example: The following statement will return 'library'.
SELECT LOWER('Library')

Function: UPPER(string-expression) for SQL92; UCASE(string-expression) for Access
Description: Converts all letters in a string to upper case.
Example: The following statement will return 'LIBRARY'.
SELECT UPPER('Library')

Function: TRIM(string-expression) for both SQL92 and Access
Description: Removes leading and trailing blanks from a string.
Example: The following statement will return the value 9.
SELECT CHAR_LENGTH(TRIM(' chocolate '))

Function: SUBSTRING/SUBSTR(string-expression, start, length) for SQL92; MID(string-expression, start, length) for Access
Description: Returns a substring of a string.
Example: The following statement will return 'library'.
SELECT SUBSTRING('library system', 1, 7)

Table 9. Some SQL string functions.

Teaching remark
It appears that many databases introduce their own built-in functions, although many of those functions in fact offer the same functionality as the corresponding SQL-92 built-in functions. It is important to check carefully before teaching the topic.

Aggregate Functions
Aggregate functions enable the user to perform calculations over more than one record, such as finding the maximum, the minimum or the average.

Function | Usage
AVG(expression) | Computes the average value of a column by the expression
COUNT(expression) | Counts the rows defined by the expression
COUNT(*) | Counts all rows in the specified table or view
MIN(expression) | Finds the minimum value in a column by the expression
MAX(expression) | Finds the maximum value in a column by the expression
SUM(expression) | Computes the sum of column values by the expression

Table 10. Some SQL aggregate functions.

The AVG function
The AVG function returns the average value of the selected column. Note that any null values will not be included in the calculation.

Syntax
SELECT AVG(Column) FROM TableName

Example
We can calculate the average overdue payment per library user by using the AVG function.

SELECT AVG(OverduePay) AS Average_Overdue_Payment
FROM Student

Sample Query: Q22 AVG

Note that the overdue payment of Cheung Ka Fai (student ID 0002013) is set to zero, and thus the computation of the average value has included this value.
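The difference between a zero and a null in AVG can be checked with a small sketch. It assumes Python's built-in sqlite3 module rather than Access, and the fine amounts are made up for illustration (only the zero for student 0002013 comes from the notes); they are chosen so that the five non-zero fines total 68. The helper `average_fine` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdID TEXT PRIMARY KEY, OverduePay REAL)")

# Hypothetical fine amounts; they are not the sample database's real values.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [
        ("0002011", 10),
        ("0002012", 20),
        ("0002013", 0),   # zero fine: still counted as a value by AVG
        ("0002014", 15),
        ("0002015", 8),
        ("0002016", 15),
    ],
)


def average_fine():
    """Return AVG(OverduePay) over the whole Student table."""
    return conn.execute("SELECT AVG(OverduePay) FROM Student").fetchone()[0]


avg_with_zero = average_fine()  # 68 divided by 6 rows

# Replace the zero with a null: AVG now skips that row entirely.
conn.execute("UPDATE Student SET OverduePay = NULL WHERE StdID = '0002013'")
avg_with_null = average_fine()  # 68 divided by 5 non-null rows

print(avg_with_zero, avg_with_null)
```

The total of the fines is the same in both cases; only the number of rows that AVG divides by changes.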
If the field is set to null, the average overdue payment per user will be 13.6 instead.

The COUNT function
The COUNT function is useful when we would like to find the total number of records retrieved subject to certain selection criteria. There are two kinds of COUNT function: COUNT(Expression) and COUNT(*). The first counts the non-null values returned by evaluating the Expression (which is typically a column name). The second returns the total number of records that satisfy the selection criteria specified in the SELECT statement, whether or not the records contain nulls.

Syntax - COUNT(column)
SELECT COUNT(Column) FROM TableName

Example
The following query counts the number of inactive library users who have never made use of any library service.

SELECT Count(*)-Count(LoanRecord.Status) AS Number_of_idle_users
FROM Student LEFT JOIN LoanRecord
ON Student.StdID = LoanRecord.StdID;

Sample Query: Q23_COUNT

Syntax - COUNT(*)
SELECT COUNT(*) FROM TableName

Example
The following query counts the number of students whose names start with “Chan”.

SELECT Count(*) AS Number_of_students
FROM Student
WHERE (Student.Name) LIKE "Chan%";

Sample Query: Q24_COUNT

The MAX function
The MAX function finds the maximum value in a selected table column.

Syntax
SELECT MAX(Column) FROM TableName

Example
The following query finds the student who owes the library the largest amount of overdue fines.

SELECT Name, OverduePay
FROM Student
WHERE OverduePay = (SELECT MAX(OverduePay) FROM Student)

Sample Query: Q25_MAX

The following SQL script implements the same query without using the MAX function.

SELECT Name, OverduePay
FROM Student
WHERE OverduePay >=ALL (SELECT OverduePay FROM Student)

Teaching remark
Many students may produce a SQL script similar to the one below for the above query.

SELECT Name, MAX(OverduePay)
FROM Student

The above query violates the syntactic rules of SQL. The problem lies in the fact that a number of student names can be retrieved from the Student table (which correspond to several rows in the output table), whereas an aggregate function such as MAX() returns exactly one row in the output table when no GROUP BY clause is used.

The MIN function
The MIN function finds the minimum value in a selected table column.

Syntax
SELECT MIN(Column) FROM TableName

Example
The following query finds the student who owes the library the smallest amount of overdue fines.

SELECT Name, OverduePay AS Overdue_Fine
FROM Student
WHERE OverduePay = (SELECT MIN(OverduePay) FROM Student)

Sample Query: Q26_MIN

Teaching remark
The following SQL script implements the same query without using the MIN function.

SELECT Name, OverduePay
FROM Student
WHERE OverduePay <=ALL (SELECT OverduePay FROM Student)

The SUM function
The SUM function computes the sum of the values in a selected column.

Syntax
SELECT SUM(Column) FROM TableName

Example
The query below gives the total amount of outstanding overdue fines for each class of students.

SELECT Class, SUM(OverduePay) AS Overdue_Fine
FROM Student
GROUP BY Class

Sample Query: Q27_SUM

Miscellaneous SQL Features

Create/Drop Table Index
Indexing is commonly used to enhance the performance of a database system. With the CREATE INDEX statement, we can create indexing structures (pre-processed lists) for database tables so as to provide an efficient access path to the table rows. When running a query, the database will examine any relevant index for more efficient data access instead of traversing the entire table. To delete an index, the DROP INDEX statement can be used.

Syntax (CREATE INDEX)
CREATE INDEX IndexName ON TableName(Column1, Column2, ...)

Syntax (DROP INDEX)
DROP INDEX IndexName ON TableName

Example (CREATE INDEX)
In the following, we create an index for the class and student number combination in the Student table, as the two fields are often accessed by various queries.
CREATE INDEX ind_class_stdID ON Student (class, StdID)

Sample Query: Q28_CREATE_INDEX

Example (DROP INDEX)
The following deletes the index that was created in the previous example.

DROP INDEX ind_class_stdID ON Student

Sample Query: Q28_DROP_INDEX

Exporting Data from MS Access
Most databases are equipped with some data export facility so that data within a database can be “exported” for use by other applications. Many of them allow the export not only of data but also of table structures, queries and other database objects. These features enable users to migrate their data from one database to another. In this section, we briefly describe the data export facility in Microsoft Access. Specifically, it allows its users to export data in text, HTML or Microsoft Excel format.

Export Data from an MS Access Database to Another Access Database
1. Open the existing MS Access database and select the database object that you want to export by clicking on it.
2. Click File > Export from the menu bar.
3. Enter the file name of another Access database (.mdb) and click the Export button.

Export Data from an MS Access Database in other file formats
1. Open the existing MS Access database and select the database object that you want to export by clicking on it.
2. Click File > Export from the menu bar.
3. Change the file format to “TEXT”, “HTML” or “EXCEL”.
4. Enter the file name and click the Export button.

Database Class Practice Activities 01
http://www.yll.edu.hk/~yll-cym/ca/download/database_activity_01.mdb

There are 3 tables in a database; their structures are shown below:

CLUB
Field | Data type | Width | Description
studID | character | 20 | ID of the students
clubname | character | 20 | The name of the club
fee | character | 1 | ‘Y’ if the fee is settled, ‘N’ otherwise
position | character | 20 | The position of the students in that club

Info
Field | Data type | Width | Description
sex | character | 1 | Sex of the students
name | character | 20 | Name of the students
address | character | 100 | District, e.g. “Yuen Long”, “Tin Shui Wai”
class | character | 3 | The class, e.g. 1A, 2A
classno | character | 2 | The class number
studID | character | 7 | ID of the students

Result
Field | Data type | Width | Description
subj | character | 20 | The name of the subject
mark | integer | / | The mark of the student for that subject
studID | character | 7 | The ID of the student

Step / Description of the requirement / Corresponding SQL

1. Create a table called club with the above structure:
CREATE TABLE club ( studID varchar(20), clubname varchar(20), fee char(1), position varchar(20) )

2. Add a new field called “skill” to the table “club”; it is a character field with a width of 20 characters.
ALTER TABLE club ADD skill varchar(20)

3. Reduce all the marks by 10 for the subject “chin” in the table “result”.
UPDATE result SET mark = mark-10 WHERE subj="chin"

4. Reset the table “club” such that every position is changed to “senior member”.
UPDATE club SET position = 'senior member'

5. Set the skill to “violin” for the student whose studID is “2006004” in the table club.
UPDATE club SET skill = 'violin' WHERE studID='2006004';

6. Set the fee to ‘N’ for clubname = ‘maths’ in the table “club”.
UPDATE club SET fee = 'N' WHERE clubname="maths";

7. Insert a new record with the following information:
studID | clubname | fee | position | skill
2006004 | music | N | member | piano
INSERT INTO club VALUES ('2006004', 'music', 'N', 'member', 'piano');

8. Insert a new record with the following information:
clubname | studID | position | fee
chem | 2006007 | member | Y
INSERT INTO club ( clubname, studID, position, fee ) VALUES ('chem', '2006007', 'member', 'Y');

9. Remove the field “skill” from the table club.
ALTER TABLE club DROP skill

10. Create a table called staff with the following structure:
Field | Data type | Width | Description
id | character | 5 | ID of the staff
name | character | 20 | Name of the staff
dob | date | / | Date of birth
salary | numeric | 6, zero decimal | Salary of the staff
CREATE TABLE staff ( id char(5), name varchar(20), dob date, salary numeric(6,0) )

11. List all the data from the table “info”.
SELECT * FROM info;

12. List the fields “class” and “name” from the table “info”.
SELECT class, name FROM info;

13. In each class, there are students from different districts (i.e. the field address). Show the class and the address with no duplication.
SELECT DISTINCT class, address FROM info

14. Show the names of the students and their classes, where the address contains the letter “e” and the name starts with the letter “M”.
SELECT name, class FROM info WHERE address LIKE "%e%" AND name LIKE "M%"

15. Show the studID and the mark, selecting only those marks that are in the range from 75 to 95.
SELECT studID, mark FROM result WHERE mark BETWEEN 75 AND 95

16. What does the following SQL statement mean?
SELECT studID, mark FROM result WHERE subj="Eng" ORDER BY mark;
It shows the student ID (studID) and the mark of each student for the subject “Eng”; the list is sorted by mark in ascending order.

19.
Show the address, class and name of the students. The list should be sorted by address in descending order, then by class and then by name in ascending order (the diagram on the right shows the result):
SELECT address, class, name FROM info ORDER BY address DESC, class, name;

20. For each subject, show the average mark.
SELECT subj, AVG(mark) FROM result GROUP BY subj;

21. Show the average mark for each student ID (studID) from the table “result”.
SELECT studID, AVG(mark) FROM result GROUP BY studID;

22. Show how many 1A students live in each district (the field address in the table “info”).
SELECT address, COUNT(*) FROM info WHERE class="1A" GROUP BY address;

23. For each district, show the number of students who are living in that district. Name the field address as “district” and Count(*) as “cnt”.
SELECT address AS district, COUNT(*) AS cnt FROM info GROUP BY address HAVING COUNT(*)<5;

26. For each student, show his or her average mark.
SELECT studID, AVG(mark) FROM result GROUP BY studID HAVING AVG(mark)>70;

27. Show each student's marks in each subject. [Use Inner Join]
SELECT info.class, info.name, result.subj, result.mark FROM info, result WHERE info.studID=result.studID;

28. Show a list of Form 1 girls whose English mark is at least 80.
SELECT info.class, info.name, result.mark FROM info, result WHERE info.studID=result.studID And info.sex='F' And result.mark>=80 And result.subj='Eng' And class Like "1%";

29. Show a list of students and their number of subjects in the table “Result”. For example, Ming Chan has marks in both of the subjects “Eng” and “Chin”, so Ming Chan will have 2 in number_subj.
SELECT info.class, info.classno, info.name, COUNT(*) AS number_subj FROM info, result WHERE info.studID=result.studID GROUP BY info.class, info.classno, info.name

30. Show a list of the students' average marks on a class and subject basis.
SELECT info.class, result.subj, ROUND(AVG(result.mark),2) AS average FROM info, result WHERE info.studID=result.studID GROUP BY info.class, result.subj;

31. Show a list of the students who have failed, together with the subjects they failed.
SELECT info.class, info.name, result.subj, result.mark FROM info, result WHERE result.studID=info.studID And result.mark<50;

32. Show a list of students who have not joined any club in the school. [Hint: You should focus on how to find all those studID values that do not appear in the table “club”.]
SELECT studID, name FROM info WHERE studID NOT IN (SELECT studID FROM club);

33. Show a list of the IDs of students who are members of both the club “eng” and the club “chin”.
SELECT studID FROM club WHERE clubname = 'eng' AND studID IN (SELECT studID FROM club WHERE clubname='chin');

34. Show a list of the IDs of students who are members of ‘eng’ or members of ‘chin’. [Note: Try to use the command ‘UNION’ instead of ‘OR’.]
SELECT studID FROM club WHERE clubname = 'eng' UNION SELECT studID FROM club WHERE clubname = 'chin';

35. Show a list of student IDs (and their marks) whose mark is higher than the average in the subject ‘eng’.
SELECT studID, mark FROM result WHERE subj = 'eng' AND mark > (SELECT AVG(mark) FROM result WHERE subj ='eng');

36. Show a list of students and the clubs they have joined by making use of the command LEFT JOIN.
SELECT info.name, club.clubname FROM info LEFT JOIN club ON club.studID=info.studID;

Database Class Practice Activities 02
http://www.yll.edu.hk/~yll-cym/ca/download/database_activity_02.mdb

There are 5 tables in a database; their structures are shown below:

CLUB
Field | Data type | Width | Description
studID | character | 20 | ID of the students
clubname | character | 20 | The name of the club
fee | character | 1 | ‘Y’ if the fee is settled, ‘N’ otherwise
position | character | 20 | The position of the students in that club

Info
Field | Data type | Width | Description
sex | character | 1 | Sex of the students
name | character | 20 | Name of the students
address | character | 100 | District, e.g. “Yuen Long”, “Tin Shui Wai”
class | character | 3 | The class, e.g. 1A, 2A
classno | character | 2 | The class number
studID | character | 7 | ID of the students

Result
Field | Data type | Width | Description
subj | character | 20 | The name of the subject
mark | integer | / | The mark of the student for that subject
studID | character | 7 | The ID of the student

Teacher
Field | Data type | Width | Description
teaID | character | 3 | The teacher ID; it is the primary key of the table
teaName | varchar | 20 | The name of the teacher
dob | Date | / | Date of birth of the teacher
doa | Date | / | Date of admission to the school

subjTeacher
Field | Data type | Width | Description
class | character | 3 | The name of the class
subj | varchar | 20 | The name of the subject
teaID | character | 3 | The teacher ID
lesson | integer | / | The number of lessons in a cycle for that subj in that class

*It is assumed that a teacher may teach more than one subject for a particular class; also, a lesson can be taught by more than one teacher.

Step / Description of the requirement / Corresponding SQL

1. State the SQL statement required to create the table ‘teacher’. You should explicitly state that the values in the field teaID should be unique and not null.
CREATE TABLE teacher ( teaID char(3) UNIQUE NOT NULL, teaName varchar(20), dob date, doa date, PRIMARY KEY (teaID) )

2. Add the following data to the table ‘teacher’.
Rec No. | teaID | teaName | dob | doa
1 | 006 | Cheng Yung | 1981/5/12 | 2004/9/1
2 | 003 | Ma Ting | 1978/2/5 | 2003/9/1
3 | 016 | Law Man | 1973/1/29 | 1999/12/5
4 | 012 | Kon Li | 1965/12/16 | 1994/9/1
Write the SQL statement required to add the data of rec no = 3 to the table teacher.
INSERT INTO teacher ( teaID, teaName, dob, doa ) VALUES ('016', 'Law Man', #1/29/1973#, #12/5/1999#);

3. In the table subjTeacher, can “class + teaID” be set as the primary key? Why?
Class + teaID should not be set as the primary key; otherwise, a teacher could teach a given class only one subject, which is not practical.

4. What should be set as the primary key for the table subjTeacher?
class, subj, teaID

5. Write down the SQL statement that is needed to create a table which has the primary key stated in question 4.
CREATE TABLE subjTeacher ( class char(3) NOT NULL, subj varchar(20) NOT NULL, teaID char(3) NOT NULL, lesson integer, PRIMARY KEY (class, subj, teaID) )

6. Set the foreign key ‘teaID’ in the table ‘subjTeacher’ by referencing the table ‘Teacher’. What is the advantage of using a foreign key?
ALTER TABLE subjTeacher ADD FOREIGN KEY (teaID) REFERENCES teacher(teaID)
A foreign key is useful in ensuring data integrity or, more specifically, referential integrity.

7. Show a list of teachers and the total number of lessons of each teacher.
SELECT teacher.teaName, SUM(subjTeacher.lesson) AS total_lesson FROM teacher LEFT JOIN subjTeacher ON subjTeacher.teaID=teacher.teaID GROUP BY teacher.teaName;

8. Show a list of class, teacher and the number of subjects taught by this teacher.
SELECT subjTeacher.class, teacher.teaName, COUNT(*) AS number_subject FROM teacher INNER JOIN subjTeacher ON subjTeacher.teaID=teacher.teaID GROUP BY subjTeacher.class, teacher.teaName HAVING COUNT(*)>1;

9. Show the names of the English teachers.
SELECT DISTINCT teacher.teaName FROM teacher INNER JOIN subjTeacher ON subjTeacher.teaID=teacher.teaID WHERE subjTeacher.subj='eng';

10. Show the names of the teachers who are younger than 30.
SELECT teaName, ROUND((date()-dob)/365,1) AS age FROM teacher WHERE (date()-dob)/365<30;

11. Show the list of teachers who have not yet been assigned any lessons.
SELECT teaName FROM teacher WHERE teaID NOT IN (SELECT DISTINCT teaID FROM subjTeacher)

12. a) Create a view named “view1” which will include the class, classno, name and mark for the subject ‘Chinese’ for 1A students.
CREATE VIEW view1 AS SELECT info.class, info.classno, info.name, result.mark FROM info, result WHERE info.studID = result.studID AND info.class = '1A' AND result.subj = 'chin'
(This assumes that Chinese is stored as 'chin' in the field subj, as in Practice Activities 01.)

12. b) Create a view called “view2” such that it will hold the information from the table info about the students from the class ‘1A’. Note what will happen if the data in the view is modified. Also, give the reason why this approach is used.
CREATE VIEW view2 AS SELECT * FROM info WHERE class = '1A'
The data in the source table is modified too. This approach is used because, when a view is opened to particular users who are permitted to make modifications, the modifications should apply not only to the view but also to the original source table.

13. Create an index named “index1” on class and classno in the table info.
CREATE INDEX index1 ON info (class, classno)

14. Add a constraint named “cons1” to the table info such that the combination of class and classno is unique.
ALTER TABLE info ADD CONSTRAINT cons1 UNIQUE (class, classno)

15. Add a constraint named “cons2” to the table result such that the mark has to be less than 100.
ALTER TABLE result ADD CONSTRAINT cons2 CHECK (mark < 100)

16. Add a constraint named “cons3” to the table info such that the first 2 characters of the studid have to be “20”.
ALTER TABLE info ADD CONSTRAINT cons3 CHECK (LEFT(studid,2) = '20')

17. Create a new table called 1A_result. Then, insert the data about the 1A students from the table result.
We should use a two-step query:
CREATE TABLE 1A_result (subj char(20), mark int, studID char(10))
INSERT INTO 1A_result SELECT * FROM result WHERE studID IN (SELECT studID FROM info WHERE class = '1A');
OR, in Oracle, we can use this statement:
CREATE TABLE table1 AS SELECT * FROM result WHERE studID IN (SELECT studID FROM info WHERE class='1A')

More on Join
A cross join can be thought of as an inner join without a join condition: because it has no WHERE clause, its result is the Cartesian product of the tables being joined.
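To see the Cartesian product concretely, the sketch below loads the three Actor rows and three Movie rows used in this section's example into an in-memory database and compares the row counts with and without a join condition. It assumes Python's built-in sqlite3 module rather than Access; the column types are guesses made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Actor (ActorID INTEGER, MovieID INTEGER, Name TEXT);
    CREATE TABLE Movie (MovieID INTEGER, Title TEXT, Year INTEGER);
    INSERT INTO Actor VALUES (1, 22, 'Tom Hanks');
    INSERT INTO Actor VALUES (2, 21, 'Russell Crowe');
    INSERT INTO Actor VALUES (3, 23, 'Ralph Fiennes');
    INSERT INTO Movie VALUES (21, 'A Beautiful Mind', 2002);
    INSERT INTO Movie VALUES (22, 'Forrest Gump', 1994);
    INSERT INTO Movie VALUES (23, 'The English Patient', 1999);
    """
)

# Cross join: every Actor row is paired with every Movie row (3 x 3 rows).
cross_rows = conn.execute("SELECT * FROM Actor, Movie").fetchall()

# Adding a join condition keeps only the pairs whose MovieID values match.
matched_rows = conn.execute(
    "SELECT Actor.Name, Movie.Title FROM Actor, Movie "
    "WHERE Actor.MovieID = Movie.MovieID ORDER BY Actor.ActorID"
).fetchall()

print(len(cross_rows))
for name, title in matched_rows:
    print(name, "-", title)
```

The WHERE clause turns the nine-row Cartesian product into three rows, one per actor, each paired with the film that the actor's MovieID refers to.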
Thus, the cross join query could look like this:

SELECT * FROM Actor, Movie

ActorID | Actor.MovieID | Movie.MovieID | Name | Title | Year
1 | 22 | 21 | Tom Hanks | A Beautiful Mind | 2002
1 | 22 | 22 | Tom Hanks | Forrest Gump | 1994
1 | 22 | 23 | Tom Hanks | The English Patient | 1999
2 | 21 | 21 | Russell Crowe | A Beautiful Mind | 2002
2 | 21 | 22 | Russell Crowe | Forrest Gump | 1994
2 | 21 | 23 | Russell Crowe | The English Patient | 1999
3 | 23 | 21 | Ralph Fiennes | A Beautiful Mind | 2002
3 | 23 | 22 | Ralph Fiennes | Forrest Gump | 1994
3 | 23 | 23 | Ralph Fiennes | The English Patient | 1999

So, what would happen if the SQL is changed into the following?

SELECT * FROM Actor, Movie WHERE actor.movieID = movie.movieID

SQL Exercise 01:

1 The staff information of a company is stored in a table with the following structure:

STAFF
Field Name | Type | Width | Dec | Description
ID | Character | 5 | | ID of a staff member; it is the primary key and will never be null
name | Character | 20 | | Name of a staff member
salary | numeric | 9 | 2 | Salary of the staff member
dob | date | / | | Date of birth

State the SQL needed to create this table.
CREATE TABLE staff ( id char(5) UNIQUE NOT NULL, name char(20), salary numeric(9,2), dob date, PRIMARY KEY (id) )

2 The staff information of a company is stored in a table with the following structure:

Branch
Field Name | Type | Width | Description
staffID | Character | 5 | ID of a staff member; it will not be null
branchID | Character | 3 | Branch ID; it should be a foreign key to another table called “branch”; it will not be null

State the SQL needed to create this table.
CREATE TABLE branch ( staffID char(5) NOT NULL, branchID char(3) NOT NULL, FOREIGN KEY (branchID) REFERENCES branch )

3 For the table Branch described in question 2, state the SQL needed to set staffID and branchID as the primary key for the table.
ALTER TABLE branch ADD PRIMARY KEY (branchID, staffID)

4 For the table Branch described in question 2, state the SQL needed to add a constraint called “staff_length” to check whether the length of staffID is 5 or not.
ALTER TABLE branch ADD CONSTRAINT staff_length CHECK (len(staffID)=5)

5 Insert a record with the following data into a table called “Member”:
id | name | class | sex | dob
0013 | Peter | 1A | M | 05/14/92
INSERT INTO member (id, name, class, sex, dob) VALUES ('0013', 'Peter', '1A', 'M', {05/14/92})

6 In the table “result”, add 5 marks to each record in the field “eng”.
UPDATE result SET eng = eng + 5

7 Delete the records of the table “student” with the field “class” = 2B.
DELETE FROM student WHERE class = '2B'

8 Drop the table “result” in the database “DB1”.
DROP TABLE result

9 Drop the field “mark” in the table “student”.
ALTER TABLE student DROP mark

<End of SQL Exercise 01>

SQL Exercise 02:

Consider the following database file student.dbf, which stores the information of students:

STUDENT
field | type | width | contents
id | numeric | 4 | student id number
name | character | 10 | name
dob | date | 8 | date of birth
sex | character | 1 | sex: M / F
class | character | 2 | class
hcode | character | 1 | house code: R, Y, B, G
dcode | character | 3 | district code
remission | logical | 1 | fee remission
mtest | numeric | 2 | Math test score

1. a) List all the 2A students.
SELECT * FROM student WHERE class="2A"

b) List the names and Math test scores of the 1B boys.
SELECT name, mtest FROM student WHERE class="1B" AND sex="M"

c) List all the 2B boys who were born on a Monday.
SELECT name FROM student WHERE class="2B" AND sex="M" AND DOW(dob)=2
[In Access, DOW is not supported and it is out of syllabus.]

2.
a) List the classes and names of students whose names contain the letter "e" as the third letter.
SELECT class, name FROM student WHERE name LIKE "__e%"

b) List the classes and names of students whose names start with "T" and do not contain "y".
SELECT class, name FROM student WHERE name LIKE "T%" AND name NOT LIKE "%y%"

c) List the names of 1A students whose Math test score is not 51, 61, 71, 81 or 91.
SELECT class, name, mtest FROM student WHERE class="1A" AND mtest NOT IN (51, 61, 71, 81, 91)

d) List the students who were born between 22 March 86 and 21 April 86.
SELECT class, name, dob FROM student WHERE dob BETWEEN {03/22/86} AND {04/21/86}

3. a) Find the number of girls living in Tsim Sha Tsui (TST).
SELECT COUNT(*) FROM student WHERE sex="F" AND dcode="TST"

b) List the number of passes in the Math test of each class. (passing mark = 50)
SELECT class, COUNT(*) FROM student WHERE mtest >= 50 GROUP BY class

c) List the number of girls grouped by class.
SELECT class, COUNT(*) FROM student WHERE sex="F" GROUP BY class

d) List the number of girls grouped by the year of birth.
SELECT YEAR(dob) AS ydob, COUNT(*) FROM student WHERE sex="F" GROUP BY ydob

e) Find the average age of Form 1 boys.
SELECT AVG((DATE()-dob)/365) FROM student WHERE class LIKE "1_" AND sex="M"

4. a) List the students with fee remission, in the order of their classes and names.
SELECT class, name FROM student WHERE remission ORDER BY class, name

b) The range of the Math test of a group of students is defined as: Range = Maximum - Minimum. List the range of the girls of each class.
SELECT class, MAX(mtest)-MIN(mtest) FROM student WHERE sex="F" GROUP BY class

c) The controlled average (CAVG) of the Math test of a group of students is the average score from which the highest and the lowest scores are excluded (i.e. only n-2 out of n data are used). List the CAVG of the Form 1 boys of each house.
SELECT hcode, (SUM(mtest)-MAX(mtest)-MIN(mtest)) / (COUNT(*)-2) FROM student WHERE sex="M" AND class LIKE "1_" GROUP BY hcode

5. a) Create a view with the name view1 that contains the names and dob of the students, in ascending order of dob.
CREATE VIEW view1 AS SELECT name, dob FROM student ORDER BY dob

b) List the name, class and Math test score of the students whose score is at least 10 marks greater than the average score of his / her class.
CREATE VIEW view1 AS SELECT class, AVG(mtest)+10 AS mark FROM student GROUP BY class
SELECT s.name, s.class, s.mtest FROM student s, view1 v WHERE s.class = v.class AND s.mtest >= v.mark

The files phy.dbf, chem.dbf and bio.dbf are respectively the data files of the Physics Club, Chemistry Club and Biology Club.

PHY / CHEM / BIO
field | type | width | contents
id | numeric | 4 | student id number
name | character | 10 | name
sex | character | 1 | sex: M / F
class | character | 2 | class

6. a) List the students who are common members of the Physics Club and the Chemistry Club.
SELECT * FROM phy WHERE id IN (SELECT id FROM chem)

b) List the students who are common members of the Chemistry Club and the Biology Club but not of the Physics Club.
SELECT * FROM chem WHERE id IN (SELECT id FROM bio) AND id NOT IN (SELECT id FROM phy)

Consider the following swim.dbf, which contains the information of Form 1 students participating in the Swimming Gala. [and also student.dbf]

SWIM
field | type | width | contents
id | numeric | 4 | student id number
event | character | 20 | event

7. a) Print a list of 1A students taking part in the Swimming Gala, ordered by their names. The list should also contain the events.
SELECT s.name, w.event FROM student s, swim w WHERE s.id=w.id AND s.class="1A" ORDER BY 1 TO PRINTER

b) List the Blue House members taking part in Free Style events.
SELECT DISTINCT s.class, s.name FROM student s, swim w WHERE s.id=w.id AND s.hcode="B" AND w.event LIKE "%Free Style"

c) List the number of students taking part in each event.
SELECT w.event, COUNT(*) FROM student s, swim w WHERE s.id=w.id GROUP BY w.event d) Print a complete list of the Swimming Gala. The list should also show the students not taking part in any event with "******". The list should be order by class and student name. SELECT s.class, s.name, w.event FROM student s, swim w WHERE s.id=w.id UNION SELECT class, name, "******" FROM student WHERE id NOT IN (SELECT id FROM swim) AND class LIKE "1_" ORDER BY 1, 2 138 e) List the students taking part in two or more events. [Self-join] SELECT DISTINCT s.class, s.name FROM student s, swim w1, swim w2 WHERE s.id=w1.id AND s.id=w2.id AND w1.event <> w2.event f) List the boys of each House taking part in the Swimming Gala but not taking part in 50m Back Stroke, ordered by House and student name. SELECT hcode, name FROM student WHERE sex="M" AND id IN (SELECT id FROM swim) AND id NOT IN (SELECT id FROM swim WHERE event = "50m Back Stroke") ORDER BY 1, 2 <End of SQL Exercise 02> SQL Exercise 03 1 There are two database tables, “book” and “borrow_record” for the library. Their structures are shown below: Book Field Name Type Width Dec Description bookID Character 10 ID of the book title varchar 50 The title of the book abstract memo / The abstract of the book Field Name Type Width bookID Character 10 ID of the book dob Date / Date of borrow userID character 10 ID of the borrower borrow_record Dec Description Answer the following questions: a) Why the field “title” in the table book would use the data type varchar? Should we change the field bookid into the data type varchar? The data type varchar can store the contexts of that field with variable length so that it can save the storage of the computer. In this example, the length of the variable “title” is variable with the ceiling 50 characters. It seems that varchar is much more flexible when compared with the data type char of which the length is fixed. 
However, char is more efficient when the field is short (because varchar carries an overhead of 2 bytes) or when the field has a common fixed length (e.g. sex, phone number, etc.), so we should not change the data type of the field bookID.
b) Why does the field “abstract” in the table book use the data type memo?
The data type memo is not only variable in length but also unlimited in length (whereas varchar has a limit, say, 255 characters for ACCESS). So, memo is a suitable data type for abstract.
c) Now, you want to set the bookID in the table “borrow_record” as a foreign key with reference to the table book.
(i) State the SQL statement needed.
(ii) Under what conditions can we not set the foreign key to another table?
(i) ALTER TABLE borrow_record ADD FOREIGN KEY (bookID) REFERENCES book
(ii) The foreign key has to be mapped to the primary key of the referenced table. So, if the referenced table (in this case, “book”) has no primary key set, the SQL statement cannot be executed.
d) Now, you want to set studentID as a foreign key to another table. Can we create two foreign keys (bookID and studentID) to two different database tables?
Yes, we can form two different foreign keys in a table.

2 There is a database table “Enrollment”. Its structure is shown below:
Enrollment
Field Name   Type        Width   Description
courseID     character   5       ID of the course
studentID    character   5       ID of the student
group        Integer     /       The group number the student is attending
a) Now, you want to set the fields courseID and studentID as the composite primary key of this table. State the SQL needed (command: ALTER TABLE).
ALTER TABLE enrollment ADD PRIMARY KEY (courseID, studentID)
b) Under what condition can we not set the primary key?
If the table already contains data and some records share the same combination of courseID and studentID, the combination is not unique as a primary key requires, so we cannot set it as the primary key with that SQL statement.
<End of SQL Exercise 03>

SQL Exercise 04
1. With reference to the records in the table info:
ITEM_NO   DESC        QTY   SAL_PRICE   CATEGORY
003       Bugle       5     1500        wind
004       Piano       3     13000       percussion
005       Violin      2     6500        strings
001       Trumpet     3     2500        wind
006       Saxophone   4     4000        wind
002       Tuba        4     4000        wind
007       Drum        3     18000       percussion
i. Create a new field called “amount” which is numeric data with width=10 and 2 decimal places. Then update the field amount by multiplying the quantity (QTY) and the sale price (SAL_PRICE).
ALTER TABLE info ADD amount numeric(10,2)
UPDATE info SET amount = qty*sal_price
ii. Show a list of musical instruments in descending order of their prices.
SELECT desc, sal_price FROM info ORDER BY 2 DESC
iii. Show a list of ITEM_NO whose SAL_PRICE is neither the highest nor the lowest.
SELECT item_no FROM info WHERE sal_price <> (SELECT MAX(sal_price) FROM info) AND sal_price <> (SELECT MIN(sal_price) FROM info)
iv. What would be the output if the following SQL statement is executed:
SELECT desc FROM info WHERE desc NOT LIKE "%e" AND desc LIKE "%e%"
Trumpet
v. Create a list that shows the categories of the musical instruments of which the total quantity of that category is more than 10.
SELECT category, SUM(qty) AS cnt FROM info GROUP BY category HAVING SUM(qty) > 10
<End of SQL Exercise 04>

SQL Exercise 05
A library stores the information about the books in the following table:
BOOK.DBF
Field Name   Type        Width   Decimal
Book_id      Character   4
Title        Character   40
Type         Character   40
Date_pur     Date        8
Author       Character   40
ISBN         Character   20
1. Create a list that shows the book title, type and author of the books whose book ID ranges from 2100 to 2160.
____________________________________________________________________
____________________________________________________________________
2. Display the book titles which contain the words ‘Plants’ or ‘Tree’. The book titles may be in upper or lower case.
SELECT title FROM book WHERE UPPER(title) LIKE "%PLANTS%" OR UPPER(title) LIKE "%TREE%"
3. Display a list of book type, date of purchase and title by ordering the records by their Type. Within each type, arrange the records in ascending order of Date_Pur.
____________________________________________________________________
____________________________________________________________________
4. Count the number of books that were purchased on or before 01/01/1995.
____________________________________________________________________
____________________________________________________________________
5. Show the title of the book of which the ISBN equals “0333469267”.
____________________________________________________________________
____________________________________________________________________
6. Count the number of books by each book type.
____________________________________________________________________
____________________________________________________________________
7. Find out the title of the book that has the shortest name.
____________________________________________________________________
____________________________________________________________________
8. Modify the table to include a character field with the width=20, NewBook_ID, which stores the new book ID of the book and whose value should be unique for each record. Update this new book ID according to the following table.
Book Type            New Book ID
Physics              “P” + Old Book ID
Chemistry            “C” + Old Book ID
Integrated Science   “I” + Old Book ID
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
9. Delete the records of those books which were purchased more than 50 years ago. (Note: You need to compare the year of purchase with that of the system date and you may assume a year to have 365.25 days)
____________________________________________________________________
____________________________________________________________________
<End of SQL Exercise 05>

Past Paper Investigation
2004 – AS – CA #1
1. A table is created with the following SQL command to store the subject scores of Chemistry (CHEM), Biology (BIO) and Computer Studies (CS) of a class of students. REG_NO and EN_NAME represent the registration number and name of a student respectively.
CREATE TABLE 4D ( REG_NO CHAR(6), EN_NAME CHAR(30), CHEM INTEGER, BIO INTEGER, CS INTEGER )
(a) Modify the above SQL command so that two records with the same registration number cannot be input into 4D. (1 mark)
Reg_no CHAR(6) UNIQUE
OR
Reg_no CHAR(6) PRIMARY KEY
(b) Write an SQL command to input the following record into 4D.
Registration number: S03108
Name: Fok Chi Yuen
Chemistry: 73
Computer Studies: 88
(1 mark)
INSERT INTO 4D (reg_no, en_name, chem, cs) VALUES ('S03108', 'Fok Chi Yuen', 73, 88)
(c) Because of a modification in the examination paper of Chemistry, all students will be awarded two additional scores. Write an SQL command to increase the value of CHEM by 2 in each record.
(1 mark)
UPDATE 4D SET chem = chem + 2
(d) Describe the purpose of the following SQL command:
DELETE FROM 4D WHERE LEN(TRIM(EN_NAME)) = 0
(1 mark)
Remove all records whose field en_name is an empty string (or contains only spaces).

2004 – AS – CA #7
7. Below are two database files, DB1 and DB2, where the first row indicates the field names.
DB1
subject   staff_code
Chinese   01
English   03
Maths     05
DB2
Staff_code   Staff_name
01           May Au
02           Billy Ho
(a) Apply equi-join on DB1 and DB2, and write the result in the space provided. (2 marks)
Answer:
subject   Staff_code (DB1)   Staff_code (DB2)   Staff_name
Chinese   01                 01                 May Au
(b) Apply full outer join on DB1 and DB2, and write the result in the space provided. (2 marks)
Answer:
subject   Staff_code (DB1)   Staff_code (DB2)   Staff_name
Chinese   01                 01                 May Au
.NULL.    .NULL.             02                 Billy Ho
English   03                 .NULL.             .NULL.
Maths     05                 .NULL.             .NULL.
The records may be listed in any order (the order of records is not important), but their contents should be exactly correct.

Revision Exercise 01
1. How many tables will be needed to present the following relationships in 3rd normal form?
(i) [Diagram: A (1) -- (1) B]
The number of tables needed is :
The foreign key is in table
(ii) [Diagram: A (1) -- (M) B]
The number of tables needed is :
The foreign key is in table
(iii) [Diagram: A (1) -- (1) B]
The number of tables needed is :
The foreign key is in table
(iv) [Diagram: A (N) -- (M) B]
The number of tables needed is :
The foreign key is in table
(v) [Diagram: 1 : 1 relationship]
The number of tables needed is :
The foreign key is in table
2. For the following database schema,
employeeDepartment(employeeID, name, job, departmentID, departmentName)
What kinds of anomaly does it suffer? Briefly explain the scenarios of the anomalies.
There are 3 kinds of anomalies: insertion anomaly, deletion anomaly and modification anomaly. For employees in the same department, the name of the department is repeated several times, so there is obviously redundancy here.
Most importantly, if the name of the department with departmentID = 123 changes from “Personnel” to “Public Relations”, the change has to be made in the record of every employee in this department, so it suffers from modification anomaly. Also, you cannot insert a new department if no staff has been assigned to that department, so it also suffers from insertion anomaly. At the same time, when you delete the last employee in a department, you delete not only the employee but the whole department, so it suffers from deletion anomaly as well.
3. For the following database schema, what kinds of anomaly does it suffer?
Subject (SubjID, SubjName, TeacherInChargeName)
Teacher (TeacherID, TeacherName, TeacherSex, TeacherDoB, TeacherAdmissionDate)
If a subject chairperson “Mr. Wong” quits the job and “Ms. Cheung” takes the post, we have to change this piece of information more than once. Therefore, it suffers from modification anomaly.
4. For a database schema
Marriage(Male, Female)
What kinds of anomaly does it suffer?
It suffers from insertion anomaly and deletion anomaly. You cannot insert a female unless she is married to a man, so it suffers from insertion anomaly. Also, if you delete a particular man, you also delete the information of a female, so it suffers from deletion anomaly.
5. (i) Why do anomalies happen?
Because of data dependency and data redundancy.
(ii) How to avoid anomalies?
Normalization.
6. Transform the following M:N relation into multiple M:1 relations.
[Diagram: Student (M) -- Take -- (N) Course]
[Answer diagram: Student (1) -- Make -- (M) Application (N) -- Refer -- (1) Course]
<End of Revision Exercise 01>

Revision Exercise 02
David is a database administrator in a famous bookstore chain. He designed a database table called “book_info” with the following structure:
Field      Type        Width   Decimal   Description
bookcode   character   8       /         The code for the book
title      character   50      /         The title of the book
publisher  character   50      /         The publisher of the book
author     character   100     /         The author of the book
pub_date   Date        8       /         Date of publication
price      Numeric     6       2         The price of the book
discount   Numeric     3       2         The discount of the book
a) Write the SQL statement that creates a database table with the structure shown above.
CREATE TABLE book_info (bookcode char(8), title char(50), publisher char(50), author char(100), pub_date date, price numeric(6,2), discount numeric(3,2))
b) Modify the database structure such that the discount will be set to 5% as default value.
ALTER TABLE book_info ALTER discount SET DEFAULT 0.05
c) Modify the database structure such that the title of the book can never be null.
ALTER TABLE book_info ALTER title char(50) NOT NULL
d) Modify the database structure such that the date of publication can never be as early as 1980/1/1. It will give an error message for an invalid input.
ALTER TABLE book_info ADD CONSTRAINT cons1 CHECK (pub_date > #1980/1/1#)
e) Write down the SQL statement that gives the total number of books available in the bookstore according to their publishers.
SELECT publisher, COUNT(*) AS cnt FROM book_info GROUP BY publisher
f) David suggested that the database structure should be modified such that it can classify the books into different categories. He suggested adding a field called category in the database table. Give your opinions on his suggestion. Can you give any other suggestion?
If a new field is added to the table book_info, it will cause a problem if a book can be classified into more than one category, i.e.
It does not allow a book to have two or more categories. To solve the problem, we should create new tables according to the categories, e.g. tables called science, geography, etc., in which the bookcode is stored.
<End of Revision Exercise 02>

Revision Exercise 03
David is a database administrator in a country club. He is responsible for designing a database that lets the club members book some services. He created a database file called “activities”. This database file has 3 tables called “member”, “facility” and “booking”.
a) What are the differences between a database file and a database table?
The database file is used to contain the database tables. A database table is used to store the data, whereas the database file is not used to store data but the relations between the tables.
b) Why does David need to create the database file? What kinds of features can he not use if he does not create the database file but only creates the database tables?
Foreign key
David used the following SQL statements to create the 3 tables:
CREATE TABLE member (ID CHAR(9) PRIMARY KEY, name CHAR(25) NOT NULL, Vdate DATE, grade CHAR(10))
CREATE TABLE facility (Fcode CHAR(10) PRIMARY KEY, name CHAR(25), place CHAR(25), price INTEGER)
CREATE TABLE booking (Bdate DATE, Btime INTEGER, Fcode CHAR(10), MID CHAR(9))
c) Roughly estimate the file size (in K bytes) of the table “member” if there are 10000 members in the country club. Will the file size be less than, exactly equal to or greater than your estimation?
file size = 10000*(9+25+8+10)/1024 ≈ 508 K bytes
The file size should be a little greater than the estimated result, because there will be an extra index file for the primary key “ID”.
Here is a brief description of the structure of the table “booking”:
Field   Description
Bdate   Booked date, i.e. it will record the date that the facility will be used by the member.
Btime   Booked time, i.e. it will record the time slot, ranging from 1 to 14.
        (1 represents 8:00 to 9:00, 2 represents 9:00 to 10:00, and so on.)
Fcode   Facility code
MID     Member id
d) John (member id = 1011103) booked the tennis court (facility code = ten101) on April-20-2006 at the time from 3:00 to 4:00 p.m. Write a SQL statement to insert this record into the table “booking”.
INSERT INTO booking (bdate, btime, fcode, mid) VALUES ({04/20/2006}, 8, "ten101", "1011103")
e) Since the field “Btime” in the table “booking” should only range from 1 to 14, how can an SQL statement modify the structure of the table “booking” so that invalid inputs are rejected?
ALTER TABLE booking ALTER btime SET CHECK (btime >= 1 AND btime <= 14) ERROR "Input out of range"
f) Write a SQL statement that produces a name list of members who have booked a facility more than 5 times.
SELECT name FROM member WHERE id IN (SELECT mid FROM booking GROUP BY mid HAVING COUNT(*) > 5)
g) There are 3 different grades of membership: the first one is “general”, the second one is “prestige” and the third one is “VIP”. Now, the company wants to insert a field called “discount” in the table “member” and give 50% off for VIP, 20% off for prestige and 10% off for general members.
(i) Write a SQL statement that inserts a new field into the table “member”.
ALTER TABLE member ADD discount numeric(3,2)
(ii) Write a SQL statement that sets the discount to 50% if he / she is a VIP.
UPDATE member SET discount = 0.5 WHERE grade = "VIP"
(iii) Write a SQL statement that gives the total number of bookings made by the members according to their grades.
SELECT member.grade, COUNT(*) AS cnt FROM member, booking WHERE member.id = booking.mid GROUP BY member.grade
h) Write a SQL statement that produces a name list of members who have spent more than $1000 in the year 2005.
SELECT member.name FROM member, facility, booking WHERE booking.mid = member.id AND booking.fcode = facility.fcode AND YEAR(booking.bdate) = 2005 GROUP BY member.id, member.name HAVING SUM(facility.price) > 1000
<End of Revision Exercise 03>

Revision Exercise 04
1. Which of the following is / are DDL?
(1) CREATE TABLE (2) ALTER (3) INSERT INTO
A. (1) and (2) only B. (1) and (3) only C. (2) and (3) only D. (1), (2) and (3)
2. Which of the following is DML?
A. SELECT B. CREATE C. ALTER D. DROP

Read the following statements and answer questions 3 to 7.
A doctor owns a clinic located in Causeway Bay. There are approximately 100 clients. The doctor records the name, address and phone number of each client on a paper card. For each interview, the doctor will retrieve the paper card of the client and record the date, symptoms, diagnostic results and medications on the paper card. Assume that there is a set of well-known symptoms used by doctors. An ER diagram is drawn to find out the overall structure of data during the phase of conceptual data modeling. In the diagram, a relationship is set up between CLIENT and INTERVIEW.
3. The cardinalities on the side of CLIENT and INTERVIEW are ______ and ______ respectively.
A. Optional, optional B. Optional, mandatory C. Mandatory, optional D. Mandatory, mandatory
4. Which of the following can be included as an entity?
A. date of interview B. paper card C. symptoms D. the clinic
5. Which of the following should NOT be included as an attribute?
A. name of client B. date of interview C. diagnostic results D. Causeway Bay
6. The type of relationship in the direction from CLIENT to INTERVIEW is
A. one-to-one B. one-to-many C. many-to-one D. many-to-many
7. Which of the following should be an attribute included in INTERVIEW?
A. name of client B. diagnostic result C. name of doctor D. number of visits by a client
8. Which of the following types of attributes should be resolved?
A. key attribute B. non-key attribute C. single-valued attribute D. multi-valued attribute
9. After resolution, which of the following should disappear from an ER diagram?
A. 1:1 unary relationship B. 1:1 binary relationship C. 1:M binary relationship D. N:M relationship
10. A binary relationship involves
A. two relationships B. two different entities C. two attributes D. more than two entities

Conventional Questions:
1. A private tennis club has ten tennis courts for members to use. Booking from members is accepted within one week before the tennis court is used. In each booking, each member can reserve at most 3 tennis courts and the duration is a multiple of half-hour. Given that two entities are identified:
MEMBER   A member of the tennis club. Identifier is MemberID. Other attributes are Name of member and Contact phone number.
COURT    A tennis court to be used by members. Identifier is CourtID. Other attributes include Location and Fee.
a) Sketch an initial E-R diagram to show the relationship and cardinality between MEMBER and COURT.
[Diagram: MEMBER (attributes MemberID, Name) -- book -- COURT (attributes Fee, Location), with cardinality M : N]
b) Redraw the E-R diagram to include an entity BOOKING with attributes including Date and StartTime of use and Duration of booking.

2. The staff of a company will have a number of skills, for example:
StaffID   StaffName      Skill
001       John Smith     Access, DB2, FoxPro
002       Dave Jones     dBase, Clipper
003       Mike Beach
004       Jerry Miller   DB2, Oracle
005       Ben Stuart     Oracle, Sybase
006       Fred Flint     Informix
007       Joe Blow
008       Greg Brown
009       Doug Hope      Access, MSSqlServer
a) Is the following database table in 1st normal form? If not, how should the structure be modified to form a 1st normal form?
STAFF(StaffID, StaffName, Skill)
No, it is not in 1st normal form, because the field Skill holds more than one value. It should be changed into
Staff (StaffID, StaffName)
Skill (StaffID, Skill)
b) If we treat Staff and Skill as two entities, construct the ER diagram for them.
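The split of STAFF into Staff and Skill in part a) can be tried out with a small script. This is only a sketch, not part of the original answer: it assumes Python's built-in sqlite3 module as the database engine, uses the first four rows of the example table, and the field widths and the variable names raw_staff, skill_rows and db2_staff are made up for illustration.

```python
import sqlite3

# First four rows of the un-normalised STAFF table from the example.
# The Skill field holds a comma-separated list, which breaks 1st normal form.
raw_staff = [
    ("001", "John Smith", "Access, DB2, FoxPro"),
    ("002", "Dave Jones", "dBase, Clipper"),
    ("003", "Mike Beach", ""),
    ("004", "Jerry Miller", "DB2, Oracle"),
]

conn = sqlite3.connect(":memory:")          # temporary in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite checks foreign keys only when asked

# Field widths below are assumptions; the notes do not specify them.
conn.execute("CREATE TABLE Staff (StaffID CHAR(3) PRIMARY KEY, StaffName CHAR(25))")
conn.execute(
    "CREATE TABLE Skill ("
    " StaffID CHAR(3) REFERENCES Staff (StaffID),"
    " Skill CHAR(20),"
    " PRIMARY KEY (StaffID, Skill))"
)

# One Staff record per person, one Skill record per (person, skill) pair.
for staff_id, name, skills in raw_staff:
    conn.execute("INSERT INTO Staff VALUES (?, ?)", (staff_id, name))
    for skill in skills.split(","):
        skill = skill.strip()
        if skill:  # staff without skills get no Skill record
            conn.execute("INSERT INTO Skill VALUES (?, ?)", (staff_id, skill))

skill_rows = conn.execute(
    "SELECT StaffID, Skill FROM Skill ORDER BY StaffID, Skill"
).fetchall()

# With single-valued fields, "who knows DB2?" becomes a plain equi-join.
db2_staff = conn.execute(
    "SELECT s.StaffName FROM Staff s, Skill k"
    " WHERE s.StaffID = k.StaffID AND k.Skill = 'DB2'"
    " ORDER BY s.StaffName"
).fetchall()

print(skill_rows)
print(db2_staff)
```

After the split, every field holds a single value, so a skill can be searched with = instead of LIKE on a list, and a staff member without skills simply has no record in Skill.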
c) Given that every skill comes from a set of skills pre-defined by the company, how should the structure of the database schema be changed to reduce redundancy?
Staff (StaffID, StaffName)
Skill_Info (Skill_ID, Skill)
StaffSkill (StaffID, Skill_ID)
Foreign keys are set in the table StaffSkill, referencing the tables Staff and Skill_Info, to ensure data integrity.
d) How many tables will result from the following entity relationships:
(i) 1 to 1 (Mandatory : Mandatory)
(ii) 1 to 1 (Optional : Mandatory)
(iii) M to N (Mandatory : Optional)
(iv) M to 1 (Mandatory : Mandatory)
<End of Revision Exercise 04>

Revision Exercise 05 (SQL)
1. A machinery company stores the parts information in a table with the following structure:
PARTS
Field Name   Type        Width   Description
Part_no      Integer             Unique code for a part
Descript     Character   20      Description of the part
Qty          Integer             Quantity of the part
supplier     Character   20      Supplier of the part
Write SQL statements to fulfill the following requests. Whenever the columns are not specified, you may use SELECT * …
a) Produce a list of parts in ascending order of quantity.
SELECT * FROM parts ORDER BY qty ASC
b) Produce a list of parts that contain the keyword ‘Shaft’ in the description.
SELECT * FROM parts WHERE descript LIKE '%Shaft%'
c) Produce a list of parts that have a quantity more than 20 and are supplied by ‘China Metals Co.’
SELECT * FROM parts WHERE qty > 20 AND supplier = 'China Metals Co.'
d) List all the suppliers without duplication.
SELECT DISTINCT supplier FROM parts
e) Increase the quantity by 10 for those parts with quantity less than 10.
UPDATE parts SET qty = qty + 10 WHERE qty < 10
f) Delete records with part_no equal to 879, 654, 231 and 234.
DELETE FROM parts WHERE part_no IN (879, 654, 231, 234)
g) Add a field “Date_purchase” to record the date of purchase.
ALTER TABLE parts ADD COLUMN date_purchase Date
h) Make a copy of the table with only fields part_no and qty. Name the new table as PARTS2.
CREATE TABLE parts2 AS SELECT part_no, qty FROM parts

2. A supermarket stores the payroll information in a table with the following structure:
PAYROLL
Field Name   Type        Width   Dec   Description
Name         Character   20            Name of employee
Post         Character   20            Post of the employee
Rate         Numeric     5       2     Hourly salary rate
Hour         Numeric     6       2     Number of hours worked
The salary of each employee is calculated by multiplying the hourly salary rate with the number of hours worked, i.e. Salary = Rate*Hour. Write SQL statements to fulfill the following requests:
a) Print a list of employees showing all the information as well as the salary.
SELECT name, post, rate, hour, rate*hour AS salary FROM payroll
b) Print those employees with salary greater than HK$10,000.
SELECT name, rate*hour AS salary FROM payroll WHERE rate*hour > 10000
c) Find the average salary and the maximum salary of employees in the supermarket.
SELECT AVG(rate*hour) AS average_salary, MAX(rate*hour) AS Max_salary FROM payroll
d) For each kind of post, find the total working hours of employees. Display only those posts with average working hours > 8.
SELECT post, SUM(hour) FROM payroll GROUP BY post HAVING AVG(hour) > 8

3. The records in tables setP and setQ are shown below:
SETP        SETQ
X   Y       X   Y
3   5       2   4
1   7       8   3
2   4       1   6
3   6       5   9
2   3
State the result of the following SQL statements:
a) SELECT x FROM setp UNION SELECT x FROM setq
X: 1, 2, 3, 5, 8
b) SELECT x FROM setp UNION SELECT y FROM setq
X: 1, 2, 3, 4, 6, 9
c) SELECT p.x, p.y FROM setp AS p, setq AS q WHERE p.x = q.x
X   Y
1   7
2   4
2   3

4. The staff information of a company is stored in a table with the following structure:
STAFF
Field Name   Type        Width   Dec   Description
name         Character   20            name of a staff
department   Character   20            department of the staff
salary       numeric     9       2     salary of the staff
Identify and correct the errors in the following queries:
a) SELECT name WHERE salary = (SELECT MAX(salary) FROM staff)
The table name is missing in the main query, so the correction is:
SELECT name FROM staff WHERE salary = (SELECT MAX(salary) FROM staff)
b) SELECT name FROM staff WHERE salary = SELECT MAX(salary) FROM staff
The parentheses “(” and “)” around the sub-query are missing, so the correction is:
SELECT name FROM staff WHERE salary = (SELECT MAX(salary) FROM staff)
c) SELECT COUNT(*) FROM (SELECT MAX(salary) FROM staff WHERE salary > 10000)
Subquery can be used in the WHERE clause only, so the correction is:
SELECT COUNT(*) FROM staff WHERE salary > 10000
d) SELECT name FROM staff HAVING salary = (SELECT MAX(salary) FROM staff)
HAVING applies to groups formed by GROUP BY; a condition on individual records belongs in the WHERE clause, so the correction is:
SELECT name FROM staff WHERE salary = (SELECT MAX(salary) FROM staff)
e) SELECT name FROM staff WHERE MAX(salary) IN (SELECT salary FROM staff)
An aggregate function (like MAX, AVG, MIN, SUM, COUNT) cannot be used directly in the WHERE clause, so the correction is:
SELECT name FROM staff WHERE salary IN (SELECT MAX(salary) FROM staff)
g) SELECT name FROM staff WHERE salary = (SELECT salary FROM staff WHERE salary > 10000)
More than one record is returned from the sub-query, so the correction is:
SELECT name FROM staff WHERE salary IN (SELECT salary FROM staff WHERE salary > 10000)

5.
The results of an English Contest in a class are stored in a table with the following structure:
RESULT
Field Name   Type        Width   Description
Name         Character   20      Name of a competitor
mark         numeric     3       Mark of the competitor
Write SQL statements to find the following:
a) highest, average and lowest marks
SELECT MAX(mark), AVG(mark), MIN(mark) FROM result
b) competitor with the highest mark
SELECT name FROM result WHERE mark = (SELECT MAX(mark) FROM result)
c) competitors with mark above the average
SELECT name FROM result WHERE mark > (SELECT AVG(mark) FROM result)
d) number of competitors with mark above the average
SELECT COUNT(*) FROM result WHERE mark > (SELECT AVG(mark) FROM result)

6. An ISP keeps the information of the clients in a table with the following structure:
CLIENT
Field Name   Type        Width   Description
user_id      Character   10      A unique code that identifies a user
password     Character   10      Password for the user
name         Character   40      Name of the user
profession   Character   30      Profession of the user
Identify and correct the errors in each of the following SQL statements:
a) Task: A view PASSWORD is needed to show the user_id and password only for all the clients.
SQL: CREATE VIEW password SELECT * FROM client
Correction: CREATE VIEW password AS SELECT user_id, password FROM client
b) Task: A view STUDENT is needed to hold all the information except the password of the clients who are students.
SQL: CREATE VIEW student SELECT user_id, name, profession WHERE profession='Student'
Correction: CREATE VIEW student AS SELECT user_id, name, profession FROM client WHERE profession = 'Student'
c) Task: A view PROF_CNT is needed to show the number of clients in each profession.
SQL: CREATE VIEW COUNT(*) FROM client GROUP BY profession
Correction: CREATE VIEW prof_cnt AS SELECT profession, COUNT(*) FROM client GROUP BY profession
<End of Revision Exercise 05>

Revision Exercise 06 (SQL)
1. STAFF
Field Name   Type        Width   Dec   Description
Department   Character   20            Name of a department: e.g. sales, purchase, account
Name         Character   20            Name of the employee
Date_birth   Date                      Date of birth of the employee
Salary       Numeric     8       2     Salary of the employee
sex          Character   1             Sex of employee: ‘M’ for male, ‘F’ for female
Write SQL statements to fulfill the following requests:
a) Produce a list to show the names of departments without duplicate lines.
SELECT DISTINCT department FROM staff
b) Produce a list of all information in alphabetical order of name.
SELECT * FROM staff ORDER BY name
c) Produce a list of all information in ascending order of age.
SELECT * FROM staff ORDER BY date_birth DESC
d) Produce a sorted list of staff by name, classifying the staff by department, with the male staff in each department followed by the female staff.
SELECT department, sex, name, date_birth FROM staff ORDER BY department, sex DESC, name
e) Increase the salary by 5% for male staff of Sales Department.
UPDATE staff SET salary = salary*1.05 WHERE sex = 'M' AND department = 'Sales'
f) Remove all the records for staff of Account Department.
DELETE FROM staff WHERE department = 'Account'

2. An insurance company stores the client information in a table with the following structure:
CLIENT
Field Name   Type        Width   Dec   Description
Name         Character   20            Name of client
Sex          Character   1             Sex of the client (‘M’ or ‘F’)
Date_birth   Date        8             Date of birth of the client
Occupation   Character   20            Occupation of the client
premium      Numeric     8       2     Premium of the client
Write SQL statements to fulfill the following requests:
a) Produce a list of clients who were born in February, March, June or September.
SELECT * FROM client WHERE MONTH(date_birth) IN (2, 3, 6, 9)
b) Produce a list showing all the occupations of the clients without duplication.
SELECT DISTINCT occupation FROM client
c) For each year between 1970 and 1990, find the number of clients who were born in the same year.
SELECT YEAR(date_birth), COUNT(*) FROM client WHERE YEAR(date_birth) BETWEEN 1970 AND 1990 GROUP BY YEAR(date_birth)
d) Find the average premium of female clients.
SELECT AVG(premium) FROM client WHERE sex = 'F'
e) Classify clients by year of birth. Find the average premium for those groups with average premium more than HK$500.
SELECT YEAR(date_birth), AVG(premium) FROM client GROUP BY YEAR(date_birth) HAVING AVG(premium) > 500

3. A school stores the activity records of the students in two related tables which have the following structures:
ENROLLMENT
Field Name   Type        Width   Description
Name         Character   20      Name of a student
Club_id      Character   4       Unique code of a club enrolled by the student
CLUB
Field Name   Type        Width   Description
Club_id      Character   4       Unique code of a club
name         Character   20      Name of the club
A student may enroll in more than one club. Write SQL statements for the following tasks:
a) Create a list containing the student name and the corresponding club name(s) for each student.
SELECT e.name, c.name FROM enrollment AS e, club AS c WHERE e.club_id = c.club_id ORDER BY e.name
b) Create a list containing the club name and the names of club members for each club.
SELECT c.name, e.name FROM enrollment AS e, club AS c WHERE e.club_id = c.club_id ORDER BY c.name
c) Create a list of club members for Computer club.
SELECT e.name FROM enrollment AS e, club AS c WHERE e.club_id = c.club_id AND c.name = 'Computer club'

4. The club and activity information of students in a school is stored in the following tables:
ACTIVITY
Name             Club_id
Janet Chan       02
Isabella Wong    03
Quentin Cheung   02
Robin Kong       02
Robin Kong       03
Sidney Ah        03
CLUB
Club_id   Club_Name
02        Swimming
03        Violin
a) State the result of the following SQL statement:
SELECT name FROM activity WHERE club_id = (SELECT club_id FROM club WHERE club_name = 'Violin')
Name
Isabella Wong
Robin Kong
Sidney Ah
b) Rewrite the above SQL statement using INNER JOIN.
SELECT a.name FROM activity AS a, club AS c WHERE a.club_id = c.club_id AND c.club_name = 'Violin'
OR
SELECT a.name FROM activity AS a INNER JOIN club AS c ON a.club_id = c.club_id WHERE c.club_name = 'Violin'

5. The results of a public examination are stored in the following table:
Exam
Field Name   Type        Width   Description
Subject      Character   20      Name of a subject
Num_credit   Numeric     4       Number of students with a credit (A-C) in the subject
Num_pass     Numeric     4       Number of students passing the subject (D-E)
Num_fail     Numeric     4       Number of students failing the subject (F)
Write SQL statements for the following tasks:
a) Find the passing percentage of each subject. Display the results accurate to 1 decimal place.
SELECT subject, ROUND((num_credit + num_pass) / (num_credit + num_pass + num_fail)*100, 1) FROM exam
b) Find the subject with the highest passing percentage.
SELECT subject FROM exam WHERE ROUND((num_credit + num_pass) / (num_credit + num_pass + num_fail)*100, 1) = (SELECT MAX(ROUND((num_credit + num_pass) / (num_credit + num_pass + num_fail)*100, 1)) FROM exam)
<End of Revision Exercise 06>

Revision Exercise 06 (SQL)
1. The Education Bureau keeps the information about the schools in the tables as follows:
SCHOOL
Field Name   Type        Width   Description
Sch_id       Character   4       A unique code that identifies a school
School       Character   40      Name of the school
Principal    Character   40      Name of the principal of the school
telephone    Character   10      Telephone number of the school
SUBJECT
Field Name   Type        Width   Description
Subj_id      Character   3       A unique code that identifies a subject
subject      Character   40      The name of the subject
OFFER
Field Name   Type        Width   Description
Sch_id       Character   4       Code of the school that offers a subject
Subj_id      Character   3       Code of the subject
NumOfStud    Numeric     3       Number of students taking the subject
a) Explain what a foreign key field is. State an example using the tables above.
A foreign key is a field that stores values of the key field of another table.
Therefore, a foreign key can uniquely identify a record in another table. Examples of foreign keys are sch_id and subj_id in the table OFFER.

b) State the primary key for each table.

    SCHOOL, primary key: sch_id
    SUBJECT, primary key: subj_id
    OFFER, primary key: sch_id, subj_id

c) Write the SQL statements to create the tables SUBJECT and OFFER.

    CREATE TABLE subject (
        subj_id char(3) unique,
        subject char(40),
        PRIMARY KEY (subj_id))

    CREATE TABLE offer (
        sch_id char(4),
        subj_id char(3),
        numofstud numeric(3),
        PRIMARY KEY (sch_id, subj_id))

d) Write SQL statements for the following tasks:

(i) Produce a list showing those schools which offer the subject 'Computer Studies'.

    SELECT s.school FROM school AS s, subject AS j, offer AS o
    WHERE s.sch_id = o.sch_id AND j.subj_id = o.subj_id
    AND j.subject = 'Computer Studies'

(ii) Find how many subjects are available in 'ABC school'.

    SELECT COUNT(*) FROM school AS s, offer AS o
    WHERE s.sch_id = o.sch_id AND s.school = 'ABC school'

(iii) Produce a list showing the subjects available in 'ABC school'.

    SELECT j.subject FROM school AS s, subject AS j, offer AS o
    WHERE s.sch_id = o.sch_id AND j.subj_id = o.subj_id
    AND s.school = 'ABC school'

(iv) Find the total number of students taking the subject 'Computer Studies' in Hong Kong.

    SELECT SUM(numofstud) FROM subject AS j, offer AS o
    WHERE j.subj_id = o.subj_id AND j.subject = 'Computer Studies'

2. The results of an inter-class English Contest are stored in a table with the following structure:

RESULT
Field Name   Type        Width   Description
Name         Character   20      Name of a competitor
Class        Character   2       Class of the competitor
Mark         Character   3       Mark of the competitor

Write SQL statements to find the following:

a) highest, average and lowest marks in each class.

    SELECT class, MAX(mark), AVG(mark), MIN(mark) FROM result
    GROUP BY class

b) students with the highest mark in each class.

    SELECT class, name FROM result AS r
    WHERE mark = (SELECT MAX(mark) FROM result WHERE class = r.class)

c) students with mark above the class average mark in each class.
    SELECT name FROM result AS r
    WHERE mark > (SELECT AVG(mark) FROM result WHERE class = r.class)

d) students in 3A with mark above the overall average mark.

    SELECT name FROM result
    WHERE class = '3A' AND mark > (SELECT AVG(mark) FROM result)

3. A fashion company keeps the stock information in the tables STOCK and DESIGNER. The structures of the tables are shown below:

STOCK
Field Name    Type        Width   Description
product_id    Character   4       Unique code that identifies a product
designer_id   Character   4       Code of the designer for the product
type          Character   20      Type of the product
size          Character   1       Size of the product, may be 'L', 'M' or 'S'
qty           Numeric     4       Quantity of the product

DESIGNER
Field Name    Type        Width   Description
designer_id   Character   4       Unique code that identifies a designer
name          Character   20      Name of the designer
telephone     Character   10      Telephone number of the designer

An example of product_id is '0034'. Different sizes of the same product will use different product_id values.

a) State the primary keys for the above tables.

    STOCK: primary key: product_id
    DESIGNER: primary key: designer_id

b) Explain why the product_id is stored as characters.

Reasons for storing product_id as characters:
1. Leading zeros or spaces can be added
2. Calculation on product_id is rare
3. Display of the numbers in product_id is not affected by the display format
4. More efficient

c) Explain why the information about the designers is stored separately.

A designer may have more than one product. Storing the designers separately avoids data redundancy. Otherwise, updating the stock records may lead to anomalies, i.e. errors or inconsistencies.

d) Write SQL statements for the following tasks:

(i) Produce a list showing the total quantity of each design.

    SELECT product_id, SUM(qty) FROM stock GROUP BY product_id

(ii) Find the total quantity for the type 'Pullover'.

    SELECT SUM(qty) FROM stock WHERE type = 'Pullover'

(iii) Produce a list showing the product_id of the designer 'Timothy', without duplicating rows.
    SELECT DISTINCT product_id FROM stock AS s, designer AS d
    WHERE s.designer_id = d.designer_id AND d.name = 'Timothy'

(iv) Find which design of size 'M' has the largest quantity. [Hint: A design is identified by product_id]

    SELECT product_id FROM stock
    WHERE size = 'M' AND qty = (SELECT MAX(qty) FROM stock WHERE size = 'M')

<End of Revision Exercise 06>

Revision Exercise 07 (SQL)

1. In the Hong Kong District Council Election, the information about the districts and candidates is stored in the following tables:

DISTRICT
Field Name   Type        Width   Description
Dist_id      Character   4       A unique code that identifies a district
District     Character   20      Name of the district
VoterNum     Numeric     7       Number of voters of the district

CAND
Field Name   Type        Width   Description
Candidate    Character   4       Name of the candidate
Dist_id      Character   4       The district code of the candidate
NumOfVote    Numeric     7       Number of votes obtained by the candidate

a) For each of the following tasks, determine whether the SQL statement can fulfil the task. If not, rewrite the SQL statement.

(i) Task: Produce a list showing all the districts.
    SQL: SELECT DISTINCT dist_id FROM cand

Incorrect. District names are not displayed. The corrected statement is

    SELECT district FROM district

(ii) Task: Produce a list of candidates in Hong Kong East.
    SQL: SELECT candidate FROM cand AS c, district AS d WHERE district = 'Hong Kong East'

Incorrect. The join condition is missing.
The corrected statement is

    SELECT candidate FROM cand AS c, district AS d
    WHERE c.dist_id = d.dist_id AND district = 'Hong Kong East'

b) State the meaning of the following SQL statements:

(i) SELECT AVG(VoterNum) FROM district
    Find the average number of voters among all the districts in Hong Kong.

(ii) SELECT AVG(NumOfVote) FROM cand GROUP BY dist_id
    Find the average number of votes among all candidates in each district.

(iii) SELECT dist_id FROM cand GROUP BY dist_id HAVING COUNT(*) > 2
    Find the codes of the districts which have more than 2 candidates.

d) Assume that each voter can vote for one candidate only. Write SQL statements to find the following figures:

(i) the number of districts in Hong Kong

    SELECT COUNT(*) FROM district

(ii) the total number of voters in Hong Kong

    SELECT SUM(VoterNum) FROM district

(iii) the number of candidates in each district

    SELECT COUNT(*) FROM cand GROUP BY dist_id

(iv) the total number of voters who have voted in each district

    SELECT SUM(NumOfVote) FROM cand GROUP BY dist_id

(v) the percentage of voters who have voted in Hong Kong.

    SELECT c.dist_id, SUM(NumOfVote) / VoterNum * 100
    FROM cand AS c, district AS d
    WHERE c.dist_id = d.dist_id
    GROUP BY c.dist_id, VoterNum

<End of Revision Exercise 07>

Past Paper on Database

2001 – AS – CA #2

2. John wants to design a database DB to store information about his friends. Therefore, he designs a database file FRIENDS with the following structure:

(a) (i) Although HKID is unique to each person, John cannot use it as the primary key. Explain why not.

Because the "if any" in the description indicates that HKID is not a mandatory data item.

(ii) John wants to define a primary key involving FIRST_NAME and LAST_NAME. Describe the procedure that John should follow.

Create a new field by First_Name + Last_Name and define that field as the key, or define a composite key ("Last_Name + First_Name" is also acceptable).

(iii) Give an example where the primary key in part (a)(ii) may not be valid.
Two of John's friends may have the same given name as well as the same surname.

(b) John uses another database file SCHOOL in database DB to store the school codes and school names. The contents of SCHOOL are shown below.

SCHOOL_ID   SCHOOL_NAME
081         Eden College
252         Hong Kong Number One Primary School
375         The Hong Kong Government School
441         Olympian Secondary School
782         Hong Kong Iciban Secondary School
956         Intensive Middle School

The contents of FRIENDS are shown below.

FIRST_NAME   LAST_NAME   HOME_TEL   SCHOOL_ID   …
Amy          Chan        25258123   252         …
Johnny       Chan        25532152   780         …
Chris        Cheung      25787523   441         …
Mary         Lam         24545510               …
Paul         Ng          25458648   252         …
Joe          Yeung       28585656   441         …

Do the above database files violate the integrity of the database DB? Explain. (3 marks)

Yes, because SCHOOL_ID 780, which is used in FRIENDS, does not exist in SCHOOL.

2002 – AS – CA #5

5. Ms. Wong is conducting a survey on the service of the school tuck shop by doing the following steps:
- collecting completed written questionnaires from students;
- inputting the data into a computer; and
- presenting the result of the survey using a presentation graphics software package.

She finds that there are a lot of mistakes on the completed questionnaires. One of the questionnaires with mistakes is shown below.

[Questionnaire image not reproduced]

Ms. Wong now decides to have a new arrangement so that the students can fill in the questionnaires online.

(a) Explain how the online input can help Ms. Wong to improve the following:

(i) the completeness of data collection

Validate the presence of input data for mandatory fields (e.g. the sex field on the questionnaire) / check the number of selections, e.g. at most 3 items should be selected.

(ii) the correctness of data collection

Validate the range of data, e.g. check the number of purchases.
Validate the format of data, e.g. check the date format.

(b) Give a reason to justify Ms. Wong's new arrangement in addition to the improvements given in Part (a).
Reduce the time needed for data input (other reasonable answers).

(c) Ms. Wong decides to employ a programmer to develop the system rather than to buy an existing software package available on the market. Give TWO reasons to support Ms. Wong's decision.

Satisfy unique requirements.
Future modification or enhancement is easier.

(d) Ms. Wong only wants to use touch screens for students to input data. Describe how the students can fill in the numerical items in the questionnaire.

Use the numeric pad on the touch screen.

2003 – AS – CA #2

2 (b) Users sometimes make mistakes when keying in data into a database. Suggest two possible measures that can be considered when designing the database in order to minimize these mistakes.

Set a field as primary key and set some fields to be unique
Specify validation rules for some fields
Set input mask for some fields
Set mandatory fields to reject null entry
(any two)

To make a field a primary key or unique, we can use the command "CREATE TABLE" or "ALTER TABLE". Remember that a primary key has to be handled at database level instead of table level, i.e. a field can be defined as a primary key only when the table belongs to a database; if the table is just a single table that is not in a database, the field cannot be defined as primary but only as unique. A primary key implies the property of uniqueness. E.g.

    ALTER TABLE info ALTER stu_id char(10) PRIMARY KEY

We can set validation rules for some fields with the command CREATE TABLE or ALTER TABLE. E.g.

    CREATE TABLE result (stu_id char(10), test numeric(4,0), exam numeric(4,0) SET CHECK exam >= 0 ERROR "Positive integer only!")

(b) (i) Compared with the character data type, state one advantage of defining a field as the memo data type.

A memo data type provides storage space for text information of variable length to avoid unnecessary waste of storage space.
(unlimited / insert graphics / separate file)

(ii) Describe a situation in which it is more appropriate to define a field as the character data type rather than the memo data type.

Justification: When the text information in the field is very short (e.g. fewer than 4 characters) OR the length of the text information is limited OR many complicated string manipulations (e.g. sorting, calculation) are required, it is more appropriate to define the field as the character data type. (2 marks)

2003 – AS – CA #4

4. The following table shows the structure of a database file STUDENT containing the records of all students in a school.

Field name   Type        Width   Description
CLS_NAME     Character   2       Class name (e.g. 1A, 2E, 4D)
CLASS_NO     X           2       Class number
EN_NAME      Character   25      Student name in English
DOA          Date        8       Date of Admission in format mm/dd/yy
LOGIN_ID     Character   Y       Login name of School Intranet System

For each of the following cases, write suitable statement(s) (SQL / database commands) to generate a login name for each student and store the login name into the field LOGIN_ID.

a) X represents Character. Y represents 4.
The first two characters of LOGIN_ID are the class name of the student. The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 08, his/her LOGIN_ID should be '5C08'. (2 marks)

    UPDATE student SET login_id = cls_name + class_no

b) X represents Numeric value without decimal places. Y represents 4.
The first two characters of LOGIN_ID are the class name of the student. The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 8, his/her LOGIN_ID should be '5C08'.

    UPDATE student SET login_id = cls_name + STR(class_no) WHERE class_no >= 10;
    UPDATE student SET login_id = cls_name + '0' + STR(class_no) WHERE class_no <= 9

c) X represents Character. Y represents 6.
The first two characters of LOGIN_ID are the year of admission.
The next two characters of LOGIN_ID are the class name of the student. The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 08 who was admitted on 09/01/97, his/her LOGIN_ID should be '975C08'.

    UPDATE student SET login_id = RIGHT(STR(YEAR(doa)),2) + cls_name + class_no

2005 – AS – CA #6

6. Mr. Chin, the Extra-curricular Activities Master of a school, uses the database files, CLUB and MEM, to store the information of clubs and members of those clubs respectively. At the end of a school year, students' testimonials will show their extra-curricular activities records which are retrieved from these files. The records in CLUB and MEM are listed as follows:

CLUB
ClubNo   ClubName           TeacherInCharge
001      Science Club       Mr. Chan
002      Mathematics Club   Ms. Ng
003      English Club       Mr. Fok
004      Fencing Club       Ms. Chau
005      Tennis Club        Mr. Fong
006      Drama Club         Ms. Yau
007      Volleyball Club    Mr. Tsoi
008      Basketball Club    Ms. Lee
009      Football Club      Mr. Sung
010      Chess Club         Ms. Lau

MEM
StudentID   StudentName     Class   ClassNo   Telephone   ClubNo
00001       Chan Chun Yin   1A      3         91632589    003
100003      Chan Ka Ho      1A      5         26739876    004
100003      Chan Ka Ho      1A      5         26739876    008
:           :
1700135     Wong Wai        7A      30        61388792    010

There are 10 clubs in the school and thus 10 records in CLUB. The key field of CLUB is ClubNo. The two files are related by ClubNo. One day, a student helps Mr. Chin to input the following new record into MEM:

StudentID   StudentName   Class   ClassNo   Telephone   ClubNo
100003      Chan Ka Ho    1A      5         26739876    012

a) Should the database system accept the above record? Explain briefly. (1 mark)

No. There is no such club with ClubNo 012.

<- Here, we should know that the record violates the integrity of the database. Even more, you should be able to make the database itself prevent this kind of problem. How? Use a foreign key and set its criteria properly, e.g. the values entered in the field have to match a primary key value in the parent table.
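The idea in the note above can be tried out directly. What follows is only a minimal sketch under assumptions, not the school's actual system: it uses Python's built-in sqlite3 module instead of the database product used in these notes, keeps just three of the MEM fields, and the helper name try_insert is made up for illustration. ClubNo in MEM is declared as a foreign key referencing CLUB, so the database itself refuses the record with ClubNo 012.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE club (ClubNo CHAR(3) PRIMARY KEY, ClubName CHAR(20))")
conn.execute("""
    CREATE TABLE mem (
        StudentID   CHAR(7),
        StudentName CHAR(20),
        ClubNo      CHAR(3) REFERENCES club (ClubNo),  -- the foreign key
        PRIMARY KEY (StudentID, ClubNo)                -- composite key
    )
""")
conn.executemany("INSERT INTO club VALUES (?, ?)",
                 [("004", "Fencing Club"), ("008", "Basketball Club")])


def try_insert(student_id, name, club_no):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        conn.execute("INSERT INTO mem VALUES (?, ?, ?)", (student_id, name, club_no))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_004 = try_insert("100003", "Chan Ka Ho", "004")  # ClubNo 004 exists in CLUB
accepted_012 = try_insert("100003", "Chan Ka Ho", "012")  # ClubNo 012 does not exist
print(accepted_004, accepted_012)
```

The first insert is accepted because ClubNo 004 exists in CLUB; the second is rejected by the database, so the student helping Mr. Chin could not have stored the wrong record in the first place.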
Of course, there are more features in foreign keys; revise them if needed.

b) During the school year, Ms. Chau, the teacher in charge of the Fencing Club, leaves the school. No other teacher in the school is suitable to lead the Club so it has to be closed. Suggest a modification to the structure of the database file(s) so that the existing clubs in the school can be shown.

Add a logical field in CLUB to indicate whether the club is active or not.

<- e.g.
ClubNo   ClubName   TeacherInCharge   Active
012      Football   Chan Tai Man      .T.

c) In the school, each student can join several clubs. Each time a student joins a club, there will be a record in MEM.

(i) State two drawbacks of this arrangement of MEM.

1. Data of a student may appear more than once in the file MEM / longer data entry time (data redundancy)
2. When the data of a student changes, all records related to the student have to be updated
3. Any incomplete updating causes data inconsistency
4. Wastes storage space / longer access time

(ii) Suggest a primary key for MEM.

StudentID & ClubNo (Class & ClassNo & ClubNo) <- Composite key.

2006 – AS – CA #6

6. A school uses a database file, STU, to store information about students, as follows:

STU
Field name   Type        Width   Description                   Example of data
SNAME        character   20      Name of the student           Wong Lai Mei
SCLASS       character   2       Class of the student          3D (Form 3, class D)
SNO          character   2       Class number of the student   38
STUID        character   6       Student code                  501478

a) Give two conditions for a field that can be used as a primary key.

Unique and mandatory (non-empty, not null)

<- I think all of you should be able to answer this question, because it only requires some fundamental knowledge. (Even if you don't know the word mandatory, you should know that the field has to be non-null.) In fact, the question itself should not use the word 'condition'; unique and mandatory are not conditions, they are properties.
The condition for the primary key is "It is the field that uniquely identifies every single entry of the database table." Therefore, do not take the wording of the questions too seriously; try your best to use the knowledge in the book to answer them.

<- Like the primary key, the condition (not property) for an INDEX is that it speeds up data searching on frequently used fields enough to compensate for the extra workload when data is updated. (Always remember that if there are too many indexes on different fields in a table, a single data entry modification results in a lot of workload for the indexing. So, usually, apart from the primary key, which is always indexed by default, at most one or two more fields are indexed.) Here, you should know that an indexed field does not necessarily have to be unique, i.e. repeated values can be indexed. If you do not understand this paragraph, come and ask me.

<- If this time the question asks you 'What is the condition for creating a VIEW in a database?', what would you answer? First of all, I have to admit that I do not know the focus point of this question (and this always happens with the questions in ASCA) and I have nothing on my mind, but anyway, I would use the knowledge of the book to answer the question. The answer I would give would be: A view is a virtual table formed from one or more tables existing in the database, so every condition that applies to creating a table is suitable for a view, e.g. a primary key field should be present. Since the data in the view comes from other existing tables, those other tables should be present. Also, a view is used so that different users can access partial data in different tables (access here means read, write and modify), so changes in the view should be reflected in the parent tables.

<- See, just try to use the knowledge in the book to answer the question and then you can get the mark.

b) Suggest a primary key for STU.
STUID / SCLASS + SNO / SNAME + SCLASS + SNO + STUID

<- A primary key (or candidate key) should be the minimal combination of fields that uniquely identifies each entry in the database table. So, basically, only STUID should be considered as the 'appropriate' choice for the primary key.

Another database file, EXAM, is used to store the students' examination results.

EXAM
Field name   Type        Width   Description                                   Example of data
STUID        Character   6       Student code                                  501478
SUBID        Character   2       Subject code                                  CH (Chemistry)
MARK         Integer     2       Exam mark of student STUID in subject SUBID   78

c) Suggest a primary key for EXAM.

STUID + SUBID / STUID + SUBID + MARK

<- STUID + SUBID + MARK should be considered as a super key but not an appropriate choice for the primary key.

d) Assume that EXAM is related to STU.

(i) Can STUID in EXAM be used as a foreign key? Explain briefly.

Yes, it can uniquely identify the records in STU. (other descriptive statements)

(ii) Write down a SQL command to create EXAM with the primary key in part (c).

    CREATE TABLE EXAM (
        STUID CHAR(6),
        SUBID CHAR(2),
        MARK INTEGER(2),
        PRIMARY KEY (STUID, SUBID))

Note: (UNIQUE ≠ PRIMARY KEY)

Do you still remember how many data types there are in the database? Here are some examples:
    char(n)   numeric(n, m)   date   logical   memo   integer

Appendix 1: databases and web server and web applications

Web application and database relations

When we create a server side application, we usually have to deal with some databases. E.g. an online forum or an online multiple choice system would be some common applications. They are usually done with the following process:

1. Client computer (HTTP request)
   Description: The client computer sends an HTTP request to the web server by using the web browser. (Usually, the client computer will send the URL, e.g. http://abc.com; then the DNS server will resolve the domain name into its IP address.)
   Web server: Inside the web server, …

2.
   Client computer: The client computer will wait for the response.
   Web server: The web server will execute the server application (some program code). This program code will try to connect to the databases residing on the server. Usually, this is done by using a database engine, e.g. Microsoft Jet 4.0. It is the engine which performs the SQL request.

   [Diagram: Server application, Database Engine and the databases, linked by arrows]

   You should note the direction of the data flow.

3. Client computer: …
   Web server: The server application will then generate the HTML code according to the information obtained from the databases. Then, it will send the web pages (HTML code) to the client computer. Now, the web server will have nothing to do but listen to the network to see if it can serve any other client computers.
   Client computer: After the web pages and their corresponding multimedia files are downloaded, they will be interpreted and displayed by the web browser.

Since the process involves several steps, we will investigate them one by one.

Step 1: A web server (it is in fact a software program) has to be built to listen to the network and handle the HTTP requests accordingly. In the market, there are two common web servers which are widely used and free of charge. They are IIS and Apache (open source). These web servers provide a framework to deal with the HTTP requests from client computers, i.e. if a client asks for a specific file (a web page or a multimedia file), these web servers will arrange the time to deliver the data.

Step 2: Apart from static web pages, a web server is supposed to provide dynamic web pages, i.e. it should be able to run server side applications. IIS has its server side programming built in; the files are called .asp or .aspx (asp.net). However, for Apache, one has to install PHP to make the web server able to execute server side programs; the common files are .php.
Last of all, it is not enough to have just the server side programs; a database engine has to be installed to provide the interface for executing queries from the application on the database. IIS has its engine installed. However, for Apache, MySQL has to be installed to get the connection to the database.
*The details of the settings will be discussed in the section on setting up a web server.

Step 3: After the delivery of the web page, basically, there is no connection between the web server and the client computer any more. It is the web browser's responsibility to interpret the HTML code and display it on the screen. That is why the same web page from the same web server may look different in two different web browsers.

Setting up a web server

Since IIS is easier to install (both the server side programming and the database engine are included), we will use IIS as the demonstration. In Windows XP or Windows 2000, IIS is regarded as one component. So, to install IIS, all you have to do is:

1. Call "Add / Remove Program" in Control Panel.
2. Select "Windows components" and check the box of "Internet Information Services (IIS)" in the dialog box.
3. Insert the installation CD (Windows XP prof. edition SP2) and then IIS is installed.
4. To set up the web server, we can call the msc by control panel -> administrative tools -> Internet Information Services. From it, we can modify the properties of the web server.
5. First of all we should add "index.htm", "index.asp" or "index.html" as the default home page for the web server.
6. Now, we can create a web page called "index.htm" and put it in the root of the web server, which is in fact in the path "C:\Inetpub\wwwroot\".
7. Then, if you've created the web page with the name "index.htm", put it in the correct folder and set the default home page as "index.htm", you should find your server working.
You can test it by launching the HTTP request with the following URL in your browser: http://127.0.0.1/

A web server should be opened to the public. So, if you are using a real IP, say 212.44.55.66, you can access the web server from a remote computer through the Internet with the following URL: http://212.44.55.66/. However, if you have a firewall, you should set the firewall properly, otherwise remote computers cannot connect to your web server. As you know, letting your server be connected from the outside world exposes it to hacking, so usually we will only allow several ports to be opened to the public. For the firewall in Windows XP, the firewall is set in control panel -> network -> local area network -> (right click and select properties) -> Advance -> Edit the firewall. Usually, we should turn on the firewall and allow some exceptions, e.g. we can add the HTTP port (port number 80) as an opened port. By doing so, the web server is opened to the public and uses port number 80 for the communication. So, by now, the remote client computers are supposed to be able to connect to the web server with the URL: http://212.44.55.66.

If your IP address is fixed, you can register a domain name and map the domain name to your IP address to host a web server. E.g. http://abc.com -> http://212.44.55.66. Sometimes, we can set up an FTP server and direct the FTP root to the root of the web server for updating…

Developing server side application by using server side programming

We can develop PHP programs in Apache, whereas ASP programs are developed in IIS. For simplicity, we will only focus on developing ASP programs. To tell the web server that it is an ASP program, the first line of the asp file would be:

    <%@LANGUAGE="VBSCRIPT" CODEPAGE="950"%>

Also, all the program code should be within <% and %>.
To get the connection to the database, here are the statements required:

    Set Conn = Server.CreateObject("ADODB.Connection")    ' Create a variable to hold the connection
    Conn.Provider = "Microsoft.Jet.OLEDB.4.0"              ' Define the database engine used
    Conn.Open(Server.MapPath("db/project.mdb"))            ' Set the path of the database file
    Set RS = Server.CreateObject("ADODB.Recordset")        ' Create a variable to hold the result of the query (SQL)

To interact with the database, we should use SQL, so we have to define the SQL statement, then execute it and store the result in the recordset RS.

    sql = "SELECT * FROM student"                          ' Define the SQL statement
    RS.Open sql, Conn                                      ' Execute the SQL

To handle the case where the SQL statement returns no result, it is common to put it this way:

    If Not RS.EOF Then           ' Test whether RS is at End of File (eof)
        RS.MoveFirst             ' If not, point to the first selection in RS
        Do                       ' Start the DO loop
            ……
            RS.MoveNext          ' Next selection in RS
        Loop Until RS.EOF        ' Quit condition is end of file
    End If

After we finish using the recordset RS, we should end our connection:

    RS.Close
    Conn.Close
    Set RS = Nothing
    Set Conn = Nothing

The above shows the core ASP statements for getting access to the database. Now, we go on to study how the project can be implemented.

Developing the project:

This project requires an online learning platform. My proposed idea is to let the user log in to the system and then give a number of choices, say MC questions, online polling or a forum, etc. So, first of all, an index.htm has to be created as the login page. After the button on the login page is pressed, it will pass the information to the server side application called "logon.asp". Here, we should pay attention to two points: 1) How would information be passed to another web page (web application)? 2) How would the web application distinguish different information?
    <Form name="MyForm" action="logon.asp" Method="Post">     <!-- Define the application as logon.asp -->
    Username: <INPUT TYPE=text name="username"><br>            <!-- Name the first textbox username -->
    Password: <INPUT TYPE="password" name="passw"><br>         <!-- Name the second textbox passw -->
    <INPUT TYPE="submit" VALUE="Submit Form">                  <!-- Note the type of button is submit -->
    </Form>

*We need to use a <FORM> to submit several pieces of data to a web application.

After the information is submitted to the web application "logon.asp", the program can use the following code to get the data from the form in index.htm.

    temp_user = Request.Form("username")      ' Get the variable username in the form in index.htm
    temp_password = Request.Form("passw")

In the above example, temp_user will hold the value "03002".

By using the variable temp_user, we can construct the SQL statement to find the password in the database.

    sql = "SELECT password FROM student WHERE studID = '" & temp_user & "'"

In fact, the SQL statement should look like:

    SELECT password FROM student WHERE studID = '03002'

The details of the program code are shown below:

    <%
    temp_user = Request.Form("username")                   ' Get the username from the previous form
    ' Set the connection
    Set Conn = Server.CreateObject("ADODB.Connection")
    Conn.Provider = "Microsoft.Jet.OLEDB.4.0"
    Conn.Open(Server.MapPath("db/project.mdb"))
    Set RS = Server.CreateObject("ADODB.Recordset")
    ' Define the SQL statement, to find out the password of that particular studID
    sql = "SELECT password, name FROM student WHERE studID = '" & temp_user & "'"
    RS.Open sql, Conn                                      ' Execute the SQL and store the result in the variable RS
    ' Test whether the passw from the previous form equals the password in the
    ' recordset RS; if not, redirect to a page called wrong_input.htm
    If RS.Fields("password") <> Request.Form("passw") Then
        Response.Redirect("wrong_input.htm")
    End If
    ' Clear the variables RS and Conn
    RS.Close
    Conn.Close
    Set RS = Nothing
    Set Conn = Nothing
    %>

If the password of that particular studID is correct, the program stays on the page (it does not redirect to wrong_input.htm) and a form of choices is shown. At this page, select MC exercise 001 and press the corresponding button GO!, and you will find that an online MC question appears.

As you can expect, there would be at least two forms, one for the MC and the other for the polling. For the MC, the HTML code would look like:

    <FORM action="domc.asp" method="post">

And we can use the following code to define the combo box for the MC:

    <SELECT name="mc">                                     <!-- Define the selection button -->
    <%
    ' Set the connection
    Set Conn = Server.CreateObject("ADODB.Connection")
    Conn.Provider = "Microsoft.Jet.OLEDB.4.0"
    Conn.Open(Server.MapPath("db/project.mdb"))
    Set RS = Server.CreateObject("ADODB.Recordset")
    sql2 = "SELECT DISTINCT exID FROM mc"                  ' Define the SQL to find exID
    RS.Open sql2, Conn                                     ' Execute the SQL
    If Not RS.EOF Then
        RS.MoveFirst
        Do
            ' Put the exID into the selection button; each option should be given a corresponding value
            Response.Write "<option value='" & RS("exID") & "'> " & RS("exID") & "</option>" & Chr(13)
            RS.MoveNext
        Loop Until RS.EOF
    End If
    %>
    </SELECT>

With the code above, the web application domc.asp can get the MC exercise number by using the code temp_exID = Request.Form("mc"). However, how does it know which user (studID)? And do we need to check the login password again?
In fact, we can use a method called session to handle it. It can be done in logon.asp:

    If RS.Fields("password") <> Request.Form("passw") Then
        Response.Redirect("wrong_input.htm")
    Else
        ' If the password is correct, define the sessions "authorized" and "user"
        ' and set some value to them
        Session("authorized") = "true"
        Session("user") = Request.Form("username")
    End If

Then, in domc.asp, we do not have to check the password all over again; all we need is a simple statement like this (note that the comparison is case sensitive here):

    If Session("authorized") <> "true" Then
        Response.Redirect "index.htm"
    End If

At last, there should be a number of answer boxes, so the following program code is required to generate the names of the answer boxes.

    no_record = 0
    If Not RS.EOF Then
        RS.MoveFirst
        Do
            no_record = no_record + 1
            Response.Write "<tr><td>" & RS("queID") & "</td>"
            Response.Write "<td>" & RS("question") & "</td>"
            Response.Write "<td>" & RS("choice1") & "</td>"
            Response.Write "<td>" & RS("choice2") & "</td>"
            Response.Write "<td>" & RS("choice3") & "</td>"
            Response.Write "<td>" & RS("choice4") & "</td>"
            ' Define the name of the answer box as the queID
            Response.Write "<td><input value='A' type='text' name='" & RS("queID") & "'></td></tr>"
            RS.MoveNext
        Loop Until RS.EOF
    End If
    ' Pass an object value "exID" to the next page
    Response.Write("<input type='hidden' name='exID' value=" & temp_mc & ">")
    Session("no_records") = no_record

To write data into the database, one should give the users the right to write. We can highlight the database file or the database folder, then select the choice "property". Then, we can select "security" and add a new account, say, everyone. Then, you can set everyone to have full control of the folder db (i.e. including the right to write). Now, the group account (everyone) grants any user the right to write / modify data in the folder db, i.e. the database file project.mdb.
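Two ideas from logon.asp, looking up the password of one particular studID and remembering a successful login in a session, can also be tried outside ASP. The sketch below is written in Python with the built-in sqlite3 module and is only an illustration under assumptions: the sample row, the function name check_login and the dictionary that stands in for the ASP Session object are all made up. It also passes the student ID as a bound parameter (the ? placeholder) instead of joining it into the SQL string, so that a quote character typed into the login box cannot change the meaning of the statement, and it treats an unknown studID as a failed login.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (studID CHAR(5) PRIMARY KEY, password CHAR(20), name CHAR(20))"
)
# Made-up sample row, for illustration only.
conn.execute("INSERT INTO student VALUES ('03002', 'secret', 'Chan Tai Man')")


def check_login(conn, session, username, passw):
    """Fill in the session and return True only when studID and password match."""
    row = conn.execute(
        "SELECT password FROM student WHERE studID = ?",  # ? is filled in by the driver
        (username,),
    ).fetchone()
    if row is None or row[0] != passw:  # unknown studID or wrong password
        return False
    session["authorized"] = "true"
    session["user"] = username
    return True


session = {}  # stands in for the ASP Session object
print(check_login(conn, session, "03002", "wrong"))    # wrong password
print(check_login(conn, session, "' OR '1'='1", "x"))  # quote characters are treated as plain data
print(check_login(conn, session, "03002", "secret"))   # correct login
print(session)
```

Only the third call succeeds, and only then are "authorized" and "user" stored in the session, which mirrors the Else branch in logon.asp.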
So, now, we can make use of the ASP program code to update data in the database. In the ASP program code, we have to set the recordset's properties so that it can be written to. Here are the statements required:

    RS.CursorType = 2
    RS.LockType = 3

And data can be assigned as follows:

    RS.AddNew
    RS.Fields("exID") = Request.Form("exID")
    RS.Fields("studID") = Session("user")
    RS.Fields("answer") = Request.Form(CStr(counter))

CStr is a function that converts a value into a string; it is required because the field "answer" in the database file MC_result is set to be text. At last, the update process can be finished by the following code:

    RS.Update

If you want to have the web application (the program code and HTML code), you can download it here: http://www.yll.edu.hk/~yll-cym/ca_web.zip

In fact, to make updating the web application easier, one should set up an FTP server and open the root of the web server through it. GoldenFTP is a freeware program for doing so; you can download it here: http://www.yll.edu.hk/~yll-cym/goldenftp/goldenftp.zip

<End of Appendix 1>
