RDBMS - Simmons College
RDBMS

Introduction

Any collection of objects intentionally organized for a purpose and for efficient retrieval is a “database”. In that sense, an entire library is a “database”. The term is used casually, and usually incorrectly, to refer to the computer-based “relational database management systems” or “RDBMS” discussed below. An RDBMS refers to the entire concept of business requirements, rules for describing data, commands for creating and manipulating the data, and generating reports, though you should note that most people use the term to refer solely to the computer program.

An RDBMS has several major components: the business logic (the rules that govern what data are to be collected, by whom, when, and how they’re to be used), the data themselves (grouped by function based on the business needs, and defined by specific data types; see below), and the commands to manipulate the data. What is informally called a “database” is usually the computer application that carries out the business rules and manipulates the data.

Some database applications, such as FileMaker Pro, MS Access, and phpMyAdmin, provide GUI tools to facilitate defining databases and data tables and creating reports from the data. Other tools, such as MySQL, Oracle, and Sybase, are just the tool for creating and manipulating data - the programmer must create the rest of the computer application for input, reports, etc. See also Database1.pdf.

For students new to RDBMS, it is helpful to think of it as a spreadsheet. There are columns and rows. Every cell in the spreadsheet has a unique row and column address (e.g., row 5, column 2). If we know the row and column, we can find the data. RDBMS are computer implementations of the relational algebra that makes it possible to identify uniquely every cell and every set of related cells.
This is the key: we can identify a single cell by some unique value, such as a record number, or by a set of values, such as lastname + firstname + date_of_birth + shoe_size.

Steps in creating an RDBMS

In both systems analysis and information architecture, one of the first phases of creating an RDBMS is to gather information about what data need to be collected, how they will be used and by whom, and the technical concerns about storing, retrieving, and presenting the data.

Data collection: only data that are actually needed ought to be collected. This is determined by interviews, by reviewing emails and other files that document the kinds of problems people have with their current system, and by studying how people in an organization use the data. Ultimately a report is generated that outlines the problem and presents candidate solutions. From that report, a database administrator or programmer considers the data problem from three perspectives: the conceptual, the logical, and the physical.

Conceptual phase: in this phase, the programmer or analyst breaks down the needs into functions and into the relationships among the data and the people who use the data. For example, say you’re creating a payroll system (this becomes the “payroll database”). You need to know the names of the staff, their pay rates, hours worked, whether the check has been printed, how much tax and insurance are to be deducted, and so on. In the conceptual phase, you’d decide how the data should be grouped by function: e.g., a record-hours-worked function, a print-paycheck function, a check-data-for-accuracy function, and so on. At this high level, only the major functions are defined and related; these will become the “tables” that belong to the single database. How the data move between these functions is called “data flow analysis.” The analyst also determines who gets to see what data and when (this is called the “view of the data”).
For example, Joe cannot see Tom’s paycheck. Joe cannot change his pay rate but his manager can. These controls have to be specified and documented.

Logical phase: from the conceptual design we can move to the logical phase. Here the main activities are “data decomposition” and “data normalization.” Data decomposition means breaking down the concepts into something closer to how a computer might use them. The concept of “staff name” must be broken down, say, into last name, first name, middle initial, and job title. In addition, the analyst tries to prevent redundant data from being collected. For example, we want to gather the staff member’s name only once and at the right time (when hired). This task is called normalization. There are many forms of normalization, but the goal for most of us is to reach “third normal form” or 3NF. The interfaces of FileMaker Pro and MS Access help guide the designer toward 3NF; otherwise the programmer is required to enforce normalization. [There are times when redundant data must be captured, but this has to be justified in the design of the database system.]

All the data that are to be captured must be defined by being given a data type (see the Topic Data_Types), and all the definitions are gathered into a document called the data dictionary. The data dictionary is literally that - the names of all the data to be captured, their data types, their aliases, how the data are used, what table contains what fields, etc. It is critical to the success of any project to have and maintain the data dictionary.

Physical phase: once the data have been defined, the programmer uses the documents from the logical design phase to construct the actual database and tables in software. If you use MS Access or a similar product, the program guides you, but it is still easy to make mistakes if you don’t understand how RDBMS work. We’ll focus on MySQL to demonstrate. MySQL is a software program that helps you to create other software programs, specifically the data part of an RDBMS, by issuing various commands.
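To make the payroll example and the normalization goal above more concrete, here is a minimal sketch in which the staff member's name is stored once and each week's hours refer back to it by record number. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server; the table names, column names, and sample values are all assumptions made for illustration.

```python
import sqlite3

# Assumption: SQLite (bundled with Python) stands in for MySQL here.
# Table names, column names, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        staff_id   INTEGER PRIMARY KEY,   -- unique record number
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        pay_rate   REAL NOT NULL          -- dollars per hour
    );
    CREATE TABLE hours_worked (
        staff_id INTEGER NOT NULL REFERENCES staff(staff_id),
        week     TEXT NOT NULL,
        hours    REAL NOT NULL
    );
""")

# The name is entered once, when the person is hired.
conn.execute("INSERT INTO staff VALUES (1, 'Smith', 'Jane', 20.0)")

# Each week adds only the new facts: who, which week, how many hours.
conn.executemany(
    "INSERT INTO hours_worked VALUES (?, ?, ?)",
    [(1, "week 1", 40.0), (1, "week 2", 35.0)],
)

# Gross pay is derived by relating the two tables on staff_id.
gross_pay = conn.execute("""
    SELECT s.last_name, SUM(h.hours * s.pay_rate)
    FROM staff AS s
    JOIN hours_worked AS h ON h.staff_id = s.staff_id
    GROUP BY s.staff_id, s.last_name
""").fetchall()
print(gross_pay)
```

If Jane's name changes, only one row in the staff table has to be updated, which is the practical payoff of not storing the same datum in more than one place.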
You’ll still need some kind of programming or scripting to interact with the database and tables. [In LIS488, we’ll use php and Java.]

All SQL programs cluster their commands into two groups: Data Definition Language (DDL) and Data Manipulation Language (DML).

DDL commands include CREATE database …, CREATE table …, and others.
DML commands include INSERT into …, SELECT …, UPDATE …, and others.

Example: let’s say you’re creating a database about yourself and you want to insert data into and retrieve data from the database over the Internet. As a first try, we create a database called “Transcripts” and then a table called “grades”. This is a possible data definition:

    field name       data type   size            example
    last_name        String      25 characters   Smith, De la Rosa
    first_name       String      15 characters   Jane, Tom
    middle_initial   char        1               M
    collegeName      String      25              Wellesley College
    major            String      25              Art History
    age              int                         24
    course_1         String      6               LIS488
    grade_1          float       4               4.0
    course_2         String      6               LIS458
    grade_2          float       4               3.7
    [and so on...]

If we know these data, we can plan the kind of report (screen) design we want:

    About me | My classes
    Welcome to Jane Doe’s online resume
    I’m a 24-year-old Art major from Wellesley College, now enrolled in GSLIS at Simmons College.
    LIS488   4.0   A
    LIS458   3.7   A-
    GPA: 3.52

To get these data, we first tell SQL which database we want to use:

    USE Transcripts;

Then we issue a command against the table that is part of the database (“grades”):

    SELECT * FROM grades;

* means “all”, so the command is “get all the records from the grades table”. Because there is only 1 record in the grades table, we retrieve all our data (in this case only the 1 record).

If everyone in the class wanted to post their grades online, then it would make more sense to separate (decompose) the data into different functions: the student name and personal data in one table, the grades in another table, and the information about the courses in a third table.
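The single-table first try described above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL (SQLite has no USE command, so opening the connection plays the role of “USE Transcripts”), and the sample record reuses the values from the data definition and the report mock-up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; opening the connection
# takes the place of "USE Transcripts".
conn = sqlite3.connect(":memory:")

# DDL: define the single "grades" table from the first-try data definition.
conn.execute("""
    CREATE TABLE grades (
        last_name      VARCHAR(25),
        first_name     VARCHAR(15),
        middle_initial CHAR(1),
        collegeName    VARCHAR(25),
        major          VARCHAR(25),
        age            INT,
        course_1       VARCHAR(6),
        grade_1        FLOAT,
        course_2       VARCHAR(6),
        grade_2        FLOAT
    )
""")

# DML: insert the one record, then read everything back.
conn.execute(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Doe", "Jane", "M", "Wellesley College", "Art History", 24,
     "LIS488", 4.0, "LIS458", 3.7),
)
rows = conn.execute("SELECT * FROM grades").fetchall()

record = rows[0]
print(len(rows))  # one record in the table, so one row comes back
print(f"Welcome to {record[1]} {record[0]}'s online resume")
```

Notice that every additional course needs another pair of columns (course_3, grade_3, and so on), which is one practical reason for decomposing the data into separate tables.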
    database studentInfo
        table 1: student_names
        table 2: grades
        table 3: class_info

So, let’s redefine our database tables and update the data dictionary.

Database: studentInfo
Tables: student_names, grades, class_info

    table: student_names
        record_no        int not null auto_increment
        last_name        String 25
        first_name       String 15
        middle_initial   char 1
        major            String 25
        college          String 25

    table: grades
        record_no        int not null
        class_number     String 6 not null
        grade            String 2

    table: class_info
        class_number     String 6
        class_name       String 25
        desc             Text

Note that we define Strings but must choose the right kind of string type for our database, MySQL. Strings are “varchar()” [a variable-length field; historically 0-255 characters long, with longer limits in current MySQL versions]. A “text” holds up to 65,535 characters. Using this data dictionary, the programmer creates the physical form of the database and tables. An underlined field name means the field is indexed. More on that shortly.

SQL commands end with a semi-colon ;

    CREATE database studentInfo;

Now, let’s use the empty database and add tables:

    USE studentInfo;

To create tables, we add these commands:

    CREATE TABLE student_names (
        record_no int unsigned not null auto_increment,
        last_name varchar(25),
        first_name varchar(15),
        middle_initial char(1),
        major varchar(25),
        college varchar(25),
        primary key (record_no)
    );
    [press the return key]

    CREATE TABLE grades (
        record_no int unsigned not null,
        class_number varchar(6) not null,
        grade varchar(2)
    );

    CREATE TABLE class_info (
        class_number varchar(6),
        class_name varchar(25),
        `desc` text
    );

Because desc is a reserved word in MySQL, the field name is quoted with backticks in the CREATE statement.

Notice that the student_names.record_no field is “int unsigned” - this means the record number is an integer between 0 and 4,294,967,295. It has to be an integer because the computer performs an arithmetic function on the record number every time a new record is added. [The old record number is reviewed and 1 added to it to create the new record number.] It is “not null” because it serves as the primary key, which is indexed for fast retrieval.
A primary key cannot hold a value that is missing, so the addition of “not null” forces the SQL program to give us an error if we try to save the record without a value. We know there will be an index (and the primary key) from the last line of the create statement. In the grades table, we see that two fields cannot be null. Later we would create indices on these fields, too. Finally, notice that the class_info table has a field called “desc” with a type of Text. This means we can enter up to 65,535 characters in this field - far longer than we need for a course description.

Assignment or Lab - Due before the next class.
1. Practice the various commands listed below and compare their behavior with different options.

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

No doubt you’ve already learned some of the concepts and terminology of relational database modeling. To get up to speed with what you’ve covered, here are some quick notes that may help us harmonize our perspectives.

Components of a DBMS: Many roles and activities are involved, not all of which you may have encountered. Here’s a view of the components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the database administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit in). Ultimately the commands and functions must be communicated to the computer system itself - via the file manager, various other access methods, and buffers - before reaching the actual data stored in the relational databases and tables.

Query Processor: transforms queries into a series of low-level instructions directed to the database manager.

Database Manager (DM): interacts with the user-submitted application programs and queries. The DM accepts queries and examines the external and conceptual schema to determine what conceptual records are required to satisfy the request.
The DM places a call to the File Manager to perform the request.

File Manager: manipulates the underlying storage files and manages the allocation of storage space on the disk. Actual physical manipulation of the data is passed to the appropriate access method.

DML preprocessor: converts DML statements embedded in an application program into standard function calls in the host language (e.g., PHP or Java).

DDL compiler: converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog, while control information is stored in the data file headers.

Catalog manager: [not the library kind!] manages access to and maintains the system catalog; the system catalog is accessed by most of the DBMS components.

There are other important functions in SQL software, such as authorization control, command processor, integrity checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager. Most of these functions are performed by the 3rd party application (such as MS Access) or can be manipulated at the command line or in other resource files. Very large systems, such as those that usually use Oracle, have lots of these helper resources.

Terms:

Relation: a relation is a table with columns & rows.
Attribute: an attribute is a named column of a relation.
Domain: a domain is the set of allowable values for one or more attributes.

Figure 2: Instances of two tables (branch and staff relations)

What to know

Database: Shared collection of logically related data (and a description of these data), designed to meet the info needs of an organization.
Entity: Distinct object (person, place, thing, concept, or event) represented in the database.
Attribute: A property that describes some aspect of the object we wish to record; also a named column of a relation.
Domain: Set of allowable values for one or more attributes.
Tuple: Row of a relation.
Degree: The degree of a relation is the number of attributes it contains.
Cardinality: The number of rows in a relation.
Superkey: Attribute or a set of attributes that identifies uniquely a tuple within a relation.
Candidate key: Superkey such that no proper subset is a superkey within the relation.
Relational db: Collection of normalized relations.
Primary key: Candidate key that is selected to identify tuples uniquely within the relation.
Foreign key: Attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
Null: A value for an attribute that is currently unknown or is not applicable for this tuple.
Referential integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Join: The combination of related rows from two or more tables, matched on the values of common attributes.
Relationship: Association between several entities.
Relation: A table with columns and rows.
DDL: Data Definition Language - set of commands to define the data and constraints on the data stored in the database.
DML: Data Manipulation Language - set of commands to manipulate the data, including a general enquiry facility to the data (aka query language).
View of the data: Dynamic result of one or more relational operations on the base relations to produce another relation. A view is a virtual relation that doesn’t actually exist in the DB.
Logical db design: The layout of the relationships of data, using specific design techniques and tools that document the needs and uses of data, according to an organization’s specific data needs.
Business rules: The articulation of an organization’s data needs.
Physical db design: The implementation of the logical database design, usually expressed as the creation of databases, tables, indices, etc.
OODBMS: Object-Oriented Database Management System.
External view: Users’ view of the db.
Conceptual level: What data are stored in the db and the relationships among the data.
Internal level: Physical representation of the db on the computer.
DB Schema: The definition of a database; the structure and content of each data element within the structure. Often created using visualization tools.
Data independence: Techniques that allow data to be changed without affecting the applications that process it.
5GL: Fifth Generation Language - computer languages are numbered by generation (e.g., 4GL, 5GL), each generation being closer to human language than the last.
Data normalization: The process of analyzing data into record groups for more efficient processing. There are many stages, the most standard result being “3NF” (third normal form), where data are identified only by the key field in their record. The main purpose is to eliminate having to store a single datum in more than one place.
3NF: Third normal form; the standard target of data normalization.
Information system: Resources that enable the collection, management, control, and dissemination of info throughout an organization.
DB app lifecycle: DB planning, system definition, requirements collection and analysis, db design, application design and prototyping, etc.
Requirements analysis: Process of collecting and analyzing info about the part of the organization that is to be supported by the DB application and using this info to identify the users’ requirements of the new system.
CASE: Computer-Aided Software Engineering - usually software that helps in the development of an information system, including analysis, design, and programming.
Data dictionary: The document (or database) that defines the databases, tables, data types, sources, etc., for an information system.
Entity type: An object or concept that is identified by the organization as having an independent existence.
Weak entity type: Entity that is existence-dependent on some other entity.
Strong entity type: Entity that is not existence-dependent on some other entity.
Composite attribute: Attribute composed of components, each with an independent existence.
Multi-valued attribute: Attribute that holds multiple values for a single entity.
Derived attribute: Attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity.
Superclass: Entity type that includes distinct subclasses that need to be represented in the data model.
Normalization: The process of producing a set of relations with desirable properties, given the data requirements of an enterprise.
1:M: Representation of the one-to-many relationship, one data element related to many others.
M:N: Representation of many-to-many data relationships; database normalization breaks down M:N into 1:M.
1:1: One-to-one data relationship.
Oracle: Popular very large scale relational database product.
SELECT: Database command, or statement, to select data from a table.
ALTER: Database command, or statement, to modify the structure of a database table.
UPDATE: Database command to change the data contents of an existing row.
INSERT: Database command to add new data (add a new row) into a table.
phpMyAdmin: Web-based GUI for working with MySQL.
Data decomposition: Breaking down of work functions into discrete units of data.
ER: Entity-relationship diagram, a graphic representation of data relationships.
UML: Unified Modeling Language, an object-oriented analysis and design language; has 12 diagrams: four structural, five behavioral, and three model management (packages, subsystems, and models).
Web-enabled DB: A relational database that has been linked to the Internet.
Object-oriented programming: Writing software that supports a model wherein the data and associated processing (“methods”) are defined as self-contained “objects.” OOP has three major features: encapsulation, inheritance, and polymorphism.
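Several of the terms above (primary key, foreign key, referential integrity, join, view, degree, cardinality) can be seen working together in one small sketch. It is an illustration under assumptions rather than the course's own code: Python's built-in sqlite3 module stands in for MySQL, the primary and foreign key declarations on grades and class_info are added for the demonstration, and the course name and the inserted rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE student_names (
        record_no  INTEGER PRIMARY KEY,          -- primary key
        last_name  VARCHAR(25),
        first_name VARCHAR(15)
    );
    CREATE TABLE class_info (
        class_number VARCHAR(6) PRIMARY KEY,
        class_name   VARCHAR(25)
    );
    CREATE TABLE grades (
        -- foreign keys: each grade must point at an existing student and class
        record_no    INTEGER NOT NULL REFERENCES student_names(record_no),
        class_number VARCHAR(6) NOT NULL REFERENCES class_info(class_number),
        grade        VARCHAR(2),
        PRIMARY KEY (record_no, class_number)
    );
    -- a view: a virtual relation produced by joining the base relations
    CREATE VIEW transcript AS
        SELECT s.last_name, c.class_name, g.grade
        FROM grades AS g
        JOIN student_names AS s ON s.record_no = g.record_no
        JOIN class_info AS c ON c.class_number = g.class_number;
""")
conn.execute("INSERT INTO student_names VALUES (1, 'Doe', 'Jane')")
conn.execute("INSERT INTO class_info VALUES ('LIS488', 'Sample course A')")
conn.execute("INSERT INTO grades VALUES (1, 'LIS488', 'A')")

transcript = conn.execute("SELECT * FROM transcript").fetchall()

# Referential integrity: a grade for a student who does not exist is refused.
try:
    conn.execute("INSERT INTO grades VALUES (99, 'LIS488', 'B')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

cur = conn.execute("SELECT * FROM student_names")
degree = len(cur.description)      # number of attributes (columns)
cardinality = len(cur.fetchall())  # number of tuples (rows)
print(transcript, rejected, degree, cardinality)
```

One student can have many rows in grades, which is the 1:M relationship from the list above; the grades table itself is what breaks the M:N relationship between students and classes into two 1:M relationships.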