Overview of Relational Database

What is Microsoft SQL Server?
• SQL Server is a Database Management System (DBMS) sold by the Microsoft Corporation.
• It is one of the most popular DBMS products in the world.

Principal Editions of SQL Server
• Enterprise
• Business Intelligence
• Standard
• SQL Server Web
• Breadth editions

Principal Editions of SQL Server
• Enterprise
  • Delivers comprehensive high-end data center capabilities with fast performance.
  • Unlimited virtualization.
  • End-to-end business intelligence.
  • Enables high service levels for mission-critical workloads and end-user access to data insights.

Business Intelligence
• Delivers a platform that empowers organizations to build and deploy secure, scalable, and manageable business intelligence (BI) solutions.
• BI refers to data transformed into knowledge that can then be used to make more informed business decisions.

Standard
• Delivers basic data management and a business intelligence database for departments and small organizations.
• Enables effective database management with minimal IT resources.

Other Editions of SQL Server
• SQL Server Web
  • A specialized edition of SQL Server that targets business workloads.
  • A low total-cost-of-ownership option for Web hosters and Web VAPs to provide scalability and affordability.
  • Manageability capabilities for small to large scale Web properties.

Breadth Editions of SQL Server
• These editions are engineered for specific customer scenarios and are offered free or at a very nominal cost.

Breadth Editions of SQL Server
• Developer
  • Lets developers build any kind of application on top of SQL Server.
  • It includes all the functionality of Enterprise edition.
  • Is licensed for use as a development and test system.
  • Not a production server.
• Express
  • An entry-level, free database, ideal for learning and for building desktop and small server data-driven applications.
  • It is a good choice for
    • Independent software vendors.
    • Developers.
    • Hobbyists building client applications.

Data Persistence
• Computer programs process data in the main memory of a computer.
• The changes made to data in main memory are temporary.
  • They do not persist beyond the execution of the program.
• To preserve changes to data, the data must be written to disk.

There are two common approaches to data persistence:
• Use simple operating system files
• Use a database

Simple operating system files are common to
• Spreadsheets.
• Word processors.
• Other productivity software.

The organization of such files is based on a record
• A record is a collection of data items.

Using simple operating system files
• A record for one entity, such as an employee, has no defined association with a record for another entity, such as a department.
• Users or programmers may recognize an association.
• It is not made explicit in the files.

Files and Databases
• The fundamental concept of a database management system differs from that of a simple file system.
• A file is a collection of data items.
• A database is a collection of data items.
• A database also captures relationships between data items.

The manner in which relationships are captured by a database depends on the type of system.
• Traditional network and hierarchical databases represent relationships via pointers.
• Relational databases take a more value-based approach and represent relationships via foreign keys.

Relationships have different cardinality
• One to One (1:1)
• One to Many (1:M)
• Many to Many (M:M)

Foreign key
• A relational database can directly represent a 1:1 or 1:M relationship using a foreign key.
• An M:M relationship is similar to two 1:M relationships and is represented in a separate table.
• A database explicitly captures such relationships.
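The foreign-key representation described above can be sketched in a few lines. This is a minimal illustration, not SQL Server itself: it uses Python's built-in sqlite3 module as a stand-in, and the table names (department, employee, project, assignment), their columns, and the sample rows are all hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; SQL Server enforces them by default.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for an RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
CREATE TABLE department (
    dept_id TEXT PRIMARY KEY,
    dname   TEXT NOT NULL
);
-- 1:M relationship: each employee row carries the key of its one department.
CREATE TABLE employee (
    emp_no  TEXT PRIMARY KEY,
    ename   TEXT NOT NULL,
    dept_id TEXT REFERENCES department(dept_id)
);
CREATE TABLE project (
    proj_id TEXT PRIMARY KEY,
    pname   TEXT NOT NULL
);
-- M:M relationship: a separate table holding two foreign keys,
-- that is, two 1:M relationships.
CREATE TABLE assignment (
    emp_no  TEXT REFERENCES employee(emp_no),
    proj_id TEXT REFERENCES project(proj_id),
    PRIMARY KEY (emp_no, proj_id)
);
""")

conn.execute("INSERT INTO department VALUES ('D1', 'Research')")
conn.execute("INSERT INTO employee VALUES ('E1', 'Ada', 'D1')")
conn.execute("INSERT INTO project VALUES ('P1', 'Payroll')")
conn.execute("INSERT INTO assignment VALUES ('E1', 'P1')")


def insert_orphan() -> bool:
    """Try to insert an employee whose department does not exist."""
    try:
        conn.execute("INSERT INTO employee VALUES ('E2', 'Bob', 'D9')")
        return True
    except sqlite3.IntegrityError:
        return False  # the DBMS rejected the dangling reference


orphan_accepted = insert_orphan()

# The relationships are followed by matching key values, not pointers.
rows = conn.execute("""
    SELECT e.ename, d.dname, p.pname
    FROM employee e
    JOIN department d ON d.dept_id = e.dept_id
    JOIN assignment a ON a.emp_no = e.emp_no
    JOIN project p ON p.proj_id = a.proj_id
""").fetchall()
print(orphan_accepted, rows)
```

The rejected insert is the point of the sketch: because the relationship is declared to the database, the system rather than the application refuses a reference to a department that does not exist.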
A database is a collection of related data and has the following implicit properties
• Represents some aspect of the real world
• Consists of a logically coherent collection of data with some inherent meaning
• Designed for a specific purpose - an intended group of users with some preconceived applications

A database management system (DBMS)
• A computer program (or collection of programs) which
  • Manages a database.
  • Acts as an interface between the database and the application programs and front-end tools that access it.

The DBMS acts as an interface to data stored in the database.
• Regardless of whether the “front-end” program is
  • Running on the same machine as the DBMS (i.e. Server)
  • Running on a different machine (i.e. Client).

Front-end Program
• The “front-end” program communicates its requests for data through the DBMS.
• Programs do not directly access the operating system files that contain the data entrusted to the DBMS.
• The front-end program could be an interactive interface for executing database commands (e.g. SQL Server Management Studio).
• It could be an application program that contains embedded database commands (e.g. Visual Basic).

DBMS Responsibilities
• The DBMS adds a layer of software between the application and the stored data.
• This layer adds overhead.
• In exchange, the DBMS performs many important functions.

A DBMS provides all of the following.
• Security
  • Prevents unauthorized access to data
• Integrity
  • Enforces business rules
• Concurrency
  • Controls data access by multiple users
• Consistency
  • Supports transaction processing
• Recovery
  • Mechanisms to restore the database after a
    • transaction failure.
    • database failure.
    • media failure.

RDBMS
• Many popular database systems are based on the relational model.
• A DBMS based on the relational model is said to be an RDBMS.

Relational Database Systems
• DB2 UDB (IBM)
• Oracle Database (Oracle)
• SQL Server (Microsoft)

The relational model establishes criteria for
• The (logical) structure of the data within the DBMS.
  • How users see the data
• The language used by the DBMS.
  • How users operate on the data
• Database integrity.
  • The enforcement of relationships and business rules

A logical view of data
• Does not impose any restrictions on how that data may be physically stored or accessed.

Relational Data Structure
• Data are perceived as a collection of tables.
• Each table consists of columns and rows.

Data Processing Terminology (1 of 7)
• Table
• Row
• Column
• Relations

Table (2 of 7)
• A table is similar to a file.
• A program reads from or writes to a file.
• When accessing a relational database, a program directs its access to a table.

Row (3 of 7)
• A row is similar to a record.
• A record is a collection of data items that describe some entity.
• Its counterpart in a table is a row.

Column (4 of 7)
• A column is similar to a field or data item in a record.
• Each column must have a unique name within its table.
• Each column contains a single ("atomic") data item.
  • This means the DBMS treats that data item as a single unit of data.
• All values in a column must be of the same data type.

Column (5 of 7)
• The left-to-right order of columns is not important in the relational model.
• How the data values are physically stored does not impose restrictions on how we can retrieve that data in a query.
  • This behavior is known as data independence.
• We can return the column data in any left-to-right order we desire, regardless of the physical order in which these values are stored.

Row (6 of 7)
• Each row contains a single entry for each column.
• Each row-column intersection in a table holds exactly one value of some data type.
• Rows do not have names and can appear in any order.

Relations (7 of 7)
• The relational model is based on mathematical relations or sets.
• A table is a computer representation of a relation.
• A computer representation may not always be completely faithful to the object it represents, in this case a mathematical set.
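The points above about column order, row order, and sets can be tried out directly. The following is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server; the student table and its rows are hypothetical, with column names borrowed from the STUDNO / SNAME / SAGE examples used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE student (studno TEXT, sname TEXT, sage INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S1", "Lee", 20), ("S2", "Kim", 22)],
)

# Columns can be returned in any left-to-right order, whatever order
# was used when the table was defined.
reordered = conn.execute("SELECT sage, studno FROM student").fetchall()

# Without ORDER BY the row order is not guaranteed, so the result is
# compared as a set, which is what a relation is.
as_set = set(reordered)

# A specific order is obtained only by asking for it explicitly.
ordered = conn.execute("SELECT sname FROM student ORDER BY sname").fetchall()
print(as_set, ordered)
```

Comparing the unordered result as a set rather than as a list mirrors the relational view: the table is a set of rows, and any ordering is a property of the query, not of the stored data.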
Relational Language
• SQL (Structured Query Language) is the standard relational language used with SQL Server, Oracle, and almost every other RDBMS.

Relational Language
• SQL has two key features that distinguish it from conventional third generation (3GL) languages such as COBOL and from the languages used by traditional hierarchical and network DBMS products.
  • SQL is non-procedural
  • SQL supports set-level processing

Non-procedural
• Non-procedural means that a SQL statement indicates which rows and columns of a table are to be retrieved but does not specify how they are to be retrieved.
• SQL is therefore a declarative language.
• Query optimizer
  • This component decides how to access the data, based on what the user has requested and on characteristics of the data stored in the database.

Set-level processing
• A SQL statement can refer to many rows (a set of rows).
• This provides greater expressive power than traditional languages, which could only reference a single record at a time.

Relational Language
• SQL statements are classified according to three categories:
  • Data Definition (DDL)
  • Data Control (DCL)
  • Data Manipulation (DML)

Data Definition (DDL)
• DDL statements are used to
  • Create.
  • Drop.
  • Modify.
• DDL operations are generally performed by the database administrator (DBA).

Data Control (DCL)
• DCL statements are used to grant or revoke privileges on database objects.
• DCL operations are generally performed by the DBA or security administrator.

Data Manipulation (DML)
• DML statements are used to
  • Retrieve.
  • Insert.
  • Delete.
  • Update.
• DML operations may be performed by
  • Application developers.
  • Business analysts.
  • The DBA.

Indexes
• An index allows efficient retrieval of the data stored in a table.
• An index might also be used to enforce an integrity constraint.

How many indexes
• A table can have
  • One.
  • Many.
  • No indexes.

Index (1 of 3)
• The DBMS automatically updates an index when
  • A new row is added to a table.
  • An existing row is deleted from a table.
  • An existing row is updated.
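A short sketch can tie the index slides back to the non-procedural nature of SQL. It uses Python's built-in sqlite3 module as a stand-in for SQL Server; the employee table, the index name ix_employee_deptid, and the sample rows are hypothetical, and `EXPLAIN QUERY PLAN` is SQLite's way of showing the optimizer's choice (SQL Server exposes execution plans through different tools).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE employee (emp_no TEXT, ename TEXT, deptid TEXT)")
conn.execute("CREATE INDEX ix_employee_deptid ON employee (deptid)")

# The application never touches the index; the DBMS maintains it on
# every insert, delete, and update.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E1", "Ada", "D1"), ("E2", "Bob", "D2"),
     ("E3", "Cy", "D1"), ("E4", "Dee", "D2")],
)
conn.execute("DELETE FROM employee WHERE emp_no = 'E2'")
conn.execute("UPDATE employee SET deptid = 'D2' WHERE emp_no = 'E3'")

query = "SELECT ename FROM employee WHERE deptid = 'D2'"

# The query says what is wanted, not how to find it. The optimizer
# decides whether to use the index; the last column of each plan row
# holds its description of that decision.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[-1] for row in plan)

names = sorted(row[0] for row in conn.execute(query))
print(plan_text)
print(names)
```

The query text names no index, yet the plan refers to one, and the result reflects the earlier delete and update without any index maintenance code in the application.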
Index (2 of 3)
• An index can also be used to enforce data integrity.
  • Enforcing uniqueness of values in a column.

Database Integrity (1 of 2)
• Three categories of database integrity:
  • Entity Integrity
  • Referential Integrity
  • User-Defined Integrity

Database Integrity (2 of 2)
• SQL Server uses:
  • PRIMARY KEYs (entity integrity)
  • FOREIGN KEYs (referential integrity)

There are two ways that an RDBMS can support integrity:
• Declarative integrity
  • Business rules enforced as constraints defined as part of the definition of the table
• Procedural integrity
  • Business rules enforced programmatically through database triggers

Entity Integrity
• Entities are distinct and identifiable.
• An entity is typically represented by a row in a table.

Entity integrity
• The primary key:
  • Must be a unique column (or group of columns)
  • Must not be null
• A table can contain only one primary key constraint.
• A primary key cannot exceed 16 columns and a total key length of 900 bytes.

Primary Key Index
• Enforces entity integrity.
• Cannot cause the number of indexes on the table to exceed 999 non-clustered indexes and 1 clustered index.

Referential integrity (1 of 3)
• Referential integrity pertains to an entity instance referencing another "valid" entity instance.
• The values of the foreign key are constrained by the values of the referenced primary key.

Referential integrity (2 of 3)
• A foreign key includes a DELETE RULE to inform the system on how to handle an attempted deletion of a referenced row.
  • CASCADE
  • SET NULL
  • NO ACTION

Referential integrity (3 of 3)
• Corresponding index.
  • A foreign key constraint does not automatically create a corresponding index.
  • Manually create an index on the columns of a foreign key.

Parent/Child
• The system either accepts or rejects the child based on the existence of a referenced parent row.

DELETE Rule (1 of 3)
• A delete rule of CASCADE means the parent row and all child rows will be deleted.
• The cascade could extend to grandchildren and beyond.

DELETE Rule (2 of 3)
• NO ACTION
  • An attempted deletion of a parent row that is the target of one or more child references will be rejected by the system.

DELETE Rule (3 of 3)
• SET NULL
  • An attempted deletion of a parent row (that is the target of one or more child references) will be allowed by the system.
  • The child row will not be deleted.
  • The foreign key value of the referencing child rows will be set to a special value called NULL.
  • The NULL value means that the child has no parent.
  • This is a different situation from the case where a child row “references” a non-existent parent row.

User-Defined Integrity
• CHECK
  • CHECK identifies a condition which must be satisfied by a column value for the insertion of a new row or for the update of an existing row to be accepted by the system.
• UNIQUE
  • UNIQUE indicates that a column value must be unique within a table.
  • A unique index is automatically created by the system to enforce a UNIQUE constraint.
• NOT NULL and DEFAULT
  • These clauses help ensure that null values will not be stored in a column.

Triggers
• Triggers provide a means to implement procedural integrity.
• A trigger can be used to support complex business rules that could not be implemented within the framework of declarative integrity.

CONSTRAINT Example
• A constraint can be assigned a name with the CONSTRAINT clause.

    CREATE TABLE EMPLOYEE
       (EMP_NO   CHAR(5)       NOT NULL,
        ENAME    CHAR(30)      NOT NULL,
        ESALARY  NUMERIC(7,2)  CONSTRAINT CK_ESALARY CHECK(ESALARY > 0),
        DEPTID   VARCHAR(4),
        CONSTRAINT PK_EMPLOYEE PRIMARY KEY(EMP_NO),
        CONSTRAINT FK_DEPTID FOREIGN KEY(DEPTID) REFERENCES DEPARTMENT)

SQL Server Management Studio

What is SQL Server Management Studio?
• SQL Server Management Studio (SSMS) is an integrated environment to
  • Access.
  • Configure.
  • Manage.
  • Administer.
  • Develop components of SQL Server.

SQL Server Architecture

Fundamental Concepts (1 of 2)
• A database in SQL Server is made up of a collection of tables that stores a specific set of structured data.
Fundamental Concepts (2 of 2)
• A table consists of a collection of rows and columns.
• Each column holds a particular type of information.
  • Dates.
  • Strings.
  • Numbers.

Database Instances
• A computer can have one or more instances of SQL Server installed.
• Each instance of SQL Server can contain one or many databases.

Database
• A database includes one or more object ownership groups called schemas.
• Each schema contains database objects
  • Tables.
  • Views.
  • Stored procedures.

Permissions
• A user that has access to a database can be given permissions to access the objects in the database.

Database Instances (1 of 3)
• Each instance of SQL Server has
  • System databases
  • One or more user databases.

Database Instances (2 of 3)
• An instance of the SQL Server Standard or Enterprise Edition can
  • Handle many users working in multiple databases at the same time.

Database Instances (3 of 3)
• Each instance of SQL Server makes
  • All databases in the instance available to all users that connect to the instance.
  • Subject to the defined security permissions.

The system databases
• Created by default when an instance of SQL Server is installed
  • master database
  • tempdb database
  • model database
  • msdb database

Master database
• The master database is the primary system database.
• Without it, SQL Server cannot start.
• Contains the most important information about objects within the SQL Server instance.

tempdb database (1 of 5)
• The tempdb database is a global area for temporary objects created by the internal processes that run SQL Server and for temporary objects created by users or applications.

tempdb database (2 of 5)
• Temporary objects include
  • temporary tables and stored procedures.
  • table variables.
  • global temporary tables.
  • cursors.

tempdb database (3 of 5)
• tempdb is re-created every time SQL Server is restarted.

tempdb database (4 of 5)
• tempdb stores
  • Row versions for read-committed or snapshot isolation transactions.
  • Online index operations.
  • AFTER triggers.

tempdb database (5 of 5)
• tempdb should never be used to store persistent information.
• Because tempdb is global, it is accessible to all databases on the SQL Server system.

Model database
• The model database is a model for all databases created on an instance of SQL Server.
• It serves as a template each time a database is created.

msdb database
• The msdb database serves primarily as the back-end database for Microsoft SQL Server Agent.
• Whenever a SQL Server Agent job is created or scheduled, the metadata for that job is stored in this database.

Database Files (1 of 4)
• SQL Server maps a database over a set of operating system files.
• Every database has at least two operating system files:
  • Data file
  • Log file

Database Files (2 of 4)
• Data files contain data and objects
  • Tables.
  • Indexes.
  • Stored procedures.
  • Views.

Database Files (3 of 4)
• Log files contain the information that is required to recover all transactions in the database.

Database Files (4 of 4)
• Data files can be grouped together in filegroups for allocation and administration purposes.

SQL Server databases have three types of files: (1 of 4)
• Primary
  • The primary data file contains the startup information for the database and points to the other files in the database.
  • User data and objects can be stored in this file or in secondary data files.

SQL Server databases have three types of files: (2 of 4)
• Every database has one primary data file.
• The recommended file name extension for primary data files is .mdf.

SQL Server databases have three types of files: (3 of 4)
• Secondary
  • Secondary data files are optional.
  • User-defined.
  • Store user data.
  • Secondary files can be used to spread data across multiple disks by putting each file on a different disk drive.

SQL Server databases have three types of files: (4 of 4)
• If a database exceeds the maximum size for a single Windows file, you can use secondary data files so the database can continue to grow.
• The recommended file name extension for secondary data files is .ndf.

Transaction Log
• The transaction log files hold the log information that is used to recover the database.
• There must be at least one log file for each database.
• The recommended file name extension for transaction logs is .ldf.

sysdatabases
• Information about the database is recorded in the sysdatabases table of the master database.

When a new database is created (1 of 2)
• System objects are copied from the model database.
• The initial size of a database must be at least the size of the model database.
• The model database provides the starting point for all SQL Server databases on a system.
• Administrators may add additional objects to this database.

When a new database is created (2 of 2)
• All objects in the model database will automatically be copied to the new database.
• This is one way that administrators can ensure all databases have certain characteristics or objects.

Logical database
• Logical database design is concerned with the user's perception of data.
• In a relational database the user sees data as tables.
• Logical database design is concerned with identifying the tables for an application domain.

Implementation of a design
• The implementation of a design is concerned with physical matters
  • Indexes.
  • Hashing.
  • Ordering of rows.
  • Size of a table.
  • Size of physical blocks.
• Specification of a collection of base tables

Design Criteria
• The design should:
  • Satisfy specified data requirements
  • Be stable
    • Business changes should be easily incorporated
  • Be efficient
    • Implementation should perform well.
    • Resulting in good response time for queries

Additional database design criteria
• Logical Design
  • Base Tables
  • Views
• Physical Design
  • Indexes
  • Tablespaces or Filegroups

Database Models
• Logical Model
  • Relational Design
  • Applicable to any RDBMS
• Physical Model
  • Physical Design
  • Product specific
• The logical model provides the input to the physical model

• Conceptual Model
  • No reference to any DBMS
  • A non-technical description of an application domain
• Relational Model
  • Applies to any RDBMS
• Physical Model
  • Specific to an RDBMS product such as Microsoft SQL Server or Oracle Database

Design Challenges (1 of 2)
• Stability vs. Efficiency
• Almost impossible to maximize both objectives:
• Direct implementation of Conceptual Design
  • Understandable design
  • Simpler application programs
  • Simpler user-written SQL queries
  • But rejection of opportunities to improve machine efficiency

Design Challenges (2 of 2)
• Stability vs. Efficiency
• Almost impossible to maximize both objectives:
• Application of every efficiency technique
  • More complex design
  • Convoluted application programming
  • Fewer user-written SQL queries
  • Possible future machine inefficiencies
  • Loss of stability
    • Future design changes will be more difficult to implement

Understanding Semantics of Data
• Classic analysis problem enhanced
• Conceptual Design forces you to ask more comprehensive questions of users
• Management imposed time constraints
• Significant size and complexity of some application domains
• Difficulty of predicting performance

Transform to Logical Model (Relational Model)
• Each entity becomes a table
• Each Many-to-Many relationship becomes a table
• For a One-to-Many relationship
  • Include the primary key of the "one" table in the "many" table
• Include attributes as columns in tables

SQL Server Datatypes

Common Datatypes

Datatype | Maximum Size | Explanation
BIT | Integer that can be 0, 1, or NULL. |
CHAR(size) | Maximum size of 8,000 characters. | Where size is the number of characters to store. Fixed-length. Space padded on right to equal size characters. Non-Unicode data.
DEC(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
DECIMAL(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
FLOAT(n) | Floating point number. | Where n is the number of bits to store in scientific notation.
INT | -2,147,483,648 to 2,147,483,647 |
MONEY | -922,337,203,685,477.5808 to … |
NUMERIC(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
NVARCHAR(size) or NVARCHAR(max) | Maximum size of 4,000 or max characters. | Where size is the number of characters to store. Variable-length. If max is specified, the maximum size is 2GB. Unicode data.
SMALLDATETIME | Date values range from '1900-01-01' to '2079-06-06'. | Displayed as 'YYYY-MM-DD hh:mm:ss'
SMALLINT | -32,768 to 32,767 |
SMALLMONEY | -214,748.3648 to 214,748.3647 |
TEXT | Maximum size of 2GB. | Variable-length. Non-Unicode data.
VARCHAR(size) or VARCHAR(max) | Maximum size of 8,000 or max characters. | Where size is the number of characters to store. Variable-length. If max is specified, the maximum size is 2GB. Non-Unicode data.

Integrity constraints include the following:
• Primary Key
• Foreign Key
• Check Conditions (including NOT NULL)

ALTER TABLE Statement
• The ALTER TABLE statement can be used to modify a table definition
  • Add columns
  • Add constraints
  • Drop columns
  • Drop constraints
  • Disable constraints
  • Enable constraints
  • Disable triggers
  • Enable triggers

Database Design Approaches
• Top-Down
• Bottom-up

Database Design Approaches
• Top-Down
  • Discover
    • The entities.
    • The relationships.
    • The attributes or data items.

Bottom-up approach (1 of 5)
• Produces a logical data model which is translated into a physical data model to implement the database design.
• Focuses on the process model first.
• Making it a function-driven approach.

Bottom-up approach (2 of 5)
• All processes to be performed by the system are identified, as well as the data they require.

Bottom-up approach (3 of 5)
• The initial focus is on applications rather than on data.

Bottom-up approach (4 of 5)
• A data model is constructed to satisfy this precise set of data requirements.

Bottom-up approach (5 of 5)
• Techniques such as normalization are integral to this approach.

Bottom-Up Approach (Analysis Phase) (1 of 2)
• Collecting local views
• Determining functional dependencies among data items

Bottom-Up Approach (Design Phase) (2 of 2)
• Normalizing local views
• Synthesizing a global view

Normalization Theory (1 of 2)
• Normalization is an integral technique of a bottom-up approach.
• Normalization theory is used as a design verification technique in a top-down approach.

Normalization Theory (2 of 2)
• “A record is in second and third normal form if every field is either part of the key or provides a (single-valued) fact about exactly the whole key and nothing else”
• “A table is in third normal form if every non-key field depends upon
  • The primary key.
  • The whole key.
  • And nothing but the key”

Scope of Normalization (1 of 4)
• This theory is applicable to any DBMS, not just relational.

Scope of Normalization (2 of 4)
• It is applicable to the design of simple files as well.

Scope of Normalization (3 of 4)
• Normalization theory is a process of decomposing tables or records from an un-normalized form to a normalized form.
• Each decomposition reduces the possibility of errors or anomalies occurring during database update processing.

Scope of Normalization (4 of 4)
• By applying the normalization process, tables are transformed to a higher normal form.
• Note that 5th normal form (5NF) is a theoretical ideal.

Benefit and Cost of Normalization (1 of 4)
• The ultimate benefit derived from applying normalization theory is a stable design.
• Reducing the chance of update errors.
Benefit and Cost of Normalization (2 of 4)
• In order to implement a database through the normalization process, one must have a real understanding of the semantics (meaning) of each data item.
• This forces the database designer to effectively communicate with the users of the database.

Benefit and Cost of Normalization (3 of 4)
• Users understand clearly the meaning of data in relations, so they correctly formulate queries.

Benefit and Cost of Normalization (4 of 4)
• There is the extra effort of joining tables.

Functional Dependence (1 of 2)
• Given a relation R.
• Attribute Y of R is functionally dependent on attribute X of R if each X-value of R has associated with it precisely one Y-value of R (at any one time).
• Attributes X and Y may be composite.

Functional Dependence (2 of 2)
• Functional dependence means:
  • "You tell me X and I'll tell you Y"
• Notation:
  • R.X → R.Y
• Examples:
  • STUDNO → SNAME
  • STUDNO → SAGE
  • COURSENO → CNAME
  • POLICYNO → NAME-OF-INSURED
  • CLAIMNO → POLICYNO

Functional Dependence
• Functional dependence is not limited to a single data item.
• It can be extended to a collection of fields.
• Notation:
  • A,B → C

Functional dependence is a semantic notion.
• An understanding of the meaning of data items in the application domain is required.
• Applying functional dependence enables the designer to identify keys.

Candidate key
• If a field uniquely identifies every other field in a given table, it is a candidate key.
• Alternatively: if every field in a table is functionally dependent on field k, then k is a candidate key.

Composite key
• Sometimes a candidate key is a composite key
  • A single data item is not sufficient to determine the other data items of the record.

First Normal Form
• For a table to be in First Normal Form (1NF), all data item values must be atomic.
• This means that there are no repeating groups.
• In a table, every row-column intersection holds a single value, not a set of values.

Unnormalized table
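The definition of functional dependence given above ("each X-value has associated with it precisely one Y-value") can be checked mechanically against sample data. The sketch below is a plain Python illustration; the function name `holds`, the GRADE column, and the sample enrollment rows are hypothetical. Because functional dependence is a semantic notion, a sample can only refute a dependency. It can never prove one, which still requires understanding the application domain.

```python
from typing import Dict, List, Sequence, Tuple


def holds(rows: List[Dict[str, str]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if the sample rows contradict the dependency lhs -> rhs."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)  # X-value (may be composite)
        y = tuple(row[a] for a in rhs)  # Y-value (may be composite)
        # Remember the first Y seen for each X; any different Y refutes X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data for illustration only.
enrollment = [
    {"STUDNO": "S1", "SNAME": "Lee", "COURSENO": "C1", "CNAME": "Databases", "GRADE": "A"},
    {"STUDNO": "S1", "SNAME": "Lee", "COURSENO": "C2", "CNAME": "Networks", "GRADE": "B"},
    {"STUDNO": "S2", "SNAME": "Kim", "COURSENO": "C1", "CNAME": "Databases", "GRADE": "B"},
]

studno_to_sname = holds(enrollment, ["STUDNO"], ["SNAME"])           # not contradicted
studno_to_course = holds(enrollment, ["STUDNO"], ["COURSENO"])       # contradicted by S1
key_to_grade = holds(enrollment, ["STUDNO", "COURSENO"], ["GRADE"])  # composite left side
print(studno_to_sname, studno_to_course, key_to_grade)
```

In this sample, STUDNO alone does not determine GRADE, but the composite (STUDNO, COURSENO) does, which is the situation described under "Composite key".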
First Normal Form

Second Normal Form

Third Normal Form

Transitive Dependency

SQL Timeline
• The logical sequence of operations is as follows:
  1. Rows satisfying the conditions of the WHERE clause are retrieved
  2. Grouping of rows is performed as specified in the GROUP BY clause
  3. The aggregate functions are evaluated for each group
  4. Groups are filtered as specified in the HAVING clause
  5. Rows to be returned to the user are arranged according to the ORDER BY clause

Phantom Reads (1 of 7)
• The concept of a phantom read is best explained through an example.
  1. Assume there are two transactions, A and B.

Phantom Reads (2 of 7)
  2. Transaction A performs a query, such as summing account balances.

Phantom Reads (3 of 7)
  3. Transaction B inserts information about a new account into the table processed by Transaction A.

Phantom Reads (4 of 7)
  4. Transaction A repeats its query to sum account balances.
     • This time, the result is different.

Phantom Reads (5 of 7)
  5. Transaction A sees something that did not exist the first time: a phantom.

Phantom Reads (6 of 7)
  6. In this scenario, serializability has been violated.

Phantom Reads (7 of 7)
  7. The interleaved execution of these transactions is equivalent to neither A-then-B nor B-then-A.

Transactions (1 of 4)
• Transactions let users guarantee consistent changes to data, as long as the SQL statements within a transaction are grouped logically.

Transactions (2 of 4)
• Data in all referenced tables are in a consistent state before the transaction begins and after it ends.

Transactions (3 of 4)
• Transactions should consist of only the SQL statements that make one consistent change to the data.

Transactions (4 of 4)
• After a transaction is committed or rolled back, the next transaction begins with the next SQL statement.

Materialized Views (1 of 3)
• A materialized view provides indirect access to table data by storing the results of a query in a separate object.
• A materialized view is also called a materialized query table.
• A conventional view does not require disk space and has no data of its own.

Materialized Views (2 of 3)
• To resolve a query against a conventional view, the DBMS retrieves the stored query from the system catalog and executes it.

Materialized Views (3 of 3)
• A materialized view contains rows resulting from the execution of a query against one or more base tables or views.
• The result set of the query is stored in the database.
• This means the contents of a materialized view are subject to aging and may need to be periodically updated or refreshed.

Applications of Materialized Views (1 of 6)
• Materialized views can be used to
  • Summarize.
  • Compute.
  • Replicate.
  • Distribute data.

Applications of Materialized Views (2 of 6)
• Materialized views are suitable in a variety of computing environments
  • Data warehousing.
  • Decision support.
  • Distributed or mobile computing.

Applications of Materialized Views (3 of 6)
• Data warehouse: materialized views are used to compute and store aggregated data
  • Sums and averages.
  • Compute joins with or without aggregations.

Applications of Materialized Views (4 of 6)
• Distributed environment
  • Materialized views are used to replicate data at distributed sites and to synchronize updates done at several sites with conflict resolution methods.

Applications of Materialized Views (5 of 6)
• Materialized views used as replicas provide local access to data that otherwise has to be accessed from remote sites.

Applications of Materialized Views (6 of 6)
• Mobile computing
  • Materialized views are used to download a subset of data from central servers to mobile clients.
  • With periodic refreshes from the central servers and propagation of updates by clients back to the central servers.

Data Warehousing with Materialized Views (1 of 5)
• Data flows from one or more online transaction processing (OLTP) databases into a data warehouse on a
  • Monthly.
  • Weekly.
  • Daily basis.

Data Warehousing with Materialized Views (2 of 5)
• Data are normally processed in a staging file before being added to the data warehouse.

Data Warehousing with Materialized Views (3 of 5)
• Data warehouses commonly range in size from tens of gigabytes to a few terabytes.
• Usually, the vast majority of the data is stored in a few very large fact tables.

Data Warehousing with Materialized Views Summaries (4 of 5)
• One technique employed in data warehouses to improve performance is the creation of summaries.

Data Warehousing with Materialized Views Summaries (5 of 5)
• Summaries are special kinds of aggregate views that improve query execution times by pre-calculating expensive joins and aggregation operations prior to execution and storing the results in a table in the database.
• For example, you can create a table to contain the sums of sales by region and by product.
  • Such a table is a materialized view.

Data Warehousing with Materialized Views (1 of 3)
• Materialized views that precompute and store aggregated data are often referred to as summaries.
  • They store summarized data.
• Summary management
  • Eases the workload of the database administrator
  • Means the user no longer needs to be aware of the summaries that have been defined.

Data Warehousing with Materialized Views (2 of 3)
• The query rewrite mechanism reduces response time for returning results from the query.
  1. The database administrator creates one or more materialized views, which are the equivalent of a summary.
  2. The end user queries the tables and views at the detail data level.
  3. The query rewrite mechanism automatically rewrites the query to use the summary tables.

Data Warehousing with Materialized Views (3 of 3)
• Materialized views within the data warehouse are transparent to the end user or to the database application.
• Materialized views are usually accessed through the query rewrite mechanism.
• However, queries can directly access summaries.
• Materialized views can also be used to precompute joins with or without aggregations.

Query Rewrite Optimization
• Query transformation is transparent to an application.
• No reference to the materialized view is required in a SQL statement.
• Materialized views can be added or dropped without invalidating the SQL in the application code.
• The optimizer transparently rewrites the request to use the materialized view.
• Queries are then directed to the materialized view and not to the underlying detail tables or views.
• Query rewrite uses cost-based optimization.
  • So it is important to collect statistics both on tables involved in the query and on the tables representing materialized views.
• The concepts discussed above are generic and apply to Oracle and DB2.
• Details of the implementation of materialized views and their refresh mechanisms differ between database systems.
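The "sums of sales by region and by product" summary mentioned earlier, and the aging and refresh behavior of a materialized view, can be imitated with an ordinary table. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module, and SQLite has no materialized views, so a plain table named sales_summary stands in for one. The sales table, its rows, and the helper functions `refresh_summary` and `read_summary` are hypothetical. Query rewrite is not imitated; the query has to name the summary table explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100), ("East", "Widget", 50), ("West", "Gadget", 70)],
)


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the stored result of the aggregate query (a full refresh)."""
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT region, product, SUM(amount) AS total "
        "FROM sales GROUP BY region, product"
    )


def read_summary(conn: sqlite3.Connection) -> list:
    """Read the precomputed totals instead of re-aggregating the detail rows."""
    return conn.execute(
        "SELECT region, product, total FROM sales_summary "
        "ORDER BY region, product"
    ).fetchall()


refresh_summary(conn)
fresh = read_summary(conn)

# New detail data arrives; the stored result does not change by itself.
conn.execute("INSERT INTO sales VALUES ('West', 'Gadget', 30)")
stale = read_summary(conn)

# A refresh brings the summary back in line with the detail table.
refresh_summary(conn)
refreshed = read_summary(conn)
print(fresh, stale, refreshed)
```

The stale read between the insert and the second refresh is the "aging" described in the slides. Products with real materialized views differ in how and when the refresh happens, as the closing bullet notes.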