Chapter 23: Advanced Application Development

Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan

Database System Concepts

• Chapter 1: Introduction
• Part 1: Relational databases
  – Chapter 2: Relational Model
  – Chapter 3: SQL
  – Chapter 4: Advanced SQL
  – Chapter 5: Other Relational Languages
• Part 2: Database Design
  – Chapter 6: Database Design and the E-R Model
  – Chapter 7: Relational Database Design
  – Chapter 8: Application Design and Development
• Part 3: Object-based databases and XML
  – Chapter 9: Object-Based Databases
  – Chapter 10: XML
• Part 4: Data storage and querying
  – Chapter 11: Storage and File Structure
  – Chapter 12: Indexing and Hashing
  – Chapter 13: Query Processing
  – Chapter 14: Query Optimization
• Part 5: Transaction management
  – Chapter 15: Transactions
  – Chapter 16: Concurrency control
  – Chapter 17: Recovery System
• Part 6: Data Mining and Information Retrieval
  – Chapter 18: Data Analysis and Mining
  – Chapter 19: Information Retrieval
• Part 7: Database system architecture
  – Chapter 20: Database-System Architecture
  – Chapter 21: Parallel Databases
  – Chapter 22: Distributed Databases
• Part 8: Other topics
  – Chapter 23: Advanced Application Development
  – Chapter 24: Advanced Data Types and New Applications
  – Chapter 25: Advanced Transaction Processing
• Part 9: Case studies
  – Chapter 26: PostgreSQL
  – Chapter 27: Oracle
  – Chapter 28: IBM DB2
  – Chapter 29: Microsoft SQL Server
• Online Appendices
  – Appendix A: Network Model
  – Appendix B: Hierarchical Model
  – Appendix C: Advanced Relational Database Model

Part 8: Other topics (Chapters 23 through 25)

• Chapter 23: Advanced Application Development
  – covers performance benchmarks, performance tuning and standardization.
• Chapter 24: Advanced Data Types and New Applications
  – covers advanced data types and new applications, including temporal data, spatial and geographic data, multimedia data, and issues in the management of mobile and personal databases.
• Chapter 25: Advanced Transaction Processing
  – deals with advanced transaction processing: transaction-processing monitors, high-performance transaction systems, real-time transaction systems, and transactional workflows.

Table of Contents

• 23.1 Performance Tuning
• 23.2 Performance Benchmarks
• 23.3 Standardization
• 23.4 Application Migration
• 23.5 Summary

From Chapter 8 to Chapter 23

• Ch 8: Application Design & Development
  – Web interfaces between users (Web browsers) and applications: servlets, JSP
  – Monitoring the DBMS using triggers
  – Limiting access by authorization
  – Protecting the DBMS by security
• Ch 23: Advanced Application Development
  – Performance
  – Standardization
  – Application Migration

Performance Tuning

• Adjusting various parameters and design choices to improve system performance for a specific application.
• Tuning is best done by identifying bottlenecks and eliminating them.
• Can tune a database system at 3 levels:
  – Hardware
    » add disks to speed up I/O
    » add memory to increase buffer hits
    » move to a faster processor
  – Database system parameters
    » set buffer size to avoid paging of buffer
    » set checkpointing intervals to limit log size
    » system may have automatic tuning
  – Higher-level database design
    » such as the schema, indices and transactions (more later)

Bottlenecks

• Performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components
  – These are called bottlenecks
  – E.g.
    80% of the code may take up 20% of the time, while 20% of the code takes up 80% of the time
  – Worth spending most time on the 20% of code that takes 80% of the time
• Finding bottlenecks is very important
• Bottlenecks may be in hardware or in software
  – e.g. disks are very busy while the CPU is idle
• Removing one bottleneck often exposes another bottleneck
• De-bottlenecking consists of repeatedly finding bottlenecks and removing them
  – This is a heuristic

Identifying Bottlenecks

• Transactions request a sequence of services
  – e.g. CPU, disk I/O, locks
• With concurrent transactions, a transaction may have to wait for a requested service while other transactions are being served
• Can model the database as a queueing system with a queue for each service
  – transactions repeatedly do the following:
    » request a service
    » wait in the queue for the service
    » get serviced
• Bottlenecks in a database system typically show up as very high utilization of a particular service
  – Bottlenecks show up as very long queues
  – E.g. disk utilization vs. CPU utilization
• 100% utilization leads to very long waiting times
  – Rule of thumb:
    » design a system for about 70% utilization at peak load
    » utilization over 90% should be avoided

Finding Bottlenecks: Queues in a Database System

[Figure: queues in a database system]

Tunable Parameters

• The lowest level is the hardware level
  – Adding disks or using a RAID system
  – Adding more memory
  – Moving to a faster processor
• The second level consists of the DBMS parameters
  – Buffer size and checkpointing intervals
• The third level is the highest level
  – Schema and transactions
  – Indices and materialized views
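The utilization rule of thumb under Identifying Bottlenecks can be illustrated with a short sketch. It assumes the simplest textbook queueing model, a single-server M/M/1 queue, which the slides do not name; under that assumption the mean response time is the service time divided by (1 − utilization). The function name and the 10 ms service time are hypothetical.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean response time (waiting plus service) of an M/M/1 queue.

    Assumption: Poisson arrivals and exponential service times at one server.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 10.0  # hypothetical average service time of one disk request
    for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
        t = mm1_response_time(service_ms, rho)
        print(f"utilization {rho:.0%}: mean response time {t:.1f} ms")
```

Under this model the response time roughly triples between 70% and 90% utilization and grows without bound as utilization approaches 100%, which is consistent with designing for about 70% at peak load.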
Tuning of Hardware

• Even well-tuned transactions typically require a few I/O operations
  – A typical disk supports about 100 random I/O operations per second
  – Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
  – If 1000 transactions per second are required, 20 disks are needed
• The number of I/O operations per transaction can be reduced by keeping more data in memory
  – If all data is in memory, I/O would be needed only for writes
  – Keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost

Hardware Tuning: Memory

• Question: which data to keep in memory?
  – The disk cost for 1 I/O per second is
      price-per-disk-drive / accesses-per-second-per-disk
  – Avoiding 1 I/O per second saves this cost
  – If a page is accessed n times per second, the value of keeping the page in memory is the disk cost of n I/Os per second:
      n * price-per-disk-drive / accesses-per-second-per-disk
  – Cost of keeping a page in memory (buying more memory):
      price-per-megabyte-of-memory / pages-per-megabyte-of-memory
  – Break-even point: the value of n for which the above two costs are equal
  – If the page is accessed more than n times per second, it is better to keep the page in memory (buying more memory)
    » the saving from buying memory is greater than its purchase cost

Hardware Tuning: 5-Min & 1-Min Rule

• Solving the above equation with current disk and memory prices leads to:
  – 5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes, it should be kept in memory (by buying sufficient memory!)
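The disk-count estimate and the break-even calculation from the hardware tuning slides can be sketched as follows. The function names are hypothetical, and the prices in the usage section are illustrative values chosen only so that the arithmetic comes out to round numbers; they are not real market prices.

```python
import math


def disks_needed(tps: float, ios_per_txn: float = 2.0,
                 ios_per_disk_per_sec: float = 100.0) -> int:
    """Disks required to sustain `tps` transactions per second, ignoring skew."""
    return math.ceil(tps * ios_per_txn / ios_per_disk_per_sec)


def break_even_accesses_per_sec(price_per_disk: float,
                                accesses_per_sec_per_disk: float,
                                price_per_mb: float,
                                pages_per_mb: float) -> float:
    """Access rate n at which caching a page costs the same as the disk I/O it avoids."""
    cost_of_one_io_per_sec = price_per_disk / accesses_per_sec_per_disk
    cost_of_one_page_in_memory = price_per_mb / pages_per_mb
    return cost_of_one_page_in_memory / cost_of_one_io_per_sec


if __name__ == "__main__":
    print("disks for 1000 tps:", disks_needed(1000))
    # Hypothetical prices: a 300-unit disk doing 100 random I/Os per second,
    # memory at 2.56 units per megabyte, 4 KB pages (256 pages per megabyte).
    n = break_even_accesses_per_sec(300.0, 100.0, 2.56, 256.0)
    print(f"break-even: one access every {1.0 / n:.0f} seconds")
```

A page accessed more often than the break-even rate is cheaper to hold in memory than to fetch from disk each time; the rule's "5 minutes" is simply this interval evaluated at the prices of the day.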
• For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1MB of data at a time:
  – 1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory (by buying sufficient memory!)
• Prices of disk and memory have changed greatly over the years, but the ratios have not changed much
  – the rules remain the 5-minute and 1-minute rules, not 1-hour or 1-second rules!

Hardware Tuning: Choice of RAID Level

• To use RAID 1 or RAID 5?
  – Depends on the ratio of reads and writes
  – RAID 5 requires 2 block reads and 2 block writes to write out one data block
• If an application requires r reads and w writes per second
  – RAID 1 requires r + 2w I/O operations per second
  – RAID 5 requires r + 4w I/O operations per second
• For reasonably large r and w, this requires lots of disks to handle the workload
  – RAID 5 may require more disks than RAID 1 to handle the load!
  – The apparent saving in number of disks with RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
• Rule of thumb: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise
  – If you need more disks to handle the I/O load, just mirror them, since disk capacities these days are enormous!

Tuning of the Schema

• Vertically partition relations to isolate the data that is accessed most often, so that only the needed information is fetched.
  – E.g., split account (account-number, branch-name, balance) into two relations, (account-number, branch-name) and (account-number, balance).
  – Branch-name need not be fetched unless required
• Improve performance by storing a denormalized relation
  – E.g., store the join of the account relation and the depositor relation; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
  – Price paid: more space, and more work for the programmer to keep the relation consistent on updates
  – Better to use materialized views (more on this later)
• Cluster together on the same disk page records that would match in a frequently required join
  – the join can then be computed very efficiently when required

Tuning of Indices

• Index tuning
  – Tradeoff between queries and updates
  – Create appropriate indices to speed up slow queries/updates
  – Speed up slow updates by removing excess indices
  – Choose the type of index (B-tree / hash) appropriate for the most frequent queries
  – Choose which index to make clustered
• Index tuning wizards (software tools)
  – Look at the past history of queries and updates (the workload)
  – Recommend which indices would be best for the workload

Tuning using Materialized Views

• Materialized views can help speed up certain queries
  – Particularly aggregate queries
• Overheads
  – Space
  – Time for view maintenance
    » Immediate view maintenance: done as part of the update transaction; the time overhead is paid by the update transaction
    » Deferred view maintenance: done only when required; the update transaction is not affected, but system time is spent on view maintenance, and until it is updated the view may be out of date
• Preferable to a denormalized schema, since view maintenance is the system's responsibility, not the programmer's
  – Avoids inconsistencies caused by errors in update programs

Tuning using Materialized Views (Cont.)
• How to choose a set of materialized views
  – Helping one transaction type by introducing a materialized view may hurt others
  – Choice of materialized views depends on costs
    » Users often have no idea of the actual cost of operations
  – Overall, manual selection of materialized views is tedious
• Some database systems provide tools to help the DBA choose views to materialize
  – "Materialized view selection wizards"

Automated Tuning of Physical Design

• Automated tools
  – Index selection
  – Materialized view selection
  – Partitioning of data in a parallel database system
• Users specify information about the size of the database and related statistics
• Microsoft's Database Tuning Assistant
  – Allows users to pose "what-if" questions when selecting materialized views
• Workload compression
  – Generate the workload using a small number of queries and updates
• Greedy heuristics and various other techniques

Tuning of Transactions

• Basic approaches to tuning of transactions
  – Improve set orientation
  – Reduce lock contention
  – Split a large transaction
• Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important
• Improving set orientation
  – The communication and query handling overheads of each call are significant
  – Set orientation means fewer calls to the database
  – Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query
    » E.g. tune a program that computes the total salary for each department using a separate SQL query per department by instead using a single query that computes total salaries for all departments at once (using group by)
  – Use stored procedures: avoids re-parsing and re-optimization of the query

Tuning of Transactions (Cont.)
• Reducing lock contention
• Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions
  – E.g. a large query to compute bank statistics and regular bank transactions
• Use multi-version concurrency control
  – E.g. Oracle "snapshots", which support multi-version 2PL
• Use degree-two consistency (cursor stability) for long transactions
  – Drawback: the result may be approximate

Tuning of Transactions (Cont.)

• Long update transactions cause several problems
  – Exhaust lock space
  – Exhaust log space
    » Also greatly increase recovery time after a crash, and may exhaust log space even more during recovery if the recovery algorithm is badly designed!
• If a single large transaction updates every record of a very large relation, the log may grow too big
  – Split the large transaction into a batch of "mini-transactions"
  – Each mini-transaction is supposed to perform a limited number of the updates
  – Hold locks across transactions in a mini-batch to ensure serializability
  – If lock table size is a problem, locks can be released, but at the cost of serializability
  – In case of failure during a mini-batch
    » its remaining portion must be completed on recovery, to ensure atomicity
    » this requires support for the concept of nested transactions

Performance Simulation

• Performance simulation using a queuing model
  – useful to predict bottlenecks as well as the effects of tuning changes, even without access to the real system
• Queuing model as we saw earlier
  – Models activities that go on in parallel
• The simulation model is quite detailed, but usually omits some low-level details
  – Model service time, but disregard details of the service
  – E.g.
    approximate disk read time by using an average disk read time
• Experiments can be run on the model, and provide an estimate of measures such as average throughput/response time
• After simulation, parameters can be tuned in the model and then replicated in the real system
  – E.g. number of disks, memory, algorithms, etc.

Performance Benchmarks

• Suites of tasks used to quantify the performance of software systems
  – a single task is not enough for complex systems
• Important in comparing database systems, especially as systems become more standards compliant.
• Commonly used performance measures:
  – Throughput (transactions per second, or TPS)
  – Response time (delay from submission of transaction to return of result)
  – Availability or mean time to failure (MTTF)
• Beware when computing the average throughput of different transaction types
  – Suppose that system A runs transaction type T1 at 99 tps and transaction type T2 at 1 tps, and another system B runs both T1 and T2 at 50 tps.
  – Simply averaging the TPS figures is wrong!
    » it suggests equal performance for the two systems A and B (50 tps)
  – If we ran 50 transactions of each type:
    » System A: T1 (0.01 sec each), T2 (1 sec each), needs 50.5 secs
    » System B: T1 (0.02 sec each), T2 (0.02 sec each), needs 2 secs

Performance Benchmarks (Cont.)

• Instead, use the harmonic mean of n throughputs t1, t2, …, tn:
      n / (1/t1 + 1/t2 + … + 1/tn)
• System A (T1: 99 tps, T2: 1 tps), system B (T1: 50 tps, T2: 50 tps)
  – The harmonic mean for system A is 1.98 ( 2 / ((1/99) + (1/1)) ), while for system B it is 50 ( 2 / ((1/50) + (1/50)) )
  – System B is about 25 times faster than system A
• Interference (e.g.
  lock contention) makes even this incorrect if different transaction types run concurrently

Database Application Classes

• Online transaction processing (OLTP)
  – requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.
• Decision support applications
  – including online analytical processing, or OLAP, applications
  – require good query evaluation algorithms and query optimization.
• The architecture of some database systems is tuned to one of the two classes
  – E.g. the Teradata DBMS is tuned to decision support
• Others try to balance the two requirements
  – E.g. Oracle, with snapshot support for long read-only transactions

Transaction Benchmark Suites

• The Transaction Processing Performance Council (TPC) benchmark suites are widely used.
  – TPC-A and TPC-B: simple OLTP application modeling a bank teller application, with and without communication
    » Not used anymore
  – TPC-C: complex OLTP application modeling an inventory system
    » Current standard for OLTP benchmarking
  – TPC-D: complex decision support application
    » Superseded by TPC-H and TPC-R
  – TPC-H: (H for ad hoc) based on TPC-D with some extra queries
    » Models ad hoc queries which are not known beforehand
    » Total of 22 queries with emphasis on aggregation
    » Prohibits materialized views
    » Permits indices only on primary and foreign keys
  – TPC-R: (R for reporting) same as TPC-H, but without any restrictions on materialized views and indices
  – TPC-W: (W for Web) end-to-end Web service benchmark modeling a Web bookstore, with a combination of static and dynamically generated pages
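The comparison of systems A and B from the Performance Benchmarks slides can be checked with a few lines of code. This is a minimal sketch with hypothetical function names; it reproduces the slide's arithmetic and, like the slide, assumes that the transaction types run one after another without interfering.

```python
from typing import Sequence


def arithmetic_mean(throughputs: Sequence[float]) -> float:
    """Naive average of per-type throughputs (misleading for mixed workloads)."""
    return sum(throughputs) / len(throughputs)


def harmonic_mean(throughputs: Sequence[float]) -> float:
    """n / (1/t1 + ... + 1/tn): throughput for an equal mix of transaction types."""
    return len(throughputs) / sum(1.0 / t for t in throughputs)


def time_for_equal_mix(throughputs: Sequence[float], count_per_type: int) -> float:
    """Seconds needed to run `count_per_type` transactions of every type in turn."""
    return sum(count_per_type / t for t in throughputs)


if __name__ == "__main__":
    systems = {"A": [99.0, 1.0], "B": [50.0, 50.0]}  # tps for T1 and T2
    for name, tps in systems.items():
        print(name,
              f"arithmetic={arithmetic_mean(tps):.2f}",
              f"harmonic={harmonic_mean(tps):.2f}",
              f"time for 50+50 txns={time_for_equal_mix(tps, 50):.2f}s")
```

The arithmetic mean rates both systems at 50 tps, while the harmonic mean agrees with the elapsed times: running 100 transactions in the computed time gives exactly the harmonic-mean throughput.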
TPC Performance Measures

• TPC performance measures
  – transactions-per-second, with specified constraints on response time
  – transactions-per-second-per-dollar, which accounts for the cost of owning the system
• TPC benchmarks require database sizes to be scaled up with increasing transactions-per-second
  – reflects real-world applications, where more customers means a larger database and more transactions-per-second
• External audit of TPC performance numbers is mandatory
  – TPC performance claims can be trusted

Sample TPC Performance Measures

• Two types of tests for TPC-H (ad hoc) and TPC-R (report)
  – Power metric test
    » runs queries and updates sequentially
    » then takes the mean to find queries per hour
  – Throughput metric test
    » runs queries and updates concurrently
    » multiple streams run in parallel: each stream generates queries, with one parallel update stream
    » measures the total time of the entire run
• Composite "query per hour" metric
  – overall metric
  – square root of (power metric × throughput metric)
• Composite "price vs. performance" metric
  – system price / composite metric

Other Benchmarks

• OODB transactions require a different set of benchmarks.
  – Reason: it is hard to define what a typical OODB application is
• The object operations benchmark, version 1: OO1 benchmark
• The OO7 benchmark has several different operations
  – provides a separate benchmark number for each kind of operation
    » Traversing connected objects
    » Retrieving all objects in a class
• Benchmarks for XML databases are now being discussed
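The composite TPC-H/TPC-R metrics described under Sample TPC Performance Measures amount to a geometric mean followed by a division. The sketch below uses hypothetical function names and made-up input figures purely to show the arithmetic; it is not taken from any published benchmark result.

```python
import math


def composite_queries_per_hour(power_metric: float, throughput_metric: float) -> float:
    """Composite metric: square root of (power metric x throughput metric)."""
    return math.sqrt(power_metric * throughput_metric)


def price_performance(system_price: float, composite_metric: float) -> float:
    """Price vs. performance metric: system price divided by the composite metric."""
    return system_price / composite_metric


if __name__ == "__main__":
    # Hypothetical inputs, chosen only so that the square root is a round number.
    power, throughput, price = 3600.0, 2500.0, 600_000.0
    composite = composite_queries_per_hour(power, throughput)
    print(f"composite queries per hour: {composite:.0f}")
    print(f"price per unit of composite metric: {price_performance(price, composite):.2f}")
```

Because the composite is a geometric mean, a system cannot score well by excelling at only one of the two tests: halving either input lowers the composite by the same factor.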
Standardization

• The complexity of contemporary software systems and the need for their interoperation require a variety of standards.
  – syntax and semantics of programming languages
  – functions in application program interfaces
  – data models (e.g. object-oriented / object-relational databases)
• Formal standards
• De facto standards
• Anticipatory standards
• Reactionary standards

Standardization (Cont.)

• Formal standards are standards developed by a standards organization (ANSI, ISO), or by industry groups, through a public process.
• De facto standards are generally accepted as standards without any formal process of recognition
  – Standards defined by dominant vendors (IBM, Microsoft) often become de facto standards
  – De facto standards often go through a formal process of recognition and become formal standards
• Anticipatory standards lead the marketplace, defining features that vendors then implement
  – Ensure compatibility of future products
  – But at times become very large and unwieldy, since standards bodies may not pay enough attention to ease of implementation (e.g., SQL-92 or SQL:1999)
• Reactionary standards attempt to standardize features that vendors have already implemented, possibly in different ways.
  – It can be hard to convince vendors to change already implemented features. E.g. OODB systems
SQL Standards History

• SQL developed by IBM in the late 70s / early 80s
• SQL-86: the first formal standard
• IBM SAA standard for SQL in 1987
• SQL-89 added features to SQL-86 that were already implemented in many systems
  – SQL-89 was a reactionary standard
• SQL-92 added many new features to SQL-89 (anticipatory standard)
  – Defines levels of compliance (entry, intermediate and full)
  – Even now, few database vendors have a full SQL-92 implementation
• SQL:1999
  – Adds a variety of new features: extended data types, object orientation, procedures, triggers, etc.
• SQL:2003

SQL Standards History (Cont.)

• The SQL:2003 Standard
  – Part 1: SQL/Framework; overview
  – Part 2: SQL/Foundation; types, schemas, DDL, DML, security, etc.
  – Part 3: SQL/CLI (Call Level Interface); API interface
  – Part 4: SQL/PSM (Persistent Stored Modules); procedural extensions
  – Part 9: SQL/MED (Management of External Data)
    » Interfacing of the database to external data sources
    » other databases and even files can be viewed as part of the database
  – Part 10: SQL/OLB (Object Language Bindings); embedding SQL in Java
  – Part 11: SQL/Schemata (Information and Definition Schema); defines a standard catalog interface
  – Part 13: SQL/JRT (Java Routines and Types); defines standards for accessing routines and types in Java
  – Part 14: SQL/XML; defines XML-related specifications
• Missing parts 5, 6, 7, 8 and 12 cover features that are not near standardization yet
  – SQL/Bindings, Temporal Data, Distributed Transaction, Multimedia Data
Database Connectivity Standards

• Open DataBase Connectivity (ODBC) standard for database interconnectivity
  – based on the Call Level Interface (CLI) developed by the X/Open consortium and the SQL Access Group
  – defines an application programming interface (API), and the SQL features that must be supported at different levels of compliance
• JDBC standard used for interconnectivity between Java and databases
• X/Open XA standards by the X/Open consortium
  – define transaction management standards for supporting distributed 2PC
• OLE-DB by Microsoft
  – API like ODBC, but also supports non-database sources such as flat files or email stores
  – An OLE-DB program can negotiate with the data source to find out what features are supported
  – The interface language may be a subset of SQL
• ADO (ActiveX Data Objects) by Microsoft
  – Easy-to-use interface to OLE-DB functionality
  – Can be called from scripting languages such as VBScript and JScript
• The ADO.NET API is for applications in the .NET languages such as C# and Visual Basic.NET

[Figure: example of X/Open XA operation]

[Figure: example of OLE-DB & ADO operation]

Object Oriented Databases Standards

• Object Database Management Group (ODMG) standard for OODBs
  – version 1 in 1993, version 2 in 1997, version 3 in 2000
  – provides a language-independent Object Definition Language (ODL) as well as several language-specific bindings
• Object Management Group (OMG) standard for distributed software based on objects
  – Object Request Broker (ORB) provides transparent message dispatch to distributed objects
  – Interface Definition Language (IDL) for defining language-independent data types
  – Common Object Request Broker Architecture (CORBA) defines the specifications of the ORB and IDL
[Figure: example of CORBA operation]

XML-Based Application Standards

• Several XML-based standards for e-commerce
  – E.g. RosettaNet (supply chain), BizTalk
  – Define catalogs, service descriptions, invoices, purchase orders, etc.
  – XML wrappers are used to export information from relational databases to XML
• Simple Object Access Protocol (SOAP):
  – XML-based remote procedure call (RPC) standard
  – Uses XML to encode data (both parameters and results)
  – HTTP as the transport protocol (a procedure call becomes an HTTP request)
  – Endorsed by W3C
  – Popular in B2B e-commerce
• Standards based on SOAP for specific applications
  – E.g. OLAP and data mining standards from Microsoft
• The DOM and SAX APIs let general-purpose programming languages access XML data

[Figure: example of SOAP operation]

Legacy Systems

• Legacy systems are older-generation systems that are incompatible with current-generation standards and systems but are still in production use
  – E.g. applications written in Cobol that run on mainframes
  – Today's hot new system is tomorrow's legacy system!
• Porting legacy system applications to a more modern environment is problematic
  – Rewriting code
    » Very expensive, since a legacy system may involve millions of lines of code, written over decades
    » The original programmers are usually no longer available
  – Switching over from the old system to the new system is a problem
    » Big-bang approach
    » Chicken-little approach
• One approach: build a wrapper layer on top of the legacy application to allow interoperation between newer systems and the legacy application
  – E.g.
    use ODBC or OLE-DB as the wrapper

Legacy Systems (Cont.)

• Rewriting a legacy application requires a first phase of understanding what it does
  – Often legacy code has no documentation or outdated documentation
  – Reverse engineering: the process of going over legacy code to
    » come up with schema designs in the ER or OO model
    » find out what procedures and processes are implemented, to get a high-level view of the system
• Re-engineering: reverse engineering followed by design of a new system
  – Improvements are made to the existing system design in this process

Migrating to a New System

• Switching over from the old to the new system is a major problem
  – Production systems are in use every day, generating new data
  – Stopping the system may bring all of a company's activities to a halt, causing enormous losses
• Big-bang approach:
  1. Implement the complete new system
  2. Populate it with data from the old system
     a. No transactions are run while this step is executed
     b. Scripts are created to do this quickly
  3. Shut down the old system and start using the new system
  – Danger with this approach: what if the new code has bugs, performance problems, or missing features?
    » The company may be brought to a halt

Migrating to a New System (Cont.)

• Chicken-little approach:
  – Replace the legacy system one piece at a time
  – Use wrappers to interoperate between legacy and new code
    » E.g.
      replace the front end first, with wrappers on the legacy back end
      – The old front end can continue working in this phase in case of problems with the new front end
    » Replace the back end, one functional unit at a time
      – All parts that share a database may have to be replaced together, or a wrapper is needed on the database also
  – Drawback: significant extra development effort to build wrappers and ensure smooth interoperation
    » Still worth it if the company's life depends on the system

Ch 21: Summary (1)

• The Web browser has emerged as the most widely used user interface to databases.
  – HTML provides the ability to define interfaces that combine hyperlinks with forms facilities.
  – Web browsers communicate with Web servers by the HTTP protocol.
  – Web servers can pass on requests to application programs, and return the results to the browser.
• There are several client-side scripting languages (JavaScript is the most widely used) that provide richer user interaction at the browser end.
• Web servers execute application programs to implement desired functionality.
  – Servlets are a widely used mechanism to write application programs that run as part of the Web server process, in order to reduce overheads.
  – There are also many server-side scripting languages that are interpreted by the Web server and provide application program functionality as part of the Web server.

Ch 21: Summary (2)

• Tuning of the database (system parameters, as well as the higher-level database design, such as the schema, indices, and transactions) is important for good performance.
  – Tuning is best done by identifying bottlenecks and eliminating them.
• Performance benchmarks play an important role in comparisons of database systems, especially as systems become more standards compliant.
  – The TPC benchmark suites are widely used, and the different TPC benchmarks are useful for comparing the performance of databases under different workloads.
• Standards are important because of the complexity of database systems and their need for interoperation.
  – Formal standards exist for SQL.
  – De facto standards, such as ODBC and JDBC, and standards adopted by industry groups, such as CORBA, have played an important role in the growth of client-server database systems.
  – Standards for object-oriented databases, such as ODMG, are being developed by industry groups.

Ch 21: Summary (3)

• E-commerce systems are fast becoming a core part of how commerce is performed.
  – There are several database issues in e-commerce systems. Catalog management, especially personalization of the catalog, is done with databases.
  – Electronic marketplaces help in pricing of products through auctions, reverse auctions, or exchanges.
  – High-performance database systems are needed to handle such trading.
  – Orders are settled by electronic payment systems, which also need high-performance database systems to handle very high transaction rates.
• Legacy systems are systems based on older-generation technologies such as nonrelational databases or even directly on file systems.
  – Interfacing legacy systems with new-generation systems is often important when they run mission-critical systems.
  – Migrating from legacy systems to new-generation systems must be done carefully to avoid disruptions, which can be very expensive.

Ch 21: Bibliographical Notes (1)
• Information about servlets, including tutorials, standard specifications, and software, is available on java.sun.com/products/servlet.
• Information about JSP is available at java.sun.com/products/jsp.
• An early proposal for a database-system benchmark (the Wisconsin benchmark) was made by Bitton et al. [1983].
• The TPC-A, -B, and -C benchmarks are described in Gray [1991].
• An online version of all the TPC benchmark descriptions, as well as benchmark results, is available on the World Wide Web at the URL www.tpc.org; the site also contains up-to-date information about new benchmark proposals.
• O’Neil and O’Neil [2000] provides very good textbook coverage of performance measurement and tuning.
• The five-minute and one-minute rules are described in Gray and Putzolu [1987] and Gray and Graefe [1997].
• Brown et al. [1994] describes an approach to automated tuning.

Ch 21: Bibliographical Notes (2)
• Index selection and materialized view selection are addressed by Ross et al. [1996], Labio et al. [1997], Gupta [1997], Chaudhuri and Narasayya [1997], Agrawal et al. [2000] and Mistry et al. [2001].
• Poess and Floyd [2000] gives an overview of the TPC-H, TPC-R, and TPC-W benchmarks.
• The OO1 benchmark for OODBs is described in Cattell and Skeen [1992]; the OO7 benchmark is described in Carey et al. [1993].
• Kleinrock [1975] and Kleinrock [1976] form a popular two-volume textbook on queuing theory.
• Shasha [1992] provides a good overview of database tuning.
• The American National Standard SQL-86 is described in ANSI [1986].
• The IBM Systems Application Architecture definition of SQL is specified by IBM [1987].
• The standards for SQL-89 and SQL-92 are available as ANSI [1989] and ANSI [1992] respectively.
• For references on the SQL:1999 standard, see the bibliographical notes of Chapter 9.

Ch 21: Bibliographical Notes (3)
• The X/Open SQL call-level interface is defined in X/Open [1993]; the ODBC API is described in Microsoft [1997] and Sanders [1998].
• The X/Open XA interface is defined in X/Open [1991].
• More information about ODBC, OLE-DB, and ADO can be found on the Web site www.microsoft.com/data, and in a number of books on the subject that can be found through www.amazon.com.
• The ODMG 3.0 standard is defined in Cattell [2000].
• ACM SIGMOD Record, which is published quarterly, has a regular section on standards in databases, including benchmark standards.
• A wealth of information on XML-based standards is available online.
• You can use a Web search engine such as Google to search for more up-to-date information about XML and other standards.
• Loeb [1998] provides a detailed description of secure electronic transactions.
• Business process reengineering is covered by Cook [1996].
• Kirchmer [1999] describes application implementation using standard software such as Enterprise Resource Planning (ERP) packages.
• Umar [1997] covers reengineering and issues in dealing with legacy systems.

Ch 21: Tools
• There are many Web development tools that support database connectivity through servlets, JSP, JavaScript, or other mechanisms.
• We list a few of the better-known ones here:
   – Java SDK from Sun (java.sun.com)
   – Apache’s Tomcat (jakarta.apache.org) and Web server (apache.org)
   – IBM WebSphere (www.software.ibm.com)
   – Microsoft’s ASP tools (www.microsoft.com)
   – Allaire’s ColdFusion and JRun products (www.allaire.com)
   – Caucho’s Resin (www.caucho.com)
   – Zope (www.zope.org)
• A few of these, such as Apache, are free for any use, some are free for noncommercial use or for personal use, while others need to be paid for.
• See the respective Web sites for more information.

End of Chapter
Database System Concepts ©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use