Chapter 19 Current and Emerging Trends
Transparencies © Pearson Education Limited, 2004

Chapter 19 – Objectives
Requirements for advanced database applications.
Why RDBMSs are currently not well suited to supporting these.
Main concepts of DDBMSs.
Main concepts of database replication.
Main concepts of OODBMSs and ORDBMSs.
Main concepts of data warehousing.
Main concepts of OLAP and data mining.
Approaches for integrating databases into the web environment.

Advanced Database Applications
Computer-Aided Design (CAD)
Computer-Aided Manufacturing (CAM)
Office Information Systems (OIS) and Multimedia Systems
Geographic Information Systems (GIS)
Interactive and Dynamic Web sites

Computer-Aided Design (CAD)
Stores data relating to mechanical and electrical design, for example, buildings, airplanes, and integrated circuit chips. Designs of this type have some common characteristics:
Data has many types, each with a small number of instances.
Designs may be very large.
Design is not static but evolves through time.
Updates are far-reaching.
Involves version control and configuration management.
Cooperative engineering.

Computer-Aided Manufacturing (CAM)
Stores similar data to CAD, plus data about discrete production.

Office Information Systems (OIS) and Multimedia Systems
Stores data relating to computer control of information in a business, including electronic mail, documents, invoices, etc. Modern systems now handle free-form text, photographs, diagrams, audio and video sequences. Documents may have specific structure, perhaps described using a mark-up language such as SGML, HTML, or XML.

Geographic Information Systems (GIS)
GIS database stores spatial and temporal information, such as that used in land management and underwater exploration. Much of the data is derived from survey and satellite photographs, and tends to be very large. Searches may involve identifying features based on shape, color, texture, using advanced pattern-recognition techniques.

Interactive and Dynamic Web Sites
Consider an online catalog for selling clothes. Web site maintains preferences for previous visitors to the site and allows a visitor to:
obtain 3D rendering of any item based on color, size, fabric, etc.;
modify rendering to account for movement, illumination, backdrop, occasion, etc.;
select accessories to go with the outfit, from items presented in a sidebar.
Need to handle multimedia content and to interactively modify display based on user preferences and user selections. Also have added complexity of providing 3D rendering.

Weaknesses of RDBMSs
Poor Representation of “Real World” Entities: Normalization leads to relations that do not correspond to entities in the “real world”.
Semantic Overloading: Relational model has only one construct for representing data and data relationships: the table. Relational model is semantically overloaded.
Poor Support for Business Rules.
Limited Operations: RDBMSs only have a fixed set of operations which cannot be extended.
Difficulty Handling Recursive Queries: Extremely difficult to produce recursive queries. Extension proposed to relational algebra to handle this type of query is the unary transitive (recursive) closure operation.
Impedance Mismatch: Most DMLs lack computational completeness. To overcome this, SQL can be embedded in a high-level 3GL.
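A minimal sketch of the last two points, assuming Python's standard sqlite3 module stands in for the host language and DBMS. The reports_to table, its columns, and the sample names are hypothetical and do not come from the slides. Each SELECT finds only one level of the hierarchy; the loop that repeats it until nothing new appears, which yields the transitive closure, is written in the host language.

```python
import sqlite3

def all_subordinates(conn, manager):
    """Return everyone who reports to `manager`, directly or indirectly."""
    found = set()
    frontier = {manager}
    while frontier:
        # One non-recursive SQL query per level of the hierarchy.
        placeholders = ",".join("?" for _ in frontier)
        rows = conn.execute(
            f"SELECT employee FROM reports_to WHERE manager IN ({placeholders})",
            tuple(frontier),
        ).fetchall()
        # Keep only names not seen before, so the loop ends even on cycles.
        frontier = {name for (name,) in rows} - found
        found |= frontier
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports_to (employee TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO reports_to VALUES (?, ?)",
    [("bob", "alice"), ("carol", "bob"), ("dave", "carol"), ("erin", "alice")],
)
print(sorted(all_subordinates(conn, "alice")))
```

The set-oriented SELECT and the procedural loop are expressed in two different languages, each with its own data types.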
This produces an impedance mismatch: mixing different programming paradigms.

DDBMSs - Concepts
Distributed Database: A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS: Software system that permits the management of the distributed database and makes the distribution transparent to users.
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global application.

DDBMS
[Figure not reproduced]

Distributed Processing
Centralized database that can be accessed over a computer network.

Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth

Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex

Replication Servers
Replication: Process of generating and reproducing multiple copies of data at one or more sites. Provides users with access to current data where and when they need it. Provides a number of benefits, including improved performance when centralized resources get overloaded, increased reliability and data availability, and support for mobile computing and data warehousing.

Synchronous vs Asynchronous Replication
Synchronous: updates to replicated data are part of enclosing transaction.
If one or more sites that hold replicas are unavailable, transaction cannot complete. Large number of messages required to coordinate synchronization.
Asynchronous: target database updated after source database modified. Delay in regaining consistency may range from a few seconds to several hours or even days.

Replication - Functionality
At basic level, has to be able to copy data from one database to another (synchronously or asynchronously). Other functions include:
Scalability.
Mapping and Transformation.
Object Replication.
Specification of Replication Schema.
Subscription mechanism.
Initialization mechanism.

Replication - Data Ownership
Ownership relates to which site has privilege to update the data. Main types of ownership are:
Master/slave (or asymmetric replication),
Workflow,
Update-anywhere (or peer-to-peer or symmetric replication).

Replication - Master/Slave Ownership
Asynchronously replicated data is owned by one (master) site, and can be updated by only that site. Using ‘publish-and-subscribe’ metaphor, master site makes data available. Other sites ‘subscribe’ to data owned by master site, receiving read-only copies. Potentially, each site can be master site for non-overlapping data sets, and because the data sets do not overlap, update conflicts cannot occur.

Replication - Workflow Ownership
Avoids update conflicts, while providing more dynamic ownership model. Allows right to update replicated data to move from site to site. However, at any one moment, there is only ever one site that may update that particular data. Example is order processing system, which follows steps such as order entry, credit approval, invoicing, shipping, and so on.

Replication Ownership - Update-Anywhere
Creates peer-to-peer environment where multiple sites have equal rights to update replicated data.
Allows local sites to function autonomously, even when other sites are not available. Shared ownership can lead to conflict scenarios, so conflicts have to be detected and resolved.

OODBMSs
No one agreed object data model. One definition:
Object-Oriented Data Model (OODM): Data model that captures semantics of objects supported in object-oriented programming.
Object-Oriented Database (OODB): Persistent and sharable collection of objects defined by an OODM.
Object-Oriented DBMS (OODBMS): Manager of an OODB.

Origins of the OODM
[Figure not reproduced]

Advantages of OODBMSs
Enriched Modeling Capabilities.
Extensibility.
Removal of Impedance Mismatch.
More Expressive Query Language.
Support for Schema Evolution.
Support for Long Duration Transactions.
Applicability to Advanced Database Applications.
Improved Performance.

Disadvantages of OODBMSs
Lack of Experience.
Lack of Standards.
Competition from RDBMSs.
Complexity.
Lack of Support for Views.
Lack of Support for Security.

ORDBMSs
Vendors of RDBMSs conscious of threat and promise of OODBMS. Agree that RDBMSs not currently suited to advanced database applications, and added functionality is required. Reject claim that ORDBMSs will not provide sufficient functionality or will be too slow to cope adequately with new complexity. Can remedy shortcomings of relational model by extending model with OO features.

ORDBMSs - Features
OO features being added include:
user-extensible types,
encapsulation,
inheritance,
polymorphism,
dynamic binding of methods,
complex objects including non-1NF objects,
object identity.
However, no single extended relational model. All models:
share basic relational tables and query language,
all have some concept of ‘object’,
some can store methods (or procedures or triggers).

Advantages of ORDBMSs
Resolves many of known weaknesses of RDBMS.
Reuse and sharing: reuse comes from ability to extend server to perform standard functionality centrally; gives rise to increased productivity both for developer and end-user.
Preserves significant body of knowledge and experience gone into developing relational applications.

Disadvantages of ORDBMSs
Complexity.
Increased costs.
Proponents of relational approach believe simplicity and purity of relational model are lost.
Some believe RDBMS is being extended for what will be a minority of applications.
OO purists not attracted by extensions either.

Evolution of Data Warehousing
Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to customer. This resulted in accumulation of growing amounts of data in operational databases. Now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
Operational systems were never designed to support such business activities, so using such systems may not be easy solution. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions (such as data types). Challenge is to turn archives of data into a source of knowledge, so that a single integrated/consolidated view of organization’s data is presented to user.

The Evolution of Data Warehousing
Data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

Data Warehousing Concepts
Consolidated/integrated view of corporate data drawn from disparate operational data sources and a range of end-user access tools capable of supporting simple to highly complex queries to support decision-making. Data described as being subject-oriented, integrated, time-variant, and non-volatile (Inmon, 1993).

Subject-Oriented Data
Warehouse is organized around major subjects of the enterprise (e.g. customers, products, sales) rather than major application areas (e.g. customer invoicing, stock control, product sales). This is reflected in the need to store decision-support data rather than application-oriented data.

Integrated Data
Data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. Integrated data source must be made consistent to present a unified view of the data to the users.

Time-Variant Data
Data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.

Non-Volatile Data
Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.
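The time-variant and non-volatile properties can be sketched with Python's standard sqlite3 module. The stock_snapshot table, its columns, and the sample figures are hypothetical and are not taken from the slides. Each refresh appends rows stamped with a snapshot date instead of overwriting earlier rows, so the table ends up holding a series of snapshots.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE stock_snapshot ("
    " snapshot_date TEXT, product TEXT, quantity INTEGER,"
    " PRIMARY KEY (snapshot_date, product))"
)

def refresh(conn, snapshot_date, operational_rows):
    """Append one snapshot of operational data; earlier snapshots are untouched."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, product, qty) for product, qty in operational_rows],
    )
    conn.commit()

# Two periodic refreshes from a (hypothetical) operational system.
refresh(warehouse, "2004-01-31", [("shirt", 120), ("jacket", 40)])
refresh(warehouse, "2004-02-29", [("shirt", 95), ("jacket", 55)])

# Every row carries its time, so the history of one product can be read back.
history = warehouse.execute(
    "SELECT snapshot_date, quantity FROM stock_snapshot"
    " WHERE product = ? ORDER BY snapshot_date",
    ("shirt",),
).fetchall()
print(history)
```

Making the snapshot date part of the primary key is one possible design choice here: it lets the same product appear once per snapshot while rejecting an accidental second load of the same period.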

Typical Architecture of a DW
[Figure not reproduced]
Operational data: Supplied from mainframes, proprietary file systems, private workstations and servers, and external systems such as the Internet.
Operational data store (ODS): Repository of current and integrated operational data used for analysis. Often structured and supplied with data in the same way as the data warehouse. May act simply as a staging area for data to be moved into the warehouse.
Load Manager: Performs all operations associated with extraction and loading of data into warehouse.
Warehouse Manager: Performs all operations associated with management of data in the warehouse, such as merging data sources.
Query Manager: Performs all operations associated with management of user queries.
Detailed data: Not stored online but made available by summarizing data to the next level of detail. However, detailed data regularly added to warehouse to supplement summarized data.
Lightly and highly summarized data: Predefined and generated by warehouse manager and stored in warehouse. Purpose is to speed up performance of queries. Updated continuously as new data is loaded into the warehouse.
Meta-data (data about data): Used by all processes in the warehouse.
End-user access tools: Principal purpose of data warehousing is to provide information to business users for strategic decision-making. Users interact with warehouse using end-user access tools. Warehouse must efficiently support ad hoc and routine analysis. Includes EIS, OLAP and data mining tools.

Data Mart
Subset of data warehouse that supports requirements of particular department or business function.
Characteristics include:
Holds subset of data in warehouse in summary form.
Focuses on requirements of one department or business function.
Can be stand-alone or linked to warehouse.
Popular because less complex than warehouse.

Architecture of a Data Mart
Can be two-tier or three-tier database applications:
Data warehouse is the optional first tier.
Data mart is the second tier.
End-user workstation is the third tier.
Data is distributed among tiers.

Reasons for Creating a Data Mart
Give users access to data they need to analyze most often.
Provide data in a form that matches the collective view of the data by group of users in department or business area.
Improve end-user response time due to reduction in volume of data to be accessed.
Provide appropriately structured data as dictated by requirements of end-user access tools.
Simpler to build compared with establishing a corporate data warehouse.
Cost of implementation is normally less than that required to establish a data warehouse.
Potential users are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

Introducing OLAP
Dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data. Describes a technology that uses a multidimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.
Enables users to gain deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to wide variety of possible views of data. Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
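A small sketch of what a multidimensional view of aggregate data can look like, written in plain Python rather than with any OLAP product. The fact rows, the three dimension names, and both helper functions are hypothetical. The same facts are summed along different combinations of dimensions, and one dimension can be fixed to a single value before aggregating.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
FACTS = [
    ("north", "shirt", "Q1", 100),
    ("north", "shirt", "Q2", 150),
    ("north", "jacket", "Q1", 80),
    ("south", "shirt", "Q1", 60),
    ("south", "jacket", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")

def aggregate(facts, by):
    """Sum sales per combination of the dimensions named in `by`."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]
    return dict(totals)

def slice_facts(facts, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]

# Three different views of the same data.
by_region = aggregate(FACTS, by=("region",))
by_product_quarter = aggregate(FACTS, by=("product", "quarter"))
north_by_product = aggregate(slice_facts(FACTS, "region", "north"), by=("product",))
print(by_region)
print(north_by_product)
```

This sketch recomputes every view from the detailed rows on each call, which is only workable for a toy data set.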

Introducing OLAP
Can easily answer ‘who?’ and ‘what?’ questions; however, ability to answer ‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-purpose query tools. Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.

Examples of OLAP Applications
[Slide content not reproduced]

OLAP Applications
Essential requirement of all OLAP applications is ability to provide users with just-in-time (JIT) information, to make effective decisions about an organization's strategic directions. JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Practical only if response times are consistently short and data model flexible.
Although OLAP applications are found in widely divergent functional areas, all have following key features:
multi-dimensional views of data;
support for complex calculations;
time intelligence.
Time intelligence is key feature of almost any analytical application as performance is almost always judged over time.

OLAP Benefits
Increased productivity of end-users.
Reduced backlog of applications development for IT staff.
Retention of organizational control over the integrity of corporate data.
Reduced query drag and network traffic on OLTP systems or on the data warehouse.
Improved potential revenue and profitability.

Data Mining
Process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. Involves analysis of data and use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

Data Mining
Focus is to reveal information that is hidden and unexpected. Patterns and relationships are identified by examining the underlying rules and features in the data. Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.
Starts by developing an optimal representation of structure of sample data, during which time knowledge is acquired and extended to larger sets of data. Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing. Relatively new technology; however, already used in a number of industries.

Some Applications of Data Mining
Retail / Marketing:
Identifying buying patterns of customers.
Finding associations among customer demographic characteristics.
Predicting response to mailing campaigns.
Market basket analysis.
Banking:
Detecting patterns of fraudulent credit card use.
Identifying loyal customers.
Predicting customers likely to change their credit card affiliation.
Determining credit card spending by customer groups.
Insurance:
Claims analysis.
Predicting which customers will buy new policies.
Medicine:
Characterizing patient behavior to predict surgery visits.
Identifying successful medical therapies for different illnesses.

Data Mining Operations
Four main operations include:
Predictive modeling.
Database segmentation.
Link analysis.
Deviation detection.
Recognized associations between the applications and corresponding operations, e.g. direct marketing strategies use database segmentation.

Data Mining Techniques
Techniques are specific implementations of the data mining operations.
Each operation has its own strengths and weaknesses. Data mining tools sometimes offer a choice of operations to implement a technique.
Criteria for selection of tool include:
Suitability for certain input data types.
Transparency of the mining output.
Tolerance of missing variable values.
Level of accuracy possible.
Ability to handle large volumes of data.

Web-Database Integration
Just over a decade after its conception in 1989, Web is arguably most popular and powerful networked information system to date. Growth has been near exponential and it has started an information revolution that will continue through the next decade. Now combination of the Web and databases brings many new opportunities for creating advanced database applications.
Compelling platform for delivery and dissemination of data-centric, interactive applications. Organizations now rapidly building new database applications or reengineering existing ones to take advantage of Web as strategic platform for implementing innovative business solutions, in effect becoming Web-centric organizations.

Static and Dynamic Web Pages
HTML/XML document stored in file is static Web page. Content of dynamic Web page is generated each time it is accessed. Thus, dynamic Web page can:
respond to user input from browser;
be customized by and for each user.
Requires hypertext to be generated by servers.
Need scripts that perform conversions from different data formats into HTML/XML ‘on-the-fly’. As a database is dynamic, changing as users create, insert, update, and delete data, generating dynamic Web pages is a much more appropriate approach than creating static ones.
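The idea of generating hypertext from the database on each access can be sketched as follows, using only Python's standard sqlite3 and html modules. The item table, the render_catalog function, and the sample rows are hypothetical; a real site would call such a function from a Web server or script handler, which is left out here.

```python
import html
import sqlite3

def render_catalog(conn):
    """Build the HTML for the catalog page from the rows present right now."""
    rows = conn.execute("SELECT name, price FROM item ORDER BY name").fetchall()
    items = "\n".join(
        # Escape stored text so that it cannot inject markup into the page.
        f"  <li>{html.escape(name)}: {price:.2f}</li>" for name, price in rows
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL)")
conn.execute("INSERT INTO item VALUES (?, ?)", ("Shirt", 19.5))
first_page = render_catalog(conn)

# A later insert changes what the next request sees; no HTML file is edited.
conn.execute("INSERT INTO item VALUES (?, ?)", ("Jacket <new>", 80.0))
second_page = render_catalog(conn)
print(second_page)
```

The two calls return different pages for the same request because the content is converted from database rows to HTML at the time of access.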

Requirements for Web-DBMS Integration
Ability to access valuable corporate data in a secure manner.
Data- and vendor-independent connectivity to allow freedom of choice in DBMS selection.
Ability to interface to database independent of any proprietary browser or Web server.
Connectivity solution that takes advantage of all the features of an organization’s DBMS.
Open architecture to allow interoperability with a variety of systems and technologies.
Cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs.
Support for transactions that span multiple HTTP requests.
Support for session- and application-based authentication.
Acceptable performance.
Minimal administration overhead.
Set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.

Approaches to Integrating Web and DBMSs
Scripting Languages.
Common Gateway Interface (CGI).
HTTP Cookies.
Extending the Web Server.
Java, JDBC, SQLJ, Servlets, and JSP.
Vendor-specific solutions such as:
Microsoft Web Solution Platform: ASP and ADO.
Oracle Internet Platform.

XML (eXtensible Markup Language)
Most documents on Web currently stored and transmitted in HTML. One strength of HTML is its simplicity. However, its simplicity is also one of its weaknesses, with growing need from users who want tags to simplify some tasks and make HTML documents more attractive and dynamic.
To satisfy this demand, vendors introduced some browser-specific HTML tags, which made it difficult to develop sophisticated, widely viewable Web documents.
W3C has produced a new standard called XML, which could preserve the general application independence that makes HTML portable and powerful.
Meta-language (language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML. Restricted version of SGML (Standard Generalized Markup Language), designed especially for Web documents.
Set to impact every aspect of programming including graphical interfaces, embedded systems, distributed systems, and database management. Becoming de facto standard for data communication within software industry, and quickly replacing EDI as primary medium for data interchange among businesses. Some analysts believe it will become language in which most documents are created and stored, both on and off Internet.

XML and Databases
As amount of data in XML expands, there will be increasing demand to store, retrieve, and query this data. Two main models anticipated:
data-centric;
document-centric.

XML – Data-centric model
Fact that data is stored/transferred as XML is incidental. In this case, data could be stored in RDBMS, ORDBMS, or OODBMS. Oracle has completely integrated XML into its Oracle 9i system. XML can be stored as entire documents using data types XMLType or CLOB/BLOB or can be decomposed into its constituent elements and stored that way. Oracle query language has been extended to permit searching of XML-based content.

XML – Document-centric model
Documents designed for human consumption (e.g. books, newspapers, email). Data may be irregular/incomplete, and structure may change rapidly or unpredictably. Unfortunately, RDBMSs, ORDBMSs, and OODBMSs do not handle data of this nature particularly well.
Content management systems are important tools for handling these types of documents. Underlying such a system, may now find a native XML database.

Native XML Database
Defines (logical) data model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, model must include elements, attributes, PCDATA, and document order. XML document must be the unit of (logical) storage although it is not restricted by any underlying physical storage model (so traditional DBMSs are not ruled out).

XML – Query Languages
DBMS vendors have extended SQL to handle query of XML-based content. Standardization of XML extensions to SQL is known as SQL/XML and initial work has been submitted to ISO and ANSI. In addition, W3C formed an XML Query Working Group to produce:
data model for XML documents,
set of query operators on this model,
query language based on these query operators (called XQuery).

XML – XQuery
Queries operate on single documents or fixed collections of documents. Can select entire documents or subtrees of documents that match conditions based on document content and structure. Queries can also construct new documents based on what has been selected. Ultimately, collections of XML documents will be accessed like databases.
Web technology is highly dynamic, so expect significant developments over the next years.
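The select-then-construct pattern described for XQuery can be imitated in a few lines of Python. This is not XQuery: it is a sketch that assumes the standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The catalog document and its element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
SOURCE = """
<catalog>
  <item category="shirt"><name>Oxford</name><price>25</price></item>
  <item category="jacket"><name>Parka</name><price>90</price></item>
  <item category="shirt"><name>Polo</name><price>40</price></item>
</catalog>
"""

catalog = ET.fromstring(SOURCE)

# Select subtrees by structure (item elements) and by an attribute value,
# then filter further on the text content of a child element.
cheap_shirts = [
    item
    for item in catalog.findall("./item[@category='shirt']")
    if float(item.findtext("price")) < 30
]

# Construct a new document from what was selected.
result = ET.Element("cheap-shirts")
for item in cheap_shirts:
    ET.SubElement(result, "name").text = item.findtext("name")

print(ET.tostring(result, encoding="unicode"))
```

The price comparison is done in Python because the path syntax used here has no numeric comparison; a full XML query language would express the whole condition in the query itself.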