Business Analysis and Data Design
ITEC-630, Fall 2008
Advanced Database Topics
Professor J. Alberto Espinosa

Agenda
• Client-server computing and database servers
• Connecting databases to the web
• Other advanced database topics:
– Database administration
– Transactions
– Concurrency
– Distributed databases
– Data warehouses

Client-Server Computing and Database Servers

Client-Server Computing
• A key technological development of the 1990s
• A form of "distributed computing"
• The most predominant computing architecture today
• The software application (i.e., the processing) is split into tasks
• These tasks are distributed among computers
• Each task runs wherever its processing is most efficient

Clients and Servers
Clients
– Request specialized services from servers, and
– Perform other tasks for users (e.g., screen displays)
Servers
– Acknowledge service requests from clients, and
– Provide the requested services (i.e., tasks, processes)
– Via responses to clients
• Servers and clients connect via networks

Client-Server Computing
[Diagram: a database client sends a request (e.g., an SQL query) over the network to a database server and receives a response (e.g., query results); a browser sends a service request (e.g., for a web page) to a web server and receives a response (e.g., a web page); other clients exchange requests and responses with other servers]

Examples of Servers
• A server can be hardware, software or both
• File server: central file storage; processes file requests (e.g., Novell's NetWare, Windows NT)
• Database server: back-end DBMS functions (e.g., MS SQL Server, Oracle Server, Lotus Notes Server)
• Web server: stores and fetches web files on request (e.g., Apache, Microsoft IIS)
• Print server: print job queuing for central printers
• Mail server: routes mail to users and to other mail servers

Examples of Clients
• A client can be hardware, software or both
• Networked PCs: request files and other services from file servers (Windows 2000, XP)
• Database clients: request records from the database server, process data locally, format screens, etc.
(Lotus Notes client, MS Access)
• Web browsers: request web files from web servers and translate HTML code into formatted screen displays (Internet Explorer, Netscape)
• Mail clients: send/retrieve mail to/from mail servers; organize and display user mail (Outlook Express, Lotus Notes mail client)

Generic Client-Server Architecture
[Diagram: on the client, application software (client portion) runs on client communication software, the client operating system and a hardware platform; on the server, application software (server portion) runs on server communication software, the server operating system and a hardware platform; requests and responses travel over the network using communication protocols]

Example: "Thin Client" or "Fat Server" Model
Most of the processing is done by the server.
Example:
• Web server-browser applications
• Easy to deploy applications
• Great for electronic commerce
• Easy to support and upgrade applications for distributed use
[Diagram: thin client = presentation software, client communication software, client operating system, hardware platform; fat server = application software, DBMS, server communication software, server operating system, hardware platform; client and server connected by a network]

Example: "Fat Client" or "Thin Server" Model
Most of the processing is done by the client.
Example:
• File servers (Novell's NetWare, i.e., your G drive; Windows NT)
– File servers do very little
– e.g., giving you access to shared drives and folders
[Diagram: fat client = presentation software, application software, client communication software, client operating system, hardware platform; thin server = server communication software, server operating system (incl. file management), hardware platform; client and server connected by a network]

Examples of "Very Thin" Servers: Embeddable Web Servers

Example: Client-Server Database Management Systems (DBMS) (fat client)
[Diagram: on the client computer, presentation software and the database application run on front-end database client software, client communication software, the client operating system and a hardware platform; requests go over the network to the server computer, where back-end database server software manages the databases on top of server communication software, the server operating system and a hardware platform, and sends back responses]

Example: Web Client-Server
[Diagram: the browser (client) and the web server (server) communicate via HTTP over TCP/IP, each running on its own communication software, operating system and hardware platform; the server stores the web pages (HTML and other files)]

Example: Web Client-Server + Database Server (thin client)
[Diagram: the browser sends an HTML form via HTTP over TCP/IP to the web server, which holds the web pages (HTML and other files) and sends SQL queries to the DBMS server and its databases; the web server returns an HTML response to the browser]

Connecting Databases with Web Pages

Dynamic Web Pages: Connecting Web Pages to Databases
Client/server computing: browser client + web and database servers = dynamic web pages
• Request (e.g., get a price quote, place an order)
• Response (e.g., query results with an HTML-formatted product price, or an order confirmation notice)
[Diagram: the user clicks Submit and the browser sends a service request over the network to the web server; the web server passes a query string to the database server (which usually runs on the same computer as the web server), receives the query results, and responds with a dynamically formatted HTML page containing the results]

Setting Up Your Own Site With Dynamic Web Capabilities
Steps:
1. Register your own domain name (e.g., my domain is www.Jibe4Fun.com) through a service like http://DomainName.com. There are hundreds of domain registration services ($20 to $40 per year to keep your domain name active).
2. Contract web hosting services with a company to hold your web pages, through a service like http://www.Alentus.com. There are hundreds of web hosting services, ranging from $100 per year for a few MB of storage to highly priced commercial-strength e-commerce services.
3. Map your domain name to your web hosting service (5 minutes)
4. Design, normalize and populate your database(s)
5. Design and develop your HTML files and related scripts
6. Upload your HTML files, scripts and databases to the web space assigned by your web hosting service

Dynamic Web Pages: Scripts

HTTP and Static HTML
HTTP = a document-fetching protocol:
1. The user clicks on a URL with the HTTP protocol
2. The client requests a connection to the server; the server connects
3. The browser requests an HTML page from the web site
4. The server finds the HTML page and sends it to the client "AS IS"
5. The client's browser interprets and displays the HTML document
6. The server disconnects from the client
HTML is static: text (info) and tags (formatting), e.g.:
<FONT SIZE=2><B>Hello!! </B><U>there</U></FONT>
Displays as: Hello!! there ("Hello!!" in bold, "there" underlined)

Static HTML via HTTP
[Diagram: the client browser (Internet Explorer, Netscape Navigator) requests a connection to the server and file.html; the web server (Microsoft Internet Information Server (IIS), Apache) opens the connection, finds file.html, sends the HTML document and closes the connection]

Static HTML: HTTP Shortcomings
• Corporate information is dynamic:
– As corporate information changes, database contents change too
– So web pages need to change too
– By hand? Or do we link the pages to databases?
• How do we customize displays for different users?

How to Make Web Pages Dynamic?
Two generic solutions (workarounds) to static HTML:
1. Client-side scripting
• Scripts that are processed by the browser on the local machine
2. Server-side scripting
• Scripts that are processed by the web server

Client-Side Scripting
• Script commands are embedded in the HTML file
• Browsers need the capability to process scripts
• Processing is done by the browser AFTER the page is fetched from the server
• Useful for interactive and visual effects
• The browser must support the scripting language
• Most popular: JavaScript, VBScript

Client-Side Scripting: Embedding Client-Side Scripts in HTML
HTML lines
<SCRIPT LANGUAGE="JavaScript">
script lines
</SCRIPT>
More HTML lines
<SCRIPT LANGUAGE="Perl">
script lines
</SCRIPT>
More HTML lines
…

Example 1

Example 2
See: http://faculty.vassar.edu/lowry/kappa.html

Example 3
Other examples:
http://auapps.american.edu/~alberto/images/BouncingDots.html
http://auapps.american.edu/~alberto/images/BouncingHearts.html

Server-Side Scripting
• Script commands are embedded in the HTML file
• The server must have the capability to process scripts
• Processing is done by the web server BEFORE the page is sent to the browser
• Useful to customize pages based on data stored on the server (databases, images, etc.)
• And for centralized processing (at the server)
• The web server must support the scripting language
• For example, Microsoft's Active Server Pages (ASP):
– A web scripting environment
– It runs on Microsoft IIS (Internet Information Server) web servers
– Supports VBScript or JScript (Microsoft's version of JavaScript)
• Other scripting languages:
– PHP: like ASP, but Open Source, for Apache servers
– Perl: used with CGI scripts (Unix servers)

Server-Side Scripting with Microsoft's ASP
• Embedded scripts in an HTML page:
HTML code (i.e., tags and text)
<%
' Everything after <% is an ASP script
' Note: use a single quote (') for comments
ASP script code (using VBScript by default, or another language if declared)
…
%>   (the ASP script ends with %>)
More HTML code
<% more ASP %>
Etc.

How ASP Works
1. The web file needs to be named .asp (instead of .html)
– The user clicks on a URL for an .asp file
– The browser sends a request for the .asp file to the server
2. The web server notices the .asp file extension and
– Loads a program (DLL) called ASP.DLL,
– Which processes this and other .asp files
– The server generates a "new" web file, which
– Contains all the original HTML content
– Plus the processing results from the ASP scripts
– Dynamically formatted with HTML tags
3. The server sends the "new" web file to the browser
– Not the "original" ASP file!

How ASP Works
[Diagram: Microsoft's web server (ASP + MS Access or SQL Server). When the client browser (Internet Explorer, Netscape Navigator) requests file.html, the server fetches the HTML document as is (plus client-side scripts, if any). When it requests file.asp, asp.dll processes the scripts, sends an SQL query (if any) to the databases, receives the query results (recordset), and returns an HTML document generated on the fly]

Dynamic HTML with ASP
ASP file on the web server (file.asp):
<H3>Welcome to my page</H3>
<H2>Here is my product list</H2>
<%
' Start ASP script (VBScript with ADO). The DSN "ProductsDB" and the
' Products table and field names are hypothetical placeholders.
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "DSN=ProductsDB"                                        ' open a database connection
Set rs = conn.Execute("SELECT ProductName, Price FROM Products")  ' SQL query; results go into a recordset
Response.Write "<P> <B>Product Price</B> <HR>"
Do While Not rs.EOF                                               ' display the records one at a time
  Response.Write "<P>" & rs("ProductName") & " ... $" & FormatNumber(rs("Price"), 2)
  rs.MoveNext
Loop
rs.Close
conn.Close                                                        ' close the database connection
' End ASP script
%>
<P>Thank you very much for inquiring about our products

HTML file sent to the browser (still requested as file.asp). The first two lines and the last line are the original HTML; the lines in between were generated dynamically by ASP:
<H3>Welcome to my page</H3>
<H2>Here is my product list</H2>
<P> <B>Product Price</B> <HR>
<P>Hammer ……... $8.50
<P>Pliers ……….… $7.79
<P>Screwdriver ..… $4.50
<P>Power Drill ….. $49.99
<P>Chainsaw …… $95.95
<P>Wrench ……….. $6.50
<P>Thank you very much for inquiring about our products

Common Uses of ASP with Databases
• Register a client (add a record in the database)
• List products & services (query the database)
• Place orders (add records in the database)
[Illustrations: Database Design, Shopping Cart, Order Entry]
• Track order status (query the database)
• Tech support (query a knowledge database)
• Fill out a survey (add records in the database)

See: http://www.jibe4fun.com/scripts/orders/

Example: ASP (Query) Script
[Code listing not reproduced; its color legend distinguishes ASP, HTML and Both]

Example: Query Results Sent to Browser (HTML dynamically generated by the previous ASP script)
<IMG SRC="music22.gif"><B>Alberto's Music Instruments, Inc.</B><P>
<TABLE BORDER="0"><CAPTION><B>Customer List</B></CAPTION>
<TR><TH>ClientID</TH> <TH>Client Name</TH> <TH>Shipping Address</TH> <TH>Telephone</TH></TR>
<TR><TD>josee</TD> <TD>Alberto Espinosa</TD> <TD>Schenley Park, GSIA Building, #20</TD> <TD>412-268-3681<BR></TD></TR>
<TR><TD>sandy</TD> <TD>Sandra Slaughter</TD> <TD>5000 Forbes Avenue, Pittsburgh PA 15213</TD> <TD>412-268-3681<BR></TD></TR>
etc.
</TABLE></BODY></HTML>

See: http://www.jibe4fun.com/scripts/orders/Customer_List.asp

Using Forms with ASP, HTML and Databases
• Capture data from the user with HTML forms
• Feed the form data to an ASP script
• Which is what the "Submit" button does
• HTML forms contain data items with field names
• Which are passed to ASP scripts for processing
• Often used to build an SQL command
• To query a database (product list, etc.)
• Or to insert records in a database (orders, etc.)

Example: HTML Form (Data Input)
This page doesn't have to be ASP; it can be plain HTML. On submit, the form data is passed on to the ASP script named in ACTION; each INPUT element is a form object.
<B>Customer Registration</B><P>
<FORM ACTION="http://softrade-11.gsia.cmu.edu/data/customerSubmit.asp" METHOD="POST">
<TABLE>
<TR><TD>Please enter a customer ID (4 to 16 characters)</TD>
<TD><INPUT TYPE="text" SIZE="35" NAME="CustomerID"></TD></TR>
<TR><TD>Please enter your name</TD>
<TD><INPUT TYPE="text" SIZE="35" NAME="CustName"></TD></TR>
etc.
<TR><TD><INPUT TYPE="submit" VALUE="Submit"></TD></TR>
</TABLE>
</FORM>

See:
http://www.jibe4fun.com/scripts/orders/Customer_Input.html
http://www.jibe4fun.com/scripts/orders/

Example: ASP Processing Data from Forms
<!-- customerSubmit.asp -->
[Code listing not reproduced; callouts: "Request from Form Object" and "Add record in database"]

ASP Resources
• A periodic publication on ASP, with articles on ASP issues as well as some tips and tricks: http://www.asptoday.com
• A nice introductory book with examples, plus a web site where you can download running code: Beginning Active Server Pages 3.0. http://www.wrox.com/books/0764543636.shtml
• A more advanced book, probably one of the most useful references for people doing serious ASP coding: Professional ASP.NET 1.0. http://www.wrox.com/books/0764543962.shtml
• A useful book if you need more help with VBScript: VB Script Programmer's Reference. http://www.wrox.com/books/0764543679.shtml
• A good, concise reference of ASP objects for those who already know ASP fairly well: ASP in a Nutshell, Weissinger and Petrusha, O'Reilly series. http://www.amazon.com/exec/obidos/ASIN/1565924908/

Other Related Technologies
Server-side processing:
• JSP (Java Server Pages): Sun's version of ASP (*.jsp files)
• ColdFusion (*.cfm files), Dreamweaver (Macromedia), http://www.macromedia.com/
• PHP (*.php files): like ASP, but Open Source
• Lotus Notes & Domino (IBM), http://lotus.com/home.nsf/welcome/domino

Other Related Technologies (cont'd.)
Extensible Markup Language (XML)
• A standard for inter- and intra-organizational data exchange
• Very important for B2B e-commerce applications
• Like HTML, but used to fetch data, not documents
• Each tag defines data, not formatting, e.g.:
<LastName>Espinosa</LastName>
<FirstName>Alberto</FirstName>
<U>josee</U> (not an underline, just a data element called U)
• The data is defined in "Document Type Definition" (DTD) files
• The data itself is in the XML file
• You need an XML processor to process XML data (a browser is an HTML processor)

Example: Baking Bread with XML
<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again, place in a tin, and then bake in the oven.</step>
  </Instructions>
</Recipe>

Another XML Example
Can you draw a table that contains the following data?
<RECORD>1</RECORD>
<FIRSTNAME>Alberto</FIRSTNAME>
<LASTNAME>Espinosa</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>Looks tired, needs vacation</REMARKS>

<RECORD>2</RECORD>
<FIRSTNAME>Gwanhoo</FIRSTNAME>
<LASTNAME>Lee</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>He teaches the other 2 MIS sections</REMARKS>

<RECORD>3</RECORD>
<FIRSTNAME>Jill</FIRSTNAME>
<LASTNAME>Klein</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>She teaches the MBA MIS course</REMARKS>

Same Data in Database Table Format
RECORD | FIRSTNAME | LASTNAME | EMAIL | PROFESSION | SCHOOL | DEPARTMENT | REMARKS
1 | Alberto | Espinosa | [email protected] | Professor | American University | Information Technology | Looks tired, needs vacation
2 | Gwanhoo | Lee | [email protected] | Professor | American University | Information Technology | He teaches the other 2 MIS sections
3 | Jill | Klein | [email protected] | Professor | American University | Information Technology | She teaches the MBA MIS course

Business to Business E-Commerce Example Using XML
[Diagram: at the buyer, a SELECT query against the buyer's DBMS (e.g., Oracle) returns query results, which an XML processor turns into an XML document (e.g., a purchase order); the document travels over the Internet to the supplier, where another XML processor reads it and an INSERT query stores it in the supplier's DBMS (e.g., MS SQL Server)]

Advanced Database Topics

Scale Issues
• Scale issue #1: large databases need to be managed → Database administration
• Scale issue #2: large database applications need to update multiple tables simultaneously → Transactions
• Scale issue #3: multiple users using, updating and querying the database → Concurrency
• Scale issue #4: large database applications with wide geographic scope → Distributed databases
• Scale issue #5: multiple data sources needed for decision making → Data warehouses

Database Administration

Database Administration
"A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery." -- Hoffer et al.

Database Administration Functions
• Data policies (e.g., every user must have a password; authorizations; access to data), procedures (e.g., data must be backed up daily) and standards
• Develop the organization's information architecture (i.e., understand the organization's information requirements)
• Resolve data ownership conflicts
• Manage data repositories (metadata, data dictionary)

Database Administration Functions (cont'd.)
• Hardware and software selection
• Install and upgrade the DBMS
• Tune database and query processing performance
• Manage data security (threats to security, user views, access/authorization rules for users and applications), privacy (encryption) and integrity
• Data backup and recovery (audit trails, transaction logs, change logs, transaction integrity, recovery management)

Metadata and Data Dictionaries
• Metadata = data about the data
• Data dictionary = a form of metadata
• Relational data dictionary = tables with data about the database, e.g.:
Tables (TableName, TableDescription)
Fields (TableName, FieldName, FieldType, Length)
Forms (FormName, FormDescription)
Etc.
• Data dictionaries can be passive or active, depending on the DBMS
• Passive data dictionary: used only to document the data
• Active data dictionary: database design, access and updates are all done through the data dictionary

Data Dictionary Example
http://auapps.american.edu/~alberto/itec630/DBLab&DataDictionary.mdb

Transactions

What Is a Transaction?
• A transaction is a logical unit of work
• No portion of a transaction stands by itself
• It represents a real-world event
• For example, a product sale has an effect on accounting records, inventory records, customer transaction files, cash register balances, etc.
• A transaction must take a database from one consistent state (i.e., one in which all data integrity constraints, e.g., entity integrity and referential integrity, are satisfied) to another consistent state
• So all portions of the transaction must execute as a whole, or none at all

Preventing an Inconsistent Database State
• Accepting an incomplete transaction would yield an inconsistent database state
• To avoid such a state, the DBMS ensures that all of a transaction's database operations are completed before they are committed to the database
• A transaction begins with the database in consistent state A
• All table updates in the transaction are then executed
• When/if they all complete, the database ends in consistent state B
• When/if this happens, the DBMS COMMITS the transaction
• When/if the transaction is interrupted before full completion, it is aborted and the DBMS ROLLS BACK the database to its previous consistent state A

Transaction Support
• SQL in advanced DBMSs provides transaction support
• This is done with the COMMIT and ROLLBACK statements
• COMMIT: permanently saves the changes to disk
• ROLLBACK: restores the database to the consistent state it was in before the transaction started

Example of a Transaction
• For example, to process an order:
1. Add a record in the Orders table
2. Add a record in the LineItems table for each product ordered
3. Update the client history file
Program (a "stored procedure"; exact syntax varies by DBMS):
BEGIN TRANSACTION;
-- If any statement fails before COMMIT, issue ROLLBACK (often implicit, thus omitted)
INSERT INTO Orders VALUES ('19944', 'alberto', '2003-12-12', 'Top Priority');
INSERT INTO LineItems VALUES ('19944', 1, 'comp', 24);
INSERT INTO LineItems VALUES ('19944', 2, 'keybd', 14);
INSERT INTO LineItems VALUES ('19944', 3, 'mouse', 22);
UPDATE Clients SET ClientAmt = ClientAmt + 12340 WHERE ClientID = 'alberto';
COMMIT;  -- end of transaction

The four transaction properties are:
• Atomicity requires that all parts of a transaction be completed; otherwise the transaction is aborted. This property ensures that the database will remain in a consistent state.
• Durability indicates that the database will be in a permanent consistent state after the execution of a transaction. In other words, once a consistent state is reached, it cannot be lost.
• Serializability means that a series of concurrent transactions will yield the same result as if they were executed one after another.
• Isolation means that the data required by an executing transaction cannot be accessed by any other transaction until the first transaction finishes. This property ensures data consistency for concurrently executing transactions.

Transaction Log
• A special DBMS table that contains a description of all the database transactions executed by the DBMS
• It plays a key role in maintaining database concurrency control and integrity
• The DBMS uses the information stored in the transaction log to ROLLBACK the database after a transaction is aborted or after a system failure
• The transaction log is often stored on a different hard disk or a different medium (e.g., tape), so that a single media failure cannot destroy both the database and its log

Concurrency

What Is Concurrency?
• A common problem in computer systems occurs when a shared resource (e.g., a screen display, hard disk, data file or database record) needs to be used simultaneously by more than one device, application or user
• In databases, concurrency refers to the management of simultaneous access to a shared table, record or data element by more than one person, application or transaction in a multiuser environment (e.g., 100 data entry clerks entering data in the same table)
• This is managed by "locking" tables, records or individual data elements when necessary

What Is a Lock?
• A mechanism used in concurrency control to guarantee the exclusive use of a data element to the transaction that "owns the lock"
• For example, if data element X is locked by transaction T1, transaction T2 will not have access to X until T1 releases the lock.
• Generally speaking, a data item can be in only two states: locked (being used by some transaction) or unlocked (not in use by any transaction)
• To access a data element X, a transaction T1 must request a lock from the DBMS. If the data element is not in use, the DBMS will lock X for T1's exclusive use; no other transaction will have access to X while T1 is executing
• Soft lock: the locked element can be read (queried) but not modified
• Hard lock: the locked element cannot be accessed at all

Distributed Databases

Definition
• Distributed database: "a single logical database that is spread physically across computers in multiple locations that are connected by a data communications link." -- Hoffer et al.

Why Distributed Databases?
• Distributed autonomous business units
• Data sharing across business units
• Database recovery and redundant systems
• Risk: eliminate single points of failure
• Efficiency: locate data where it is needed the most

Distributed Database Options
• Homogeneous: the same DBMS at each node
• Heterogeneous: different DBMSs at different nodes
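The soft-lock/hard-lock rules from the concurrency slides above can be made concrete with a toy lock table. This is a minimal Python sketch, not how any particular DBMS implements locking: the `LockTable` class, its `request`/`release` methods, and the non-blocking "grant or refuse" behavior are assumptions made for illustration. In this sketch, several reading transactions can share a soft lock on an element, but a hard lock is granted only to a single transaction, and only when no other transaction holds the element. A real DBMS would typically make a refused transaction wait, or abort it, rather than simply return False.

```python
from dataclasses import dataclass, field

SOFT = "soft"  # others may read (query) the element but not modify it
HARD = "hard"  # no other transaction may access the element at all


@dataclass
class LockTable:
    """Hypothetical, non-blocking lock table: data element -> (mode, owners)."""
    locks: dict = field(default_factory=dict)

    def request(self, txn: str, item: str, mode: str) -> bool:
        """Grant the lock and return True, or return False on a conflict."""
        held = self.locks.get(item)
        if held is None:                      # unlocked: grant immediately
            self.locks[item] = (mode, {txn})
            return True
        held_mode, owners = held
        if owners == {txn}:                   # sole owner keeps its lock, may upgrade soft -> hard
            if mode == HARD:
                self.locks[item] = (HARD, owners)
            return True
        if held_mode == SOFT and mode == SOFT:
            owners.add(txn)                   # readers can share a soft lock
            return True
        return False                          # element is hard-locked, or a hard lock was requested on a shared element

    def release(self, txn: str, item: str) -> None:
        """Drop txn's hold on item; the element becomes unlocked when no owners remain."""
        held = self.locks.get(item)
        if held is None:
            return
        _, owners = held
        owners.discard(txn)
        if not owners:
            del self.locks[item]


# Usage, following the T1/T2 example from the lock slide
table = LockTable()
print(table.request("T1", "X", HARD))  # True: X was unlocked
print(table.request("T2", "X", SOFT))  # False: T2 must wait until T1 releases X
table.release("T1", "X")
print(table.request("T2", "X", SOFT))  # True: X is free again
```

Keeping the owners as a set is what lets several readers share a soft lock, while a hard lock is effectively a set of exactly one owner.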
Objectives of a Distributed DBMS
• Location transparency:
– The user doesn't need to know the location of the data
– The location of the data is stored in the data dictionary so that the DBMS can find it
• Local autonomy:
– A local site can operate with its database when other sites are down

Trade-offs in Distributed Databases
• Synchronous distributed database:
– All copies of the database data are always identical, i.e., synchronized
• Asynchronous distributed database:
– Data may be temporarily unsynchronized
– Data is replicated and synchronized with a delay

Options for Distributing a Database
• Data replication: keeping separate copies of the same data, replicated/synchronized periodically
• Horizontal partitioning: storing some table rows (i.e., records) in one location and some in another
• Vertical partitioning: storing some table columns (i.e., fields) in one location and some in another
• Distributed tables: storing some tables in one location and some in another location
• Combinations of the above
• The (distributed) data dictionary contains information about the physical location of all tables, columns and rows

Data Warehouses

Definition
• Data warehouse: "a subject-oriented, integrated, time-variant, non-updatable, organized collection of data gathered from a variety of sources to support management decisions" -- Hoffer et al.
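The "integrated ... gathered from a variety of sources" part of this definition is where most warehouse loading work happens. The Python sketch below shows the idea on a small scale. It assumes two made-up source systems: a store point-of-sale feed with US-style dates and amounts in cents, and a web-order feed with ISO dates and amounts in dollars. The table, column and function names are hypothetical. Each source gets its own transform into one common format. The integrated rows are then appended to a single warehouse table in an in-memory SQLite database, and the sale date is kept so the data stays time-variant.

```python
import sqlite3
from datetime import datetime

# Hypothetical source 1: store point-of-sale system (MM/DD/YYYY dates, amounts in cents)
pos_rows = [("12/30/2008", "P100", 850), ("12/31/2008", "P200", 4999)]
# Hypothetical source 2: web order system (ISO dates, lowercase SKUs, amounts in dollars)
web_rows = [{"date": "2008-12-31", "sku": "p100", "amount": 8.50}]


def transform_pos(row):
    """Convert a POS row to (sale_date, product_id, amount_in_dollars, channel)."""
    date, product, cents = row
    iso_date = datetime.strptime(date, "%m/%d/%Y").date().isoformat()
    return (iso_date, product.upper(), cents / 100, "store")


def transform_web(row):
    """Convert a web-order row to the same common format."""
    return (row["date"], row["sku"].upper(), float(row["amount"]), "web")


def load(conn, rows):
    """Append integrated rows; existing warehouse rows are never updated."""
    conn.executemany(
        "INSERT INTO sales_fact (sale_date, product_id, amount, channel) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, product_id TEXT, amount REAL, channel TEXT)")

# Transform each source, integrate the results, then load
# (the extract step is simulated by the in-memory lists above)
integrated = [transform_pos(r) for r in pos_rows] + [transform_web(r) for r in web_rows]
load(conn, integrated)

# Because both sources now share one format, P100 sales from both channels add up
totals = conn.execute(
    "SELECT product_id, SUM(amount) FROM sales_fact GROUP BY product_id ORDER BY product_id"
).fetchall()
print(totals)
```

Without the transforms, "P100" and "p100" would be counted as different products and cents would be mixed with dollars. This kind of inconsistency across sources is what the integration step is meant to remove.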
Data Warehouse Architectures
• Two-level:
– Data is extracted from various internal and external sources
– Then transformed and integrated in the data warehouse
• Independent data mart warehousing environment:
– Data is extracted from various internal and external sources
– Then transformed and exported to independent data marts
– A data mart is a smaller data warehouse of limited scope
– Customized for the decision making of different groups
• Dependent data mart with EDW (three-level):
– Combines the two methods above
– Data is integrated into an Enterprise Data Warehouse (EDW)
– Which is used to load the dependent data marts

Two-Level Architecture
[Diagram: operational data from several data sources is transformed and integrated into a single data warehouse, which serves the decision support environment]

Independent Data Mart
[Diagram: operational data from several data sources is transformed and integrated directly into several separate data marts in the decision support environment]

Dependent Data Mart & EDW
[Diagram: operational data from several data sources is transformed and integrated into an enterprise data warehouse, which loads several dependent data marts in the decision support environment]

Star Schema
• Also called the dimensional model
• Fact and dimension tables:
– Fact table: consists of factual or quantitative data about the business
– Dimension table: holds descriptive data
• Grain of a fact table: the time period for each record (e.g., monthly, weekly, every transaction)

Components of a Star Schema

Star Schema Example

Size of the Fact Table
• Total number of stores: 1,000
• Total number of products: 10,000
• Total number of periods: 24
• Total rows: 1,000 × 10,000 × 24 = 240,000,000
• If, on average, 50% of the items record sales, the number of rows = 240,000,000 × 0.5 = 120,000,000

Data Warehouse
"A database that stores and consolidates current and historical data from various systems (internal and external) with tools for management reporting and sophisticated analysis, i.e., data mining"

Slicing a Data Cube
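The star schema, fact-table-size and cube-slicing slides can be tied together in one small Python/SQLite sketch. All table names, columns and rows below are made up for illustration. The sketch uses a `sales_fact` table keyed by store, product and period (the three dimensions from the fact-table-size example), plus one dimension table per key. A query then "slices" the cube by fixing one dimension (the period) and summarizing the other two. The helper `estimate_fact_rows` (also hypothetical) just reproduces the slide's arithmetic.

```python
import sqlite3


def estimate_fact_rows(stores, products, periods, density=1.0):
    """Fact rows when each (store, product, period) combination that records sales gets one row."""
    return round(stores * products * periods * density)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE period_dim  (period_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: one row per store, product and period (a monthly grain);
-- its key is made of the foreign keys to the dimension tables
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store_dim,
    product_id INTEGER REFERENCES product_dim,
    period_id  INTEGER REFERENCES period_dim,
    units      INTEGER,
    dollars    REAL,
    PRIMARY KEY (store_id, product_id, period_id)
);

INSERT INTO store_dim   VALUES (1, 'Washington'), (2, 'Pittsburgh');
INSERT INTO product_dim VALUES (1, 'Hammer', 'Tools'), (2, 'Pliers', 'Tools');
INSERT INTO period_dim  VALUES (1, 2008, 11), (2, 2008, 12);
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 10, 85.0), (1, 2, 1, 4, 31.16),
    (2, 1, 2, 3, 25.5),  (1, 1, 2, 6, 51.0);
""")

# Slice: fix the period dimension at December 2008, summarize by store and product
slice_dec_2008 = conn.execute("""
    SELECT s.city, p.name, SUM(f.dollars)
    FROM sales_fact f
    JOIN store_dim s   ON f.store_id = s.store_id
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN period_dim t  ON f.period_id = t.period_id
    WHERE t.year = 2008 AND t.month = 12
    GROUP BY s.city, p.name
    ORDER BY s.city, p.name
""").fetchall()

print(estimate_fact_rows(1000, 10000, 24))       # all combinations, as on the slide
print(estimate_fact_rows(1000, 10000, 24, 0.5))  # only half of the items record sales
print(slice_dec_2008)
```

Combinations with no sales simply have no row in `sales_fact`. This is why the slide's 50% figure halves the estimated size of the fact table.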