Download Servers - American University

Document related concepts

Data analysis wikipedia , lookup

Neuroinformatics wikipedia , lookup

Data remanence wikipedia , lookup

Transcript
Business Analysis and Data Design
ITEC-630 Fall 2008
Advanced Database Topics
Professor J. Alberto Espinosa
Agenda
• Client-server computing and database servers
• Connecting databases to the web
• Other advanced database topics:
–
–
–
–
–
Database administration
Transactions
Concurrency
Distributed databases
Data warehouses
2
Client Server
Computing and
Database Servers
3
Client-Server Computing
• A key technological development in the 90’s
• A form of “distributed computing”
• Most predominant computing architecture today
• Software application (i.e., processing) is split into tasks
• These tasks are distributed among computers
• Depending where it is more efficient to do the
processing
4
Clients and Servers
Clients
– Request specialized services from servers, and
– Perform other tasks for users (e.g., screen displays)
Servers
– Acknowledge service requests from clients, and
– Provide requested services (i.e., tasks, processes)
– Via responses to clients
• Servers and clients connect via networks
5
Client-Server
Computing
Database
Client
Other
Client
Request
(e.g., SQL query)
Network
Response
(e.g., query
results)
Database
Server
Browser
Web Server
Service Request
(e.g., a web page)
Response
(e.g., a web page)
Other
Server
6
Examples of Servers
• A server can be hardware, software or both
• File Server  central file storage, process file requests
(ex. Novell’s NetWare, Windows NT)
• Database Server  back-end DBMS functions
(ex. MS SQL Server, Oracle Server, Lotus Notes Server)
• Web Server  store and fetch web files on request
(ex. Apache, Microsoft IIS)
• Print Server  print job queuing for central printers
• Mail Server  routes mail to users and other mail servers
7
Examples of Clients
• A client can be hardware, software or both
• Networked PCs  request files and other services
from file servers (Windows 2000, XP)
• Database Clients  request records from database
server, process data locally, screen formatting, etc.
(Lotus Notes client, MS Access)
• Web Browsers  request web files from web servers,
translate HTML code into formatted screen displays
(Internet Explorer, Netscape)
• Mail Client  Send/retrieve mail to/from mail servers,
organize and display user mail
(Outlook Express; Lotus Notes mail client)
8
Generic Client-Server Architecture
Request
Application Software
(Client Portion)
Client Communication
Software
Client Operating System
Response
Communication
Protocols
Hardware Platform
Application Software
(Server Portion)
Server Communication
Software
Server Operating System
Hardware Platform
Network
Client
Server
9
Ex.: “Thin Client” or “Fat Server” model
Most of the processing is done by the server
Thin Client
Presentation Software
Fat Server
Example:
•Web Server-Browser
Applications
•Easy to deploy
applications
Client Communication
Software
Application Software
•Great for electronic
commerce
DBMS
•Easy to support and
upgrade applications
for distributed use
Server Communication
Software
Client Operating System
Server Operating System
Hardware Platform
Hardware Platform
Network
10
Ex.: “Fat Client” or “Thin Server” model
Most of the processing is done by the client
Thin Server
Fat Client
Example:
Presentation Software
Application Software
Client Communication
Software
•File Servers (Novell’s
NetWare—your G drive,
Windows NT) – file
servers do very little –
e.g., giving you access
to shared drives and
folders
Server Communication
Software
Server Operating System
(incl. file management)
Client Operating System
Hardware Platform
Hardware Platform
Network
11
Examples of “Very Thin” Servers:
Embeddable Web Servers
12
Example: Client-Server
Database Management Systems (DBMS)
(fat client)
Presentation Software
Database Application
Request
Front-End
Database Client SW
Response
Back-End
Database Server SW
Client Communication
Software
Server Communication
Software
Client Operating System
Server Operating System
Hardware Platform
Databases
Hardware Platform
Network
Client Computer
Server Computer
13
Ex.: Web Client-Server
Client
Server
Browser
Web Server
HTTP
HTTP
TCP/IP
TCP/IP
Client Communication
Software
Server Communication
Software
Client Operating System
Server Operating System
Hardware Platform
Web pages
(HTML and
other files)
Hardware Platform
Network
14
Ex.: Web Client-Server +
Database Server (thin client)
Client
HTML
Response
Server
HTML
Form
Web Server
SQL
Queries
DBMS Server
Browser
HTTP
Web pages
(HTML and
other files)
HTTP
Databases
TCP/IP
TCP/IP
Client Communication
Software
Server Communication
Software
Client Operating System
Server Operating System
Hardware Platform
Hardware Platform
Network
15
Connecting
Databases with
Web Pages
16
Dynamic Web Pages:
Connecting Web Pages to Databases
Request (ex. get a price quote, place an order)
Response (ex. query results with HTML-formatted
product price or order confirmation notice)
Client / Server Computing:
BrowserClient / Web + Database Servers
= Dynamic Web Pages
Web
Server
Client
Query
String
Results
Network
Click
Submit
Browser
Service Request
Database
Query
Results
Database
Server
(usually runs in
the same
computer as the
web server)
Response:
Dynamically
Formatted
HTML Page
w/Results
Server
18
Setting Up Your Own Site With
Dynamic Web Capabilities
Steps:
1. Register your own domain name (e.g., my domain is
www.Jibe4Fun.com) – there are hundreds of domain registration
services ($20 to $40 per year to keep your domain name active)
(through a service like http://DomainName.com)
2. Contract web hosting services with a company to hold your web
pages – there are hundreds of web hosting services ranging from
($100 per year for a few MB of storage to highly priced
commercial-strength e-commerce services (through a service like
http://www.Alentus.com)
3. Map your domain name to your web hosting service (5 minutes)
4. Design, normalize and populate your database(s)
5. Design and develop your HTML files and related scripts
6. Upload your HTML files, scripts and databases to your assigned
web space with your web hosting service
19
Dynamic
Web Pages:
Scripts
20
HTTP and Static HTML
HTTP = a document fetching protocol:
1. User clicks on URL with HTTP protocol
2. Client requests connection to server, server connects
3. Browser requests HTML page to web site
4. Server finds/sends HTML page to client “AS IS”
5. Client’s browser interprets and displays HTML doc
6. Server disconnects from client
HTML is static: text (info) and tags (formatting), ex.:
<FONT SIZE=2><BOLD>Hello!! </BOLD><U>there</U></FONT>
Displays as: Hello!! there
21
Static HTML via HTTP
Web Server
Microsoft Internet Information Server (IIS)
Apache
Open connection
and find
HTML doc
file.html
Request
connection
to server and
file.html
Send HTML
doc and
close
connection
Client Browser
Internet Explorer
Netscape Navigator
22
Static HTML:
HTTP Shortcomings
• Corporate information is dynamic
 As corporate information changes,
 Database contents change too
 Web pages need to change too
 By hand? Or, do we link to databases?
• How to customize displays for
different users?
23
How to make web pages Dynamic?
2 generic solutions
(workarounds) to static HTML:
1. Client-side scripting
•
Scripts that are processed by the browser in
the local machine
2. Server-side scripting
•
Scripts that are processed by the web server
24
Client-Side Scripting
•
•
•
•
•
•
Script commands embedded in HTML file
Browsers need capability to process scripts
Processing is done by browser AFTER page is
fetched from server
Useful for interactive and visual effects
Browser must support scripting language
Most popular: JavaScript, VB Script
25
Client-Side Scripting
Embedding Client-Side Scripts in HTML
HTML lines
<SCRIPT LANGUAGE = “JavaScript”>
script lines
</SCRIPT>
More HTML lines
<SCRIPT LANGUAGE = “Perl”>
script lines
</SCRIPT>
More HTML lines
…………
26
Example 1
27
Example 2
See: http://faculty.vassar.edu/lowry/kappa.html
28
Example 3
Other examples:
http://auapps.american.edu/~alberto/images/BouncingDots.html
http://auapps.american.edu/~alberto/images/BouncingHearts.html
29
Server-Side Scripting
• Script commands embedded in HTML file
• The server must have capability to process scripts
• Processing is done by web server BEFORE page is sent to
browser
• Useful to customize pages based on data stored on the server
(databases, images, etc.)
• And for centralized processing (at the server)
• Web sever must support the scripting language
• For example:
– Microsoft’s Active Server Pages (ASP)
– Which is a web scripting environment
– It runs on Microsoft IIS (Internet Info Server) Web Servers
– Supports VB Script or JScript (MS version of JavaScript)
• Other scripting languages
– PHP: Like ASP, Open Source for Apache servers
– Perl: used with CGI scripts (Unix servers)
30
Server-Side Scripting
with Microsoft’s ASP
• Embedded scripts in HTML page
HTML code (i.e., tags and text)
<%
‘ Everything after <% is an ASP script
‘ Note: use quote for comments
ASP script code
(using VB Script as default or other as declared)
………..……….
………………....
%>
‘ ASP script ends with %>
More HTML code
<% more ASP %>
Etc.
31
How ASP Works:
1.
Web file needs to be named .asp (instead of .html)
 User clicks on URL with .asp file
 Browser sends request for .asp file to server
2.
Web server notices file extension .asp and
 Loads a program (DLL) called ASP.DLL
 Which processes this and other .asp files
 Server generates a “new” web file
 Contains all original HTML stuff
 Plus processing results from ASP scripts
 These are dynamically formatted w/HTML tags
3.
Server sends the “new” web file to the browser
 Not the “original” ASP file!!
32
How ASP Works
Microsoft’s Web Server
(ASP + MS Access or SQL Server)
SQL Query
(if any)
Databases
Query
Results
(recordset)

file.asp file.asp
Process
Scripts
file.asp
asp.dll
Response
HTML doc
Generated
On-the-Fly
HTML
Request
file.html
Request
file.asp
Client Browser
Internet Explorer
Netscape Navigator
HTML doc
Fetched
(+ Client-Side
scripts, if any)
=
file.html file.html
33
Dynamic HTML with ASP
ASP file on web server (file.asp)
<H3>Welcome to my page</H3>
<H2>Here is my product list</H2>
<%
‘Start ASP script
Open a database connection
SQL queries to database
Copy results to a record set
Display records one at a time
Close database connection
%>
‘End ASP script
<P>Thank you very much for
inquiring about our products
HTML file sent to browser (file.asp)
HTML
Dynamically
generated
HTML lines
by ASP
HTML
<H3>Welcome to my page</H3>
<H2>Here is my product list</H2>
<P>
<B>Product
Price</B>
<HR>
<P>Hammer ……... $8.50
<P>Pliers ……….… $7.79
<P>Screwdriver ..… $4.50
<P>Power Drill ….. $49.99
<P>Chainsaw …… $95.95
<P>Wrench ……….. $6.50
<P>Thank you very much for
inquiring about our products
34
Common Uses of ASP with Databases
• Register a client (add record in database)
• List products & services (query database)
• Place orders (add records in database)
[Illustrations: Database Design Shopping Cart Order Entry]
• Track order status (query database)
• Tech support (query a knowledge database)
• Fill out a survey (add records in database)
35
See: http://www.jibe4fun.com/scripts/orders/
36
ASP
HTML
Both
Example:
ASP (Query) Script
37
Example: Query Results Sent to Browser
(HTML dynamically generated by previous ASP script)
<IMG SRC="music22.gif"><B>Alberto's Music Instruments, Inc.<p>
<TABLE BORDER="0"><B>Customer List</B>
<TR><TH>ClientID</TH> <TH>Client Name</TH>
<TH>Shipping Address</TH> <TH>Telephone</TH> </TR>
<TR><TD>josee</TD>
<TD>Alberto Espinosa</TD>
<TD>Schenley Park, GSIA Building, #20</TD>
<TD>412-268-3681<BR></TD> </TR>
<TR><TD>sandy</TD>
<TD>Sandra Slaughter</TD>
<TD>5000 Forbes Avenue, Pittsburgh PA 15213</TD>
<TD>412-268-3681<BR></TD> </TR>
etc.
</TABLE></BODY></HTML>
38
See: http://www.jibe4fun.com/scripts/orders/Customer_List.asp
39
Using Forms with ASP,
HTML and Databases
•
•
•
•
•
•
•
•
Capture data from user using HTML forms
Feed form data to an ASP script
Which is what the “Submit” button does
HTML forms contain data items with field names
Which are passed to ASP scripts for processing
Often used to embed an SQL command
To query a database (product list, etc.)
Or to insert records in a database (orders, etc.)
40
Example: HTML Form (Data Input)
Doesn’t have to be ASP, can be plain HTML
<B>Customer Registration</B><P>
On submit,
Pass on to
<FORM ACTION=
“http://softrade-11.gsia.cmu.edu/data/customerSubmit.asp”
METHOD=“POST”>
<TABLE>
<TR><TD>Please
<TD><INPUT
</TD></TR>
<TR><TD>Please
<TD><INPUT
</TD></TR>
etc.
</TABLE>
enter a customer ID (4 to 16 characters)</TD>
TYPE=“text” SIZE=“35” NAME="CustomerID">
enter your name</TD>
TYPE=“text” SIZE=“35” NAME="CustName">
Form Object
<INPUT TYPE="submit" VALUE=“Submit”></TD></TR>
</TABLE>
</FORM>
41
See:
http://www.jibe4fun.com/scripts/orders/Customer_Input.html
http://www.jibe4fun.com/scripts/orders/
42
Example: ASP Processing Data from Forms
<!-- customerSubmit.asp -->
Request
From Form
Object
Add record
in database
43
ASP Resources
• A periodic publication on ASP. It contains articles with ASP issues
as well as some tips and tricks: http://www.asptoday.com
• Nice introductory book with examples and a web site where you can
download running code: Beginning Active Server Pages 3.0
http://www.wrox.com/books/0764543636.shtml
• A more advanced book. Probably one of the most useful reference
books to have for people who are doing serious ASP coding:
Professional ASP.NET 1.0. http://www.wrox.com/books/0764543962.shtml
• A useful book if you need more help with Visual Basic Script: VB
Script Programmer's Reference.
http://www.wrox.com/books/0764543679.shtml
• A good and concise reference of ASP objects for those who already
know ASP fairly well: ASP in a Nutshell, Weissinger and Petrusha,
O'Reilly series. http://www.amazon.com/exec/obidos/ASIN/1565924908/
44
Other Related Technologies
Server-Side Processing:
• JSP (Java Server Pages): Sun's version of ASP (*.jsp files)
• ColdFusion (*.cfm files), Dreamweaver (Macromedia)
http://www.macromedia.com/
Source – PHP (*.php files)
• Lotus Notes & Domino
IBM, http://lotus.com/home.nsf/welcome/domino
•
(Like ASP but) Open
45
Other Related
Technologies (cont'd.)
Extensible Markup Language (XML)
•
•
•
•
Standard for inter and intra-organizational data exchange
Very important for B2B e-commerce applications
Like HTML, but used to fetch data, not documents
Each tag is defined data, not formats, ex.:
<LastName>Espinosa</LastName>
<FirstName>Alberto</FirstName>
<U>josee</U> (not underline, just a variable called U)
• Data defined in "Document Type Definition" files (DTD)
• Data itself in XML file
• Need and XML processor to process XML data
(a browser is an HTML processor)
46
Example: baking
bread with XML
<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
<title>Basic bread</title>
<ingredient amount="3" unit="cups">Flour</ingredient> <ingredient
amount="0.25" unit="ounce">Yeast</ingredient> <ingredient
amount="1.5" unit="cups“ state="warm">Water</ingredient>
<ingredient amount="1" unit="teaspoon">Salt</ingredient>
<Instructions>
<step>Mix all ingredients together, and knead thoroughly.</step>
<step>Cover with a cloth, and leave for one hour in warm
room.</step>
<step>Knead again, place in a tin, and then bake in the
oven.</step>
</Instructions>
</Recipe>
47
Another
XML
Example
Can you draw a
table that contains
the following data?
<RECORD>1</RECORD>
<FIRSTNAME>Alberto</FIRSTNAME>
<LASTNAME>Espinosa</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>Looks tired, needs vacation</REMARKS>
<RECORD>2</RECORD>
<FIRSTNAME>Gwanhoo</FIRSTNAME>
<LASTNAME>Lee</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>He teaches the other 2 MIS sections</REMARKS>
<RECORD>3</RECORD>
<FIRSTNAME>Jill</FIRSTNAME>
<LASTNAME>Klein</LASTNAME>
<EMAIL>[email protected]</EMAIL>
<PROFESSION>Professor</PROFESSION>
<SCHOOL>American University</SCHOOL>
<DEPARTMENT>Information Technology</DEPARTMENT>
<REMARKS>She teaches the MBA MIS course</REMARKS>
48
Same data in
database table format
49
Business to Business
E-Commerce Example
using XML
Supplier
INSERT
query
XML
Processor
DBMS
(e.g., MS
SQL Server)
XML Document
(e.g., Purchase
Order)
Internet
XML
Processor
XML Document
(e.g., Purchase
Order)
Query results
Buyer
SELECT
query
DBMS
(e.g., Oracle)
50
Advanced
Database
Topics
51
Scale Issues
• Scale Issue #1: Large databases need to be
managed
 Database Administration
• Scale Issue #2: Large database applications need to
update multiple tables simultaneously
 Transactions
• Scale Issue #3: Multiple users using, updating and
querying the database
 Concurrency
• Scale Issue #4: Large database application with wide
geographic scope
 Distributed databases
• Scale Issue #5: Multiple data sources needed for
decision making
 Data warehouses
52
Database
Administration
53
Database Administration
“A technical function that is responsible for physical
database design and for dealing with technical
issues such as security enforcement, database
performance, and backup and recovery.”
-- Hoffer et. al.
54
Database Administration Functions
• Data policies (e.g. every user must have a password,
authorizations, access to data), procedures (e.g., data
must be backed up daily) and standards
• Develop the organization’s information architecture
(i.e., understand the org’s information requirements)
• Data ownership conflict resolution
• Manage data repositories (metadata, data dictionary)
55
Database Administration Functions (cont’d.)
•
•
•
•
Hardware and software selection
Install and upgrade the DBMS
Tune database and query processing performance
Manage data security (threats to security, user views,
access/authorization rules for users and applications),
privacy (encryption) and integrity
• Data backup and recovery (audit trails, transaction
logs, change logs, transaction integrity, recovery
management)
56
Metadata and Data Dictionaries
• Metadata = data about the data
• Data dictionary = a form of metadata
• Relational data dictionary = tables with data about the
database
Tables(TableName, TableDescription)
Fields (TableName, FieldName, FieldType, Length)
Forms (FormName, FormDescription)
Etc.
• Data dictionaries can be passive or active, depending on
the DBMS
• Passive data dictionary: only used to document the data
• Active data dictionary: database design, access and
update is all done through the data dictionary
57
Data Dictionary Example
http://auapps.american.edu/~alberto/itec630/DBLab&DataDictionary.mdb [local copy]
58
Transactions
59
What is a Transaction?
•
•
•
•
A transaction is a logical unit of work
No portion of a transaction stands by itself
It represents a real-world event
For example, a product sale has an effect on
accounting records, inventory records, customer
transaction files, cash register balances, etc.
• A transaction must take a database from one
consistent state (i.e., one in which all data
integrity constraints are satisfied – e.g., entity
integrity, referential integrity, etc.) to another
consistent state
• So, all portions of the transaction must execute as
a whole, or none at all
60
Preventing an inconsistent
database state
• Acceptance of an incomplete transaction will yield an
inconsistent database state
• To avoid such a state, the DBMS ensures that all of a transaction's
database operations are completed before they are committed to
the database
• A transaction begins with a database in consistent state A
• All table updates in a transaction are then executed
• When/if completed, the database ends in consistent state B
• When/if this happens, the DBMS COMMITS the transaction
• When/if the transaction is interrupted before full completion, the
transaction is aborted and DBMS ROLLS BACK the database to its
previous consistent state A
61
Transaction support
• SQL in advanced DMBS’s provides transaction
• This is supported with the COMMIT and ROLLBACK
statements
• COMMIT: Permanently saves changes to disk
• ROLLBACK: Restores the database to its previous
consistent state before the transaction started
62
Example of a Transaction
• For example, to process an order:
1.Add a record in the Orders table
2.Add a record in the LineItems table for each product ordered
3.Update client history file
Program (a “stored procedure”):
BEGIN TRANSACTION
ON TRANSACTION INCOMPLETE ROLLBACK (often implicit, thus omitted)
INSERT (“19944”, “alberto”, 12/12/2003, “Top Priority”) INTO Orders
INSERT (“19944”, 1, “comp”, 24) INTO LineItems
INSERT (“19944”, 2, “keybd”, 14) INTO LineItems
INSERT (“19944”, 3, “mouse”, 22) INTO LineItems
UPDATE Clients SET ClientAmt = ClientAmt + $12,340 WHERE ClientID = “alberto”
END TRANSACTION
COMMIT
63
The four transaction properties are:
• Atomicity requires that all parts of a transaction must be
completed or the transaction is aborted. This property ensures
that the database will remain in a consistent state.
• Durability indicates that the database will be in a permanent
consistent state after the execution of a transaction. In other
words, once a consistent state is reached, it cannot be lost.
• Serializability means that a series of concurrent transactions will
yield the same result as if they were executed one after another.
• Isolation means that the data required by an executing
transaction cannot be accessed by any other transaction until the
first transaction finishes. This property ensures data consistency
for concurrently executing transactions.
64
Transaction Log
• Is a special DBMS table that contains a description of
all the database transactions executed by the DBMS
• It plays a key role in maintaining database concurrency
control and integrity.
• The information stored in the transaction log is used by
the DBMS to ROLLBACK the database after a
transaction is aborted or after a system failure.
• The transaction log is often stored in a different hard disk
or in a different media (tape) to prevent the failure
caused by a media error.
65
Concurrency
66
What is concurrency?
• A common problem in computer systems occurs when a shared
resource (e.g., screen display, hard disk, data file, database
record) need to be used simultaneously by more than one
device, application or user.
• In database, concurrency refers to the management of
simultaneous access to a shared table, record or data element
by more than one person, application or transaction in a multiuser environment (e.g., 100 data entry clerks entering data in the
same table)
• This is managed by “locking” tables, records or individual data
elements when necessary
67
What is a lock?
• Mechanism used in concurrency control to guarantee the exclusive
use of a data element to the transaction that “owns the lock”.
• For example, if the data element X is locked by transaction T1,
transaction T2 will not have access to the data element X until T2
releases the lock.
• Generally speaking, a data item can be in only two states: locked
(being used by some transaction) or unlocked (not in use by any
transaction).
• To access a data element X, a transaction T1 must request a lock to
the DBMS. If the data element is not in use, the DBMS will lock X to
be used by T1 exclusively. No other transaction will have access to
X while T1 is executed
• Soft lock: locked element can be read (queried) but not modified
• Hard lock: locked element cannot be accessed at all
68
Distributed
Databases
69
Definition
• Distributed Database: “a single logical
database that is spread physically across
computers in multiple locations that are
connected by a data communications link.”
-- Hoffer et al.
70
Why distributed databases?
•
•
•
•
•
Distributed autonomous business units
Data sharing across business units
Database recovery and redundant systems
Risk: eliminate single points of failure
Efficiency: locate data where it is needed the most
71
Distributed Database Options
• Homogeneous:
Same DBMS at each node
• Heterogeneous:
Different DBMSs at different nodes.
72
Objectives of Distributed DBMS
• Location Transparency:
– The user doesn’t need to know the location
of the data
– The location of the data is stored in the data
dictionary so that the DBMS can find it
• Local Autonomy:
– Local site can operate with its database
when other sites are down
73
Trade-offs in distributed databases
• Synchronous Distributed Database:
– All copies of the database data are always identical –
i.e., synchronized
• Asynchronous Distributed Database:
– Data may be temporarily unsynchronized
– Data is replicated and synchronized with delay
74
Options for
Distributing a Database
• Data replication: keeping separate copies of the same data,
replicated/synchronized periodically
• Horizontal partitioning: store some table rows (i.e., records) in one
location and some in another
• Vertical partitioning: store some table columns (i.e., fields) in one
location and some in another
• Distributed tables: store some tables in one location and some in
another location
• Combinations of the above
• The (distributed) data dictionary contains information about the
physical location of all tables, columns and rows
75
Data Warehouses
76
Definition
• Data Warehouse: “a subject-oriented,
integrated, time-variant, non-updatable,
organized collection of data gathered from
a variety of sources to support
management decisions”
-- Hoffer et al.
77
Data Warehouse Architectures
• Two-level
– Data is extracted from various internal and sources
– Then transformed and integrated in data warehouse
• Independent Data Mart Warehousing Environment
–
–
–
–
Data is extracted from various internal and sources
Then transformed and exported to independent data marts
A data mart is a smaller data warehouse of limited scope
Customized for decision making of different groups
• Dependent Data Mart with EDW (three-level)
– Combines the two methods above
– Data is integrated into an Enterprise Data Warehouse (EDW)
– Which are used to load the dependent data marts
78
Two-Level Architecture
Operational
Data
Decision Support
Environment
Data
Source
Data
Source
Transform
and
Integrate
Data
Warehouse
Data
Source
79
Independent Data Mart
Operational
Data
Decision Support
Environment
Data
Source
Data
Source
Data
Source
Data
Mart
Transform
and
Integrate
Data
Mart
Data
Mart
80
Dependent Data Mart & EDW
Decision Support
Environment
Operational
Data
Data
Source
Data
Source
Data
Source
Data
Mart
Data
Mart
Transform
and
Integrate
Data
Mart
Enterprise
Data
Warehouse
81
Star Schema
• Also called the dimensional model.
• Fact and dimension tables.
– Fact table: consists of factual or quantitative
data about the business
– Dimension table: hold descriptive data
• Grain of a fact table - time period for each
record (e.g. Monthly, weekly, every
transaction).
82
Components of a star schema
83
Star schema example
84
Size of the fact table
•
•
•
•
Total number of stores: 1,000
Total number of products: 10,000
Total number of periods: 24
Total rows: 1000 * 10,000 * 24 =
240,000,000
• On average 50% items record sales,
– no of rows = 120,000,000
85
Data Warehouse
“A database that stores and consolidates current and historical
data from various systems (internal and external) with tools for
management reporting and sophisticated analysis—i.e.,
Datamining”
86
Slicing a data cube
87