Download Database Technologies - Boston College Personal Web Server

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Big data wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Functional Database Model wikipedia , lookup

Database model wikipedia , lookup

Transcript
MPIII
Database Technologies
Relational Concepts
Data Warehouses & Marts
Queries, OLAP, Data Mining
Terms/Examples
Server - responds to client requests
request
response
Client - makes requests
of the DBMS server
• Database
DBMS - the program.
Manages interaction with
databases.
database - the collection of data.
Created and defined to meet the
needs of the organization.
– a collection of related data. Usually organized according
to topics: e.g. customer info, products, transactions
• Database Management System (DBMS)
– a program for creating & managing databases; ex.
Oracle, MS-Access, Sybase
A Simple Database
CUSTID
2001
2002
2003
2004
2005
FIRST
John
Abby
Warren
Peter
Charles
LAST
Gallaugher
Johnson
Buffet
Lynch
Schwab
CITY
Newton
Boston
Omaha
Rockport
San Francisco
STATE
MA
MA
NE
MA
CA
…
…
…
…
…
…
• File/Table
– Customers
• Field/Column
– 5 shown: CUSTID, FIRST, LAST, CITY, STATE
• Record/Row
– 5 shown: one for each customer
A More Complex Example
FIRST
John
John
John
Abby
Abby
Warren
Warren
Warren
Peter
Peter
Charles
Charles
LAST
Gallaugher
Gallaugher
Gallaugher
Johnson
Johnson
Buffet
Buffet
Buffet
Lynch
Lynch
Schwab
Schwab
CITY
Newton
Newton
Newton
Boston
Boston
Omaha
Omaha
Omaha
Rockport
Rockport
San Francisco
San Francisco
STATE
MA
MA
MA
MA
MA
NE
NE
NE
MA
MA
CA
CA
BUY/SELLSTOCK SHARES
Buy
MSFT
1000
Buy
INTC
2400
Sell
IBM
3000
Sell
IBM
3000
Sell
INTC
2000
Buy
INTC
1500
Buy
IBM
1700
Sell
AAPL
1900
Buy
AAPL
2000
Sell
AAPL
10000
Buy
MSFT
4500
Buy
INTC
17000
PRICE DATE
90 1/4 12/24/96
80 1/8
7/3/97
114 3/8
7/1/97
110 1/8 6/30/97
94 7/8 8/30/97
90 3/8
7/2/97
101 7/8
1/4/97
18 1/2 2/14/97
19 2/14/97
21 7/8 3/15/97
101 1/8 1/15/97
80 1/8
7/2/97
TIME
12:01 PM
10:51 AM
9:03 AM
4:53 PM
3:15 PM
11:27 AM
2:02 PM
5:00 PM
5:30 PM
11:44 AM
12:38 AM
4:53 PM
• Entry & Maintenance is complicated
– redundant data exists, increases chance of error,
complicates updates/changes, takes up space
Normalize Data - Remove Redundancy
CUSTID
2001
2002
2003
2004
2005
FIRST
John
Abby
Warren
Peter
Charles
LAST
Gallaugher
Johnson
Buffet
Lynch
Schwab
CITY
Newton
Boston
Omaha
Rockport
San Francisco
STATE
MA
MA
NE
MA
CA
Customer Table
Transaction Table
One
Many
CUSTID
2001
2001
2001
2002
2002
2003
2003
2003
2004
2004
2005
2005
BUY/SELLSTOCK SHARES
Buy
MSFT
1000
Buy
INTC
2400
Sell
IBM
3000
Sell
IBM
3000
Sell
INTC
2000
Buy
INTC
1500
Buy
IBM
1700
Sell
AAPL
1900
Buy
AAPL
2000
Sell
AAPL
10000
Buy
MSFT
4500
Buy
INTC
17000
PRICE DATE
90 1/4 12/24/96
80 1/8
7/3/97
114 3/8
7/1/97
110 1/8 6/30/97
94 7/8 8/30/97
90 3/8
7/2/97
101 7/8
1/4/97
18 1/2 2/14/97
19 2/14/97
21 7/8 3/15/97
101 1/8 1/15/97
80 1/8
7/2/97
TIME
12:01 PM
10:51 AM
9:03 AM
4:53 PM
3:15 PM
11:27 AM
2:02 PM
5:00 PM
5:30 PM
11:44 AM
12:38 AM
4:53 PM
Key Terms
• Relational DBMS
– manages databases as a collection of files/tables in
which all data relationships are represented by
common values in related tables (referred to as keys).
– a relational system has the flexibility to take multiple
files and generate a new file from the records that
meet the matching criteria (join).
• SQL - Structured Query Language
– Most popular relational database standard. Includes a
language for creating & manipulating data.
Now With More Data
CUSTID
2001
2002
2003
2004
2005
FIRST
John
Abby
Warren
Peter
Charles
LAST
Gallaugher
Johnson
Buffet
Lynch
Schwab
One
Many
CITY
Newton
Boston
Omaha
Rockport
San Francisco
CUSTID
2001
2001
2001
2002
2002
…
BROKID
B003
B001
B003
B001
B003
…
STATE
MA
MA
NE
MA
CA
BUY/SELLSTOCK SHARES PRICE DATE
Buy
MSFT
1000 90 1/4 12/24/96
Buy
INTC
2400 80 1/8
7/3/97
Sell
IBM
3000 114 3/8
7/1/97
Sell
IBM
3000 110 1/8 6/30/97
Sell
INTC
2000 94 7/8 8/30/97
…
…
…
…
…
Many
One
BROKID
B001
B002
B003
FIRST
Ivan
Dennis
Michael
LAST
Boesky
Levine
Milken
…
…
…
…
TIME
12:01 PM
10:51 AM
9:03 AM
4:53 PM
3:15 PM
…
Customer Table
Col. Name Length Type …
CUSTID
4 Char …
FIRST
10 Char …
LAST
15 Char …
CITY
15 Char …
STATE
2 Char …
…
…
…
…
Meta-Datam
1
m
Broker Table
Col. Name Length Type
BROKID
4 Char
FIRST
10 Char
LAST
15 Char
…
…
…
…
Transaction Table
Col. Name Length Type
CUSTID
4 Char
BROKID
4 Char
BUY/SELL
1 Bool
STOCK
4 Char
SHARES
8 Num
PRICE
6.2 Money
…
…
…
…
…
…
…
…
…
…
1
…
…
…
• Data that describes the characteristics of stored data
• Enterprise Data Model
– consistent, cross-functional, shareable meta-data model
– standardization increases flexibility & use (data to info)
– facilitates the creation of data warehouses
Management Levels of IS
Strategic
Planning
Management
Control
Operational
Control
DSS
MIS
TPS
Warehouses & Marts
• Data Warehouse
– a database designed to support decision-making in an
organization. It is batch-updated and structured for
fast online queries and exploration. Data warehouses
may aggregate enormous amounts of data from many
different operational systems.
• Data Mart
– a database focused on addressing the concerns of a
specific problem or business unit (e.g. Marketing,
Engineering). Size doesn’t define data marts, but
they tend to be smaller than data warehouses.
Data Warehouses & Data Marts
3rd party data
Data Mart
(Marketing)
TPS
& other
operational
systems
Data
Warehouse
= operational clients
= query, OLAP, mining, etc.
Data Mart
(Engineering)
Differing System Demands
Operational Systems
network traffic
& processor
demands
time
Managerial Systems
network traffic
& processor
demands
time
Transform Data from TPS to
Warehouse
• Consolidate data
– e.g. from multiple TPS around the country/world
• “Scrub” the data
– keep definitions consistent (e.g. translate part
numbers/product names if they differ per country)
• Calculate fields (decrease processor load)
• Summarize fields (decrease processor load)
• De-normalize data (ease of use)
Calculated Fields
Customer Service Application:
Customer support person
TPS - focuses on customer info
Total is calculated on the fly
Customer Date
Gallaugher 3/25/98
Johnson
3/26/98
Buffet
3/27/98
Stock
INTC
AAPL
MSFT
Database Query Application:
Marketing manager
Aggregate reporting of
business intelligence
Total calculated in advance
Shares Price
Total
1000
76 1/2 $
76,500
2500
23 1/4 $
58,125
3000
84 $
252,000
Query Tools & OLAP
• Query Tools
– user-lead discovery. Can return individual records or
summaries. Requests are formulated in advance (e.g.
“show me all delinquent accounts in the northeast
region during Q1”).
• OLAP - Online Analytical Processing
– user-lead discovery. Data is explored via “drill down”
into the data by selecting variables to summarize on.
Results are usually reported in a cross-tab report or
graph (e.g. “show me a tabular breakdown of sales by
business unit, product type, and year”).
OLAP
• Online Analytical Processing. (example of
cross-tab results presented below)
1. business unit
2. product type
3. year
Data Mining
• automated information discovery process,
uncovers important patterns in existing data
– can use neural networks or other approaches.
Requires ‘clean’, reliable, consistent data.
Historical data must reflect the current
environment.
• e.g. “What are the characteristics that identify
when we are likely to lose a customer?”
Data Mining Uses
•
•
•
•
•
•
Market Segmentation - e.g. Dayton Hudson
Direct Marketing - e.g. Chase
Market basket analysis - e.g. Wal-Mart
Customer Churn - e.g. Fleet Bank
Fraud Detection - e.g. Bank of America
Cost Reduction Prospecting - e.g. Merk
Medco.
Stupid Data-Miner Tricks
• Ad-Hoc Theories
– when an oddity jumps out of the data, it’s tempting
to develop a theory for it. Sometimes findings are
just statistical flukes.
• Using Too Many Variables
– the more factors considered, the more likely a
relationship will be found - valid or not.
• Not Taking No for an Answer
– it’s OK to stop looking if you can’t find anything.
There are no silver bullets.
MPIII
Internal & External Integration
Enterprise Resource Planning (ERP)
Challenges Facing IS Depts.
• Y2K & Legacy Systems
• Globalization (euro, currency issues)
• Rapid Technology Advancement
– e.g. Client/Server & Internet
• IS Staffing & Retention
• Changing Organizational Structures
– e.g. Owens Corning
• Tighter Integration with Buyers & Suppliers
Legacy Systems
Many firms have limited to no integration across
geographic areas
functional areas (v-chain)
products, plants, &
business units
Infrastructure: general mgmt, planning, finance, IS
HRM: recruiting, hiring, training, and development
Tech. Development: R&D
Procurement
Inbound
Operations
Outbound
Marketing
logistics
logistics
& Sales
Suppliers
Service
Buyers
External Integration
• EDI - Electronic Data Interchange
– uses standard formats to pass data between disparate
systems
– US format - X.12, European format - UN/EDIFACT
• Cost Savings
– paper order = $50 - $70
– EDI order = $2.50 (VANs / private networks)
– I-EDI order = less than $1 (Internet)
• XML - eXtensible Markup Language
– tagging language for the web
What is ERP?
• ERP - Enterprise Resource Planning Software
– sometimes called Enterprise Applications, Enterprise
Packages, Enterprise Suites, or Enterprise Systems
– connects all of the information which flows through a
company to a single integrated set of systems
– implemented in modules which can be integrated (all at
once or at a later date) e.g. Financials, Logistics, HR
– may work with a wide variety of databases, hardware,
and operating systems
• Leading Vendors
– SAP, Oracle, JD Edwards, Baan, Peoplesoft
ERP in Action
Production
Staffing
Inventory
Purchasing
Order Tracking
Source: BusinessWeek Int’l, 1997
Sales
Planning
The Benefits
• Internal & external integration
– squeeze out waste & enable strategies
• Standard software enables – inter-organizational systems (easier if buyers &
suppliers use the same system, e.g. petrochem. ind.)
– broad selection of add-on packages (e.g. data
warehouses, etc.)
• Package upgrading and new technology
development is handled by vendor
• Speed of deployment
The Risks
• Staff retention (e.g. Grace case)
• Tied to a single vendor
• Flexibility limited by options offered by the
vendor
– may inappropriately force generic processes
– may inappropriate force structure
• Complexity - particularly regarding
mapping and standardizing processes across
the organization.
Make vs. Buy
Comp. Adv.
Will the proposed system offer
proprietary comp. adv.?
Security
Is the process or data highly
confidential?
IT Competency Is IT a core competency?
Tech. Skill
Does the firm have sufficient
expertise with tech.?
Suitability/Fit
Is a suitable partner/package
available?
Cost/Benefit
Is the package cheaper than inhouse dev.?
Time
Is there sufficient time to develop
the system
Adapted from Applegate et al., p. 61.
Make Buy
Yes No
Yes
No
Yes
Yes
No
No
No
Yes
No
Yes
Yes
No
Successful Deployment of ERP
• Business Case
– benchmark, cost justify (e.g. unplug mainframes)
• Leadership
– from the highest levels (e.g. success at Owens Corning,
failure at Westinghouse)
• Staffing
– largely from business, not IT (users know the process)
– ‘compensation handcuffs’ (e.g. end of deployment
bonuses, training payback agreements)
– experienced consultants - check refs., clients
• Execute with proven methodologies