* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Database Technologies - Boston College Personal Web Server
Survey
Document related concepts
Transcript
MPIII Database Technologies Relational Concepts Data Warehouses & Marts Queries, OLAP, Data Mining Terms/Examples Server - responds to client requests request response Client - makes requests of the DBMS server • Database DBMS - the program. Manages interaction with databases. database - the collection of data. Created and defined to meet the needs of the organization. – a collection of related data. Usually organized according to topics: e.g. customer info, products, transactions • Database Management System (DBMS) – a program for creating & managing databases; ex. Oracle, MS-Access, Sybase A Simple Database CUSTID 2001 2002 2003 2004 2005 FIRST John Abby Warren Peter Charles LAST Gallaugher Johnson Buffet Lynch Schwab CITY Newton Boston Omaha Rockport San Francisco STATE MA MA NE MA CA … … … … … … • File/Table – Customers • Field/Column – 5 shown: CUSTID, FIRST, LAST, CITY, STATE • Record/Row – 5 shown: one for each customer A More Complex Example FIRST John John John Abby Abby Warren Warren Warren Peter Peter Charles Charles LAST Gallaugher Gallaugher Gallaugher Johnson Johnson Buffet Buffet Buffet Lynch Lynch Schwab Schwab CITY Newton Newton Newton Boston Boston Omaha Omaha Omaha Rockport Rockport San Francisco San Francisco STATE MA MA MA MA MA NE NE NE MA MA CA CA BUY/SELLSTOCK SHARES Buy MSFT 1000 Buy INTC 2400 Sell IBM 3000 Sell IBM 3000 Sell INTC 2000 Buy INTC 1500 Buy IBM 1700 Sell AAPL 1900 Buy AAPL 2000 Sell AAPL 10000 Buy MSFT 4500 Buy INTC 17000 PRICE DATE 90 1/4 12/24/96 80 1/8 7/3/97 114 3/8 7/1/97 110 1/8 6/30/97 94 7/8 8/30/97 90 3/8 7/2/97 101 7/8 1/4/97 18 1/2 2/14/97 19 2/14/97 21 7/8 3/15/97 101 1/8 1/15/97 80 1/8 7/2/97 TIME 12:01 PM 10:51 AM 9:03 AM 4:53 PM 3:15 PM 11:27 AM 2:02 PM 5:00 PM 5:30 PM 11:44 AM 12:38 AM 4:53 PM • Entry & Maintenance is complicated – redundant data exists, increases chance of error, complicates updates/changes, takes up space Normalize Data - Remove Redundancy CUSTID 2001 2002 2003 2004 2005 FIRST John Abby Warren Peter Charles LAST Gallaugher Johnson Buffet Lynch Schwab CITY Newton Boston Omaha Rockport San Francisco STATE MA MA NE MA CA Customer Table Transaction Table One Many CUSTID 2001 2001 2001 2002 2002 2003 2003 2003 2004 2004 2005 2005 BUY/SELLSTOCK SHARES Buy MSFT 1000 Buy INTC 2400 Sell IBM 3000 Sell IBM 3000 Sell INTC 2000 Buy INTC 1500 Buy IBM 1700 Sell AAPL 1900 Buy AAPL 2000 Sell AAPL 10000 Buy MSFT 4500 Buy INTC 17000 PRICE DATE 90 1/4 12/24/96 80 1/8 7/3/97 114 3/8 7/1/97 110 1/8 6/30/97 94 7/8 8/30/97 90 3/8 7/2/97 101 7/8 1/4/97 18 1/2 2/14/97 19 2/14/97 21 7/8 3/15/97 101 1/8 1/15/97 80 1/8 7/2/97 TIME 12:01 PM 10:51 AM 9:03 AM 4:53 PM 3:15 PM 11:27 AM 2:02 PM 5:00 PM 5:30 PM 11:44 AM 12:38 AM 4:53 PM Key Terms • Relational DBMS – manages databases as a collection of files/tables in which all data relationships are represented by common values in related tables (referred to as keys). – a relational system has the flexibility to take multiple files and generate a new file from the records that meet the matching criteria (join). • SQL - Structured Query Language – Most popular relational database standard. Includes a language for creating & manipulating data. Now With More Data CUSTID 2001 2002 2003 2004 2005 FIRST John Abby Warren Peter Charles LAST Gallaugher Johnson Buffet Lynch Schwab One Many CITY Newton Boston Omaha Rockport San Francisco CUSTID 2001 2001 2001 2002 2002 … BROKID B003 B001 B003 B001 B003 … STATE MA MA NE MA CA BUY/SELLSTOCK SHARES PRICE DATE Buy MSFT 1000 90 1/4 12/24/96 Buy INTC 2400 80 1/8 7/3/97 Sell IBM 3000 114 3/8 7/1/97 Sell IBM 3000 110 1/8 6/30/97 Sell INTC 2000 94 7/8 8/30/97 … … … … … Many One BROKID B001 B002 B003 FIRST Ivan Dennis Michael LAST Boesky Levine Milken … … … … TIME 12:01 PM 10:51 AM 9:03 AM 4:53 PM 3:15 PM … Customer Table Col. Name Length Type … CUSTID 4 Char … FIRST 10 Char … LAST 15 Char … CITY 15 Char … STATE 2 Char … … … … … Meta-Datam 1 m Broker Table Col. Name Length Type BROKID 4 Char FIRST 10 Char LAST 15 Char … … … … Transaction Table Col. Name Length Type CUSTID 4 Char BROKID 4 Char BUY/SELL 1 Bool STOCK 4 Char SHARES 8 Num PRICE 6.2 Money … … … … … … … … … … 1 … … … • Data that describes the characteristics of stored data • Enterprise Data Model – consistent, cross-functional, shareable meta-data model – standardization increases flexibility & use (data to info) – facilitates the creation of data warehouses Management Levels of IS Strategic Planning Management Control Operational Control DSS MIS TPS Warehouses & Marts • Data Warehouse – a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems. • Data Mart – a database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses. Data Warehouses & Data Marts 3rd party data Data Mart (Marketing) TPS & other operational systems Data Warehouse = operational clients = query, OLAP, mining, etc. Data Mart (Engineering) Differing System Demands Operational Systems network traffic & processor demands time Managerial Systems network traffic & processor demands time Transform Data from TPS to Warehouse • Consolidate data – e.g. from multiple TPS around the country/world • “Scrub” the data – keep definitions consistent (e.g. translate part numbers/product names if they differ per country) • Calculate fields (decrease processor load) • Summarize fields (decrease processor load) • De-normalize data (ease of use) Calculated Fields Customer Service Application: Customer support person TPS - focuses on customer info Total is calculated on the fly Customer Date Gallaugher 3/25/98 Johnson 3/26/98 Buffet 3/27/98 Stock INTC AAPL MSFT Database Query Application: Marketing manager Aggregate reporting of business intelligence Total calculated in advance Shares Price Total 1000 76 1/2 $ 76,500 2500 23 1/4 $ 58,125 3000 84 $ 252,000 Query Tools & OLAP • Query Tools – user-lead discovery. Can return individual records or summaries. Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”). • OLAP - Online Analytical Processing – user-lead discovery. Data is explored via “drill down” into the data by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”). OLAP • Online Analytical Processing. (example of cross-tab results presented below) 1. business unit 2. product type 3. year Data Mining • automated information discovery process, uncovers important patterns in existing data – can use neural networks or other approaches. Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment. • e.g. “What are the characteristics that identify when we are likely to lose a customer?” Data Mining Uses • • • • • • Market Segmentation - e.g. Dayton Hudson Direct Marketing - e.g. Chase Market basket analysis - e.g. Wal-Mart Customer Churn - e.g. Fleet Bank Fraud Detection - e.g. Bank of America Cost Reduction Prospecting - e.g. Merk Medco. Stupid Data-Miner Tricks • Ad-Hoc Theories – when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes. • Using Too Many Variables – the more factors considered, the more likely a relationship will be found - valid or not. • Not Taking No for an Answer – it’s OK to stop looking if you can’t find anything. There are no silver bullets. MPIII Internal & External Integration Enterprise Resource Planning (ERP) Challenges Facing IS Depts. • Y2K & Legacy Systems • Globalization (euro, currency issues) • Rapid Technology Advancement – e.g. Client/Server & Internet • IS Staffing & Retention • Changing Organizational Structures – e.g. Owens Corning • Tighter Integration with Buyers & Suppliers Legacy Systems Many firms have limited to no integration across geographic areas functional areas (v-chain) products, plants, & business units Infrastructure: general mgmt, planning, finance, IS HRM: recruiting, hiring, training, and development Tech. Development: R&D Procurement Inbound Operations Outbound Marketing logistics logistics & Sales Suppliers Service Buyers External Integration • EDI - Electronic Data Interchange – uses standard formats to pass data between disparate systems – US format - X.12, European format - UN/EDIFACT • Cost Savings – paper order = $50 - $70 – EDI order = $2.50 (VANs / private networks) – I-EDI order = less than $1 (Internet) • XML - eXtensible Markup Language – tagging language for the web What is ERP? • ERP - Enterprise Resource Planning Software – sometimes called Enterprise Applications, Enterprise Packages, Enterprise Suites, or Enterprise Systems – connects all of the information which flows through a company to a single integrated set of systems – implemented in modules which can be integrated (all at once or at a later date) e.g. Financials, Logistics, HR – may work with a wide variety of databases, hardware, and operating systems • Leading Vendors – SAP, Oracle, JD Edwards, Baan, Peoplesoft ERP in Action Production Staffing Inventory Purchasing Order Tracking Source: BusinessWeek Int’l, 1997 Sales Planning The Benefits • Internal & external integration – squeeze out waste & enable strategies • Standard software enables – inter-organizational systems (easier if buyers & suppliers use the same system, e.g. petrochem. ind.) – broad selection of add-on packages (e.g. data warehouses, etc.) • Package upgrading and new technology development is handled by vendor • Speed of deployment The Risks • Staff retention (e.g. Grace case) • Tied to a single vendor • Flexibility limited by options offered by the vendor – may inappropriately force generic processes – may inappropriate force structure • Complexity - particularly regarding mapping and standardizing processes across the organization. Make vs. Buy Comp. Adv. Will the proposed system offer proprietary comp. adv.? Security Is the process or data highly confidential? IT Competency Is IT a core competency? Tech. Skill Does the firm have sufficient expertise with tech.? Suitability/Fit Is a suitable partner/package available? Cost/Benefit Is the package cheaper than inhouse dev.? Time Is there sufficient time to develop the system Adapted from Applegate et al., p. 61. Make Buy Yes No Yes No Yes Yes No No No Yes No Yes Yes No Successful Deployment of ERP • Business Case – benchmark, cost justify (e.g. unplug mainframes) • Leadership – from the highest levels (e.g. success at Owens Corning, failure at Westinghouse) • Staffing – largely from business, not IT (users know the process) – ‘compensation handcuffs’ (e.g. end of deployment bonuses, training payback agreements) – experienced consultants - check refs., clients • Execute with proven methodologies