Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
I ManTech Advanced Systems International, Inc. Data Warehousing: Jeffrey T. Edgell Assistant Executive Director ManTech Advanced Systems International Decision Support In order to make correct decisions, accurate, meaningful information about business environments, external issues, and internal workings must be available in a timely fashion. A Need For New Technology • Government and industrial entities have been collecting data in electronic format since the 1960s. • Today, organizations collect millions of pieces of information about every aspect of their operation on a daily basis. • Data is obtained from multiple disparate sources. A Need For New Technology • Often information is replicated, leading to confusion. • Related data is often retained in seemingly heterogeneous and incompatible platforms. • Common data attributes are represented in nonstandard formats and naming constructs across systems. A Need for New Technology • Most systems are built for data collection (transaction based). • Designed to support On-Line Transaction Processing (OLTP). • Designed to support day-to-day business operations. • Very specific applications built to support interaction with the data. A Need for New Technology • Perform best when handling small specific volumes of data. • Does not accept information from dissimilar sources readily. • Are not constructed to handle analysis of large amounts of data efficiently. A Need for New Technology • Capable of answering questions of a specific nature and time frame. – How many items do I have in stock today? – How many tickets were sold on a specific date? – What is the current price of an item? A Need for New Technology • Transaction based systems experience great difficulty in answering analytical and decision support questions. • Analysis takes a long time, interfering with: – transaction performance – daily operations • The nature of the data is dynamic and dispersed. A Need for New Technology Most organizations have created a “spider web” of systems and data sources. A Need for New Technology All of this has created “data overload” and “data confusion”. – – – – What do I do with all of this data? What does it mean? Do I really need this data? I am overwhelmed with the amount of data I am confronted with. – I cannot make a timely decision (too much data from too many sources). Data Warehousing 101 Data warehousing is: A large historical database designed to accept key analytical data from multiple and disparate sources that manage the day-to-day management of enterprise data. Furthermore, the role of the warehouse is to transform transaction data into corporate information. The warehouse is provided in a read-only fashion to a user. Data Warehousing 101 A data warehouse will provide: The ability to ask business analysis questions in a real-time, iterative fashion, obtaining decision support information readily and quickly. Data Warehousing 101 • A data warehouse is not: A repository for all corporate data. • A data warehouse will not: Single handedly solve all of the problems associated to an enterprise. Data Warehousing 101 Key components include the following: – data model – data storage architecture (relational, proprietary) – data access/replication/transport – data transformation and scrubbing – staging and publication – metadata – warehouse hardware and software Data Warehousing 101 Data Warehouse Architecture Information Directory Repository Legacy Data Data Transformation Data Warehouse Data Warehouse Management Layer Legacy Data External Data Source Data Warehousing 101 How a warehouse deals with aging data: 1993 Qtrly Data 1994 Qtrly Data 1995 Qtrly Data 1996 Month Data 1997 Month Data Aggregation 1998 Detail Data Data Warehousing 101 Model concepts: – Fact table(s) • A table containing multiple measurable descriptors relating to a specific area of business • Each fact can be viewed, calculated, and aggregated against various defining areas of the business (time, geography, customer) Data Warehousing 101 Model concepts: – Dimension Table(s) • Retains information (product description, geography description, customer description) that is descriptive and remains moderately constant over time Data Warehousing 101 Data Warehouse Modeling – Special modeling techniques must be applied to provide rapid response of queries on large volumes of data. – OLTP systems are built with update operations in mind, resulting in normalization and greatly reduced browse performance. Data Warehousing 101 Common data model techniques are as follows: – – – – star schema snowflake fact constellation relational Data Warehousing 101 Sample Star Schema Model TIME Dimensions STORE GEOGRAPHY SALES Sales Facts Dimensions CUSTOMER Data Warehousing 101 Year North Sample Snowflake Model Qtr South Month TIME GEOGRAPHY East West Dimensions SALES Dimensions East Region STORE West Region Sales Facts CUSTOMER Data Warehousing 101 Sample Fact Constellation Model TIME Dimensions Regional Sales GEOGRAPHY District Sales Dimensions Store Sales STORE CUSTOMER Data Marting 101 Data marting is: A functional segment of an enterprise restricted for purposes of security, locality, performance, or business necessity using modeling and information delivery techniques identical to data warehousing. Data Marting 101 Why build a data mart? – Allows an organization to visualize the large but focus on the small and attainable. – Provides a platform for rapid delivery of an operational system. – Minimizes risk. – A corporate warehouse can be constructed from the union of the enterprise data marts. Data Marting 101 Data From Transaction Sources The data warehouse populates the data marts. Financial Data Mart Data Warehouse Update From the Warehouse Logistics Data Mart Contract Data Mart Data Marting 101 The data marts populate the data warehouse. Data Warehouse Financial Data Mart Logistics Data Mart Update From the Data Marts Contract Data Mart Data From Transaction Sources Data Marting 101 Virtual Data Warehouse Data is moved through the abstract layer on demand. Abstract Data Warehouse Access Layer The data warehouse layer manages the data marts Financial as a warehouse. Data Mart Logistics Data Mart Contract Data Mart Data From Transaction Sources OLAP 101 • OLAP is a powerful graphics-oriented tool used to access the data warehouse • OLAP supports – – – – – Business analysis queries Data visualization Trend analysis Scenario analysis User defined queries OLAP 101 • Drill Down – Move from summary to detail • Roll Up – Move from detail to summary • Slice and Dice – Look at a specific interest of the business OLAP 101 • Pivot and Rotate – Looking at data from varying perspectives • Drill Through – Move to a near transaction level of detail OLAP 101 • The flavors of OLAP – Multidimensional On-Line Analytical Processing (MOLAP) – Relational On-Line Analytical Processing (ROLAP) – Hybrid On-Line Analytical Processing (HOLAP) OLAP 101 • MOLAP – – – – Produces a hypercube Pre-aggregated and pre-calculated Rapid response times Limited in the amount of data that can be managed OLAP 101 • ROLAP – – – – Data remains in a relational format Some degree of aggregation Slower response times Scales to large amounts of data OLAP 101 • HOLAP – Can manage data both as ROLAP and MOLAP – Currently evolving – MOLAP vendors are finding it easier to move into the HOLAP market space Data Mining 101 As defined by the Gartner Group in 1995, data mining is: “…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in a repository, using pattern recognition technologies and statistical and mathematical techniques.” Data Mining 101 • Data mining requires an analyst who is familiar with the domain to appropriately model scenarios. • Data mining assists analysts in uncovering nontrivial data relationships. • Analysis must be conducted to determine the meanings of these newly identified relationships. Why Use a Data Warehouse ? • Data warehousing is a must for anyone who uses multiple data sources to make decisions and understand business (trends, forecasting). • Those who do not move to warehousing will not be capable of responding to problems and business conditions, thus falling behind the competition. Why Use a Data Warehouse ? • For organizations wanting to minimize costs and maximize productivity, warehousing is a must. • Individuals who spend time gathering data instead of analyzing data require the assistance of a warehouse. • Organizations that collect data but have difficulty determining meanings and impacts need a data warehouse. Making the Warehouse a Reality • • • • Think big but work small. Match technology to requirements. Build for the future (scalability). Work closely with the users. – Requirements – Rapid Application Development (RAD) – Periodic releases to the user community Real World Success Stories • Radio Shack – Sales and stocking analysis – Marketing (regionalized mailings) • Wal-Mart – Sales and stock analysis – Trend analysis – Vendor analysis Real World Success Stories • Naval Surface Warfare Center (NSWC) – Procurement – Supply – Workload • Harris Semiconductor – Yield – Product – Personnel productivity Real World Success Stories • Defense Logistics Agency (DLA)/ManTech – – – – Trend analysis Problem identification Procurement support Enterprise data analysis A Few Observations About Data Warehouses • Industry and our experience indicate that: – Warehouses that succeed average an ROI of 400% with the top end being as much as 600% in the first year. – The incremental approach is most successful (build the warehouse a functional area at a time). – The average time to gather requirements, perform a design, and deploy a warehouse increment is six months. – New tools may be required that differ from the transaction environment. • Software oriented toward intelligent analysis and query of the data warehouse • Hardware oriented to support the massive storage requirements and analytical queries Keys to Success • Do you understand why you are building the warehouse? • Have you identified both technical and business professionals that you will need to build the warehouse? • Do you have a strong management sponsor? • Are you managing the expectations of the users? Careers in Data Warehousing • • • • • • System Administration DW Architect Data Architect DW Manager DW Administrator Decision Support Analysts • DBA • Application Developer • Data Cleansing/ Transformation Analyst • Business Analyst • Management A Real Architecture Data Movement Using MQSeries DISA SAM MS WSDB WSDB DSD DSD DISC HP V2500 Other C&T HP V2500 WSDB Other WSDB WSDB DSD DISC Users Other DSD DSCC HP V2500 Other DSCR HP V2500 DSD Medical HP V2500 C & T Users T1 T1 Other T1 DSCR Users DSCC Users T1 M edical Users DLSC T1 MPP or SMP Cluster FLIS NIPRNET Data Warehouse Other Data Source All Key DLSC Analytical Data Future Architecture Internet Other Users