Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
A REPORT ON DATAWAREHOUSING OF PROFIT ANANLYSIS ISAM 5332 Abstract Data warehousing has become very popular among organizations seeking to utilize information technology to gain a competitive advantage. Moreover, many vendors, having noticed this trend, have begun to manufacture various kinds of hardware, software, and tools to help data warehouses function more effectively. These are a number of applications that banks have implemented like evaluating business concentration and risk exposure leading to modified credit policies and loan loss reserves, consumer asset data, spot market trends, provide feedback to bankers regarding customer relationships and profitability etc. For our application we have done profit analysis of banking using the Data warehousing technique by building a specialized OLAP system for information on product profitability using snow flake schema. 1|Page Introduction A data warehouse is a central repository of information (current and historical data), collected from multiple sources and organized under a unified schema, which provides business intelligence to decision makers. The data warehouse is an information environment that provides an integrated and total view of an enterprise, and makes enterprise’s current and historical information easily accessible through Online Analytical A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users. There are many advantages of warehousing like it maintain data history, even if the source transaction systems do not, it integrates data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger. It also improves data quality, by providing consistent codes and descriptions, flagging or even fixing bad data. It presents the organization's information consistently, provides a single common data model for all data of interest regardless of the data's source. 2|Page Processing (OLAP) Tools OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations: consolidation, drilldown, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. . In contrast, the drill-down is a technique that allows users to navigate through the details. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the cube and view (dicing) the slices from different viewpoints. There are two types of OLAP tools Multidimensional OLAP(MOLAP) and Relational OLAP(ROLAP). In a Multidimensional OLAP a cube is aggregated from the relational data source (data warehouse). When user generates a report request, the MOLAP tool can generate the create quickly because all data is already pre-aggregated within the cube. In a relational OLAP instead of pre-aggregating everything into a cube, the ROLAP engine essentially acts as a smart SQL generator. The ROLAP tool typically comes with a 'Designer' piece, where the data warehouse administrator can specify the relationship between the relational tables, as well as how dimensions, attributes, and hierarchies map to the underlying database tables. Some of the popular OLAP tools include Business Objects, Cognos, Hyperion, Microsoft Analysis Services, MicroStrategy, Pentaho and Palo OLAP Server. 3|Page A Case Study: Datawarehousing in Profit Analysis Business Scenario: In competitive market it is vital for banks to understand how customers and market sectors contribute towards profitability, and to what extent relationships with particular customers are profitable. Banks have come a long way towards Customer Relationship Management in the past five years. Their credit card accounts were kept on one computer, checking accounts on another, and home mortgages on a third. A specialized OLAP system for information on product profitability is the business need. This data warehouse gives the information required for profit analysis in the following forms. Profit from a particular customer at any given point of time. Aggregate Profit from wholesale/retail customers at any given point of time. Profit for any given country’s customer. Profit for a particular product customer. 4|Page Why Data Warehousing? Faster more efficient decisions- provide timely access to vast amount of data available. Improve decision making through standardized reporting definitions. When the production databases are copied in the warehouse, it becomes easier to answer all the queries without hampering the consistency of the production system. With data warehousing, you can provide a common data model for different interest areas regardless of data's source. In this way, it becomes easier to report and analyze information. Many inconsistencies are identified and resolved before loading of information in data warehousing. This makes the reporting and analyzing process simpler. The best part of data warehousing is that the information is under the control of users, so that in case the system gets purged over time, information can be easily and safely stored for longer time period. Because of being different from operational systems, a data warehouse helps in retrieving data without slowing down the operational system. Data warehousing enhances the value of operational business applications and customer relationship management systems. Data warehousing also leads to proper functioning of support system applications like trend reports, exception reports and the actual performance analyzing reports. 5|Page Methodology Dimensional Modeling The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. 6|Page To build the schema, the following design model is used: 1. Choose the business process: The first step in the model is to describe the business process which the model builds on which in this case is profit analysis in banking. 2. Declare the Grain: After describing the Business Process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore the grain (sentence) is what you are going to build your dimensions and fact table from. For our project it is “ “ 3. Identify the dimensions: For the profit analysis of banking we have identified are Customer, Product and Time. 4. Identify the Fact: Profit is our Fact. 7|Page Implementation in SQL Server 2008: OLAP cubes can be thought of as extensions to the two-dimensional array of a spreadsheet. For example a company might wish to analyze some financial data by product, by time-period, by city, by type of revenue and cost, and by comparing actual data with a budget. These additional methods of analyzing the data are known as dimensions. Because there can be more than three dimensions in an OLAP system the term hypercube is sometimes used. The OLAP cube consists of numeric facts called measures which are categorized by dimensions. The cube metadata (structure) may be created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Constructing the cube is done in SQL Server Analysis Services. 8|Page Business Dimensions: The business dimensions in the model contain members that sales information that is specifically related to the requirements of our business, such as product, time, customer. 9|Page 10 | P a g e IDENTIFY THE FACTS After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc. For our application we have chosen the Profit table as a fact table and is shown below. Fact-Table: Fact Table consists of the measurements, metrics or facts of a business process. often It is located at the centre of a star schema or a snowflake schema, surrounded by dimension tables. 11 | P a g e Measures: A measure is a property on which calculations (e.g., sum, count, average, minimum, maximum) can be made using pre-computed aggregates. 12 | P a g e Star Schema The star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queriesQueries are written with simple inner joins between the facts and a small number of dimensions. Star joins are simpler than possible in snowflake schema. Where conditions need only to filter on the attributes desired, and aggregations are fast. To optimize user ease-of-use and retrieval performance by minimizing the number of tables to join to materialize a transaction. The primary benefit of a star schema is its simplicity for users to write, and databases to process: queries are written with simple inner joins between the facts and a small number of dimensions. Star joins are simpler than possible in snowflake schema. Where conditions need only to filter on the attributes desired, and aggregations are fast. The star schema is a way to implement multidimensional database (MDDB) functionality using a mainstream relational database: given most organizations' commitment to relational databases, a specialized multidimensional DBMS is likely to be both expensive and inconvenient. 13 | P a g e Star schema for profit analysis : 14 | P a g e Reports Profit for a particular bank during a course of an year/month/which week of the month/whether if it’s a holiday or weekday 15 | P a g e Profit for a particular bank during a course of an year/month/which week of the month/whether if it’s a holiday or weekday and whether if the customer is retail or wholesale customer. 16 | P a g e Profit based on location for a wholesale or retail customer 17 | P a g e Profit from retail or wholesale customers for a particular country for a year » quarter » month 18 | P a g e For BOFA for a year->month and for a particular country » state 19 | P a g e Conclusion: A data warehouse is the concept of data extracted from operational systems and made available as historical snapshots for ad-hoc queries and scheduled reporting. Characteristics that distinguish data in the data warehouse from data found in the operational environment are that it is: organized in such as way that relevant data is clustered together for easy access, several copies of the data from various points in time are kept together, and once the data is placed into the data warehouse it is not updated. Rather, the historical snapshots stored in the Data Warehouse are periodically refreshed with data from the operational databases. There are many advantages of warehousing such as it improves end-user access to a wide variety of data, increases data consistency, potentially lower computing costs and increased productivity, provides a place to combine related data from separate sources. It can create a computing infrastructure that can support changes in computer systems and business structures. It empowers end-users to perform any level of ad-hoc queries or reports without impacting the performance of the operational systems. We have faced several challenges and getting to the final reporting stage was really a challenging task for us. For our profit analysis datawarehouse we have generated reports for which gives the profit for a particular bank during a course of an year/month/which week of the month/whether if it’s a holiday or weekday. The profit for a particular bank during a course of an year/month/which week of the month/whether if it’s a holiday or weekday and whether if the customer is retail or wholesale customer. Profit based on location for a wholesale or retail customer and profit from retail or wholesale customers for a particular country for a year to quarter to month. For a particular bank for a year to month and for a particular country to state. Using these reports we can analyze the profits based on any specific categories using any given field as shown in the screen shots above. 20 | P a g e