Download Reports

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
A REPORT
ON
DATAWAREHOUSING
OF
PROFIT ANANLYSIS
ISAM 5332
Abstract
Data warehousing has become very popular among organizations seeking to utilize
information technology to gain a competitive advantage. Moreover, many vendors,
having noticed this trend, have begun to manufacture various kinds of hardware,
software, and tools to help data warehouses function more effectively. These are a
number of applications that banks have implemented like evaluating business
concentration and risk exposure leading to modified credit policies and loan loss
reserves, consumer asset data, spot market trends, provide feedback to bankers
regarding customer relationships and profitability etc. For our application we have
done profit analysis of banking using the Data warehousing technique by building a
specialized OLAP system for information on product profitability using snow flake
schema.
1|Page
Introduction
A data warehouse is a central repository of information (current and historical
data), collected from multiple sources and organized under a unified schema, which
provides business intelligence to decision makers. The data warehouse is an information
environment that provides an integrated and total view of an enterprise, and makes
enterprise’s current and historical information easily accessible through Online
Analytical A data warehouse maintains its functions in three layers: staging, integration,
and access. Staging is used to store raw data for use by developers. The integration layer
is used to integrate data and to have a level of abstraction from users. The access layer is
for getting data out for users. There are many advantages of warehousing like it maintain
data history, even if the source transaction systems do not, it integrates data from multiple
source systems, enabling a central view across the enterprise. This benefit is always
valuable, but particularly so when the organization has grown by merger. It also improves
data quality, by providing consistent codes and descriptions, flagging or even fixing bad
data. It presents the organization's information consistently, provides a single common
data model for all data of interest regardless of the data's source.
2|Page
Processing (OLAP) Tools
OLAP tools enable users to interactively analyze multidimensional data from multiple
perspectives. OLAP consists of three basic analytical operations: consolidation, drilldown, and slicing and dicing. Consolidation involves the aggregation of data that can be
accumulated and computed in one or more dimensions. . In contrast, the drill-down is a
technique that allows users to navigate through the details. Slicing and dicing is a feature
whereby users can take out (slicing) a specific set of data of the cube and view (dicing)
the slices from different viewpoints. There are two types of OLAP tools
Multidimensional
OLAP(MOLAP)
and
Relational
OLAP(ROLAP).
In
a
Multidimensional OLAP a cube is aggregated from the relational data source (data
warehouse). When user generates a report request, the MOLAP tool can generate the
create quickly because all data is already pre-aggregated within the cube. In a relational
OLAP instead of pre-aggregating everything into a cube, the ROLAP engine essentially
acts as a smart SQL generator. The ROLAP tool typically comes with a 'Designer' piece,
where the data warehouse administrator can specify the relationship between the
relational tables, as well as how dimensions, attributes, and hierarchies map to the
underlying database tables. Some of the popular OLAP tools include Business Objects,
Cognos, Hyperion, Microsoft Analysis Services, MicroStrategy, Pentaho and Palo OLAP
Server.
3|Page
A Case Study: Datawarehousing in Profit Analysis
Business Scenario:
In competitive market it is vital for banks to understand how customers and market
sectors contribute towards profitability, and to what extent relationships with particular
customers are profitable. Banks have come a long way towards Customer Relationship
Management in the past five years. Their credit card accounts were kept on one
computer, checking accounts on another, and home mortgages on a third. A specialized
OLAP system for information on product profitability is the business need. This data
warehouse gives the information required for profit analysis in the following forms.
 Profit from a particular customer at any given point of time.
 Aggregate Profit from wholesale/retail customers at any given point of time.
 Profit for any given country’s customer.
 Profit for a particular product customer.
4|Page
Why Data Warehousing?
Faster more efficient decisions- provide timely access to vast amount of data
available. Improve decision making through standardized reporting definitions. When the
production databases are copied in the warehouse, it becomes easier to answer all the
queries without hampering the consistency of the production system. With data
warehousing, you can provide a common data model for different interest areas
regardless of data's source. In this way, it becomes easier to report and analyze
information. Many inconsistencies are identified and resolved before loading of
information in data warehousing. This makes the reporting and analyzing process
simpler. The best part of data warehousing is that the information is under the control of
users, so that in case the system gets purged over time, information can be easily and
safely stored for longer time period. Because of being different from operational systems,
a data warehouse helps in retrieving data without slowing down the operational system.
Data warehousing enhances the value of operational business applications and customer
relationship management systems. Data warehousing also leads to proper functioning of
support system applications like trend reports, exception reports and the actual
performance analyzing reports.
5|Page
Methodology
Dimensional Modeling The dimensional model is built on a star-like schema, with
dimensions surrounding the fact table.
6|Page
To build the schema, the following design model is used:
1. Choose the business process: The first step in the model is to describe the business
process which the model builds on which in this case is profit analysis in banking.
2. Declare the Grain: After describing the Business Process, the next step in the design
is to declare the grain of the model. The grain of the model is the exact description of
what the dimensional model should be focusing on. This could for instance be “An
individual line item on a customer slip from a retail store”. To clarify what the grain
means, you should pick the central process and describe it with one sentence.
Furthermore the grain (sentence) is what you are going to build your dimensions and
fact table from. For our project it is “ “
3. Identify the dimensions: For the profit analysis of banking we have identified are
Customer, Product and Time.
4. Identify the Fact: Profit is our Fact.
7|Page
Implementation in SQL Server 2008:
OLAP cubes can be thought of as extensions to the two-dimensional array of
a spreadsheet. For example a company might wish to analyze some financial data by
product, by time-period, by city, by type of revenue and cost, and by comparing actual
data with a budget. These additional methods of analyzing the data are known as
dimensions. Because there can be more than three dimensions in an OLAP system the
term hypercube is sometimes used. The OLAP cube consists of numeric facts
called measures which are categorized by dimensions. The cube metadata (structure) may
be created from a star schema or snowflake schema of tables in a relational database.
Measures are derived from the records in the fact table and dimensions are derived from
the dimension tables. Constructing the cube is done in SQL Server Analysis Services.
8|Page
Business Dimensions:
The business dimensions in the model contain members that sales information that is specifically
related to the requirements of our business, such as product, time, customer.
9|Page
10 | P a g e
IDENTIFY THE FACTS
After defining the dimensions, the next step in the process is to make keys for the fact
table. This step is to identify the numeric facts that will populate each fact table row. This
step is closely related to the business users of the system, since this is where they get
access to data stored in the data warehouse. Therefore most of the fact table rows are
numerical, additive figures such as quantity or cost per unit, etc. For our application we
have chosen the Profit table as a fact table and is shown below.
Fact-Table:
Fact Table consists of the measurements, metrics or facts of a business process.
often
It is
located at the centre of a star schema or a snowflake schema, surrounded by
dimension tables.
11 | P a g e
Measures:
A measure is a property on which calculations (e.g., sum, count, average, minimum,
maximum) can be made using pre-computed aggregates.
12 | P a g e
Star Schema
The star schema (also called star-join schema, data cube, or multi-dimensional schema) is
the simplest style of data warehouse schema. The star schema consists of one or more
fact tables referencing any number of dimension tables. The star schema is an important
special case of the snowflake schema, and is more effective for handling simpler
queriesQueries are written with simple inner joins between the facts and a small number
of dimensions. Star joins are simpler than possible in snowflake schema. Where
conditions need only to filter on the attributes desired, and aggregations are fast. To
optimize user ease-of-use and retrieval performance by minimizing the number of tables
to join to materialize a transaction.
The primary benefit of a star schema is its simplicity for users to write, and databases to
process: queries are written with simple inner joins between the facts and a small number
of dimensions. Star joins are simpler than possible in snowflake schema. Where
conditions need only to filter on the attributes desired, and aggregations are fast.
The star schema is a way to implement multidimensional database (MDDB) functionality
using a mainstream relational database: given most organizations' commitment to
relational databases, a specialized multidimensional DBMS is likely to be both expensive
and inconvenient.
13 | P a g e
Star schema for profit analysis :
14 | P a g e
Reports
Profit for a particular bank during a course of an year/month/which week of the
month/whether if it’s a holiday or weekday
15 | P a g e
Profit for a particular bank during a course of an year/month/which week of the
month/whether if it’s a holiday or weekday and whether if the customer is retail or
wholesale customer.
16 | P a g e
Profit based on location for a wholesale or retail customer
17 | P a g e
Profit from retail or wholesale customers for a particular country for a year » quarter »
month
18 | P a g e
For BOFA for a year->month and for a particular country » state
19 | P a g e
Conclusion:
A data warehouse is the concept of data extracted from operational systems and made
available as historical snapshots for ad-hoc queries and scheduled reporting.
Characteristics that distinguish data in the data warehouse from data found in the
operational environment are that it is: organized in such as way that relevant data is
clustered together for easy access, several copies of the data from various points in time
are kept together, and once the data is placed into the data warehouse it is not updated.
Rather, the historical snapshots stored in the Data Warehouse are periodically refreshed
with data from the operational databases. There are many advantages of warehousing
such as it improves end-user access to a wide variety of data, increases data consistency,
potentially lower computing costs and increased productivity, provides a place to
combine related data from separate sources. It can create a computing infrastructure that
can support changes in computer systems and business structures. It empowers end-users
to perform any level of ad-hoc queries or reports without impacting the performance of
the operational systems. We have faced several challenges and getting to the final
reporting stage was really a challenging task for us.
For our profit analysis datawarehouse we have generated reports for which gives the
profit for a particular bank during a course of an year/month/which week of the
month/whether if it’s a holiday or weekday. The profit for a particular bank during a
course of an year/month/which week of the month/whether if it’s a holiday or weekday
and whether if the customer is retail or wholesale customer. Profit based on location for a
wholesale or retail customer and profit from retail or wholesale customers for a particular
country for a year to quarter to month. For a particular bank for a year to month and for a
particular country to state. Using these reports we can analyze the profits based on any
specific categories using any given field as shown in the screen shots above.
20 | P a g e