UNIVERSITY OF HOUSTON-CLEAR LAKE
Benchmark Financial Brokers
A Case Study on Building a Data Warehouse
Prepared by:
Terry Lee
Tran Ngo
Sandeep Udhani
Navin Negi
Prepared for:
Dr. Rob
ISAM 5332
5/5/2010
Abstract
The purpose of this paper is to discuss how we
approached building a data warehouse for
Benchmark Financial Brokers (Benchmark),
starting with taking the wrong approach and
ending up with a functioning project.
Introduction
Benchmark is involved in the securities trading
business and as such has to maintain multiple
transactional systems. These systems are
adequate to run the day-to-day operations of
the business; however, they do not allow for
timely access to strategic information. This is
information that will help the owner and
managers maintain the long-term health of the
organization. The purpose of a data warehouse
is to collect desired data from various
transactional systems, convert it into a common
format, cleanse it, and allow it to be queried for
useful strategic information in the future. While
the concept of a data warehouse sounds quite
simple, in reality it is anything but. Data must be continually collected,
transformed, and cleansed from each of the
different transactional systems in order to keep
the warehouse properly maintained.
1. Business Scenario
Benchmark started with a single location in
Texas and has now expanded to forty-five
locations throughout Texas, Louisiana, and
Florida. The business owner plans on expanding
to include more locations throughout the
country. The problem that the owner and the
managers of this business face is that with their
continual expansion it is becoming increasingly
difficult for them to conduct timely visits to
each of the locations in order to monitor
employee performance, track revenue, and
ensure sound financial advice is being offered to
the customer. Benchmark has operational
systems that allow the advisors to monitor the
constantly fluctuating stock markets, as well as
transactional systems that allow them to buy or
sell various financial instruments for their
customers. These various systems present a
challenge when it comes to generating useful
reports that provide the information that the
owner is looking for. The owner is convinced
that if he can find a way to extract the data that
he needs from each of these systems he will be
able to maintain greater control over his
business.
2. Why a Data Warehouse?
Benchmark’s owner needs better strategic
information about his business. This is what a
data warehouse is perfect for. The concept of a
data warehouse is to be able to collect data
from multiple operational and transactional
systems, combine them in some manner, store
them in a central location, and provide Online
Analytical Processing abilities in the future. This
is exactly what the owner of Benchmark has
been looking for. He needs the ability to collect
and query this data in order to obtain reports
that will provide him with the strategic
information that he needs. This can include
reports such as revenue for each location by
state, revenue by investment type, customers
who have money in high-risk investments, and
employees who may be purposely selling these
investments. The ability to query this kind of
system online from his office is the perfect
solution to his problem.
3. Methodology
Beginning this project as a group, we did not
have a very clear picture of how we needed to
go about designing a data warehouse. We
began with designing what was essentially a
transactional database. One of our primary
concerns when we started the process was how
to track the price of the securities that are being
sold, as the stock market is constantly
fluctuating. Even though it made complete
sense to us, our initial approach proved to be
wrong. A data warehouse is not meant to be a
transactional database; it is a database that
warehouses historical data about the
organization. This is important because the
stored data will be used strictly for designing
the reports that management needs to help
them make strategic decisions about the future.
Approaching the project a second time we
realized that instead of trying to figure out how
to collect the data, we needed to figure out
how to present the data that we already had.
When designing the data warehouse, probably
the best method to employ is to approach it
from a reporting aspect. Approaching the
project this way means that you must be
thinking about the information that is
ultimately desired by the end user. Once we
realized that we needed to look at the project
from this aspect, it led us to completely rethink
our strategy, and led us to completely redesign
our dimensional models.
4. Dimensional Modeling and defining data
structure
Dimensional modeling incorporates the
business dimensions into the logical data
model. (Ponniah 206) After defining
requirements from the business needs, data
structures are designed within the logical
model. When defining requirements, users may not be able to precisely describe what they want in a data warehouse, but they can provide very important insights into how they think about the business, as well as tell us which measurement units are important to them.
Managers think of the business in terms of how
they want to measure it. These measurements
are the facts that indicate to the users how
their departments are doing in fulfilling their
objectives. We can say business metrics or facts
are what managers want to analyze in order to
know the current situation of their business.
These are important when making decisions
that will affect the future of their organization.
When designing dimensions for a data
warehouse, it is extremely important to pay attention to the hierarchies that are contained
within each of the dimensions. These
dimensional hierarchies are the various levels of
detail contained within a business dimension.
Managers can use the dimensional hierarchies
as the paths for drilling down or rolling up in
analysis. Dimensional modeling is the technique
used in designing a data warehouse. Many
software vendors have expanded their
modeling CASE tools to include dimensional
modeling. Modern software is very useful when
designing fact tables, dimension tables, and
establishing the relationships between them.
When you have finished modeling the
dimensions and establishing the relationships,
you end up with a database schema. The two
types of schemas that are generally used in a
data warehouse are the STAR schema and the
snowflake schema. The STAR schema is a simple
database schema for data design using a
dimensional model. This schema consists of a
fact table in the center that is directly related to
the dimension tables that surround it. Although
the STAR schema is a relational model, it is not
a normalized model. The snowflake method
normalizes the dimension tables in a STAR
schema.
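To make the distinction concrete, the short T-SQL sketch below shows an investment dimension first in star form, with its risk class kept inline, and then in snowflake form, with the risk class normalized into its own table. The table and column names are illustrative assumptions, not objects taken from Benchmark's systems.

-- Illustrative sketch only: the table and column names below are assumptions,
-- not objects from Benchmark's actual systems.

-- Star-style dimension: every attribute, including the risk class label,
-- lives in one denormalized table that joins directly to the fact table.
CREATE TABLE Dim_Investment_Star (
    InvestmentKey   INT PRIMARY KEY,
    InvestmentName  VARCHAR(100),
    InvestmentType  VARCHAR(50),
    RiskClassName   VARCHAR(50)   -- repeated for every investment in the class
);

-- Snowflake-style equivalent: the risk class is normalized into its own
-- table, so reaching it from the fact table costs one extra join.
CREATE TABLE Dim_RiskClass (
    RiskClassKey    INT PRIMARY KEY,
    RiskClassName   VARCHAR(50)
);

CREATE TABLE Dim_Investment_Snowflake (
    InvestmentKey   INT PRIMARY KEY,
    InvestmentName  VARCHAR(100),
    InvestmentType  VARCHAR(50),
    RiskClassKey    INT REFERENCES Dim_RiskClass (RiskClassKey)
);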
As mentioned earlier, it is very
important to employ the correct approach
when designing the schema for a data
warehouse. This is where we made our first
mistake. As a result of this, we have two
different database schemas. The first schema
that we designed was a complex snowflake
schema which is depicted in figure 1.1.
Figure 1.1
When the dimensions in a STAR schema are completely normalized, the resulting structure resembles a snowflake with the fact table in the middle. In the case of Benchmark, the most important fact to analyze is the sales commission, which is the revenue for the company. The transaction table is the fact table, which contains commission as an attribute. Transactions are analyzed based on dimensions such as Customer, Employee, Investment, Time, Date, and the Commission collected. At first, we approached this project from the wrong direction, and because of that we normalized the tables in our schema. This was done because we were thinking in terms of a transactional system where we would need to track changes in the price of stocks, bonds, and other marketable securities. We stored detailed information about employees such as position and salary. After we realized that we weren't looking at the project correctly, we recognized that this information was not necessary for the data warehouse. Although a data warehouse is based on a relational database like one that would be used in an operational system, what we were building was a decision-support system. The important component of the entire project is the ability to track and analyze the commission. We do not need to worry about the price of investments or employee salary. In order to correct our error, we redesigned our database schema and arrived at a Star schema, which is depicted in figure 1.2. In this schema, we have Transaction as the fact table with four dimension tables: Employee, Customer, Investment, and Date_Time. We kept these dimensions because we needed to analyze commission by state, by employee, by customer, by investment type, and by investment risk class.
Figure 1.2
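A minimal T-SQL sketch of this Star schema could look like the following. The table names follow the dimensions in figure 1.2; the individual columns are illustrative assumptions rather than our exact warehouse definitions.

-- Sketch of the Star schema in figure 1.2. Table names follow the dimensions
-- named above; the column lists are illustrative assumptions.
CREATE TABLE Dim_Employee (
    EmployeeKey    INT PRIMARY KEY,
    EmployeeName   VARCHAR(100),
    State          VARCHAR(2),      -- supports commission-by-state analysis
    ZipCode        VARCHAR(10)
);

CREATE TABLE Dim_Customer (
    CustomerKey    INT PRIMARY KEY,
    CustomerName   VARCHAR(100),
    State          VARCHAR(2)
);

CREATE TABLE Dim_Investment (
    InvestmentKey  INT PRIMARY KEY,
    InvestmentName VARCHAR(100),
    InvestmentType VARCHAR(50),
    RiskClass      VARCHAR(50)      -- supports analysis by investment risk class
);

CREATE TABLE Dim_Date_Time (
    DateTimeKey    INT PRIMARY KEY,
    FullDate       DATE,
    [Year]         INT,
    [Quarter]      INT,
    [Month]        INT
);

-- Fact table: one row per transaction, with commission as the measure.
CREATE TABLE [Transaction] (
    TransactionKey INT PRIMARY KEY,
    EmployeeKey    INT REFERENCES Dim_Employee (EmployeeKey),
    CustomerKey    INT REFERENCES Dim_Customer (CustomerKey),
    InvestmentKey  INT REFERENCES Dim_Investment (InvestmentKey),
    DateTimeKey    INT REFERENCES Dim_Date_Time (DateTimeKey),
    Commission     DECIMAL(12, 2)
);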
From our new Star schema, we were able to
define the data format that we need for the
warehouse. We defined table name, attribute
name, data type of each attribute in each table,
and relationship among tables. From
Benchmark’s operational system, we extracted
data into an Excel file that is depicted in figure
1.3. Many of the fields that were extracted are
necessary for an operational system, but were
not needed in the data warehouse. This is
where cleansing data becomes important. In
order to keep the warehouse efficient, we used
Excel to remove the extraneous data before it
was imported. After cleansing the data, we had
attributes that were important to the structure
of our system. An example of the cleansed data
is found in figure 1.4. When we were satisfied
that our data was formatted and cleansed
correctly, we moved to our next step which was
to implement our database schema in Microsoft
Access. This schema is displayed in figure 1.5.
Figure 1.3
Figure 1.4
We used Access to define our relationships and
make sure that the system functioned before
importing the database into SQL Server 2008.
Figure 1.5
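We performed the cleansing in Excel before the import, but the same idea can be sketched as a hypothetical staging step in T-SQL: land the raw extract in a staging table, then carry only the needed columns into the fact table from the earlier sketch. The staging table and its extra columns are assumptions for illustration.

-- Hypothetical staging table for the raw extract; the extra operational
-- columns are assumptions standing in for the fields we removed in Excel.
CREATE TABLE Staging_Transaction_Raw (
    TransactionKey INT,
    EmployeeKey    INT,
    CustomerKey    INT,
    InvestmentKey  INT,
    DateTimeKey    INT,
    Commission     DECIMAL(12, 2),
    TickerPrice    DECIMAL(12, 2),  -- operational detail, not needed
    EmployeeSalary DECIMAL(12, 2)   -- operational detail, not needed
);

-- Carry only the cleansed columns into the Transaction fact table sketched
-- earlier; the extraneous operational fields are simply left behind.
INSERT INTO [Transaction]
    (TransactionKey, EmployeeKey, CustomerKey, InvestmentKey, DateTimeKey, Commission)
SELECT TransactionKey, EmployeeKey, CustomerKey, InvestmentKey, DateTimeKey, Commission
FROM Staging_Transaction_Raw
WHERE Commission IS NOT NULL;        -- drop rows with a missing measure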
5. Implementation in SQL Server 2008
In order to browse the data that is contained
within the data warehouse, you must design a
data structure called a cube. Constructing the
cube is done in SQL Server Analysis Services. A
cube is composed of the fact table and all of the
data that is directly related to it. The cube
organizes the data into a format that can be
easily queried, rolled up, drilled down, and
sliced and diced based on the measures and
hierarchies that are applicable to your particular
data set. Importing a database from Access to
SQL Server is supposed to be an easy process,
but trust us it is anything but. Trying to figure
out how to get the program to accept your data
and process it turned out to be one of the
biggest challenges of the whole project.
According to Scott Cameron’s SQL Server 2008
Analysis Services: Step by Step, if you have an
existing relational database such as Access,
Teradata, Oracle, IBM DB2, as well as some
others, you should be able to select the
appropriate driver and connect to your data
source without any difficulties (Cameron 39). If
only this were true. Due to not having sufficient
security clearance to upload our database onto
the University of Houston-Clear Lake (UHCL)
server, we decided to use a personal laptop to
run the software. Operating system
compatibility was one of the first problems that
we encountered. The solution to this problem
was to download the applicable service pack
from Microsoft Update. Once the software was
installed, we attempted to import the database from Access. The first attempt did not succeed; the next attempt resulted in being able to import the data, but this time we could not build or deploy our cube. Do not get frustrated when you
encounter this problem. We have chosen not to
outline the steps that we took to get the
program to function correctly as they will be
different for each application. Once we had the
database imported and functioning properly,
we commenced building our cube. The ability to
roll up, and drill down your data is based on the
hierarchies that are defined within your
dimensions. This very important step is
depicted in figure 1.6.
Figure 1.6
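As a rough relational analogue of what these hierarchies enable, and assuming the illustrative schema sketched in section 4, a T-SQL query with GROUP BY ROLLUP produces the same kind of year, quarter, and month subtotals that rolling up the Date_Time hierarchy gives in the cube; the cube browser itself works through Analysis Services rather than T-SQL.

-- Relational analogue of rolling commission up a Year > Quarter > Month
-- hierarchy; column names come from the illustrative schema in section 4.
SELECT
    d.[Year],
    d.[Quarter],
    d.[Month],
    SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Date_Time  AS d ON d.DateTimeKey = t.DateTimeKey
GROUP BY ROLLUP (d.[Year], d.[Quarter], d.[Month])
ORDER BY d.[Year], d.[Quarter], d.[Month];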
Without defining the hierarchies in each of your
dimensions, you will not have access to all of
the data. When a manager is looking for
information, he may want a very high level of
granularity, or a very low level of granularity.
These types of details are very important when
deciding how to define the dimensions that are
contained within your data warehouse. When
all of the hierarchies are defined, you must set
up the relationships that are contained within
the dimensions. An example of these
relationships can be seen in figure 1.7.
Figure 1.7
When all of the hierarchies and relationships are set up, the cube can be launched. A fully implemented cube will look something like figure 1.8.
Figure 1.8
6. Browsing the Cube
Keeping in mind that the ultimate goal of the data warehouse is to provide strategic information to managers and business owners, it is now time to browse the cube that you have created. This is the process where you are actually designing the queries that will provide the reports the end user is looking for. The Benchmark project is concerned primarily with the commission that is collected from each transaction. In order to get a picture of the business as a whole, it is more reasonable to query the data for commission from a particular region, or in our case, by each state. In figure 1.9, we have shown commission by state as it is presented in the cube browser.
Figure 1.9
This is a very high-level view of the data. If you were to
add all of the hierarchies that are available to
this query, you can drill the data down to
provide commission for each employee in each
zip code, as it relates to each type and name of
investment. This is shown in figure 1.10.
Figure 1.10
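Assuming the illustrative schema from section 4, the two views in figures 1.9 and 1.10 can be sketched as the following relational queries; the cube browser issues its own queries against Analysis Services, so these are analogues rather than the exact statements behind the figures.

-- High-level view (figure 1.9): total commission by state.
SELECT e.State, SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Employee   AS e ON e.EmployeeKey = t.EmployeeKey
GROUP BY e.State;

-- Drilled-down view (figure 1.10): commission by employee and zip code,
-- broken out by investment type and name.
SELECT
    e.State,
    e.ZipCode,
    e.EmployeeName,
    i.InvestmentType,
    i.InvestmentName,
    SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Employee   AS e ON e.EmployeeKey   = t.EmployeeKey
JOIN Dim_Investment AS i ON i.InvestmentKey = t.InvestmentKey
GROUP BY e.State, e.ZipCode, e.EmployeeName, i.InvestmentType, i.InvestmentName
ORDER BY e.State, e.ZipCode, e.EmployeeName;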
7. Generating useful reports
Being able to browse the cube and design
queries is a very powerful and useful tool.
Unfortunately, to the end user of the system,
some of these queries are almost unreadable
within the cube browser. Remember that the
final result of this project is to provide strategic
information that will be useful to management
in making decisions that will affect the future
health of the organization. These reports are
not going to be provided to a member of the IT
staff who would be comfortable viewing the
format in the browser. Management will want a
report that can be read and interpreted easily.
Providing these kinds of reports is easily done
once you have a functional cube. The cube that
was initially created within Analysis Services can
also be accessed with Reporting Services, which is
another very powerful tool that is included in
SQL Server 2008. By creating a Reporting Services
project, we were able to generate reports from
the Benchmark warehouse that will be useful to
the owner and management.
The same
information that is depicted in figure 1.9 is
again displayed in figure 1.11 in a much easier-to-read format.
Figure 1.11
Another report that the Benchmark management wanted was Customers with High Risk Investments, figure 1.12, which would allow them to find customers who have money in an investment that is now considered to be high risk.
Figure 1.12
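Under the same illustrative schema, and assuming the risk class attribute carries a label such as 'High', this report can be sketched as a simple join; the report we actually delivered was built in Reporting Services.

-- Sketch of the Customers with High Risk Investments report (figure 1.12).
-- The 'High' label is an assumed value for the risk class attribute.
SELECT DISTINCT
    c.CustomerName,
    i.InvestmentName,
    i.InvestmentType
FROM [Transaction] AS t
JOIN Dim_Customer   AS c ON c.CustomerKey   = t.CustomerKey
JOIN Dim_Investment AS i ON i.InvestmentKey = t.InvestmentKey
WHERE i.RiskClass = 'High'
ORDER BY c.CustomerName;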
Even though this particular report does not
directly track commission, it is directly related
to the amount of commission that the company
collects. The goal of Benchmark is to help grow the
retirement funds of their customers, and if they
were to ignore these risky investments, they
would lose money, ruin the reputation they
have strived to build, and drive new and
existing customers away. When there are no
customers, there is no commission to keep
track of.
Conclusion
Entering into the process of constructing a data
warehouse with no prior knowledge of the
subject proved to be quite a challenge. It also
turned into an exceptional learning experience.
We learned to carefully analyze the project that
has been presented before diving into it head
first. It is essential to do this so that you can be
sure that the correct approach is being taken in
regards to the end result. Starting with the
desired result and working backwards turned
out to be the direction that we ultimately took
with this project, and is probably a viable
approach to take when designing a data
warehouse. A data warehouse is a report
centric system, so beginning with an
understanding of the desired output will lead to
a much more efficient design plan. We believe
that we have constructed a system that
Benchmark will be able to rely on for their
reporting needs for the foreseeable future.
Works Cited
Cameron, Scott. (2009) Microsoft SQL Server 2008 Analysis Services Step by Step. Redmond, Washington:
Microsoft Press.
Ponniah, Paulraj. (2001) Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals.
New York, New York: John Wiley & Sons.