UNIVERSITY OF HOUSTON-CLEAR LAKE
Benchmark Financial Brokers
A Case Study on Building a Data Warehouse
Prepared by:
Terry Lee
Tran Ngo
Sandeep Udhani
Navin Negi
Prepared for:
Dr. Rob
ISAM 5332
5/5/2010
Abstract
The purpose of this paper is to discuss how we
approached building a data warehouse for
Benchmark Financial Brokers (Benchmark),
starting with taking the wrong approach and
ending up with a functioning project.
Introduction
Benchmark is involved in the securities trading
business and as such has to maintain multiple
transactional systems. These systems are
adequate to run the day-to-day operations of
the business; however, they do not allow for
timely access to strategic information. This is
information that will help the owner and
managers maintain the long-term health of the
organization. The purpose of a data warehouse
is to collect desired data from various
transactional systems, convert it into a common
format, cleanse it, and allow it to be queried for
useful strategic information in the future. While
the concept of a data warehouse sounds quite
simple, in reality it is anything but. Data must be continually collected,
transformed, and cleansed from each of the
different transactional systems in order to keep
the warehouse properly maintained.
1. Business Scenario
Benchmark started with a single location in
Texas and has now expanded to forty-five
locations throughout Texas, Louisiana, and
Florida. The business owner plans on expanding
to include more locations throughout the
country. The problem that the owner and the
managers of this business face is that with their
continual expansion it is becoming increasingly
difficult for them to conduct timely visits to
each of the locations in order to monitor
employee performance, track revenue, and
ensure sound financial advice is being offered to
the customer. Benchmark has operational
systems that allow the advisors to monitor the
constantly fluctuating stock markets, as well as
transactional systems that allow them to buy or
sell various financial instruments for their
customers. These various systems present a
challenge when it comes to generating useful
reports that provide the information that the
owner is looking for. The owner is convinced
that if he can find a way to extract the data that
he needs from each of these systems he will be
able to maintain greater control over his
business.
2. Why a Data Warehouse?
Benchmark’s owner needs better strategic
information about his business. This is what a
data warehouse is perfect for. The concept of a
data warehouse is to be able to collect data
from multiple operational and transactional
systems, combine them in some manner, store
them in a central location, and provide Online
Analytical Processing abilities in the future. This
is exactly what the owner of Benchmark has
been looking for. He needs the ability to collect
and query this data in order to obtain reports
that will provide him with the strategic
information that he needs. This can include
reports such as revenue for each location by
state, revenue by investment type, customers
who have money in high-risk investments, and
employees who may be purposely selling these
investments. The ability to query this kind of
system online from his office is the perfect
solution to his problem.
3. Methodology
Beginning this project as a group, we did not
have a very clear picture of how we needed to
go about designing a data warehouse. We
began with designing what was essentially a
transactional database. One of our primary
concerns when we started the process was how
to track the price of the securities that are being
sold, as the stock market is constantly
fluctuating. Even though it made complete
sense to us, our initial approach proved to be
wrong. A data warehouse is not meant to be a
transactional database; it is a database that
warehouses historical data about the
organization. This is important because the
stored data will be used strictly for designing
the reports that management needs to help
them make strategic decisions about the future.
Approaching the project a second time we
realized that instead of trying to figure out how
to collect the data, we needed to figure out
how to present the data that we already had.
When designing the data warehouse, probably
the best method to employ is to approach it
from a reporting aspect. Approaching the
project this way means that you must be
thinking about the information that is
ultimately desired by the end user. Once we
realized that we needed to look at the project
from this aspect, it led us to completely rethink
our strategy, and led us to completely redesign
our dimensional models.
4. Dimensional Modeling and defining data
structure
Dimensional modeling incorporates the
business dimensions into the logical data
model. (Ponniah 206) After defining
requirements from the business needs, data
structures are designed within the logical
model. When defining requirements, users may not be able to precisely describe what they want in a data warehouse, but they can provide very important insights into how they think about the business, as well as tell us which measurement units are important to them.
Managers think of the business in terms of how
they want to measure it. These measurements
are the facts that indicate to the users how
their departments are doing in fulfilling their
objectives. We can say business metrics or facts
are what managers want to analyze in order to
know the current situation of their business.
These are important when making decisions
that will affect the future of their organization.
When designing dimensions for a data
warehouse, it is extremely important to pay attention to the hierarchies that are contained
within each of the dimensions. These
dimensional hierarchies are the various levels of
detail contained within a business dimension.
Managers can use the dimensional hierarchies
as the paths for drilling down or rolling up in
analysis. Dimensional modeling is the technique
used in designing a data warehouse. Many
software vendors have expanded their
modeling CASE tools to include dimensional
modeling. Modern software is very useful when
designing fact tables, dimension tables, and
establishing the relationships between them.
When you have finished modeling the
dimensions and establishing the relationships,
you end up with a database schema. The two
types of schemas that are generally used in a
data warehouse are the STAR schema and the
snowflake schema. The STAR schema is a simple
database schema for data design using a
dimensional model. This schema consists of a
fact table in the center that is directly related to
the dimension tables that surround it. Although
the STAR schema is a relational model, it is not
a normalized model. The snowflake method
normalizes the dimension tables in a STAR
schema.
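To make the distinction concrete, the short T-SQL sketch below shows an investment dimension first in star form, with its risk class kept inline, and then in snowflake form, with the risk class normalized into its own table. The table and column names are illustrative assumptions, not objects taken from Benchmark's systems.

-- Illustrative sketch only: the table and column names below are assumptions,
-- not objects from Benchmark's actual systems.

-- Star-style dimension: every attribute, including the risk class label,
-- lives in one denormalized table that joins directly to the fact table.
CREATE TABLE Dim_Investment_Star (
    InvestmentKey   INT PRIMARY KEY,
    InvestmentName  VARCHAR(100),
    InvestmentType  VARCHAR(50),
    RiskClassName   VARCHAR(50)   -- repeated for every investment in the class
);

-- Snowflake-style equivalent: the risk class is normalized into its own
-- table, so reaching it from the fact table costs one extra join.
CREATE TABLE Dim_RiskClass (
    RiskClassKey    INT PRIMARY KEY,
    RiskClassName   VARCHAR(50)
);

CREATE TABLE Dim_Investment_Snowflake (
    InvestmentKey   INT PRIMARY KEY,
    InvestmentName  VARCHAR(100),
    InvestmentType  VARCHAR(50),
    RiskClassKey    INT REFERENCES Dim_RiskClass (RiskClassKey)
);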
As mentioned earlier, it is very
important to employ the correct approach
when designing the schema for a data
warehouse. This is where we made our first
mistake. As a result of this, we have two
different database schemas. The first schema
that we designed was a complex snowflake
schema which is depicted in figure 1.1.
Figure 1.1
When the dimensions in a STAR schema are completely normalized, the resulting structure resembles a snowflake with the fact table in the middle. In the case of Benchmark, the most important fact to analyze is the sales commission, which is the revenue for the company. The transaction table is the fact table, which contains commission as an attribute. Transactions are analyzed based on dimensions such as Customer, Employee, Investment, Time, Date, and the Commission collected. At first, we approached this project from the wrong direction, and because of that we normalized the tables in our schema. This was done because we were thinking in terms of a transactional system where we would need to track changes in the price of stocks, bonds, and other marketable securities. We stored detailed information about employees such as position and salary. After we realized that we weren't looking at the project correctly, we recognized that this information was not necessary for the data warehouse. Although a data warehouse is based on a relational database like one that would be used in an operational system, what we were building was a decision-support system. The important component of the entire project is the ability to track and analyze the commission. We do not need to worry about the price of investments or employee salary. In order to correct our error, we redesigned our database schema and arrived at a Star schema, which is depicted in figure 1.2. In this schema, we have Transaction as the fact table with four dimension tables: Employee, Customer, Investment, and Date_Time. We kept these dimensions because we needed to analyze commission by state, by employee, by customer, by investment type, and by investment risk class.
Figure 1.2
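A minimal T-SQL sketch of this Star schema could look like the following. The table names follow the dimensions in figure 1.2; the individual columns are illustrative assumptions rather than our exact warehouse definitions.

-- Sketch of the Star schema in figure 1.2. Table names follow the dimensions
-- named above; the column lists are illustrative assumptions.
CREATE TABLE Dim_Employee (
    EmployeeKey    INT PRIMARY KEY,
    EmployeeName   VARCHAR(100),
    State          VARCHAR(2),      -- supports commission-by-state analysis
    ZipCode        VARCHAR(10)
);

CREATE TABLE Dim_Customer (
    CustomerKey    INT PRIMARY KEY,
    CustomerName   VARCHAR(100),
    State          VARCHAR(2)
);

CREATE TABLE Dim_Investment (
    InvestmentKey  INT PRIMARY KEY,
    InvestmentName VARCHAR(100),
    InvestmentType VARCHAR(50),
    RiskClass      VARCHAR(50)      -- supports analysis by investment risk class
);

CREATE TABLE Dim_Date_Time (
    DateTimeKey    INT PRIMARY KEY,
    FullDate       DATE,
    [Year]         INT,
    [Quarter]      INT,
    [Month]        INT
);

-- Fact table: one row per transaction, with commission as the measure.
CREATE TABLE [Transaction] (
    TransactionKey INT PRIMARY KEY,
    EmployeeKey    INT REFERENCES Dim_Employee (EmployeeKey),
    CustomerKey    INT REFERENCES Dim_Customer (CustomerKey),
    InvestmentKey  INT REFERENCES Dim_Investment (InvestmentKey),
    DateTimeKey    INT REFERENCES Dim_Date_Time (DateTimeKey),
    Commission     DECIMAL(12, 2)
);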
From our new Star schema, we were able to
define the data format that we need for the
warehouse. We defined table name, attribute
name, data type of each attribute in each table,
and relationship among tables. From
Benchmark’s operational system, we extracted
data into an Excel file that is depicted in figure
1.3. Many of the fields that were extracted are
necessary for an operational system, but were
not needed in the data warehouse. This is
where cleansing data becomes important. In
order to keep the warehouse efficient, we used
Excel to remove the extraneous data before it
was imported. After cleansing the data, we had
attributes that were important to the structure
of our system. An example of the cleansed data
is found in figure 1.4. When we were satisfied
that our data was formatted and cleansed
correctly, we moved to our next step which was
to implement our database schema in Microsoft
Access. This schema is displayed in figure 1.5.
Figure 1.3
Figure 1.4
We used Access to define our relationships and
make sure that the system functioned before
importing the database into SQL Server 2008.
Figure 1.5
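We performed the cleansing in Excel before the import, but the same idea can be sketched as a hypothetical staging step in T-SQL: land the raw extract in a staging table, then carry only the needed columns into the fact table from the earlier sketch. The staging table and its extra columns are assumptions for illustration.

-- Hypothetical staging table for the raw extract; the extra operational
-- columns are assumptions standing in for the fields we removed in Excel.
CREATE TABLE Staging_Transaction_Raw (
    TransactionKey INT,
    EmployeeKey    INT,
    CustomerKey    INT,
    InvestmentKey  INT,
    DateTimeKey    INT,
    Commission     DECIMAL(12, 2),
    TickerPrice    DECIMAL(12, 2),  -- operational detail, not needed
    EmployeeSalary DECIMAL(12, 2)   -- operational detail, not needed
);

-- Carry only the cleansed columns into the Transaction fact table sketched
-- earlier; the extraneous operational fields are simply left behind.
INSERT INTO [Transaction]
    (TransactionKey, EmployeeKey, CustomerKey, InvestmentKey, DateTimeKey, Commission)
SELECT TransactionKey, EmployeeKey, CustomerKey, InvestmentKey, DateTimeKey, Commission
FROM Staging_Transaction_Raw
WHERE Commission IS NOT NULL;        -- drop rows with a missing measure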
5. Implementation in SQL Server 2008
In order to browse the data that is contained
within the data warehouse, you must design a
data structure called a cube. Constructing the
cube is done in SQL Server Analysis Services. A
cube is composed of the fact table and all of the
data that is directly related to it. The cube
organizes the data into a format that can be
easily queried, rolled up, drilled down, and
sliced and diced based on the measures and
hierarchies that are applicable to your particular
data set. Importing a database from Access to
SQL Server is supposed to be an easy process,
but trust us it is anything but. Trying to figure
out how to get the program to accept your data
and process it turned out to be one of the
biggest challenges of the whole project.
According to Scott Cameron’s SQL Server 2008
Analysis Services: Step by Step, if you have an
existing relational database such as Access,
Teradata, Oracle, IBM DB2, as well as some
others, you should be able to select the
appropriate driver and connect to your data
source without any difficulties (Cameron 39). If
only this were true. Due to not having sufficient
security clearance to upload our database onto
the University of Houston-Clear Lake (UHCL)
server, we decided to use a personal laptop to
run the software. Operating system
compatibility was one of the first problems that
we encountered. The solution to this problem
was to download the applicable service pack
from Microsoft Update. Once the software was
installed, we attempted to import the database from Access. The first attempt did not succeed; the next attempt resulted in being able to import the data, but this time we could not build or deploy our cube. Do not get frustrated when you
encounter this problem. We have chosen not to
outline the steps that we took to get the
program to function correctly as they will be
different for each application. Once we had the
database imported and functioning properly,
we commenced building our cube. The ability to
roll up, and drill down your data is based on the
hierarchies that are defined within your
dimensions. This very important step is
depicted in figure 1.6.
Figure 1.6
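As a rough relational analogue of what these hierarchies enable, and assuming the illustrative schema sketched in section 4, a T-SQL query with GROUP BY ROLLUP produces the same kind of year, quarter, and month subtotals that rolling up the Date_Time hierarchy gives in the cube; the cube browser itself works through Analysis Services rather than T-SQL.

-- Relational analogue of rolling commission up a Year > Quarter > Month
-- hierarchy; column names come from the illustrative schema in section 4.
SELECT
    d.[Year],
    d.[Quarter],
    d.[Month],
    SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Date_Time  AS d ON d.DateTimeKey = t.DateTimeKey
GROUP BY ROLLUP (d.[Year], d.[Quarter], d.[Month])
ORDER BY d.[Year], d.[Quarter], d.[Month];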
Without defining the hierarchies in each of your
dimensions, you will not have access to all of
the data. When a manager is looking for
information, he may want a very high level of
granularity, or a very low level of granularity.
These types of details are very important when
deciding how to define the dimensions that are
contained within your data warehouse. When
all of the hierarchies are defined, you must set
up the relationships that are contained within
the dimensions. An example of these
relationships can be seen in figure 1.7.
Figure 1.7
When all of the hierarchies and relationships are set up, the cube can be launched. A fully implemented cube will look something like figure 1.8.
Figure 1.8
6. Browsing the Cube
Keeping in mind that the ultimate goal of the data warehouse is to provide strategic information to managers and business owners, it is now time to browse the cube that you have created. This is the process where you are actually designing the queries that will provide the reports the end user is looking for. The Benchmark project is concerned primarily with the commission that is collected from each transaction. In order to get a picture of the business as a whole, it is more reasonable to query the data for commission from a particular region, or in our case, by each state. In figure 1.9, we have shown commission by state as it is presented in the cube browser.
Figure 1.9
This is a very high-level view of the data. If you were to
add all of the hierarchies that are available to
this query, you can drill the data down to
provide commission for each employee in each
zip code, as it relates to each type and name of
investment. This is shown in figure 1.10.
Figure 1.10
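Assuming the illustrative schema from section 4, the two views in figures 1.9 and 1.10 can be sketched as the following relational queries; the cube browser issues its own queries against Analysis Services, so these are analogues rather than the exact statements behind the figures.

-- High-level view (figure 1.9): total commission by state.
SELECT e.State, SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Employee   AS e ON e.EmployeeKey = t.EmployeeKey
GROUP BY e.State;

-- Drilled-down view (figure 1.10): commission by employee and zip code,
-- broken out by investment type and name.
SELECT
    e.State,
    e.ZipCode,
    e.EmployeeName,
    i.InvestmentType,
    i.InvestmentName,
    SUM(t.Commission) AS TotalCommission
FROM [Transaction] AS t
JOIN Dim_Employee   AS e ON e.EmployeeKey   = t.EmployeeKey
JOIN Dim_Investment AS i ON i.InvestmentKey = t.InvestmentKey
GROUP BY e.State, e.ZipCode, e.EmployeeName, i.InvestmentType, i.InvestmentName
ORDER BY e.State, e.ZipCode, e.EmployeeName;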
7. Generating useful reports
Being able to browse the cube and design
queries is a very powerful and useful tool.
Unfortunately, to the end user of the system,
some of these queries are almost unreadable
within the cube browser. Remember that the
final result of this project is to provide strategic
information that will be useful to management
in making decisions that will affect the future
health of the organization. These reports are
not going to be provided to a member of the IT
staff who would be comfortable viewing the
format in the browser. Management will want a
report that can be read and interpreted easily.
Providing these kinds of reports is easily done
once you have a functional cube. The cube that
was initially created within Analysis Services can
also be accessed with Reporting Services, which is
another very powerful tool that is included in
SQL Server 2008. By creating a Reporting Services
project, we were able to generate reports from
the Benchmark warehouse that will be useful to
the owner and management.
The same
information that is depicted in figure 1.9 is
again displayed in figure 1.11 in a much easier-to-read format.
Figure 1.11
Another report that the Benchmark management wanted was Customers with High Risk Investments, figure 1.12, which would allow them to find customers who have money in an investment that is now considered to be high risk.
Figure 1.12
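Under the same illustrative schema, and assuming the risk class attribute carries a label such as 'High', this report can be sketched as a simple join; the report we actually delivered was built in Reporting Services.

-- Sketch of the Customers with High Risk Investments report (figure 1.12).
-- The 'High' label is an assumed value for the risk class attribute.
SELECT DISTINCT
    c.CustomerName,
    i.InvestmentName,
    i.InvestmentType
FROM [Transaction] AS t
JOIN Dim_Customer   AS c ON c.CustomerKey   = t.CustomerKey
JOIN Dim_Investment AS i ON i.InvestmentKey = t.InvestmentKey
WHERE i.RiskClass = 'High'
ORDER BY c.CustomerName;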
Even though this particular report does not
directly track commission, it is directly related
to the amount of commission that the company
collects. The goal of Benchmark is to help grow the
retirement funds of their customers, and if they
were to ignore these risky investments, they
would lose money, ruin the reputation they
have strived to build, and drive new and
existing customers away. When there are no
customers, there is no commission to keep
track of.
Conclusion
Entering into the process of constructing a data
warehouse with no prior knowledge of the
subject proved to be quite a challenge. It also
turned into an exceptional learning experience.
We learned to carefully analyze the project that
has been presented before diving into it head
first. It is essential to do this so that you can be
sure that the correct approach is being taken in
regards to the end result. Starting with the
desired result and working backwards turned
out to be the direction that we ultimately took
with this project, and is probably a viable
approach to take when designing a data
warehouse. A data warehouse is a report
centric system, so beginning with an
understanding of the desired output will lead to
a much more efficient design plan. We believe
that we have constructed a system that
Benchmark will be able to rely on for their
reporting needs for the foreseeable future.
Works Cited
Cameron, Scott. (2009) Microsoft SQL Server 2008 Analysis Services Step by Step. Redmond, Washington:
Microsoft Press.
Ponniah, Paulraj. (2001) Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals.
New York, New York: John Wiley & Sons.