DATA WAREHOUSE OPERATIONAL ARCHITECTURE

Certified IT Architect Codrin POPA

Summary: This article describes the main operational aspects required to establish the proper environment to support data warehouse applications and provides a comparison of the DB2 LUW and Oracle architectures.

Keywords: DB2, Oracle, Data Warehouse

Data Warehouse Environment

The architecture description reflects the separation of concerns by identifying two main aspects of architecture.

The functional aspect focuses on describing the function of the IT system and is primarily concerned with:
- the structure and modularity of the software components (both application and technical);
- interactions between components;
- interfaces provided by components and their usage;
- dynamic behavior, expressed as collaborations between components.

The operational aspect focuses on describing the operation of the IT system and is primarily concerned with:
- representing the network organization (hardware platforms, topology);
- describing what runs where, i.e. where software is placed on this network;
- satisfying service-level requirements (performance, availability, security);
- defining the systems management and operation of the IT system.

[Figure: Operational Databases -> ETL processes (IBM Information Server, metadata) -> Enterprise Data Warehouse RDBMS (DB2, Oracle) -> Data Marts Database and OLAP Server (Cognos) -> End-User Applications: reports, analysis, dashboards, extracts (Cognos, SPSS), data mining (future), OLAP analysis]
Fig 1: Data Warehouse / Business Intelligence Architecture

To provide a homogeneous solution, integrated into the global system architecture, a physical architecture consisting of hardware and software components is associated with the logical solution architecture. It is interesting to note that the best decision support solutions are built almost identically to human learning cycles. Over a lifetime, the events that occur are analyzed and then transformed into rules, consciously or not. Every time a rule is applied, the reaction is expected to validate it or not, tuning and adapting past rules based on present events. In a Business Intelligence system, data is extracted from various operational sources with different formats (databases, text files) and then processed and transformed into a homogeneous computing environment based on that information. The information is analyzed with specialized tools for ad hoc reporting, multidimensional analysis, and data mining, and thus various indicators and metrics are identified. Based on these indicators, decisions are adopted, a plan is built, and the organization opts for a particular implementation decision. Concretely, data is extracted from the operational source systems using the ETL tool component (Fig 1). Information is loaded into the database/schema corresponding to the Staging database, a temporary, intermediate store used to perform validation and possible changes in structure. The next phase consists of the execution of transformations, before detailed data is loaded into the reporting database. The Staging database is used only by the loading process. It serves as a holding space and a work space for validations and calculations, not to provide user query services. The data from the operational systems moves into the staging area with minimal changes. All the transformations are made between staging (the intermediate zone) and the target data warehouse database.
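The two flows just described can be made concrete with a minimal SQL sketch. All table and column names here (src_orders, stg_orders, dw_sales) are illustrative assumptions, not part of the architecture itself:

  -- 1) Source -> staging: move data with minimal changes
  INSERT INTO stg_orders (order_id, order_date, amount_raw)
  SELECT order_id, order_date, amount
  FROM src_orders;

  -- 2) Staging -> warehouse: validation and transformation happen here,
  --    never in the source extract
  INSERT INTO dw_sales (order_id, order_date, amount)
  SELECT order_id, order_date, CAST(TRIM(amount_raw) AS DECIMAL(12,2))
  FROM stg_orders
  WHERE order_date IS NOT NULL;   -- a simple validation rule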
The Data Integration Layer focuses on the methods that deal with the processing and movement of data, preparing it for storage in the Data Repository Layer so that it can be shared with the analytical/access applications and systems. This layer processes data in scheduled batch intervals or in near real-time ("just-in-time") intervals. The ETL process can be segmented into two phases, each with a specific purpose:
- the data flow from the source systems to the intermediate zone, which contains mainly extraction processes;
- the data flow from the intermediate zone to the Data Warehouse, which contains the transformation, validation, and loading processes.
At first, the ETL processes perform basic validation, which includes data type conversions, "trimming", etc. Advanced transformations are implemented in phase two, from the intermediate component to the data warehouse component. These changes involve the following activities:
- transformation and derivation of fields;
- identifying changes in the source and maintaining history;
- generating surrogate keys and maintaining them (see the sketch at the end of this subsection);
- integrating data from multiple sources;
- establishing an automated process that is repeatable and restartable.
The Data Repository Layer contains the databases and components that provide most of the storage for the data which supports the Data Warehouse environment. The reporting system (Business Intelligence system) is a data-oriented decision support system whose central point, the Data Warehouse database, meets the demands and needs of decision makers at different hierarchical levels within the organization. The database is composed of complex data structures, from information presented in detail to derived and aggregated information. The Data Warehouse database is NOT a replacement or a replica of the operational database; it is a complementary database, where data obtained from external operational sources is organized and reshaped into a specific structure and format in order to support decision activities. It contains the conceptual, logical, and physical data models and data model types. The metadata developed in the database component of the reporting system will contain both functional information, highlighting the analytical perspective on the meaning of data and the relationships between them, and technical information. The data warehouse database holds all the detailed information in the system. The Data Warehouse components, the DDS (Detailed Data Store) and the SDS (Summarized Data Store, holding aggregates and result tables), are kept up to date, and the latter provides data for a couple of applications that run over the Data Warehouse. Currently there are two main architectural approaches for a data warehouse: the Kimball Data Warehouse Bus (BUS) and the Corporate Information Factory (CIF). From an architectural point of view, W. H. Inmon's theory is based on a top-down approach and supports the idea of transferring data from operational databases to a central database that is subject-oriented, integrated, non-volatile, and time-variant. Contrasting with the data warehouse, where the data is normalized, each data mart is developed at the department level; its data is denormalized and summarized for a specific department. R. Kimball's theory is based on a bottom-up approach: the data marts can be contained in the data warehouse, and in fact they represent the data warehouse, linked by a bus structure that contains all the common elements shared across data marts, known as conformed dimensions and measures.
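Two of the ETL activities listed earlier, surrogate-key generation and change detection, can be sketched in a single statement. This is a hedged illustration in DB2 syntax; the dimension, staging table, and sequence names are hypothetical (Oracle would use customer_sk_seq.NEXTVAL instead of NEXT VALUE FOR):

  MERGE INTO dw_customer_dim d
  USING (SELECT customer_id, customer_name FROM stg_customer) s
    ON d.customer_natural_key = s.customer_id
  WHEN MATCHED AND d.customer_name <> s.customer_name THEN
    -- change detected in the source: update in place (or, for full history
    -- maintenance, expire the old row and insert a new version)
    UPDATE SET d.customer_name = s.customer_name
  WHEN NOT MATCHED THEN
    -- new source row: generate a surrogate key from a sequence
    INSERT (customer_sk, customer_natural_key, customer_name)
    VALUES (NEXT VALUE FOR customer_sk_seq, s.customer_id, s.customer_name);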
The Analytics Layer provides the analytical applications that are the heart of the Data Warehouse environment. This layer is composed of various technological components destined to meet specific needs, built from "best-of-breed" software and tools. The Analytics Layer focuses on the decision support usage of information and provides tools and components that enable advanced analytics and deliver information in a useful fashion. Users submit queries to the database via an HTTP server that passes them to the reporting server, providing a single, modern Web services architecture. For user authentication, the reporting server can be connected to an LDAP (Lightweight Directory Access Protocol) server. LDAP has emerged as a critical infrastructure component for network security and as a vital platform for enabling integration among applications and services on the network.

DB2/Oracle Database Systems Description

[Figure: DB2 memory structures. Database Manager Shared Memory (instance shared memory): audit buffer (used in database auditing activities), monitor heap (used for database system monitoring data), FCM buffers. Database Global Memory (per database): database heap, package cache, log buffer, lock list, buffer pools, catalog cache, utility heap, sort heap. Application Global Memory: application heap, statistics heap, statement heap. Agent private memory and agent/application shared memory.]
Fig 2: DB2 Database System

The DB2 memory architecture (Fig 2) consists of three areas: Database Manager Shared Memory (instance shared memory), Database Global Memory (database shared memory), and Application Global Memory (application shared memory). Instance shared memory is allocated when the instance is started. All other memory is attached to or allocated from the instance shared memory, which is controlled by the INSTANCE_MEMORY database manager (DBM) configuration parameter. Database shared memory is allocated when the database is activated or connected to for the first time. This memory is shared by all the applications that might connect to the database and by the processes that run within the database; it is controlled by the DATABASE_MEMORY database (DB) configuration parameter. Application shared memory is allocated when an application connects to a database; this happens only in partitioned database environments, in a non-partitioned database environment where intra-partition parallelism is enabled, or if the connection concentrator is enabled. The following background processes are defined for DB2:
- instance-level processes: DB2 daemon spawner (db2gds), DB2 system controller (db2sysc), DB2 watchdog (db2wdog), DB2 format log (db2fmtlg), DB2 system logger (db2syslog);
- database-level processes: DB2 log reader (db2loggr), DB2 log writer (db2loggw), DB2 page cleaner (db2pclnr), DB2 prefetcher (db2pfchr), DB2 deadlock detector (db2dlock);
- application-level processes: DB2 communication manager (db2ipccm), DB2 TCP manager (db2tcpcm), DB2 coordinating agent (db2agent), active subagent (db2agntp).
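A hedged sketch of how the memory areas above are inspected and tuned from the DB2 command line processor (the database name DWDB is an illustrative assumption):

  db2 get dbm cfg                    -- shows INSTANCE_MEMORY (instance shared memory)
  db2 get db cfg for DWDB            -- shows DATABASE_MEMORY (database shared memory)
  db2 update db cfg for DWDB using DATABASE_MEMORY AUTOMATIC
  db2 list applications              -- one coordinating agent (db2agent) per connection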
The DB2 files can be split into DMS/SMS containers (data files in Oracle), initialization files (the DBM CFG and DB CFG files, corresponding to Oracle's init.ora), transaction log files, and diagnostic/audit files such as db2diag.log (the alert log in Oracle). The DBM CFG is used to configure and tune the DB2 server at the instance level, while the DB CFG is specific to each database. DB2 does not maintain dynamic performance views; instead it uses commands to get information from the system directory, such as LIST DATABASE DIRECTORY, LIST TABLESPACES, and LIST APPLICATIONS.

The Oracle architecture consists of three areas (Fig 3): the Oracle instance, the files area, and the client area. The Oracle instance is the essential component ensuring the functionality of the Oracle DBMS, and it is composed of memory areas, background processes, and user processes.

[Figure: Oracle instance. SGA: database buffer cache (default, keep, and recycle pools), shared pool (library cache with SQL statements, parse trees, and execution plans; data dictionary cache with technical metadata), redo log buffers, large pool for parallel executions. PGA: session memory, private SQL area (persistent area and run-time area for sorting, hash algorithms, bitmap creation). Background processes: DBWR, LGWR, CKPT, SMON, PMON, ARCn. Files: datafiles organized as tablespaces/extents/blocks, online redo logs, control files, init.ora, archived logs, trace files. Listener and client application.]
Fig 3: Oracle Database System

The memory area can be split into the SGA (System Global Area) and the PGA (Program Global Area). The SGA has the following structure:
- Shared Pool Area. The library cache holds the SQL statement text, the parsed SQL statement, and the execution plan. The data dictionary cache (the technical metadata area) contains the definitions of database objects, analysis structures, and security structures. The algorithm used for this area is LRU (Least Recently Used).
- Database Buffer Cache, which contains the blocks read from the database segments. It is an area that functions based on an LRU list and a write list. The write list contains modified blocks that have not yet been saved to the database. The LRU list contains free blocks that can be used, accessed blocks, and modified blocks that have not yet been moved to the write list. A requested block is read either directly from memory (cache hit) or from disk (cache miss), the latter resulting in additional processing.
- Redo Log Buffer, a circular area that contains information about the changes made to database objects, stored as redo entries. The LGWR background process writes these buffers to the redo log files in real time. If archive log mode is enabled, the archive processes (ARCn) copy them into the archive area.
- Large Pool, a very important area for data warehousing because it keeps the blocks necessary for parallel processing. This area does not use the LRU algorithm.
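The SGA areas above can be inspected through Oracle's dynamic performance views, whereas DB2, as noted earlier, exposes the equivalent information through commands. A hedged side-by-side sketch (standard view and command names; output details vary by version, and DWDB is an illustrative database name):

  -- Oracle: sizes of the SGA components described above
  SELECT name, value FROM v$sga;
  SELECT pool, SUM(bytes) FROM v$sgastat GROUP BY pool;

  -- DB2: configuration and state via commands rather than views
  db2 get db cfg for DWDB            -- DATABASE_MEMORY, heap and buffer sizes
  db2 list tablespaces show detail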
The Program Global Area is an area reserved for each user process that connects to the database; it contains control information for a single server process or a single background process. Unlike the SGA, which is shared, the PGA is used by a single process, and it is allocated when the process is created. In dedicated mode, a server process is allocated for every user process. The PGA consists of:
- session memory, which contains session variables (for example, logon information);
- the private SQL area, which contains data such as bind information and run-time memory structures. Each session that issues a SQL statement has a private SQL area, consisting of:
  o the persistent area, which contains, for example, bind information; it is freed only when the cursor is closed;
  o the run-time area, used for complex operations like sorting, hash algorithms, and bitmap creation; it is freed at the end of the execution.
Oracle processes are the services that ensure the functionality of the database server. There are two types of Oracle processes: user processes, which run the application code, and Oracle processes, which include server and background processes. A server process is created for each user process when the database runs in dedicated mode. The major background processes are:
- DBWR (Database Writer), responsible for writing modified blocks to disk;
- LGWR (Log Writer), responsible for writing the log buffers to disk when an operation is committed, when the redo log buffer fills, or every three seconds;
- SMON (System Monitor), which maintains database consistency, initiates recovery, and eliminates temporary segments;
- PMON (Process Monitor), which cleans up the database buffer cache and frees the resources a user process was using when that process fails;
- CKPT (Checkpoint Process), which signals the DBWR processes at checkpoints and updates the datafile and control file headers;
- ARCn (Archiver), which copies the redo log files to the archived log destination.
The file and directory structure consists of:
- trace files for the server and background processes;
- data files, which contain all the data and the technical metadata (data dictionary). The database structure maps a logical view (tablespace, segment, extent) onto a physical view (datafiles belonging to tablespaces, and blocks);
- redo log files, which contain all the changes made to the database (DDL and DML);
- init files, which contain all the parameters necessary to open the instance;
- control files, which contain information about file status and location;
- undo files, which keep the prior state of objects, thereby ensuring database consistency.

Database Optimization

The DW database must be configured to take advantage of all the data warehousing features offered by the RDBMS. The physical granularity of the main database should be as fine as possible, which means using different tablespaces with different characteristics for partitioned/non-partitioned tables: dimension tables, fact tables, aggregated tables, etc. There must also be a consistent file structure, with data files separate from program files, separate from log files, and so on. The Staging database can be configured similarly; the main difference is that substantially less memory needs to be allocated to the Staging server, as it will have fewer concurrent connections and the size of the database is substantially smaller. The methodology for calculating the number of CPUs is based on the expected maximum number of concurrent queries and ETL procedures: multiply the maximum number of concurrent users by the maximum number of concurrent reports per user that the end-user business intelligence tool permits. The size of the memory is based on the size of the tables, the types of sorts, the number and types of users, the ETL transformations, etc. The calculation is based on assumptions made for two types of operations: queries, and batch (ETL jobs).
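Returning to the physical granularity described above, a minimal Oracle sketch of separate tablespaces per table class (all names, paths, and sizes are illustrative assumptions; DB2 would use CREATE TABLESPACE with DMS containers in the same spirit):

  CREATE TABLESPACE dim_data  DATAFILE '/u02/oradata/dwdb/dim01.dbf'  SIZE 4G;
  CREATE TABLESPACE fact_data DATAFILE '/u03/oradata/dwdb/fact01.dbf' SIZE 64G;
  CREATE TABLESPACE agg_data  DATAFILE '/u04/oradata/dwdb/agg01.dbf'  SIZE 16G;
  -- data files, program files, and log files live on separate file systems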
The compression feature must be used. In Oracle, database compression can be enabled at the tablespace level; in DB2, at the table level. The volume of data stored in relational databases keeps growing as a result of business requirements for more information. A large portion of the cost of keeping large amounts of data lies in the cost of disk systems and in the resources utilized in managing that data. Both databases provide a way to deal with this cost by compressing the data stored in relational tables, with virtually no negative impact on query time against that data, thereby enabling substantial cost savings. Compression works by eliminating duplicate values in a database block. Compressed data stored in a database block (a.k.a. disk page) is self-contained: all the information needed to recreate the uncompressed data in a block is available within that block. The typical compression ratio for large data warehouse tables ranges from 2:1 to 4:1. The data warehouse should use compression especially for partitioned fact tables and partitioned result tables.

RAID5 provides some protection against disk crashes without being as expensive as RAID1+0. It does this by calculating parity bits, which can be used to recreate one disk out of four if it fails, as opposed to saving everything twice, which allows retrieval of data if one disk in every mirrored pair crashes. However, the calculation of the parity bits halves write performance compared to RAID1+0. Files that are critical for performance and/or resilience to failure (e.g., database redo logs/circular logs, for protection and performance, and database temp files/system temporary table space containers, for performance) should be stored on RAID1+0 disks. One RAID1+0 logical volume should be used for the Oracle/DB2 logs and temporary files; a second RAID1+0 logical volume should be used for the DataStage temporary files.

Table partitioning in DB2 is similar to table partitioning in Oracle. This data organization scheme is one in which table data is divided across multiple storage objects, called data partitions or ranges, according to the values in one or more table columns. Each data partition is stored separately and can be placed in a different table space. In DB2, table partitioning is range-based (hash distribution is provided separately by the Database Partitioning Feature, where indexes are local to each database partition rather than global). Partitioning across tablespaces ensures the database modularity that must be a top priority in order to maintain an optimal backup/restore and purge strategy, and it also increases database performance by enabling database parallelism. DB2 has a further option, the Database Partitioning Feature (DPF), referred to as database partitioning. In a partitioned environment, a database is distributed across multiple partitions, capable of residing on different machines. Each partition, called a database partition, is responsible for a portion of the database's total data. Because data is divided across database partitions, the database can use the power of multiple processors on multiple physical servers to satisfy requests for information. Data retrieval and update requests are decomposed automatically into sub-requests and executed in parallel among the applicable database partitions. Multidimensional clustering (an MDC table) is a method of data organization that clusters data together on disk according to dimension key values. A dimension is a key or attribute, such as product, time period, or geography, used to group factual data in a meaningful way for a particular application. A cell is a unique combination of values across all dimensions. Data in MDC tables is clustered by cell.
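Combining the three techniques just described, compression, range partitioning, and multidimensional clustering, a hedged DB2 sketch (table and column names, the date range, and the choice of dimensions are all illustrative assumptions):

  CREATE TABLE fact_sales (
    sale_date  DATE          NOT NULL,
    product_id INTEGER       NOT NULL,
    region_id  INTEGER       NOT NULL,
    amount     DECIMAL(12,2)
  )
  PARTITION BY RANGE (sale_date)                       -- one data partition per month
    (STARTING FROM ('2011-01-01') ENDING ('2011-12-31') EVERY 1 MONTH)
  ORGANIZE BY DIMENSIONS (product_id, region_id)       -- MDC cells on disk
  COMPRESS YES;                                        -- row compression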
Using materialized views (MVs, in Oracle) or materialized query tables (MQTs, in DB2) can improve the performance of the system, since these are not logical views but are in fact disk-based and refreshed periodically. For Oracle, the memory allocation to the different applications running on the server can be changed depending on whether the system is performing data loads or is instead being used for queries. This is implemented mainly through the dynamic Oracle system parameters, which can be changed at the beginning and end of every batch load. Caching options should be set up as follows:
- Most dimension tables, with their indexes, should be allocated to the Keep buffer pool and specified as cached. The Keep cache should be allocated enough space to hold all these objects, with a little room for growth.
- All other DW tables should be allocated to the Recycle cache, which should also be the tablespace default. The remainder of the memory allocated to the buffer cache goes to the Recycle pool. This leaves the system tables alone in the default cache.
All objects that are not in the Keep cache should be specified as PARALLEL; in the first instance, no explicit degree should be specified. All objects that are in the Keep pool should be left as (or changed to) NOPARALLEL. For data warehouse objects, the recommended DOP (degree of parallelism) settings are: fact tables, DOP = high; dimension tables, DOP = low; aggregation tables, DOP = high (see the sketch below). DB2 Enterprise with DPF uses two levels of parallelism:
- intra-partition parallelism, the ability to have multiple processors process different parts of an SQL query, an index creation, or a database load within a database partition. This level of parallelism is enabled in the DBM configuration by setting the INTRA_PARALLEL parameter to YES;
- inter-partition parallelism, the ability to break up a query into multiple parts across multiple partitions of a partitioned database, on one server or multiple database servers. This can be accomplished on both SMP servers and massively parallel processing (MPP) clustered servers.
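A hedged sketch of the caching and parallelism policy above (object names are illustrative; the DOP value would be derived from the CPU sizing discussed earlier):

  -- Oracle: buffer pool assignment and degree of parallelism per table class
  ALTER TABLE dim_product STORAGE (BUFFER_POOL KEEP);      -- dimensions: Keep pool
  ALTER TABLE dim_product NOPARALLEL;                      -- Keep-pool objects stay serial
  ALTER TABLE fact_sales  STORAGE (BUFFER_POOL RECYCLE);   -- facts: Recycle pool
  ALTER TABLE fact_sales  PARALLEL 16;                     -- facts: high DOP
  ALTER TABLE agg_sales_month PARALLEL 16;                 -- aggregates: high DOP

  -- DB2: enable intra-partition parallelism at the instance level
  db2 update dbm cfg using INTRA_PARALLEL YES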
Back-up and Restore/Recovery

Database backup, restoration, and recovery are essential processes underlying any high availability system. I propose a backup strategy based on a standard methodology, applied to a data warehouse production environment consisting of two main associated databases: the Data Warehouse itself and the Staging database. The backups protect data from application error and act as a safeguard against unexpected data loss by providing a way to restore the original data. If a failure takes place in the databases, or an intruder destroys part or all of the databases, then it is crucial to have correct backup and recovery procedures in place. As part of normal operation, the main database creates transaction logs that record changes to the database. Databases created with default parameters create circular logs, where log data is overwritten after a cycle. The archived transaction logs must themselves be backed up periodically, because they are used during the restore process. Depending on the size of the data to be exported, and mainly on the log retain mode/archive log mode, there are two options for the backup strategy: no archive log mode (circular logging), or archive log mode (log retain). If a database is not in log retain mode and it is hacked or damaged in any way, then a point-in-time recovery is not possible and data will be lost.

In order to avoid common problems regarding failures in the production environment, the main data warehouse database should be in log retain mode (archive log mode). The locations of the software distribution, logs, data files, etc. should be separated for resilience. Reconstructing the contents of all or part of a database from a backup typically involves two phases: retrieving a copy of the datafile from a backup, and reapplying the changes made to the file since the backup, from the archived and online redo logs, to bring the database to a desired moment after the backup (usually, the present). To restore a datafile or control file from backup is to retrieve the file onto disk from a backup location on tape, disk, or other media, and make it available to the database server. To recover a datafile (also called performing recovery on a datafile) is to take a restored copy of the datafile and apply to it the changes recorded in the database's redo logs. To recover a whole database is to perform recovery on each of its datafiles. In planning the database backup and recovery strategy, a common problem is to anticipate the errors that will arise and put in place the backups needed to recover from them. While there are several types of problem that can halt the normal operation of a database or affect database I/O operations, only two typically require DBA intervention and media recovery: media failure and user error. Instance failures, network failures, and the failure of a statement to execute due to, for instance, exhaustion of some resource such as space in a datafile may require DBA intervention, and might even crash a database instance, but will not generally cause data loss or the need to recover from backup. The appropriate recovery technique following a media failure depends on the files affected and the types of backup available.

No Archive Log Mode/Circular Logging - Restore/Recover Strategy

The restore mechanism is based on the daily ETL load and is realized by applying staging data to the data warehouse database. In this case, a daily backup of the staging database must be performed. The main production database is fully backed up (cold backup) to tape once a week. After the main data warehouse is restored, every instance of the staging database is applied to the data warehouse database.

[Figure: Restore/Recover DW Environment - restore datafiles from tape (backup library), then apply staging data (STGn) to the DW database]
Fig 4: Restore/Recover DW Environment (No Archive Log Mode)

A daily copy to disk (a daily image of the source) of the staging database takes place in normal operation as well, not only when a crash-restore scenario occurs. The restoring mechanism is based on restoring the entire data warehouse database and afterwards applying the necessary staging data from the staging database (which is backed up daily), restored from disk. It is mandatory that at least two copies of the staging database be applied to the main data warehouse database, every day, after the data warehouse database is restored.

Archive Log Mode/Log Retain - Restore/Recover Strategy

The second approach is based on the log retain/archive database mode. The main production database is fully backed up (cold backup) to tape once a week. Archive logs are backed up on a daily basis. The data warehouse database must be recovered up to the moment before the crash by applying the archives.
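A hedged DB2 sketch of this archive-log-mode cycle (the database name DWDB, the paths, and the timestamp are illustrative; Oracle implements the same cycle with RMAN BACKUP, RESTORE, and RECOVER):

  db2 update db cfg for DWDB using LOGARCHMETH1 DISK:/db2/archlogs  -- enable log retain
  db2 backup database DWDB to /backup/dwdb                          -- weekly full backup
  -- after a media failure: restore the weekly image, then roll forward
  -- through the archived logs up to the moment before the crash
  db2 restore database DWDB from /backup/dwdb taken at 20110612120000
  db2 rollforward database DWDB to end of logs and complete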
[Figure: Restore/Recover DW Environment - restore archives and datafiles from tape (backup library), apply the archived logs (ARCn) to the DW database, then apply staging data (STGn) to the DW database]
Fig 5: Restore/Recover DW Environment (Archive Log Mode)

After the main data warehouse is restored, every instance of the staging database is applied to the data warehouse database. The daily copy to disk of the staging database takes place only when a crash-restore scenario occurs. While restoring to a point in time before the crash, the main data warehouse database is not available, and therefore it is necessary to capture the daily image of the source every day. The restoring mechanism is based on restoring the entire data warehouse database and the necessary logs from tape, and afterwards applying the necessary staging data from the staging database restored from disk. It is mandatory that at least two copies of the staging database be applied to the main data warehouse database every day after the system is restored. The backup/recovery strategy must follow not only the infrastructure and software technology context but also the logical database context.

Scalability and High Availability

Without support for scalable hardware environments, the following problems can occur:
- processing is slower, because hardware resources are not maximized;
- application design and hardware configuration cannot be decoupled, and manual intervention and possibly redesign are required for every hardware change;
- scaling on demand is not possible.
Database-level scalability is possible vertically as well as horizontally. With vertical scaling, the server grows by adding system resources, up to its expansion limit. Horizontal scaling, the "scale out" approach at the database level, can be achieved by using Real Application Clusters for Oracle or the DB2 Database Partitioning Feature (DPF) architecture. In a shared-nothing architecture, the database is partitioned among the nodes of a cluster system. IBM DB2 LUW is considered a shared-nothing architecture and is based on the Database Partitioning Feature. In a partitioned environment, a database is distributed across multiple partitions, capable of residing on different machines; each partition, called a database partition, is responsible for a portion of the database's total data.

[Figure: four servers, each attached to its own storage]
Fig 6: Shared Nothing Architecture

In a shared disk database, the database files are logically shared among the nodes of a loosely coupled system, with each instance having access to all data. The shared disk access is accomplished either through direct hardware connectivity or by using an operating system abstraction layer that provides a single view of all the devices on all the nodes.

[Figure: four servers, all attached to shared storage]
Fig 7: Shared Disk Architecture

Oracle Real Application Clusters (RAC) allows multiple database servers to access a single Oracle database in a clustered environment. RAC uses Oracle Clusterware as the infrastructure to bind multiple servers so that they operate as a single system. Each instance has a buffer cache in its System Global Area (SGA). Using Cache Fusion, RAC environments logically combine each instance's buffer cache to enable the instances to process data as if the data resided in a logically combined, single cache.
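On the DB2 side, a minimal sketch of what the shared-nothing DPF layout described above looks like in practice (hostnames, the partition group name, and the table are illustrative assumptions):

  -- db2nodes.cfg: four database partitions across two physical servers
  0 server1 0
  1 server1 1
  2 server2 0
  3 server2 1

  -- spread a partition group over all partitions and distribute a table by hash
  CREATE DATABASE PARTITION GROUP pg_dw ON DBPARTITIONNUMS (0 TO 3);
  CREATE TABLE fact_orders (
    order_id BIGINT NOT NULL,
    cust_id  INTEGER,
    amount   DECIMAL(12,2)
  ) DISTRIBUTE BY HASH (order_id);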
For disaster recovery, DB2 provides HADR capabilities. DB2 High Availability Disaster Recovery (HADR) is a data replication feature that provides a high availability solution for both partial and complete site failures. HADR protects against data loss by replicating data changes from a source database, called the primary, to a target database, called the standby. The architecture consists of two sites: a primary site and, optionally, a secondary disaster recovery site. In case of a node or site failure, a takeover is performed in order to switch to the other site.

Synchronous mode. In synchronous mode, DB2 ensures that the log records being written to disk on the primary server are also written to disk on the standby server before an application receives a successful return code to its commit statement. In this mode, there is a guarantee that no committed transactions will ever be lost, as both servers stay completely in sync.

Near-synchronous mode. In near-synchronous mode, DB2 ensures that the log records being written to disk on the primary server are in memory at the standby server (but perhaps not yet on disk there) prior to notifying an application that its commit statement was successful. In this mode, no transactions will be lost unless both the primary and the standby fail simultaneously.

Asynchronous mode. In asynchronous mode, DB2 writes the log buffer to disk on the primary server and ensures the log buffer has been passed down to the TCP/IP socket to be sent to the standby. In this case, it is possible to lose a committed transaction if the primary fails and the packets containing the log buffer do not reach the standby server prior to a takeover.

Oracle Data Guard is the Oracle feature that allows for the creation of standby databases that can be kept transactionally consistent with a primary database. To achieve this, Oracle ships log buffers (or log files, in some configurations) from the primary server to the standby server, where the log records are replayed on the standby database. The Oracle Data Guard standby types are logical standby and physical standby. In logical standby mode, log records are converted to SQL statements and replayed on the standby database; this more closely resembles DB2's SQL Replication and Q Replication capabilities, and as such it is not discussed in this paper. In physical standby mode, log records are applied using redo logic, which applies the records much in the same fashion as when rolling a database forward through log files.
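Returning to DB2 HADR, a hedged sketch of how such a pair might be configured and started (hostnames, ports, instance name, and database name are illustrative assumptions; the synchronization mode is one of the three described above):

  db2 update db cfg for DWDB using HADR_LOCAL_HOST hostA HADR_LOCAL_SVC 55001
  db2 update db cfg for DWDB using HADR_REMOTE_HOST hostB HADR_REMOTE_SVC 55002
  db2 update db cfg for DWDB using HADR_REMOTE_INST db2inst1 HADR_SYNCMODE NEARSYNC
  db2 start hadr on database DWDB as standby    -- run on the standby (hostB) first
  db2 start hadr on database DWDB as primary    -- then on the primary (hostA)
  db2 takeover hadr on database DWDB            -- on the standby, to switch sites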
Security

IT security standards will be followed during the definition of the AAA (Access, Authentication, and Authorisation) mechanism, which shall be used to guarantee data security. A full user-access, report-level, and data-level security approach must be detailed in the Security Plan. A Security Plan describes:
- who is allowed access to the instance and/or database;
- where and how a user's password will be verified;
- the authority level that a user is granted.
Logical data-level authentication is mainly managed by the front-end (end-user reporting) tool. The Cognos security model supports the distribution of security administration. Because objects in Content Manager, such as folders and groups, can be secured separately, security administration can be assigned to individuals in separate functional areas of the organization. The following authorization services are set up in the Cognos Server reporting layer. Permissions are related to the users, groups, and roles defined in third-party authentication providers. Permissions define access rights to objects, such as directories, folders, and other content, for each user, group, or role; they also define the activities that can be performed with these objects. Cognos authorization assigns permissions to:
- groups and roles created in the Cognos namespace in the Content Manager;
- entire namespaces, users, groups, and roles created in third-party authentication providers.
The users accessing the Data Warehouse environment should be split based on their main activities and the types of environment on the database:
- DBA users, for administrative purposes;
- application users, for running ETL processes;
- reporting users.
Both the DB2 and Oracle engines can assign specific tablespaces and privileges on database objects to data warehouse users. In DB2, a user must first be created at the operating-system level before it exists at the database level.

RESOURCES

C. Ballard, D. M. Farrell, A. Gupta, C. Mazuela, S. Vohnik, Dimensional Modeling in a Business Intelligence Environment, ibm.com/redbooks, 2006
W. Chen, A. N. Choi, M. Greenstein, S. J. Martin, F. McArthur, C. Pinto, A. V. Sammartino, N. Sokolof, Oracle to DB2 Conversion Guide for Linux, UNIX, and Windows, ibm.com/redbooks, 2007
IBM DB2 Database for Linux, UNIX, and Windows Information Center, http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp, 2011
R. Youngs, D. Redmond-Pyle, P. Spaas, E. Kahan, A Standard for Architecture Description, IBM Systems Journal, Vol. 38, No. 1, 1999
IBM Software Group, Technical Comparison of DB2 HADR and Oracle Data Guard, IBM Corporation, 2005
IBM Software Group, Why Should You Care About the Cost of Your High Availability Solution?, IBM Corporation, 2004
J. Colquhoun, Data Warehouse Review, 2004
Oracle Database Performance Tuning Guide, 9i Release 2 (9.2), A96533-02, Oracle Documentation, 2002
Oracle Database Concepts, 11g Release 2 (11.2), E16508-05, Oracle Documentation, 2010
Oracle Database Concepts, 10g Release 2 (10.2), B14220-02, Oracle Documentation, 2005
Oracle Database Backup and Recovery Basics, 10g Release 2 (10.2), B14192-02, Oracle Documentation, 2005
Oracle Database Performance Tuning Guide, 10g Release 2 (10.2), B14211-01, Oracle Documentation, 2005
Technical Comparison of Oracle Real Application Clusters 11g vs. IBM DB2 v9 for Linux, Unix, and Windows, Oracle White Paper, 2009