WebFOCUS Hyperstage Overview
Peter Azzarello
April 11, 2012
IB Toronto User Forum Summit 2012

WebFOCUS: Higher Adoption & Reuse with Lower TCO
• Visualization & Mapping
• Mobile Applications
• Data Updating
• Predictive Analytics
• Enterprise Search
• High Performance Data Store
• Performance Management
• Reporting
• Query & Analysis
• MS Office & e-Publishing
• Dashboards
• Information Delivery
• Business to Business
• Data Warehouse & ETL
• Data Profiling & Data Quality
• Master Data Management
• Business Activity Monitoring
Extensions to the WebFOCUS platform allow you to build more application types at a lower cost.

The Business Challenge: Big Data

Copyright 2007, Information Builders. Slide 3

Today's Top Data-Management Challenge: Big Data and Machine-Generated Data
[Chart: data storage over time, machine-generated data vs. human-generated data]
IT managers try to mitigate these response times ...

How Performance Issues are Typically Addressed, by Pace of Data Growth
[Bar chart: share of respondents using each approach, High Growth vs. Low Growth. Categories: tune or upgrade existing databases; upgrade server hardware/processors; upgrade/expand storage systems; upgrade networking infrastructure; archive older data on other systems; don't know / unsure]
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.
Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)

Data Warehousing Challenges
• Limited resources and budget
• More data, more data sources: real-time data, multiple databases, external sources
• More kinds of output needed by more users, more quickly
Traditional data warehousing:
• Labor intensive: heavy indexing, aggregations and partitioning
• Hardware intensive: massive storage; big servers
• Expensive and complex

Data Warehousing Challenges
New demands:
• Larger transaction volumes driven by the internet
• Impact of cloud computing
• More -> Faster -> Cheaper
Data warehousing matures:
• Near real-time updates
• Integration with master data management
• Data mining using discrete business transactions
• Provision of data for business-critical applications
Early data warehouse characteristics:
• Integration of internal systems
• Monthly and weekly loads
• Heavy use of aggregates

Classic Approaches to Deal with Large Data
• Indexes
• Cubes/OLAP

Limitations of Indexes
• Increased space requirements
• Sum of index space requirements can exceed the source DB
• Index management
• Increases load times
• Building the index
• Predefines a fixed access path

Limitations of OLAP
• Cube technology has limited scalability
• Number of dimensions is limited
• Amount of data is limited
• Cube technology is difficult to update (add dimension)
• Usually requires a complete rebuild
• Cube builds are typically slow
• New design results in a new cube

Easy Migration to Hyperstage
• Most cubes will be fed from a relational source
• Commonly, that relational source is a star schema
• The source star schema can be migrated directly to Hyperstage
• WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema

Pivoting Your Perspective: Columnar Technology

The Limitation of Rows
These Solutions Contribute to Operational Limitations
1. Impediments to business agility: Organizations often must wait for DBAs to create indexes or other tuning structures, thereby delaying access to data. In addition, indexes significantly slow data-loading operations and increase the size of the database, sometimes by a factor of two.
2. Loss of data and time fidelity: IT generally performs ETL operations in batch mode during non-business hours.
Such transformations delay access to data and often result in mismatches between operational and analytic databases.
3. Limited ad hoc capability: Response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (where DBAs have not tuned the database in advance) can result in unacceptable response times, and may even fail to complete.
4. Unnecessary expenditures: Attempts to improve performance using hardware acceleration and database tuning schemes raise the capital costs of equipment and the operational costs of database administration. Further, the added complexity of managing a large database diverts operational budgets away from more urgent IT projects.

The Limitation of Rows
The Ubiquity of Rows
[Diagram: a table of 30 columns by 50 million rows]
Row-based databases are ubiquitous because so many of our most important business systems are transactional.
Row-oriented databases are well suited for transactional environments, such as a call center where a customer's entire record is required when their profile is retrieved and/or when fields are frequently updated.
But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

Pivoting Your Perspective: Columnar Technology

Employee Id  Name    Location  Sales
1            Smith   New York  50,000
2            Jones   New York  65,000
3            Fraser  Boston    40,000
4            Fraser  Boston    70,000

Row Oriented
(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
• Works well if all the columns are needed for every query
• Efficient for transactional processing if all the data for the row is available

Column Oriented
(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression

Pivoting Your Perspective: Columnar Technology
[Diagram: the same employee table shown twice, once grouped as "Data stored in rows" and once grouped as "Data stored in columns"]

Introducing WebFOCUS Hyperstage

The Hyperstage Mission
Improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.

Introducing WebFOCUS Hyperstage: What is it?
The WebFOCUS Hyperstage high-performance analytic data store is designed to handle business-driven queries on large volumes of data without IT intervention.
Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.

Introducing WebFOCUS Hyperstage: How is it architected?
Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.
Hyperstage Engine components: Knowledge Grid, Compressor, Bulk Loader
• Unmatched administrative simplicity
• No indexes
• No data partitioning
• No manual tuning

Introducing WebFOCUS Hyperstage: What does this mean for customers?
• Self-managing: 90% less administrative effort
• Low-cost: more than 50% less than alternative solutions
• Scalable, high-performance: up to 50 TB using a single industry-standard server
• Fast queries: ad hoc queries are as fast as anticipated queries, so users have total flexibility
• Compression: data compression of 10:1 to 40:1 means a lot less storage is needed, and may even allow the entire database to fit in memory

Introducing WebFOCUS Hyperstage: How does it work?
Creates information (metadata) about the data and, upon load, automatically:
o Stores it in the Knowledge Grid (KG)
o The KG is loaded into memory
o The KG is less than 1% of the compressed data size
Uses the metadata when processing a query to eliminate or reduce the need to access data:
o The less data that needs to be accessed, the faster the response
o Sub-second responses when answered by the KG
Architecture benefits:
o No need to partition data, create/maintain indexes or projections, or tune for performance
o Ad hoc queries are as fast as static queries, so users have total flexibility

WebFOCUS Hyperstage Runtime Architecture
[Architecture diagram. Labels: WebFOCUS; WebFOCUS Pro Server; Hyperstage Adapter; WebFOCUS Server; Hyperstage Engine (Knowledge Grid, Compressor, Bulk Loader); Hypercopy; MySQL; Hyperstage Server]

WebFOCUS Hyperstage Engine: How does it work?
• Column orientation
• Smarter architecture: no maintenance, no query planning, no partition schemes, no DBA
• Knowledge Grid: statistics and metadata "describing" the super-compressed data
• Data Packs: data stored in manageably sized, highly compressed data packs
• Data compressed using algorithms tailored to data type

Summary

Business Intelligence: Meeting Requirements

WebFOCUS Hyperstage: The Big Deal
• No indexes
• No partitions
• No views
• No materialized aggregates
Value proposition:
• Low IT overhead
• Allows for autonomy from IT
• Ease of implementation
• Fast time to market
• Less hardware
• Lower TCO
No DBA required!

What's it look like?
Pay no attention to that man behind the curtain.
    CREATE FILE baseapp/pa_inventory_ind_t DROP
    -RUN
    BULKLOAD baseapp/pa_inventory_ind_t FOR SQLINLD
    INV_CODE; TYPE; CATEGORY; NAME; MODEL; MEASURE1_INV; MEASURE2_INV; MEASURE3_INV;

    JOIN SYMBOLS.SYMBOLS.SYMBOL IN SYMBOLS TO MULTIPLE QUOTES_2B.QUOTES_2B.SYMBOL IN QUOTES_2B TAG J0 AS J0
    END
    TABLE FILE SYMBOLS
    PRINT SYMBOL CLOSE_DATE CLOSE_PRICE VOLUME OPEN_PRICE
    WHERE ( SYMBOL EQ '&SYMBOL.(<MSFT,MSFT>).SYMBOL.' )
    AND ( CLOSE_DATE GT '&START_DATE.(<2000-03-01,2000-03-01>).yyyy-mm-dd.' )
    AND ( CLOSE_DATE LT '&END_DATE.(<2000-03-31,2000-03-31>).yyyy-mm-dd.' );
    ON TABLE SET PAGE-NUM NOLEAD
    ON TABLE NOTOTAL
    ON TABLE PCHOLD FORMAT HTML
    ON TABLE SET HTMLCSS ON
    ON TABLE SET STYLE *
    INCLUDE = endeflt, $
    ENDSTYLE
    END

Example: FOCUS to Hyperstage Compression
243,639 rows

Q&A

STAR SCHEMA CONSIDERATIONS
Leverage the Knowledge Grid
• Do constrain the fact table directly
• Do use sub-selects instead of joins
• Do use date-based constraints as much as possible
• Do add additional columns to create useful knowledge nodes

Everyone wants to be a Star
Adding as many WHERE conditions as you can to your SQL increases the chance that Knowledge Grid statistics can be used to improve the performance of your queries.
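
The advice about WHERE conditions can be illustrated with a small sketch. The slides say only that the Knowledge Grid holds "statistics and metadata" about compressed data packs and uses them to avoid reading data. The Python below assumes, purely for illustration, that the statistics are a minimum and maximum per pack; the names DataPack, build_packs and count_in_range are hypothetical and are not part of Hyperstage or WebFOCUS. It shows how a date-range constraint lets a query skip some packs entirely, answer others from the statistics alone, and open only the packs that straddle the range.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataPack:
    """One fixed-size chunk of a single column plus summary statistics.

    Assumption: min/max per pack stands in for the Knowledge Grid metadata.
    """
    values: List[int]
    min_value: int
    max_value: int


def build_packs(column: List[int], pack_size: int) -> List[DataPack]:
    """Split a column into packs and record min/max for each pack at load time."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(DataPack(chunk, min(chunk), max(chunk)))
    return packs


def count_in_range(packs: List[DataPack], low: int, high: int) -> Tuple[int, int]:
    """Count values v with low <= v <= high. Returns (count, packs_opened)."""
    count = 0
    opened = 0
    for pack in packs:
        if pack.max_value < low or pack.min_value > high:
            continue  # statistics prove no value can match: skip the pack
        if low <= pack.min_value and pack.max_value <= high:
            count += len(pack.values)  # every value matches: no need to open it
            continue
        opened += 1  # pack straddles the range: its values must be read
        count += sum(1 for v in pack.values if low <= v <= high)
    return count, opened


# Hypothetical CLOSE_DATE column stored as yyyymmdd integers, loaded in date order.
close_dates = [
    20000225, 20000228, 20000229, 20000301,
    20000302, 20000303, 20000306, 20000307,
    20000330, 20000331, 20000403, 20000404,
    20000405, 20000406, 20000407, 20000410,
]
packs = build_packs(close_dates, pack_size=4)
matches, opened = count_in_range(packs, 20000301, 20000331)
print(matches, opened)  # prints "7 2"
```

With four packs, only two are opened: one is skipped outright and one is counted from its statistics. This is the sense in which the less data that needs to be accessed, the faster the response. Real pack sizes and statistics would differ; the sketch shows only the pruning logic.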