Transcript
Maximize WebFOCUS Performance with Hyperstage
Louisville User Group Meeting, April 25, 2012
Lori Pieper

Agenda
- The "Big Data" Business Challenge
- Pivoting Your Perspective
- Introducing WebFOCUS Hyperstage
- How does it work?
- So what's the big deal?
- Demonstration
- Wrap Up and Q&A

The "Big Data" Business Challenge
Copyright 2007, Information Builders.

Data Warehousing Challenges
- Limited resources and budget
- More data, more data sources: real-time data, multiple databases, external sources
- More kinds of output needed by more users, more quickly
Traditional data warehousing is:
- Labor intensive: heavy indexing, aggregations, and partitioning
- Hardware intensive: massive storage, big servers
- Expensive and complex
IT managers typically try to mitigate slow response times as follows.

How Performance Issues Are Typically Addressed, by Pace of Data Growth
(Flattened survey bar chart comparing high-growth and low-growth organizations; only the leading figures survive extraction cleanly: tune or upgrade existing databases, 66%; upgrade server hardware/processors, 54%.)
Responses surveyed:
- Tune or upgrade existing databases
- Upgrade server hardware/processors
- Upgrade/expand storage systems
- Upgrade networking infrastructure
- Archive older data on other systems
- Don't know / unsure
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.
Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)

Limitations of "Traditional" Solutions
Adding indexes:
- Increases disk space requirements; the sum of the index space requirements can even exceed the source database
- Index management increases load times (the index must be built)
- Predefines a fixed access path
- Reports run slowly if you haven't "anticipated" the reporting needs correctly

Limitations of "Traditional" Solutions
Building OLAP cubes: cube technology has
limited scalability:
- The number of dimensions is limited
- The amount of data is limited
- Cube technology is difficult to update; adding a dimension usually requires a complete rebuild
- Cube builds are typically slow, and a new design results in a new cube
- Reports run slowly if you haven't "anticipated" the reporting needs correctly

Pivoting Your Perspective: Turn Row-based into Column-based

Why is Row-based Limiting for Analytics?
The ubiquity of rows: row-based databases are ubiquitous because so many of our most important business systems are transactional. (Picture a table of 30 columns by 50 million rows.) Row-oriented databases are well suited for transactional environments, such as a call center where a customer's entire record is required when their profile is retrieved, or where fields are frequently updated. But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

Why is Column-based Perfect for Analytics?
Example table:
Employee Id | Name   | Location | Sales
1           | Smith  | New York | 50,000
2           | Jones  | New York | 65,000
3           | Fraser | Boston   | 40,000
4           | Fraser | Boston   | 70,000

Row oriented storage:
(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
- Works well if all the columns are needed for every query
- Efficient for transactional processing if all the data for the row is available

Column oriented storage:
(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
- Works well with aggregate results (sum, count, avg)
- Only the columns that are relevant need to be touched
- Consistent performance with any database design
- Allows for very efficient compression

Why is Column-based Perfect for Analytics?
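The row-oriented versus column-oriented layouts described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Hyperstage code; the table data comes from the example on the slide.

```python
# Illustrative sketch (not Hyperstage code): the employee table from the slide,
# stored two ways.

rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Row-oriented: each record is stored contiguously. Summing sales still
# forces every other field of every record to be read.
total_row_scan = sum(rec[3] for rec in rows)

# Column-oriented: each column is stored contiguously. An aggregate touches
# only the one column it needs.
columns = {
    "id":       [r[0] for r in rows],
    "name":     [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales":    [r[3] for r in rows],
}
total_col_scan = sum(columns["sales"])

assert total_row_scan == total_col_scan == 225000
```

Both layouts give the same answer; the difference is how much data must be read from disk to produce it, which is the I/O argument the slide makes.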
(Figure: the same employee table shown stored two ways, record by record in rows versus value by value in columns.)

Introducing Hyperstage

Introducing WebFOCUS Hyperstage
Hyperstage is a high-performance analytic data store designed to handle business-driven queries on large volumes of data, with minimal IT intervention, achieving outstanding query performance with less hardware, no database tuning, and easy migration.

But really, what is it? Easy to implement and manage, Hyperstage provides the answers to your business users' needs at a price you can afford.

How is it architected?
The Hyperstage Engine combines a columnar database with intelligence we call the Knowledge Grid, plus a Compressor and a Bulk Loader, to deliver fast query responses.
Unmatched administrative simplicity:
- No indexes
- No data partitioning
- No materialized views

Hyperstage adds data compression of 10:1 to 40:1, so you can manage large amounts of data using a much smaller disk footprint.
Powerful data compression:
- Store terabytes of data with only gigabytes of disk space

Hyperstage adds a bulk loader plus an easy-to-use extraction and load tool, called HyperCopy, making data loading a breeze.
Includes embedded ETL:
- Easy and seamless migration of existing analytical databases
- No change in query or application required

How Does it Work?
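Before the engine walkthrough, the compression advantage of columnar storage can be illustrated with a generic compressor. This sketch uses Python's standard zlib, not Hyperstage's own algorithms, and the sample data is invented; the point is only that a column of similar, repetitive values compresses very well.

```python
# Illustrative sketch (generic zlib, NOT Hyperstage's patented algorithms):
# a column stores similar values together, which compresses far better than
# interleaved row data typically does.
import zlib

# One hypothetical 65,536-value "data pack" of a low-cardinality text column.
locations = ["New York", "Boston", "Chicago", "Louisville"] * 16384
raw = "\n".join(locations).encode()

packed = zlib.compress(raw, level=9)
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes (about {ratio:.0f}:1)")
```

On repetitive data like this, even a general-purpose compressor easily exceeds the 10:1 typical ratio cited above; type-aware algorithms like those the slides describe can do better still.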
WebFOCUS Hyperstage Engine: How does it work?
- Column orientation
- Smarter architecture: the Knowledge Grid, statistics and metadata "describing" the super-compressed data
- No maintenance, no query planning, no partition schemes: easy "load and go"
- Data Packs: data stored in manageably sized, highly compressed packs, using compression algorithms tailored to the data type

Data Organization and the Knowledge Grid: Data Packs and Compression
- Each Data Pack contains 65,536 data values
- Compression is applied to each individual Data Pack
- The compression algorithm (patent pending) varies depending on data type and data distribution
- Compression results vary depending on the distribution of data among Data Packs; a typical overall compression ratio seen in the field is 10:1, and some customers have seen results as high as 40:1

Data Organization and the Knowledge Grid
- Data Pack Nodes (DPNs): a separate DPN is created for every Data Pack in the database to store basic statistical information
- Character Maps (CMAPs): every Data Pack that contains text gets a matrix recording the occurrence of every possible ASCII character
- Histograms: created for every Data Pack that contains numeric data, with 1,024 MIN-MAX intervals
- Pack-to-Pack Nodes (PPNs): track relationships between Data Packs when tables are joined, so query performance gets better as the database is used
- The Knowledge Grid layer is about 1% of the compressed volume

WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees
WHERE salary > 50000
  AND age < 65
  AND job = 'Shipping'
  AND city = 'Louisville';
For each column in the predicate (salary, age, job, city), every Data Pack is classified as "all values match", "completely irrelevant", or "suspect".

WebFOCUS Hyperstage Example: salary > 50000
(The same query is evaluated step by step against the Knowledge Grid.)
1.
Find the Data Packs with salary > 50000.

The same query is repeated on each subsequent slide as the steps accumulate:

WebFOCUS Hyperstage Example: age < 65
2. Find the Data Packs that contain age < 65.

WebFOCUS Hyperstage Example: job = 'Shipping'
3. Find the Data Packs that have job = 'Shipping'.

WebFOCUS Hyperstage Example: city = 'Louisville'
4. Find the Data Packs that have city = 'Louisville'.

WebFOCUS Hyperstage Example: Eliminate Pack Rows
5. Eliminate all rows that have been flagged as irrelevant; whole packs are ignored.

WebFOCUS Hyperstage Example: Decompress and Scan
6. Finally, identify the packs that still need to be examined; only those packs are decompressed and scanned.

Hyperstage: So What's the Big Deal?
- No indexes
- No partitions
- No views
- No materialized aggregates
Value proposition:
- Low IT overhead
- Reduced I/O = faster response times
- Ease of implementation, fast time to market
- Less hardware, lower TCO
- "Load and go"

Some Real World Results
- Insurance company: query performance issues with SQL Server (insurance claims analysis). Compression achieved 40:1; most queries run 3x faster in Hyperstage.
- Large bank: query performance issues with SQL Server (web traffic analysis). Compression achieved 10:1; queries that ran for 10 to 15 minutes in SQL Server ran in sub-seconds in Hyperstage.
- Government application: query performance issues with Oracle (federal loan/grant tracking). Compression achieved 15:1; queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in Hyperstage.

Demonstration

Q&A
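As a closing sketch, the pack-elimination walkthrough above can be expressed in a few lines. This is a hypothetical illustration of the Knowledge Grid idea, not Hyperstage internals: per-pack min/max statistics (the Data Pack Nodes) let the engine classify packs before decompressing anything. The pack boundary values below are invented for the example.

```python
# Hypothetical sketch of Knowledge Grid pack pruning (not Hyperstage code):
# classify each Data Pack for the predicate `value > lo` using only its
# stored min/max statistics.

def classify(pack_min, pack_max, lo):
    """Classify one pack for the predicate `value > lo`."""
    if pack_min > lo:
        return "all match"    # answered from metadata alone, no decompression
    if pack_max <= lo:
        return "irrelevant"   # the whole pack is skipped
    return "suspect"          # only these packs get decompressed and scanned

# Illustrative per-pack (min, max) salary statistics, not from the slides:
salary_packs = [(20000, 45000), (52000, 90000), (30000, 80000)]
labels = [classify(lo_, hi_, 50000) for lo_, hi_ in salary_packs]
print(labels)
# -> ['irrelevant', 'all match', 'suspect']
```

Intersecting these per-column classifications across salary, age, job, and city is what lets the engine ignore most packs outright and decompress only the "suspect" ones, which is the walkthrough the slides build up step by step.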