Data warehousing with MySQL
By Anand Pandey
[Diagram: MySQL, MS-SQL, Oracle, DB2 and flat files feeding into MySQL]
Copyright 2004 MySQL AB
The World’s Most Popular Open Source Database
Agenda
• Introduction
• Free and Open Source Software
• Data Warehousing application
• Extraction, Transformation and Loading
• Partitioning and Storage Engine
• Configuration Parameters
• Business Intelligence
• Summary
• Q&A
Introduction
MySQL AB develops and markets a family of
high-performance, affordable database servers and
tools.
MySQL is a key part of LAMP (Linux, Apache,
MySQL, PHP / Perl / Python), a fast-growing open
source enterprise software stack.
Anand Pandey, Senior Consultant, MySQL Inc.
Josh Chamas, Senior Consultant, MySQL Inc.
Free and Open Source Software
MySQL is licensed under the GPL.
The GPL is a Free and Open Source
Software (FOSS) license that grants
licensees many rights to the software under
the condition that, if they choose to share
the software, or software built with
GPL-licensed software, they share it under the
same liberal terms.
Free and Open Source Software
Quid Pro Quo
MySQL has a dual license that works on a
quid pro quo basis: if your software is free
and open source, MySQL is free. If your software
is closed source, you need a commercial license.
Free and Open Source Software
Advantages of Open Source
• MySQL has an active installed base of more
than 5 million.
• New releases are downloaded immediately by
users, who provide early feedback on bugs and
features.
• Access to source code
• Write your own features or a proprietary
storage engine
• Freedom!
Data Warehousing application
A data warehouse is a relational database.
It is designed for query and analysis rather
than for transaction processing.
It enables an organization to consolidate data
from several sources.
Data Warehousing application
Why DWH?
• How to measure and manage your
company's intangible assets?
• How to leverage its data for competitive
advantage?
• How to measure the sales performance of
the previous year?
• Which department produced the maximum
profits in the current financial year?
SOLUTION: Create and manage a data
warehouse.
Data Warehousing application
                           DWH                  OLTP
Type of activities         Query and extract    Transaction
Data structure             Multidimensional     Third normal form
Summary and derived data   Common               Rare
Index                      Many                 Few
Duplicated data            Denormalized DBMS    Normalized DBMS
Join                       Few                  Many
Data Warehousing application
A Typical Data Warehouse
[Diagram: four stages from left to right.
Data Source (MySQL, Oracle, MS-SQL, flat file);
Staging Area (staging database);
DWH (AWH, SWH, metadata);
BI / OLTP (mining, analysis, reporting).]
Data Warehousing application
DWH Design
• Identify the important things (entities),
their properties (attributes) and the
relationships among them (ER modeling).
• Summary data is more important than
individual transactions (physical and
logical design).
• Use a modeling tool such as ERwin; many
others are available.
Data Warehousing application
DWH Design
• Most common schemas
   • Third Normal Form schema
   • Star schema
   • Snowflake schema
• Most popular table structure
   • Fact table
   • Dimension tables
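
As a rough illustration of the fact and dimension table structure named above, here is a minimal star schema sketch. The slides do not show a schema, so every table and column name (dim_date, dim_department, fact_sales and their columns) is a hypothetical choice. The query at the end answers one of the "Why DWH?" questions under those assumptions.

```sql
-- Hypothetical star schema: one fact table keyed to two dimension tables.
-- All table and column names are illustrative, not taken from the slides.
CREATE TABLE dim_date (
    date_id    INT NOT NULL PRIMARY KEY,
    full_date  DATE NOT NULL,
    month_num  TINYINT NOT NULL,
    year_num   SMALLINT NOT NULL
) ENGINE=MyISAM;

CREATE TABLE dim_department (
    department_id   INT NOT NULL PRIMARY KEY,
    department_name VARCHAR(50) NOT NULL
) ENGINE=MyISAM;

-- The fact table holds the measures plus one key per dimension.
CREATE TABLE fact_sales (
    date_id       INT NOT NULL,
    department_id INT NOT NULL,
    units_sold    INT NOT NULL,
    profit        DECIMAL(12,2) NOT NULL,
    INDEX (date_id),
    INDEX (department_id)
) ENGINE=MyISAM;

-- Which department produced the most profit in 2004?
SELECT d.department_name, SUM(f.profit) AS total_profit
FROM fact_sales AS f
JOIN dim_date AS dt ON dt.date_id = f.date_id
JOIN dim_department AS d ON d.department_id = f.department_id
WHERE dt.year_num = 2004
GROUP BY d.department_name
ORDER BY total_profit DESC
LIMIT 1;
```

In a snowflake schema the dimension tables would themselves be normalized into further tables; the fact table would look the same.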
Extraction, Transformation and Loading
[Diagram: ETL flow with the stages Extract, Load,
Transform, Storage and Performance.
Data sources feed staging tables; MERGE and bulk
INSERT load the data into MERGE tables (SWH, AWH);
indexes, memory (HEAP) tables, views and summary
tables serve the OLTP / BI users.]
Extraction, Transformation and Loading
• Staging database
• "LOAD DATA INFILE ..." command
• Merging of SQL statements
• Segregating information
• View enhancements
• Index enhancement
• Memory manipulation
Extraction, Transformation and Loading
Staging Area and its benefits
• Relational table structures are flattened
in the staging area to support extract
processes.
• Data is loaded first into a temporary
table and then into the main database tables.
• Reduces the space required during ETL.
• Data can be distributed to any number of
data marts.
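
The two-step load described above (temporary table first, main tables second) might look roughly like the following sketch. The table names staging_sales and sales_main, their columns, and the file path are all hypothetical, and the file must be readable by the MySQL server for LOAD DATA INFILE to work.

```sql
-- Hypothetical main table; names and columns are illustrative only.
CREATE TABLE sales_main (
    sale_date     DATE NOT NULL,
    department_id INT NOT NULL,
    units_sold    INT NOT NULL,
    profit        DECIMAL(12,2) NOT NULL,
    INDEX (sale_date)
) ENGINE=MyISAM;

-- Flat staging table with no indexes, so the bulk load stays cheap.
CREATE TABLE staging_sales (
    sale_date     DATE,
    department_id INT,
    units_sold    INT,
    profit        DECIMAL(12,2)
) ENGINE=MyISAM;

-- Bulk-load a comma-separated extract. The path is a placeholder.
LOAD DATA INFILE '/tmp/sales_extract.csv'
INTO TABLE staging_sales
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(sale_date, department_id, units_sold, profit);

-- Copy only complete rows into the main table.
INSERT INTO sales_main (sale_date, department_id, units_sold, profit)
SELECT sale_date, department_id, units_sold, profit
FROM staging_sales
WHERE sale_date IS NOT NULL
  AND department_id IS NOT NULL
  AND units_sold IS NOT NULL
  AND profit IS NOT NULL;

-- Empty the staging table so it is ready for the next extract.
TRUNCATE TABLE staging_sales;
```

Keeping the cleanup filter in the INSERT ... SELECT step means bad rows never reach the indexed main table, which is one plausible reading of why a staging area helps.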
Partitioning and Storage Engine
The MERGE Table
• A collection of identical MyISAM
tables used as one.
• You can use SELECT, DELETE,
UPDATE, and INSERT on the
collection of tables.
• Use it for large tables.
• If you DROP the MERGE table, you drop
only the MERGE specification; the
underlying tables are untouched.
• Advantages: manageability and
performance.
[Diagram: MERGE table "Sales for Yr'04" over the
monthly SALES tables Aug'04, Sep'04 and Oct'04]
Partitioning and Storage Engine
MERGING based on month as Range
[Diagram: monthly tables JUN2004, JUL2004, AUG2004,
SEP2004 and OCT2004 merged as one range,
JUN2004 to OCT2004]
Partitioning and Storage Engine
MERGE Table Example
mysql> CREATE TABLE jan04 (
    ->   a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   message CHAR(20)) ENGINE=MyISAM;
mysql> CREATE TABLE feb04 (
    ->   a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   message CHAR(20)) ENGINE=MyISAM;
mysql> CREATE TABLE year04 (
    ->   a INT NOT NULL AUTO_INCREMENT,
    ->   message CHAR(20), INDEX(a))
    ->   ENGINE=MERGE UNION=(jan04,feb04) INSERT_METHOD=LAST;
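
To show how the tables from this example might be used afterwards, here is a short sketch. It assumes jan04, feb04 and year04 already exist as defined on the slide; the sample rows and the extra table mar04 are hypothetical additions for illustration.

```sql
-- Rows can go straight into the underlying monthly tables...
INSERT INTO jan04 (message) VALUES ('first'), ('second');
INSERT INTO feb04 (message) VALUES ('third');

-- ...and be read back together through the MERGE table.
SELECT a, message FROM year04;

-- With INSERT_METHOD=LAST, an insert through year04 lands in the
-- last table of the UNION list (feb04 here).
INSERT INTO year04 (message) VALUES ('fourth');

-- Adding a month: create an identical MyISAM table (mar04 is a
-- hypothetical name), then redefine the UNION list.
CREATE TABLE mar04 (
    a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    message CHAR(20)
) ENGINE=MyISAM;

ALTER TABLE year04 UNION=(jan04, feb04, mar04);
```

Dropping an old month works the same way in reverse: redefine the UNION list without that table, after which the monthly table can be archived or dropped on its own.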
Partitioning and Storage Engine
MyISAM Storage Engine
• Supports MERGE tables.
• Supports full-text indexing.
• The "INSERT DELAYED ..." option is very useful when
clients can't wait for the INSERT to complete.
Rows from many clients are bundled together and
written in one block.
• Compress MyISAM tables with "myisampack" to
take up much less space.
• Benefits from higher performance on SELECT
statements.
Partitioning and Storage Engine
Restrictions on MERGE tables
• You can use only identical MyISAM tables
for a MERGE table.
• MERGE tables use more file descriptors. If
10 clients are using a MERGE table that
maps to 10 tables, the server uses (10*10)
+ 10 file descriptors.
• Key reads are slower. When you read a
key, the MERGE storage engine needs to
issue a read on all underlying tables to
check which one most closely matches
the given key.
Partitioning and Storage Engine
my.cnf parameters for DWH (example)
• key_buffer = 1G
• myisam_sort_buffer_size = 256M
• sort_buffer = 5M
• query_cache_type = 1
• query_cache_size = 100M
key_buffer is the most important one: it sets how
much memory MySQL uses to cache MyISAM index blocks.
Business Intelligence
Using MySQL database server
• Drastically reduce information retrieval time by
distributing data across replicated clusters, which
enables parallel processing.
• Tighter storage format (3 TB squeezed to 1 TB)
• Aggregate huge amounts of data and deliver
reports for OLAP
• Relieve overloaded OLTP databases
• Availability, scalability and throughput for the
most demanding applications, and of course
affordability
Summary
• Free and Open Source under GPL
• MyISAM Storage Engine
• No Transactional Overhead
• MERGE Table
• Tighter storage format
• Highly efficient
Any Questions?
Anand and Josh