Oracle9i: The Data Warehouse Database
OracleWorld 2002
Presentation #31348
Ian Abramson
IAS Inc.
Agenda
• The database
– Partitioning
– External Tables
– Parallel Execution
– Materialized Views
– The Optimizer
– Bitmap join indexes
• The SQL
– CUBE and ROLLUP
– Rolling Windows
– Multi-table Inserts
• OLAP Overview
• Data Mining Overview
Oracle9i for Data Warehousing
[Figure: timeline of data warehousing features across Oracle 7.3, Oracle 8.0, Oracle8i, and Oracle9i. Features named on the slide:]
• Hash Join
• Bitmap Indexes
• Parallel-Aware Optimizer
• Partition Views
• Partitioned Tables and Indexes
• Instance Affinity: Function Shipping
• Partition Pruning
• Parallel Union All
• Asynchronous Read-Ahead
• Parallel Index Scans
• Histograms
• Hash and Composite Partitioning
• Parallel Insert, Update, Delete
• Anti-Join
• Resource Manager
• List Partitioning
• Parallel Bitmap Star Query
• Progress Monitor
• Bitmap Join Index
• Parallel ANALYZE
• Adaptive Parallel Query
• Dynamic Aggregation Buffersize
• Parallel Constraint Enabling
• Server-based Analytic Functions
• Materialized Intermediate Results
• Server Managed Backup/Recovery
• Materialized Views
• Grouping Sets
• Point-in-Time Recovery
• Transportable Tablespaces
• Concatenated Grouping Sets
• Direct Loader API
• Aggregate Pruning
• Functional Indexes
• Partition-wise Joins
• New Analytic Functions
• Security Enhancements
• Self-Tuning Execution Memory
• System Managed Undo
• Dynamic Resizing of Buffer Pool
• ETL Infrastructure
• and much more ...
Oracle9.2i
Complete e-Business Intelligence Infrastructure
Oracle9.2i Database
[Diagram: Relational, OLAP, Data Mining, and ETL components sharing common Metadata]
Oracle9.2i Application Server
Runs All Your Business Intelligence Applications
[Diagram: Portal, Query & Reporting, BI Components, and Real-time Personalization ("Hello! We have recommendations for you.") sharing common Metadata]
Oracle9i R2 DW Architecture
[Diagram: Oracle9iDB (Data Warehousing, ETL, OLAP, Data Mining) and Oracle9iAS (Portal, Query & Reporting, BI Components, Real-Time Personalization) sharing common Metadata]
The Old Way: Everything is a Different Product
[Diagram: Data Sources feeding a separate Data Integration Engine, Data Warehouse Engine, OLAP Engine, and Mining Engine]
The New Way: Oracle9i
[Diagram: Oracle9i combining Data Warehousing, ETL, OLAP, and Data Mining]
• All aspects of architecture are integrated
The Oracle9i DW Database
Computers are useless. They only give you answers.
Picasso
Partitioning Advantages
• Separates data into separate pieces
• Partition key defined at creation
• Partition pruning
• Partition-wise joins
• May partition
– Tables
– Indexes
Traditional vs. Partitions
[Diagram: single-table approach compared with partitioned-table approach]
More Advantages
• Partitions - separate physical entities
– physical attributes (PCTFREE, PCTUSED, INITRANS, MAXTRANS) may vary for different partitions of the same table or index
• Different partitions - different tablespaces
– minimizes the impact of data corruption
– independent backup and recovery of each partition
– balances the I/O load
Partition Options
• Range
• Hash
• Composite
• List
• Range-List
Range Partitioning
• Range partitioning - maps rows to partitions based on ranges of column values
CREATE TABLE Sales_by_department
  (Department NUMBER,
   SalesId    NUMBER,
   Amount     NUMBER)
PARTITION BY RANGE (Department)
  (PARTITION single_digits VALUES LESS THAN (10)
     TABLESPACE sd_low,
   PARTITION double_digits VALUES LESS THAN (100)
     TABLESPACE dd_middle,
   PARTITION multiple_digits VALUES LESS THAN (MAXVALUE)
     TABLESPACE md_high);
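A minimal sketch of partition pruning against the Sales_by_department table defined above. The predicate value and the aggregate are illustrative assumptions, not taken from the slides.

```sql
-- The predicate on the partition key (Department) lets the optimizer
-- skip double_digits and multiple_digits and read only single_digits.
SELECT SalesId, Amount
FROM   Sales_by_department
WHERE  Department < 10;

-- A partition can also be addressed explicitly by name.
SELECT SUM(Amount)
FROM   Sales_by_department PARTITION (single_digits);
```

Both statements touch the same partition; the first relies on the optimizer to prune, the second names the partition directly.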
Hash Partitioning
• Maps rows to partitions based on a hash value of the partitioning key (Oracle determines the hash internally)
CREATE TABLE Sales_by_department
  (Department NUMBER,
   SalesId    NUMBER,
   Amount     NUMBER)
PARTITION BY HASH (SalesId)
  (PARTITION hash_name1 TABLESPACE hash_name1_tbls,
   PARTITION hash_name2 TABLESPACE hash_name2_tbls,
   PARTITION hash_name3 TABLESPACE hash_name3_tbls);
Composite Partitioning
• Partitions data using the range method, and within each partition sub-partitions it using the hash method
CREATE TABLE Sales_by_department
  (Department NUMBER,
   SalesId    NUMBER,
   Amount     NUMBER)
PARTITION BY RANGE (Department)
SUBPARTITION BY HASH (SalesId)
SUBPARTITIONS 2 STORE IN (sub1_tbls, sub2_tbls)
  (PARTITION single_digits VALUES LESS THAN (10)
     TABLESPACE sd_tbls,
   PARTITION double_digits VALUES LESS THAN (100)
     TABLESPACE dd_tbls,
   PARTITION multiple_digits VALUES LESS THAN (MAXVALUE)
     TABLESPACE md_tbls);
List Partitions
CREATE TABLE sales_list (
  salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_state   VARCHAR2(20),
  sales_amount  NUMBER(10),
  sales_date    DATE)
PARTITION BY LIST (sales_state)
  (PARTITION sales_west    VALUES ('California', 'Hawaii'),
   PARTITION sales_east    VALUES ('New York', 'Virginia', 'Florida'),
   PARTITION sales_central VALUES ('Texas', 'Illinois'),
   PARTITION sales_other   VALUES (DEFAULT));
Range-List Partitions
CREATE TABLE quarterly_regional_sales
  (deptno     NUMBER,
   item_no    VARCHAR2(20),
   txn_date   DATE,
   txn_amount NUMBER,
   state      VARCHAR2(2))
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
  (PARTITION q1_2002 VALUES LESS THAN (TO_DATE('1-APR-2002','DD-MON-YYYY'))
     (SUBPARTITION q1_2002_northwest    VALUES ('OR', 'WA'),
      SUBPARTITION q1_2002_southwest    VALUES ('AZ', 'UT', 'NM'),
      SUBPARTITION q1_2002_northeast    VALUES ('NY', 'VT', 'NJ'),
      SUBPARTITION q1_2002_southeast    VALUES ('FL', 'GA'),
      SUBPARTITION q1_2002_northcentral VALUES ('SD', 'WI'),
      SUBPARTITION q1_2002_southcentral VALUES ('OK', 'TX')),
   PARTITION q2_2002 VALUES LESS THAN (TO_DATE('1-JUL-2002','DD-MON-YYYY'))
   ...
Partition Maintenance Functions
• Add
• Drop
• Exchange
• Move
• Split and Merge
• Truncate
• Coalesce (Hash only)
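The maintenance operations listed above might look like the following sketch. The table sales_history (range-partitioned by month, with no MAXVALUE partition), the hash-partitioned table sales_hash, every partition name, and the tablespace archive_tbls are hypothetical names introduced only for illustration.

```sql
-- Add: create a new highest range partition for the next month.
ALTER TABLE sales_history
  ADD PARTITION sales_2002_11
  VALUES LESS THAN (TO_DATE('01-DEC-2002', 'DD-MON-YYYY'));

-- Drop: remove the oldest partition together with its data.
ALTER TABLE sales_history DROP PARTITION sales_2001_11;

-- Truncate: empty a partition but keep its definition.
ALTER TABLE sales_history TRUNCATE PARTITION sales_2001_12;

-- Move: relocate a partition to another tablespace.
ALTER TABLE sales_history MOVE PARTITION sales_2002_01 TABLESPACE archive_tbls;

-- Split: divide one partition into two at a boundary value.
ALTER TABLE sales_history
  SPLIT PARTITION sales_2002_10 AT (TO_DATE('16-OCT-2002', 'DD-MON-YYYY'))
  INTO (PARTITION sales_2002_10a, PARTITION sales_2002_10b);

-- Merge: combine two adjacent range partitions into one.
ALTER TABLE sales_history
  MERGE PARTITIONS sales_2002_10a, sales_2002_10b
  INTO PARTITION sales_2002_10;

-- Coalesce: reduce the number of hash partitions by one.
ALTER TABLE sales_hash COALESCE PARTITION;
```

ADD PARTITION only works here because sales_history has no MAXVALUE partition; with one in place, new ranges would have to be carved out with SPLIT instead.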
Partition Exchange
• Allows you to create data in a separate table and then replace a partition with the table
• Validate or don't validate is the question
• Actually exchanges table and partition
• Nice for archiving
ALTER TABLE sales_transactions
  EXCHANGE PARTITION sales_feb_2000
  WITH TABLE load_sales
  INCLUDING INDEXES
  WITHOUT VALIDATION;
Table altered.
Partition Notes
• Separate partitions - separate tablespaces
• Beware of a MAXVALUE partition in a rolling-window design (new partitions must then be created with SPLIT rather than ADD)
• Use naming conventions
– range partitions --> table_name_YYYY_MM_DD, table_name_tbls_YYYY_MM_DD
• No global indexes for hash partitions
External Tables
• Data resides in operating system files
• Table definition resides in database (SYSTEM tablespace)
• Need to define:
– Directory for files
– Table definition
• Read-only
• No indexes
• Problems exhibit themselves at SELECT time
Oracle9i: ETL Scenario
Oracle8i: Multiple staging tables and SQL statements
– Step 1: Load flat files into staging table
– Step 2: Transform data using function (TRANSFORM) into staging table
– Step 3: Insert and update into target table
Oracle9i: Single SQL statement
Oracle9i: Parallel pipelining of data
External Tables the SQL
• Create directory:
CREATE DIRECTORY data_dir AS 'd:\wkdir';
CREATE DIRECTORY log_dir AS 'c:\TEMP';
External Tables the SQL (2)
CREATE TABLE products_delta
(
  PROD_ID              NUMBER(6),
  PROD_NAME            VARCHAR2(50),
  PROD_DESC            VARCHAR2(4000),
  PROD_SUBCATEGORY     VARCHAR2(50),
  PROD_SUBCAT_DESC     VARCHAR2(2002),
  PROD_CATEGORY        VARCHAR2(50),
  PROD_CAT_DESC        VARCHAR2(2002),
  PROD_WEIGHT_CLASS    NUMBER(2),
  PROD_UNIT_OF_MEASURE VARCHAR2(20),
  PROD_PACK_SIZE       VARCHAR2(30),
  SUPPLIER_ID          NUMBER(6),
  PROD_STATUS          VARCHAR2(20),
  PROD_LIST_PRICE      NUMBER(8,2),
  PROD_MIN_PRICE       NUMBER(8,2)
)
ORGANIZATION EXTERNAL
(
  TYPE oracle_loader
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET US7ASCII
    BADFILE log_dir:'prod_delta.bad_xt'
    LOGFILE log_dir:'prod_delta.log_xt'
    FIELDS TERMINATED BY "|" LDRTRIM
  )
  LOCATION ('prodDelta.dat')
)
REJECT LIMIT UNLIMITED NOPARALLEL;
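Once defined, the external table can be queried like any read-only table. The sketch below assumes the products_delta definition above; products_stage is a hypothetical table name used only for illustration.

```sql
-- The flat file is read at query time, so a missing file or a bad
-- record shows up here rather than when the table was created.
SELECT COUNT(*) FROM products_delta;

-- Copy the file contents into a regular table for further processing.
CREATE TABLE products_stage AS
SELECT * FROM products_delta;
```

Rejected rows are written to the bad file and the details to the log file named in the ACCESS PARAMETERS clause.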
Parallel Execution
• Do more work at the same time
• Best for:
– Large table scans
– Creation of large indexes
– Partitioned table scans
– Bulk inserts, updates and deletes
– Aggregations and summarizations
• System characteristics
– SMP, MPP, Clusters
– Sufficient I/O bandwidth
– Under-utilized CPU
– Sufficient memory
Getting Parallel to Work
• In the init.ora
parallel_automatic_tuning=TRUE
parallel_max_servers=n
– 2 * DOP * # of concurrent users
parallel_min_servers=n
– 0 or ?? (your choice)
large_pool_size & shared_pool_size
mem in bytes = (3 x size x users x groups x connections)
• SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE
• USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP
• GROUPS = the number of query server process groups used for each query
– A simple SQL statement requires only one group. However, if your queries involve subqueries which will be processed in parallel, then Oracle uses an additional group of query server processes.
• CONNECTIONS = (DOP^2 + 2 x DOP)
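As a worked instance of the sizing formulas above, assume (purely for illustration, none of these values come from the slides) a PARALLEL_EXECUTION_MESSAGE_SIZE of 4096 bytes, 2 concurrent users, 1 server group per query, and a DOP of 4:

```latex
\begin{aligned}
\text{CONNECTIONS} &= \text{DOP}^2 + 2 \times \text{DOP} = 4^2 + 2 \times 4 = 24 \\
\text{mem} &= 3 \times 4096 \times 2 \times 1 \times 24 = 589{,}824 \text{ bytes} = 576 \text{ KB} \\
\text{parallel\_max\_servers} &= 2 \times \text{DOP} \times \text{users} = 2 \times 4 \times 2 = 16
\end{aligned}
```

Because CONNECTIONS grows with the square of the DOP, doubling the DOP to 8 under the same assumptions raises CONNECTIONS to 80 and the memory estimate to 1,966,080 bytes.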
Materialized Views
• Provide summary tables
• A "Physical" view
• Performance gains are significant
• Similar to snapshots
– Refresh may be FAST (needs a materialized view log on the master)
– COMPLETE
– FORCE
• Oracle packages provide help and guidance
DBMS_MVIEW.EXPLAIN_MVIEW ('SH.CAL_MONTH_SALES_MV');
Materialized Views
• Query rewrite
• Enable it in the init.ora
QUERY_REWRITE_ENABLED = TRUE
QUERY_REWRITE_INTEGRITY = TRUSTED
• May enable at session level as well
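At session level, the same two parameters can be set with ALTER SESSION. This is a minimal sketch that mirrors the init.ora values shown above.

```sql
-- Enable query rewrite for the current session only.
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;

-- TRUSTED lets the optimizer rely on declared (not enforced)
-- relationships such as dimensions when rewriting queries.
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED;
```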
Creating a Materialized View
CREATE MATERIALIZED VIEW cust_sales_mv
  PCTFREE 0
  STORAGE (initial 8k next 8k pctincrease 0)
  BUILD IMMEDIATE
  REFRESH FORCE
  ENABLE QUERY REWRITE
AS SELECT c.cust_id,
          SUM(amount_sold) AS dollar_sales
   FROM sales s, customers c
   WHERE s.cust_id = c.cust_id
   GROUP BY c.cust_id;
Collecting the Details
• Use the DBMS_MVIEW supplied package
• Oracle has many other packages to help you with your materialized views.
• Today you get one!
EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW('SH.CAL_MONTH_SALES_MV');
Verifying a Materialized View
SELECT capability_name, possible, SUBSTR(related_text,1,8) AS rel_text,
       SUBSTR(msgtxt,1,60) AS msgtxt
FROM MV_CAPABILITIES_TABLE ORDER BY seq;

CAPABILITY_NAME               P REL_TEXT MSGTXT
----------------------------- - -------- ------
PCT                           N          (Partition Change Tracking)
REFRESH_COMPLETE              Y
REFRESH_FAST                  N
REWRITE                       Y
PCT_TABLE                     N SALES    no partition key or PMARKER in select list
PCT_TABLE                     N TIMES    relation is not a partitioned table
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log must have new values
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log must have ROWID
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log does not have all necessary columns
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log must have new values
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log must have ROWID
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log does not have all necessary columns
REFRESH_FAST_AFTER_ONETAB_DML N DOLLARS  SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ONETAB_DML N          see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML N          COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML N          SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML    N          see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_AFTER_ANY_DML    N SH.TIMES mv log must have sequence
REFRESH_FAST_AFTER_ANY_DML    N SH.SALES mv log must have sequence
REFRESH_PCT                   N          PCT is not possible on any of the detail tables in the materialized view
REWRITE_FULL_TEXT_MATCH       Y
REWRITE_PARTIAL_TEXT_MATCH    Y
REWRITE_GENERAL               Y
REWRITE_PCT                   N          PCT is not possible on any detail tables
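The messages above point at what fast refresh is missing: materialized view logs with ROWID, SEQUENCE, and new values. The following is a hedged sketch of how that might be addressed. The logged column lists are assumptions about what CAL_MONTH_SALES_MV selects, and dropping an existing log affects any other materialized view that depends on it.

```sql
-- Create MV_CAPABILITIES_TABLE in the current schema (run once from SQL*Plus).
@?/rdbms/admin/utlxmv.sql

-- Replace the existing logs with ones that record ROWID, a sequence,
-- and new values (column lists are assumed, not taken from the slides).
DROP MATERIALIZED VIEW LOG ON sh.sales;
CREATE MATERIALIZED VIEW LOG ON sh.sales
  WITH ROWID, SEQUENCE (time_id, amount_sold)
  INCLUDING NEW VALUES;

DROP MATERIALIZED VIEW LOG ON sh.times;
CREATE MATERIALIZED VIEW LOG ON sh.times
  WITH ROWID, SEQUENCE (time_id, calendar_month_desc)
  INCLUDING NEW VALUES;

-- Clear earlier results and analyze the materialized view again.
DELETE FROM mv_capabilities_table;
EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW('SH.CAL_MONTH_SALES_MV');
```

The logs alone do not clear every message: per the output, the materialized view query would also need COUNT(*) and a COUNT(expr) alongside each SUM(expr) before fast refresh after DML becomes possible.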
Dimensions
• Categorizes data
• Provides hierarchy guidance
• Allows roll-up and roll-down
• Needed for query rewrite
• Needed for materialized views
• OEM has Dimension Wizard
[Diagram: hierarchy from Category to Sub-Category to Product]
Dimensions the SQL
CREATE DIMENSION products_dim
  LEVEL product     IS (products.prod_id)
  LEVEL subcategory IS (products.prod_subcategory)
  LEVEL category    IS (products.prod_category)
  HIERARCHY prod_rollup (
    product CHILD OF
    subcategory CHILD OF
    category )
  ATTRIBUTE product DETERMINES
    (products.prod_name, products.prod_desc,
     prod_weight_class, prod_unit_of_measure,
     prod_pack_size, prod_status, prod_list_price,
     prod_min_price)
  ATTRIBUTE subcategory DETERMINES
    (prod_subcategory, prod_subcat_desc)
  ATTRIBUTE category DETERMINES
    (prod_category, prod_cat_desc);
The Optimizer is a Star
• Star Transform must be enabled
STAR_TRANSFORMATION_ENABLED=TRUE
• Requires bitmap indexes or bitmap join indexes on foreign key columns
CREATE BITMAP INDEX sales_c_state_bjix
ON sales(customers.cust_state_province)
FROM sales, customers
WHERE sales.cust_id = customers.cust_id
LOCAL NOLOGGING COMPUTE STATISTICS;
• Cost-based optimizer must be used
– Optimizer looks for a small set of dimension rows to satisfy the query, even if the fact table has a large number of rows
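The prerequisites above might be set up as in this sketch. The index names are hypothetical, and LOCAL assumes that sales is a partitioned table, as it is in the bitmap join index example above.

```sql
-- Enable the star transformation for the current session.
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

-- One single-column bitmap index per foreign key column of the fact table.
CREATE BITMAP INDEX sales_cust_bix    ON sales (cust_id)    LOCAL NOLOGGING;
CREATE BITMAP INDEX sales_time_bix    ON sales (time_id)    LOCAL NOLOGGING;
CREATE BITMAP INDEX sales_channel_bix ON sales (channel_id) LOCAL NOLOGGING;
```

Separate single-column indexes let the optimizer combine whichever bitmaps a particular query's dimension filters need.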
The Star SQL
SELECT ch.channel_class, c.cust_city,
       t.calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = 'CA'
AND ch.channel_desc IN ('Internet','Catalog')
AND t.calendar_quarter_desc IN ('2001-Q1','2001-Q2')
GROUP BY ch.channel_class, c.cust_city,
         t.calendar_quarter_desc;
How the Optimizer Sees It
SELECT ch.channel_class, c.cust_city,
       t.calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
FROM sales
WHERE time_id IN
  (SELECT time_id FROM times
   WHERE calendar_quarter_desc IN ('2001-Q1','2001-Q2'))
AND cust_id IN
  (SELECT cust_id FROM customers
   WHERE cust_state_province = 'CA')
AND channel_id IN
  (SELECT channel_id FROM channels
   WHERE channel_desc IN ('Internet','Catalog'));
Star Transform Restrictions
• The star transform use is restricted:
– A hint tells optimizer to not use a bitmap index
– Query contains bind variables
– Not enough bitmap indexes
– The fact table is a remote table
• The optimizer may not choose the star:
– Tables have a good single-table access path
– Tables are too small to be useful
– Database is in read-only mode
Bitmap Join Indexes
[Diagram: Sales table joined to Customer table]
CREATE BITMAP INDEX cust_sales_bji
ON Sales(Customer.state)
FROM Sales, Customer
WHERE Sales.cust_id = Customer.cust_id;
Index key is Customer.State
Indexed table is Sales
Resumable Transactions
• Suspend transactions
• Ability to resume these transactions
• Only errors currently handled:
– SQL statements that run out of TEMP space
– DML/Export/CREATE TABLE AS …
– Space limits
– Out-of-space transaction
– Exceed space quota
• Full support of locally managed tablespaces
• Great for large DW queries
Resumable the SQL
• Enable Resumable transactions
ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200;
• The Transaction:
CREATE TABLE sales_prod_dept
  (prod_category, prod_subcategory, cust_id,
   time_id, channel_id, promo_id, quantity_sold, amount_sold)
NOLOGGING TABLESPACE transfer
PARTITION BY LIST (prod_category)
  (PARTITION boys_sales  VALUES ('Boys'),
   PARTITION girls_sales VALUES ('Girls'),
   PARTITION men_sales   VALUES ('Men'),
   PARTITION women_sales VALUES ('Women'))
AS SELECT p.prod_category, p.prod_subcategory, s.cust_id,
          s.time_id, s.channel_id, s.promo_id,
          SUM(s.quantity_sold) quantity_sold,
          SUM(s.amount_sold) amount_sold
   FROM sales s, products p, times t
   WHERE p.prod_id = s.prod_id AND s.time_id = t.time_id
   AND t.fiscal_year = 2002
   GROUP BY prod_category, prod_subcategory, cust_id,
            s.time_id, channel_id, promo_id;
The Proof it Stopped
SELECT name, status, error_msg FROM dba_resumable;
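Once DBA_RESUMABLE shows a suspended statement, fixing the space problem lets it continue. A minimal sketch, assuming the transfer tablespace from the CREATE TABLE above ran out of room; the datafile name and size are hypothetical.

```sql
-- List only the statements that are currently suspended.
SELECT name, status, error_msg
FROM   dba_resumable
WHERE  status = 'SUSPENDED';

-- Add space to the tablespace; the suspended statement then resumes
-- as long as the 1200-second timeout has not expired.
ALTER TABLESPACE transfer ADD DATAFILE 'transfer02.dbf' SIZE 500M;

-- Switch resumable mode off again for the session when done.
ALTER SESSION DISABLE RESUMABLE;
```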
The SQL
• Oracle provides many analytical functions
– Cross tabular reports
– Ranking functions
– Percentile functions
– Regression analysis
– Moving windows
– Ratios within a report
– CASE expressions
CUBE, ROLLUP and GROUPING functions
• Efficient data access
• Creates matrix of totals
• Crosstabs are computed for you
• CUBE provides all totals
• GROUPING identifies which rows are totals
ROLLUP the SQL
SELECT channel_desc,
       calendar_month_desc,
       country_id,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id = times.time_id
AND sales.cust_id = customers.cust_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2002-09', '2002-10')
AND country_id IN ('CA', 'US')
GROUP BY ROLLUP (channel_desc, calendar_month_desc, country_id);
ROLLUP Results
CHANNEL_DESC  CALENDAR  CO          SALES$
------------  --------  --  --------------
Direct Sales  2002-09   CA       1,378,126
Direct Sales  2002-09   US       2,835,557
Direct Sales  2002-09            4,213,683   BY Channel and Month
Direct Sales  2002-10   CA       1,388,051
Direct Sales  2002-10   US       2,908,706
Direct Sales  2002-10            4,296,757   BY Channel and Month
Direct Sales                     8,510,440   BY Channel
Internet      2002-09   CA         911,739
Internet      2002-09   US       1,732,240
Internet      2002-09            2,643,979   BY Channel and Month
Internet      2002-10   CA         876,571
Internet      2002-10   US       1,893,753
Internet      2002-10            2,770,324   BY Channel and Month
Internet                         5,414,303   BY Channel
                                13,924,743   For Everything
CUBE the SQL
SELECT channel_desc,
       calendar_month_desc,
       country_id,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id = times.time_id
AND sales.cust_id = customers.cust_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2002-09', '2002-10')
AND country_id IN ('CA', 'US')
GROUP BY CUBE (channel_desc, calendar_month_desc, country_id);
CUBE the Results
CHANNEL_DESC  CALENDAR  CO          SALES$
------------  --------  --  --------------
Direct Sales  2002-09   CA       1,378,126
Direct Sales  2002-09   US       2,835,557
Direct Sales  2002-09            4,213,683   BY Channel and Month
Direct Sales  2002-10   CA       1,388,051
Direct Sales  2002-10   US       2,908,706
Direct Sales  2002-10            4,296,757   BY Channel and Month
Direct Sales            CA       2,766,177   BY Channel and Country
Direct Sales            US       5,744,263   BY Channel and Country
Direct Sales                     8,510,440   BY Channel
Internet      2002-09   CA         911,739
Internet      2002-09   US       1,732,240
Internet      2002-09            2,643,979   BY Channel and Month
Internet      2002-10   CA         876,571
Internet      2002-10   US       1,893,753
Internet      2002-10            2,770,324   BY Channel and Month
Internet                CA       1,788,310   BY Channel and Country
Internet                US       3,625,993   BY Channel and Country
Internet                         5,414,303   BY Channel
              2002-09   CA       2,289,865   BY Month and Country
              2002-09   US       4,567,797   BY Month and Country
              2002-09            6,857,662   BY Month
              2002-10   CA       2,264,622   BY Month and Country
              2002-10   US       4,802,459   BY Month and Country
              2002-10            7,067,081   BY Month
                        CA       4,554,487   BY Country
                        US       9,370,256   BY Country
                                13,924,743   Everything
The GROUPING function
SELECT DECODE(GROUPING(channel_desc), 1,
              'All Channels', channel_desc) AS Channel,
       DECODE(GROUPING(country_id), 1,
              'All Countries', country_id) AS Country,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id = times.time_id
AND sales.cust_id = customers.cust_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc = '2002-09'
AND country_id IN ('CA', 'US')
GROUP BY CUBE (channel_desc, country_id);
GROUPING Results
CHANNEL       COUNTRY                SALES$
------------  -------------  --------------
Direct Sales  CA                  1,378,126
Direct Sales  US                  2,835,557
Direct Sales  All Countries       4,213,683
Internet      CA                    911,739
Internet      US                  1,732,240
Internet      All Countries       2,643,979
All Channels  CA                  2,289,865
All Channels  US                  4,567,797
All Channels  All Countries       6,857,662
Rolling Windows
• Allow analysis
– Cumulative aggregates
– Moving aggregations
– Centered aggregations
– Logical offsets
Rolling Windows the SQL
SELECT cust_id, t.time_id,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS SALES,
       TO_CHAR(AVG(SUM(amount_sold)) OVER
         (PARTITION BY s.cust_id ORDER BY t.time_id
          RANGE BETWEEN INTERVAL '1' DAY PRECEDING
                    AND INTERVAL '1' DAY FOLLOWING),
         '9,999,999,999') AS CENTERED_3_DAY_AVG
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.calendar_week_number IN (51)
AND calendar_year = 2001
AND cust_id IN (6380, 6510)
GROUP BY cust_id, t.time_id
ORDER BY cust_id, t.time_id;
The Rolling Window Result
 CUST_ID  TIME_ID        SALES  CENTERED_3_DAY
--------  ---------  ---------  --------------
    6380  20-DEC-01      2,240           1,136
    6380  21-DEC-01         32             873
    6380  22-DEC-01        348             148
    6380  23-DEC-01         64             302
    6380  24-DEC-01        493             212
    6380  25-DEC-01         80             423
    6380  26-DEC-01        696             388
    6510  20-DEC-01        196             106
    6510  21-DEC-01         16             155
    6510  22-DEC-01        252             143
    6510  23-DEC-01        160             305
    6510  24-DEC-01        504             240
    6510  25-DEC-01         56             415
    6510  26-DEC-01        684             370
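The centered average above is one of several window shapes. As a hedged sketch of a cumulative aggregate over the same tables and filters, the window can instead run from the first row of each customer's partition up to the current row; the alias cumulative_sales is illustrative.

```sql
SELECT s.cust_id, t.time_id,
       SUM(s.amount_sold) AS sales,
       -- Running total per customer: every row from the start of the
       -- partition through the current day.
       SUM(SUM(s.amount_sold)) OVER
         (PARTITION BY s.cust_id ORDER BY t.time_id
          ROWS UNBOUNDED PRECEDING) AS cumulative_sales
FROM   sales s, times t
WHERE  s.time_id = t.time_id
AND    t.calendar_week_number IN (51)
AND    t.calendar_year = 2001
AND    s.cust_id IN (6380, 6510)
GROUP  BY s.cust_id, t.time_id
ORDER  BY s.cust_id, t.time_id;
```

Replacing ROWS UNBOUNDED PRECEDING with ROWS 2 PRECEDING would turn the running total into a moving three-row sum.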
Ranking Data
SELECT channel_desc, calendar_month_desc,
       TO_CHAR(TRUNC(SUM(amount_sold),-6), '9,999,999,999') SALES$,
       RANK() OVER (ORDER BY TRUNC(SUM(amount_sold),-6) DESC) AS RANK,
       DENSE_RANK() OVER (ORDER BY TRUNC(SUM(amount_sold),-6) DESC) AS DENSE_RANK
FROM sales, products, customers, times, channels
WHERE sales.prod_id = products.prod_id
AND sales.cust_id = customers.cust_id
AND sales.time_id = times.time_id
AND sales.channel_id = channels.channel_id
AND times.calendar_month_desc IN ('2002-09', '2002-10')
AND channels.channel_desc <> 'Tele Sales'
GROUP BY channel_desc, calendar_month_desc;
Rank Results
CHANNEL_DESC  CALENDAR          SALES$  RANK  DENSE_RANK
------------  --------  --------------  ----  ----------
Direct Sales  2002-10       10,000,000     1           1
Direct Sales  2002-09        9,000,000     2           2
Internet      2002-09        6,000,000     3           3
Internet      2002-10        6,000,000     3           3
Catalog       2002-09        3,000,000     5           4
Catalog       2002-10        3,000,000     5           4
Partners      2002-09        2,000,000     7           5
Partners      2002-10        2,000,000     7           5
RATIO_TO_REPORT function
• Computes the ratio of a value to the sum of a set of values
• Deals with NULLs accurately
• Data can be partitioned by values
select product_key, sum(sales_amount) Sales,
       sum(sum(sales_amount)) over () Tot_Sales,
       ratio_to_report(sum(sales_amount)) over () report_ratio
from monthly_sales
group by product_key;
RATIO_TO_REPORT Result
Product_key  Sales  Tot_sales  Report_ratio
-----------  -----  ---------  ------------
A123           210       1104          0.19
B9837          112       1104          0.10
C8743           90       1104          0.08
C9662          472       1104          0.43
R4300          100       1104          0.09
T0843          120       1104          0.11
The Report_ratio values add up to 100%.
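The report above uses an empty OVER () clause, so each ratio is taken against the grand total. As a sketch of the partitioned form, assume monthly_sales also carries a product_category column (a hypothetical column, not shown on the slides); each ratio is then taken against its own category's total.

```sql
SELECT product_category, product_key,
       SUM(sales_amount) AS sales,
       -- Ratio of each product's sales to the total of its category.
       RATIO_TO_REPORT(SUM(sales_amount))
         OVER (PARTITION BY product_category) AS category_ratio
FROM   monthly_sales
GROUP  BY product_category, product_key;
```

Within each category the category_ratio values add up to 1, just as the unpartitioned ratios add up to 1 across the whole report.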
Multi-table Inserts
• Conditional insert of data
• Need to have source data in a table
• Insert options
– ALL (all conditions that are true)
– FIRST (first condition that is true)
Multi-table Insert SQL
INSERT ALL
  INTO sales VALUES (product_id, customer_id, weekly_start_date,   'P', 501, q_sun, sales_sun)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+1, 'P', 501, q_mon, sales_mon)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+2, 'P', 501, q_tue, sales_tue)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+3, 'P', 501, q_wed, sales_wed)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+4, 'P', 501, q_thu, sales_thu)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+5, 'P', 501, q_fri, sales_fri)
  INTO sales VALUES (product_id, customer_id, weekly_start_date+6, 'P', 501, q_sat, sales_sat)
SELECT * FROM sales_input_table;
Conditional Multi-Table Insert
INSERT ALL
  WHEN order_total < 1000000 THEN
    INTO small_orders
  WHEN order_total >= 1000000 AND order_total < 2000000 THEN
    INTO medium_orders
  WHEN order_total >= 2000000 THEN
    INTO large_orders
SELECT order_id, order_total, sales_rep_id,
       customer_id FROM orders;
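The FIRST option can express the same routing with simpler conditions, because evaluation stops at the first WHEN that is true. A minimal sketch, assuming the three target tables have the same four columns as the SELECT list:

```sql
INSERT FIRST
  WHEN order_total >= 2000000 THEN
    INTO large_orders
  WHEN order_total >= 1000000 THEN
    INTO medium_orders
  ELSE
    INTO small_orders
SELECT order_id, order_total, sales_rep_id, customer_id
FROM   orders;
```

With INSERT ALL every true condition fires, so the ranges have to be written to exclude each other; with INSERT FIRST the ordering of the WHEN clauses does that work, and the ELSE branch catches rows that match no condition, including rows where order_total is NULL.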
The Upsert
• Perform an insert or update
• Touch data once
• Data is always SELECTed from source
Upsert the SQL
MERGE INTO products t
USING products_delta s
ON ( t.prod_id = s.prod_id )
WHEN MATCHED THEN
  UPDATE SET t.prod_list_price = s.prod_list_price,
             t.prod_min_price = s.prod_min_price
WHEN NOT MATCHED THEN
  INSERT
    (prod_id, prod_name, prod_desc, prod_subcategory,
     prod_subcat_desc, prod_category, prod_cat_desc,
     prod_status, prod_list_price, prod_min_price)
  VALUES
    (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
     s.prod_subcat_desc, s.prod_category, s.prod_cat_desc,
     s.prod_status, s.prod_list_price, s.prod_min_price);
The Table Function
• Reduces need for staging of large data sets
• Pipe data into a select statement
• Parallel execution supported
• Also known as pipelining
Table Functions: The SQL
• Create TYPE definition for targets
CREATE TYPE city_populations_row
AS OBJECT
(city_name VARCHAR2(9), census_year NUMBER, population NUMBER );
/
CREATE TYPE city_populations_table
AS TABLE OF city_populations_row;
/
• Create Cursor package
CREATE OR REPLACE PACKAGE census_package
AS
  TYPE pop_cursor_type IS REF CURSOR
    RETURN city_populations_ext%ROWTYPE;
  FUNCTION census_transform
    (indata IN pop_cursor_type)
    RETURN city_populations_table
    PARALLEL_ENABLE (PARTITION indata BY ANY)
    PIPELINED;
END;
/
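The slides show only the package specification. The body below is a hedged sketch of what census_transform might do; it assumes city_populations_ext has exactly the columns city_name, pop_1990, and pop_2000, and that each input row should become one output row per census year. Neither assumption is confirmed by the slides.

```sql
CREATE OR REPLACE PACKAGE BODY census_package
AS
  FUNCTION census_transform
    (indata IN pop_cursor_type)
    RETURN city_populations_table
    PARALLEL_ENABLE (PARTITION indata BY ANY)
    PIPELINED
  IS
    in_rec city_populations_ext%ROWTYPE;
  BEGIN
    LOOP
      FETCH indata INTO in_rec;
      EXIT WHEN indata%NOTFOUND;
      -- Pivot one wide input row into one narrow row per census year
      -- and hand each row to the consumer as soon as it is built.
      PIPE ROW (city_populations_row(in_rec.city_name, 1990, in_rec.pop_1990));
      PIPE ROW (city_populations_row(in_rec.city_name, 2000, in_rec.pop_2000));
    END LOOP;
    CLOSE indata;
    RETURN;
  END census_transform;
END census_package;
/
```

Because rows are piped out one at a time, the calling statement can start consuming them before the function has read its whole input, which is what removes the need for an intermediate staging table.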
Select Using the Table Function
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL (t,4) */
INTO city_populations t
SELECT *
FROM TABLE
  (census_package.census_transform
    (CURSOR(SELECT city_name, pop_1990, pop_2000
            FROM city_populations_ext)));
The OLAP Engine
Computers are composed of nothing more than logic gates stretched out to the horizon in a vast numerical irrigation system
Stan Augarten
OLAP Definition
• The FASMI:
F: Fast
A: Analytical
S: Shared
M: Multi Dimensional
I: Information About the Data
Codd’s Rules and Features of OLAP
• Basic Features
F1: Multidimensional Views
F2: Intuitive Data Manipulation
F3: Accessible
F4: Batch and Interpretive Extraction
F5: OLAP Analysis Model
F6: Client Server Access/Web Access
F7: Transparency
F8: Multi-User Support
Codd’s Rules and Features of OLAP
• Special Features:
F9: Treatment of Non-Normalized Data
F10: Storing OLAP Results (not w/ Source)
F11: Standardization of Missing Values
F12: Possibility of Ignoring Missing Values
• Reporting Features
F13: Flexible Reporting
F14: Uniform Reporting Performance
F15: Adjustment to Type of Models
Codd’s Rules and Features of OLAP
• Dimension Control
F16: Generic Dimensionality
F17: Unlimited dimensions and aggregations
F18: Unrestricted Cross-dimension operations
Platform for Business Intelligence: OLAP
[Diagram: Oracle9i platform of Data Warehousing, ETL, OLAP, and Data Mining]
Oracle OLAP
• Analysis-ready Oracle database
– Support for complex, multidimensional queries
• Development platform for Internet-ready analytical applications
– Java OLAP API
– Business Intelligence Beans and JDeveloper
OLAP Application Platform
[Diagram: layered platform stack]
Layers: BI Beans and JDeveloper (Oracle9i Application Server and Dev Suite); Java OLAP API; Oracle OLAP; Oracle9i Database
Capabilities called out: rapid application development; analysis ready; predictive analysis functions; scalable data store; integrated metadata; summary management; SQL analytic functions
Key Concepts
• OLAP in the RDBMS
– Updated version of Oracle Express
– Single RDBMS-MDDS process
– Single data storage
– Single security model
– Single metadata repository
– Single set of management tools
– SQL based metadata APIs
– SQL access and OLAP API access to relational tables and analytic workspaces
OLAP in Oracle9i Release 2
[Diagram: OLAP access paths in the Oracle9i Database. Labels: Java OLAP API application (OLAP API); generic SQL application (SQL); 'direct' OLAP application (PL/SQL); OLAP process; SQL via table function; relational tables; analytic workspace. Source: Vlamis Software]
Data Mining
The beginning of knowledge is the discovery of something we do not understand
Frank Herbert
Data Mining Overview
Data Mining is a decision support process in which we search for patterns of information in data
Two Types of Traditional Analysis:
• Confirmatory
• Exploratory
The Data Mining Process
[Diagram: the data mining process as a tree]
• Discovery
– Conditional Logic
– Affiliations & Associations
– Trends and Validation
• Predictive Modeling
– Outcome Prediction
– Forecasting
• Forensic Analysis
– Deviation Detection
– Link Analysis
Data Mining Techniques
[Diagram: data mining approaches as a tree]
• Data Retained
– Nearest Neighbor
– Case-Based Reasoning
• Distilled Data
– Logical: Rules, Decision Trees
– Cross Tabulation: Agents, Belief Nets
– Equational: Statistics, Neural Nets
Data Retention Techniques
• New data compared to existing information
• Proximity comparison between new and old
[Diagram: a new record is compared against the entire database; results: top K neighbors]
Pattern Distillations
• You are looking for patterns in the data
• What patterns? How should they be represented?
[Figure: three X-Y plots labeled Regression Line, Logical Representation, and Universal Approximation]
Decision Trees
[Diagram: decision tree that routes a new record by country (US, CAN), state or province (NV, NY, FL; BC, ON, QC), and city (NYC, Toronto, Other) to an outcome of High, Average, or Low]
Neural Nets
• Works like the brain
• Decisions are learned
• Inputs trigger neurons and they lead to decisions
[Diagram: inputs (Age, Loc, Sal, Emp) feed neurons that produce a decision of Accept or Decline]
Oracle9i Data Mining
• Data mining completely embedded in Oracle9i database
– Simplified data-mining process
– Eliminates need for data movement and redundant data
• Java-based API
– Supports application integration
• Will comply with emerging standard API (JDM)
Oracle9i Data Mining
• Key capabilities:
– Multiple algorithms
• Transactional Naïve Bayes
• Predictive Association rules
• Decision Trees via Adaptive Bayesian Network (Oracle9i, Release 2)
• Clustering (Oracle9i, Release 2)
– Executed within the database
– Multiple prediction types
• Probability of specific outcome
• Most probable outcome
Thanks!
Questions and Comments
Presentation #31348
Ian Abramson
Toronto, Ontario
416-407-2448