Rdb Continuous LogMiner and the JCC LogMiner Loader
A presentation of MnSCU’s use of this technology
Topics
• Overview of MnSCU
• Logmining Uses
• Loading Data
• Bonus: Global Buffer implementation and resulting performance improvements
Overview of MnSCU
• Minnesota State Colleges and Universities System
  – Comprised of 32 Institutions
    • State Universities, Community and Technical Colleges
    • 53 Campuses in 46 Communities
    • Over 16,000 faculty and staff
    • More than 3,600 degree programs
  – Serves over 240,000 Students per year
    • Additional 130,000 Students in non-credit courses
  – About 30,000 graduates each year
ISRS
• MnSCU’s Primary Application
– ISRS: Integrated State-wide Record System
• Written in Uniface (4GL), Cobol, C, JAVA
– 2,000+ 3GL programs; 2,200+ 4GL forms
– Over 2,900,000 lines of code
MnSCU’s Rdb Topology
• North Region: 9 Institution Dbs, 1 Regional Db
• South Region: 8 Institution Dbs, 1 Regional Db
• Metro and Central Regions: 13 and 7 Institution Dbs, 1 Regional Db each, plus 1 Central Db
• Development: 20+ Dvlp/QC/Train Dbs
• Each Institution Db has over 1,200 tables
• Over 900,000,000 rows
• Over 1 terabyte total disk space
• Over 20% annual data growth
Production Users
• Each regional center supports:
– Between 500 and 1,000 on-line users (during
the day)
– Numerous batch reporting and update jobs
daily and over-night
– 25,000+ Web transactions each day 24x7
• Registrations, Grades, Charges (fees),
On-line Payments, etc…
LogMining
LogMining Modes
• Static
– The Rdb LogMiner runs, by default, as a
stand-alone process against backup copies of
the source database AIJ files
• Continuous
– The Rdb LogMiner can run against live
database AIJs to produce a continuous output
stream that captures transactions when they
are committed
LogMining Uses
• Hot Standby (replication) replacement
• Mining a single Db to multiple targets
• Mining to Non-Rdb Target(s)
– XML, File, API, Tuxedo, Orrible (i.e., Oracle)
• Mining multiple Dbs to a single target
• Minimizing production Db maintenance
downtime
LogMining at MnSCU
The combination of the Rdb Continuous LogMiner and the JCC LogMiner Loader allows us to:
• Distribute centrally controlled data to multiple
local databases
• Replicate production databases into multiple
partitioned query databases
• Roll up multiple production databases into a
single data warehouse
• Replicate production data into non-Rdb
development databases to support development
of a database-independent application
The Big Picture
[Diagram: the overall data flow. Sources: the 37 production Rdb ISRS DBs (including the combined NWR db), the REGIONALDBs, and CENTRALDB. Continuous LogMiner sessions: CLM w/PK (4 and 37 sessions), CLM w/DBK (4 and 37 sessions), and CLM w/FilterMaps DBK+RC_ID (37 sessions). Targets: 36 standby Rdb reporting Dbs, 37 Oracle ISRS schemas of 238 tables each, the MNSCUALL schema (5 tables, combined across all 37), one central copy of the VAL tables, and CC_SUMMARY and other data warehouse structures. Materialized views (MV/DJ) feed combined tables, individual schemas, the MnOnline application tables (20+, read-only 24x7), and future read-only ad-hoc users.]
Replication Replacement
• Historically we’ve used Rdb’s Hot Standby
feature to maintain reporting Dbs for users
(ODBC and some batch reports)
• However, of the 1200+ tables in
production ISRS Dbs, we found users only
use 260 tables for reporting from the
standby Dbs
• Batch reports were found to use another 222 tables
• With Logminer, we can maintain just this subset of tables
(482) for reporting purposes
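As a rough illustration of mining only a subset of tables (static mode shown; database, table, and file names are placeholders, not MnSCU’s actual command procedures), the LogMiner is told exactly which tables to extract:
$ ! Illustrative only -- at MnSCU the JCC Loader drives continuous sessions,
$ ! but the table-subset idea is the same.
$ rmu/unload/after_journal PRODDB PRODDB_AIJ.AIJBCK -
      /table=(name=PERSON,  output=person.dat) -
      /table=(name=ADDRESS, output=address.dat)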
[Diagram: the 37 production Rdb DBs (NTC, FFC, TRF, BTC, …, plus the combined NWR db).]
Mining Single Db to Multiple Targets
• We’ve begun combining
institutions in some of our
production databases
• Introduced a new column called
RC_ID to over 700 tables for row-level security
• Row security provided by views
• However, the reporting databases still needed to
be separate so users wouldn’t have to change
hundreds of existing queries to include RC_ID
‘CORE’ Tables
• ‘CORE’ tables contain common data for MnSCU
• Tables like PERSON, ADDRESS, PHONE,
etc. that can be combined across all
institutions
• These tables incorporate a new surrogate
key called MNSCU_ID which is unique
across all institutions
• Users do not access ‘CORE’ tables directly
‘CORE’ Table Views
• ‘CORE’ data is presented to
users via views that join RC_ID
(an institution-specific value) and
TECH_ID (the original surrogate
key) to determine MNSCU_ID,
via a third ‘REPORTING’ table
• All ‘CORE’ tables (PERSON,
ADDRESS, PHONE, …) are
accessed via the views
CR_REPORTING columns: MNSCU_ID, TECH_ID, RC_ID
CR_PERSON columns: MNSCU_ID, rest of attributes
View PERSON columns: TECH_ID, RC_ID, rest of attributes
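A minimal sketch of that view, assuming the column layout above (illustrative only; the real views also carry the remaining PERSON attributes and the row-level security logic):
SQL> -- Sketch only, not the actual MnSCU DDL; the rest of the CR_PERSON
SQL> -- attributes would be added to both the column list and the select list.
SQL> CREATE VIEW person (tech_id, rc_id) AS
cont>   SELECT r.tech_id, r.rc_id
cont>     FROM cr_reporting r, cr_person p
cont>    WHERE r.mnscu_id = p.mnscu_id;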
Query Performance
• Row-level security in our production dbs (‘CORE’ table views) has increased the query-tuning challenges
• Implemented better reporting performance in the target db by using a filtering trigger to populate a table that users query directly, rather than using the same views as Production
• Reporting Db has different indexes and uses
row-caching to maximize query performance
‘CORE’ Triggers
[Diagram: the production CR_REPORTING table (MNSCU_ID, RC_ID, TECH_ID) is mined into the reporting Db, where an AFTER INSERT trigger on CR_PERSON (MNSCU_ID) checks that the row exists in CR_REPORTING by MNSCU_ID and that its RC_ID exists in CR_RC_ID; if both exist, the row is inserted into PERSON.]
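A rough sketch of such a trigger in Rdb-style SQL (abbreviated and approximate; table, column, and trigger names follow the diagram, not MnSCU’s actual code, and the remaining PERSON attributes from the inserted CR_PERSON row are omitted):
SQL> -- Fires in the reporting Db when the Loader inserts a mined CR_PERSON row;
SQL> -- the row is copied into PERSON only if its MNSCU_ID is known in
SQL> -- CR_REPORTING and that RC_ID belongs to this reporting Db (CR_RC_ID).
SQL> CREATE TRIGGER cr_person_after_insert
cont>   AFTER INSERT ON cr_person
cont>   WHEN (EXISTS (SELECT r.mnscu_id
cont>                   FROM cr_reporting r, cr_rc_id c
cont>                  WHERE r.mnscu_id = cr_person.mnscu_id
cont>                    AND c.rc_id = r.rc_id))
cont>     (INSERT INTO person (tech_id, rc_id)
cont>        SELECT r.tech_id, r.rc_id
cont>          FROM cr_reporting r
cont>         WHERE r.mnscu_id = cr_person.mnscu_id)
cont>   FOR EACH ROW;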
Non-’CORE’ Tables to Multiple Targets
[Diagram: the combined production Db (NWR) feeds four separate reporting Dbs. Four LogMiner sessions each filter on one institution’s RC_ID (‘0142’, ‘0263’, ‘0215’, ‘0303’) so that each reporting Db receives only its own rows.]
Eliminating Remote Attaches
• We also have some centralized data, like ‘codes tables’
• Some jobs require reading this centralized data while processing against each Production ISRS database
  – This forces a remote attachment to the central db
• With Logminer, we can maintain regional copies of this data so these jobs do not have to use remote attaches to get data
[Diagram: CENTRALDB distributed to the four REGIONALDBs by 4 CLM w/PK sessions.]
Non-Rdb Target
• Since we have 37 separate institutional
databases (and reporting databases),
getting combined data for system-wide
reporting was difficult
• With Logminer we can combine data from
each of the production ISRS databases
into one target, in this case our Oracle
warehouse
[Diagram: the 37 production ISRS Rdb DBs (including NWR db) mined by 37 CLM w/FilterMaps (DBK+RC_ID) sessions into the MNSCUALL schema: 5 tables, combined across all 37.]
Non-Rdb Target
• With Logminer we can mine tables from
each of our Production ISRS databases
into individual Oracle Schemas
• This allows us to test our application
against a different DBMS
– Structure and data are identical
• Keeps data separate for institutional
reporting
[Diagram: the 37 production Rdb DBs (including NWR db) mined by 37 CLM w/DBK sessions into 37 Oracle ISRS schemas, each with 238 tables.]
MnSCU’s Oracle Topology
• ISRS Instance
  – 37 Institution Schemas (238 tables each)
  – 1 Combined Schema (5 tables)
  – 1 VAL Schema (Codes Tables)
  – Other Reporting Structures
• WAREHOUS Schema and several Warehouse Schemas
• Development Schema
• Over 800,000,000 rows (and growing rapidly)
• Over 500 GB total disk space
• Various ETL
LogMining Scope at MnSCU
• Sessions with Rdb Targets:
– NWR: 482 tables from 1 source to 4 Rdb targets
(4 sessions; Example of mining 1 to Many)
• In these sessions we are separating data from a combined institutional
database into separate reporting databases for each institution
• Triggers used to populate base tables from production ‘Core’ tables
– CENTRLDB: 3 tables from 1 source db to 4 target Rdb dbs
• In this session we are taking centralized data and placing copies of
it on our regional servers (allows us to maintain these 3 tables
centrally without changing our application which reads the data
locally)
LogMining Scope at MnSCU
• Sessions with Oracle Targets:
– ISRS: 238 tables from each of 37 source dbs to 37 Oracle
schemas
(37 sessions; Non-Rdb target; example of mining many to many)
• These sessions allow us to create exact copies of our databases in Oracle
• From this Oracle data materialized views are built to provide value-added
reporting ‘data-marts’
• Allows us to shift our reporting focus to Oracle while continuing to base
production on Rdb
• Also allows us to work on converting our application to use Oracle without
impacting production or having to do a cold-switch
– MNSCUALL: 5 tables from each of 37 source dbs to 1 Oracle
schema
(37 sessions; Example of mining Many sources to 1 target)
• These sessions allow us to build a combined version of data
• These are some of our larger tables, with over 250,000,000 rows
• Total of 9,513 tables being mined by 119
separate continuous LogMiner sessions!
Session Support
• To support so many sessions we’ve developed a
naming convention for sessions
• Includes a specific directory structure
• Built several tools to simplify the task of
creating/recreating/reloading tables
• Some of the tools are based on the naming
convention
• Currently our tools are all DCL, but better
implementations could be made with 3GLs
Source Db Preparation
• The LogMiner input to the Loader is created when the
source database is enabled for LogMining
• This is accomplished with an RMU command; the optional /continuous qualifier specifies continuous operation:
$ rmu/set logminer/enable[/continuous] <database name>
$ rmu/backup/after/quiet <database name> ...
• Many of the procedures included with the Loader kit rely
on the procedure vms_functions.sql having been applied
to the source database:
SQL> attach 'filename <source database>';
SQL> @jcc_tool_sql:vms_functions.sql
SQL> commit;
Target Db Preparation
• Besides the task of creating the target db
and tables itself, there are many ‘details’ to
attend to depending upon target db type
• For Oracle targets, Rdb field and table
name lengths (and names) can be an
issue (and target tablespaces too)
• One thing in common: the HighWater table
– Used by the loader to keep track of what has
been processed
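Purely as a conceptual illustration (the Loader kit creates the real HighWater table, and its actual layout differs), the idea is one row per session recording the last position known to be applied, so a restarted session resumes where it left off:
SQL> -- Hypothetical layout for illustration only; not the JCC Loader's real table.
SQL> CREATE TABLE highwater_demo (
cont>   session_name   CHAR(32)   NOT NULL,   -- which Loader session the row tracks
cont>   last_position  INTEGER    NOT NULL,   -- last source transaction/AIJ position applied
cont>   updated_at     TIMESTAMP  NOT NULL);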
Minimizing Production Downtime
• Basic Steps:
– Do an AIJ backup
– Create copy of production Db
– Perform restructuring / maintenance / etc on Db copy
• This could take many hours
– Remove users from Production Db
– Apply AIJ transactions to Db copy using LogMiner
• This step requires minimal time
– Switch applications to use Db copy – This is now the
new production Db!
Loading Data
• 3 methods to accomplish this:
  – Direct Load (RMU, SQLLOADER)
    • Could impose data restrictions by using views
    • Can configure commit-interval
  – LogMiner Pump (see the sketch after this list)
    • Use a no-change update transaction on source
    • Allows for data restrictions
    • Commit-interval matches source transaction
    • Consumes AIJ space
  – JCC Data Pump
    • Configurable to do parent/child tables, data restrictions, commit-interval and delay-interval to minimize performance impact
    • Consumes AIJ space
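For the LogMiner Pump method above, a no-change update simply rewrites rows to themselves so they flow through the AIJ and on to the target. A minimal sketch, assuming a PERSON table with a TECH_ID column (illustrative names only):
SQL> -- Every touched row is journaled even though no value changes, so the
SQL> -- continuous LogMiner/Loader re-delivers it to the target.
SQL> UPDATE person SET tech_id = tech_id;
SQL> COMMIT;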
Loading Data Example
• Loaded a table with 19,986 rows with
SQLLOADER
– This took about 13 minutes, no AIJ blocks,
some disk space for .UNL file
• Used LogMiner to ‘pump’ the rows via a no-change update
– This took about 10 minutes; 20,000 AIJ-blocks
Pumping Data Example
• Using the JCC Data Pump
– About 22 seconds to update source data:
– Only 3 minutes to update target data!!
Loader Performance
• SCSU_REPISRS session
– From CLM log:
17-NOV-2004 16:15:42.26 20234166 CLM SCSU_REPISR
Total records written: 36381 (36048 modify, 333 delete)
– The 3 processes themselves:
• CTL: 14.6 CPU secs / 1667 Direct IO
• CLM: 1 min 11.4 CPU secs / 67,112 Direct IO
• LML: 7 min 55.7 CPU secs / 67 Direct IO
– After 11:15 hours of connect time, this is about
1.6 Direct IO per second average
– Processed about .9 records per second average
Heartbeat
• Without Heartbeat a session can become
‘stale’
• AIJ backups can be blocked
• With Heartbeat enabled this does not
occur
– A side effect is that ‘trailing’ messages are not displayed with heartbeat enabled
Bonus: Global Buffers
• Despite great performance gains from Row
Cache over the past couple of years, we still
were anticipating issues for our fall busy period
• We turned on GB on many Dbs
– On our busiest server, we enabled it on all dbs
– On other servers, we have about 50% implementation
• We previously ran with RDM$BIND_BUFFERS (local buffers per process) of 220
• Estimated the global buffer total as the maximum number of users at 200 buffers each
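For reference, global buffers are enabled per database with an Rdb SQL clause along these lines (values are placeholders, not MnSCU’s settings; verify the exact clause against the Rdb documentation for your version):
SQL> -- NUMBER IS the total global buffer count for the database; USER LIMIT IS
SQL> -- the most buffers one attach may map (the old RDM$BIND_BUFFERS value of
SQL> -- 220 is reused here purely as an example).
SQL> ALTER DATABASE FILENAME mydb
cont>   GLOBAL BUFFERS ARE ENABLED (NUMBER IS 40000, USER LIMIT IS 220);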
Global Buffers
• Prior to implementing GB, our busiest server was running at a constant 6,000-7,000 IO/sec
• Other servers were running around 3,000
but had spikes to 7,000 or more
• Global buffers both lowered overall IO, as
well as eliminated spikes
• Cost is in total Locks (Resources)
– Increase LOCKIDTBL and RESHASHTBL
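One way to do that on OpenVMS is via MODPARAMS.DAT and AUTOGEN; a sketch with placeholder values (not MnSCU’s actual settings):
! SYS$SYSTEM:MODPARAMS.DAT additions -- placeholder values for illustration only
MIN_LOCKIDTBL  = 200000       ! more lock IDs to cover the extra global-buffer locks
MIN_RESHASHTBL = 65536        ! resource hash table; typically sized as a power of 2
! then run AUTOGEN, e.g.:  $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK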
Before GB
[Chart: METE server I/O rates and process counts from 28-Jul through 7-Oct 2004, annotated with how many Dbs had global buffers enabled on each date. Series: 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; y-axis 1,000 to 7,000.]
After GB
[Chart: METE server I/O rates and process counts after global buffers were enabled, over the same late-July through early-October 2004 period. Series: 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; y-axis 1,000 to 7,000.]
After Gb
[Chart: MNSCU1 server I/O rates and process counts after global buffers were enabled. Series: 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; y-axis 1,000 to 6,000.]
Resources
CPU
• Since we are now using less IO, more
CPU is available
• Before:
SDA> lck show lck /rep=5/int=10
23-AUG-2004 14:28:48.80  Delta sec: 10.0  Ave Req: 39875  Req/sec: 15656.9  Ave Spin: 24005  Busy: 62.4%
23-AUG-2004 14:28:58.80  Delta sec: 10.0  Ave Req: 30166  Req/sec: 20289.7  Ave Spin: 19121  Busy: 61.2%
• After:
24-AUG-2004 11:44:01.90  Delta sec: 10.0  Ave Req: 13439  Req/sec: 38043.3  Ave Spin: 12846  Busy: 51.1%
24-AUG-2004 11:44:11.90  Delta sec: 10.0  Ave Req: 15514  Req/sec: 31109.1  Ave Spin: 16629  Busy: 48.3%
Lock Rates
• Reducing these numbers to Lock
Operations per 1% of CPU time yields:
– Before: 299
– After: 573
For More Information
• [email protected]
• (218) 755-4614
Questions & Answers