Outsourcer 4.1.6
Administrator Guide
http://www.PivotalGuru.com
Document Date 2015-01-14
Contents
Overview of Outsourcer
New to 4.1.6
   Bugs Fixed
Installation and Upgrades
User Interface
   Starting and Stopping Outsourcer
      uistart
      uistop
   All Web Pages
   Login
   Sources
      Create Jobs
   Jobs
      Queue Job Link
      Inserting/Updating Jobs
      Define New Job Link
      Queue All Jobs Link
      Delete All Jobs Link
      Refresh Job Type
      Append Job Type
      Replication Job Type
      Transform Job Type
      DDL Job Type
      Append-Optimized/Append-Only
      Compression
      Row/Column Orientation
   Queue
      Statuses
      Rerun Link
      Delete Link
      Cancel Link
   Schedules
      Assign to Jobs Link
   Environment
      Scheduler Daemon Link
      Queue Daemon Link
      Max Jobs Link
      oFetchSize Link
Appendix
   Data Cleansing
      UTF-8
      Special Characters
   Outsourcer Installation Files
   Environment Variables
   Database Components
   Datatypes
   External Tables
      Source Connection
      Creating an External Table
   Insert Only Model
Previous Version Notes
   New to 4.1.5
      Bugs Fixed
      Enhancements
   New to 4.1.4
   New to 4.1.3
   New to 4.1.1
   New to 4.1.0
   4.0.2
   4.0.1
   4.0
      Open Source
      User Interface
      Scheduling
      Transform Jobs
      SQL_Text
      Dependent Jobs
      Dynamic Environment Variables
   3.1
      Oracle
      SQL Server
      DDL Refresh Type
   3.0
      Security
      Enhancements
      Fixes
Overview of Outsourcer
Outsourcer automates the tasks needed to load data into Greenplum from SQL Server and Oracle by
leveraging Java and Greenplum’s External Web Tables. You define the source to target mapping in a
“Job” and then submit the Job into a Queue to be processed. All of this can be done from a web
browser, so no coding is needed.
New to 4.1.6
Bugs Fixed
1. Small bug fix for cancel feature on transform jobs. Multiple statement transform jobs were not
being cancelled properly.
2. Added missing documentation on new External Tables.
Installation and Upgrades
1. Download the os_x_x.zip file from http://www.pivotalguru.com, transfer it to your MASTER
(mdw) server in Greenplum, and then unzip it.
Files in os_x_x.zip: install.sh, os.zip, and version.sh.
2. Log in as root, chmod 755 the install.sh file if necessary and then execute:
./install.sh
3. The installer will set all environment variables for you but you can also adjust these optional
environment variables in your .bashrc file:
# optional (defaults below)
export UIPORT=8080
# optional (defaults below)
export XMS=128M
# optional (defaults below)
export XMX=256M
# optional (defaults below)
export OSJAR=$OSHOME/jar/Outsourcer.jar
# optional (defaults below)
export MSJAR=$OSHOME/jar/sqljdbc4.jar
# optional (defaults below)
export OJAR=$OSHOME/jar/ojdbc6.jar
Be sure to source your .bashrc file after making these changes.
4. Follow the additional instructions from the installer to install the Oracle and SQL Server JDBC
drivers. Be sure to set the ownership of the JDBC jar files to gpadmin (or the admin you picked
during installation).
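As an illustration of how the optional variables listed above fall back to their defaults, here is a minimal Python sketch. It is not part of Outsourcer: resolve_settings and DEFAULTS are hypothetical names, and the fallback OSHOME of /usr/local/os is an assumption based on the installation path listed in the Appendix.

```python
import os

# Defaults as listed in this guide. OSHOME itself is set by the installer.
DEFAULTS = {
    "UIPORT": "8080",
    "XMS": "128M",
    "XMX": "256M",
    "OSJAR": "{oshome}/jar/Outsourcer.jar",
    "MSJAR": "{oshome}/jar/sqljdbc4.jar",
    "OJAR": "{oshome}/jar/ojdbc6.jar",
}


def resolve_settings(env, oshome="/usr/local/os"):
    """Return each optional setting, preferring a value found in env.

    env is any mapping of variable names to strings, such as os.environ.
    The oshome fallback is an assumption for this sketch.
    """
    home = env.get("OSHOME", oshome)
    return {
        name: env.get(name, default.format(oshome=home))
        for name, default in DEFAULTS.items()
    }


if __name__ == "__main__":
    # Print the values that would be in effect for the current shell.
    for name, value in sorted(resolve_settings(os.environ).items()):
        print(f"{name}={value}")
```

Running this after sourcing your .bashrc file is one way to confirm which values are in effect before starting the User Interface.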
User Interface
Starting and Stopping Outsourcer
uistart
This is executed at the command line to start the User Interface.
uistop
This is executed at the command line to stop the User Interface.
All Web Pages
All pages have a Search bar, Limit for the number of records to show, sorting of every column, and
pagination (Previous and Next).
Login
Enter a valid username and password from the Greenplum database. The username you use must be
a “superuser” in the database. Authentication is determined by the AUTHSERVER environment variable
and the pg_hba.conf file configuration.
Sources
Click the Sources tab to see the list of configured Sources (Oracle and SQL Server connections).
From this view you can update and validate a Source. Validation attempts to connect to the Source
and notifies you whether the connection succeeded.
Create Jobs
This option bulk creates a Job for every table in the source. You define the source schema, the
target schema (it is created if it doesn’t already exist), the Refresh Type, and the Schedule.
Once the Jobs are created, you can edit each Job definition from the Job page.
Jobs
A Job is a source to target mapping.
Queue Job Link
The Job screen also has a “Queue” button next to each Job. This will immediately place this Job into
the Queue to be processed.
Inserting/Updating Jobs
The Job definition shows the entire source to target mapping, the Transform SQL (which is
executed at the end of the loading process), and the Schedule, if you want to define one.
Define New Job Link
Defines a new Job.
Queue All Jobs Link
Puts all jobs into the Queue except for ones that are already in a Queued or Processing status.
Delete All Jobs Link
Removes all Job definitions.
Refresh Job Type
A Refresh Job refreshes data in Greenplum from the source.
Refresh Process
Outsourcer Refresh Jobs do the following:
1. Connects to the Source and checks that it is valid
2. Creates Target schema if not found
3. Creates Target table if not found based on DDL from Source
4. Creates External Web Table
5. Truncates Target table
6. Inserts all data from Source to Target
Append Job Type
An Append Job is similar to a Refresh Job, but it only gets new data. This is only useful for
tables that never have DELETE or UPDATE statements executed on them in the Source. A table
that contains Web Logs is a good example of where an Append Job is useful.
Append Process
Outsourcer Append Jobs do the following:
1. Connects to the Source and checks that it is valid
2. Creates Target schema if not found
3. Creates Target table if not found based on DDL from Source
4. Gets the MAX ID from Greenplum
5. Gets the MAX ID from the Source
6. If this is the initial load
   a. Create External Web Table
7. If this is not the initial load, compare the two MAX ID values and, if they don’t match
   a. Create External Web Table with an added filter of greater than the Greenplum MAX value
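The decision in steps 4 through 7 of the Append Process can be sketched in Python as follows. This is only an illustration, not Outsourcer's code: plan_append is a hypothetical helper, the column name id is a placeholder, and treating an empty Greenplum table (no MAX ID) as the initial load is an assumption.

```python
def plan_append(target_max_id, source_max_id):
    """Decide what an Append Job would load.

    target_max_id: MAX ID found in Greenplum, or None if the target is empty
                   (assumed here to mean this is the initial load).
    source_max_id: MAX ID found in the Source.
    Returns (action, filter), where filter is the extra predicate applied to
    the External Web Table, or None when no filter is needed.
    """
    if target_max_id is None:
        # Initial load: External Web Table without a filter.
        return ("load_all", None)
    if source_max_id == target_max_id:
        # Nothing new in the Source.
        return ("skip", None)
    # Only rows newer than what Greenplum already has.
    return ("load_new", f"id > {target_max_id}")


if __name__ == "__main__":
    print(plan_append(None, 500))
    print(plan_append(500, 500))
    print(plan_append(500, 750))
```

Because the filter relies only on the MAX ID, rows that are updated or deleted in the Source after they were loaded are never picked up, which is why Append Jobs suit insert-only tables.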
Replication Job Type
This type of Job does Change Data Capture using Triggers in the Source. Outsourcer creates and
manages all of the required objects, so you do not have to manually create database triggers, use
files, etc. to replicate data from Oracle and SQL Server to Greenplum.
Replication requires that the Source table has a Primary Key. If a key is not available, the Job will fail.
SQL Server Objects
Three triggers are created in the Source database to track INSERTS, UPDATES, and DELETES.
These changes are inserted into a new table in the Source that Outsourcer creates dynamically.
Oracle Objects
One trigger is created in the Source database to track INSERTS, UPDATES, and DELETES. These
changes are inserted into a new table in the Source that Outsourcer creates dynamically.
Replication Process
Outsourcer Replication Jobs do the following:
1. Connects to the Source and checks that it is valid
2. Creates Target schema if not found
3. Creates Target table if not found based on DDL from Source
4. Creates stage and archive tables in Greenplum if not found
5. Creates triggers and log table in Source if not found
6. If the Job is configured to snapshot, or if any of the required, dynamically created objects
aren’t present, force a snapshot.
7. If snapshot
   a. Refresh the table
   b. Recreate Archive and Stage tables plus the triggers and log table in the Source
8. If not a snapshot
   a. Get the MAX ID from the log table in the Source
   b. Get the MAX ID from the Archive table in Greenplum
   c. If the two don’t match
      i. Load new changes to the stage table
      ii. Apply changes with Outsourcer Replication function in Greenplum
      iii. Archive changes from stage table to archive table in Greenplum
The Replication Job handles this entire process for you. Commercial Change Data Capture tools
that read database log files directly, rather than using triggers, are also available. With those
tools, however, you have to manually create the logic that is built into Outsourcer.
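The branching in steps 6 through 8 of the Replication Process can be summarized with a short Python sketch. It is an illustration only: plan_replication and the step names it returns are hypothetical labels, not Outsourcer functions.

```python
def plan_replication(snapshot_requested, objects_present,
                     source_log_max_id, archive_max_id):
    """Return the ordered list of steps a Replication Job would take.

    snapshot_requested: the Job is configured to snapshot.
    objects_present:    all dynamically created objects (stage and archive
                        tables, triggers, log table) already exist.
    source_log_max_id:  MAX ID from the log table in the Source.
    archive_max_id:     MAX ID from the Archive table in Greenplum.
    """
    if snapshot_requested or not objects_present:
        # A missing object forces a snapshot even if one was not requested.
        return ["refresh_table", "recreate_stage_archive_triggers_and_log"]
    if source_log_max_id == archive_max_id:
        # No new changes were captured by the triggers.
        return []
    return ["load_changes_to_stage", "apply_changes", "archive_stage_changes"]


if __name__ == "__main__":
    print(plan_replication(False, True, 120, 100))
```

The comparison of the two MAX ID values keeps a run cheap when nothing has changed in the Source since the last run.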
Note: Be sure to VACUUM your replicated tables from time to time to prevent table bloat. You can
also use the sql_text value to VACUUM the table.
Transform Job Type
Transform Jobs are useful for manipulating data in Greenplum. This allows you to create an ELT
(Extract, Load, and Transform) solution for Greenplum.
DDL Job Type
A DDL Job only creates the table in Greenplum based on the DDL from the source. It is similar to a
refresh Job but it doesn’t load any data.
DDL Process
Outsourcer DDL Jobs do the following:
1. Connects to the Source and checks that it is valid
2. Creates Target schema if not found
3. Creates Target table if not found based on DDL from Source
Append-Optimized/Append-Only
All job types, except for Transform, now have the option to make tables
Append-Only/Append-Optimized, Compressed, and Row or Column oriented. There are some exceptions
to this.
1. HAWQ installations have the Append-Only/Append-Optimized option defaulted to on and the
ability to change this has been disabled. This is because HAWQ only supports Append-Only
tables.
2. In Greenplum Database 4.2 and earlier, this option is called “Append-Only”.
3. In Greenplum Database 4.3 and later, this option is called “Append-Optimized” because DELETE
and UPDATE statements are allowed.
4. Replication to Greenplum Database 4.2 has the Append-Only option removed.
5. Replication to HAWQ has been disabled and will be introduced in a future release.
It is highly recommended to use Append-Only or Append-Optimized tables as they use less space,
allow for compression, allow for column orientation, and generally speed up query performance.
Compression
Greenplum database and HAWQ both support the quicklz and zlib compression algorithms. Quicklz is
the most commonly used and provides a good balance between the speed of compressing and
decompressing data and the disk space saved. Zlib lets you specify a compression level from 1
through 9. To keep the user interface as simple as possible, Outsourcer always uses the quicklz
algorithm when you define a table to be compressed.
Note: Compression requires using Append-Only/Append-Optimized tables.
Row/Column Orientation
Greenplum database and HAWQ both support row or column oriented tables. Column orientation is
useful for reducing IO and improving compression for very wide and large tables. The default is
for tables to be row oriented.
Note: Column orientation requires using Append-Only/Append-Optimized tables.
Queue
This is the heart of the multi-threaded processing that is done in Outsourcer. Each Job in the
Queue is processed in a separate thread, and the Queue records the date the Job was put in the
Queue, when it actually started, and when it completed. It also shows the duration, number of
rows, the target table, and any error message.
Statuses
A Job goes through several statuses in the Queue.
1. Queued
2. Processing
3. Successful or Failed
Rerun Link
A Successful or Failed Job can be Rerun.
Delete Link
A Job in the Queue that has not yet started processing can be removed with the Delete link.
Cancel Link
Jobs that are currently processing can be cancelled with the Cancel link.
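The relationship between the Queue statuses and the Rerun, Delete, and Cancel links described above can be sketched as a small lookup table in Python. The names Status, LINKS, and available_links are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


# Which link is offered for a Job in each status, per the descriptions above.
LINKS = {
    Status.QUEUED: {"Delete"},      # not started yet, so it can be removed
    Status.PROCESSING: {"Cancel"},  # running, so it can be cancelled
    Status.SUCCESSFUL: {"Rerun"},   # finished, so it can be rerun
    Status.FAILED: {"Rerun"},
}


def available_links(status):
    """Return the set of links shown for a Job with the given status."""
    return LINKS[status]


if __name__ == "__main__":
    for status in Status:
        print(status.value, sorted(available_links(status)))
```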
Schedules
A Schedule defines how often Outsourcer will automatically put a Job into the Queue to be processed.
Several predefined schedules are included with Outsourcer and can be updated, deleted, or added.
Assign to Jobs Link
This is a quick way to assign a schedule to a group of Jobs based on the target schema.
Environment
View and update Environment settings.
Scheduler Daemon Link
This will Start and Stop the daemon that schedules Jobs.
Queue Daemon Link
This will Start and Stop the daemon that processes Jobs in the Queue.
Max Jobs Link
This is a dynamic setting which controls how many concurrent Jobs (threads) Outsourcer will execute.
Try adjusting this up or down based on the demand you can place on the source databases and how
much load you want in Greenplum for loading data.
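To illustrate what a cap on concurrent Jobs means, the following Python sketch runs a list of jobs with a bounded thread pool. Outsourcer itself is written in Java, so this is only a conceptual model: run_queue is a hypothetical function and each job is just a Python callable.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_queue(jobs, max_jobs):
    """Run callables from jobs with at most max_jobs running at once.

    Returns (results, peak), where results keeps the order of jobs and
    peak is the highest number of jobs observed running concurrently.
    """
    running = 0
    peak = 0
    lock = threading.Lock()

    def wrapped(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return job()
        finally:
            with lock:
                running -= 1

    # max_workers plays the role of the Max Jobs setting in this sketch.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        results = list(pool.map(wrapped, jobs))
    return results, peak


if __name__ == "__main__":
    def make_job(n):
        def job():
            time.sleep(0.01)  # stand-in for loading one table
            return n
        return job

    results, peak = run_queue([make_job(i) for i in range(8)], max_jobs=3)
    print(results, "peak concurrency:", peak)
```

Raising the cap shortens the total run time only while the source databases and Greenplum can absorb the extra load, which is why the setting is worth tuning in both directions.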
oFetchSize Link
This is an Oracle specific setting for how many records are retrieved per fetch. The larger this
value, the fewer fetches are needed to get all of the rows, but each fetch requires more memory.
There is also a point of diminishing returns when this value is made too large.
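The trade-off can be seen with simple arithmetic: the number of fetches is the row count divided by the fetch size, rounded up. The Python sketch below uses a hypothetical helper, fetches_needed, and an example table of one million rows; the numbers are illustrative and are not measurements from Outsourcer.

```python
def fetches_needed(row_count, fetch_size):
    """Number of round trips needed to retrieve row_count rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Ceiling division using integers only.
    return -(-row_count // fetch_size)


if __name__ == "__main__":
    rows = 1_000_000
    for size in (10, 100, 1000, 10000):
        print(f"fetch size {size}: {fetches_needed(rows, size)} fetches")
```

Each tenfold increase in the fetch size removes fewer fetches in absolute terms than the previous one, while the memory needed per fetch keeps growing, which is the point of diminishing returns mentioned above.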
Appendix
Data Cleansing
Data sometimes contains special characters that break typical loading utilities and then require
manually editing files to correct the problem. This makes the entire load process much slower
because a single threaded, manual step becomes the bottleneck for your data loads. Another
problem is the variety of code pages/character sets involved across the source database, source
file server, target file system, and target database (Greenplum). Using UTF-8 solves this problem.
UTF-8
Outsourcer uses Microsoft’s, Oracle’s, and Greenplum’s JDBC drivers to connect to the source and
target databases. “High-ASCII” values, which are often problematic, are no problem for
Outsourcer. A common example is a smart quote from Microsoft, which loads easily to Greenplum
with Outsourcer.
Special Characters
Some characters need to be addressed beyond just the character set.
\
This is the escape character for the Greenplum loading utility. To fix this, Outsourcer escapes
the escape character.
|
The pipe is the delimiter. If it is found in your data, the data will fail to load unless the pipe
is escaped. To fix this, Outsourcer escapes the pipe.
\r and \n
These are the carriage return and new line characters. They are replaced with a space so that the
data will load and you will still be able to analyze the text.
\0
This is the “null” character. It is sometimes used to insert “nothing” into a NOT NULL column.
This character is replaced with an empty string.
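The substitutions described above can be sketched in a few lines of Python. This is an illustration of the rules, not Outsourcer's Java implementation: cleanse is a hypothetical function, and the order of the replacements is an assumption (the backslash is handled first so that the backslash added in front of a pipe is not escaped a second time).

```python
def cleanse(value):
    """Apply the special character rules to one text value."""
    return (
        value.replace("\\", "\\\\")  # escape the escape character first
        .replace("|", "\\|")         # escape the pipe delimiter
        .replace("\r", " ")          # carriage return becomes a space
        .replace("\n", " ")          # new line becomes a space
        .replace("\0", "")           # null character becomes an empty string
    )


if __name__ == "__main__":
    raw = "C:\\logs|line one\r\nline two\0"
    print(cleanse(raw))
```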
Outsourcer Installation Files
/usr/local/os                               Symbolic link to installation
/usr/local/os/LICENSE.txt                   Software license
/usr/local/os/README.txt                    Read me file with basic information
/usr/local/os/bin/agentstart                Starts the Scheduler Agent
/usr/local/os/bin/agentstatus               Returns the status (up or down) of the Scheduler Agent
/usr/local/os/bin/agentstop                 Stops the Scheduler Agent
/usr/local/os/bin/osstart                   Starts Queue daemon
/usr/local/os/bin/osstatus                  Returns the status (up or down) of the Queue Daemon
/usr/local/os/bin/osstop                    Stops Queue daemon
/usr/local/os/bin/uistart                   Starts the UI
/usr/local/os/bin/uistop                    Stops the UI
/usr/local/os/jar/Outsourcer.jar            Outsourcer Jar file
/usr/local/os/jar/OutsourcerScheduler.jar   Outsourcer Scheduler Jar file
/usr/local/os/jar/OutsourcerUI.jar          Outsourcer UI Jar file
/usr/local/os/jar/gpdb.jar                  Greenplum JDBC driver
/usr/local/os/jar/nanohttpd.jar             NanoHttpd Jar file
/usr/local/os/log/                          Log files are placed here for debugging
/usr/local/os/os_install.sh                 Install script for the database components of Outsourcer
/usr/local/os/os_path.sh                    Sets the Outsourcer path and default environment variables
/usr/local/os/sql/*                         Used by os_install.sh to install database components
Environment Variables
These environment variables can be changed by setting the variables in your .bashrc file.
Variable    Default Value                        Optional  Description
PGDATABASE  (set during installation)            N         Name of the database where Outsourcer is installed
PGPORT      (set during installation)            N         Port number of the Greenplum instance
AUTHSERVER  (set during installation)            Y         IP Address or Name of the Master host where users connect and are prompted for a password. Do not put localhost. This may require editing your pg_hba.conf file too.
UIPORT      8080                                 Y         Port number for the Web User Interface
XMX         256m                                 Y         Java Max memory
XMS         128m                                 Y         Java Initial memory
OSJAR       $OSHOME/jar/Outsourcer.jar           Y         Location of the Outsourcer Jar file
MSJAR       $OSHOME/jar/sqljdbc4.jar             Y         Location of the SQL Server Jar file
OJAR        $OSHOME/jar/ojdbc6.jar               Y         Location of the Oracle Jar file
OSAGENTJAR  $OSHOME/jar/OutsourcerScheduler.jar  Y         Outsourcer Scheduler Agent Jar file
OSUIJAR     $OSHOME/jar/OutsourcerUI.jar         Y         Outsourcer UI Jar file
GPDBJAR     $OSHOME/jar/gpdb.jar                 Y         Greenplum JDBC driver
NANOJAR     $OSHOME/jar/nanohttpd.jar            Y         Nanohttpd Jar file
Database Components
Outsourcer uses database objects to manage two main concepts. The first is a Job, which contains
a source to target mapping. The second is a Queue. The Outsourcer Queue is like any other Queue
in which items are submitted and processed in multiple threads.
Type       Name                     Description
Schema     ext                      External tables are created here by Outsourcer
Schema     os                       Main schema used by Outsourcer
Function   os.fn_cancel_job         Function used to cancel a running job. Submit the job id to the function to cancel
Function   os.fn_create_ext_table   Creates an External Table
Function   os.fn_get_variable       Used internally to get the value of a variable
Function   os.fn_queue              Puts a Job into the Queue for a given Job ID
Function   os.fn_queue_all          Puts all Jobs into the Queue
Function   os.fn_replication        Used internally for a replication Job
Function   os.fn_replication_setup  Used internally for a replication Job
Function   os.fn_schedule           Used internally for scheduling Jobs in the Queue
Function   os.fn_start_schedule     Used internally for scheduling Jobs in the Queue
Function   os.fn_update_status      Used internally to manage the Queue status of a Job
View       os.job                   Job definition of a source to target mapping (hides deleted and updated rows)
Table      os.ao_job                Table with job information
View       os.queue                 Contains the scheduled or executing Jobs (hides deleted and updated rows)
Table      os.ao_queue              Table with queue information
View       os.ext_connection        Contains external table connection information (hides deleted and updated rows)
Table      os.ao_ext_connection     Table with external table connection information
Ext Table  os.agentstart            Selecting from this table will start the background Scheduler agent (/usr/local/os/agentstart)
Ext Table  os.agentstatus           Selecting from this table will get the status of the background Scheduler agent (/usr/local/os/agentstatus)
Ext Table  os.agentstop             Selecting from this table will stop the background Scheduler agent (/usr/local/os/agentstop)
Ext Table  os.osstart               Selecting from this table will start the background Queue daemon (/usr/local/os/osstart)
Ext Table  os.osstatus              Selecting from this table will get the status of the background Queue daemon (/usr/local/os/osstatus)
Ext Table  os.osstop                Selecting from this table will stop the background Queue daemon (/usr/local/os/osstop)
Ext Table  os.sessions              Sessions text file. Managed here rather than overhead of maintaining inside a database table
Ext Table  os.uistart               Selecting from this table will start the User Interface
Ext Table  os.uistop                Selecting from this table will stop the User Interface
Datatypes
A key feature of Outsourcer is the automatic conversion of Oracle and SQL Server datatypes to
Greenplum datatypes. The following charts describe the conversion done by Outsourcer. Please note
that the chart does not include data types that have the same datatype name between the source and
Greenplum. Also, when a column is defined in the source with one of the unsupported datatypes, the
column is omitted from the table in Greenplum.
Oracle
Oracle Datatype                 Oracle Description                                   Greenplum Datatype         Greenplum Description
BFILE                           -                                                    -                          Not supported by Outsourcer
BINARY_DOUBLE                   -                                                    float8                     -
BINARY_FLOAT                    -                                                    float8                     -
BLOB                            -                                                    -                          Not supported by Outsourcer
CHAR                            Single byte characters                               character(length)          Single and multi-byte characters
CLOB                            Large text                                           text                       Single and multi-byte large text
DATE                            -                                                    timestamp                  -
DECIMAL                         Sub-type of NUMBER that has 38 decimal digits        numeric                    decimal is equivalent to numeric in Greenplum so Outsourcer uses numeric
INT                             Sub-type of NUMBER that has 38 digits                numeric                    Far larger than a Greenplum integer or bigint
LONG                            Only supported by Oracle for backward compatibility  text                       Single and multi-byte large text
LONG RAW                        De-supported by Oracle                               -                          Not supported by Outsourcer
MLSLABEL                        De-supported by Oracle                               -                          Not supported by Outsourcer
NCHAR                           Single and multi-byte characters                     character(length)          Single and multi-byte characters
NCLOB                           Single and multi-byte large text                     text                       Single and multi-byte large text
NUMBER                          -                                                    numeric                    Precision and scale do not reduce the overhead of storing data
NVARCHAR2                       Single and multi-byte characters                     character varying(length)  Single and multi-byte characters
RAW                             De-supported by Oracle                               -                          Not supported by Outsourcer
ROWID                           Base 64 encoding of the location                     character varying(18)      Convert this value to text
TIMESTAMP                       -                                                    timestamp                  -
TIMESTAMP WITH LOCAL TIME ZONE  -                                                    timestamptz                -
TIMESTAMP WITH TIME ZONE        -                                                    timestamptz                -
UROWID                          Universal Row ID                                     character varying(length)  Convert this value to text
VARCHAR                         Single byte characters                               character varying(length)  Single and multi-byte characters
VARCHAR2                        Single byte characters                               character varying(length)  Single and multi-byte characters
XMLTYPE                         -                                                    -                          Not supported by Outsourcer
SQL Server
Greenplum
Datatype
bigint
binary
Description
Datatype
bigint
bit
char
boolean
Single byte characters character(length)
Description
Not supported by
Outsourcer
date
datetime
datetime2
datetimeoffset
date
timestamp
timestamp
timestamptz
decimal
numeric
float
hierarchyid
float8
image
int
money
nchar
ntext
nvarchar
sysname
time
timestamp
tinyint
uniqueidentifier
varbinary
integer
numeric
character(length)
character
varying(length)
float8
timestamp
smallint
numeric
real
smalldatetime
smallint
smallmoney
sql_variant
text
Not supported by
Outsourcer
Not supported by
Outsourcer
De-supported
datatype by Microsoft
Single and multi-byte
characters
De-supported
datatype by Microsoft
Includes timezone
offset
Numeric is
equivalent to
decimal
character
varying(length)
De-supported
datatype by Microsoft
Single and multibyte characters
Not supported by
Outsourcer
Single and multibyte characters
Not supported by
Outsourcer
Single and multibyte characters
Not supported by
Outsourcer
time
Not the ANSI
standard for date but
a binary field.
Not supported by
Outsourcer
smallint
character varying(36)
Not supported by
Outsourcer
Not supported by
Outsourcer
xml
External Tables
In addition to dynamically creating External Tables based on a source table, Outsourcer can create
permanent external tables that source data from SQL Server or Oracle using a user-defined SQL
statement. This feature was introduced in 3.0 and requires defining a source connection and
executing an Outsourcer function.
Source Connection
Define a connection on the Sources page and then use the ID from the page below for
p_connection_id.
Creating an External Table
The following function is used to create an external table.
os.fn_create_ext_table(p_table text, p_columns text[],
p_connection_id integer, p_sql_text text)
Parameter       | Description
----------------|-----------------------------------------------------------------------------------------
p_table         | Name of the External Table to be created. This can include the schema name as well.
p_columns       | This is an array which contains the name of the column with the Greenplum datatype per element.
p_connection_id | This is the unique identifier from the os.ext_connection table that will be used to connect to SQL Server or Oracle.
p_sql_text      | This is the SQL statement you wish to execute.
Here is an example of using this function to create an External Table.
SELECT os.fn_create_ext_table('public.my_ext_table',
ARRAY['id int', 'fname varchar', 'lname varchar'],
1,
'SELECT id, fname, lname from os_demo.dbo.poc');
This will create the external table “my_ext_table” in the public schema with three columns and pull
data from SQL Server with the provided SQL statement. Notice for SQL Server you need to specify the
database.schema.table. The connection to SQL Server uses the default database connection so
you will need to fully qualify the database name if you want to extract data from other databases.
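If the call is generated from a script, the statement above can be assembled from its four arguments. The following is a minimal Python sketch, not part of Outsourcer: the helper names sql_literal and build_create_ext_table_call are hypothetical, and the quoting only doubles single quotes (it does not handle backslashes, which some server settings treat as escape characters).

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_create_ext_table_call(table, columns, connection_id, sql_text):
    """Return a SELECT statement that calls os.fn_create_ext_table.

    columns is a list of (column_name, greenplum_datatype) pairs; each pair
    becomes one 'name datatype' element of the p_columns array.
    """
    column_items = ", ".join(
        sql_literal(f"{name} {gp_type}") for name, gp_type in columns
    )
    return (
        f"SELECT os.fn_create_ext_table({sql_literal(table)}, "
        f"ARRAY[{column_items}], "
        f"{int(connection_id)}, "
        f"{sql_literal(sql_text)});"
    )


# Rebuild the example from this section.
statement = build_create_ext_table_call(
    "public.my_ext_table",
    [("id", "int"), ("fname", "varchar"), ("lname", "varchar")],
    1,
    "SELECT id, fname, lname from os_demo.dbo.poc",
)
print(statement)
```

The printed statement can then be executed in Greenplum with any SQL client, using the connection ID shown on the Sources page.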
Insert Only Model
The Outsourcer catalog is stored in the os schema and contains tables prefixed with ao_ and views
with the same name but without the prefix. Example: table: os.ao_job and view: os.job. HAWQ
does not support UPDATE or DELETE commands on a table, so Outsourcer has been designed to
only INSERT data into the ao_ tables. These tables have two additional columns: insert_id
and deleted.
When a row needs to be UPDATED, a new row is inserted into the ao_ table and then the view only
shows the greatest insert_id for the given logical primary key.
When a row needs to be DELETED, a new row is inserted into the ao_ table but with the deleted
column set to true.
The user interface and functions take care of this insert only model but if you need to manually edit
tables, be aware of how this works.
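The behavior described above can be tried out in isolation. The sketch below uses Python's built-in sqlite3 module rather than Greenplum or HAWQ, and the table, view, and column layout are simplified assumptions for illustration, not the actual Outsourcer catalog definitions. It also assumes that the view hides a key once its newest row has deleted set to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for an ao_ table: every change is a new row.
    CREATE TABLE ao_job (
        insert_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id        INTEGER NOT NULL,          -- logical primary key
        name      TEXT,
        deleted   INTEGER NOT NULL DEFAULT 0
    );

    -- The view keeps only the newest row per logical key and hides
    -- keys whose newest row is flagged as deleted.
    CREATE VIEW job AS
    SELECT a.id, a.name
    FROM ao_job AS a
    WHERE a.insert_id = (SELECT MAX(b.insert_id)
                         FROM ao_job AS b
                         WHERE b.id = a.id)
      AND a.deleted = 0;
""")


def insert_version(job_id, name, deleted=False):
    """Record a create, update, or delete as a plain INSERT."""
    conn.execute(
        "INSERT INTO ao_job (id, name, deleted) VALUES (?, ?, ?)",
        (job_id, name, int(deleted)),
    )


insert_version(1, "load customers")               # create job 1
insert_version(2, "load orders")                  # create job 2
insert_version(1, "load customers nightly")       # "update" job 1
insert_version(2, "load orders", deleted=True)    # "delete" job 2

visible = conn.execute("SELECT id, name FROM job ORDER BY id").fetchall()
history = conn.execute("SELECT COUNT(*) FROM ao_job").fetchone()[0]
print(visible)
print(history)
```

The view returns only the latest version of job 1, while all four inserted rows remain in the underlying table. This is why a manual edit should add a new row rather than change an existing one.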
Previous Version Notes
New to 4.1.5
Bugs Fixed
1. Error messages raised from Oracle and SQL Server could include special characters which
would cause the thread in Outsourcer to die while the status remained labeled as “processing”.
Escape and quote characters are now removed from error messages to prevent this problem.
2. On new installations, the Oracle and SQL Server Jar files would remain owned by root. They
are now owned by the admin user (gpadmin).
3. On the “Sources” screen, the label for Append-Only now is correct for Greenplum Database 4.3
and reflects that these are really “Append-Optimized” tables.
Enhancements
1. Added External Tables to enable starting and stopping the UI rather than having to ssh to the
Master host.
2. “Queue” screen now provides the ability to cancel processing jobs.
3. Stopping Outsourcer Queue Daemon (Environment Screen), now will cancel currently running
Jobs in the Queue.
4. Starting Outsourcer Queue Daemon (Environment Screen), now will make sure orphaned jobs
in the Queue will be marked as Failed and if any are currently running, it will cancel these jobs
too.
New to 4.1.4
• Corrected Jar file installation instructions in os_install.sh
• Added support for new SQL Server 2014 data types including bit, date, datetime2,
datetimeoffset, real, time, and tinyint
• Removed ANALYZE from table loading because it is handled automatically by
gp_autostats_mode=on_no_stats (default) or on_change. If you are using none, you can still
ANALYZE the table using the sql_text value.
New to 4.1.3
• Added ability to specify the User Name and UI Port during the install.
• Huge UI performance increase by using PGPoolingDataSource and by managing sessions in
a file rather than in a table.
• Fixed a bug in the installer where the AUTHSERVER environment variable wasn’t being set in
the .bashrc file correctly.
New to 4.1.1
• Fixed a bug in the Create External Table function used to create static external tables. This was
introduced when refactoring code and using SourcePort rather than port for the column names.
• Fixed a bug in the Java code to execute static External Tables.
New to 4.1.0
• Added support for HAWQ for Append, Refresh, Transform, and DDL refresh types. Replication
to be added in the future.
• Added ability to make target tables Append-Only (<= GPDB v4.2) or Append-Optimized (>=
GPDB v4.3).
• Added ability to make target tables Compressed (quicklz) for Append-Only/Append-Optimized
tables for GPDB and HAWQ.
• Added ability to pick row or column orientation for Append-Only/Append-Optimized tables for
GPDB and HAWQ.
• Append-Optimized tables allow for DELETE statements (>= GPDB v4.3) while Append-Only
tables (<= GPDB v4.2) do not. Replication jobs in GPDB v4.2 will not give the ability to pick
Append-Optimized tables.
• All Outsourcer tables are now Append-Optimized/Append-Only.
• New installer makes installation much simpler with automatic backups of the os schema,
automatic upgrades from previous versions, automatic editing of configuration files, and
automatic starting of services.
• Removed the automatic creation of primary keys on all tables.
• Fixed bug where an extra database session was kept open while tables were being loaded.
4.0.2
• Corrected Open Source license file. It is now using a BSD license and the NanoHTTPD license
(web server).
• Corrected install.sql file that incorrectly had $BODY$ for one function in the declaration.
• Corrected cookie conflict with Command Center.
• Reduced the number of database calls when using the Web interface.
• Removed updates from the os.sessions table for the Web interface.
4.0.1
• Changed the default Oracle Fetch Size from 40,000 to 2000, which seems to be the point of
diminishing returns. The Oracle default is only 10, which makes exporting data very slow. A
higher value exports data faster but needs more memory; a fetch size of more than 2000 doesn’t
improve performance further but does consume more memory.
• Separated the Greenplum JDBC driver into a separate JAR file so that it is possible to upgrade
just this driver.
• Separated the User Interface classes into a separate JAR file so it needs less memory.
• Separated the Scheduler classes into a separate JAR file so it needs less memory.
• Separated the NanoHttpd classes into a separate JAR file so it can be upgraded independently
of Outsourcer.
• Fixed a bug in the UI where the SQL Text Box wasn’t visible when creating a Transform Job.
• Fixed a bug in the UI where quotes weren’t escaped properly for SQL entered into the SQL Text
Box.
4.0
Open Source
• Outsourcer is now open source with the Boost Software License and the NanoHTTPD license
(web server).
User Interface
• An all-new User Interface has been built allowing you to manage Sources, Jobs, the Queue,
Schedules, and the Environment.
• No separate web server is needed. Everything you need for the User Interface is included.
• Dynamic Job creation based on a source schema.
• Start/Stop the Queue and Scheduler.
• Update the dynamic variables in the Environment.
• Security is handled by the database authentication method so it leverages the pg_hba.conf
file.
Scheduling
• Jobs can be scheduled to execute on a recurring basis with many predefined schedules
included.
• Separate daemon process for the scheduler agent.
Transform Jobs
• In previous versions, a transform Job would not execute while any other Job types were
Processing or Queued. This was done to force all data to be loaded before it is
transformed. In other words, Extract, Load, and then Transform. This restriction has been
removed, and transform Jobs now execute based on the queue_date just like the other Job types.
SQL_Text
• In previous versions, the sql_text column in the Job table was ignored for all Job types except
transform. It is now available for all Job types and is executed as the last step.
Dependent Jobs
• Dependent jobs can now be achieved by updating the sql_text field of a Job and adding “select
os.fn_queue(12)” which will put job.id = 12 in the Queue after the main Job completes. You can
also add multiple Jobs to kick off by using multiple SQL statements that are separated with a
semicolon.
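As an illustration of the semicolon-separated form, the small Python sketch below builds a sql_text value for one or more dependent Jobs. The helper name dependent_jobs_sql and the second Job id (15) are hypothetical; only os.fn_queue and the id 12 come from the text above.

```python
def dependent_jobs_sql(job_ids):
    """Build a sql_text value that queues each dependent Job in order."""
    return "; ".join(f"select os.fn_queue({int(job_id)})" for job_id in job_ids)


single = dependent_jobs_sql([12])
several = dependent_jobs_sql([12, 15])
print(single)
print(several)
```

The resulting string is what would be stored in the sql_text field of the main Job.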
Dynamic Environment Variables
• The Oracle fetch size and the maximum number of concurrent Jobs are now dynamically set
and no longer set in the .bashrc file.
3.1
Oracle
• FetchSize is now configurable. To minimize round trips to Oracle, make this setting rather large
(greater than the default of 10) which increases exporting speed but at the expense of needing
more RAM. Adjust this setting up or down based on your environment. Default is 2000. There
is a point of diminishing returns when setting this higher than 2000 and the larger this value, the
more RAM will be needed by Java.
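The diminishing returns can be seen with simple arithmetic. The Python sketch below assumes a simplified model in which each batch of FetchSize rows costs one round trip to Oracle; the 10,000,000-row table is a hypothetical example, and real export times also depend on row width, network latency, and available memory.

```python
def round_trips(row_count: int, fetch_size: int) -> int:
    """Number of fetch round trips needed to read row_count rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Integer ceiling division: a partial last batch still costs one trip.
    return (row_count + fetch_size - 1) // fetch_size


rows = 10_000_000  # hypothetical table size
for size in (10, 2000, 40000):
    print(size, round_trips(rows, size))
```

Under this model, raising the setting from 10 to 2000 removes 995,000 round trips, while raising it further from 2000 to 40,000 removes only 4,750 more and requires a buffer twenty times larger.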
SQL Server
• Fix to better handle non-default schemas
DDL Refresh Type
• Several customers have requested for Outsourcer to just create the tables in Greenplum based
on the DDL in the source without loading the tables. This new Refresh Type does just that.
3.0
Security
• The External Tables created by Outsourcer no longer contain the username and password for the
source connection.
• Oracle sources now use ALL_% objects rather than DBA_% objects. This means you don’t
need to grant DBA or SELECT on the DBA Views in Oracle to use Outsourcer.
• TRUST authentication is used to start Outsourcer so no password is stored.
Enhancements
• Oracle connections are now faster and use less memory than with 2.x.
• New External Table feature for defining a SQL statement to execute in Oracle or SQL Server to
be the source of an External Table.
• Better examples of the Job types for Oracle and SQL Server
Fixes
• Under certain conditions, a failed Job could leave a database connection open to the
source. Outsourcer now properly closes the connection on Failed Jobs.
• If multiple Jobs executed at the same time, it was possible for more than 1 Job to attempt to
create the target schema at the same time. This no longer is a problem.