Getting connected with your DATA: Using SAS/CONNECT® and SAS/ACCESS® to work with
data housed in a remote environment
Kevin Delaney, New York State Office of Mental Health, Albany, New York
Abstract
This paper will provide an overview of
SAS/CONNECT and SAS/ACCESS software for
accessing and manipulating databases located in a
remote environment. Using the example of an
Oracle database on a remotely located Unix server,
the author will demonstrate many of the main
features of SAS/CONNECT and SAS/ACCESS.
SAS/CONNECT topics to be covered include
connecting to the remote server, submitting SAS
code remotely, and moving data back and forth
between the server and client. SAS/ACCESS topics to
be covered include interfacing with the data using
the LIBNAME statement with SAS/ACCESS-specific
options, and running a query against the data
using the SQL Pass-Through facility. Issues of
efficiency and practicality will also be discussed.
Introduction
At this company, many of our large data sets are
housed in an ORACLE relational database, on a
Unix server. In order to access these data from our
local area network, and work with them in SAS, we
had to become familiar with the SAS/CONNECT
and SAS/ACCESS software packages. As with most
SAS® software, within these packages there are
many different ways to reach the same goal. Once
you become familiar with several of these methods,
the only challenge is to figure out which method is
most appropriate for a given circumstance. This
paper will attempt to walk through several of the
more common utilities of SAS/CONNECT and
SAS/ACCESS software, and hopefully clarify which
methods are most efficient and most practical.
Throughout this paper I will stick with what I know:
I will use examples involving a SAS/CONNECT
session with a Unix server, and the SAS/ACCESS
interface with ORACLE relational databases. For
those of you who know more about other operating
systems, or other database management systems, I
hope you will find that my examples are adaptable to
your host/entity of choice.
SAS/CONNECT
The introduction to the SAS/CONNECT User's
Guide tells us that SAS/CONNECT is a "SAS-to-SAS
client/server toolkit." What exactly does this
mean? SAS/CONNECT software can be used to
connect to a SAS session running on a remote
server, to transfer data between environments, and to
process data on the remote server. Within this paper
I will attempt to address the multitude of methods
that SAS/CONNECT provides for accomplishing
these tasks. SAS/CONNECT also
supports SCL commands and SAS/AF applications
that allow for remote messaging, linking of objects
on different platforms, and running of scheduled
applications for routine updates from one server to
another, but I will not cover these topics here.
Getting Connected
There are several methods within SAS/CONNECT
that can be used to actually connect to a remote
environment. In the Windowing environment you
can use the SIGNON window to connect to the
remote host.
Select RUN from the toolbar, then SIGNON
Figure 1: Select Run and SIGNON from the Display
Manager Pull down menus.
This gets you the following SIGNON menu:
For my example I am using a script called
tcpunix2.scr to connect to the remote host Unixdata,
using the TCP/IP access method. These are the only
three fields that need to be filled in; as the NOTE at
the bottom of the window states, leaving a field
blank will default to the current setting. The only
other item you might want to change is whether or
not remotely submitted commands execute
synchronously, but we will discuss this more fully in
a minute.
Figure 2: SAS/CONNECT SIGNON window
SAS/CONNECT ships with a number of script files
that establish the connection between SAS on the
local host and SAS on the remote host. These are
specific to the remote host, but can be modified from
their standard form. You can also write your own
script file; instructions for doing so are included in
the SAS/CONNECT User's Guide, Version 8. By
default these script files are stored in the
!SASROOT\CONNECT\SASLINK
folder in Windows. SAS will also look for script
files in your SASUSER folder. An example of the
default TCPUNIX.scr that ships with SAS is
attached to this paper, as well as my modified
TCPUNIX2.scr; see if you can recognize the
changes. As you might guess, you can have a lot of
fun with the script files, if you are so inclined.
SAS/CONNECT supports several different access
methods that are operating system dependent. All of
my examples will involve the TCP/IP access method
for communication between Unix and Windows. I
will not say a whole lot about access methods other
than to mention that you need to use one that is
supported by both the local and remote hosts. For a
more in-depth discussion see Communications Access
Methods for SAS/CONNECT and SAS/SHARE
Software, Version 8.
With this information, and the name of the remote
host onto which you would like to connect, you can
then sign on to your remote SAS session using the
SAS/CONNECT SIGNON window pictured in
Figure 2. You would place the name of the script
file you would like to use in the first line, the remote
session's name in the second and your
communications access method in the third.
If you prefer a more programmatic approach when
signing on to the remote host, the syntax is equally
easy to grasp. In SAS Version 8, you need only
associate the fileref RLINK with your script file and
then issue the SIGNON command. For my example:
filename rlink 'tcpunix2.scr';
signon unixdata;
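As a minimal sketch of the same sign-on with the session settings spelled out, the COMAMID= and REMOTE= system options can name the access method and the remote session, so that SIGNON needs no argument. The host address below is a placeholder invented for illustration, not the address of the server used in this paper.

```sas
/* Hypothetical host address. With the TCP/IP access method the   */
/* remote session id is a macro variable holding the host address */
%let unixdata=unixdata.example.org;
options comamid=tcp remote=unixdata;

filename rlink 'tcpunix2.scr';   /* script file from the example above */
signon;                          /* uses the REMOTE= session id         */

/* ... remote work goes here ... */

signoff;                         /* end the remote session when finished */
```

Setting the options once means later RSUBMIT blocks and the final SIGNOFF pick up the same session without repeating its name.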
Passing SAS statements to the remote host
Now that we are signed onto SAS "up" on Unix, let's
send some SAS commands through and see how it
works. To send SAS statements to a remote host
you need only bracket your normal SAS code with
an RSUBMIT; - ENDRSUBMIT; block. For
instance:
rsubmit;
libname myunix '/home/myunixdir';
endrsubmit;
This creates the library MYUNIX within the SAS
session executing on Unix, and then returns the log
from this remote session to your local SAS Log. (If
we had done something that produced output, the
output would also be directed down to the local
output window.)
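To see output come back along with the log, a sketch along the following lines could be submitted once MYUNIX exists. It assumes the remote directory holds at least one SAS data set.

```sas
rsubmit;
/* List the members of the remote MYUNIX library. NODS suppresses */
/* the per-data-set detail so only the directory listing prints   */
proc contents data=myunix._all_ nods;
run;
endrsubmit;
```

The listing is produced by the SAS session on Unix but appears in the local output window.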
Remote Library Services
SAS/CONNECT also offers the ability to create a
local library that refers to files on the remote session
using the REMOTE engine. This is useful if you
wish to use the Explorer window to look at the SAS
data sets housed in your remote directories. The
syntax to create a LOCAL libref to the same
directory as our MYUNIX LIBRARY "up there"
would be:
libname mylocux '/home/myunixdir'
server=unixdata;
Once you have set up this remote libref you can then
manipulate data on the remote host without
wrapping it in an RSUBMIT; - ENDRSUBMIT
block. For example:
proc contents data=mylocux.set1
out=mylocux.set1contents;
run;
If you happen to know the directory you have been
assigned on the remote host this works well, but
what about viewing the work directory? You can
use the SASHELP.VMEMBER data set view on
your remote host to set up a local libref to your
remote WORK library:
rsubmit;
data findwork;
x=1;
run;
data find2(keep=path);
set sashelp.vmember;
if upcase(memname)='FINDWORK';
run;
proc download data=find2 out=finduxwork;
run;
endrsubmit;
data _null_;
set finduxwork;
call symput("workdir",trim(path));
run;
%put &workdir ; *to make sure it worked;
libname unixwork "&workdir"
server=uxdata2;
Notice we are looking for the Unix WORK library,
so we need to SET SASHELP.VMEMBER from
Unix, by using an RSUBMIT with our DATA step. For
those of you who have not used the VMEMBER
data set view in the past, it contains the attributes of
all the data sets currently referenced in your SAS
session. By creating a data set in the WORK library
and then selecting the variable PATH for that data set,
we obtain the full path of our current WORK library.
This example also adds a new SAS/CONNECT
procedure. PROC DOWNLOAD, and its partner in
crime PROC UPLOAD, are SAS/CONNECT
procedures that perform data transfer. The syntax
for the procedures really is as easy as it looks. For
PROC DOWNLOAD, DATA= data set name refers
to the data on the remote host which you wish to
push to your local SAS session. OUT= data set name
is the name of the data set that will reside in the local
session. In this case the procedure copies the data
set FIND2 from Unix down into the data set
FINDUXWORK on our local SAS session. This
data set is then used to create the macro variable
WORKDIR, and a remote library reference to WORKDIR
is established. This seems like a lot of work, but it
actually executes in tenths or hundredths of seconds,
and then allows you to use the local EXPLORER
WINDOW to look at data sets on the remote server,
rename them interactively, and even move them to
other referenced libraries on either host.
You can use the remote library reference as you
would any other library reference, so you can SET
data on the remote host and use it to create a local
data set, you can use PROC PRINT to print data
from the remote host, and, well, you get the point.
However, this is often not the most efficient way to
use the SAS/CONNECT product. For example, let's
look at the following code:
HEAT # 1
data work.test;
set unixwork.smallset;
run;
VS.
rsubmit;
proc download data=work.smallset
out=work.test2;
run;
endrsubmit;
Heat #2
data unixwork.test;
set localref.smallset;
run;
VS.
rsubmit;
proc upload data=localref.smallset
out=work.test2;
run;
endrsubmit;
Heat # 3
proc format library=work
cntlout=unixwork.fmts;
run;
rsubmit;
proc format library=work cntlin=work.fmts;
run;
endrsubmit;
VS.
proc format library=work
cntlout=work.fmts;
run;
rsubmit;
proc upload data=work.fmts
out=work.fmts2;
run;
proc format cntlin=fmts2;
run;
endrsubmit;
I am not sure where the word HEAT comes from, but
definition 10a in my dictionary does state "One
round of many in a sporting competition, such as a
race."
This example pits remote library services against
PROC DOWNLOAD/UPLOAD in a little contest to
see who is faster. With relatively small numbers of
observations, and particularly with small numbers of
variables, these two methods come pretty close.
However, PROC DOWNLOAD/UPLOAD
definitely wins both HEAT # 1 and HEAT # 2. The
advantage to using this procedure over the Remote
library option grows wider as you add more
variables and observations to the data sets you are
moving between hosts. Of course if you are
cleaning up for the night and interactively moving
data from your Unix work directory to a permanent
library it might be easier to click and drag in the
EXPLORER WINDOW, but for long programs that
need to be repeatable and/or completely automated,
PROC DOWNLOAD/UPLOAD seems to make
more sense.
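Since the cost of a transfer grows with the number of variables and observations moved, one way to keep a download small is to subset on the remote side with data set options. The following is only a sketch; the data set BIGSET and the variables ID, YEAR and COUNTY are hypothetical names chosen for illustration.

```sas
rsubmit;
/* Only the kept variables and the rows passing the WHERE clause */
/* travel across the network to the local WORK library           */
proc download data=work.bigset(keep=id year county
                               where=(year=1999))
              out=work.subset;
run;
endrsubmit;
```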
HEAT # 3 is much closer, because there is an extra
step needed to use PROC UPLOAD to move the
data. Also, unless you have a HUGE FORMAT
CATALOG, I don't know that the FMTS data set
will ever be big enough to see a real difference in
efficiency.
What would be neat (this is directed to those SAS
people who make this stuff happen) is if
options fmtsearch = (work.formats
unixwork.formats library.formats);
actually worked. Unfortunately as it stands now if
you try to assign formats located in the unixwork
library or any other remote library using the
OPTIONS FMTSEARCH=() option and a remote
library reference, you won't get an error, but when
you try to assign a format from a remote FORMAT
catalog to a local session variable it won't work.
This is because "You cannot open a catalog through
a server because access to catalogs is not supported
when the user machine and server machine have
different data representations." (If you want to see
this "NOTE" yourself double click on the
FORMATS catalog as it appears in the
UNIXWORK folder of the EXPLORER
WINDOW.)
Are we having fun yet? The best attributes of
SAS/CONNECT software are still ahead of us. Not
only can SAS/CONNECT talk back and forth with a
remote host, but it can also do so asynchronously.
To this point we have not made use of the
WAIT=NO option in any of our RSUBMIT
statements. This option tells SAS to send the SAS
statements in the RSUBMIT; - ENDRSUBMIT;
block through to the host server, but to immediately
return control to the local SAS session. We haven't
used this option thus far because we haven't needed
it; all of the code we have submitted executed and
returned results faster than we could blink. This
would not be true if we were trying to pull records
out of a database with a couple million records, or
to perform an SQL query that combines ten tables
from a relational database. In my mind the best
reason for using SAS/CONNECT is to be able to
send large, memory intensive tasks such as these to
another server, and let the processing take place on
the remote host, allowing you to be free to do other
things locally. This is especially true if you store
your data remotely so as not to bog down your local
server.
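A minimal sketch of an asynchronous submission might look like the following. It assumes a release of SAS/CONNECT that supports the LISTTASK and WAITFOR statements alongside WAIT=NO, and the data set BIGSET and variable ID are hypothetical.

```sas
/* Hypothetical data set and variable names throughout */
rsubmit unixdata wait=no;
proc sort data=work.bigset out=work.sorted;
by id;
run;
endrsubmit;

/* Control returns at once, so local steps can run here */
data _null_;
put 'Local session is free while the remote sort runs';
run;

listtask unixdata;   /* report whether the remote task is still running */
waitfor unixdata;    /* pause until the remote task has finished        */
rget unixdata;       /* bring the spooled remote log and output back    */
```

WAITFOR is only needed when a later local step depends on the remote result; otherwise RGET can simply be issued whenever it is convenient.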
We will look more closely at the uses of the
SAS/CONNECT WAIT=NO and other statements
that work with it as we turn our attention to another
important piece of SAS software.
SAS/ACCESS software
If your data is stored in a format other than a SAS
data set on the remote server you are CONNECTed
to, how do we ACCESS it??
In effect, SAS/ACCESS software provides a
SAS-to-NON-SAS database management software
connection in the same way that SAS/CONNECT is
a SAS-to-SAS connection. SAS/ACCESS allows
you to read in and modify data housed in a
NON-SAS data storage package, and then write that
modified data back out to the database. From the
data analyst's perspective, I don't have a need to
write data back out to the database; in fact, in my
job I don't have the privilege of doing so. My focus
will therefore be on the various ways to 'access' data
stored in a relational database, using SAS/ACCESS,
rather than on the way to write these data back out
(PROC DBLOAD). Again, the examples in this
paper discuss accessing ORACLE tables on a Unix
server; if you are using a different DBMS, see the
SAS/ACCESS User's Guide specific to your product
for modifications that you might need to run these
examples on your system.
SAS/ACCESS software provides three main
methods for accessing a relational database: the
ACCESS procedure, a DBMS-specific LIBNAME
statement, and the SQL Pass-Through facility. I will
compare and contrast the three.
The ACCESS Procedure
This procedure is the most code-intensive method of
accessing a DBMS (those of you deathly afraid of
SQL will note that I didn't say 'of using DBMS
data'), although none of the code is particularly
difficult to grasp. The ACCESS procedure for
relational databases consists of two distinct
components, the ACCESS descriptor and the VIEW
descriptor.
The ACCESS descriptor is a set of statements that
tells SAS how to access a DBMS table. For
example:
proc access dbms=oracle;
create work.mytest.access;
user=kpd;
orapw=mypassword;
table=category_service;
path='prda';
assign=yes;
rename catsrv_code=CATCODE
catsrv_label=Service;
list all;
This is the access descriptor for an ORACLE table
called CATEGORY_SERVICE within the
ORACLE instance 'prda'. The access descriptor
contains the information SAS/ACCESS will need to
read this table when it is called upon to do so,
including my userid (USER=) and password
(ORAPW=). ASSIGN=YES tells SAS that all
attributes of data sets created from this ACCESS
descriptor must conform to what is described here.
For example, I have renamed the ORACLE field
catsrv_code to be CATCODE. Any SAS data sets
created using this descriptor will contain the variable
CATCODE, and I will not be able to rename them in
the VIEW descriptor. In addition to RENAME you
can also use such familiar SAS options as FORMAT
and DROP within the ACCESS descriptor.
A VIEW descriptor uses the information contained
in its reference ACCESS descriptor to access the
database, then CREATE VIEW to "take a picture" of
the data. When you create a view, you actually set
up a query of the data, which can later be called by
any SAS procedure or DATA step. You can also
create a SAS data set from the ACCESS procedure,
but it must occur after the initialization of a VIEW
descriptor. In other words, while we would like:
rsubmit;
proc access dbms=oracle;
create work.mytest.access;
user=kpd;
orapw=noturpassword;
table=category_service;
path='prda';
assign=yes;
rename catsrv_code=CATCODE
catsrv_label=Service;
list all;
create work.myview.view
out=outputdataset;
select catsrv_code catsrv_label;
subset where catsrv_code ='96';
run;
We instead need to use a second PROC ACCESS
statement to create the data set:
rsubmit;
proc access dbms=oracle;
create work.mytest.access;
user=coevkpd;
orapw=urnosey;
table=category_service;
path='prda';
assign=yes;
rename catsrv_code=CATCODE
catsrv_label=Service;
list all;
create work.myview.view;
select catsrv_code catsrv_label;
subset where catsrv_code ='96';
run;
proc access viewdesc=work.myview
out=oratable1;
run;
endrsubmit;
Notice that I submitted this code to my SAS session
running remotely. This is, even in the case of a data
set with only two variables and one observation, the
most efficient way of using PROC ACCESS. There
are two reasons for this. First, the Unix server is far
less bogged down with everyday traffic than my
Windows server. Even if I had a copy of this
ORACLE database available locally, SAS could
probably read it faster "up there." Second, since I
don't actually have a copy of the data to access
locally, it is much faster to access and manipulate it
up where it lives than to pull the data through my
network connection to Unix (which is what would
happen if I submitted the code without the
RSUBMIT).
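Because a view descriptor can be called by any SAS procedure or DATA step, the view created above can also be used directly in place of a data set name. The following sketch assumes the descriptor WORK.MYVIEW from the previous example still exists in the remote session.

```sas
rsubmit;
/* The view descriptor is referenced like a data set; the query */
/* against the ORACLE table runs when the procedure reads it    */
proc print data=work.myview;
run;
endrsubmit;
```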
The LIBNAME statement
The next option available to me is to reference the
ORACLE instance where my data is stored using a
LIBNAME statement.
libname dwh1 oracle user=kpd
password=stopasking
path='dwh1' schema=cpeom;
libname dwh1 oracle dbprompt=yes
schema=cpeom;
rsubmit;
libname dwh2 oracle user=kpd
password=iwonttell path='dwh1'
schema=cpeom;
endrsubmit;
The first piece of code represents a local LIBRARY
reference to the remotely stored ORACLE data. The
second demonstrates the DBPROMPT= option
discussed below. The main reason I can think of to
set up the local LIBREF is the same as the reason
we used the SERVER= option earlier. It provides a
way to look at and move the smaller data tables
interactively.
The third example shows the preferred method,
remotely submitting the library reference to create
the ORACLE library as close to the data as possible.
As with remotely submitting PROC ACCESS in the
previous section, we are trying to avoid pulling data
through the network until absolutely necessary, i.e.,
when we have a small enough subset of our data to
use PROC DOWNLOAD or REMOTE LIBRARY
SERVICES. In case you are wondering, the
SERVER= option presented in the SAS/CONNECT
portion of the paper applies to the REMOTE library
engine, while ORACLE in your LIBREF here calls
the ORACLE library engine, so we can't combine
the two to get a local copy of a remote ORACLE
library. Nice thought though.
The LIBNAME statement with options for the
SAS/ACCESS to ORACLE engine has two distinct
advantages over PROC ACCESS. First, by
referencing the ORACLE instance (an instance is
ORACLE's way of saying LIBRARY) you set up a
reference to an entire group of tables at once, rather
than having to create a descriptor for each table.
Second, by using the DBPROMPT= option you can
tell SAS to prompt you for your username, password
and path rather than leaving them lying out in open
code. (Note: this obviously will not work in
BATCH SAS code, nor will it work for a remote
library reference, since you won't have access to the
resulting prompt locally.)
SQL Pass-Through Facility
For those of you familiar with SQL, the code for
PROC ACCESS probably looked familiar. That is
because SQL queries underlie most of what SAS
does with SAS/ACCESS for relational databases.
(SAS/ACCESS software for other types of database
management systems that do not use SQL to operate
on the data stored within them works differently.) If
you do not use SQL, don't know how to use SQL,
and have no interest in learning SQL, then the SQL
Pass-through facility is not for you. You can do
pretty much everything you want to do with your
DBMS data using PROC ACCESS or the
LIBNAME statement, and never have to write any
"real SQL code." But if you are going to be working
with data with large numbers of observations, or
many (50, 100, 250, etc.) related tables, you might
want to start playing with SQL. Here is an example
of what looks to be a complicated SQL query (it's
really not that bad, but I am not teaching SQL today
so you will have to take my word for it) that
combines information from three different tables in a
relational database with over 1 million total records.
It produces a count of the total number of clients
served per year by county:
rsubmit wait=no;
proc sql;
connect to oracle (user=kpd
orapw=mylipsrsealed
path='pwh1'
schema=snp);
create table querytable as select *
from connection to oracle (select dates.year,
counties.ctyofres, count(distinct
services.recipient) as tot_served
from snp.dates dates, snp.services services,
snp.counties counties
where dates.datekey=services.datekey
and
counties.countykey=services.countykey
group by dates.year, counties.ctyofres);
disconnect from oracle;
quit;
endrsubmit;
*rdisplay unixdata; /*Pick one of us not both*/
*rget unixdata;
There are several key points. First, to toot SQL's
horn a little, notice that it did not require sorting the
database to perform BY-group processing, that it
produced a frequency count for me, and that it also
essentially produced a report data set of total clients
served by county and year.
Second, what you may not have noticed is probably
the most exciting part of this SQL code, the
CONNECT TO ORACLE and SELECT * (SQL for
ALL) FROM CONNECTION TO ORACLE
statements. These statements are used to leave SAS
entirely and run this query from within the
ORACLE database itself. SAS then is passed the
results of this ORACLE SQL query, which it uses to
make the data set QUERYTABLE. This is by far
the most efficient way of running a query against a
database this large and complicated. It lets
ORACLE do the work it was designed to do, and
then lets SAS do the rest. This could have been
submitted on Unix using a LIBRARY reference for
ORACLE such as the DWH2 from my LIBNAME
example, but this would have been slower than the
query that uses the SQL Pass-Through facility. The
query could also have been run using the local
LIBREF DWH1, but this would have been by far the
slowest option (in the case of queries against HUGE
data sets, the slowest by HOURS).
Also, since this was submitted remotely with
WAIT=NO, you can run other SAS procedures
locally while this is running on your remote SAS
session. The last two lines of code bring us back to
SAS/CONNECT. The RDISPLAY and RGET
commands are used with the WAIT=NO option to
go up to the remote server at a later point in time and
pull down the SAS LOG and output printed to the
LISTING OUTPUT destination. RGET puts these
results into your local LOG and OUTPUT windows
respectively, while RDISPLAY opens two more
windows to display this output separately. Of these
two, I prefer RGET, because you can use RGET
with PROC PRINTTO
to save a local copy of the remote SAS session's
LOG and OUTPUT, separate from your local SAS
session log.
proc printto log='remote.log'
print='remote.lst' new;
run;
rget unixdata;
proc printto;run;
I haven't figured out a good way to do this with
RDISPLAY output, other than to interactively copy
the LOG or OUTPUT and then paste it into some
other text file for later.
Conclusion
This paper was intended to present just some of the
many ways to use SAS/CONNECT and
SAS/ACCESS software, and, within the ways
presented, to describe their pros and cons.
Hopefully the suggestions CONNECTed with you,
and they will serve to make these two valuable
packages more ACCESSible to you.
References
SAS Institute Inc. (1999), Communication Access
Methods for SAS/CONNECT and SAS/SHARE
software, Version 8, Cary, NC: SAS Institute Inc.
SAS Institute Inc. (1999), SAS/CONNECT User's
Guide, Version 8, Cary, NC: SAS Institute Inc.
SAS Institute Inc. (1999), SAS OnlineDoc, Version
8, Cary, NC: SAS Institute Inc.
SAS and all other SAS Institute Inc. product or
service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and
other countries. ® indicates USA registration.
Other brand and product names are registered
trademarks or trademarks of their respective
companies.
Contact Information
Please send questions, comments and suggestions to:
Kevin Delaney
NYS Office of Mental Health
44 Holland Ave
Albany, NY 12229
(518) 473-7868