Transferring SAS® Data Sets from z/OS to SQL Server Databases
Lan Hu, West Virginia University, Morgantown, WV
Xiaoqiang Wang, Mylan Pharmaceuticals Inc., Morgantown, WV
ABSTRACT
This paper demonstrates an efficient method, based on the authors' hands-on experience, for making local SQL Server backups of large SAS data sets hosted on z/OS mainframe computers (whose file system is not directory-based). We discuss several ways to transfer the data from z/OS to SQL Server and give recommendations for data that are either active on disk or already archived on tape.
INTRODUCTION
At the Veterans Administration (VA) Pittsburgh Healthcare System, we keep a copy of national VA clinical data in Microsoft SQL Server for local researchers. Maintaining a local backup is preferred over retrieving the SAS data sets from our national data center (which runs on z/OS) each time they are needed because
(1) it saves both the time and the cost (yes, we are charged by the data center) of getting the data;
(2) it does not clog the data center;
(3) we don't have to obtain a mainframe account for every data analyst and train them to run SAS on z/OS.
We save the data copies in SQL Server tables instead of SAS data sets because database tables are easier to share among multiple users and support more efficient random access than SAS data sets.
Figure 1. Data transfer from z/OS to SQL Server via a PC.
The source data in SAS reside in the Austin Information Technology Center (AITC), a recognized, award-winning
Federal data center within the Department of Veterans Affairs (VA), which provides a full complement of cost-efficient
e-government solutions to support the information technology (IT) needs of customers within the Federal sector. A
library of SAS data sets is available on the AITC mainframe. Researchers may use this library with proper
authorization. This library includes the following data sets and databases:
• VHA Medical SAS Data Sets
• Vital Status Files
• Beneficiary Identification Records Locator System (BIRLS)
• Assistant Deputy Under Secretary for Health for Policy and Planning (ADUSH) Enrollment Files
• Veterans Administration Site Tracking System (VAST)
• VETSNET
Data sets for the most recent fiscal years are active on disk, but most data sets that are more than 5 years old have been archived on tape. Secure FTP access is allowed to and from the mainframe. SAS for z/OS 9.1.3 with Service Pack 4 is available and SAS/CONNECT® is enabled. A Time Sharing Option (TSO) user account is necessary to use the AITC mainframe. There are two ways to submit SAS jobs to SAS on the mainframe:
(1) A user logs in via a text-based console and submits SAS jobs via ISPF (Interactive System Productivity Facility);
(2) A user submits SAS code remotely from another SAS session via SAS/CONNECT®.
Our target database is Microsoft SQL Server 2005 on a Windows Server 2003 machine. We have been granted database owner privileges on a database named 'Research', so we can create new tables and populate them.
We work on a Windows PC with SAS 9.2 SP1 and SQL Server Enterprise Manager. We connect to the AITC mainframe with the web-based IBM WebSphere Host On-Demand.
Figure 2. IBM WebSphere Host On-Demand.
Throughout this paper, we demonstrate the transfer of the outpatient visit files SF00 on tape (fiscal year 2000, 37,786,271 records) and SF10 on disk (fiscal year 2010, 67,031,975 records). The contents of these two data sets are documented at http://www.virec.research.va.gov/MedSAS/PC/FY00/PC-MedSAS-OP-SF-FY00-ER.txt and http://www.virec.research.va.gov/MedSAS/PC/FY10/PC-MedSAS-OP-SF-FY10-ER.txt. We concentrate on the following list of variables to keep our sample programs short.
Name     SAS Data Type   SAS Length   SAS Format   SQL Server Data Type
AGE      Num             2                         tinyint
SCRSSN   Num             5            SSN11.       char(9)
STA6A    Char            6                         varchar(6)
VIZDAY   Num             4            Date9.       smalldatetime
Table 1. Variables to be transferred.
Although this paper focuses on transferring SAS data sets from z/OS to SQL Server, the ideas can easily be
extended to similar situations.
SAS METHODS TO TRANSFER DATA CROSS-PLATFORM
There are six ways to share SAS data across platforms.
1. CEDA
Cross-Environment Data Access (CEDA) is a Base SAS feature that allows a SAS file created in a directory-based operating environment (for example, UNIX, Windows, or OpenVMS) to be read on another platform; that is, one can simply copy a SAS data file to another machine and it can still be processed by the SAS System there. Of course this method has its limitations; please check reference [4] for details.
However, CEDA does not apply in our situation because our mainframe uses the MVS file system, which is not directory-based.
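Although CEDA does not apply to our MVS source, a minimal sketch of the idea in a directory-based environment looks like this (the directory path is hypothetical): a data set copied from a Windows machine is read by a UNIX SAS session with an ordinary LIBNAME statement.
libname copied '/data/from_windows';   /* directory holding the copied Windows data set */
proc contents data=copied.sf00;        /* CEDA handles the cross-environment translation */
run;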
2. XPORT
Using the XPORT engine, one can convert SAS data sets into SAS Version 5 transport files. Such a file can be read back by all current versions of SAS or be processed directly by third-party programs. However, the XPORT engine does not support SAS 7 and later features, such as long file and variable names.
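As a minimal sketch (file names are ours, not part of this paper's workflow), PROC COPY moves data into and out of an XPORT libref:
/* Source system: write SF00 to a V5 transport file. */
libname xptout xport 'c:\data\sf00.xpt';
proc copy in=work out=xptout;
   select sf00;
run;
/* Target system: read the transport file back into a native library. */
libname xptin xport '/data/sf00.xpt';
proc copy in=xptin out=work;
run;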
3. XML/CSV/TEXT, etc.
By writing a SAS data file out as a text-based document (XML, CSV, plain text, and so on) in the source environment, the data become readable in the target environment.
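For example, a minimal sketch (output path assumed) that writes a tab-delimited file with PROC EXPORT:
proc export data=sf00
            outfile='c:\data\sf00.txt'   /* hypothetical output path */
            dbms=tab
            replace;
run;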
4. CPORT/CIMPORT
PROC CPORT converts data sets or catalogs into transport format, while PROC CIMPORT translates the transport file into the target environment's native format.
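A minimal sketch, with hypothetical file names:
/* Source environment: create the transport file. */
filename tranfile 'c:\data\sf00.cpo';
proc cport data=sf00 file=tranfile;
run;
/* Target environment: restore it into the WORK library. */
filename tranfile '/data/sf00.cpo';
proc cimport data=sf00 infile=tranfile;
run;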
5. Data transfer services in SAS/CONNECT® software
SAS/CONNECT® enables two SAS sessions to exchange SAS data files seamlessly without explicit conversion, whether through the DATA step or PROC UPLOAD and PROC DOWNLOAD.
6. Remote library services in SAS/CONNECT® software and SAS/SHARE® software
Remote library services enable transparent access to remote data through the LIBNAME statement.
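A minimal sketch of remote library services through SAS/CONNECT®, reusing the host and library names that appear later in this paper (treat it as an illustration rather than our production code):
%let remhost=austin.aac.va.gov 5227;
signon remhost user="S646XW1" pass=_prompt_;
rsubmit remote=remhost;                                /* define the library on z/OS */
   libname srvlib "MDPPRD.MDP.SAS.SF10(0)" disp=shr;
endrsubmit;
libname rmtlib slibref=srvlib server=remhost;          /* reference it locally; data stay remote */
proc print data=rmtlib.sf10(obs=10);
run;
libname rmtlib clear;
signoff remhost;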
SAS METHODS TO WRITE DATA INTO SQL SERVER
Suppose we have the source data on our PC as either SAS data sets or tab-delimited text files; there are five ways to upload the records into SQL Server.
1. PROC DBLOAD
Because this procedure is being phased out, we won't discuss it here.
In methods 2-4 we assume that the outpatient visit file for fiscal year 2000 has been downloaded to the PC as a SAS data set called 'sf00'.
2. DATA STEP
We can overwrite or create table sf2000 in the research database as follows.
libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000; ❶
data sqlsrv.sf2000(dbtype=( ❷
      AGE='DBTYPE_UI1' ❸
      VIZDAY='DBTYPE_DBTIMESTAMP' ❹
      ));
   set sf00(rename=(SCRSSN=SSN));
   SCRSSN = put(SSN, z9.);
   VIZDAY = dhms(VIZDAY, 0, 0, 0); ❺
   drop SSN;
run;
❶The INSERTBUFF= LIBNAME option specifies the number of rows per single DBMS insert. The DBCOMMIT= LIBNAME option causes an automatic COMMIT (a permanent writing of data to the DBMS) after the specified number of rows have been processed. We tried many combinations before finding values that work efficiently with our database.
❷Note that we may have to use the DBTYPE= data set option to make sure data types are cast as expected.
❸DBTYPE_UI1 is the OLE DB equivalent of the SQL Server tinyint type.
❹DBTYPE_DBTIMESTAMP is the OLE DB equivalent of the SQL Server datetime type.
❺SAS dates are counted in days while SAS datetime values are counted in seconds; the DBTYPE_DBTIMESTAMP mapping expects a datetime value, so we convert the SAS dates into datetimes with DHMS() before loading them into the database.
3. PROC APPEND
We can create or append to table sf2000 in the research database using PROC APPEND.
libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000;
data sf00_view/view=sf00_view;
   set sf00(rename=(SCRSSN=SSN));
   SCRSSN = put(SSN, z9.);
   VIZDAY = dhms(VIZDAY, 0, 0, 0);
   drop SSN;
run;
proc append base=sqlsrv.sf2000(dbtype=( ❶
      AGE='DBTYPE_UI1'
      VIZDAY='DBTYPE_DBTIMESTAMP'
      )) data=sf00_view;
run;
❶Note that the DBTYPE= set option can be omitted if the SQL table already exists.
4. INSERT statement in PROC SQL
This method can only append to the existing database table sf2000.
libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000;
data sf00_view/view=sf00_view;
   set sf00(rename=(SCRSSN=SSN));
   SCRSSN = put(SSN, z9.);
   VIZDAY = dhms(VIZDAY, 0, 0, 0);
   drop SSN;
run;
proc sql;
   insert into sqlsrv.sf2000(insertbuff=2000) (AGE, SCRSSN, STA6A, VIZDAY)
      select AGE, SCRSSN, STA6A, VIZDAY from sf00_view;
quit;
5. BULK LOADING
Bulk loading is a way to load a specially formatted text file into a SQL Server table while minimally logging the transaction. SQL Server provides three ways to do it: (1) the bcp utility; (2) the BULK INSERT statement; (3) the INSERT INTO statement with the OPENROWSET(BULK…) function. With bulk loading, table constraints are ignored, but index updates are fully logged if the table was not empty, so it is better to disable indexes before bulk loading. When importing a large number of records, dividing the data into batches can offer advantages because of SQL Server's internal buffering.
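As a hedged sketch of option (2), the BULK INSERT statement can be issued through a PROC SQL pass-through connection; the file path (which must be visible to the SQL Server machine itself) and the format options are assumptions, not part of our production code.
proc sql;
   connect to oledb(init_string="***");
   /* ask SQL Server to bulk load the tab-delimited file into an existing table */
   execute(
      bulk insert dbo.SF2000
      from 'c:\data\sf2000.txt'
      with (fieldterminator = '\t', rowterminator = '\n', batchsize = 10000)
   ) by oledb;
   disconnect from oledb;
quit;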
Methods 1-4 are essentially the same in terms of efficiency, because SAS translates them into INSERT statements. The good news is that bulk loading is also available for methods 2-4 by setting the BULKLOAD= option to YES in the LIBNAME statement.
libname sqlsrv oledb datasource='***' provider=sqloledb
        properties=('Integrated Security'=SSPI 'Persist Security Info'=True) ❶
        bulkload=yes; ❷
❶Specifying OLE DB parameters individually is preferred to a single init_string, according to the SAS Knowledge Base note at http://support.sas.com/kb/16/941.html.
❷BULKLOAD=YES calls a DBMS-specific bulk-load facility to insert or append rows to a DBMS table.
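For example, with this libref the PROC APPEND step shown earlier needs no further change to go through the bulk-load facility instead of row-by-row INSERTs (a sketch, assuming table sf2000 already exists so the DBTYPE= option can be omitted):
proc append base=sqlsrv.sf2000 data=sf00_view;
run;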
SOLUTION TO OUR SCENARIO
We have to optimize both the time of retrieving data from the AITC mainframe and the time of loading data into SQL
Server. After lots of trial and error, we finally reached an acceptable solution.
1. Retrieving Tape Data from the Mainframe
SAS data sets on tape can only be accessed sequentially, and a tape can be accessed by only one job at a time. As a consequence, we cannot submit a program via SAS/CONNECT® to retrieve tape data. Instead, we have to log in with our time-sharing accounts and submit our SAS program as a batch job, which then waits for its turn to run.
The following piece of code converts the fiscal year 2000 outpatient visit file to a tab-delimited text file (but still saves it on the mainframe).
//S646XW1 JOB XXXUNKA9,S646XW1,MSGCLASS=I,NOTIFY=&SYSUID
//STEP1   EXEC SAS
//INLIB   DD DSN=MDPPRD.MDP.SAS.SF00,UNIT=CART,DISP=SHR
//OUTLIB  DD DSN=S646XW1.SF00,DISP=(NEW,CATLG,DELETE),
//           VOL=(,,,99),
//           UNIT=CART ❶
//SYSIN   DD *
OPTIONS NOFMTERR MISSING='';
DATA _NULL_;
   FILE OUTLIB DLM='05'X; ❷
   SET INLIB.SF00(KEEP=AGE STA6A SCRSSN VIZDAY);
   PUT AGE SCRSSN:Z9. STA6A VIZDAY:YYMMDD10.;
RUN;
❶The output text file was allocated on cartridge with enough space.
❷Note that ‘05’x is the tab character in EBCDIC code.
Then we use the FTP tool to download the text file to our PC. We chose text files instead of SAS transport files because (1) a text file is smaller than the corresponding SAS transport file; (2) properly formatted text files can be bulk loaded into SQL Server directly, while SAS transport files require further processing.
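If plain FTP were acceptable, an alternative we have not benchmarked would be to let SAS itself pull the file with the FILENAME FTP access method; the sketch below reuses the host, user, and data set names from this paper and should be treated as an assumption, not our procedure.
/* point a fileref at the mainframe text file over FTP            */
/* (text-mode transfer also handles EBCDIC-to-ASCII translation)  */
filename remtxt ftp "'S646XW1.SF00'"
                host='austin.aac.va.gov'
                user='S646XW1' prompt;
filename loctxt 'c:\data\sf2000.txt';
/* copy the file record by record to the PC */
data _null_;
   infile remtxt truncover;
   file loctxt;
   input;
   put _infile_;
run;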
2. Retrieving Disk Data from the Mainframe
SAS data sets on disk can be accessed in real time, so we use SAS/CONNECT® to download them to our PC in one step. The following piece of code downloads the fiscal year 2010 outpatient visit file to our PC.
%let remhost=austin.aac.va.gov 5227;
libname out 'c:\data';
signon remhost user="S646XW1" pass=_prompt_;
rsubmit remote=remhost;
   options nofmterr;
   libname mylib "MDPPRD.MDP.SAS.SF10(0)" disp=shr;
   proc download data=mylib.sf10(keep=AGE STA6A SCRSSN VIZDAY)
                 out=out.sf2010 V6TRANSPORT EXTENDSN=YES; ❶
   run;
endrsubmit;
waitfor remhost;
signoff remhost;
❶EXTENDSN=YES promotes 2-byte numerics in mainframe SAS to 3-byte numerics in PC SAS; it is necessary because the minimum numeric length under Windows is 3 bytes.
After that, we use a DATA step to write the downloaded data set into a tab-delimited text file. The text-file conversion could be skipped, but we choose to use text files so that the method of loading into the database is the same for disk and tape data.
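A minimal sketch of that conversion step follows; the output path and the ASCII tab delimiter '09'x are our assumptions.
data _null_;
   set out.sf2010;
   file 'c:\data\sf2010.txt' dlm='09'x;          /* tab-delimited output on the PC */
   put AGE SCRSSN:z9. STA6A VIZDAY:yymmdd10.;    /* same layout as the tape version */
run;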
3. Loading Data into SQL Server
We choose to create the SQL Server tables via pass-through queries in PROC SQL and to load the data files with the bcp utility. Pass-through queries let us take advantage of the full Transact-SQL feature set and tell SQL Server exactly what we want, and the bcp utility can simply be invoked from within SAS as an external command, which saves programming effort. The following piece of code demonstrates the loading of sf2010.txt into SQL Server.
proc sql;
   connect to oledb(init_string="***");
   execute(
      create table dbo.SF2010
      (AGE tinyint, SCRSSN char(9), STA6A varchar(6), VIZDAY smalldatetime)
   ) by oledb;
   execute(sp_tableoption dbo.SF2010, %str(%')table lock on bulk load%str(%'), 1) by oledb; ❶
quit;
x "bcp sf2010 in c:\data\sf2010.txt -c -t \t -b 10000 -T -S vhapthrwh";
❶This pass-through query sets the table option so that bulk loads take a table-level lock.
Note that some extra SQL might need to be executed before or after bulk loading, for example to disable and rebuild indexes.
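A hedged sketch of that index handling, again via pass-through; the index name is hypothetical and depends on the target table.
proc sql;
   connect to oledb(init_string="***");
   /* before the bcp step: disable a (hypothetical) nonclustered index */
   execute(alter index IX_SF2010_SCRSSN on dbo.SF2010 disable) by oledb;
   disconnect from oledb;
quit;
x "bcp sf2010 in c:\data\sf2010.txt -c -t \t -b 10000 -T -S vhapthrwh";
proc sql;
   connect to oledb(init_string="***");
   /* after the load: rebuild the index */
   execute(alter index IX_SF2010_SCRSSN on dbo.SF2010 rebuild) by oledb;
   disconnect from oledb;
quit;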
Before we found this solution, it would take the database administrator about a month to load all the data files for one
fiscal year. Now it only takes a SAS programmer one week to finish the same task in his/her spare time.
CONCLUSION
Moving data across environments is a simple job in SAS, but it can be very time-consuming if not done correctly. Fortunately, SAS provides many tools for moving data around, and with careful experimentation we can always work out a solution.
REFERENCES
[1] John E. Bentley, Using SAS® to Move Data between Servers, SAS Global Forum 2012.
[2] Microsoft Developer Network, bcp Utility.
[3] Microsoft Developer Network, Bulk Import and Export of Data (SQL Server).
[4] Microsoft Developer Network, Data Type Mappings from OLE DB Data Types (SQL Server Compact Edition).
[5] SAS Institute Inc., SAS® 9.2 Language Reference: Concepts, Second Edition.
[6] SAS Institute Inc., SAS/ACCESS® 9.2 Interface for Relational Databases: Reference, Fourth Edition.
[7] SAS Institute Inc., SAS/CONNECT® 9.2: User's Guide.
[8] VA Information Resource Center, Introduction to Job Control Language (JCL).
[9] VA Information Resource Center, Running a SAS Job at AITC.
[10] VA Information Resource Center, Transferring Data to and from AITC.
ACKNOWLEDGMENTS
The authors would like to thank Melissa Skanderson and Howard Bachtel at VA Pittsburgh Healthcare Systems for
providing examples of running SAS jobs on z/OS. Our special thanks and appreciation go to the section chair, H.
Mark Keintz, whose patience made this presentation possible!
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
XIAOQIANG WANG
Mylan
P.O. Box 4310
Morgantown, WV 26504-4310
Phone: 304-554-6146
E-mail: [email protected]
DISCLAIMER
The opinions expressed here are the view of the author and do not necessarily reflect the views and opinions of
Mylan Pharmaceuticals Inc. Mylan is not, by means of this article, providing technical, business, or other professional
advice or services and is not endorsing any of the software, techniques, approaches, or solutions presented herein.
This article is not a substitute for professional advice or services and should not be used as a basis for decisions that
could impact your business.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.