Transferring SAS® Data Sets from z/OS to SQL Server Databases
Lan Hu, West Virginia University, Morgantown, WV
Xiaoqiang Wang, Mylan Pharmaceuticals Inc., Morgantown, WV

ABSTRACT
Based on the authors' hands-on experience, this paper demonstrates an efficient method for making local SQL Server backups of large SAS data sets hosted on z/OS mainframe computers, whose file system is not directory-based. We discuss several ways to transfer data from z/OS to SQL Server and give recommendations for data that are either active on disk or already archived on tape.

INTRODUCTION
At the Veterans Administration (VA) Pittsburgh Healthcare Systems, we keep a copy of national VA clinical data in Microsoft SQL Server for local researchers. Maintaining a local backup is preferable to retrieving the SAS data sets from our national data center, which runs on z/OS, each time they are needed, for three reasons:

(1) It saves both the time and the cost of getting the data (yes, the data center charges us).
(2) It does not overload the data center.
(3) We do not have to obtain a mainframe account for every data analyst and train each of them to run SAS on z/OS.

We store the data copies in SQL Server tables rather than in SAS data sets. Database tables are easier for multiple users to share, and they support random access more efficiently than SAS data sets do.

Figure 1. Data transfer from z/OS to SQL Server via a PC. (Diagram: Mainframe, PC, SQL Server.)

The source SAS data reside in the Austin Information Technology Center (AITC), a recognized, award-winning Federal data center within the Department of Veterans Affairs (VA). The AITC provides a full complement of cost-efficient e-government solutions to support the information technology (IT) needs of customers within the Federal sector. A library of SAS data sets is available on the AITC mainframe, and researchers may use this library with proper authorization.
This library includes the following data sets and databases:
• VHA Medical SAS Data Sets
• Vital Status Files
• Beneficiary Identification Records Locator System (BIRLS)
• Assistant Deputy Under Secretary for Health for Policy and Planning (ADUSH) Enrollment Files
• Veterans Administration Site Tracking System (VAST)
• VETSNET

Data sets for the most recent fiscal years are active on disk, but most data sets that are more than five years old have been archived on tape. Secure FTP access to and from the mainframe is allowed. SAS® 9.1.3 for z/OS with Service Pack 4 is available, and SAS/CONNECT® is enabled.

A Time Sharing Option (TSO) user account is required to use the AITC mainframe. There are two ways to submit jobs to SAS on the mainframe:
(1) A user logs in through a text-based console and submits SAS jobs via ISPF (Interactive System Productivity Facility).
(2) A user submits SAS code remotely via SAS/CONNECT® from another SAS session.

Our target database is Microsoft SQL Server 2005, running on Windows Server 2003. We have been granted database owner privileges on a database named 'Research', so we can create new tables and populate them. We work on a Windows PC with SAS 9.2 SP1 and SQL Server Enterprise Manager. We connect to the AITC mainframe with the web-based IBM WebSphere Host On-Demand.

Figure 2. IBM WebSphere Host On-Demand.

Throughout this paper, we demonstrate the transfer of two outpatient visit files:
• SF00, on tape (fiscal year 2000, 37,786,271 records).
• SF10, on disk (fiscal year 2010, 67,031,975 records).

The contents of these two data sets are available at http://www.virec.research.va.gov/MedSAS/PC/FY00/PC-MedSAS-OP-SF-FY00-ER.txt and http://www.virec.research.va.gov/MedSAS/PC/FY10/PC-MedSAS-OP-SF-FY10-ER.txt. To keep our sample programs short, we concentrate on the variables listed in Table 1.

SAS Name   SAS Data Type   SAS Length   SAS Format   SQL Server Data Type
AGE        Num             2                         tinyint
SCRSSN     Num             5            SSN11.       char(9)
STA5A      Char            5                         varchar(5)
VIZDAY     Num             4            Date9.       smalldatetime
Table 1. Variables to be transferred.

Although this paper focuses on transferring SAS data sets from z/OS to SQL Server, the ideas can easily be extended to similar situations.

SAS METHODS TO TRANSFER DATA CROSS-PLATFORM
There are six ways to share SAS data across platforms.

1. CEDA
Cross-Environment Data Access (CEDA) is a Base SAS feature that makes a SAS file created in a directory-based operating environment (for example, UNIX, Windows, or OpenVMS) readable on another platform. In other words, you can copy a SAS data file to another machine, and SAS on the new machine can still process it. This method has limitations; see reference [5] for details. In any case, CEDA does not apply in our situation, because our mainframe uses the MVS file system, which is not directory-based.

2. XPORT
The XPORT engine converts SAS data sets into SAS Version 5 transport files. All current versions of SAS can read this type of file, and third-party programs can process it directly. However, the XPORT engine does not support features introduced in SAS 7 and later, such as long data set and variable names.

3. XML/CSV/TEXT, etc.
A SAS data file can be written out as a text-based document (XML, CSV, plain text, or a similar format) in the source environment. The target environment can then read that document.

4. CPORT/CIMPORT
PROC CPORT converts data sets or catalogs into transport format, and PROC CIMPORT translates the transport file into the target environment's native format.

5. Data transfer services in SAS/CONNECT® software
SAS/CONNECT enables two SAS sessions to exchange SAS data files seamlessly, without explicit conversion. This works whether the transfer uses a DATA step, PROC UPLOAD, or PROC DOWNLOAD.

6. Remote library services in SAS/CONNECT® and SAS/SHARE® software
Remote library services provide transparent access to remote data through the LIBNAME statement.
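Method 3 relies on both sides agreeing on the character encoding. z/OS writes text in EBCDIC, while the PC expects ASCII. The Python sketch below shows the byte-level difference for one tab-delimited record. It is an illustration only; the paper's own tools are SAS and FTP. The following are assumptions, not details from the paper:
• the code page 037 (US EBCDIC) choice;
• the sample record;
• the hypothetical helper names decode_ebcdic_record and encode_ascii_record.

In practice, an FTP transfer in ASCII (text) mode normally performs this translation for you.

```python
# Sketch: translating one tab-delimited EBCDIC record (as written on z/OS) to ASCII.
# Assumption: the mainframe text uses code page 037 (US EBCDIC); other EBCDIC
# code pages such as 1047 also encode TAB as 0x05 and digits as 0xF0-0xF9.
EBCDIC_CODEC = "cp037"
EBCDIC_TAB = b"\x05"  # the same byte a SAS FILE statement gets from DLM='05'X


def decode_ebcdic_record(raw: bytes) -> list[str]:
    """Hypothetical helper: split on the EBCDIC tab byte, then decode each field."""
    return [field.decode(EBCDIC_CODEC) for field in raw.split(EBCDIC_TAB)]


def encode_ascii_record(fields: list[str]) -> bytes:
    """Hypothetical helper: ASCII tab-delimited line with a Windows (CRLF) ending.

    Raises UnicodeEncodeError if a field contains a non-ASCII character.
    """
    return ("\t".join(fields) + "\r\n").encode("ascii")


if __name__ == "__main__":
    # Hypothetical record: AGE, SCRSSN (Z9.), STA6A, VIZDAY (YYMMDD10.)
    raw = "57\t012345678\t646A4\t2000-03-15".encode(EBCDIC_CODEC)
    print(raw[:3])                      # b'\xf5\xf7\x05' ("5", "7", TAB in EBCDIC)
    fields = decode_ebcdic_record(raw)
    print(fields)                       # ['57', '012345678', '646A4', '2000-03-15']
    print(encode_ascii_record(fields))
```

The same characters occupy different byte values in the two encodings. Because of this, a file transferred in binary mode would still contain 0x05 instead of 0x09 between fields, and SQL Server's bulk-load tools would not find the expected tab delimiters.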
SAS METHODS TO WRITE DATA INTO SQL SERVER
Once the source data are on our PC, either as SAS data sets or as tab-delimited text files, there are five ways to upload the records into SQL Server.

1. PROC DBLOAD
Because this procedure is being phased out, we do not discuss it here.

For methods 2-4, we assume that the fiscal year 2000 outpatient visit file has already been downloaded to the PC as a SAS data set named sf00.

2. DATA STEP
We can create table sf2000 in the Research database with a DATA step:

libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000;   /* ❶ */
data sqlsrv.sf2000(dbtype=(                                              /* ❷ */
        AGE='DBTYPE_UI1'                                                 /* ❸ */
        VIZDAY='DBTYPE_DBTIMESTAMP'));                                   /* ❹ */
    set sf00(rename=(SCRSSN=SSN));
    SCRSSN = put(SSN, z9.);   /* numeric SSN -> 9-character string with leading zeros */
    VIZDAY = dhms(VIZDAY, 0, 0, 0);                                      /* ❺ */
    drop SSN;
run;

❶ The INSERTBUFF= LIBNAME option specifies the number of rows in a single DBMS insert. The DBCOMMIT= LIBNAME option issues an automatic COMMIT (a permanent write of data to the DBMS) after the specified number of rows has been processed. We tried many combinations to find values that work efficiently with our database.
❷ We may have to use the DBTYPE= data set option to make sure that data types are cast as expected.
❸ DBTYPE_UI1 is the OLE DB equivalent of the SQL Server tinyint type.
❹ DBTYPE_DBTIMESTAMP is the OLE DB equivalent of the SQL Server datetime type.
❺ SAS stores dates as days since January 1, 1960. A value loaded into a SQL Server date/time column, however, must be a SAS datetime value, measured in seconds. We therefore convert the SAS dates into seconds with DHMS before loading them into the database.

3. PROC APPEND
We can create or append to table sf2000 in the Research database with PROC APPEND:

libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000;
data sf00_view / view=sf00_view;
    set sf00(rename=(SCRSSN=SSN));
    SCRSSN = put(SSN, z9.);
    VIZDAY = dhms(VIZDAY, 0, 0, 0);
    drop SSN;
run;
proc append base=sqlsrv.sf2000(dbtype=(                                  /* ❶ */
                AGE='DBTYPE_UI1'
                VIZDAY='DBTYPE_DBTIMESTAMP'))
            data=sf00_view;
run;

❶ The DBTYPE= data set option can be omitted if the SQL Server table already exists.

4. INSERT statement in PROC SQL
This method can only append to an existing database table, sf2000.

libname sqlsrv oledb init_string='***' insertbuff=2000 dbcommit=10000;
data sf00_view / view=sf00_view;
    set sf00(rename=(SCRSSN=SSN));
    SCRSSN = put(SSN, z9.);
    VIZDAY = dhms(VIZDAY, 0, 0, 0);
    drop SSN;
run;
proc sql;
    insert into sqlsrv.sf2000(insertbuff=2000) (AGE, SCRSSN, STA6A, VIZDAY)
        select AGE, SCRSSN, STA6A, VIZDAY from sf00_view;
quit;

5. BULK LOADING
Bulk loading loads a specially formatted text file into a SQL Server table with minimal transaction logging. SQL Server provides three ways to do it:
(1) the bcp utility;
(2) the BULK INSERT statement;
(3) the INSERT INTO statement with the OPENROWSET(BULK…) function.

By default, bulk loading does not check CHECK and FOREIGN KEY constraints. However, if the table is not empty, changes to its indexes are fully logged, so it is better to disable indexes before bulk loading. When you import a large number of records, dividing the data into batches can also help, because SQL Server imports and commits each batch as a separate transaction.

Methods 1-4 are essentially equivalent in efficiency, because SAS translates all of them into INSERT statements. The good news is that bulk loading is also available for methods 2-4: set the BULKLOAD= option to YES in the LIBNAME statement.

libname sqlsrv oledb datasource='***' provider=sqloledb
    properties=('Integrated Security'=SSPI 'Persist Security Info'=True)   /* ❶ */
    bulkload=yes;                                                          /* ❷ */

❶ According to the SAS Knowledge Base (http://support.sas.com/kb/16/941.html), specifying OLE DB properties individually is preferred to a single INIT_STRING= value.
❷ BULKLOAD=YES calls a DBMS-specific bulk-load facility to insert or append rows to a DBMS table.

SOLUTION TO OUR SCENARIO
We have to minimize both the time needed to retrieve data from the AITC mainframe and the time needed to load the data into SQL Server. After a lot of trial and error, we reached an acceptable solution.

1. Retrieving Tape Data from the Mainframe
SAS data sets on tape can only be accessed sequentially, and only one job at a time can access a tape. As a consequence, we cannot submit a program via SAS/CONNECT® to retrieve tape data. Instead, we log in with our TSO accounts and submit our SAS program as a batch job, which then waits for its turn to run.

The following code converts the fiscal year 2000 outpatient visit file into a tab-delimited text file. The text file is still saved on the mainframe.

//S646XW1 JOB XXXUNKA9,S646XW1,MSGCLASS=I,NOTIFY=&SYSUID
//STEP1 EXEC SAS
//INLIB DD DSN=MDPPRD.MDP.SAS.SF00,UNIT=CART,DISP=SHR
//OUTLIB DD DSN=S646XW1.SF00,DISP=(NEW,CATLG,DELETE),
// VOL=(,,,99), ❶
// UNIT=CART
//SYSIN DD *
OPTIONS NOFMTERR MISSING='';
DATA _NULL_;
  FILE OUTLIB DLM='05'X;   /* ❷ */
  SET INLIB.SF00(KEEP=AGE STA6A SCRSSN VIZDAY);
  PUT AGE SCRSSN:Z9. STA6A VIZDAY:YYMMDD10.;
RUN;

❶ VOL=(,,,99) lets the output text file span up to 99 cartridge volumes, so enough space is allocated for it.
❷ '05'X is the tab character in EBCDIC.

We then use FTP to download the text file to our PC. We chose text files over SAS transport files for two reasons. First, a text file is smaller than the corresponding SAS transport file. Second, a properly formatted text file can be bulk loaded into SQL Server directly, whereas a SAS transport file needs further processing.

2. Retrieving Disk Data from the Mainframe
SAS data sets on disk can be accessed in real time, so we use SAS/CONNECT® to download them to our PC in one step. The following code downloads the fiscal year 2010 outpatient visit file to our PC.
%let remhost=austin.aac.va.gov 5227;
libname out 'c:\data';
signon remhost USER="S646XW1" PASS=_prompt_;
RSUBMIT remote=remhost;
    Options nofmterr;
    Libname mylib "MDPPRD.MDP.SAS.sf10(0)" disp=shr;
    PROC DOWNLOAD DATA=mylib.sf10(KEEP=AGE SCRSSN STA6A VIZDAY)
                  OUT=out.sf2010 V6TRANSPORT EXTENDSN=YES;   /* ❶ */
    RUN;
ENDRSUBMIT;
waitfor remhost;
SIGNOFF remhost;

❶ EXTENDSN=YES promotes 2-byte numeric variables in mainframe SAS to 3 bytes in PC SAS. It is necessary here because the minimum numeric length is 2 bytes on z/OS but 3 bytes on Windows, and AGE has a length of 2.

After that, we use a DATA step to write the downloaded data set to a tab-delimited text file. This conversion could be skipped. We keep it so that the tape data and the disk data are loaded into the database in the same way.

3. Load Data into SQL Server
We create SQL Server tables via pass-through queries in PROC SQL, and we load the data files with the bcp utility. Pass-through queries let us use the full Transact-SQL language and have SQL Server do exactly what we want. The bcp utility can be invoked from within SAS as an external command, which saves programming effort. The following code loads sf2010.txt into SQL Server.

proc sql;
    connect to oledb(init_string="***");
    execute(
        create table dbo.SF2010
            (AGE tinyint, SCRSSN char(9), STA6A varchar(6), VIZDAY smalldatetime)
    ) by oledb;
    execute(
        exec sp_tableoption 'dbo.SF2010', 'table lock on bulk load', 1
    ) by oledb;   /* ❶ */
    disconnect from oledb;
quit;
x "bcp Research.dbo.SF2010 in c:\data\sf2010.txt -c -t \t -b 10000 -T -S vhapthrwh";

❶ This pass-through query turns on the 'table lock on bulk load' option, so that bulk loads into the table take a table-level bulk update lock instead of row locks. Additional SQL may need to run before or after bulk loading, for example, to disable and then rebuild indexes.

Before we found this solution, it took the database administrator about a month to load all the data files for one fiscal year. Now a SAS programmer can finish the same task in one week of his or her spare time.
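The conversions applied before bcp reads the text file reduce to simple arithmetic. The Python sketch below mirrors them for illustration only:
• A SAS date value counts days from January 1, 1960, so DHMS(date, 0, 0, 0) is just the date times 86,400 seconds.
• The Z9. format zero-pads SCRSSN to nine characters.
• Running bcp with -b 10000 commits one transaction per 10,000 rows.

The row layout is an assumption, not the paper's code: integer AGE and SCRSSN, a string STA6A, a SAS date VIZDAY, and None for a SAS missing value. The helper names are also hypothetical.

```python
from datetime import date, timedelta

# SAS date values count days, and SAS datetime values count seconds,
# from midnight on January 1, 1960.
SAS_EPOCH = date(1960, 1, 1)
SECONDS_PER_DAY = 86_400


def sas_date_to_iso(days: int) -> str:
    """SAS date value -> 'yyyy-mm-dd', the layout the YYMMDD10. format writes."""
    return (SAS_EPOCH + timedelta(days=days)).isoformat()


def sas_date_to_datetime_seconds(days: int) -> int:
    """Same result as DHMS(days, 0, 0, 0): midnight of that day as a SAS datetime."""
    return days * SECONDS_PER_DAY


def to_bcp_line(age, scrssn, sta6a, vizday) -> str:
    """Hypothetical helper: one tab-delimited row for bcp -c -t \\t.

    Assumes integer AGE/SCRSSN, a string STA6A, a SAS date VIZDAY,
    and None for a missing value (written as an empty field).
    """
    fields = [
        "" if age is None else str(age),
        "" if scrssn is None else f"{scrssn:09d}",  # like PUT(SSN, Z9.)
        "" if sta6a is None else sta6a,
        "" if vizday is None else sas_date_to_iso(vizday),
    ]
    return "\t".join(fields)


def batch_count(rows: int, batch_size: int = 10_000) -> int:
    """Transactions bcp commits for `rows` rows when run with -b batch_size."""
    return (rows + batch_size - 1) // batch_size  # ceiling division on integers


if __name__ == "__main__":
    print(sas_date_to_iso(14684))                # 2000-03-15
    print(sas_date_to_datetime_seconds(14610))   # 1262304000 (01JAN2000:00:00:00)
    print(repr(to_bcp_line(57, 12345678, "646A4", 14684)))
    print(batch_count(67_031_975))               # 6704
```

With these definitions, bcp -b 10000 commits the 67,031,975 SF10 records in 6,704 transactions, the last of which holds 1,975 rows.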
CONCLUSION
Moving data across environments is a routine job in SAS, but it can be very time-consuming if it is not done correctly. Fortunately, SAS provides many tools for moving data, and with careful experimentation we can work out a solution.

REFERENCES
[1] John E. Bentley, Using SAS® to Move Data between Servers, SAS Global Forum 2012.
[2] Microsoft Developer Network, bcp Utility.
[3] Microsoft Developer Network, Bulk Import and Export of Data (SQL Server).
[4] Microsoft Developer Network, Data Type Mappings from OLE DB Data Types (SQL Server Compact Edition).
[5] SAS Institute Inc., SAS® 9.2 Language Reference: Concepts, Second Edition.
[6] SAS Institute Inc., SAS/ACCESS® 9.2 Interface for Relational Databases: Reference, Fourth Edition.
[7] SAS Institute Inc., SAS/CONNECT® 9.2: User's Guide.
[8] VA Information Resource Center, Introduction to Job Control Language (JCL).
[9] VA Information Resource Center, Running a SAS Job at AITC.
[10] VA Information Resource Center, Transferring Data to and from AITC.

ACKNOWLEDGMENTS
The authors would like to thank Melissa Skanderson and Howard Bachtel at VA Pittsburgh Healthcare Systems for providing examples of running SAS jobs on z/OS. Our special thanks and appreciation go to the section chair, H. Mark Keintz, whose patience made this presentation possible!

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
XIAOQIANG WANG
Mylan
P.O. Box 4310
Morgantown, WV 26504-4310
Phone: 304-554-6146

DISCLAIMER
The opinions expressed here are the views of the author and do not necessarily reflect the views and opinions of Mylan Pharmaceuticals Inc. Mylan is not, by means of this article, providing technical, business, or other professional advice or services and is not endorsing any of the software, techniques, approaches, or solutions presented herein.
This article is not a substitute for professional advice or services and should not be used as a basis for decisions that could impact your business.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.