From Database to your Desktop: How to almost completely automate
reports in SAS, with the power of Proc SQL
Kirtiraj Mohanty, Department of Mathematics and Statistics, San Diego State University, San
Diego, CA
Trinh Nguyen, Department of Mathematics and Statistics, San Diego State University, San
Diego, CA
ABSTRACT
SAS has many varied applications, and data analysts often use it to automate reporting and data summarization
processes. Proc SQL is a powerful procedure that can pull data from databases, manipulate and summarize the data
within SAS as required, and produce Excel or CSV files that are emailed or exported to the end user. A SAS connection
over ODBC can be used to connect to any popular database server (e.g. Teradata, Oracle, MS SQL Server, MS Access)
and conveniently bring data into SAS (in SAS dataset format), where Proc SQL can then apply various data
manipulation and summarization techniques to bring the data into the desired format for further analysis or
reporting. Macro variables can be used to dynamically generate values that need to change over time, such as date
ranges, for use in the SQL queries. Within Proc SQL, SQL statements operate directly on SAS datasets to perform
operations such as count, sum, average, join (merging multiple datasets), filter, insert and delete. We found that
Proc SQL can be used in almost all scenarios to bring the data into a desired format. The final dataset can then be
sent to the end user via email or exported to a hard drive. This whole process can be fully automated and scheduled
as a SAS job in Windows Task Scheduler. This paper provides a step-by-step process, with examples, for connecting to
a database, summarizing data, exporting or emailing the final dataset, and scheduling a batch job in Windows 7.
KEY WORDS: Proc SQL, Macro variables, Proc Export, Batch Jobs
INTRODUCTION
To perform advanced data analysis and/or modeling, the first and foremost requirement is to bring the data into
the right format. For data analysts and statisticians, one essential skill is knowing how to pull data from a
database and format or summarize it so that analysis or modeling can be carried out. SAS is a very powerful tool in
this regard. Many companies (especially online retail, traditional retail and credit card companies) generate huge
amounts of transactional data every day. Business executives frequently want to look at that transactional data in
a summarized and readable format to help them make data-driven decisions. When reports, charts or dashboards need
to be updated on a periodic basis, SAS can be used to automate the whole process with very little manual
intervention.
CONFIGURING ODBC FOR MS ACCESS
Create an Access database and store the transactions dataset in it as a .mdb file. In our case, we created a
database called Database1.mdb. In that database we created a table called Daily_transactions and stored some
made-up data for demonstration purposes. One of the most important tasks is to add the data source to the ODBC
connections. Without this, your code will simply not work. The steps to add the data source to ODBC are as
follows:
1. Open ODBC Data Source Administrator (shown below in Display 1).
Display 1: ODBC Data Source Administrator
2. Click on MS Access Database and then click on Configure.
3. Then click on Select and browse to the .mdb file you just created.
4. Restart your computer if your SAS code gives an error saying that the database was not found.
EXTRACTING DATA FROM A DATABASE
Most companies store transactional data in databases such as Oracle, Teradata, DB2, MS SQL Server or MS Access.
Proc SQL can be used to connect to these databases and extract the data into SAS dataset format. Once the data is
in SAS dataset format, any SAS procedure can be used on it to carry out data summarization, analysis and/or
modeling. Proc SQL is essentially a SAS procedure that lets a user run SQL commands in SAS. Hence, to use Proc SQL
effectively, the user first needs to know SQL. In this paper we show how reports can be created from a database
(here MS Access) to your desktop. Take the example of XYZ company, which sells product P at an average price of
$100. The daily transaction records of product P are stored in an MS Access database, as shown in Display 2 below.
Display 2. Daily Transaction of Product P (XYZ Company) as stored in a MS ACCESS Database table
Txn_dt is the date of the transactions, Txns is the number of transactions and Sales_Amt_USD is the sales amount
of those transactions in US dollars. The table is named Daily_transactions and is stored in the database
Database1.
In this example, we extract the sum of transactions and sales for the week starting 01-JUL-2013 and for the week
starting 02-JUL-2012 (the corresponding week of 2012, for comparison). The following SAS code calculates the start
and end dates and extracts the required data.
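The original code listing is not reproduced in this transcript. As a rough sketch of how such a step might look, the code below derives the four dates as macro variables and runs a single pass-through query against Access. The DSN name matches the entry configured above and the output column names match those described for xyz.transac; the library path, the macro variable names, the choice of the previous Monday-to-Sunday week as the reporting week, and the use of Access's IIf function and #...# date literals are assumptions made for illustration, not the authors' code.

```sas
/* Assumption: hypothetical folder for the xyz library; adjust as needed. */
libname xyz "C:\Batch Jobs";

/* Derive the reporting week (previous Monday to Sunday) and the same
   week 52 weeks earlier, and store them as mm/dd/yyyy macro variables. */
data _null_;
    wk_start = intnx('week.2', today(), -1, 'b');   /* Monday of last week */
    wk_end   = wk_start + 6;                         /* Sunday of last week */
    call symputx('wk_start', put(wk_start,       mmddyy10.));
    call symputx('wk_end',   put(wk_end,         mmddyy10.));
    call symputx('ly_start', put(wk_start - 364, mmddyy10.));
    call symputx('ly_end',   put(wk_end   - 364, mmddyy10.));
run;

proc sql;
    /* Connect through the ODBC data source configured earlier. */
    connect to odbc (dsn="MS Access Database");

    /* One trip to the database: both weeks are summed in the same query.
       The inner query is Access SQL, so dates use #...# literals. */
    create table xyz.transac as
    select * from connection to odbc
    (
        select sum(IIf(Txn_dt between #&wk_start# and #&wk_end#,
                       Txns, 0))          as Transactions,
               sum(IIf(Txn_dt between #&wk_start# and #&wk_end#,
                       Sales_Amt_USD, 0)) as Sales,
               sum(IIf(Txn_dt between #&ly_start# and #&ly_end#,
                       Txns, 0))          as Transactionsly,
               sum(IIf(Txn_dt between #&ly_start# and #&ly_end#,
                       Sales_Amt_USD, 0)) as Salesly
        from Daily_transactions
        where (Txn_dt between #&wk_start# and #&wk_end#)
           or (Txn_dt between #&ly_start# and #&ly_end#)
    );

    disconnect from odbc;
quit;
```

Run on any day in the week beginning 08-JUL-2013, this sketch would select the week starting 01-JUL-2013 and, 364 days (52 weeks) earlier, the week starting 02-JUL-2012, which keeps the two weeks aligned on the same weekday.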
The output dataset (i.e. xyz.transac) is as follows
Display 3. Output dataset which summarizes transaction and sales for the 2 weeks
Here Transactions and Sales are for the week starting 01-JUL-2013, and Transactionsly and Salesly are for the week
starting 02-JUL-2012 (NOTE: ly stands for last year).
Based on our experience, Proc SQL can be used in almost all scenarios to bring the data into a desired format.
Proc SQL gives you the power to use the SQL language in SAS and carry out all the typical data manipulation and
summarization operations such as count, sum, average, join (merging multiple datasets), filter, insert and delete.
Please also note that queries against the database through Proc SQL should be kept to a minimum, as extracting data
from a database can be much slower than working with SAS dataset files.
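To illustrate these operations on data that is already in SAS, the following sketch works on a few made-up daily rows rather than on the database. The dataset names (work.daily, work.by_year, work.yoy), the year-over-year column and the values are hypothetical and are not taken from the paper's data.

```sas
/* Made-up daily rows standing in for data already pulled into SAS. */
data work.daily;
    input Txn_dt :date9. Txns Sales_Amt_USD;
    format Txn_dt date9.;
    datalines;
01JUL2013 10 1000
02JUL2013 12 1250
02JUL2012 8 790
03JUL2012 9 910
;
run;

proc sql;
    /* Count, sum and average, grouped by calendar year. */
    create table work.by_year as
    select year(Txn_dt)       as Txn_year,
           count(*)           as Days,
           sum(Txns)          as Transactions,
           sum(Sales_Amt_USD) as Sales,
           avg(Sales_Amt_USD) as Avg_daily_sales
    from work.daily
    group by calculated Txn_year;

    /* Self-join: put each year next to the previous year. */
    create table work.yoy as
    select ty.Txn_year,
           ty.Transactions,
           ty.Sales,
           ly.Transactions as Transactionsly,
           ly.Sales        as Salesly,
           ty.Sales / ly.Sales - 1 as Sales_yoy format=percent8.1
    from work.by_year as ty
         inner join work.by_year as ly
         on ly.Txn_year = ty.Txn_year - 1;
quit;
```

Because both queries read only WORK datasets, they can be rerun as often as needed without another trip to the database.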
EXPORTING/SENDING THE DATASET
Once the desired dataset has been obtained, it is time to export or send it to the end user. In this example, the
weekly numbers were appended to a master dataset in which all previous weekly numbers were stored. Before the
append, xyz.transac must be brought into the same format as the master dataset.
The weekly transaction master dataset (before the append) is shown in Display 4.
Display 4. Weekly Transactions master dataset before the Append statement
The format of xyz.transac is modified as follows:
Then we execute the Proc Append statement to append xyz.transac to the Weekly Transactions master dataset.
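The listings for these two steps are not reproduced in this transcript. A minimal sketch of what they might look like is given below, assuming the master dataset carries a week-start date followed by the four summary columns. The master dataset name (xyz.weekly_master), the Week_Start column, the macro variable and the stand-in values are all hypothetical.

```sas
/* Assumption: for a self-contained run, xyz simply points at WORK.
   In the real job it would be a permanent library. */
libname xyz (work);

/* Hypothetical macro variable holding the week start as a DATE9 string. */
%let wk_start = 01JUL2013;

/* Stand-in for the extracted dataset; the numbers are made up. */
data xyz.transac;
    Transactions   = 70;
    Sales          = 7000;
    Transactionsly = 63;
    Salesly        = 6200;
run;

/* Bring the weekly row into the (assumed) layout of the master dataset. */
proc sql;
    create table work.transac_fmt as
    select "&wk_start"d as Week_Start format=date9.,
           Transactions,
           Sales,
           Transactionsly,
           Salesly
    from xyz.transac;
quit;

/* Append the weekly row; Proc Append creates the base dataset
   if it does not exist yet. */
proc append base=xyz.weekly_master data=work.transac_fmt;
run;
```

Proc Append expects the incoming columns to match the base dataset in name and type, which is why the weekly row is reshaped before the append.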
After the Proc Append statement, the Weekly Transactions master dataset looks as below (Display 5):
Display 5. Weekly Transactions master dataset after the Append statement
The above dataset is the desired dataset, which needs to be either exported to a hard drive or sent to a
distribution list via email, to generate the final report in Excel. The code for those operations is shown below:
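The original listing is not reproduced in this transcript. One way these two operations might be written is sketched below, using Proc Export for the CSV file and the FILENAME EMAIL access method for the message. It assumes that the master dataset is available as xyz.weekly_master (a hypothetical name), that the output path and the address are placeholders, and that the SAS session is already configured to send email.

```sas
/* Assumptions: xyz.weekly_master is a hypothetical name for the master
   dataset, the path and address below are placeholders, and e-mail
   delivery is already configured for this SAS session (for example via
   the EMAILSYS= and EMAILHOST= system options). */
%let csv_file = C:\Batch Jobs\weekly_transactions.csv;

/* Export the master dataset to a CSV file on the hard drive. */
proc export data=xyz.weekly_master
            outfile="&csv_file"
            dbms=csv
            replace;
run;

/* Send the same file to the distribution list as an attachment. */
filename outmail email
    to="distribution.list@example.com"
    subject="Weekly transactions and sales"
    attach="&csv_file";

data _null_;
    file outmail;
    put "Hello,";
    put "The updated weekly transactions file is attached.";
run;

filename outmail clear;
```

The REPLACE option overwrites the previous week's file, so each scheduled run leaves exactly one current CSV at that path.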
SCHEDULING A SAS BATCH JOB IN WINDOWS 7
The entire process described above can be fully automated and run periodically (in this case weekly) using
Windows Task Scheduler. The steps are:
1. Open Task Scheduler and click on Create Task.
2. In the General tab, enter the job name.
3. In the Triggers tab, schedule the job as per your requirement (see Display 6 below).
Display 6. Setting the trigger for the batch job
4. Under Action, enter the following in the Program/script text field (see Display 7). Both paths contain spaces,
so each is enclosed in quotation marks:
"C:\Program Files\SAS Institute\SAS\V9\Sas.exe" -sysin "c:\Batch Jobs\WUSS.sas"
Display 7. Entering the file path
5. Click OK to schedule your task. You can also check your task in the active tasks list (see Display 8).
Display 8. Checking the list of active jobs
THE FINAL REPORT
Once the CSV file has been received via email or from the hard drive, the analyst needs to create the final
report. Many company executives prefer to see the final result in Excel format. Hence we present the final report
in Excel format, where the data pulled from the database is used to show weekly transactions and sales trends,
along with Year over Year (YoY) changes. Display 9 shows the final report:
Display 9: The final Report
This Excel processing step could itself be automated using Excel features and/or VBA for Excel. Demonstrating
these techniques is beyond the scope of this paper.
CONCLUSION
It is quite evident that SAS is also a powerful tool to extract, summarize and present data in a desired format
in a fully automated way. These skills in SAS, along with statistical data analysis and modeling, equip the data
analyst or statistician to perform end-to-end processes, from extracting raw data from a source to presenting
advanced statistical inference in a PowerPoint presentation, enabling data-driven decisions and strategies.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Raj Mohanty
Enterprise: n/a
E-mail: [email protected]
Web: www.linkedin.com/in/kirtiraj/
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.