Data Preparation in SAS
Cheng Lei
Department of Electrical and Computer Engineering
University of Victoria, Canada
[email protected]
I. Prior work: NoSQL Investigation
After a comprehensive survey of forums, official websites, and other sources, the conclusion on the interaction between SAS/ACCESS and NoSQL is that the company provides no official interface, and no third-party libraries were found either. SAS/ACCESS officially supports only relational database management systems. The SAS University Edition supports data access only to PC files.
II. Data Step
The general process flow of SAS data analysis consists of five main steps. The first is the data step, which reads external raw data in its corresponding format or model, then formats the data and stores it as SAS data sets. The raw data can be of any kind, stored in any kind of repository. Once the SAS data sets are available, procedures in SAS/STAT are applied to carry out the data analysis; this is called the PROC step. The PROC step produces the analysis results, including line charts, scatter plots, and other graphs. The last step is to collect all the results into a final report for the data analysis.
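The flow described above can be sketched as a small SAS program. This is only an illustration: the data set name work.scores, its variables, and the inline records are made up, and PROC MEANS and PROC SGPLOT stand in for whichever analysis procedures a real project would use (they are not SAS/STAT procedures).

```sas
/* DATA step: read raw records and store them as a SAS data set. */
/* The data set name and the inline records are illustrative.    */
data work.scores;
    input name $ group $ score;
    datalines;
Ann A 82
Bob B 75
Cid A 91
Dee B 68
;
run;

/* PROC step: summarize the data set by group. */
proc means data=work.scores mean min max;
    class group;
    var score;
run;

/* PROC step: produce a graph (mean score per group). */
proc sgplot data=work.scores;
    vbar group / response=score stat=mean;
run;
```

The output of the two PROC steps is what would then be gathered into the final report.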
The purpose of the data step is to create the data sets used by a SAS program's analysis and reporting procedures. The SAS programming language supplies the statements, interfaces, and functions needed to read external raw data. Data step statements include SET, MERGE, MODIFY, UPDATE, and others. Before reading the raw data, developers or analysts have to write programs that specify how the incoming data is formatted. A SAS data set has a tabular structure consisting of rows and columns. By SAS naming convention, each row of a data set is called an observation and each column a variable. Each column has a single data type. A cell may hold a missing value, which is represented by a placeholder: for numeric variables, a dot.
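As a minimal sketch of these terms, the program below builds two small data sets and combines them with MERGE. All data set names, variable names, and values are hypothetical. Each line of input becomes one observation, each name on the INPUT statement becomes one variable, and the dot in the second record is read as a missing numeric value.

```sas
/* Three observations, two numeric variables; temp is missing for id 2. */
data work.readings;
    input id temp;
    datalines;
1 20.5
2 .
3 22.1
;
run;

/* A second data set sharing the key variable id. */
data work.sites;
    input id site $;
    datalines;
1 North
2 South
3 East
;
run;

/* MERGE combines observations with matching BY values.        */
/* Both inputs must be sorted by id, which they are as entered. */
data work.combined;
    merge work.readings work.sites;
    by id;
    is_missing = missing(temp);   /* 1 when temp is missing, else 0 */
run;

proc print data=work.combined;
run;
```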
Figure: The general process of SAS Analysis
The data step works in two phases. First, Base SAS compiles the SAS statements, checking their syntax and initializing memory. Then it iterates over the data records, formatting each one and storing the formatted result, until no records remain. These are the compile phase and the execution phase.
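One way to observe the two phases is to write variable values to the SAS log on every iteration. The sketch below is illustrative only; the data set and variable names are made up. The variables x, y, and total exist from the first iteration because they were set up during compilation, and the automatic variable _N_ counts the iterations of the execution phase.

```sas
data work.demo;
    /* Runs at the top of every iteration, before a record is read. */
    put 'Top of iteration: ' _n_= x= y= total=;

    input x y;          /* read the next record into x and y */
    total = x + y;

    /* Runs only when a record was successfully read. */
    put 'After INPUT:      ' _n_= x= y= total=;

    datalines;
1 2
3 4
;
run;
```

In the log, the values at the top of each iteration are missing because variables that are read or assigned in the step are reset before each pass. The step ends when INPUT finds no further record to read.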
III. Two phases in Data Step
In more detail, besides checking syntax and initializing memory, the compile phase also creates the input buffer, the program data vector, and the descriptor information, which consists of data and file metadata. The descriptor information is created and managed by the SAS system to track data changes and maintain the corresponding history.
During the execution phase, the program data vector created in the compile phase is populated with data from the input buffer once the error checks and the checks for available data records have finished. Once the program data vector is populated, SAS executes the other statements, such as MERGE, MODIFY, and UPDATE, as needed. After these operations, the resulting output is written to the library as a SAS data set. SAS repeats this process until all the raw data has been processed.
Figure: Steps in Execution Phase
The data step also provides filtering statements to exclude variables that are not involved in the data analysis. These filtering statements include conditional statements (greater than, less than, equal to, and so on) and if-then/else statements.
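A short sketch of filtering inside a data step follows. The data set names, variable names, and thresholds are hypothetical. It shows a subsetting IF with a comparison operator, an IF-THEN/ELSE pair, and a DROP statement that leaves a variable out of the output.

```sas
data work.scores;
    input name $ score;
    datalines;
Ann 82
Bob 75
Cid 91
Dee .
;
run;

data work.filtered;
    set work.scores;

    /* Subsetting IF: keep only observations with score above 70.    */
    /* A missing score compares lower than any number, so it is cut. */
    if score > 70;

    /* IF-THEN/ELSE: derive a new variable from a condition. */
    if score >= 80 then grade = 'High';
    else grade = 'Low';

    /* Exclude a variable that the analysis does not need. */
    drop name;
run;

proc print data=work.filtered;
run;
```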
IV. SAS/ACCESS procedure
SAS has a powerful interface for interacting with databases. It can query data directly from a database using SQL statements. Almost all the SQL statements provided by open-source and commercial databases are supported in SAS/ACCESS.
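To give a feel for SQL inside SAS, here is a minimal PROC SQL sketch. To keep it runnable without a database, it queries a made-up table in the WORK library; the assumption is that a table reached through a SAS/ACCESS library reference could be named in the FROM clause in the same way.

```sas
/* Illustrative table; in practice this would live in a database. */
data work.orders;
    input order_id customer $ amount;
    datalines;
1 Ann 120
2 Bob 80
3 Ann 45
;
run;

/* Standard SQL clauses: aggregate, filter groups, sort. */
proc sql;
    select customer, sum(amount) as total_amount
    from work.orders
    group by customer
    having sum(amount) > 100
    order by total_amount desc;
quit;
```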
The keyword used to connect to a database is LIBNAME, combined with database-specific connection information, including the user, password, server address, port number, and so on. For details, refer to the SAS/ACCESS procedure documentation.
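A LIBNAME statement of this kind might look like the sketch below, which assumes the SAS/ACCESS Interface to MySQL is licensed and installed. The library reference mydb, the credentials, the server address, the database name, and the table name are all hypothetical placeholders, and other database engines take different options.

```sas
/* Assign a library reference to a MySQL database.       */
/* Every connection value here is a placeholder.         */
libname mydb mysql
    user=analyst
    password="secret"
    server="db.example.com"
    port=3306
    database=sales;

/* Once assigned, a database table is read like a SAS data set. */
/* The table name orders is hypothetical.                       */
data work.orders_local;
    set mydb.orders;
run;

/* Release the connection when finished. */
libname mydb clear;
```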