Data Preparation in SAS

Cheng Lei
Department of Electrical and Computer Engineering
University of Victoria, Canada

I. Prior work: NoSQL Investigation

After a comprehensive survey of forums, official websites, and other sources, the conclusion regarding the interaction between SAS/ACCESS and NoSQL databases is that SAS provides no official interface, and no third-party libraries were found either. Official SAS/ACCESS support exists only for relational database management systems. SAS University Edition supports data access only to PC files.

II. Data Step

The general process flow of SAS data analysis consists of five main steps. The first is the data step, which reads external raw data according to its corresponding formats or models and then formats the data so they can be stored as SAS data sets. The raw data can be of any kind and stored in any kind of repository. Once the SAS data sets are available, the procedures in SAS/STAT are applied to carry out the data analysis; this is called the PROC step. The PROC step produces the analysis results, including line charts, scatter plots, and other graphs. The last step is to collect all the results into a final report for the data analysis.

The purpose of the data step is to create the data sets used by a SAS program's analysis and reporting procedures. The SAS programming language supplies the statements, interfaces, and functions needed to read external raw data. Data step statements include SET, MERGE, MODIFY, UPDATE, and others. Before reading the raw data, developers or analysts have to write programs that specify how the data are read and formatted. SAS data sets have a tabular structure consisting of rows and columns. By SAS naming convention, each row of a data set is called an observation and each column is called a variable. All values in a column share a single data type. A cell may also hold a missing value, which for numeric variables SAS represents with a placeholder period (.).
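To make the vocabulary of observations, variables, and the missing-value period concrete, here is a toy Python model of a SAS data set. It is not SAS code and does not reflect how SAS stores data internally. It assumes comma-delimited raw records and uses the hypothetical names `DataSet` and `read_raw`: each observation is a dict, each variable has one declared type, and a lone period (or an empty field) becomes `None`, standing in for the SAS missing value.

```python
from dataclasses import dataclass, field

# Toy model only: None stands in for the SAS missing value "." here.
MISSING = None


@dataclass
class DataSet:
    """Hypothetical stand-in for a SAS data set.

    Rows are observations (dicts); columns are variables, each with one type.
    """
    variables: dict[str, type]
    observations: list[dict] = field(default_factory=list)

    def add(self, raw: dict[str, str]) -> None:
        obs = {}
        for name, vtype in self.variables.items():
            text = raw.get(name, ".").strip()
            # A lone period (or an empty/absent field) is treated as missing.
            obs[name] = MISSING if text in (".", "") else vtype(text)
        self.observations.append(obs)


def read_raw(lines: list[str], variables: dict[str, type], sep: str = ",") -> DataSet:
    """Hypothetical helper: format delimited raw records into a DataSet."""
    ds = DataSet(variables)
    names = list(variables)
    for line in lines:
        ds.add(dict(zip(names, line.split(sep))))
    return ds


ds = read_raw(["1,Alice,88.5", "2,Bob,."], {"id": int, "name": str, "score": float})
print(ds.observations)
# [{'id': 1, 'name': 'Alice', 'score': 88.5}, {'id': 2, 'name': 'Bob', 'score': None}]
```

Declaring one type per variable mirrors the rule that every value in a column shares a data type: a record that violates it (say, `abc` in the `score` column) raises `ValueError` from `float()` instead of being stored silently.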
[Figure: The general process of SAS analysis]

The data step generally works in two phases. First, SAS compiles the SAS statements, checking their syntax and initializing memory. Then it iterates over the data records, formatting each one and storing the formatted data, until no data records remain. These are the compile and execution phases.

III. Two phases in Data Step

In more detail, besides checking syntax and initializing memory, the compile phase also creates the input buffer, the program data vector (PDV), and the descriptor information, which consists of data and file metadata. The descriptor information is created and managed by the SAS system to track changes to the data and maintain the corresponding history.

During the execution phase, once the error checks and the checks for available data records are finished, the program data vector created in the compile phase is filled with data from the input buffer. Once the PDV is populated, SAS executes the remaining statements, such as MERGE, MODIFY, UPDATE, or others, as the program requires. After these operations, the new output data are written to the library as a SAS data set. SAS repeats this process until all the raw data have been processed.

[Figure: Steps in execution phase]

The data step also provides filtering statements to filter out observations that may not be needed for the data analysis. These filtering statements include comparison conditions, such as greater than, less than, and equal to, as well as IF-THEN/ELSE statements.

IV. SAS/ACCESS procedure

SAS has a powerful interface for interacting with databases: it can query data directly from a database using SQL statements. SAS/ACCESS supports almost all the SQL statements provided by open-source and commercial databases. A database connection is made with the LIBNAME statement combined with database-specific connection information, including user, password, server address, port number, and so on. For more details, refer to the SAS/ACCESS procedure documentation.
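The two data step phases described in Section III can be illustrated with a second toy Python model. It is a sketch under simplifying assumptions, not SAS's implementation: `compile_step`, `execute_step`, `Descriptor`, and `CompiledStep` are hypothetical names, records are assumed comma-delimited, the whole PDV is reset to missing on every iteration (real SAS does not reset retained variables), and a Python predicate plays the role of a filtering IF statement.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Descriptor:
    """Descriptor information: variable names and types of the output data set."""
    variables: dict[str, type]


@dataclass
class CompiledStep:
    descriptor: Descriptor
    keep: Callable[[dict], bool]  # plays the role of a filtering IF statement


def compile_step(variables: dict[str, type], keep: Callable[[dict], bool]) -> CompiledStep:
    """Compile phase: check the 'program' once, before any record is read."""
    for name, vtype in variables.items():
        if not name.isidentifier() or vtype not in (int, float, str):
            raise ValueError(f"invalid variable definition: {name!r}")
    return CompiledStep(Descriptor(dict(variables)), keep)


def execute_step(step: CompiledStep, records: Iterable[str]) -> list[dict]:
    """Execution phase: one pass per raw record until no records remain."""
    names = list(step.descriptor.variables)
    output = []
    for record in records:
        input_buffer = record.split(",")  # raw record is read into the input buffer
        pdv = dict.fromkeys(names)        # PDV starts each iteration with all values missing (None)
        for name, text in zip(names, input_buffer):
            text = text.strip()
            if text != ".":
                pdv[name] = step.descriptor.variables[name](text)
        if step.keep(pdv):                # filtering: drop observations that fail the test
            output.append(dict(pdv))      # write the observation to the output data set
    return output


# Keep observations with score > 50. The None check is needed because Python
# cannot compare None with a number (SAS treats missing as smaller than any number).
step = compile_step({"id": int, "score": float},
                    keep=lambda obs: obs["score"] is not None and obs["score"] > 50)
print(execute_step(step, ["1,72.5", "2,41", "3,."]))
# [{'id': 1, 'score': 72.5}]
```

Separating the phases this way means an invalid variable definition is rejected before any record is read, which is the practical benefit of doing the checks at compile time rather than once per record.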