SPECIFICATION
On Formula Based Calculations in SQL Server Production Databases

OBJECTIVE

As you know, one of the main complaints from the Statistical Division is that the development of the required calculations takes too much time. This criticism is partly justified, as every stored procedure has to pass through the standard cycle of code development, debugging, testing and integration into the dataAdmin application for the production phase. There is almost no reuse of previously developed code. Users tend to compare the time they spend on calculation development in spreadsheet applications with the development time for stored procedures, and this comparison is not in favor of the ad-hoc stored procedure approach.

The objective and scope of this report is to define a specification for a project that enables statisticians to develop formula based calculations in the SQL Server production databases in much the same way as this can be done in spreadsheet applications.

PREREQUISITES

Currently, the macroeconomic database can contain more than one product segment for any combination of Country, Indicator and Periodicity values. Making a time series unique for any combination of Country, Indicator and Periodicity values would facilitate the development of the formula based calculations, because filtering conditions would no longer have to be introduced.

CONCEPT AND DESIGN

In the first phase it is proposed to develop an interface that will allow certain statisticians possessing some IT skills to:

a) Develop modules with arithmetical formulas for time series,
b) Integrate the developed modules into the new dataAdmin application for further testing and production.

If the first phase is successful, we can consider implementing the same approach for the nightly job calculations. The plug-in model of the new dataAdmin application stated above will allow shifting the development of calculation routines to the Statistical Division.
If implemented, this approach will attach more importance to the reengineering of the dataAdmin and dbAdmin applications. The dataAdmin application, where statisticians do all their data cooking work, is a rather primitive application as to its graphical interface. The main problem is with the data calculations. This is why it is considered of primary importance to start the applications reengineering with the formula based calculations.

The central object of the specification is a time series. Here is a trial routine that can be used to reproduce the spreadsheet formula calculations for the sample time series below,

    R(i) = A(i) / B(i-1) * 100

where i stands for a time period like 1999Y or 2005Q2 etc. We can illustrate that the above-mentioned formula translates to the following Table1 table and SQL coding.

Table1

    Year | A_Indicator | B_Indicator (shifted) | R_Indicator
    -----+-------------+-----------------------+------------
     98  | A98         | B97                   | R98
     99  | A99         | B98                   | R99
     00  | A00         | B99                   | R00
     01  | A01         | B00                   | R01
     02  | A02         | B01                   | R02
     03  | A03         | B02                   | R03
     04  | Null        | B03                   | Null

The values in the B_Indicator column are shifted in order to ensure the previous year denominator in the formula. After the shifting is done, the result value in the R_Indicator column is just the division of the values in the corresponding row/record, multiplied by 100. Below is the SQL statement that performs this operation,

    UPDATE Table1 SET R_Indicator = A_Indicator/B_Indicator*100

It is clear that the SET clause of the UPDATE SQL statement contains just a simple arithmetic formula complying with the Transact-SQL syntax. Thus the main problem of developing a syntax analyzer can be circumvented, because the Transact-SQL compliant arithmetic formula can be used instead. A statistician can develop and save this arithmetic formula to the database. We are sure that some statisticians, like Ioussoufou, already have all the necessary IT skills to develop arithmetical operations compliant with the Transact-SQL syntax.
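For illustration only, the shifting and formula steps behind the Table1 example can be sketched in a few lines of Python. The function names, dict layout and sample values below are invented for this note and are not part of the specification; dicts keyed by year stand in for table columns.

```python
# Illustrative sketch of the Table1 logic: R(i) = A(i) / B(i-1) * 100.
# Dicts keyed by year stand in for table columns; values are made up.

def shift(series, periods=1):
    """Move each value forward by `periods` years, so that the row for
    year i holds the value originally stored under year i - periods."""
    return {year + periods: value for year, value in series.items()}

def apply_formula(a, b):
    """Row-by-row A/B*100, like the SET clause of the UPDATE statement.
    A missing operand yields None, the way Null propagates in SQL."""
    result = {}
    for year in a.keys() | b.keys():
        x, y = a.get(year), b.get(year)
        result[year] = None if x is None or y is None else x / y * 100
    return result

series_a = {1998: 150.0, 1999: 300.0, 2000: 450.0}                # A_Indicator
series_b = {1997: 100.0, 1998: 150.0, 1999: 300.0, 2000: 450.0}   # B_Indicator

r = apply_formula(series_a, shift(series_b))
print(r[1998], r[1999], r[2000], r[2001])   # 150.0 200.0 150.0 None
```

Note how the year 2001 row, for which no A value exists, produces None, mirroring the Null rows at the edges of Table1.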
The stored procedure will collect the saved metadata, like the shifting, the arithmetic formula and other similar operators, in order to dynamically build the UPDATE SQL statement. Such an approach was successfully implemented in the PC-Axis data mapping procedure. Until now, we have received very few requests for the development of the data mapping SQL queries.

GROUPING AND PIPING

Apart from shifting and the arithmetic formula, we need to add another primitive operator, GROUPING. The grouping operator will be used to apply the aggregate operators like SUM, AVERAGE etc. Let's take an example table.

Table2

    Q_Year | Q_Quarter | Q_Indicator
    -------+-----------+------------
     98    | Q1        | Q(98,q1)
     98    | Q2        | Q(98,q2)
     98    | Q3        | Q(98,q3)
     98    | Q4        | Q(98,q4)
     99    | Q1        | Q(99,q1)
     99    | Q2        | Q(99,q2)
     99    | Q3        | Q(99,q3)
     99    | Q4        | Q(99,q4)
     00    | Q1        | Q(00,q1)
     00    | Q2        | Q(00,q2)
     00    | Q3        | Q(00,q3)
     00    | Q4        | Q(00,q4)
     01    | Q1        | Q(01,q1)
     01    | Q2        | Q(01,q2)
     01    | Q3        | Q(01,q3)
     01    | Q4        | Q(01,q4)

When the grouping operator is applied to the Q_Year column of the Table2 table, the SUM operation applied to the Q_Indicator column will give a time aggregation of the quarterly data to annual data. Let us denote the result time series of the grouping operation as A(i). This time series can then be used to feed the Table1 table used in the example with the shifting operator. Below is how such a statistical calculation is used in the rebase of quarterly data,

1. Group the quarterly data,
2. Join A and B,
3. Shift B_Indicator,
4. Calculate the A/B*100 formula.

The idea of chaining statistical data manipulations is called piping, by analogy with Unix/Linux shells. Piping the output tables of one operation into the input tables of the next operation will allow hiding the creation/deletion of the temporary tables that appear at every step. The piping may require a joining operation, as is done with the A and B columns of the Table1 table, the data for which come from different tables.

TYPICAL EXAMPLE FORMULAE

a) Share.

Objective
Calculate the share of the agricultural sector in GDP.

Given
Agriculture in absolute values, A(i).

Formula
Sum all sectors to get GDP.
Divide each sector by GDP,

    R(i) = A(i) / SUM(A(1), A(2), ..., A(n))

b) Aggregated growth rate.

Objective
Calculate the GDP growth rate from the growth rates of the contributing indicators.

Given
Growth rates and values in constant prices for the indicators contributing to GDP (e.g. the agricultural sector).

Formula
Calculate the weights from the absolute values for a base year (values in constant prices),

    W(i) = A(i) / SUM(A(1), A(2), ..., A(n))

Then multiply the time series of indices (the growth rates for every sector contributing to GDP) by the calculated weights (whether a factor of 100 must be applied and divided back out remains to be clarified), and sum up the result time series to get the GDP growth rate,

    R(2005) = SUM(I(i, 2005) * W(i, 2005))

Ask
Why is the GDP growth rate not calculated from its values in constant prices, by summing them up and taking the ratio?

c) Contributions to GDP growth.

Objective
How much does the agricultural sector contribute to GDP growth?

Given
Agriculture for year 2005 in absolute figures, C(2005) = 200.0
Agriculture for year 2004 in absolute figures, C(2004) = 150.0

Formula
Agriculture growth in absolute figures,

    G = C(2005) - C(2004) = 50.0

GDP growth in absolute figures, calculated as the sum of the growth figures for all sectors,

    D = SUM(A, B, C, D) = 2000.0

The contribution of the agricultural sector is the result figure,

    R = G/D * 100 = 50/2000 * 100 = 2.5%

d) Deflator.

The deflator is the ratio of the agricultural sector in current prices to the agricultural sector in constant prices,

    D = A(cur, 2005) / A(const, 2005) * 100

e) Rebasing quarterly data.

Rebasing is an operation that brings figures expressed in one base year to another (common) base year. Below is an algorithm for rebasing quarterly data,

UNRESOLVED PROBLEMS

The ad-hoc stored procedures abound with various verifications. This ad-hoc code may become the main obstacle for this project. The objective of this project is to give statisticians a tool to independently develop calculation modules; they will not be able to code that kind of ad-hoc verifications. The central object of the project is a time series, not tables with records and columns.
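To make the piping idea from the GROUPING AND PIPING section concrete, here is a minimal Python sketch of the rebase chain (group, shift, formula). All function and variable names are invented for this note; for brevity the formula is applied to the grouped series and its own shifted copy, rather than to two series coming from different tables.

```python
# Illustrative piping sketch: group -> shift -> formula, chained so that
# no intermediate table is visible to the caller. Names are invented.

def group_sum(quarterly):
    """GROUPING + SUM: aggregate quarterly data to annual data."""
    annual = {}
    for (year, _quarter), value in quarterly.items():
        annual[year] = annual.get(year, 0.0) + value
    return annual

def shift(series, periods=1):
    """Year i receives the value originally stored under year i - periods."""
    return {year + periods: value for year, value in series.items()}

def formula(a, b):
    """Row-by-row A/B*100 over the join of the two series on year."""
    return {year: a[year] / b[year] * 100 for year in a.keys() & b.keys()}

# Quarterly source data, keyed by (year, quarter); values are made up.
quarterly = {(1998, q): 25.0 for q in (1, 2, 3, 4)}
quarterly.update({(1999, q): 30.0 for q in (1, 2, 3, 4)})

annual = group_sum(quarterly)              # step 1: {1998: 100.0, 1999: 120.0}
rebased = formula(annual, shift(annual))   # shift + formula piped together
print(rebased)                             # {1999: 120.0}
```

Each function consumes the output of the previous one, so the temporary tables of the real SQL implementation correspond here to intermediate values that never need a name.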
Below is an assertion whose accuracy we have to evaluate. As the statistical formula (e.g. R(i) = A(i) / B(i-1) * 100) sufficiently describes the calculation in question and does not contain any conditional statements, the data update routine should not have any conditional statements either!

CONCLUSION

The project will require developing the following software,

1. Primitive preparatory operators, like shifting and grouping,
2. Arithmetic formula computations,
3. Piping of the result tables,
4. Conditional operators,
5. Update of the time series definitions and data,
6. Plug-in development interface,
7. New dataAdmin with an integration of plug-ins.

CONSEQUENCES

If implemented, this will allow us to:

a) Shift the development of some statistical computations to statisticians,
b) Reduce the maintenance cost for the Statistical Database Project from 1.5 person/year to 1 person/year.

IMPLEMENTATION DETAILS

If the statement above on the sufficiency of a time series formula to describe the algorithm proves to be true, then the nightly job calculations can be abandoned. ISU can develop a server application receiving batch calculation requests from the client application(s). This server application will process a request and do all the necessary calculations, including those currently done at night. The advantage is that instead of recalculating everything, the server application will process the updated/inserted data only; thus higher performance can be achieved.

It is proposed to develop the new dataAdmin application on top of the RADAPI framework. In the beginning the application will not include any calculations and will be used for viewing time series and their definitions. It will require developing a grid directly bound to a table of time series figures. dataAdmin should be based on the formula based calculation plug-in model. The plug-in model should ensure the easy integration of the developed modules into the new application.
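As a sketch of what the plug-in model might look like, a calculation module could carry the saved metadata (target table and column, shift, Transact-SQL formula) and hand the host application a ready UPDATE statement. Everything here is a guess made for illustration, not part of the specification: the class name, the metadata fields and the assembly logic are all hypothetical.

```python
# Hypothetical plug-in sketch: the saved metadata drives the dynamic
# construction of the UPDATE statement. All names are invented.

from dataclasses import dataclass

@dataclass
class CalculationPlugin:
    table: str       # target table, e.g. "Table1"
    target: str      # result column, e.g. "R_Indicator"
    formula: str     # Transact-SQL compliant arithmetic expression
    shift: int = 0   # how many periods the denominator series is shifted

    def build_update(self):
        """Assemble the dynamic UPDATE statement from the metadata."""
        return f"UPDATE {self.table} SET {self.target} = {self.formula}"

plugin = CalculationPlugin(
    table="Table1",
    target="R_Indicator",
    formula="A_Indicator/B_Indicator*100",
    shift=1,
)
print(plugin.build_update())
# UPDATE Table1 SET R_Indicator = A_Indicator/B_Indicator*100
```

Under this model the host application only needs to run the statement each plug-in produces; the statistician edits the formula string, never the surrounding code.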
The next step will be to develop an interface for developing plug-ins inside dataAdmin. The application will give access to this interface to a limited number of users possessing IT skills. We should also consider the idea of merging the functions of the old dataAdmin, csvImport and, possibly, dbAdmin applications into the new dataAdmin application. If dbAdmin is included into the new application, the security policy should be reinforced to restrict access to the metadata part for all users other than the database managers.

ALGORITHM