SPECIFICATION
On Formula Based Calculations
In SQL Server Production Databases
OBJECTIVE
As you know, one of the main complaints from the Statistical Division is that the development of the required calculations takes too much time. This criticism is partly justified, as every stored procedure has to pass through the standard scenario of code development, debugging, testing and integration into the dataAdmin application in the production phase. There is almost no reuse of previously developed code.
Users tend to compare the time they spend on calculation development in spreadsheet applications with the development time for stored procedures. This comparison is not in favor of the ad-hoc stored procedure approach.
The objective and the scope of this report is to define a specification for a project that enables statisticians to develop formula based calculations in the SQL Server Production Databases in the same way as this can be done in spreadsheet applications.
PREREQUISITES
Currently, the macroeconomic database can contain more than one product segment for any combination of Country, Indicator and Periodicity values. The uniqueness of a time series for any combination of Country, Indicator and Periodicity values would facilitate the development of formula based calculations, because it would no longer be necessary to introduce filtering conditions.
CONCEPT AND DESIGN
In the first phase it is proposed to develop an interface that will allow certain statisticians possessing some IT skills to
a) Develop modules with arithmetical formulas for time series,
b) Integrate the developed modules into the new dataAdmin application for further testing and production.
If the first phase is successful, we can think of implementing the same approach
for the nightly job calculations.
The plug-in model of the new dataAdmin application described above will allow shifting the development of calculation routines to the Statistical Division. If implemented, this approach will give additional weight to the reengineering of the dataAdmin and dbAdmin applications. The dataAdmin application, where statisticians do all their data cooking work, is a rather primitive application as far as its graphical interface is concerned. The main problem is with the data calculations. This is why it is considered of primary importance to start the applications reengineering with the formula based calculations.
The central object of the specification is a time series. Here is a trial routine that can be used to reproduce the spreadsheet formula calculation for the sample time series below,

Ri = Ai / Bi-1 * 100

where i stands for a time period like 1999Y or 2005Q2, and Bi-1 denotes the value of B for the previous period.
The above formula translates to the following Table1 layout and SQL code.
Table1

A                   B                   R
Year  A Indicator   Year  B Indicator   Year  R Indicator
98    A 98          97    B 97          98    R 98
99    A 99          98    B 98          99    R 99
00    A 00          99    B 99          00    R 00
01    A 01          00    B 00          01    R 01
02    A 02          01    B 01          02    R 02
03    A 03          02    B 02          03    R 03
04    Null          03    B 03          04    Null
The “B Indicator” values are shifted in order to put the previous year’s value in the denominator of the formula. After the shifting is done, the result value in “R Indicator” is just the division of the values in the corresponding row/record. Below is the SQL statement that performs this operation,
UPDATE Table1 SET
R_Indicator = A_Indicator/B_Indicator*100
It is clear that the SET clause of the UPDATE SQL statement contains just a simple arithmetic formula complying with the Transact-SQL syntax. Thus the main problem, developing a syntax analyzer, can be circumvented, because a Transact-SQL compliant arithmetic formula can be used instead.
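For illustration, the shifting operator itself could be expressed as a simple join on offset years when Table1 is populated. This is only a sketch: the source tables SeriesA and SeriesB and their columns are hypothetical and do not refer to existing database objects.

-- Hypothetical sketch of the shifting operator: pair the value of A for year i
-- with the value of B for year i-1 in the same Table1 record.
INSERT INTO Table1 ([Year], A_Indicator, B_Indicator)
SELECT a.[Year],
       a.A_Indicator,
       b.B_Indicator                -- previous year's value of B
FROM SeriesA AS a
LEFT JOIN SeriesB AS b
    ON b.[Year] = a.[Year] - 1;    -- the shift expressed as a year offset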
A statistician can develop this arithmetic formula and save it to the database. We are sure that some statisticians, like Ioussoufou, already have all the necessary IT skills to develop arithmetical operations compliant with the Transact-SQL syntax. The stored procedure will collect the saved metadata, such as the shifting, the arithmetic formula and other similar operators, in order to dynamically build the UPDATE SQL statement.
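A minimal sketch of how the saved metadata could drive the dynamically built UPDATE statement is given below. The CalculationFormula table, its columns and the RunFormula procedure are hypothetical names used only for illustration; only sp_executesql is standard Transact-SQL.

-- Hypothetical metadata table holding the formula saved by a statistician
CREATE TABLE CalculationFormula (
    FormulaId    int IDENTITY PRIMARY KEY,
    TargetTable  sysname NOT NULL,        -- e.g. 'Table1'
    TargetColumn sysname NOT NULL,        -- e.g. 'R_Indicator'
    Expression   nvarchar(4000) NOT NULL  -- e.g. 'A_Indicator/B_Indicator*100'
);
GO
-- Sketch of a stored procedure that builds and runs the UPDATE dynamically
CREATE PROCEDURE RunFormula @FormulaId int
AS
BEGIN
    DECLARE @sql nvarchar(4000);
    SELECT @sql = N'UPDATE ' + QUOTENAME(TargetTable)
                + N' SET '   + QUOTENAME(TargetColumn)
                + N' = '     + Expression
    FROM CalculationFormula
    WHERE FormulaId = @FormulaId;
    EXEC sp_executesql @sql;   -- the saved formula itself is plain Transact-SQL
END;

In such a scheme the statistician only edits the Expression value; the rest of the UPDATE statement is assembled by the stored procedure.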
Such an approach was successfully implemented in the PC-Axis data mapping procedure. So far, we have received very few requests for the development of data mapping SQL queries.
GROUPING AND PIPING
Apart from the shifting and arithmetic formula operators, we need to add another primitive operator, GROUPING. The grouping operator will be used to apply aggregate operators such as SUM, AVERAGE, etc. Let’s take an example table.
Table2

Q Year  Q Quarter  Q Indicator
98      Q1         Q 98,q1
98      Q2         Q 98,q2
98      Q3         Q 98,q3
98      Q4         Q 98,q4
99      Q1         Q 99,q1
99      Q2         Q 99,q2
99      Q3         Q 99,q3
99      Q4         Q 99,q4
00      Q1         Q 00,q1
00      Q2         Q 00,q2
00      Q3         Q 00,q3
00      Q4         Q 00,q4
01      Q1         Q 01,q1
01      Q2         Q 01,q2
01      Q3         Q 01,q3
01      Q4         Q 01,q4
When the grouping operator is applied to the Q Year column in the Table2 table, the SUM operation applied to the Q Indicator column will give a time aggregation of the quarterly data to annual data.
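For illustration, the combination of the grouping operator and the SUM aggregate could be expressed as the query below; the underscore column names (Q_Year, Q_Indicator) are assumed for the SQL representation of Table2.

-- Time aggregation of quarterly data to annual data
SELECT Q_Year,
       SUM(Q_Indicator) AS A_Indicator   -- annual value built from the four quarters
FROM Table2
GROUP BY Q_Year;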
Let us denote the result time series of the grouping operation as Ai. This time series can then be used to feed the Table1 table used in the example with the shifting operator. Below is how such a statistical calculation is used in the rebase of quarterly data,
Group quarterly data
Join A and B
Shift B Indicator
Calculate A/B*100 formula
The idea of chaining statistical data manipulations is called piping, by analogy with Unix/Linux shells. Piping the output tables of one operation into the input tables of the next operation will allow hiding the creation/deletion of the temporary tables that appear at every step. Piping may require a joining operation, as is done with the A and B columns in Table1, whose data come from different tables.
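A minimal sketch of how the four steps above could be piped together is shown below, using common table expressions so that no temporary tables have to be created or deleted explicitly. The source of the annual B series (SeriesB) and the underscore column names are assumptions made for the example.

WITH GroupedA AS (
    -- Step 1: group quarterly data to annual data
    SELECT Q_Year AS [Year],
           SUM(Q_Indicator) AS A_Indicator
    FROM Table2
    GROUP BY Q_Year
),
Joined AS (
    -- Steps 2 and 3: join A and B, shifting B to the previous year
    SELECT a.[Year],
           a.A_Indicator,
           b.B_Indicator
    FROM GroupedA AS a
    LEFT JOIN SeriesB AS b
        ON b.[Year] = a.[Year] - 1
)
-- Step 4: calculate the A/B*100 formula
SELECT [Year],
       A_Indicator / B_Indicator * 100 AS R_Indicator
FROM Joined;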
TYPICAL EXAMPLE FORMULAE
a) Share.
Objective
Calculate the share of the agricultural sector in GDP.
Given
Agriculture in absolute values, Ai
Formula
Sum all sectors to get GDP, then divide each sector by GDP,
Ri = Ai / SUM(A1, A2, …, An)
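As an illustration of the share formula, a hypothetical table Sectors([Year], Sector, A_Indicator) holding the absolute values of all sectors could be queried as follows; the window aggregate plays the role of SUM(A1, A2, …, An).

-- Share of each sector in GDP: Ri = Ai / SUM(A1, A2, ..., An)
SELECT [Year],
       Sector,
       A_Indicator / SUM(A_Indicator) OVER (PARTITION BY [Year]) AS Share
FROM Sectors;   -- multiply by 100 if the share is needed as a percentage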
b) Aggregated growth rate.
Objective
Calculate the GDP growth rate from the growth rates of the contributing indicators.
Given
Growth rates and values in constant prices for the indicators contributing to GDP (e.g. the agricultural sector).
Formula
Calculate the weights from the absolute values for a base year (values in constant prices),
Wi = Ai / SUM(A1, A2, …, An)
Then multiply the time series of indices (the growth rates of every sector contributing to GDP) by the calculated weights (whether a factor of 100 has to be applied and then divided out remains to be clarified), and sum up the result time series to get the GDP growth rate,
R2005 = SUM(Ii,2005 * Wi,2005)
Ask
Why is the GDP growth rate not calculated from its values in constant
prices by summing up and finding the ratio?
c) Contributions to GDP growth.
Objective
How much does the agricultural sector contribute to GDP growth?
Given
Agriculture for year 2005 in absolute figures, C2005 = 200.0
Agriculture for year 2004 in absolute figures, C2004 = 150.0
Formula
Agriculture growth in absolute figures, G = C2005 - C2004 = 50.0
GDP growth in absolute figures, D = SUM(A, B, C, D) = 2000.0 (calculated as the sum of the growth figures for all sectors). The contribution (weight) of the agricultural sector is the result figure,
R = G/D * 100 = 50/2000 * 100 = 2.5%
d) Deflator.
The deflator is the ratio of the agricultural sector in current prices to the agricultural sector in constant prices,
D = Acur,2005 / Aconst,2005 * 100
e) Rebasing quarterly data.
Rebasing is an operation that brings figures from one base year to another (common) base year. An algorithm for rebasing quarterly data is outlined in the GROUPING AND PIPING section above (group quarterly data, join A and B, shift B Indicator, calculate the A/B*100 formula).
UNRESOLVED PROBLEMS
The ad-hoc stored procedures abound with various verifications. This ad-hoc code may become the main obstacle for this project. The objective of this project is to give statisticians a tool to independently develop calculation modules. They will not be able to code that kind of ad-hoc verification.
The central object of the project is a time series, not tables with records and columns. Below is an assertion whose accuracy we have to evaluate.
As the statistical formula (e.g. Ri = Ai / Bi-1 * 100) sufficiently describes the calculation in question and does not contain any conditional statements, the data update routine should not have any conditional statements either!
CONCLUSION
The project will require developing the following software,
1. Primitive preparatory operators, like shifting and grouping,
2. Arithmetic formula computations,
3. Piping the result tables,
4. Conditional operators,
5. Update of the time series definitions and data,
6. Plug-in development interface,
7. New dataAdmin with an integration of plug-ins.
CONSEQUENCES
If implemented, this will make it possible to:
a) Shift the development of some statistical computations to statisticians,
b) Reduce the maintenance cost for the Statistical Database Project from 1.5
person/year to 1 person/year.
IMPLEMENTATION DETAILS
If the statement above about the sufficiency of a time series formula to describe the algorithm proves to be true, then the nightly job calculations can be abandoned. ISU can develop a server application receiving batch calculation requests from the client application(s). This server application will process a request and do all the necessary calculations, including those currently done at night. The advantage is that instead of recalculating everything, the server application will process the updated/inserted data only. Thus, higher performance can be achieved.
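A sketch of this incremental approach is given below, assuming a hypothetical LastUpdated column on Table1 and a ProcessingLog table recording the previous run; neither object exists in the current databases.

-- Hypothetical sketch: recalculate only the rows changed since the last run
DECLARE @lastRun datetime;
SELECT @lastRun = MAX(RunDate) FROM ProcessingLog;

UPDATE Table1 SET
    R_Indicator = A_Indicator / B_Indicator * 100
WHERE LastUpdated > @lastRun;        -- touch only updated/inserted rows

INSERT INTO ProcessingLog (RunDate) VALUES (GETDATE());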
It is proposed to develop the new dataAdmin application on top of the RADAPI framework. In the beginning the application will not include any calculations and will be used for viewing time series and their definitions. It will require developing a grid directly bound to a table of time series figures.
dataAdmin should be based on the formula based calculation plug-in model.
The plug-in model should ensure the easy integration of the developed modules
into the new application.
The next step will be to develop an interface for developing plug-ins inside dataAdmin. The application will give access to this interface to a limited number of users possessing IT skills.
We should also consider the idea of merging the functions of the old dataAdmin, csvImport and, possibly, dbAdmin applications into the new dataAdmin application. If dbAdmin is included in the new application, the security policy should be reinforced to restrict access to the metadata part to database managers only.
ALGORITHM