Application Mapping on IBM i
Robert Cancilla
Chapter 1: Application Mapping on IBM i: Present and Future
There is a great deal of uncertainty about the future of IBM i, including mixed messages from IBM following the
consolidation of Systems i and p into the new Power Systems brand. Although IBM may have weakened the public
perception of the brand, the hardware and software still deliver what they always have—rock-solid reliability and
dependable applications.
So although the world has forgotten about the “AS/400” and green screens, there are still huge code bases written
over the past 10 to 40 years (RPG celebrated its 40th birthday in 2009) powering corporations of all sizes. The
investment this technology represents can’t simply be replaced with packaged ERP software or quickly rewritten
in a new language or framework. The fact that these systems are still running is a testament to the success of the
platform and its development ecosystem in general. This is a point that seems lost on the wider development and
business community. There is simply no other system that supports—in their original form—applications written
more than 40 years ago, without source-code modification.
The challenge for today's IBM i sites is how to retain sufficient development resources to maintain and develop the
applications as the number of active RPG people diminishes through promotions, retirement, and natural attrition.
There has to be a way of enabling new people to understand quickly and accurately the complexities and subtleties
of these sometimes vast systems, and of giving them the confidence to change and extend those systems, even
though they will never have developed anything like them themselves. This chapter describes this growing challenge
in some detail and also explains how new technologies and concepts are evolving to provide solutions and bolster
IBM i development.
A typical application on IBM i could be anything from a few thousand to many millions of lines of code, with all the
complexity, design inconsistencies, languages, syntaxes, and semantics that go with years of ongoing development.
Mission-critical applications consist of a great many physical files (or tables) and programs. The interdependencies
of program-to-file and file-to-program alone can easily reach hundreds of thousands. We’re not talking about the
abstracted or esoteric nature of individual pieces of technology here, but entire business systems.
As with any successful management system, the key is information about your systems. The level of detail and
availability of this information is another critical factor, which has already been proven in business by the success
of Enterprise Resource Planning (ERP) and business systems in general. The requirement is not a new one but is
becoming more universal as systems continue to grow and mature. A key question is how to manage the cost and
risk of maintaining and modernizing these systems.
Let’s examine how application mapping has become a core solution to the problem. Application mapping means
analyzing and extracting a database of information about the resources that constitute a business application system.
Making Informed Decisions
Mapping an entire application provides a baseline of information for all sorts of metrics and analysis. Counting
objects and source lines is generally the most common practice used for obtaining system-wide metrics. Many
companies carry out software project estimations and budgeting using only this type of information. To some
degree, the level of experience and technical knowledge of a manager and his staff might help in getting accurate
numbers, but more often than not, it’s mostly guesswork.
A slightly more advanced approach used with RPG or Cobol applications is to dig more deeply into the application
and count design elements within the programs themselves. These elements include
• files
• displays
• subfiles
• source lines
• subroutines
• called programs
• calling programs
By using a simple formula to allocate significance to the count of each element, you can categorize programs by
their respective counts into low, medium, and high complexities. This type of matrix-based assessment, which
Figure 1 shows, is still fairly crude but adds enough detail to make estimations and budgeting much more accurate
without too much additional effort.
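As a purely illustrative sketch, suppose those element counts have been collected into a table with one row per program (the table PGMSTATS, its columns, and the weights below are all assumptions, not part of any product); a weighted score and a simple CASE expression produce the low/medium/high banding:

-- Band programs into low/medium/high complexity from raw element counts.
-- PGMSTATS and its columns are hypothetical; the weights are arbitrary.
SELECT PGM, SCORE,
       CASE WHEN SCORE < 50  THEN 'LOW'
            WHEN SCORE < 150 THEN 'MEDIUM'
            ELSE 'HIGH'
       END AS COMPLEXITY
  FROM (SELECT PGM,
               (NBRFILES * 5) + (NBRSUBR * 3) + (NBRCALLS * 3) + (SRCLINES / 100) AS SCORE
          FROM PGMSTATS) T;

The weights would be tuned to reflect what your own projects have shown to drive effort.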
Another common practice is to take small
representative samples, such as those
selected for a proof-of-concept (POC), do
project estimations, and then extrapolate
this information in a simplistic linear way
across the entire system or for an entire
project. This method naturally relies upon
the assumption that design, style, and
syntax for the entire application are consistent with the samples used for the POC. The reality is that samples are most
often selected for POCs based on functionality rather than complexity. Sometimes the opposite is true, whereby the most
complex example is selected on the basis of “if it works for that, it’ll work for anything.”
Calculations that use comprehensive and accurate metrics data for an entire application, rather than data from a sample,
dramatically improve the reliability of time and cost estimates. Risk is not entirely removed, but plans, estimates, and
budgets can be more accurately quantified, audited, and even reused to measure the performance of a project or process.
Some more advanced techniques to measure application complexity are worth mentioning. If such techniques are
used over an application map, a number of very useful statistics and metrics can be calculated, including detailed
testing requirements and a “maintainability index” for entire systems or parts thereof.
Building Application Maps
As application knowledge is lost and not replaced, the cost of ownership of these large, complex IBM i applications
increases, and maintenance becomes more risky. The CL command Display Program References (DSPPGMREF)
provides information about how a program object relates to other objects in the system. Figure 2 shows an example
of DSPPGMREF’s output. The information is useful in determining how a program relates to other objects. It is
possible to extract this information and store it in a file, as Figure 3 shows, and then carry out searches on this file
during analysis work.
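As a minimal sketch of such a search, assume the DSPPGMREF output has been written to an outfile and is queried as a table named PGMREFS with columns PGM, REFOBJ, and REFTYPE (illustrative names only; the real outfile is built over IBM's QADSPPGM model file and uses its own field names):

-- Where-used search over the extracted DSPPGMREF data:
-- which programs reference the CUSTS file?
SELECT PGM
  FROM PGMREFS
 WHERE REFOBJ  = 'CUSTS'
   AND REFTYPE = '*FILE'
 ORDER BY PGM;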
A much more efficient way of
presenting the same information,
however, is to show it graphically.
Additional information, such
as the directional flow of data,
can be added to diagrams easily.
Systems design and architecture
is best served using diagrams.
Color coding within these
constructs is also important
because it helps people assimilate
structure and logically significant
information more quickly. A good
example of using a diagram for
more effective communication is
to use it to show where program
updates take place, for example,
by using the color pink, as
Figure 4 shows (SLMEN and
CUSTS are the two updated
tables in this program).
Embedding other important
textual information such as an
object’s text into or along with
diagrams is another way of
presenting information effectively
and efficiently. In Figure 4, you
see how graphical and textual
information combine to provide
rich information about the
program references. The diagram
also uses arrows to show the flow
of data between the program and
the other objects.
Tom DeMarco, the inventor of the data flow diagram concept, stated that what is critical is the flow of data through
a system. Application mapping information can be extended, as Figure 5 shows, to simultaneously include details
about individual variables associated with each of the referenced objects. In the case of a program-to-program
relationship, the method used to extract this level of precise variable detail is to scan the source code of the
programs and establish which entry parameters are used.
In a program-to-file relationship, the diagramming job is somewhat more tedious because you must look for
instances in which database fields and corresponding variables are used throughout the entire program. Also useful
is seeing where individual variables are updated as opposed to being used just as input. The diagram now presents a
rich set of information in a simple and intuitive way. The amount of work to extract and present this level of detail
in a diagram can quickly become prohibitive, so the task is therefore better suited to a tools-based approach rather
than to manual extraction.
Figure 6 shows a program-centric
diagram. The same diagram in
which the file is the central object
being referenced is also useful in
understanding and analyzing complex
applications. The same diagrammatic
concepts can be used: color coding
for updates, arrows for data flow,
and simultaneous display of detailed
variables. By using the same diagram
types for different types of objects
in this way, the same skills and
methods can be reused to twice the
effectiveness. Figure 5 shows how
additional information, such as related
logical files (displayed as database shapes), can be added and easily
recognized by using different shapes to depict different object types.
Application mapping and formal metric analysis were first attributed
to Thomas J. McCabe Sr. in 1976 and Maurice Howard Halstead in
1977 (see the sidebar, “Calculating Complexity.”)
Functionally Organizing an Application
Single-level information about an RPG or Cobol program is
obviously not enough to understand a business system’s design.
You need to be able to follow the logical flow downward through
the application. You can use the DSPPGMREF output to do this.
If you start at program A and see that it calls program B, you can
then look at the DSPPGMREF information for program B, and so
on. Additionally, you can deduce precisely in this structure where
and how data, print, and display files are being used in the call
stack, which is very useful for testing and finding bugs that produce
erroneous data.
For large, complicated systems, this can be a slow and tedious
process if done manually using the display output of DSPPGMREF.
Extracting all programs’ DSPPGMREF information out to a single
file makes it possible to recursively query the file to follow the calls
down successive levels, starting at a given program. This process can
then show the entire call stack or structure chart for all levels, starting at a given program or entry point.
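A sketch of that recursive query, reusing the hypothetical PGMREFS table described earlier (DB2 for i supports recursive common table expressions on recent releases; the entry-point program name ORDENT is made up):

-- Expand the call structure downward from a given entry point.
WITH callstack (caller, callee, lvl) AS (
    SELECT PGM, REFOBJ, 1
      FROM PGMREFS
     WHERE PGM = 'ORDENT' AND REFTYPE = '*PGM'
    UNION ALL
    SELECT r.PGM, r.REFOBJ, c.lvl + 1
      FROM PGMREFS r
      JOIN callstack c ON r.PGM = c.callee
     WHERE r.REFTYPE = '*PGM'
       AND c.lvl < 15                  -- guard against call loops
)
SELECT lvl, caller, callee
  FROM callstack
 ORDER BY lvl, caller, callee;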
A given program’s call stack or call structure can be represented much more effectively diagrammatically than
with any textual description alone. Quite often, these call stacks may go down as many as 15 levels from a single
starting point. Therefore, being able to display or hide details according to the information required at the time is
important, along with having search facilities built in to the application map that supports the diagrams.
As with other diagrams, color coding plays an important role in classifying objects in the stack by their general use,
such as update, display, input only, and so on. Figure 7 shows the structure of a program as seen graphically. Additional
information, such as what data files, displays, and
data areas are used by each object, can be added
to enrich the information provided.
This diagram alone, however, doesn’t tell
you where you are in relation to the overall
hierarchal structure of the application. You don’t
know whether the program is an entry point
into the system or is buried in the lower levels
of the application.
For better understanding of an entire system,
therefore, objects need to be organized
into functional groups or areas. This can
be achieved by using naming conventions,
provided that they exist and are consistent
across the application. The entry points into the
application need to be established. Sometimes a user menu system is useful for this but is not necessarily complete or
concise enough. One way to establish what programs are potential entry points is to determine each program’s call
index. If a program isn’t called anywhere but does call other programs, it can essentially be classed as an entry point
into the system. If a program is called, and in turn if it calls other programs itself, it’s not an entry point.
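Continuing with the same hypothetical PGMREFS table, the call-index test sketches out as a single query:

-- Candidate entry points: programs that call other programs
-- but are never themselves the target of a call.
SELECT DISTINCT PGM
  FROM PGMREFS
 WHERE REFTYPE = '*PGM'
   AND PGM NOT IN (SELECT REFOBJ
                     FROM PGMREFS
                    WHERE REFTYPE = '*PGM');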
A functional area can be mapped by selecting an entry point (or a group of them) and then using the underlying
application map to include all objects (everything, including programs, files, displays) in the call stack. Figure 8
shows a diagram of a series of entry points and their relative call stacks grouped as a functional area.
To more accurately describe an entire system’s architecture,
functional application areas might need to be grouped into
other functional application areas. These hierarchal application
areas can then be diagrammed, showing how they interrelate
with each other. This interrelation can be hierarchal but also
programmatic because some objects might be found in more than
one application area simultaneously.
Figure 9 is a diagram showing how application areas
interrelate. For the sake of clarity, the diagram includes only
those programmatic interrelations from entry-level objects.
The diagrams show how the accounting Main application area
has other (e.g., B, A1) application areas embedded in it. The
red lines show the programmatic links between objects within
the application. In this example, the level of interrelation
has been limited to programmatic links between entry-point
programs and programs they call in other application areas.
This is a good way of mapping business functional areas to
application architecture in a simple diagram.
Logical subdivisions of an entire application are also being
employed in other areas of application management. Some
of these include
• clear and concise allocation of responsibility for maintenance/support of a set of objects
• integration with source change management tools for check-in and check-out processes during development
• production of user documentation for support, training, and testing staff
Mapping Databases
An IBM i business application is primarily an application
written over a relational database. Therefore, no map of
an enterprise application would be complete without the
database architecture explicitly specified—not just the
physical specifications and attributes but the logical or
relational constraints, too.
With the possible exception of CA 2E systems, virtually
all RPG or Cobol applications running on IBM i have no
explicit relational data model or schema defined. This means
that millions of lines of RPG or Cobol code must be read in
order to recover an explicit version of the relational model.
What you need to know is what keys constitute these links or
relationships between physical files or tables in the database.
The first task is to produce a key-map of all the primary
keys and fields for all physical files, tables, logical files, access
paths, and views in the database. By using a simple algorithm
and looking at the DDS or DDL, you can often determine
whether foreign-key relationships exist between files.
Figure 10 shows a diagram of this simple algorithm
using the database definitions themselves.
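As a sketch of that algorithm, assume the key and field definitions have been extracted into a key-map table (KEYMAP here, with entirely hypothetical column names), flagging the fields that form each file's unique key:

-- Candidate foreign keys: a field that matches, by name, type, and length,
-- a primary-key field of another file.
SELECT p.FILENAME AS PARENT_FILE,
       c.FILENAME AS CHILD_FILE,
       p.FLDNAME  AS KEY_FIELD
  FROM KEYMAP p
  JOIN KEYMAP c
    ON c.FLDNAME  = p.FLDNAME
   AND c.FLDTYPE  = p.FLDTYPE
   AND c.FLDLEN   = p.FLDLEN
   AND c.FILENAME <> p.FILENAME
 WHERE p.PRIMARYKEY = 'Y';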
A more advanced and comprehensive approach for
determining foreign key relationships is to analyze the
program source code for the system. If you look at the
source code of a program and see that more than one
file/table is used, there’s a possibility that these files are
related by foreign key constraints. By finding instances
in the program in which one of the files is accessed for
any reason, and determining the keys used to do so,
you can then trace these variables back through the
code to keys in another file in the program. If at least one of the key fields matches in attribute and size a key
field of the other file and is also part of that file's unique identifier, there is a strong likelihood of a
relationship between these two files. By then looking at
the data using these key matches, you can test for the
truth of the relationship. By cycling through all the files
in the system one by one and testing for these matches
with each and every other file, you can establish all the
relationships.
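A sketch of that data test, using the CUSTS customer file from the earlier figures together with a hypothetical order header file ORDHDR whose CUSNO column is the candidate foreign key (both ORDHDR and the shared CUSNO key column are assumptions): a zero, or very small, orphan count supports the relationship.

-- Test a candidate relationship against the data: count child rows
-- whose key value has no matching parent record.
SELECT COUNT(*) AS ORPHANS
  FROM ORDHDR o
 WHERE NOT EXISTS (SELECT 1
                     FROM CUSTS c
                    WHERE c.CUSNO = o.CUSNO);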
This task is complicated generally by the fact that
the same field in different files will usually have
a different mnemonic name. When analyzing
the program source, you’ll have to deal with data
structures, renames, prefixes, and multiple variables. If
you have the program variable mapping information
at your fingertips beforehand, the analysis process
will be a lot quicker. The vast majority of this type
of repetitive but structured analysis can be handled
programmatically and thus enable completion of the
task in a few hours rather than several months. Such
automation naturally allows for keeping the relational
model current at all times without huge overhead on
resources.
Once explicitly defined, the relational model or
architecture of the database can be reused in a number
of scenarios, including
• understanding application architecture
• testing data quality for referential integrity
• extracting test data
• scrambling and aging test data
• building BI applications and data warehouses
• mapping data for system migrations
• building object relational maps for modernization
Database access in all modern languages today is
primarily driven by embedded SQL. IBM i legacy
databases are typified by transaction-based table design
with many columns and foreign key joins. This makes
the task of writing SQL statements much more difficult
and error prone unless the design of the database is
clearly understood. It also creates an environment in
which it’s relatively easy for inexperienced developers
or users to write I/O routines or reports that have
an extremely negative performance impact. One way
to combat this problem is to provide detailed design
information about the database being accessed. Figure 11
shows a typical entity relationship diagram, and this can
be accompanied with the underlying foreign key details,
as Figure 12 shows.
Another, more generic approach to ensuring integrity of the database, guaranteeing productivity for modern
technology developers, and limiting negative I/O performance impacts is to build a framework of I/O modules as
stored procedures. The explicitly defined
data model is a key source of information
and will greatly simplify building of such a
framework and can even be used to automate
the generation of the framework itself.
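A minimal sketch of one generated I/O module as an SQL stored procedure, written over the CUSTS file used in earlier figures (the procedure name, key attributes, and CUSNAME column are assumptions):

-- A generated I/O module: fetch one customer row by its key.
CREATE PROCEDURE GET_CUSTOMER (
    IN  P_CUSNO   DECIMAL(7, 0),
    OUT P_CUSNAME CHAR(40)
)
LANGUAGE SQL
BEGIN
    SELECT CUSNAME
      INTO P_CUSNAME
      FROM CUSTS
     WHERE CUSNO = P_CUSNO;
END

Callers see only the procedure interface, so the underlying table can later be redesigned without touching them.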
It’s also worth mentioning that products such as IBM’s DB2 Web Query for i become far more useful and productive if
the metadata layer is properly implemented. The derived data model can be used to build this metadata for the entire
system almost instantly.
Hard-Coding Application Knowledge
The output of DSPPGMREF is a great starting point for the type of mapping I’ve described so far. To produce such
details and abstractions, the application source code needs to be read and analyzed.
From a design perspective, application software is made up of discrete layers or levels of detail. In an IBM i
application for example, libraries contain programs, physical files, logical files, data areas, commands, and many
more object types, and programs might contain file specs, variables, subroutines, procedures, display definitions,
arrays, and various other language constructs. Data files have fields and text descriptions and keys and other
attributes. Having an inventory of all these elements is useful—but only in a limited way, from a management
perspective. What’s needed is context. For example, mapping what files and displays are specified in a program
helps you understand at an object level the impact of change. This rudimentary mapping provided by most program
comprehension tools is limited in its usefulness because it still provides information at only a single level.
Mapping all levels of detail and how they interrelate with all other elements at all levels is the ultimate objective.
The only way to achieve this is to read the source code itself line-by-line and infer all relationships implicit in each
statement or specification. Naturally, the mapping process must allow for variants of RPG, Cobol, and CL going
back 20 years, if it is to be useful for the vast number of companies that have code written 20 years ago in their
mix. Relatively few humans have such knowledge or skill and, as I’ve mentioned, few people could keep up with
the workload required for even the most modest of IBM i applications. Computer programs can be “taught” such
knowledge and retain it permanently. Such programs can also be reused as often as necessary to keep abreast of any
code changes that occur.
Prebuilding the application map and storing it in an open and accessible format, such as a spreadsheet in Google
Docs, is also an important aspect of the overall usefulness of such information. Figure 13 shows the output of a
DSPPGMREF uploaded into a Google Docs spreadsheet and being filtered. Having the map available provides for
any number of complex, system-wide abstractions or inquiries at acceptable speeds.
For a complete and accurate application map, you have to follow the trail of inferred references described in the
programs themselves. This is obviously a labor-intensive task made all the more difficult by common coding
practices, such as
• overriding the database field name in a CL program
• prefixing fields from a file being used in an RPG program
• moving values from database fields into program variables before passing them as parameters to called programs
• changing key field names between different database files
• passing the name of the program to be called as a parameter to a generic calling program rather than making a direct call
If the prebuilt application
map includes all these
inferred logical references,
measurement of impact can be
complete and, more important,
instant. It also means that
higher-level analysis of rules
and model-type designs is
easier by virtue of the easy
availability of variable- and
object-level mapping.
Moving Forward with Confidence
Application mapping provides a new way to manage and modernize complex business applications. It’s also a way to
facilitate collaboration between modern and legacy developers. Think about what computerized mapping has done
for navigational and guidance systems in our day-to-day lives and travels. Similarly, application mapping provides
a strong platform for a number of benefits and technologies that will continue to evolve for many years. I’ll discuss
these subjects further in the following chapters.
Calculating Complexity
Halstead Complexity Metrics
Halstead complexity metrics were developed by the late Maurice Halstead as a means of determining a quantitative measure of a
module’s complexity directly from the operators and operands in its source code. Among the earliest formal software metrics,
they are strong indicators of code complexity because they analyze actual source code, and they are most often used as
maintenance metrics.
See en.wikipedia.org/wiki/Halstead_complexity_measures for more information.
Cyclomatic Complexity
Cyclomatic complexity is a software metric (measurement) developed by Thomas McCabe. It measures the amount of decision logic
in a single software module. Cyclomatic complexity is used for two related purposes. First, it gives the number of recommended tests
for software. Second, it is used during all phases of the software life cycle, beginning with design, to keep software reliable, testable,
and manageable. Cyclomatic complexity is based entirely on the structure of software’s control flow graph.
See en.wikipedia.org/wiki/Cyclomatic_complexity for more information.
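For reference, the standard formulations behind these metrics are shown below, where n1 and n2 are the counts of distinct operators and operands, N1 and N2 their total occurrences, and, for cyclomatic complexity, e and n are the edges and nodes of the control flow graph and p its connected components:

Program vocabulary:    n = n1 + n2
Program length:        N = N1 + N2
Halstead volume:       V = N x log2(n)
Halstead difficulty:   D = (n1 / 2) x (N2 / n2)
Halstead effort:       E = D x V

Cyclomatic complexity: M = e - n + 2p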
Chapter 2: Writing Programs to Update Your Programs
As a follow on to Chapter 1, let’s look at how you can use application mapping to actively change an entire system
programmatically. Using the application map as the primary input, applying some simple reengineering concepts, and
investing a fair amount of time to perfect the process, you can write programs that update application programs. This
approach has saved many companies literally thousands of man-hours and millions of dollars.
The writing of programs to update your programs is typically used as a way to make structural changes to the
application source, not functional changes. When a system enhancement produces a large number of fairly simple
system-wide changes, programmatic automation of these changes begins to make sense. The most obvious example
of this is Y2K. Some companies spent as much as five million dollars to change their systems for Y2K compliance.
Some companies used programs to carry out the same amount of work on similar-sized systems for five percent of
the cost. How did they do that, and why is this relevant nearly 10 years later?
After an application’s life of 20 to 30 years, it’s fairly safe to assume that there might be a business demand to change
important and well-used fields in the database. This demand might be driven by industry standardization, system
integration, upgrades, internationalization, or commercial growth (e.g., you run out of invoice numbers or even
customer numbers).
Y2K affected almost every RPG application in existence. It also affected just about the entire application in each
case. Since 2000, most systems have grown at a rate of 10 percent per year. It’s a widely acknowledged fact that RPG
resources haven’t kept pace with this growth. In reality, they’ve probably reduced by the same amount each year. So
although database changes are now generally industry or company specific, the problems and their related solutions
remain the same—but with more code affected and fewer people to fix it.
There are several applications for automated reengineering of a system, which I briefly mention later in this chapter.
Solving a field-expansion problem is, however, relevant for many companies, so I use it to flesh out the subject of
this chapter in more detail.
An Engineered Approach
A more conventional approach to solving a field-expansion problem is to get a feel for the scope and size
of the problem, understand clearly the requirements
for the change, and then send one or many developers
off to fix the problem one program at a time. Figure 1
illustrates this manual approach.
Many problems are associated with this approach. Here
are a few:
• labor-intensive
• vague (at best) scope and timelines
• not repeatable
• prone to human error and inconsistencies, and therefore risky
The upside is of course that such an approach
requires little preparation and little initial investment
in time or money. It’s also generally flexible and
therefore useful for small projects. There is, however, a risk that the developers will fail to identify all required
changes or to apply them consistently across every program they change. As the size of the system
increases, the risk of failure increases exponentially.
The basis of an engineered approach is to break
down the process into a set of discrete, repeatable,
and automated steps. Each step is then applied
across the entire system or project and repeated
until an optimum result is achieved. Figure 2 shows
diagrammatically how this approach compares to a
conventional manual approach.
Many benefits are associated with a structured and
engineered approach. Some of these include:
• each step is repeatable and so can be perfected
• outcome is more predictable
• scope and approach can be changed without a loss of expended effort
• latest code version can be introduced at the last minute
• far fewer resources are required
• process is much quicker
• potentially less testing is required because changes are consistent
Without an explicit, detailed, and very precise measurement of the impact of a database change across the system,
automating the required changes would be impossible. Let’s start by looking at this task in more detail.
Establishing the Precise Scope of a Task
Even in well-designed and well-documented systems, the impact of changing the database of an integrated and
complex application on IBM i can be huge. Just the recompilation and data copying tasks can create logistical
nightmares. The most significant and difficult task is of course measuring the impact on source code across the
entire system. If analysis is done right, subsequent work will be highly predictable and measurable. If analysis is done
incorrectly, the results could be catastrophic. Overruns in project timelines are just one possible impact, and I don’t
think I need to elucidate the potential outcome of having “missed” something in a production system.
Specifying fields to be changed. The first task in the analysis stage is to specify which fields need changing in
the database. This task should be straightforward but may be complicated by virtue of integrated systems, poor
documentation, or often a combination of both. The next step, which I describe in a moment, may actually produce
results that warrant additional fields being added and included in the process.
Finding where fields are used. The next step is to establish precisely where these fields are used throughout
the system. This is where things start to get tricky. Establishing the explicit use of a given field by its name can
be achieved with a simple Find String Using PDM (FNDSTRPDM) command. You then need to start at these
specific points and establish where these fields are associated with any other variable or data construct, by virtue of a
compute or definition statement. There’s only one way to do this, and that’s to read the source code of every single
instance in which the field being changed is used. RPG applications have many technical constructs that make this
type of analysis complex and time consuming. For example:
• the use of variable names that don’t match or resemble database field names
• the use of the Prefix or Rename keywords in the programs
• the need to trace input and calling parameters
• the existence of CL programs that have no file definitions
• the use of data structures and arrays
• undefined input and return parameters in procedure prototypes
Legacy cross-reference tools can help with this analysis up to a point. That point ends at each level or instance of
a variable. So many individual queries—sometimes thousands—need to be run and amalgamated when using these
older technologies. Figure 3 shows a simple example of conventional approaches being used to trace the CUSNO field.
The obvious answer to this
problem is to prebuild an
application map of the entire
system being analyzed,
where variable-field-variable
associations are instantly
available. Using this map,
you can write a program that
traces a field throughout all
its iterations and variants
across the entire system in
a single query. Some of the
trace work is accomplished in
a previous stage in the form
of prebuilding the application
map. Let’s look at an example
of this at work.
Figure 4 shows the source of
a CLP named CUSLET. If I
were to carry out a traditional
analysis on a system with this
program in it, looking for the impact of a change to
the field CUSNO, this program wouldn’t show in
the results.
Figure 5, however, shows a snippet of the source
of an RPG program that calls CLP CUSLET
passing the parameter CUSNO. Figure 6 shows the
spreadsheet of the results of our extraction program
written over the application map, and we can see
that CUSLET has been included in the analysis
results. This is because the
parameter CUSNO was passed to
CUSLET from the RPG program
displayed in Figure 5.
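A sketch of the single-query trace behind these results, assuming the application map stores field-to-variable and variable-to-variable associations in a table called VARLINKS (all names here are hypothetical), where each row records that the value of FROMNAME feeds or defines TONAME in program PGM:

-- Trace every variable reached, directly or indirectly, from the
-- database field CUSNO anywhere in the system.
WITH trace (pgm, varname, lvl) AS (
    SELECT PGM, TONAME, 1
      FROM VARLINKS
     WHERE FROMNAME = 'CUSNO'
    UNION ALL
    SELECT v.PGM, v.TONAME, t.lvl + 1
      FROM VARLINKS v
      JOIN trace t ON v.FROMNAME = t.varname
     WHERE t.lvl < 20                 -- guard against circular references
)
SELECT DISTINCT pgm, varname
  FROM trace
 ORDER BY pgm, varname;

A production version would also carry the program and parameter-position context, so associations are followed only where a real data path (an assignment or a parameter pass) exists.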
The output of this analysis is a
specific list of all source members
and lines therein that are affected
by the proposed field changes.
Making the required changes
programmatically. Changes
that can be made without causing
any conflicts can be done programmatically. The percentage of these against the total changes required may vary from
project to project, but essentially this task can be fully automated with a carefully written program. The tedious and time-consuming part of writing a program to do this is accounting for all instances or specific types of change. Nevertheless,
these programmatic changes can provide a significant productivity gain in any project. There are different standards that
can be used to notate and make the changes, such as making comments in margins, commenting out replaced code, or just
overwriting existing code. This can be done one way during iterative trial conversions and then changed for a production
conversion with little effort.
Note: It may be desirable to retain the original code as comments during the project but remove it prior to the final
production implementation.
These programmatic changes can be categorized into two types:
Direct Definition Changes: Direct definition changes can be made where database fields or variables that can be
traced back to database fields are defined. This includes files, displays, reports, and programs (RPG or CL) and
refers to D-specs, arrays, and in-line calc specs, amongst others. This type of change is straightforward and is
the most obvious candidate for programmatic change. Figure 7 shows the source of a physical file that has been
programmatically updated and has had the original code commented out. Columns 1–5 have had the programmer’s
name added for audit purposes.
Indirect Definition Changes: In some cases, direct definition changes have a “knock-on” effect. For example, if a field
is expanded by two digits, and this field is used before the end of an internal data structure in an RPG program, the
other elements in the data structure must be adjusted to accommodate this change. Similarly in a print file format,
a column-size increase may require columns to the right to be shifted to make space. In some cases this “knock-on”
effect may actually cause conflicts of various types. These conflicts might be resolved by using clever algorithms in
the programs that make the changes, but usually conflicts require human intervention. Figure 8 shows an example
of how the data structure definition is adjusted, the second element is expanded, and subsequent elements are
moved to accommodate
this change. This
type of change is
fairly straightforward
to program into the
automated process. The
time-consuming part
is finding and allowing
for all different types
of patterns of instances in a system. As
such, the repeated use and fine-tuning of
programs that make changes to programs
makes them naturally more useful with
each successive project.
Managing design conflicts and manual
intervention. In virtually every field-expansion project, there will be design
problems that arise from the proposed
changes. These might vary from a
simple overlay or overflow on a report
to embedded business logic based on
a field substring. Although it may be
impossible to automatically make changes
to these constructs, it’s possible to
programmatically identify where they occur. Again, the role of the prebuilt application map is critical to this process
as a primary input to the search algorithms. These conflicts can be clearly identified by subtracting the changes
made programmatically from the total required changes. These conflicts can be generally categorized as follows:
Device Problems: Device problems are those in which any direct change or shuffling of affected columns runs out of space.
Program Problems: An example of a program problem is lines where there may be a conversion problem because a
resized field (or a field dependent upon it) is a subfield in a structure that can’t be resized. Another example is when
a work field is used in a program by two fields at various stages. One field is being resized, and the other isn’t. Again
this requires design logic to resolve.
Database Problems: The whole process of solving a field-expansion problem starts by specifying which fields will be
changed. The where-used analysis, when run on resized database fields, might trace to fields not included in the
resize exercise. This may or may not be a problem but generally must be assessed manually.
Some of these problems might be resolved by making some manual changes before rerunning the analysis and
programmatic changes. In certain cases, this iterative process can dramatically reduce the number of remaining problems
in a conversion project. In other cases, it will be necessary to make these design decisions and changes after the
completion of the programmatic changes. The objective of this stage is an optimum result combining programmatic
changes with whatever manual intervention is deemed necessary.
Final Conversion and Production Integration
The automated nature of this process allows for the latest version of the source code to be brought in and run
through the first three stages. It’s also only at this stage that formal software configuration management (SCM)
policies and procedures need to be implemented.
In many cases, no conversion or change will take place, but a recompile will be needed. Again the application
map can be used to good effect here. Simply building recompile lists based on the converted source code and
all related objects from the where-used information will help ensure that nothing is missed. It also means that
simple CL programs can be written to bulk recompile and incorporate any compilation strings in the compile
commands.
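As a sketch, still using the hypothetical PGMREFS copy of the DSPPGMREF output described in Chapter 1, the recompile list is one where-used query (the file names are illustrative):

-- Recompile list: every program that references one of the converted files.
SELECT DISTINCT PGM
  FROM PGMREFS
 WHERE REFTYPE = '*FILE'
   AND REFOBJ IN ('CUSTS', 'ORDHDR')
 ORDER BY PGM;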
Application Modernization
Structural changes to an application can be a key part of a company’s modernization strategy. Some of these
structural changes are motivated by more strategic objectives, such as agile development, reusable architecture,
and functional redesign. Other modernization projects are driven by more commercial demands, such as
internationalization.
Unicode conversions. An increasingly popular modernization requirement on IBM i is Unicode conversion. The
principle of a Unicode conversion is largely the same as that of a field-expansion project: changing the attributes of
database and display fields and updating all affected logic in the programs. There are some differences in the process
and requirements, but the same approach can generally be followed. Indeed the same programs used for field
expansion can be enhanced to accommodate Unicode conversions without too much work involved.
Let’s look at some simple examples of what could be changed programmatically with a Unicode conversion. The
first aspect is updating the fields in the files and displays. This sort of change is consistent with the field-expansion
algorithms mentioned earlier in this chapter. Figure 9 shows how the field definition for the COMPANY field has
been updated to a type G, and the desired Unicode Coded Character Set Identifier (CCSID) has been specified in
the function column for this field.
Figure 10 shows how the H-spec
of an RPGLE program has been
automatically updated with the
requisite CCSID code. In this
instance, the CCSID H-spec
keyword is used to set the default
UCS-2 CCSID for RPGLE
modules and programs. These
defaults are used for literals,
compile-time data, program-described input and output fields,
and data definitions that don’t
have the CCSID keyword coded.
Figure 11 shows how, by using a
fairly straightforward algorithm,
your automated program can
intervene in your C-specs and
automatically convert statements
to include the %UCS built-in function (BIF) where required. In this example, as with the field-expansion samples,
old lines have been commented out to show how the programmatically created new line has been changed.
There are two important points to make regarding Unicode conversions:
• Unicode data isn’t supported in non-ILE versions of RPG. If you want to implement Unicode support in non-ILE RPG programs, you must convert them to RPG IV (ILE RPG) source code and recompile beforehand.
• IBM is actively enhancing Unicode support on IBM i through the release of PTFs, both for the DB2 for i database and for the ILE RPG compiler.
Externalizing database I/O. Another increasing trend in the IBM i application space is the need to separate out I/O
logic from legacy programs. One primary motivation for this trend is the necessity for making significant changes to
database architecture without interrupting proven process and business logic.
Another business driver for this trend is from companies replacing legacy custom software with off-the-shelf
applications but wanting to keep certain core functions running as is, at least for a period of time. In this scenario,
mapping to the replacement database architecture can be carried out without interruption to critical legacy
functions, provided of course that the database I/O has been externalized from the legacy programs first.
The algorithms used by programs that would automatically make such a change would be different from a field-expansion process, but once again the core asset here would be the application map for the initial analysis. These
reengineering programs can then be designed to identify and convert all source code instructions needed to transfer
file I/O into external modules, giving identical functionality.
The code in Figure 12 shows how an I/O statement is replaced with a procedure call.
Another requirement of the
reengineering programs is
to automatically build fully
functional I/O modules, which
can then be adapted to a
radically changed database, with
no impact on the reengineered
RPG code—the module returns
a buffer identical to the original
file. So if you wanted to switch
to a completely new customer file, you could
simply change the I/O module code (as shown
in Figure 13), and the hundreds of RPG
programs using the CUSTS file would require
no source changes whatsoever!
Refactoring monolithic code into services.
Another important way of using programs
to update programs is in the area of building
services from legacy application code.
There are many articles and guidelines
from leading thinkers, such as Jon Paris,
Susan Gantner, and others, on the subject of
using subprocedures over subroutines. This is fine for new applications, but most interactive legacy programs are
written in a monolithic style, which can severely limit long-term modernization opportunities, not to mention add
significant stress and complexity to ongoing maintenance and development tasks in general.
By advancing the algorithms of replacement and code regeneration described in all three areas here, it’s possible
to refactor monolithic programs by externalizing the subroutines into procedures automatically. Breaking up the
program into two components like this makes the rewrite of the user interface layer easier but simultaneously
makes available externalized subprocedures as callable services. This is a great way to start on a staged application
reengineering while realizing immediate benefit.
Figure 14 shows two subroutines, VALID1 and
VALID2, being invoked in a monolithic legacy
program called WWCUSTS. A “reworked”
program was written, using similar logic to the
field-expansion and I/O externalization programs
mentioned earlier, to create a new Business
Logic module that would contain procedures
created from all the legacy subroutines in the
original programs. Figure 15 shows the definition
for the wwcustsvalid1 procedure in this new ILE module
WWCUSTSB.
The reworked program updated the
original program to use the service program
WWCUSTSB invoking the appropriate
procedure as opposed to the subroutine and passing
the correct parameters. The reworked program
also created the necessary prototypes in the
updated WWCUSTS program, as Figure 16
shows.
A Way Forward
Using programs to update programs is not a
new or even an unusual technique. Combined
with a very detailed application map of an entire
system, this approach to system engineering
can help solve the problem of modernizing and
enhancing large and complex legacy applications
using limited resources in shorter timeframes.
For many companies, this approach has saved
millions of dollars in development costs and
has also provided a means to bring legacy
application code into the world of modern
architectures and techniques.
In the next chapter, I look at how to extract
design model assets from legacy systems. I cover areas such as relational data models and business rules and how
these make legacy applications relevant in a modern context.
Chapter 3: Auditing Legacy Application Assets
Chances are that you own, support, develop, test, or use a large, complicated application written in RPG, Cobol, or
CA 2E on IBM i. You have a vested interest in the designs and assets that make the application useful to its users.
In the two previous chapters, I looked at how application maps were built, how they were used to reveal granular
architecture and function, and how they were deployed to programmatically reengineer those applications. In
this chapter, I look at a higher abstraction of an application’s design and how this can be used to extend ROI in
modernization for many years.
Architectural Erosion
When business applications are first designed and written, well-thought-out application architecture contributes to
their success and resulting life span. Nothing demonstrates this more conclusively than the success and longevity
over the last 40 years of thousands of IBM i applications, many of which are still in daily use. Continual
enhancement, variations in syntax and programming style, and general maintenance, along with time and budget pressures,
conspire to compromise application architecture. As with geological erosion, architectural erosion is often not
noticeable or problematic until many years have passed. In some cases, and given enough time, the quality, efficiency,
and maintainability of the application will begin to suffer from this natural evolution of the code base. This problem
will vary in significance from company to company and application to application. It’s not uncommon to hear of
cases in which years of continued enhancements to an IBM i application have rendered the application virtually
unmaintainable, especially when matched with delivery-time expectations of users and development budgets.
The Modern Technology Tease
The last decade has seen the introduction of many powerful enhancements to IBM i, the RPG/Cobol syntax/
compilers, and DB2 for i. The benefit of these enhancements has remained tantalizingly out of reach for most
of the current applications, for the simple reason that most of the current application code is written using
monolithic procedural methods. Integration with other systems, modernizing the user interface, implementing SOA
strategies—all expect a distributed application design, if the task is to be done in a sustainable and optimum way.
The task of rewriting these entire systems to take advantage of these modern technologies has, for most companies,
been too expensive and risky. The optimum approach is to establish what code is useful and relevant and therefore
should be rewritten or refactored into modern constructs and program designs. Even with application mapping
technologies, this is still a significant task on any complicated legacy application.
Optimizing an Application with the Business
Generally, there will be varying degrees of consensus about the relevance of a company’s legacy application,
depending on who you ask in the organization. The specific touch points between the application function and
the business process are rarely known in their entirety and even more rarely documented and, therefore, auditable.
It’s also not uncommon for applications to outlive users at companies by many years. If the original application
designers are no longer with the company, it stands to reason that potentially large parts of the application design
assets are known only by the application itself.
With the application designs explicitly defined, documented, and ready to hand, analysts and architects can map
these designs to business architecture and process accordingly. This can also form the basis of subsequent renewal
or replacement strategies, if applicable. In addition to this, a number of technological project types become feasible
even with limited resources. I examine some of those in detail a bit later in the chapter.
Let’s now look at the two most important design assets: the referential integrity (RI) data model and the business logic.
Deriving a Referential Integrity Data Model
The most fundamental design asset of an IBM i application is the data model. The data model of an application
is not just the design of the files, tables, views, and access paths but includes the foreign key relationships or RI
between database tables.
The simplest definition of RI is that it defines a
relationship between two files in which one is the
parent and one is the dependent or child. Records
in the dependent file are joined by a unique key in
the parent file. For example, the contract header file
is a dependent file to the customer master file, and
records in the contract header file must always contain
a valid customer number. Figure 1 shows a diagram
displaying the detail of this example RI.
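Expressed as SQL DDL, the recovered relationship in this example would look something like the following sketch (the contract header file name CONTHDR and its key column are assumptions; CUSTS stands in for the customer master):

-- Make the recovered relationship explicit as a referential constraint:
-- every contract header row must carry a valid customer number.
ALTER TABLE CONTHDR
    ADD CONSTRAINT CONTHDR_CUST_FK
    FOREIGN KEY (CUSNO)
    REFERENCES CUSTS (CUSNO);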
For large and complex applications, this task needs to be
approached in a structured manner and can be broken
down into a few discrete steps. The first is to establish
the physical model by extracting the table, file, view,
and logical file definitions. This provides both a data
dictionary of the database and a map of all important
keys, such as primary or unique identifiers. Taking one
table at a time, the unique key of the file is compared
with all the keys of other files in turn. Where there is a
match between the primary key of the file and any key
or at least partial key of the other file, a relationship
can be derived. In most cases, the analysis is further
complicated by virtue of field names being different in
different files. In certain cases, the difference is simple,
such as the first two characters being different, whereas
in other cases, the names are completely different.
Figure 2 shows a simple example of how two files are
joined by a foreign key that is similar in name.
Even though DB2 for i has been capable of
implementing RI in the database itself since OS/400
version 3.1, virtually no applications use this approach,
even today. In the absence of referential constraints
on the tables, RI is managed by using program logic.
There’s nothing wrong with this as a practice, but to
use or visualize the referential data model of such an
application, the program source of the system must be
analyzed. This analysis serves to validate relationships
derived by analyzing the file and field structures,
and it can be used to derive relationships that would
otherwise not be obvious
at all, because of file and
field name differences.
Programs that use each
of the files are analyzed,
and fields and variables are
traced through the source
code looking for clues that
indicate a relationship
between the files.
Figure 3 shows a source
snippet of field SINIT
from the file CNTACS,
and the text description
gives us a clue that this
field means Salesperson.
Figure 4 shows the source
code of the file SLMEN
and clearly shows that
the key of the file is
PERSON. Figure 5 shows
a snippet of code from
an RPG program that is
using the field SINIT read in from the file CNTACS to read the file SLMEN.
This is validation that the SINIT field equals the PERSON field, and as such, the two files CNTACS and SLMEN
have a foreign key relationship. Admittedly, this is a very simple example, and I knew what I was looking for. In
even modest-sized IBM i applications, the task of analyzing an entire system is a tedious and time-consuming
one and requires fairly good analytical skills. In Chapter 1, I demonstrate how an application map can be used to
accelerate analysis tasks by providing mapping between variables and database fields for an entire system. Deriving
foreign keys by analyzing source code is a classic use of application mapping technology. It’s also possible to write
programs that use the application map to analyze the source code and look for clues and proof of foreign key
relationships between application files. On large applications, it’s virtually a prerequisite to do this sort of analysis
programmatically. The added benefit is that it’s very easy to keep up-to-date with a repeatable automated extraction
process.
Extracting Business Logic
Over a 20-year period, a company might invest tens of millions of dollars in adding, fine-tuning, and fixing the
business logic in the legacy code. Business Rule Extraction is the process of isolating the code segments directly
related to business processes. For example: ensuring that when a customer is added to the system, a valid telephone
number is provided by the user. Figure 6 shows a sample of RPG code used to do this.
The challenge has always been to identify, isolate, and reuse only those designs relevant in the new context in which
they’re desirable. The sheer volume of code, its complexity, and the general lack of resources to understand legacy
languages, and specifically RPG, represent a tragic potential waste of valuable business assets. The problem is that
in the vast majority of legacy RPG and Cobol programs, the business-rule logic is mixed in with screen handling,
database I/O, and flow
control. So harvesting
these business rules
from legacy applications
requires knowledge
of the application and
the language used to
implement it, both of
which are a steadily
diminishing resource.
Once harvested, these
rules need to be narrated
and indexed, thus providing crucial information for any analyst, architect, or developer charged with renewing or
maintaining the legacy application. Figure 7 shows the same piece of code as in Figure 6 but with some narrative
about the business logic added, along with an index on line 169.99.
Indexing the business logic in a systematic and structured way by providing reference to its source, the field
and file it refers to, and some form of logic-type classification provides some additional benefits. The first and
most obvious is that it provides a
mechanism to programmatically
extract and document the rules in
various ways. Figure 8 shows the
business logic of the preceding
program documented in a Microsoft
Word table.
The second benefit is the ability to filter cross-reference information about a field so that only the places where business logic is executed against it are shown. Figure 9 shows a spreadsheet of instances across a system where business logic has been applied to a field TELNO. Developers wanting to centralize business logic across the system can then put these features to use, making the analysis work required both faster and more accurate.
In addition to these uses for indexed business logic, it's possible to write programs that can extract the indexed logic to create web service modules, provide documentation for redevelopment in a modern language such as Java or C#, or even populate business-rule management systems, such as JBoss Drools or IBM's ILOG.
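To give a flavor of how such a program might look, here is a minimal sketch that filters a rule index for one field, much like the TELNO spreadsheet in Figure 9. The RuleIndexEntry shape and the sample entries are invented for illustration; a real rule index would be populated from the application map rather than hard-coded.

import java.util.List;

// Minimal sketch of filtering an indexed rule repository for one field.
// The RuleIndexEntry shape and sample entries are invented for illustration.
public class RuleIndexReport {
    record RuleIndexEntry(String program, String sourceLine,
                          String file, String field,
                          String logicType, String narrative) {}

    public static void main(String[] args) {
        List<RuleIndexEntry> index = List.of(
            new RuleIndexEntry("CUST001", "169.99", "CNTACS", "TELNO",
                "VALIDATION", "Telephone number must be present and numeric"),
            new RuleIndexEntry("ORD010", "082.00", "ORDHDR", "ORDVAL",
                "CALCULATION", "Order value = sum of line values less discount"));

        String target = "TELNO";   // field to report on
        index.stream()
             .filter(e -> e.field().equals(target))
             .forEach(e -> System.out.printf("%s %s %s.%s [%s] %s%n",
                 e.program(), e.sourceLine(), e.file(), e.field(),
                 e.logicType(), e.narrative()));
    }
}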
Unlocking the Power of the Database
As I mentioned earlier, few IBM i application databases have any form of data model or schema explicitly defined. As many companies have discovered, this can significantly hinder development initiatives that use direct access to the application data. Here are four of the most obvious areas that benefit from an explicitly defined data model:
Using modern input/output methods in programs. One of the key aspects of RPG is its native I/O access. The terse and simple syntax for this also gives RPG significant development productivity over other languages. In modern languages such as Java and C#, the most common practice is to handle database I/O by using embedded SQL statements. For simple, single-table reads, most developers can create SQL statements that meet this requirement. Things get more complicated when tables and files must be joined for reads, updates, or deletes. In this context, most developers need to understand the data model and must have access to foreign key relationship information so they can build the SQL join statements correctly. Figure 10 shows an entity relationship or data model diagram extracted from a legacy RPG application, which is a common way to visualize how files/tables in the application database relate to each other. Figure 11 in turn shows a spreadsheet with all the foreign keys that determine the relationships shown in Figure 10.

An explicitly defined application data model described in DDL can be imported directly into persistence frameworks such as Hibernate for Java and NHibernate for .NET. These open-source object relational mapping (ORM) solutions let Java and C# developers access DB2 for i databases without needing to know the architecture of the database or JDBC or ODBC technologies, thus greatly simplifying their work. DDL can also be imported by all popular application modeling tools, such as Rational Software Modeler, Borland Together, and Eclipse. Microsoft Visual Studio also allows the import of DDL for building Data Projects.
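As a small illustration of the point about joins, here is what a Java program might look like when a recovered foreign key between an order header and a customer master file is used to build the SQL directly. The file and column names (ORDHDR, CUSMAS, CUSNO, and so on) and the connection details are invented for this example.

import java.sql.*;

// Minimal sketch: read two related files with an SQL join built from
// recovered foreign-key information. Table, column, and connection
// details are invented for illustration.
public class OrderEnquiry {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:as400://host/APPLIB";   // placeholder connection string
        String sql =
            "SELECT o.ordno, o.orddat, c.cusnam " +
            "FROM ordhdr o " +
            "JOIN cusmas c ON o.cusno = c.cusno " +   // join rule comes from the recovered foreign key
            "WHERE o.orddat >= ?";
        try (Connection con = DriverManager.getConnection(url, "user", "pwd");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "2009-01-01");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s %s %s%n",
                        rs.getString("ordno"), rs.getString("orddat"), rs.getString("cusnam"));
                }
            }
        }
    }
}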
Data quality analysis—referential integrity testing. Over many years of application use, enhancement, upgrades, and fixes, it's only natural that referential integrity (RI) will suffer. This doesn't apply to systems that implement RI in the database itself, but, as I mentioned, very few if any IBM i applications use this facility. With an explicitly defined data model of an IBM i application database, database records can be tested for referential integrity programmatically, producing a report of orphaned records as an output. Simply explained, the program starts at the bottom level of the hierarchical data model—in other words, those files/tables that have no children/dependents—and looks for corresponding records in owning files/tables by using the foreign keys provided by the data model. It carries out this test for every file/table in the application, one after the other.
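A minimal sketch of such a check for a couple of foreign keys follows. In practice the list of keys would come from the recovered data model rather than being hard-coded, and the table and column names here are invented for illustration.

import java.sql.*;

// Minimal sketch: report orphaned child records for each foreign key.
// In practice this would be driven by the full recovered data model.
// Table, column, and connection details are invented for illustration.
public class OrphanCheck {
    static void reportOrphans(Connection con, String childTable, String childCol,
                              String parentTable, String parentCol) throws SQLException {
        String sql = "SELECT COUNT(*) FROM " + childTable + " c " +
                     "WHERE NOT EXISTS (SELECT 1 FROM " + parentTable + " p " +
                     "                  WHERE p." + parentCol + " = c." + childCol + ")";
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            System.out.printf("%s.%s -> %s.%s : %d orphaned record(s)%n",
                childTable, childCol, parentTable, parentCol, rs.getLong(1));
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:as400://host/APPLIB", "user", "pwd")) {   // placeholder connection
            reportOrphans(con, "ORDHDR", "CUSNO", "CUSMAS", "CUSNO");
            reportOrphans(con, "ORDLIN", "ORDNO", "ORDHDR", "ORDNO");
        }
    }
}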
Automated test data extraction. The need for accurate and representative test data is a requirement as old as application development. Most companies use copied production data to fulfill this need. There are a few problems associated with this approach, such as the need to keep test data current, the length of time required to copy production data, the disk space requirements, and the length of time for testing over complete data sets versus limited data. Another well-used method for creating test data is to use simple copy statements in the OS for the required files only. This approach works fine but is still labor intensive and can be error prone. An increasingly popular approach is to select specific master or control records from the production database and then write a program to copy the related records from the other files, using the foreign keys provided by the explicitly defined data model. This approach produces small and current test data quickly and with guaranteed RI. It's often used by ISVs and support organizations to assist with customer testing of changes and enhancements of base systems.
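The sketch below shows roughly how such an extraction program might work: seed a test library with a few selected master records, then follow the recovered foreign keys from parent to child, copying only the related rows. The library, table, and column names are invented for this example, and the keys are hard-coded where a real program would read them from the data model.

import java.sql.*;
import java.util.List;

// Minimal sketch: copy selected master records plus their related child
// records into a test library, following recovered foreign keys.
// Library, table, and column names are invented for illustration.
public class TestDataExtract {
    record ForeignKey(String childTable, String childCol,
                      String parentTable, String parentCol) {}

    public static void main(String[] args) throws SQLException {
        // Parents must appear before their children in this list.
        List<ForeignKey> keys = List.of(
            new ForeignKey("ORDHDR", "CUSNO", "CUSMAS", "CUSNO"),
            new ForeignKey("ORDLIN", "ORDNO", "ORDHDR", "ORDNO"));

        try (Connection con = DriverManager.getConnection(
                 "jdbc:as400://host/PRODLIB", "user", "pwd");   // placeholder connection
             Statement st = con.createStatement()) {

            // Seed the extract with a handful of master records.
            st.executeUpdate("INSERT INTO testlib.cusmas " +
                "SELECT * FROM prodlib.cusmas WHERE cusno IN ('C001','C002')");

            // Pull each dependent table's rows that belong to records already extracted.
            for (ForeignKey fk : keys) {
                String sql = "INSERT INTO testlib." + fk.childTable() +
                    " SELECT c.* FROM prodlib." + fk.childTable() + " c" +
                    " WHERE c." + fk.childCol() + " IN (SELECT p." + fk.parentCol() +
                    " FROM testlib." + fk.parentTable() + " p)";
                int n = st.executeUpdate(sql);
                System.out.printf("Copied %d rows into testlib.%s%n", n, fk.childTable());
            }
        }
    }
}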
Building BI applications or data warehouses. There has been a big push over the last couple of years in the area of business intelligence (BI). By using the data in the application database more and more effectively, companies expect to attain and sustain a real competitive edge. The technologies available to facilitate this aren't new or even that complicated for the most part. They all have a fundamental requirement for use: that the application database design be defined or described to them in detail. This requirement is not in and of itself a problem, but when you consider that virtually no IBM i legacy applications have an explicitly described relational data model design, it can become a problem for IBM i users. With access to an explicitly defined data model of the legacy application, these tools can be used much more productively and help provide increased and ongoing ROI from the legacy application, even with relatively small development and support teams.
A good example of this situation is with IBM’s DB2 Web Query for i. A great tool and natural successor to IBM’s
Query/400 product, it comes with many powerful BI-type features. To really get the full benefit of all of this
rich functionality in DB2 Web Query, you need to populate the meta-data repository with the DB2 application
database design, including the foreign key information. The explicitly defined database supplies this information
and can be used to create an entire meta-data layer in DB2 Web Query. Figure 12 shows an example of the source
and model views of the meta-data of IBM’s DB2 Web Query. The model view shows how a file is joined to the
other files in the database. This meta-data was created automatically from an explicit data model derived from an
IBM i DB2 database.
Reducing Risk and Maximizing ROI
IT departments throughout the world are struggling to balance day-to-day support with a backlog of new user
requirements—often operating under severe headcount and cost restrictions. Compliance and regulatory pressures
have increased over the last 10 years or so, making large “build it from scratch” projects too risky to contemplate.
From this unpromising start point, it’s possible to make headway against the storm of conflicting demands by using
the proven business processes contained in your existing systems.
If we can extract the business logic and data model, a world of new possibilities opens up, and a layer of risk and uncertainty around potential projects is reduced, because you have access to what your systems really do, as opposed to what people think they do or what the outdated documentation says they do.
If the business-logic and data-model extraction processes can be automated, it follows that many project types
become feasible with limited resources. These can range from quick solutions, such as making an order enquiry
process available as a web service so customers can integrate it into their own processes, to longer-term solutions,
such as full ERP systems or other large-scale systems. Recovered business rules can be recycled and reused in
Business Rule and Process Management systems, such as JBoss Drools, or workflow systems. Automated business-logic and data-model extraction processes also serve well in smaller-scale developments, such as using the business
rules and processes of a CRM system to build a new Java-based web application—offering peace of mind in the
knowledge that the processes contained in the application are proven, and accordingly reducing the risk and
development time.
Ultimately, this process translates to large-scale reuse precisely because it can be taken step by step in manageable
chunks and delivers working prototypes quickly from the recovered business rules and model; there’s no need for
development to be done in a vacuum, starved of the oxygen of user feedback. The next chapter covers design recovery and application renewal using extracted logic in more detail. Application development and modernization is all about designs—not pure code philosophy.
From a compliance or audit point of view, compare the confidence levels of two IT directors: One IT director
has used an automated business-rule and data-model recovery system to reuse and rebuild systems, and the other
IT director has used a conventional approach of manually designing and building new functionality to match the
functionality of an existing system by looking solely at the user requirements documentation and source code of the
existing systems. One IT director has an auditable and repeatable process to recover rules and processes and build
new systems. The other IT director is solely dependent on the skill and experience of the people building the new
system and on those people’s interpretation of the existing system’s scope and eccentricities.
Chapter 4: Modernizing Legacy Applications Using Design Recovery
The concept of reusing existing code or logic isn't a new one. The challenge has always been to identify, isolate, and reuse only those designs that remain relevant in the new context in which they're needed. In the case of IBM i, the
sheer volume of code, its complexity, and the general lack of resources to understand legacy languages, specifically
RPG, represent a tragic potential waste of valuable business assets for hundreds of thousands of companies. In many
cases, these expensive and well-established legacy designs have little chance of even having their relevance assessed,
let alone being reused.
To fully understand and appreciate the problem domain, consider for a moment two common approaches to it: screen scraping and code conversion. Simply screen scraping the user interface with a GUI or
web emulation product doesn’t improve the situation. The application may appear slightly more “modern,” but the
cosmetic changes still leave it with all the same maintenance and enhancement problems, and it may not be much
easier for new users to use. The same applies to building web services around wrapper programs written to interpret
the interactive data stream from 5250 applications.
Another common approach is code conversion—line-by-line syntax conversion of a legacy application. This
approach typically transfers the same problems from one environment/language to another. Indeed, it often
produces source code that is less maintainable, canceling out the benefit of using modern technologies and
architectures in the first place. Syntax conversions are still being done by some companies and are often promoted
by vendors of proprietary development tools for obvious reasons. This approach has never to my knowledge
produced an optimum long-term result, despite many attempts over the last two decades.
The objective, therefore, in a true modernization project is to extract the essence or design from the legacy
application and reuse these designs as appropriate in rebuilding the application, using modern languages,
development tools, and techniques, and tapping into more widely available skills and resources.
In the previous three chapters, I describe how to recover legacy application design assets in a structured and proven manner. In this chapter, I detail how to use these recovered designs to create a modern application.
Modern Application Architecture
Modern applications are implemented with a distributed architecture. A popular standard used for this architecture is Model-View-Controller (MVC). Figure 1 shows the architecture of a typical legacy application and the MVC architecture side by side. MVC allows for independent implementation and development of each layer and facilitates object-oriented (OO) techniques and code reusability rarely found in legacy applications. All these characteristics of a modern application radically improve the maintainability and agility of the application. Legacy applications do have these same elements, but they tend to be embedded and mixed up in large monolithic programs, with vast amounts of redundancy and duplication throughout.
Using MVC to implement an RPG application requires that the business logic be separate from the user interface and controller logic. Figure 2 shows a schematic of the code implementation in a typical modern application. This architecture can be implemented by using 5250 and pure RPG, but it's more common to use a web interface for the view and to write the controller logic in a modern language that supports web interfaces, such as Java, EGL, or C#. The optimum modernization result is to reduce dependency on legacy and proprietary languages as much as possible. To achieve this, recovered design assets are reused as input to redevelop the appropriate layer. Figure 3 shows an overview of the overall process of modernizing the legacy code by using the recovered designs.
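To make the layering concrete, here is a minimal, self-contained sketch of the separation MVC implies: the controller knows nothing about SQL, and the data-access class knows nothing about the user interface. All class and method names are invented for illustration, and the data access is a stand-in rather than real database I/O.

// Minimal sketch of MVC-style layering. Names are invented for illustration.
public class MvcSketch {
    // Model: data access only
    static class CustomerData {
        String findName(String customerNo) { return "ACME Ltd"; }   // stand-in for real I/O
    }

    // Model: business rules only
    static class CustomerLogic {
        private final CustomerData data = new CustomerData();
        String describe(String customerNo) {
            return customerNo + " - " + data.findName(customerNo);
        }
    }

    // Controller: handles a request from the view and decides what to show next
    static class CustomerController {
        private final CustomerLogic logic = new CustomerLogic();
        String onSelect(String customerNo) {
            System.out.println(logic.describe(customerNo));
            return "customerDetail";   // navigation outcome consumed by the view layer
        }
    }

    public static void main(String[] args) {
        new CustomerController().onSelect("C001");
    }
}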
In Chapter 3, I discussed how to extract the data model and business rule logic from legacy code. If these extracted designs can be articulated in a language or programmatic format, such as Unified Modeling Language (UML), SQL's Data Definition Language (DDL), and XML, or even in structured database language statements, it's possible to use them programmatically to generate the basis of a new application skeleton. This can save companies millions of dollars and significantly reduce timelines. It also means that the designs can be perfected before code is written in the new application. Another benefit is that the generation process can be run repeatedly until the optimum start point of the new application development process is achieved, rapidly and with little effort.
This programmatic reuse of recovered application designs requires a certain amount of restructuring of the designs. The designs of the legacy application's interactive logic and flow can be used to build a modern application skeleton, and the extracted business-rule logic can then be added to this skeleton. Modern resources, tools, and methods can then be used independently to enhance and complete the modernization as required. Let's look at these steps in more detail.
Building a Modern Application Skeleton
The most fundamental change and the biggest challenge in modernizing a legacy application is moving from a procedural programming model to an event-driven one. This aspect is one of the primary reasons that line-by-line syntax conversions to modern languages produce results that are often less maintainable than the original code. One legacy-application design element that's almost directly transferable to a modern, event-driven programming model is the individual screen format. Legacy screen formats largely, if not explicitly, correspond to individual steps in a transaction or business process, and the same is true of an individual web page or application form. By simple deduction, therefore, all the design detail relating to the rendering of a specific legacy screen format can be used to specify and build a modern UI component. I refer to the design information that forms this intersection as a "function definition," as Figure 4 shows.
To rebuild a modern application skeleton from the legacy designs, a function definition should consist of the following elements (a sketch of how such a definition might be captured follows the list):
• Screen fields - work fields and fields directly traceable to database fields
• Screen field types and attributes - fields used as dates, foreign keys, descriptors, size, type, and so forth
• Screen constants - column headings, field prompts, function names, command-key descriptions, and so forth
• Screen layout - column and row positions can be converted later using relative pixel ratios
• Screen field database mapping - where the data for the screen comes from, including join rules for foreign keys
• Screen actions - command keys, default entry, and subfile options
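As promised above, here is one way such a function definition might be captured as a plain data structure. The class shape and field names are invented for illustration; in practice this information would typically be generated as UML, XML, or rows in a repository rather than hand-written Java.

import java.util.List;

// Minimal sketch of a function definition captured as a plain data structure.
// The class shape and field names are invented for illustration only.
public class FunctionDefinition {
    public record ScreenField(String name, String dbFile, String dbField,
                              String type, int length, boolean isForeignKey,
                              int row, int column) {}
    public record ScreenAction(String key, String description) {}   // e.g. F3 = Exit

    public String legacyProgram;          // e.g. "ORD001R"
    public String screenFormat;           // e.g. "ORDLIST"
    public List<String> constants;        // column headings, prompts, titles
    public List<ScreenField> fields;      // fields, attributes, and database mapping
    public List<ScreenAction> actions;    // command keys, options, default entry
}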
This design information is entangled in DDS, program logic, and the database of the legacy application. With a reasonable level of skill in legacy languages, developers can extract this information by analyzing the source code manually. With larger systems, the best course is to use tools for the analysis and extraction process. Beyond productivity, consistency, and accuracy, the added benefit of using an analysis and extraction tool is that the results can more easily be stored programmatically and thereby used to automate the next step of writing the code. Using UML is one way to achieve this. The function definitions can be generated as a UML model for an application, with a number of specific UML constructs that will also assist in modeling and documenting the new application for modern developers. Some of these constructs include Activity Diagrams, Use Cases, and Class Diagrams. Figure 5 shows a UML Activity Diagram that represents the users' flow through a series of legacy programs having multiple screen formats.
DDL and XML can be used as a means to efficiently specify the detailed aspects of the function definitions. DDL created from the legacy application data model can be imported into a persistence framework or object relational map (ORM), such as Hibernate for Java and NHibernate for .NET. An ORM greatly simplifies the subsequent coding required in Java or C# by subcontracting all the complicated SQL programming required in an enterprise business application. An additional approach is to create a single database I/O class for each table. This approach removes the need to have I/O logic embedded in every program in the system, immediately making the application more maintainable and agile.
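The sketch below shows the general shape of one such per-table I/O class, using plain JDBC rather than an ORM. The class, table, and column names (CustomerIO, CUSMAS, and so on) are invented for illustration, and a generated class would normally cover the full set of read, write, update, and delete operations.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a single database I/O class for one table (CUSMAS).
// Class, table, and column names are invented for illustration.
public class CustomerIO {
    private final Connection con;
    public CustomerIO(Connection con) { this.con = con; }

    public record Customer(String cusno, String cusnam, String telno) {}

    // Read one record by primary key.
    public Customer getByKey(String cusno) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT cusno, cusnam, telno FROM cusmas WHERE cusno = ?")) {
            ps.setString(1, cusno);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                    ? new Customer(rs.getString(1), rs.getString(2), rs.getString(3))
                    : null;
            }
        }
    }

    // Read all records, e.g. to populate a list page.
    public List<Customer> getAll() throws SQLException {
        List<Customer> result = new ArrayList<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT cusno, cusnam, telno FROM cusmas")) {
            while (rs.next()) {
                result.add(new Customer(rs.getString(1), rs.getString(2), rs.getString(3)));
            }
        }
        return result;
    }
}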
The function definitions are then used to create the UIs and controller beans in the language and standard of choice (one JSF page and corresponding Java bean per legacy screen format). Using XML to store the function definition provides input for documented specifications for manual rebuilds and serves as an input to programs that can create the view and controller components. This approach is applicable to Java, EGL, C#, and PHP implementations and can be used for web, mobile web, and Rich Client Platform (RCP) alike. This is an important factor for enterprise applications that often require a mix of device types and even technology implementation options for a single system. Figure 6 shows a JSF page generated from a function definition extracted from a legacy program.
The important factor here is not so much the look and feel but rather that each button is now associated with an event handler in the underlying JSF bean, triggered by the HTML code itself. The data in the grid was retrieved from the DB2 for i database by using the SQL in the bean created for the underlying database table, invoked by the JSF bean when the user requested the JSF page (in this instance from a menu on a previous page). The HTML and CSS layout were created from the information in the function definition, and the buttons to put into the JSF page were derived from the options, command keys, and default entry extracted from the legacy program. In this instance, the design was extracted from legacy RPG/DDS, and the JSF page and Java beans were created automatically, by using a tool, in a few minutes. The style was implemented by using a standard CSS file and supporting images. This is all industry-standard, best-practice, modern stuff.
Figure 7 shows a snippet of the underlying HTML code that triggers an event in the bean to invoke the orders page, passing the key of the row selected by the user.

The underlying Java bean knows what to do because the parameter being passed tells it where to go next. In this way, the JSF beans can be kept small and simple—another good industry standard and best practice. Figure 8 shows the record-type page that the user is taken to when selecting the Change button in Figure 6. The drop-down combo boxes and date controls were added because of the presence of the foreign key information and date field types, respectively, in the extracted function definitions. This simple algorithm can save thousands of hours of configuration and editing of web pages in a modern application with hundreds or thousands of screen formats.
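To give a feel for the controller side of this, here is a minimal sketch of the kind of backing bean that might sit behind the list page described above, with a property for the selected row key and an action method returning a navigation outcome. The names are invented, managed-bean registration is omitted, and a generated bean would carry considerably more plumbing.

import java.io.Serializable;

// Minimal sketch of a JSF backing bean behind a list page, of the kind
// generated from a function definition. Names are invented for illustration;
// managed-bean registration (annotation or faces-config.xml) is omitted.
public class OrderListBean implements Serializable {
    private String selectedOrderNo;        // set from the row the user clicked

    public String getSelectedOrderNo() { return selectedOrderNo; }
    public void setSelectedOrderNo(String selectedOrderNo) {
        this.selectedOrderNo = selectedOrderNo;
    }

    // Event handler wired to the Change button; the return value is a
    // JSF navigation outcome that takes the user to the detail page,
    // carrying the selected key with it.
    public String change() {
        // In the generated application, the key would be passed to the
        // business-logic bean and database I/O class behind the detail page.
        return "orderDetail";
    }
}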
Adding Business-Rule Logic
In Chapter 3, I describe how business-rule logic can be extracted, indexed, and documented from legacy RPG
code. One approach is to add these documented rules manually to the appropriate business logic class in the
modern application. This approach should be reserved for cases in which very little of the legacy business logic
is to be reused, including, of course, smaller programs that have little or no specific business logic beyond what
has already been created in the JSF page, JSF bean, and database I/O beans. I want to reiterate that the same
principles that I describe here, using JSF and Java examples, are applicable in .NET and even in modern RPG
applications.
Another more practical approach that has already been automated is to essentially refactor the original interactive
program, to the extent that only the business logic processing is reused. Naturally, the refactoring must include
restructuring to turn it from a procedural design to an event-driven one. Again, this process is applicable whether creating Java, .NET, EGL, or even RPGLE business-logic components. During the initial modernization effort, the business-logic bean should be created as a single class/module/program that services each of the modern, event-driven JSF pages that came from the original legacy program. This is a maintainable architecture and follows modern coding practices but retains at least some reference to the legacy transactions. Figure 9 shows a schematic representation of the architectural mapping between legacy and modern designs for a single legacy program.
The original legacy program (File A) had three screen formats, which become three discrete free-standing JSF pages
and corresponding Java beans, two database I/O classes determined by the number of unique tables or physical
files, and a single business-logic bean/module/program for the business logic. I used JSF technology and Java for
this diagram, but the same architecture is applicable for any modern language. The same architecture would be
consistent with using Spring and Hibernate frameworks, too.
The business-logic bean now contains a restructured version of all the relevant business logic from the original
program. This restructuring process turns business logic from procedural into event-driven code, and in doing so maps
the relevant business-rule processing to the relevant JSF page. The first step to achieving this restructuring is to
recover the logic executed before screen 1 is rendered. This will essentially be placed in the pre-entry method or
procedure of a new business-logic bean and invoked by the JSF bean before the JSF page is displayed. Any legacy UI
logic, such as interactive indicators, is removed during this extraction.
The next step is to map the business logic to each JSF page. This is done by identifying the business logic executed after the legacy Format 1 but before legacy Format 2. We ignore the interactive logic and legacy structures such as indicators, which were turned into variables where applicable. This logic is then created as a new method (called JSF1validation, for example) in the new business-logic bean. This business logic is invoked by the JSF bean that corresponds with legacy Format 1, when triggered by the validation event in the new JSF page. The trigger is usually the Submit button on the JSF page itself. This stage is repeated for each of the legacy screen formats/JSF pages. The documented, indexed business rules—which I describe in previous chapters—can be used as a reference for auditing the applicable logic during this refactoring and extraction exercise.

Finally, the legacy subroutines, by simple logic, can be considered business logic and, as such, have a scope that's potentially applicable to any of the newly created validation methods/procedures. Therefore, only the legacy-specific code or redundant interactive lines need to be removed before these subroutines are coded or copied into the new business-logic bean. Figure 10 shows an example program outline documented in a form of platform-independent pseudo code.
I've included only the method outline, with one method expanded, and I added some simple color coding. The pre-entry method will be executed before JSF1 is rendered. Validation for JSF1 is executed when the user selects the Submit button from the web page, and so on. GETREC through ZGETNAMES are subroutines that have had the interactive logic removed and have been verified to contain valid business logic. It is not possible in such a short chapter to show the complete detail, but I can provide detailed examples upon request.
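As a rough illustration of the shape described above, here is a minimal sketch of such a business-logic bean: a pre-entry method, one validation method per legacy screen format, and a former subroutine carried over as an ordinary method. All of the names and the sample telephone rule are invented for illustration.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a restructured business-logic bean: one pre-entry method,
// one validation method per legacy screen format, and former subroutines kept
// as ordinary methods. Names and the sample rule are invented for illustration.
public class CustomerMaintLogic {
    private final List<String> errors = new ArrayList<>();

    // Logic that ran before the first legacy format was written; invoked by
    // the JSF bean before the first page is rendered.
    public void preEntry() {
        errors.clear();
        getRec();               // former subroutine GETREC, interactive logic removed
    }

    // Logic that ran between legacy Format 1 and Format 2; invoked when the
    // user submits the first JSF page.
    public List<String> jsf1Validation(String telno) {
        errors.clear();
        if (telno == null || !telno.matches("\\d{10,}")) {
            errors.add("A valid telephone number must be entered");
        }
        return errors;
    }

    // Former subroutine, retained as a plain method.
    private void getRec() {
        // Database access would be delegated to the generated I/O class.
    }
}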
Sustainable Reuse
The harvesting of valuable designs is now complete, and the application can be enhanced and refactored. It’s worth
noting that tools are available to automate each step described in this process. Staged automation of the recovery
and rebuild process can reduce a system-rewrite effort by at least 50 percent. Even executed manually, this approach
provides for iterative and parallel use of resources and is applicable for individual programs, application areas
consisting of multiple programs, and even entire systems. It allows for sustained reuse of legacy technology—but
isn’t bound by it—while simultaneously producing a real, modern application, not an emulated one.
About the Author
Robert Cancilla spent the past four years as market manager for IBM’s Rational Enterprise
Modernization tools and compilers group for IBM i. Prior to that, Robert spent 34 years as an IT
executive for three major insurance companies and an insurance software house. He has written
four books on e-business for the AS/400 or iSeries and founded and operated the electronic user
group IGNITe/400. Robert is currently retired and does independent consulting work.