%SASDIFF: A SAS Macro for Differential File Comparison
Ross Bettinger
ABSTRACT
Differential file comparison is a technique often used to identify changes between two files.
These files may contain code developed for a software project or data collected and revised with
new information. In either case, you may want to know the differences between the original file
and the new file. The %SASDIFF macro is patterned after the UNIX sdiff program and may
be used to reveal differences that have been introduced after an original file has been modified
and saved as a new file.
KEYWORDS
Differential file comparison, sdiff, UNIX, SAS/Macro
INTRODUCTION
Suppose that you are asked to assume responsibility for a project that involves many files of SAS
code. You see several versions of a file and want to know what the differences are between each
version. Looking at the timestamp for each file, you can put them into sequential order. You can
print out the code for each version and use PROC EYEBALL to compare one version to the next
as it has evolved, but you would soon get tired of looking from one sheet of code listing to the
next.
What you really want is a utility program to do the comparison for you so that all you need to do
is review the changes between the previous version and its successor. Such a program is called a
differential file comparison utility; the UNIX sdiff utility is a well-known example. It compares
sections of text from two files (orig and new) and creates a formatted listing showing text
common to both files and text found in one file but not the other.
The %SASDIFF macro produces a listing much like the sdiff utility in that it shows lines of
code that belong to the orig file only, or to the new file only, or that are common to both files.
Perusing the output of the %SASDIFF macro, you can quickly follow the course of revisions
between the orig file and the new file.
EXAMPLE OF USE
An example suffices to demonstrate the operation of the %SASDIFF macro. The text is taken
from [1].
Here is the original text:

This part of the
document has stayed the
same from version to
version. It shouldn't
be shown if it doesn't
change. Otherwise, that
would not be helping to
compress the size of the
changes.
This paragraph contains
text that is outdated.
It will be deleted in the
near future.
It is important to spell
check this dokument. On
the other hand, a
misspelled word isn't
the end of the world.
Nothing in the rest of
this paragraph needs to
be changed. Things can
be added after it.

Figure 1 Original Text

Here is the new text:

This is an important
notice! It should
therefore be located at
the beginning of this
document!
This part of the
document has stayed the
same from version to
version. It shouldn't
be shown if it doesn't
change. Otherwise, that
would not be helping to
compress anything.
It is important to spell
check this document. On
the other hand, a
misspelled word isn't
the end of the world.
Nothing in the rest of
this paragraph needs to
be changed. Things can
be added after it.
This paragraph contains
important new additions
to this document.

Figure 2 New Text
The %SASDIFF macro was invoked to process the original and new files with the following
code:
%SASDIFF( orig
        , diff
        , FLOW=Y
        , IGNORE_WHITE_SPACE=Y
        , IGNORE_BLANK_LINES=Y
        , IGNORE_CASE=Y
        , IGNORE_MATCHES=N
        , LINESIZE=80
        , WINDOW=4
        )
It produced the following results:
%SASDIFF Differential File Comparison

   orig.txt                     new.txt
                              > This is an important        1
                              > notice! It should           2
                              > therefore be located at     3
                              > the beginning of this       4
                              > document!                   5
 1 This part of the             This part of the            6
 2 document has stayed the      document has stayed the     7
 3 same from version to         same from version to        8
 4 version. It shouldn't        version. It shouldn't       9
 5 be shown if it doesn't       be shown if it doesn't     10
 6 change. Otherwise, that      change. Otherwise, that    11
 7 would not be helping to      would not be helping to    12
 8 compress the size of the   | compress anything.         13
 9 changes.                   <
10 This paragraph contains    <
11 text that is outdated.     <
12 It will be deleted in the  <
13 near future.               <
14 It is important to spell     It is important to spell   14
15 check this dokument. On    | check this document. On    15
16 the other hand, a            the other hand, a          16
17 misspelled word isn't        misspelled word isn't      17
18 the end of the world.        the end of the world.      18
19 Nothing in the rest of       Nothing in the rest of     19
20 this paragraph needs to      this paragraph needs to    20
21 be changed. Things can       be changed. Things can     21
22 be added after it.           be added after it.         22
                              > This paragraph contains    23
                              > important new additions    24
                              > to this document.          25

Figure 3 %SASDIFF Differential File Comparison
Text that is unique to orig is indicated by ‘<’, text that is unique to new is indicated by ‘>’, text
that is the same in both files is indicated by ‘ ’, and text that is changed from orig to new is
marked with ‘|’.
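The marker legend can be made concrete with a small Python sketch. It is not taken from the macro's source: the function names `marker_for` and `format_row` and the `width` argument are hypothetical, and it only shows how one row of a side-by-side listing could be labelled once the two sides of the row are known.

```python
def marker_for(orig_line, new_line):
    """Choose the listing marker for one row; None means that side is empty."""
    if new_line is None:
        return "<"   # text unique to orig
    if orig_line is None:
        return ">"   # text unique to new
    # Both sides present: blank marker for identical text, '|' for changed text.
    return " " if orig_line == new_line else "|"


def format_row(orig_line, new_line, width=26):
    """Render one side-by-side row; width is the (assumed) left column width."""
    left = (orig_line or "").ljust(width)
    right = new_line or ""
    return f"{left} {marker_for(orig_line, new_line)} {right}".rstrip()


print(format_row("changes.", None))
print(format_row("check this dokument. On", "check this document. On"))
print(format_row(None, "to this document."))
```

Deciding which orig line shares a row with which new line is the harder part of the job; that is the subject of the algorithm description later in the paper.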
The macro parameters are described in Table 1. Optional parameters are assigned default values
or are taken from the SAS user environment, e.g., Linesize.
Table 1 %SASDIFF Parameters

Parameter            Default Value     Description
Original File Name   None              Name of original text file
New File Name        None              Name of new text file containing changes
FLOW                 N                 Flag to wrap long lines of text
IGNORE_WHITE_SPACE   Y                 Flag to ignore tabs, blanks, control characters
IGNORE_BLANK_LINES   Y                 Flag to ignore blank lines in file
IGNORE_CASE          Y                 Flag to ignore differences in upper or lower case
IGNORE_MATCHES       N                 Flag to ignore matching lines of text
LINESIZE             Session setting   Width of comparison table
WINDOW               10                Number of lines of text above or below line being compared
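To illustrate what the three IGNORE_ filters could mean in practice, here is a minimal Python sketch. It is an assumption-laden illustration, not the macro's code: the helper names `normalize` and `prepare` are hypothetical, and "ignore" is read here as removing every whitespace and control character and folding case before two lines are compared.

```python
def normalize(line, ignore_white_space=True, ignore_case=True):
    """Return the key used to compare one line of text (hypothetical helper)."""
    key = line
    if ignore_white_space:
        # Assumption: drop tabs, blanks, and control characters entirely.
        key = "".join(ch for ch in key if not ch.isspace() and ch.isprintable())
    if ignore_case:
        key = key.lower()
    return key


def prepare(lines, ignore_blank_lines=True, **flags):
    """Return the comparison keys of the lines that take part in the comparison."""
    keys = []
    for line in lines:
        if ignore_blank_lines and not line.strip():
            continue  # a blank line is skipped rather than compared
        keys.append(normalize(line, **flags))
    return keys


print(normalize("  Array  primes( 100 )\t;"))
print(prepare(["a b", "", "   ", "C"]))
```

Under this reading, two lines that differ only in indentation or letter case produce the same key and would therefore be treated as matching.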
Another example showing differences between an original SAS program and a revised version is
given in the Appendix.
DESCRIPTION OF ALGORITHM
The %SASDIFF macro builds a table that indicates the results of searching for orig lines that are
found in new and for new lines that are found in orig. Each line of orig is compared to each line
of new that is within a window of comparison (see footnote 1). If a line in orig matches a line in new during the
orig-to-new comparison, it is not tested again during the new-to-orig comparison. Lines in orig
that are not found in new are marked with ‘<’, and similarly, lines in new that are not found in
orig are marked with ‘>’. Lines found in both files in the same position which are not identical
(after the filters IGNORE_WHITE_SPACE, IGNORE_CASE, and IGNORE_BLANK_LINES have
been applied if requested) are marked with ‘|’. Lines that occur in the same position and which
are identical (after the requested filters have been applied) are considered to match and are
marked with ‘ ’.
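The windowed search described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the macro's implementation: the function name `match_lines` is hypothetical, the window is assumed to be centred on the same line index in new, each new line may be matched at most once, and the pairing of changed lines marked with '|' is left out.

```python
def match_lines(orig, new, window):
    """Match each orig line to an equal, unused new line within +/- window lines.

    Returns (pairs, only_orig, only_new), where pairs maps an orig index to
    the new index it matched (0-based indices).
    """
    pairs = {}
    used_new = set()
    for i, text in enumerate(orig):
        lo = max(0, i - window)
        hi = min(len(new), i + window + 1)
        for j in range(lo, hi):
            if j not in used_new and new[j] == text:
                pairs[i] = j
                used_new.add(j)
                break
    only_orig = [i for i in range(len(orig)) if i not in pairs]   # '<' lines
    only_new = [j for j in range(len(new)) if j not in used_new]  # '>' lines
    return pairs, only_orig, only_new


orig = ["a", "b", "c", "d"]
new = ["x", "a", "b", "d"]
print(match_lines(orig, new, window=1))
print(match_lines(orig, new, window=0))
```

Because the window in this sketch is symmetric and a matched line is never reused, a separate new-to-orig pass would find nothing further, which is why only one pass appears. The example also shows the cost of too small a window: with `window=0` the lines "a" and "b" are no longer found, even though they occur in both lists.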
Once the comparison process is completed, sequence information in the table is used to order
lines from orig and new so as to group lines in orig only or in new only into left or right columns
respectively. Lines common to both files are juxtaposed, with matching lines indicated
accordingly.
Another way of understanding this process is to realize that %SASDIFF computes the set of lines
that are disjoint between orig and new and the set that is common to both files. The macro
interleaves the disjoint and intersecting lines according to their order in the two text files and
labels the lines in common as matching or nonmatching.
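The set view in the preceding paragraph can be illustrated with a few lines of Python. This sketch is only an approximation of that idea: the name `split_lines` is hypothetical, and unlike the macro it ignores line position, the comparison window, and repeated lines.

```python
def split_lines(orig, new):
    """Split lines into those only in orig, those in both, and those only in new."""
    orig_set, new_set = set(orig), set(new)
    only_orig = [line for line in orig if line not in new_set]
    common = [line for line in orig if line in new_set]
    only_new = [line for line in new if line not in orig_set]
    return only_orig, common, only_new


orig = ["compress the size of the", "changes.", "It is important to spell"]
new = ["compress anything.", "It is important to spell"]
only_orig, common, only_new = split_lines(orig, new)
print(only_orig)
print(common)
print(only_new)
```

Each list keeps the order of its source file, which is the information needed to interleave the three groups into a single listing.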
Footnote 1: The “window of comparison” represents a set of lines in the file being compared, e.g., new, that is above and below
the current line in the reference file, e.g., orig. It is used to limit the search to a reasonable number of lines in the
comparison file per line in the reference file so as to reduce the number of operations. The maximum difference
between matched lines is the optimal value of the window parameter, and it is displayed in the SAS log file.
CONCLUSION
The %SASDIFF macro can be a useful utility for program development or for QA purposes. It is
an efficient tool for comparing two sets of text and displaying commonalities and differences
between them.
REFERENCES
1. Wikipedia contributors, "Diff," Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/w/index.php?title=Diff
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Please contact the author at:
Ross Bettinger
Email: [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
APPENDIX
The following example applies %SASDIFF to an original SAS program and its revision. The
original program is shown below:
/* purpose: compute the first 100 prime numbers */
data _null_ ;
   array primes( 100 ) p1-p100 ;
   candidate = 3 ;
   continue = 1 ;
   n_primes = 1 ;
   primes( 1 ) = 2 ;
   do i = 1 to 1000 while( continue ) ;
      do j = 2 to ceil( sqrt( candidate )) ;
         not_prime = mod( candidate, j ) = 0 ;
         if not_prime then leave ;
      end ;
      if not not_prime
      then do ;
         n_primes = n_primes + 1 ;
         primes( n_primes ) = candidate ;
      end ;
      candidate = candidate + 1 ;
      if n_primes < dim( primes )
      then continue = 1 ;
      else continue = 0 ;
   end ;
   put 'The first 100 primes are '
     / ( p1 -p25 )( 4. )
     / ( p26-p50 )( 4. )
     / ( p51-p75 )( 4. )
     / ( p76-p100 )( 4. ) ;
run ;
Appendix Figure 1 Prime1
The revised program, which contains improvements in the algorithm, is:
/* purpose: compute the first n prime numbers */
%let N_PRIMES = 100 ;
data _null_ ;
   array primes( &N_PRIMES ) _temporary_ ;
   candidate = 3 ;
   continue = 1 ;
   n_primes = 1 ;
   primes( 1 ) = 2 ;
   do while( continue ) ;
      sum_factors = 0 ;
      /* if no previous primes are a factor in candidate,
       * candidate must be prime
       */
      do i = 1 to n_primes ;
         sum_factors + mod( candidate, primes( i )) = 0 ;
      end ;
      if not sum_factors
      then do ;
         n_primes + 1 ;
         primes( n_primes ) = candidate ;
      end ;
      candidate + 1 ;
      continue = n_primes < &N_PRIMES ;
   end ;
   put 'The first 100 primes are ' ;
   do i = 1 to &N_PRIMES ;
      put ( primes( i )) ( 4. ) @ ;
      if not mod( i, 10 ) then put ;
   end ;
run ;
Appendix Figure 2 Prime2
The %SASDIFF invocation was
%SASDIFF( Prime1.txt
        , Prime2.txt
        , FLOW=y
        , LINESIZE=64
        , WINDOW=4
        )
with the following differential file comparison results:
%SASDIFF Differential File Comparison

   Prime1.txt                                 Prime2.txt
 1 /* purpose: compute the first 100 prime  | /* purpose: compute the first n prime       1
   numbers */                                 numbers */
                                            > %let N_PRIMES = 100 ;                       2
 2 data _null_ ;                              data _null_ ;                               3
 3 Array primes( 100 ) p1-p100 ;            | array primes( &N_PRIMES ) _temporary_ ;     4
 4 candidate = 3 ;                            candidate = 3 ;                             5
 5 continue = 1 ;                             continue = 1 ;                              6
 6 n_primes = 1 ;                             n_primes = 1 ;                              7
 7 primes( 1 ) = 2 ;                          primes( 1 ) = 2 ;                           8
 8 do i = 1 to 1000 while( continue ) ;     | do while( continue ) ;                      9
 9 do j = 2 to ceil( sqrt( candidate )) ;   | sum_factors = 0 ;                          10
10 not_prime = mod( candidate, j ) = 0 ;    | /* if no previous primes are a factor in   11
                                              candidate,
11 if not_prime then leave ;                | * candidate must be prime                  12
12 end ;                                    | */                                         13
13 if not not_prime                         | do i = 1 to n_primes ;                     14
14 then do ;                                | sum_factors + mod( candidate, primes( i )) 15
                                              = 0 ;
                                            > end ;                                      16
                                            > if not sum_factors                         17
                                            > then do ;                                  18
15 n_primes = n_primes + 1 ;                | n_primes + 1 ;                             19
16 primes( n_primes ) = candidate ;           primes( n_primes ) = candidate ;           20
17 end ;                                      end ;                                      21
18 candidate = candidate + 1 ;              | candidate + 1 ;                            22
19 if n_primes < dim( primes )              | continue = n_primes < &N_PRIMES ;          23
20 then continue = 1 ;                      | end ;                                      24
21 else continue = 0 ;                      <
22 end ;                                    <
23 put 'The first 100 primes are '          | put 'The first 100 primes are ' ;          25
24 / ( p1 -p25 )( 4. )                      | do i = 1 to &N_PRIMES ;                    26
25 / ( p26-p50 )( 4. )                      | put ( primes( i )) ( 4. ) @ ;              27
26 / ( p51-p75 )( 4. )                      | if not mod( i, 10 ) then put ;             28
27 / ( p76-p100 )( 4. ) ;                   | end ;                                      29
28 run ;                                      run ;                                      30

Appendix Figure 3 Differential File Comparison