A BUG TAXONOMY (Java)
There are three basic groups into which programming errors can be classified: syntax errors,
run-time errors, and logic errors. A program with a syntax error will not compile; the program is
not a valid statement in the chosen programming language. A program with a run-time error will
compile but not complete execution. The reasons are varied but the most common are illegal
address faults, arithmetic traps, and uncaught exceptions. The term logic error is used when a
program compiles and runs to completion, but does not always produce the correct answers.
In order for a program to have observable bugs, it must first compile and link indicating that
the program syntax is valid. If the program compiles and links, but produces incorrect results or
aborts execution during processing, then we say it has a bug or bugs. Therefore, debugging is the
detection and removal of run-time and logic errors. Bugs can derive from many sources, particularly in a
language such as Java where many typos and non-logic errors go undetected by the compiler.
The C programming language is even worse. Strongly typed languages such as Ada generally
have fewer such errors. The list below is representative but by no means exhaustive.
1. Incomplete analysis or logic error. (The possibilities are endless. A couple of common
examples are listed below.)
a. Inadequate programmer analysis may not foresee and prepare for all the possible paths of
the correct algorithm.
b. The test data may not test all the critical paths of the program. Thus logic errors may
remain undetected until actual data is used.
2. Inadequate understanding of syntax.
a. Leaving out braces in a FOR or WHILE loop containing two or more statements.
b. A SWITCH statement missing BREAK(s)/RETURN(s) where needed.
c. A SWITCH statement missing a DEFAULT when needed.
d. Forgetting that ELSE binds tight. In the first code fragment below, the ELSE is attached to the
second IF even though it is lined up under the first IF. In the second fragment, the print
statement is not in the loop and will only be executed after the loop terminates.
   if( a > b)
      if( a > d )
         c = d;
   else c = a;

   int i, sum = 0;
   for( i = 0; i < 10; i++ )
      sum += i;
      System.out.println( i + " " + sum );
3. Parameter error.
a. Java method parameters are always call-by-value, but that is not the whole story. For a
language primitive data type such as int, a copy of the argument is sent to the method.
Thus a change to the parameter cannot change the argument. If the argument sent to a
method is an object, a copy of the address of the object is sent to the method. Java calls
this a reference type. The method can use the address to change the object's contents, which
resembles call-by-reference, but assigning a new object to the parameter does not change the
caller's variable. The same is true for any reference type such as a String, array, Integer,
or Double, although String, Integer, and Double objects are immutable, so only arrays and
other mutable objects can actually be changed this way.
b. Inconsistency between the types specified in the formal parameter definitions and the
types of the actual parameters in the method call. Sometimes these errors are masked by
implicit type conversions such as int to double. As Java only converts from smaller to
larger types implicitly, Java is less prone to these errors than most languages.
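The parameter rules in 3a can be checked with a small experiment. The sketch below is only an illustration: the class Counter and the three method names are made up for this handout.

```java
// A sketch of Java's call-by-value behaviour. All names here are hypothetical.
class ParameterDemo {
    // Hypothetical mutable class used only for this illustration.
    static class Counter {
        int count;
        Counter(int count) { this.count = count; }
    }

    // Changes only the method's local copy of the int.
    static void bumpPrimitive(int n) { n = n + 1; }

    // Follows the copied address and changes the caller's object.
    static void bumpObject(Counter c) { c.count = c.count + 1; }

    // Rebinds only the local copy of the address; the caller's variable is untouched.
    static void replaceObject(Counter c) { c = new Counter(99); }

    public static void main(String[] args) {
        int n = 5;
        bumpPrimitive(n);
        System.out.println(n);        // 5

        Counter c = new Counter(5);
        bumpObject(c);
        System.out.println(c.count);  // 6

        replaceObject(c);
        System.out.println(c.count);  // still 6
    }
}
```

The last call is the case that separates Java from true call-by-reference: the method received the address, but it cannot make the caller's variable refer to a different object.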
4. Small typing errors not caught by the compiler. Unfortunately, this is common in C and
C++. Happily, Java is more likely to catch such an error, but it does not catch them all.
For example, both loops below compile in Java as well as in C++, because an empty
statement and a free-standing block are legal in both languages. These loops have a
semicolon (;) at the end of the FOR and WHILE opening statement. This error effectively
nulls the for loop or makes the while an infinite loop.
   int I;
   for( I=0; I<SIZE; I++);
   A[I] = kb.nextInt();

   int I; int sum = 0;
   while( sum < 100);
   { I = kb.nextInt(); sum += I; }
5. Implicit type conversion errors are rare in Java.
The compiler will automatically convert among types in the direction shown below if needed in
an expression or if needed to make a method argument type match the defined parameter
type. This should cause few problems, as most of these conversions lose no precision; the
exceptions are int or long to float and long to double, which can drop low-order digits.
Note that the primitive types boolean and char are not listed here. A char is widened to
int implicitly, and you can convert explicitly (with a cast) between most any of the numeric
types and char, but never between int and boolean.
byte → short → int → long → float → double
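A short sketch of the conversion chain in action follows. The class and method names are invented for the illustration; the value 16,777,217 is chosen because it is one more than 2^24, the point where float stops representing every int exactly.

```java
// Sketch: implicit widening along the chain, and the one place it can lose digits.
class WideningDemo {
    // Widens an int to float implicitly, then narrows back with an explicit cast.
    static int roundTripThroughFloat(int value) {
        float f = value;     // implicit widening, no cast needed
        return (int) f;      // narrowing back requires a cast
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughFloat(1000));        // 1000
        System.out.println(roundTripThroughFloat(16_777_217));  // 16777216

        char letter = 'A';
        int code = letter;          // char also widens to int implicitly
        System.out.println(code);   // 65
    }
}
```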
6. Incorrect Data.
Some of the data typed in or read from the input file(s) may be incorrect.
7. Incorrect Input.
a. The input statements may be inconsistent with the data layout. This is a common problem
in a language such as Fortran. Among other errors, this can produce an infinite loop.
b. The location of input statements may be inappropriate for the loop structure. In the first
loop below, x has no meaningful value the first time the WHILE is tested (Java will not
compile the loop at all if x is an uninitialized local variable), and the terminating value is
added to sum before it is tested. The error is eliminated in the second loop. The third loop
shows one of the best ways to construct an input loop.
   while( x > 0 )
   { x = kb.nextInt();
     sum += x;
   }

   x = kb.nextInt();
   while( x > 0 )
   { sum += x;
     x = kb.nextInt();
   }

   String input = "";
   while( kb.hasNext() )
   { input = kb.nextLine();
     … whatever
   }
c. Confusion about the exact effect of next(), nextInt(), nextByte(), nextDouble(), nextLine()
or the failure to preface a loop containing next() with a hasNext() when needed.
d. Combinations of these commands can produce odd circumstances where the programmer
thinks the input pointer has moved to the beginning of the next line when it has not.
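The situation described in 7d is most often seen when nextInt() is followed by nextLine(). The sketch below uses a Scanner built on a made-up String instead of the keyboard so that the result is repeatable; the class and method names are hypothetical.

```java
import java.util.Scanner;

// Sketch: nextInt() leaves the input pointer at the end of the number, not on the next line.
class ScannerLineDemo {
    // Reads an int, then a line, without clearing the rest of the int's line.
    static String readNaive(Scanner kb) {
        int age = kb.nextInt();   // stops just after the digits
        return kb.nextLine();     // returns the (empty) remainder of that same line
    }

    // Same reads, but the leftover end-of-line is discarded first.
    static String readCarefully(Scanner kb) {
        int age = kb.nextInt();
        kb.nextLine();            // consume the rest of the first line
        return kb.nextLine();     // now really the second line
    }

    public static void main(String[] args) {
        String data = "42\nAda Lovelace\n";   // made-up input
        System.out.println("[" + readNaive(new Scanner(data)) + "]");      // []
        System.out.println("[" + readCarefully(new Scanner(data)) + "]");  // [Ada Lovelace]
    }
}
```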
8. Incorrect Output.
Outputting the wrong variable, or the right variable in the wrong format. If the width in the
format is too small to hold the number, Java will ignore the width and output the number
correctly.
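A brief sketch of the width rule, using String.format so the result can be inspected. The class and method names are invented for the example.

```java
// Sketch: a format width is a minimum, so a number that is too wide is still printed whole.
class WidthDemo {
    // Formats value in a field at least 3 characters wide.
    static String inWidth3(int value) {
        return String.format("%3d", value);
    }

    public static void main(String[] args) {
        System.out.println("[" + inWidth3(7) + "]");       // [  7]
        System.out.println("[" + inWidth3(123456) + "]");  // [123456]
    }
}
```

The number is always right, but columns of output will no longer line up, which is often the first visible clue that a width was chosen too small.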
9. Loop entry or exit errors.
Most FOR loop errors involve traversing the loop one too many or one too few times.
Most WHILE and DO-WHILE errors involve infinite loops because of a missing or
unexecuted update. Sometimes these loops will have improperly constructed exit
conditions such as OR (||) where AND (&&) is needed. Occasionally these errors have
the effect of taking a loop out of the program. Example 4 above shows one such error.
10. Assignment operator (=) error.
Primitive Types: “A = B;” copies the content of B into A. If A and B are ints, chars or
doubles, the expected and correct thing happens.
Reference Types that are Immutable Objects: If A and B are Integers or Strings, the
assignment copies only the address, so A and B refer to the same object. No copy is made,
but the expected and correct thing still happens: Integers and Strings are immutable, so
neither variable can be used to change the value seen through the other.
Other Reference Types: If A and B are ordinary, mutable objects such as objects of a
class you constructed, then the content of B is the address where B’s instance data is
stored. Thus, the address of B is copied into A and now both A and B point to the same
instance data. This is called aliasing, and is often loosely described as a shallow copy.
What the programmer should have done is
call a copy constructor or a clone method. i.e. TheClass A = new TheClass(B);
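The difference between assigning a mutable object and copying it can be seen in a few lines. Account is a hypothetical class written for this sketch, standing in for "a class you constructed".

```java
// Sketch: plain assignment shares one object; a copy constructor makes a second one.
class CopyDemo {
    // Hypothetical mutable class with a copy constructor.
    static class Account {
        double balance;
        Account(double balance) { this.balance = balance; }
        Account(Account other) { this.balance = other.balance; }  // copy constructor
    }

    public static void main(String[] args) {
        Account b = new Account(100.0);
        Account alias = b;               // copies the address only
        Account copy = new Account(b);   // copies the instance data

        b.balance = 0.0;
        System.out.println(alias.balance);  // 0.0, alias and b are the same object
        System.out.println(copy.balance);   // 100.0, the copy is independent
    }
}
```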
11. Array index error.
Array indices in Java are always in the range 0..ArraySize-1. In Java, any array index out
of bounds throws an ArrayIndexOutOfBoundsException, which aborts the program unless it is caught.
12. Initialization error.
a. All accumulator variables should be initialized, counters and sums usually to zero, and
products usually to one. Relying on Java to initialize is considered poor style.
b. Special variables referring to a list of values such as LARGEST and SMALLEST require
initialization. One approach is to initialize LARGEST to a value smaller than any that will be
encountered, such as Integer.MIN_VALUE, and SMALLEST to a value larger than any that will
be encountered, such as Double.MAX_VALUE. Alternatively, such a variable could be initialized
to the first value encountered.
c. If a program runs correctly in Windows and dies a ghastly death in Unix/Linux, the first
place to look is for a missing initialization. Most PC compilers initialize all variables
automatically; most compilers on UNIX/LINUX do not. Java has fewer problems with
this than C or C++ because of 12.d below.
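d. Local variables get no default in Java; the compiler rejects any use before assignment.
Instance variables, however, are automatically initialized to the default value for their
type: zero for numeric types and false for boolean.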
Similarly, all object variables are initially null until the constructor is called.
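The initializations in 12a and 12b might look like the sketch below. The class and method names are made up for the illustration. Note the trap in the comment: Double.MIN_VALUE is the smallest positive double, not the most negative one, so it is the wrong starting point for a largest-value search.

```java
// Sketch: explicit starting values for a maximum, a minimum, and a product.
class ExtremesDemo {
    static int largest(int[] values) {
        int largest = Integer.MIN_VALUE;      // smaller than or equal to any int
        for (int v : values) {
            if (v > largest) largest = v;
        }
        return largest;
    }

    static double smallest(double[] values) {
        double smallest = Double.MAX_VALUE;   // larger than or equal to any finite double
        // Do not use Double.MIN_VALUE for a largest-value search: it is a tiny positive number.
        for (double v : values) {
            if (v < smallest) smallest = v;
        }
        return smallest;
    }

    static int product(int[] values) {
        int product = 1;                      // products start at one, not zero
        for (int v : values) {
            product *= v;
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(largest(new int[] { -7, -3, -9 }));        // -3
        System.out.println(smallest(new double[] { 2.5, -1.0, 4.0 })); // -1.0
        System.out.println(product(new int[] { 2, 3, 4 }));           // 24
    }
}
```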
13. Calculation based errors.
a. Division by zero
b. Integer overflow or underflow
c. Floating overflow or underflow
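The three cases in item 13 behave quite differently from one another in Java, as the sketch below suggests. The class and method names are hypothetical.

```java
// Sketch: integer division by zero throws; floating division by zero and overflow do not.
class DivisionDemo {
    // Returns the quotient as text, or the exception name if b is zero.
    static String divideInts(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(divideInts(7, 2));            // 3
        System.out.println(divideInts(7, 0));            // ArithmeticException

        double zero = 0.0;
        System.out.println(7.0 / zero);                  // Infinity
        System.out.println(zero / zero);                 // NaN

        int big = Integer.MAX_VALUE;
        System.out.println(big + 1);                     // -2147483648, silent wrap-around
        System.out.println(Double.MAX_VALUE * 2.0);      // Infinity
    }
}
```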
14. Run time errors that may abort execution, or generate invalid results or infinite loops.
a. Example 4 above discusses one kind of infinite loop. Another possibility is an
incompatibility between the data and the input statements. If C++ cannot read in new
data because of such a problem, then the old data remains in the variables. In a loop that
is supposed to terminate upon end of file or upon detection of sentinel data, the loop may
never end. I have not had a chance yet to explore this possibility in Java.
b. Range of integers. All numeric types have a definite range of values. In Java the range of
short is -32768..32767, the range of int is about -2 billion to +2 billion, and the range of long
is about -9 x 10^18 to +9 x 10^18, on every machine. The
wrapper classes for primitive types give the largest and smallest values in that type. For
int, the values are Integer.MAX_VALUE and Integer.MIN_VALUE.
c. Range and number of significant digits of floating point numbers. Floating point
representations such as float and double are stored as a mantissa and exponent similar to
scientific notation. Thus they have a maximum range based on the exponent and a certain
number of digits of precision based on the mantissa.
d. Floating round-off errors. Rounding errors can often be reduced, but they can rarely be
completely eliminated from programs using floating point values. Good basic techniques
include:
1. Never compare for equality or inequality between floating values. Instead check for a
difference within some acceptable error factor. { if( Math.abs(A-B) < error ) rather than
if( A == B ) }
2. Try to arrange a sequence of operations so that the partial results change in magnitude as
little as possible. { A/D*B/E*C rather than (A*B*C) / (D*E) }
3. Try to avoid subtracting values that are almost equal.
e. Overflow and underflow. If a calculation produces a positive value too large for the
computer to store properly, the result is called overflow. Too large a negative value is
called underflow. The most common clue that overflow or underflow has occurred is an
unexpected change of sign. Suppose you have written a factorial method where you input
an integer N and output N!. If you put in a positive N and N! is computed as negative, the
error is almost undoubtedly overflow. Change the data type to long and try again.
Similarly an accumulator of positive values can suddenly become negative if it overflows.
In Java, integer overflow wraps around silently and floating overflow produces Infinity;
neither aborts the program.
f. Incorrect use of references. Java has no pointer arithmetic, so the illegal address faults
of C and C++ do not arise. The closest equivalent is using a reference that is still null,
which throws a NullPointerException and aborts the program unless it is caught.
g. Exceptions. Java has many built in exceptions, and the programmer can add others. The
verbs are throw, try, and catch. This is too large a topic for this section, but the basic
philosophy is to write the program in such a way that most/all possible exceptions are
caught and handled by the program.
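Items 14d and 14e can be tried out with the sketch below. The helper names are made up for the illustration, and the tolerance 1e-9 is an arbitrary choice, not a recommended value. Math.multiplyExact is a standard library method that throws an ArithmeticException instead of wrapping around.

```java
// Sketch: tolerance comparison for doubles, and factorial overflow in int and long.
class NumericLimitsDemo {
    // Compares two doubles within a caller-chosen error factor.
    static boolean nearlyEqual(double a, double b, double error) {
        return Math.abs(a - b) < error;
    }

    // int factorial: silently wraps once n! exceeds Integer.MAX_VALUE.
    static int factorialInt(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    // long factorial that throws ArithmeticException rather than wrapping.
    static long factorialChecked(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, (long) i);
        }
        return result;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                   // false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9));  // true

        System.out.println(factorialInt(12));             // 479001600, still correct
        System.out.println(factorialInt(17) < 0);         // true, the sign has flipped
        System.out.println(factorialChecked(20));         // 2432902008176640000
    }
}
```

A sign change is a useful clue but not a reliable one: factorialInt(13) is already wrong even though it is still positive, which is an argument for the checked version when the range of N is not known in advance.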
15. Semantic Errors.
The number of possible semantic errors in Java is almost unlimited. This will be true in any
language supporting semantically complex constructs such as classes, inheritance,
polymorphism, and exceptions. One common such error is unexpected side effects generated
by shallow copies. A correctly defined copy constructor generally avoids this error.
Debugging Techniques
Virtually all debugging involves first finding out what the program is doing wrong, then
fixing the error. The fastest way to find out what the program is doing wrong is to first find
out exactly what the program is doing. To put it another way, if you know exactly what is
going on inside your program, you should be able to locate and then repair the error.
Inexperienced students have a tendency to guess, and in effect make random changes in the
hope that the error will disappear. What they need instead is information. The techniques
described below are just different ways of getting this information to the programmer. If you
go to your instructor and ask for help debugging a program, the first question you will
probably hear will be something along the lines of "What is it doing?" If your answer is
any version of "I don't know," then you have not done enough yet on your own to eliminate
the problem.
Simple deduction using available information
If the program is producing output, and the output is incorrect, sometimes the existing output
provides the necessary clues on the problem. For example, if the last expected line of output is
duplicated (or missing), the first place to look is for an incorrectly formed loop. (See 7b) Most of
the time, however, the output alone is not enough and some of the techniques described below
should be employed.
A. Echo Input Data to an Output file or Device
If the program is not reading correctly, this can be difficult to detect if the input data is
not echoed to output. A simple code section that echoes input can verify the input and can
be removed or commented out when no longer needed.
B. Variable Values
Output statements can be added to the program to output the values of variables. These
might be intermediate values, they might be parameters to methods, they might be values
about to be returned by methods or values just returned by methods, or whatever seems
appropriate. If such statements may produce a large volume of output, then it may be
desirable to output to a file. This error file may be the same as the normal output file or a
separate debug file.
C. Trace
Similarly, statements can be added to the program that produce a trace of the methods
called. For example, upon entering a method ComputeGrossPay with the parameter 'rate'
the programmer might insert the statement:
System.out.println( "inCGP rate :" + rate );
Upon leaving the same method and before returning the value of 'gross', the following
statement could be added:
System.out.println( "outCGP return :" + gross );
Obviously, enough such statements could be included so as to output everything pertinent
about the method. It is helpful if the trace information generated by a single method be on
the same line, or at least clearly grouped. Blank lines can help make sense out of a large
volume of output.
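Put together, a traced method might look like the sketch below. The pay rule (hours times rate, no overtime) and the hours parameter are assumptions made only to give the method a body.

```java
// Sketch: trace statements at the entry and exit of one method.
class TraceDemo {
    // Hypothetical pay rule: straight time only.
    static double computeGrossPay(double hours, double rate) {
        System.out.println("inCGP  hours :" + hours + "  rate :" + rate);
        double gross = hours * rate;
        System.out.println("outCGP return :" + gross);
        return gross;
    }

    public static void main(String[] args) {
        computeGrossPay(40.0, 12.5);
        System.out.println();          // blank line separates one call's trace from the next
        computeGrossPay(0.0, 12.5);
    }
}
```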
D. Built in Debug Variables and supporting code
While hardly appropriate for a small student program, a large production program that
will undergo a series of revisions may benefit from a built in debugging mechanism. The
statement below creates an array of five debug variables, any of which may be true or
false.
boolean[] db = new boolean[5];
The first element of db might be used to echo input, the second might just pertain to the
main, the third might be a trace of the primary methods, the fourth might be a trace of the
small methods, etc. These variables can be turned on or off as needed. One way is to
prompt the user for the values of the debug variables, but this is unacceptable for
production work. Another technique more appropriate for production is to set the values
within the program or read the values from a special debug file.
Statements producing output for debugging purposes would simply be prefaced by the
appropriate test. In this way, it is no longer necessary to remove the debugging
statements. When no longer needed, they are simply turned off. They can be turned back
on if a new bug is discovered, or if new bugs are introduced by programming changes.
if( db[0] ) System.out.println( whatever );
Similarly, one might build a debug variable into the member methods associated with a
class. The client program could turn on or off this output as needed. Eventually, the
program will be ready to go into production. At this time, these debug statements could
be commented out, thus speeding up execution of the program. In this way, they could be
reactivated later if needed.
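One possible arrangement of the debug variables is sketched below. The constant names, the debug helper, and the choice of System.err for debug output are all assumptions of this sketch; only the db array and its index meanings come from the description above.

```java
// Sketch: debugging output guarded by an array of debug variables.
class DebugFlagsDemo {
    // Hypothetical names for two of the five positions described in the text.
    static final int ECHO_INPUT = 0;
    static final int TRACE_PRIMARY = 2;

    static boolean[] db = new boolean[5];   // every element starts out false

    // Prints the message only if the chosen debug variable is turned on.
    static void debug(int flag, String message) {
        if (db[flag]) System.err.println("[debug " + flag + "] " + message);
    }

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            debug(ECHO_INPUT, "read " + v);
            total += v;
        }
        debug(TRACE_PRIMARY, "sum returns " + total);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] { 1, 2, 3 }));  // 6, no debug output
        db[ECHO_INPUT] = true;                           // turn on input echo only
        System.out.println(sum(new int[] { 1, 2, 3 }));  // 6, with three "read" lines
    }
}
```

Sending the debug lines to System.err keeps them out of the normal output, so they can be redirected to a separate debug file.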
E. Stubs and Drivers
When a program needs to be tested even though one or more parts are missing, stubs can
be used to fill in the gaps. Typically, stubs are needed to substitute for methods that are
not yet written, but it could be just a loop. A stub looks like the method not written, and is
called in the normal manner, but the actual code that would define the method is missing.
Instead the method merely returns a reasonable value or set of values. The main program
then takes those values and continues on.
A driver serves the opposite need. The method is written, but the code that will call it is
not. A typical driver will call the method several times with a complete set of reasonable
test data.
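A stub and a driver can appear in the same test program, as in the sketch below. The payroll methods, the zero tax value, and the test data are all invented for the illustration.

```java
// Sketch: a stub for an unwritten method, and a driver for a finished one.
class PayrollDemo {
    // Stub: stands in for a tax method that has not been written yet.
    static double computeTax(double gross) {
        return 0.0;   // reasonable placeholder so the caller can be tested
    }

    // Finished method under test; it calls the stub in the normal manner.
    static double computeNetPay(double hours, double rate) {
        double gross = hours * rate;
        return gross - computeTax(gross);
    }

    // Driver: calls the finished method several times with test data.
    public static void main(String[] args) {
        double[][] cases = { { 0.0, 10.0 }, { 40.0, 10.0 }, { 45.5, 12.0 } };
        for (double[] c : cases) {
            System.out.println(c[0] + " hours at " + c[1] + " -> " + computeNetPay(c[0], c[1]));
        }
    }
}
```

When the real computeTax is written, it replaces the stub without any change to computeNetPay, and the same driver can be run again.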
F. Interactive Debugger
NetBeans has an interactive debugger that is easy to use and quite effective. I have written
a tutorial on its use that you may find to be of help. In general, an interactive debugger
can replace or approximate all of the techniques described above and do them more
quickly than can be accomplished by modifying your code.