A BUG TAXONOMY (Java)

There are three basic groups into which programming errors can be classified: syntax errors, run-time errors, and logic errors. A program with a syntax error will not compile; the program is not a valid statement in the chosen programming language. A program with a run-time error will compile but will not complete execution. The reasons are varied, but the most common are illegal address faults, arithmetic traps, and uncaught exceptions. The term logic error is used when a program compiles and runs to completion but does not always produce the correct answers.

In order for a program to have observable bugs, it must first compile and link, indicating that the program syntax is valid. If the program compiles and links but produces incorrect results or aborts execution during processing, then we say it has a bug or bugs. Debugging, therefore, is the detection and removal of run-time and logic errors.

Bugs can derive from many sources, particularly in a language such as Java, where many typos and non-logic errors go undetected by the compiler. The C programming language is even worse. Strongly typed languages such as Ada generally have fewer such errors. The list below is representative but by no means exhaustive.

1. Incomplete analysis or logic error. (The possibilities are endless. A couple of common examples are listed below.)
   a. Inadequate programmer analysis may not foresee and prepare for all the possible paths of the correct algorithm.
   b. The test data may not test all the critical paths of the program. Thus logic errors may remain undetected until actual data is used.

2. Inadequate understanding of syntax.
   a. Leaving out braces in a FOR or WHILE loop containing two or more statements.
   b. A SWITCH statement missing BREAK(s)/RETURN(s) where needed.
   c. A SWITCH statement missing a DEFAULT when needed.
   d. Forgetting that an ELSE binds to the nearest unmatched IF. In the first fragment below, the ELSE is attached to the second IF even though it is lined up under the first IF. In the second fragment, the print statement is not in the loop (there are no braces), so it is executed only once, after the loop terminates.

          if( a > b )
              if( a > d )
                  c = d;
          else
              c = a;

          int i, sum = 0;
          for( i = 0; i < 10; i++ )
              sum += i;
              System.out.println( i + " " + sum );

3. Parameter error.
   a. Java method parameters are always call-by-value, but that is not the whole story. For a primitive data type such as int, a copy of the argument is sent to the method, so a change to the parameter cannot change the argument. If the argument is an object, a copy of the address of the object is sent to the method; Java calls this a reference type. The method can therefore change the object's instance data through that address, which resembles call-by-reference. However, assigning a new object to the parameter does not change the caller's variable. The same rules apply to every reference type, including String, arrays, Integer, and Double. Because String, Integer, and Double objects are immutable, a method cannot change them at all.
   b. Inconsistency between the types specified in the formal parameter definitions and the types of the actual parameters in the method call. Sometimes these errors are masked by implicit type conversions such as int to double. As Java converts implicitly only from smaller to larger types, Java is less prone to these errors than most languages.

4. Small typing errors not caught by the compiler. Unfortunately, these are common in C and C++. Java catches more of them. For example, if x is an int, Java rejects if( x = 5 ) because an IF condition must be boolean, whereas C and C++ accept it. The stray semicolon shown below, however, is not caught. Both loops compile in Java, as their C++ versions would, because both languages permit a block almost anywhere a statement may appear. Each loop has a semicolon (;) at the end of the FOR or WHILE opening statement, and that empty statement becomes the entire loop body. The FOR loop does nothing SIZE times. The assignment then executes once with I equal to SIZE, which is out of bounds if A has SIZE elements. The WHILE loop never changes sum, so it never ends.

          int I;
          for( I = 0; I < SIZE; I++ );
              A[I] = kb.nextInt();

          int I;
          int sum = 0;
          while( sum < 100 );
          {
              I = kb.nextInt();
              sum += I;
          }

5. Implicit type conversion errors are rare in Java.
   The compiler will automatically convert among the types below, in the direction shown, when needed in an expression or to make a method argument type match the declared parameter type.

          byte → short → int → long → float → double

   These widening conversions cause few problems. Most lose no information, but int to float, long to float, and long to double can lose precision when the value has more significant digits than the target type can hold. Note that the non-numeric primitive types boolean and char are not listed here, although a char is widened to int automatically when needed. You can convert explicitly, with a cast, between most any of the numeric types, and between int and char, but not between int and boolean.

6. Incorrect Data. Some of the data typed in or read from the input file(s) may be incorrect.

7. Incorrect Input.
   a. The input statements may be inconsistent with the data layout. This is a common problem in a language such as Fortran. Among other errors, this can produce an infinite loop.
   b. The location of input statements may be inappropriate for the loop structure. In the first loop below, x has no meaningful value the first time the WHILE is tested. If x is a local variable that was never assigned, Java refuses to compile the loop. If x holds some earlier value, the loop tests that stale value. That loop also adds the terminating value to sum, which is wrong unless that value is 0. The second loop eliminates both errors. The third loop shows one of the best ways to construct an input loop; it pairs hasNextLine() with nextLine().

          while( x > 0 )
          {
              x = kb.nextInt();
              sum += x;
          }

          x = kb.nextInt();
          while( x > 0 )
          {
              sum += x;
              x = kb.nextInt();
          }

          String input = "";
          while( kb.hasNextLine() )
          {
              input = kb.nextLine();
              // ... whatever
          }

   c. Confusion about the exact effect of next(), nextInt(), nextByte(), nextDouble(), and nextLine(), or failure to preface a loop containing next() with a hasNext() when needed.
   d. Combinations of these commands can produce odd circumstances where the programmer thinks the input pointer has moved to the beginning of the next line when it has not. For example, nextInt() leaves the rest of the current line, including its line terminator, unread. A nextLine() that immediately follows therefore returns the remainder of that line (often an empty string) rather than the next line.

8. Incorrect Output. Outputting the wrong variable, or the right variable in the wrong format. If the field width in a format such as %3d is too small for the number, Java widens the field and prints the whole number rather than truncating it.

9. Loop entry or exit errors. Most FOR loop errors involve traversing the loop one too many or one too few times.
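The off-by-one FOR error can be seen in a minimal sketch like the following. It assumes an int array; the class OffByOne and its two methods are hypothetical names chosen for illustration. The only difference between the two loops is <= versus <.

```java
// Hypothetical example class illustrating an off-by-one FOR loop error.
class OffByOne {
    // Buggy: "<=" runs the loop one time too many; the last pass uses
    // index a.length, which throws ArrayIndexOutOfBoundsException.
    static int sumBuggy(int[] a) {
        int sum = 0;
        for (int i = 0; i <= a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Correct: valid indices run from 0 to a.length - 1.
    static int sumFixed(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {3, 5, 7};
        System.out.println(sumFixed(data)); // prints 15
        try {
            System.out.println(sumBuggy(data));
        } catch (ArrayIndexOutOfBoundsException e) {
            // The extra pass is reported instead of silently summing garbage.
            System.out.println("off by one: " + e.getMessage());
        }
    }
}
```

Running main prints 15 from the correct loop and then catches the ArrayIndexOutOfBoundsException thrown by the buggy one. The mirror-image error, starting at 1 or stopping at a.length - 1, fails silently by skipping an element, which makes it harder to spot.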
   Most WHILE and DO-WHILE errors involve infinite loops caused by a missing or unexecuted update. Sometimes these loops have improperly constructed exit conditions, such as OR (||) where AND (&&) is needed. Occasionally these errors have the effect of taking a loop out of the program. The FOR loop with the stray semicolon in item 4 above is one such error.

10. Assignment operator (=) error.
   Primitive Types: "A = B;" copies the content of B into A. If A and B are ints, chars, or doubles, the expected and correct thing happens.
   Immutable Reference Types: If A and B are Integers or Strings, "A = B;" copies the address stored in B into A, so both refer to the same object. The expected and correct thing still happens, because Integer and String objects are immutable. Nothing can change the shared object. An apparent change such as A = A + 1 (done through automatic unboxing and boxing) or A = A + "x" produces a different object and stores its address in A.
   Other Reference Types: If A and B are ordinary, mutable objects, such as objects of a class you wrote, then the content of B is the address where B's instance data is stored. Thus the address in B is copied into A, and now both A and B point to the same instance data. This is called a shallow copy. What the programmer should have done is call a copy constructor or a clone method, e.g.

          TheClass A = new TheClass( B );

11. Array index error. Array indices in Java are always in the range 0..ArraySize-1. In Java, any array index out of bounds throws an ArrayIndexOutOfBoundsException, which aborts the program unless it is caught.

12. Initialization error.
   a. All accumulator variables should be initialized: counters and sums usually to zero, and products usually to one. Relying on Java to initialize them is considered poor style.
   b. Special variables that track a list of values, such as LARGEST and SMALLEST, require initialization. One approach is to initialize such a variable to a value smaller or larger than any that will be encountered, such as Integer.MIN_VALUE (for LARGEST) or Double.MAX_VALUE (for SMALLEST). For a LARGEST of doubles, use -Double.MAX_VALUE. Double.MIN_VALUE is the smallest positive double, not the most negative one. Alternatively, such a variable can be initialized to the first value encountered.
   c. If a program runs correctly in Windows and dies a ghastly death in Unix/Linux, the first place to look is for a missing initialization. In C and C++, an uninitialized local variable holds whatever happens to be in memory, and that can differ from one compiler or system to the next. Java has fewer problems with this than C or C++ because of 12d below.
   d. Java automatically initializes all instance variables, static variables, and array elements to the default value for their type: 0 for numeric types, false for boolean, '\u0000' for char, and null for references. Thus every object variable is null until it is assigned an object created with new. Local variables are not initialized automatically, but the compiler rejects any use of a local variable that has not definitely been assigned a value.

13. Calculation-based errors.
   a. Division by zero. In Java, integer division by zero throws an ArithmeticException, while floating-point division by zero produces Infinity or NaN.
   b. Integer overflow or underflow
   c. Floating overflow or underflow

14. Run-time errors that may abort execution, or generate invalid results or infinite loops.
   a. Item 4 above shows one kind of infinite loop. Another possibility is an incompatibility between the data and the input statements. If C++ cannot read in new data because of such a problem, then the old data remains in the variables. In a loop that is supposed to terminate upon end of file or upon detection of sentinel data, the loop may never end. In Java, a Scanner method such as nextInt() throws an InputMismatchException when the next token does not match, and the Scanner does not advance past that token. A loop that catches the exception without discarding the bad token (for example, with next()) never ends either.
   b. Range of integers. All numeric types have a definite range of values, and in Java these ranges are the same on every platform. An int is 32 bits, with a range of -2,147,483,648..2,147,483,647 (about ±2 billion). A long is 64 bits, with a range of about ±9.2 × 10^18. (The range -32768..32767 is that of short.) The wrapper classes for primitive types give the largest and smallest values of each type. For int, the values are Integer.MAX_VALUE and Integer.MIN_VALUE.
   c. Range and number of significant digits of floating-point numbers. Floating-point representations such as float and double are stored as a mantissa and an exponent, similar to scientific notation. Thus they have a maximum range based on the exponent and a certain number of digits of precision based on the mantissa.
   d. Floating round-off errors. Rounding errors can often be reduced, but they can rarely be completely eliminated from programs using floating-point values. Good basic techniques include:
      1. Never compare floating values for equality or inequality. Instead, check whether their difference is within some acceptable error factor:

             if( Math.abs( A - B ) < error )     rather than     if( A == B )

      2. Try to arrange a sequence of operations so that the magnitudes of the partial results change as little as possible, by interleaving multiplications and divisions:

             A * B / C * D / E     rather than     (A * B * D) / (C * E)

      3. Try to avoid subtracting values that are almost equal.
   e. Overflow and underflow. If a calculation produces a positive value too large for the computer to store properly, the result is called overflow. Too large a negative value is called underflow here. For floating-point values, underflow more commonly means a nonzero result too close to zero to be represented. The most common clue that integer overflow has occurred is an unexpected change of sign. Suppose you have written a factorial method where you input an integer N and output N!. If you put in a positive N and N! is computed as negative, the error is almost certainly overflow. Change the data type to long and try again. Similarly, an accumulator of positive values can suddenly become negative if it overflows. A sign change is not guaranteed, however; an overflowed result can also be a wrong positive value. In Java, integer overflow wraps around silently and floating-point overflow produces Infinity, so neither aborts the program. The methods Math.addExact and Math.multiplyExact instead throw an ArithmeticException on int or long overflow.
   f. The Java counterpart of an illegal address fault is the NullPointerException. Java has no pointer arithmetic. However, using a reference variable that is null throws a NullPointerException, which aborts the program unless it is caught. A common case is calling a method through an array element that was never assigned an object. Inexperienced students should make sure every reference refers to an object before using it, as the paths to perdition are manifold.
   g. Exceptions. Java has many built-in exceptions, and the programmer can add others. The keywords involved are throw, try, and catch. This is too large a topic for this section, but the basic philosophy is to write the program so that most or all possible exceptions are caught and handled by the program.

15. Semantic Errors. The number of possible semantic errors in Java is almost unlimited. This will be true in any language supporting semantically complex constructs such as classes, inheritance, polymorphism, and exceptions. One common such error is unexpected side effects generated by shallow copies.
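The shallow-copy side effect from items 10 and 15 can be reproduced with a small sketch. MutablePoint and CopyDemo are hypothetical classes invented for illustration. The copy constructor shown is one simple way to write one, not the only way.

```java
// Hypothetical mutable class used to contrast a shallow copy
// (copying the address) with a copy made by a copy constructor.
class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Copy constructor: builds a new object holding the same values.
    MutablePoint(MutablePoint other) {
        this(other.x, other.y);
    }
}

class CopyDemo {
    public static void main(String[] args) {
        MutablePoint b = new MutablePoint(1, 2);

        MutablePoint shallow = b;          // copies the address: one shared object
        shallow.x = 99;                    // side effect: b.x changes too
        System.out.println(b.x);           // prints 99

        MutablePoint copy = new MutablePoint(b); // a separate object
        copy.y = 42;                       // b.y is unchanged
        System.out.println(b.y);           // prints 2
    }
}
```

Because MutablePoint holds only primitive fields, copying them is enough. If a class held references to other mutable objects, its copy constructor would also need to copy those objects, or the shallow-copy problem would reappear one level down.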
   A correctly defined copy constructor generally avoids this error.

Debugging Techniques

Virtually all debugging involves first finding out what the program is doing wrong, then fixing the error. The fastest way to find out what the program is doing wrong is to first find out exactly what the program is doing. To put it another way, if you know exactly what is going on inside your program, you should be able to locate and then repair the error. Inexperienced students have a tendency to guess, in effect making random changes in the hope that the error will disappear. What they need instead is information. The techniques described below are just different ways of getting this information to the programmer.

If you go to your instructor and ask for help debugging a program, the first question you will probably hear will be something along the lines of "What is it doing?" If your answer is any version of "I don't know," then you have not yet done enough on your own to eliminate the problem.

Simple deduction using available information

If the program is producing output, and the output is incorrect, sometimes the existing output provides the necessary clues to the problem. For example, if the last expected line of output is duplicated (or missing), the first place to look is for an incorrectly formed loop (see 7b). Most of the time, however, the output alone is not enough, and some of the techniques described below should be employed.

A. Echo Input Data to an Output File or Device
   If the program is not reading correctly, this can be difficult to detect if the input data is not echoed to output. A simple code section that echoes input can verify the input and can be removed or commented out when no longer needed.

B. Variable Values
   Output statements can be added to the program to output the values of variables. These might be intermediate values, parameters to methods, values about to be returned by methods, values just returned by methods, or whatever seems appropriate. If such statements may produce a large volume of output, then it may be desirable to output to a file. This error file may be the same as the normal output file or a separate debug file.

C. Trace
   Similarly, statements can be added to the program that produce a trace of the methods called. For example, upon entering a method ComputeGrossPay with the parameter 'rate', the programmer might insert the statement:

          System.out.println( "inCGP rate :" + rate );

   Upon leaving the same method and before returning the value of 'gross', the following statement could be added:

          System.out.println( "outCGP return :" + gross );

   Obviously, enough such statements could be included to output everything pertinent about the method. It is helpful if the trace information generated by a single method is on the same line, or at least clearly grouped. Blank lines can help make sense out of a large volume of output.

D. Built-in Debug Variables and Supporting Code
   While hardly appropriate for a small student program, a large production program that will undergo a series of revisions may benefit from a built-in debugging mechanism. The statement below creates an array of five debug variables, any of which may be true or false.

          boolean[] db = new boolean[5];

   The first element of db might be used to echo input, the second might pertain just to main, the third might be a trace of the primary methods, the fourth might be a trace of the small methods, and so on. These variables can be turned on or off as needed. One way is to prompt the user for the values of the debug variables, but this is unacceptable for production work. A technique more appropriate for production is to set the values within the program or read the values from a special debug file.
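The debug-switch idea in D, combined with the trace statements from C, can be sketched as follows. This is a minimal sketch under assumptions. The class name Debug, the named indexes such as TRACE_MAJOR, the helper log, and the method computeGrossPay are hypothetical illustrations, not part of any library.

```java
import java.io.PrintStream;

// Hypothetical debug switches; the flag names and the log helper are
// illustrative only.
class Debug {
    static final int ECHO_INPUT = 0;   // db[0]: echo input
    static final int MAIN = 1;         // db[1]: messages from main
    static final int TRACE_MAJOR = 2;  // db[2]: trace of primary methods
    static final int TRACE_MINOR = 3;  // db[3]: trace of small methods

    // All switches start off; set them in code or from a debug file.
    static final boolean[] db = new boolean[5];

    // Destination for debug output; could be redirected to a debug file.
    static PrintStream out = System.out;

    // Prints the message only when the selected switch is on.
    static void log(int flag, String message) {
        if (db[flag]) {
            out.println(message);
        }
    }
}

class PayrollExample {
    // Hypothetical method whose trace output is guarded by a switch.
    static double computeGrossPay(double rate, double hours) {
        Debug.log(Debug.TRACE_MAJOR, "inCGP rate :" + rate);
        double gross = rate * hours;
        Debug.log(Debug.TRACE_MAJOR, "outCGP return :" + gross);
        return gross;
    }

    public static void main(String[] args) {
        Debug.db[Debug.TRACE_MAJOR] = true;   // turn tracing on
        computeGrossPay(12.5, 40);
        Debug.db[Debug.TRACE_MAJOR] = false;  // turn it off again
        computeGrossPay(12.5, 40);            // no trace output this time
    }
}
```

Giving the array elements names rather than bare indexes keeps the meaning of each switch visible at every call site. If a single switch is declared as a compile-time constant, such as static final boolean DEBUG = false, statements guarded by if( DEBUG ) may be dropped by the compiler. That provides much of the speed benefit of commenting them out.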
   Statements producing output for debugging purposes would simply be prefaced by the appropriate test:

          if( db[0] ) System.out.println( whatever );

   In this way, it is no longer necessary to remove the debugging statements. When no longer needed, they are simply turned off. They can be turned back on if a new bug is discovered, or if new bugs are introduced by programming changes. Similarly, one might build a debug variable into the member methods associated with a class. The client program could turn this output on or off as needed. Eventually, the program will be ready to go into production. At that time, these debug statements could be commented out, speeding up execution of the program, and they could be reactivated later if needed.

E. Stubs and Drivers
   When a program needs to be tested even though one or more parts are missing, stubs can be used to fill in the gaps. Typically, stubs substitute for methods that are not yet written, but a stub could also stand in for something smaller, such as a loop. A stub looks like the method not yet written and is called in the normal manner, but the actual code that would define the method is missing. Instead, the stub merely returns a reasonable value or set of values. The main program then takes those values and continues on.
   A driver serves the opposite need. The method is written, but the code that will call it is not. A typical driver will call the method several times with a complete set of reasonable test data.

F. Interactive Debugger
   NetBeans has an interactive debugger that is easy to use and quite effective. I have written a tutorial on its use that you may find helpful. In general, an interactive debugger can replace or approximate all of the techniques described above, and do so more quickly than can be accomplished by modifying your code.
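The stubs and drivers described in E can be sketched as follows. Pay, PayDriver, computeGrossPay, computeTax, and computeNetPay are hypothetical names. The time-and-a-half overtime rule over 40 hours is an assumption made only to give the driver something to test.

```java
// Hypothetical example: computeTax is a STUB for a method not written yet,
// and PayDriver is a DRIVER for the finished method computeGrossPay.
class Pay {
    // Finished method under test.
    static double computeGrossPay(double rate, double hours) {
        if (hours <= 40) {
            return rate * hours;
        }
        // Assumed rule: hours over 40 are paid at 1.5 times the rate.
        return rate * 40 + rate * 1.5 * (hours - 40);
    }

    // STUB: same signature as the real method, but it just returns a
    // reasonable fixed value so its callers can be tested now.
    static double computeTax(double gross) {
        return 100.0;
    }

    // Caller that runs normally even though computeTax is only a stub.
    static double computeNetPay(double rate, double hours) {
        double gross = computeGrossPay(rate, hours);
        return gross - computeTax(gross);
    }
}

// DRIVER: calls the finished method several times with test data
// covering zero hours, just under, at, and just over 40 hours.
class PayDriver {
    public static void main(String[] args) {
        double[][] cases = { {10.0, 0}, {10.0, 39}, {10.0, 40}, {10.0, 41}, {12.5, 50} };
        for (double[] c : cases) {
            System.out.println("rate=" + c[0] + " hours=" + c[1]
                    + " gross=" + Pay.computeGrossPay(c[0], c[1]));
        }
    }
}
```

Under the assumed overtime rule, the driver's line for 41 hours at a rate of 10.0 should show a gross of 415.0. Choosing test cases on both sides of the 40-hour boundary is what lets a driver expose the loop and comparison errors described earlier.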
