
BIT 2206/CSC 1209 INTRODUCTION TO PROGRAMMING IN JAVA: THE CHARACTER TYPE

LESSON 1

CONTENTS
1. Introduction
2. ASCII Code
3. Alternative codes
4. The Character class
5. Character input
6. Example problem - lower to upper case conversion
   6.1. Requirements
   6.2. Analysis
   6.3. Design
   6.4. Implementation
   6.5. Testing
7. The System.out.flush method

The example combines the two-statement input data declaration used up until now into a single-statement declaration. The example also introduces the concepts of Boundary Value Analysis (BVA) and limit testing.

1. INTRODUCTION

The Java type char is used for handling single characters such as letters, digits and special symbols (e.g. question mark, full stop, colon etc.), or non-printable control characters (e.g. tab, newline etc.). In Java (like many other programming languages) characters are written by enclosing them in single quotes. Examples:

    'a'   'A'   '2'   '+'   '\''

(In the last example the single quote character itself must be escaped with a backslash.)

2. ASCII CODE

In the early days of computing, characters were usually stored in a computer using a group of 8 bits, i.e. a byte. Originally, only seven of these bits were used. The eighth (most significant) bit, referred to as the parity bit, was used for error checking. Using only seven bits there are 128 different character codes available (2^7). There is a generally accepted standard, called the ASCII standard, which determines which characters can be encoded using the seven available bits, and which character code represents which character. ASCII (pronounced "ass-key") is an acronym for American Standard Code for Information Interchange.

3. ALTERNATIVE CODES

The ASCII standard was developed on the assumption that all computer usage would be in English. The English alphabet has 26 letters derived from the Latin alphabet. This set of letters is sufficient for only a small group of languages, e.g. English, Swahili and Hawaiian! All other living languages use either the Latin alphabet plus other characters, or other non-Latin alphabets, or syllabaries.
Use of the ASCII standard therefore presents a problem in many countries.

3.1 LATIN-1 CODE

An obvious solution to the above problem is to drop the use of the parity bit so that 256 character codes are available. There are a number of "8 bit" character standards available. Some languages (for example Ada) use what is commonly referred to as the LATIN-1 standard (ISO 8859-1). In this standard the first 128 codes (0 to 127) adhere to the ASCII standard, while the remaining codes provide for additional characters.

3.2 Unicode Worldwide Character Standard

The Unicode Worldwide Character Standard is a character coding system whereby characters are stored in two bytes of memory (i.e. 16 bits as opposed to 8 bits). "At time of writing" the Unicode standard contained 34,168 distinct coded characters. Java uses the Unicode Standard. Provided that we have an editor that supports the Unicode character set, we can include any of the Unicode characters in our Java programs.

4. THE Character CLASS

The Character class contains many useful methods for manipulating and testing characters. A fragment of this class is presented in Figure 1. This fragment includes the following:

    Character         Constructor to create an instance of the class Character so that it represents the primitive value given as its argument.
    charValue         Returns the value of an instance of the class Character.
    getNumericValue   Returns the integer value that the specified Unicode character represents (e.g. 7 for the character '7'), or a negative value if the character has no non-negative integer value.
    isDigit           Determines if the specified character is a digit (a number).
    isLetter          Determines if the specified character is a letter.
    toLowerCase       Maps the given character to its lowercase equivalent; if the character has no lowercase equivalent, the character itself is returned.
    toUpperCase       Converts the character argument to uppercase.
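To make the method list above more concrete, the following is a minimal sketch that applies the class methods to a few sample characters. The class name CharacterMethodsSketch and the helper method describe are hypothetical names introduced only for this illustration; the Character methods themselves are those listed above.

```java
class CharacterMethodsSketch {

    // Hypothetical helper (not part of the Java API): builds a one-line
    // summary of a character using the Character class methods listed above.
    static String describe(char c) {
        return "'" + c + "'"
            + " isLetter=" + Character.isLetter(c)
            + " isDigit=" + Character.isDigit(c)
            + " numeric=" + Character.getNumericValue(c)
            + " upper=" + Character.toUpperCase(c)
            + " lower=" + Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        // A letter, a digit and a special symbol
        System.out.println(describe('a'));
        System.out.println(describe('7'));
        System.out.println(describe('+'));
    }
}
```

Note that for a character such as '+', which has no numeric value, getNumericValue returns -1, and toUpperCase and toLowerCase return the character unchanged.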
Figure 1: Class diagram for Character class

Note: the last five methods listed above are all class methods, so they are invoked by linking the desired method to the class name Character, e.g.:

    Character.isLetter(n);

where n is a data item of type char. Note also that the Character class contains many methods of the form is... for carrying out various tests on values of type char.

5. CHARACTER INPUT

Input using the next method in the Scanner class is always in the form of a string. If, for example, we want integers or doubles we use the nextInt or nextDouble methods respectively. However, there is no "nextChar" method. There are mechanisms for getting a single char from the input stream, but at present we do not have sufficient knowledge to do this. What we can do is input a character as an ASCII integer using nextInt and convert it to a char using a cast. Thus:

    int inputInt = input.nextInt();
    char inputChar = (char) inputInt;

where input (in input.nextInt()) is an instance of the Scanner class. Of course we can run the two statements together as follows:

    char inputChar = (char) input.nextInt();

The code example presented in Table 1 indicates how two characters may be input.

    // CHARACTER INPUT APPLICATION
    // Pius Nyaanga
    // Thursday 3 July 2013
    // St. Paul’s University

    import java.util.*;

    class CharacterInputApp {

        // ------------------- FIELDS ------------------------

        // Create Scanner class instance
        private static Scanner input = new Scanner(System.in);

        // ------------------ METHODS ------------------------

        public static void main(String[] args) {
            // Invite input
            System.out.println("Input two character codes (integers) separated by a " +
                               "carriage return:");

            // Read in each input as an integer code and cast it to a char.
            char inputChar1 = (char) input.nextInt();
            char inputChar2 = (char) input.nextInt();

            // Output the result
            System.out.println("input 1 = " + inputChar1 + " input 2 = " + inputChar2);
        }
    }

Table 1: Character input code example

6. EXAMPLE PROBLEM - LOWER TO UPPER CASE CONVERSION

6.1 Requirements

To produce a program that converts lower case alphabetic characters to upper case alphabetic characters (Figure 2). Note that lower case letters a..z have Unicodes 97..122, and upper case letters A..Z have Unicodes 65..90. Therefore to convert from lower case to upper case we must subtract 32 from the Unicode of the input character.

Figure 2: Lower to uppercase character conversion

6.2 Analysis

Using "noun extraction" the class diagram presented in Figure 3 is proposed.

6.3 Design

From Figure 3 the analysis indicates that we need to design a single class, Lower2UpperApp; all other methods used are contained in existing classes that come with the Java API.

Figure 3: Lower to Upper case class diagram

6.3.1 Lower2UpperApp Class

Field Summary
    private static Scanner input
        A class field to facilitate input from the input stream.

Method Summary
    public static void main(String[] args)
        Main method to read in a character from the keyboard as a Unicode value, output this value (i.e. "echo" to the screen), and then convert to the upper case equivalent by subtracting 32. Output this new Unicode value and the associated character.

A Nassi-Shneiderman chart for this method is given in Figure 4.

Figure 4: Nassi-Shneiderman charts for Lower2UpperApp class method

6.4 Implementation

6.4.1 Lower2UpperApp Class

The implementation for the Lower2UpperApp class is given in Table 2. Points to note:

1. We use the nextInt method contained in the Scanner class to input a Unicode integer.
2. To convert a Unicode value into its character we use a cast:

    character = (char) unicodeValue;

    // LOWER 2 UPPER APPLICATION
    // Pius Nyaanga
    // Tuesday 2 March 2013
    // Wednesday 30 June 2013
    // St. Paul’s University

    import java.util.*;

    class Lower2UpperApp {

        // ------------------- FIELDS ------------------------

        // Create Scanner class instance
        private static Scanner input = new Scanner(System.in);

        // ------------------ METHODS ------------------------

        public static void main(String[] args) {
            char upperCaseChar;
            int uniCodeValue;

            // Input a Unicode value and output associated character
            System.out.print("Input a Unicode value: ");
            uniCodeValue = input.nextInt();
            System.out.println("Character equivalent is : " +
                               (char) uniCodeValue);

            // Subtract 32 to find uppercase equivalent and output.
            uniCodeValue = uniCodeValue - 32;
            System.out.println("Unicode upper case equivalent is: " + uniCodeValue);
            upperCaseChar = (char) uniCodeValue;
            System.out.println("Upper case charactere is: " + upperCaseChar);
        }
    }

Table 2: Lower to upper case conversion application (Version 1)

Of course, to be in tune with the spirit of OOP we should not write code where appropriate pre-defined methods already exist (code reuse). Inspection of the Character class indicates that there is a method toUpperCase already available. Thus an alternative encoding for the above might be as follows:

    // LOWER 2 UPPER APPLICATION VERSION 2
    // St. Paul’s University
    // Tuesday 2 March 2013

    import java.util.*;

    class Lower2UpperApp2 {

        // ------------------- FIELDS ------------------------

        // Create Scanner class instance
        private static Scanner input = new Scanner(System.in);

        // ------------------ METHODS ------------------------

        public static void main(String[] args) {
            char lowerCaseChar, upperCaseChar;

            // Input a Unicode value and cast it to the associated character
            System.out.print("Input a Unicode value: ");
            lowerCaseChar = (char) input.nextInt();

            // Convert to uppercase equivalent and output.
            upperCaseChar = Character.toUpperCase(lowerCaseChar);
            System.out.println("Upper case charactere is: " + upperCaseChar);
        }
    }

Table 3: Lower to upper case conversion application (Version 2)

6.5 Testing

Boundary Value Analysis (BVA) Testing: When using input variables that can only take a particular "range" of values, it has been demonstrated that errors often occur at the boundaries of the input domain. It is for this reason that Boundary Value Analysis (BVA) has been developed as a testing technique. Boundary value analysis leads to a selection of test cases that exercise bounding values for data items. At its simplest this involves the derivation of test cases with values just above and just below the bounding values. Thus suitable boundary values for the above application will be '`', 'b', 'y' and '{' (the Unicode character code for the symbol '`' is 96, and that for the symbol '{' is 123).

Limit testing is related to BVA testing, and is concerned with the generation of test cases to exercise the program when maximum and minimum input values are supplied. In some cases this may be the maxima/minima for the type, in others this may be the limits of a particular range that we are interested in ('a' to 'z' in the above case).

An appropriate set of BVA and limit test cases is given in the table below. These test cases will also serve to test the arithmetic operation of the code with the inclusion of a sample input value near the middle of the prescribed range (e.g. 'm'). We should also carry out some random data validation testing.

    TEST CASE                              EXPECTED RESULT
    Unicode number ("char" equivalent)     Output
    96  ('`')                              '@'
    97  ('a')                              'A'
    98  ('b')                              'B'
    109 ('m')                              'M'
    121 ('y')                              'Y'
    122 ('z')                              'Z'
    123 ('{')                              '['

Some sample output using the above test cases is given in Table 4.
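The boundary and limit test cases described in this section can also be exercised in a single run rather than one program invocation per value. The following is a minimal sketch under that assumption; the class name BoundaryCheckSketch and the helper methods subtract32 and viaToUpperCase are hypothetical names introduced for illustration, standing in for the conversion steps of Version 1 and Version 2 respectively.

```java
class BoundaryCheckSketch {

    // Version 1 style conversion (hypothetical helper): subtract 32 from
    // the code with no range check, then cast to char.
    static char subtract32(int code) {
        return (char) (code - 32);
    }

    // Version 2 style conversion (hypothetical helper): cast to char and
    // use the predefined Character.toUpperCase method.
    static char viaToUpperCase(int code) {
        return Character.toUpperCase((char) code);
    }

    public static void main(String[] args) {
        // Codes just outside, on, and just inside the 'a'..'z' limits,
        // plus one mid-range value (109 is the code for 'm').
        int[] testCodes = {96, 97, 98, 109, 121, 122, 123};

        for (int code : testCodes) {
            System.out.println(code + " (" + (char) code + ") -> "
                + subtract32(code) + " / " + viaToUpperCase(code));
        }
    }
}
```

For the codes 97 to 122 both conversions give the same upper case letter. For the out-of-range codes 96 and 123 they differ: subtracting 32 yields '@' and '[', whereas Character.toUpperCase returns the input character unchanged.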
    $ java Lower2UpperApp
    Input a Unicode value: 96
    Character equivalent is : `
    Unicode upper case equivalent is: 64
    Upper case charactere is: @

    $ java Lower2UpperApp
    Input a Unicode value: 97
    Character equivalent is : a
    Unicode upper case equivalent is: 65
    Upper case charactere is: A

    $ java Lower2UpperApp
    Input a Unicode value: 98
    Character equivalent is : b
    Unicode upper case equivalent is: 66
    Upper case charactere is: B

    $ java Lower2UpperApp
    Input a Unicode value: 109
    Character equivalent is : m
    Unicode upper case equivalent is: 77
    Upper case charactere is: M

    $ java Lower2UpperApp
    Input a Unicode value: 121
    Character equivalent is : y
    Unicode upper case equivalent is: 89
    Upper case charactere is: Y

    $ java Lower2UpperApp
    Input a Unicode value: 122
    Character equivalent is : z
    Unicode upper case equivalent is: 90
    Upper case charactere is: Z

    $ java Lower2UpperApp
    Input a Unicode value: 123
    Character equivalent is : {
    Unicode upper case equivalent is: 91
    Upper case charactere is: [

Table 4: Sample output

Note that at present, given our current knowledge, we are still not in a position to prevent undesired inputs! Further examples of character manipulation are available.

7. THE System.out.flush METHOD

When using System.out.print() to output data, the data is first passed to a temporary storage area called a buffer, from where it is output to (say) the screen. This arrangement is known as output buffering and is designed to save processing time; however, it may cause code to appear to behave in a strange manner. This is because output is not always passed from the buffer to the screen immediately; the Java interpreter might process some further lines of code before doing this. To force the buffer to be flushed we can use the flush method, which is contained in the PrintStream and PrintWriter classes:

    System.out.flush();
For example we might write:

    System.out.print("Answer = ");
    System.out.flush();
    System.out.print(100/5);

This will cause the string "Answer = " to be output before the calculation is undertaken. The buffer is always flushed whenever a "new line" character is encountered. Therefore when using System.out.println() the above is not a problem.

Created and maintained by Pius Nyaanga. Last updated 18 July 2013
